
File Organization and Index Structures

Instructor: Mr Mourad Benchikh


Textbooks: Elmasri & Navathe, Chap. 5 and 6; Ramakrishnan & Gehrke, Chap. 7, 8 and 9; Oracle9i documentation

Databases are stored physically as files of records, typically on magnetic disks. This chapter deals with the organization of databases in storage and the techniques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes. The emphasis is on the search process; deletion, update, and insertion issues will not be covered.

First-Semester 1427-1428

Storage Medium
Primary Storage
- Main memory and smaller but faster cache memories.
- Fast access to data, but limited storage capacity.
- Can be operated on directly by the CPU.

Secondary Storage
- Magnetic disks, optical disks and tapes.
- Larger capacity and lower cost.
- Slower access to data.
- Data cannot be processed directly by the CPU.

Magnetic Disks
Secondary storage. Transfer of data between main memory and disk takes place in units of disk blocks: blocks are the units of data transfer and data allocation.
- For a read command: the block from disk is copied into the buffer.
- For a write command: the contents of the buffer are copied into the disk block.

Records
Data is usually stored in the form of records. Each record consists of a collection of related data values or items. Records usually describe entities and their attributes. For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as NAME, BIRTHDATE, SALARY. A collection of field names and their corresponding data types constitutes a record type or record format. C notation:
struct employee {
    char name[30];
    char ssn[9];
    int salary;
    int jobCode;
    char department[20];
};

File

A sequence of records. Usually all records in a file are of the same record type (fixed-length records).

Variable-length records: some possible schemes:
- The file records are of the same record type but one or more of the fields are of varying size.
- The file records are of the same record type but one or more of the fields may have multiple values for the individual records.
- The file records are of the same record type, but one or more of the fields are optional.
- The file includes records of different types; each record is preceded by a record type indication. If a relation exists between EMPLOYEE and DEPARTMENT, then their corresponding records are physically contiguous (clustered) in order to minimize I/O operations.

In general, a block contains one or more records specific to one file only:
- Spanned organization: records can cross block boundaries.
- Unspanned organization: records cannot cross block boundaries.

Blocking Factor: Bfr = Number of records per block.
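The blocking factor can be made concrete with a small calculation. The sketch below assumes fixed-length records and unspanned organization, so Bfr = floor(B / R) for a block of B bytes and records of R bytes; the block size, record size and record count are made-up numbers chosen only for illustration.

```python
import math


def blocking_factor(block_size: int, record_size: int) -> int:
    """Records per block for fixed-length, unspanned records: floor(B / R)."""
    return block_size // record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """Number of blocks needed to hold the whole file: ceil(r / Bfr)."""
    return math.ceil(num_records / bfr)


# Hypothetical values, not taken from the course material.
B = 512      # block size in bytes
R = 100      # record size in bytes
r = 30_000   # number of records in the file

bfr = blocking_factor(B, R)
b = blocks_needed(r, bfr)
unused = B - bfr * R   # bytes wasted per block when records cannot span blocks

print(bfr, b, unused)  # prints: 5 6000 12
```

With unspanned records the 12 leftover bytes in each block stay unused; a spanned organization would let a record start there and continue in the next block.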

Allocating File Blocks


Contiguous Allocation
The file blocks are allocated to consecutive disk blocks. Reading the whole file is very fast (using double buffering). Expanding the file is difficult.

Linked Allocation
Each file block contains a pointer to the next file block. Easy to expand but slow to read the whole file.

Combination
Allocates clusters of consecutive disk blocks and the clusters are linked.

Indexed allocation
One or more index blocks contain pointers to the actual file blocks.

Organization & Access Method


File Organization
- The organization of the data of a file into records, blocks, and access structures.
- The way records and blocks are placed on the storage medium and interlinked.
- Example: sorted file.

Access Method
Provides a group of operations that can be applied to a file: Open, Find, Delete, Modify, Insert, Close, etc.

It is possible to apply several access methods to a file organization. Some access methods can be applied only to files organized in certain ways: for example, an indexed access method cannot be applied to a file without an index.

Choose the file organization that efficiently implements the access methods needed by the application.

Heap Files (Unordered Files)


Heap File (Pile)
The simplest type of file organization. Records are placed in the file in the order in which they are inserted. New records are inserted at the end of the file; the address of the last file block is kept in the file header. Searching on any search condition involves a linear search, an expensive procedure.
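The behaviour of a heap file can be sketched in a few lines. In this illustration a file is a Python list of blocks and each block is a list of records; `insert`, `linear_search` and the sample records are hypothetical names and data, and counting blocks read stands in for counting disk accesses.

```python
def insert(blocks, record, bfr):
    """Append a record at the end of the file, opening a new block when the last one is full."""
    if not blocks or len(blocks[-1]) == bfr:
        blocks.append([])
    blocks[-1].append(record)


def linear_search(blocks, field, value):
    """Scan the blocks in file order; return (record or None, number of blocks read)."""
    for blocks_read, block in enumerate(blocks, start=1):
        for record in block:
            if record[field] == value:
                return record, blocks_read
    return None, len(blocks)


heap_file = []
for ssn, name in [("333", "Smith"), ("111", "Wong"), ("999", "Zelaya"),
                  ("555", "Allen"), ("222", "Doe")]:
    insert(heap_file, {"ssn": ssn, "name": name}, bfr=2)

print(linear_search(heap_file, "ssn", "555"))  # found in the second block
print(linear_search(heap_file, "ssn", "000"))  # not found: every block was read
```

An unsuccessful search reads all b blocks, and a successful search on a unique field reads about b/2 blocks on average, which is why the linear search is called expensive.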

Relative (or Direct) File

Unordered fixed-length records using unspanned blocks and contiguous allocation. We can then access any record by its position in the file: numbering records and blocks from 0, the i-th record is located in block ⌊i/Bfr⌋. This organization helps to locate a record by its position, but it does not help to locate a record based on a search condition.
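The position arithmetic for a relative file is short enough to write out. This sketch assumes records and blocks are both numbered from 0; the function name `locate` is made up for the example.

```python
def locate(i: int, bfr: int) -> tuple[int, int]:
    """Return (block number, slot within the block) of the i-th record."""
    return divmod(i, bfr)   # (i // bfr, i % bfr)


# With 3 records per block, record 7 is the second record (slot 1) of block 2.
print(locate(7, 3))  # prints: (2, 1)
```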

Sorted Files
An organization that physically orders the records of a file on disk based on the values of one of their fields, called the ordering field. If the ordering field is also a key field of the file, then the field is called the ordering key for the file. Figure 5.9 shows an ordered file with NAME as the ordering key field (assuming that employees have distinct names). Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required. A search condition based on the value of an ordering key field gives faster access when the binary search technique is used. Ordering does not provide any advantage for random or ordered access to the records based on values of the other, non-ordering fields of the file; in that case, random access requires a linear search.

Binary Search
Algorithm 5.1: Binary search on an ordering key of a disk file
L = 1; U = b; /* b is the number of file blocks */
while (U >= L) do
begin
    I = (L + U) div 2;
    read block I of the file into the buffer;
    if K < (ordering key field value of the first record in block I)
        then U = I - 1
    else if K > (ordering key field value of the last record in block I)
        then L = I + 1
    else if the record with ordering key field value = K is in the buffer
        then goto found
    else goto notFound;
end;
goto notFound;

If b is the number of blocks in a sorted file, a binary search reads about log2(b) blocks on average, compared with about b/2 for a linear search.
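Algorithm 5.1 translates almost line for line into runnable code. The sketch below is an illustration, not the textbook's implementation: blocks are Python lists of (key, name) tuples sorted on the key, indexing a list stands in for reading a block into the buffer, and block numbers start at 0 rather than 1.

```python
def binary_search_blocks(blocks, key):
    """Binary search over the blocks of a sorted file.

    Returns (record or None, number of blocks read).
    """
    low, high = 0, len(blocks) - 1
    blocks_read = 0
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]          # stands in for "read block mid into the buffer"
        blocks_read += 1
        if key < block[0][0]:        # smaller than the first key in the block
            high = mid - 1
        elif key > block[-1][0]:     # larger than the last key in the block
            low = mid + 1
        else:                        # the key can only be in this block
            for record in block:
                if record[0] == key:
                    return record, blocks_read
            return None, blocks_read
    return None, blocks_read


sorted_file = [
    [(10567, "J. Doe"), (11589, "T. Allen"), (15973, "M. Smith")],
    [(29579, "B. Zimmer"), (34596, "T. Atkins"), (75623, "J. Wong")],
    [(84920, "S. Allen"), (96256, "P. Wright")],
]

print(binary_search_blocks(sorted_file, 84920))  # found after reading 2 blocks
print(binary_search_blocks(sorted_file, 20000))  # falls between two blocks: not found
```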

Hashing Organization
Provides very fast access to records on certain search conditions. The search condition must be an equality condition on a hash field of the file. In most cases, the hash field is also a key field of the file (hash key)

Hashing
The idea is to provide a function h, called a hash function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored. The search for the record within the block can then be carried out in a main memory buffer.

Internal files

Internal Hashing

Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field. Hashing is implemented as a hash table through the use of an array of records. Suppose that the array index range is from 0 to N-1; then we have N slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and N-1. One common hash function is h(K) = K mod N; the resulting value is used as the record address.

[Figure: internal hashing. A key K is mapped by H(K) = K mod N to one of N record slots numbered 0 to N-1; the slots hold r records. In general, r ≤ N.]

Hashing Function
Key is the student id (six digits). Assume we have N = 100,000 record slots numbered 00000 to 99999.
H(K): student_id mod 100000
085768    085768 mod 100000 = 85768
134281    134281 mod 100000 = 34281
101004    101004 mod 100000 = 1004
100000    100000 mod 100000 = 0
601004    601004 mod 100000 = 1004 (collision)
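The student-id example can be replayed directly. This is a minimal sketch: ids are kept as strings so that the leading zero of 085768 survives, and `place_all` is a hypothetical helper that only records which ids collide, without resolving the collision.

```python
N = 100_000  # number of record slots, addressed 0 .. N-1


def h(student_id: int) -> int:
    """Hash function from the example: student id modulo the number of slots."""
    return student_id % N


def place_all(student_ids):
    """Assign each id to its slot; report (id, id already in the slot, address) on a collision."""
    slots = {}
    collisions = []
    for text in student_ids:
        address = h(int(text))
        if address in slots:
            collisions.append((text, slots[address], address))
        else:
            slots[address] = text
    return slots, collisions


slots, collisions = place_all(["085768", "134281", "101004", "100000", "601004"])
print(sorted(slots))   # prints: [0, 1004, 34281, 85768]
print(collisions)      # prints: [('601004', '101004', 1004)]
```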

Collision
A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. The process of finding another position (after a collision) is called collision resolution. Methods for collision resolution:
- Open addressing
- Chaining
- Multiple hashing
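Two of the listed methods can be sketched briefly. The classes below are simplified illustrations with made-up names: open addressing is shown with linear probing (try the next slot until a free one is found), and chaining is shown with a Python list per slot in place of the overflow locations and pointers a real implementation would use. Multiple hashing is not shown.

```python
class OpenAddressingTable:
    """Open addressing with linear probing: on a collision, try the following slots in order."""

    def __init__(self, n):
        self.n = n
        self.slots = [None] * n

    def insert(self, key, record):
        for step in range(self.n):
            address = (key + step) % self.n
            if self.slots[address] is None or self.slots[address][0] == key:
                self.slots[address] = (key, record)
                return address
        raise RuntimeError("hash table is full")

    def find(self, key):
        for step in range(self.n):
            address = (key + step) % self.n
            if self.slots[address] is None:   # an empty slot ends the probe sequence
                return None
            if self.slots[address][0] == key:
                return self.slots[address][1]
        return None


class ChainedTable:
    """Chaining: every slot keeps a chain of all records that hash to it."""

    def __init__(self, n):
        self.n = n
        self.chains = [[] for _ in range(n)]

    def insert(self, key, record):
        self.chains[key % self.n].append((key, record))

    def find(self, key):
        for stored_key, record in self.chains[key % self.n]:
            if stored_key == key:
                return record
        return None


probing = OpenAddressingTable(100_000)
print(probing.insert(101004, "first student"))   # prints: 1004
print(probing.insert(601004, "second student"))  # prints: 1005 (slot 1004 was taken)

chained = ChainedTable(100_000)
chained.insert(101004, "first student")
chained.insert(601004, "second student")
print(len(chained.chains[1004]))                 # prints: 2
```

The probing table has no deletion: removing a record would need a marker in the freed slot so that later searches do not stop early.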

Hashing for disk files is called external hashing. The target address space is made of buckets, each of which holds multiple records.
A bucket is either one disk block or a cluster of contiguous blocks.

External Hashing

The hash function maps the value of the indexing field to a relative bucket number. A table maintained in the file header converts the bucket number into the corresponding disk block address.
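The two-step address translation of external hashing can be sketched as follows. Everything here is an assumption made for illustration: the class name, the modulo hash function, the made-up block addresses starting at 1000, and the dictionary that plays the role of the disk. Bucket capacity and overflow handling are left out.

```python
class ExternalHashFile:
    """Hash function -> relative bucket number -> disk block address (via a header table)."""

    def __init__(self, num_buckets, first_block_address=1000):
        self.num_buckets = num_buckets
        # Header table: position = relative bucket number, value = disk block address.
        self.header = [first_block_address + i for i in range(num_buckets)]
        # Simulated disk: block address -> records stored in that bucket.
        self.disk = {address: [] for address in self.header}

    def bucket_number(self, key):
        return key % self.num_buckets

    def block_address(self, key):
        return self.header[self.bucket_number(key)]

    def insert(self, key, record):
        self.disk[self.block_address(key)].append((key, record))

    def find(self, key):
        # One block is fetched; the search inside it happens in main memory.
        for stored_key, record in self.disk[self.block_address(key)]:
            if stored_key == key:
                return record
        return None


hash_file = ExternalHashFile(num_buckets=4)
hash_file.insert(10, "record A")
hash_file.insert(14, "record B")     # same bucket as key 10: a bucket holds several records
print(hash_file.bucket_number(14))   # prints: 2
print(hash_file.block_address(14))   # prints: 1002
print(hash_file.find(14))            # prints: record B
```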

Dynamic Files & Hashing


One problem with hashing so far is that the address space N is fixed. If the number of records grows beyond the original size, the file must be reorganized.

How to handle dynamic files better?
- Extendible hashing
- Dynamic hashing
- Linear hashing

Indexing
Index file (same idea as a textbook index): an auxiliary structure designed to speed up access to desired data. Indexing field: the field on which the index file is defined. The index file stores each value of the indexing field along with a pointer: either pointer(s) to the block(s) that contain record(s) with that field value, or a pointer to the record with that field value: <Indexing Field, Pointer>. In Oracle, the pointer is called a RowID, which tells the DBMS where the row (record) is located (by file, block within that file, and row within the block).

To find a record in the data file based on a selection criterion on an indexing field, we first access the index file, which then gives access to the record in the data file. The index file is much smaller than the data file, so searching it is fast. Indexing is important for file systems and DBMSs:
- Databases eventually map data to file structures on disk:
  - Records of each relation may be stored in a separate file.
  - Records of several different relations can be stored in the same file (i.e. a physically clustered file organization, to minimize I/O).
- In a DBMS, the query processor accesses the index structures when processing a query (e.g. an indexed join, also called a single-loop join).

Types of Indexes
- Indexes on ordered vs. unordered files
- Dense vs. non-dense (i.e. sparse) indexes
  - Dense: an entry in the index file for each record of the data file.
  - Sparse: only some of the data records are represented in the index, often one index entry per block of the data file.
- Primary indexes vs. secondary indexes
- Ordered indexes vs. hash indexes
  - Ordered indexes: indexing fields stored in sorted order.
  - Hash indexes: indexing fields stored using a hash function.
- Single-level vs. multi-level
  - A single-level index is an ordered file and is searched using binary search.
  - Multi-level indexes are tree-structured; they improve the search but require a more elaborate search algorithm.
- Index on a single indexing field vs. index on multiple indexing fields (i.e. composite index)
  - If a certain combination of fields is used frequently, set an index on multiple fields.

Single-Level Ordered Index :

Primary Index

Physical records may be kept ordered on the primary key. The index is ordered but has only one entry for each block (non-dense). Each index entry holds the value of the primary key field for the first record (or the last record) in a block and a pointer to that block. This reduces the index requirements:
- fewer index entries than records in the file;
- binary search over the index can be faster (fewer index blocks to read than when searching the ordered data file directly).
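A sparse primary index lookup can be sketched with the standard `bisect` module. The sketch assumes each index entry holds the key of the last record in its block (the text allows first or last), uses list positions as block pointers, and works on a small in-memory sample file; `build_primary_index` and `lookup` are names invented for the example.

```python
from bisect import bisect_left


def build_primary_index(blocks):
    """One entry per block: (key of the last record in the block, block number)."""
    return [(block[-1][0], block_no) for block_no, block in enumerate(blocks)]


def lookup(index, blocks, key):
    """Binary-search the index, then read and scan the single candidate block."""
    anchors = [anchor for anchor, _ in index]
    position = bisect_left(anchors, key)   # first entry whose anchor key is >= key
    if position == len(index):             # key is larger than every key in the file
        return None
    _, block_no = index[position]
    for record in blocks[block_no]:
        if record[0] == key:
            return record
    return None


data_file = [
    [(10567, "J. Doe"), (11589, "T. Allen"), (15973, "M. Smith")],
    [(29579, "B. Zimmer"), (34596, "T. Atkins"), (75623, "J. Wong")],
    [(84920, "S. Allen"), (96256, "P. Wright")],
]
primary_index = build_primary_index(data_file)

print(primary_index)                            # 3 index entries for 8 records
print(lookup(primary_index, data_file, 34596))  # prints: (34596, 'T. Atkins')
print(lookup(primary_index, data_file, 20000))  # prints: None
```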

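Single-Level Ordered Index: Primary

[Figure: primary index on a data file ordered on its key. There is one index entry per block, holding the key of the last record in that block.]

Index entries: 15973, 75623, 96256

Data file records, in key order:
10567   J. Doe      3   CS
11589   T. Allen    2   BA
15973   M. Smith    3   CS
29579   B. Zimmer   1   BS
34596   T. Atkins   4   ME
75623   J. Wong     3   BA
84920   S. Allen    4   CS
96256   P. Wright   2   ME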

Single-Level Ordered Index:

Clustering Index

- Records are physically ordered by a non-key field.
- Same general structure as an ordered file index: <Clustering field, Block pointer>.
- One entry in the index for each distinct value of the clustering field, with a pointer to the first block in the data file that has a record with that value for its clustering field.
- Possibly many records for one index entry (non-dense).
- Sometimes entire blocks are reserved for each distinct clustering field value.

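Single-Level Ordered Index: Clustering

[Figure: clustering index on a data file ordered on a non-key field (last column). There is one index entry per distinct value of the clustering field.]

Index entries: BA, BS, CS, ME

Data file records, in clustering field order:
11589   T. Allen    2   BA
75623   J. Wong     3   BA
29579   B. Zimmer   1   BS
10567   J. Doe      3   CS
15973   M. Smith    3   CS
84920   S. Allen    4   CS
34596   T. Atkins   4   ME
96256   P. Wright   2   ME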

Single-Level Ordered Index: Secondary Indexes

A secondary index is an ordered file with two fields:
- a non-ordering field of the data file (the indexing field);
- a block pointer or a record pointer.

There can be several secondary indexes for the same file but only one primary index. A secondary index on a non-ordering key field is dense (see Figure 6.4). There are several options for a secondary index on a non-key field:
- Option 1: include several index entries with the same value of the indexing field, one for each record (dense index).
- Option 2 (more commonly used): keep a single entry for each index value and create an extra level of indirection to handle the multiple pointers (see Figure 6.5).
- Etc.
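Option 2 can be sketched as a sorted list of (value, pointer list) pairs, where the pointer list plays the role of the extra level of indirection. This is an illustration under simplifying assumptions: record pointers are just positions in a Python list, the sample records are ordered on their id, and the function names are invented for the example.

```python
from bisect import bisect_left
from collections import defaultdict


def build_secondary_index(records, field):
    """One index entry per distinct value; each entry points to a list of record pointers."""
    pointers = defaultdict(list)
    for record_no, record in enumerate(records):
        pointers[record[field]].append(record_no)
    return sorted(pointers.items())   # the index itself is an ordered file


def lookup(index, value):
    """Binary-search the index; return the record pointers for the value (possibly empty)."""
    values = [entry_value for entry_value, _ in index]
    position = bisect_left(values, value)
    if position < len(index) and index[position][0] == value:
        return index[position][1]
    return []


# Data file ordered on the id; "major" is a non-ordering, non-key field.
students = [
    {"id": 10567, "name": "J. Doe", "major": "CS"},
    {"id": 11589, "name": "T. Allen", "major": "BA"},
    {"id": 15973, "name": "M. Smith", "major": "CS"},
    {"id": 29579, "name": "B. Zimmer", "major": "BS"},
    {"id": 34596, "name": "T. Atkins", "major": "ME"},
    {"id": 75623, "name": "J. Wong", "major": "BA"},
    {"id": 84920, "name": "S. Allen", "major": "CS"},
    {"id": 96256, "name": "P. Wright", "major": "ME"},
]
major_index = build_secondary_index(students, "major")

print([value for value, _ in major_index])                   # 4 entries for 8 records
print([students[p]["name"] for p in lookup(major_index, "CS")])
```

Because the data file is not ordered on the indexed field, the records for one value are scattered over the file, so each pointer may lead to a different block.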

Types of Single-Level Ordered Indexes


                      Key Field                Non-key Field
Ordering Field        Primary Index            Clustering Index
Non-ordering Field    Secondary Index (key)    Secondary Index (non-key)

Type of index         Number of first-level index entries                 Dense or non-dense
Primary               Number of blocks in data file                       Non-dense
Clustering            Number of distinct index field values               Non-dense
Secondary (key)       Number of records in data file                      Dense
Secondary (non-key)   Number of distinct index field values (Option 2)    Non-dense

Static Multilevel Indexes


A multilevel index treats the index file (the first level) as an ordered file with a distinct value for each entry of the indexing field. A primary index built on the first level is called the second level of the multilevel index, and so on until a level fits in a single block. Hence a multilevel index with r1 first-level entries has approximately t levels, where t = ⌈log_fo(r1)⌉.
Fan-out: fo = number of index entries per index block.
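The number of levels can be checked by building the levels one at a time. In this sketch each level needs one block per fo entries of the level below, and the loop stops when a level fits in a single block; the values of r1 and fo are made-up numbers, and `index_levels` is a name chosen for the example.

```python
import math


def index_levels(r1: int, fo: int) -> int:
    """Count index levels: keep indexing the level below until one block is enough."""
    levels = 1
    blocks = math.ceil(r1 / fo)           # blocks in the first level
    while blocks > 1:
        blocks = math.ceil(blocks / fo)   # each block below needs one entry above
        levels += 1
    return levels


r1, fo = 30_000, 68   # hypothetical: 30,000 first-level entries, 68 entries per block
print(index_levels(r1, fo))            # prints: 3  (442 blocks, then 7, then 1)
print(math.ceil(math.log(r1, fo)))     # prints: 3  (the closed form gives the same value)
```

A search then reads one block per level plus the data block, so 4 block accesses here instead of the roughly log2(442), about 9, needed for a binary search on the first level alone.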

Indexed Sequential File: commonly used file organization


The data file is an ordered file with a multilevel primary index on its ordering key field. See Figure 6.6

A multilevel index speeds up record search. The problem lies in index insertion and deletion, which may require reorganization of the index: when the data file is modified, the index must be updated.

Dynamic Multilevel Indexes


Dynamic multilevel indexes retain the benefits of multilevel indexing while reducing the index insertion and deletion problems: the index reorganizes itself automatically with small, local changes in the face of insertions and deletions. Some space is left in each block for inserting new entries. Dynamic multilevel indexes are implemented as B-trees and often as B+-trees.
B-tree:
- An indexing field value appears only once, at some level in the tree.
- Every node holds pointers to data.

B+-tree:
- Pointers to data are stored only at the leaf nodes of the tree.
- Leaf nodes have an entry for every indexing field value.
- The leaf nodes are usually linked together to provide ordered access on the indexing field to the records.
- All the leaf nodes of the tree are at the same depth, so retrieval of any record takes the same time.
- In Oracle, the B+-tree appears to be called a B*-tree (see the 3-level B+-tree index figure).
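The B+-tree properties listed above (data pointers only in the leaves, linked leaves, all leaves at the same depth) can be seen in a small search-only sketch. It is a simplification and not Oracle's implementation: the tree is bulk-built from a non-empty sorted list of (key, pointer) entries, `fanout` caps both entries per leaf and children per internal node, and insertion, deletion and node splitting are left out, in line with the scope of these notes.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys            # every indexing field value appears in some leaf
        self.pointers = pointers    # data pointers live only in the leaves
        self.next = None            # link to the next leaf, for ordered access


class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # separators: smallest key of each child after the first
        self.children = children    # len(children) == len(keys) + 1


def build(entries, fanout):
    """Bulk-build a tree bottom-up from sorted, non-empty (key, pointer) entries."""
    leaves = []
    for start in range(0, len(entries), fanout):
        chunk = entries[start:start + fanout]
        leaves.append(Leaf([k for k, _ in chunk], [p for _, p in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    level = [(leaf.keys[0], leaf) for leaf in leaves]   # (smallest key in subtree, node)
    while len(level) > 1:
        parents = []
        for start in range(0, len(level), fanout):
            group = level[start:start + fanout]
            node = Internal([k for k, _ in group[1:]], [n for _, n in group])
            parents.append((group[0][0], node))
        level = parents
    return level[0][1]


def find_leaf(node, key):
    """Follow one child pointer per level down to the leaf that could hold the key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    for stored_key, pointer in zip(leaf.keys, leaf.pointers):
        if stored_key == key:
            return pointer
    return None


def range_scan(root, low, high):
    """Find the leaf for `low`, then walk the leaf links while keys stay <= high."""
    leaf, result = find_leaf(root, low), []
    while leaf is not None:
        for stored_key, pointer in zip(leaf.keys, leaf.pointers):
            if stored_key > high:
                return result
            if stored_key >= low:
                result.append((stored_key, pointer))
        leaf = leaf.next
    return result


ids = [10567, 11589, 15973, 29579, 34596, 75623, 84920, 96256]
tree = build([(key, f"row{n}") for n, key in enumerate(ids)], fanout=3)
print(search(tree, 34596))              # prints: row4
print(range_scan(tree, 15000, 80000))   # four entries, read through the leaf links
```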

Other types of indexes


- Besides tree-based techniques, there are hash-based indexing techniques.
- Hashing can be used not only for file organization, but also for index-structure creation: a hash index organizes the indexing fields, with their associated pointers, into a hash file structure.

[Figure: 3-level B+-tree index]

Files of mixed records:Clusters in Oracle 9i


A cluster is made up of a group of tables that share the same data blocks. These tables are grouped together because they share common columns and are often used together. For example, the EMP and DEPT tables share the DEPTNO column (the cluster key). When you cluster the EMP and DEPT tables (the clustered tables), Oracle physically stores all rows for each department from both the EMP and DEPT tables in the same data blocks.
The cluster key is the column, or group of columns, that the clustered tables have in common.
Advantages:
- Access time improves for joins of clustered tables.
- Each cluster key value is stored only once in the cluster and in the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage might be required to store related table and index data in a cluster than in non-clustered table format. For example, each cluster key (each DEPTNO) is stored just once for the many rows that contain the same value in both the EMP and DEPT tables (see next figure).

A hash cluster : for performance access


Oracle physically stores the rows of a table in a hash cluster and retrieves them according to the results of a hash function. This is a way to improve the performance of data retrieval.

Clusters in Oracle 9i (contd)


Steps
Create the cluster
CREATE CLUSTER emp_dept (deptno NUMBER(3))
    PCTUSED 80
    PCTFREE 5
    SIZE 600
    TABLESPACE users
    STORAGE (INITIAL 200K NEXT 300K MINEXTENTS 2 MAXEXTENTS 20 PCTINCREASE 33);

Creating Clustered Tables


CREATE TABLE dept (
    deptno NUMBER(3) PRIMARY KEY,
    . . . )
    CLUSTER emp_dept (deptno);

CREATE TABLE emp (
    empno NUMBER(5) PRIMARY KEY,
    ename VARCHAR2(15) NOT NULL,
    . . .
    deptno NUMBER(3) REFERENCES dept)
    CLUSTER emp_dept (deptno);

Creating the Cluster Index: a cluster index must be created before any rows can be inserted into any clustered table.

CREATE INDEX emp_dept_index ON CLUSTER emp_dept
    INITRANS 2
    MAXTRANS 5
    TABLESPACE users
    STORAGE (INITIAL 50K NEXT 50K MINEXTENTS 2 MAXEXTENTS 10 PCTINCREASE 33)
    PCTFREE 5;

SQL, Oracle9i and Indexes


SQL-92 does not include statements for index structures, so there is some variation in index-related commands across different DBMSs.

When a table is created, it is desirable to add indexes on certain attributes


Especially the primary key

The existence of indexes can greatly speed query processing


Consider selecting a subset of tuples from a relation based on the value of the key field, or a join like: R ⋈ (R.ATTR1 > S.ATTR2) S

Indexes can be created implicitly by the DBMS at table creation time


E.g. on any attribute designated as a primary key. Oracle automatically creates an index when a UNIQUE or PRIMARY KEY constraint clause is specified in a CREATE TABLE statement.

Indexes may also be created explicitly with SQL DDL commands. Consider the following Oracle statement:
CREATE INDEX emp_ename ON emp(ename);
When you create an index, Oracle fetches and sorts the columns to be indexed, and stores the RowId along with the index value for each row. Then Oracle loads the index from the bottom up. For the statement above, Oracle sorts the EMP table on the ENAME column. It then loads the index with the ENAME and corresponding RowId values in this sorted order. When it uses the index, Oracle does a quick search through the sorted ENAME values and then uses the associated RowId values to locate the rows having the sought ENAME value.

In Oracle you can create more than one index using the same columns, provided that you specify distinctly different combinations of the columns. You cannot create an index that references only one column in a table if another such index on that column already exists.

SQL, Oracle9i and Indexes (contd)


Consider the following Oracle Statements (contd):
CREATE UNIQUE INDEX pkIdx ON Staff(SIN);
Creates an index on the field SIN in the table Staff. The UNIQUE keyword ensures the uniqueness of SIN values in the table (and in the index). This uniqueness is enforced even when the index is added to a table with existing data: if the existing SIN values are not unique, the index creation fails. If the UNIQUE keyword is not used, two rows of the table can have the same value. Nonunique indexes are sorted by the index key and RowId.

A composite index is an index that you create on multiple columns in a table:
CREATE INDEX CInd ON Student(Fname, Lname);
Composite indexes can speed retrieval of data for SELECT statements in which the WHERE clause references all or the leading portion of the columns in the composite index.

DROP INDEX clIdx;
Drops the index clIdx.

Oracle and indexes


Table indexes:

Table indexes store each field value repeatedly, once with each stored RowId. Oracle uses a B*-tree (apparently a B+-tree) as the internal structure of a table index.

Bitmap indexes:
Rather than a B*-tree, bitmap indexes store the RowIds associated with a field value as a bitmap. Each bit in the bitmap corresponds to a possible RowId; if the bit is set, the row with the corresponding RowId contains the field value.
A mapping function converts the bit position to an actual RowId, so the bitmap index provides the same functionality as a regular index even though it uses a different representation internally. Among the advantages of bitmap indexes: they speed up searches on low-cardinality columns, that is, columns in which the number of distinct values is small compared to the number of rows in the table.
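The idea behind a bitmap index can be sketched with Python integers used as bit strings. This is a conceptual illustration, not Oracle's storage format: bit i of a bitmap stands for row i, the row position plays the role of the RowId that a real mapping function would produce, and the sample rows are invented.

```python
def build_bitmap_index(rows, column):
    """One bitmap per distinct value: bit i is set when row i holds that value."""
    bitmaps = {}
    for position, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def row_positions(bitmap):
    """Mapping function: convert the set bits back into row positions."""
    positions, position = [], 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


# Hypothetical rows with two low-cardinality columns.
employees = [
    {"marital_status": "single", "region": "east"},
    {"marital_status": "married", "region": "west"},
    {"marital_status": "single", "region": "west"},
    {"marital_status": "married", "region": "east"},
    {"marital_status": "single", "region": "east"},
]
by_status = build_bitmap_index(employees, "marital_status")
by_region = build_bitmap_index(employees, "region")

# WHERE marital_status = 'single' AND region = 'east' becomes a bitwise AND.
matches = by_status["single"] & by_region["east"]
print(bin(matches))            # prints: 0b10001
print(row_positions(matches))  # prints: [0, 4]
```

Combining conditions with bitwise AND and OR before touching any row is what makes this representation attractive for columns with few distinct values.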

Cluster indexes:
A cluster index is an index defined specifically for a cluster. A cluster index contains an entry for each cluster key value. To locate a row in a cluster, the cluster index is used to find the cluster key value, which points to the data block associated with that cluster key value.

Bitmap index examples:
CREATE BITMAP INDEX Emp_M_S ON Employee(Marital_Status);
CREATE BITMAP INDEX Emp_R ON Employee(Region);

Oracle and indexes (contd)


Function-Based indexes

You can create indexes based on Oracle functions. For example:
CREATE INDEX name_emp ON emp(UPPER(ename));
Such an index can facilitate processing the query:
SELECT * FROM emp WHERE UPPER(ename) = 'ALI';

Index-Organized Table
The entire table is stored within an index structure:
CREATE TABLE employee (ID CHAR(9) PRIMARY KEY, name VARCHAR2(20)) ORGANIZATION INDEX;
Instead of maintaining two separate storage structures for the table and the B*-tree index, the database system maintains only a single B*-tree index. The table's data is sorted by the table's primary key (a primary key is mandatory). Each B*-tree index leaf entry contains <primary_key_value, non_primary_key_column_values> instead of <key, ROWID>.
Advantages:
- Because data rows are stored in the index, index-organized tables provide faster key-based access to table data for queries that involve exact match, range search, or both.
- The storage requirements are reduced because key columns are not duplicated as they are in an ordinary table and its index. Also, no storage for the RowID is needed.

Overview of Oracle9i DB structure and Space management

An Oracle DB has logical and physical structures. This separation allows logical structures to be defined identically across different hardware and operating system platforms.

Logical DB structures represent the components that you can see in an Oracle DB. They consist of:
- Tablespaces: the DB is divided logically into units called tablespaces, which group related logical structures together (for example, all of an application's objects). The SYSTEM tablespace is the minimum tablespace requirement at DB creation. It always contains the data dictionary.
- Blocks: a block is the smallest unit of storage in Oracle.
- Extents: an extent is a grouping of contiguous blocks.
- Segments: a segment is a set of extents allocated for a logical structure (such as a schema object). There are four segment types: data segments (store table or cluster data), index segments (store index data), temporary segments (for temporary work such as sorts), and undo segments (store undo information).
- Schema objects: the logical structures that refer to the DB's data: tables, views, indexes, clusters, etc.


Physical DB structures represent the method of internal storage. They consist of:
- Datafiles: contain all the DB data. An Oracle DB has one or more datafiles. Each datafile is associated with only one tablespace. A tablespace can consist of more than one datafile. When a user wants to read data in a table and the requested information is not in the memory cache of the DB, it is read from the appropriate datafiles and stored in memory. Modified or new data is not necessarily written to a datafile immediately: it is pooled in memory and written to the appropriate datafiles all at once, as determined by the database writer (DBWn) background process.
- Redo log files: record all changes made to data. These files are critical for DB operation and recovery from failure. Two or more redo log files are necessary. A redo log is made of redo entries (i.e. redo records).
- Control files: maintain information about the physical structure of the DB (e.g. the name and location of every datafile and redo log file). Every Oracle DB has at least one control file.
