Databases are stored physically as files of records, typically on magnetic disks. This chapter deals with the organization of databases in storage and the techniques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes. The emphasis is on the search process; deletion, update, and insertion issues are not covered.
First-Semester 1427-1428
Storage Medium
Primary Storage
Main memory and the smaller but faster cache memories.
Fast access to data, but limited storage capacity.
Can be operated on directly by the CPU.
Secondary Storage
Magnetic disks, optical disks, and tapes.
Larger capacity and lower cost.
Slower access to data.
Data cannot be processed directly by the CPU.
Magnetic Disks
Secondary storage. Data is transferred between main memory and disk in units of disk blocks: the block is the unit of both data transfer and data allocation.
For a read command: the block from disk is copied into the buffer.
For a write command: the contents of the buffer are copied into the disk block.
Records
Data is usually stored in the form of records. Each record consists of a collection of related data values or items. Records usually describe entities and their attributes. For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as NAME, BIRTHDATE, SALARY. A collection of field names and their corresponding data types constitutes a record type or record format.
C notation:
struct employee {
    char name[30];
    char ssn[9];
    int salary;
    int jobCode;
    char department[20];
};
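A minimal sketch of how such a fixed-length record could be laid out as bytes, using Python's standard struct module. The "<" prefix (no padding, little-endian) and the sample employee are assumptions made for the illustration; a C compiler may insert alignment padding, so the real size of the C struct can differ.

```python
import struct

# Layout mirroring the C struct: name[30], ssn[9], salary, jobCode, department[20].
# "<" disables alignment padding, so the size is 30 + 9 + 4 + 4 + 20 = 67 bytes.
EMPLOYEE = struct.Struct("<30s9sii20s")

def pack_employee(name, ssn, salary, job_code, department):
    """Encode one employee as a fixed-length byte string."""
    return EMPLOYEE.pack(name.encode(), ssn.encode(), salary, job_code,
                         department.encode())

def unpack_employee(raw):
    """Decode a fixed-length record back into Python values."""
    name, ssn, salary, job_code, department = EMPLOYEE.unpack(raw)

    def text(field):
        # Short strings are padded with zero bytes; drop the padding.
        return field.rstrip(b"\x00").decode()

    return text(name), text(ssn), salary, job_code, text(department)

# Hypothetical sample record, not taken from the slides.
record = pack_employee("Smith, John", "123456789", 30000, 5, "Research")
```

Because every record has the same length, record number i of a file starts at byte offset i * EMPLOYEE.size, which is what makes fixed-length records easy to address.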
File
A sequence of records. Usually all records in a file are of the same record type (Fixed-length records)
In general, a block contains one or more records specific to one file only:
Spanned organization: records can cross block boundaries.
Unspanned organization: records cannot cross block boundaries.
Linked Allocation
Each file block contains a pointer to the next file block. Easy to expand but slow to read the whole file.
Combination
Allocates clusters of consecutive disk blocks and the clusters are linked.
Indexed allocation
One or more index blocks contain pointers to the actual file blocks.
Access Method
Provides a group of operations that can be applied to a file:
Open, Find, Delete, Modify, Insert, Close, etc.
It is possible to apply several access methods to a file organization. Some access methods can be applied only to files organized in certain ways:
An indexed access method cannot be applied to a file without an index.
Choose the file organization that efficiently implements the access methods needed by the application.
Sorted Files
An organization that physically orders the records of a file on disk based on the values of one of their fields, called the ordering field. If the ordering field is also a key field of the file, then the field is called the ordering key for the file. Figure 5.9 shows an ordered file with NAME as the ordering key field (assuming that employees have distinct names).
Reading the records in order of the ordering key values becomes extremely efficient, because no sorting is required.
A search condition based on the value of an ordering key field gives faster access when the binary search technique is used.
Ordering does not provide any advantage for random or ordered access to the records based on values of the other, non-ordering fields of the file. In this case, use a linear search for random access.
Binary Search
Algorithm 5.1 Binary search on an ordering key of a disk file
L = 1; U = b; /* b is the number of file blocks */
while (U >= L) do
begin
    I = (L + U) div 2;
    read block I of the file into the buffer;
    if K < (ordering key field value of the first record in block I)
        then U = I - 1
    else if K > (ordering key field value of the last record in block I)
        then L = I + 1
    else if the record with ordering key field value = K is in the buffer
        then goto found
    else goto notFound;
end;
goto notFound;
If b is the number of blocks in a sorted file, a binary search reads about log2(b) blocks on average.
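The block-level binary search of Algorithm 5.1 can be sketched in Python as follows. The in-memory list of blocks stands in for the disk file, and the sample names and payloads are made up for the illustration.

```python
def binary_search_blocks(blocks, key):
    """Return (block_number, record) for `key`, or None if it is absent.

    `blocks` is a list of blocks; each block is a non-empty list of
    (ordering_key, payload) records, and the whole file is sorted by key.
    Block numbers are 1-based to match Algorithm 5.1.
    """
    low, high = 1, len(blocks)
    while high >= low:
        mid = (low + high) // 2
        block = blocks[mid - 1]        # stands in for "read block mid into the buffer"
        if key < block[0][0]:          # smaller than the first key in the block
            high = mid - 1
        elif key > block[-1][0]:       # larger than the last key in the block
            low = mid + 1
        else:
            for record in block:       # search inside the in-memory buffer
                if record[0] == key:
                    return mid, record
            return None                # key falls in this block's range but is absent
    return None

# Hypothetical ordered file: three blocks of two records each.
file_blocks = [
    [("Aaron", 1), ("Adams", 2)],
    [("Baker", 3), ("Clark", 4)],
    [("Evans", 5), ("Wong", 6)],
]
```

Each loop iteration corresponds to one block read, so the number of iterations is what the log2(b) estimate counts.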
Hashing Organization
Provides very fast access to records on certain search conditions. The search condition must be an equality condition on a hash field of the file. In most cases, the hash field is also a key field of the file (hash key)
Hashing
The idea is to provide a function h, called a hash function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored. A search for the record within the block can be carried out in a main memory buffer.
Internal files
Internal Hashing
Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field. Hashing is implemented as a hash table through the use of an array of records. Suppose that the array index range is from 0 to N-1; then we have N slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and N-1. One common hash function is h(K) = K mod N; the resulting value is used as the record address.
[Figure: internal hashing. A key is mapped to one of N record slots (numbered from 0), which hold r records.]
Hashing Function
Key is the student id (six digits).
Assume we have N = 100,000 record slots numbered 00000 to 99999.
h(K) = student_id mod 100000
085768: 085768 mod 100000 = 85768
134281: 134281 mod 100000 = 34281
101004: 101004 mod 100000 = 1004
100000: 100000 mod 100000 = 0
601004: 601004 mod 100000 = 1004 (collision)
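The example values above can be checked with a few lines of Python. The dictionary used to detect the collision is only an illustration aid, not part of the slides.

```python
N = 100_000  # number of record slots, as in the example above

def h(student_id):
    """Hash a six-digit student id to a slot number in 0..N-1."""
    return student_id % N

# 085768 is written as 85768 because Python integers take no leading zero.
ids = [85768, 134281, 101004, 100000, 601004]

slots = {}        # slot number -> id stored there
collisions = []   # (new id, id already occupying the slot)
for sid in ids:
    slot = h(sid)
    if slot in slots:
        collisions.append((sid, slots[slot]))
    else:
        slots[slot] = sid
```

After the loop, collisions holds the single pair (601004, 101004): both ids map to slot 1004.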
Collision
A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. The process of finding another position (after a collision) is called collision resolution.
Methods for collision resolution:
- Open addressing
- Chaining
- Multiple hashing
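Two of the listed methods can be sketched as follows: chaining, and open addressing in its simplest form (linear probing). The class names and the table size of 10 are assumptions for the demo; deletion is left out, in line with the scope of these notes.

```python
class ChainedTable:
    """Chaining: every slot holds a list of the records that hash to it."""

    def __init__(self, n):
        self.n = n
        self.slots = [[] for _ in range(n)]

    def insert(self, key, value):
        self.slots[key % self.n].append((key, value))

    def find(self, key):
        for k, v in self.slots[key % self.n]:
            if k == key:
                return v
        return None


class LinearProbingTable:
    """Open addressing: on a collision, try the following slots in order."""

    def __init__(self, n):
        self.n = n
        self.slots = [None] * n

    def insert(self, key, value):
        for step in range(self.n):
            pos = (key + step) % self.n
            if self.slots[pos] is None or self.slots[pos][0] == key:
                self.slots[pos] = (key, value)
                return pos             # slot where the record ended up
        raise RuntimeError("hash table is full")

    def find(self, key):
        for step in range(self.n):
            pos = (key + step) % self.n
            if self.slots[pos] is None:
                return None            # an empty slot ends the probe sequence
            if self.slots[pos][0] == key:
                return self.slots[pos][1]
        return None


# With 10 slots, ids 101004 and 601004 both hash to slot 4.
chained = ChainedTable(10)
chained.insert(101004, "first")
chained.insert(601004, "second")

probing = LinearProbingTable(10)
first_pos = probing.insert(101004, "first")
second_pos = probing.insert(601004, "second")
```

With chaining both records stay in slot 4; with linear probing the second record moves to the next free slot, 5.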
Hashing for disk files is called external hashing. The target address space is made of buckets, each of which holds multiple records.
A bucket is either one disk block or a cluster of contiguous blocks.
External Hashing
The hashing function maps the value of the indexing (hash) field into a relative bucket number. A table maintained in the file header converts the bucket number into the corresponding disk block address.
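A rough sketch of this two-step mapping, with the disk simulated by a dictionary. The bucket capacity, the block addresses starting at 1000, and the class name HashedFile are all assumptions made for the example; overflow handling is deliberately left out.

```python
RECORDS_PER_BUCKET = 2   # assumed bucket capacity, chosen only for the demo


class HashedFile:
    def __init__(self, num_buckets, first_block_address=1000):
        self.num_buckets = num_buckets
        # File-header table: relative bucket number -> disk block address.
        self.header = {b: first_block_address + b for b in range(num_buckets)}
        # Simulated disk: block address -> records stored in that bucket.
        self.disk = {address: [] for address in self.header.values()}

    def bucket_of(self, key):
        """Hash function: key -> relative bucket number."""
        return key % self.num_buckets

    def insert(self, key, value):
        block = self.disk[self.header[self.bucket_of(key)]]
        if len(block) >= RECORDS_PER_BUCKET:
            raise OverflowError("bucket full; overflow handling is not sketched here")
        block.append((key, value))

    def find(self, key):
        address = self.header[self.bucket_of(key)]   # header lookup
        for k, v in self.disk[address]:               # one block read, then a scan in memory
            if k == key:
                return address, v
        return None


hashed = HashedFile(num_buckets=4)
hashed.insert(10, "x")   # 10 mod 4 = 2 -> block address 1002
hashed.insert(14, "y")   # 14 mod 4 = 2 -> same bucket
```

Two records sharing a bucket is not a problem here, because a bucket holds several records; only a full bucket forces extra work.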
Indexing
Index file (same idea as a textbook index): an auxiliary structure designed to speed up access to desired data.
Indexing field: the field on which the index file is defined.
The index file stores each value of the indexing field along with a pointer: either pointer(s) to the block(s) that contain record(s) with that field value, or a pointer to the record with that field value: <Indexing Field, Pointer>
In Oracle, the pointer is called a RowID, which tells the DBMS where the row (record) is located (by file, block within that file, and row within the block).
To find a record in the data file based on a certain selection criterion on an indexing field, we first access the index file, which then gives access to the record in the data file.
The index file is much smaller than the data file, so searching it is fast.
Indexing is important for file systems and DBMSs:
Databases eventually map data to file structures on disk:
- Records of each relation may be stored in a separate file.
- Records of several different relations can be stored in the same file (i.e. a physically clustered file organization, to minimize I/O).
In DBMSs, the query processor accesses the index structures when processing a query (e.g., an indexed join, also called a single-loop join).
Types of Indexes
Indexes on ordered vs. unordered files
Dense vs. non-dense (i.e. sparse) indexes
- Dense: an entry in the index file for each record of the data file.
- Sparse: only some of the data records are represented in the index, often one index entry per block of the data file.
Index on a single indexing field
Index on multiple indexing fields (i.e. composite index)
If a certain combination of fields is used frequently, define an index on multiple fields.
Primary Index
Physical records may be kept ordered on the primary key.
The index is ordered but has only one entry for each block (non-dense).
Each index entry has the value of the primary key field for the first record (or the last record) in a block and a pointer to that block.
Reduces the index requirements:
- fewer index entries than records in the file
- binary search over the index can be faster (fewer index blocks to read than in the ordered file approach).
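A small sketch of a lookup through such a sparse primary index, assuming each index entry holds the key of the first record in its block. The data values are hypothetical, and Python's bisect module plays the role of the binary search over the index.

```python
from bisect import bisect_right

# Data file: blocks of records ordered on the primary key (hypothetical values).
data_blocks = [
    [(3, "Ali"), (7, "Badr")],
    [(12, "Carl"), (15, "Dana")],
    [(21, "Eman"), (30, "Fadi")],
]

# Primary index: one entry per block, holding the key of its first record.
# The position in this list doubles as the block pointer.
index_keys = [block[0][0] for block in data_blocks]

def lookup(key):
    """Find a record by primary key using the sparse index."""
    pos = bisect_right(index_keys, key) - 1   # last entry whose first key <= key
    if pos < 0:
        return None                            # key is smaller than every first key
    for k, value in data_blocks[pos]:          # read one data block and scan it
        if k == key:
            return pos, value
    return None
```

The index has three entries for six records, which is the saving the non-dense organization is after.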
[Figure: example index over records with department codes BA, BS, CS, ME]
Clustering Index
Records are physically ordered by a non-key field.
Same general structure as an ordered-file index:
<Clustering field, Block pointer>
One entry in the index for each distinct value of the clustering field with a pointer to the first block in the data file that has a record with that value for its clustering field.
Possibly many records for one index entry (non-dense)
Sometimes entire blocks reserved for each distinct clustering field value
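As an illustration, a clustering index over a small file ordered on a department field could be built and used like this. The block size of two records and the record names r1..r8 are assumptions for the sketch.

```python
BLOCK_SIZE = 2  # records per block, assumed for the demo

# Data file ordered on the non-key clustering field (department).
records = [("BA", "r1"), ("BA", "r2"), ("BS", "r3"), ("CS", "r4"),
           ("CS", "r5"), ("CS", "r6"), ("ME", "r7"), ("ME", "r8")]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

def build_clustering_index(blocks):
    """One entry per distinct value, pointing at the first block that contains it."""
    index = {}
    for block_no, block in enumerate(blocks):
        for field_value, _ in block:
            index.setdefault(field_value, block_no)   # keep only the first block
    return index

def find_all(index, blocks, value):
    """Start at the indexed block and read on while blocks still contain the value."""
    start = index.get(value)
    if start is None:
        return []
    result = []
    for block in blocks[start:]:
        matches = [record for record in block if record[0] == value]
        if not matches:
            break          # the file is ordered, so no later block can match
        result.extend(matches)
    return result

clustering_index = build_clustering_index(blocks)
```

Here BS and CS share block 1, so both index entries point to it; the three CS records are then collected from blocks 1 and 2.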
[Figure: clustering index over a data file ordered on the clustering field: BA, BA, BS, CS, CS, CS, ME, ME]
Indexes
There can be several secondary indexes for the same file but only one primary index.
Dense secondary index (on a non-ordering key field): see Figure 6.4.
Several options for a secondary index on a non-key field:
- Option 1: include several index entries with the same value of the indexing field, one for each record (dense index).
- Option 2: more commonly used; keep a single entry for each index value and create an extra level of indirection to handle the multiple pointers. See Figure 6.5.
- Etc.
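The two options for a non-key secondary index can be contrasted in a short sketch. Record positions in a Python list stand in for record pointers, and the sample data is invented.

```python
# Unordered data file: position -> (name, department). Hypothetical data.
data_file = [("Omar", "CS"), ("Lina", "BA"), ("Sami", "CS"),
             ("Huda", "ME"), ("Nora", "BA")]

def build_dense_index(data_file):
    """Option 1: one <field value, record pointer> entry per record, sorted by value."""
    return sorted((dept, pos) for pos, (_, dept) in enumerate(data_file))

def build_indirect_index(data_file):
    """Option 2: one entry per distinct value, pointing to a list of record pointers."""
    pointer_lists = {}
    for pos, (_, dept) in enumerate(data_file):
        pointer_lists.setdefault(dept, []).append(pos)
    return dict(sorted(pointer_lists.items()))

dense = build_dense_index(data_file)
indirect = build_indirect_index(data_file)
```

The dense index has as many entries as there are records (5), while the Option 2 index has one entry per distinct value (3), each leading to its own pointer list.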
Number of first-level index entries, by index type:
Primary: number of blocks in the data file
Clustering: number of distinct index field values
Secondary (key): number of records in the data file
Secondary (nonkey): number of distinct index field values (Option 2)
A multilevel index speeds up record search. Index insertion and deletion remain a problem and may require reorganization of the index: when the data file is modified, the index must be updated.
B+-tree:
- Pointers to data are stored only at the leaf nodes of the tree.
- Leaf nodes have an entry for every indexing field value.
- The leaf nodes are usually linked together to provide ordered access on the indexing field to the records.
- All the leaf nodes of the tree are at the same depth: retrieval of any record takes the same time.
- In Oracle, the B+-tree is called a B*-tree (see next figure).
[Figure: three-level B+-tree index]
Indexes may also be created explicitly with SQL DDL commands. Consider the following Oracle statements:
When you create an index, Oracle fetches and sorts the columns to be indexed, and stores the RowId along with the index value for each row. Then Oracle loads the index from the bottom up.
CREATE INDEX emp_ename ON emp(ename);
Oracle sorts the EMP table on the ENAME column. It then loads the index with the ENAME and corresponding RowId values in this sorted order. When it uses the index, Oracle does a quick search through the sorted ENAME values and then uses the associated RowId values to locate the rows having the sought ENAME value.
In Oracle you can create more than one index using the same columns, provided that you specify distinctly different combinations of the columns.
In Oracle you cannot create an index that references only one column in a table if another such index on the same column already exists.
A composite index is an index that you create on multiple columns in a table:
CREATE INDEX CInd ON Student(Fname, Lname);
Composite indexes can speed retrieval of data for SELECT statements in which the WHERE clause references all or the leading portion of the columns in the composite index.
DROP INDEX clIdx;   -- drops the index clIdx
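Why the leading portion matters can be seen in a toy model of a composite index on (Fname, Lname): entries sorted by both columns support a binary search on Fname, but not on Lname alone. This is only a sketch of the ordering argument with made-up rows; it does not describe how Oracle actually executes such queries.

```python
from bisect import bisect_left

# Hypothetical Student rows: (Fname, Lname, row id).
rows = [("Sara", "Zaid", 0), ("Ali", "Omar", 1), ("Sara", "Amin", 2), ("Ali", "Badr", 3)]

# Composite index entries, kept sorted on (Fname, Lname).
index = sorted(rows)

def by_fname(fname):
    """Leading column: binary search to the first match, then a short ordered scan."""
    start = bisect_left(index, (fname,))   # (fname,) sorts before every (fname, ...) entry
    result = []
    for f, _, row_id in index[start:]:
        if f != fname:
            break
        result.append(row_id)
    return result

def by_lname(lname):
    """Non-leading column alone: the sort order does not help, so every entry is checked."""
    return [row_id for _, l, row_id in index if l == lname]
```

A condition on Fname, or on Fname and Lname together, can use the sort order; a condition on Lname alone falls back to examining every entry in this model.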
A regular index stores each field value repeatedly with each stored RowId. Oracle uses a B*-tree (B+-tree) as the internal structure of a table index.
Bitmap indexes:
Rather than a B*-tree, bitmap indexes store the RowIds associated with a field value as a bitmap. Each bit in the bitmap corresponds to a possible RowId, and if the bit is set, it means that the row with the corresponding RowId contains the field value.
A mapping function converts the bit position to an actual RowId, so the bitmap index provides the same functionality as a regular index even though it uses a different representation internally. Among the advantages of bitmap indexes: they speed up searches on low-cardinality columns, that is, columns in which the number of distinct values is small compared to the number of rows in the table.
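A toy version of the idea, using Python integers as bitmaps and plain row numbers in place of RowIds. The column contents are invented, and the identity mapping from bit position to row number is an assumption of the sketch.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    bitmaps = {}
    for row_number, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_number)
    return bitmaps

def rows_of(bitmap):
    """Mapping function: turn the set bit positions back into row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two low-cardinality columns of a hypothetical five-row table.
gender = ["M", "F", "F", "M", "F"]
dept = ["CS", "CS", "BA", "BA", "CS"]

gender_index = build_bitmap_index(gender)
dept_index = build_bitmap_index(dept)

# Rows with gender = 'F' and dept = 'CS': a bitwise AND of the two bitmaps.
female_cs = rows_of(gender_index["F"] & dept_index["CS"])
```

With few distinct values there are few bitmaps, and conditions on several such columns reduce to bitwise operations, which is one reason this representation suits low-cardinality columns.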
Cluster indexes:
A cluster index is an index defined specifically for a cluster. A cluster index contains an entry for each cluster key value. To locate a row in a cluster, the cluster index is used to find the cluster key value, which points to the data block associated with that cluster key value.
- Index-Organized table
The entire table is stored within an index structure.
CREATE TABLE employee (ID CHAR(9) PRIMARY KEY, name VARCHAR2(20)) ORGANIZATION INDEX;
Instead of maintaining two separate storage structures for the table and the B*-tree index, the database system maintains only a single B*-tree index.
The table's data is sorted by the table's primary key (a primary key is mandatory).
Each B*-tree index leaf entry contains <primary_key_value, non_primary_key_column_values> instead of <key, ROWID>.
Advantages:
- Because data rows are stored in the index, index-organized tables provide faster key-based access to table data for queries that involve exact match or range search, or both.
- The storage requirements are reduced because key columns are not duplicated as they are in an ordinary table and its index. Also, no storage for the RowID is needed.
Index-Organized Table
Such separation (of logical from physical database structures) allows logical structures to be defined identically across different hardware and operating system platforms.
Logical DB structures represent the components that users see in an Oracle DB. They consist of:
- Tablespaces: the DB is divided logically into units called tablespaces, which group together related logical structures, such as all of an application's objects. The SYSTEM tablespace is the minimum tablespace requirement at DB creation. It always contains the data dictionary.
- Blocks: a block is the smallest unit of storage in Oracle.
- Extents: an extent is a grouping of contiguous blocks.
- Segments: a segment is a set of extents allocated for a logical structure (such as a schema object). There are four segment types: data segments (store table or cluster data), index segments (store index data), temporary segments (for temporary work such as sorting), and undo segments (store undo information).
- Schema objects: the logical structures that refer to the DB's data: tables, views, indexes, clusters, etc.
Redo log files: record all changes made to data. These files are critical for DB operation and recovery from failure. Two or more redo log files are necessary. A redo log is made of redo entries (i.e. redo records).
Control files: maintain information about the physical structure of the DB (e.g. the name and location of every data file and redo log file). Every Oracle DB has at least one control file.