
File Organization and Storage Structures

o Storage of data


Primary Storage = Main Memory
Fast, volatile, expensive

Secondary Storage = Files on disks or tapes
Slower, non-volatile, cheaper

> Secondary storage is preferred for storing data



Basic Concepts
o Information is stored in data files
o Each file is a sequence of records
o Each record consists of one or more fields

Logical Record Vs Physical Record


o Logical record: one record of the file. Eg. the record of a staff member (SG37).
o Physical record: the unit of transfer between disk and primary storage, i.e. a page or block. Generally, a physical record contains more than one logical record.

Sno   Lname  Position  NIN        Bno
SL21  White  Manager   WK440211B  B5
SG37  Beech  Snr Asst  WL432514C  B3
SG14  Ford   Deputy    WL220658D  B3
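A minimal Python sketch of the distinction, assuming a hypothetical capacity of two records per page (a real DBMS derives the capacity from the page size and record length): each `Staff` object is one logical record, and each inner list returned by the illustrative helper `pack_into_pages` stands for one physical record (page).

```python
from dataclasses import dataclass


@dataclass
class Staff:
    # One logical record, with the fields of the staff table above.
    sno: str
    lname: str
    position: str
    nin: str
    bno: str


def pack_into_pages(records, records_per_page):
    """Group logical records into pages (physical records).

    records_per_page is a hypothetical capacity used only for illustration.
    """
    return [records[i:i + records_per_page]
            for i in range(0, len(records), records_per_page)]


staff = [
    Staff("SL21", "White", "Manager", "WK440211B", "B5"),
    Staff("SG37", "Beech", "Snr Asst", "WL432514C", "B3"),
    Staff("SG14", "Ford", "Deputy", "WL220658D", "B3"),
]

pages = pack_into_pages(staff, records_per_page=2)
for page_no, page in enumerate(pages, start=1):
    print(page_no, [r.sno for r in page])
# 1 ['SL21', 'SG37']
# 2 ['SG14']
```

Reading SG37 from disk therefore transfers the whole first page, including SL21.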


CS3462 Introduction to Database Systems Helena Wong, 2001

Logical Record Vs Physical Record

Sno   Lname  Position   NIN        Bno
SL21  White  Manager    WK440211B  B5
SG37  Beech  Snr Asst   WL432514C  B3
SG14  Ford   Deputy     WL220658D  B3
SA9   Howe   Assistant  WM532187D  B7
SG5   Brand  Manager    WK588932E  B3
SL41  Lee    Assistant  WA290573K  B5

These six logical records are stored three per page, i.e. in two physical records (pages).

File Organization & Access Method

o File Organization means the physical arrangement of data in a file into records and pages on secondary storage.
Eg. ordered files, indexed sequential files, etc.
o Access Method means the steps involved in storing and retrieving records from a file.
Eg. using an indexed access method to retrieve a record from an indexed sequential file.


Heap Files
o Heap files are files of unordered records.
o Quick insertion (no particular ordering): when a new record is created, it is put in the last page of the file if there is sufficient space; otherwise a new page is added to the file.
o Slow retrieval (only linear search is possible): pages are read from the file until the required record is found.
o To delete a record, the record is marked as deleted. The space is reclaimed during periodic reorganization.
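The heap file operations above can be sketched in Python as follows. This is an in-memory illustration only: the class name `HeapFile`, the `page_capacity` parameter and the dictionary record format are assumptions, and pages are plain lists rather than disk blocks.

```python
from operator import itemgetter


class HeapFile:
    """In-memory sketch of a heap file: a list of pages, each holding at
    most page_capacity records in no particular order."""

    def __init__(self, page_capacity):
        self.page_capacity = page_capacity
        self.pages = [[]]

    def insert(self, record):
        # Quick insertion: use the last page, or add a new page if it is full.
        if len(self.pages[-1]) >= self.page_capacity:
            self.pages.append([])
        self.pages[-1].append({"data": record, "deleted": False})

    def _slots(self):
        # Linear scan over every page, in file order.
        for page_no, page in enumerate(self.pages, start=1):
            for slot in page:
                yield page_no, slot

    def find(self, key, key_of):
        # Slow retrieval: read pages until the required record is found.
        for page_no, slot in self._slots():
            if not slot["deleted"] and key_of(slot["data"]) == key:
                return page_no, slot["data"]
        return None

    def delete(self, key, key_of):
        # Deletion only marks the record; its space is not reused yet.
        for _, slot in self._slots():
            if not slot["deleted"] and key_of(slot["data"]) == key:
                slot["deleted"] = True
                return True
        return False

    def reorganize(self):
        # Periodic reorganization: rewrite the file without deleted records.
        live = [slot["data"] for _, slot in self._slots() if not slot["deleted"]]
        self.pages = [[]]
        for record in live:
            self.insert(record)


key_of = itemgetter("sno")
heap = HeapFile(page_capacity=2)
for sno in ["SL21", "SG37", "SG14", "SA9"]:
    heap.insert({"sno": sno})

found = heap.find("SG14", key_of)
print(found)              # (2, {'sno': 'SG14'})
heap.delete("SG37", key_of)
heap.reorganize()
print(len(heap.pages))    # 2
```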

Ordered Files
o Ordered Files: records are sorted on one or more fields => the key
o Allow binary search
Suppose one page stores one record. To search for SG37, first read the middle page (6/2 = 3). SG37 is not on this page (it holds SG14). Since SG37 is greater than SG14, we next read the middle page of the lower half of the file (the pages after page 3), and so on.
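A sketch of this page-by-page binary search in Python, with one record per page as in the example. Assumption: staff numbers are compared by letter prefix and then by their digits as an integer (so SG5 < SG14 < SG37), which is the ordering that places SG14 on page 3; the page contents and the helper names `staff_key` and `binary_search` are hypothetical.

```python
import re


def staff_key(sno):
    # Hypothetical ordering: letter prefix first, then the digits compared
    # as an integer, so "SG5" < "SG14" < "SG37".
    prefix, number = re.fullmatch(r"([A-Z]+)(\d+)", sno).groups()
    return prefix, int(number)


def binary_search(pages, target):
    """Return (page number or None, pages read). One record per page."""
    lo, hi = 1, len(pages)
    pages_read = []
    key = staff_key(target)
    while lo <= hi:
        mid = (lo + hi) // 2
        pages_read.append(mid)
        current = staff_key(pages[mid - 1])
        if key == current:
            return mid, pages_read
        if key > current:
            lo = mid + 1     # continue in the half after the middle page
        else:
            hi = mid - 1     # continue in the half before the middle page
    return None, pages_read


pages = ["SA9", "SG5", "SG14", "SG37", "SL21", "SL41"]
print(binary_search(pages, "SG37"))   # (4, [3, 5, 4])
```

Only three of the six pages are read; in general a binary search reads about log2(number of pages) pages, against up to all of them for a heap file.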


Ordered Files
o Inserting a record: if the appropriate page is full, the whole file may have to be reorganized => time-consuming.
Solution: put new records in a temporary unsorted file (the transaction file) and merge it into the sorted file periodically.
o Rarely used unless combined with an index => Indexed Sequential File
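The transaction-file approach can be sketched in Python with `heapq.merge`, which combines inputs that are already sorted. Assumptions: both files are in-memory lists, the transaction file holds insertions only, and `staff_key` / `merge_transactions` are hypothetical names that order staff numbers by letter prefix and then by their digits as an integer.

```python
import heapq
import re


def staff_key(sno):
    # Hypothetical ordering: letter prefix, then the digits as an integer.
    prefix, number = re.fullmatch(r"([A-Z]+)(\d+)", sno).groups()
    return prefix, int(number)


def merge_transactions(sorted_file, transaction_file, key):
    """Merge the unsorted transaction file into the sorted file.

    Sketch only: both files are in-memory lists and the transaction file
    contains insertions only (no updates or deletions).
    """
    pending = sorted(transaction_file, key=key)    # sort the small file once
    return list(heapq.merge(sorted_file, pending, key=key))


sorted_file = ["SA9", "SG14", "SL21"]
transaction_file = ["SL41", "SG5", "SG37"]        # in order of arrival
print(merge_transactions(sorted_file, transaction_file, key=staff_key))
# ['SA9', 'SG5', 'SG14', 'SG37', 'SL21', 'SL41']
```

Because the merge is a single sequential pass over both files, the cost of reorganizing is paid once per batch instead of once per insertion.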

Direct Files
o Direct files are also called hash files or random files.
o Records need not be written sequentially.
o A hash function calculates the number of the page (bucket) in which a record should be located.
o Eg. the division-remainder method: bucket_no = Record_key mod 3

o By contrast, heap files and ordered files are both called sequential files.
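A sketch of the division-remainder method in Python. Because a key such as SL21 is not a number, this assumes (hypothetically) that the hash is applied to the digits of the staff number; `numeric_part` and `bucket_no` are illustrative names.

```python
import re


def numeric_part(sno):
    # Assumption: the hash is applied to the digits of the staff number.
    return int(re.search(r"\d+", sno).group())


def bucket_no(sno, num_buckets=3):
    # Division-remainder method: bucket_no = key mod number of buckets.
    return numeric_part(sno) % num_buckets


for sno in ["SL21", "SG37", "SG14", "SA9", "SG5", "SL41"]:
    print(sno, "->", bucket_no(sno))
# SL21 -> 0, SG37 -> 1, SG14 -> 2, SA9 -> 0, SG5 -> 2, SL41 -> 2
```

SG14, SG5 and SL41 all map to bucket 2; such collisions are what the collision management schemes deal with.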


Direct Files
o Problem: if a new record SL41 is created, which bucket should it go to?
o Collision management: open addressing, unchained overflow, chained overflow, multiple hashing

Direct Files
Open Addressing
o Upon a collision, the system performs a linear search to find the first available slot.
o When the last bucket has been searched, the search continues from the first bucket.
o SL41 will be inserted into: Bucket 1
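Open addressing can be sketched in Python as below. Assumptions not stated on the slides: bucket_no = digits of the key mod 3, each bucket holds two records, and the other five staff records were inserted first. Under these assumptions SL41 hashes to bucket 2, finds it full, wraps around to bucket 0 (also full) and lands in bucket 1, which agrees with the slide.

```python
import re


def bucket_no(sno, num_buckets):
    # Division-remainder on the digits of the key (assumption: SL41 -> 41).
    return int(re.search(r"\d+", sno).group()) % num_buckets


def insert_open_addressing(buckets, capacity, sno):
    """Insert sno and return the bucket used. On a collision, search the
    following buckets linearly, wrapping from the last bucket to the first."""
    n = len(buckets)
    home = bucket_no(sno, n)
    for step in range(n):
        b = (home + step) % n
        if len(buckets[b]) < capacity:
            buckets[b].append(sno)
            return b
    raise RuntimeError("every bucket is full")


buckets = [[], [], []]          # 3 buckets, 2 records each (assumed)
for sno in ["SL21", "SG37", "SG14", "SA9", "SG5"]:
    insert_open_addressing(buckets, 2, sno)
print(buckets)                  # [['SL21', 'SA9'], ['SG37'], ['SG14', 'SG5']]
print(insert_open_addressing(buckets, 2, "SL41"))   # 1
```

A later search for SL41 must follow the same probe sequence (2, 0, 1), so lookups also degrade as the file fills up.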


Direct Files
Unchained Overflow
o An overflow area is maintained for collisions.
o SL41 will be inserted into: Bucket 3

Direct Files
Chained Overflow
o Each bucket has a synonym pointer.
o Value of the synonym pointer:
  Zero: no collision has occurred
  Non-zero: the number of the overflow bucket used
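Chained overflow can be sketched in Python as follows, under hypothetical assumptions: three primary buckets numbered 0 to 2, two records per bucket, and hashing on the digits of the staff number mod 3. Overflow buckets are numbered from 3 upward, so a synonym pointer of zero can safely mean "no overflow bucket". With unchained overflow, SL41 also goes to the overflow area, but with no synonym pointer a lookup has to scan the overflow area instead of following a chain.

```python
import re


def bucket_no(sno, num_buckets):
    # Division-remainder on the digits of the key (assumption: SL41 -> 41).
    return int(re.search(r"\d+", sno).group()) % num_buckets


class ChainedOverflowFile:
    """Sketch: primary buckets 0..n-1 plus overflow buckets n, n+1, ...
    Each bucket has a synonym pointer; 0 means no overflow bucket is chained."""

    def __init__(self, num_buckets, capacity):
        self.n = num_buckets
        self.capacity = capacity
        self.records = [[] for _ in range(num_buckets)]
        self.synonym = [0] * num_buckets

    def insert(self, sno):
        b = bucket_no(sno, self.n)
        # Follow synonym pointers until a bucket with free space is found.
        while len(self.records[b]) >= self.capacity:
            if self.synonym[b] == 0:
                # Allocate a new overflow bucket and link it into the chain.
                self.records.append([])
                self.synonym.append(0)
                self.synonym[b] = len(self.records) - 1
            b = self.synonym[b]
        self.records[b].append(sno)
        return b

    def find(self, sno):
        b = bucket_no(sno, self.n)
        while True:
            if sno in self.records[b]:
                return b
            if self.synonym[b] == 0:
                return None
            b = self.synonym[b]


f = ChainedOverflowFile(num_buckets=3, capacity=2)
for sno in ["SL21", "SG37", "SG14", "SA9", "SG5"]:
    f.insert(sno)
print(f.insert("SL41"))   # 3
print(f.synonym)          # [0, 0, 3, 0]
print(f.find("SL41"))     # 3
```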


Direct Files
Multiple Hashing
o Upon collision, apply a second hashing function to produce a new hash address in an overflow area.
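A sketch of multiple hashing in Python. The second hashing function used here, (digits // 10) mod overflow_size, is purely hypothetical, as the slide does not specify one. Other assumptions: three primary buckets and two overflow buckets of two records each, with the primary hash being the digits of the staff number mod 3.

```python
import re


def key_number(sno):
    # Assumption: hashing is applied to the digits of the staff number.
    return int(re.search(r"\d+", sno).group())


def primary_hash(sno, num_buckets):
    return key_number(sno) % num_buckets


def secondary_hash(sno, overflow_size):
    # Hypothetical second hashing function, used only after a collision.
    return (key_number(sno) // 10) % overflow_size


def insert_multiple_hashing(primary, overflow, capacity, sno):
    """Return ("primary", i) or ("overflow", j) for the bucket used."""
    i = primary_hash(sno, len(primary))
    if len(primary[i]) < capacity:
        primary[i].append(sno)
        return "primary", i
    j = secondary_hash(sno, len(overflow))
    if len(overflow[j]) < capacity:
        overflow[j].append(sno)
        return "overflow", j
    # A real scheme needs a further rule here (e.g. a third hash function).
    raise RuntimeError("overflow bucket is full as well")


primary = [[], [], []]     # 3 primary buckets, 2 records each (assumed)
overflow = [[], []]        # 2 overflow buckets (assumed)
for sno in ["SL21", "SG37", "SG14", "SA9", "SG5"]:
    insert_multiple_hashing(primary, overflow, 2, sno)
print(insert_multiple_hashing(primary, overflow, 2, "SL41"))   # ('overflow', 0)
```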

Direct Files
Limitation (of Hashing)
Inappropriate for some retrievals:
o based on pattern matching, eg. find all students with ID like 98xxxxxx
o involving ranges of values, eg. find all students from 50100000 to 50199999
o based on a field other than the hash field


Indexes
Index: a data structure that allows particular records in a file to be located more quickly (analogous to the index of a book)

Indexes
TERMINOLOGY
o Data file: a file containing the logical records
o Index file: a file containing the index records
o Indexing field: the field used to order the index records in the index file
o Key: one or more fields that uniquely identify a record (eg. no two students have the same student ID)

An index can be sparse or dense:
o Sparse: contains index records for only some of the search key values (eg. staff IDs CS001, EE001, MA001). Applicable to ordered data files only.
o Dense: contains an index record for every search key value (eg. staff IDs CS001, CS002, ..., CS089, EE001, EE002, ...).
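Sparse and dense indexes can be contrasted with a small Python sketch. The three-page data file is hypothetical (EE050, MA007 and MA030 are invented keys). The sparse index keeps only the first key of each page and uses `bisect_right` to pick the page, which only works because the data file is ordered; the dense index has one entry per key.

```python
from bisect import bisect_right

# Hypothetical ordered data file: 3 pages, sorted by staff ID.
pages = [
    ["CS001", "CS002", "CS089"],
    ["EE001", "EE002", "EE050"],
    ["MA001", "MA007", "MA030"],
]

# Sparse index: one entry per page (its first key).
sparse_index = [page[0] for page in pages]     # ['CS001', 'EE001', 'MA001']

# Dense index: one entry for every search key value.
dense_index = {key: page_no for page_no, page in enumerate(pages) for key in page}


def lookup_sparse(key):
    # The key can only be on the last page whose first key is <= key.
    page_no = bisect_right(sparse_index, key) - 1
    if page_no < 0:
        return None
    return page_no if key in pages[page_no] else None   # scan within the page


def lookup_dense(key):
    return dense_index.get(key)


print(lookup_sparse("EE002"), lookup_dense("EE002"))   # 1 1
print(len(sparse_index), len(dense_index))             # 3 9
```

The sparse index is smaller (3 entries against 9) but needs a scan inside the chosen page; the dense index answers directly.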


Indexes
TYPES OF INDEXES
o Primary index: an index ordered in the same way as the data file, which is sequentially ordered according to a key. (The indexing field is this key.)
o Secondary index: an index defined on a non-ordering field of the data file. (The indexing field need not contain unique values.)
> A data file can have at most one primary index plus several secondary indexes.
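A Python sketch contrasting the two index types on the staff records (Sno, Bno) listed earlier. Both indexes here simply map key values to record positions in a list; keeping the data file ordered on Sno (by letter prefix, then number) is an assumption made for illustration.

```python
from collections import defaultdict

# Staff records (Sno, Bno); the data file is ordered on Sno (assumed).
staff = [("SA9", "B7"), ("SG5", "B3"), ("SG14", "B3"),
         ("SG37", "B3"), ("SL21", "B5"), ("SL41", "B5")]

# Primary index on the ordering key Sno: each value appears exactly once.
primary_index = {sno: pos for pos, (sno, _) in enumerate(staff)}

# Secondary index on the non-ordering field Bno: one value, many records.
secondary_index = defaultdict(list)
for pos, (_, bno) in enumerate(staff):
    secondary_index[bno].append(pos)

print(primary_index["SG37"])                          # 3
print([staff[p][0] for p in secondary_index["B3"]])   # ['SG5', 'SG14', 'SG37']
```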

Indexed Sequential Files


What are Indexed Sequential Files?
= A sorted data file with a primary index
Advantage of an Indexed Sequential File
Allows both sequential processing and individual record retrieval through the index.
Structure of an Indexed Sequential File
o A primary storage area
o A separate index or indexes
o An overflow area


B+-Trees
In a B+-tree, data or indexes are stored in a hierarchy of nodes.

B+-Trees
o B => Balanced
o Consistent access time (every access searches the same number of nodes)
TERMINOLOGY
Degree (Order): the maximum number of children allowed per parent.
Depth: the maximum number of levels between the root node and a leaf node in the tree.

[Figure: a B+-tree as a hierarchy of nodes; the leaf nodes point to data]

B+-Trees
In practice, each node in the tree is actually a page, so it can store many pointers and keys. Eg. for a page size of 4KB, the B+-tree can be of order 512. Access time depends more on depth than on breadth (each level visited costs one page access) => shallow trees are preferred.
RULES
o The root (if not a leaf node) must have at least 2 children.
o For a tree of order n, each node (except the root and leaves) must have between n/2 and n pointers and children. If n/2 is not an integer, the result is rounded up.
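The sizing rules can be checked with a little arithmetic. An order of 512 for a 4KB page corresponds to about 8 bytes per (key, pointer) entry (4096 / 512 = 8); that entry size is an assumption, as are the helper names. The last function gives an upper bound on how many keys a tree with a given number of levels can hold.

```python
import math


def order_for_page(page_size, entry_size):
    # Hypothetical sizing: every (key, pointer) entry occupies entry_size bytes.
    return page_size // entry_size


def internal_pointer_bounds(n):
    # Node other than root or leaf: between ceil(n/2) and n pointers.
    return math.ceil(n / 2), n


def max_keys(n, levels):
    # Upper bound for a tree with `levels` levels (root and leaves included):
    # at most n ** (levels - 1) leaves, each with at most n - 1 keys.
    return n ** (levels - 1) * (n - 1)


n = order_for_page(4096, 8)
print(n)                            # 512
print(internal_pointer_bounds(n))   # (256, 512)
print(internal_pointer_bounds(5))   # (3, 5)
for levels in (1, 2, 3):
    print(levels, max_keys(n, levels))
# 1 511
# 2 261632
# 3 133955584
```

Under these assumptions, three levels of 4KB pages can hold over 130 million keys, so a lookup reads only three pages, which is why shallow, wide trees are preferred.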

B+-Trees
RULES (Contd):
o For a tree of order n, the number of key values in a leaf node must be between (n-1)/2 and n-1. If (n-1)/2 is not an integer, the result is rounded up.
o The number of key values in a non-leaf node is 1 less than the number of pointers.
o The tree must always be balanced: every path from the root node to a leaf must have the same length.
o Leaf nodes are linked in order of key values.
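A minimal search sketch in Python over a hand-built B+-tree of order 4 with hypothetical integer keys. It checks the leaf occupancy rule, descends from the root to a leaf using the separator keys (keys equal to a separator go right, a convention assumed here), and answers a range query by following the leaf links. Insertion and node splitting are not shown.

```python
import math
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted key values; data pointers omitted
        self.next = None      # link to the next leaf in key order


class Internal:
    def __init__(self, keys, children):
        # A non-leaf node has one key fewer than it has child pointers.
        assert len(keys) == len(children) - 1
        self.keys = keys
        self.children = children


def leaf_key_bounds(n):
    # Leaf of a tree of order n: between ceil((n-1)/2) and n-1 key values.
    return math.ceil((n - 1) / 2), n - 1


def find_leaf(node, key):
    # Descend: follow the child to the right of every separator <= key.
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    # Locate the first leaf once, then walk the leaf links in key order.
    leaf, result = find_leaf(root, low), []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


# Hypothetical tree of order 4: the root has 3 children, leaves hold 2..3 keys.
leaves = [Leaf([5, 9]), Leaf([14, 21]), Leaf([37, 41, 50])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Internal([14, 37], leaves)

print(leaf_key_bounds(4))                             # (2, 3)
print(all(2 <= len(lf.keys) <= 3 for lf in leaves))   # True
print(find_leaf(root, 21).keys)                       # [14, 21]
print(range_scan(root, 9, 40))                        # [9, 14, 21, 37]
```

The range query touches the root once and then only the leaves it needs, which is the benefit of linking leaves in key order.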


B+-Trees
Balancing can be costly to maintain: inserting into a full node splits it, and a split can propagate up to the root.

B+-Trees
[Figure: B+-tree example; panels: Adding SG14, Adding SA9]


B+-Trees
[Figure: B+-tree example (contd); panel: Adding SA9]

Summary
o Basic concepts (Files, Records, Fields)
o Primary storage vs secondary storage
o Logical record vs physical record
o File Organization (and access methods)
  Heap files
  Ordered Files (Binary Search)
  Direct Files (Hashing)
  Indexes
  Indexed Sequential Files
  B+-Trees
