Chapter 5:
5.21 Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the
advantages and disadvantages of each?
Extendible hashing:
A type of directory
Linear hashing:
Allows a hash file to expand and shrink its number of buckets dynamically without needing a directory.
5.22 What are mixed files used for? What are other types of primary file organizations?
A mixed file is a file which contains records of different record types. This would be used if related records
of different types were clustered (placed together) on disk blocks. For example, the GRADE_REPORT
records of a particular student may be placed following that STUDENT’s record.
Three primary file organizations: unordered (pile), ordered (sorted), and hashed. An alternative listing is pile, sorted, hashed, and indexed (B-tree).
5.27 A parts file with Part# as hash key includes records with the following Part# values: 2369, 3760, 4692,
4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208. The file uses eight buckets,
numbered 0 to 7. Each bucket is one disk block and holds two records. Load these records into the file in
the given order, using the hash function h(K) = K mod 8. Calculate the average number of block accesses
for a random retrieval on Part#.
Part#   h(K) = K mod 8
2369    1
3760    0
4692    4
4871    7
5659    3
1821    5
1074    2
7115    3
1620    4
2428    4 (overflow)
3943    7
4750    6
6975    7 (overflow)
4981    5
9208    0
Bucket  Records      Overflow
0       3760 9208
1       2369
2       1074
3       5659 7115
4       4692 1620    2428
5       1821 4981
6       4750
7       4871 3943    6975
Two records out of 15 are in overflow, which will require an additional block access. The other records
require only one block access.
Average number of block accesses: (1 × 13/15) + (2 × 2/15) = 0.867 + 0.267 = 1.133 block accesses.
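The loading and the average can be checked with a short script. This is a minimal sketch, not part of the original solution: the names load and average_accesses are made up for illustration, and it assumes, as the solution above does, that a record in overflow costs exactly one extra block access.

```python
# Part# values in load order; the tenth value is 2428, as in the worked table.
PART_NUMBERS = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
                1620, 2428, 3943, 4750, 6975, 4981, 9208]
NUM_BUCKETS = 8        # buckets numbered 0 to 7
BUCKET_CAPACITY = 2    # each bucket is one block holding two records


def load(keys):
    """Place each key in bucket K mod 8; spill to overflow when the bucket is full."""
    buckets = {b: [] for b in range(NUM_BUCKETS)}
    overflow = {b: [] for b in range(NUM_BUCKETS)}
    for key in keys:
        b = key % NUM_BUCKETS
        if len(buckets[b]) < BUCKET_CAPACITY:
            buckets[b].append(key)
        else:
            overflow[b].append(key)
    return buckets, overflow


def average_accesses(buckets, overflow):
    """Assumption: 1 access for a record in its home bucket, 2 for one in overflow."""
    in_home = sum(len(v) for v in buckets.values())
    in_overflow = sum(len(v) for v in overflow.values())
    return (1 * in_home + 2 * in_overflow) / (in_home + in_overflow)


buckets, overflow = load(PART_NUMBERS)
avg = average_accesses(buckets, overflow)
print(round(avg, 3))  # 1.133
```

Only buckets 4 and 7 receive a third record, so 13 records are found in one access and 2 records need two.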
5.34 Suppose that we have a hash file of fixed-length records, and suppose that overflow is handled by
chaining. Outline algorithms for insertion, deletion, and modification of a file record. State any
assumptions you make.
Insert:
Read bucket
Indexing field: (page 155 in the textbook) record fields that are used to construct an index. Any field in a
file can be used to create an index and multiple indexes on different fields can be constructed on a file.
Primary key field: (page 157 in the textbook) the ordering key field of the file; it uniquely identifies
each record.
Clustering field: (page 159 in the textbook) If the records of a file are physically ordered on a non-key field
– which does not have a distinct value for each record – that field is called the clustering field.
Secondary key field: (page 162 in the textbook) A secondary index is also an ordered file with two fields
(like a primary index). However, the first field is of the same data type as some non-ordering field of the
data file that is an indexing field. If the secondary access structure uses a key field that has a distinct value
for every record, it is called a secondary key field.
Block anchor: (page 157 in the textbook) The first record in each block of the data file.
Dense index: (page 157 in the textbook) An index that has an index entry for every search key value (and
hence every record) in the data file.
Non-dense (sparse) index: (page 157 in the textbook) An index that has entries for only some of the search
values.
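The difference between a dense and a sparse index can be illustrated with a small sketch. The data in BLOCKS and the helper sparse_lookup are hypothetical and are not taken from the textbook; the sketch assumes a data file sorted on the key, with the sparse index holding one entry per block anchor.

```python
from bisect import bisect_right

# Hypothetical data file: sorted keys stored three records per block.
BLOCKS = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one (key, block number) entry for every record.
dense_index = [(key, block_no)
               for block_no, block in enumerate(BLOCKS)
               for key in block]

# Sparse index: one entry per block, keyed on the block anchor (first record).
sparse_index = [(block[0], block_no) for block_no, block in enumerate(BLOCKS)]


def sparse_lookup(key):
    """Return the block number that could hold key, or None if key precedes all anchors."""
    anchors = [anchor for anchor, _ in sparse_index]
    pos = bisect_right(anchors, key) - 1  # position of the last anchor <= key
    if pos < 0:
        return None
    return sparse_index[pos][1]


print(len(dense_index), len(sparse_index))  # 9 3
print(sparse_lookup(50))                    # 1
```

The sparse index is smaller, but a lookup only identifies the block; the block itself must still be read and searched for the record.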
6.5 What is the order p of a B-tree? Describe the structure of B-tree nodes.
Refer to page 170 in the text.
A search tree of order p is a tree such that each node contains at most p – 1 search values and p pointers in
the order <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq> where q <= p; each Pi value
6.6 What is the order p of a B+-tree? Describe the structure of both internal and leaf nodes of a B+-tree.
Each internal node is of the form <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq>
Each internal node, except the root, has at least ceil(p/2) tree pointers. The root node has at least two tree
pointers if it is an internal node.
Each leaf node is of the form << K1, Pr1>, <K2, Pr2>, …, <Kq-1,Prq-1>,Pnext>
Where q <= p, each Pri is a data pointer, and Pnext points to the next leaf node.
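The two node layouts can be written down as data structures. The following is a rough sketch only: the class names, the integer record ids used as data pointers, and the helper internal_node_ok are assumptions for illustration, and the helper checks just the pointer-count rules stated above.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import List, Optional


@dataclass
class LeafNode:
    """<<K1, Pr1>, ..., <Kq-1, Prq-1>, Pnext>: key/data-pointer pairs plus Pnext."""
    keys: List[int] = field(default_factory=list)
    data_pointers: List[int] = field(default_factory=list)  # hypothetical record ids
    next_leaf: Optional["LeafNode"] = None                   # Pnext


@dataclass
class InternalNode:
    """<P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>: q tree pointers and q - 1 keys."""
    keys: List[int] = field(default_factory=list)
    children: List[object] = field(default_factory=list)    # tree pointers


def internal_node_ok(node, p, is_root=False):
    """Check q <= p, q - 1 keys, and the minimum number of tree pointers."""
    q = len(node.children)
    if len(node.keys) != q - 1 or q > p:
        return False
    minimum = 2 if is_root else ceil(p / 2)
    return q >= minimum


left = LeafNode(keys=[5, 8], data_pointers=[105, 108])
right = LeafNode(keys=[12, 20], data_pointers=[112, 120])
left.next_leaf = right
root = InternalNode(keys=[8], children=[left, right])

print(internal_node_ok(root, p=4, is_root=True))  # True
print(internal_node_ok(root, p=6))                # False
```

With p = 6 a non-root internal node needs at least ceil(6/2) = 3 tree pointers, so the two-pointer node passes only when it is the root.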
6.7 How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an access structure to a
data file?
A B-tree has data pointers in both internal and leaf nodes. A B+-tree has only tree pointers in internal nodes,
and all data pointers are in leaf nodes.
Because entries in the internal nodes of a B+-tree contain only tree pointers and not data pointers, more
entries can be packed into an internal node of a B+-tree, leading to fewer levels and improving search time. In
addition, the entire tree can be traversed in order using the Pnext pointers.
6.14 Consider a disk with block size=512 bytes. A block pointer is P= 6 bytes long, and a record pointer is
R = 7 bytes long. A file has r=30,000 EMPLOYEE records of fixed-length. Each record has the following
fields: NAME (30 bytes), SSN (9 bytes), DEPARTMENTCODE (9 bytes), ADDRESS (40 bytes), PHONE
(9 bytes), BIRTHDATE (8 bytes), SEX (1 byte), JOBCODE (4 bytes), SALARY (4 bytes, real number).
An additional byte is used as a deletion marker.
Suppose the file is ordered by key field SSN and we want to construct a primary index on SSN (talk about
primary key and unique). Calculate:
i. The index blocking factor bfr_i (which is also the index fan-out fo)
Since the third level has only 1 block, it is the top index level. Hence, the index has x = 3 levels.
The total number of blocks for the index is bi = b1 + b2 + b3 = 221 + 7 + 1 = 229 blocks.
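The intermediate steps are not shown above, but the figures can be reproduced from the data in the problem statement. The sketch below assumes an unspanned record organization, one first-level index entry per data block, and index entries made of an SSN value plus a block pointer; the variable names are chosen here for illustration.

```python
from math import ceil

BLOCK_SIZE = 512                            # bytes
BLOCK_POINTER = 6                           # P, bytes
SSN_SIZE = 9                                # bytes
NUM_RECORDS = 30_000                        # r
FIELD_SIZES = [30, 9, 9, 40, 9, 8, 1, 4, 4]  # NAME ... SALARY
DELETION_MARKER = 1

# Data file: record size, blocking factor, number of blocks (unspanned).
record_size = sum(FIELD_SIZES) + DELETION_MARKER   # 115 bytes
bfr = BLOCK_SIZE // record_size                    # 4 records per block
file_blocks = ceil(NUM_RECORDS / bfr)              # 7500 blocks

# Index entry = SSN value + block pointer.
index_entry_size = SSN_SIZE + BLOCK_POINTER        # 15 bytes
bfr_i = BLOCK_SIZE // index_entry_size             # 34 entries per block (fan-out)

# Build levels until one block remains; level 1 has one entry per data block.
levels = []
entries = file_blocks
while True:
    blocks = ceil(entries / bfr_i)
    levels.append(blocks)
    if blocks == 1:
        break
    entries = blocks

print(bfr_i, levels, sum(levels))  # 34 [221, 7, 1] 229
```

The three entries in levels correspond to b1, b2, and b3, and their sum matches the 229 index blocks given above.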