
Exercise 13.6.7: Suppose that we have 4096-byte blocks in which we store records of 100 bytes. The block header consists of an offset table, as in Fig. 13.19, using 2-byte pointers to records within the block. On an average day,
two records per block are inserted, and one record is deleted. A deleted record
must have its pointer replaced by a tombstone, because there may be dangling
pointers to it. For specificity, assume the deletion on any day always occurs
before the insertions. If the block is initially empty, after how many days will
there be no room to insert any more records?

Day 1: the block is empty, so there is no deletion; the two insertions use 2 * (100 + 2) = 204 bytes (100 bytes per record plus a 2-byte offset-table pointer).

Remaining bytes = 4096 − 204 = 3892 bytes.
Each subsequent day: the deletion frees 100 bytes (its 2-byte pointer remains as a tombstone) while the two insertions use 204 bytes, for a net of 2 * (100 + 2) − 100 = 104 bytes.
The block can absorb floor(3892 / 104) = 37 more such days. On day 39, only 144 bytes are free after the deletion, not enough for the day's two insertions.
Answer: 38 days
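As a sanity check, here is a small Python sketch (the constants come from the exercise; the deletion-before-insertion order follows the problem statement) that simulates the daily cycle:

# Simulation of the block filling up; constants come from the exercise.
BLOCK = 4096        # bytes per block
REC, PTR = 100, 2   # record size and offset-table pointer size

used, day = 0, 0
while True:
    if day > 0:     # from day 2 on, one record is deleted first:
        used -= REC # its 100 bytes are freed, but the 2-byte pointer
                    # remains as a tombstone
    if BLOCK - used < 2 * (REC + PTR):
        break       # no room for the day's two insertions
    used += 2 * (REC + PTR)
    day += 1

print(day)          # 38: the last day on which both insertions fit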

Exercise 13.7.6: An MPEG movie uses about one gigabyte per hour of play. If we
carefully organized several movies on a Megatron 747 disk, how many could we
deliver with only a small delay (say, 100 milliseconds) from one disk? Use the
timing estimates of Example 13.2, but remember that you can choose how the
movies are laid out on the disk.

Worst-case time to position at the right block = 17.38 + 8.33 = 25.71 ms
(maximum seek time plus one full rotation).

The remaining time to read data into the main-memory buffer
= 100 − 25.71 = 74.29 ms.
During 74.29 ms, we can read floor(74.29 / 8.33) = 8 full tracks.
To simplify, we do not use the remaining 7.65 ms.
8 tracks contain 8 * 256 sectors/track * 4096 bytes/sector = 8,388,608 bytes.
There are 16 tracks per cylinder, so we read the next 8 tracks in 8.33 * 8 = 66.64 ms.
Then we move to the next cylinder (a 1.00025 ms track-to-track seek) and read its first 8 tracks in
1.00025 + 66.64 = 67.64025 ms.
The average time to read 8 tracks is
(66.64 + 67.64025) / 2 = 67.140125 ms.
Movie play rate: 1 gigabyte per hour ≈ 298.262 bytes per ms.
So we can serve about 8,388,608 / (298.262 * 67.140125) ≈ 418 movies.

If we lay the movies out so that each group of 8 tracks holds one chunk from each of 418 different movies, and each movie's chunks occupy consecutive cylinders, we can deliver as many as 418 movies with an initial delay of at most 100 ms.
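The same calculation as a Python sketch; the disk parameters are those of the Megatron 747 from Example 13.2, as used above:

# Sketch of the calculation above; disk parameters follow Example 13.2.
ROTATION = 8.33            # ms for one full revolution
MAX_SEEK = 17.38           # ms, worst-case seek time
TRACK_TO_TRACK = 1.00025   # ms to move to the adjacent cylinder
BYTES_PER_TRACK = 256 * 4096

worst_position = MAX_SEEK + ROTATION                  # 25.71 ms
tracks = int((100 - worst_position) / ROTATION)       # 8 full tracks
chunk = tracks * BYTES_PER_TRACK                      # 8,388,608 bytes

same_cylinder = tracks * ROTATION                     # 66.64 ms
next_cylinder = TRACK_TO_TRACK + same_cylinder        # 67.64025 ms
avg_read = (same_cylinder + next_cylinder) / 2        # 67.140125 ms

play_rate = 2**30 / 3_600_000                         # ~298.26 bytes per ms
print(int(chunk / (play_rate * avg_read)))            # 418 movies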
Exercise 14.1.1: Suppose blocks hold either three records or ten key-pointer
pairs. As a function of n, the number of records, how many blocks do we need
to hold a data file and:

(a) A dense index

The data file needs n/3 blocks; a dense index holds one key-pointer pair per record, so it needs n/10 blocks:
n/3 + n/10 = 13n/30

(b) A sparse index?

A sparse index holds one key-pointer pair per data block, so it needs (n/3)/10 blocks:
n/3 + (n/3)/10 = 11n/30
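A minimal Python sketch of both counts, using exact ceilings instead of the fractional estimates above (the function name blocks is ours, for illustration):

from math import ceil

def blocks(n, recs_per_block=3, pairs_per_block=10, dense=True):
    # The data file always needs ceil(n/3) blocks; a dense index has
    # one entry per record, a sparse index one entry per data block.
    data = ceil(n / recs_per_block)
    entries = n if dense else data
    return data + ceil(entries / pairs_per_block)

print(blocks(30, dense=True))    # 13, i.e. 13n/30 for n = 30
print(blocks(30, dense=False))   # 11, i.e. 11n/30 for n = 30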

Exercise 14.1.7: Suppose we have a repository of 1000 documents, and we wish to build an inverted index with 10,000 words. A block can hold ten word-pointer pairs or 50 pointers to either a document or a position within a document. The distribution of words is Zipfian (see the box on "The Zipfian Distribution" in Section 16.4.3); the number of occurrences of the ith most frequent word is 100000/√i, for i = 1, 2, ..., 10000.

a) What is the average number of words per document?

The total number of word occurrences is
sum_{i=1}^{10000} 100000/√i ≈ 100000 * 2√10000 = 20,000,000
(approximating the sum by an integral). Spread over 1000 documents, that is an average of about 20,000 words per document.
b) Suppose our inverted index only records for each word all the documents that
have that word. What is the maximum number of blocks we could need to
hold the inverted index?
In the worst case, each word will appear in each document at least once.
The maximum number of blocks is:
10000 / 10 (blocks of word-pointer pairs) + 10000 * 1000 / 50 (blocks of document pointers)
= 1,000 + 200,000 = 201,000 blocks

c) Suppose our inverted index holds pointers to each occurrence of each word.
How many blocks do we need to hold the inverted index?
Using the total of about 20,000,000 occurrences from (a):
10000 / 10 + 20000 * 1000 / 50 = 1,000 + 400,000 = 401,000 blocks

d) Repeat (b) if the 400 most common words ("stop words") are not included in the index.

(10000 − 400) / 10 + (10000 − 400) * 1000 / 50 = 960 + 192,000 = 192,960 blocks

e) Repeat (c) if the 400 most common words are not included in the index.

The remaining words account for
sum_{i=401}^{10000} 100000/√i ≈ 100000 * (2√10000 − 2√400) = 100000 * (200 − 40) = 16,000,000 occurrences.
So, adapting (c):
(10000 − 400) / 10 + 16,000,000 / 50 = 960 + 320,000 = 320,960 blocks
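The approximations in (a) through (e) can be checked numerically; a small Python sketch of the exact Zipfian sums:

from math import sqrt

WORDS, DOCS = 10_000, 1_000

# (a) total occurrences under the Zipfian distribution
total = sum(100_000 / sqrt(i) for i in range(1, WORDS + 1))
print(total / DOCS)        # ~19,854 words/doc, close to the 20,000 estimate

# (b) worst case: every word appears in every document
print(WORDS // 10 + WORDS * DOCS // 50)                  # 201000

# (c) one pointer per occurrence, using the 20,000,000 estimate
print(WORDS // 10 + 20_000_000 // 50)                    # 401000

# (d) and (e): drop the 400 most common words
rest = sum(100_000 / sqrt(i) for i in range(401, WORDS + 1))
print(rest)                # ~15,998,000, close to the 16,000,000 estimate
print((WORDS - 400) // 10 + (WORDS - 400) * DOCS // 50)  # 192960
print((WORDS - 400) // 10 + 16_000_000 // 50)            # 320960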


Exercise 14.2.1: Suppose that blocks can hold either ten records or 99 keys and
100 pointers. Also assume that the average B-tree node is 70% full; i.e., it will
have 69 keys and 70 pointers. We can use B-trees as part of several different
structures. For each structure described below, determine (1) the total number of
blocks needed for a 1,000,000-record file, and (2) the average number of disk
I/Os to retrieve a record given its search key. You may assume nothing is in
memory initially, and the search key is the primary key for the records.

a) The data file is a sequential file, sorted on the search key, with 10 records
per block. The B-tree is a dense index.

For the data file: 1,000,000 / 10 = 100,000 blocks.

For the index, one key-pointer pair per record, built bottom-up:
The leaf level: ceil(1,000,000 / 69) = 14,493 blocks
The next level: ceil(14,493 / 70) = 208 blocks
The third level: ceil(208 / 70) = 3 blocks
The root level: 1 block
For the index: 14,493 + 208 + 3 + 1 = 14,705 blocks
The total number of blocks = 100,000 + 14,705 = 114,705 blocks
A lookup costs 5 disk I/Os (4 to descend the four-level B-tree and 1 to read the data block).

b) The same as (a), but the data file consists of records in no particular order,
packed 10 to a block.
Same as (a): a dense index does not depend on the order of the data file, so both the block counts and the I/O cost are unchanged.
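A short Python sketch of the level-by-level count, assuming 70%-full nodes as stated (69 keys per leaf, fan-out 70):

from math import ceil

RECORDS = 1_000_000
LEAF_KEYS, FANOUT = 69, 70       # 70% of 99 keys and 100 pointers

# Build the dense index bottom-up: one key per record at the leaves,
# then one pointer per child node at each level above.
level = ceil(RECORDS / LEAF_KEYS)
index_blocks, levels = 0, 0
while True:
    index_blocks += level
    levels += 1
    if level == 1:
        break
    level = ceil(level / FANOUT)

data_blocks = RECORDS // 10
print(data_blocks + index_blocks)    # 114705 blocks in total
print(levels + 1)                    # 5 I/Os: 4 index levels + 1 data block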

Exercise 14.3.3: The material of this section assumes that search keys are
unique. However, only small modifications are needed to allow the techniques to
work for search keys with duplicates. Describe the necessary changes to
insertion, deletion, and lookup algorithms, and suggest the major problems that
arise when there are duplicates in each of the following kinds of hash tables:

(a) simple
If there are many duplicates, there will be many unused buckets, while the buckets that are used will have long overflow chains, so lookups degenerate into long scans.

(b) linear
If there are many duplicates, searching becomes expensive, since a lookup may have to read more than one block and follow several overflow buckets.

(c) extensible
If there are many duplicates, we will have frequent splits. Worse, all copies of a key hash to the same bit sequence, so splitting cannot separate them: the directory may keep doubling without shortening the overflow chain.
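To make the duplicate problem concrete, here is a toy Python sketch (the bucket count and block capacity are made-up illustration values) showing how duplicates concentrate in one bucket of a simple static hash table:

from collections import Counter

BUCKETS, BLOCK_CAP = 8, 4   # made-up sizes for illustration

def overflow_blocks(keys):
    # Every copy of a key hashes to the same bucket, so duplicates
    # pile up there no matter how many buckets the table has.
    load = Counter(hash(k) % BUCKETS for k in keys)
    return {b: -(-n // BLOCK_CAP) - 1 for b, n in load.items()}

print(overflow_blocks(range(32)))      # distinct keys: no overflow blocks
print(overflow_blocks(["dup"] * 32))   # one bucket with 7 overflow blocks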
