Exercise 13.7.6: An MPEG movie uses about one gigabyte per hour of play. If we
carefully organized several movies on a Megatron 747 disk, how many could we
deliver with only a small delay (say, 100 milliseconds) from one disk? Use the
timing estimates of Example 13.2, but remember that you can choose how the
movies are laid out on the disk.
If we lay the movies out so that 418 chunks from different movies fit into
8 tracks, and each movie occupies consecutive cylinders, we can play as many
as 418 movies with an initial delay of at most 100 ms.
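The constraint behind an answer like this can be sketched numerically. In a round-robin schedule, every active stream gets one chunk per cycle, and a chunk must last (at playback rate) at least as long as the whole cycle takes to read. The figures below are illustrative stand-ins, not the Megatron 747 parameters of Example 13.2 (which this excerpt does not restate):

```python
def max_streams(transfer_mb_s, access_ms, chunk_mb, play_mb_s=1024 / 3600):
    """Upper bound on concurrent movie streams under round-robin chunk reads.

    transfer_mb_s: sustained transfer rate in MB/s   -- assumed figure
    access_ms:     seek + rotational latency per chunk in ms -- assumed figure
    chunk_mb:      size of one contiguously laid-out chunk in MB
    play_mb_s:     MPEG playback rate, about 1 GB per hour
    """
    t_read = access_ms / 1000 + chunk_mb / transfer_mb_s  # time to fetch one chunk
    t_play = chunk_mb / play_mb_s                         # how long one chunk lasts
    # every stream must be serviced before any stream's current chunk runs out
    return int(t_play // t_read)

# e.g. 16 MB/s transfer, 10 ms access time, 4 MB chunks:
print(max_streams(16, 10, 4))
```

Larger chunks amortize the access time and raise the bound; the 100 ms initial-delay requirement is what the careful layout buys, since placing the movies' chunks on consecutive cylinders keeps each seek short.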
Exercise 14.1.1: Suppose blocks hold either three records, or ten key-pointer
pairs. As a function of n, the number of records, how many blocks do we need
to hold a data file and:
b) Suppose our inverted index only records for each word all the documents that
have that word. What is the maximum number of blocks we could need to
hold the inverted index?
In the worst case, each word will appear in each document at least once.
The maximum number of blocks is:
10000 / 10 (word-pointer pair blocks) + 10000 * 1000 / 50 (bucket blocks)
= 1,000 + 200,000 = 201,000
c) Suppose our inverted index holds pointers to each occurrence of each word.
How many blocks do we need to hold the inverted index?
10000 / 10 + 20000 * 1000 / 50 = 1,000 + 400,000 = 401,000
d) Repeat (b) if the 400 most common words ( stop words) are not included in
the index.
(10000 - 400) / 10 + (10000 - 400) * 1000 / 50 = 960 + 192,000 = 192,960
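The block counts in (b) and (d) follow one formula, which the sketch below re-derives. The parameters (10,000 words, 1,000 documents, 10 word-pointer pairs or 50 bucket pointers per block) are read off the arithmetic above, and (c)'s 20,000 is taken as two occurrence pointers per word per document; neither is restated in this excerpt, so treat them as assumptions:

```python
from math import ceil

WORDS = 10_000        # distinct words (per the arithmetic above)
DOCS = 1_000          # documents
PAIRS_PER_BLOCK = 10  # word-pointer pairs per block
PTRS_PER_BLOCK = 50   # bucket (occurrence) pointers per block

def index_blocks(words, pointers_per_word):
    """Blocks for the word-pointer pairs plus blocks for the occurrence buckets."""
    return (ceil(words / PAIRS_PER_BLOCK)
            + ceil(words * pointers_per_word / PTRS_PER_BLOCK))

assert index_blocks(WORDS, DOCS) == 201_000        # (b) one pointer per document
assert index_blocks(WORDS, 2 * DOCS) == 401_000    # (c) two occurrences per document
assert index_blocks(WORDS - 400, DOCS) == 192_960  # (d) drop the 400 stop words
```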
e) Repeat (c) if the 400 most common words are not included in the index.
Applying the same substitution as in (d), each remaining word contributes
twice as many occurrence pointers as in (b):
(10000 - 400) / 10 + 2 * (10000 - 400) * 1000 / 50 = 960 + 384,000 = 384,960
a) The data file is a sequential file, sorted on the search key, with 10 records
per block. The B-tree is a dense index.
b) The same as (a), but the data file consists of records in no particular order,
packed 10 to a block.
Same as (a).
Exercise 14.3.3: The material of this section assumes that search keys are
unique. However, only small modifications are needed to allow the techniques to
work for search keys with duplicates. Describe the necessary changes to
insertion, deletion, and lookup algorithms, and suggest the major problems that
arise when there are duplicates in each of the following kinds of hash tables:
(a) simple
If there are many duplicates, there will be many unused buckets, and the
buckets that are used will have long overflow chains.
(b) linear
If there are many duplicates, lookups become expensive: we must read more
than one block and follow several overflow buckets to find all records with
the given key.
(c) extensible
If there are many duplicates, we will split buckets frequently. Worse,
copies of the same key have identical hash values, so no amount of splitting
can separate them; once one key has more duplicates than fit in a block,
overflow blocks become necessary, defeating extensible hashing's usual
one-block-per-lookup guarantee.
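A minimal sketch of case (a): a static ("simple") hash table whose buckets are Python lists standing in for overflow chains. With duplicates allowed, lookup must scan the whole chain and return every match, and deletion must remove every copy. `SimpleHashTable` and its method names are illustrative, not from the text:

```python
class SimpleHashTable:
    """Static hash table; each bucket is a list acting as an overflow chain."""

    def __init__(self, n_buckets=4):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, record):
        # duplicates simply accumulate in the same bucket's chain
        self._bucket(key).append((key, record))

    def lookup(self, key):
        # must scan the entire chain and return ALL matches, not just the first
        return [r for k, r in self._bucket(key) if k == key]

    def delete(self, key):
        # must remove every copy, not stop at the first
        b = self._bucket(key)
        b[:] = [(k, r) for k, r in b if k != key]

t = SimpleHashTable()
for i in range(5):
    t.insert("dup", i)   # many duplicates pile up in one bucket
print(t.lookup("dup"))   # all five records come back
```

The same scan-everything requirement is what hurts the linear and extensible variants: identical keys hash identically, so bucket splits can never spread the duplicates across blocks.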