Massachusetts Institute of Technology Handout 5
6.854J/18.415J: Advanced Algorithms Thursday, September 22, 2005
David Karger

Problem Set 3

Due: Wednesday, September 28, 2005.

Notice that one problem is marked noncollaborative. As you might expect, this problem should be done without any collaboration.

Problem 1. Augment the van Emde Boas priority queue to support the following operations on integers in the range {0, 1, 2, . . . , u − 1} in O(log log u) worst-case time each and O(u) space total:

Find (x, Q): Report whether the element x is stored in the structure.
Predecessor (x, Q): Return x’s predecessor, the element of largest value less than x, or
null if x is the minimum element.
Successor (x, Q): Return x’s successor, the element of smallest value greater than x, or
null if x is the maximum element.

NONCOLLABORATIVE Problem 2. In class we saw how to use a van Emde Boas priority queue to get O(log log u) time per queue operation (insert, delete-min, decrease-key) when the range of values is {1, 2, . . . , u}. Show that for the single-source shortest paths problem on a graph with n nodes and range of edge lengths {1, 2, . . . , C}, we can obtain O(log log C) time per queue operation, even though the range of values in the queue is {1, 2, . . . , nC}.

Problem 3. In class we considered building a depth-k, base-Δ (implicit) trie over integers in the range from 1 to C (where Δ = C^{1/k}) that supported insert in time O(k) and delete-min in time O(Δ). By choosing k and Δ appropriately we found shortest paths in O(m + n log C) time. We now improve this bound. Consider modifying the delete-min operation, where we scan forward through a trie node and reach a new bucket of items. If that bucket has more than t items in it, we expand it to multiple buckets in a node at the next trie level down as before. But if there are fewer than t items, we simply store them in a heap. During inserts or decrease-keys, new items may be added to the heap, and if the heap size grows beyond t, we expand it to a trie node of buckets as before.

(a) Let I(t), D(t), X(t) denote the times to insert, decrease-key, and extract-min in a heap of size at most t. Prove that the amortized times for operations in the new data structure can be bounded by

• O(kΔ/t + I(t)) for insert
• O(D(t) + I(t)) for decrease-key
• O(X(t)) for extract-min

Hint: When an item is inserted, give it kΔ/t units of potential energy. Each time the item gets pushed down into a new trie node, have it donate Δ/t of its potential energy to that node. Argue that this is a valid analysis, and that the potential energy at nodes is sufficient to pay for scanning trie nodes during an extract-min.
(b) Argue that using Fibonacci heaps and setting k = √log C and t = 2^k gives a running time of O(m + n·√log C) for shortest paths.

Problem 4. Perfect hashing is nice, but does have the drawback that the perfect hash function has a lengthy description (since you have to describe the second-level hash function for each bucket). Consider the following alternative approach to producing a perfect hash function with a small description. Define bi-bucket hashing, or bashing, as follows. Given n items, allocate two arrays of size n^1.5. When inserting an item, map it to one bucket in each array, and place it in the emptier of the two buckets.
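
To make the insertion rule concrete, here is a minimal Python sketch; the salted built-in hash stands in for the two (ideally truly random) bucket maps, and all names are ours, not the problem's.

import random

class Bash:
    """Bi-bucket hashing: two arrays of size n^1.5; each item gets one
    candidate bucket per array and lands in the emptier of the two."""

    def __init__(self, n):
        self.m = int(n ** 1.5)
        self.arrays = ([[] for _ in range(self.m)],
                       [[] for _ in range(self.m)])
        # Random salts make the two bucket maps independent stand-ins
        # for truly random functions.
        self.salts = (random.getrandbits(64), random.getrandbits(64))

    def _bucket(self, which, item):
        return hash((self.salts[which], item)) % self.m

    def insert(self, item):
        b0 = self.arrays[0][self._bucket(0, item)]
        b1 = self.arrays[1][self._bucket(1, item)]
        # Place the item in the emptier of its two candidate buckets.
        (b0 if len(b0) <= len(b1) else b1).append(item)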

(a) Suppose a random function is used to map each item to buckets. Give a good upper bound on the expected number of collisions. Hint: What is the probability that the k-th inserted item collides with some previously inserted item?

(b) Argue that bashing can be implemented efficiently, with the same expected outcome, using the ideas from 2-universal hashing.

(c) Conclude an algorithm with linear expected time (ignoring array initialization) for identifying a perfect bash function for a set of n items. How large is the description of the resulting function?

OPTIONAL (d) Generalize the above approach to use less space by exploiting tri-bucket hashing (trashing), quad-bucket hashing (quashing), and so on.

OPTIONAL Problem 5. Our bucketing data structures (and in particular van Emde Boas queues) use arrays, and we never worried about the time taken to initialize them. Devise a way to avoid initializing large arrays. More specifically, develop a data structure that holds n items according to an index i ∈ {1, . . . , n} and supports the following operations in O(1) time (worst case) per operation:

init() initializes the data structure to empty.

set(i, x) places item x at index i in the data structure.

get(i) returns the item stored at index i, or “empty” if nothing is there.

Your data structure should use O(n) space and should work regardless of what garbage
values are stored in that space at the beginning of the execution. Hint: use extra space to
remember which entries of the array have been initialized.
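
The hint admits the following standard realization, sketched here in Python with names of our choosing. In a language with truly uninitialized memory (say C), the three arrays would simply be allocated and left full of garbage; the Python lists below are pre-sized only so that the indexing is legal.

class LazyArray:
    """O(1) init/set/get without initializing the payload arrays: entry i
    is valid only if pos[i] points into the used prefix of stack and
    stack[pos[i]] == i, so garbage in pos/stack is harmless."""

    def __init__(self, n):
        self.data = [None] * n   # may hold garbage in the C version
        self.pos = [0] * n       # may hold garbage in the C version
        self.stack = [0] * n     # records indices in initialization order
        self.count = 0           # number of initialized entries; O(1) "init"

    def _initialized(self, i):
        p = self.pos[i]
        return 0 <= p < self.count and self.stack[p] == i

    def set(self, i, x):
        if not self._initialized(i):
            self.pos[i] = self.count
            self.stack[self.count] = i
            self.count += 1
        self.data[i] = x

    def get(self, i):
        return self.data[i] if self._initialized(i) else None  # None = "empty"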

OPTIONAL Problem 6. Can a van Emde Boas type data structure be combined with
some ideas from Fibonacci heaps to support insert/decrease­key in O(1) time and delete­min
in O(log log u) time?
Massachusetts Institute of Technology Handout 8
6.854J/18.415J: Advanced Algorithms Wednesday, September 28, 2005
David Karger

Problem Set 3 Solutions


Problem 1. We augment the vEB queue to also hold a maximum element. We implement
the desired operations as follows:

• find(x,Q): We check whether x is the minimum or the maximum of the current queue. If so, we report that it is present. Otherwise, we recursively search for low(x) in the subqueue Q[high(x)].

• predecessor(x,Q): If x is at most the minimum of Q, return null. If x is greater than the maximum of Q, return the maximum of Q. Otherwise, we make a recursive call to find the predecessor of low(x) in the subqueue Q[high(x)]. If the result of this recursive call is non-null, then we return it. Otherwise, we make a call to find the predecessor of high(x) in Q.summary. The result of this call identifies the nearest non-empty subqueue preceding x’s; if it is non-null, we return the maximum element of that subqueue. However, if the result of the call was null, then we can return the minimum of Q.
• successor(x,Q): The algorithm is very similar to predecessor.
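
For concreteness, the predecessor logic above can be sketched as follows in Python; the node fields min, max, summary, clusters and the coordinate helpers reflect one plausible representation and are our assumptions, not the handout's.

def high(q, x): return x // q.cluster_size      # which subqueue x falls in
def low(q, x): return x % q.cluster_size        # x's offset inside it
def index(q, hi, lo): return hi * q.cluster_size + lo

def predecessor(q, x):
    """Predecessor of x in vEB queue q, or None for null."""
    if q is None or q.min is None or x <= q.min:
        return None                       # x is at most the minimum
    if x > q.max:
        return q.max
    hi, lo = high(q, x), low(q, x)
    sub = q.clusters.get(hi)
    if sub is not None and sub.min is not None and lo > sub.min:
        return index(q, hi, predecessor(sub, lo))  # answer is in x's subqueue
    # Otherwise ask the summary for the nearest non-empty earlier subqueue.
    h = predecessor(q.summary, hi)
    if h is not None:
        return index(q, h, q.clusters[h].max)
    return q.min                          # only the stored minimum precedes x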

Problem 2. Let us first recall Dijkstra’s algorithm. Given a weighted graph G = (V, E) and a node s ∈ V, it finds the length of the shortest path from s to every node u ∈ V. The algorithm works by maintaining a set S of nodes for which the shortest-path distance from s is already known. During each iteration, the algorithm chooses the node u ∈ V \ S whose estimated distance from s is minimum. Then, for each node w adjacent to u, the algorithm sets the estimated distance from s to w to the minimum of its current estimated distance and the distance from s to u plus the length of the edge (u, w).

The key property to note in the above algorithm is that the maximum difference between the estimated distances of any two nodes in the priority queue holding the nodes in V \ S is C. This can be proven by induction on the number of iterations, i.e., on the number of nodes in S.
The solution, then, is to use two vEB queues, each with range of values [1, C]. The “left” queue stores the smaller values, and the “right” queue stores the larger values. When we want to insert an estimated distance into the queue, we insert only the distance mod C. Along with each queue we store the “base” value of that queue, which is a multiple of C. By the key property above, we never have to worry about filling a third queue, since the difference between any two values in the two queues is at most C.
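
A sketch of this wrapper in Python, assuming some vEB implementation over {0, . . . , C − 1} supplied by a factory make_veb with insert, delete_min, and empty methods (all names ours):

class MonotoneQueue:
    """Two range-C queues emulate range n*C for Dijkstra, exploiting the
    fact that all stored keys lie within C of the current minimum."""

    def __init__(self, C, make_veb):
        self.C = C
        self.base = 0                     # left queue covers [base, base + C)
        self.left = make_veb()            # smaller values
        self.right = make_veb()           # larger values, [base + C, base + 2C)

    def insert(self, v):
        if v < self.base + self.C:
            self.left.insert(v - self.base)            # equals v mod C
        else:
            self.right.insert(v - self.base - self.C)  # equals v mod C

    def delete_min(self):
        if self.left.empty():             # left exhausted: right takes over
            self.left, self.right = self.right, self.left
            self.base += self.C
        return self.base + self.left.delete_min()

Decrease-key is handled the same way: delete the old key from whichever queue holds it and re-insert the new one, which by the key property always lands in one of the two queues.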

Problem 3.

(a) We describe how to perform each operation and how to amortize its cost.

Insert: We associate with each inserted item a potential of k + kΔ/t. The additive term of k will be spent on the constant-time operations performed each time the item goes one level down, and the term of kΔ/t will reimburse the cost of scanning nodes. A new node is created when the number of elements in a heap exceeds t. Each of these elements donates Δ/t units of energy and goes one level down, so in total we collect Δ units of energy, which we assign to the newly created node as a reimbursement for scanning it. While performing an insert of x, we descend down the trie, charging x’s potential, until we meet either a blob, into which we insert x in constant time, or a heap, into which we insert x in O(I(t)) time, or which is expanded down, with the heap’s elements paying for the expansion. The amortized cost of the operation is thus O(k + kΔ/t + I(t)).

Decrease-key: We remove the element x from whichever structure currently holds it. That structure is either a heap or a blob, and the removal costs at most O(D(t)) time. Next, in constant time, we determine the new bucket in which x belongs, and follow downward if that bucket has already been expanded (this is paid for by x’s potential). Eventually we determine the bucket or blob into which x must be inserted, which we can do in O(I(t)) time. Possible expansions are paid for by the potential of the heap’s elements. The amortized cost of a decrease-key operation is O(D(t) + I(t)).

Extract-min: Starting from the last entry from which we extracted a minimum, we scan forward for the first non-empty entry, possibly descending a level if the starting entry has meanwhile been expanded. In total we scan each entry once, and the scanning is paid for by the potential associated with nodes. We eventually find a heap or a blob; if the latter, we convert it into a heap, and if an expansion occurs, we continue scanning. Finally we reach a non-empty heap, from which we remove the minimum in O(X(t)) time. Everything except this last removal is charged to the potential, so the amortized time cost is only O(X(t)).
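
The whole argument can be condensed into one potential function (our notation, with depth(x) denoting the trie level currently holding item x):

Φ = Σ_items x (k − depth(x)) · (1 + Δ/t) + Δ · #{expanded trie nodes not yet fully scanned}.

An insert raises Φ by k(1 + Δ/t); every level an item descends and every node expansion is paid out of Φ, so an extract-min pays real cost only for the final heap removal, giving the three amortized bounds above.
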
(b) We perform at most n inserts, n extract-mins, and m decrease-keys. This means that the running time R can be bounded by

R = O( n(k + kΔ/t + I(t)) + n·X(t) + m·(D(t) + I(t)) ).

Since in the case of Fibonacci heaps I(t) = O(1), D(t) = O(1), and X(t) = O(log t), we can simplify our upper bound to

R = O( m + n(k + kΔ/t + log t) ).

We first substitute Δ = C^{1/k} and t = 2^k, achieving

R = O( m + n(k + k·C^{1/k}/2^k) ).

We transform the last addend, knowing that k = √log C (so that C^{1/k} = 2^{(log C)/k} = 2^{√log C}):

k·C^{1/k}/2^k = (√log C · 2^{√log C}) / 2^{√log C} = √log C.

Eventually, we achieve

R = O(m + n·√log C).

Problem 4. (a) Consider the (k + 1)st item inserted. Since at most k buckets are occupied, the probability that both of its candidate locations are occupied is at most (k/n^1.5)^2. Thus, the expected number of times an item is actually inserted into an already-occupied bucket is at most

Σ_{k=0}^{n−1} (k/n^1.5)^2 = (n − 1)(n)(2n − 1) / (6n^3) ≤ 1/3.

Now let’s consider pairwise collisions. Item k collides with item j < k only if (i) one of the candidate locations of item k is the same as the location of item j (this has probability at most 2/n^1.5) and (ii) the other candidate location for item k contains at least one element (probability at most k/n^1.5). Thus, the probability that k collides with j is at most 2k/n^3. Summing over the k possible values of j < k, we find the expected number of collisions for item k is at most 2k^2/n^3. Summing over all k, we get the same result as above: O(1) expected collisions.
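
As a sanity check on this bound, one can simulate bashing with truly random choices and count insertions into already-occupied buckets; this Python sketch (structure and names ours) mirrors the quantity bounded above.

import random

def avg_collisions(n, trials=100):
    """Estimate the expected number of insertions that land in an
    already-occupied bucket when n items are bashed into two arrays
    of size n^1.5 with uniformly random candidate buckets."""
    m = int(n ** 1.5)
    total = 0
    for _ in range(trials):
        occ = ([0] * m, [0] * m)
        for _ in range(n):
            b = (random.randrange(m), random.randrange(m))
            side = 0 if occ[0][b[0]] <= occ[1][b[1]] else 1  # emptier bucket
            total += occ[side][b[side]] > 0   # count a collision
            occ[side][b[side]] += 1
    return total / trials

# avg_collisions(1000) should stay below the 1/3 bound derived above.
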
(b) Start with a 2-universal family of hash functions mapping n items to 2n^1.5 locations. Consider any particular set of n items, and consider choosing a random function from the hash family. The probability that item k collides with item j is 1/(2n^1.5) by pairwise independence, implying by the union bound that the probability that k collides with any item is at most 1/(2√n).

Now suppose that we allocate two arrays of size 2n^1.5 and choose a random 2-universal hash function from the family independently for each array. If an item has no collision in either array, then it will be placed in an empty bucket by the bash function. We need merely analyze the probability that this happens for every item (this would make the bash function perfect).

The probability that item k has a collision in both arrays is at most (1/(2√n))^2 = 1/(4n). It follows that the expected number of items colliding with some other item is at most 1/4. This implies in turn that with probability at least 3/4, every item is placed in an empty bucket by the (perfect) bash function. This in turn implies that some pair of 2-universal hash functions defines a perfect bash for our set of n items.
Since every set of items gets a perfect bash from this scheme, it follows that the family of pairs of 2-universal functions above is a perfect bash family. Since the 2-universal family has size polynomial in the size of the universe, so does the family of pairs of 2-universal functions.
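
For a concrete instantiation of part (b), one could draw each array’s function from the standard Carter-Wegman family h(x) = ((ax + b) mod p) mod m; the prime and parameter choices below are ours, for illustration only.

import random

def make_2universal(m, p=(1 << 89) - 1):
    """Sample h(x) = ((a*x + b) mod p) mod m from a 2-universal family.
    p is a prime (here the Mersenne prime 2^89 - 1) exceeding the
    universe size, so 64-bit keys are covered."""
    a = random.randrange(1, p)
    b = random.randrange(p)
    return lambda x: ((a * x + b) % p) % m

# A candidate bash function is a pair of such functions, one per array:
# h0, h1 = make_2universal(m), make_2universal(m), with m = 2 * int(n ** 1.5).
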
(c) If we map our n items to k candidate locations in arrays of size n^{1+1/k}, our collision odds work out as above and we get a constant number of collisions. Similarly, k random 2-universal hash functions, each mapping to a set of size n^{1+1/k}, have a constant probability of being perfect for any particular set of items, so the set of all such k-tuples provides a perfect family (of polynomial size for any constant k). This gives a tradeoff of k probes for perfect hashing in space O(n^{1+1/k}).
Note that while we can achieve perfect hashing in O(n) space, the resulting family does not have polynomial size (since a different, subsidiary hash function must be chosen for each sub-hash-table).
