Beruflich Dokumente
Kultur Dokumente
Algorithms
F O U R T H
E D I T I O N
linear probing
context
ST implementations: summary
worst-case cost
(after N inserts)
average-case cost
(after N random inserts)
implementation
ordered
iteration?
key
interface
search
insert
delete
search hit
insert
delete
sequential search
(unordered list)
N/2
N/2
no
equals()
binary search
(ordered array)
lg N
lg N
N/2
N/2
yes
compareTo()
BST
1.38 lg N
1.38 lg N
yes
compareTo()
red-black BST
2 lg N
2 lg N
2 lg N
1.00 lg N
1.00 lg N
1.00 lg N
yes
compareTo()
Q. Can we do better?
A. Yes, but with different access to the data.
2
1
2
hash("it") = 3
??
Issues.
hash("times") = 3
"it"
4
5
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
Efficiently computable.
Each table index equally likely for each key.
key
Ex 1. Phone numbers.
table
index
x.hashCode()
y.hashCode()
}
public final class Boolean
{
private final boolean value;
...
char
Unicode
'a'
97
'b'
98
'c'
99
...
Ex.
String s = "call";
int code = s.hashCode();
Basic rule. Need to use the whole key to compute hash code;
consult an expert for state-of-the-art hash codes.
11
Modular hashing
Hash code. An int between -231 and 231 - 1.
Hash function. An int between 0 and M - 1 (for use as array index).
typically a prime or power of 2
bug
1-in-a-billion bug
hashCode() of "polygenelubricants" is -231
correct
12
10 11 12 13 14 15
M / 2 tosses.
10 11 12 13 14 15
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
Collisions
Collision. Two distinct keys hashing to same index.
Birthday problem
0
1
hash("it") = 3
2
3
??
hash("times") = 3
"it"
4
5
th
10
11
12
A 8
st[]
E 12
null
X 7
S 0
L 11
P 10
M 9
H 5
3
4
C 4
R 3
18
.125
0
0
10
20
30
Binomial distribution (N = 10 4 , M = 10 3 , ! = 10 )
21
ST implementations: summary
worst-case cost
(after N inserts)
average case
(after N random inserts)
implementation
ordered
iteration?
key
interface
search
insert
delete
search hit
insert
delete
sequential search
(unordered list)
N/2
N/2
no
equals()
binary search
(ordered array)
lg N
lg N
N/2
N/2
yes
compareTo()
BST
1.38 lg N
1.38 lg N
yes
compareTo()
red-black tree
2 lg N
2 lg N
2 lg N
1.00 lg N
1.00 lg N
1.00 lg N
yes
compareTo()
separate chaining
lg N *
lg N *
lg N *
3-5 *
3-5 *
3-5 *
no
equals()
hashCode()
22
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
st[0]
st[1]
jocularly
null
st[2]
listen
st[3]
suburban
null
st[30000]
browsing
25
st[]
M = 16
10
11
12
13
14
15
search K
hash(K) = 5
st[]
M = 16
10
E
K
search miss
(return null)
11
12
13
14
15
st[]
10
11
12
13
14
15
M = 16
28
/* as before */
29
Clustering
Cluster. A contiguous block of items.
Observation. New keys likely to hash into middle of big clusters.
30
displacement = 3
M/8.
31
1+
1
1
search hit
1+
1
(1
)2
Pf.
Parameters.
ST implementations: summary
worst-case cost
(after N inserts)
average case
(after N random inserts)
implementation
ordered
iteration?
key
interface
search
insert
delete
search hit
insert
delete
sequential search
(unordered list)
N/2
N/2
no
equals()
binary search
(ordered array)
lg N
lg N
N/2
N/2
yes
compareTo()
BST
1.38 lg N
1.38 lg N
yes
compareTo()
red-black tree
2 lg N
2 lg N
2 lg N
1.00 lg N
1.00 lg N
1.00 lg N
yes
compareTo()
separate chaining
lg N *
lg N *
lg N *
3-5 *
3-5 *
3-5 *
no
equals()
hashCode()
linear probing
lg N *
lg N *
lg N *
3-5 *
3-5 *
3-5 *
no
equals()
hashCode()
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
int hashCode()
hash = 0;
skip = Math.max(1, length() / 8);
(int i = 0; i < length(); i += skip)
hash = s[i] + (37 * hash);
return hash;
Downside:
http://www.cs.princeton.edu/introcs/13loop/Hello.java
http://www.cs.princeton.edu/introcs/13loop/Hello.class
http://www.cs.princeton.edu/introcs/13loop/Hello.html
http://www.cs.princeton.edu/introcs/12type/index.html
36
Bro server:
key
hashCode()
key
hashCode()
key
hashCode()
"Aa"
2112
"AaAaAaAa"
-540425984
"BBAaAaAa"
-540425984
"BB"
2112
"AaAaAaBB"
-540425984
"BBAaAaBB"
-540425984
"AaAaBBAa"
-540425984
"BBAaBBAa"
-540425984
"AaAaBBBB"
-540425984
"BBAaBBBB"
-540425984
"AaBBAaAa"
-540425984
"BBBBAaAa"
-540425984
"AaBBAaBB"
-540425984
"BBBBAaBB"
-540425984
"AaBBBBAa"
-540425984
"BBBBBBAa"
-540425984
"AaBBBBBB"
-540425984
"BBBBBBBB"
-540425984
38
40
Use linear probing, but skip a variable amount, not just 1 each time.
Effectively eliminates clustering.
Can allow table to become nearly full.
More difficult to implement delete.
Cuckoo hashing. (linear-probing variant)
Hash key to two positions; insert key into either position; if occupied,
reinsert displaced key into its alternative position (and recur).
Simpler to code.
No effective alternative for unordered keys.
Faster for simple keys (a few arithmetic ops versus log N compares).
Better system support in Java for strings (e.g., cached hash code).
Balanced search trees.
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
linear probing
context
Algorithms
Algorithms
F O U R T H
E D I T I O N
linear probing
context