Beruflich Dokumente
Kultur Dokumente
4/1/2003 9:02 AM
www.jntuworld.com
Tries
e
mize
mi
nimize
ze
nimize
4/1/2003 9:02 AM
nimize
ze
ze
Tries
Preprocessing Strings
After preprocessing the pattern, KMPs algorithm performs
pattern matching in time proportional to the text size
Tries
4/1/2003 9:02 AM
s
u
l
e
y
l
l
Tries
e
y
4/1/2003 9:02 AM
Tries
We insert the
words of the
text into a
trie
Each leaf
stores the
occurrences
of the
associated
word in the
text e
The standard trie for a set of strings S is an ordered tree such that:
4/1/2003 9:02 AM
Tries
4/1/2003 9:02 AM
o
c
k
p
5
78
4/1/2003 9:02 AM
s e e
b e a r ?
s e l
s t o c k !
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
s e e
b u l
l ?
b u y
s t o c k !
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
b i d
s t o c k !
b i d
s t o c k !
47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
h e a r
t h e
b e l
l ?
s t o p !
69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
i
d
47, 58
u
l
l
e
y
36
a
r
69
30
Tries
e
e
0, 24
t
l
l
12
o
c
k
17, 40,
51, 62
p
84
Tries
4/1/2003 9:02 AM
www.jntuworld.com
Compressed Trie
A compressed trie has
internal nodes of degree
at least two
It is obtained from
standard trie by
compressing chains of
redundant nodes
Compact Representation
b
ar
id
ll
ell
ll
to
ck
0 1 2 3 4
4/1/2003 9:02 AM
S[8] =
h e a r
b e l l
S[2] =
s e l l
S[6] =
b i d
S[9] =
s t o p
S[3] =
s t o c k
1, 0, 0
1, 2, 3
7
0 1 2 3
b u l l
b u y
S[7] =
S[5] =
1, 1, 1
Tries
0 1 2 3
s e e
b e a r
S[4] =
S[1] =
S[0] =
i
l
6, 1, 2
8, 2, 3
0, 0, 0
7, 0, 3
4, 1, 1
4, 2, 3
4/1/2003 9:02 AM
0, 1, 1
5, 2, 2
0, 2, 2
3, 1, 2
2, 2, 3
3, 3, 4
9, 3, 3
Tries
m i n i m i z e
0 1 2 3 4 5 6 7
m i n i m i z e
0 1 2 3 4 5 6 7
mi
nimize
ze
7, 7
mize
nimize
ze
nimize
4/1/2003 9:02 AM
ze
4, 7
Tries
00
a
010 011
b
4/1/2003 9:02 AM
10
11
a
Tries
2, 7
0, 1
6, 7
4/1/2003 9:02 AM
2, 7
2, 7
6, 7
6, 7
Tries
10
1, 1
d
b
Given a text string X, we want to find a prefix code for the characters
of X that yields a small encoding for X
Frequent characters should have long code-words
Rare characters should have short code-words
Example
X = abracadabra
T1 encodes X into 29 bits
T2 encodes X into 24 bits
T1
T2
d
a
11
4/1/2003 9:02 AM
b
c
Tries
d
12
Tries
4/1/2003 9:02 AM
www.jntuworld.com
Huffmans Algorithm
Given a string X,
Huffmans algorithm
construct a prefix
code the minimizes
the size of the
encoding of X
It runs in time
O(n + d log d), where
n is the size of X
and d is the number
of distinct characters
of X
A heap-based
priority queue is
used as an auxiliary
structure
4/1/2003 9:02 AM
Example
Algorithm HuffmanEncoding(X)
Input string X of size n
Output optimal encoding trie for X
C distinctCharacters(X)
computeFrequencies(C, X)
Q new empty heap
for all c C
T new single-node tree storing c
Q.insert(getFrequency(c), T)
while Q.size() > 1
f1 Q.minKey()
T1 Q.removeMin()
f2 Q.minKey()
T2 Q.removeMin()
T join(T1, T2)
Q.insert(f1 + f2, T)
return Q.removeMin()
Tries
11
a
5
b
2
c
1
d
1
X = abracadabra
Frequencies
13
b
2
4/1/2003 9:02 AM
r
2
a
5
2
d
r
2
2
a
5
a
5
Tries
4
d
4
d
r
14