Princeton University • COS 226 • Algorithms and Data Structures • Spring 2004 • Kevin Wayne • http://www.Princeton.EDU/~cos226
Tries

Tries.
- Store characters in internal nodes, not keys.
- Store records in external nodes.
- Use the characters of the key to guide the search.
- NB: from retrieval, but pronounced "try."
- You can get at anything if it's organized properly in 40 or 100 bits!

Applications

Applications.
- Spell checkers.
- Data compression (stay tuned).
- Princeton U-CALL.
- Computational biology.
- Routing tables for IP addresses.
- Storing and querying XML documents.
- Associative arrays, associative indexing.

Example: sells sea shells by the sea shore
[Figure: trie for the example keys (labels recovered: sells, shore, shells)]

Existence Symbol Table: Operations
Keys
Existence Symbol Table: Implementations Cost Summary
R-Way Existence Trie: Example
R-Way Existence Trie: Java Implementation
R-Way Existence Trie: Implementation
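
As an illustration of the R-way existence trie named in the slide titles above, here is a minimal sketch, not the course's own code. It assumes an alphabet of R = 256 characters and marks the end of a key with a boolean flag; the class and method names (`RWayExistenceTrie`, `add`, `contains`) are hypothetical.

```java
// Minimal sketch of an R-way existence trie (illustrative names, not the course code).
class RWayExistenceTrie {
    private static final int R = 256;   // assumed alphabet size (extended ASCII)

    private static class Node {
        boolean isKey;                  // true if some key ends at this node
        Node[] next = new Node[R];      // one link per possible character
    }

    private final Node root = new Node();

    // Insert a key by following (and creating) one link per character.
    void add(String key) {
        Node x = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= R) throw new IllegalArgumentException("character outside alphabet");
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.isKey = true;
    }

    // Search by using the characters of the key to guide the descent.
    boolean contains(String key) {
        Node x = root;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c >= R || x.next[c] == null) return false;   // search miss
            x = x.next[c];
        }
        return x.isKey;
    }

    public static void main(String[] args) {
        RWayExistenceTrie trie = new RWayExistenceTrie();
        for (String w : "sells sea shells by the sea shore".split(" ")) trie.add(w);
        System.out.println(trie.contains("shells"));   // true
        System.out.println(trie.contains("she"));      // false: a prefix, not a key
    }
}
```

Each node carries R links, so the sketch trades memory for a search cost that depends only on the key length, not on the number of keys stored.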
R-Way Existence Trie: Implementation
Existence Symbol Table: Implementations Cost Summary
Existence TST
Existence TST: Implementation
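
The existence TST (ternary search trie) named above can be sketched as follows. This is an assumption-laden illustration rather than the slide's code: each node holds one character and three links, and a boolean flag marks the end of a key where other implementations may use a terminating character. The names `ExistenceTST`, `add`, and `contains` are hypothetical.

```java
// Minimal sketch of an existence TST (illustrative names, not the course code).
class ExistenceTST {
    private static class Node {
        char c;                    // character stored in this node
        boolean isKey;             // true if a key ends at this node
        Node left, mid, right;     // smaller, equal (next character), larger
        Node(char c) { this.c = c; }
    }

    private Node root;

    void add(String key) {
        if (key.isEmpty()) throw new IllegalArgumentException("empty key");
        root = add(root, key, 0);
    }

    private Node add(Node x, String key, int i) {
        char c = key.charAt(i);
        if (x == null) x = new Node(c);
        if (c < x.c)                    x.left  = add(x.left, key, i);
        else if (c > x.c)               x.right = add(x.right, key, i);
        else if (i < key.length() - 1)  x.mid   = add(x.mid, key, i + 1);
        else                            x.isKey = true;
        return x;
    }

    boolean contains(String key) {
        if (key.isEmpty()) return false;
        Node x = root;
        int i = 0;
        while (x != null) {
            char c = key.charAt(i);
            if (c < x.c)                    x = x.left;
            else if (c > x.c)               x = x.right;
            else if (i < key.length() - 1) { x = x.mid; i++; }
            else                            return x.isKey;
        }
        return false;   // fell off the trie: search miss
    }

    public static void main(String[] args) {
        ExistenceTST tst = new ExistenceTST();
        for (String w : "sells sea shells by the sea shore".split(" ")) tst.add(w);
        System.out.println(tst.contains("sea"));   // true
        System.out.println(tst.contains("se"));    // false
    }
}
```

Compared with an R-way node, a TST node stores three links instead of R, which is why it uses far less space on large alphabets.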
Existence TST: Java Implementation
Existence Symbol Table: Implementations Cost Summary
Bottom line: more flexible than BST and can be faster than hashing, especially if there are lots of search misses.

Near neighbor search.
- Find all strings in ST that differ in ≤ P characters from query.
- Application: spell checking for OCR.
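
One way to picture the near neighbor search above is a depth-first walk of a trie that spends a mismatch "budget" of P. The sketch below assumes that "differ in ≤ P characters" means same-length strings with at most P mismatched positions, and it uses a sorted map per node rather than an R-way array or TST; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of near neighbor search in a trie (illustrative, not the course code).
class NearNeighborTrie {
    private static class Node {
        boolean isKey;
        Map<Character, Node> next = new TreeMap<>();   // sorted links for stable output
    }

    private final Node root = new Node();

    void add(String key) {
        Node x = root;
        for (char c : key.toCharArray())
            x = x.next.computeIfAbsent(c, k -> new Node());
        x.isKey = true;
    }

    // All stored keys with the same length as query that mismatch it in at most p positions.
    List<String> near(String query, int p) {
        List<String> out = new ArrayList<>();
        collect(root, query, 0, p, new StringBuilder(), out);
        return out;
    }

    private void collect(Node x, String query, int i, int budget,
                         StringBuilder prefix, List<String> out) {
        if (i == query.length()) {
            if (x.isKey) out.add(prefix.toString());
            return;
        }
        for (Map.Entry<Character, Node> e : x.next.entrySet()) {
            char c = e.getKey();
            int cost = (c == query.charAt(i)) ? 0 : 1;
            if (cost > budget) continue;               // prune: too many mismatches
            prefix.append(c);
            collect(e.getValue(), query, i + 1, budget - cost, prefix, out);
            prefix.setLength(prefix.length() - 1);     // backtrack
        }
    }

    public static void main(String[] args) {
        NearNeighborTrie trie = new NearNeighborTrie();
        for (String w : "sells sea shells by the sea shore".split(" ")) trie.add(w);
        // An OCR-style error: the digit 1 read in place of the letter l.
        System.out.println(trie.near("sel1s", 1));     // [sells]
    }
}
```

The pruning step is what makes the trie attractive here: whole subtrees are skipped as soon as the budget is exhausted.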
PATRICIA Tries

Patricia tries. Practical Algorithm to Retrieve Information Coded in Alphanumeric.
- Collapse one-way branches in binary trie.
- Thread trie to eliminate multiple node types.

Applications.
- Database search.
- P2P network search.
- IP routing tables: find longest prefix match.
- Compressed quad-tree for N-body simulation.
- Efficiently storing and querying XML documents.

Suffix Tree

Suffix tree: PATRICIA trie of suffixes of a string.

Applications.
- Longest common substring.
- Longest repeated substring.
- Longest palindromic substring.
- Longest common prefix of two substrings.
- Computational biology databases (BLAST, FASTA).
- Search for music by melody.
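
The longest prefix match listed under the PATRICIA applications (IP routing tables) can be illustrated on a plain, uncompressed trie. The sketch below is not a PATRICIA trie: one-way branches are not collapsed. It assumes routing prefixes are written as strings of '0' and '1' characters, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of longest prefix match on an uncompressed trie (not PATRICIA).
class LongestPrefixTrie {
    private static class Node {
        boolean isKey;
        Map<Character, Node> next = new HashMap<>();
    }

    private final Node root = new Node();

    void add(String key) {
        Node x = root;
        for (char c : key.toCharArray())
            x = x.next.computeIfAbsent(c, k -> new Node());
        x.isKey = true;
    }

    // Returns the longest stored key that is a prefix of query, or null if there is none.
    String longestPrefixOf(String query) {
        Node x = root;
        int best = x.isKey ? 0 : -1;          // length of the best match seen so far
        for (int i = 0; i < query.length(); i++) {
            x = x.next.get(query.charAt(i));
            if (x == null) break;             // no stored key continues this way
            if (x.isKey) best = i + 1;
        }
        return best < 0 ? null : query.substring(0, best);
    }

    public static void main(String[] args) {
        LongestPrefixTrie table = new LongestPrefixTrie();
        table.add("10");      // hypothetical routing prefixes as bit strings
        table.add("1011");
        table.add("0");
        System.out.println(table.longestPrefixOf("10110"));   // 1011
        System.out.println(table.longestPrefixOf("1001"));    // 10
        System.out.println(table.longestPrefixOf("11"));      // null
    }
}
```

A PATRICIA trie answers the same query while skipping over runs of single-child nodes, which matters when prefixes are long and sparse.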
    # collect data
    foreach student ($argv)
        foreach input (input100.txt input1000.txt input10000.txt)
            foreach program (worstfit bestfit)
                t[$student][$input][$program] = `time java $program < $input`
            end
        end
    end

    # compute statistics
    . . .

Idealized excerpt from COS 226 timing script

Why useful?
- Using algorithm with strings is more useful.
- Running algorithm with indices (instead of ST lookup) is faster.

    while (true) {
        int p = StdIn.readInt();
        int q = StdIn.readInt();
        ...
        uf.unite(p, q);
        ...
    }

    while (true) {
        String s = StdIn.readString();
        String t = StdIn.readString();
        int p = st.index(s);
        int q = st.index(t);
        ...
        uf.unite(p, q);
        ...
    }

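
To make the `st.index(s)` and `uf.unite(p, q)` calls in the loops above concrete, here is a minimal sketch under stated assumptions. The slides do not show how `st` or `uf` are implemented; this version backs the index with a `HashMap` rather than a trie and uses a simple quick-union structure, and the class names `IndexST` and `UF` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of associative indexing feeding union-find (illustrative names).
class AssociativeIndexDemo {
    // Maps each distinct string to the next unused integer index: 0, 1, 2, ...
    static class IndexST {
        private final Map<String, Integer> map = new HashMap<>();

        int index(String s) {
            Integer i = map.get(s);
            if (i == null) {
                i = map.size();
                map.put(s, i);
            }
            return i;
        }
    }

    // Simple quick-union over integer indices; grows as new indices appear.
    static class UF {
        private final List<Integer> parent = new ArrayList<>();

        private int find(int p) {
            while (parent.size() <= p) parent.add(parent.size());   // new singleton sets
            while (parent.get(p) != p) p = parent.get(p);
            return p;
        }

        void unite(int p, int q) { parent.set(find(p), find(q)); }

        boolean connected(int p, int q) { return find(p) == find(q); }
    }

    public static void main(String[] args) {
        IndexST st = new IndexST();
        UF uf = new UF();
        int p = st.index("www.cs.princeton.edu");
        int q = st.index("www.harvard.edu");
        uf.unite(p, q);                                // the algorithm only sees integers
        System.out.println(uf.connected(p, q));        // true
    }
}
```

The string lookup happens once per input token; after that the algorithm works on array-style integer indices.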
Associative Indexing: Application

Real version.
- N objects: "www.cs.princeton.edu", "www.harvard.edu"
- Any graph processing application.

Symbol Table Summary