[Figure: filter-and-verification pipeline. Inputs: query string s = "yotubecom", threshold τ = 2, and an index over dataset R. Filter: is Signature(r) ∩ Signature(s) = ∅? If no, verify: is ED(r, s) ≤ τ? If yes, r is added to the results.]
Filter-and-Verification Framework
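[Figure: the same pipeline with an alignment filter added. Inputs: query string s, threshold τ, and an index over dataset R. Filter: is Signature(r) ∩ Signature(s) = ∅? If no, apply the alignment filter. If r passes it, verify: is ED(r, s) ≤ τ? If yes, r is added to the results.]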
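The pipeline in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation: the names `filter_and_verify`, `qgrams`, and `extra_filter` are hypothetical, and the toy signature (the set of all 3-grams) is an assumption that is only a safe filter when the longer string has at least q(τ + 1) characters.

```python
def qgrams(x, q):
    """Set of all q-grams of x (toy signature, assumed for illustration)."""
    return {x[i:i + q] for i in range(len(x) - q + 1)}


def edit_distance(a, b):
    """Plain dynamic-programming edit distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def filter_and_verify(dataset, s, tau, signature, extra_filter=None):
    """Return every r in dataset with ED(r, s) <= tau.

    signature:    maps a string to a set; disjoint signatures mean "prune".
    extra_filter: optional callable (r, s, tau) -> True when r can be pruned,
                  e.g. an alignment filter placed before verification.
    """
    sig_s = signature(s)
    results = []
    for r in dataset:
        if not (signature(r) & sig_s):
            continue  # Filter step: signatures do not intersect
        if extra_filter is not None and extra_filter(r, s, tau):
            continue  # pruned by the additional filter
        if edit_distance(r, s) <= tau:
            results.append(r)  # Verify step
    return results


dataset = ["youtubecom", "yahoocom", "facebookcom"]
hits = filter_and_verify(dataset, "yotubecom", 2, lambda x: qgrams(x, 3))
print(hits)  # ['youtubecom']
```

All three strings share the 3-gram "com" with the query, so all of them reach verification, and only "youtubecom" is within distance 2. Reducing that verification work is what the added filter is for.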
Complexity Improvement:
Improved from $O(\min(|r|, |s|) \cdot \tau)$ to $O(q\tau^2)$
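For context, $O(\min(|r|, |s|) \cdot \tau)$ matches the cost of a threshold-bounded edit-distance check that fills only a diagonal band of width 2τ + 1. The sketch below assumes this is the verification step being compared against; the function name `within_threshold` is hypothetical.

```python
def within_threshold(r, s, tau):
    """Return True iff ED(r, s) <= tau, computing only a (2*tau + 1)-wide band."""
    if len(r) > len(s):
        r, s = s, r  # iterate over the shorter string: min(|r|, |s|) rows
    m, n = len(r), len(s)
    if n - m > tau:
        return False  # length difference alone exceeds the threshold
    INF = tau + 1  # any value above tau is treated as "too far"
    width = 2 * tau + 1
    # Band slot k of row i holds column j = i + k - tau.
    prev = [INF] * width
    for k in range(width):
        j = k - tau
        if 0 <= j <= n:
            prev[k] = j  # row 0: distance from the empty prefix
    for i in range(1, m + 1):
        cur = [INF] * width
        for k in range(width):
            j = i + k - tau
            if j < 0 or j > n:
                continue
            if j == 0:
                cur[k] = i  # column 0: delete the first i characters
                continue
            best = prev[k] + (r[i - 1] != s[j - 1])  # diagonal cell (i-1, j-1)
            if k + 1 < width:
                best = min(best, prev[k + 1] + 1)    # cell above (i-1, j)
            if k > 0:
                best = min(best, cur[k - 1] + 1)     # cell to the left (i, j-1)
            cur[k] = min(best, INF)
        if min(cur) > tau:
            return False  # every cell in the band already exceeds tau
        prev = cur
    return prev[n - m + tau] <= tau


print(within_threshold("youtubecom", "yotubecom", 2))   # True
print(within_threshold("youtubecom", "yoytupecxm", 2))  # False
```

Each row touches at most 2τ + 1 cells, and there are min(|r|, |s|) rows, which gives the stated bound.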
Alignment Filter
If $\sum_{i=1}^{\tau+1} err_i > \tau$, then $ED(r, s) > \tau$.
Alignment Filter
Substring edit distance (sed):
$sed(g_i, r)$ is the minimum edit distance between $g_i$ and any substring of $r$.
Alignment filter:
If $\sum_{i=1}^{\tau+1} sed(g_i, r) > \tau$, then $ED(r, s) > \tau$.
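A small sketch of the substring edit distance and the filter test follows. It assumes the $g_i$ are τ + 1 non-overlapping q-grams of s (the bound relies on the errors being counted in disjoint regions); the function names are hypothetical, and the example strings are only illustrative.

```python
def sed(g, r):
    """Minimum edit distance between g and any substring of r.

    Standard approximate-matching DP: the first row is all zeros because
    a match may start anywhere in r, and the answer is the minimum of the
    last row because a match may end anywhere in r.
    """
    prev = [0] * (len(r) + 1)
    for i in range(1, len(g) + 1):
        cur = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            cur[j] = min(prev[j - 1] + (g[i - 1] != r[j - 1]),  # match/substitute
                         prev[j] + 1,                           # skip a char of g
                         cur[j - 1] + 1)                        # extra char of r
        prev = cur
    return min(prev)


def alignment_filter_prunes(grams, r, tau):
    """True when sum_i sed(g_i, r) > tau, which implies ED(r, s) > tau.

    grams is assumed to hold tau + 1 non-overlapping q-grams of s.
    """
    total = 0
    for g in grams:
        total += sed(g, r)
        if total > tau:
            return True  # stop early once the bound is exceeded
    return False


# s = "youtubecom", tau = 2: three disjoint 3-grams of s.
print(alignment_filter_prunes(["you", "tub", "com"], "yoytupecxm", 2))  # True
# s = "yotubecom", tau = 2: r = "youtubecom" is not pruned.
print(alignment_filter_prunes(["yot", "ube", "com"], "youtubecom", 2))  # False
```

In the first call each q-gram needs one edit to match any substring of r, so the sum is 3 > τ and r is discarded without computing ED(r, s).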
Alignment Filter
Accelerating Calculation:
• The computational complexity of $sed(g_i, r)$ is $O(q|r|)$.
• By the position filter, $g_i$ can only align to a substring $x_i$ of $r$ where $|x_i| < 2\tau + q$.
• Thus, if $\sum_{i=1}^{\tau+1} sed(g_i, x_i) > \tau$, then $ED(r, s) > \tau$.
• The complexity of each $sed$ computation is reduced to $O(q\tau)$.
Complexity Improvement:
Improved from $O(\min(|r|, |s|) \cdot \tau)$ to $O(q\tau^2)$: each of the $\tau + 1$ terms in the sum costs $O(q\tau)$.
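The windowed variant can be sketched as follows. This assumes each q-gram is given with its start position p in s and that the position filter is applied as the conservative window r[p - τ : p + q + τ], which has at most q + 2τ characters; both the window choice and the function names are assumptions of this sketch.

```python
def sed(g, x):
    """Minimum edit distance between g and any substring of x."""
    prev = [0] * (len(x) + 1)  # a match may start anywhere in x
    for i in range(1, len(g) + 1):
        cur = [i] + [0] * len(x)
        for j in range(1, len(x) + 1):
            cur[j] = min(prev[j - 1] + (g[i - 1] != x[j - 1]),
                         prev[j] + 1,
                         cur[j - 1] + 1)
        prev = cur
    return min(prev)  # a match may end anywhere in x


def windowed_alignment_filter_prunes(pivots, r, q, tau):
    """True when sum_i sed(g_i, x_i) > tau, which implies ED(r, s) > tau.

    pivots: list of (p, g) pairs with g = s[p:p + q]; assumed to be
            tau + 1 non-overlapping q-grams of s.
    x_i:    the window of r that g_i can align to when ED(r, s) <= tau,
            since tau edits shift a position by at most tau.
    """
    total = 0
    for p, g in pivots:
        x = r[max(0, p - tau): p + q + tau]
        total += sed(g, x)  # O(q * (q + 2*tau)) here instead of O(q * |r|)
        if total > tau:
            return True
    return False


pivots = [(0, "you"), (3, "tub"), (7, "com")]  # from s = "youtubecom"
print(windowed_alignment_filter_prunes(pivots, "yoytupecxm", 3, 2))  # True
print(windowed_alignment_filter_prunes(pivots, "yotubecom", 3, 2))   # False
```

The plain DP used here costs O(q(q + 2τ)) per window, which no longer depends on |r|; reaching the stated O(qτ) per q-gram would additionally need a banded DP, which this sketch omits for brevity.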
Evaluating Alignment Filter
Average Search Time
[Figure: prefix illustration. q(s) is the sorted q-gram set of string s, and Pre(s) is its prefix, with |Pre(•)| = qτ + 1. Two example prefixes are shown, g1 g2 g5 g6 g9 g10 g11 and g3 g4 g7 g8 g11 g12 g13; further q-grams are labeled > g10.]
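As a concrete illustration of Pre(s), the sketch below sorts the q-gram set and keeps the first qτ + 1 entries. The slides only say "sorted", so the lexicographic order used here is an assumption (any fixed global order works for the filter), and the function names are hypothetical.

```python
def prefix(x, q, tau):
    """First q*tau + 1 q-grams of x under a global order (lexicographic here)."""
    grams = sorted({x[i:i + q] for i in range(len(x) - q + 1)})
    return grams[: q * tau + 1]


def prefix_filter_prunes(r, s, q, tau):
    """True when the prefixes are disjoint, which implies ED(r, s) > tau.

    tau edits destroy at most q*tau q-grams, so a prefix of q*tau + 1
    q-grams must keep at least one. Strings with fewer q-grams than that
    are never pruned in this sketch.
    """
    pre_r, pre_s = prefix(r, q, tau), prefix(s, q, tau)
    need = q * tau + 1
    if len(pre_r) < need or len(pre_s) < need:
        return False  # too few q-grams to prune safely
    return not (set(pre_r) & set(pre_s))


print(prefix("youtubecom", 3, 2))
# ['bec', 'com', 'eco', 'out', 'tub', 'ube', 'utu']
print(prefix_filter_prunes("youtubecom", "yoytupecxm", 3, 2))  # True
print(prefix_filter_prunes("youtubecom", "yotubecom", 3, 2))   # False
```

With q = 3 and τ = 2 the prefix holds 7 q-grams, matching the seven entries per row in the figure.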
Non-consecutive errors:
youtubecom
yoytupecxm
With q = 3, the 3 non-consecutive errors destroy 8 q-grams.
Consecutive errors:
youtubecom
youtzpxcom
With q = 3, the 3 consecutive errors destroy only 5 q-grams.
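The two counts can be checked with a short script. Here a q-gram of the original string counts as "destroyed" when it no longer occurs anywhere in the modified string; that reading, and the helper names, are assumptions of this sketch.

```python
def qgrams(x, q):
    """All q-grams of x, in order of position."""
    return [x[i:i + q] for i in range(len(x) - q + 1)]


def destroyed(original, modified, q):
    """Number of q-grams of `original` that do not occur in `modified`."""
    present = set(qgrams(modified, q))
    return sum(1 for g in qgrams(original, q) if g not in present)


print(destroyed("youtubecom", "yoytupecxm", 3))  # 8
print(destroyed("youtubecom", "youtzpxcom", 3))  # 5
```

"youtubecom" has 8 3-grams in total, so three well-spread errors wipe out all of them, while three adjacent errors leave "you", "out", and "com" intact.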