Beruflich Dokumente
Kultur Dokumente
b
a
Pattern Matching
a
b
c
a
Pattern Matching
Strings
A string is a sequence of
characters
Examples of strings:
Java program
HTML document
DNA sequence
Digitized image
ASCII
Unicode
{0, 1}
{A, C, G, T}
Text editors
Search engines
Biological research
Pattern Matching
Brute-Force Algorithm
Algorithm BruteForceMatch(T, P)
Input text T of size n and pattern
P of size m
Output starting index of a
substring of T equal to P or 1
if no such substring exists
a match is found, or
for i 0 to n m
all placements of the
{ test shift i of the pattern }
pattern have been tried
j0
Brute-force pattern
while j m T[i j] P[j]
matching runs in time O(nm)
Example of worst case:
jj1
T aaa ah
if j m
P aaah
return i {match at i}
may occur in images and
else
DNA sequences
unlikely in English text
break while loop {mismatch}
return -1 {no match anywhere}
The brute-force pattern
matching algorithm
compares the pattern P with
the text T for each possible
shift of P relative to T, until
either
Pattern Matching
Boyer-Moore Heuristics
The Boyer-Moores pattern matching algorithm is based on
two heuristics
Looking-glass heuristic: Compare P with a subsequence of T
moving backwards
Character-jump heuristic: When a mismatch occurs at T[i] c
Example
a
p a t
r i
1
t h m
r i
t e r n
2
t h m
m a t c h i n g
3
t h m
r
r
4
t h m
Pattern Matching
a l g o r
5
t h m
t h m
11 10 9 8 7
r i t h m
r
6
t h m
Last-Occurrence Function
Boyer-Moores algorithm preprocesses the pattern P and
the alphabet to build the last-occurrence function L
mapping to integers, where L(c) is defined as
Example:
{a, b, c, d}
P abacab
L(c)
Pattern Matching
The Boyer-Moore
Algorithm
Case 1: j 1l
Algorithm BoyerMooreMatch(T, P, )
L lastOccurenceFunction(P, )
im1
jm1
repeat
if T[i] P[j]
if j 0
return i { match at i }
else
ii1
jj1
else
{ character-jump }
l L[T[i]]
i i m min(j, 1l)
jm1
until i n 1
return 1 { no match }
a .
i
b a
j l
mj
b a
Case 2: 1lj
.
Pattern Matching
a .
i
a .
l
b .
j
m (1 l)
a .
1l
b .
Example
a b a
a a b a d
a b a
a b a a b b
a b a
a b a
a b
4
a b
13 12 11 10 9
a b a
5
a b a
a b
7
a b
a b a
a b
a b a
a b
Pattern Matching
Analysis
Boyer-Moores algorithm
runs in time O(nm s)
Example of worst case:
T aaa a
P baaa
a a
a a
12 11 10 9
a a
a a
18 17 16 15 14 13
Pattern Matching
a a
a a
24 23 22 21 20 19
b a
a a
a b a a b x .
a b a a b a
j
a b a a b a
No need to
repeat these
comparisons
Pattern Matching
Resume
comparing
here
10
P[j]
F(j)
a b a a b x .
a b a a b a
j
a b a a b a
F(j1)
Pattern Matching
11
i increases by one, or
the shift amount i j
increases by at least one
(observe that F(j 1) < j)
Algorithm KMPMatch(T, P)
F failureFunction(P)
i0
j0
while i n
if T[i] P[j]
if j m 1
return i j { match }
else
ii1
jj1
else
if j 0
j F[j 1]
else
ii1
return 1 { no match }
Pattern Matching
12
i increases by one, or
the shift amount i j
increases by at least one
(observe that F(j 1) < j)
Algorithm failureFunction(P)
F[0] 0
i1
j0
while i m
if P[i] P[j]
{we have matched j + 1 chars}
F[i] j + 1
ii1
jj1
else if j 0 then
{use failure function to shift P}
j F[j 1]
else
F[i] 0 { no match }
ii1
Pattern Matching
13
Example
a b a c a a b a c c a b a c a b a a b b
1 2 3 4 5 6
a b a c a b
7
a b a c a b
8 9 10 11 12
a b a c a b
13
P[j]
F(j)
a b a c a b
14 15 16 17 18 19
a b a c a b
Pattern Matching
14