KMP Algorithm

Knuth-Morris-Pratt

String Searching Problem


Input: A word string W and a text string S
Check if W exists as a substring of S, and if it does then
return its location.

Output: The position in S at which W is found

Brute Force
(Figure: W is compared against S at successive alignments; the slide images are not preserved.)

Worst Case of Brute Force
(Figure: worst-case S and W; the slide images are not preserved.)

If |S|=n, |W|=m then the algorithm runs in O(mn) time.
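The O(mn) bound can be made concrete with a small sketch of the brute-force search that also counts character comparisons. The function name `brute_force_search` and the sample strings are illustrative assumptions, not taken from the slides.

```python
def brute_force_search(S, W):
    """Return (position, comparisons): the first index in S at which W
    occurs (or -1 if it does not), and the number of character
    comparisons performed along the way."""
    n, m = len(S), len(W)
    comparisons = 0
    for start in range(n - m + 1):      # every possible alignment of W in S
        j = 0
        while j < m:
            comparisons += 1
            if S[start + j] != W[j]:    # mismatch: give up on this alignment
                break
            j += 1
        if j == m:                      # all m characters matched
            return start, comparisons
    return -1, comparisons


# Hypothetical worst-case style input: every alignment matches m-1
# characters before failing on the last one.
position, comparisons = brute_force_search("A" * 10, "AAAB")
print(position, comparisons)  # -1 28
```

Here n = 10 and m = 4, so there are n - m + 1 = 7 alignments and each costs m = 4 comparisons, which is where the product of the two lengths comes from.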

Better Algorithms
Backward Algorithm

Raita Algorithm

Boyer and Moore Algorithm

Reverse Factor Algorithm

Colussi Algorithm

Reverse Colussi Algorithm

Crochemore and Perrin Algorithm

Self Max-Suffix Algorithm

Galil Giancarlo Algorithm

Simon Algorithm

Galil and Seiferas Algorithm

Skip Search Algorithm

Horspool Algorithm

Smith Algorithm

Knuth Morris and Pratt Algorithm

Tuned Boyer and Moore Algorithm

KMP Skip Algorithm

Two Way Algorithm

Max-Suffix Matching Algorithm

Uniqueness Algorithm

Morris and Pratt Algorithm

Wide Window Algorithm

Quick Search Algorithm

Zhu and Takaoka Algorithm

KMP
Linear Time
Avoids comparisons with elements of S that
have already been involved in a comparison,
i.e. backtracking in S never occurs
Time: O(m+n)
Space: O(m) extra for the table T (O(m+n) including S and W)

KMP
Differs from brute force by always keeping
track of the information that it gains from
previous comparisons
A failure function or partial match table
(T) is computed, which tells us how much of
the last comparison can be reused if it fails
T[i] = the length of the longest prefix of W that is
also a proper suffix of W[0..i]

KMP
T shows how much of the beginning of W
matches up to the portion of S immediately
preceding the failed comparison.

(Figure labels: "No need to repeat these comparisons"; "Resume comparing here")

Sliding Window Approach


Nearly all exact string matching algorithms
use the sliding window approach
Whenever a mismatch is found, slide the
window to the right

Suffix to Prefix Rule


For a window to have any chance of matching the pattern,
some suffix of the window must be equal to a prefix of
the pattern.
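As a rough illustration of the rule, the sketch below finds the smallest shift for which the part of the current window that stays under the pattern (a suffix of the window) equals the corresponding prefix of the pattern. The function name `smallest_safe_shift` and the sample strings are assumptions made for this example. KMP does not compute shifts this way at search time; this only spells out the rule itself.

```python
def smallest_safe_shift(window, pattern):
    """Smallest shift s >= 1 such that the suffix of `window` that remains
    under the pattern after shifting by s equals the prefix of `pattern`
    of the same length. `window` and `pattern` have the same length."""
    m = len(pattern)
    for shift in range(1, m + 1):
        keep = m - shift                     # characters still overlapping
        if window[shift:] == pattern[:keep]:
            return shift
    return m                                 # only reached when m == 0


# The window ends in "AB", which is also a prefix of the pattern,
# so the pattern can be shifted by 3 rather than by its full length.
print(smallest_safe_shift("ABCAB", "ABCAC"))  # 3
```

Any smaller shift would place the pattern against text that is already known to disagree with it, so those alignments can be skipped without missing a match.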

KMP example
(Figure: step-by-step trace with rows labeled m, S, W and i; the slide images are not preserved.)

KMP
Calculating the longest valid suffix at search
time would be very inefficient
Pre-processing eliminates the problem,
because the suffix also occurs in W itself

KMP
The algorithm preprocesses the word W to
produce the prefix function, which determines
how far the pattern can be shifted for every
possible location of a mismatch

Components of KMP
Compute Prefix Function: For a given W,
compute a table T of equal length where T[i]
gives the length of the longest prefix of W
that is also a proper suffix of W[0..i].
KMP Matcher Function: Actual searching.

Example of a prefix function
(Figure: the table T is filled in one entry at a time for a word W; the first entry shown is W[0] = A, T[0] = 0. The remaining slide content is not preserved.)
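The definition of T can be checked directly with a brute-force sketch that tries every candidate length for each position. The function name `prefix_table_naive` and the word "ABABAC" are assumptions chosen for illustration. This is not the linear-time construction used by KMP, only the definition written out.

```python
def prefix_table_naive(W):
    """T[i] = length of the longest prefix of W that is also a proper
    suffix of W[0..i], found by testing every candidate length."""
    T = []
    for i in range(len(W)):
        head = W[: i + 1]                   # W[0..i], inclusive
        best = 0
        for length in range(1, i + 1):      # proper: shorter than head
            if head[:length] == head[-length:]:
                best = length
        T.append(best)
    return T


print(prefix_table_naive("ABABAC"))  # [0, 0, 1, 2, 3, 0]
```

For example, T[4] = 3 because "ABA" is both a prefix of W and a proper suffix of "ABABA", while T[5] = 0 because no prefix of W ends in "C".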

Example
(Figure: step-by-step matching trace with rows labeled S, W and T; the slide images are not preserved.)

Matcher Function
KMP(String S, String W):
    set n to length of S and m to length of W
    set T to prefixFunc(W)                       //Compute the partial match table
    set q to 0                                   //Candidate character of W initially 0
    for every i in range 0 to n-1
        while q>0 and W[q] is not equal to S[i]
            set q to T[q-1]                      //Mismatch, backtrack if you can
        if W[q] is equal to S[i]
            increment q                          //Match, move to next character
        if q is equal to m
            print i-m+1                          //Entire W has been found
            set q to T[q-1]                      //Find others

Prefix Function
prefixFunc(List W):
    set m to length of W
    set T[0] to 0                                //Set first element of table to 0
    set k to 0                                   //Candidate character initially 0
    for every q in range 1 to m-1
        while k>0 and W[k] is not equal to W[q]
            set k to T[k-1]                      //Mismatch, backtrack if possible
        if W[k] is equal to W[q]
            increment k                          //Match, move to next character
        set T[q] to k                            //Store result
    return T
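The two pseudocode routines translate almost line for line into Python. The following is a minimal sketch under a few assumptions: the names `prefix_func` and `kmp_search` are chosen here, matches are collected in a list rather than printed, and an empty W is treated as having no occurrences (the pseudocode does not cover that case).

```python
def prefix_func(W):
    """T[q] = length of the longest prefix of W that is also a proper
    suffix of W[0..q]."""
    m = len(W)
    T = [0] * m
    k = 0                                   # length of the current candidate prefix
    for q in range(1, m):
        while k > 0 and W[k] != W[q]:
            k = T[k - 1]                    # mismatch: fall back to a shorter prefix
        if W[k] == W[q]:
            k += 1                          # match: extend the prefix
        T[q] = k                            # store result
    return T


def kmp_search(S, W):
    """Return the start positions of all occurrences of W in S,
    including overlapping ones."""
    n, m = len(S), len(W)
    if m == 0:
        return []                           # assumption: empty word, no matches
    T = prefix_func(W)
    positions = []
    q = 0                                   # number of characters of W matched so far
    for i in range(n):
        while q > 0 and W[q] != S[i]:
            q = T[q - 1]                    # mismatch: reuse the longest valid prefix
        if W[q] == S[i]:
            q += 1                          # match: move to next character of W
        if q == m:
            positions.append(i - m + 1)     # entire W has been found
            q = T[q - 1]                    # keep going to find further matches
    return positions


print(kmp_search("ABABABAC", "ABA"))  # [0, 2, 4]
```

Note that the index i into S only ever moves forward, which is the "no backtracking in S" property.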

Runtime Analysis
Although the algorithm as implemented here contains a
loop within a loop, it runs in linear time. This is because the
backtracking statement, which essentially shifts the sliding
window to the right, can execute at most n times over the
entire run of the for loop: every backtrack decreases q, and q
is incremented at most once per iteration. The remaining body
of the for loop executes exactly n times, giving
a runtime of O(n) for the matching function.
Similar reasoning gives O(m) for the prefix function, so the
whole algorithm runs in O(m+n) time.
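The counting argument behind this bound can be written out as a short derivation. It uses q, T and n from the matcher pseudocode; the symbols I, B and the constant c are introduced here for the sketch.

```latex
\begin{align*}
I &= \text{number of times } q \text{ is incremented} \le n
  && \text{(at most once per iteration of the for loop)} \\
B &= \text{number of executions of } q \leftarrow T[q-1] \text{ inside the while loop} \\
T[q-1] &\le q-1 < q
  && \text{(a proper suffix is strictly shorter, so each backtrack lowers } q\text{)} \\
q &\ge 0 \text{ always, and } q = 0 \text{ initially}
  && \Rightarrow\; B \le I \le n \\
\text{total work} &\le c\,(n + I + B) \le 3cn = O(n)
\end{align*}
```

The same argument with k in place of q and m in place of n bounds the prefix function by O(m).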