BOYER-MOORE ALGORITHM

PROBLEM
The text is an array T.1,T.2,...,T.N and the pattern is an array P.1,P.2,...,P.M. The elements of P. and T. are characters drawn from a finite alphabet Sigma., Sigma. is an array of R elements. We say that pattern P. occurs with shift S in text T. if 0<=S<=N-M and if T.SpJ=P.J, where SpJ indicates S+J, for J=1,...,M. String-matching problem is the problem of finding all valid shifts with which a given pattern occurs in a given text.

ALGORITHM
The BOYER_MOORE_MATCHER algorithm was invented by Robert S. Boyer and J. Strother Moore. Its worst-case running time is O((N-M+1)*M+R).

PRACTICE
When a text and a pattern are real strings then the fastest solution is by REAL_STRING_MATCHER. In general case there is not an unambiguous winner. You have to test subroutines:

NAIVE_STRING_MATCHER
KMP_MATCHER
BOYER_MOOR_MATCHER

on your data. Example, where M respectively R is number of elements in P. respectively in Sigma. and S is number of valid shifts:

Running time in seconds for N=100000
Algorithm M=10,R=1999,S=50 M=10,R=4,S=10000 M=100,R=4,S=10000
Naive 16  25  150 
KMP 21  22  23 
Boyer-Moore 15  138 

IMPLEMENTATION
Unit: internal subroutine
 
Global variables: array T.1,...,T.N of strings, array P.1,...,P.M of strings, array Sigma.1,...,Sigma.R of strings
 
Parameters: positive integer N - number of elements in T., positive integer M - number of elements in P., positive integer R - number of elements in Sigma.
 
Output: displays on the screen Pattern occurs with shift S for all valid shift S
 


BOYER_MOORE_MATCHER:
procedure expose T. P. Sigma.
parse arg M, N, R
call COMPUTE_LAST_OCCURENCE_FUNCTION M, R
call COMPUTE_GOOD_SUFFIX_FUNCTION M
S = 0
do while S <= N - M
  do J = M to 1 by -1
    SpJ = S + J
    if P.J <> T.SpJ then leave
  end
  if J = 0
    then do
      say "Pattern occurs at shift" S
      S = S + Gama.0
    end
    else do
      SpJ = S + J; TSpJ = T.SpJ
      S = S + MAX(Gama.J, J - Lambda.TSpJ)
    end
end
return
 
COMPUTE_LAST_OCCURENCE_FUNCTION:
procedure expose P. Lambda. Sigma.
parse arg M, R
do I = 1 to R
  SigmaI = Sigma.I; Lambda.SigmaI = 0
end
do J = 1 to M
  PJ = P.J; Lambda.PJ = J
end
return
 
COMPUTE_GOOD_SUFFIX_FUNCTION:
procedure expose Gama. P.
parse arg M
call COMPUTE_PREFIX_FUNCTION M
do I = 1 to M
  PuvPi.I = Pi.I; PuvP.I = P.I
end
do I = M to 1 by -1
  MmIp1 = M - I + 1; P.I = PuvP.MmIp1
end
call COMPUTE_PREFIX_FUNCTION M
do I = 1 to M; P.I = PuvP.I; end
do J = 0 to M
  Gama.J = M - PuvPi.M
end
do L = 1 to M
  J = M - Pi.L
  if Gama.J > L - Pi.L
    then Gama.J = L - Pi.L
end
return
 
COMPUTE_PREFIX_FUNCTION:
procedure expose Pi. P.
parse arg M
Pi.1 = 0; K = 0
do Q = 2 to M
  Kp1 = K + 1
  do while K > 0 & P.Kp1 <> P.Q
    K = Pi.K; Kp1 = K + 1
  end
  Kp1 = K + 1
  if P.Kp1 = P.Q then K = K + 1
  Pi.Q = K
end
return

 

CONNECTIONS
String-Matching Problem
     Matcher for real string
     Naive string matcher
     Knuth-Morris-Pratt matcher

Literature
Cormen T. H., Leiserson Ch. E., Rivest R. L. Introduction to Algorithms
The MIT Press, Cambridge, 1990


cover contents index main page

last modified 8th August 2001
Copyright 2000-2001 Vladimir Zabrodsky
Czech Republic.

mail