BOYER-MOORE ALGORITHM

PROBLEM
The text is an array T.1,T.2,...,T.N and the pattern is an array P.1,P.2,...,P.M. The elements of P. and T. are characters drawn from a finite alphabet Sigma., Sigma. is an array of R elements. We say that pattern P. occurs with shift S in text T. if 0<=S<=N-M and if T.SpJ=P.J, where SpJ indicates S+J, for J=1,...,M. String-matching problem is the problem of finding all valid shifts with which a given pattern occurs in a given text.

ALGORITHM
The BOYER_MOORE_MATCHER algorithm was invented by Robert S. Boyer and J. Strother Moore. Its worst-case running time is O((N-M+1)*M+R).

PRACTICE
When a text and a pattern are real strings then the fastest solution is by REAL_STRING_MATCHER. In general case there is not an unambiguous winner. You have to test subroutines:

 NAIVE_STRING_MATCHER KMP_MATCHER BOYER_MOOR_MATCHER

on your data. Example, where M respectively R is number of elements in P. respectively in Sigma. and S is number of valid shifts:

Running time in seconds for N=100000
Algorithm M=10,R=1999,S=50 M=10,R=4,S=10000 M=100,R=4,S=10000
Naive 16  25  150
KMP 21  22  23
Boyer-Moore 15  138

IMPLEMENTATION
Unit: internal subroutine

Global variables: array T.1,...,T.N of strings, array P.1,...,P.M of strings, array Sigma.1,...,Sigma.R of strings

Parameters: positive integer N - number of elements in T., positive integer M - number of elements in P., positive integer R - number of elements in Sigma.

Output: displays on the screen Pattern occurs with shift S for all valid shift S

 BOYER_MOORE_MATCHER: procedure expose T. P. Sigma. parse arg M, N, R call COMPUTE_LAST_OCCURENCE_FUNCTION M, R call COMPUTE_GOOD_SUFFIX_FUNCTION M S = 0 do while S <= N - M   do J = M to 1 by -1     SpJ = S + J     if P.J <> T.SpJ then leave   end   if J = 0     then do       say "Pattern occurs at shift" S       S = S + Gama.0     end     else do       SpJ = S + J; TSpJ = T.SpJ       S = S + MAX(Gama.J, J - Lambda.TSpJ)     end end return   COMPUTE_LAST_OCCURENCE_FUNCTION: procedure expose P. Lambda. Sigma. parse arg M, R do I = 1 to R   SigmaI = Sigma.I; Lambda.SigmaI = 0 end do J = 1 to M   PJ = P.J; Lambda.PJ = J end return   COMPUTE_GOOD_SUFFIX_FUNCTION: procedure expose Gama. P. parse arg M call COMPUTE_PREFIX_FUNCTION M do I = 1 to M   PuvPi.I = Pi.I; PuvP.I = P.I end do I = M to 1 by -1   MmIp1 = M - I + 1; P.I = PuvP.MmIp1 end call COMPUTE_PREFIX_FUNCTION M do I = 1 to M; P.I = PuvP.I; end do J = 0 to M   Gama.J = M - PuvPi.M end do L = 1 to M   J = M - Pi.L   if Gama.J > L - Pi.L     then Gama.J = L - Pi.L end return   COMPUTE_PREFIX_FUNCTION: procedure expose Pi. P. parse arg M Pi.1 = 0; K = 0 do Q = 2 to M   Kp1 = K + 1   do while K > 0 & P.Kp1 <> P.Q     K = Pi.K; Kp1 = K + 1   end   Kp1 = K + 1   if P.Kp1 = P.Q then K = K + 1   Pi.Q = K end return

CONNECTIONS
String-Matching Problem
Matcher for real string
Naive string matcher
Knuth-Morris-Pratt matcher

Literature
Cormen T. H., Leiserson Ch. E., Rivest R. L. Introduction to Algorithms
The MIT Press, Cambridge, 1990

 cover contents index main page