BY JOHN HOPCROFT
COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY
Abstract

An algorithm is given for minimizing the number of states in a finite automaton or for determining if two finite automata are equivalent. The asymptotic running time of the algorithm is bounded by $k n \log n$, where $k$ is some constant and $n$ is the number of states. The constant $k$ depends linearly on the size of the input alphabet.
This research was supported by the National Science Foundation under grant number NSF-GJ-96, and the Office of Naval Research under grant number N-00014-67-A-0112-0057 NR 044-402. Reproduction in whole or in part is permitted for any purpose of the United States Government.
AN n log n ALGORITHM FOR MINIMIZING STATES IN A FINITE AUTOMATON John Hopcroft Stanford University
Introduction

Most basic texts on finite automata give algorithms for minimizing the number of states in a finite automaton [1, 2]. However, a worst case analysis of these algorithms indicates that they are $n^2$ processes, where $n$ is the number of states. For finite automata with large numbers of states, these algorithms are grossly inefficient. Thus in this paper we describe an algorithm for minimizing the states in which the asymptotic running time in a worst case analysis grows as $n \log n$. The constant of proportionality depends linearly on the number of input symbols. The same algorithm can be used to determine if two finite automata are equivalent.
The essence of previously published algorithms was to first partition the states according to their outputs. The blocks of the partition are then repeatedly refined by examining the successor state on a given input for each state in the block. States whose successor states on a given input are in different blocks are placed in separate blocks. When no further refinement is possible, all states in the same block of the partition are equivalent.

Consider the example in Figure 1. The initial partition is (1,2,3,4,5)(6). Since on input 0 the successor states of states 1, 2, 3 and 4 are in the first block of the partition and the successor of state 5 is in the second block, the first iteration refines the partition into the blocks (1,2,3,4), (5) and (6). Further iterations continue the refinement; in this example all pairs of states are inequivalent.

[Figure 1: state table of the example automaton, giving the next state of each state on each input.]

For this example it is seen that as many as $n$ iterations may be required and the total number of steps needed to execute the algorithm, if implemented in a straightforward fashion on a digital computer, is $n^2$. The algorithm proposed in this paper may also require $n$ iterations, but the work per iteration summed over all iterations yields only $n \log n$.
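The straightforward refinement described above can be sketched as follows. This is an illustrative sketch, not one of the published algorithms verbatim. The function name `naive_refine` and the 6-state transition table are assumptions: the table was chosen to agree with the partitions quoted in the text (state i goes to i+1 on input 0, state 6 stays at 6), and the behaviour on input 1 (every state stays put) is invented for the example.

```python
def naive_refine(states, inputs, delta, final):
    """Refine the final/nonfinal partition until no block splits.

    Every round re-examines every state on every input, which is the
    source of the quadratic behaviour discussed in the text.
    """
    # block_of[s] is the index of the block currently containing s.
    block_of = {s: (0 if s in final else 1) for s in states}
    rounds = 0
    while True:
        rounds += 1
        # A state's signature is its own block plus the blocks of its successors.
        signature = {
            s: (block_of[s],) + tuple(block_of[delta[s, a]] for a in inputs)
            for s in states
        }
        ids = {}
        new_block_of = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block_of.values())):
            break  # no block was split in this round
        block_of = new_block_of
    blocks = {}
    for s in states:
        blocks.setdefault(block_of[s], set()).add(s)
    return sorted(sorted(b) for b in blocks.values()), rounds


# Hypothetical example automaton (an assumption, see the lead-in).
states = [1, 2, 3, 4, 5, 6]
inputs = [0, 1]
delta = {}
for s in states:
    delta[s, 0] = min(s + 1, 6)  # chain 1 -> 2 -> ... -> 6 -> 6
    delta[s, 1] = s              # assumed: input 1 leaves the state unchanged
blocks, rounds = naive_refine(states, inputs, delta, final={6})
print(blocks, rounds)
```

On this chain one state is split off per round, so the number of rounds grows with the number of states while each round still scans all states.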
We illustrate the algorithm by an example before specifying it in detail. First the state table is inverted, so that for each state and input the previous states can be found directly (Figure 2).

[Figure 2: inverted state table, listing the previous states of each state.]

Next a block and an input symbol on which the partition is refined are selected. Assume block (6) and input 0 are selected. The states in each block are further partitioned depending on whether on input 0 their next state is in block (6) or not. Thus the next partition is (1,2,3,4)(5)(6). Note that had we partitioned on the block (1,2,3,4,5) and input 0 instead, we would have obtained the same refinement, since the next state of every state lies in exactly one of the two blocks.

More generally, once we have partitioned on a block and an input symbol, we need never partition on that block and input symbol again until the block is split, and then we need only partition on one of the two subblocks. Since the time needed to partition on a block is proportional to the number of transitions into the block, and since we can always select the half with fewer transitions, the total number of steps in the algorithm is bounded by $n \log n$.
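A single refinement step on a chosen block and input symbol might look like the sketch below. The helper names `invert` and `split_on` and the transition table are assumptions made for illustration (the table matches the partitions quoted in the text; its behaviour on input 1 is invented). For brevity the sketch loops over every block, whereas the algorithm in the paper only touches blocks that contain a predecessor of the chosen block.

```python
from collections import defaultdict


def invert(states, inputs, delta):
    """Build the inverted table: inv[s, a] lists the states t with delta(t, a) = s."""
    inv = defaultdict(list)
    for t in states:
        for a in inputs:
            inv[delta[t, a], a].append(t)
    return inv


def split_on(partition, splitter, a, inv):
    """Split each block by whether its states enter `splitter` on input a."""
    # Only the predecessors of the splitter are looked up in the inverted table.
    entering = {t for s in splitter for t in inv[s, a]}
    refined = []
    for block in partition:
        inside, outside = block & entering, block - entering
        refined.extend(b for b in (inside, outside) if b)
    return refined


# Hypothetical example automaton (an assumption, see the lead-in).
states = [1, 2, 3, 4, 5, 6]
inputs = [0, 1]
delta = {}
for s in states:
    delta[s, 0] = min(s + 1, 6)
    delta[s, 1] = s
inv = invert(states, inputs, delta)

partition = [{1, 2, 3, 4, 5}, {6}]
on_small = split_on(partition, {6}, 0, inv)
on_large = split_on(partition, {1, 2, 3, 4, 5}, 0, inv)
print(on_small, on_large)
```

Both calls yield the blocks (1,2,3,4), (5) and (6), which is why it suffices to partition on whichever of the two blocks has fewer incoming transitions.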
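Formal description of the algorithm

Let $A = (S, I, \delta, F)$ be a finite automaton where $S$ is a finite set of states, $I$ is a finite set of inputs, $\delta$ is a mapping from $S \times I$ into $S$ and $F \subseteq S$ is the set of final states. No initial state is specified since it is of no importance in what follows. The mapping $\delta$ is extended to $S \times I^*$ in the usual manner, where $I^*$ denotes the set of all finite length strings of symbols from $I$. States $s$ and $t$ are said to be equivalent if for each $x$ in $I^*$, $\delta(s,x)$ is in $F$ if and only if $\delta(t,x)$ is in $F$.

The algorithm for finding the equivalence classes of $S$ is described below.

Step 1. For each $s \in S$ and $a \in I$ construct $\delta^{-1}(s,a) = \{ t \mid \delta(t,a) = s \}$.

Step 2. Construct $B(1) = F$, $B(2) = S - F$ and for each $a$ in $I$ and $1 \le i \le 2$ construct $a(i) = \{ s \mid s \in B(i) \text{ and } \delta^{-1}(s,a) \ne \emptyset \}$.

Step 3. Set $k = 3$.

Step 4. For each $a \in I$ construct
$$L(a) = \begin{cases} \{1\} & \text{if } |a(1)| \le |a(2)| \\ \{2\} & \text{otherwise.} \end{cases}$$

Step 5. Select $a$ in $I$ and $i$ in $L(a)$. The algorithm terminates when $L(a) = \emptyset$ for each $a$ in $I$.

Step 6. Delete $i$ from $L(a)$.

Step 7. For each $j < k$ such that there exists $t$ in $B(j)$ with $\delta(t,a) \in a(i)$ perform Steps 7a, 7b, 7c and 7d.

Step 7a. Partition $B(j)$ into $B'(j) = \{ t \mid \delta(t,a) \in a(i) \}$ and $B''(j) = B(j) - B'(j)$.

Step 7b. Replace $B(j)$ by $B'(j)$ and construct $B(k) = B''(j)$. Construct the corresponding $a(j)$ and $a(k)$ for each $a$ in $I$.

Step 7c. For each $a \in I$ modify $L(a)$ as follows:
$$L(a) = \begin{cases} L(a) \cup \{j\} & \text{if } j \notin L(a) \text{ and } 0 < |a(j)| \le |a(k)| \\ L(a) \cup \{k\} & \text{otherwise.} \end{cases}$$

Step 7d. Set $k = k+1$.

Step 8. Return to Step 5.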
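The steps can be followed almost line by line in a short program. The sketch below is a minimal illustration under stated assumptions, not the ALGOL implementation discussed later. It uses Python sets instead of linked lists and recomputes each a(i) on demand, so it illustrates the control flow but does not attain the $n \log n$ bound. The function name `hopcroft`, the helper `a_set` and both example automata are assumptions made for the example.

```python
from collections import defaultdict


def hopcroft(states, inputs, delta, final):
    """Return the blocks of equivalent states of the automaton (S, I, delta, F)."""
    final = set(final)
    # Step 1: inv[s, a] = {t | delta(t, a) = s}.
    inv = defaultdict(set)
    for t in states:
        for a in inputs:
            inv[delta[t, a], a].add(t)

    # Step 2: B(1) = F, B(2) = S - F.
    B = {1: set(final), 2: set(states) - final}
    block_of = {s: (1 if s in final else 2) for s in states}

    def a_set(i, a):
        # a(i): states of B(i) with at least one predecessor on input a.
        # Recomputed on demand here (a simplification of this sketch).
        return {s for s in B[i] if inv[s, a]}

    k = 3                                               # Step 3
    L = {a: ({1} if len(a_set(1, a)) <= len(a_set(2, a)) else {2})
         for a in inputs}                               # Step 4

    while True:
        a = next((c for c in inputs if L[c]), None)     # Step 5
        if a is None:
            break                                       # every L(a) is empty
        i = L[a].pop()                                  # Step 6
        # Step 7: group by block the states t with delta(t, a) in a(i).
        hit = defaultdict(set)
        for s in a_set(i, a):
            for t in inv[s, a]:
                hit[block_of[t]].add(t)
        for j, moved in hit.items():
            if len(moved) == len(B[j]):
                continue                                # B''(j) is empty, no split
            B[k] = B[j] - moved                         # Steps 7a, 7b: B(k) = B''(j)
            B[j] = moved                                #               B(j) = B'(j)
            for t in B[k]:
                block_of[t] = k
            for c in inputs:                            # Step 7c
                nj, nk = len(a_set(j, c)), len(a_set(k, c))
                if j not in L[c] and 0 < nj <= nk:
                    L[c].add(j)
                else:
                    L[c].add(k)
            k += 1                                      # Step 7d
        # Step 8: return to Step 5.
    return sorted(sorted(b) for b in B.values() if b)


# Hypothetical chain automaton: all six states are pairwise inequivalent.
chain = {}
for s in range(1, 7):
    chain[s, 0] = min(s + 1, 6)
    chain[s, 1] = s
chain_blocks = hopcroft(list(range(1, 7)), [0, 1], chain, {6})

# Hypothetical automaton with two pairs of equivalent states.
pairs = {(1, 0): 2, (2, 0): 1, (3, 0): 4, (4, 0): 3}
pair_blocks = hopcroft([1, 2, 3, 4], [0], pairs, {1, 3})
print(chain_blocks, pair_blocks)
```

Keeping the index j for B'(j) and giving the new index k to B''(j) follows Step 7b; a tuned implementation would avoid relabelling the larger part.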
Correctness of algorithm

The claim is made that on termination of the algorithm two states are equivalent if and only if they are in the same block. The algorithm must terminate since the only times that an index is added to $L(a)$ for some $a$ in $I$ is after a refinement of a block of the partition. The partition can be refined at most $n-1$ times, and each pass through Steps 5 to 8 removes an index from $L(a)$ for some $a$. Thus the algorithm must terminate.

It is easily shown by induction on the number of times Step 7a is executed that if $s$ is in $B(i)$ and $t$ is in $B(j)$, $i \ne j$, then $s$ is not equivalent to $t$. Clearly, it is true the first time Step 7a is executed since only two blocks exist, $B(1)$ and $B(2)$, consisting of the final and the nonfinal states respectively. Blocks are refined at Step 7a only when the successors of two states on a given input are in blocks already shown to be inequivalent.

To see that two inequivalent states cannot be in the same block when the algorithm terminates, assume that states $s$ and $t$ are in $B(i)$ and that $s$ and $t$ are not equivalent. Without loss of generality, assume $\delta(s,a)$ and $\delta(t,a)$ are in different blocks for some $a$. (If $\delta(s,a)$ and $\delta(t,a)$ are in the same block for every $a$, then there exists a shortest $x$ such that $\delta(s,x)$ and $\delta(t,x)$ are in different blocks. Such an $x$ exists, and hence a shortest $x$, since one or the other of $\delta(s,x)$ and $\delta(t,x)$ but not both is a final state for some $x$, and each block consists solely of final or solely of nonfinal states. Let $a$ be the last symbol of $x$ and write $x = ya$. Then $\delta(s,y)$ and $\delta(t,y)$ are in the same block, are not equivalent, and $\delta(\delta(s,y),a)$ and $\delta(\delta(t,y),a)$ are in different blocks. Replace $s$ by $\delta(s,y)$ and $t$ by $\delta(t,y)$.)

Consider the point at which the block containing $\delta(s,a)$ and $\delta(t,a)$ was partitioned so that $\delta(s,a)$ and $\delta(t,a)$ first appeared in separate subblocks. At that point one of the two subblocks was placed in $L(a)$. When this subblock is removed from $L(a)$, the block containing $s$ and $t$ is partitioned with $s$ and $t$ being placed in separate blocks, a contradiction. Thus $s$ and $t$ cannot both be in $B(i)$.
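For small automata the definition of equivalence used in this argument can be checked by brute force, which gives a reference against which the blocks produced by any implementation may be compared. The sketch below is an assumption-laden illustration: the names `run` and `equivalent` and the example automaton are invented, and it relies on the standard fact that two inequivalent states of an $n$-state automaton are already distinguished by some string of length less than $n$. The search is exponential in $n$ and is meant only for tiny examples.

```python
from itertools import product


def run(delta, s, x):
    """Extend delta to strings: return delta(s, x)."""
    for a in x:
        s = delta[s, a]
    return s


def equivalent(states, inputs, delta, final, s, t):
    """Check the definition directly on all strings of length 0 .. n-1."""
    n = len(states)
    for length in range(n):
        for x in product(inputs, repeat=length):
            if (run(delta, s, x) in final) != (run(delta, t, x) in final):
                return False  # x distinguishes s from t
    return True


# Hypothetical 4-state automaton over a single input symbol.
states = [1, 2, 3, 4]
inputs = [0]
delta = {(1, 0): 2, (2, 0): 1, (3, 0): 4, (4, 0): 3}
final = {1, 3}
same = equivalent(states, inputs, delta, final, 1, 3)
different = equivalent(states, inputs, delta, final, 1, 2)
print(same, different)
```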
Analysis of the running time

The running time of the algorithm is clearly dependent on the implementation. The algorithm has been programmed in ALGOL. Since the implementation consists of approximately 300 ALGOL statements we shall simply indicate how the various steps were implemented and discuss informally their running time.

$\delta^{-1}(s,a)$, $L(a)$, etc. were represented by linked lists in such a way that an item could be added or deleted at the beginning of the list in a fixed number of steps. Vectors were also maintained to indicate if a state was or was not on a given list. This eliminates searching a list to determine if the item is on the list and is essential in Step 7c. The sets $B(i)$ and $a(i)$ were represented as doubly linked lists so that an item could be added or deleted in a fixed number of steps once the position is given. Given a state $s$, its position in $B(i)$ and $a(i)$ could be determined in a fixed number of steps.

Steps 1, 2, 3 and 4 are executed only once and require time proportional to the product of the number of states and the number of input symbols. Steps 5 through 8 form a simple loop. The time to traverse the loop for given $a$ in $I$ and $i$ in $L(a)$ is proportional to the number of state transitions on input $a$ terminating on states in $B(i)$. (To see this, note that Steps 5, 6 and 8 take a fixed number of steps. In Step 7, to see if there exists a $t$ in $B(j)$ with $\delta(t,a)$ in $a(i)$, the lists $\delta^{-1}(s,a)$ are scanned for each $s$ in $a(i)$. Each time a new $t$ is found, the block containing $t$ is located and $t$ is placed on a list of states to be split off from the block. The block is then placed on a list of blocks which have been refined, if it is not already on the list. Finally we go down the list of blocks which have been refined and actually partition them. The number of blocks we must look at must be less than the number of state transitions on input $a$ terminating on states in $B(i)$. The time necessary to actually partition a block is proportional to the number of states to be split off. When this number is summed over all blocks which are partitioned it adds up to the number of state transitions on input $a$ terminating on states in $B(i)$.) Let $k$ be the constant of proportionality.
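The list representation described above can be sketched with parallel arrays. This is a minimal illustration, not the ALGOL data layout: the class name `BlockLists` and its method names are assumptions. States are numbered from 1 and the value 0 serves as the end-of-list marker. The `block` vector plays the role of the membership vectors, so a state's list and position are known without searching.

```python
class BlockLists:
    """Doubly linked lists of states 1..n threaded through arrays."""

    def __init__(self, n, num_blocks):
        self.head = [0] * (num_blocks + 1)  # head[b] = first state of list b, 0 if empty
        self.next = [0] * (n + 1)
        self.prev = [0] * (n + 1)
        self.block = [0] * (n + 1)          # block[s] = list holding s, 0 if none

    def add(self, b, s):
        """Insert state s at the beginning of list b in a fixed number of steps."""
        first = self.head[b]
        self.next[s], self.prev[s] = first, 0
        if first:
            self.prev[first] = s
        self.head[b] = s
        self.block[s] = b

    def remove(self, s):
        """Unlink state s from its list in a fixed number of steps."""
        b = self.block[s]
        p, nx = self.prev[s], self.next[s]
        if p:
            self.next[p] = nx
        else:
            self.head[b] = nx
        if nx:
            self.prev[nx] = p
        self.block[s] = 0

    def items(self, b):
        """Yield the states of list b from front to back."""
        s = self.head[b]
        while s:
            yield s
            s = self.next[s]


lists = BlockLists(n=3, num_blocks=2)
for s in (1, 2, 3):
    lists.add(1, s)      # list 1 is now 3, 2, 1
lists.remove(2)          # split state 2 off ...
lists.add(2, 2)          # ... into list 2
print(list(lists.items(1)), list(lists.items(2)))
```

Moving a state between blocks costs a fixed number of array updates, which is what makes the time to partition a block proportional to the number of states split off.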
Consider the time spent in the loop of Steps 5 through 8 for a given input symbol $a$. Assume that at Step 5 the blocks of the partition are $B(1), B(2), \ldots, B(m)$ and that $L(a) = \{i_1, i_2, \ldots, i_r\}$. Let $\{i_{r+1}, i_{r+2}, \ldots, i_m\} = \{1, 2, \ldots, m\} - L(a)$. The claim is made that the total time spent on traversals of the loop in which $a$ is selected in Step 5 until the program terminates is bounded by

$$T = k \left( \sum_{j=1}^{r} a_{i_j} \log a_{i_j} + \sum_{j=r+1}^{m} a_{i_j} \log (a_{i_j}/2) \right).$$

Clearly the bound is valid if the algorithm has terminated. If an input symbol other than $a$ is selected at Step 5, then the time spent is not included in $T$. However, since blocks get split and the set $L(a)$ is modified, we must show that the new value of $T$, call it $T'$, is less than or equal to the old value of $T$. If a block whose index is in $L(a)$ is partitioned, then a term of the form $b \log b$ is replaced by $c \log c + (b-c) \log (b-c)$, which decreases the value of $T$. If a block whose index is not in $L(a)$ is partitioned, then a term of the form $b \log (b/2)$ is replaced by $c \log c + (b-c) \log ((b-c)/2)$ where $c \le b-c$. Since $c \le b/2$ and $(b-c)/2 \le b/2$,

$$c \log c + (b-c) \log ((b-c)/2) \le c \log (b/2) + (b-c) \log (b/2) = b \log (b/2).$$

In either case $T'$ is less than $T$.

Finally, assume $r \ne 0$ and that $a$ and some index in $L(a)$, say $i_1$, have been selected at Step 5. As we showed earlier, the time around the loop is bounded by $k a_{i_1}$. Thus by induction, the total time is bounded by

$$k \left[ a_{i_1} + a_{i_1} \log (a_{i_1}/2) + \sum_{j=2}^{r} a_{i_j} \log a_{i_j} + \sum_{j=r+1}^{m} a_{i_j} \log (a_{i_j}/2) \right].$$

We must show that this expression is less than or equal to $T$. That is, we need show

$$a_{i_1} + a_{i_1} \log (a_{i_1}/2) \le a_{i_1} \log a_{i_1}.$$

Clearly this holds, with equality when logarithms are taken to base 2. This completes the proof of the claim.

Multiplying by the number of input symbols and adding in the time for Steps 1 through 4 yields a total bound proportional to $n \log n$.
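The two replacement inequalities and the final step of the induction can be spot-checked numerically. The sketch below is only a sanity check over a finite range, not a proof. The function names and the range limit are assumptions, and logarithms are taken to base 2.

```python
from math import log2


def check_split_inequalities(max_b=200, eps=1e-9):
    """Check both term replacements for all b <= max_b and all c <= b - c."""
    for b in range(2, max_b + 1):
        for c in range(1, b // 2 + 1):
            d = b - c
            # Index in L(a): b log b is replaced by c log c + (b-c) log(b-c).
            if c * log2(c) + d * log2(d) > b * log2(b) + eps:
                return False
            # Index not in L(a): b log(b/2) is replaced by
            # c log c + (b-c) log((b-c)/2).
            if c * log2(c) + d * log2(d / 2) > b * log2(b / 2) + eps:
                return False
    return True


def selection_step_gap(a):
    """Return a + a log(a/2) - a log a, which vanishes for base-2 logarithms."""
    return a + a * log2(a / 2) - a * log2(a)


ok = check_split_inequalities()
gap = selection_step_gap(10)
print(ok, gap)
```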
Experimental results and conclusions

In order to obtain timing information, the algorithm was applied to two classes of finite automata. Automata in the first class are given by $A(n) = ( \ldots )$ where $\delta(i,0) = i-1$ and $\delta(i,1) = i$ for $2 \le i \le n$. Automata in the second class are given by $B(n) = (\{1,2,\ldots,n\}, \{0,1\}, \delta, \{ i \mid 1 \le i \le n/2 \})$ where … $\delta(n/4+i, 0) = \delta(n/4+i, 1) = 2i-1$ for $n/2 < i \le n$.

The running times on an IBM 360/67 for the two classes are listed in Table 1.

  n       100    1000    2003
  A(n)
  B(n)

Table 1.

Note that $A(n)$ is the example which required $n^2$ steps for the straightforward algorithm. Our algorithm is particularly suited for $A(n)$ and a detailed analysis shows that the running time should grow linearly with the number of states, as the experimental evidence indicates. The worst case for our algorithm is typified by $B(n)$, in which blocks are always partitioned equally. The running time for $B(n)$ should grow as $n \log n$ for both the current algorithm and for previously published algorithms. The results seem to indicate that the algorithm is practical for minimizing states in finite automata (or testing equivalence of finite automata) of up to several thousand states.
References

1. Harrison, M. A., Introduction to Switching and Automata Theory, McGraw-Hill, New York, 1965.
2. McCluskey, E. J., Introduction to the Theory of Switching Circuits, McGraw-Hill, New York, 1965.
BEGIN
  ADD(BLOCKLISTZERO,2);
  ONBLOCKLISTZERO(2):=2;
END;
IF ( … ) THEN
BEGIN
  ADD(BLOCKLISTONE,1);
  ONBLOCKLISTONE(1):=1;
END
ELSE
BEGIN
  ADD(BLOCKLISTONE,2);
  ONBLOCKLISTONE(2):=2;
END;
BLOCK:=3;
…
J:=DATA(BLOCKLISTONE);
REMOVEFROM(BLOCKLISTONE);
ONBLOCKLISTONE(J):= …;
GO TO DIVONE;
…
END
ELSE GO TO FINISHED;
DIVZERO:
STATE:=ZERONEXT(J);
WHILE STATE¬=0 DO
BEGIN
  CELL:=PREVONZERO(STATE-N);
  WHILE CELL¬=0 DO
  BEGIN
    LASTSTATE:=DATA(CELL);
    K:=BLOCK(LASTSTATE);
    ADD(SPLIT(K),LASTSTATE);
    IF BLOCKSPLIT(K)=0 THEN ADD(SPLITBLOCKS,K);
    NUMINSPLIT(K):=NUMINSPLIT(K)+1;
    BLOCKSPLIT(K):=1;
    CELL:=NEXT(CELL);
  END;
  STATE:=ZERONEXT(STATE);
END;
GO TO RETURN;
DIVONE:
STATE:=ONENEXT(J);
WHILE STATE¬=0 DO
BEGIN
  CELL:=PREVONONE(STATE-N);
  WHILE CELL¬=0 DO
  BEGIN
    LASTSTATE:=DATA(CELL);
    K:=BLOCK(LASTSTATE);
    ADD(SPLIT(K),LASTSTATE);
    IF BLOCKSPLIT(K)=0 THEN ADD(SPLITBLOCKS,K);
    NUMINSPLIT(K):=NUMINSPLIT(K)+1;
    BLOCKSPLIT(K):=1;
    CELL:=NEXT(CELL);
  END;
  STATE:=ONENEXT(STATE);
END;
RETURN:
COMMENT REFINE BLOCKS;
IF (SPLITBLOCKS=0) THEN GO TO REPEAT;
BSPLIT:=DATA(SPLITBLOCKS);
REMOVEFROM(SPLITBLOCKS);
BLOCKSPLIT(BSPLIT):=0;
IF NUMINBLOCK(BSPLIT)=NUMINSPLIT(BSPLIT) THEN
BEGIN
  NUMINSPLIT(BSPLIT):=0;
  WHILE (SPLIT(BSPLIT)¬=0) DO REMOVEFROM(SPLIT(BSPLIT));
  GO TO RETURN;
…