
AN n LOG n ALGORITHM FOR MINIMIZING STATES IN A FINITE AUTOMATON

BY JOHN HOPCROFT

STAN-CS-71-190 January, 1971

COMPUTER SCIENCE DEPARTMENT
School of Humanities and Sciences
STANFORD UNIVERSITY


Abstract

An algorithm is given for minimizing the number of states in a finite automaton or for determining if two finite automata are equivalent. The asymptotic running time of the algorithm is bounded by $k n \log n$ where $k$ is some constant and $n$ is the number of states. The constant $k$ depends linearly on the size of the input alphabet.

This research was supported by the National Science Foundation under grant number NSF-GJ-96, and the Office of Naval Research under grant number N-00014-67-A-0112-0057 NR 044-402. Reproduction in whole or in part is permitted for any purpose of the United States Government.


Introduction

Most basic texts on finite automata give algorithms for minimizing the number of states in a finite automaton [1, 2]. However, a worst case analysis of these algorithms indicates that they are $n^2$ processes where $n$ is the number of states. For finite automata with large numbers of states, these algorithms are grossly inefficient. Thus in this paper we describe an algorithm for minimizing the states in which the asymptotic running time in a worst case analysis grows as $n \log n$. The constant of proportionality depends linearly on the number of input symbols. Clearly the same algorithm can be used to determine if two finite automata are equivalent.

The essence of previously published algorithms was to first partition the states according to their outputs. The blocks of the partition are then repeatedly refined by examining the successor state on a given input for each state in the block. States whose successor states on a given input are in different blocks are placed in separate blocks. When no further refinement is possible, all states in the same block of the partition can be shown to be equivalent.

Consider the example in Figure 1. The initial partition is (1,2,3,4,5)(6). Since on input 0 the successor states of states 1, 2, 3 and 4 are in the first block of the partition and the successor of state 5 is in the second block, the first iteration refines the partition into the blocks (1,2,3,4)(5) and (6). Successive refinements yield (1,2,3)(4)(5)(6); (1,2)(3)(4)(5)(6) and (1)(2)(3)(4)(5)(6). Thus, in this example all pairs of states are inequivalent.

         Next State
State  Input 0  Input 1  Output
  1       2        1       0
  2       3        2       0
  3       4        3       0
  4       5        4       0
  5       6        5       0
  6       6        6       1

Figure 1

For this example it is seen that as many as $n$ iterations may be required and the total number of steps needed to execute the algorithm if implemented in a straightforward fashion on a digital computer is $n^2$. The algorithm proposed in this paper may also require $n$ iterations but the work per iteration summed over all iterations yields only $n \log n$.
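The straightforward refinement described above can be sketched in Python as follows. This is only an illustration of the previously published approach, not code from the report. The function names are hypothetical, and the state table is the one read from Figure 1, under the assumption that input 1 leaves every state unchanged.

```python
def refine_once(partition, delta, symbols):
    """One pass: split every block by the blocks of its successors."""
    block_of = {s: i for i, block in enumerate(partition) for s in block}
    new_partition = []
    for block in partition:
        groups = {}
        for s in block:
            # Signature: index of the successor's block for each input symbol.
            key = tuple(block_of[delta[s][a]] for a in symbols)
            groups.setdefault(key, []).append(s)
        new_partition.extend(groups.values())
    return new_partition


def naive_refinement(states, symbols, delta, output):
    """Return the list of successive partitions, coarsest first."""
    by_output = {}
    for s in states:
        by_output.setdefault(output[s], []).append(s)
    history = [list(by_output.values())]
    while True:
        nxt = refine_once(history[-1], delta, symbols)
        if len(nxt) == len(history[-1]):   # no block was split: stop
            return history
        history.append(nxt)


# State table assumed from Figure 1: delta[state][input] -> next state.
delta = {1: {0: 2, 1: 1}, 2: {0: 3, 1: 2}, 3: {0: 4, 1: 3},
         4: {0: 5, 1: 4}, 5: {0: 6, 1: 5}, 6: {0: 6, 1: 6}}
output = {1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1}
history = naive_refinement(range(1, 7), (0, 1), delta, output)
for partition in history:
    print(partition)
```

Each pass examines every state, and on this example each pass splits off only one state, which is the behaviour behind the $n^2$ estimate.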

We illustrate the algorithm by an example before specifying it in detail. Extensive use of list processing is employed to reduce the computation time. First the state table is inverted to obtain the table shown in Figure 2. The states are partitioned according to their outputs: (1,2,3,4,5)(6). Next a block and an input symbol on which the partition is refined are selected. Assume the block (6) and input 0 are selected. The states in each block are further partitioned depending on whether on input 0 their next state is in block (6) or not. Thus the next partition is (1,2,3,4)(5)(6). Note that had we partitioned on the block (1,2,3,4,5) and input 0 we would have obtained the same result.

         Previous state
State  Input 0  Input 1  Output
  1       -        1       0
  2       1        2       0
  3       2        3       0
  4       3        4       0
  5       4        5       0
  6      5,6       6       1

Figure 2

More generally, once we have partitioned on a block and an input symbol, we need never partition on that block and input symbol again until the block is split and then we need only partition on one of the two subblocks. Since the time needed to partition on a block is proportional to the transitions into the block and since we can always select the half with fewer transitions, the total number of steps in the algorithm is bounded by $n \log n$.
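A small Python sketch of this example: the state table is inverted, and the initial partition is split once on block (6) and once on block (1,2,3,4,5), both with input 0. The helper names `invert` and `split_on` are hypothetical, and the table is the one assumed from Figure 1. For brevity `split_on` scans every block, which the actual algorithm avoids.

```python
from collections import defaultdict


def invert(delta):
    """Build inverse[(s, a)] = list of states t with delta[t][a] == s."""
    inverse = defaultdict(list)
    for t, row in delta.items():
        for a, s in row.items():
            inverse[(s, a)].append(t)
    return inverse


def split_on(partition, inverse, splitter, a):
    """Split each block by whether its successor on a lies in `splitter`."""
    hit = {t for s in splitter for t in inverse[(s, a)]}
    result = []
    for block in partition:
        inside = [t for t in block if t in hit]
        outside = [t for t in block if t not in hit]
        result.extend(part for part in (inside, outside) if part)
    return result


# State table assumed from Figure 1: delta[state][input] -> next state.
delta = {1: {0: 2, 1: 1}, 2: {0: 3, 1: 2}, 3: {0: 4, 1: 3},
         4: {0: 5, 1: 4}, 5: {0: 6, 1: 5}, 6: {0: 6, 1: 6}}
inverse = invert(delta)
start = [[1, 2, 3, 4, 5], [6]]
via_small = split_on(start, inverse, [6], 0)
via_large = split_on(start, inverse, [1, 2, 3, 4, 5], 0)
print(sorted(via_small), sorted(via_large))
```

Both calls give the same three blocks, but the first one follows only the two transitions that enter state 6, which is why the smaller half is the cheaper choice.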

Formal description of the algorithm

Let $A = (S, I, \delta, F)$ be a finite automaton where $S$ is a finite set of states, $I$ is a finite set of inputs, $\delta$ is a mapping from $S \times I$ into $S$ and $F \subseteq S$ is the set of final states. No initial state is specified since it is of no importance in what follows. The mapping $\delta$ is extended to $S \times I^*$ in the usual manner where $I^*$ denotes the set of all finite length strings of symbols from $I$. States $s$ and $t$ are said to be equivalent if for each $x$ in $I^*$, $\delta(s,x)$ is in $F$ if and only if $\delta(t,x)$ is in $F$.

The algorithm for finding the equivalence classes of $S$ is described below.

Step 1. $\forall s \in S$ and $a \in I$ construct $\delta^{-1}(s,a) = \{t \mid \delta(t,a) = s\}$.

Step 2. Construct $B(1) = F$, $B(2) = S - F$ and for each $a$ in $I$ and $1 \le i \le 2$ construct $a(i) = \{s \mid s \in B(i) \text{ and } \delta^{-1}(s,a) \ne \emptyset\}$.

Step 3. Set $k = 3$.

Step 4. $\forall a \in I$ construct
$$L(a) = \begin{cases} \{1\} & \text{if } |a(1)| \le |a(2)| \\ \{2\} & \text{otherwise.} \end{cases}$$

Step 5. Select $a$ in $I$ and $i$ in $L(a)$. The algorithm terminates when $L(a) = \emptyset$ for each $a$ in $I$.

Step 6. Delete $i$ from $L(a)$.

Step 7. $\forall j < k$ such that $\exists t$ in $B(j)$ with $\delta(t,a) \in a(i)$ perform steps 7a, 7b, 7c, and 7d.

Step 7a. Partition $B(j)$ into $B'(j) = \{t \mid \delta(t,a) \in a(i)\}$ and $B''(j) = B(j) - B'(j)$.

Step 7b. Replace $B(j)$ by $B'(j)$ and construct $B(k) = B''(j)$. Construct corresponding $a(j)$ and $a(k)$ for each $a$ in $I$.

Step 7c. $\forall a \in I$ modify $L(a)$ as follows.
$$L(a) = \begin{cases} L(a) \cup \{j\} & \text{if } j \notin L(a) \text{ and } 0 < |a(j)| \le |a(k)| \\ L(a) \cup \{k\} & \text{otherwise.} \end{cases}$$

Step 7d. Set $k = k + 1$.

Step 8. Return to Step 5.
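The steps above can be transcribed into Python almost literally. The sketch below is an assumption-laden illustration, not the report's ALGOL program. It uses Python sets instead of linked lists, so it follows the steps but does not achieve the $n \log n$ bound. The names `hopcroft_blocks` and `with_preds` are hypothetical, and the test automaton is the table assumed from Figure 1. Blocks whose second part would be empty are skipped rather than given a new index.

```python
def hopcroft_blocks(states, symbols, delta, final):
    """Return the blocks of equivalent states; delta maps (state, symbol) -> state."""
    # Step 1: inverse transition table.
    inv = {(s, a): set() for s in states for a in symbols}
    for t in states:
        for a in symbols:
            inv[(delta[(t, a)], a)].add(t)

    def with_preds(block, a):
        # The set a(i): states of the block with at least one predecessor on a.
        return {s for s in block if inv[(s, a)]}

    # Step 2: B(1) = F, B(2) = S - F.
    B = {1: set(final), 2: set(states) - set(final)}
    k = 3                                            # Step 3
    # Step 4: put the index of the smaller a-set on each list L(a).
    L = {a: {1 if len(with_preds(B[1], a)) <= len(with_preds(B[2], a)) else 2}
         for a in symbols}

    while True:
        # Step 5: pick a symbol whose list is nonempty; stop if none is left.
        a = next((x for x in symbols if L[x]), None)
        if a is None:
            break
        i = L[a].pop()                               # Step 6
        preds = {t for s in B[i] for t in inv[(s, a)]}
        # Step 7: every block containing a predecessor of B(i) on input a.
        for j in [j for j in B if B[j] & preds]:
            inside, outside = B[j] & preds, B[j] - preds   # Step 7a
            if not outside:
                continue                             # nothing to split off
            B[j], B[k] = inside, outside             # Step 7b
            for c in symbols:                        # Step 7c
                aj, ak = with_preds(B[j], c), with_preds(B[k], c)
                if j not in L[c] and 0 < len(aj) <= len(ak):
                    L[c].add(j)
                else:
                    L[c].add(k)
            k += 1                                   # Step 7d
        # Step 8: return to Step 5.
    return [sorted(block) for block in B.values() if block]


# Automaton assumed from Figure 1: input 0 advances, input 1 stays put.
delta = {(i, 0): min(i + 1, 6) for i in range(1, 7)}
delta.update({(i, 1): i for i in range(1, 7)})
print(sorted(hopcroft_blocks(range(1, 7), (0, 1), delta, {6})))
```

On the Figure 1 automaton every state ends up in its own block, in agreement with the worked example.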

Correctness of algorithm

The claim is made that on termination of the algorithm two states are equivalent if and only if they are in the same block. The algorithm must terminate since the only times that an index is added to $L(a)$ for some $a$ in $I$ are in Step 4 which is executed only once and in Step 7c. An index is added at Step 7c only after a refinement of a block of the partition. Each time Step 6 is executed, an index is removed from $L(a)$ for some $a$. Thus the algorithm must terminate.

It is easily shown by induction on the number of times Step 7a is executed that if $s$ is in $B(i)$ and $t$ is in $B(j)$, $i \ne j$, then $s$ is not equivalent to $t$. Clearly, it is true the first time Step 7a is executed since only two blocks exist, $B(1)$ containing only final states and $B(2)$ containing only nonfinal states. Blocks are refined at Step 7a only when successor states on a given input have previously been shown to be inequivalent.

To see that two inequivalent states cannot be in the same block when the algorithm terminates, assume that states $s$ and $t$ are in $B(i)$ and that $s$ and $t$ are not equivalent. Without loss of generality, assume $\delta(s,a)$ is in $B(j)$ and $\delta(t,a)$ is in $B(k)$ where $j \ne k$. (If $\delta(s,a)$ and $\delta(t,a)$ are in the same block then there exists a shortest $x$ such that $\delta(s,x)$ and $\delta(t,x)$ are in distinct blocks. Clearly an $x$ exists and hence a shortest $x$ since for some $x$ one or the other of $\delta(s,x)$ and $\delta(t,x)$ but not both is a final state and each block consists solely of final or solely of nonfinal states. Let $a$ be the last symbol of $x$ and write $x = ya$. Then $\delta(s,y)$ and $\delta(t,y)$ are not equivalent, $\delta(s,y)$ and $\delta(t,y)$ are in the same block, and $\delta(\delta(s,y),a)$ and $\delta(\delta(t,y),a)$ are in different blocks. Replace $s$ by $\delta(s,y)$ and replace $t$ by $\delta(t,y)$.) Consider the point at which the block containing $\delta(s,a)$ and $\delta(t,a)$ was partitioned so that $\delta(s,a)$ and $\delta(t,a)$ first appeared in separate subblocks. At that point one of the two subblocks was placed in $L(a)$. When this subblock is removed from $L(a)$, the block containing $s$ and $t$ is partitioned with $s$ and $t$ going into separate subblocks. Thus $s$ and $t$ cannot both be in $B(i)$, a contradiction.

Analysis of the running time

The running time of the algorithm is clearly dependent on the implementation. The algorithm has been programmed in ALGOL. Since the implementation consists of approximately 300 ALGOL statements we shall simply indicate how the various steps were implemented and discuss informally their running time.

The sets such as $\delta^{-1}(s,a)$, $L(a)$, etc. were represented by linked lists in such a way that an item could be added or deleted at the beginning of the list in a fixed number of steps. Vectors were also maintained to indicate if a state was or was not on a certain list. This eliminates searching a list simply to determine if the item is on the list and is essential in Step 7c. The sets $B(i)$ and $a(i)$ were represented as doubly linked lists so that an item could be added or deleted anywhere in the list in a fixed number of steps once the position is given. The structure was such that given a state $s$, its position in $B(i)$ and $a(i)$ could be determined in a fixed number of steps.

Steps 1, 2, 3 and 4 are executed only once and require time proportional to the product of the number of states times the number of input symbols. Steps 5 through 8 form a simple loop. The time necessary to traverse the loop for given $a$ in $I$ and $i$ in $L(a)$ is proportional to the number of state transitions on input $a$ terminating on states in $B(i)$. (To see this, note that Steps 5, 6 and 8 each take a fixed number of steps. In Step 7 we do not need to examine $B(j)$ for each $j < k$ to see if there exists a $t$ in $B(j)$ with $\delta(t,a)$ in $a(i)$. Rather we look at each state in $a(i)$ and then consult the inverse state table to find each $t$ such that $\delta(t,a)$ is in $a(i)$. Each time a new $t$ is found, the block containing $t$ is located and $t$ placed on a list of states to be split off from the block. The block is then placed on a list of blocks which have been refined if it is not already on the list. Finally we go down the list of blocks which have been refined and actually partition them. The number of blocks we must look at must be less than the number of state transitions on input $a$ terminating on states in $B(i)$. The time necessary to actually partition a block is proportional to the number of states to be split off. When the number is summed over all blocks which are partitioned it adds up to the number of state transitions on input $a$ terminating on states in $B(i)$.) Let $k$ be the constant of proportionality.
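The way Step 7 is carried out through the inverse state table can be sketched as below. This is an illustration under assumptions, not the report's code. The name `split_by_predecessors` is hypothetical, blocks are Python sets rather than doubly linked lists, block indices are assumed to be 1 through m, and the states that were found are the ones moved to the new index, mirroring the "list of states to be split off" in the text.

```python
def split_by_predecessors(blocks, block_of, inverse, i, a):
    """Split blocks using only the transitions on `a` that end in blocks[i].

    blocks:   dict index -> set of states (indices assumed to be 1..m)
    block_of: dict state -> index of its block
    inverse:  dict (state, symbol) -> list of predecessor states
    Returns the list of (old_index, new_index) pairs that were created.
    """
    to_split = {}                        # block index -> states to split off
    for s in blocks[i]:
        for t in inverse.get((s, a), ()):
            to_split.setdefault(block_of[t], set()).add(t)

    created = []
    next_index = len(blocks) + 1
    for j, moved in to_split.items():
        if len(moved) == len(blocks[j]):
            continue                     # every state was found: no real split
        blocks[j] -= moved
        blocks[next_index] = moved
        for t in moved:
            block_of[t] = next_index
        created.append((j, next_index))
        next_index += 1
    return created


# Inverse table on input 0, assumed from Figure 2.
inverse = {(2, 0): [1], (3, 0): [2], (4, 0): [3], (5, 0): [4], (6, 0): [5, 6]}
blocks = {1: {6}, 2: {1, 2, 3, 4, 5}}
block_of = {1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 1}
created = split_by_predecessors(blocks, block_of, inverse, 1, 0)
print(created, blocks)
```

Only the predecessors of states in the chosen block are touched, so the work for one call is proportional to the number of transitions entering that block, as the informal argument requires.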

Consider the time spent in the loop of Steps 5 through 8 for a given input symbol $a$. Assume that at Step 5 the blocks of the partition are $B(1), B(2), \ldots, B(m)$ and that $L(a) = \{i_1, i_2, \ldots, i_r\}$. Let $\{i_{r+1}, i_{r+2}, \ldots, i_m\} = \{1, 2, \ldots, m\} - L(a)$. The claim is made that the total time spent on traversals of the loop for which input symbol $a$ is selected in Step 5 until the program terminates is bounded by

$$T = k\left(\sum_{j=1}^{r} a_{i_j} \log a_{i_j} + \sum_{j=r+1}^{m} a_{i_j} \log\frac{a_{i_j}}{2}\right).$$

Clearly the bound is valid if the algorithm has terminated. If the loop is traversed for an input symbol other than $a$, then the time spent is not included in $T$. However, since blocks get split and the set $L(a)$ modified we must show that the new value of $T$, call it $T'$, is less than or equal to the old value of $T$. If a block whose index is in $L(a)$ is partitioned, then a term of the form $b \log b$ is replaced by the expression $c \log c + (b-c) \log(b-c)$ which decreases the value of $T$. If a block whose index is not in $L(a)$ is partitioned, then a term of the form $b \log(b/2)$ is replaced by the expression $c \log c + (b-c) \log((b-c)/2)$ where $c \le b-c$. Since $c \le b/2$ and $(b-c)/2 \le b/2$,

$$c \log c + (b-c) \log\frac{b-c}{2} \le c \log\frac{b}{2} + (b-c) \log\frac{b}{2} \le b \log\frac{b}{2}.$$

In either case $T'$ is no greater than $T$.

Finally, assume $r \ne 0$ and $a$ and some $i_1$ in $L(a)$ have been selected at Step 5. As we showed earlier, the time around the loop is bounded by $k a_{i_1}$. Thus by induction, the total time is bounded by

$$k\left[a_{i_1} + a_{i_1} \log\frac{a_{i_1}}{2} + \sum_{j=2}^{r} a_{i_j} \log a_{i_j} + \sum_{j=r+1}^{m} a_{i_j} \log\frac{a_{i_j}}{2}\right].$$

We must show that this expression is less than or equal to $T$. That is, we need show

$$a_{i_1} + a_{i_1} \log\frac{a_{i_1}}{2} \le a_{i_1} \log a_{i_1}.$$

Clearly, with logarithms taken to the base 2,

$$a_{i_1} + a_{i_1} \log\frac{a_{i_1}}{2} = a_{i_1}\left(\log\frac{a_{i_1}}{2} + \log 2\right) = a_{i_1} \log a_{i_1}.$$

This completes the proof of the claim.

The first time Step 5 is executed the formula for $T$ is bounded by $k n \log n$. Multiplying by the number of input symbols and adding in the time for Steps 1 through 4 yields a total bound proportional to $n \log n$.
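The three inequalities used in the proof can be checked numerically for small block sizes. The sketch below is only a sanity check, not part of the original analysis. It assumes base-2 logarithms and integer sizes $b$ and $c$ with $1 \le c < b$, and the function names are hypothetical.

```python
import math


def split_in_list(b, c):
    """T/k terms before and after a block in L(a) of size b splits into c and b - c."""
    before = b * math.log2(b)
    after = c * math.log2(c) + (b - c) * math.log2(b - c)
    return before, after


def split_not_in_list(b, c):
    """Block not in L(a): the smaller part, of size c <= b - c, is put on L(a)."""
    before = b * math.log2(b / 2)
    after = c * math.log2(c) + (b - c) * math.log2((b - c) / 2)
    return before, after


for b in range(2, 200):
    for c in range(1, b):
        before, after = split_in_list(b, c)
        assert after <= before + 1e-9
        if c <= b - c:
            before, after = split_not_in_list(b, c)
            assert after <= before + 1e-9
    # Selecting a block of size b from L(a): b + b log(b/2) equals b log b.
    assert math.isclose(b + b * math.log2(b / 2), b * math.log2(b))
```

For instance, a block of size 8 in $L(a)$ that splits evenly replaces the term $8 \log 8 = 24$ by $4 \log 4 + 4 \log 4 = 16$.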

Experimental results and conclusions

In order to obtain timing information, the algorithm was applied to two classes of finite automata. Automata in the first class are given by $A(n) = (\{1, 2, \ldots, n\}, \{0, 1\}, \delta, \{1\})$ where $\delta(1,0) = \delta(1,1) = 1$ and $\delta(i,0) = i-1$ and $\delta(i,1) = i$ for $2 \le i \le n$. Automata in the second class are given (for even $n$) by $B(n) = (\{1, 2, \ldots, n\}, \{0, 1\}, \delta, \{i \mid 1 \le i \le n/2\})$ where $\delta(i,0) = \delta(i,1) = n/2 + 2i - 1$ and $\delta(n/4+i,0) = \delta(n/4+i,1) = 2i - 1$ for $1 \le i \le n/4$ and $\delta(n/2+i,0) = \delta(n/2+i,1) = 2i - 1$ for $n/2 < i \le n$.

The running times on an IBM 360/67 for the two classes are listed in Table 1.

[Table 1. Time in seconds for A(n) and B(n), with columns n = 100, 1000 and 2003. The timing entries are missing.]

Note that $A(n)$ is the example which required $n^2$ steps for previous algorithms. Our algorithm is particularly suited for $A(n)$ and a detailed analysis shows that the running time should grow linearly with the number of states as the experimental evidence indicates. The worst case for our algorithm is typified by $B(n)$ in which blocks are always partitioned equally. The running time for $B(n)$ should grow as $n \log n$ for both the current algorithm and for previously published algorithms. The results seem to indicate that the algorithm is practical for minimizing states in finite automata (or testing equivalence of finite automata) of up to several thousand states.
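The first test class is easy to generate from its definition. The Python sketch below builds $A(n)$ and counts how many whole-partition passes a straightforward refinement needs, which illustrates why $A(n)$ is costly for the earlier algorithms. The function names are hypothetical, the refinement is a simple stand-in for the previously published methods, and no timings are implied.

```python
def make_A(n):
    """Transition table and final states of A(n) as defined in the text."""
    delta = {(1, 0): 1, (1, 1): 1}
    for i in range(2, n + 1):
        delta[(i, 0)] = i - 1
        delta[(i, 1)] = i
    return delta, {1}


def refinement_passes(n):
    """Number of whole-partition passes that change the partition of A(n)."""
    delta, final = make_A(n)
    block_of = {s: int(s in final) for s in range(1, n + 1)}
    passes = 0
    while True:
        # New label: the old label plus the labels of both successors.
        sig = {s: (block_of[s], block_of[delta[(s, 0)]], block_of[delta[(s, 1)]])
               for s in block_of}
        labels = {v: idx for idx, v in enumerate(sorted(set(sig.values())))}
        if len(labels) == len(set(block_of.values())):
            return passes                # no block was split in this pass
        block_of = {s: labels[sig[s]] for s in sig}
        passes += 1


for n in (6, 100, 1000):
    print(n, refinement_passes(n))
```

Each pass looks at all $n$ states and separates only one more state, so the number of passes grows linearly in $n$ and the total work of this straightforward method grows as $n^2$.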

References

1. Harrison, M. A., Introduction to Switching and Automata Theory, McGraw-Hill, New York, 1965.
2. McCluskey, E. J., Introduction to the Theory of Switching Circuits, McGraw-Hill, New York, 1965.


    BEGIN
       ADD(BLOCKLISTZERO, 2);
       ONBLOCKLISTZERO(2) := 2;
    END;
    IF ( ... ) THEN
    BEGIN
       ADD(BLOCKLISTONE, 1);
       ONBLOCKLISTONE(1) := 1;
    END
    ELSE
    BEGIN
       ADD(BLOCKLISTONE, 2);
       ONBLOCKLISTONE(2) := 2;
    END;
    BLOCK := 3;
    ...
       J := DATA(BLOCKLISTZERO);
       REMOVEFROM(BLOCKLISTZERO);
       ONBLOCKLISTZERO(J) := 0;
       GO TO DIVZERO;
    END
    ELSE IF BLOCKLISTONE ¬= 0 THEN
    BEGIN
       J := DATA(BLOCKLISTONE);
       REMOVEFROM(BLOCKLISTONE);
       ONBLOCKLISTONE(J) := 0;
       GO TO DIVONE;
    END
    ELSE GO TO FINISHED;
DIVZERO:
    STATE := ZERONEXT(J);
    WHILE STATE ¬= 0 DO
    BEGIN
       CELL := PREVONZERO(STATE-N);
       WHILE CELL ¬= 0 DO
       BEGIN
          LASTSTATE := DATA(CELL);
          K := BLOCK(LASTSTATE);
          ADD(SPLIT(K), LASTSTATE);
          IF BLOCKSPLIT(K) = 0 THEN ADD(SPLITBLOCKS, K);
          NUMINSPLIT(K) := NUMINSPLIT(K) + 1;
          BLOCKSPLIT(K) := 1;
          CELL := NEXT(CELL);
       END;
       STATE := ZERONEXT(STATE);
    END;
    GO TO RETURN;
DIVONE:
    STATE := ONENEXT(J);
    WHILE STATE ¬= 0 DO
    BEGIN
       CELL := PREVONONE(STATE-N);
       WHILE CELL ¬= 0 DO
       BEGIN
          LASTSTATE := DATA(CELL);
          K := BLOCK(LASTSTATE);
          ADD(SPLIT(K), LASTSTATE);
          IF BLOCKSPLIT(K) = 0 THEN ADD(SPLITBLOCKS, K);
          NUMINSPLIT(K) := NUMINSPLIT(K) + 1;
          BLOCKSPLIT(K) := 1;
          CELL := NEXT(CELL);
       END;
       STATE := ONENEXT(STATE);
    END;
    COMMENT ***** REFINE BLOCKS *****;
RETURN:
    IF (SPLITBLOCKS = 0) THEN GO TO REPEAT;
    BSPLIT := DATA(SPLITBLOCKS);
    REMOVEFROM(SPLITBLOCKS);
    BLOCKSPLIT(BSPLIT) := 0;
    IF NUMINBLOCK(BSPLIT) = NUMINSPLIT(BSPLIT) THEN
    BEGIN
       NUMINSPLIT(BSPLIT) := 0;
       WHILE (SPLIT(BSPLIT) ¬= 0) DO REMOVEFROM(SPLIT(BSPLIT));
       GO TO RETURN;
