Hopcroft Original CS TR 71 190

AN/N LOG N ALGORITHM FOR MINIMIZING STATES IN kF I N ITE AUTOMATON
BY JOHN HOPCROFT
STAN-CS-71-190 January, 1971
COMPUTER SCIENCE DEPARTMENT School of Humanities and Sciences STANFORD UN IVERS ITY
AN
N LOG N ALGORITHM FOR MINIMIZING STATES IN A FINITE AUTOMATON
John Hopcroft
Abstract An algorithm is given for minimizing the number of states in a finite automaton or for determining if two finite automata are equivalent. algorithm is bounded by knlogn states. The constant The asymptotic running time of the
where k is some constant and n is the number of
k depends linearly on the size of the input alphabet.
This research was supported by the National Science Foundation under grant number NSF-GJ-96, and the Office of Naval Research under grant number N-00014-67-A-0112-0057 NR 044-402. Reproduction in whole or in part is permitted for any purpose of the United States Government.
AN n log n ALGORITHM FOR MINIMIZING STATES IN A FINITE AUTOMATON John Hopcroft Stanford University
Introduction Most basic texts on finite automata give algorithms for minimizing the number of states in a finite automaton [l, 21. where n However, a worst case analysis of these algorithms indicate that they are n2 processes
is the number of states.
For finite automata with large numbers of states, these algorithms are
grossly inefficient.
Thus in this paper we describe an algorithm for minimizing the states in which the n log n . The constant of proportionality depends
asymptotic running time in a worst case analysis grows as linearly on the number of input symbols. automata are equivalent.
Clearly the same algorithm can be used to determine if two finite
The essence of previously published algorithms was to first partition the states according to their outputs. The blocks of the partitions are then repeatedly refined by examining the successor state on a States whose successor states on a given input are in different
given input for each state in the block. blocks are placed in separate blocks.
When no further refinement is possible, all states in the same block of Consider the example in Figure 1. The initial partition is Input 0 1 2 3 4 5 1 2 3 4
the partition can be shown to be equivalent. (1,2,3,4,5)(6)
. Since on input 0 , the successor State __ 1 2 3 4. Output - - - 0 0 0 0 0 1 '
states of states 1, 2, 3 and 4 are in the first block of the partition and the successor of state 5 is in the second block, the first iteration refines the partition into the blocks (1,2,3,4)( 5) and (6) . (6) ;
Successive refinements yield (1,2,3)(4)(5) (1,2)(3)(4)(5)(6)
and (l)(2)(3)(4)(5)(6) - Thus,
in this example all pairs of states are inequivalent. For this example it is seen that as many as n
5 6 5 6 I! 6 6 Next State Figure 1
iterations may be required and the total number of steps needed to execute the algorithm if implemented in a straightforward fashion on a digital computer is n2 . The algorithm proposed in this paper may also require n over all iterations yields only n log n . detail. iterations but the work per iteration summed
We illustrate the algorithm by an example before specifying it in First the state table
Extensive use of list processing is employed to reduce the computation time.
is inverted to obtain the table shown in Figure 2. ($293,495) (6) .
The states are partitioned according to their outputs
Next a block and an input symbol on which the partition is refined are selected. Assume The states in each block are further partitioned depending on Thus the next partition is (1,2,3,4)(5)(6) .
the block (6) and input 0 are selected.
whether on input 0 their next state is in block (6) or not. Note that had we partitioned on the block (1,2,3,4,5) 1
and input 0 we would have obtained the same result.
More generally, once we have partitioned on a block and an input symbol, we need never partition on that block and input symbol again until the block is split and then we need only partition on one of the two subblocks. Since the time needed to 4 5 6 3 4 5,6 previous state Figure 2 number of steps in the algorithm is bounded by n log n . 4 5 .6 0 0 1
partition on a block is proportional to the transitions into the block and since we can always select the half with fewer transitions, the total
Formal description of the algorithm Let A = (S,I,G,F) inputs, be a finite automaton where S is a finite set of states, I is a finite set of
6 is a mapping from S x I into S and F c S is the set of final states. No initial state is
specified since it is of no importance in what follows. The mapping 6 is extended to SxI* in the usual * manner where I denotes the set of all finite length strings of symbols from I . States s and t are * said to be equivalent if for each x in I , 6(s,x) is in F if and only if 6(t,x) is in F . The algorithm for finding the equivalence classes of S is described below. 1. Step trseS and ae1 construct 6-'(s,a) = {tlE(t,a) = s] . Step 2. Construct B(1) = F , a(i) = {s]ssB(i) 3 Step . Step 4. Set k=3 VasI construct B(2) = S-F and for each a in I and l_< i 5 2 construct and E-'(s,a) # @] .
L(a) =
Cl3
c23
if la(l) \ 5 142) I
otherwise
5 Step
Select a in I and i in L(a) . Delete i from L(a) .
The algorithm terminates when L(a) = $ for each a in I .
Step 6. Step 7.
Vj < k st 3% in B(j) with s(t,a) es(i) perform steps 7a, 7b,a7c, and 7d. Step 7a. 7b. Step Partition B(j) into B'(j) = {tls(t,a) e&(i)] and B"(j) = B(j)-B'(j) . and construct B(k) = B"(j) . Construct corresponding a(j)
Replace B(j) by B'(j) and a(k)
for each a in I .
7 c Step
TagI modify L(a) as follows.
L(a) U Cjl
L(a) = L(a) U ikj Step 7d. 8 Step . Set k = k+l .
if j{L(a) otherwise
and 0 < la(j)1 _< la(k)\
Return to Step 5.
Correctness of algorithm The claim is made that on termination of the algorithm two states are equivalent if and only if they are in the same block. some a in The algorithm must terminate since the only times that an index is added to L(a) for
I are in Step 4 which is executed only once and in Step 7c.
An index is added at Step 7c only
after a refinement of a block of the partition. for some a . Thus the algorithm must terminate.
shown by
Each time Step 6 is executed, an index is removed from L(a)
It is easily
induction on the number of times Step 7a is executed that if s is in B(i) and Clearly, it is true the first time Step 7a is
t is in B(j) , i # j , then s is not equivalent to t . executed since only two blocks exist, states. Blocks are refined at Step 7a B(1)
containing only final states and B(2) containing only nonfinal
only when successor states on a given input have previously been
shown to be inequivalent. To see that two inequivalent states cannot be in the same block when the algorithm terminates, assume that states s assume 8( s, a) and t are in B(i) and that s and t are not equivalent. Without loss of generality,
is in B(j) and s(t,a) is in B(k) where j f k .
(If s(s,a) and s(t,a) are in the Clearly
same block then there exists a shortest x an x exists and hence a shortest x
such that 6(s,x) and 6(t,x) are in distinct blocks.
since for same x
one or the other of 6(s,x) and 6(t,x) but not states. Let a be the ~(s,Y) and 6(t,y) and
both is in a final state and each block consists solely of final or solely of nonfinal last symbol of x and write x = ya . Then 6(s,y) are not equivalent and 6(6(s,y),a) replace t by 6(t,y) .) and 6(6(t,y),a) and F(t,y) are in the same block, are in different blocks.
Replace s by 6(s,y)
Consider the point at which the block containing 6(s,a) and &(t,a) was
partitioned so that o(s,a) and s(t,a) first appeared in separate subblocks. At that point one of the two subblocks was placed in L(a) . t is partitioned with s a contradiction. When this subblock is removed from L(a) , the block containing s and Thus s and t cannot both be in B(i) ,
and t going into separate subblocks.
Analysis of the running time The running time of the algorithm is clearly dependent on the implementation. The algorithm has been programmed in AIGOL. Since the implementation consists of approximately 300 ALGOL statements we shall simply
indicate how the various steps were implemented and discuss informally their running time.
Tlie sets sui:h as
6 -'(s,a) , L(a) , etc. were represented by linked lists in such a way that an item Vectors were also
could be added o-- deleted at the beginning of the l'st <r. a fixed number of steps. maintained to indicate if a state was or was x;ot on a c:'-r~,n list.
This eliminates searching a list simply and a(i) were
to determine if the item is on the list and is essen',in .n Step 7c. The sets B(i) represented
8s
dou'[Ijr
linked lists
SO
that an
iwtem
could oe added or deleted
an)qJhere in the list in a fixed
number of steps once the position is given. B(i) and a(i) Steps 1, I', states ti!?rs
The struct:,Ye was such that given a state
s , its position in
could be determined in a fixed number of steps. and 4 are executed only once and eal:ire time proportional to the product of the number of
9 f orm a simple loop.
'lf: number of input symbols.
f,teps :, thr3';-tI
The time necessary to
traverse t?e l,;op for given a in I and i on input a 'erminating on states in B(i) .
in I,(a)
is proportional to the number of state transitions
(To see tl.i,s, note that Steps 5, 6 and 8 are finite. I-o see if there exists a t in B(j) with 8(t,a)
Step 7 WI do not need to examine B(j) in a(i) . such t!,a:
for eacil J < 'I a(i)
!:ather we look at each state in s(t,a) is in a(i) .
and i,hen consult the inverse state table to find each t
Each time a new t ic found, the block containing t is located and t The block is then placed on a list of blocks Finally we go down the list of blocks which have
placed or- : list of states to be split off from the :jl~.:. which na-,e becr~ refined if it is not already on the list. been refined and actually partition them.
The number of blocks we must look at must be less than the number The time necessary to actually partition When the number is summed over all blocks a terminating on states in
of state t-.a-,s'tians on input a terminatin,? on states in B(i) . a block is piorzrtional to the number of states to be split off.
which are part;t.ioned it adds up to the number of state transitions on input B(i) .) I; be the constant of proportionality.
Consider :,i,e time spent in the loop of Step j thro@~ 8 for a given input symbol a .
Assume that at
step > the blocks of the partition are B(l),B(Z),...,B(m) and that L(a) = {il,i2, . . ., ir] . Let {ir+-l,ip32,...,im] = {1,2,...,m] -L(a) . The clati, is made that the total time spent on traversals of the
loop for which input symbol a
is selected in Step 5 until the program terminates is bounded by m + C ai. loe(ai ./T)! j=r+l J 3
T=k(? ai log ai. j=l j J
Clearly the bound is valid if the algorithm has terminated. other than L(a)
If the loop is traversed for an input symbol
a , then the time spent is not included in T . However, since blocks get split and the set
modified we must show that the new value of T , call it !?' , is less than or equal to the old value If a block whose index is in L(a) is partitioned, then a term of the form b log b is replaced by c log c + (b-c) log(b-c) which decreases the value of T . If a block whose index is not
of r .
the expression in L(a)
is partitioned, then a term of the form b log b/2 -c log c + (b -c) log(b -c)/2
is replaced by the expression
where c <b-c . _
Since
c _<b/2
and (-b-c)/2 _< b/2 , < c log b/2 + (b -c) log b/2 _< blogb/? .
c log c + {b -c) log@ -c)/2
ither case $ is less than T . Finally, ass-Lme r # 0 and a and some B in L(a) has been
4
selected at Step 5.
As we showed earlier, the time around the loop is bounded by kal . Thus by induction,
the total time is bounded by m k[ap + a1 log(aJ?) + i a. log a. + c ai W3(ai /2) lj j=l lj j=r+l j j jb
1 .
I
We must show that this eqression is less than or equal to T . That is, we need show
&P
Clearly
+ aa log a!/2 _< a1 log ae . This completes the proof of the claim.
al +a1 log a!/2 = ap(log a!/2 + log 2) = ap log aI .
The first time Step 5 is executed the formula for
T is bounded by knlog n . Multiply by the number
of input symbols and adding in the time for Steps 1 through 4 yields a total bound proportional to n log n .
Experimental results and conclusions In order to obtain timing information, the algorithm was applied to two classes of finite automata. Automata in the first class are given by A(n) = (
fL2, . . ..n].{O,l],s,{l]) where 5(1,0) = 6(1,1) = 1 and
6(i,O) = i-l and 6(i,l) = i for 2 _< i _< n . B(n) = ([1,2,..., n3, CO,ll,~, Cill _< i,_< n/23) 6(n/4+ i,O) = 6(n/4+ i,l) = 2i-1 for n/2 <i_<n .
Automata in the second class are given (for even n) by
where 5(i,O) = 6(i,l) = n/2 + 2i-1 and = 6(n/2+i,l) = 2i-1 for
l_< i _< n/4 and S(n/2+i,O)
The running times on an IBM 360/67 for the two classes are listed in Table 1. n 100 1000 2003
A(n)
B(n)
Table 1.
time in seconds steps for previous algorithms.
Note that A(n) is the example which required n2 Our algorithm is particularly suited for
A(n) and a detailed analysis shows that the running time The worst case for
should grow linearly-with the number of states as the experimental evidence indicates. our algorithm is typified by B(n) B(n) should grow as n log n
in which blocks are always partitioned equally. The running time for
for both the current algorithm and for previously published algorithms. The
results seem to indicate that the algorithm is practical for minimizing states in finite automata (or testing equivalence of finite automata) of up to several thousand states.
References 1. 2. Harrison, M. A., McCluskey, E. J., Introduction to Switching and Automata Theory, McGraw-Hill, New York, 1965. Introduction to the Theory of Switching Circuits, McGraw-Hill, New York, 1965.
l.-. r, -,
i-/
: I
,. .-
. CL1 << r * t:; -
: (, t 1 ^>:L! L:::t? 3,164 C^(C :,:jr, oce7 *; *: f $3 1; ,; t 1 $,;iC 1:71 01: 7 2 ::73
-- -? - 3- - --+a - -- -
:c74
-4
:c75 - QC7h 4-
:Ci7 -
:c73 cc79 !; r) I? fj ZCEl ,?CE2 :Cf3
- - -- - - 4
,CE4 >I(-,?5
---
LZEf?
--
cce7 - CC&?8 - i;,;Eq - :rco - n .c GL) 1 cc92 - c c 5 3 4-
Ii794 cc<5
---
CO96 - 4 CCC7 - Cc58 - LCCO - )1,0 -01r1 4ClCZ - 0133 -OlC4 - ClC5 - c!lCh - OlC7 - CLC8 - 4 ClC3 - OLLC - 2111 -CA12 - 3 1 1 3 4Cl14 - c115 5@116 - -
?111 - 1 1 ? eTll> - Cli j - -2 1 L 1 - * CLLI --@123 h:LL't --
712s
--
rllih Cli7 )lZY :1is J 1 3 ,I n131 c132 1: 1 1 -.J
-6 -- - 5 -:- --
Cl?4 2135
t--
313,; - Cl?2 6-
?L?F, :137
--6
Cl443 1:1t1
---
OlL.2
-5
Cl45 Gi44
---
Cl45 0146 Cl47
-5 -4 --
Cl48 0143 c1c31 T'lC1
4---/t
Cl57 - QlC3 4-
Cl54 Cl53
---
ClEh - 4 (7157 - <lCE 4$15) - Cltt; - 31t1 - 4 CLCl - 3 1 6 3 4-
c1t4 3tt3 7166 Cl67
---4 --
ECCIh @CC{ 3L:CKLISTZEH(Jr3); I:r,k!CCCKL ISTZE4C(2):=2; ChS; !F (p~-ivc~+:-( L)>=?l AhC 4 PfJF(Ch;_(l )<=,tJPr?4F(L) SF;Ih ACC (!:LCCYL I STChrE, 1) ; ChBLcC,YLISTC+1( 1) :=l; E hl!: EL3 F PEGIh Act (s?LiSKL ISTcr.~r2) ; ChhLSr.KLISTCNE(Z):=2; cr.r; . rLccK:=3;
1 TI-_N
i l k + ! -3 ,lt - 3 1 7 : - ,:171 -0172 -0173 -Cl74 !-
1;9 CECEHER 1 9 7 0 a; 12: 17 P4CE 4

J:=S~Tb(~LCCKtIST7~a~l~
~FCV~FI:lIY(i:LCCKLI~TZE~T:l; ChQLCCKLT~TZERC(Jl:=@: r,t: Tf-1 !?I\, <StJ; =hc CLTE I=

~>C~T~;LI~;TC~~I=~ TkEh P "C IN
Cl75 - 2170 -CL77 -8178 -c175 - 3 CAE: - 31Sl 3ala? - 0183 -Cl@4 - 31&5 -~:=C~TJ(?L~CKLISTC~F); RF~Cv~FR~V(eLCCKLISTCNF); ChCLCCKCI3TCNC(J):=3; GC T O DIVCNE;
Clfb
-3
CIVZEFC:
CILCAE:
ClE7 - Cl88 - c1e4 - 0 1 9 0 30151 - Cl92 - Cl53 4 0194 -0195 -OlSb - ClS7 - OlSd - 0159 -c2c3 - OZCl - 4 02c2 - 0223 -3 r)2C4 - 02C5 - G2Cb - zic7 3C2CH - 02C9 - 3210 40211 -c/212 -c213 - 0214 -0215 -C216 - -3217 -C2lR - 4 3214 -c220 - 3 c221 -0222 -022.4 -0224 -c225 - 0226 -0227 -022t3 - CZiQ 3023ll - C23L - 0232 --
RfTURh :
EhC E L S F C C T C FIhISkEC; STATE:=ZCRObFXT(JI; kHILE STATF-r=O DC f!EGIh C E L L :=PREvChZERO(STATE-fob; WkICE CELL.-=0 C O YEGIh LASTcTATE :=CATA(CELL): K:=dLCCK(LASTSTATE) ; Arr>( $PLIT(KI ,LASTSTAT!l ; if L C C K S P L I T (K )=O THFY ACi)( 5PL ITf?Lr:CKS,K 1 ; hUMlhSPLIT(K1 :=hUCIhSLIT(K)+l; HLCCKSPLIT(K) :=l; CELL :=NEXT(CELLb; ENP; STATf :=tEFC\EXf(STbTE); ErJC ; r,r T C FETUPh; .iiTATE:=CIjEhFXT( J) ; hl-ILE STATE-=0 D O SEGIh C E L L :=PPEVCNGNE( S T A T E - N ) ; kt-ILE CFLL1=9 C C HEC IN LASTSTATE :=CATA(CELLJ; K:=HLCCK(tASTSTATE) ; ACZ(SPLIT(totCASTSTATC); I F ELCCKSPLIT(K)=O THEN ACII(SPLITBLCCKS,K~; hUYINSPLIT(K):=hU~INS~LIT(K)+~; PLCCKSPLIT(Kl:=l; C E L L :=NEXT(CELL) ; EhC; STATE :=CNFhEXT( S T A T E ) : EF\C; CC~CEhT~*n*s4~~~4*~~d*****++**~~~**~~~~*~~~~~~*~~~~~**~~~~~$~~*~*~* * REcfhE B L C C K S *~$1S$a9~**4****~$~~~~*~~~**~*~~~*~~~~~~*~~~~*~~~~~*~~~~~~**~~~ I F (SfLITt?LCCKS=Cl Tt-EN GU T C REPFAT; RSPLIT:-CAT4(SPLITBLCCKS); PE?CVFF~Cb(SPLITPLCCKSI; ~LCCKPLIT(~SPLIT):=C; I F hUNI~~SLOCK(B~PLIJ)=NU~I~~SPLIT@SPLT) TtIEh HECIh hL~INFLIT(BSPLIT):=0; ht-ILF ISFL!T(SSPLIT)-.=O) C13 REFfOVF:~K~lY~SPCIT~R~PL!T~I; r,C T O F:rTLRy\;
233 -3 (32?L - -
C 2 3 5 - i? 30 - -
CL37 - I;238 3c23q - c24u - 2241 -.)2LL - :!24.$ - c744 - 2245 -(:246 -3 2 4 7 4cc?43 - c 2 4 '3 - c25: - ?LCl - Cic2 - d 17c j - ._ L , CL54 - 4 I?/';5 - c2rf: 4X%57 - czj;j - '325'; -fl26Ci - SZCl - 2262 -Tzt3 - 4 :2t4 - 3 OLt5 - c2t5 - 0 2 6 7 - 1269 -~oLt.9 - 32 7 ': 3 CLjl - -
2ILiZ -
t-273 -3 0274 - -
CL75
?-
3276 -2277 -CL79 - ? !)27c; - d / F ) - 3 2 & l 30/"2 - )2?+ -ZiG/l -3 c2e5 - CL56 3':')Fl - C?[>j - ,!Lc;'J - 3 ,!';,- - -
ST~:~FI:RC (VEQS IIXh z lP!OL6S 1
@LCoL
00 12: 17
>ECFMRCR
1976
ci
PAGF
1 c L b --
32F2 - 0293 -ci94 - 0245 -c?i=t !C2S7 - CL58 - c299 - n3cc - C3CL - ll3c2 - @3C3 4 0304 -03c5 - 03G6 - 4 c3c7 - 3 c3ce - 2 c3c9 - 1 lCR32 3YTES CF C C D E GEhERATEC
I?,SZ.?3 S E C C N C S I N CCMPILATICh,

Hopcroft Original CS TR 71 190

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Hopcroft Original CS TR 71 190

Hochgeladen von

Copyright:

Verfügbare Formate

AN/N LOG N ALGORITHM FOR MINIMIZING STATES IN kF I N ITE AUTOMATON

STAN-CS-71-190 January, 1971

N LOG N ALGORITHM FOR MINIMIZING STATES IN A FINITE AUTOMATON

where k is some constant and n is the number of

k depends linearly on the size of the input alphabet.

is the number of states.

Clearly the same algorithm can be used to determine if two finite

the partition can be shown to be equivalent. (1,2,3,4,5)(6)

. Since on input 0 , the successor State __ 1 2 3 4. Output - - - 0 0 0 0 0 1 '

Successive refinements yield (1,2,3)(4)(5) (1,2)(3)(4)(5)(6)

and (l)(2)(3)(4)(5)(6) - Thus,

5 6 5 6 I! 6 6 Next State Figure 1

Extensive use of list processing is employed to reduce the computation time.

is inverted to obtain the table shown in Figure 2. ($293,495) (6) .

The states are partitioned according to their outputs

the block (6) and input 0 are selected.

and input 0 we would have obtained the same result.

Select a in I and i in L(a) . Delete i from L(a) .

The algorithm terminates when L(a) = $ for each a in I .

Replace B(j) by B'(j) and a(k)

TagI modify L(a) as follows.

and 0 < la(j)1 _< la(k)\

I are in Step 4 which is executed only once and in Step 7c.

An index is added at Step 7c only

Each time Step 6 is executed, an index is removed from L(a)

containing only final states and B(2) containing only nonfinal

only when successor states on a given input have previously been

is in B(j) and s(t,a) is in B(k) where j f k .

(If s(s,a) and s(t,a) are in the Clearly

such that 6(s,x) and 6(t,x) are in distinct blocks.

since for same x

and t going into separate subblocks.

Tlie sets sui:h as

This eliminates searching a list simply and a(i) were

could oe added or deleted

an)qJhere in the list in a fixed

The struct:,Ye was such that given a state

'lf: number of input symbols.

The time necessary to

is proportional to the number of state transitions

Step 7 WI do not need to examine B(j) in a(i) . such t!,a:

for eacil J < 'I a(i)

!:ather we look at each state in s(t,a) is in a(i) .

and i,hen consult the inverse state table to find each t

loop for which input symbol a

T=k(? ai log ai. j=l j J

If the loop is traversed for an input symbol

the expression in L(a)

is replaced by the expression

c log c + {b -c) log@ -c)/2

al +a1 log a!/2 = ap(log a!/2 + log 2) = ap log aI .

The first time Step 5 is executed the formula for

T is bounded by knlog n . Multiply by the number

fL2, . . ..n].{O,l],s,{l]) where 5(1,0) = 6(1,1) = 1 and

Automata in the second class are given (for even n) by

where 5(i,O) = 6(i,l) = n/2 + 2i-1 and = 6(n/2+i,l) = 2i-1 for

l_< i _< n/4 and S(n/2+i,O)

time in seconds steps for previous algorithms.

. CL1 << r * t:; -

:c73 cc79 !; r) I? fj ZCEl ,?CE2 :Cf3

cce7 - CC&?8 - i;,;Eq - :rco - n .c GL) 1 cc92 - c c 5 3 4-

?111 - 1 1 ? eTll> - Cli j - -2 1 L 1 - * CLLI --@123 h:LL't --