
Finite-State Automata and Algorithms

Bernd Kiefer, kiefer@dfki.de
Many thanks to Anette Frank for the slides
MSc. Computational Linguistics Course, SS 2009

Overview

Finite-state automata (FSA): What for?
- Recap: Chomsky hierarchy of grammars and languages
- FSA, regular languages and regular expressions
- Appropriate problem classes and applications

Finite-state automata and algorithms
- Regular expressions and FSA
- Deterministic (DFSA) vs. non-deterministic (NFSA) finite-state automata
- Determinization: from NFSA to DFSA
- Minimization of DFSA

Extensions: finite-state transducers and FST operations

Finite-state automata: What for?

Chomsky hierarchy of languages       | Hierarchy of grammars and automata
Regular languages (Type-3)           | Regular PS grammar; finite-state automata
Context-free languages (Type-2)      | Context-free PS grammar; push-down automata
Context-sensitive languages (Type-1) | Tree adjoining grammars; linear bounded automata
Type-0 languages                     | General PS grammars; Turing machine

Going from Type-3 to Type-0, the formalisms become computationally more complex and less efficient.

Finite-state automata model regular languages

[Figure: regular expressions describe/specify regular languages; finite automata describe/specify and recognize regular languages. Finite automata are executable: a finite-state MACHINE.]

Finite-state automata model regular languages

[Figure: regular expressions, regular grammars and finite automata all describe/specify regular languages; regular grammars and finite automata are executable (finite-state MACHINE) and recognize/generate regular languages.]

Regular languages
→ properties of regular languages
→ appropriate problem classes
→ algorithms for FSA

Languages, formal languages and grammars

Alphabet Σ: finite set of symbols
String: sequence x1 ... xn of symbols xi from the alphabet Σ
- Special case: empty string ε

Language over Σ: the set of strings that can be generated from Σ
- Sigma star Σ*: set of all possible strings over the alphabet Σ
  Σ = {a, b}, Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...}
- Sigma plus Σ+: Σ+ = Σ* - {ε}
- Special languages: ∅ = {} (empty language) ≠ {ε} (language of the empty string)

A formal language: a subset of Σ*

Basic operation on strings: concatenation •
- If a = xi ... xm and b = xm+1 ... xn then a•b = ab = xi ... xm xm+1 ... xn
- Concatenation is associative but not commutative
- ε is the identity element: aε = εa = a

A grammar of a particular type generates a language of a corresponding type

Recap on Formal Grammars and Languages

A formal grammar is a tuple G = <Σ, Φ, S, R>
- Σ: alphabet of terminal symbols
- Φ: alphabet of non-terminal symbols (Σ ∩ Φ = ∅)
- S: the start symbol
- R: finite set of rules R ⊆ Γ* × Γ* of the form α → β, where Γ = Σ ∪ Φ, α ≠ ε and α ∉ Σ*

The language L(G) generated by a grammar G
- the set of strings w ∈ Σ* that can be derived from S according to G = <Σ, Φ, S, R>

Derivation: given G = <Σ, Φ, S, R> and w, v ∈ Γ* = (Σ ∪ Φ)*
- a direct derivation (1 step) w ⇒G v holds iff u1, u2 ∈ Γ* exist such that w = u1 α u2 and v = u1 β u2, and α → β ∈ R exists
- a derivation w ⇒G* v holds iff either w = v or z ∈ Γ* exists such that w ⇒G* z and z ⇒G v

A language generated by a grammar G:

L(G) = {w : S ⇒G* w and w ∈ Σ*}

I.e., L(G) strongly depends on R!

Chomsky Hierarchy of Grammars

Classification of languages generated by formal grammars
- A language is of type i (i = 0, 1, 2, 3) iff it is generated by a type-i grammar
- Classification according to increasingly restricted types of production rules:
  L-type-0 ⊃ L-type-1 ⊃ L-type-2 ⊃ L-type-3
- Every grammar generates a unique language, but a language can be generated by several different grammars.
- Two grammars are (weakly) equivalent if they generate the same string language. They are strongly equivalent if they generate both the same string language and the same tree language.

Chomsky Hierarchy of Grammars

Type-0 languages: general phrase structure grammars
No restrictions on the form of production rules: arbitrary strings on the LHS and RHS of rules.
A grammar G = <Σ, Φ, S, R> generates a language L-type-0 iff
- all rules R are of the form α → β, where α ∈ Γ+ and β ∈ Γ* (with Γ = Σ ∪ Φ)
- I.e., LHS: a nonempty sequence of NT or T symbols with at least one NT symbol; RHS: a possibly empty sequence of NT or T symbols

Example: G = <{S,A,B,C,D,E}, {a}, S, R>, L(G) = {a^(2^n) | n ≥ 1}
R = { S → ACaB, CB → E, aE → Ea, Ca → aaC, aD → Da, AE → ε, CB → DB, AD → AC }
a^(2^2) = aaaa ∈ L(G) iff S ⇒* aaaa

Chomsky Hierarchy of Grammars

Type-1 languages: context-sensitive grammars
A grammar G = <Σ, Φ, S, R> generates a language L-type-1 iff
- all rules R are of the form αAγ → αβγ, or S → ε (with no S symbol on any RHS), where A ∈ Φ and α, β, γ ∈ Γ* (Γ = Σ ∪ Φ), β ≠ ε
- I.e., LHS: a non-empty sequence of NT or T symbols with at least one NT symbol; RHS: a nonempty sequence of NT or T symbols (exception: S → ε)
- For all rules LHS → RHS: |LHS| ≤ |RHS|

Example: L = {a^n b^n c^n | n ≥ 1}
R = { S → aSBC, S → aBC, CB → BC, aB → ab, bB → bb, bC → bc, cC → cc }
a^3 b^3 c^3 = aaabbbccc ∈ L(G) iff S ⇒* aaabbbccc

Chomsky Hierarchy of Grammars

Type-2 languages: context-free grammars
A grammar G = <Σ, Φ, S, R> generates a language L-type-2 iff
- all rules R are of the form A → α, where A ∈ Φ and α ∈ Γ* (Γ = Σ ∪ Φ)
- I.e., LHS: a single NT symbol; RHS: a (possibly empty) sequence of NT or T symbols

Example: L = {a^n b a^n | n ≥ 0} (n = 0 is included because the rule S → b derives b directly)

R = { S → ASA, S → b, A → a }

Chomsky Hierarchy of Grammars

Type-3 languages: regular or finite-state grammars
A grammar G = <Σ, Φ, S, R> is called right (left) linear (or regular) iff
- all rules R are of the form A → w or A → wB (or A → Bw), where A, B ∈ Φ and w ∈ Σ*
- i.e., LHS: a single NT symbol; RHS: a (possibly empty) sequence of T symbols, optionally followed (preceded) by a NT symbol

Example: Σ = {a, b}, Φ = {S, A, B}
R = { S → aA, A → aA, A → bbB, B → bB, B → b }

[Figure: the corresponding automaton, with arcs S -a-> A, A -a-> A, A -bb-> B, B -b-> B and a final b-arc from B.]

S ⇒ aA ⇒ aaA ⇒ aabbB ⇒ aabbbB ⇒ aabbbb

Operations on languages

Typical set-theoretic operations on languages
- Union: L1 ∪ L2 = {w : w ∈ L1 or w ∈ L2}
- Intersection: L1 ∩ L2 = {w : w ∈ L1 and w ∈ L2}
- Difference: L1 - L2 = {w : w ∈ L1 and w ∉ L2}
- Complement of L ⊆ Σ* wrt. Σ*: Σ* - L

Language-theoretic operations on languages
- Concatenation: L1L2 = {w1w2 : w1 ∈ L1 and w2 ∈ L2}
- Iteration: L^0 = {ε}, L^1 = L, L^2 = LL, ...; L* = ∪ (i ≥ 0) L^i; L+ = ∪ (i > 0) L^i
- Mirror image: L^-1 = {w^-1 : w ∈ L}

Union, concatenation and Kleene star are called regular operations.
Regular sets/languages: languages that are defined by the regular operations concatenation (•), union (∪) and Kleene star (*).
Regular languages are closed under concatenation, union, Kleene star, intersection and complementation.
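The language-theoretic operations can be tried out on small finite languages. The following is only a sketch: it represents a language as a Python set of strings, uses the empty string "" for ε, and approximates L* by a bounded union. The function names are illustrative and not taken from the slides.

```python
def concat(l1, l2):
    """Concatenation L1L2 = {w1w2 : w1 in L1 and w2 in L2}."""
    return {w1 + w2 for w1 in l1 for w2 in l2}


def power(lang, i):
    """Iteration L^i, starting from L^0 = {""} (the language of the empty string)."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result


def star_up_to(lang, n):
    """Finite approximation of L*: the union of L^0, L^1, ..., L^n."""
    return set().union(*(power(lang, i) for i in range(n + 1)))


def mirror(lang):
    """Mirror image: every string of the language reversed."""
    return {w[::-1] for w in lang}


print(concat({"a", "ab"}, {"b", ""}))
print(star_up_to({"ab"}, 2))
```

Note that union, intersection and difference need no helper functions here, since Python sets already provide them as `|`, `&` and `-`.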

Regular languages, regular expressions and FSA

[Figure: regular expressions, regular grammars and finite automata all describe/specify regular languages; regular grammars and finite automata are executable (finite-state MACHINE) and recognize/generate regular languages.]

Regular languages and regular expressions

Regular sets/languages can be specified/defined by regular expressions.
Given a set of terminal symbols Σ, the following are regular expressions:
- ε is a regular expression
- For every a ∈ Σ, a is a regular expression
- If R is a regular expression, then R* is a regular expression
- If Q, R are regular expressions, then QR (Q • R) and Q ∪ R are regular expressions

Every regular expression denotes a regular language
- L(ε) = {ε}
- L(a) = {a} for all a ∈ Σ
- L(αβ) = L(α)L(β)
- L(α ∪ β) = L(α) ∪ L(β)
- L(α*) = L(α)*

Finite-state automata (FSA)

Grammars: generate (or recognize) languages
Automata: recognize (or generate) languages
Finite-state automata recognize regular languages

A finite automaton (FA) is a tuple A = <Φ, Σ, δ, q0, F>
- Φ: a finite non-empty set of states
- Σ: a finite alphabet of input letters
- δ: a transition function Φ × Σ → Φ
- q0 ∈ Φ: the initial state
- F ⊆ Φ: the set of final (accepting) states

Transition graphs (diagrams):
- states: circles (p ∈ Φ)
- transitions: directed arcs between circles (δ(p, a) = q)
- initial state: p = q0
- final state: r ∈ F

FSA transition graphs and traversal

Transition graph
[Figure: transition graph over states q0 ... q9, with S = q0 and F = {q5, q8}.]

Transition function δ: Φ × Σ → Φ
δ(q0,c) = q1   δ(q0,e) = q3   δ(q0,l) = q6
δ(q1,l) = q2   δ(q2,e) = q3
δ(q3,a) = q4   δ(q3,v) = q9
δ(q4,r) = q5   δ(q6,e) = q7
δ(q7,t) = q8   δ(q8,t) = q9   δ(q9,e) = q4

Traversal of an FSA = computation with an FSA
[Figure: step-by-step traversal of the transition graph on an input string.]

FSA transition graphs and traversal

State diagram (transition table for the same automaton; 0 marks an undefined transition)

     q0  q1  q2  q3  q4  q5  q6  q7  q8  q9
a    0   0   0   q4  0   0   0   0   0   0
c    q1  0   0   0   0   0   0   0   0   0
e    q3  0   q3  0   0   0   q7  0   0   q4
l    q6  q2  0   0   0   0   0   0   0   0
r    0   0   0   0   q5  0   0   0   0   0
t    0   0   0   0   0   0   0   q8  q9  0
v    0   0   0   q9  0   0   0   0   0   0

FSA can be used for
→ acceptance (recognition)
→ generation

FSA traversal and acceptance of an input string

Traversal of a (deterministic) FSA
- FSA traversal is defined by states and transitions of A, relative to an input string w ∈ Σ*
- A configuration of A is defined by the current state and the unread part of the input string: (q, wi), with q ∈ Φ, wi a suffix of w
- A transition is a binary relation between configurations: (q, wi) ⊢A (q', wi+1) iff wi = z wi+1 for z ∈ Σ and δ(q, z) = q'. We say that (q, wi) yields (q', wi+1) in a single transition step.
- Reflexive, transitive closure of ⊢A: (q, wi) ⊢*A (q', wj) means that (q, wi) yields (q', wj) in zero or a finite number of steps

Acceptance
- Decide whether an input string w is in the language L(A) defined by FSA A
- An FSA A accepts a string w iff (q0, w) ⊢*A (qf, ε), with q0 the initial state and qf ∈ F
- The language L(A) accepted by FSA A is the set of all strings accepted by A. I.e., w ∈ L(A) iff there is some qf ∈ F such that (q0, w) ⊢*A (qf, ε)
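Acceptance by a deterministic FSA can be sketched in a few lines. The table below encodes the transition function of the example graph from the earlier slide, with q0 as initial state and F = {q5, q8}. The dictionary layout and the name `accepts` are assumptions made for this illustration, not part of the slides.

```python
# Transition function delta of the example automaton, as a dictionary
# mapping (state, input symbol) to the next state.
DELTA = {
    ("q0", "c"): "q1", ("q0", "e"): "q3", ("q0", "l"): "q6",
    ("q1", "l"): "q2", ("q2", "e"): "q3",
    ("q3", "a"): "q4", ("q3", "v"): "q9",
    ("q4", "r"): "q5",
    ("q6", "e"): "q7", ("q7", "t"): "q8", ("q8", "t"): "q9",
    ("q9", "e"): "q4",
}
INITIAL = "q0"
FINALS = {"q5", "q8"}


def accepts(word, delta=DELTA, initial=INITIAL, finals=FINALS):
    """Return True iff (initial, word) yields (qf, empty string) with qf final."""
    state = initial
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:  # no transition defined: the traversal gets stuck
            return False
    return state in finals


for w in ["clear", "clever", "ear", "ever", "let", "letter", "le"]:
    print(w, accepts(w))
```

Each input symbol is consumed by exactly one dictionary lookup, so the loop runs once per symbol of the input.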

Regular grammars and finite-state automata

A grammar G = <Σ, Φ, S, R> is called right linear (or regular) iff all rules R are of the form A → w or A → wB, where A, B ∈ Φ and w ∈ Σ*
- Σ = {a, b}, Φ = {S, A, B}, R = {S → aA, A → aA, A → bbB, B → bB, B → b}
  S ⇒ aA ⇒ aaA ⇒ aabbB ⇒ aabbbB ⇒ aabbbb
- The NT symbol corresponds to a state in an FSA: the future of the derivation only depends on the identity of this state or symbol and the remaining production rules.
- Correspondence of type-3 grammar rules with transitions in a (non-deterministic) FSA:
  A → wB corresponds to δ(A, w) = B
  A → w corresponds to δ(A, w) = q, with q ∈ F
- Conversely, we can construct an FSA from the rules of a type-3 language

[Figure: automaton for the example grammar, with arcs S -a-> A, A -a-> A, A -bb-> B, B -b-> B and a b-arc from B to a final state.]

Regular grammars and FSA can be shown to be equivalent.
Regular grammars generate regular languages.
Regular languages are defined by concatenation, union and Kleene star.

Deterministic finite-state automata

Deterministic finite-state automata (DFSA)
- at each state, there is at most one transition that can be taken to read the next input symbol
- the next state (transition) is fully determined by the current configuration
- δ is functional (and there are no ε-transitions)

Determinism is a useful property for an FSA to have!
- Acceptance or rejection of an input can be computed in linear time O(n) for inputs of length n
- Especially important for processing of LARGE documents

Appropriate problem classes for FSA

- Recognition and acceptance of regular languages, in particular string manipulation, regular phonological and morphological processes
- Approximations of non-regular languages in morphology, shallow finite-state parsing, …

Multiple equivalent FSA

FSA for the language Llehr = { lehrbar, lehrbarkeit, belehrbar, belehrbarkeit, unbelehrbar, unbelehrbarkeit, unlehrbar, unlehrbarkeit }

DFSA for Llehr
[Figure: deterministic automaton with arcs labelled un, be, lehr, bar, keit.]

Regular expression and FSA for Llehr: (un | ε) (be lehr | lehr) bar (keit | ε)
[Figure: non-deterministic automaton built from the regular expression.]

Equivalent FSA (non-deterministic)
[Figure: a further non-deterministic automaton for the same language.]

Defining FSA through regular expressions

FSA for even mildly complex regular languages are best constructed from regular expressions!
Every regular expression denotes a regular language
- L(ε) = {ε}
- L(a) = {a} for all a ∈ Σ
- L(αβ) = L(α)L(β)
- L(α ∪ β) = L(α) ∪ L(β)
- L(α*) = L(α)*

Every regular expression translates to an FSA (closure properties)
- An FSA for a (with L(a) = {a}), a ∈ Σ
- An FSA for ε (with L(ε) = {ε})
- Concatenation of two FSA FA and FB:
  S_AB = S_A (initial state)
  F_AB = F_B (set of final states)
  δ_AB = δ_A ∪ δ_B ∪ {(<qi, ε>, qj) | qi ∈ F_A, qj = S_B}
- Union of two FSA FA and FB:
  S_A∪B = s0 (new state)
  F_A∪B = {sj} (new state)
  δ_A∪B = δ_A ∪ δ_B ∪ {(<q0, ε>, qz) | q0 = S_A∪B, (qz = S_A or qz = S_B)} ∪ {(<qz, ε>, qj) | (qz ∈ F_A or qz ∈ F_B), qj ∈ F_A∪B}
- Kleene star over an FSA FA:
  S_A* = s0 (new state)
  F_A* = {qj} (new state)
  δ_A* = δ_A ∪ {(<qj, ε>, qz) | qj ∈ F_A, qz = S_A} ∪ {(<q0, ε>, qz) | q0 = S_A*, (qz = S_A or qz ∈ F_A*)} ∪ {(<qz, ε>, qj) | qz ∈ F_A, qj ∈ F_A*}
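The three constructions can be sketched as functions that build ε-NFSA from smaller ones. This is a minimal illustration under assumptions: an automaton is a plain dictionary with a start state, a set of final states and a set of (source, label, target) triples, the empty string stands for ε, and all names (`symbol`, `concat`, `union`, `star`, `accepts`) are hypothetical.

```python
import itertools

EPS = ""                    # label used for epsilon-transitions (assumes real symbols are non-empty)
_fresh = itertools.count()  # supplies new state names


def symbol(a):
    """FSA for a single symbol a: start --a--> final."""
    s, f = next(_fresh), next(_fresh)
    return {"start": s, "finals": {f}, "delta": {(s, a, f)}}


def concat(fa, fb):
    """Start of A, finals of B, plus epsilon-arcs from every final of A to the start of B."""
    links = {(q, EPS, fb["start"]) for q in fa["finals"]}
    return {"start": fa["start"], "finals": set(fb["finals"]),
            "delta": fa["delta"] | fb["delta"] | links}


def union(fa, fb):
    """New start with epsilon-arcs to both starts; old finals get epsilon-arcs to a new final."""
    s, f = next(_fresh), next(_fresh)
    arcs = {(s, EPS, fa["start"]), (s, EPS, fb["start"])}
    arcs |= {(q, EPS, f) for q in fa["finals"] | fb["finals"]}
    return {"start": s, "finals": {f}, "delta": fa["delta"] | fb["delta"] | arcs}


def star(fa):
    """New start and final; epsilon-arcs allow skipping A or looping back to its start."""
    s, f = next(_fresh), next(_fresh)
    arcs = {(s, EPS, fa["start"]), (s, EPS, f)}
    arcs |= {(q, EPS, fa["start"]) for q in fa["finals"]}
    arcs |= {(q, EPS, f) for q in fa["finals"]}
    return {"start": s, "finals": {f}, "delta": fa["delta"] | arcs}


def closure(states, delta):
    """All states reachable from `states` through epsilon-arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, tgt in delta:
            if src == q and label == EPS and tgt not in seen:
                seen.add(tgt)
                stack.append(tgt)
    return seen


def accepts(fsa, word):
    """Track the set of states the automaton could be in after each symbol."""
    current = closure({fsa["start"]}, fsa["delta"])
    for ch in word:
        step = {t for (s, l, t) in fsa["delta"] if s in current and l == ch}
        current = closure(step, fsa["delta"])
    return bool(current & fsa["finals"])


# (ab ∪ aba)*
ab = concat(symbol("a"), symbol("b"))
aba = concat(concat(symbol("a"), symbol("b")), symbol("a"))
machine = star(union(ab, aba))
print(accepts(machine, "abaab"), accepts(machine, "abb"))
```

The resulting automaton contains many ε-arcs and is non-deterministic, which is why the acceptance check has to follow sets of states rather than a single state.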

Defining FSA through regular expressions

[Figure: ε-NFSA constructed for the regular expression (ab ∪ aba)*.]

ε-transition: move to δ(q, ε) without reading an input symbol
FSA construction from regular expressions yields a non-deterministic FSA (NFSA)
- Choice of next state is only partially determined by the current configuration, i.e., we cannot always predict which state will be the next state in the traversal

Non-deterministic finite-state automata (NFSA)

Non-determinism
- Introduced by ε-transitions and/or
- by the transition being a relation Δ over Φ × Σ* × Φ, i.e. a set of triples <qsource, z, qtarget>
- Equivalently: the transition function δ maps to a set of states: δ: Φ × Σ → ℘(Φ)

A non-deterministic FSA (NFSA) is a tuple A = <Φ, Σ, δ, q0, F>
- Φ: a finite non-empty set of states
- Σ: a finite alphabet of input letters
- δ: a transition function Φ × Σ* → ℘(Φ) (or a finite relation over Φ × Σ* × Φ)
- q0 ∈ Φ: the initial state
- F ⊆ Φ: the set of final (accepting) states

Adapted definitions for transitions and acceptance of a string by an NFSA
- (q, wi) ⊢A (q', wi+1) iff wi = z wi+1 for z ∈ Σ* and q' ∈ δ(q, z)
- An NFSA (without ε) accepts a string w iff there is some traversal such that (q0, w) ⊢*A (q', ε) and q' ∈ F.
- A string w is rejected by NFSA A iff A does not accept w, i.e. all configurations of A for string w are rejecting configurations!
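The phrase "there is some traversal" can be read literally as a search over configurations. The sketch below assumes a small ε-free NFSA for (ab ∪ aba)* with states 0, 1, 2, which is one possible automaton for that expression and not necessarily the one drawn on the slides. It tries every applicable transition recursively.

```python
# delta maps (state, symbol) to the SET of possible next states.
# Assumed epsilon-free NFSA for (ab ∪ aba)*: 0 is both initial and final.
NFSA_DELTA = {
    (0, "a"): {1},
    (1, "b"): {0, 2},  # non-deterministic choice: finish "ab" or continue towards "aba"
    (2, "a"): {0},
}
NFSA_FINALS = {0}


def accepts(state, word, delta=NFSA_DELTA, finals=NFSA_FINALS):
    """True iff SOME traversal leads from configuration (state, word) to (q', '') with q' final."""
    if word == "":
        return state in finals
    return any(accepts(nxt, word[1:], delta, finals)
               for nxt in delta.get((state, word[0]), ()))


print(accepts(0, "aba"))   # one branch fails, another succeeds
print(accepts(0, "abaa"))  # every branch ends in a rejecting configuration
```

In the worst case this search explores exponentially many traversals, which is one practical motivation for converting an NFSA into a DFSA.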

Non-determinism in FSA

[Figure sequence: step-by-step traversal of the NFSA for (ab ∪ aba)* on the input a b a, followed by eof.]

Equivalence of DFSA and NFSA

Despite non-determinism, NFSA are not more powerful than DFSA: they accept the same class of languages, the regular languages.
For every non-deterministic FSA there is a deterministic FSA that accepts the same language (and vice versa).
- The corresponding DFSA has in general more states, in which it models the sets of possible states the NFSA could be in during a given traversal

There is an algorithm (via subset construction) that allows conversion of an NFSA to an equivalent DFSA.

Efficiency considerations: an FSA is most efficient and compact iff
- it is a DFSA (efficiency) → determinization of NFSA
- it is minimal (compact encoding) → minimization of FSA

Equivalence of DFSA and NFSA

FSA A1 and A2 are equivalent iff L(A1) = L(A2)
Theorem: for each NFSA there is an equivalent DFSA
Construction: A = <Φ, Σ, δ, q0, F> an NFSA over Σ
- define eps(q) = { p ∈ Φ | (q, ε) ⊢*A (p, ε) }, the states reachable from q by ε-transitions only
- define an FSA A' = <Φ', Σ, δ', q0', F'> over sets of states, with
  Φ' = {B | B ⊆ Φ}
  q0' = eps(q0)
  δ'(B, a) = ∪ { eps(p) | q ∈ B and p ∈ Φ such that (q, a, p) ∈ δ }
  F' = {B ⊆ Φ | B ∩ F ≠ ∅}

A' satisfies the definition of a DFSA. We need to show that L(A) = L(A').
Define
  D(q, w) := { p ∈ Φ | (q, w) ⊢*A (p, ε) }
and
  D'(Q, w) := { P ∈ Φ' | (Q, w) ⊢*A' (P, ε) }

Equivalence of DFSA and NFSA: Proof

Prove: D(q0, w) = D'({q0}, w) by induction over the length of w

|w| = 0: by definition of D and D'
Induction step: |w| = k+1, w = va; by hypothesis D(q0, v) = D'({q0}, v) = {p1, p2, ..., pk} = P
- by def. of D: D(q0, w) = ∪ (p ∈ P) { eps(q) | (p, a, q) ∈ δ }
- by def. of δ': D'({p1, p2, ..., pk}, a) = ∪ (p ∈ P) { eps(q) | (p, a, q) ∈ δ }
- it follows: D'({q0}, w) = δ'(D'({q0}, v), a) = D'({p1, p2, ..., pk}, a) = ∪ (p ∈ P) { eps(q) | (p, a, q) ∈ δ } = D(q0, w), q.e.d.

Finally, A and A' only accept if D'({q0}, w) = D(q0, w) contains a state from F.

Determinization by subset construction

NFSA A = <Φ, Σ, δ, q0, F>
A' = <Φ', Σ, δ', q0', F'>

Subset construction: compute δ' from δ for all subsets S ⊆ Φ and a ∈ Σ s.th.
δ'(S, a) = { s' | ∃ s ∈ S s.th. (s, a, s') ∈ δ }

[Figure: NFSA A with arcs labelled a and b.]

L(A) = a(ba)* ∪ a(bba)*

Determinization by subset construction

NFSA A = <Φ, Σ, δ, q0, F>, with states 1 to 6
A' = <Φ', Σ, δ', q0', F'>

Φ' = { B | B ⊆ {1,2,3,4,5,6} }
q0' = {1}
δ'({1}, a) = {2,3}      δ'({1}, b) = ∅
δ'({2,3}, a) = ∅        δ'({2,3}, b) = {4,5}
δ'({4,5}, a) = {2}      δ'({4,5}, b) = {6}
δ'({2}, a) = ∅          δ'({2}, b) = {4}
δ'({3}, a) = ∅          δ'({3}, b) = {5}
δ'({4}, a) = {2}        δ'({4}, b) = ∅
δ'({5}, a) = ∅          δ'({5}, b) = {6}
δ'({6}, a) = {3}        δ'({6}, b) = ∅
F' = { {2,3}, {2}, {3} }

L(A) = a(ba)* ∪ a(bba)*


Determinization by subset construction

NFSA A = <Φ, Σ, δ, q0, F>
DFSA A' = <Φ', Σ, δ', q0', F'>

[Figure: the NFSA A (states 1 to 6) next to the resulting DFSA A', whose states are {1}, {2,3}, {4,5}, {6}, {2}, {3}, {4} and {5}.]

L(A) = L(A') = a(ba)* ∪ a(bba)*

ε-transitions and ε-closure

Subset construction must account for ε-transitions → ε-closure
- The ε-closure of some state q consists of q as well as all states that can be reached from q through a sequence of ε-transitions
  q ∈ ε-closure(q)
  If r ∈ ε-closure(q) and (r, ε, q') ∈ δ, then q' ∈ ε-closure(q)

ε-closure defined on sets of states
  ε-closure(R) = ∪ (q ∈ R) ε-closure(q)   (with R ⊆ Φ)

Subset construction for ε-NFSA

- Compute δ' from δ for all subsets S ⊆ Φ and a ∈ Σ s.th.
  δ'(S, a) = { s'' | ∃ s ∈ S s.th. (s, a, s') ∈ δ and s'' ∈ ε-closure(s') }

Example
ε-NFSA for (a|b)c*
[Figure: ε-NFSA with states 0 to 9 and arcs labelled a, b, c and ε.]

ε-closure for all s ∈ Φ:
ε-closure(0) = {0,1,2}, ε-closure(1) = {1}, ε-closure(2) = {2}, ε-closure(3) = {3,5,6,7,9}, ε-closure(4) = {4,5,6,7,9}, ε-closure(5) = {5,6,7,9}, ε-closure(6) = {6,7,9}, ε-closure(7) = {7}, ε-closure(8) = {8,7,9}, ε-closure(9) = {9}

Transition function over subsets:
δ'({0}, ε) = {0,1,2}
δ'({0,1,2}, a) = {3,5,6,7,9}
δ'({0,1,2}, b) = {4,5,6,7,9}
δ'({3,5,6,7,9}, c) = {8,7,9}
δ'({4,5,6,7,9}, c) = {8,7,9}
δ'({8,7,9}, c) = {8,7,9}

[Figure: resulting DFSA with states {0,1,2}, {3,5,6,7,9}, {4,5,6,7,9} and {8,7,9}.]
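The ε-closures listed for the example can be recomputed mechanically. The arcs below are an assumption: they are one set of ε- and symbol-transitions that is consistent with the closures and subset transitions given for (a|b)c*, reconstructed for this sketch because the original diagram is not available. The function names are illustrative.

```python
# Assumed arcs of the example epsilon-NFSA for (a|b)c* (reconstructed, see lead-in).
EPS_ARCS = {0: {1, 2}, 3: {5}, 4: {5}, 5: {6}, 6: {7, 9}, 8: {7, 9}}
SYMBOL_ARCS = {(1, "a"): {3}, (2, "b"): {4}, (7, "c"): {8}}


def eps_closure(q, eps_arcs=EPS_ARCS):
    """q itself plus every state reachable from q through epsilon-arcs only."""
    closure, stack = {q}, [q]
    while stack:
        r = stack.pop()
        for nxt in eps_arcs.get(r, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def eps_closure_of_set(states, eps_arcs=EPS_ARCS):
    """Closure of a set of states: the union of the closures of its members."""
    return set().union(*(eps_closure(q, eps_arcs) for q in states))


def subset_step(subset, a):
    """One subset-construction step: read symbol a from `subset`, then close under epsilon."""
    targets = {t for s in subset for t in SYMBOL_ARCS.get((s, a), ())}
    return eps_closure_of_set(targets)


start = eps_closure(0)
print(sorted(start), sorted(subset_step(start, "a")), sorted(subset_step(start, "b")))
```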

An algorithm for subset construction

Construction of DFSA A' = <Φ', Σ, δ', q0', F'> from NFSA A = <Φ, Σ, δ, q0, F>
- Φ' = {B | B ⊆ Φ}; if unconstrained, |Φ'| can be 2^|Φ|. With |Φ| = 33 this could lead to an FSA with 2^33 states, more than a 32-bit integer can index.
- Many of these states may be useless

[Figure: NFSA with states 0, 1, 2 and the full subset automaton over ∅, {0}, {1}, {2}, {0,1}, {0,2}, {1,2}, {0,1,2}.]

L(A) = (a|b)a* ∪ ab+a*

[Figure: the full subset automaton for L(A) = (a|b)a* ∪ ab+a*, with one state for every subset of {0, 1, 2}.]

No transition can ever enter some of these states.
Only consider states that can be traversed starting from q0.

An algorithm for subset construction

Basic idea: we only need to consider states B ⊆ Φ that can ever be traversed by a string w ∈ Σ*, starting from q0'. I.e., those B for which B = δ'(q0', w), for some w ∈ Σ*, with δ' the recursively constructed transition function for the target DFSA A'.
Consider all strings w ∈ Σ* in order of their length: ε, a, b, aa, ab, ba, bb, aaa, ...

[Figure: the reachable part of the DFSA, built up for lengths l = 0 (ε), l = 1 (a, b) and l = 2, 3, 4, … (aa, ab, ba, bb, aaa, aab, aba, …).]

- Construction by increasing lengths of strings
- For each a ∈ Σ, construct transitions to known or new states according to δ
- New target states (of A') are placed in a queue (FIFO)
- Termination: no states left on the queue

An algorithm for subset construction

DETERMINIZE(Φ, Σ, δ, q0, F)
  q0' ← {q0}
  Φ' ← {q0'}
  ENQUEUE(Queue, q0')
  while Queue ≠ ∅
    S ← DEQUEUE(Queue)
    for a ∈ Σ
      δ'(S, a) ← ∪ (r ∈ S) δ(r, a)
      if δ'(S, a) ∉ Φ'
        Φ' ← Φ' ∪ {δ'(S, a)}
        ENQUEUE(Queue, δ'(S, a))
        if δ'(S, a) ∩ F ≠ ∅
          F' ← F' ∪ {δ'(S, a)}
        fi
      fi
  return (Φ', Σ, δ', q0', F')

Complexity
The maximal number of states placed in the queue is 2^|Φ|, so the worst-case runtime is exponential
→ determinization is a costly operation,
→ but it results in an efficient FSA (recognition is linear in the size of the input)
→ and it avoids computation of isolated states
The actual run time depends on the shape of the NFSA
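The DETERMINIZE procedure can be sketched as runnable code. The version below is an illustration under assumptions: it handles NFSA without ε-transitions, represents subset states as frozensets, and, unlike the pseudocode, also marks the start subset as final when q0 ∈ F. It is applied to the six-state NFSA of the earlier example, L(A) = a(ba)* ∪ a(bba)*, with arcs read off the δ' table given there.

```python
from collections import deque


def determinize(alphabet, delta, q0, finals):
    """Queue-based subset construction that only visits reachable subsets.

    delta maps (state, symbol) to a set of next states (no epsilon-arcs).
    """
    start = frozenset({q0})
    states = {start}
    d_delta = {}
    d_finals = {start} if q0 in finals else set()
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for a in alphabet:
            target = frozenset(t for r in subset for t in delta.get((r, a), ()))
            d_delta[(subset, a)] = target
            if target not in states:      # a new DFSA state: schedule it
                states.add(target)
                queue.append(target)
                if target & finals:
                    d_finals.add(target)
    return states, d_delta, start, d_finals


# NFSA of the running example; final states are 2 and 3.
NFSA = {(1, "a"): {2, 3}, (2, "b"): {4}, (4, "a"): {2},
        (3, "b"): {5}, (5, "b"): {6}, (6, "a"): {3}}

states, d_delta, start, d_finals = determinize("ab", NFSA, 1, {2, 3})
print(len(states), sorted(sorted(s) for s in d_finals))
```

Because this sketch records a target for every symbol, the empty subset appears as an explicit sink state. It therefore reports nine states, one more than a drawing of the same DFSA that simply leaves those transitions undefined.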

Minimization of FSA

Can we transform a large automaton into a smaller one (provided a smaller one exists)?
If A is a DFSA, is there an algorithm for constructing an equivalent minimal automaton Amin from A?

[Figure: DFSA A with states 0, 1, 2, 3 and the smaller DFSA A' with states 0, 1, 2, where the b- and c-arcs lead to the same state.]

A is equivalent to A', i.e., L(A) = L(A')
A' is smaller than A, i.e., |Φ| > |Φ'|

A can be transformed to A':
- States 2 and 3 in A "do the same job": once A is in state 2 or 3, it accepts the same suffix string. Such states are called equivalent.
- Thus, we can eliminate state 3 without changing the language of A, by redirecting all arcs leading to 3 to 2 instead.

Minimization of FSA

A DFSA can be minimized if there are pairs of states q, q' ∈ Φ that are equivalent.
Two states q, q' are equivalent iff they accept the same right language.

Right language of a state
- For A = <Φ, Σ, δ, q0, F> a DFSA, the right language L→(q) of a state q is the set of all strings accepted by A starting in state q:
  L→(q) = {w ∈ Σ* | δ*(q, w) ∈ F}
- Note: L→(q0) = L(A)

State equivalence
- For A = <Φ, Σ, δ, q0, F> a DFSA and q, q' ∈ Φ, q and q' are equivalent (q ≡ q') iff L→(q) = L→(q')
- ≡ is an equivalence relation (i.e., reflexive, transitive and symmetric)
- ≡ partitions the set of states Φ into a number of disjoint sets Q1 ... Qm of equivalence classes s.th. ∪ (i = 1..m) Qi = Φ and q ≡ q' for all q, q' ∈ Qi

Partitioning a state set into equivalence classes

[Figure: a DFSA whose states are grouped into the classes C0, C1, C2, C3 and C4: the equivalence classes on the state set defined by ≡.]

All classes Ci consist of equivalent states qj that accept identical right languages L→(qj).
Whenever two states q, q' belong to different classes, L→(q) ≠ L→(q').

Minimization: elimination of equivalent states

Minimization of a DFSA

A DFSA A = <Φ, Σ, δ, q0, F> that contains equivalent states q, q' can be transformed to a smaller, equivalent DFSA A' = <Φ', Σ, δ', q0, F'> where

Φ' = Φ \ {q'}, F' = F \ {q'},
δ'(s, a) = q if δ(s, a) = q'; δ'(s, a) = δ(s, a) otherwise

δ' is like δ with all transitions to q' redirected to q.

Two-step algorithm
- Determine all pairs of equivalent states q, q'
- Apply DFSA reduction until no such pair q, q' is left in the automaton

Minimality
- The resulting FSA is the smallest DFSA (in size of Φ) that accepts L(A): we never merge different equivalence classes, so we obtain one state per class. We cannot do any further reduction and still recognize L(A). As long as we have more than one state per class, we can do further reduction steps.

A DFSA A = <Φ, Σ, δ, q0, F> is minimal iff there is no pair of distinct but equivalent states in Φ, i.e. ∀ q, q' ∈ Φ: q ≡ q' ⇔ q = q'

Example

[Figure: a DFSA with states 0 to 7 over {a, b}, shown before minimization and after merging equivalent states, leaving states 0, 2, 3 and 4.]

Algorithm

MINIMIZE(Φ, Σ, δ, q0, F)
  EqClass[] ← PARTITION(A)
  q0 ← min(EqClass[q0])
  for <q, a, q'> ∈ δ
    δ(q, a) ← min(EqClass[q'])
  for q ∈ Φ
    if q ≠ min(EqClass[q])
      Φ ← Φ \ {q}
      if q ∈ F
        F ← F \ {q}

MINIMIZE
→ PARTITION(A) determines all equivalence classes of states in A and returns an array EqClass[q] of the equivalence class of each q
→ redirect all transitions <q, a, q'> to point to min(EqClass[q'])
→ remove all redundant states from Φ and F

Computing partitions: Naïve partitioning

NAIVE_PARTITION(Φ, Σ, δ, q0, F)
  for each q ∈ Φ
    EqClass[q] ← {q}
  for each q ∈ Φ
    for each q' ∈ Φ
      if EqClass[q] ≠ EqClass[q'] ∧ CHECKEQUIVALENCE(Aq, Aq') = True
        EqClass[q] ← EqClass[q] ∪ EqClass[q']
        EqClass[q'] ← EqClass[q]

NAIVE_PARTITION
→ array EqClass of pointers to disjoint sets for equivalence classes
→ first loop: initializing EqClass[q] by {q}, for each q ∈ Φ
→ second, nested loop: if we find new equivalent states q ≡ q', we merge the respective equivalence classes and reset EqClass[q'] to point to the new merged class

Runtime complexity: the loops take O(|Φ|²) and each CheckEquivalence takes O(|Φ|² · |Σ|), giving O(|Φ|⁴ · |Σ|) in total!

Computing partitions: Dynamic Programming

Source of inefficiency: the naive algorithm traverses the whole automaton to determine, for pairs q, q', whether they are equivalent.
Results of previous equivalence checks can be reused.

[Figure: states p and p' with a-arcs leading to q and q' respectively.]

If q ≢ q', then L→(q) ≠ L→(q'); therefore, for all <p, p'> s.th. δ(p, a) = q and δ(p', a) = q' for some a ∈ Σ, p ≢ p'.

Thus, non-equivalence results can be propagated.
- Propagation from final/non-final pairs: L→(q) ≠ L→(q') if q ∈ F ∧ q' ∉ F
- Propagation from pairs <q, q'> where δ(q, a) is defined but δ(q', a) is not.

Propagation of non-equivalent states

LocalEquivalenceCheck(q, q')
  if (q ∈ F and q' ∉ F) or (q ∉ F and q' ∈ F)
    return (False)
  if ∃ a ∈ Σ s.th. only one of δ(q, a), δ(q', a) is defined
    return (False)
  return (True)

PROPAGATE(q, q')
  for a ∈ Σ
    for p ∈ δ⁻¹(q, a), for p' ∈ δ⁻¹(q', a)
      if Equiv[min(p, p'), max(p, p')] = 1
        Equiv[min(p, p'), max(p, p')] ← 0
        PROPAGATE(p, p')

Non-equivalence check for states <q, q'>
- Only one of q, q' is final
- For some a ∈ Σ, δ(q, a) is defined, δ(q', a) is not

Propagation (I): Table filling algorithm (Aho, Sethi, Ullman)
- represent the equivalence relation as a table Equiv, with cells filled with boolean values
- initialize all cells with 1; reset to 0 for non-equivalent states
- main loop: call PROPAGATE for the non-equivalent states found by LocalEquivalenceCheck

Propagation of non-equivalent states

TableFillingPARTITION(Φ, Σ, δ, q0, F)
  for q, q' ∈ Φ, q < q'
    Equiv[q, q'] ← 1
  for q ∈ Φ
    for q' ∈ Φ, q < q'
      if Equiv[q, q'] = 1 and LocalEquivalenceCheck(q, q') = False
        Equiv[q, q'] ← 0
        PROPAGATE(q, q')

Runtime complexity: O(|Φ|² · |Σ|)
→ PROPAGATE is never called twice on a given pair of states (it checks Equiv[q, q'] = 1)

Space requirements: O(|Φ|²) cells
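The table-filling idea can be sketched as follows. This is an illustration, not the slides' exact procedure: it replaces the recursive PROPAGATE with an explicit worklist, stores the table as a set of unordered pairs still assumed equivalent, and runs on a small hypothetical DFSA over the alphabet {a} that was made up for this example.

```python
from itertools import combinations


def equivalent_pairs(states, alphabet, delta, finals):
    """Return the unordered state pairs that are never shown to be non-equivalent.

    delta maps (state, symbol) to a single next state and may be partial.
    """
    # Inverse transitions: (target, symbol) -> set of source states.
    inverse = {}
    for (src, sym), tgt in delta.items():
        inverse.setdefault((tgt, sym), set()).add(src)

    def locally_distinct(q, r):
        if (q in finals) != (r in finals):  # only one of the two is final
            return True
        # some symbol is defined for exactly one of the two states
        return any(((q, a) in delta) != ((r, a) in delta) for a in alphabet)

    equiv = {frozenset(pair) for pair in combinations(states, 2)}  # all cells start as 1
    worklist = [pair for pair in equiv if locally_distinct(*pair)]
    equiv.difference_update(worklist)                              # reset those cells to 0
    while worklist:
        q, r = worklist.pop()
        for a in alphabet:
            for p in inverse.get((q, a), ()):
                for p2 in inverse.get((r, a), ()):
                    pair = frozenset((p, p2))
                    if pair in equiv:        # propagate non-equivalence backwards
                        equiv.discard(pair)
                        worklist.append(pair)
    return equiv


# Hypothetical DFSA: 0 -a-> 1 -a-> 2, 3 -a-> 4 -a-> 2, 2 -a-> 2, final state 2.
DELTA = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2, (3, "a"): 4, (4, "a"): 2}
pairs = equivalent_pairs({0, 1, 2, 3, 4}, ["a"], DELTA, {2})
print(sorted(sorted(p) for p in pairs))
```

In this example the pair (0, 1) passes the local check but is separated by propagation from (1, 2), which shows why the local check alone is not sufficient.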

More optimizations
- Hopcroft and Ullman: space requirement O(|Φ|), by associating states with their equivalence classes
- Hopcroft: runtime complexity of O(|Φ| · log|Φ| · |Σ|), by distinction of active/non-active blocks

Brzozowski's Algorithm

Minimization by reversal and determinization

DFSA A, accepting L(A)
  → reverse → NFSA A⁻¹, accepting L(A)⁻¹
  → determinize → DFSA A⁻¹, accepting L(A)⁻¹
  → reverse → NFSA (A⁻¹)⁻¹, accepting L(A)
  → determinize → DFSA (A⁻¹)⁻¹, accepting L(A)

Reversal
→ Final states of A⁻¹: the set of initial states of A
→ Initial states of A⁻¹: F of A
→ δ⁻¹(q, a) = {p ∈ Φ | δ(p, a) = q}
→ L(A⁻¹) = L(A)⁻¹
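The reverse/determinize pipeline can be sketched compactly. The code below is an assumption-laden illustration: transitions map (state, symbol) to sets of states so that DFSA and NFSA share one representation, several initial states are allowed (reversal needs this), and determinization leaves a transition undefined rather than adding a sink state. All function names are hypothetical.

```python
def reverse(delta, initials, finals):
    """Flip every arc and swap the roles of initial and final states."""
    r_delta = {}
    for (src, a), targets in delta.items():
        for tgt in targets:
            r_delta.setdefault((tgt, a), set()).add(src)
    return r_delta, set(finals), set(initials)


def determinize(alphabet, delta, initials, finals):
    """Subset construction restricted to subsets reachable from the set of initial states."""
    start = frozenset(initials)
    states, queue, d_delta = {start}, [start], {}
    while queue:
        subset = queue.pop(0)
        for a in alphabet:
            target = frozenset(t for s in subset for t in delta.get((s, a), ()))
            if not target:
                continue  # leave the transition undefined instead of adding a sink
            d_delta[(subset, a)] = {target}
            if target not in states:
                states.add(target)
                queue.append(target)
    d_finals = {s for s in states if s & set(finals)}
    return states, d_delta, {start}, d_finals


def brzozowski(alphabet, delta, initials, finals):
    """reverse, determinize, reverse, determinize."""
    r_delta, r_init, r_fin = reverse(delta, initials, finals)
    _, d_delta, d_init, d_fin = determinize(alphabet, r_delta, r_init, r_fin)
    r_delta, r_init, r_fin = reverse(d_delta, d_init, d_fin)
    return determinize(alphabet, r_delta, r_init, r_fin)


# Hypothetical DFSA for aa a*: 0 -a-> 1 -a-> 2 -a-> 2, plus unreachable states 3 and 4.
DELTA = {(0, "a"): {1}, (1, "a"): {2}, (2, "a"): {2}, (3, "a"): {4}, (4, "a"): {2}}
min_states, min_delta, min_init, min_finals = brzozowski("a", DELTA, {0}, {2})
print(len(min_states), len(min_finals))
```

For this input the result has three states, one of them final, which is the smallest DFSA for strings of two or more a's. The unreachable states 3 and 4 of the input disappear as a side effect of the final determinization.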

Why does it yield a minimal DFSA A'?

[Figure: the pipeline DFSA A → (rev) → NFSA A⁻¹ → (det) → DFSA A⁻¹ → (rev) → NFSA (A⁻¹)⁻¹ → (det) → DFSA (A⁻¹)⁻¹, illustrated on a small automaton with arcs a, b, c, d.]

Consider the right languages of states q, q' in NFSA (A⁻¹)⁻¹:
→ If for all distinct states q, q' L→(q) ≠ L→(q'), i.e. L→(q) ∩ L→(q') = ∅, it holds that each pair of states q, q' recognizes different right languages, and thus that the NFSA (A⁻¹)⁻¹ satisfies the minimality condition for a DFSA.
→ If there were states q, q' in NFSA (A⁻¹)⁻¹ s.th. L→(q) ∩ L→(q') ≠ ∅, there would be some string w that leads to two distinct states in DFSA A⁻¹. This contradicts the determinism criterion of a DFSA.
→ Determinization of NFSA (A⁻¹)⁻¹ does not destroy the property of minimality

Applications of FSA: String Matching

Exact, full string matching
- Lexicon lookup: search for a given word/string in a lexicon
- Compile lexicon entries to an FSA by union
- Test input words for acceptance in the lexicon FSA

[Figure: word list → compile to FSA → transition table → recognition/application/lookup of input word w in the FSA A_lexicon by traversal and recognition (acceptance).]

(q0, w) ⊢*A_lexicon (qf, ε) is true, with q0 the initial state and qf ∈ F

Applications of FSA: String Matching

Substring matching
- Identify stop words in a stream of text
- Stem recognition: small, smaller, smallest

Make use of the full power of finite-state operations!
- Regular expression with any-symbols for text search:
  ?* small(ε | er | est) ?*
  ?* (a | the | …) ?*
- Compilation to NFSA, conversion to DFSA
- Application by composition of the FST with the full text:
  FSA_text stream ∘ FST_small: if defined, the search term is a substring of the text

Application of FSA: Replacement

(Sub)string replacement
- Delete stop words in text
- Stemming: reduce/replace inflected forms to stems: smallest → small
- Morphology: map inflected forms to lemmas (and PoS tags): good, better, best → good+Adj
- Tokenization: insert token boundaries
- …

→ Finite-state transducers (FST)

From Automata to Transducers

Automata
- recognition of an input string w
- define a language
- accept strings, with transitions defined for symbols ∈ Σ

[Figure: q0 -l-> q1 -e-> q2 -a-> q3 -v-> q4 -e-> q5]

Transducers
- recognition of an input string w and generation of an output string w'
- define a relation between languages
- equivalent to FSA that accept pairs of strings, with transitions defined for pairs of symbols <x, y>
- operations: replacement
  deletion <a, ε>, a ∈ Σ - {ε}
  insertion <ε, a>, a ∈ Σ - {ε}
  substitution <a, b>, a, b ∈ Σ, a ≠ b

[Figure: q0 -l:l-> q1 -e:e-> q2 -a:f-> q3 -v:t-> q4 -e:ε-> q5]

Transducers and composition

An FST encodes a relation between languages.
A relation may contain an infinite number of ordered pairs, e.g. translating lower case letters to upper case.

a:A, b:B, c:C, ...   (a lower/upper case transducer)
x:X y:Y z:Z z:Z y:Y  (a path through the lower/upper case transducer, for the string xyzzy)

The application of a transducer to a string may also be viewed as composition of the FST with the identity relation on the string.

[Figure: the transducer relating "leave+VBD" and "left" over states q0 ... q5, composed with the identity relation on the input string.]
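Applying a transducer to a string can be sketched as an acceptor that also emits output. The code assumes a deterministic, ε-input-free FST stored as a dictionary from (state, input symbol) to (next state, output string). The function name `transduce` and both example tables are illustrative; the second table encodes the leave/left path with the symbol pairs l:l, e:e, a:f, v:t, e:ε.

```python
import string


def transduce(delta, initial, finals, word):
    """Return the output string for `word`, or None if the input is not accepted."""
    state, output = initial, []
    for symbol in word:
        if (state, symbol) not in delta:
            return None          # no transition: the input side is rejected
        state, out = delta[(state, symbol)]
        output.append(out)       # out may be "" for a deletion <a, epsilon>
    return "".join(output) if state in finals else None


# Lower/upper case transducer: a single state with one arc a:A, b:B, ... per letter.
UPPER = {("q0", c): ("q0", c.upper()) for c in string.ascii_lowercase}

# Path transducer for leave -> left.
LEAVE = {("q0", "l"): ("q1", "l"), ("q1", "e"): ("q2", "e"),
         ("q2", "a"): ("q3", "f"), ("q3", "v"): ("q4", "t"),
         ("q4", "e"): ("q5", "")}

print(transduce(UPPER, "q0", {"q0"}, "xyzzy"))
print(transduce(LEAVE, "q0", {"q5"}, "leave"))
```

The single-state case transducer already relates infinitely many string pairs, which illustrates how a finite machine can encode an infinite relation.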

Literature
- H.R. Lewis and C.H. Papadimitriou: Elements of the Theory of Computation. Prentice-Hall, New Jersey (Chapter 2).
- J. Hopcroft and J. Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Massachusetts (Chapters 2, 3).
- B.H. Partee, A. ter Meulen and R.E. Wall: Mathematical Methods in Linguistics. Kluwer Academic Publishers, Dordrecht (Chapters 15.5, 15.6, 17).
- D. Jurafsky and J.H. Martin: Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, New Jersey (Chapter 2).
- C. Martin-Vide: Formal Grammars and Languages. In: R. Mitkov (ed): Oxford Handbook of Computational Linguistics (Chapter 8).
- L. Karttunen: Finite-state Technology. In: R. Mitkov (ed): Oxford Handbook of Computational Linguistics (Chapter 18).

Off-the-shelf finite-state tools

Xerox finite-state tools
- http://www.xrce.xerox.com/competencies/content-analysis/fst/ > Xerox Finite State Compiler (Demo)
- XFST Tools (provided with Beesley and Karttunen: Finite-State Morphology, CSLI Publications)

Geertjan van Noord's finite-state tools
- http://odur.let.rug.nl/~vannoord/Fsa/

FSA Utilities at Johns Hopkins
- http://cs.jhu.edu/~jason/406/software.html

AT&T FSM Library
- http://www.research.att.com/sw/tools/fsm/

Exercises

Write a program for acceptance of a string by a DFSA. Then extend it to a finite-state transducer that can translate a surface form to lemma + POS, or between upper and lower case.

Determinize the following NFSA by subset construction.
A1 = <{p,q,r,s}, {a,b}, δ1, p, {s}> where δ1 is as follows:

δ1 | a     | b
p  | p, q  | p
q  | r     | r
r  | s     | -
s  | s     | s

Construct an NFSA with ε-transitions from the regular expression (a|b)ca*, according to the construction principles for union, concatenation and Kleene star. Then transform the NFSA to a DFSA by subset construction.

Find a minimal DFSA for the FSA A = <{A,..,E}, {0,1}, δ3, A, {C,E}> (using the table filling algorithm by propagation).

δ3 | 0 | 1
A  | B | D
B  | B | C
C  | D | E
D  | D | E
E  | C | -
