Bernd Kiefer, kiefer@dfki.de. Many thanks to Anette Frank for the slides. MSc Computational Linguistics Course, SS 2009
Overview
Finite-state automata (FSA): What for?
- Recap: Chomsky hierarchy of grammars and languages
- FSA, regular languages and regular expressions
- Appropriate problem classes and applications
[Figure: diagram relating formalisms to languages; edge labels: describe/specify, recognize]
Regular languages
[Figure: the same diagram; edge labels: describe/specify, recognize/generate]
Regular languages
- properties of regular languages
- appropriate problem classes
- algorithms for FSA
Strings
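$L(G) = \{ w : S \Rightarrow^* w \wedge w \in \Sigma^* \}$
Example: $G = \langle \{S, A, B, C, D, E\}, \{a\}, S, R \rangle$, $L(G) = \{ a^{2^n} \mid n \geq 1 \}$
$S \to ACaB$, $CB \to E$, $aE \to Ea$, $Ca \to aaC$, $aD \to Da$, $AE \to \varepsilon$, $CB \to DB$, $AD \to AC$
$a^{2^2} = aaaa \in L(G)$ iff $S \Rightarrow^* aaaa$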
Example: $L = \{ a^n b a^n \mid n \geq 0 \}$
$R = \{ S \to ASA,\ S \to b,\ A \to a \}$
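Example: $\Sigma = \{a, b\}$, $N = \{S, A, B\}$, $R = \{ S \to aA,\ A \to aA,\ A \to bbB,\ B \to bB,\ B \to b \}$
[Figure: transition graph over the states S, A, B with arcs labelled a and b]
$S \Rightarrow aA \Rightarrow aaA \Rightarrow aabbB \Rightarrow aabbbB \Rightarrow aabbbb$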
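The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the rule set S → aA, A → aA, A → bbB, B → bB, B → b; the names `RULES` and `derives` are hypothetical and chosen only for illustration.

```python
# Right-linear rules: nonterminal -> list of (terminal string, next nonterminal).
# A next nonterminal of None marks a rule that ends the derivation.
RULES = {
    "S": [("a", "A")],
    "A": [("a", "A"), ("bb", "B")],
    "B": [("b", "B"), ("b", None)],
}


def derives(nonterminal, word):
    """Return True if `word` can be derived from `nonterminal`."""
    for terminals, next_nonterminal in RULES[nonterminal]:
        if not word.startswith(terminals):
            continue
        rest = word[len(terminals):]
        if next_nonterminal is None:
            # Terminating rule: the whole remaining input must be consumed.
            if rest == "":
                return True
        elif derives(next_nonterminal, rest):
            return True
    return False
```

Every rule consumes at least one terminal, so the recursion terminates. `derives("S", "aabbbb")` follows the same steps as the derivation shown above.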
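Operations on languages
Typical set-theoretic operations on languages
- Union: $L_1 \cup L_2 = \{ w : w \in L_1 \text{ or } w \in L_2 \}$
- Intersection: $L_1 \cap L_2 = \{ w : w \in L_1 \text{ and } w \in L_2 \}$
- Difference: $L_1 - L_2 = \{ w : w \in L_1 \text{ and } w \notin L_2 \}$
- Complement of $L \subseteq \Sigma^*$ wrt. $\Sigma^*$: $\overline{L} = \Sigma^* - L$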
Kleene star: $L^* = \bigcup_{i \geq 0} L^i$
Union, concatenation and Kleene star are called regular operations.
Regular sets/languages: languages that are defined by the regular operations: concatenation ($\cdot$), union ($\cup$) and Kleene star ($^*$).
Regular languages are closed under concatenation, union, Kleene star, intersection and complementation.
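For finite languages the regular operations can be tried out directly on Python sets of strings. This is only an illustrative sketch: the helper names `concat`, `power` and `star_up_to` are hypothetical, and since the Kleene star of a non-empty language is infinite, `star_up_to` computes just the finite union of the powers L^0 to L^n.

```python
from itertools import product


def concat(l1, l2):
    """Concatenation: every string of l1 followed by every string of l2."""
    return {u + v for u, v in product(l1, l2)}


def power(lang, i):
    """L^i: lang concatenated with itself i times (L^0 is {""})."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result


def star_up_to(lang, n):
    """Finite approximation of the Kleene star: union of L^0 .. L^n."""
    result = set()
    for i in range(n + 1):
        result |= power(lang, i)
    return result


L1 = {"a", "ab"}
L2 = {"b"}
union = L1 | L2                 # set union
concatenation = concat(L1, L2)  # {"ab", "abb"}
```

Union, intersection and difference map onto the built-in set operators `|`, `&` and `-`.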
Regular expressions
[Figure: diagram relating formalisms to languages; edge labels: describe/specify, recognize/generate]
Regular languages
Transition graphs (diagrams)
- states: circles
- transitions: directed arcs between circles, $\delta(p, a) = q$
- initial state: $p = q_0$
- final state: $r \in F$
[Figure: transition graph with states q0 to q9 and arcs labelled a, c, e, l, r, t, v]
State diagram (transition table; 0: no transition)
     q0  q1  q2  q3  q4  q5  q6  q7  q8  q9
a    0   0   0   q4  0   0   0   0   0   0
c    q1  0   0   0   0   0   0   0   0   0
e    q3  0   q3  0   0   0   q7  0   0   q4
l    q6  q2  0   0   0   0   0   0   0   0
r    0   0   0   0   q5  0   0   0   0   0
t    0   0   0   0   0   0   0   q8  q9  0
v    0   0   0   q9  0   0   0   0   0   0
Acceptance
- Decide whether an input string $w$ is in the language $L(A)$ defined by FSA $A$
- An FSA $A$ accepts a string $w$ iff $(q_0, w) \vdash^*_A (q_f, \varepsilon)$, with $q_0$ initial state, $q_f \in F$
- The language $L(A)$ accepted by FSA $A$ is the set of all strings accepted by $A$. I.e., $w \in L(A)$ iff there is some $q_f \in F_A$ such that $(q_0, w) \vdash^*_A (q_f, \varepsilon)$
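Acceptance by a deterministic FSA amounts to following one transition per input symbol. The sketch below encodes the transitions of the state diagram's transition table (state numbers stand for q0 to q9). The set of final states is not recoverable from the table, so `FINAL = {5}` is an assumption, as are the names `DELTA`, `START`, `FINAL` and `accepts`.

```python
# Transitions delta(q, symbol) = q'; missing entries mean "no transition".
DELTA = {
    (0, "c"): 1, (0, "e"): 3, (0, "l"): 6,
    (1, "l"): 2,
    (2, "e"): 3,
    (3, "a"): 4, (3, "v"): 9,
    (4, "r"): 5,
    (6, "e"): 7,
    (7, "t"): 8,
    (8, "t"): 9,
    (9, "e"): 4,
}
START = 0
FINAL = {5}  # assumption: q5 is taken to be the only accepting state


def accepts(word):
    """Return True iff (q0, word) leads to a final state with empty rest."""
    state = START
    for symbol in word:
        state = DELTA.get((state, symbol))
        if state is None:  # no transition defined: reject
            return False
    return state in FINAL
```

Under that assumption the automaton accepts the words clear, clever, ear, ever and letter, and the running time is linear in the length of the input.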
[Figure: FSA fragment with states A and B and arcs labelled b]
- Conversely, we can construct an FSA from the rules of a type-3 language
- Regular grammars and FSA can be shown to be equivalent
- Regular grammars generate regular languages
- Regular languages are defined by concatenation, union, Kleene star
Regular expression and FSA for lehr: (un | $\varepsilon$) (be lehr | lehr) bar (keit | $\varepsilon$) (non-deterministic)
[Figure: non-deterministic FSA with arcs labelled un, be, lehr, bar, keit]
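The same pattern can be tried with Python's standard `re` module. This is a sketch only: the optional parts (un | ε) and (keit | ε) are written with `?`, and the names `PATTERN` and `matches` are hypothetical.

```python
import re

# (un | eps)(be lehr | lehr) bar (keit | eps)
PATTERN = re.compile(r"(un)?(belehr|lehr)bar(keit)?")


def matches(word):
    """Return True iff the whole word is described by the expression."""
    return PATTERN.fullmatch(word) is not None
```

`fullmatch` is used so that the entire string has to be covered, which corresponds to acceptance by the FSA rather than to finding a matching substring.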
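[Figure: FSA construction combining $F_A$ and $F_B$ (labels $F_A$, $F_B$, $F_{AB}$); the component automata are linked by $\varepsilon$-transitions of the form $(\langle q_i, \varepsilon \rangle, q_j)$]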
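- Kleene star over an FSA $F_A$: $F_{A^*}$
  $S_{A^*} = s_0$ (new state)
  $F_{A^*} = \{ q_j \}$ (new state)
  $\delta_{A^*} = \delta_A \cup \{ (\langle q_j, \varepsilon \rangle, q_z) \mid q_j \in F_A,\ q_z = S_A \} \cup \{ (\langle q_0, \varepsilon \rangle, q_z) \mid q_0 = S_{A^*},\ (q_z = S_A \text{ or } q_z \in F_{A^*}) \} \cup \{ (\langle q_z, \varepsilon \rangle, q_j) \mid q_z \in F_A,\ q_j \in F_{A^*} \}$
[Figure: FSA $F_{A^*}$]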
$\varepsilon$-transition: move to a new state without reading an input symbol
FSA construction from regular expressions yields a non-deterministic FSA (NFSA)
- Choice of next state is only partially determined by the current configuration, i.e., we cannot always predict which state will be the next state in the traversal
- Introduced by $\varepsilon$-transitions and/or
- Transition being a relation over $\Phi \times \Sigma^* \times \Phi$, i.e. a set of triples $\langle q_{source}, z, q_{target} \rangle$
- Equivalently: transition function maps to a set of states: $\delta: \Phi \times \Sigma \to \wp(\Phi)$
An NFSA is given by
- $\Phi$: a finite non-empty set of states
- $\Sigma$: a finite alphabet of input letters
- $\delta$: a transition function $\Phi \times \Sigma^* \to \wp(\Phi)$
- $q_0$: the initial state
- $F$: the set of final (accepting) states
$(q, w_i) \vdash_A (q', w_{i+1})$ iff $w_i = z\,w_{i+1}$ for $z \in \Sigma^*$ and $q' \in \delta(q, z)$
An NDFA (w/o $\varepsilon$) accepts a string $w$ iff there is some traversal such that $(q_0, w) \vdash^*_A (q', \varepsilon)$ and $q' \in F$.
A string $w$ is rejected by NDFA $A$ iff $A$ does not accept $w$, i.e. all configurations of $A$ for string $w$ are rejecting configurations!
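Non-determinism in FSA
(ab | aba)*
[Figure: NFSA for (ab | aba)* with arcs labelled a and b; step-by-step traversal of an input up to eof]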
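One way to run a non-deterministic FSA without backtracking is to track the set of all states reachable after each input symbol. The sketch below does this for (ab | aba)*. The four-state automaton in `NFA` is a hand-built, hypothetical encoding of that expression without ε-transitions, not necessarily the automaton drawn on the slide.

```python
# State 0 is both initial and final.
# Branch "ab":  0 -a-> 1 -b-> 0
# Branch "aba": 0 -a-> 2 -b-> 3 -a-> 0
NFA = {
    (0, "a"): {1, 2},
    (1, "b"): {0},
    (2, "b"): {3},
    (3, "a"): {0},
}
START = 0
FINAL = {0}


def accepts(word):
    """Accept iff at least one traversal ends in a final state."""
    current = {START}
    for symbol in word:
        # Union of the successor sets of all currently possible states.
        current = set().union(*(NFA.get((q, symbol), set()) for q in current))
        if not current:  # every traversal is stuck: reject
            return False
    return bool(current & FINAL)
```

After reading the first `a` the simulation holds both states 1 and 2, which is exactly the choice a single traversal could not predict.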
There is an algorithm (via subset construction) that allows conversion of an NFSA to an equivalent DFSA.
Efficiency considerations: an FSA is most efficient and compact iff
- It is a DFSA (efficiency): Determinization of NFSA
- It is minimal (compact encoding): Minimization of FSA
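Equivalence of the NFSA $A$ and the subset automaton $A'$: let
$D(q, w) := \{ p \in \Phi \mid (q, w) \vdash^*_A (p, \varepsilon) \}$
and
$D'(Q, w) := P$ for the state $P \in \Phi'$ with $(Q, w) \vdash^*_{A'} (P, \varepsilon)$
Proof of $D(q_0, w) = D'(\{q_0\}, w)$ by induction on $|w|$:
- $|w| = 0$: by definition of $D$ and $D'$
- Induction step: $|w| = k+1$, $w = va$; by hypothesis: $D(q_0, v) = D'(\{q_0\}, v) = \{p_1, p_2, \ldots, p_k\} = P$
- by def. of $D$: $D(q_0, w) = \bigcup_{p \in P} \{ \mathrm{eps}(q) \mid (p, a, q) \in \delta \}$
- by def. of $\delta'$: $D'(\{p_1, p_2, \ldots, p_k\}, a) = \bigcup_{p \in P} \{ \mathrm{eps}(q) \mid (p, a, q) \in \delta \}$
- it follows: $D'(\{q_0\}, w) = \delta'(D'(\{q_0\}, v), a) = D'(\{p_1, p_2, \ldots, p_k\}, a) = D(q_0, w)$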
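$A' = \langle \Phi', \Sigma, \delta', q_0', F' \rangle$
Subset construction: compute $\delta'$ from $\delta$ for all subsets $S \subseteq \Phi$ and $a \in \Sigma$ s.th. $\delta'(S, a) = \{ s' \mid \exists s \in S \text{ s.th. } (s, a, s') \in \delta \}$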
$A' = \langle \Phi', \Sigma, \delta', q_0', F' \rangle$
$\Phi' = \{ B \mid B \subseteq \{1, 2, 3, 4, 5, 6\} \}$, $q_0' = \{1\}$, $F' = \{ \{2,3\}, \{2\}, \{3\} \}$
S        δ'(S, a)   δ'(S, b)
{1}      {2,3}      ∅
{2,3}    ∅          {4,5}
{4,5}    {2}        {6}
{2}      ∅          {4}
{3}      ∅          {5}
{4}      {2}        ∅
{5}      ∅          {6}
{6}      {3}        ∅
[Figure: step-by-step construction of the DFSA from the start state {1}, adding the states {2,3}, {4,5}, {6} and {2} in turn]
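$\varepsilon$-closure($q$): the set of states reachable from $q$ via $\varepsilon$-transitions only (with $q \in \varepsilon$-closure($q$))
Example
$\varepsilon$-NFSA for $(a|b)c^*$
$\varepsilon$-closure for all $s \in \Phi$:
$\varepsilon$-closure(0) = {0,1,2}, $\varepsilon$-closure(1) = {1}, $\varepsilon$-closure(2) = {2}, $\varepsilon$-closure(3) = {3,5,6,7,9}, $\varepsilon$-closure(4) = {4,5,6,7,9}, $\varepsilon$-closure(5) = {5,6,7,9}, $\varepsilon$-closure(6) = {6,7,9}, $\varepsilon$-closure(7) = {7}, $\varepsilon$-closure(8) = {8,7,9}, $\varepsilon$-closure(9) = {9}
[Figure: $\varepsilon$-NFSA with states 0 to 9 and arcs labelled a, b, c]
Transition function over subsets:
$\delta'(\{0\}, \varepsilon) = \{0,1,2\}$, $\delta'(\{0,1,2\}, a) = \{3,5,6,7,9\}$, $\delta'(\{0,1,2\}, b) = \{4,5,6,7,9\}$, $\delta'(\{3,5,6,7,9\}, c) = \{8,7,9\}$, $\delta'(\{4,5,6,7,9\}, c) = \{8,7,9\}$, $\delta'(\{8,7,9\}, c) = \{8,7,9\}$
[Figure: resulting DFSA with states {0,1,2}, {3,5,6,7,9}, {4,5,6,7,9}, {8,7,9} and arcs labelled a, b, c]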
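The ε-closures and subset transitions of this example can be recomputed with a short program. In the sketch below the ε-arcs in `NFA` are an assumption: they are inferred from the listed closures (for instance, ε-closure(6) = {6,7,9} suggests ε-arcs from 6 to 7 and 9), since the original diagram is not available. The empty string stands for ε, and all names are hypothetical.

```python
from collections import deque

EPS = ""  # label used here for epsilon-transitions

# Epsilon-NFSA for (a|b)c*, reconstructed from the closure sets.
NFA = {
    (0, EPS): {1, 2},
    (1, "a"): {3},
    (2, "b"): {4},
    (3, EPS): {5},
    (4, EPS): {5},
    (5, EPS): {6},
    (6, EPS): {7, 9},
    (7, "c"): {8},
    (8, EPS): {7, 9},
}


def eps_closure(states):
    """All states reachable from `states` via epsilon-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for target in NFA.get((q, EPS), set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def move(states, symbol):
    """States reachable from `states` by reading exactly `symbol`."""
    return set().union(*(NFA.get((q, symbol), set()) for q in states))


def determinize(start, alphabet):
    """Subset construction; returns the DFSA start state and transitions."""
    start_set = eps_closure({start})
    delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            target = eps_closure(move(current, symbol))
            if not target:  # no successor: leave the transition undefined
                continue
            delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start_set, delta


start_state, dfa_delta = determinize(0, "abc")
```

Only subsets reachable from the start state are generated, so the result has four states instead of all 2^10 subsets.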
[Figure: determinization example for $L(A) = (a|b)a^* \cup ab^+a^*$: NFSA with states 0, 1, 2; subset automaton over the state sets {0}, {1}, {2}, {0,1}, {0,2}, {1,2}, {0,1,2}; resulting DFSA restricted to the reachable states]
- Construction by increasing lengths of strings
- For each $a \in \Sigma$, construct transitions to known or new states according to $\delta$
- New target states ($\notin \Phi'$) are placed in a queue (FIFO)
- Termination: no states left on queue
Complexity
- Maximal number of states placed in queue is $2^{|\Phi|}$
- So, worst case runtime is exponential
  - determinization is a costly operation,
  - but results in an efficient FSA (linear in size of the input)
  - avoids computation of isolated states
- Actual run time depends on the shape of the NFSA
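for each state S taken from the queue and each a ∈ Σ
  δ'(S, a) ← ∪ { δ(r, a) | r ∈ S }
  if δ'(S, a) ∉ Φ'
    Φ' ← Φ' ∪ { δ'(S, a) }
    ENQUEUE(queue, δ'(S, a))
    if δ'(S, a) ∩ F ≠ ∅
      F' ← F' ∪ { δ'(S, a) }
    fi
  fi
return (Φ', Σ, δ', q0', F')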
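A queue-based subset construction along these lines might look as follows in Python. This is a sketch under stated assumptions: the NFSA in `NFA_DELTA` is read off the example transition function given earlier (states 1 to 6), the final states {2, 3} are inferred from F' = {{2,3},{2},{3}}, and empty target sets are skipped rather than turned into a dead state. All identifiers are hypothetical.

```python
from collections import deque

NFA_DELTA = {
    (1, "a"): {2, 3},
    (2, "b"): {4},
    (3, "b"): {5},
    (4, "a"): {2},
    (5, "b"): {6},
    (6, "a"): {3},
}
NFA_FINAL = {2, 3}  # assumption, inferred from F'
ALPHABET = ("a", "b")


def subset_construction(start):
    """Build only the DFSA states reachable from {start} (FIFO queue)."""
    q0 = frozenset({start})
    states, delta, final = {q0}, {}, set()
    if q0 & NFA_FINAL:
        final.add(q0)
    queue = deque([q0])
    while queue:  # termination: no states left on the queue
        S = queue.popleft()
        for a in ALPHABET:
            target = frozenset(t for r in S for t in NFA_DELTA.get((r, a), ()))
            if not target:  # no successor: leave delta'(S, a) undefined
                continue
            delta[(S, a)] = target
            if target not in states:  # new state: record and enqueue it
                states.add(target)
                queue.append(target)
                if target & NFA_FINAL:
                    final.add(target)
    return states, delta, q0, final


dfa_states, dfa_delta, dfa_start, dfa_final = subset_construction(1)
```

For this input the construction yields 8 reachable states out of the 64 possible subsets, which illustrates why isolated states are never computed.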
Minimization of FSA
Can we transform a large automaton into a smaller one (provided a smaller one exists)?
If $A$ is a DFSA, is there an algorithm for constructing an equivalent minimal automaton $A_{min}$ from $A$?
[Figure: DFSA $A$ with states 0 to 3 and arcs labelled a, b, c; smaller DFSA $A'$ with states 0 to 2]
$A$ can be transformed to $A'$:
- States 2 and 3 in $A$ "do the same job": once $A$ is in state 2 or 3, it accepts the same suffix string. Such states are called equivalent.
- Thus, we can eliminate state 3 without changing the language of $A$, by redirecting all arcs leading to 3 to 2, instead.
Minimization of FSA
A DFSA can be minimized if there are pairs of states $q, q' \in \Phi$ that are equivalent.
Two states $q, q'$ are equivalent iff they accept the same right language.
Right language of a state
- For $A = \langle \Phi, \Sigma, \delta, q_0, F \rangle$ a DFSA, the right language $\overrightarrow{L}(q)$ of a state $q$ is the set of all strings accepted by $A$ starting in state $q$: $\overrightarrow{L}(q) = \{ w \in \Sigma^* \mid \delta^*(q, w) \in F \}$
- Note: $\overrightarrow{L}(q_0) = L(A)$
State equivalence
- For $A = \langle \Phi, \Sigma, \delta, q_0, F \rangle$ a DFSA, if $q, q' \in \Phi$, $q$ and $q'$ are equivalent ($q \equiv q'$) iff $\overrightarrow{L}(q) = \overrightarrow{L}(q')$
- $\equiv$ is an equivalence relation (i.e., reflexive, transitive and symmetric)
- $\equiv$ partitions the set of states $\Phi$ into a number of disjoint sets $Q_1 .. Q_n$ of equivalence classes s.th. $\bigcup_{i=1..n} Q_i = \Phi$ and $q \equiv q'$ for all $q, q' \in Q_i$
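[Figure: partition of the states into equivalence classes, labelled C1, C3, C4]
All classes $C_i$ consist of equivalent states $q_{j=i..n}$ that accept identical right languages $\overrightarrow{L}(q_j)$.
Whenever two states $q, q'$ belong to different classes, $\overrightarrow{L}(q) \neq \overrightarrow{L}(q')$.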
Minimization of a DFSA
A DFSA $A = \langle \Phi, \Sigma, \delta, q_0, F \rangle$ that contains equivalent states $q, q'$ can be transformed to a smaller, equivalent DFSA $A' = \langle \Phi', \Sigma, \delta', q_0, F' \rangle$ where
$\Phi' = \Phi \setminus \{q'\}$, $F' = F \setminus \{q'\}$,
$\delta'(s, a) = q$ if $\delta(s, a) = q'$; $\delta'(s, a) = \delta(s, a)$ otherwise
$\delta'$ is like $\delta$ with all transitions to $q'$ redirected to $q$.
Two-step algorithm
- Determine all pairs of equivalent states $q, q'$
- Apply DFSA reduction until no such pair $q, q'$ is left in the automaton
Minimality
- The resulting FSA is the smallest DFSA (in size of $\Phi$) that accepts $L(A)$: we never merge different equivalence classes, so we obtain one state per class.
- We cannot do any further reduction and still recognize $L(A)$.
- As long as we have >1 state per class, we can do further reduction steps.
A DFSA $A = \langle \Phi, \Sigma, \delta, q_0, F \rangle$ is minimal iff there is no pair of distinct but equivalent states $\in \Phi$, i.e. $\forall q, q' \in \Phi: q \equiv q' \Leftrightarrow q = q'$
Example
[Figure: example DFSA over {a, b} before minimization, and the smaller DFSA obtained by merging equivalent states]
Algorithm
MINIMIZE(Φ, Σ, δ, q0, F)
  main
    EqClass[] ← PARTITION(A)
    q0 ← min(EqClass[q0])
    for <q, a, q'> ∈ δ
      δ(q, a) ← min(EqClass[q'])
    for q ∈ Φ
      if q ≠ min(EqClass[q])
        Φ ← Φ \ {q}
        if q ∈ F
          F ← F \ {q}
MINIMIZE
- PARTITION(A): determines all eq. classes of states in A; returns array EqClass[q] of eq. classes of q
- redirect all transitions <q, a, q'> to point to min(EqClass[q'])
- remove all redundant states from Φ and F
NAIVE_PARTITION
  for each q ∈ Φ
    EqClass[q] ← {q}
  for each q ∈ Φ
    for each q' ∈ Φ
      if EqClass[q] ≠ EqClass[q'] ∧ CHECKEQUIVALENCE(A_q, A_q') = True
        EqClass[q] ← EqClass[q] ∪ EqClass[q']
        EqClass[q'] ← EqClass[q]
NAIVE_PARTITION
- array EqClass of pointers to disjoint sets for equivalence classes
- first loop: initializing EqClass by {q}, for each q ∈ Φ
- second nested loop: if we find new equivalent states q ≡ q', we merge the respective equivalence classes EqClasses and reset EqClass[q] to point to the new merged class
Runtime complexity: loops: $O(|\Phi|^2)$; CheckEquivalence: $O(|\Phi|^2 \cdot |\Sigma|)$; in total $O(|\Phi|^4 \cdot |\Sigma|)$!
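[Figure: states p, p' with a-transitions leading to states q, q']
If $q \not\equiv q'$, $\overrightarrow{L}(q) \neq \overrightarrow{L}(q')$; therefore, for all $\langle p, p' \rangle$ s.th. $p \in \delta^{-1}(q, a)$ and $p' \in \delta^{-1}(q', a)$ for some $a \in \Sigma$, $p \not\equiv p'$.
Thus, non-equivalence results can be propagated
- Propagation from final/non-final pairs: $\overrightarrow{L}(q) \neq \overrightarrow{L}(q')$ if $q \in F \wedge q' \notin F$
- Propagation from pairs $\langle q, q' \rangle$ where $\delta(q, a)$ is defined but $\delta(q', a)$ is not.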
Propagation (I): Table filling algorithm (Aho, Sethi, Ullman)
- represent equivalence relation as a table Equiv, cells filled with boolean values
- initialize all cells with 1; reset to 0 for non-equivalent states
- main loop: call of PROPAGATE for non-equivalent states from LocalEquivalenceCheck
Runtime complexity: $O(|\Phi|^2 \cdot |\Sigma|)$
- PROPAGATE is never called twice on a given pair of states (checks Equiv[q, q'] = 1)
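The table-filling idea can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the algorithm as printed in the cited source: the table `equiv` starts with every pair marked equivalent, local checks mark the obviously different pairs, and a worklist propagates non-equivalence backwards along inverse transitions. The small DFSA at the end (states 0 to 3, final states 2 and 3) is hypothetical, as are all identifiers.

```python
from itertools import combinations


def equivalent_pairs(states, alphabet, delta, final):
    """Return the pairs of states left unmarked by table filling.

    delta maps (state, symbol) -> state; a missing key is an undefined
    transition.
    """
    equiv = {frozenset(pair): True for pair in combinations(states, 2)}
    inverse = {}  # (target, symbol) -> set of source states
    for (p, a), q in delta.items():
        inverse.setdefault((q, a), set()).add(p)
    worklist = []

    def mark(p, q):
        pair = frozenset((p, q))
        # Each pair is marked (and hence propagated) at most once.
        if len(pair) == 2 and equiv[pair]:
            equiv[pair] = False
            worklist.append((p, q))

    # Local checks: final vs. non-final, defined vs. undefined transition.
    for p, q in combinations(states, 2):
        if (p in final) != (q in final):
            mark(p, q)
        elif any(((p, a) in delta) != ((q, a) in delta) for a in alphabet):
            mark(p, q)

    # Propagation: predecessors of a non-equivalent pair are non-equivalent.
    while worklist:
        q, q2 = worklist.pop()
        for a in alphabet:
            for p in inverse.get((q, a), ()):
                for p2 in inverse.get((q2, a), ()):
                    mark(p, p2)

    return {tuple(sorted(pair)) for pair, ok in equiv.items() if ok}


# Hypothetical DFSA in which states 2 and 3 accept the same right language.
DELTA = {(0, "a"): 1, (1, "b"): 2, (1, "c"): 3, (2, "a"): 2, (3, "a"): 3}
result = equivalent_pairs([0, 1, 2, 3], "abc", DELTA, {2, 3})
```

The pairs returned by `equivalent_pairs` are the candidates for merging in the reduction step; here only the pair (2, 3) remains.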
More optimizations
- Hopcroft and Ullman: space requirement $O(|\Phi|)$, by associating states with their equivalence classes
- Hopcroft: runtime complexity of $O(|\Phi| \cdot \log|\Phi| \cdot |\Sigma|)$, by distinction of active/non-active blocks
Brzozowski's Algorithm
Minimization by reversal and determinization
DFSA $A$, language $L(A)$
- reverse: NFSA $A^{-1}$, language $L(A)^{-1}$
- determinize: DFSA $A^{-1}$
- reverse: NFSA $(A^{-1})^{-1}$, language $L(A)$
- determinize: DFSA $(A^{-1})^{-1}$, language $L(A)$
Reversal
- Final states of $A^{-1}$: set of initial states of $A$
- Initial states of $A^{-1}$: $F$ of $A$
- $\delta^{-1}(q, a) = \{ p \in \Phi \mid \delta(p, a) = q \}$
- $L(A^{-1}) = L(A)^{-1}$
[Figure: worked example over {a, b, c, d}: rev yields NFSA $A^{-1}$, det yields DFSA $A^{-1}$, rev yields NFSA $(A^{-1})^{-1}$, det yields DFSA $(A^{-1})^{-1}$]
Consider the right languages of states $q, q'$ in NFSA $(A^{-1})^{-1}$:
- If for all distinct states $q, q'$: $\overrightarrow{L}(q) \neq \overrightarrow{L}(q')$, i.e. $\overrightarrow{L}(q) \cap \overrightarrow{L}(q') = \emptyset$, it holds that each pair of states $q, q'$ recognize different right languages, and thus, that the NFSA $(A^{-1})^{-1}$ satisfies the minimality condition for a DFSA.
- If there were states $q, q'$ in NFSA $(A^{-1})^{-1}$ s.th. $\overrightarrow{L}(q) \cap \overrightarrow{L}(q') \neq \emptyset$, there would be some string $w$ that leads to two distinct states in DFSA $A^{-1}$. This contradicts the determinism criterion of a DFSA.
- Determinization of NFSA $(A^{-1})^{-1}$ does not destroy the property of minimality
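The reverse/determinize pipeline can be sketched compactly in Python. The code below is illustrative only: automata are represented as (transitions, start states, final states) with set-valued transitions, determinization builds only reachable subsets and renumbers them 0, 1, 2, ..., and the input DFSA at the end is a hypothetical example in which two states are equivalent. All function and variable names are assumptions.

```python
def reverse(delta, starts, finals):
    """Flip every arc; old final states become start states and vice versa."""
    rev = {}
    for (p, a), targets in delta.items():
        for q in targets:
            rev.setdefault((q, a), set()).add(p)
    return rev, set(finals), set(starts)


def determinize(delta, starts, finals, alphabet):
    """Subset construction over reachable subsets, renumbered from 0."""
    start = frozenset(starts)
    names = {start: 0}
    new_delta, new_finals = {}, set()
    todo = [start]
    while todo:
        S = todo.pop()
        if S & finals:
            new_finals.add(names[S])
        for a in alphabet:
            T = frozenset(t for q in S for t in delta.get((q, a), ()))
            if not T:  # undefined transition
                continue
            if T not in names:
                names[T] = len(names)
                todo.append(T)
            new_delta[(names[S], a)] = {names[T]}
    return new_delta, {0}, new_finals


def brzozowski(delta, starts, finals, alphabet):
    """Minimize by reverse, determinize, reverse, determinize."""
    step = determinize(*reverse(delta, starts, finals), alphabet)
    return determinize(*reverse(*step), alphabet)


# Hypothetical DFSA: states 2 and 3 are equivalent (both final, a-loop).
DELTA = {(0, "a"): {1}, (1, "b"): {2}, (1, "c"): {3},
         (2, "a"): {2}, (3, "a"): {3}}
min_delta, min_starts, min_finals = brzozowski(DELTA, {0}, {2, 3}, "abc")
```

For the example input the result has three states instead of four, with the two equivalent final states merged into one.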
Word list
Transition table
- Compilation to NFSA, convert to DFSA
- Application by composition of FST with full text
Transducers
- recognition of an input string $w$
- generation of an output string $w'$
[Figure: transducer with states q0 to q5 and arc label pairs l:l, e:e, a:f, v:t, e:$\varepsilon$]
- define a relation between languages
- equivalent to FSA that accept pairs of strings, with transitions defined for pairs of symbols $\langle x, y \rangle$
- operations: replacement
  - deletion $\langle a, \varepsilon \rangle$, $a \in \Sigma - \{\varepsilon\}$
  - insertion $\langle \varepsilon, a \rangle$, $a \in \Sigma - \{\varepsilon\}$
  - substitution $\langle a, b \rangle$, $a, b \in \Sigma$, $a \neq b$
- a path through the lower/upper case transducer, for string xyzzy
- The application of a transducer to a string may also be viewed as composition of the FST with the (identity relation on the string)
[Figure: transducer paths over states q0 to q5 with arc label pairs over l, e, f, v, t and the tag +VBD]
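Applying a transducer to a string can be sketched as an acceptor that also emits an output symbol on every arc. The example below assumes the arc pairs l:l, e:e, a:f, v:t and e:ε over states q0 to q5, with q5 as the only final state; this reading of the diagram, the restriction to input-deterministic arcs, and the names `FST`, `FINAL` and `transduce` are all assumptions.

```python
# (state, input symbol) -> (output string, next state); "" encodes epsilon.
FST = {
    (0, "l"): ("l", 1),
    (1, "e"): ("e", 2),
    (2, "a"): ("f", 3),  # substitution <a, f>
    (3, "v"): ("t", 4),  # substitution <v, t>
    (4, "e"): ("", 5),   # deletion <e, epsilon>
}
FINAL = {5}  # assumption: q5 is the accepting state


def transduce(word):
    """Return the output string for `word`, or None if it is not accepted."""
    state, output = 0, []
    for symbol in word:
        step = FST.get((state, symbol))
        if step is None:  # no arc for this input symbol
            return None
        out, state = step
        output.append(out)
    return "".join(output) if state in FINAL else None
```

With these arcs `transduce("leave")` yields `"left"`. Insertion arcs (ε on the input side) would need an additional search over arcs that consume no input, which this sketch leaves out.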
Literature
- H.R. Lewis and C.H. Papadimitriou: Elements of the Theory of Computation. Prentice-Hall, New Jersey (Chapter 2).
- J. Hopcroft and J. Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Massachusetts (Chapter 2, 3).
- B.H. Partee, A. ter Meulen and R.E. Wall: Mathematical Methods in Linguistics. Kluwer Academic Publishers, Dordrecht (Chapter 15.5, 15.6, 17).
- D. Jurafsky and J.H. Martin: Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, New Jersey (Chapter 2).
- C. Martin-Vide: Formal Grammars and Languages. In: R. Mitkov (ed): Oxford Handbook of Computational Linguistics (Chapter 8).
- L. Karttunen: Finite-state Technology. In: R. Mitkov (ed): Oxford Handbook of Computational Linguistics (Chapter 18).
Exercises
- Write a program for acceptance of a string by a DFSA. Then extend it to a finite-state transducer that can translate a surface form to lemma + POS, or between upper and lower case.
- Determinize the following NFSA by subset construction: $A_1 = \langle \{p, q, r, s\}, \{a, b\}, \delta_1, p, \{s\} \rangle$ where $\delta_1$ is as follows:
  δ1   a      b
  p    p,q    p
  q    r      r
  r    s      ∅
  s    s      s
- Construct an NFSA with $\varepsilon$-transitions from the regular expression $(a|b)ca^*$, according to the construction principles for union, concatenation and Kleene star. Then transform the NFSA to a DFSA by subset construction.
- Find a minimal DFSA for the FSA $A = \langle \{A, .., E\}, \{0, 1\}, \delta_3, A, \{C, E\} \rangle$ (using the table filling algorithm by propagation).
  δ3   0    1
  A    B    D
  B    B    C
  C    D    E
  D    D    E
  E    C    -