Language Processors (E2.15) Review: Lexical analysis, regular expressions, finite state automata
NFA (1)
DFA and NFA (1)
Deterministic FSA allow only one transition out of a state for a given input
Non-deterministic FSA allow more than one possible transition for a given input,
for example from q0, given a as input we can go either to q2 or to q3

[Diagram: q0 --a--> q2, q0 --a--> q3, q0 --b--> q1]

state   input
        a           b
q0      {q2, q3}    {q1}
….

NFA (1)
DFA and NFA (2)
Another difference between DFA and NFA is that NFA allow ε-transitions
An ε-transition is a transition between states without reading any input

[Diagram: q0 --b--> q2, q0 --ε--> q1; the ε-transition indicates equivalent states]

state   input
        b       ε
q0      {q2}    {q1}
….

E2.15 - Language Processors (Lect 5)
NFA (3)
Subset construction algorithm
So, NFA can be reduced to DFA [which are easier to program]!
Why NFA?!
Instrumental for the automatic construction of DFA from regular expressions
RE -> DFA tough; instead we proceed as RE -> NFA -> DFA
ε-transitions a key element!
Time for Thompson’s algorithm for converting a RE to an NFA:

Regular Expressions to NFA
Thompson’s algorithm (1)
Key idea: construct an NFA for each symbol and for each operator (|, concatenation, Kleene *); join the resulting NFA with ε-transitions (in precedence order)
NFA representing the empty string: q0 --ε--> q1
NFA for symbol a: q0 --a--> q1
NFA representing concatenation ab: q0 --a--> q1 --ε--> q2 --b--> q3
Regular Expressions to NFA
Thompson’s algorithm (2)
NFA representing a | b:
q0 --ε--> q1 --a--> q2 --ε--> q3
q0 --ε--> q4 --b--> q5 --ε--> q3

Regular Expressions to NFA
Thompson’s algorithm (3) - Example
Converting regular expression a(b|c)* to an NFA
We start with each symbol, e.g. a:
q0 --a--> q1 (and similarly for b and c)
…Converting regular expression a(b|c)* to an NFA (cont.)
And finally: a(b|c)* can be constructed as
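The construction steps for a(b|c)* can be sketched in code. This is an illustrative sketch of Thompson-style fragments, not the lecturer's implementation: the `Fragment` class, the helper names and the state numbering are assumptions, and the numbering will not necessarily match the slide's diagram.

```python
import itertools

EPS = ""                      # assumption: the empty string stands for ε
_ids = itertools.count()      # hypothetical generator of fresh state names


def new_state():
    return f"q{next(_ids)}"


class Fragment:
    """An NFA fragment with one start state and one accepting state."""
    def __init__(self, start, accept, edges):
        self.start = start
        self.accept = accept
        self.edges = edges    # list of (source, label, destination)


def symbol(ch):
    s, t = new_state(), new_state()
    return Fragment(s, t, [(s, ch, t)])


def concat(f, g):
    # Join f's accepting state to g's start state with an ε-transition.
    return Fragment(f.start, g.accept,
                    f.edges + g.edges + [(f.accept, EPS, g.start)])


def union(f, g):
    s, t = new_state(), new_state()
    return Fragment(s, t, f.edges + g.edges + [
        (s, EPS, f.start), (s, EPS, g.start),
        (f.accept, EPS, t), (g.accept, EPS, t)])


def star(f):
    s, t = new_state(), new_state()
    return Fragment(s, t, f.edges + [
        (s, EPS, f.start), (s, EPS, t),            # enter the loop or skip it
        (f.accept, EPS, f.start), (f.accept, EPS, t)])  # repeat or leave


def accepts(frag, text):
    """Simulate the NFA by tracking the set of currently possible states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for src, label, dst in frag.edges:
                if src == s and label == EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({frag.start})
    for ch in text:
        current = closure({dst for src, label, dst in frag.edges
                           if src in current and label == ch})
    return frag.accept in current


# a(b|c)*: build the symbols first, then apply |, *, and concatenation.
nfa = concat(symbol("a"), star(union(symbol("b"), symbol("c"))))
print(accepts(nfa, "abcb"), accepts(nfa, "b"))   # True False
```

Each operator adds at most two new states and a handful of ε-transitions, which is why the construction is easy to automate even though the resulting NFA is larger than necessary.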
Regular Expressions to NFA
Thompson’s algorithm (3) – Example (4)
?! But can’t we do much better by having, instead of:
[Diagram: the ten-state NFA (q0 to q9) produced by Thompson’s construction for a(b|c)*, joined by ε-transitions]
the equivalent:
q0 --a--> q1, with q1 --b--> q1 and q1 --c--> q1

Automatic construction (1)
Overview of the process
Yes, but the algorithmic nature of Thompson’s construction means that NFA can be created automatically from RE
Subsequently they can be converted to DFA (the subset construction algorithm), for which it’s easier to automatically generate code
There are algorithms for minimizing DFA (ensuring that the constructed DFA has the smallest number of states possible)
RE -> NFA -> DFA -> Minimal DFA -> source code
Automatic construction: process overview
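The NFA to DFA step of this pipeline (the subset construction) can be sketched as follows. The example NFA is a small hand-written automaton for a(b|c)*, not the one from the slide, and the dictionary encoding and function names are assumptions made for illustration.

```python
from collections import deque

EPS = ""  # assumption: the empty string stands for ε

# Hypothetical hand-written NFA for a(b|c)*; q1 is the accepting state.
nfa = {
    "q0": {"a": {"q1"}},
    "q1": {EPS: {"q2", "q4"}},
    "q2": {"b": {"q3"}},
    "q3": {EPS: {"q1"}},
    "q4": {"c": {"q5"}},
    "q5": {EPS: {"q1"}},
}


def epsilon_closure(nfa, states):
    """States reachable from `states` using ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def subset_construction(nfa, start, accepting, alphabet):
    """Each DFA state is a frozenset of NFA states."""
    start_set = frozenset(epsilon_closure(nfa, {start}))
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue                      # already expanded
        dfa[current] = {}
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= nfa.get(s, {}).get(sym, set())
            target = frozenset(epsilon_closure(nfa, moved))
            if target:                    # an empty set means "no transition"
                dfa[current][sym] = target
                queue.append(target)
    finals = {d for d in dfa if d & accepting}
    return start_set, dfa, finals


start, dfa, finals = subset_construction(nfa, "q0", {"q1"}, "abc")
print(len(dfa), len(finals))   # 4 3
```

The result here has four DFA states, three of them accepting; the three accepting states behave identically, which is exactly the redundancy that DFA minimization removes.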
Start by assuming that all the states in the DFA are equivalent
Work through the states, putting different states in separate sets
Two states are considered different if:
  One is a final state and the other one isn’t
  The transition function maps them to states in different sets, based on the same input character.
1. The minimization algorithm starts by initially creating two sets of states, final and non-final.
2. For each state-set created from (1), it examines the transitions for each state and for each input symbol. If, on the same input symbol, two states in a set move to states in different state-sets, then they are put into different state-sets. This is repeated until no set can be split further.

Once the DFA has been constructed, code to simulate it can be generated.
The state transition table defines function move(s, ch), which returns the next state from s given ch as input character. Then:
  s := s0;
  c := nextchar();
  while c <> eof do
    s := move(s, c);
    c := nextchar();
  end;
  if s is in FinalStates then
    return “yes”;
  else return “no”;
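Both the minimization steps and the simulation loop above can be sketched together in Python. This is a sketch under stated assumptions rather than a definitive implementation: the DFA is a hypothetical four-state automaton for a(b|c)*, `move` is encoded as a dictionary keyed by (state, character), and a missing entry is treated as "no transition", which the slides do not specify.

```python
# Hypothetical DFA for a(b|c)*: A is the start state; B, C, D are final.
states = {"A", "B", "C", "D"}
alphabet = "abc"
finals = {"B", "C", "D"}
move = {
    ("A", "a"): "B",
    ("B", "b"): "C", ("B", "c"): "D",
    ("C", "b"): "C", ("C", "c"): "D",
    ("D", "b"): "C", ("D", "c"): "D",
}


def block_of(partition, state):
    """Index of the state-set containing `state` (None if no transition)."""
    for i, block in enumerate(partition):
        if state in block:
            return i
    return None


def minimize(states, alphabet, move, finals):
    """Split state-sets until every set maps consistently on every symbol."""
    # Step 1: two initial sets, final and non-final (empty sets dropped).
    partition = [p for p in (set(finals), set(states) - set(finals)) if p]
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            # Step 2: group states by which set each symbol leads to.
            groups = {}
            for s in block:
                signature = tuple(block_of(partition, move.get((s, ch)))
                                  for ch in alphabet)
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition


def accepts(text, start, move, finals):
    """The slide's loop: follow move(s, c) for each character, then test s."""
    s = start
    for c in text:
        s = move.get((s, c))
        if s is None:          # assumption: a missing table entry rejects
            return "no"
    return "yes" if s in finals else "no"


print(sorted(sorted(block) for block in minimize(states, alphabet, move, finals)))
# [['A'], ['B', 'C', 'D']]
print(accepts("abcb", "A", move, finals), accepts("ba", "A", move, finals))
# yes no
```

The three final states collapse into one set, giving the two-state automaton with b and c loops that the earlier slide presents as the equivalent of the Thompson NFA.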
Summary
NFA provide a useful intermediary point between RE and DFA.
Lexical analysers can be written by hand, or generated automatically by converting regular expressions to NFA (Thompson’s algorithm), then the NFA to a DFA (subset construction algorithm), minimizing the DFA states, and converting into code.
Next lecture:
Lexical analysis using LEX

Recommended Reading
Chapter 3 of Aho et al.
Section 2.1 of Grune et al.