Lexical Analysis
Scanners
Tokens
Regular expressions
Finite automata
Flex, a scanner generator
Scanners
[Figure: characters → Scanner → token → Parser; the Parser requests the next token from the Scanner; both use the Symbol Table]
Tokens
[Figure: Scanner definition in metalanguage → Scanner Generator → Scanner; Program in programming language → Scanner → Token types & semantic values]
Languages
Regular Expressions
ε is a RE denoting L = {ε}, the language containing only the empty string
If a ∈ Σ (the alphabet), then a is a RE denoting L = {a}
Suppose r and s are REs denoting the languages L(r) and L(s). Then:
alternation: (r) | (s) is a RE denoting L(r) ∪ L(s)
concatenation: (r)(s) is a RE denoting L(r)L(s)
repetition: (r)* is a RE denoting (L(r))*
parentheses: (r) is a RE denoting L(r)
Examples
a | b            {a, b}
(a | b)(a | b)   {aa, ab, ba, bb}
a*               {ε, a, aa, aaa, ...}
(a | b)*         the set of all strings of a's and b's
a | a*b          the set containing the string a and all strings consisting of zero or more a's followed by a b
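The example languages can be checked mechanically. The sketch below assumes a POSIX system, since it relies on the `<regex.h>` functions `regcomp`, `regexec` and `regfree` rather than on standard C; the helper name `in_language` is hypothetical. POSIX extended syntax has no ε and treats spaces as literal characters, so the patterns are written without the spaces used on the slides, and each one is wrapped in `^(...)$` so that only whole-string matches count.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True if the WHOLE string s is in the language of the POSIX extended
   regular expression re (hypothetical helper, POSIX systems only). */
static bool in_language(const char *re, const char *s)
{
    char anchored[256];
    regex_t prog;
    bool ok;

    /* ^( ... )$ forces the match to cover all of s. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", re) >= (int)sizeof anchored)
        return false;                      /* pattern too long for the buffer */
    if (regcomp(&prog, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                      /* pattern failed to compile */
    ok = regexec(&prog, s, 0, NULL, 0) == 0;
    regfree(&prog);
    return ok;
}
```

For instance, `in_language("a|a*b", "aaab")` holds, while `in_language("a|a*b", "aa")` does not, because a string of only a's must be exactly `a`.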
Regular Definitions
if                                    {return IF;}
[a-z][a-z0-9]*                        {return ID;}
[0-9]+                                {return NUM;}
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)   {return REAL;}
("--"[a-z]*"\n")|(" "|"\n"|"\t")+     {/* do nothing for white space and comments */}
.                                     {error();}
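Completeness and Disambiguation
Completeness: a catch-all rule guarantees that every input character matches some rule.
.   /* match any */
Longest match: the scanner takes the longest prefix of the input that matches any rule.
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+)   /* REAL */
On input 0.9, REAL matches all three characters, so the scanner does not stop after 0.
Rule priority: when several rules match the same longest prefix, the rule listed first wins.
if                /* IF */
[a-z][a-z0-9]*    /* ID */
On input if, both rules match two characters; IF is listed first, so if is a keyword, not an identifier.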
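Both disambiguation rules can be imitated directly with a regular-expression library. The sketch below again assumes POSIX `<regex.h>`, whose `regexec` reports the longest match starting at the leftmost position. It tries every rule anchored at the current input position, keeps the longest match, and breaks ties in favour of the earlier rule. The enum names and `match_token` are illustrative, and the rules are transcribed into POSIX ERE syntax (`\\.` for a literal dot). Recompiling every pattern on every call keeps the sketch short; a generated scanner such as Flex's instead runs one combined automaton, as the following slides describe.

```c
#include <regex.h>
#include <stddef.h>

/* Token kinds in rule order: an earlier rule has higher priority. */
enum token { IF, ID, NUM, REAL, SKIP, ERR, NRULES };

/* The rules from the slides in POSIX ERE syntax, anchored with ^ so they
   can only match at the current input position. */
static const char *const rules[NRULES] = {
    "^if",
    "^[a-z][a-z0-9]*",
    "^[0-9]+",
    "^(([0-9]+\\.[0-9]*)|([0-9]*\\.[0-9]+))",
    "^((--[a-z]*\n)|( |\n|\t)+)",            /* white space and comments */
    "^.",                                    /* catch-all: completeness */
};

/* Returns the winning rule at the start of s and stores the lexeme length
   in *len. The longest match wins; on a tie the earlier rule is kept,
   because only a strictly longer match replaces the current best.
   Returns -1 with *len == 0 when nothing matches (e.g. at end of input). */
static int match_token(const char *s, size_t *len)
{
    int best = -1;
    *len = 0;
    for (int r = 0; r < NRULES; r++) {
        regex_t prog;
        regmatch_t m[1];
        if (regcomp(&prog, rules[r], REG_EXTENDED) != 0)
            continue;                        /* skip a pattern that fails */
        if (regexec(&prog, s, 1, m, 0) == 0) {
            size_t n = (size_t)(m[0].rm_eo - m[0].rm_so);
            if (n > *len) {
                *len = n;
                best = r;
            }
        }
        regfree(&prog);
    }
    return best;
}
```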
Finite Automata
Nondeterministic Finite Automata (NFA)
An NFA consists of
A finite set of states
A finite set of input symbols
A transition function that maps (state, symbol) pairs, where the symbol may also be ε, to sets of states
A state distinguished as the start state
A set of states distinguished as final states
An Example
An Example
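(a | b)*abb, input aabb
[Figure: NFA for (a | b)*abb. Start state 0 loops on a and b; 0 -a-> 1 -b-> 2 -b-> 3; final state 3. The path 0 -a-> 0 -a-> 1 -b-> 2 -b-> 3 accepts aabb.]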
An Example
(a | b)*abb, input aaba
[Figure: the same NFA. No path labelled aaba ends in final state 3, so aaba is rejected.]
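Because an NFA may have several possible moves, a program can avoid backtracking by tracking the set of all states the NFA could be in. The sketch below hard-codes the NFA for (a | b)*abb from the two examples above and assumes that sets of states are encoded as bit masks (bit i stands for state i); `delta` returns a set of states, exactly as in the NFA definition, and the input is accepted if the final set contains state 3.

```c
#include <stdbool.h>

#define NSTATES 4
#define START 0
#define FINALS (1u << 3)          /* F = {3} as a bit mask */

/* Transition function of the NFA for (a | b)*abb: it maps a
   (state, symbol) pair to a SET of states, encoded as a bit mask. */
static unsigned delta(int state, char c)
{
    switch (state) {
    case 0:
        if (c == 'a') return (1u << 0) | (1u << 1);   /* {0, 1} */
        if (c == 'b') return 1u << 0;                 /* {0} */
        return 0;
    case 1:
        return c == 'b' ? 1u << 2 : 0;                /* {2} */
    case 2:
        return c == 'b' ? 1u << 3 : 0;                /* {3} */
    default:
        return 0;                                     /* state 3: no moves */
    }
}

/* Accept iff some path spelled by the input ends in a final state. */
static bool nfa_accepts(const char *input)
{
    unsigned current = 1u << START;
    for (const char *p = input; *p != '\0'; p++) {
        unsigned next = 0;
        for (int s = 0; s < NSTATES; s++)
            if (current & (1u << s))
                next |= delta(s, *p);
        current = next;
    }
    return (current & FINALS) != 0;
}
```

On aabb the sets are {0}, {0,1}, {0,1}, {0,2}, {0,3}, so the input is accepted; on aaba the last set is {0,1}, which does not contain 3.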
Another Example
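aa* | bb*
[Figure: NFA with start state 1 and ε-transitions to states 2 and 4; 2 -a-> 3, with 3 looping on a; 4 -b-> 5, with 5 looping on b; final states 3 and 5.]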
Deterministic Finite Automata (DFA)
An Example
RE: (a | b)*abb
States: {1, 2, 3, 4}
Input symbols: {a, b}
Transition function δ:
δ(1,a) = 2, δ(2,a) = 2, δ(3,a) = 2, δ(4,a) = 2
δ(1,b) = 1, δ(2,b) = 3, δ(3,b) = 4, δ(4,b) = 1
Start state: 1
Final states: {4}
Finite-State Transition Diagram
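(a | b)*abb
[Figure: transition diagram of the DFA above. Start state 1; edges 1 -a-> 2, 1 -b-> 1, 2 -a-> 2, 2 -b-> 3, 3 -a-> 2, 3 -b-> 4, 4 -a-> 2, 4 -b-> 1; final state 4.]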
Acceptance of DFA
An Example
(a | b)*abb, input aabb
[Figure: the DFA above. The path 1 -a-> 2 -a-> 2 -b-> 3 -b-> 4 ends in final state 4, so aabb is accepted.]
An Example
(a | b)*abb, input aaba
[Figure: the DFA above. The path 1 -a-> 2 -a-> 2 -b-> 3 -a-> 2 ends in state 2, which is not final, so aaba is rejected.]
Combined Finite Automata
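[Figure: one automaton per rule.
if: 1 -i-> 2 -f-> 3; final state 3 returns IF.
[a-z][a-z0-9]*: 1 -a-z-> 2, with 2 looping on a-z,0-9; final state 2 returns ID.
([0-9]+"."[0-9]*)|([0-9]*"."[0-9]+): 1 -0-9-> 2 -.-> 3 and 1 -.-> 4 -0-9-> 5, with 2, 3 and 5 looping on 0-9; final states 3 and 5 return REAL.]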
Combined Finite Automata
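[Figure: the combined NFA. A new start state 1 has ε-transitions to the renumbered automata: 2 -i-> 3 -f-> 4 (IF); 5 -a-z-> 6, with 6 looping on a-z,0-9 (ID); 7 -0-9-> 8 -.-> 9 and 7 -.-> 10 -0-9-> 11, with 8, 9 and 11 looping on 0-9 (9 and 11 return REAL).]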
Combined Finite Automata
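[Figure: the equivalent DFA. From start state 1: i → 2, a-h and j-z → 4, 0-9 → 5, . → 7. From 2: f → 3, a-e and g-z → 4. From 3 and 4: a-z,0-9 → 4. From 5: 0-9 → 5, . → 6. From 6: 0-9 → 6. From 7 and 8: 0-9 → 8. Final states: 3 (IF), 4 (ID), 6 and 8 (REAL).]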
Recognizing the Longest Match
An Example
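[Figure: the DFA from the previous slide.]
Input: iffail+
S    C    L    P
     1    0    0
i    2    0    0
f    3    3    2
f    4    4    3
a    4    4    4
i    4    4    5
l    4    4    6
+    ?
S is the symbol read, C the current state, L the last final state entered (0 if none) and P the number of symbols consumed when L was entered. On + the DFA has no transition, so the scanner backs up to P = 6 and returns ID for the lexeme iffail; scanning resumes at +.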
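The table corresponds to the following sketch of a longest-match scanner. The transitions and final states are transcribed from the DFA figure as drawn (so, consistent with the trace, state 2 is not final); the names `next_state` and `next_token` are hypothetical. The loop records L and P whenever it enters a final state and, once the DFA has no transition, reports the token of L with length P.

```c
#include <stddef.h>

/* Token kinds; NONE marks a non-final state. */
enum kind { NONE, IF, ID, REAL };

/* Token recognized in each state of the DFA in the figure
   (index 0 is the dead state). As in the trace, state 2 is not final. */
static const enum kind final_kind[9] = {
    NONE, NONE, NONE, IF, ID, NONE, REAL, NONE, REAL
};

static int lower(char c) { return c >= 'a' && c <= 'z'; }
static int digit(char c) { return c >= '0' && c <= '9'; }

/* Transitions transcribed from the figure; 0 means "no transition". */
static int next_state(int s, char c)
{
    switch (s) {
    case 1:
        if (c == 'i') return 2;
        if (lower(c)) return 4;            /* a-h, j-z */
        if (digit(c)) return 5;
        return c == '.' ? 7 : 0;
    case 2:
        if (c == 'f') return 3;
        return lower(c) ? 4 : 0;           /* a-e, g-z */
    case 3:
    case 4:
        return lower(c) || digit(c) ? 4 : 0;
    case 5:
        if (digit(c)) return 5;
        return c == '.' ? 6 : 0;
    case 6:
        return digit(c) ? 6 : 0;
    case 7:
    case 8:
        return digit(c) ? 8 : 0;
    default:
        return 0;
    }
}

/* Run the DFA until it has no transition, remembering L (the token of the
   last final state entered) and P (symbols consumed at that point); the
   scanner then "backs up" to P. */
static enum kind next_token(const char *input, size_t *len)
{
    int state = 1;             /* C: current state */
    enum kind last = NONE;     /* L */
    size_t pos = 0;            /* P */
    for (size_t i = 0; input[i] != '\0'; i++) {
        state = next_state(state, input[i]);
        if (state == 0)
            break;             /* stuck: no transition on input[i] */
        if (final_kind[state] != NONE) {
            last = final_kind[state];
            pos = i + 1;
        }
    }
    *len = pos;
    return last;
}
```

On iffail+ the function passes through states 2, 3, 4, 4, 4, 4, stops at +, and returns ID with length 6, matching the last row of the table.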
Lexical Analyzer Generators
RE → NFA → DFA
From a RE to an NFA
For ε, construct: start state i -ε-> final state f
For a ∈ Σ, construct: start state i -a-> final state f
From a RE to an NFA
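For s | t, construct: a new start state i with ε-transitions to is and it, the start states of N(s) and N(t), and ε-transitions from their final states fs and ft to a new final state f.
For st, construct: i → N(s) → N(t) → f, where i is the start state of N(s), f is the final state of N(t), and the final state fs of N(s) is merged with the start state it of N(t).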
From a RE to an NFA
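For s*, construct: a new start state i and a new final state f, with ε-transitions from i to is, from fs to f, from i to f (zero repetitions) and from fs back to is (further repetitions).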
An Example
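(a | b)*abb
[Figure: the NFA built by these rules, states 1 to 11: 1 -a-> 2 and 3 -b-> 4 inside N(a | b) (states 5 and 6), which sits inside N((a | b)*) (start state 7, final state 8), followed by 8 -a-> 9 -b-> 10 -b-> 11; final state 11.]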
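The construction rules translate almost one-to-one into code. The sketch below builds N(r) bottom-up with the hypothetical functions `sym`, `alt`, `cat` and `star`, one per rule above, numbering states in creation order. One deliberate difference: `cat` links fs to it with an ε-edge instead of merging the two states, so (a | b)*abb yields 14 states rather than the 11 in the figure, while the language stays the same. A small ε-closure-based simulation is included so the result can be checked.

```c
#include <stdbool.h>
#include <string.h>

#define MAXS 64
#define EPS 0                            /* edge label used for epsilon */

/* Thompson's construction never gives a state more than two out-edges. */
struct state { int label[2]; int to[2]; int nout; };
static struct state nfa[MAXS];
static int nstates;

struct frag { int start, final; };       /* i and f of N(r) */

static int new_state(void) { nfa[nstates].nout = 0; return nstates++; }

static void edge(int from, int label, int to)
{
    struct state *s = &nfa[from];
    s->label[s->nout] = label;
    s->to[s->nout] = to;
    s->nout++;
}

static struct frag make(void)            /* fresh i and f */
{
    struct frag f;
    f.start = new_state();
    f.final = new_state();
    return f;
}

static struct frag sym(int c)            /* N(a): i -a-> f */
{
    struct frag f = make();
    edge(f.start, c, f.final);
    return f;
}

static struct frag alt(struct frag s, struct frag t)   /* N(s | t) */
{
    struct frag f = make();
    edge(f.start, EPS, s.start); edge(f.start, EPS, t.start);
    edge(s.final, EPS, f.final); edge(t.final, EPS, f.final);
    return f;
}

static struct frag cat(struct frag s, struct frag t)   /* N(st), fs -eps-> it */
{
    struct frag f = { s.start, t.final };
    edge(s.final, EPS, t.start);
    return f;
}

static struct frag star(struct frag s)   /* N(s*) */
{
    struct frag f = make();
    edge(f.start, EPS, s.start); edge(f.start, EPS, f.final);
    edge(s.final, EPS, s.start); edge(s.final, EPS, f.final);
    return f;
}

static void closure(int s, bool set[])   /* add s and its epsilon-successors */
{
    if (set[s]) return;
    set[s] = true;
    for (int k = 0; k < nfa[s].nout; k++)
        if (nfa[s].label[k] == EPS)
            closure(nfa[s].to[k], set);
}

static bool accepts(struct frag m, const char *input)
{
    bool cur[MAXS] = { false }, next[MAXS];
    closure(m.start, cur);
    for (; *input != '\0'; input++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < nstates; s++)
            if (cur[s])
                for (int k = 0; k < nfa[s].nout; k++)
                    if (nfa[s].label[k] == (unsigned char)*input)
                        closure(nfa[s].to[k], next);
        memcpy(cur, next, sizeof cur);
    }
    return cur[m.final];
}

static struct frag build_example(void)   /* (a | b)*abb */
{
    nstates = 0;
    struct frag r = star(alt(sym('a'), sym('b')));
    r = cat(r, sym('a'));
    r = cat(r, sym('b'));
    return cat(r, sym('b'));
}
```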
Simulating a DFA
Input: An input string terminated by eof and a DFA with start
state s0 and final states F.
Output: The answer yes if the DFA accepts the string, no otherwise.
begin
  s := s0; c := nextchar;
  while c <> eof do begin
    s := move(s, c); c := nextchar
  end;
  if s is in F then return yes else return no
end.
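A direct C rendering of this algorithm, using the DFA for (a | b)*abb from the earlier slides, might look as follows. The table layout, the dead state 0 for symbols other than a and b, and the optional `trace` output are assumptions of this sketch; `trace` must have room for strlen(input) + 1 states.

```c
#include <stdbool.h>
#include <stddef.h>

/* DFA for (a | b)*abb: states 1..4, s0 = 1, F = {4}. Row 0 is a dead
   state used for symbols other than a and b. */
static const int delta[5][2] = {
    /*        on a  on b */
    /* 0 */ { 0,    0 },
    /* 1 */ { 2,    1 },
    /* 2 */ { 2,    3 },
    /* 3 */ { 2,    4 },
    /* 4 */ { 2,    1 },
};

static int move(int s, char c)
{
    if (c == 'a') return delta[s][0];
    if (c == 'b') return delta[s][1];
    return 0;
}

/* s := s0; for each character c up to the end of the string,
   s := move(s, c); answer yes iff s is in F. If trace is not NULL it
   receives every state visited, starting with s0, and *ntrace their count. */
static bool dfa_accepts(const char *input, int *trace, size_t *ntrace)
{
    int s = 1;
    size_t n = 0;
    if (trace != NULL) trace[n++] = s;
    for (const char *c = input; *c != '\0'; c++) {
        s = move(s, *c);
        if (trace != NULL) trace[n++] = s;
    }
    if (ntrace != NULL) *ntrace = n;
    return s == 4;
}
```

Calling `dfa_accepts("bbababb", t, &n)` fills `t` with the states 1, 1, 1, 2, 3, 2, 3, 4 and returns true.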
An Example
(a | b)*abb
[Figure: the DFA for (a | b)*abb shown earlier: states 1 to 4, start state 1, final state 4.]
An Example
Input bbababb:
s = 1
s = move(1, b) = 1
s = move(1, b) = 1
s = move(1, a) = 2
s = move(2, b) = 3
s = move(3, a) = 2
s = move(2, b) = 3
s = move(3, b) = 4
s is in {4}, so the answer is yes.
Input bbabab:
s = 1
s = move(1, b) = 1
s = move(1, b) = 1
s = move(1, a) = 2
s = move(2, b) = 3
s = move(3, a) = 2
s = move(2, b) = 3
s is not in {4}, so the answer is no.
Simulating an NFA
begin
  S := ε-closure({s0}); c := nextchar;
  while c <> eof do begin
    S := ε-closure(move(S, c)); c := nextchar
  end;
  if S ∩ F <> ∅ then return yes else return no
end.
Operations on NFA states
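ε-closure(s): the set of NFA states reachable from state s on ε-transitions alone, including s itself
ε-closure(T): the union of ε-closure(s) over all states s in the set T
move(T, a): the set of NFA states reachable by one transition on symbol a from some state in T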
An Example
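(a | b)*abb
[Figure: NFA with states 1 to 11, start state 1, final state 11. ε-transitions: 1 → 2, 1 → 8, 2 → 3, 2 → 5, 4 → 7, 6 → 7, 7 → 2, 7 → 8. Other transitions: 3 -a-> 4, 5 -b-> 6, 8 -a-> 9, 9 -b-> 10, 10 -b-> 11.]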
An Example
Input: bbabb
S = ε-closure({1}) = {1,2,3,5,8}
S = ε-closure(move({1,2,3,5,8}, b))
  = ε-closure({6}) = {2,3,5,6,7,8}
S = ε-closure(move({2,3,5,6,7,8}, b))
  = ε-closure({6}) = {2,3,5,6,7,8}
S = ε-closure(move({2,3,5,6,7,8}, a))
  = ε-closure({4,9}) = {2,3,4,5,7,8,9}
S = ε-closure(move({2,3,4,5,7,8,9}, b))
  = ε-closure({6,10}) = {2,3,5,6,7,8,10}
S = ε-closure(move({2,3,5,6,7,8,10}, b))
  = ε-closure({6,11}) = {2,3,5,6,7,8,11}
S ∩ {11} <> ∅, so the answer is yes.
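The trace can be reproduced with bit masks standing for sets of NFA states. This sketch hard-codes the NFA from the figure (states 1 to 11); its `eps_closure` computes ε-closure(T) by repeating one ε-step until the set stops growing, which is simpler than a stack-based algorithm but gives the same result. The table and function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

#define BIT(s) (1u << (s))     /* a set of NFA states is a bit mask */

/* NFA for (a | b)*abb from the figure: states 1..11, start 1, final 11.
   eps[s]: states reached from s by one epsilon edge. */
static const unsigned eps[12] = {
    [1] = BIT(2) | BIT(8), [2] = BIT(3) | BIT(5),
    [4] = BIT(7), [6] = BIT(7), [7] = BIT(2) | BIT(8),
};
static const unsigned on_a[12] = { [3] = BIT(4), [8] = BIT(9) };
static const unsigned on_b[12] = { [5] = BIT(6), [9] = BIT(10), [10] = BIT(11) };

/* epsilon-closure(T): keep adding epsilon successors until T stops growing. */
static unsigned eps_closure(unsigned T)
{
    unsigned prev;
    do {
        prev = T;
        for (int s = 1; s <= 11; s++)
            if (T & BIT(s))
                T |= eps[s];
    } while (T != prev);
    return T;
}

/* move(T, c): states reachable from a state in T by one edge labelled c. */
static unsigned move(unsigned T, char c)
{
    const unsigned *tab = c == 'a' ? on_a : c == 'b' ? on_b : NULL;
    unsigned R = 0;
    if (tab == NULL)
        return 0;                  /* symbol outside {a, b} */
    for (int s = 1; s <= 11; s++)
        if (T & BIT(s))
            R |= tab[s];
    return R;
}

/* The "Simulating an NFA" algorithm: yes iff S intersects F = {11}. */
static bool nfa_accepts(const char *input)
{
    unsigned S = eps_closure(BIT(1));
    for (; *input != '\0'; input++)
        S = eps_closure(move(S, *input));
    return (S & BIT(11)) != 0;
}
```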
Computation of ε-closure
(a | b)*abb
[Figure: the same NFA with states 1 to 11, start state 1, final state 11.]
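One common way to compute ε-closure(T) uses an explicit stack: every state of T starts on the stack, and each popped state t pushes every ε-successor u that is not yet in the closure. The sketch below applies this to the ε-edges of the NFA above; the array layout (`eps_to`, rows terminated by 0) and the function signature are assumptions of the sketch.

```c
#include <stdbool.h>

#define N 11   /* NFA states 1..11; index 0 is unused */

/* epsilon edges of the NFA: eps_to[s] lists targets, terminated by 0 */
static const int eps_to[N + 1][3] = {
    [1] = {2, 8}, [2] = {3, 5}, [4] = {7}, [6] = {7}, [7] = {2, 8},
};

/* closure := T; push all of T; while the stack is not empty, pop t and,
   for each epsilon edge t -> u with u not yet in closure, add u and push it.
   Each state is pushed at most once, so N + 1 stack slots suffice. */
static void eps_closure(const bool in_T[N + 1], bool closure[N + 1])
{
    int stack[N + 1], top = 0;
    closure[0] = false;
    for (int s = 1; s <= N; s++) {
        closure[s] = in_T[s];
        if (in_T[s])
            stack[top++] = s;
    }
    while (top > 0) {
        int t = stack[--top];
        for (int k = 0; k < 3 && eps_to[t][k] != 0; k++) {
            int u = eps_to[t][k];
            if (!closure[u]) {
                closure[u] = true;
                stack[top++] = u;
            }
        }
    }
}
```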
An Example
ε-closure({1}) = {1,2,3,5,8} = A
ε-closure(move(A, a)) = ε-closure({4,9}) = {2,3,4,5,7,8,9} = B
ε-closure(move(A, b)) = ε-closure({6}) = {2,3,5,6,7,8} = C
ε-closure(move(B, a)) = ε-closure({4,9}) = B
ε-closure(move(B, b)) = ε-closure({6,10}) = {2,3,5,6,7,8,10} = D
ε-closure(move(C, a)) = ε-closure({4,9}) = B
ε-closure(move(C, b)) = ε-closure({6}) = C
ε-closure(move(D, a)) = ε-closure({4,9}) = B
ε-closure(move(D, b)) = ε-closure({6,11}) = {2,3,5,6,7,8,11} = E
ε-closure(move(E, a)) = ε-closure({4,9}) = B
ε-closure(move(E, b)) = ε-closure({6}) = C
An Example
State                    Input symbol
                         a    b
A = {1,2,3,5,8}          B    C
B = {2,3,4,5,7,8,9}      B    D
C = {2,3,5,6,7,8}        B    C
D = {2,3,5,6,7,8,10}     B    E
E = {2,3,5,6,7,8,11}     B    C
An Example
[Figure: the resulting DFA, with A = {1,2,3,5,8} as start state and E = {2,3,5,6,7,8,11} as final state. On a, every state goes to B = {2,3,4,5,7,8,9}; on b, A → C, B → D, C → C, D → E and E → C.]
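The table and the DFA above can be generated by the subset construction, sketched below for the same NFA. It numbers DFA states in the order they are discovered, so indices 0 to 4 correspond to A to E; a DFA state is final when its set contains NFA state 11. The arrays `Dstates` and `Dtran`, the capacity `MAXD` and the bit-mask encoding of state sets are assumptions of this sketch.

```c
#define BIT(s) (1u << (s))
#define NFA_LAST 11            /* NFA states are 1..11 */
#define MAXD 32                /* capacity for DFA states */

/* Same NFA for (a | b)*abb as above (start 1, final 11). */
static const unsigned eps[NFA_LAST + 1] = {
    [1] = BIT(2) | BIT(8), [2] = BIT(3) | BIT(5),
    [4] = BIT(7), [6] = BIT(7), [7] = BIT(2) | BIT(8),
};
/* on_sym[0][s]: successors of s on a; on_sym[1][s]: successors on b. */
static const unsigned on_sym[2][NFA_LAST + 1] = {
    { [3] = BIT(4), [8] = BIT(9) },
    { [5] = BIT(6), [9] = BIT(10), [10] = BIT(11) },
};

static unsigned eps_closure(unsigned T)
{
    unsigned prev;
    do {
        prev = T;
        for (int s = 1; s <= NFA_LAST; s++)
            if (T & BIT(s))
                T |= eps[s];
    } while (T != prev);
    return T;
}

static unsigned move(unsigned T, int sym)
{
    unsigned R = 0;
    for (int s = 1; s <= NFA_LAST; s++)
        if (T & BIT(s))
            R |= on_sym[sym][s];
    return R;
}

static unsigned Dstates[MAXD];   /* Dstates[i]: NFA-state set of DFA state i */
static int Dtran[MAXD][2];       /* Dtran[i][sym]: target DFA state index */

/* Dstates[0] = epsilon-closure({1}) is the start state; states with index
   below `unmarked` have been processed. Returns the number of DFA states,
   or -1 if MAXD is too small. */
static int subset_construction(void)
{
    int n = 0, unmarked = 0;
    Dstates[n++] = eps_closure(BIT(1));
    while (unmarked < n) {
        unsigned T = Dstates[unmarked];
        for (int sym = 0; sym < 2; sym++) {
            unsigned U = eps_closure(move(T, sym));
            int j = 0;
            while (j < n && Dstates[j] != U)
                j++;
            if (j == n) {                 /* U is a new DFA state */
                if (n == MAXD)
                    return -1;
                Dstates[n++] = U;
            }
            Dtran[unmarked][sym] = j;
        }
        unmarked++;                       /* mark T */
    }
    return n;
}
```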
Flex Lexical Analyzer Generator
%{
auxiliary declarations
%}
regular definitions
%%
translation rules
%%
auxiliary procedures
Translation Rules
P1 action1
P2 action2
...
Pn actionn
%%
username    printf("%s", getlogin());
Example II
%option noyywrap
%{
int lines = 0, chars = 0;
%}
%%
\n    ++lines; ++chars;
.     ++chars;   /* all characters except \n */
%%
int main(void) {
    yylex();
    printf("lines = %d, chars = %d\n", lines, chars);
    return 0;
}
Example III
%{
#define EOF 0
#define LE 25
...
%}
delim [ \t\n]
ws {delim}+
letter [A-Za-z]
digit [0-9]
id {letter}({letter}|{digit})*
number {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
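Flex substitutes each `{name}` reference with its definition, so `number` stands for a single regular expression over characters. As a check of what it accepts, the sketch below expands `{digit}` by hand into a POSIX extended RE and tests whole strings with `<regex.h>` (POSIX systems only; `is_number` is a hypothetical helper). In a POSIX bracket expression the backslash is not an escape, so Flex's `[+\-]` is written `[+-]`, with the hyphen last to make it literal.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* {digit}+(\.{digit}+)?(E[+\-]?{digit}+)? with {digit} replaced by [0-9],
   anchored with ^...$ so the whole string must match. */
static const char *const NUMBER_RE = "^[0-9]+(\\.[0-9]+)?(E[+-]?[0-9]+)?$";

static bool is_number(const char *s)
{
    regex_t prog;
    bool ok;
    if (regcomp(&prog, NUMBER_RE, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    ok = regexec(&prog, s, 0, NULL, 0) == 0;
    regfree(&prog);
    return ok;
}
```

The definition requires at least one digit on each side of the decimal point and an upper-case E, so `3.`, `.5` and `1e5` are not numbers under it.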
Example III
yylex()
a function implementing the lexical analyzer and returning
the token matched
yytext
a global pointer variable pointing to the lexeme matched
yyleng
a global variable giving the length of the lexeme matched
yylval
an external global variable storing the attribute of the token
NFA from Flex Programs
P1 | P2 | ... | Pn
[Figure: a new start state s0 with ε-transitions to the start states of N(P1), N(P2), ..., N(Pn).]
Rules
Rules
%option yylineno