Beruflich Dokumente
Kultur Dokumente
Token:
group of characters forming basic, atomic chunk of syntax;
parser:
handle grouping tokens into syntax trees
a word
Restricted nature of scanning allows faster implementation
scanning is time-consuming in many compilers
Whitespace:
characters between tokens that are ignored
Craig Chambers
17
CSE 401
Craig Chambers
18
Complications
CSE 401
Alternatives:
Fortran: line-oriented, whitespace doesnt separate
do 10 i = 1.100
.. a loop ..
Pattern: typically defined using a regular expression
10 continue
Haskell: can use identation & layout to imply grouping
Craig Chambers
19
CSE 401
Craig Chambers
20
CSE 401
Classes of languages
all languages
Turing-computable languages
context-free languages
regular languages
Craig Chambers
21
CSE 401
Craig Chambers
22
Notational conveniences
Defined inductively
base cases:
CSE 401
Ek means k occurrences of E
{E} means E*
inductive cases:
sequence of two REs: E1E2
either of two REs: E1|E2
Notes:
can use parentheses for grouping
precedence: * highest, sequence, | lowest
whitespace insignificant
Craig Chambers
23
CSE 401
Craig Chambers
24
CSE 401
Identifiers
ident
Integer constants
letter
::= a | b | ... | z
digit
::= 0 | 1 | ... | 9
alphanum
integer
::= digit+
sign
::= + | -
signed_int
Craig Chambers
25
CSE 401
real
::= signed_int
[fraction] [exponent]
fraction
::= . digit+
exponent
Craig Chambers
26
CSE 401
Meta-rules
char*
string
::= "
character
"
char
escape
::= \("|'|\|n|r|t|v|b|a)
Whitespace
whitespace
comment
Craig Chambers
27
CSE 401
Craig Chambers
28
CSE 401
Program
Token
Digit)*
ID
Letter
Digit
::= 0 | ... | 9
Integer
::= Digit+
Delimiter
::= ; | . | , | = |
( | ) | { | } | [ | ]
Whitespace
Craig Chambers
29
CSE 401
Craig Chambers
30
CSE 401
Determinism
An FSA has:
a set of states
Deterministic: always know which way to go
Example:
not(*)
*
0
Craig Chambers
31
CSE 401
Craig Chambers
32
CSE 401
A solution
A problem:
Can it be bridged?
Craig Chambers
33
CSE 401
Craig Chambers
34
RE NFA
NFA DFA
Define by cases
CSE 401
E1 E2
E1 | E2
Craig Chambers
35
CSE 401
Craig Chambers
36
CSE 401
DFA code
Pros
straightforward to write by hand
fast
Cons
if T not empty:
if a DFA state has T as a label, add a transition labeled x from S
to T
otherwise create a new DFA state labeled T, add a transition
labeled x from S to T, and process T
Craig Chambers
37
CSE 401
Pros
convenient for automatic generation
exactly matches specification, if tool-generated
Cons
magic
table lookups may be slower than direct code
but switch statements get compiled into table lookups, so....
can translate table lookups into switch statements, if beneficial
Craig Chambers
39
CSE 401
Craig Chambers
38
CSE 401