Compiler: Revisited
Compiler
• Analysis: generates intermediate code
• Synthesis: generates target code
Formal Languages
In mathematics, computer science and linguistics, a formal
language is a set of strings of symbols that may be
constrained by rules that are specific to it
Alphabet
The alphabet of a formal language is the set of symbols, letters, or
tokens from which the strings of the language may be formed
Words
The strings formed from the alphabet of a language are called
words
Formal Languages Cont…
A formal language L is a set of finite-length
words (or "strings") over some finite alphabet
A. ε is the empty word.
Example:
A = {a, b, c}
L1 = {ab, c}
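The example above can be modeled directly with sets. This is a minimal sketch, assuming single-character symbols; the helper names `is_word_over` and `is_language_over` are hypothetical and chosen only for illustration.

```python
# Alphabet and language from the example, modeled as Python sets.
A = {"a", "b", "c"}
L1 = {"ab", "c"}

def is_word_over(word, alphabet):
    """Return True if every symbol of `word` belongs to `alphabet`."""
    return all(symbol in alphabet for symbol in word)

def is_language_over(language, alphabet):
    """Return True if every word of `language` is a word over `alphabet`."""
    return all(is_word_over(word, alphabet) for word in language)
```

The empty word is represented by the empty string `""`, which is a word over any alphabet.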
A model of a compiler front end
Chapter 3
Lexical Analysis
Operations on Languages: Revisited
If L = {A, B, …, Z, a, b, …, z} and D = {0, 1, …, 9}
– L ∪ D (union): the 62 strings of length 1, each a single letter or digit
– AB or A•B (concatenation)
• A regular expression formed by A followed by B
• (a)(b) = {ab}
– A* (Kleene closure)
• A regular expression formed by zero or more repetitions of A
• a* = {ε, a, aa, aaa, …}
More Complex Example
(a|b|c)* = {ε, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, …}
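The operations above can be tried on small sets of strings. The sketch below is illustrative only: `concat` and `kleene_upto` are hypothetical helper names, and because a Kleene closure is infinite, `kleene_upto` only builds the words that use at most `max_reps` repetitions.

```python
import string

L = set(string.ascii_letters)  # the 52 letters A..Z, a..z
D = set(string.digits)         # the 10 digits 0..9

def concat(X, Y):
    """Concatenation XY: every word of X followed by every word of Y."""
    return {x + y for x in X for y in Y}

def kleene_upto(X, max_reps):
    """Finite approximation of X*: words built from at most max_reps words of X."""
    result = {""}   # zero repetitions give the empty word
    layer = {""}
    for _ in range(max_reps):
        layer = concat(layer, X)
        result |= layer
    return result
```

For instance, `L | D` is the union of the two sets, and `kleene_upto({"a", "b", "c"}, 2)` contains the empty word, the three one-letter words, and the nine two-letter words listed above.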
Regular Expression
• Some notational conveniences
– P+
• PP* (at least one occurrence of P)
– Not(A)
• V − A (the characters of V that are not in A)
– A^k
• AA…A (k copies)
– A?
• Optional, zero or one occurrence of A
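These conveniences have close counterparts in Python's `re` module, which makes them easy to experiment with. This is a sketch under one assumption worth noting: Python's negated class `[^a]` ranges over all characters, not over a chosen vocabulary V. The helper name `matches` is hypothetical.

```python
import re

def matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# P+ accepts the same strings as PP*.
samples = ["", "a", "aa", "aaa", "b"]
plus_equals_star = all(matches("a+", s) == matches("aa*", s) for s in samples)

# A? : zero or one occurrence.
optional_ok = matches("ab?", "a") and matches("ab?", "ab") and not matches("ab?", "abb")

# A^k : exactly k copies, written A{k} in Python.
power_ok = matches("a{3}", "aaa") and not matches("a{3}", "aa")

# Not(A): any single character outside A.
not_ok = matches("[^a]", "b") and not matches("[^a]", "a")
```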
Finite Automata and Scanners
• Finite automaton (FA)
– can be used to recognize the tokens specified by a regular expression
• An FA consists of
– A finite set of states S
– A set of input symbols V (the input symbol alphabet)
– A set of transitions (or moves) from one state to another,
labeled with characters in V
– A special start state s0 (only one)
– A set of final, or accepting, states F
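The components listed above map naturally onto a small table-driven recognizer. The sketch below is one possible encoding, not a prescribed one: it assumes the expression (a b c+)+ and numbers the states 0 to 3 arbitrarily; `START`, `FINAL`, `TRANSITIONS`, and `accepts` are hypothetical names.

```python
START = 0          # the single start state s0
FINAL = {3}        # the set of final (accepting) states F

# TRANSITIONS[state][symbol] gives the next state.
# A missing entry means there is no move, so the input is rejected.
TRANSITIONS = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"c": 3},
    3: {"c": 3, "a": 1},  # more c's, or start another "abc..." group
}

def accepts(text):
    """Run the automaton over `text`; accept if it ends in a final state."""
    state = START
    for symbol in text:
        state = TRANSITIONS[state].get(symbol)
        if state is None:
            return False
    return state in FINAL
```

With this table, `accepts("abccabc")` holds, while `accepts("ab")` does not because the run stops in a non-final state.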
[Figure: notation legend (the symbols for a transition and for a final state) and a transition diagram for (a b c+)+]
Finite Automata and Scanners
• Another example
– (0|1)*0(0|1)(0|1)
[Figure: transition diagram with states 1 to 4 for (0|1)*0(0|1)(0|1), with transitions labeled 0 and 0,1]
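The expression (0|1)*0(0|1)(0|1) describes binary strings whose third symbol from the end is 0. A hedged sketch of a four-state automaton for it follows; the exact transitions are an assumption (state 1 loops on 0 and 1 and may also move to state 2 on 0, which makes the automaton non-deterministic), and `NFA` and `nfa_accepts` are hypothetical names. The simulation tracks the set of states the automaton could be in.

```python
# NFA[state][symbol] is the set of possible next states.
NFA = {
    1: {"0": {1, 2}, "1": {1}},  # stay in 1, or guess this 0 is third from the end
    2: {"0": {3}, "1": {3}},
    3: {"0": {4}, "1": {4}},
    4: {},                       # final state, no outgoing moves
}

def nfa_accepts(text):
    """Simulate the NFA by following all possible moves at once."""
    current = {1}  # start state
    for symbol in text:
        current = {
            nxt
            for state in current
            for nxt in NFA[state].get(symbol, set())
        }
    return 4 in current
```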
Finite Automata and Scanners
• Another example
– ID = L(L|D)*(_(L|D)+)*
– Note: a final state for each of the two * symbols
[Figure: transition diagram for L (L|D)* (_(L|D)+)*, with transitions labeled L, L|D, and _]
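The ID expression can be checked with Python's `re` module. This sketch assumes L stands for the ASCII letters and D for the decimal digits; `ID_PATTERN` and `is_id` are hypothetical names.

```python
import re

# ID = L(L|D)*(_(L|D)+)* with L = [A-Za-z] and D = [0-9].
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*")

def is_id(text):
    """Return True if the whole of `text` matches the ID expression."""
    return ID_PATTERN.fullmatch(text) is not None
```

Under this definition every underscore must be followed by at least one letter or digit, so `sum_5` and `a_b_c` match while `sum_`, `a__b`, and `_x` do not.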
Regular Expressions for Identifier & Keyword
Identifier
• Identifiers are names given to program elements such as variables, functions, and arrays.
• Identifiers consist of letters and digits; the underscore counts as a letter.
• The first character of an identifier must be a letter.
• Identifiers use uppercase and lowercase letters.
• Uppercase and lowercase letters are not equivalent.
• Examples: X, xy, sum_5, _weather. But 4th is not an identifier, because the first character must be a letter.
Keyword
• C reserves some words, called keywords, which have a predefined meaning.
• Keywords consist only of letters.
• Keywords are all lowercase.
• Uppercase and lowercase letters are also not equivalent.
• Examples: struct, auto, short, long.
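One common way to separate the two categories is to scan every such lexeme as an identifier and then look it up in a table of reserved words. The sketch below assumes C-style rules (the underscore counts as a letter) and uses only a small illustrative subset of C keywords; `KEYWORDS`, `IDENTIFIER`, and `classify` are hypothetical names.

```python
import re

# A small, illustrative subset of C keywords (not the full list).
KEYWORDS = {"struct", "auto", "short", "long", "if", "else", "while"}

# A letter or underscore, followed by letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify(lexeme):
    """Label a lexeme as 'keyword', 'identifier', or 'invalid'."""
    if IDENTIFIER.fullmatch(lexeme) is None:
        return "invalid"
    return "keyword" if lexeme in KEYWORDS else "identifier"
```

Because the lookup is case-sensitive, `classify("struct")` yields `"keyword"` while `classify("Struct")` yields `"identifier"`.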
A grammar for branching statements
Transition Diagrams
Recognition of Reserved Words and Identifiers
Table Encoding of FA
Two kinds of FA
• Deterministic: next transition is unique
• Non-deterministic: otherwise
[Figure: a state with two outgoing transitions that are both labeled a]
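The distinction between the two kinds of FA can be stated as a simple check on the transition set: an automaton is deterministic when no state has two different moves on the same symbol. The sketch below assumes transitions are given as (source, symbol, target) triples and ignores empty-string moves; `is_deterministic` is a hypothetical name.

```python
def is_deterministic(transitions):
    """Return True if each (state, symbol) pair leads to at most one target."""
    seen = {}
    for source, symbol, target in transitions:
        key = (source, symbol)
        if key in seen and seen[key] != target:
            return False  # two different moves on the same symbol
        seen[key] = target
    return True
```

For example, `[(0, "a", 1), (0, "a", 2)]` is non-deterministic because state 0 has two moves on a.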