
Instructor: Marryam Murtaza

Compiler: Revisited
[Figure: Compiler structure. Analysis (front end): generate intermediate code. Synthesis (back end): generate target code.]
Formal Languages
• In mathematics, computer science, and linguistics, a formal language is a set of strings of symbols that may be constrained by rules specific to it.
• Alphabet
The alphabet of a formal language is the set of symbols, letters, or tokens from which the strings of the language may be formed.
• Words
The strings formed from the alphabet of a language are called words.
Formal Languages Cont…
A formal language L is a set of finite-length words (or "strings") over some finite alphabet A. ∊ denotes the empty word.
Example:
A = {a, b, c}
L1 = {ab, c}
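A minimal sketch of the example above, assuming a finite language is modeled as a Python set of strings. The helper name `is_word_over` is hypothetical; it only checks that a word is built from symbols of the alphabet.

```python
# Alphabet A and the finite language L1 from the example above.
A = {"a", "b", "c"}
L1 = {"ab", "c"}

def is_word_over(word: str, alphabet: set) -> bool:
    """True if every symbol of `word` belongs to `alphabet` (the empty word qualifies)."""
    return all(symbol in alphabet for symbol in word)

# Every word of L1 is built from symbols of A, so L1 is a language over A.
print(all(is_word_over(w, A) for w in L1))

# Membership in a finite language is plain set membership.
print("ab" in L1, "ba" in L1)
```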
[Figure: A model of a compiler front end]
Chapter 3
Lexical Analysis
Operations on Languages: Revisited
If L = {A, B,…, Z, a, b,…, z} and D = {0,1,…,9}
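L ∪ D = ? The set of 62 strings of length one, each a letter or a digit
LD = ? The set of 520 strings of length two, each a letter followed by a digit
L⁴ = ? The set of all four-letter strings
L* = ? The set of all strings of letters, including ∊
L(L ∪ D)* = ? The set of all strings of letters and digits beginning with a letter
D⁺ = ? The set of all strings of one or more digits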
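The counts above can be checked with a short sketch, assuming "letters" means the 52 ASCII letters. L⁴ is only counted, not built, because it has |L|⁴ members; the two membership helpers are hypothetical names for illustration.

```python
import string

L = set(string.ascii_letters)  # {A..Z, a..z}: 52 strings of length one
D = set(string.digits)         # {0..9}: 10 strings of length one

union = L | D                            # L ∪ D
concat = {x + y for x in L for y in D}   # LD: a letter followed by a digit

print(len(union), len(concat))
# Every word of L has length 1, so L⁴ has |L|**4 distinct words.
print(len(L) ** 4)

def in_L_LuD_star(w: str) -> bool:
    """Membership in L(L ∪ D)*: a letter, then zero or more letters or digits."""
    return len(w) >= 1 and w[0] in L and all(c in union for c in w[1:])

def in_D_plus(w: str) -> bool:
    """Membership in D⁺: one or more digits."""
    return len(w) >= 1 and all(c in D for c in w)
```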
Regular Expression
• If A and B are regular expressions, so are:
– A | B (alternation)
• A regular expression formed by A or B
• (a)|(b) = {a, b}

– AB or A•B (concatenation)
• A regular expression formed by A followed by B
• (a)(b) = {ab}

– A* (Kleene closure)
• A regular expression formed by zero or more repetitions of A
• a* = {∊, a, aa, aaa, …}
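The three operators can be tried out with Python's `re` module, using `re.fullmatch` so that the whole string must belong to the language. This is a sketch only: Python regexes have many extensions, but for alternation, concatenation, and `*` they agree with the definitions above.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the whole string s is in the language denoted by pattern."""
    return re.fullmatch(pattern, s) is not None

samples = ["", "a", "b", "ab", "aaa"]
# Alternation: (a)|(b) denotes {a, b}
print([s for s in samples if matches(r"(a)|(b)", s)])
# Concatenation: (a)(b) denotes {ab}
print([s for s in samples if matches(r"(a)(b)", s)])
# Kleene closure: a* denotes {∊, a, aa, aaa, ...}
print([s for s in samples if matches(r"a*", s)])
```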
More Complex Example
(a|b|c)* = {∊, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, …}
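Since (a|b|c)* is infinite, only a finite prefix can be listed. A minimal sketch, assuming we enumerate words shortest first up to a chosen length n (the function name `words_up_to` is hypothetical):

```python
from itertools import product

def words_up_to(alphabet: str, n: int) -> list:
    """All words over `alphabet` of length 0..n, shortest first."""
    result = []
    for length in range(n + 1):
        # product(..., repeat=0) yields one empty tuple, giving the empty word ∊
        result.extend("".join(p) for p in product(alphabet, repeat=length))
    return result

print(words_up_to("abc", 2))
# ['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

There are 3ᵏ words of length k, so the listing grows quickly: 1 + 3 + 9 + 27 = 40 words up to length 3.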
Regular Expression
• Some notational conveniences
– P+
• PP* (one or more occurrences of P)

– Not(A)
• V − A (any symbol of the alphabet V that is not in A)

– Aᵏ
• AA…A (k copies of A)

– A?
• Optional: zero or one occurrence of A, i.e. A | ∊
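Most of these shorthands map directly onto Python `re` syntax. In this sketch the alphabet V = {a, b, c} is a hypothetical choice; Not(a) is then built explicitly as the character class of V − {a}, because Python's own `[^a]` would also accept characters outside V.

```python
import re

def accepts(pattern: str, s: str) -> bool:
    """True if the whole of s is in the language denoted by pattern."""
    return re.fullmatch(pattern, s) is not None

# Hypothetical alphabet V = {a, b, c}; Not(a) = V - {a} = {b, c}.
V = {"a", "b", "c"}
not_a = "[" + "".join(sorted(V - {"a"})) + "]"   # the character class "[bc]"

examples = {
    "P+ (one or more)": r"a+",      # same language as aa*
    "Not(A) = V - A":   not_a,
    "A^k with k = 3":   r"a{3}",    # same language as aaa
    "A? (zero or one)": r"a?",      # same language as a | ∊
}
samples = ["", "a", "b", "aaa"]
for name, pattern in examples.items():
    print(name, pattern, [s for s in samples if accepts(pattern, s)])
```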
Finite Automata and Scanners
• Finite automaton (FA)
– can be used to recognize the tokens specified by a regular expression

• A FA consists of
– A finite set of states S
– A set of input symbols V (the input symbol alphabet)
– A set of transitions (or moves) from one state to another, labeled with characters in V
– A special start state s0 (exactly one)
– A set of final, or accepting, states F

FA = {S, V, s0, F, move}
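The five components can be written down directly as a data structure. This is a sketch under two assumptions: `move` is deterministic (one next state per state and symbol), and a missing (state, symbol) entry means the word is rejected. The class name `DFA` and the example machine for L1 = {ab, c} from earlier are our own illustration.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    """FA = {S, V, s0, F, move} with a deterministic move function."""
    states: frozenset    # S
    alphabet: frozenset  # V
    start: int           # s0
    finals: frozenset    # F
    move: dict           # (state, symbol) -> next state; missing entry = reject

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet or (state, symbol) not in self.move:
                return False
            state = self.move[(state, symbol)]
        return state in self.finals

# A DFA for the earlier example L1 = {ab, c} over A = {a, b, c}:
# 0 --a--> 1 --b--> 2 (final), and 0 --c--> 3 (final).
l1_dfa = DFA(
    states=frozenset({0, 1, 2, 3}),
    alphabet=frozenset("abc"),
    start=0,
    finals=frozenset({2, 3}),
    move={(0, "a"): 1, (1, "b"): 2, (0, "c"): 3},
)
print([w for w in ["ab", "c", "", "abc", "ba"] if l1_dfa.accepts(w)])
```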


Finite Automata and Scanners
[Figure: legend of transition-diagram symbols: a state, a transition, the start state, and a final state.]

Example on the next page.


Finite Automata and Scanners
• Example
– A transition diagram
• This machine accepts (abc+)+

[Figure: transition diagram for (abc+)+, with transitions labeled a, b, and c.]
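One way to run a machine for (abc+)+ is a transition dictionary plus a loop. The state numbering below is our own and may differ from the slide's diagram; missing entries stand for an implicit dead state.

```python
# Our own DFA for (abc+)+; state 3 is the only final state.
MOVE = {
    (0, "a"): 1,   # the leading a
    (1, "b"): 2,   # then b
    (2, "c"): 3,   # then at least one c
    (3, "c"): 3,   # c+ : further c's
    (3, "a"): 1,   # outer + : begin another abc+ block
}
START, FINALS = 0, {3}

def accepts(word: str) -> bool:
    state = START
    for symbol in word:
        state = MOVE.get((state, symbol))
        if state is None:          # no move defined: reject
            return False
    return state in FINALS

for w in ["abc", "abccc", "abcabcc", "ab", "abca", ""]:
    print(repr(w), accepts(w))
```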
Finite Automata and Scanners
• Other Example
– (0|1)*0(0|1)(0|1)

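[Figure: transition diagram with states 1 to 4 for (0|1)* 0 (0|1) (0|1): state 1 loops on 0,1; 1 goes to 2 on 0; 2 goes to 3 on 0,1; 3 goes to 4 on 0,1; state 4 is final.]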
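This machine is nondeterministic: on a 0, state 1 can either stay in the (0|1)* loop or move to state 2. A common way to simulate it is to track the set of all states it could be in. The sketch below assumes the state numbering described for the diagram and cross-checks the result against Python's `re` on every binary string up to length 6.

```python
import re
from itertools import product

# (state, symbol) -> set of possible next states; state 4 is final.
NFA_MOVE = {
    (1, "0"): {1, 2},  # stay in the loop, or guess this 0 is third from the end
    (1, "1"): {1},
    (2, "0"): {3}, (2, "1"): {3},
    (3, "0"): {4}, (3, "1"): {4},
}
START, FINALS = 1, {4}

def nfa_accepts(word: str) -> bool:
    """Simulate the NFA by following every possible path at once."""
    current = {START}
    for symbol in word:
        current = set().union(*(NFA_MOVE.get((s, symbol), set()) for s in current))
    return bool(current & FINALS)

# Sanity check against the regular expression itself.
for n in range(7):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert nfa_accepts(w) == (re.fullmatch(r"(0|1)*0(0|1)(0|1)", w) is not None)
```

The language is the set of binary strings whose third symbol from the end is 0.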
Finite Automata and Scanners
• Other Example
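– ID = L(L|D)*(_(L|D)+)*, where L is a letter and D is a digit; the states reached at the end of each of the two starred parts are final.
[Figure: transition diagram for ID, with transitions labeled L, L|D, and _.]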
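A DFA for ID can be sketched by first mapping each character to a class (L, D, `_`, or other) and then following a transition dictionary. Assumptions: L means an ASCII letter, D a decimal digit, and the state numbering is our own. States 1 and 3 are the two final states, one for each starred part.

```python
import re

def char_class(c: str) -> str:
    """Map a character to 'L' (ASCII letter), 'D' (digit), '_' or 'other'."""
    if c.isascii() and c.isalpha():
        return "L"
    if c.isascii() and c.isdigit():
        return "D"
    return "_" if c == "_" else "other"

# 0 start, 1 after L(L|D)*, 2 just read '_', 3 inside an (L|D)+ run after '_'.
MOVE = {
    (0, "L"): 1,
    (1, "L"): 1, (1, "D"): 1, (1, "_"): 2,
    (2, "L"): 3, (2, "D"): 3,
    (3, "L"): 3, (3, "D"): 3, (3, "_"): 2,
}
FINALS = {1, 3}

def is_id(word: str) -> bool:
    state = 0
    for c in word:
        state = MOVE.get((state, char_class(c)))
        if state is None:
            return False
    return state in FINALS

# Cross-check against the same pattern written for Python's re module.
ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*")
for w in ["x", "xy", "sum_5", "a_b_c", "ab12", "a__b", "a_", "_weather", "9lives", ""]:
    assert is_id(w) == (ID_RE.fullmatch(w) is not None)
```

States 1 and 3 have identical moves and are both final, so a minimized DFA could merge them into one state.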
Regular Expressions for Identifier & Keyword

Identifier
• Identifiers are names given to program elements such as variables, functions, and arrays.
• Identifiers consist of letters, digits, and underscores.
• An identifier's first character must be a letter.
• Identifiers may use uppercase and lowercase letters.
• Uppercase and lowercase letters are not equivalent (sum and Sum are different identifiers).
• Examples: X, xy, sum_5. _weather is not an identifier under this rule because its first character is not a letter (standard C itself does allow a leading underscore).

Keyword
• C reserves certain words, called keywords, which have predefined meanings in C.
• Keywords consist only of letters.
• Keywords are all lowercase.
• Uppercase and lowercase letters are not equivalent here either (Int is not a keyword).
• Examples: struct, auto, short, long, etc.
A grammar for branching statements
Transition Diagrams
Recognition of Reserved Words and Identifiers
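A common approach is to recognize every identifier-shaped lexeme with the same automaton and then look it up in a table of reserved words. In the sketch below, the keyword set is a small hypothetical subset of C keywords, and the identifier pattern is the ID expression from these notes.

```python
import re

# A small subset of C keywords, for illustration only (not the full list).
KEYWORDS = {"auto", "struct", "short", "long", "int", "if", "else", "while", "return"}

# ID = L(L|D)*(_(L|D)+)*, with L = ASCII letter and D = digit.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*(_[A-Za-z0-9]+)*")

def classify(lexeme: str) -> str:
    """Recognize an identifier-shaped lexeme first, then consult the keyword table."""
    if IDENT.fullmatch(lexeme) is None:
        return "error"
    # Lookup is case-sensitive: "struct" is reserved, "Struct" is an identifier.
    return "keyword" if lexeme in KEYWORDS else "identifier"

for lexeme in ["struct", "Struct", "sum_5", "9lives"]:
    print(lexeme, "->", classify(lexeme))
```

Keeping keywords in a table avoids building a separate path in the automaton for each reserved word.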
Table Encoding of FA
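Table encoding stores the transition function as a 2D array indexed by state and character class, so the scanner loop is just a lookup per character. A minimal sketch, assuming the simple identifier pattern L(L|D)* with three hypothetical character classes (letter, digit, other) and -1 as the error entry:

```python
# Column indices for character classes, plus an error marker.
LETTER, DIGIT, OTHER = 0, 1, 2
ERROR = -1

def char_class(c: str) -> int:
    if c.isascii() and c.isalpha():
        return LETTER
    if c.isascii() and c.isdigit():
        return DIGIT
    return OTHER

# Transition table for L(L|D)*: one row per state, one column per class.
#          LETTER  DIGIT  OTHER
TABLE = [
    [1,      ERROR, ERROR],   # state 0: start
    [1,      1,     ERROR],   # state 1: inside an identifier (final)
]
FINAL = {1}

def run(word: str) -> bool:
    state = 0
    for c in word:
        state = TABLE[state][char_class(c)]
        if state == ERROR:
            return False
    return state in FINAL
```

Changing the language only means changing the table; the `run` loop stays the same.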
Two kinds of FA
• Deterministic: next transition is unique
• Non-deterministic: otherwise

[Figure: a state with two transitions on the same input a.]

Which path should we select?


Nondeterministic Finite Automaton
• An NFA can have multiple transitions for one input in a given state, and it may also have ∊-moves (transitions that consume no input).
Deterministic Finite Automaton (DFA)
• There is only one transition per input per
state.
• There are no ∊-moves
• DFAs are easier to implement, for example with a table-driven scanner.
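The two DFA conditions above can be checked mechanically on a list of transitions. This is a sketch: transitions are hypothetical (source, symbol, target) triples with `None` standing for an ∊-move, and a missing (state, symbol) pair is allowed, treated as an implicit dead state.

```python
from collections import Counter
from typing import List, Optional, Tuple

EPSILON = None  # stands for an ∊-move (a transition that reads no input)
Transition = Tuple[int, Optional[str], int]  # (source, symbol or EPSILON, target)

def is_deterministic(transitions: List[Transition]) -> bool:
    """True if there are no ∊-moves and at most one move per (state, symbol)."""
    moves = set(transitions)  # ignore exact duplicates
    if any(symbol is EPSILON for _, symbol, _ in moves):
        return False
    per_pair = Counter((src, symbol) for src, symbol, _ in moves)
    return all(count == 1 for count in per_pair.values())

# The NFA for (0|1)*0(0|1)(0|1): state 1 has two moves on '0'.
nfa = [(1, "0", 1), (1, "0", 2), (1, "1", 1), (2, "0", 3), (2, "1", 3), (3, "0", 4), (3, "1", 4)]
# A DFA for (abc+)+: one move per (state, symbol).
dfa = [(0, "a", 1), (1, "b", 2), (2, "c", 3), (3, "c", 3), (3, "a", 1)]
# Any machine with an ∊-move is not a DFA.
with_eps = [(0, EPSILON, 1), (1, "a", 2)]

print(is_deterministic(nfa), is_deterministic(dfa), is_deterministic(with_eps))
# False True False
```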
Assignment 1 (Part a)
• Construct an FA for the following regular
expressions
– (XYZ+)*
– (0|1)1+(000)*(0|1)*
– ((ABC+)DB+)+
– (ABC*)*DC*
– (L|D)+_(D|L)*
Assignment 1 (Part b)
• Construct an FA that accepts identifiers built by the following rules:
– The first character must be an uppercase letter
– The remaining characters may be lowercase or uppercase letters, digits, or symbols (e.g. _, @, $, %)
– The last character must be a digit
– Spaces are not allowed
• Note: first write the regular expression, then construct the FA for that language
