Chapter 12
Context Free Grammars
• A context-free grammar has a finite set of
variables (also called nonterminals or syntactic
categories), each of which
represents a language.
• The languages represented by the variables are
described recursively in terms of each other and
primitive symbols called terminals.
• The rules relating the variables are called productions.
Context Free Grammars
• Example
– S → aS
– S → Λ
– Generates strings of a's of any length, including Λ
• Strings with at least one double letter
– S → ADA
– A → aA
– A → bA
– A→Λ
– D → aa
– D → bb
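A grammar like the one above can be checked mechanically. The following is a minimal Python sketch, not taken from the slides: it assumes single-character symbols, uppercase letters as nonterminals and "" standing for Λ, and the helper `words_up_to` is a hypothetical name. It expands the leftmost nonterminal of every working string breadth-first and keeps the words of bounded length. It only terminates when finitely many working strings have at most `max_len` terminals, which holds for this grammar.

```python
import itertools
import re
from collections import deque

# Assumed encoding: uppercase letters are nonterminals, "" is Λ.
DOUBLE_LETTER = {
    "S": ["ADA"],
    "A": ["aA", "bA", ""],
    "D": ["aa", "bb"],
}


def words_up_to(grammar, start, max_len):
    """Return every word of length <= max_len that the grammar generates."""
    words, seen = set(), {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        # Position of the leftmost nonterminal, or None if s is all terminals.
        i = next((k for k, c in enumerate(s) if c in grammar), None)
        if i is None:
            words.add(s)
            continue
        for rhs in grammar[s[i]]:
            t = s[:i] + rhs + s[i + 1:]
            # Terminals are never erased, so prune strings that are already too long.
            if sum(c not in grammar for c in t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return words


# Compare with a direct test for "aa" or "bb" on every word of length <= 6.
generated = words_up_to(DOUBLE_LETTER, "S", 6)
expected = {
    w
    for n in range(7)
    for w in map("".join, itertools.product("ab", repeat=n))
    if re.search("aa|bb", w)
}
print(generated == expected)  # True
```

Breadth-first search is used so that no single branch (such as repeated A → aA) is explored forever before the others.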
Context Free Grammars
• A context-free grammar is a collection of three things:
– An alphabet of letters called terminals, from which the strings of
the language are made
– A set of symbols called nonterminals, one of which is the symbol S,
termed the start symbol
– A finite set of productions (production rules) of the form
• one nonterminal → finite string of terminals and/or nonterminals
– S → aS
– S → bb
Context Free Grammars
• CFG for L = {a^n b^n : n = 0, 1, 2, 3, 4, …}
– S → aSb
– S → Λ
– S → ab
• CFG for EQUAL (words with equally many a's and b's; this grammar generates all the nonempty ones)
– S → aB
– S → bA
– A → a
– A → aS
– A → bAA
– B → b
– B → bS
– B → aBB
• Can be written compactly (left), or in BNF notation (right), as
S → aB | bA            <S> ::= a<B> | b<A>
A → a | aS | bAA       <A> ::= a | a<S> | b<A><A>
B → b | bS | aBB       <B> ::= b | b<S> | a<B><B>
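The two grammars above can be spot-checked by enumerating short words. This is a sketch under the same assumptions as any such check (uppercase letters are nonterminals, "" is Λ, `words_up_to` is a hypothetical helper): it lists the a^n b^n words up to length 6 and confirms that the EQUAL grammar produces exactly the nonempty words of length at most 6 with as many a's as b's. A finite check like this illustrates the grammar; it does not prove it correct for all lengths.

```python
import itertools
from collections import deque

# Assumed encoding: uppercase letters are nonterminals, "" is Λ.
ANBN = {"S": ["aSb", "", "ab"]}
EQUAL = {
    "S": ["aB", "bA"],
    "A": ["a", "aS", "bAA"],
    "B": ["b", "bS", "aBB"],
}


def words_up_to(grammar, start, max_len):
    """Breadth-first leftmost expansion; keep words with <= max_len letters."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        s = queue.popleft()
        i = next((k for k, c in enumerate(s) if c in grammar), None)
        if i is None:
            words.add(s)
            continue
        for rhs in grammar[s[i]]:
            t = s[:i] + rhs + s[i + 1:]
            if sum(c not in grammar for c in t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return words


print(sorted(words_up_to(ANBN, "S", 6), key=len))  # ['', 'ab', 'aabb', 'aaabbb']

# Every nonempty word over {a, b} with equally many a's and b's, up to length 6.
balanced = {
    w
    for n in range(1, 7)
    for w in map("".join, itertools.product("ab", repeat=n))
    if w.count("a") == w.count("b")
}
print(words_up_to(EQUAL, "S", 6) == balanced)  # True
```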
Backus-Naur Form
• This format for writing a CFG is called Backus-Naur Form
• It is abbreviated as BNF
– ALPHA → A | B | … | Z | a | b | c | … | z
– ALPHANUMERIC → ALPHA ALPHANUMERIC | NUMERIC ALPHANUMERIC | Λ
– NUMERIC → 0 | 1 | 2 | … | 9
Context Free Grammars
• CFG For arithmetic expressions
– <expression> → <expression> + <expression>
– <expression> → <expression> * <expression>
– <expression> → <expression> - <expression>
– <expression> → (<expression>)
– <expression> → <number>
Context Free Grammars
• Derivation or Generation
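• S → abS | Λ
• Derivation of ababab:
– S ⇒ abS ⇒ ababS ⇒ abababS ⇒ ababab (the last step uses S → Λ)
• Applying S → Λ one step earlier gives abab:
– S ⇒ abS ⇒ ababS ⇒ abab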
Parse Trees
• A tree format used to show the derivation of a string from
a CFG
• Also called a syntax tree, generation tree, production
tree or derivation tree
• The start symbol of the CFG is at the root
• Nonterminals are represented as interior nodes
• Terminals are represented as leaves
• The children of each nonterminal node are the right side of
the production applied to that nonterminal
• The yield of a parse tree is the string of terminals read from
its leaves, left to right
Parse Trees
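• Example
– S → abS | Λ
– Parse tree for abababab: S → abS is applied four times, starting at the root,
so each S node has the children a, b, S
– The lowest S node is expanded by S → Λ
– Reading the leaves left to right gives abababab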
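A minimal sketch of the parse tree above and its yield, assuming a hypothetical representation in which a node is a `(label, children)` pair and a leaf is a plain string, with the leaf "Λ" contributing nothing. The function names are illustrative, not from the slides.

```python
# A parse tree node is (label, children); a leaf is a plain string.
# The leaf "Λ" comes from S → Λ and contributes nothing to the yield.


def abs_tree(k):
    """Parse tree under S → abS | Λ that applies S → abS k times, then S → Λ."""
    if k == 0:
        return ("S", ["Λ"])
    return ("S", ["a", "b", abs_tree(k - 1)])


def tree_yield(node):
    """Read the leaves from left to right, skipping Λ."""
    if isinstance(node, str):
        return "" if node == "Λ" else node
    _label, children = node
    return "".join(tree_yield(child) for child in children)


def depth(node):
    """Number of edges on the longest root-to-leaf path."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(child) for child in node[1])


tree = abs_tree(4)
print(tree_yield(tree))  # abababab
print(depth(tree))  # 5
```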
Derivation
• Left Most Derivation
– If a word w is generated by a CFG by a certain derivation and
at each step in the derivation a rule of production is applied to
the leftmost nonterminal in the working string then this
derivation is called a leftmost derivation
• Right Most Derivation
– If a word w is generated by a CFG by a certain derivation and
at each step in the derivation a rule of production is applied to
the rightmost nonterminal in the working string then this
derivation is called a rightmost derivation
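To see the two definitions side by side, here is a small sketch (the function `derive` is a hypothetical helper) that applies a given list of productions to either the leftmost or the rightmost nonterminal. It assumes uppercase letters are nonterminals and "" stands for Λ, and uses the double-letter grammar S → ADA, A → aA | bA | Λ, D → aa | bb to derive the word abb both ways.

```python
def derive(start, steps, leftmost=True):
    """Apply each (lhs, rhs) production to the leftmost or rightmost nonterminal.

    Assumes uppercase letters are nonterminals and "" stands for Λ.
    Returns the list of working strings.
    """
    working = [start]
    s = start
    for lhs, rhs in steps:
        positions = [i for i, c in enumerate(s) if c.isupper()]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if s[i] != lhs:
            raise ValueError(f"next nonterminal is {s[i]}, not {lhs}")
        s = s[:i] + rhs + s[i + 1:]
        working.append(s)
    return working


# Grammar S → ADA, A → aA | bA | Λ, D → aa | bb; target word: abb
left = derive("S", [("S", "ADA"), ("A", "aA"), ("A", ""), ("D", "bb"), ("A", "")])
right = derive("S", [("S", "ADA"), ("A", ""), ("D", "bb"), ("A", "aA"), ("A", "")],
               leftmost=False)
print(" ⇒ ".join(left))   # S ⇒ ADA ⇒ aADA ⇒ aDA ⇒ abbA ⇒ abb
print(" ⇒ ".join(right))  # S ⇒ ADA ⇒ AD ⇒ Abb ⇒ aAbb ⇒ abb
```

Both derivations use the same productions in a different order and yield the same word; the function raises an error if a step does not match the nonterminal the chosen strategy would rewrite.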
Ambiguity
• A CFG is called ambiguous if for at least one
word in the language that it generates there
are two possible derivations of the word that
correspond to different syntax trees.
• Example: S → aS | Sa | a
– The word aaa has four different syntax trees:
– S ⇒ aS ⇒ aaS ⇒ aaa
– S ⇒ aS ⇒ aSa ⇒ aaa
– S ⇒ Sa ⇒ aSa ⇒ aaa
– S ⇒ Sa ⇒ Saa ⇒ aaa
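Ambiguity can be made concrete by counting parse trees. The sketch below is specific to the grammar S → aS | Sa | a (the function name `count_trees` is hypothetical); it counts, by memoized recursion over substrings, the distinct trees in which S derives a word. Each tree is counted once because its root production (S → a, S → aS or S → Sa) is fixed before the subtree is counted.

```python
from functools import lru_cache


def count_trees(word):
    """Count the distinct parse trees of word under S → aS | Sa | a."""

    @lru_cache(maxsize=None)
    def trees(i, j):
        # Number of parse trees in which S derives word[i:j].
        if i == j:
            return 0  # S has no Λ-production, so it cannot derive the empty string
        total = 1 if word[i:j] == "a" else 0  # S → a
        if word[i] == "a":
            total += trees(i + 1, j)  # S → aS
        if word[j - 1] == "a":
            total += trees(i, j - 1)  # S → Sa
        return total

    return trees(0, len(word))


for w in ["a", "aa", "aaa", "aaaa"]:
    print(w, count_trees(w))
# a 1
# aa 2
# aaa 4
# aaaa 8
```

Any count above 1 for a word of the language shows the grammar is ambiguous; here a word of n a's has 2^(n-1) trees.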
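Total Language Tree
• A tree with the start symbol at its root whose nodes
are working strings of terminals and nonterminals; the children of each node
are all the results of applying one applicable production to it
• Example: if S → aa | bX | aXX, the root S has the three children aa, bX and aXX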
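BNF and EBNF
• Integers written in BNF and in EBNF (extended BNF)
– BNF
• <integer> ::= <number> | +<number> | -<number>
• <number> ::= <digit> | <digit><number>
– EBNF, where [ ] marks an optional part and { } a part repeated zero or more times
• <integer> ::= [+|-]<digit>{<digit>}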
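The BNF and EBNF rules for integers describe the same set of strings. As a sketch, the recognizer below follows the recursive BNF rules literally, while the EBNF form maps directly onto a regular expression (`[ ]` becomes `?`, and `<digit>{<digit>}` becomes one or more digits). The definition of `<number>` used here (one or more digits) and the function names are assumptions for illustration.

```python
import re

DIGITS = "0123456789"


def is_number(s):
    """<number> ::= <digit> | <digit><number>   (assumed definition)"""
    return len(s) >= 1 and s[0] in DIGITS and (len(s) == 1 or is_number(s[1:]))


def is_integer_bnf(s):
    """<integer> ::= <number> | +<number> | -<number>"""
    return is_number(s) or (s[:1] in ("+", "-") and is_number(s[1:]))


# EBNF <integer> ::= [+|-]<digit>{<digit>} written as a regular expression.
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_integer_ebnf(s):
    return INTEGER.fullmatch(s) is not None


for s in ["0", "42", "+7", "-130", "", "+", "4-2", "--1", "x1"]:
    assert is_integer_bnf(s) == is_integer_ebnf(s), s
    print(repr(s), is_integer_bnf(s))
```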
Problems
• CFG for Variable Declaration
– VarDec → Type Identifier ;
– Type → int | float | double | char
– Identifier → Alpha Alphanumeric
– Alpha → a | b | … | z | A | B | … | Z
– Alphanumeric → Alpha Alphanumeric | Numeric Alphanumeric | Λ
– Numeric → 0 | 1 | 2 | … | 9
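A minimal recognizer for this grammar, written as a sketch: each function mirrors one nonterminal, and Alphanumeric is folded into a loop because it derives any (possibly empty) string of letters and digits. The grammar does not say how spaces are handled, so the sketch assumes whitespace separates the type from the identifier and is otherwise ignored; all function names are hypothetical.

```python
TYPES = {"int", "float", "double", "char"}  # Type → int | float | double | char
DIGITS = "0123456789"                       # Numeric → 0 | 1 | … | 9


def is_alpha(c):
    """Alpha → a | … | z | A | … | Z (ASCII letters only)."""
    return c.isascii() and c.isalpha()


def is_identifier(s):
    """Identifier → Alpha Alphanumeric, where Alphanumeric derives any
    (possibly empty) string of letters and digits."""
    return len(s) > 0 and is_alpha(s[0]) and all(is_alpha(c) or c in DIGITS for c in s[1:])


def is_var_dec(text):
    """VarDec → Type Identifier ;

    Assumption: whitespace between tokens is a separator and is otherwise
    ignored, since the grammar itself does not mention spaces.
    """
    text = text.strip()
    if not text.endswith(";"):
        return False
    parts = text[:-1].split()
    return len(parts) == 2 and parts[0] in TYPES and is_identifier(parts[1])


for line in ["int count;", "float x1 ;", "char 1x;", "long y;", "double z"]:
    print(repr(line), is_var_dec(line))
```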
Lukasiewicz Notation
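• Prefix Notation
• S → S + S | S * S | number
– 3+4*5 has two parse trees under this grammar:
• + at the root, with children 3 and (* with children 4 and 5), meaning 3+(4*5)
• * at the root, with children (+ with children 3 and 4) and 5, meaning (3+4)*5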
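Writing each of the two trees in prefix (Lukasiewicz) notation removes the ambiguity, because the operator comes first and the grouping is fixed. This sketch assumes a hypothetical tuple representation, `(op, left, right)` for operator nodes and plain ints for numbers, and produces the prefix form by a pre-order traversal.

```python
# Expression trees: an int is a number leaf; (op, left, right) is an operator node.
PLUS_ROOT = ("+", 3, ("*", 4, 5))   # 3+(4*5)
TIMES_ROOT = ("*", ("+", 3, 4), 5)  # (3+4)*5


def prefix(tree):
    """Pre-order traversal: the operator first, then its left and right operands."""
    if isinstance(tree, int):
        return [str(tree)]
    op, left, right = tree
    return [op] + prefix(left) + prefix(right)


def value(tree):
    """Evaluate a tree with the operators + and *."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = value(left), value(right)
    return a + b if op == "+" else a * b


for tree in (PLUS_ROOT, TIMES_ROOT):
    print(" ".join(prefix(tree)), "=", value(tree))
# + 3 * 4 5 = 23
# * + 3 4 5 = 35
```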
Lukasiewicz Notation
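• The operators now act as nonterminals: each operator derives its two operands
• S → * | + | number
• + → ++ | +* | +number | *+ | ** | *number | number+ | number* | number number
• * → ++ | +* | +number | *+ | ** | *number | number+ | number* | number number
• Leftmost derivation
• A pre-order traversal of the parse tree gives the expression in prefix notation
• Evaluation, e.g. of (1+2) * (3+4) * 5: repeatedly find the first substring of the form
operator-operand-operand (o-o-o), replace it with its value, and continue until one number remains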
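The o-o-o evaluation rule can be sketched directly on a token list. Here operators are the strings "+" and "*" and operands are ints, which is an assumed encoding; `eval_prefix` is a hypothetical name. The example expression (1+2) * (3+4) * 5 is read with the two multiplications grouped from the left, which gives the prefix form * * + 1 2 + 3 4 5.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def eval_prefix(tokens, show=False):
    """Evaluate a prefix expression by repeatedly replacing the first
    operator-operand-operand (o-o-o) substring with its value."""
    tokens = list(tokens)
    while len(tokens) > 1:
        for i in range(len(tokens) - 2):
            op, x, y = tokens[i:i + 3]
            if op in OPS and isinstance(x, int) and isinstance(y, int):
                tokens[i:i + 3] = [OPS[op](x, y)]
                break
        else:
            raise ValueError("not a well-formed prefix expression")
        if show:
            print(" ".join(map(str, tokens)))
    return tokens[0]


# ((1+2)*(3+4))*5 in prefix notation
result = eval_prefix(["*", "*", "+", 1, 2, "+", 3, 4, 5], show=True)
# * * 3 + 3 4 5
# * * 3 7 5
# * 21 5
# 105
```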
Language Span of CFGs
• Not every language can be generated by a CFG
• A semiword is a string of the form
(terminal)(terminal)…(terminal)(Nonterminal)
Regular Languages and CFGs
• All regular languages are also Context Free
• Therefore CFGs can be written for all RLs
• Theorem
– Given any FA, there is a CFG that generates
exactly the same language accepted by the FA.
– All regular languages are Context Free
– The proof is constructive: an algorithm converts any
FA into a CFG that generates the same language
Regular languages and CFGs
• Conversion Algorithm
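– The nonterminals of the CFG are the names of the states
of the FA, with the start state renamed S.
– For every edge labeled a from state X to state Y
• Create the production X → aY, and do the same for b edges
• For a loop labeled a at state X, the production is X → aX
– For every final state X, add the production X → Λ
• Example (FA accepting the words that contain aa; F is its only final state)
– S → aM
– S → bS
– M → aF
– M → bS
– F → aF
– F → bF
– F → Λ
– The derivation of babbaaba through the CFG follows its path through the FA:
S ⇒ bS ⇒ baM ⇒ babS ⇒ babbS ⇒ babbaM ⇒ babbaaF ⇒ babbaabF ⇒ babbaabaF ⇒ babbaaba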
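The conversion algorithm is short enough to sketch in Python. The dictionary encoding of the FA, the single-letter state names and the helpers `fa_to_cfg` and `derivation` are assumptions for illustration. The second function shows why the construction works: every edge the FA follows corresponds to one derivation step, and stopping in a final state corresponds to the production X → Λ.

```python
# FA from the example: start state S, final state F, accepts words containing aa.
TRANSITIONS = {
    ("S", "a"): "M", ("S", "b"): "S",
    ("M", "a"): "F", ("M", "b"): "S",
    ("F", "a"): "F", ("F", "b"): "F",
}
FINAL = {"F"}


def fa_to_cfg(transitions, final):
    """Edge X --c--> Y (loops included) gives X → cY; each final state X gives X → Λ ("")."""
    productions = {}
    for (x, c), y in transitions.items():
        productions.setdefault(x, []).append(c + y)
    for x in sorted(final):
        productions.setdefault(x, []).append("")
    return productions


def derivation(transitions, final, word, start="S"):
    """Derive word in the CFG by following the FA's path; None if the FA rejects word."""
    state, done, steps = start, "", [start]
    for c in word:
        state = transitions[(state, c)]
        done += c
        steps.append(done + state)  # each edge X --c--> Y is one step X ⇒ cY
    if state not in final:
        return None
    steps.append(done)  # finish with the final state's X → Λ
    return steps


print(fa_to_cfg(TRANSITIONS, FINAL))
# {'S': ['aM', 'bS'], 'M': ['aF', 'bS'], 'F': ['aF', 'bF', '']}
print(" ⇒ ".join(derivation(TRANSITIONS, FINAL, "babbaaba")))
```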
Regular Languages and CFG
• FA to CFG
– Words that contain a double aa
– All words having different first and last
letters
Regular Languages and CFG
• Solution
– Distinguish the CFGs that define regular languages from those that define nonregular languages
Regular Languages and CFGs
• Theorem
– If all the productions in a given CFG fit one
of the two forms
• Nonterminal → semiword
• Nonterminal → word
– where the word may be Λ, then the language
generated by this CFG is regular
Regular languages and CFGs
• Proof
– Consider a general CFG of this form
• N1 → w1N2
• N2 → w2N3
• N3 → w3N4
• N4 → w4 (there can be many more productions)
– The Ns are nonterminals and the ws are strings of terminals, so each
right side of the form wN is a semiword
– Draw and label a circle for every N, plus one extra circle labeled
with a +. Mark the S circle with −.
– For every production of the form Nx → wy Nz, draw a directed
edge from state Nx to Nz labeled with the word wy
– If Nx = Nz, the edge is a loop
– For every production of the form Np → wq, draw a directed
edge from Np to + and label it with the word wq, even if wq is
Λ
– The result is a transition graph that accepts exactly the words generated
by the CFG; since every transition graph accepts a regular language
(Kleene's theorem), the language is regular
Regular Languages and CFGs
• Example
– S → aA | bB
– A → aS | a
– B → bS | b
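The proof's construction can be tried on this example. The sketch below (hypothetical helpers `grammar_to_tg` and `tg_accepts`; uppercase letters assumed to be nonterminals) turns each production into a transition-graph edge whose label may be a whole word, including Λ, and then searches over (state, letters consumed) pairs to decide acceptance. For this grammar the accepted words are those built from one or more blocks aa or bb.

```python
from collections import deque

# Grammar from the example; uppercase letters are nonterminals.
GRAMMAR = {"S": ["aA", "bB"], "A": ["aS", "a"], "B": ["bS", "b"]}


def grammar_to_tg(grammar):
    """Return TG edges (source, label, target) following the proof's construction."""
    edges = []
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if rhs and rhs[-1].isupper():
                edges.append((lhs, rhs[:-1], rhs[-1]))  # Nx → w Nz
            else:
                edges.append((lhs, rhs, "+"))  # Np → w (w may be Λ, i.e. "")
    return edges


def tg_accepts(edges, word, start="S"):
    """Search over (state, letters consumed) pairs; accept on reaching + at the end."""
    seen = {(start, 0)}
    queue = deque(seen)
    while queue:
        state, i = queue.popleft()
        if state == "+" and i == len(word):
            return True
        for src, label, dst in edges:
            if src == state and word.startswith(label, i):
                nxt = (dst, i + len(label))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


edges = grammar_to_tg(GRAMMAR)
for w in ["aa", "bbaa", "aabbaa", "", "ab", "aab"]:
    print(repr(w), tg_accepts(edges, w))
```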
Λ Productions
• Productions of the form
– N→Λ
– are called null (Λ) productions
• All grammars that generate the Λ string include at
least one null production
• Some grammars that do not generate Λ string still
might contain null productions
– S → aX
– X→Λ
Λ Productions
• Hazards of Λ Productions
– Create ambiguity in word derivation
– Cause problems for some of the algorithms
that follow shortly
• Solution
– Kill Them !!!
Killing Null Productions
• Theorem
– If L is a context-free language generated by a CFG
that includes Λ-productions, then there is a
different CFG with no Λ-productions that
generates exactly the same language L, with the
single exception of the word Λ.
Killing Λ Productions
• Constructive Algorithm
– Identify the null productions
– Remove them one by one
– For each nonterminal N that had a null production, and for every
production with N on its right side, add a new production in which
that occurrence of N has been replaced by Λ
• Example
– S → aSa | bSb | Λ becomes
– S → aSa | bSb | aa | bb
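A common way to carry out this removal in one pass, and to handle nonterminals that derive Λ only indirectly, is to first compute every nullable nonterminal and then, in each production, allow every nullable occurrence to be either kept or deleted. The sketch below uses that nullable-set formulation (a standard variant of the one-by-one procedure above); the dictionary encoding with "" for Λ and the function names are assumptions.

```python
from itertools import product


def nullable_set(grammar):
    """Nonterminals that can derive Λ (written "" here), found by fixed-point iteration."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            if lhs not in nullable and any(all(c in nullable for c in rhs) for rhs in rhss):
                nullable.add(lhs)
                changed = True
    return nullable


def remove_null_productions(grammar):
    """Return an equivalent grammar without Λ-productions (it no longer generates Λ)."""
    nullable = nullable_set(grammar)
    result = {}
    for lhs, rhss in grammar.items():
        new_rhss = set()
        for rhs in rhss:
            # Each nullable occurrence may be kept or deleted; other symbols are kept.
            options = [(c, "") if c in nullable else (c,) for c in rhs]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:  # never reintroduce a Λ-production
                    new_rhss.add(candidate)
        result[lhs] = sorted(new_rhss)
    return result


print(remove_null_productions({"S": ["aSa", "bSb", ""]}))
# {'S': ['aSa', 'aa', 'bSb', 'bb']}
print(remove_null_productions({"S": ["a", "Xb", "aYa"], "X": ["Y", ""], "Y": ["b", "X"]}))
# {'S': ['Xb', 'a', 'aYa', 'aa', 'b'], 'X': ['Y'], 'Y': ['X', 'b']}
```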
Killing Λ Productions
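• Problem identified!
– S → a | Xb | aYa
– X → Y | Λ
– Y → b | X
– Removing X → Λ turns Y → X into a new null production Y → Λ, so
nonterminals that derive Λ indirectly must also be handled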
Killing Λ Productions
Example
Consider the CFG
S → Xa
X → aX | bX | Λ
Example
S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ
• Nullable nonterminals are?
• A, B, Z and W
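The answer A, B, Z and W can be reproduced by finding nullable nonterminals in rounds: a nonterminal joins in a round if one of its right sides consists only of nonterminals already known to be nullable (the empty right side qualifies at once). This is a sketch with an assumed dictionary encoding ("" for Λ) and a hypothetical function name; printing the rounds shows the order in which the nonterminals are discovered.

```python
# Grammar from the example; uppercase letters are nonterminals, "" is Λ.
GRAMMAR = {
    "S": ["XY"],
    "X": ["Zb"],
    "Y": ["bW"],
    "Z": ["AB"],
    "W": ["Z"],
    "A": ["aA", "bA", ""],
    "B": ["Ba", "Bb", ""],
}


def nullable_rounds(grammar):
    """Find nullable nonterminals round by round.

    A nonterminal becomes nullable in a round if one of its right sides
    consists only of nonterminals already known to be nullable (the empty
    right side "" qualifies immediately).
    """
    nullable, rounds = set(), []
    while True:
        new = {
            lhs
            for lhs, rhss in grammar.items()
            if lhs not in nullable
            and any(all(c in nullable for c in rhs) for rhs in rhss)
        }
        if not new:
            return nullable, rounds
        rounds.append(sorted(new))
        nullable |= new


nullable, rounds = nullable_rounds(GRAMMAR)
print(rounds)  # [['A', 'B'], ['Z'], ['W']]
print(sorted(nullable))  # ['A', 'B', 'W', 'Z']
```

X, Y and S are not nullable because each of their right sides contains a terminal or a non-nullable nonterminal.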