
Context Free Grammars

Chapter 12
Context Free Grammars
• By definition, a context-free grammar is a finite set of
variables (also called nonterminals or syntactic
categories), each of which represents a language.
• The languages represented by the variables are
described recursively in terms of each other and
primitive symbols called terminals.
• The rules relating the variables are called productions.
Context Free Grammars
• Example
– S → aS
– S→Λ
– Generates a*: strings of zero or more a's
• Strings with at least one double letter
– S → ADA
– A → aA
– A → bA
– A→Λ
– D → aa
– D → bb
Context Free Grammars
• A context-free grammar is a collection of three things
– An alphabet Σ of letters called terminals, from which the strings of
the language are built
– A set of symbols called nonterminals, one of which is the symbol S,
termed the start symbol
– A finite set of productions (production rules) of the form
• One nonterminal → finite string of terminals and/or nonterminals
• The string on the right side can consist of only terminals, of only
nonterminals, of any mixture of terminals and nonterminals, or even
be the empty string
• A CFG must have at least one production that has the nonterminal
S on its left side
Context Free Grammars
• Nonterminal / Variables / Syntactic category
– A symbol that can be substituted by some other
symbol(s)
– Variable because the same non-terminal can have
multiple substitutions
• Terminal
– A symbol that cannot be substituted further
– Letters from the alphabet set
Context Free Grammars
• Conventions for CFG
– Nonterminals are written in upper case letters
– Terminal symbols are written in lower case
• Terminal symbols are also called atomic symbols
Context Free Grammars
• Terminologies
– Generation or Derivation
• The sequence of applications of the rules that
produces the finished string of terminals from
the starting symbol is called a generation or a
derivation of the word
– Production
• The grammatical rules are called productions
Context Free Languages
• The language generated by a CFG is the set of all
strings of terminals that can be produced from the
start symbol S using the productions as
substitutions.

• A language generated by a CFG is called a Context
Free Language (CFL)
Context Free Grammars
• Non terminals vs. terminals
– S → X
– S → Y
– X → Λ
– Y → aY
– Y → bY
– Y → a
– Y → b
Context Free Grammars
• S → XaaX
• X → aX
• X → bX
• X→Λ
• (a+b)* aa (a+b)*
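The claim that this grammar generates (a+b)* aa (a+b)* can be spot-checked by brute force. The sketch below is only an illustration: the dictionary encoding, the function names, and the convention that upper-case letters are nonterminals and "" stands for Λ are assumptions made here, and the pruning step relies on the fact that this particular grammar never grows the number of nonterminals without bound.

```python
import re
from collections import deque
from itertools import product

# Grammar from the slide; "" stands for the null string Λ.
# Upper-case letters are nonterminals, lower-case letters are terminals.
GRAMMAR = {
    "S": ["XaaX"],
    "X": ["aX", "bX", ""],
}

def words_up_to(grammar, start, max_len):
    """Return every word of length <= max_len derivable from `start`."""
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        working = queue.popleft()
        # Locate the leftmost nonterminal in the working string.
        pos = next((i for i, c in enumerate(working) if c.isupper()), None)
        if pos is None:
            words.add(working)
            continue
        for rhs in grammar[working[pos]]:
            new = working[:pos] + rhs + working[pos + 1:]
            # Terminals never disappear, so too many terminals is a dead end.
            terminals = sum(1 for c in new if c.islower())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

def regex_words_up_to(max_len):
    """All words over {a, b} of length <= max_len matching (a+b)* aa (a+b)*."""
    pattern = re.compile(r"[ab]*aa[ab]*")
    candidates = ("".join(p) for n in range(max_len + 1)
                  for p in product("ab", repeat=n))
    return {w for w in candidates if pattern.fullmatch(w)}

print(words_up_to(GRAMMAR, "S", 4) == regex_words_up_to(4))
```

Agreement up to a fixed length is evidence, not a proof; the proof is the observation that X generates (a+b)*.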
CFG
• Examples
– All strings that do not end in ba
– All strings that contain the substring
“bbb”
– All strings that start and end with
different letters
CFG
• Which languages do these CFGs
define
– S → abS
– S → ab

– S → aS
– S → bb
Context Free Grammars
• CFG For L = {a^n b^n : n = 0, 1, 2, 3, 4, …}
– S → aSb
– S→Λ
– S → ab
• CFG For EQUAL
– S → aB
– S → bA
– A→a
– A → aS
– A → bAA
– B→b
– B → bS
– B → aBB
Context Free Grammars
• CFG For EQUAL
– S → aB
– S → bA
– A→a
– A → aS
– A → bAA
– B→b
– B → bS
– B → aBB
• Can be compactly written as
S → aB | bA          <S> ::= a<B> | b<A>
A → a | aS | bAA     <A> ::= a | a<S> | b<A><A>
B → b | bS | aBB     <B> ::= b | b<S> | a<B><B>
Backus-Naur Form
• This format for writing a CFG is called Backus-Naur
Form

• It is abbreviated as BNF

• Also called Backus Normal Form

• Consist of arrows to define production

• Vertical Bars to present choices (disjunction)

• Terminals and non Terminals to build a production


Variations in CFG Notations
• → or ::=
• <> For NonTerminals
• Underline the non terminals
• Symbol for null: Λ, λ, ε
Context Free Grammars
• CFG For identifier
– IDENTIFIER → ALPHA ALPHANUMERIC
– ALPHA → A | B | … | Z | a | b | c | … | z
– ALPHANUMERIC → ALPHA ALPHANUMERIC | NUMERIC ALPHANUMERIC | Λ
– NUMERIC → 0 | 1 | 2 | … | 9
Context Free Grammars
• CFG For arithmetic expressions
– <expression> → <expression> + <expression>
– <expression> → <expression> * <expression>
– <expression> → <expression> - <expression>
– <expression> → <expression> / <expression>
– <expression> → (<expression>)
– <expression> → <number>
Context Free Grammars
• Derivation or Generation
• S → abS | Λ
• S ⇒ abS
– ⇒ ababS
– ⇒ abababS
– ⇒ ababab (applying S → Λ)
– Applying S → Λ one step earlier, at ababS, gives abab instead
Parse Trees
• A tree format used for the derivation of a string from
the CFG
• Parse tree, Syntax tree, Generation tree, Production
tree, Derivation tree
• Start symbol of the CFG at root
• Nonterminals are represented as interior nodes
• Terminals as leaves
• Every next level of the tree is a derivation from a
production of the CFG
• The yield of a parse tree is the string of terminals read
off its leaves from left to right
Parse Trees
• Examples
– S → abS | Λ
– Derivation of abababab

S
a b S
    a b S
        a b S
            a b S
                Λ
Derivation
• Left Most Derivation
– If a word w is generated by a CFG by a certain derivation and
at each step in the derivation a rule of production is applied to
the leftmost nonterminal in the working string then this
derivation is called a leftmost derivation
• Right Most Derivation
– If a word w is generated by a CFG by a certain derivation and
at each step in the derivation a rule of production is applied to
the rightmost nonterminal in the working string then this
derivation is called a rightmost derivation
Ambiguity
• A CFG is called ambiguous if, for at least one
word in the language that it generates, there
are two possible derivations of the word that
correspond to different syntax trees.
• A CFG which is not ambiguous is called an
unambiguous CFG
Ambiguous Grammars
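• S → aS | Sa | a
• Derivation of aaa: four different parse trees, one for each of
these derivations
– S ⇒ aS ⇒ aaS ⇒ aaa
– S ⇒ aS ⇒ aSa ⇒ aaa
– S ⇒ Sa ⇒ aSa ⇒ aaa
– S ⇒ Sa ⇒ Saa ⇒ aaa
• S → aS | a generates the same language unambiguously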
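Ambiguity can be checked mechanically for a given word by counting its parse trees. The following is a minimal sketch, assuming a grammar with single-character symbols and no Λ-productions or unit productions (so every symbol derives at least one letter and the recursion always shrinks); the function name and the dictionary encoding are illustrative choices, not part of the slides.

```python
from functools import lru_cache

def count_parse_trees(grammar, symbol, word):
    """Count distinct parse trees deriving `word` from `symbol`.

    Assumes no Λ-productions and no unit productions.
    """
    @lru_cache(maxsize=None)
    def trees(sym, s):
        if sym not in grammar:              # sym is a terminal
            return 1 if s == sym else 0
        return sum(ways(tuple(rhs), s) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def ways(rhs, s):
        # Number of ways the symbols in rhs can jointly derive s.
        if not rhs:
            return 1 if s == "" else 0
        if len(rhs) == 1:
            return trees(rhs[0], s)
        total = 0
        # The first symbol takes a non-empty prefix and leaves at least
        # one letter for each of the remaining symbols.
        for cut in range(1, len(s) - len(rhs) + 2):
            left = trees(rhs[0], s[:cut])
            if left:
                total += left * ways(rhs[1:], s[cut:])
        return total

    return trees(symbol, word)

ambiguous = {"S": ["aS", "Sa", "a"]}
unambiguous = {"S": ["aS", "a"]}
print(count_parse_trees(ambiguous, "S", "aaa"))    # 4
print(count_parse_trees(unambiguous, "S", "aaa"))  # 1
```

A count above 1 for any word shows the grammar is ambiguous; a count of 1 for the words tried does not by itself prove the grammar unambiguous.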
Total language Tree
• A tree with the start symbol at its root and whose nodes
are working strings of terminals and nonterminals
• The descendants of each node are all the possible
results of applying every applicable production to
the working string, one at a time. A string of all
terminals is a terminal node (leaf) in the tree
• This tree is called the total language tree
Total Language Tree
• S → aa | bX | aXX
• X → ab | b

S
    aa
    bX
        bab
        bb
    aXX
        aabX
            aabab
            aabb
        abX
            abab
            abb
        aXab
            aabab
            abab
        aXb
            aabb
            abb
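The tree above can be generated level by level. This is a rough sketch under the assumption that symbols are single characters and that the language is finite (otherwise the loop would never stop); the function name and the (parent, child) pair encoding are made up for this illustration.

```python
def total_language_tree(grammar, start):
    """Return the tree level by level as lists of (parent, child) pairs.

    Only terminates when the grammar generates a finite language.
    """
    levels = []
    frontier = [start]
    while frontier:
        level = []
        for working in frontier:
            # Apply every applicable production at every nonterminal position.
            for i, sym in enumerate(working):
                if sym in grammar:
                    for rhs in grammar[sym]:
                        child = working[:i] + rhs + working[i + 1:]
                        level.append((working, child))
        if not level:
            break
        levels.append(level)
        frontier = [child for _, child in level]
    return levels

grammar = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}
levels = total_language_tree(grammar, "S")
for depth, level in enumerate(levels, start=1):
    print(depth, [child for _, child in level])

# The language is the set of all-terminal nodes anywhere in the tree.
language = {child for level in levels for _, child in level
            if not any(sym in grammar for sym in child)}
print(sorted(language))
```

The same word can appear at several leaves (aabab, for instance, is reached through both aabX and aXab), which is why the language is collected as a set.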


EBNF
• BNF grammars are not an ideal notation
for communicating the rules to the
practicing programmer
• BNF often requires a complex set of
recursive rules, which the notational
extensions of EBNF avoid
EBNF
• Notational Extensions
– An optional element may be indicated by
enclosing the element in square brackets []
– A choice of alternatives may use the symbol |
within a single rule optionally enclosed by
parenthesis if needed
– An arbitrary sequence of instances of an
element may be indicated by enclosing the
element in braces followed by an asterisk {…}*
EBNF
• Examples
– BNF
• <integer> ::= <number> | +<number> | -<number>
• <number> ::= <digit> | <number><digit>
– EBNF
• <integer> ::= [+|-]<digit>{<digit>}*
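Because the EBNF rule for <integer> uses only option and repetition, it maps directly onto a regular expression. A small sketch using Python's standard `re` module follows; the names `INTEGER` and `is_integer` are chosen here for illustration.

```python
import re

# [+|-]       optional sign        ->  [+-]?
# <digit>     exactly one digit    ->  [0-9]
# {<digit>}*  zero or more digits  ->  [0-9]*
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")

def is_integer(text):
    """True if the whole of `text` matches the EBNF rule for <integer>."""
    return INTEGER.fullmatch(text) is not None

for sample in ["42", "+42", "-7", "", "+", "4-2"]:
    print(repr(sample), is_integer(sample))
```

`fullmatch` is used rather than `match` so that trailing characters, as in "4-2", cause a rejection.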
Problems
• CFG for Variable Declaration
– VarDec → Type Identifier ;
– Type → int | float | double | char
– Identifier → Alpha Alphanumeric
– Alpha → a | b | … | z | A | B | … | Z
– Alphanumeric → Alpha Alphanumeric | Numeric Alphanumeric | Λ
– Numeric → 0 | 1 | 2 | … | 9
Lukasiewicz Notation
• Prefix Notation
• S → S + S | S * S | number
– 3+4*5 can be derived in two ways (ambiguous)
• S → (S + S) | (S * S) | number
– Derivations by replacement of nonterminals with calculated results
• Arithmetic operators are binary, with operands
already in proper format
Lukasiewicz Notation
Tree for 3+(4*5)
S
    +
        3
        *
            4
            5

Tree for (3+4)*5
S
    *
        +
            3
            4
        5
Lukasiewicz Notation
• The operators no longer remain terminals: + and * now appear on the
left side of productions
• S → * | + | number
• + → ++ | +* | + number | *+ | ** | * number | number + | number * |
number number
• * → ++ | +* | + number | *+ | ** | * number | number + | number * |
number number
• Leftmost derivation
• Pre-order traversal of the tree built from this notation gives the
prefix expression
• Evaluation of (1+2) * (3+4) * 5: repeatedly look for the first
operator-operand-operand (o-o-o) substring and replace it with its value
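The o-o-o evaluation rule can be sketched in a few lines. The token-list encoding (operators as strings, operands as Python ints) and the function name are assumptions made for this illustration; (1+2) * (3+4) * 5 is read here as ((1+2) * (3+4)) * 5, whose prefix form is * * + 1 2 + 3 4 5.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate_prefix(tokens):
    """Evaluate a prefix expression by repeatedly collapsing the first
    operator-operand-operand (o-o-o) substring into its value."""
    tokens = list(tokens)
    while len(tokens) > 1:
        for i in range(len(tokens) - 2):
            op, x, y = tokens[i:i + 3]
            if op in OPS and isinstance(x, int) and isinstance(y, int):
                tokens[i:i + 3] = [OPS[op](x, y)]
                break
        else:
            raise ValueError("not a well-formed prefix expression")
    return tokens[0]

print(evaluate_prefix(["*", "*", "+", 1, 2, "+", 3, 4, 5]))  # 105
print(evaluate_prefix(["+", 3, "*", 4, 5]))                   # 23, i.e. 3+(4*5)
print(evaluate_prefix(["*", "+", 3, 4, 5]))                   # 35, i.e. (3+4)*5
```

No parentheses are needed in the token list: the position of each operator alone fixes the order of evaluation.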
Language Span of CFGs
• All possible languages can be generated by CFGs

• All regular languages and some of the non-regular

languages can be generated by CFGs

• Some regular (not all) and some non-regular

languages can be generated by the CFGs

• Which statement is true?


Regular Languages and CFG
• A semiword is a string of terminals (maybe none)
concatenated with exactly one nonterminal on the
right.

• It is of the form
(terminal)(terminal)…(terminal)Nonterminal
Regular Languages and CFGs
• All regular languages are also Context Free
• Therefore CFGs can be written for all RLs
• Theorem
– Given any FA, there is a CFG that generates
exactly the same language accepted by the FA.
– All regular languages are Context Free
– We will prove this using the Constructive Proof of
the Theorem i.e.
• Reduction of an FA into a CFG describing the same
languages
Regular languages and CFGs
• Conversion Algorithm
– The nonterminals in the CFG will be all the names of
the states in the FA, with the start state renamed S.
– For every a-edge leading from a state X to a state Y
• Create the production X → aY, and do the same for b-edges
• For loops (an a-edge from X back to X), add the production X → aX
– For every final state X, create the production X → Λ

(Figure: an a-edge from state x to state y, and an a-loop at state x.)
Regular Languages and CFG
• The CFG generated through this procedure
generates the same language as accepted
by the FA
• Proof
– (i) Every word accepted by FA can be generated
by CFG
– (ii) Every word generated by CFG is accepted by
FA
Regular Languages and CFG
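• Example
– FA with start state S (-), a middle state M and final state F (+)
– Edges: S -a→ M, S -b→ S, M -a→ F, M -b→ S, F -a,b→ F
– Resulting CFG
• S → aM
• S → bS
• M → aF
• M → bS
• F → aF
• F → bF
• F → Λ
– Derivation of babbaaba through the CFG and
traversal through the FA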
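The FA-to-CFG conversion is mechanical enough to sketch in code. The transition-table encoding, the function names, and the convention that "" stands for Λ are assumptions of this illustration; the FA itself is the three-state machine with states S, M and F and the edges implied by the productions S → aM, S → bS, M → aF, M → bS, F → aF, F → bF.

```python
def fa_to_cfg(transitions, start, finals):
    """Build productions from a deterministic FA.

    transitions maps (state, letter) -> state; the start state is renamed S.
    """
    def name(state):
        return "S" if state == start else state

    productions = {}
    for (state, letter), target in transitions.items():
        productions.setdefault(name(state), []).append(letter + name(target))
    for state in finals:
        productions.setdefault(name(state), []).append("")   # X → Λ
    return productions

def derive(productions, word):
    """Follow `word` through the grammar, returning every working string."""
    steps = ["S"]
    prefix, current = "", "S"
    for letter in word:
        rhs = next(r for r in productions[current] if r.startswith(letter))
        prefix, current = prefix + letter, rhs[1:]
        steps.append(prefix + current)
    if "" not in productions[current]:
        raise ValueError("word is not accepted")
    steps.append(prefix)
    return steps

fa = {("S", "a"): "M", ("S", "b"): "S",
      ("M", "a"): "F", ("M", "b"): "S",
      ("F", "a"): "F", ("F", "b"): "F"}
cfg = fa_to_cfg(fa, start="S", finals={"F"})
print(cfg)
print(" => ".join(derive(cfg, "babbaaba")))
```

Each working string in the derivation is a semiword whose nonterminal is the state the FA is in after reading the terminals so far, which is the heart of the proof that both generate the same language.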
Regular Languages and CFG

• FA to CFG
– Words that contain a double aa
– All words having different first and last
letters
Regular Languages and CFG

• Can a CFG be converted back to an FA, RE or TG?
• Need a constructive algorithm if possible
• Would this algorithm be applicable to all CFGs?
• What about CFGs defining non-RLs? Failure !!!! FAs
cannot be built for non-RLs
• Solution
– Differentiate CFGs defining RLs from those defining non-RLs
Regular Languages and CFGs

• Theorem
– If all the productions in a given CFG fit one
of the two forms
• Nonterminal → semiword
• Nonterminal → word
– Where word can be null, the language
generated by this CFG is regular
Regular languages and CFGs
• Proof
– Consider a general CFG of this form
• N1 → w1N2
• N2 → w2N3
• N3 → w3N4
• N4 → w4 (can have many more productions)
– The Ns are nonterminals while the ws are strings of terminals. Together
they form a familiar pattern: the semiword
– Draw and label circles for all Ns and one extra circle labeled
with a +. Mark the S circle with -.
– For every production of the form Nx → wyNz, draw a directed
edge from state Nx to Nz labelled with the word wy
– If Nx = Nz then the path is a loop
– For every production of the form Np → wq, draw a directed
edge from Np to + and label it with the word wq, even if wq is
Λ
Regular Languages and CFGs

• The resultant figure is a transition graph
• Each path in this TG from - to + corresponds
to a word generated by the CFG
• Conversely, every derivation of a word from this CFG
corresponds to a path in the TG from - to +
• The language of this CFG is regular
Regular Grammars
• Regular Grammars
– A CFG is called a regular grammar if each of
its productions is of one of the two forms
• Nonterminal → semiword
• Nonterminal → word

• Example
– S → aA | bB
– A → aS | a
– B → bS | b
Λ Productions
• Productions of the form
– N→Λ
– are called null (Λ) productions
• All grammars that generate the Λ string include at
least one null production
• Some grammars that do not generate Λ string still
might contain null productions
– S → aX
– X→Λ
Λ Productions
• Hazards of Λ Productions
– Create ambiguity in word derivation
– Pose problems in some advanced
algorithms following shortly
• Solution
– Kill Them !!!
Killing Null Productions
• Theorem
– If L is a context-free language generated by a CFG
that includes Λ-productions, then there is a
different CFG with no Λ-productions that
generates either exactly the same language L or,
if Λ is in L, all the words of L except Λ.
Killing Λ Productions

• Constructive Algorithm
– Identify Null Productions
– Remove each of them one by one
– For each NT having a null production, add
productions where the NT has been replaced
by null
• Example
– S → aSa | bSb | Λ becomes
– S → aSa | bSb | aa | bb
Killing Λ Productions
• Problem Identified !!!
– S → a | Xb | aYa
– X → Y | Λ
– Y → b | X
Killing Λ Productions

• Nullable nonterminal
– In a CFG, a nonterminal N is called nullable if
• There is a production N → Λ, or
• There is a derivation that starts at N and leads
to Λ (N ⇒ … ⇒ Λ)
Killing Λ Productions
• Problem Solved !!!
• Modified Replacement Rule
– Delete all Λ-productions
– Add the following productions: for every
production X → old string, add new productions
X → new string, where each new string is the old
string with some subset of its nullable
nonterminals deleted. Do this for every possible
subset, but never add the production X → Λ
Killing Null Productions
• Not So Fast !!!!!!!!!!
– S → Xay | YY | aX | ZYX
– X → Za | bZ | ZZ | Yb
– Y → Ya | XY | Λ
– Z → aX | YYY
– How could one identify a nullable NT in
such a complex grammar?
• Solution
– A bucket of Blue Paint
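One way to read the blue-paint idea is as a marking procedure: paint every nonterminal that has a Λ-production, then keep painting any nonterminal that has a production whose right side is entirely painted, until nothing changes. The sketch below follows that reading and then applies the modified replacement rule; the function names, the single-character symbols and the use of "" for Λ are assumptions of this illustration.

```python
def nullable_nonterminals(grammar):
    """Mark nullable nonterminals until no new one can be marked."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # An empty body (Λ) passes all() trivially.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def kill_null_productions(grammar):
    """Apply the modified replacement rule to every production."""
    nullable = nullable_nonterminals(grammar)
    new_grammar = {}
    for head, bodies in grammar.items():
        result = []
        for body in bodies:
            variants = [""]
            for sym in body:
                keep = [v + sym for v in variants]
                drop = variants if sym in nullable else []
                variants = keep + drop
            for v in variants:
                if v and v not in result:     # never add X → Λ
                    result.append(v)
        new_grammar[head] = result
    return new_grammar

hard = {"S": ["Xay", "YY", "aX", "ZYX"], "X": ["Za", "bZ", "ZZ", "Yb"],
        "Y": ["Ya", "XY", ""], "Z": ["aX", "YYY"]}
print(sorted(nullable_nonterminals(hard)))

small = {"S": ["a", "Xb", "aYa"], "X": ["Y", ""], "Y": ["b", "X"]}
print(kill_null_productions(small))
```

For the grammar on this slide the marking reaches every nonterminal: Y through Y → Λ, then S through YY, Z through YYY, and finally X through ZZ.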
Example
Consider the CFG
S → a | Xb | aYa
X → Y | Λ
Y → b | X

Old nullable production    New production
X → Y                      nothing
X → Λ                      nothing
Y → X                      nothing
S → Xb                     S → b
S → aYa                    S → aa

So the new CFG is
S → a | Xb | aa | aYa | b
X → Y
Y → b | X
Example
Consider the CFG
S → Xa
X → aX | bX | Λ

Old nullable production    New production
S → Xa                     S → a
X → aX                     X → a
X → bX                     X → b

So the new CFG is
S → a | Xa
X → aX | bX | a | b
Example
S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ
• Nullable nonterminals are?
• A, B, Z and W
Example Contd.

Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b

So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b
Unit Productions
• A production of the form
– Nonterminal → one Nonterminal
– is called a unit production
• Unit productions are sometimes required to
change the form of a working string
– (Arbitrary)A(Arbitrary)
– (Arbitrary)B(Arbitrary)
• Unit productions are also problematic and
thus need to be exterminated
Killing Unit Productions
• Theorem
– If there is a CFG for the language L that has no
Λ-productions, then there is also a CFG for L with no
Λ-productions and no unit productions
Killing Unit Productions
• Naïve Elimination Rule
– Eliminate unit productions one by one and replace them
with new productions without changing the language
being generated by the CFG
– This can loop forever with no benefit: removing one unit
production may introduce another
• Example
– S → A | bb
– A → B | b
– B → S | a
• Modified Elimination Rule
– Eliminate all unit productions simultaneously
– Look for any sequence of unit productions leading from one
nonterminal to another (A ⇒ … ⇒ B). For each such sequence
and each non-unit production B → s, add the production A → s.
Killing Unit Productions
• Example
– S → A | bb
– A→B|b
– B→S|a
• Unit Productions
– S→A
– A→B
– B→S
• Derived Unit Production
– S→A→B
– A→B→S
– B→S→A
Killing Unit Productions
• New CFG
– S → bb|b|a
– A → b|a|bb
– B → a|bb|b
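The simultaneous elimination can be sketched as a reachability computation: for each nonterminal, collect every nonterminal reachable through unit productions alone, then copy over their non-unit productions. The function name and the encoding (single-character symbols, so a unit production is a body equal to a nonterminal name) are assumptions of this illustration.

```python
def kill_unit_productions(grammar):
    """Remove all unit productions at once, keeping the language unchanged."""
    nonterminals = set(grammar)
    new_grammar = {}
    for head in grammar:
        # Nonterminals reachable from head through unit productions only.
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if body in nonterminals and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        bodies = []
        for nt in grammar:                      # iterate in a stable order
            if nt in reachable:
                for body in grammar[nt]:
                    if body not in nonterminals and body not in bodies:
                        bodies.append(body)
        new_grammar[head] = bodies
    return new_grammar

grammar = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
print(kill_unit_productions(grammar))
```

Because S, A and B all reach one another through unit productions, each ends up with the same three right sides bb, b and a, matching the new CFG on the slide up to ordering.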
New Format for CFG
• Theorem
– If L is a language generated by some CFG,
then there is another CFG that generates
all the non-Λ words of L, all of whose
productions are of one of the two basic
forms
• Nonterminal → string of only Nonterminals
• Nonterminal → one terminal
New Format for CFG
• Proof
– Suppose a CFG contains non terminals S, X1, X2,X3 …
and two terminals a and b
– Add two new nonterminals A and B and two productions
• A→a
• B→b
– For every previous production involving terminals,
replace each a with the nonterminal A and each b with the
nonterminal B
– Any production which is already in the desired form
should be left untouched to avoid introduction of unit
productions
– All the productions now are of the form
• Nonterminal → strings of only nonterminals
• Nonterminal → one terminal
New format for CFG
• Example
– S → X1 | X2aX2 | aSb | b
– X1 → X2X2 | b
– X2 → aX2 | aaX1
Chomsky Normal Form: The Ultimate Target !
• If a CFG has only productions of the form
– Nonterminal → string of exactly two Nonterminals
– Nonterminal → one terminal
• It is said to be in Chomsky Normal Form, or
CNF
• Theorem
– For any context Free language L, the non Λ words
of the language can be generated by a CFG in CNF
format
CNF
• Proof
– Any CFG can be converted to the following
format
• Nonterminal → strings of Nonterminals or
• Nonterminal → one terminal
– For this new CFG, modify the productions so that
they are in CNF
– This conversion requires addition of new
nonterminals
• S → X1X2X3X4 will be converted to
– S → X1R1
– R1 → X2R2
– R2 → X3X4
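The splitting step can be written as a short loop. This is a sketch only: the function name is invented here, right sides are given as lists of nonterminal names, and the helper names R1, R2, … are assumed not to clash with nonterminals already in the grammar.

```python
def binarize(head, body, first_index=1):
    """Split head → X1 X2 ... Xn (n > 2) into productions whose right
    sides contain exactly two nonterminals, adding helpers R1, R2, ..."""
    productions = []
    index = first_index
    while len(body) > 2:
        helper = f"R{index}"
        productions.append((head, [body[0], helper]))
        head, body = helper, body[1:]
        index += 1
    productions.append((head, list(body)))
    return productions

for left, right in binarize("S", ["X1", "X2", "X3", "X4"]):
    print(left, "→", " ".join(right))
```

A right side of length n needs n - 2 helper nonterminals; productions that already have two nonterminals are returned unchanged.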
CNF
• Example
– S → aSa | bSb | a | b | aa | bb
• CNF
– S → AR1
– R1 → SA
– S → BR3
– R3 →SB
– S → AA
– S → BB
– S→b
– S→a
– A→a
– B→b
