Foundations of Computer Science (COSC-3302),
Lecture 4 (prepared after Chapter 4 of Martin's 2011 textbook)
Stefan Andrei

04/22/16

Lecture 4 COSC 3302

Course topics
1.1. Mathematical Tools and Techniques (Chapter 1)
1.2. Finite Automata and the Languages they Accept (Chapter 2)
1.3. Regular Expressions, Nondeterminism, and Kleene's Theorem (Chapter 3)
Exam 1
2.1. Context-Free Languages (Chapter 4)
2.2. Pushdown Automata (Chapter 5)
2.3. Context-Free and Non-Context-Free Languages (Chapter 6)
Exam 2
3.1. Turing Machines (Chapter 7)
3.2. Recursively Enumerable Languages (Chapter 8)
3.3. Undecidable Problems (Chapter 9)
3.4. Computable Functions (Chapter 10)
3.5. Introduction to Computational Complexity (Chapter 11)
Exam 3

Overview of Previous Lecture
Regular Expressions, Nondeterminism, and Kleene's Theorem
1. Regular Languages and Regular Expressions
2. Nondeterministic Finite Automata
3. The Nondeterminism in an NFA Can Be Eliminated
4. Kleene's Theorem


Context-Free Languages
1. Using Grammar Rules to Define a Language
2. Context-Free Grammars: Definitions and More Examples
3. Regular Languages and Regular Grammars
4. Derivation Trees and Ambiguity
5. Simplified Forms and Normal Forms


Using Grammar Rules to Define a Language

Regular languages and FAs are too simple for many purposes.
Using context-free grammars allows us to describe more interesting languages.
Much high-level programming language syntax can be expressed with context-free grammars.
Context-free grammars with a very simple form provide another way to describe the regular languages.
Grammars can be ambiguous.
We will study how derivations can be related to the structure of the string being derived.


Using Grammar Rules to Define a Language (contd.)

A grammar is a set of rules, usually simpler than those of English, by which strings in a language can be generated.
Consider the language $AnBn = \{a^n b^n \mid n \ge 0\}$, defined using the recursive definition:
$\Lambda \in AnBn$
For every $S \in AnBn$, $aSb \in AnBn$.
Think of $S$ as a variable representing an arbitrary element, and write these rules as:
$S \to \Lambda$
$S \to aSb$
(In the process of obtaining an element of AnBn, $S$ can be replaced by either string.)


Using Grammar Rules to Define a Language (contd.)

If $\alpha$ and $\beta$ are strings, and $\alpha$ contains at least one occurrence of $S$, then $\alpha \Rightarrow \beta$ means that $\beta$ is obtained from $\alpha$ in one step, by using one of the two rules to replace a single occurrence of $S$ by either $\Lambda$ or $aSb$.
For example, we could write:
$S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aaaSbbb \Rightarrow aaabbb$
to describe a derivation of the string aaabbb.
We can simplify the rules by using the | symbol to mean "or", so that the rules become $S \to \Lambda \mid aSb$.
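The derivation above can be replayed mechanically. The following is a minimal Python sketch, assuming sentential forms are plain strings and $\Lambda$ is the empty string; the function name `derive_anbn` is hypothetical. Each step applies $S \to aSb$ to the single occurrence of $S$, and the last step applies $S \to \Lambda$.

```python
def derive_anbn(n: int) -> list[str]:
    """Sentential forms of the derivation of a^n b^n using S -> Λ | aSb."""
    forms = ["S"]
    current = "S"
    for _ in range(n):
        # Apply S -> aSb to the (single) occurrence of S.
        current = current.replace("S", "aSb", 1)
        forms.append(current)
    # Finish with S -> Λ: the remaining S is replaced by the empty string.
    forms.append(current.replace("S", "", 1))
    return forms


print(" => ".join(derive_anbn(3)))  # S => aSb => aaSbb => aaaSbbb => aaabbb
```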


Context-Free Grammars: Definitions and More Examples

Definition: A context-free grammar (CFG) is a 4-tuple $G = (V, \Sigma, S, P)$, where $V$ and $\Sigma$ are disjoint finite sets, $S \in V$, and $P$ is a finite set of formulas of the form $A \to \alpha$, where $A \in V$ and $\alpha \in (V \cup \Sigma)^*$.
Elements of $\Sigma$ are terminal symbols, or terminals, and elements of $V$ are variables, or nonterminals. $S$ is the start variable, and elements of $P$ are grammar rules, or productions.
We use $\to$ for productions in a grammar and $\Rightarrow$ for a step in a derivation.
The notations $\alpha \Rightarrow^n \beta$ and $\alpha \Rightarrow^* \beta$ refer to $n$ steps and zero or more steps, respectively.


Context-Free Grammars: Definitions and More Examples (contd.)

We will sometimes write $\alpha \Rightarrow_G \beta$ to indicate a derivation in a particular grammar $G$.
$\alpha \Rightarrow_G \beta$ means that there are strings $\alpha_1$, $\alpha_2$, and $\gamma$ in $(V \cup \Sigma)^*$ and a production $A \to \gamma$ in $P$ such that $\alpha = \alpha_1 A \alpha_2$ and $\beta = \alpha_1 \gamma \alpha_2$.
This is a single step in a derivation.
What is the meaning of "context-free"?
Answer: What makes the grammar context-free is that the production above, with left side $A$, can be applied wherever $A$ occurs in the string, irrespective of the context, i.e., regardless of what $\alpha_1$ and $\alpha_2$ are.


Context-Free Grammars: Definitions and More Examples (contd.)

Definition: If $G = (V, \Sigma, S, P)$ is a CFG, the language generated by $G$ is
$L(G) = \{ x \in \Sigma^* \mid S \Rightarrow_G^* x \}$
($S$ is the start variable, and $x$ is a string of terminals).
A language $L$ is a context-free language (CFL) if there is a CFG $G$ with $L = L(G)$.
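To make the definition of $L(G)$ concrete, the Python sketch below (an illustration under assumptions, not an algorithm from the book) represents a CFG as a dictionary from each variable to a list of right sides, each right side a tuple of symbols, with `()` standing for $\Lambda$; any symbol that is not a key is treated as a terminal. The hypothetical function `generate` performs a breadth-first search over leftmost derivations and collects the terminal strings of length at most `max_len` reached within `max_steps` steps, so it lists only part of $L(G)$ if `max_steps` is too small.

```python
from collections import deque

# A CFG: variable -> list of right sides; () stands for Λ.
Grammar = dict[str, list[tuple[str, ...]]]


def generate(grammar: Grammar, start: str, max_len: int, max_steps: int = 20) -> set[str]:
    """Strings x with |x| <= max_len and start =>* x in at most max_steps steps."""
    found: set[str] = set()
    seen = {(start,)}
    queue = deque([((start,), 0)])
    while queue:
        form, steps = queue.popleft()
        # Position of the leftmost variable, or None if form is all terminals.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            found.add("".join(form))
            continue
        if steps == max_steps:
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Terminals are never erased, so forms with too many can be dropped.
            if new in seen or sum(sym not in grammar for sym in new) > max_len:
                continue
            seen.add(new)
            queue.append((new, steps + 1))
    return found


anbn: Grammar = {"S": [(), ("a", "S", "b")]}
print(sorted(generate(anbn, "S", 6), key=len))  # ['', 'ab', 'aabb', 'aaabbb']
```

Breadth-first order guarantees that each sentential form is first reached by its shortest leftmost derivation, so the `seen` set never hides a form that could still be expanded within the step limit.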


Context-Free Grammars: Definitions and More Examples (contd.)

Consider $AEqB = \{x \in \{a,b\}^* \mid n_a(x) = n_b(x)\}$.
Let us design a CFG for AEqB.
If $x$ is a non-null string in AEqB, then either $x = ay$, where $y \in L_b = \{z \mid n_b(z) = n_a(z) + 1\}$, or $x = by$, where $y \in L_a = \{z \mid n_a(z) = n_b(z) + 1\}$.
We represent $L_b$ by the variable $B$ and $L_a$ by the variable $A$.
The productions so far are $S \to \Lambda \mid aB \mid bA$.
All we need now are productions for $A$ and $B$.


Context-Free Grammars: Definitions and More Examples (contd.)

If a string $x \in L_a$ starts with a, then the remainder is a member of AEqB.
If it starts with b, the rest has two more a's than b's.
Observation: a string containing two more a's than b's must be the concatenation of two strings, each with one more a; similarly with a and b reversed.
The grammar resulting from these observations is
$S \to \Lambda \mid aB \mid bA$
$A \to aS \mid bAA$
$B \to bS \mid aBB$
(Note: if $A$ were the start variable, it would generate $L_a$.)
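A quick sanity check of this grammar, not a proof: the Python sketch below (hypothetical helper names `generated` and `brute_force`; right sides are tuples of symbols and `()` stands for $\Lambda$) enumerates every string of length at most 6 that the grammar derives and compares the result with all strings over {a, b} of length at most 6 that have equally many a's and b's. The pruning relies on the fact that $A$ and $B$ always derive at least one terminal, so only $S$ can vanish.

```python
from collections import deque
from itertools import product

# The grammar for AEqB from the slide; () stands for Λ.
AEQB = {
    "S": [(), ("a", "B"), ("b", "A")],
    "A": [("a", "S"), ("b", "A", "A")],
    "B": [("b", "S"), ("a", "B", "B")],
}


def generated(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    out, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            out.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Terminals, A and B each contribute at least one final symbol;
            # only S may vanish, so it is left out of this lower bound.
            if new not in seen and sum(s != "S" for s in new) <= max_len:
                seen.add(new)
                queue.append(new)
    return out


def brute_force(max_len):
    """All strings over {a, b} of length <= max_len with n_a = n_b."""
    return {"".join(w) for n in range(max_len + 1)
            for w in product("ab", repeat=n) if w.count("a") == w.count("b")}


print(generated(AEQB, "S", 6) == brute_force(6))  # True
```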


Context-Free Grammars: Definitions and More Examples (contd.)

Theorem 4.9: If $L_1$ and $L_2$ are CFLs over $\Sigma$, then so are $L_1 \cup L_2$, $L_1 L_2$, and $L_1^*$.
Proof. Suppose $G_1$ and $G_2$ are CFGs that generate $L_1$ and $L_2$ respectively, and assume that they have no variables in common.
Suppose that $S_1$ and $S_2$ are the start variables. $S_u$, $S_c$, and $S_k$, the start variables of the new grammars, will be new variables.
$G_u$ just adds the rules $S_u \to S_1 \mid S_2$ to $G_1$ and $G_2$.
$G_c$ just adds the rule $S_c \to S_1 S_2$ to $G_1$ and $G_2$.
$G_k$ just adds the rules $S_k \to \Lambda \mid S_k S_1$ to $G_1$.
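The three constructions in the proof are purely syntactic, so they are easy to express in code. A minimal Python sketch, assuming a CFG is a dictionary mapping each variable to a list of right sides (tuples of symbols, `()` for $\Lambda$); the helper names and the default names `Su`, `Sc`, `Sk` for the new start variables are assumptions, and those names are assumed not to occur in the input grammars. Each function returns the combined productions together with the new start variable.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def _check_disjoint(g1: Grammar, g2: Grammar) -> None:
    # The proof assumes that G1 and G2 have no variables in common.
    if g1.keys() & g2.keys():
        raise ValueError("rename the variables of G1 or G2 first")


def union(g1: Grammar, s1: str, g2: Grammar, s2: str, new: str = "Su") -> tuple[Grammar, str]:
    _check_disjoint(g1, g2)
    return {**g1, **g2, new: [(s1,), (s2,)]}, new      # Su -> S1 | S2


def concat(g1: Grammar, s1: str, g2: Grammar, s2: str, new: str = "Sc") -> tuple[Grammar, str]:
    _check_disjoint(g1, g2)
    return {**g1, **g2, new: [(s1, s2)]}, new          # Sc -> S1 S2


def star(g1: Grammar, s1: str, new: str = "Sk") -> tuple[Grammar, str]:
    return {**g1, new: [(), (new, s1)]}, new           # Sk -> Λ | Sk S1


g1: Grammar = {"S1": [(), ("a", "S1", "b")]}   # AnBn
g2: Grammar = {"S2": [(), ("c", "S2")]}        # {c}*
gu, su = union(g1, "S1", g2, "S2")
print(su, gu[su])  # Su [('S1',), ('S2',)]
```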


Regular Languages and Regular Grammars

The three operations in Theorem 4.9 are the ones involved in the recursive definition of regular languages.
The basic regular languages over $\Sigma$, namely $\emptyset$ and $\{\sigma\}$ for each $\sigma \in \Sigma$, are easily seen to be CFLs.
Now we can prove by structural induction that every regular language over $\Sigma$ is a CFL.
In fact, however, the CFG can be of a simpler form.
Definition 4.13: A context-free grammar is regular if every production is of the form $A \to \sigma B$ or $A \to \Lambda$, where $\sigma \in \Sigma$.


Regular Languages and Regular Grammars (contd.)

Theorem 4.14: For every language $L \subseteq \Sigma^*$, $L$ is regular if and only if $L = L(G)$ for some regular grammar $G$.
Proof:
(only if) If $L$ is a regular language, then there is an FA $M = (Q, \Sigma, q_0, A, \delta)$ that accepts it.
Define $G = (V, \Sigma, S, P)$ by letting
$V$ be $Q$,
$S$ the initial state $q_0$, and
$P$ the set containing the production $T \to aU$ for every transition $\delta(T, a) = U$ in $M$ and the production $T \to \Lambda$ for every accepting state $T$ of $M$.

Regular Languages and Regular Grammars (contd.)

$G$ is a regular grammar, and $G$ generates the same language that $M$ accepts.
For every $x = a_1 a_2 \ldots a_n$, the transitions on these symbols that start at $q_0$ end at an accepting state if and only if there is a derivation of $x$ in $G$.
(if) To prove the other direction, we can start with a regular grammar $G$ and reverse the construction to produce $M$.
$M$ may be an NFA, but it still accepts $L(G)$, and it follows that $L(G)$ is regular.
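The (only if) construction can be sketched in Python as follows, assuming an FA is given by a transition dictionary `delta` mapping `(state, symbol)` to a state, a start state, and a set of accepting states, with state names distinct from the input symbols. The grammar is returned as a dictionary from variables to lists of right sides (tuples, `()` for $\Lambda$). The function name `fa_to_grammar` and the two-state example FA, which accepts the strings over {a, b} ending in b, are illustrative assumptions.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def fa_to_grammar(delta: dict[tuple[str, str], str], q0: str,
                  accepting: set[str]) -> tuple[Grammar, str]:
    """Regular grammar from the proof of Theorem 4.14; () stands for Λ."""
    grammar: Grammar = {q0: []}                       # the states are the variables
    for (t, a), u in delta.items():
        grammar.setdefault(t, []).append((a, u))      # T -> aU when delta(T, a) = U
        grammar.setdefault(u, [])
    for t in sorted(accepting):
        grammar.setdefault(t, []).append(())          # T -> Λ for accepting T
    return grammar, q0


# Example FA: start state p; q (the only accepting state) means "last symbol was b".
delta = {("p", "a"): "p", ("p", "b"): "q", ("q", "a"): "p", ("q", "b"): "q"}
grammar, start = fa_to_grammar(delta, "p", {"q"})
print(grammar)
# {'p': [('a', 'p'), ('b', 'q')], 'q': [('a', 'p'), ('b', 'q'), ()]}
```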


Example

What is the regular grammar corresponding to the FA on the right? (get a star)
Let $S$, $A$, $B$, $C$, and $D$ be the nonterminals corresponding to states $q_0$, $q_1$, $q_2$, $q_3$, and $q_4$.
$S \to aA \mid aB \mid bD$
$A \to aS$
$B \to aC$
$C \to bS$
$D \to \Lambda$
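Going the other way, as in the (if) direction of Theorem 4.14, a regular grammar can be read directly as an NFA: a production $T \to aU$ is a transition from $T$ to $U$ on a, and $T \to \Lambda$ makes $T$ accepting. The Python sketch below (the function name `accepts` is hypothetical; right sides are tuples and `()` stands for $\Lambda$) does this for the grammar of this example, which, reading off the productions, generates $(aa + aab)^*b$.

```python
# The regular grammar from the example; () stands for Λ.
G = {
    "S": [("a", "A"), ("a", "B"), ("b", "D")],
    "A": [("a", "S")],
    "B": [("a", "C")],
    "C": [("b", "S")],
    "D": [()],
}


def accepts(grammar, start, x):
    """Simulate the grammar as an NFA on x, tracking the set of current variables."""
    current = {start}
    for ch in x:
        # Follow every production T -> ch U from a current variable T.
        current = {rhs[1] for t in current for rhs in grammar[t]
                   if rhs and rhs[0] == ch}
    # Accept if some current variable has the production T -> Λ.
    return any(() in grammar[t] for t in current)


for w in ["b", "aab", "aabb", "ab", "aa"]:
    print(w, accepts(G, "S", w))
# b True / aab True / aabb True / ab False / aa False
```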


Derivation Trees and Ambiguity

So far we have been interested in what strings a CFG generates.
It is also useful to consider how a string is generated by a CFG.
A derivation may provide information about the structure of a string, and if a string has several possible derivations, one may be more appropriate than another.
We can draw trees to represent derivations.


Derivation Trees and Ambiguity (contd.)

The root node represents the start variable $S$.
Any interior node and its children represent a production $A \to \alpha$ used in the derivation; the node represents $A$, and the children, from left to right, represent the symbols in $\alpha$.
Each leaf node represents a symbol or $\Lambda$.
The string derived is read off from left to right, ignoring $\Lambda$s.
Every derivation has exactly one derivation tree, but a tree can represent more than one derivation.


Derivation Trees and Ambiguity (contd.)

In a derivation, at each step some production is applied to some occurrence of a variable.
Consider a derivation that starts $S \Rightarrow S + S$.
We could apply a production to either the first or the second of the $S$'s, but the resulting trees would be the same.
When we talk about a string having several possible derivations, one being more appropriate, we are talking about derivations corresponding to different trees.


Derivation Trees and Ambiguity (contd.)

We can distinguish between trivially different derivations and essentially different ones by specifying that in a derivation, we always choose the leftmost variable to expand.
Definition 4.16: A derivation in a CFG is a leftmost derivation (LMD) if, at each step, a production is applied to the leftmost variable-occurrence in the current string.
Note: A rightmost derivation (RMD) is defined similarly, the difference being that at each step, a production is applied to the rightmost variable-occurrence in the current string.


Derivation Trees and Ambiguity (contd.)

Theorem 4.17: If $G$ is a CFG, then for any $x \in L(G)$ these three statements are equivalent:
$x$ has more than one derivation tree;
$x$ has more than one LMD;
$x$ has more than one RMD.
Proof: see book.
Definition 4.18: A CFG $G$ is ambiguous if, for at least one $x \in L(G)$, $x$ has more than one derivation tree (or equivalently, according to Theorem 4.17, more than one LMD).


Derivation Trees and Ambiguity (contd.)

A classic example of ambiguity is the "dangling else".
In C, an if-statement can be defined by
$S \to \mathtt{if}\ (\ E\ )\ S \mid \mathtt{if}\ (\ E\ )\ S\ \mathtt{else}\ S \mid OS$
(where $OS$ stands for "other statement").
Consider the statement
    if (e1)
       if (e2) f();
    else g();
In C, the else belongs to the second if, but this grammar does not rule out the other interpretation.


[Figure: the two derivation trees showing the two interpretations of a dangling else.]


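Derivation Trees and Ambiguity (contd.)

Clearly the grammar given is ambiguous, but there are equivalent grammars that allow only the correct interpretation.
Example:
$S \to S_1 \mid S_2$
$S_1 \to \mathtt{if}\ (\ E\ )\ S_1\ \mathtt{else}\ S_1 \mid OS$
$S_2 \to \mathtt{if}\ (\ E\ )\ S \mid \mathtt{if}\ (\ E\ )\ S_1\ \mathtt{else}\ S_2$
($S_1$ derives only statements in which every if is matched by an else.)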

Derivation Trees and Ambiguity (contd.)

Consider the CFG $G$: $S \to S + S \mid S * S \mid (S) \mid a$.
$G$ generates simple algebraic expressions.
One reason for ambiguity is that the relative precedence of + and * has not been specified: a+a*a could be interpreted as (a+a)*a or as a+(a*a).
In fact, $S \to S + S$ causes ambiguity by itself, because a+a+a could be interpreted as either (a+a)+a or a+(a+a). Similarly for $S \to S * S$.
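To see the ambiguity concretely, one can count the derivation trees of a string (by Theorem 4.17, this is also the number of its LMDs). The Python sketch below is an illustration under assumptions, not a method from the slides: it uses memoized recursion over substrings of $x$ and, for each of the productions $S \to S + S$, $S \to S * S$, $S \to (S)$, $S \to a$, adds up the number of ways the substring can be split accordingly. The name `count_trees` is hypothetical.

```python
from functools import lru_cache


def count_trees(x: str) -> int:
    """Number of derivation trees for x in S -> S+S | S*S | (S) | a."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of trees whose root S derives the substring x[i:j].
        total = 1 if x[i:j] == "a" else 0                    # S -> a
        if j - i >= 3 and x[i] == "(" and x[j - 1] == ")":
            total += count(i + 1, j - 1)                     # S -> (S)
        for k in range(i + 1, j - 1):
            if x[k] in "+*":
                total += count(i, k) * count(k + 1, j)       # S -> S+S, S -> S*S
        return total

    return count(0, len(x))


print(count_trees("a+a*a"), count_trees("a+a+a"), count_trees("(a+a)*a"))  # 2 2 1
```

Any count greater than 1 exhibits the ambiguity of $G$; parentheses in the input force a single tree.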
We might try to correct both problems by using the productions:
$S \to S + T \mid T$
$T \to T * F \mid F$
(think of $T$ as "term" and $F$ as "factor").


Derivation Trees and Ambiguity (contd.)

Now * has higher precedence than + (all the multiplications are performed within a term).
By making the production $S \to S + T$, not $S \to T + S$, we make + associate to the left. Similarly for *.
We want parenthetical expressions to be evaluated first; this means we should consider such an expression to be part of a factor.
The resulting unambiguous CFG generating $L(G)$ is
$S \to S + T \mid T$
$T \to T * F \mid F$
$F \to (S) \mid a$
(Proofs of unambiguity and equivalence are both somewhat complicated.)
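The grouping forced by the unambiguous grammar can be observed with a small recursive-descent parser. This Python sketch is an illustration under assumptions: recursive descent is not covered in these slides, and the function name `parse` is hypothetical. Because a left-recursive production such as $S \to S + T$ cannot be coded directly as a recursive call (it would never terminate), each one is rewritten as a loop that folds to the left, which is what makes + and * left-associative. The parser returns a fully parenthesized copy of its input.

```python
def parse(text: str) -> str:
    """Parse with S -> S+T | T, T -> T*F | F, F -> (S) | a and return a fully
    parenthesized string showing the grouping the grammar forces."""
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_s() -> str:
        # S -> S + T | T, read as T (+ T)*; folding left gives left associativity.
        result = parse_t()
        while peek() == "+":
            expect("+")
            result = f"({result}+{parse_t()})"
        return result

    def parse_t() -> str:
        # T -> T * F | F, read as F (* F)*.
        result = parse_f()
        while peek() == "*":
            expect("*")
            result = f"({result}*{parse_f()})"
        return result

    def parse_f() -> str:
        # F -> (S) | a
        if peek() == "(":
            expect("(")
            inner = parse_s()
            expect(")")
            return inner
        expect("a")
        return "a"

    tree = parse_s()
    if pos != len(text):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


for expr in ["a+a*a", "a+a+a", "(a+a)*a"]:
    print(expr, "->", parse(expr))
# a+a*a -> (a+(a*a)) / a+a+a -> ((a+a)+a) / (a+a)*a -> ((a+a)*a)
```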


Simplified Forms and Normal Forms

Questions about the strings generated by a CFG are sometimes easier to answer if we know something about the form of the productions.
For example, if we know that a grammar has no $\Lambda$-productions and no unit productions ($A \to B$), we can deduce that no derivation of a string $x$ can take more than $2|x| - 1$ steps: each step either makes the current string longer, which can happen at most $|x| - 1$ times, or replaces a variable by a single terminal, which can happen at most $|x|$ times (see book for details).
We could then, in principle, determine whether $x$ can be derived by considering derivations no longer than this.
We show how to modify an arbitrary CFG to have no productions of either of these types.
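A brute-force version of this idea might look like the Python sketch below. It assumes the grammar really has no $\Lambda$-productions and no unit productions, that every terminal is a single character, and that the grammar is a dictionary mapping each variable to a list of right sides given as tuples of symbols; the function name `derivable` is hypothetical. It explores all leftmost derivations of at most $2|x| - 1$ steps and discards sentential forms longer than $x$, which is safe because without $\Lambda$-productions a form never gets shorter. The search is exponential in general; it only illustrates that the bound makes membership decidable.

```python
def derivable(grammar: dict[str, list[tuple[str, ...]]], start: str, x: str) -> bool:
    """Is x in L(G)? Assumes G has no Λ-productions and no unit productions."""
    if not x:
        return False                       # Λ cannot be derived without Λ-productions
    target = tuple(x)
    frontier = {(start,)}
    for _ in range(2 * len(x) - 1):        # no derivation of x is longer than this
        successors = set()
        for form in frontier:
            idx = next((i for i, s in enumerate(form) if s in grammar), None)
            if idx is None:
                continue                   # all terminals but not x: dead end
            for rhs in grammar[form[idx]]:
                new = form[:idx] + rhs + form[idx + 1:]
                if new == target:
                    return True
                if len(new) <= len(x):     # forms never shrink, so longer ones are useless
                    successors.add(new)
        frontier = successors
    return False


# A grammar with no Λ- or unit productions: S -> cS | aAb | ab, A -> aAb | ab.
G = {"S": [("c", "S"), ("a", "A", "b"), ("a", "b")],
     "A": [("a", "A", "b"), ("a", "b")]}
print(derivable(G, "S", "ccaabb"), derivable(G, "S", "abab"))  # True False
```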


Simplified and Normal Forms (contd.)

Suppose we have the productions $A \to BCDC$, $B \to \Lambda$, and $C \to \Lambda$.
If we get rid of $\Lambda$-productions, then the steps that replace $B$ and $C$ by $\Lambda$ will no longer be possible, but we must still be able to get all the same non-null strings from $A$.
We must retain the production $A \to BCDC$, but we should also add:
$A \to CDC \mid BDC \mid BCD \mid DC \mid CD \mid BD \mid D$.
We will need to know which variables can derive $\Lambda$ (we will call such a nonterminal a nullable variable).

Simplified and Normal Forms (contd.)

Definition 4.26: A recursive definition of the set of nullable variables of $G$:
If there is a production $A \to \Lambda$, then $A$ is nullable.
If $A_1, A_2, \ldots, A_k$ are nullable variables and there is a production $B \to A_1 A_2 \ldots A_k$, then $B$ is nullable.
This leads immediately to an algorithm for identifying the nullable variables.
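The fixed-point algorithm suggested by Definition 4.26 can be sketched in Python as follows, assuming a CFG is a dictionary mapping each variable to a list of right sides (tuples of symbols, `()` for $\Lambda$); the function name `nullable_variables` is hypothetical. In the usage example, the production $D \to d$ is an assumption added so that $D$ has a rule; the slides do not give one.

```python
def nullable_variables(grammar: dict[str, list[tuple[str, ...]]]) -> set[str]:
    """Nullable variables per Definition 4.26; () stands for a Λ right side."""
    nullable: set[str] = set()
    changed = True
    while changed:                      # repeat until no new variable is found
        changed = False
        for var, rhss in grammar.items():
            if var in nullable:
                continue
            # all() over an empty right side is True, which covers A -> Λ;
            # terminals are never nullable, so any right side with one fails.
            if any(all(sym in nullable for sym in rhs) for rhs in rhss):
                nullable.add(var)
                changed = True
    return nullable


g = {"A": [("B", "C", "D", "C")], "B": [()], "C": [()], "D": [("d",)]}
print(sorted(nullable_variables(g)))  # ['B', 'C']
```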


Simplified and Normal Forms (contd.)

Theorem 4.27: For every CFG $G = (V, \Sigma, S, P)$, the following algorithm produces a CFG $G_1 = (V, \Sigma, S, P_1)$ having no $\Lambda$-productions for which $L(G_1) = L(G) - \{\Lambda\}$.
Proof.
Identify the nullable variables in $V$ and initialize $P_1$ to $P$.
For every production $A \to \alpha$ in $P$, add to $P_1$ every production obtained by deleting from $\alpha$ one or more variable-occurrences involving a nullable variable.
Delete every $\Lambda$-production from $P_1$, as well as every production of the form $A \to A$.
Note: If $\Lambda$ belongs to the CFL, then a new start symbol $S'$ can be defined with two new productions: $S' \to S \mid \Lambda$.
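The algorithm of Theorem 4.27 can be sketched in Python as follows, assuming a CFG is a dictionary mapping each variable to a list of right sides (tuples of symbols, `()` for $\Lambda$); the function name `remove_lambda` is hypothetical, and the extra start symbol $S'$ from the Note is not added. For each production it tries every way of keeping or deleting each occurrence of a nullable variable, which covers "deleting one or more variable-occurrences", and then drops $\Lambda$-productions and productions of the form $A \to A$.

```python
from itertools import product


def remove_lambda(grammar: dict[str, list[tuple[str, ...]]]) -> dict[str, list[tuple[str, ...]]]:
    """Theorem 4.27: productions P1 with no Λ-productions, L(G1) = L(G) - {Λ}."""
    # Step 1: nullable variables (Definition 4.26), as a fixed point.
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in grammar.items():
            if var not in nullable and any(all(s in nullable for s in r) for r in rhss):
                nullable.add(var)
                changed = True
    # Step 2: every way of keeping or deleting each nullable occurrence.
    new_grammar: dict[str, list[tuple[str, ...]]] = {}
    for var, rhss in grammar.items():
        bodies: set[tuple[str, ...]] = set()
        for rhs in rhss:
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                body = tuple(sym for part in choice for sym in part)
                # Step 3: drop Λ-productions and productions of the form A -> A.
                if body and body != (var,):
                    bodies.add(body)
        new_grammar[var] = sorted(bodies)
    return new_grammar


g = {"S": [("c", "S"), ("a", "A", "b")], "A": [("a", "A", "b"), ()]}
print(remove_lambda(g))
# {'S': [('a', 'A', 'b'), ('a', 'b'), ('c', 'S')], 'A': [('a', 'A', 'b'), ('a', 'b')]}
```

Applied to $S \to aSb \mid \Lambda$ it yields $S \to aSb \mid ab$; if $\Lambda$ must be kept, $S' \to S \mid \Lambda$ is added by hand as in the Note.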


Examples

Let $G = (\{S\}, \{a, b\}, S, \{S \to aSb \mid \Lambda\})$.
(get a star) What is the equivalent CFG without $\Lambda$-productions?
Let us consider $G' = (\{S', S\}, \{a, b\}, S', \{S' \to S \mid \Lambda,\ S \to aSb \mid ab\})$.
(get a star) How about $G = (\{S, A\}, \{a, b, c\}, S, \{S \to cS \mid aAb,\ A \to aAb \mid \Lambda\})$?
Let us consider $G' = (\{S, A\}, \{a, b, c\}, S, \{S \to cS \mid aAb \mid ab,\ A \to aAb \mid ab\})$.
(get a star) What is the language generated by the second CFG?
$L(G) = \{c^m a^n b^n \mid m \ge 0, n \ge 1\}$.


Simplified and Normal Forms (contd.)

The procedure we use to eliminate unit productions is similar.
We first identify pairs of variables $(A, B)$ for which $A \Rightarrow^* B$ (in this case we call $B$ A-derivable); then for each such pair $(A, B)$ and each nonunit production $B \to \alpha$, we add the production $A \to \alpha$.
Such pairs can be found as follows:
If $A \to B$ is a production, then $B$ is A-derivable.
If $C$ is A-derivable and $C \to B$ is a production, then $B$ is A-derivable.
No other variables are A-derivable.


Simplified and Normal Forms (contd.)

Theorem 4.28: For every CFG $G = (V, \Sigma, S, P)$ without $\Lambda$-productions, the CFG $G_1 = (V, \Sigma, S, P_1)$ produced by the following algorithm generates the same language as $G$ and has no unit productions:
Initialize $P_1$ to $P$, and for each $A \in V$, identify the A-derivable variables.
For every such pair $A \Rightarrow^* B$ and every nonunit production $B \to \alpha$, add the production $A \to \alpha$ to $P_1$.
Delete all unit productions from $P_1$.
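The algorithm of Theorem 4.28 can be sketched in Python as follows, assuming a CFG without $\Lambda$-productions is a dictionary mapping each variable to a list of right sides (tuples of symbols); the function name `remove_units` is hypothetical. The A-derivable variables are found by following unit productions from $A$ exactly as in the recursive characterization above, and the result maps each variable to a set of right sides.

```python
def remove_units(grammar: dict[str, list[tuple[str, ...]]]) -> dict[str, set[tuple[str, ...]]]:
    """Theorem 4.28 for a grammar without Λ-productions."""
    def is_unit(rhs: tuple[str, ...]) -> bool:
        return len(rhs) == 1 and rhs[0] in grammar

    # A-derivable variables: follow unit productions from A (A -> B, then C -> B, ...).
    derivable: dict[str, set[str]] = {}
    for a, rhss in grammar.items():
        found: set[str] = set()
        stack = [rhs[0] for rhs in rhss if is_unit(rhs)]
        while stack:
            b = stack.pop()
            if b not in found:
                found.add(b)
                stack.extend(rhs[0] for rhs in grammar[b] if is_unit(rhs))
        derivable[a] = found

    # Keep A's nonunit productions and copy B's nonunit ones for every A-derivable B.
    new_grammar: dict[str, set[tuple[str, ...]]] = {}
    for a, rhss in grammar.items():
        kept = {rhs for rhs in rhss if not is_unit(rhs)}
        for b in derivable[a]:
            kept.update(rhs for rhs in grammar[b] if not is_unit(rhs))
        new_grammar[a] = kept
    return new_grammar


g = {"S": [("C",)], "C": [("c", "C"), ("A",)], "A": [("a", "A", "b"), ("a", "b")]}
for var, rhss in remove_units(g).items():
    print(var, sorted(rhss))
```

On this grammar, which is the one used in the example that follows, it reproduces $S \to cC \mid aAb \mid ab$, $C \to cC \mid aAb \mid ab$, $A \to aAb \mid ab$.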


Example

Let us consider $G = (\{S, A, C\}, \{a, b, c\}, S, \{S \to C,\ C \to cC \mid A,\ A \to aAb \mid ab\})$.
We have two unit productions: $S \to C$ and $C \to A$.
We get two trivial derivations, $S \Rightarrow C$ and $C \Rightarrow A$, and by transitivity we get a new S-derivable variable: $S \Rightarrow^* A$.
The equivalent CFG is given by:
$G_1 = (\{S, A, C\}, \{a, b, c\}, S, \{S \to cC \mid aAb \mid ab,\ C \to cC \mid aAb \mid ab,\ A \to aAb \mid ab\})$.

Simplified and Normal Forms (contd.)

Definition 4.29: A CFG is said to be in Chomsky normal form if every production is of one of these two types:
$A \to BC$ (where $B$ and $C$ are variables)
$A \to \sigma$ (where $\sigma$ is a terminal)
Theorem 4.30: For every context-free grammar $G$, there is another CFG $G_1$ in Chomsky normal form such that $L(G_1) = L(G) - \{\Lambda\}$.
The algorithm on the next slide shows how to generate $G_1$.


Simplified and Normal Forms (contd.)

The first step is to eliminate $\Lambda$-productions and unit productions.
The second step is to introduce, for every terminal symbol $\sigma$, a new variable $X_\sigma$ and a production $X_\sigma \to \sigma$. In every production whose right side has at least two symbols, replace every terminal $\sigma$ by its new variable $X_\sigma$ (productions of the form $A \to \sigma$ are already in the required form).
Replace a production like $A \to BACB$ by the productions $A \to BY_1$, $Y_1 \to AY_2$, $Y_2 \to CB$, where $Y_1$ and $Y_2$ are new variables.
The resulting CFG is in Chomsky normal form (CNF).
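The second and third steps can be sketched in Python as follows. The sketch assumes that $\Lambda$-productions and unit productions have already been eliminated, that a CFG is a dictionary mapping each variable to a list of right sides (tuples of symbols), and that the generated names `X_<terminal>` and `Y1`, `Y2`, ... are not already in use; the function name `to_cnf` and this naming scheme are hypothetical. Unlike the hand-worked example below it, the sketch does not reuse a new variable when two right sides coincide.

```python
def to_cnf(grammar: dict[str, list[tuple[str, ...]]]) -> dict[str, set[tuple[str, ...]]]:
    """Steps 2 and 3 of the CNF construction. Assumes Λ-productions and unit
    productions are already gone and that names X_<terminal>, Y<k> are unused."""
    variables = set(grammar)
    result: dict[str, set[tuple[str, ...]]] = {}
    counter = 0

    def var_for(sym: str) -> str:
        # Variables stay as they are; terminal sigma becomes X_sigma with X_sigma -> sigma.
        if sym in variables:
            return sym
        name = f"X_{sym}"
        result.setdefault(name, set()).add((sym,))
        return name

    for head, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) == 1:                  # A -> sigma is already allowed
                result.setdefault(head, set()).add(rhs)
                continue
            syms = [var_for(s) for s in rhs]
            left = head
            # A -> B1 B2 ... Bk becomes A -> B1 Y1, Y1 -> B2 Y2, ..., Y(k-2) -> B(k-1) Bk
            while len(syms) > 2:
                counter += 1
                new_var = f"Y{counter}"
                result.setdefault(left, set()).add((syms[0], new_var))
                left, syms = new_var, syms[1:]
            result.setdefault(left, set()).add(tuple(syms))
    return result


g = {"S": [("a", "b"), ("a", "X", "b")], "X": [("a", "b"), ("a", "X", "b")]}
for var, rhss in to_cnf(g).items():
    print(var, "->", " | ".join(" ".join(r) for r in sorted(rhss)))
```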


Example of getting a CNF

$G = (\{S, X\}, \{a, b\}, S, \{S \to ab \mid aXb,\ X \to ab \mid aXb\})$
Replace each terminal symbol by a new nonterminal symbol, and add a rule that goes into the terminal symbol:
$S \to AB \mid AXB$, $X \to AB \mid AXB$, $A \to a$, $B \to b$
The rules $S \to AB$, $X \to AB$, $A \to a$, $B \to b$ are already in Chomsky normal form.
We need to transform the other rules, namely $S \to AXB$ and $X \to AXB$.
The rule $S \to AXB$ is transformed into $S \to AY$, $Y \to XB$.
The rule $X \to AXB$ is transformed into $X \to AZ$, $Z \to XB$.
Note: Since $Y$ and $Z$ both derive the same right side $XB$, we can reuse $Y$ and write $X \to AY$ instead.
The final CFG: $S \to AB \mid AY$, $Y \to XB$, $X \to AB \mid AY$, $A \to a$, $B \to b$.

Summary

Context-Free Languages
1. Using Grammar Rules to Define a Language
2. Context-Free Grammars: Definitions and More Examples
3. Regular Languages and Regular Grammars
4. Derivation Trees and Ambiguity
5. Simplified Forms and Normal Forms


Reading suggestions

From [Martin; 2011]:
Chapter 4 (Context-Free Languages)


Coming up next

From [Martin; 2011]:
Chapter 5 (Pushdown Automata)


Thank you for your attention!
Questions?
