Introduction to
Theoretical Computer Science
Section G
Instructor: G. Grahne
Lectures: Tuesdays and Thursdays,
11:45 – 13:00, H 521
Office hours: Tuesdays,
14:00 - 15:00, LB 903-11
• All slides shown here are on the web.
www.cs.concordia.ca/~teaching/comp335/2003F
1
Thanks to David Ford for TeX assistance.

Thanks to the following students of comp335
Winter 2002, for spotting errors in previous
versions of the slides: Omar Khawajkie,
Charles deWeerdt, Wayne Jang, Keith Kang,
Bei Wang, Yapeng Fan, Monzur Chowdhury,
Pei Jenny Tse, Tao Pan, Shahidur Molla,
Bartosz Adamczyk, Hratch Chitilian,
Philippe Legault.
Tutor: TBA
Tutorial: Tuesdays, 13:15 – 14:05, H 411

• Tutorials are an integral part of this course.
2
Course organization
Textbook: J. E. Hopcroft, R. Motwani, and
J. D. Ullman, Introduction to Automata Theory,
Languages, and Computation, Second Edition,
Addison-Wesley, New York, 2001.
Sections: There are two parallel sections. The
material covered by each instructor is roughly
the same. There are four common assignments
and a common final exam. Each section
will have different midterm tests.
Assignments: There will be four assignments.
Each student is expected to solve the
assignments independently, and submit a
solution for every assigned problem.
3
Examinations: There will be three midterm
examinations, each lasting thirty minutes and
covering the material of the most recent
assignment.
The final examination will be a three-hour
examination at the end of the term.
Weight distribution:
Midterm examinations: 3 × 15% = 45%
Final examination: 55%
• At the end of the term, any midterm exam
mark lower than your final exam mark will be
replaced by your final exam mark. To pass
the course you must submit solutions for all
assigned problems.
4
Important: COMP 238 and COMP 239 are
prerequisites. For a quick refresher course,
read Chapter 1 in the textbook.
• Spend some time every week on:
(1) learning the course content,
(2) solving exercises.
• Visit the course web site regularly for
updated information.
5
Motivation
• Automata = abstract computing devices

• Turing studied Turing Machines
(= computers) before there were any real
computers

• We will also look at simpler devices than
Turing machines (Finite State Automata,
Pushdown Automata, ...), and specification
means, such as grammars and regular
expressions.

• NP-hardness = what cannot be efficiently
computed
6
Finite Automata
Finite Automata are used as a model for
• Software for designing digital circuits

• Lexical analyzer of a compiler

• Searching for keywords in a file or on the
web

• Software for verifying finite state
systems, such as communication protocols
7
• Example: Finite Automaton modelling an
on/off switch

[Diagram: two states, "off" and "on"; the
start state is "off", and each "Push"
toggles between the two states.]

• Example: Finite Automaton recognizing the
string then

[Diagram: states for the prefixes ε, t, th,
the, then; transitions labelled t, h, e, n
spell out "then"; the state "then" is
accepting.]
8
Structural Representations
These are alternative ways of specifying a
machine

Grammars: A rule like E ⇒ E+E specifies an
arithmetic expression

• Lineup ⇒ Person.Lineup

says that a lineup is a person in front of a
lineup.

Regular Expressions: Denote structure of
data, e.g.

'[A-Z][a-z]*[ ][A-Z][A-Z]'

matches Ithaca NY
does not match Palo Alto CA

Question: What expression would match
Palo Alto CA?
9
Central Concepts
Alphabet: Finite, nonempty set of symbols

Example: Σ = {0, 1} the binary alphabet

Example: Σ = {a, b, c, ..., z} the set of
all lower case letters

Example: The set of all ASCII characters

Strings: Finite sequence of symbols from an
alphabet Σ, e.g. 0011001

Empty String: The string with zero
occurrences of symbols from Σ

• The empty string is denoted ε
10
Length of String: Number of positions for
symbols in the string.

|w| denotes the length of string w

|0110| = 4, |ε| = 0

Powers of an Alphabet: Σ^k = the set of
strings of length k with symbols from Σ

Example: Σ = {0, 1}

Σ^1 = {0, 1}
Σ^2 = {00, 01, 10, 11}
Σ^0 = {ε}

Question: How many strings are there in Σ^3?
11
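The powers Σ^k can be enumerated mechanically; a minimal Python sketch (the names `power` and `sigma` are ours, not the slides') answers the question for Σ^3:

```python
from itertools import product

def power(sigma, k):
    """All strings of length k over the alphabet sigma, as a set."""
    return {"".join(p) for p in product(sigma, repeat=k)}

sigma = {"0", "1"}
print(sorted(power(sigma, 2)))  # ['00', '01', '10', '11']
print(len(power(sigma, 3)))     # 8, i.e. |Σ^3| = 2^3
```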
The set of all strings over Σ is denoted Σ*

Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ···

Also:

Σ⁺ = Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ ···

Σ* = Σ⁺ ∪ {ε}

Concatenation: If x and y are strings, then
xy is the string obtained by placing a copy
of y immediately after a copy of x

x = a_1a_2...a_i, y = b_1b_2...b_j
xy = a_1a_2...a_ib_1b_2...b_j

Example: x = 01101, y = 110, xy = 01101110

Note: For any string x

xε = εx = x
12
Languages:

If Σ is an alphabet, and L ⊆ Σ*
then L is a language

Examples of languages:

• The set of legal English words

• The set of legal C programs

• The set of strings consisting of n 0's
followed by n 1's

{ε, 01, 0011, 000111, ...}
13
• The set of strings with an equal number
of 0's and 1's

{ε, 01, 10, 0011, 0101, 1001, ...}

• L_P = the set of binary numbers whose
value is prime

{10, 11, 101, 111, 1011, ...}

• The empty language ∅

• The language {ε} consisting of the empty
string

Note: ∅ ≠ {ε}

Note2: The underlying alphabet Σ is always
finite
14
Problem: Is a given string w a member of a
language L?

Example: Is a binary number prime = is it a
member of L_P?

Is 11101 ∈ L_P? What computational resources
are needed to answer the question?

Usually we think of problems not as a yes/no
decision, but as something that transforms
an input into an output.

Example: Parse a C-program = check if the
program is correct, and if it is, produce a
parse tree.

Let L_X be the set of all valid programs in
programming language X. If we can show that
determining membership in L_X is hard, then
parsing programs written in X cannot be
easier.

Question: Why?
15
Finite Automata Informally
Protocol for e-commerce using e-money

Allowed events:

1. The customer can pay the store (= send
the money-file to the store)

2. The customer can cancel the money (like
putting a stop on a check)

3. The store can ship the goods to the
customer

4. The store can redeem the money (= cash
the check)

5. The bank can transfer the money to the
store
16
e-commerce
The protocol for each participant:
[Diagram: one automaton per participant —
(a) Store: states a–g, with transitions pay,
ship, redeem, transfer; (b) Customer: a
single state with a pay transition;
(c) Bank: states 1–4, with transitions
cancel, redeem, transfer.]
17
Completed protocols:
[Diagram: the same three automata completed
so that every state has a transition for
every event — events a participant ignores
(e.g. pay, cancel at the bank; pay, ship at
the customer; cancel, ship at irrelevant
store states) loop back to the same state.]
18
The entire system as an Automaton:
[Diagram: the product of the participants'
automata — store states a–g combined with
bank states 1–4, with transitions labelled
P (pay), S (ship), C (cancel), R (redeem),
and T (transfer).]
19
Deterministic Finite Automata
A DFA is a quintuple

A = (Q, Σ, δ, q_0, F)

• Q is a finite set of states

• Σ is a finite alphabet (= input symbols)

• δ is a transition function (q, a) ↦ p

• q_0 ∈ Q is the start state

• F ⊆ Q is a set of final states
20
Example: An automaton A that accepts

L = {x01y : x, y ∈ {0, 1}*}

The automaton
A = ({q0, q1, q2}, {0, 1}, δ, q0, {q1})
as a transition table:

        0    1
→ q0   q2   q0
* q1   q1   q1
  q2   q2   q1

The automaton as a transition diagram:

[Diagram: start state q0 with a loop on 1
and a 0-transition to q2; q2 has a loop on 0
and a 1-transition to q1; the accepting
state q1 has a loop on 0, 1.]
21
An FA accepts a string w = a_1a_2···a_n if
there is a path in the transition diagram
that

1. Begins at a start state

2. Ends at an accepting state

3. Has sequence of labels a_1a_2···a_n

Example: The FA

[Diagram: states q0, q1, q2; q0 loops on
0, 1, goes to q1 on 0; q1 goes to the
accepting state q2 on 1.]

accepts e.g. the string 01101
22
• The transition function δ can be extended
to δ̂ that operates on states and strings
(as opposed to states and symbols)

Basis: δ̂(q, ε) = q

Induction: δ̂(q, xa) = δ(δ̂(q, x), a)

• Now, formally, the language accepted by A
is

L(A) = {w : δ̂(q_0, w) ∈ F}

• The languages accepted by FA's are called
regular languages
23
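The basis/induction definition of δ̂ translates directly into a loop over the input string; a sketch in Python, hard-coding the automaton of slide 21 (the dictionary encoding and names are our own choice):

```python
# Transition table of the DFA from slide 21
# (accepts L = {x01y : x, y in {0,1}*}; final state q1)
delta = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
}
start, final = "q0", {"q1"}

def delta_hat(q, w):
    """Extended transition function: basis delta_hat(q, eps) = q,
    induction delta_hat(q, xa) = delta(delta_hat(q, x), a)."""
    for a in w:
        q = delta[(q, a)]
    return q

def accepts(w):
    return delta_hat(start, w) in final

print(accepts("11010"))  # True: contains 01
print(accepts("1110"))   # False: no 0 followed by 1
```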
Example: DFA accepting all and only strings
with an even number of 0's and an even
number of 1's

[Diagram: states q0 (start, accepting), q1,
q2, q3; 1-transitions swap q0 ↔ q1 and
q2 ↔ q3; 0-transitions swap q0 ↔ q2 and
q1 ↔ q3.]

Tabular representation of the automaton

         0    1
*→ q0   q2   q1
   q1   q3   q0
   q2   q0   q3
   q3   q1   q2
24
Example
Marble-rolling toy from p. 53 of textbook
[Diagram: the toy — a marble dropped at A or
B is routed by levers x_1, x_2, x_3 to exit
at C or D.]
25
A state is represented as a sequence of
three bits followed by r or a (previous
input rejected or accepted)

For instance, 010a means
left, right, left, accepted

Tabular representation of DFA for the toy

           A      B
→  000r   100r   011r
*  000a   100r   011r
*  001a   101r   000a
   010r   110r   001a
*  010a   110r   001a
   011r   111r   010a
   100r   010r   111r
*  100a   010r   111r
   101r   011r   100a
*  101a   011r   100a
   110r   000a   101a
*  110a   000a   101a
   111r   001a   110a
26
Nondeterministic Finite Automata
An NFA can be in several states at once, or,
viewed another way, it can "guess" which
state to go to next

Example: An automaton that accepts all and
only strings ending in 01.

[Diagram: start state q0 with a loop on
0, 1 and a 0-transition to q1; q1 goes to
the accepting state q2 on 1.]

Here is what happens when the NFA processes
the input 00101:

[Diagram: a tree of state sets — q0 stays
alive on every symbol; each 0 also spawns a
q1 branch, which dies (stuck) unless the
next symbol is a 1, leading to q2.]
27
Formally, an NFA is a quintuple

A = (Q, Σ, δ, q_0, F)

• Q is a finite set of states

• Σ is a finite alphabet

• δ is a transition function from Q × Σ to
the powerset of Q

• q_0 ∈ Q is the start state

• F ⊆ Q is a set of final states
28
Example: The NFA from the previous slide is

({q0, q1, q2}, {0, 1}, δ, q0, {q2})

where δ is the transition function

        0           1
→ q0   {q0, q1}    {q0}
  q1   ∅           {q2}
* q2   ∅           ∅
29
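An NFA can be simulated by carrying the set of states it "can be in" after each symbol; a sketch using the table above (the encoding and names are ours):

```python
# NFA from slide 29: accepts strings ending in 01
delta = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "0"): set(),        ("q1", "1"): {"q2"},
    ("q2", "0"): set(),        ("q2", "1"): set(),
}
start, final = "q0", {"q2"}

def delta_hat(q, w):
    """delta_hat(q, xa) = union of delta(p, a) over p in delta_hat(q, x)."""
    states = {q}
    for a in w:
        states = set().union(*(delta[(p, a)] for p in states))
    return states

def accepts(w):
    return bool(delta_hat(start, w) & final)

print(sorted(delta_hat("q0", "00101")))  # ['q0', 'q2']
print(accepts("00101"))                  # True: ends in 01
```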
Extended transition function δ̂.

Basis: δ̂(q, ε) = {q}

Induction:

δ̂(q, xa) = ⋃_{p ∈ δ̂(q, x)} δ(p, a)

Example: Let's compute δ̂(q0, 00101) on the
blackboard

• Now, formally, the language accepted by A
is

L(A) = {w : δ̂(q_0, w) ∩ F ≠ ∅}
30
Let's prove formally that the NFA

[Diagram: the NFA of slide 27 for strings
ending in 01.]

accepts the language {x01 : x ∈ Σ*}. We'll
do a mutual induction on the three
statements below

0. w ∈ Σ* ⇒ q0 ∈ δ̂(q0, w)

1. q1 ∈ δ̂(q0, w) ⇔ w = x0

2. q2 ∈ δ̂(q0, w) ⇔ w = x01
31
Basis: If |w| = 0 then w = ε. Then statement
(0) follows from def. For (1) and (2) both
sides are false for ε

Induction: Assume w = xa, where a ∈ {0, 1},
|x| = n and statements (0)–(2) hold for x.
We will show on the blackboard in class that
the statements hold for xa.
32
Equivalence of DFA and NFA
• NFA's are usually easier to "program" in.

• Surprisingly, for every NFA N there is a
DFA D, such that L(D) = L(N), and vice
versa.

• This involves the subset construction, an
important example of how an automaton B can
be generically constructed from another
automaton A.

• Given an NFA

N = (Q_N, Σ, δ_N, q_0, F_N)

we will construct a DFA

D = (Q_D, Σ, δ_D, {q_0}, F_D)

such that

L(D) = L(N).
33
The details of the subset construction:

• Q_D = {S : S ⊆ Q_N}.

Note: |Q_D| = 2^|Q_N|, although most states
in Q_D are likely to be garbage.

• F_D = {S ⊆ Q_N : S ∩ F_N ≠ ∅}

• For every S ⊆ Q_N and a ∈ Σ,

δ_D(S, a) = ⋃_{p ∈ S} δ_N(p, a)
34
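The construction, restricted to accessible states as on slide 37, can be sketched as follows; `nfa_delta` is the NFA of slide 27 again, and all function and variable names are our own:

```python
def subset_construction(delta, sigma, start, final):
    """Build the accessible part of the 'subset' DFA of an NFA.
    DFA states are frozensets of NFA states."""
    q0 = frozenset({start})
    dfa_delta, todo, seen = {}, [q0], {q0}
    while todo:
        S = todo.pop()
        for a in sigma:
            # delta_D(S, a) = union of delta_N(p, a) over p in S
            T = frozenset(set().union(*(delta.get((p, a), set()) for p in S)))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_final = {S for S in seen if S & final}
    return dfa_delta, q0, dfa_final

nfa_delta = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
             ("q1", "1"): {"q2"}}
dd, q0, F = subset_construction(nfa_delta, "01", "q0", {"q2"})
print(len({S for (S, a) in dd}))  # 3 accessible states, not 2^3 = 8
```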
Let's construct δ_D from the NFA on slide 27

                   0           1
  ∅               ∅           ∅
→ {q0}            {q0, q1}    {q0}
  {q1}            ∅           {q2}
* {q2}            ∅           ∅
  {q0, q1}        {q0, q1}    {q0, q2}
* {q0, q2}        {q0, q1}    {q0}
* {q1, q2}        ∅           {q2}
* {q0, q1, q2}    {q0, q1}    {q0, q2}
35
Note: The states of D correspond to subsets
of states of N, but we could have denoted
the states of D by, say, A – H just as well.

       0   1
  A   A   A
→ B   E   B
  C   A   D
* D   A   A
  E   E   F
* F   E   B
* G   A   D
* H   E   F
36
We can often avoid the exponential blow-up
by constructing the transition table for D
only for accessible states S as follows:

Basis: S = {q0} is accessible in D

Induction: If state S is accessible, so are
the states in ⋃_{a ∈ Σ} δ_D(S, a).

Example: The "subset" DFA with accessible
states only.

[Diagram: states {q0}, {q0, q1}, {q0, q2};
{q0} loops on 1 and goes to {q0, q1} on 0;
{q0, q1} loops on 0 and goes to {q0, q2} on
1; the accepting state {q0, q2} goes to
{q0, q1} on 0 and to {q0} on 1.]
37
Theorem 2.11: Let D be the "subset" DFA of
an NFA N. Then L(D) = L(N).

Proof: First we show by induction on |w|
that

δ̂_D({q0}, w) = δ̂_N(q0, w)

Basis: w = ε. The claim follows from def.
38
Induction:

δ̂_D({q0}, xa)
  (def)  = δ_D(δ̂_D({q0}, x), a)
  (i.h.) = δ_D(δ̂_N(q0, x), a)
  (cst)  = ⋃_{p ∈ δ̂_N(q0, x)} δ_N(p, a)
  (def)  = δ̂_N(q0, xa)

Now (why?) it follows that L(D) = L(N).
39
Theorem 2.12: A language L is accepted by
some DFA if and only if L is accepted by
some NFA.

Proof: The "if" part is Theorem 2.11.

For the "only if" part we note that any DFA
can be converted to an equivalent NFA by
modifying δ_D to δ_N by the rule

• If δ_D(q, a) = p, then δ_N(q, a) = {p}.

By induction on |w| it will be shown in the
tutorial that if δ̂_D(q0, w) = p, then
δ̂_N(q0, w) = {p}.

The claim of the theorem follows.
40
Exponential Blow-Up

There is an NFA N with n+1 states that has
no equivalent DFA with fewer than 2^n states

[Diagram: start state q0 with a loop on
0, 1 and a 1-transition to q1, followed by a
chain of 0,1-transitions through q2, ..., qn.]

L(N) = {x1c_2c_3···c_n : x ∈ {0, 1}*, c_i ∈ {0, 1}}

Suppose an equivalent DFA D with fewer than
2^n states exists.

D must remember the last n symbols it has
read.

There are 2^n bit sequences a_1a_2···a_n

∃ q, a_1a_2···a_n, b_1b_2···b_n :
  q ∈ δ̂_N(q0, a_1a_2···a_n),
  q ∈ δ̂_N(q0, b_1b_2···b_n),
  a_1a_2···a_n ≠ b_1b_2···b_n
41
Case 1:

1a_2···a_n
0b_2···b_n

Then q has to be both an accepting and a
nonaccepting state.

Case 2:

a_1···a_{i-1}1a_{i+1}···a_n
b_1···b_{i-1}0b_{i+1}···b_n

Now δ̂_N(q0, a_1···a_{i-1}1a_{i+1}···a_n0^{i-1}) =
δ̂_N(q0, b_1···b_{i-1}0b_{i+1}···b_n0^{i-1})

and δ̂_N(q0, a_1···a_{i-1}1a_{i+1}···a_n0^{i-1}) ∈ F_D

δ̂_N(q0, b_1···b_{i-1}0b_{i+1}···b_n0^{i-1}) ∉ F_D
42
FA's with Epsilon-Transitions

An ε-NFA accepting decimal numbers
consisting of:

1. An optional + or - sign

2. A string of digits

3. A decimal point

4. Another string of digits

One of the strings (2) and (4) is optional

[Diagram: states q0, ..., q5; q0 goes to q1
on ε, +, or -; q1 loops on digits 0,1,...,9,
goes to q2 on "." and to q4 on a digit; q2
goes to q3 on a digit; q4 goes to q3 on ".";
q3 loops on digits and goes to the accepting
state q5 on ε.]
43
Example:

ε-NFA accepting the set of keywords
{ebay, web}

[Diagram: a start state with a loop on Σ
and ε-transitions into two branches — one
spelling w, e, b, the other e, b, a, y —
each ending in an accepting state.]
44
An ε-NFA is a quintuple (Q, Σ, δ, q_0, F)
where δ is a function from Q × (Σ ∪ {ε}) to
the powerset of Q.

Example: The ε-NFA from the previous slide

E = ({q0, q1, ..., q5}, {., +, -, 0, 1, ..., 9}, δ, q0, {q5})

where the transition table for δ is

        ε      +,-    .      0,...,9
→ q0   {q1}   {q1}   ∅      ∅
  q1   ∅      ∅      {q2}   {q1, q4}
  q2   ∅      ∅      ∅      {q3}
  q3   {q5}   ∅      ∅      {q3}
  q4   ∅      ∅      {q3}   ∅
* q5   ∅      ∅      ∅      ∅
45
ECLOSE

We close a state by adding all states
reachable by a sequence of ε-transitions
ε ε ··· ε

Inductive definition of ECLOSE(q)

Basis:

q ∈ ECLOSE(q)

Induction:

p ∈ ECLOSE(q) and r ∈ δ(p, ε) ⇒
r ∈ ECLOSE(q)
46
Example of ε-closure

[Diagram: states 1–7 connected by
ε-transitions and by transitions labelled
a and b.]

For instance,

ECLOSE(1) = {1, 2, 3, 4, 6}
47
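The inductive definition of ECLOSE is plain graph reachability along ε-arcs; a sketch, with a small ε-transition relation of our own choosing (not the slide's figure) arranged so that ECLOSE(1) = {1, 2, 3, 4, 6} as above:

```python
def eclose(q, eps):
    """ECLOSE(q): basis q in ECLOSE(q); induction: if p in ECLOSE(q)
    and r in delta(p, eps), then r in ECLOSE(q)."""
    closure, todo = {q}, [q]
    while todo:
        p = todo.pop()
        for r in eps.get(p, set()):
            if r not in closure:
                closure.add(r)
                todo.append(r)
    return closure

# epsilon-transitions of a small example (ours): 1->2, 1->4, 2->3, 3->6
eps = {1: {2, 4}, 2: {3}, 3: {6}}
print(sorted(eclose(1, eps)))  # [1, 2, 3, 4, 6]
print(sorted(eclose(5, eps)))  # [5] — no outgoing epsilon-arcs
```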
• Inductive definition of δ̂ for ε-NFA's

Basis:

δ̂(q, ε) = ECLOSE(q)

Induction:

δ̂(q, xa) = ⋃_{p ∈ δ(δ̂(q, x), a)} ECLOSE(p)

Let's compute on the blackboard in class
δ̂(q0, 5.6) for the NFA on slide 43
48
Given an ε-NFA

E = (Q_E, Σ, δ_E, q_0, F_E)

we will construct a DFA

D = (Q_D, Σ, δ_D, q_D, F_D)

such that

L(D) = L(E)

Details of the construction:

• Q_D = {S : S ⊆ Q_E and S = ECLOSE(S)}

• q_D = ECLOSE(q_0)

• F_D = {S : S ∈ Q_D and S ∩ F_E ≠ ∅}

• δ_D(S, a) =
  ⋃ {ECLOSE(p) : p ∈ δ(t, a) for some t ∈ S}
49
Example: ε-NFA E

[Diagram: the decimal-number ε-NFA from
slide 43.]

DFA D corresponding to E

[Diagram: states {q0, q1}, {q1}, {q1, q4},
{q2}, {q2, q3, q5}, {q3, q5}; e.g. the start
state {q0, q1} goes to {q1} on +,- and to
{q1, q4} on a digit; {q1, q4} goes to
{q2, q3, q5} on "."; the accepting states
are those containing q5.]
50
Theorem 2.22: A language L is accepted by
some ε-NFA E if and only if L is accepted by
some DFA.

Proof: We use D constructed as above and
show by induction that
δ̂_E(q0, w) = δ̂_D(q_D, w)

Basis: δ̂_E(q0, ε) = ECLOSE(q0) = q_D =
δ̂_D(q_D, ε)
51
Induction:

δ̂_E(q0, xa) = ⋃_{p ∈ δ_E(δ̂_E(q0, x), a)} ECLOSE(p)

            = ⋃_{p ∈ δ_D(δ̂_D(q_D, x), a)} ECLOSE(p)

            = ⋃_{p ∈ δ̂_D(q_D, xa)} ECLOSE(p)

            = δ̂_D(q_D, xa)
52
Regular expressions

An FA (NFA or DFA) is a "blueprint" for
constructing a machine recognizing a regular
language.

A regular expression is a "user-friendly,"
declarative way of describing a regular
language.

Example: 01* + 10*

Regular expressions are used in e.g.

1. UNIX grep command

2. UNIX Lex (Lexical analyzer generator) and
Flex (Fast Lex) tools.
53
Operations on languages

Union:

L ∪ M = {w : w ∈ L or w ∈ M}

Concatenation:

L.M = {w : w = xy, x ∈ L, y ∈ M}

Powers:

L^0 = {ε}, L^1 = L, L^{k+1} = L.L^k

Kleene Closure:

L* = ⋃_{i=0}^∞ L^i

Question: What are ∅^0, ∅^i, and ∅*?
54
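The operations above can be tried out directly on finite languages; a sketch (all names ours) that also answers the question about ∅^0 and ∅*:

```python
def concat(L, M):
    """L.M = {xy : x in L, y in M} for finite languages."""
    return {x + y for x in L for y in M}

def lang_power(L, k):
    """L^0 = {eps}, L^(k+1) = L . L^k."""
    result = {""}
    for _ in range(k):
        result = concat(L, result)
    return result

def star_up_to(L, n):
    """Finite approximation of L*: the union of L^0, ..., L^n."""
    return set().union(*(lang_power(L, i) for i in range(n + 1)))

L = {"0", "1"}
print(sorted(lang_power(L, 2)))  # ['00', '01', '10', '11']
print(lang_power(set(), 0))      # {''}  — i.e. the empty-set power ∅^0 = {ε}
print(star_up_to(set(), 5))      # {''}  — ∅* = {ε}; every ∅^i with i > 0 is ∅
```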
Building regex's

Inductive definition of regex's:

Basis: ε is a regex and ∅ is a regex.
L(ε) = {ε}, and L(∅) = ∅.

If a ∈ Σ, then a is a regex.
L(a) = {a}.

Induction:

If E is a regex, then (E) is a regex.
L((E)) = L(E).

If E and F are regex's, then E + F is a
regex. L(E + F) = L(E) ∪ L(F).

If E and F are regex's, then E.F is a regex.
L(E.F) = L(E).L(F).

If E is a regex, then E* is a regex.
L(E*) = (L(E))*.
55
Example: Regex for

L = {w ∈ {0, 1}* : 0 and 1 alternate in w}

(01)* + (10)* + 0(10)* + 1(01)*

or, equivalently,

(ε + 1)(01)*(ε + 0)

Order of precedence for operators:

1. Star

2. Dot

3. Plus

Example: 01* + 1 is grouped (0(1)*) + 1
56
Equivalence of FA's and regex's

We have already shown that DFA's, NFA's, and
ε-NFA's all are equivalent.

[Diagram: conversions among ε-NFA, NFA, DFA,
and RE.]

To show FA's equivalent to regex's we need
to establish that

1. For every DFA A we can find (construct,
in this case) a regex R, s.t. L(R) = L(A).

2. For every regex R there is an ε-NFA A,
s.t. L(A) = L(R).
57
Theorem 3.4: For every DFA
A = (Q, Σ, δ, q_0, F) there is a regex R,
s.t. L(R) = L(A).

Proof: Let the states of A be {1, 2, ..., n},
with 1 being the start state.

• Let R_ij^(k) be a regex describing the set
of labels of all paths in A from state i to
state j going through intermediate states
{1, ..., k} only.

[Diagram: a path from i to j whose
intermediate states are all at most k.]
58
R_ij^(k) will be defined inductively. Note
that

L(⊕_{j ∈ F} R_1j^(n)) = L(A)

Basis: k = 0, i.e. no intermediate states.

• Case 1: i ≠ j

R_ij^(0) = ⊕_{a ∈ Σ : δ(i,a) = j} a

• Case 2: i = j

R_ii^(0) = (⊕_{a ∈ Σ : δ(i,a) = i} a) + ε
59
Induction:

R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)

[Diagram: a path from i to j either avoids
state k entirely (first term), or goes from
i to k, loops at k zero or more times, and
then goes from k to j (second term).]
60
Example: Let's find R for A, where

L(A) = {x0y : x ∈ {1}* and y ∈ {0, 1}*}

[Diagram: start state 1 with a loop on 1 and
a 0-transition to state 2, which loops on
0, 1.]

R_11^(0)   ε + 1
R_12^(0)   0
R_21^(0)   ∅
R_22^(0)   ε + 0 + 1
61
We will need the following simplification
rules:

• (ε + R)* = R*

• R + RS* = RS*

• ∅R = R∅ = ∅ (Annihilation)

• ∅ + R = R + ∅ = R (Identity)
62
R_11^(0)   ε + 1
R_12^(0)   0
R_21^(0)   ∅
R_22^(0)   ε + 0 + 1

R_ij^(1) = R_ij^(0) + R_i1^(0) (R_11^(0))* R_1j^(0)

           By direct substitution                 Simplified
R_11^(1)   ε + 1 + (ε + 1)(ε + 1)*(ε + 1)        1*
R_12^(1)   0 + (ε + 1)(ε + 1)*0                  1*0
R_21^(1)   ∅ + ∅(ε + 1)*(ε + 1)                  ∅
R_22^(1)   ε + 0 + 1 + ∅(ε + 1)*0                ε + 0 + 1
63
           Simplified
R_11^(1)   1*
R_12^(1)   1*0
R_21^(1)   ∅
R_22^(1)   ε + 0 + 1

R_ij^(2) = R_ij^(1) + R_i2^(1) (R_22^(1))* R_2j^(1)

           By direct substitution
R_11^(2)   1* + 1*0(ε + 0 + 1)*∅
R_12^(2)   1*0 + 1*0(ε + 0 + 1)*(ε + 0 + 1)
R_21^(2)   ∅ + (ε + 0 + 1)(ε + 0 + 1)*∅
R_22^(2)   ε + 0 + 1 + (ε + 0 + 1)(ε + 0 + 1)*(ε + 0 + 1)
64
           By direct substitution
R_11^(2)   1* + 1*0(ε + 0 + 1)*∅
R_12^(2)   1*0 + 1*0(ε + 0 + 1)*(ε + 0 + 1)
R_21^(2)   ∅ + (ε + 0 + 1)(ε + 0 + 1)*∅
R_22^(2)   ε + 0 + 1 + (ε + 0 + 1)(ε + 0 + 1)*(ε + 0 + 1)

           Simplified
R_11^(2)   1*
R_12^(2)   1*0(0 + 1)*
R_21^(2)   ∅
R_22^(2)   (0 + 1)*

The final regex for A is

R_12^(2) = 1*0(0 + 1)*
65
Observations

There are n^3 expressions R_ij^(k)

Each inductive step grows the expression
4-fold

R_ij^(n) could have size 4^n

For all {i, j} ⊆ {1, ..., n}, R_ij^(k) uses
R_kk^(k-1), so we have to write the regex
R_kk^(k-1) n^2 times

We need a more efficient approach:
the state elimination technique
66
The state elimination technique

Let's label the edges with regex's instead
of symbols

[Diagram: a state s with a self-loop S,
incoming edges Q_1, ..., Q_k from states
q_1, ..., q_k, outgoing edges P_1, ..., P_m
to states p_1, ..., p_m, and direct edges
R_11, ..., R_km from each q_i to each p_j.]
67
Now, let's eliminate state s.

[Diagram: s is removed; each direct edge
R_ij is relabelled R_ij + Q_i S* P_j.]

For each accepting state q eliminate from
the original automaton all states except
q_0 and q.
68
For each q ∈ F we'll be left with an A_q
that looks like

[Diagram: two states q_0 and q; q_0 has a
self-loop R and an edge S to q; q has a
self-loop U and an edge T back to q_0.]

that corresponds to the regex

E_q = (R + SU*T)*SU*

or with A_q looking like

[Diagram: a single state q_0 with a
self-loop R.]

corresponding to the regex E_q = R*

• The final expression is

⊕_{q ∈ F} E_q
69
Example: A, where

L(A) = {w : w = x1b or w = x1bc,
x ∈ {0, 1}*, {b, c} ⊆ {0, 1}}

[Diagram: states A, B, C, D; A loops on 0, 1
and goes to B on 1; B goes to C on 0, 1; C
goes to D on 0, 1; C and D are accepting.]

We turn this into an automaton with regex
labels

[Diagram: A loops on 0+1 and goes to B on 1;
B goes to C on 0+1; C goes to D on 0+1.]
70
[Diagram: A loops on 0+1 and goes to B on 1;
B goes to C on 0+1; C goes to D on 0+1.]

Let's eliminate state B

[Diagram: A loops on 0+1 and goes to C on
1(0+1); C goes to D on 0+1.]

Then we eliminate state C and obtain A_D

[Diagram: A loops on 0+1 and goes to D on
1(0+1)(0+1).]

with regex (0+1)*1(0+1)(0+1)
71
From

[Diagram: A loops on 0+1 and goes to C on
1(0+1); C goes to D on 0+1.]

we can eliminate D to obtain A_C

[Diagram: A loops on 0+1 and goes to C on
1(0+1).]

with regex (0+1)*1(0+1)

• The final expression is the sum of the
previous two regex's:

(0+1)*1(0+1)(0+1) + (0+1)*1(0+1)
72
From regex's to ε-NFA's

Theorem 3.7: For every regex R we can
construct an ε-NFA A, s.t. L(A) = L(R).

Proof: By structural induction:

Basis: Automata for ε, ∅, and a.

[Diagram: (a) two states joined by an
ε-transition; (b) two states with no
transition at all; (c) two states joined by
an a-transition. In each case the second
state is accepting.]
73
Induction: Automata for R + S, RS, and R*

[Diagram: (a) for R + S, a new start state
branches by ε into the automata for R and S,
whose accepting states lead by ε to a new
accepting state; (b) for RS, the accepting
state of R is joined by ε to the start state
of S; (c) for R*, a new start state leads by
ε both to a new accepting state and to R's
start state, and R's accepting state leads
by ε back to R's start state and on to the
new accepting state.]
74
Example: We convert (0 + 1)*1(0 + 1)

[Diagram: (a) the automaton for 0 + 1;
(b) the automaton for (0 + 1)*; (c) the full
automaton for (0 + 1)*1(0 + 1), obtained by
concatenating (b) with the automata for 1
and for 0 + 1.]
75
Algebraic Laws for languages
• L ∪ M = M ∪ L.
Union is commutative.

• (L ∪ M) ∪ N = L ∪ (M ∪ N).
Union is associative.

• (LM)N = L(MN).
Concatenation is associative.

Note: Concatenation is not commutative,
i.e., there are L and M such that LM ≠ ML.
76
• ∅ ∪ L = L ∪ ∅ = L.
∅ is identity for union.

• {ε}L = L{ε} = L.
{ε} is left and right identity for
concatenation.

• ∅L = L∅ = ∅.
∅ is left and right annihilator for
concatenation.
77
• L(M ∪ N) = LM ∪ LN.
Concatenation is left distributive over
union.

• (M ∪ N)L = ML ∪ NL.
Concatenation is right distributive over
union.

• L ∪ L = L.
Union is idempotent.

• ∅* = {ε}, {ε}* = {ε}.

• L⁺ = LL* = L*L, L* = L⁺ ∪ {ε}
78
• (L*)* = L*. Closure is idempotent.

Proof:

w ∈ (L*)* ⇔ w ∈ ⋃_{i=0}^∞ (⋃_{j=0}^∞ L^j)^i

          ⇔ ∃ k, m ∈ ℕ : w ∈ (L^m)^k

          ⇔ ∃ p ∈ ℕ : w ∈ L^p

          ⇔ w ∈ ⋃_{i=0}^∞ L^i

          ⇔ w ∈ L*
79
Algebraic Laws for regex's

Evidently e.g. L((0 + 1)1) = L(01 + 11)

Also e.g. L((00 + 101)11) = L(0011 + 10111).

More generally

L((E + F)G) = L(EG + FG)

for any regex's E, F, and G.

• How do we verify that a general identity
like the one above is true?

1. Prove it by hand.

2. Let the computer prove it.
80
In Chapter 4 we will learn how to test
automatically if E = F, for any concrete
regex's E and F.

We want to test general identities, such as
E + F = F + E, for any regex's E and F.

Method:

1. "Freeze" E to a_1, and F to a_2

2. Test automatically if the frozen identity
is true, e.g. if L(a_1 + a_2) = L(a_2 + a_1)

Question: Does this always work?
81
Answer: Yes, as long as the identities use
only plus, dot, and star.

Let's denote a generalized regex, such as
(E + F)E, by

E(E, F)

Now we can for instance make the
substitution S = {E/0, F/11} to obtain

S(E(E, F)) = (0 + 11)0
82
Theorem 3.13: Fix a "freezing" substitution
S = {E_1/a_1, E_2/a_2, ..., E_m/a_m}.

Let E(E_1, E_2, ..., E_m) be a generalized
regex. Then for any regex's E_1, E_2, ..., E_m,

w ∈ L(E(E_1, E_2, ..., E_m))

if and only if there are strings
w_i ∈ L(E_i), s.t.

w = w_{j1} w_{j2} ··· w_{jk}

and

a_{j1} a_{j2} ··· a_{jk} ∈ L(E(a_1, a_2, ..., a_m))
83
For example: Suppose the alphabet is {1, 2}.
Let E(E_1, E_2) be (E_1 + E_2)E_1, and let
E_1 be 1, and E_2 be 2. Then

w ∈ L(E(E_1, E_2)) = L((E_1 + E_2)E_1) =
({1} ∪ {2}){1} = {11, 21}

if and only if

∃ w_1 ∈ L(E_1) = {1}, ∃ w_2 ∈ L(E_2) = {2} :
w = w_{j1} w_{j2}

and

a_{j1} a_{j2} ∈ L(E(a_1, a_2)) =
L((a_1 + a_2)a_1) = {a_1a_1, a_2a_1}

if and only if

j_1 = j_2 = 1, or j_1 = 2 and j_2 = 1
84
Proof of Theorem 3.13: We do a structural
induction on E.

Basis: If E = ε, the frozen expression is
also ε. If E = ∅, the frozen expression is
also ∅. If E = a, the frozen expression is
also a. Now w ∈ L(E) if and only if there is
u ∈ L(a), s.t. w = u and u is in the
language of the frozen expression, i.e.
u ∈ {a}.
85
Induction:

Case 1: E = F + G.
Then S(E) = S(F) + S(G), and
L(S(E)) = L(S(F)) ∪ L(S(G))

Let E and F be regex's. Then w ∈ L(E + F)
if and only if w ∈ L(E) or w ∈ L(F), if and
only if a_1 ∈ L(S(F)) or a_2 ∈ L(S(G)), if
and only if a_1 ∈ S(E), or a_2 ∈ S(E).

Case 2: E = F.G.
Then S(E) = S(F).S(G), and
L(S(E)) = L(S(F)).L(S(G))

Let E and F be regex's. Then w ∈ L(E.F) if
and only if w = w_1w_2, w_1 ∈ L(E) and
w_2 ∈ L(F), and a_1a_2 ∈ L(S(F)).L(S(G)) =
S(E)

Case 3: E = F*.
Prove this case at home.
86
Examples:

To prove (L + M)* = (L*M*)* it is enough to
determine if (a_1 + a_2)* is equivalent to
(a_1*a_2*)*

To verify L* = L*L* test if a_1* is
equivalent to a_1*a_1*.

Question: Does L + ML = (L + M)L hold?
87
Theorem 3.14: E(E_1, ..., E_m) = F(E_1, ..., E_m)
⇔ L(S(E)) = L(S(F))

Proof:

(Only if direction) E(E_1, ..., E_m) =
F(E_1, ..., E_m) means that
L(E(E_1, ..., E_m)) = L(F(E_1, ..., E_m))
for any concrete regex's E_1, ..., E_m. In
particular then L(S(E)) = L(S(F))

(If direction) Let E_1, ..., E_m be concrete
regex's. Suppose L(S(E)) = L(S(F)). Then by
Theorem 3.13,

w ∈ L(E(E_1, ..., E_m)) ⇔
∃ w_i ∈ L(E_i), w = w_{j1}···w_{jm},
a_{j1}···a_{jm} ∈ L(S(E)) ⇔
∃ w_i ∈ L(E_i), w = w_{j1}···w_{jm},
a_{j1}···a_{jm} ∈ L(S(F)) ⇔
w ∈ L(F(E_1, ..., E_m))
88
Properties of Regular Languages

• Pumping Lemma. Every regular language
satisfies the pumping lemma. If somebody
presents you with a fake regular language,
use the pumping lemma to show a
contradiction.

• Closure properties. Building automata from
components through operations, e.g. given L
and M we can build an automaton for L ∩ M.

• Decision properties. Computational
analysis of automata, e.g. are two automata
equivalent?

• Minimization techniques. We can save money
since we can build smaller machines.
89
The Pumping Lemma Informally

Suppose L_01 = {0^n1^n : n ≥ 1} were
regular.

Then it would be recognized by some DFA A,
with, say, k states.

Let A read 0^k. On the way it will travel as
follows:

ε     p_0
0     p_1
00    p_2
...
0^k   p_k

⇒ ∃ i < j : p_i = p_j. Call this state q.
90
Now you can fool A:

If δ̂(q, 1^i) ∈ F the machine will foolishly
accept 0^j1^i.

If δ̂(q, 1^i) ∉ F the machine will foolishly
reject 0^i1^i.

Therefore L_01 cannot be regular.

• Let's generalize the above reasoning.
91
Theorem 4.1.
The Pumping Lemma for Regular Languages.

Let L be regular.

Then ∃ n, ∀ w ∈ L : |w| ≥ n ⇒ w = xyz such
that

1. y ≠ ε

2. |xy| ≤ n

3. ∀ k ≥ 0, xy^k z ∈ L
92
Proof: Suppose L is regular.

Then L is recognized by some DFA A with,
say, n states.

Let w = a_1a_2...a_m ∈ L, m > n.

Let p_i = δ̂(q_0, a_1a_2···a_i).

⇒ ∃ i < j : p_i = p_j
93
Now w = xyz, where

1. x = a_1a_2···a_i

2. y = a_{i+1}a_{i+2}···a_j

3. z = a_{j+1}a_{j+2}...a_m

[Diagram: from the start state, x leads to
state p_i; y is a loop from p_i back to p_i;
z leads from p_i to a final state.]

Evidently xy^k z ∈ L, for any k ≥ 0. Q.E.D.
94
Example: Let L_eq be the language of strings
with an equal number of 0's and 1's.

Suppose L_eq is regular. Then
w = 0^n1^n ∈ L_eq.

By the pumping lemma w = xyz, |xy| ≤ n,
y ≠ ε and xy^k z ∈ L_eq

w = 000··· | ···0 | 0111···11
      x        y        z

Since |xy| ≤ n, both x and y consist of 0's
only.

In particular, xz ∈ L_eq, but xz has fewer
0's than 1's.
95
Suppose L_pr = {1^p : p is prime} were
regular. Let n be given by the pumping
lemma. Choose a prime p ≥ n + 2.

w = 111··· | ···1 | 1111···11
      x        y        z
           |y| = m

Now xy^{p-m} z ∈ L_pr

|xy^{p-m} z| = |xz| + (p - m)|y| =
p - m + (p - m)m = (1 + m)(p - m)

which is not prime unless one of the factors
is 1.

• y ≠ ε ⇒ 1 + m > 1

• m = |y| ≤ |xy| ≤ n, p ≥ n + 2
⇒ p - m ≥ n + 2 - n = 2.
96
Closure Properties of Regular Languages

Let L and M be regular languages. Then the
following languages are all regular:

• Union: L ∪ M

• Intersection: L ∩ M

• Complement: L̄

• Difference: L ∖ M

• Reversal: Lᴿ = {wᴿ : w ∈ L}

• Closure: L*

• Concatenation: L.M

• Homomorphism:
h(L) = {h(w) : w ∈ L, h is a homomorphism}

• Inverse homomorphism:
h⁻¹(L) = {w ∈ Σ* : h(w) ∈ L,
h : Σ* → Θ* is a homomorphism}
97
Theorem 4.4. For any regular L and M, L ∪ M
is regular.

Proof. Let L = L(E) and M = L(F). Then
L(E + F) = L ∪ M by definition.

Theorem 4.5. If L is a regular language over
Σ, then so is L̄ = Σ* ∖ L.

Proof. Let L be recognized by a DFA

A = (Q, Σ, δ, q_0, F).

Let B = (Q, Σ, δ, q_0, Q ∖ F). Now L(B) = L̄.
98
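The complementation proof is a one-line construction — keep everything and flip the accepting set. A sketch using the DFA of slide 21 (encoding and names ours):

```python
def complement(Q, delta, q0, F):
    """B = (Q, Sigma, delta, q0, Q \\ F): same DFA, accepting set flipped."""
    return (Q, delta, q0, Q - F)

def run(delta, q0, F, w):
    """Simulate a DFA and report acceptance."""
    q = q0
    for a in w:
        q = delta[(q, a)]
    return q in F

# DFA accepting strings containing 01 (slide 21)
Q = {"q0", "q1", "q2"}
delta = {("q0", "0"): "q2", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q1",
         ("q2", "0"): "q2", ("q2", "1"): "q1"}
Qb, db, q0b, Fb = complement(Q, delta, "q0", {"q1"})

print(run(db, q0b, Fb, "111"))   # True: "111" contains no 01
print(run(db, q0b, Fb, "1101"))  # False: "1101" contains 01
```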
Example:

Let L be recognized by the DFA below

[Diagram: the DFA of slide 37 with states
{q0}, {q0, q1}, {q0, q2}; accepting state
{q0, q2}.]

Then L̄ is recognized by

[Diagram: the same DFA with accepting and
nonaccepting states swapped — {q0} and
{q0, q1} are now the accepting states.]

Question: What are the regex's for L and L̄?
99
Theorem 4.8. If L and M are regular, then
so is L ∩ M.

Proof. By DeMorgan's law, L ∩ M is the
complement of L̄ ∪ M̄. We already know that
regular languages are closed under
complement and union.

We shall also give a nice direct proof, the
Cartesian construction from the e-commerce
example.
100
Theorem 4.8. If L and M are regular, then
so is L ∩ M.

Proof. Let L be the language of

A_L = (Q_L, Σ, δ_L, q_L, F_L)

and M be the language of

A_M = (Q_M, Σ, δ_M, q_M, F_M)

We assume w.l.o.g. that both automata are
deterministic.

We shall construct an automaton that
simulates A_L and A_M in parallel, and
accepts if and only if both A_L and A_M
accept.
101
If A_L goes from state p to state s on
reading a, and A_M goes from state q to
state t on reading a, then A_{L∩M} will go
from state (p, q) to state (s, t) on
reading a.

[Diagram: the input is fed to both A_L and
A_M in parallel; the product automaton
accepts when both accept (AND).]
102
Formally

A_{L∩M} = (Q_L × Q_M, Σ, δ_{L∩M}, (q_L, q_M), F_L × F_M),

where

δ_{L∩M}((p, q), a) = (δ_L(p, a), δ_M(q, a))

It will be shown in the tutorial by an
induction on |w| that

δ̂_{L∩M}((q_L, q_M), w) =
(δ̂_L(q_L, w), δ̂_M(q_M, w))

The claim then follows.

Question: Why?
103
Example: (c) = (a) × (b)

[Diagram: (a) a two-state DFA with states
p, q; (b) a two-state DFA with states r, s;
(c) their product, with states pr, ps, qr,
qs and transitions on 0 and 1 taken
componentwise.]
104
Theorem 4.10. If L and M are regular
languages, then so is L ∖ M.

Proof. Observe that L ∖ M = L ∩ M̄. We
already know that regular languages are
closed under complement and intersection.
105
Theorem 4.11. If L is a regular language,
then so is Lᴿ.

Proof 1: Let L be recognized by an FA A.
Turn A into an FA for Lᴿ, by

1. Reversing all arcs.

2. Making the old start state the new sole
accepting state.

3. Creating a new start state p_0, with
δ(p_0, ε) = F (the old accepting states).
106
Theorem 4.11. If L is a regular language,
then so is Lᴿ.

Proof 2: Let L be described by a regex E.
We shall construct a regex Eᴿ, such that
L(Eᴿ) = (L(E))ᴿ.

We proceed by a structural induction on E.

Basis: If E is ε, ∅, or a, then Eᴿ = E.

Induction:

1. E = F + G. Then Eᴿ = Fᴿ + Gᴿ

2. E = F.G. Then Eᴿ = Gᴿ.Fᴿ

3. E = F*. Then Eᴿ = (Fᴿ)*

We will show by structural induction on E
on the blackboard in class that

L(Eᴿ) = (L(E))ᴿ
107
Homomorphisms

A homomorphism on Σ is a function
h : Σ* → Θ*, where Σ and Θ are alphabets.

Let w = a_1a_2···a_n ∈ Σ*. Then

h(w) = h(a_1)h(a_2)···h(a_n)

and

h(L) = {h(w) : w ∈ L}

Example: Let h : {0, 1}* → {a, b}* be
defined by h(0) = ab, and h(1) = ε. Now
h(0011) = abab.

Example: h(L(10*1)) = L((ab)*).
108
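Applying a homomorphism is a per-symbol substitution; a sketch using the example h(0) = ab, h(1) = ε (function names are ours):

```python
def hom(h, w):
    """h(a1 a2 ... an) = h(a1) h(a2) ... h(an)."""
    return "".join(h[a] for a in w)

def hom_lang(h, L):
    """h(L) = {h(w) : w in L}, for a finite language L."""
    return {hom(h, w) for w in L}

h = {"0": "ab", "1": ""}      # the slide's example: h(0) = ab, h(1) = eps
print(hom(h, "0011"))         # abab

# A few strings of L(10*1) map into L((ab)*):
print(sorted(hom_lang(h, {"11", "101", "1001"})))  # ['', 'ab', 'abab']
```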
Theorem 4.14: h(L) is regular, whenever L
is.

Proof:

Let L = L(E) for a regex E. We claim that
L(h(E)) = h(L).

Basis: If E is ε or ∅, then h(E) = E, and
L(h(E)) = L(E) = h(L(E)).

If E is a, then L(E) = {a}, L(h(E)) =
L(h(a)) = {h(a)} = h(L(E)).

Induction:

Case 1: E = F + G. Now L(h(F + G)) =
L(h(F) + h(G)) = L(h(F)) ∪ L(h(G)) =
h(L(F)) ∪ h(L(G)) = h(L(F) ∪ L(G)) =
h(L(F + G)).

Case 2: E = F.G. Now L(h(F.G)) =
L(h(F)).L(h(G)) = h(L(F)).h(L(G)) =
h(L(F).L(G))

Case 3: E = F*. Now L(h(F*)) = L(h(F)*) =
L(h(F))* = h(L(F))* = h(L(F*)).
109
Inverse Homomorphism

Let h : Σ* → Θ* be a homomorphism. Let
L ⊆ Θ*, and define

h⁻¹(L) = {w ∈ Σ* : h(w) ∈ L}

[Diagram: (a) h maps L forward to h(L);
(b) h maps h⁻¹(L) forward into L.]
110
Example: Let h : {a, b} → {0, 1}* be defined
by h(a) = 01, and h(b) = 10. If
L = L((00 + 1)*), then h⁻¹(L) = L((ba)*).

Claim: h(w) ∈ L if and only if w = (ba)^n

Proof: Let w = (ba)^n. Then
h(w) = (1001)^n ∈ L.

Let h(w) ∈ L, and suppose w ∉ L((ba)*).
There are four cases to consider.

1. w begins with a. Then h(w) begins with
01 and ∉ L((00 + 1)*).

2. w ends in b. Then h(w) ends in 10 and
∉ L((00 + 1)*).

3. w = xaay. Then h(w) = z0101v and
∉ L((00 + 1)*).

4. w = xbby. Then h(w) = z1010v and
∉ L((00 + 1)*).
111
Theorem 4.16: Let h : Σ* → Θ* be a
homomorphism, and L ⊆ Θ* regular. Then
h⁻¹(L) is regular.

Proof: Let L be the language of
A = (Q, Θ, δ, q_0, F).

We define B = (Q, Σ, γ, q_0, F), where

γ(q, a) = δ̂(q, h(a))

It will be shown by induction on |w| in the
tutorial that γ̂(q_0, w) = δ̂(q_0, h(w))

[Diagram: on input a, B applies h and feeds
h(a) to A; B accepts or rejects exactly as
A does.]
112
Decision Properties

We consider the following:

1. Converting among representations for
regular languages.

2. Is L = ∅?

3. Is w ∈ L?

4. Do two descriptions define the same
language?
113
From NFA's to DFA's

Suppose the ε-NFA has n states.

To compute ECLOSE(p) we follow at most n^2
arcs.

The DFA has 2^n states; for each state S and
each a ∈ Σ we compute δ_D(S, a) in n^3
steps.

Grand total is O(n^3 2^n) steps.

If we compute δ for reachable states only,
we need to compute δ_D(S, a) only s times,
where s is the number of reachable states.
Grand total is O(n^3 s) steps.
114
From DFA to NFA

All we need to do is to put set brackets
around the states. Total O(n) steps.

From FA to regex

We need to compute n^3 entries of size up to
4^n. Total is O(n^3 4^n).

The FA is allowed to be an NFA. If we first
wanted to convert the NFA to a DFA, the
total time would be doubly exponential.

From regex to FA's

We can build an expression tree for the
regex in n steps.

We can construct the automaton in n steps.

Eliminating ε-transitions takes O(n^3)
steps.

If you want a DFA, you might need an
exponential number of steps.
115
Testing emptiness

L(A) ≠ ∅ for FA A if and only if a final
state is reachable from the start state in
A. Total O(n^2) steps.

Alternatively, we can inspect a regex E and
tell if L(E) = ∅. We use the following
method:

E = F + G. Now L(E) is empty if and only if
both L(F) and L(G) are empty.

E = F.G. Now L(E) is empty if and only if
either L(F) or L(G) is empty.

E = F*. Now L(E) is never empty, since
ε ∈ L(E).

E = ε. Now L(E) is not empty.

E = a. Now L(E) is not empty.

E = ∅. Now L(E) is empty.
116
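The case analysis above is a direct recursion on the regex syntax tree. The tuple encoding below — ('+', F, G), ('.', F, G), ('*', F), with 'eps', 'empty', or a symbol at the leaves — is an assumed representation, not fixed by the slides:

```python
# Recursive emptiness test on a regex syntax tree.

def is_empty(E):
    if E == 'empty':
        return True
    if E == 'eps' or isinstance(E, str):   # eps, or a single symbol a
        return False
    op = E[0]
    if op == '+':
        return is_empty(E[1]) and is_empty(E[2])   # both parts empty
    if op == '.':
        return is_empty(E[1]) or is_empty(E[2])    # either part empty
    if op == '*':
        return False                               # eps is always in L(F*)

# (0 + empty)* . empty  -- concatenation with the empty language is empty
print(is_empty(('.', ('*', ('+', '0', 'empty')), 'empty')))   # True
print(is_empty(('+', 'empty', ('*', 'empty'))))               # False
```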
Testing membership

To test w ∈ L(A) for DFA A, simulate A on w.
If |w| = n, this takes O(n) steps.

If A is an NFA and has s states, simulating A
on w takes O(ns²) steps.

If A is an ε-NFA and has s states, simulating
A on w takes O(ns³) steps.

If L = L(E), for regex E of length s, we first
convert E to an ε-NFA with 2s states. Then we
simulate w on this machine, in O(ns³) steps.
117
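Both simulations can be sketched in a few lines: the DFA walk is O(n), and the NFA walk carries the whole set of current states, touching at most s² (state, successor) pairs per input symbol. The example automata and their state names are assumptions for illustration:

```python
# Membership testing by simulation.

def dfa_accepts(delta, start, final, w):
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in final

def nfa_accepts(delta, start, final, w):
    S = {start}
    for c in w:
        S = {q for p in S for q in delta.get((p, c), ())}
    return bool(S & final)

# Hypothetical examples: a DFA for "even number of 0's", an NFA for
# "strings ending in 01".
delta_D = {('e', '0'): 'o', ('e', '1'): 'e', ('o', '0'): 'e', ('o', '1'): 'o'}
delta_N = {('p', '0'): {'p', 'q'}, ('p', '1'): {'p'}, ('q', '1'): {'r'}}

print(dfa_accepts(delta_D, 'e', {'e'}, '00100'))   # True
print(nfa_accepts(delta_N, 'p', {'r'}, '10101'))   # True
print(nfa_accepts(delta_N, 'p', {'r'}, '110'))     # False
```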
Equivalence and Minimization of Automata

Let A = (Q, Σ, δ, q₀, F) be a DFA, and {p,q} ⊆ Q.
We define

p ∼ q ⇔ ∀w ∈ Σ* : δ̂(p,w) ∈ F iff δ̂(q,w) ∈ F

• If p ∼ q we say that p and q are equivalent

• If p ≁ q we say that p and q are distinguishable

IOW (in other words), p and q are distinguishable iff

∃w : δ̂(p,w) ∈ F and δ̂(q,w) ∉ F, or vice versa
118
Example:

(Diagram: an eight-state DFA over {0,1} with states A–H, start state A, and accepting state C.)

δ̂(C, ε) ∈ F, δ̂(G, ε) ∉ F ⇒ C ≁ G

δ̂(A, 01) = C ∈ F, δ̂(G, 01) = E ∉ F ⇒ A ≁ G
119
What about A and E?

(The same DFA as on slide 119.)

δ̂(A, ε) = A ∉ F, δ̂(E, ε) = E ∉ F

δ̂(A, 1) = F = δ̂(E, 1)

Therefore δ̂(A, 1x) = δ̂(E, 1x) = δ̂(F, x)

δ̂(A, 00) = G = δ̂(E, 00)

δ̂(A, 01) = C = δ̂(E, 01)

Conclusion: A ∼ E.
120
We can compute distinguishable pairs with the
following inductive table-filling algorithm:

Basis: If p ∈ F and q ∉ F, then p ≁ q.

Induction: If ∃a ∈ Σ : δ(p,a) ≁ δ(q,a),
then p ≁ q.

Example: Applying the table-filling algo to A:

B  x
C  x  x
D  x  x  x
E     x  x  x
F  x  x  x     x
G  x  x  x  x  x  x
H  x     x  x  x  x  x
   A  B  C  D  E  F  G
121
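The table-filling algorithm can be sketched as a fixed-point computation: mark basis pairs, then keep marking until nothing changes. The transition table below is the eight-state DFA of slide 119 as it appears in the textbook; treat it as an assumed reconstruction of the figure:

```python
# Table-filling algorithm: compute all distinguishable pairs of DFA states.
from itertools import combinations

def distinguishable_pairs(states, sigma, delta, final):
    # Basis: an accepting and a non-accepting state are distinguishable.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in final) != (q in final)}
    changed = True
    while changed:                      # Induction: run to a fixed point.
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            if any(frozenset((delta[(p, a)], delta[(q, a)])) in marked
                   for a in sigma):
                marked.add(pair)
                changed = True
    return marked

# Assumed transitions of the DFA on slide 119 (accepting state: C).
delta = {('A','0'): 'B', ('A','1'): 'F', ('B','0'): 'G', ('B','1'): 'C',
         ('C','0'): 'A', ('C','1'): 'C', ('D','0'): 'C', ('D','1'): 'G',
         ('E','0'): 'H', ('E','1'): 'F', ('F','0'): 'C', ('F','1'): 'G',
         ('G','0'): 'G', ('G','1'): 'E', ('H','0'): 'G', ('H','1'): 'C'}
marked = distinguishable_pairs('ABCDEFGH', '01', delta, {'C'})
print(len(marked))   # 25 of the 28 pairs; only {A,E}, {B,H}, {D,F} stay unmarked
```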
Theorem 4.20: If p and q are not distinguished
by the TF-algo, then p ∼ q.

Proof: Suppose to the contrary that there
is a bad pair {p,q}, s.t.

1. ∃w : δ̂(p,w) ∈ F, δ̂(q,w) ∉ F, or vice versa.

2. The TF-algo does not distinguish between
p and q.

Let w = a₁a₂⋯aₙ be the shortest string that
identifies a bad pair {p,q}.

Now w ≠ ε since otherwise the TF-algo would
in the basis distinguish p from q. Thus n ≥ 1.
122
Consider states r = δ(p, a₁) and s = δ(q, a₁).
Now {r,s} cannot be a bad pair, since {r,s}
would be identified by a string shorter than w.

Therefore, the TF-algo must have discovered
that r and s are distinguishable.

But then the TF-algo would distinguish p from
q in the inductive part.

Thus there are no bad pairs and the theorem
is true.
123
Testing Equivalence of Regular Languages

Let L and M be reg langs (each given in some
form).

To test if L = M:

1. Convert both L and M to DFA's.

2. Imagine the DFA that is the union of the
two DFA's (never mind that there are two start
states).

3. If the TF-algo says that the two start states
are distinguishable, then L ≠ M, otherwise
L = M.
124
Example:

(Diagram: two DFA's over {0,1}, one with states A,B and start state A, the other with states C,D,E and start state C.)

We can "see" that both DFA's accept
L(ε + (0+1)*0). The result of the TF-algo is

B  x
C     x
D     x
E  x     x  x
   A  B  C  D
Therefore the two automata are equivalent.
125
Minimization of DFA's

We can use the TF-algo to minimize a DFA
by merging all equivalent states. IOW, replace
each state p by p/∼.

Example: The DFA on slide 119 has equivalence
classes {{A,E}, {B,H}, {C}, {D,F}, {G}}.

The "union" DFA on slide 125 has equivalence
classes {{A,C,D}, {B,E}}.

Note: In order for p/∼ to be an equivalence
class, the relation ∼ has to be an equivalence
relation (reflexive, symmetric, and transitive).
126
Theorem 4.23: If p ∼ q and q ∼ r, then p ∼ r.

Proof: Suppose to the contrary that p ≁ r.
Then ∃w such that δ̂(p,w) ∈ F and δ̂(r,w) ∉ F,
or vice versa.

On the other hand, δ̂(q,w) is either accepting or not.

Case 1: δ̂(q,w) is accepting. Then q ≁ r.

Case 2: δ̂(q,w) is not accepting. Then p ≁ q.

The vice versa case is proved symmetrically.

Therefore it must be that p ∼ r.
127
To minimize a DFA A = (Q, Σ, δ, q₀, F), construct
a DFA B = (Q/∼, Σ, γ, q₀/∼, F/∼), where

γ(p/∼, a) = δ(p,a)/∼

In order for B to be well defined we have to
show that

If p ∼ q then δ(p,a) ∼ δ(q,a)

If δ(p,a) ≁ δ(q,a), then the TF-algo would
conclude p ≁ q, so B is indeed well defined. Note
also that F/∼ contains all and only the accepting
states of A.
128
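The quotient construction can be sketched on top of the table-filling algorithm (repeated inline here so the sketch is self-contained). The transition table is again the assumed reconstruction of the slide-119 DFA:

```python
# Quotient DFA  B = (Q/~, Sigma, gamma, q0/~, F/~)  via the table-filling algo.
from itertools import combinations

def tf_marked(states, sigma, delta, final):
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in final) != (q in final)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair not in marked and any(
                    frozenset((delta[(p, a)], delta[(q, a)])) in marked
                    for a in sigma):
                marked.add(pair)
                changed = True
    return marked

def minimize(states, sigma, delta, start, final):
    marked = tf_marked(states, sigma, delta, final)
    # block[p] is the equivalence class p/~ as a frozenset.
    block = {p: frozenset(q for q in states
                          if q == p or frozenset((p, q)) not in marked)
             for p in states}
    gamma = {(block[p], a): block[delta[(p, a)]]
             for p in states for a in sigma}
    return gamma, block[start], {block[p] for p in final}

delta = {('A','0'): 'B', ('A','1'): 'F', ('B','0'): 'G', ('B','1'): 'C',
         ('C','0'): 'A', ('C','1'): 'C', ('D','0'): 'C', ('D','1'): 'G',
         ('E','0'): 'H', ('E','1'): 'F', ('F','0'): 'C', ('F','1'): 'G',
         ('G','0'): 'G', ('G','1'): 'E', ('H','0'): 'G', ('H','1'): 'C'}
gamma, start_block, final_blocks = minimize('ABCDEFGH', '01', delta, 'A', {'C'})
print(len({S for (S, _) in gamma}))   # 5 blocks: {A,E}, {B,H}, {C}, {D,F}, {G}
```

Well-definedness shows up here as the fact that p and q in the same block always yield the same `gamma` entry.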
Example: We can minimize

(Diagram: the eight-state DFA of slide 119.)

to obtain

(Diagram: a five-state DFA with states {A,E}, {B,H}, {C}, {D,F}, {G}, start state {A,E}, and accepting state {C}.)
129
NOTE: We cannot apply the TF-algo to NFA's.

For example, to minimize

(Diagram: a three-state NFA with states A, B, C.)

we simply remove state C.

However, A ≁ C.
130
Why the Minimized DFA Can't Be Beaten

Let B be the minimized DFA obtained by applying
the TF-algo to DFA A.

We already know that L(A) = L(B).

What if there existed a DFA C, with
L(C) = L(B) and fewer states than B?

Then run the TF-algo on B "union" C.

Since L(B) = L(C) we have q₀ᴮ ∼ q₀ᶜ.

Also, δ(q₀ᴮ, a) ∼ δ(q₀ᶜ, a), for any a.
131
Claim: For each state p in B there is at least
one state q in C, s.t. p ∼ q.

Proof of claim: There are no inaccessible states,
so p = δ̂(q₀ᴮ, a₁a₂⋯aₖ), for some string a₁a₂⋯aₖ.
Now let q = δ̂(q₀ᶜ, a₁a₂⋯aₖ); then p ∼ q.

Since C has fewer states than B, there must be
two states r and s of B such that r ∼ t ∼ s, for
some state t of C. But then r ∼ s (why?),
which is a contradiction, since B was constructed
by the TF-algo.
132
Context-Free Grammars and Languages

• We have seen that many languages cannot
be regular. Thus we need to consider larger
classes of langs.

• Context-Free Languages (CFL's) have played
a central role in natural languages since the 1950's,
and in compilers since the 1960's.

• Context-Free Grammars (CFG's) are the basis
of BNF-syntax.

• Today CFL's are increasingly important for
XML and their DTD's.

We'll look at: CFG's, the languages they generate,
parse trees, pushdown automata, and
closure properties of CFL's.
133
Informal example of CFG's

Consider L_pal = {w ∈ Σ* : w = wᴿ}

For example otto ∈ L_pal, madamimadam ∈ L_pal.

In the Finnish language, e.g. saippuakauppias ∈ L_pal
("soap-merchant").

Let Σ = {0,1} and suppose L_pal were regular.
Let n be given by the pumping lemma. Then
0ⁿ10ⁿ ∈ L_pal. In reading 0ⁿ the FA must make
a loop. Omit the loop; contradiction.

Let's define L_pal inductively:

Basis: ε, 0, and 1 are palindromes.

Induction: If w is a palindrome, so are 0w0
and 1w1.

Circumscription: Nothing else is a palindrome.
134
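The inductive definition translates directly into a recursive test: the basis handles ε, 0, and 1, and the inductive step peels off a matching 0…0 or 1…1 pair:

```python
# Recursive palindrome test mirroring the basis/induction definition of L_pal.

def is_pal(w):
    if len(w) <= 1:          # basis: eps, 0, or 1
        return True
    # induction: w = 0x0 or w = 1x1 where x is a palindrome
    return w[0] == w[-1] and is_pal(w[1:-1])

print(is_pal('0110'))   # True
print(is_pal('01'))     # False
```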
CFG's are a formal mechanism for definitions
such as the one for L_pal.

1. P → ε
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1

0 and 1 are terminals.

P is a variable (or nonterminal, or syntactic
category).

P is in this grammar also the start symbol.

1–5 are productions (or rules).
135
Formal definition of CFG's

A context-free grammar is a quadruple

G = (V, T, P, S)

where

V is a finite set of variables.

T is a finite set of terminals.

P is a finite set of productions of the form
A → α, where A is a variable and α ∈ (V ∪ T)*.

S is a designated variable called the start symbol.
136
Example: G_pal = ({P}, {0,1}, A, P), where A =
{P → ε, P → 0, P → 1, P → 0P0, P → 1P1}.

Sometimes we group productions with the same
head, e.g. A = {P → ε | 0 | 1 | 0P0 | 1P1}.

Example: Regular expressions over {0,1} can
be defined by the grammar

G_regex = ({E}, {0,1}, A, E)

where A =
{E → 0, E → 1, E → E.E, E → E+E, E → E*, E → (E)}
137
Example: (simple) expressions in a typical
programming language. Operators are + and *, and
arguments are identifiers, i.e. strings in
L((a+b)(a+b+0+1)*).

The expressions are defined by the grammar

G = ({E,I}, T, P, E)

where T = {+, *, (, ), a, b, 0, 1} and P is the
following set of productions:

1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → a
6. I → b
7. I → Ia
8. I → Ib
9. I → I0
10. I → I1
138
Derivations using grammars

• Recursive inference, using productions from
body to head

• Derivations, using productions from head to
body.

Example of recursive inference:

       String     Lang  Prod  String(s) used
(i)    a          I     5     -
(ii)   b          I     6     -
(iii)  b0         I     9     (ii)
(iv)   b00        I     9     (iii)
(v)    a          E     1     (i)
(vi)   b00        E     1     (iv)
(vii)  a+b00      E     2     (v), (vi)
(viii) (a+b00)    E     4     (vii)
(ix)   a*(a+b00)  E     3     (v), (viii)
139
Let G = (V, T, P, S) be a CFG, A ∈ V,
{α, β} ⊆ (V ∪ T)*, and A → γ ∈ P.

Then we write

αAβ ⇒_G αγβ

or, if G is understood,

αAβ ⇒ αγβ

and say that αAβ derives αγβ.

We define ⇒* to be the reflexive and transitive
closure of ⇒, IOW:

Basis: Let α ∈ (V ∪ T)*. Then α ⇒* α.

Induction: If α ⇒* β, and β ⇒ γ, then α ⇒* γ.
140
Example: Derivation of a*(a+b00) from E in
the grammar of slide 138:

E ⇒ E*E ⇒ I*E ⇒ a*E ⇒ a*(E) ⇒
a*(E+E) ⇒ a*(I+E) ⇒ a*(a+E) ⇒ a*(a+I) ⇒
a*(a+I0) ⇒ a*(a+I00) ⇒ a*(a+b00)

Note: At each step we might have several rules
to choose from, e.g.

I*E ⇒ a*E ⇒ a*(E), versus
I*E ⇒ I*(E) ⇒ a*(E).

Note2: Not all choices lead to successful derivations
of a particular string, for instance

E ⇒ E+E

won't lead to a derivation of a*(a+b00).
141
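The derivation above can be replayed mechanically as string rewriting, since every variable of the slide-138 grammar is a single character. Each step replaces the leftmost variable by a chosen body:

```python
# Leftmost derivation as string rewriting in the grammar of slide 138.

productions = {                       # head -> list of bodies
    'E': ['I', 'E+E', 'E*E', '(E)'],
    'I': ['a', 'b', 'Ia', 'Ib', 'I0', 'I1'],
}

def leftmost_step(s, body_index):
    """Replace the leftmost variable in s by its body_index-th body."""
    for i, c in enumerate(s):
        if c in productions:
            return s[:i] + productions[c][body_index] + s[i + 1:]
    return s                          # no variable left: s is terminal

# The derivation of slide 141, encoded as a sequence of body choices.
choices = [2, 0, 0, 3, 1, 0, 0, 0, 4, 4, 1]
s = 'E'
for k in choices:
    s = leftmost_step(s, k)
print(s)   # a*(a+b00)
```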
Leftmost and Rightmost Derivations

Leftmost derivation ⇒_lm: Always replace the
leftmost variable by one of its rule-bodies.

Rightmost derivation ⇒_rm: Always replace the
rightmost variable by one of its rule-bodies.

Leftmost: The derivation on the previous slide.

Rightmost:

E ⇒_rm E*E ⇒_rm E*(E) ⇒_rm E*(E+E) ⇒_rm
E*(E+I) ⇒_rm E*(E+I0) ⇒_rm E*(E+I00) ⇒_rm
E*(E+b00) ⇒_rm E*(I+b00) ⇒_rm E*(a+b00) ⇒_rm
I*(a+b00) ⇒_rm a*(a+b00)

We can conclude that E ⇒*_rm a*(a+b00)
142
The Language of a Grammar

If G = (V, T, P, S) is a CFG, then the language of
G is

L(G) = {w ∈ T* : S ⇒*_G w}

i.e. the set of strings over T derivable from
the start symbol.

If G is a CFG, we call L(G) a
context-free language.

Example: L(G_pal) is a context-free language.

Theorem 5.7:
L(G_pal) = {w ∈ {0,1}* : w = wᴿ}

Proof: (⊇-direction.) Suppose w = wᴿ. We
show by induction on |w| that w ∈ L(G_pal).
143
Basis: |w| = 0, or |w| = 1. Then w is ε, 0,
or 1. Since P → ε, P → 0, and P → 1 are
productions, we conclude that P ⇒*_G w in all
base cases.

Induction: Suppose |w| ≥ 2. Since w = wᴿ,
we have w = 0x0, or w = 1x1, and x = xᴿ.

If w = 0x0 we know from the IH that P ⇒* x.
Then

P ⇒ 0P0 ⇒* 0x0 = w

Thus w ∈ L(G_pal).

The case for w = 1x1 is similar.
144
(⊆-direction.) We assume that w ∈ L(G_pal)
and must show that w = wᴿ.

Since w ∈ L(G_pal), we have P ⇒* w.

We do an induction on the length of the derivation.

Basis: The derivation P ⇒* w is done in one step.
Then w must be ε, 0, or 1, all palindromes.

Induction: Let n ≥ 1, and suppose the derivation
takes n+1 steps. Then we must have

w = 0x0 ⇐* 0P0 ⇐ P

or

w = 1x1 ⇐* 1P1 ⇐ P

where the second derivation takes n steps.

By the IH x is a palindrome, and the inductive
proof is complete.
145
Sentential Forms

Let G = (V, T, P, S) be a CFG, and α ∈ (V ∪ T)*.
If

S ⇒* α

we say that α is a sentential form.

If S ⇒*_lm α we say that α is a left-sentential form,
and if S ⇒*_rm α we say that α is a right-sentential
form.

Note: L(G) is those sentential forms that are
in T*.
146
Example: Take G from slide 138. Then E*(I+E)
is a sentential form since

E ⇒ E*E ⇒ E*(E) ⇒ E*(E+E) ⇒ E*(I+E)

This derivation is neither leftmost, nor rightmost.

Example: a*E is a left-sentential form, since

E ⇒_lm E*E ⇒_lm I*E ⇒_lm a*E

Example: E*(E+E) is a right-sentential form,
since

E ⇒_rm E*E ⇒_rm E*(E) ⇒_rm E*(E+E)
147
Parse Trees

• If w ∈ L(G), for some CFG, then w has a
parse tree, which tells us the (syntactic)
structure of w.

• w could be a program, an SQL-query, an
XML-document, etc.

• Parse trees are an alternative representation
to derivations and recursive inferences.

• There can be several parse trees for the same
string.

• Ideally there should be only one parse tree
(the "true" structure) for each string, i.e. the
language should be unambiguous.

• Unfortunately, we cannot always remove the
ambiguity.
148
Constructing Parse Trees

Let G = (V, T, P, S) be a CFG. A tree is a parse
tree for G if:

1. Each interior node is labelled by a variable
in V.

2. Each leaf is labelled by a symbol in V ∪ T ∪ {ε}.
Any ε-labelled leaf is the only child of its
parent.

3. If an interior node is labelled A, and its
children (from left to right) are labelled
X₁, X₂, …, Xₖ,
then A → X₁X₂⋯Xₖ ∈ P.
149
Example: In the grammar

1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
⋮

the following is a parse tree:

    E
  / | \
 E  +  E
 |
 I

This parse tree shows the derivation E ⇒* I+E.
150
Example: In the grammar

1. P → ε
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1

the following is a parse tree:

      P
    / | \
   0  P  0
    / | \
   1  P  1
      |
      ε

It shows the derivation P ⇒* 0110.
151
The Yield of a Parse Tree

The yield of a parse tree is the string of leaves
from left to right.

Important are those parse trees where:

1. The yield is a terminal string.

2. The root is labelled by the start symbol.

We shall see that the set of yields of these
important parse trees is the language of the
grammar.
152
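Computing the yield is a one-line recursion over the tree. The nested-tuple encoding below — (label, child, …, child) for interior nodes, plain strings for leaves, 'eps' for ε — is an assumed representation for illustration:

```python
# The yield of a parse tree: concatenate the leaves left to right;
# an eps-leaf contributes the empty string.

def tree_yield(t):
    if isinstance(t, str):                    # a leaf
        return '' if t == 'eps' else t
    label, *children = t                      # interior node
    return ''.join(tree_yield(c) for c in children)

# The parse tree of slide 151, with yield 0110:
tree = ('P', '0', ('P', '1', ('P', 'eps'), '1'), '0')
print(tree_yield(tree))   # 0110
```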
Example: Below is an important parse tree

          E
        / | \
       E  *  E
       |   / | \
       I  (  E  )
       |   / | \
       a  E  +  E
          |     |
          I     I
          |    / \
          a   I   0
             / \
            I   0
            |
            b

The yield is a*(a+b00).

Compare the parse tree with the derivation on
slide 141.
153
Let G = (V, T, P, S) be a CFG, and A ∈ V.

We are going to show that the following are
equivalent:

1. We can determine by recursive inference
that w is in the language of A.

2. A ⇒* w

3. A ⇒*_lm w, and A ⇒*_rm w

4. There is a parse tree of G with root A and
yield w.

To prove the equivalences, we use the following
plan.

(Plan diagram: recursive inference ⟶ parse tree ⟶ leftmost/rightmost derivation ⟶ derivation ⟶ recursive inference.)
154
From Inferences to Trees

Theorem 5.12: Let G = (V, T, P, S) be a
CFG, and suppose we can show w to be in
the language of a variable A. Then there is a
parse tree for G with root A and yield w.

Proof: We do an induction on the length of
the inference.

Basis: One step. Then we must have used a
production A → w. The desired parse tree is
then

A
|
w
155
Induction: w is inferred in n+1 steps. Suppose
the last step was based on a production

A → X₁X₂⋯Xₖ,

where Xᵢ ∈ V ∪ T. We break w up as

w₁w₂⋯wₖ,

where wᵢ = Xᵢ when Xᵢ ∈ T, and when Xᵢ ∈ V,
then wᵢ was previously inferred to be in Xᵢ, in
at most n steps.

By the IH there are parse trees Tᵢ with root Xᵢ
and yield wᵢ. Then the following is a parse tree
for G with root A and yield w:

        A
     /  |  \
   X₁   X₂ … Xₖ
   |    |     |
   w₁   w₂ … wₖ
156
From trees to derivations

We'll show how to construct a leftmost derivation
from a parse tree.

Example: In the grammar of slide 138 there
clearly is a derivation

E ⇒ I ⇒ Ib ⇒ ab.

Then, for any α and β there is a derivation

αEβ ⇒ αIβ ⇒ αIbβ ⇒ αabβ.

For example, suppose we have a derivation

E ⇒ E+E ⇒ E+(E).

Then we can choose α = E+( and β = ) and
continue the derivation as

E+(E) ⇒ E+(I) ⇒ E+(Ib) ⇒ E+(ab).

This is why CFG's are called context-free.
157
Theorem 5.14: Let G = (V, T, P, S) be a
CFG, and suppose there is a parse tree with
root labelled A and yield w. Then A ⇒*_lm w in G.

Proof: We do an induction on the height of
the parse tree.

Basis: Height is 1. The tree must look like

A
|
w

Consequently A → w ∈ P, and A ⇒_lm w.
158
Induction: Height is n+1. The tree must
look like

        A
     /  |  \
   X₁   X₂ … Xₖ
   |    |     |
   w₁   w₂ … wₖ

Then w = w₁w₂⋯wₖ, where

1. If Xᵢ ∈ T, then wᵢ = Xᵢ.

2. If Xᵢ ∈ V, then Xᵢ ⇒*_lm wᵢ in G by the IH.
159
Now we construct A ⇒*_lm w by an (inner) induction,
by showing that

∀i : A ⇒*_lm w₁w₂⋯wᵢXᵢ₊₁Xᵢ₊₂⋯Xₖ

Basis: Let i = 0. We already know that

A ⇒_lm X₁X₂⋯Xₖ.

Induction: Make the IH that

A ⇒*_lm w₁w₂⋯wᵢ₋₁XᵢXᵢ₊₁⋯Xₖ.

(Case 1:) Xᵢ ∈ T. Do nothing, since Xᵢ = wᵢ
gives us

A ⇒*_lm w₁w₂⋯wᵢXᵢ₊₁⋯Xₖ.
160
(Case 2:) Xᵢ ∈ V. By the IH there is a derivation
Xᵢ ⇒_lm α₁ ⇒_lm α₂ ⇒_lm ⋯ ⇒_lm wᵢ. By the
context-free property of derivations we can proceed
with

A ⇒*_lm
w₁w₂⋯wᵢ₋₁XᵢXᵢ₊₁⋯Xₖ ⇒_lm
w₁w₂⋯wᵢ₋₁α₁Xᵢ₊₁⋯Xₖ ⇒_lm
w₁w₂⋯wᵢ₋₁α₂Xᵢ₊₁⋯Xₖ ⇒_lm
⋯
w₁w₂⋯wᵢ₋₁wᵢXᵢ₊₁⋯Xₖ
161
Example: Let's construct the leftmost derivation
for the tree

          E
        / | \
       E  *  E
       |   / | \
       I  (  E  )
       |   / | \
       a  E  +  E
          |     |
          I     I
          |    / \
          a   I   0
             / \
            I   0
            |
            b