
LECTURE 34

LINEAR BOUNDED AUTOMATA

 A linear bounded automaton (LBA) is an abstract machine that is identical to a Turing machine, except that during a computation on a given input its tape head is not allowed to move outside a bounded region of its infinite tape, the number of accessible tape cells being a linear function of the input size.

 The tape itself has infinite length in order to accommodate inputs of arbitrary length.

 An equivalent alternative version of this model supplies the LBA with a separate finite-length tape for each input size, the length of the tape being a linear function of the input size.

o In this alternative version, there is a supply of infinitely many finite tapes, one tape for each input size.

 Like the Turing machine, an LBA is a finite state machine equipped with a tape.

 Unlike a Turing machine, an LBA system (the finite state machine plus its tape) has only finitely many system states for each input; that is, for each input, an LBA system behaves as a finite automaton.

 A pertinent difference between an LBA and an FSA is that the number of states of an FSA is fixed for that FSA, while the number of system states of an LBA grows with the size of the input given to that LBA.

 It is possible to show, using the pumping lemma for context-free languages, that there are languages recognized by some LBA that cannot be recognized by any push-down automaton. Linear bounded automata recognize exactly the class of context-sensitive languages.

 A linear bounded automaton is a multi-track non-deterministic Turing machine with a tape of bounded finite length:

Length = function (length of the initial input string, constant c)

Here,

Memory information ≤ c × Input information

 The computation is restricted to this bounded area. The input alphabet contains two special symbols which serve as the left and right end markers: transitions may move neither to the left of the left end marker nor to the right of the right end marker of the tape.

 A linear bounded automaton can be defined as an 8-tuple (Q, X, ∑, q0, ML, MR, δ, F) where −

o Q is a finite set of states

o X is the tape alphabet

o ∑ is the input alphabet

o q0 is the initial state

o ML is the left end marker

o MR is the right end marker, where MR ≠ ML

o δ is a transition function which maps each pair (state, tape symbol) to (state, tape symbol, constant c), where c can be 0, +1 or −1

o F is the set of final states

 Every language accepted by a deterministic linear bounded automaton is context-sensitive, and the emptiness problem for linear bounded automata is undecidable.

 LBAs have more power than NPDAs.

 LBAs have less power than Turing machines.

LECTURE 35

CONTEXT SENSITIVE LANGUAGE

Definition. A grammar is a quadruple (V, Σ, S, P), such that:

V is a finite set of variable symbols.

Σ is the alphabet (of terminal symbols) of the grammar. It is required that Σ ∩ V = ∅.

S ∈ V is the starting variable.

P is a finite set of productions πi of the form α → β, where α, β ∈ (Σ ∪ V)∗.

Definition. A grammar G defines a unique subset of (Σ ∪ V)∗, called the set of sentential forms of G, by the following rules:

S is a sentential form of G.

a' := a1βa2 is a sentential form if and only if a := a1αa2 is a sentential form, where a1, a2 ∈ (Σ ∪ V)∗ and α → β is a production of G.

In this case we say a derives a' in G and may write this as a |- a'.

Example. Let G = ({S}, {a}, S, {S → a, S → aSa}). Some derivation steps of G:

S |- a

S |- aSa

aSa |- aaa

a, aaa, aaaaa, aaaaaaa are all strings generated by G. Clearly L(G), the language generated by G, is {aⁿ | n is odd}.
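The derivation relation above can be explored mechanically. The following sketch (Python, written for this note; the encoding of productions as string pairs is an assumption of the illustration) enumerates sentential forms of the example grammar by breadth-first search and collects the terminal strings generated:

```python
from collections import deque

def generate(start, productions, max_len):
    """Enumerate strings of terminals derivable from `start`,
    exploring sentential forms up to length `max_len` (BFS)."""
    variables = {lhs for lhs, _ in productions}
    seen, queue, language = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(v in form for v in variables):
            language.add(form)          # no variables left: a generated string
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i >= 0:               # apply the production at every position
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return language

# G = ({S}, {a}, S, {S -> a, S -> aSa})
G = [("S", "a"), ("S", "aSa")]
print(sorted(generate("S", G, 7)))   # ['a', 'aaa', 'aaaaa', 'aaaaaaa']
```

The length cap makes the search finite; for this grammar it returns exactly the odd-length strings of a's up to the cap.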

Given a grammar G = (V, Σ, S, P), G is context-sensitive if every production in P is of the form

αAβ → αγβ,

with A ∈ V, α, β ∈ (Σ ∪ V)∗ and γ ∈ (Σ ∪ V)∗ \ {λ}. However, S → λ is allowed provided that there is no rule in P with S on the right-hand side. A language L ⊆ Σ∗ is a context-sensitive language if it is generated by some context-sensitive grammar G = (V, Σ, S, P). This means that for every string s ∈ L there is a derivation of s from S, using the productions in P.

Definition. Given a grammar G = (V, Σ, S, P), we say that G is length-increasing if for all productions α → β in P we have |α| ≤ |β|.

Context-sensitive languages can also be defined as the class of languages generated by length-increasing grammars. It is evident from the definition of context-sensitive grammars that they are necessarily length-increasing. The fact that all length-increasing grammars generate context-sensitive languages is slightly more surprising.

Given a length-increasing grammar G1, there is a context-sensitive grammar G2 such that L(G1) = L(G2).

Since G1 is length-increasing, each production can be written as

X0X1 . . . Xm → Y0Y1 . . . Yn, m ≤ n,

where Xi, Yj ∈ Σ ∪ V for all 0 ≤ i ≤ m and 0 ≤ j ≤ n. This is not necessarily a context-sensitive production. For instance, AB → CDE is length-increasing but not context-sensitive, since a context-sensitive production may rewrite only one variable of its left-hand side. Hence, to find an equivalent grammar, we need context-sensitive productions that derive the sentential form on the right from the one on the left. This can be done quite simply; in the above example AB → CDE can be rewritten as AB → XB, XB → XDE, X → C, provided that X doesn't appear in any other productions. In the general case:

X0X1 . . . Xm → Z0X1 . . . Xm

Z0X1 . . . Xm → Z0Z1 . . . Xm

...

Z0Z1 . . . Zm−1Xm → Z0Z1 . . . Zm−1Ym . . . Yn

Z0Z1 . . . Zm−1Ym . . . Yn → Y0Z1 . . . Zm−1Ym . . . Yn

...

Y0Y1 . . . Zm−1Ym . . . Yn → Y0Y1 . . . Ym−1Ym . . . Yn

where each Zk, 0 ≤ k ≤ m − 1, is a new variable. Note that each Zk appears only in the above set of productions. This, and the fact that the entire sentential form is recorded in the productions (as opposed to using Z0 → Y0, say), ensures that no derivations are possible other than the one we intend. Hence, applying the above scheme to all of the non-context-sensitive productions in G1 will yield a context-sensitive grammar G2 with L(G1) = L(G2).
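The general scheme is mechanical enough to write down as code. The following sketch (Python; the naming Z0, Z1, ... for the fresh symbols is our choice and they are assumed to occur nowhere else) converts one length-increasing production into a chain of productions, each of which rewrites a single symbol in context, exactly as in the scheme:

```python
def to_context_sensitive(lhs, rhs):
    """Split a length-increasing production lhs -> rhs (len(lhs) <= len(rhs))
    into a chain of productions, each rewriting one position in full context.
    lhs, rhs are lists of symbols; Z0, Z1, ... are assumed fresh."""
    assert len(lhs) <= len(rhs)
    m = len(lhs)
    fresh = [f"Z{k}" for k in range(m - 1)]
    chain, current = [], list(lhs)
    # Phase 1: replace X0..X(m-2) by fresh symbols, one at a time.
    for k in range(m - 1):
        nxt = current.copy()
        nxt[k] = fresh[k]
        chain.append((current, nxt))
        current = nxt
    # Phase 2: expand the last left-hand symbol into the tail of rhs.
    nxt = current[:-1] + list(rhs[m - 1:])
    chain.append((current, nxt))
    current = nxt
    # Phase 3: turn each fresh symbol into the corresponding Y, one at a time.
    for k in range(m - 1):
        nxt = current.copy()
        nxt[k] = rhs[k]
        chain.append((current, nxt))
        current = nxt
    return chain

for l, r in to_context_sensitive(["A", "B"], ["C", "D", "E"]):
    print("".join(l), "->", "".join(r))
# AB -> Z0B
# Z0B -> Z0DE
# Z0DE -> CDE
```

With X = Z0 this reproduces the AB → CDE example from the text.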

Definition. A nondeterministic linear bounded automaton (LBA) µ is a 5-tuple (A, Q, δ, q0, F) where the components are defined as follows:

A - The tape alphabet, defined as Σ ∪ {⊳, ⊲, ˩}. These extra symbols are the left end marker, the right end marker and the empty symbol, respectively. The read/write head cannot move beyond the end markers.

Q - The set of states. It is assumed that Q ∩ A = ∅

δ : Q × A → P((Q ∪ F) × A × {0, L, R}) - The transition function. The symbols 0, L, R indicate that the read/write head should remain in place, move left or move right, respectively. If (q′, a′, d) ∈ δ(q, ⊳) then d = R or d = 0. Similarly, if (q′, a′, d) ∈ δ(q, ⊲) then d = 0 or d = L.

q0 - The initial state, a unique element of Q.

F - The set of final states, where Q ∩ F = ∅. This ensures that
computation always halts when a final state is reached.

Definition. A configuration of an LBA µ is a string in (Q ∪ A)∗ that represents the work tape, the current state and the position of the read/write head. For example, µ in its initial state on input s = a0a1 . . . an would have configuration q0 ⊳ a0a1 . . . an ⊲. Note that the tape cell currently being scanned is to the right of the state symbol.

Definition. A computation of µ on a string s ∈ Σ∗ is a sequence of configurations (also called a run) of µ, c0, c1, . . . , cn, such that c0 = q0 ⊳ s ⊲, and if ci = ⊳ s1 q x s2 ⊲ and ci+1 = ⊳ s1 x′ q′ s2 ⊲ then (q′, x′, R) ∈ δ(q, x). Similarly, the read/write head may move left or remain in place, and this is reflected in ci, ci+1. An accepting computation is any run of µ whose final configuration cn contains a state in F.

LECTURE 36

CHOMSKY HIERARCHY OF FORMAL LANGUAGES

According to the Chomsky hierarchy, grammars are divided into 4 types:

Type 0, known as unrestricted grammars.

Type 1, known as context-sensitive grammars.

Type 2, known as context-free grammars.

Type 3, known as regular grammars.

Type 0: Unrestricted Grammar:

Type-0 grammars include all formal grammars. Type-0 grammar languages are recognized by Turing machines. These languages are also known as the recursively enumerable languages.

Grammar productions are of the form α → β, where

α is (V + T)* V (V + T)*

V : Variables
T : Terminals.

β is (V + T)*.
In Type 0 there must be at least one variable on the left side of a production.

For example,

Sab → ba
A → S.

Here, the variables are S, A and the terminals are a, b.

Type 1: Context Sensitive Grammar:

Type-1 grammars generate the context-sensitive languages. The languages generated by these grammars are recognized by linear bounded automata.
In Type 1:
I. First of all, a Type-1 grammar should be Type 0.
II. Grammar productions are of the form

α → β

|α| ≤ |β|

i.e. the count of symbols in α is less than or equal to that in β.

For Example,
S –> AB
AB –> abc
B –> b

Type 2: Context Free Grammar:

Type-2 grammars generate the context-free languages. The languages generated by these grammars are recognized by pushdown automata.
In Type 2,
1. First of all, it should be Type 1.
2. The left-hand side of a production can have only one variable:

| α | = 1.

There is no restriction on β.

For example,
S –> AB
A –> a
B –> b

Type 3: Regular Grammar:

Type-3 grammars generate regular languages. These languages are exactly all languages that can be accepted by a finite state automaton. Type 3 is the most restricted form of grammar.
Type-3 productions should be in one of the given forms only:

V –> VT* / T*

(or)

V –> T*V / T*

For example:
S –> ab.
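The containments Type 3 ⊆ Type 2 ⊆ Type 1 ⊆ Type 0 can be checked production by production. Below is a small sketch (Python; the encoding of productions as string pairs and the uppercase-variable convention are assumptions of this illustration, not part of the lecture):

```python
def chomsky_type(productions):
    """Return the most restrictive Chomsky type (3, 2, 1 or 0) that admits
    every production. Convention: uppercase letters are variables,
    lowercase letters are terminals; productions are (lhs, rhs) strings."""
    def single_var(s):
        return len(s) == 1 and s.isupper()

    def right_linear(rhs):   # only the last symbol may be a variable (V -> T*V / T*)
        return all(c.islower() for c in rhs[:-1])

    def left_linear(rhs):    # only the first symbol may be a variable (V -> VT* / T*)
        return all(c.islower() for c in rhs[1:])

    # Note: this checks each rule in isolation; a genuinely regular grammar
    # should additionally be uniformly left-linear or uniformly right-linear.
    if all(single_var(l) and r and (right_linear(r) or left_linear(r))
           for l, r in productions):
        return 3
    if all(single_var(l) and r for l, r in productions):
        return 2
    if all(len(l) <= len(r) for l, r in productions):
        return 1
    return 0

print(chomsky_type([("S", "aS"), ("S", "a")]))                  # 3
print(chomsky_type([("S", "AB"), ("A", "a"), ("B", "b")]))      # 2
print(chomsky_type([("S", "AB"), ("AB", "abc"), ("B", "b")]))   # 1
print(chomsky_type([("Sab", "ba"), ("A", "S")]))                # 0
```

The four test grammars are exactly the examples used in this lecture.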

LECTURE 37

BASIC DEFINITION AND DESCRIPTION OF LBA

Definition. Let LBA = (Q, T, V, q0, ♯, δ, F) be a Turing machine, where

δ : Q × (V \ {♯}) → 2^(Q × V × {Left, Right, Stay})

and

δ : Q × {♯} → 2^(Q × {♯} × {Left, Right, Stay}).

Then LBA is a (nondeterministic) linear bounded automaton.

One can observe that ♯ signs cannot be rewritten to any other symbol; therefore, the automaton can store results of its subcomputations only in the space provided by the input, i.e., in fact, only the length of the input can be used during the computation.

Example. Give an LBA that accepts the language {aⁱbⁱcⁱ ∣ i ∈ ℕ}.

Solution:

Idea:

 The automaton rewrites the first a to A, changes its state, and looks for the first b.

 The automaton rewrites the first b to B, changes its state, and looks for the first c.

 The automaton rewrites the first c to C, changes its state, and looks (backward) for the first a.

 The capital letters A, B, C are read without changing them.

 The above movements are repeated.

 If finally only capital letters remain between the border ♯ signs, then the automaton accepts the input.

Formally, let

LBA = ({q0, q1, q2, q3, q4, qf}, {a,b,c}, {a,b,c,A,B,C,♯}, q0, ♯, δ, {qf})

be a deterministic LBA, where δ consists of the following transitions:

1. δ (q0, ♯) = (qf, ♯, Right) – the empty word is accepted by LBA.

2. δ (q0, a) = (q1, A, Right) – the first (leftmost) a is rewritten to A and LBA changes its state.

3. δ (q0, B) = (q4, B, Right) – if a B is read in state q0 (every a has already been rewritten to A), the state is changed to q4.

4. δ (q1, a) = (q1, a, Right) – letter a is skipped in state q1 to the right.

5. δ (q1, B) = (q1, B, Right) – capital B is also skipped.

6. δ (q1, b) = (q2, B, Right) – the leftmost b is rewritten to B and the state becomes q2.

7. δ (q2, b) = (q2, b, Right) – letter b is skipped in state q2 to the right.

8. δ (q2, C) = (q2, C, Right) – capital C is also skipped in this state.

9. δ (q2, c) = (q3, C, Left) – the leftmost c is rewritten to C and LBA changes its state to q3.

10. δ (q3, a) = (q3, a, Left) – letters a, b are skipped in state q3

11. δ (q3, b) = (q3, b, Left) – by moving the head of the automaton to the left.

12. δ (q3, C) = (q3, C, Left) – capital letters C, B are skipped in state q3

13. δ (q3, B) = (q3, B, Left) – by moving the head of the automaton to the left.

14. δ (q3, A) = (q0, A, Right) – when the rightmost A is found, the head steps to the cell after it and the state is changed back to q0.

15. δ (q4, B) = (q4, B, Right) – in state q4 capital letters B and C are skipped

16. δ (q4, C) = (q4, C, Right) – by moving the head to the right.

17. δ (q4, ♯) = (qf, ♯, Left) – if in q4 there were only capital letters on the tape up to the right border marker, LBA accepts.

The work of the automaton can be described as follows: it is clear by transition 1 that λ is accepted.

Otherwise the head reads the first letter of the input: if the input starts with an a, then it is replaced by A and q1 is the new state. If the first letter of the input is not an a, then LBA gets stuck, i.e., it halts without acceptance, since there are no transitions defined in this state for the other input letters.

In state q1 LBA looks for the first b by moving to the right (skipping every a and B, if any, and halting without acceptance on finding other letters before the first b). When a b is found, it is rewritten to a B and the automaton changes its state to q2.

In q2 the first c is searched for; the head can move through b's and C's (but not other letters) to the right. When it finds a c, it rewrites it to a C, changes the state to q3 and starts to move back to the left.

In q3 the head can go through the letters a, B, b, C to the left, and when it finds an A it steps to the right and the state becomes q0. In q0, if there is an a under the head, then it is rewritten to an A and the whole loop starts again.

If in q0 a letter B is found (this happens when every a has already been rewritten to an A), LBA changes its state to q4. In this state, by stepping to the right, LBA checks whether every b and c has already been rewritten; if so, i.e., their number is the same as the number of a's and their order was correct (the input is in the language a*b*c*), then LBA reaches the marker ♯ after the input and accepts.

When at any stage a letter other than the expected one (by the previous description) is read, LBA halts without accepting the input.

Thus, the accepted language is exactly {aⁱbⁱcⁱ ∣ i ∈ ℕ}.
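The behaviour described above can be checked directly with a small simulator. This is a sketch in Python, with the transition table transcribed from the prose description (♯ is written as '#'; the step bound is a safety net we add, since this machine always halts on its own):

```python
# Transition table of the deterministic LBA for {a^i b^i c^i}:
# key = (state, scanned symbol), value = (new state, written symbol, move).
DELTA = {
    ('q0', '#'): ('qf', '#', 'R'),   # empty word: accept
    ('q0', 'a'): ('q1', 'A', 'R'),   # mark the leftmost a
    ('q0', 'B'): ('q4', 'B', 'R'),   # no a left: start the final check
    ('q1', 'a'): ('q1', 'a', 'R'),
    ('q1', 'B'): ('q1', 'B', 'R'),
    ('q1', 'b'): ('q2', 'B', 'R'),   # mark the leftmost b
    ('q2', 'b'): ('q2', 'b', 'R'),
    ('q2', 'C'): ('q2', 'C', 'R'),
    ('q2', 'c'): ('q3', 'C', 'L'),   # mark the leftmost c, turn around
    ('q3', 'a'): ('q3', 'a', 'L'),
    ('q3', 'b'): ('q3', 'b', 'L'),
    ('q3', 'B'): ('q3', 'B', 'L'),
    ('q3', 'C'): ('q3', 'C', 'L'),
    ('q3', 'A'): ('q0', 'A', 'R'),   # back at the last A: next round
    ('q4', 'B'): ('q4', 'B', 'R'),
    ('q4', 'C'): ('q4', 'C', 'R'),
    ('q4', '#'): ('qf', '#', 'L'),   # only capitals seen: accept
}

def lba_accepts(w):
    """Run the deterministic LBA on input w between the two '#' end markers."""
    tape = ['#'] + list(w) + ['#']
    state, pos = 'q0', 1
    for _ in range(100 * (len(w) + 2) ** 2):   # safety bound on the O(n^2) run
        if state == 'qf':
            return True
        key = (state, tape[pos])
        if key not in DELTA:                   # undefined transition: reject
            return False
        state, tape[pos], move = DELTA[key]
        pos += {'L': -1, 'R': 1}[move]
    return False

print([w for w in ('', 'abc', 'aabbcc', 'ab', 'abcc', 'cba') if lba_accepts(w)])
# ['', 'abc', 'aabbcc']
```

Each round of the loop marks one a, one b and one c, so a full run takes O(n) rounds of O(n) steps.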

Theorem. The class of languages accepted by linear bounded automata and the class of context-sensitive languages coincide.

Proof. We do not give a formal proof here; instead we present the idea of a proof. A context-sensitive language can be defined by a monotone grammar. In a monotone grammar (apart from the derivation of the empty word, if it is in the language), the length of the sentential form cannot be decreased by any step of the derivation. Consequently, starting from the derived word and applying the rules in inverse, the length is monotonically decreasing until we obtain the start symbol. In this way every context-sensitive language can be accepted by an LBA.

The other way around, the work of an LBA can be described by a grammar working in the inverse way of the generation (starting from a word, the start symbol is the target). These grammars are similar to the previously used monotone grammars, and thus, if an LBA accepts a language L, then L is context-sensitive.

QED.

Actually, there are other models of LBA, in which the workspace (the maximal tape length during the computation) is limited by c1 · |w| + c0, where w is the input and c0, c1 ∈ ℝ are constants. The accepting power of these models is the same as that of the model presented here.

However, the deterministic model is more interesting, since it is related to a long-standing and still open question.

It is known that every context-sensitive language can be accepted by a deterministic Turing machine using at most c2 · |w|² + c1 · |w| + c0 space during the computation, where c2, c1, c0 are constants. However, it is neither proven nor disproved that deterministic linear bounded automata (using c1 · |w| + c0 space) can recognize every context-sensitive language. This is still a famous open problem.

LECTURE 38

ORGANIZATION OF LBA

 In 1960, Myhill introduced an automaton model today known as the deterministic linear bounded automaton.

 Shortly thereafter, Landweber proved that the languages accepted by a deterministic LBA are always context-sensitive.

 In 1964, Kuroda introduced the more general model of (nondeterministic) linear bounded automata, and showed that the languages accepted by them are precisely the context-sensitive languages.
 Suppose that a given LBA M has
o q states,
o m characters in the tape alphabet,
o and the input length is n.
 Then M can be in at most q · n · mⁿ configurations (a configuration is determined by the state, the head position and the tape contents).
 The halting problem is solvable for linear bounded automata.
o HaltLBA = {< M, w > | M is an LBA and M halts on w} is decidable.
o An LBA that stops on input w must stop within α(|w|) steps, where α(n) is the bound on the number of configurations: a longer run repeats a configuration and therefore never halts.
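The counting argument behind this is elementary: a configuration is fixed by the state, the head position and the contents of the n tape cells. A quick sanity check of the bound (the function name is ours):

```python
def config_bound(q, m, n):
    """Upper bound on distinct LBA configurations:
    q states x n head positions x m**n possible tape contents."""
    return q * n * m ** n

# To decide halting, simulate for config_bound(q, m, n) + 1 steps;
# a deterministic LBA still running has repeated a configuration
# and therefore loops forever.
print(config_bound(2, 3, 4))   # 648
```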
 The membership problem is solvable for linear bounded
automata.

o ALBA = {< M,w > |M is an LBA and M accepts w} is
decidable.
 The emptiness problem is unsolvable for linear bounded
automata.
o For every Turing machine there is a linear bounded
automaton which accepts the set of strings which are valid
halting computations for the Turing machine.
 A language is accepted by an LBA iff it is context sensitive.
Proof:
If L is a CSL, then L is accepted by some LBA.
Let G = (N, Σ, S, P) be the given grammar such that L(G) = L.
Construct an LBA M with tape alphabet Σ × (N ∪ Σ) (a 2-track machine).
The first track holds the input string w; the second track holds a sentential form α of G, initialized to S.
If w = λ, M halts without accepting. Repeat:
1. Non-deterministically select a position i in α.
2. Non-deterministically select a production β → γ of G.
3. If β appears beginning in position i of α, replace β by γ there.
a. If the sentential form becomes longer than w, the LBA halts without accepting.
4. Compare the resulting sentential form with w on track 1. If they match, accept. If not, go to step 1.
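Because the sentential form may never outgrow |w|, only finitely many forms are reachable, so the nondeterministic loop can be simulated by exhaustive search. A sketch in Python (a deterministic breadth-first search standing in for the nondeterminism; the grammar encoding is an assumption of this illustration):

```python
from collections import deque

def csl_member(w, productions, start="S"):
    """Decide w in L(G) for a monotone grammar G by searching all
    sentential forms of length at most len(w)."""
    if w == "":
        return False                      # the construction rejects the empty word
    seen, queue = {start}, deque([start])
    while queue:
        alpha = queue.popleft()
        if alpha == w:                    # track 2 matches track 1: accept
            return True
        for beta, gamma in productions:   # choose a production ...
            i = alpha.find(beta)
            while i >= 0:                 # ... and a position where it applies
                new = alpha[:i] + gamma + alpha[i + len(beta):]
                # never let the sentential form outgrow the input
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = alpha.find(beta, i + 1)
    return False

# A monotone grammar for {a^n b^n c^n | n >= 1}:
G = [("S", "abc"), ("S", "aSBc"), ("cB", "Bc"), ("bB", "bb")]
print([w for w in ("abc", "aabbcc", "aabbc", "abcabc") if csl_member(w, G)])
# ['abc', 'aabbcc']
```

Monotonicity is what makes the length cutoff sound: a form longer than w can never shrink back.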
 If there is a linear bounded automaton M accepting the language L, then there is a context-sensitive grammar generating L − {λ}.
 The derivation simulates moves of the LBA.
 Three types of productions are used:
o Productions that can generate two copies of a string in Σ∗, along with some symbols that act as markers to keep the two copies separate.
o Productions that can simulate a sequence of moves of M. During this portion of a derivation, one of the two copies of the original string is left unchanged; the other, representing the input tape of M, is modified accordingly.
o Productions that can erase everything but the unmodified copy of the string, provided that the simulated moves of M applied to the other copy cause M to accept.

 Intuitively, an LBA is a (single-tape) nondeterministic TM using linear space.

 Formally, an LBA is a nondeterministic TM such that:

o Its input alphabet includes two special symbols # and $, the left and right endmarkers.

o The LBA has no moves left from # or right from $, nor may it print another symbol over # or $.

o The language accepted by a linear-bounded TM M is

{w | w ∈ (Σ − {#, $})*, q0#w$ |--*M αqβ for some q ∈ F}.

LECTURE 39

PROPERTIES OF CONTEXT SENSITIVE LANGUAGES-1

In this section, we prove that the class of context-sensitive languages is closed under union, concatenation and Kleene star. It is also closed under the other set-theoretical operations.

Theorem. The class of context-sensitive languages is closed under the regular operations.

Proof. The proof is constructive. Let L1 and L2 be two context-sensitive languages. Let the monotone grammars G1 = (N1, T, S1, P1) and G2 = (N2, T, S2, P2) be in Kuroda normal form and generate the languages L1 and L2, respectively, such that N1 ∩ N2 = ∅ (this can be achieved by renaming the nonterminals of a grammar without affecting the generated language).

First, we show the closure under union.

 If λ ∉ L1 ∪ L2, then let

G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1, S → S2}),

where S ∉ N1 ∪ N2 is a new symbol. It can be seen that G generates the language L1 ∪ L2.

 If λ ∈ L1 ∪ L2 (i.e., S1 → λ ∈ P1 and/or S2 → λ ∈ P2), then let

G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1, S → S2, S → λ} \ {S1 → λ, S2 → λ}),

where S ∉ N1 ∪ N2. In this way, G generates the language L1 ∪ L2.

The closure under concatenation is proven for the following four cases:

 If λ ∉ L1 and λ ∉ L2, then let

G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2}),

where S ∉ N1 ∪ N2 is a new symbol.

 If λ ∈ L1 and λ ∉ L2, then let

G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S2} \ {S1 → λ}),

where S is a new symbol.

 If λ ∉ L1 and λ ∈ L2, then let

G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S1} \ {S2 → λ}),

where S ∉ N1 ∪ N2 is a new symbol.

 If λ ∈ L1 and λ ∈ L2, then let

G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S1, S → S2, S → λ} \ {S1 → λ, S2 → λ}),

where S ∉ N1 ∪ N2.

It can be easily seen that G generates the language L1L2.

Finally, let us consider the closure under Kleene star. Let now G1 = (N1, T, S1, P1) and G2 = (N2, T, S2, P2) be grammars in Kuroda normal form, both generating the language L, such that N1 ∩ N2 = ∅.

Let

G = (N1 ∪ N2 ∪ {S, S'}, T, S,

P1 ∪ P2 ∪ {S → λ, S → S1, S → S1S2, S → S1S2S', S' → S1, S' → S1S2, S' → S1S2S'} \ {S1 → λ, S2 → λ}),

where S, S' ∉ N1 ∪ N2 are new symbols. Then G generates the language L*.

QED.
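In the λ-free case the union construction is a one-liner once grammars are data. A sketch (Python; the representation (nonterminals, start, productions) and the name of the fresh start symbol are assumptions of this illustration):

```python
def union_grammar(g1, g2, fresh="S_new"):
    """Grammar for L(G1) union L(G2), following the construction above.
    g = (nonterminals, start, productions); assumes N1 and N2 are disjoint,
    lambda is in neither language, and `fresh` occurs in neither grammar."""
    n1, s1, p1 = g1
    n2, s2, p2 = g2
    assert not (n1 & n2) and fresh not in (n1 | n2)
    # Keep both rule sets and add the two new start productions S -> S1, S -> S2.
    return (n1 | n2 | {fresh}, fresh, p1 + p2 + [(fresh, s1), (fresh, s2)])

# {a^n : n odd} union {b^n : n >= 1}
g1 = ({"S1"}, "S1", [("S1", "a"), ("S1", "aS1a")])
g2 = ({"S2"}, "S2", [("S2", "b"), ("S2", "bS2")])
n, s, p = union_grammar(g1, g2)
print(s, p[-2:])   # S_new [('S_new', 'S1'), ('S_new', 'S2')]
```

The disjointness assertion matters: without N1 ∩ N2 = ∅, derivations of the two grammars could interfere.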

The closure of the class of context-sensitive languages under complementation was a famous problem that remained open for more than 20 years. In the 1980s, Immerman and Szelepcsényi solved this problem independently. We present this result without proof.

Theorem. The class of context-sensitive languages is closed under complementation.

Theorem. The class of context-sensitive languages is closed under intersection.

Proof. The proof uses the fact that this class is closed both under union and complementation. Let us consider the context-sensitive languages L1 and L2. Then the complement of each of them is context-sensitive according to the theorem of Immerman and Szelepcsényi. Their union is also context-sensitive, as we have proven constructively. The complement of this language is also context-sensitive. However, this language is the same as L1 ∩ L2 by De Morgan's law.

QED.

LECTURE 40

PROPERTIES OF CONTEXT SENSITIVE GRAMMAR-2

 Context-free languages, based on grammars with productions A → α, are very important since they describe many aspects of programming languages and admit very efficient parsers.

 CFLs have a natural machine model (the PDA) that is useful, e.g., to evaluate arithmetic expressions.

 Properties of CFLs are mostly undecidable, emptiness and finiteness being notable exceptions.

 Hewlett-Packard figured out 40 years ago that reverse Polish notation is by far the best way to perform lengthy arithmetic calculations: it is very easy to implement with a stack.
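The stack evaluator fits in a few lines. A sketch in Python:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(expr):
    """Evaluate a reverse Polish expression given as a whitespace-separated
    string, e.g. '3 4 + 2 *' for (3 + 4) * 2."""
    stack = []
    for token in expr.split():
        if token in OPS:
            b = stack.pop()          # the right operand is on top
            a = stack.pop()
            stack.append(OPS[token](a, b))
        else:
            stack.append(float(token))
    result, = stack                  # a well-formed expression leaves one value
    return result

print(eval_rpn("3 4 + 2 *"))   # 14.0
```

Note that RPN needs no parentheses and no precedence rules; operator order is encoded entirely in the token sequence.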

 Why the hedging about "aspects of programming languages"?

Because some properties of programming languages lie beyond the power of CFLs. Here is a classical example: variables must be declared before they can be used.

begin

int x;

...

x = 17;

...

end

Obviously, we would want the compiler to address this
kind of situation.

 To deal with problems like this one we need to strengthen our grammars. The key is to remove the constraint of being "context-free."

 This leads to another important grammar-based class of languages: the context-sensitive languages. As it turns out, CSLs are plenty strong enough to describe programming languages, but in the real world it does not matter; it is better to think of a programming language as being context-free, plus a few extra constraints.

 As we will see shortly, though, the associated machines (linear bounded automata) are quite important in complexity theory.

 A context-sensitive grammar (CSG) is a grammar where all productions are of the form

αAβ → αγβ where γ ≠ ε

Some authors also allow S → ε, in which case S may not appear on the right-hand side of any production. A language is context-sensitive if it can be generated by a context-sensitive grammar.

Note the constraint that the replacement string γ ≠ ε; as a consequence we have

α ⇒ β implies |α| ≤ |β|.

This should look familiar from our discussion of ε-free CFGs.

 Lemma. Every context-sensitive language is decidable.

 Theorem. Context-sensitive languages are closed under union, concatenation, Kleene star and reversal. They are also closed under ε-free homomorphisms.

 Theorem (Kuroda). Every context-sensitive grammar can
be written with productions of the form

A → BC,  AB → CD,  A → a
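Kuroda normal form is easy to verify mechanically. A sketch (Python; productions encoded as pairs of symbol tuples, with the uppercase-variable convention assumed for this illustration):

```python
def in_kuroda_nf(productions):
    """Check that every production has one of the three Kuroda shapes:
    A -> BC, AB -> CD, or A -> a (uppercase = variable, lowercase = terminal)."""
    def var(s):  return s.isupper()
    def term(s): return s.islower()

    def ok(lhs, rhs):
        if len(lhs) == 1 and var(lhs[0]):
            # A -> a  or  A -> BC
            return (len(rhs) == 1 and term(rhs[0])) or \
                   (len(rhs) == 2 and var(rhs[0]) and var(rhs[1]))
        if len(lhs) == 2 and all(map(var, lhs)):
            # AB -> CD
            return len(rhs) == 2 and all(map(var, rhs))
        return False

    return all(ok(l, r) for l, r in productions)

print(in_kuroda_nf([(("A",), ("B", "C")),
                    (("A", "B"), ("C", "D")),
                    (("A",), ("a",))]))              # True
print(in_kuroda_nf([(("A", "B"), ("a", "b", "c"))])) # False
```

Every shape in the normal form is length-preserving or length-increasing, which is why Kuroda-form grammars are monotone.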

 Theorem. It is undecidable whether a CSG generates the empty language.

 The development happened in stages:

o Myhill (1960) considered deterministic LBAs.

o Landweber (1963) showed that they produce only context-sensitive languages.

o Kuroda (1964) generalized the model to nondeterministic LBAs and showed that these accept precisely the context-sensitive languages.

 Theorem. A language is accepted by a (nondeterministic) LBA iff it is context-sensitive.
