Sie sind auf Seite 1von 57

Automata and Language Theory

First Semester SY 2009-2010


To understand language
To know the fundamental units of
language
How can you define a language?
English
Filipino
Japanese
Chinese
Etc…
English
Different entities: letters, words, and
sentences
Groups of letters make up words
Groups of words make up sentences
Not all collection of letters form a valid
word and as well as not all collection of
words form a valid sentence.
Computer Languages
Certain character strings are recognizable words
(DO, IF, END, …)
Strings of words that are recognizable commands
Sets of commands becomes a program ( with or
without data) that can be compiled.
It is necessary for us to adopt a definition of a
“Language Structure” that is, a structure in which
the decision of whether a given string of units
constitutes a valid larger unit is not a matter of
guesswork, but based on explicitly stated rules.
Theory of Formal Languages
“formal” refers to the fact that all rules
for the languages are explicitly stated in
terms of what strings of symbols can occur.
No liberties are tolerated, and no reference
to any “deeper understanding” is required.
Languages - will be considered solely as
symbols on paper and not as expressions of
ideas in the minds of humans.
Not a communication among intellects, but
a game of symbols with formal rules.
The term “formal”, emphasizes that it is
the form of the string of symbols we are
interested in, not the meaning.
Alphabet – finite set of fundamental
units out of which we build structures.
Language – a certain specified set of
strings of characters from the alphabet.
Words – those strings that are
permissible in the language.
Λ (empty string or null string) – a string
having no letter
No matter what “alphabet” we are
considering, the null string will always
Λ and for all languages the null word, if
it is a word in the language, is also Λ
Two words are considered the same if
all their letters are the same and in the
same order.
φ - language that has no words
A subtle but important difference between a word
that has no letters, Λ, and the language that has
no words, φ
It is not true that Λ is a word in the language
φ since this language has no words at all.
A certain language L does not contain the
word Λ and we wish to add it to L, we use the
“union of sets” operation denoted by “+” to
form L + {Λ}.
A certain language L does not contain the
word Λ and we wish to add it to L, we use the
“union of sets” operation denoted by “+” to
form L + {Λ}.
This language is not equal or same as L.
Nevertheless, L + φ is the same as L since no new
word have been added.
The fact that φ is a language even though it
has no words will turn out to be an important
distinction.
English – familiar example of language
The alphabet is the usual set of letters plus
the apostrophe and hyphen. Let us denote
it by a Greek letter capital sigma:

Σ = { a b c d e . . . Z ’ -}

Σ, Γ, φ, Λ - time-honored tradition
If we wished to be supermeticulous, we
would also include in Σ the uppercase
letters and the seldom used diacritical
marks.
Valid Words in the Language
- Strings, more or less, as is done in a
dictionary say,
ENGLISH-WORDS={all the words in a standard
dictionary}
{} – mathematical notational denoting a
set
o Does not have any grammar
o Formal definition of language of the
sentences in English , we must begin by
saying that this time our basic alphabet is
the entries in the dictionary.
Consider Γ, the capital gamma, as the
alphabet

Γ= {the entries in a standard dictionary, plus a


blank space, plus the usual
punctuation marks}

Language, ENGLISH-SENTENCES
Rule: Grammatical Rules
The trick of defining the language ENGLISH-
SENTENCES by listing all the rules of English
grammar allows us to give a finite description of
If we go by the rules of grammar only,
Examine:
“I ate three Tuesdays”
- this string is allowed in formal language
- grammatically correct; only its meaning reveals that it is
ridiculous
Meaning is something that we do not refer to
in formal languages.
Syntax, not semantics or diction
“just like the bad teacher who is interested only
in the correct spelling, not the ideas in a
homework composition”
Two ways of defining Abstract Languages we
treat
Either they will be presented as an alphabet and
the exhaustive list of all valid words, or
They will be presented as an alphabet and a set of
rules defining the acceptable words.
Defining a Language – presenting the
alphabet and then specifying which strings
are words.
The word “specify” is trickier than we may at
first suppose
Consider the Language MY-PET and the
alphabet for this language is
{a c d g o t}
- there is only one word for this language, and for
our own perverse reasons we wish to specify it by
this sentence:
If the Earth and the Moon ever collide, then
MY-PET = {cat}
but, if the Earth and the Moon never collide, then
MY-PET = {dog}
 MY-PET issue
 One or the other of these two events will occur, but
at this point in history of the universe, it is
impossible to be certain whether the word dog is or
is not in the language MY-PET.
 This sentence is therefore not an adequate
specification of the language MY-PET because it is
not useful
 To be an acceptable specification of a language, a
set of rules must enable us to decide, in a finite
amount of time, whether a given string of alphabet
letters is or is not a word in the language.
MY-PET issue continued…
It is not a requirement that all the letters in
the alphabet need to appear in the words
selected for the language.
Language Different entities:
letters/alphabets, words, and sentences
Λ
empty string or null string
a string having no letter
φ
A language with no words
+
union of sets
Automata and Language Theory
Define words in a language
The set of language-defining rules can
be of two kinds
- They can either tell us how to test a
string of alphabet letters that we might be
presented with, to see if it is a valid word,
or
- They can tell us how to construct all the
words in the language by some clear
procedures
Consider an alphabet having only one
letter, the letter x,
Σ = {x}
Language definition: any nonempty string of
alphabet characters is a word:
L1= {x xx xxx xxxx . . .}
Alternate form:
L1= {xn for n = 1 2 3 . . . }
Concatenation – two strings are written
down side by side to form a new longer
string
Example: to concatenate the word xxx and
the word xx, we obtain the word xxxxx.
The words in this language are analogous
to positive integers, and the operation of
concatenation is analogous to addition.

xn concatenated with xm is the word xn+m


Concatenation continued…
It is often convenient to designate the
words in a given language by new
symbols, that is, other than the ones in the
alphabet. Say,
The word xxx is called a and the word xx is b.
Concatenating a and b,
ab = xxxxx
It is not always true that when two words
are concatenated they produce another
word in the language
Concatenation continued…
For example, the language is
L2 = {x xxx xxxxx xxxxxxx . . . }
= {xodd}
= {x2n+1 for n = 0 1 2 3 . . .}
Then, a = xxx and b = xxxxx are both words in
L2, but their concatenation ab = xxxxxxxx is
not in L2.
In these simple examples, when we concatenate a
with b, we get the same word as when we
concatenate b with a.
ab = ba
Concatenation continued…
ab = ba
This relationship does not hold for all
languages. In English when we
concatenate “house” and “boat,” we get
“houseboat,” which is indeed a word but
distinct from “boathouse,” which is
different thing – not because they have
different meanings, but because they are
different words.
“Merry-go-round” and “carousel” mean the
same thing, but they are different words.
Example
Consider another language. Let us begin
with the alphabet:
∑={0 1 2 3 4 5 6 7 8 9}
and define the set of words:
L3= {any finite string of alphabet letters
that does not start with the letter zero}
L3= { 1 2 3 4 5 6 7 8 9 10 11 12 . .
.}
Defining Language rules
… how to test …
L1= {xn for n = 1 2 3 . . . }
… how to construct …
L3= {any finite string of alphabet letters that
does not start with the letter zero}
DEFINITION
We define the function length of a string to be a
number of letters in the string.
Example:
a = xxxx in the language L1
length(a) = 4
if c = 428 in the language L3
length© = 3
Or, we could write directly that in L1
length(xxxx) = 4
and in L3
length(428) = 3
Example:
length(Λ) = 0
if length(w) = 0, then w = Λ.

therefore, another definition of L3


L3={ any finite string of alphabet letters that, if it
has a length more than 1, does not start with a
zero}
There is some inherent ambiguity in the
phrase “any finite string,” since it is not clear
whether we intend to include the null string
(Λ, the string of no letters).
We’ll define a language like L1 but that
does contain Λ.
L4 = { Λ x xx xxx xxxx . . .
}
= { xn for n = 0 1 2 3 . . .
}

Here x0 = Λ, not x0 = 1 as in algebra.


DEFINITION
Let us introduce the function reverse. If a
is a word in some language L, then
reverse(a) is the same string of letters
spelled backward, called the reverse of a,
even if this backward string is not a word
in L.
EXAMPLE
reverse(xxx) = xxx
reverse(xxxx) = xxxx
reverse(145) = 541

But let us also note that in L3


reverse(140) = 041
which is not a word in L3.
DEFINITION
Let us define a new language called
PALINDROME over the alphabet ∑ = {a
b}

PALINDROME = {Λ, all strings x such that


reverse(x)=x}

If we begin listing the elements in


PALINDROME, we find
PALINDROME = {Λ a b aa bb aaa aba
bab bbb
aaaa abba . . .}
DEFINITION
Given an alphabet ∑, we wish to define a
language in which any string of letters from ∑
is a word, even the null string. This language
we shall call the closure of the alphabet.
It is denoted by writing a star (an asterisk)
after the name of the alphabet as a
superscript:
∑*
This notation is sometimes known as the
Kleene star after the logician who was one
of the founders of this subject.
EXAMPLE
If ∑ = {x}, then
∑*=L4= {Λ x xx xxx . . .}

EXAMPLE
If ∑ = { 0 1 }, then
∑* = {Λ 0 1 00 01 10 11 000 001 . . .}

EXAMPLE
If ∑ = {a b c}, then
∑* ={Λ a b c aa ab ac ba bb bc ca
cb cc aaa …}
We can think of the Kleene star as an operation
that makes an infinite language of strings of letters
out of an alphabet. Infinitely many words, each of
finite length.

Lexicographic order
words arranged in size order (words of shortest length first)
and then listed all the words of the same length
alphabetically.
We shall usually follow this method of sequencing a
language.

DEFINITION
If S is a set of words, then by S* we mean the set of
all finite strings formed by concatenating words from
Let us not make the mistake of confusing the two
language

ENGLISH-WORDS* and ENGLISH-SENTENCES

the first language contains the word butterbutterbutterhat,


the second does not.
Words in the ENGLISH-WORDS are the concatenate of
arbitrarily many words from ENGLISH-WORDS
While ENGLISH-SENTENCES are restricted to juxtaposing
only words from ENGLISH-WORDS in an order that compiles
with the rules of grammar.
EXAMPLE
If S = {aa b}, then
S*= {Λ plus any word composed of factors of aa
and b}
= {Λ plus all strings of a’s and b’s in which the
a’s
occur in even clumps}
= {Λ b aa aab baa bbb aaaa aabb baab
bbaa
bbbb aaaab aabaa aabbb baaaa baabb
bbaab … }

The string aabaaab is not S* since it has a clump of a’s


of length 3.
EXAMPLE
If S = {a ab}, then
S*= {Λ plus any word composed of factors of a
and ab}
= {Λ plus all strings of a’s and b’s except those
that
start with b and those that contain double
b}
= {Λ a aa ab aaa aab aba aaaa aaab
aaba abaa
abab aaaaa aaaab aaaba aabaa aabab
abaaa
abaab ababa ...}
To prove that a certain word is in the closure
language S*, we must show how it can be
written as a concatenate of words from the
base set S.

In the last example, to show that abaab is


in S*, we can factor as follows:
(ab)(a)(ab)
These three factors are all in the set S;
therefore, their concatenation is in S*. This is
the only way to factor this string into factors
of (a) and (ab). When this happens, we say
Sometimes factoring is not unique. For
example, consider S= {xx xxx}. Then
S*= {Λ and all strings of more than one
x}
= {xn for n = 0 2 3 4 5 . . .}
= {Λ xx xxx xxxx xxxxx xxxxxx . .
.}

Notice that the word x is not in the language


S*. The string xxxxxxx is in the closure for
any of these three reasons. It is
(xx)(xx)(xxx) or (xx)(xxx)(xx) or
Proof by constructive algorithm - the method of
proving that something exists by showing how to
create it.

If the alphabet has no letters, then its closure is the


language with the null string as its only word,
because Λ is always a word in a Kleene closure.
Symbolically, we write
If ∑ = φ (the empty set),
∑*= {Λ }
This is not the same as
if S = {Λ },
then S*= {Λ }
which is also true for a different reason, that is Λ Λ=
The Kleene closure of two sets can end
up being he same language even if the
two sets that we started with were not.
EXAMPLE
Consider the following languages
S = { a b ab} and T= {a b bb}
Then both S* and T* are languages of a’s and
b’s since any string of a’s and b’s can be
factored into syllables or either (a) and (b),
both of which are in S and T.
If for some reason we with to modify
the concept of closure to refer only to
the concatenation of some (not zero)
string for a set S, we use the notation +
instead of *. For Example

if S={x},
then S+ = {x xx xxx xxxx ….}
(S+ = S* except the word ^)
This is not to say that S+ cannot, in
general, contain a word ^. It can, but
only on the condition that S contains
the word I initially.

What happens if we apply the closure


operator twice?
(S*)* or S**
Theorem 1
For any set S of strings we have S*=S**
Proof
It is analogous to say that if people are
made up of molecules and molecules are
made up of atoms, the people are made up
of atoms
Every words in S** is made up of factors
from S*. Every factor from S* is made up of
factors from S. Therefore, every word in
S** is made up of factors from S.
Therefore, every word in S** is also a word
in S*. We can write
S** ⊂ S*
⊂ means “is contained in or equal to.”
Now, in general, it is true that for any set A
we know that A⊂A*, since A* we can chose
as a word any one from A. So if we
consider A to be our set S*, we have
S* ⊂ S**

Together, these two inclusions prove that


S*=S**
Consider the language S*, where S = {ab
ba}.
a. Write at least 30 words in S*.
b. Can any of this language contain
the substring aaa or bbb?
c. Give another description of this
language.
Consider the language S*,
where S={a b}. How many word
does this language have:
a. of length 2
b. of length 3
c. of length n
Consider the language S*,
where S={aa aba baa}.
a. Show that the words:
aabaa, baaabaaa, & baaaaababaaaa
are all in this language.
b. Can any in this language have an
odd total number
1. Consider the language S*,
where S = {aa ab ba bb}
Give another description of this
language

2. Give an example of set S such that S*


only contains all possible strings of a’s
and b’s that have length divisible by 3.
3. Prove that for all sets S.
a. (S+)* = (S*)*
b. (S+)+ = S+
c. is (S*)+ = (S+)* for all sets S?

Das könnte Ihnen auch gefallen