2 Introduction
Computational models, or models of computation, are abstractions of devices that
produce outputs (answers) from inputs (data). For simplicity, we'll assume a basic model
of a computational device as a black box that has a single input channel and a single
output channel. One simple physical form of such a computational device would be a
black box with a set of buttons on the front, exactly one of which can be pressed at any
time, and a set of lights on the top, exactly one of which is lit at any time. An input value is
specified by pressing one of the buttons; the output value is specified by the single light
that is lit. We can easily think of this device as producing a single output (light) in response
to a single input (button). But we can also think of the device as being given a sequence of
Chapter 14
inputs (by punching one button after another) and producing a sequence of outputs (by
one light after another being lit). The most recent input is often referred to as the current
input, and the light currently lit is the current output. Note that what goes on inside the
box can be very complicated or even random; this simple model can be modified to
accommodate any computational task we like. But we are interested only in a small set of
the possible behaviors of the box, and we are interested only in behaviors that are
deterministic.
The simplest computational model is one in which the box computes a function of the
input value. Such a box will produce an output value (turn on a light) solely on the basis
of the current input value (the last button pressed). Because the output value is determined
only by the most recent input value, we say that the computation requires no memory; this
means that no information about previous inputs has to be stored to perform the
computation that determines which light shall be lit. Furthermore, given a particular input,
say the ith button, the output is always the same.
We are interested in a more complex computational model that uses some information
about past inputs in determining its output. Conceptually, we can imagine a device that
keeps track of every input it has processed, that is, the entire input history. Simpler devices
might keep track of less information, such as how many times the leftmost button was
pressed, but even this information is unbounded in the sense that the number that must be
stored may become arbitrarily large, and require arbitrarily much storage to represent it.
We're interested in a simpler class of machines called Finite State Machines (FSM), which
can store only a finite amount of information, and give outputs that depend on that
information and the input.
A Finite State Machine is a device that can be in any one of a specified finite set of states
and which can change state as a result of an input. Thus, each time an FSM gets an input,
we consider it to change states, although it may enter the same state it was in before.
Finite state machines are useful for programming because they provide an alternative
model for controlling program execution. Recall that program control is what determines
which program statement (or block of statements) is to be executed next. The most
common control structure is the default sequential structure; this causes the statements of
a program to be executed sequentially, one after another, just as they appear in the
program. The other control structures are
1. Alternative selection. This is usually embodied as an if, if...else, or switch
statement. This performs one or more tests, and based on the result of the tests,
chooses one block of code from a collection of blocks and executes it. The block
of code may of course be empty, as it is in the false branch of the if statement.
2. Iteration. This is any loop structure; it causes some loop body to be executed a
number of times with exit from the loop based on a test.
3. Subroutine. This causes the current action to be suspended. Control then branches
to the subroutine code; upon completion of the subroutine, the original action
resumes where it left off. Often recursion (which occurs when a subroutine calls
Example: Our first example of an FSM is one that takes as input a sequence of binary
digits and produces as output the same sequence except alternate ones have been changed
to zeros. Thus the input 011010111 becomes 010010010. The machine has two states s0
and s1, with s0 being the start state. Both the input and output sets are {0,1}. The next
state and output functions are shown in the tables below.
Next state
             input
  state     0     1
  s0        s0    s1
  s1        s1    s0

Output
             input
  state     0     1
  s0        0     1
  s1        0     0

Figure 1
An alternate but equivalent representation for a FSM is shown in Figure 2 below and is
called a state diagram. States are shown as circles; the start state is indicated by the bold
incoming arrow. The next state function and output functions are shown using directed
arrows from one state to another. Each arrow is labeled with one element of I and one
element of O. If the machine is in some state s and the current input symbol is x, then we
follow the arc labeled x/y from s to a new state and produce output y.
[State diagram: s0 (start) loops on 0/0 and goes to s1 on 1/1; s1 loops on 0/0 and returns to s0 on 1/0.]
Figure 2
The machine shown above produces a stream of output symbols that is exactly as long as
the input stream. By allowing nothing to be an element of the output symbol set, we can
specify machines whose output stream is shorter than the input. For example, the machine
in Figure 3 has a two symbol input set {a,b}, and produces outputs of {a, b, nothing}. It
reads in strings of a's and b's and collapses substrings of a's into a single a, and substrings
of b's into a single b. Hence the input string aaaabaabbbbbabbab will produce the output
abababab, and the input aaaaaaaaaaaaaab will produce ab.
[State diagram with states S, S0, and S1 and arcs labeled a/a, b/b, a/nothing, and b/nothing.]
Figure 3
Figure 4
Footnote 1: With postal rates constantly going up, it's impossible to keep this example
current. So just return with us to those thrilling days of yesteryear when a first class
stamp was 25 cents.
In the above table, every possibility has been accounted for. In other words, no matter
what state the machine is in and which input it receives, there is exactly one output to
produce and one next state. For example, suppose the machine is in the state "10 cents". If
the customer puts in a nickel, it adds five cents to the amount put in so far and hence goes
to state "15 cents" and produces the output "nothing". If another nickel is added, the
machine goes to the "20 cents" state and again produces "nothing". If the customer then
adds a dime, the machine now has enough money to dispense a stamp and have 5 cents
credit left over. Hence the machine produces the output "one stamp" and goes to the
"5 cents" state. From any state in the stamp machine, adding a quarter will result in the
output "one stamp" with no change in the credit balance, and so the new state is in fact
the same as the old.
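The behavior just described can be sketched in a few lines of Java. This is an illustration only, not the text's own program: the class name StampMachine, its method names, and the encoding of each state as the credit in cents are assumptions. Because every state is a multiple of 5 below 25, the sketch computes the next state and output arithmetically rather than looking them up in a table.

```java
class StampMachine {
    static final int STAMP_PRICE = 25;   // price of one stamp, in cents

    private int credit = 0;              // current state: 0, 5, 10, 15, or 20

    /** Feeds one coin (5, 10, or 25 cents); returns true if a stamp is dispensed. */
    boolean insert(int coin) {
        if (coin != 5 && coin != 10 && coin != 25) {
            throw new IllegalArgumentException("unsupported coin: " + coin);
        }
        int total = credit + coin;
        boolean stamp = total >= STAMP_PRICE;          // output: "one stamp" or "nothing"
        credit = stamp ? total - STAMP_PRICE : total;  // next state
        return stamp;
    }

    /** Returns the current state, expressed as the credit in cents. */
    int credit() {
        return credit;
    }

    public static void main(String[] args) {
        StampMachine m = new StampMachine();
        m.insert(10);   // state "10 cents"
        m.insert(5);    // state "15 cents"
        m.insert(5);    // state "20 cents"
        System.out.println(m.insert(10) + " " + m.credit()); // prints "true 5"
    }
}
```

A quarter inserted in any state yields a stamp and leaves the credit unchanged, which matches the self-loops described in the text.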
The state diagram for the stamp machine, shown in Figure 5, provides an easy way to
visualize what the stamp machine does. It starts out in the initial state (credit = 0 cents).
An input can be a nickel, a dime, or a quarter. When the input is received the machine goes
from the current state into a new state, following the arrow which is labeled with the type
of coin that has been inserted. Some arrows are also labeled to denote that a stamp is
produced as output when these paths are taken. If the arrow the machine follows does not
say to output a stamp, then the output is "nothing". Notice that "new state" doesn't always
imply "different state". For any credit balance, adding a quarter produces a stamp and leaves
the credit balance unchanged. Thus from any state, when the input is a quarter, the
machine always enters the same state it was in before and dispenses a stamp.
Figure 5
As it is currently specified, the stamp machine will cheat the user out of some money if he
or she runs out of coins while the machine is in some state other than 0. We could make
the machine more realistic and more humane by adding another input, "I am done", and
adding four new outputs corresponding to giving change of 5, 10, 15, and 20 cents. We
would then add a new arc from each state to the 0 state with the input symbol being "I am
done" and the output being the appropriate amount of change. For example, the arc from
the "15 cents" state would dispense 15 cents in change. The arc from the 0 state back to
itself would produce no output.
4 Implementing a FSM
A FSM can be implemented with a simple loop. Each time through the loop we get one
input symbol, produce the appropriate output symbol, and then go to the next state. When
the input stream is exhausted, the loop terminates and the machine stops.
state = startState;
while (there is more input)
{
    x = next input symbol;
    output(outputFunction(state, x));     // Generate appropriate output.
    state = nextStateFunction(state, x);  // Move to next state.
}
The body of the loop contains three operations. First, we must get the next input symbol.
This could be done with a read statement or perhaps by getting the next element from an
array or linked list. The second and third statements contain function calls to generate the
appropriate output and go to the next state, respectively. This could be done by hard
wiring the output and transition information into the code of the functions or by using a
more general table look-up scheme.
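As a concrete instance of the table look-up scheme, the sketch below runs the two-state machine from the first example (the one that changes alternate ones to zeros). The class name AlternateOnes and the array names are assumptions made for illustration; the table contents follow the next state and output functions given for that machine.

```java
class AlternateOnes {
    // NEXT_STATE[state][input] and OUTPUT[state][input] for states s0 = 0 and s1 = 1.
    static final int[][] NEXT_STATE = { {0, 1}, {1, 0} };
    static final char[][] OUTPUT = { {'0', '1'}, {'0', '0'} };

    /** Runs the machine over a string of binary digits and returns the output string. */
    static String run(String input) {
        StringBuilder out = new StringBuilder();
        int state = 0;                                 // start state s0
        for (int i = 0; i < input.length(); i++) {
            int x = input.charAt(i) - '0';             // next input symbol as 0 or 1
            if (x != 0 && x != 1) {
                throw new IllegalArgumentException("not a binary digit: " + input.charAt(i));
            }
            out.append(OUTPUT[state][x]);              // generate the appropriate output
            state = NEXT_STATE[state][x];              // move to the next state
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("011010111"));          // prints 010010010
    }
}
```

Changing the machine means changing only the two tables; the loop itself stays the same, which is the main attraction of the table-driven approach.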
since we have now seen an even number of a's. The last output symbol produced by the
machine just before it stops gives the parity of the input. The state diagram for this
machine is shown below. Notice one unusual thing about this machine. Since output is
associated with state transitions, there can be no output produced by the empty string even
though the string of zero a's is of even length.
[State diagram: s0 (even, start) goes to s1 (odd) on a/O; s1 returns to s0 on a/E.]
Figure 6
Figure 7 shows a FSM that determines whether a binary integer is evenly divisible by 2, by
3, by both 2 and 3, or by neither. The input is the sequence of binary digits that
constitute the number (reading left to right). The output is 2, 3, b (for both), or n (for
neither). The sequence of output symbols has no real significance, although the individual
symbols give the division property of that portion of the input seen so far. The last output
symbol gives the property for the entire number. We can think of such machines as
implementing a function mapping input strings onto a single element of the output set.
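[State diagram; s0 is the start state. Transitions, written input/output -> next state:]
  state    on 0          on 1
  s0       0/b -> s0     1/n -> s1
  s1       0/2 -> s2     1/3 -> s3
  s2       0/2 -> s4     1/n -> s5
  s3       0/b -> s0     1/n -> s1
  s4       0/2 -> s2     1/3 -> s3
  s5       0/2 -> s4     1/n -> s5
Figure 7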
How was the state diagram of Figure 7 designed? The divisibility of an integer n by 2 and
3 is determined by the value of n mod 6, or the remainder when n is divided by 6. Thus, if
n is divisible by 6, then it is divisible by both 2 and 3, and if the remainder of n divided by
6 is 4, then n is divisible by 2 (because 4 is) and not by 3 (because 4 is not). The
subscript on each state in Figure 7 represents the value of n mod 6. When the binary
representation of n is extended by adding a 0, the result is the binary representation of 2n.
When the binary representation of n is extended by adding a 1, the result is the binary
representation of 2n + 1. With those facts in hand, constructing the state diagram of Figure
7 is straightforward.
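The construction can be checked with a short program. In the sketch below (the class and method names are assumptions), the state is the value of n mod 6, and reading a digit d moves from state i to state (2i + d) mod 6, exactly as the two facts above dictate. The output attached to each transition depends only on the state being entered.

```java
class DivisibilityFsm {
    /** Returns the final output (b, 2, 3, or n) for a non-empty string of binary digits. */
    static char classify(String bits) {
        if (bits.isEmpty()) {
            // Outputs are attached to transitions, so the empty input yields no output.
            throw new IllegalArgumentException("input must contain at least one digit");
        }
        int state = 0;                           // state i means: value read so far mod 6 == i
        for (int k = 0; k < bits.length(); k++) {
            int d = bits.charAt(k) - '0';
            if (d != 0 && d != 1) {
                throw new IllegalArgumentException("not a binary digit: " + bits.charAt(k));
            }
            state = (2 * state + d) % 6;         // appending digit d turns n into 2n + d
        }
        return outputOnEntering(state);
    }

    /** Output produced by any transition that enters the given state. */
    static char outputOnEntering(int state) {
        switch (state) {
            case 0:  return 'b';                 // divisible by both 2 and 3
            case 3:  return '3';                 // divisible by 3 only
            case 2:
            case 4:  return '2';                 // divisible by 2 only
            default: return 'n';                 // states 1 and 5: neither
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("110"));     // 6 in binary; prints b
    }
}
```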
Figure 8 shows a FSM whose input is an arbitrary string of letters, digits, and blanks. The
final output indicates whether the string is blank (consisting solely of blanks) or numeric
(digits and blanks) or alphabetic (letters and blanks) or alphanumeric (letters, digits, and
blanks). Rather than labeling the arcs with all possible inputs, we use 'L' for letter, 'D' for
digit, and 'B' for blank. The outputs are 'B' for blank, 'N' for numeric, 'Ab' for alphabetic,
and 'An' for alphanumeric. Note that once a string is found to contain both a letter and a
digit, then it is certainly alphanumeric regardless of what else is in the string. Hence state
4, which is where we go when we find a string to be alphanumeric, is what is called a sink
state. It is to finite state machines what a Roach Motel is to a roach: once you get there,
you can never leave.
[State diagram with states s0 through s4; arcs are labeled input/output, including B/B, L/Ab, D/N, and L,B,D/An on the sink state s4.]
Figure 8
6 Acceptor machines
In some cases, we can eliminate the need for the separate output function altogether and
instead incorporate the output into the states. Shown below is a modified version of the
even/odd machine from Figure 6, but this machine has the output symbols associated
with the states rather than along the transitions. The interpretation is that if the machine is
in a particular state, then it produces the output associated with that state. And the final
output is the output associated with the state in which the machine stops rather than the
output associated with the last transition. The machine below produces the same output as
does the machine in Figure 6. As a bonus, this machine correctly produces the output E for
the empty string.
[State diagram: s0 (output: even) and s1 (output: odd), with an arc labeled a from each state to the other.]
Figure 9
The second example is derived from the FSM in Figure 7. We can determine the division
property of the input simply by observing the state that the machine is in when it stops
after having read the entire input sequence. If it ends up in states 2 or 4, then the input is
evenly divisible by only 2; if it ends up in state 3, then it is divisible by 3 only. If the
machine stops in state 0, then the input is divisible by both 2 and 3. And if it ends up in
states 1 or 5, then the input is divisible by neither 2 nor 3. Hence we can modify this
machine to incorporate the output into the states as is shown below.
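[State diagram: the machine of Figure 7 with outputs attached to states: s0 = b, s1 = n, s2 = 2, s3 = 3, s4 = 2, s5 = n; arcs are labeled with the input digits 0 and 1.]
Figure 10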
We can do the same thing with the FSM in Figure 8. State 1 indicates blanks only; state 2
indicates alphabetic; state 3 indicates numeric; and state 4 indicates alphanumeric. The
revised state diagram is shown in Figure 11 below. Note that we have been able to
add a new output, e for empty, associated with s0.
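[State diagram: the machine of Figure 8 with outputs attached to states: s0 = e, s1 = B, s2 = Ab, s3 = N, s4 = An; arcs are labeled with the inputs L, D, and B.]
Figure 11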
A special case of such a machine is one in which the output set contains only two
elements. All input sequences are thus mapped onto one or the other output symbol. We
can think of the output symbols as the binary digits 0 and 1 and associate the notion of
"accept" with 1 and "reject" with 0. Then we can think of the machine as either accepting
or rejecting an input depending on whether that input causes the machine to stop in an
accepting or rejecting state. Such machines are called acceptor automata.
The first three examples below are acceptor automata made from examples we have seen
already. These figures also show one further shorthand notation. Instead of writing the
output values 0 or 1 in each state, we use a double circle to indicate accepting states
(output of 1), and a single circle to indicate rejecting states (output of 0).
The machine in Figure 12 accepts strings of a's that are of even length and rejects strings
of odd length. Notice that this machine correctly accepts the empty string. The machine in
Figure 13 accepts binary integers that are evenly divisible by 2 or by 3 or by both. The
machine in Figure 14 accepts strings that are either alphabetic or numeric and rejects
strings that are blank or alphanumeric.
[State diagram: s0 (even, accepting) and s1 (odd, rejecting), with an arc labeled a from each state to the other.]
Figure 12
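[State diagram: the machine of Figure 10 with s0, s2, s3, and s4 as accepting states and s1 and s5 as rejecting states; arcs are labeled 0 and 1.]
Figure 13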
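[State diagram: the machine of Figure 11 with the alphabetic and numeric states (s2 and s3) accepting and s0, s1, and s4 rejecting; arcs are labeled L, D, and B.]
Figure 14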
The machine in Figure 15 might be used in the lexical analysis phase of a compiler. It takes
strings of characters and accepts those that are valid identifiers (for example, names of
variables or procedures). To be accepted, a string must begin with a letter (indicated by
the generic 'L') followed by letters and digits ('D'). We denote by 'S' any character such as
$ or ? that is neither a letter nor a digit. Note that encountering any 'S' character takes us
immediately to the sink state s2, which is a rejecting state and from which there is no exit.
Note also that this FSM imposes no limit on the length of the input. Pascal, for example,
allows names of up to 255 characters. However, there is no easy way to impose such a
limit on a FSM. This is a limitation that is inherent to the FSM and one we will consider in
more detail below.
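A minimal sketch of the identifier acceptor in Java follows. The class name, the state constants, and the use of Character.isLetter and Character.isDigit to define the classes 'L' and 'D' are assumptions; any other character is treated as 'S'.

```java
class IdentifierAcceptor {
    // Hypothetical state names for the three states of the acceptor.
    static final int START = 0;          // nothing read yet (rejecting)
    static final int IN_IDENTIFIER = 1;  // a letter followed by letters and digits (accepting)
    static final int SINK = 2;           // invalid input seen (rejecting, no exit)

    /** Returns true if s is a letter followed by zero or more letters and digits. */
    static boolean accepts(String s) {
        int state = START;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (state) {
                case START:
                    state = Character.isLetter(c) ? IN_IDENTIFIER : SINK;
                    break;
                case IN_IDENTIFIER:
                    state = (Character.isLetter(c) || Character.isDigit(c))
                            ? IN_IDENTIFIER : SINK;
                    break;
                default:
                    break;               // sink state: every input leads back to the sink
            }
        }
        return state == IN_IDENTIFIER;
    }

    public static void main(String[] args) {
        System.out.println(accepts("x1"));  // prints true
        System.out.println(accepts("1x"));  // prints false
    }
}
```

Note that the loop places no bound on the length of the input, just as the machine itself does not.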
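[State diagram: s0 (start) goes to s1 on L and to s2 on D,S; s1 (accepting) loops on L,D and goes to s2 on S; s2 loops on L,D,S.]
Figure 15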
The FSM in Figure 16 reads strings of characters and accepts only those strings that
contain a single unsigned integer, possibly preceded and followed by blanks. For the sake
of simplicity, it is customary not to show the sink rejecting state nor the arrows leading to
it. Instead, you can assume that if the machine is in some state s and looking at input
symbol x, and if there is no arrow labeled x leading from s, then you go to the sink
rejecting state and stay there. Thus, for example, if the machine is in state 0 and sees the
letter a, it next goes to the rejecting sink state.
The five states in this machine can be thought of as being associated with the five different
classes of strings. In state 0 we have seen nothing yet. If the machine stops in s0 then we
know the input was empty. State 1 indicates that we have seen only blanks so far; ending
there indicates a string made up of blanks only. State 2 is associated with strings that have
zero or more blanks followed by a contiguous substring of digits. In state 3 we know that
we have seen zero or more leading blanks followed by a contiguous substring of digits
followed by one or more blanks. And state 4 (not shown) is the sink state where we go if
we encounter a character other than a digit or blank or if we see more than one contiguous
string of digits. States 2 and 3 are accepting; the others are rejecting.
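The five states described above translate directly into a switch on the current state. The sketch below is illustrative: the class name is an assumption, a blank is taken to be the space character, and digits are the characters '0' through '9'.

```java
class UnsignedIntegerAcceptor {
    /** Returns true if s is a single unsigned integer, optionally surrounded by blanks. */
    static boolean accepts(String s) {
        int state = 0;                               // state 0: nothing seen yet
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean blank = (c == ' ');
            boolean digit = (c >= '0' && c <= '9');
            switch (state) {
                case 0:                              // nothing seen yet
                case 1:                              // only blanks so far
                    state = blank ? 1 : (digit ? 2 : 4);
                    break;
                case 2:                              // leading blanks, then digits
                    state = digit ? 2 : (blank ? 3 : 4);
                    break;
                case 3:                              // digits followed by trailing blanks
                    state = blank ? 3 : 4;           // a second run of digits is rejected
                    break;
                default:                             // state 4: sink, no exit
                    break;
            }
        }
        return state == 2 || state == 3;             // the accepting states
    }

    public static void main(String[] args) {
        System.out.println(accepts("  42 "));        // prints true
        System.out.println(accepts("4 2"));          // prints false
    }
}
```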
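[State diagram of the unsigned-integer acceptor with states s0 through s3; arcs are labeled with blanks (b) and digits; s2 and s3 are accepting. The sink state s4 is not shown.]
Figure 16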
We can take our integer acceptor one step further by accepting a string of characters that
contains one real number optionally preceded and followed by blanks. The real number can
be represented either in standard notation or in scientific notation. The construction of this
machine is left as an exercise. Try the machine on the following real numbers as well as on
some strings that do not contain valid reals.
100
-100
3.1415
6.02E24
-8.8E-11
7 String Searching
Another practical use of acceptors is in string searching. String searching problems are
very common problems, most notably in text editing. The usual statement of the problem
is: "determine whether string X occurs in string Y." (For this problem string X will be
referred to as the pattern and string Y as the target.) The naive method of doing this
would be to write a loop that goes through the target string a character at a time and
checks to see if the pattern occurs beginning at that character. After the complete pattern
has been compared to the target, go to the next character in the target string and start
again. This simple algorithm (which could be simplified somewhat by using the substring
facility of Java) is implemented by the following code, which we will call algorithm A:
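A minimal sketch of the naive method is given here as an illustration; it is a reconstruction from the description above rather than the text's own listing, and the names NaiveSearch and indexOf are assumptions.

```java
class NaiveSearch {
    /** Returns the index of the first occurrence of pattern in target, or -1 if none. */
    static int indexOf(String pattern, String target) {
        int m = pattern.length();
        int n = target.length();
        // Try every position in the target at which the pattern could begin.
        for (int start = 0; start + m <= n; start++) {
            int k = 0;
            // Compare the pattern against the target, one character at a time.
            while (k < m && target.charAt(start + k) == pattern.charAt(k)) {
                k++;
            }
            if (k == m) {
                return start;            // the whole pattern matched at this position
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("123", "2122123"));  // prints 4
    }
}
```

Each restart of the inner loop rereads target characters that were already examined on an earlier attempt, which is the source of the inefficiency.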
This simple solution can be inefficient because, in the worst case, most characters in the
target are examined m times: once to see if they could be the first character in an occurrence of
the pattern, once to see if they could be the second character in an occurrence of the
pattern that began one symbol to the left, etc. Thus the number of character-character
comparisons done in the worst case is proportional to mn, where m is the length of the pattern and n is
the length of the target. A way to avoid the multiple comparisons of each character is for the
program to remember some information about the characters that have been read so far.
No characters more than m positions to the left of the character currently being examined
can affect whether or not this character is part of an occurrence of the pattern. This is
because these earlier characters are separated from the current character by a distance
which is longer than the length of the pattern. Therefore all we need to know is, at most,
what the last m characters read were. Since m is a finite number, there are only a finite
number of possible combinations for these m characters to have had. Therefore, this
information can be stored in the states of a finite state machine. This means that a finite
state machine can be used to solve the string searching problem.
What is needed is a finite state machine that will read an input string and accept the string
if it contains the pattern. Since a finite state machine doesn't go back and reread any
characters of the input string, the finite state machine will read each character of the target
exactly once. Clearly, this is better than algorithm A (see footnote 2). The program corresponding to
this finite state machine would look something like this:
Now, all that remains to be specified is the state transition function of the finite state
machine, but note that the finite state machine is determined completely by the pattern.
The following is a finite state machine that finds the first occurrence of the pattern "123"
in target strings of digits.
Figure 17
Footnote 2: Although this algorithm is better than algorithm A, it is not the best we can do.
Boyer and Moore have developed a string matching algorithm that is faster, by a linear
factor, than this one.
A high-level description of the automaton is that as it reads characters of the target string
which might be part of an occurrence of the pattern "123" it proceeds straight across the
diagram from left to right. Whenever it finds a character that does not fit the pattern it
must retreat some number of steps. The number of steps it retreats depends on what the
previously read target characters were, which is the same thing as saying that it
depends on what state the automaton is in. For a more detailed view, trace what happens
with the input string "2122123". The machine starts in the initial state s0. The first
character read is "2". Since the pattern starts with the character "1", what has been read so
far can't be the beginning of an occurrence of the pattern in the target. Therefore the
machine stays in state s0. Now read the character "1". This is the first character of the
pattern string. This could be at the beginning of an occurrence of the pattern so the
machine goes to state s1. Likewise, read the next character which is "2". At each point this
still might be an occurrence of the pattern, so the machine moves to s2. Now read a "2".
This is not the next character of the pattern, which is a "3". Therefore, this is not an
occurrence of the pattern and the machine must go backwards two steps to state s0. Now
read in the next character, which is "1". As before, the machine moves to state s1.
Continuing on, the machine reads the "2" and goes into state s2, and then finally reads the
"3" and goes into state s3. This is the accepting state, so upon reaching it the machine
reports that it has found an occurrence of the pattern. At this point it has read the entire
target string, so the task is finished.
From this we can deduce the loop invariant for the corresponding program:
INV. The machine is in state si (0 <= i <= 3) if and only if i is the largest value such
that the last i characters of the target string that were read are equal to the first i
characters of the pattern string.
Thus if the machine is in state s0 (as it is when it hasn't yet read any characters of the target
string), then it has matched 0 characters of the pattern. If it is in state s2, then the last 2
characters read match the first 2 characters of the pattern. And if we are in s3, then the
entire pattern must have been found in the target. The simplicity of this loop invariant
should by itself be enough to show that this is a good way to solve the string searching
problem.
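The machine for the pattern "123" can be sketched as follows. The class and method names are assumptions, and the transitions taken on a mismatch are derived here from the invariant INV: after a mismatch the machine goes to s1 if the character just read is "1" (it may begin a new occurrence) and to s0 otherwise.

```java
class Find123 {
    /** Next state function for the pattern "123"; states 0 to 3 stand for s0 to s3. */
    static int nextState(int state, char c) {
        switch (state) {
            case 0:  return c == '1' ? 1 : 0;
            case 1:  return c == '2' ? 2 : (c == '1' ? 1 : 0);
            case 2:  return c == '3' ? 3 : (c == '1' ? 1 : 0);
            default: return 3;               // s3 is accepting; the search stops there
        }
    }

    /** Returns the index at which the first occurrence of "123" begins, or -1 if none. */
    static int find(String target) {
        int state = 0;                       // s0: no characters of the pattern matched
        for (int i = 0; i < target.length(); i++) {
            state = nextState(state, target.charAt(i));   // each character is read once
            if (state == 3) {
                return i - 2;                // the match ends at i, so it began at i - 2
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(find("2122123")); // prints 4
    }
}
```

On the input "2122123" the sequence of states is s0, s1, s2, s0, s1, s2, s3, which agrees with the trace given in the text.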
A problem which we haven't talked about is how to use this method if the pattern to be
searched for is not known in advance. In this case, the transition function cannot be
prepared ahead of time, but will have to be computed as part of the searching program.
This is somewhat harder, but still easy enough to make the finite automaton method
worthwhile.
This algorithm, known as the Knuth-Morris-Pratt string matching algorithm, is a well-known
application of finite state machines. The KMP algorithm is somewhat more
complex than we've let on. For any pattern string, it constructs a finite automaton with
which to process the target string.
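One common way to compute the transition function from an arbitrary pattern is sketched below. It is offered as an illustration under stated assumptions, not as the definitive form of the KMP algorithm: the names PatternAutomaton, build, and search are hypothetical, and the alphabet is assumed to be the 256 characters with codes 0 to 255. The table entry dfa[c][j] is the state entered from state j on character c, and the variable x tracks the state the machine would be in had it started one character later.

```java
class PatternAutomaton {
    static final int R = 256;   // assumed alphabet size (character codes 0 to 255)

    /** Builds dfa[c][j], the state entered from state j on reading character c. */
    static int[][] build(String pattern) {
        int m = pattern.length();
        if (m == 0) {
            throw new IllegalArgumentException("pattern must be non-empty");
        }
        for (int j = 0; j < m; j++) {
            if (pattern.charAt(j) >= R) {
                throw new IllegalArgumentException("pattern character outside assumed alphabet");
            }
        }
        int[][] dfa = new int[R][m];
        dfa[pattern.charAt(0)][0] = 1;
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < R; c++) {
                dfa[c][j] = dfa[c][x];           // mismatch: act as the restart state x would
            }
            dfa[pattern.charAt(j)][j] = j + 1;   // match: advance one state
            x = dfa[pattern.charAt(j)][x];       // update the restart state
        }
        return dfa;
    }

    /** Returns the index of the first occurrence of pattern in target, or -1 if none. */
    static int search(String pattern, String target) {
        int[][] dfa = build(pattern);
        int m = pattern.length();
        int state = 0;
        for (int i = 0; i < target.length(); i++) {
            char c = target.charAt(i);
            state = (c < R) ? dfa[c][state] : 0; // characters outside the alphabet match nothing
            if (state == m) {
                return i - m + 1;                // accepting state reached
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("123", "2122123"));  // prints 4
    }
}
```

Building the table takes time proportional to the pattern length times the alphabet size, after which the search reads each target character exactly once.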
To get a feel for the kinds of languages that cannot be accepted by acceptor automata,
consider the language of algebraic expressions consisting of single letter variable names
and the operators +, -, * and /. The very simple machine in Figure 18 accepts this
language. Figure 19 shows a slightly more complex machine that accepts arithmetic
expressions with one level of parenthesization allowed. But no acceptor automaton can
accept arithmetic expressions that contain unbounded parenthesization.
[State diagram with states s0, s1, and s2 and arcs labeled L, op, and L.]
Figure 18
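[State diagram extending Figure 18 with arcs labeled ( and ) to allow one level of parentheses; the remaining arcs are labeled L and op.]
Figure 19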
To see why this is true, let's assume the contrary. That is, let's assume that an acceptor
automaton M accepts the language of valid arithmetic expressions with no limit on the
level of parentheses. Since M is a FSM, it must have a finite number of states, say n states.
Now, consider the expression (^n a )^n. This is a string of n left parentheses
followed by the letter a and followed by n right parentheses. It is a valid arithmetic
expression and should thus be accepted by M. If we trace the action of M operating on
this string, we will see that M visits some state si at least twice while processing the left
parentheses. We know this must be true since M undergoes n state transitions while
processing the n left parentheses and hence visits n+1 states. Since there are only n states
in M, then one of the states, say si, must be visited at least twice.
This is an example of the pigeon hole principle. It derives its name from the pigeon holes
used by post office workers to sort mail. Simply stated, if you have lots of letters to be put
into only a few holes, then at least one of the holes must receive more than one letter.
More formally, if you have a set of size n and draw n+1 samples from this set (with
replacement), then at least one of the elements of the set will be drawn at least twice. In
the case of a FSM, processing an input of length n takes the machine through n+1 states:
the start state plus the n new states that are arrived at via the n state transitions. Hence at
least one state must have been visited at least twice.
Given that M visits si twice, we can divide the string of left parentheses into three parts:
x, y, and z. The first substring x contains the parentheses that take us from the start state s0
to the first occurrence of si. It is possible that x is empty if si is in fact s0. Then substring y
takes us from the first occurrence of si to the second occurrence of si. This substring must
contain at least one symbol. And the substring z simply contains the rest of the left
parentheses. It takes the machine from si to sj, the state that M is in after seeing all of the
left parentheses; z might be empty too. Now, consider the string xza)^n. Since y contains at
least one parenthesis, the string xz has fewer left parentheses than does xyz. Hence xza)^n
contains more right parentheses than left parentheses and is invalid and should be rejected.
But x takes M from s0 to si; and z takes M from si to sj. Thus in processing both strings,
M is in state sj with a)^n remaining to be seen. Since the action of a FSM is completely
determined by its state and the input, M will thus do the same thing on both strings: it will
either accept both or reject both. But that's an error; M was supposed to accept one and
reject the other. And so we have arrived at a contradiction. We assumed that M existed
and have shown that any M that alleges to accept the language will err by either accepting
an invalid string or rejecting a valid string. Hence our only conclusion is that the
assumption that M exists must be false.
Intuitively, the weakness in finite state machines is that they have only a bounded amount
of memory. If a task requires more than that amount of memory, the FSM cannot handle
the task. For example, matching arbitrarily nested parentheses takes an unbounded amount
of memory and hence is beyond the capability of the FSM. Similarly, while a FSM can
accept strings that contain a single integer, no FSM can determine the value of an arbitrary
integer.
This weakness in the power of the FSM clearly limits what we can do with it. However,
the basic notion of the FSM can be extended and used as a control mechanism in more
powerful computational models.
[State diagram of the counter machine, showing the states s0 and fail.]
The initial state of the machine is s0, with the counter initialized to 0. As each left or right
parenthesis is encountered in an input string, the next state is chosen based both on the
present state and the value of the counter. When in state s0, if a '(' is read, the counter is
incremented; if a ')' is read, the counter is decremented. If the value of the counter is ever
0 and a ')' is read, the string is not well-formed and the machine enters the 'fail' state. And
if the counter is not equal to 0 when the entire string has been processed, there were more left
parentheses than right ones, and the string is rejected. Note that both acceptance and the next state function (and in
general, the output function as well) use the value of the registers in determining what to
do next.
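The counter machine can be sketched in Java as follows. The class and method names are assumptions, and so is the decision to ignore every character other than a parenthesis. The single int field plays the role of the counter register; unlike the states of a pure FSM, its value is not bounded in advance by the machine's design.

```java
class ParenChecker {
    /** Returns true if the parentheses in s are well-formed. */
    static boolean wellFormed(String s) {
        int counter = 0;                     // register: number of unmatched '(' so far
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                counter++;
            } else if (c == ')') {
                if (counter == 0) {
                    return false;            // the 'fail' state: ')' with nothing to match
                }
                counter--;
            }
            // Any other character is ignored (an assumption of this sketch).
        }
        return counter == 0;                 // accept only if every '(' was matched
    }

    public static void main(String[] args) {
        System.out.println(wellFormed("(()())"));  // prints true
        System.out.println(wellFormed("(()"));     // prints false
    }
}
```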
A text editor on a computer provides another illustration of the finite state model. The
principal input to a text editor is keystrokes, but the same keystrokes can mean entirely
different things to the text editor, depending on the 'state' of the software. Most
commonly, keystrokes are text to be inserted into the document, but they may represent
the name of the file under which the document is to be stored, or a character sequence to
be searched for within the document, or instructions on how the document is to be
formatted for printing. Thus, the text editor can be considered to have many states, and its
reaction to a sequence of keystrokes is dependent on which state it is in.
The user-interface for the UCSD Pascal System for the Apple II computer has three states:
system, editor, and filer. What the interface does in response to a user's input depends both
on the input and on the state. For example, typing the character "e" in system state causes
the system to go to the editor state. Once in the editor, typing an "e" adds that character
to the file currently being edited. And typing an "e" in the filer state produces an extended
listing of the files on the current volume.
A program skeleton for using a general FSM control structure is given below. Here, the
FSM has been implemented using the Java switch statement, and the end of input is
indicated by a sentinel. Note that the statements inside the case statement (indicated as
"process inputSymbol in state si") are arbitrary statements, including possibly compound
statements, or even another finite state machine construct. The next state function is a
function not only of the state si and the input symbol, but also the other parts of the state
-- the values of registers, counters, arrays, etc.
var state = startState;
while (true)
{
    get the next inputSymbol
    // inv: all symbols preceding the current input have been processed
    // in the correct state, and the current state reflects the
    // input history.
    // Stop FSM at end of input.
    if (inputSymbol == sentinel) break;
    switch (state)
    {
        case s0: process inputSymbol in state s0;
                 state = stateTransition(state, inputSymbol);
                 break;
        case s1: process inputSymbol in state s1;
                 state = stateTransition(state, inputSymbol);
                 break;
        ...
        case sn: process inputSymbol in state sn;
                 state = stateTransition(state, inputSymbol);
                 break;
    }
}
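To make the skeleton concrete, here is one possible instance of it: a two-state machine that counts words, with the word counter acting as the kind of extended state (a counter outside the FSM proper) mentioned above. The class name, the choice of '$' as sentinel, and the decision to fold the transition into each case rather than call a separate stateTransition function are all assumptions made for this sketch.

```java
class WordCountFsm {
    static final char SENTINEL = '$';   // assumed end-of-input marker

    // States: 0 = between words, 1 = inside a word.
    public static int countWords(String input) {
        int state = 0;
        int words = 0;                  // extended state: not part of the FSM itself
        int i = 0;
        while (true) {
            // Get the next input symbol; running off the end counts as the sentinel.
            char symbol = (i < input.length()) ? input.charAt(i++) : SENTINEL;
            // Stop FSM at end of input.
            if (symbol == SENTINEL) break;
            switch (state) {
                case 0:                 // between words
                    if (symbol != ' ') {
                        words++;        // a new word starts here
                        state = 1;
                    }
                    break;
                case 1:                 // inside a word
                    if (symbol == ' ') state = 0;
                    break;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(countWords("the quick  brown fox$"));  // prints 4
    }
}
```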
over the input and while it is longer than its predecessor, it is easy to write, understand and
modify.
            case 4:
        }
        i++;
    }
    switch (state)
    {
        case 0: return "|" + res + "|" + " is empty";
        case 1: return "|" + res + "|" + " is all blank";
        case 2: return "|" + res + "|" + " is valid: " + value;
        case 3: return "|" + res + "|" + " is valid: " + value;
        default: return "|" + res + "|" + " is invalid";
    }
}
The real number acceptor (exercise 7) can be similarly extended to determine the value of
the real.
[Figure: state diagram for the Comment Stripper. States: S0, outside a comment; S1, have just seen a '/'; S2, have just seen a second '/', now inside comment mode until <cr>; S3, have just seen a '*' that followed a '/', now inside comment mode; S4, inside comment mode and have just seen a '*'.]
Given the state diagram, the code is not difficult to write, but it's worth a moment's
reflection to consider how difficult the code would be to understand without knowledge
of how it arose. Documentation of code based on a FSM should always describe the FSM,
either with a diagram or (when a diagram is not feasible) a state transition table, and a
similar description of how the output is generated.
// Strip comments from parameter string.
public static String commentStrip(String s)
// pre: true
// post: Returned String is the inbound string stripped of comments.
{
    int state = 0;      // Current FSM state.
    String outS = "";   // Will hold outbound string.
    char c = ' ';       // Current character.
    for (int i = 0; i < s.length(); i++)
    {
        c = s.charAt(i);
        switch (state)
        {
            // Not in comment mode.
            case 0: if (c == '/')
                        state = 1;
                    else
                        outS = outS + c;
                    break;
            // Have seen a '/'.
            case 1: if (c == '/')
                        state = 2;
                    else
                    if (c == '*')
                        state = 3;
                    else
                    {
                        state = 0;
                        outS = outS + '/' + c;
                    }
                    break;
            // Have seen second '/'; enter comment mode for the rest
            // of this line.
            case 2: if (c == '\n')
                    {
                        state = 0;
                        outS = outS + '\n';
                    }
                    break;
            // Have seen "/*"; enter comment mode until "*/".
            case 3: if (c == '*')
                        state = 4;
                    break;
            // In comment mode and have seen a '*'.
            // If next char is '/', leave comment mode.
            case 4: if (c == '/')
                        state = 0;
                    else if (c != '*')
                        state = 3;
                    break;
        }
    }
    // A '/' still pending at end of input did not open a comment; emit it.
    if (state == 1)
        outS = outS + '/';
    return outS;
}
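The transitions coded in commentStrip can also be written down as a state transition table, which is the form of documentation recommended above when a diagram is not feasible. The sketch below stores that table as a Java array and steps through it. The class name, the grouping of characters into four input classes, and the finalState helper are assumptions made for illustration; the output actions of commentStrip are deliberately left out.

```java
class CommentFsmTable {
    // Input classes (an assumed encoding): 0 = '/', 1 = '*', 2 = newline, 3 = anything else.
    public static int classify(char c) {
        switch (c) {
            case '/':  return 0;
            case '*':  return 1;
            case '\n': return 2;
            default:   return 3;
        }
    }

    // NEXT[state][inputClass] is the next state, mirroring the switch in commentStrip.
    static final int[][] NEXT = {
        // '/' '*' newline other
        {  1,  0,  0,  0 },   // state 0: outside a comment
        {  2,  3,  0,  0 },   // state 1: have seen a '/'
        {  2,  2,  0,  2 },   // state 2: inside a // comment until newline
        {  3,  4,  3,  3 },   // state 3: inside a /* comment
        {  0,  4,  3,  3 },   // state 4: inside a /* comment, have just seen a '*'
    };

    // Run the table over s from state 0 and report the state reached.
    public static int finalState(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            state = NEXT[state][classify(s.charAt(i))];
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(finalState("x = 1; /* unfinished"));  // prints 3
    }
}
```

Keeping the table next to the code gives a reader the whole machine at a glance, and any mismatch between table and switch is easy to spot row by row.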
efficient transmission. A very simple way to do this is to encode any run of three or more
of the same character as <count><character> where count is an integer indicating the
number of occurrences of character. For example, the message
aaaaabaaaaacccccccbbbbbbababc would be abbreviated as 5ab5a7c6bababc. Encoding
runs of length two would produce no compression; encoding singletons would actually
make the resulting string longer.
To do the compression requires that we keep track of two things: the character seen most
recently and the number of consecutive occurrences of that character. The former can be
done with a FSM since there are only three characters; keeping track of the count,
however, is beyond what a FSM can do and is handled as an extension. The procedures to
compress a string and to display the compressed string are shown below.
public static String output(int count, char c)
{
    // Compress a homogeneous string of characters.
    //   if count = 1, return c.
    //   if count is 2, return cc
    //   if count is 3 or more, return compressed form: count + c
    // pre: count >= 1
    if (count == 1)
        return String.valueOf(c);
    else
    if (count == 2)
        return String.valueOf(c) + String.valueOf(c);
    else
        return count + String.valueOf(c);
}
public static String compress(String s)
{
    // Compress s by encoding runs of length 3 or more.
    // FSM states: 's' for start.
    //             'a' seeing a's.
    //             'b' seeing b's
    //             'c' seeing c's
    // An empty string has no final run to flush; return it unchanged.
    if (s.length() == 0) return "";
    char state = 's';        // FSM state.
    int count = 1;           // Current run length.
    String outString = "";   // String to be returned.
    for (int i = 0; i < s.length(); i++)
    {
        switch (state)
        {
            case 's':   // Get first character of s.
                state = s.charAt(0);
                break;
            case 'a':   // Have seen one or more a's.
                if (s.charAt(i) == 'a')
                    count++;
                else
                {
                    outString = outString + output(count, 'a');
                    count = 1;
                    state = s.charAt(i);
                }
                break;
            case 'b':   // Have seen one or more b's.
                if (s.charAt(i) == 'b')
                    count++;
                else
                {
                    outString = outString + output(count, 'b');
                    count = 1;
                    state = s.charAt(i);
                }
                break;
            case 'c':   // Have seen one or more c's.
                if (s.charAt(i) == 'c')
                    count++;
                else
                {
                    outString = outString + output(count, 'c');
                    count = 1;
                    state = s.charAt(i);
                }
                break;
        }
    }
    // Flush final character(s).
    outString = outString + output(count, state);
    return outString;
}
The above code for compress, developed from a straightforward FSM model, works well,
but has several sections that look very similar. A few moments of thought reveal that the
redundancies are due to the fact that the actions in all the states except the start state are
nearly identical. The start state can be eliminated by making the initial state the first
character of the sequence (thus changing the range of the for loop), and the separate cases
for the remaining states can be combined into one, eliminating the redundancies and giving
the following code.
public static String compress(String s)
{
    // Compress s by encoding runs of length 3 or more.
    // pre: true
    // post: Returned value is compressed version of s.
    if (s.length() == 0) return "";   // Handle empty string.
    char state = s.charAt(0);         // Initialize state.
    int count = 1;                    // Current run length.
    String outString = "";            // String to be returned.
    for (int i = 1; i < s.length(); i++)
    {
        if (s.charAt(i) == state)
            // Repeat of previous character.
            count++;
        else
            // New character.
        {
            outString = outString + output(count, state);
            state = s.charAt(i);
            count = 1;
        }
    }
    // Flush final character(s).
    outString = outString + output(count, state);
    return outString;
}
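One way to check a compressed string by hand or by program is to reverse the encoding. The sketch below is not part of the original text: the class name RunLengthExpand and the method expand are hypothetical, and it assumes the message alphabet contains no digits (true for the a, b, c messages used here), so that any digits in the input can only be a run count.

```java
class RunLengthExpand {
    // Expand a string of the form produced by the <count><character> encoding.
    // Assumption: message characters are never digits.
    public static String expand(String s) {
        StringBuilder out = new StringBuilder();
        int count = 0;                       // count read so far; 0 means none pending
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) {
                count = count * 10 + (c - '0');   // counts may have several digits
            } else {
                int n = (count == 0) ? 1 : count; // no count means a single character
                for (int k = 0; k < n; k++) {
                    out.append(c);
                }
                count = 0;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The abbreviated message from the text expands back to the original.
        System.out.println(expand("5ab5a7c6bababc"));
    }
}
```

Like compress, this is a small machine with extended state: the "mode" is whether a count is pending, and the count itself is the part a pure FSM cannot hold.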
10 Summary
The notion of state and of transition between states based on input is very common,
ranging from the children's board game Candyland to traffic lights to many software
applications. As we have seen, the basic finite state machine is not powerful enough to
handle most of these applications, but is easily extended and is the basis for a very useful
and powerful programming paradigm.
11 Exercises
1. Draw the state-transition diagram for the following finite state machine and describe in
English what it does.
Set of input symbols = {a,$}
Set of output symbols = {a,0,1,2}
Set of states = {s0,s1,s2}
Initial state = s0
output function F:
2. Construct the formal specification for the finite state machine given by the following
state transition diagram and explain in English what it does.
(In the diagram, the notation on the arrows gives the input symbol followed by the output
symbol)
3. Write a program which, given a pattern string, generates the state transition table to be
used by a finite automaton which will find all occurrences of the pattern string in an
unspecified target string.
4. Assume that you have been hired by a traffic light manufacturer to design an
"intelligent" traffic light. Inputs to your system will include signals from various timers,
from sensors that detect the presence of cars in the left-turn and through lanes, and
from pedestrian "walk" buttons. The outputs set the lights (including left-turn and
pedestrian signals), ring a bell for blind pedestrians, and reset the timers.
a.)
b.)
b.)
c.)
d.)
6. Suppose that we have encoded the letters of the alphabet as decimal numbers which
are represented in character form. For example, "01" is "a", "02" is "b" and so on.
Write a Pascal program which translates a string of digits into a sequence of characters
using a finite automaton.
7. a.)
b.)
Expand the real number acceptor to calculate the value of the real number.
Ignore the possibility of overflow.
c.)
8. Literal strings are strings of characters delimited by quotation marks ("). Any character
may be contained within a literal string. If a quotation mark is to be represented within
a literal string, it is done by using two consecutive quotation marks ("").
a.)
design an acceptor for literal strings which do not contain any quotation marks
b.)
design an acceptor for literal strings which may contain quotation marks
c.)
b.)
all strings of a's and b's in which the substring "ab" occurs at least twice
c.)
all strings of a's and b's in which the third character from the end is a b.
d.)
all strings of a's and b's which contain either the pattern "aab" or the pattern
"baa"
10. Describe informally the strings accepted by the machines given by the following
diagrams:
a.)
b.)
11. Construct a finite state machine which is an acceptor for valid telephone numbers.
Valid telephone numbers consist of the following:
a.)
b.)
If the first digit is not a "0" and the second digit is neither "0" nor "1" then it is a
local number and should be exactly seven digits long.
c.)
If the first digit is not a "0" and the second digit is a "0" or a "1" then it is a
long-distance number and should be exactly ten digits long.
12.
"The monster has been very difficult to deal with lately. It is almost impossible to wake
him up in the morning. The only thing that will rouse him is 10,000 volts applied to the
bolts on his neck. This wakes him up, but unfortunately it never fails to make him enraged,
and he goes out and terrorizes the village. Whenever this happens, I send Igor out to calm
him down. If this is successful, then the monster becomes docile and will help me with my
experiments. If not, then the villagers threaten him with sticks and pitchforks, and he gets
frightened and retreats back into the castle where he falls asleep. When he is docile, his
only problem is that he is too eager to help and gets in the way. When this happens, I have
Igor sing to him. Under this stimulus he becomes sentimental and sits and hums to himself.
If Igor keeps singing, the monster will fall asleep, but if Igor stops singing the monster
becomes docile and helpful again."
13.
a.)
Consider the monster as a finite automaton. What are the states? What are the
inputs and outputs?
b.)
ANALOGY
INTRODUCTION
3.1 Example: the stamp machine
IMPLEMENTING A FSM
ACCEPTOR MACHINES
STRING SEARCHING
7.1 Naive String Search (Algorithm A)
7.2 Finite State String Search (Algorithm B)
9.1 Integer Reader 1
9.2 Integer Reader 2
9.3 Comment locator
9.4
9.5
Text Compression
10 SUMMARY
11 EXERCISES