Chapter One

You Should Learn To Program

The Mystique of Programming

One of the oddities of the ongoing personal computer revolution has been its
failure to dispel the myth of the programmer as genius. Most people regard
programming as some black magic accessible only to the high practitioners,
the "real" programmers. We commonly lump programmers into the same
category with neurosurgeons, nuclear physicists, and astronauts. Each of the
latter three professions is immensely difficult, requiring long and careful
training. Those who master such jobs merit a respect bordering on awe.
Moreover, there are no halfway states in these three fields: either you make it
or you fail. We don't have amateur brain surgeons, nuclear hobbyists, or
weekend astronauts.

But programmers do not belong in the same category with neurosurgeons,
nuclear physicists, and astronauts, even though they go to great lengths to give
that impression. Anyone who has been exposed to computers can testify to
the dense fog of jargon that seems to shroud the machines in mystery. Now,
every special group of people creates its own special subset of the language;
it's a normal means of establishing a group identity and often simplifies
communications. But of all the groups I have encountered, programmers have
the most singular lingo. The jargon of programming goes far beyond the
reasonable need for handy terms that precisely describe complex concepts
unique to the group. Why, for example, should a programmer "architect the
program's user interface" instead of simply "designing how people use the
program"? Why do programmers talk about "CPU cycles" as a measure of time
when they aren't actually counting those cycles? Why don't they simply talk
about time?

The answer to these questions is that programmers have developed a culture
that prides itself on a sharply defined hierarchy. It is a highly structured
meritocracy, with the highest positions reserved for those most knowledgeable
in the arcane ways of computers. The members of this meritocracy wear no
badges of rank, no feathered plumes, no sashes or medals. They carry their
rank in their vocabulary; they dazzle with an opulence of acronyms.

Another aspect of this culture is its exaggerated sense of taste and
discernment. Tell a programmer that you prefer BASIC; it's like telling a
Jehovah's Witness that you worship Satan. Prepare yourself for a true fire-and-
brimstone sermon on the evils of BASIC. BASIC, your programmer friend will
tell you, is not merely a slow language or an unstructured language; no, it is an
execrable language, a trashy language. Only peasants use BASIC. You'll get
similar reactions about anything else concerning computers. Hardware? Brand X
was brought down from on high by Moses himself; everything else is rank trash
undeserving of the name "computer". Operating systems? Anybody who
doesn't use XXX operating system is a simpleton.

Guess where all this puts you, the nonprofessional? You haven't even heard of
this wonderful operating system or that fabulous piece of hardware. If only
simpletons and perverts use other products, you must be an even lower
creature not to have heard of them. Perhaps you'd better slink off before you
are exposed for the presumptuous fraud you are. Leave programming to the
experts who know what they are doing. Right?

Wrong! Programming is not like neurosurgery, nuclear physics, or astronautics.
Programming is like writing, woodworking, or photography. Anybody can do it.
Doing it well, doing it like an expert – that takes a lot of work, a lot of
experience, and a lot of talent. But anybody who can write a comprehensible
paragraph can write a workable program. All it takes is a computer and some
time.

Two Wrong Reasons for Learning to Program

Given that learning to program is within most people's intellectual grasp, the
next question to consider is, why should anyone bother? Before I answer that
question, I must first dispel two commonly cited (but incorrect) reasons for
learning to program.

The first reason has bounced around in countless expressions. Computers are
the coming thing, we are told. Someday soon, they will be everywhere: in your
car, your kitchen, even your commode. Why, you won't be able to flush the toilet
if you can't program. A variation on this theme has shown up in television
commercials. Perhaps you've seen them: You are standing knee-deep in snow.
You are dressed in rags. You are shivering; your teeth are chattering. Looking
through the nearby window, you see warm, happy people, talking and
laughing, talking about computers. Someone notices you, and with a mixture
of pity and contempt they close the drapes. You're out in the cold because you
never learned to program! Your parents never bought you a Brand X
computer!

This may be what some computer manufacturers want you to believe, but it
just isn't true. This argument is based on naive assumptions about the
relationship between people and technology. No technology, no matter how
useful, makes any headway into the mainstream of society until it is accessible
to the average person. If the computers of the future require as much skill to
operate as, say, private airplanes, then they won't be any more numerous than
private airplanes. Computers won't become as ubiquitous as automobiles until
they are as easy to use as automobiles. So you really don't need to learn to
program to prepare yourself for the brave new world of computers. The LED-
blinking, disk-spinning, RAM-guzzling monsters that left the laboratory will
evolve into cuddly puppies by the time they reach your doorstep.

A second "mis-reason" for learning to program refers to the myriad
employment opportunities that will be available to those graced with the skill
of computer programming. Perhaps people have in mind the stunning
newspaper accounts of sixteen-year-old "whiz kids" who made their first
million before their Junior Prom. The fact is, most of those newspaper stories
were grossly exaggerated. For every person who has gotten rich writing
programs, there must be thousands slaving away on a typical white-collar
salary. As an ex-Atari employee who has known both high income and
protracted unemployment, I can assure you that nobody should enter
programming for the money.

Some Good Reasons for Learning to Program

Why, then, should anybody learn to program? I can give you several good
reasons. The first is recreation – and I'm not talking about computer games. I
refer instead to the creative enjoyment one derives from writing one's own
program. It might seem strange to suggest that someone would go to all the
trouble of writing a computer program for the sheer fun of it. But consider the
range of similar activities that people undertake for recreation: photography,
pot-making, painting, music – the list is very long. Every year people spend
billions of dollars and millions of hours on such hobbies.

For example, when Uncle Fred and Aunt Etna go to San Francisco, they stop by
the Golden Gate Bridge where Uncle Fred takes a snapshot of Aunt Etna
standing in front of the bridge. The lighting isn't particularly good, and the
wind keeps dislodging her hat, and three kids are strangling a seagull in the
background, but Uncle Fred doesn't mind. He could step into any tourist shop
and purchase a picture-postcard of the Golden Gate Bridge, taken by a
professional photographer in an airplane on the one day of the year when the
fog and the light are perfect. But he wouldn't value that postcard as much as
the picture he took all by himself. He figured the lighting, the framing, and the
composition; it's his little work of art and he'll take it over the professional's
picture any day.

Amateur programming is the same as amateur photography. It is a creative
exercise and is therefore fun. Moreover, programming offers a tremendous
range of creative possibilities. Most other creative outlets impose constraints
on your efforts. Either it costs too much, or it's simply impossible with the
materials available. Sure, you could write a great symphony, if only you had a
symphony orchestra to play it. You'd love to make a fun movie, but you don't
have the cameras, equipment, laboratories, actors, or stage. Amateur
programming, on the other hand, is not so constraining. You can write a
program about anything you want. Spaceships, blackjack, accounting books,
wine – if you can imagine it, you can write a program about it. Computer
programming is limitless in scope because its expression is not tied to a
physical medium like clay or paint or photographic film. A computer program is
an assembly of thought and imagination captured in much purer form than we
can achieve with other media. There are constraints, it is true, but like a young
bird on its first successful flight, you will find the constraints less of a concern
than the possibilities.

The second reason for learning to program is that it will teach you the
importance of clear communications. This is a lesson that I did not learn until I
was well into my twenties. When I was younger, I would shoot off my mouth,
propelling myself into stupid predicaments at which I can now laugh only
because many years have passed. At the time, I angrily blamed these
catastrophes on the blindness of my seniors, who obviously lacked the insight
to recognize my genius. I now realize that these disasters arose because I did
not take the responsibility to express myself clearly.

It's not so easy to blame the computer when you fail to communicate clearly to
it. The young hothead can't blame the computer when his bad input generates
a "Syntax Error" message. He may rant and rave, but in his heart he knows that
the computer is not "out to get him" or "always trying to trip him up". (How
many times do parents hear those lines?) When the temper tantrum ends he
sits down and mutters to himself, "Well, let's figure out what I have to say to
this thing to make it do what I want." The good thing about this experience is
that the computer will eventually respond. If the hothead can only get his
inputs right, the computer will do whatever he wants it to do. The lesson
learned from this experience is important because people are much the same:
they will do almost anything for you, but you've got to make the effort to
communicate clearly. If only I had known that twenty years ago!

The most important reason for learning to program, though, concerns the way
that it will change your thinking. I maintain that learning to program will make
you a better thinker. To explain why, I'll have to digress for a moment and talk
about language.

I once had a crusty old English teacher who loved to embarrass his students.
Each time an assignment was handed in, he would select the worst blooper of
a sentence from the assignments and read it out loud, using voice intonation
to accentuate the absurdity of the blooper. Everybody in the class (save for
one) would howl with laughter at the inanity. When my turn came, I lacked the
wisdom to keep my mouth shut; no, I had to dig my hole deeper. "That's not
right," I argued, "I know what I'm talking about, and that's not what I meant to
say." To which my teacher retorted, "If you can't say it, you don't know it."

That English teacher had expressed one of the fundamental truths of human
existence. Thought and language are intimately associated. The expression of a
thought is not merely a postscript to the process of thinking the thought in the
first place. It's not as if our thoughts exist and grow in some pure, ethereal
"thoughtworld", devoid of any manifestation, until such time as we choose to
pluck one out of the mist and condense it into base words. No! The act of
expressing a thought is part and parcel of the thinking itself. Language is the
vehicle of thought.

This implies that the nature of our thinking is shaped by the nature of the
language we use. For most readers of this book, that language is English. Now,
English is a powerful and expressive language, arguably the most expressive in
the world. It is blessed with a vocabulary three or four times larger than that of
the next largest language. It is also blessed with an unparallelled flexibility. We
speakers of English warp and twist the language to suit our every whim. We
make puns, many of them strained. We coin a torrent of new words with wild
abandon. Advertising people routinely put the language through unbelievable
torture tests. And we shamelessly steal expressions from other languages. (The
French are the longest-suffering victims of this last crime, yet they seem
unappreciative of our attempts at reimbursement with such gems as "jeans"
and "burger".) Through all of this abuse, the English language performs with
truly English aplomb. It readily warps and twists to conform to all the ridiculous
demands we make upon it. It is the "one size fits all" language, the Gumby of
languages. This flexibility bestows upon the English language vast power to
express almost any new idea that comes along.

Yet, in one respect, English fails us. Sometimes we need to think with absolute
precision, with hard, unyielding logic. Sometimes our thoughts must be
disciplined and rigorous. On these occasions, flexibility becomes ambiguity,
and plasticity becomes imprecision. When one word, "dig", can mean three
completely different things (to excavate, to insult, or to appreciate), it loses all
value for absolutely precise expression.

I experienced this limitation of English when I taught college physics.
Oftentimes my students, struggling with difficult concepts, would throw their
hands up in frustration and grieve that they just weren't logical enough to
understand physics. Their admission reflected some truth and some falsehood.
It is false to assert that a human being is not smart enough to understand
physics. It is nonsense to assert that one person is innately incapable of
understanding a concept that another person grasps. It's not as if there were
some section of the brain labelled "Logical Thinking" that shows up as only a
void in some brains. It is true that some of my poorer students were not well-
equipped to learn physics, but their deficit was one of language, not of mind.
Trying to think logically with the English language is like trying to cut down a
tree with a nail file – it's the wrong tool for the job.

What we need is a different language, a language that expresses precise,
rigorous thoughts clearly and simply. I offer for your consideration a computer
language, any computer language. Computer languages are ideal tools for
disciplined thinking, for they are so emphatically narrow-minded. Every word
in a computer language means exactly one thing and only one thing. Words are
put together to form commands according to a simple set of rules for which
there are no exceptions and no variations. Computer languages are very
different from human languages like English. They are utterly useless for the
general-purpose work that human languages must handle, but they are much
better-suited for the special task of rigorous thinking.

Consider, for example, the expression of one of the fundamental laws of
physics, Newton's law of gravitation. In English, the law looks like this:

"The gravitational force between two point masses is directly proportional to


the product of their masses and inversely proportional to the square of the
distance between them."
Sounds pretty imposing, doesn't it? You probably have to read it a few times to
figure out what it's saying, don't you? Since you can't understand this
mouthful, you must be dumb, right?

Not necessarily. Consider now the way we would say the same thing in a
computer language, BASIC:

F=G*M1*M2/R**2

Now, if you can't read BASIC, this might look just as bad as the English
statement above. But I'll ask you to use your imagination for a moment. If you
were equally unfamiliar with English and BASIC, and I presented you with both
statements, which do you think would be easier to learn: the long unwieldy
one or the short simple one?
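You can even run the BASIC version as a tiny program. Here is a sketch; the line numbers and the sample values are invented for illustration, and some versions of BASIC write R**2 as R^2:

10 LET G = 6.67E-11
20 LET M1 = 5.98E+24
30 LET M2 = 70
40 LET R = 6.37E+06
50 LET F = G * M1 * M2 / R ** 2
60 PRINT F

With these made-up values (roughly the Earth and a person standing on its surface), the computer prints a force of about 688 – the person's weight in newtons.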

A language is a vehicle for exploring intellectual territory. English is like a dune
buggy; it's tough, it's resilient, and it can cover a lot of territory. But a
computer language is like a boat; it can take you to places you never dreamed
existed. You'll find your initial attempts at learning slow and tedious. After all,
any language, human or computer, is frustrating to learn at first. But once you
get rolling, you'll experience an exhilarating sense of discovery as you chart new
mental territory and think with a clarity and precision you had never thought
was within your reach. You should learn to program, and do it now, for
ultimately learning to program will do more for you than all the computer
games and spreadsheets and word processors in the world. It will make you a
better thinker.

Chapter Two

How to Talk to a Computer

BASIC

We will embark on this grand adventure using BASIC as our vehicle. BASIC is
the most common computer language; versions of BASIC are available on just
about every microcomputer. BASIC boasts two other advantages that make it
ideal for our efforts.

First, it is an interactive language; when you attempt something in BASIC, you
see the results of your attempt almost immediately. Many other computer
languages require you to endure time-consuming intermediate steps before
you can see the results of your work. BASIC's interactivity has great importance
to a beginning programmer. The fast feedback it offers makes it much easier to
see one's mistakes. Indeed, this fast feedback encourages lots of
experimenting with the language. The trial-and-error approach that BASIC
encourages is an excellent way to learn. Of course, professional programmers
are not supposed to use trial-and-error programming techniques, so computer
scientists disdain BASIC as a naughty language. Since we're not here to become
professional programmers, we'll just thumb our noses at those computer
scientists.

The second advantage of BASIC is its simplicity. Other computer languages
require you to learn all sorts of theory, structure, and special commands
before you can write your first program. It can take days or weeks of study
before one is knowledgeable enough to write the simplest program with the
language. Not so BASIC. This language is so simple that you can start using it in
a matter of minutes. As you learn more, your programs can become better. It's
that simple. And that is exactly how you're going to learn programming. Let us
begin.

Fire up your computer and get it started in BASIC. You'll have to go to the
computer store and buy a BASIC language program. Most computers will
require you to load BASIC into the computer from your disk drive. I can't tell
you how to set up your computer with BASIC because it varies with each
computer. Follow the instruction manual that comes with the BASIC language;
when the screen says "READY", you're all set to go.

KEYBOARD ETIQUETTE

Before you can easily converse with the computer, you will need to learn a few
simple rules of etiquette that will simplify matters for you. These rules aren't
"break this rule and the computer will explode into a million pieces" type rules;
they're not even "break this rule and I'll think that you are a bad person" type
rules. Instead, they are "break this rule and the computer will probably get
confused" type rules.

A computer keyboard acts just like a typewriter keyboard, with a very few
special exceptions. You type a key on the keyboard and the corresponding
character appears on the screen. The exceptions are what give people
problems. One of these is the backspace key. This key takes you back one
space, normally erasing the character to the left. This is the key you use to
correct a typo – you just backspace all the way back to the typo and
retype from there.

A very tricky key that causes no end of trouble for beginners is the control
key. It is usually marked "Ctrl" on the keyboard. It doesn't control anything at
all. It is really like a second shift key: you hold it down while striking a second
key. What does this tell the computer? That depends on the type of computer,
the program you are using, and very likely the phase of the moon. Control-X,
for example, might delete a character in some situations; in other situations it
might exit a program. How do you know what the control key does in your
situation? You have to study the documentation. Oftentimes the only
explanation of a particular control key combination is found in a footnote on a
loose-leaf addendum that fell out of the binder when you opened the package.
For now, a very good rule is: don't bother with the control key.

It's important to realize that the computer does not carry on a normal human-
type conversation with you. That is, it's not sitting there listening to everything
you type in. In most situations, you type in an entire command like "DO THIS"
and then you hit the RETURN key on the right side of the keyboard. The
RETURN key has a special meaning to the computer. It means "OK, computer,
I've gotten this command just the way I want it. Go ahead and do it now." Lots
of people will type a command into the computer and get mad waiting for the
computer to execute it; meanwhile, the computer is waiting for them to hit the
RETURN key to tell it to go ahead.

Sometimes when you are working with the computer you get stuck. You tell it
to do something and for some reason it just sits there like a dumb ox. Or
maybe you tell it to do one thing and it retorts with some idiotic question like
"How many pages to print?" when you didn't want to print any pages. You tell
it "no pages!" but it just keeps on asking for the number of pages to print. You
type "0", "none", "nada", but it makes no difference; the computer wants to
know how many pages you want to print. Getting stuck like this is very
frustrating and has done more to slow the computer revolution than anything
else. How do you get unstuck? Well, there is always a right way to get unstuck,
some sneaky command like Control-Q that will magically solve all your
problems, but it might take you forever to find that command in the manual.
So it's good to know a few emergency techniques that often work, although
they frequently have some nasty side effects.

The first emergency command is the ESCAPE key. Most computers place the
ESCAPE key on the upper left corner of the keyboard. It's probably labelled
"ESC". ESCAPE is supposed to mean something like "escape from this awful
trap", and is frequently used as a fairly innocuous escape hatch.

If the ESCAPE key doesn't work, the next one to try is often the BREAK key. This
key, if it exists on your computer, will probably be on the upper right corner of
the keyboard; it might be called "BRK". Its name is misleading. It will not break
your computer. It is supposed to mean something like "break off this line of
communication". The BREAK key will almost always break you out of a BASIC
program that's locked up.

Even more powerful than the BREAK key is the RESET key. This is a powerful
and dangerous key. RESET usually means "Stop everything! Erase your
memory! Start all over from scratch!" RESET will erase everything in the
computer's memory: the program, the data, everything. The only survivors will
be the programs stored in ROM and whatever you saved onto disk. When you
use RESET, you are throwing away all the work you did since you last saved
things to disk. But at least it regains control of the computer.

There are rare occasions when even RESET won't do the trick. Some computers
don't have a RESET key. Sometimes the computer goes so completely bonkers
that even the RESET key doesn't bring it around. There is always one last resort
with any computer that has gone wild: turn it off. This is a violent and brutal
tactic, akin to the determined leader slapping his raving underling with the
command, "Get ahold of yourself, kid!" Wait a few seconds for the computer
to completely forget whatever was driving it wild, then turn it back on. It will
awaken calm, serene, and completely unaware of the wild or stubborn
behavior that had possessed it only seconds earlier. Please, however, do not
overuse this tactic; it's not good for the computer's chips if overdone. How
would your body feel if, every time you made a tiny mistake, someone put you to
sleep with an anesthetic shot, and woke you up five seconds later with a cold
shower?

These rules of etiquette with the computer are only general guidelines; in
many situations they may not apply. But they are useful when you are in doubt
and cannot find the correct course of action.

A COMMAND

Let's try something with the computer. Type the following command into the
computer:
PRINT "HELLO"

This means that you should type the capital letters P,R,I,N,T, a quotation mark
with two ticks, not one, the capital letters H,E,L,L,O, and another quotation
mark with two ticks. When it looks right, press the RETURN key. If you typed it
properly, the computer will print the word "HELLO" directly underneath your
command. Thus, your screen should look like this:

PRINT "HELLO"

HELLO

Congratulations! You have just given the computer a command, and it has
successfully executed your command. You have demonstrated who is the
master and who is the slave.

Now try variations on the theme. Give it the same PRINT command, only this
time, instead of the word "HELLO", try some other words. Try uppercase and
lowercase letters; give it your name to print in some clever sentence. The
general rule is simple: anything you put between the quotation marks, it will
print. Try it.
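For example, all of the following are legitimate PRINT commands (the words between the quotation marks are only samples; you can put anything there):

PRINT "GOODBYE"
PRINT "My name is Fred"
PRINT "I am the master of this machine"

In each case the computer echoes back exactly what you placed between the quotation marks, and nothing more.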

Now let's try something else. Let's give it a new command, not a PRINT
command, but a nonsense command. Make up something good, like
SNARGLEBLAB, or XQYQKLZ, or whatever, then type it in and press RETURN.
What happens? The computer will probably come back with a short, strange
message. Most likely it will say "SYNTAX ERROR", but it might say something
like "ERROR 23" or "HUH?" or some other indication that it is confused. Here
you encounter one of the most important and frustrating problems in all of
computing, the source of more frazzled nerves, lost hours, and smashed
computers than all other sources combined. I refer to the scurrilous, sinful,

SYNTAX ERROR

A syntax error arises whenever a command is misspelled or out of place, or in
some way contains a flaw. To properly understand syntax errors and their
implications, you must first understand some general principles of
communications with flaws.
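For example, a single mistyped letter is enough to provoke the complaint. Suppose you meant to type PRINT but your finger slipped (the exact wording of the error message varies from computer to computer):

PRUNT "HELLO"

SYNTAX ERROR

The computer has no way of guessing that PRUNT was meant to be PRINT; one wrong letter and the command means nothing to it.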

Consider communications between people, such as speech or writing. We
normally think of such communication as being flawless, yet in truth all
communications between people are full of flaws. The trick is, we are very
good at filling in the gops created by such flaws. For example, in the previous
sentence, I deliberately misspelled the word "gaps"; did you notice? If you did
notice, it certainly didn't stop you and probably didn't even slow you down.
Consider the process you went through when you read the word "gops". On
first reading the word, you didn't recognize it. You probably went back and
reread it to make sure that you got it right. Sure enough, it really does say
"gops". OK, scan memory carefully: is there such a word as "gops"? No, there
isn't. Well, what could it be? Here you went through a long, almost
unconscious pattern-matching process, considering such possibilities as "gobs",
"hops", "tops", "cops", "gods", and so on until finally you came upon "gaps".
You realized that "gaps" fit the sentence perfectly, and so you mentally
replaced "gops" with "gaps". And the entire process probably took you less
than a second.

This kind of activity is very common in human communications. It is absolutely
essential in understanding spoken language, for spoken language is a messy
hodge-podge of words, phrases, and clauses thrown together in a wild jumble.
Let's eavesdrop on a conversation between two people on a city street:

"Fine, uh, did you get the, uh, excuse me, the uh, package from Steve?"

"Steve, jeez, he's too busy. Yeah, I got a quarter. Sure. He gave it to Fred."

"What the hell's he need with it?"

"Hey, there's Tom. Hey, Tom, where'd you get that bump? No, he doesn't need
it, he was just around when Steve needed to hand it off to somebody."

"So when do you collect from Fred?"

"Oh, two, three days."

You probably have very little trouble figuring out the meaning of this
conversation. But if you were to look at it coldly, like an English teacher, why,
you'd find all sorts of mistakes and syntax errors. Indeed, you could easily show
that this conversation makes absolutely no sense.

Yet, it does make sense; you can figure out what's going on. How are we able
to extract meaning from communications that are highly flawed? Some people
give redundancy as the answer. Redundancy is the practice of doing the same
thing two or more times as a back-up. The reasoning is that a normal English
sentence is redundant; it says its message more than once, so that if the
message becomes garbled, people can still understand it.

There is much truth to the redundancy argument, but I don't think that it hits
the nail on the head. I prefer to think in terms of context. A word does not
exist in isolation; it is part of a sentence. A sentence does not exist in isolation;
it is part of a longer communication. And even communications do not exist in
isolation; they are part of a relationship and part of the world. The sentence
provides context for the word; the communication provides context for the
sentence; and the relationship provides context for the communication. The
context provides information that allows us to fill in the gaps in
communications, to correct the flaws.

For example, in the street conversation, the pronoun "it" refers to the package.
We know that even though we talk about other possible antecedents (a
quarter, a bump), because the real thrust of the conversation is about the
package, not the quarter or the bump. The context of the conversation tells us
that. On a higher level, we are able to understand the comment, ". . .Yeah, I
got a quarter. Sure . . .", only if we know something about city streets and
panhandlers. Again, the context of the world gives meaning to an otherwise
flawed sentence.

Now at last, I am ready to make my point about syntax errors. We humans are
able to understand deeply flawed communications only because we
understand the context in which they are made. But computers don't know
anything about the world, so they don't have any context, so they cannot
correct flawed communications as we can. Hence, the tendency of computers
to be so frustratingly picayune, so infuriatingly narrow-minded.

Actually, computers aren't the only ones to exhibit extreme sensitivity to
syntax errors. In the right circumstances, people can be just as block-headed.
How many times have you heard this line:

"We're sorry, sir, but we cannot honor your request. You have not properly
filled out form RC/22b, Authorization for Transmittal of Individual Confidential
Information. Please review this form and fill it out completely and resubmit
your application."

If you think about it, this is really the same response as the computer's
"SYNTAX ERROR" plaint. Both the computer and the bureaucrat respond to a
communication that does not fit their required format with the same
unyielding obtuseness.

Just to be fair, let me throw in another example:

"If the moment of inertia tensor is not diagonal, unstable perturbations will
develop."

Does this sentence make any sense to you? You have no context with which to
understand it. How else could you respond to it other than to throw up your
hands and say "Syntax error!"?

Whose fault is it in a case like this? Who has the responsibility for ensuring that
the communication gets through loudly and clearly? The speaker or the
audience? We Americans, steeped in egalitarianism and a spirit of cooperation,
tend to answer that both sides share responsibility for clear communications. I
disagree, and I think that your experiences with computers will lend weight to
my position. I think that the responsibility for clear communications falls
squarely on the shoulders of the speaker.

In theory, the audience of a communication is supposed to give feedback to
the speaker on how clearly the message is getting through. Thus, if you and I
are conversing, and something I say doesn't make sense to you, you are
supposed to say, "Run that by me one more time, please." This scheme ensures
that our conversation will move smoothly and efficiently.

In the real world, however, this seldom happens. Sometimes the audience
doesn't want to appear stupid by asking a dumb question. Sometimes the
listener believes that his confusion is temporary and will quickly abate.
Sometimes the listener is so completely lost that he doesn't even know where
to begin. Whatever the cause, he just nods his head knowingly and mutters,
"That's fascinating".

How then can we communicate well if our audience can't be trusted to speak
up when it is confused? There is only one sure-fire solution, and that is to
assume very little of the audience. You must be pessimistic, assuming that
everything you say is lost on the audience. You must drive home every single
point with ruthless determination. You must be a defensive driver,
imagining what is going on in the mind of the audience, trying to guess where
they might be tripped up next. This is the only way to communicate with
confidence.

It is a hard lesson to learn, but the computer provides an excellent training
ground. It assumes nothing, knows nothing. It has no context with which it can
second-guess what you really meant to say. It can respond to your commands
only in their absolute, literal sense. It will indeed drive you crazy. But it will also
discipline you to think hard about exactly what you say. It is a sobering
experience to have so many of your commands rejected as syntax errors; you
never knew you were so sloppy a communicator. But once you
learn to express yourself clearly and carefully, you will find that the number of
syntax errors you generate will fall.

The lesson you learn in the process will benefit you in many areas other than
computers. Precise communications are important in almost everything we do.
How many arguments have you endured that were started by a
misunderstanding? How many foul-ups have you gotten into because
somebody didn't express themselves exactly? I well remember the time my
father drove 30 miles to pick me up "at the Doggie Diner on Main Street".
Neither of us knew that there were two Doggie Diners on Main Street. He went
to one, I went to the other. If I had been more specific, the screwup would
never have developed.

The most powerful demonstration of the crucial importance of precise
communications is provided by an airline accident some years ago. As I recall,
the tower had instructed the pilot, "Come down 3000". This was an ambiguous
instruction; was the pilot instructed to come down to 3000 feet, or was he
ordered to come down by 3000 feet? The pilot assumed the former, but the
tower meant the latter. The pilot came down to 3000 feet; there was a
mountain at 3200 feet; there were no survivors. Precise communications can
be a matter of life and death.

Chapter Three

Arithmetic, Deferred Execution, and Input

You are now on speaking terms with your computer. The next task is to learn a
few simple expressions, the computer equivalent of "My name is Fred", "Does
this bus go to Notre Dame?", or "Where is the bathroom?". This chapter will
introduce you to three absolutely fundamental facets of computing:
arithmetic, deferred execution, and input. We begin with

ARITHMETIC
Many people mistakenly think that performing arithmetic computations is the
prime function of a computer. In truth, computers spend most of their time
doing far less exalted work: moving bits of information around from one place
to another, painstakingly examining huge piles of data for those few scraps of
data that are just what the user ordered, or rewriting the data in a form that is
easier for the user to appreciate. Nevertheless, arithmetic is an excellent topic
to begin studying because it is familiar to people. If you can do arithmetic on a
calculator, you can do arithmetic on a computer. In fact, it's even easier on the
computer. Try this with your computer:

PRINT 3*4

The computer will print the answer, 12, right under your command, so quickly
that you might suspect it's up to some trickery. OK, type in some different
numbers. Use some big, messy numbers like 3254 or 17819. The general rule
is: first, type the word PRINT in capital letters. Then put a space. Then the first
number, an asterisk to mean "multiply" and then the second number. If you
make a mistake, use the BackSpace key to go back over the mistake, then type
it over. When you have it right, press the RETURN key.

You may get a few minor items wrong. For example, when you used a number
like 3254, did you type it as 3254 or as 3,254? That comma in between the 3
and the 2 will generate a syntax error. It may seem picayune, but I warned you
that computers have no sense of context. Because commas are so small and
hard to notice, they cause more syntax errors than any other character. So
watch your commas!

The spaces are also important. Some versions of BASIC use a space as a
"delimiter". A delimiter is a marker that tells you where the end of one word is
and where the beginning of the next word is. It may seem silly
untilyoutrytoreadabunchofwordswithoutanydelimitersatall. So give the
computer a break and give it spaces where it needs them. B u t d o n ' t p u t i n
extraspacesorthecomputerwillgetveryconfused,OK?

There is no reason why you have to restrict yourself to multiplication. If you
wish, you can do addition, subtraction, or division just as easily. The symbol for
multiplication is an asterisk: *. The symbol for addition is a plus sign: +. The
symbol for subtraction is a minus sign: -. And the symbol for division is a slash:
/. With division, the computer will divide the first number by the second
number. With subtraction, the computer will subtract the second number from
the first number. So to subtract 551 from 1879 you type:

PRINT 1879-551

To divide 18 by 3 type:

PRINT 18/3

But what if you want to do more complex calculations? Suppose, for example,
that you want to add 8 to 12 and divide the sum by 4. The first idea that comes
to most people's minds is to type:

PRINT 8+12/4

which will yield a result of 11. Why? Because this command is ambiguous. I
told the computer to do two operations: an addition and a division.
Which one did I want done first? It makes a difference! The way I described the
problem, I wanted the addition done first, then the division. Instead, the
computer did the division first, dividing 12 by 4 to get 3. Then it added 3 to 8 to
get 11. If it had done what I wanted it to do, it would have added 8 to 12 to get
20, then divided the 20 by 4 to get 5. Quite a mixup, yes?

How does one avoid mixups like this? The primary means is through an idea
called "operator precedence". This is a big phrase that means very little.
Whenever we have a situation in which two operators (an operator is one of
the four arithmetic operation symbols: +, -, *, or /) vie for precedence, we
automatically yield to the * or the /. It's one of those arbitrary rules of the road
like "Y'all drive on the right side of the road, y'hear?" Thus, in our example
above, the computer gave precedence to the division operation over the
addition operation, and performed the division first.
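You can verify this rule for yourself with a command like this one:

PRINT 2+3*4

If the addition were done first, the answer would be 20. Because the multiplication takes precedence, the computer multiplies 3 by 4 to get 12, adds 2, and prints 14.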

If you are a reasonable and thoughtful person, you probably have two quick
objections to this system of operator precedence. First, you might wonder
what happens when two operators with equal precedence contest each other.
Who wins? Well, it turns out that it doesn't really matter. For example, if I
type:

PRINT 3+4-2

It doesn't matter one bit whether the addition or the subtraction is done first.
Try it. 3+4 is 7; subtract 2 gives 5. If you do it backwards, 4-2 is 2; add 3 gives 5.
See? It doesn't matter what order you do them in. The same thing applies to
multiplication and division:

PRINT 3*4/2

If we do the multiplication first, we get 3*4 is 12; divide by 2 gives 6. If we do
the division first, then we get 4/2 is 2; multiply by 3 gives 6. So operator
precedence doesn't matter with operators of equal precedence.

Your second objection might be, "OK, how do we get the computer to do the
calculation that we really wanted:"

PRINT 8+12/4

In other words, how do we get the computer to add 8 to 12 before it divides by
4? The answer is to bring in a new concept, the parenthesis pair. If you want a
particular operation done first, bundle it up with a pair of parentheses, like so:

PRINT (8+12)/4

I always imagine parentheses as a pair of protective arms huddling two
numbers together, protecting them from the cold winds of operator
precedence. In our example, that 12 belongs with the 8, not the 4, but the
cruel computer would tear our hapless 12 away from the 8 and mate it in
unholy union with the 4. The parentheses become like the bonds of true love,
protecting and preserving relationships that a cold set of rules would violate.
To adapt a phrase, "Parenthesis conquers all." So much for ridiculous
metaphors.

You can use parentheses to build all sorts of intricate arithmetic expressions.
You can pile parentheses on top of parentheses to get ever more complex
expressions. Here is an example:

PRINT (((3+4)/7)+((6-2)/2))/3

What does this mess mean? The way to decode a monstrosity like this is to
start with the innermost operation(s) and work outward. In this example, the
3+4 is an innermost operation, and so is the 6-2. They are innermost because
no parentheses serve to break up the computation. If you were to mentally
perform these operations, you would see that the big long command is
equivalent to:

PRINT (((7)/7)+((4)/2))/3

All I did to get this was to replace the "3+4" with a "7", and replace the "6-2"
with a "4". Now notice that both the 7 and the 4 are surrounded by a complete
pair of parentheses. Now, a pair of parentheses around one single number is a
waste of time, because you don't need to protect a solitary number from
anything. Remember, parentheses protect relationships, not numbers. Having
a pair of parentheses around a number is like putting a paperclip on a single
piece of paper. So let's get rid of those excess parentheses:

PRINT ((7/7)+(4/2))/3

Now we have another pair of uncluttered operations: 7/7 and 4/2. Let's make
them come true:

PRINT ((1)+(2))/3

Well, gee, now we have more numbers floating inside extraneous parentheses.
Out go the extra parentheses:

PRINT (1+2)/3

Now we're getting so close we can smell it. Finish up the operation:

PRINT (3)/3

Clear out the parentheses:

PRINT 3/3

And there is the answer:

PRINT 1

This long exercise shows how the computer figures out a long and messy pile
of parentheses.

How do you create such a pile? There is no specific answer to this question, no
cookbook for building expressions. I can give you a few guidelines that will
make the effort easier. First, when in doubt, use parentheses. Whenever you
want to make sure that a pair of numbers are calculated first, group them
together with a pair of parentheses. Using extra parentheses is like using extra
paper clips: it is a little wasteful but it doesn't hurt, and if it gives you some
insurance, do it.

Second, always count your parentheses to make sure they balance. If you have
five right parentheses, then you must have five left parentheses, no
more, no less. If your parentheses don't balance, you will generate a syntax
error.
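For example, this command is one parenthesis short:

PRINT ((3+4)/7

Count them: two left parentheses, but only one right. The exact complaint varies from one version of BASIC to another, but the computer will refuse to perform the command until the parentheses balance.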

VARIABLES
Congratulations! All of this learning has catapulted you to the level at which
you can use your expensive computer as a $10 calculator. If you are willing to
continue, I can now show you an idea that will take you a little further than
you could go with a calculator. It is the concept of indirection as expressed in
the idea of a variable.

Indirection is one of the most important concepts associated with computers.
It is absolutely essential that you understand indirection if you are to write any
useful programs. More important, indirection is a concept that can be applied
to many real-world considerations.

In the simplest case of indirection, we learn to talk not of a number itself, but
of a box that holds the number, whatever it might be. The box is given a name
so that we can talk about it. For example, try this command on your computer:

FROGGY=12

This command does two actions: first, it creates a box (a variable) that we
will call "FROGGY"; second, it puts the number 12 into this box. From here on,
we can talk about FROGGY instead of talking about 12.

You might wonder, why do we need code words for simple numbers? If I want
to mess around with the number 12, why don't I just say 12, instead of going
through all this mumbo-jumbo about FROGGY?

The trick lies in the realization that the actual value at any given instant is not
the essence of the thing. For example, suppose we talked about a different
number: the time. Let's say that you and I are having a conversation about
time. You say, "What time is it?" I say, "The time is 1:22:30." That number,
1:22:30, is formatted in a strange way, but you have to admit that it is a
bonafide number. Thereafter, whenever you think of time, do you think of
1:22:30? Of course not. Time is a variable whose value was 1:22:30 for one
second. When we think about time, we don't fixate on the number 1:22:30; we
instead think of time as a variable that can take many different values. This is
the essence of a variable: something that could be any of many different
numbers, but at any given time has exactly one number. For example, your
speed is a variable: sometimes you are going 55 mph and sometimes you are
going 0 mph. Your bank balance is a variable: one day it might be $27.34 and
another day it might be $5327.34.

The importance of indirection is that it allows us to focus our attention on
grander relationships. Do you remember your grammar school arithmetic
exercises: "You are travelling at 40 mph. How far can you travel in two hours?"
This is a simple arithmetic problem, but behind it lies a much more interesting
and powerful concept. It lies in the equation

distance = speed * time

The big idea here is that this equation only makes sense if you forget the petty
details of exactly what the speed is, and what the time is. It is true whatever
the speed is, and whatever the time is. When we use an equation like this, we
transcend the petty world of numbers and focus our attention on grander
relationships between real-world concepts. If, to understand this equation, you
must use examples ("Well, 40 mph for 2 hours gives 80 miles"), then you have
not fully grasped the concept of indirection. Examples are a useful means of
introducing you to the concept, of supporting your weight as you learn to walk,
but the time must come when you unbolt the training wheels and think in terms
of the relationship itself, not merely its application in a few examples.
Variables are the means for doing this.
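You can express this very relationship on your computer. The variable names here are my own invention; note that some versions of BASIC reserve the word TIME for their own use, so I have used HOURS instead:

SPEED=40
HOURS=2
DISTANCE=SPEED*HOURS
PRINT DISTANCE

The last command prints 80. Change SPEED or HOURS to anything you like, and the same third command still computes the right distance. That is indirection at work: the command describes the relationship, not any particular numbers.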

There is an experimental effort underway at some computer science
laboratories to develop a computer language in which the user is not required
to think in terms of indirection. It is called "programming by example", and is a
total perversion of the philosophy of computing. The user of such languages
does not describe concepts and relationships in their true form; instead, he
provides many examples of their effects. The computer then draws inferences
for the user and engages in the indirection itself. In an extreme application of
this philosophy, the user would not tell the computer that "distance = speed *
time". Instead, the user would tell the computer that "When the speed was 40
mph and the time was 2 hours, the distance was 80 miles; when the speed was
20 mph and the time was 1 hour, the distance was 20 miles." After the user
succeeds in listing enough examples, the computer is able to infer the correct
relationship.

Programming by example appears to be a new application of artificial
intelligence that will make computers more accessible to users by allowing
them to program the computers in simple terms, without being forced to think
in terms of grand generalities. In truth, it is a step backwards, for it reverses
the relationship between human and computer. It forces the human to do the
drudge work, listing lots of petty examples, while the computer engages in the
exalted thinking. The proper relationship between human and computer
makes the human the thinker and the computer the drudge. To realize this
relationship, you must have the courage to use your mind, to think in larger
terms of relationships between variables, not merely individual numbers.
Unless, of course, you enjoy being a drudge.

The concept of indirection is not confined to mathematical contexts. We use
indirection in our language all the time. When we say, "Children look like their
parents", we are making a general statement about the nature of human
beings. Only the most literal of nincompoops is troubled by this statement,
asking "Which children look like which parents?" We all know that the noun
"children" applies to any children. It is a variable; if you want a specific case,
then grab a specific child off the street and plug him into this verbal equation.
Take little Johnny Smith; his parents are Fred and Wilma Smith. Then the
statement becomes "Johnny Smith looks like Fred and Wilma Smith." Again,
the important concept is not about Johnny and Fred and Wilma, but about
children and parents in general.

Time to get back to variables themselves. A variable is a container for a
number. We can save a number into a variable, and thenceforth perform any
operations on the variable, changing its value, multiplying or dividing other
numbers by the variable, using it just as if it were a number itself. And it is a
number, only we don't care when we write the program whether that number
is a 12 or a 513; our program is meant to work with the variable whatever its
value might be.

Some exercises are in order. Try this:

FROGGY=3*5
PRINT FROGGY

Now make some changes:

BIRDIE=FROGGY+5
PRINT BIRDIE

Not only can you put a number into a variable, but you can also take a number
out, as demonstrated by this example. The computer remembers that FROGGY
has a value of 15, and retrieves that value to calculate the value of BIRDIE.

Now for something that might really throw you:

FROGGY=FROGGY+1

If you think in terms of algebra, this equation must look like nonsense. After
all, how can a number equal itself plus 1? The answer is that the line presented
above is not an equation but a command. It is called an assignment statement,
for its true function is not to declare an equality to the world but to put a
number into a variable. An assignment statement tells the computer to take
whatever is on the right side of the equals sign, calculate it to get a number,
and put that number into the variable on the left side of the equals sign. Thus,
the above assignment statement will take the value of FROGGY, which
happens to be 15 just now, and add 1 to it, getting a result of 16. It will then
put that 16 into FROGGY.
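You can watch the assignment happen by printing the variable before and after:

FROGGY=15
PRINT FROGGY
FROGGY=FROGGY+1
PRINT FROGGY

The first PRINT shows 15; the second shows 16.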

It is time to summarize what we have learned before we move on to deferred
execution:

1) You can form an expression out of numbers, operators, and variables.
2) Multiplication and division have precedence over addition and subtraction.
3) Parentheses defeat the normal rules of precedence.
4) Variables are "indirect numbers" and can be treated like numbers.
5) You set a variable's value with an assignment statement.
6) Anything you can calculate, you can PRINT.

With these items under our belts, let's move on to the next topic.

DEFERRED EXECUTION
This rather imposing term, sounding like a temporary reprieve from a death
sentence, in truth means something far less dramatic. In the context of
computers, execution means nothing more than the carrying out of
commands. One does not idly converse with a computer; one instead issues
commands. All of the things you have learned so far, and all of the things that
you will learn, are commands that tell the computer to do something. You
issue the command, and the computer executes the command. The question I
take up in this section is, When does the computer execute your commands?

You might think the question silly. After all, you didn't buy the computer to sit
around and wait for it to execute your commands at its leisure. I can imagine
you barking in true military style, "Computer, when I issue a command, I want
it executed NOW, not later!"

But there are indeed times when it is desirable for the computer to be able to
execute your commands later. A command that is executed now happens once
and is gone forever, but a command that can be executed later can be
executed tomorrow, and again the next day, and the next, and the next, as
many times as you want. We can give a command right now and expect that it
be executed right now; but it would be even more useful to be able to record a
command right now and execute it at any later date.

This still may seem a bit silly. Why should anyone bother recording a command
for later reference? If I want to PRINT 3+4 sometime next week, why don't I
just type "PRINT 3+4" next week when I need it? Why go to the bother of some
scheme for storing that command for later use?

The answer is, it all depends on how big a command you consider storing.
There isn't much point in storing a simple command such as "PRINT 3+4". But
what if you have a big calculation that has many steps? Typing in all those
steps every single time you wanted to do the calculation would be a big job. If
you could store all those steps the first time, and then call them automatically
every time you needed to do the calculation, then you would have saved a
great deal of time. What a wonderful idea!

There is a term we use for this wonderful idea: we call it a computer program.
A computer program is nothing more than a collection of commands for the
computer, saved for future reference. When you tell the computer to run a
particular program, you are instructing it to execute all those commands that
were stored by the programmer.

There is an interesting analogy here. Suppose that you were the boss at a
factory. It would be wasteful to stand over each worker, telling him or her
what to do at each step of the manufacturing process. ("OK, now put that short
screw into the hole at the top. Good. Now put the nut onto the bolt. Now...") A
much more efficient way is to explain the entire process to the worker before
he or she starts work. Once the worker has memorized the process, you don't
have to worry about him or her any more. This is analogous to the storing of
commands for a computer. What is particularly curious is the concept of a
program that you, the user, did not write. When you buy a computer program
and put it into your computer, it is rather like the boss at the factory saying to
the workers, "Here is a book of instructions for how to build a new machine. I
don't even know what the instructions are, but I like the machine. Follow these
instructions."

The concept of deferred execution is not unique to the computer. We see it in
a variety of places in our regular lives. A cookbook is a set of commands that
tell you how to make food. In the corporate world we have the venerable
"Policies and Procedures Manual" that tells us how to get along in the
corporate environment. But my favorite example is the Constitution of the
United States of America. This document is composed of a set of commands
that prescribe how the government of the USA will operate. It specifies who
will do what, and when, and how. Like a computer program, it has variables:
the President, the Congress, the Supreme Court. Each variable can take
different "values": the President can be Washington, Lincoln,
Roosevelt, and so on, but the commands are the same regardless of the
"value" of the President, the Congress, or the Supreme Court. Like any
computer program, a great deal of effort was expended getting each part of
the Constitution just right, tightening up the sloppy wording, making sure that
the commands would work in all conceivable situations. And like any real
computer program, the programmers have spent a long time getting all the
bugs out. Despite this, it has worked very well for nearly two hundred years
now. Show me a computer program with that kind of performance record.

So the concept of deferred execution is really not some weird new idea that
only works in the silicon minds of computers. It's been around for a while. With
computers, though, deferred execution is used in a very pure, clean context,
uncluttered by the complexities of the real world. If you really want to
understand the idea of deferred execution, the computer is the place to see it
clearly.

How do you get deferred execution on your computer? With BASIC, the
technique is simple: give numbers to your commands. Where earlier you
typed:

FROGGY=3*5
BIRDIE=FROGGY+5
PRINT BIRDIE

Now type this:

1 FROGGY=3*5
2 BIRDIE=FROGGY+5
3 PRINT BIRDIE

Those numbers in front of the commands tell the computer that these
instructions are meant to be executed later. The computer will save them for
later use. To prove it to yourself, type LIST. Sure enough, the computer will list
the program that you typed in. It remembers! Even better, it can now execute
all three commands for you. Type

RUN

The computer will respond almost instantly by printing "20" immediately
below the command RUN. It executed all three commands in your little
program, and those three commands together caused it to print the "20".
Congratulations. You have written and executed your first computer program.
Break out the champagne.

Those numbers in front of the commands (the 1, 2, and 3 that began the
lines) actually tell the computer more than the mere fact that you intend the
commands to be executed later. They also
specify the sequence in which the commands are to be executed. The
computer automatically sorts them and executes command #1 first, command
#2 second, and command #3 third. The sequence with which commands are
executed can be vitally important. Consider this sequence of commands:
1. Put the walnut on the table.
2. Move your hand away from the walnut.
3. Hit the walnut with the hammer.

Now, if you got the commands in the wrong sequence, and executed #1, then
#3, then #2, you would truly appreciate the importance of executing
commands in the proper order. That's why we give numbers to these
commands: it makes it very easy for the stupid computer to get the right
commands in the right order.

By the way, it isn't necessary to number the commands 1, 2, 3, . . . and so on.
Most BASIC programmers number their commands 10, 20, 30, . . . and up. The
computer is smart enough to be able to figure out that 10 is smaller than 20,
and so it starts with command #10, then does command #20, then command
#30, and so on. It always starts with the lowest-numbered command, whatever
that is, and then goes to the next larger number, then the next, and so on. You
might wonder, why would anybody want to number their commands by 10's
instead of just plain old 1, 2, 3, . . . Well, consider the wisdom of the Founding
Fathers. They wrote the best Constitution they could, and then they made a
provision for adding amendments to their masterpiece. They knew that, no
matter how good their constitution was, someday there would be a need to
change it. Now, if you number your commands 1, 2, 3, . . . and someday you
need to change your program, what are you going to do if you need to add a
new command between command #2 and command #3? Sorry, the computer
won't allow you to add a command #2 1/2. But if you number your commands
10, 20, 30, . . ., then if you need to add a command between #20 and #30, you
just call it command #25. Unless, of course, you are wiser than the Founding
Fathers, and expect no need to change your program. . .
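Here is how such an amendment works in practice. Suppose you had typed this program:

10 FROGGY=3*5
20 BIRDIE=FROGGY+5
30 PRINT BIRDIE

and later decided that FROGGY should have 1 added to it before BIRDIE is calculated. Just type:

25 FROGGY=FROGGY+1

Type LIST and you will see that the computer has slipped the new command into place between #20 and #30. RUN the program and it will now print 21 instead of 20.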

You can now write very large programs. Just keep adding commands, giving
each a line number, making sure that they are in the order you want, and
trying them out with the RUN command. When you get a program finished the
way you want it, or if you want to save it before ending a session with the
computer, you must tell the computer to save the program to your diskette.
The command for doing this will probably look something like this:

SAVE"MYPROGRAM"

Unfortunately, since all computers are different, you will probably need to type
something slightly different from this. Look up the exact wording in your BASIC
manual under "Saving a Program". The "MYPROGRAM" part is the name that
you give your program. You can give your program almost any name you want.
Call it "THADDEUS" or "AARDVARK" or "ROCK"; about all the computer will
care is that 1) you don't give it a name that it's already using for something
else, and 2) that the name isn't too long; usually 8 characters is the limit.

If you don't save the program, then it will be lost as soon as you turn off the
computer or load another program. When you want to get your program back,
you will have to type something just like the SAVE command, only you type
"LOAD" instead of "SAVE". You'll still have to tell it the name of the program
that you want to load.

INPUT
One last topic and we are done with this chapter. I want to introduce you to
the INPUT command. This little command allows the computer to accept input
from the keyboard while the program is running. An example shows how
simple it is:

10 INPUT FROGGY
20 BIRDIE=FROGGY+5
30 PRINT BIRDIE

If you were to RUN this program, you would see a question mark appear on the
screen. The question mark is a prompt, the computer's way of telling you that
it is expecting you to do something. In this case, it is waiting for you to type a
number and press RETURN. When you do this, it will take that number and put
it into FROGGY. Then it will proceed with the rest of the program. That's all the
INPUT statement does; it allows you to type in a number for the computer to
use.

Despite its simplicity, the INPUT command has vast implications for
programming. Up until this point, the programs you could write would always
do the same thing. Your first program, for example, would always calculate
3*5+5 to be equal to 20. Now, this may be an exciting revelation the first ten
or twenty times, but eventually it does get a little boring to be told for the
umpteenth time that 3*5+5 is 20. With the INPUT statement, though, you can
start to have some variety. Using the program listed above, you could type in a
different number each time you ran it, and get a different answer. You could
type in 8, and discover that 8+5 is 13; then you could type in 9, and learn that
9+5 is 14. For real thrills, you could type in a big, scary number like 279, and
find out that 279+5 is 284. Wowie, zowie! Aren't computers impressive?

Have patience, this is only chapter 3.

Chapter Four

Decisions, Decisions

PROGRAM FLOW
I begin this chapter by introducing you to a new concept that you probably
already understand intuitively. It is the concept of program flow. Every
program traces a path as it executes, carrying out first one command, and then
another, and another, until it reaches the end of the program; then the
computer stops executing the program. Program flow is similar to the concept
of a storyline for a movie: first the heroine does this, and then that happens,
and then somebody else does something, and then the heroine does
something heroic, and they all live happily ever after. The story moves from
one event to another as it follows the storyline. So too does the computer
move from one command to another as it follows the program flow. Perhaps
we could even make a movie out of a single program's execution:

Our story starts with FROGGER set equal to 4 ("Wow!"). Then, all of a sudden,
he added 12 to FROGGER! ("Oh, no!") But, quick as a flash, he stored the
answer into BIRDIE ("Whew, that was close!"). In the end, he printed both
FROGGER and BIRDIE, and they were both correct. ("Yay!").

Granted, this would not make a very exciting movie, but it does show the
similarity between program flow and storyline.

The important concept here is that the program flow starts at the first
command of the program, moves to the second command, then the third, the
fourth, and so on until it reaches the end of the program. The program flow
follows a straight path from the beginning of the program to the end, just like a
movie does.

At least, that's how simple programs work. But in this chapter I am going to
introduce you to something you will never see in the movies: program flow
that branches. The best way to appreciate branching program flow is to
imagine a movie that you could change. For example, imagine a movie in which
the beautiful young heroine's car mysteriously breaks down on a dark and
stormy night in front of an old, abandoned house. Blood-red eyes peer from
dark windows. "I'll just wait in the house until morning", our innocent maiden
chirps. "NO!" you cry out, "Don't go into that creepy house!" But in she goes,
the dummy. Well, what if the movie could change in response to your wishes?
When you cry out, "Don't go into that creepy house!" she pauses and says, "On
second thought, I think I'll just wait in the car."

If you think about it, you'll probably agree that movies like this would not be as
powerful or as interesting as regular movies; the heroes and heroines would
always do the mature, reasonable thing, and they would be as dull and boring as
the rest of us. But in this case, what makes a bad movie makes a good
program. So let's talk about branching.

DECISIONS
The essence of branching lies in decision-making. The act of making decisions
carefully is one of the central components of Western civilization. We take it
for granted, not realizing the degree to which we worship at the altar of
decision-making. Perhaps a little story will drive home how crucial this religious
devotion of ours is.

In 1757 India was a semi-autonomous collection of principalities under the
partial domination of European powers. France and England contended for
primacy on the Indian subcontinent. After a series of complex diplomatic
maneuvers, the British commander, Robert Clive, led a tiny army of 3,000 men
and 8 cannons against an Indian force of 50,000 men and 53 cannons. The
odds looked bad, to say the least. The night before the battle, the Indian
commander, Suraj-ud-Dowlah, celebrated the victory he felt certain to achieve
on the morrow. Clive, by contrast, spent the night meticulously going over
every aspect of the coming battle, inspecting troops, positioning his meager
forces with painstaking care, planning for every possible contingency. When
the battle was fought, the British forces emerged triumphant. The battle of
Plassey delivered India into the British Empire. It was won, not by heroism, or
superior firepower, or better discipline, or technological superiority, but by
careful, thorough, meticulous planning. India was won by sheer force of
decision-making power.

Precisely what is a decision? How do we make decisions? At heart, a decision is
a choice between options. Three steps are required in the decision-making
process:
1. Identify the options available
2. Identify criteria for choosing between options
3. Evaluate criteria and resolve the decision

An example might show this process in action. Suppose you are driving your
car, about to enter an intersection. Suddenly an oncoming car makes a left
turn directly in front of you. What do you do?

Let us begin by dispensing with all the nonsense options such as "Play a game
of bridge" or "Whistle Dixie". Let us focus on the options that might avert an
accident. Three simple possibilities come to mind: swerve to the right, swerve
to the left, or hit the brakes and stop here. There might be more, but we don't
have time to debate; that car is coming fast!

Having identified our options, we must now identify our criteria for choosing
between options. Our prime criterion is, of course, the avoidance of a collision.
Given our uncertainties about the speed of the oncoming car, the condition of
the road surface, and the intentions of the other driver, we cannot be certain
as to the efficacy of any of the options, so we must think in terms of
probabilities. Our decision criterion, then, is, "Which option has the greatest
probability of avoiding a collision?" Again, we could get snazzy and throw in
such considerations as the fact that an impact on the driver's side of our car is
more likely to injure us than an impact on the passenger side, but, again, let's
not dawdle on fine points when time is so short.

We gauge the relative positions and velocities of the two cars and assess the
probability of avoiding a collision for each of the three options. From this
assessment we conclude the following: if we attempt a straight-line brake to a
stop, the probability of collision is very high; if we attempt to swerve to the
right, the probability of collision is moderate; and if we attempt to swerve to
the left, the probability of collision is low. Since our decision criterion is the
lowest probability of collision, we choose the option "swerve to the left". That
is an example of decision-making.

This example also demonstrates the fact that most decision-making is
bedevilled by ugly complications and confusing uncertainties. These are the
little nasties that demoralize our efforts to decide things carefully. We throw
our hands up and declare, "It's too complicated to figure out. I'll just choose
arbitrarily." It is sad to see people abandon their greatest human birthright,
the ability to exercise free will by making decisions. There is another way to
deal with complexity and uncertainty in our decision-making efforts. That way
is to strip away the extraneous circumstances, to cut through the underbrush
of complicating factors and get down to the heart of the matter. How can we
reduce decision-making to its absolute essence?

The first step in this process is to reduce a complex decision to a series of
binary choices. Instead of asking whether we should take option A, B, C, or D,
we can use a process of elimination rather like that used in sports playoffs. To
determine the best football team in the National League, we don't throw all
the football teams in the league into a single stadium and have them play one
monstrous, twelve-sided game. Instead, we play a series of binary games, each
game determining one victor and one loser, always pitting victors against victors,
until there is but one ultimate victor. Choosing between two things is always
simpler and easier than choosing between many things.

Sometimes we can simplify even further. When we are considering taking an
action, we may be able to reduce the decision to "Do I take the action or
don't I?" This yes-no type of decision is the simplest possible way to approach
decisions about actions.

In the case of a binary decision, the criterion for choice between the two
options becomes ridiculously simple. It can be a simple true-false or yes-no
type of criterion. In the case of football, the football game is a decision-making
process that determines the winner. If team A has the higher score, then team
A is the winner. If team B has the higher score, then team B is the winner.

We have now arrived at a surprisingly simple formula for making decisions. It
takes the form:

IF (condition) THEN (action)

"Condition" is something that is either true or false. There is no uncertainty or


equivocation with "condition": it is one or the other. "Action" is the option that
is selected. If this all sounds strange, just try the example: "condition" is "team
A has the higher score". There's nothing abstruse about that, is there? It's
either true or false; either team A has the higher score, or it doesn't. And
"action" is simply, "team A is the winner". Thus, the statement becomes:

IF (team A has the higher score) THEN (team A is the winner)


If you understand this simple concept of decision-making, then you are ready
to use it in your programs, for this is exactly the way that decisions are made in
BASIC. The statement in BASIC that makes decisions is called the IF-statement,
and it looks like this:

IF condition THEN command

In this statement, the command-part is any regular BASIC command, such as
FROGGER=5 or PRINT BIRDIE. The condition-part is a little trickier. It is a logical
expression. A logical expression is not a command. It is a statement, a
declaration that may or may not be true. Sometimes a logical expression can
look just like a command, but that doesn't make it the same. Here's an
example of what I mean:

10 FROGGY=5
20 IF FROGGY=6 THEN PRINT "YOU GOOFED!"

Line 10 is a command; it tells the computer, "Computer, I command you to put
a 5 into FROGGY!" But in line 20, the reference to FROGGY is a logical
expression. It asks the computer if FROGGY really is 6. The computer will, of
course, evaluate the statement as false, because FROGGY is actually 5, and so
it will not print "YOU GOOFED!"

Logical expressions can take many forms. They can evaluate equality with the
equals sign (=), as in the above example. You can get much more complex than
that example if you want:

30 IF (FROGGY+2)*5=(BIRDIE-7)/3 THEN PRINT "FROGGY'S the one"

For that matter, you can also determine if two numbers are unequal. The
symbol to use for this is <>, a "less than" sign followed by a "greater than" sign.
Together, they mean "is not equal to". So we could have the following code:

40 IF FROGGY=BIRDIE THEN PRINT "They are equal!"
50 IF FROGGY<>BIRDIE THEN PRINT "They are not equal!"

A particularly useful aspect of logical expressions is their ability to evaluate
inequalities: whether one number is bigger or smaller than another.
The symbols for these are ">" (greater than) and "<" (less than). I always get
them straight by thinking that the big guy is on the big side of the sideways V.
Their use is illustrated with these commands:

60 IF FROGGY>BIRDIE THEN PRINT "FROGGY is bigger than BIRDIE"
70 IF FROGGY<BIRDIE THEN PRINT "FROGGY is less than BIRDIE"

Just as with the equality symbol, you can get very messy with the logical
expression:

80 IF ((FROGGY-7)*12)+3 > ((BIRDIE+4)/4)-8 THEN PRINT "What a mess!"

Now let's lean back for a moment and consider broader issues. You may
remember that I began this discussion by emphasizing the need to strip away
petty details and get down to the heart of the matter. Now that we have
reduced decision-making to its simplest form, we must return to the
question of making real-world decisions. All of this simplistic, yes-no decision-
making may look great in a program, but what good does it do in the real
world? It turns out that we can now use these simple decisions like building
blocks to address a much more complex range of decisions. Mind you, we
won't be able to solve all the world's problems, but you'll be surprised at how
much you can do with this simple IF-THEN statement. There are three ways to
add richness to the decision-making that we can do with the IF-THEN
statement: compound conditions, compound commands, and multiple options.

COMPOUND CONDITIONS
Sometimes you may want to consider several factors before taking an action.
For example, suppose that your program is considering the age of the various
people it is working with, but only wants to consider teenagers whose age is
between 13 and 16. The program must consider two conditions: whether AGE
is greater than 13, and whether AGE is less than 16. How do you put the two
factors together?

In most BASICs you can solve this problem with Boolean operators. These are
simple words, "AND" and "OR", but they are used much more precisely in
BASIC than in English. You use these operators to couple two logical
expressions, thereby creating a third, compound logical expression. The
general rule is pretty much common sense:

AND: If you AND two logical expressions together, then the result is true only if
both expressions are true (e.g., if expression A AND expression B are both
true).

OR: If you OR two logical expressions together, then the result is true if either
expression is true (e.g., if either expression A OR expression B is true.). If both
expressions are false, then the result is false.

Suppose, for example, that AGE has a value of 14. Here are some examples of
true and false statements:

Statement                                      Value
AGE > 13                                       TRUE
AGE < 16                                       TRUE
AGE > 99                                       FALSE
(AGE > 13) AND (AGE < 16)                      TRUE
(AGE > 13) AND (AGE > 99)                      FALSE
(AGE < 16) OR (AGE > 99)                       TRUE
(AGE > 16) OR (AGE > 99)                       FALSE
((AGE > 13) AND (AGE < 16)) OR (AGE > 99)      TRUE

This last example demonstrates the use of parentheses with logical
expressions. If it generates confusion, just think of the parentheses the same
way that you think about parentheses when you calculate numbers: figure out
the innermost parentheses first and work outward. In fact, it's quite practical
to think of logical expressions in much the same way that you think of
arithmetic, only instead of working with numbers, you are working with just
plain old true-false answers. Let's try it with the last expression in the list. First,
look at each inequality and determine whether it is true or false; substitute
that result into the expression. That gives:

((TRUE) AND (TRUE)) OR (FALSE)

Now discard unneeded parentheses that mark off single expressions:

(TRUE AND TRUE) OR FALSE

Now let's collapse that TRUE AND TRUE phrase into a single result. You will
recall that the rule for AND is that when both expressions are true, then the
result is true. And indeed, both are true. So now we have:
(TRUE) OR FALSE

Discard the unneeded parentheses:

TRUE OR FALSE

And now remember the rule for OR: if either one or the other expression is
true, then the result is true. Therefore, our expression works out to be:

TRUE
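
Returning to the teenager example, the compound condition can be dropped
straight into an IF-THEN statement (a sketch; AGE is the variable used above):

10 INPUT AGE
20 IF (AGE > 13) AND (AGE < 16) THEN PRINT "A TEENAGER!"

If the user types 14, line 20 prints its message; if the user types 20, the
condition is false and nothing is printed.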

The value of compound expressions is that they make it possible to evaluate
very complex situations. For example, consider this bit of code that one might
see in a football program:

70 IF (DOWN = 4) AND ((YARDS > 2) OR ((THEIRSCORE-OURSCORE) < 20)) THEN PRINT "PUNT!"

Can you figure out what it means?

COMPOUND ACTIONS
What happens if you want to execute more than one command if the condition
of an IF-THEN statement is satisfied? Suppose, for example, that you have a
program that asks the user to input the age of a certain person, and you want
to check to make sure that the number that the user types in is reasonable. If
the number is unreasonable, you want to print a message telling the
user so and ask him to input the value again, then thank the user for
being patient. Your program might look like this:

50 INPUT AGE
60 IF (AGE < 1) OR (AGE > 99) THEN PRINT "That age is odd; please repeat."
70 INPUT AGE
80 PRINT "Thank you for your patience."
90 (this is the rest of the program)

This code would not do what you want. Even if the age were correct, it would
ask for the age a second time, without explanation, and thank the user. In other
words, lines 70 and 80 are executed regardless of the results of the IF-THEN
statement. So there is the problem: how do you put multiple lines inside the
"THEN" part of an IF-THEN statement?
The answer relies on a new command that is very simple in operation. The
command is called GOTO and to use it, you simply type "GOTO n", where n is
the line number of the line you want the computer to go to. The GOTO
command breaks the normal program flow. As you remember, the program
flow starts with the first command in the program and continues to the line
with the next higher number, then the next, and so on until the computer
reaches the end of the program. But the GOTO statement commands the computer to
jump to whatever line number is specified. If you tell the computer to GOTO
60, it will immediately jump there and continue computing from line 60.

This is a very powerful command. It allows you to create whole groups of
commands and execute them all just by telling the computer to GOTO the first
command in the group. At the end of the group, you can tell the computer to
GOTO the line from which it had earlier come. For example, a correct way to
solve the problem of the bad age input would look like this:

50 INPUT AGE
60 IF (AGE < 1) OR (AGE > 99) THEN GOTO 1000
70 (this is the rest of the program)
1000 PRINT "That age is odd; please repeat."
1010 INPUT AGE
1020 PRINT "Thank you for your patience."
1030 GOTO 70

This code will execute properly; if the age is wrong then the program will jump
to the corrective code in lines 1000-1030, then when it is done it will jump
back to line 70 to resume the normal program. If the age is OK, it will do
nothing in line 60 and go straight on to line 70.

There is one minor problem with it, a technical detail. What happens when the
computer reaches the end of the program? It will eventually work its way
through all the line numbers higher than 70 and come to lines 1000-1030. Then
it will execute those lines, even though there was no error. Oops! The solution
to this is a minor command that I never bothered to mention before. It is called
"END" and it means just that. When the computer reaches the END statement,
it stops executing the program. So, you should always put an END statement
after your regular program but before all the little chunks of code like the
example. In our example, the END statement might look like this:
200 END

As it happens, there is a neater way to solve the problem of the bad age input,
but it requires a backwards approach. The code for it looks like this:

50 INPUT AGE
60 IF NOT ((AGE < 1) OR (AGE > 99)) THEN GOTO 70
62 PRINT "That age is odd; please repeat."
64 INPUT AGE
66 PRINT "Thank you for your patience."
70 (this is the rest of the program)

This code reverses the logic of the IF-THEN statement by using the NOT-
operator. This operator just takes the opposite of a logical expression. If an
expression is true, then NOT-expression is false. If the expression is false, then
NOT-expression is true. It's a simple idea, but it can be confusing to figure out.
The same thing is true in English: no statement that isn't written with no
negatives is not knotty to figure out, no? Nevertheless, if you dig through it
diligently, you can figure it out. Line 60 now says, in effect, "if the opposite of
the old condition is true, then skip over lines 62-66 to line 70." The advantage
of this approach over the earlier version is that lines 62-66 automatically feed
into line 70 when they are done without requiring another GOTO statement.
This approach cuts down on the amount of "spaghetti code" that you create.
Spaghetti code is code that is full of GOTO statements, jumping all over the
program. Programs like this are very confusing to read, and so they are difficult
to work with. Since this latter approach is cleaner than the former approach, it
is considered superior. However, please remember that both approaches work
just as well. The difference is one of style, not function.
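
As a quick check on your grasp of the NOT-operator, consider this fragment (a
sketch; FROGGY is the variable from earlier examples):

10 FROGGY=5
20 IF NOT (FROGGY=6) THEN PRINT "FROGGY IS NOT 6"

Since FROGGY=6 is false, NOT (FROGGY=6) is true, and so line 20 prints its
message.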

MULTIPLE OPTIONS
Now we tackle the toughest problem: how do you put together all of these
binary IF-THEN statements to handle complicated sets of options? For
example, how would a BASIC program handle the traffic collision problem that
I used at the beginning of this chapter? Let us assume that the computer is
driving the car (Lord help us!) and is capable of calculating the probabilities of
collision for each of the three options available. How could it choose among
three options with its IF-THEN statement?

The answer requires the use of several IF-THEN statements in a sequence. Let's
say that it has stored the probability of avoiding a collision by swerving to the
left into the variable LEFTSAFE. Similarly, it has stored the probability of
avoiding a collision by swerving to the right into the variable RIGHTSAFE, and
similarly for STRAIGHTSAFE. It must decide which probability is highest and
indicate the proper course of action. It does this by examining each in turn.
Here's how it's done:

50 IF LEFTSAFE > RIGHTSAFE THEN GOTO 110
60 IF RIGHTSAFE > STRAIGHTSAFE THEN GOTO 90
70 PRINT "GO STRAIGHT"
80 GOTO 200
90 PRINT "SWERVE RIGHT"
100 GOTO 200
110 IF LEFTSAFE < STRAIGHTSAFE THEN GOTO 70
120 PRINT "SWERVE LEFT"
200 END

If you trace the program flow for each of the three possible cases, you will find
that the program deduces the proper course of action in each case.
There is an ambiguity when two of the three values are equal; in this case, the
program will prefer straight over right and right over left; that is an arbitrary
aspect of the order in which the statements are executed.

CONCLUSIONS
This has been a long and involved chapter. What does it all mean? The
important lesson here is that the computer really can make decisions. They are
not the sort of soul-searching, agonizing decisions that we humans make; they
are not expressions of free will. They are simple, mechanical decisions. Yet, we
should not minimize the value of this kind of decision-making. We can
differentiate between decisions involving incalculable factors and decisions
that are merely complex. Moral and emotional decisions like, "Should we get
married?" fall into the incalculable category, but many other decisions, such as
"Which computer should I buy?" are at least theoretically calculable.

You might be surprised to know just how many decisions really can be
submitted to calculation. The crucial element is our ability to express
seemingly incalculable factors in quantitative form. Part of the problem is our
own squeamishness with numbers, a reluctance on our part to reduce flesh-
and-blood issues to numerical form. It seems dehumanizing to reduce issues to
mere numbers.
But what is so "mere" about a number? Precisely what is wrong with
expressing ideas in numerical form? Let's consider a specific example. I once
wrote a program (Balance of Power) that concerned geopolitical conflict. One
of the variables I used in the program is called "INTEGRITY". As you might
guess, this variable keeps track of the integrity of the player in his dealings with
other nations. Now, integrity is one of our most cherished human virtues,
something we revere as special, magic, beyond the soulless world of numbers.
But is it not a quantity that a person can possess more or less of? Are not some
people distinguished by great integrity, while others are possessed of little
integrity? Is it not a small jump from "great or little" integrity to "100 or 20"
integrity? Does not the numerical form allow us greater precision?

I will press the argument even further. My program uses another quantity
called SHOULD, which measures the degree to which the player should help
another nation in time of need. The value of SHOULD is derived from treaty
commitments made by the player. In other words, if I have made solemn
treaty commitments to you, then SHOULD will have a high value, whereas if I
have made only minor assurances, not guarantees, then SHOULD will have a
low value. My program has a section dealing with the ramifications of a failure
to honor one's commitments. One command from that section looks
something like this:

2240 INTEGRITY=INTEGRITY - SHOULD

Talk about blasphemy; here is a formula for integrity! Before you cross yourself
and reach for the garlic, though, consider this: line 2240 does not present the
formula for integrity; it presents a formula for integrity. The idea expressed in
line 2240 is simple: if you fail to honor your commitments, then your integrity
falls in proportion to the solemnity of the commitment. Is this not a reasonable
concept? Does it not reflect the truth of the world?

Numbers and formulas are only a way to express ideas. They are another
language. There is nothing intrinsically blasphemous about a language. Ideas
can be blasphemous; had I written

2240 INTEGRITY=MURDER + LIES + THEFT

that would be blasphemy. The blasphemy does not arise from the
quantification of the relationship, but rather from the relationship itself.
Integrity does not arise from murder, lies, and theft, no matter how
you say it.

If you can learn how to express thoughts in quantitative form, you will have
made a large step into a new world. Decisions become much clearer when they
can be expressed in calculable form. But beware of taking your new-found
skills too seriously; ultimately, all such decisions should be checked against
simple common sense. Only a fool would take the equation in line 2240 as final
truth.

Chapter Five

Over and over again

So far you have learned a number of things to do with your computer:
arithmetic, input and output, and branching. If you think about it, though, you
might wonder what good it all is. After all, you can do arithmetic with a $10
calculator without learning all this highfalutin nonsense about variables and
constants and deferred execution. The computer's ability to make decisions is
interesting, but you can manage without it. You might be tempted to dismiss
computer programming as a lot of high-sounding pap. In other words: "Big, fat,
hairy deal!"

You would be right to dismiss computers as a waste of time if all they did was
calculate and branch. But we are at last going to learn something that makes
computers really useful, and it is very important that you understand the
fundamental concept behind this capability.

TOOLS
We humans have come a long way since the good old days of caves and
woolly mammoths. One of the key factors contributing to our progress has
been the development of increasingly powerful tools. We use tools for almost
everything we do. We even use "tool-tools": tools that make other tools.

Just exactly what is a tool? It is a device that allows us to execute some special
function more quickly and easily than would otherwise be possible. Of course,
there is a price we pay for this benefit: the cost of the tool. Suppose, for
example, that I want to dig a small hole. I could grab a nearby rock and scrape
out a hole in five minutes. Or I could go build myself a shovel and then dig the
hole in one minute. Now, building the shovel might take me several hours, so if
all I want to do is dig a single hole, I am better off using the rock. But if I know
that I will be digging quite a few holes in my time, then the savings from the
shovel can really add up. This is the central concept behind any tool. You pay a
steep entry price to get the tool, but each time you use it, you enjoy a savings
of time. If my shovel cost me 60 minutes to build and saves me four minutes
per hole, then after fifteen holes I have broken even, and every hole after that
puts me ahead of the game.

PLAY IT AGAIN, SAM
Implicit in all of this is one of the most fundamental concepts behind all
technology: the concept of repetition. Doing something once is slow,
cumbersome, and prone to mistakes. But if we do it over and over, we develop
speed and accuracy. If we can develop a system (a tool) that allows us to
reduce a huge job to a sequence of repetitive operations, we suddenly become
a great deal more efficient. You want a machine to take you long distances?
OK, we'll have a piston that pushes down once, turning the wheel that goes
round, moving you forward. Then the piston goes back up and does it again,
and again, and again. Take the ticket from the customer's hand, tear it in half,
and return the pieces to the customer. Then turn to the next person and do it
again, and again, and again. Hammer the nail into the shingle; then get another
nail and do it again, until the shingle is finished. Then do another shingle, and
another, and another.

Again and again and again; another and another and another. This is the stuff
of productivity: repetition. Repetition allows us to specialize, to learn and hone
our skills. A painter can paint my house faster and better than I can, because
he has painted many houses. An executive can manage a company better than
I can, because he has managed many companies. Experience is nothing more
than the end result of repetition. Repetition doesn't make the wheels of
civilization go round; it is the wheels of civilization going round.

The real value of the computer as a tool, then, will not lie in simple
computations. The real utility of this machine is realized in repetition. If we use
it over and over, we derive the true benefit of the tool. But there are two types
of repetition: manual and automatic. Manual repetition is the repetition of the
human; automatic repetition is the repetition of the machine. Thus, a
carpenter nailing nails is engaging in manual repetition, while an automobile
engine is automatically repetitive. A calculator is used in a manually repetitive
manner. You, the user, provide the repetition by using it over and over. The
calculator itself doesn't know or care anything about repetition. But the
computer can engage in automatic repetition. The difference between a
calculator and a computer is the difference between a chisel and a
jackhammer, a rifle and a machine gun, a pen and a printing press.

The key to this automatic repetition with the computer is called the loop.
Looping is the stuff and substance of computing. Any computer program
without a loop is not worth writing or executing. Looping is truly the essence of
computing.

THE INFINITE LOOP
The simplest type of loop is trivially simple to create. You will recall from
Chapter 4 that branching allows you to jump to different points in the
program, to alter the program flow. The only branching cases I discussed in
that chapter were cases of forward branching, in which the program always
flows forward, never folding back on itself. But consider the following fragment
of code:

50 PRINT "Hip, hip, hooray!"


60 GOTO 50

This is the simplest type of backward branching, in which the program flows
back to itself. After it prints "Hip, hip, hooray!", the program moves to line 60,
which directs it back to line 50. This simple loop is called an infinite loop,
because it will continue forever unless you either a) press the BREAK key on
your keyboard or b) reset or turn off the computer. An even more extreme
case of an infinite loop is the following statement:

60 GOTO 60

This statement will execute forever, going to itself but never doing anything. It
is more Sisyphean than Sisyphus himself; at least he had a boulder to roll
over and over. This statement does nothing over and over.

TERMINATING THE LOOP
Do you remember the story of the Sorcerer's Apprentice? The apprentice,
having overheard certain magic words, orders a broom to bring some water.
This the broom does, but the magic spell has apparently set the broom into an
infinite loop, and the broom continues to bring more and more water, setting
the room awash. The apprentice realizes to his dismay that he doesn't know
how to terminate the loop, so he is unable to stop the broom. There is a lesson
here for the beginning programmer: make sure you know how to terminate a
loop before you start it. The infinite loop is an academically interesting beast,
but it represents a useless extreme. If a computer program with a loop is to
have any practical value, it must be able to terminate the loop. Being able to
start a process is only half a power; being able to stop it is the other half. How
does one stop a loop? There are two fundamental methods: the conditional
termination and the predetermined count.

CONDITIONAL TERMINATION
In this form of looping, the program repeats the process until some condition is
met. In effect, you are telling the computer, "Computer, do this over and over
again until (blank) happens." The condition is any logical expression that can be
evaluated in an IF-THEN statement. Thus, we normally use an IF-THEN
statement to terminate the loop. Here is an example:

50 PRINT "Please input the amount of the next check."
60 PRINT "If no checks are left, input a zero."
70 INPUT AMOUNT
80 BALANCE=BALANCE-AMOUNT
90 IF AMOUNT<>0 THEN GOTO 50

This little loop, which might come from a checkbook balancing program,
repeatedly asks for the amount of the next check until the user enters a value
of 0; then it exits the loop.

We use this type of loop when we don't know how many times we want the
loop to be executed. All we know is that the loop will be executed at least once
and possibly many, many times. We keep executing it until the termination
condition is satisfied.

The general structure of this type of loop is simple; we just put an IF-THEN
statement at the end of the loop that loops us back up to the top of the loop
unless the termination condition is satisfied.
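
The same checkbook loop can be sketched in a modern language such as Python.
Here a list of amounts stands in for the keyboard INPUT (an assumption made
purely for illustration), and the break statement plays the role of falling
through the IF-THEN test at the bottom of the loop:

```python
def deduct_checks(balance, amounts):
    """Subtract check amounts until a zero signals that no checks are left."""
    checks = iter(amounts)
    while True:                        # loop back to the top, like line 90's GOTO 50
        amount = next(checks)          # stands in for line 70's INPUT AMOUNT
        balance = balance - amount     # line 80: deduct the check
        if amount == 0:                # the termination condition is satisfied
            break
    return balance

# A balance of 500 with checks of 120 and 80, then the 0 that ends the loop:
print(deduct_checks(500, [120, 80, 0]))  # prints 300
```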

PREDETERMINED COUNT
This is the second major type of loop. We use this loop when we know how
many times we want the loop to execute. It uses a new type of BASIC
command: the FOR-NEXT loop. Here's a simple example:

40 FROGGY=0
50 FOR X=1 TO 10
60 FROGGY=FROGGY+X
70 NEXT X

This bit of code works as follows: first, the computer stores a zero into
FROGGY. Then, when it first encounters line 50, it stores a one into X. Then it
moves on to line 60, where it adds X to FROGGY and stores the result (a one)
into FROGGY. Then it goes to line 70, which tells it to use the "NEXT X". An
awful lot happens in line 70. The next X after 1 is 2, so it stores a 2 into X. Then
it checks to see if X has passed the upper limit of 10 that was imposed in line
50 ("FOR X=1 TO 10"). The value of X is only 2, so it automatically loops back up
to line 50, does nothing there, and proceeds on to line 60. This loop continues
with X taking values of 1, 2, 3, 4, and so on all the way to 10. When it reaches
line 70 after the tenth pass, it goes to the next X, which is 11, and realizes that
11 is greater than the upper limit of 10 imposed in the FOR statement. It
therefore decides to terminate the loop, and proceeds from line 70 to the next
statement in the program.

The general form of a FOR-NEXT loop is as follows: A FOR-statement goes at
the top of the loop, specifying three things: 1) the loop variable (in our
example, X) that will step through all the values one at a time; 2) the initial
value of the loop variable (in our example, 1) that tells the computer what
value to start with for the loop variable; and 3) the final value of the loop
variable (in our example, 10), that specifies the last value that the computer
should use for the loop variable before terminating the loop. After the FOR-
statement comes the body of the loop. We mark the bottom of the loop with a
NEXT-statement.
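
For comparison, the FROGGY loop translates almost line for line into a modern
language such as Python, whose for statement bundles the same three things,
the loop variable, its initial value, and its final value:

```python
froggy = 0                 # line 40: start the total at zero
for x in range(1, 11):     # line 50: x takes the values 1 through 10
    froggy = froggy + x    # line 60: add x to the running total
print(froggy)              # prints 55, the sum 1+2+...+10
```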

ARRAYS
You now know how to construct a loop. But there is a difference between
knowing how to operate a tool and knowing how to utilize a tool. In order to
utilize loops, you must understand a variety of other concepts. The first of
these is the array. There are very few cases of loops that do not in some way
use an array. An array is a group of numbers catalogued in a list. If a regular
variable is like a box that holds a number, then an array is like a row of boxes
marked #1, #2, #3, and so on. An array allows us to readily deal with a group of
numbers.

For example, how old are you? That's a number. How old is your brother? Your
sister-in-law? Your cousin? If you were making files on all your relatives, you
might want to make a list of everyone's age. Now, the wrong way to do that
would be to store each age in a separate variable, like this:

10 JOESAGE=33
20 MARYAGE=24
30 GRAMPAGE=62

and so on. If you had 10 relatives, you'd end up with a mess. The right way to
deal with this is to save all the numbers in an array. Then, if Joe is the first
person, and Mary is the second person, and Grampa is the third person, then
you would store Joe's age (33) in the first box in the array, Mary's age (24) in
the second box, and Grampa's age (62) in the third box in the array. The ages
of the other relatives would go into other boxes.

Now, the problem with this system is keeping track of whose age is in which
box. We normally don't think of Grampa as person #3, but that's what we have
to do if we are going to use arrays. This problem can become severe if you
have a big array with, say, 10 numbers in it. Then you get into all sorts of
problems keeping track of who goes where. As it happens, there are a number
of ways to solve this, but we'll get to them later in the book. For now, just think
in terms of a list like this:

# Name Age
1 Joe 33
2 Mary 24
3 Grampa 62

The array of ages is then 33, 24, 62, and so on.

Now for the technical details of implementing an array. To use an array, you
must first notify the computer that you want to use an array. You must do this
before you start to use the array. Normally, notifying the computer of arrays is
something you do right at the beginning of the program. After all, it's only
simple courtesy to notify people up front of the expectations you'll be placing
on them, and the same thing goes for computers. You notify the computer
with the command DIM, which is short for DIMENSION. Why they gave it a
dumb name like DIMENSION, I'll never know, but we're stuck with it now. After
you say DIM, you list the names and lengths of the arrays that you intend to be
using in the program. A typical DIM-statement might look like this:

10 DIM AGES(10)

This statement tells the computer that you are going to be using an array that
you will call AGES. You want 10 boxes in this array to hold 10 numbers. Now, if
you ask for 10 boxes, you'll get 10 boxes. This is important, because if you try
to use 11 boxes, the program will not work. Don't try to overcompensate by
asking for, say, a million boxes. Those boxes come out of the memory that your
computer has. Remember when you bought the computer, and the salesman
yakked on and on about how it had "512K RAM", or something like that? Well,
that's how much memory your computer has, and, no matter how much you
have, it never seems to be enough. So don't waste your precious RAM by
asking for more boxes than you need!

Once you have notified the computer with your DIM-statement, you are free
to use the array. You treat the array in the same way that you treat a regular
variable, with two exceptions. First, you can only handle a single number in the
array at a time. That is, you can't tell the computer to, say, multiply the
array by 5. How would you feel if somebody walked up to you, handed you a
sheet of paper filled with numbers, and said, "Multiply this by 5." ? What does
that mean? Multiply each one by 5? Add them all up and multiply the sum by
5? Who knows? So our program never treats the array as a lump; it must
instead refer to individual numbers in the array.

The second difference between an array and a regular variable is that you must
specify which element of the array you want to work with. If I have an array
called AGE, I never write a command like this:

40 AGE=5*FROGGY

The problem with this command is that it doesn't tell the computer which
number in the array is equal to 5*FROGGY. So, to tell the computer which
number you want, you use parentheses, like this:

40 AGE(2)=5*FROGGY

This command makes sense; it tells the computer to put a value of 5*FROGGY
into the second box in the array. Whenever you want to refer to one of the
boxes in an array, you specify the name of the array, then an open parenthesis,
then the number of the box, then a close parenthesis.

Of course, there is nothing that says that the number of the box has to be
specified as a constant. That is, you don't have to always say things like AGE(2),
AGE(5), or AGE(9). If you want, you can use any arithmetic expression inside
those parentheses. You could say this, for example:

30 BIRDIE=2
40 AGE(BIRDIE)=5*FROGGY

These two commands will do exactly the same thing that the earlier example
does; they will store a value of 5*FROGGY into the second box in the array.

You can even get snazzy with this, if you want:

40 AGE(3*BIRDIE+2)=5*FROGGY

This command will store a value of 5*FROGGY into whatever box is specified
by the expression 3*BIRDIE+2. Of course, if BIRDIE is, say, 3, then 3*BIRDIE+2
will be 11, and the computer will try to store 5*FROGGY into the 11th box,
which isn't there, because we told the computer to make 10 boxes, and the
computer will get confused, and that's not good. So you have to be careful
when you write snazzy commands like the one above.

ARRAYS AND LOOPS


The real value of arrays comes when you use them in loops; in fact, arrays and
loops go hand in hand: they were made for each other. Here is a very
simple example of a loop using an array:

50 FOR X=1 TO 10
60 AGE(X)=0
70 NEXT X

This little loop does nothing more than store a zero into each box in the array.
This is a very common function, called initialization. You see, when you create
an array with a DIM-statement, the computer sets aside some memory for the
array. Odds are that memory was used for something else earlier, and so
contains numbers produced by previous computations. Thus, when you first
create an array, it already has all sorts of meaningless numbers in its boxes.
You've got to clear out those boxes so that you don't insult Auntie Millie by
listing her age as 233. A loop like the above example will do that quite nicely.
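
The same initialization can be sketched in Python (where, unlike our BASIC
boxes numbered starting from 1, the boxes in a list are numbered starting
from 0):

```python
# the loop form, matching lines 50-70: store a zero into each box
age = []
for x in range(10):
    age.append(0)

# Python also lets you create the ten cleared boxes in one stroke:
age = [0] * 10
```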

Of course, setting all the ages to zero like the example may erase the
chalkboard, but it doesn't put any real numbers into the array. How do you do
that? Here's an example:

80 FOR X=1 TO 10
90 PRINT "Please input the next person's age."
100 INPUT Y
110 AGE(X)=Y
120 NEXT X

This little loop will ask the user to type in the age of each person on the list. It
will then store that age into the AGE array.

Now that all the ages are properly stored in the AGE array, let's see how arrays
and loops can work together. A good exercise is to find the largest number in
an array. In other words, let's have the computer find the oldest person in our
array.

In order to do this, we must first design our algorithm. An algorithm is just a
strategy for solving a problem. It is a set of steps that we know will lead to a
solution. An algorithm is like a soft, fuzzy version of a program. The algorithm
expresses the general idea of the program; the program translates the
algorithm into specific commands. You use an algorithm when you give
somebody directions to your home ("Go to the intersection of Elm and Hazel;
turn right onto Hazel and follow it to the third stoplight. . ."). You are not
specifying the actual commands necessary to get to your home, such as "Turn
the steering wheel right 90 degrees, press on the accelerator. . ." You are
describing the process in general terms. That's what an algorithm is: a general,
non-specific description of a procedure. Algorithms are a useful way to figure
out a problem without plunging into the picky details of writing a program.
Once you've got your general strategy figured out, it's a lot easier to translate
the algorithm into a program than to go directly from your problem to the
program.

Our first task is to define what we are trying to accomplish. We want to find
not one but two numbers: we want to know what the oldest age is and who
has that oldest age. In programming terms, we need the highest array value
and the array number of that value.

Our algorithm for finding these numbers will be as follows: We'll start off
by telling ourselves that the record-holding oldest age is 0 years old; that's so
young that we are guaranteed that everyone will break that record. We will
also set the record-holder as 0, which is to say, nobody. Then we will sweep
through the array, picking each age in turn. When we pick an age, we ask, "Is
this age older than our world-record age?" If it isn't, we should skip on to the
next age, but if it is, we have a new record, so store this age into the world-
record age, and store its array number ("This is the 3rd number in the array")
into the record-holder. After we have done this for all the people in the array,
we will print out our answer.

Now our only task is to translate the algorithm into some BASIC commands.
Read this code carefully and verify for yourself that the code implements the
algorithm:

130 OLDESTAGE=0
140 WHOISOLDEST=0
150 FOR X=1 TO 10
160 IF AGE(X)<=OLDESTAGE THEN GOTO 190
170 OLDESTAGE=AGE(X)
180 WHOISOLDEST=X
190 NEXT X
200 PRINT "The oldest person is person number "; WHOISOLDEST
210 PRINT "This person is ";OLDESTAGE;" years old."
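
The same record-keeping sweep can be sketched in Python; the running record
and the record-holder match the BASIC variables, and the answer comes back as
a pair of values instead of being printed inside the loop:

```python
def find_oldest(ages):
    """Sweep the list, keeping a record of the oldest age seen and who holds it."""
    oldest_age = 0       # line 130: a record everyone is sure to break
    who_is_oldest = 0    # line 140: nobody holds the record yet
    for x, age in enumerate(ages, start=1):   # line 150: x counts 1, 2, 3, ...
        if age > oldest_age:                  # line 160: is this a new record?
            oldest_age = age                  # line 170: new world-record age
            who_is_oldest = x                 # line 180: new record-holder
    return who_is_oldest, oldest_age

print(find_oldest([33, 24, 62]))  # prints (3, 62): person 3, Grampa, at 62
```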

What is the significance of this little program? If you were blessed with a
skeptical attitude, you might argue that this whole thing is a waste of time,
because you could search through a list of ten numbers and find the highest
number in a few seconds. But consider how easily this program could be
changed. What if, instead of ten numbers, there were a thousand numbers? To
do that by hand would be a great deal more work, and you might make a
mistake, but to change the loop to handle a thousand numbers, we need only
change line 150 so that the 10 is 1000. It's that simple. (Of course, somebody
would still have to type in the thousand numbers.)

There are plenty of other things we could do. Do you want the youngest
person instead of the oldest? Just change line 130 to OLDESTAGE=999 and line
160 to IF AGE(X)>OLDESTAGE THEN GOTO 190. Suppose you wanted to find
anybody between 40 and 50 years old. Then you could change the loop to test
for that condition ("IF (AGE(X)>=40) AND (AGE(X)<=50). . .") and print out a
message if the condition was satisfied. For a real challenge, you could sort the
array, listing the oldest person, then the next oldest, then the next, all the way
to the youngest person. But that is too advanced a topic for this book.

CONCLUSIONS
The loop on the computer is closely analogous to the moveable type on the
printing press. The invention of moveable type was profoundly important
because it allowed each type character to be used over and over again, in
different words on different places on the page. This made possible the
printing press. At the same time, moveable type imposed severe constraints on
the creative freedom of the author. The characters could only fit into the press
in a defined fashion; no longer could the author scatter words across the page
in any fashion that struck his fancy. The use of illustrations was severely
curtailed, and color was impossible to print. Despite these restrictions, the
printing press is ranked as one of the most important inventions of human
history.

In the same way, the loop is the critical concept that makes the computer a
useful, practical tool. The computer imposes constraints on our organization of
data just as the early printing presses constrained the presentation of
information on the printed page. And the computer may well be another
landmark in the history of civilization. But there is one big difference between
the printing press and the computer: the computer is accessible to everyone,
not just a few experts. You can program this machine; you can guide it where
you will. What will you print with your new press, Herr Gutenberg?

Chapter Six

Bits, Bytes, And Bureaucracies

You may have noticed that the programming examples I have used have been
getting larger and larger with the passing chapters. This is partly because the
ideas I have been presenting have been getting more and more involved,
requiring larger and more involved examples. You may also have guessed that
real programs must be larger than the examples I am giving, and indeed they
are. The size of a program is often measured by the number of lines of code
written by the programmer. By this way of measuring, my examples in the last
chapter were 5-line or 6-line programs. Real programs run considerably larger.
Most of my computer games come out at around 10,000 lines of code, and
that doesn't include any of the graphics or sound!

What is staggering about so large a program is not the sheer amount of code
itself so much as the complexity represented by all that code. A program is not
an inert mass of information like a book. Word for word, character for
character typed into the computer, a program is a far more complex effort
than a book. This is because the words in a book, in comparison to the words
in a computer program, are pretty much a loosely connected jumble. The
words I chose to use in the last chapter have very little impact on the words I
choose for this chapter.

A computer program, by contrast, is an immensely more demanding creature.
It acts like a gigantic engine, with thousands of gears and wheels and pulleys,
all packed into a very small space, everything very tightly connected. The
overwhelming complexity of a huge program is enough to try the courage of
any programmer. How can one person, or even a group of people, possibly
keep track of this maze of interconnections and relationships?

ANCIENT SOLUTIONS
The problem we face here is not a new one. The creation and maintenance of
complex structures has plagued civilization since its earliest days, for a
civilization is itself a complex structure requiring maintenance. The task that
falls on any government (to regulate commerce, collect taxes,
adjudicate disputes) is as complex as the devious ways of its many
citizens. The first civilization to develop effective techniques for dealing with
these problems was Rome.

What was the source of Roman power? How were the Romans able to first
create and then maintain an empire over a span of nearly two thousand years?
Historians cite many factors, but a crucial factor often underestimated by the
layman is the role of the Roman bureaucracy. We normally think of Roman
legions marching across Europe, conquering everything in sight, but a much
more important factor in Roman success was the mousy bureaucrat following
in the wake of the legion, papyri in hand. Rome did not invent bureaucracy, but
the Romans refined and developed the art of bureaucracy far beyond anything
the world had known. Roman administrative skills made it possible to raise,
equip, and train the legions that conquered the territories; these same skills
insured that the conquered lands were smoothly and efficiently governed. A
newly-won territory quickly became a prosperous component of the Empire
rather than a poverty-stricken and sullen vassal. Throughout the Empire, a
large and efficient bureaucracy coordinated the flow of goods and people, and
brought peace and prosperity to a larger area, for a longer time, than the
world has known before or since. Such is the power of bureaucracy.

Exactly what is a bureaucracy? Three primary elements determine the form of
a bureaucracy. The bureaus themselves constitute the first element. A bureau
is a group of people performing a function. A bureau can be a small, one-
person operation, or it can be as large as the Department of Defense. The
second element is the assignment of functions to bureaus. Each bureau is
responsible for a single function, be it broad or narrow. Each function is
assigned to a single bureau. The third element is the set of communications
procedures within the bureaucracy. The various bureaus must coordinate their
actions; to do this requires a clear and simple communication system for
transmitting work orders.

These three elements characterize a bureaucracy, but they do not explain its
strengths. Why does a bureaucracy work? What is the source of its ability to
handle complex problems?

MODULARITY
One strength of the bureaucracy is its modularity. The bureaucracy is broken
up into discrete chunks that are much easier to understand. Consider, for
example, the United States government. What is it? Well, we could start off by
breaking it into three chunks, the legislative, the executive, and the judicial.
Each of those three chunks includes within it a great many people. If you
wanted more detail, we could break the executive branch into the various
departments (State, Defense, Commerce, Labor, etc). We could then break one
of the departments down into its subcomponents, going down further and
further. Each module within the structure can be broken down to smaller
components, and the modules can be reassembled to form the whole. This
breaking down and putting together is one of the "big ideas" of Western
civilization. It parades under the name analysis and synthesis. It is the basis for
many of our civilization's achievements. The bureaucracy is an example of
analysis and synthesis applied to large organizations. Take all the problems
that we require the US government to handle; break them down into
components, assigning each component to a bureau. If a component is itself
too large to digest, break it down further into sub-components. Continue this
process of breaking down into subcomponents as necessary. Once the
problems have been broken apart and assigned, allow each bureau to tackle its
small problem, then put the pieces together. The result? A Social Security
program, an environmental protection policy, or an MX missile.

Analysis and synthesis appears in many other areas. It is fundamental to
scientific inquiry. The scientist approaches a complex and little-understood
phenomenon and starts by breaking it down into its component aspects,
identifying those aspects that can be explained with existing theory and
isolating the aspect that represents a mystery. This makes it possible to focus
intense attention on the single mysterious item. Once the core problem has
been cracked, the components can be reassembled to produce a new theory of
stellar evolution, a new chemical, or a cure for cancer.

An engineer follows the same pattern in designing a machine. Break the
problem up into components. Put one team of engineers on the carburation
system. Have another team tackle the suspension, while a third can worry
about engine cooling. Send them off on their respective tasks; when they are
done, assemble their work into a new car.

THE INTELLIGENT HAMMER


A crucial requirement for successful analysis and synthesis is that the problem
be broken up in an intelligent manner. If one attempts to subdivide a problem
the way one partitions a vase with a hammer, one gets only a shattered mess.
The hidden skill in successful analysis and synthesis is the ability to see clean,
natural ways to subdivide the problem. And the basis for clean, natural
subdivision, the key criterion, is the simplicity of interaction between the
modules.

A problem in analysis and synthesis is essentially a problem of untangling.
Suppose that I constructed a tangle of balls connected by springs. Some balls
might have many springs attached to them, while other balls might have only
one or two springs. How would you go about untangling this mess? If you
studied it, you would undoubtedly find at least one group of balls that was
tightly interconnected with lots of springs, but connected to other groups of
balls by only a single spring. This would form the basis of your untangling
effort. You would begin by separating the first group from the main mass. As
you pick through the tangle, you would search for easily-separated groups. In
short, you would analyze the tangle on the basis of the lowest interaction
between groups.

This is the key idea to intelligent analysis. One must scan the problem, looking
for patterns that break it up into modules that interact with each other in the
simplest way. If each module has but one simple interaction with all other
modules, then the situation is highly modular and ideal for analysis and
synthesis. If some modules have multiple interactions with other modules,
then the situation is less modular and will prove more difficult to handle.

An example is in order. Let's say that you are a manager in a large corporation
and are about to hire a new employee. You have interviewed a number of
candidates and have made your decision. To implement it, you merely send a
memo to the Personnel Department listing four items: the candidate's name,
the date that this person will start work, the salary to be offered, and the
personnel requisition under which the candidate is being hired. The Personnel
Department will take care of all the details: notifying the candidate of the job
offer, obtaining the candidate's Social Security number, home address,
telephone number, filling out all the forms for the government, opening a
personnel file on the candidate, and all the myriad other tasks that are
required for employment in a large corporation. Your interaction with
Personnel is small and simple: only four items of information are required from
you. Yet, those four pieces of information trigger a great deal of work inside
Personnel. In short, there are few springs between you and Personnel, and
many springs inside Personnel. That's a highly modular situation.

Just for laughs, let us consider a situation with very low modularity. Suppose,
for example, that you were responsible for notifying the government of the
candidate's pay, but Personnel was responsible for notifying the government
of the candidate's claimed deductions. Then both you and Personnel would
have to obtain the candidate's name and Social Security number, and probably
an internal employee number. You would need to check your information with
Personnel, and they would need to check their information with you, and you
would both need to check your information with the candidate. There is plenty
of opportunity for a snafu here, with slightly different or inconsistent
information being reported to the government. In terms of my tangled springs
analogy, this situation has lots of springs running between you and the
government, you and Personnel, and you and the candidate. A messy, tangled
situation like this emphasizes the essence of good modularity: lots of internal
communication within modules, the absolute minimum of external
communication between modules.

There is one other benefit of the highly modular environment: once you have
shot off your message to another bureau, you can forget about it. Personnel
has their little form, number P-503, that you fill out and send off to them. If
you fill it out properly, you need not worry about anything else. They'll take
care of all the little details. Indeed, they are probably taking care of details that
you are completely unaware of (new government regulations about
hiring, or whatever). Once a module, or bureau, or engine subassembly has
been set up and its inputs determined, you treat it as a black box whose
internal workings are of no concern. In future decision-making, you merely tell
yourself, "So long as I ship the right inputs, or forms, or whatever, to that
module, it will spew out the results I need." It simplifies your thinking.

MODULARITY IN COMPUTERS: SUBROUTINES


So what does all this have to do with computers? As it happens, the concepts
of modularity, analysis and synthesis, and clear communications procedures
are built into computer programming languages. Indeed, they are expressed
with pristine clarity in the concept of the subroutine. The subroutine is one of
the simplest and subtlest ideas in all of computing. It is trivially simple to
implement, yet very difficult to master. If you think of it as a bureau within a
bureaucracy, the idea will come more easily, and after you have worked with
it, it will help you understand bureaucracies and analysis and synthesis better.

A subroutine is a small section of a program. It can be anything:
loops, IF-THEN statements, INPUT statements. Anything that you can put into a
regular section of program, you can put into a subroutine. There are only two
rules about subroutines: first, a subroutine is a closed module. You should not
jump into or out of the subroutine halfway through. Second, a subroutine is
terminated by a new type of statement: the RETURN statement.

Time for an example. Suppose that you have a program in which it is necessary
to get input from the keyboard several times during the course of the
program's execution. Suppose further that the input must be conditioned. That
is, for some reason, you want to make sure that the right numbers are typed
in. Suppose, for example, that the program analyzes different test scores, and
all the test scores are between 0 and 100. You could just hope that the user
would always type the numbers in correctly, but if you are a careful
programmer, you would anticipate the likelihood of somebody typing in crazy
test scores by mistake, scores like 537 or -33. You want to make certain that all
the test scores are reasonable. You could write some code to check for this:

60 INPUT SCORE
70 IF (SCORE >= 0) AND (SCORE <= 100) THEN GOTO 110
80 PRINT "You typed in ";SCORE
90 PRINT "That number is wrong. Please try again."
100 GOTO 60

This little bit of code insures that SCORE will always be between 0 and 100,
even if the user makes a mistake. Now, you could type this code in every single
time your program needed another score. But a much easier way to handle the
problem would be to make a subroutine out of it. The subroutine might look
like this:

3000 INPUT SCORE
3010 IF (SCORE >= 0) AND (SCORE <= 100) THEN GOTO 3050
3020 PRINT "You typed in ";SCORE
3030 PRINT "That number is wrong. Please try again."
3040 GOTO 3000
3050 RETURN

The only difference between this subroutine and the earlier bit of code is that
the numbering is different and the subroutine ends with a RETURN statement.
To use this subroutine, your program need only say "GOSUB 3000". The GOSUB
statement is like a GOTO with a memory. It means, "Computer, GOTO this line
number, but remember the line you're on right now." The RETURN statement
reverses the process; it says, "Computer, remember the line number you came
from? Well, GOTO that line number."

The advantage of this system is that you can call this subroutine from any part
of the program. Consider this example:

120 GOSUB 3000
130 TEST=TEST+SCORE
140 GOSUB 3000
150 GRADE=GRADE+SCORE

When the computer reaches line 120, it goes off to subroutine 3000. That
subroutine will RETURN to line 120. Later on, line 140 will go to subroutine
3000, and the subroutine will then RETURN to line 140. You can call subroutine
3000 from any part of the program without the computer losing track of where
you are. A GOSUB call is rather like telling the computer, "Computer, go off and
do this chunk of work, then come back when you're done."

SUBROUTINES AS BUREAUCRACIES
Subroutines very precisely express the three primary elements earlier
associated with bureaucracies: bureaus, assignment of functions, and
communications between bureaus. The subroutine itself is a bureau. It may
not have any bureaucrats to handle its functions, but it doesn't need any; its
commands take care of its operations. Indeed, the subroutine is a very precise
bureau: one knows exactly what it does. None of this ambiguous "Bureau of
Assorted Functions Support (BAFS)" nonsense that we so often see with
modern bureaucracies. The subroutine executes a precise function specified in
its code. And the competent programmer has no qualms about rearranging or
eliminating a subroutine that is no longer needed.

The second element of a bureaucracy is the assignment of functions to
bureaus, and again we see the concept expressed very clearly with the
subroutine. You use a subroutine to execute a particular function. If you need
input conditioning, just use the sample subroutine presented earlier. That's the
one and only place you need go for input conditioning. If your needs for input
conditioning change, then change the input conditioning subroutine. Certainly
makes life easy, doesn't it?

One of the more abstruse concepts associated with subroutines is the
generality with which functions are assigned to subroutines. The subroutine
example given above is only capable of handling inputs that should fall
between 0 and 100. But what if another portion of your program needs inputs
between 100 and 200? You would like to have input conditioning for this part
of the program, too, but do you need to write another subroutine? Not if you
rewrite the first subroutine to be more general. One way to do this is as
follows:

3000 INPUT SCORE
3010 IF (SCORE >= LOWER) AND (SCORE <= HIGHER) THEN GOTO 3050
3020 PRINT "You typed in ";SCORE
3030 PRINT "That number is wrong. Please try again."
3040 GOTO 3000
3050 RETURN

The difference between this subroutine and the earlier one is in line 3010; the
constants 0 and 100 have been replaced with variables LOWER and HIGHER.
You would now call this subroutine with the following sequence:

116 LOWER=0
118 HIGHER=100
120 GOSUB 3000
.
.
.
226 LOWER=100
228 HIGHER=200
230 GOSUB 3000

Now the subroutine is able to handle a wider range of functions. However,
there is a price we pay for this greater generality: we must now specify the
values of LOWER and HIGHER before we use the subroutine.

The third element of a bureaucracy is the set of communications procedures
between bureaus. This concept is particularly well-developed in computer
programming. In fact, it has its own special term: parameter passing. In a
bureaucracy, you send all manner of messages: letters, memos, work orders,
and so forth. But in a computer, you only send numbers. The numbers that you
send to a subroutine to tell it what to do are called parameters; the act of
sending them is called parameter passing. The reason we call it parameter
passing instead of parameter sending is that parameters can be both sent and
received. In our example subroutine, the parameter SCORE is passed back from
the subroutine to the calling statement in the main program.

Actually, BASIC uses a very poor method for passing parameters. The numbers
that are passed back and forth between subroutines are always global
variables. A global variable is a variable that is used throughout the program.
The opposite of a global variable is a local variable, a variable that is used in
only one subroutine. Imagine a bureaucracy that had no paper, only a gigantic
blackboard and a bunch of telescopes, one telescope for each bureaucrat.
Suppose then that bureaus communicated with each other not by sending
memos back and forth, but rather by writing messages onto the blackboard.
Everybody would then read the same blackboard, looking for the messages
that concerned them.

BASIC works the same way. All the variables in the program go onto one big
blackboard. When our example subroutine had the properly conditioned score
to pass back to the calling statement, it wrote the value onto the blackboard in
the slot for the global variable SCORE. The calling statement then read the
blackboard to find the value of SCORE. The system is very simple, but it can be
clumsy when you want to pass lots of parameters. Suppose, for example, that
you had a subroutine at line 5000 that needed three variables (V1,V2, and V3)
as input parameters and produced another three variables (W1, W2, and W3)
as output parameters. Then to call that subroutine you would have to write
this much code:

170 V1=27
180 V2=158
190 V3=-9
200 GOSUB 5000
210 SCORE=W1
220 GRADE=W2
230 FINAL=W3

All this work just to talk to the subroutine! What a waste of time! As it
happens, there is a much better way that some advanced BASICs and many
other languages use: it's called a parameter list. When you use a language with
parameter lists, you simply list all of the parameters in parentheses right after
the subroutine call. Such a subroutine call with the above example might look
something like this:

200 GOSUB 5000(27,158,-9,SCORE,GRADE,FINAL)

That's much simpler, isn't it? Unfortunately, the odds are that your version of
BASIC doesn't have this, so you will have to use the old blackboard method.
Don't despair; it is perfectly serviceable, just a little clumsy with some
subroutines.

It is interesting to note that one of the most common bugs in any program is
the failure to pass parameters properly. Suppose, for example, that you had a
subroutine that needed those three global variables V1, V2, and V3 as inputs.
Suppose also that you used it a little earlier in the program and that time, you
gave V1 a value of 33. A little while later, you decide to call the subroutine, but
you forget to give V1 a new value appropriate to the situation. When the
computer GOSUBs to the subroutine, the subroutine looks at the blackboard
through its telescope in the slot marked "V1" and it sees the same old value,
33. It goes ahead and does its job using that number. Of course, that's an old
number, and it's all wrong, so the subroutine gives you bad outputs. You get
mad and try to figure out how that stupid subroutine fouled up, and you can't
find anything wrong with it. The problem is not with the subroutine itself but
with the parameters you passed to it.
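
The trap might look like this sketch (the line numbers and values here are
invented for illustration, reusing the subroutine at line 5000):

100 V1=33
110 GOSUB 5000
.
.
.
300 REM forgot to give V1 a new value here!
310 GOSUB 5000

The call at line 310 looks perfectly innocent, but the subroutine reads the
stale value 33 off the blackboard and dutifully produces nonsense.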

The same thing happens with bureaucracies. We goof and send the wrong
parameters to the office across the street; they do their duty and get it wrong.
Then we yell and scream over the phone at these idiots who screwed
everything up. Eventually we find out what really happened, croak a thin little
"Oops", and crawl into a hole. At least the computer doesn't have a sense of
righteous anger.

PERFORMANCE ADVANTAGES AND DISADVANTAGES


The alternative to a subroutine is called in-line code. In-line code is merely the
same code as the subroutine, put in place of the subroutine call. In-line code is
like having your own little bureaus within your organization, rather like having
your own little Personnel department or Purchasing Office inside your
department. The two are functionally identical, but differ somewhat in terms
of performance attributes.
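
For example, the input conditioning subroutine described earlier, which
accepted scores between 0 and 100, could instead be written as in-line code
right where it is needed (the line numbers here are arbitrary):

120 INPUT SCORE
130 IF (SCORE >= 0) AND (SCORE <= 100) THEN GOTO 170
140 PRINT "You typed in ";SCORE
150 PRINT "That number is wrong. Please try again."
160 GOTO 120
170 TEST=TEST+SCORE

No GOSUB, no RETURN, no time penalty; but every place in the program that
needs input conditioning must now carry its own copy of these lines.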

The subroutine is always slower than the in-line code. There are two reasons
for this. First, there is a time penalty paid just for talking with the subroutine. It
takes time for the computer to make a note of where it is when it encounters a
GOSUB statement. It takes more time for the computer to look up the line
from which it came when it reaches the RETURN statement. These time
penalties, although small, are unavoidable and have nothing to do with the
nature of the subroutine. Moreover, subroutines tend to be generalized where
in-line code is customized. If, for example, a particular subroutine is meant to
handle five different kinds of input conditioning, then when it comes time to
handle any one of those five, it will surely waste a little time handling
computations not appropriate to that one situation.

Bureaucracies are the same way. You always pay a time penalty just sending
the forms through the inter-office mail. Just getting somebody else to pay
attention to your problem takes a little time. And there is the same time
penalty associated with generality. When you want to buy a large expensive
computer, and you reach the place on the form that asks "Quantity", you are
wasting time filling out "One". Any reasonable person would know that you
don't go around buying multimillion dollar computers by the gross. But this
form is meant to work for big computers and little calculators, for company
cars and paper clips, so we use it and pay the time penalty.

The time penalty of subroutines is counterbalanced by their resource
efficiency. When you use a subroutine, you write the code just once; when you
use in-line code, you write it each time you use it. When you consider that
programs take up scarce RAM space, you realize that subroutines can save you
enough RAM to make the time penalty worthwhile.

Again, bureaucracies are the same way. Having your own Purchasing may be
faster than going through Corporate Purchasing, but can your company afford
the extra expense of your own Purchasing staff? It's a trade-off between speed
and efficiency.

The ideal use of a subroutine comes when it is called occasionally from many
different statements in the program. The worst possible use of a subroutine
arises when it is called many times (by means of a loop) from a single
statement only. In this case, we pay the time penalty each time we use the
subroutine, but we enjoy no savings in RAM whatsoever. This situation is
analogous to having a hypothetical "Department of Personnel Telephone
Answering". Such a department would provide a service to only a single
bureau, and would be called on many times a day. Thus, Personnel would pay
the time penalty but achieve no resource efficiency. Better to integrate that
operation into Personnel.

The ideal subroutine situation is analogous to a Personnel department within
an organization. It is called by nearly every bureau in the organization, for
everyone needs to hire a new employee occasionally, but it is called few times
by each bureau, because few departments hire en masse. That's why so many
organizations quickly sprout Personnel departments.

The real advantage of subroutines, though, arises from their modularity.
Subroutines help you organize your program into clean, understandable
modules. They make it easy to see the organization of the program. In a well-
written program, you can always see exactly where to go to get any job done.
Similarly, in a well-organized bureaucracy, you can find exactly the right bureau
to solve your problem. As you organize your program, you should ask yourself,
"What kind of bureaucracy am I creating here? Is this a clean, understandable
bureaucracy, or is it a messy, snafu-prone one?" Unfortunately, when you
encounter a problem-ridden bureaucracy, matters are not so simple. You can't
simply press RESET and start all over. Too bad.

Chapter Seven

From Data to Information

NUMBERS AND MEANING


If you could look into the heart of a computer, you would find no spreadsheets,
no programs, no words to process, no aliens to blast. All you would find are
numbers, thousands and thousands of numbers. The fundamental
measurement of a computer's power is its storage capacity for numbers,
typically 512 thousand numbers on a personal computer. With
these numbers, the computer is capable of only a very small number of
manipulations. It can move them, add, subtract, compare, and perform simple
logical operations known as Boolean operations. Where in this mass of
numbers and simple manipulations is meaning? How can the computer
transform all these numbers into words to process, alien invaders, or
programs?

Consider atoms. Simple things, atoms. They can interact with each other
according to the laws of chemistry. There are lots of combinations there, but
little in the way of meaningful interaction. Yet, put enough atoms together and
you get a human being, a person with character, feelings, and ideas. If you look
deep inside a human being, all you will find are lots and lots of chemical
reactions. Meaning does not come from its smallest components, but from the
way that they are organized and the context in which they are used.

Data is what the computer stores, but information is what we seek to
manipulate when we use the computer. The key word in understanding the
difference between data and information is context. Data plus context gives
information. This is a fundamental aspect of all communication systems, but it
is most clearly present in the computer. The computer stores only numbers,
but those numbers can represent many things, depending on the context.

NUMERIC DATA
They can, of course, represent numbers with values, things like a bank balance,
or a score on a test, or somebody's weight. Even then, these numbers are not
without a context of their own. First, they have dimensions, the units with
which they are measured. We don't say only that my weight is 110; it
is 110 pounds. The number 110 all by itself doesn't mean anything; you have to
include the unit of measure to give it a context to make it meaningful.
Similarly, my bank balance of 27 makes no sense until I specify whether it is 27
dollars, 27 cents, 27 pesos, or whatever it is.

There is another context to consider when using the computer. It recognizes
only one kind of number: the 16-bit integer. This is a number ranging from 0 to
65,535, with no fractions or decimal points. In other words, the computer can
count like so: 0, 1, 2, 3, 4, . . . 65,533, 65,534, 65,535. It cannot recognize a
number bigger than 65,535. When it reaches 65,535, the next number is just 0;
it starts all over again. Now, you might wonder what use there is in a computer
that can only recognize the first 65,536 numbers in the whole universe. Well,
there's a trick that programmers learned long ago. You can combine little
numbers to make big numbers. Actually, we do it all the time. If you think
about it, you only know ten numbers yourself. Those ten numbers are 0, 1, 2,
3, 4, 5, 6, 7, 8, and 9. You think you know more? Look closely at the next
number, 10. It's nothing but a 1 followed by a 0. There's nothing new or
different about the number 10; it's just two old numbers stuck together!

Of course, you know perfectly well that what makes 10 different from 1 or 0 is
the manner in which you interpret it. The number 10 has a context of its own. We
think in terms of "the tens place" and "the ones place", and so we interpret 10
as "1 in the tens place plus 0 in the ones place." Using this system, we can build
any number we want. The only price we pay is that we have to write lots of
digits to express big numbers.

The programmer's trick is to do the same thing with the computer. If you stick
several 16-bit words together, you can get bigger numbers. With the computer, you pay
two prices to get bigger numbers: first, it takes one 16-bit word for each part
of the number that you add, and second, it takes more computer time to
manipulate such bigger numbers. There is also the restriction of context: you
have to remember that the numbers in such a compound number belong
together and must be taken as a group, rather than individually.
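
As a sketch of the trick, suppose we let one variable hold the "high part" of a
big number and another the "low part". Since a single 16-bit word can hold
65,536 different values, the high part counts 65,536's, just as the tens place
counts tens:

10 HIGH=5
20 LOW=1000
30 PRINT HIGH*65536+LOW

This prints 328680, the single big number that the pair (5, 1000) stands for.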

It is even possible to group these 16-bit numbers together in such a way as to
interpret the group as what is called a floating-point number. This has nothing
to do with water or boats; it is a number whose decimal point is free to move
around ("float" &emdash; get it?) within the number. The idea sounds weird
until you see some examples:

Floating point numbers Integers

12.36835418 127
17,893.35 94,366
.00231 90
-451.0 -451

As you may have guessed, all floating point numbers have a decimal point. The
big question about any floating point number you have is, how many
significant figures does it have? Let me show you an example, using the value
of π.

3.14159265358979323
3.1415926536
3.1416
3.14
3

The first value gives π to 18 significant figures. The second value gives π to 11
significant figures. The third gives it to only 5; the fourth gives 3; and the fifth
gives only one. Each number is correct to within its number of significant
figures; it is rounded off from the previous one. A lot of people make the
mistake of assuming inappropriate zeros. For example, that last value of π, 3, is
it a 3 or a 3.0 or a 3.000 or a 3.0000000000? Many people think that 3 is the
same thing as 3.000000000, but it isn't. The next digit of π after the 3 should
be a 1, but we rounded it off when we went down to only one significant
figure. So, if you were trying to reconstruct the value of π after I gave you only
a 3, you would be wrong to put a 0 after the 3 to make it a 3.0. In other words,
3 is not the same as 3.0. If you want to say 3.0, say it; if I say 3, don't read it as
3.0, because it isn't. It could be 3.1, or 2.9, or anything between 2.5000000 and
3.4999999.

The meaning of significant figures is that they show us the limitations of


computers and arithmetic. Remember, each significant figure costs you some
RAM space and some execution time. For this reason, some computers use
only 4 bytes to save a floating-point number; others may use 8 or even more
bytes. A floating-point number expressed with 4 bytes has about 7 significant
figures; thus, you could express π this accurately with such a computer:

π = 3.141593

This is fairly accurate for most purposes. But now we come to a nasty trick that
trips up lots and lots of people. Suppose I divide 1 by 3. That should yield the
fraction 1/3rd, whose decimal value is .333333 . . ., with the 3's repeating
forever. Now, when I do this division on my computer equipped with 4-byte
floating point arithmetic, it will report the result as .3333333, with 7 significant
figures of 3's, but not an infinite number. The difference between the
computer's answer (.3333333) and the correct answer (.3333333. . . .) is small
(about one part in a million), but the fact remains that the computer is wrong.
Now, this discovery tends to upset some people. They think that computers
are always right, that they can make no mistakes, especially with arithmetic,
yet here is incontrovertible proof that the computer is wrong. This really
rattles their cage.
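
You can watch this happen on your own machine with a one-line program:

10 PRINT 1/3

On a machine with 4-byte floating point you will see something like .3333333;
your machine may print more 3's or fewer, but never infinitely many of them.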

The problem is not that the computer is mistaken, or that it is stupid and
cannot perform arithmetic. The problem is that there is no mathematical way
to correctly express the value of 1/3rd with a finite number of significant
figures. There isn't enough room to be accurate in so small a space. Suppose,
for example, that you had a brilliant plan to solve, say, the problem of the
American budget deficit. You had figured out a detailed plan that included all
the critical factors for eliminating the budget deficit without wiping out the
economy. I then gave you one piece of paper and a crayon and told you, "You
think you're so smart, put your plan on that paper with that crayon." You may
have the answer, but if you don't have enough room to say it, you come out looking
pretty stupid. The same thing goes with the computer: with anything less than
an infinite number of significant digits, the computer will sometimes be wrong
by a tiny amount.

This problem is so common that it has a name: round-off error. We call it that
because the computer rounds off numbers to make them fit into its floating-
point format, and in the process, it can round off some of the accuracy of the
number. In some cases, it can completely wipe out your number. For example,
suppose as part of your plan to solve the deficit, you had developed a
computer program to figure out how much money to allocate to each part of the
Federal budget. Let's say that you had even figured the amount of money to go
for buying file folders at the White House. Let's say that you figured $23.57 a
year would be a good figure. Now suppose you have a "bottom line" routine
that adds up all the expenditures of the budget to see what the grand total is.
Remember, we're talking hundreds of billions of dollars here. Let's say that the
grand total is about $300 billion dollars by the time the program gets around to
adding in your figure for file folders. Let's say the program statement looks like
this:

8230 TOTAL=TOTAL+WHFFOLDERS

Now, the computer will add the numbers like this:

  312,237,300.00
+          23.57
----------------
  312,237,300.00

If you count digits, you will see that the computer's seven significant digits are
used up on the high part of the number; the 2 in 23.57 is in the eighth
significant digit place, and so it is rounded off, right out of existence!
It's as if the $23.57 never existed. Your program would produce unreliable
results, and you would think that it had a very mysterious bug. In truth, this is
one of the natural limitations of the computer. The moral of this story is, if you
want the computer to use great big numbers next to little bitty numbers, you
need lots of significant digits, which will take more space and run more slowly.
Accuracy truly does have its price.
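
You can try the file-folder disaster yourself. On a machine that keeps only 7
significant figures, line 20 in this little sketch accomplishes nothing at all:

10 TOTAL=312237300
20 TOTAL=TOTAL+23.57
30 PRINT TOTAL

If your BASIC uses 8-byte floating point, the $23.57 will survive, and you will
need still bigger numbers before the effect appears.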

ALPHANUMERIC DATA
Numbers can mean more than just values. They can also be used to mean
alphanumeric characters. These are just letters and symbols like "a", "(", or
"%". The system for using them is very simple; it uses a code called ASCII
(pronounced "ass-key"), an acronym meaning "American Standard Code for
Information Interchange." This code assigns a number to every character.
Perhaps you used a code like that when you were a kid. A 1 stood for the letter
A, a 5 stood for the letter E, and so forth. This code is similar, but its purpose is
not to hide messages but to make them understandable to the computer,
which, after all, only understands numbers. Another difference is that the
letter A does not get a 1, but a 65, while B gets 66, C gets 67, and so forth.
Every letter and symbol gets its own number. The reason why A starts at 65 is a
bit of technical trivia with which I won't waste your time.

With this one code you can store text messages inside the computer. To use it,
you convert a character to a number using the ASCII code and store the
number in the computer. To read it out, just convert back. Lo and behold,
almost all versions of BASIC will do this automatically for you with a facility
called "string data". A string is a collection of numbers that are always treated
in the context of ASCII code conversion. You can always treat a string as a
collection of characters, even though it's really a collection of numbers. Using
strings from BASIC is very simple. Here's a simple example:

50 NAME$="CHRIS"
60 PRINT NAME$

There are only two syntax rules to note about this construction. First, a string is
always indicated by a "$" symbol at the end of the variable name. That tips off
the computer that you want this data treated as a string. Second, the string
data should be placed inside a pair of double quotation marks.

I cannot tell you much more about string handling because different
computers handle strings differently. Some allow you extensive facilities for
manipulating strings, allowing you to join strings, extract a portion of a string,
insert and delete sections of a string, and much more. Two fairly common
facilities, though, are the ASC function and the CHR$ function. These two
functions allow you to see the code conversion process. Try this little example
out on your computer:

80 PRINT ASC("C")
90 PRINT CHR$(67)

The first line will print the ASCII value of C, which should be 67. The second
value will print the character corresponding to 67, which is C. Thus, you can
take strings apart, find their numeric equivalents, and manipulate them with
arithmetic, although that is certainly the hard way to do it.
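
Here is one trick that arithmetic on ASCII codes makes possible. In the ASCII
code, each lowercase letter sits exactly 32 places above its capital, so you
can convert a capital into its lowercase twin with a little addition:

100 PRINT CHR$(ASC("C")+32)

This prints a lowercase c: ASC turns "C" into 67, adding 32 gives 99, and CHR$
turns 99 back into the character "c".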

BOOLEAN DATA
Another kind of data is Boolean data, named after George Boole, who founded
the mathematics of formal logic. Boolean data is very simple: it takes one of
only two values, true or false. Most BASIC languages store a zero to represent
a value of false, and something else to indicate a value of true. Quite often,
computer programs allow the user to set a particular choice, a choice that is
either taken or not taken. For example, a program might ask you if you want
some data sent out to the printer. You can answer yes (true) or no (false). The
program can then keep track of your answer as a variable called, say,
CHOOSEPRINTER. Then, whenever it is about to send something out, it might
have a statement like this:

1120 IF CHOOSEPRINTER THEN GOTO 2000

This statement would treat the value CHOOSEPRINTER the same way it would
treat a logical expression. If the result were true, it would GOTO 2000;
otherwise it would continue on. Thus, the Boolean variable is a good way to
keep track of such true/false conditions. Remember, though, that it really is a
number, just interpreted differently.
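
Putting the pieces together, the whole arrangement might look something like
this sketch (the prompt and line numbers are invented for illustration):

1000 PRINT "Type 1 to send results to the printer, 0 to skip it."
1010 INPUT CHOOSEPRINTER
.
.
.
1120 IF CHOOSEPRINTER THEN GOTO 2000

A 1 from the user is stored as true, a 0 as false, and line 1120 acts
accordingly.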

INSTRUCTION DATA
The numbers in a computer can be interpreted in a completely different
manner. They can be treated as instructions to the computer. Even then, there
are two variations on this.

BASIC Instructions
Your BASIC program is stored in RAM as a set of instructions for the computer.
Each instruction has a code number, called a token, associated with it. For
example, the token for the command PRINT might be 27. If this were the case,
then the command PRINT "ABCD" would be stored in RAM as 27, 65, 66, 67,
68. The 27 stands for PRINT and the 65, 66, 67, and 68 are the ASCII codes for
"ABCD". To RUN a BASIC program, the computer would scan through RAM,
looking at each instruction code and translating it into action.

Native Code
The second form of computer instructions are what is called native code. These
are instructions that the computer itself recognizes as instructions to directly
execute. The difference between BASIC instructions and native code is that the
BASIC instructions are foreign to the computer. That is, the computer does not
really know what the BASIC instructions mean for it to do; after it reads a
BASIC instruction, it must look up the meaning of the instruction in a "book of
commands" called an interpreter. The interpreter allows the computer to
figure out what it is supposed to do. As you might imagine, a BASIC program is
slowed down quite a bit by having to go through this interpreter. What is
worse, the computer must interpret each instruction each and every time it
encounters the instruction, even if it has executed that instruction thousands
of times previously.

Native code is much faster than interpreted code. Native code is program
instructions that are couched in the natural language of the computer. This
language, called machine language, is built deep into the innards of the
computer and cannot be changed. It is the fundamental language that the
computer uses for all its work. A BASIC interpreter translates your BASIC
commands into the computer's machine language.

What, you might wonder, does machine language look or sound like? Perhaps
you imagine some weird language of beeps and buzzes. But no, machine
language is nothing more than numbers. For example, a 96 will tell some
computers to return from a subroutine; it is exactly the same as the RETURN
statement in BASIC. Other commands, however, are nothing at all like BASIC.
There is more information on machine language in the appendix.

PIXEL DATA
Data inside the computer can also be interpreted as pixel data. This is data to
be displayed on the screen. To understand how this is done, you must first
learn something about number systems. There are three commonly used
number systems to master: decimal, hexadecimal, and binary. Decimal is the
first. You already know about decimal; it is the number system that you
normally use.

Hexadecimal is the second system. It sounds like a number system that witches
might use to cast hexes, but actually, "hex" in this case means 6, and "deci"
means 10, so hexadecimal refers to a base-16 numbering system. That is, we
count by 16's in a hexadecimal system. The idea to master here is the idea of
counting up until we reach the top of the number system and start over. In
decimal, we do it like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Now, cast aside your
natural familiarity with that 10 and look at it closely. What happened was this:
we reached 9, the last numeral in our possession. To go to the next higher
number, we started over again with 0, but we put a 1 in the 10's place. When
we reach 99, we add 1 to 9, which takes us over the top, so we go back to 0,
carry the one, which throws that 9 over the top to 0, so we carry the 1 again,
and end up with a 1 in the hundreds place. The rule is simple: when you reach
the highest number in the system and go up, replace it with a 0 and add 1 to
the next place. That place is a 1's place, or a 10's place, or a 100's place, or so
on in the decimal system.

In the hexadecimal system we count by 16's. The next 6 numbers after 9 are A,
B, C, D, E, and F. So we count like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F,
10. Now, be careful about that last 10. It is not the same as the 10 you are used
to seeing. It's really the number after F, and F is 15, so 10 is 16. Does that
confuse you?

As you might imagine, reading hexadecimal numbers can be quite confusing,


so programmers have one little trick to help out. Whenever a programmer
writes down a hexadecimal number, he puts a dollar sign in front of it so that
you'll know that it is special. Thus, $10 is hexadecimal 10, or 16, but 10 is
decimal 10, or just plain old everyday 10.

Doing anything in hexadecimal is enough to drive almost anybody nuts.


Arithmetic is really wild. Where else would 8+8=$10? Or try to figure this one
out: $30/2=$18. This stuff gets real hairy real fast. To help out, most
programmers use a little hexadecimal calculator that lets them figure these
things out quickly and easily.
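
You can check that last division for yourself by converting everything to
decimal and back:

$30 = 3 × 16 + 0 = 48
48 / 2 = 24
24 = 1 × 16 + 8 = $18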

The third numbering system that programmers use is called binary. It is a very
simple numbering system, so simple that it confuses lots of people. In binary,
we only count up to 1 before starting over. Thus, while decimal has 10
numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9), and hexadecimal has 16 numerals,
binary has only two: 0 and 1. So in binary, we count like this:

Binary: 0, 1, 10, 11, 100, 101, 110, 111, 1000
Decimal: 0, 1, 2, 3, 4, 5, 6, 7, 8

This means that in decimal, 10 is 10, in hexadecimal, $10 is 16, and in binary,
10 is 2. Are you getting confused yet?

Binary numbers get very long very quickly. For example, the number 999 in
binary is 1111100111. They are also very tedious to do arithmetic with. The
one saving grace of binary numbers is that they directly show the status of the
bits inside the computer. A bit is the fundamental unit of memory inside the
computer. We normally talk in terms of bytes, because the computer is
organized around bytes. But bytes are made up of bits; there are eight bits in
one byte. We normally don't worry about individual bits because one bit is too
small to do much with. I mean, what can you do with something that is either 0
or 1? Not much. About all you can do is pack eight of them together into a
byte, and then you've got a number between 0 and 255. But there is one
situation in which it is handy to worry about individual bits, and that is when
you are making a screen graphic. All computers draw images on the screen by
breaking the screen up into little tiny cells called pixels. The word pixel is a
contraction of "picture element". On a black and white display, a pixel is either
black or white. A blow-up of the letter "A" makes the point better than words:

. . . X X . . .
. . X . . X . .
. X . . . . X .
. X X X X X X .
. X . . . . X .
. X . . . . X .
. X . . . . X .
. . . . . . . .

Here each X stands for a big black square, a black pixel, and each dot for a
white one. Those big black squares are the pixels that we use to draw the A on
the screen.
Now, notice that a pixel is either black or white. There are only two states
possible for a pixel, no in between. Thus, a pixel's state can be represented by
a binary number, a 1 or a 0. We might say that a 0 means white and a 1 means
black. If so, then our letter A can be represented by binary numbers, one for
each row in the letter, like so:

00011000
00100100
01000010
01111110
01000010
01000010
01000010
00000000
What we have here is something very exciting and very important: the ability
to express images as numbers. Now if we apply the powerful number-
crunching capabilities that the computer gives us, we can process the images
themselves, just by processing the numbers that represent the images. That's
how computer games are able to create those animated images. Behind every
twisting, grimacing alien, there's a microprocessor frantically shuttling
numbers around.
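
Converting one such row into its number is just a matter of adding up place
values. In binary the places count 128's, 64's, 32's, 16's, 8's, 4's, 2's, and
1's, so a row whose pixel pattern is 01000010, for example, is stored as:

(0 × 128) + (1 × 64) + (0 × 32) + (0 × 16) + (0 × 8) + (0 × 4) + (1 × 2) + (0 × 1) = 66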

SUMMARY OF NUMBER TYPES


We have seen that a number can mean many different things. It can be your
plain old, everyday number, like Joe's bank balance or Fred's weight. It can also
be a character, like an "A" or a "%". It could also be a simple "true or false"
indicator. It could also be an instruction for the computer to execute. Or it
might be a part of an image. There are many other things that a number might
mean; it all depends on the context in which the number is taken.

How is it that one number could mean so many different things? Because we
can apply so many different contexts to that one number. This is nothing new;
we do it all the time with words. Consider the word "dig". My Webster's
Unabridged lists fourteen different definitions for the word. A simple, everyday
word like "dig" could be interpreted fourteen different ways. How could you
tell which of the fourteen interpretations applied? Only from the context. If
you were a foreigner first learning English, you might be angry at such a stupid
language that cannot keep its words straight. Yet, as a fluent speaker of the
language, you have no problem determining the exact shade of meaning of the
word from the context in which it is used. So too it is with computers. They
may use a number in many different ways, but the context is always clear. Thus
is it possible to breathe meaning into something as meaningless as a number.

DATA VERSUS PROCESS


Let us look more closely at this concept of context. Exactly how is context
established? As in so many things, the question bears the seeds of the answer.
The key word to examine is "established". Context is not some static entity
that lies on the page the way that data does. No, context must be established,
created, or forged. Context is intrinsically part of a process; it is established or
created by some activity. Here we encounter one of the most profound
concepts of computing: the complementarity of data versus process in the
computer.

Data are the numbers inside the computer; process is what the computer does
with them. Data are passive, process is active. An idea or a message, though, is
composed of both data and process, number and context. Both are necessary
to create an idea or message. Oddly enough, the ratio of data to process is not
fixed. Any message can be expressed with any combination of data and
process. A contrivedly simple example may help make this point. Suppose that
I wish to convey to you scores of six students, and suppose that these scores
just happen to be 2, 4, 6, 8, 10, and 12. I could send you the information in a
data-intensive form:

2, 4, 6, 8, 10, 12

Or I could send the same information in a process-intensive form:

10 FOR X=1 TO 6
20 SCORE=2*X
30 NEXT X

Both messages convey the same information, but one uses primarily data and
the other uses primarily process to convey the same information.
Programmers are intensely aware of this process-data duality, and often use it
in polishing their programs. If a program is too large and must be made
smaller, translate data-intensive portions into more process-intensive forms. If
a program runs too slowly, translate process-intensive sections into more data-
intensive forms. This is because data consumes space while process consumes
time. A sufficiently clever programmer can obtain almost any desired trade-off
of space for time by finding the precise trade-off of data for process.

But there is a point many programmers miss. Just because data and process
are to a large degree interchangeable does not mean that we should use them
without bias. If you regard the computer as a communications medium, then
when using a computer, you must always bear in mind the possibility of using
another medium to convey your message. Consider, for example, the printed
page, one of our most heavily used media. Here is a medium ideally suited for
conveying data and quite incapable of directly presenting process.
Nevertheless, we are able to use the printed page to convey a great deal of
information about the world. It is especially adept at presenting static data. If
you want to find the atomic weight of beryllium, the population of Sierra
Leone, or some other simple fact, a reference book is an ideal source to
consult. On a per-idea basis, there is no medium cheaper, more convenient,
and more effective.

But suppose we wish to convey information not about facts, but about events.
Now we are getting a little more demanding of the medium, and it does not
perform quite as satisfactorily. It manages, certainly, but somehow the
description of a complicated sequence of events can get a little muddled and
require perhaps a few re-readings before we can understand it.

Now let's go to the extreme of the spectrum and consider the ability of the
printed page to convey information about processes. We find that the medium
is certainly capable of doing so, but not very well. How many textbooks have
you dragged through, trying to divine the author's explanation of some simple
process, with little success? Look how much work I have had to go through to
explain to you the small ideas presented in this book. Because the printed page
is a data-intensive medium, it is strongest at presenting data and weakest at
communicating processes.

The computer, though, is the only medium we have that can readily handle
processes. That is because it is the only medium that is intrinsically interactive;
all other media are expository. Indeed, the computer might well be said to be
more process-intensive than data-intensive. The typical personal computer can
store 512,000 bytes of data, but the same computer can perform
approximately 300,000 operations per second. If you let it run for just four
hours, it can perform over 4 billion operations, even though holding same
measly 512,000 bytes. This is not a medium for storing data, it is a machine for
processing it.

It follows, therefore, that the ideal application for the computer will stress its
data-processing capabilities and minimize its data-storage capabilities. Indeed,
if you list the most successful programs for computers, you will see that key
element in all has very little to do with data storage and very much to do with
data processing. Spreadsheets are a good example; so are word processing
programs. Both allow you to store lots of information, information that was
once stored with paper and pencil. But the real appeal of these programs is not
the way they allow you to store data but the way that they make it easy to
manipulate data. Even the most data-intensive application on computers, the
database manager, is really not a way to store data but a way to select data.

The moral of this chapter is that data is not information. Numbers without
context are useless, meaningless piles of digits. The jerk who tries to intimidate
you with lots of numbers is wasting your time unless he can orchestrate those
numbers into a coherant line of reasoning. Numbers are only the junior
partner in the partnership of information. The senior partner is context, which
is derived from the processing to which the numbers are subjected.
Concentrate your attention on the context behind the numbers, the reasoning
that gives them meaning. Be the master of your own numbers.

Chapter Seven

From Data to Information

NUMBERS AND MEANING


If you could look into the heart of a computer, you would find no spreadsheets,
no programs, no words to process, no aliens to blast. All you would find are
numbers, thousands and thousands of numbers. The fundamental
measurement of a computer's power is its storage capacity for numbers:
typically 512 thousand numbers on a personal computer. With
these numbers, the computer is capable of only a very small number of
manipulations. It can move them, add, subtract, compare, and perform simple
logical operations known as Boolean operations. Where in this mass of
numbers and simple manipulations is meaning? How can the computer
transform all these numbers into words to process, alien invaders, or
programs?

Consider atoms. Simple things, atoms. They can interact with each other
according to the laws of chemistry. There are lots of combinations there, but
little in the way of meaningful interaction. Yet, put enough atoms together and
you get a human being, a person with character, feelings, and ideas. If you look
deep inside a human being, all you will find are lots and lots of chemical
reactions. Meaning comes not from the smallest components, but from the
way that they are organized and the context in which they are used.

Data is what the computer stores, but information is what we seek to
manipulate when we use the computer. The key word in understanding the
difference between data and information is context. Data plus context gives
information. This is a fundamental aspect of all communication systems, but it
is most clearly present in the computer. The computer stores only numbers,
but those numbers can represent many things, depending on the context.

NUMERIC DATA
They can, of course, represent numbers with values, things like a bank balance,
or a score on a test, or somebody's weight. Even then, these numbers are not
without a context of their own. First, they have dimensions, the units with
which they are measured. We don't say only that my weight is 110; it
is 110 pounds. The number 110 all by itself doesn't mean anything; you have to
include the unit of measure to give it a context to make it meaningful.
Similarly, my bank balance of 27 makes no sense until I specify whether it is 27
dollars, 27 cents, 27 pesos, or whatever it is.

There is another context to consider when using the computer. It recognizes
only one kind of number: the 16-bit integer. This is a number ranging from 0 to
32,767, with no fractions or decimal points. In other words, the computer can
count like so: 0, 1, 2, 3, 4, . . . 32,765, 32,766, 32,767. It cannot recognize a
number bigger than 32,767. When it reaches 32,767, the next number is just 0;
it starts all over again. Now, you might wonder what use there is in a computer
that can only recognize the first 32,768 numbers in the whole universe. Well,
there's a trick that programmers learned long ago. You can combine little
numbers to make big numbers. Actually, we do it all the time. If you think
about it, you only know ten numbers yourself. Those ten numbers are 0, 1, 2,
3, 4, 5, 6, 7, 8, and 9. You think you know more? Look closely at the next
number, 10. It's nothing but a 1 followed by a 0. There's nothing new or
different about the number 10; it's just two old numbers stuck together!

Of course, you know perfectly well that what makes 10 different from 1 or 0 is
the manner in which you interpret it. The number 10 has a context of its own. We
think in terms of "the tens place" and "the ones place", and so we interpret 10
as "1 in the tens place plus 0 in the ones place." Using this system, we can build
any number we want. The only price we pay is that we have to write lots of
digits to express big numbers.

The programmer's trick is to do the same thing with the computer. If you stick
together several 16-bit words, you can get bigger numbers. With the computer, you pay
two prices to get bigger numbers: first, it takes one 16-bit word for each part
of the number that you add, and second, it takes more computer time to
manipulate such bigger numbers. There is also the restriction of context: you
have to remember that the numbers in such a compound number belong
together and must be taken as a group, rather than individually.
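The wraparound and the word-combining trick can be sketched in a few lines of modern Python (the book's own examples use BASIC, so take this purely as an illustration; the 0-to-32,767 range follows the text above):

```python
# Mimicking the machine described above, which counts 0 through 32,767
# and then starts over; sticking two words together gives bigger numbers.
TOP = 32768                      # how many values one word can hold

def next_number(n):
    """Counting up past the top wraps back around to 0."""
    return (n + 1) % TOP

def combine(high_word, low_word):
    """Read two words as one big number, like a tens place and a ones place."""
    return high_word * TOP + low_word

print(next_number(32767))        # 0
print(combine(1, 0))             # 32768
```

The high word plays exactly the role of the "tens place": each step in it is worth one whole trip through the low word.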

It is even possible to group these 16-bit numbers together in such a way as to
interpret the group as what is called a floating-point number. This has nothing
to do with water or boats; it is a number whose decimal point is free to move
around ("float", get it?) within the number. The idea sounds weird
until you see some examples:

Floating-point numbers      Integers

12.36835418                 127
17,893.35                   94,366
.00231                      90
-451.0                      -451

As you may have guessed, all floating point numbers have a decimal point. The
big question about any floating point number is: how many
significant figures does it have? Let me show you an example, using the value
of π.

3.14159265358979323
3.1415926536
3.1416
3.14
3

The first value gives π to 18 significant figures. The second value gives π to 11
significant figures. The third gives it to only 5; the fourth gives 3; and the fifth
gives only one. Each number is correct to within its number of significant
figures; it is rounded off from the previous one. A lot of people make the
mistake of assuming inappropriate zeros. For example, that last value of π, 3, is
it a 3 or a 3.0 or a 3.000 or a 3.0000000000? Many people think that 3 is the
same thing as 3.000000000, but it isn't. The next digit of π after the 3 should
be a 1, but we rounded it off when we went down to only one significant
figure. So, if you were trying to reconstruct the value of π after I gave you only
a 3, you would be wrong to put a 0 after the 3 to make it a 3.0. In other words,
3 is not the same as 3.0. If you want to say 3.0, say it; if I say 3, don't read it as
3.0, because it isn't. It could be 3.1, or 2.9, or anything between 2.5000000 and
3.4999999.
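The successive roundings above can be reproduced with Python's built-in round (a sketch only: round counts digits after the decimal point rather than significant figures, and Python is standing in for the book's BASIC):

```python
import math

# Rounding pi to fewer and fewer digits, as in the list above.
for digits in (10, 4, 2, 0):
    print(round(math.pi, digits))
```

Note that the last value prints as 3.0, which is exactly the trap described above: the rounded 3 carries no promise about any digits beyond it.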

Significant figures matter because they show us the limits of computer
arithmetic. Remember, each significant figure costs you some
RAM space and some execution time. For this reason, some computers use
only 4 bytes to save a floating-point number; others may use 8 or even more
bytes. A floating-point number expressed with 4 bytes has about 7 significant
figures; thus, you could express π this accurately with such a computer:

π = 3.141593

This is fairly accurate for most purposes. But now we come to a nasty trick that
trips up lots and lots of people. Suppose I divide 1 by 3. That should yield the
fraction 1/3rd, whose decimal value is .333333..., with the 3's repeating
forever. Now, when I do this division on my computer equipped with 4-byte
floating point arithmetic, it will report the result as .3333333, with 7 significant
figures of 3's, but not an infinite number. The difference between the
computer's answer (.3333333) and the correct answer (.3333333...) is small
(about one part in a million), but the fact remains that the computer is wrong.
Now, this discovery tends to upset some people. They think that computers
are always right, that they can make no mistakes, especially with arithmetic,
yet here is incontrovertible proof that the computer is wrong. This really
rattles their cage.

The problem is not that the computer is mistaken, or that it is stupid and
cannot perform arithmetic. The problem is that there is no mathematical way
to correctly express the value of 1/3rd with a finite number of significant
figures. There isn't enough room to be accurate in so small a space. Suppose,
for example, that you had a brilliant plan to solve, say, the problem of the
American budget deficit. You had figured out a detailed plan that included all
the critical factors for eliminating the budget deficit without wiping out the
economy. I then gave you one piece of paper and a crayon and told you, "You
think you're so smart, put your plan on that paper with that crayon." You may
have the answer, but if you don't have enough room to say it, you come out looking
pretty stupid. The same thing goes with the computer: with anything less than
an infinite number of significant digits, the computer will sometimes be wrong
by a tiny amount.

This problem is so common that it has a name: round-off error. We call it that
because the computer rounds off numbers to make them fit into its floating-
point format, and in the process, it can round off some of the accuracy of the
number. In some cases, it can completely wipe out your number. For example,
suppose as part of your plan to solve the deficit, you had developed a
computer program to figure out how much money to allocate to each part of the
Federal budget. Let's say that you had even figured the amount of money to go
for buying file folders at the White House. Let's say that you figured $23.57 a
year would be a good figure. Now suppose you have a "bottom line" routine
that adds up all the expenditures of the budget to see what the grand total is.
Remember, we're talking hundreds of millions of dollars here. Let's say that the
grand total is about $300 million by the time the program gets around to
adding in your figure for file folders. Let's say the program statement looks like
this:

8230 TOTAL=TOTAL+WHFFOLDERS

Now, the computer will add the numbers like this:

  312,237,300.00
+          23.57
  --------------
  312,237,300.00

If you count digits, you will see that the computer's seven significant digits are
used up on the high part of the number; the 2 in 23.57 is in the eighth
significant digit place, and so it is rounded off &emdash; right out of existence!
It's as if the $23.57 never existed. Your program would produce unreliable
results, and you would think that it had a very mysterious bug. In truth, this is
one of the natural limitations of the computer. The moral of this story is, if you
want the computer to use great big numbers next to little bitty numbers, you
need lots of significant digits, which will take more space and run more slowly.
Accuracy truly does have its price.
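You can watch round-off error happen in modern Python, which uses 8-byte floats with roughly 16 significant digits rather than the 7 discussed above; the struct trick for squeezing a value into 4 bytes is an illustration, not how the book's computer works:

```python
import struct

# Python's floats are 8 bytes, so the same limits appear at a larger scale.
print(1 / 3)               # the 3's stop after about 16 digits
print(1e16 + 1.0 == 1e16)  # True: the 1.0 is rounded out of existence

# Round-tripping a value through 4-byte storage shows the roughly
# 7 significant figures a 4-byte float really keeps.
third = struct.unpack("f", struct.pack("f", 1 / 3))[0]
print(third)               # about 0.33333334
```

The second line is the file-folder problem in miniature: the small number vanishes because every one of the significant digits is spent on the big one.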

ALPHANUMERIC DATA
Numbers can mean more than just values. They can also be used to mean
alphanumeric characters. These are just letters and symbols like "a", "(", or
"%". The system for using them is very simple; it uses a code called ASCII
(pronounced "ass-key"), an acronym meaning "American Standard Code for
Information Interchange." This code assigns a number to every character.
Perhaps you used a code like that when you were a kid. A 1 stood for the letter
A, a 5 stood for the letter E, and so forth. This code is similar, but its purpose is
not to hide messages but to make them understandable to the computer,
which, after all, only understands numbers. Another difference is that the
letter A does not get a 1, but a 65, while B gets 66, C gets 67, and so forth.
Every letter and symbol gets its own number. The reason why A starts at 65 is a
bit of technical trivia with which I won't waste your time.

With this one code you can store text messages inside the computer. To use it,
you convert a character to a number using the ASCII code and store the
number in the computer. To read it out, just convert back. Lo and behold,
almost all versions of BASIC will do this automatically for you with a facility
called "string data". A string is a collection of numbers that are always treated
in the context of ASCII code conversion. You can always treat a string as a
collection of characters, even though it's really a collection of numbers. Using
strings from BASIC is very simple. Here's a simple example:

50 NAME$="CHRIS"
60 PRINT NAME$

There are only two syntax rules to note about this construction. First, a string is
always indicated by a "$" symbol at the end of the variable name. That tips off
the computer that you want this data treated as a string. Second, the string
data should be placed inside a pair of double quotation marks.

I cannot tell you much more about string handling because different
computers handle strings differently. Some allow you extensive facilities for
manipulating strings, allowing you to join strings, extract a portion of a string,
insert and delete sections of a string, and much more. Two fairly common
facilities, though, are the ASC function and the CHR$ function. These two
functions allow you to see the code conversion process. Try this little example
out on your computer:

80 PRINT ASC("C")
90 PRINT CHR$(67)

The first line will print the ASCII value of C, which should be 67. The second
line will print the character corresponding to 67, which is C. Thus, you can
take strings apart, find their numeric equivalents, and manipulate them with
arithmetic, although that is certainly the hard way to do it.
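If your language is Python rather than BASIC, the built-in ord and chr functions play the same roles as ASC and CHR$:

```python
# ord converts a character to its ASCII code; chr converts back.
print(ord("C"))           # 67
print(chr(67))            # C
print(chr(ord("C") + 1))  # D -- manipulating a character with arithmetic
```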

BOOLEAN DATA
Another kind of data is Boolean data, named after George Boole, who founded
the mathematics of formal logic. Boolean data is very simple: it takes one of
only two values, true or false. Most BASIC languages store a zero to represent
a value of false, and something else to indicate a value of true. Quite often,
computer programs allow the user to set a particular choice, a choice that is
either taken or not taken. For example, a program might ask you if you want
some data sent out to the printer. You can answer yes (true) or no (false). The
program can then keep track of your answer as a variable called, say,
CHOOSEPRINTER. Then, whenever it is about to send something out, it might
have a statement like this:

1120 IF CHOOSEPRINTER THEN GOTO 2000

This statement would treat the value CHOOSEPRINTER the same way it would
treat a logical expression. If the result were true, it would GOTO 2000;
otherwise it would continue on. Thus, the Boolean variable is a good way to
keep track of such true/false conditions. Remember, though, that it really is a
number, just interpreted differently.
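A sketch of the same idea in Python (the CHOOSEPRINTER name comes from the text above; everything else here is illustrative):

```python
# Keep track of a yes/no answer, then branch on it later.
choose_printer = True  # the user answered yes
if choose_printer:
    print("sending output to the printer")

# Underneath, the Boolean really is a number, just interpreted differently.
print(int(True), int(False))  # 1 0
```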

INSTRUCTION DATA
The numbers in a computer can be interpreted in a completely different
manner. They can be treated as instructions to the computer. Even then, there
are two variations on this.

BASIC Instructions
Your BASIC program is stored in RAM as a set of instructions for the computer.
Each instruction has a code number, called a token, associated with it. For
example, the token for the command PRINT might be 27. If this were the case,
then the command PRINT "ABCD" would be stored in RAM as 27, 65, 66, 67,
68. The 27 stands for PRINT and the 65, 66, 67, and 68 are the ASCII codes for
"ABCD". To RUN a BASIC program, the computer would scan through RAM,
looking at each instruction code and translating it into action.
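Under the stated assumption that PRINT's token is 27 (the real number depends on the BASIC dialect), the tokenizing step can be sketched in Python:

```python
# Store the command PRINT "ABCD" as one token plus ASCII codes.
TOKENS = {"PRINT": 27}  # hypothetical token, as in the text
stored = [TOKENS["PRINT"]] + [ord(ch) for ch in "ABCD"]
print(stored)  # [27, 65, 66, 67, 68]
```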

Native Code
The second form of computer instructions is what is called native code. These
are instructions that the computer itself recognizes as instructions to directly
execute. The difference between BASIC instructions and native code is that the
BASIC instructions are foreign to the computer. That is, the computer does not
really know what the BASIC instructions mean for it to do; after it reads a
BASIC instruction, it must look up the meaning of the instruction in a "book of
commands" called an interpreter. The interpreter allows the computer to
figure out what it is supposed to do. As you might imagine, a BASIC program is
slowed down quite a bit by having to go through this interpreter. What is
worse, the computer must interpret each instruction each and every time it
encounters the instruction, even if it has executed that instruction thousands
of times previously.

Native code is much faster than interpreted code. It consists of program
instructions couched in the natural language of the computer. This
language, called machine language, is built deep into the innards of the
computer and cannot be changed. It is the fundamental language that the
computer uses for all its work. A BASIC interpreter translates your BASIC
commands into the computer's machine language.

What, you might wonder, does machine language look or sound like? Perhaps
you imagine some weird language of beeps and buzzes. But no, machine
language is nothing more than numbers. For example, a 96 will tell some
computers to return from a subroutine; it is exactly the same as the RETURN
statement in BASIC. Other commands, however, are nothing at all like BASIC.
There is more information on machine language in the appendix.

PIXEL DATA
Data inside the computer can also be interpreted as pixel data. This is data to
be displayed on the screen. To understand how this is done, you must first
learn something about number systems. There are three commonly used
number systems to master: decimal, hexadecimal, and binary. Decimal is the
first. You already know about decimal; it is the number system that you
normally use.

Hexadecimal is the second system. It sounds like a number system that witches
might use to cast hexes, but actually, "hex" in this case means 6, and "deci"
means 10, so hexadecimal refers to a base-16 numbering system. That is, we
count by 16's in a hexadecimal system. The idea to master here is the idea of
counting up until we reach the top of the number system and start over. In
decimal, we do it like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Now, cast aside your
natural familiarity with that 10 and look at it closely. What happened was this:
we reached 9, the last numeral in our possession. To go to the next higher
number, we started over again with 0, but we put a 1 in the 10's place. When
we reach 99, we add 1 to 9, which takes us over the top, so we go back to 0,
carry the one, which throws that 9 over the top to 0, so we carry the 1 again,
and end up with a 1 in the hundreds place. The rule is simple: when you reach
the highest number in the system and go up, replace it with a 0 and add 1 to
the next place. That place is a 1's place, or a 10's place, or a 100's place, and so
on in the decimal system.

In the hexadecimal system we count by 16's. The next 6 numbers after 9 are A,
B, C, D, E, and F. So we count like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F,
10. Now, be careful about that last 10. It is not the same as the 10 you are used
to seeing. It's really the number after F, and F is 15, so 10 is 16. Does that
confuse you?

As you might imagine, reading hexadecimal numbers can be quite confusing,
so programmers have one little trick to help out. Whenever a programmer
writes down a hexadecimal number, he puts a dollar sign in front of it so that
you'll know that it is special. Thus, $10 is hexadecimal 10, or 16, but 10 is
decimal 10, or just plain old everyday 10.

Doing anything in hexadecimal is enough to drive almost anybody nuts.
Arithmetic is really wild. Where else would 8+8=$10? Or try to figure this one
out: $30/2=$18. This stuff gets real hairy real fast. To help out, most
programmers use a little hexadecimal calculator that lets them figure these
things out quickly and easily.
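Lacking a hexadecimal calculator, Python will check those two oddities for you; it writes hexadecimal with a 0x prefix where the programmers' convention above uses $:

```python
# 8+8 is $10, and $30 divided by 2 is $18.
print(hex(8 + 8))               # 0x10
print(hex(int("30", 16) // 2))  # 0x18
```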

The third numbering system that programmers use is called binary. It is a very
simple numbering system, so simple that it confuses lots of people. In binary,
we only count up to 1 before starting over. Thus, while decimal has 10
numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9), and hexadecimal has 16 numerals,
binary has only two: 0 and 1. So in binary, we count like this:

Binary:  0, 1, 10, 11, 100, 101, 110, 111, 1000
Decimal: 0, 1,  2,  3,   4,   5,   6,   7,    8

This means that in decimal, 10 is 10, in hexadecimal, $10 is 16, and in binary,
10 is 2. Are you getting confused yet?
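Python's int function, given a base, reads the same numeral three different ways:

```python
# One numeral, three number systems.
print(int("10", 10))  # 10 in decimal
print(int("10", 16))  # 16 in hexadecimal
print(int("10", 2))   # 2 in binary
```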

Binary numbers get very long very quickly. For example, the number 999 in
binary is 1111100111. They are also very tedious to do arithmetic with. The
one saving grace of binary numbers is that they directly show the status of the
bits inside the computer. A bit is the fundamental unit of memory inside the
computer. We normally talk in terms of bytes, because the computer is
organized around bytes. But bytes are made up of bits; there are eight bits in
one byte. We normally don't worry about individual bits because one bit is too
small to do much with. I mean, what can you do with something that is either 0
or 1? Not much. About all you can do is pack eight of them together into a
byte, and then you've got a number between 0 and 255. But there is one
situation in which it is handy to worry about individual bits, and that is when
you are making a screen graphic. All computers draw images on the screen by
breaking the screen up into little tiny cells called pixels. The word pixel is a
contraction of "picture element". On a black and white display, a pixel is either
black or white. A blow-up of the letter "A" makes the point better than words:

Those big black squares are the pixels that we use to draw the A on the screen.
Now, notice that a pixel is either black or white. There are only two states
possible for a pixel, with no in-between. Thus, a pixel's state can be represented by
a binary number, a 1 or a 0. We might say that a 0 means white and a 1 means
black. If so, then our letter A can be represented by binary numbers, one for
each row in the letter, like so:

What we have here is something very exciting and very important: the ability
to express images as numbers. Now if we apply the powerful number-
crunching capabilities that the computer gives us, we can process the images
themselves, just by processing the numbers that represent the images. That's
how computer games are able to create those animated images. Behind every
twisting, grimacing alien, there's a microprocessor frantically shuttling
numbers around.
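Here is a sketch of that idea in Python, using a hypothetical 5-pixel-wide bitmap for the letter A (the actual bit patterns vary from computer to computer); each 1 bit prints as a black pixel:

```python
# One number per row of the letter; each bit is one pixel.
rows = [0b01110, 0b10001, 0b11111, 0b10001, 0b10001]
for row in rows:
    line = ""
    for bit in range(4, -1, -1):  # left-most pixel is the highest bit
        line += "#" if row & (1 << bit) else "."
    print(line)
```

Change the numbers and the picture changes: that is all there is to processing an image by processing its numbers.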

SUMMARY OF NUMBER TYPES


We have seen that a number can mean many different things. It can be your
plain old, everyday number, like Joe's bank balance or Fred's weight. It can also
be a character, like an "A" or a "%". It could also be a simple "true or false"
indicator. It could also be an instruction for the computer to execute. Or it
might be a part of an image. There are many other things that a number might
mean; it all depends on the context in which the number is taken.

How is it that one number could mean so many different things? Because we
can apply so many different contexts to that one number. This is nothing new;
we do it all the time with words. Consider the word "dig". My Webster's
Unabridged lists fourteen different definitions for the word. A simple, everyday
word like "dig" could be interpreted fourteen different ways. How could you
tell which of the fourteen interpretations applied? Only from the context. If
you were a foreigner first learning English, you might be angry at such a stupid
language that cannot keep its words straight. Yet, as a fluent speaker of the
language, you have no problem determining the exact shade of meaning of the
word from the context in which it is used. So too it is with computers. They
may use a number in many different ways, but the context is always clear. Thus
is it possible to breathe meaning into something as meaningless as a number.

DATA VERSUS PROCESS


Let us look more closely at this concept of context. Exactly how is context
established? As in so many things, the question bears the seeds of the answer.
The key word to examine is "established". Context is not some static entity
that lies on the page the way that data does. No, context must be established,
created, or forged. Context is intrinsically part of a process; it is established or
created by some activity. Here we encounter one of the most profound
concepts of computing: the complementarity of data versus process in the
computer.

Data are the numbers inside the computer; process is what the computer does
with them. Data are passive; process is active. An idea or a message, though, is
composed of both data and process, number and context. Both are necessary
to create an idea or message. Oddly enough, the ratio of data to process is not
fixed. Any message can be expressed with any combination of data and
process. A contrived but simple example may help make this point. Suppose that
I wish to convey to you the scores of six students, and suppose that these scores
just happen to be 2, 4, 6, 8, 10, and 12. I could send you the information in a
data-intensive form:

2, 4, 6, 8, 10, 12

Or I could send the same information in a process-intensive form:

10 FOR X=1 TO 6
20 SCORE=2*X
30 NEXT X

Both messages convey the same information, but one relies primarily on data
and the other primarily on process.
Programmers are intensely aware of this process-data duality, and often use it
in polishing their programs. If a program is too large and must be made
smaller, translate data-intensive portions into more process-intensive forms. If
a program runs too slowly, translate process-intensive sections into more data-
intensive forms. This is because data consumes space while process consumes
time. A sufficiently clever programmer can obtain almost any desired trade-off
of space for time by finding the precise trade-off of data for process.
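In Python, the two forms sit side by side in a single comparison (Python again standing in for the book's BASIC):

```python
# A data-intensive list versus a process-intensive rule that regenerates it.
data_form = [2, 4, 6, 8, 10, 12]
process_form = [2 * x for x in range(1, 7)]
print(process_form == data_form)  # True
```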

But there is a point many programmers miss. Just because data and process
are to a large degree interchangeable does not mean that we should use them
without bias. If you regard the computer as a communications medium, then
when using a computer, you must always bear in mind the possibility of using
another medium to convey your message. Consider, for example, the printed
page, one of our most heavily used media. Here is a medium ideally suited for
conveying data and quite incapable of directly presenting process.
Nevertheless, we are able to use the printed page to convey a great deal of
information about the world. It is especially adept at presenting static data. If
you want to find the atomic weight of beryllium, the population of Sierra
Leone, or some other simple fact, a reference book is an ideal source to
consult. On a per-idea basis, there is no medium cheaper, more convenient,
and more effective.

But suppose we wish to convey information not about facts, but about events.
Now we are getting a little more demanding of the medium, and it does not
perform quite as satisfactorily. It manages, certainly, but somehow the
description of a complicated sequence of events can get a little muddled and
require perhaps a few re-readings before we can understand it.

Now let's go to the extreme of the spectrum and consider the ability of the
printed page to convey information about processes. We find that the medium
is certainly capable of doing so, but not very well. How many textbooks have
you dragged through, trying to divine the author's explanation of some simple
process, with little success? Look how much work I have had to go through to
explain to you the small ideas presented in this book. Because the printed page
is a data-intensive medium, it is strongest at presenting data and weakest at
communicating processes.
The computer, though, is the only medium we have that can readily handle
processes. That is because it is the only medium that is intrinsically interactive;
all other media are expository. Indeed, the computer might well be said to be
more process-intensive than data-intensive. The typical personal computer can
store 512,000 bytes of data, but the same computer can perform
approximately 300,000 operations per second. If you let it run for just four
hours, it can perform over 4 billion operations, even though holding same
measly 512,000 bytes. This is not a medium for storing data, it is a machine for
processing it.

It follows, therefore, that the ideal application for the computer will stress its
data-processing capabilities and minimize its data-storage capabilities. Indeed,
if you list the most successful programs for computers, you will see that key
element in all has very little to do with data storage and very much to do with
data processing. Spreadsheets are a good example; so are word processing
programs. Both allow you to store lots of information, information that was
once stored with paper and pencil. But the real appeal of these programs is not
the way they allow you to store data but the way that they make it easy to
manipulate data. Even the most data-intensive application on computers, the
database manager, is really not a way to store data but a way to select data.

The moral of this chapter is that data is not information. Numbers without
context are useless, meaningless piles of digits. The jerk who tries to intimidate
you with lots of numbers is wasting your time unless he can orchestrate those
numbers into a coherant line of reasoning. Numbers are only the junior
partner in the partnership of information. The senior partner is context, which
is derived from the processing to which the numbers are subjected.
Concentrate your attention on the context behind the numbers, the reasoning
that gives them meaning. Be the master of your own numbers.

Chapter Seven

From Data to Information

NUMBERS AND MEANING
If you could look into the heart of a computer, you would find no spreadsheets,
no programs, no words to process, no aliens to blast. All you would find are
numbers, thousands and thousands of numbers. The fundamental
measurement of a computer's power is its storage capacity for numbers,
typically 512 thousand numbers on a personal computer. With
these numbers, the computer is capable of only a very small number of
manipulations. It can move them, add, subtract, compare, and perform simple
logical operations known as Boolean operations. Where in this mass of
numbers and simple manipulations is meaning? How can the computer
transform all these numbers into words to process, alien invaders, or
programs?

Consider atoms. Simple things, atoms. They can interact with each other
according to the laws of chemistry. There are lots of combinations there, but
little in the way of meaningful interaction. Yet, put enough atoms together and
you get a human being, a person with character, feelings, and ideas. If you look
deep inside a human being, all you will find are lots and lots of chemical
reactions. Meaning does not come from the smallest components, but from the
way that they are organized and the context in which they are used.

Data is what the computer stores, but information is what we seek to
manipulate when we use the computer. The key word in understanding the
difference between data and information is context. Data plus context gives
information. This is a fundamental aspect of all communication systems, but it
is most clearly present in the computer. The computer stores only numbers,
but those numbers can represent many things, depending on the context.

NUMERIC DATA
They can, of course, represent numbers with values, things like a bank balance,
or a score on a test, or somebody's weight. Even then, these numbers are not
without a context of their own. First, they have dimensions, the units with
which they are measured. We don't say only that my weight is 110; it
is 110 pounds. The number 110 all by itself doesn't mean anything; you have to
include the unit of measure to give it a context to make it meaningful.
Similarly, my bank balance of 27 makes no sense until I specify whether it is 27
dollars, 27 cents, 27 pesos, or whatever it is.

There is another context to consider when using the computer. It recognizes
only one kind of number: the 16-bit integer. This is a whole number ranging
from 0 to 65,535, with no fractions or decimal points. In other words, the
computer can count like so: 0, 1, 2, 3, 4, . . . 65,533, 65,534, 65,535. It
cannot recognize a number bigger than 65,535. When it reaches 65,535, the next
number is just 0; it starts all over again. Now, you might wonder what use
there is in a computer that can only recognize the first 65,536 numbers in the
whole universe. Well,
there's a trick that programmers learned long ago. You can combine little
numbers to make big numbers. Actually, we do it all the time. If you think
about it, you only know ten numbers yourself. Those ten numbers are 0, 1, 2,
3, 4, 5, 6, 7, 8, and 9. You think you know more? Look closely at the next
number, 10. It's nothing but a 1 followed by a 0. There's nothing new or
different about the number 10; it's just two old numbers stuck together!

Of course, you know perfectly well that what makes 10 different from 1 or 0 is
the manner in which you interpret it. The number 10 has a context of its own. We
think in terms of "the tens place" and "the ones place", and so we interpret 10
as "1 in the tens place plus 0 in the ones place." Using this system, we can build
any number we want. The only price we pay is that we have to write lots of
digits to express big numbers.

The programmer's trick is to do the same thing with the computer. If you stick
16-bit words together, you can get bigger numbers. With the computer, you pay
two prices to get bigger numbers: first, it takes one 16-bit word for each part
of the number that you add, and second, it takes more computer time to
manipulate such bigger numbers. There is also the restriction of context: you
have to remember that the numbers in such a compound number belong
together and must be taken as a group, rather than individually.
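For instance, here is a sketch of the standard high-word/low-word scheme (my illustration, not an example from this book): two 16-bit words stand for one big number if we agree that the first word counts units of 65,536, just as the tens place counts units of ten:

```
BIG NUMBER = HIGH WORD x 65,536 + LOW WORD

With HIGH WORD = 5 and LOW WORD = 1,000:
5 x 65,536 + 1,000 = 327,680 + 1,000 = 328,680
```

The context here is pure convention: nothing in the computer marks the two words as partners; the program must remember to treat them as a group.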

It is even possible to group these 16-bit numbers together in such a way as to
interpret the group as what is called a floating-point number. This has nothing
to do with water or boats; it is a number whose decimal point is free to move
around ("float", get it?) within the number. The idea sounds weird until you
see some examples:

Floating-point numbers        Integers

12.36835418                   127
17,893.35                     94,366
.00231                        90
-451.0                        -451

As you may have guessed, all floating point numbers have a decimal point. The
big question about any floating point number you have is, how many
significant figures does it have? Let me show you an example, using the value
of π.

3.14159265358979323
3.1415926536
3.1416
3.14
3

The first value gives π to 18 significant figures. The second value gives π to 11
significant figures. The third gives it to only 5; the fourth gives 3; and the fifth
gives only one. Each number is correct to within its number of significant
figures; it is rounded off from the previous one. A lot of people make the
mistake of assuming inappropriate zeros. For example, that last value of π, 3, is
it a 3 or a 3.0 or a 3.000 or a 3.0000000000? Many people think that 3 is the
same thing as 3.000000000, but it isn't. The next digit of π after the 3 should
be a 1, but we rounded it off when we went down to only one significant
figure. So, if you were trying to reconstruct the value of π after I gave you only
a 3, you would be wrong to put a 0 after the 3 to make it a 3.0. In other words,
3 is not the same as 3.0. If you want to say 3.0, say it; if I say 3, don't read it as
3.0, because it isn't. It could be 3.1, or 2.9, or anything between 2.5000000 and
3.4999999.

Significant figures matter because they show us the limitations of
computer arithmetic. Remember, each significant figure costs you some
RAM space and some execution time. For this reason, some computers use
only 4 bytes to save a floating-point number; others may use 8 or even more
bytes. A floating-point number expressed with 4 bytes has about 7 significant
figures; thus, you could express π this accurately with such a computer:

π = 3.141593

This is fairly accurate for most purposes. But now we come to a nasty trick that
trips up lots and lots of people. Suppose I divide 1 by 3. That should yield the
fraction 1/3, whose decimal value is .333333 . . ., with the 3's repeating
forever. Now, when I do this division on my computer equipped with 4-byte
floating point arithmetic, it will report the result as .3333333, with 7 significant
figures of 3's, but not an infinite number. The difference between the
computer's answer (.3333333) and the correct answer (.3333333. . . .) is small
(about one part in a million), but the fact remains that the computer is wrong.
Now, this discovery tends to upset some people. They think that computers
are always right, that they can make no mistakes, especially with arithmetic,
yet here is incontrovertible proof that the computer is wrong. This really
rattles their cage.

The problem is not that the computer is mistaken, or that it is stupid and
cannot perform arithmetic. The problem is that there is no mathematical way
to correctly express the value of 1/3 with a finite number of significant
figures. There isn't enough room to be accurate in so small a space. Suppose,
for example, that you had a brilliant plan to solve, say, the problem of the
American budget deficit. You had figured out a detailed plan that included all
the critical factors for eliminating the budget deficit without wiping out the
economy. I then gave you one piece of paper and a crayon and told you, "You
think you're so smart, put your plan on that paper with that crayon." You may
have the answer, but if you don't have enough room to say it, you come out looking
pretty stupid. The same thing goes with the computer: with anything less than
an infinite number of significant digits, the computer will sometimes be wrong
by a tiny amount.

This problem is so common that it has a name: round-off error. We call it that
because the computer rounds off numbers to make them fit into its floating-
point format, and in the process, it can round off some of the accuracy of the
number. In some cases, it can completely wipe out your number. For example,
suppose as part of your plan to solve the deficit, you had developed a
computer program to figure out how much money to allocate to each part of the
Federal budget. Let's say that you had even figured the amount of money to go
for buying file folders at the White House. Let's say that you figured $23.57 a
year would be a good figure. Now suppose you have a "bottom line" routine
that adds up all the expenditures of the budget to see what the grand total is.
Remember, we're talking hundreds of millions of dollars here. Let's say that the
grand total is about $300 million by the time the program gets around to
adding in your figure for file folders. Let's say the program statement looks like
this:

8230 TOTAL=TOTAL+WHFFOLDERS

Now, the computer will add the numbers like this:

  312,237,300.00
+          23.57

  312,237,300.00

If you count digits, you will see that the computer's seven significant digits are
used up on the high part of the number; the 2 in 23.57 is in the eighth
significant digit place, and so it is rounded off, right out of existence!
It's as if the $23.57 never existed. Your program would produce unreliable
results, and you would think that it had a very mysterious bug. In truth, this is
one of the natural limitations of the computer. The moral of this story is, if you
want the computer to use great big numbers next to little bitty numbers, you
need lots of significant digits, which will take more space and run more slowly.
Accuracy truly does have its price.
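You can watch this happen on your own machine with a three-line experiment. (This is a sketch; the exact result depends on how many bytes your BASIC uses for its floating-point numbers.)

```
10 TOTAL=312237300
20 TOTAL=TOTAL+23.57
30 PRINT TOTAL
```

On a machine with 4-byte floating point and its roughly seven significant figures, line 30 shows the total unchanged, as if the 23.57 had never been added; on a machine with 8-byte floating point, the small number survives.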

ALPHANUMERIC DATA
Numbers can mean more than just values. They can also be used to mean
alphanumeric characters. These are just letters and symbols like "a", "(", or
"%". The system for using them is very simple; it uses a code called ASCII
(pronounced "ass-key"), an acronym meaning "American Standard Code for
Information Interchange." This code assigns a number to every character.
Perhaps you used a code like that when you were a kid. A 1 stood for the letter
A, a 5 stood for the letter E, and so forth. This code is similar, but its purpose is
not to hide messages but to make them understandable to the computer,
which, after all, only understands numbers. Another difference is that the
letter A does not get a 1, but a 65, while B gets 66, C gets 67, and so forth.
Every letter and symbol gets its own number. The reason why A starts at 65 is a
bit of technical trivia with which I won't waste your time.

With this one code you can store text messages inside the computer. To use it,
you convert a character to a number using the ASCII code and store the
number in the computer. To read it out, just convert back. Lo and behold,
almost all versions of BASIC will do this automatically for you with a facility
called "string data". A string is a collection of numbers that are always treated
in the context of ASCII code conversion. You can always treat a string as a
collection of characters, even though it's really a collection of numbers. Using
strings from BASIC is very simple. Here's a simple example:

50 NAME$="CHRIS"
60 PRINT NAME$

There are only two syntax rules to note about this construction. First, a string is
always indicated by a "$" symbol at the end of the variable name. That tips off
the computer that you want this data treated as a string. Second, the string
data should be placed inside a pair of double quotation marks.

I cannot tell you much more about string handling because different
computers handle strings differently. Some allow you extensive facilities for
manipulating strings, allowing you to join strings, extract a portion of a string,
insert and delete sections of a string, and much more. Two fairly common
facilities, though, are the ASC function and the CHR$ function. These two
functions allow you to see the code conversion process. Try this little example
out on your computer:

80 PRINT ASC("C")
90 PRINT CHR$(67)

The first line will print the ASCII value of C, which should be 67. The second
line will print the character corresponding to 67, which is C. Thus, you can
take strings apart, find their numeric equivalents, and manipulate them with
arithmetic, although that is certainly the hard way to do it.
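Since characters are really numbers, you can even count with them. This little program (assuming your BASIC has the standard CHR$ function) prints the alphabet by running straight through the ASCII codes 65 to 90:

```
10 FOR X=65 TO 90
20 PRINT CHR$(X);
30 NEXT X
```

It prints ABCDEFGHIJKLMNOPQRSTUVWXYZ, because the codes for A through Z run consecutively from 65 to 90.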

BOOLEAN DATA
Another kind of data is Boolean data, named after George Boole, who founded
the mathematics of formal logic. Boolean data is very simple: it takes one of
only two values, true or false. Most BASIC languages store a zero to represent
a value of false, and something else to indicate a value of true. Quite often,
computer programs allow the user to set a particular choice, a choice that is
either taken or not taken. For example, a program might ask you if you want
some data sent out to the printer. You can answer yes (true) or no (false). The
program can then keep track of your answer as a variable called, say,
CHOOSEPRINTER. Then, whenever it is about to send something out, it might
have a statement like this:

1120 IF CHOOSEPRINTER THEN GOTO 2000

This statement would treat the value CHOOSEPRINTER the same way it would
treat a logical expression. If the result were true, it would GOTO 2000;
otherwise it would continue on. Thus, the Boolean variable is a good way to
keep track of such true/false conditions. Remember, though, that it really is a
number, just interpreted differently.
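Here is a sketch of how CHOOSEPRINTER might get its value in the first place. The line numbers and prompt are my own invention, and the -1 for true follows the convention of many BASICs:

```
1000 CHOOSEPRINTER=0
1010 PRINT "SEND RESULTS TO THE PRINTER (Y/N)";
1020 INPUT ANSWER$
1030 IF ANSWER$="Y" THEN CHOOSEPRINTER=-1
```

After this runs, the single number stored in CHOOSEPRINTER carries the user's yes-or-no answer for the rest of the program.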

INSTRUCTION DATA
The numbers in a computer can be interpreted in a completely different
manner. They can be treated as instructions to the computer. Even then, there
are two variations on this.

BASIC Instructions
Your BASIC program is stored in RAM as a set of instructions for the computer.
Each instruction has a code number, called a token, associated with it. For
example, the token for the command PRINT might be 27. If this were the case,
then the command PRINT "ABCD" would be stored in RAM as 27, 65, 66, 67,
68. The 27 stands for PRINT and the 65, 66, 67, and 68 are the ASCII codes for
"ABCD". To RUN a BASIC program, the computer would scan through RAM,
looking at each instruction code and translating it into action.

Native Code
The second form of computer instruction is what is called native code. These
are instructions that the computer itself recognizes as instructions to directly
execute. The difference between BASIC instructions and native code is that the
BASIC instructions are foreign to the computer. That is, the computer does not
really know what the BASIC instructions mean for it to do; after it reads a
BASIC instruction, it must look up the meaning of the instruction in a "book of
commands" called an interpreter. The interpreter allows the computer to
figure out what it is supposed to do. As you might imagine, a BASIC program is
slowed down quite a bit by having to go through this interpreter. What is
worse, the computer must interpret each instruction each and every time it
encounters the instruction, even if it has executed that instruction thousands
of times previously.

Native code is much faster than interpreted code. Native code is program
instructions that are couched in the natural language of the computer. This
language, called machine language, is built deep into the innards of the
computer and cannot be changed. It is the fundamental language that the
computer uses for all its work. A BASIC interpreter translates your BASIC
commands into the computer's machine language.

What, you might wonder, does machine language look or sound like? Perhaps
you imagine some weird language of beeps and buzzes. But no, machine
language is nothing more than numbers. For example, a 96 will tell some
computers to return from a subroutine; it is exactly the same as the RETURN
statement in BASIC. Other commands, however, are nothing at all like BASIC.
There is more information on machine language in the appendix.

PIXEL DATA
Data inside the computer can also be interpreted as pixel data. This is data to
be displayed on the screen. To understand how this is done, you must first
learn something about number systems. There are three commonly used
number systems to master: decimal, hexadecimal, and binary. Decimal is the
first. You already know about decimal; it is the number system that you
normally use.

Hexadecimal is the second system. It sounds like a number system that witches
might use to cast hexes, but actually, "hex" in this case means 6, and "deci"
means 10, so hexadecimal refers to a base-16 numbering system. That is, we
count by 16's in a hexadecimal system. The idea to master here is counting up
until we reach the top of the number system and starting over. In
decimal, we do it like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Now, cast aside your
natural familiarity with that 10 and look at it closely. What happened was this:
we reached 9, the last numeral in our possession. To go to the next higher
number, we started over again with 0, but we put a 1 in the 10's place. When
we reach 99, we add 1 to 9, which takes us over the top, so we go back to 0,
carry the one, which throws that 9 over the top to 0, so we carry the 1 again,
and end up with a 1 in the hundreds place. The rule is simple: when you reach
the highest number in the system and go up, replace it with a 0 and add 1 to
the next place. That place is a 1's place, or a 10's place, or a 100's place,
and so on in the decimal system.

In the hexadecimal system we count by 16's. The next six numerals after 9 are A,
B, C, D, E, and F. So we count like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F,
10. Now, be careful about that last 10. It is not the same as the 10 you are used
to seeing. It's really the number after F, and F is 15, so 10 is 16. Does that
confuse you?

As you might imagine, reading hexadecimal numbers can be quite confusing,
so programmers have one little trick to help out. Whenever a programmer
writes down a hexadecimal number, he puts a dollar sign in front of it so that
you'll know that it is special. Thus, $10 is hexadecimal 10, or 16, but 10 is
decimal 10, or just plain old everyday 10.

Doing anything in hexadecimal is enough to drive almost anybody nuts.
Arithmetic is really wild. Where else would 8+8=$10? Or try to figure this one
out: $30/2=$18. This stuff gets real hairy real fast. To help out, most
programmers use a little hexadecimal calculator that lets them figure these
things out quickly and easily.
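You can check those two puzzlers yourself by converting everything to decimal first:

```
8 + 8 = 16 decimal = 1 sixteen and 0 ones = $10

$30 = 3 x 16 = 48 decimal
48 / 2 = 24 decimal = 1 sixteen and 8 ones = $18
```

The arithmetic is perfectly ordinary; only the notation is strange.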

The third numbering system that programmers use is called binary. It is a very
simple numbering system, so simple that it confuses lots of people. In binary,
we only count up to 1 before starting over. Thus, while decimal has 10
numerals (0, 1, 2, 3, 4, 5, 6, 7, 8, and 9), and hexadecimal has 16 numerals,
binary has only two: 0 and 1. So in binary, we count like this:

Binary:  0, 1, 10, 11, 100, 101, 110, 111, 1000
Decimal: 0, 1,  2,  3,   4,   5,   6,   7,    8

This means that in decimal, 10 is 10, in hexadecimal, $10 is 16, and in binary,
10 is 2. Are you getting confused yet?
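The trick for reading a binary number is the same place-value idea, except that each place is worth twice the place to its right: 1, 2, 4, 8, 16, and so on. For example:

```
1101 binary = 1 eight + 1 four + 0 twos + 1 one
            = 8 + 4 + 0 + 1
            = 13 decimal
```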

Binary numbers get very long very quickly. For example, the number 999 in
binary is 1111100111. They are also very tedious to do arithmetic with. The
one saving grace of binary numbers is that they directly show the status of the
bits inside the computer. A bit is the fundamental unit of memory inside the
computer. We normally talk in terms of bytes, because the computer is
organized around bytes. But bytes are made up of bits; there are eight bits in
one byte. We normally don't worry about individual bits because one bit is too
small to do much with. I mean, what can you do with something that is either 0
or 1? Not much. About all you can do is pack eight of them together into a
byte, and then you've got a number between 0 and 255. But there is one
situation in which it is handy to worry about individual bits, and that is when
you are making a screen graphic. All computers draw images on the screen by
breaking the screen up into little tiny cells called pixels. The word pixel is a
contraction of "picture element". On a black and white display, a pixel is either
black or white. A blow-up of the letter "A" makes the point better than words:
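The original blow-up is a picture and is not reproduced in this text, but a rough stand-in, with # marking a black pixel and . marking a white one, looks like this:

```
. . . # # . . .
. . # . . # . .
. # . . . . # .
. # # # # # # .
. # . . . . # .
. # . . . . # .
. # . . . . # .
. . . . . . . .
```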

Those big black squares are the pixels that we use to draw the A on the screen.
Now, notice that a pixel is either black or white. There are only two states
possible for a pixel, no in between. Thus, a pixel's state can be represented by
a binary number, a 1 or a 0. We might say that a 0 means white and a 1 means
black. If so, then our letter A can be represented by binary numbers, one for
each row in the letter, like so:
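The original figure is a picture and is not reproduced in this text; a stand-in for an eight-pixel-wide letter A, with one binary number per row (and its decimal value alongside), might read:

```
00011000  =  24
00100100  =  36
01000010  =  66
01111110  = 126
01000010  =  66
01000010  =  66
01000010  =  66
00000000  =   0
```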
What we have here is something very exciting and very important: the ability
to express images as numbers. Now if we apply the powerful number-
crunching capabilities that the computer gives us, we can process the images
themselves, just by processing the numbers that represent the images. That's
how computer games are able to create those animated images. Behind every
twisting, grimacing alien, there's a microprocessor frantically shuttling
numbers around.

SUMMARY OF NUMBER TYPES
We have seen that a number can mean many different things. It can be your
plain old, everyday number, like Joe's bank balance or Fred's weight. It can also
be a character, like an "A" or a "%". It could also be a simple "true or false"
indicator. It could also be an instruction for the computer to execute. Or it
might be a part of an image. There are many other things that a number might
mean; it all depends on the context in which the number is taken.

How is it that one number could mean so many different things? Because we
can apply so many different contexts to that one number. This is nothing new;
we do it all the time with words. Consider the word "dig". My Webster's
Unabridged lists fourteen different definitions for the word. A simple, everyday
word like "dig" could be interpreted fourteen different ways. How could you
tell which of the fourteen interpretations applied? Only from the context. If
you were a foreigner first learning English, you might be angry at such a stupid
language that cannot keep its words straight. Yet, as a fluent speaker of the
language, you have no problem determining the exact shade of meaning of the
word from the context in which it is used. So too it is with computers. They
may use a number in many different ways, but the context is always clear. Thus
it is possible to breathe meaning into something as meaningless as a number.

DATA VERSUS PROCESS
Let us look more closely at this concept of context. Exactly how is context
established? As in so many things, the question bears the seeds of the answer.
The key word to examine is "established". Context is not some static entity
that lies on the page the way that data does. No, context must be established,
created, or forged. Context is intrinsically part of a process; it is established or
created by some activity. Here we encounter one of the most profound
concepts of computing: the complementarity of data versus process in the
computer.

Data are the numbers inside the computer; process is what the computer does
with them. Data are passive; process is active. An idea or a message, though, is
composed of both data and process, number and context. Both are necessary
to create an idea or message. Oddly enough, the ratio of data to process is not
fixed. Any message can be expressed with any combination of data and
process. A deliberately simple example may help make this point. Suppose that
I wish to convey to you scores of six students, and suppose that these scores
just happen to be 2, 4, 6, 8, 10, and 12. I could send you the information in a
data-intensive form:

2, 4, 6, 8, 10, 12

Or I could send the same information in a process-intensive form:

10 FOR X=1 TO 6
20 SCORE=2*X
30 NEXT X

Both messages convey the same information, but one relies primarily on data and
the other primarily on process.
Programmers are intensely aware of this process-data duality, and often use it
in polishing their programs. If a program is too large and must be made
smaller, translate data-intensive portions into more process-intensive forms. If
a program runs too slowly, translate process-intensive sections into more data-
intensive forms. This is because data consumes space while process consumes
time. A sufficiently clever programmer can obtain almost any desired trade-off
of space for time by striking the right balance of data and process.

But there is a point many programmers miss. Just because data and process
are to a large degree interchangeable does not mean that we should use them
without bias. If you regard the computer as a communications medium, then
when using a computer, you must always bear in mind the possibility of using
another medium to convey your message. Consider, for example, the printed
page, one of our most heavily used media. Here is a medium ideally suited for
conveying data and quite incapable of directly presenting process.
Nevertheless, we are able to use the printed page to convey a great deal of
information about the world. It is especially adept at presenting static data. If
you want to find the atomic weight of beryllium, the population of Sierra
Leone, or some other simple fact, a reference book is an ideal source to
consult. On a per-idea basis, there is no medium cheaper, more convenient,
and more effective.

But suppose we wish to convey information not about facts, but about events.
Now we are getting a little more demanding of the medium, and it does not
perform quite as satisfactorily. It manages, certainly, but somehow the
description of a complicated sequence of events can get a little muddled and
require perhaps a few re-readings before we can understand it.

Now let's go to the extreme of the spectrum and consider the ability of the
printed page to convey information about processes. We find that the medium
is certainly capable of doing so, but not very well. How many textbooks have
you dragged through, trying to divine the author's explanation of some simple
process, with little success? Look how much work I have had to go through to
explain to you the small ideas presented in this book. Because the printed page
is a data-intensive medium, it is strongest at presenting data and weakest at
communicating processes.

The computer, though, is the only medium we have that can readily handle
processes. That is because it is the only medium that is intrinsically interactive;
all other media are expository. Indeed, the computer might well be said to be
more process-intensive than data-intensive. The typical personal computer can
store 512,000 bytes of data, but the same computer can perform
approximately 300,000 operations per second. If you let it run for just four
hours, it can perform over 4 billion operations, even though it holds the same
measly 512,000 bytes. This is not a medium for storing data; it is a machine for
processing it.

It follows, therefore, that the ideal application for the computer will stress its
data-processing capabilities and minimize its data-storage capabilities. Indeed,
if you list the most successful programs for computers, you will see that the key
element in all of them has very little to do with data storage and very much to do with
data processing. Spreadsheets are a good example; so are word processing
programs. Both allow you to store lots of information, information that was
once stored with paper and pencil. But the real appeal of these programs is not
the way they allow you to store data but the way that they make it easy to
manipulate data. Even the most data-intensive application on computers, the
database manager, is really not a way to store data but a way to select data.

The moral of this chapter is that data is not information. Numbers without
context are useless, meaningless piles of digits. The jerk who tries to intimidate
you with lots of numbers is wasting your time unless he can orchestrate those
numbers into a coherent line of reasoning. Numbers are only the junior
partner in the partnership of information. The senior partner is context, which
is derived from the processing to which the numbers are subjected.
Concentrate your attention on the context behind the numbers, the reasoning
that gives them meaning. Be the master of your own numbers.

Chapter Eight

Conclusions

We have journeyed a long way together. What has been accomplished? The
central message that I have tried to show in this book is that the computer
concisely expresses many of the concepts central to civilization. Let's go over
them:

The first concept is the importance of clarity of expression (Chapter Two). With
the computer, you quickly learn to get your statements precisely correct or
suffer the syntax error. We see the same concept throughout many fields: law,
which has developed the precise interpretation of English to a high art;
science, which has developed its own language (mathematics) for precise
expression of its concepts; even advertising, which has learned how to twist
language masterfully to create impression without substance. Throughout our
civilization you can find people sweating over the precise expression of an
idea.

The science and art of decision-making (Chapter Four) is the second great
achievement of civilization. The formal study of reason, the enjoyment of
carefully constructed disputation, the willingness to analyze a problem
endlessly: these are hallmarks of our civilization. Perhaps nowhere
else is it better expressed than in the American political and legal system,
which expends vast efforts on achieving a correct decision-making process.
What other civilization would set free an admitted criminal solely because the
process used to establish guilt was flawed? Or consider the fact that the
American constitution sets no policies of the government whatever; it
concerns itself solely with the process by which political decisions will be
made. And the computer reduces the decision-making process to its absolute
essence, allowing us to clearly see the central components of decision-making.

Repetition (Chapter Five) is the next great triumph of civilization. Despite our
romantic attachment to the individualized, the hand-made, we owe our wealth
and comfort to the economies of scale associated with highly repetitious
production. With each passing decade, we have pushed the scale higher and
higher, building larger and more efficient tools that allow us to create more
and more wealth faster and faster. Sometimes our productivity has outrun our
wisdom, and we have created unanticipated problems of excess (smog,
traffic congestion, and technostress), but these are problems
that the peasant of two centuries ago would have dearly loved to suffer. The
essential nature of repetition, the economy of scale associated with repetitive
work, is perfectly captured in the loop of a computer program.

The fourth great achievement of civilization is the development of the
bureaucracy (Chapter Six) as a means of controlling so complex a phenomenon
as a civilization. We love to hate bureaucracies, but we all know that we simply
cannot live without them. And a computer program with subroutines is a
microcosm of a complex bureaucracy. It is created for the same reasons, faces
the same problems, and solves them in the same way.

The taming and harnessing of numbers is the fifth great hallmark of civilization
embodied in the computer (Chapter Seven). We have developed mathematics
and applied it to a huge variety of problems. Lord Kelvin caught the spirit of
it when he wrote, "To measure is to know." The trick was not in the
manipulation of numbers per se, but rather in the ability to relate numbers to
the real world. We translate real-world phenomena into the cyber-world of
numbers, make the numbers dance, and translate the results of their dance
back into real-world results. It matters not whether we are sawing lumber,
irrigating crops, or navigating a ship; the numbers obediently dance to our
tune. The computer is the perfect tool to choreograph and observe the waltz
of the numbers.

The parallels between the computer and the central structures of our
civilization are no accident; the computer is, after all, the child of the
civilization that created it. The only surprising thing is that it manages to
capture so much of the essence of our civilization. Perhaps this is because a
civilization is not a collection of artifacts or even people, but rather a logical
structure for controlling processes, and a computer is, at heart, the same thing.
A civilization is, of course, an immensely richer and more complex structure
than a computer, but the computer does seem to be a convincing homunculus
to civilization. To understand the logic of a computer, then, is to gain a glimpse
into the heart of our culture.

But we must not overrate our understanding, especially when we deal with the
computer. The clarity and precision of computer-style thinking gives a false
sense of certainty to the small-minded. That certainty has been the source of
more pig-headedness and nonsense than any anti-rationalist mysticism to
sweep our society. It's not that computers are small-minded, or that computer
programmers are small-minded, but rather that small-minded people who
learn the computer can do a lot of damage.

Consider the bureaucrat who inflexibly sticks to the rules even when it is to the
obvious detriment of the bureaucracy as a whole. Rules are rules, you know;
we can't go bending the rules just because it might make things go more
smoothly this time. Of course, we didn't need to invent computers to create
such people. But how much better armed they are to enforce their puny view
of the universe when they can say, "I'm sorry, the computer can't take it any
other way."

A variation on this is the number-happy manager. He's got his computer
printouts, loaded with a zillion numbers about every aspect of the company's
business. He's just got to slow down every discussion with endless quotations
of data. The best retort to this fellow is an acronym: GIGO. It means "Garbage
in, garbage out". The quality of the numbers that come out of a computer is
only as good as the quality of the numbers you put in. And most of the
numbers that go into such a program are garbage.

There is also the danger of taking the lessons of the computer too seriously.
The lessons of the computer impart a kind of intellectual power to their users,
and power always corrupts in proportion to weakness of character. Thus, we
see a corps of overconfident technophiles who bring too much certainty to all
aspects of their thinking. They know the answer to every question of politics,
sex, and religion. The black-and-white world of the computer does not admit
subtle shades of gray. The goal of this book is to teach you this style of
thinking, that you may apply it to real-world problems, but don't overdo it!

The greatest victim of the computer, though, is the high school kid who falls in
love with the computer. It starts out innocently enough. Johnny is curious
about this computer stuff and shows some aptitude for it. His parents give him
a computer to encourage him. As he plunges into it, he learns many of the
same lessons that this book has presented. But there is more. He learns power,
something he hasn't had before. The power to make things happen inside the
computer. His curiosity is always rewarded with discovery, and there is no
more addictive drug than learning, especially learning that comes so easily. But
most important, and most insidious, is the cheerful willingness of the computer
to be a companion, a friend with whom the kid can talk. The real world has
parents with unreasonable demands, with whom communication is difficult.
The real world has girls with whom all interaction is invariably embarrassing.
The real world is unresponsive; the real world treats him like a dumb kid. But
the computer responds to every communication in a fair and understandable
manner. The computer obeys his every command. Faced with the real world or
the computer in your bedroom, which would you choose?

And so our tragic hero renounces parents, school, and girls, and pledges
himself to the computer. He stays up late, working on his programs. His
parents and teachers, not seeing the trap, encourage him in this, the first
activity to which he has truly applied himself. Deeper and deeper he sinks,
learning subroutines and stacks when he should be making a fool of himself
with girls. Society toasts him as a "Whiz Kid", and he retreats further into his
soulless world, mastering every intricacy of the technology. He skips college,
forswearing beer busts and other crucial instructive foolishness for the
foolishness of a job earning more money than his father. For five, maybe ten
years, he is treated like a budding genius, catered to and pampered. Then
something goes wrong and he is discarded like an old sweater.

Numerous excuses are given for the failure of whiz kids at an early age. Some
call it burnout, the natural result of a too-intense workstyle. Some point out
that whiz kids don't get along well with co-workers. Others point to the need
for career development, something unavailable to a person without a college
degree. But whatever the symptom, the underlying reason for failure is that
whiz kids are not fully developed human beings. They are emotionally and
educationally stunted, unable to cope with anything other than the computer.
The day inevitably comes when the boss demands more than simple code-
hacking, and the whiz kid cannot satisfy the demand.

I have known many such whiz kids. Not one has beaten the curse. It is a tragic
waste of talent. It is the highest price we pay for the computer revolution, the
human pollution of our high-tech industry.

Even the computer has its dark side. This should come as no surprise; every
tool we make, from needle to A-bomb, has potential for positive or negative
uses. Those who bemoan the dehumanizing influence of the computer have
forgotten their heritage. The computer does not introduce any new
dehumanizing elements into our society. It is the latest and most refined
expression of forces that have been at work in our civilization for hundreds of
years. These forces were not foisted on us by some malicious demon;
they are the expression of the desires and efforts of millions of
people over scores of generations. The computer is a single point in the
cannonball trajectory of civilization, connected to all other points and existing
because of them.

We must not blame our tools, nor deify them. We must learn to use them
wisely. And here I must stop, for wisdom is outside the scope of this book.

Appendix

How Computers Work

Goal of this Appendix

My goal in this appendix is to explain in simple terms exactly how it is that a
collection of silicon can do all the wonderful things that computers can do. This
is certainly a tall order, but not an impossible one. If you have followed me this
far, you should have no problems understanding this appendix. However, while
the concepts themselves are simple, they do tend to pile one on top of one
another in a rather intimidating heap. If you pay attention and bear with me,
we'll pick this heap apart. I assure you, the result (the knowledge
that even the mighty computer is within your mental reach) will be
worth the mental exertion.

My fundamental strategy in this appendix will be a process of agglomeration.
I'll start with the simplest building blocks and use them to assemble bigger
building blocks. These bigger blocks will then form the basis for the next, even
bigger group of building blocks. At each stage of the game, once you
understand how the building block is put together, you forget the internal
details and just treat it as a black box whose properties you have already
figured out. It's kind of like going from atoms to molecules to cells to people to
societies. Leaping from atoms to societies is a mind-boggling endeavor, but if
you take it in steps, it really can make sense. Let's begin.

Level One: Transistors

The atoms of a computer are transistors. A typical personal computer will have
millions of these little devices inside its chips. What they do is very simple, but
how they do it involves a great deal of tricky physics. A transistor controls the
flow of electricity by controlling electrons. The big trick is its ability to control
the flow of many electrons with just a few electrons. Crudely speaking, it uses
a few moving electrons to stampede many more electrons. The result is like a
switch that is controlled by electricity instead of a finger. You can use the
presence or absence of electricity in one wire to turn on or turn off electricity
in another wire.

Three special properties of the transistor make it possible to build computers
out of transistors. The first is that we can run it either all the way on or all the
way off, and treat those two states as numbers. If the transistor is on, we say
that it means "1"; if it is off, we say that it means "0". All the 1s and 0s inside a
computer are just represented by transistors that are turned on or off. We can
extend the idea to the wires inside a computer. If a wire has electricity on it
(meaning that a transistor connected to it has turned on), then the wire
represents a "1"; if the wire has no electricity, it represents a "0". It's not as if
there are tiny little 1s and 0s printed on the wires; it's a code. Electricity means
1, no electricity means 0. This gives us the ability to manipulate numbers (i.e.,
to compute) by manipulating electricity.

The second special property of the transistor that makes computers possible
involves a special type of transistor that has not one but two independent
controlling wires. Earlier, I said that a transistor is like a switch that is
controlled by electricity instead of a finger. Well, it is possible to make a
transistor that uses two finger-wires instead of just one. It's rather like the
double light switches in long hallways in some houses; either one will
turn on the light.

The third special property of transistors is their ability to invert the relationship
between input and output. Normally, we think of the transistor as creating a
direct relationship between the input wire and the output wire: if there is
electricity on the input wire, it turns on the output wire, and if there is no
electricity on the input wire, then it turns off the output wire. It is also possible
to make transistors reverse this relationship, so that electricity on the input
wire will yield no electricity on the output wire, and no electricity on the input
wire will yield electricity on the output wire.

Let's summarize what we've got on transistors with some simple diagrams:

Level Two: Gates (Assemblies of Transistors)

We can use transistors as the building blocks for the next level in our hierarchy:
the gate. A gate is an electronic circuit that accepts one or more inputs and
produces a single output that is a logical function of the inputs. In simple
terms, you plug wires into it and it has one wire coming out of it. Whether or
not the output wire will have electricity on it depends on the electricity on the
input wires and the rule that the gate uses. The first type of gate is called an
"AND" gate. This gate is drawn as follows:

The rule for the AND gate is simple: if the first input is a 1 AND the second
input is also a 1, then the output will be a 1. Otherwise, it will be a 0. This is
just like the Boolean AND function you use in programming; in fact, this is the
electronic circuit that the computer uses to make that program work.

The second type of gate is called the "OR" gate, and the diagram we use for it
looks like this:

The rule for the OR gate is also simple: if the first input is a 1 OR the second
input is a 1, then the output will be a 1. Otherwise, it will be a 0. Again, this is
just like the OR Boolean function you met in programming and is the hardware
source of the software capability.

The third simple gate is hardly a gate at all: it is a simple inverter, and it is
diagrammed like so:

The inverter simply takes the input and inverts it, so that a 1 becomes a 0 and
vice versa. This little guy is handy for all sorts of jobs.
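
If you would like to play with these rules yourself, here is a small sketch of
the three gates in Python. The function names are my own, chosen for this
illustration; they imitate the gates in software, they are not the gates
themselves:

```python
def AND(a, b):
    # output is 1 only when both inputs are 1
    return 1 if (a == 1 and b == 1) else 0

def OR(a, b):
    # output is 1 when either input is 1
    return 1 if (a == 1 or b == 1) else 0

def NOT(a):
    # the inverter: a 1 becomes a 0 and vice versa
    return 1 - a

# print the full truth table for the two-input gates
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```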

Level Three: Decoders, Latches, Adders (Assemblies of Gates)

We can use the gates we have just built to create even more elaborate devices.
The first new assembly is called a decoder. Its purpose is to convert a numeric
reference into the selection of one specific item. Instead of talking about "the
third one", we can use a decoder to point at "that one". Suppose, for
example, that we have a wire. It can carry a 1 or a 0. Thus, one wire can
represent one of two choices. We could, for example, use it to select one of
two light switches that we might want to turn on. In other words, if the wire
has a 0 on it, then we want to turn on light #0, and if it has a 1 on it, then we
want to turn on light #1. However, there's a problem: the wire by itself can't
turn on the right switch. If we hook it up directly to a light, it will turn the light
on when we have a 1 on it, and turn it off when we have a 0 on it. We need a
decoder. A decoder for this job might look like this:

If you put a 1 into this decoder, Output #1 will have a 1 on it and Output #0
will have a 0. If you put a 0 into this decoder, then Output #0 will get the 1 and
Output #1 will get the 0. In short, a decoder translates "Gimme number x" into
"Gimme that one".

The next doodad we will build is called a latch. We build this one from NAND
gates. "NAND" means "Not AND"; it is just an AND gate whose output is
inverted. In other words, we take a plain old AND gate and stick an inverter on
its output. We indicate this by putting a little circle on the end of the AND gate;
that makes it a NAND gate. The rule for a NAND gate is as follows: if Input #0 is
a 1, AND Input #1 is a 1, then the output will be a 0; otherwise, the output will
be a 1. So, here is a latch:

(Here's another fine point: when wires cross, they are not considered to be
connected unless there is a dot marking the spot. Thus, in this diagram, there is
no connection between the diagonally crossing wires in the center. There are
four connections marked by four black dots.)

This one may look a little intimidating, but what it does is simple and
important: it remembers. Suppose that you start off with both inputs
("Remember 0" and "Remember 1") set to 0. If you then make the "Remember
0" wire equal to 1, then the output wire will be 0. If you make the "Remember
1" wire 1, then the output wire will be 1. More important, after you stop
making one of those wires 1, and revert to the normal state in which both
wires are 0, then the output will STILL reflect the state that you put it into
earlier. It remembers!
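
The remembering trick can be imitated in software. The sketch below follows
the common textbook arrangement of cross-coupled NAND gates with inputs that
idle at 1 and are pulsed to 0 (conventions vary between diagrams, so treat the
wiring here as one illustrative possibility, not a copy of the figure):

```python
def nand(a, b):
    # a NAND gate: an AND gate with an inverter stuck on its output
    return 0 if (a == 1 and b == 1) else 1

def settle(set_n, reset_n, q, qbar):
    # feed the two cross-coupled NAND gates until their outputs stop changing
    for _ in range(4):
        q, qbar = nand(set_n, qbar), nand(reset_n, q)
    return q, qbar

q, qbar = 0, 1                      # start out remembering a 0
q, qbar = settle(0, 1, q, qbar)     # pulse the "remember 1" input
q, qbar = settle(1, 1, q, qbar)     # let both inputs go back to idle
print(q)                            # 1 -- the latch STILL remembers
```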

The last device we will assemble is called an adder. It is not a snake, but a
circuit that will add two numbers together. In this case, we are going to keep it
real simple: we are going to add two single-bit numbers. In other words, this
circuit will be able to calculate just four possible additions:

0+0=0
0+1=1
1+0=1
1+1=10
That last addition may throw you; since when did one plus one equal ten?
Remember, we are working in binary numbers here, and binary 10 is just
decimal 2, so the equation really does work. Here's what the adder looks like:
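
The diagram shows the wiring; the same behavior can also be sketched in
Python (a software imitation, not the actual circuit). The sum bit turns out to
be an exclusive-OR of the inputs, and the carry bit a plain AND:

```python
def one_bit_adder(a, b):
    # carry bit: 1 only when both inputs are 1 (AND)
    # sum bit:   1 when exactly one input is 1 (exclusive OR)
    return a & b, a ^ b

# reproduce the four possible additions
for a in (0, 1):
    for b in (0, 1):
        carry, total = one_bit_adder(a, b)
        print(f"{a}+{b} = {carry}{total}")
```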

We now have devices that can select, remember, and add. Time for the next
step.

Level Four: Breadth

We are now going to expand the circuits we have to make them more practical
with real numbers. The above circuits are all single-bit circuits; each one can
handle only a single bit of information. The decoder can decode just one wire,
selecting one of only two possible options. The latch can remember only one
bit, a single 1 or 0. And the adder can only add two single-bit numbers. These
devices are almost useless. Who wants to add 1 to 0 all day long? Who needs
to remember just a single 1 or 0?

The big trick we are going to pull is embarrassingly simple. We are going to gang
each of these devices up in parallel with a bunch of its brothers. Lo and behold,
they will suddenly be useful! Let's start with the latch, for it's the easiest one to
understand. Here's a diagram:

This is nothing more than eight separate latches sitting side by side.

This little guy may not look like much, but you have known and used him many
times. May I introduce you to one byte of RAM. Eight bits side by side. You can
store an eight-bit number in here. That's enough to denote one character of
text, or an integer between 0 and 255. As you can see, one little latch may not
do much, but when it gangs up with seven siblings, it suddenly becomes a
worthwhile bit of silicon.

Now let's turn to the decoder. The example I gave earlier was a one-bit
decoder. Here is a two-bit decoder:

The single-bit decoder looks at one wire to select one of two possible options.
The two-bit decoder looks at two wires to select one of four possible options. If
you have two wires carrying 1s or 0s, then there are four possible
combinations of the 1s and 0s: 00, 01, 10, and 11. If you can read binary, you
will recognize these numbers as just decimal 0, 1, 2, and 3. Three wires would
give eight combinations; eight wires would give 256 combinations. By making
our decoder a little bit bigger, we make it a lot more powerful. What do you
use a decoder for? I'll get to that in the next section.
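
Here is a sketch of the general decoder in Python (my own illustration, not any
particular chip): some number of input wires come in, and exactly one of the
corresponding output wires carries a 1.

```python
def decode(bits):
    # interpret the input wires as a binary number...
    index = int("".join(str(b) for b in bits), 2)
    # ...and raise exactly one of the 2**n output wires
    return [1 if i == index else 0 for i in range(2 ** len(bits))]

print(decode([1, 0]))   # [0, 0, 1, 0] -- binary 10 selects output #2
```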

For our next breadth trick, we'll broaden out an AND-gate:

This device allows us to take two eight-bit numbers, AND them together, and
read the result at the output of the gates. You may wonder, of course, why
anybody would want to AND two numbers together. I have to admit, it's not
one of the most useful stunts in the world, but it is occasionally useful to a
programmer, and it makes an excellent introduction to the next broadening trick.

Now we are going to broaden the one-bit adder. This is a tricky operation,
because the individual bits in an addition are not independent the way the
latch bits in a byte are independent. Recall the problem of the one-bit addition.
What do we do with the extra digit when we add 1+1? All the other additions
yield but a single bit of output, but this one needs an extra bit, and every bit
needs one wire, so we will need two output wires to represent the output.
That second wire will be treated differently. We will call it the "Carry" wire. Do
you remember your old addition exercises in grammar school? Do you
remember chanting little songs like, "5 plus 8 is 13; put down the 3 and carry
up the 1"? The rule with addition is that when you get a result of ten or more,
you write down the last digit and carry the one. It's the same way with
binary addition; when you add 1+1, you get 10; write down the 0 and carry up
the one to the next place (binary rather than decimal, in this case).

All this nonsense with carry bits is important because it allows us to expand
our adder to a useful size. If we now throw in a circuit that will also add in an
input carry bit, a carry bit that would come from a lower stage in the addition,
then we can build adders as wide as we want. Consider this example: we want
to add two eight-bit binary numbers:

10110011
+01010101

If we break this addition up by digits, adding just one column at a time, then
we can use a one-bit adder for each column. We start at the right side of the
number, just like you do in regular addition. If the one-bit adder ends up
generating a carry, it passes the carry on to the next higher one-bit adder,
which adds it into its work. We will need better adders that can use the carry
bit that is passed to them, but that is simply a matter of a few more gates.
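
Here is that scheme sketched in Python: a one-bit adder that also accepts a
carry, chained column by column from right to left. The function names are
mine, invented for the illustration:

```python
def full_adder(a, b, carry_in):
    # add two bits plus the carry passed over from the column to the right
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return carry_out, total

def ripple_add(xs, ys):
    # work one column at a time, starting at the right side, passing the carry left
    carry, out = 0, []
    for a, b in zip(reversed(xs), reversed(ys)):
        carry, s = full_adder(a, b, carry)
        out.append(s)
    return [carry] + out[::-1]

# 10110011 + 01010101 (decimal 179 + 85 = 264)
print(ripple_add([1,0,1,1,0,0,1,1], [0,1,0,1,0,1,0,1]))  # [1, 0, 0, 0, 0, 1, 0, 0, 0]
```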

Level Five: Assembling RAM

Now we are ready to construct our first major computer component: the
computer's RAM. In the previous section, I showed you a single byte of RAM.
You probably know, though, that a typical microcomputer has 64K bytes of
RAM, that is, 65,536 bytes total. (Let's keep this simple by ignoring
the current generation of 16-bit and 32-bit computers. I'll talk only about the
simpler 8-bit computers that were the rage in the early 1980s. They are
obsolete now, but the bigger computers that people use nowadays work on
the same principles.) It takes no great leap of imagination to go from one byte
to 65,536. There remain, however, a few practical problems associated with
actually using all that RAM. If each byte of RAM has eight wires going into it
and eight wires coming out of it, the computer will need over one million little
tiny wires inside just to hook up all those bytes. That's ridiculous! There's gotta
be a better way.

There is, but it's a little harder to understand. It's called a bus. A bus is a bundle
of wires that everyone shares. Every computer has three buses: the address
bus, the data bus, and the control bus.

The address bus is the easiest to understand. How does the computer select
which of those 65,536 bytes it wants to use? It surely can't have 65,536 little
wires coming out of the main chip, with one wire going to each byte of RAM.
The solution is to use an address bus. This is a group of 16 wires coming out of
the main chip. Each byte of RAM has its own unique number, called an
address, that the computer can use to refer to it. The first byte is called byte 0,
the next one is byte 1, then byte 2, 3, and so forth until we reach the last byte,
byte number 65,535. Sixteen wires allow us to specify any of these addresses.
For example, byte number 0 would have the code 0000000000000000 on the
address bus, while byte number 65,535 would have the binary code
1111111111111111 on the address bus. Byte number 37,353 would have the
binary code 1001000111101001 on the address bus. Thus, a computer can
specify any of its 65,536 bytes with only a sixteen-wire address bus.
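
You can verify those address codes with a few lines of Python:

```python
addr = 37_353
pattern = format(addr, "016b")   # the sixteen wires of the address bus
print(pattern)                   # 1001000111101001
print(int(pattern, 2))           # 37353 -- and back again
```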

You may wonder, though, how the RAM can actually figure out which byte is
the proper one. When the main chip tells the RAM, "I need to know what
number was stored in byte number 37,353", how does the RAM decode that
number to select the right byte? The answer comes from that decoder circuit
that we worked up earlier. You build a big decoder into each RAM chip that can
decode the address bus and figure out exactly which byte is being accessed.
That way, all the 65,536 little wires that go directly to the bytes to say, "You're
the one he wants," are built directly into the silicon
chip, saving everybody a lot of soldering.

To summarize address buses: an address bus is a group of 16 wires that run
from the main processor to the RAM. It allows the computer to specify exactly
which byte it wants out of RAM. The RAM chips have eight latches for each
byte they store, and a huge decoder that decodes the address bus and selects
the appropriate byte.

The next question is, how do we get data into and out of the bytes? An address
bus lets the computer point to a particular byte and say, "I'm talking to YOU,
silicon-head!" But how does the computer get data into or out of the byte?

The answer is, with the data bus. A data bus is a group of wires that run from
the computer to the RAM. It is eight bits wide, and so allows the computer to
transfer data a byte at a time. The data bus goes to every single byte in the
computer's RAM. Now, that raises a problem: who talks on the data bus? Let's
say, for example, that byte number 37 has the value 11111111 stored inside,
while byte number 38 has a 00000000 stored inside it. If both bytes are putting
their values onto the data bus, then what does it have: 1s or 0s? Who wins
when everybody talks at once?

The answer to this question involves two tricks: tri-state logic and a control
bus. Tri-state logic is a variation on normal chip design that allows a chip to
have not just two states (0 or 1), but three: 0, 1, or disconnected. A regular
chip is always sending out his value on his wire. Imagine him shouting into a
pipe: "I'm a 0, a 0, a big fat 0; I'm still a 0, 0, 0; whoops, now I'm a 1, a 1, a 1;
I'm still a 1; yes, I'm a big, tall 1, yes I am, a 1, a 1. . . " A tri-state RAM chip can
be a 0, a 1, or he can shut up. Even handier, you can shut him up with an
electrical signal. Thus, each byte of computer RAM stays shut up until the
computer authorizes it to talk through the address bus. Thus, a data bus is like
a huge telephone party line with 65,536 people on it, but they won't talk until
the master operator tells them to talk.

The second trick to making a data bus work requires the use of a control bus.
After all, RAM is supposed to work two ways: either the computer wants to
save a number into RAM, or it wants to recall a number out of RAM. We also
call this writing and reading. So we run a few more wires from the computer to
the RAM called a control bus. The first wire in the control bus tells the RAM
whether it's being accessed for a write or a read operation. You might say that
this wire is a command to either talk or listen. The other common wire on a
control bus is called the clock signal. This wire goes on and off, on and off, in a
regular cycle. It keeps everybody synchronized. Have you ever watched two
jugglers tossing pins or knives back and forth? They count off to each other,
"One, two, three, four; one, two, three, four. . ." This allows them to
synchronize their actions. The clock wire in the control bus serves the same
purpose, except with bytes rather than knives.

We have now built an imaginary RAM module that can store 65,536 bytes of
information. The computer can read and write data into it, and the whole thing
is practical to build and operate.

Level Six: The Central Processing Unit (CPU)

Our next creation will be the heart of the computer: the CPU. This is the unit
that actually crunches the numbers, calls the subroutines, and loops the loops.
In this highly simplified and imaginary computer, we will have but four parts:
the ALU, the instruction decoder, the registers, and the address controller.

We shall take up the ALU first. ALU is an acronym for "Arithmetic and Logic
Unit". The ALU is rather like one of those all-purpose handy-dandy kitchen
utensils that slices, dices, and shreds, only it does its work on bytes rather than
morsels. Instead of a collection of blades in various shapes and sizes, the ALU
contains four fundamental byte-munchers: an AND-gate set, an OR-gate set,
an EOR-gate set, and an adder. Each of these is eight bits wide, taking two
bytes as input and producing a single byte as output. You ship two bytes to the
ALU, tell it which handy-dandy blade you want it to use, and you read the byte
of output. It activates the proper blade with a decoder-selector circuit and
something like tri-state logic in much the same way that our RAM module
selects the correct byte and tells everyone else to shut up.

The registers are another simple part of the CPU. These are simply bytes of
RAM inside the CPU that can be used as quick storage of intermediate results.
The relationship between register-RAM and regular RAM is rather like the
relationship between a desktop and a desk drawer. You keep your papers in
the desk drawer most of the time, but when you are actually working with a
particular document, you bring it to your desk. The desk can't hold many
documents, but they are much easier to work with on your desk than in your
desk drawer. Almost all programs follow a very simple strategy: bring some
bytes out of RAM into the registers; crunch them up; spit out the results back
into RAM.
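
That strategy, bring into registers, crunch, store back, can be sketched in
Python. The addresses and values here are invented for the illustration; the
point is the shape of the traffic, not any real program:

```python
ram = [0] * 65536          # pretend RAM: 65,536 bytes
ram[100], ram[101] = 7, 5  # two bytes of data waiting in RAM

reg_a = ram[100]           # bring some bytes out of RAM into the registers
reg_b = ram[101]
reg_a = (reg_a + reg_b) & 0xFF   # crunch them up (kept to one byte, as an ALU would)
ram[102] = reg_a           # spit the result back into RAM

print(ram[102])            # 12
```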

These first two parts of the CPU (the ALU and the registers) are devoted to
handling data. The next two parts of the CPU are trickier because they handle
the more complex task of making the program go. The first of these is the
address controller; its task is to get program instructions out of the RAM and
into the CPU. You will recall from Chapter Seven that bytes in the RAM can
represent many different things. From the point of view of the CPU, those
bytes in RAM fall into two broad categories: data to be processed, or
instructions that tell how to process the data. The address controller fetches
those instructions out of the RAM in the proper sequence and delivers them to
the CPU. Thus, most programs follow an alternating sequence of "fetch an
instruction, fetch a byte of data". The address controller also handles the
manipulations required for branching and looping. These are very simple
manipulations. Each program instruction sits at an address in RAM; successive
instructions are placed in successive RAM locations so that the address
controller doesn't have to think too hard to figure out where to get the next
instruction. However, if the program needs to branch or loop, the address
controller merely loads in the address of the new instruction and proceeds
from the new location. It's that simple.
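A toy fetch loop shows just how simple this is. The program counter normally steps forward one address at a time; a jump merely loads a new address into it. The instruction names here are invented for illustration, not taken from any real processor.

```python
program = [
    ("PRINT", "A"),   # address 0
    ("PRINT", "B"),   # address 1
    ("JUMP", 4),      # address 2: branch over the next instruction
    ("PRINT", "C"),   # address 3 (never reached)
    ("HALT", None),   # address 4
]

output = []
pc = 0                               # program counter: address of the next instruction
while True:
    opcode, operand = program[pc]    # fetch the instruction at the current address
    pc += 1                          # normally, just step to the next address
    if opcode == "PRINT":
        output.append(operand)
    elif opcode == "JUMP":
        pc = operand                 # branching: merely load the new address
    elif opcode == "HALT":
        break
```

Run this and the output contains "A" and "B" but never "C": the jump at address 2 sent the program counter straight to the halt at address 4.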

Now we are ready to tackle the inner sanctum of the CPU: the instruction
decoder. This is the module that actually translates instructions into action. It
is the heart and soul of the entire computer, the most necessary of all the
necessary components, the essence of the computer. It is based on nothing
more than the simple decoder, with a number of intricacies added. There are
two broad types of information conveyed in a typical microcomputer
instruction: what to do and who to do it to.

The "what to do" part is easy to understand. This boils down to just two basic
types of commands: move information or crunch information. Moving
information is just a matter of taking a byte from one place and storing a copy
someplace else. Depending on the complexity of the processor, you might
have any number of combinations here. You could move a byte from one
register to another register, from a register to RAM, or from RAM to a register.
The crunching part just tells the CPU to use the ALU to crunch two bytes.

All this is based on a fancy decoder circuit. The bits of the instruction go to the
decoder. If they indicate a crunching operation, the decoder activates the ALU,
sending it the proper code to activate the desired circuitry inside the ALU. If
the instruction indicates a move operation, then the decoder will select the
proper registers or RAM locations and send the proper signal to the
Read/Write line on the control bus to indicate the direction of the move
operation.
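As a sketch of that fancy decoder, imagine one bit of the instruction byte choosing between "crunch" and "move", with the remaining bits selecting the blade or the direction. The bit layout below is entirely invented for illustration; real processors assign their bits differently.

```python
def decode(instruction_byte):
    """Translate one instruction byte into an action for the CPU."""
    if instruction_byte & 0b10000000:              # top bit set: a crunching operation
        blade = instruction_byte & 0b00000011      # low two bits pick the ALU blade
        return ("CRUNCH", ["AND", "OR", "EOR", "ADD"][blade])
    else:                                          # top bit clear: a move operation
        direction = "write" if instruction_byte & 0b00000001 else "read"
        return ("MOVE", direction)                 # direction drives the Read/Write line
```

The point is not the particular bits but the principle: the instruction's bits flow into a decoder, and the decoder's outputs switch on exactly one piece of machinery.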

The next part of the instruction specifies the object of the command; normally
this is a register or RAM location. For example, if we have a move instruction,
we must specify the source and destination register or RAM location. With a
typical home computer, this is done with additional bytes that follow the main
instruction byte. Thus, the command would consist of three bytes. The first
byte says "Load this register with the byte in the RAM location whose address
follows". The second and third bytes give the 16-bit address of the RAM
location. The instruction decoder, when it receives the first byte, opens up a
digital pathway that will send the next two bytes straight into the address
controller; then it activates the address controller, which in turn fetches the
byte from the address it holds.
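The three-byte sequence can be sketched like so. The opcode value and the byte order of the address are made up for illustration; different processors make different choices here.

```python
LOAD_FROM_RAM = 0xA5     # an invented opcode: "load the register from RAM"

ram = [0] * 65536        # pretend memory: 65,536 bytes
ram[0x1234] = 99         # the byte we want to load

# Three bytes: the opcode, then the high and low bytes of the 16-bit address.
instruction = [LOAD_FROM_RAM, 0x12, 0x34]

register = None
if instruction[0] == LOAD_FROM_RAM:
    address = instruction[1] * 256 + instruction[2]  # reassemble the 16-bit address
    register = ram[address]                          # fetch the byte it points to
```

The first byte tells the decoder what is coming; the next two are routed to the address controller, which uses them to pluck the right byte out of RAM.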

In essence, then, the instruction decoder translates instructions into action by
using decoders that activate different sections of the CPU. Some decoders
might open up pathways in the CPU to ship bytes around to different locations;
some might activate the ALU; some might activate multi-step sequences. In a
real computer, it might be very complex, but it is certainly not magic.

And that is how computers work.