11/12/2010

C language | Sai
C PROGRAMMING

This book is by Steve Summit // Copyright 1995, 1996 // mail


feedback

C deals with the same sort of objects that most computers do, namely characters,
numbers, and addresses.

C is sometimes referred to as a ``high-level assembly language.'' Some people
think that's an insult, but it's actually a deliberate and significant aspect of the
language. If you have programmed in assembly language, you'll probably find C
very natural and comfortable (although if you continue to focus too heavily on
machine-level details, you'll probably end up with unnecessarily non-portable
programs). If you haven't programmed in assembly language, you may be
frustrated by C's lack of certain higher-level features. In either case, you should
understand why C was designed this way: so that seemingly-simple constructions
expressed in C would not expand to arbitrarily expensive (in time or space)
machine language constructions when compiled. If you write a C program simply
and succinctly, it is likely to result in a succinct, efficient machine language
executable. If you find that the executable resulting from a C program is not
efficient, it's probably because of something silly you did, not because of
something the compiler did behind your back which you have no control over. In
any case, there's no point in complaining about C's low-level flavor: C is what it is.

Next we see a more detailed list of the things that are not ``part of C.'' It's good to
understand exactly what we mean by this. When we say that the C language proper
does not do things like memory allocation or I/O, or even string manipulation, we
obviously do not mean that there is no way to do these things in C. In fact, the
usual functions for doing these things are specified by the ANSI C Standard with
as much rigor as is the core language itself.

The fact that things like memory allocation and I/O are done through function calls
has three implications:

1. the function calls to do memory allocation, I/O, etc. are no different from any
other function calls;

2. the functions which do memory allocation, I/O, etc. do not know any more about
the data they're acting on than ordinary functions do (we'll have more to say about
this later); and

3. if you have specialized needs, you can do nonstandard memory allocation or I/O
whenever you wish, by using your own functions and ignoring the standard ones
provided.

The sentence that says ``Most C implementations have included a reasonably
standard collection of such functions'' is historical; today, all implementations
conforming to the ANSI C Standard have a very standard collection.

page 3

Deep sentence:

...C retains the basic philosophy that programmers know what they are doing; it
only requires that they state their intentions explicitly.

This aspect of C is very widely criticized; it is also used (justifiably) to argue that
C is not a good teaching language. C aficionados love this aspect of C because it
means that C does not try to protect them from themselves: when they know what
they're doing, even if it's risky or obscure, they can do it. Students of C hate this
aspect of C because it often seems as if the language is some kind of a conspiracy
specifically designed to lead them into booby traps and ``gotcha!''s.

This is another aspect of the language which it's fairly pointless to complain about.
If you take care and pay attention, you can avoid many of the pitfalls. These notes
will point out many of the obvious (and not so obvious) trouble spots.

page 4

The last sentence of the Introduction is misleading: as we'll see, it's risky to defer
to any particular compiler as a ``final authority on the language.'' A compiler is
only a final authority on the language it accepts, and the language that a particular
compiler accepts is not necessarily exactly C, no matter what the name of the
compiler suggests. Most compilers accept extensions which are not part of
standard C and which are not supported by other compilers; some compilers are
deficient and fail to accept certain constructs which are in standard C. From time to
time, you may have questions about what is truly standard and which neither you
nor anyone you've talked to is able to answer. If you don't have a copy of the
standard (or if you do, but you discover that the standardese in which it's written is
impenetrable), you may have to temporarily accept the jurisdiction of your
particular compiler, in order to get some program working today and under that
particular compiler, but you'd do well to mark the code in question as suspect and
the question in your head as ``don't know; still unanswered.''

I completely agree with the authors that writing real programs, and soon, is the best
way to learn programming. This way, concepts which would otherwise seem
abstract make sense, and the positive feedback you get from getting even a small
program to work gives you a great incentive to improve it or write the next one.

Diving in with ``real'' programs right away has another advantage, if only
pragmatic: if you're using a conventional compiler, you can't run a fragment of a
program and see what it does; nothing will run until you have a complete (if tiny or
trivial) program. You can't learn everything you'd need to write a complete
program all at once, so you'll have to take some things ``on faith'' and parrot them
in your first programs before you begin to understand them. (You can't learn to
program just one expression or statement at a time any more than you can learn to
speak a foreign language one word at a time. If all you know is a handful of words,
you can't actually say anything: you also need to know something about the
language's word order and grammar and sentence structure and declension of
articles and verbs.)

The authors list a few drawbacks of this ``dive in and program'' approach, and I
must add one more. It's a small step from learning-by-doing to learning-by-trial-
and-error, and when you learn programming by trial-and-error, you can very easily
learn many errors. When you're not sure whether something will work, or you're
not even sure what you could use that might work, and you try something, and it
does work, you do not have any guarantee that what you tried worked for the right
reason. You might just have ``learned'' something that works only by accident or
only on your compiler, and it may be very hard to un-learn it later, when it stops
working. (Also, if what you tried didn't work, it may have been due to a bug in the
compiler, such that it should have worked.)

Therefore, whenever you're not sure of something, be very careful before you go
off and try it ``just to see if it will work.'' Of course, you can never be absolutely
sure that something is going to work before you try it, otherwise we'd never have
to try things. But you should have an expectation that something is going to work
before you try it, and if you can't predict how to do something or whether
something would work and find yourself having to determine it experimentally,
make a note in your mind that whatever you've just learned (based on the outcome
of the experiment) is suspect.

section 1.1: Getting Started

section 1.2: Variables and Arithmetic Expressions

section 1.3: The For Statement

section 1.4: Symbolic Constants

section 1.5: Character Input and Output

section 1.5.1: File Copying

section 1.5.2: Character Counting

section 1.5.3: Line Counting

section 1.5.4: Word Counting

section 1.6: Arrays

section 1.7: Functions

section 1.8: Arguments--Call by Value

section 1.9: Character Arrays

section 1.10: External Variables and Scope

Although C compilers do not care about how a program looks, proper indentation
and spacing are critical in making programs easy for people to read. We
recommend writing only one statement per line, and using blanks around operators
to clarify grouping. The position of braces is less important, although people hold
passionate beliefs. We have chosen one of several popular styles. Pick a style that
suits you, then use it consistently.

There are two things to note here. One is that (with one or two exceptions) the
compiler really does not care how a program looks; it doesn't matter how it's
broken into lines. The fragments
while(i < j)
    i = 2 * i;
and
while(i < j) i = 2 * i;
and
while(i<j)i=2*i;
and
while(i < j)
i = 2 * i;
and
while (
i <
j )
i =
2 *
i ;
are all treated exactly the same way by the compiler.

The second thing to note is that style issues (such as how a program is laid
out) are important, but they're not something to be too dogmatic about, and there
are also other, deeper style issues besides mere layout and typography.

There is some value in having a reasonably standard style (or a few standard
styles) for code layout. Please don't take the authors' advice to ``pick a style that
suits you'' as an invitation to invent your own brand-new style. If (perhaps after
you've been programming in C for a while) you have specific objections to specific
facets of existing styles, you're welcome to modify them, but if you don't have any
particular leanings, you're probably best off copying an existing style at first. (If
you want to place your own stamp of originality on the programs that you write,
there are better avenues for your creativity than inventing a bizarre layout; you
might instead try to make the logic easier to follow, or the user interface easier to
use, or the code freer of bugs.)

Deep sentence:

...in C, as in many other languages, integer division truncates: any fractional part is
discarded.

The authors say all there is to say here, but remember it: just when you've forgotten
this sentence, you'll wonder why something is coming out zero when you thought
it was supposed to be the quotient of two nonzero numbers.

page 12

Here is more discussion on the difference between integer and floating-point
division. Nothing deep; just something to remember.

page 13

Hidden here are descriptions of some more of printf's ``conversion
specifiers.'' %o and %x print integers, in octal (base 8) and hexadecimal (base 16),
respectively. Since a percent sign normally tells printf to expect an additional
argument and insert its value, you might wonder how to get printf to just print a %.
The answer is to double it: %%.

Also, note (as was mentioned on page 11) that you must match up the arguments
to printf with the conversion specification; the compiler can't (or won't) generally
check them for you or fix things up if you get them wrong. If fahr is a float, the
code
printf("%d\n", fahr);
will not work. You might ask, ``Can't the compiler see that %d needs an integer
and fahr is floating-point and do the conversion automatically, just like in the
assignments and comparisons on page 12?'' And the answer is, no. As far as the
compiler knows, you've just passed a character string and some other arguments
to printf; it doesn't know that there's a connection between the arguments and
some special characters inside the string. This is one of the implications of the fact,
stated earlier, that functions like printf are not special. (Actually, some compilers
or other program checkers do know that a function named printf is special, and
will do some extra checking for you, but you can't count on it.)

Deep sentence:

...in any context where it is permissible to use the value of a variable of some type,
you can use a more complicated expression of that type.

You may have used other languages which placed restrictions on where you could
use expressions or how complicated they could be. C has relatively few such
restrictions. There's nothing magical about the printf call above; this ability to
perform a computation inside of an argument is not unique to printf. In any
function call, the arguments in the argument list are expressions, and it doesn't
matter if they are simple expressions which just fetch the value of one variable,
like fahr, or more complicated expressions, like 5.0/9.0 * (fahr - 32).

Notice that there is no semicolon at the end of a #define line.


Actually, all lines that begin with # are special; we'll learn more about them later.

section 1.5.1: File Copying


page 16

Pay particular attention to the discussion of why the variable to hold getchar's
return value is declared as an int rather than a char. The distinction may not seem
terribly significant now, but it is important. If you use a char, it may seem to work,
but it may break down mysteriously later. Always remember to use an int for
anything you assign getchar's return value to.

page 17

The line
while ((c = getchar()) != EOF)
epitomizes the cryptic brevity which C is notorious for. You may find this
terseness infuriating (and you're not alone!), and it can certainly be carried too far,
but bear with me for a moment while I defend it.

The simple example on pages 16 and 17 illustrates the tradeoffs well. We have four
things to do:

1. call getchar,
2. assign its return value to a variable,
3. test the return value against EOF, and
4. process the character (in this case, print it again).

We can't eliminate any of these steps. We have to assign getchar's value to a
variable (we can't just use it directly) because we have to do two different things
with it (test, and print). Therefore, compressing the assignment and test into the
same line (as on page 17) is the only good way of avoiding two distinct calls
to getchar (as on page 16). You may not agree that the compressed idiom is better
for being more compact or easier to read, but the fact that there is now only one
call to getchar is a real virtue.

In a tiny program like this, the repeated call to getchar isn't much of a problem.
But in a real program, if the thing being read is at all complicated (not just a single
character read with getchar), and if the processing is at all complicated (such that
the input call before the loop and the input call at the end of the loop become
widely separated), and if the way that input is done is ever changed some day, it's
just too likely that one of the input calls will get changed but not the other.

(Also, note that when an assignment like c = getchar() appears within a larger
expression, the surrounding expression receives the same value that is assigned.
Using an assignment as a subexpression in this way is perfectly legal and quite
common in C.)

When you run the character copying program, and it begins copying its input (your
typing) to its output (your screen), you may find yourself wondering how to stop it.
It stops when it receives end-of-file (EOF), but how do you send EOF? The answer
depends on what kind of computer you're using. On Unix and Unix-related
systems, it's almost always control-D. On MS-DOS machines, it's control-Z
followed by the RETURN key. Under Think C on the Macintosh, it's control-D,
just like Unix. On other systems, you may have to do some research to learn how
to send EOF.

(Note, too, that the character you type to generate an end-of-file condition from the
keyboard has nothing to do with the EOF value returned by getchar. The EOF value
returned by getchar is a code indicating that the input system has detected an end-
of-file condition, whether it's reading the keyboard or a file or a magnetic tape or a
network connection or anything else.)

Another excellent thing to know when doing any kind of programming is how to
terminate a runaway program. If a program is running forever waiting for input,
you can usually stop it by sending it an end-of-file, as above, but if it's running
forever not waiting for something (i.e. if it's in an infinite loop) you'll have to take
more drastic measures. Under Unix, control-C will terminate the current program,
almost no matter what. Under MS-DOS, control-C or control-BREAK will
sometimes terminate the current program, but by default MS-DOS only checks for
control-C when it's looking for input, so an infinite loop can be unkillable. There's
a DOS command, I think it's
break on

which tells DOS to look for control-C more often, and I recommend using this
command if you're doing any programming. (If a program is in a really tight
infinite loop under MS-DOS, there can be no way of killing it short of rebooting.)
On the Mac, try command-period or command-option-ESCAPE.

Finally, don't be disappointed (as I was) the first time you run the character
copying program. You'll type a character, and see it on the screen right away, and
assume it's your program working, but it's only your computer echoing every key
you type, as it always does. When you hit RETURN, a full line of characters is
made available to your program, which it reads all at once, and then copies to the
screen (again). In other words, when you run this program, it will probably seem to
echo the input a line at a time, rather than a character at a time. You may wonder
how a program can read a character right away, without waiting for the user to hit
RETURN. That's an excellent question, but unfortunately the answer is rather
complicated, and beyond the scope of this introduction. (Among other things, how
to read a character right away is one of the things that's not defined by the C
language, and it's not defined by any of the standard library functions, either. How
to do it depends on which operating system you're using.)

section 1.5.2: Character Counting


page 18

Ignore the mention of efficiency with respect to nc = nc+1 vs. ++nc. Once you've
gotten used to ++ meaning ``increment by 1,'' you'll probably find yourself
preferring ++nc simply because it is more concise, and incrementing things by 1 is
so common. (Personally, once I got used to it, I found ++ more natural, too,
because after all, expressions like nc = nc+1, though they're common enough in
programming, are very unnatural from an algebraic perspective.)

pages 18-19

You may find it odd to have a loop with no body, but such loops do crop up. Just
make sure that the explicit null statement (or, if you prefer, empty {}) marking the
empty loop body is plainly visible.

The whole first paragraph of page 19 counts as ``deep.'' A clean, well-designed
loop will work properly for all of its ``boundary conditions'': zero trips through the
loop, one trip, many trips, maximum trips (if there is any maximum, and if so, also
maximum minus one). If a loop for some reason doesn't work at a particular
boundary condition, it's tempting to claim that that condition is rare or impossible
and that the loop is therefore okay. But if the loop can't handle the boundary
condition, why can't it? It's probably awkwardly constructed, and straightening it
out so that it naturally handles all boundary conditions will usually make it clearer
and easier to understand (and may also remove other lurking bugs).

section 1.5.3: Line Counting


page 19

Note the word of caution about = vs. == carefully. Typing one when you mean the
other is, unfortunately, a very easy mistake to make.

Note that the character constants discussed on page 19 are very different from the
string constants introduced on page 7.

section 1.5.4: Word Counting


page 21

Deep sentence:

In a program as tiny as this, it makes little difference, but in larger programs, the
increase in clarity is well worth the modest extra effort to write it this way from the
beginning.

I agree with this. Some people complain that symbolic constants make a program
harder to read, because you always have to look them up to see what they mean.
As long as you choose appropriate names for symbolic constants and use them
consistently (i.e. even if APPLE and ORANGE happen to have the same value,
don't use one when you mean the other), no one will have this complaint about
your programs.

Note that there's no direct way to simplify the condition
if (c == ' ' || c == '\n' || c == '\t')
In particular, something like
if (c == (' ' || '\n' || '\t'))
would not work. (What would it do?)

section 1.6: Arrays


page 22

Note carefully that arrays in C are 0-based, not 1-based as they are in some
languages. (As we'll see, 0-based arrays turn out to be more convenient than 1-
based arrays more of the time, but they may take a bit of getting used to at first.)

When they say ``as reflected in the for loops that initialize and print the array,''
they're referring to the fact that the vast majority of for loops in C look like this:
for(i = 0; i < 10; ++i)
and count from 0 to 9. The loop
for(i = 1; i <= 10; ++i)
would count from 1 to 10, but loops like this are comparatively rare. (In fact,
whenever you see either ``= 1'' or ``<='' in a for loop, it's an indication that
something unusual is going on which you'll want to be aware of, and it may even
be a bug.)

page 23

They've started going a little fast here, so read up if they're losing you. What's this
magic expression c-'0' that they're using as an array subscript? Remember, as we
saw first on page 19, that characters in C are represented by small integers
corresponding to their values in the machine's character set. In ASCII, which most
machines use, 'A' is character code 65, '0' (zero) is code 48, '9' is code 57, and
all the other characters have their own values which I won't bother to list. If we've
just read the character '9' from the file, it has value 57, so c-'0' is 57 - 48 which is
9, and we'll increment cell number 9 in the array, just like we want to.
Furthermore, even if we're not using a machine which uses ASCII, by
subtracting '0', we'll always subtract whatever the right value is to map the
characters from '0' to '9' down to the array cell range 0 to 9.

section 1.7: Functions


page 24

Deep sentence:

...you will often see a short function defined and called only once, just because it
clarifies some piece of code.

Ideally, this is true in any language. Breaking a program up into functions (or
subroutines or procedures or whatever a language calls them) is one of the first and
one of the most important ways to keep control of the proliferating complexity in a
software project.

page 25

Note that the for loop at the top of the page runs from 1 to n rather than 0 to n-1,
and may therefore seem suspect by the above note for page 22. In this case, since
all that matters is that the loop is traversed n times, it doesn't matter which
values i takes on.

Not only the names of the parameters and local variables, but also their values (as
we'll see in section 1.8), are all local to a function. Rather than remembering a list
of things that are local, it's easier to remember that everything is local: the whole
point of a function as an abstraction mechanism is that it's a black box; you don't
have to know or care about any of its implementation details, such as what it
chooses to name its parameters and local variables. You pass it some arguments,
and it returns you a value according to its specification.

The distinction between the terms argument and parameter may seem overly
picky, but it's a good way of reinforcing the notion that the parameters and other
details of a function's implementation are almost completely separated from (that
is, of no concern to) the caller.

page 26

Note the discussion about return values from main. The first few sample programs
in this chapter, including the very first ``hello, world'' example on page 6, have
omitted a return value, which is, strictly speaking, incorrect. Do get in the habit of
returning a value from main, both to be correct, and because ``programs should
return status to their environment.''

By ``Parameter names need not agree'' they mean that it's not a problem that the
prototype declaration of power says that the first parameter is named m, while the
actual function definition says that it's named base.

pages 26-7

It's probably a good idea if you're aware of this ``old style'' function syntax, so that
you won't be taken aback when you come across it, perhaps in code written by
reactionary old fogies (such as the author of these notes) who still tend to use it out
of habit when they're not paying attention.

section 1.8: Arguments -- Call by Value


page 27

If, on the other hand, you are not used to other languages such as Fortran, these
call-by-value semantics may not be surprising (any more than anything else in C
which is new to you).

Even though you can modify a parameter in a function (i.e. treat it as a
``conveniently initialized local variable''), you certainly don't have to, especially if
(as is often the case) you'll need an unmodified copy of the parameter later in the
function.

page 28

Don't worry too much about the exception mentioned for arrays--there are a
number of exceptions for arrays, and we'll have much more to say about them later.
But be aware that we are deliberately glossing over a few details here, and they are
details which will become important later on. (In particular, the statement on page
27 that ``the called function cannot directly alter a variable in the calling function''
may not seem to be true for arrays, and this is what the authors mean when they
say that ``The story is different''. We'll be seeing several functions which return
things--usually strings--to their callers by writing into caller-supplied arrays. In
chapter 5 we'll learn how this is possible. If this discrepancy wouldn't have
bothered you now, pretend I didn't mention it.)

section 1.9: Character Arrays


Pay attention to the way this program is developed first in ``pseudocode,'' and then
refined into real C code. A clear pseudocode statement not only makes it easier to
think about the structure of the eventual real code, but if you make the eventual
real code mimic the pseudocode, the real code will be equally straightforward and
easy to read.

The function getline, introduced here, is extremely useful, and we'll have as much
use for it in our own programs as the authors do in theirs. (In other words, they
have succeeded in their goal of making it ``useful in other contexts.'' In fact, I've
been using a getline function much like this one ever since I learned C from K&R,
and I generally find it preferable to the standard library's line-reading function.)

Pages 28 through 30 introduce quite a lot of material all at once; you'll probably
want to read it several times, especially if arrays or character strings are new to
you.

Earlier we said that C provided no particular built-in support for composite objects
such as character strings, and here we begin to see the significance of that
omission. A string is just an array of characters, and you can access the characters
within a string exactly as easily (because you use exactly the same syntax) as you
access the elements within any other array.

If you've used BASIC, you will probably wonder where C's SUBSTR function is.
C doesn't have one, for two reasons. First of all, there's less of a need for one,
because it's so easy to get at the individual characters within a string in C. More
importantly, a SUBSTR function implies that you take a string and extract a
substring as a new string. However, creating a new string (i.e. the extracted
substring) involves allocating arbitrary amounts of memory to hold the string, and
C rarely if ever allocates memory implicitly for you.

If anything, it's too easy to access the individual characters within strings in C.
String handling illustrates one of the potentially frustrating aspects of C we
mentioned earlier: the language doesn't define any high-level string handling
features for you, so you're free to do whatever low-level string processing you
wish. The down side is that constantly manipulating strings down at the character
level, and always having to remember to allocate memory for new strings, can get
tedious after a while.

The preceding paragraph is not meant to discourage you, but just to point out a
reality: any C program which manipulates strings (and this includes most C
programs) will find itself doing a certain amount of character-level fiddling and a
certain amount of memory allocation. It will also find that it can do just about
anything it wants to do (and that its programmer has the patience to do) with the
strings it manipulates.

Since string processing, and at this relatively low level, is so common in C, you'll
want to pay careful attention to the discussion on page 30 of how strings are stored
in character arrays, and particularly to the fact that a '\0' character is always
present to mark the end of a string. (It's easy to forget to count the '\0' character
when allocating space for a string, for instance.) Notice the nice picture on page
30; this is a good way of thinking about data structures (and not just simple
character arrays, either).

page 29

Note that the program explicitly allocates space for the two strings it manipulates:
the current line line, and the longest line longest. (It only needs these two strings
at any one time, even though the input consists of arbitrarily many lines.) Note that
it cannot simply assign one string to another (because C provides no built-in
support for composite objects such as character strings); the program calls
the copy function to do so. (The authors write their own copy function for
explanatory purposes; the standard library contains a string-copying function
which would normally be used.) The only strings that aren't explicitly allocated are
the arrays in the getline and copy functions; as the discussion briefly mentions,
these do not need to be allocated because they're already allocated in the caller.
(There are a number of subtleties about array parameters to functions; we'll have
more to say about them later.)

The code on page 29 contains a number of examples of compressed assignments
and tests; evidently the authors expect you to get used to this style in a hurry. The
line
while ((len = getline(line, MAXLINE)) > 0)
is similar to the getchar loops earlier in this chapter; it calls getline, saves its
return value in the variable len, and tests it against 0.

The comparison
i<lim-1 && (c=getchar())!=EOF && c!='\n'
in the for loop in the getline function does several things: it makes sure there is
room for another character in the array; it calls, assigns, and tests getchar's return
value against EOF, as before; and it also tests the returned character against '\n',
to detect end of line. The surrounding code is mildly clumsy in that it has to check
for \n a second time; later, when we learn more about loops, we may find a way of
writing it more cleanly. You may also notice that the code deals correctly with the
possibility that EOF is seen without a \n.

The line
while ((to[i] = from[i]) != '\0')
in the copy function does two things at once: it copies characters from
the from array to the to array, and at the same time it compares the copied character
against '\0', so that it stops at the end of the string. (If you think this is cryptic,
wait 'til we get to page 106 in chapter 5!)

We've also just learned another printf conversion specifier: %s prints a string.

page 30

Deep sentence:

There is no way for a user of getline to know in advance how long an input line
might be, so getline checks for overflow.

Because dynamically allocating memory for arbitrary-length strings is mildly
tedious in C, it's tempting to use fixed-size arrays. (It's so tempting, in fact, that
that's what most programs do, and since fixed-size arrays are also considerably
easier to discuss, all of our early example programs will use them.) Using
fixed-size arrays is fine, as long as some assurance is made that they don't overflow.
Unfortunately, it's also tempting (and easy) to forget to guard against array
overflow, perhaps by deluding yourself into thinking that too-long inputs ``can't
happen.'' Murphy's law says that they do happen, and the various corollaries to
Murphy's law say that they happen in the most unpleasant way and at the least
convenient time. Don't be cavalier about arrays; do make sure that they're big
enough and that you guard against overflowing them. (In another mark of C's
general insensitivity to beginning programmers, most compilers do not check for
array overflow; if you write more data to an array than it is declared to hold, you
quietly scribble on other parts of memory, usually with disastrous results.)

section 1.10: External Variables and Scope


page 31

There's a bit of jargon in this section. An external variable is what is sometimes
called a global variable. The authors introduce the term automatic to refer to the
local variables we've seen so far; this is a good word to remember, even if you
never use it, because people will spring it on you when they're being precise, and if
you don't know this usage you'll think they're talking about transmissions or
something. (To be precise, ``local'' is a broader category than ``automatic''; there
are both automatic and static local variables.)

Deep sentence:

If [automatic variables] are not set, they will contain garbage.


Actually, if automatic variables always contained garbage, the situation wouldn't
be quite so bad. In practice, they often (though not always) do contain zero or some
other predictable value, and this happens just often enough to lull you into the
occasional false sense of security, by making a program with an inadvertently
uninitialized variable seem to work.

Deep sentence:

An external variable must be defined, exactly once, outside of any function; this
sets aside storage for it. The variable must also be declared in each function that
wants to access it; this states the type of the variable.
The basic rule is ``define once; declare many times.'' As we'll see just below, it is
not necessary for a declaration of an external variable to appear in every single
function; it is possible for one external declaration to apply to many functions. (In
the clause ``the variable must also be declared in each function'', the word
``declared'' is an adjective, not a verb.)

page 33

In fact, the ``common practice'' of placing ``definitions of all external variables at
the beginning of the source file'' is so common that it's rare to see external
declarations within functions, as in the functions on page 32. The authors are using
the in-function extern declarations partly because it is an alternative style, and
partly because we haven't talked about separate compilation (that is, building a
single program from several separate source files) yet. Rather than jumping the gun
and discussing those two topics now, I'll just mention that the discussion in section
1.10 might be a bit misleading, and that you should probably wait until we get to
the complete description of the issue in section 4.4 before you commit any of this
to memory.

Deep sentence:

You should note that we are using the words definition and declaration carefully
when we refer to external variables in this section. ``Definition'' refers to the place
where the variable is created or assigned storage; ``declaration'' refers to places
where the nature of the variable is stated but no storage is allocated.
Do note the careful distinction; it's an important one and one which I'll be using,
too.

page 34

The authors' criticism of the second (page 32) version of the longest-line program
is accurate. The revision of the longest-line program to use external variables was
done only to demonstrate the use of external variables, not to improve the program
in any way (nor does it improve the program in any way).

As a general rule, external variables are acceptable for storing certain kinds of
global state information which never changes, which is needed in many functions,
and which would be a nuisance to pass around. I don't think of external variables as
``communicating between functions'' but rather as ``setting common state for the
entire program.'' When you start thinking of an external variable as being one of
the ways you communicate with a particular function, and in particular when you
find yourself changing the value of some external variable just before calling some
function, to affect its operation in some way, you start getting into the troublesome
uses of external variables, which you should avoid.

Chapter 2: Types, Operators, and Expressions

page 35

Deep sentence:

The type of an object determines the set of values it can have and
what operations can be performed on it.

This is a fairly formal, mathematical definition of what a type is,
but it is traditional (and meaningful). There are several
implications to remember:

1. The ``set of values'' is finite. C's int type cannot
represent all of the integers; its float type cannot
represent all floating-point numbers.
2. When you're using an object (that is, a variable) of some
type, you may have to remember what values it can take on
and what operations you can perform on it. For example,
there are several operators which play with the binary (bit-
level) representation of integers, but these operators are not
meaningful for and may not be applied to floating-point
operands.
3. When declaring a new variable and picking a type for it, you
have to keep in mind the values and operations you'll be
needing.

In other words, picking a type for a variable is not some abstract
academic exercise; it's closely connected to the way(s) you'll be
using that variable.

You don't need to worry about the list of ``small changes and additions'' made by
the ANSI standard, unless you started learning C long ago or have a keen interest
in its history. We'll be using these new features indiscriminately, usually without
comment.

section 2.1: Variable Names

section 2.2: Data Types and Sizes

section 2.3: Constants

section 2.4: Declarations

section 2.5: Arithmetic Operators


section 2.6: Relational and Logical Operators

section 2.7: Type Conversions

section 2.8: Increment and Decrement Operators

section 2.9: Bitwise Operators

section 2.10: Assignment Operators and Expressions

section 2.11: Conditional Expressions

section 2.12: Precedence and Order of Evaluation

section 2.1: Variable Names


Deep sentence:

Don't begin variable names with underscore, however, since library routines often
use such names.
If you happen to pick a name which ``collides'' with (is the same as) a name
already chosen by a library routine, either your code or the library routine (or both)
won't work. Naming issues become very significant in large projects, and problems
can be avoided by setting guidelines for who may use which names. One of these
guidelines is simply that user code should not use names beginning with an
underscore, because these names are (for the most part) ``reserved to the
implementation'' (that is, reserved for use by the compiler and the standard library).

Note that case is significant; assuming that case is ignored (as it is with some other
programming languages and operating systems) can lead to real frustration.

The convention that all-upper-case names are used for symbolic constants (i.e. as
created with the #define directive, which we learned about in section 1.4) is
arbitrary, but useful. Like the various conventions for code layout (page 10), this
convention is a good one to accept (i.e. not get too creative about), until you have
some very good reason for altering it.

Deep sentence:

Keywords like if, else, int, float, etc., are reserved; you can't use them as
variable names.

You can find the complete list of keywords in appendix A2.4 on page 192.

section 2.2: Data Types and Sizes


page 36

If you can look at this list of ``a few basic types in C'' and say to yourself, ``Oh,
how simple, there are only a few types, I won't have to worry much about choosing
among them,'' you'll have an easy time with declarations. (Some masochists wish
that the type system were more complicated so that you could specify more things
about each variable, but those of us who would rather not have to specify these
extra things each time are glad that we don't have to.)

Note that the basic types are defined as having at least a certain size. There is no
specification that a short int will be exactly 16 bits, or that a long int will be
exactly 32 bits. Some programmers become obsessed with knowing exactly what
sizes things will be in various situations, and write programs which depend on
things having certain sizes. Exact sizes are occasionally important, but most of the
time we can sidestep size issues and let the compiler do most of the worrying.

Most of the simple variables in most programs are of types int, long int,
or double. Typically, we'll use int and double for most purposes, and long int any
time we need to hold values greater than 32,767. We'll rarely use individual
variables of type char; although we'll use plenty of arrays of char. Types short
int and float are important primarily when efficiency (speed or memory usage) is
a concern, and for us it usually won't be.

Note that even when we're manipulating individual characters, we'll usually use
an int variable, for the reason discussed in section 1.5.1 on page 16.

section 2.3: Constants


page 37

We write constants in decimal, octal, or hexadecimal for our convenience, not the
compiler's. The compiler doesn't care; it always converts everything into binary
internally, anyway. (There is, however, no good way to specify constants in source
code in binary.)

pages 37-38

Read the descriptions of character and string constants carefully; most C programs
work with these data types a lot, and their proper use must be kept in mind. Note
particularly these facts:

1. The character constant 'x' is quite different from the string constant "x".
2. The value of a character is simply ``the numeric value of the character in the
machine's character set.''
3. Strings are terminated by the null character, \0. (This applies to both string
constants and to all other strings we'll build and manipulate.) This means
that the size of a string (the number of char's worth of memory it occupies)
is always one more than its length (i.e. as reported by strlen) appears to be.

As we saw in section 1.6 on page 23, it's possible to switch rather freely between
thinking of a character as a character and thinking of it as its value. For example,
the character '0' (that is, the character that can print on your screen and looks like
the number zero) has in the ASCII character set the internal value 48. Another way
of saying this is to notice that the following expressions are all true:

'0' == 48
'0' == '\060'
'0' == '\x30'
We'll have a bit more to say about characters and their small integer
representations in section 2.7.

Note also that the string "48" consists of the three characters '4', '8', and '\0'.
Also in section 2.7 we'll meet the atoi function which computes a numeric value
from a string of digits like this.

page 39

We won't be using enumerations, so you don't have to worry too much about the
description of enumeration constants.

section 2.4: Declarations


page 40

You may wonder why variables must be declared before use. There are two
reasons:

1. It makes things somewhat easier on the compiler; it knows right away what
kind of storage to allocate and what code to emit to store and manipulate
each variable; it doesn't have to try to intuit the programmer's intentions.
2. It forces a bit of useful discipline on the programmer: you cannot introduce
variables willy-nilly; you must think about them enough to pick appropriate
types for them. (The compiler's error messages to you, telling you that you
apparently forgot to declare a variable, are as often helpful as they are a
nuisance: they're helpful when they tell you that you misspelled a variable,
or forgot to think about exactly how you were going to use it.)

Although there are a few places where ``certain declarations can be made
implicitly by context'', making use of these removes the advantages of reason 2
above, so I recommend always declaring everything explicitly.

Most of the time, I recommend writing one declaration per line (as in the ``latter
form'' on page 40). For the most part, the compiler doesn't care what order
declarations are in. You can order the declarations alphabetically, or in the order
that they're used, or to put related declarations next to each other. Collecting all
variables of the same type together on one line essentially orders declarations by
type, which isn't a very useful order (it's only slightly more useful than random
order).

If you'd rather not remember the rules for default initialization (namely that
``external or static variables are initialized to zero by default'' and ``automatic
variables for which there is no initializer have... garbage values''), you can get in
the habit of initializing everything. It never hurts to explicitly initialize something
when it would have been implicitly initialized anyway, but forgetting to initialize
something that needs it can be the source of frustrating bugs.

Don't worry about the distinction between ``external or static variables''; we haven't
seen it yet.

One mild surprise is that const variables are not ``constant expressions'' as defined
on page 38. You can't say something like
const int maxline = 1000;
char line[maxline+1]; /* WRONG */

section 2.5: Arithmetic Operators


page 41

Keep in the back of your mind somewhere the fact that the behavior of
the / and % operators is not precisely defined for negative operands. This means
that -7 / 4 might be -1 or -2, and -7 % 4 might be -3 or +1. The difference won't
matter for the simple programs we'll be writing at first, but eventually you'll get bit
by it if you don't remember it.

An additional arithmetic operation you might be wondering about is
exponentiation. Some languages have an exponentiation operator (typically ^
or **), but C doesn't.

The term ``precedence'' refers to how ``tightly'' operators bind to their operands
(that is, to the things they operate on). In mathematics, multiplication has higher
precedence than addition, so 1 + 2 * 3 is 7, not 9. In other words, 1 + 2 * 3 is
equivalent to 1 + (2 * 3). C is the same way.

The term ``associativity'' refers to the grouping when two or more operators of the
same precedence participate next to each other in an expression. When an operator
(like subtraction) associates ``left to right,'' it means that 1 - 2 - 3 is equivalent
to (1 - 2) - 3 and gives -4, not +2.

By the way, the word ``arithmetic'' as used in the title of this section is an
adjective, not a noun, and it's pronounced differently than the noun: the accent is
on the third syllable.

section 2.6: Relational and Logical Operators


If it isn't obvious, >= is greater-than-or-equal-to, <= is less-than-or-equal-to, == is
equal-to, and != is not-equal-to. We use >=, <=, and != because the symbols ≥, ≤,
and ≠ are not common on computer keyboards, and we use == because equality
testing and assignment are two completely different operations, but = is already
taken for assignment. (Obviously, typing = when you mean == is a very easy
mistake to make, so watch for it. Some compilers will warn you when you use one
but seem to want the other.)

The fact that evaluation of the logical operators && and || ``stops as soon as the
truth or falsehood of the result is known'' refers to the fact that

``false'' AND anything is false


or, in C,
(0 && anything) == 0
while, on the other hand,

``true'' OR anything is true


or, in C,
(1 || anything) == 1
Looking at these another way, if you want to do something if thing1 is true and
thing2 is true, and you've just noticed that thing1 is false, you don't even need to
check thing2. Similarly, if you're supposed to do something if thing3 is true or
thing4 is true, and you notice that thing3 is true, you can go ahead and do whatever
it is you're supposed to do without checking thing4.

C works the same way, and if it's not true that ``most C programs rely on these
properties,'' it's certainly true that many do.

For another example of the usefulness of this ``short-circuiting'' behavior, suppose
we're taking the average of n numbers. If n is zero, that is, if we don't have any
numbers to take the average of, we don't want to divide by zero. Code like
if(n != 0 && sum / n > 1)

is common: it tests whether n is nonzero and the average is greater than 1, but it
does not have to worry about dividing by zero. (If, on the other hand, the compiler
always evaluated both sides of the && before checking to see whether they were
both true, the code above could divide by zero.)

page 42

Note the extra parentheses in
(c = getchar()) != '\n'
Since this is a common idiom, you'll need to remember the parentheses. What
would
c = getchar() != '\n'
do?

C's treatment of Boolean values (that is, those where we only care whether they're
true or false) is straightforward. We'll have more to say about it later, but for now,
note that a value of zero is ``false,'' and any nonzero value is ``true.'' You might
also note that there is no necessary connection between statements like if() which
expect a true/false value and operators like >= and && which generate true/false
values. You can use operators like >= and && in any expression, and you can use
any expression in an if() statement.

The authors make a good point about style: if valid is conceptually a Boolean
variable (that is, it's an integer, but we only care about whether it's zero or nonzero,
in other words, ``false'' or ``true''), then
if(valid)
is a perfectly reasonable and readable condition. However, when values are not
conceptually Boolean, I encourage you to make explicit comparisons against 0. For
example, we could have expressed our average-taking code as
if(n && sum / n > 1)
but I think it's clearer to be explicit and say
if(n != 0 && sum / n > 1)
(However, many C programmers feel that expressions like
if(n && sum / n > 1)

are ``more concise,'' so you will see them all the time and you should be able
to read them.)

section 2.7: Type Conversions

The conversion rules described here and on page 44 are straightforward, but they're
quite important, so you'll need to learn them well. Usually, conversions happen
automatically and when you want them to, but not always, so it's important to keep
the rules in mind. (Recall the discussion of 5/9 on page 12.)

Deep sentence:

A char is just a small integer, so chars may be freely used in arithmetic
expressions.
Whether you treat a ``small integer'' as a character or an integer is pretty much up
to you. As we saw earlier, in the ASCII character set, the character '0' has the
value 48. Therefore, saying
int i = '0';
is the same as saying
int i = 48;

If you print i out as a character, using
putchar(i);
or
printf("%c", i);
(the %c format prints characters; see page 13), you'll see the character '0'. If you
print it out as a number:
printf("%d", i);
you'll see the value 48.

Most of the time, you'll use whatever notation matches what you're trying to do. If
you want the character '0', you'll use '0'. If you want the value 48 (as the number
of months in four years, or something), you'll use 48. If you want to print
characters, you'll use putchar or printf %c, and if you want to print integers, you'll
use printf %d. Occasionally, you'll cross over between thinking of characters as
characters and as values, such as in the character-counting program in section 1.6
on page 22, or in the atoi function we'll look at next. (You should never have to
know that '0' has the value 48, and you should never have to write code which
depends on it.)

page 43

To illustrate the ``schizophrenic'' nature of characters (are they characters, or are
they small integer values?), it's useful to look at an implementation of the standard
library function atoi. (If you're getting overwhelmed, though, you may skip this
example for now, and come back to it later.) The atoi routine converts a string
like "123" into an integer having the corresponding value.

As you study the atoi code at the top of page 43, figure out why it does not seem
to explicitly check for the terminating '\0' character.

The expression
s[i] - '0'
is an example of the ``crossing over'' between thinking about a character and its
value. Since the value of the character '0' is not zero (and, similarly, the other
numeric characters don't have their ``obvious'' values, either), we have to do a little
conversion to get the value 0 from the character '0', the value 1 from the
character '1', etc. Since the character set values for the digit
characters '0' to '9' are contiguous (48-57, if you must know), the conversion
involves simply subtracting an offset, and the offset (if you think about it) is
simply the value of the character '0'. We could write
s[i] - 48
if we really wanted to, but that would require knowing what the value actually is.
We shouldn't have to know (and it might be different in some other character set),
so we can let the compiler do the dirty work by using '0' as the offset (since
subtracting '0' is, by definition, the same as subtracting the value of the
character '0').

The functions from <ctype.h> are being introduced here without a lot of fanfare.
Here is the main loop of the atoi routine, rewritten to use isdigit:
for (i = 0; isdigit(s[i]); ++i)
n = 10 * n + (s[i] - '0');

Don't worry too much about the discussion of signed vs. unsigned characters for
now. (Don't forget about it completely, though; eventually, you'll find yourself
working with a program where the issue is significant.) For now, just remember:

1. Use int as the type of any variable which receives the return value
from getchar, as discussed in section 1.5.1 on page 16.
2. If you're ever dealing with arbitrary ``bytes'' of binary data, you'll usually
want to use unsigned char.

page 44

As we saw in section 2.6 on page 42, relational and logical operators always
``return'' 1 for ``true'' and 0 for ``false.'' However, when C wants to know whether
something is true or false, it just looks at whether it's nonzero or zero, so any
nonzero value is considered ``true.'' Finally, some functions which return true/false
values (the text mentions isdigit) may return ``true'' values of other than 1.

You don't have to worry about these distinctions too much, and you also don't have
to worry about the fragment
d = c >= '0' && c <= '9'
as long as you write conditionals in a sensible way. If you wanted to see whether
two variables a and b were equal, you'd never write
if((a == b) == 1)
(although it would work: the == operator ``returns'' 1 if they're equal). Similarly,
you don't want to write
if(isdigit(c) == 1)
because it's equally silly-looking, and in this case it might not work. Just write
things like
if(a == b)
and
if(isdigit(c))
and you'll steer clear of most problems. (Make sure, though, that you never try
something like if('0' <= c <= '9'), since this wouldn't do at all what it looks like
it's supposed to.)

The set of implicit conversions on page 44, though informally stated, is exactly the
set to remember for now. They're easy to remember if you notice that, as the
authors say, ``the `lower' type is promoted to the `higher' type,'' where the ``order''
of the types is
char < short int < int < long int < float < double < long double
(We won't be using long double, so you don't need to worry about it.) We'll have
more to say about these rules on the next page.

Don't worry too much for now about the additional rules for unsigned values,
because we won't be using them at first.

Do notice that implicit (automatic) conversions do happen across assignments. It's
perfectly acceptable to assign a char to an int or vice versa, or assign an int to
a float or vice versa (or any other combination). Obviously, when you assign a
value from a larger type to a smaller one, there's a chance that it might not fit.
Therefore, compilers will often warn you about such assignments.

page 45

Casts can be a bit confusing at first. A cast is the syntax used to request an explicit
type conversion; coercion is just a more formal word for ``conversion.'' A cast
consists of a type name in parentheses and is used as a unary operator. You may
have used languages which had conversion operators which looked more like
function calls:
integer i = 2;
floating f = floating(i); /* not C */
integer i2 = integer(f); /* not C */
In C, you accomplish the same thing with casts:
int i = 2;
float f = (float)i;
int i2 = (int)f;
(Actually, in C, we wouldn't need casts in those initializations at all, because
conversions between int and float are some of the ones that C performs
automatically.)

To further understand both how implicit conversions and explicit casts work, let's
study how the implicit conversions would look if we wrote them out explicitly.
First we'll declare a few variables of various types:
char c1, c2;
int i1, i2;
long int L1, L2;
float f1, f2;
double d1, d2;
Next we'll look at the kinds of conversions which C automatically performs when
performing arithmetic on two dissimilar types, or when assigning a value to a
dissimilar type. The rules are straightforward: when performing arithmetic on two
dissimilar types, C converts one or both sides to a common type; and when
assigning a value, C converts it to the type of the variable being assigned to.

If we add a char to an int:
i2 = c1 + i1;
the fourth rule on page 44 tells us to convert the char to an int, as if we'd written
i2 = (int)c1 + i1;

If we multiply a long int and a double:
d2 = L1 * d1;
the second rule tells us to convert the long int to a double, as if we'd written
d2 = (double)L1 * d1;
An assignment of a char to an int
i1 = c1;
is as if we'd written
i1 = (int)c1;
and an assignment of a float to an int
i1 = f1;
is as if we'd written
i1 = (int)f1;

Some programmers worry that implicit conversions are somehow unreliable and
prefer to insert lots of explicit conversions. I recommend that you get comfortable
with implicit conversions--they're quite useful--and don't clutter your code with
extra casts.

There are a few places where you do need casts, however. Consider the code

i1 = 200;
i2 = 400;
L1 = i1 * i2;
The product 200 x 400 is 80000, which is not guaranteed to fit into an int.
(Remember that an int is only guaranteed to hold values up to 32767.) Since
80000 will fit into a long int, you might think that you're okay, but you're not: the
two sides of the multiplication are of the same type, so the compiler doesn't see the
need to perform any automatic conversions (none of the rules on page 44 apply).
The multiplication is carried out as an int, which overflows with unpredictable
results, and only after the damage has been done is the unpredictable value
converted to a long int for assignment to L1. To get a multiplication like this to
work, you have to explicitly convert at least one of the int's to long int:
L1 = (long int)i1 * i2;
Now, the two sides of the * are of different types, so they're both converted to long
int (by the fifth rule on page 44), and the multiplication is carried out as a long
int. If it makes you feel safer, you can use two casts:
L1 = (long int)i1 * (long int)i2;
but only one is strictly required.

A similar problem arises when two integers are being divided. The code
i1 = 1;
f1 = i1 / 2;
does not set f1 to 0.5, it sets it to 0. Again, the two operands of the / operator are
already of the same type (the rules on page 44 still don't apply), so an integer
division is performed, which discards any fractional part. (We saw a similar
problem in section 1.2 on page 12.) Again, an explicit conversion saves the day:
f1 = (float)i1 / 2;
Alternately, in a case like this, you can use a floating-point constant:
f1 = i1 / 2.0;
In either case, as soon as one of the operands is floating point, the division is
carried out in floating point, and you get the result you expect.

Implicit conversions always happen during arithmetic and assignment to variables.
The situation is a bit more complicated when functions are being called, however.

The authors use the example of the sqrt function, which is as good an example as
any. sqrt accepts an argument of type double and returns a value of type double. If
the compiler didn't know that sqrt took a double, and if you called
sqrt(4);
or
int n = 4;
sqrt(n);
the compiler would pass an int to sqrt. Since sqrt expects a double, it will not
work correctly if it receives an int. Therefore, it was once always necessary to use
explicit conversions in cases like this, by calling
sqrt((double)4)
or
sqrt((double)n)
or
sqrt(4.0)

However, it is now possible, with a function prototype, to tell the compiler what
types of arguments a function expects. The prototype for sqrt is

double sqrt(double);
and as long as a prototype is in effect (``in scope,'' as the cognoscenti would say),
you can call sqrt without worrying about conversions. When a prototype is in
effect, the compiler performs implicit conversions during function calls
(specifically, while passing the arguments) exactly as it does during simple
assignments.

Obviously, using prototypes makes for much safer programming, and it is
recommended that you use them whenever possible. For the standard library
functions (the ones already written for you), you get prototypes automatically
when you include the header files which describe sets of library functions. For
example, you get prototypes for all of C's built-in math functions by putting the
line
#include <math.h>
at the top of your program. For functions that you write, you can supply your own
prototypes, which we'll be learning more about later.

However, there are a few situations (we'll talk about them later) where prototypes
do not apply, so it's important to remember that function calls are a bit different
and that explicit conversions (i.e. casts) may occasionally be required. Don't
imagine that prototypes are a panacea.

page 46

Don't worry about the rand example.


section 2.8: Increment and Decrement Operators


The distinction between the prefix and postfix forms of ++ and -- will probably
seem strained at first, but it will make more sense once we begin using these
operators in more realistic situations.

The authors point out that an expression like (i+j)++ is illegal, and it's worth
thinking for a moment about why. The ++ operator doesn't just mean ``add one''; it
means ``add one to a variable'' or ``make a variable's value one more than it was
before.'' But (i+j) is not a variable, it's an expression; so there's no place for ++ to
store the incremented result. If you were bound and determined to use ++ here,
you'd have to introduce another variable:
int k = i + j;
k++;
But really, when you want to add one to an expression, just use
i + j + 1

Another unfortunate (and utterly meaningless) example is
i = i++;
If you want to increment i (that is, add one to it, and store the result back in i),
either use
i = i + 1;
or

i++;
Don't try to combine the two.

page 47

Deep sentence:

In a context where no value is wanted, just the incrementing effect, as in
if(c == '\n')
    nl++;
prefix and postfix are the same.
In other words, when you're just incrementing some variable, you can use either
the nl++ or ++nl form. But when you're immediately using the result, as in the
examples we'll look at later, using one or the other makes a big difference.

In that light, study one of the examples on this page--squeeze, the
modified getline, or strcat--and convince yourself that it would not work if the
wrong form of increment (++i or ++j) were used.

You may note that all three examples on pages 47-48 use the postfix form. Postfix
increment is probably more common, though prefix definitely has its uses, too.

You may notice the keyword void popping up in a few code examples. void is a
type we haven't met yet; it's a type with no values and no operations. When a
function is declared as ``returning'' void, as in the squeeze and strcat examples on
pages 47 and 48, it means that the function does not return a value. (This was
briefly mentioned on page 30 in chapter 1.)

section 2.9: Bitwise Operators


page 48

The bitwise operators are definitely a bit (pardon the pun) more esoteric than the
parts of C we've covered so far (and, indeed, than probably most of C). We won't
concentrate on them, but they do come up all the time, so you should eventually
learn enough about them to recognize what they do, even if you don't use them in
any of your own programs for a while. You may skip this section for now, though.

To see what the bitwise operators are doing, it may help to convert to binary for a
moment and look at what's happening to the individual bits. In the example on
page 48, suppose that n is 052525, which is 21845 decimal, or 101010101010101
binary. Then n & 0177, in base 2 and base 8 (binary and octal) looks like
  101010101010101       052525
& 000000001111111     & 000177
  ---------------       ------
          1010101          125

In the second example, if SET_ON is 012 and x is 0, then x | SET_ON looks like
  000000000       000000
| 000001010     | 000012
  ---------       ------
       1010           12
and if x starts out as 402, it looks like
  100000010       000402
| 000001010     | 000012
  ---------       ------
  100001010          412

Note that with &, anywhere we have a 0 we turn bits off, and anywhere we have a 1
we copy bits through from the other side. With |, anywhere we have a 1 we turn
bits on, and anywhere we have a 0 we leave bits alone.

You'll frequently see the word mask used, both as a noun and a verb. You can
imagine that we've cut a mask or stencil out of cardboard, and are using spray paint
to spray through the mask onto some other piece of paper. For |, the holes in the
mask are like 1's, and the spray paint is like 1's, and we paint more 1's onto the
underlying paper. (If there was already paint under a hole, nothing really changes if
we get more paint on it; it's still a ``1''.)

The & operator is a bit harder to fit into this analogy: you can either imagine that
the holes in the mask are 1's and you're spraying some preservative which will fix
some of the underlying bits after which the others will get washed off, or you can
imagine that the holes in the mask are 0's, and you're spraying some erasing paint
or some background color which obliterates anything (i.e. any 1's, any foreground
color) it reaches.

For a bit more information on ``bitwise'' operations, see the handout, ``A Brief
Refresher on Some Math Often Used in Computing.''

page 49

Work through the example at the top of the page, and convince yourself that 1 &
2 is 0 and that 1 && 2 is 1.

The precedence of the bitwise operators is not what you might expect, and explicit
parentheses are often needed, as noted in this deep sentence from page 52:

Note that the precedence of the bitwise operators &, ^, and | falls below == and !=.
This implies that bit-testing expressions like
if ((x & MASK) == 0) ...
must be fully parenthesized to give proper results.

section 2.10: Assignment Operators and Expressions

page 50

You may wonder what it means to say that ``expr1 is computed only
once'' since in an assignment like
i = i + 2
we don't ``evaluate'' the i on the left hand side of the = at all, we assign to it. The
distinction becomes important, however, when the left hand side
(expr1) is more complicated than a simple variable. For example, we
could add 2 to each of the n cells of an array a with code like
int i = 0;
while(i < n)
    a[i++] += 2;
If we tried to use the expanded form, we'd get
int i = 0;
while(i < n)
    a[i++] = a[i++] + 2;
and by trying to increment i twice within the same expression we'd get (as we'll
see) undesired, unpredictable, and in fact undefined results. (Of course, a more
natural form of this loop would be
for(i = 0; i < n; i++)
    a[i] += 2;
and with the increment of i moved out of the array subscript, it wouldn't matter so
much whether we used a[i] += 2 or a[i] = a[i] + 2.)

page 51

To make the point more clear, the ``complicated expression'' without
using += would look like
yyval[yypv[p3+p4] + yypv[p1+p2]] = yyval[yypv[p3+p4] + yypv[p1+p2]] + 2
(What's going on here is that the subexpression yypv[p3+p4] + yypv[p1+p2] is
being used as a subscript to determine which cell of the yyval array to increment
by 2.)

The sentence on p. 51 that includes the words ``the assignment statement has a
value'' is a bit misleading: an assignment is really an expression in C. Like any
expression, it has a value, and it can therefore participate as a subexpression in a
larger expression. (If the distinction between the terms ``statement'' and
``expression'' seems vague, don't worry; we'll start talking about statements in the
next chapter.)

section 2.11: Conditional Expressions


``Ternary'' is a ten-dollar word meaning ``having three operands.'' (It's analogous to
the terms unary and binary, which refer to operators having one and two operands,
respectively.) The conditional operator is a bit of a frill, and it's a bit obscure, so
you may skip section 2.11 in the book on first reading, but please read the
comments in these notes just below (under the mention of ``annoying
compulsion'').

page 52

To see what the ?: operator has bought us, here is what the array-printing loop
might look like without it:
for(i = 0; i < n; i++) {
    printf("%6d", a[i]);
    if(i%10==9 || i==n-1)
        printf("\n");
    else
        printf(" ");
}

You may be finding this compulsion to write ``compact'' or ``concise'' code using
operators like ++ and += and ?: a bit annoying. There are three things to know:

1. In complicated code, these operators allow an economy of expression
which is beneficial. Mathematicians are constantly inventing new notations,
in which one letter or symbol stands for a complicated expression or
operation, in order to solve complicated problems without drowning in so
much verbiage that it would be impossible to follow an argument or check
for errors. Computer programs are large and complex, so well-chosen
abbreviations can make them easier to work with, too.
2. Some C programmers, it's true, do take the urge to write succinct or concise
code to excess, and end up with cryptic, bewildering, obfuscated,
impenetrable messes. (I'm not apologizing for them: I hate overly
abbreviated, impossible-to-read code, too!)
3. Since there is overly concise C code out there, it's occasionally necessary to
dissect a piece of it and figure out what it does, so you need to have enough
familiarity with these operators, and with some standard, idiomatic ways in
which they're commonly combined, so that you won't be utterly stymied.

However, there is nothing that says that you have to write concise code yourself.
Don't be lured into thinking that you're not a ``real C programmer'' until you
routinely and easily write code which no one else can read. Write in a style that's
comfortable to you; don't be embarrassed if your code seems ``simple.'' (Actually,
the very best code seems simple, too.) With time, you'll probably come to
appreciate at least some of the idioms, and to be comfortable enough with them
that you may want to use a few of them yourself, after all.

Section 2.12: Precedence and Order of Evaluation

Note that precedence is not the same thing as order of evaluation. Precedence
determines how an expression is parsed, and it has an influence on the order in
which parts of it are evaluated, but the influence isn't as strong as you'd think.
Precedence says that in the expression
1 + 2 * 3
the multiplication happens before the addition. But if we have several function
calls, such as
f() + g() * h()
we have no idea which function will be called first; the compiler might arrange to
call f() first even though its value won't be needed until last. If we were to write an
abomination like
i = 1;
a[i++] + a[i++] * a[i++]
we would have no way of knowing which order the three increments would happen
in, and in fact the compiler wouldn't have any idea either. We could not argue that
since multiplication has higher precedence than addition, and since multiplication
associates from left to right, the second i++ would have to happen first, then the
third, then the first. (Actually, associativity never says anything about which side
of a single binary operator gets evaluated first; associativity says which of several
adjacent same-precedence operators happens first.)

In general, you should be wary of ever trying to second-guess the relative order in
which the various parts of an expression will be evaluated, with two exceptions:

1. You can obviously assume that precedence will dictate the order in which
binary operators are applied. This typically determines not just what order
things happen in, but also what the expression actually means. (In other
words, the precedence of * over + says more than that the multiplication
``happens first'' in 1 + 2 * 3; it says that the answer is 7, not 9.)
2. You can assume that the && and || operators are evaluated left-to-right, and
that the right-hand side is not evaluated at all if the left-hand side determines
the outcome.

To look at one more example, it might seem that the code
int i = 7;
printf("%d\n", i++ * i++);
would have to print 56, because no matter which order the increments happen in,
7x8 is 8x7 is 56. But ++ just says that the increment happens later, not that it
happens immediately, so this code could print 49 (if it chose to perform the
multiplication first, and both increments later). And, it turns out that ambiguous
expressions like this are such a bad idea that the ANSI C Standard does not require
compilers to do anything reasonable with them at all, such that the above code
might end up printing 42, or 8923409342, or 0, or crashing your computer.

Finally, note that parentheses don't dictate overall evaluation order any more than
precedence does. Parentheses override precedence and say which operands go with
which operators, and they therefore affect the overall meaning of an expression,
but they don't say anything about the order of subexpressions or side effects. We
could not ``fix'' the evaluation order of any of the expressions we've been
discussing by adding parentheses. If we wrote
f() + (g() * h())

we still wouldn't know whether f(), g(), or h() would be called first. (The
parentheses would force the multiplication to happen before the addition, but
precedence already would have forced that, anyway.) If we wrote
(i++) * (i++)
the parentheses wouldn't force the increments to happen before the multiplication
or in any well-defined order; this parenthesized version would be just as undefined
as i++ * i++ was.

page 53

Deep sentence:

Function calls, nested assignment statements, and increment and decrement
operators cause ``side effects''--some variable is changed as a by-product of the
evaluation of an expression.
(There's a slight inaccuracy in this sentence: any assignment expression counts as a
side effect.)

It's these ``side effects'' that you want to keep in mind when you're making sure
that your programs are well-defined and don't suffer any of the undefined behavior
we've been discussing. (When we informally said that complex expressions had
several things going on ``at once,'' we were actually referring to expressions with
multiple side effects.) As a general rule, you should make sure that each expression
only has one side effect, or if it has several, that different variables are changed by
the several side effects.

page 54

Deep sentence:

The moral is that writing code that depends on order of evaluation is a bad
programming practice in any language. Naturally, it is necessary to know what
things to avoid, but if you don't know how they are done on various machines, you
won't be tempted to take advantage of a particular implementation.

The first edition of K&R said


...if you don't know how they are done on various machines, that innocence may
help to protect you.

I actually prefer the first edition wording. Many textbooks encourage you to write
small programs to find out how your compiler implements some of these
ambiguous expressions, but it's just one step from writing a small program to find
out, to writing a real program which makes use of what you've just learned. And
you don't want to write programs that work only under one particular compiler, that
take advantage of the way that compiler (but perhaps no other) happens to
implement the undefined expressions. It's fine to be curious about what goes on
``under the hood,'' and many of you will be curious enough about what's going on
with these ``forbidden'' expressions that you'll want to investigate them, but please
keep very firmly in mind that, for real programs, the very easiest way of dealing
with ambiguous, undefined expressions (which one compiler interprets one way
and another interprets another way and a third crashes on) is not to write them in
the first place.

Chapter 3: Control Flow

section 3.1: Statements and Blocks

section 3.2: If-Else

section 3.3: Else-If

section 3.4: Switch

section 3.5: Loops--While and For

section 3.6: Loops--Do-while

section 3.7: Break and Continue

section 3.8: Goto and Labels

section 3.1: Statements and Blocks


page 55

Deep sentence:

There is no semicolon after the right brace that ends a block.

Nothing more to say here, but it's a frequent point of confusion.

section 3.2: If-Else


The syntax description here may seem to suggest
that statement1 and statement2 must be single, simple
statements, but, as mentioned in section 3.1, a block of statements enclosed in
braces {} is equivalent to a single statement.

page 56

``Coding shortcuts'' like
if(expression)
can indeed be cryptic, but they're also quite common, so you'll need to be able to
recognize them even if you don't choose to write them in your own code.
Whenever you see code like
if (x)
or
if (f())
where x or f() do not have obvious ``Boolean'' names, just mentally add in != 0.

Don't worry too much if the multiple if/else ambiguity described on page 56
doesn't make perfect sense; just note the deep sentence:

...it's a good idea to use braces when there are nested ifs.

section 3.3: Else-If


pages 57-58

Binary search is an extremely important algorithm, but it turns out that it is subtle
to get the implementation just right. (It has been observed that although the first
binary search was published in 1946, the first published binary search without bugs
did not appear until 1962.) The basic idea is the same as the algorithm we all tend
to use when we're asked to guess a number between 1 and 100: ``Is it between 1
and 50? Yes? Okay, is it between 25 and 50? No? Okay, is it between 1 and 12? ...''
(Don't worry if you can't follow all of the details of the algorithm or the code on
page 58, but do remember to be extremely careful if you're ever asked to write a
binary search routine.)

section 3.4: Switch


pages 58-59

We won't be concentrating on switch statements much (they're a bit of a luxury;
there's nothing you can do with a switch that you can't do with an if/else chain, as
in section 3.3 on page 57). But they're quite handy, and good to know about.

The example on page 59 is about as contrived as the example in section 1.6 (page
22) which it replaces, but studying both examples will give you an excellent feel
for how a switch statement works, what the if/else statements are that a switch is
equivalent to and how to map between the two, and why a switch statement can be
convenient.

In the example in the text, note especially the way that ten case labels are attached
to one set of statements (ndigit[c-'0']++;). As the authors point out, this works
because of the way switch cases ``fall through,'' which is a mixed blessing.

The danger of fall-through is illustrated by:
switch(food) {
case APPLE:
    printf("apple\n");

case ORANGE:
    printf("orange\n");
    break;

default:
    printf("other\n");
}
When food is APPLE, this code erroneously prints
apple
orange

because the break statement after the APPLE case was omitted.

section 3.5: Loops -- While and For


page 60

Remember that, as always, the statement can be a brace-enclosed block.

Make sure you understand how the for loop
for (expr1; expr2; expr3)
    statement
is equivalent to the while loop
expr1;
while (expr2) {
    statement
    expr3;
}
There is nothing magical about the three expressions at the top of a for loop; they
can be arbitrary expressions, and they're evaluated just as the expansion into the
equivalent while loop would suggest. (Actually, there are two tiny differences: the
behavior of continue, which we'll get to in a bit, and the fact that the test
expression, expr2, is optional and defaults to ``true'' for a for loop, but
is required for a while loop.)

for(;;) is one way of writing an infinite loop in C; the other common one
is while(1). Don't worry about what a break would mean in a loop; we'll be seeing
it in a few more pages.

pages 60-61

Deep sentences:

Whether to use while or for is largely a matter of personal preference...


Nonetheless, it is bad style to force unrelated computations into the initialization
and increment of a for, which are better reserved for loop control operations.
In general, the three expressions in a for loop should all manipulate (initialize, test,
and increment) the same variable or data structure. If they don't, they are
``unrelated computations,'' and a while loop would probably be clearer. (The
reason that one loop or the other can be clearer is simply that, when you see
a for loop, you expect to see an idiomatic initialize/test/increment of a single
variable or data structure, and if the for loop you're looking at doesn't end up
matching that pattern, you've been momentarily misled.)

page 61

When the authors say that ``the index and limit of a C for loop can be altered from
within the loop,'' they mean that a loop like
int i, n = 10;
for(i = 0; i < n; i++) {
    if(i == 5)
        i++;
    printf("%d\n", i);
    if(i == 8)
        n++;
}

where i and n are modified within the loop, is legal. (Obviously, such a loop can be
very confusing, so you'll probably be better off not making use of this freedom too
much.)

When they say that ``the index variable... retains its value when the loop terminates
for any reason,'' you may not find this too surprising, unless you've used other
languages where it's not the case. The fact that loop control variables retain their
values after a loop can make some code much easier to write; for example,
the atoi function at the bottom of this page depends on having its i counter
manipulated by several loops as it steps over three different parts of the string
(whitespace, sign, digits) with i's value preserved between each step.

Deep sentence:

Each step does its part, and leaves things in a clean state for the next.
This is an extremely important observation on how to write clean code. As you
study the atoi code, notice that it falls into three parts, each implementing one step
of the pseudocode description: skip white space, get sign, get integer part and
convert it. At each step, i points at the next character which that step is to inspect.
(If a step is skipped, because there is no leading whitespace or no sign, the later
steps don't care.)

You may hear the term invariant used: this refers to some condition which exists at
all stages of a program or function. In this case, the invariant is that i always points
to the next character to be inspected. Having some well-chosen invariants can
make code much easier to write and maintain. If there aren't enough invariants--
if i is sometimes the next character to look at and sometimes the character that was
just looked at--debugging and maintaining the code can be a nightmare.

In the atoi example, the lines
for (i = 0; isspace(s[i]); i++) /* skip white space */
    ;

are about at the brink of ``forcing unrelated computations into the initialization and
increment,'' especially since so much has been forced into the loop header that
there's nothing left in the body. It would be equally clear to write this part as
i = 0;
while (isspace(s[i]))
    i++; /* skip white space */

The line
sign = (s[i] == '-') ? -1 : 1;
may seem a bit cryptic at first, though it's a textbook example of the use of ?: . The
line is equivalent to
sign = 1;
if(s[i] == '-')
    sign = -1;

pages 61-62

It's instructive to study this Shell or ``gap'' sort, but don't worry if you find it a bit
bewildering.

Deep sentence:

Notice how the generality of for makes the outer loop fit the same form as the
others, even though it is not an arithmetic progression.
The point is that loops don't have to count 0, 1, 2... or 1, 2, 3... . (This one
counts n/2, n/4, n/8... . Later we'll see loops which don't step over numbers at all.)

page 63

Deep sentence:

The commas that separate function arguments, variables in declarations, etc.
are not comma operators...
This looks strange, but it's true. If you say
for (i = 0, j = strlen(s)-1; i < j; i++, j--)
the first comma says to do i = 0 then do j = strlen(s)-1, and the second comma
says to do i++ then do j--. However, when you say
getline(line, MAXLINE);

the comma just separates the two arguments line and MAXLINE; they both have to be
evaluated, but it doesn't matter in which order, and they're both passed to getline.
(If the comma in a function call were interpreted as a comma operator, the function
would only receive one argument, since the value of the first operand of the
comma operator is discarded.) Since the comma operator discards the value of its
first operand, its first operand had better have a side effect. The expression
++a,++b
increments a and increments b and (if anyone cares) returns b's value, but the
expression
a+1,b+1
adds 1 to a, discards it, and returns b+1.

If the comma operator isn't making perfect sense, don't worry about it for now.
You're most likely to see it in the first or third expression of a for statement, where
it has the obvious meaning of separating two (or more) things to do during the
initialization or increment step. Just be careful that you don't accidentally write
things like
for(i = 0; j = 0; i < n && j < m; i++; j++) /* WRONG */
or
for(i = 0, j = 0, i < n && j < m, i++, j++) /* WRONG */
The correct form of a multi-index loop is something like
for(i = 0, j = 0; i < n && j < m; i++, j++)

Semicolons always separate the initialization, test, and increment parts; commas
may appear within the initialization and increment parts.

section 3.6: Loops -- Do-while


page 63

Note the semicolon following the parenthesized expression in the do-while loop;
it's a required part of the syntax.

Make sure you understand the difference between a while loop and a do-while
loop. A while loop executes strictly according to its conditional expression:
if the expression is never true, the loop executes zero times. The do-while loop, on
the other hand, makes an initial ``no peek'' foray through the loop body no matter
what.

To see the difference, let's imagine three different ways of writing the loop in
the itoa function on page 64. Suppose we somehow forgot to use a termination
condition at all, and wrote something like
for(;;) {
    s[i++] = n % 10 + '0';
    n /= 10;
}
Eventually, n becomes zero, but we keep going around the loop, and we convert a
number like 123 into a string like "0000000000123", except with an infinite number
of leading zeroes. (Mathematically, this is correct, but it's not what we want here,
especially if we want our program to use a finite amount of time and space.)

Our next attempt might be


while(n > 0) {
    s[i++] = n % 10 + '0';
    n /= 10;
}
so that we stop creating digits when n reaches 0. This works fine for positive
numbers, but for 0, it stops too soon: it would convert the number 0 to the empty
string "". That's why the do-while loop is appropriate here; the fact that it always
makes at least one pass through the loop makes sure that we always generate at
least one digit, even if it's 0.

(It's also useful to look at the invariants in this loop: during each trip through the
loop, n contains the rest of the number we have to convert, s[] contains the digits
we've already converted, and i points at the next cell in s[] which is to receive a
digit. Each trip through the loop converts one digit, increments i, and divides n by
10.)

section 3.7: Break and Continue


pages 64-65

Note that a break inside a switch inside a loop causes a break out of the switch,
while a break inside a loop inside a switch causes a break out of the loop.

Neither break nor continue has any effect on a brace-enclosed block of statements
following an if. break causes a break out of the innermost switch or loop,
and continue forces the next iteration of the innermost loop.

There is no way of forcing a break or continue to act on an outer loop.

Another example of where continue is useful is when processing data files. It's
often useful to allow comments in data files; one convention is that a line
beginning with a # character is a comment, and should be ignored by any program
reading the file. This can be coded with something like

while(getline(line, MAXLINE) > 0) {
    if(line[0] == '#')
        continue;

    /* process data file line */
}
The alternative, without a continue, would be
while(getline(line, MAXLINE) > 0) {
    if(line[0] != '#') {
        /* process data file line */
    }
}

but now the processing of normal data file lines has been made subordinate to
comment lines. (Also, as the authors note, it pushes most of the body of the loop
over by another tab stop.) Since comments are exceptional, it's nice to test for
them, get them out of the way, and go on about our business, which the code
using continue nicely expresses.

section 3.8: Goto and Labels


pages 65-66

A tremendous amount of impassioned debate surrounds the lowly goto statement,
which exists in many languages. Some people say that gotos are fine; others say
that they must never be used. You should definitely shy away from gotos, but don't
be dogmatic about it; some day, you'll find yourself writing some screwy piece of
code where trying to avoid a goto (by introducing extra tests or Boolean control
variables) would only make things worse.

page 66

When you find yourself writing several nested loops in order to search for
something, such that you would need to use a goto to break out of all of them when
you do find what you're looking for, it's often an excellent idea to move the search
code out into a separate function. Doing so can make both the ``found'' and ``not
found'' cases easier to handle. Here's a slight variation on the example in the
middle of page 66, written as a function:
/* return i such that a[i] == b[j] for some j, or -1 if none */

int findequal(int a[], int na, int b[], int nb)
{
    int i, j;

    for(i = 0; i < na; i++) {
        for(j = 0; j < nb; j++) {
            if(a[i] == b[j])
                return i;
        }
    }

    return -1;
}
This function can then be called as
i = findequal(a, na, b, nb);
if(i == -1)
/* didn't find any common element */
else /* got one */

(The only disadvantage here is that it's trickier to return i and j if we need them
both.)

Chapter 4: Functions and Program Structure


page 67

Deep paragraph:

Functions break large computing tasks into smaller ones, and
enable people to build on what others have done instead of
starting over from scratch. Appropriate functions hide details of
operation from parts of the program that don't need to know
about them, thus clarifying the whole, and easing the pain of
making changes.

Functions are probably the most important weapon in our battle
against software complexity. You'll want to learn when it's
appropriate to break processing out into functions (and also when
it's not), and how to set up function interfaces to best achieve the
qualities mentioned above: reusability, information hiding,
clarity, and maintainability.

The quoted sentences above show that a function does more than just save typing:
a well-defined function can be re-used later, and eases the mental burden of
thinking about a complex program by freeing us from having to worry about all of
it at once. For a well-designed function, at any one time, we should have to
think about either:

1. that function's internal implementation (when we're writing
or maintaining it); or
2. a particular call to the function (when we're working with
code which uses it).

But we should not have to think about the internals when we're
calling it, or about the callers when we're implementing the
internals. (We should perhaps think about the callers just enough
to ensure that the function we're designing will be easy to call,
and that we aren't accidentally setting things up so that callers will have
to think about any internal details.)

Sometimes, we'll write a function which we only call once, just because breaking it
out into a function makes things clearer and easier.

Deep sentence:

C has been designed to make functions efficient and easy to use;
C programs generally consist of many small functions rather than
a few big ones.

Some people worry about ``function call overhead,'' that is, the
work that a computer has to do to set up and return from a
function call, as opposed to simply doing the function's
statements in-line. It's a risky thing to worry about, though,
because as soon as you start worrying about it, you have a bit of
a disincentive to use functions. If you're reluctant to use
functions, your programs will probably be bigger and more
complicated and harder to maintain (and perhaps, for various
reasons, actually less efficient).

The authors choose not to get involved with the system-specific aspects of separate
compilation, but we'll take a stab at it here. We'll cover two possibilities,
depending on whether you're using a traditional command-line compiler or a newer
integrated development environment (IDE) or other graphical user interface (GUI)
compiler.

When using a command-line compiler, there are usually two main steps involved
in building an executable program from one or more source files. First, each source
file is compiled, resulting in an object file containing the machine instructions
(generated by the compiler) corresponding to the code in that source file. Second,
the various object files are linked together, with each other and
with libraries containing code for functions which you did not write (such
as printf), to produce a final, executable program.

Under Unix, the cc command can perform one or both steps. So far, we've been
using extremely simple invocations of cc such as
cc hello.c
(section 1.1, page 6). This invocation compiles a single source file,
links it, and places the executable (somewhat inconveniently) in a
file named a.out.

Suppose we have a program which we're trying to build from three separate source
files, x.c, y.c, and z.c. We could compile all three of them, and link them together,
all at once, with the command
cc x.c y.c z.c
(see also page 70). Alternatively, we could compile them
separately: the -c option to cc tells it to compile only, but not to
link. Instead of building an executable, it merely creates an object
file, with a name ending in .o, for each source file compiled. So
the three commands
cc -c x.c
cc -c y.c
cc -c z.c
would compile x.c, y.c, and z.c and create object files x.o, y.o,
and z.o. Then, the three object files could be linked together using
cc x.o y.o z.o
When the cc command is given an .o file, it knows that it does not
have to compile it (it's an object file, already compiled); it just
sends it through to the link process.

Here we begin to see one of the advantages of separate compilation: if we later
make a change to y.c, only it will need recompiling. (At some point you may want
to learn about a program called make, which keeps track of which parts need
recompiling and issues the appropriate commands for you.)

Above we mentioned that the second, linking step also involves pulling in library
functions. Normally, the functions from the Standard C library are linked in
automatically. Occasionally, you must request a library manually; one common
situation under Unix is that certain math routines are in a separate math library,
which is requested by using -lm on the command line. Since the libraries must
typically be searched after your program's own object files are linked (so that the
linker knows which library functions your program uses), any -l option must
appear after the names of your files on the command line. For example, to link the
object file mymath.o (previously compiled with cc -c mymath.c) together with the
math library, you might use
cc mymath.o -lm

Two final notes on the Unix cc command: if you're tired of using the nonsense
name a.out for all of your programs, you can use -o to give another name to the
output (executable) file:
cc -o hello hello.c
would create an executable file named hello, not a.out. Finally,
everything we've said about cc also applies to most other Unix C
compilers. Many of you will be using acc (a semistandard name for
a version of cc which does accept ANSI Standard C) or gcc (the
FSF's GNU C Compiler, which also accepts ANSI C and is free).

There are command-line compilers for MS-DOS systems which work similarly.
For example, the Microsoft C compiler comes with a CL (``compile and link'')
command, which works almost the same as Unix cc. You can compile and link in
one step:
cl hello.c
or you can compile only:
cl /c hello.c
creating an object file named hello.obj which you can link later.

The preceding has all been about command-line compilers. If you're using some
kind of integrated development environment, such as Turbo C or the Microsoft
Programmer's Workbench or Think C, most of the mechanical details are taken
care of for you. (There's also less I can say here about these environments, because
they're all different.) Typically there's a way to specify the list of files (modules)
which make up your project, and a single ``build'' button which does whatever's
required to build (and perhaps even execute) your program.

section 4.1: Basics of Functions

section 4.2: Functions Returning Non-Integers

section 4.3: External Variables

section 4.4: Scope Rules

section 4.5: Header Files

section 4.6: Static Variables

section 4.7: Register Variables

section 4.8: Block Structure

section 4.9: Initialization

section 4.10: Recursion

section 4.11: The C Preprocessor

section 4.11.1: File Inclusion

section 4.11.2: Macro Substitution

section 4.11.3: Conditional Inclusion

section 4.1: Basics of Functions


page 68

Once again, notice how a clear, simple description of the problem we're trying to
solve leads to an (almost) equally clear program implementing it.

Here are some more nice statements about the virtues of a clean, modular design:

Although it's certainly possible to put the code for all of this in main, a better way is
to use the structure to advantage by making each part a separate function. Three
small pieces are easier to deal with than one big one, because irrelevant details can
be buried in the functions, and the chance of unwanted interactions is minimized.
And the pieces may even be useful in other programs.

Let's say a bit more about how and why functions can be useful. First, we can see
that, having chosen to use a separate function for each part of the print-matching-
lines program, the top-level main routine on page 69 is particularly simple and
straightforward; it's little more than a transcription into C of the pseudocode on
page 68. The authors don't tend to use too many comments in their code, anyway,
but this code hardly needs any: the names of the functions called speak for
themselves. (The only thing that might not be obvious at first is that strindex is
being used not so much to find the index of a substring but just to determine
whether a substring is present at all.) Second, we may be pleased to notice that
we're already having a chance to re-use the getline function we first wrote in
Chapter 1. Third, we note that the two functions which we've chosen to use
(getline and strindex) are themselves reasonably simple and straightforward to
write. Finally, note that sometimes what you re-use is not so much a function as a
function interface. The code on page 69 uses a new implementation of getline, but
the interface (the argument list, return value, and functionality) is the same as for
the versions of getline in section 1.9 on page 29. We could have used that version
here, or this new version there. Later, if we think of some even better way of
reading lines, we can write yet another version of getline, and as long as it has the
same interface, these programs can call it without their having to be rewritten.

The ease with which a program like this comes together may be mildly deceptive,
because nowhere have we discussed the motivations which led to the particular
pseudocode description on page 68 or to the particular definitions of the functions
into which the problem was broken down. Choosing a design for a
program, and defining subfunctions (their interfaces and their behavior) are both
arts, and of course the tasks are not unrelated. A good design leads to the invention
of functions which might well be useful later, and an existing body of good,
general-purpose functions (all crying out to be re-used) can help to guide the
design of the next program.

What makes a good building block, either an abstract one that we use in a
pseudocode description, or a concrete one in the form of a general-purpose
function? The most important aspect of a good building block is that it has a single,
well-defined task to perform. Two of the three functions used in the line-matching
program fill this role very well: getline's job is to read one line, and strindex's
job is to find one string in another string. printf's specification is considerably
broader: its job is to print stuff. (It's not surprising that printf can therefore be the
harder routine to call, and is certainly much harder to implement. Its saving virtue
is that it is nonetheless broadly applicable and infinitely reusable.)

When you find that a program is hard to manage, it's often because it has not been
designed and broken up into functions cleanly. Two obvious reasons for moving
code down into a function are that:

1. It appeared in the main program several times, such that by making it a function,
it can be written just once, and the several places where it used to appear can be
replaced with calls to the new function.

2. The main program was getting too big, so it could be made (presumably) smaller
and more manageable by lopping part of it off and making it a function.

These two reasons are important, and they represent significant benefits of well-
chosen functions, but they are not sufficient to automatically identify a good
function. A good function has at least these two additional attributes:

3. It does just one well-defined task, and does it well.

4. Its interface to the rest of the program is clean and narrow.

Attribute 3 is just a restatement of something we said above. Attribute 4 says that
you shouldn't have to keep track of too many things when calling a function. If you
know what a function is supposed to do, and if its task is simple and well-defined,
there should be just a few pieces of information you have to give it to act upon, and
one or just a few pieces of information which it returns to you when it's done. If
you find yourself having to pass lots and lots of information to a function, or
remember details of its internal implementation to make sure that it will work
properly this time, it's often a sign that the function is not sufficiently well-defined.
(It may be an arbitrary chunk of code that was ripped out of a main program that
was getting too big, such that it essentially has to have access to all of that main
function's variables.)

The whole point of breaking a program up into functions is so that you don't have
to think about the entire program at once; ideally, you can think about just one
function at a time. A good function is a ``black box'': when you call it, you only
have to know what it does (not how it does it); and when you're writing it, you only
have to know what it's supposed to do (and you don't have to know why or under
what circumstances its caller will be calling it). Some functions may be hard to
write (if they have a hard job to do, or if it's hard to make them do it truly well),
but that difficulty should be compartmentalized along with the function itself.
Once you've written a ``hard'' function, you should be able to sit back and relax and
watch it do that hard work on call from the rest of your program. If you find that
difficulties pervade a program, that the hard parts can't be buried inside black-box
functions and then forgotten about, if you find that there are hard parts which
involve complicated interactions among multiple functions, then the program
probably needs redesigning.

For the purposes of explanation, we've been seeming to talk so far only about
``main programs'' and the functions they call and the rationale behind moving some
piece of code down out of a ``main program'' into a function. But in reality, there's
obviously no need to restrict ourselves to a two-tier scheme. The ``main
program,'' main(), is itself just a function, and any function we find ourselves writing
will often be appropriately written in terms of sub-functions, sub-sub-functions,
etc.

That's probably enough for now about functions in general. Here are a few more
notes about the line-matching program.

The authors mention that ``The standard library provides a function strstr that is
similar to strindex, except that it returns a pointer instead of an index.'' We haven't
met pointers yet (they're in chapter 5), so we aren't quite in a position to appreciate
the difference between an index and a pointer. Generally, an index is a small
number referring to some element of an array. A pointer is more general: it can
point to any data object of a particular type, whether it's one element of an array, or
some other object anywhere in memory. (Don't worry too much about the
distinction yet, but bear in mind that there is a distinction. Note, too, that the
distinction is not absolute; in fact, the word ``index'' seems to derive from the
concept of pointing, as you can see if you think about what you use your index
finger for, or if you notice that the entries in a book's index point at the referenced
parts of the book. We frequently speak casually of an index variable ``pointing at''
some cell of an array, even though it's not a true pointer variable.)

One facet of the getline function's interface might bear mentioning: its first
argument, the character array s, is being used to return the line that it reads. This
may seem to contradict the rule that a function can never modify the value of a
variable in its caller. As was briefly mentioned on page 28, there's an exception for
arrays, which we'll be learning about in chapter 5; for now, we'll gloss over the
point. (Actually, we're glossing over two points: not only is getline able to return a
value via an argument, but the argument isn't really an array, although it's declared
as and looks like one. Please forgive these gentle fictions; explaining them
completely would really be premature at this point. Perhaps they weren't worth
mentioning yet, after all.)

For comparison, here is yet another version of getline:


int getline(char s[], int lim)
{
    int c, i = 0;

    while(--lim > 0 && (c=getchar()) != EOF) {
        s[i++] = c;
        if(c == '\n')
            break;
    }

    s[i] = '\0';

    return i;
}
Note that by using break, we avoid having to test for '\n' in two different places.

If you're having trouble seeing how the strindex function works, its algorithm is
for (each position i in s)
    if (t occurs at position i in s)
        return i;

(else) return -1;

Filling in the details of ``if (t occurs at position i in s)'', we have:

for (each position i in s)
    for (each character in t)
        if (it matches the corresponding character in s)
            if (it's '\0')
                return i;
            else keep going
        else no match at position i

(else) return -1;


A slightly less compressed implementation than the one on page 69 would be:
int strindex(char s[], char t[])
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        for(j = i, k = 0; t[k] != '\0'; j++, k++)
            if(s[j] != t[k])
                break;

        if(t[k] == '\0')
            return i;
    }

    return -1;
}
Note that we have to check for the end of the string t twice: once to see if we're at
the end of it in the innermost loop, and again to see why we terminated the
innermost loop. (If we terminated the innermost loop because we reached the end
of t, we found a match; otherwise, we didn't.) We could rearrange things to remove
the duplicated test:
int strindex(char s[], char t[])
{
    int i, j, k;

    for (i = 0; s[i] != '\0'; i++) {
        j = i;
        k = 0;

        do {
            if(t[k] == '\0')
                return i;
        } while(s[j++] == t[k++]);
    }

    return -1;
}
It's a matter of style which implementation of strindex is preferable; it's
impossible to say which is ``best.'' (Can you see a slight difference in the behavior
of the version on page 69 versus the two here? Under what circumstance(s) would
this difference be significant? How would the version on page 69 behave under
those circumstances, and how would the two routines here behave?)

page 70

Deep sentence:

A program is just a set of definitions of variables and functions.


This sentence may or may not seem deep, and it may or may not be deep, but it's a
fundamental definition of what a C program is.

Note that a function's return value is automatically converted to the return type of
the function, if necessary, just as in assignments like
f = i;
where f is float and i is int.

Most programmers do use parentheses around the expression in a return statement,
because that way it looks more like while(), for(), etc. The reason the parentheses
are optional is that the formal syntax is
return expression ;
and, as we know, any expression surrounded by parentheses is another expression.

It's debatable whether it's ``not illegal'' for a function to have return statements
with and without values. It's a ``sign of trouble'' at best, and undefined at worst.
Another clear sign of trouble (which is equally undefined) is when a function
returns no value, or is declared as void, but a caller attempts to use the return value.

The main program on page 69 returns the number of matching lines found. This is
probably better than returning nothing, but the convention is usually that a C
program returns 0 when it succeeds and a positive number when it fails.

section 4.2: Functions Returning Non-Integers


page 71

Actually, we may have seen at least one function returning a non-integer, in the
Fahrenheit-Celsius conversion program in exercise 1-15 on page 27 in section 1.7.

The type name which precedes the name of a function (and which sets its return
type) looks just like (i.e. is syntactically the same as) the void keyword we've been
using to identify functions which don't return a value.

Note that the version of atof on page 71 does not handle exponential notation
like 1.23e45; handling exponents is left for exercise 4-2 on page 73.

``The standard library includes an atof'' means that we're reimplementing
something which would otherwise be provided for us anyway (i.e. just like printf).
In general, it's a bad idea to rewrite standard library routines, because by doing so
you negate the advantage of having someone else write them for you, and also
because the compiler or linker is allowed to complain if you redefine a standard
routine. (On the other hand, seeing how the standard library routines are
implemented can be a good learning experience.)

page 72

In the ``primitive calculator'' code at the top of page 72, note that the call to atof is
buried in the argument list of the call to printf.

Deep sentences:

The function atof must be declared and defined consistently. If atof itself and the
call to it in main have inconsistent types in the same source file, the error will be
detected by the compiler. But if (as is more likely) atof were compiled separately,
the mismatch would not be detected, atof would return a double that main would
treat as an int, and meaningless answers would result.
The problems of mismatched function declarations are somewhat reduced today by
the widespread use of ANSI function prototypes, but they're still important to be
aware of.

The implicit function declarations mentioned at the bottom of page 72 are an older
feature of the language. They were handy back in the days when most functions
returned int and function prototypes hadn't been invented yet, but today, if you
want to use prototypes, you won't want to rely on implicit declarations. If you don't
like depending on defaults and implicit declarations, or if you do want to use
function prototypes religiously, you're under no compulsion to make use of (or
even learn about) implicit function declarations, and you'll want to configure your
compiler so that it will warn you if you call a function which does not have an
explicit, prototyped declaration in scope.

You may wonder why the compiler is able to get some things right (such as
implicit conversions between integers and floating-point within expressions)
whether or not you're explicit about your intentions, while in other circumstances
(such as while calling functions returning non-integers) you must be explicit. The
question of when to be explicit and when to rely on the compiler hinges on several
questions:

1. How much information does the compiler have available to it?
2. How likely is it that the compiler will infer the right action?
3. How likely is it that a mistake which you the programmer might make will
be caught by the compiler, or silently compiled into incorrect code?
Page 140 of 325
C PROGRAMMING

This book is by Steve Summit // Copyright 1995, 1996 // mail


feedback

It's fine to depend on things like implicit conversions as long as the compiler has
all the information it needs to get them right, unambiguously. (Relying on implicit
conversions can make code cleaner, clearer, and easier to maintain.) Relying on
implicit declarations, however, is discouraged, for several reasons. First, there are
generally fewer declarations than expressions in a program, so the impact (i.e.
work) of making them all explicit is less. Second, thinking about declarations is
good discipline, and requiring that everything normally be declared explicitly can
let the compiler catch a number of errors for you (such as misspelled functions or
variables). Finally, since the compiler only compiles one source file at a time, it is
never able to detect inconsistencies between files (such as a function or variable
declared one way in one source file and some other way in another), so it's
important that cross-file declarations be explicit and consistent. (Various strategies,
such as placing common declarations in header files so that they can be #included
wherever they're needed, and requesting that the compiler warn about function
calls without prototypes in scope, can help to reduce the number of errors having to
do with improper declarations.)

For the most part, you can also ignore the ``old style'' function syntax, which
hardly anyone is using any more. The only thing to watch out for is that an empty
set of parentheses in a function declaration is an old-style declaration and means
``unspecified arguments,'' not ``no arguments.'' To declare a new-style function
taking no arguments, you must include the keyword void between the parentheses,
which makes the lack of arguments explicit. (A declaration like
int f(void);
does not declare a function accepting one argument of type void, which would be
meaningless, since the definition of type void is that it is a type with no values.
Instead, as a special case, a single, unnamed parameter of type void indicates that a
function takes no arguments.) For example, the definition of the getchar function
might look like
int getchar(void)
{
    int c;

    read next character into c somehow

    if (no next character)
        return EOF;

    return c;
}

page 73

Note that this version of atoi, written in terms of atof, has very slightly different
behavior: it reads past a '.' (and, assuming a fully-functional version of atof, an 'e').

The use of an explicit cast when returning a floating-point expression from a
routine declared as returning int represents another point on the spectrum of what
you should worry about explicitly versus what you should feel comfortable making
use of implicitly. This is a case where the compiler can do the ``right thing'' safely
and unambiguously, as long as what you said (in this case, to return a floating-
point expression from a routine declared as returning int) is in fact what
you meant. But since the real possibility exists that discarding the fractional part is
not what you meant, some compilers will warn you about it. Typically, compilers
which warn about such things can be quieted by using an explicit cast; the explicit
cast (even though it appears to ask for the same conversion that would have
happened implicitly) serves to silence the warning. (In general, it's best to silence
spurious warnings rather than just ignoring them. If you get in the habit of ignoring
them, sooner or later you'll overlook a significant one that you would have cared
about.)
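
To make this concrete, here is a minimal sketch of an atoi written in terms of atof, with the cast spelled out. The name atoi_via_atof is made up for this illustration, so that it doesn't collide with the library's own atoi.

```c
#include <stdlib.h>     /* declares double atof(const char *) */

/* Hypothetical name, chosen to avoid clashing with the library's atoi. */
int atoi_via_atof(const char *s)
{
    /* The cast asks for the same conversion that would happen implicitly;
       it records that discarding the fractional part is intended. */
    return (int) atof(s);
}
```

With this sketch, atoi_via_atof("42.9") yields 42, and atoi_via_atof("1e2") yields 100, since atof understands the exponent.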

section 4.3: External Variables


The word ``external'' is, roughly speaking, equivalent to ``global.''

page 74

A program with ``too many data connections between functions'' hasn't managed to
achieve the desirable attributes we were talking about earlier, in particular that a
function's ``interface to the rest of the program is clean and narrow.'' Another bit of
jargon you may hear is the word ``coupling,'' which refers to how much one piece
of a program has to know about another.

As we have mentioned, the connections between functions should
generally be few and well-defined, in which case they will be amenable to regular
old function arguments, and you won't be tempted to pass lots of data around in
global variables. (On the other hand, global variables are fine for some things, such
as configuration information which the whole program cares about and which is set
just once at program startup and then doesn't change.)

The word ``lifetime'' refers to how long a variable and its value stick around. (The
jargon term is ``duration.'') So far, we've seen that global variables persist for the
life of the program, while local variables last only as long as the functions defining
them are active. However, lifetime (duration) is a separate and orthogonal concept
from scope; we'll soon be meeting local variables which persist for the life of the
program.

Deep sentence:

Thus if two functions must share some data, yet neither calls the other, it is often
most convenient if the shared data is kept in external variables rather than passed
in and out via arguments.
(Later, though, we'll learn about data structures which can make it more convenient
to pass certain data around via function arguments, so we'll have less reason for
using external variables for these sorts of purposes.)

``Reverse Polish'' is used by some (earlier, all) Hewlett-Packard calculators. (The
name is based on the nationality of the mathematician who studied and formalized
this notation.) It may seem strange at first, but it's natural if you observe that you
need both numbers (operands) before you can carry out an operation on them.
(This fact is one of the reasons that reverse Polish notation is ``easier to
implement.'')

The calculator example is a bit long and a bit involved, but I urge you to work
through and understand it. A calculator is something that everyone's likely to be
familiar with; it's interesting to see how one might work inside; and the techniques
used here are generally useful in all sorts of programs.

A ``stack'' is simply a last-in, first-out list. You ``push'' data items onto a stack, and
whenever you ``pop'' an item from the stack, you get the one most recently pushed.
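
A minimal sketch of such a stack, holding doubles in a fixed-size array, might look like the following. The names push, pop, sp, val, and MAXVAL follow the calculator example; the value 100 for MAXVAL and the wording of the error messages are assumptions.

```c
#include <stdio.h>

#define MAXVAL 100      /* maximum depth of the stack (assumed value) */

int sp = 0;             /* next free stack position */
double val[MAXVAL];     /* the stack itself */

/* push: put f on top of the stack */
void push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
    else
        printf("error: stack full, can't push %g\n", f);
}

/* pop: remove and return the most recently pushed value */
double pop(void)
{
    if (sp > 0)
        return val[--sp];
    printf("error: stack empty\n");
    return 0.0;
}
```

After push(1.0) and push(2.0), the first call to pop returns 2.0 and the second returns 1.0.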

pages 76-79

The code for the calculator may seem daunting at first, but it's much easier to
follow if you look at each part in isolation (as good functions are meant to be
looked at), and notice that the routines fall into three levels. At the top level is the
calculator itself, which resides in the function main. The main function calls three
lower-level functions: push, pop, and getop. getop, in turn, is written in terms of the
still lower-level functions getch and ungetch.

A few details of the communication among these functions deserve mention.


The getop routine actually returns two values. Its formal return value is a character
representing the next operation to be performed. Usually, that character is just the
character the user typed, that is, +, -, *, or /. In the case of a number typed by the
user, the special code NUMBER is returned (which happens to be #defined to be the
character '0', but that's arbitrary). A return value of NUMBER indicates that an entire
string of digits has been typed, and the string itself is copied into the array s passed
to getop. In this case, therefore, the array s is the second return value.

In some printings, the second line on page 76 reads
#include <math.h> /* for atof() */
which is incorrect; it should be
#include <stdlib.h> /* for atof() */

page 77

Make sure you understand why the code
push(pop() - pop()); /* WRONG */
might not work correctly.
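
The trouble is that C does not say which of the two calls to pop happens first, so the operands of the subtraction might be fetched in either order. One way around this is to pop the right-hand operand into a variable first. This is only a sketch: the names op2 and subtract_top_two are chosen for illustration, and the tiny stand-in stack has no error checking.

```c
/* Tiny stand-in stack so that the example is self-contained. */
double val[16];
int sp = 0;

void push(double f) { val[sp++] = f; }
double pop(void)    { return val[--sp]; }

/* Replaces the top two values a, b (b pushed last) with a - b. */
void subtract_top_two(void)
{
    double op2 = pop();     /* right operand: the most recently pushed value */

    push(pop() - op2);      /* only one pop() remains, so there is no ordering question */
}
```

For commutative operators such as + and *, the order of the two pops makes no difference, which is why only - and / need this treatment.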

``The representation can be hidden'' means that the declarations of these variables
can follow main in the file, such that main can't ``see'' them (that is, can't attempt to
refer to them). Furthermore, as we'll see, the declarations might be moved to a
separate source file, and main won't care.

pages 77-78

Note that getop does not incorporate the functionality of atoi or atof--it collects
and returns the digits as a string, and main calls atof to convert the string to a
floating-point number (prior to pushing it on the stack). (There's nothing profound
about this arrangement; there's no particular reason why getop couldn't have been
set up to do the conversion itself.)

The reasons for using a routine like ungetch are good and sufficient, but they may
not be obvious at first. The essential motivation, as the authors explain, is that
when we're reading a string of digits, we don't know when we've reached the end
of the string of digits until we've read a non-digit, and that non-digit is not part of
the string of digits, so we really shouldn't have read it yet, after all. The rest of the
program is set up based on the assumption that one call to getop will return the
string of digits, and the next call will return whatever operator followed the string
of digits.

To understand why the surprising and perhaps kludgey-sounding
getch/ungetch approach is in fact a good one, let's consider the
alternatives. getop could keep track of the one-too-far character somehow, and
remember to use it next time instead of reading a new character. (Exercise 4-11
asks you to implement exactly this.) But this arrangement of getop is considerably
less clean from the standpoint of the ``invariants'' we were discussing
earlier. getop can be written relatively cleanly if one of its invariants is that the
operator it's getting is always formed by reading the next character(s) from the
input stream. getop would be considerably messier if it always had to remember to
use an old character if it had one, or read a new character otherwise. If getop were
modified later to read new kinds of operators, and if reading them involved reading
more characters, it would be easy to forget to take into account the possibility of an
old character each time a new character was needed. In other words, everywhere
that getop wanted to do the operation
read the next character
it would instead have to do
if (there's an old character)
    use it
else
    read the next character
It's much cleaner to push the checking for an old character down into
the getch routine.

Devising a pair of routines like getch and ungetch is an excellent example of the
process of abstraction. We had a problem: while reading a string of digits, we
always read one character too far. The obvious solution--remembering the one-too-
far character and using it later--would have been clumsy if we'd implemented it
directly within getop. So we invented some new functions to centralize and
encapsulate the functionality of remembering accidentally-read characters, so
that getop could be written cleanly in terms of a simple ``get next character''
operation. By centralizing the functionality, we make it easy for getop to use it
consistently, and by encapsulating it, we hide the (potentially ugly) details from the
rest of the program. getch and ungetch may be tricky to write, but once we've
written them, we can seal up the little black boxes they're in and not worry about
them any more, and the rest of the program (especially getop) is cleaner.
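
For reference, a pair of routines along these lines might be sketched as follows. The names getch, ungetch, buf, and bufp are the ones used in the text; the buffer size of 100 and the wording of the error message are assumptions.

```c
#include <stdio.h>

#define BUFSIZE 100     /* pushback capacity (assumed value) */

char buf[BUFSIZE];      /* characters that were read too far */
int bufp = 0;           /* next free position in buf */

/* getch: get a (possibly pushed-back) character */
int getch(void)
{
    return (bufp > 0) ? buf[--bufp] : getchar();
}

/* ungetch: push a character back so that a later getch returns it */
void ungetch(int c)
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
```

A caller that has read one character too far simply hands it to ungetch; the next call to getch returns that character instead of reading new input.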

page 79

If you're not used to the conditional operator ?: yet, here's how getch would look
without it:

int getch(void)
{
    if (bufp > 0)
        return buf[--bufp];
    else
        return getchar();
}
Also, the extra generality of these two routines (namely, that they can push back
and remember several characters, a feature which the calculator program doesn't
even use) makes them a bit harder to follow. Exercise 4-8 asks you to write
simpler versions which allow only one character of pushback. (Also, as the text
notes, we don't really have to be writing ungetch at all, because the standard library
already provides an ungetc which can provide one character of pushback
for getchar.)
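
As a rough illustration of the library alternative, here is a sketch that reads a string of digits from a stream and uses ungetc to return the one-too-far character. The function name read_digits is hypothetical, and the sketch ignores signs, fractions, and overflow.

```c
#include <ctype.h>
#include <stdio.h>

/* Reads a run of decimal digits from fp and returns its value.
   The first non-digit is pushed back so that the next read sees it. */
long read_digits(FILE *fp)
{
    long value = 0;
    int c;

    while ((c = getc(fp)) != EOF && isdigit(c))
        value = value * 10 + (c - '0');
    if (c != EOF)
        ungetc(c, fp);      /* one character of pushback is guaranteed */
    return value;
}
```

If the input is 123+, read_digits returns 123 and the next getc on the same stream returns the '+'.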

When we defined a stack, we said that it was ``last-in, first-out.'' Are the versions
of getch and ungetch on page 79 last-in, first-out or first-in, first out? Do you agree
with this choice?

One last note: the name of the variable bufp suggests that it is a pointer, but it's
actually an index into the buf array.

section 4.4: Scope Rules


page 80

With respect to the ``practical matter'' of splitting the calculator program up into
multiple source files, though it's certainly small enough to fit comfortably into a
single source file, it's not so small that there's anything wrong with splitting it up
into multiple source files, especially if we start adding functionality to it.

The scope of a name is what we have been calling its ``visibility.'' When we say
things like ``calling a function with a prototype in scope'' we mean that a prototype
is visible, that a declaration is in effect.

The variables sp and val can be used by the push and pop routines because they're
defined in the same file (and the definitions appear before push and pop). They can't
be used in main because no declaration for them appears in main.c (nor in calc.h,
which main.c #includes). If main attempted to refer to sp or val, they'd be flagged
as undefined. (Don't worry about the visibility of ``push and pop themselves.'')

The paragraph beginning ``On the other hand'' is explaining how global
(``external'') variables like sp and val could be accessed in a file other than the file
where they are defined. In the examples we've been looking at, as we've
said, sp and val can be used in push and pop because the variables are defined
above the functions. If the variables were defined elsewhere (i.e. in some other
file), we'd need a declaration above--and that's exactly what extern is for. (See
page 81 for an example.)

page 81

A definition creates a variable, and for any given global variable, you only want to
do that once. Anywhere else, you want to refer to an existing variable, created
elsewhere, without creating a new, conflicting one. Referring to an existing
variable or function is exactly what a declaration is for.

Note also that the definition may optionally initialize the variable. (Don't worry
about why a declaration may optionally include an array dimension.)

``This same organization would also be needed if the definitions
of sp and val followed their use in one file'' means that we could conceivably have,
in one file,
extern int sp;
extern double val[];

void push(double f) { ... }

double pop(void) { ... }

int sp = 0;
double val[MAXVAL];

So ``extern'' just means ``somewhere else''; it doesn't have to mean ``in a different
file,'' though usually it does.
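
Here is a small, complete sketch of the same arrangement, with the declaration, the use, and the definition all in one file. The names counter and bump are made up for the illustration.

```c
extern int counter;     /* declaration: counter is defined ``somewhere else'' */

/* Uses counter before the compiler has seen its definition. */
int bump(void)
{
    return ++counter;
}

int counter = 0;        /* definition: creates the variable, exactly once */
```

Without the extern line, the compiler would complain that counter was undeclared when it reached bump; with it, the later definition simply supplies the variable that the declaration promised.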

section 4.5: Header Files


page 82

By the way, the ``.h'' traditionally used in header file names simply stands for
``header.''

We can imagine several strategies for using header files. At one extreme would be
to use zero header files, and to repeat declarations in each file which needed them.
This would clearly be a poor strategy, because whenever a declaration changed, we
would have to remember to change it in several places, and it would be easy to
miss one of them, leading to stubborn bugs. At the other extreme would be to use
one header file for each source file (declaring just the things defined in that source
file, to be #included by files using those things), but such a proliferation of header
files would usually be unwieldy. For small projects (such as the calculator
example), it's a reasonable strategy to use one header file for the entire project. For
larger projects, you'll usually have several header files for sets of related
declarations.

section 4.6: Static Variables


page 83

Deep sentence:

The static declaration, applied to an external variable or function, limits the scope
of that object to the rest of the source file being compiled. External static thus
provides a way to hide names like buf and bufp in the getch-ungetch combination,
which must be external so they can be shared, yet which should not be visible to
users of getch and ungetch.
So we can have three kinds of declarations: local to one function, restricted to one
source file, or global across potentially many source files. We can imagine other
possibilities, but these three cover most needs.

Notice that the static keyword does two completely different things. Applied to a
local variable (one inside of a function), it modifies the lifetime (``duration'') of the
variable so that it persists for as long as the program does, and does not disappear
between invocations of the function. Applied to a variable outside of a function (or
to a function) static limits the scope to the current file.

To summarize the scope of external and static functions and variables: when a
function or global variable is defined without static, its scope is potentially the
entire program, although any file which wishes to use it will generally need
an extern declaration. A definition with static limits the scope
by prohibiting other files from accessing a variable or function; even if they try to
use an extern declaration, they'll get errors about ``undefined externals.''
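
The two meanings of static can be seen side by side in a short sketch. All of the names here (calls, note_call, next_id, total_calls) are invented for the illustration.

```c
static int calls = 0;       /* outside a function: hidden from other source files */

/* static function: can be called only from within this source file */
static void note_call(void)
{
    calls++;
}

/* Returns 1, 2, 3, ... on successive calls. */
int next_id(void)
{
    static int id = 0;      /* inside a function: keeps its value between calls */

    note_call();
    return ++id;
}

/* Ordinary (non-static) function: other files may declare and call it. */
int total_calls(void)
{
    return calls;
}
```

Another source file could call next_id and total_calls, but an attempt there to reach calls or note_call, even with an extern declaration, would fail at link time.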

The rules for declaring and defining functions and global variables, and using
the extern and static keywords, are admittedly complicated and somewhat
confusing. You don't need to memorize all of the rules right away: just use simple
declarations and definitions at first, and as you find yourself needing some of the
more complicated possibilities such as static variables, the rules will begin to
make more sense.

section 4.7: Register Variables


page 83

The register keyword is only a hint. The compiler might not put something in a
register even though you ask it to, and it might put something in a register even
though you don't ask it to. Most modern compilers do a good job of deciding when
to put things in registers, so most of the time, you don't need to worry about it, and
you don't have to use the register keyword at all.

(A note to assembly language programmers: there's no way to specify which
register a register variable gets assigned to. Also, when you specify a function
parameter as register, it just means that the local copy of the parameter should be
copied to a register if possible; it does not necessarily indicate that the parameter is
going to be passed in a register.)

section 4.8: Block Structure


pages 84-85

You've probably heard that global variables are ``bad'' because they exist
everywhere and it can be hard to keep track of who's using them. In the same way,
it can be useful to limit the scope of a local variable to just the bit of the function
that uses it, which is exactly what happens if we declare a variable in an inner
block.
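
For instance, a temporary needed only while swapping two values can be declared right where it is used. This is just a sketch; the function name order_pair is hypothetical.

```c
/* Puts v[0] and v[1] into ascending order. */
void order_pair(int v[])
{
    if (v[0] > v[1]) {
        int tmp = v[0];     /* tmp exists only inside this block */

        v[0] = v[1];
        v[1] = tmp;
    }
    /* tmp is out of scope here; mentioning it would be a compile-time error */
}
```

Nothing outside the braces of the if can depend on tmp, so a reader of the rest of the function doesn't have to think about it.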

section 4.9: Initialization


page 85

These are some of the rules on initialization; we'll learn a few more later as we
learn about a few more data types.

If you don't feel like memorizing the rules for default initialization, just go ahead
and explicitly initialize everything you care about.

Earlier we said that C is quite general in its treatment of expressions: anywhere
you can use an expression, you can use any expression. Here's an exception to that
rule: in an initialization of an external or static variable (strictly speaking, any
variable of static duration; generally speaking, any global variable or
local static variable), the initializer must be a constant expression, with value
determinable at compile time, without calling any functions. (This rule is easy to
understand: since these initializations happen conceptually at compile time, before
the program starts running, there's no way for a function call--that is, some run-
time action--to be involved.)
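
A short sketch of the distinction follows. The names SECS_PER_DAY, limit, compute, and twice are all invented for the illustration.

```c
#define SECS_PER_DAY (60L * 60L * 24L)

long limit = 2L * SECS_PER_DAY;     /* fine: a constant expression */
/* long bad = compute();               not allowed: would need a run-time function call */

int twice(int n)
{
    int local = n * 2;              /* fine: an automatic variable may use any expression */

    return local;
}
```

The initializer for limit can be worked out entirely by the compiler, while the initializer for local is evaluated afresh each time twice is called.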

page 86

It probably won't concern you right away, but it turns out that there's another
exception about the allowable expressions in initializers: in the brace-enclosed list
of initializers for an array, all of the expressions must be constant expressions
(even for local arrays).

There is an error in some printings: if there are fewer explicit initializers than
required for an array, the others will be initialized to zero, for external,
static, and automatic (local) arrays. (When an automatic array has no initializers at
all, then it contains garbage, just as simple automatic variables do.)
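
As a quick sketch of the rule for partially-initialized arrays (the function name sum_partial is hypothetical):

```c
/* Sums a five-element local array that has only two explicit initializers. */
int sum_partial(void)
{
    int a[5] = { 1, 2 };    /* a[2], a[3], and a[4] are initialized to zero */
    int i, sum = 0;

    for (i = 0; i < 5; i++)
        sum += a[i];
    return sum;
}
```

Because the three remaining elements are zero, sum_partial returns 3.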

If the initialization
char pattern[] = "ould";
makes sense to you, you're fine. But if the statement that
char pattern[] = "ould";
is equivalent to
char pattern[] = { 'o', 'u', 'l', 'd', '\0' };

bothers you at all, study it until it makes sense. Also, note that a character
array which seems to contain (for example) four characters actually
contains five, because of the terminating '\0'.

section 4.10: Recursion

page 86

Recursion is a simple but deep concept which is occasionally presented somewhat
bewilderingly. Please don't be put off by it. If this section stops making sense, don't
worry about it; we'll revisit recursion in chapter 6.

Earlier we said that a function is (or ought to be) a ``black box'' which does some
job and does it well. Whenever you need to get that job done, you're supposed to
be able to call that function. You're not supposed to have to worry about any
reasons why the function might not be able to do that job for you just now.

It turns out that some functions are naturally written in such a way that they can do
their job by calling themselves to do part of their job. This seems like a crazy idea
at first, but based on a strict interpretation of our observation about functions--that
we ought to be able to call them whenever we need their job done--calling a
function from within itself ought not to be illegal, and in fact in C it is legal. Such a
call is called a recursive call, and it works because it's possible to have several
instances of a function active simultaneously. They don't interfere with each other,
because each instance has its own copies of its parameters and local variables.
(However, if a function accesses any static or global data, it must be written
carefully if it is to be called recursively, because then different instances of
it could interfere with each other.)

Let's consider the printd example rather carefully. First, remind yourself about the
reverse-order problem from the itoa example on page 64 in section 3.6. The
``obvious'' algorithm for determining the digits in a number, which involves
successively dividing it by 10 and looking at the remainders, generates digits in
right-to-left order, but we'd usually like them in left-to-right order, especially if
we're printing them out as we go. Let's see if we can figure out another way to do
it.

It's easy to find the lowest (rightmost) digit; that's n % 10. It's easy to compute all
but the lowest digit; that's n / 10. So we could print a number left-to-right,
directly, without any explicit reversal step, if we had a routine to print all but the
last digit. We could call that routine, then print the last digit ourselves.

But--here's the surprise--the routine to ``print all but the last digit'' is printd, the
routine we're writing, if we call it with an argument of n / 10.
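
Putting those pieces together gives something like the following sketch. It follows the idea just described; the return value (a count of the characters printed) is an addition made here so that the routine's behavior is easy to check, and is not part of the example as described.

```c
#include <stdio.h>

/* printd: print n in decimal; returns the number of characters printed
   (the return value is an addition for this sketch). */
int printd(int n)
{
    int count = 0;

    if (n < 0) {
        putchar('-');
        count++;
        n = -n;                     /* note: fails for the most negative int */
    }
    if (n / 10)
        count += printd(n / 10);    /* print all but the last digit */
    putchar(n % 10 + '0');          /* then print the last digit ourselves */
    return count + 1;
}
```

Notice that a single-digit argument makes n / 10 zero, so that call prints its one digit without recursing any further.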

Recursion seems like cheating--it seems that if you're writing a routine to do
something (in this case, to print digits) and instead of writing code to print digits
you just punt and call a routine for printing digits, which is in fact the very
routine you're supposed to write--it seems like you haven't done the job you came
to do. A recursive function seems like circular reasoning; it seems to beg the
question of how it does its job.

But if you're writing a recursive function, as long as you do a little bit of work
yourself, and only pass on a portion of the job to another instance of yourself, you
haven't completely reneged on your responsibilities. Furthermore, if you're ever
called with such a small job to do that the little bit you're willing to do
encompasses the whole job, you don't have to call yourself again (there's no
remaining portion that you can't do). Finally, since each recursive call does some
work, passing on smaller and smaller portions to succeeding recursive calls, and
since the last call (where the remaining portion is empty) doesn't generate any
more recursive calls, the recursion is broken and doesn't constitute an infinite loop.

Don't worry about the quicksort example if it seems impenetrable--quicksort is an
important algorithm, but it is not easy to understand completely at first.

Note that the qsort routine described here is very different from the standard
library qsort (in fact, it probably shouldn't even have the same name).

section 4.11: The C Preprocessor


page 88

We've been using #include and #define already, but now we'll describe them more
completely.

section 4.11.1: File Inclusion

The two syntaxes for #include lines can be used in various ways, but very simply
speaking, "" is for header files you've written, and <> is for headers which are
provided for you (which someone else has written).

page 89

Deep sentences:

#include is the preferred way to tie the declarations together for a large program. It
guarantees that all the source files will be supplied with the same definitions and
variable declarations, and thus eliminates a particularly nasty kind of bug.
Naturally, when an included file is changed, all files that depend on it must be
recompiled.

That's the story on #include, in a nutshell.

section 4.11.2: Macro Substitution

#defines last from the point of definition to the end of the file (unless you #undef
them); you can't have local ones like you can for local variables.

``Substitutions are made only for tokens'' means that a substitutable macro name is
only recognized when it stands alone. Also, substitution never happens in quoted
strings, because it turns out that you usually don't want it to. Strings are generally
used for communication with the user, while you want substitutions to happen
where you're talking to the compiler.

The point of the ``forever'' example is to demonstrate that the replacement text
doesn't have to be a simple number or string constant. You'd use the forever macro
like this:
forever {
    ...
}
which the preprocessor would expand to
for (;;) {
    ...
}
which, as we learned in section 3.5 on page 60, is an infinite loop. (Presumably
there's a break; see section 3.7 p. 64.)

Another popular trick is
#define ever ;;
so that you can say
for(ever) {
    ...
}
But ``preprocessor tricks'' like these tend to get out of hand very quickly; if you use
too many of them you're not writing in C any more but rather in your own peculiar
dialect, and no one will be able to read your code without understanding all of your
``silly little macros.'' It is best if simple macros expand to simple constants (or
expressions).

Macros with arguments are also called ``function-like macros'' because they act
almost like miniature functions. There are some important differences, however:

• no call-by-value copying semantics
• no space saving
• hard to have local variables or block structure
• have to parenthesize carefully (see below)

page 90

The correct way to write the square() macro is
#define square(x) ((x) * (x))
There are three rules to remember when defining function-like macros:

1. The macro expansion must always be parenthesized so that any low-precedence
   operators it contains will still be evaluated first. If we didn't write
   the square() macro carefully, the invocation
       1 / square(n)
   might expand to
       1 / n * n
   while it should expand to
       1 / (n * n)

2. Within the macro definition, all occurrences of the parameters must be
   parenthesized so that any low-precedence operators the actual arguments
   contain will be evaluated first. If we didn't write the square() macro
   carefully, the invocation
       square(n + 1)
   might expand to
       n + 1 * n + 1
   while it should expand to
       (n + 1) * (n + 1)

3. If a parameter appears several times in the expansion, the macro may not
   work properly if the actual argument is an expression with side effects. No
   matter how we parenthesize the square() macro, the invocation
       square(i++)
   would result in
       i++ * i++
   (perhaps with some parentheses), but this expression is undefined, because
   we don't know when the two increments will happen with respect to each
   other or the multiplication.

Since the square() macro can't be written perfectly safely (arguments with side
effects will always be troublesome), its callers will always have to be careful (i.e.
not to call it with arguments with side effects). One convention is to capitalize the
names of macros which can't be treated exactly as if they were functions:
#define Square(x) ((x) * (x))
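
To watch the first two rules at work, here is a small sketch. The macro names SQUARE_BARE, SQUARE_HALF, and SQUARE_OK, and the four helper functions, are invented for this illustration; they don't come from the book. The third rule (side effects) is deliberately not exercised, since square(i++) is undefined and there would be nothing reliable to show.

```c
/* Three ways to write a squaring macro; only the last follows both rules. */
#define SQUARE_BARE(x) x * x          /* no parentheses at all */
#define SQUARE_HALF(x) (x * x)        /* only the whole expansion parenthesized */
#define SQUARE_OK(x)   ((x) * (x))    /* expansion and parameters parenthesized */

/* 1 / SQUARE_BARE(n) expands to 1 / n * n, which groups as (1 / n) * n. */
double inverse_square_bare(double n) { return 1 / SQUARE_BARE(n); }

/* 1 / SQUARE_OK(n) expands to 1 / ((n) * (n)). */
double inverse_square_ok(double n) { return 1 / SQUARE_OK(n); }

/* SQUARE_HALF(n + 1) expands to (n + 1 * n + 1), which is n + n + 1. */
int square_next_half(int n) { return SQUARE_HALF(n + 1); }

/* SQUARE_OK(n + 1) expands to ((n + 1) * (n + 1)). */
int square_next_ok(int n) { return SQUARE_OK(n + 1); }
```

With n equal to 2.0, the bare version of the inverse gives 1.0 where 0.25 was wanted; with n equal to 3, the half-parenthesized version gives 7 where 16 was wanted.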

page 90 continued

#undef can be used when you want to give a macro restricted scope, if you can
remember to undefine it when you want it to go out of scope. Don't worry about
``[ensuring] that a routine is really a function, not a macro'' or the getchar example.

Also, don't worry about the # and ## operators. These are new ANSI features which
aren't needed except in relatively special circumstances.

section 4.11.3: Conditional Inclusion


page 91

The #if !defined(HDR) trick is a bit esoteric to start out with. Let's look at a
simpler example: in ANSI C, the remove function deletes a file. On some older
Unix systems, however, the function to delete a file is instead named unlink.
Therefore, when deleting a file, we might use code like this:

#if defined(unix)
    unlink(filename);
#else
    remove(filename);
#endif
We would arrange to have the macro unix defined when we were compiling our
program on a Unix machine, and not otherwise.

You may wonder what the difference is between the if() statement we've been
using all along, and this new #if preprocessing directive. if() acts at run time; it
selects whether or not a statement or group of statements is executed, based on a
run-time condition. #if, on the other hand, acts at compile time; it determines
whether certain parts of your program are even seen by the compiler or not. If for
some reason you want to have two slightly different versions of your program, you
can use #if to separate the different parts, leaving the bulk of the code common,
such that you don't have to maintain two totally separate versions.

#if can be used to conditionally compile anything: not just statements and
expressions, but also declarations and entire functions.
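
As a rough sketch of that last point, the file-deleting fragment can be wrapped in a function, with #if choosing not only the statement but also which header gets #included. The function name delete_file is made up for this example. Some compilers predefine unix (or a similarly named macro) by themselves; whether yours does is something to check rather than assume.

```c
#include <stdio.h>      /* remove() */

#if defined(unix)
#include <unistd.h>     /* unlink(); this line is only seen when unix is defined */
#endif

/* Delete a file. Returns 0 on success, nonzero on failure. */
int delete_file(const char *filename)
{
#if defined(unix)
    return unlink(filename);    /* compiled only when unix is defined */
#else
    return remove(filename);    /* compiled only otherwise */
#endif
}
```

Whichever branch is not selected is never seen by the compiler proper, so it doesn't matter whether the function it mentions exists on the system at hand.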

Back to the HDR example (though this is somewhat of a tangent, and it's not vital for
you to follow it): it's possible for the same header file to be #included twice during
one compilation, either because the same #include line appears twice within the
same source file, or because a source file contains something like
#include "a.h"
#include "b.h"
but b.h also #includes a.h. Since some declarations which you might put in header
files would cause errors if they were acted on twice, the #if !defined(HDR) trick
arranges that the contents of a header file are only processed once.
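
Here is one way to see the trick work without juggling several files: the sketch below writes out the contents of a hypothetical header, point.h, twice in a row, which is what the compiler would see if the header were #included twice. The names POINT_H, struct point, and point_sum are invented for the illustration.

```c
/* ---- contents of point.h, as seen the first time ---- */
#if !defined(POINT_H)
#define POINT_H                     /* on-off switch; no replacement text */
struct point { int x; int y; };
#endif

/* ---- the same contents, as seen on a second #include ---- */
#if !defined(POINT_H)
#define POINT_H
struct point { int x; int y; };     /* skipped: POINT_H is already defined */
#endif

/* Without the guard, the second struct definition would be a compile error. */
int point_sum(struct point p) { return p.x + p.y; }
```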

Note that two different macros, both named HDR, are being used on page 91, for two
entirely different purposes. At the top of the page, HDR is a simple on-off switch; it
is #defined (with no replacement text) when hdr.h is #included for the first time,
and any subsequent #inclusion merely tests whether HDR is #defined. (Note that it is
in fact quite possible to define a macro with no replacement text; a macro so
defined is distinguishable from a macro which has not been #defined at all. One
common use of a macro with no replacement text is precisely as a simple #if
switch like this.)

At the bottom of the page, HDR ends up containing the name of a header file to
be #included; the name depends on the #if and #elif directives. The line
#include HDR

#includes one of them, depending on the final value of HDR.
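
A compilable sketch of the same idea follows. It is only an approximation of the book's example: it chooses between two standard headers (so that it can be compiled anywhere) rather than between system-specific files, and the switch WANT_LIMITS and the function hdr_demo are invented names.

```c
/* Decide on a header name at compile time... */
#if defined(WANT_LIMITS)
#define HDR <limits.h>
#else
#define HDR <string.h>
#endif

/* ...then #include whichever name HDR ended up holding. */
#include HDR

int hdr_demo(void)
{
#if defined(WANT_LIMITS)
    return CHAR_BIT;                /* declared by <limits.h> */
#else
    return (int)strlen("HDR");      /* declared by <string.h> */
#endif
}
```

Here HDR is not an on-off switch at all; its replacement text is the thing being used.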

Chapter 5: Pointers and Arrays

page 93

Pointers are often thought to be the most difficult aspect of C. It's true that many
people have various problems with pointers, and that many programs founder on
pointer-related bugs. Actually, though, many of the problems are not so much with
the pointers per se but rather with the memory they point to, and more specifically,
when there isn't any valid memory which they point to. As long as you're careful to
ensure that the pointers in your programs always point to valid memory, pointers
can be useful, powerful, and relatively trouble-free tools. (In these notes, we'll be
emphasizing techniques for ensuring that pointers always point where they should.)

If you haven't worked with pointers before, they're bound to be a bit baffling at
first. Rather than attempting a complete definition (which probably wouldn't mean
anything, either) up front, I'll ask you to read along for a few pages, withholding
judgment, and after we've seen a few of the things that pointers can do, we'll be in
a better position to appreciate what they are.

section 5.1: Pointers and Addresses

section 5.2: Pointers and Function Arguments

section 5.3: Pointers and Arrays


section 5.4: Address Arithmetic

section 5.5: Character Pointers and Functions

section 5.6: Pointer Arrays; Pointers to Pointers

section 5.7: Multi-dimensional Arrays

section 5.8: Initialization of Pointer Arrays

section 5.9: Pointers vs. Multi-dimensional Arrays

section 5.10: Command-line Arguments

section 5.1: Pointers and Addresses


If you like to use concrete examples and to think about exactly what's going on at
the machine level, you'll want to know how many bytes are occupied
by shorts, longs, pointers, etc. It's equally possible, though, to understand pointers
at a more abstract level, thinking about them only in terms of boxes and arrows, as
in the figures on pages 96, 98, 104, 107, and 114-5. (Not worrying about the exact
size in bytes basically means not worrying about how big the boxes are.) The
figure at the bottom of page 93 is probably the least pretty pointer picture in the
whole book; don't worry if it doesn't mean much to you.

When we say that a pointer holds an ``address,'' and that unary & is the ``address
of'' operator, our language is of course influenced by the fact that the underlying
hardware assigns addresses to memory locations, but again, it is not necessary (nor
necessarily desirable) to think about actual machine addresses when working with
pointers. Thinking about the machine addresses can make certain aspects of
pointers easier to understand, but doing so can also make certain mistakes and
misunderstandings easier. In particular, a pointer in C is more than just an address;
as we'll see on the next page, a pointer also carries the notion of what type of data it
points to.

page 94

The presentation on this page is going to seem very artificial at first. At best, you're
going to say, ``This makes sense, but what's it for?'' In fact, it is artificial, and no
real program would ever do meaningless little pointer operations such as are
embodied in the example on this page. However, this is the traditional way to
introduce pointers from scratch, and once we've moved past it, we'll be able to talk
about some more meaningful uses of pointers, and to forget about these artificial
ones. (Once we're done talking about the traditional, artificial introduction on page
94, we'll also attempt a slightly more elaborate, slightly less traditional, slightly
more meaningful parallel introduction, so stay tuned.)

Deep sentence:

The declaration of the pointer ip,


int *ip;
is intended as a mnemonic; it says that the expression *ip is an int.
We'll have more to say about this sentence in a bit.

As an even more traditional, even less meaningful, even simpler example, we
could say
int i = 1; /* an integer */
int *ip; /* a pointer-to-int */
ip = &i; /* ip points to i */
printf("%d\n", *ip); /* prints i, which is 1 */
*ip = 5; /* sets i to 5 */
(The obvious questions are, ``if you want to print i, or set it to 5, why not
just do it? Why mess around with this `pointer' thing?'' More on that in a minute.)

The unary & and * operators are complementary. Given an object (i.e. a
variable), & generates a pointer to it; given a pointer, * ``returns'' the value of the
pointed-to object. ``Returns'' is in quotes because, as you may have noticed in the
examples, you're not restricted to fetching values via pointers: you can also store
values via pointers. In an assignment like
*ip = 0;

the subexpression *ip is conceptually ``replaced'' by the object which ip points to,
and since *ip appears on the left-hand side of the assignment operator, what
happens to the pointed-to object is that it gets assigned to.

One of the things that's hard about pointers is simply talking about what's going on.
We've been using the words ``return'' and ``replace'' in quotes, because they don't
quite reflect what's actually going on, and we've been using clumsy locutions like
``fetch via pointers'' and ``store via pointers.'' There is some jargon for referring to
pointer use; one word you'll often see is dereference, a term which, though its
derivation is suspect, is used to mean ``follow a pointer to get at, and use, the
object it points to.'' Thus, we sometimes call unary * the ``pointer dereferencing
operator,'' and we may say that the expressions
printf("%d\n", *ip);
and
*ip = 5;

both ``dereference the pointer ip.'' We may also talk about indirecting on a pointer:
to indirect on a pointer is again to follow it to see what it points to; and * may also
be called the ``pointer indirection operator.''

Our examples of pointers so far have been, admittedly, artificial and rather
meaningless. Let's try a slightly more realistic example. In the previous chapter, we
used the routines atoi and atof to convert strings representing numbers to the
actual numbers represented. Often the strings were typed by the user, and read
with getline. As you may have noticed, neither atoi nor atof does any validity or
error checking: both simply stop reading when they reach a character that can't be
part of the number they're converting, and if there aren't any numeric characters in
the string, they simply return 0. (For example, atoi("49er") is 49,
and atoi("three") is 0, and atof("1.2.3") is 1.2.) These attributes
make atoi and atof easy to write and easy (for the programmer) to use, but they
are not the most user-friendly routines possible. A good user interface would warn
the user and prompt again in case of invalid, non-numeric input.

Suppose we were writing a simple inventory-control system. For each part stored
in our warehouse, we might record the part number, location, and number of parts
on hand. For simplicity, we'll assume that the location is always a simple bin
number.

Somewhere in the inventory-control program, we might find the variables


int part_number;
int location;
int number_on_hand;
and there might be a routine that lets the user enter any of these numbers. Suppose
that there is another variable,
int which_entry;
which indicates which of the three numbers is being entered (1 for part_number, 2
for location, or 3 for number_on_hand). We might have code like this:
char instring[30];

switch (which_entry) {
case 1:
    printf("enter part number:\n");
    getline(instring, 30);
    part_number = atoi(instring);
    break;

case 2:
    printf("enter location:\n");
    getline(instring, 30);
    location = atoi(instring);
    break;

case 3:
    printf("enter number on hand:\n");
    getline(instring, 30);
    number_on_hand = atoi(instring);
    break;
}
Suppose that we now begin to add a bit of rudimentary verification to the input
routines. The first case might look like
case 1:
    do {
        printf("enter part number:\n");
        getline(instring, 30);
        if(!isdigit(instring[0]))
            continue;
        part_number = atoi(instring);
    } while (part_number == 0);
    break;
If the first character is not a digit, or if atoi returns 0, the code goes around the
loop another time, and prompts the user again, in hopes that the user will type
some proper numeric input this time. (The tests for numeric input are not
sufficient, nor even wise if 0 is a possible input value, as it presumably is for
number on hand. In fact, the two tests really do the same thing! But please
overlook these faults. If you're curious, you can learn about a new ANSI
function, strtol, which is like atoi but gives you a bit more control, and would be
a better routine to use here.)
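
For the curious, here is a minimal sketch of what strtol-based checking could look like. The function parse_int and its return convention are assumptions made up for this illustration, not anything from the book. It relies on pointer parameters, which section 5.2 discusses, and it tolerates a trailing newline on the assumption that the input line still has its newline attached.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert s to an int. Returns 1 and stores the number in *result on success;
   returns 0 (leaving *result alone) if s contains no number, has junk after
   the number, or holds a value that doesn't fit in an int. */
int parse_int(const char *s, int *result)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(s, &end, 10);        /* end is set to the first unconverted char */
    if (end == s)                       /* nothing was converted, e.g. "three" */
        return 0;
    if (*end != '\0' && *end != '\n')   /* junk after the number, e.g. "49er" */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;
    *result = (int)value;
    return 1;
}
```

Because success is reported separately from the converted value, 0 becomes a perfectly good input, which the atoi-based tests above could not manage.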

The code fragment above is for just one of the three input cases. The obvious way
to perform the same checking for the other two cases would be to repeat the same
code two more times, changing the prompt string and the name of the variable
assigned to (location or number_on_hand instead of part_number). Duplicating the
code is a nuisance, though, especially if we later come up with a better way to do
input verification (perhaps one not suffering from the imperfections mentioned
above). Is there a better way?

One way would be to use a temporary variable in the input loop, and then set one
of the three real variables to the value of the temporary variable, depending
on which_entry:
int temp;

do {
    printf("enter the number:\n");
    getline(instring, 30);
    if(!isdigit(instring[0]))
        continue;
    temp = atoi(instring);
} while (temp == 0);

switch (which_entry) {
case 1:
    part_number = temp;
    break;

case 2:
    location = temp;
    break;

case 3:
    number_on_hand = temp;
    break;
}

Another way, however, would be to use a pointer to keep track of which variable
we're setting. (In this example, we'll also get the prompt right.)
char instring[30];
int *numpointer;
char *prompt;

switch (which_entry) {
case 1:
    numpointer = &part_number;
    prompt = "part number";
    break;

case 2:
    numpointer = &location;
    prompt = "location";
    break;

case 3:
    numpointer = &number_on_hand;
    prompt = "number on hand";
    break;
}

do {
    printf("enter %s:\n", prompt);
    getline(instring, 30);
    if(!isdigit(instring[0]))
        continue;
    *numpointer = atoi(instring);
} while (*numpointer == 0);

The idea here is that prompt is the prompt string and numpointer points to the
particular numeric value we're entering. That way, a single input verification loop
can print any of the three prompts and set any of the three numeric variables,
depending on where numpointer points. (We won't officially see character pointers
and strings until section 5.5, so don't worry if the use of the prompt pointer seems
new or inexplicable.)

This example is, in its own ways, quite artificial. (In a real inventory-control
program, we'd obviously need to keep track of many parts; we couldn't use single
variables for the part number, location, and quantity. We probably wouldn't really
have a which_entry variable telling us which number to prompt for, and we'd do
the numeric validation quite differently. We might well do numeric entry and
validation in a separate function, removing this need for the pointers.) However,
the pointer aspect of this example--using a pointer to refer to one of several
different things, so that one generic piece of code can access any of the things--is a
very typical (i.e. realistic) use of pointers.
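
Stripped of the prompting and input, the pointer part of the idea can be sketched as one small function. The name set_entry and its return value are inventions for this illustration. The default case is also an addition: it keeps the function from ever storing through a pointer that was never set, which is what would happen if which_entry were something other than 1, 2, or 3.

```c
int part_number;
int location;
int number_on_hand;

/* Store value in the variable selected by which_entry (1, 2, or 3).
   Returns 1 on success, 0 if which_entry is out of range. */
int set_entry(int which_entry, int value)
{
    int *numpointer;

    switch (which_entry) {
    case 1:  numpointer = &part_number;    break;
    case 2:  numpointer = &location;       break;
    case 3:  numpointer = &number_on_hand; break;
    default: return 0;          /* numpointer points nowhere; don't use it */
    }

    *numpointer = value;        /* one assignment serves all three variables */
    return 1;
}
```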

There's one nuance of pointer declarations which deserves mention. We've seen
that
int *ip;
declares the variable ip as a pointer to an int. We might look at that declaration
and imagine that int * is the type and ip is the name of the variable being
declared. (Actually, so far, these assumptions are both true.) We might therefore
imagine that a more ``obvious'' way of writing the declaration would be
int* ip;
This would work, but it is misleading, as we'll see if we try to declare
two int pointers at once. How shall we do it? If we try
int* ip1, ip2; /* WRONG */
we don't succeed; this would declare ip1 as a pointer-to-int, but ip2 as an int (not
a pointer). The correct declaration for two pointers is
int *ip1, *ip2;
As the authors said in the middle of page 94, the intent of pointer (and in fact all)
declarations is that they give little miniature expressions indicating what type a
certain use of the variables will have. The declaration
int *ip1;

doesn't so much say that ip1 is a pointer-to-int; it says that *ip1 is an int. (To be
sure, ip1 is a pointer-to-int.) In the declaration
int *ip1, *ip2;
both *ip1 and *ip2 are ints; so ip1 and ip2 are both pointers-to-int. You'll hear
this aspect of C declarations referred to as ``declaration mimics use.'' If it bothers
you, or if you think you might accidentally write things like
int *ip1, ip2;
then to stay on the safe side you might want to get in the habit of writing
declarations on separate lines:
int *ip1;
int *ip2;
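
If you'd like the compiler to confirm the point, a sketch like the following will do. The function name declaration_demo is made up. It compiles cleanly only because ip2 really is a plain int.

```c
int declaration_demo(void)
{
    int* ip1, ip2;      /* despite the spacing, only ip1 is a pointer */

    ip2 = 42;           /* ip2 is an int, so this is an ordinary assignment */
    ip1 = &ip2;         /* the address of an int is a pointer-to-int */
    return *ip1;        /* follows ip1 back to ip2 */
}
```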

I promised to point out the safe techniques for ensuring that pointers always point
where they should. The examples in this section, which have all involved pointers
pointing to single variables, are relatively safe; a single variable is not a very risky
thing to point to, so code like the examples in this section is relatively unlikely to
go awry and result in invalid pointers. (One potential problem, though, which we'll
talk more about later, is that since local, ``automatic'' variables are automatically
deallocated when the function containing them returns, any pointer to a local
variable also becomes invalid. Therefore, a function which returns a pointer must
never return a pointer to one of its own local variables, and it would also be invalid
to take a pointer to a local variable and assign it to a global pointer variable.)
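
The following sketch contrasts the forbidden pattern with two safe ones. All of the names (bad, fill_in, point_at_global, global_ip, global_value) are invented for the illustration, and the wrong version appears only inside a comment so that nothing invalid is actually compiled.

```c
int *global_ip;             /* a global pointer variable */
int global_value = 7;       /* exists for the whole run of the program */

/* WRONG:
 *
 *     int *bad(void) { int local = 5; return &local; }
 *
 * local is deallocated when bad() returns, so the returned pointer is
 * invalid by the time the caller receives it. */

/* OK: the int belongs to the caller, so it outlives this call. */
void fill_in(int *result) { *result = 5; }

/* OK: a global variable is never deallocated, so the pointer stays valid. */
void point_at_global(void) { global_ip = &global_value; }
```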

section 5.2: Pointers and Function Arguments


page 95

This section discusses a very common use of pointers: setting things up so that a
function can modify values in its caller, or return values, via its arguments.
Remember that, normally, C passes arguments by value, and that if a function
modifies one of its arguments, it modifies only its local copy, not the value in the
caller. (Normally, this is a good thing; having a function which inadvertently
assigns to its arguments and hence inadvertently modifies a value in its caller can
be a source of obscure bugs in languages which don't use call-by-value.) However,
what happens if a function wants to modify a value in its caller, and its caller wants
to let it? How can a function return two values? (A function's formal return value is
always a single value.)

The answer to both questions is that a function can declare a parameter which is a
pointer. The caller then passes a pointer to (that is, the ``address of'') a variable in
the caller which is to be modified or which is to receive the second return value. In
fact, we've seen examples of this already: getline returns the length of the line it
reads as well as the line itself, and the getop routine in the calculator example in
section 4.3 returned both a code for an operator and a string representing the full
text of the operator. (We needed that string when the operator was '0' indicating
numeric input, so that the string could return the full numeric input.) Though we
didn't say so at the time, we were actually using pointers in these examples. (We'll
explore the relationship between arrays and pointers, which made this possible, in
section 5.3.)

With all of this in mind, make sure that you understand why the swap example on
page 95 would not work, and how and why the swap example on page
96 does work, and what the figure on page 96 shows.
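
For reference while you think it over, here is a sketch of the two versions side by side. It is written in the spirit of the book's examples rather than copied from them, and the name swap_copies is invented so that both can live in one file.

```c
/* Does NOT work: x and y are copies of the caller's values, so exchanging
   them has no effect on the caller's variables. */
void swap_copies(int x, int y)
{
    int temp = x;
    x = y;
    y = temp;
}

/* Works: px and py point to the caller's variables, so *px and *py
   are those variables. */
void swap(int *px, int *py)
{
    int temp = *px;
    *px = *py;
    *py = temp;
}
```

After int a = 1, b = 2; a call of swap_copies(a, b) leaves a and b as they were, while swap(&a, &b) exchanges them.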

The swap example demonstrated a function which modified some variables
(a and b) in its caller. The getint example demonstrates how to return two values
from a function by returning one value as the normal function return value and the
other one by writing to a pointer. (There is no fundamental difference, though,
between ``modifying a variable in the caller'' and ``returning a value by writing to a
pointer''; these are just two applications of pointer parameters.)

The version of getint on page 97 is somewhat complicated because it allows
free-form input, that is, the values need only be separated by whitespace or punctuation;
they are not restricted to being one per line or anything. (C source code is also
free-form in this regard; see page 4 of chapter 1 of these notes.) To see more clearly the
essence of what getint is supposed to do, imagine for a moment that the
input is restricted to one value per line, as in the ``primitive calculator'' example on
page 72 of section 4.2. In that case, we might use the following simpler (i.e. more
primitive) code:
int getint(int *pn)
{
    char line[20];
    if (getline(line, 20) <= 0)
        return EOF;
    *pn = atoi(line);
    return 1;
}
The getint function on page 97 is documented as returning nonzero for a valid
number and 0 for something other than a number. Our stripped-down version does
not, and as it happens, the example code at the bottom of page 96 does not make
use of the valid/invalid distinction. Can you see a way to rewrite the code at the
bottom of page 96 to fill in the cells of the array with only valid numbers?

You might also notice, again from the code at the bottom of page 96, that & need
not be restricted to single, simple variables; it can take the address of any data
object, in this case, one cell of the array. Just as for all of C's other operators, & can
be applied to arbitrary expressions, although it is restricted to expressions which
represent addressable objects. Expressions like &1 or &(2+3) are meaningless and
illegal.

You may remember a discussion from section 1.5.1 on page 16 of how
C's getchar routine is able to return all possible characters, plus an end-of-file
indication, in its single return value. Why does getint need two return values?
Why can't it use the same trick that getchar does?

The examples in this section are again relatively safe. The pointers have all been
parameters, and the callers have passed pointers (that is, the ``addresses'' of) their
own, properly-allocated variables. That is, code like
int a = 1, b = 2;
swap(&a, &b);
and
int a;
getint(&a);
is correct and quite safe.

Something to beware of, though, is the risk of inadvertently passing an
uninitialized pointer variable (rather than the ``address'' of some other variable) to
a routine which expects a pointer. We know that the getint routine expects as its
argument a pointer to an int in which it is to store the integer it gets. Suppose we
took that description literally, and wrote
int *ip; /* a pointer to an int */
getint(ip);
Here we have in fact passed a pointer-to-int to getint, but the pointer we passed
doesn't point anywhere! When getint writes to (``dereferences'') the pointer, in an
expression like *pn = 0, it will scribble on some random part of memory, and the
program may crash. When people get caught in this trap, they often think that to fix
it they need to use the & operator:
getint(&ip); /* WRONG */
or maybe the * operator:
getint(*ip); /* WRONG */
but these go from bad to worse. (If you think about them carefully, &ip is a pointer-
to-pointer-to-int, and *ip is an int, and neither of these types matches the pointer-
to-int which getint expects.) The correct usage for now, as we showed already, is
something like
int a;
getint(&a);

In this case, a is an honest-to-goodness, allocated int, so when we generate a
pointer to it (with &a) and call getint, getint receives a pointer that does point
somewhere.
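
It's worth adding that the trouble was never the pointer variable as such, only the fact that it pointed nowhere. In the sketch below, get_fixed is a hypothetical stand-in for getint (it stores a fixed number instead of reading input), and pointer_with_target is likewise an invented name.

```c
/* Hypothetical stand-in for getint: stores a value through pn. */
int get_fixed(int *pn)
{
    *pn = 42;
    return 1;
}

int pointer_with_target(void)
{
    int a = 0;
    int *ip = &a;       /* ip is initialized: it points to a */

    get_fixed(ip);      /* fine: pass ip itself, not &ip or *ip */
    return a;           /* a was set through the pointer */
}
```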

section 5.3: Pointers and Arrays


page 97

For some people, section 5.3 is evidently the hardest section in this book, or even if
they haven't read this book, the most confusing aspect of the language. C
introduces a novel and, it can be said, elegant integration of pointers and arrays,
but there are a distressing number of ways of misunderstanding arrays, or pointers,
or both. Take this section very slowly, learn the things it does say, and don't learn
anything it doesn't say (i.e. don't make any false assumptions).

It's not necessarily true that ``the pointer version will in general be faster'';
efficiency is (or ought to be) a secondary concern when considering the use of
pointers.

page 98

On the top half of this page, we aren't seeing anything we haven't seen before. We
already knew (or should have known) that the declaration int a[10]; declares an
array of ten contiguous int's numbered from 0 to 9. We saw on page 94 and again
on page 96 that & can be used to take the address of one cell of an array.

What's new on this page are the nice pictures (and they are nice pictures; I
think they're the right way of thinking about arrays and pointers in C) and the
definition of pointer arithmetic. If the phrase ``then by definition pa+1 points to the
next element'' alarms you, or if you hadn't known that pa+1 points to the next element,
don't worry. You hadn't known this, and you aren't expected even to have
suspected it: the reason that pa+1 points to the next element is simply that it's
defined that way, as the sentence says. Furthermore, subtraction works in an
exactly analogous way: If we were to say
pa = &a[5];

then *(pa-1) would refer to the contents of a[4], and *(pa-i) would refer to the
contents of the location i elements before cell 5 (as long as i <= 5).

Note furthermore that we do not have to worry about the size of the objects pointed
to. Adding 1 to a pointer (or subtracting 1) always means to move over one object
of the type pointed to, to get to the next element. (If you're too worried about
machine addresses, or the actual address values stored in pointers, or the actual
sizes of things, it's easy to mistakenly assume that adding or subtracting 1 adds or
subtracts 1 from the machine address, but as we mentioned, you don't have to think
at this low level. We'll see in section 5.4 how pointer arithmetic is actually scaled,
automatically, by the size of the object pointed to, but we don't have to worry about
it if we don't want to.)
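
A short sketch may make this concrete. The array contents and the function names before, after, and step_in_bytes are chosen just for the illustration. The last function peeks at the byte-level view by converting the pointers to char *, which is exactly the low-level detail you are free to ignore.

```c
int a[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};

/* With pa pointing at a[5], pa-1 points at a[4] and pa+1 at a[6]. */
int before(void) { int *pa = &a[5]; return *(pa - 1); }
int after(void)  { int *pa = &a[5]; return *(pa + 1); }

/* How far apart, in bytes, are pa and pa+1? One whole int, not one byte. */
long step_in_bytes(void)
{
    int *pa = &a[5];
    return (long)((char *)(pa + 1) - (char *)pa);
}
```

Here before() yields the contents of a[4], after() the contents of a[6], and step_in_bytes() yields sizeof(int), whatever that happens to be on your machine.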

Deep sentence:

The meaning of ``adding 1 to a pointer,'' and by extension, all pointer arithmetic, is
that pa+1 points to the next object, and pa+i points to the i-th object beyond pa.

This aspect of pointers--that arithmetic works on them, and in this way--is one of
several vital facts about pointers in C. On the next page, we'll see the others.

page 99

Deep sentences:

The correspondence between indexing and pointer arithmetic is very close. By
definition, the value of a variable or expression of type array is the address of
element zero of the array.
This is a fundamental definition, which we'll now spend several pages discussing.

Don't worry too much yet about the assertion that ``pa and a have identical values.''
We're not surprised about the value of pa after the assignment pa = &a[0]; we've
been taking the address of array elements for several pages now. What we don't
know--we're not yet in a position to be surprised about it or not--is what the
``value'' of the array a is. What is the value of the array a?

In some languages, the value of an array is the entire array. If an array appears on
the right-hand side of an assignment, the entire array is assigned, and the left-hand
side had better be an array, too. C does not work this way; C never lets you
manipulate entire arrays.

In C, by definition, the value of an array, when it appears in an expression, is a
pointer to its first element. In other words, the value of the array a simply is &a[0].
If this statement makes any kind of intuitive sense to you at this point, that's great,
but if it doesn't, please just take it on faith for a while. This statement is a
fundamental (in fact the fundamental) definition about arrays and pointers in C,
and if you don't remember it, or don't believe it, then pointers and arrays will never
make proper sense. (You will also need to know another bit of jargon: we often say
that, when an array appears in an expression, it decays into a pointer to its first
element.)

Given the above definition, let's explore some of the consequences. First of all,
though we've been saying

pa = &a[0];
we could also say
pa = a;
because by definition the value of a in an expression (i.e. as it sits there all alone on
the right-hand side) is &a[0]. Secondly, anywhere we've been using square
brackets [] to subscript an array, we could also have used the pointer dereferencing
operator *. That is, instead of writing
i = a[5];
we could, if we wanted to, instead write
i = *(a+5);
Why would this possibly work? How could this possibly work? Let's look at the
expression *(a+5) step by step. It contains a reference to the array a, which is by
definition a pointer to its first element. So *(a+5) is equivalent to *(&a[0]+5). To
make things clear, let's pretend that we'd assigned the pointer to the first element to
an actual pointer variable:
int *pa = &a[0];
Now we have *(a+5) is equivalent to *(&a[0]+5) is equivalent to *(pa+5). But we
learned on page 98 that *(pa+5) is simply the contents of the location 5 cells past
where pa points. Since pa points to a[0], *(pa+5) is a[5]. Thus, for whatever it's
worth, any time you have an array subscript a[i], you could write it as *(a+i).
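
Here is the chain of equivalences as a small sketch you could compile. The array contents and the function names by_subscript, by_pointer, and same_start are invented for the illustration.

```c
int a[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};

/* The familiar notation. */
int by_subscript(int i) { return a[i]; }

/* The same thing spelled with pointer arithmetic: a turns into &a[0]. */
int by_pointer(int i) { return *(a + i); }

/* pa = a and pa = &a[0] leave pa holding the same pointer. */
int same_start(void)
{
    int *pa = a;
    return pa == &a[0];     /* 1 if the two pointers are equal */
}
```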

The idea of the previous paragraph isn't worth much, because if you've got an
array a, indexing it using the notation a[i] is considerably more natural and
convenient than the alternate *(a+i). The significant fact is that this little
correspondence between the expressions a[i] and *(a+i) holds for more than just
arrays. If pa is a pointer, we can get at locations near it by using *(pa+i), as we
learned on page 98, but we can also use pa[i]. This time, using the ``other''
notation (array instead of pointer, when we thought we had a pointer) can be more
convenient.

At this point, you may be asking why you can write pa[i] instead of *(pa+i). You
may be wondering how you're going to remember that you can do this, or
remember what it means if you see it in someone else's code, when it's such a
surprising fact in the first place. There are several ways to remember it; pick
whichever one suits you:

1. It's an arbitrary fact, true by definition; just memorize it.
2. If, for an array a, instead of writing a[i], you can also write *(a+i) (as we
proved a few paragraphs back); then it's only fair that for a pointer pa,
instead of writing *(pa+i), you can also write pa[i].
3. Deep sentence: ``In evaluating a[i], C converts it to *(a+i) immediately;
the two forms are equivalent.''
4. An array is a contiguous block of elements of a particular type. A pointer
often points to a contiguous block of elements of a particular type.
Therefore, it's very handy to treat a pointer to a contiguous block of
elements as if it were an array, by saying things like pa[i].
5. [This is the most radical explanation, though it's also the most true; but if it
offends your sensibilities or only seems to make things more confusing,
please ignore it.] When you said a[i], you weren't really subscripting an
array at all, because an array like a in an expression always turns into a
pointer to its first element. So the array subscripting
operator [] always finds itself working on pointers, and it's a simple identity
(another definition) that pa[i] is *(pa+i).

(But do pick at least one reason to remember this fact, as it's a fact you'll need to
remember; expressions like pa[i] are quite common.)

The authors point out that ``There is one difference between an array name and a
pointer that must be kept in mind,'' and this is quite true, but note very carefully
that there is in fact every difference between an array and a pointer. When an array
name appears in most expressions, it turns into a pointer (to the array's first
element), but that does not mean that the array is a pointer. You may hear it stated
that ``an array is just a constant pointer,'' and this is a convenient explanation, but it
is a simplified and potentially misleading explanation.

With that said, do make sure you understand why a=pa and a++ (where a is an
array) cannot mean anything.

Deep sentence:

When an array name is passed to a function, what is passed is the location of the
initial element.

Though perhaps surprising, this sentence doesn't say anything new. A function call,
and more importantly, each of its arguments, is an expression, and in an
expression, a reference to an array is always replaced by a pointer to its first
element. So given
int a[10];
f(a);
it is not the entire array a that is passed to f but rather just a pointer to its first
element. For an example closer to the text on page 99, given
char string[] = "Hello, world!";
int len = strlen(string);
it is not the entire array string that is passed to strlen (recall that C never lets you
do anything with a string or an array all at once), but rather just a pointer to its first
element.

We now realize that we've been operating under a gentle fiction during the first
four chapters of the book. Whenever we wrote a function
like getline or getop which seemed to accept an array of characters, and whenever
we thought we were passing arrays of characters to these routines, we were
actually passing pointers. This explains, among other things,
how getline and getop were able to modify the arrays in the caller, even though
we said that call-by-value meant that functions can't modify variables in their
callers since they receive copies of the parameters. When a function receives a
pointer, it cannot modify the original pointer in the caller, but it can definitely
modify what the pointer points to.

If that doesn't make sense, make sure you appreciate the full difference between a
pointer and what it points to! It is entirely possible to modify one without
modifying the other. Let's illustrate this with an example. If we say
char a[] = "hello";
char b[] = "world";
we've declared two character arrays, a and b, each containing a string. If we say
char *p = a;
we've declared p as a pointer-to-char, and initialized it to point to the first character
of the array a. If we then say
*p = 'H';
we've modified what p points to. We have not modified p itself. After saying *p =
'H'; the string in the array a has been modified to contain "Hello".

If we say
p = b;
on the other hand, we have modified the pointer p itself. We have not really
modified what p points to. In a sense, ``what p points to'' has changed--it used to be
the string in the array a, and now it's the string in the array b. But saying p =
b didn't modify either of the strings.

page 100

Since, as we've just seen, functions never receive arrays as parameters, but instead
always receive pointers, how have we been able to get away with defining
functions (like getline and getop) which seemed to accept arrays? The answer is
that whenever you declare an array parameter to a function, the compiler pretends
that you actually declared a pointer. (It does this mostly so that we can get away
with the ``gentle fiction'' of pretending that we can pass arrays to functions.)

When you see a statement like ``char s[]; and char *s; are equivalent'' (as in fact
you see at the top of page 100), you can be sure that (and you must remember that)
it is only function formal parameters that are being talked about. Anywhere else,
arrays and pointers are quite different, as we've discussed.

Expressions like p[-1] (at the end of section 5.3) may be easier to understand if we
convert them back to the pointer form *(p + -1) and thence to *(p-1) which, as
we've seen, is the object one before what p points to.

With the examples in this section, we begin to see how pointer manipulations can
go awry. In sections 5.1 and 5.2, most of our pointers were to simple variables.
When we use pointers into arrays, and when we begin using pointer arithmetic to
access nearby cells of the array, we must be careful never to go off the end of the
array, in either direction. A pointer is only valid if it points to one of the allocated
cells of an array. (There is also an exception for a pointer just past the end of an
array, which we'll talk about later.) Given the declarations
int a[10];
int *pa;
the statements
pa = a;
*pa = 0;
*(pa+1) = 1;
pa[2] = 2;
pa = &a[5];
*pa = 5;
*(pa-1) = 4;
pa[1] = 6;
pa = &a[9];
*pa = 9;
pa[-1] = 8;
are all valid. These statements set the pointer pa pointing to various cells of the
array a, and modify some of those cells by indirecting on the pointer pa. (As an
exercise, verify that each cell of a that receives a value receives the value of its
own index. For example, a[6] is set to 6.)

However, the statements


pa = a;
*(pa+10) = 0; /* WRONG */
*(pa-1) = 0; /* WRONG */
pa = &a[5];
*(pa+10) = 0; /* WRONG */
pa = &a[10];
*pa = 0; /* WRONG */
and
int *pa2;
pa = &a[5];
pa2 = pa + 10; /* WRONG */
pa2 = pa - 10; /* WRONG */
are all invalid. The first examples set pa to point into the array a but then use
overly-large offsets (+10, -1) which end up trying to store a value outside of the
array a. The statements in the last set of examples set pa2 to point outside of the
array a. Even though no attempt is made to access the nonexistent cells, these
statements are illegal, too. Finally, the code
int a[10];
int *pa, *pa2;
pa = &a[5];
pa2 = pa + 10; /* WRONG */
*pa2 = 0; /* WRONG */

would be very wrong, because it not only computes a pointer to the
nonexistent cell a[15] of a 10-element array, but it also tries to
store something there.

section 5.4: Address Arithmetic

This section is going to get pretty hairy. Some of it talks about things we've already
seen (adding integers to pointers); some of it talks about things we need to learn
(comparing and subtracting pointers); and some of it talks about a rather
sophisticated example (a storage allocator). Don't worry if you can't follow all the
details of the storage allocator, but do read along so that you can pick up the other
new points. (In other words, make sure you read from ``Zero is the sole exception''
in the middle of page 102 to ``that is, the string length'' on page 103, and also the
last paragraph on page 103.)

What is a storage allocator for? So far, we've used pointers to point to existing
variables and arrays, which the compiler allocated for us. But eventually, we may
want to allocate data structures (arrays, and others we haven't seen yet) of a size
which we don't know at compile time. Earlier, we spoke briefly about a
hypothetical inventory-management system, which recorded information about
each part stored in a warehouse. How many different parts could there be? If we
used fixed-size arrays, there would be a fixed upper limit on the number of parts
we could enter into the system, and we'd be annoyed if that limit were reached. A
better solution is not to allocate a fixed array at compile time, but rather to use a
run-time storage allocator to allocate memory for the data structures used to
describe each part. That way, the number of parts which the system can hold is
limited only by available memory, not by any static limit built into the program.
Using a storage allocator to allocate memory at run time in this way is
called dynamic allocation.

However, dynamic memory allocation is where C programming can really get
tricky, because you the programmer are responsible for most aspects of it, and
there are plenty of things you can do wrong (e.g. not allocate quite enough
memory, accidentally keep using it after you deallocate it, have random invalid
pointers pointing everywhere, etc.). Therefore, we won't be talking about dynamic
allocation for a while, which is why you can skim over the storage allocator in this
section for now.

page 102

The first new piece of information in this section (which you'll need to remember
even if you're not following the details of the storage allocator example) is the
introduction of the ``null pointer.''

So far, all of our pointers have pointed somewhere, and we've cautioned about
pointers which don't. To help us distinguish between pointers which point
somewhere and pointers which don't, there is a single, special pointer value we can
use, which is guaranteed not to point anywhere. When a pointer doesn't point
anywhere, we can set it to this value, to make explicit the fact that it doesn't point
anywhere.

This special pointer value is called the null pointer. The way to set a pointer to this
value is to use a constant 0:
int *ip = 0;

The 0 is just a shorthand; it does not necessarily mean machine address 0. To make
it clear that we're talking about the null pointer and not the integer 0, we often use
a macro definition like
#define NULL 0
so that we can say things like
int *ip = NULL;
(If you've used Pascal or LISP, the nil pointer in those languages is analogous.)

In fact, the above #definition of NULL has been placed in the standard header
file <stdio.h> for us (and in several other standard header files as well), so we
don't even need to #define it. I agree completely with the authors that
using NULL instead of 0 makes it more clear that we're talking about a null pointer,
so I'll always be using NULL, too.

Just as we can set a pointer to NULL, we can also test a pointer to see if it's NULL. The
code
if(p != NULL)
    *p = 0;
else printf("p doesn't point anywhere\n");
tests p to see if it's non-NULL. If it's not NULL, it assumes that it points somewhere
valid, and writes a 0 there. Otherwise (i.e. if p is the null pointer) the code
complains.

Though we can use null pointers as markers to remind ourselves of which of our
pointers don't point anywhere, it's up to us to do so. It is not guaranteed that all
uninitialized pointer variables (which obviously don't point anywhere) are
initialized to NULL, so if we want to use the null pointer convention to remind
ourselves, we'd best explicitly initialize all unused pointers to NULL. Furthermore,
there is no general mechanism that automatically checks whether a pointer is non-
null before we use it. If we think that a pointer might not point anywhere, and if
we're using the convention that pointers that don't point anywhere are set to NULL,
it's up to us to compare the pointer to NULL to decide whether it's safe to use it.

The next new piece of information in this section (which we've already alluded to)
is pointer comparison. You can compare two pointers for equality or inequality
(== or !=): they're equal if they point to the same place or are both null pointers;
they're unequal if they point to different places, or if one points somewhere and
one is a null pointer. If two pointers point into the same array, the relational
comparisons <, <=, >, and >= can also be used.

page 103

The sentences

...n is scaled according to the size of the objects p points to, which is determined by
the declaration of p. If an int is four bytes, for example, the int will be scaled by
four.
say something we've seen already, but may only confuse the issue. We've said
informally that in the code
int a[10];
int *pa = &a[0];
*(pa+1) = 1;
pa contains the ``address'' of the int object a[0], but we've discouraged thinking
about this address as an actual machine memory address. We've said that the
expression pa+1 moves to the next int in the array (in this case, a[1]). Thinking at
this abstract level, we don't even need to worry about any ``scaling by the size of
the objects pointed to.''

If we do look at a lower, machine level of addressing, we may learn that
an int occupies some number of bytes (usually two or four), such that when we
add 1 to a pointer-to-int, the machine address is actually increased by 2 or 4. If
you like to consider the situation from this angle, you're welcome to, but if you
don't, you certainly don't have to. If you do start thinking about machine addresses
and sizes, make extra sure that you remember that C does do the necessary scaling
for you. Don't write something like
int a[10];
int *pa = &a[0];
*(pa+sizeof(int)) = 1;
where sizeof(int) is the size of an int in bytes, and expect it to access a[1].

Since adding an int to a pointer gives us another pointer:

int a[10];
int *pa1 = &a[0];
int *pa2 = pa1 + 5;
we might wonder if we can rearrange the expression
pa2 = pa1 + 5
to get
pa2 - pa1 = 5
(where this is no longer a C assignment, we're just wondering if we can
subtract pa1 from pa2, and what the result might be). The answer is yes: just as you
can compare two pointers which point into the same array, you can subtract them,
and the result is, naturally enough, the distance between them, in cells or elements.

(In the large parenthetical statement in the middle of the page, don't worry too
much about ptrdiff_t, size_t, and sizeof.)

section 5.5: Character Pointers and Functions


page 104

Since text strings are represented in C by arrays of characters, and since arrays are
very often manipulated via pointers, character pointers are probably the most
common pointers in C.

Deep sentence:

C does not provide any operators for processing an entire string of characters as a
unit.
We've said this sort of thing before, and it's a general statement which is true of all
arrays. Make sure you understand that in the lines
char *pmessage;
pmessage = "now is the time";
pmessage = "hello, world";
all we're doing is assigning two pointers, not copying two entire strings.

At the bottom of the page is a very important picture. We've said that pointers and
arrays are different, and here's another illustration. Make sure you appreciate the
significance of this picture: it's probably the most basic illustration of how arrays
and pointers are implemented in C.

We also need to understand the two different ways that string literals like "now is
the time" are used in C. In the definition

char amessage[] = "now is the time";


the string literal is used as the initializer for the array amessage. amessage is here an
array of 16 characters, which we may later overwrite with other characters if we
wish. The string literal merely sets the initial contents of the array. In the definition
char *pmessage = "now is the time";
on the other hand, the string literal is used to create a little block of characters
somewhere in memory which the pointer pmessage is initialized to point to. We
may reassign pmessage to point somewhere else, but as long as it points to the
string literal, we can't modify the characters it points to.

As an example of what we can and can't do, given the lines


char amessage[] = "now is the time";
char *pmessage = "now is the time";
we could say
amessage[0] = 'N';
to make amessage say "Now is the time". But if we tried to do
pmessage[0] = 'N';
(which, as you may recall, is equivalent to *pmessage = 'N'), it would not
necessarily work; we're not allowed to modify that string. (One reason is that the
compiler might have placed the ``little block of characters'' in read-only memory.
Another reason is that if we had written
char *pmessage = "now is the time";
char *qmessage = "now is the time";
the compiler might have used the same little block of memory to initialize both
pointers, and we wouldn't want a change to one to alter the other.)

Deep sentence:

The first function is strcpy(s,t), which copies the string t to the string s. It would
be nice just to say s=t but this copies the pointer, not the characters.

This is a restatement of what we said above, and a reminder of why we'll need a
function, strcpy, to copy whole strings.

page 105

Once again, these code fragments are being written in a rather compressed way. To
make it easier to see what's going on, here are alternate versions of strcpy, which
don't bury the assignment in the loop test. First we'll use array notation:
void strcpy(char s[], char t[])
{
    int i;
    for(i = 0; t[i] != '\0'; i++)
        s[i] = t[i];
    s[i] = '\0';
}
Note that we have to manually append the '\0' to s after the loop. Note that in
doing so we depend upon i retaining its final value after the loop, but this is
guaranteed in C, as we learned in Chapter 3.

Here is a similar function, using pointer notation:


void strcpy(char *s, char *t)
{
    while(*t != '\0')
        *s++ = *t++;
    *s = '\0';
}
Again, we have to manually append the '\0'. Yet another option might be to use
a do/while loop.

All of these versions of strcpy are quite similar to the copy function we saw on
page 29 in section 1.9.

page 106

The version of strcpy at the top of this page is my least favorite example in the
whole book. Yes, many experienced C programmers would write strcpy this way,
and yes, you'll eventually need to be able to read and decipher code like this, but
my own recommendation against this kind of cryptic code is strong enough that I'd
rather not show this example yet, if at all.

We need strcmp for about the same reason we need strcpy. Just as we cannot
assign one string to another using =, we cannot compare two strings using ==. (If
we try to use ==, all we'll compare is the two pointers. If the pointers are equal,
they point to the same place, so they certainly point to the same string, but if we
have two strings in two different parts of memory, pointers to them will always
compare different even if the strings pointed to contain identical sequences of
characters.)

Note that strcmp returns a positive number if s is greater than t, a negative number
if s is less than t, and zero if s compares equal to t. ``Greater than'' and ``less than''
are interpreted based on the relative values of the characters in the machine's
character set. This means that 'a' < 'b', but (in the ASCII character set, at least) it
also means that 'B' < 'a'. (In other words, capital letters will sort before lower-
case letters.) The positive or negative number which strcmp returns is, in this
implementation at least, actually the difference between the values of the first two
characters that differ.

Note that strcmp returns 0 when the strings are equal. Therefore, the condition
if(strcmp(a, b))
do something...
doesn't do what you probably think it does. Remember that C considers zero to be
``false'' and nonzero to be ``true,'' so this code does something if the
strings a and b are unequal. If you want to do something if two strings are equal,
use code like
if(strcmp(a, b) == 0)
do something...
(There's nothing fancy going on here: strcmp returns 0 when the two strings are
equal, so that's what we explicitly test for.)

To continue our ongoing discussion of which pointer manipulations are safe and
which are risky or must be done with care, let's consider character pointers. As
we've mentioned, one thing to beware of is that a pointer derived from a string
literal, as in
char *pmessage = "now is the time";
is usable but not writable (that is, the characters pointed to are not writable.)
Another thing to be careful of is that any time you copy strings, using strcpy or
some other method, you must be sure that the destination string is a writable array
with enough space for the string you're writing. Remember, too, that the space you
need is the number of characters in the string you're copying, plus one for the
terminating '\0'.

For the above reasons, all three of these examples are incorrect:
char *p1 = "Hello, world!";
char *p2;
strcpy(p2, p1); /* WRONG */

char *p = "Hello, world!";
char a[13];
strcpy(a, p); /* WRONG */

char *p3 = "Hello, world!";
char *p4 = "A string to overwrite";
strcpy(p4, p3); /* WRONG */
In the first example, p2 doesn't point anywhere. In the second example, a is a
writable array, but it doesn't have room for the terminating '\0'. In the third
example, p4 points to memory which we're not allowed to overwrite. A correct
example would be
char *p = "Hello, world!";
char a[14];
strcpy(a, p);
(Another option would be to obtain some memory for the string copy, i.e. the
destination for strcpy, using dynamic memory allocation, but we're not talking
about that yet.)

page 106 continued (bottom)

Expressions like *p++ and *--p may seem cryptic at first sight, but they're actually
analogous to array subscript expressions like a[i++] and a[--i], some of which we
were using back on page 47 in section 2.8.

section 5.6: Pointer Arrays; Pointers to Pointers


page 107

Deep sentence:

Since pointers are variables themselves, they can be stored in arrays just as other
variables can.
This is just one aspect of the generality of C's data types, which we'll be seeing in
the next few sections.

We've used a recursive definition of ``expression'': a constant or variable is an
expression, an expression in parentheses is an expression, an expression plus an
expression is an expression, etc. There are obviously an infinite number of
expressions, of arbitrary complexity. In exactly the same way, there are an infinite
number of data types in C. We've already seen the basic data
types: int, char, double, etc. But then we have the derived data types such as array-
of-char and pointer-to-int and function-returning-double. So we can say that for
any type, array-of-type is another type, and pointer-to-type is another type, and
function-returning-type is another type. Once we've said that, we can see that there
is also the possibility of arrays of pointers, and arrays of arrays, and functions
returning pointers, and even (in section 5.11, though this is a deeper topic) pointers
to functions. (The only possibilities that C doesn't support are functions returning
arrays, and arrays of functions, and functions returning functions.)

Make sure you understand why an integer is something that can be ``compared or
moved in a single operation,'' but that a string (that is, an array of char) is not.
Then, realize that a pointer is also something that can be ``compared or moved in a
single operation.'' (Actually, though, the string comparisons we'll be doing are not
single operations.) From time to time you'll hear me caution you not to worry too
much about certain aspects of efficiency. Here, it's true that the overhead of
copying entire strings from one place to another, a character at a time (which is the
overhead we'll be getting around by manipulating pointers instead) can be
significant, but that's not the only concern: once we're comfortable with the idea,
manipulating pointers will be somewhat easier on us, too. (Copying lots of
characters around is a nuisance, and it can also be dangerous, if the destination isn't
big enough or isn't in the right place.)

Don't worry about the ``one long character array'' that the ``lines to be sorted are
stored end-to-end in.'' Instead, look at the picture at the bottom of page 107, which
shows the pointers that might be set up after reading the lines
defghi
jklmnopqrst
abc
On the left are the pointers before sorting, and on the right are the pointers after
sorting. The three strings have not been moved, but by reshuffling the pointers, the
three pointers in order now point to the lines
abc
defghi
jklmnopqrst

page 108

Once again, we see a nice simple decomposition of the problem, which might seem
deceptively simple except that when problems are decomposed in simple ways like
this, and then implemented faithfully, they really can be this simple. Deferring the
sorting step is an excellent idea, especially if we didn't quite follow the details of
the sorting functions in the previous chapter. (Actually, in practice, we can usually
defer the sorting step forever, since there's often a general-purpose sort routine
provided for us somewhere. C is no exception: a qsort function is a required part
of its standard library. For the most part, the only people who have to write sort
routines are programming students and the few people who get stuck implementing
system functions.)

The main program at the bottom of page 108 looks a bit more elaborate than the
pseudocode at the top of the page, but the essence of the program is the three calls
to readlines, qsort, and writelines. Everything else is declarations, plus an error
message which is printed if readlines is for some reason not able to read the input.
Eventually, you should be able to understand why all of the various declarations
are required, but you can skim over them at first.

page 109

The readlines function first calls our old friend getline to read each line into a
local array, line. On page 29 in section 1.9, we saw a program for finding the
longest line in the input: it read each line into a local array line, and kept a copy of
the longest line in a second array longest. In that program, it didn't matter that the
input array line was continually overwritten with each new input line, and that
most lines (except the longest one) were lost and forgotten. Here, however,
we do need to save all of the input lines somewhere, so that we can sort them and
print them later.

The lines are saved by calling alloc, a function which we wrote in section 5.4 but
may have skimmed over. alloc allocates n bytes of new memory for something
which we need to save. Each time we read another line, we call alloc to allocate
some new memory to store it, then call strcpy to copy the line from the line array
to the newly allocated memory. This way, it's okay that the next line is read into
the same line array; we save each line, as it's read, into its own little alloc'ed piece
of memory.

Note that memory allocated with a routine such as alloc persists, just as global
and static variables do; it does not disappear when the function that allocated it
returns.

Hopefully you're getting used to reading compressed condition statements by now,
because here's another doozy:
if (nlines >= maxlines || (p = alloc(len)) == NULL)
This line checks to make sure we have enough room to store the new line we just
read. We need two things: (1) a slot in the lineptr array to store the pointer, and
(2) space allocated by alloc to store the line itself. If we don't have either of these
things, we return -1, indicating that we ran out of memory. We don't have a slot in
the lineptr array if we've already read maxlines lines, and we don't have room to
store the line itself if alloc returns NULL. The subexpression (p = alloc(len)) ==
NULL is equivalent in form to to other assign-and-test combinations we've been
using involving getchar and getline: it assigns alloc's return value to p, then
compares it to NULL.

Normally, we might be suspicious of the call alloc(len). Why? Remember that
strings are always terminated by '\0', so the space required to store a string is
always one more than the number of characters in it. Normally, we'll call things
like alloc(len + 1), and accidentally calling alloc(len) is usually a bug. Here, it
happens to be okay, because before we copy the line to the newly-allocated
memory, we strip the newline '\n' from the end of it, by overwriting it with '\0',
hence making the string one shorter than len. (Why is the last character in line,
namely the '\n', at line[len-1], and not line[len]?)

The fragments
if (nlines >= maxlines ...
and
lineptr[nlines++] = p;
deserve some attention. These represent a common way of filling in an array in
C. nlines always holds the number of lines we've read so far (it's
another invariant). It starts out as 0 (we haven't read any lines yet) and it ends up
as the total number of lines we've read. Each time we read a new line, we store the
line (more precisely, a pointer to it) in lineptr[nlines++]. By using postfix ++, we
store the pointer in the slot indexed by the previous value of nlines, which is what
we want, because arrays are 0-based in C. The first time through the loop, nlines is
0, so we store a pointer to the first line in lineptr[0], and then increment nlines to
1. If nlines ever becomes equal to maxlines, we've filled in all the slots of the
array, and we can't use any more (even though, at that point, the highest-filled cell
in the array is lineptr[maxlines-1], which is the last cell in the array, again
because arrays are 0-based). We test for this condition by checking nlines >=
maxlines, as a little measure of paranoia. The test nlines == maxlines would also
work, but if we ever accidentally introduce a bug into the program such that we fill
past the last slot without noticing it, we wouldn't want to keep on filling farther and
farther past the end.
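
The assign-and-test, the newline stripping, and the array-filling idiom can all be seen together in one small function. This is only a sketch, not the book's readlines: the name store_line is made up for illustration, and the standard malloc stands in for the book's alloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Save one input line in its own allocated block and record a pointer
   to it in lineptr[].  line holds len characters, the last of which is
   assumed to be '\n'.  Returns the new line count, or -1 if there is
   no free slot or no memory.  (malloc stands in for the book's alloc.) */
int store_line(char *lineptr[], int nlines, int maxlines, char *line, int len)
{
    char *p;

    assert(len > 0);
    if (nlines >= maxlines || (p = malloc((size_t)len)) == NULL)
        return -1;
    line[len - 1] = '\0';     /* overwrite '\n': the string now fits in len bytes */
    strcpy(p, line);
    lineptr[nlines++] = p;    /* fill the old slot, then bump the count */
    return nlines;
}
```

Because the newline is overwritten before the copy, len bytes are exactly enough for the remaining characters plus the terminating '\0'.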

Deep sentences:

...lineptr is an array of MAXLINES elements, each element of which is a pointer to
a char. That is, lineptr[i] is a character pointer...
We can see that lineptr[i] has to be a character pointer, by looking at two things:
in the function readlines, the line
lineptr[nlines++] = p;
has a character pointer on the right-hand side, and the only thing we can assign a
character pointer to is another character pointer. Also, in the function writelines,
in the line
printf("%s\n", lineptr[i]);
printf's %s format expects a pointer to a character, so that's what lineptr[i] had
better be.

Note that writelines prints a newline after each line, since newlines were stripped
out of the input lines by readlines.

Don't worry too much about the discussion at the bottom of page 109. We saw in
section 5.3 that due to the ``strong relationship'' between pointers and arrays, it is
always possible to manipulate an array using pointer-like notation, and to
manipulate a pointer using array-like notation. Since lineptr is an array, it is
possible to manipulate it using pointer-like notation, but since what it's an array of
is other pointers, it can start to get a bit confusing. Though many programmers do
write things like
printf("%s\n", *lineptr++);
and though this is correct code, and though one should probably understand it to
have a 100% complete understanding of C, I've decided that code like that is just a
bit too hard to follow, and I'd always write (perhaps more pedestrian and mundane)
things like
printf("%s\n", lineptr[i]);
or
printf("%s\n", lineptr[i++]);

page 110

Since I didn't ask you to follow the qsort example in section 4.10 in complete
detail, I won't ask you to work through this one completely, either. But if you
compare the code here to the code on pages 87-88, you will see that
the only significant differences are that the variables and arrays containing the
things being sorted have been changed from int to char * (pointer-to-char), and
the comparison
if (v[i] < v[left])
has been changed to
if (strcmp(v[i], v[left]) < 0)
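
For reference, here is roughly what the sort looks like once those two changes have been made. It is a sketch along the lines of the book's recursive quicksort; I've called it sort_strings (a made-up name) so that it can't collide with the qsort in the standard library.

```c
#include <string.h>

/* exchange v[i] and v[j]; only the pointers move, not the characters */
static void swap(char *v[], int i, int j)
{
    char *temp = v[i];

    v[i] = v[j];
    v[j] = temp;
}

/* sort v[left]...v[right] into increasing order */
void sort_strings(char *v[], int left, int right)
{
    int i, last;

    if (left >= right)          /* fewer than two elements: nothing to do */
        return;
    swap(v, left, (left + right) / 2);
    last = left;
    for (i = left + 1; i <= right; i++)
        if (strcmp(v[i], v[left]) < 0)   /* was: v[i] < v[left] */
            swap(v, ++last, i);
    swap(v, left, last);
    sort_strings(v, left, last - 1);
    sort_strings(v, last + 1, right);
}
```

Note that comparing v[i] < v[left] directly would still compile for pointers, but it would compare addresses, not the strings they point to.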

section 5.7: Multi-dimensional Arrays


page 111

The month_day function is another example of a function which simulates having
multiple return values by using pointer parameters. month_day is declared as void,
so it has no formal return value, but two of its parameters, pmonth and pday, are
pointers, and it fills in the locations pointed to by these two pointers with the two
values it wants to ``return.'' One line of the definition of month_day on page 111 is
cut off in all printings I have seen: it should read

void month_day(int year, int yearday, int *pmonth, int *pday)

As we've said, although any nonzero value is considered ``true'' in C, the built-in
relational and Boolean operators always ``return'' 0 or 1. Therefore, the line
int leap = year%4 == 0 && year%100 != 0 || year%400 == 0;
sets leap to 1 or 0 (``true'' or ``false'') depending on the condition
year%4 == 0 && year%100 != 0 || year%400 == 0
which is the condition for leap years in the Gregorian calendar. (It's a little-known
fact that century years are not leap years unless they are also divisible by 400.
Thus, 2000 will be a leap year.) The 1/0 value that leap receives is what the
authors are referring to when they say that ``the arithmetic value of a logical
expression... can be used as a subscript of the array daytab.'' This line could also
have been written
int leap;
if (year%4 == 0 && year%100 != 0 || year%400 == 0)
    leap = 1;
else
    leap = 0;
or
int leap = (year%4 == 0 && year%100 != 0 || year%400 == 0) ? 1 : 0;
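
To see the 1/0 value actually being used as a subscript, here is a sketch of the day-of-year computation. daytab and day_of_year follow the book; the helper is_leap is a name I've introduced just to isolate the leap-year test.

```c
#include <assert.h>

static char daytab[2][13] = {
    {0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31},
    {0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31}
};

/* 1 if year is a Gregorian leap year, 0 otherwise */
int is_leap(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* day of year from month and day; the 0-or-1 value selects the row */
int day_of_year(int year, int month, int day)
{
    int i, leap = is_leap(year);

    assert(month >= 1 && month <= 12);
    for (i = 1; i < month; i++)
        day += daytab[leap][i];
    return day;
}
```

The extra parentheses in is_leap don't change the meaning (&& already binds tighter than ||); they just make the grouping visible.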

page 112

The daytab array holds small integers (in the range 0-31), so it can legally be made
an array of char, though whether this is a legitimate use is a question of style.

Deep sentence:

In C, a two-dimensional array is really a one-dimensional array, each of whose
elements is an array.
Earlier we said that ``array-of-type is another type,'' and here we must believe it:
since array-of-type is a type, array-of-(array-of-type) is yet another type.

The statement that ``Elements are stored by rows, so the rightmost subscript, or
column, varies fastest as elements are accessed in storage order'' probably won't
make much sense unless you've done a lot of work with other languages, such as
FORTRAN, which do have true multi-dimensional arrays. It's pretty arbitrary what
you call a ``row'' and what you call a ``column''; the most important thing to know
is which subscript goes with which dimension. If you have
int a[10][20];
then in the reference a[i][j], i can range from 0 to 9 and j can range from 0 to 19.
In other words, you might write
for (i = 0; i < 10; i++)
    for (j = 0; j < 20; j++)
        do something with a[i][j]

We also want to know what a actually is. Is it an array of 10 arrays, each of size
20, or is it an array of 20 arrays, each of size 10? There are other ways of
convincing ourselves of the answer, but for now let's just say that the ``closer''
dimensions are closer to what a is. Therefore, a is first an array of size 10, and
what it's an array of is arrays of 20 ints. This also tells us that if we ever refer
to a[i] (without a second subscript), then we're referring to just one of those 10
arrays (of size 20) in its entirety.
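
One of those "other ways of convincing ourselves" is to ask the compiler, using sizeof. The following is a small sketch; the function names outer_count and inner_count are invented for the purpose.

```c
#include <stddef.h>

int a[10][20];

/* how many sub-arrays does a contain?  (whole array / one sub-array) */
size_t outer_count(void)
{
    return sizeof a / sizeof a[0];
}

/* how many ints does one sub-array contain?  (one sub-array / one int) */
size_t inner_count(void)
{
    return sizeof a[0] / sizeof a[0][0];
}
```

If a were instead an array of 20 arrays of 10 ints, the two functions would report 20 and 10, so the declaration int a[10][20] really does mean "10 arrays of 20 ints."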

When we look back at the initialization of the daytab array on page 111, everything
lines up. daytab is defined as
char daytab[2][13]
and we can see from the initializer that there are two (sub)arrays, each of size 13.
(We can also see that there is some justification for saying that the first subscript
refers to ``rows'' and the second to ``columns.'')

The authors illustrate one way of dealing with C's 0-based arrays when you have
an algorithm that really wants to treat an array as if it were 1-based. Here, rather
than remembering to subtract one from the 1-based month number each time, they
chose to waste a ``column'' of the array, and declare it one larger than necessary, so
that they could refer to subscripts from [1] to [12].

One last note about the initialization of daytab: you may have seen code in other
programming books that kept an array of the cumulative days of all the months:
{0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365}

Precomputing an array like that might make things a tiny bit easier on the
computer (it wouldn't have to loop through the entire array each time, as it does in
the day_of_year function), but it makes it considerably harder to see what the
numbers mean, and to verify that they are correct. The simple table of individual
month lengths is much clearer, and if the computer has to do a bit more grunge
work, well, that's what computers are for. As explained in another book co-
authored by Brian Kernighan:
A cumulative table of days must be calculated by someone and checked by
someone else. Since few people are familiar with the number of days up to the end
of a particular month, neither writing nor checking is easy. But if instead we use a
table of days per month, we can let the computer count them for us. (``Let the
machine do the dirty work.'')

The bottom of page 112 begins to get confusing. The ``number of rows'' of an array
like daytab ``is irrelevant'' when passed to a function such as the
hypothetical f because the compiler doesn't need to know the number of rows when
calculating subscripts. It does need to know the number of columns or ``width,''
because that's how it knows that the second element on the second row of a
10-column array is actually 11 cells past the beginning of the array, which is
essentially what it needs to know when it goes off and actually accesses the array
in memory. But it doesn't need to know how long the overall array is, as long as we
promise not to run off the end of it, and that's always up to us. (This is why we
haven't specified the array sizes in the definitions of functions such as getline on
pages 29 and 69, or atoi on pages 43, 61, and 73, or readlines on page 109,
although we did carry the array size as a separate argument
to getline and readlines, to assist us in our promise not to run off the end.)
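
The arithmetic the compiler does is just (row number) times (width) plus (column number). Here is a sketch that measures the offset rather than computing it, so you can check the formula for yourself; WIDTH and cell_offset are names I've made up, and the function deliberately takes no row count at all.

```c
#include <assert.h>
#include <stddef.h>

#define WIDTH 10

/* How many cells past the start of the array is a[i][j]?
   The answer is measured in bytes and then divided by the cell size. */
size_t cell_offset(int (*a)[WIDTH], size_t i, size_t j)
{
    assert(j < WIDTH);
    return (size_t)((char *)&a[i][j] - (char *)a) / sizeof(int);
}
```

For any in-range i and j the result equals i * WIDTH + j, which is why the width must be known and the number of rows need not be.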

The third version of f on page 112 comes about because of the ``gentle fiction''
involving array parameters. We learned on page 99 that functions don't really
receive arrays as parameters; they receive pointers (since any array passed by the
caller decayed immediately to a pointer). On page 39 we wrote a strlen function
as
int strlen(char s[])
but on page 99 we rewrote it as
int strlen(char *s)

which is closer to the way the compiler sees the situation. (In fact, when we
write int strlen(char s[]), the compiler essentially rewrites it as int
strlen(char *s) for us.) In the same way, a function declared as
f(int daytab[][13])
can be rewritten by us (or if not, is rewritten by the compiler) to
f(int (*daytab)[13])
which declares the daytab parameter as a pointer-to-array-of-13-ints. Here we see
two things: (1) the rewrite which changes an array parameter to a pointer parameter
happens only once (we end up with a pointer to an array, not a pointer to a pointer),
and (2) the syntax for pointers to arrays is a bit messy, because of some required
extra parentheses, as explained in the text.

If this seems obscure, don't worry about it too much; just declare functions with
array parameters matching the arrays you call them with, like
f(int daytab[2][13])
and let the compiler worry about the rewriting.
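
If you'd like to convince yourself that the three spellings really are the same type, here is a sketch in which each version simply hands its parameter on to the next, with no casts. The function names are invented; the table is assumed to have the same layout as daytab, but holding ints as in the declarations of f.

```c
/* The real work: daytab is a pointer to an array of 13 ints. */
int year_length_ptr(int (*daytab)[13], int leap)
{
    int i, total = 0;

    for (i = 1; i <= 12; i++)
        total += daytab[leap][i];
    return total;
}

/* Same parameter type, written with an empty first dimension. */
int year_length_open(int daytab[][13], int leap)
{
    return year_length_ptr(daytab, leap);   /* no cast needed */
}

/* Same parameter type again; the 2 is accepted but ignored. */
int year_length_full(int daytab[2][13], int leap)
{
    return year_length_open(daytab, leap);
}
```

A caller can pass an int table[2][13] to any of the three; in each case what arrives is a pointer to its first row.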

Deep sentence:

More generally, only the first dimension (subscript) of an array is free; all the
others have to be specified.

This just says what we said already: when declaring an array as a function
parameter, you can leave off the first dimension because it is the overall length and
not knowing it causes no immediate problems (unless you accidentally go off the
end). But the compiler always needs to know the other dimensions, so that it
knows how the rows and columns line up.

section 5.8: Initialization of Pointer Arrays


page 113

This section is short and sweet, and there are only two things I feel the need to
comment on. The sentence ``The characters of the i-th string are placed
somewhere'' simply refers to the fact that string literals always work that way
(except when they're used as array initializers, as explained on page 104). We don't
really care where the characters are, as long as we can keep hold of a pointer to
them.

The other thing to notice is that the month_name function does verify that its
argument is valid. If it didn't check n against the boundary values 1 and 12, what
would happen if we called month_name(123)?
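
Here is a sketch of the function with the check in place, close to the book's version. The one liberty I've taken is to declare the strings const, since string literals shouldn't be modified.

```c
/* month_name: return name of n-th month, or "Illegal month" if n is
   outside the range 1..12 */
const char *month_name(int n)
{
    static const char *name[] = {
        "Illegal month",
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };

    return (n < 1 || n > 12) ? name[0] : name[n];
}
```

Without the test, month_name(123) would index far past the end of the 13-element name array and return whatever garbage happened to be there, interpreted as a pointer.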

section 5.9: Pointers vs. Multi-dimensional Arrays


Actually, some people (and not just newcomers) are sometimes confused about the
difference between a one-dimensional array and a single pointer, too; moving to
two-dimensional arrays, arrays of pointers, and pointers to pointers only makes
things worse. (But don't lose heart: if you pay attention and keep your head
screwed on straight, you should be able to keep the differences clearly in mind.)

The adjective ``syntactically'' in the paragraph at the bottom of the page is
significant: after saying
int *b[10];
an immediate reference to b[3][4] would not be completely legal. It wouldn't be a
syntax error or anything, but when the compiler tried to fetch the pointer b[3] and
then the integer at index [4] that it points to, it would go off into deep space, because
b[3] hasn't been set yet and doesn't point anywhere.

You might want to draw a picture of the data structures that would result
``[a]ssuming that each element of b does point to a twenty-element array,'' and
verify that there are ``200 ints set aside, plus ten cells for the pointers.'' (The
picture will be similar to the one on the next page.)

Actually, I'm not sure if having rows of different lengths is the only important
advantage of using a pointer array. Another is that the size of the arrays (as we'll
see later) can be decided at run-time; another is that the pointers make certain
manipulations easier (such as the sorting example we worked through in section
5.6).

page 114

Do study the pictures on this page carefully, and make sure you understand the
representations of the name and aname arrays and how they differ. (You might want
to refer back to the similar discussion of pmessage and amessage on page 104 in
section 5.5.)

section 5.10: Command-line Arguments


page 115

The picture at the top of page 115 doesn't quite match the declaration
char *argv[]
it's actually a picture of the situation declared by
char **argv
which is what main actually receives. (The array parameter declaration char
*argv[] is rewritten by the compiler to char **argv, in accordance with the
discussion in sections 5.3 and 5.7.) Also, the ``0'' at the bottom of the array is just a
representation of the null pointer which conventionally terminates the argv array.
(Normally, you'll never encounter the terminating null pointer, because if you think
of argv as an array of size argc, you'll never access beyond argv[argc-1].)

The loop
for (i = 1; i < argc; i++)
looks different from most loops we see in C (which either start at 0 and use <, or
start at 1 and use <=). The reason is that we're skipping argv[0], which contains the
name of the program.

The expression
printf("%s%s", argv[i], (i < argc-1) ? " " : "");
is a little nicety to print a space after each word (to separate it from the next word)
but not after the last word. (The nicety is just that the code doesn't print an extra
space at the end of the line.) It would also be possible to fold in the
following printf of the newline:
printf("%s%s", argv[i], (i < argc-1) ? " " : "\n");

As I mentioned in the comment on the bottom of page 109, it's not necessary to write
pointer-incrementing code like
while(--argc > 0)
printf("%s%s", *++argv, (argc > 1) ? " " : "");
if you don't feel comfortable with it. I used to try to write code like this, because it
seemed to be what everybody else did, but it never sat well, and it was always just
a bit too hard to write and to prove correct. I've reverted to simple, obvious loops
like
int argi;
char *sep = "";

for (argi = 1; argi < argc; argi++) {
    printf("%s%s", sep, argv[argi]);
    sep = " ";
}
printf("\n");
Often, it's handy to have the original argc and argv around later, anyway. (This
loop also shows another way of handling space separators.)
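
The sep trick is handy enough that it's worth seeing in a form you can test. The following sketch builds the joined words in a buffer instead of printing them; join_args and its buffer parameters are my own invention, not anything from the book.

```c
#include <stddef.h>
#include <string.h>

/* Copy argv[1]..argv[argc-1] into out, separated by single spaces.
   Returns 0 on success, -1 if out (of size outsize) is too small. */
int join_args(int argc, char *argv[], char *out, size_t outsize)
{
    int argi;
    const char *sep = "";
    size_t used = 0;

    if (outsize == 0)
        return -1;
    out[0] = '\0';
    for (argi = 1; argi < argc; argi++) {
        size_t need = strlen(sep) + strlen(argv[argi]);

        if (used + need + 1 > outsize)      /* +1 for the final '\0' */
            return -1;
        strcpy(out + used, sep);
        strcpy(out + used + strlen(sep), argv[argi]);
        used += need;
        sep = " ";      /* every word after the first gets a leading space */
    }
    return 0;
}
```

The separator starts out empty and becomes a space after the first word, so there is never a leading or trailing space, and no need for a test like i < argc-1 inside the loop.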

page 116

Page 116 shows a simple improvement on the matching-lines program first
presented on page 69; page 117 adds a few more improvements. The differences
between page 69 and page 116 are that the pattern is read from the command line,
and strstr is used instead of strindex. The difference between page 116 and page
117 is the handling of the -n and -x options. (The next obvious improvement,
which we're not quite in a position to make yet, is to allow a file name to be
specified on the command line, rather than always reading from the standard
input.)

page 117

Several aspects of this code deserve note.

The line
while (c = *++argv[0])
is not in error. (In isolation, it might look like an example of the classic error of
accidentally writing = instead of == in a comparison.) What it's actually doing is
another version of a combined set-and-test: it assigns the next character pointed to
by argv[0] to c, and compares it against '\0'. You can't see the comparison
against '\0', because it's implicit in the usual interpretation of a nonzero
expression as ``true.'' An explicit test would look like this:
while ((c = *++argv[0]) != '\0')
argv[0] is a pointer to a character in a string; ++argv[0] increments that pointer to
point to the next character in the string; and *++argv[0] increments the pointer
while returning the next character pointed to. argv[0] is not the first string on the
command line, but rather whichever one we're looking at now, since elsewhere in
the loop we increment argv itself.

Some of the extra complexity in this loop is to make sure that it can handle both
-x -n
and
-xn
In pseudocode, the option-parsing loop is
for ( each word on the command line )
    if ( it begins with '-' )
        for ( each character c in that word )
            switch ( c )
                ...
For comparison, here is another way of writing effectively the same loop:
int argi;
char *p;

for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (p = &argv[argi][1]; *p != '\0'; p++)
        switch (*p) {
        case 'x':
            ...
This uses array notation to access the words on the command line, but pointer
notation to access the characters within a word (more specifically, a word that
begins with '-'). We could also use array notation for both:
int argi, chari;

for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
    for (chari = 1; argv[argi][chari] != '\0'; chari++)
        switch (argv[argi][chari]) {
        case 'x':
            ...
In either case, the inner, character loop starts at the second character (index [1]),
not the first, because the first character (index [0]) is the '-'.
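
Filled out into a complete function, the first of these loops might look like the sketch below. The function parse_options and its return convention (index of the first non-option word, or -1 for an unknown letter) are assumptions of mine; the book's program handles an illegal option differently, by printing a message inside main.

```c
/* Scan leading option words such as "-x", "-n", or "-xn".
   Sets *except and *number to 0 or 1.  Returns the index of the first
   word that is not an option, or -1 if an unknown letter is seen. */
int parse_options(int argc, char *argv[], int *except, int *number)
{
    int argi;
    char *p;

    *except = *number = 0;
    for (argi = 1; argi < argc && argv[argi][0] == '-'; argi++)
        for (p = &argv[argi][1]; *p != '\0'; p++)
            switch (*p) {
            case 'x':
                *except = 1;
                break;
            case 'n':
                *number = 1;
                break;
            default:
                return -1;
            }
    return argi;
}
```

Because argv and argc are left untouched, the caller can still find the pattern at argv[returned index] and can still get at the program name in argv[0].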

It's easy to see how the -n option is implemented. If -n is seen, the number flag is
set to 1 (a.k.a. ``true''), and later, in the line-matching loop, each time a line is
printed, if the number flag is true, the line number is printed first. It's harder to see
how -x works. An except flag is set to 1 if -x is present, but how is except used? It's
buried down there in the line
if ((strstr(line, *argv) != NULL) != except)
What does that mean? The subexpression
(strstr(line, *argv) != NULL)
is 1 if the line contains the pattern, and 0 if it does not. except is 0 if we should
print matching lines, and 1 if we should print non-matching lines. What we've
actually implemented here is an ``exclusive OR,'' which is ``if A or B
but not both.'' Other ways of writing this would be
int matched = (strstr(line, *argv) != NULL);
if (matched && !except || !matched && except) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or
int matched = (strstr(line, *argv) != NULL);
if (except ? !matched : matched) {
    if (number)
        printf("%ld:", lineno);
    printf("%s", line);
    found++;
}
or
int matched = (strstr(line, *argv) != NULL);
if (!except) {
    if (matched) {
        if (number)
            printf("%ld:", lineno);
        printf("%s", line);
        found++;
    }
}
else {
    if (!matched) {
        if (number)
            printf("%ld:", lineno);
        printf("%s", line);
        found++;
    }
}

There's clearly a tradeoff: the last version is in some sense the most clear (and the
most verbose), but it ends up repeating the line-number printing and any other
processing which must be done for found lines. Therefore, the compressed,
perhaps slightly more cryptic forms are better: some day, it's a virtual certainty that
more processing will be added for printed lines (for example, if we're searching
multiple files, we'll want to print the filename for matching lines, too), and if the
printing is duplicated in two places, it's far too likely that we'll overlook that fact
and add the new code in only one place.
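
One more way to keep the compressed test while still making it readable is to give it a name. Here is a sketch; should_print is a hypothetical helper, and it assumes, as in the book's program, that except is always exactly 0 or 1.

```c
#include <string.h>

/* Decide whether a line should be printed.
     matched  except  result
        0        0       0     (no match, normal mode: skip)
        1        0       1     (match, normal mode: print)
        0        1       1     (no match, -x mode: print)
        1        1       0     (match, -x mode: skip)
   That table is exclusive OR, which != computes for 0/1 values. */
int should_print(const char *line, const char *pat, int except)
{
    int matched = (strstr(line, pat) != NULL);

    return matched != except;
}
```

If except could ever hold some other nonzero value, the comparison would have to be written matched != (except != 0) to stay correct.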

One last point on the pattern-matching program: it's probably clearer to declare a
pointer variable
char *pat;
and set it to the word from argv to be used as the search pattern (argv[1] or *argv,
depending on whether we're looking at page 116 or 117), and then use that in the
call to strstr:
if (strstr(line, pat) != NULL ...

Chapter 6: Structures

page 127

There's one other piece of motivation behind structures that it's useful to discuss.
Suppose we didn't have structures (or didn't know what they were or how to use
them). Suppose we wanted to implement payroll records. We might set up a bunch
of parallel arrays, holding the names, mailing addresses, social security numbers,
and salaries of all of our employees:
char *name[100];
char *address[100];
long ssn[100];
float salary[100];
The idea here is that name[0], address[0], ssn[0], and salary[0] would describe
one employee, array slots with subscript [1] would describe the second employee,
etc. There are at least two problems with this scheme: first, if we someday want
to handle more than 100 employees, we have to remember to change the size of
several arrays. (Using a symbolic constant like
#define MAXEMPLOYEES 100
would certainly help.)

More importantly, there would be no easy way to pass around all the information
associated with a single employee. Suppose we wanted to write the
function print_employee, which will print all the information associated with a
particular employee. What arguments would this function take? We could pass it
the index to use to retrieve the information from the arrays, but that would mean
that all of the arrays would have to be global. We could pass the function an
individual name, address, SSN, and salary, but that would mean that whenever we
added a new piece of information to the database (perhaps next week we'll want to
keep track of employees' shoe sizes), we would have to add another argument to
the print_employee function, and change all of the calls. (Pretty soon, the number
of arguments to the print_employee function would become unwieldy.) What we'd
really like is a way to encapsulate all of the data about a single employee into a
single data structure, so we could just pass that data structure around.

The right solution to this problem, in languages such as C which support the idea,
is to define a structure describing an employee. We can make one array of these
structures to describe all the employees, and we can pass around single instances of
the structure where they're needed.
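
As a preview of the syntax this chapter introduces, here is a sketch of what that might look like. The structure tag employee and the function format_employee are made up for illustration; the function formats into a buffer rather than printing, purely so that the result is easy to check.

```c
#include <stdio.h>

struct employee {
    char *name;
    char *address;
    long ssn;
    float salary;
};

/* One array of structures replaces the four parallel arrays. */
struct employee staff[100];

/* Format one employee's record into buf.  The whole record travels as
   a single argument, so adding a member later changes no call sites.
   Returns whatever snprintf returns (the length of the full text). */
int format_employee(char *buf, size_t size, struct employee e)
{
    return snprintf(buf, size, "%s, %s, %ld, %.2f",
                    e.name, e.address, e.ssn, e.salary);
}
```

A call such as format_employee(buf, sizeof buf, staff[i]) passes everything known about employee i, and no global arrays are needed inside the function.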

section 6.1: Basics of Structures

section 6.2: Structures and Functions

section 6.3: Arrays of Structures

The sizeof operator

section 6.4: Pointers to Structures

section 6.5: Self-referential Structures

section 6.1: Basics of Structures


Don't get too excited about the prospect of doing graphics in C--there's no one
standard or portable way of doing it, so the points and rectangles we're going to be
discussing must remain abstract for now (we won't be able to plot them out).

page 128

To summarize the syntax of structure declarations: A structure declaration has
about four parts, most of them optional: the keyword struct, a structure
tag (optional), a brace-enclosed list of declarations for the members (also called
``fields'' or ``components'') of the structure (optional), and a list of variables of the
new structure type (optional). The arrangement looks like this:
struct tag {
member declarations
} declared variables ;
Normally, a structure declaration defines either a tag and the members, or some
variables based on an existing tag, or sometimes all three at once. That is, we
might first declare a structure:
struct point { /* 1 */
int x;
int y;
};
and then some variables of that type:
struct point here, there; /* 2 */
Or, we could combine the two:
struct point { /* 3 */
int x;
int y;
} here, there;
The list of members (if present) describes what the new structure ``looks like
inside.'' The list of variables (if present) is (obviously) the list of variables of this
new type which we're defining and which the rest of the program will use. The tag
(if present) is just an arbitrary name for the structure type itself (not for any
variable we're defining). The tag is used to associate a structure definition (as in
fragment 1) with a later declaration of variables of that same type (as in fragment
2).

One thing to beware of: when you declare the members of a structure without
defining any variables, always remember the trailing semicolon, as shown in
fragment 1 above. (If you forget it, the compiler will wait until the next thing it
finds in your source file, and try to define that as a variable or function of the
structure type.)

section 6.2: Structures and Functions


In this section, we'll begin playing with structures more or less as if they were
ordinary variables such as we've been using all along (which they more or less are).
As we'll see, we can declare variables of structure type, declare functions which
accept structures as parameters and return them, declare pointers to structures, take
the address of a structure (creating a pointer-to-structure) with &, and assign
structures.

Notice that when we declare something as ``a structure type,'' we always have to
say which structure type, usually by using the struct tag. If we've set up a ``point''
structure as above, then to declare a variable of this type, we say
struct point thepoint;
Both
struct thepoint; /* WRONG */
and
point thepoint; /* WRONG */
would be errors.

The above list of things the language lets us do with structures lets us keep them
and move them around, but there isn't really anything defined by the language that
we can do with structures. It's up to us to define any operations on structures,
usually by writing functions. (The addpoint function on page 130 is a good
example. It will make a bit more sense if you think of it as adding not isolated
points, but rather vectors. [We can't add Seattle plus Los Angeles, but we could
add (two miles south, one mile east) plus (one mile east, two miles north).])
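
Here is the vector interpretation worked out as a sketch, using a function like the book's addpoint. The convention that x counts miles east and y counts miles north is my own assumption for the example.

```c
struct point {
    int x;      /* miles east  (negative means west)  */
    int y;      /* miles north (negative means south) */
};

/* addpoint: add two points, treated as displacement vectors.
   p1 is a copy of the caller's structure, so modifying it here
   does not disturb the original. */
struct point addpoint(struct point p1, struct point p2)
{
    p1.x += p2.x;
    p1.y += p2.y;
    return p1;
}
```

With that convention, (two miles south, one mile east) is {1, -2} and (one mile east, two miles north) is {1, 2}; their sum is {2, 0}, or two miles due east.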

page 131

As an aside, how safe are the min() and max() macros defined at the top of page
131, with respect to the criteria discussed on pages 15 and 16 of the notes on
section 4.11.2 (page 90 in the text)?

The precise meaning of the ``shorthand'' -> operator is that sp->m is, by definition,
equivalent to (*sp).m, for any structure pointer sp and member m of the pointed-to
structure.

section 6.3: Arrays of Structures


page 132

In the previous section we introduced pointers to structures and functions returning
structures without fanfare. But now let's pay attention to the fact that structures fit
the pattern of the other types: a structure is a type, so we can have pointer-to-struct,
array-of-struct, and function-returning-struct. (We can also say, following our
ongoing pattern of recursive definitions, that for any list of types t1, t2, t3, ..., we
can make a new type
struct tag {
t1 m1;
t2 m2;
t3 m3;
...
};
which is a structure composed of members of those types.)

page 134

We glossed over the binary search routine on page 58 in section 3.3, so we can
skip the details of this one, too. This illustrates another benefit of breaking
functionality out into functions, though: as long as you know what a function does,
you can understand a program that it's in without necessarily understanding all of
it. In this case, binsearch searches an array tab, containing n cells of type struct
key, looking for one whose word field matches the parameter word. If it finds a
matching cell, it returns its index in the array; otherwise, it returns -1.
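
For readers who would like to see that contract in code after all, here is a minimal sketch of such a function. It follows the description above (sorted array, index on success, -1 on failure), but it is written for these notes rather than copied from page 134, and the const qualifiers are an addition.

```c
#include <string.h>

struct key {
    const char *word;
    int count;
};

/* binsearch: look for word in tab[0]..tab[n-1], which must be sorted
   by word.  Returns the index of the matching cell, or -1 if none. */
int binsearch(const char *word, struct key tab[], int n)
{
    int cond;
    int low = 0;
    int high = n - 1;
    int mid;

    while (low <= high) {
        mid = low + (high - low) / 2;
        if ((cond = strcmp(word, tab[mid].word)) < 0)
            high = mid - 1;         /* word sorts before tab[mid] */
        else if (cond > 0)
            low = mid + 1;          /* word sorts after tab[mid] */
        else
            return mid;             /* found it */
    }
    return -1;                      /* not in the table */
}
```

A caller only needs to know the last comment block: pass a sorted table and its size, and test the result against -1.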

The sizeof operator


page 135

This may seem like an excessively roundabout or low-level way of finding the
number of elements in an array, but it is the way it's done in C, and it's perfectly
safe and straightforward once you get used to it. (I would, however, be hard-
pressed to defend against the accusation that it's a bit too low-level.)

Note that sizeof works on both type names (things like int, char *, struct key,
etc.) and variables (strictly speaking, any expression). Parentheses are required
when you're using sizeof with a type name and optional when you're using it with
a variable or expression (just like return), but it's safe to just always use
parentheses.


sizeof returns the size counted in bytes, where the C definition of ``byte'' is ``the
size of a char.'' In other words, sizeof(char) is always 1. (It turns out that it's not
necessarily the case, though, that a byte or a char is 8 bits.) When we start doing
our own dynamic memory allocation (which will be pretty soon), we'll always be
needing to know the size of things so that we can allocate space for them, so it's
just as well that we're meeting and getting used to the sizeof operator now.
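
The two forms of sizeof can be sketched as follows. The three function names are invented for illustration, and the struct key here is a cut-down stand-in for the one in the text.

```c
#include <stddef.h>

struct key {
    const char *word;
    int count;
};

/* sizeof applied to a type name: the parentheses are required. */
size_t key_size_from_type(void)
{
    return sizeof(struct key);
}

/* sizeof applied to an expression: the parentheses are optional. */
size_t key_size_from_object(void)
{
    struct key k = { NULL, 0 };

    return sizeof k;        /* same value as sizeof(struct key) */
}

/* Bytes needed for n objects of type struct key, which is the kind of
   number an allocation function will want to be told. */
size_t keys_bytes(size_t n)
{
    return n * sizeof(struct key);
}
```

The actual numbers returned depend on the machine and compiler; only sizeof(char) is guaranteed (it is 1).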

The sentence ``But the expression in the #define is not evaluated by the
preprocessor'' means that, as far as the preprocessor is concerned, the ``value'' of
the macro NKEYS (like the value of any macro) is just a string of characters like
(sizeof(keytab) / sizeof keytab[0])
which it replaces wherever NKEYS is used, and which will then be evaluated by the
compiler as usual, so it doesn't matter that the preprocessor wouldn't have known
how to deal with the sizeof operator, or how big the keytab array or a struct
key were.

A third way of defining NKEYS would be


#define NKEYS (sizeof(keytab) / sizeof *keytab)

Note that the definition of NKEYS depends on the definition of the keytab array
(which appears on page 133), and both of them will have to precede the use
of NKEYS in main on page 134. (Also, all three will have to be in the same source file,
unless other steps are taken.)
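
The ordering requirement can be seen in a small sketch. The table below is a shortened, assumed version of keytab (the real one on page 133 is longer), and nkeys is a hypothetical helper added only so that the macro gets used.

```c
#include <stddef.h>

struct key {
    const char *word;
    int count;
};

/* A shortened table; it must be a real array defined in this file. */
struct key keytab[] = {
    { "auto", 0 },
    { "break", 0 },
    { "case", 0 },
    { "char", 0 }
};

/* Comes after keytab, because the compiler needs the array's size
   at each place where NKEYS is used. */
#define NKEYS (sizeof(keytab) / sizeof(keytab[0]))

/* Number of entries in keytab, computed by the compiler. */
size_t nkeys(void)
{
    return NKEYS;
}
```

Note that this trick depends on keytab being an array: if a function receives the table as a pointer parameter, sizeof gives the size of the pointer, and the division no longer yields the element count.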

page 136

Notice that getword has a lot in common with the getop function of the calculator
example (section 4.3, page 80).

section 6.4: Pointers to Structures


The bulk of this section illustrates how to rewrite the binsearch function (which
we've already been glossing over) in terms of pointers instead of arrays (an
exercise which we've been downplaying). There are a few important points
towards the end of the section, however.

page 138

When we began talking about pointers and arrays, we said that it was important
never to access outside of the defined and allocated bounds of an array, either with
an out-of-range index or an out-of-bounds pointer. There is one exception: you
may compute (but not access, or ``dereference'') a pointer to the imaginary element
which sits one past the end of an array. Therefore, a common idiom for accessing
an array using a pointer looks like
int a[10];
int *ip;

for (ip = &a[0]; ip < &a[10]; ip++)
    ...
or
int a[10];
int *endp = &a[10];
int *ip;

for (ip = a; ip < endp; ip++)
    ...
The element a[10] does not exist (the allocated elements run from [0] to [9]), but
we may compute the pointer &a[10], and use it in expressions like ip <
&a[10] and endp = &a[10].
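
Here is the same idiom wrapped up as a complete function, as a sketch. The name sum_ints is made up for this example.

```c
/* Sum the n ints starting at a, walking a pointer up to (but not
   including) the position one past the end of the array. */
int sum_ints(int *a, int n)
{
    int *endp = a + n;      /* may be computed, must not be dereferenced */
    int *ip;
    int sum = 0;

    for (ip = a; ip < endp; ip++)
        sum += *ip;         /* ip always points at a real element here */
    return sum;
}
```

The loop never dereferences endp itself: as soon as ip reaches it, the test ip < endp fails and the loop stops.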

Deep sentence:

Don't assume, however, that the size of a structure is the sum of the sizes of its
members.

If this isn't the sort of thing you'd be likely to assume, you don't have to remember
the reason, which is mildly esoteric (having to do with memory alignment
requirements).
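
If you are curious, the effect is easy to observe. The following sketch uses a made-up struct mixed and two hypothetical helper functions; the amount of padding (if any) is up to the compiler, so the code measures it rather than assuming a particular figure.

```c
#include <stddef.h>

struct mixed {
    char c;
    int i;
};

/* Bytes in struct mixed beyond the sizes of its two members.
   The result is implementation-dependent, and is often not zero. */
size_t mixed_padding(void)
{
    return sizeof(struct mixed) - (sizeof(char) + sizeof(int));
}

/* Offset of member i from the start of the structure.  It is at
   least sizeof(char), and often more, so that i is properly aligned. */
size_t mixed_offset_of_i(void)
{
    return offsetof(struct mixed, i);
}
```

The practical lesson is the one in the text: always ask sizeof for the size of a structure instead of adding up the members by hand.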


section 6.5: Self-referential Structures


page 139

In section 4.10, we met recursive functions. Now, we're going to meet recursively-
defined data structures. Don't throw up your hands: the two should be easier to
understand in combination.

The mention of ``quadratic running time'' is tangential, but it's a useful-enough
concept that it might be worth a bit of explanation. If we were keeping a simple list
(``linear array'') in order, each time we had a new word to install, we'd have to scan
over the old list. On average, we'd have to scan over half the old list. (Even if we
used binary search to find the position, we'd still have to move some part of the list
to insert it.) Therefore, the more words that were in the list, the longer it would
take to install each new word. It turns out that the running time of this linear
insertion algorithm would grow as the square of the number of items in the list
(that's what ``quadratically'' means). If you doubled the size of the list, the running
time would be four times longer. An algorithm like this may seem to work fine
when you run it on small test inputs, but then when you run it on a real problem
consisting of a thousand or ten thousand or a million words, it bogs down
hopelessly.
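
To make the growth concrete, here is a sketch that counts the work done by linear insertion. It uses ints rather than words to keep it short, it measures the worst case (values arriving in descending order) rather than the average, and the names insert_sorted and worst_case_moves, along with the limit of 1000, are assumptions made for this example.

```c
/* Insert value into the sorted array a[0..n-1], which must have room
   for one more element.  Returns how many elements had to be moved. */
int insert_sorted(int a[], int n, int value)
{
    int i = n;
    int moved = 0;

    while (i > 0 && a[i - 1] > value) {
        a[i] = a[i - 1];    /* shift a larger element up one place */
        i--;
        moved++;
    }
    a[i] = value;
    return moved;
}

/* Build a sorted list from the values n, n-1, ..., 1 (each new value
   belongs at the front) and return the total number of moves. */
long worst_case_moves(int n)
{
    int a[1000];
    long total = 0;
    int i;

    if (n > 1000)
        n = 1000;           /* arbitrary limit for this sketch */
    for (i = 0; i < n; i++)
        total += insert_sorted(a, i, n - i);
    return total;
}
```

Building a list of 10 items this way takes 45 moves, and building a list of 20 takes 190: doubling the size makes the work about four times larger, which is the quadratic behavior described above.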

A binary tree is a great way to keep a set of words (or other values) in sorted order.
The definition of a binary tree is simply that, at each node, all items in the left
subtree are less than the item at that node, and all items in the right subtree are
greater. (Note that the top item in the left subtree is not
necessarily immediately less than the item at that node or anything; the
immediately-preceding item is merely down in the left subtree somewhere, along
with all the rest of the preceding items. In the ``now is the time'' example, the word
``now'' is neither the first, last, nor middle word in the sorted list; it's merely the
word that happened to be installed first. The word preceding it is ``men''; the word
following it is ``of.'' The first word in the sorted list is ``aid,'' and the last word is
``to.'')

The binary tree may not immediately seem like much of an improvement over the
linear array--we still have to scan over part of the existing tree in order to insert
each new word, and the time to add each new word will get longer as there are
more words in the tree. But, if you do the math, it turns out that on average you
have to scan over a much smaller part of the tree, and it's not a simple fraction like
half or one quarter, but rather the log (base two) of the number of items already in
the tree. Furthermore, inserting a new node doesn't involve reshuffling any old
data. For these reasons, the running time of binary tree insertion doesn't slow down
nearly as badly as linear insertion does.

By the way, the reason that the word ``binary'' comes up so often is because it
simply means ``two.'' The binary number system has two digits (0 and 1); a binary
operator has two operands; binary search eliminates half (one over two) of the
possibilities at each step; a binary tree has two subtrees at each node.

One other bit of nomenclature: the word ``node'' simply refers to one of the
structures in a set of structures that is linked together in some way, and as we're
about to see, we're going to use a set of linked structures to implement a binary
tree. Just as we talk about a ``cell'' or ``element'' of an array, we talk about a
``node'' in a tree or linked list.

When we look at the description of the algorithm for finding out whether a word is
already in the tree, we may begin to see why the binary tree is more efficient than
the linear list. When searching through a linear list, each time we discard a value
that's not the one we're looking for, we've only discarded that one value; we still
have the entire rest of the list to search. In a binary tree, however, whenever we
move down the tree, we've just eliminated half of the tree. (We might say that a
binary tree is a data structure which makes binary search automatic.) Consider
guessing a number between one and a hundred by asking ``Is it 1? Is it 2? Is it 3?''
etc., versus asking ``Is it less than 50? Is it greater than 25? Is it less than 12?''

page 140

Make sure you're comfortable with the idea of a structure which contains pointers
to other instances of itself. If you draw some little box-and-arrow diagrams for a
binary tree, the idea should fall into place easily. (As the authors point out, what
would be impossible would be for a structure to contain not a pointer but rather
another entire instance of itself, because that instance would contain another, and
another, and the structure would be infinitely big.)
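
In code, the box-and-arrow picture looks like the sketch below. The structure follows the description in the text (a word, a count, and two pointers to further nodes); treesize is a hypothetical helper, added only to show the pointers being followed.

```c
#include <stddef.h>

struct tnode {
    char *word;             /* points to the text of the word */
    int count;              /* number of occurrences */
    struct tnode *left;     /* left child: a pointer, not a whole node */
    struct tnode *right;    /* right child */
};

/* Count the nodes in the tree whose root is p. */
int treesize(struct tnode *p)
{
    if (p == NULL)
        return 0;           /* an empty subtree contains no nodes */
    return 1 + treesize(p->left) + treesize(p->right);
}
```

Because left and right are pointers, a struct tnode has a fixed, finite size no matter how large the tree eventually grows.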

page 141

Note that addtree accepts as an argument the tree to be added to, and returns a
pointer to a tree, because it may have to modify the tree in the process of adding a
new node to it. If it doesn't have to modify the tree (more precisely, if it doesn't
have to modify the top or root of the tree) it returns the same pointer it was handed.

Another thing to note is the technique used to mark the edges or ``leaves'' of the
tree. We said that a null pointer was a special pointer value guaranteed not to point
anywhere, and it is therefore an excellent marker to use when a left or right subtree
does not exist. Whenever a new node is built, addtree initializes both subtree
pointers (``children'') to null pointers. Later, another chain of calls to addtree may
replace one or the other of these with a new subtree. (Eventually, both might be
replaced.)

If you don't completely see how addtree works, leave it for a moment and look
at treeprint on the next page first.


The bottom of page 141 discusses a tremendously important issue: memory
allocation. Although we only have one copy of the addtree function (which may
call itself recursively many times), by the time we're done, we'll have many
instances of the tnode structure (one for each unique word in the input). Therefore,
we have to arrange somehow that memory for these multiple instances is properly
allocated. We can't use a local variable of type struct tnode in addtree, because
local variables disappear when their containing function returns. We can't use
a static variable of type struct tnode in addtree, or a global variable of
type struct tnode, because then we'd have only one node in the whole program,
and we need many.

What we need is some brand-new memory. Furthermore, we have to arrange it so
that each time addtree builds a brand-new node, it does so in another new piece of
brand-new memory. Since each node contains a pointer (char *) to a string, the
memory for that string has to be dynamically allocated, too. (If we didn't allocate
memory for each new string, all the strings would end up being stored in
the word array in main on page 140, and they'd step all over each other, and we'd
only be able to see the last word we read.)

For the moment, we defer the question of exactly where this brand-new memory
is to come from by defining two functions to do it. talloc is going to return a
(pointer to a) brand-new piece of memory suitable for holding a struct tnode,
and strdup is going to return a (pointer to a) brand-new piece of memory
containing a copy of a string.

page 142

treeprint is probably the cleanest, simplest recursive function there is. If you've
been putting off getting comfortable with recursive functions, now is the time.

Suppose it's our job to print a binary tree: we've just been handed a pointer to the
base (root) of the tree. What do we do? The only node we've got easy access to is
the root node, but as we saw, that's not the first or the last element to print or
anything; it's generally a random node somewhere in the middle of the eventual
sorted list (distinguished only by the fact that it happened to be inserted first). The
node that needs to be printed first is buried somewhere down in the left subtree,
and the node to print just before the node we've got easy access to is buried
somewhere else down in the left subtree, and the node to print next (after the one
we've got) is buried somewhere down in the right subtree. In fact, everything down
in the left subtree is to be printed before the node we've got, and everything down
in the right subtree is to be printed after. A pseudocode description of our task,
therefore, might be
print the left subtree (in order)
print the node we're at
print the right subtree (in order)
How can we print the left subtree, in order? The left subtree is, in general, another
tree, so printing it out sounds about as hard as printing an entire tree, which is what
we were supposed to do. In fact, it's exactly as hard: it's the same problem. Are we
going in circles? Are we getting anywhere? Yes, we are: the left subtree, even
though it is still a tree, is at least smaller than the full tree we started with. The
same is true of the right subtree. Therefore, we can use a recursive call to do the
hard work of printing the subtrees, and all we have to do is the easy part: print the
node we're at. The fact that the subtrees are smaller gives us the leverage we need
to make a recursive algorithm work.


In any recursive function, it is (obviously) important to terminate the recursion,
that is, to make sure that the function doesn't recursively call itself forever. In the
case of binary trees, when you reach a ``leaf'' of the tree (more precisely, when the
left or right subtree is a null pointer), there's nothing more to visit, so the recursion
can stop. We can test for this in two different ways, either before or after we make
the ``last'' recursive call:
void treeprint(struct tnode *p)
{
    if (p->left != NULL)
        treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    if (p->right != NULL)
        treeprint(p->right);
}
or
void treeprint(struct tnode *p)
{
    if (p == NULL)
        return;

    treeprint(p->left);
    printf("%4d %s\n", p->count, p->word);
    treeprint(p->right);
}
Sometimes, there's little difference between one approach and the other. Here,
though, the second approach (which is equivalent to the code on page 142) has a
distinct advantage: it will work even if the very first call is on an empty tree (in this
case, if there were no words in the input). As we mentioned earlier, it's extremely
nice if programs work well at their boundary conditions, even if we don't think
those conditions are likely to occur.

(One more thing to notice is that it's quite possible for a node to have a left subtree
but not a right, or vice versa; one example is the node labeled ``of'' in the tree on
page 139.)


Another impressive thing about a recursive treeprint function is that it's not
just a way of writing it, or a nifty way of writing it; it's really the only natural way of
writing it. You might try to figure out how to write a nonrecursive version. Once
you've printed something down in the left subtree, how do you know where to go
back up to? Our struct tnode only has pointers down the tree; there aren't any
pointers back to the ``parent'' of each node. If you write a nonrecursive version,
you have to keep track of how you got to where you are, and it's not enough to
keep track of the parent of the node you're at; you have to keep a stack of all the
nodes you've passed down through. When you write a recursive version, on the
other hand, the normal function-call stack essentially keeps track of all this for you.
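
To see how much bookkeeping the recursive version saves, here is a sketch of a nonrecursive one that keeps its own stack. It is not from the text: the name treeprint_iter, the fixed MAXDEPTH limit, the FILE * parameter, and the return value are all assumptions made for this illustration.

```c
#include <stdio.h>

#define MAXDEPTH 100    /* assumed limit on tree depth for this sketch */

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* Print the tree in order without recursion.  Returns the number of
   nodes printed, or -1 if the tree is deeper than MAXDEPTH. */
int treeprint_iter(struct tnode *p, FILE *fp)
{
    struct tnode *stack[MAXDEPTH];  /* nodes we have passed down through */
    int sp = 0;
    int printed = 0;

    while (p != NULL || sp > 0) {
        while (p != NULL) {         /* go as far left as possible */
            if (sp >= MAXDEPTH)
                return -1;
            stack[sp++] = p;
            p = p->left;
        }
        p = stack[--sp];            /* back up to the most recent node */
        fprintf(fp, "%4d %s\n", p->count, p->word);
        printed++;
        p = p->right;               /* then deal with its right subtree */
    }
    return printed;
}
```

Everything the explicit stack does here (remembering where to go back up to, and having to pick a size limit in advance) is handled silently by the function-call stack in the recursive version.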

We now return to the problem of dynamic memory allocation. The basic approach
builds on something we've been seeing glimpses of for a few chapters now: we use
a general-purpose function which returns a pointer to a block of n bytes of
memory. (The authors presented a primitive version of such a function in section
5.4, and we used it in the sorting program in section 5.6.) Our problem is then
reduced to (1) remembering to call this allocation function when we need to, and
(2) figuring out how many bytes we need. Problem 1 is stubborn, but problem 2 is
solved by the sizeof operator we met in section 6.3.

You don't need to worry about all the details of the ``digression on a problem
related to storage allocators.'' The vast majority of the time, this problem is taken
care of for you, because you use the system library function malloc.

The problem of malloc's return type is not quite as bad as the authors make it out to
be. In ANSI C, the void * type is a ``generic'' pointer type, specifically intended to
be used where you need a pointer which can be a pointer to any data type.
Since void * is never a pointer to anything by itself, but is always intended to be
converted (``coerced'') into some other type, it turns out that a cast is not strictly
required: in code like
struct tnode *tp = malloc(sizeof(struct tnode));
or
return malloc(sizeof(struct tnode));
the compiler is willing to convert the pointer types implicitly, without warning you
and without requiring you to insert explicit casts. (If you feel more comfortable
with the casts, though, you're welcome to leave them in.)

page 143

strdup is a handy little function that does two things: it allocates enough memory
for one of your strings, and it copies your string to the new memory, returning a
pointer to it. (It encapsulates a pattern which we first saw in the readlines function
on page 109 in section 5.6.) Note the +1 in the call to malloc! Accidentally
calling malloc(strlen(s)) is an easy but serious mistake.
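
Here is a sketch of the two helper functions, written for these notes rather than copied from the text. The string copier is called dupstr here (a made-up name) because many systems already provide their own strdup, and the const qualifier is an addition.

```c
#include <stdlib.h>
#include <string.h>

struct tnode {
    char *word;
    int count;
    struct tnode *left;
    struct tnode *right;
};

/* talloc: return brand-new memory big enough for one struct tnode,
   or a null pointer if no memory is available.  No cast is needed. */
struct tnode *talloc(void)
{
    return malloc(sizeof(struct tnode));
}

/* dupstr: return a brand-new copy of the string s, or a null pointer
   if no memory is available. */
char *dupstr(const char *s)
{
    char *p = malloc(strlen(s) + 1);    /* +1 for the terminating '\0' */

    if (p != NULL)
        strcpy(p, s);
    return p;
}
```

The caller owns whatever these functions return: it should check for a null pointer before using the memory, and eventually pass the pointer to free.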

As we mentioned at the beginning of chapter 5, memory allocation can be hard to
get right, and is at the root of many difficulties and bugs in many C programs. Here
are some rules and other things to remember:

1. Make sure you know where things are allocated, either by the compiler or
by you. Watch out for things like the local line array we've been tending to
use with getline, and the local word array on page 140. When a function
writes to an array or a pointer supplied by the caller, it depends on the caller
to have allocated storage correctly. When you're the caller, make sure you
pass a valid pointer! Make sure you understand why

    char *ptr;
    getline(ptr, 100);

is wrong and can't work. (For one thing: what does that 100 mean?
If getline is only allowed to read at most 100 characters, where have we
allocated those 100 characters that getline is not allowed to write
beyond?)

2. Be aware of any situations where a single array or data structure is used to
store multiple different things, in succession. Think again about the
local line array we've been tending to use with getline, and the
local word array on page 140. These arrays are overwritten with each new
line, word, etc., so if you need to keep all of the lines or words around, you
must copy them immediately to allocated memory (as the line-sorting
program on pages 108-9 in section 5.6 did, but as the longest line program
on page 29 in section 1.9 and the pattern-matching programs on page 69 in
section 4.1 and pages 116-7 in section 5.10 did not have to do).
3. Make sure you allocate enough memory! If you allocate memory for an
array of 10 things, don't accidentally store 11 things in it. If you have a
string that's 10 characters long, make sure you always allocate 11 characters
for it (including one for the terminating '\0').
4. When you free (deallocate) memory, make sure that you don't have any
pointers lying around which still point to it (or if you do, make sure not to
use them any more).
5. Always check the return value from memory-allocation functions. Memory
is never infinite: sooner or later, you will run out of memory, and allocation
functions generally return a null pointer when this happens.
6. When you're not using dynamically-allocated memory any more, do try to
free it, if it's convenient to do so and the program's not just about to exit.
Otherwise, you may eventually have so much memory allocated to stuff
you're not using any more that there's no more memory left for new stuff
you need to allocate. (However, on all but a few broken systems, all
memory is automatically and definitively returned to the operating system
when your program exits, so if one of your programs doesn't free some
memory, you shouldn't have to worry that it's wasted forever.)
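
As a sketch of the first rule in the list above, here are two correct ways of calling a function that writes into caller-supplied storage. The function fill is a made-up stand-in for getline (it copies a fixed string instead of reading input), and use_array and use_malloc are hypothetical callers.

```c
#include <stdlib.h>

/* Stand-in for getline: writes at most lim-1 characters plus a '\0'
   into s, which the caller must have allocated (lim must be at least
   1), and returns the number of characters written before the '\0'. */
int fill(char *s, int lim)
{
    const char *text = "hello, world";
    int i;

    for (i = 0; i < lim - 1 && text[i] != '\0'; i++)
        s[i] = text[i];
    s[i] = '\0';
    return i;
}

/* Correct: the array supplies 100 characters of real storage. */
int use_array(void)
{
    char line[100];

    return fill(line, 100);
}

/* Also correct: malloc supplies the storage, and we check that it did. */
int use_malloc(void)
{
    char *ptr = malloc(100);
    int len;

    if (ptr == NULL)
        return -1;          /* out of memory */
    len = fill(ptr, 100);
    free(ptr);              /* ptr must not be used after this point */
    return len;
}
```

In both callers the second argument, 100, honestly describes storage that exists. In the broken fragment from the list, ptr was never made to point anywhere, so no number passed alongside it could have been right.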

Unfortunately, checking the return values from memory allocation functions (point
5 above) requires a few more lines of code, so it is often left out of sample code in
textbooks, including this one. Here are versions of main and addtree for the word-
counting program (pages 140-1 in the text) which do check for out-of-memory
conditions:
/* word frequency count */
main()
{
    struct tnode *root;
    char word[MAXWORD];

    root = NULL;
    while (getword(word, MAXWORD) != EOF) {
        if (isalpha(word[0])) {
            root = addtree(root, word);
            if (root == NULL) {
                printf("out of memory\n");
                return 1;
            }
        }
    }

    treeprint(root);

    return 0;
}

struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {            /* a new word has arrived */
        p = talloc();           /* make a new node */
        if (p == NULL)
            return NULL;
        p->word = strdup(w);
        if (p->word == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;             /* repeated word */
    else if (cond < 0) {        /* less than: into left subtree */
        p->left = addtree(p->left, w);
        if (p->left == NULL)
            return NULL;
    }
    else {                      /* greater than: into right subtree */
        p->right = addtree(p->right, w);
        if (p->right == NULL)
            return NULL;
    }

    return p;
}
In practice, many programmers would collapse the calls and tests:
struct tnode *addtree(struct tnode *p, char *w)
{
    int cond;

    if (p == NULL) {            /* a new word has arrived */
        if ((p = talloc()) == NULL)
            return NULL;
        if ((p->word = strdup(w)) == NULL) {
            free(p);
            return NULL;
        }
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(w, p->word)) == 0)
        p->count++;             /* repeated word */
    else if (cond < 0) {        /* less than: into left subtree */
        if ((p->left = addtree(p->left, w)) == NULL)
            return NULL;
    }
    else {                      /* greater than: into right subtree */
        if ((p->right = addtree(p->right, w)) == NULL)
            return NULL;
    }

    return p;
}

Chapter 7: Input and Output

page 151


By ``Input and output facilities are not part of the C language itself,'' we mean that
things like printf are just function calls like any other. C has no built-in input or
output statements. For our purposes, the implication of this fact--that I/O is not
built in--is mainly that the compiler may not do as much checking as we might like
it to. If we accidentally write
double d = 1.23;
printf("%d\n", d);
the compiler says, ``Hmm, a function named printf is being called with a string and a
double. Okay by me.'' The compiler does not (and, in general, could not even if it
wanted to) notice that the %d format requires an int.
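
For contrast, here is a sketch of two correct ways to write it, depending on what was intended. It uses sprintf (which formats into a string, and which we'll meet in section 7.2) so that the result can be inspected; the function names are invented for this example. Some compilers can warn about a mismatch when the format is a string literal, but that is a courtesy, not something the language requires.

```c
#include <stdio.h>

/* If the value really is a double, use a floating-point format.
   buf must be large enough to hold the result. */
int format_double(char *buf, double d)
{
    return sprintf(buf, "%f", d);
}

/* If an int is really wanted, convert explicitly before using %d. */
int format_as_int(char *buf, double d)
{
    return sprintf(buf, "%d", (int)d);
}
```

With d equal to 1.23, the first function produces "1.230000" and the second produces "1". In both cases the conversion specifier and the argument type now agree, which is the only thing the printf family can rely on.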

Although the title of this chapter is ``Input and Output,'' it appears that we'll also be
meeting a few other routines from the standard library.

If you start to do any serious programming on a particular system, you'll
undoubtedly discover that it has a number of more specialized input/output (and
other system-related) routines available, which promise better performance or nicer
functionality than the pedestrian routines of C's standard library. You should resist
the temptation to use these nonstandard routines. Because the standard library
routines are defined precisely and ``exist in compatible form on any system where
C exists,'' there are some real advantages to using them. (On the other hand, when
you need to do something which C's standard library routines don't provide, you'll
generally turn to your machine's system-specific routines right away, as they may
be your only choice. One common example is when you'd like to read one
character immediately, without waiting for the RETURN key. How you do that
depends on what system you're using; it is not defined by C.)

section 7.1: Standard Input and Output

section 7.2: Formatted Output--Printf

section 7.3: Variable-length Argument Lists

section 7.4: Formatted Input--Scanf



section 7.5: File Access

section 7.6: Error Handling--Stderr and Exit

section 7.7: Line Input and Output

section 7.8.1: String Operations

section 7.8.2: Character Class Testing and Conversion

section 7.8.3: Ungetc

section 7.8.4: Command Execution

section 7.8.5: Storage Management

section 7.8.6: Mathematical Functions


section 7.8.7: Random Number Generation

section 7.1: Standard Input and Output


Note that ``a text stream'' might refer to input (to the program) from the keyboard
or output to the screen, or input and output from files on disk. (For that matter, it
can also refer to input and output from other peripheral devices, or the network.)

Note that the stdio library generally does newline translation for you. If you know
that lines are terminated by a linefeed on Unix and a carriage return on the
Macintosh and a carriage-return/linefeed combination on MS-DOS, you don't have
to worry about these things in C, because the line termination will always appear to
a C program to be a single '\n'. (That is, when reading, a single '\n' represents
the end of the line being read, and when writing, writing a '\n' causes the
underlying system's actual end-of-line representation to be written.)

pages 152-153


The ``lower'' program is an example of a filter: it reads its standard input, ``filters''
(that is, processes) it in some way, and writes the result to its standard output.
Filters are designed for (and are only really useful under) a command-line interface
such as the Unix shells or the MS-DOS command.com interface. Obviously, you
would rarely invoke a program like lower by itself, because you would have to type
the input text at it and you could only see the output ephemerally on your screen.
To do any real work, you would always redirect the input:
lower < inputfile
and perhaps the output:
lower < inputfile > outputfile
(notice that spaces may precede and follow the < and > characters). Or, a filter
program like lower might appear in a longer pipeline:
oneprogram | lower | anotherprogram
or
anotherprogram < inputfile | lower | thirdprogram > outputfile

Filters like these are not terribly useful, though, under a Graphical User Interface
such as the Macintosh or Microsoft Windows.


section 7.2: Formatted Output -- Printf


pages 153-155

To summarize the important points of this section:

• printf's output goes to the standard output, just like putchar.
• Everything in printf's format string is either a plain character to be printed
as-is, or a %-specifier which generally causes one argument to be consumed,
formatted, and printed. (Occasionally, a single %-specifier consumes two or
three arguments if the width or precision is *, or zero arguments if the
specifier is %%.)
• There's a fairly long list of conversion specifiers; see the table on page 154.
• Always be careful that the conversions you request (in the format string)
match the arguments you supply.
• You can ``print'' to a string (instead of the standard output) with sprintf.
(This is the usual way of converting numbers to strings in C;
the itoa function we were playing with in section 3.6 on page 64 is
nonstandard, and unnecessary.)
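
Two of the points above (a * width consuming an extra argument, and %% consuming none) can be sketched with sprintf. The function names are made up for this example, and in each case buf is assumed to be large enough for the result.

```c
#include <stdio.h>

/* "%*d" consumes two arguments: first the field width, then the int. */
int format_padded(char *buf, int width, int n)
{
    return sprintf(buf, "%*d", width, n);
}

/* "%%" consumes no arguments; it just prints a single % character. */
int format_percent(char *buf, int n)
{
    return sprintf(buf, "%d%%", n);
}
```

Calling format_padded(buf, 5, 42) leaves "   42" in buf (three spaces and two digits), and format_percent(buf, 50) leaves "50%". Like printf, sprintf returns the number of characters it produced, not counting the terminating '\0'.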

section 7.3: Variable-length Argument Lists


This is an advanced section which you don't need to read.

section 7.4: Formatted Input -- Scanf


page 157

Somehow we've managed to make it through six chapters without meeting scanf,
which it turns out is just as well.

In the examples in this book so far, all input (from the user, or otherwise) has been
done with getchar or getline. If we needed to input a number, we did things like
char line[MAXLINE];
int number;
getline(line, MAXLINE);
number = atoi(line);
Using scanf, we could ``simplify'' this to
int number;
scanf("%d", &number);
This simplification is convenient and superficially attractive, and it works, as far as
it goes. The problem is that scanf does not work well in more complicated
situations. In section 7.1, we said that calls to putchar and printf could be
interleaved. The same is not always true of scanf: you can have baffling problems
if you try to intermix calls to scanf with calls to getchar or getline. Worse, it turns
out that scanf's error handling is inadequate for many purposes. It tells you
whether a conversion succeeded or not (more precisely, it tells you how many
conversions succeeded), but it doesn't tell you anything more than that (unless you
ask very carefully). Like atoi and atof, scanf stops reading characters when it's
processing a %d or %f input and it finds a non-numeric character. Suppose you've
prompted the user to enter a number, and the user accidentally types the letter
`x'. scanf might return 0, indicating that it couldn't convert a number, but the
unconvertible text (the `x') remains on the input stream unless you figure out some
other way to remove it.

For these reasons (and several others, which I won't bother to mention) it's
generally recommended that scanf not be used for unstructured input such as user
prompts. It's much better to read entire lines with something like getline (as we've
been doing all along) and then process the line somehow. If the line is supposed to
be a single number, you can use atoi or atof to convert it. If the line has more
complicated structure, you can use sscanf (which we'll meet in a minute) to parse
it. (It's better to use sscanf than scanf because when sscanf fails, you have
complete control over what you do next. When scanf fails, on the other hand,
you're at the mercy of where in the input stream it has left you.)
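
Here is a minimal sketch of the read-a-line-then-parse approach. It is an illustration, not code from the book: the names parse_int_line and read_int are invented here, and the standard fgets stands in for getline.

```c
#include <stdio.h>

/* parse_int_line: hypothetical helper, not from the book.
   Returns 1 and stores the number in *out if the line holds exactly
   one integer (surrounding whitespace allowed); otherwise returns 0. */
int parse_int_line(const char *line, int *out)
{
    int value;
    char extra;

    /* " %c" succeeds only if some non-whitespace character follows the
       number, so a result of exactly 1 means "a number and nothing else" */
    if (sscanf(line, "%d %c", &value, &extra) != 1)
        return 0;
    *out = value;
    return 1;
}

/* read_int: read one whole line from fp, then try to convert it.
   Returns 1 on success, 0 on a malformed line, EOF at end of input. */
int read_int(FILE *fp, int *out)
{
    char line[100];

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;
    return parse_int_line(line, out);
}
```

Because the whole line has already been read by the time sscanf looks at it, a bad entry such as `x' leaves nothing behind on the input stream, and the caller can simply prompt again. (A line longer than the buffer would be split across calls; a sturdier version would check for that.)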

With that little diatribe against scanf out of the way, here are a few comments on
individual points made in section 7.4.

We've met a few functions (e.g. getline, month_day in section 5.7 on page 111)
which return more than one value; the way they do so is to accept a pointer
argument that tells them where (in the caller) to write the returned value. scanf is
the epitome of such functions: it returns potentially many values (one for each %-
specifier in its format string), and for each value converted and returned, it needs a
pointer argument.

The statement on page 157 that ``blanks or tabs'' in the format string ``are ignored''
(which is repeated on page 159) is a simplification: in actuality, a blank or tab (or
newline; actually any whitespace) in the format string causes scanf to skip
whitespace (blanks, tabs, etc.) in the input stream.

A * character in a scanf conversion specifier means something completely
different than it does for printf: for scanf, it means to suppress assignment (i.e. for
that conversion specifier, there isn't a pointer in the argument list to receive the
converted value, so the converted value is discarded). With scanf, there is no direct
way of taking a field width from the argument list, as * does for printf.
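
As a small illustration of assignment suppression (the helper name second_field is made up for this example), the following sketch skips one integer and keeps the next:

```c
#include <stdio.h>

/* second_field: hypothetical helper. Skips the first integer in s
   (the * suppresses assignment, so no pointer is passed for it) and
   stores the second one in *out. Returns 1 on success, 0 otherwise. */
int second_field(const char *s, int *out)
{
    /* only assigned conversions are counted in the return value */
    return sscanf(s, "%*d %d", out) == 1;
}
```

Note that the suppressed conversion does not count toward sscanf's return value, which is why the test is against 1 and not 2.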

Conversion specifiers like %d and %f automatically skip leading whitespace while
looking for something to convert. This means that the format
strings "%d %d" and "%d%d" act exactly the same--the whitespace in the first format
string causes whitespace to be skipped before the second %d, but the
second %d would have skipped that whitespace anyway. (Yet another scanf foible
is that the innocuous-looking format string "%d\n" converts a number and then
skips whitespace, which means that it will gobble up not only a newline following
the number it converts, but any number of newlines or whitespace, and in fact it
will keep reading until it finds a non-whitespace character, which it then won't
read. This sounds confusing, but so is scanf's behavior when given a format string
like "%d\n". The moral is simple: don't use trailing \n's in scanf format strings.)
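
One way to convince yourself that the two format strings behave alike is to try them both on the same input. This is only a sketch (same_result is an invented name), and it uses sscanf on a string so that no input stream is involved:

```c
#include <stdio.h>

/* same_result: hypothetical helper. Parses s twice, once with "%d %d"
   and once with "%d%d", and returns 1 if both calls convert the same
   number of fields to the same values. */
int same_result(const char *s)
{
    int a1 = 0, b1 = 0, a2 = 0, b2 = 0;
    int n1 = sscanf(s, "%d %d", &a1, &b1);
    int n2 = sscanf(s, "%d%d", &a2, &b2);

    return n1 == n2 && a1 == a2 && b1 == b2;
}
```

Inputs such as "1 2" or "1" followed by several newlines, a tab, and "2" should give the same answer either way, since %d skips any leading whitespace on its own.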

page 158

Notice that, for scanf, the %e, %f, and %g formats are all the same, and signify
conversion of a float value (they accept a pointer argument of type float *). To
convert a double, you need to use %le, %lf, or %lg. (This is quite different from
the printf family, which uses %e, %f, and %g for floats and doubles, though all
three request different formats. Furthermore, %le, %lf, and %lg are technically
incorrect for printf in ANSI C, though most compilers probably accept them; C99
and later define the l as having no effect there.)
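
A short sketch of the float/double distinction (parse_real is a made-up name, used only for this example):

```c
#include <stdio.h>

/* parse_real: hypothetical helper. Converts the same text once as a
   float (%f with a float *) and once as a double (%lf with a double *).
   Returns 1 if both conversions succeed, 0 otherwise. */
int parse_real(const char *s, float *f, double *d)
{
    return sscanf(s, "%f", f) == 1 && sscanf(s, "%lf", d) == 1;
}
```

When printing the results, plain %f serves for both variables, because a float argument to printf is promoted to double before printf ever sees it.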

page 159

More precisely, the reason that you don't need to use a & with monthname is that an
array, when it appears in an expression like this, is automatically converted to a
pointer.

The dual-format date conversion example in the middle of page 159 is a nice
example of the advantages of calling getline and then sscanf. At the beginning of
this section, I said that ``when sscanf fails, you have complete control over what
you do next.'' Here, ``what you do next'' is try calling sscanf again, on the very
same input string (thus effectively backing up to the very beginning of it), using a
different format string, to try parsing the input a different way.
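
The try-one-format-then-another idea might be sketched as follows. This is in the spirit of the page 159 example but is not a copy of it; the struct, the enum, and the name parse_date are invented for illustration, and the %19s width is there to keep the month name from overflowing its array.

```c
#include <stdio.h>

enum { DATE_BAD, DATE_NAMED, DATE_NUMERIC };

struct date {
    int day, month, year;
    char monthname[20];
};

/* parse_date: hypothetical helper. Tries "25 Dec 1988" form first,
   then "12/25/1988" form, on the very same string. */
int parse_date(const char *line, struct date *d)
{
    /* monthname is an array, so no & is needed */
    if (sscanf(line, "%d %19s %d", &d->day, d->monthname, &d->year) == 3)
        return DATE_NAMED;
    /* the first attempt failed; start over from the beginning of line */
    if (sscanf(line, "%d/%d/%d", &d->month, &d->day, &d->year) == 3)
        return DATE_NUMERIC;
    return DATE_BAD;
}
```

A failed first attempt may have stored some partial values, but the second call begins again at the start of the string and overwrites whatever it needs.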

section 7.5: File Access


page 160

We've come an amazingly long way without ever having to open a file (we've been
relying exclusively on those predefined standard input and output streams) but now
it's time to take the plunge.

The concept of a file pointer is an important one. It would theoretically be possible
to mention the name of a file each time it was desired to read from or write to it.
But such an approach would have a number of drawbacks. Instead, the usual
approach (and the one taken in C's stdio library) is that you mention the name of
the file once, at the time you open it. Thereafter, you use some little token--in this
case, the file pointer--which keeps track (both for your sake and the library's) of
which file you're talking about. Whenever you want to read from or write to one of
the files you're working with, you identify that file by using the file pointer you
obtained from fopen when you opened the file. (It is possible to have several files
open, as long as you use distinct variables to store the file pointers.)

Not only do you not need to know the details of a FILE structure, you don't even
need to know what the ``buffer'' is that the structure contains the location of.

In general, the only declaration you need for a file pointer is the declaration of the
file pointer:
FILE *fp;
You should never need to type the line
FILE *fopen(char *name, char *mode);
because it's provided for you in <stdio.h>.
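
To make the open-once, then-use-the-pointer pattern concrete, here is a minimal sketch. The function count_chars is invented for this example and does only the barest error checking:

```c
#include <stdio.h>

/* count_chars: hypothetical example. Opens the named file, counts its
   characters, and closes it again. Returns -1 if it can't be opened. */
long count_chars(const char *filename)
{
    FILE *fp;                    /* the only declaration we need */
    long n = 0;

    fp = fopen(filename, "r");   /* the file's name is mentioned once, here */
    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)      /* from now on, only the file pointer is used */
        n++;
    fclose(fp);
    return n;
}
```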

If you skipped section 6.7, you don't know about typedef, but don't worry. Just
assume that FILE is a type, like int, except one that is defined by <stdio.h> instead
of being built into the language. Furthermore, note that you will never be using
variables of type FILE; you will always be using pointers to this type, or FILE *.

A ``binary file'' is one which is treated as an arbitrary series of byte values, as
opposed to a text file. We won't be working with binary files, but if you ever do,
remember to use fopen modes like "rb" and "wb" when opening them.

page 161

We won't worry too much about error handling for now, but if you start writing
production programs, it's something you'll want to learn about. It's extremely
annoying for a program to say ``can't open file'' without saying why. (Some
particularly unhelpful programs don't even tell you which file they couldn't open.)
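
One possible way to say both which file and why is sketched below. The wrapper name open_or_complain is invented, and the code assumes that fopen sets errno when it fails; most systems do this, but the C Standard does not promise it.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* open_or_complain: hypothetical wrapper around fopen. On failure it
   prints the file name and the reason to stderr and returns NULL. */
FILE *open_or_complain(const char *filename, const char *mode)
{
    FILE *fp;

    errno = 0;
    fp = fopen(filename, mode);
    if (fp == NULL) {
        /* Assumption: fopen sets errno on failure. POSIX systems do,
           but the C Standard itself doesn't require it. */
        fprintf(stderr, "can't open %s: %s\n", filename,
                errno != 0 ? strerror(errno) : "unknown error");
    }
    return fp;
}
```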

On this page we learn about four new functions, getc, putc, fprintf, and fscanf,
which are just like functions that we've already been using except that they let you
specify a file pointer to tell them which file (or other I/O stream) to read from or
write to. (Note that for putc, the extra FILE * argument comes last, while
for fprintf and fscanf, it comes first.)
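
The difference in argument order is easy to see side by side. The sketch below is modeled loosely on the filecopy function of the book's cat program; the names copy_stream and report are made up here.

```c
#include <stdio.h>

/* copy_stream: copy everything from ifp to ofp, one character at a
   time. Returns the number of characters copied. */
long copy_stream(FILE *ifp, FILE *ofp)
{
    int c;              /* int, not char, so that EOF can be told apart */
    long n = 0;

    while ((c = getc(ifp)) != EOF) {
        putc(c, ofp);   /* for putc, the stream argument comes last... */
        n++;
    }
    return n;
}

/* report: ...but for fprintf, it comes first */
void report(FILE *ofp, long n)
{
    fprintf(ofp, "%ld characters copied\n", n);
}
```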

page 162

cat is about the most basic and important file-handling program there is (even if its
name is a bit obscure). The cat program on page 162 is a bit like the ``hello,
world'' program on page 6--it may seem trivial, but if you can get it to work, you're
over the biggest first hurdle when it comes to handling files at all.

Compare the cat program (and especially its filecopy function) to the file copying
program on page 16 of section 1.5.1--cat is essentially the same program, except
that it accepts filenames on the command line.

Since the authors advise calling fclose in part to ``flush the buffer in which putc is
collecting output,'' you may wonder why the program at the top of the page does
not call fclose on its output stream. The reason can be found in the next sentence:
an implicit fclose happens automatically for any streams which remain open when
the program exits normally.

In general, it's a good idea to close any streams you open, but not to close the
preopened streams such as stdin and stdout. (Since ``the system'' opened them for
you as your program was starting up, it's appropriate to let it close them for you as
your program exits.)

section 7.6: Error Handling -- Stderr and Exit


page 163

stdout and stderr are both predefined output streams; for our purposes, the only
difference between them is that stderr is not likely to be redirected by the user, so
the error messages printed to stderr will always appear on the screen, where they
can be seen.

page 164

The cryptic note about ``a pattern-matching program'' simply means that if you
want to search the source code of a program for all the exit status values it can
return, ``exit'' might be an easier string to search for than ``return.'' (Every call
to exit represents an exit from the program, but not every return statement does.)

The feof and ferror functions can be used to check for error conditions more
carefully. In general, input routines (such as getchar and getline) return some
special value to tell you that they couldn't read any more. Often, this value is EOF,
reinforcing the notion that the only possible reason they couldn't read any more
was because end-of-file had been reached. However, it's also possible that there
was a read error, and you can call feof or ferror to determine whether this was the
case. On the output side, though the output routines generally do return an error
indication, few programs bother to check the return values from every call to
functions such as putchar and printf. One way to check for output errors, without
having to check the return value of every function, is to call ferror on the output
stream (which might be stdout) at key points.
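
A rough sketch of both checks follows. The helper names drain and output_ok are invented for this illustration, and the choice of return values is arbitrary.

```c
#include <stdio.h>

/* drain: hypothetical helper. Reads fp to the end, counting characters
   in *count. Returns 0 if reading stopped at end-of-file, -1 if it
   stopped because of a read error. */
int drain(FILE *fp, long *count)
{
    *count = 0;
    while (getc(fp) != EOF)
        (*count)++;
    /* EOF alone doesn't say why we stopped; ferror does */
    return ferror(fp) ? -1 : 0;
}

/* output_ok: call at key points (for instance just before exiting) to
   see whether any write to fp has failed. Returns 1 if all is well. */
int output_ok(FILE *fp)
{
    return fflush(fp) != EOF && !ferror(fp);
}
```

The call to fflush is there because buffered output may not actually have been written yet, so an error might not show up until the buffer is flushed.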

section 7.7: Line Input and Output


pages 164-165

To summarize, puts is like fputs except that the stream is assumed to be the
standard output (stdout), and a newline ('\n') is automatically appended. gets is
like fgets except that the stream is assumed to be stdin, and the newline ('\n') is
deleted, and there's no way to specify the maximum line length. This last fact
means that you almost never want to use gets at all: since you can't tell it how big
the array it's to read into is, there's no way to guarantee that some unexpectedly-
long input line won't overflow the array, with dire results. (When discussing the
drawbacks of gets, it's customary to point out that the ``Internet worm,'' a program
that wreaked havoc in 1988 by breaking into computers all over the net, was able
to do so in part because a key network utility on many Unix systems used gets, and
the worm was able to overflow the buffer in a particularly low, cunning way, with
the dire result that the worm achieved superuser access to the attacked machine.)
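
If you want gets-like behavior without the overflow risk, one common approach is to call fgets and then strip the newline yourself. The following is a sketch of that idea; chomp and read_line are names chosen for this example, not standard functions.

```c
#include <stdio.h>
#include <string.h>

/* chomp: cut s off at its first newline, if any. (After fgets, the
   only newline there can be is the one at the end.) */
void chomp(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}

/* read_line: hypothetical gets replacement. Reads at most size-1
   characters of one line from stdin into buf and strips the newline.
   Returns buf, or NULL at end of input. */
char *read_line(char *buf, int size)
{
    if (fgets(buf, size, stdin) == NULL)
        return NULL;
    chomp(buf);
    return buf;
}
```

Unlike gets, read_line is told how big the array is, so an overlong line is merely split across two calls and cannot overflow the buffer.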

section 7.8.1: String Operations


page 166

One thing to beware of is that strcpy's arguments--more precisely, the strings
pointed to by its arguments--must not overlap.
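
When the source and destination do overlap, the standard function memmove can be used, since it is defined to cope with overlap. A small sketch (drop_prefix is an invented name):

```c
#include <string.h>

/* drop_prefix: hypothetical helper. Deletes the first n characters of
   s in place by sliding the rest of the string down. Source and
   destination overlap, so strcpy(s, s + n) would not be safe here. */
void drop_prefix(char *s, size_t n)
{
    size_t len = strlen(s);

    if (n > len)
        n = len;
    memmove(s, s + n, len - n + 1);   /* + 1 copies the '\0' too */
}
```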

Another string function we've seen is strstr:

strstr(s,t) return pointer to first t in s, or NULL if not present

section 7.8.2: Character Class Testing and Conversion


One quirk of these functions, which the authors mention briefly, is that although
they accept arguments of type int, it is not legal to pass just any int value to them.
If you were to attempt to call isupper(12345), it might do something bizarre. You
should only call these functions with arguments which represent valid character
values. (Also, they are guaranteed to accept the value EOF gracefully.)
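
A usual way of staying within the valid range when the characters come from a char array is to convert each one to unsigned char first. The sketch below shows the idea with toupper; upcase is a name made up for the example.

```c
#include <ctype.h>

/* upcase: convert s to upper case in place */
void upcase(char *s)
{
    for (; *s != '\0'; s++) {
        /* plain char may be signed, so convert to unsigned char first
           to be sure the argument is a valid character value */
        *s = (char)toupper((unsigned char)*s);
    }
}
```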

section 7.8.3: Ungetc


There's not much more to say about ungetc, but two more stdio functions which
might deserve mention are fread and fwrite.

getc and putc (and getchar and putchar) allow you to read and write a character at
a time, while fgets and fputs read and write a line at a time. The printf family of
routines does formatted output, and the scanf family does formatted input. But
what if you want to read or write a big block of unformatted characters, not
necessarily one line long? You could use getc or putc in a loop, but another
solution is to use the fread and fwrite functions, which are (briefly) described in
appendix B1.5 on page 247.
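
As an illustration of block-at-a-time I/O, here is a sketch of a copy loop built on fread and fwrite. The name copy_blocks and the 512-byte chunk size are arbitrary choices for this example.

```c
#include <stdio.h>

/* copy_blocks: hypothetical helper. Copies in to out in chunks using
   fread and fwrite. Returns the number of bytes copied, or -1 on error. */
long copy_blocks(FILE *in, FILE *out)
{
    char buf[512];      /* chunk size is arbitrary */
    size_t n;
    long total = 0;

    /* fread returns how many items it actually read; 0 means
       end-of-file or an error */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;  /* short write */
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```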

section 7.8.4: Command Execution


page 167

The only thing to add to this brief description of system concerns the disposition of
the executed command's output. (Similar arguments apply to its input.) The output
generally goes wherever the calling program's output goes, though if the calling
program has done anything with stdout (such as closing it, or redirecting it within
the program with freopen), those changes will probably not affect the output
of system. One way to achieve redirection of the command executed by system, if
the operating system permits it, is to use redirection notation within the command
line passed to system:
system("date > outfile");

Note also that the exit status returned by the program (and hence perhaps
by system) does not necessarily have anything to do with anything printed by the
program. One way to capture the output printed by the program is to use
redirection, as above, then open and read the output file.

section 7.8.5: Storage Management


The important thing to know about malloc and free and friends is to be careful
when calling them. It is easy to abuse them, either by using more space than you
ask for (that is, writing beyond the ends of an allocated block) or by continuing to
use a pointer after the memory it points to has been freed (perhaps because you had
several pointers to the same block of memory, and you forgot that when you freed
one pointer they all became invalid). malloc-related bugs can be difficult and
frustrating to track down, so it's good to use programming practices which help to
assure that the bugs don't happen in the first place. (One such practice is to make
sure that pointer variables are set to NULL when they don't point anywhere, and to
occasionally check pointer values--for instance at entry to an important
pointer-using function--to make sure that they're not NULL.)
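
These habits might look something like the following in practice. This is only a sketch; copy_string and release are names invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* copy_string: hypothetical helper. Returns a malloc'ed copy of s,
   or NULL if s is NULL or memory runs out. */
char *copy_string(const char *s)
{
    char *p;

    if (s == NULL)              /* check at entry to a pointer-using function */
        return NULL;
    p = malloc(strlen(s) + 1);  /* + 1 for the '\0' */
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* release: free *pp and set it to NULL so that it can't be used by
   mistake afterwards. (Other copies of the pointer are not reset.) */
void release(char **pp)
{
    free(*pp);
    *pp = NULL;
}
```

Setting the pointer to NULL does nothing for any other pointers to the same block, so it is a safety net and not a cure.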

As we mentioned on page 142 in section 6.5, it is no longer necessary (that is, in
ANSI C) to cast malloc's value to the appropriate type, though it doesn't hurt to do
so.

section 7.8.6: Mathematical Functions


page 168

Note that the pow function is how you do exponentiation in C--C does not have a
built-in exponentiation operator (such as ** or ^ in some other languages).

Before calling these functions, remember to #include <math.h>. (It's always a good
idea to #include the appropriate header(s) before using any library functions, but
the math functions are particularly unlikely to work correctly if you forget.) Also,
under Unix, you may have to explicitly request the math library by adding the
-lm option at the end of the command line when compiling/linking.

section 7.8.7: Random Number Generation


There is a typo in some printings; the code for returning a floating-point random
number in the interval [0,1) should be
#define frand() ((double) rand() / (RAND_MAX+1.0))

If you want to get random integers from M to N, you can use something like
M + (int)(frand() * (N-M+1))

``[Setting] the seed for rand'' refers to the fact that, by default, the sequence of
pseudo-random numbers returned by rand is the same each time your program
runs. To randomize it, you can call srand at the beginning of the program, handing
it some truly random number, such as a value having to do with the time of day.
(One way is with code like
#include <stdlib.h>
#include <time.h>

srand((unsigned int)time((time_t *)NULL));


which uses the time function mentioned on page 256 in appendix B10.)

One other caveat about rand: don't try to generate random 0/1 values (to simulate a
coin flip, perhaps) with code like
rand() % 2
This looks like it ought to work, but it turns out that on some systems rand isn't
always perfectly random, and returns values which consistently alternate even,
odd, even, odd, etc. (In fact, for similar reasons, you shouldn't usually use rand() %
N for any value of N.) A good way to get random 0/1 values would be
(int)(frand() * 2)

based on the other frand() examples above.
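
The frand-based expressions in this section can be gathered into a few small functions. The sketch below does that; the function names rand_range, coin_flip, and seed_from_clock are invented here, and rand_range assumes its first argument is no larger than its second.

```c
#include <stdlib.h>
#include <time.h>

/* frand: floating-point random number in the interval [0,1) */
#define frand() ((double) rand() / (RAND_MAX+1.0))

/* rand_range: random integer from m to n inclusive (assumes m <= n) */
int rand_range(int m, int n)
{
    return m + (int)(frand() * (n - m + 1));
}

/* coin_flip: random 0 or 1, taken from the high-order part of rand's
   value rather than from its low-order bit */
int coin_flip(void)
{
    return (int)(frand() * 2);
}

/* seed_from_clock: call once, at the beginning of the program */
void seed_from_clock(void)
{
    srand((unsigned int)time((time_t *)NULL));
}
```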
