
10 Binary Files, Random Access, and Dynamic Allocation

10.1 Binary and Random Access Files

10.1.1 Random Access File Concepts

In this section we will discuss a different kind of file that
stores information in a binary format rather than as text.
Recall from previous examples throughout this book that
data files were always text files in the sense that they could be created with a text
editor or a word processor in a human-readable format. Even numerical
information was stored in terms of readable characters: the digits 0-9 plus other
appropriate characters such as a period serving as a decimal point. Through the
use of appropriate conversion specifiers, the contents of this kind of file could be
interpreted by a program either as text (characters) or as numerical information.
We treated such files as sequential access files in which we always started reading
information at the beginning of the file and proceeded sequentially from one value
to the next. We stopped either when we found what we were looking for or, more
typically, when we got to the end of the file.
Sequential access text files are simple to use, but they have some
limitations that are especially obvious when you are faced with manipulating more
information than can be stored in a statically allocated array while your program
is running. One typical use of arrays is to store a list in a way that makes it
efficient to look for things in the list. Once the list is stored in an array, you can
sort the list and then use a binary search; the required algorithms were discussed
in Chapter 8. However, you cannot perform a binary search on a sequential access
file even if the information in the file is sorted in the desired order, and you
cannot sort such a file. The inefficiency of using linear searches is most noticeable
for large files; that is, for just those files that may be too large to store in an array.
The solution to this kind of data management problem is to remove the
restriction of sequential access. The easiest way to do this is to store data in
records of fixed length. The problem with text files is that they do not lend
themselves naturally to the imposition of a fixed record length. For example,
consider these two lines (records) in a text file:
1001 17.77 313.5
999 3.3 2.1

This file is simply a string of characters, including blanks and end-of-line marks.
We know that it can be interpreted in an appropriate way by using C intrinsic
functions such as fscanf along with appropriate conversion specifiers. Thus it
is easy to read the six values in this file, perhaps three at a time. However, the
first line contains 16 characters and the second line contains only 11 characters.
This is due simply to how the numerical information is represented in this form.

D. R. Brooks, C Programming: The Essentials for Engineers and Scientists
© Springer-Verlag New York, Inc. 1999

When a program stores the six values in this file in its internal binary
format (that is, as a series of 1s and 0s), the space required in memory is
independent of how the values are originally represented as text because the
internal storage of numerical values depends only on the data type of the values
and not on how they are expressed as characters. Thus, assuming the numbers
1001 and 999 are stored as integers, both values require the same number of
bytes. Similarly, assuming the values 17.77, 313.5, 3.3, and 2.1 are all stored as
real numbers, the memory requirements are identical for all four values. Internally
then, each "record" of this file requires the same amount of storage space.
It is possible to copy the internal representation of values into a file and
also to read such a file simply by reversing the process; that is, by copying the
contents of the file directly into memory. Such a file, which preserves the internal
binary format in which values are stored, is called a binary file. I/O operations are
more efficient for binary files than for text files because it is no longer necessary
to translate back and forth between internal and external (text-based)
representations for numerical values. Most significantly, however, it is easy to
implement random access to information in a binary file. This is because each record
in a binary file has the same length, based on the data types of the values stored
in the record, so the position of any record in the file can be calculated directly
from its record number.
Finally, do not confuse the terms binary format or binary file in this
discussion with terms such as binary search. The first two terms describe a way
of storing data. The third term describes an algorithm.

10.1.2 Implementing Binary Files

To see how binary files work, we will return to a problem that first appeared as
a file-processing example in Chapter 2. In Section 2.3, a programming problem
was posed in which several remote instrument stations report measurements to a
central data collection facility. The reports are then assembled into a data file for
processing. A station can submit multiple reports, not necessarily sequentially, and
it can submit up to eight measurements in a single report. A program was
presented with the limited goal of counting the number of reports and the total
number of measurements. Modifications of this problem presented as Exercise 1
in Chapters 6 and 7 used arrays or arrays of structs to keep track of the total
number of reports and measurements for each station. If you did Exercise 1 in
Chapter 7, your program should have used an array of structures whose index
values 0 through 10 were obtained directly from the station IDs, assumed to be
numbers in the range 1000 through 1010. (If you didn't do those exercises, it
would be worthwhile to do them now.)
