
4.2BSD and 4.3BSD as Examples of the UNIX System


JOHN S. QUARTERMAN, ABRAHAM SILBERSCHATZ, and JAMES L. PETERSON
Department of Computer Sciences, University of Texas, Austin, Texas 78712

This paper presents an in-depth examination of the 4.2 Berkeley Software Distribution,
Virtual VAX-11 Version (4.2BSD), which is a version of the UNIX¹ Time-Sharing
System. There are notes throughout on 4.3BSD, the forthcoming system from the
University of California at Berkeley. We trace the historical development of the UNIX
system from its conception in 1969 until today, and describe the design principles that
have guided this development. We then present the internal data structures and
algorithms used by the kernel to support the user interface. In particular, we describe
process management, memory management, the file system, the I/O system, and
communications. These are treated in as much detail as the UNIX licenses will allow. We
conclude with a brief description of the user interface and a set of bibliographic notes.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]:
Distributed Systems--distributed applications; D.4.0 [Operating Systems]: General--UNIX;
D.4.7 [Operating Systems]: Organization and Design--interactive systems;
K.2 [History of Computing]: Software--UNIX
General Terms: Algorithms, Design, Human Factors, Performance, Reliability, Security
Additional Key Words and Phrases: Flexibility, portability, simplicity

INTRODUCTION

This paper presents an in-depth examination of the 4.2BSD operating system, the research UNIX¹ system developed for the Defense Advanced Research Projects Agency (DARPA) by the University of California at Berkeley. We have chosen 4.2BSD over UNIX System V (the UNIX system currently being licensed by AT&T) because concepts such as internetworking and demand paging are implemented in 4.2BSD but not in System V. Where 4.3BSD, the forthcoming system from Berkeley, differs functionally from 4.2BSD in the areas of interest, such differences are noted.

This paper is not a critique of the design and implementation of 4.2BSD or UNIX; it is an explanation. For comparisons of System V and 4.2BSD, see the literature, particularly the references given in Section 1.1, p. 380. Such comparisons are mostly beyond the scope of this paper.

The VAX² implementation is used because 4.2BSD was developed on the VAX,
¹ UNIX is a trademark of AT&T Bell Laboratories.
² VAX, PDP, TOPS-20, and VMS are trademarks of Digital Equipment Corporation.

Chapter 14 of Operating Systems Concepts, Second Edition, by J. L. Peterson and A. Silberschatz (© 1985 by
Addison-Wesley, Reading, Massachusetts) and this article were both derived from an earlier common manu-
script by J. S. Quarterman. Consequently they share some text. Common portions are reprinted with the
permission of Addison-Wesley.
Author’s present address: James L. Peterson, MCC, 9430 Research Blvd., Austin, Texas 78759.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To
copy otherwise, or to republish, requires a fee and/or specific permission.
© 1986 ACM 0360-0300/85/1200-0379 $00.75

Computing Surveys, Vol. 17, No. 4, December 1985


and that machine still represents a convenient point of reference, despite the recent proliferation of implementations on other hardware (such as the Motorola 68020 or National Semiconductor 32032). Also, details of implementation for non-VAX systems are usually proprietary to the companies that did them. And space does not permit examination of every implementation on every kind of hardware.

This paper is not a tutorial on how to use UNIX or 4.2BSD. It is assumed that the reader knows how to use the UNIX system. The presentation is closely limited to a technical examination of traditional operating system and networking concepts, most of which are implemented in the kernel. Students of operating systems and novice systems programmers (the intended readership) should find the organization and content appropriate.

The novice UNIX user will want to read Section 7 on the user interface before delving into the sections on kernel details. That section is as brief as possible, because the user interface and user programs in general are (regardless of their importance to the utility and popularity of the system) beyond the proper scope of this paper. Reading one of the several good books on using UNIX (see Section 8, Bibliographic Notes) would be good preparation for reading the paper.

The paper begins with a very brief overview of the history of the system and some description of the design philosophy behind it. The other sections cover process management, memory management, the file system, the I/O system, communications, and certain features of the user interface that distinguish the system. The paper concludes with a set of bibliographic notes.

CONTENTS

INTRODUCTION
1. OVERVIEW
   1.1 History
   1.2 Design Principles
2. PROCESSES
   2.1 User Interface
   2.2 Control Blocks
   2.3 CPU Scheduling
3. MEMORY MANAGEMENT
   3.1 Paging
   3.2 Swapping
4. FILE SYSTEM
   4.1 User Interface
   4.2 Implementations
   4.3 Data Structures on the Disk
   4.4 Layout and Allocation Policies
   4.5 Mapping a Pathname to an Inode
   4.6 Mapping a File Descriptor to an Inode
5. I/O SYSTEM
   5.1 Block Buffer Cache
   5.2 Raw Device Interfaces
   5.3 C-Lists
6. COMMUNICATIONS
   6.1 Signals
   6.2 Interprocess Communication
   6.3 Networking
   6.4 Distributed Systems
7. USER INTERFACE
   7.1 Shells and Commands
   7.2 Standard I/O
   7.3 Pipelines, Filters, and Shell Scripts
   7.4 The UNIX Philosophy
8. BIBLIOGRAPHIC NOTES
ACKNOWLEDGMENTS
REFERENCES

1. OVERVIEW

This section is concerned with the history and design of the UNIX system, which was initially developed at Bell Laboratories as a private research project of two programmers. Its original elegant design and developments of the past fifteen years have made it an important and powerful operating system. We trace the history of the system [Compton 1985; Ritchie 1978, 1984a, 1984b] and relate its design principles.

1.1 History

The first version of UNIX was developed at Bell Laboratories in 1969 by Ken Thompson to use an otherwise idle PDP-7. He was soon joined by Dennis Ritchie, and the two of them have since been the largest influence on what is commonly known as Research UNIX.

Ritchie, Thompson, and other early Research UNIX developers had previously worked on the Multics project [Peirce 1985], and Multics [Organick 1975] was a strong influence on the newer operating

system. Even the name UNIX is merely a pun on Multics, indicating that in areas where Multics attempted to do many things, UNIX tries to do one thing well. The basic organization of the file system, the idea of the command interpreter (the shell) being a user process, the use of a process per command, the original line-editing characters # and @, and many other things come directly from Multics.

Ideas from various other operating systems, such as Massachusetts Institute of Technology’s CTSS, have also been used. The fork operation comes from Berkeley’s GENIE (XDS-940) operating system.

The Research UNIX systems include UNIX Time-Sharing System, Sixth Edition (commonly known as Version 6), which was the first version widely available outside Bell Laboratories (in 1976) and ran on the PDP-11. (These version numbers correspond to the edition numbers of the UNIX Programmer’s Manual that were current when the distributions were made.) Multiprogramming was added before Version 6, and after the system was rewritten in a high-level programming language, C [Kernighan 1978; Ritchie et al. 1978]. C was designed and implemented for this purpose by Dennis Ritchie. It is descended [Rosler 1984] from the language B, designed and implemented by Ken Thompson. B was itself descended from BCPL. C continues to evolve [Stroustrup 1984; Tuthill 1985a].

The first portable UNIX system was UNIX Time-Sharing System, Seventh Edition (Version 7), which ran on the PDP-11 and the Interdata 8/32, and had a VAX variety called UNIX/32V Time-Sharing System Version 1.0 (32V). The system currently in development by the Research group at AT&T Bell Laboratories is UNIX Time-Sharing System, Eighth Edition (Version 8).

After the distribution of Version 7 in 1978, the Research group gave external distributions over to the UNIX Support Group (USG). USG had previously distributed such systems as UNIX Programmer’s Work Bench (PWB) internally, and sometimes externally as well [Mohr 1985].

Their first external distribution after Version 7 was UNIX System III (System III), in 1982, which incorporated features of Version 7, 32V, and also of several UNIX systems developed by groups other than the Research group. Features of UNIX/RT (a real-time UNIX system) were included, as well as many features from PWB. USG released UNIX System V (System V) in 1983; it is largely derived from System III. The divestiture of the various Bell Operating Companies from AT&T has left AT&T in a position to market System V [Wilson 1985] aggressively. USG has metamorphosed into the UNIX System Development Laboratory (USDL), whose current distribution is UNIX System V Release 2 (V.2), released in 1984.

The ease with which the UNIX system can be modified has led to development work at numerous organizations such as Rand, Bolt, Beranek and Newman (BBN), the University of Illinois, Harvard, Purdue, and even DEC. But the most influential of the non-Bell Laboratories and non-AT&T UNIX development groups has been the University of California at Berkeley [McKusick 1985]. UNIX software from Berkeley is released in so-called Berkeley Software Distributions (BSD), hence the generic numbers 2BSD for the later PDP-11 distributions and 4BSD for the later VAX distributions.

Many of the features of the 4BSD terminal drivers are from TENEX/TOPS-20,³ and efficiency improvements have been made as a result of comparisons with VMS.

The first Berkeley VAX UNIX work was the addition to 32V of virtual memory, demand paging, and page replacement in 1979 by Bill Joy and Ozalp Babaoglu to produce 3BSD. The large virtual memory space of 3BSD allowed the development of very large programs, such as Berkeley’s own Franz Lisp. This memory management work convinced DARPA to fund Berkeley for the later development of a standard UNIX system for government use (4BSD).

One of the goals of this project was to provide support for the DARPA Internet networking protocols TCP/IP. This was done in a general manner, and it is possible to communicate among diverse network facilities, ranging from local networks (such

³ TENEX is a registered trademark of BBN.

as Ethernets⁴ and token rings) to long-haul networks (such as DARPA’s ARPANET).

It is sometimes convenient to refer to the Berkeley VAX UNIX systems following 3BSD as 4BSD, although there were actually several releases (indicated by decimal points in the release numbers), most notably 4.1BSD. 4BSD was the operating system of choice for VAXs from the beginning until the release of System III (1979-1982) and remains so for many research or networking installations. Most organizations would buy a 32V license and order 4BSD from Berkeley without ever bothering to get a 32V tape. Many installations inside the Bell System ran 4.1BSD (many still do, and many others run 4.2BSD).

The 4BSD work for DARPA was guided by a steering committee, which included many notable people from the UNIX and networking communities. 4.2BSD, first distributed in 1983, is the culmination of the original Berkeley DARPA UNIX project, although further research proceeds at Berkeley.

Berkeley was not the only organization involved in the development of 4.2BSD. Contributions (such as autoconfiguration, job control, and disk quotas) came from numerous universities and other organizations in Australia, Canada, Europe, and the United States. A few ideas, such as the fcntl system call, were taken from System V. (Licensing and pricing considerations have prevented the use of any actual code from System III or System V in 4BSD.) Not only are many contributions included in the distributions proper, but there is an accompanying set of user-contributed software, which is carried on the tapes containing the 4BSD distributions. The system was tested on the M68000-based workstation by Sun Microsystems, Inc., before its initial distribution. This simultaneous development contributed to the ease of further ports of 4.2BSD.

Berkeley accepts mail about bugs and their fixes at a well-known electronic address, and the consulting company mt Xinu distributes a bug list compiled from such submissions. Many of the bug fixes may be incorporated in future distributions. There is constant discussion of UNIX in general (including 4.2BSD) in the DARPA Internet mailing list UNIX-WIZARDS, which appears on the USENET network as the news group net.unix-wizards; both the Internet and USENET are international in scope. There is another USENET news group dedicated to 4BSD bugs. While few ideas appear to be accepted by Berkeley directly from these lists and news groups (probably because of the difficulty of sifting through the sheer volume of submissions), discussions in them sometimes lead to new facilities being written that are later accepted.

Figure 1 is a sketch of the evolution of the several main branches of the UNIX system, especially those leading to 4.2BSD and System V [Chambers and Quarterman 1983; Uniejewski 1985]. The dates given are approximate, and there is no attempt to show all influences. Some of the systems named in the figure are not mentioned in the text, but are included to better show the relations among the ones that are discussed in the text.

We are aware at this writing of the imminent release of 4.3BSD and System V Release 2 Version 4. There are few functional changes in the kernel in 4.3BSD, although there are many performance improvements [Cabrera et al. 1985; Leffler et al. 1984; McKusick et al. 1985]. (Some of these 4.3BSD changes are noted in sections throughout this paper.) Although System V Release 2 Version 4 does introduce paging [Jung 1985; Miller 1984] (including copy-on-write and shared memory) to System V, there are few other functional changes.

Dozens of computer manufacturers,⁵ including almost all of those usually

⁴ Ethernet is a trademark of Xerox Corporation.
⁵ These include at least Altos, Amdahl, Apollo, AT&T, Burroughs, Callan, Celerity, Codata, Convergent Technologies, Convex, COSI, Cray, Cromemco, Data General, DEC, Denelcor, Dual Systems, ELXSI, Encore, Flexible, Gould, Heurikon, Hewlett Packard, Honeywell, IBM, Integrated Business Computers, Integrated Solutions, Intel, Interactive Systems, Logical Microcomputer, Medical Informatics, NBI, NCR, National Semiconductor, Onyx, Pacific Computer, Parallel, Perkin-Elmer, Plexus, Pyramid, R Systems, Radio Shack, Ridge, Sequent, Silicon Graphics, Sperry, Sun Microsystems, Tektronix, Visual Technology, and Wicat.



[Figure 1 (diagram not reproduced): a timeline from 1969 to 1985 showing the main UNIX branches, among them Bell Research, USG/USDL (with MERT and UNIX/RT), the Berkeley PDP-11 line (1BSD, 2BSD, 2.8BSD, 2.9BSD), and the Berkeley VAX line (3BSD, 4.0BSD, and later 4BSD releases).]

Figure 1. UNIX history.


considered major by market share, have either announced or introduced computers that run the UNIX system or close derivatives, and numerous other companies sell related peripherals, software packages, support, training, documentation, or combinations of these. The hardware packages involved range from micros through minis, multis, and mainframes to supercomputers. Most of these use ports of System V, 4.2BSD, or mixtures of the two, although there are still a variety of machines running software based on System III, 4.1BSD, and Version 7. There are even some Version 6 systems still in regular operation.

UNIX is also an excellent vehicle for academic study. For example, both the Tunis operating system [Holt 1983] and the Xinu operating system [Comer 1984] are based on the concepts of UNIX, but were developed explicitly for classroom study. Ritchie and Thompson were honored in 1983 by the ACM Turing award for their work on UNIX.

1.2 Design Principles

Unlike its most influential ancestor, Multics, UNIX was not designed by a joint project of several major institutions and did not have project goals and aspirations set out beforehand in a series of papers presented at a prestigious professional conference. UNIX was, instead, originated first by one programmer, Ken Thompson, and then another, Dennis Ritchie, as a system for their personal convenience, with no elaborate plan spelled out beforehand. This flexibility appears to have been one of the key factors in the development of the system. There were some design principles involved, however, even though they were not spelled out at the outset.

UNIX was designed by programmers for programmers. It has always been interactive. Multiple processes are supported, and it is easy for one process to create another process. There are standard and flexible ways of interconnecting the input and output of processes and otherwise coordinating several processes to do a task. It is not uncommon for a book typeset using the UNIX system to include both the source of programs written in a language such as C, FORTRAN, Pascal, or LISP, and the output of the same programs. The manuscript of the book itself may be used as the input of such programs, and their output may be statistics concerning the book. It is trivial for the programmer to set up a mechanism whereby a single word (e.g., “make”) typed at a terminal causes the programs to be compiled and run and both their source and output to be typeset as part of the manuscript.

A main reason for this flexibility is the lack of numerous file types: There is only one data file type as far as the operating system is concerned. This is a sequence of bytes, which may be accessed either randomly or sequentially. There are no “access methods” and no “control blocks” in the data space of a UNIX user process, and there are few limitations on what a process’s data space may be used for. The interface to the file system is very simple: A file is referred to by a character string for opening and by an integer for further manipulation.

Files are grouped in directories, which essentially form a tree-structured hierarchy. Users may create directories as easily as ordinary files.

It is not hard to build elaborate database access mechanisms on top of UNIX’s one simple file type, as the half-dozen or more readily available databases for UNIX (starting with INGRES, which is distributed with 4BSD) attest.

Devices and ordinary files are treated as similarly as possible. Thus device dependencies or peculiarities are kept in the kernel as much as possible, and even in the kernel most of them are segregated in the device drivers.

A process may be informed of an exceptional condition (perhaps by another process) by means of a signal. True interprocess communication and networking may be accomplished in 4.2BSD by the use of sockets.

The size constraints of the PDP-11 (and earlier computers used for UNIX) have forced a certain elegance. Whereas other


                          (the users)

                      shells and commands
                  compilers and interpreters
                       system libraries

              system call interface to the kernel

signals                 file system              CPU scheduling
terminal handling       swapping                 page replacement
character I/O system    block I/O system         demand paging
terminal drivers        disk and tape drivers    virtual memory

               kernel interface to the hardware

terminal controllers    device controllers       memory controllers
terminals               disks and tapes          physical memory

Figure 2. Layers of the UNIX system.

systems have elaborate algorithms for dealing with pathological conditions, UNIX just does a controlled crash (a panic), and tries to prevent rather than cure such conditions. Whereas other systems would use brute force or macro expansion, UNIX has mostly had to develop more subtle, or at least simpler, approaches.

In some instances, such as networking, PDP-11 size constraints unfortunately had the opposite effect. The original UNIX Version 6 ARPANET software was split into a kernel part and a part that ran as a user process, purely because of size constraints. This not only entailed performance penalties but also led to a rather convoluted design. The 4.2BSD networking code does not suffer from this, since it runs on processors (VAX, M68000, NS16032, etc.) that have a reasonably sized address space. The PDP-11 ports of this code require extensive kernel overlays.

Virtual memory and paging were not implemented on the PDP-11 because of the small number and huge size of the pages allowed by the hardware. Thus early versions of the INGRES database system ran as multiple (six or seven) processes, and Franz Lisp, with its need for huge data spaces in a single process, did not develop until the VAX permitted paging in 3BSD.

Even though some UNIX systems now try to do some things that require large address spaces, the size constraints imposed during the early development of the system have had, on the whole, a beneficial effect.

From the beginning, UNIX development systems have had all the UNIX sources available on line, and the developers have used the systems under development as their primary systems. This has greatly facilitated discovering deficiencies and their fixes, as well as new possibilities and their implementations.

Facilities for program development have always been a high priority. Such facilities include the program make (which may be used to determine which of a collection of program source files need to be compiled and then compile them) and the Source Code Control System (SCCS) (which is used to keep successive versions of files available without having to store the entire contents of each step).

The availability of sources for the operating system has also encouraged the plethora of UNIX variants existing today, but the benefits have outweighed the disadvantages. If something is broken, it can be fixed at a local site, rather than having to wait for the next release of the system. Such fixes, as well as new facilities, may be incorporated into later distributions. Binary licenses are becoming more popular with the growing number of small, inexpensive UNIX systems, however.

The UNIX operating system may be considered for convenience of exposition to be layered roughly as depicted in Figure 2.

Everything below the system call interface and above the physical hardware is the kernel [Thompson 1978]. This paper is mostly concerned with the kernel, since that is where most of the traditional operating systems issues are addressed:

    The kernel is the only UNIX code that cannot be substituted by a user to his own liking. For this reason, the kernel should make as few real decisions as possible. This does not mean to allow the user a million options to do the same thing. Rather, it means to allow only one way to do one thing, but have that way be the least-common divisor of all the options that might have been provided. [Thompson 1978, p. 1931]

This is especially noticeable in the design of the system call interface [Joy et al. 1983]: “Throughout, simplicity has been substituted for efficiency. Complex algorithms are used only if their complexity can be localized” [Thompson 1978, p. 1932].

2. PROCESSES

In this section we describe how user processes are created and manipulated by other user processes, including the layout of a process’s address space. Then the kernel control blocks that keep track of processes are described. Finally, an overview of the CPU scheduler and its event mechanism is given [Thompson 1978].

2.1 User Interface

A process is a program in execution. To execute a new program, a new process is first produced by the fork system call, creating two almost identical processes, each with a copy of the original data space. Then the exec primitive may be used by one process to replace its virtual memory space with that for a new program (read from a file). A process may choose to terminate by using the exit system call, and its parent process may wait for that event by using the wait system call. Figure 3 shows two common scenarios for the use of these system calls.

Processes are named by their process identifier (pid), which is an integer. The fork system call returns the process identifier of the child process to the parent process, and returns zero in the child process; this is how a program can determine in which process it is running after a fork. The wait system call provides the process identifier of the child that terminated so the parent can tell which of possibly several children it was.

From the viewpoint of the calling process, one may liken fork and wait to a subroutine call and return, whereas exec is more like a goto.

The simplest form of communication between processes is by pipes. Pipes provide a reliably delivered byte stream between two processes. They may be created before the fork, and their end points may then be set up between the fork and the exec.

All user processes are descendants of one original process, which is called init and has process identifier 1. On each terminal port available for interactive use, init forks (with the fork system call) a copy of itself, which attempts to open the port for reading and writing. This new process has a new process identifier. The open succeeds when a directly connected terminal is turned on or a telephone call is accepted by a dial-up modem. Then the init process executes (with the exec system call) a program called getty.

Getty initializes terminal line parameters and prompts the user to type a login name, which getty collects. Getty then executes a program called login, passing the login name as an argument. Login prompts the user for a password, and collects it as the user types it. Login determines whether the user is allowed to log in by encrypting the typed password and comparing it with an encrypted string found according to the login name in the file /etc/passwd. If the comparison is successful, login sets the numeric user identifier (uid) of the process to that of the user logging in and executes a shell, or command interpreter. (The pathname of the shell and the user identifier are also found in /etc/passwd according to the user’s login name.) This shell is what the user ordinarily communicates with for the rest of the login session. The shell itself forks subprocesses for the commands that the user tells it to execute.
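
The fork, exec, wait, and pipe calls described in this section can be sketched in C roughly as follows. This is only an illustration written against a present-day POSIX interface (the headers, execvp, and the WIFEXITED and WEXITSTATUS macros are assumptions about the reader's system, not details taken from 4.2BSD), and the function name run_and_capture is hypothetical. The pipe is created before the fork, its end points are arranged in the child between the fork and the exec, and the parent uses the value returned by wait to identify which child terminated.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run the program named by argv[0] as a child process whose standard
 * output is connected to a pipe.  Up to cap-1 bytes of that output are
 * copied into buf (always NUL-terminated).  Returns the child's exit
 * status, or -1 on error.  The name run_and_capture is illustrative.
 */
int run_and_capture(char *const argv[], char *buf, size_t cap)
{
    int fds[2];              /* fds[0] is the read end, fds[1] the write end */
    pid_t pid, done;
    int status = 0;
    size_t used = 0;
    ssize_t n;

    if (cap == 0 || pipe(fds) < 0)   /* the pipe is created before the fork */
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: fork returned zero.  Set up the pipe end, then exec. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            dup2(fds[1], STDOUT_FILENO);
            close(fds[1]);
        }
        execvp(argv[0], argv);       /* like a goto: returns only on failure */
        _exit(127);
    }

    /* Parent: fork returned the child's process identifier. */
    close(fds[1]);
    while (used < cap - 1 &&
           (n = read(fds[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    /* wait reports which child terminated; loop until it is ours. */
    do {
        done = wait(&status);
    } while (done > 0 && done != pid);

    if (done != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called with an argument vector such as { "echo", "hello", NULL } and a small buffer, the function would be expected to leave the command's output in the buffer and return the command's exit status. A shell does essentially the same thing for each command, except that it normally leaves the child's standard output attached to the terminal.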


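[Figure 3 (diagram not reproduced): labels recoverable from the diagram include init (pid 1), login (pid 7623), a zombie process (pid 1095), and several exec transitions.]

Figure 3. Fork, exec, exit, and wait.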

The same process identifier is used successively after the fork by the child init process, by getty, by login, and finally by the shell. When the user logs out, the shell dies and the original init process (process identifier 1) waits on it. After the wait succeeds, the process identifier formerly used by the shell may be reassigned by the kernel to a new process.

The user identifier is used by the kernel to determine the user’s permissions for certain system calls, especially those involving file accesses. There is also a group identifier (gid), which is used to provide similar privileges to a collection of users. In 4.2BSD, a user’s processes may be in several groups simultaneously. The login process puts the user’s shell in all the groups permitted to the user by the files /etc/passwd and /etc/group.

There are actually two user identifiers used by the kernel: The effective user identifier is the one used to determine file access permissions, and the real user identifier is used by some programs to determine who the original user was before the effective user identifier was set by a setuid. If the file being executed by exec has setuid indicated, the effective user identifier of the process is set to the user identifier of the owner of the file, while the real user identifier is left as it was. This allows certain processes to have more than ordinary privileges while still being executable by ordinary users. This setuid idea is patented by Dennis Ritchie [1979a] and is one of the most distinctive features of UNIX. For groups there is a similar distinction for effective and real group identifiers, and a similar setgid feature.

Every process has both a user and a system phase, which never execute simultaneously. Most ordinary work is done by the user process, but when a system call is

done, it is the system process, which has a different stack than the user process, that performs the system call.

The virtual address space of a user process is divided into text (executable instructions), data, and stack segments. The data and stack segments are always in the same address space, but may grow separately and, on most machines, in opposite directions: On a VAX the stack grows down as the data grow up toward it. The text segment is usually not writable, so that one copy may be shared among several processes.

How a process’s virtual address space is mapped into physical main or secondary memory varies greatly from system to system but is usually transparent to the user.

2.2 Control Blocks

There are no system control blocks accessible in a user process’s virtual address space, but there are such control blocks in the kernel associated with the process. Some of these control blocks are diagramed in Figure 4.

The most basic data structure associated with processes is called the process structure. There is an array of these, whose length is defined at system link time. Each process structure contains everything that is necessary to know about a process when it is swapped out, such as its unique process identifier (an integer), scheduling information (like the process’s priority), and pointers to other control blocks. The process structures of running processes are kept linked together by the scheduler in a doubly linked list, and there are pointers from each process structure to the process’s parent, its youngest living child, and various other relatives of interest, such as a list of processes sharing the same text.

Every process with sharable text (almost all, under 4.2BSD) has a pointer from its process structure to a text structure. The text structure records how many processes are using the text segment, a pointer into a list of their process structures, and where the page table for the text segment can be found on disk when it is swapped. The text structure itself is always resident in main memory: An array of such structures is allocated at system link time.

The page tables contain information on how the process’s virtual address space is mapped to physical memory. When a process is in main memory, its page tables may be found by a pointer in the process structure. When the process is swapped, the process structure contains instead the address of the process on the swap device. There is no special separate page table for the text segment when it is resident in main memory; every process sharing it has entries for its pages in the process’s page table. (Nonpaging UNIX systems ordinarily have the text structure point directly to the text segment, both in main memory and on the swap device.)

Information about the process that is needed only when the process is resident (i.e., information that is not in the process structure) is kept in the user structure (or u structure). A copy of the VAX hardware process control block is kept here for saving the process’s general registers, stack pointer, program counter, and page table base registers when the process is not running. There is space to keep system call parameters and return values. All user and group identifiers associated with the process (not just the effective user identifier kept in the process structure) are kept here. Signals, timers, and quotas have data structures here. Of more obvious relevance to the ordinary user, the current directory is maintained here, and open files are kept track of. The kernel stack for the process (i.e., the stack of the system process phase) is also in the user structure.

On the VAX, the user structure is mapped into the high end of user address space, just above the process’s stack. The user structure of the running process is still directly accessible to the kernel, however, since kernel address space is VAX system space, which is the high half of the whole 32-bit virtual address space, whereas process address space is the low half.

When a fork system call is done, a new process structure with a new process identifier is allocated for the child process, and the user structure is copied. The copying preserves open file descriptors, user and
Computing Surveys, Vol. 17, No. 4, December 1985


[Figure 4 diagram not reproduced. Recoverable labels: VAX hardware
address spaces and regions; kernel address space (resident), containing
the process structure; user address space (swappable).]

Figure 4. Process control data structures.

group identifiers, signal handling, and most similar properties of a
process. There is ordinarily no need for a new text structure, as the
processes share their text; the appropriate counters and lists are
merely updated. A new page table is constructed, and new main memory is
allocated for the data and stack segments of the child process. (If
enough memory cannot be found for the page table, the process is swapped
until there is enough.)

The vfork system call does not copy the data and stack to the new
process; rather the new process simply shares the page table of the old
one. A new user structure and a new process structure are still created.
A common use of this system call is by a shell to execute a command and
wait on its completion. The parent process uses vfork to produce the
child process. The child process only wishes to use exec to change its
virtual address space completely into that of a new program, so that
there is no need for a complete copy of the parent process. Such data
structures as are necessary for manipulating pipes may be kept in
registers between the vfork and the exec. Files may be closed in one
process without affecting the other process, since the kernel data
structures involved depend on the user structure, which is not shared.
The kernel suspends the parent process until the child calls exec or
exits.

When the parent process is large, vfork can produce substantial savings
in system CPU time. It is a rather dangerous system call, however, since
any memory change by the child process occurs in both processes until
the exec occurs. An alternative is to share all pages by duplicating the
page table, but to mark the entries of both page tables as
copy-on-write. The hardware protection bits are set to trap any attempt
to write in these shared pages. If such a trap occurs, a new frame is
allocated and the shared page is copied to the new frame. The page
tables are adjusted to show that this page is no longer shared (and
therefore need no longer be write protected), and execution can resume.
Hardware bugs with the VAX-11/750 prevented 4.2BSD from including a
copy-on-write fork operation (although Tektronix has since implemented
it).

An exec system call creates no new process or user structure; rather the
text and
data of the process are replaced. Open files are preserved (although
there is a way to specify that certain file descriptors are to be closed
on an exec). Most signal-handling properties are preserved, but
arrangements to call a specific user routine on a signal are canceled
for obvious reasons. The process identifier and most other properties of
the process are unchanged.

2.3 CPU Scheduling

CPU scheduling in UNIX is designed to benefit interactive jobs.
Processes are given small CPU time slices by an algorithm that reduces
to round-robin for CPU-bound jobs, although there is a priority scheme.
There is no preemption of one process by another when running in the
kernel. A process may relinquish the CPU because it is waiting on I/O or
because its time slice has expired.

Every process has a scheduling priority associated with it; the lower
the numerical priority, the more likely the process is to run. Processes
doing disk I/O or other important tasks have negative priorities and
cannot be interrupted by signals. Ordinary user processes have positive
priorities and thus are all less likely to be run than any system
process, although user processes may have precedence over one another.
The nice command may be used to affect this precedence according to its
numerical priority argument.

The more CPU time a process accumulates, the lower (more positive) its
priority becomes. The reverse is also true (process aging is employed to
prevent starvation). Thus there is negative feedback in CPU scheduling,
and it is difficult for a single process to take all CPU time.

Older UNIX systems used a 1-second quantum for the round-robin
scheduling. 4.2BSD reschedules processes every 0.1 second and recomputes
priorities every second. The round-robin scheduling is accomplished by
the timeout mechanism, which tells the clock interrupt driver to call a
certain subroutine after a specified interval. The subroutine to be
called in this case causes the rescheduling and then resubmits a timeout
to call itself again. The priority recomputation is also timed by a
subroutine that resubmits a timeout for itself.

When a process chooses to relinquish the CPU, it goes to sleep on an
event. The kernel primitive used for this is called sleep (not to be
confused with the user-level library routine of the same name). It takes
an argument that is by convention the address of a kernel data structure
related to an event the process wants to occur before it is awakened.
When the event occurs, the system process that knows about it calls
wakeup with the address corresponding to the event, and all processes
that had done a sleep on the same address are put in the queue to be
scheduled to be run.

For example, a process waiting for disk I/O to complete will sleep on
the address of the buffer header corresponding to the data being
transferred. When the interrupt routine for the disk driver notes that
the transfer is complete, it calls wakeup on the buffer header. The
interrupt uses the kernel interrupt stack, and the wakeup is done from
whatever system process happens to be running.

The process that actually does run is chosen by the scheduler,
effectively at random. Sleep, however, also takes a second argument,
which is the scheduling priority to be used for this purpose. The
priority argument, if negative, also prevents the process from being
prematurely awakened from its sleep by some exceptional event, such as a
signal.

There is no memory associated with events, and the caller of the routine
that does a sleep on an event must be prepared to deal with a premature
return, including the possibility that the reason for waiting has
vanished.

There are race conditions involved in the event mechanism. If a process
checks to see if a flag in memory has been set by an interrupt routine,
and then sleeps on an event, the process may sleep forever, because the
event may occur between the test of the flag and the completion of the
sleep primitive. This is prevented by raising the hardware processor
priority during the critical section so that no interrupts can occur,
and thus only the process desiring the event can run until it is
sleeping. Hardware processor priority is used in this manner to protect
critical regions throughout the kernel and is the greatest obstacle to
porting UNIX to multiple processor machines [Bach and Buroff 1984; Gobel
and Marsh 1981] (although not a large enough obstacle to prevent such
ports, as many have been done [Beck and Kasten 1985; Bell 1985]).

Many processes, such as text editors, are I/O bound and will usually be
scheduled mainly on the basis of waiting for I/O. Experience suggests
that the UNIX scheduler performs best with I/O bound jobs, as can be
observed when there are several CPU bound jobs like text formatters or
language interpreters running.

CPU scheduling, swapping, and paging interact: The lower the priority of
a process, the more likely it is that its pages will be paged out and
that it will be swapped in its entirety.

The CPU scheduling just described provides short-term scheduling,
although the negative feedback property of the priority scheme provides
some more long-term scheduling, since it largely determines the
long-term job mix. Swapping has an intermediate range scheduling effect,
although on a machine with sufficient memory, swapping occurs rarely.

3. MEMORY MANAGEMENT

Much of UNIX’s early development was done on the PDP-11. That computer
has only eight pages in its virtual address space, and those are of 8192
bytes each. (The larger machines like the PDP-11/70 allow separate
instruction and data address spaces, which effectively double the
address space and number of pages, but this is still not much.) This
large granularity is not conducive to sophisticated paging algorithms.
The kernel was also restricted by the same virtual address space
limitations, and so there was also little room to implement such
algorithms. (In fact, the kernel was even more severely constrained as a
result of dedicating part of one data page to interrupt vectors, a whole
page to point at the per-process system page, and yet another for the
UNIBUS register page.) Further, on the small PDP-11s, total physical
main memory was limited to 256 kbytes. Thus UNIX swapped.

The advent of the VAX with its 512-byte pages and multigigabyte virtual
address space permitted paging, although the variety of practical
algorithms was limited by the lack of a hardware reference bit. Berkeley
introduced paging to UNIX with 3BSD [Babaoglu and Joy 1981].

3.1 Paging

VAX 4.2BSD is a demand-paged virtual memory system. External
fragmentation of memory is minimized by paging. (There is internal
fragmentation, but this is negligible with a reasonably small page
size.) Swapping is kept to a minimum because more jobs can be kept in
main memory since not all of any job has to be resident.

Because the VAX has no hardware memory management page reference bit,
many memory management algorithms (such as page fault frequency) are
unusable. 4.2BSD used a modified Global Clock Least Recently Used (LRU)
algorithm. A software clock hand linearly and repeatedly sweeps all
frames of main memory that are available for paging. The missing
hardware reference bit is simulated by marking a page as invalid
(reclaimable) when the clock hand sweeps over it. If the page is
referenced before the clock hand next reaches it, a page fault occurs,
and the page is made valid again. But if the page has not been
referenced when the clock hand reaches it again, it is reclaimed for
other use. Various software conditions are also checked before a page is
marked invalid, and many other measures are taken to reduce the high
overhead of this use of page faults to simulate reference bits.

There is a way for a process to turn off the reference bit simulation
for its pages, thus getting effectively random page replacement: Franz
Lisp uses this feature during garbage collection. There are checks to
make sure that a process’s number of valid data pages does not fall too
low, and to keep the page-out device from being flooded with requests.
There is also a mechanism by which a process may limit the amount of
main memory that it uses. There
is no provision, however, for a process to lock a specific set of its
pages in main memory, because this is considered inappropriate in the
research environments in which 4.2BSD is commonly used, where the paging
characteristics of programs under development cannot be readily
predicted and equitable sharing of all system resources among all
processes is important. In general, page locking is less appropriate in
a system like UNIX, where processes are numerous and readily created,
than in a system like VMS.

All main memory frames available for paging are represented by the core
map or cmap. This map records the disk block corresponding to a frame
that is in use, an indication of what process page a frame in use is
mapped into, and a free list of frames that are not mapped into process
pages.

Paging in, that is, demand paging, is done in a straightforward manner.
When a process needs a page and the page is not mapped into a memory
frame, a page fault is produced by the hardware. This causes the kernel
to allocate a frame of main memory, map it into the appropriate process
page, and read the proper data into it from disk. Such pages come
initially from the process’s object file. That is, processes are not
prepaged whole; instead, they are demand loaded.

There are some complications. If the required page is still in the
process’s page table but has been marked invalid by the last pass of the
clock hand, it can be marked valid and used without any I/O transfer.
Pages can be similarly retrieved from the memory free list. Every
process’s text segment is by default a shared read-only text segment.
This is practical with paging, since there is no external fragmentation,
and the swap space gained by sharing more than offsets the overhead
involved, since the kernel virtual space is large. When the last process
sharing a text segment dies, the disk block information and data for its
text frames are left in the frames when they are put in the free list,
so that if a new process sharing the same text executes soon, frames can
be mapped into many of its text pages without being retrieved from disk.
Processes that have been swapped out may also find some of their pages
still in the free list when they are swapped back in.

UNIX processes have their data logically divided into initialized data
and uninitialized data (bss). Uninitialized data are all zero at process
execution and are represented in the process’s object file only by an
indication of the file’s size. Pages for such uninitialized data do not
have to be read from disk: A frame is found, mapped, and zero filled.
New stack pages also do not need to be transferred from disk.

If the page has to be fetched from disk, it must be locked for the
duration. Once the page is fetched and mapped properly, it must not be
unlocked if raw physical I/O is being done on it.

Paging out, that is, the page replacement algorithm, is more
interesting. The software clock hand cycles through cmap, checking
conditions on each frame as it passes. If the frame is empty (has no
process page mapped into it), it is left untouched, and the hand sweeps
to the next page. If I/O is in progress on it, or some other software
condition indicates the page is actually being used, then the frame is
also left untouched. But if the page is not in use, the corresponding
process page table entry is located. If the entry is valid, it is made
invalid but reclaimable. If the entry is invalid (because the last sweep
of the clock hand made it so), it is reclaimed. If the page has been
modified (the VAX does have a dirty bit), it must first be written to
disk before the frame is added to the free list.

The LRU clock hand is implemented in the pagedaemon, which is process 2
(the scheduler is process 0 and init is process 1). The pagedaemon’s
purpose is to keep the memory frame free list large enough that paging
demands on memory will not exhaust it. This process spends most of its
time sleeping, but a check is done several times per second (scheduled
by a timeout) to see whether action is necessary, and process 2 is
awakened if so. Whenever the number of free frames falls below a
threshold (lotsfree), the process is awakened; thus,
if there is always a lot of free memory, the pagedaemon imposes no load
on the system because it never runs.

The sweep of the clock hand each time the pagedaemon process is awakened
(i.e., the number of frames scanned, which is usually more than the
number paged out) is determined both by the number of frames needed to
reach lotsfree and by the number of frames that the scheduler has
determined are needed for various reasons (the more frames lacking or
needed, the longer the sweep). If the number of frames free rises to
lotsfree before the expected sweep is completed, the hand stops and the
pagedaemon process sleeps. The parameters that determine the range of
the clock hand sweep are set at system start-up according to the amount
of main memory so that pagedaemon should not use more than 10 percent of
all CPU time.

If the scheduler decides that the paging system is overloaded, processes
will be swapped out whole until the overload is relieved. This usually
happens only if several conditions are met: There is a high load
average; free memory has fallen below a very low limit, minfree; and the
average memory available over recent time is less than a desirable
amount, desfree, where lotsfree > desfree > minfree. In other words,
only a chronic shortage of memory with several processes trying to run
will cause swapping, and even then free memory has to be very low at the
moment. (An excessive paging rate or a need for page tables by the
kernel itself may also enter into the calculations in rare cases.)
Processes may, of course, be swapped by the scheduler for other reasons
(such as just not running for a long time).

The parameter lotsfree is usually one-fourth of the memory in the map
the clock hand sweeps, and desfree and minfree are usually the same in
different systems, but are limited to fractions of available memory.

Many peaks of memory demand caused by exec in a swapping system are
smoothed by demand paging processes rather than by preloading them.
Other peaks caused by the address space copying of fork (especially as
used by the shells) are smoothed by the substitution of vfork (see
Section 2.2) in many instances.

For I/O efficiency, the VAX 512-byte hardware pages are too small, so
they are clustered in groups of two so that all paging I/O is actually
done in 1024-byte (or larger) chunks. For still greater efficiency,
adjacent frames that are ready to be paged in or out at the same time
are done in the same I/O operation; this is called klustering.

On a page fault, several additional pages that are adjacent in both
physical and process virtual space may also be read in on one disk
transfer. Such prepaged frames are put on the bottom of the free list,
so that they are likely to remain on the free list long enough for the
process to claim them if they are needed. Since many processes may not
actually use such prepaged frames, they are not immediately mapped into
the process’s pages, because they would then stay there until the clock
hand passed, even if they were initially marked invalid.

Large VAX systems now commonly have 8, 16, or even more megabytes of
main memory. This leads to a problem with the reference bit simulation.
The 4.2BSD clock hand may take a long time (minutes, or even tens of
minutes) to complete a cycle around such large amounts of memory. Thus
the second encounter of the hand with a given page (when it is checked
to see if it is still valid) has little relevance to the first encounter
(when the page is marked invalid), and the pagedaemon will have
difficulty finding reclaimable page frames. 4.3BSD uses a second clock
hand, which follows behind the first at a shorter distance than a
complete cycle (see Figure 5). The front hand marks pages invalid, while
the back hand reclaims frames whose pages are still invalid. The proper
interval between the two hands is still a matter for research.

3.2 Swapping

Pre-3BSD UNIX systems used swapping exclusively to handle memory
contention among processes: If there was too much contention, some
processes were swapped
out. Also, a few large processes could force many small processes out of
memory, and a process larger than nonkernel main memory could not be run
at all. The system data segment (the u structure and kernel stack) and
the user data segment (text, if nonsharable; data; and stack) were kept
in contiguous main memory for swap transfer efficiency, so external
fragmentation of memory could be a serious problem.

[Figure 5 diagram not reproduced; the only surviving label is "4.2BSD
clock hand".]

Figure 5. Pagedaemon clock hands in 4.2BSD and 4.3BSD.

Allocation of both main memory and swap space was done first fit. When
the size of a process increased (owing to either stack expansion or data
expansion), a new piece of memory big enough for the whole process was
allocated, the process copied, the old memory freed, and the appropriate
tables updated. (Some attempt was made in some systems to find memory
contiguous to the end of the current piece to avoid some copying, but
the stack would still have to be copied on machines where it grew
downward.) If no single large enough piece of main memory was available,
the process was swapped out in such a way that it would be swapped back
in with the new size.

There was no need to swap a sharable text segment out (more than once),
because it was never writable, and there was no need to read in a text
segment for a process when another instance was already in core. This
was one of the main reasons for shared text: less swap traffic. The
other reason was that multiple processes using the same text segment
required less main memory. However, it was not practical on most
machines for every process to have a shared text segment, since those
segments required extra overhead in the kernel and also promoted
external fragmentation of both main memory and swap space.

Decisions on which processes to swap in or out were made by the
scheduler process, process 0 (also known as the swapper process). The
scheduler woke up at least once every 4 seconds to check for processes
to be swapped in or out. A process was more likely to be swapped out if
it was idle, had been in main memory a long time, or was large; if no
easy candidates were found, other processes were picked by age. A
process was more likely to be swapped in if it had been swapped out a
long time or was small. There were checks to prevent thrashing,
basically by not letting a process be swapped out if it had not been in
core a certain amount of time.

Many UNIX systems still use the swapping scheme described above. All
AT&T USG/USDL systems, including System V, do. All Berkeley VAX UNIX
systems, on the other hand, including 4.2BSD, depend primarily on paging
for memory contention management and only secondarily on swapping. A
scheme very similar in outline to the traditional one is used to
determine what processes get swapped in or out, but the details differ
and the influence of swapping is less.

If the paging situation is pathological, then jobs are swapped out as
described above until the situation is acceptable. Otherwise, the
process table is searched for a process deserving to be brought in
(determined by how small it is and how long it has been swapped). The
amount of memory the process will need is some fraction of its total
virtual size, up to one-half if it has
been swapped a long time. If there is not enough memory available,
processes are swapped out until there is. The processes to be swapped
out are chosen according to their being the oldest of the biggest jobs
in core, or having been idle for a while, or, in case of desperation,
simply being the oldest in core.

The age preferences used with swapping guard against thrashing, but
paging does so more effectively. Ideally, given paging, processes will
not actually be swapped out whole unless they are idle, since each
process will only need a small working set of pages in main memory at
any one time, and the pagedaemon will reclaim unused pages for use by
other processes, so that most runnable processes will never be
completely swapped out.

There is a swap allocation map, dmap, for each process’s data and stack
segment. Swap space is allocated in pieces that are multiples of a
constant minimum size (e.g., 32 pages) and a power of 2. There is a
maximum, which is determined by the size of the swap space partition on
the disk. If several logical disk partitions may be used for swapping,
they should be the same size for this reason. The several logical disk
partitions should be on separate disk arms to minimize disk seeks.

4. FILE SYSTEM

Data are kept in files, which are organized in directories. Files,
directories, and related data structures comprise the file system.

4.1 User Interface

An ordinary file in UNIX is a sequence of bytes. Different programs
expect various levels of structure, but the kernel does not impose
structure on files. For instance, the convention for text files is lines
separated by a single new-line character (which is the line feed
character in ASCII), but the kernel knows nothing about this convention.

Files are organized in directories in a hierarchical tree structure.
Directories are themselves files that contain information on how to find
other files. A pathname to a file is a text string that identifies a
file by specifying a path through the file system to the file.
Syntactically it consists of individual file name elements separated by
slash characters. In the example

    /alpha/beta/gamma

the first slash indicates the root of the whole directory tree, called
the root directory. The next element, alpha, is a subdirectory of the
root, beta is a subdirectory of alpha, and gamma is a file in the
directory beta. Whether gamma is an ordinary file or a directory itself
cannot be told from the pathname syntax.

There are two kinds of pathnames, absolute pathnames and relative
pathnames. Absolute pathnames start at the root of the file system and
are distinguished by a slash at the beginning of the pathname; the
previous example (/alpha/beta/gamma) is an absolute pathname. Relative
pathnames start at the current directory, which is a property of the
process accessing the pathname. The example

    gamma

indicates a file named gamma in the current directory, which might or
might not be /alpha/beta.

A file may be known by more than one name in one or more directories.
Such multiple names are known as links and are all treated as equally
important by the operating system. In 4.2BSD there is also the idea of a
symbolic link, which is a file containing the pathname of another file.
The two kinds of links are also known as hard links and soft links,
respectively. Soft links, unlike hard links, may point to directories
and may cross file system boundaries (see below).

The filename “.” in a directory is a hard link to the directory itself,
and the filename “..” is a hard link to the parent directory. Thus, if
the current directory is /alpha/beta, then .. refers to /alpha and .
refers to /alpha/beta itself.

Hardware devices have names in the file system. These device special
files or special files are known to the kernel as device interfaces, but
are nonetheless accessed by the user by much the same system calls as
other files.
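
The distinction between hard links, soft links, and the "." and ".."
entries described above can be observed from user level. The following
is a minimal sketch, not 4.2BSD code: it assumes a modern POSIX system
and uses Python's os module, whose link, symlink, readlink, and stat
functions are thin wrappers over the system calls of the same names. The
function name demo_links and the file names gamma-hard and gamma-soft
are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def demo_links(root):
    """Build root/alpha/beta/gamma, link to it twice, and report what stat sees."""
    alpha = os.path.join(root, "alpha")
    beta = os.path.join(alpha, "beta")
    os.makedirs(beta)

    gamma = os.path.join(beta, "gamma")
    with open(gamma, "w") as f:
        f.write("hello\n")

    hard = os.path.join(alpha, "gamma-hard")   # hypothetical name
    soft = os.path.join(alpha, "gamma-soft")   # hypothetical name

    # A hard link is a second directory entry for the same file.
    os.link(gamma, hard)
    # A soft link is a separate file whose contents are a pathname; a
    # relative target is interpreted relative to the link's own directory.
    os.symlink(os.path.join("beta", "gamma"), soft)

    return {
        "same_inode": os.stat(gamma).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(gamma).st_nlink,
        "soft_target": os.readlink(soft),
        "soft_is_link": os.path.islink(soft),
        "dot_is_self": os.path.samefile(os.path.join(beta, "."), beta),
        "dotdot_is_parent": os.path.samefile(os.path.join(beta, ".."), alpha),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        for key, value in demo_links(scratch).items():
            print(key, value)
```

Both hard-link names report the same inode number and a link count of
two, which is the sense in which the names are "equally important". The
soft link is a distinct file, and reading it back with readlink yields
only the stored pathname.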

Figure 6. Example directory structure.

Figure 6 shows some directories, ordinary files, and special files that
might appear in a real file system. The root of the whole tree is /.
/vmunix is the binary object of the 4.2BSD kernel, which is used at
system boot time. /etc/init is the executable binary of process 1, which
is the ancestor of all other user processes. System maintenance commands
and basic system parameter files appear in the directory /etc. Examples
are /etc/passwd (which defines a user’s login name, numerical
identifier, login group, home directory, and command interpreter, and
which contains the user’s encrypted password) and /etc/group (which
defines names for group identifiers and determines what users are in
many groups).

Ordinary commands appear in the directories /bin (commands essential to
system operation), /usr/bin (other commands, in a separate directory for
historical reasons), /usr/ucb (commands from the University of
California, Berkeley), and /usr/local (commands, added at the local
site, which did not come with the 4.2BSD distribution). Library files
appear in /lib (e.g., compiler passes and /lib/libc.a, which is the C
library, containing utility routines and system call interfaces),
/usr/lib (most text processing macros), and /usr/local/lib (locally
added libraries). System parameter files that are useful to user
programs appear in /usr/include. For instance, /usr/include/stdio.h
contains parameters related to the standard I/O system (see Section
7.2).

Device special files (such as /dev/console, the interface to the system
console terminal) ordinarily appear in /dev.

Finally, private user files appear under users’ login directories, which
are grouped in directories whose names vary from site to site. In the
figure, /u0/avi would be a login directory for the user whose login name
is avi.

A file is opened by the open system call, which takes a pathname and a
permission mode (indicating whether the file should be open for reading,
writing, or both) as arguments. This system call returns a small
integer, called a file descriptor. This file descriptor may then be
passed to a read or write system call (along with a pointer to a buffer
and its size) to perform data transfers to or from the disk file or
device. A file is closed by passing its file descriptor to the close
system call. Each read or write updates the current offset into the
file, which is used to determine the position in the file for the next
read or write. This position can be set by the lseek system call. There
is an additional system call, ioctl, for manipulating device parameters.

A new, empty file may be created by the creat system call, which returns
a file descriptor as for open. New hard links to an existing file may be
created with the link system call, and new soft links with the symlink
system call. Either may be removed by the unlink system call. When the
last hard link is removed (and the last process that has the file open
closes it), the file is deleted. There may still be a symbolic link
pointing to the nonexistent file: Attempts to reference such a link will
produce an error.

Device special files may be created by the mknod system call.
Directories are created by the mkdir system call (whose functions were
accomplished in pre-4.2BSD systems by the mkdir command using the mknod
and link system calls). Directories are removed by rmdir (or, in
pre-4.2BSD systems, by the rmdir command using unlink several times).
The current directory is set by the chdir system call.

The chown system call sets the owner and group of a file and chmod
changes protection modes. Stat applied to a file name or fstat applied
to a file descriptor may be used to read back such properties of a file.
In 4.2BSD, the rename system call may be used to rename a file; in
previous systems this was done by link and unlink.

The user ordinarily only knows of one file system, but the system may
know this one virtual file system is actually composed of several
physical file systems, each on a different device. A physical file
system may not span multiple hardware devices. Since most physical disk
devices are divided into several logical devices, there may be more than
one file system per physical device, but no more than one per logical
device. One file system, the root file system, is always available.
Others may be mounted, that is, integrated into the directory
Computing Surveys, Vol. 17, No. 4, December 1985
398 . J. S. Quarterman, A. Silberschatz, and J. L. Peterson
hierarchy of the root file system. Refer-
ences to a directory that has a file system
mounted on it are transparently converted
by the kernel into references to the root
directory of the mounted file system.
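The behavior of hard and symbolic links described in this section can be observed from user level. The following is a minimal sketch, assuming a present-day POSIX system with Python rather than 4.2BSD itself; the function name link_demo and the file names are hypothetical, and Python's os module is used only as a thin wrapper over the link, symlink, unlink, and stat system calls.

```python
import os
import tempfile


def link_demo():
    """Exercise link, symlink, and unlink in a scratch directory.

    Returns (hard-link count, data survives first unlink,
             symbolic link still exists, following it fails).
    """
    with tempfile.TemporaryDirectory() as scratch:
        target = os.path.join(scratch, "file")   # hypothetical names
        hard = os.path.join(scratch, "hard")
        soft = os.path.join(scratch, "soft")

        with open(target, "w") as f:
            f.write("data")

        os.link(target, hard)      # second directory entry, same inode
        os.symlink(target, soft)   # separate file holding a pathname
        links_before = os.stat(target).st_nlink

        os.unlink(target)          # one hard link remains
        with open(hard) as f:
            survives = f.read() == "data"

        os.unlink(hard)            # last hard link gone: file is deleted
        dangling = os.path.islink(soft)   # the symbolic link itself remains
        try:
            os.stat(soft)          # stat follows the link to the old name
            lookup_fails = False
        except FileNotFoundError:
            lookup_fails = True

        return links_before, survives, dangling, lookup_fails
```

On a system that supports both kinds of link, the call should report two hard links, data that remains readable after the first unlink, and a symbolic link that still exists but can no longer be followed.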

4.2 Implementations
The system call interface to the file system is simple and well defined.
This has allowed the implementation of the file system itself to be
changed without significant effect on the user. This happened with
Version 7: The size of inodes doubled, the maximum file and file system
sizes increased, and the details of free list handling and superblock
information changed. Also at that time seek (with a 16-bit offset)
became lseek (with a 32-bit offset) to allow for simple specification of
offsets into the larger files then permitted. Few other changes were
visible outside the kernel.

In 4.0BSD the size of the blocks used in the file system was increased
from 512 to 1024 bytes. Although this entailed increased internal
fragmentation of space on the disk, it allowed a factor-of-2 increase in
throughput, due mainly to the greater number of data accessed on each
disk transfer. This idea was later adopted by System V, along with a
number of other ideas, device drivers, and programs.

The 4.2BSD file system implementation [McKusick et al. 1984] is
radically different from that of Version 7 [Thompson 1978]. This
reimplementation was done primarily for efficiency and robustness, and
most of the changes done for those reasons are invisible outside the
kernel. There were some new facilities introduced at the same time, such
as symbolic links and long filenames, which are visible at both the
system call and the user levels. Most of the changes required to
implement these were not in the kernel, but rather in the programs that
use them.

4.3 Data Structures on the Disk

The virtual file system seen by the user is supported by a data
structure on a mass storage medium, usually a disk. This data structure
is the file system. All accesses to it are buffered through the block
I/O system, which is described in Section 5.1; for the moment, we
consider only what resides on the disk.

A physical disk drive may be partitioned into several logical disks, and
each logical disk may contain a file system. A file system cannot be
split across more than one logical disk. The actual number of file
systems on a drive varies according to the size of the disk and the
purpose of the computer system as a whole. Some partitions may be used
for purposes other than supporting file systems, such as swapping.

The first sector on the logical disk is the boot block, containing a
primary bootstrap program, which may be used to call a secondary
bootstrap program residing in the next 7.5 kbytes.

The data structures on the rest of the logical disk are organized into
cylinder groups, as shown in Figure 7. Each of these occupies one or
more consecutive cylinders of the disk so that disk accesses within the
cylinder group require minimal disk head movement. Every cylinder group
has a superblock, a cylinder block, an array of inodes, and some data
blocks.

Figure 7. Cylinder group.

The superblock contains static parameters of the file system. These
include the total size of the file system, the block and fragment sizes
of the data blocks, and assorted parameters that affect allocation
policies. The superblock is identical in each cylinder group, so that it
may be recovered from any one of them in the event of disk corruption.

The cylinder block contains dynamic parameters of the particular
cylinder group. These include a bit map for free data blocks and
fragments and a bit map for free inodes. Statistics on recent progress
of the allocation strategies are also kept here.
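To make the idea of a free-space bit map concrete, here is a minimal sketch of the kind of map a cylinder block might keep for its data blocks. It is an illustration only: the class FreeMap, its method names, and the nearest-index search are assumptions chosen for brevity, and the sketch does not reproduce the actual 4.2BSD on-disk layout or allocation policy.

```python
class FreeMap:
    """Hypothetical bit map with one bit per block (1 = free, 0 = in use)."""

    def __init__(self, count):
        self.count = count
        self.bits = (1 << count) - 1      # every block starts out free

    def is_free(self, index):
        return bool((self.bits >> index) & 1)

    def allocate_near(self, wanted):
        """Claim `wanted` if it is free, else the free block nearest to it
        by index. Returns the claimed index, or None if nothing is free."""
        for distance in range(self.count):
            for index in (wanted + distance, wanted - distance):
                if 0 <= index < self.count and self.is_free(index):
                    self.bits &= ~(1 << index)   # clear the bit: now in use
                    return index
        return None

    def release(self, index):
        self.bits |= 1 << index                  # set the bit: free again

    def free_count(self):
        return bin(self.bits).count("1")
```

A second map of the same shape could stand in for the free-inode bit map; keeping both inside the cylinder group is what lets allocation decisions be made from local information.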


There is an array of inodes in each cylinder group. The inode is the
locus of most information about a specific file on the disk. It contains
the user and group identifiers of the file, its times of last
modification and access, a count of the number of hard links (directory
entries) to the file, and the type of the file (plain file, directory,
symbolic link, character or block device, or socket).

6 The name inode (pronounced eye node) is derived from “index node” and
was originally spelled “i-node.” The hyphen mostly fell out of use over
the years, but both spellings may still be seen, along with the variant
I node.

Finally, the inode contains pointers to the blocks containing the data
in the file. The first dozen of these pointers point to direct blocks;
that is, they directly contain addresses of blocks that contain data of
the file. Thus, blocks of small files may be referenced with few disk
accesses, since a copy of the inode is kept in main memory while a file
is open. Supposing a major block size of 4096 bytes for the file system,
up to 48 kbytes of data may be accessed directly from the information in
the inode.

The next three data block pointers in the inode point to indirect
blocks. The first of these is a single indirect block; that is, the
inode contains the address of a disk block that itself contains not data
but rather the addresses of blocks that do contain data. Then there is a
double indirect block pointer, containing the address of a block that
contains the addresses of blocks that finally contain data. The last
pointer would contain the address of a triple indirect block, but there
is no need for it. The minimum block size for a file system is set to
4096 bytes so that files with as many as 2^32 bytes will only use
double, not triple, indirection. That is, since each block pointer takes
four bytes, we have the number of bytes accessible from each type of
pointer shown in Figure 8. With 4096-byte blocks, an indirect block
holds 1024 pointers, so the single indirect block reaches 4 Mbytes of
data and the double indirect block reaches 2^32 bytes.

Figure 8. Data block pointers and file size.

The number 2^32 is significant because the file offset in the file
structure in main memory is kept in a 32-bit word, preventing files from
being larger.

Most of the cylinder group is taken up by data blocks, which contain
whatever the users have put in their files. There is no distinction
between ordinary files and directories at this level: Their contents are
kept in data blocks in the same manner; only the type field in the inode
distinguishes them.

A directory contains a sequence of (filename, inode number) pairs. In
Version 7, filenames were limited to 14 characters, and so directories
were arrays of 16-byte records, the last 14 bytes of each record
containing the filename (null padded) and the first two bytes the inode
number. Empty array elements were indicated by zero inode numbers. In
4.2BSD, filenames are limited to 255 characters, and the size of
individual directory entries is variable up to this size. These long
filenames require more overhead in the kernel, but make it easier to
choose meaningful filenames without worrying about filename sizes.

A data block size larger than the usual hardware disk sector size of 512
bytes is desirable for speed. Since UNIX file systems usually contain a
very large proportion of small files, much larger blocks cannot be used
alone because they cause excessive internal fragmentation, that is,
wasted space. This is why the earlier 4BSD file system was limited to a
1024-byte block.

The 4.2BSD solution is to use two sizes: All the blocks of a file are of
a large block size, for example, 8192, except for the last, which may be
an appropriate multiple of a fragment size, for example, 1024, to fill
out the file. Thus, an 18,000-byte file would have two 8192-byte blocks
and one 2048-byte partial block (which would not be completely filled).
If the file were large enough to use indirect blocks, those would each
be of the major block size; the fragment size applies only to data
blocks.

The block and fragment sizes are set at file system creation according
to the

intended use of the file system: If many small files are expected, the
fragment size should be small; if repeated transfers of large files are
expected, the basic block size should be large. Implementation details
force a maximum block/fragment ratio of 8/1 and a minimum block size of
4096, so typical choices are 4096/512 for the former case and 8192/1024
for the latter.

Suppose that data are written to a file in transfer sizes of 1024 bytes,
and the block and fragment sizes of the file system are 4096 and 512
bytes. The file system will allocate a 1024-byte fragment to contain the
data from the first transfer. The next transfer will cause a 2048-byte
fragment to be allocated to hold the total data, and the data from the
original fragment must be recopied. The local allocation routines do
attempt to find space on the disk immediately following the existing
fragment so that no copying will be necessary. Nonetheless, in the worst
case, up to seven copies to new, larger, fragments may be required as a
1024-byte fragment grows into an 8192-byte block.

When extending a file that already ends in a fragment, 4.3BSD allocates
a new fragment in a space large enough for a whole block, so that
following writes will be unlikely to require recopying.

Provisions have been made for a user program to discover the block size
of the file system for a file, so that the program may do data transfers
of that size in order to avoid this fragment recopying. This is often
done automatically for programs by the user-level standard I/O library
(see Section 7.2).

4.4 Layout and Allocation Policies

The kernel uses a (device number, inode number) pair to identify a file.
The device number resolves matters to the logical disk, and thus to the
file system, since there may be only one file system per logical disk.
The inodes in the file system are numbered in sequence. In the Version 7
file system, all inodes are in an array immediately following a single
superblock at the beginning of the logical disk (there are no cylinder
groups). The data blocks follow the inodes, and the inumber is
effectively just an index into this array. Things are a bit more
complicated in 4.2BSD, but the inumber is still just the sequence number
of the inode in the file system.

In the Version 7 file system, any block of a file can be anywhere on the
disk between the end of the inode array and the end of the file system,
and free blocks are kept in a linked list in the superblock. The only
constraint on the order of allocation of disk blocks is the order in
which the free list happens to be when they are allocated. Thus the
blocks may be arbitrarily far from both the inode and each other.
Furthermore, the more a file system of this kind is used, the more
disorganized the blocks in a file become. This entropic process can only
be reversed by reinitializing and restoring the entire file system,
which is not a convenient thing to do.

The cylinder group was introduced in 4.2BSD to allow localization of the
blocks in a file. The header information in a cylinder group (the
superblock, the cylinder block, and the inodes) is not always at the
beginning of the cylinder group. If it were, the header information for
every cylinder group would be on the same disk platter, and a single
disk head crash could wipe them all out. Therefore each cylinder group
has its header information at a different offset from the beginning of
the group.

It is very common for the directory listing command ls to read the
inodes of all the files in a directory, making it desirable for all such
inodes to be close together. For this reason the inode for a file is
usually allocated from the same cylinder group as the inode of its
parent directory. Not everything can be localized, however, and so an
inode for a new directory is put in a different cylinder group from that
of its parent directory. The cylinder group chosen for the new directory
inode is the one with the greatest number of unused inodes.

To reduce the disk head seeks involved in accessing the data blocks of a
file, data blocks are allocated from the same cylinder group as much as
possible. Since a single file cannot be allowed to take up all the
blocks in a cylinder group, a file exceeding a certain size (generally
48 or 96 kbytes)

has further block allocation redirected to a different cylinder group,
the new group being chosen from among those having more than average
free space. If the file continues to grow, allocation is again
redirected at each megabyte to yet another cylinder group. Thus, all the
blocks of a small file are likely to be in the same cylinder group, and
the number of long seeks involved in accessing a large file is kept
small.

There are two levels of disk block allocation routines. The global
policy routines select a desired disk block according to the above
considerations. The local policy routines use the specific information
recorded in the cylinder blocks to choose a block near to the one
requested. If the requested block is not in use, it is returned, else
the block rotationally closest to it in the same cylinder, else a block
in a different cylinder but the same cylinder group. If there are no
more blocks in the cylinder group, a quadratic rehash is done among all
the other cylinder groups to find a block, and if that fails, an
exhaustive search is done. If enough free space (typically 10 percent)
is left on the file system, blocks are usually found where desired, the
quadratic rehash and exhaustive search are not used, and performance of
the file system does not degrade with use.

The 4.2BSD file system is capable of using 30 percent or more of the
bandwidth of a typical disk, in contrast to about 3 percent or less in
Version 7.

4.5 Mapping a Pathname to an Inode

The user refers to a file by a pathname, while the file system uses the
inode as its definition of a file. Thus the kernel has to map the user’s
pathname to an inode.

First a starting directory is determined. If the first character of the
pathname is “/,” this is an absolute pathname, and the starting
directory is the root directory. If the pathname starts with anything
other than a slash, this is a relative pathname, and the starting
directory is the current directory of the current process.

The starting directory is checked for existence, proper file type, and
access permissions, and an error is returned if necessary. The inode of
the starting directory is always available and is used to search the
directory, if required.

The next element of the pathname, up to the next “/,” is looked for as a
name in the starting directory, and an error is returned if it is not
found. If it is found, the inumber of the element as found in the
directory is used to retrieve the inode.

If there is another element in the pathname, an error is returned if the
current inode does not refer to a directory or if access is denied;
otherwise this directory is searched as the previous one was. This
process continues until the end of the pathname is reached and the
desired inode is returned.

If at any point an inode has a file system mounted on it, this is
indicated by a bit in the inode structure. The mount table is then
searched to find the device number of the mounted device and the in-core
copy of its superblock. The superblock is used to find the inode of the
root directory of the mounted file system, and that inode is used.
Conversely, if a pathname element is “..” and the directory being
searched is the root directory of a file system that is mounted, the
mount table must be searched to find the inode that it is mounted on,
and that inode is used.

Hard links are simply directory entries like any other. Symbolic links
are handled for the most part by starting the search over with the
pathname taken from the contents of the symbolic link. Infinite loops
are prevented by counting the number of symbolic links encountered
during a pathname search and returning an error when a limit (eight) is
exceeded.

The addition of symbolic links exacerbated the slowness of pathname
translation, so that it became possibly the main efficiency problem in
4.2BSD. The problem has been avoided in 4.3BSD by having a cache of
filename-to-inode references. Since it is very common for the ls
directory listing program (and other programs) to scan all the files in
a directory one by one, a directory offset cache was also added, so that
when a process requests a filename in the same directory as its previous
request,

the search through the directory is started where the previous name was
found.

Special files and sockets do not have data blocks allocated on the disk.
The kernel notices these file types (as indicated in the inode) and
calls appropriate drivers to handle I/O for them.

Once the inode is found by, for instance, the open system call, a file
structure is allocated to point to the inode and to be referred to by a
file descriptor by the user.

4.6 Mapping a File Descriptor to an Inode

System calls that refer to open files take a file descriptor as argument
to indicate the file. (A file descriptor is a small nonnegative integer
returned by the open or creat system calls or other system calls that
open or create files; see Section 4.2.) The file descriptor is used by
the kernel to index an array of pointers for the current process (kept
in the process’s user structure) to locate a file structure. This file
structure in turn points to the inode. The relations of these data
structures are shown in Figure 9.

Figure 9. File descriptor to inode.

The read and write system calls do not take a position in the file as
argument. Rather the kernel keeps a file offset that is updated after
each read or write according to the number of data actually transferred.
The offset can be set directly by the lseek system call. If the file
descriptor indexed an array of inode pointers instead of file pointers,
this offset would have to be kept in the inode. Since more than one
process may open the same file, and each such process needs its own
offset for the file, keeping the offset in the inode is inappropriate.
Thus the file structure is used to contain the offset.

File structures are inherited by the child process after a fork, so
several processes may also have the same offset into a file. The fcntl
system call manipulates the file structure (it can be used to make
several file descriptors point to the same file structure, for
instance), whereas the ioctl system call manipulates the inode.

The inode structure pointed to by the file structure is an in-core copy
of the inode on the disk and is allocated out of a fixed-length table.
The in-core inode has a few extra fields, such as a reference count of
how many file structures are pointing at it, and the file structure has
a similar reference count for how many file descriptors refer to it.

The 4.2BSD file structure may point to a socket instead of to an inode.

5. I/O SYSTEM

Many hardware device peculiarities are hidden from the user by
high-level kernel facilities, such as the file system and the socket
interface. Other such peculiarities are hidden from the bulk of the
kernel itself by the I/O system [Ritchie et al. 1979a; Thompson 1978].
This consists of buffer caching systems, general device driver code,


and drivers for specific hardware devices, which must finally address
peculiarities of the specific devices. The various kernel I/O systems
are diagramed in Figure 10.

[Figure 10: from top to bottom, the system call interface to the kernel;
the network interface drivers, block device drivers, and character
device drivers; the hardware.]

There are three main kinds of I/O in 4.2BSD: the socket interface and
its related protocol implementations, block devices, and character
devices.

The socket interface, together with protocols and network interfaces, is
treated in Section 6 on communications.

Block devices include disks and tapes. Their distinguishing
characteristic is that they are addressable in a common fixed block
size, usually 512 bytes. The device driver is required to isolate
details of tracks, cylinders, and the like from the rest of the kernel.
Block devices are accessible directly through appropriate device special
files (e.g., /dev/hp0h), but are more commonly accessed indirectly
through the file system. In either case, transfers are buffered through
the block buffer cache, which has a profound effect on efficiency.

Character devices include terminals (e.g., /dev/tty00) and line printers
(/dev/lp0), but also almost everything else (except network interfaces)
that does not use the block buffer cache. For instance, there is
/dev/mem, which is an interface to physical main memory, and /dev/null,
which is a bottomless sink for data and an endless source of end of file
markers. Devices such as high-speed graphics interfaces may have their
own buffers or may always do I/O directly into the user’s data space;
they are in any case classed as character devices. Terminal-like devices
use c-lists, which are buffers smaller than those of the block buffer
cache.

The names block and character for the two main device classes are not
quite appropriate; structured and unstructured would be better. For each
of these classes there is an array of entry points for the various
drivers. A device is distinguished by a class and a device number, both
of which are recorded in the inode of special files in the file system.
The device number is in two parts. The major device number is used to
index the array appropriate to the class to find entries into the
appropriate device driver. The minor device number is interpreted by the
device driver as, for example, a logical disk partition or a terminal
line.

A device driver is connected to the rest of the kernel only by the entry
points recorded in the array for its class, by its use of common
buffering systems, and by its use of common low-level hardware support
routines and data structures. This segregation is important for
portability, and also in configuring systems.

5.1 Block Buffer Cache

The block buffer cache serves primarily to reduce the number of disk I/O
transfers required by file system accesses through the disk drivers.

Since it is common for system parameter files, commands, or directories
to be read repeatedly, it is possible for their data blocks to be in the
buffer cache when they are needed, so that it is not necessary to
retrieve them from the disk.

Processes may write or read data in sizes smaller than a file system
block or fragment. The first time a small read is required

from a particular file system block, the block will be transferred from
the disk into a kernel buffer. Succeeding reads of parts of the same
block then usually require only copying from the kernel buffer to the
user process’s memory. Writes are treated similarly, in that a cache
buffer is allocated (and arrangements are made on the file system on the
disk) when the first write to a file system block is made, and
succeeding writes to parts of the same block are then likely to require
only copying into the kernel buffer, and no disk I/O.

If the system crashes while data for a particular block are in the cache
but have not yet been written to disk, the file system on the disk will
be incorrect and those data will be lost. To alleviate this problem,
writes are periodically forced for dirty buffer blocks. This is done
(usually every 30 seconds) by a user process, sync, exercising a kernel
primitive of the same name. There is also a system call, fsync, which
may be used to force all blocks of a single file to be written to disk
immediately: This is useful for database consistency.

Corruption of a directory can detach an entire subtree of the file
system. Thus writes to disk are forced for directories on every write
into the cache. The cache still improves efficiency for directories
because reads of the same blocks will find copies of the data still in
the cache.

The problem also exists for the in-core copies of inodes (which are kept
in a kernel array dedicated to this purpose, and separate from the block
buffer cache) and superblocks. Write-throughs to the disk on writes into
the kernel copies are also forced for these, although the write-through
is somewhat selective for inodes.

Most magnetic tape accesses are, in practice, done through the
appropriate raw tape device (see Section 5.2), bypassing the block
buffer cache. When the cache is used, tape blocks must still be written
in order; so the tape driver forces synchronous writes for them.

The block buffer cache consists of a number of buffer headers. Each
buffer header contains a device number, the number of a block on the
device, a pointer to a piece of physical memory, the size of the piece
of memory, and the amount of that memory that contains useful data. The
term buffer is usually used to refer to a buffer header and the
associated useful data. The term empty buffer, however, refers to a
buffer header with no associated memory or data.

The buffer headers are kept in several linked lists:

• Reserved buffers: blocks containing information that stays in main
  memory, such as superblocks of file systems.
• The cache: blocks not currently in use, but likely to be reused, in
  least recently used order.
• The aged queue: blocks unlikely to be reused, also in least recently
  used order.
• The empty queue: buffer headers with no memory or disk blocks
  associated with them.
• Device driver active queues: Each block device has a queue of buffers
  on which I/O is active or pending. A buffer may be on one of these
  active queues at a time and will not appear at the same time in the
  cache, the aged queue, or the empty queue.

The buffers in these lists are also hashed by device and block number
for search efficiency.

On system start-up, a pool of memory pages is reserved for use with the
block buffer system. The size of this pool, and the number of buffer
headers, is determined according to the size of available main memory.
All the pages are initially linked into buffer headers, which are all
linked into the aged queue.

When a read is wanted from a device, the appropriate block is first
searched for in the cache. If it is found, it is used as is, and no I/O
transfer is necessary. If it is not found, a buffer is chosen from the
aged queue, the device number and block number associated with it are
updated, more memory is found for it (if necessary), and the new data
are transferred into it from the device. If there are no aged buffers,
the least recently used buffer in the cache is written to the disk (if
necessary) and reused.

On a write, if the block is not found in the buffer cache, a buffer is
chosen in a similar manner as for a read. If the block is already in the
buffer cache, the new data
are put in the buffer, and the buffer header is marked to indicate that
the buffer has been modified. If the whole buffer is written, it is
queued for I/O and put on the aged queue. Otherwise, no I/O transfer is
initiated, and the buffer is left in the main cache queue.

Similarly, when a read of a buffer reaches the end of the buffer, it is
put on the aged queue; otherwise, it is left in the cache. Thus buffers
that have been involved in partial transfers are unlikely to be reused
for other purposes before succeeding partial transfers happen, whereas
blocks that have been completely used are made more readily available.
Transfers of large files, done in transfers of the file system block
size or larger, will usually use buffers out of the aged queue without
disturbing buffers in the main cache.

The number of data in a buffer in 4.2BSD is variable, up to a maximum
(usually 8 kbytes) over all file systems. The minimum size of a buffer
is the fragment size (see Section 4.3) of the file system, usually 512
or 1024 bytes. The buffer header specifies the number of data in the
buffer, which is either a full-sized block or one to seven contiguous
fragments. Each buffer header also specifies the amount of memory
allocated to the buffer, which is an integral number of memory clusters.
Each memory cluster in the buffer pool is mapped to exactly one buffer
at a time. When a buffer is allocated, if it has insufficient memory for
the block to be stored, pages are taken from free buffers to fill out
the buffer, and the now-empty free buffers are put on the empty queue.
Many buffers are of the sizes of the major blocks of the file systems in
use. UNIX file systems, however, frequently contain many small files as
well as larger ones, and there are consequently many buffers of one,
two, or three times the common fragment sizes.

The truncate system call may be used to reduce the size of a file
without decreasing it to zero. If a file is truncated to a size that is
not a multiple of the major block size of its file system, the last
block in the file may be a fragment. If there is a block buffer
corresponding to the new end of the file at that time, the buffer must
shrink to match the now-smaller file block. In this case, a buffer is
taken off the empty queue, and excess pages from the shrinking buffer
are put into it.

The size of the buffer cache can have a profound effect on the
performance of a system. If it is large enough, the percentage of cache
hits can be quite high and the number of actual I/O transfers low.

There are some interesting interactions between this buffer cache, the
file system, and the disk drivers. When data are written to a disk file,
the buffers are put on a device driver’s I/O queue. The disk device
driver keeps this active queue sorted by disk address to minimize disk
head seeks and to write data at times optimized for disk rotation. When
data are read from a disk file, the block I/O system does some
read-ahead; reads, however, are much more nearly asynchronous than
writes, making long disk head seeks and missed disk revolutions more
likely. Thus output to the disk through the file system is often faster
than input for large transfers, counter to intuition.

5.2 Raw Device Interfaces

Almost every block device also has a character interface. These are
called raw device interfaces and are accessed through separate special
files; for example, /dev/rhp0h might be the raw interface corresponding
to /dev/hp0h. Such an interface differs from the block interface in that
the block buffer cache is bypassed.

Each disk driver maintains an active queue of pending transfers. Each
record in the queue specifies whether it is for reading or writing, a
main memory address for the transfer, a device address for the transfer
(usually the 512-byte block number), and a transfer size (in bytes). It
is simple to map the information from a block buffer to what is required
for this queue.

It is almost as simple to map a piece of main memory corresponding to
part of a user process’s virtual address space into a buffer. For
instance, this is what a raw disk interface does. Transfers directly
between a user’s virtual address space and a device are allowed by this
mapping. The size of the transfer is limited by the physical device,
some of which require an even number

Computing Surveys, Vol. 17, No. 4, December 1985


of bytes. However, it may be larger than the largest block that the block buffer system
can handle, and it may be a size that is not a multiple of the page cluster size (nor a
multiple of 512 bytes). The software restricts the size of a single transfer to what
will fit in a 16-bit word; this is an artifact of the system’s PDP-11 history and of the
PDP-11 derivation of many of the devices themselves.

The kernel accomplishes transfers for swapping and paging simply by putting the
appropriate request on the queue for the appropriate device. No special swapping or
paging device driver is needed.

The 4.2BSD file system implementation was actually written and largely tested as a user
process that used a raw disk interface before the code was moved into the kernel.

5.3 C-Lists

Terminal drivers and the like use the character buffering system. This involves small
(usually 28-byte) blocks of characters kept in linked lists. There are routines to
enqueue and dequeue characters for such lists. Although all free character buffers are
kept in a single free list, most device drivers that use them limit the number of
characters that may be queued at one time for a single terminal port.

A write system call to such a device puts characters on an output queue for the device.
An initial transfer is started, and hardware interrupts cause dequeuing of characters
and further transfers. These interrupts are handled asynchronously by the kernel and
independently of user processes. The stack used is a special interrupt stack, not the
kernel stack of the system half of a user process. If there is not enough space on the
output queue for all the data supplied by a write system call, the system call may block
until space becomes available when interrupts transfer characters from the output queue
to the device.

Input is similarly interrupt driven. Terminal drivers, however, typically support two
input queues: the raw queue and the canonical queue. The raw queue collects characters
exactly as they come from the terminal port. This is done asynchronously: The user
process does not have to do a read system call first. If there is not enough space in
the raw queue when an input interrupt occurs, the data are thrown away.

When a user process uses a read system call to attempt to get data from the device, any
data already in the canonical queue may be returned immediately. If there are not enough
data in the canonical queue, the read system call will block until interrupts cause
enough data to be put on the canonical queue. Transfer of characters to the canonical
queue is triggered when the interrupt routine puts an end-of-line character on the raw
queue. If there is a process blocked on a read from the device, it is awakened and its
system half does the transfer. Some conversion is done during the transfer: Carriage
returns may be translated to line feeds and special handling of other characters may be
done.

It is also possible to have the device driver bypass the canonical queue so that the
read system call returns characters directly from the raw queue. This is known as raw
mode.

6. COMMUNICATIONS

Many tasks can be accomplished in isolated processes, but many others require
interprocess communication (IPC). Isolated computing systems have long served for many
applications, but networking is increasingly important, especially with the increasing
use of personal workstations. Resource sharing ranging up to true distributed systems is
becoming more common.

6.1 Signals

Signals are a facility for handling exceptional conditions. They are not interprocess
communication in any usual sense, but they are an important facility of UNIX and need to
be discussed.

There are a few dozen signals, each corresponding to a certain condition. A signal may
be generated by a keyboard interrupt, by an error in a process, such as a bad memory
reference, or by a number of asynchronous events, such as timers or job control signals
from the C shell. Almost any signal may also be generated by the kill
system call, which may, in turn, be exercised by the kill command from the shell.

The interrupt signal, SIGINT, is usually produced by typing the ^C (control C) character
on a terminal keyboard. This signal is used to stop a command before it would otherwise
complete. The quit signal, SIGQUIT, is usually produced by typing the ^\ character
(ASCII FS) and has a similar effect, except that it also causes the process involved to
write its current memory image to a file called core in the current directory for use by
debuggers. (Such characters that send signals or cause other special effects when typed
at the terminal may all be set by the user. ASCII DEL and ^U are two other common
choices for the interrupt and quit characters.) The signals SIGSTOP and SIGCONT are used
by the C shell in job control to stop and restart a process. SIGILL is produced by an
illegal instruction and SIGSEGV by an attempt to address memory outside of a process’s
legal virtual memory space.

When a signal is generated, it is queued until the system half of the affected process
next runs. This usually happens soon, since the signal causes the process to be awakened
if it has been waiting for some other condition. The default action of a signal is for
the process to kill itself, and many signals have a side effect of producing a core file.

Arrangements can be made for most signals to be ignored (to have no effect) or for a
routine in the user process (a signal handler) to be called. There is one signal (the
kill signal, number nine, SIGKILL) that cannot be ignored or caught by a signal handler.
SIGKILL is used, for instance, to kill a runaway process that is ignoring other likely
signals, such as SIGTERM, which is the soft kill signal, which many processes expect to
catch and then die gracefully.

Signals can be lost: If another of the same kind is sent before a previous one has been
processed by the process it is directed to, the first one will be overwritten and only
one will be seen by the process.

Signals have been reimplemented in 4.2BSD according to a new model that avoids many of
the race conditions of the older model. In particular, while one signal is being handled
by the user process, other signals for the same process are automatically blocked by the
kernel. Still, signals are mostly appropriate for handling exceptional conditions, not
for large-scale IPC.

6.2 Interprocess Communication

Interprocess communication (IPC) traditionally has not been one of UNIX’s strong points.

Most UNIX distributions have not permitted shared memory because the PDP-11 hardware did
not encourage it. System V does support a shared memory facility, and one was planned
for 4.2BSD but not implemented owing to time constraints. Shared memory presents a
problem in a networked environment, since network accesses can never be as fast as
memory accesses on the local machine. Although one could pretend that memory was shared
between two separate machines by copying data across a network transparently, still the
major benefit of shared memory, speed, would be lost.

The pipe is the IPC mechanism most characteristic of UNIX and is basic to the pipeline
facility described in Section 7.3. A pipe permits a reliable unidirectional byte stream
between two processes. It is traditionally implemented as an ordinary file, with a few
exceptions. It has no name in the file system, being created instead by the pipe system
call (the resulting file descriptor may then be manipulated by the read, write, close
system calls, and by others, which may be used on ordinary files). A pipe’s size is
fixed, and once all data previously written into one have been read out, writing
recommences at the beginning of the file (pipes are not true circular buffers). One
benefit of the small size (usually 4096 bytes) of pipes is that pipe data are seldom
actually written to disk, being usually kept in memory by the block buffer cache system.

In 4.2BSD, pipes are implemented as a special case of the socket mechanism [Leffler
et al. 1983a], which provides a general interface not only to facilities such as pipes,
which are local to one machine, but also to networking facilities.
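
As an illustration of the pipe interface just described, the following is a minimal
sketch, not code from 4.2BSD. It is written in Python, whose os and socket modules are
thin wrappers over the pipe, fork, read, write, close, and socketpair system calls, and
it assumes a present-day POSIX system. The function names pipe_round_trip and
socketpair_round_trip are hypothetical, chosen only for this example. The second
function uses a connected pair of UNIX-domain stream sockets, which is the general form
of the mechanism on which 4.2BSD builds its pipes.

```python
import os
import socket


def pipe_round_trip(message: bytes) -> bytes:
    """Send `message` from a child process to its parent through a pipe."""
    # pipe() yields two descriptors; the pipe has no name in the file system.
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end, send the data, and exit.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: keep only the read end and read until end of file.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:  # zero bytes: every write end has been closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


def socketpair_round_trip(message: bytes) -> bytes:
    """Same transfer, over a connected pair of UNIX-domain stream sockets."""
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        # Child: send on its end of the pair, then exit.
        try:
            parent_sock.close()
            child_sock.sendall(message)
            child_sock.close()
        finally:
            os._exit(0)
    # Parent: receive until the peer closes its end.
    child_sock.close()
    chunks = []
    while True:
        chunk = parent_sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    parent_sock.close()
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(pipe_round_trip(b"sent through a pipe"))
    print(socketpair_round_trip(b"sent through a socket pair"))
```

In both functions each process closes the end it does not use. The reader sees end of
file only after every descriptor for the writing end has been closed, so a writing end
left open in the parent would make the read loop wait forever.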

A socket is an end point of communication. A socket in use usually has an address bound
to it. The nature of the address depends on the communication domain of the socket. A
characteristic property of a domain is that processes communicating in the same domain
use the same address format. A single socket ordinarily communicates in only one domain.

The two domains currently implemented in 4.2BSD are the UNIX domain (AF_UNIX) and the
Internet domain (AF_INET). The address format of the UNIX domain is that of ordinary
file system pathnames, such as /alpha/beta/gamma. Processes communicating in the
Internet domain use DARPA Internet communications protocols such as TCP/IP and Internet
addresses, which consist of a 32-bit host number and a 16-bit port number.

There are several socket types, which represent classes of services. Each type may or
may not be implemented in any communication domain. If a type is implemented in a given
domain, it may be implemented by one or more protocols, which may be selected by the
user.

• SOCK_STREAM. Stream sockets provide reliable duplex sequenced data streams. No data
  are lost or duplicated in delivery, and there are no record boundaries. This type is
  supported in the Internet domain by the TCP protocol. In the UNIX domain, pipes are
  implemented as a pair of communicating stream sockets.
• SOCK_SEQPACKET. Sequenced packet sockets would provide data streams like those of
  stream sockets, except that record boundaries would be preserved. This type is
  unsupported in 4.2BSD. The Xerox Network Services sequenced packet protocol is
  supported in the NS domain in 4.3BSD.
• SOCK_DGRAM. Datagram sockets transfer messages of variable size in either direction.
  There is no guarantee that such messages will arrive in the same order in which they
  were sent, or that they will be unduplicated, or that they will arrive at all, but the
  original message (record) size is preserved in any datagram that does arrive. This
  type is supported in the Internet domain by the UDP protocol.
• SOCK_RDM. Reliably delivered message sockets would transfer messages that would be
  guaranteed to arrive, and would otherwise be like the messages transferred using
  datagram sockets. This type is currently unsupported.
• SOCK_RAW. Raw sockets allow direct access by processes to the protocols that support
  the other socket types. For example, in the Internet domain it is possible to reach IP
  with SOCK_RAW, below TCP or UDP, which are reached by SOCK_STREAM and SOCK_DGRAM,
  respectively. This capability is useful for developing new protocols.

The socket facility has a set of system calls specific to it. The socket system call
creates a socket. It takes as arguments specifications of the communication domain, the
socket type, and the protocol to be used to support that type. The value returned is a
small integer called a socket descriptor, which is in the same name space as file
descriptors, indexes the same descriptor array in the user structure in the kernel, and
has a file structure allocated for it. The file structure points not to an inode
structure but rather to a socket structure, which keeps track of the socket’s type,
state, and the data in its input and output queues.

For another process to address a socket, the socket must have a name. A name is bound to
a socket by the bind system call, which takes the socket descriptor, a pointer to the
name, and the length of the name as a byte string. The contents and length of the byte
string depend on the address format. The connect system call is used to initiate a
connection. The arguments are syntactically the same as for bind, and the socket
descriptor represents the local socket, whereas the address is that of the foreign
socket to attempt to connect to.

Many processes that communicate by using the socket IPC follow the client/server model.
In this model, the server process provides a service to the client process. When the
service is available, the server
process listens on a well-known address, and the client process uses connect, as above,
to reach it.

A server process uses socket to create a socket and bind to bind the well-known address
of its service to it. Then it uses the listen system call to tell the kernel it is ready
to accept connections from clients, and how many pending connections the kernel should
queue until the server can service them. Finally, the server uses the accept system call
to accept individual connections. Both listen and accept take as an argument the socket
descriptor of the original socket. Accept returns a new socket descriptor corresponding
to the new connection; the original socket descriptor is still open for further
connections. The server usually uses fork to produce a new process after the accept to
service the client, while the original server process continues to listen for more
connections.

There are also system calls for setting parameters of a connection and for returning the
address of the foreign socket after an accept.

When a connection for a socket type such as SOCK_STREAM is established, the addresses of
both end points are known and no further addressing information is needed to transfer
data. The ordinary read and write system calls may then be used to transfer data.

The simplest way to terminate a connection and destroy the associated socket is to use
the close system call on its socket descriptor. One may also wish to terminate only one
direction of communication of a duplex connection, and the shutdown system call may be
used for this.

Some socket types, such as SOCK_DGRAM, do not support connections, and instead their
sockets exchange datagrams, which must be individually addressed. The system calls
sendto and recvfrom are used for such connections. Both take as arguments a socket
descriptor, a buffer pointer and the length of the buffer, and an address buffer pointer
and length. The address buffer contains the address to send to for sendto and is filled
in with the address of the datagram just received by recvfrom. The number of data
actually transferred is returned by both system calls.

The select system call may be used to multiplex data transfers on several file
descriptors and/or socket descriptors. It may even be used to allow one server process
to listen for client connections for many services and fork a process for each
connection as it is made. This is done by doing socket, bind, and listen for each
service, and then doing select on all the socket descriptors. When select indicates
activity on a descriptor, the server does accept on it and forks a process on the new
descriptor returned by accept, leaving the parent process to do select again.

6.3 Networking

This section assumes some basic knowledge of the concepts of networking separate
computer systems, or hosts, by means of network protocols over communication media to
form networks [Tanenbaum 1981].

Almost all current UNIX systems support the UUCP network facilities, which are mostly
used over dial-up phone lines to support the UUCP mail network and the USENET news
network. These are, however, at best rudimentary networking facilities, as they do not
even support remote login, much less remote procedure call or distributed file systems.
These facilities are also almost completely implemented as user processes, and are not
part of the operating system proper.

Many installations that have 4.2BSD systems have several VAXs or workstations such as
Suns connected by networks. Although the 4.2BSD distribution does not support a true
distributed operating system, still remote login, file copying across networks, remote
process execution, etc., are trivial from the user’s viewpoint.

4.2BSD supports the DARPA Internet protocols [RFCS n.d.; MIL-STD n.d.] UDP, TCP, IP, and
ICMP on a wide range of Ethernet, token ring, and IMP (ARPANET) interfaces. The standard
Internet application protocols (and their corresponding user interface and server
programs) Telnet (remote login), FTP (file transfer), and SMTP (mail) are supported.
4.2BSD also provides the 4.2BSD-specific application programs (and underlying network
protocols), rlogin (remote login), rcp
(file transfer), rsh (remote shell execution), and other, more minor, applications, such
as talk (remote interactive conversation).

The framework in the kernel to support networking [Leffler et al. 1983b] is accessible
via the socket interface and is intended to facilitate the implementation of further
protocols (4.3BSD includes the XEROX Network Services protocol suite). The first version
of the code involved was written by Rob Gurwitz of BBN as an add-on package for 4.1BSD.

Several models of network layers are relevant to the 4.2BSD implementations. These
models are diagramed in Figure 11.

Figure 11. Network reference models and layering.

The Open System Interconnection (OSI) Reference Model for networking [ISO 1981] of the
International Organization for Standardization (ISO) prescribes seven layers of network
protocols and strict methods of communication between them. An implementation of a
protocol may only communicate with a peer entity speaking the same protocol at the same
layer, or with the protocol-protocol interface of a protocol in the layer immediately
above or below in the same system.

The 4.2BSD networking implementation, and to a certain extent the socket facility, is
more oriented toward the ARPANET Reference Model (ARM) [Padlipsky 1983]. The ARPANET in
its original form served as proof of concept for many networking concepts such as packet
switching and protocol layering. It serves today as a communications utility for
researchers. The ARM predates the ISO model and the latter was in large part inspired by
the ARPANET research.

The ARPANET and its sibling networks that run IP and are connected together by gateways
form the ARPA Internet. This is a large, functioning internetwork that appears to the
naive user to be one large network, owing to the design of the protocols involved [Cerf
and Cain 1983; Padlipsky 1985]. It is also a test bed for ongoing internet gateway
research. The ISO protocols currently being designed and implemented take many features
from this already functional DOD internetwork.

Whereas the ISO model is often interpreted as requiring a limit of one protocol per
layer, the ARM allows several protocols in the same layer. There are only three protocol
layers in the ARM:

• Process/Applications subsumes the Application, Presentation, and Session layers of the
  ISO model. Such user-level programs as the File Transfer Protocol (FTP) and Telnet
  (remote login) exist at this level.
• Host-Host corresponds to ISO’s Transport and the top part of its Network layers. Both
  the Transmission Control Protocol (TCP) and the Internet Protocol (IP) are in this
  layer, with TCP on top of IP. TCP corresponds to an ISO Transport protocol and IP
  performs the addressing functions of the ISO Network layer.
• Network Interface spans the lower part of the ISO Network layer and all of the Data
  Link layer. The protocols involved here depend on the physical network type. The
  ARPANET uses the
  IMP-Host protocols, while an Ethernet uses Ethernet protocols.

The ARM is primarily concerned with software, and so there is no explicit network
hardware layer. However, any actual network will have hardware corresponding to the ISO
Hardware layer.

The networking framework in 4.2BSD is more generalized than either the ISO model or the
ARM, although it is most closely related to the latter.

User processes communicate with network protocols (and thus with other processes on
other machines) via the socket facility, which corresponds to the ISO Session layer,
since it is responsible for setting up and controlling communications.

Sockets are supported by protocols, possibly by several, layered one on another. A
protocol may provide services such as reliable delivery, suppression of duplicate
transmissions, flow control, or addressing, depending on the socket type being supported
and the services required by any higher protocols.

A protocol may communicate with another protocol or with the network interface that is
appropriate for the network hardware. The general framework places little restriction on
what protocols may communicate with what other protocols, or on how many protocols may
be layered on top of one another. The user process may, by means of the SOCK_RAW socket
type, access protocols underlying the transport protocols. This capability is used by
routing processes and also for new protocol development.

There tends to be one network interface driver for each hardware model of network
controller. The network interface is responsible for handling characteristics specific
to the local network being addressed in such a way that the protocols using the
interface need not be concerned with them.

The functions of the network interface depend largely on the network hardware, which is
whatever is necessary for the network it is connected to. Some networks may support
reliable transmission at this level, but most do not. Some provide broadcast addressing,
but many do not.

There are projects in progress at various organizations to implement protocols other
than the DARPA Internet ones, including the protocols ISO has thus far adopted to fit
the OSI model.

The socket facility and the networking framework use a common set of memory buffers, or
mbufs. These are intermediate in size between the large buffers used by the block I/O
system and the C-lists used by character devices. An mbuf is 128 bytes long, 112 bytes
of which may be used for data; the rest is used for pointers to link the buffer into
queues and to indicate how much of the data area is actually in use.

Data are ordinarily passed between layers (socket/protocol, protocol/protocol, or
protocol/network interface) in mbufs. This ability to pass the buffers containing the
data eliminates some data copying, but there is still frequent need to remove or add
protocol headers. It is also convenient and efficient for many purposes to be able to
hold data of the size of the memory management page. Thus it is possible for its data to
reside not in the mbuf itself, but elsewhere in memory. There is an mbuf page table for
this purpose, as well as a pool of pages dedicated for this use.

6.4 Distributed Systems

4.2BSD is not a distributed operating system, so this section concentrates less on
4.2BSD and more on UNIX in general than most sections.

Several attempts have been made to provide some distributed facilities, ranging from the
ability to mount a remote file system as if it were attached to the local machine, to
LOCUS, which is a true distributed operating system.

There is at least one 4.2BSD implementation of an extended file system (not distributed
with 4.2BSD) that allows mounting a file system on a disk connected to a remote machine
as if it were directly connected to the local machine. It suffers from performance
problems due to excessive generality: There is no record locking, and thus there can be
no local caching of data; every byte read or written has to be transferred over the
network.

Version 8 has a similar capability of mounting a file system from a remote system over a
network as if it were local [Weinberger 1984]. The networking implementation that
supports it is unrelated to the 4.2BSD implementation, as are the protocols used. A user
process needs only the ordinary file access system calls to use either of these remote
mount capabilities once the file system is mounted.

The IBIS distributed file system from Purdue [Tichy and Ruan 1984] also has this file
access transparency and goes further. IBIS claims file location transparency; that is,
the user does not need to know which host actually contains a given file. Efficiency and
robustness are promoted by replication of data on several hosts. Files may also be
migrated to the host from which they are most commonly accessed. IBIS is implemented on
top of 4.2BSD and thus can use a variety of underlying network hardware. However, it
evidently currently runs only on VAX CPUs.

The LOCUS Distributed UNIX System from Locus Computing Corporation [Butterfield and
Popek 1984; Popek et al. 1981] is a true distributed operating system. It provides not
only a distributed file system, but also transparent process execution on any of the
participating CPUs that can support the process, even on a different CPU type than that
from which the process is invoked. The system runs on a variety of CPU types. It uses
network protocols designed specifically for its purposes, and which are not
interoperable with other protocol suites above the network layer. Lower layers that may
be used include Ethernet. The implementors claim object code compatibility with both
System V and 4BSD.

LOCUS is an example of a closed system, in which every participant must be running the
same software. Such systems have the advantage of being able to efface most evidence of
multiple CPUs and file systems so that they appear to the user as a single computer
system. They can also be very efficient.

Most research organizations and many businesses have a variety of hardware, from several
vendors, supporting different operating systems and with different purposes. There may
be personal workstations with bit-map screens, real-time systems for acquiring data,
mainframes for bulk data storage and manipulation, and array processors or
supercomputers for large-scale number crunching. It is unlikely that a closed system
would be supported on all such equipment and operating systems. A single operating
system or a single version of one may not be suited for all of the above tasks. Many
vendors support their own operating systems, which are adapted to a particular kind of
task, and would not be willing to adopt another vendor’s operating system.

Nonetheless, it is necessary to share information in such a heterogeneous computing
environment. Information sharing in the DARPA Internet, among a very diverse range of
machines, has long been supported by the client/server model, which requires the
individual machines to support only certain well-defined protocols, not entirely
identical operating systems. This is the open system approach. 4.2BSD uses this model in
providing the basic Internet services of remote login, file transfer, mail exchange, and
remote job execution.

A recent useful application of the open system approach is Sun Microsystems’ Network
File System (NFS) [Walsh et al. 1985]. This allows file access transparency and a large
degree of file location transparency. A foreign file system (or a part of one) may be
mounted on the local system, just as a file system on a local disk may be mounted. A
file on the foreign system then appears to the user on the local system as just another
file in the file system. Although this has only been implemented thus far on 4.2BSD (on
a variety of vendor hardware), there is no reason why it could not be implemented on a
different operating system. Then one could mount a foreign VMS, TOPS-20, or MSDOS file
system on a local UNIX file system, and access the VMS (or other) files as if they were
UNIX files, using an apparent UNIX directory structure to find them. Similarly, a UNIX
file system mounted on a VMS machine would appear as part of the VMS file system and
directory structure. (This flexibility has led to problems of acceptance of
NFS by the UNIX community because some UNIX features, such as setuid capability, file
locking, and device access, have been sacrificed for interoperability with other
operating systems.)

NFS is not a distributed operating system; rather, it is a network (not distributed)
file system. Thus it cannot provide the transparent process execution of a closed system
like LOCUS. However, the basic services of remote login, file transfer, and remote
process execution are already supported by 4.2BSD, and NFS does provide transparent file
access and transparent file location on heterogeneous systems, owing to its open system
design [Joy 1984; Morin 1985].

7. USER INTERFACE

Although most aspects of UNIX appropriate for discussion in this paper are implemented
in the kernel, the nature of the user interface is sufficiently distinctive and
different from those of most previous systems to deserve being discussed.

7.1 Shells and Commands

The command language interpreter in UNIX is a user process like any other. It is called
a shell, as it surrounds the kernel of the operating system. It may be substituted for,
and there are in fact several shells in general use [Korn 1983; Tuthill 1985b]. The
Bourne shell [Bourne 1978], written by Steve Bourne, is probably the most widely used,
or at least the most widely available. The C shell [Joy 80], mostly the work of Bill
Joy, is the most popular on BSD systems.

There are also a number of screen- or menu-oriented shells, but we describe here only
the more traditional line-oriented interfaces.

The various common shells share much of their command language syntax. The shell
indicates its readiness to accept another command by typing a prompt, and the user types
a command on a single line, for example,

    $ ls

The dollar sign is the usual Bourne shell prompt and the ls typed by the user is the
list directory command. Most commands may also take arguments, which the user types
after the command name on the same line and separated from it and each other by white
space (spaces or tabs).

Although there are a very few commands built into the shells, a typical command is
represented by an executable binary object file, which the shell finds and executes. The
object file may be in one of several directories, a list of which is kept by the shell.
This list is known as the search path and is settable by the user. The directories /bin
and /usr/bin are almost always in the search path, and a typical search path on a BSD
system might look like this:

    ( . /usr/local /usr/ucb /bin /usr/bin )

The ls command’s object file is /bin/ls and the shell itself is /bin/sh (the Bourne
shell) or /bin/csh (the C shell).

Execution of a command is done by a fork (or vfork) system call followed by an exec of
the object file (see Figure 3 in Section 2.1). The shell usually then does a wait to
suspend its own execution until the command completes. There is a simple syntax (an
ampersand at the end of the command line) to indicate that the shell should not wait. A
command left running in this manner while the shell continues to interpret further
commands is said to be a background command, or to be running in the background.
Processes on which the shell does wait are said to run in the foreground.

The C shell in 4BSD systems provides a facility called job control (implemented
partially in the kernel) that allows moving processes between foreground and background,
and stopping and restarting them on various conditions. This allows most of the control
of processes provided by windowing or layering interfaces, and requires no special
hardware.

7.2 Standard I/O

Processes may open files as they like, but most processes expect three file descriptors
(see Section 4.6 and Figure 9) to already be open, having been inherited across the exec

(and possibly the fork) that created the process.

These file descriptors are numbers 0, 1, and 2, more commonly known as standard input, standard output, and standard error. Frequently, all three are open to the user's terminal. Thus the program can read what the user types by reading standard input, and the program can send output to the user's screen by writing to standard output. Most programs also accept a nonterminal file as standard input or standard output. The standard error file descriptor is also open for writing and is used for error output, whereas standard output is used for ordinary output.

There is a user-level system library that many programs include because it buffers I/O for efficiency. This is the standard I/O library. It has routines called fread, fwrite, and fseek, which are analogous to the lower level read, write, and lseek system calls. Whereas the system calls are applied to a file descriptor, the standard I/O routines are applied to a stream, which is declared in C as a pointer to a structure that contains the file descriptor and the buffer. Writes by the program to a stream by fwrite do not cause a write system call until the whole buffer is filled. Similarly, if the buffer is empty when an fread is done, a read system call will be done to fill it, but succeeding fread calls will fetch data out of the buffer until the end of the buffer is reached. Thus the library minimizes the number of system calls and does the actual I/O transfers in efficient sizes, whereas the program retains the flexibility to read or write to the standard I/O system in transfers of any size appropriate to the program's algorithms. For maximum efficiency, the library normally sets the size of the buffer to the block size of the file system corresponding to the stream.

Use of the standard I/O library is indicated by the inclusion of the parameter file stdio.h in its source. Such a program will find streams already open for standard input, output, and error under the names stdin, stdout, and stderr, respectively. Other streams may be opened by fopen, which takes a filename and a mode argument and returns a stream open appropriately for reading or writing. A stream may be closed with fclose.

The shells have a simple syntax for changing what files are open for a process's standard I/O streams, that is, for standard I/O redirection:

    # either shell
    $ ls >filea                # direct output of ls to file filea
    $ pr <filea >fileb         # input from filea and output to fileb
    $ lpr <fileb               # input from fileb

    # direct both standard output and error to errs
    % lpr <fileb >& errs       # C shell
    $ lpr <fileb >errs 2>&1    # Bourne shell

    Standard I/O redirection.

The ls command produces a listing of the names of files in the current directory, the pr command formats the list into pages suitable for outputting on a printer, and the lpr command sends the formatted output to a printer.

7.3 Pipelines, Filters, and Shell Scripts

The above example of I/O redirection could have been done all in one command, as

    % ls | pr | lpr

Each vertical bar tells the shell to arrange for the output of the preceding command to be passed as input to the following command. The mechanism that is used to carry the data is called a pipe (see Section 6.2) and the whole construction is called a pipeline. A pipe may be conveniently thought of as a simplex, reliable, byte stream, and is accessed by a file descriptor, like an ordinary file. In the example, the shell would set up one pipe (see Section 2.1) so that its write end is the standard output of ls and its read end is the standard input of pr; there would be another pipe between the pr and lpr commands.

A command like pr that passes its standard input to its standard output, performing some sort of processing on it, is called


a filter. (Filters may also take names of input files as arguments, but never names of output files.) Very many UNIX commands (probably most) may be used as filters. Thus complicated functions may be pieced together as pipelines of common commands. Also, common functions, such as output formatting, need not be built into numerous commands, since the output of almost any program may be piped through pr (or some other appropriate filter).

All the common UNIX shells are also programming languages, with the usual high-level programming language control constructs, as well as variables internal to the shell. The execution of a command by the shell is analogous to a subroutine call. A file of shell commands, a shell script, may be executed like any other command, with the appropriate shell being invoked automatically to read it. Shell programming may thus be used to combine ordinary programs conveniently for quite sophisticated applications without the necessity of programming in conventional languages.

The isolation of the command interpreter in a user process, the shell, both allowed the kernel to stay a reasonable size and permitted the shell to become rather sophisticated, as well as substitutable. The instantiation of most commands invoked from the shell as subprocesses of the shell facilitated the implementation of I/O redirection and pipelines, as well as making background processes (and later Berkeley's job control) easy to implement.

7.4 The UNIX Philosophy

There is something sometimes referred to as "the UNIX philosophy." Part of it has been elaborated or at least alluded to above (see Section 1.2). Here is a statement of it that is both more explicit and also more oriented toward the levels of the operating system that the ordinary user sees:

(1) Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features."
(2) Expect the output of every program to become the input of another, as yet unknown, program. Do not clutter output with extraneous information. Avoid stringently columnar or binary input formats. Do not insist on interactive input.
(3) Design and build software, even operating systems, to be tried early, ideally within weeks. Do not hesitate to throw away the clumsy parts and rebuild them.
(4) Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you have finished with them [McIlroy et al. 1978].

These principles have led to the development of not only byte-oriented, typeless files, but also of pipes and pipelines, and the ability to combine existing programs to build new ones. Much of the power and popularity of UNIX is based on the facilities provided by the shells and other programs such as make, awk, sed, lex, yacc, find, SCCS, etc. The principles also lead indirectly to the use of programming languages such as C, which are not machine dependent, particularly not assembly language. That, in turn, leads to portability, which may well be the single greatest reason for the popularity of the system. Such matters are beyond the scope of this paper, but references are provided in the next section.

One may consider these ideas to be mere elaborations of structured programming principles, or ad hoc practical techniques, or "creeping elegance," and there is some of all of that here. It is true that many users of UNIX, including many applications developers, do not seem to be aware of or at least do not use these principles any more, but their worth is still evidenced by the system itself, and there has been some effort of late to reacquaint people with them [Pike and Kernighan 1984].

This is a programmer's philosophy, and the result is a programmer's system. It does not limit the areas of application of the system, however, because a good programming environment makes it easy for the programmer to build user interfaces to fit applications to the needs of the end user.
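
The mechanisms of Sections 7.1 through 7.3 (fork, exec, wait, pipes, and the rebinding of standard input and output) fit together in a few lines. The following is a minimal sketch in Python, whose os module exposes the same system calls; it is not the code of any actual shell. The function name run_pipeline and its interface are assumptions made for illustration, and the sketch assumes that file descriptors 0, 1, and 2 are already open in the caller.

```python
import os
import sys


def run_pipeline(commands):
    """Run a list of argv lists as a pipeline; return the last command's output.

    Hypothetical helper for illustration only, not part of any shell.
    """
    sys.stdout.flush()      # do not let children inherit unflushed output
    prev_read = None        # read end of the pipe written by the previous command
    pids = []
    for argv in commands:
        read_end, write_end = os.pipe()
        pid = os.fork()
        if pid == 0:                        # child process
            try:
                if prev_read is not None:
                    os.dup2(prev_read, 0)   # standard input comes from the previous pipe
                    os.close(prev_read)
                os.dup2(write_end, 1)       # standard output goes into this pipe
                os.close(read_end)
                os.close(write_end)
                os.execvp(argv[0], argv)    # search the path and overlay the child
            finally:
                os._exit(127)               # reached only if something above failed
        # parent process: keep only the read end for the next stage
        pids.append(pid)
        os.close(write_end)
        if prev_read is not None:
            os.close(prev_read)
        prev_read = read_end

    # Collect what the last command wrote, then wait for every child.
    chunks = []
    while True:
        chunk = os.read(prev_read, 4096)
        if not chunk:                       # end of file: all write ends are closed
            break
        chunks.append(chunk)
    os.close(prev_read)
    for pid in pids:
        os.waitpid(pid, 0)
    return b"".join(chunks)
```

For instance, run_pipeline([["ls"], ["pr"]]) would return the paginated listing that ls | pr produces. A real shell would leave the last command's standard output attached to the terminal or to a redirected file rather than capture it, and would skip the wait for a background command.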

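The buffering performed by the standard I/O library (Section 7.2) can be illustrated by a toy stream that pairs a file descriptor with a buffer and issues a write system call only when the buffer fills. This is a Python sketch of the idea, not the C library's implementation; the class name Stream, the syscalls counter, and the method names (borrowed from the C routines) are assumptions made for illustration.

```python
import os


class Stream:
    """Toy analogue of a stdio stream: a file descriptor plus a buffer."""

    def __init__(self, fd, bufsize=4096):
        self.fd = fd
        self.bufsize = bufsize
        self.buf = bytearray()
        self.syscalls = 0       # os.write calls issued so far (illustration only)

    def _write_out(self, count):
        """Hand the first `count` buffered bytes to the kernel."""
        chunk = bytes(self.buf[:count])
        del self.buf[:count]
        while chunk:
            written = os.write(self.fd, chunk)   # the actual system call
            self.syscalls += 1
            chunk = chunk[written:]              # retry if the write was partial

    def fwrite(self, data):
        """Append to the buffer; call write only for each full buffer."""
        self.buf += data
        while len(self.buf) >= self.bufsize:
            self._write_out(self.bufsize)

    def fflush(self):
        """Write out whatever is buffered, even if the buffer is not full."""
        if self.buf:
            self._write_out(len(self.buf))

    def fclose(self):
        """Flush the remaining bytes and close the descriptor."""
        self.fflush()
        os.close(self.fd)
```

Writing 1000 ten-byte records through a Stream with a 4096-byte buffer results in three write calls rather than 1000, provided each write call transfers everything it is given: two for full buffers and one for the remainder flushed by fclose.
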
8. BIBLIOGRAPHIC NOTES

The set of documentation that comes with UNIX systems is called the UNIX PROGRAMMER'S MANUAL (UPM) and is traditionally organized in two volumes [UPMV7 1983]. Volume 1 contains short entries for every command, system call, and subroutine package in the system and is also available on line via the man command. Volume 2, Supplementary Documents (usually bound as Volumes 2A and 2B), contains assorted papers relevant to the system and manuals for those commands or packages too complex to describe in a page or two. Berkeley systems add Volume 2C [UPM2C 1983] to contain documents concerning Berkeley-specific features. The whole 4.2BSD UPM has been published by the USENIX Association, reorganized in three logical volumes and five physical volumes.

There are many useful papers in the two special issues of The Bell System Technical Journal, BSTJ [1978] and BLTJ [1984]. The first of these contains the paper with the most pervasive influence [Ritchie and Thompson 1978], which also appears in UPMV7 [1983]. There are also many useful papers in various USENIX Conference Proceedings, especially USENIX [1983], USENIX [1984], and USENIX [1985].

Possibly the best book on general programming under UNIX, especially on the use of the shell and facilities such as yacc and sed, is Kernighan and Pike [1984]. Two others of interest are Bourne [1983] and McGilton and Morgan [1983].

ACKNOWLEDGMENTS

Many useful comments on drafts of this paper were received from John B. Chambers, William N. Joy, Mike Karels, Samuel J. Leffler, Bill Shannon, and Anthony I. Wasserman. Much of the accuracy of the paper is due to them, and to the others who reviewed it. All of the inaccuracies and other faults are due to the present authors.

An early draft of this paper was used by Peterson and Silberschatz in writing the chapter on 4.2BSD in Peterson and Silberschatz [1985]. Some text in the current paper is adapted from that chapter. Thanks to Bill Tuthill for the DITROFF macros for the Contents.

Particular thanks from J. S. Quarterman to Sam Leffler for technical assistance and forbearance over the years, to Jim Peterson for perfectionism, to Avi Silberschatz for patience, to John B. Chambers for persistent participation, and to Carol Kroll, who mediated.

REFERENCES

BABAOĞLU, O., AND JOY, W. N. 1981. Converting a swap-based system to do paging in an architecture lacking page-referenced bits. In Proceedings of the 8th Symposium on Operating Systems Principles (Pacific Grove, Calif., Dec. 14-16). ACM, New York, pp. 78-86.
BACH, M. J., AND BUROFF, S. J. 1984. Multiprocessor UNIX systems. AT&T Bell Lab. Tech. J., 1733-1749.
BECK, R., AND KASTEN, R. 1985. VLSI assist in building a multiprocessor UNIX system. In USENIX Association Conference Proceedings (Portland, Oreg., June 11-14). USENIX Assoc., El Cerrito, Calif., pp. 255-275.
BELL, C. G. 1985. Multis: A new class of multiprocessor computers. Science 228, 462-467.
BLTJ 1984. The UNIX system. AT&T Bell Lab. Tech. J. 63, 8 (Oct.).
BOURNE, S. R. 1978. An introduction to the UNIX shell. Bell Syst. Tech. J. 1947-1972.
BOURNE, S. R. 1983. The UNIX System. Addison-Wesley, Reading, Mass.
BSTJ 1978. UNIX time-sharing systems. Bell Syst. Tech. J. 57, 6, Pt. 2 (July-Aug.).
BUTTERFIELD, D., AND POPEK, B. 1984. Network tasking in the LOCUS Distributed UNIX system. In USENIX Association Conference Proceedings (Salt Lake City, Utah, June 12-15). USENIX Assoc., El Cerrito, Calif., pp. 62-71.
CABRERA, L. F., KARELS, M. J., AND MOSHER, D. 1985. The impact of buffer management on networking software performance in Berkeley UNIX 4.2BSD: A case study. In USENIX Association Conference Proceedings (Portland, Oreg., June 11-14). USENIX Assoc., El Cerrito, Calif., pp. 507-518.
CERF, V. G., AND CAIN, E. 1983. The DOD Internet architecture model. Comput. Networks 7, 307-318.
CHAMBERS, J. B., AND QUARTERMAN, J. S. 1983. UNIX System V and 4.1C BSD. In USENIX Conference Proceedings (Toronto, Ontario, June). USENIX Assoc., El Cerrito, Calif., pp. 267-291. 4.1C BSD was the test version immediately preceding 4.2BSD, and is much more closely related to 4.2BSD than to 4.1BSD. Papers appear in the distributed documentation for both System V and 4.2BSD detailing their differences from their predecessors.
COMER, D. 1984. Operating System Design: The Xinu Approach. Prentice-Hall, Englewood Cliffs, N. J.

COMPTON, M., ED. 1985. The evolution of UNIX. UNIX Rev. 3, 1 (Jan.).
GOBEL, G. H., AND MARSH, M. H. 1981. A dual processor VAX 11/780. Tech. Rep. TR-EE 81-31, School of Electrical Engineering, Purdue Univ., West Lafayette, Ind., Sept.
HOLT, R. C. 1983. Concurrent Euclid, the UNIX System, and Tunis. Addison-Wesley, Reading, Mass.
ISO 1981. ISO open systems interconnection - Basic reference model. ISO/TC 97/SC 16, 719 (Aug.).
JOY, W. N. 1980. An introduction to the C shell. UNIX Programmer's Manual, 4.2 Berkeley Software Distribution, Virtual VAX-11 Version, vol. 2C, Computer Systems Research Group, Dept. of Electrical Engineering and Computer Science, Univ. of California, Berkeley, Aug.
JOY, W. N. 1984. The UNIX system in the laboratory. UNIX/WORLD 1, 4, 34-38.
JOY, W. N., COOPER, E., FABRY, R., LEFFLER, S., MCKUSICK, M. K., AND MOSHER, D. 1983. 4.2BSD System Manual, revised July, 1983. UNIX Programmer's Manual, 4.2 Berkeley Software Distribution, Virtual VAX-11 Version, vol. 2C, Computer Systems Research Group, Dept. of Electrical Engineering and Computer Science, Univ. of California, Berkeley, Aug. A terse but detailed summary of the facilities provided by 4.2BSD, illustrated mostly by the system calls involved.
JUNG, R. S. 1985. Porting the AT&T demand paged UNIX implementation to microcomputers. In USENIX Association Conference Proceedings (Portland, Oreg., June 11-14). USENIX Assoc., El Cerrito, Calif., pp. 361-370.
KERNIGHAN, B. W., AND PIKE, R. 1984. The UNIX Programming Environment. Prentice-Hall, Englewood Cliffs, N. J.
KERNIGHAN, B. W., AND RITCHIE, D. M. 1978. The C Programming Language. Prentice-Hall, Englewood Cliffs, N. J.
KORN, D. 1983. KSH - A shell programming language. In USENIX Association Conference Proceedings (Toronto, Ontario, June). USENIX Assoc., El Cerrito, Calif., pp. 191-202. The Korn shell, which is said to subsume the good points of both the Bourne shell and the C shell.
LEFFLER, S. J., FABRY, R. S., AND JOY, W. N. 1983a. A 4.2BSD interprocess communication primer. UNIX Programmer's Manual, 4.2 Berkeley Software Distribution, Virtual VAX-11 Version, vol. 2C, Computer Systems Research Group, Dept. of Electrical Engineering and Computer Science, Univ. of California, Berkeley, Aug.
LEFFLER, S. J., JOY, W. N., AND FABRY, R. S. 1983b. 4.2BSD networking implementation notes, revised July, 1983. UNIX Programmer's Manual, 4.2 Berkeley Software Distribution, Virtual VAX-11 Version, vol. 2C, Computer Systems Research Group, Dept. of Electrical Engineering and Computer Science, Univ. of California, Berkeley, Aug.
LEFFLER, S. J., KARELS, M., AND MCKUSICK, M. K. 1984. Measuring and improving the performance of 4.2BSD. In USENIX Association Conference Proceedings (Salt Lake City, Utah, June 12-15). USENIX Assoc., El Cerrito, Calif., pp. 237-252.
MCGILTON, H., AND MORGAN, R. 1983. Introducing the UNIX System. McGraw-Hill, New York.
MCILROY, M. D., PINSON, E. N., AND TAGUE, B. A. 1978. Foreword. Bell Syst. Tech. J. 57, 6, 1899-1904.
MCKUSICK, M. K. 1985. A Berkeley odyssey. The names and events shaping 10 years of Berkeley UNIX. UNIX Rev. 3, 1, 30.
MCKUSICK, M. K., JOY, W. N., LEFFLER, S. J., AND FABRY, R. S. 1984. A fast file system for UNIX. ACM Trans. Comput. Syst. 2, 3 (Aug.), 181-197. An earlier version appears in UNIX Programmer's Manual, 4.2 Berkeley Software Distribution, Virtual VAX-11 Version, vol. 2C, Computer Systems Research Group, Department of Electrical Engineering and Computer Science, Univ. of California, Berkeley, Aug.
MCKUSICK, M. K., KARELS, M., AND LEFFLER, S. 1985. Performance improvements and functional enhancements in 4.3BSD. In USENIX Association Conference Proceedings (Portland, Oreg., June 11-14). USENIX Assoc., El Cerrito, Calif., pp. 519-531.
MILLER, R. 1984. A demand paging virtual memory manager for System V. In USENIX Association Conference Proceedings (Salt Lake City, Utah, June 12-15). USENIX Assoc., El Cerrito, Calif., pp. 178-182.
MIL-STD (N.d.). Military Standards for DOD Internet Protocols. Naval Publications and Forms Center, Philadelphia, Pa. The ARPA Internet protocols are defined by the set of military standards IP (MIL-STD-1777), TCP (MIL-STD-1778), FTP (MIL-STD-1780), SMTP (MIL-STD-1781), and TELNET (MIL-STD-1782). See ARPANET Working Group Requests for Comments. SRI International, Menlo Park, Calif.
MOHR, A. 1985. The genesis story. Tales of how UNIX took shape as a product. UNIX Rev. 3, 1, 18.
MORIN, R. 1985. The future of the workstation. UNIX Rev. 3, 1, 52.
ORGANICK, E. I. 1975. The Multics System: An Examination of Its Structure. M.I.T. Press, Cambridge, Mass.
PADLIPSKY, M. A. 1983. A perspective on the ARPANET reference model. In Proceedings of INFOCOM 82 (Apr.). Also appears as Chapter 5 of Padlipsky, M. A. 1985. The Elements of Networking Style. Englewood Cliffs, N. J., and in RFCS (N.d.). ARPANET Working Group Requests for Comments. RFC871. SRI International, Menlo Park, Calif.
PADLIPSKY, M. A. 1985. The Elements of Networking Style. Prentice-Hall, Englewood Cliffs, N. J.

PEIRCE, N. 1985. Putting UNIX in perspective: An interview with Victor Vyssotsky. The manager of AT&T's Multics project remembers the way it was. UNIX Rev. 3, 1, 58.
PETERSON, J., AND SILBERSCHATZ, A. 1985. Operating System Concepts, 2nd ed. Addison-Wesley, Reading, Mass.
PIKE, R., AND KERNIGHAN, B. W. 1984. Program design in the UNIX environment. AT&T Bell Lab. Tech. J. 63, 8, 1595-1605.
POPEK, B., ET AL. 1981. Locus: A network transparent, high reliability distributed system. In Proceedings of the 8th Symposium on Operating Systems Principles (Pacific Grove, Calif., Dec. 14-16). ACM, New York, pp. 169-177.
RFCS (N.d.). ARPANET Working Group Requests for Comments. SRI International, Network Information Center, Menlo Park, Calif. This series of technical notes includes the specifications for the ARPA Internet protocols IP (RFC-791), ICMP (RFC-792), TCP (RFC-793), UDP (RFC-768), FTP (RFC-765), SMTP (RFC-821), and TELNET (RFC-854), plus related papers. All the protocols are indexed in Assigned Numbers (RFC-943) and Official Protocols (RFC-944). See also Military Standards for DOD Internet Protocols. Naval Publ. Forms Ctr., Philadelphia, Pa.
RITCHIE, D. M. 1978. A retrospective. Bell Syst. Tech. J. 57, 6, Pt. 2, 1947-1969.
RITCHIE, D. M. 1979a. Protection of data file contents. United States Pat. no. 4,135,240, United States Patent Office, Washington, D.C., Jan. 16, 1979. Assignee: Bell Telephone Laboratories, Incorporated, Murray Hill, N.J. Appl. No.: 377,591. Filed: Jul. 9, 1973.
RITCHIE, D. M. 1979b. The UNIX I/O system. UNIX Programmer's Manual, 7th ed., vols. 1 and 2. Holt, New York.
RITCHIE, D. M. 1984a. The evolution of the UNIX time-sharing system. AT&T Bell Lab. Tech. J. 63, 8, 1577-1593.
RITCHIE, D. M. 1984b. Reflections on software research. Reflections on the environment that nurtured the development of UNIX. Ritchie's ACM Turing Award lecture. UNIX Rev. 3, 1, 28.
RITCHIE, D. M., AND THOMPSON, K. 1978. The UNIX time-sharing system. Bell Syst. Tech. J. 57, 6, 1905-1929. The original version in Commun. ACM 17, 7 (July 1974), 365-375 described Version 6, while this one describes Version 7.
RITCHIE, D. M., JOHNSON, S. C., LESK, M. E., AND KERNIGHAN, B. W. 1978. The C programming language. Bell Syst. Tech. J. 57, 6, 1991-2019.
ROSLER, L. 1984. The evolution of C - Past and future. AT&T Bell Lab. Tech. J. 63, 8, 1685-1699.
STROUSTRUP, B. 1984. Data abstraction in C. AT&T Bell Lab. Tech. J. 63, 8, 1701-1732.
TANENBAUM, A. S. 1981. Computer Networks. Prentice-Hall, Englewood Cliffs, N. J.
THOMPSON, K. 1978. UNIX implementation. Bell Syst. Tech. J. 57, 6, Pt. 2, 1931-1946.
TICHY, W. F., AND RUAN, Z. 1984. Towards a distributed file system. In USENIX Association Conference Proceedings (Salt Lake City, Utah, June 12-15). USENIX Assoc., El Cerrito, Calif., pp. 87-97.
TUTHILL, W. 1985a. The evolution of C: Heresy and prophecy. UNIX Rev. 3, 1, 80.
TUTHILL, W. 1985b. The shell game: A comparison of the C and Bourne shells. UNIX/WORLD 2, 2 (Mar.), 103.
UPM2C 1983. UNIX Programmer's Manual, 4.2 Berkeley Software Distribution, Virtual VAX-11 Version, vol. 2C, Computer Systems Research Group, Department of Electrical Engineering and Computer Science, University of California, Berkeley, Calif. (Aug.).
UPMV7 1983. UNIX Programmer's Manual, 7th ed., vols. 1 and 2. Holt, New York.
USENIX 1983. USENIX Association Conference Proceedings (Toronto, Ontario, June). USENIX Assoc., El Cerrito, Calif.
USENIX 1984. USENIX Association Conference Proceedings (Salt Lake City, Utah, June 12-15). USENIX Assoc., El Cerrito, Calif.
USENIX 1985. USENIX Association Conference Proceedings (Portland, Oreg., June 11-14). USENIX Assoc., El Cerrito, Calif.
UNIEJEWSKI, J. 1985. UNIX System V and BSD4.2 Compatibility Study. Apollo Computer, Chelmsford, Mass.
WALSH, D., LYON, R., AND SAGER, G. 1985. Overview of the Sun network file system. In USENIX Association Conference Proceedings (Dallas, Tex., Jan. 23-25). USENIX Assoc., El Cerrito, Calif., pp. 117-124.
WEINBERGER, P. J. 1984. The Version 8 network file system. In USENIX Association Conference Proceedings (Salt Lake City, Utah, June 12-15). USENIX Assoc., El Cerrito, Calif., p. 86.
WILSON, O. 1985. The business evolution of the UNIX system. The details of how AT&T organizational changes made UNIX commercially available. UNIX Rev. 3, 1, 46.

Received March 1985; revised November 1985; final revision accepted February 1986.

Computing Surveys, Vol. 17, No. 4, December 1985
