Meta-Programming: A Software Production Method

Charles Simonyi
This thesis describes an organizational schema, designed to yield very high programming
productivity in a simplified task environment which excludes scheduling, system design,
documentation, and other engineering activities. The leverage provided by high
productivity can, in turn, be used to simplify the engineering tasks.
Difficulty of communications within a production team, caused by the inherently rapid
creation of problem specific local language, is posited as the major obstacle to the
improvement of productivity. The thesis proposes a combination of ideas for simplifying
communications between programmers. Meta-programs are informal, written
communications from the meta-programmer, who creates the local language, to technicians who
learn it and actually write the programs.
The abstract notion of local language is resolved into the questions: what are the objects
that should be named, and what should their names be? The answers involve the
concept of painted types (related to types in programming languages), and naming
conventions based on the idea of identifying objects by their types.
A method of state vector syntax checking for debugging the programs produced in the
high productivity environment is described.
Descriptions of the relationships or contrasts between the meta-programming
organization and the relevant software engineering concepts of high level languages,
egoless programming, structured programming, Chief Programmer Teams, and automatic
program verification are also given.
To verify the predictions of the meta-programming theory, a series of experiments was
performed. In one of the projects, three programs were produced from the same
specifications by three different groups in a controlled experiment. During the longest
experiment, 14,000 lines of code were written, at an average rate of 6.12
lines/man-hour. The controlled experiments showed that comparable results can be
obtained by different persons acting as meta-programmers. The difficult experimental
comparisons of the meta-programming and conventional organizations, however, yielded
interesting, but inconclusive, results.
Software engineering, management of software production, measurement of
programming productivity, meta-programming, painted types, naming conventions, state vector
syntax checking
1.53, 2.2, 2.42, 2.43, 3.50, 4.22
Coyote Hill Road / Palo Alto / California 94304
Copyright 1977
Charles Simonyi
First, I would like to thank my parents for their courageous support, which was tendered
often under difficult circumstances. I am also extremely grateful to Mr. Niels Ivar Bech,
former President of A/S Regnecentralen, Copenhagen; and Professor Cornelius A. Tobias
of the University of California, Berkeley, for their timely and generous help.
The idea of including experimental verification in the thesis was due to Dr. Jerome I.
Elkind, Manager of the Computer Science Laboratory of the Xerox Palo Alto Research
Center. Substantial resources for the experiments, in manpower, computers, and other
facilities, were provided by Xerox Corporation. Dr. Elkind's continuing support was
essential for obtaining these resources. Dr. Elkind also gave helpful advice about the
management aspects of the thesis and the experiment.
Professor Vinton Cerf, my Principal Adviser, helped me, with great patience, to form my ideas
into a thesis. Discussions with Professor Cordell Green, who was also on the
reading committee, were very helpful. The day-to-day interactions with Dr. Butler
W. Lampson, the third committee member, were extremely rewarding and pleasurable.
The expenditures of resources in the experiments were wisely monitored by a Board of
Directors, chaired by Dr. Elkind. Other members of the Board were Dr. James Morris
and Robert F. Sproull.
Dr. Ben Wegbreit contributed much valuable criticism. Advice on some combinatorial
problems was given by Dr. Leo Guibas.
I am deeply indebted to the seven individuals who participated in the experiments. Their
diligent effort was absolutely essential to the success of the experiments. The valuable
contributions of Dr. Patrick Baudelaire and Thomas Malloy deserve special mention.
This thesis was typed by the author himself. The illustrations were drawn by Joe
Leitner. Vicki Parish and Gail Pilkington helped with the layout work.
CHAPTER 1: The Business of the Software Producer
1.1 Introduction
1.2 Software production as a process technology
1.3 Design strategies when production is inexpensive
1.4 Process technology and software sharing
1.5 Measures of software productivity
1.6 What determines productivity?
CHAPTER 2: Meta-Programming
2.1 Introduction
2.2 Optimizing software productivity
2.3 Task orders and meta-programs
2.4 Abstractions and operations
2.5 Naming of types and quantities
2.6 Debugging as an organized activity
2.7 Other meta-programming conventions
2.7.1 Divisions in meta-programs
2.7.2 Naming conventions for procedures
2.7.3 Name hyphenation
2.7.4 Parameter order in procedures
2.7.5 Use of comments for explanation
2.7.6 Programming language syntax extensions
2.7.7 Standard operations
2.8 Meta-programming example
2.9 Comparisons and combinations with other programming methods
2.9.1 High level languages
2.9.2 Buddy system, Egoless Programming
2.9.3 Structured programming, goto-less programming
2.9.4 Chief programmer teams
2.9.5 Automatic program verification
CHAPTER 3: Experimental Verification
3.1 Introduction
3.2 Experimental approach
3.3 Experimental environment
3.4 Experimental setup
3.5 Measurement methods
3.6 Task specifications
3.7 Productivity accounting
3.8 Potential sources of measurement errors
3.9 Experimental results
3.9.1 Early experiments group (Projects A and B)
3.9.2 Project C
3.9.3 Projects D1, D2 and D control
CHAPTER 4: Conclusion
4.1 Conclusions from the experimental results
4.2 Recommendations for future work
A: Programming Test
B: Format of the Measurement File
C: Project C System Description
D: Task Order for Project D
E: Summary of the Measurements
1. Building a runway
2. Organizing continuous process software production
3. Design strategy when implementation is inexpensive
4. The effect of high productivity on software sharing
5. Approximate conversion factors relating the most common units
of production and time
6. Structure of a software production team
7. Contours of the function P(T,Q,M)
8. Localization of programming error by binary search
9. Organization of the experiment
10. Productivity plots for Projects A+B and C
11. Productivity plots for Projects D1, D2, and D control
12. Productivity plots for the participants in Project D control
13. Cumulative production plots for Projects D1, D2, and D control
1.1 Introduction
The explosive growth of the computer industry is likely to continue in the coming years.
Between 1960 and 1971, the number of application areas for computers grew from
300 to 2000, and it is estimated to reach 7700 by 1985 [Kosy]. The major problem facing
the industry is whether programming technology can be improved to keep up with the
expected growth. Improvements would be necessary even if the current demands were
being well satisfied; in fact, recent performance has been characterized as a "crisis" [NATO1],
further underlining the urgency and importance of positive action.
There are many ways of approaching the problem: developers of advanced software
production tools and techniques, and educational institutions teaching the use of the tools
and techniques, will certainly make important contributions. Key contributions must also
be made by management and management scientists. The best results, however, will come
from concerted efforts to improve the tools, techniques, and management as a total
production system.
Historical development of programming technology confirms this view. In the late
nineteen-sixties and early seventies, management of software production, from
first level management upward, was mostly concerned with allocation of resources in
response to changing circumstances. Management had practically no direct influence on
technical matters; most, and often all, tasks which required detailed understanding of
software concepts were entrusted to programmers; hence programmers, by default, could
maintain absolute product control [Brandon]. Programming management handbooks had
to be content with recommending management controls based solely on visible indicators:
specifications, flowcharts, comments, and other forms of documentation [Metzger]
[Weinwurm].
At the same time, computer theoreticians were developing significant new ideas
concerning high level languages [Hoare-Wirth] [Dahl-Nygaard], proofs of correctness
[Naur1] [Floyd1], and structured programming [Dijkstra]. These developments did not
have an early impact on management practices, however. The reports from the important
1968-69 NATO Software Engineering conferences [NATO1] [NATO2] did not yet show a
desire to attack the recognized technical and managerial problems simultaneously.
In 1971, Weinberg's book, "The Psychology of Computer Programming", already discussed
technical and stylistic issues together with new forms of organizations and
interrelationships based on "egoless programming". Meanwhile the state of the art in
engineering theory was further advanced by the clarification of data structuring [Hoare],
new languages [Wirth2], new modularization criteria [Parnas1], and firm stylistic
principles [Kernighan-Plauger].
A first clear break in this pattern of separate development of technology and management
occurred after practical experience with the Chief Programmer Team (CPT) organization
had been published [Mills] [Baker1]. In a CPT, the Chief Programmer, in a first level
managerial position, provides technical leadership by programming critical program
sections and assigning specific subtasks to other team members. The organization relies
on a number of supporting techniques (2.9.4), especially on the institutional use of
structured programming. More recent books echo similar sentiments [Brooks] [Horowitz].
Meta-programming, the main subject of the present dissertation, and its host organization,
the Software Production Team (SPT), form an integrated method of software production.
Just as in a CPT, the SPT's first level manager, the meta-programmer, is directly involved
in programming activities. The techniques supporting SPT, on the other hand, are
different from those used in CPT. These differences will be discussed in detail in Section
2.9. The most important feature of the SPT organization is the emphasis on optimizing
productivity in the simpler phases of programming, which are detailed design
(meta-programming), coding, and debugging.
The dissertation is organized as follows. Following the present introduction, alternative
task environments are proposed for the software producer. The purpose of the argument
is to motivate the specific optimization criteria for the SPT organization. Productivity is
shown to be the key parameter. The leverage provided by a highly productive producer
can be used to simplify scheduling, design, and other forms of decision making.
Difficulty of communications within a production unit, caused by the rapid creation of
problem specific, local language, is posited as the major obstacle to the improvement of
productivity.
Technical detail is first presented in Chapter 2. Within the SPT organization, language
creation is carefully controlled by the meta-programmer, who issues written
meta-programs defining the local language and specifying the program logic as well.
Technicians write and debug the actual code on the basis of the meta-programs. A
number of conventions for increasing the effectiveness of meta-programs, and for
organizing the debugging activity, are also presented. A detailed example is given to
illustrate these ideas. The chapter concludes with a series of comparisons and possible
combinations of SPT and other software engineering concepts, CPT in particular.
Chapter 3 describes a series of experiments which were performed to measure the actual
performance of the method. The experiments were designed to serve as demonstrations of
practical capability. The problems the experimental teams worked on required between 3
and 12 man-months of effort. One of the experiments included two Software Production
Teams and a traditionally organized control group working on the same programming
problem. Chapter 4 summarizes the results of the experiments and offers some
concluding remarks.
1.2 Software Production as a Process Technology
In this section we shall discuss possible answers to the basic questions facing the software
producer: What is the product? Who are the customers and who should they be? In
[Drucker], Peter Drucker convincingly shows that the way the business is defined,
together with the choice of customers, can determine the viability of a business enterprise
or a whole industry. The contemporary software industry has defined its mission as
follows:
The software product is the complete (tested and documented) set of software
components which satisfies some data processing need. The customer, or user, is
the entity with the need.
It is unfortunate that such a definition has not appeared in the general literature and had to be
composed by the present writer. As such, it needs some clarification. First, the products actually
delivered may or may not satisfy the definition. Also note that system programming is not
excluded under the broad interpretation of the term data processing favored by [Naur2].
Such a product has certainly been useful and saleable. Nevertheless, serious problems have
arisen. The cost of the software product, both in absolute terms and relative to hardware
costs, has been rising dramatically [Boehm] [Aron]. Often, producers have been unable
to live up to their promises: software was delivered late, incomplete, or otherwise not
satisfying the user's need. Managers of large systems fought heroically as schedules
slipped and hundreds of man-years were consumed [Brooks]. What has gone wrong?
Observations by [Metzger] showed that the problems have often been due to unstable
problem definitions, unrealistic deadlines and poor planning. Other sources ([Boehm],
[Royce]) reported similar lists. The common factor in the causes cited is uncertainty.
The stability of the problem definition is uncertain, the deadlines are uncertain, and so on.
Uncertainty is generally disliked; hence, the propensity of managers, and the advice of
many experts, is to strive to eliminate it by planning. Thus, generally, the problems
were defined and deadlines scheduled with the utmost care permitted by the
circumstances. When the uncertainties inherent in the problems and the schedules
perturbed the plans, projects failed or produced disappointing results. Instead of being
ascribed to a lack of sufficient preparation and planning, such events might be
more precisely diagnosed as failures to deal effectively with uncertainty. Valuable insight
into the proper treatment of uncertainty can be gained from the experiences of other
industries.
Drilling for water or oil is a notoriously risky business. Yet the outfit which
performs the actual drilling operations is shielded from most of the risk by a
simple formula: they will drill for $1/foot in clay, $100/foot in rock until asked
to stop. The entrepreneur who commissioned the well has absorbed the
uncertainties of what lies underground: clay, rock, oil or more rock. The basis for
the absorption may be a scientific geological survey, intuition about probabilities
or a tax scheme. This is not to say that the contractor's operations are without
risks: the drilling contractor is responsible for tool changes, safety, the
productivity of personnel and dealings with the union.
Commissioned to build a new runway for an airport, the civil engineering firm
doesn't need to worry about the uncertainties of future needs for the runway. The
decision, right or wrong, has been made to proceed; partially on the basis of the
plans and cost figures submitted by the engineers. A Ready-Mix contractor will
perform the largest subtask in the project: the pouring of concrete. Since the
market price of poured concrete is fairly stable, the engineers' cost estimate is
really an estimate of the volume of concrete that will be required. The engineers
are thus responsible for ("absorb the uncertainties of") the validity of the plans,
while the contractor must produce, deliver and pour the concrete as the plans
require. These relationships are summarized in Figure 1.
The above are "gedanken" or "thought" examples, idealized to suggest metaphors for alternative
organizations of software production. After the inferences have been drawn, the full impact of
the painful realities of life will also have to be analyzed before a conclusion can be reached.
Uncertainty absorption is not the same as a reduction or elimination of uncertainty by
planning; it is merely a promise of action which enables others to operate free from the
"absorbed" uncertainty.
Partitioning responsibilities along the lines of the above examples results in a number of
remarkable developments. Since the participants individually have less to worry about,
specialization can take place. As new information becomes available, changes can be
implemented. If a project is unsuccessful, blame and financial burden can be correctly
assigned. Reputations can be established for reliability and capacity independent of the
merits and, to a large extent, the nature of the projects.
The engineers, for example, might have selected the contractor on the basis of his
good performance in an otherwise disastrous freeway project. The customer
airport, in turn, trusts the engineers because of their previous successful execution
of a runway, although in a different part of the country. Uncertainty about the
local conditions will be borne mostly by the contractor.
The driller's formula permits his customer to stop the sinking of the well, which
he might want to do if recovered core samples look unpromising. A reasonable
minimum charge may be required to protect the driller from caprice. There is no
"loss" incurred by the driller when asked to stop; the stoppages merely tend to
reduce the average depth of the wells he drills. The extra business attracted by the
protection more than compensates for the inconvenience.

[Figure 1: Building a runway. Needs air transportation / absorbs uncertainty about future transportation needs; needs new runway / absorbs uncertainty about the abstraction of runway, determines volume of concrete required; needs concrete / absorbs uncertainty about delivery, price, quality; pours concrete (which turns out to be a runway, which satisfies future transportation needs).]
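The driller's formula can be made concrete as a small billing function. This is a minimal sketch, assuming the $1/foot clay and $100/foot rock rates from the example and a hypothetical minimum charge of $500 (the text gives no figure); the names `drilling_charge`, `RATES`, and `MINIMUM_CHARGE` are illustrative only. It shows where the uncertainty sits: the driller's price for any stopping depth is fixed in advance, while the customer, who does not know the strata, bears the cost of whatever lies underground.

```python
# Per-foot rates from the driller's formula in the text (dollars per foot).
RATES = {"clay": 1.0, "rock": 100.0}

# Hypothetical minimum charge protecting the driller from caprice;
# the text mentions such a charge but gives no figure.
MINIMUM_CHARGE = 500.0


def drilling_charge(strata, stop_depth):
    """Bill for the footage actually drilled before the customer says stop.

    strata: list of (material, thickness_in_feet) pairs, from the surface down.
    The driller simply applies the rate for whatever material it meets;
    the customer absorbs the uncertainty about which materials those are.
    """
    charge = 0.0
    remaining = stop_depth
    for material, thickness in strata:
        if remaining <= 0:
            break
        feet = min(thickness, remaining)
        charge += feet * RATES[material]
        remaining -= feet
    return max(charge, MINIMUM_CHARGE)


profile = [("clay", 300), ("rock", 50), ("clay", 1000)]
print(drilling_charge(profile, 320))  # 2300.0: 300 ft of clay plus 20 ft of rock
print(drilling_charge(profile, 100))  # 500.0: stopped early, minimum charge applies
```

Stopping early only shortens the bill; from the driller's side nothing already drilled is wasted, which is the property the text attributes to a continuous process.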
In these examples the flexible relationship between the contractor and his customer is
possible because the contractor's service is a continuous process, characterized by:
1. Small unit size. Units are determined by the boundaries where delivery can stop
without residue. Since small is interpreted relative to the requirements of a
customer, small unit size implies the expected delivery of a large number of units.
2. Uniform production method for the units. Hence, the production process is the
continuous application of the production method to make the units. The
repetitive nature of the production process means that its properties can be
precisely measured for control and optimization. Scheduling of delivery becomes
a matter of reserving a portion of the productive capacity.
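Rule 2 can be illustrated with a toy model in which scheduling is nothing more than reserving capacity. This is a sketch under stated assumptions: a single producer with a fixed, already measured output rate, and the hypothetical class `ContinuousProducer` with methods `reserve` and `weeks_to_deliver`; none of these names come from the text.

```python
class ContinuousProducer:
    """Hypothetical producer whose uniform process turns out a measured
    number of units per week. Because every unit goes through the same
    process, scheduling a delivery is just reserving a share of capacity."""

    def __init__(self, units_per_week):
        self.units_per_week = units_per_week
        self.reserved = 0.0  # fraction of capacity already committed

    def reserve(self, share):
        """Commit a share (0..1] of capacity to one customer; refuse overbooking."""
        if share <= 0 or self.reserved + share > 1.0:
            raise ValueError("not enough free capacity")
        self.reserved += share
        return share * self.units_per_week  # units/week for this customer

    def weeks_to_deliver(self, units, share):
        """Delivery time for an order served by a given share of capacity."""
        return units / (share * self.units_per_week)


producer = ContinuousProducer(units_per_week=1000)
print(producer.reserve(0.25))                 # 250.0 units/week
print(producer.weeks_to_deliver(1000, 0.25))  # 4.0 weeks
```

The model relies only on the measured rate; nothing about the content of the individual units enters the schedule, which is the point of requiring a uniform production method.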
The units produced need not be uniform or interchangeable. Even the homogeneous
concrete-mix ceases to be uniform when thought of as a product to be delivered at a given place
and a given time: two shipments could not be interchanged. The production process, however, is
uniform for the delivered mix: prepare mix first, then load, deliver, and unload.
The key to understanding the difficulties of the software industry is the observation that
the software producer is expected to absorb too much uncertainty, in particular the
uncertainties about the customer's needs, the method of solution, planning, scheduling,
writing, testing, and documenting the implementation of the solution. To improve the
situation, the absorption of uncertainties should be partitioned and continuous process
production of software should be introduced. The characteristics of continuous process
production, as defined above, are manifestly incompatible only with the engineering
phases of software production: analysis of the user's needs, choice of algorithms, user
documentation, and acceptance testing. The production phases: detailed design, coding,
testing, and internal documentation, can be organized in a continuous process, as will be
shown in detail in the sequel. As a first step in implementing the partitioning suggested
by this distinction, we define the products of the production organization:
The units of production of the software production organization are lines of
proto-software which work toward the solution of well-defined problems. The
customers are software engineering organizations.
The relationships between the user, the production organization and the engineering organization are
summarized in Figure 2. For brevity's sake, we will generally write code for proto-software.
The lines produced are interdependent in that they must fit together with other lines to
form procedures, modules or programs so that they can run on computers. Procedures or
modules may be the most common units of delivery, but small pieces of replacement code
may also be offered; at any rate, units are small and rule 1 above is satisfied. Lines are
the units of charge. They represent tangible incremental value for the customer, because
they can be individually associated with some aspect of the customer's problem, and they
are ready to be used in an environment which already exists or is formed by the other
lines delivered.

[Figure 2: Organizing continuous process software production. Absorbs uncertainty about desirability and form of solution / needs software solution; well-defines problem, absorbing uncertainty of the volume of proto-software required / needs proto-software; absorbs uncertainty about delivery, unit price, quality / produces proto-software (which is refined into software by the engineers, which satisfies the user's needs).]
The question "How many lines or units is a program?" is just like asking how
many feet of concrete is a runway? Well, 100 ft. is not, 5,000 ft. is, and so is
10,000 feet. Which is the better runway? That depends on the airport's needs. If
the needs change, existing runways may be lengthened or the building of a new
runway may be cut short, which is not the same as leaving the runway unfinished.
Most problems cannot be solved by any single line of code, so the product, in general, can
only contribute to ("work toward") the solution of a larger problem. This allows the
producer to concentrate on the rate and efficiency of production, or productivity, and
charges the engineers with the responsibility of estimating the volume of code that will be
required.
The technical term proto-software is used to distinguish the product from user software
which is refined from proto-software by the engineering organizations. Proto-software
comes in a single quality grade; it is, say, 99.7% correct. Refining improves the quality
further, as required. The other technical term, well-defined problem, implies that the
engineering organizations have absorbed substantial uncertainties in the process of
well-defining the users' not so well-defined problems. Indeed, well-defining is just the
engineering partition of the conventional design phase. The difference between
well-defining and the other partition, production design, is precisely that the former
absorbs uncertainty. Production testing and refining are similarly related. When
production testing reaches the 0.3% errors/line level, distinctions between actual
production mistakes and singularities due to definitional and user uncertainty become
blurred. At this point further testing is best performed by the engineering
organization. When the engineers expose an error or decide on a change, they will ask the
producer to deliver the replacement lines of code.
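The quoted quality level translates directly into expected residual errors per delivery. A minimal sketch, assuming the 0.3% errors/line figure from the text and, additionally, that lines are wrong independently of one another (an assumption the text does not make); `expected_errors` and `prob_error_free` are hypothetical helpers.

```python
ERROR_RATE = 0.003  # 0.3% errors per line, i.e. 99.7% correct proto-software


def expected_errors(lines, rate=ERROR_RATE):
    """Expected number of residual errors in a delivery of `lines` lines."""
    return lines * rate


def prob_error_free(lines, rate=ERROR_RATE):
    """Probability that a delivery contains no error at all.

    Assumes (hypothetically) that each line is wrong independently with
    probability `rate`; the text makes no such independence claim.
    """
    return (1.0 - rate) ** lines


for n in (100, 1000, 10000):
    print(n, round(expected_errors(n), 1), round(prob_error_free(n), 3))
```

Under these assumptions a 1,000-line delivery still carries about three errors and has only about a 5% chance of being entirely error-free, which is consistent with the text's treatment of refinement as a separate engineering step.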
The savings perceived by the end user will depend on the increases in the engineers' and
producer's productivity, weighted by the fractions of their respective participation in the
total effort. Our approach for getting the largest savings will be to obtain large
productivity gains in the production phase and, at the same time, to ensure that the value
contributed by the producer can dominate the engineers' share. Implicit in this strategy is
the belief that methods for significant improvements in engineering productivity are
already available, for example in [Dijkstra] [Parnas1] [Wirth1] [Hoare]. However, the
question of how engineering practices might be influenced by access to highly
productive production organizations is a new issue, which will be discussed in the next
section.
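One way to read "weighted by the fractions of their respective participation" is the familiar weighted relation also used in Amdahl's law: each party's share of the original effort shrinks by its own productivity factor. The formula is an interpretation rather than something stated in the text, and the 60% share and threefold speedup below are hypothetical numbers; `relative_cost` and `savings` are illustrative names.

```python
def relative_cost(producer_share, producer_speedup, engineer_speedup=1.0):
    """New total cost as a fraction of the old one.

    producer_share: fraction of the original effort done by the producer
                    (detailed design, coding, debugging); the rest is engineering.
    *_speedup: factor by which each party's productivity increases.
    Each share of the effort shrinks in proportion to its own speedup.
    """
    engineer_share = 1.0 - producer_share
    return engineer_share / engineer_speedup + producer_share / producer_speedup


def savings(producer_share, producer_speedup, engineer_speedup=1.0):
    """Fraction of the original cost saved by the end user."""
    return 1.0 - relative_cost(producer_share, producer_speedup, engineer_speedup)


# Hypothetical: the producer does 60% of the work and becomes 3x as productive.
print(round(savings(0.6, 3.0), 2))  # 0.4
# Even an unlimited producer speedup cannot save more than the producer's share.
print(round(savings(0.6, 1e9), 2))  # 0.6
```

The second line suggests why the strategy insists that the producer's contribution dominate: under this reading, the savings can never exceed the producer's share of the total effort.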
1.3 Design Strategies when Production is Inexpensive
Design is easily identified as the critical phase in software production. In a typical project,
40% of the effort is spent designing [Boehm]; moreover, the quality of design greatly
affects the project schedule [Brooks]. In this section, we shall explore ways to reduce the
sensitivity of production costs to design.
The outputs of design are choices; to design is to make design decisions. There are two
activities supporting the decisions: first, alternatives must be proposed; and second, the
alternatives must be evaluated. The difficulty of creating alternatives ranges from
outright discovery to the simple recognition that a standard approach might work. The
evaluation of alternatives might take the form of intuitive or rigorous proofs of
correctness, and performance analyses.
It is often attractive to accept the overhead of conversions of related problems into the
domain of applicability of a highly productive technology. A manifold increase in the
productivity of software implementation would make design much more expensive
relative to implementation. The "distortion" of the price structure would tend to reverse
established preferences. In the paragraphs following, we shall elaborate on such
"reversed" operational decisions, which would be appropriate when the incremental design
costs exceed the cost of equivalent production. Figure 3 illustrates how the lowering of
implementation costs tends to push more decisions into the region of reversed preference
below the diagonal. As the coordinates indicate, the decisions involved must be convertible:
it must be possible to substitute implementation for further decision making. It is interesting
to note that reversed decisions can be readily observed when implementation costs are of little
or no importance, as in certain phases of a space project or in emergencies.
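The shift illustrated in Figure 3 can be mimicked by counting how many decisions fall below the diagonal as implementation costs drop. A sketch only: the decision costs are invented, in arbitrary units, the helper names are hypothetical, and the rule "prefer implementation when additional design costs more than equivalent implementation" is the simplest reading of the text's criterion.

```python
def prefers_implementation(design_cost, implementation_cost):
    """True below the diagonal of Figure 3: further deliberation costs more
    than simply implementing."""
    return design_cost > implementation_cost


def reversed_share(decisions, implementation_cost_factor=1.0):
    """Fraction of decisions in the reversed-preference region once every
    implementation cost is scaled by implementation_cost_factor."""
    reversed_count = sum(
        prefers_implementation(design, impl * implementation_cost_factor)
        for design, impl in decisions
    )
    return reversed_count / len(decisions)


# Hypothetical (additional design cost, equivalent implementation cost) pairs.
decisions = [(2, 5), (3, 10), (1, 8), (4, 6)]
print(reversed_share(decisions))        # 0.0: design is cheaper everywhere
print(reversed_share(decisions, 0.25))  # 0.75: cheap implementation flips most
```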
1.3.1 Implement without exploring all alternatives
It is seldom possible to explore all alternatives for a decision, therefore the issue
here is a matter of degree. For a cost-effective approach, the cutoff point in
considering alternatives should be determined by the cost of further deliberations,
compared with the expected incremental value. If no further decisions are
pending on the choice, that is, if the current decision is an independent one, the best
result that can be expected is that the implementation will not have to be redone.
Thus the cost of implementation has direct bearing on the incremental value of
further deliberation.

[Figure 3: Design strategy when implementation is inexpensive. (a) For typical decisions (area around the arrow), cost of additional design is less than cost of equivalent implementation. (b) If implementation costs are lower, implementation may be preferred to design (shaded area).]
Discussion of non-independent, or basic, decisions is outside the scope of this dissertation.
However, it should be pointed out that there are methods available to convert basic decisions into
independent ones by hiding the information about design decisions in modules [Parnas1] (1.3.4).
This makes the effective handling of independent decisions even more important.
Some guidelines for controlling independent decisions may be the following:
1. Some decisions are operationally unimportant. For example: a space/time
tradeoff opportunity in a situation where both space and time are plentiful.
2. Many decisions turn out to be operationally unimportant. For example: if
there is an important limit on space, space tradeoffs are consistently
made. If the limit is not reached, some of the tradeoffs become, in
retrospect, unimportant.
3. Sometimes seemingly important decisions are relatively unimportant. Such a
situation may arise with the discovery of a serious problem which dwarfs
the existing ones. As a corollary, while there is some probability of a
serious unknown problem existing, the importance of all decisions is reduced.
4. Sometimes only implementation can suggest the right decision, and then a
pre-implementation decision is meaningless. Such is the case for many
human engineering and user requirement problems.
5. Implementation often suggests ways for better decisions. This means that
decisions are simpler to make and are more reliable the second time
([Brooks] Chapter 2).
These observations can be combined into a startling but viable strategy: make the
meta-decision to consider all independent decisions initially unimportant. For
decisions which belong to the first four of the above five categories, this
treatment will be, in fact, proper. In the remaining fifth case, when the decision
"bounces", our loss will not be total since we are guaranteed valuable clues for the
correct decision.
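The meta-decision strategy can be compared with careful deliberation by expected cost. This is a hypothetical model, not taken from the text: a fraction `p_bounce` of swift decisions "bounce" and must be re-implemented after a cheap, better-informed second decision, while deliberation is optimistically assumed to be right the first time. All figures and function names are illustrative.

```python
def cost_decide_swiftly(implementation_cost, p_bounce, redecision_cost=0.0):
    """Expected cost of treating an independent decision as unimportant:
    pick an alternative at once, implement it, and if the decision bounces,
    make a second, better-informed decision and implement again.
    Assumes the second attempt succeeds, using clues from the first."""
    return implementation_cost + p_bounce * (redecision_cost + implementation_cost)


def cost_deliberate(design_cost, implementation_cost):
    """Expected cost of evaluating alternatives first and implementing once
    (optimistically assumes the careful decision is always right)."""
    return design_cost + implementation_cost


# Hypothetical figures: 20% of swift decisions bounce; careful design costs 5 units.
for impl in (10.0, 100.0):
    swift = cost_decide_swiftly(impl, p_bounce=0.2, redecision_cost=1.0)
    careful = cost_deliberate(design_cost=5.0, implementation_cost=impl)
    print(impl, round(swift, 1), careful, "swift" if swift < careful else "deliberate")
```

With these invented figures the swift strategy wins when implementation is cheap and loses when it is expensive, in line with the text's later remark that low-cost implementation is the crucial ingredient.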
Unimportant decisions should be made by reference to standards or by conscious
arbitrariness. The important thing is that the decisions be made, and made swiftly.
In management science, this principle has long had many adherents. In [Morris1] Robert McNamara
is paraphrased as saying, "In the past hour I have made a number of decisions resolving
controversies [regarding the standardization in single clothing items among the services] which have
been going on since the Department of Defense was created. None of these decisions was
important. The important thing is that I made a decision. [We should learn to] make unimportant
decisions quickly because action is better than inaction".
In conventional design practice, this strategy is not applicable because the expense
of implementation or the schedule demands (or is perceived to demand) success
on the first attempt. Low-cost implementation is the crucial ingredient which
enables the conversion of design problems into a stream of unimportant and
independent decisions which can be processed efficiently.
It is worth noting that truly important decisions are not only expensive to make,
but they are also dangerous! By definition, the effects of errors in important
decisions can be disastrous. By contrast, unimportant decisions cannot, by
themselves, cause much harm. When dealing with unimportant decisions, the
designers' effectiveness can be measured, controlled and optimized continuously;
an improvement from 80% to 85% correct decisions, for example, may be
considered significant.
1.3.2 Implement alternatives beyond a satisfactory one
In the previous section, we discussed how a decision-maker may bet on the
adequacy of an alternative without detailed evaluation of others. The low penalty
for a losing bet, that is the low re-implementation costs, combined with the
savings in evaluation costs, make the bets attractive. After a satisfactory solution
has been demonstrated, another type of bet may be made on the possibility of a
re-implementation being even better. Again, the lower the implementation costs,
the more appropriate the bet.
1.3.3 Implement an experimental system as improvements to a test bed
The requirement of software producers that the problems be well-defined does not
exclude their direct participation in research efforts. The researchers, presumably,
are trying to extend the limits of the technology in some area. To take advantage
of the leverage provided by the producer, they should first retreat and well-define
a system which is within the limits of technology, but not too far from the
eventual goal of the research. This system is called a test bed, and it can be
implemented by the software producer. Research can then proceed by piecemeal
extensions of the test bed into the experimental domain. Throughout the research
project, the researchers will benefit from a complete and working system, and
continuous feedback on the validity of their approach.
1.3.4 Implement alternatives instead of making a critical choice
If the parallel implementation of several alternatives is initiated, the problem of a
priori evaluation can be replaced by the considerably simpler a posteriori
measurement. The price of conversion is high: all but one implementation will be
wasted. Still, severe scheduling constraints may tip the balance in favor of
accepting the price and postponing the decision. The option of aborting
alternatives prior to their completion should be retained; the ability of the
continuous process producer to stop producing can be very helpful. The
implementation of modules should be ordered with special emphasis on the
earliest resolution of the major uncertainties.
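As a minimal sketch of replacing a priori evaluation by a posteriori measurement, assume two alternative implementations of the same search routine and a representative workload: both are run and the cheaper one is kept. The function names, the workload, and the use of probe counts (rather than elapsed time) as the cost measure are illustrative assumptions, not part of the method described here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Each alternative returns (index or -1, number of probes used).
SearchFn = Callable[[Sequence[int], int], Tuple[int, int]]


def linear_search(items: Sequence[int], target: int) -> Tuple[int, int]:
    """Alternative A: scan from the front."""
    probes = 0
    for i, x in enumerate(items):
        probes += 1
        if x == target:
            return i, probes
    return -1, probes


def binary_search(items: Sequence[int], target: int) -> Tuple[int, int]:
    """Alternative B: requires sorted input; halves the interval on each probe."""
    lo, hi, probes = 0, len(items) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


def measure(alternatives: Dict[str, SearchFn], items: Sequence[int],
            workload: List[int]) -> Tuple[str, Dict[str, int]]:
    """Run every alternative on the workload; return the cheapest and all costs."""
    costs = {name: sum(fn(items, t)[1] for t in workload)
             for name, fn in alternatives.items()}
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    items = list(range(0, 1000, 2))   # 500 sorted even numbers
    workload = [0, 10, 500, 998, 3]   # 3 is absent
    best, costs = measure({"linear": linear_search, "binary": binary_search},
                          items, workload)
    print(best, costs)
```

Probe counts keep the comparison deterministic; elapsed time on the target system would be the more realistic measure, at the price of noisier results.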
1.3.5 Implement instead of analysing or simulating
Analytical tools and simulation are often used to predict the behavior of a
complex system without recourse to implementation. Nonetheless, implementation
is intellectually less demanding and measurements from even a partial
implementation may yield more precise or more credible results than simulation
or the analysis of a simplified model.
1.3.6 Rewrite instead of modifying, translating, or bootstrapping
Solving a problem by modifying an existing, related implementation has obvious
advantages: presumably, the cost of the new implementation will be reduced by the
value of the re-used portion of the existing one. However, the cost of
understanding the properties of the existing software, so that the proper
modifications may be determined, should also be considered. Although recent
developments in making software more readable [Dijkstra] tend to decrease the
cost of understanding, implementation costs may decrease even more and offset
the advantage of re-use in most cases. Implementation from scratch will also
involve "understanding", or production design; nevertheless, for routine problems,
it may be less expensive than the engineering design which would have to absorb
the uncertainties about the modifications. Also, the more complex the problem,
the smaller the probability of the existence of a related implementation.
1.3.7 Implement general rather than special solution
If the straightforward generalization of a special problem can be implemented at a
small extra cost, it is often reasonable to do so. The general solution is more
likely to tolerate the inevitable escalation of demands; if there is a performance
penalty, the solution can be easily particularized.
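A small illustration, assuming a hypothetical special problem "count the negative numbers in a list": the general solution accepts any predicate at little extra cost, and the special case is recovered by fixing the predicate with `functools.partial`. If the general version turned out to carry a performance penalty, a hand-particularized version could replace it without changing its callers. All names here are illustrative.

```python
from functools import partial
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def count_if(items: Iterable[T], predicate: Callable[[T], bool]) -> int:
    """General solution: count the items satisfying an arbitrary predicate."""
    return sum(1 for item in items if predicate(item))


# The special problem originally posed is obtained by fixing the predicate.
count_negatives = partial(count_if, predicate=lambda x: x < 0)


def count_negatives_fast(items: List[int]) -> int:
    """Hand-particularized version, should the general one prove too slow."""
    n = 0
    for x in items:
        if x < 0:
            n += 1
    return n
```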
1.3.8 Implement special rather than general solution
If the design of a problem turns out to be especially difficult, the possibility of
implementing a scaled-down, special solution should be considered. The special
implementation can be helpful in a number of ways:
It may show that the problems are more serious than thought.
It may suggest an approach to the general problem (1.3.1).
It can be used as a test bed (1.3.3).
It will insure against a complete failure since at least a part of the original
problem will be solved.
1.3.9 Implement backup algorithms
The choice between alternative implementations (1.3.4) can be delayed until
"run-time", when a dynamic decision can be made depending on system load,
normal or restart operating mode, or even user preference. This option can be
taken instead of compromising between opposing requirements, for example
efficiency versus robustness or beginner versus expert user interface.
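A minimal sketch of a backup algorithm selected at run time, assuming a hypothetical program with a normal and a restart operating mode: an efficient mean is used normally, and a more robust one (which skips NaN values, sums with `math.fsum`, and tolerates empty input) after a restart. The mode names and both functions are assumptions made for illustration.

```python
import math
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    NORMAL = "normal"
    RESTART = "restart"


def mean_fast(values: List[float]) -> float:
    """Efficient alternative: assumes clean, non-empty input."""
    return sum(values) / len(values)


def mean_robust(values: List[float]) -> Optional[float]:
    """Robust alternative: ignores NaNs, sums accurately, returns None if nothing is left."""
    clean = [v for v in values if not math.isnan(v)]
    if not clean:
        return None
    return math.fsum(clean) / len(clean)


def mean(values: List[float], mode: Mode = Mode.NORMAL) -> Optional[float]:
    """The choice between the two implementations is deferred until run time."""
    impl = mean_robust if mode is Mode.RESTART else mean_fast
    return impl(values)
```

Neither alternative is compromised to serve both requirements; each stays simple, and only the dispatch in `mean` knows that two exist.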
1.3.10 Implement non-essential features
A tightly coupled engineering-producer complex may experience transients of
unused productive capacity. During such periods there is an opportunity to
implement discretionary additions to the software product, such as improved
reactions to errors, improved output formats, defaults, and so on. Such features
are easy to well-define; they generate enthusiasm, and they often turn out to be
indispensable after all.
1.4 Process Technology and Software Sharing
If an engineering or production organization can utilize the same software program to
solve two seemingly different problems, their effective productivity, as perceived by an
outside observer, is doubled. The program is said to be shared between the applications
which use it. Effective productivity can be greatly increased by sharing more software,
each among more applications. Moreover, if the shared programs are to be used within
the same system, it is often possible to save memory space using standard virtual memory
techniques [Dennis-VanHorn]. The commonality in the solution can make
documentation, training and use of the product easier, too. For example, if the line
editor code is shared, the line editing conventions in a time sharing executive and an
interactive debugger will be the same.
Despite these considerable incentives, shared software is not prevalent, for reasons that
can be surmised from the conditions of successful sharing:
First, commonality between problems must be recognized. A re-formulation of
one or both problems may be necessary to make the commonality apparent.
Second, the uncertainties of the shared approach have to be absorbed over and
above the uncertainties of the individual problems. The shared solution will be
more complex and more expensive than any of the individual solutions to the
problems; within the limited context of any single problem, sharing is not justified.
Sharing is most common when the conditions are easily satisfied. For example, the need
for mathematical functions is easy to recognize and the small uncertainties of their
sharing (such as domains, overhead when not in use, error conditions etc.) were cheerfully
absorbed by the high-level language designers and implementors. More complex software,
however, will be shared only if some organization has the intricate knowledge of the
applications to recognize commonality and if they also have responsibility for the
implementations so that the substantial uncertainties of sharing can be balanced by the
local benefits. It is also apparent that the conditions are independent of programmers'
attitude toward writing sharable code. This suggests that any attempt to improve software
sharing by exhorting programmers to "reform" is futile.
The counterargument from reductio ad absurdum points out that programmers might simply refuse
to write sharable code. However, by assumption, the uncertainties of sharing have been absorbed
and hence the problems can be solved independently. Ignoring the ethical problems, the programmers
need not be told at all that they are writing programs that might be later shared.
The same method can be applied to many other controversies: documentation, comments, exhaustive
testing, use of various tools or other conflicts between local and global values. A manager could
absorb the uncertainty about documentation, for example, by rewarding a programmer exclusively
for doing documentation as planned, regardless of slippages in project schedule or unappreciative
co-workers. Once the uncertainties are removed, the controversy disappears. It is a separate
question whether or not the enforced methodology is actually useful.
The engineering organization in Figure 2 is a natural niche for software sharing
responsibility. The engineers can, in principle, recognize commonalities in the flow of
problems from different users, and they are also experienced in uncertainty absorption.
The high productivity of the proposed organization will amplify this sharing potential, as
shown in the following paragraphs.
Consider a software producer operating indefinitely in a perfectly stable production
environment, without any changes in personnel, computer systems, languages, or methods.
Figure 4a Small group is unable to accumulate critical software mass (shaded) within project lifetime.
Figure 4b If group size is increased, subgroups will form and the critical mass will increase.
Figure 4c By increasing productivity, the small group can reach the takeoff point.
Let us further postulate that within this environment all sharing opportunities are
exploited. Such a producer would be able to accumulate an extensive library of the
stereotyped computer science problems: assembler, loader, compiler, operating system,
information storage and retrieval, linear programming, graphics and so on. At some
point, it would be discovered that the next problem - for example a support system for
large systems described by [Brown] - can be developed by large scale sharing of library
items. By large scale sharing, we mean the sharing of substantial portions of complete
systems as contrasted with the more common small scale sharing of modules.
An excellent example of large scale sharing is the InterLisp system described in [Teitelman], in
which the services of Dwim, the Programmer's Assistant and other powerful applications are
available to one another as well as to the interactive user or to the user's programs.
We shall call the point in time where large scale sharing is likely to commence the
takeoff point. Operating in the post-takeoff regime is exceptionally rewarding: the
effective productivity will soar and the product quality will benefit from the synergy of
the shared systems.
In reality, the properties postulated for the producer can only be approximated.
Individual programmers working alone will take advantage of almost all sharing
opportunities. Small, tightly knit groups can come very close to the optimum because the
number of interactions necessary for recognizing commonalities in the problems is still
relatively low. The development of the effective productivity of such small producers is
depicted in Figure 4a. Note that the finite project lifetime (symbolized by the dashed line
on the figure) prevents the accumulation of the critical software mass (shaded area) for
large scale sharing. The lifetime may be determined by local values which dictate
termination on the achievement of a limited goal. Even if the producer is interested in
achieving as much as possible, the project lifetime will be limited by natural personnel
turnover, people losing interest, external scheduling constraints, or computer systems,
languages and methods becoming obsolete.
A producer may try to reach the takeoff point within the project lifetime limit by
assigning more people to the task. Unfortunately, as the number of interactions grows
steeply with the number of people, sharing opportunities will be missed. Formally or
informally, subgroups of manageable size will form, consciously excluding the possibility
of large scale sharing between the subgroups in order to control the cost of interactions
and allow work to proceed. The result is shown in Figure 4b: if the work force is
doubled, the critical software mass doubles, too, leaving the takeoff point beyond reach.
On the other hand, a small production group operating at sufficiently high rates of
production can produce the critical software mass within the project lifetime limit, as
shown in Figure 4c. Thus we may conclude that high productivity can do more than just
reduce unit costs; it will make large scale sharing possible, increasing effective
productivity and product quality.
To summarize the argument up to this point, we have proposed uncertainty absorption for
improving the software industry's ability to deal with the uncertainties inherent in large
software problems. Uncertainty absorption - the promise of action which enables others
to operate free of the uncertainty - is particularly simple when production can be
organized as a continuous process which can be measured, controlled, and hence,
optimized. We divided the software production task into an engineering phase, in which
the user's problems are made well defined, and a production phase, in which proto-software
is produced by a continuous process. The proto-software is given back to the engineers
for refinement to create the final product.
If proto-software is inexpensive, design methodologies should be changed to conform to
the new economics. To this end, we listed a number of methods to divert effort from
design to implementation. These methods offered new uses for the proto-software
product; for example, for exploring alternative approaches. This also implied that some
fraction of the proto-software produced will never be refined, since its purpose will have
been fulfilled entirely within the engineering organization.
Finally, we noticed that additional benefits can be reaped from enabling small production
groups to amass software libraries which can be shared on a large scale.
1.5 Measures of Software Productivity
Productivity is traditionally defined as the relationship between the output of goods and
services and the inputs used in their production. Applied to software production, the
output of program bulk should be expressed as a function of the inputs: the time of
programmers and other labor and possibly overhead costs. In general, this function
depends on the size and type of problem being programmed [Pietrasanta] [Brooks].
Once the domain of discourse is held reasonably constant, two simplifying
approximations are justified: all inputs may be expressed in terms of programmer-hours
burdened with the overhead, and the productivity function itself may be taken to be
linear. Even if the simplifications yield crude results, they may be useful in establishing
lower limits, the actual functions being always worse than linear [Brooks].
The way to obtain the simplified productivity measure is then to take a bulk measure of
the software produced, such as lines of source, and divide it by the number of man-hours
associated with its production. The results of measurements are often expressed using
different units. The approximate conversion factors relating the most common units are
summarized in Figure 5.
1 line (high level lang) = 1 statement = 26 characters = 5 machine instructions
1 line (low level lang)  = 1 machine instruction
1 man-month              = 170 man-hours
1 man-year               = 2000 man-hours
Figure 5 Approximate conversion factors relating the most common
units of production and time.
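Using the approximate factors of Figure 5, the simplified measure can be computed and restated in other units. The sketch below assumes a hypothetical project that wrote 3400 high level lines in 400 man-hours; these numbers are invented for illustration only, and the instruction conversion applies to high level lines only.

```python
# Approximate conversion factors from Figure 5.
MAN_HOURS_PER_MAN_MONTH = 170
MAN_HOURS_PER_MAN_YEAR = 2000
MACHINE_INSTRUCTIONS_PER_HLL_LINE = 5


def lines_per_man_hour(lines: int, man_hours: float) -> float:
    """Simplified (linear) productivity: bulk produced divided by burdened man-hours."""
    return lines / man_hours


def restate(rate_per_hour: float) -> dict:
    """Express a lines-per-man-hour rate (high level lines) in the other common units."""
    return {
        "lines/man-month": rate_per_hour * MAN_HOURS_PER_MAN_MONTH,
        "lines/man-year": rate_per_hour * MAN_HOURS_PER_MAN_YEAR,
        "instructions/man-hour": rate_per_hour * MACHINE_INSTRUCTIONS_PER_HLL_LINE,
    }


rate = lines_per_man_hour(3400, 400)  # hypothetical project: 8.5 lines/man-hour
print(rate, restate(rate))            # 1445 lines/man-month, 17000 lines/man-year
```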
Two objections are often made to productivity measures. Some argue that the variations
between individual productivities are too large for the measure to be a useful predictor.
Experimental results showing differences as large as 1:26 are often quoted [Sackman].
It is hard to see how the employment of a programmer with, say, 5 times lower
than average performance would be economically justified (note that 5 is the
approximate geometric mean of 1 and 26). Even disregarding salary and
overhead, if this person spends more than 20% [Mayer-Stalnaker] of his time
communicating with other programmers who are 5 times more productive, making an
equal demand on their time, his total contribution will be negative!
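The arithmetic behind this claim can be made explicit. Let $P$ be the average productivity, so that the weak programmer produces $P/5$ per hour of work; let $f$ be the fraction of his time spent communicating, and assume, as the text does, that each hour of his communication consumes an hour of an average colleague's time. Salary and overhead are ignored.

```latex
\begin{aligned}
\sqrt{1 \cdot 26} &\approx 5.1 \\
\text{net contribution per hour} &= (1-f)\,\frac{P}{5} - f\,P \\
(1-f)\,\frac{P}{5} - f\,P < 0 &\iff 1 - f < 5f \iff f > \tfrac{1}{6} \approx 16.7\% \\
f = 0.2:\quad 0.8\cdot\frac{P}{5} - 0.2\,P &= 0.16\,P - 0.2\,P = -0.04\,P
\end{aligned}
```

Under these assumptions the break-even point lies below 20%, so at the quoted figure the net contribution is already negative.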
Weinberg attributes such results to "ambiguous programming objectives" ([Weinberg]
page 128). In Weinberg's experiments, two groups were given the same problem
description which also included explicit statements of objectives. The objectives set for
the groups were different, however. The variation of results was greater between the
groups than within. We can expect, therefore, that uncertainty absorption will greatly
reduce the variation of individual productivity among programmers with comparable
training.
The other common objection is that management interest in lines per man-hour will
merely increase the bulk of programs by encouraging programmers to "write insipid code"
[McClure] [Cw]. Indeed, many misguided attempts might have had this result. The
correct approach is not to ask the programmers to be "more productive" but rather to
organize for productivity and reward the programmers for making the organization work.
Peter Drucker's comments are remarkably applicable ([Drucker] page 267): "It is
folly to ask workers to take responsibility for their job when the work has not
been studied, the process has not been synthesized, the standards and controls have
not been thought through, and the physical information tools have not been
designed. It is also managerial incompetence".
It is significant that the definition of productivity and the definition of the product of
software production closely correspond - this is a direct consequence of viewing
production as a continuous process. It can be said then that the business of the software
producer is productivity. To improve productivity is to improve the business.
The software producer in steady-state would program a stream of small units of
approximately equal complexity - all problems being well-understood (1.2). The accuracy
of the simple linear productivity measure will be very good under such conditions. The
precise productivity figures will be important to the producer for fine-tuning the
production process, and also to the engineering organization (Figure 2) for quantification
of the uncertainties to be absorbed.
How can productivity be improved? One way is automation. In software production,
automation means the use of Artificial Intelligence, very high level languages and
automatic proofs of program correctness. While both the volume and the quality of
research in these areas are high, practical results are not expected within the next 5-10
years [Balzer] [Deutsch]. There remain the short term solutions to improve productivity
by improving on the current manual techniques. Although such solutions do not compare
well with the long term promises of automation, there are areas of current practice where
substantial and immediate improvement could be made. One such area is the utilization
of the programmers' time. A revealing set of measurements is quoted in
([Mayer-Stalnaker] page 86). According to this reference, the observed programmers
spent 14% of their time reading and 13% writing "with a list, card, or worksheet in
evidence", that is in "productive capacity". "Talking or listening (Business)" took 17%.
The time of inexperienced programmers and trainees is especially poorly utilized. They
are often given either meaningless tasks [Metzger] or inordinate responsibility, and thus
are allowed to fail or cause harm. Clearly, there is room for improvement.
1.6 What Determines Productivity?
Our problem, then, is to find organizational methods to increase programming
productivity. To approach this problem we shall first explore the space of possible
solutions by investigating the parameters on which productivity depends.
The productivity of a programmer working alone on a problem is determined by the skill
and motivation of the programmer, and by the tools used. There are two reasons why
most problems cannot be solved by a single individual and hence must be solved by teams
or organizations. First, the problem may involve subtasks which require extraordinary
skills possessed only by specialists in that area. The team approach then becomes
imperative if the skills of the specialist are incomplete with respect to the whole
problem. The other reason is that the productivity of an individual is insufficient to
solve most problems within the required time.
The productivity of a group depends only partially on the productivity of its members;
at least two other factors also have to be considered: specialization and communications.
Specialization is the concentration of effort on a limited field of activity. If the
concentration is consistent over the long term we speak of area specialization, where the
area might be, for example, numerical analysis or channel programming. In the short
term, the field of concentration is simply the subtask being solved and we have subtask
specialization.
Area specialization is often the sign of the specialist's outstanding ability and motivation.
Because of his long term concentration in the area, the specialist can also acquire greater
skills and hence, within his area, his productivity will be better than non-specialists'.
Outside of his field, the area specialist is likely to perform worse than non-specialists,
because of lack of experience and motivation. The conclusion is that the attractiveness of
area specialization depends on the long term importance of the area for the organization
employing the specialist. As a corollary, if solving a problem requires area specialization
which is otherwise unattractive, the attractiveness of the problem is reduced.
Subtask specialization has the same features as area specialization, but on a smaller scale.
Subtask specialists certainly get better acquainted with their own subtask than with other
aspects of a problem being solved, and their productivity will rise on a learning curve.
Depending on the size of subtasks, this increase in productivity may not be very large.
On the other hand, the requirement of long-term interest is all but removed, so in the
long run, the subtasks undertaken by an organization and assigned to a person may vary.
Coordination of specialists is necessary to make sure that the subtask partitioning remains
valid as the original concepts are developed by implementing them. Development here
simply means the continuous introduction of detail or other effects of work being done.
We define a proto-solution as an incomplete solution which can be developed into the
solution of the problem. A partitioning is valid if it can be integrated into a
proto-solution.
Coordination in any form requires communication of information, which in turn requires
expenditure of effort. This means that specialization also has a negative impact on
productivity, by siphoning effort away from directly productive activities. The cost of
communications is then the other important factor in determining the productivity of a
group.
If unchecked, communication costs can grow very fast as team size, and hence subtask
specialization, increases; in the limiting case the number of potential communication
channels is a quadratic function of the group size ([Brooks] page 18). It is somewhat
surprising, however, that communications become more difficult as productivity increases,
even if the number of channels is held constant. This will be shown in the following
paragraphs.
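For a group of $n$ people in which everyone may talk to everyone, the number of potential channels is the number of pairs, which is the quadratic function referred to above. Doubling the group roughly quadruples the channels:

```latex
\begin{aligned}
C(n) &= \binom{n}{2} = \frac{n(n-1)}{2} \\
\frac{C(2n)}{C(n)} &= \frac{n(2n-1)}{n(n-1)/2} = \frac{2(2n-1)}{n-1} \;\longrightarrow\; 4 \quad (n \to \infty) \\
C(10) &= 45, \qquad C(20) = 190
\end{aligned}
```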
In typical productive activities involving communicating specialists, the activity specific
language used for communication is well known to the communicants. It is easy to
remain proficient in the languages of these activities because the languages tend to change
very slowly, the rate of their growth being related to the rate of introduction of new
concepts or abstractions into the process.
Consider, for instance, the office in a life insurance company handling claims
([Drucker] page 220). Skilled specialists working on claims of different
complexity can communicate in the well defined language of the trade. Events
causing changes in the language, such as introduction of a new policy type, are
rare and, at any rate, independent of the productivity of the claims office.
Software production differs considerably from other productive activities in this respect.
The elements of computer science, computer languages, standards and operational
procedures form a slowly changing, fundamental language, the global language of a
software production environment. Any production activity, however, will give rise to a
specialized, local language. The production process involves the creation of abstractions
even at, or very near to, the productive level; therefore the rate of introduction of
abstractions is necessarily coupled to the rate of production or productivity. The greater
the productivity, the more rapid the change in the language, which will tend to impede
further progress.
The term hash table is understood by any programmer and it properly belongs to
a global language. However, if in the course of producing a large program a hash
table is needed, a new, more specific, abstraction, say HSHTBL, is created whose
properties are imperfectly covered by the generic term. The new abstraction will
enlarge the local language and entail communication costs.
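To make the point concrete, the sketch below shows what a hypothetical HSHTBL might pin down that the generic term leaves open: a fixed capacity, a linear probing rule, and a specific reaction when the table is full. Every one of these choices is an assumption made here for illustration, and every one is part of what a user of HSHTBL would have to learn. The all-capital class name follows the text rather than Python convention.

```python
from typing import Any, List, Optional


class HSHTBL:
    """Hypothetical local abstraction: a fixed-capacity, open-addressing table.

    Unlike the generic 'hash table', its capacity, probing rule and full-table
    behavior are specific local decisions that must be communicated.
    """

    CAPACITY = 8  # fixed size: the table never grows (a local decision)

    def __init__(self) -> None:
        self._keys: List[Optional[str]] = [None] * self.CAPACITY
        self._values: List[Any] = [None] * self.CAPACITY

    def _slot(self, key: str) -> Optional[int]:
        """Linear probing from the home slot; None if the table is full and key absent."""
        home = hash(key) % self.CAPACITY
        for step in range(self.CAPACITY):
            i = (home + step) % self.CAPACITY
            if self._keys[i] is None or self._keys[i] == key:
                return i
        return None

    def put(self, key: str, value: Any) -> None:
        i = self._slot(key)
        if i is None:
            raise OverflowError("HSHTBL full")  # full-table behavior is part of the contract
        self._keys[i] = key
        self._values[i] = value

    def get(self, key: str, default: Any = None) -> Any:
        i = self._slot(key)
        if i is None or self._keys[i] is None:
            return default
        return self._values[i]
```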
To be able to discuss newly created abstractions without circumlocutions, a typical
communication is prefaced with a set of definitions which we shall call the dictionary.
The operation performed by the source of the communication will be called language
creation while the recipient's action will be called learning the language. Creation of the
abstractions themselves is to be distinguished from creation of language; the latter denotes
the additional effort necessary for molding abstractions into communicable form.
2.1 Introduction
This chapter presents the major thesis: an organizational schema, the Software Production
Team, designed to fulfill the requirements of a software producer. The emphasis in this
organization is on the improvement of productivity by simplifying communications
between the programmers. Section 2.2 will propose the use of the wheel network type of
organization to minimize the number of communication channels and to centralize the
important language creation (1.6) function. Language learning (1.6) will be overlapped
with task performance to effect further savings.
Meta-programs, as described in Section 2.3, are informal, written communications from
the meta-programmer, who creates the local language, to the technicians who learn it and
actually write the programs. Feedback communications from the technicians to the
meta-programmer are very efficient, because no language creation or learning is
involved. Meta-programs are characterized more by their purpose than by any specific
form or syntax.
In Sections 2.4 and 2.5, the abstract notion of local language is resolved into the questions:
what are the objects that should be named, and what should their names be? The answers
involve the concept of painted types (related to types in programming languages), and
naming conventions based on the idea of identifying objects by their types.
Section 2.6 addresses the problem of debugging in a high productivity environment. The
method of error localization using state vector syntax checking is described. This method
involves, first, the preparation of procedures to check the run-time consistency of data
structures, and second, a binary search strategy for swift error localization.
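A minimal sketch of the two ingredients, under assumptions that are ours rather than the text's: the program is modelled as a list of steps that mutate a state, the state can be replayed from its initial value up to any step, a checking procedure tests the consistency of the state, and once the state becomes inconsistent it stays inconsistent. Binary search then finds the first offending step with a logarithmic number of checks. The state layout, the invariant, and the deliberately buggy step are hypothetical.

```python
import copy
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Step = Callable[[State], None]


def run_to(initial: State, steps: List[Step], k: int) -> State:
    """Replay the first k steps on a fresh copy of the initial state."""
    state = copy.deepcopy(initial)
    for step in steps[:k]:
        step(state)
    return state


def first_bad_step(initial: State, steps: List[Step],
                   check: Callable[[State], bool]) -> Optional[int]:
    """Index of the first step after which check() fails, or None if it never fails.

    Assumes check passes on the initial state and, once it fails, keeps failing.
    """
    if check(run_to(initial, steps, len(steps))):
        return None
    lo, hi = 0, len(steps)  # invariant: passes after lo steps, fails after hi steps
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if check(run_to(initial, steps, mid)):
            lo = mid
        else:
            hi = mid
    return hi - 1  # zero-based index of the offending step


# Hypothetical checking procedure: a counted, sorted list.
def consistent(s: State) -> bool:
    return s["count"] == len(s["items"]) and s["items"] == sorted(s["items"])


def insert(x: int) -> Step:
    def step(s: State) -> None:
        s["items"].append(x)
        s["items"].sort()
        s["count"] += 1
    return step


def buggy_insert(x: int) -> Step:
    def step(s: State) -> None:
        s["items"].append(x)  # bug: neither sorts nor updates the count
    return step


if __name__ == "__main__":
    steps = [insert(5), insert(2), insert(9), buggy_insert(4), insert(7), insert(1)]
    print(first_bad_step({"items": [], "count": 0}, steps, consistent))  # 3
```

In a real program the replay would more likely be replaced by calls to the checking procedure placed at chosen points in the code, with the search interval narrowed by moving those calls.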
Section 2.7 introduces additional useful meta-programming conventions. The role of
meta-programs in documentation is also discussed. A complete meta-programming
example is presented and analysed in Section 2.8. Finally, in Section 2.9 we consider the
relationships or contrasts between the meta-programming organization and the relevant
software engineering concepts of high level languages, egoless programming, structured
programming, Chief Programmer Teams, and automatic program verification.
2.2 Optimizing Software Productivity
We proceed to consider organizational schemes and their effects on the most important
parameters determining productivity. By maximizing the contributions of the parameters
we can find a local maximum which we shall select as the point of interest.
First, the parameters affecting individual productivity - skills and tools - can be
conveniently separated from the group factors, which are specialization and
communication. Considerations of possible improvements in programmers' skills would
involve deep questions of computer science education. The problems of building
improved or new tools, such as high level languages, editor-compiler-debugger
complexes, and augmentation systems, are also very difficult; yet the possibilities are already
well covered in the literature [Teitelman] [Engelbart] [Geschke-Mitchell]. The present
work will exclude discussion of these questions. Instead, we will assume some realistic
constant quality of the available skills and tools, and concentrate on the question of
optimal organization which will achieve our goals. This approach retains the option of
utilizing new skills and tools as they become available.
The group factors - specialization and communication - are interrelated in complicated
ways. The merits of any given organizational choice must be evaluated by simultaneous
consideration of its combined effects on all group factors.
For increased productivity, communication costs must be decreased, consistent with
satisfying the essential communication requirements of the organization. There are
three options: the requirements themselves may be decreased by suitable partitioning of
subtasks; waste of communication capacity can be minimized by distribution on a strict
need-to-know basis; and finally, the most efficient medium and language can be used in
each instance.
Note that these and the following comments apply only to task-oriented and not socio-emotional
or other supportive communications [Katz-Kahn].
The importance of communications to software production was very explicitly elucidated in
([NATO1] page &9). Suggestions made there included proposals covering each of the above points:
"effectively structuring the object to be constructed and ensuring that this structure is reflected in
the structure of the organization making the product" (Dijkstra), need-to-know type controls, and
using automation for communication efficiency (remote consoles, text editing).
We shall choose the following aggregate of organizational schemes to accomplish our goals:
The wheel network (Figure 6) as the model for the communication channels and task
partitioning in a team of programmers.
New language will be created only by the central node in the wheel network.
Task oriented language in written form for most communications.
The wheel network is a two-level hierarchical structure consisting of a central node and
other nodes which are connected to the hub by the spokes of the wheel. We shall call the
central node the meta-programmer and the other nodes will be called technicians (these
designations will be justified later). The complete network will be referred to as a
Software Production Team or simply team.
Figure 6. Structure of a Software Production Team
The attraction of the wheel organization lies in the simplicity of its topology. This
intuition is reinforced by experimental results in psychology, which generally confirm that
the efficiency of groups in task performance is greater in wheel networks than in other
networks admitting more channels (for references see [Katz-Kahn] page 237).
Relying on his central position, and having exclusive license for language creation, the
meta-programmer can control the distribution of information on the basis of
need-to-know. The sum total of new language directed toward, and learned by, a given
technician is the technician's local language, which is, in general, disjoint from other local
languages, as shown in Figure 6. The technicians will be subtask specialists not only by
what they do, but also by the local language they understand. The lack of common
language will tend to minimize the informal and expensive information flow between
technicians outside of the highly optimized channels (but see the note above on
supportive communications). The meta-programmer may be considered an area specialist,
specializing in language creation and meta-programming.
Return, or feedback, communications from technicians to the meta-programmer are
particularly efficient because the language used will be known to both communicants.
This point is made in anticipation of tradeoff possibilities between costs and error rate of
forward communications. With efficient feedback available for error correction, the
uncorrected error rate may be allowed to rise and costs can be reduced.
A serious drawback of the wheel organization is that it cannot grow arbitrarily. The
bottleneck is clearly in the central node, so the team size will be limited by the
meta-programmer's ability to perform as the number of technicians increases. The
precise figure for the maximal team size should be determined by experiment, but a
common rule of thumb for managers ([Metzger] page 85) suggests an upper limit of four
technicians in a team. The question of growth beyond this limit will be treated in Section
Except for certain responses to feedback, all communications from the meta-programmer
to the technicians will be in writing, describing specific programming tasks the
technicians should perform. These communications are the meta-programs, so called
because they describe the steps to be taken when writing a particular computer program.
New language will be introduced by including definitions of new terms in the
meta-programs; explicit explanation using terms already established will always
accompany initial usage. Since the meta-programs will be available in written form, the
technicians will be able to consult the definitions at any time, and thus accomplish the
tasks and learn the new terms in parallel. Ideally, the learning process should be
completed at the same time as the task itself, in which case the full instructional potential
of the task is exploited and the enriched language can be profitably used as early as the
next task. To start the implementation sequence, the first task will be described in some
global language (see Section 1.6), and the following tasks will use the progressively richer
local language.
The order of local language introduction readily follows from a design obtained by
stepwise refinement and expressed in terms of levels of abstraction [Dijkstra] [Wirth1].
Since we want the language of the first task to be the simplest, and later tasks to use
language introduced earlier, the levels of abstraction will have to be visited from the
bottom up. Note that this does not imply that the design itself has to be prepared
bottom-up or in any other particular sequence; it applies only to the order of the
combined communication and implementation of a design.
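The bottom-up visiting order is exactly a topological order of the "is defined in terms of" relation between levels. The sketch below computes such an order with Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). The design and its level names are hypothetical, loosely modelled on the TREESORT example of Section 2.3. The sketch is only an illustration of the ordering rule, not part of the method as described.

```python
from graphlib import TopologicalSorter

# Hypothetical design: each level of abstraction maps to the lower-level
# terms it is defined in terms of (its predecessors in the ordering).
design = {
    "sort program": {"heap", "item comparison"},
    "heap": {"item index", "item array"},
    "item array": {"item index", "item"},
    "item comparison": {"item"},
    "item index": set(),
    "item": set(),
}

def introduction_order(levels: dict) -> list:
    """Return an order in which every term comes after the terms it uses."""
    return list(TopologicalSorter(levels).static_order())
```

Each meta-program in the resulting sequence can then define its new term using only terms the technician has already learned. The design itself may still have been produced top-down.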
The main advantage of the proposed scheme is that the time spent by a technician
communicating is reduced to a negligible fraction: most of the received information will
be processed while performing production tasks; minor clarifications will be obtained by
referring to the written material; and verbal feedback will be necessary only if the
meta-programs contain incomprehensible or inconsistent parts. The cost of writing the
meta-programs will be more than offset by the savings in communications.
2.3 Task Orders and Meta-programs
The key communications within a Software Production Team, as well as between the user
and the engineers or the engineers and the producers, aim at getting some software task
performed. We shall use the term task order to denote such communications. The
essential characteristics of task orders are the following:
they carry authority to initiate expenditure of effort;
they are instruments of uncertai nty absorption;
they must be interpreted in the context of some global or local language;
a task order uniquely determines some family of programs; members of this
family are equivalent in their ability to fulfill the intent of the task order.
Firm intent, resulting from uncertainty absorption, can be expressed in a task order by
the use of powerful local language, or by being as explicit as necessary given the available
global language. Conversely, license to follow any prudent course of action, especially in
areas of lesser importance, can be granted by omission of specific instructions.
The form of a task order may vary considerably depending on the language available to
those wishing to communicate. For example, all of the following three communications
can qualify as task orders under plausible circumstances:
1. Write an ALGOL 60 compiler for the XYZ computer. Implement the full language
except for integer labels, arrays called by value and dynamic own arrays. Use the
reference character set of the Revised Report, available on the ABC terminal.
Implement I/O as in
2. Implement GCD(m, n) as follows:
E1. [Find remainder.] Divide m by n and let r be the remainder.
E2. [Is it zero?] If r = 0, the algorithm terminates; n is the answer.
E3. [Interchange.] Set m ← n, n ← r, and go back to step E1.
3. Type the following:
    procedure TREESORT (M, n);
      value n; integer array M; integer n;
    begin
      procedure siftup (i, n); value i, n; integer i, n;
      begin integer copy, j;
        copy := M[i];
      loop: j := 2 × i;
        if j ≤ n then
        begin if j < n then
          begin if M[j+1] > M[j] then j := j + 1 end;
          if M[j] > copy then
          begin M[i] := M[j]; i := j; go to loop end
        end;
        M[i] := copy
      end siftup;
      integer i;
      for i := n ÷ 2 step -1 until 2 do siftup(i, n);
      for i := n step -1 until 2 do
      begin siftup(1, i); exchange(M[1], M[i]) end
    end TREESORT
These examples differ greatly in the richness of the operational language. In the first
example, which is a specification for a routine problem, a basic agreement is apparent
about the extremely complex meaning of the term "compiler", since no further
performance, implementation or reliability specifications are given. Mutual trust and
powerful local language may have been developed during long-term professional
association between the communicants. Uncertainty absorption by the customer is evident
in the exclusion of certain expensive language features and the explicit selection of
input/output style. All this reminds us of a typical shopper who selects the style and
color of a dress with great care, while relying on the shop's reputation for quality.
The second example (an adaptation of Euclid's algorithm as stated in [Knuth]) uses much
simpler language: a mix of English, algebra and basic computer science. This language is
understood by most college sophomores. The precise meaning of the imperative verb
"implement" is, again, implicit; it is plausibly established by a short-term association
between the communicants. There is very little uncertainty left about the intent of the
task order, since it not only specifies the algorithm, but also suggests a specific
implementation by explicit looping instead of, for example, recursion. Depending on the
local language, the meaning of the terms "divide" or "terminate" may also be highly
specific. This task order introduces new language by naming both the variables and the
steps of the algorithm. However scant, the new language may be useful, as in the response
to feedback seeking help: "Print m and n before the interchange!"
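As an illustration of how closely an elaboration can follow such a task order, here is a sketch of the second example in Python, which is our choice of language and not the thesis's. The loop mirrors steps E1 to E3 and keeps the names m, n and r introduced by the order. The `trace` flag is a hypothetical addition that shows how the feedback request "Print m and n before the interchange!" maps onto a single line.

```python
def gcd(m: int, n: int, trace: bool = False) -> int:
    """Euclid's algorithm as steps E1-E3 of the task order; assumes m, n > 0."""
    if m <= 0 or n <= 0:
        raise ValueError("steps E1-E3 assume positive integers")
    while True:
        r = m % n              # E1. [Find remainder.]
        if r == 0:             # E2. [Is it zero?]
            return n           #     n is the answer
        if trace:              # response to feedback: print m and n before E3
            print(m, n)
        m, n = n, r            # E3. [Interchange.] then back to E1
```

For instance, `gcd(544, 119)` passes through the remainders 68, 51 and 17 and returns 17. Note that the explicit loop follows the order's suggestion of iteration rather than recursion.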
Although the third example looks like an ALGOL procedure [Floyd2], it is rather a
request to a typist. The communicants presumably have an understanding about the
required fidelity and about the "implementation" of the special characters (such as ≤ and ÷)
and boldface. For the recipient typist, the operational meanings of all characters in the
communication (whether they belong to delimiters, identifiers, constants, or comments)
are equivalent, to wit: cause a similar mark to appear on a sheet of paper.
Task orders covering the full range of complexities illustrated above may appear in
different areas of software production. The style of the first example is typical of
programming product specifications passed from a user to a software engineer, or from an
engineer to the leader of a production team. Use of direct quotation, as in the third
example, is quite proper for modules accepted as black boxes, where detailed
understanding of the insides would be rather difficult and would serve no immediate
purpose. Most local operational procedures for job control, assembly or loading are in
fact in this category.
In the Software Production Team organization, meta-programs are the particular task
orders given by the meta-programmer to the technicians for elaboration, that is, for the
purpose of creating the actual computer software fulfilling the intent of the orders. Since
a meta-program is just one step removed from a computer program, it must show
considerable detail, and may be closely related to programming languages. In this respect,
the second example may be representative. Differences between the informal description
of an algorithm (from which the second example was adapted) and a meta-program arise
because the meta-programs possess the properties of task orders. While an algorithm is
an option (one may take it or leave it), a meta-program embodies the decision that the
algorithm it represents is, in fact, the proper one for the problem at hand.
Meta-programs can be implementation specific and they may rely on local language.
Published algorithms, on the other hand, are always described in a global language.
The preparation of a detailed plan for a program before coding commences has long been
considered a good programming practice. The use of flowcharts, decision tables, HIPO
charts, or other Program Design Languages is often recommended (see, for instance,
[Metzger] [Horowitz] [Barry]).
The advice in the excellent style manual by [Kernighan-Plauger] reduces the issue
to its essence: "Write first in an easy-to-understand pseudo-language; then
translate into whatever language you have to use."
A meta-program is a flexible medium whereby the detailed design can be initially stated
and iteratively improved. It can also be used to document the program, as noted in
[Kernighan-Plauger]. Moreover, the completeness and correctness of meta-programs, and
therefore their documentation value, are enhanced by operational use during
implementation. It should be stressed, however, that the main purpose of meta-programs
is not to be a design or documentation aid, but to disseminate detailed design information
efficaciously. In particular, meta-programs generally omit the reasoning behind the
particular decisions. This is partly because, using only the local language already
introduced (Section 2.2), the reasoning might be difficult to state. The reasons may also be
irrelevant, obvious, and/or unimportant (Section 1.3).
The syntax and semantics of meta-programs are determined by conventions, which are
essentially administrative rules. Uncertainty about the value of the conventions is
absorbed when the team is organized; the meta-programmer and technicians can proceed
forthwith, assuming that others will comply with the rules. The stability of this
organization will depend on whether the rules are simple and unambiguous, and whether it is
easier to comply than not. Non-compliance should result in immediate calamity, which
amplifies the culprit's appreciation of the intrinsic, if temporarily maligned, merits of the
broken rule.
Probably the most basic convention is that technicians should precisely follow the
decisions in a meta-program. It is clearly easier to comply with this rule than to
embroil oneself in redundant decision making. If, the convention
notwithstanding, the technician changes a seemingly inconsequential decision, such
as the name of an object, the meta-programmer can point out the difficulties
which could be caused by such unilateral action: feedback communications would
become less efficient, other technicians might have already acted on the original
decision, and the meta-programs would have to be updated to retain their
documentation value. This, however, does not mean that the technicians cannot
influence the detailed design; they can always feed back their observations to the
meta-programmer, particularly if the meta-program is plainly in error.
It is significant that conventions need not involve special software aids. Conventions can
be adapted to existing circumstances: the computing environment, available utilities,
implementation language and so on. They can be adjusted as dictated by experience and
measurements to optimize the continuous production process. Exceptions can be made
whenever appropriate.
Conventions are also expected to improve productivity by simplifying or altogether
eliminating acts of decision making. Thus small excursions in the cost of implementing a
standard decision, relative to other options, are not necessarily of primary interest.
For selecting conventions, analogies with programming languages are very useful. In the
remainder of this chapter we shall explore how type declarations, type conversions, and
other programming-language-related extensions can simplify the writing of meta-programs.
2.4 Abstractions and Operations
The task of the meta-programmer is to prepare the detailed design of some software and
to put the design into an easily communicable meta-program. In this section we shall
describe how the well-known concept of type can be used to simplify the preparation of
meta-programs.
From the early high-level language concepts of integer and real types, there emerged the
modern software engineering view that types are classes of values associated with which
there are a number of operations which apply to such values [Dahl-Hoare] [Morris3].
The significance of this tenet is that it is truly language independent; indeed it is
applicable to high level languages as well as to machine languages or hardware
implementation. The term operation is to be interpreted broadly; it covers arithmetic and
other operators, assignment, subscripting, procedure calls or even peripheral input/output
operations, however they might be represented. The type of any value can be uniquely
identified by listing the operations the value takes part in. It is obvious that even in a
small program the number of different lists thus obtained will be greater than the number
of readily identifiable types such as integers and reals, and therefore new constructions
are necessary for the expression of the "excess" types.
While a new piece of software is being created, such an inspection of uses is infeasible,
and if the identification of type is desired, clairvoyance is called for on the part of the
designer. What needs to be predicted is: can the variable under consideration share all
operations with some other existing variable? If so, their types are the same; otherwise we
have a new type. The prediction process can be simplified by looking for differences in
the following properties of the variables compared:
cardinality of the class of values;
physical dimension (length, time, mass etc.), if a physical quantity is being represented;
unit of measurement (hours, seconds, words, bytes etc.);
origin of measurement (GMT, local time, starting at 0 or 1 etc.).
Any disagreement will exclude the possibility of sharing all operations. If they agree,
further investigations are necessary, of course.
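The screening test above can be stated as a small Python sketch. Here a hypothetical `Profile` record holds the four properties, and a disagreement in any one of them rules out a common type. Agreement proves nothing by itself: it only licenses the further investigation of operations that the text calls for. The sample values are illustrative assumptions.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass(frozen=True)
class Profile:
    """The four properties compared above; None means 'does not apply'."""
    cardinality: int
    dimension: Optional[str] = None   # length, time, mass, ...
    unit: Optional[str] = None        # hours, seconds, words, bytes, ...
    origin: Optional[str] = None      # GMT, local time, 0, 1, ...

def differences(a: Profile, b: Profile) -> List[str]:
    """Names of the properties on which a and b disagree."""
    return [f.name for f in fields(Profile) if getattr(a, f.name) != getattr(b, f.name)]

def may_share_type(a: Profile, b: Profile) -> bool:
    """False rules out a common type; True only means 'investigate the operations further'."""
    return not differences(a, b)
```

For example, two hour counts that differ only in origin (GMT versus local time) are reported as differing in `origin`, so they cannot be of the same type.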
The process of determining types is illustrated by the following examples:
Example 1.
Program for centering a card image ([Kernighan-Plauger] page 55): the non-blank text
on the input card shall appear centered on the output.
The method is to "read the input into the middle of a large array of blanks and
write out the appropriate part with the right number of blanks on each side".
This method was suggested by the availability, in FORTRAN, of certain operations
and the lack of other ones. The informal plan for the program is:
1. create array A containing 120 blanks
2. read a card image (80 columns) into the last 80 locations of the array
3. find positions L and R in the card of the leftmost and rightmost
non-blank characters defining the text "body" to be centered
4. get N, the number of blanks to precede the body
5. output 80 columns starting in the array so that the right number of
blanks precede the body
To find the types we examine the quantities appearing in the program. First, we
have A, an array of characters. The associated operations are: read and write 80
characters starting at a given index, and fetch and store a character C at index I.
This immediately introduces two new types: characters, which can be compared
for equality as well as stored in A; and indices to A, which can take part in loops
(incremented, decremented and compared) and, by definition, index any array
with the same type as A. Are L and R such index types? The program could be
written that way. However, the plan implies a conceptually simpler interpretation:
L and R are the familiar column numbers 1 through 80 on the punched card.
They form a new type, the number of different possible values (80) being
different from the cardinality of the index type (120). Column numbers can be
enumerated in loops, converted to indices by the operation "+40", and the
difference of two column numbers may be taken in computing N. The quantity N
belongs to yet another type, representing a count of columns. All of the integer
operations are defined for the count type; moreover, it can be added to or subtracted
from an index or column, yielding another index or column, provided only that no
overflow occurs.
Considering the simplicity of the problem, the number of different types may
seem rather large. However, extensions to the problem (to include left and right
flush formats) could be programmed using just the types introduced. Types
appear quickly, but their number stays almost constant as a program is expanded
with more operations on the basic object.
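The three scalar types of Example 1 can be written down with Python's `typing.NewType`, which is a close modern analogue of the painting discussed below: a `NewType` value is the underlying `int` at run time, and the distinction is only checked by a static checker such as mypy. The following is a sketch under stated assumptions, not the Kernighan-Plauger program. One assumption departs from the plan: the array is enlarged to 160 blanks (80 trailing blanks added) so that text near the right edge of the card can also be shifted left. With only 120 blanks, only text left of center could be moved.

```python
from typing import NewType

# Painted ints: identical representation, different sets of intended operations.
Column = NewType("Column", int)   # card column, 1..80
Index = NewType("Index", int)     # position in the array A
Count = NewType("Count", int)     # a number of columns

CARD_WIDTH = 80
OFFSET = 40      # column c is stored at index c + 40, as in step 2 of the plan
SIZE = 160       # assumption: 40 leading + 80 card + 40 trailing blanks would not suffice

def to_index(c: Column) -> Index:
    """The unique conversion from column numbers to indices ("+40")."""
    return Index(c + OFFSET)

def center(card: str) -> str:
    card = card.ljust(CARD_WIDTH)[:CARD_WIDTH]
    a = [" "] * (SIZE + 1)                  # step 1 (a[0] unused: indices start at 1)
    for c in range(1, CARD_WIDTH + 1):      # step 2: card occupies indices 41..120
        a[to_index(Column(c))] = card[c - 1]
    body = [Column(c) for c in range(1, CARD_WIDTH + 1) if card[c - 1] != " "]
    if not body:                            # blank card: nothing to center
        return card
    left, right = body[0], body[-1]         # step 3: L and R
    width = Count(right - left + 1)         # difference of columns gives a count
    n = Count((CARD_WIDTH - width) // 2)    # step 4: N blanks precede the body
    start = Index(to_index(left) - n)       # index minus count is an index
    return "".join(a[start:start + CARD_WIDTH])   # step 5
```

A left-flush "HELLO" comes out with 37 blanks before and 38 after. Mixing types, for example `to_index(n)` with a `Count`, runs unchanged in Python but is reported by a static type checker.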
Example 2.
In-core sort program TREESORT (Section 2.3). At least three types can be
associated with the quantities involved: items, which can be compared; the array of
items, M, which will be sorted with respect to the comparison using the operations
fetch and store item at some item index; and item indices. The latter can be
enumerated in loops and, in TREESORT, multiplied and divided by 2. The length n
of the array M is also of the item index type. This can be easily seen: i, in the
outer block, is clearly an item index, and both i and n appear as the second
parameter to the procedure siftup; therefore they are of the same type. One can
interpret n as the index of the last item, since indexing starts with 1 in this case.
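For comparison, here is a Python transcription of TREESORT with the item and item index types of Example 2 made explicit via `typing.NewType`. The transcription is our own, and the slot `m[0]` is an assumption that keeps indexing 1-based as in the ALGOL text. Only item indices are doubled, incremented and compared with n, while items are only compared and moved, as the analysis predicts.

```python
from typing import List, NewType

Item = NewType("Item", int)            # values being sorted: only compared and moved
ItemIndex = NewType("ItemIndex", int)  # 1-based positions in M; n is one of these

def siftup(m: List[Item], i: ItemIndex, n: ItemIndex) -> None:
    """Move m[i] down into the heap-ordered subtree of m[1..n] rooted at i."""
    copy = m[i]
    while True:
        j = ItemIndex(2 * i)
        if j > n:
            break
        if j < n and m[j + 1] > m[j]:
            j = ItemIndex(j + 1)
        if m[j] <= copy:
            break
        m[i] = m[j]
        i = j
    m[i] = copy

def treesort(m: List[Item], n: ItemIndex) -> None:
    """Sort m[1..n] ascending in place; m[0] is an unused slot."""
    for i in range(n // 2, 1, -1):          # for i := n ÷ 2 step -1 until 2
        siftup(m, ItemIndex(i), n)
    for i in range(n, 1, -1):               # for i := n step -1 until 2
        siftup(m, ItemIndex(1), ItemIndex(i))
        m[1], m[i] = m[i], m[1]             # exchange(M[1], M[i])
```

As in the original, the first loop stops at 2, and the first call `siftup(m, 1, n)` of the second loop completes the heap before the largest item is exchanged into place.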
These examples show that the idea of types is independent of how the objects belonging
to the types are represented. All scalar quantities appearing above (column numbers,
indices and so forth) could be represented as integers, yet the set of operations defined
for them, and therefore their types, are different. We shall denote the assignment of
objects to types, independent of their representations, by the term painting. When an
object is painted, it acquires a distinguishing mark (or color) without changing its
underlying representation. A painted type is a class of values from an underlying type,
collectively painted a unique color. Operations on the underlying type are available for
use on painted types, as the operations are actually performed on the underlying
representation; however, some operations may not make sense within the semantics of the
painted type or may not be needed. The purpose of painting a type is to symbolize the
association of the values belonging to the type with a certain set of operations and the
abstract objects represented by them.
The column numbers of Example 1, for instance, are painted integers. Indeed, it
is impossible to find any other properties of column numbers which might be
considered essential. The fact that column numbers belong to the subrange type
[Hoare] of integers in the closed interval [1:80] is certainly neither unique nor
invariant if other subrange types over the same interval, or conversions to other
card formats with, say, 90 columns, are considered. The operations of the column
number type (loops, +40 and difference) are simply inherited from the underlying
integer type.
Any type can be painted, and painted types can take part in the construction of aggregate
types, such as arrays and records, providing an additional degree of type discrimination.
Arrays are the simplest representations of mappings from integers (often restricted to a
subrange) to array elements of some possibly different type ([Hoare] page 115). The
mapping operation is called subscripting. It yields a reference to an element, given the
subscript, an integer value. Now, since painted types can inherit the operations of the
underlying types, values of any painted type based on integers or integer subranges could
also be used as subscripts. If the domain type is distinguished by painting, the type of an
array should be properly characterized by the pair of domain and range types instead of
just the range type alone.
Records are aggregate types differing from arrays in the following respects: the elements
are called fields, the types of the fields need not be the same, and the elements are named
by a fixed set of field names. Records are used to collect quantities of arbitrary types for
some common purpose: a record may contain the properties of a complex object, the local
variables of a block, or the parameters of a procedure instance [Lampson-Mitchell]. In the
latter two cases, the common terms for the field names are variable and formal parameter
names, respectively. References to fields are obtained using the field selection operation,
which takes a record and a field name as arguments. For variables, parameters, and
sometimes for other fields [Wirth2], the record is specified implicitly.
A number of advantages accrue from precise type specifications. Firstly, type checking
can be more thorough.
In Example 2, the complete description of the type of the array to be sorted, M, is
{array with domain item index and range item}, instead of {integer array} or
even {item array}. Specifying the array type this way excludes incorrect
statements of the form:
    M[copy] := M[j];
where both copy and j are represented as integers, but one is an item and the
other is an item index. The following statements also contain type errors, not
otherwise discernible:
    M[j] := j;   j := M[j];
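Checking the domain and range pair can be sketched in Python at run time. Here painting is modelled (an assumption of this sketch) by subclassing `int`, which keeps the integer representation and inherits the integer operations, and a hypothetical `PaintedArray` class rejects subscripts of the wrong domain type and stored values of the wrong range type. Python variables are untyped, so the last error above, `j := M[j]`, is outside what this sketch can catch.

```python
class Item(int):
    """Painted int: a value being sorted."""

class ItemIndex(int):
    """Painted int: a position in M."""

class PaintedArray:
    """An array typed by its (domain, range) pair, checked at run time; 1-based."""

    def __init__(self, domain: type, range_: type, values: list) -> None:
        self.domain, self.range = domain, range_
        self._cells = [self._check(v, range_) for v in values]

    @staticmethod
    def _check(value, wanted: type):
        # Exact type match: a plain int, or an int painted another color, is refused.
        if type(value) is not wanted:
            raise TypeError(f"expected {wanted.__name__}, got {type(value).__name__}")
        return value

    def __getitem__(self, i):
        return self._cells[self._check(i, self.domain) - 1]

    def __setitem__(self, i, value) -> None:
        self._cells[self._check(i, self.domain) - 1] = self._check(value, self.range)
```

With `M = PaintedArray(ItemIndex, Item, [Item(4), Item(2)])`, the assignment `M[ItemIndex(1)] = M[ItemIndex(2)]` is accepted, while `M[Item(2)]` (the `M[copy]` error) and `M[ItemIndex(1)] = ItemIndex(1)` (the `M[j] := j` error) raise `TypeError`.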
The second advantage is related to the first: the set of possible (or legal) uses of some
quantity is small, and it is implied just by the type of the quantity. This leads us to the
idea of coercion [Wijngaarden], or implicit type conversions. We define any operation
which is uniquely determined (within some domain of discourse) by its operand and
result types as a type conversion. It is then expected that many operations can be
expressed implicitly, just by mentioning the types of the operands and the result.
An early application of coercion was the automatic conversion of integers to reals
and vice versa. The former operation (floating) is unique; the real to integer
conversion, however, can be defined in truncated and rounded versions. By
convention, only one of these (usually rounding) is considered for coercion.
The unique conversion operation from column numbers to indices of Example 1
is "+40". Using coercion, the illegal expression A[L] could be transformed into
the correct A[L+40], where L is a column number and A demands an index as
subscript. In Example 2, subscripting into the array M converts an item index into
an item. The illegal expression j > copy could be coerced into M[j] > copy, since
the relations are defined only for like types and there is no conversion from items
to item indices.
The conversions between painted types and their underlying types may be
considered as the trivial operations painting and unpainting. Thus, in i := 1, the
integer constant 1 is coerced into an index type by the {paint index type}
operation. The inheritance, by painted types, of the operations of the underlying
type could also be explained as a conversion of the painted type, by unpainting,
followed by the original operation. For instance, terms of the relation M[j] >
copy may be first coerced into integers, by unpainting, and then the ">" operation
defined for integers can be applied.
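Coercion as defined here (a conversion selected solely by its operand and result types) can be sketched in Python as a registry keyed by the pair of types. The registry, `coerce` and `unpaint` are hypothetical names for this illustration, and the painted types are again modelled as `int` subclasses. The "+40" conversion and the painting of an integer constant are registered. The item index to item conversion by subscripting M is left out, because it would also need the array itself.

```python
from typing import Callable, Dict, Tuple

class Column(int):
    """Painted int: card column 1..80 (Example 1)."""

class Index(int):
    """Painted int: position in the array A (Example 1)."""

class Item(int):
    """Painted int: a value being sorted (Example 2)."""

class ItemIndex(int):
    """Painted int: a position in M (Example 2)."""

# Unique conversions, keyed by (operand type, result type).
CONVERSIONS: Dict[Tuple[type, type], Callable[[int], int]] = {
    (Column, Index): lambda c: Index(c + 40),   # the "+40" of Example 1
    (int, Index): lambda k: Index(k),           # painting an integer constant
}

def coerce(value: int, wanted: type) -> int:
    """Return value as type `wanted`, applying the registered conversion if one exists."""
    if type(value) is wanted:
        return value
    conversion = CONVERSIONS.get((type(value), wanted))
    if conversion is None:
        raise TypeError(f"no coercion from {type(value).__name__} to {wanted.__name__}")
    return conversion(value)

def unpaint(value: int) -> int:
    """The trivial conversion back to the underlying integer type."""
    return int(value)
```

So `coerce(Column(5), Index)` yields the index 45, as in A[L] becoming A[L+40]. `coerce(Item(3), ItemIndex)` fails because no conversion from items to item indices is registered.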
Note that a reference to a variable is also an operation: it is the selection of a field from
an implicit record, the local frame of a procedure or a block [Lampson-Mitchell]. If the
type of the variable is unique within its scope, the reference can be made, in fact, by
coercion from that record. Since the record is implicit, it is sufficient to demand the
type, and the variable is determined without any explicit naming. One way the demand
can be made is by omitting some arguments of an over-determined type conversion
operation which is uniquely identified by the types of the arguments provided. The
operation will then demand the remaining arguments by their types. Alternatively, an
operation can be specified explicitly, and then the omission of any argument will create a
demand for a value of some type.
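Demanding a variable by its type alone can be pictured with a dictionary standing in for the implicit record of local variables. The hypothetical `demand` function below returns the unique entry of the demanded type and fails when the type is not unique in the scope. That failure case is exactly where a name (or minor qualifier) would be needed again.

```python
from typing import Any, Dict

class Column(int):
    """Painted int: a card column."""

class Count(int):
    """Painted int: a number of columns."""

def demand(frame: Dict[str, Any], wanted: type) -> Any:
    """Select, by type alone, the unique variable of type `wanted` in `frame`.

    `frame` stands in for the implicit record holding a block's local variables.
    """
    matches = [value for value in frame.values() if type(value) is wanted]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} variables of type {wanted.__name__} in scope")
    return matches[0]
```

In a frame holding one column and one count, `demand(frame, Count)` finds the count without its name being mentioned. Adding a second column makes `demand(frame, Column)` ambiguous.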
The use of coercions necessarily reduces the error checking potential of types, because an
error may be inadvertently coerced into a legal, if meaningless, expression. An explicit
signal when coercion is expected can prevent this kind of mistake. Another source of
error is introduced when a number of possible conversions exist and, by convention, one
is designated for coercions. The intent of what is written may be incongruous with this
convention.
The connection to meta-programming is now evident: coercions can make the descriptions
of operations and their operands concise. The expressive power of coercions is derived
from the resolution of types; more detailed type specifications mean more opportunities
for coercions.
In summary, we have shown how to increase type resolution by painting. The color of a
painted type represents the association of the type with operations. Painted types can be
clustered in arrays and records; the element selection operations of subscripting and field
selection can be thought of as type conversions. When the combination of operand and
result types is unique, a conversion operation can be implicit, and it is called a coercion.
Moreover, references to simple quantities, such as variables, can also be obtained by
coercion if the quantity is considered to be a field in some implicit record. The purpose
of using coercions is to make the part of meta-programs describing operations concise.
2.5 Naming of Types and Quantities
Deciding on the name of a quantity is the prototype of decisions which are unimportant
in themselves, but appear frequently enough to have an impact on productivity.
Considering the narrowly defined requirements of productivity, name creation should be
speedy, preferably automatic (automobile license plates are such names). Names should
be short to minimize writing or typing (or keypunching) time, to reduce the number of
mistyped names and, perhaps, to stay within bounds of existing limitations. Names of
extreme brevity or extreme similarity should be avoided, however; otherwise simple
mistakes may transform one valid name into another, rendering some checks, such as
declarations, ineffective. Lastly, names should assist in the association of the name and
the named quantity; that is, they should be mnemonic.
The most common mnemonic device is to express by the name an important property of
the named quantity. The association is readily made in both directions: seeing the name,
one learns an important property of the quantity which, in turn, leads to other
properties. Conversely, given the quantity, its important properties are known, hence the
name is suggested.
In the business-oriented language COBOL, there is a standard definition for the
quantity larger than all others in the collating sequence. The name given for this
quantity is HIGH-VALUE. This name is mnemonic because it reflects an
important property of the quantity represented.
In Example 2.3.3, the quantity named copy is indeed the copy of M[i]. It requires
deep understanding of the algorithm to see why the property of being a copy of
something else is important in this case.
A number of problems arise with this practice: a quantity may not have any significant
properties, or it may have so many that it is difficult to remember which one was
chosen. Note that the latter problem mostly affects the association in the direction from
the quantity to the name. In other cases, the important property may be difficult to
express concisely. Yet other quantities share their most important property, complicating
the association from the name to the quantity.
These problems can be exhibited by naming, respectively: the loop variables in
TREESORT, giving rise to the ubiquitous i; the main hash table of variable
identifiers in a compiler, which may be the MainTable, HashTable and so on; the
stack reference to the lexicographically enclosing block in an ALGOL runtime
system; or the special value used as a "high delimiter" in COBOL. The actual name
defined for the last quantity is UPPER-BOUND, easily confused with HIGH-VALUE.
These problems considerably complicate the naming decisions. The selection of the
property to be expressed by the name takes time, especially if shorter names are sought.
Nevertheless, it would be a mistake to abandon mnemonic names, because the
development of local languages depends mostly on the ease of learning new names.
We shall simplify the naming process by introducing a compound naming scheme: we shall select a single property, applicable to all quantities, for the major qualifier part of all names. This part will provide enough resolution to identify a single quantity in most cases, or at least to reduce the number of quantities matching the description to a few. In the latter cases, a second, minor qualifier property will be chosen appropriately to provide unique identification of the quantity. The simplifications lie in the elimination of explicit decision-making in some cases and the substitution of a simpler decision for a more difficult one in others. The selection of the minor qualifier is simple because the number of quantities to be distinguished is small: practically any property would do. In view of the conclusions of Section 3.3, the property for major qualification will be the quantity's type.
There are many examples of compound naming and using types as qualifiers in programming languages and systems. The early algorithmic language FORTRAN, for instance, encoded the types of variables into the first letter of their names: ICOUNT was manifestly an integer, RSUM a real, and so on. Actually, this convention was meant to assist the compiler in assigning the proper representation to the variables.
In ALGOL-W [Hoare-Wirth] and SNOBOL4 [Farber-Griswold-Polonsky], as well as in other languages, the procedure creating a new instance of a record type is named the same as the record type itself. Since this procedure is the only object named by the record type, no minor qualifiers are necessary.
Many time-sharing executives (for example SDS-940 or TENEX) include a type-identifying extension in all file names as a minor qualifier. Thus the source text for some program may be stored in file PROG.TXT and the compiled binary version of the same program might be called PROG.BIN. The extensions denote true types, since they determine the operations which may be performed on the files: a text file may be edited or compiled and a binary file may be run.
For conciseness and ease of creation, primitive types and some of the painted and aggregate types will be described by two- or three-letter tags, abbreviating the spoken, informal type name. For the other types, the description will be constructed from the descriptions of constituent types. The construction schema may be standard, or it may be defined when needed. The schema for arrays, probably the most important one, can be stated thus: let X, Y be the descriptions of the domain and range of the array, respectively; the description mpXY will be used for the array type. The reason for short tags is now evident: longer tags would make unwieldy constructions.
Let us assign tags to the types of Example 2.4.1 as follows: use co for column numbers, ci for character indices and ch for characters. The major qualifier for the array A of characters will be mpcich.
Qualifier construction schemes are not restricted to aggregate types. Consider, for example, the difference type dX, generated by the arithmetic differences of any pair of objects of type X. A comprehensive list of useful schemes is given at the end of this section. Note that there are no record construction schemes on the list: it appears that record types are independent of the number and types of their fields and are best described by new tags. This is supported by the following argument: fields of a record represent properties of an abstract object. The reason for adding a new field, representing another property of the same object, is to extend the set of operations or to make existing operations more efficient. This action will not change the type of the record.
Let X be a type, as determined by a set of operations. If this set is changed, the new set determines type X'. In principle, X' is not identical to X. However, since after the change there remain no objects of type X, we may safely claim that the types are the same.
To ensure the sufficiency of the resolution, types should first be distinguished by painting as described in the previous section. If groups of identically typed objects remain, strongly related objects can be organized into arrays, and new scopes can be introduced to separate the more loosely related ones. New scopes are created by declaring records or procedures, for example. Fields need to be identified only within a record, and parameters within a procedure instance. These steps are also good programming practice; hence in a properly constructed program which uses painted types, type resolution is probably as good as it can be. Conversely, unseemly type resolution may be an indication of poor design. We shall return to this point later.
In spite of proper specification of types and scopes, in some cases multiple values in the same scope, belonging to the same type, need to be distinguished, ostensibly by minor qualifiers. Since the success of the compound naming scheme depends on the sparing use of minor qualifiers, the probability of such an event should be estimated by enumerating the reasons for distinguishing values. Whether a distinguished value is a constant or is given by reference to a variable or array element possessing it is largely irrelevant in this case. In either case, a potential for conflict is present.
In the case of arrays, values of the index type identifying the distinguished array elements must, in turn, be distinguished. Aggregation of values into arrays can eliminate only unnecessary names. Actually, there is an independent advantage to aggregation: operations which need to enumerate all values are simplified.
Constant values do not require names if written as constants, such as 3.14 or 'string'. It is good programming style, however, to treat constant values as potential variables, in which case the value has to be named.
Values within certain types must be individually distinguishable; in particular, a large number of procedures, Boolean variables (flags) and values of an enumerated type [Hoare] may conceivably appear in some scope. Compound naming offers some help, in that the selection of the minor qualifier is indeed simpler if distinctions need to be made within the type only, rather than among all objects within the scope of the type.
In many types, a certain value is distinguished to represent the "empty" or nil object. If the values of a type are ordered, the min and max values are often distinguished. These cases can be handled by the standard minor qualifiers listed below.
Lastly, identically typed variables, parameters or fields may appear in the same scope. Assuming a stochastic model of random assignment of types to quantities, the expected number of minor qualifiers, M, is a function of the number of types, T, and the number of quantities per scope, Q. Contours of this function are plotted in Figure 7. The plot reveals that for T ≥ Q, the probability that three minor qualifiers will suffice is better than 80%. Measurements by [Geschke] indicate that for 82% of scopes, Q ≤ 8. With the expedient trick of distinguishing between parameters and local variables by a prefix (see below), Q may be halved. The trivial examples in Section 3.3 show that T is likely to be at least 4 and probably much larger.

[Figure 7 (plot not reproduced): contours of P(T, Q, M) over the number of types T and the number of quantities per scope Q; labeled contours: M = 2, > 90%; M = 3, > 90%; M = 3, > 80%.]

$$P(T,Q,M) = \frac{C(T,Q,M)}{T^{Q}}, \quad \text{where}$$
$$C(T,Q,M) = \begin{cases} 1 & \text{if } T = 1 \text{ and } Q \le M,\\ 0 & \text{if } T = 1 \text{ and } Q > M,\\ \displaystyle\sum_{i=0}^{\min(M,Q)} \binom{Q}{i}\, C(T-1,\, Q-i,\, M) & \text{otherwise.} \end{cases}$$

Figure 7. Contours of the function P(T, Q, M): the probability that a selection, with replacement, of size Q from T items contains no more than M repetitions of any item.
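The recursion behind Figure 7 can be evaluated directly, which lets a reader reproduce the contour readings. The following is a minimal sketch, assuming the term inside the sum is the binomial coefficient (Q choose i) and the upper limit is min(M, Q); under that reading C(T, Q, M) counts the selections in which no item occurs more than M times, and a brute-force enumeration cross-checks it for small cases. The function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def count_ok(t: int, q: int, m: int) -> int:
    """C(T, Q, M): the number of length-q selections (with replacement) from
    t items in which no item occurs more than m times."""
    if t == 1:
        return 1 if q <= m else 0
    # Give i of the q positions to one item, fill the rest from t - 1 items.
    return sum(comb(q, i) * count_ok(t - 1, q - i, m)
               for i in range(min(m, q) + 1))


def p_suffice(t: int, q: int, m: int) -> Fraction:
    """P(T, Q, M) = C(T, Q, M) / T**Q, as an exact fraction."""
    return Fraction(count_ok(t, q, m), t ** q)


def p_brute_force(t: int, q: int, m: int) -> Fraction:
    """Cross-check by enumerating all t**q selections (tiny t, q only)."""
    ok = sum(1 for sel in product(range(t), repeat=q)
             if all(sel.count(x) <= m for x in range(t)))
    return Fraction(ok, t ** q)


if __name__ == "__main__":
    for t, q in [(4, 4), (8, 8), (16, 8)]:
        print(f"T={t:2d} Q={q:2d} M=3: P={float(p_suffice(t, q, 3)):.3f}")
```

Exact fractions avoid rounding in the T**Q denominator; the memoized recursion stays cheap for the ranges plotted, whereas the enumeration grows as T**Q.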
Experience suggests that the property first considered for minor qualification should be the quantity's position in a spatial or temporal order. Thus the values represented are often the first or last in some interval, or they are initial, old, new, previous, current or next in temporal sequence.
The significance of compound naming is enhanced by additional benefits. The presence of the type in every name is extremely valuable for coercions, type checking and general documentation. Some type checking can be performed even without detailed knowledge of the tags or operations by a form of "type calculus", not unlike the dimensional checks of physical equations:
Let X and Y denote arbitrary tags. Clearly, the types in the expression mpXY[X] ← Y are consistent. Similarly for mpXdY[X] ← mpXY[X] - Y.
The type calculus is also useful for defining type construction schemes:
Given an arbitrary tag X, define dX to be the type such that X + dX is also an X.
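Because the calculus works on tags alone, it can be mechanized. A minimal sketch, assuming types are modeled as nested tuples (a representation chosen here for illustration, not taken from the text) and reading the two example expressions as assignments; `subscript`, `add`, `sub` and `assign` are hypothetical helpers that apply only the rules stated above: mpXY[X] is a Y, X + dX is an X, and the difference of two X values is a dX.

```python
from typing import Tuple, Union

# A type is a primitive tag such as "X", "ch" or "co", or a constructed type:
# ("mp", X, Y) for an array with domain X and range Y, ("d", X) for dX.
Type = Union[str, Tuple]


class TagMismatch(Exception):
    """An expression whose types violate the type calculus."""


def mp(x: Type, y: Type) -> Type:
    return ("mp", x, y)


def d(x: Type) -> Type:
    return ("d", x)


def subscript(array: Type, index: Type) -> Type:
    # mpXY[X] is a Y; any other index type is an error.
    if isinstance(array, tuple) and array[0] == "mp" and array[1] == index:
        return array[2]
    raise TagMismatch(f"{array!r} subscripted by {index!r}")


def add(a: Type, b: Type) -> Type:
    # X + dX is an X (the defining property of dX).
    if b == d(a):
        return a
    raise TagMismatch(f"{a!r} + {b!r}")


def sub(a: Type, b: Type) -> Type:
    # The difference of two X values is a dX; X - dX is again an X.
    if a == b:
        return d(a)
    if b == d(a):
        return a
    raise TagMismatch(f"{a!r} - {b!r}")


def assign(target: Type, value: Type) -> None:
    # An assignment is consistent when both sides have the same type.
    if target != value:
        raise TagMismatch(f"{value!r} assigned to {target!r}")


# The two examples from the text, with X and Y as arbitrary tags:
X, Y = "X", "Y"
assign(subscript(mp(X, Y), X), Y)                                  # mpXY[X] <- Y
assign(subscript(mp(X, d(Y)), X), sub(subscript(mp(X, Y), X), Y))  # mpXdY[X] <- mpXY[X] - Y
```

As with dimensional analysis, the checker needs no knowledge of what the tags mean: subscripting mpcich with a co, for instance, is rejected purely from the names.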
The ability to identify the types of objects may be a major reason for following the conventions in situations where compound naming is otherwise awkward. Consider the enumerated type co {coRed, coYellow, coGreen}. The choice of names could be considered inferior to the straightforward color {red, yellow, green}, were it not for the type indication. Besides, making the decision to make an exception is probably more expensive than the value of the difference.
A different kind of check is made possible by associating semantics with the standard minor qualifiers. For example, last may be defined to mean the upper limit in a closed interval. Now, if X and Xlast are to be compared as part of testing whether X belongs to an interval, there will be no doubt that the proper operation is X ≤ Xlast as opposed to X < Xlast. By rigidly adhering to the standard semantics for the minor qualifiers, many of these common "off-by-one" mistakes [Kernighan-Plauger] can be avoided.
A summary of standard major and minor qualifiers is given in the following table. (X and Y denote arbitrary tags throughout. Note that whenever some operation is used in a definition, the applicability of the operation to instances of the actual operand types is assumed.)

pX      pointer to X. Let $ be the indirection operation; $pX is then an X.
aX      address of X. $aX is an X.
cX      counts instances of X (not necessarily all instances). For example, cco could be a counter counting colors which appear in a graph (assuming the type definition co above).
dX      first difference of X. X + dX is an X.
mpXY    array (map) with domain X and range Y. mpXY[X] is a Y.
rgX     short for mpiXX, array with domain iX and range X.
iX      domain of rgX.
lX      length of an instance of X in words (this construction is useful in system programming languages).
tX      temporary X, the same type as X. A somewhat inelegant but efficacious device to distinguish between parameters and local (temporary) variables in procedures, thereby increasing major qualifier resolution.
Xmin    minimum X value: for all X, X ≥ Xmin.
Xmax    maximum X value: for all X used as a subscript, X < Xmax. We note that if Xmin = 0, Xmax is the cardinality of the domain of mpXY. Xmax = 0 means the domain is empty.
Xmac    current maximum X value: when X is the domain of some array which is used as a stack, max may be used to denote the allocated size of the array while mac keeps track of the portion actually used, acting as the top-of-stack pointer. For all X used as a subscript, X < Xmac; Xmac ≤ Xmax. Xmac = 0 means the stack is empty.
Xfirst  first X value in some closed interval. For all X in the interval, X ≥ Xfirst.
Xlast   last X value in some closed interval. For all X in the interval, X ≤ Xlast. If the empty interval is allowed, it is represented by Xlast < Xfirst.
Xnil    distinguished X value to represent the empty instance. May be used for checking equality or inequality only.
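The stack reading of mac and max can be illustrated with a small hypothetical class; the class and method names are assumptions made for the example, while the invariants in the comments are the ones given in the table (Xmac ≤ Xmax, valid subscripts below Xmac, Xmac = 0 for an empty stack).

```python
class StackOfCh:
    """Hypothetical fixed-size stack of characters using standard qualifiers.

    rgch is the allocated storage and ichMax its allocated size; ichMac is
    the portion actually used (top of stack). Invariants from the table:
    0 <= ichMac <= ichMax, every valid subscript ich satisfies ich < ichMac,
    and ichMac == 0 means the stack is empty.
    """

    def __init__(self, ichMax: int) -> None:
        self.ichMax = ichMax
        self.rgch = [""] * ichMax
        self.ichMac = 0

    def push(self, ch: str) -> None:
        if self.ichMac >= self.ichMax:
            raise OverflowError("stack full: ichMac == ichMax")
        self.rgch[self.ichMac] = ch
        self.ichMac += 1

    def pop(self) -> str:
        if self.ichMac == 0:
            raise IndexError("stack empty: ichMac == 0")
        self.ichMac -= 1
        return self.rgch[self.ichMac]

    def check(self) -> None:
        # Memoryless consistency check derived from the qualifier semantics.
        assert 0 <= self.ichMac <= self.ichMax <= len(self.rgch)
```

Since the stack's semantics are carried by the qualifiers, the `check` method is almost a transcription of the table entry.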
2.6 Debugging as an Organized Activity
Since the design and creation of program text include only manual checks of correctness, it seems unavoidable that this intermediate product will contain errors. The process of localizing and removing the errors is termed debugging. Other related terms are testing and integration. The former denotes especially the generation of a range of stimuli and checking the corresponding responses in an attempt to uncover errors. The inclusion of integration in this category reflects the recognition that many errors are introduced when already debugged components are combined. Integration, thus, is in the midst of, and all but indistinguishable from, the debugging activity.
Data published in [Boehm] show that 30% to 50% of the total software cost is likely to be spent on debugging. There are some reasons to believe that meta-programming will reduce the number of errors in the initial program text and thereby simplify the debugging problem. The logic of all software will be scrutinized and understood by at least two persons: the meta-programmer and the technician. The naming conventions described in Section 2.5 provide additional opportunities for checking operator and operand compatibility. Nonetheless, without mechanical checks of semantic correctness (considerations of which have been excluded; for a bibliography see [Deutsch]), debugging could remain a serious problem, especially in view of the expected increase in the production rate. Consistent with the plan laid down in Section 2.2, we shall concentrate on the question of optimal organization while assuming the availability of realistic tools to assist debugging.
High debugging productivity means that individual errors are made apparent, localized and removed quickly. Given the volume of activity, it is reasonable to assume that these steps will be performed by the technicians. The nature and extent of the meta-programmer's contribution is a key problem.
The first evidence of a software error, the error indication, may be incorrect termination (including failure to terminate), excessive use of resources, incorrect output, or an error message. The actual error, the cause of the error indication, is typically removed from the locus of indication both in space and time.
The plausibility of this effect can be seen as follows. An error indication is a coincidence of a statement capable of making the indication (trap, loop or output) with the occurrence of erroneous operands which actually cause the indication. Assuming uniform distributions, the probability of this coincidence occurring in the vicinity of the error is low. The situation is complicated by statements which depend on erroneous data but, instead of giving an indication, propagate the error by producing erroneous results. The avalanche resulting from error propagation increases the probability of early indication, but it also tends to destroy evidence and generally frustrate analysis.
Frequent checks of the reasonableness of the data passing through checking interfaces also improve the chances of early error indication. This method, however, is limited in its applicability. If data about to be used is checked, for example in the case of dynamic bounds checks of array subscripts, the interface mainly serves to prevent error propagation and to give an earlier and more controlled indication than the one which would have happened otherwise. Checks of results from operations are rare, because they would be but restatements of what has been done immediately preceding.
When, in a large system, a reference count of a certain class of pointers gets fouled up, that is usually not the fault of the procedures responsible for creating or deleting pointers, which unconditionally increase or decrease the count. On the other hand, the procedure which does inconsiderately smash a pointer or the reference count itself is not likely to include any checks against that particular form of unexpected behavior. The indication of the error could be given by an interface check before a pointer is deleted, refusing to decrement a zero count. This indication would convey very little information about the time and place of the actual error.
We can conclude that, while interface checks can be valuable, the problem of localizing a large number of errors on the basis of scant information must be solved. Localization is often approached as if it were a puzzle of the form: what could cause the observed error indication? The solution space (the set of possible answers to this problem) is extremely large, considering the number of possible immediate causes first, then what could cause those, and so on. A further complicating factor is that the reasoning involved must go beyond the domain of the abstractions and operations of the program, since the events reasoned about do not necessarily take place in a correct environment. Even in a well protected high-level language environment, an error will cause a transition from the domain of the program into a more complex domain where the behavior of objects is constrained only by the most complete definition of the language.
Most languages do not have iron-clad protection. In such cases, or if the error is in the language processor, execution after some errors is constrained only by the definition of the virtual or real machine. If the error is caused by operating system or hardware malfunction, the constraints can be even more obscure.
These observations suggest that concentration on the post-error regime, including the error indication itself, may be a mistake; instead, the question to be answered should be: at what time does the program state change from correct to incorrect? The solution space is a trace: a list of the program statements as they were executed. The important property of this space is not its size, but that it is ordered, and therefore an efficient binary search can be used to find the point of transition from a correct to an incorrect state.
A binary search is performed as follows: consider the points of the last good state and the earliest bad state. Initially, these are at the start of the run and at the error indication, respectively. Choose a new probe point in between and decide whether the state is correct there. If so, we have a later good state; otherwise, an earlier bad state. Update the points accordingly and repeat. The search terminates when the points straddle an erroneous statement, or a small area wherein the error may be found by inspection (Figure 8).

[Figure 8 (diagram not reproduced): numbered probe points along the execution trace, which is divided into a region labeled "state is correct" and a region labeled "state is incorrect".]

Figure 8. Localization of programming error by binary search. Probes 2, 3 and 6 found the state correct; 1, 4 and 5 found it incorrect.
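The search itself is the ordinary bisection of an ordered sequence. A minimal sketch, assuming a hypothetical probe `state_ok(i)` that obtains the state after step i of the trace (by re-execution or otherwise) and decides whether it is correct; both operations are discussed below, and the probe here is only a stand-in for them.

```python
from typing import Callable, Tuple


def localize_error(iGood: int, iBad: int,
                   state_ok: Callable[[int], bool]) -> Tuple[int, int]:
    """Binary search for the correct-to-incorrect transition in a trace.

    iGood: a step after which the state is known to be correct (start of run).
    iBad:  a step after which the state is known to be incorrect (indication).
    state_ok(i): hypothetical probe reporting whether the state after step i
    is correct.
    Returns adjacent (iGood, iBad): the statement executed at step iBad turned
    a correct state into an incorrect one.
    """
    assert iGood < iBad
    while iBad - iGood > 1:
        iProbe = (iGood + iBad) // 2
        if state_ok(iProbe):
            iGood = iProbe      # a later good state
        else:
            iBad = iProbe       # an earlier bad state
    return iGood, iBad


# Usage: a simulated trace of 1000 steps whose first bad state follows step 37.
probes = []


def probe(i: int) -> bool:
    probes.append(i)
    return i < 37
```

Calling `localize_error(0, 1000, probe)` needs at most ten probes, since each one halves the interval, independent of how far the indication lies from the error.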
Two operations are essential in this scheme: exhibiting the program state at the chosen point, and deciding whether a state is correct or not. In contrast with the "puzzle" approach, the definitions of the abstractions and operations of the program provide a sufficient basis for determining the latter. A number of possible implementations for both operations will be described below, ranging from manual procedures to others requiring extensive preparations and programming. By the term debugging strategy, we shall mean the choices among the possible implementations. The execution of the search schema, including the choice of intermediate probe points, we shall call debugging tactics. This distinction will be used for assigning roles to the meta-programmer and the technicians, respectively.
The simplest way to determine the correctness of a state is by manual inspection of some representation of the state. The representation may be a uniform octal or hexadecimal dump of the bits comprising the state, test output, or a stored binary image interrogated by interactive means.
It is important that the representation be adequate for determining the correctness of the data structures comprising the state. Let R be a transfer function as defined by [Morris2], such that for some W and for all x of some type: W(R(x)) = x. R is then adequate for the given type. Octal dumps or equivalent interactive tools are clearly adequate for all types. However, inspection is much simplified if an R transfer function, the test print procedure, is written for every type to produce detailed textual images for values of that type, with fields clearly labeled and formatted according to their underlying type (see Section 2.7.7).
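A test print procedure for a record type can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical record `Entry` and using Python dataclasses. It prints one labeled line per field, and a matching reader `read_back` plays the role of W, so that W(R(x)) = x can be checked directly, which is the adequacy condition above.

```python
import ast
from dataclasses import dataclass, fields


@dataclass
class Entry:
    """Hypothetical record type: one entry of a chained hash table."""
    key: str
    iNext: int   # index of the next entry on the chain; -1 is nil
    cRef: int    # count of references to this entry


def print_test(x) -> str:
    """Test print procedure (an R transfer function): the record's type name,
    then one labeled line per field, each formatted with repr()."""
    lines = [type(x).__name__]
    for f in fields(x):
        lines.append("  %s = %r" % (f.name, getattr(x, f.name)))
    return "\n".join(lines)


def read_back(text: str, cls):
    """The inverse W for print_test output, so W(R(x)) == x holds and R is
    adequate for this type."""
    values = {}
    for line in text.splitlines()[1:]:
        name, _, literal = line.strip().partition(" = ")
        values[name] = ast.literal_eval(literal)
    return cls(**values)
```

Formatting each field with `repr` is what keeps the printout both readable and invertible: strings stay quoted and escaped, so a key containing spaces or newlines still occupies a single labeled line.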
Note that it is simpler to show that the state is wrong than that it is right: a demonstration of a single inconsistency is sufficient in the former case, while the latter involves universal quantification, since consistency must be shown for all assertions characterizing a correct state. This suggests that the inspections start by looking for inconsistencies. The known earliest bad state can give a valuable hint as to where and what to look for. Problems arise if no inconsistencies are found this way.
One possibility is to accept the state as provisionally correct if it does not contain the inconsistency of the earliest bad state. The search will then converge either on the error itself, or on an instance of error propagation, in which case a new hint is obtained and the whole procedure may be iterated. This procedure systematically uncovers the links in the causal chain of error propagation. While each binary search will converge quickly to the next link, the number of links in the chain, and therefore the time for localizing the error, may be large.
To restate symbolically: let A_b be an incorrect data structure, and let A_g be an earlier correct state of the same structure in a provisionally good state. The search will converge on some operation A_b ← A_g ? B (where ? denotes the operation), for some structure, or group of structures, B. If B is correct, we have found an error; otherwise we have B_b, a new hint.
Alternatively, at the cost of evaluating all assertions, a state can be certified correct or incorrect and the search will find the error directly. The expected large number of complex assertions excludes the possibility of manual evaluation. Instead, software check procedures, which determine the correctness of instances of a given type, will be combined to form an easily executable state vector syntax checker.
The assertions the check procedures evaluate are very similar to those used in proving programs correct. The similarity ends there, however, because check procedures show the validity of the assertions restricted to a single, actual instance of a type, while program correctness proofs extend over all values in all possible
For example, the following assertions about a chained hash table are typical of those appearing in check procedures:
All list pointers point within the boundaries of the table.
The number of entries in each list is less than or equal to the total number of entries (no circular lists).
The hash codes of all keys on any given list are equal and point to the head of that list (keys are probably intact, lists are disjoint).
The sum of sizes of entries on all lists plus the free entries accounts for all storage in the table (no lost entries).
If any assertions are found not to hold, a check procedure can immediately terminate with some indication, ignoring other errors that might also exist. The indication should identify the assertion which failed. To assist in identifying the erroneous value, the verifier should keep some easily accessible variables updated with the type and address of the current value being checked. Further information about the nature of the error can be gleaned from the meta-program or code for the assertion.
It is not strictly necessary that the assertions be complete in describing the correct behavior of the program. If an inconsistency is missed, in the worst case the manual procedure described above may have to be followed for one search iteration. When the error is found, the check procedure can be updated with the proper test.
Since a few missing assertions do not cause undue harm, some assertions may be explicitly omitted if their cost/benefit ratio is low. In particular, assertions with memory are often as difficult to implement as the operations themselves, while excluding only rather obvious errors which are best localized manually.
The most important property of a hash table is that it remembers the keys that were inserted. The assertions expressing this property would involve an independent implementation of an associative memory to serve as the model for the behavior of the hash table. The expense of producing the independent implementation would not be justified by the small number of additional failure modes it would cover.
Consider the memoryless consistency checks of a chained hash table described above. They can determine whether any lists are destroyed or malformed, or if keys are destroyed (unless the bad data happens to hash into the correct code). The additional property ensured by a perfect checker would be that the keys to be looked up, provided as parameters to the hash table operations, are reproduced and compared faithfully.
The failure modes covered by the assertions with memory are related to the small number of operations of a single abstraction. In comparison, the errors detected by the memoryless checks may be the undesired side effects of any erroneous operation whatsoever. Note also that the private storage of a checker would not be immune to side effects, either.
It is apparent that the power of memoryless assertions is derived from redundancy in data structures. The usual reasons for redundancy are breakage, efficiency, and error checking of peripheral operations. By breakage, we mean the storage of values from smaller sets, carrying a few bits of information, in full machine words capable of holding dozens of bits. Redundant secondary structures are often built and maintained for efficient access to important functions of the independent, primary data. The consistency of the structures can be tested by checking the membership of values in the sets to which they should belong, or by evaluating the functions of primary data and comparing the results with the corresponding results obtained from the secondary structure. If the above conditions are not present, it may be reasonable to introduce some redundancy just for the purpose of error checking: such practice is quite common for hardware peripheral operations, where parity, checksumming, identifying labels, write locks, or even error correcting codes help in coping with errors. Similar measures may be appropriate for the protection of important data structures since, in the presence of software errors, the address space where the structures reside can be viewed as a noisy storage medium.
It is understood that check procedures cannot be used at arbitrary points in the execution of a program; the critical sections excluded are those modifying the structure which is checked. Avoidance of critical sections is an important part of debugging tactics. Errors localized to within a critical section can certainly be found by inspection.
What happens if a check procedure contains an error? Errors of omission are similar to the missing assertions discussed earlier. Side effects will also be detected by the standard strategy. Other errors cause incorrect indications; these are best found in the operational environment of the check procedure. The initial indications of a newly installed check procedure should be verified by inspecting the data structures claimed to be malformed. Since check procedures are memoryless, the cause of an erroneous indication is always immediate and can be found by inspection. If the indication is justified, the standard strategy should be followed, of course.
The second essential operation for the binary search scheme is finding the state of execution at some selected point, which, as far as this operation is concerned, may be arbitrary. If the execution of the program can be repeated exactly, or almost exactly, any state can be obtained by re-execution with a break or halt at the proper place.
Practical considerations may alter the strategy in a number of ways. First, the selection of the probe points may be constrained to explicitly programmed ones by the lack of break facilities. Second, the exact repetition of program executions may be impractical, even if theoretically possible: the execution time or batch turn-around time may be too long, or the program may depend on real-time inputs such as type-in or interrupts. Fortunately, all of these adverse conditions are predictable from the nature of the computing environment and the problem. Appropriate preparations may include the following:
Identify the set of regular points in the program such that control will pass through one of them with medium frequency and where all data structures are in a consistent state. These points can be fitted with conditional halts, state dumps for inspection, or conditional calls on the state vector syntax verifier. The number of program executions in the search process can be reduced by running the verifier at the highest possible frequency consistent with the length of execution and the available computing resources. Thus, after the first run, the error (or the new hint, depending on the power of the verifier) is localized to within one "wavelength" of the verifier. Further debugging can proceed by inspection, or a new run may be prepared with higher frequency verification concentrated in the smaller, localized area. Numerous variations of these schemes are possible: the verifier may be turned on during all executions while debugging, or even in an operational system; check procedures in the verifier may be individually turned on or off so that the overhead and interference of verification can be decreased while the frequency and resolution are increased.
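The "wavelength" idea can be sketched as a driver that calls the verifier at every k-th regular point. This is a minimal illustration, assuming the program is modeled as a hypothetical list of step functions, that `verify` is a state vector check such as the hash-table procedure above, and that the state at the start of the run is correct.

```python
from typing import Callable, List, Optional, Tuple


def run_with_verifier(steps: List[Callable[[], None]],
                      verify: Callable[[], bool],
                      wavelength: int) -> Optional[Tuple[int, int]]:
    """Execute steps, calling the verifier at every wavelength-th regular
    point and after the last step. Returns (last step verified good, first
    step verified bad), which localizes the error to within one wavelength,
    or None if every verification passed."""
    iGood = 0                       # the start of the run is assumed correct
    for i, step in enumerate(steps, start=1):
        step()
        if i % wavelength == 0 or i == len(steps):
            if not verify():
                return iGood, i
            iGood = i
    return None


# Usage: 50 hypothetical steps, the 23rd of which corrupts the state.
state = {"ok": True}


def make_step(i: int) -> Callable[[], None]:
    def step() -> None:
        if i == 23:
            state["ok"] = False
    return step


steps = [make_step(i) for i in range(1, 51)]
```

Here `run_with_verifier(steps, lambda: state["ok"], 10)` reports the interval (20, 30); a second run with a wavelength of 1 restricted to that interval would pin down step 23.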
To find the most elusive bugs, a circular event buffer may be employed. The buffer can hold the recent history of a small piece of the state, and it can be updated without appreciable interference to the program. The shortcomings of the buffer are its short temporal and spatial reach. These are somewhat alleviated when the use of an event buffer and a verifier are combined: the verifier may localize the error to within a wavelength and may also give a sharper hint as to what part of the state should be buffered. This method is analogous to hardware debugging with delay lines in oscilloscopes, which enable the engineer to inspect events occurring shortly before a trigger signal.
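A circular event buffer needs little more than a bounded queue. A minimal sketch using Python's `collections.deque` with `maxlen`, which discards the oldest entry when full. The watched variable and the failure condition in the usage are hypothetical stand-ins for the piece of state suggested by a verifier's hint and for the verifier's trigger.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class EventBuffer:
    """Circular event buffer: keeps only the last cEvent records, so
    recording is cheap and old history is overwritten automatically."""

    def __init__(self, cEvent: int) -> None:
        self.events: Deque[Tuple[int, Any]] = deque(maxlen=cEvent)

    def record(self, step: int, value: Any) -> None:
        # A deque with maxlen drops its oldest entry when a new one arrives.
        self.events.append((step, value))

    def history(self) -> List[Tuple[int, Any]]:
        """Snapshot, oldest first: the events just before the trigger."""
        return list(self.events)


# Usage: watch one hypothetical variable; dump history when a check trips.
buf = EventBuffer(4)
x = 0
for step in range(1, 11):
    x = x + step if step != 8 else -1      # step 8 simulates the bad store
    buf.record(step, x)
    if x < 0:                               # stand-in for a verifier failure
        print(buf.history())
        break
```

As with the oscilloscope delay line, the trigger fires at step 8 and the buffer shows the last four values leading up to it, steps 5 through 8.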
Provisions should be made for avoiding unnecessary real-time inputs during debugging. In particular, major input for test runs should be read from a file, even if an on-line terminal is available. The program should also include some global initialization to protect itself from dependence on uninitialized values.
Program execution time may be reduced by the standard technique of checkpointing with provisions for restart. At a checkpoint the program state resulting from a lengthy computation is saved on a file. Points past the checkpoint can then be reached repeatedly, starting from a restored state. The computing environment may not offer checkpointing services, but it is relatively simple to implement them as an integral part of the program.
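Built into the program itself, checkpointing can be quite small. The sketch below assumes Python's standard `pickle` module as the serialization method and a hypothetical `lengthy_computation` that stands for the expensive early phase. Only trusted checkpoint files should be loaded, because unpickling can execute code.

```python
import pickle
from pathlib import Path
from typing import Any, Dict, List


def save_checkpoint(state: Any, path: Path) -> None:
    """Save the program state. Write to a temporary file, then rename it,
    so an interrupted save never leaves a truncated checkpoint behind."""
    tmp = path.parent / (path.name + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(state, f)
    tmp.replace(path)


def load_checkpoint(path: Path) -> Any:
    with path.open("rb") as f:
        return pickle.load(f)


def lengthy_computation(n: int) -> Dict[str, List[int]]:
    """Hypothetical stand-in for the expensive early phase of a program."""
    return {"squares": [i * i for i in range(n)]}


def state_at_checkpoint(path: Path, n: int) -> Dict[str, List[int]]:
    """Restart from the saved state if a checkpoint exists; otherwise
    run the lengthy computation once and checkpoint its result."""
    if path.exists():
        return load_checkpoint(path)
    state = lengthy_computation(n)
    save_checkpoint(state, path)
    return state
```

Each debugging run would call `state_at_checkpoint` and then continue with the code past the checkpoint. Deleting the file forces the lengthy computation to be redone.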
Removal of errors, once they are localized, is probably the simplest of the debugging steps, because it is closely related to production. Since there are two independent representations of the program logic, the meta-program and the elaborated program text, two cases must be distinguished. If the localized error occurs in the program text only, the technician can perform the correction. If the meta-program is manifestly in error, the technician may or may not propose a solution. In either case the meta-programmer should be told, so that the meta-programs and the meta-programmer's model of the world can be kept up to date, and so that the meta-programmer can comment on the implications or, if the error is serious, prepare the required changes. Note that this would be an instance of efficient feedback communications (Section 2.2) relying entirely on language well known to both communicants.
2.7 Other Meta-Programming Conventions
In addition to object naming, conventions may be used to control other syntactic and semantic aspects of meta-programs and the produced code. Conventions should be selected on the basis of their contribution to productivity and ease of communication. It should be re-emphasized that the main purpose of the meta-programs (2.3) is to communicate the detailed design to a technician, so that he can produce code which fulfills the intent of the meta-program and can learn the new terms in the local language at the same time. Uncertainty about the form and economics of conventions involving special purpose addenda to meta-programs or code should be properly absorbed by engineering organizations (1.2).
It is by no means certain, for example, that special documentation for the purposes of future program maintenance is always desirable. Some code may be short-lived (1.3.1, 1.3.4) if evaluation by the engineering organization shows that the engineering design is unsatisfactory. For the purpose of evaluating alternatives, the least expensive code, undocumented except for the meta-programs, is the best suited. Furthermore, the worst-case cost of future program maintenance from the meta-programs cannot be much greater than the technician's contribution to the original creation of the code, which is sizeable but does not preclude repetition. However, the unavailability of feedback from the meta-programmer and incomplete meta-programs may make maintenance from the meta-programs alone difficult.
Software is said to be readable if the cost of a minimal modification is low, even when the expert preparing the modification has had no prior familiarity with the details of the program. The combination of meta-programs and code is not readable in this sense, since the information it contains is geared for writeability, that is, for understanding by an organized, large-scale scan of the contents. The important point is that the production of readable software involves more engineering effort and is more expensive than the production of writeable code. If future modifications turn out to be simple, readable software may look better; but, in the larger picture, the ease of the small modifications was bought at the disproportionate cost of modification-proofing the whole program. For larger future modifications the importance of the narrowly construed concept of readability diminishes as the modifications begin to resemble production tasks.
The following conventions have proved themselves in operational use (see Chapter 4) and are strongly recommended:
2.7.1 Divisions in meta-programs
The definitions of new major and minor qualifiers, comprising the major portion of the new language introduced by a meta-program, form a body of reference material which the technician as well as the meta-programmer will peruse frequently. To simplify these references, the definitions appear at the beginning of a meta-program in the Abstractions division. The Operations division, which describes the actual code to be written as a set of procedures operating on instances of abstractions already defined, follows thereafter.
Within the list of abstractions the following constructs may appear: new tags together with their informal, or spoken, names; lists of fields if the abstraction is a data structure; and lists of distinguished values to define the non-standard minor qualifiers. The essential properties of an abstraction may be summarized by invariant relations which hold true for all instances; however, such detail is seldom necessary except for more intricate structures. If invariances are given, they may be used for the meta-programmer's own reference and general documentation, or they may help in determining the correctness of state during debugging (2.6). Moreover, the description of those portions of the operations which are responsible for the maintenance of the invariances may be simplified.
Definitions of new type constructions (2.5) may be written among the abstractions. Very little, if any, code results from the elaboration of the abstractions. Depending on the programming language used, declarations for the data structures and their fields have to be prepared, and distinguished values have to be declared and initialized.
The divisions of a meta-program are somewhat analogous to the Data and Procedure divisions of the business-oriented language COBOL [McCracken]. The main difference is the concentration of generic information in the Abstractions division, as opposed to the more concrete declarations of the COBOL Data division.
The Operations division contains the descriptions of the explicitly programmed operations, written in a convenient pseudo-language commentary which usually resembles a higher-level programming language. Implicit operations, such as painting or operations inherited from the underlying type (2.5), need not be defined. Variables need not be declared.
The essential properties of operations may be expressed by state transformation relations coupling the program state before and after the operation. These relations, if given, are used similarly to invariances, as described above. The elaborated operations constitute the major portion of the produced code.
Some new language may be introduced in the Operations division by refinement: an action may be described using a new term, with an explanation following immediately or in a separate section. Parts of the refinement, in turn, may need further explanation until all actions are defined entirely in known terms.
For example, a meta-programmer may elect to introduce a new concept as

    if buffer is empty then

followed by the refinement in terms of the known type bi (for buffer index):

    buffer is empty iff: biRead=biWrite-1 or
                         (biRead=0 and biWrite=biMax)
This arrangement is related to the design technique of stepwise refinement [Wirth1]. The relation, however, need not be a strong one: the design detail communicated by refinement could have been created using other design methods, for example by building action clusters [Naur3].
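Elaborated into code, the refinement becomes a predicate that the technician can call wherever the meta-program says "if buffer is empty then". The Python sketch below transcribes the stated condition literally. The function name `BufferIsEmpty` is hypothetical, and the meaning of the indices, including how they wrap, is fixed by whatever meta-program introduced bi, not by this sketch.

```python
def BufferIsEmpty(biRead: int, biWrite: int, biMax: int) -> bool:
    """Elaboration of the refined term "buffer is empty".

    The condition is copied from the refinement term by term; no other
    property of the buffer is assumed here.
    """
    return biRead == biWrite - 1 or (biRead == 0 and biWrite == biMax)


# "if buffer is empty then ..." elaborates into an ordinary call:
if BufferIsEmpty(3, 4, 10):
    print("buffer is empty")
```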
2.7.2 Naming conventions for procedures
The naming conventions described in Section 2.5 are not directly applicable to naming procedures. Many procedures do not return any value and, therefore, are not typed in the usual sense. The scopes of procedures are usually large, often as large as the whole program. The combination of these two effects means that the minor qualifier must distinguish a procedure from all other procedures, just as a conventional procedure name would. When a procedure does return a value, the major qualifier of the procedure name should be retained to indicate the type of the value. If no value is returned, the major qualifier can be safely omitted, because potential ambiguities are rare and most high-level language processors can check the correct uses of procedure names from context.
The minor qualifiers of most procedure names are composed of an imperative verb (Create, Sum, Print and so on) and the tags for the first one to three arguments (see Section 2.8 for examples). Procedures implementing mappings are qualified by the tag for the range, which is the procedure's result type, followed by the word From and the tag for the argument (as in CiFromCh(ch), where the domain is ch and the range ci). These conventions offer a reasonable compromise between the requirements of speedy creation, mnemonic value and type checking.
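Python's `typing.NewType` gives a rough analogue of painted types. A painted name shares the representation of its underlying type, and a static checker such as mypy (not the interpreter) reports a `ch` passed where a `ci` is expected. Purely for illustration, the sketch assumes that ci is an index into a hypothetical alphabet table. The text does not define ci, and `ALPHABET` and `ChFromCi` are invented for this example.

```python
from typing import NewType

Ch = NewType("Ch", int)  # character code
Ci = NewType("Ci", int)  # assumption: index of a character in ALPHABET

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # hypothetical table, illustration only


def CiFromCh(ch: Ch) -> Ci:
    """Mapping named <range tag> From <domain tag>; the result is a ci.

    Raises ValueError if ch is not in ALPHABET.
    """
    return Ci(ALPHABET.index(chr(ch)))


def ChFromCi(ci: Ci) -> Ch:
    """Inverse mapping, named by the same convention."""
    return Ch(ord(ALPHABET[ci]))


def PrintCh(ch: Ch) -> None:
    """Returns no value, so the name is just verb + argument tag."""
    print(chr(ch), end="")


ci = CiFromCh(Ch(ord("c")))
PrintCh(ChFromCi(ci))  # prints: c
# CiFromCh(ci) is flagged by mypy: a Ci is passed where a Ch is expected.
```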
2.7.3 Name hyphenation
Some implementation languages allow highlighting the boundaries between constituent parts of names by hyphens (as in PRINT-CH), by underlines (PRINT_CH) or by the use of capitalized initials (PrintCh). Since there may be a number of different ways of separating a name, an unambiguous rule must be chosen: for instance, hyphenation may be restricted to mark the boundary between the major and minor qualifier only. Marking the components of type construction would result in too many separators, while sub-components of minor qualifiers are difficult to define unambiguously. Again, an exception can be made for procedure names, where the minor qualifier is constructed in a well-defined way from a few words (2.7.2). These components, as well as the major qualifier, may be hyphenated.
2.7.4 Parameter order in procedures
Correspondence between actual and formal parameters in procedure calls has traditionally been established by their ordering: in general the nth actual parameter will correspond with the nth formal one. Thus, the ordinal number n of a parameter acts as its external name. The choice of the parameter order is a naming problem where conventions are appropriate. Since important properties of the parameters have already been expressed by the formal parameter names, we can proceed by mapping the names into an order. This can be accomplished by establishing separate canonical orderings for major and minor qualifiers and sorting parameter lists accordingly. The canonical ordering should be based on the intuitive size or importance of the abstractions represented. Note that the minor qualifiers often come already partially ordered (2.5).
An exception is warranted if some parameters are used to return values from a procedure. Because of the dangers inherent in their misuse, these parameters should be explicitly identified by writing them first in the parameter lists. This rule is easily remembered because the ordering resembles the conventional order in assignment statements [Lampson1].
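One way to mechanize this convention is to sort formal parameter names by a canonical rank of their major qualifiers, with output parameters placed first. The ranking table, the alphabetical tie-break (standing in for a canonical ordering of minor qualifiers) and the function names below are assumptions for illustration. The ranking is chosen so that it reproduces orders used in Section 2.8, such as MarkIm(im, xc, yc).

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical canonical ordering of major qualifiers, larger abstractions first.
MAJOR_RANK: Dict[str, int] = {"im": 0, "ln": 1, "xc": 2, "yc": 3, "ch": 4}


def MajorQualifier(name: str) -> str:
    """Leading lower-case tag of a name, e.g. 'xcMac' -> 'xc'."""
    i = 0
    while i < len(name) and name[i].islower():
        i += 1
    return name[:i]


def CanonicalOrder(params: Sequence[str], outputs: Sequence[str] = ()) -> List[str]:
    """Order formal parameters: output parameters first, then by the rank of
    the major qualifier; unknown tags go last, ties are broken alphabetically."""

    def key(name: str) -> Tuple[bool, int, str]:
        rank = MAJOR_RANK.get(MajorQualifier(name), len(MAJOR_RANK))
        return (name not in outputs, rank, name)

    return sorted(params, key=key)


print(CanonicalOrder(["yc", "xc", "im"]))                        # ['im', 'xc', 'yc']
print(CanonicalOrder(["xc", "chResult"], outputs=["chResult"]))  # ['chResult', 'xc']
```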
2.7.5 Use of comments for explanation
Although comments have long been an important part of programming practice, their value must be re-examined in light of the meta-programming conventions. The meta-programs themselves, unencumbered by the petty limitations of high-level languages, can serve the same operational purposes that comments used to serve.

This point is expressed in the discussion of comments in [Kernighan-Plauger] thus: "If you wrote your code by first programming in a made up pseudo-language ... then you already have an excellent 'readable description of what each program is supposed to do'." (See also the quote from the same reference in Section 2.3.)
In particular, comments describing procedure parameters are superseded by the use of painted types and naming conventions; structure descriptions are given in the Abstractions divisions of meta-programs; the intents of action clusters are stated by refinements. Since exceptional needs for comments can always be satisfied by the meta-programmer, technicians do not have to write explanatory comments at all.
2.7.6 Programming language syntax extensions
Conventions about the use of the implementation language are often the easiest to state in terms of extensions of the language syntax. As noted earlier, these extensions need not be backed by a software implementation.
The extended syntax may regulate the use of new lines, spacing, and indentation, which are otherwise partially or wholly ignored by the language processor. Typically, the indentation would be used to show the nesting of scopes, conditional and iterative statements.

Example 2.3.3 is shown with standardized indentation. Note that compound and conditional statements fitting entirely on a single line are treated differently from longer ones. Although this is a natural convention, such a fine distinction would be difficult to express in syntax equations.
When the implementation language allows a number of equivalent options, a single one may be selected for use, or redundant information may be encoded into the choice. To distinguish logically different uses of the same syntactic form, to identify a group of statements as the implementation of a higher-level construct, or to emphasize a particularly important statement, further redundancy can be introduced in the form of standard comments. Insofar as the use of these comments must follow prescribed syntax, the remarks of the previous section do not apply.
2.7.7 Standard operations
Whenever the meta-programmer defines a new abstraction, he should also consider the immediate implementation of a number of standard operations for checking, printing and enumerating instances of the abstraction.
The purpose and details of the checking and printing procedures were discussed in Section 2.6. Examples are given in Section 2.8. It is also worth noting that by writing the checking and printing procedures, the technicians confirm or update their mental models of the abstractions; thus these procedures are also a very effective means of communication.
The enumerator procedure provides convenient access to all instances of the abstraction by arranging to call a formal procedure, representing the body of a loop, once for each instance. The difficulty of performing the enumeration may range from simple counting to complex operations on sets. In either case, the enumerator serves to hide information [Parnas1] about the nature of loops involving the abstraction. The applicability of enumerators is determined by weighing the value of information hiding against the execution overhead introduced. If either the enumeration algorithm or the body of some loop is complex, relative overhead will be low and information hiding will be valuable.
The details of enumerator procedure conventions are highly dependent on the availability of various implementation language features. Given that procedures may be passed as formal parameters, the convention may look as follows: for abstraction x, EnX(Proc) will call Proc(x) for all instances x of the abstraction. For example, EnCi(PrintCi) would implement the informal meta-programming statement:

    for all ci, print ci

Other implementations, using macros or even manual copying of action clusters, are also possible.
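Where procedures can be passed as parameters, an enumerator takes only a few lines of code. In this Python sketch the type Ci, the count ciMax and the body of PrintCi are hypothetical stand-ins. Only the EnX(Proc) calling convention comes from the text.

```python
from typing import Callable, List, NewType

Ci = NewType("Ci", int)  # hypothetical painted integer for the abstraction ci

ciMax = 4  # hypothetical number of instances


def EnCi(Proc: Callable[[Ci], None]) -> None:
    """Enumerator: call Proc once for each instance ci.

    Callers never see how the instances are generated, so this simple
    counting loop could later be replaced (say, by a walk over a set of
    allocated instances) without touching any loop body.
    """
    for ci in range(ciMax):
        Proc(Ci(ci))


def PrintCi(ci: Ci) -> None:
    print("ci", ci)


EnCi(PrintCi)  # "for all ci, print ci"

seen: List[Ci] = []
EnCi(seen.append)  # any procedure of one ci can serve as the loop body
```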
2.8 Meta-Programming Example
We now have sufficient theory to attempt its application to a simple example. The subject problem for the example was chosen to be the one described by [Dijkstra], so that the close relationship between the structured design and the meta-programs can be better illustrated. Briefly restated, the problem is to prepare a plot of some integer function given in parametric form (fx(i), fy(i)) on a line-printer which is capable only of the following operations:

    print blank
    print mark
    return carriage and start a new line
Dijkstra's solution, which we shall also follow, is a program consisting of six "pearls", or levels of refinement. These are, from the top down (Dijkstra's names are given in parentheses):
1. (COMPFIRST) says that we first build an "image", then print it.
2. (CLEARFIRST) explains building as clearing the image, then setting marks.
3. (ISCANNER) defines setting marks: for all i (the parameter for fx and fy) add a mark.
4. (COMPPOS) states the rule for adding marks: calculate the position of the mark (fx(i), fy(i)), then mark that position.
5. (LINER) contains the definition of the image: it consists of a fixed number of "lines". To clear the image (used by 2), it clears all lines. To print the image (1), it prints all lines. To mark a position (4), it selects the line at y and marks that line at the given x.
6. (SHORTREP) introduces a particular representation for lines: they are fixed length arrays of characters with an associated counter which keeps track of the number of characters to be printed. To print the line, it prints the required number of characters from the array. To clear a line, the counter is reset to 0. If a position is to be marked, depending on the counter, the line may first have to be "lengthened" and the added space filled with blanks; then the mark may be stored in the array.
Since both the problem and the solution have now been presented, the question may arise: what can we expect to add to this? For the answer, a comparison of goals is in order. Dijkstra analyses the program development process from the point the problem is clearly posed all the way to the completion of the language processor executable program text. We, on the other hand, assume that such design work has already been completed by the meta-programmer, except that this design might not be in machine executable (or even human readable) form, but rather in the highly personal notation of the meta-programmer, such as personal notes, mental images, references to literature or a task order from the customer. In particular, a specification statement, such as the above description, would probably not exist at all. What remains to be accomplished is to transfer the knowledge of the design to the technician who will prepare the machine executable version and do the debugging. The transfer medium will be a meta-program.
Let us also assume an implementation language which includes data structures, such as ALGOL W [Hoare-Wirth] or BCPL [Richards].

The first meta-program will describe the lowest level of refinement (line numbers are given for reference only):
     1  Abstractions:
     2    xc        x coordinate
     3    ch        character
     4    ln        line, structure with fields:
     5                mpxcch    fixed size
     6                xcMac
     7  Operations:
     8    PrintLn(ln):
     9      for all xc in ln
    10        PrintCh(mpxcch[xc])
    11      Newline()
    12      PrintCh and Newline must be declared EXTERNAL!
    13    end of PrintLn...
    14    ClearLn(ln):
    15      set xcMac←0
    16    end of ClearLn...
    17    MarkLn(ln, xc):
    18      first ensure xcMac>xc:
    19        for all txc in [xcMac, xc-1]
    20          mpxcch[txc]←chSpace
    21        xcMac←xc+1
    22      mpxcch[xc]←chMark
    23    end of MarkLn...
    24    PxLn(ln):
    25      print on new line: (all #'s octal)
    26        "ln: " ln, xcMac, PrintLn(ln)
    27    end of PxLn...
    28    CkLn(ln):
    29      if xcMac<0 or >xcMax then error
    30      for all xc in ln do
    31        if mpxcch[xc] is not space or mark then
    32          error
    33    end of CkLn...
While this meta-program contains very little information about the nature of the larger problem, it introduces the basic abstractions and operations relying only on global language. In line 2, we find the introduction of the painted integer type xc, which will represent printer positions. The reason for not calling it printer position is the expectation that magnified and rotated printout formats may be added later. The explanation of this fine point in the meta-program would serve no operational purpose, however. The real definition of the abstraction xc is given by the operations following: xc is a quantity which is used as shown.
The fields of structure ln in line 4 include a fixed size array and the quantity xcMac, ostensibly designating the defined portion of the array (see Section 2.5). The allocated size of the array will be set to some value, say 10, and named xcMax by convention.
In line 8, the definition of the first operation starts. The name PrintLn is a typical construction from an active verb and the parameter type. It may be pronounced partially spelled out, as print-l-n, or, informally, as print-line. The statement on the next line:

    for all xc in ln

elaborates into a loop from 0, which may be the default lower bound in the global language, to xcMac. The latter quantity can only be obtained from the parameter ln by field selection; this is an example of a coercion. Type compatibility in the next statement:

    PrintCh(mpxcch[xc])

can be easily checked: mpxcch may be indexed by xc and yields the ch expected by PrintCh.
The explanation about a subtle implementation language requirement in line 12 is a useful precautionary measure.
A simple refinement is apparent in line 18, where the purpose of an action cluster is stated, followed by more detail. The quantities chSpace and chMark, the character codes for space and the mark, are distinguished instances of the type ch. Their definitions can be safely entrusted to the technician. The convenient notation for an interval in line 19 need not be a legal construction in the implementation language. We also note a trick in line 21, setting xcMac (of ln, by coercion) to its desired value directly instead of Dijkstra's original:
which is more difficult to prove correct. The practical value of the improvement is infinitesimal, but then no precious production time was wasted on explanation.
Starting at line 24, the test print and check procedures (given the standard names PxLn and CkLn, respectively) are defined for lines. The difference between the normal and the test print procedures is evident; in fact the normal print procedure, PrintLn, is used as part of the test printing. Test printout will be in octal for easy comparison with data obtained by an interactive debugger. The code to be written when errors are detected in the check procedure (lines 29 and 32) is defined by convention.
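The following Python sketch shows one possible elaboration of lines 1 to 33, assuming Python as the implementation language in place of ALGOL W or BCPL. The values xcMax = 10 and chMark = "*" are assumptions, and `PrintCh` and `Newline` stand in for the EXTERNAL printer primitives. Reporting errors by raising `AssertionError` is a placeholder for whatever convention the team actually uses.

```python
import sys
from dataclasses import dataclass, field
from typing import List, NewType

Xc = NewType("Xc", int)  # x coordinate: a painted int
Ch = NewType("Ch", str)  # character, held here as a one-character str

xcMax = 10               # allocated size of mpxcch ("say 10")
chSpace = Ch(" ")        # distinguished values of ch
chMark = Ch("*")         # assumption: the mark character


def PrintCh(ch: Ch) -> None:  # stand-in for an EXTERNAL printer primitive
    sys.stdout.write(ch)


def Newline() -> None:        # stand-in for an EXTERNAL printer primitive
    sys.stdout.write("\n")


@dataclass
class Ln:
    mpxcch: List[Ch] = field(default_factory=lambda: [chSpace] * xcMax)
    xcMac: int = 0  # mpxcch[0:xcMac] is the defined portion


def PrintLn(ln: Ln) -> None:
    for xc in range(ln.xcMac):  # "for all xc in ln": coerced to ln.xcMac
        PrintCh(ln.mpxcch[xc])
    Newline()


def ClearLn(ln: Ln) -> None:
    ln.xcMac = 0


def MarkLn(ln: Ln, xc: Xc) -> None:
    if ln.xcMac <= xc:                   # first ensure xcMac > xc:
        for txc in range(ln.xcMac, xc):  # the interval [xcMac, xc-1]
            ln.mpxcch[txc] = chSpace
        ln.xcMac = xc + 1
    ln.mpxcch[xc] = chMark


def PxLn(ln: Ln) -> None:
    # Test print with numbers in octal; id() stands in for the structure's address.
    sys.stdout.write(f"\nln: {id(ln):o} {ln.xcMac:o} ")
    PrintLn(ln)


def CkLn(ln: Ln) -> None:
    if ln.xcMac < 0 or ln.xcMac > xcMax:
        raise AssertionError(f"CkLn: xcMac out of range: {ln.xcMac}")
    for xc in range(ln.xcMac):
        if ln.mpxcch[xc] not in (chSpace, chMark):
            raise AssertionError(f"CkLn: bad ch at xc={xc}")


ln = Ln()
MarkLn(ln, Xc(3))
MarkLn(ln, Xc(1))
CkLn(ln)
PrintLn(ln)  # prints " * *"
```

The second call to MarkLn finds xcMac already greater than xc, so it only stores the mark. This is the case the action cluster of line 18 exists to distinguish.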
The next meta-program will define the next higher level of abstraction, providing the second dimension to form the image:
    34  Abstractions:
    35    yc        y coordinate
    36    im        image, structure with fields:
    37                mpycln    fixed size
    38  Operations:
    39    PrintIm(im): for all yc PrintLn.
    40    ClearIm(im): for all yc ClearLn.
    41    MarkIm(im, xc, yc): MarkLn(mpycln[yc], xc)
    42    PxIm(im):
    43      print on new line (all #'s octal) "im", im
    44      for all yc, print on new line "yc", yc, PxLn
    45    end of PxIm...
    46    CkIm(im): for all yc CkLn
The upper limits of the loops on yc will be ycMax (implicit from mpycln being fixed size), because there is no ycMac defined anywhere. We also note a compound coercion in line 44: PxLn needs an ln, but the only quantities available are im, the formal parameter, and yc, the loop variable. The solution is simple: PxLn((mpycln of im)[yc]).
Finally, the driver is meta-programmed as follows:
    47  Abstractions:
    48    par       parameter for the parametric functions XcPar, YcPar.
    49  Operations:
    50    XcPar(par): return min(par, xcMax)
    51    YcPar(par): return min(par, ycMax)
    52    EnPar(Proc): for all par in [0, 100) Proc(par)
    53    Draw():
    54      CompPar(par): MarkIm(im, XcPar(par), YcPar(par))
    55      reserve storage for local structure im
    56      ClearIm, EnPar(CompPar), PrintIm
    57    end of Draw...
An enumerator is specified in line 52 to hide information about the nature of loops on pars, in anticipation of changes to more complex loops, for example if pars are changed to a floating point representation. The use of the enumerator is illustrated in line 56, where it is called to execute the loop body, defined in line 54, for all pars. In this instance the notation is rather unfortunate, as the body of the loop is removed from the place where it is active, but the technician's task of elaboration remains simple. Once the technicians are familiar with the construction, a more compact notation may be used, such as the ALGOL 68 style:

    EnPar(CompPar(par): MarkIm(im, XcPar(par), YcPar(par)))
A pair of simple parametric functions is also defined in lines 50 and 51 for completeness. The implicit painting and unpainting operations in the functions will remain implicit in the code as well, as long as all underlying types are integers in the implementation. In a strictly typed environment, the expression

    return min(par, xcMax)

would have to be written as

    return Xc(min(Int(par), Int(xcMax)))

or

    return XcMin(Xc(par), xcMax)

where Xc is a painting operator, Int is an unpainting operator and XcMin is the minimum operation defined for xcs. Some of these complexities are due to lack of foresight: the bounds checks for the coordinates should have been implemented in the lower levels. The omission can be easily remedied:
    17.1    ignore out of bounds xc:
    17.2      return unless xc is in [0, xcMax)
    41.1    but ignore out of bounds yc!
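Put together, the three meta-programs, including the bounds checks of lines 17.1, 17.2 and 41.1, might elaborate into something like the self-contained Python sketch below. It restates the line-level operations of lines 1 to 33 in condensed form, with PrintLn folded into PrintIm. The sizes xcMax = ycMax = 10 and the mark character are assumptions. Because `NewType` painting costs nothing at run time, the explicit `Xc(...)` and `int(...)` calls in XcPar correspond to the strictly typed form discussed above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, NewType

Xc = NewType("Xc", int)    # x coordinate
Yc = NewType("Yc", int)    # y coordinate
Par = NewType("Par", int)  # parameter of the parametric functions

xcMax = 10                 # assumed allocated sizes
ycMax = 10
chSpace, chMark = " ", "*"


@dataclass
class Ln:
    mpxcch: List[str] = field(default_factory=lambda: [chSpace] * xcMax)
    xcMac: int = 0


@dataclass
class Im:
    mpycln: List[Ln] = field(default_factory=lambda: [Ln() for _ in range(ycMax)])


def MarkLn(ln: Ln, xc: Xc) -> None:
    if not 0 <= xc < xcMax:          # 17.1-17.2: ignore out of bounds xc
        return
    if ln.xcMac <= xc:               # 18: first ensure xcMac > xc
        for txc in range(ln.xcMac, xc):
            ln.mpxcch[txc] = chSpace
        ln.xcMac = xc + 1
    ln.mpxcch[xc] = chMark


def PrintIm(im: Im) -> None:         # 39: for all yc PrintLn
    for ln in im.mpycln:
        print("".join(ln.mpxcch[: ln.xcMac]))


def ClearIm(im: Im) -> None:         # 40: for all yc ClearLn
    for ln in im.mpycln:
        ln.xcMac = 0


def MarkIm(im: Im, xc: Xc, yc: Yc) -> None:
    if not 0 <= yc < ycMax:          # 41.1: but ignore out of bounds yc
        return
    MarkLn(im.mpycln[yc], xc)        # 41


def XcPar(par: Par) -> Xc:           # 50, with explicit unpainting and painting
    return Xc(min(int(par), xcMax))


def YcPar(par: Par) -> Yc:           # 51
    return Yc(min(int(par), ycMax))


def EnPar(Proc: Callable[[Par], None]) -> None:  # 52: for all par in [0, 100)
    for par in range(100):
        Proc(Par(par))


def Draw() -> None:                  # 53-57
    im = Im()                        # reserve storage for local structure im

    def CompPar(par: Par) -> None:   # loop body handed to the enumerator
        MarkIm(im, XcPar(par), YcPar(par))

    ClearIm(im)
    EnPar(CompPar)
    PrintIm(im)


Draw()  # prints a diagonal of ten marks
```

With these assumed values, the marks for par = 0 to 9 form a diagonal. From par = 10 on, XcPar and YcPar return the bound itself, and the checks of 17.1 and 41.1 discard those points instead of indexing past the arrays.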
2.9 Comparisons and Combinations with Other Programming Methods
In this section, the relationships between meta-programming and the most important methods of software engineering are discussed. Whenever the method discussed attacks the same problems as meta-programming, we contrast the different approaches; otherwise the possibility of combining the ideas will be explored.
2.9.1 High Level Languages
The development of high level languages was a historically important step in improving programming productivity. A significant factor in their success has been the users' tacit acceptance of simplifying conventions which go beyond the syntax and semantics of the languages to include the use of a standard run-time environment, I/O packages, and simplified register and instruction usage. The factors more generally recognized as important have been readability, conciseness, and the availability of operators, control structures, and compile- and run-time checks.
When high level languages are used in conjunction with meta-programs, we saw that readability of code becomes less critical (2.7), type checking may be best handled by naming conventions (2.5), and mechanical enforcement of other conventions is unnecessary (2.3). What remains essential are capabilities: access to the most efficient means of doing useful work on the computer.
Examples of capabilities may include such mundane conveniences as compile time constants, the ability to retrieve the remainder in a division operation and the high order part in a product, or to access data through pointers; or necessities such as reading or writing magnetic tapes.
Unfortunately, questions of capabilities have become enmeshed with stylistic considerations, and access to capabilities has often been denied for fear of aesthetic disunity, abuse, loss of protection or possible misunderstanding. While these fears have been valid under the conventional organization of production, under meta-programming the stylistic focus is on the meta-programs, and the language of implementation is simply a tool of interaction with the computer. The style of the meta-programs is controlled explicitly by the meta-programmer and implicitly by administrative conventions (2.3). Further controls of style by high level language processors are redundant, and may actually be harmful if capabilities are lost as a result.
2.9.2 Buddy System [Metzger], Egoless Programming [Weinberg]
Both of these essentially equivalent techniques emphasize careful reading and checking of code before debugging may start. It is also significant that the checking would not be done by the author, who is more likely to overlook his own mistakes, but by a peer, the buddy. The following advantages accrue from the arrangement: debugging is simplified because the checking is likely to remove some fraction of the mistakes; the checking also ensures that at least two persons will be familiar with the details of the code; finally, the peer review may serve as an incentive for more careful work. The major cost factor is the time spent by programmers reading other programmers' code, learning the local language defined therein and understanding the details to the degree necessary for finding mistakes. Note that there are no operationally unambiguous signals of the reviewer's failure to do a thorough job. In fact, the better the unchecked code, the more difficult it is to evaluate the reviewer's work.
In a Software Production Team, a form of the buddy system is present: all design details undergo intense scrutiny by the meta-programmer while writing the meta-programs, and by a technician while writing the code. Since both of these activities are directly productive, checking does not entail extraordinary costs.
Assuming, conservatively, equal productivity, a Team of two will complete some module in half the time taken for the same task by a conventional programmer. The man-hours used are the same in both cases, but the Team's code is already checked. Checking of the programmer's code may cost an estimated 30-60% more.
Strictly speaking, the Team's checking is less complete: the technician's written contribution, the elaborated code, is not checked by review. However, the conceptual difference between the code and the double-checked meta-programs is small enough to suggest that errors introduced by the elaboration process will be simple and few in number. These and the other remaining errors will be caught during debugging.
The combination of directly productive and checking activities also means that the completion of the productive task implies the completion of a careful scan of the contents and, therefore, a measure of checking.
The buddy system and the Team approach both require that the participants practice egoless programming [Weinberg], that is, be willing to release their work for public scrutiny. The meta-programmer should have no problem in accepting this condition, since the meta-programs are all but worthless unless someone reads them. However, the technicians are put into a potentially less comfortable situation: not only can they not keep their programs private, but they must also submit to decisions made by the meta-programmer. This suggests that inexperienced programmers should be selected as technicians. These people would welcome the learning opportunity and would be motivated primarily by being part of an extremely productive organization.
Cross meta-programming is an attempt to combine the simpler social structure of the buddy system with the higher efficiency of meta-programming. In this scheme, the two programmers of a pair both play the dual roles of meta-programmer and technician, working for one another. This way the checking time will be reduced and the scrupulousness of checking will be operationally ensured, as shown above.
The difference between cross meta-programming and the Software Production Team organization is in specialization: the Team members are more specialized in their roles. Because of the lack of specialization, cross meta-programming is less efficient: a programmer is either over-qualified to be a technician or under-qualified for the meta-programmer's job. Nevertheless, under existing conditions, cross meta-programming may be an attractive form of organization.
2.9.3 Structured Programming, GoTo-less Programming
Structured programming is a design methodology, originally described in [Dijkstra], which can be used to great advantage by engineering organizations (1.2) for system analysis and also by the meta-programmer for detailed design. The meta-programming requirement that implementation proceed bottom-up (2.2) is compatible with structured programming: the design may itself be bottom-up [Dahl-Hoare], or the top-down design may precede the implementation.
The problem of personnel training for structured programming is greatly simplified if the technique is used in a Software Production Team: only the meta-programmer has to be trained initially. The technicians following the well-structured meta-programs cannot but write structured code.
The remarks of Section 2.9.1 apply to comparisons of structured constructs and unstructured GoTo statements in implementation languages.
2.9.4 Chief Programmer Teams
The Chief Programmer Team (CPT) organization is the pioneering application of engineering and management principles to production programming. The method is introduced in [Baker1] thusly:
"Seeking to demonstrate increased programmer productivity, a functional organization of specialists led by a chief programmer has combined and applied known techniques into a unified methodology. Combined are a program production library [also called development support library, DSL], general-to-detail [top-down] implementation and structured programming ... "
Additional techniques associated with the CPT organization are egoless programming, top-down development, the employment of "more competent but fewer people", among them the backup programmer who "can assume the leadership role at any time, if required", and the programming secretary who maintains the DSL; and finally, the "reintroduction of senior people into detailed program coding" [Mills]. Comments made earlier on structured programming and egoless programming remain applicable when these techniques are used in a CPT.
It is evident that these ideas cover a larger range of concerns than the present work; in particular, system architecture and system design are within the scope of the team effort, and so are certain tools. We assigned the former tasks to an engineering organization (1.2) and have not discussed the question of tools at all.
For example, the DSL and the associated specialist, the programming secretary, can greatly simplify the use of batch processing systems. The reported success of this tool within or without a CPT [Mills] shows that software implementation of all clerical functions is not a prerequisite of programming productivity. The DSL's significance in promoting communications will be discussed below.
Top-down development of system architecture, as advocated in [Mills], requires that the architect have a clear vision of the lower levels of abstraction. Often the design will have to be developed iteratively, "oscillating between two levels of description ... This oscillation, this form of trial and error, is definitely not attractive, but with a sufficient lack of clairvoyance and being forced to take our decisions in sequence, I see no other way," comments [Dijkstra]. Uncertainty absorption and continuous process production, introduced in Section 1.2, are explicit concepts for clarifying organizational roles while the design is developed. Similar ideas are implicit in Mills' remarks: "software was delivered ... in spite of 1200 formal changes in the requirements [. The] rate at which computer time was used remained nearly constant from the 9th to the 24th month, a consequence of the continuous integration ... " [Mills].
In a CPT the chief programmer bears project responsibility, aided by the backup programmer, who can ensure the continuity of the project should the chief leave. The locus of project responsibility may or may not reside in an SPT, depending on the detail of task orders (2.3). For shorter, routine, or generally parsimonious projects the meta-programmer can take the full responsibility. Larger projects, which have to be able to survive changes in key personnel, should be supported by an engineering organization representing the overall project responsibility and maintaining continuity. The task orders from the engineering organization to the SPT would be more detailed in this case, and the tasks themselves would be shorter in duration. Several variations for replacement of personnel are possible: the meta-programmer can be replaced with the loss of at most one task plus his knowledge of the project; the key architect in the engineering organization could probably be replaced by the meta-programmer; or a backup architect could be employed by the engineering organization.
The basic CPT idea of letting senior talent participate in directly productive activities has been fully adopted in the SPT organization (2.2), substantially determining the meta-programmer's role. Nonetheless, there are numerous differences of detail. The meta-programmer does not write code at all, yet he can maintain absolute product control by meta-programming. Lacking this powerful communication instrument, the chief programmer must code the critical portions of the program to exercise control. Because of the highly leveraged position of the meta-programmer, the other members of the team do not have to be "more competent" to be able to emulate and absorb the meta-programmer's skill and
The critical communication problem (1.6) is addressed in a CPT by reliance on structured programming and the visibility of programs afforded by the DSL. These measures enable programmers to read and understand each other's code. In the SPT, the wheel organization, the centralization of language creation, and the object naming conventions aid communications to the degree that all reading and understanding can be overlapped with directly productive activities.
The opposite directions of implementation in CPT and in SPT were determined in both cases by independent considerations. The bottom-up order of SPT is necessary so that communications can always use known, concrete terms, defined operationally by procedures already coded and understood. The argument supporting top-down order of implementation in CPT (a question separable from the order of design, which has been discussed above) shows the efficiency and thoroughness of testing when higher level routines (the earlier ones in the top-down sequence) are available to create a realistic test environment for lower levels [Baker] [Barry]. It is possible to combine these advantages: a set of routines may be coded bottom-up until a level at the top or near the top is reached; then debugging can start from the top down, always using the higher routines to create the test environment for the ones below. It should be noted that the test data in the realistic environment is more complex than if data were generated by special purpose drivers. State vector syntax checkers (2.6) are indispensable for localizing errors under such circumstances.
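The combined ordering described above can be illustrated with a small sketch. It assumes a hypothetical call hierarchy (the routine names are invented for illustration) and uses Python's standard `graphlib` module: coding proceeds bottom-up, so every routine follows the routines it calls, and debugging runs in the reverse order, so each routine is exercised only after its callers are available as a test environment.

```python
from graphlib import TopologicalSorter

# Hypothetical call hierarchy: each routine maps to the routines it calls.
CALLS = {
    "Main": {"ParseCommand", "Execute"},
    "ParseCommand": {"ReadToken"},
    "Execute": {"ReadToken", "WritePage"},
    "ReadToken": set(),
    "WritePage": set(),
}

def coding_order(calls):
    """Bottom-up order: every routine comes after all the routines it calls."""
    # TopologicalSorter treats each mapped set as predecessors, so callees
    # are emitted before their callers.
    return list(TopologicalSorter(calls).static_order())

def debugging_order(calls):
    """Top-down order: every routine comes after its callers, which then
    provide the realistic test environment for it."""
    return list(reversed(coding_order(calls)))

if __name__ == "__main__":
    print("code: ", coding_order(CALLS))
    print("debug:", debugging_order(CALLS))
```

Ties between independent routines (here `ReadToken` and `WritePage`) may come out in either order; only the caller/callee constraints are guaranteed.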
2.9.5 Automatic Program Verification
In [Deutsch] we find the following definition of this method:
"Program verification refers to the idea that one can state the intended effect of a program in a precise way that is not merely another program, and then prove rigorously that the program conforms to this specification. Automatic refers to the hope that ... we can build systems that perform some or all of this verification task for us".
The promise of verification is then both qualitative and quantitative. On the qualitative side, absolute, rather than approximate, correctness will be attainable. Quantitatively, the mechanization of the process may improve productivity by eliminating the need for manual debugging. This distinction is important, because the absoluteness of correctness has very little practical value. The property valued by users is reliability, defined in [Parnas2] as a "measure of the extent to which the system can be expected to deliver usable services when those services are demanded." Parnas goes on to argue that reliability and correctness are complementary but not synonymous. A logically correct program may, in fact, be unreliable if its specifications fail to account for the possibility of hardware errors or incorrect input.
In general, it is not sufficient that the system maintain its temper in the face of adversity, as operational experience may show that technically well-defined responses may be operationally unacceptable. The difficulty of predicting the sources of operational difficulties so that their handling can become part of the specifications is well illustrated by the ESS experience [Vyssotsky], where most of the (extremely rare) failures were caused by external events, or combinations of events, which the system designers did not foresee at all. This means that if the number of program errors can be kept substantially below the number of specification problems, further elimination of program errors will not perceptibly improve reliability.
The projected output of verifiers would include theorems and conditions under which the theorems do not hold. The conditions might be in the form of paths through the program, symbolic counterexamples and so on. Such output is essentially the equivalent of a run-time error indication (2.6). To be quantitatively helpful, a verifier will also have to localize the point of error.
The possibility of interactive help to verifiers [Deutsch] also raises personnel issues: what level of training will be required for the helpers?
3.1 Introduction
To verify the predictions of the meta-programming theory, a series of experiments was performed, as described in this chapter. The general experimental approach was to do a small number of full-scale programming projects, with some variation in key personnel and in organization (Sections 3.2 and 3.4). In particular, in the last project (Project D, 3.9.3) three programs were produced from the same specifications, by three different groups in a controlled experiment.
All participants in the experiments were full-time employees. Programming was done on personal computers using a high-level system programming language (3.3). Utility programs on the computers were instrumented to record measurements of their usage automatically. Details of the measurement system are described in Section 3.5 and in Appendix B. One of the projects (Project C; 3.6, 3.9.2, Appendix C) produced a simple Management Information System, which was later used to process the collected data.
Independent evaluation of the experimental results is made possible by the detailed descriptions of the experimental environment (3.3), the personnel selection criteria (3.4), the task specifications (3.6, Appendices C and D), the definitions of the productivity measures used (3.7), and the processing used to eliminate various distortions from the raw measurement data (3.8).
Section 3.9 describes the results of the experiments. During the longest experiment, Project C, almost 14,000 lines of code were written, at an average rate of 6.12 lines/man-hour. The controlled experiments of Project D showed that comparable results can be obtained by different persons acting as meta-programmers. The difficult experimental comparisons of the meta-programming and conventional organizations, however, yielded only inconclusive results.
3.2 Experimental Approach
Organization of experiments for the measurement of software productivity demands a fundamental choice of resource allocation between a larger number of experimental implementation efforts, each limited in size and scope, and a smaller number of samples which may be more representative of the important, larger-scale problems. In the former case the results can be statistically significant, but serious doubts would remain about their scalability or applicability to the larger-scale domain. The latter choice would yield results which would be applicable, but their statistical value would be correspondingly reduced and the contributions of distinct variables blurred.
The concern about the scalability of results is caused mostly by the nonlinear growth of communications, both within the organization producing the program and within the program itself ([Brooks] Chapter 8). Since the difficulty of communications in a team of producers caused by the continuous enrichment of the local language has been posited in Section 1.6 as the basic structural obstacle to higher productivity, the decision was made to perform only larger scale experiments whereby this effect could be observed or
Realistic resource limitations would severely limit the number of such experiments. They would then, at best, serve as demonstrations of the feasibility of achieving certain results under certain conditions. The subjective significance of the demonstration to an external observer would depend on the deviation of the results from the norm; the presence of valid predictions, since a predicted deviation is less likely to be a fluctuation; and finally, the perceived ability to reproduce the circumstances of the experiment.
The enthusiastic response to the Chief Programmer Team results in the celebrated New York Times Information Bank project [Mills] [Baker1] exemplifies the potential impact of demonstrations. The results were far above the norm; the authors had in fact predicted the productivity improvement; and the purely organizational approach invited reproduction.
Since the environmental and personnel factors are generally the major obstacles to independent reproduction of results, it was also decided that, insofar as resources permit, the fraction of results attributable to these factors should also be demonstrated. The meta-programming method itself makes no assumptions about tools (2.2), and special programming skills are required only from the meta-programmer. The fraction of productivity improvement not due to the environment and personnel should then be the method's own contribution, reproducible in a wide set of environments by different participants.
The separation of contributions to the results was done by matched pairs of demonstrations, in which some critical variable was varied while the other variables were matched as closely as possible. Whenever matching required approximation, either because of the difficulty of perfect matching, or because the variation in the critical variable precluded certain matches, a conservative approach was taken, as described for each case in the sequel, to obtain credible results.
3.3 Experimental Environment
Although not a part of the method under discussion, a description of the programming environment is in order: first, because it contains some unusual features, and second, to allow direct comparisons of the uncontrolled experimental results with other experiments or experiences.
The choice of environment was determined by considerations of availability, inherent efficiency so that personnel costs can be reduced, and support of measurements (3.5). Throughout the experiments, an operating personal mini-computer [LRG] [Lampson2] was available to each participant at all times. A removable disk cartridge provided 2.5 million characters of file storage on each computer. Furthermore, the computers were connected by a communication network [Metcalfe-Boggs] to each other and to a central time-sharing system, which was used as a repository for common files and for archival storage. Another means of backing up files was the copying of whole disk cartridges. A high speed printer was also available via the network.
All programming was done in the typeless system programming language BCPL [Richards]. The sequence of operations in the program creation cycle was to generate or edit source program text using an interactive editor, compile the new source or the old source modules affected by the changes, issue the load command, and run the loaded program under the control of an interactive debugger. The editor used was QED [Deutsch-Lampson] in the early experiments group (3.4) and the Project B editor (3.6) during the main experiments (3.4). The debugger was a direct descendant of DDT [TENEX]. It could be used to set breakpoints, inspect variables, and call procedures during execution of a loaded program. The symbolic names of procedures, labels, and global variables were known to the debugger, but the names of local variables and compile time constants were not.
The programs written could depend on the services, such as streams, files and file directories, of an open operating system described in [Lampson2].
Participants also enjoyed reasonably private accommodations. Junior participants (3.4), hired for the duration of the experiments, had the experimental work as their full-time assignment. Senior participants had only the usual load of planning, reviews, reports, and conferences in addition to their major, full-time experimental responsibility. All participants were paid competitive industrial wages commensurate with their experience. Benefits included paid holidays and legislated state benefits.
To minimize the effects of the measurements on the experimental ensemble, the measurements were made unobtrusive and largely automatic (3.5). Absolutely no evaluations of the measurements were made while the experiments were in progress, except for periodic inspections to ensure that the collected data was safe and complete.
3.4 Experimental Setup
The sequence of experiments can be divided into two major groups: first, the early experiments group, comprising two projects designated A and B respectively; and second, the main experiments group, which included projects C, D1, D2 and D control.
The purpose of the early experiments was the validation of the basic meta-programming ideas, the clarification of the supplementary ideas and conventions, and the training of a second meta-programmer. The software produced for projects A and B, a cross-reference program and a text editor (3.6), was used in support of the main experiments. The Project B editor provided the instrumentation for the measurements (3.5).
Based on the experiences from the early experiments, the main group was designed to implement the approach described in Section 3.2.
Project C demonstrated the productivity of a Software Production Team and the quality of the code produced.
Projects C, D1 and D2 showed the degree of independence of results from personnel factors.
Project D control provided data on the performance of a conventional programming group for comparison.
The assignment of personnel to the various projects is illustrated in Figure 9. The present author is designated M1. Programmer P1, a researcher with a Ph.D. in Computer Science, and M1, at that time a candidate for the same degree, were the senior participants. The technicians T1-T5 and programmer P2 were junior participants, hired for the duration of the experiments only. T1 and M2 denote the same person in different roles.
Senior participants were well acquainted with the experimental environment. Technicians got their training strictly on the job. Programmer P2 and meta-programmer M2 were given time to practice, as described below, before their participation in the experiments.
The technicians' score on a programming test (Appendix A) was a major factor in their selection. However, applicants with a professional programming background, who often had excellent scores, were considered overqualified. Technicians T1, T3 and T4 had very similar backgrounds (4 years at prestigious universities, no professional programming experience, approximately 5 computer science courses with a grade point average of 3.8 for those courses only) and similar test scores (no errors; 75, 70 and 103 minutes for T1, T3 and T4 respectively). The qualifications of T2 and T5 were similar except for professional experience and test results. It is evident from the topology of Figure 9 that these differences could not affect the main experiments group.
[Figure 9: Projects A+B, July-Sept 1974; Project C, July-Nov 1975; Projects D1 and D2, Dec 1975; Project D control, Jan-Feb 1976.]
Figure 9 Organization of the experiments. T1-T5 are technicians. M1 and M2 are meta-programmers. P1 and P2 are programmers.
The participation of T3 and T4 in Project C was designed to test that, with the above selection criteria, the variation in the technicians' individual productivities is small (1.5).
M2, the second meta-programmer, learned the use of the tools, the meta-programming method and the conventions as a technician in the early experiments. He was later given the opportunity to practice the meta-programmer's role in a team with T5 for about five months. Thus the preparations of M1 and M2 for Projects D1 and D2 differed considerably. On the other hand, T4 and T3, the other participants of D1 and D2, were closely matched in training prior to joining the experiment, as well as after: they took part in the production of the same program, Project C, under the direction of the same meta-programmer, M1. The particular pairings of M1 with T4 and M2 with T3 were obtained by random selection. After the pairings the two teams were given identical task orders (3.6), which they implemented independently. These teams were set up to demonstrate the relative insensitivity of the method to the meta-programmer's personality, while the other variables (environment, problem specification, technician selection criteria, technician training) were held as comparable as possible.
To approximate the potential of the other two Project D teams, the Project D control team was organized around a senior member, P1, and a junior programmer, P2. The latter had a B.A. degree in Mathematics and three years of systems programming experience. He was hired on the basis of references and an interview. No written tests were given; this is now considered a mistake. According to standard industry practices, his starting salary was 31% higher than the technicians'. He was allowed three weeks to get acquainted with the implementation language and the tools.
3.5 Measurement Methods
The simple measurements obtained from the early experiments were weekly printouts of the lengths, in characters (3.7), of all meta-programs and source language programs. At the same interval, the contents of these files were also stored on magnetic tape. Manual record keeping of time spent in various activities was also attempted and abandoned as impractical.
In the main experiments, collection of productivity data was aided by software modifications to the editor to record data on a measurement file. Records in this file are in the form of text lines, each containing the date and time, the name of the person working, and a code identifying the format of the remaining variable portion of the record. The contents of the latter part depend on the nature of the event being recorded:
Editing of files is performed on temporary copies for technical reasons. When the edits are complete, the user issues a save command to store the edited copy in a permanent file. For every save, a measurement record is made showing the file name, the number of characters written, the change in the size of the file, and a breakdown of the characters written by source, which may be the keyboard, the previous version of the same file, or different files identified by their names.
At the end of an editing session, usually right after the edited files are saved, general information about the time spent editing, the number of commands typed, and the total number of characters entered from the keyboard is recorded on the measurement file.
Also at the end of a session, the BCPL compiler or the loader may be designated by the user as a successor program. The designation and any parameters to the successor, such as the name of the file to be compiled, are also recorded. When the compilation is complete, control is automatically returned to the editor and the list of compilation errors is displayed. The user is prompted to make a comment about the number of errors (see below). This way the use of the compiler and loader can be monitored by the editor's measurement mechanism, provided the user abides by the conventions and always calls these programs from the editor.
The user can also make miscellaneous comments which will be recorded. For example, a stylized comment may mark the beginning and end of a work period, the reception and completion of a task order, or other important events.
The precise format of the measurement file is documented in Appendix B.
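The record structure described above (a fixed part with date, time, person and format code, followed by a variable part) can be sketched as a small parser. Appendix B is not reproduced here, so the line layout `YYYY-MM-DD HH:MM PERSON CODE rest...` and the `SAVE` payload `name chars delta source=count ...` are purely hypothetical assumptions chosen for illustration, not the actual file format.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Record:
    when: datetime  # date and time of the event
    person: str     # name of the person working
    code: str       # identifies the format of the variable part
    rest: str       # variable portion, interpreted according to `code`

def parse_record(line: str) -> Record:
    """Split one measurement line into its fixed and variable parts.
    Assumed (hypothetical) layout: 'YYYY-MM-DD HH:MM PERSON CODE rest...'."""
    date, time, person, code, *rest = line.split(maxsplit=4)
    when = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M")
    return Record(when, person, code, rest[0] if rest else "")

def parse_save(rest: str) -> dict:
    """Hypothetical SAVE payload: file name, characters written, change in
    file size, then the breakdown of characters written by source."""
    name, chars, delta, *sources = rest.split()
    breakdown = {src: int(n) for src, n in (s.split("=") for s in sources)}
    return {"file": name, "chars": int(chars), "delta": int(delta),
            "sources": breakdown}
```

In this layout a stylized comment such as a work-period mark would simply be a record whose variable part is free text.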
In preparation for processing the collected data, the implementation of a simple Management Information System was also undertaken as Project C (Appendix C).
3.6 Task Specifications
It should be emphasized that the object of the experiments was to measure the productivity of software production organizations working on well-defined (1.2) problems. Other characteristics of the problems and the qualities of the abstract solutions were not of primary interest.
For Projects A, B, and C there were no fixed specifications prepared in advance. The task orders to the experimental group, comprising the production organization, were the statements of problems; the organization was to produce code working toward the solution of the problems. These were:
Project A: prepare a cross-referenced listing of a set of BCPL files.
Project B: allow editing of BCPL source text and other documents with commands such as insert, delete, search, read and write files, and transfer data between files.
Project C: implement a query language operating on measurement files (Appendix B), powerful enough to obtain productivity figures from a database that may contain errors.
The lack of pre-planning meant that the designs had to be divided into relatively independent partitions so that one part could be implemented while another was designed. The remaining parts were considered only in general terms before full attention could be focused on them. This mode of operations was consistent with the principles of continuous process production expounded in Sections 1.2 and 1.3. The success of the partitioning, and indeed, the success of the production effort, was dependent on the meta-programmer's understanding of the tasks. The above problem statements appeared well-defined for the particular meta-programmer M1 because of his earlier experiences with similar systems. The resulting design for Project C is described in Appendix C.
For Project D (D1, D2 and D control) it was important that all groups work on comparable tasks. Accordingly, a detailed task order was drawn up by an external collaborator. The order is shown in Appendix D. It specifies a utility program which can permute disk storage while keeping the assorted directory and file structures intact. The reason for permuting storage is usually to bring logically consecutive file pages together in the physical address space in order to improve the speed of sequential access in the rotating memory. Uncertainties about the permutation algorithm and the user interface were absorbed by the order. Although the directory and file structures were not described in the order, they were amply documented elsewhere (for example [Lampson2]) and were also well known to M1, M2, P1, and, to a lesser extent, to P2.
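What "permuting disk storage" involves can be illustrated with a minimal sketch. The task order of Appendix D is not reproduced here, so this is not the algorithm the teams implemented; it assumes a disk modelled as a Python list in which each page is tagged `(file_id, page_no)` or `None` when free, and it ignores the directory updates the real utility had to perform. The layout places each file's pages consecutively, and the permutation is applied by following its cycles, so only one page-sized buffer is needed.

```python
def target_layout(disk):
    """disk[addr] is (file_id, page_no) or None for a free page. Returns
    dest, where dest[addr] is the new address of the page now at addr:
    each file's pages become consecutive and free pages go to the end."""
    used = sorted((a for a, p in enumerate(disk) if p is not None),
                  key=lambda a: disk[a])  # order by (file_id, page_no)
    free = [a for a, p in enumerate(disk) if p is None]
    dest = [0] * len(disk)
    for new_addr, old_addr in enumerate(used + free):
        dest[old_addr] = new_addr
    return dest

def permute_in_place(disk, dest):
    """Move every page to dest[addr], following the permutation's cycles
    so that only one page is held in memory at a time."""
    done = [False] * len(disk)
    for start in range(len(disk)):
        if done[start] or dest[start] == start:
            done[start] = True
            continue
        carried = disk[start]  # the single in-memory page buffer
        addr = start
        while not done[addr]:
            done[addr] = True
            nxt = dest[addr]
            # Write the carried page to its destination, picking up the
            # page that was there so it can be moved next.
            disk[nxt], carried = carried, disk[nxt]
            addr = nxt
    return disk
```

Following cycles rather than copying to a second area is one design choice among several; a real utility would also have to survive crashes mid-permutation, which this sketch does not address.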
3.7 Productivity Accounting
The simplified productivity measure, introduced in Section 1.5, is defined as the amount of completed source code divided by the man-hours associated with its production. In this section, a more detailed breakdown of the components of the productivity calculation is given.
The quantity of code is always measured in characters [ASCII], although it may be expressed as "lines" of 26 characters. The count of characters is not only more convenient to obtain for measurements, but it is also more invariant with respect to style. The conversion factor 26 has been obtained by counting lines in a representative sample of BCPL source programs. Lines wholly blank were not counted. Ends of lines counted as single "carriage return" (CR) characters. The sample programs were properly indented; each indentation level on each line counted as one "horizontal tabulation" (HT) character. Conversions of the productivity figures to other line length statistics can be readily performed by converting to character units first.
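The counting rules above can be expressed as a short function. This is a sketch, not the measurement software used in the experiments: it assumes that indentation written with spaces uses a fixed width per level (`indent_width`, a hypothetical parameter), while each tab counts as one level.

```python
CHARS_PER_LINE = 26  # conversion factor obtained from the BCPL sample

def measured_chars(source: str, indent_width: int = 4) -> int:
    """Count characters by the stated rules: wholly blank lines are skipped,
    each line end counts as one CR, and each indentation level counts as one
    HT. indent_width (spaces per level) is an assumption of this sketch."""
    total = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # wholly blank lines are not counted
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        levels = indent.count("\t") + indent.count(" ") // indent_width
        total += levels + len(body) + 1  # HTs + text + one CR
    return total

def equivalent_lines(chars: int) -> float:
    """Express a character count as 'lines' of 26 characters."""
    return chars / CHARS_PER_LINE
```

Converting to another line-length statistic then amounts to dividing the same character count by a different factor.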
Code produced by SPTs contained no explanatory comments (2.7.5), but standards required a comment statement with the name of every procedure and approximately five comments identifying various groups of declarations in every source module. All comments appearing in code produced by the Project D control team were included in the length measurements.
The lengths of meta-programs, although reported separately, were not included in productivity figures.
Externally produced shared code was excluded from the productivity calculation in all projects. Information on sharing opportunities was made available to all three Project D teams equally.
The final production figure for every project refers to net lines, that is, lines debugged to proto-software quality (1.2). Figures reporting on the intermediate progress of projects, however, do not distinguish between debugged and undebugged lines because that would be impractical. While not measuring true productivity, these intermediate figures are very useful in investigations of the continuous production process (3.9).
Although the measurements show the precise number of hours worked by all participants, productivity was calculated on the basis of standard eight-hour days, with only a few exceptions. Inherently part-time activity, such as advance design activity by the meta-programmer, was included as measured. Overtime (3.9.3.1) was also included as measured. Days of physical absence by senior participants were not included. There were no sick leaves or personal leaves taken during the projects.
It is important to note that the meta-programmer's time was charged against the SPTs' productivity. The on-the-job training time (3.4) of the technicians was similarly included. The time for special training of M2 and P2 (3.4) was excluded.
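The accounting rules of this section combine into a single ratio: net characters converted to 26-character lines, divided by charged man-hours. The sketch below is a hedged illustration; the `Participant` fields are hypothetical names for the quantities the text describes (standard days charged, overtime and part-time hours as measured), and the numbers in the usage test are invented.

```python
from dataclasses import dataclass

CHARS_PER_LINE = 26
STANDARD_DAY_HOURS = 8

@dataclass
class Participant:
    name: str
    days_present: int             # standard days charged; absences excluded
    overtime_hours: float = 0     # included as measured
    part_time_hours: float = 0    # e.g. advance design work, as measured

def man_hours(team):
    """Total hours charged: eight-hour days plus measured extra hours."""
    return sum(p.days_present * STANDARD_DAY_HOURS
               + p.overtime_hours + p.part_time_hours for p in team)

def productivity(net_chars, team):
    """Net (debugged) lines per man-hour. Meta-programs and externally
    produced shared code must already be excluded from net_chars, and the
    meta-programmer must be a member of team."""
    return net_chars / CHARS_PER_LINE / man_hours(team)
```

Including the meta-programmer in `team` is what charges his time against the SPT's productivity, as the text requires.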
3.8 Potential Sources of Measurement Errors
There were a number of failure modes of the measurement setup (3.5) which caused the intermittent recording of erroneous information. Using the redundancy in the measurements, inconsistencies in the data were localized and the errors were estimated or, in most cases, corrected. The particulars of this process depended on the failure mode.
For example, the mini-computer used in the experiments (3.3) relied on a time
base, kept in unprotected core, for keeping time. The measurements, in turn,
recorded the time as provided by the machines. It was not uncommon for the time
base to get lost while programs were debugged. Many of these events were noticed
and corrected by the users. Others were found by using the Project C system to
scan the database for records with time stamps out of order. Each instance of the
error was inspected and the correct time was estimated to fit the correctly
recorded neighbouring records. Correction of the database was done by manual
editing.
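The out-of-order scan described above amounts to comparing each record's time stamp with its predecessor's. The sketch below is illustrative only: it assumes, hypothetically, that the stamps have already been pulled out of the database into a list of integers (say, minutes), and the names find_out_of_order and suggest_fix are invented here, not taken from the Project C programs. The midpoint rule in suggest_fix is just one possible way to "fit the correctly recorded neighbouring records"; cases it cannot bracket are left for manual inspection.

```python
def find_out_of_order(stamps):
    """Return the indices i where stamps[i] is earlier than stamps[i - 1].

    Assumption: stamps is a list of integer time stamps (e.g. minutes),
    in the order the records appear in the database."""
    return [i for i in range(1, len(stamps)) if stamps[i] < stamps[i - 1]]


def suggest_fix(stamps, i):
    """Suggest a stamp for record i that fits between its neighbours.

    Hypothetical rule: the midpoint of the previous and next stamps.
    Returns None when the neighbours do not bracket a valid time,
    leaving that case to manual inspection."""
    if i <= 0 or i + 1 >= len(stamps):
        return None
    before, after = stamps[i - 1], stamps[i + 1]
    if before > after:
        return None
    return (before + after) // 2


# A lost time base shows up as a sudden drop in the stamps.
stamps = [100, 105, 3, 110, 118]
for i in find_out_of_order(stamps):
    print(i, stamps[i], suggest_fix(stamps, i))
```

Flagging only the drop, rather than every later record, mirrors the procedure in the text: the scan localizes candidates and a person decides on the correction.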
The procedure for localizing and correcting other errors followed the same
pattern. First, the database was scanned by a special-purpose Project C program to
find all questionable records. The selected records were then inspected and
corrected if necessary.
Another common error was the operator's omission to mark the beginning and the
end of a working period (Appendix B). These were easily found after listing all
intervals of apparent inactivity which were longer than 30 minutes.
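Listing the intervals of apparent inactivity is a single pass over consecutive activity records. This is a minimal sketch, assuming (hypothetically) that the activity time stamps are available as a sorted list of minutes; long_gaps is an illustrative name, and the 30-minute limit is the one given in the text.

```python
def long_gaps(stamps, limit=30):
    """List (start, end) pairs of consecutive activity stamps that are
    more than `limit` minutes apart.

    Assumption: stamps are sorted times in minutes. Each reported gap is
    only a candidate for a missing end/beginning-of-period mark and is
    meant to be checked by hand."""
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > limit]


print(long_gaps([0, 10, 50, 60, 100]))
```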

While it was possible to omit records of compilations, calls on the loader, and
syntax errors (3.5), in fact, the records of these events are precise because the use
of the correct procedure was actually simpler than the alternative.
Records of the number of semantic errors (bugs) were generally unreliable, partly
because of the subjective element in deciding what constitutes a bug, and partly
because of the complexity of the procedure: at the time the bug was found, the
user was usually working with the debugger but the record had to be made in the
editor. An independent rough estimate of the number of bugs can be obtained
from the number of re-compilations and loads.
During the experiments, source code files were frequently copied and renamed for
backup, recovery or other purposes. This created a dangerous situation in which
the same code might have appeared in the measurements under different names
and might have been counted more than once. Careful monitoring of the
appearance of new filenames in the database helped to account for these events.
3.9 Experimental Results
The summaries of the measurements are given in Appendix E. Selected measurements are
also plotted in Figures 10 through 13. These measurements do not, in themselves,
comprise the experimental results. The following sections will supplement the basic
measurement data with particular interpretations and with descriptions of other, not
readily quantifiable, results. The summaries by no means lessen the importance of the
highly resolved details of the measurements: in some instances the method of
interpretation and the acceptability of simplifications depend on the nature of the data.
Moreover, access to the detailed data offers the opportunity for alternative
interpretations. Finally, some of the measurements are also of general interest.
3.9.1 Early Experiments Group (Projects A and B)
The simplified programming productivity obtained during this early effort can be
calculated from the data given in Appendix E.1 as follows (see also Section 3.7):
5671 source lines / ((13 weeks - 3 holidays) × 3 employees) = 3.81 l/m-h
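The formula converts calendar time into man-hours before dividing. The sketch below reproduces it under two stated assumptions: the standard eight-hour day mentioned in Section 3.7, and five working days per calendar week (an assumption of this sketch, which the 3.81 figure is consistent with). The function name simple_productivity is illustrative; "l/m-h" is lines per man-hour.

```python
HOURS_PER_DAY = 8   # standard eight-hour day used for the productivity figures
DAYS_PER_WEEK = 5   # assumption: five working days per calendar week


def simple_productivity(lines, weeks, extra_days=0, holidays=0, persons=1):
    """Source lines per man-hour for a project of the given length."""
    days = weeks * DAYS_PER_WEEK + extra_days - holidays
    man_hours = days * persons * HOURS_PER_DAY
    return lines / man_hours


# Projects A and B: 13 weeks less 3 holidays, 3 employees.
print(round(simple_productivity(5671, 13, holidays=3, persons=3), 2))
```

The same function, called as simple_productivity(13944, 19, extra_days=1, holidays=1, persons=3), yields the 6.12 l/m-h reported for Project C.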
In addition to the executable code, the projects yielded more than 3800 lines of
meta-programs. We shall call the ratio of source length to the length of the
meta-programs the meta-program expansion. In this experiment, the expansion was
149%. Reliability, user acceptance, and modifiability of the products were excellent;
numerous extensions to the Project B editor (such as the addition of measurements (3.5))
were later implemented by M1, T1, and also by other programmers whose interests were
unrelated to the experiments.
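The expansion is a plain ratio, expressed in percent. The sketch below uses the A and B figures from the text; the function name expansion is illustrative. Because the meta-program length is given only as "more than 3800" lines, the computed 149% is really an upper bound.

```python
def expansion(source_lines, meta_program_lines):
    """Meta-program expansion: source length / meta-program length, in percent."""
    return 100.0 * source_lines / meta_program_lines


# Projects A and B: 5671 source lines, "more than 3800" lines of meta-programs.
print(round(expansion(5671, 3800)))
```

The same ratio reproduces the later figures in this section: 284% for Project C (13944 / 4916), 173% for D1 (2399 / (1572 - 187)) and 107% for D2 (2467 / 2304).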
The occurrence of specific difficulties during the projects suggested that the exhibited
productivity could be improved just by refining the method and the conventions. In
particular, several days were wasted because of an insufficient understanding of the
modularization requirements of the BCPL system. The module template finally developed
has been in use throughout experiments C, D1, and D2. Not all of the naming conventions
described in Section 3.4 were known during the early experiments; instead of using the
standard constructions aX, eX, dX, or iX (3.4), different and often inconsistent tags were
introduced. Procedure names (2.7.2) were not regular at all. Check procedures and test
print procedures (2.6) were written only after some time had already been wasted by
conventional interactive debugging.
Inspection of the graph of the weekly changes in productivity (Figure 10, upper portion)
yields some interesting results. We note that there is little evidence of a learning
curve for the technicians. By the end of the third working day in a completely new
programming environment, with the help of the meta-programmer, the two technicians
were able to write about 300 lines of code (see E.1). However, this figure is not directly
comparable to the long-term average performance because the initial transient period was
not burdened with debugging tasks. Also, the initial meta-programs were especially
careful in specifying the kind of programming language constructs which were expected to
be used in the elaboration.
Figure 10 Plot of weekly changes in coding and meta-programming productivities (solid and broken lines, respectively) in
the early experiments group (above, weeks 1 to 13) and in Project C (below, weeks 1 to 19). The X axis is marked in calendar
weeks. Coding productivity is shown as lines of code per week per technician, adjusted for short weeks where indicated.
Meta-programming productivity is shown as lines of meta-program written per week.
Dumps of the project state show that the first load of the A system occurred during the
4th week (the modularization problem mentioned earlier surfaced at the same time), and
the system was released in the 8th week. System B was first loaded during the 7th week.
The first meta-program for a system B procedure was issued as early as the second week.
The overlap between the two projects explains why Project A did not have a "tail", a final
transient period of reduced productivity caused by the preponderance of debugging tasks
relative to code creation tasks. Project B exhibits a tail, starting at about the 9th week.
3.9.2 Project C
In this project, the fully developed meta-programming method, as described in Chapters 2
and 3, was applied to a medium size problem (3.6). The simplified productivity obtained
(E.1, 3.7) was:
13944 source lines / ((19 weeks 1 day - 1 holiday) × 3 employees) = 6.12 l/m-h
Separating the contributions of the two technicians, we have:
T3: 7423 source lines = 6.51 l/m-h
T4: 6521 source lines = 5.72 l/m-h
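The per-technician figures can be reproduced if each technician is charged for his own time plus half of the meta-programmer's, that is, 1.5 persons over the 95 working days of Project C. This accounting rule is inferred from the arithmetic and from the "1.5 persons" noted for the Figure 10 plots; the text does not spell it out here. The constant and function names below are illustrative, and five-day weeks are again assumed.

```python
HOURS_PER_DAY = 8
PROJECT_C_DAYS = 19 * 5 + 1 - 1   # 19 weeks 1 day, less 1 holiday = 95 days


def technician_productivity(lines, persons_charged=1.5, days=PROJECT_C_DAYS):
    """Lines per man-hour when a technician is charged for his own time
    plus half of the meta-programmer's (assumed accounting rule)."""
    return lines / (persons_charged * days * HOURS_PER_DAY)


for name, lines in [("T3", 7423), ("T4", 6521)]:
    print(name, round(technician_productivity(lines), 2))
```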
The numbers of compilations and program loads performed by the two technicians were also
very similar (959 versus 846 compilations and 573 versus 651 loads, respectively (E.2)).
The technicians spent most of their time working on disjoint portions of the system; T3
concentrated on the compiler and the user interface, while T4 worked mostly on the
run-time environment (Appendix C). Any variation of the individual productivities should
be viewed in light of the possible differences between the complexities of the subtasks
worked on.
After the completion of the project, the final product worked reliably when used to
process the more than 800,000 characters of measurement records collected during the
experiments. About 20 to 30 programs of an average length of 50 lines were written in
the C language. The Summary of the Measurements in Appendix E was compiled from
the outputs of these programs.
Although a small number (about 5) of programming errors were also uncovered, the
most serious operational problems were caused by the lack of certain capabilities
(2.9.2). For example, it was discovered that for some complex reason, filenames
in the database had been inconsistently listed in either lower or upper case letters.
The implementation of a special-purpose function to convert strings to lower case
was imperative to solve this problem. This experience supported the theory that
the last fraction of production errors would be dominated by specification
problems (2.9.6). With the production team no longer available, the meta-programmer
successfully undertook this implementation task.
The meta-programming conventions and the debugging organization described in Sections
2.5, 2.6, and 2.7 were used with good results. The check procedures were very effective in
localizing the complex failures of the storage allocation and garbage collection algorithm
required by the C language.
An interesting application of check procedures was called for in the solution of
a rare "real-time" error. The initial indication was a consistent machine halt, but
at a random place in the code. It was immediately concluded that the indication
was related to some side effect of the code being debugged on the only
unprotected real-time process in the computer: the 60-cycle timer interrupt. To
find the origin of the side effect, a check procedure was defined as follows: the
program state is correct (for this purpose) if the 60-cycle interrupt can take place,
otherwise it is incorrect. To implement this definition, the check procedure just
had to idle more than one-sixtieth of a second, to allow at least one interrupt, and
then signal that the state is correct. An observed machine halt served as the
incorrect state signal. A binary search (2.6) located the error in a few iterations.
Note that the check procedure used only an externally known property of the
timer interrupt, namely, that it takes place 60 times a second.
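The binary search can be sketched in the abstract. The sketch assumes, hypothetically, that the candidate places for the check procedure can be numbered 0 to n-1 in execution order, and that a predicate state_ok(i) reports whether the check placed at point i signals a correct state. In the timer case that means the idle loop completed instead of the machine halting. Here state_ok stands in for one manual run of the program, and first_bad_point is an illustrative name, not a procedure from the thesis.

```python
def first_bad_point(n_points, state_ok):
    """Return the first check-point index whose check fails.

    Assumes state_ok(i) is True for every point before the faulty code
    and False from the faulty code onward. Returns n_points if every
    check passes."""
    lo, hi = 0, n_points
    while lo < hi:
        mid = (lo + hi) // 2
        if state_ok(mid):
            lo = mid + 1   # state still correct here: the error lies later
        else:
            hi = mid       # check failed: the error is at or before mid
    return lo


# Hypothetical example: 1000 candidate points, side effect introduced at 37.
print(first_bad_point(1000, lambda i: i < 37))
```

Each iteration halves the remaining range, so about log2(n) runs suffice: roughly ten for a thousand candidate points. This is consistent with the "few iterations" reported above.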
The total length of the meta-programs was 4916 lines (284% expansion). Compared to
the early experiments, the higher expansion may indicate a more efficient style, or the
development of a richer local language in the longer project. The plot of weekly changes
in productivity (Figure 10, lower portion) shows evidence of the growth of local language
where the volume of meta-programs decreases while code production remains
approximately level; for example, during weeks 5 through 8, 9 through 12, and especially
during weeks 13 through 15. This effect is most pronounced during the bottom-up
implementation of a new subtree in the structured hierarchy (2.2). The "sawtooth"
starting at the 13th week, for example, marks the implementation of the run-time
interpreter and the various run-time standard procedures (C.6). It should be noted that
the writing of the meta-programs was timed so that elaboration could usually commence
immediately after a meta-program had been issued. For this reason, variations of weekly
meta-programming and coding productivities should correspond without appreciable
queuing delay.
It is apparent from the measurements (E.2) that in Project C, the initial training transient
had ended by the second week of operations. For technician T3, during the second week
all indicators (lines written, compilations, loads) were above the long-term averages.
During the same week, some of the indicators for T4 were lower, yet comparable to his
own averages over the first 9 weeks of the project.
To simplify the evaluation of the measurements, Project C had been brought to a halt
before Projects D1 and D2 were started. The final transient of Project C, closely
resembling the tail of Project B, can be observed starting at about the 16th week.
The measurements also show that there was, on the average, one compilation for every 6
source lines. Given the average productivity of 6.12 l/m-h, we see that roughly one
man-hour of effort went into each compilation on average (40 minutes, if the
meta-programmer's time is excluded). One loading was performed (implying approximately
one bug) for every 11 source lines. Obviously, compilation and load times (ranging from
30 seconds to 3 minutes) had very little effect on productivity.
3.9.3 Projects D1, D2, and D control
The purposes of the D experiments were (3.4) to measure production results in groups
led by different meta-programmers (Project D1 versus D2) and to compare the
performance of the meta-programming organizations with the performance of a group of
similar size but using traditional techniques (Projects D1 and D2 versus D control). The
optimal experimental ensemble would have let the three experimental groups work on the
same problem specifications, produce comparable products, and achieve the same
milestone before their termination. The actual execution of the experiments fell short of
the ideal in a number of ways. First, the scope of the problem was reduced midway
through Projects D1 and D2 (Appendix D); the D control team was given the simplified
specifications from the beginning. Second, Projects D1 and D2 had to be terminated
before normal operation of the product could be demonstrated, although test output
indicated the correct operation of large portions of the programs.
One problem with the large-scale experimental approach described in Section 3.2 was that
the same resource limitations that prevented the repetition of the experiments for control
also prevented the extirpation of anomalies. Approximate results can still be obtained by
careful consideration of the possible effects of the anomalies. The fact that the size of
the program was initially misjudged indicates an engineering, rather than production,
problem (1.2). The causes and remedies of such mistakes were beyond the immediate
interests of the present research.
All three groups chose to rely on the services of the existing operating system
[Lampson2] and on the same library sort routine. The sizes of these common routines
are excluded from the program sizes listed below and in Appendix E.
Figure 11 Comparisons of productivities in Projects D1, D2, and D
control. Projects D1 and D2 are plotted according to the conventions of
Figure 10. The last plot shows the total coding productivity of the two
participants of Project D control.
Figure 12 Comparisons of the individual productivities of the two
participants in Project D control. The plots follow the conventions of
Figure 10. The sum of these two curves appears in Figure 11.
Figure 13a Lines of code accumulated in Projects D1, D2, and D control
as a function of elapsed time. X axis is marked at every 5 working days
elapsed.
Figure 13b Lines of meta-programs accumulated in Projects D1 and D2
(by meta-programmers M1 and M2, respectively) as a function of elapsed
time. Triangular symbol marks start of code production.
It is conservatively estimated that both Projects D1 and D2 were terminated 4 man-days
before operational demonstrations. These estimates are supported by the following
observations: in both projects, all meta-programs had been completed and all code had
been written; test output indicated that the most important sections of the programs were
working correctly; all participants had previously demonstrated their ability to design or
elaborate code which was free of major surprises; and at 4 man-days, the simple
productivities of Projects C and D1 would be approximately equal. A valiant, but
unsuccessful, attempt to reach the milestone was in fact made in 10 hours of overtime
(Appendix D), prior to the impending Christmas vacation period. The estimates are
equivalent to declaring the projects 92% complete (see below); a difference of 1 man-day
in the estimate would change the results by approximately 2%.
Mechanical application of the productivity accounting principles used earlier yields the
following numbers:
D1: 2399 source lines / 49 man-days = 6.12 l/m-h
where the denominator is:
(5 weeks 2 days) × 1 meta-programmer
+ (3 weeks 3 days) × 1 technician
+ 4 man-days of debugging (estimate)
D2: 2467 source lines / 49 man-days (same as for D1) = 6.29 l/m-h
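The D1 and D2 denominators can be checked in the same way as before, again assuming five working days per week and eight-hour days. With the technician's term read as 3 weeks 3 days, the total comes to the stated 49 man-days. The last line shows where the "92% complete" figure comes from: 45 of the 49 man-days were actually worked. The helper name days is illustrative.

```python
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5   # assumption: five working days per calendar week


def days(weeks, extra_days=0):
    return weeks * DAYS_PER_WEEK + extra_days


# Denominator for D1 (and, per the text, D2), in man-days.
man_days = (days(5, 2) * 1      # meta-programmer
            + days(3, 3) * 1    # technician
            + 4)                # estimated remaining debugging
print(man_days)
print(round(2399 / (man_days * HOURS_PER_DAY), 2))   # D1, l/m-h
print(round(2467 / (man_days * HOURS_PER_DAY), 2))   # D2, l/m-h
print(round(100 * (man_days - 4) / man_days))        # implied completion, percent
```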
The lines of meta-programs written in the two projects differed considerably:
D1: 1572 (-187) lines, expansion: 173%
Note: 187 lines of meta-programs were never elaborated because of the change in problem scope.
D2: 2304 lines, expansion: 107%
The cumulative plot of meta-programming production is depicted in Figure 13b. The
start of meta-programming preceded the start of code production by more than one week
in both projects. Experience with Project C showed that supporting the immediate start
of coding put an unrealistic load on the meta-programmer. The lead times in Projects D1
and D2 were to be used by the meta-programmers to build a comfortable backlog of
meta-programs. The difference in the lead times (also shown in Figure 11) is not thought
to be of significance.
The 173% expansion of the meta-programs in Project D1 was less than in C (284%),
although both projects involved the same subjects: M1 and T4. The difference suggests
that due to the smaller size of the project, the local language of D1 was less rich than that
of C. Since the actual times spent meta-programming by M1 and M2 were nearly equal
(98 and 96 hours, respectively), the lower efficiency of M2's meta-programs can be
attributed to a more verbose writing style. Also, M2 and T3 did not have the benefit of
prior collaboration, so the meta-program expansion should be more comparable to that of
Project A (which was probably less than 149% (3.9.1)) than to that of D1. Some of the
verbosity in M2's meta-programs found its way into the elaborated code as well. The
density of the D2 code was 3.61 binary words/source line, lower than the density of D1: 4.58.
Inspection of the code shows that M2's selection of longer tags and extra-long identifiers
when the tags were combined (2.5) was the major cause of the lower density.
If the line counts were obtained by actually counting carriage-returns instead of the character
counting method (3.7), the longer identifiers would have made only a small difference. Of course,
the counts of carriage-returns would be sensitive to some other stylistic variations.
Compensating for the code densities changes the relative productivity figures. If D2 had
the same density as D1, the source length of D2 would be: 8898 words / 4.58 words/line =
1943 lines, and the simple productivity measure would show:
D2: 1943 D1-density lines / 49 man-days = 4.96 l/m-h
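Density compensation restates a program's size in "D1-density lines": its binary size divided by D1's 4.58 words per source line. The sketch below applies this to D2. The names D1_DENSITY, MAN_HOURS and d1_density_lines are illustrative.

```python
D1_DENSITY = 4.58          # binary words per source line in D1
MAN_HOURS = 49 * 8         # D1/D2 denominator: 49 man-days of 8 hours


def d1_density_lines(binary_words, reference_density=D1_DENSITY):
    """Source length the program would have at D1's code density."""
    return binary_words / reference_density


d2_lines = round(d1_density_lines(8898))
print(d2_lines, round(d2_lines / MAN_HOURS, 2))
```

The same conversion, applied to D control's 2893 lines at 2.97 words/line, gives d1_density_lines(2893 * 2.97) of about 1876 lines. Over 69 man-days (552 hours), that is the 3.40 l/m-h reported later for the control group.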
The considerable difference between the sizes of the programs in binary words (D1:
10988, D2: 8898) was partially due to the differing amounts of test code built into the
programs. Inspection of the sources showed 423 lines of test code in D1 (check
procedures, test print, and a functional simulator for the disk), versus 70 lines in D2.
Removing all test code from both programs would have left approximately 9050 words in
D1 and 8650 words in D2. Other causes of the difference in size included the unequal impact
of the changes in the problem specifications, and differences of programming style.
The weekly rates of code production are plotted in Figure 11. The cumulative plot of
code production is given in Figure 13a. These plots do not include compensation for the
differing code densities. It is apparent from the data in Figures 10 and 11 (also in
Appendix E) that in both D1 and D2, code was written at higher rates than during any
week in Project C. Note that Figures 10 and 11 were plotted in commensurable units. The
higher coding productivity of the technicians can be partially attributed to the full
support of the meta-programmer, whereas in the earlier projects, the attention of the
meta-programmer was divided between two technicians. There were some indications that
the meta-programmers' time was underutilized, especially toward the end of the
projects. In particular, both meta-programmers found some time to help debug the
code. Measurements of their contributions are shown in the Appendix (E.5, E.6).
In summary, the short Projects D1 and D2 were at a relative disadvantage compared to
the longer Project C, for three reasons. First, there was not enough time for the
development of a powerful local language. Second, the meta-programming and coding
capacities of the minimal production team of two persons are unbalanced. Lastly, the
diseconomies of productivity transients at the project boundaries are relatively more
significant in the smaller project.
Results of Project D control
The simple productivity of the control group was:
D control: 2893 source lines / 69 man-days = 5.24 l/m-h
where the denominator is:
(6 weeks 4 days) × senior programmer P1
+ 7 weeks × junior programmer P2
However, this result is not directly comparable to the corresponding results of D1 and D2,
because of substantial differences in programming style, such as the inclusion, by the
control programmers, of ample comments in the code. Note that elaborated
meta-programs do not contain comments (2.7.5), and while meta-programs substitute for
comments in a sense (2.7.5), they are not included in the source length measurements (3.7).
The plot of weekly rates of code production for the control group as a whole is given in
Figure 11. This plot shows the sum total of production by the two programmers, as
opposed to the D1 and D2 plots, which show the productivity of a single technician
who, however, was supported by another person, the meta-programmer. All three plots
thus show the effective productivity of 2 persons (1.5 persons in the similar A+B and C
plots of Figure 10). Contributions from the two participants in the control experiment
are separated in Figure 12. The cumulative plot of code production is shown in Figure 13a.
The drop of the productivity curve below zero in Figures 11 and 12 was caused by the
senior participant, P1, editing and removing portions of the source code originally written
by the junior programmer P2. The reasons for the removal of source will be discussed
below. Even after the trimming, the density of the code remained low: 2.97 words/source
line. Compensating for the density, we get:
D control: 1876 D1-density lines / 69 man-days = 3.40 l/m-h
The finished binary code was only 6364 binary words long, not including the largest
fraction of test output routines, which were prepared in separate program packages. The
code, however, implemented a simplified design, based entirely on the reduced problem
specifications (Appendix D).
As described in Section 3.4, the control team was made up of a senior participant, P1, a
peer of M1, and an experienced junior programmer, P2. The qualifications of P2 were
necessarily different from the qualifications of the technicians (T3 and T4); the traditional
organization required experience for independent performance in all phases of
programming, including design, coding, and debugging. The greater experience of P2
would tend to make control comparisons less favorable to D1 and D2, and hence provide
conservative results. However, the only available measures of P2's experience were
indirect: number of years since BA degree, employment references, and salary history.
Before the start of the project, P2 had three weeks to work with another programmer on a
simple utility program so that he could get acquainted with the programming
environment. This training time was not included in the productivity measurements.
Unfortunately, P1 and P2 did not have an opportunity to meet before the project started.
During the first week of the project, the participants partitioned the task along a
convenient line: P1 was to work on the permuter (the second phase of the program, see
Appendix D for the detailed specifications), while P2 was to write the planner (the first
phase). P1 assumed the leadership role by defining a high-level block diagram of the
planner and by providing general guidance. The effectiveness of the guidance was
reduced by the difficulties of communication between the programmers, who were both
developing disjoint local languages.
For example, P1 asked for ample test output to simplify debugging. P2 complied,
except for a subtle detail: the test outputs, at numerous places in the planner,
contained the output values sampled before the output records were assembled
from the values. When there were any errors in the (non-trivial) assembly of the
records, the output still appeared correct. It is, however, very difficult to describe
the correct way of implementing test output, as well as all other parts of a
program where subtle mistakes may be made, unless the communicants use the
same local language.
Measurements in Appendix E.5 show that P1 did very little, if any, debugging before the
4th week of the project. By the 7th week, the permuter was essentially debugged and P1
took over the debugging of the existing portions of the planner, while P2 was working on
additional planner code. P2's employment contract was terminated after the 7th week and
P1 brought the project to its successful conclusion alone.
The shortcomings of P2's code came to light during the last two weeks.
Substantial amounts of source text removed by P1 included the misleading test
output statements (see above) and numerous imprecise comments (cf.
[Kernighan-Plauger] page 119). In some instances, instead of deciphering
erroneous logic, P1 replaced whole sections of the code (ibid. page 50).
The individual contributions of P1 and P2 to the total product can be estimated from the
data in Appendix E.5, by assuming that P1 created 100 lines of source during both weeks
7 and 8, since the number of lines typed on the keyboard was similar during weeks 6, 7,
and 8, and 110 lines were created during the 6th week. Under this assumption, P1's share
was 1650 lines (57% of total), versus P2's 1243 lines (43% of total).
4.1 Conclusions from the Experimental Results
The production experiments verified the qualitative predictions of the theory. A
production organization was set up which successfully implemented a number of small
and medium size systems at production rates above 6 lines/man-hour (3.6, 3.9.2). This
organization was unique in that it could utilize the experience of a single person, the
meta-programmer, for leverage in a production team. Given an experienced
meta-programmer, equally good results were obtained by different technicians (Project C)
who satisfied certain selection criteria (3.4). These results are interpreted to mean that
the meta-programmer has absorbed most of the uncertainties (1.2) inherent in software
production which would normally cause large differences in individual productivities to
appear (1.5).
Uncertainty absorption did not mean that the task of the technicians, the other members
of the production teams, was reduced to routine. As the tasks were performed, the
technicians learned the problem-specific local language (1.6) and progressively increased
their relative contribution (3.9.2).
Technicians were able to grow on the job; in particular, one former technician became the
meta-programmer in Project D2.
Further leverage was obtained by the separation of the engineering activities from the
production organization, which introduced another layer of uncertainty absorption. In
Project D, the problem specification, prepared by an engineer, removed the major
uncertainties from the program implementation. Working from the specification, two
different teams, one led by an experienced meta-programmer, the other by a less
experienced former technician, obtained comparable productivity results (6.12 versus 4.96
line equivalents / man-hour (3.9.3.1, Figure 13a)).
Although the time spent meta-programming was virtually identical for both
meta-programmers (3.9.3.1), the meta-programs written by the less experienced
meta-programmer, M2, were substantially longer than those written by the more
experienced M1 (Figure 13b). M2's meta-programs were clearly not as efficient as
M1's, since the latter's group had higher net productivity, yet, considering the
circumstances, the difference was surprisingly small.
Non-productive training time for technicians was consistently negligible (Projects A and
C) because what would be usually classified as training was recognized not to be
qualitatively different from the continuous learning process which took place throughout
the projects. Meta-programs, written at different levels of detail, could serve as the main
instruments of communication from the meta-programmer to the technicians (2.2) at all
stages of training and program development.
The results of the control experiment (3.4), for comparing the traditional programming
organization with meta-programming, were inconclusive, although at least one indicator,
the amount of binary code produced in unit calendar time, was sharply in favor of the
meta-programming method (6.12 versus 3.40 line equivalents / man-hour (3.9.3)). Note
also that all the meta-programming groups produced complete sets of meta-programs
which could be used as documentation (2.7), and that, in each of the projects, at least two
people were well acquainted with every detail of the logic of the programs. These
ancillary benefits would be particularly important if the programs produced were parts of
a larger system. The control group, on the other hand, could not create documentation as
a natural by-product, except for comments, which had less detail or utility than
meta-programs. Also, large portions of the program written by the control group were
known only to a single programmer.
The simplified subject problem for the control experiment (Appendix D) was probably
too small to create the major communication problems the meta-programming
organization was designed to solve. Even with a smaller problem, the simultaneous
requirements of a controlled experiment, for resources and for motivated people with the
right qualifications, proved impossible to fulfill entirely.
The productivity figures do not show the increased reliance of the control group
on the senior participant, a critical resource. In fact, the actual time spent by the
senior participant in Project D control was 30% higher than in D1 (note that this
number was not affected by the early shutdown of Project D1, since
meta-programming was complete before the shutdown).
The key factor in the lower productivity of the control group was the inefficient
use of human resources: both the senior programmer P1 and the less experienced
P2 spent most of their time working on tasks of similar complexity and
value. Some of these tasks were in fact beyond the capabilities of P2 and this led
to some wasted effort. The partitioning of the problem into largely
disjoint subproblems of approximately equal size and complexity implied the
reduction of the communication needs of the group to exchanging information about
a narrow interface. This organizational simplification, however, delayed the
detection of P2's mistakes, and ultimately made it necessary for P1 to debug or to
rewrite unfamiliar sections of P2's code.
The 20% difference between Projects D1 and D control in the actual hours worked per week by the junior participants accounts for only about 0.3 line equivalents / man-hour of the productivity difference, if the net contribution of the junior participant in Project D control is assumed to be 43%.
The experiments demonstrated the feasibility of continuous production (1.2), as shown, for example, in Figure 13a, or in the smooth transition between Projects A and B (Figure 10). The collection of productivity measurements was almost completely automated. The measurements could have been used to monitor and optimize the production process in real time, except for our desire to simplify the experimental ensemble and delay the evaluation of the measurements (3.3).
The use of design principles appropriate for high-productivity environments (1.3) was essential for keeping the production teams occupied. User acceptance of the programs (especially A and B) showed that high quality program designs may be obtained from a continuous stream of largely independent design decisions, each considered unimportant in itself. In Project C, time spent on system design and detailed design was clearly less than 33% of the total, since only one out of three participants, namely the meta-programmer, was involved in design, and since the meta-programmer had other responsibilities as well. Considering the actual time, rather than calendar time, spent by the meta-programmer, we find that design took less than 20% of the total man-hours.
The meta-programming conventions and the debugging organization were also observed to work well (3.9.2). They ensured the surprise-free and continuous execution of routine tasks, such as the localization of failures. The object naming conventions also contributed to the actualization of the concept of local language, since the object names, in fact, comprise a large portion of local languages. Dependence on the existence of specific programming language features, such as type checking, was reduced.
4. Recommendations for Future Work
We expect productivity to remain a key concern in the software industry. According to the conclusions presented above, it is unrealistic to assume that future experiments providing unequivocal comparative data about the merits and demerits of various production methods could be successfully executed on a larger scale and with better control. It is also evident, however, that the automatic collection of productivity data is relatively simple to implement. The most promising subject for future research, therefore, might be the comparison of measurements taken in large scale software efforts solving real problems. Innovative software producers should support such research by collecting and publishing productivity data.
Designers of the utility programs supporting software production, such as editors, compilers, loaders, debuggers, or job control languages, should make provisions for productivity measurement. Variations in programming languages and code density could be accounted for by selecting a mix of representative programs, for example from the set of standard algorithms published in the Communications of the ACM, to define the standard 200, 500, and 1000 lines. These programs could be translated into whatever programming language is used to yield the correction factor for the
In the design of future programming languages, emphasis may be shifted from the question of how the programming language, by itself, can ensure the highest productivity, to the fundamentally different question of what the programming language can contribute to the organization which has the highest productivity (2.9). Such a shift may also occur in the research area of program correctness proofs.
For the business executive who may wish to try the meta-programming organization, we have the following advice: Select as the meta-programmer a programmer with proven technical competence who is enthusiastic about the idea. Hire entry level personnel fresh out of college as technicians. Insist that all applicants be given a programming test, such as the one in Appendix A. If the programming environment is properly set up by the meta-programmer, the training time for the technicians should be very short. The reasons for the initial exclusion of other experienced programmers are that training time would not be saved, and that a programmer's experience may actually interfere with the meta-programmer's efforts to control the creation of local language (2.2). Start the team on a smaller problem (a subproblem of a larger problem may also be used, by absorbing the uncertainties about its boundaries) and determine the team's productivity as the basis for future planning.
For the most spectacular results, the scope of the problems may later be expanded, so that the team can go "critical" in the sense of Section 1.4.
Appendix A: Programming Test
The following programming test was used to select technicians for the experimental teams (3.4). The test was intended to be a simulation of a meta-program for two reasons: to help find those applicants actually capable of elaborating meta-programs, and also to give the applicants some feel for what is expected of them.
The first portion of the test is a cover sheet explaining the ground rules, followed by the simulated meta-program (the term specialist used on the cover is a euphemism for technician). The meta-programming conventions were not used, to avoid the need to explain them.
The sharp contrast between the complexity of the abstract algorithm (for an explanation of the algorithm see [Knuth]) and the simplicity of the description of the steps is intentional. It was expected that most applicants would not be familiar with the algorithm and would have to complete the task without the benefit of deep understanding. The applicants were given ample opportunity to ask questions so that ambiguities in the wording of the test could be resolved.
None of the selected technicians knew of the algorithm prior to taking the test. Those applicants familiar with the algorithm also happened to be clearly overqualified. Common errors included exchanging elements of KEY, comparing elements of Q, exchanging or comparing indices, and confusing the value of Q[I] with the name Q[I] (at GETPQ 5). There were also many errors in contorted WHILE statements into which the applicants were trying to force the algorithm.
The obvious specification error at PUTPQ 5 (repeat from 3 instead of repeat from 4) was introduced unintentionally. The reproduction of the test below has been slightly edited to conform to the format of the present work. The test given to the applicants was prepared on a typewriter.
The attached sheet contains the description of a programming task, typical of the kind of tasks specialists will perform in the Software Production Team.
Please write the three procedures described, using the language of your choice (ALGOL or FORTRAN are preferred). Try to make the code clean and reasonably efficient. You need not ensure or prove that the specified algorithms are correct. You need not follow the specified steps exactly. Write comments, but do not "overcomment". Try to reflect the "state" of the variables in the comments.
The appearance of your completed manuscript should be such that someone unfamiliar with the language would be able to copy it correctly. Ask any questions you wish and work at an unhurried pace. Good luck!
A PQ (Priority Queue) is an integer array with the following properties:
1. PQ[0] contains LPQ, the "length" of the PQ, which is always bounded by MXLPQ.
(Assume here that indexing with 0 is allowed.)
2. There is an integer array KEY (with indices ranging from 1 to MXLKEY-1, inclusive) and:
either LPQ = 0 or
KEY[PQ[1]] ≥ KEY[PQ[I]] for all I such that LPQ ≥ I ≥ 1.
This simply means that PQ contains indices of KEY (pointers into KEY) and the first index in PQ points to a largest (maximal) key, thus PQs are sorted in a very weak
Procedure to add an index to a PQ:
Q is a PQ. Add INDEX to Q as follows:
1. Increment the length (in Q[0]). If too large, call ERROR("PQ OVERFLOW"). ERROR will not return.
2. Store INDEX at Q[length] so it will be the item at the end of the queue.
3. Set I = length.
4. If I = 1, we are finished.
5. If KEY[Q[I]] > KEY[Q[I DIV 2]] then exchange them and repeat from 3 with I = I DIV 2. DIV is the integer division operator. Otherwise, we are finished.
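As an illustration only (not code from the experiments), the insertion procedure can be sketched in Python. The names put_pq and PQError are hypothetical; ERROR is modeled as raising an exception, the capacity test assumes the length may equal mxlpq, and the loop continues from step 4 rather than step 3, which is the correction the text above acknowledges.

```python
class PQError(Exception):
    """Stands in for ERROR, which never returns to its caller."""


def put_pq(q, key, index, mxlpq):
    """Add index (a pointer into key) to the priority queue q.

    q[0] holds the length and q[1..length] hold indices into key;
    q must have room for mxlpq + 1 entries (hypothetical layout).
    """
    length = q[0] + 1                      # step 1: increment the length
    if length > mxlpq:                     # assumption: the bound is inclusive
        raise PQError("PQ OVERFLOW")
    q[0] = length
    q[length] = index                      # step 2: store at the end
    i = length                             # step 3
    while i > 1:                           # step 4: finished at the top
        parent = i // 2                    # I DIV 2
        if key[q[i]] <= key[q[parent]]:    # step 5: otherwise, finished
            break
        q[i], q[parent] = q[parent], q[i]  # exchange, then repeat from step 4
        i = parent
```

With KEY = [None, 5, 9, 1, 7] (element 0 unused), adding the indices 1 to 4 leaves index 2, whose key 9 is maximal, in q[1].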
Procedure to remove the index at the "top" of the queue (to obtain an index to a maximal key):
This function returns the index as its value. The algorithm is as follows:
1. The result is Q[1], of course. Save it. If length = 0, call ERROR("PQ EMPTY").
2. Move the item at the end of the queue to Q[1], the top.
3. Set I = 1.
4. If 2*I ≥ length, decrement the length and return with the result saved above.
5. Call Q[I] the "father" and Q[2*I] and Q[2*I+1] its two "sons". Find the one among the three with the greatest key, comparing KEY[SON1] to KEY[SON2] and so on, 3 comparisons altogether. If the father wins, decrement the length and return as under 4. Otherwise, make I point to the winning son, exchange same with the father and repeat from 4.
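A matching Python sketch of the removal procedure follows, under the same hypothetical conventions (q[0] holds the length, PQError stands in for ERROR). Two comparisons suffice to find the largest of the father and its two sons, where the test mentions three. When 2*I+1 equals the old length, the second son is the vacated last slot, which still holds the index that was moved to the top and now sits at the father's position; because the comparison is strict, that copy can never win.

```python
class PQError(Exception):
    """Stands in for ERROR, which never returns to its caller."""


def get_pq(q, key):
    """Remove and return the index at the top of q (an index to a maximal key)."""
    length = q[0]
    if length == 0:                        # step 1: nothing to remove
        raise PQError("PQ EMPTY")
    result = q[1]                          # step 1: save the top
    q[1] = q[length]                       # step 2: move the last item to the top
    i = 1                                  # step 3
    while 2 * i < length:                  # step 4: stop when 2*I >= length
        winner = i                         # step 5: father versus both sons
        for son in (2 * i, 2 * i + 1):
            # strict > means a son must beat the father outright to win
            if key[q[son]] > key[q[winner]]:
                winner = son
        if winner == i:                    # the father wins: finished
            break
        q[i], q[winner] = q[winner], q[i]  # exchange and follow the winning son
        i = winner
    q[0] = length - 1                      # decrement the length
    return result
```

Repeated calls on a valid queue return indices in order of non-increasing key.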
Procedure to check if an array is a PQ:
Call ERROR("PQ STATE INCORRECT") if Q is not a PQ.
Otherwise return.
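The checking procedure can be sketched the same way. The sketch tests the two stated properties plus the requirement that every entry be a valid index of KEY (1 to MXLKEY-1, with MXLKEY taken as len(key)); the optional strict flag, an addition not in the test, also checks the stronger order KEY[Q[I DIV 2]] ≥ KEY[Q[I]] that the add and remove procedures maintain. All names are hypothetical, and the inclusive length bound is an assumption.

```python
class PQError(Exception):
    """Stands in for ERROR, which never returns to its caller."""


def check_pq(q, key, mxlpq, strict=False):
    """Raise PQError("PQ STATE INCORRECT") unless q is a PQ; otherwise return."""
    length = q[0]
    mxlkey = len(key)                      # KEY has indices 1 .. MXLKEY-1
    # property 1 (assumed inclusive bound), and the array must hold length entries
    if not 0 <= length <= mxlpq or length >= len(q):
        raise PQError("PQ STATE INCORRECT")
    for i in range(1, length + 1):
        if not 1 <= q[i] < mxlkey:         # every entry must index KEY
            raise PQError("PQ STATE INCORRECT")
        if key[q[1]] < key[q[i]]:          # property 2: the top key is maximal
            raise PQError("PQ STATE INCORRECT")
        # optional stronger test: heap order between each entry and its father
        if strict and i > 1 and key[q[i // 2]] < key[q[i]]:
            raise PQError("PQ STATE INCORRECT")
```

An array can pass the stated definition while violating the heap order, which is why the strict test is kept separate.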
Appendix B: Format of the Measurement File
As described in Section 3.5, measurements of production activity were recorded by the project editor. The format of the measurement file is given below. This format was designed to accommodate extensions so that other tools, such as the compiler or the debugger, may also be instrumented in the future.
Throughout the description, field names will be shown in lower case sans-serif letters (for example: time) while upper case letters or other marks (P, *, or 541) denote the values of
All records on the measurement file consist of coded characters [ASCII] and have the following general form:
date time subsys type rest
where the fields are separated by blanks and the record is terminated by a carriage return (CR) character. The fields contain the following information:
date       the year, month, and day as YYMMDD decimal digits.
time       the hour (24 hour system), minute, and second as HHMMSS.
subsys     identifier of the subsystem which made the record. The editor, the sole source of measurements in the experiments, is identified as B.
type       determines the format of rest relative to subsys. The editor uses two different formats, identified as S and Q respectively. These formats are described below.
rest       other information, as determined by subsys and type.
After every successful save command (4.4) the editor records the following information (preceded by date time B):
S user filename n0 balance keyboard (filename1 n1 filename2 n2 ...)
S          is the type
user       is the user (M1, T1, and so on)
filename   is the name of the file in which the edited text is saved. This file is usually, but not necessarily, also the original source of the text. Filenames are written with extensions appended. By convention, the extensions determine the type of the file: .MP for meta-programs, .SR and .DF (definitions) for source code. Other extensions are also in use for special purpose files.
n0         is the number of characters written on file filename.
balance    is the change in the length of the file filename, that is n0 - (the length of the file prior to the save command). Note that the balance may be negative. If the file is a new file, created by the save command, balance = n0.
keyboard   is the count of characters typed into the saved text from the keyboard. For example, if in a program the word THEN is replaced by typing in DO and the result is saved, balance will be -2 and keyboard will be 2. The characters in the edited text are flagged so that their origin can be ascertained.
filename1  is the name of the first file, different from filename, also contributing to the saved text. This field is empty unless copying of text from different files took place.
n1         is the number of characters contributed by filename1.
filename2  is the name of the second file... As many {filenamei, ni} pairs appear as
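A minimal Python sketch of reading a type S record, assuming blank-separated fields exactly in the order of the format line above and a parenthesized list of contributing files (which may be empty, as in "( )"). parse_s_record is a hypothetical helper; the experiments processed these files with the record declarations of Appendix C, not with this code.

```python
def parse_s_record(line):
    """Split a type S (save) measurement record into named fields."""
    fields = line.split()
    date, time, subsys, rtype, user, filename = fields[:6]
    if rtype != "S":
        raise ValueError("not a type S record")
    n0, balance, keyboard = (int(f) for f in fields[6:9])
    # Assumption: the rest is the parenthesized (filename1 n1 filename2 n2 ...) list.
    tail = " ".join(fields[9:])
    if not (tail.startswith("(") and tail.endswith(")")):
        raise ValueError("missing contributing-file list")
    pairs = tail[1:-1].split()
    sources = [(pairs[k], int(pairs[k + 1])) for k in range(0, len(pairs), 2)]
    return {
        "date": date, "time": time, "subsys": subsys, "user": user,
        "filename": filename, "n0": n0, "balance": balance,
        "keyboard": keyboard, "sources": sources,
        "old_length": n0 - balance,        # since balance = n0 - old length
    }
```

The derived old_length field is not stored in the record; it follows from the definition of balance.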
At the end of an editing session, when the user executes a quit command, another record is made (preceded by date time B):
Q user elapsed nk nc nd successor cors print remarks
Q          is the type
user       is the user as above
elapsed    is the elapsed time in the session measured in seconds
nk         is the total number of characters typed on the keyboard. This number is in general greater than the sum of the keyboard fields of the type S records for the session, because some of the characters typed may have been later removed. Characters immediately backspaced over are not counted.
nc         is the total number of characters copied within the same file or from other files. See also the remark for nd.
nd         is the total number of characters deleted. Whenever characters are moved (that is, copied while destroying the original) both nc and nd will be incremented.
successor  is a code for the successor program which the user may specify before confirming the quit command (4.4). The code B in this field denotes the BCPL compiler, L denotes the loader. If no explicit successor is specified, this field will contain an asterisk (*).
cors       is the number of primitive editor commands executed.
print      is the number of pages listed on the lineprinter.
(unused)   fields for future expansion.
remarks    a possibly empty list of remarks made by the user. Each remark may occupy a number of fields. The first field is always a remarktype which determines what follows, as described next.
The different remarks, with the remarktypes listed first, and the circumstances of their usage are as follows:
E n        Revising n source code syntax or loader errors. In particular, n = 0 means that the editing activity is necessary for the fixing of a syntax or loader error which has already been accounted for. After every compilation this remark is automatically prompted and the user has to type in the number of errors. If there were no errors, the DEL key should be used so that the remark will be omitted altogether.
B n        As above, except for semantic errors (bugs). The error may be in the meta-program or in the source, as shown by the extension of the file being edited.
C n        As above, except for repeat efforts to fix semantic errors.
F filename This remark is made automatically when the compiler is specified as the successor program. The filename designates the file being compiled.
Z SUSPEND  Marks the suspension of productive activity: meta-programming, elaboration, or debugging. Resumption may be marked by either:
1 10
or by any other log entry. This remark needs to be used only when the starting activity does not leave its own record; for example, when starting the day using the debugger.
X anything To be used in exceptional situations.
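A companion Python sketch for type Q records. It assumes that no unused expansion fields appear before the remarks, that the remark types are the single letters listed above with the bug remark taken to be B, that F and Z remarks carry one word, and that an X remark consumes the rest of the record. parse_q_record is a hypothetical helper.

```python
# Assumed remark grammar: E/B/C take one integer, F/Z one word, X the rest.
INTEGER_REMARKS = {"E", "B", "C"}
WORD_REMARKS = {"F", "Z"}


def parse_q_record(line):
    """Split a type Q (quit) measurement record into named fields and remarks."""
    fields = line.split()
    date, time, subsys, rtype, user = fields[:5]
    if rtype != "Q":
        raise ValueError("not a type Q record")
    elapsed, nk, nc, nd = (int(f) for f in fields[5:9])
    successor = fields[9]                  # B, L, or * when none was given
    cors, pages = int(fields[10]), int(fields[11])
    remarks, k = [], 12
    while k < len(fields):
        kind = fields[k]
        if kind == "X":                    # "X anything": keep the rest verbatim
            remarks.append((kind, " ".join(fields[k + 1:])))
            break
        if kind in INTEGER_REMARKS:
            remarks.append((kind, int(fields[k + 1])))
        elif kind in WORD_REMARKS:
            remarks.append((kind, fields[k + 1]))
        else:
            raise ValueError("unknown remark type " + kind)
        k += 2
    return {"date": date, "time": time, "subsys": subsys, "user": user,
            "elapsed": elapsed, "nk": nk, "nc": nc, "nd": nd,
            "successor": successor, "cors": cors, "print": pages,
            "remarks": remarks}
```

For instance, a made-up record ending in `B 2 F XX.SR` yields the remarks [("B", 2), ("F", "XX.SR")].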
The following two records are examples of the basic measurement format:
750728 123850 B S T4 PM.SR 7899 71 86 ( )
750728 123905 B Q T4 951 95 191 193 B 44 0 1 F PM.SR
Using the above description as a key, we find that the records were made on the 28th of July, 1975, at around 12:39 p.m. In an editing session approximately 15 minutes long, technician T4 fixed a single syntax error in source file PM.SR. After 44 editor commands, the length of the file was increased by 71 characters. The new version of the file contained 86 new characters; the remaining 7899 - 86 = 7813 characters were transferred from the old version, possibly somewhat rearranged (note the 191 characters copied and deleted). At the end of the session, the compiler was called to compile the same file again.
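The arithmetic in this interpretation follows directly from the field definitions and can be checked in a few lines of Python; the variable names are illustrative only.

```python
# Values read from the two example records above.
n0, balance, keyboard = 7899, 71, 86   # type S: characters written, change, typed
elapsed = 951                          # type Q: session length in seconds

old_length = n0 - balance              # PM.SR before the save: 7828 characters
transferred = n0 - keyboard            # carried over from the old version: 7813
minutes = elapsed / 60                 # 15.85 minutes
```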
Appendix C: Project C System Description
C.1 System Overview
An informal description of the Project C System (PCS) is given in the following sections. The purpose of this documentation is to define the scope and complexity of the problem the Project C experimental team worked on, and to introduce the tool used to process the measurement data.
PCS is a simple Management Information System consisting of a user interface, a compiler for the C language, and an interpreter. Algorithms for statistical processing of measurement records can be expressed in the C language. The user interface allows the user to pose a query by typing a program in the C language or by referring to a library of programs. A degree of parameterization of the library programs is made possible by the macro facility. Once the specification of the query is completed, the macros are expanded and the result is compiled by the compiler and executed by the interpreter. Programs for typical queries scan the whole database and list their results both on the computer display and on a scratch file using the REPORT procedure (C.6). Upon termination of the program, the system awaits the next query.
C.2 Macros
Macros simplify the system by substituting for a procedure mechanism and a run-time user input mechanism. The latter is accomplished by writing the programs as macros and letting the user specify the values for the macro formal parameters. For each of these parameters a prompting message, a default value, and an optional list of possible values may be specified to simplify the user's task.
Macros are defined by the construction: {.name\fparam1\fparam2 ...\body}, where the body must be balanced with respect to braces ({}). There may be any number of formal parameter names fparami. A macro call is written as: {name\param1\param2 ...}. Here the actual parameters are arbitrary strings of characters not containing the separator (\) and balanced in braces. A macro call is equivalent to the expanded body of the macro defined with the given name. The form {fparami} is a formal parameter call: expansion of the macro body means the replacement of the formal parameter calls with the corresponding parami's. The actual parameters themselves are expanded before the macro is called.
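A minimal Python sketch of these expansion rules, not the PCS implementation. It assumes that a backslash separates parameters only at the outermost brace level of a call (so actual parameters may themselves contain nested calls), that a definition expands to nothing, and that an unknown name is an error; expand and its helpers are hypothetical names, and the prompting, default-value, and value-list features are omitted.

```python
def _closing_brace(text, start):
    """Return the index of the brace that closes the one at text[start]."""
    depth = 0
    for k in range(start, len(text)):
        if text[k] == "{":
            depth += 1
        elif text[k] == "}":
            depth -= 1
            if depth == 0:
                return k
    raise ValueError("unbalanced braces")


def _split_top(inner):
    """Split on backslashes that are not nested inside braces."""
    parts, current, depth = [], [], 0
    for ch in inner:
        if ch == "\\" and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return parts


def expand(text, macros, env=None):
    """Expand definitions, macro calls, and formal parameter calls in text."""
    env = env or {}
    out, i = [], 0
    while i < len(text):
        if text[i] != "{":
            out.append(text[i])
            i += 1
            continue
        j = _closing_brace(text, i)
        parts = _split_top(text[i + 1:j])
        head = parts[0]
        if head.startswith("."):               # {.name\fparam1\...\body}
            if len(parts) < 2:
                raise ValueError("definition without a body")
            macros[head[1:]] = (parts[1:-1], parts[-1])
        elif len(parts) == 1 and head in env:  # {fparam}: formal parameter call
            out.append(env[head])
        else:                                  # {name\param1\...}: macro call
            formals, body = macros[head]       # KeyError for an unknown name
            actuals = [expand(p, macros, env) for p in parts[1:]]  # expanded first
            out.append(expand(body, macros, dict(zip(formals, actuals))))
        i = j + 1
    return "".join(out)
```

Because actual parameters are expanded before substitution, a nested call such as {twice\{greet\Bo}} passes the already expanded greeting into twice.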
C.3 Record Declarations
Records are aggregate values such that the components of the aggregate may be selected by symbolic field names. The record declaration serves to define the field names and also to establish the correspondence between binary or ASCII external file formats and internal data representation. The simplest form of the declaration is as follows:
RECORD recordtype(fieldname1: type, fieldname2: type ...)
The type (for example INT, TIME, ATOM, or STRING) is used for the interpretation of external data only. Internally, fields are simple variables, and they can hold values of any type (C.4). Operations on records are discussed in Section C.4.6. The recordtype is a user defined name for the record. The field names are also defined by the user. The same field name may be used in different record declarations.
Record declarations are more complex if the external file format allows variability in the number or in the type of the fields. In particular, the declarations must accommodate the measurement file format described in Appendix B. This is done by the following devices:
C.3.1 In place of a type, a record or sequence (C.4.7) may be declared by writing recordtype(fieldlist) or recordtype[fieldlist] respectively. In either case, the explicit name for recordtype may be omitted. A fieldlist of a sequence defines the succession of types in the sequence in a wrap-around order. Field names are ignored in the sequence since fields are selected by ordinal number (C.4.7).
C.3.2 The fieldnamei and the following colon (:) may be omitted in sequences or if the field is just a placeholder.
C.3.3 A conditional expression may appear in a fieldlist:
<condition | fieldlist |: condition | fieldlist ... | fieldlist>
The < sign may be read as if, the | following a condition as then, the |: as elseif, the final | as else, and the > as endif. The condition must be in the form: fieldname = constant, where the named field must precede the conditional expression in the same record declaration.
For example, the measurement record format (Appendix B) may be declared, in part, as follows:
> );
Note how the types of some of the fields depend on the value of the TYPE field. The list of {filenamei, ni} pairs is declared as the value of the OTHER field, a sequence of an unnamed record type. The field for filename in these records may be named the same as a field in the LINE record.
C.4 Types in the Language
All values in the C language are instances of some type. Most operations restrict the types of their operands. All variables (including elements of records or sequences) may possess values of any type. The assignment operator (written as ← or :=) may be used to assign any type. A complete list of types with their associated constants and operations is given next:
C.4.1 Nil: There is just one instance of this type: the nil value. All variables are initialized to possess the nil value. Most operations will accept the nil value and will do something reasonable, as described in the sequel. Since there are no boolean values, the boolean operations (AND, OR, and NOT, each of which also has a symbolic form) interpret the nil value as false and everything else as true (boolean operations will produce the integer 1 for true).
The nil constant NIL is available. The constant FALSE=NIL is useful in boolean operations.
C.4.2 Integer: Sixteen bit integers and the standard arithmetic and relational operations (+, -, *, /, mod, +← (+:=), -← (-:=), min, max, <, ≤, =) are available. NIL will be accepted in lieu of the integer 0. Integer constants may be written in the decimal system as usual, for example 123. The form $X stands for the integer character code of the character following the $. The constants MONDAY=1, TUESDAY=2 ... JANUARY=1, FEBRUARY=2 ... are defined for use by the time procedures (C.6). The constant TRUE=1 is useful in boolean operations.
C.4.3 Time: Instances of this type may be interpreted either as a time interval of seconds (up to 2³² seconds), or as an absolute date by representing the interval between the date and the 1st of January, 1900, 0:00. A number of procedures are available to create and modify time values (C.6). The operations +, -, <, ≥, = are also available.
There are no time valued constants (but see C.4.2 and C.6).
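A Python sketch of the time representation, assuming time values are unsigned second counts from 1 January 1900, 0:00, and using the standard datetime module for the calendar arithmetic. date_value and to_datetime are hypothetical analogues of DATE and of the decomposition procedures, not the PCS implementation.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1900, 1, 1)       # origin of absolute dates: 1 January 1900, 0:00
LIMIT = 2 ** 32                    # assumption: unsigned count of up to 2**32 seconds


def date_value(year, month, day):
    """Analogue of DATE(year, month, day): an absolute date as a count of seconds."""
    seconds = int((datetime(year, month, day) - EPOCH).total_seconds())
    if not 0 <= seconds < LIMIT:
        raise OverflowError("date outside the representable range")
    return seconds


def to_datetime(value):
    """Interpret a time value as an absolute date."""
    return EPOCH + timedelta(seconds=value)
```

Adding an interval is then ordinary integer addition: date_value(1975, 7, 28) plus 12 hours and 39 minutes of seconds is 12:39 on that day. Under these assumptions an absolute date can reach only into February 2036, since 2³² seconds is about 136.1 years.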
C.4.4 Atoms: Atoms are alphanumeric strings represented by their index in a symbol table. The = operation may be used with atoms. Atoms are also used in conjunction with sets (C.4.8).
The file containing the symbol table has to be declared at the beginning of any program which uses atoms read from files, by writing:
ATOMFILE "filename"
The extension .AT for the filename is automatically supplied.
Atom constants may be written enclosed in single quotes: 'atom'.
C.4.5 Strings: for efficient representation of sequences of characters. The operations are: <, ≥, =. String concatenation is also available, with two alternative notations. Substring and find procedures with various options are listed in Section C.6.
String constants are written in double quotes: "string". CR is a string constant containing a single carriage return.
C.4.6 Records: Record type values may be created by the INIT statement (INIT variable: recordtype), which assigns to a variable a record value of the desired type. All fields in the record are initialized to nil. Records are also created by reading the record from a file using the NEXT statement (C.4.9).
The other operation on records is field selection, written as:
record.fieldname
When used in an expression, the value of the selection is the value of the field fieldname in the specific record instance. A selection may also appear on the left side of the assignment operator, in which case the selected field will be assigned a new value. For example, one can write:
R.F ← R.F + 1
There are no record constants.
C.4.7 Sequences: similar to records, except values are selected by indexing. Note that elements in the sequence need not be of the same type. Nil is accepted as the empty sequence.
Selection by indexing is written as: sequence[integerexpr]. The selection may be written on the left side of an assignment or in any expression (C.4.6). Index 0 selects the first element in a sequence. The largest index used in an assignment, plus one, is called the length of a sequence. Uninitialized elements in a sequence will appear to contain nil values.
NIL may be used as a sequence constant.
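A Python sketch of the sequence rules; the Sequence class is hypothetical, stores elements sparsely in a dictionary, uses None for nil, and assumes that reading an element never changes the length.

```python
class Sequence:
    """A growable, 0-indexed sequence whose unassigned elements read as nil (None)."""

    def __init__(self):
        self._elements = {}
        self.length = 0

    def __getitem__(self, index):
        return self._elements.get(index)           # uninitialized elements appear nil

    def __setitem__(self, index, value):
        self._elements[index] = value
        self.length = max(self.length, index + 1)  # largest assigned index, plus one
```

Assigning only element 3 gives a sequence of length 4 whose elements 0 to 2 read as nil.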
C.4.8 Sets: A set is a sequence of atoms without repetition of any atom. Nil is accepted as the empty set. Sets may be indexed just as sequences can. Other operations are: AND, OR, MINUS, IN, and INTO. The first three are the set intersection, union, and difference operations respectively, all returning sets. The binary operations IN and INTO check the membership of atoms in sets as follows:
atom IN set: returns the integer i such that set[i] = atom, or returns NIL if there does not exist such i.
atom INTO setvariable: this operation first ensures that the atom is a member of the set (by doing setvariable ← setvariable OR SET(atom) if necessary) and returns the integer i such that setvariable[i] = atom.
NIL may be used as a set constant.
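A sketch of the set operations using Python lists of strings as atoms. The ordering of a union (the atoms of the left operand first, then new atoms appended) is an assumption the text does not fix; it makes INTO return the next free index for a new atom. All function names are hypothetical.

```python
def set_and(a, b):
    """Intersection, keeping the order of a."""
    return [atom for atom in a if atom in b]


def set_or(a, b):
    """Union (assumed order): a, then the atoms of b that a lacks."""
    return a + [atom for atom in b if atom not in a]


def set_minus(a, b):
    """Difference: the atoms of a that are not in b."""
    return [atom for atom in a if atom not in b]


def atom_in(atom, s):
    """atom IN set: the index i with s[i] == atom, or None (nil)."""
    return s.index(atom) if atom in s else None


def atom_into(atom, s):
    """atom INTO setvariable: add the atom if it is missing, then return its index."""
    if atom not in s:
        s[:] = set_or(s, [atom])   # setvariable := setvariable OR SET(atom), in place
    return s.index(atom)
```

In the C language only nil tests as false, so an index of 0 returned by IN still counts as true; in Python 0 is falsy, so the result should be compared with None explicitly.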
C.4.9 Streams: for file transput. Every stream is associated with a data file and a binary property determining whether the file is encoded as binary data or as ASCII characters. Binary streams are easier to process. Measurements are originally recorded in files which are not binary, however. Operations on streams are: creation (OPEN, C.6), input, and output. The input statement:
NEXT variable: recordtype FROM stream
reads and converts the next record from the stream and assigns the record value to the variable (C.4.6). If the end of the data stream is reached, the nil value is assigned to the variable. The data conversion is directed by the record type declaration (C.3). The output statement is similar:
NEXT variable: recordtype TO stream
For output, the variable must contain a record value. The record type and the types of the fields in this record must correspond to the record declaration.
C.4.10 Statistics: Instances of this type contain a set of double precision integer values to accumulate sums and sums of squares. The +← operation with a statistics type left operand will form the sums, sums of squares, and counts of the integer values appearing on its right. Standard procedures are available to obtain the mean and the standard deviation from the collected values (C.6). Other operations: +, -, *, /, <, =, ≥, and REPORT treat statistics type values as double precision integers (32 bits precision). Calculations of the mean and standard deviation are meaningless after
D0 is the constant 0 for initialization of variables and to establish their types.
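A Python sketch of the statistics type. Python integers are unbounded, so the 32-bit precision limit is not modeled, and the standard deviation uses the population formula, which is an assumption because the text does not say which form the standard procedures compute. The class and method names are hypothetical.

```python
import math


class Statistics:
    """Accumulates the count, sum, and sum of squares of integer values."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.total_squares = 0

    def add(self, value):
        # plays the role of the accumulating +:= operation on a statistics value
        self.count += 1
        self.total += value
        self.total_squares += value * value
        return self

    def mean(self):
        return self.total / self.count

    def std_dev(self):
        # assumption: population form, sqrt(mean of squares - square of mean)
        m = self.mean()
        return math.sqrt(self.total_squares / self.count - m * m)
```

Only three running values are kept, so the individual measurements never need to be stored.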
C.4.11 Formats: special values returned by certain standard procedures (C.6). By presenting these values to the REPORT procedure, the format of the report may be controlled. The format values themselves will not be printed.
C.5 Other Statements
Statements in the C language are separated by semicolons (;). The assignment statement is written as:
leftpart ← expression
where the leftpart may be a variable or a selection (C.4.6, C.4.7). Variables need not be declared and they will be initialized to nil values. Parentheses may be used in expressions; otherwise the customary rules of precedence apply [Wijngaarden]. The form:
procedure(parameter1, parameter2 ...)
is a call on one of the standard procedures (C.6). Procedures which return a value may be called from expressions.
Comments may appear anywhere, starting with double hyphens (--) and terminated by another pair of hyphens or by the end of the line.
The available loop forms are as follows:
FORALL variable INDEXING sequence DO body
FORALL variable IN sequence DO body
FOR variable FROM integer TO integer BY integer DO body
FROM integer TO integer BY integer DO body
TO integer BY integer DO body
WHILE boolean DO body
Expressions may be used where a type is indicated. Sets may be used instead of sequences. Times may be used instead of integers. The BY clauses may be omitted, in which case BY 1 will be assumed. The loop bodies are lists of statements enclosed in square brackets ([]). The statement BREAK, written in the body, will exit from the loop, while another statement causes the rest of the body to be skipped. The forms of the conditional statement are:
IF boolean THEN body
IF boolean THEN body ELSE body
IF boolean THEN body ELSEIF boolean THEN body
The special loop form:
FORALL variable: recordtype IN stream DO body
is a convenient short notation for:
[ NEXT variable: recordtype FROM stream;
IF variable = NIL THEN [ BREAK ];
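A Python sketch of this expansion, assuming the NEXT statement and the nil test are repeated in an unconditional loop until BREAK, with a stream modeled as an iterator of already converted records and None standing for nil. Both function names are hypothetical.

```python
def next_record(stream):
    """Like NEXT variable: recordtype FROM stream; returns None (nil) at the end."""
    return next(stream, None)


def forall(stream, body):
    """FORALL variable: recordtype IN stream DO body, as a NEXT / test / BREAK loop."""
    while True:                            # assumption: the expansion loops until BREAK
        variable = next_record(stream)     # NEXT variable: recordtype FROM stream
        if variable is None:               # IF variable = NIL THEN [ BREAK ]
            break
        body(variable)
```

The end of the data is signaled only by the nil value, so the body never sees a nil record.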
C.6 Standard procedures
REPORT(...) prints the arguments one by one on the computer display and the standard output file. The printout format depends on the argument types. In particular, structured values are printed as if their elements were enumerated in order.
FCHARS(int) returns a format value controlling the number of characters to be occupied by an item on the report. int=0 means free format.
FITEMS(int) returns a format value controlling the number of items per line in the report. int=0 means free format.
FJUST(lj) returns a format value controlling whether the characters of the item should be left (lj is true) or right justified.
OPEN(filename, flag) returns a stream value associated with the file filename (a string) in binary mode if the flag is true; otherwise, or if the flag is omitted, in ASCII mode.
WITHIN(a, b, c) returns true if, and only if, a is in the closed interval [b, c]. The parameters must be times or integers.
WITHOUT(a, b, c) returns true if, and only if, a is not in the closed interval [b, c]. The parameters must be times or integers.
MIN(a, b ...) returns the smallest among a, b and so on (times or integers).
MAX(a, b ...) returns the largest among a, b and so on (times or integers).
STRING(any) returns the string which would be printed by REPORT for an integer, string, atom or time value.
ATOM(any) returns an atom such that STRING(ATOM{ STRI NG{any) ) ) = STRING{ any)
SUBSTRING(string, i1, i2) returns the substring from character i1 up to and including character i2. Indexing of characters starts with 0. The null string is returned if i2 < i1 or if the indices are out of range.
REPLACE(string1, i1, i2, string2) returns a copy of string1 in which the substring i1 through i2 is replaced by string2.
FIND(string1, string2) returns the index of the first character of string2 in string1, or NIL if string2 is not contained in string1.
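The index conventions of SUBSTRING, REPLACE and FIND (characters numbered from 0, an inclusive upper bound, the null string or NIL on failure) are easy to misread. The following Python sketch models them; the lower-case function names are hypothetical stand-ins for the procedures above, NIL is represented by None, and the behaviour of REPLACE with out-of-range indices, which the text does not specify, is simply left to Python slicing.

```python
from typing import Optional


def substring(s: str, i1: int, i2: int) -> str:
    """Characters i1 through i2 inclusive, counting from 0.
    Returns the null string if i2 < i1 or either index is out of range."""
    if i2 < i1 or i1 < 0 or i2 >= len(s):
        return ""
    return s[i1:i2 + 1]


def replace(s1: str, i1: int, i2: int, s2: str) -> str:
    """Copy of s1 with characters i1 through i2 inclusive replaced by s2.
    Assumes 0 <= i1 <= i2 < len(s1); other cases are unspecified in the text."""
    return s1[:i1] + s2 + s1[i2 + 1:]


def find(s1: str, s2: str) -> Optional[int]:
    """Index of the first character of s2 within s1, or None (standing in for NIL)."""
    i = s1.find(s2)
    return None if i < 0 else i
```

For instance, substring("PLANNER", 0, 3) is "PLAN", replace("SYSDIR", 0, 2, "USR") is "USRDIR", and find("FILE.BR", ".BR") is 4.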
SET(a, b ...) returns a set containing the atoms a, b ...
SEQUENCE(a, b ...) returns a sequence containing the values a, b ...
PERMUTE(a, b): parameter b must be an integer sequence, and a is a set or sequence; returns a permuted by b (b[i] determines the new index of a[i]).
PERMSORT(sequence) returns a sequence of integers which is a permutation vector that, if applied to the sequence (or set), will result in a sorted sequence. Sets are sorted by comparing the printed representation (see STRING) of their constituent atoms. Sequences must contain integers or times.
SORT(a) does PERMUTE(a, PERMSORT(a)).
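PERMUTE, PERMSORT and SORT fit together as follows: PERMSORT gives each element its rank, and PERMUTE moves a[i] to position b[i], so applying the ranks sorts the sequence. A minimal Python sketch under those definitions; the function names are hypothetical, sets are modelled as lists of their printed atom names, and the handling of ties (equal elements keep their original relative order) is an assumption, since the text does not say.

```python
from typing import List, Sequence


def permute(a: Sequence, b: Sequence[int]) -> list:
    """PERMUTE(a, b): element a[i] moves to index b[i]."""
    if sorted(b) != list(range(len(a))):
        raise ValueError("b must be a permutation of 0 .. len(a)-1")
    result = [None] * len(a)
    for i, target in enumerate(b):
        result[target] = a[i]
    return result


def permsort(a: Sequence) -> List[int]:
    """PERMSORT(a): p[i] is the rank of a[i], so permute(a, p) is sorted.
    Ties keep their original order (an assumption)."""
    order = sorted(range(len(a)), key=lambda i: a[i])  # indices in sorted order (stable)
    p = [0] * len(a)
    for rank, i in enumerate(order):
        p[i] = rank
    return p


def sort(a: Sequence) -> list:
    """SORT(a) = PERMUTE(a, PERMSORT(a))."""
    return permute(a, permsort(a))
```

For example, permsort(["SYSDIR", "A.BR", "M.BCPL"]) is [2, 0, 1], and applying it with permute gives ["A.BR", "M.BCPL", "SYSDIR"].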
DATE(year, month, day) returns the absolute date (C.4.3) assembled from the integer operands. The integer constants JANUARY, FEBRUARY ... may be used for month.
YEARS(int) returns the time interval of int years.
MONTHS(int) returns the time interval of int months.
DAYS(int) returns the time interval of int days.
HOURS(int) returns the time interval of int hours.
MINUTES(int) returns the time interval of int minutes.
SECONDS(int) returns the time interval of int seconds.
NOW(int) returns the current absolute date.
IYEAR(time) returns the integer year portion of the absolute date.
IMONTH(time) returns the integer month portion of the time value.
IDAY(time) returns the integer day portion of the time value.
IHOUR(time) returns the integer hour portion of the time value.
IMINUTE(time) returns the integer minute portion of the time value.
ISECOND(time) returns the integer second portion of the time value.
IWEEKDAY(time) returns the integer weekday of the absolute date. The result can be checked against the constants MONDAY, TUESDAY ...
MEAN(stat, mult) returns mult times the mean accumulated in the stat ++ ... operations, as an integer value. Mult may be omitted, in which case it defaults to 1.
SIGMA(stat, mult) returns mult times the standard deviation accumulated in the stat ++ ... operations, as an integer value. Mult may be omitted, in which case it defaults to 1.
DMEAN(stat, mult) returns a double-precision mean in a statistics type value. Mult may be omitted, in which case it defaults to 1.
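The statistics procedures imply that each stat ++ x operation accumulates enough to recover a mean and a standard deviation, and that mult serves to recover fractional digits from an integer result (MEAN(s, 10) gives the mean in tenths). The sketch below models a statistics value as a count, a sum and a sum of squares; the class and method names are hypothetical, and both the use of the population standard deviation and truncation toward zero for the integer results are assumptions.

```python
import math


class Stat:
    """Hypothetical model of a statistics value; add(x) stands in for 'stat ++ x'."""

    def __init__(self) -> None:
        self.n = 0            # number of values accumulated
        self.total = 0.0      # sum of the values
        self.total_sq = 0.0   # sum of their squares

    def add(self, x: float) -> "Stat":
        self.n += 1
        self.total += x
        self.total_sq += x * x
        return self


def dmean(stat: Stat, mult: float = 1) -> float:
    """mult times the mean, kept in double precision."""
    return mult * stat.total / stat.n


def mean(stat: Stat, mult: int = 1) -> int:
    """mult times the mean as an integer (truncation toward zero is assumed)."""
    return int(dmean(stat, mult))


def sigma(stat: Stat, mult: int = 1) -> int:
    """mult times the population standard deviation as an integer (both assumed)."""
    m = stat.total / stat.n
    variance = max(stat.total_sq / stat.n - m * m, 0.0)  # guard against rounding below 0
    return int(mult * math.sqrt(variance))
```

With the values 1 and 2 accumulated, mean gives 1 while mean with mult 10 gives 15, which is how a mean of 1.5 can be reported through an integer-valued procedure.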
L. Example
Let us assume that, given a measurement file (Appendix B), a report of the files mentioned in it and their final lengths is desired. The report should appear in two columns, sorted alphabetically on filenames. The program can be written as follows:
( TYPE = 'S' I FILENAME:ATOM, NO:INT, ... - - see C. 3 ) );
] ;
] ;
The name of the input file is specified as a macro parameter, FILE. The report is prepared as two sequences: FILENAMES, a set, holds the names of the files, while corresponding elements in LENGTHS hold the integer lengths. Note the use of enumeration through a temporary permutation vector for printing the report in alphabetical order.
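The example program is only partly legible in this copy. A Python sketch of the same report, assuming the measurement file has already been decoded into (filename, length) pairs in file order (the record layout of Appendix B is not reproduced here), might look like this. Like the original it keeps parallel FILENAMES and LENGTHS lists and prints through a temporary vector; here that vector holds the indices in sorted order, i.e. the inverse of what PERMSORT returns. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def file_length_report(records: Iterable[Tuple[str, int]]) -> List[str]:
    """Two-column report of each file's final length, sorted by filename."""
    filenames: List[str] = []        # plays the role of the FILENAMES set
    lengths: List[int] = []          # LENGTHS[i] belongs to FILENAMES[i]
    position: Dict[str, int] = {}    # name -> index, so each file appears once
    for name, length in records:
        if name in position:
            lengths[position[name]] = length  # a later record gives the final length
        else:
            position[name] = len(filenames)
            filenames.append(name)
            lengths.append(length)
    # Temporary permutation vector: indices of the entries in alphabetical order.
    order = sorted(range(len(filenames)), key=lambda i: filenames[i])
    width = max((len(name) for name in filenames), default=0)
    return [f"{filenames[i]:<{width}}  {lengths[i]}" for i in order]
```

Calling it with [("SYSDIR", 20), ("A.BR", 7), ("SYSDIR", 24)] yields the two rows "A.BR    7" and "SYSDIR  24".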
Appendix D: Task Order for Project D
[Note: this task order was changed (3.9.3) on December 12, 1975 with the addition of the following qualification:
NOTE: In its initial version, the program should just perform the default permutation: with or without any input.]
D.1 Introduction
Project D is to implement a system for permuting the pages of a disk without changing the contents of any file or the meaning of any directory. The system comes in two parts:
the planner, which constructs the desired permutation of the pages and writes it on a file;
the permuter, which performs the permutation specified by a file, which might be the output of the planner, or might be generated in some other way.
The system should be able to handle up to a million pages and two hundred thousand files. In other words, any per-page or per-file information must be kept on scratch files, not in memory. The details of the disk format and input-output operations should be well parameterized.
In order to make the program work at a reasonable speed, it is essential to run the disk at full speed while moving data, as nearly as possible (the time to transfer one page is typically about one-twentieth of the time to make a random
do the bookkeeping of page positions with batch-processing techniques (sorts and merges) rather than straightforward table lookups, since looking something up randomly in a table will always require a disk reference.
To construct this system, you will have to know about the structure of a disk. This information can be found in the operating system manual.
D.2 The Planner
The planner takes as input a list of pairs: (partition, list of entities). A partition is an expression which specifies a set of disk pages. It has the form
[DRIVES l, SURFACES l, TRACKS l, SECTORS l]
where each l has the form
l ::= x | l x
x ::= integer | integer - integer | ALL
Partitions are a way of segmenting the disk. A file is not allowed to occupy more than one partition. In other words, if any pages of a file are in a given partition, then all the pages of the file must be in that partition.
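The partition grammar can be parsed directly. The sketch below is one possible Python reading of it: it assumes items are separated by blanks and ranges are written without internal blanks (as in the example input of this section), that all four fields are present, and that ALL stands for the full range of a field. The disk geometry passed in, and the function names, are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Ranges = List[Tuple[int, int]]  # inclusive (low, high) pairs


def parse_partition(text: str, limits: Dict[str, int]) -> Dict[str, Ranges]:
    """Parse '[DRIVES l, SURFACES l, TRACKS l, SECTORS l]'.
    limits gives the size of each field so that ALL can be expanded."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("partition must be enclosed in brackets")
    result: Dict[str, Ranges] = {}
    for field in body[1:-1].split(","):
        name, *items = field.split()
        ranges: Ranges = []
        for x in items:                                    # l ::= x | l x
            if x == "ALL":                                 # x ::= ALL
                ranges.append((0, limits[name] - 1))
            elif m := re.fullmatch(r"(\d+)-(\d+)", x):     # x ::= integer - integer
                ranges.append((int(m.group(1)), int(m.group(2))))
            elif x.isdigit():                              # x ::= integer
                ranges.append((int(x), int(x)))
            else:
                raise ValueError(f"bad item {x!r} in {name}")
        result[name] = ranges
    return result


def contains(part: Dict[str, Ranges], drive: int, surface: int,
             track: int, sector: int) -> bool:
    """True if the page at this physical address lies in the partition."""
    coords = {"DRIVES": drive, "SURFACES": surface, "TRACKS": track, "SECTORS": sector}
    return all(any(lo <= coords[k] <= hi for lo, hi in part[k]) for k in coords)
```

A membership test of this kind is what the planner needs in order to check that no file straddles two partitions.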
The list of entities is a sequence of entities separated by spaces or carriage returns. An entity may be
a file name, which may include #s and *s, which should be interpreted as matching a single character or an arbitrary string respectively.
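A file-name entity can be matched by translating it into a regular expression. In this Python sketch '#' becomes "any one character" and '*' becomes "any string"; whether '*' may match the empty string, and whether the comparison is case-sensitive, are assumptions the task order leaves open. The function names are hypothetical.

```python
import re


def entity_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a file-name entity: '#' matches one character, '*' any string
    (assumed to include the empty string); all other characters match themselves."""
    parts = []
    for ch in pattern:
        if ch == "#":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, filename: str) -> bool:
    """True if the whole file name matches the entity (case-sensitive)."""
    return entity_pattern(pattern).fullmatch(filename) is not None
```

Thus *.BR selects every .BR file, while FILE#.BCPL matches FILE1.BCPL but not FILE12.BCPL.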
The constructed permutation should leave the files in the order indicated by the list of entities within each partition; i.e., successive files in the list of entities occupy successive virtual disk addresses. The pages in each file should occupy disk pages with consecutive virtual addresses, and should be ordered according to page number in the file.
The entity @OTHER FILES stands for all the files not mentioned explicitly in the entity list. The entity @n FREE PAGES means that n free pages should be inserted at that point. The entity @FREE SPACE*f stands for a fraction f of the free space in the current partition (i.e., the number of pages in the partition minus the number of pages in all the files in the entity list). Here f is expressed in decimal, e.g. @FREE SPACE*.333 for one third of the free space. @FREE SPACE stands for any space left over after all the other entities have been taken care of.
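To see how the special entities interact, the following Python sketch computes, for one partition, the first virtual page and the size of every entity in list order. It assumes @OTHER FILES has already been expanded into file entries, that @FREE SPACE*f rounds down, and that at most one @FREE SPACE appears; the tuple encoding of entities and the function name are hypothetical. Free space is computed exactly as defined above, i.e. without subtracting explicit @n FREE PAGES.

```python
import math
from typing import List, Tuple


def layout(partition_pages: int, entities: List[tuple]) -> List[Tuple[str, int, int]]:
    """entities: ("FILE", name, pages), ("FREE", n), ("FRACTION", f) or ("REST",).
    Returns (label, first virtual page, page count) for each entity."""
    file_pages = sum(e[2] for e in entities if e[0] == "FILE")
    free_space = partition_pages - file_pages          # as defined for @FREE SPACE*f
    sizes = []
    for e in entities:
        if e[0] == "FILE":
            sizes.append(e[2])
        elif e[0] == "FREE":                             # @n FREE PAGES
            sizes.append(e[1])
        elif e[0] == "FRACTION":                         # @FREE SPACE*f, rounded down
            sizes.append(math.floor(free_space * e[1]))
        elif e[0] == "REST":                             # @FREE SPACE, sized below
            sizes.append(None)
        else:
            raise ValueError(f"unknown entity {e!r}")
    if sizes.count(None) > 1:
        raise ValueError("this sketch allows at most one @FREE SPACE")
    leftover = partition_pages - sum(s for s in sizes if s is not None)
    if leftover < 0:
        raise ValueError("entities do not fit in the partition")
    result, address = [], 0
    for e, size in zip(entities, sizes):
        size = leftover if size is None else size
        result.append((e[1] if e[0] == "FILE" else e[0], address, size))
        address += size
    return result
```

In a 100-page partition holding a 10-page SYSDIR, 20 free pages, a 30-page file, @FREE SPACE*.333 and @FREE SPACE, the fraction gets 19 of the 60 free pages and @FREE SPACE absorbs the remaining 21.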
Here is an example of input to the planner:
[DRIVES 0, SURFACES 0-1, TRACKS 0-174 225-400, SECTORS ALL]
*.BR
This input specifies two partitions. The second one, which occupies the middle tracks of the disk, will contain the system directory SYSDIR (followed by 20 free pages) and all the BCPL files. The remainder of the disk will get all the other files, with the .BR files first.
The output of the planner is a file containing a sequence of disk addresses. The ith item in this sequence is the destination of the page which currently has disk address i. (If some other representation of the permutation proves to be more convenient, that is fine.)
D.2.1 Planner Algorithm
Here is a possible way for the planner to operate.
1. Look up all the file names in the entity lists and replace each by the identifier (serial and version number) of the file. This should be done by sorting the directories and the entity lists, and then passing one against the other.
2. Make a complete scan of the disk and construct a list D which describes the contents of each non-empty disk page: [disk address, file identifier, page number].
3. Sort D on file identifier and page number. Sort the entity lists the same way, keeping track of the position of each entry.
4. Pass the sorted entity lists against D (all at once) and add the partition and position within the partition to each entry in D. At the same time, make a list F with one entry per file which contains the identifier, length, partition and position of the file.
5. Sort F by partition and position.
6. Now it is easy to compute the destination of the first page of each file, since the files in F are in the order in which they are to appear on the final disk. Add this information to each entry of F, and sort it again by file identifier.
7. Pass F against D and add the final position information to each entry of D.
8. Finally, sort D by current disk position.
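Steps 3 to 8 can be sketched in Python with in-memory lists standing in for the scratch files, so that every operation is either a sort or a sequential pass, as the task order recommends. Step 1 is assumed done (each file identifier is already mapped to its partition and position), page numbers are assumed to run from 0, the partition base addresses are hypothetical inputs, and free-page entities are ignored to keep the sketch short. The function name is hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def plan(pages: List[Tuple[int, str, int]],
         placement: Dict[str, Tuple[int, int]],
         partition_base: Dict[int, int]) -> List[int]:
    """pages: list D of (disk address, file id, page number) for non-empty pages.
    placement: file id -> (partition, position in its entity list).
    Returns the new address of each page, in order of current address."""
    # Step 3: sort D on file identifier and page number.
    d = sorted(pages, key=lambda p: (p[1], p[2]))
    # Step 4: one pass over D yields F = (partition, position, file id, length).
    f = [(placement[fid][0], placement[fid][1], fid, sum(1 for _ in grp))
         for fid, grp in groupby(d, key=lambda p: p[1])]
    # Step 5: sort F by partition and position.
    f.sort()
    # Step 6: files are now in final order, so first destinations accumulate...
    next_free = dict(partition_base)
    starts = []
    for part, _, fid, length in f:
        starts.append((fid, next_free[part]))
        next_free[part] += length
    starts.sort()  # ...and F is sorted again by file identifier.
    # Step 7: merge pass of F against D (both ordered by file identifier).
    with_dest, j = [], 0
    for addr, fid, page_no in d:
        while starts[j][0] != fid:
            j += 1
        with_dest.append((addr, starts[j][1] + page_no))
    # Step 8: sort D by current disk position.
    with_dest.sort()
    return [dest for _, dest in with_dest]
```

For a fully occupied disk with addresses 0 to N-1, the returned list has the form of the planner output described above: item i is the destination of the page now at address i.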
D.3 The Permuter
There are three jobs to be done by the permuter:
1. Move the data;
2. Fix up the chains of forward and backward pointers which link the pages of each file together;
3. Fix up all the directories so that each entry contains the new disk address of the leader page for its file.
D.3.1 Permuter Algorithm
The algorithm to be used is a simple recursive one. At each stage it is working on an active region of n consecutive disk pages, to which some permutation must be applied (initially it is working on the entire disk). If n is small enough that there is room in core for all the data, simply read in all the pages, and rewrite them in the permuted order. Otherwise, split the active region into two sub-regions A and B, each containing n/2 pages, and switch pages between A and B until each page is in the proper sub-region.
To do this split, start at the beginning of A and fill memory with pages from A which belong in B, leaving room for one track's worth of data. Then move to the beginning of B, and track by track read in pages which belong in A, and then write onto the space thus freed, and onto any free pages, the pages in memory which are in transit from A to B. Then go back to A and iterate this procedure until both regions have been exhausted. The time required is twice the time to scan the entire active region, plus some seek time which will be fairly small in comparison. Now apply the algorithm recursively to regions A and B.
The total time to deal with a region of n pages is roughly
where T is the time to read one page, and M is the number of pages which will fit in memory with room for one track more.
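The expression for the total time did not survive in this copy. As a rough, non-authoritative estimate from the description above, assume that each level of splitting costs 2nT in total (two scans of every active region at that level, seek time ignored), that splitting stops once a region fits in the M pages of memory, and that the final in-memory pass reads and writes each page once:

```latex
T_{\mathrm{total}}(n) \;\approx\;
  \underbrace{2nT \left\lceil \log_2 \frac{n}{M} \right\rceil}_{\text{splitting levels}}
  \;+\; \underbrace{2nT}_{\text{in-memory pass}}
```

For example, with n = 2^20 pages and M = 2^10, this gives 2nT * 10 + 2nT = 22nT, i.e. each page is read and written roughly eleven times.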
This algorithm needs as input a list which gives the destination of each page, ordered by current location of the page. It should produce two new lists which serve the same purpose for each sub-region, so that the recursion can proceed. The construction of these new lists can easily be done while the data is being moved, since all the necessary information is available in the right order.
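The recursive structure of the permuter, including the derivation of the two new destination lists, can be simulated in Python on a list that stands for the disk. This is a sketch of the control structure only: the track-by-track buffering through memory is collapsed into direct swaps between A and B, mem_pages plays the role of M, and the function name is hypothetical.

```python
from typing import List


def permute_region(disk: List[object], dest: List[int], lo: int, hi: int,
                   mem_pages: int) -> None:
    """Move the page now at address lo+i to address dest[i] for every page in
    [lo, hi). dest is ordered by current location and is modified in place."""
    n = hi - lo
    if n <= max(mem_pages, 1):
        buffer = disk[lo:hi]                   # read the whole region into memory
        for i, page in enumerate(buffer):
            disk[dest[i]] = page               # rewrite it in permuted order
        return
    half = n // 2
    mid = lo + half                            # A = [lo, mid), B = [mid, hi)
    a_to_b = [i for i in range(half) if dest[i] >= mid]
    b_to_a = [i for i in range(half, n) if dest[i] < mid]
    # Equal lengths, since dest is a permutation of [lo, hi).
    for i, j in zip(a_to_b, b_to_a):
        disk[lo + i], disk[lo + j] = disk[lo + j], disk[lo + i]
        dest[i], dest[j] = dest[j], dest[i]    # dest stays ordered by current location
    # The two halves of dest are exactly the new lists for the sub-regions.
    permute_region(disk, dest[:half], lo, mid, mem_pages)
    permute_region(disk, dest[half:], mid, hi, mem_pages)
```

Because dest is updated as pages are swapped, the lists for A and B come out as by-products of the move, which mirrors the remark above that they can be built while the data is being moved. Callers should pass a copy of the destination list if they need it afterwards.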
D.4 Remarks
Some care must be taken with the scratch files, since they are being moved along with everything else. It would probably be prudent to create all the scratch files needed before doing anything else.
Appendix E: Summary of the Measurements
The following reports were generated from the measurement database by small programs written in the C language (see Sections 3.5, 3.7, and 3.8). The outputs of the programs were edited to conform to the format of the present work. The reports are ordered by project, and by employee within a project. Two different types of reports appear: first, a daily and weekly breakdown of the actual time spent by the employee in a productive capacity; and second, a weekly breakdown of the number of lines of meta-program or code written and of compilations initiated. The precise meanings of the labels used are as follows:
week       weeks are numbered to correspond to the labeling in Figures 10 through 13. If a date is given, it refers to the Monday of the week.
lines      net change in the length of meta-programs (for M1 and M2) or in the length of source code, expressed in lines (3.7).
kbd.lin    number of lines typed in from the keyboard. Some of these lines would later be deleted or duplicated by copying.
days/week  (man-)days in the week, excluding holidays. Not used for
cor.lin    the lines column corrected for the standard 5-day week. This number is used in Figures 10 through 13.
tot.com    the total number of times the compiler was called. It was standard practice to run the compiler on incomplete code to get a listing of symbols which had to be defined.
net.com    number of compilations without errors.
loads      the total number of times the loader was called.
Important note: the numbers do not add up because of truncation in the terms. Sums given are the precise sums, truncated. Denominators in the listings of averages were selected for convenience.
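The derived columns can be checked by hand. In the Python sketch below (function names hypothetical), cor.lin is taken as lines x 5 / man-days, which reproduces entries such as 546 lines over 10 man-days giving 273 for T1 and T2, 1022 lines over 6.5 days giving 786 for T4, or, for the Project C pair, (270 + 176) x 5 / 10 = 223. Averages of h:mm totals are truncated to whole minutes, which reproduces, e.g., 283:48 / 18 = 15:46. Whether the original programs rounded or truncated cor.lin is not stated, so that function returns the unrounded value.

```python
def corrected_lines(lines: float, man_days: float) -> float:
    """Lines normalised to the standard 5-day week (man_days counts every
    person in the team, so a two-person team reports a per-person figure)."""
    return lines * 5 / man_days


def parse_hm(text: str) -> int:
    """'283:48' -> 17028 minutes."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def format_hm(minutes: int) -> str:
    """17028 minutes -> '283:48'."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def average_hm(total: str, weeks: int) -> str:
    """Weekly average of an h:mm total, truncated to whole minutes."""
    return format_hm(parse_hm(total) // weeks)
```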
E.1 Projects A+B
Note: during the early experiments group (Projects A and B), measurements were not as extensive as in the later ones (4.4).
Employee: M1
week  date      lines
   1  1-Jul-75    340
   2  8-Jul-75    530
   3  15-Jul-75   568
   4  22-Jul-75    37
   5  29-Jul-75   701
   6  5-Aug-75    262
   7  12-Aug-75   413
   8  19-Aug-75   453
   9  26-Aug-75    80
  10  2-Sep-75    156
  11  9-Sep-75    173
  12  16-Sep-75   112
  13  23-Sep-75     8
total            3832
total/12          319
Employees: T1, T2
week  lines  man-days/week  cor.lin
   1    293              6      243
   2    546             10      273
   3    867             10      433
   4    235             10      118
   5    666             10      333
   6    666             10      333
   7    322             10      161
   8    722             10      361
   9    231             10      116
  10    201              8      126
  11    196             10       98
  12    329             10      164
  13    398             10      199
total  5671
total/13 436
E.2 Project C
Employee: M1
week  date          M     T     W     T     F     S     S   total
   1  14-Jul-75     ?     ?  3:11  2:01  0:33  0:47  5:15   11:49
   2  21-Jul-75  0:10  2:24  1:29  1:52  4:45  1:06  1:07   12:56
   3  28-Jul-75  1:03  4:27  3:34  7:42  2:24  1:51  0:37   21:41
   4  4-Aug-75   2:37  4:26  7:29  7:06  6:52  2:14  3:49   34:37
   5  11-Aug-75  7:25  5:08  4:31  1:48  0:29  0:00  2:31   21:54
   6  18-Aug-75  4:49  4:13  2:49  2:27  3:34  1:23  3:08   22:26
   7  25-Aug-75  3:52  1:41  1:59  5:07  3:05  0:00  0:31   16:17
   8  1-Sep-75   5:03  1:39  5:21  4:58  3:16  0:00  1:15   21:34
   9  8-Sep-75   8:09  6:24  3:53  2:58  0:00  0:00  2:24   23:49
  10  15-Sep-75  3:31  1:58  6:29  2:14  0:00  0:00  0:00   14:14
  11  22-Sep-75  1:13  0:00  0:00  0:00  2:04  4:20  3:57   11:37
  12  29-Sep-75  0:00  0:00  0:00  0:00  0:00  0:00  0:00    0:00
  13  6-Oct-75   3:52  5:07  5:27  5:03  4:07  1:19  2:30   27:28
  14  13-Oct-75  5:54  0:44  0:00  3:09  2:13  0:00  1:37   13:39
  15  20-Oct-75  0:00  0:00  0:00  0:00  2:25  0:53  1:19    4:38
  16  27-Oct-75  2:19  1:50  2:25  2:16  4:52  0:00  0:00   13:44
  17  3-Nov-75   2:45  1:05  2:06  0:34  0:43  0:00  0:00    7:15
  18  10-Nov-75  0:49  1:06  1:06  0:00  0:31  0:00  0:00    3:34
  19  17-Nov-75  0:16  0:20  0:00  0:00  0:00  0:00  0:00    0:36
total                                                      283:48
total/18                                                    15:46
week  lines  kbd.lin  days/wk  cor.lin
   1    421      397        5      421
   2    375      351        5      375
   3    526      485        5      526
   4    324      213        5      324
   5    622      477        5      622
   6    345      322        5      345
   7    156      140        5      156
   8    250      185        5      250
   9    354      310        5      354
  10     98       77        5       98
  11    145      135        5      145
  12      0        0        0
  13    749      664        5      749
  14    258      194        5      258
  15     38       28        5       38
  16     79       85        5       79
  17    152      131        5      152
  18     15        4        5       15
  19      1       11        5        1
total  4916     4217
total/19 258     221
Project C continued
Employee: T3
week  date          M     T     W     T     F     S     S   total
   1  14-Jul-75     ?     ?  1:34  8:46  4:29  0:00  0:00   14:50
   2  21-Jul-75  5:42  8:01  4:18  5:50  6:03  0:00  0:00   29:55
   3  28-Jul-75  5:08  7:24  4:28  6:56  7:01  0:00  0:00   30:59
   4  4-Aug-75   7:10  8:08  7:09  7:44  8:32  0:00  0:00   38:45
   5  11-Aug-75  6:51  8:41  7:16  6:41  7:56  0:00  0:00   37:28
   6  18-Aug-75  7:29  7:53  5:21  6:11  6:43  0:08  2:37   36:24
   7  25-Aug-75  3:54  7:33  5:16  7:37  7:36  0:00  0:00   31:58
   8  1-Sep-75   0:00  4:25  7:48  7:27  6:28  0:00  0:00   26:10
   9  8-Sep-75   6:01  3:51  4:59  5:47  7:20  0:00  0:00   28:01
  10  15-Sep-75  7:46  8:50  8:52  8:13  8:52  0:00  0:00   42:35
  11  22-Sep-75  8:42  5:03  5:48  4:39  7:25  0:00  0:00   31:40
  12  29-Sep-75  6:58  7:37  7:16  7:05  3:57  2:39  0:00   35:35
  13  6-Oct-75   4:22  8:05  5:34  5:44  5:55  0:00  0:00   29:42
  14  13-Oct-75  6:59  5:26  8:34  7:38  7:42  0:00  0:00   36:21
  15  20-Oct-75  6:52  7:34  8:50  8:06  7:03  2:02  0:00   40:31
  16  27-Oct-75  4:44  8:20  7:30  7:44  5:23  0:00  0:00   33:43
  17  3-Nov-75   7:53  7:27  6:45  6:11  5:49  0:00  0:00   34:08
  18  10-Nov-75  6:45  7:04  5:27  7:28  6:37  0:00  0:00   33:24
  19  17-Nov-75  6:28  6:31  6:39  7:29  7:14  0:00  0:00   34:23
  20  24-Nov-75  7:09  0:00  0:00  0:00  0:00  0:00  0:00    7:09
total                                                      633:41
total/19                                                    33:21
week  lines  kbd.lin  days/wk  cor.lin  tot.com  net.com  loads
   1    270      199        5      223       26       19     13
   2    690      299        5      558       88       67     35
   3    596      419        5      510       72       52     26
   4    170      302        5      267       72       67     47
   5    496      586        5      445       54       45     34
   6    362      331        5      421       96       85     50
   7    242      182        5      248       34       27     17
   8    418      233        4      489       43       28     15
   9    474      184        5      406       50       40     24
  10    413      419        5      429      120       92     47
  11    383      358        5      320       61       54     26
  12    614      473        5      452       61       49     38
  13    604      307        5      588       29       23     14
  14    218      337        5      372       69       49     23
  15    679      417        5      478       90       74     42
  16    243      337        5      218       70       65     32
  17    342      323        5      284       52       40     25
  18    216      286        5      264       61       46     36
  19    -14      134        5       53       38       34     27
  20      0        0        1      165        3        3      2
total  7423     6133                        1189      959    573
total/19 390     322                          62       50     30
Employee: T4
week  date          M     T     W     T     F     S     S   total
   1  14-Jul-75     ?     ?  0:49  4:52  6:43  0:00  0:00   12:25
   2  21-Jul-75  6:37  8:30  6:36  8:14  4:50  0:00  0:00   34:48
   3  28-Jul-75  8:11  6:17  5:49  5:43  6:46  0:00  0:00   32:48
   4  4-Aug-75   5:52  8:38  5:43  6:57  7:38  0:00  0:00   34:50
   5  11-Aug-75  5:10  8:15  8:04  7:34  8:22  0:00  0:00   37:27
   6  18-Aug-75  5:17  7:35  6:58  6:47  9:06  0:00  0:00   35:45
   7  25-Aug-75  7:38  6:21  7:10  8:21  7:55  0:00  0:00   37:26
   8  1-Sep-75   0:00  7:52  7:16  7:54  5:46  0:00  0:00   28:49
   9  8-Sep-75   8:23  7:38  7:50  6:53  7:11  0:00  0:00   37:57
  10  15-Sep-75  8:05  7:25  4:50  7:12  7:16  0:00  4:41   39:32
  11  22-Sep-75  8:53  7:34  3:35  6:38  7:56  2:28  0:00   37:07
  12  29-Sep-75  5:59  6:15  8:14  7:59  8:17  0:00  2:22   39:08
  13  6-Oct-75   8:13  7:04  7:58  7:31  5:40  0:00  0:00   36:28
  14  13-Oct-75  7:38  7:39  8:54  6:43  6:54  0:00  0:00   37:50
  15  20-Oct-75  6:25  7:35  7:35  7:55  5:41  0:00  1:38   36:51
  16  27-Oct-75  7:52  7:33  7:32  5:45  8:55  0:00  0:00   37:40
  17  3-Nov-75   6:49  7:14  7:41  8:46  8:54  0:00  0:00   39:26
  18  10-Nov-75  8:18  7:28  7:52  9:04  7:34  0:00  0:00   40:18
  19  17-Nov-75  8:10  9:54  6:30  8:04  7:27  0:00  0:36   40:44
  20  24-Nov-75  7:21  0:00  0:00  0:00  0:00  0:00  0:00    7:21
total/19                                                    36:02
week  lines  kbd.lin  days/wk  cor.lin  tot.com  net.com  loads
   1    176      111        5      223       22       15     11
   2    427      288        5      558       47       31     16
   3    425      348        5      510       47       27     24
   4    364      386        5      267       35       30     19
   5    394      424        5      445       50       36     27
   6    481      410        5      421       54       37     28
   7    254      230        5      248       58       42     32
   8    364      268        4      489       40       32     25
   9    338      320        5      406       54       41     29
  10    445      219               429       82       67     60
  11    258      311        5      320       27       20     12
  12    291      554        5      452      123       93     53
  13    573      544        5      588       91       78     41
  14    526      549        5      372       71       39     16
  15    277      294        5      478       66       61     67
  16    194      287               218       79       67     51
  17    226       97        5      284       51       46     51
  18    312      205        5      264       48       40     38
  19    121      187        5       53       51       40     46
  20     67        9        1      165        4        4      5
total  6521     6052                        1100      846    651
total/19 343     318                          57       44     34
E.3 Project
Employee: M1
week  date          M     T     W     T     F     S     S   total
      10-Nov-75  0:00  1:35  2:04  0:00  0:00  0:00  0:00    3:39
      17-Nov-75  2:47  0:34  3:57  2:49  0:44  1:32  3:44   16:10
   1  24-Nov-75  4:47  0:42  2:25  0:00  0:00  0:00  3:20   11:16
   2  1-Dec-75   1:00  2:34  6:18  1:38  2:07  0:00  2:54   16:33
   3  8-Dec-75   2:52  2:56  6:54  2:39  5:26  1:05  3:09   25:04
   4  15-Dec-75  1:25  6:27  7:32  3:08  7:03  0:00  0:07   25:44
total                                                       98:26
total/5                                                     19:41
week  lines  kbd.lin
        244      206
   1    297      267
   2    162      137
   3    401      335
   4    427      350
total  1572     1345
total/5 314      269
Employee: M1 helping with the debugging
week  date          M     T     W     T     F     S     S   total
   4  15-Dec-75  0:00  0:00  0:00  0:00  1:43  5:30  0:00    7:14
Employee: T4
week  date          M     T     W     T     F     S     S   total
   1  24-Nov-75  0:00  7:50 10:01  0:00  0:00  0:00  0:00   17:51
   2  1-Dec-75   5:54  7:12  7:21  7:31  7:17  0:00  0:00   35:17
   3  8-Dec-75         7:40  7:33  7:34  5:23  4:09  0:00   37:57
   4  15-Dec-75  6:27  7:24 11:24  9:38 11:59  7:45  0:00   54:39
total                                                      145:44
total/4                                                     36:26
5 (est.)                                                    32:00
week  lines  kbd.lin  days/wk  cor.lin  tot.com  net.com  loads
   1    407             2         1018       29       23     22
   2    613             5          613       95       68     58
   3    355             5          355       83       58     66
   4   1022             6.5        786       90       39     65
total  2399                                  297      188    211
5 (est.)  0                                   80       50     60
total/4 599                                   74       47     52
binary code: 10988 words
E.4 Project 2
Employee: M
week  date          M     T     W     T     F     S     S   total
      10-Nov-75  0:00  5:02  8:07  5:03  6:32  0:00  0:00   24:46
      17-Nov-75  5:39  6:02  4:51  5:19  7:28  0:00  0:00   29:22
   1  24-Nov-75  6:41  5:30  6:55  0:00  0:00  0:00  0:00   19:06
   2  1-Dec-75   5:03  0:00  2:33  0:00  1:04  0:00  0:00    8:42
   3  8-Dec-75   0:00  0:49  4:35  4:03  3:16  0:00  0:00   12:44
   4  15-Dec-75  0:00  0:28  0:00  0:41  0:33  0:00  0:00    1:42
total                                                       96:22
total/5                                                     19:26
week  lines  kbd.lin
        342      327
        874      782
   1    231      421
   2    296      263
   3    401      415
   4    157      148
total  2304     2360
total/5 460      472
Employee: M2 helping with the debugging
week  date          M     T     W     T     F     S     S   total
   3  8-Dec-75   0:00  0:00  1:18  2:47  2:46  0:00  3:32   10:26
   4  15-Dec-75  6:47  3:11  6:59  1:11  7:23  4:08  0:00   29:42
total                                                       40:08
total/2                                                     20:04
Employee: T3
week  date          M     T     W     T     F     S     S   total
   1  24-Nov-75  0:00  7:11  6:02  0:00  0:00  0:00  0:00   13:14
   2  1-Dec-75   7:16  7:04  7:04  5:40  6:57  0:00  0:00   34:03
   3  8-Dec-75   7:13  7:36  7:21  6:35  6:40  0:00  0:00   35:27
   4  15-Dec-75  6:11  7:11  7:50 10:34 10:16  8:58  0:00   51:03
total                                                      133:47
total/4                                                     33:26
5 (est.)                                                    32:00
week  lines  kbd.lin    days/wk  cor.lin  tot.com   net.com   loads
   1    268  257              2      670  14        10         4
   2    884  716              5      884  37        24        12
   3    387  445(83)          5      387  52(24)    41(11)    36(10)
   4    927  659(151)         6.5    713  94(78)    68(70)    38(43)
total  2467  2079(234)                    197(102)  143(81)   90(53)
5 (est.)  0                                80        50        60
total/4 616  519                           49        35        22
(figures in parentheses show the contribution of M2 while helping with the debugging (4.8.3.1))
binary code: 8898 words
E.5 Project control
Employee: P1
week  date          M     T     W     T     F     S     S   total
   1  12-Jan-76     ?     ?  3:36  5:00  0:00  0:00  0:00    8:36
   2  19-Jan-76  0:00  0:01  6:32  6:29  4:53  1:42  0:00   19:38
   3  26-Jan-76  3:44  5:15  6:23  0:00  3:37  0:00  0:00   19:00
   4  2-Feb-76   0:10  0:00  5:17  5:38  4:41  9:53  5:01   30:43
   5  9-Feb-76   0:00  0:00  0:00  0:00  0:00  0:00  0:52    0:53
   6  16-Feb-76  8:34  6:49  6:36  0:00  0:00  0:00  0:00   21:59
   7  23-Feb-76  0:28  0:00  1:21  2:26  4:16  1:31  0:00   10:04
   8  1-Mar-76   4:41  5:52  6:08  1:28  0:00  0:00  0:00   18:10
total                                                      129:03
total/7                                                     18:26
week  lines  kbd.lin  days/wk  cor.lin  tot.com  net.com  loads
   1      0       11        4        0        4        4      ?
   2    246      171        4      308        0        0      ?
   3    472      507        4      590        8        6      ?
   4    596      863        5      596       69       40      ?
   5     25       17        0        0        1        1      ?
   6    110      185        3      183       25       20      ?
   7    -55      151        4      -67       21       13      ?
   8   -409      178        4     -511       32       22      ?
total   986                                  160      106      ?
total/7 140                                   22       15      ?
Employee: P2
week  date          M     T     W     T     F     S     S   total
   1  12-Jan-76  5:17  6:12  6:22  7:36  5:06  0:00  0:00   30:35
   2  19-Jan-76  6:16  6:38  7:20  6:53  4:07  0:00  0:00   31:16
   3  26-Jan-76  6:48  6:30  4:52  6:02  5:32  0:00  0:00   29:46
   4  2-Feb-76   6:42  7:00  6:13  5:47  6:08  0:00  0:00   31:51
   5  9-Feb-76   6:03  6:17  6:33  5:51  3:50  0:00  0:00   28:36
   6  16-Feb-76  5:53  6:02  6:36  5:37  2:40  0:00  0:00   26:49
   7  23-Feb-76  6:06  6:45  5:51  6:17  5:39  0:00  0:00   30:41
total                                                      209:34
total/7                                                     29:56
week  lines  kbd.lin  days/wk  cor.lin  tot.com  net.com  loads
   1    454      261        5      454        0        0      0
   2    297      529        5      297       73       25     14
   3    216      310        5      216       78       45     26
   4    106      263        5      106       65       39     32
   5    337      385        5      337       65       42     24
   6    337      439        5      337       63       30     22
   7    156      323        5      156       76       44     33
   8      0        0        0        0        0        0      0
total  1907     2513                         421      226    151
total/7 272                                   60       32     21
binary code: 6364 words, representing 2134 lines
(balance of source code was used for testing)
[Aron]
Aron, J. D., 1970. See [NATO2] page 52.
Proposed Revised American Standard Code for Information Interchange, Communications of the ACM, December 1965
[Baker1]
Baker, Terry, Chief Programmer Team Management of Production Programming, IBM Systems Journal, Vol. 11, No. 1, 1972
[Baker2]
Baker, F. Terry, System Quality Through Structured Programming, 1972 Fall Joint Computer Conference
[Balzer]
Balzer, R., Automatic Programming, ISI Technical Review, January 1973
Barry, Barbara S., et al., Structured Programming Series, Volume X, Chief Programmer Team Operations Description, National Technical Information Service RADC-TR-74-300, 1975
[Boehm]
Boehm, Barry W., The High Cost of Software, 1975. See [Horowitz]
Brandon, Dick H., The Economics of Computer Programming, 1970. See [Weinwurm]
Brooks, Frederick P. Jr., The Mythical Man-Month, Addison-Wesley, 1975
Brown, 1970. See [NATO2] page 53.
Computerworld, August 21, 1974, Raw Count of Instructions/Day May Reward Poor, Not Good Code
[Dahl-Hoare]
Dahl, Ole-Johan; Hoare, C. A. R., Hierarchical Program Structures, Structured Programming, Academic Press, 1972
[Dahl-Nygaard]
Dahl, Ole-Johan; Nygaard, K., Simula - an Algol-Based Simulation Language, Communications of the ACM 9,9, September 1966
[Dennis-VanHorn]
Dennis, Jack B.; Van Horn, Earl C., Programming Semantics for Multiprogrammed Computations, Communications of the ACM 9,3, March 1966
[Deutsch]
Deutsch, L. Peter, An Interactive Program Verifier, Ph.D. dissertation, Department of Computer Science, University of California, Berkeley, June 1973
[Deutsch-Lampson]
Deutsch, L. Peter; Lampson, Butler W., An On-line Editor, Communications of the ACM 10,12, December 1967
Dijkstra, Edsger W., Notes on Structured Programming, Structured Programming, Academic Press, 1972
[Drucker]
Drucker, Peter F., Management: Tasks, Responsibilities, Practices, Harper & Row,
[Engelbart]
Engelbart, Douglas C.; Watson, Richard W.; Norton, James C., The Augmented Knowledge Workshop, in AFIPS Proceedings, Vol. 42, NCC, pp. 9-21, 1973
[Farber-Griswold-Polonsky]
Farber, D. J.; Griswold, R. E.; Polonsky, I. P., Snobol, a String Manipulation Language, Journal of the ACM 11,1, 1964
[Floyd1]
Floyd, Robert W., Assigning Meanings to Programs, in Proc. Symp. Applied Mathematics, Vol. XIX, Mathematical Aspects of Computer Science, American Mathematical Society, 1967
[Floyd2]
Floyd, Robert W., Algorithm 245: TREESORT 3 [M1], Communications of the ACM 7,12, December 1964
Geschke, Charles M., 1975. Private communication.
[Geschke-Mitchell]
Geschke, Charles M.; Mitchell, J., On the Problem of Uniform References to Data Structures, IEEE Transactions on Software Engineering SE-1,2, June 1975
Hoare, C. A. R., Notes on Data Structuring, Structured Programming, Academic Press, 1972
[Hoare-Wirth]
Hoare, C. A. R.; Wirth, Niklaus, A Contribution to the Development of ALGOL, Communications of the ACM 9,6, June 1966
Horowitz, Ellis (Ed.), Practical Strategies for Developing Large Software Systems, Addison-Wesley, 1975
Katz, Daniel; Kahn, Robert L., The Social Psychology of Organizations, Wiley, 1965
Kernighan, Brian W.; Plauger, P. J., The Elements of Programming Style, McGraw-Hill, 1974
Knuth, Donald E., The Art of Computer Programming, Vol. 1, Addison-Wesley,
Kosy, Donald W., Air Force Command and Control Information Processing in the 1980s: Trends in Software Technology, USAF Project Rand, National Technical Information Service AD-A017-128, 1974
Lampson, Butler W., 1974. Private communication.
Lampson, Butler W., An Open Operating System for a Single-user Machine, Revue Francaise d'Automatique, Informatique et Recherche Operationnelle, sept. 1975
[Lampson-Mitchell]
Lampson, B. W.; Mitchell, J. G.; Satterthwaite, E. H., On the Transfer of Control Between Contexts, Proceedings, Colloque sur la Programmation, Ed. by B. Robinet, Springer-Verlag, 1974
[LRG]
Learning Research Group, Personal Dynamic Media, Xerox Palo Alto Research Center, 1975
[Mayer-Stalnaker]
Mayer, David B.; Stalnaker, Ashford W., Selection and Evaluation of Computer Personnel, 1970. See [Weinwurm]
[McClure]
McClure, R. M., 1969. See [NATO2] page 88.
McCracken, Daniel D., A Guide to COBOL Programming, John Wiley & Sons, 1963
Metcalfe, Robert M.; Boggs, David R., Ethernet: Distributed Packet Switching for Local Computer Networks, Communications of the ACM 19,7, July 1976
Metzger, Philip W., Managing a Programming Project, Prentice-Hall, 1973
[Mills]
Mills, Harlan D., Chief Programmer Teams, Datamation, December 1973
Morris, Thomas D., Commentary on the Effective Executive, Peter Drucker: Contributions to Business Enterprise, Ed. by T. H. Bonaparte, NY University Press,
Morris, James H. Jr., Towards More Flexible Type Systems, 1974. See [Lampson-Mitchell-Satterthwaite]
Morris, James H. Jr., Types Are Not Sets, SIGPLAN-SIGACT Symposium on the Principles of Programming Languages, Boston, October 1973
Software Engineering, Report of NATO Science Committee, Ed. Peter Naur and Brian Randell, 1969
Software Engineering Techniques, Report of NATO Science Committee, Ed. J. N. Buxton and B. Randell, 1970
[Naur1]
Naur, Peter, Proof of Algorithms by General Snapshots, BIT 6,4, 1966
Naur, Peter, Program Translation Viewed as a General Data Processing Problem, Communications of the ACM 9,3, March 1966
Naur, Peter, Concise Survey of Computer Methods, Petrocelli Books, 1974
Parnas, D. L., On the Criteria to be Used in Decomposing Systems into Modules, Communications of the ACM, December 1972
[Parnas2]
Parnas, D. L., The Influence of Software Structure on Reliability, Proceedings of the International Conference on Reliable Software, Los Angeles, April 1975, IEEE Cat. No. 75CH0940-7CSR
[Pietrasanta]
Pietrasanta, Alfred M., Resource Analysis of Computer Program System Development, 1970. See [Weinwurm]
Reynolds, Carl H., What's Wrong with Computer Programming Management? 1970. See [Weinwurm]
Richards, M., BCPL: A Tool for Compiler Writing and System Programming, Proc. AFIPS Conf., 35, 1969 SJCC
Royce, Winston W., Software Requirements Analysis, 1975. See [Horowitz]
Sackman, H.; Erikson, W. H.; Grant, E. E., Exploratory Experimental Studies Comparing Online and Offline Programming Performance, Communications of the ACM 11,1, January 1968
[Teitelman]
Teitelman, Warren, Interlisp Reference Manual, Xerox Palo Alto Research Center,
Vyssotsky, Victor, Large-scale Reliable Software: Recent Experience at Bell Labs, 1975. See [Parnas2]
[Weinberg]
Weinberg, Gerald M., 1971, The Psychology of Computer Programming, Van
[Weinwurm]
Weinwurm, George F. (Ed.), On the Management of Computer Programming, 1970
[Wirth1]
Wirth, Niklaus, Program Development by Stepwise Refinement, Communications of the ACM 14,4, April 1971
[Wirth2]
Wirth, Niklaus, The Programming Language PASCAL, Acta Informatica, Volume 1, pp. 35-63, 1971
Wijngaarden, A. van (Ed.); Mailloux, B. J.; Peck, J. E. L.; Koster, C. H. A., Report on the Algorithmic Language ALGOL 68, Numerische Mathematik, 14, 79-218, 1969
area specialization 22
capabilities of high level languages 66
check procedure 50
continuous process 7
cross meta-programming 68
debugging strategy 49
debugging tactics 49
dictionary 24
elaboration 32
engineering phases of software production 7
error 46
error indication 46
experiments group 77
feedback communications 29
global language 24
language creation 24
language learning 24
local language 24
large scale sharing 18
main experiments group 77
major qualifier 40
meta-program 29
meta-programmer 27
minor qualifier 40
operation 34
painted type 36
production phases of software production 7
proto-software 9
readable software 54
refining proto-software 9
sharing of software
state vector syntax checker
subtask specialization
task order
test bed
test print procedure
uncertainty absorption
units of production
unpainting
underlying type
user software
wheel network
writeable software