
Edgar H. Sibley, Panel Editor

By carefully tuning computer and compiler, it is possible to avoid the otherwise inevitable compromises between complex compiling algorithms and less-than-optimal compiled code, where the key to performance appears to lie neither in sophisticated nor in drastically reduced architectures, but in the key concepts of regularity and completeness.

MICROPROCESSOR ARCHITECTURES: A COMPARISON BASED ON CODE GENERATION BY COMPILER


NIKLAUS WIRTH

To a programmer using a high-level language, computer and compiler appear as a unit. More importantly, they must not only be regarded, but also designed, as a unit. However, many computers display a structure and an instruction set (an architecture) that mirror the metaphor of programming by assembling individual instructions. More recent designs feature characteristics that are oriented toward the use of high-level languages and automatic code generation by compilers. Comparing the suitability of different architectures is problematic because many variables are involved and even the criteria by which they are judged are controversial. Ultimately, it is the entire system's effectiveness in terms of speed and storage economy that counts. We chose two criteria for the comparison that we consider relevant: code density and compiler complexity, although they are not the only indicators of overall effectiveness.

Simplicity of compilation. A simple, compact compiler is not only faster, but also more reliable. It is made feasible by regularity of the instruction set, simplicity of instruction formats, and sparsity of special features.

Code density. Densely encoded information requires less memory space and fewer accesses for its interpretation. Density is increased by providing appropriate resources (e.g., fast address registers), suitable instructions and addressing modes, and an encoding that takes into account the instructions' relative frequency of occurrence.

© 1986 ACM 0001-0782/86/1000-0978

Communications of the ACM    October 1986    Volume 29, Number 10

Computing Practices

In this article, we make an attempt to measure and analyze the suitability of three different processors in terms of the above criteria. In general, three variables are involved, namely, the computer architecture, the compiler, and the programming language. If we fix the latter two, we have isolated the influence of the architecture, the item to be investigated. Accordingly, we shall involve a single language only, namely, Modula-2 [8]. Unfortunately, fixing the compiler variable is not as easy: Compilers for different processor architectures differ inherently. Nevertheless, a fair approximation to the ideal is obtained if we use as compilers descendants of the same ancestor, that is, variants differing in their code-generating modules only. To this end, we have designed compilers that use the same scanner, parser, symbol table, and symbol file generator, and, most importantly, that feature the same degree of sophistication in code optimization. It is reasonable to expect that a simple and regular architecture with a complete set of elementary operations, corresponding to those of the source language, will yield a straightforward compiling algorithm. However, the resulting code sequences may be less than optimally dense. The observation that certain quantities (such as frame addresses) occur frequently may motivate a designer to introduce special registers and addressing modes (implying references to these registers). Or the observation that certain short sequences of instructions (such as fetching, adding, and storing) occur frequently may spur the introduction of special instructions combining elementary operators. The evolution of the more complex architectures is driven primarily by the desire to obtain higher code density and thereby increased performance. The price is usually not only a more complex processor, but also a more complicated compiling algorithm that includes sophisticated searches for the applicability of any of the abbreviating instructions. Hence, the compiler becomes both larger and slower.

The microprocessor architectures chosen for this investigation are the Lilith [4, 7], National Semiconductor 32000 [3], and Motorola 68000 [2]. (To denote the latter two, we shall use the abbreviations NS and MC, respectively.) Lilith is a computer with a stack architecture designed specifically to suit a high-level language compiler (i.e., to obtain both a straightforward compiling algorithm and a high code density). Both the MC and in particular the NS are said to have been designed with the same goals, although they both feature considerably more complex instruction sets. All three processors are microcoded; that is, every instruction invokes a sequence of microinstructions. These sequences differ considerably in length, and therefore the execution time for the instructions also varies. (On the MC and NS, the microinstructions are stored in a ROM that is included in the processor chip.) Whereas for decades the future was seen to lie in the more baroque architectures, the pendulum now appears to be swinging back toward the opposite extreme.
The ideal machine is now said to have only a few, simple instructions [5], where the key distinction (e.g., for RISC architectures) is that execution time is the same for all instructions, namely, one basic cycle. Quite likely the optimal solution is to be found neither in extremely Spartan nor in lavishly baroque approaches. First we present an overview of the compared processor organizations, pointing out a few relevant differences, and then we consider the compiler and its strategy for code generation. By means of a few examples of language constructs, we illustrate the influence of the architecture on the complexity of the code-generation process, which is reflected in turn in the size of the compilers (source program length). Finally, we use the compilers themselves

as test cases to measure the overall density of generated code.

THE PROCESSOR ARCHITECTURES AND THEIR INSTRUCTION FORMATS

In this section, we compare briefly the essential and relevant features of the three architectures. For more detail, the reader is referred to specific descriptions of the individual processors. All three processors mirror a run-time organization tailored for high-level languages involving a stack of procedure activation records. Lilith and the NS feature three dedicated address registers for pointing to the frame of global variables, to the frame of the most recently activated procedure, and to the top of the stack. In the MC, three of the seven general-purpose address registers are dedicated to this purpose. For expression evaluation and storing intermediate results, Lilith features a so-called expression stack, a set of fast registers that are implicitly addressed by an up/down counter whose value is automatically adjusted when data are fetched or stored. The expression stack logically constitutes an extension of the stack of procedure activation records. Since it is empty at the end of the interpretation of each statement, the difficulties inherent in any scheme involving two levels of storage are minimized: The expression stack need be unloaded (from the registers) into the main stack (in memory) only when context is changed within a statement (i.e., only upon calling a function procedure). In contrast, the other processors offer a set of explicitly numbered data registers. The run-time organizations of the three processors used by the Modula-2 system are shown in Figure 1, on the next page. The processors' instruction formats are shown in Figures 2-4 (p. 981). Lilith and NS instructions form byte streams, whereas the MC instructions form a stream of 16-bit units. Lilith is a pure stack machine in the sense that load and store instructions have a single operand address, and actual operators have none, referring implicitly to the stack.
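The contrast between the two models can be made concrete with a small sketch (Python used here as illustrative pseudo-assembly; the mnemonics follow the tables later in the article, but the emitter functions themselves are our own construction, not the compilers' code):

```python
# Illustrative sketch: the assignment x := y + z emitted for a pure
# stack machine (Lilith-style) and for a two-operand register machine
# (NS/MC-style). Mnemonics follow the article's tables; the emitters
# are our own construction.

def emit_stack(dst, a, b):
    # Push both operands onto the expression stack, add, store.
    return [("LLW", a), ("LLW", b), ("ADD",), ("STW", dst)]

def emit_two_operand(dst, a, b):
    # Load one operand into a register, add the second, store back.
    return [("MOVW", a, "R0"), ("ADDW", b, "R0"), ("MOVW", "R0", dst)]

stack_code = emit_stack("x", "y", "z")
reg_code = emit_two_operand("x", "y", "z")

# The stack code uses one more instruction, but its operators carry no
# operand fields at all; every two-operand instruction carries two.
print(len(stack_code), len(reg_code))  # 4 3
```

This is precisely the trade-off measured later in the article: fewer but longer instructions for the NS and MC, more but very short ones for Lilith.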
FIGURE 1. Run-Time Organizations (stacks of parameter and variable frames, the current global frame, and the current code frame for Lilith, NS32000, and MC68000, with the dedicated registers G, L, S; SB, FP, SP, MOD; and the MC's address registers)

Instructions for the NS and MC mostly have two explicit operands. Their primary instruction word contains fields a1 and a2 indicating the addressing mode (and a register number) and frequently requires one or two extension fields containing the actual offset value (called displacement). In the case of indexed addressing modes, the extensions include an additional index byte specifying the register to be used as index. The so-called external addressing mode of Lilith and the NS deserves special mention. It is used to refer to objects declared in other, separately compiled modules. These objects are accessed indirectly via a table of linking addresses. The external addressing mode, when used properly, makes program linking as a separate operation superfluous, a definite advantage whose value cannot be overestimated. In the case of Lilith, the use of a single, global table of module references makes it necessary to modify the instructions upon loading; module numbers generated by the compiler must be mapped into those defined by the module table. The NS system eliminates the need for code modification by retaining a local link table for each module; the loader then merely generates this link table. Another architectural difference worth mentioning relates to the relative facilities for evaluating conditions. Lilith allows Boolean expressions to be treated in the same way as other expressions, where each relational operator is uniquely represented in the instruction set and leaves a Boolean result on top of the stack. In addition, there are conditional



FIGURE 2. Instruction Formats of Lilith (operator-only instructions, single-operand instructions, and the external addressing form)

jumps, corresponding to the AND and OR operators, which are suitable for the abbreviated evaluation of expressions: If the first operand has the value FALSE (TRUE), this value is left on the stack, and the processor skips evaluation of the second operand. By contrast, the NS and MC architectures offer a single comparison instruction, which leaves its result in a special condition code register. The distinction between the various relational operators is established by the use of different condition masks in a subsequent instruction that converts the condition code into a Boolean value. As a result, the compilation of Boolean expressions differs significantly from that of arithmetic expressions and is more complicated. The condition code register is an exceptional feature to be treated differently from all other registers. The primary characteristics discussed thus far are summarized in Tables I and II, on the next page.
THE CODE-GENERATION STRATEGY


The three compilers we are comparing not only have the same scanner, parser, table handler, and symbol file generator modules, they also share the same method for code generation. The parser uses the top-down, recursive descent method, which implies that each syntactic entity is represented by a procedure recognizing that entity. The procedure is then augmented with statements that generate code, and it has a result parameter that describes the recognized entity in the form of attribute values. The computation of both the code and the attribute values is context free in the following sense: Given a syntactic rule

    S0 = S1 S2 ... Sn,

the attribute values of S0 are determined by a function Fi whose arguments are the attribute values of the syntactic constituents S1, ..., Sn that are being reduced into S0:

    A(S0) = Fi(A(S1), A(S2), ..., A(Sn));

the corresponding code sequence C(S0) is determined by a function Gi:

    C(S0) = Gi(A(S1), A(S2), ..., A(Sn)).

FIGURE 3. Instruction and Displacement Formats of NS32000 (formats F0, conditional jumps; F1, procedure calls; F2, two operands with short immediate field; F3, single operand; F4, F6, F8, F11, double operand; each operand field may require additional displacement and/or index bytes; displacements occupy 1, 2, or 4 bytes, with -64 <= d < 64 in one byte and -8192 <= d < 8192 in two)
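As an illustration, the scheme can be sketched in a few lines (a Python stand-in for the compilers' Modula-2; the mode names conMd, dirMd, and stkMd follow Table III, while everything else is our own construction):

```python
# Sketch of attribute-synthesizing recursive descent. The parsing
# procedures return attribute records, constants are folded, and code
# is emitted only when deferment brings no further advantage.

code = []  # instructions emitted so far

def factor(tok):
    # A factor is a constant (conMd, attribute: its value) or a
    # variable (dirMd, attribute: its address, here just a name).
    return ("conMd", tok) if isinstance(tok, int) else ("dirMd", tok)

def load(item):
    # Force an item onto the (conceptual) expression stack.
    mode, attr = item
    code.append(("LIT", attr) if mode == "conMd" else ("LLW", attr))
    return ("stkMd", None)

def add(left, right):
    # F_i and G_i for the rule Expr = Expr "+" Factor: the result's
    # attributes depend only on the constituents' attributes, constants
    # are folded, and code is emitted only when folding is impossible.
    if left[0] == "conMd" and right[0] == "conMd":
        return ("conMd", left[1] + right[1])  # no code emitted
    load(left)
    load(right)
    code.append(("ADD",))
    return ("stkMd", None)

print(add(factor(2), factor(3)))  # ('conMd', 5)
print(add(factor("x"), factor(1)), code)
```

The constant case illustrates why the compiler's ability to evaluate expressions is indispensable: 2 + 3 never reaches the emitted code at all.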

FIGURE 4. Instruction Formats of MC68000 (each operand field may require additional displacement and/or index bytes)
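A brief aside on the NS displacement formats of Figure 3: the length of a displacement is encoded in the displacement value itself rather than in the opcode. A sketch of such an encoder (Python; the ranges are the article's, while the concrete leading-bit patterns and byte order are our assumptions about the NS scheme):

```python
# Sketch: an NS-style displacement whose length is carried in its own
# leading bits rather than in the opcode. The ranges follow the article
# (-64 <= d < 64 in one byte, -8192 <= d < 8192 in two); the concrete
# bit patterns (0..., 10..., 11...) and byte order are our assumptions.

def encode_displacement(d):
    if -64 <= d < 64:
        return bytes([d & 0x7F])              # 0ddddddd
    if -8192 <= d < 8192:
        v = 0x8000 | (d & 0x3FFF)             # 10dddddd dddddddd
        return bytes([v >> 8, v & 0xFF])
    v = 0xC0000000 | (d & 0x3FFFFFFF)         # 11dddddd plus 3 more bytes
    return v.to_bytes(4, "big")

# Small offsets, by far the most frequent, cost a single byte.
print([len(encode_displacement(d)) for d in (8, -64, 100, -8192, 100000)])
# [1, 1, 2, 2, 4]
```

The Analysis section returns to this point: with such a scheme, the frequent short displacements dominate, and the average displacement size stays near 1.3 bytes.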

TABLE I. Primary Characteristics

                              Lilith       NS32000              MC68000
Instruction lengths (bits)    8, 16, 24    8, 16, 24, 32, 40    16, 32, 48, 64, 80
Address lengths (bits)        4, 8, 16     8, 16, 32            16, 32
Addresses per instruction     0, 1         1, 2                 1, 2
External addressing           Yes          Yes                  No
Condition code                No           Yes                  Yes
Data registers                Stack (16)   R0-R7                D0-D7
Address registers             G, L, S      SB, FP, SP, MOD      A0-A6, SP

TABLE II. Data Addressing Modes

                    Lilith             NS32000                          MC68000
Register            T (stack)          R[n]                             D[n]
Address register                                                        A[n]
Register indirect                      M[R[n]]                          M[A[n]]
Autoincrement                          M[SP]; INC(SP)                   M[A[n]]; INC(A[n])
Autodecrement                          DEC(SP); M[SP]                   DEC(A[n]); M[A[n]]
Direct              M[G + d]           M[SB + d]                        M[A[n] + d]
                    M[L + d]           M[FP + d]
                    M[T + d]           M[SP + d]
Indirect                               M[M[SB + d1] + d2]
                                       M[M[FP + d1] + d2]
                                       M[M[SP + d1] + d2]
Indexed             M[T + T]           M[SB + d + R[x] x s]             M[A[n] + d + D[x]]
                                       M[FP + d + R[x] x s]             M[A[n] + d + A[x]]
Indirect indexed                       M[M[SB + d1] + d2 + R[x] x s]
                                       M[M[FP + d1] + d2 + R[x] x s]
External            M[M[F + d1] + d2]  M[M[M[MOD + 8] + d1] + d2]
Immediate           M[PC]              M[PC]                            M[PC]

Capital letters denote resources of the processor; small letters the parameters of the instruction. n, x are register numbers (0-7); d, d1, d2 are displacements (offsets). Autoincrement and -decrement modes are called stack mode on the NS and apply to the SP register only. s denotes a scale factor of 1, 2, 4, or 8. The MC's term for direct is register indirect with offset.

The conventional context dependence, due to the presence of declarations, becomes manifest through the introduction of a symbol table T and the addition of T to the parameter lists of some of the functions Fi and Gi. The attribute values A(S) will represent the value and type in the case of a constant, the type and address in the case of a variable, an offset in the case of a record field, and so on. When compiling an expression, for instance, we wish to determine whether the expression represents a constant or a variable, because, if an addition is compiled, the addition is performed directly if both terms are constants; otherwise, an add operator is emitted, and the attribute indicates that the result is placed on the stack. In order to allow (constant) expressions to occur in declarations, the compiler's ability to evaluate expressions is indispensable. In essence, we wish to distinguish between all modes of operands for which the eventual code might differ. Code is emitted whenever a further deferment of code release could bring no advantage. Table III displays the modes of item

TABLE III. Item Descriptor Modes and Their Attributes

Lilith                   NS32000                      MC68000
conMd   value            conMd    value               conMd    value
dirMd   adr              dirMd    adr                 dirMd    adr
indMd   adr              indMd    adr, offset         indAMd   adr, A
inxMd   offset           indRMd   R                   inxAMd   adr, DX
stkMd                    inxMd    adr, RX             stkMd
typMd   type             inxiMd   adr, offset, RX     AregMd   A
procMd  prm              inxRMd   R, offset, RX       DregMd   D
                         stkMd                        cocMd    cc, Tjmp, Fjmp
                         regMd    R                   typMd    type
                         cocMd    cc, Tjmp, Fjmp      procMd   prm
                         typMd    type
                         procMd   prm

Note: The value adr is actually a triple consisting of module number, level, and offset.


descriptors and their attributes as chosen for the three processors. The original modes are conMd, dirMd, indMd, typMd, and procMd: They are the modes given to the newly created constant factor, variable, var-parameter, type transfer function, or procedure call, respectively. The other modes emerge when appropriate constructs are recognized: For instance, an item is given inxMd (or inxiMd) when it is combined with an index expression to form an indexed designator. Or an item obtains indMd if a pointer variable (dirMd), followed by a dereferencing operator and a field identifier, has been parsed. In general, the more complicated modes originate from the reduction of composite object designators. Evidently, the set of available modes is determined largely by the addressing modes of the processor: the more addressing modes, the more attribute modes, the larger the state space of the items to be compiled, and the more complicated the transformation and code selection routines. Complex instruction sets and large varieties of addressing modes distinctly increase the complexity of a compiler. Tables IV-VI give examples of the code generated for several typical constructs: procedure parameters, indexed variables, and arithmetic expressions. The three columns display the code for Lilith, NS, and MC, respectively.

Procedure Parameters
Procedure parameters (Table IV, p. 984) are passed via the stack of activation records. The NS/MC processors deposit the parameters' values or addresses on top of the stack (allocated in memory) before control is transferred to the procedure. Since parameters are addressed relative to the local frame base, they are already in their proper place when the procedure is entered. In the Lilith computer, parameters are also put on the stack.
However, because the top of the stack is represented by fast registers (the expression stack) and because this stack is reused in the procedure for expression evaluation, the parameters have to be unstacked into the memory frame immediately after procedure entry. This complicates code generation somewhat, but generally also shortens the generated code, because the unstack operations occur in the procedure's code once, and not in each call. The fact that the NS/MC architectures include a move instruction compensates for this advantage of Lilith, because in the NS/MC machines the move instruction bypasses registers (which play a role corresponding to the Lilith expression stack).

Indexed Variables
Because indexed variables (Table V, p. 985) occur very frequently, the resulting code should be short.

All three processors therefore include special instructions for indexed address computation, including the validation of array bounds; that is, they check whether the index value lies within the bounds specified by the array variable's declaration. In the case of Lilith, the code differs when the low bound is zero. Although this may seem an insignificant peculiarity, it contributes to the effectiveness of the architecture because of the high frequency of occurrence of the zero low bound.

Arithmetic Expressions
To compute arithmetic expressions (Table VI, p. 986), the NS/MC compilers utilize the data registers in a manner similar to a stack. Since the compiler does not keep track of what was loaded into these registers, it is clear that the registers are not used in an optimal fashion; however, any further improvement increases the compiler's complexity considerably. Nonetheless, multiplications and divisions by integral powers of two are (easily) recognized and represented by shift instructions.

Boolean Expressions
Boolean expressions require special attention because, although they are specified by the same syntax as other expressions, their evaluation rules differ. In fact, the definition of the semantics of Boolean expressions is inconsistent with their syntax, at least if one adheres to the notion that a syntax must faithfully reflect the semantic structure. This anomaly is due to the fact that the syntax of expressions is defined regardless of type, even though arithmetic operators are left-associative, whereas logical operators are right-associative. For example, x + y + z is understood to be equivalent to (x + y) + z, and p & q & r is equivalent to p & (q & r). The logical connectives are defined in Modula in terms of conditional expressions, namely,

    p AND q = if p then q else false
    p OR q  = if p then true else q

Consequently, p OR q OR r = if p then true else (if q then true else r), which is obviously right-associative.
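These definitions can be checked mechanically (a small Python sketch of ours; the thunks stand for possibly unevaluated second operands, and the enumeration confirms both the truth tables and the right-associative reading):

```python
# Sketch: Modula's AND/OR as conditional expressions, checked against
# direct evaluation over all operand combinations. The thunks (lambdas)
# model the fact that the second operand may not be evaluated at all.

def AND(p, q):  # p AND q = if p then q else false
    return q() if p else False

def OR(p, q):   # p OR q = if p then true else q
    return True if p else q()

# Right-associative reading: p OR q OR r = if p then true else (q OR r).
for p in (False, True):
    for q in (False, True):
        for r in (False, True):
            assert OR(p, lambda: OR(q, lambda: r)) == (p or q or r)
            assert AND(p, lambda: AND(q, lambda: r)) == (p and q and r)
print("all combinations agree")
```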
The Boolean connectives are implemented not by logical operators, but by conditional jumps. And, since Boolean expressions occur most frequently as constituents of if and while statements, a further complication arises: An efficient implementation must unify conditional jumps within expressions with those occurring in statements, thus effectively breaching the syntactic structure of the language. By showing a


TABLE IV. Assignment and Procedures (code for x := y + z, x := 3, x := r^.f, a[i] := b[j], x := y < z, procedure entry and return, and a call with parameters, for Lilith, NS32000, and MC68000; e.g., x := y + z yields LLW y; LLW z; ADD; STW x on Lilith, and a three-instruction load/add/store sequence through a register on the NS and MC)

few simple examples, Figure 5 (p. 986) indicates the structural transformations implied by the generated code sequences for the NS machine. The resulting structures could not be expressed by a context-free syntax. For Lilith, no structural transformations are necessary thanks to the existence of the and-jump and

or-jump instructions, which either cause a jump or remove the Boolean value on top of the stack. Whereas the compilation of Boolean expressions is straightforward, the resulting code is, unfortunately, less than optimal (see Table VII, p. 986). The considerable complications caused by the NS/MC architectures for handling Boolean expressions are modestly reflected in the introduction of a new item mode (cocMd) meaning the item's value is represented by the condition code register. The item's attributes are the mask value cc appropriately transforming the register value into a Boolean value, and the two sequences of locations of branch instructions that require updating once their destination address is known. These sequences designate the branches taken when the Boolean result is TRUE or FALSE, respectively. In summary, we observe that, as expected, the NS/MC architectures lead to a smaller number of generated instructions compared to Lilith's pure stack architecture. However, the gain is made at the

cost of more complicated compiling algorithms, which can be seen in Table VIII (p. 987) and Figure 6 (p. 987), indicating the size of the compiler modules in terms of source and object code length. These results not only run counter to all intuitive expectations, but they are also highly disappointing with regard to the commercial microprocessors. Because of the complex instruction set, the hardware for the MC and NS microprocessors is considerably more intricate than that of Lilith, and its cost has been felt severely in terms of long development delays. Another consequence of complex instruction sets is the need for more sophisticated code generators to fully tap the power of the instruction set.
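The cocMd bookkeeping described above (a condition mask plus TRUE- and FALSE-branch fixup lists) can be illustrated as follows (a Python sketch of ours; the instruction names and the negated-branch convention are illustrative, not the NS/MC opcodes):

```python
# Sketch: condition-code items carrying true/false fixup lists, in the
# spirit of the cocMd descriptor. Branch targets are backpatched once
# their destination becomes known.

code = []  # list of [opcode, operand]; operand None = not yet resolved

def compare(x, y, cc):
    # Emit a comparison; the item records the condition mask and two
    # (initially empty) lists of branch locations to fix up later.
    code.append(["CMPW", (x, y)])
    return {"mode": "cocMd", "cc": cc, "Tjmp": [], "Fjmp": []}

def jump_if_false(item):
    # Branch past the THEN part when the condition fails; remember
    # the location so the destination can be patched in later.
    code.append(["B" + item["cc"] + "_NOT", None])
    item["Fjmp"].append(len(code) - 1)

def fixup(locations, target):
    for loc in locations:
        code[loc][1] = target

# IF x < y THEN S END
item = compare("x", "y", "LT")
jump_if_false(item)
code.append(["S", None])         # placeholder for the THEN part
fixup(item["Fjmp"], len(code))   # false-branch jumps to just past S
print(code)  # [['CMPW', ('x', 'y')], ['BLT_NOT', 3], ['S', None]]
```

On Lilith the same statement needs none of this machinery, since each relational operator directly leaves a Boolean value on the expression stack.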

TABLE V. Indexed Variables (code for u := a[9], u := a[i], u := b[i] with a nonzero low bound, u := c[9, 9], and u := c[i, j], for the three processors; array bounds are validated by CHKZ on Lilith, CHECKW/INDEXW on the NS, and CHK on the MC)


Computirlg Pracfices

TABLE VI. Arithmetic Expression (code for an expression involving addition, multiplication by a small constant, multiplication by a power of two, and division, for the three processors; the power-of-two cases appear as SHL/SHR shifts on Lilith and as ASLW/ASHW shifts on the MC)

TABLE VII. Boolean Expressions Compiled for the Lilith (code for nested comparisons connected by AND and OR; Lilith combines the relational operators LSS/LEQ with the and-jump/or-jump instructions AJP/ORJ and the conditional jump JPC, whereas the MC needs a CMPW and a masked conditional branch BLT/BGT/BLS for each relation, chained through labels L1-L4)

FIGURE 5. Boolean Connectives Represented by Conditional Jumps for the NS and MC Architectures (jump structures for IF p & q THEN S0 ELSE S1 END, IF p OR q THEN S0 ELSE S1 END, IF (p & q) OR (r & s) THEN S0 ELSE S1 END, and IF (p OR q) & (r OR s) THEN S0 ELSE S1 END)


TABLE VIII. Size of Compiler Modules (source lines: scanner 410, parser 1,300, table handler 270, symbol file generator 530, and code generators of 1,490 for Lilith, 2,050 for NS, and 3,780 for MC; further columns give the object code size of each module on the three processors and the size ratios relative to Lilith)

FIGURE 6. Overall Size of Compilers (bar chart of the compiler modules' source code in lines, up to 3,500, and object code in bytes, up to 80K, for Lilith, NS, and MC)

As a result, the compiler program is 14 percent longer for NS, and 56 percent longer for MC, than for Lilith. If we consider the code-generator parts only, the respective figures are 37 percent and 154 percent. But the most disappointing thing of all is that the reward for all these efforts and expenses appears negative: For the same programs, the compiled code is about 50 percent longer for NS, and 130 percent longer for MC, than for Lilith. The cumulative effect of having a more complicated compiling algorithm applied to a less-effective architecture results in a compiler for the NS that is 1.8 times more voluminous than that for Lilith, whereas the compiler for the MC is 3.3 times as long. Quite obviously, the value of a megabyte of memory strongly depends on the computer in which it is installed.
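The cumulative effect admits a rough consistency check (the percentages are the article's own; the simple multiplication, and hence the slight mismatch with the measured factors of 1.8 and 3.3, is ours):

```python
# Rough consistency check, using the article's own percentages: a
# compiler whose source is p times as long and whose generated code is
# q times as long per construct should come out roughly p * q times as
# large in object form. (The multiplication is ours; it only
# approximates the measured factors of 1.8 and 3.3.)

source_ratio = {"NS": 1.14, "MC": 1.56}   # compiler source vs. Lilith
density_ratio = {"NS": 1.5, "MC": 2.3}    # generated code length ratio

for cpu in ("NS", "MC"):
    print(cpu, round(source_ratio[cpu] * density_ratio[cpu], 2))
# NS 1.71
# MC 3.59
```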
ANALYSIS

Naturally, one wonders where the architects may have miscalculated. Measurements shed some light

on this question, but there is no single contributing factor to the poor result, and there is no single, simple answer. In Figure 7, on the next page, we give the relative frequencies of occurrence of various instruction lengths and types for the three microprocessors. As objects of this investigation, we use the modules of the compilers themselves. Admittedly, this introduces some bias, for example, against long operands (real numbers), but other measurements have largely confirmed these results. For Lilith, the average length of an instruction is 1.52 bytes, where 16 percent of all instructions are operators without explicit operand fields that refer implicitly to operands on the expression stack; 50 percent have a single operand field 4 bits long; and 17 percent require a 1-byte and 10 percent a 2-byte operand field. The 4-bit field is packed together with the operator field into a single byte. This facility with short operand fields, an idea that stems from


FIGURE 7. Frequency of Instruction Lengths and Types (for Lilith: operator-only bytes and 4-bit, 8-bit, and 16-bit operand fields in instructions of 1 to 3 bytes; for the NS and MC: instructions classified by format, F0 jump, F1 call, F2 two operands with short immediate field, F4 two operands, F6 single operand, with base lengths of 1 to 3 bytes)

the Mesa instruction set of Xerox's D-machines [1, 6], holds the major key to Lilith's high code density. In Figure 7, the relative frequencies of instructions generated for the NS and MC architectures are classified according to their formats and base lengths. These 1, 2, or 3 bytes are usually followed by further bytes containing the addresses, operands, and indexing information. In rare cases, a single instruction may consist of a dozen bytes or even more. The average total instruction length is about 3.6 bytes for the NS versus 1.5 bytes for Lilith, and almost 6 for the MC (3.5 not counting the displacements). The number of generated instructions, however, is only 1.6 times higher for Lilith. The NS and MC architectures feature a particularly rich set of data addressing modes that are designed to reduce the number of instructions and to increase code density. The relative frequencies of usage of these modes are tabulated in Figure 8. Fourteen percent of the references are to registers directly. This percentage corresponds roughly to the implicit stack references of Lilith. In the case of NS

and MC, the stack mode is used exclusively for placing procedure parameters into the stack of activation records and therefore has no relationship to Lilith's stack usage. The frequency of stack references is nevertheless surprisingly high (over 20 percent). A noteworthy but not surprising result of this rich set of addressing modes is that local objects are accessed considerably more frequently (via FP) than global ones (via SB). Surprisingly frequent are indirect accesses (20 percent), which use two displacements, a reflection of the preponderance of access to record fields via pointers. This addressing mode is present in the NS, but not in the MC, architecture. Looking at constants that are represented as immediate mode data placed in the instruction stream immediately following the instruction, one recognizes the predominance of 16-bit operands. In light of the data size distribution measured for Lilith, one realizes that a major flaw of the NS/MC designs lies in their requirement that the length of an immediate operand be exactly as defined by the operator; no

FIGURE 8. Distribution of Addressing Modes and Displacement Sizes for the NS and MC Architectures (NS modes: register, register indirect, direct FP-based, direct SB-based, indirect FP-based, immediate byte, immediate word, external, stack; MC modes: D direct, A direct, A indirect, A indirect with displacement, immediate, A indirect with increment/decrement; plus displacement sizes in bytes)


automatic lengthening (with either zero or sign extension) is provided, as is the case with addresses (displacements). This brings us to a final investigation of the frequencies of the various displacement sizes (also shown in Figure 8). The NS architecture provides sizes of 1, 2, or 4 bytes. The length is not dictated by the operator code, but instead is encoded in the displacement value itself, a solution that is equally desirable from the point of view of code generation. As expected, the l-byte displacements dominate strongly. The average displacement size is 1.32 bytes. Particularly noteworthy is the fact that, for the MC, 94 percent of all displacement values could be placed into a single byte instead of a Is-bit word. CONCLUSIONS The NS and MC architectures have been compared with the Lilith architecture, a prototype of a regular, stack-oriented design. The increased complexity of the NS and MC resources, instruction sets, and addressing modes not only fails to lead to a simpler compiler, but actually requires a more complicated one. Regrettably, it also results in longer and often less-efficient code. On average, code for the NS is about 50 percent longer, and code for the MC 130 percent longer, than that for Lilith. Among the commercial products, this puts NS far ahead of MC. Although these two microprocessors are the best architectures widely available, the results of this investigation suggest that they also leave room for considerabie improvement. Between the two, the NS yields markedly better results, particularly when judged by the compiler designer. In the authors opinion, both designs could have avoided some serious miscalculations if their compilers (for some high-level language) had been implemented before the designs of the processors were fixed. The analysis presented here reveals the two main pinpointable causes of the low code density to be 1. the lack of short (less than 1 byte) address or operand fields, and 2. 
the use of explicitly addressed registers for intermediate results.

Nonetheless, the principal underlying syndrome is a misguided belief in complexity as a way to achieve better performance. Both the NS and MC architectures feature complicated instruction sets and addressing modes. Obviously, these architectures are compromises in an attempt to satisfy many requirements, but they are also products of an unbounded belief in the possibilities of VLSI. However, not everything that can be done should be done. This criticism of overly complex architectures would seem to favor the development of architectures featuring a simple structure, a small set of simple instructions, and only a few basic addressing modes, designs that have become known as RISC architectures [5]. However, one should be cautious not to rush from one extreme to the other. In fact, some recent RISC schemes propose facilities, such as a register bank effectively implying a two-level store, that require complicated code-generation algorithms to achieve optimal performance. Once again, the designers are primarily, if not exclusively, concerned with speed. But there is no reason why features could not be added to a design to cater to specific, genuine problems posed by the implementation of high-level languages. Under no circumstances, however, should such additions involve a complicated mechanism or infringe on the regular structure of the existing scheme.

Regularity of design emerges as the key. Features must solve problems, not create them. In order to promise genuine progress, the acronym RISC should stand for regular (not reduced) instruction set computer. Regularity alone, however, is not sufficient. It must be accompanied by completeness: the instruction set must closely mirror the complete set of basic operators available in the language. In this respect, the NS architecture represents a significant improvement over earlier products, whereas the MC design gives rise to innumerable grievances and is poorly suited for effective compiler design. Irregularity is the chief culprit for the complexity of the MC code generator, which is more than twice as long (in terms of source code lines) as Lilith's. (In the case of the Intel 8086, the factor lies between 3 and 4.) This observation is particularly relevant in view of the widespread acceptance of these architectures and the likelihood of their becoming de facto standards.
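The NS displacement scheme praised above, in which the length is carried in the displacement value itself rather than in the operation code, can be illustrated with a small sketch. The field widths below follow the published Series 32000 convention as the author describes it (a leading 0 bit for a 7-bit value, the bits 10 for a 14-bit value, 11 for a 30-bit value); the function names are hypothetical, chosen only for this illustration:

```python
def encode_displacement(value: int) -> bytes:
    """Encode a signed displacement in 1, 2, or 4 bytes; the top bits of
    the first byte indicate the length:
        0xxxxxxx                    7-bit signed value,  1 byte
        10xxxxxx xxxxxxxx           14-bit signed value, 2 bytes
        11xxxxxx (3 more bytes)     30-bit signed value, 4 bytes
    """
    if -64 <= value < 64:                    # fits in 7 bits
        return bytes([value & 0x7F])
    if -8192 <= value < 8192:                # fits in 14 bits
        return (0x8000 | (value & 0x3FFF)).to_bytes(2, "big")
    if -(1 << 29) <= value < (1 << 29):      # fits in 30 bits
        return (0xC0000000 | (value & 0x3FFFFFFF)).to_bytes(4, "big")
    raise ValueError("displacement out of range")

def decode_displacement(buf: bytes) -> int:
    """Recover the signed value; the length is read from the data itself."""
    first = buf[0]
    if first < 0x80:                         # 1-byte form
        v = first & 0x7F
        return v - 0x80 if v & 0x40 else v
    if first < 0xC0:                         # 2-byte form
        v = int.from_bytes(buf[:2], "big") & 0x3FFF
        return v - 0x4000 if v & 0x2000 else v
    v = int.from_bytes(buf[:4], "big") & 0x3FFFFFFF
    return v - 0x40000000 if v & 0x20000000 else v
```

Because no choice of operation code depends on the displacement size, the code generator can simply emit the shortest encoding that fits each value, which is why the 1-byte form dominates in the measurements reported above.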
Recognizing that regularity and completeness have been pivotal concepts in mathematics for centuries, it is high time that they be taken into account by engineers designing what are in effect mathematical machines.

Although this analysis may have given the impression of undue concern for code density and code efficiency, there is actually a much more profound reason to strive for regularity of design than efficiency, and that is reliability. Reliability unquestionably suffers whenever unnecessary complexity creeps in. Reliability grows at least proportionally to the regularity of a device's specification, let alone its implementation, a law that applies equally well to hardware and software. Reliability (and not convenience of programming) was also the primary motivation behind the development of high-level, structured languages, which are
The size of the code generator for the Intel 8086 is 4,880 lines (cf. Figure 6), and the object code size for the Intel 8086 compiler is 149,000 bytes.

October 1986

Volume 29 Number 10

Communications of the ACM

989


supposed to provide suitable abstractions for formulating data definitions and algorithms in forms amenable to precise, mathematical reasoning. But these abstractions are useless unless they are properly supported by a correct, watertight implementation. This postulate implies that all violations of the axioms governing an abstraction must be detected and reported; moreover, that checks against violations must be performed by the compiler whenever possible, and otherwise by additional instructions interpreted at run time. It is therefore a primary characteristic of architectures designed for high-level languages that they support these abstractions by suitable facilities and efficient instructions in order to make the overhead minimal. Consistent support for such checking is perhaps the most commendable characteristic of Lilith. The following violations lead to immediate termination of a computation:
- access to an array with an invalid index;
- access to a variable via a pointer variable with value NIL;
- overflow in integer, cardinal, and real-number arithmetic;
- selection of an invalid case in a case statement;
- lack of data space on procedure call (stack overflow).

All these violations except the first are detected without the need for additional instructions. The checks are built into the address computation, arithmetic, case selection, and procedure entry instructions. Index values are validated by additional instructions inserted before the address calculation. Above anything else, it is these features that have characterized Lilith as high-level language oriented. During five years of intensive use, they have proved not only valuable, but also indispensable, and have made possible a truly effective environment for program development. Guaranteeing the validity of a language's abstractions is not a luxury; it is a necessity. It is as vital to inspiring confidence in a system as the correctness of its arithmetic and its memory access. However, a processor must be designed in such a manner that the overhead caused by the guards is unnoticeable.

By these standards, both the NS and MC architectures can be called halfheartedly high-level language oriented, even though they both represent a tremendous improvement over all earlier commercial processor designs. Both processors feature convenient index bound checks, but unfortunately, tests for invalid access via NIL values, or for stack overflow, are available only at the cost of cumbersome instruction sequences to which most programmers, too confident in their art, are unwilling to submit. Even tests for arithmetic overflow require additional instructions. It is incomprehensible that instructions specifically designed for reserving space for local variables upon procedure entry can be designed without the inclusion of a limit check.

Acknowledgments. The author gratefully acknowledges the valuable contributions by W. Heiz and H. Seiler, who ported the compiler to the MC 68000, designed the new code generator, and provided the data concerning that architecture.
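The run-time guards enumerated in this section can be sketched as follows. This is not Lilith code: on Lilith the pointer, overflow, case, and stack checks are folded into the corresponding instructions, whereas the sketch models them as the explicit helper calls (with hypothetical names) that a compiler for a less supportive architecture would have to insert:

```python
class Trap(Exception):
    """Models immediate termination of the computation on a violation."""

NIL = None  # models the Modula-2 NIL pointer value

def check_index(i, low, high):
    # Inserted by the compiler before the address computation of a[i].
    if not (low <= i <= high):
        raise Trap("access to an array with an invalid index")
    return i

def check_pointer(p):
    # Performed on every dereference p^.
    if p is NIL:
        raise Trap("access via a pointer variable with value NIL")
    return p

def check_add(x, y, min_int=-2**15, max_int=2**15 - 1):
    # Overflow test for (here) 16-bit integer addition.
    s = x + y
    if not (min_int <= s <= max_int):
        raise Trap("overflow in integer arithmetic")
    return s
```

A source statement such as `a[i] := p^.f + 1` would then compile to the equivalent of `a[check_index(i, 0, high)] = check_add(check_pointer(p).f, 1)`; the article's point is that only the index check should cost extra instructions.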

REFERENCES
1. Johnsson, R.K., and Wick, J.D. An overview of the Mesa processor architecture. In Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Mar. 1982). (Also published in SIGARCH Comput. Archit. News 10, 2, and in SIGPLAN Not. 17, 4.)
2. Motorola Corp. MC68020 32-Bit Microprocessor User's Manual. Prentice-Hall, Englewood Cliffs, N.J., 1984.
3. National Semiconductor Corp. Series 32000 Instruction Set Reference Manual. National Semiconductor Corporation, 1984.
4. Ohran, R.S. Lilith and Modula-2. BYTE 9, 8 (Aug. 1984), 181-192. A description of the structure of the Lilith computer, and its orientation toward Modula-2.
5. Patterson, D.A. Reduced instruction set computers. Commun. ACM 28, 1 (Jan. 1985), 8-21. A thorough presentation of the concept of the RISC.
6. Sweet, R.E., and Sandman, J.G. Empirical analysis of the Mesa instruction set. In Proceedings of the Symposium on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Mar. 1982). (Also published in SIGARCH Comput. Archit. News 10, 2, and in SIGPLAN Not. 17, 4.)
7. Wirth, N. The personal computer Lilith. In Proceedings of the 5th International Conference on Software Engineering (San Diego, Calif., Mar. 1981). IEEE Computer Society Press, 1981. A presentation of the combined hardware/software design of the workstation Lilith.
8. Wirth, N. Programming in Modula-2. Springer-Verlag, New York, 1982. An introduction to the use of Modula-2. Includes the defining report.

CR Categories and Subject Descriptors: C.0 [Computer Systems Organization]: General - hardware/software interfaces, instruction set design; D.3.4 [Programming Languages]: Processors - code generation, compilers, optimization
General Terms: Design, Performance
Additional Key Words and Phrases: code density, compiler complexity, design regularity and completeness, high-level language orientation of processor architecture, Lilith, MC68000, Modula-2, NS32000

Received 1/86; accepted 4/86

Author's Present Address: Niklaus Wirth, Institut für Informatik, ETH, CH-8092 Zürich, Switzerland.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

