2. ALU
A block diagram of the floating-point ALU is
shown in Figure 1. It is a two-stage pipelined machine.
In the first stage, the exponent of the larger operand is
selected as the common exponent, and the fraction of the
operand with the smaller exponent is shifted to the right
by the alignment shifter. In the second stage,
addition/subtraction of the fraction of the operand with
the larger exponent and the right-shifted fraction, as well
as normalization, IEEE rounding, and correction of the
common exponent, are performed.
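As a rough software illustration of the two stages described above, the following Python sketch aligns and adds two positive, normal double-precision operands. It is a minimal model under stated assumptions, not the circuit of Figure 1: the helper names (unpack, pack, align, add_normalize_round, fp_add), the three extra guard/round/sticky bits, and the restriction to effective addition without overflow or special values are all choices made for this example.

```python
import struct

FRAC_BITS = 52   # stored fraction bits of an IEEE 754 double
GRS = 3          # assumed guard, round and sticky bits kept below the LSB

def unpack(x):
    """Split a positive normal double into (biased exponent, 53-bit significand)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    e = (bits >> FRAC_BITS) & 0x7FF
    f = (bits & ((1 << FRAC_BITS) - 1)) | (1 << FRAC_BITS)  # restore hidden bit
    return e, f

def pack(e, f):
    """Rebuild a positive double from a biased exponent and 53-bit significand."""
    bits = (e << FRAC_BITS) | (f & ((1 << FRAC_BITS) - 1))
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def align(e1, f1, e2, f2):
    """Stage 1: select the larger exponent and right-shift the other fraction."""
    if e1 < e2:                                   # swap so operand 1 is the larger
        e1, f1, e2, f2 = e2, f2, e1, f1
    ediff = e1 - e2
    big, small = f1 << GRS, f2 << GRS
    shifted = small >> ediff
    sticky = 1 if small & ((1 << ediff) - 1) else 0   # OR of the bits shifted out
    return e1, big, shifted | sticky

def add_normalize_round(e, big, small):
    """Stage 2: add, normalize, round to nearest even, correct the exponent."""
    total = big + small
    if total >> (FRAC_BITS + 1 + GRS):            # carry out: one-bit right shift
        total = (total >> 1) | (total & 1)        # fold the lost bit into sticky
        e += 1
    sig, rest = total >> GRS, total & ((1 << GRS) - 1)
    half = 1 << (GRS - 1)
    if rest > half or (rest == half and sig & 1):
        sig += 1
        if sig >> (FRAC_BITS + 1):                # rounding carried into a new bit
            sig >>= 1
            e += 1
    return e, sig

def fp_add(x, y):
    """Add two positive normal doubles through the two modelled stages."""
    e1, f1 = unpack(x)
    e2, f2 = unpack(y)
    return pack(*add_normalize_round(*align(e1, f1, e2, f2)))
```

For inputs inside the stated restrictions, fp_add(x, y) is intended to agree with the host's own double-precision x + y, which makes the sketch easy to check against ordinary floating-point addition.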
Three arithmetic techniques are used in the ALU.
The first is one-bit pre-shifting of both fractions in
effective addition cases. This technique is useful for
making the rounding process easier. The second is
normalization with the anticipated leading '1' bit of
addition/subtraction results. This normalization process
is fast even if the anticipated bit is wrong, because the
incorrectly shifted fraction can be adjusted by a simple
one-bit right shift. The third technique is
pre-rounding, which prepares all possible rounded
results in parallel with addition/subtraction of aligned
fractions and selects the correct one with the leading '1'
bit of the add/subtract result. By using this technique,
the rounding process is accelerated by 51%.
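The pre-rounding idea can be sketched in software as follows. This is only an analogy under stated assumptions, not the paper's circuit: it covers effective addition only, the function names round_serial and round_preselected are hypothetical, and the three candidate sums are computed one after another here, whereas hardware would produce them in parallel. The inputs are aligned significands carrying three guard/round/sticky bits, and the larger operand is assumed to have zeros in those low bits.

```python
P = 53   # significand bits including the hidden bit (IEEE double precision)
G = 3    # assumed guard, round and sticky bits below the LSB

def round_serial(big, small):
    """Baseline: add, then normalize, then round to nearest even."""
    total, e_inc = big + small, 0
    if total >> (P + G):                       # carry out of the top bit
        total = (total >> 1) | (total & 1)     # fold the lost bit into sticky
        e_inc = 1
    sig, rest = total >> G, total & ((1 << G) - 1)
    if rest > 4 or (rest == 4 and sig & 1):
        sig += 1
    if sig >> P:                               # rounding carried into a new bit
        sig, e_inc = sig >> 1, e_inc + 1
    return sig, e_inc

def round_preselected(big, small):
    """Prepare candidate results first, then select by the leading-1 position."""
    assert big & ((1 << G) - 1) == 0           # assumption: big is the unshifted operand
    upper = (big >> G) + (small >> G)
    candidates = (upper, upper + 1, upper + 2) # all possible rounded sums
    grs = small & ((1 << G) - 1)               # only small contributes low bits
    if upper >> P:                             # leading 1 is one position higher
        guard, sticky, lsb = upper & 1, grs != 0, (upper >> 1) & 1
        round_up = guard and (sticky or lsb)
        sig, e_inc = candidates[2 if round_up else 0] >> 1, 1
    else:                                      # leading 1 in the expected position
        round_up = grs > 4 or (grs == 4 and upper & 1)
        sig, e_inc = candidates[1 if round_up else 0], 0
    if sig >> P:                               # selected candidate overflowed
        sig, e_inc = sig >> 1, e_inc + 1
    return sig, e_inc
```

Both functions return the rounded 53-bit significand and the exponent correction. The second one never performs an increment after the position of the leading '1' is known; it only chooses among values that were already prepared, which is the property the text attributes to pre-rounding.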
1. Introduction
Scientific and engineering applications demand
exceptionally high floating-point performance, which in
turn requires high-speed floating-point ALUs and
multipliers to reduce execution time. In recent years a
number of high-speed floating-point execution units
have been presented [1] - [6].
A floating-point ALU and multiplier were designed,
each capable of 13.3 ns execution. The ALU
and multiplier can each individually produce a result at a
one-cycle pipelined pitch, achieving a peak execution
rate of 300 MFLOPS at 150 MHz. The units are in full
compliance with the IEEE Standard for Binary
Floating-Point Arithmetic (Std. 754-1985) [7].
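The figures quoted above are mutually consistent, as the short calculation below shows. It assumes, as an interpretation of the text rather than a statement from it, that the 13.3 ns execution time corresponds to two clock cycles and that the peak rate counts one result per cycle from each of the two units. The constant names are chosen for this example only.

```python
CLOCK_HZ = 150e6       # clock frequency stated in the text
LATENCY_CYCLES = 2     # assumption: two pipeline stages, one cycle each
UNITS = 2              # ALU and multiplier, one result per cycle each

cycle_ns = 1e9 / CLOCK_HZ                 # length of one clock cycle
latency_ns = LATENCY_CYCLES * cycle_ns    # execution time of one operation
peak_mflops = UNITS * CLOCK_HZ / 1e6      # both units issuing every cycle

print(f"cycle   = {cycle_ns:.2f} ns")
print(f"latency = {latency_ns:.1f} ns")
print(f"peak    = {peak_mflops:.0f} MFLOPS")
```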
The ALU performs add, subtract, compare, convert
to smaller/larger floating-point precision value, and
convert floating to/from integer instructions for both
double and single precision operands. The ALU can
produce a denormalized number without requiring an
additional cycle.
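For readers unfamiliar with denormalized results, the following sketch shows a subtraction of two normal double-precision operands whose exact result falls below the normal range. It only illustrates what such a result looks like in IEEE 754 double precision on the host machine and says nothing about how the ALU produces it; the helper name exponent_field is hypothetical.

```python
import math
import struct
import sys

def exponent_field(x):
    """Return the 11-bit biased exponent field of an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF

smallest_normal = sys.float_info.min      # 2**-1022
a = 1.5 * smallest_normal                 # a normal operand
b = smallest_normal                       # another normal operand
d = a - b                                 # exact difference is 2**-1023

# A denormalized number has an all-zero exponent field and a non-zero fraction.
is_denormal = exponent_field(d) == 0 and d != 0.0
```

Here both inputs have a non-zero exponent field, while the difference d equals math.ldexp(1.0, -1023) and is stored with an exponent field of zero.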
The multiplier performs floating-point
multiplication for both double and single precision
[Figure 1: Block diagram of the floating-point ALU. Recovered labels: SWAP, shift number (Ediff), SELECT, ST, ET, FT(1:51), FT(52).]
2.1 One-bit pre-shifting
2.3 Pre-rounding
Figure 2 shows the pre-rounding scheme. The
pre-rounding process of the ALU calculations consists
of four steps.
[Figure 2: Pre-rounding scheme. Legend: L: least significant bit, R: round bit, S: sticky bit. Recovered labels: ET, FT(1:51), FT(52), rounding carry.]
3. Multiplier
3.2 Pre-rounding
4. Design methodology
4.1 Circuit
4.2 Performance
4.3 Floating-point unit
[Figure residue. Recovered labels: "(2) With arithmetic techniques", "(3) With circuit technique", "round", "add/su".]
[Figure residue. Recovered labels: "Noise-tolerant", CK, IN2, IN3, OUT.]
5. Conclusion
One-bit pre-shifting before the alignment shift,
normalization with the anticipated leading '1' bit, and
pre-rounding techniques have been developed for a
floating-point ALU. Carry select addition and
Acknowledgements
The authors would like to thank A. Anzai, M.
Hashimoto, R. Yamagata, T. Kumagai, E. Kamada, T.
Nakano, K. Kaneko, N. Ido, Y. Kiyoshige, S. Muto, S.
Tanaka, K. Shimamura, K. Matsuo, T. Shimizu, and S.
Nakahara of Hitachi Ltd. for their technical support,
discussions, and guidance.
References
[1] R. K. Montoye et al., "Design of the IBM RISC
System/6000 Floating-Point Execution Unit," IBM J. Res.
Develop., Vol. 34, No. 1, pp. 59-70, January 1990.
[2] J. Yetter, "A 100-MHz Superscalar PA-RISC
CPU/Coprocessor Chip," Digest of Technical Papers,
Symp. VLSI Circuits, pp. 12-13, 1992.
[3] D. W. Dobberpuhl et al., "A 200-MHz 64-b Dual-Issue
CMOS Microprocessor," IEEE J. Solid-State Circuits, Vol.
27, No. 11, pp. 1555-1557, November 1992.
[4] L. Gwennap, "Digital Leads the Pack with 21164,"
Microprocessor Report, Vol. 8, No. 12, pp. 6-10,
September 1994.
[5] L. Gwennap, "MIPS R10000 Uses Decoupled
Architecture," Microprocessor Report, Vol. 8, No. 14, pp.
18-22, October 1994.
[6] L. Gwennap, "PA-8000 Combines Complexity and
Speed," Microprocessor Report, Vol. 8, No. 15, pp. 6-9,
November 1994.
[7] IEEE Standard for Binary Floating-Point Arithmetic,
ANSI/IEEE Standard No. 754, 1988.
[8] C. S. Wallace, "A Suggestion for a Fast Multiplier,"
Trans. IEEE Electronic Computers, Vol. EC-13, pp. 14-17,
February 1964.
[9] F. Murabayashi et al., "2.5V NOVEL CMOS CIRCUIT
TECHNIQUES FOR A 150MHz SUPERSCALAR RISC
PROCESSOR," to be published in ESSCIRC'95, September
1995.
[Figure residue: floating-point unit block diagram. Recovered labels: Register File, Multiplier, ALU, Div/Sqrt.]
[Table residue; row and column labels were lost. The values read as cycles/ns pairs:]
2/13.3 2/13.3 18/120.0 31/206.7
1/6.7 1/6.7 17/113.3 30/200.0