Behrooz Parhami
Department of Electrical and Computer Engineering, University of California, Santa Barbara
New York
Oxford
2000
Parhami, Behrooz. Computer arithmetic: algorithms and hardware designs / Behrooz Parhami. p. cm. Includes bibliographical references and index. ISBN 0-19-512583-5 (cloth) 1. Computer arithmetic. 2. Computer algorithms. I. Title. QA76.9.C62P37 1999 98-44899 004'.01'513-dc21 CIP
Printing (last digit): 9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
To the memory of my father, Salem Parhami (1922-1992), and to all others on whom I can count for added inspiration, multiplied joy, and divided anguish.
CONTENTS

Preface xv

1 NUMBERS AND ARITHMETIC 3
  1.1 What Is Computer Arithmetic? 3
  1.2 A Motivating Example 5
  1.3 Numbers and Their Encodings 6
  1.4 Fixed-Radix Positional Number Systems 11
  1.5 Number Radix Conversion
  1.6 Classes of Number Representations 14
  Problems 15
  References 18

2 REPRESENTING SIGNED NUMBERS 19
  2.1 Signed-Magnitude Representation 19
  2.2 Biased Representations 21
  2.3 Complement Representations 22
  2.4 Two's- and 1's-Complement Numbers 24
  2.5 Direct and Indirect Signed Arithmetic 27
  2.6 Using Signed Positions or Signed Digits 28
  Problems 31
  References 33

3 REDUNDANT NUMBER SYSTEMS 35
  3.1 Coping with the Carry Problem 35
  3.2 Redundancy in Computer Arithmetic 37
  3.3 Digit Sets and Digit-Set Conversions 39
  3.4 Generalized Signed-Digit Numbers 41
  3.5 Carry-Free Addition Algorithms 43
  3.6 Conversions and Support Functions 48
  Problems 50
  References 52

4 RESIDUE NUMBER SYSTEMS 54
  4.1 RNS Representation and Arithmetic 54
  4.2 Choosing the RNS Moduli 57
  4.3 Encoding and Decoding of Numbers 60
  4.4 Difficult RNS Arithmetic Operations 64
  4.5 Redundant RNS Representations 66
  4.6 Limits of Fast Arithmetic in RNS 67
  Problems 70
  References 72
PART II ADDITION/SUBTRACTION

5 BASIC ADDITION AND COUNTING 75
  5.1 Bit-Serial and Ripple-Carry Adders 75
  5.2 Conditions and Exceptions 78
  5.3 Analysis of Carry Propagation 80
  5.4 Carry Completion Detection 82
  5.5 Addition of a Constant: Counters 83
  5.6 Manchester Carry Chains and Adders 85
  Problems 87
  References 90

6 CARRY-LOOKAHEAD ADDERS 91
  6.1 Unrolling the Carry Recurrence 91
  6.2 Carry-Lookahead Adder Design 93
  6.3 Ling Adder and Related Designs 97
  6.4 Carry Determination as Prefix Computation 98
  6.5 Alternative Parallel Prefix Networks 100
  6.6 VLSI Implementation Aspects 104
  Problems 104
  References 107

7 VARIATIONS IN FAST ADDERS 108
  7.1 Simple Carry-Skip Adders 108
  7.2 Multilevel Carry-Skip Adders 111
  7.3 Carry-Select Adders 114
  7.4 Conditional-Sum Adder 116
  7.5 Hybrid Adder Designs 117
  7.6 Optimizations in Fast Adders 120
  Problems 120
  References 123

8 MULTIOPERAND ADDITION 125
  8.1 Using Two-Operand Adders 125
  8.2 Carry-Save Adders 128
  8.3 Wallace and Dadda Trees 131
  8.4 Parallel Counters 133
  8.5 Generalized Parallel Counters 134
  8.6 Adding Multiple Signed Numbers 136
  Problems 137
  References 140
9 BASIC MULTIPLICATION SCHEMES 143
  9.1 Shift/Add Multiplication Algorithms 143
  9.2 Programmed Multiplication 145
  9.3 Basic Hardware Multipliers 146
  9.4 Multiplication of Signed Numbers 148
  9.5 Multiplication by Constants 151
  9.6 Preview of Fast Multipliers 153
  Problems 153
  References 156

10 HIGH-RADIX MULTIPLIERS 157
  10.1 Radix-4 Multiplication 157
  10.2 Modified Booth's Recoding 159
  10.3 Using Carry-Save Adders 162
  10.4 Radix-8 and Radix-16 Multipliers 164
  10.5 Multibeat Multipliers 166
  10.6 VLSI Complexity Issues 167
  Problems 169
  References 171

11 TREE AND ARRAY MULTIPLIERS 172
  11.1 Full-Tree Multipliers 172
  11.2 Alternative Reduction Trees 175
  11.3 Tree Multipliers for Signed Numbers 178
  11.4 Partial-Tree Multipliers 180
  11.5 Array Multipliers 181
  11.6 Pipelined Tree and Array Multipliers 185
  Problems 186
  References 189

12 VARIATIONS IN MULTIPLIERS 191
  12.1 Divide-and-Conquer Designs 191
  12.2 Additive Multiply Modules 193
  12.3 Bit-Serial Multipliers 195
  12.4 Modular Multipliers 200
  12.5 The Special Case of Squaring 201
  12.6 Combined Multiply-Add Units 203
  Problems 204
  References 207
PART IV DIVISION

13 BASIC DIVISION SCHEMES 211
  13.1 Shift/Subtract Division Algorithms 211
  13.2 Programmed Division 213
  13.3 Restoring Hardware Dividers 216
  13.4 Nonrestoring and Signed Division 218
  13.5 Division by Constants 221
  13.6 Preview of Fast Dividers 223
  Problems 224
  References 226

14 HIGH-RADIX DIVIDERS 228
  14.1 Basics of High-Radix Division 228
  14.2 Radix-2 SRT Division 230
  14.3 Using Carry-Save Adders 234
  14.4 Choosing the Quotient Digits 236
  14.5 Radix-4 SRT Division 238

15 VARIATIONS IN DIVIDERS 246
  15.1 Quotient Digit Selection Revisited 246
  15.2 Using p-d Plots in Practice 248
  15.3 Division with Prescaling 250
  15.4 Modular Dividers and Reducers 252
  15.5 Array Dividers 253
  15.6 Combined Multiply/Divide Units 255
  Problems 256
  References 259

16 DIVISION BY CONVERGENCE 261
  16.1 General Convergence Methods 261
  16.2 Division by Repeated Multiplications 263
  16.3 Division by Reciprocation 265
  16.4 Speedup of Convergence Division 267
  16.5 Hardware Implementation 269
  16.6 Analysis of Lookup Table Size 270
  Problems 272
  References 275
17 FLOATING-POINT REPRESENTATIONS 279
  17.1 Floating-Point Numbers 279
  17.2 The ANSI/IEEE Floating-Point Standard 282
  17.3 Basic Floating-Point Algorithms 284
  17.4 Conversions and Exceptions 286
  17.5 Rounding Schemes 287
  17.6 Logarithmic Number Systems 291
  Problems 293
  References 296

18 FLOATING-POINT OPERATIONS 297
  18.1 Floating-Point Adders/Subtractors 297
  18.2 Pre- and Postshifting 300
  18.3 Rounding and Exceptions 303
  18.4 Floating-Point Multipliers 304
  18.5 Floating-Point Dividers 306
  18.6 Logarithmic Arithmetic Unit 307
  Problems 308
  References 311

19 ERRORS AND ERROR CONTROL 313
  19.1 Sources of Computational Errors 313
  19.2 Invalidated Laws of Algebra 316
  19.3 Worst-Case Error Accumulation 318
  19.4 Error Distribution and Expected Errors 320
  19.5 Forward Error Analysis 322
  19.6 Backward Error Analysis 323
  Problems 324
  References 327

20 PRECISE AND CERTIFIABLE ARITHMETIC 328
  20.1 High Precision and Certifiability 328
  20.2 Exact Arithmetic 329
  20.3 Multiprecision Arithmetic 332
  20.4 Variable-Precision Arithmetic 334
  20.5 Error Bounding via Interval Arithmetic 336
  20.6 Adaptive and Lazy Arithmetic 338
  Problems 339
  References 342
21 SQUARE-ROOTING METHODS 345
  21.1 The Pencil-and-Paper Algorithm 345
  21.2 Restoring Shift/Subtract Algorithm 347
  21.3 Binary Nonrestoring Algorithm 350
  21.4 High-Radix Square-Rooting 352
  21.5 Square-Rooting by Convergence 353
  21.6 Parallel Hardware Square-Rooters 356
  Problems 357
  References 360

22 THE CORDIC ALGORITHMS 361
  22.1 Rotations and Pseudorotations 361
  22.2 Basic CORDIC Iterations 363
  22.3 CORDIC Hardware 366
  22.4 Generalized CORDIC 367
  22.5 Using the CORDIC Method 369
  22.6 An Algebraic Formulation 372
  Problems 373
  References 376

23 VARIATIONS IN FUNCTION EVALUATION 378
  23.1 Additive/Multiplicative Normalization 378
  23.2 Computing Logarithms 379
  23.3 Exponentiation 382
  23.4 Division and Square-Rooting, Again 384
  23.5 Use of Approximating Functions 386
  23.6 Merged Arithmetic 388
  Problems 389
  References 393

24 ARITHMETIC BY TABLE LOOKUP 394
  24.1 Direct and Indirect Table Lookup 394
  24.2 Binary-to-Unary Reduction 395
  24.3 Tables in Bit-Serial Arithmetic 397
  24.4 Interpolating Memory 400
  24.5 Trade-Offs in Cost, Speed, and Accuracy 402
  24.6 Piecewise Lookup Tables 403
  Problems 406
  References 409
25 HIGH-THROUGHPUT ARITHMETIC 413
  25.1 Pipelining of Arithmetic Functions 413
  25.2 Clock Rate and Throughput 415
  25.3 The Earle Latch 418
  25.4 Parallel and Digit-Serial Pipelines 419
  25.5 On-Line or Digit-Pipelined Arithmetic 421
  25.6 Systolic Arithmetic Units 425

26 LOW-POWER ARITHMETIC 430
  26.1 The Need for Low-Power Design 430
  26.2 Sources of Power Consumption 432
  26.3 Reduction of Power Waste 434
  26.4 Reduction of Activity 436
  26.5 Transformations and Trade-Offs 438
  26.6 Some Emerging Methods 441
  Problems 443
  References 446

27 FAULT-TOLERANT ARITHMETIC 447
  27.1 Faults, Errors, and Error Codes 447
  27.2 Arithmetic Error-Detecting Codes 451
  27.3 Arithmetic Error-Correcting Codes 455
  27.4 Self-Checking Function Units 456
  27.5 Algorithm-Based Fault Tolerance 458
  27.6 Fault-Tolerant RNS Arithmetic 459
  Problems 460
  References 463

28 PAST, PRESENT, AND FUTURE 464
  28.1 Historical Perspective 464
  28.2 An Early High-Performance Machine 466
  28.3 A Modern Vector Supercomputer 468
  28.4 Digital Signal Processors 469
  28.5 A Widely Used Microprocessor 472
  28.6 Trends and Future Outlook 473
  Problems 475
  References 477

Index 479
If we precompute and store $\langle 2^j \rangle_{m_i}$ for each $i$ and $j$, then the residue $x_i$ of $y$ (mod $m_i$) can be computed by modulo-$m_i$ addition of the constants $\langle 2^j \rangle_{m_i}$ for which bit $y_j$ of $y$ is 1. Table 4.1 shows the required lookup table for converting 10-bit binary numbers in the range [0, 839] to RNS(8 | 7 | 5 | 3). Only residues mod 7, mod 5, and mod 3 are given in the table, since the residue mod 8 is directly available as the 3 least significant bits of the binary number $y$.
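Example 4.1 Represent $y = (1010\ 0100)_{\text{two}} = (164)_{\text{ten}}$ in RNS(8 | 7 | 5 | 3). The residue of $y$ mod 8 is $x_3 = (y_2 y_1 y_0)_{\text{two}} = (100)_{\text{two}} = 4$. Since $y = 2^7 + 2^5 + 2^2$, the required residues mod 7, mod 5, and mod 3 are obtained by simply adding the values stored in the three rows corresponding to $j = 7, 5, 2$ in Table 4.1:

$$
\begin{aligned}
x_2 &= \langle y \rangle_7 = \langle 2 + 4 + 4 \rangle_7 = 3 \\
x_1 &= \langle y \rangle_5 = \langle 3 + 2 + 4 \rangle_5 = 4 \\
x_0 &= \langle y \rangle_3 = \langle 2 + 2 + 1 \rangle_3 = 2
\end{aligned}
$$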
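The conversion procedure of Example 4.1 can be modeled in a few lines of Python. This is a minimal sketch rather than a hardware description: the names `MODULI`, `POW2_RES`, and `to_rns` are hypothetical, and each modulo-$m_i$ adder is stood in for by Python's `%` operator. The list `POW2_RES[m]` plays the role of one column of Table 4.1.

```python
from math import prod

# Assumed configuration (matches Example 4.1): RNS(8 | 7 | 5 | 3), 10-bit inputs.
MODULI = (8, 7, 5, 3)   # m_3, m_2, m_1, m_0, so results read (x_3, x_2, x_1, x_0)
K = 10                  # input width in bits
M = prod(MODULI)        # dynamic range: 0 <= y < 840

# POW2_RES[m][j] holds <2^j>_m, i.e., a Table 4.1-style entry for row j.
POW2_RES = {m: [pow(2, j, m) for j in range(K)] for m in MODULI}


def to_rns(y: int) -> tuple:
    """Binary-to-RNS conversion by adding stored residues of powers of 2."""
    if not 0 <= y < M:
        raise ValueError("y is outside the range [0, M - 1]")
    digits = []
    for m in MODULI:
        if m & (m - 1) == 0:
            # power-of-2 modulus: the residue is just the low-order bits of y
            digits.append(y & (m - 1))
            continue
        acc = 0
        for j in range(K):
            if (y >> j) & 1:
                # one modulo-m addition for each 1 bit of y
                acc = (acc + POW2_RES[m][j]) % m
        digits.append(acc)
    return tuple(digits)


print(to_rns(0b1010_0100))  # Example 4.1 (y = 164): prints (4, 3, 4, 2)
```

The number of additions per modulus equals the number of 1 bits in $y$, which is why the worst case is one addition per bit.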
In the worst case, $k$ modular additions are required for computing each residue of a $k$-bit number. To reduce the number of operations, one can view the given input number as a number in a higher radix. For example, if we use radix 4, then storing the residues of $4^i$, $2 \times 4^i$, and $3 \times 4^i$ in a table would allow us to compute each of the required residues using only $k/2$ modular additions. The conversion for each modulus can be done either by repeatedly using a single lookup table and modular adder or by several copies of each arranged into a pipeline. For a low-cost modulus $m = 2^a - 1$, the residue can be determined by splitting $y$ into $a$-bit segments and adding them modulo $2^a - 1$; this works because $2^a \equiv 1 \pmod{2^a - 1}$, so every $a$-bit segment has weight 1 modulo $m$.
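Both speedups described above can be sketched in Python as follows. The function names are hypothetical and the code is illustrative only (shifts and `%` stand in for wiring and modular adders). `residue_radix4` reads $y$ two bits at a time and adds stored values of $\langle d \cdot 4^i \rangle_m$ for the radix-4 digit $d$, while `residue_mod_2a_minus_1` adds $a$-bit segments of $y$ with an end-around carry, relying on $2^a \equiv 1 \pmod{2^a - 1}$.

```python
def radix4_table(m: int, k: int) -> list:
    """table[i][d] = <d * 4^i>_m for each radix-4 position i and digit d in 0..3."""
    # The d = 0 entry is 0; storing it only avoids a branch in the loop below.
    return [[(d * pow(4, i, m)) % m for d in range(4)] for i in range((k + 1) // 2)]


def residue_radix4(y: int, m: int, k: int) -> int:
    """<y>_m for a k-bit y, viewing y as a radix-4 number (ceil(k/2) additions)."""
    table = radix4_table(m, k)
    acc = 0
    for i in range((k + 1) // 2):
        d = (y >> (2 * i)) & 0b11        # i-th radix-4 digit of y
        acc = (acc + table[i][d]) % m    # one modulo-m addition per radix-4 digit
    return acc


def residue_mod_2a_minus_1(y: int, a: int) -> int:
    """<y>_m for the low-cost modulus m = 2^a - 1, by adding a-bit segments of y."""
    m = (1 << a) - 1
    acc = 0
    while y:
        acc += y & m                     # next a-bit segment of y
        y >>= a
        # 2^a = 1 (mod 2^a - 1), so a carry out of the a-bit sum wraps around
        acc = (acc & m) + (acc >> a)
    # the all-1s pattern m is a second representation of 0
    return 0 if acc == m else acc


print(residue_radix4(164, 7, 10))        # 3
print(residue_mod_2a_minus_1(164, 3))    # 3, since 7 = 2^3 - 1
```

After each end-around fold the accumulator stays at most $m$, so adding the next segment never needs more than $a + 1$ bits.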
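Table 4.2: values of $\langle M_i \langle \alpha_i x_i \rangle_{m_i} \rangle_M$ for RNS(8 | 7 | 5 | 3), M = 840

i  m_i  | x_i:  0    1    2    3    4    5    6    7
3   8   |       0  105  210  315  420  525  630  735
2   7   |       0  120  240  360  480  600  720
1   5   |       0  336  672  168  504
0   3   |       0  280  560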
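To avoid multiplications in the conversion process, we can store the values of $\langle M_i \langle \alpha_i x_i \rangle_{m_i} \rangle_M$ for all possible $i$ and $x_i$ in tables of total size $\sum_{i=0}^{k-1} m_i$ words, where $M_i = M/m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ are the Chinese remainder theorem constants. Table 4.2 shows the required values for RNS(8 | 7 | 5 | 3), which occupy 8 + 7 + 5 + 3 = 23 words. Conversion is then performed exclusively by table lookups and modulo-M additions.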
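The table-based RNS-to-binary conversion can be sketched as below, assuming the same RNS(8 | 7 | 5 | 3) and Python 3.8 or later (for `pow` with exponent -1, which returns a modular inverse). The names `crt_tables`, `TABLES`, and `from_rns` are hypothetical. Multiplications appear only in the one-time table construction; `from_rns` itself performs one lookup and one modulo-M addition per residue digit.

```python
from math import prod

MODULI = (8, 7, 5, 3)    # m_3, m_2, m_1, m_0, as in Table 4.2
M = prod(MODULI)         # 840


def crt_tables(moduli) -> list:
    """For each m_i, list <M_i * <alpha_i * x_i>_{m_i}>_M for x_i = 0 .. m_i - 1."""
    big_m = prod(moduli)
    tables = []
    for m in moduli:
        Mi = big_m // m                  # M_i = M / m_i
        alpha = pow(Mi, -1, m)           # alpha_i = <M_i^{-1}>_{m_i} (Python 3.8+)
        tables.append([(Mi * ((alpha * x) % m)) % big_m for x in range(m)])
    return tables


# Multiplications happen only here, once, when the tables are filled.
TABLES = crt_tables(MODULI)


def from_rns(digits) -> int:
    """RNS-to-binary conversion using only table lookups and modulo-M additions."""
    acc = 0
    for table, x in zip(TABLES, digits):
        acc = (acc + table[x]) % M
    return acc


print(TABLES[2])                 # [0, 336, 672, 168, 504]
print(from_rns((4, 3, 4, 2)))    # 164
```

Applied to the residues (4, 3, 4, 2) of Example 4.1, the four table entries 420, 360, 504, and 560 sum to 1844, and 1844 mod 840 = 164, which recovers $y$.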