Multioperand Parallel Decimal Adder: A Mixed Binary and BCD Approach

1320
IEEE TRANSACTIONS ON COMPUTERS,
VOL. 56,
NO. 10,
OCTOBER 2007
Multioperand Parallel Decimal Adder: A Mixed Binary and BCD Approach

Luigi Dadda, Fellow, IEEE
AbstractDecimal arithmetic has been revived in recent years due to the large amount of data in commercial applications. We consider the problem of Multioperand Parallel Decimal Addition with an approach that uses binary arithmetic, suggested by the adoption of binary-coded decimal (BCD) numbers. This involves corrections in order to obtain the BCD result or a binary-to-decimal (BD) conversion. We adopt the latter approach, which is particularly efficient for a large number of addends. Conversion requires a relatively small area and can afford fast operation. The BD conversion moreover allows an easy alignment of the sums of adjacent columns. We treat the design of BCD digit adders using fast carry-free adders and the conversion problem through a known parallel scheme using elementary conversion cells. Spreadsheets have been developed for adding several BCD digits and for simulating the BD conversion as a design tool. Index TermsComputer arithmetic, decimal arithmetic, multioperand adders, hardware design.
1 INTRODUCTION
ECIMAL
arithmetic was adopted in early computers (for example, ENIAC, see [8]1), being replaced by binary arithmetic for better use of the expensive electronic systems of those times, as suggested in [1]. Well-known textbooks contain extensive treatment of the subject [2]. The interest in decimal coding has been a subject vastly treated in some well-known research centers, particularly for obtaining decimal codes permitting carry-free addition. Research in decimal systems has recently received renewed interest (see, for example, [9]), with great attention being paid to decimal multiplication [10]. Such attention is justified by the increasing diffusion of embedded systems in everyday life application, where both tradition and legal constraints require a decimal arithmetic. The simultaneous addition of several decimal numbers is the basis of most multiplication and division algorithms. In several multiplier schemes, each partial product is represented by two decimal numbers. Depending on the chosen degree of parallelism, we need to add three, five, seven, and so forth, decimal numbers. Moreover, financial, commercial, and Internet-based applications, in particular, market research based on existing huge databases or the calculations needed for land management [14], could be facilitated by the availability of multioperand decimal adders. We can
also consider the possibility of using massively parallel systems with hundreds of processors on the same chip for such and other applications. In a recent paper [11], Kenney and Schulte show exhaustive treatment of the Multioperand Decimal Addition based both on direct coded additions and on a binary parallel addition with subsequent decimal correction. Here, we propose obtaining the sum of each decimal column via a network of carry-free adders and converting the sum into decimal format via a fast known BD converter. It is then easy to correctly align such decimal values, obtaining the total sum through the addition of a few (2, 3, at most 4) decimal numbers. Such a hybrid approach will be shown to also obtain a convenient adder for the addition of a large amount of binary-coded decimal (BCD) numbers. It has been shown in [12] that it is possible and convenient to adopt design tools based on spreadsheet programs with a new dot notation. For space reasons, those programs will not be treated here. The interested reader can download them from [12].
THE BASIC SCHEME

1.
Our scheme, shown in Fig. 1, is based on the following steps: The binary addition of each N digits column composed of 4-bit, N digits of equal decimal weight 100 ; 101 ; . . . ; 10n1 . 2. The BD conversion of each column sum. 3. The alignment of the decimal column sums according to their decimal weights. It will be shown that such an alignment forms an array of few (2, 3, or 4) decimal numbers, each of them n digits long (Major Partial Sums (MPSs)). 4. The addition of the MPSs for obtaining the final sum. Fig. 1 shows the basic adder scheme for N 21 numbers of n 5 BCD digits each; its operation is shown in Fig. 2.
Published by the IEEE Computer Society
1. Decimal digits in ENIAC were represented by a ring of 10 binary memory elements, only one of them being 1 and rotating for counting input pulses. ENIAC was essentially an electronic simulator of existing electromechanical computing machines. Subsequent machines adopted four or more bits for coding decimal digits.
. The author is with the Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32 Milano I-20133, Italy. E-mail: dadda@elet.polimi.it, dadda@alari.ch. Manuscript received 28 Apr. 2006; revised 17 Nov. 2006; accepted 24 Apr. 2007; published online 30 Apr. 2007. Recommended for acceptance by M. Schulte. For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number TC-0161-0406. Digital Object Identifier no. 10.1109/TC.2007.1067.
0018-9340/07/$25.00 2007 IEEE
DADDA: MULTIOPERAND PARALLEL DECIMAL ADDER: A MIXED BINARY AND BCD APPROACH
1321
Fig. 2. The adding algorithm: an example with all BCD digits equal to 9 (worst case).
Fig. 1. The basic parallel decimal multioperand adder scheme for N 21 addends, all digits equal to 9 (worst case).
All of the N digits in one column are added simultaneously by means of carry-free circuits (illustrated in the next section). Each (binary) column sum is converted to decimal. In the given case (N 21 and n 5), assuming that all digits have a value of 9 (the worst case), each column sum will be 21 9 189. Each column sum has to be multiplied by the respective decimal weight: The final sum is the sum of all column sums. An easy way to multiply the column sums by the respective weights is shown at the bottom of Fig. 2, where each column sum is skewed and tiled so that the final array appears to be composed of three decimal numbers, each of n digits: the topmost one is composed of the most significant digits of the column sums (11111), the second one of the second most significant digits (88888), and the third one of the least significant digits (99999). It is easy to see that the final array is composed of two rows for N 11, of three rows for 11 < N 111, of four rows for 111 < N 1111, and so on. Note that 99 (that is, 9 11) is the maximum multiple of 9 (the largest value of a digit) that is composed of only two digits. The same goes for 111, 1111,...., originating three, four, and so forth MPSs. The final addition for obtaining the sum can be done by using the standard decimal additions such as those well illustrated in [11] or by applying the same mixed binary and BCD algorithm presented here. This point will be treated in Section 5. Note the importance in this approach of the Binary-toBCD conversion. It must be both simple (that is, requiring a not-too-large area) and fast. This problem is discussed in Section 4.
ADDING
THE
DIGITS
IN
EACH COLUMN
The adoption of the BCD digits suggests the basic scheme, where the evaluation of the sum of each column is the most important operation. This operation can be done in a fast way using a combinational network of full adders. Such a network can be easily drawn using the dot notation,
introduced for the first time in [3] for the design of parallel binary multipliers. The dot notation has been widely adopted and is described in computer arithmetic textbooks [6], [7]. Here, we adopt a modified version of the dot notation, illustrated in [12]. It is a compact dot notation since it allows more compact dot schemes: it is used for the Fig. 3 schemes. In Fig. 3, we depict the compact dot schemes of digit adders from N 3 to N 11, that is, for all those leading to two MPSs. N is written in the first row of each scheme, representing, in each of the four columns, the N binary variables composing the input array of the addition. In the case N 3, the three variables of each column feed a full adder whose sum and carry output variables are represented in the second and third rows, being connected by a segment (representing a full adder). The array of two rows represents two binary numbers, added by means of a binary parallel adder, represented by a thick line joining all of the output variables, that is, the binary sum in the fourth row. In the case N 4, two transformations are used. In the first, starting with a 4 4 dot input array, each of the four full adders transforms three input variables into a sum output and a carry output, exactly as in the previous case, with the fourth input variable in each column being simply transferred in the third row of the second array. The second transformation obtains the array of two rows: Note the use of a half adder having as input the two rightmost variables in the second array. The N 5 case requires three compression stages. The first differs from the preceding one in the third row, composed of four double dots. The second stage is similar to the first in the N 4 case. In the case N 6 (and in the following ones), a further rule is used in transforming the input of six dots/column. The six variables in each input column need two full adders: We represent them, as in the figure, with a single full adder and a 2 written close to the sum output and to the carry output. This rule can be applied to any number of input variables. If the input variables number divided by 3 leaves a nonzero remainder (1 or 2), the corresponding dots are simply transferred to the next array. The application of such a rule generates a compact dot scheme [l2]. Fig. 3 schemes have been drawn for adding BCD digits; they use only 10 combinations (from 0000 to 1001) of the four binary variables. This is reflected in the fact that the resulting sums can be shorter than those obtainable by also using the input combinations from 1010 to 1111. In the Fig. 3
1322
VOL. 56,
NO. 10,
OCTOBER 2007
Fig. 3. Dot schemes of digit adders for N from 3 to 11 (all those generating only two MPSs).
schemes, the number of the output variables is as needed for adding the assigned numbers of BCD digits. For example, for N 7, the number of output variables is 6, needed to represent the number 1111112 6310 , that is, the maximum value 9 7. Consequently, no carry can be generated in the leftmost stage of the final binary parallel adder. Such a carry can instead be generated in the N 8 adder. As a further remark, we see in some schemes a few half adders, always placed at the (right) beginning of a fulladder row. It was found [12] that such schemes could be obtained using only full adders, except in the last array of two rows. This leads to a minimal number of total full adders. The use of some half adders gives an advantage of obtaining a minimum length of the binary parallel adders at the end of the schemes, which has two consequences: minimizing the delay of the final addition and reducing the cost of the final binary parallel adder. In Fig. 4, we present the dot schemes for N 16 and N 100. Drawing by hand the column adders dot schemes (as a preliminary for writing a very high density logic (VHDL) program) is a necessary but tedious and error-prone process. We have developed some programs based on spreadsheets (for example, Excel) for this scope. Two programs have been developed: one adopting the rule of using only full adders, except for the last compression stage (obtaining an array of two rows from an array of three rows), and the other using a few half adders (at most one per compression stage) in order to obtain in the two-row output, a number of single bits, and, consequently, a shorter and faster parallel adder. Moreover, those programs also provide the counting of the total number of full adders and of half adders, the number of compression stages, and the number of
stages of the carry-look-ahead (CLA) binary adder necessary for obtaining the binary value of the column. In Fig. 4a N 16, we see that the number of the reduction stages is 6. The generating program gives 53 full adders and 3 half adders. In Fig. 4b N 100, it can be seen that only two half adders have been used (in stages 5 and 7, in addition to 386 full adders). The final parallel binary
Fig. 4. (a) Digit adder for N 16. (b) Digit adder for N 100. The binary output for all input digits 9 would be 14410 100100002 for N 16 and 90010 11100001002 for N 100.
1323
Fig. 5. (a) The basic conversion cell. (b) A compact notation of the basic cell. (c) A linear array of cells for converting 6-bit binary integers. Fig. 6. Dividing a binary integer by 10: examples.
adder is composed of five stages, with no carry from the last stage. The N 100 adder, obtained by the program using only full adders (in all compression stages except the last one), requires 383 full adders and a final binary parallel adder covering all 10 final binary columns. The programs can be downloaded from [15]. A general introduction to spreadsheets for the design of complex arithmetic structures can be found in [12].
CONVERSION
FROM
BINARY
TO
DECIMAL
The binary numbers, output from the above described adders, must be converted to decimal in order to compose the final adder array (see Fig. 1). The conversion can be obtained using a circuit introduced by Nicoud [5]. The basic cell is represented in Fig. 5a. A compact notation of the cell is shown in Fig. 5b. In the cell, the decimal input digit di (a BCD digit) is multiplied by two and the result is added to the binary input bi . If S 2 di bi is smaller than 10, the output decimal digit do equals S (and the binary output bo is zero); otherwise, it equals S 10; simultaneously, the binary output bo is set to 1. This behavioral description can be translated [5] into the following equations: Ao bi ABC D Bo AD AC AD Co AB BC BC Do AD A bo D AC BC: The results of the implementation of the cell on the basis of the above equations will be given and discussed in Section 6. Consider now, in Fig. 5c, a linear array of six identical cells in which the decimal output of each cell (except the last one) feeds the decimal input of the following cell. If we feed the six binary inputs with 0 or 1 values, that is, a generic 6-bit integer Bi whose l.s.b. is the bottom one, we get from the binary outputs the integer quotient of Bi divided by 10, with the remainder of the division being the decimal output of the bottom cell. In Fig. 5c, it was assumed to input a 1 in all of the binary inputs bi . The outputs of the six cells with
the assigned Bi b5 b4 b3 b2 b1 b0 111111 has been written according to said cell rule. Note that the value 6 at the bottom is directly represented in BCD by the binary output of the last four cells. With an array composed of a greater number of cells, their binary output will not, in general, represent a BCD value: It will be simply the binary integer representing the quotient. In order to obtain the successive digits of the conversion, we will have to process the successive binary outputs with suitable (necessarily shorter) cell arrays. The binary output number is the quotient since 1) each binary cell output represents a value 2k 10 subtracted in the cell (2k is the weight of the binary input and of the binary output of the cell in position k); 2) the decimal output from the bottom cell is the value left after the last subtraction and is smaller (by construction) than 10; and 3) the binary output number is consequently the maximum (binary) factor of 10, giving a number smaller than or equal to the input binary number, that is, it is the quotient, whereas the digit output from the bottom cell is the remainder. Note, finally, that the first three cells from the top of the array are not really needed: They can be replaced by wires connecting the three top binary inputs, b5 , b4 , and b3 , to the decimal input of the fourth (from top) cell, characterized by weights 4, 2, and 1, respectively. The fourth bit, with weight 8, has to be fed with a zero, as shown in Figs. 6a, 6b, and 6c, in order to obtain a BCD digit. There are cases in which the three top bits of the binary input number are bound to have values of 1 for the first (most significant) and 0 for the next and for the second-next bits. In such a case, the top four bits compose a BCD digit (having a value of 9 or 8) and can be input to the fourth cell (see Fig. 6d for an example). The binary output of a linear array, that is, the quotient of the division by 10 of its binary input, can be fed to another array to obtain a new quotient and a second remainder. The process can be repeated until the last digit is obtained, therefore completing the BD conversion. This bidimensional array can be designed for converting a binary number of any size. In [15], we have shown an algorithm for designing converters with a minimal number of basic cells. Such an algorithm has been used for designing a sequence of BD converters, covering the range from N 2 to N 711. Fig. 7 shows the converters for up
1324
VOL. 56,
NO. 10,
OCTOBER 2007
Fig. 7. BD converters for up to N 111 (the scheme for N 111 is valid from N 89 and, similarly, for the others).
to N 111. The full design process for up to N 711 can be found in a spreadsheet downloadable from [15]. This spreadsheet contains the schemes simulating all the converters designed. The simulation schemes were used as design tools and then translated into Fig. 7 schemes. It is worth noting that any of the schemes in Fig. 7 covers an N range from the value following the marking of the scheme preceding it to the N marking it: For example, the N 111 scheme covers the range 89 to 111. The data within each converter in Fig. 7 refer to binary inputs equivalent to 9 N . This value is obviously the one appearing in the decimal outputs at the bottom. The rules given at the beginning of this section allow checking the operation of each converter. Each Fig. 7 converter is marked with c and d values. The first is the number of cells composing each scheme and the second, d, is the number of cells included in the critical path, allowing the evaluation of the maximum delay; d is the number of cells composing the rightmost column. The reason why a path leading to the different decimal outputs is not affected by the position of each output is due to a property of the basic cell (see Fig. 5a): the direct connection of the binary input bi with the first weight 1 of the d0 output. This means that no delay is paid in a path connecting two adjacent columns. This relevant property has been underlined by Nicoud [5].
111 < N < 1112. N can be extended to larger values if needed, with the number of MPSs increasing logarithmically. The case 2 < N < 12, requiring only two MPSs, can be resolved via a parallel decimal adder (of CLA type for speed reasons) [4], [14]. The total time required is the sum of the delays through the column adder, the BD converter, and the final parallel decimal adder. The case 11 < N < 112, requiring three MPSs, could be solved using two cascaded parallel decimal adders. This solution is certainly possible. It appears nevertheless more convenient (for cost and speed reasons) to first reduce the three addends to two, then using a single parallel decimal adder. The reduction from three to two can be obtained with the dot scheme in Fig. 8a for N in the range 12 to 22, whereas, for N in the range 23 to 111, the Fig. 8b scheme is needed. Fig. 8a is composed of a four-stage binary parallel adder (of CLA type for speed reasons) and a single-cell BD converter. Of the three input digits, two are standard 4-bit BCD digits, whereas the third digit is composed of a single bit, therefore representing 010 or 110 . Note that the maximum (worst-case) binary number generated by the binary parallel adder is b4 b3 b2 b1 b0 100112 2 9 1 1910 . The number b4 b3 b2 b1 1001 can thus feed the decimal input of the BD cell, as shown in Fig. 8a. The digits generated by the circuit are . . d0 , composed of 4 bits (from the decimal output of the BD cell), and d1 , composed of a single bit (from the binary output of the BD cell).
THE MAJOR PARTIAL SUMS ADDITION
Fig. 2 shows that the final sum is obtained by adding the MPSs. It has been shown that the MPSs are two for 2 < N < 12, t h r e e f o r 11 < N < 112, a n d f o u r f o r
1325
Fig. 8. (a) For compressing three MPSs, one of them composed by 0 or 1-valued digits. (b) For compressing three MPSs.
The three decimal input digits (one of them composed of 0 and 1 only) with common weight 10i are transformed into an equivalent set of two decimal digits: d0 with weight 10i and d1 with weight 10i1 composed only of 0 and 1. This {1 generated will be aligned with the next digit of weight 10 by the next unit on its left. A single parallel decimal adder is then required for obtaining the final sum. It is worth noting that one of the two final addends is composed of 0 or 1: This could lead to a simplified form of the final parallel adder. This topic will not be treated here. In Fig. 8b, we assume the maximum (worst-case) value of 3 9 27 110112 (corresponding to N 11110 ). The dot scheme is composed of a first reduction stage (from three to two rows) by means of an array of four full adders, . a binary parallel adder, and . a BD converter composed of two cells: The output digit d0 of weight 10k is generated by the rightmost one and the digit d1 of weight 10k1 is composed of two bits, produced by the two cells, with the least significant being given by the rightmost cell. Note that the maximum value of d1 is 2 since the maximum input value is 27. Note that the l.s.b. b0 from the binary parallel adder is generated by the first reduction stage and that the output from the binary parallel adder b4 b3 b2 b1 has maximum value of 11012 (since it coincides with the first four bits of 110112 2710 ). The decimal input to the leftmost cell is therefore 01102 and the binary input to the same cell is b1 1; b0 feeds the binary input of the rightmost cell. It is important to notice that the Fig. 8b scheme could be used for values of N smaller than the stated lower limit of 23 declared in the figure. For instance, it could work for a 0 value given to one of the three input digits, for instance, the topmost one. In such a case, the circuit will give a correct value, but it is somewhat redundant since Fig. 8a would suffice. The ranges of N mentioned above and written in Figs. 8a and 8b can be proven as follows. .
Fig. 9. The final adder scheme. (a) For two MPSs. (b) For three MPSs.
Looking at Fig. 1, in the final array, we see that, in the three central columns, we read the value 189, identical to the value of each column sum. This property holds for the case of N equal addends, in particular, for the case of all digits equal to 9 (worst case). In such a case, the values of the digits of each column is 198 22 N . The upper limit for the scheme in Fig. 8a is therefore 22. The lower limit for such a scheme is reached if the value of the column is 108 12 9 N 12 since, for N 11, we get 11 9 99 (that is, 11 is the upper limit for Fig. 3 schemes). In Fig. 8b, the maximum limit is reached for N 111 since 111 9 999, that is, with N > 111, we would need a fourth MPS. A small redundancy can be noticed in the scheme in Fig. 8b. We can consider, in fact, an input of three digits, where one of them, for example, the topmost, requires only three significant bits (that is, its max value is 0111 7) or only two significant bits (its max value is 0011 3). In the first case, the leftmost full adder of the reduction stage could be replaced by a half adder. In the second case, two full adders could be replaced by two half adders. The upper limit for N will be N b799=9c 88 for the first case and N b399=9c 44 for the second case.
5.1 The Final Decimal Adder Fig. 9 shows two schemes: the first, shown in Fig. 9a, for the cases generating two MPSs and the second, shown in Fig. 9b, for three MPSs. The input numbers length is assumed to be eight digits. In Fig. 9a, the two MPS registers are positioned one digit apart from each other for correctly aligning the input column sums of two digits each. The numbers in the figure correspond to the worst case of all digits equal to 9 and to the maximum value N 11. In Fig. 3b, the three registers for the MPS are placed as shown in Fig. 1. A sequence of eight compression units transforms the three rows into equivalent two rows: Note
1326
VOL. 56,
NO. 10,
OCTOBER 2007
TABLE 1 Delays and Areas (of One Column) for N 4, N 8, N 16, and N 100 Addends (Eight Digits Long)
that the first and the last units implement the scheme in Fig. 8a (two digits only are input), the remaining six units implement the Fig. 8b scheme, having an input of three digits. Note also that, in both cases, the correct number of digits of the output sums is provided. Note also that the number of stages in the parallel decimal adder equals the number of digits, eight in the figure, of the addends. The registers in Figs. 8 and 1 (for the final array) are not strictly necessary, but are used only to facilitate reasoning.
are placed in Table 1, N 4 (for simplicity, we count the half adders as full adders). The corresponding delays and areas have been obtained, directly or indirectly, using an STM 0.18 m library of standard cells and Synopsis Design Compiler. For the two most elementary and frequently used components, we obtained: . For a full adder: delay 0:20 ns, area 90m2 . . For a half adder: delay 0:15 ns, area 50m2 . . For a BD cell: delay 0:12 ns, area 168m2 . The maximum output value of the whole N 4 adder is 4 9 3610 1001002 ; the corresponding BD converter is found in Fig. 7, N 4, giving as output 3610 with the above binary input. The number of cells is two and the delay is also of two cells. Delays and areas are in subcolumns ns and m2 of the N 4 column in Table 1. Since, for N 4, two MPSs are generated, these can be easily created from the BD converter outputs and fed directly to the final decimal adder. For speed reasons, here we adopt a decimal CLA adder, assuming n 8, and reporting in the table the corresponding delay and area per digit. We can therefore obtain the total delay and the total area per digit, as shown in Table 1. This simple procedure can be applied to obtaining the same data for N < 11. For example, the results for N 8 are shown. For N 16, we have previously shown in Section 5 that three MPSs are needed, one of them composed of 0 or 1. The set of three MPSs can be the reduced to two with the Fig. 8a scheme, requiring an additional four stages of binary CLA and a single-cell BD converter. Note also that the dot scheme for the column adder has been drawn in Fig. 4, deriving it from a spreadsheet program.
DELAY
AND
AREA
Thus far, we have described the various functional components needed for designing a multioperand decimal adder using a mixed binary and BCD approach. We are now going to give data concerning the timing and silicon area needed for the VLSI implementation of a complete multioperand decimal adder. This will be done by also showing how the different functional components can be chosen among those depicted in Figs. 3, 4, 7, and 8 in accordance with the prescribed N . This will be done for units capable of adding N 4, 8, 16, and 100 operands, whose length is 8 BCD digits. The result is given in Table 1. The first two cases, N 4 and N 8, involve two MPSs, whereas the other two cases generate three MPSs. N 16 is the maximum value considered in [11]. The case N 100 will show how to obtain solutions for N > 16 with our hybrid method. Let us consider the case N 4. The corresponding column adder in Fig. 3 is composed of two compression stages containing seven full adders and one half adder and of a final binary CLA adder with four stages. The number of adder stages, full+half adders, and CLA stages (2, 8, and 4)
1327
TABLE 2 Delays and Area for Eight-Digit (32-bit) Multioperand Adders
A: Data obtained by simulation of the entire system in Verilog, represented with bar charts in [11, Figs. 13 and 14]. B: Data for one digit, read from in [14, Fig. 6]. No data on area was reported. C: Data obtained by adding the data of a single component simulated in VHDL (see Table 1).
Fig. 10. An example taken for comparison reasons from [11, Fig. 8]: a nonspeculative algorithm versus the hybrid algorithm in this paper.
The BD converter of the digit adder output for N 16 can be derived from Fig. 7. Note that there is no specific scheme for N 16: In such a case, we adopt the scheme for N 17, according to the instruction given in the caption; such a scheme is valid for N 16. The three-to-two-rows compression is done with the Fig. 8a scheme, composed of a four-stage binary parallel CLA adder and a BD converter of one cell. The corresponding data for N 16 are shown in Table 1. The final decimal adder and the total amounts of the delays and of the areas are also shown. The case N 100 differs from the N 16 one only for the digit adder in Fig. 4b and for the adoption of Fig. 8b for obtaining two MPSs from the three generated by the BD converter. It differs from Fig. 8a for the addition of a compression stage composed of four full adders and a fourstage binary CLA. The BD converter for the column sum is chosen in Fig. 7: It will be the one marked with N 111. The corresponding data are shown in Table 1: N 100. Note that, in such a table, the only data that are a function of n, the addends length, are the delay and area of the final decimal adder (of CLA type). The assumed value is n 8.
speculate on BCD correction values and correct intermediate results while adding one or two new operands, respectively. The third uses a carry-save binary adder tree, using combinational logic to correct the sum and carry in the next more significant digit. The second paper, by You et al., presents the design of a new CLA decimal adder, which is very fast (621 ps/digit in Dynamic CMOS 0.25 m technology) and gives data on four and eight input adders, without specifying their architecture (presumably a binary tree) and area. In Table 2, we have collected the main data concerning the nonspeculative tree [11], the dynamic CLA just mentioned, and the hybrid scheme in this paper. The data concern the four cases N 4, N 8, N 16, and N 100, all for operands eight digits long. Solutions A (nonspeculative tree) and C (hybrid) seem interesting to compare: Both use (although in different ways) an initial binary addition. Fig. 10 presents their application to the same case, taken from [11]. The main differences between the two schemes appear to be the following: . The initial binary additions are done in scheme A, treating the set of the digit of each addend as a single binary number. In C , the binary additions are done separately for all of the digits having the same weight. The additions in A are performed in cascade, requiring N 2 stages. In C , the column sum is obtained via an algorithm executing carry-save
COMPARISON
AND
CONCLUSION
.
Two recent papers [11], [14] have treated the problem of multioperand decimal adders. The first, by Kenney and Schulte, presents three different schemes. The first two
1328
VOL. 56,
NO. 10,
OCTOBER 2007
addition on all of the digits of the same weight: The number of stages is thus smaller than that in case A (for example, for N 16, A needs 14 stages, whereas C needs 6 stages). . The decimal sum is obtained in A through combinational circuits based on tables to be constructed for each N value. In C , corrections are replaced by BD conversions. What appears most relevant in Table 2 is that the values of areas are much greater in A. Not having detailed data regarding the components used, we can explain qualitatively the origin of the difference. Delay and area data concern two main parts of the schemes: the binary section and the corrective-conversion section. The binary sections of A and C presumably have comparable sizes. Consequently, the main difference could derive from the correction-conversion sections. The data on the conversion section in C are given in detail in Table 1 The correction section in A can be seen in the Fig. 10 example, taken from [11]. After obtaining S 3 and C 3, their binary sum, Z 0 , is computed. Simultaneously, two numbers, G and COUT, are computed using combinational tables constructed on the basis of the carries between adjacent columns generated during the previous carry free additions. The following addition of G and COUT generates the final correct result. The conceptual and practical simplicity of the hybrid method described in this paper has permitted its applicability to a large number of addends.
[9] [10] [11] [12]
[13] [14] [15]
M.F. Cowlishaw, Decimal Floating-Point: Algorithm for Computers, Proc. 16th IEEE Symp. Computer Arithmetic, pp. 104-111, June 2003. M.A. Erle and M.J. Schulte, Decimal Multiplication via CarrySave Addition, Proc. IEEE Intl Conf. Application-Specific Systems, Architectures, and Processors, pp. 348-358, June 2003. R.D. Kenney and M.J. Schulte, High Speed Multioperand Decimal Adders, IEEE Trans. Computers, vol. 54, no. 8, pp. 953963, Aug. 2005. L. Dadda, A Compact Dot-Notation for the Design of Binary Parallel Adders, Multipliers and Adders of Products, ALaRI Internal Report, http://www.alari.ch/people/dadda/home/DotNotation/DotNotation.htm, 2005. L. Dadda, Parallel Decimal Multipliers: A Hybrid Approach, ALaRI Internal Report, 2005. Y. You, Y.D. Kim, and J.H. Choi, Dynamic Decimal Adder Circuit Design by Using the Carry Lookahead, Proc. Design and Diagnostic of Electronic Circuits and Systems, pp. 242-244, Apr. 2006. L. Dadda, Multi Operand Parallel Decimal Adders: Spreadsheet Tools, ALaRI Internal Report, http://www.alari.ch/people/ dadda/home/MultiopAdder/MoAd.htm, 2006.
Luigi Dadda received the degree in electrical engineering from the Politecnico di Milano in 1947. He was a libero docente in 1953, a full professor in 1962, a rector from 1972 to 1985, and a professor emeritus in 1998. He received a US National Science Foundation scholarship in 1954. In 1955, he taught the first annual course in electronic computers and other computer sciences courses. Since 1999, he has been the president of the Advanced Learning and Research Institute (ALaRI), University of Lugano, Switzerland. His research interests include electromagnetic fields, switching theory, and high-performance computing. He has been a member of the Institute of Radio Engineers (IRE, currently IEEE) since 1954 and is currently a fellow of the IEEE and a member of the IEEE Computer Society.
. For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/publications/dlib.
ACKNOWLEDGMENTS
The author is grateful to the anonymous reviewers for their constructive suggestions, to Alberto Nannarelli (Technical University of Denmark) for providing most of the data on the circuits components, and to the students of the master course in embedded system design (in the Advanced Learning and Research Institute (ALaRI), University of Lugano, Switzerland) who performed various simulations.
REFERENCES
[1] [2] [3] [4] [5] [6] [7] [8] A.V. Burks, H.H. Goldstine, and J. von Neumann, Preliminary Discussion and Logical Design of an Electronic Computing Instrument, Inst. for Advanced Study, Princeton, N.J., June 1946. R.K. Richards, Arithmetic Operations in Digital Computers. Van Nostrand, 1955. L. Dadda, Some Schemes for Parallel Multipliers, Alta Frequenza, vol. 19, pp. 349-356, Mar. 1965. M.S. Schmookler and A. Weinberger, High Speed Decimal Addition, IEEE Trans. Computers, vol. 20, no. 8, pp. 862-866, Aug. 1971. J.D. Nicoud, Iterative Arrays for Radix Conversion, IEEE Trans. Computers, vol. 20, no. 12, pp. 1479-1489, Dec. 1971. J.M. Muller, Arithmetique des Ordinateurs. Masson, 1989. I. Koren, Computer Arithmetic Algorithms. Prentice Hall, 1993. H.H. Goldstine and A. Goldstine, The Electronic Numerical Integrator and Computer, IEEE Annals of the History of Computing, vol. 18, no. 1, pp. 10-16, 1996.

Multioperand Parallel Decimal Adder: A Mixed Binary and BCD Approach

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Multioperand Parallel Decimal Adder: A Mixed Binary and BCD Approach

Hochgeladen von

Copyright:

Verfügbare Formate

1320

IEEE TRANSACTIONS ON COMPUTERS,