
EE 387 Algebraic Error-Control Codes Galois Field Arithmetic Implementation

November 8, 2010 Handout #26

Encoders and decoders for linear block codes over $GF(2^m)$, such as Reed-Solomon codes, require arithmetic operations in $GF(2^m)$. Moreover, decoders for some codes over $GF(2)$, such as BCH codes, require computations in extension fields $GF(2^m)$. In $GF(2^m)$ addition and subtraction are simply bitwise exclusive-or. Multiplication can be performed by several approaches, including bit serial, bit parallel (combinational), and software. Division is usually done by multiplying by the reciprocal of the divisor. The reciprocal can be computed in hardware using several methods, including Euclid's algorithm, lookup tables, exponentiation, and subfield representations. In this handout, we describe combinational multipliers, a variety of reciprocal circuits, and software implementations of multiplication and division.

With the exception of division, combinational circuits for Galois field arithmetic are straightforward. Fortunately, most decoding algorithms require only a few divisions, so fast methods for division are not essential.

Combinational circuits for scalers: multiplication by constants

First we consider multiplication by a constant. We want to determine equations for the components $y_i$ of the product $y = a \cdot b$, where $b$ is a constant. Since $GF(2^m)$ is a vector space over $GF(2)$, multiplication by a constant element of $GF(2^m)$ is a linear transformation on this space, which therefore can be described by an $m \times m$ matrix over $GF(2)$. We now show how to determine this matrix with respect to a standard basis $\{1, \alpha, \alpha^2, \ldots, \alpha^{m-1}\}$, where $\alpha$ is any primitive element. Using the distributive law,

\[
a \cdot b = (a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{m-1}\alpha^{m-1}) \cdot b
= a_0 b + a_1(\alpha b) + a_2(\alpha^2 b) + \cdots + a_{m-1}(\alpha^{m-1} b) .
\]

Since $b$ is a constant, so are the $\alpha^i b$ for $0 \le i < m$. The components of the vectors $b_i = \alpha^i b$ can be precomputed and stored in a binary matrix $B$. Let the $i$-th row of $B$ be $b_i = (b_{i,0}, \ldots, b_{i,m-1})$. Then the product of $a$ and $b$ is given by the following vector-matrix product.
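\[
y = a \cdot b = aB = [\, a_0 \;\; a_1 \;\; \cdots \;\; a_{m-1} \,]
\begin{bmatrix}
b_{0,0} & b_{0,1} & \cdots & b_{0,m-1} \\
b_{1,0} & b_{1,1} & \cdots & b_{1,m-1} \\
\vdots & \vdots & \ddots & \vdots \\
b_{m-1,0} & b_{m-1,1} & \cdots & b_{m-1,m-1}
\end{bmatrix}
\]

Each bit $y_j$ of the product $y$ is the inner product of $a$ with column $j$ of $B$:

\[
y_j = \sum_{i=0}^{m-1} a_i b_{i,j} .
\]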

EE 387, Autumn 2010

Example: $GF(2^6)$ can be represented as binary polynomials of degree $< 6$ with arithmetic modulo the primitive polynomial $p(x) = x^6 + x + 1$. To implement multiplication by $b = [\, 1\ 1\ 0\ 0\ 0\ 1 \,]$, we calculate the matrix $B$ whose rows are $x^i b$ for $i = 0, \ldots, 5$ (most significant bit at the right):

\[
B = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}
\]

Each row of $B$ is obtained from the previous row by shifting right once with feedback, where the feedback pattern $[\, 1\ 1\ 0\ 0\ 0\ 0 \,]$ corresponds to $p(x)$. The equations for the product $y = a \cdot b$ can be read directly from the columns of $B$:

\[
\begin{aligned}
y_0 &= a_0 \oplus a_1 \oplus a_5 \\
y_1 &= a_0 \oplus a_2 \oplus a_5 \\
y_2 &= a_1 \oplus a_3 \\
y_3 &= a_2 \oplus a_4 \\
y_4 &= a_3 \oplus a_5 \\
y_5 &= a_0 \oplus a_4
\end{aligned}
\]
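The construction of $B$ and the product $aB$ can be sketched in C. This is a minimal illustration, assuming the $GF(2^6)$ example with $p(x) = x^6 + x + 1$ and storing the coefficient of $x^i$ in bit $i$ of an unsigned integer, so the "shift right" of the printed rows becomes a left shift here. The names `M6`, `P6`, `scaler_rows`, and `scaler_apply` are hypothetical and not part of the handout.

```c
enum { M6 = 6 };                      /* degree of GF(2^6) */
static const unsigned P6 = 0x43;      /* p(x) = x^6 + x + 1; bit i = coefficient of x^i */

/* rows[i] = x^i * b mod p(x): each row is the previous one shifted with feedback. */
static void scaler_rows(unsigned b, unsigned rows[M6])
{
    for (int i = 0; i < M6; i++) {
        rows[i] = b;
        b <<= 1;                      /* multiply by x */
        if (b & (1u << M6))
            b ^= P6;                  /* reduce using x^6 = x + 1 */
    }
}

/* y = a * b as the vector-matrix product aB: XOR the rows selected by the bits of a. */
static unsigned scaler_apply(unsigned a, const unsigned rows[M6])
{
    unsigned y = 0;
    for (int i = 0; i < M6; i++)
        if (a & (1u << i))
            y ^= rows[i];
    return y;
}
```

With this bit order the constant $b = [\, 1\ 1\ 0\ 0\ 0\ 1 \,]$ is the integer `0x23`. In hardware the rows are fixed wiring, so only the XOR trees for the columns remain.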

In general, the matrix corresponding to any multiplier $b = x^j \bmod p(x)$ consists of $m$ rows that are consecutive powers of the primitive element $x$, namely, $x^j \bmod p(x), \ldots, x^{j+m-1} \bmod p(x)$.

From this discussion, we see that a $GF(2^m)$ scaler (multiplication by a constant) requires at most $m(m-1)$ two-input exclusive-or gates. The typical multiplier uses about $\frac{1}{2}m^2$ exclusive-or gates. (In fact, the average number of exclusive-or gates is exactly $m^2 2^{m-1}/(2^m - 1)$.)

Combinational circuits for general multiplication

Multiplication of two arbitrary elements of $GF(2^m)$ can be implemented using the representation of elements as polynomials with arithmetic modulo a prime polynomial $p(x)$ over $GF(2)$ of degree $m$. First recall that the product of two polynomials $a(x)$ and $b(x)$ is given by

\[
\begin{aligned}
a(x)b(x) &= (a_0 + a_1 x + \cdots + a_{m-1}x^{m-1})(b_0 + b_1 x + \cdots + b_{m-1}x^{m-1}) \\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + \cdots + a_{m-1}b_{m-1}x^{2m-2} .
\end{aligned}
\]

Let $t_i$ be the coefficient of $x^i$ in the product $a(x)b(x)$:
\[
t_i = \sum_{j=0}^{i} a_j b_{i-j} ,
\]

where $a_j = b_j = 0$ for $j \ge m$.

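Then, modulo $p(x)$, the product $a(x)b(x)$ is

\[
\begin{aligned}
a(x)b(x) \bmod p(x) &= (t_0 + t_1 x + \cdots + t_{m-1}x^{m-1} + t_m x^m + \cdots + t_{2m-2}x^{2m-2}) \bmod p(x) \\
&= t_0 + t_1 x + \cdots + t_{m-1}x^{m-1} + t_m (x^m \bmod p(x)) + \cdots + t_{2m-2}(x^{2m-2} \bmod p(x)) .
\end{aligned}
\]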


The polynomials $x^i \bmod p(x)$ for $i = m, \ldots, 2m-2$ can be precomputed and stored as the rows of an $(m-1) \times m$ binary matrix $T$. Each row of $T$ is obtained from the previous row by shifting right once with feedback corresponding to $p(x)$.

\[
T = \begin{bmatrix}
x^m \bmod p(x) \\
x^{m+1} \bmod p(x) \\
\vdots \\
x^{2m-2} \bmod p(x)
\end{bmatrix}
\]

The product can be expressed in matrix notation:

\[
y = a \cdot b = [\, t_0, t_1, \ldots, t_{m-1} \,] + [\, t_m, \ldots, t_{2m-2} \,]\, T
= [\, t_0, t_1, \ldots, t_{2m-2} \,] \begin{bmatrix} I \\ T \end{bmatrix}
\]

Or in low-level computational terms, for $j = 0, \ldots, m-1$,

\[
y_j = t_j + \sum_{i=0}^{m-2} t_{m+i} T_{ij} .
\]
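A software model of this two-stage multiplier may help make the roles of $t$ and $T$ concrete. The sketch below assumes $GF(2^4)$ with $p(x) = x^4 + x^3 + x^2 + x + 1$ and bit $i$ holding the coefficient of $x^i$. The names `M`, `POLY`, `build_T`, and `gf_mul_T` are hypothetical.

```c
enum { M = 4 };                        /* field degree */
static const unsigned POLY = 0x1F;     /* p(x) = x^4 + x^3 + x^2 + x + 1 */

/* T[i] = x^(M+i) mod p(x) for i = 0 .. M-2, each row from the previous by shift with feedback. */
static void build_T(unsigned T[M - 1])
{
    unsigned r = POLY & ((1u << M) - 1);   /* x^M mod p(x) */
    for (int i = 0; i < M - 1; i++) {
        T[i] = r;
        r <<= 1;
        if (r & (1u << M))
            r ^= POLY;
    }
}

/* y_j = t_j + sum_i t_(M+i) T_ij, where bit i of t holds t_i. */
static unsigned gf_mul_T(unsigned a, unsigned b)
{
    unsigned T[M - 1];
    unsigned t = 0;
    build_T(T);
    for (int j = 0; j < M; j++)            /* unreduced polynomial product */
        if (a & (1u << j))
            t ^= b << j;
    unsigned y = t & ((1u << M) - 1);      /* t_0 .. t_(M-1) */
    for (int i = 0; i < M - 1; i++)        /* fold in t_M .. t_(2M-2) through T */
        if (t & (1u << (M + i)))
            y ^= T[i];
    return y;
}
```

In a combinational circuit both loops unroll into fixed AND and XOR gates, and `T` is a constant determined by the polynomial.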

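To illustrate these equations, consider $GF(2^4)$ defined by the nonprimitive prime polynomial $p(x) = x^4 + x^3 + x^2 + x + 1$. The matrix $T$ consists of $x^i \bmod p(x)$ for $i = 4, 5, 6$:

\[
T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} .
\]

Therefore

\[
[\, y_0, y_1, y_2, y_3 \,] = [\, t_0, t_1, t_2, t_3 \,] + [\, t_4, t_5, t_6 \,]
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} .
\]

Expanding the matrix form yields the following Boolean equations:

\[
\begin{aligned}
t_0 &= a_0 b_0 \\
t_1 &= a_0 b_1 \oplus a_1 b_0 \\
t_2 &= a_0 b_2 \oplus a_1 b_1 \oplus a_2 b_0 \\
t_3 &= a_0 b_3 \oplus a_1 b_2 \oplus a_2 b_1 \oplus a_3 b_0 \\
t_4 &= a_1 b_3 \oplus a_2 b_2 \oplus a_3 b_1 \\
t_5 &= a_2 b_3 \oplus a_3 b_2 \\
t_6 &= a_3 b_3
\end{aligned}
\qquad
\begin{aligned}
y_0 &= t_0 \oplus t_4 \oplus t_5 \\
y_1 &= t_1 \oplus t_4 \oplus t_6 \\
y_2 &= t_2 \oplus t_4 \\
y_3 &= t_3 \oplus t_4
\end{aligned}
\]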

Each product $a_i b_j$ appears at least once in this set of equations, so $m^2$ AND gates are needed by the straightforward approach. The number of exclusive-or gates needed is $(m-1)^2$ (to compute the intermediate terms $\{t_0, t_1, \ldots, t_{2m-2}\}$) plus the number of 1s in the matrix $T$ (to compute $\{y_0, y_1, \ldots, y_{m-1}\}$ from $\{t_0, t_1, \ldots, t_{2m-2}\}$). The complexity of a $GF(2^m)$ multiplier can be reduced by defining arithmetic modulo a polynomial with a small number of nonzero coefficients concentrated in the low order positions. The number of exclusive-or gates is always less than $2m^2$ and typically under $1.5m^2$.
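The gate count of the straightforward approach is easy to tabulate for a candidate polynomial. The following sketch evaluates $(m-1)^2$ plus the number of 1s in $T$; it does not model any sharing of common subexpressions. The function name `xor_gate_count` is hypothetical, and `poly` is assumed to hold $p(x)$ with bit $i$ as the coefficient of $x^i$, including the $x^m$ term.

```c
/* XOR gates of the straightforward multiplier: (m-1)^2 plus the number of 1s in T. */
static int xor_gate_count(unsigned poly, int m)
{
    unsigned mask = (1u << m) - 1;
    unsigned r = poly & mask;              /* first row of T: x^m mod p(x) */
    int ones = 0;
    for (int i = 0; i < m - 1; i++) {      /* rows x^m .. x^(2m-2) */
        for (unsigned v = r; v != 0; v >>= 1)
            ones += (int)(v & 1u);
        r <<= 1;                           /* next row: shift with feedback */
        if (r & (1u << m))
            r ^= poly;
    }
    return (m - 1) * (m - 1) + ones;
}
```

For the $GF(2^4)$ example above, $T$ contains six 1s, so `xor_gate_count(0x1F, 4)` evaluates $9 + 6 = 15$.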


Multiplication using a subfield representation

Galois fields can also be represented using subfields larger than the field integers. In particular, $GF(2^{2m})$ can be represented as pairs of elements from the subfield $GF(2^m)$ modulo a prime polynomial over $GF(2^m)$ of degree 2. Prime polynomials of the form $x^2 + \alpha x + 1$, where $\alpha$ belongs to $GF(2^m)$, are particularly convenient. For such a polynomial, the product of elements $a = a_0 + a_1 x$ and $b = b_0 + b_1 x$ of $GF(2^{2m})$ can be expressed as

\[
\begin{aligned}
a \cdot b &= (a_0 + a_1 x)(b_0 + b_1 x) \\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0)x + a_1 b_1 x^2 \\
&= a_0 b_0 + (a_0 b_1 + a_1 b_0)x + a_1 b_1 (\alpha x + 1) \\
&= (a_0 b_0 + a_1 b_1) + (a_0 b_1 + a_1 b_0 + \alpha a_1 b_1)x .
\end{aligned}
\]

In other words, the components of the product $(y_0, y_1) = (a_0, a_1) \cdot (b_0, b_1)$ are

\[
y_0 = a_0 b_0 + a_1 b_1 , \qquad y_1 = a_0 b_1 + a_1 b_0 + \alpha a_1 b_1 .
\]

Multiplication of two elements of $GF(2^{2m})$ can be accomplished using four multiplications in the subfield $GF(2^m)$, in addition to one scaler and three additions. The following figure illustrates this approach.

[Figure: multiplier circuit with inputs $a_0$, $a_1$, $b_0$, $b_1$ and outputs $y_0$, $y_1$.]
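The subfield multiplier can be sketched in C for $GF(2^8)$ built over $GF(2^4)$. Everything specific here is an assumption made for illustration: the subfield polynomial $x^4 + x + 1$, the coefficient `ALPHA = 0x8` (chosen so that $x^2 + \alpha x + 1$ has no root in this $GF(16)$), and the names `mul16`, `GF256`, and `mul256`.

```c
/* Product in the subfield GF(2^4), assumed to be defined by x^4 + x + 1. */
static unsigned mul16(unsigned a, unsigned b)
{
    unsigned y = 0;
    for (int i = 0; i < 4; i++) {
        if (b & 1u) y ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10u) a ^= 0x13u;
    }
    return y;
}

#define ALPHA 0x8u   /* assumed coefficient of x in the prime polynomial x^2 + ALPHA*x + 1 */

typedef struct { unsigned c0, c1; } GF256;   /* the element c0 + c1*x */

/* Four subfield multiplications, one scaler (multiply by ALPHA), three additions. */
static GF256 mul256(GF256 a, GF256 b)
{
    unsigned p00 = mul16(a.c0, b.c0);
    unsigned p01 = mul16(a.c0, b.c1);
    unsigned p10 = mul16(a.c1, b.c0);
    unsigned p11 = mul16(a.c1, b.c1);
    GF256 y;
    y.c0 = p00 ^ p11;                         /* y0 = a0 b0 + a1 b1 */
    y.c1 = p01 ^ p10 ^ mul16(ALPHA, p11);     /* y1 = a0 b1 + a1 b0 + alpha a1 b1 */
    return y;
}
```

The four calls to `mul16` are independent, which is what allows the parallel, serial, or two-at-a-time schedules in hardware.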

The multiplications in the above circuit could be performed in parallel, in series, or two at a time, allowing for a tradeoff between time and gates.

Circuits for reciprocal in $GF(2^m)$

The quotient $a/b$ in $GF(2^m)$ is usually computed by multiplying the dividend $a$ by the reciprocal $b^{-1}$ of the divisor $b$. So division requires the computation of multiplicative inverses. For most error coding applications, a single-cycle division circuit is not needed. This is fortunate, because combinational reciprocal circuits are costly.

Euclidean Algorithm

As described in the handout Euclidean Algorithm and Division in Finite Fields, the reciprocal of a polynomial $r(x)$ in the field of polynomials modulo a prime polynomial $p(x)$ is a scalar multiple of one of the outputs of the extended Euclidean algorithm for the greatest common divisor of $r(x)$ and $p(x)$. If $\deg p(x) = m$, the Euclidean algorithm takes about $2m$ operations on $m$-bit registers.
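One common way to realize this in software is a shift-and-XOR form of the extended Euclidean algorithm over $GF(2)$. The sketch below is that variant, not necessarily the formulation used in the referenced handout. It assumes binary polynomials packed into unsigned integers with bit $i$ as the coefficient of $x^i$; the names `poly_deg`, `gf_inv_euclid`, and `gf_mul_check` are hypothetical.

```c
/* Degree of a binary polynomial; the zero polynomial gets degree -1. */
static int poly_deg(unsigned v)
{
    int d = -1;
    while (v) { v >>= 1; d++; }
    return d;
}

/* Reciprocal of nonzero r modulo the prime polynomial p, with deg r < deg p. */
static unsigned gf_inv_euclid(unsigned r, unsigned p)
{
    unsigned u = r, v = p;
    unsigned g1 = 1, g2 = 0;      /* invariants: g1*r = u and g2*r = v (mod p) */
    while (u != 1) {
        int j = poly_deg(u) - poly_deg(v);
        if (j < 0) {              /* keep deg u >= deg v */
            unsigned tmp;
            tmp = u;  u = v;   v = tmp;
            tmp = g1; g1 = g2; g2 = tmp;
            j = -j;
        }
        u  ^= v  << j;            /* cancel the leading term of u */
        g1 ^= g2 << j;            /* apply the same step to the multiplier */
    }
    return g1;                    /* g1 * r = 1 (mod p) */
}

/* Shift-and-add product modulo p, included only to check the reciprocal. */
static unsigned gf_mul_check(unsigned a, unsigned b, unsigned p)
{
    int m = poly_deg(p);
    unsigned y = 0;
    while (b) {
        if (b & 1u) y ^= a;
        b >>= 1;
        a <<= 1;
        if (a & (1u << m)) a ^= p;
    }
    return y;
}
```

Because the coefficients are in $GF(2)$, the only nonzero scalar is 1, so the multiplier left in `g1` when the remainder reaches 1 is the reciprocal itself.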


Table Lookup

Reciprocals can be precomputed and stored in a $2^m \times m$ ROM. For current gate arrays, one bit of ROM costs about 1/8 of a two-input NAND gate. Thus a reciprocal table for $GF(2^8)$ uses the same area as about 256 gates. A combinational multiplier for the same field uses 64 AND gates and 66 XOR gates, which costs approximately 260 gates. For larger fields, lookup tables are not as attractive; for example, for $GF(2^{10})$ the reciprocal table costs about $(2^{10} \times 10)/8 = 1280$ gates, compared to about 400 gates for a multiplier.

Sequential Search

The reciprocal of $a$ can be found by testing $a \cdot b = 1$ for each nonzero element $b$ of $GF(2^m)$. All nonzero elements can be generated using a maximum-length linear feedback shift register, which is slightly less costly than a binary counter. Sequential search can be performed using a pair of linear feedback shift registers with connections corresponding to the primitive polynomial that defines the field. For example, suppose that $GF(2^5)$ is defined by $p(x) = x^5 + x^2 + 1$.

[Figure: two linear feedback shift registers. Left register: initial value $a$, final value 1. Right register: initial value 1, final value $a^{-1}$.]

The left shift register is initially loaded with $a$ while the right shift register is loaded with 1. Shifting a register multiplies the contents by the primitive element $\alpha$. The registers are shifted simultaneously until the left shift register reaches 1. After each shift, the ratio of the left register to the right register remains $a$. If $i$ is the number of shifts needed, then $a \cdot \alpha^i = 1$, so the value $\alpha^i$ in the right shift register is the reciprocal of $a$.

Time-Memory Tradeoff

An associative memory such as a hash table can be used to reduce the number of clocks needed to find the reciprocal without using a complete lookup table. Suppose, for example, the associative memory stores the reciprocals of $\alpha^{16i}$ for $i = 0, 1, \ldots, 2^m/16 - 1$. Then the following program fragment finds the reciprocal of $a$ in at most 16 steps:

    for (i = 0; i < 16; i++) {
        if (a * alpha^i is in reciprocal table) {
            return alpha^i * reciprocal(a * alpha^i);
        }
    }

The search time can be decreased by using a larger associative memory. The same approach can be used to reduce the storage needed for computing the discrete logarithm from $2^m$ entries for a direct table lookup to $2^m/c$ entries if $c$ lookups are used.
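Both the sequential search and the time-memory tradeoff can be modeled in a few lines of C. The sketch assumes $GF(2^5)$ with $p(x) = x^5 + x^2 + 1$ and, because the field is small, a stride of 4 instead of 16. A linear scan over eight stored keys stands in for the associative memory. All names (`shift`, `mul5`, `inv_search`, `build_table`, `inv_tradeoff`) are hypothetical.

```c
enum { M5 = 5, STRIDE = 4, ENTRIES = 8 };
static const unsigned P5 = 0x25;          /* p(x) = x^5 + x^2 + 1 */

/* One LFSR shift: multiply by the primitive element alpha = x. */
static unsigned shift(unsigned v)
{
    v <<= 1;
    if (v & (1u << M5))
        v ^= P5;
    return v;
}

/* Shift-and-add product in GF(2^5). */
static unsigned mul5(unsigned a, unsigned b)
{
    unsigned y = 0;
    for (; b != 0; b >>= 1, a = shift(a))
        if (b & 1u) y ^= a;
    return y;
}

/* Sequential search: shift both registers until the left one reaches 1 (a nonzero). */
static unsigned inv_search(unsigned a)
{
    unsigned left = a, right = 1;
    while (left != 1) {
        left = shift(left);
        right = shift(right);
    }
    return right;
}

static unsigned key[ENTRIES], recip[ENTRIES];   /* stand-in for the associative memory */

/* Store reciprocals of alpha^(STRIDE*i) for i = 0 .. ENTRIES-1. */
static void build_table(void)
{
    unsigned v = 1;
    for (int i = 0; i < ENTRIES; i++) {
        key[i] = v;
        recip[i] = inv_search(v);
        for (int s = 0; s < STRIDE; s++)
            v = shift(v);
    }
}

/* Reciprocal of nonzero a in at most STRIDE steps; call build_table() first. */
static unsigned inv_tradeoff(unsigned a)
{
    unsigned ai = a, alpha_i = 1;               /* a*alpha^i and alpha^i */
    for (int i = 0; i < STRIDE; i++) {
        for (int k = 0; k < ENTRIES; k++)
            if (key[k] == ai)
                return mul5(alpha_i, recip[k]);
        ai = shift(ai);
        alpha_i = shift(alpha_i);
    }
    return 0;                                   /* not reached for nonzero a */
}
```

The stored exponents 0, 4, ..., 28 leave a gap of at most three before the exponent wraps at 31, which is why four probes suffice in this small field.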


Exponentiation

If $\beta$ is any nonzero element of $GF(q)$ then $\beta^{q-1} = 1$. Therefore $\beta^{-1} = \beta^{q-2}$. Powers of $\beta$ can be computed efficiently using squaring and multiplication by $\beta$. If $q = 2^m$ then the binary representation of $q - 2$ is $11 \cdots 10_2$, which consists of $m - 1$ ones followed by a zero. The exponentiation circuit shown below calculates the inverse of the input $\beta$. Each step consists of multiplying the value in the accumulator by $\beta$ then squaring that result.

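[Figure: exponentiation circuit with a squaring block and a storage element whose initial value is 1.]

Starting from an initial value of 1, the successive values of the storage element are

\[
\beta^2,\ \beta^6,\ \beta^{14},\ \ldots,\ \beta^{2^m - 2} = \beta^{-1} .
\]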
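The multiply-then-square loop translates directly into C. This sketch assumes $GF(2^8)$ defined by $x^8 + x^4 + x^3 + x^2 + 1$ (`0x11D`), which is a choice made for illustration only; the names `mul8` and `inv_pow` are hypothetical.

```c
enum { M8 = 8 };

/* Shift-and-add product in GF(2^8), assumed polynomial x^8 + x^4 + x^3 + x^2 + 1. */
static unsigned mul8(unsigned a, unsigned b)
{
    unsigned y = 0;
    while (b) {
        if (b & 1u) y ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x100u) a ^= 0x11Du;
    }
    return y;
}

/* beta^(2^m - 2): in each of the m-1 clocks, multiply by beta and then square. */
static unsigned inv_pow(unsigned beta)
{
    unsigned acc = 1;                  /* storage element, initial value 1 */
    for (int clk = 0; clk < M8 - 1; clk++) {
        acc = mul8(acc, beta);         /* exponent e becomes e + 1 */
        acc = mul8(acc, acc);          /* exponent becomes 2(e + 1) */
    }
    return acc;                        /* exponents run 2, 6, 14, ..., 254 */
}
```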

The final value $\beta^{-1}$ is obtained in $m - 1$ clocks, one multiplication and one squaring per clock. There are ad hoc methods for finding $\beta^{2^m - 2}$ using fewer than $2m - 2$ operations. For example, $\beta^{254}$, the inverse of $\beta$ in $GF(2^8)$, can be calculated using only 11 operations. The sequence of exponents

1, 2, 3, 6, 7, 14, 15, 30, 60, 120, 240, 254

is generated from the starting value 1 by adding two earlier exponents at each step. Such a sequence is called an addition chain. It is not known in general how to find the shortest addition chain for a given final value. See section 4.6.3, Evaluation of Powers, in volume 2 of Knuth's The Art of Computer Programming for more information about addition chains.

Subfield Representation

The field $GF(2^{2m})$ of even dimension can be represented as pairs of elements from the subfield $GF(2^m)$ modulo a prime (preferably primitive) polynomial over $GF(2^m)$ of degree 2. It can be shown that there always exists a prime polynomial over $GF(2^m)$ of the form $x^2 + \alpha x + 1$. Every element of $GF(2^{2m})$ is of the form $ax + b$ where $a, b$ are in $GF(2^m)$. First consider $a = 1$. The inverse of $x + b$ can be computed using the Euclidean algorithm in only one division step:

\[
x^2 + \alpha x + 1 = (x + b)(x + (\alpha + b)) + (b^2 + \alpha b + 1)
\quad\Longrightarrow\quad
(x + b)^{-1} = \frac{x + (\alpha + b)}{b^2 + \alpha b + 1} .
\]

This computation uses one reciprocal from the subfield $GF(2^m)$; the denominator $b^2 + \alpha b + 1$ is never zero because $x^2 + \alpha x + 1$ is prime over $GF(2^m)$. Next, the inverse of a general element $ax + b$ of $GF(2^{2m})$ with $a \ne 0$ can be obtained using the previous result:

\[
(ax + b)^{-1} = \bigl(a(x + b/a)\bigr)^{-1}
= \frac{x + (\alpha + b/a)}{a\bigl((b/a)^2 + \alpha (b/a) + 1\bigr)}
= \frac{ax + (\alpha a + b)}{b^2 + \alpha ab + a^2} .
\]

When $a \ne 0$, computing the reciprocal using the third expression above requires two inverses from $GF(2^m)$, first $a^{-1}$, then the inverse of the denominator. The final expression is valid whenever $a \ne 0$ or $b \ne 0$. It can be computed with only one reciprocal, in addition to one constant multiplication, one squaring, and three general multiplications.
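The final expression can be checked with a small C model of $GF(2^8)$ over $GF(2^4)$. As before, the specifics are assumptions for illustration: the subfield polynomial $x^4 + x + 1$, the coefficient `ALPHA = 0x8`, and the names `mul16`, `inv16`, `GF256`, `mul256`, and `inv256`. The subfield reciprocal is computed here as a 14th power, though a 16-entry table would serve equally well.

```c
/* Product in GF(2^4), assumed polynomial x^4 + x + 1. */
static unsigned mul16(unsigned a, unsigned b)
{
    unsigned y = 0;
    for (int i = 0; i < 4; i++) {
        if (b & 1u) y ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10u) a ^= 0x13u;
    }
    return y;
}

/* Subfield reciprocal as v^14 = v^8 * v^4 * v^2 (v nonzero). */
static unsigned inv16(unsigned v)
{
    unsigned v2 = mul16(v, v), v4 = mul16(v2, v2), v8 = mul16(v4, v4);
    return mul16(mul16(v8, v4), v2);
}

#define ALPHA 0x8u   /* assumed: x^2 + ALPHA*x + 1 is prime over this GF(16) */

typedef struct { unsigned c0, c1; } GF256;     /* the element c0 + c1*x, so b = c0 and a = c1 */

/* Product modulo x^2 + ALPHA*x + 1, used to check the inverse. */
static GF256 mul256(GF256 p, GF256 q)
{
    unsigned hh = mul16(p.c1, q.c1);
    GF256 y;
    y.c0 = mul16(p.c0, q.c0) ^ hh;
    y.c1 = mul16(p.c0, q.c1) ^ mul16(p.c1, q.c0) ^ mul16(ALPHA, hh);
    return y;
}

/* (ax + b)^-1 = (ax + (alpha*a + b)) / (b^2 + alpha*a*b + a^2), for ax + b nonzero. */
static GF256 inv256(GF256 e)
{
    unsigned a = e.c1, b = e.c0;
    unsigned d = mul16(ALPHA, a) ^ b;            /* constant multiplication */
    unsigned den = mul16(b, d) ^ mul16(a, a);    /* one general product, one squaring */
    unsigned r = inv16(den);                     /* the single subfield reciprocal */
    GF256 y;
    y.c1 = mul16(r, a);                          /* two more general products */
    y.c0 = mul16(r, d);
    return y;
}
```

The denominator is grouped as $b(\alpha a + b) + a^2$, which is what gives the count of one constant multiplication, one squaring, and three general multiplications.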


Recursive Combinational Circuit

A combinational circuit for inverses in $GF(2^{2m})$ can be built using a reciprocal circuit for $GF(2^m)$ and two general multipliers. This trick is based on the fact that $GF(2^m)$ consists of the elements of $GF(2^{2m})$ whose order divides $2^m - 1$. For any nonzero $\beta$ in $GF(2^{2m})$,

\[
\bigl(\beta^{2^m + 1}\bigr)^{2^m - 1} = \beta^{(2^m + 1)(2^m - 1)} = \beta^{2^{2m} - 1} = 1 .
\]

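Thus $\beta^{2^m + 1}$ belongs to the subfield $GF(2^m)$. Since $\beta^{-1} = \bigl(\beta^{2^m + 1}\bigr)^{-1} \beta^{2^m}$, inverses can be computed using a circuit that includes a reciprocal unit for the subfield $GF(2^m)$. The following figure shows this approach for $GF(2^8)$.

[Figure: reciprocal circuit for $GF(2^8)$ with blocks labeled "16-th power" and "inverse in GF(16)" and signals $\beta^{16}$, $\beta^{17}$, $\beta^{-17}$.]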

Because squaring in $GF(2^{2m})$ is linear, the circuit that computes $\beta^{2^m}$ is linear; it implements a vector-matrix multiplication and uses $\frac{1}{2}m^2$ XOR gates. The two general multipliers also require $O(m^2)$ gates. The subfield reciprocal block is a small part of this reciprocal circuit. The overall cost of this circuit is about equal to that of three general multipliers.
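The data flow of this circuit can be imitated in software for $GF(2^8)$. The sketch assumes the polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (`0x11D`) and models the $GF(16)$ reciprocal block by raising to the 14th power inside $GF(2^8)$, which is valid because $\beta^{17}$ satisfies $\gamma^{15} = 1$. A hardware block would instead work on a 4-bit representation of the subfield. The names `mul8`, `pow16`, `inv_sub16`, and `inv_recursive` are hypothetical.

```c
/* Shift-and-add product in GF(2^8), assumed polynomial x^8 + x^4 + x^3 + x^2 + 1. */
static unsigned mul8(unsigned a, unsigned b)
{
    unsigned y = 0;
    while (b) {
        if (b & 1u) y ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x100u) a ^= 0x11Du;
    }
    return y;
}

/* 16-th power: four squarings, a linear map over GF(2). */
static unsigned pow16(unsigned b)
{
    for (int i = 0; i < 4; i++)
        b = mul8(b, b);
    return b;
}

/* Reciprocal of a nonzero element g of the GF(16) subfield: g^14, since g^15 = 1. */
static unsigned inv_sub16(unsigned g)
{
    unsigned g2 = mul8(g, g), g4 = mul8(g2, g2), g8 = mul8(g4, g4);
    return mul8(mul8(g8, g4), g2);
}

/* beta^-1 = beta^16 * (beta^17)^-1 for nonzero beta. */
static unsigned inv_recursive(unsigned beta)
{
    unsigned b16 = pow16(beta);          /* linear block */
    unsigned b17 = mul8(b16, beta);      /* first general multiplier; lies in GF(16) */
    return mul8(b16, inv_sub16(b17));    /* second general multiplier */
}
```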

Efficient hardware implementations of Galois field arithmetic are presented in Christof Paar's 1994 University of Essen doctoral thesis, Efficient VLSI Architectures for Bit-Parallel Computation in Galois Fields.

Software implementations of multiplication and division

Galois field multiplication can be performed in software by emulating a shift-and-add multiplier, and Galois field division can be accomplished using the Euclidean algorithm to find reciprocals. Both of these approaches take $O(m)$ steps for $m$-bit operands. When $2^m$ is not too large, multiplication and division in $GF(2^m)$ can be implemented using logarithm and antilogarithm tables. The following subroutines use this approach. Each element of $GF(2^m)$ is represented by $m$ bits in an unsigned integer of at least $m + 1$ bits. The header file galois.h contains type definitions and global data declarations:
    typedef unsigned int GF;

    #define m 8             /* for example */
    #define Q (1 << m)      /* Q = 2^m */

    extern GF feedback;     /* coefficients of polynomial defining field */
    extern int Log[Q];      /* logarithm table */
    extern GF Exp[Q];       /* anti-logarithm table */


The file galois.c contains the actual multiplication and division routines.

    #include "galois.h"

    GF mul(GF a, GF b)
    {
        int i;
        if (a == 0 || b == 0)
            return 0;
        i = Log[a] + Log[b];        /* add logarithms modulo Q - 1 */
        if (i >= Q - 1)
            i -= Q - 1;
        return Exp[i];
    }

    GF div(GF a, GF b)
    {
        int i;
        if (b == 0)
            return 0;               /* we could raise divide by zero exception */
        if (a == 0)
            return 0;
        i = Log[a] - Log[b];        /* subtract logarithms modulo Q - 1 */
        if (i < 0)
            i += Q - 1;
        return Exp[i];
    }

The following subroutine initializes the logarithm and exponential tables.

    /* The argument alpha is not used below: the tables hold powers of x,
       so the polynomial in feedback must be primitive. */
    void gentab(GF alpha)
    {
        int i;
        GF t;

        /* first store powers of alpha in exponential table */
        t = 1;
        for (i = 0; i < Q; i++) {
            Exp[i] = t;
            if ((t <<= 1) & Q)      /* compute next power of alpha */
                t ^= feedback;
            t &= (Q - 1);           /* make sure t has only m bits */
        }

        /* now invert the Exp table to obtain logarithms */
        for (i = 0; i < Q - 1; i++) {
            Log[Exp[i]] = i;
        }

        /* Log[0] should never be accessed, but we give it a value anyway */
        Log[0] = 0;
    }
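To see the table routines agree with a shift-and-add multiplier, the two approaches can be placed side by side in one self-contained file. This is a sketch under stated assumptions: $m = 8$, the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (so the feedback value is `0x1D`), and `demo_`-prefixed names that are hypothetical stand-ins for the handout's `feedback`, `Log`, `Exp`, `gentab`, `mul`, and `div`.

```c
enum { DEMO_M = 8, DEMO_Q = 1 << DEMO_M };

static unsigned demo_feedback = 0x1D;   /* x^8 + x^4 + x^3 + x^2 + 1, without the x^8 term */
static int      demo_log[DEMO_Q];
static unsigned demo_exp[DEMO_Q];

/* Fill the antilog table with powers of x, then invert it to get logarithms. */
static void demo_gentab(void)
{
    unsigned t = 1;
    for (int i = 0; i < DEMO_Q; i++) {
        demo_exp[i] = t;
        t <<= 1;
        if (t & DEMO_Q)
            t ^= demo_feedback;
        t &= DEMO_Q - 1;
    }
    for (int i = 0; i < DEMO_Q - 1; i++)
        demo_log[demo_exp[i]] = i;
    demo_log[0] = 0;                    /* never used for a real lookup */
}

/* Table-based product: add logarithms modulo Q - 1. */
static unsigned demo_mul(unsigned a, unsigned b)
{
    if (a == 0 || b == 0) return 0;
    int i = demo_log[a] + demo_log[b];
    if (i >= DEMO_Q - 1) i -= DEMO_Q - 1;
    return demo_exp[i];
}

/* Table-based quotient: subtract logarithms modulo Q - 1; returns 0 when b is 0. */
static unsigned demo_div(unsigned a, unsigned b)
{
    if (a == 0 || b == 0) return 0;
    int i = demo_log[a] - demo_log[b];
    if (i < 0) i += DEMO_Q - 1;
    return demo_exp[i];
}

/* Shift-and-add product, O(m) steps, no tables. */
static unsigned demo_mul_shift(unsigned a, unsigned b)
{
    unsigned y = 0;
    while (b) {
        if (b & 1u) y ^= a;
        b >>= 1;
        a <<= 1;
        if (a & DEMO_Q) a ^= demo_feedback ^ DEMO_Q;
    }
    return y;
}
```

The table version trades two 256-entry arrays for constant-time products, while the shift-and-add version needs no memory beyond the polynomial.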
