
FPGA IMPLEMENTATION OF TWO VERY LOW COMPLEXITY LDPC DECODERS

J. Castiñeira Moreira, M. Rabini, C. González, C. Gayoso, and L. Arnone

Departamento de Electrónica, Facultad de Ingeniería, Universidad Nacional de Mar del Plata
leoarn@.mdp.edu.ar
ABSTRACT

Low-Density Parity-Check (LDPC) codes are very efficient error control codes that are being considered as part of many next generation communication systems. In this paper, FPGA implementations of two low complexity decoders are presented. These two implementations operate over any kind of parity check matrix (randomly or structurally generated, systematic or non-systematic) and can be parameterised for any code rate k/n. Both proposed implementations are of very low complexity because they use only sums, subtractions and look-up tables. One of these decoders offers the advantage of not requiring knowledge of the signal-to-noise ratio of the channel, which most decoders for LDPC codes do need.

1. INTRODUCTION

Low-Density Parity-Check (LDPC) codes are receiving particular interest because of their excellent performance close to the Shannon limit [1]. A well-known algorithm for iterative decoding of LDPC codes is the so-called Sum-Product (SP) algorithm proposed by Gallager [2]. Another algorithm for decoding LDPC codes is based on the Euclidean metric and is called the Soft-Distance (SD) decoder [3], [4]. This algorithm has the advantage of not requiring knowledge of the signal-to-noise ratio of the channel. Its BER performance is similar to that of the SP algorithm.

Iterative decoding algorithms for LDPC codes are in general very complex and involve operations such as products and quotients, which are quite difficult to implement in programmable logic such as FPGAs. In this paper we present two low complexity LDPC decoder implementations in FPGA [5], based on simplified versions of the algorithms described above, that use only sums, subtractions and look-up tables.
Unlike other works ([6], [7], [8], [9], [10], [11], [12]), these decoder implementations can operate over any kind of parity check matrix H (randomly or structurally generated, systematic or non-systematic [13], [14]), and they can also easily be set to any code rate k/n. Results obtained by Monte Carlo simulation show no degradation in the BER performance of the proposed algorithms with respect to the classic SP and SD algorithms. This allows the design of very low complexity decoding algorithms for LDPC codes without degrading the corresponding BER performance.

2. LDPC DECODER

LDPC codes are block codes [1] for which there exists a parity check matrix H such that, for a given codeword c, the condition $H c = 0$ is satisfied. The aim of a decoding algorithm is to find an estimate $\hat{c}$ of the codeword [1] such that the following condition is satisfied:

$$H \hat{c} = 0 \qquad (1)$$
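Condition (1) is a syndrome test carried out with modulo-2 arithmetic. A minimal sketch in Python with NumPy follows; the function name `syndrome_is_zero` and the small matrix `H_example` are illustrative assumptions and are not taken from the decoders described in this paper.

```python
import numpy as np

def syndrome_is_zero(H, c_hat):
    """Return True when H @ c_hat = 0 (mod 2), i.e. condition (1) holds."""
    syndrome = (H @ c_hat) % 2
    return not syndrome.any()

# Hypothetical 3 x 6 parity check matrix, used only for illustration.
H_example = np.array([
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
])

codeword = np.array([1, 0, 1, 1, 1, 0])   # satisfies all three parity checks
corrupted = codeword.copy()
corrupted[0] ^= 1                          # flip the first bit

print(syndrome_is_zero(H_example, codeword))
print(syndrome_is_zero(H_example, corrupted))
```

In an iterative decoder a test of this kind would typically be evaluated on the current hard decisions after each iteration, together with a limit on the number of iterations.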

Decoding of these codes is based on a bipartite graph that represents the relationship between the symbol nodes and the parity check nodes, which are related through the parity check equations of the code. These iterative decoding algorithms are characterised by an interchange of information between symbol nodes (which represent the bits of the received vector r) and parity check nodes (which represent the parity check equations of the code) [2]. This iterative exchange of information stops when condition (1) is satisfied or when a given number of iterations is reached.

3. LOGSP DECODING ALGORITHM

A logarithmic version of the SP algorithm is called LogSP; it has been described, but not implemented, in [15]. It is based on the SP algorithm introduced by D. J. C. MacKay and R. M. Neal [1]. A brief description of this algorithm is given below. Let $y_j$ be the symbol received from the channel at instant j, with additive noise variance $\sigma^2$. The a priori probability $F_j^x$ that the jth symbol is x, with $x \in \{0, 1\}$, is given by:

$$F_j^1 = e^{-|f_j^1|} = \frac{1}{1 + e^{-2 y_j/\sigma^2}} \qquad (2)$$

978-1-4244-8848-3/11/$26.00 2011 IEEE

$$F_j^0 = e^{-|f_j^0|} = 1 - F_j^1 \qquad (3)$$

The magnitude $|f_j^1|$ can be calculated as:

$$|f_j^1| = \ln\left(1 + e^{-2 y_j/\sigma^2}\right) = f_+(2 y_j/\sigma^2, 0) \qquad (4)$$

where $f_+(a, b)$ is a look-up table with entry $|a - b|$ [16], [17]. Table 1 summarises the steps of this algorithm. For a given parity check matrix H, N(i) is the set of indexes of the columns that have non-zero elements in row i, and M(j) is the set of indexes of the rows that have non-zero elements in column j. In the same way, $f_-(a, b)$ is a look-up table with entry $|a - b|$ (see Appendix A).

Table 1. Summary of the LogSP algorithm

Initialization

$$q_{ij}^0 = f_j^0, \qquad q_{ij}^1 = f_j^1$$

Horizontal Step 1

$$\delta q_{ij} = \max(q_{ij}^0, q_{ij}^1) + f_-(q_{ij}^0, q_{ij}^1)$$
$$s_{ij} = 0 \ \text{if}\ q_{ij}^0 \le q_{ij}^1,\ \text{else}\ s_{ij} = 1$$

Horizontal Step 2

$$\delta r_{ij} = \sum_{j' \in N(i)} \delta q_{ij'} - \delta q_{ij}, \qquad sr_{ij} = \sum_{j' \in N(i)} s_{ij'} - s_{ij}$$

Horizontal Step 3

if $sr_{ij}$ is even:
$$r_{ij}^0 = \lg(2) - f_+(\delta r_{ij}, 0), \qquad r_{ij}^1 = \lg(2) + f_-(\delta r_{ij}, 0)$$
if $sr_{ij}$ is odd:
$$r_{ij}^0 = \lg(2) + f_-(\delta r_{ij}, 0), \qquad r_{ij}^1 = \lg(2) - f_+(\delta r_{ij}, 0)$$

Vertical Step

$$c_{ij}^x = f_j^x + \sum_{i' \in M(j)} r_{i'j}^x - r_{ij}^x$$
$$q_{ij}^0 = c_{ij}^0 - \max(c_{ij}^0, c_{ij}^1) + f_-(c_{ij}^0, c_{ij}^1)$$
$$q_{ij}^1 = c_{ij}^1 - \max(c_{ij}^0, c_{ij}^1) + f_-(c_{ij}^0, c_{ij}^1)$$

Estimation of Decoded Symbol $\hat{r}(j)$

$$c_j^x = f_j^x + \sum_{i \in M(j)} r_{ij}^x$$
$$\hat{r}_j = 0 \ \text{if}\ c_j^0 \le c_j^1,\ \text{else}\ \hat{r}_j = 1$$

4. LOGSP DECODER ARCHITECTURE

The implementation of the LogSP decoder is shown in Fig. 1. Memory ROM_H stores the positions of the ones of the parity check matrix H; its size depends on the number of ones per row and on the number of rows of this matrix. For instance, for a randomly generated parity check matrix H1 of dimensions (60 × 30), ROM_H holds 256 words of 6 bits each, and for a randomly generated parity check matrix H2 of dimensions (1008 × 504), ROM_H holds 4096 words of 10 bits each. ROM_num_ones_rows stores the indexes of the non-zero elements of each row. This memory is used together with ROM_H to determine the positions of the ones in the parity check matrix H. Memories ROM_ftable+ and ROM_ftable- contain the look-up tables $f_+(a, b)$ and $f_-(a, b)$. Both tables hold 256 words of 16 bits each. RAM_f0 and RAM_f1 store the values of $f_j^0$ and $f_j^1$ each time a received word is input to the decoder. Their size is the code length n. Memories RAM_q0 and RAM_q1 contain the same number of words as ROM_H, but their words are 16 bits wide.

Memories RAM_q0 and RAM_q1 are used for several purposes. They store the values of $q_{ij}^0$ and $q_{ij}^1$ in the initialization step. In horizontal step 1, RAM_q0 stores the values of $\delta q_{ij}$ and RAM_q1 stores the values of $s_{ij}$. In horizontal step 2 they store the values of $\delta r_{ij}$ and $sr_{ij}$. In horizontal step 3 they store the values of $r_{ij}^0$ and $r_{ij}^1$. Finally, the values of $q_{ij}^0$ and $q_{ij}^1$ are stored in the vertical step. Thus, there is a high degree of reuse of these memories.

5. SIMPLIFIED SOFT-DISTANCE (SSD) DECODING ALGORITHM

The classic Sum-Product algorithm assumes that the information has been transmitted in normalized polar format with amplitudes ±1. For a given value $y_j$ received from the channel at time instant j, the a priori probability $F_j^x$ that the jth symbol is x, with $x \in \{0, 1\}$, is given by expressions (2) and (3). In the Soft-Distance (SD) algorithm, probabilities are replaced by the values of the squared Euclidean distances [3], [4]:

$$d_0^2(j) = (y_j + 1)^2, \qquad d_1^2(j) = (y_j - 1)^2 \qquad (5)$$
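The distances in (5) depend only on the channel output, which can be illustrated with a short Python sketch. It assumes that bit 1 is sent as amplitude +1 and bit 0 as amplitude -1, in line with (2). The function names and the sample values are illustrative, and the hard decision shown is only a sanity check on the raw channel values, not the SSD decoder itself.

```python
def squared_distances(y):
    """Return (d0^2, d1^2) for one received value y, following (5)."""
    d0 = (y + 1.0) ** 2   # distance to amplitude -1 (assumed mapping of bit 0)
    d1 = (y - 1.0) ** 2   # distance to amplitude +1 (assumed mapping of bit 1)
    return d0, d1

def hard_decision(y):
    """Pick the bit whose squared distance is smaller (illustration only)."""
    d0, d1 = squared_distances(y)
    return 1 if d1 < d0 else 0

received = [0.8, -1.3, 0.1, -0.4]   # hypothetical noisy channel outputs
metrics = [squared_distances(y) for y in received]
bits = [hard_decision(y) for y in received]

print(metrics)
print(bits)
```

No noise variance appears in these expressions, which is consistent with the SD decoder not needing the signal-to-noise ratio of the channel.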

[Fig. 1. LogSP decoder architecture. FPGA implementation using ALTERA [5]. Block diagram with input y(j) (channel output), output r(j) (symbol estimate), and blocks CORE, ROM_H, ROM_num_ones_rows, ROM_ftable+, ROM_ftable-, RAM_f0, RAM_f1, RAM_q0 and RAM_q1.]

Following a procedure detailed in [15], the use of logarithmic calculations leads to the Simplified Soft-Distance (SSD) decoding algorithm [18]. Table 2 summarises the SSD algorithm. It can be seen that the calculations involved are sums, subtractions and table look-ups.

Table 2. Summary of the SSD algorithm

Initialization

$$q_{ij}^0 = d_0^2(j), \qquad q_{ij}^1 = d_1^2(j)$$

Horizontal Step 1

$$\Sigma q_{ij} = +f_+(q_{ij}^0, q_{ij}^1), \qquad \delta q_{ij} = -f_-(q_{ij}^0, q_{ij}^1)$$
$$s_{ij} = 0 \ \text{if}\ q_{ij}^0 \le q_{ij}^1,\ \text{else}\ s_{ij} = 1$$

Horizontal Step 2

$$\Sigma r_{ij} = \sum_{j' \in N(i)} \Sigma q_{ij'} - \Sigma q_{ij}$$
$$\delta r_{ij} = \sum_{j' \in N(i)} \delta q_{ij'} - \delta q_{ij}, \qquad sr_{ij} = \sum_{j' \in N(i)} s_{ij'} - s_{ij}$$

Horizontal Step 3

if $sr_{ij}$ is even:
$$r_{ij}^0 = -f_+(\Sigma r_{ij}, \delta r_{ij}), \qquad r_{ij}^1 = +f_-(\Sigma r_{ij}, \delta r_{ij})$$
if $sr_{ij}$ is odd:
$$r_{ij}^0 = +f_-(\Sigma r_{ij}, \delta r_{ij}), \qquad r_{ij}^1 = -f_+(\Sigma r_{ij}, \delta r_{ij})$$

Vertical Step

$$q_{ij}^0 = d_0^2(j) + \sum_{i' \in M(j)} r_{i'j}^0 - r_{ij}^0$$
$$q_{ij}^1 = d_1^2(j) + \sum_{i' \in M(j)} r_{i'j}^1 - r_{ij}^1$$

Estimation of Decoded Symbol $\hat{r}(j)$

$$r_0^2(j) = q_{ij}^0 + r_{ij}^0, \qquad r_1^2(j) = q_{ij}^1 + r_{ij}^1$$
$$\hat{r}_j = 1 \ \text{if}\ r_1^2(j) < r_0^2(j),\ \text{else}\ \hat{r}_j = 0$$

6. SSD DECODER ARCHITECTURE

The implementation of the SSD decoder is shown in Fig. 2. Memory ROM_H stores the positions (indexes) of the ones of the parity check matrix H; its size depends on the number of ones per row and on the number of rows of this matrix. For instance, for a randomly generated parity check matrix H1 of dimensions (60 × 30), ROM_H holds 256 words of 6 bits each, and for a randomly generated parity check matrix H2 of dimensions (1008 × 504), ROM_H holds 4096 words of 10 bits each. ROM_num_ones_rows stores the indexes of the non-zero elements of each row. This memory is used together with ROM_H to determine the positions of the ones in the parity check matrix H. Memories ROM_ftable+ and ROM_ftable- contain the look-up tables $f_+(a, b)$ and $f_-(a, b)$. Both tables hold 256 words of 16 bits each. RAM_d20 and RAM_d21 store the values of $d_0^2$ and $d_1^2$ each time a received word is input to the decoder. Their size is the code length n. Memories RAM_q0 and RAM_q1 contain the same number of words as ROM_H, but their words are 16 bits wide.

Memories RAM_q0 and RAM_q1 are used for several purposes. They store the values of $d_0^2$ and $d_1^2$ in the initialization step. In horizontal step 1, RAM_q0 stores the values of $\Sigma q_{ij}$ and RAM_q1 stores the values of $\delta q_{ij}$. In horizontal step 2 they store the values of $\Sigma r_{ij}$ and $\delta r_{ij}$. In horizontal step 3 they store the values of $r_{ij}^0$ and $r_{ij}^1$. Finally, the values of $q_{ij}^0$ and $q_{ij}^1$ are stored in the vertical step. Thus, there is a high degree of reuse of these memories. RAM_sign stores $s_{ij}$ in horizontal step 1 and $sr_{ij}$ in horizontal step 2.
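The contents of ROM_ftable+ and ROM_ftable- could be generated offline along the following lines. This is a sketch in Python under stated assumptions: the tables are taken as f+(a, b) = ln(1 + e^(-|a-b|)) and f-(a, b) = |ln(1 - e^(-|a-b|))|, following Appendix A; the quantisation step `STEP` and the saturation value `F_MINUS_MAX` are hypothetical choices; and the 16-bit fixed-point word format of the hardware is not modelled.

```python
import math

TABLE_SIZE = 256           # number of entries, as stated in the paper
STEP = 8.0 / TABLE_SIZE    # assumed quantisation step of |a - b| (hypothetical)
F_MINUS_MAX = 8.0          # assumed saturation where f- diverges (hypothetical)

def build_tables(size=TABLE_SIZE, step=STEP):
    """Tabulate f+(d) = ln(1 + e^-d) and f-(d) = |ln(1 - e^-d)| at d = k * step."""
    f_plus = [math.log1p(math.exp(-k * step)) for k in range(size)]
    f_minus = [F_MINUS_MAX if k == 0
               else min(F_MINUS_MAX, -math.log1p(-math.exp(-k * step)))
               for k in range(size)]
    return f_plus, f_minus

F_PLUS, F_MINUS = build_tables()

def lookup(table, a, b, step=STEP):
    """Read a table at the quantised argument |a - b|, clamped to the last entry."""
    k = min(int(round(abs(a - b) / step)), len(table) - 1)
    return table[k]

def log_add(a, b):
    """Approximate ln(e^a + e^b) as max(a, b) + f+(a, b), as in (8)."""
    return max(a, b) + lookup(F_PLUS, a, b)

def log_sub(a, b):
    """Approximate ln|e^a - e^b| as max(a, b) - f-(a, b), as in (9)."""
    return max(a, b) - lookup(F_MINUS, a, b)

a, b = -0.7, -2.3
print(log_add(a, b), math.log(math.exp(a) + math.exp(b)))
print(log_sub(a, b), math.log(math.exp(a) - math.exp(b)))
```

The table values track the exact logarithms to within the quantisation error of the argument |a - b|; a finer step or a larger table trades memory for accuracy.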

[Fig. 2. SSD decoder architecture. FPGA implementation using ALTERA [5]. Block diagram with input y(j) (channel output), output r(j) (symbol estimate), and blocks CORE, ROM_H, ROM_num_ones_rows, ROM_ftable+, ROM_ftable-, RAM_d20, RAM_d21, RAM_q0, RAM_q1 and RAM_sign.]

7. COMPLEXITY ASPECTS

In order to analyse the complexity of the two decoding algorithms, we first define $t = |M(j)|_{av}$ as the average number of ones per column, and $v = |N(i)|_{av}$ as the average number of ones per row. Usually it is true that:

$$v = \frac{N t}{M} \qquad (6)$$

For an LDPC code of rate 1/2, M = N/2, and then:

$$v = 2t \qquad (7)$$

Table 3 shows a comparison of the decoding algorithms for t = 3 and v = 6. As can be seen, the SSD algorithm involves fewer calculations than the LogSP algorithm. Even though the SSD and LogSP algorithms require more sums and subtractions than the classic SP algorithm (MacKay-Neal), they use neither products nor quotients, which makes them significantly less complex.

Table 3. Complexity analysis of LDPC decoders (t = 3 and v = 6)

Algorithm          Products   Quotients   Sums-subtracts   Comparisons   Look-up tables
SP (MacKay-Neal)   36N        6N          15N              -             -
LogSP              -          -           78N              24N           21N
SSD                -          -           72N              21N           12N

8. SIMULATION RESULTS

FPGA implementations of the LogSP and SSD decoding algorithms were designed for the (60 × 30) and (1008 × 504) LDPC codes using the VHDL language [19]. QUARTUS II from ALTERA [20] was used as the synthesis tool. Table 4 shows the characteristics of the implementation of both decoders for the (60 × 30) LDPC code, and Table 5 shows the characteristics of the implementation of both decoders for the (1008 × 504) LDPC code. In both cases the SSD decoder is faster than the LogSP decoder and uses fewer logic elements and registers.

Table 4. Characteristics of the LogSP and SSD decoder implementations for the (60 × 30) LDPC decoder

Hardware used     LogSP           SSD
Device            EP2C35F672C6    EP2C35F672C6
Family            Cyclone II      Cyclone II
Logic elements    1036            882
Registers         394             387
Memory bits       15968           20320
Clock freq.       101.67 MHz      112.38 MHz

Table 5. Characteristics of the LogSP and SSD decoder implementations for the (1008 × 504) LDPC decoder

Hardware used     LogSP           SSD
Device            EP2C35F672C6    EP2C35F672C6
Family            Cyclone II      Cyclone II
Logic elements    1109            951
Registers         435             428
Memory bits       210944          219136
Clock freq.       94.43 MHz       118.11 MHz
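Relation (6) and the multiples of N listed in Table 3 can be evaluated for the two code sizes used in Tables 4 and 5. The Python sketch below assumes N = 60, M = 30 and N = 1008, M = 504, with t = 3 ones per column (the paper does not state t for these matrices), and treats the blank entries of Table 3 as zero. All names are illustrative.

```python
def ones_per_row(n, m, t):
    """Average number of ones per row, v = N t / M, as in relation (6)."""
    return n * t / m

# Multiples of N taken from Table 3 (t = 3, v = 6); blank entries assumed to be 0.
OPS_PER_SYMBOL = {
    "SP":    {"products": 36, "quotients": 6, "sums_subtracts": 15,
              "comparisons": 0, "lookups": 0},
    "LogSP": {"products": 0, "quotients": 0, "sums_subtracts": 78,
              "comparisons": 24, "lookups": 21},
    "SSD":   {"products": 0, "quotients": 0, "sums_subtracts": 72,
              "comparisons": 21, "lookups": 12},
}

def operation_counts(algorithm, n):
    """Scale the Table 3 multiples by the code length n."""
    return {op: k * n for op, k in OPS_PER_SYMBOL[algorithm].items()}

for n, m in [(60, 30), (1008, 504)]:       # assumed (N, M) for the two codes
    print(n, m, ones_per_row(n, m, t=3))   # assumed t = 3
    for algorithm in ("SP", "LogSP", "SSD"):
        print(" ", algorithm, operation_counts(algorithm, n))
```

Under these assumptions both codes give v = 6, which matches the case tabulated in Table 3.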

Figures 3 and 4 show the Bit Error Rate (BER) performance of both decoders using different look-up table sizes. It can be seen that the use of tables of 256 entries does not produce significant BER performance degradation with respect to the use of the ideal function. Thus, it is possible to implement low complexity decoders without BER performance degradation using only sums, subtractions and look-up tables of a reasonable size.

[Fig. 3. BER performance of a (60 × 30) LDPC decoder for different decoding algorithms. Axes: Pbe (logarithmic scale, down to 10^-4) versus Eb/No [dB] (0 to 5). Curves: uncoded transmission, SD ideal, SP MacKay-Neal, LogSP table of size 256, SSD table of size 256.]

[Fig. 4. BER performance of a (1008 × 504) LDPC decoder for different decoding algorithms. Axes: Pbe (logarithmic scale, down to 10^-4) versus Eb/No [dB] (1 to 3). Curves: uncoded transmission, SD ideal, SP MacKay-Neal, LogSP table of size 256, SSD table of size 256.]

9. CONCLUSIONS

FPGA implementations of two LDPC decoders have been presented. These two implementations operate over any kind of parity check matrix H (randomly or structurally generated, systematic or non-systematic) and can be parameterised for any code rate k/n. The SSD decoder has the advantage of not requiring knowledge of the signal-to-noise ratio of the channel. Both implementations are of very low complexity, with high application versatility, and they do not show any significant BER performance degradation with respect to the classic SP algorithm.

APPENDIX A

If $C = e^c$, $A = e^a$ and $B = e^b$, then $C = A + B$ can be determined as:

$$c = \max(a, b) + \ln\left(1 + e^{-|a-b|}\right) \qquad (8)$$

For $C = A - B$, or $C = (-1)^z e^c = (-1)^z |C|$ with $|C| = |A - B|$ and z = 0 if A > B, else z = 1:

$$c = \max(a, b) + \ln\left(1 - e^{-|a-b|}\right) = \max(a, b) - \left|\ln\left(1 - e^{-|a-b|}\right)\right| \qquad (9)$$

The logarithmic calculations in (8) and (9) can be avoided by using the look-up tables $f_+(a, b)$ and $f_-(a, b)$, with argument $|a - b|$.

REFERENCES

[1] D.J.C. MacKay and R.M. Neal, "Near Shannon limit performance of low density parity check codes," Electronics Letters, vol. 33, pp. 457-458 (1997).
[2] R.G. Gallager, "Low Density Parity Check Codes," IRE Trans. Information Theory, IT-8, 21-28 (1962).
[3] P.G. Farrell, "Decoding Error-Control Codes with Soft Distance as the Metric," in Proc. Workshop on Mathematical Techniques in Coding Theory, Edinburgh, UK (2008).
[4] P.G. Farrell and J. Castiñeira Moreira, "Soft-Input Soft-Output Euclidean Distance Metric Iterative Decoder for LDPC Codes," in Proc. Argentine Symposium on Computing Technology (AST 2008), Santa Fe, Argentina (2008).
[5] Altera, Cyclone II FPGAs, On Line.
[6] T. Zhang, K. Parhi, "54Mbps (3,6)-regular FPGA LDPC Decoder," Signal Processing Systems, 2002 (SIPS 02), IEEE Workshop, vol. 1, pp. 127-132 (2002).
[7] Y. Chen, D. Hocevar, "A FPGA and ASIC implementation of rate 1/2, 8088-b irregular low density parity check decoder," Global Telecommunications Conference, 2003 (GLOBECOM 03), IEEE, vol. 1, pp. 113-117 (2003).
[8] P. Bhagawat, M. Uppal, G. Choi, "FPGA based implementation of decoder for array low-density parity-check codes," Acoustics, Speech, and Signal Processing, 2005 (ICASSP 05), IEEE International Conference, vol. 5, pp. 29-32 (2005).
[9] C. Beuschel, H. Pfleiderer, "FPGA implementation of a flexible decoder for long LDPC codes," Field Programmable Logic and Applications, 2008 (FPL 2008), International Conference, vol. 1, pp. 185-190 (2008).
[10] Z. Cui, Z. Wang, "A 170 Mbps (8176, 7156) Quasi-Cyclic LDPC Decoder Implementation with FPGA," Circuits and Systems, 2006 (ISCAS 2006), Proceedings, 2006 IEEE International Symposium, vol. 1, pp. 5095-5098 (2006).
[11] X. Chen, J. Kang, S. Lin, L. Fellow, V. Akella, "Memory System Optimization for FPGA-Based Implementation of Quasi-Cyclic LDPC Codes Decoders," Circuits and Systems I: Regular Papers, IEEE Transactions (future issue).
[12] R. Zarubica, S. Wilson, E. Hall, "Multi-Gbps FPGA-based Low Density Parity Check (LDPC) Decoder Design," Global Telecommunications Conference, 2007 (GLOBECOM 07), IEEE, vol. 1, pp. 548-552 (2007).
[13] J. Sha, M. Gao, Z. Zhang, L. Li and Z. Wang, "A Memory Efficient FPGA Implementation of Quasi-Cyclic LDPC Decoder," Proc. 5th WSEAS Int. Conf. on Instrumentation, Measurement, Circuits and Systems, vol. 1, pp. 218-223 (2006).
[14] M.K. Ku, H.S. Li and Y.H. Chien, "Code Design and Decoder Implementation of Low Density Parity Check Code," Emerging Information Technology Conference (2005).
[15] L. Arnone, C. Gayoso, C. González and J. Castiñeira, "Sum-Subtract Fixed Point LDPC Decoder," Latin American Applied Research, vol. 37, pp. 17-20 (2007).
[16] J.P. Woodard and L. Hanzo, "Comparative Study of Turbo Decoding Techniques," IEEE Transactions on Vehicular Technology, vol. 49 (2000).
[17] T. Bhatt, K. Narayanan and N. Kehtarnavaz, "Fixed point DSP implementation of Low-Density Parity Check Codes," Proc. IEEE DSP2000 (2000).
[18] P.G. Farrell, L. Arnone and J. Castiñeira Moreira, "FPGA implementation of a Euclidean distance metric SISO Decoder," Proceedings of the Tenth International Symposium in Communications Theory and Applications (ISCTA 09), vol. 1, pp. 1-5 (2009).
[19] L. Terés, Y. Torroja, S. Olcoz, E. Villar, VHDL: Lenguaje Estándar de Diseño Electrónico, McGraw-Hill/Interamericana de España, Madrid (1997).
[20] Altera, Quartus II Software, Available: On Line.
