
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

High-Throughput Interpolator Architecture for Low-Complexity Chase Decoding of RS Codes


F. García-Herrero, M. J. Canet, J. Valls, and P. K. Meher

Abstract: In this paper, a high-throughput interpolator architecture for soft-decision decoding of Reed-Solomon (RS) codes based on low-complexity chase (LCC) decoding is presented. We have formulated a modified form of Nielson's interpolation algorithm, using some typical features of LCC decoding. The proposed algorithm works with a different scheduling, takes care of the limited growth of the polynomials, and shares the common interpolation points, in order to reduce the latency of interpolation. Based on the proposed modified Nielson's algorithm, we have derived a low-latency architecture to reduce the overall latency of the whole LCC decoder. An efficiency improvement of at least 39%, in terms of area-delay product, has been achieved by an LCC decoder using the proposed interpolator architecture, over the best of the previously reported architectures for an RS(255,239) code with eight test vectors. We have implemented the proposed interpolator in a Virtex-II FPGA device, where it provides 914 Mb/s of throughput using 806 slices.

Index Terms: Algebraic soft-decision decoding, interpolation, low-complexity chase (LCC), low latency, Nielson's algorithm, Reed-Solomon (RS) codes.

I. INTRODUCTION

The algebraic soft-decision decoding (ASD) of Reed-Solomon (RS) codes [1] provides significant coding gain over hard-decision decoding with polynomial complexity. The decoder in this case has three main functions: 1) multiplicity assignment; 2) interpolation; and 3) factorization of bivariate polynomials [2], the interpolation being the most computation-intensive one. Several architectures based on Nielson's algorithm [3]-[5] and the Lee-O'Sullivan algorithm [6] are found in the literature for the VLSI implementation of the interpolation stage. However, their hardware complexity is still high. The low-complexity chase (LCC) algorithm [7] was proposed to reduce the complexity of interpolation; it interpolates over 2^η test vectors, which makes it attractive for VLSI implementation [9]. The interpolation is simplified in LCC decoding by restricting the multiplicity to m = 1 and replacing the factorization step with the Chien search and Forney's algorithm. Furthermore, it is shown in [9] that the LCC algorithm with eight test vectors (i.e., for η = 3) can achieve similar or higher coding gain than the Koetter-Vardy algorithm [1] with multiplicity m = 4. An interpolation architecture for LCC with η = 3, called backward interpolation, is proposed in [9], and could be considered the best of the current approaches. The backward interpolator shares the computation of the common points of the test vectors. These points are ordered in such a way that a pair of adjacent vectors differs only at one point. Due to this feature, the backward interpolation architecture involves less area and provides higher speed than its predecessors.

In this paper, a new interpolation architecture based on Nielson's algorithm is proposed to reduce the latency of implementation. The proposed architecture takes care of the growth of the polynomials during the updating step and changes the order of operation of the original algorithm. Moreover, it computes the discrepancies in parallel only for the differing points of the test vectors. From the simulation results we find that the proposed decoder architecture based on the modified Nielson's algorithm has better area-delay performance than the decoder based on the backward interpolation architecture of [9].

The remainder of this paper is organized as follows. Section II reviews LCC decoding. In Section III, we present the proposed modified Nielson's algorithm and the LCC interpolation process for the differing points. The proposed interpolator architecture is described in Section IV. In Section V, we discuss hardware and time complexities. Conclusions are presented in Section VI.

II. REVIEW OF LCC DECODING OF RS CODES

Let us consider an RS(n, k) code over GF(2^q), with n = 2^q − 1, and systematic encoding of the message. In such a case, 2t redundant symbols are added to the original k-symbol message to obtain the codeword c(x), where 2t = n − k. After the message is transmitted over a noisy channel, the RS decoder receives r(x) = c(x) + e(x), where e(x) is the polynomial representation of the noise. The first step of the ASD algorithms is the computation of the multiplicity matrix [12]. There are two levels of reliability in the LCC decoder: m = 0 for the unreliable symbols and m = 1 for the most reliable symbols [7]. To reduce the complexity of the decoder, re-encoding and coordinate transformation can be applied [11], which result in zeros at the reliable locations (corresponding to m = 1) and 2t nonzero symbols at the less reliable positions (corresponding to m = 0). The zeros are directly interpolated during re-encoding, yielding a univariate polynomial. The remaining 2t symbols require a bivariate interpolation [10]. Each of these 2t symbols is expressed as a pair of points (x_j, y_j), where x_j is a power of a primitive element of GF(2^q) and the subindex j is the location of the least reliable received symbol of the message after re-encoding, y_j.

The LCC decoding requires 2^η test vectors. When re-encoding is performed, each test vector has (2t − η) common symbols, while η symbols can differ. At each differing point, either the hard-decision value (yHD) or the second most reliable value (yHD2) can be chosen, which yields the 2^η different combinations of the test vectors. Each test vector is interpolated to derive a bivariate polynomial. A selection algorithm [7] is used to choose one out of the 2^η polynomials to find the locations and magnitudes of the errors using the Chien search and Forney's algorithm. Finally, the message is corrected with an erasure decoder.

An efficient re-encoder architecture has been presented in [14]. To match its latency of 528 clock cycles, a backward interpolator was introduced in [9] with 525 clock cycles of latency, for an RS(255,239) code decoded using LCC with η = 3. However, a faster re-encoder architecture is proposed in [15] which involves only 290 cycles (for the same RS code), so that the latency of the backward interpolator of [9] becomes the bottleneck for speeding up the whole decoder. To reduce the latency of the decoder and improve the throughput rate, a faster interpolator is needed.
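To make the test-vector construction concrete, the following Python sketch (our illustration, not part of the paper; names such as `common`, `unreliable`, `yHD`, and `yHD2` are assumptions) builds the 2^η test vectors by choosing, at each of the η least reliable positions, either the hard-decision symbol or the second most reliable one:

```python
from itertools import product

def build_test_vectors(common, unreliable):
    """common: dict pos -> symbol for the 2t - eta shared points.
    unreliable: list of (pos, yHD, yHD2) for the eta least reliable
    positions. Returns all 2**eta test vectors as dicts pos -> symbol."""
    vectors = []
    # each bit pattern selects yHD (0) or yHD2 (1) at each unreliable position
    for choice in product([0, 1], repeat=len(unreliable)):
        v = dict(common)
        for (pos, yhd, yhd2), c in zip(unreliable, choice):
            v[pos] = yhd2 if c else yhd
        vectors.append(v)
    return vectors

# toy example: 5 interpolation points, eta = 2 differing ones
common = {0: 7, 1: 3, 2: 9}
unreliable = [(3, 4, 11), (4, 6, 13)]
tvs = build_test_vectors(common, unreliable)
```

Every vector shares the common symbols, and only the η chosen positions vary; this shared structure is exactly what the interpolator exploits later.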

Manuscript received June 03, 2010; revised September 04, 2010; accepted December 23, 2010. This work was supported by FEDER and the Spanish Ministerio de Ciencia e Innovación, under Grant TEC2008-06787. F. García-Herrero, M. J. Canet, and J. Valls are with the Instituto de Telecomunicaciones y Aplicaciones Multimedia, Universidad Politécnica de Valencia, 46730 Gandia, Spain (e-mail: fragarh2@epsg.upv.es, macasu@eln.upv.es, jvalls@eln.upv.es). P. K. Meher is with the Department of Embedded Systems, Institute for Infocomm Research, 138632, Singapore (e-mail: pkmeher@i2r.a-star.edu.sg). Digital Object Identifier 10.1109/TVLSI.2010.2103961

III. PROPOSED MODIFIED NIELSON'S ALGORITHM FOR LOW-LATENCY INTERPOLATION

Here, we present the modified Nielson's interpolation algorithm to reduce the total latency. Since the LCC algorithm works with a maximum multiplicity of m = 1 and a maximum y-degree of 1, we include Nielson's interpolation algorithm for these parameters as Algorithm 1 [10], where deg0 and deg1 are the degrees of the bivariate polynomials to interpolate, g0(x, y) and g1(x, y), respectively.

1063-8210/$26.00 2011 IEEE


Algorithm 1 has two stages: stage-1 (A1) performs the polynomial evaluation (PE), i.e., the discrepancy computation. Stage-2 (A2 and A3) performs the polynomial update (PU). There are two ways to update a polynomial: if the polynomial has minimum order and its discrepancy is different from zero, its x-degree is increased by one while its y-degree remains unchanged (step A3); otherwise, a linear combination of the two polynomials, weighted by their respective discrepancies δ0 and δ1, is computed (step A2). This last updating mode does not modify the degree of the polynomial.

When re-encoding is applied and the maximum multiplicity is taken to be one, the number of iterations needed in the interpolation algorithm is 2t. Considering the worst-case situation, in which one of the two polynomials is updated in every iteration by applying step A3, the highest possible x-degree of a polynomial is 2t. In each iteration, the number of coefficients of the minimum x-degree polynomial increases only by one, until it reaches a maximum of 2t + 1 coefficients in the last iteration. So the required number of cycles changes from (2t + 1) · 2t, if all the coefficients are covered in each iteration, to (2t + 1) · (2t + 2)/2, if the growth of the order of the polynomial is taken into account. In addition, the discrepancy of the first iteration is always one for the polynomial g0 and the first interpolation point for g1, so that we can initialize δ0 = 1 and δ1 = y_0. This allows us to reschedule Algorithm 1 so that PU is performed in the first stage (steps A2, A3) and PE is performed in the second stage (step A1). As a result, in each step, the update of one coefficient of g0 and g1 and a partial evaluation of the discrepancy can be computed in parallel, as happens in [5]. Note that Algorithm 1 needs to complete PE before beginning PU.

Algorithm 1. Nielson's interpolation for m = 1, L = 1
Input: interpolation points (x_i, y_i)
Initialization: g0(x, y) = 1, g1(x, y) = y, deg0 = 1, deg1 = −1
Interpolation:
For i = 0 to length(x) − 1
    A1: δ0 = g0(x_i, y_i), δ1 = g1(x_i, y_i)
    if (min(deg0, deg1) = deg0) and (δ0 ≠ 0)
        A2: g1(x, y) = g1(x, y) · δ0 + g0(x, y) · δ1
        A3: g0(x, y) = g0(x, y) · (x + x_i)
            deg0 = deg0 + 1
    elsif (min(deg0, deg1) = deg1) and (δ1 ≠ 0)
        A2: g0(x, y) = g1(x, y) · δ0 + g0(x, y) · δ1
        A3: g1(x, y) = g1(x, y) · (x + x_i)
            deg1 = deg1 + 1
Output: g0(x, y), g1(x, y)

Algorithm 2 details the proposed modified Nielson's algorithm. The PU and PE stages are written in serial-processing style, as they will be used in the proposed architecture depicted in the next section. The coefficients of order j of the bivariate polynomials g0(x, y) and g1(x, y) are denoted g0,j and g1,j. The PU steps are described in A2 and A3, and the partial values of the PE are accumulated in ac0 and ac1. During the jth inner iteration only the coefficients of order j are updated and evaluated, so the proposed algorithm avoids accessing the memory twice to obtain the coefficients for evaluating the discrepancies and updating the polynomials.

Algorithm 2. Modified Nielson's interpolation
Input: 2t interpolation points from re-encoding (x_i, y_i)
Initialization: g0(x, y) = 1, g1(x, y) = y, deg0 = 1, deg1 = −1, δ0 = 1, δ1 = y_0
Interpolation:
For i = 0 to 2t − 1
    For j = 0 to i + 1
        if (min(deg0, deg1) = deg0) and (δ0 ≠ 0)
            A2: g1,j(x, y) = g1,j(x, y) · δ0 + g0,j(x, y) · δ1
            A3: g0,j(x, y) = g0,j(x, y) · x_i + g0,j−1(x, y)
            if (j = i + 1)
                deg0 = deg0 + 1, δ0 = ac0, δ1 = ac1
            else
                A1: ac0 = g0,j · x_{i+1}^j + ac0
                    ac1 = g1,j · x_{i+1}^j + ac1
        elsif (min(deg0, deg1) = deg1) and (δ1 ≠ 0)
            A2: g0,j(x, y) = g1,j(x, y) · δ0 + g0,j(x, y) · δ1
            A3: g1,j(x, y) = g1,j(x, y) · x_i + g1,j−1(x, y)
            if (j = i + 1)
                deg1 = deg1 + 1, δ0 = ac0, δ1 = ac1
            else
                A1: ac0 = g0,j · x_{i+1}^j + ac0
                    ac1 = g1,j · x_{i+1}^j + ac1
Output: g0(x, y), g1(x, y)

If the re-encoded points are arranged in descending order of reliability, the first 2t − η points are the same in all the 2^η test vectors. This implies that during the first 2t − η iterations Algorithm 2 is applied to the common points. Two techniques can be applied to compute the η most unreliable points efficiently: mapping them to a hypercube [9] or to a binary tree [7]. We use the binary-tree mapping in the proposed method. In each branch of the tree, one can choose to evaluate either the hard-decision point (yHD) or the second most reliable one (yHD2). This choice affects only the PE stage. If the polynomials are rewritten as gn(x, y) = an(x) + y · bn(x), for n = 0 and 1, where each an and bn has a maximum of (2t + 1) coefficients and an(x_i) and bn(x_i) are common to both the yHD_i and yHD2_i points, the number of iterations needed for the PE of the differing points can be reduced by evaluating the two points in parallel (this has a low impact on the hardware resources, as only one extra multiplier is needed to compute the discrepancy values of yHD_i and yHD2_i in parallel). Taking this into account, the latency resulting from the application of Nielson's algorithm to the last η points for the 2^η test vectors is Σ_{i=0}^{η−1} 2^(η−1−i) · (2t − i). On the other hand, during the last PU step, there is no need to cover the 2t + 1 coefficients of the polynomials an and bn, because if the
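As a sanity check on the m = 1, L = 1 update rules A2 and A3, the following Python sketch (our own illustration over a toy field GF(2^4), not the paper's hardware model; the handling of the case where only one discrepancy is nonzero is our assumption, since the listing above shows only the two main branches) runs the interpolation and verifies that both output polynomials vanish at every interpolation point. Polynomials are kept in the split form g(x, y) = a(x) + y·b(x) used later in the architecture:

```python
PRIM = 0b10011  # GF(2^4) reduction polynomial x^4 + x + 1

def gf_mul(a, b):
    # shift-and-reduce multiplication in GF(2^4)
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= PRIM
    return r

def poly_eval(p, x):
    # Horner evaluation, coefficients stored lowest order first
    r = 0
    for c in reversed(p):
        r = gf_mul(r, x) ^ c
    return r

def g_eval(g, x, y):
    a, b = g
    return poly_eval(a, x) ^ gf_mul(y, poly_eval(b, x))

def scale(p, s):
    return [gf_mul(c, s) for c in p]

def add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p)); q = q + [0] * (n - len(q))
    return [pc ^ qc for pc, qc in zip(p, q)]

def mul_x_plus(p, xi):
    # p(x) * (x + xi): new[j] = old[j-1] + xi*old[j], as in step A3
    return add([0] + p, scale(p, xi))

def interpolate(points):
    g = [([1], [0]), ([0], [1])]   # g0 = 1, g1 = y
    deg = [1, -1]                  # initial degrees as in Algorithm 1
    for xi, yi in points:
        d = [g_eval(gn, xi, yi) for gn in g]
        cand = [n for n in (0, 1) if d[n]]
        if not cand:
            continue                           # both already interpolate the point
        s = min(cand, key=lambda n: deg[n])    # minimum-degree nonzero-discrepancy poly
        o = 1 - s
        if d[o]:                               # step A2: linear combination
            g[o] = (add(scale(g[o][0], d[s]), scale(g[s][0], d[o])),
                    add(scale(g[o][1], d[s]), scale(g[s][1], d[o])))
        g[s] = (mul_x_plus(g[s][0], xi),       # step A3: multiply by (x + xi)
                mul_x_plus(g[s][1], xi))
        deg[s] += 1
    return g

points = [(1, 3), (2, 5), (4, 7), (8, 9)]
g0, g1 = interpolate(points)
```

In characteristic 2 the combination g·δ_s + g_s·δ evaluates to δ·δ_s + δ_s·δ = 0 at the new point, while multiplying the minimum-degree polynomial by (x + x_i) forces a root at x_i, which is exactly why both outputs pass through all the points.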


TABLE I HARDWARE REQUIREMENTS OF INTERPOLATOR ARCHITECTURES

Fig. 1. Polynomial update unit.

Fig. 2. Polynomial evaluation unit.

interpolator polynomial corrects t errors, it only requires the polynomial y · bn to be of order t. This is due to the fact that the Chien search step can find at most t roots with 2t parity symbols. With more roots, the interpolation process fails, and the decoder cannot recover the original message. Therefore, the total latency of the interpolation process for a complete received message is Σ_{a=1}^{2t−η+1} a + Σ_{i=0}^{η−1} 2^(η−i−1) · (2t − i) + 2t · 2^η if we update all the coefficients in the last iteration, and Σ_{a=1}^{2t−η+1} a + Σ_{i=0}^{η−1} 2^(η−i−1) · (2t − i) + t · 2^η if we update only the t necessary coefficients. At the end of the interpolation we obtain 2^η different polynomials.

IV. PROPOSED ARCHITECTURE FOR LOW-LATENCY INTERPOLATION FOR THE LCC DECODER

We describe here the proposed interpolator architecture based on our modified Nielson's algorithm presented in Section III. We have assumed η = 3 and the RS(255,239) code for the LCC decoder. The architecture consists of two main functional units: 1) the PU unit (Fig. 1) and 2) the PE unit (Fig. 2). For the sake of clarity of the schematics, the control signals are not shown.

The PU unit (Fig. 1) has four memory banks, R0, R1, R2 and R3, to store the coefficients of a0(x), y · b0(x), a1(x), and y · b1(x), so that each bank stores 2t = 16 8-bit coefficients for each test vector. Therefore, each bank requires 2^η = 8 RAMs of 2t × q = 16 × 8 bits each. It has six multipliers (M0 to M5) and four adders, along with the multiplexers needed to execute steps A2 and A3 of Algorithm 2 in parallel. In each cycle it reads four coefficients (one from each memory bank) and updates them. It stores the coefficients updated in the previous cycle, while the coefficients updated in the current cycle are stored in the next cycle. Since the memory accesses for read and write use the same address, we can use read-before-write single-port memories, which reduces the total RAM area and RAM access time. Dmin and Dis are outputs of the PE unit (Fig. 2); these registers take the value 0 or 1 depending on which of g0(x, y) and g1(x, y) is the polynomial of minimum degree.

The PE unit (Fig. 2) involves four multipliers (M6, M7, M8 and M9) and four adders to evaluate the common x-dependent polynomials, and four multipliers to evaluate all the y-dependent polynomials at different points in parallel (M10, M11, M12 and M13). Besides, it involves a bank of registers to store the discrepancies: REG0 for the discrepancies of g0(x, y) and REG1 for those of g1(x, y).

For the RS(255,239) code and η = 3, the control process during the first 2t − η = 13 interpolation points (the common points) is based on the worst-case growth of the polynomial. Therefore, for the nth iteration, we need n + 1 cycles for updating and 3 extra cycles for the propagation of the discrepancy to the last register. For the last η = 3 points (the differing points), the yHD and yHD2 discrepancies are computed at the same time in parallel; however, the updating of the coefficients is performed sequentially to take care of the data dependencies. First, we update with the yHD discrepancy, and after that we update with the yHD2 discrepancy. The updating of the differing points in the PU unit is not computed in parallel, because that would increase the hardware complexity significantly, unlike the PE unit, which only needs one extra multiplier to compute the yHD and yHD2 points in parallel.

We discuss here the control process of the architecture for the differing points, using the RS(255,239) code as an example. It is based on mapping the differing points onto a binary tree. Let 0 be used when a yHD point branch is selected and 1 when the yHD2 point branch is selected. For example, if we compute the test vector by selecting the branches of the tree yHD2_14 → yHD_15 → yHD_16, we describe it as 100. The control for these differing points is the following:

Processing for point 14: The δ0 and δ1 discrepancies are available from the previous cycle. The updating of the polynomial corresponding to the discrepancy for 0 is done in 15 cycles, and the result is stored in 4 test-vector memories. In three more cycles, the discrepancies corresponding to the 00 and 01 combinations are computed. The same process with the discrepancy for 1 is carried out to obtain the discrepancies for 10 and 11. The total latency of this step is 18 × 2 cycles.

Processing for point 15: The updating for 00 is done in 16 cycles and saved in 2 memory locations of different memory blocks. In three more cycles, the discrepancies for 000 and 001 are computed. The same reasoning applies to the other combinations 01, 10 and 11, computing 8 discrepancies in total. The total latency of this step is 19 × 4 cycles.

Processing for point 16: Due to the properties of the t roots mentioned in Section III, only t + 1 coefficients need to be updated.
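To illustrate why the binary-tree mapping pays off, the short Python sketch below (our own accounting, not the paper's exact cycle counts) compares the number of discrepancy evaluations when each tree node is evaluated once and shared by all test vectors in its subtree, versus evaluating every differing point of every test vector independently:

```python
def tree_evaluations(eta):
    # level i of the binary tree holds 2**(i+1) branch nodes (yHD/yHD2
    # choices), and each node needs one discrepancy evaluation
    return sum(2 ** (i + 1) for i in range(eta))

def naive_evaluations(eta):
    # evaluating every differing point of every test vector separately
    return eta * 2 ** eta

eta = 3
shared = tree_evaluations(eta)   # 2 + 4 + 8
naive = naive_evaluations(eta)   # 3 * 8
```

For η = 3 the shared scheme needs 14 node evaluations instead of 24, and the gap widens exponentially with η since each internal node serves a whole subtree of test vectors.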


TABLE II DECODERS EXTRACTED FROM [16]

TABLE III EVALUATION OF A COMPLETE DECODER WITH LOW-LATENCY RE-ENCODER AND INTERPOLATOR

This is performed in 9 cycles. The result of each test vector is stored in its designated memory. The total latency of this step is 9 × 8 cycles.

The total latency is 327 cycles, which has been confirmed by simulation of the VHDL model of the proposed architecture. Note that the number of arithmetic operators is independent of t and η for both the PE and PU units, and that the memory requirement amounts to 2^η × 4 RAMs of 2t × q bits.

V. COMPLEXITY EVALUATION AND COMPARISONS

Here, we evaluate and compare the hardware requirements of the proposed interpolation and decoder architectures based on the modified Nielson's algorithm.

A. Complexity Evaluation of Interpolation Architectures

The hardware resources required to implement the interpolator architecture based on the modified Nielson's algorithm for the LCC RS(255,239) decoder with η = 3 are summarized in Table I. To estimate the gate counts we have used the following xor-gate equivalents of the operators, as done in [8], [9], [13], [16]: a GF(2^8) adder consists of 8 xor gates, a GF(2^8) multiplier is implemented with 100 xor gates, a one-bit 2:1 multiplexer or a memory cell has the same area as 1 xor gate, and each register occupies about 3 times the area of an xor gate. It can be observed from Table I that the numbers of arithmetic resources, such as multipliers and adders, are the same in all the designs. The biggest difference is found in the memory requirements. On the other hand, the proposed architecture keeps the same critical path (one multiplexer, one multiplier and one adder) as the interpolator of [13]. Hence, it improves the throughput of the interpolation by 37.7%, since the total latency is reduced by the same percentage.

The proposed interpolator of Section IV was modeled in VHDL and implemented in a Virtex-II FPGA device. The implementation results show that it requires 806 slices and can operate at a clock frequency of 146.54 MHz. Therefore, it achieves a decoding throughput of 914 Mb/s.

B. Complexity Evaluation of a Complete LCC Decoder

Here, we evaluate the hardware resources required by a complete LCC decoder using the proposed interpolator architecture, and compare them with other published results. As a direct consequence of the latency reduction of the proposed interpolator, the latency of the rest of the processing blocks of the decoder can be reduced to the same value as that of the interpolator. Recently, a new parallel re-encoder architecture has been proposed in [15] which provides a latency of 290 clock cycles. The combination of this re-encoding with our low-latency interpolation yields a significant improvement of the throughput of the whole LCC decoder.

Table II includes the building blocks of two kinds of LCC decoders (evaluated in [16]) designed for an RS(255,239) code with η = 3: one called the re-encoded decoder and the other the factorization-free decoder. All of the elements of the re-encoded decoder are included in Table II. For the factorization-free decoder [16], the factorization and key-equation blocks are excluded, while the rest of the blocks remain the same. Both decoders have the same latency and critical path, so they have the same throughput as well.

In Table III, we list the hardware complexity of our proposed LCC decoder, whose area is (45614 − 39311)/39311 = 16% greater than that of the factorization-free decoder of [16] if we consider all the multipliers as regular ones. This increase in area could, however, be less than 16% if we take into account the fact that nearly one-fourth of the multipliers of the proposed design perform multiplications by constants. Besides, we have taken the area of the erasure decoder to be that of the re-encoder, which consists of an erasure decoder and some extra hardware. Compared with the re-encoded decoder, the proposed low-latency decoder involves (48479 − 45614)/48479 = 6% less area. The throughput of the proposed decoder is (528 − 327)/528 = 38% higher than that of the factorization-free decoder of [16] and the re-encoded decoder. The proposed decoder is (39311 × 528)/(45614 × 327) − 1 = 39.16% more efficient in terms of area-delay product than the factorization-free decoder [16], and (48479 × 528)/(45614 × 327) − 1 = 71.6% more efficient than the re-encoded decoder [16].
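The comparison figures above follow directly from the gate counts and latencies quoted in the text; the short script below (illustrative bookkeeping, using those quoted numbers) reproduces them:

```python
# xor-equivalent areas and latencies (cycles) quoted in the text
area_ff   = 39311   # factorization-free decoder [16]
area_re   = 48479   # re-encoded decoder [16]
area_prop = 45614   # proposed decoder
lat_old, lat_new = 528, 327

area_increase = (area_prop - area_ff) / area_ff               # ~16% larger
area_saving   = (area_re - area_prop) / area_re               # ~6% smaller
speedup       = (lat_old - lat_new) / lat_old                 # ~38% faster
ad_gain_ff    = (area_ff * lat_old) / (area_prop * lat_new) - 1   # ~39.16%
ad_gain_re    = (area_re * lat_old) / (area_prop * lat_new) - 1   # ~71.6%
```

Note that the area-delay gains combine a larger (or smaller) area with the shorter latency, which is why the proposed decoder comes out ahead of both baselines despite being larger than the factorization-free one.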


VI. CONCLUSION

We have formulated a modified Nielson's algorithm, which works with a different scheduling, takes care of the limited growth of the polynomials, and shares the common interpolation points, in order to reduce the latency of interpolation. Based on the proposed modified Nielson's algorithm, we have derived a low-latency interpolator architecture. An LCC decoder using our low-latency interpolator is found to be at least 39% more efficient in terms of area-delay product than the best of the previous works. The architecture has been implemented in a Virtex-II FPGA device, where it provides a throughput of 914 Mb/s using 806 slices.

REFERENCES
[1] R. Koetter and A. Vardy, "Algebraic soft-decision decoding of Reed-Solomon codes," IEEE Trans. Inf. Theory, vol. 49, no. 11, pp. 2809-2825, Nov. 2003.
[2] A. Ahmed, N. R. Shanbhag, and R. Koetter, "An architectural comparison of Reed-Solomon soft-decoding algorithm," Signals, Syst., Comput., pp. 912-916, 2006.
[3] W. J. Gross, F. R. Kschischang, R. Koetter, and P. G. Gulak, "Architecture and implementation of an interpolation processor for soft-decision Reed-Solomon decoding," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 3, pp. 309-318, Mar. 2007.
[4] X. Zhang, "Reduced complexity interpolation architecture for soft-decision Reed-Solomon decoding," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 10, pp. 1156-1161, Oct. 2006.
[5] Z. Wang and J. Ma, "High-speed interpolation architecture for soft-decision decoding of Reed-Solomon codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 9, pp. 937-950, Sep. 2006.
[6] J. Zhu and X. Zhang, "Efficient VLSI architecture for soft-decision decoding of Reed-Solomon codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 10, pp. 3050-3062, Nov. 2008.
[7] J. Bellorado and A. Kavcic, "A low-complexity method for Chase-type decoding of Reed-Solomon codes," in Proc. ISIT, Jul. 2006, pp. 2037-2041.
[8] X. Zhang and J. Zhu, "High-throughput interpolation architecture for algebraic soft-decision Reed-Solomon decoding," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 581-591, Mar. 2010.
[9] J. Zhu, X. Zhang, and Z. Wang, "Backward interpolation architecture for algebraic soft-decision Reed-Solomon decoding," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 11, pp. 1602-1615, 2009.
[10] T. K. Moon, Error Correction Coding: Mathematical Methods and Algorithms. Hoboken, NJ: Wiley, 2004.
[11] W. J. Gross, F. R. Kschischang, R. Koetter, and P. G. Gulak, "A VLSI architecture for interpolation in soft-decision list decoding of Reed-Solomon codes," J. VLSI Signal Process., vol. 39, no. 1-2, pp. 93-111, 2005.
[12] F. Parvaresh and A. Vardy, "Multiplicity assignments for algebraic soft-decoding of Reed-Solomon codes," in Proc. ISIT, 2003, p. 205.
[13] X. Zhang, "High-speed VLSI architecture for low-complexity Chase soft-decision Reed-Solomon decoding," in Proc. Inf. Theory Applic. Workshop, San Diego, CA, Feb. 2009.
[14] J. Ma, A. Vardy, and Z. Wang, "Reencoder design for soft-decision decoding of an (255,239) Reed-Solomon code," in Proc. IEEE Int. Symp. Circuits Syst., Island of Kos, Greece, May 2006, pp. 3550-3553.
[15] J. Zhu and X. Zhang, "High-speed re-encoder design for algebraic soft-decision decoding of Reed-Solomon codes," in Proc. IEEE Int. Symp. Circuits Syst., 2010, to appear.
[16] J. Zhu and X. Zhang, "Factorization-free low-complexity Chase soft-decision decoding of Reed-Solomon codes," in Proc. IEEE Int. Symp. Circuits Syst., May 2009, pp. 2677-2680.
