IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 18, NO. 6, JUNE 2010

A Novel Architecture for Block Interleaving Algorithm in MB-OFDM Using Mixed Radix System
Youngsun Han, Peter Harliman, Seon Wook Kim, Jong-Kook Kim, and Chulwoo Kim

TABLE I PARAMETERS FOR MB-OFDM INTERLEAVER [7]

Abstract: In this paper, we present a novel architecture of a block interleaver in MB-OFDM systems based on the Mixed Radix System (MRS). We prove mathematically that the proposed architecture can support bit permutations in the interleaving process. The hierarchical property of our proposed MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems with a simple modular design. Furthermore, the design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The latency of our architecture is as low as six MB-OFDM symbols. In addition, compared with the conventional approach, our proposed architecture reduces the implementation complexity by 85.5%, 69.4%, and 40.3% for 80, 200, and 480 Mb/s data rates, respectively, while improving the maximum operating clock frequency by more than 3.3 times over the conventional design. We also show that the power consumption is reduced by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.

Index Terms: Array processor, block interleaving, MB-OFDM, Mixed Radix System (MRS).

Manuscript received April 14, 2008; revised October 24, 2008; accepted March 09, 2009. First published August 04, 2009; current version published May 26, 2010. This work was supported in part by the Ubiquitous Computing and Network (UCN) Project, Knowledge and Economy Frontier R&D Program of the Ministry of Knowledge Economy (MKE) in Korea as a result of UCN's subproject 09C1-C2-30S, and by the University & Industrial Coordinate R&D Program of the Small and Medium Business Administration in Korea.
Y. Han, S. W. Kim, J.-K. Kim, and C. Kim are with the School of Electrical Engineering, Korea University, Seoul 136-701, Korea (e-mail: youngsun@korea.ac.kr; seon@korea.ac.kr; jongkook@korea.ac.kr; ckim@korea.ac.kr).
P. Harliman is with LG Digital Media Company, Seoul 150-721, Korea (e-mail: harliman@lge.com).
Digital Object Identifier 10.1109/TVLSI.2009.2018091

I. INTRODUCTION

MB-OFDM [1] has been widely used as one of the de facto standards for Ultra Wide Band (UWB) communication. MB-OFDM supports various data rates with low power consumption, as shown in Table I. These performance requirements make the implementation of an MB-OFDM processor difficult and challenging for developers.

An interleaver reorders an input bit sequence in a non-contiguous way in order to improve robustness against burst errors in transmission [2]-[6]. The interleaver in MB-OFDM consists of three sequential sub-processes: symbol interleaving, tone interleaving, and cyclic shift. The sub-processes are defined by the following equations [7]:

$$a_S[i] = a\left[\left\lfloor \frac{i}{N_{CBPS}} \right\rfloor + \frac{6}{N_{TDS}} \times \mathrm{mod}(i, N_{CBPS})\right] \tag{1}$$

$$a_T[i] = a_S\left[\left\lfloor \frac{i}{N_{Tint}} \right\rfloor + 10 \times \mathrm{mod}(i, N_{Tint})\right] \tag{2}$$

$$b[i] = a_T\left[m(i) \times N_{CBPS} + \mathrm{mod}\left(i + m(i) \times N_{cyc},\ N_{CBPS}\right)\right] \tag{3}$$

where $i$ is an index for bit sequences with a range of $0 \le i < (6/N_{TDS}) \times N_{CBPS}$ in (1) and (3), and $0 \le i < N_{CBPS}$ in (2). The symbol $a$ is an input bit into the interleaver, $a_S$ is an output of the symbol interleaver, $a_T$ is an output of the tone interleaver, $b$ is an output of the cyclic shift, and $m(i)$ is $\lfloor i/N_{CBPS} \rfloor$. $N_{CBPS}$, $N_{TDS}$, $N_{Tint}$, and $N_{cyc}$ are constant values depending on the data rate of the MB-OFDM system. $\lfloor \cdot \rfloor$ denotes the floor function, and $\mathrm{mod}(m, n) = m - n \times \lfloor m/n \rfloor$.

In conventional designs [8], the three sub-processes are implemented separately, with an embedded memory as an interface buffer to keep temporary results from each process. Because the sub-processes must be executed serially [9], the latency and throughput of an interleaver are determined by the sum of the corresponding figures for each sub-process. A pipelined architecture can be employed as an alternative to resolve the latency issue and increase the throughput, but it incurs much higher complexity and power dissipation than a non-pipelined architecture. Tell et al. [10] proposed an interleaver design with permutation units and a memory. The architecture employed two modulo permutation units to reorder the bit sequences read out from and written into the memory, respectively. The memory was designed as a special matrix memory block in order to enable multiple data to be written as rows and read as columns with multi-port I/O.

MRS [11] is a popular numeral system in which the numerical base (radix) varies from one position to another. It has been used in a large domain of applications, such as the fast Fourier transform (FFT) [12], parallel processor architectures [13], and Residue Number System (RNS) conversions [14]-[16]. In an n-Radix $\mathrm{MRS}(p_n \mid \cdots \mid p_1)$, a number $X$ in the Decimal Number System (DNS) is expressed as

$$X = \langle a_n \mid a_{n-1} \mid \cdots \mid a_0 \rangle \tag{4}$$

where

$$X = \sum_{m=0}^{n} a_m w_m \tag{5}$$

$$w_m = \begin{cases} \prod_{i=1}^{m} p_i, & \text{for } 0 < m \le n \\ 1, & \text{for } m = 0 \end{cases} \tag{6}$$

where $p_i$ are the radices, $w_m$ are the weight values, and $a_m$ are the mixed-radix digits. Note that the MRS expression is identical to the DNS expression if all $p_i = 10$.

In this paper, we derive a mathematical relationship between MRS and the interleaving processes. We also propose a novel array processor for the interleaving processes using MRS. The proposed architecture uses a structure of 2-D array processors with shift operations and simple interconnections, where each processor consists of only two 1-bit memory cells and four 1-bit multiplexers. The hierarchical property of our MRS-based design methodology allows the proposed architecture to support all the required data rates in MB-OFDM systems when we apply a simple modular design technique. Furthermore, the design used for the interleaver can also be used for de-interleaving, which reduces the implementation complexity significantly. The performance analysis shows that our design achieves better performance than the conventional designs [8] in terms of latency, hardware complexity, power consumption, and maximum operating clock frequency.

This paper is organized as follows. In Section II, we derive a mathematical relationship between MRS and the interleaving processes, and we describe the architecture of our MRS-based interleaver. The implementation details of the architecture are shown in Section III, and the performance is analyzed in Section IV. Finally, the conclusion is given in Section V.

II. INTERLEAVING/DE-INTERLEAVING PROCESSOR FOR MB-OFDM

A. Interleaving/De-Interleaving With MRS
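As background for the derivation in this subsection, the mixed-radix representation of (4)-(6) can be sketched in Python. This is only an illustration: the function names are hypothetical, and the radix list is ordered least significant first ($p_1$ first), which is a convention chosen for this sketch rather than anything fixed by the text.

```python
from math import prod


def weights(radices):
    """Weights w_0..w_n of (6) for radices [p_1, ..., p_n]."""
    return [prod(radices[:m]) for m in range(len(radices) + 1)]


def to_mrs(x, radices):
    """Mixed-radix digits [a_0, ..., a_n] of x; the top digit a_n is unbounded."""
    digits = []
    for p in radices:
        x, a = divmod(x, p)   # peel off the digit belonging to radix p
        digits.append(a)
    digits.append(x)          # whatever remains is the most significant digit
    return digits


def from_mrs(digits, radices):
    """Evaluate (5): X is the weighted sum of the digits."""
    return sum(a * w for a, w in zip(digits, weights(radices)))


# Position 47 in MRS(10|3), i.e. p_1 = 3 and p_2 = 10.
digits = to_mrs(47, [3, 10])
```

With `[10, 10]` as radices the digits are simply the decimal digits of the number, which matches the remark that MRS coincides with DNS when every radix is 10.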
The $i$th element of the DNS sequence, denoted $D$ here, can be represented as the matrix $R_{p_1}$ in $\mathrm{MRS}(p_1)$ by the following:

$$D[i] = R_{p_1}\left[\lfloor i/p_1 \rfloor,\ \mathrm{mod}(i, p_1)\right] \tag{7}$$

where $i$ in DNS is represented as $\langle \lfloor i/p_1 \rfloor \mid \mathrm{mod}(i, p_1) \rangle$ in $\mathrm{MRS}(p_1)$. Since $i = \lfloor i/p_1 \rfloor \times p_1 + \mathrm{mod}(i, p_1)$, the matrix in $\mathrm{MRS}(p_1)$ can be derived by the following:

$$R_{p_1}[k, j] = D[k \times p_1 + j] \tag{8}$$

where $0 \le j < p_1$, $0 \le k < M/p_1$, and $M$ is the length of the DNS sequence $D$.

Assume that the matrix $R_{p_1}$ in $\mathrm{MRS}(p_1)$ is transposed into a new matrix $R'_{p_1'}$ in $\mathrm{MRS}(p_1')$, where $p_1' = M/p_1$, before being transformed back to its DNS representation:

$$R'_{p_1'}[j, k] = R_{p_1}[k, j]. \tag{9}$$

From (7)-(9), a new form of the DNS sequence, $D'[i]$, can be generated by the following in $\mathrm{MRS}(p_1')$:

$$D'[i] = R'_{p_1'}\left[\lfloor i/p_1' \rfloor,\ \mathrm{mod}(i, p_1')\right] \tag{10}$$

where $R'_{p_1'}[j, k]$ is the digit-reversal MRS in the new radix $p_1'$. Finally, we obtain the relationship between the new DNS sequence and the original DNS sequence from (8)-(10) as the following:

$$D'[i] = R_{p_1}\left[\mathrm{mod}(i, p_1'),\ \lfloor i/p_1' \rfloor\right] = D\left[\lfloor i/p_1' \rfloor + p_1 \times \mathrm{mod}(i, p_1')\right] \tag{11}$$

It can be seen that the derived (11) has the same form as (1) and (2), but differs from (3). Specifically, (1) is (11) with $p_1 = 6/N_{TDS}$ and $p_1' = N_{CBPS}$, and (2) is (11) with a radix of 10 and $p_1' = N_{Tint}$. This common form enables us to implement the interleaving/de-interleaving permutations by transposing the $p_1 \times p_1'$ matrix in MRS with the two moduli $p_1$ and $p_1'$. In the following two sub-sections, we show how the MRS permutations are mapped into our interleaver architecture for MB-OFDM.

B. Architecture for MRS Modulo Permutation

Fig. 1 shows an array processor consisting of $M$ cells, which interleaves $M$ bits through a modular operation with modulus $p_1$. The array processor consists of a 2-D array with size $(M/p_1) \times p_1$, and each cell is connected with four adjacent cells: inputs from the lower and right adjacent cells, and outputs to the upper and left adjacent cells. For example, cell (1,1) receives its input from cells (1,2) and (2,1), and sends its output to cells (1,0) and (0,1). With this structure, the processor is able to transfer the incoming bits in both the horizontal (from right to left) and vertical (from lower to upper) directions.

In the case of interleaving in $\mathrm{MRS}(p_1)$, the processor transfers the incoming data horizontally, along the solid line, from $\mathrm{IN}_{encode}$ toward the left until it reaches the final cell (0,0) at the upper left corner. Each individual bit at the $X$th position in the input stream is placed on a unique cell $(a_1, a_0)$ in the processor, where $X = a_1 \times p_1 + a_0$, as shown in (8). After the first bit of the input stream arrives at the final cell (0,0), the flow direction changes from horizontal to vertical, so that bit positions in $\mathrm{MRS}(p_1)$ are transposed into new positions in $\mathrm{MRS}(M/p_1)$, as shown in (9). Finally, the processor produces the interleaved output from $\mathrm{OUT}_{encode}$ along the dotted line. It transforms bit positions in $\mathrm{MRS}(M/p_1)$ into interleaved bit positions in DNS, as shown in (10). De-interleaving can be performed in a similar way using the same architecture, but the process starts from $\mathrm{IN}_{decode}$ vertically along the dotted line, and the de-interleaved bits are output horizontally along the solid line.

Fig. 1. Interleaving/de-interleaving processor in $\mathrm{MRS}(p_1)$.

C. Architecture for MB-OFDM

As mentioned in Section I, the interleaving process in an MB-OFDM system consists of three consecutive permutations, (1)-(3). As shown in Fig. 2(a), the first two consecutive modulo permutations (1) and (2) are expressed as a series of the following three sub-processes: symbol interleaving in 1-Radix $\mathrm{MRS}(p_1)$ for an $M$-cell block, division of the output from the symbol interleaving into $p_1$ sub-blocks of $M/p_1$ cells each, and tone interleaving in 1-Radix $\mathrm{MRS}(p_2)$ for each sub-block. Finally, the last permutation (3) is expressed as a process that cyclically shifts the bit sequence in each sub-block, interleaved by the previous two permutations, with $N_{cyc}$ from Table I.

Fig. 2(b) shows our architecture in 2-Radix $\mathrm{MRS}(p_2 \mid p_1)$ for an MB-OFDM system, which combines all three sub-processes for the modulo permutations of (1) and (2) into a single process. It divides the whole block in DNS into $p_1$ differently colored sub-blocks in $\mathrm{MRS}(p_1)$, and divides each sub-block in $\mathrm{MRS}(p_1)$ into $p_2$ sub-sub-blocks in $\mathrm{MRS}(p_2 \mid p_1)$ at a time. Each sub-sub-block is represented as a vertical line in Fig. 2(b). All the operations are supported merely by arranging the differently colored sub-sub-blocks alternately and extending some wire connections. For example, the $X$th bit position in DNS will be transformed into $\langle a_2 \mid a_1 \mid a_0 \rangle$ in $\mathrm{MRS}(p_2 \mid p_1)$ on the 2-D array, where $X = (a_2 \times p_2 + a_1) \times p_1 + a_0$. In Fig. 2(b), $\langle a_2 \mid a_1 \mid a_0 \rangle$ is represented as cell $(a_2, a_1)$ with color $a_0$.

The processor performs interleaving/de-interleaving in a similar way to the 1-Radix MRS processor in Fig. 1. In the interleaving process, the input bits from $\mathrm{IN}_{encode}$ are moved along the solid line until they reach the final cell (0,0) with white color. After the first arrival at the final cell, the processor produces the interleaved output along the dotted line through $\mathrm{OUT}_{encode}$. At this time, all cells with the same color are processed to the end before the output is taken from another color. $A_i$ is used to connect the final cell of one color with the first cell of another color. Likewise, the de-interleaving process starts from $\mathrm{IN}_{decode}$ along the dotted line, and the de-interleaved bits are produced through $\mathrm{OUT}_{decode}$ along the solid line.

Finally, in order to perform the cyclic shift shown in (3), we modified some wire connections and added some multiplexers to the processor. Fig. 3 shows the cyclic shift among three bit sequences, which are classified by the result of the modulo operation $X \bmod p_1$, where $X$ is the position of each bit in the original input bit sequence. By taking the bit sequences from the $(N_{cyc} \times 0)$th, $(N_{cyc} \times 1)$th, and $(N_{cyc} \times 2)$th positions, respectively, each of the bit sequences is cyclically shifted. The start point (s) of the second bit sequence is connected to the end point (e) of the first bit sequence, and the third start point is connected to the second end point. These connections complete the cyclic shift.

Fig. 2. Interleaver architecture for MB-OFDM. (a) General representation for interleaving processes. (b) Our interleaving processor in $\mathrm{MRS}(p_2 \mid p_1)$.

Fig. 3. Design of the cyclic shift.

III. HARDWARE IMPLEMENTATION

TABLE II PARAMETERS OF BLOCK INTERLEAVING FOR MB-OFDM

Table II shows the parameter values of the block interleaving for an MB-OFDM system. The first column represents the data rates supported by the system. The second column represents the block size for each data rate, which determines the number of cells in the array processor. The other columns show the first modulus $p_1$ and the second modulus $p_2$, which are used for the two consecutive modular operations in the symbol and tone interleaving processes.
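To make the role of the block size and the two moduli concrete, the permutations (1)-(3) can be written as a small software reference model and compared against the matrix transposition of (8)-(11). The sketch below is not the authors' implementation. The function names are hypothetical, and the parameter values used for the 80 Mb/s mode ($N_{CBPS} = 100$, $N_{TDS} = 2$, $N_{Tint} = 10$, $N_{cyc} = 33$) are assumptions recalled from the WiMedia parameter table, which is not reproduced in this text, so they should be checked against the specification.

```python
def transpose_perm(seq, p):
    """Apply (11): out[i] = seq[i // q + p * (i % q)], where q = len(seq) // p."""
    q = len(seq) // p
    return [seq[i // q + p * (i % q)] for i in range(len(seq))]


def transpose_via_matrix(seq, p):
    """Same permutation, written as the matrix transposition of (8)-(10)."""
    q = len(seq) // p
    rows = [seq[k * p:(k + 1) * p] for k in range(q)]        # (8): q rows, p columns
    return [rows[k][j] for j in range(p) for k in range(q)]  # (9)-(10): read by columns


def interleave(a, n_cbps, n_tds, n_tint, n_cyc):
    """Reference model of (1)-(3) for one block of (6 / n_tds) * n_cbps bits."""
    p1 = 6 // n_tds
    assert len(a) == p1 * n_cbps and n_cbps == 10 * n_tint
    a_s = transpose_perm(a, p1)                               # (1) symbol interleaving
    a_t = []
    for m in range(p1):                                       # (2) tone interleaving per sub-block
        a_t += transpose_perm(a_s[m * n_cbps:(m + 1) * n_cbps], 10)
    return [a_t[(i // n_cbps) * n_cbps + (i + (i // n_cbps) * n_cyc) % n_cbps]
            for i in range(len(a_t))]                         # (3) cyclic shift


# Assumed parameters for the 80 Mb/s mode (not taken from this text).
block = list(range(300))
b = interleave(block, n_cbps=100, n_tds=2, n_tint=10, n_cyc=33)
```

Because each stage is a transposition, applying `transpose_perm` again with the two moduli swapped restores the original order, which is the property that lets one structure serve both interleaving and de-interleaving.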

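The shift behavior of the array processor in Section II-B can also be modeled in a few lines of Python. This is a behavioral sketch under one stated assumption: the ends of the rows and columns are treated as wired into two linear chains over the same cells, a row-major chain for the horizontal (solid-line) path and a column-major chain for the vertical (dotted-line) path, both ending at cell (0,0). The exact edge wiring is in Fig. 1, which is not reproduced here, and the function name is hypothetical.

```python
def array_processor(bits, p1, decode=False):
    """Shift a block in along one chain and out along the other."""
    m = len(bits)
    q = m // p1
    # cells[r * p1 + c] models cell (r, c) of the (M/p1) x p1 array.
    horizontal = list(range(m))                                   # row-major chain
    vertical = [r * p1 + c for c in range(p1) for r in range(q)]  # column-major chain
    chain_in, chain_out = (vertical, horizontal) if decode else (horizontal, vertical)

    cells = [None] * m
    for bit in bits:                          # one clock per input bit
        for dst, src in zip(chain_in, chain_in[1:]):
            cells[dst] = cells[src]           # every stored bit advances one cell
        cells[chain_in[-1]] = bit             # the new bit enters at the far end

    out = []
    for _ in range(m):                        # one clock per output bit
        out.append(cells[chain_out[0]])       # cell (0, 0) drives the output
        for dst, src in zip(chain_out, chain_out[1:]):
            cells[dst] = cells[src]
    return out


x = list(range(12))
y = array_processor(x, 3)                     # interleave with modulus 3
z = array_processor(y, 3, decode=True)        # de-interleave on the same structure
```

After the input phase, bit $X$ sits in cell $(\lfloor X/p_1 \rfloor, \mathrm{mod}(X, p_1))$, and reading the cells column by column reproduces the index mapping of (11). Swapping the input and output chains gives the inverse permutation.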

A. Modular Design for Various Data Rates

For consecutive symbol and tone interleaving with moduli $p_1$ and $p_2$, we employed the architecture in Fig. 2(b). However, in order to provide a modular design for easy implementation, we do not directly use that architecture in our real implementation. Fig. 4 shows the schematic diagram of the real implementation, which supports all the data rates shown in Table II. The hardware consists of three parts, (A), (B), and (C), and each part consists of several cell processors, multiplexers, and wire connections. With each added part, the supported block size grows to 300, 600, and 1200 bits, respectively. The implementation is performed by duplicating part (A) into (B) and part (A + B) into (C). The multiplexers located between (A) and (B) are used to configure part (A) to be executed alone for data rates up to 80 Mb/s, or the combined part (A + B) to be executed for the other data rates up to 200 Mb/s. Instead of using $\mathrm{MRS}(10 \mid 6)$, we combined two array processors, (A + B) and (C), each with 600 cells in $\mathrm{MRS}(10 \mid 3)$. Additionally, a controller is added to enable the two array processors to operate alternately every 3 bits, according to the first modulus $p_1$, while receiving input bits at data rates above 200 Mb/s. It then produces the interleaved bit sequence by concatenating the output streams from the two processors. Through this controller, our implementation provides the same functionality as the array processor in $\mathrm{MRS}(10 \mid 6)$. Finally, in order to support the cyclic shift shown in Fig. 3, the multiplexers in solid-line circles are added, and some wire connections are changed.

Fig. 4. Schematic diagram of an array processor to support 300, 600, and 1200 bit block sizes for all data rates in Table II.

B. Cell Design for the Minimum Latency

Fig. 5 shows a schematic of each cell in the array processor. Each cell consists of two 1-bit memory cells (in our implementation, we used a flip-flop) and four multiplexers. V_OUT and V_IN are used for the vertical movement of the bit stream, while H_OUT and H_IN are used for the horizontal movement. The roles of the flip-flops alternate depending on the control signal SELECT. Each cell processes the interleaved bit sequence through one flip-flop for the duration of six symbols and through the other flip-flop for the next six symbols. One flip-flop is used to store a new input bit while the other flip-flop is used for the output. By using this approach, our architecture does not need any additional delay other than six MB-OFDM symbols, which is the minimum required latency for the interleaving processes. As shown in (1)-(3), the interleaving algorithm is performed in units of six MB-OFDM symbols. Hence, in order to produce the interleaved bit stream, a latency of at least six MB-OFDM symbols is required.

Fig. 5. Schematic of each cell in Fig. 4.

IV. PERFORMANCE ANALYSIS

In order to study the performance advantage of the proposed architecture, we implemented it in Verilog HDL and then synthesized it with Xilinx ISE, targeting a Xilinx Virtex-4 XC4VLX100-10FF1148 FPGA [17]. We used a conventional interleaver design [8] as the baseline architecture against which to compare the performance of our proposed architecture. The conventional architecture combined multiple permutations into pipelined processes in order to increase the throughput. It performs each permutation sequentially, and it uses embedded memory to keep the temporary results for each permutation process [9].

Table III shows the performance of the proposed architecture and the conventional architecture in terms of maximum operating clock frequency, hardware complexity, and latency. Since our architecture results in a different hardware implementation for different maximum data rates, we compare three of our hardware implementations, for 80, 200, and 480 Mb/s data rates (200 and 480 Mb/s are the mandatory and maximum data rates of an MB-OFDM system, respectively), with the conventional architecture, which supports all of the data rates in Table I using the same hardware. The proposed architecture reduces the hardware complexity by 85.5% for 80 Mb/s, 69.4% for 200 Mb/s, and 40.3% for 480 Mb/s data rates, while improving the maximum allowed clock frequency by more than 3.3 times. The maximum clock frequency of our architecture is about 500 MHz. In addition, our architecture incurs a latency of six MB-OFDM symbols across all three processes of the interleaver, while the conventional architecture requires a latency of eight MB-OFDM symbols. This latency difference is due to the fact that the conventional architecture executes all the sub-processes sequentially, while our architecture performs them at the same time.

TABLE III PERFORMANCE AND COMPLEXITY OF CONVENTIONAL AND PROPOSED ARCHITECTURES

Table IV shows a power consumption comparison between the conventional architecture and our proposed architecture. The operating clock frequency is set to 132 MHz as in [1]. Power consumption was estimated using the Xilinx XPower tool. The inputs are assumed to toggle continuously in order to get the worst-case estimate of the toggle rate in the circuit. The proposed architecture reduces the power consumption of the conventional design by 54.5% in clock power, 88.3% in logic power, and 88.1% in signal power for the 80 Mb/s data rate. For the 200 Mb/s data rate, the proposed architecture consumes only about 24.6% of the logic power of the conventional one. This is reasonable because the proposed architecture uses only 30.6% of the logic elements used in the conventional one. In total, the proposed architecture consumes only around 26.4% of the power of the conventional one. Meanwhile, for the 480 Mb/s data rate, the proposed design consumes only 60.2% of the total power of the conventional one, due to its 20.0% saving in logic power consumption.
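The "consumes only" figures quoted above and the "reduced by" figures quoted in the abstract are two views of the same numbers. The short check below makes that arithmetic explicit. It uses only percentages stated in the text, and the dictionary keys are labels chosen for this sketch.

```python
# Remaining share relative to the conventional design, as quoted in the text (percent).
remaining = {
    "logic elements, 200 Mb/s": 30.6,
    "total power, 200 Mb/s": 26.4,
    "total power, 480 Mb/s": 60.2,
}

# A design that keeps s percent of a cost has reduced it by (100 - s) percent.
reduction = {name: round(100.0 - share, 1) for name, share in remaining.items()}
```

The resulting reductions agree with the 69.4% complexity saving at 200 Mb/s and the 73.6% and 39.8% power savings at 200 and 480 Mb/s stated elsewhere in the paper.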


TABLE IV POWER CONSUMPTION OF CONVENTIONAL AND PROPOSED ARCHITECTURES

V. CONCLUSION

In this paper, a mathematical relationship between the interleaving processes and MRS was derived. Based on the derivation, we proposed an array processor architecture that supports the interleaving processes efficiently. The performance analysis demonstrates the benefits of our proposed architecture in terms of latency, complexity, and power consumption over the conventional approach. The latency of our architecture is six MB-OFDM symbols, which is the minimum. We also reduced the complexity by 85.5% for 80 Mb/s and 69.4% for 200 Mb/s, while improving the maximum allowed clock frequency by more than 3.3 times compared with the conventional approach. For 480 Mb/s, the complexity was reduced by 40.3% compared with the conventional one. In addition, we reduced the power consumption by 87.4%, 73.6%, and 39.8% for 80, 200, and 480 Mb/s, respectively.

REFERENCES
[1] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster, and A. Dabak, "Design of a multiband OFDM system for realistic UWB channel environments," IEEE Trans. Microw. Theory Tech., vol. 52, no. 9, pp. 2123-2138, Sep. 2004.
[2] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.
[3] J. L. Ramsey, "Realization of optimum interleavers," IEEE Trans. Inf. Theory, vol. IT-16, no. 3, pp. 338-344, May 1970.
[4] G. D. Forney, Jr., "Burst-correcting codes for the classic bursty channel," IEEE Trans. Commun. Technol., vol. COM-19, no. 5, pp. 772-780, Oct. 1971.
[5] K. Andrews, C. Heegard, and D. Kozen, "A theory of interleavers," presented at the IEEE Int. Symp. Inf. Theory, 1997, 97-1634.
[6] R. Garello, G. Montorsi, and G. C. Sergio Benedetto, "Interleaver properties and their applications to the trellis complexity analysis of turbo codes," IEEE Trans. Commun., vol. 49, no. 5, pp. 793-807, May 2001.
[7] WiMedia Alliance, "MAC-PHY Interface Specification 1.0," 2005. [Online]. Available: http://www.wimedia.org
[8] J. Kim, "Interleaver & Deinterleaver for MB-OFDM," Advanced System IC Technology Center (ASTEC), Jul. 2007. [Online]. Available: http://www.astec.re.kr:8080/ipSoC/ipInfo.jsp?ipno=576&leftimage=4
[9] X. Jinsong, L. Xiaochun, W. Haitao, B. Yujing, Z. Decai, Z. Xiaolong, and W. Chaogang, "Implementation of MB-OFDM transmitter baseband based on FPGA," in Proc. Int. Conf. Circuits Syst. Commun., May 2008, pp. 50-54.
[10] E. Tell and D. Liu, "A hardware architecture for a multi mode block interleaver," in Proc. Int. Conf. Circuits Syst. Commun., Jun. 2004.
[11] D. F. Miller and W. S. McCormick, "An arithmetic free parallel mixed-radix conversion algorithm," IEEE Trans. Circuits Syst. II, vol. 45, no. 1, pp. 158-162, Jan. 1998.
[12] B. G. Jo and M. H. Sunwoo, "New continuous-flow mixed-radix (CFMR) FFT processor using novel in-place strategy," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 5, pp. 911-919, May 2005.
[13] J. M. Camara, M. Moreto, E. Vallejo, R. Beivide, J. Miguel-Alonso, C. Martinez, and J. Navaridas, "Mixed-radix twisted torus interconnection networks," in Proc. Int. Par. Distr. Processing Symp., Mar. 2007, pp. 1-10.
[14] W. K. Jenkins and E. J. Altman, "Self-checking properties of residue number error checkers based on mixed radix conversion," IEEE Trans. Circuits Syst., vol. 35, no. 2, pp. 159-167, Feb. 1988.
[15] P. V. A. Mohan and A. B. Premkumar, "RNS-to-binary converters for two four-moduli sets $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} - 1\}$ and $\{2^n - 1, 2^n, 2^n + 1, 2^{n+1} + 1\}$," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 6, pp. 1245-1254, Jun. 2007.
[16] M. Akkal and P. Siy, "A new mixed radix conversion algorithm MRC-II," J. Syst. Arch., vol. 53, no. 9, pp. 577-586, Sep. 2007.
[17] Xilinx, San Jose, CA, "Virtex-4 Multi-Platform FPGA," 2006. [Online]. Available: http://www.xilinx.com/products
