
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts

for publication in the IEEE GLOBECOM 2005 proceedings.

Optimized Software Implementation of a Full-Rate
IEEE 802.11a Compliant Digital Baseband
Transmitter on a Digital Signal Processor
Yiyan Tang, Lie Qian, and Yuke Wang
Department of Computer Science
University of Texas at Dallas
Richardson, TX, 75080, USA
{yiyan, lqian, yuke}@utdallas.edu
Abstract: The explosive growth of 802.11-based wireless LANs
has attracted interest in providing higher data rates and greater
system capacities. Among the IEEE 802.11 standards, the 802.11a
standard, based on the OFDM modulation scheme, has been defined
to address high-speed and large-system-capacity challenges.
Hardware implementations are often used to meet the high-data-rate
requirements of the 802.11a standard. Although software-based
solutions are more attractive because of their lower cost, shorter
development time, and higher flexibility, it is still a challenge to
meet the high-data-rate requirements of 802.11a in software. In
this paper, we implement a software-based 802.11a digital
baseband transmitter on the TI TMS320C64x DSP. The
transmitter can operate over all data rates defined in the 802.11a
standard and is compatible with the high-rate portions of the
802.11g standard. Two major optimizations are used in the
software implementation to achieve the high data rate: 1)
parallelizing the scrambler function and 2) concatenating the
FEC encoder, puncturing, and interleaver functions.
Experimental results show that the optimized software
implementation on a single C64x DSP with a clock frequency of
1.0GHz can operate at a maximum of 136Mbits/s, which is
twice as fast as the previous software implementation at the same
clock frequency.
Keywords: IEEE 802.11a, digital baseband transmitter, digital
signal processor, software implementation

I. INTRODUCTION

Owing to the low cost and high data rates of IEEE 802.11-based
Wireless Local Area Networks (WLANs), their popularity is
growing exponentially. There are three major physical layer
standards available in the 802.11 family: the Complementary
Code Keying (CCK)-based 802.11b [1], the Orthogonal
Frequency Division Multiplexing (OFDM)-based 802.11a [2], and
the OFDM-based 802.11g [3]. The 802.11b standard uses the
2.4GHz band and supports data rates of 1, 2, 5.5, and 11
Mbits/s. The 802.11a standard operates in the 5GHz band with
possible data rates of 6, 9, 12, 18, 24, 36, 48, and 54 Mbits/s.
The 802.11g standard, released in 2003, operates in the 2.4GHz
band and supports all the data rates defined in the 802.11a and
802.11b standards. For the higher data rates, the 802.11g
standard uses the same OFDM technology as 802.11a, while
backward compatibility is added to support the lower data rates
of 802.11b [4].
To support the high-data-rate requirements in the 802.11a
and 802.11g standards, application specific integrated circuit
(ASIC) [5][6] and field programmable gate array (FPGA)
[7][8] designs have been used. However, hardware-based
implementations often lack flexibility and the hardware
development cycle is onerous. On the other hand,
software-based implementations enable elegant reuse of silicon
area and dramatically reduce time-to-market through software
modification, but are typically much slower than hardware
implementations based on comparable technologies. An
existing software implementation of a fully-compliant 802.11a
full-rate digital baseband transmitter requires a 22-processor
array running at a 1.0GHz clock frequency to reach
54Mbits/s performance [9].
Digital signal processors (DSPs) are a special class of
processor optimized for signal-processing applications in
communication systems. Although DSPs have been used to
implement the 802.11a standard [10], they can only support
limited data rates due to the lack of global parallelism found at
the application level [9]. Hence, it is still a major challenge to
develop a software implementation for the 802.11a standard on
a DSP to meet the high-data-rate requirements.
In this paper, we present a software-based 802.11a digital
baseband transmitter implementation on the TI TMS320C64x
DSP. The transmitter can operate over all data rates defined in
the 802.11a standard and is compatible with the high-rate
portions of the 802.11g standard. Two major optimizations
have been introduced to explore the parallelism within and
between the individual functions of the transmitter to achieve
the high-data-rate requirements: 1) parallelizing the scrambler
function and 2) concatenating the FEC encoder, puncturing,
and interleaver functions. Experimental results show that the
optimized software implementation on a single C64x DSP with
a clock frequency of 1.0GHz can operate at a maximum of
136Mbits/s, which is twice as fast as the software
implementation in [9] at the same clock frequency.
In the following, Section 2 introduces the digital baseband
transmitter defined in the 802.11a standard. The details of the
optimized software implementation are described in Section 3.
Section 4 presents the experimental results. Finally, the
conclusions are drawn in Section 5.

0-7803-9415-1/05/$20.00 2005 IEEE

Authorized licensed use limited to: UNIVERSITI UTARA MALAYSIA. Downloaded on October 24, 2009 at 08:10 from IEEE Xplore. Restrictions apply.
II. 802.11A DIGITAL BASEBAND TRANSMITTER

The OFDM modulation scheme used in 802.11a distributes
the data over 52 subcarriers in a 20MHz channel to mitigate
the effects of multipath. Among the 52 subcarriers, 48 carry
data and 4 carry pilot signals used for tracking. Each
subcarrier is 312.5kHz wide, giving raw data rates from
125kbits/s to 1.125Mbits/s per subcarrier depending on the
modulation type (binary phase shift keying (BPSK),
quaternary PSK (QPSK), 16-quadrature amplitude modulation
(QAM), or 64-QAM) and the error-correcting code rate (1/2,
2/3, or 3/4). The composite signal therefore has a data rate
ranging from 6Mbits/s to 54Mbits/s in the 20MHz channel
[11]. Table 1 lists the mode-dependent parameters for the
802.11a standard.

TABLE I. MODE-DEPENDENT PARAMETERS FOR 802.11A

Data rate   Modulation  Code      Coded bits per  Coded bits per  Data bits per
(Mbits/s)   type        rate (R)  subcarrier      OFDM symbol     OFDM symbol
                                  (NBPSC)         (NCBPS)         (NDBPS)
6           BPSK        1/2       1               48              24
9           BPSK        3/4       1               48              36
12          QPSK        1/2       2               96              48
18          QPSK        3/4       2               96              72
24          16-QAM      1/2       4               192             96
36          16-QAM      3/4       4               192             144
48          64-QAM      2/3       6               288             192
54          64-QAM      3/4       6               288             216

The block diagram of the digital baseband transmitter defined
in the 802.11a standard is shown in Fig. 1. The transmitter
produces one OFDM symbol at a time based on the parameters in
Table 1. The input bit stream is first randomized by a scrambler
and encoded by a convolutional encoder at a coding rate of 1/2.
Puncturing is used to obtain code rates other than 1/2. The bit
stream is then interleaved and mapped to complex numbers
representing frequency domain signals of the OFDM
subcarriers based on modulation rules. After the pilot signals
are inserted, an Inverse Fast Fourier Transform (IFFT) is
performed to convert the frequency domain signals to time domain
signals. Finally, the resulting time domain signals are cyclically
extended to form the guard interval for each OFDM symbol.

Figure 1. Block diagram for the digital baseband transmitter

Fig. 2 shows the scrambler structure defined in the 802.11a
standard. The value of the bit string {x7,x6,x5,x4,x3,x2,x1} is
called the state of the scrambler. Given a non-zero initial state,
the scrambler generates a pseudo-random sequence used to
randomize the input bit stream.

Figure 2. Structure of the scrambler

After the input bits are scrambled, they are encoded by a
convolutional encoder of rate R = 1/2 with the industry standard
generator polynomials G0 = 133 and G1 = 171 (both in octal), as
shown in Fig. 3. The value of the bit string {Z5,Z4,Z3,Z2,Z1,Z0}
is called the state of the convolutional encoder. For each input
bit, the output bit denoted by X in Fig. 3 is output before the
bit denoted by Y.

Figure 3. Convolutional encoder in 802.11a standard

Because the rate of the encoder is fixed at 1/2, the output bit
stream of the encoder must be punctured to obtain the code
rates of 2/3 and 3/4, as shown in Fig. 4.

Figure 4. Puncturing patterns to obtain 2/3 and 3/4 code rates

The output bits from the puncturing block are interleaved
by two interleavers. The first interleaver is a bit-wise block
interleaver with 16 rows and NCBPS/16 columns that accepts
NCBPS bits at a time. Fig. 5 shows the structure of the block
interleaver and an example of a block interleaver for NCBPS =
48.

Figure 5. Operation of the block interleaver for NCBPS = 48

The second interleaver is only used by QAM modulation
types. It takes NCBPS bits from the first interleaver at a time
and operates based on the following equation. The i-th input bit
to the second interleaver, where i = 0, 1, ..., NCBPS - 1, is
moved to the position j:

$$ j = s \left\lfloor \frac{i}{s} \right\rfloor + \left( i + N_{CBPS} - \left\lfloor \frac{16\, i}{N_{CBPS}} \right\rfloor \right) \bmod s \qquad (1) $$

where s = max(NBPSC/2, 1) and the floor function returns
the largest integer not exceeding its argument.
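As an illustration of Eq. (1), the following is a minimal C sketch of the second interleaver, not the code used in this paper. It assumes the bracketed term is NCBPS minus floor(16 x i/NCBPS), as in the 802.11a standard, and stores one bit per array element for readability. The function names second_interleaver_index and second_interleave are hypothetical.

```c
#include <assert.h>

/* Second-permutation index from Eq. (1): the bit at position i (after the
 * first interleaver) moves to position j.  n_bpsc and n_cbps are the NBPSC
 * and NCBPS values of Table 1.  Integer division implements floor() here
 * because all operands are non-negative. */
int second_interleaver_index(int i, int n_bpsc, int n_cbps)
{
    int s = (n_bpsc / 2 > 1) ? n_bpsc / 2 : 1;   /* s = max(NBPSC/2, 1) */
    assert(i >= 0 && i < n_cbps);
    return s * (i / s) + (i + n_cbps - (16 * i) / n_cbps) % s;
}

/* Apply the permutation to one OFDM symbol, one bit per array element. */
void second_interleave(const unsigned char *in, unsigned char *out,
                       int n_bpsc, int n_cbps)
{
    for (int i = 0; i < n_cbps; i++)
        out[second_interleaver_index(i, n_bpsc, n_cbps)] = in[i];
}
```

For BPSK and QPSK, s equals 1 and the index is returned unchanged, which is consistent with the second interleaver being needed only for the QAM modulation types. For 16-QAM (NCBPS = 192), positions 12 and 13 are swapped.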
Every NCBPS output bits of the interleavers are converted
into 48 complex numbers by the modulation mapping function for
the chosen modulation type. Hence, each complex number
represents NBPSC output bits of the interleavers.
Before the complex numbers are processed by the 64-point
IFFT, four complex numbers representing the pilot signals are
inserted and the resulting 52 complex numbers are extended to
the 64 complex inputs of the IFFT. The four complex numbers for
the pilot signals can be fetched from a predefined sequence.
Fig. 6 illustrates the composition of the 64 complex IFFT inputs.
The IFFT is performed based on the following equation:

$$ x(n) = \frac{1}{64} \sum_{k=0}^{63} X(k)\, W_{64}^{kn} \qquad (2) $$

where X(k) and x(n) are complex numbers, $j^2 = -1$, and
$W_{64}^{kn} = e^{j 2 \pi n k / 64}$.

Figure 6. The composition of the IFFT inputs

Figure 7. Operation of the scrambler for three consecutive input bits

Figure 8. Parallelized scrambler to generate three bits at a time

Finally, the 64 complex outputs of the IFFT are extended to
form an array of 80 complex numbers by copying the last 16
outputs as the guard interval prior to the first IFFT output.
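The guard interval step can be sketched in C as follows. This is only an illustration: the cplx16 sample type (16-bit real and imaginary parts) and the function name add_guard_interval are assumptions, not names taken from the implementation described in this paper.

```c
#include <assert.h>
#include <string.h>

#define IFFT_SIZE   64
#define GUARD_SIZE  16
#define SYMBOL_SIZE (GUARD_SIZE + IFFT_SIZE)   /* 80 samples per symbol */

/* One complex time-domain sample; 16-bit parts are an assumption that
 * matches a 16-bit fixed-point IFFT. */
typedef struct { short re, im; } cplx16;

/* Copy the last 16 IFFT outputs in front of the 64 IFFT outputs. */
void add_guard_interval(const cplx16 ifft_out[IFFT_SIZE],
                        cplx16 symbol[SYMBOL_SIZE])
{
    assert(ifft_out != 0 && symbol != 0);
    memcpy(symbol, ifft_out + IFFT_SIZE - GUARD_SIZE,
           GUARD_SIZE * sizeof(cplx16));
    memcpy(symbol + GUARD_SIZE, ifft_out, IFFT_SIZE * sizeof(cplx16));
}
```

After the call, symbol[0] to symbol[15] hold IFFT outputs 48 to 63, and symbol[16] to symbol[79] hold IFFT outputs 0 to 63.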
III. OPTIMIZED SOFTWARE IMPLEMENTATION

Constrained by the sequential execution model, a software
implementation of the IEEE 802.11a transmitter on a single-chip
VLIW DSP requires a high number of instructions per cycle to
achieve the required data rate [9]. Instead of looking for a more
powerful DSP with a higher clock frequency, we develop our
software implementation with two major optimizations that
exploit parallelism within and between the individual functions
of the transmitter: 1) parallelizing the scrambler function and 2)
concatenating the FEC encoder, puncturing, and interleaver
functions.
A. Parallelizing the Scrambler Function
The scrambler in Fig. 2 processes one input bit at a time.
By observing the operation of the scrambler defined in the
802.11a standard for consecutive input bits, we found that three
consecutive output bits can be generated based solely on the
current state of the scrambler, as shown in Fig. 7.
Based on this observation, we parallelize the scrambler to
take three consecutive input bits at a time and generate three
output bits, as shown in Fig. 8.


The corresponding C code to compute three consecutive
output bits based on the current state of the scrambler is shown
in Fig. 9, where all variables are declared as unsigned integer
type. The current state of the scrambler is stored in the seven
most significant bits (MSBs) of sc_state and the three
consecutive output bits are collected in tmp_block. The _bitr( )
function performs bit order reversal on a 32-bit variable, which
is necessary because the C64x processor uses little-endian bit
order and the first output bit is the least significant bit (LSB) of
tmp_block. Since all values of NDBPS in Table 1 are divisible
by three, the parallelized scrambler works for all data rates
defined in Table 1 without a bit alignment problem.
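The idea can be sketched in portable C as follows. This is not the code of Fig. 9: it assumes the 802.11a scrambler polynomial x^7 + x^4 + 1, keeps the state in the seven LSBs of an unsigned integer (bit 6 holds x7, bit 0 holds x1) instead of the MSB-aligned sc_state, and does not use the _bitr( ) intrinsic. All function names are hypothetical.

```c
#include <assert.h>

/* Bit-serial reference: one input bit per call.  The feedback bit is
 * x7 XOR x4 and is shifted into x1. */
unsigned scramble_bit(unsigned *state, unsigned in_bit)
{
    assert(in_bit <= 1u);
    unsigned fb = ((*state >> 6) ^ (*state >> 3)) & 1u;
    *state = ((*state << 1) | fb) & 0x7Fu;
    return in_bit ^ fb;
}

/* Three bits per call.  in3 holds the inputs with the earliest bit in
 * bit 2.  The three feedback bits x7^x4, x6^x3, and x5^x2 depend only on
 * the current state, so they are formed with one shift-and-XOR. */
unsigned scramble_3bits(unsigned *state, unsigned in3)
{
    assert(in3 <= 7u);
    unsigned fb3 = ((*state >> 4) ^ (*state >> 1)) & 7u;
    *state = ((*state << 3) | fb3) & 0x7Fu;
    return in3 ^ fb3;
}

/* Returns 1 when the serial and the parallel version produce the same
 * output and the same state for n_blocks blocks of three bits. */
int scramblers_agree(unsigned seed, unsigned n_blocks)
{
    unsigned s1 = seed & 0x7Fu, s3 = seed & 0x7Fu;
    for (unsigned b = 0; b < n_blocks; b++) {
        unsigned in3 = (b * 5u + 3u) & 7u;       /* arbitrary test pattern */
        unsigned ref = 0;
        for (int k = 2; k >= 0; k--)
            ref = (ref << 1) | scramble_bit(&s1, (in3 >> k) & 1u);
        if (ref != scramble_3bits(&s3, in3) || s1 != s3)
            return 0;
    }
    return 1;
}
```

With an all-ones initial state and all-zero input, the first nine output bits are 000 011 101, which agrees with the start of the 127-bit sequence given in the 802.11a standard.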

Figure 9. Corresponding C code of the parallelized scrambler

B. Concatenating the FEC Encoder, Puncturing, and Interleaver Functions
To explore parallelism between the individual FEC
encoder, puncturing, and interleaver functions, we first
parallelize the FEC encoder function, then concatenate it with
the puncturing function, and finally parallelize the concatenated
FEC encoder and puncturing function again to concatenate it
with the interleaver function.
1) Parallelizing the FEC Encoder Function
The operation of the convolutional encoder function in Fig.
3 can be represented as logically ANDing two bit masks with
the input bit stream, one bit mask having the value M0 = 133
(octal) to generate X and the other bit mask having the value
M1 = 171 (octal) to generate Y. The output bits X and Y are
generated by counting the number of 1 bits in the ANDed results,
the output bit being 1 when there is an odd number of 1 bits
and 0 otherwise. Hence, the convolutional encoder in Fig. 3
can be redrawn as in Fig. 10, where bit masks are used to
represent the generator polynomials.
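A minimal C sketch of this bit mask view is given below. The window layout (current input bit in bit 6, the six previous input bits below it with the most recent in bit 5) and the names parity7 and conv_encode_bit are illustrative assumptions, not the layout of Fig. 10.

```c
#include <assert.h>

#define MASK_X 0133u   /* M0: generator G0 = 133 octal */
#define MASK_Y 0171u   /* M1: generator G1 = 171 octal */

/* Parity of a 7-bit value: 1 when the number of 1 bits is odd. */
static unsigned parity7(unsigned v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Encode one input bit.  *state holds the six previous input bits, most
 * recent in bit 5.  The return value packs X in bit 1 and Y in bit 0. */
unsigned conv_encode_bit(unsigned *state, unsigned in_bit)
{
    assert(in_bit <= 1u && *state < 64u);
    unsigned window = (in_bit << 6) | *state;    /* current bit + history */
    unsigned x = parity7(window & MASK_X);
    unsigned y = parity7(window & MASK_Y);
    *state = window >> 1;                        /* drop the oldest bit */
    return (x << 1) | y;
}
```

Feeding a single 1 bit followed by zeros into a zero state returns the generator coefficients themselves: X takes the values 1, 0, 1, 1, 0, 1, 1 and Y the values 1, 1, 1, 1, 0, 0, 1.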

Figure 10. Bit mask representation of the FEC encoder

Figure 12. Concatenating the FEC encoder, puncturing, and interleaver functions for BPSK, R = 3/4

By observation, the convolutional encoder generates output
bits independent of each other, i.e., generating an output bit of
the convolutional encoder does not depend on any previous
output bits. Hence, more than one input bit can be processed by
the convolutional encoder at a time by using more than two bit
masks. For example, the convolutional encoder in Fig. 11 can
generate 12 output bits at a time for 6 consecutive input bits by
using 12 bit masks.
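The 12-mask case can be sketched in C as follows, under an assumed bit layout: the six new input bits sit above the six state bits in one 12-bit window, and the masks for the k-th input bit are the two generator masks shifted left by k. The function names are hypothetical, and a loop computes the shifted masks for clarity where an optimized version would more likely use a constant mask table.

```c
#include <assert.h>

/* Parity of a 12-bit value: 1 when the number of 1 bits is odd. */
static unsigned parity12(unsigned v)
{
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Encode six input bits with twelve masks.  in6 holds the inputs with the
 * earliest bit in bit 0; *state holds the six previous input bits, most
 * recent in bit 5.  Output bit 2k is X and bit 2k+1 is Y for input k. */
unsigned conv_encode_6bits(unsigned *state, unsigned in6)
{
    assert(in6 < 64u && *state < 64u);
    unsigned window = (in6 << 6) | *state;       /* 12 bits: inputs + state */
    unsigned out = 0;
    for (int k = 0; k < 6; k++) {
        out |= parity12(window & (0133u << k)) << (2 * k);
        out |= parity12(window & (0171u << k)) << (2 * k + 1);
    }
    *state = window >> 6;                        /* the last six inputs */
    return out;
}
```

No output bit depends on another output bit, so the twelve parity computations can be scheduled in any order or in parallel.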

Figure 11. Concatenating the parallelized convolutional encoder with the puncturing function for a code rate of 1/2

2) Concatenating the Interleaver with the FEC Encoder and Puncturing Function
Since the concatenated function in Fig. 11 generates output
bits independent of each other, we can reorder the bit masks to
generate output bits in the output order of the interleaver shown
in Fig. 5 instead of the input order. In this way, the interleaver
function is concatenated with the convolutional encoder and
puncturing functions. For example, given the concatenated
function in Fig. 11 and the interleaver in Fig. 5, Fig. 12
concatenates the convolutional encoder, puncturing, and
interleaver functions for BPSK with code rate 3/4 by
parallelizing the concatenated function in Fig. 11.
In Fig. 12, 48 output bits are generated for 36 input bits at a
time. Each row of three bit masks generates output bits
corresponding to one row in the interleaver. Since all values of
NDBPS in Table 1 with code rate 3/4 are divisible by 36, the
concatenated function in Fig. 12 can be used for all cases in
which the code rate is 3/4.
The corresponding C code to compute the concatenated
FEC encoder, puncturing, and interleaver function is shown in
Fig. 13, where the first 8 rows (row0 to row7) of output bits in
Fig. 12 are generated. The remaining 8 rows of output bits
(row8 to row15) can be generated in the same way.

Figure 13. Concatenating the FEC encoder, puncturing, and interleaver functions for BPSK, R = 3/4

In Fig. 13, the six LSBs of cc_state store the state of the
convolutional encoder, cc_mask0 and cc_mask1 contain one
row of the M0 and M1 bit masks in Fig. 13 respectively, and
input_block contains the input bit stream. The variables
cc_tmp0 to cc_tmp7 collect the output bits corresponding to the
first eight rows of the interleaver in Fig. 13.
The concatenated functions for the rate 1/2 and rate 2/3 cases
can be constructed in a similar way to the R = 3/4 case.
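The concatenation for BPSK with R = 3/4 can be illustrated with the following C sketch, which is not the code of Fig. 13. It builds one 64-bit mask per output bit, already in interleaver output order, and checks the result against a step-by-step reference. Three conventions are assumptions taken from the 802.11a standard rather than from the figures: rate-3/4 puncturing keeps X0 Y0 X1 Y2 out of every X0 Y0 X1 Y1 X2 Y2, the first interleaver moves bit p to position 3(p mod 16) + floor(p/16) for NCBPS = 48, and bit k of in36 is the k-th input bit. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_IN  36   /* input bits per call (NDBPS for BPSK, R = 3/4) */
#define N_OUT 48   /* output bits per call (NCBPS for BPSK) */

static unsigned parity64(uint64_t v)
{
    v ^= v >> 32; v ^= v >> 16; v ^= v >> 8;
    v ^= v >> 4;  v ^= v >> 2;  v ^= v >> 1;
    return (unsigned)(v & 1u);
}

/* One mask per output bit.  Window bit 6+k is input bit k and window
 * bits 5..0 are the encoder state, so the masks for input k are the
 * generator masks shifted left by k.  p counts punctured output bits. */
void build_masks(uint64_t mask[N_OUT])
{
    int p = 0;
    for (int k = 0; k < N_IN; k++) {
        if (k % 3 != 2) {                        /* X is dropped every third bit */
            mask[3 * (p % 16) + p / 16] = (uint64_t)0133 << k;
            p++;
        }
        if (k % 3 != 1) {                        /* Y is dropped one bit earlier */
            mask[3 * (p % 16) + p / 16] = (uint64_t)0171 << k;
            p++;
        }
    }
    assert(p == N_OUT);
}

/* Encode, puncture, and interleave 36 input bits in a single pass. */
void encode_symbol(const uint64_t mask[N_OUT], unsigned *state,
                   uint64_t in36, unsigned char out[N_OUT])
{
    uint64_t window = (in36 << 6) | *state;
    for (int j = 0; j < N_OUT; j++)
        out[j] = (unsigned char)parity64(window & mask[j]);
    *state = (unsigned)(window >> N_IN) & 0x3Fu;
}

/* Step-by-step reference: encode bit by bit, puncture, then interleave. */
void reference_symbol(unsigned *state, uint64_t in36,
                      unsigned char out[N_OUT])
{
    unsigned char punctured[N_OUT];
    int p = 0;
    for (int k = 0; k < N_IN; k++) {
        unsigned w = ((unsigned)((in36 >> k) & 1u) << 6) | *state;
        if (k % 3 != 2) punctured[p++] = (unsigned char)parity64(w & 0133u);
        if (k % 3 != 1) punctured[p++] = (unsigned char)parity64(w & 0171u);
        *state = w >> 1;
    }
    for (p = 0; p < N_OUT; p++)
        out[3 * (p % 16) + p / 16] = punctured[p];
}
```

Because each output bit is the parity of the window ANDed with its own mask, reordering the mask table is enough to absorb both the puncturing and the interleaving; no separate pass over the coded bits is needed.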
IV. EXPERIMENTAL RESULTS

The optimized software implementation of the 802.11a
transmitter was developed on the TI TMS320C64x DSP, which
is a fixed-point DSP with an enhanced VLIW (Very Long
Instruction Word) architecture. The C64x DSP has eight
functional units that are capable of completing at most eight
operations in parallel at a maximum clock frequency of 1GHz
[12].


The functions of the transmitter are written in C and
compiled in Code Composer Studio (CCS) v3.0 with the
maximum compiler optimization effort toward execution speed.
The IFFT function is implemented by the DSP_fft16x16t function
from the TI C64x DSP library [13]. Table 2 shows the
performance of the individual functions of the transmitter in
terms of the clock cycles needed to compute each function.
Assuming a 1GHz clock frequency, the data rate in megabits per
second (Mbits/s) is computed as:

$$ \text{Data rate (Mbits/s)} = \frac{1000\,\text{MHz}}{\text{Cycles} / N_{DBPS}\ \text{(bits)}} \qquad (3) $$
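Eq. (3) can be evaluated with a small C helper such as the one below. The function name data_rate_mbps and its parameters are illustrative and do not come from the implementation.

```c
#include <assert.h>

/* Eq. (3): data rate in Mbits/s for a transmitter that needs `cycles`
 * clock cycles per OFDM symbol carrying `n_dbps` data bits, with the
 * clock frequency given in MHz. */
double data_rate_mbps(double clock_mhz, unsigned cycles, unsigned n_dbps)
{
    assert(cycles > 0u);
    return clock_mhz * (double)n_dbps / (double)cycles;
}
```

For example, 1582 cycles per OFDM symbol with NDBPS = 216 at 1000MHz gives about 136.5Mbits/s, which is well above the 54Mbits/s required for 64-QAM with R = 3/4.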
TABLE II. PERFORMANCE OF THE OPTIMIZED SOFTWARE IMPLEMENTATION

Clock cycles of the individual functions:

Modulation  Code      Scrambler  Concatenated FEC      Second       Modulation  Pilot      IFFT
type        rate (R)  (cycles)   encoder, puncturing,  interleaver  mapping     insertion  (cycles)
                                 interleaver (cycles)  (cycles)     (cycles)    (cycles)
BPSK        1/2       72         93                    Bypassed     280         44         388
BPSK        3/4       115        106                   Bypassed     280         44         388
QPSK        1/2       138        123                   Bypassed     311         44         388
QPSK        3/4       196        139                   Bypassed     311         44         388
16-QAM      1/2       218        165                   82           301         44         388
16-QAM      3/4       332        201                   82           301         44         388
64-QAM      2/3       405        242                   113          306         44         388
64-QAM      3/4       468        263                   113          306         44         388

The transmitter as a whole:

Modulation  Code      Required data   Cycles  Mbits/s
type        rate (R)  rate (Mbits/s)
BPSK        1/2       6               877     28
BPSK        3/4       9               933     38
QPSK        1/2       12              1004    48
QPSK        3/4       18              1078    66
16-QAM      1/2       24              1198    80
16-QAM      3/4       36              1348    106
64-QAM      2/3       48              1498    128
64-QAM      3/4       54              1582    136

From Table 2, the software implementation on a single
C64x DSP with a clock frequency of 1GHz can operate at a
maximum of 136Mbits/s while satisfying all the data rate
requirements defined in the 802.11a standard.

V. CONCLUSION

In this paper, we have implemented a software-based
802.11a digital baseband transmitter on the TI TMS320C64x
DSP. The transmitter can operate over all data rates defined in
the 802.11a standard and is compatible with the high-rate
portions of the 802.11g standard. Two major optimizations
have been performed in the software implementation to achieve
the high data rate: 1) parallelizing the scrambler function and
2) concatenating the FEC encoder, puncturing, and interleaver
functions. Experiments show that the software implementation
on a single C64x DSP at a clock frequency of 1GHz can
operate at a maximum of 136Mbits/s, which is twice as fast as
the previous software implementation at the same clock
frequency.

REFERENCES

[1] IEEE 802.11b-1999, "Wireless LAN medium access control (MAC) and
physical layer (PHY) specifications: High speed physical layer
extension in the 2.4 GHz band," 1999.
[2] IEEE 802.11a-1999, "Wireless LAN medium access control (MAC) and
physical layer (PHY) specifications: High speed physical layer in the 5
GHz band," 1999.
[3] IEEE 802.11g-2003, "Wireless LAN medium access control (MAC) and
physical layer (PHY) specifications: Further high speed physical layer
extension in the 2.4 GHz band," June 2003.
[4] W. Stallings, "IEEE 802.11: wireless LANs from a to n," IT
Professional, Vol. 6, Issue 5, pp. 32-37, Sept.-Oct. 2004.
[5] R. Ahola et al., "A single-chip CMOS transceiver for 802.11a/b/g
wireless LANs," IEEE Journal of Solid-State Circuits, Vol. 39, Issue 12,
pp. 2250-2258, Dec. 2004.
[6] K. Vavelidis et al., "A dual-band 5.15-5.35-GHz, 2.4-2.5-GHz 0.18-um
CMOS transceiver for 802.11a/b/g wireless LAN," IEEE Journal of
Solid-State Circuits, Vol. 39, Issue 7, pp. 1180-1184, July 2004.
[7] P. Coulton and D. Carline, "An SDR inspired design for the FPGA
implementation of 802.11a baseband system," Proc. of IEEE
International Symposium on Consumer Electronics, pp. 470-475, Sept.
1-3, 2004.
[8] C. Dick and F. Harris, "FPGA implementation of an OFDM PHY,"
Proc. of The Thirty-Seventh Asilomar Conference on Signals, Systems &
Computers, Vol. 1, pp. 905-909, Nov. 9-12, 2003.
[9] M.J. Meeuwsen, O. Sattari, and B.M. Baas, "A full-rate software
implementation of an IEEE 802.11a compliant digital baseband
transmitter," Proc. of IEEE Workshop on Signal Processing Systems
(SIPS 2004), pp. 124-129, Oct. 13-15, 2004.
[10] M.F. Tariq, Y. Baltaci, T. Horseman, M. Butler, and A. Nix,
"Development of an OFDM based high speed wireless LAN platform
using the TI C6x DSP," Proc. of IEEE International Conference on
Communications (ICC), Vol. 1, pp. 522-526, April 28-May 2, 2002.
[11] T.H. Meng, B. McFarland, D. Su, and J. Thomson, "Design and
implementation of an all-CMOS 802.11a wireless LAN chipset," IEEE
Communications Magazine, Vol. 41, Issue 8, pp. 160-168, Aug. 2003.
[12] Texas Instruments, "DSP Selection Guide," SSDV004P, Feb. 2005.
[13] Texas Instruments, "TMS320C64x DSP Library Programmer's
Reference," SPRU565A, April 2002.