Abstract—Synchronization is a critical operation required by a wireless local area network (WLAN) receiver. This paper presents the design, implementation, and evaluation of an IEEE 802.11a packet synchronizer for deployment on a Xilinx Virtex-5 field-programmable gate array (FPGA) device. Packet detection, carrier frequency offset estimation/correction, and time synchronization are all achieved by processing samples before the fast Fourier transform (FFT) computation on the receiver. The design is fully synchronous, operates at 20 MHz in a single clock domain, and particular attention is paid to the precision of the arithmetic computations. FPGA implementation results are reported and the performance of the design is evaluated by simulation under different additive white Gaussian noise (AWGN) channel conditions generated from MATLAB simulations.

Index Terms—IEEE 802.11a, OFDM, Wi-Fi, Xilinx Virtex-5.

I. INTRODUCTION

IEEE 802.11a [1] is a widely used standard in many wireless local area network (WLAN) communication systems that achieves transmission rates of up to 54 Mbps and operates in the industrial, scientific, and medical (ISM) 5.8 GHz unlicensed radio band. IEEE 802.11a uses orthogonal frequency-division multiplexing (OFDM) [2] for its physical layer (PHY), and it is well known that the performance of an OFDM system is degraded if there is any loss of orthogonality between sub-carriers or inter-symbol interference (ISI).

For the very large-scale integration (VLSI) implementation of the 802.11a PHY, one of the most complicated operations is synchronization. In the presence of channel noise, the synchronizer on the receiver must be able to
• detect an incoming packet,
• estimate and correct any carrier frequency offset, and
• perform accurate time synchronization.
To allow for the robust implementation of these three operations, IEEE 802.11a [1] specifies that all packets begin with a preamble (training sequence) formed of periodic symbols, and to reduce the probability of ISI and increase the robustness to multipath fading, a cyclic prefix is added to each OFDM data symbol. Field-programmable gate arrays (FPGAs) offer an attractive platform for OFDM baseband processing because modern devices are inexpensive, reconfigurable, include built-in, high-performance digital signal processing (DSP) slices in their fabrics, and offer quick time-to-market.

In this paper, we present the design, implementation, and evaluation of an IEEE 802.11a PHY receiver synchronizer VLSI architecture on a Xilinx Virtex-5 FPGA device [3]. We present the proposed architecture in detail, including the precision of the arithmetic computations, assuming that two 14-bit, two's complement analog-to-digital converters (ADCs) provide the input samples to the FPGA. The architecture is described using VHDL [4] and implementation results are reported. The performance of the architecture is evaluated under different additive white Gaussian noise (AWGN) channel models generated by MATLAB simulations. We note that although our work is for IEEE 802.11a, it can be easily translated to other packet-oriented communication standards that employ OFDM, such as HIPERLAN/2 [5].

Some publications that have contributed to WLAN synchronizer design are as follows. In [6], a method for estimating the carrier frequency offset in the frequency domain is presented. In [7], a method for detecting a packet and an improved carrier frequency estimation technique in the time domain are proposed. In [8], the performance of different cross-correlation methods for time synchronization is evaluated. In [9], a multiplierless cross-correlation strategy is presented, where quantized versions of the training sequence samples are used, reducing the area complexity while maintaining acceptable performance. In [10], a different quantized cross-correlation approach is presented that relies only on shifts to perform the multiplications. In [11], some modifications to the computations in [7] are presented, as well as the application of the frequency offset estimations in the top-level synchronizer design. In [12], a synchronizer architecture with an automatic gain control (AGC) circuit is presented. Finally, a Master's thesis, [13], presents the design and evaluation of a WLAN synchronizer on FPGA.

The remainder of this paper is organized as follows. In Section II, the preliminaries are reviewed. In Section III, our design is presented. In Section IV, our results are reported. In Section V, the conclusion and future work are given.

II. PRELIMINARIES

A. IEEE 802.11a Packet Format

The IEEE 802.11a OFDM packet format is illustrated in Figure 1. For synchronization, the preamble (training sequences) is of interest. There are two training sequences, the short training sequence (STS) and the long training sequence (LTS). The STS consists of ten copies of a 16-sample sequence, while the LTS consists of a 32-sample cyclic prefix followed by two copies of a 64-sample sequence. The received analog signal is sampled by a pair of ADCs operating at 20 MHz, so the duration of a sample is $T_s = 50$ ns. The STS and the LTS each span 160 samples and are therefore each 8 µs in length. Typically, the STS symbols are used for AGC, packet detection, coarse carrier frequency offset estimation, and coarse timing synchronization, while the LTS symbols are used for fine carrier frequency offset estimation and fine timing synchronization [?].
[Figure 1: IEEE 802.11a OFDM packet format. Short training symbols t1 to t10 (10 × 0.8 = 8 µs); GI2 followed by long training symbols T1 and T2 (1.6 + 2 × 3.2 = 8 µs); GI and signal field (0.8 + 3.2 = 4 µs); OFDM data symbols, each preceded by a GI (0.8 + 3.2 = 4 µs).]
B. Selected Xilinx Virtex-5 FPGA Device

The Xilinx Virtex-5 xc5vlx110t-1ff1136 device [3] is selected as the target FPGA on which the synchronizer architecture is deployed. In terms of available resources, the xc5vlx110t-1ff1136 device contains 17,280 slices, 64 DSP48E slices, 148 dual-port 36 Kb random access memory (RAM) blocks, and 680 I/O pins. Each slice contains four look-up tables (LUTs) and four flip-flops (FFs). The DSP48E slices have an embedded signed 18-bit by 25-bit multiplier, adder, and accumulator. The block RAMs can be instantiated using the Xilinx primitive component RAMB36. RAMB36 instances are dual port, highly configurable in terms of addressing modes, and have a maximum bus width of 36 bits per port. The numerous I/O pins contain flip-flops which can sample the input signals or latch the output signals.

III. METHODOLOGY

In this section, we present the design of the packet detector, the carrier frequency offset estimator and corrector, and the time synchronizer. Afterward, we explain how they function together to form the entire synchronizer. For each block, we present the related mathematics and then provide its design for implementation on FPGA. In terms of notation, we let $r[n] = r_{re}[n] + i\,r_{im}[n]$ denote the $n$-th, 14-bit, two's complement, complex pair of samples output from the ADCs, and $N_s = 16$ and $N_l = 64$ denote the number of samples in an STS and LTS symbol, respectively. Note that we have completed the entire design of the synchronizer in the time domain, i.e., before the FFT operation on the OFDM receiver, thereby simplifying the high-level control logic on the receiver.

A. Packet Detector

The primary role of the packet detector in this synchronizer is to alert the other blocks that the current samples are part of the STS. The packet detector architecture presented in this paper uses the popular STS delayed auto-correlation/average power thresholding technique proposed in [7]. Various implementations of this approach can be found in [12], [13], [14], [15], [16], [17], [18], [19], and [20].

Mathematics: We opted to implement the thresholding technique across one STS symbol, with the delayed auto-correlation computation defined as

$$R[n] = \sum_{m=0}^{N_s-1} \bar{r}[n+m] \cdot r[n+m+N_s] \tag{1}$$

where $\bar{r}[n+m]$ denotes the complex conjugate of $r[n+m]$. The average power computation is defined as

$$P[n] = \sum_{m=0}^{N_s-1} \left| r[n+m] \right|^2. \tag{2}$$

Then, the STS of a packet can be detected when the ratio

$$M[n] = \frac{\left| R[n] \right|^2}{\left( P[n] \right)^2} \tag{3}$$

crosses a predetermined threshold, i.e., $M[n] > th_{pd}$ [7]. It is known that (1) and (2) can be realized using a sliding window to reduce the area complexity of the computation [7]. For the windowing implementation of (1), its form is

$$R[n+1] = R[n] - \bar{r}[n-N_s] \cdot r[n] + \bar{r}[n] \cdot r[n+N_s] \tag{4}$$

and the approach is similar for (2), i.e.,

$$P[n+1] = P[n] - \left| r[n-N_s] \right|^2 + \left| r[n] \right|^2. \tag{5}$$

Since division is generally an expensive operation to implement in hardware, the detection metric in (3) can be realized alternatively as

$$\left| R[n] \right|^2 > th_{pd} \cdot \left( P[n] \right)^2, \quad n > 0. \tag{6}$$

In the literature, different values are suggested for $th_{pd}$, e.g., 0.5, 0.75, and 0.81 in [19], [20], and [16], respectively. In our architecture, we select $th_{pd} = 0.75$ because simulation shows it to be a good choice and because, with $0.75 = 2^{-1} + 2^{-2}$, the multiplication in (6) can be implemented as an addition with the addends right-shifted [20]. Finally, to reduce the number of false positives from (6), one can employ a smoothing circuit which requires 8 of the last 32 comparisons to be true before declaring a packet to be detected [13].

Implementation: An illustration of the implementation of the packet detector architecture is shown in Figure 2. The top and bottom branches realize the delayed auto-correlation (4) and average power (5) computations, respectively. The $N_s$-cycle delay blocks are each implemented using a RAMB36, a counter, and a simple finite state machine (FSM), with the input wired to RAMB36 port A and the delayed output on port B. The complex multiplication (with conjugation) is implemented using three DSP48E multipliers and five adder-subtractors as

$$(a_{re} + i\,a_{im}) \cdot (b_{re} - i\,b_{im}) = p_{re} + i\,p_{im},$$

where

$$p_{re} = a_{re} \cdot b_{re} + a_{im} \cdot b_{im}$$

and

$$p_{im} = (a_{re} + a_{im}) \cdot (b_{re} - b_{im}) - (a_{re} \cdot b_{re} - a_{im} \cdot b_{im}).$$
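The three-multiplier product with conjugation can be checked numerically. The following is a minimal Python sketch, not the VHDL of the design; the function name `conj_mul3` is a hypothetical label chosen here. It reuses the products $a_{re} \cdot b_{re}$ and $a_{im} \cdot b_{im}$ so that only three multiplications and five additions or subtractions are needed.

```python
def conj_mul3(a_re, a_im, b_re, b_im):
    """Return (p_re, p_im) for (a_re + i*a_im) * (b_re - i*b_im).

    Uses three multiplications (m1, m2, m3) and five adder-subtractors,
    mirroring the structure described for the DSP48E-based multiplier.
    """
    m1 = a_re * b_re                      # multiplier 1
    m2 = a_im * b_im                      # multiplier 2
    m3 = (a_re + a_im) * (b_re - b_im)    # multiplier 3 (two pre-adders)
    p_re = m1 + m2
    p_im = m3 - (m1 - m2)
    return p_re, p_im


# (3 + 4i) * conj(5 + 6i) = (3 + 4i) * (5 - 6i) = 39 + 2i
print(conj_mul3(3, 4, 5, 6))
```

The saving is one multiplier per complex product at the cost of three extra adder-subtractors, which is usually a good trade on a device where DSP slices are the scarcer resource.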
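[Figure 2: Illustration of the packet detector architecture. Input r[n]; top branch: 16-sample delays (16D), conjugate multipliers, and a sliding-window accumulator; bottom branch: magnitude-squared block, 16-sample delay, and a sliding-window accumulator.]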
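The packet detection procedure (delayed auto-correlation, average power, division-free threshold test, and smoothing) can be sketched in floating-point Python as follows. This is an illustrative model under stated assumptions, not the fixed-point hardware: the function names and the test signal are hypothetical, the sliding update is derived directly from the window definitions (1) and (2), so its indices are offset by $N_s$ samples relative to the delayed form in (4) and (5), and the smoothing test uses "at least 8 of the last 32" rather than an equality test.

```python
import cmath
import math

NS = 16  # samples per short training symbol


def detect_packet(r, ns=NS, need=8, window=32):
    """Return the first index n at which at least `need` of the last
    `window` threshold tests passed, or None if that never happens."""
    last = len(r) - 2 * ns
    if last < 0:
        return None
    # Direct evaluation of (1) and (2) for the first window.
    R = sum(r[m].conjugate() * r[m + ns] for m in range(ns))
    P = sum(abs(r[m]) ** 2 for m in range(ns))
    hits = []
    for n in range(last + 1):
        p2 = P * P
        # th_pd = 0.75 = 1/2 + 1/4: in fixed point the two addends
        # would be p2 >> 1 and p2 >> 2, so no multiplier is needed.
        hits.append(abs(R) ** 2 > 0.5 * p2 + 0.25 * p2)
        if sum(hits[-window:]) >= need:
            return n
        if n < last:
            # Sliding-window update: add the newest term, drop the oldest.
            R += r[n + ns].conjugate() * r[n + 2 * ns] - r[n].conjugate() * r[n + ns]
            P += abs(r[n + ns]) ** 2 - abs(r[n]) ** 2
    return None


def sts_like(reps=10, ns=NS):
    """Hypothetical periodic test signal: `reps` copies of a unit-magnitude
    ns-sample chirp (not the actual 802.11a short training symbol)."""
    sym = [cmath.exp(1j * math.pi * k * k / ns) for k in range(ns)]
    return sym * reps


# Silence, then a periodic burst starting at sample 64, then silence.
r = [0j] * 64 + sts_like() + [0j] * 64
print(detect_packet(r))
```

In this noise-free example the threshold test starts passing as soon as the two windows overlap the burst, and the smoothing stage delays the decision until eight tests have passed.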
output precision of the squaring operation on the average power branch is 64-bit. With the threshold multiplication realized as a 64-bit adder, the result is sign extended and requires a 65-bit comparator. The final smoothing operation is implemented as a window accumulator (using a SRLC32E to delay the input bit) and an output equality tester.

B. Carrier Frequency Offset Estimator and Corrector

Due to channel conditions, a carrier frequency offset (CFO) will likely exist in the received packet samples. Similar to the approach described in [7], the architecture presented in this paper uses the phase angle of the delayed auto-correlated STS and LTS symbols for the computation and correction of the coarse and fine CFO estimates, respectively. We adopt the strategy presented in [11], where the coarse CFO estimate is applied to future input samples. Then, the fine CFO estimate is calculated over the coarse CFO corrected samples and subsequently applied to them before they are passed on to the FFT. Some other papers which present variations of this approach include [12], [13], [14], [17], [18], [19], and [20].

Consequently, $\Delta\phi$ can be computed from

$$\Delta\phi = \angle R[n], \tag{8}$$

then rearranging (7), substituting (8), and setting $\Delta t = N_s \cdot T_s$, the coarse CFO can be estimated as

$$f_c = \frac{\angle R[n]}{2\pi \cdot (N_s \cdot T_s)}. \tag{9}$$

In [11], it is shown that the coarse CFO estimate can be applied to the input samples as

$$r'[n] = r[n] \cdot e^{-j 2\pi n T_s f_c}, \tag{10}$$

and by substituting (9) into (10), one obtains

$$r'[n] = r[n] \cdot e^{-j n \angle R[n] / N_s}. \tag{11}$$

Since $N_s = 2^4$, the division operation in (11) is implemented as a simple 4-bit right shift. The multiplication operation in (11) is realized iteratively by additions as

$$\theta[n] = \theta[n-1] + \frac{\angle R[n]}{N_s}, \tag{12}$$
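To illustrate (9) through (12), the following floating-point Python sketch estimates the coarse CFO from one delayed auto-correlation value and removes it with a phase accumulator. It is a sketch under stated assumptions rather than the hardware design: the function names are hypothetical, the accumulator is assumed to start at $\theta = 0$ on the first corrected sample, and the angle and complex exponential are computed with library calls instead of the fixed-point logic of the architecture.

```python
import cmath
import math

NS = 16        # samples per short training symbol
TS = 50e-9     # sample period at 20 MHz


def coarse_cfo(r, n=0, ns=NS, ts=TS):
    """Return (angle of R[n], CFO estimate in Hz) following (8) and (9)."""
    R = sum(r[n + m].conjugate() * r[n + m + ns] for m in range(ns))
    angle = cmath.phase(R)
    return angle, angle / (2 * math.pi * ns * ts)


def correct_cfo(r, angle, ns=NS):
    """Apply (11) using the additive phase accumulator of (12)."""
    step = angle / ns      # ns = 2**4, so a 4-bit right shift in fixed point
    theta = 0.0            # assumed initial accumulator value
    out = []
    for x in r:
        out.append(x * cmath.exp(-1j * theta))
        theta += step      # theta[n] = theta[n-1] + angle / ns
    return out


# Hypothetical periodic symbol (not the real STS) with a 100 kHz offset.
sym = [cmath.exp(1j * math.pi * k * k / NS) for k in range(NS)]
f_true = 100e3
r = [sym[n % NS] * cmath.exp(2j * math.pi * f_true * n * TS) for n in range(160)]

angle, f_hat = coarse_cfo(r)
corrected = correct_cfo(r, angle)
print(round(f_hat))
```

Because the angle is only unambiguous within $(-\pi, \pi]$, an estimate taken over a delay of $N_s T_s = 0.8$ µs can resolve offsets of magnitude up to $1/(2 N_s T_s) = 625$ kHz.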
Figure 5: Illustration of the fine time synchronizer architecture.

detected, the coarse time synchronizer waits until the value of

$$\left| R[n] \right|^2 < th_{ct} \cdot \max \left| R[n] \right|^2, \quad n > n_{pd}$$

and declares that sample as the coarse timing estimate $n_{ct}$. At this point, the input samples form the LTS cyclic prefix [11].

Fine Time Mathematics: For the fine time synchronization operation, when the received samples are correlated with the preamble coefficients, the cross-correlation output is defined as

$$\Lambda[n] = \sum_{m=0}^{N_l-1} \bar{c}[m] \cdot r[n+m] \tag{13}$$

where $\bar{c}[m]$ denotes the conjugate of the $m$-th LTS sample. We arrive at an accurate synchronized estimate by using the following metric

$$\hat{n} = \arg\max_{n} \left| \Lambda[n] \right| \tag{14}$$

where $0 \le \hat{n} \le N_l$ denotes the estimated sample in the second copy of the LTS after the correlation is completed [8]. To identify $\hat{n}$ in (14) from the coarse timing estimate location in the first LTS, we perform $N_l$ cross-correlations of $N_l$ samples each in parallel, using different offset LTSs, as

$$\Lambda_i[n] = \sum_{m=0}^{N_l-1} \bar{c}_i[m] \cdot r[n+m] \tag{15}$$

where $\bar{c}_i[m] = \bar{c}[(i+m) \bmod N_l]$ for $0 \le i \le N_l - 1$.

From (15), one observes that $N_l$ complex multiplications are required to compute each $\Lambda_i[n]$ value. Our structure is a modification of the one presented in [8], using the quantized training sequences in [10] to reduce the correlation multiplications to shifts and adds. In this case, the complex-number samples of the LTS are mapped to real powers of two, i.e.,

$$c[n] = c_{re}[n] + i\,c_{im}[n] \;\rightarrow\; q[n] = q_{re}[n] + i\,q_{im}[n]$$

where

$$q_{re}[n],\, q_{im}[n] \in \left\{ 0, \pm 2^0, \pm 2^1, \cdots, \pm 2^i \right\}.$$

Let us define two functions $Q_1(x)$ and $Q_2(x)$ that together will perform the quantization operation, as

$$Q_1(x) \triangleq \frac{2^i \cdot x}{\max\left( c_{re}[n], c_{im}[n] \right)}$$

and

$$Q_2(x) \triangleq \begin{cases} +2^{\,\mathrm{int}(\log_2 x)}, & x > 0 \\ -2^{\,\mathrm{int}(\log_2 (-x))}, & x < 0 \\ 0, & x = 0 \end{cases}$$

where $i$ is the number of quantization levels, $\max\left( c_{re}[n], c_{im}[n] \right) = 0.160660$, and the function $\mathrm{int}$ returns the closest integer. Then, define the quantized cross-correlation sequence as

$$q[n] \triangleq Q_2\left( Q_1\left( c_{re}[n] \right) \right) + i\, Q_2\left( Q_1\left( c_{im}[n] \right) \right).$$

Now, the cross-correlation in (13) can be approximated as

$$\Lambda[n] \approx \sum_{m=0}^{N_l-1} \bar{q}[m] \cdot r[n+m].$$

The estimate of the cross-correlated value, $\Lambda[n]$, can be calculated as

$$\mathrm{Re}\,\Lambda[n] \approx \sum_{m=0}^{N_l-1} \Big[ s_{re}[m] \times \left( r_{re}[n+m] \ll l_{re}[m] \right) - s_{im}[m] \times \left( r_{im}[n+m] \ll l_{im}[m] \right) \Big]$$

$$\mathrm{Im}\,\Lambda[n] \approx \sum_{m=0}^{N_l-1} \Big[ s_{re}[m] \times \left( r_{im}[n+m] \ll l_{re}[m] \right) + s_{im}[m] \times \left( r_{re}[n+m] \ll l_{im}[m] \right) \Big]$$

where $\ll$ denotes a left shift, and $s[n] = s_{re}[n] + i\,s_{im}[n]$ and $l[n] = l_{re}[n] + i\,l_{im}[n]$ are computed from $\bar{q}[n]$ as

$$s_{re}[n] = \begin{cases} +1, & q_{re}[n] > 0 \\ -1, & q_{re}[n] < 0 \\ 0, & q_{re}[n] = 0 \end{cases}, \qquad s_{im}[n] = \begin{cases} +1, & q_{im}[n] > 0 \\ -1, & q_{im}[n] < 0 \\ 0, & q_{im}[n] = 0 \end{cases}$$

and

$$l_{re}[n] = \begin{cases} \log_2 \left| q_{re}[n] \right|, & q_{re}[n] \ne 0 \\ 0, & q_{re}[n] = 0 \end{cases}, \qquad l_{im}[n] = \begin{cases} \log_2 \left| q_{im}[n] \right|, & q_{im}[n] \ne 0 \\ 0, & q_{im}[n] = 0 \end{cases}.$$

Note that this is different from what was reported in [10], where $\Lambda[n]$ is obtained with a single summation.

Computing the magnitude in (14) in the same way as in the packet detector requires two multipliers per magnitude computation. In [20], the following approximation is used

$$\left| \Lambda[n] \right| = \sqrt{ \left( \mathrm{Re}\,\Lambda[n] \right)^2 + \left( \mathrm{Im}\,\Lambda[n] \right)^2 } \approx \max\left( \left| \mathrm{Re}\,\Lambda[n] \right|, \left| \mathrm{Im}\,\Lambda[n] \right| \right) + \frac{ \min\left( \left| \mathrm{Re}\,\Lambda[n] \right|, \left| \mathrm{Im}\,\Lambda[n] \right| \right) }{2}$$

which provides a considerable reduction in hardware complexity.

Implementation: The implementations of the coarse and fine time synchronizer architectures are illustrated in Figures 4 and 5, respectively. Like the CFO estimator, the coarse time synchronizer shares a portion of its logic with the packet detector [13]. The circular shift register is implemented with FFs. The outputs from the positions in the circular shift register create the different correlation sequences. To represent the $q_{re}[n]$ and $q_{im}[n]$ values, we used signed magnitude format with $\lceil \log_2 (i + 2) \rceil + 1$ bits.
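The quantization of the training sequence and the resulting shift-and-add correlation can be sketched in Python as follows. This is an illustrative model with hypothetical function names, and it makes one assumption that the text does not state: scaled values whose rounded exponent is negative are mapped to 0 so that the result stays in the set $\{0, \pm 2^0, \ldots, \pm 2^i\}$. Received samples are modeled as integer pairs so that the shifts are exact.

```python
import math


def q1(x, levels, cmax):
    """Scale x so that the largest coefficient maps to 2**levels."""
    return (2 ** levels) * x / cmax


def q2(x):
    """Round x to the nearest signed power of two in the log domain."""
    if x == 0:
        return 0
    e = round(math.log2(abs(x)))
    if e < 0:
        return 0  # assumption: magnitudes below 2**0 collapse to zero
    return int(math.copysign(2 ** e, x))


def sign_shift(q):
    """Split an integer power of two (or 0) into (sign, shift amount)."""
    if q == 0:
        return 0, 0
    return (1 if q > 0 else -1), abs(q).bit_length() - 1


def shift_add_mul(a_re, a_im, r_re, r_im):
    """(a_re + i*a_im) * (r_re + i*r_im) for power-of-two a_re, a_im,
    using only shifts, sign selection, and additions."""
    s_re, l_re = sign_shift(a_re)
    s_im, l_im = sign_shift(a_im)
    re = s_re * (r_re << l_re) - s_im * (r_im << l_im)
    im = s_re * (r_im << l_re) + s_im * (r_re << l_im)
    return re, im


def correlate(q, r, n):
    """Approximate Lambda[n] = sum_m conj(q[m]) * r[n + m].
    q and r are lists of (re, im) integer pairs."""
    acc_re = acc_im = 0
    for m, (q_re, q_im) in enumerate(q):
        r_re, r_im = r[n + m]
        re, im = shift_add_mul(q_re, -q_im, r_re, r_im)  # conjugate of q[m]
        acc_re += re
        acc_im += im
    return acc_re, acc_im


print(q2(q1(0.160660, 3, 0.160660)), q2(5.0), q2(-3.0), q2(0.3))
print(correlate([(1, 0), (0, 2)], [(1, 1), (2, 3), (4, 5)], 0))
```

Rounding in the log domain means, for example, that 6 maps to 8 rather than 4, which matches the definition of $Q_2$ with $\mathrm{int}$ returning the closest integer.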
[Figure: block diagram of the complete synchronizer. Blocks: Packet Detector, Coarse Time Sync., Fine Time Sync., Carrier Frequency Offset Estimator, two Carrier Frequency Offset Correctors, and two Clip blocks. Signals: r[n], r'[n], r''[n], R[n], |R[n]|^2, ∠R[n], n_pd, n_ct, n_ft.]
D. High-Level Synchronizer

With each of the three individual blocks described, we proceed to discuss how they are connected together to form the entire synchronizer and which logic can be shared between them.

IV. RESULTS

In this section, our FPGA implementation and MATLAB simulation results are summarized.

A. FPGA Implementations

In Table II, the area implementation results for the three blocks and the entire synchronizer are reported on the Virtex-5 xc5vlx110t-1ff1136. Note that some of the logic is shared between the packet detector and carrier frequency offset estimator. All the blocks met the timing constraint of 20 MHz and no pipelining techniques were required.

B. MATLAB Simulations

MATLAB was used to simulate the performance of our design under various channel conditions. We show the responses of the packet detector and coarse timing estimate, as well as the coarse frequency offset estimates, in Figures 7a and 7b, respectively.

Figure 7: STS and LTS responses: (a) packet detector and coarse timing synchronizer, (b) coarse frequency offset estimate.

V. SUMMARY AND FUTURE WORK

In this paper, we described our implementation of an IEEE 802.11a synchronizer on FPGA. The material presented can be used by future designers attempting to implement the synchronizer for other packet-based OFDM protocols. The packet detector uses the established delayed auto-correlation/average power approach. The CFO estimator computes the angle of the delayed auto-correlation of the STS and LTS symbols for the coarse and fine CFO estimates, respectively. The coarse time synchronizer is implemented by finding a drop-off in the delayed auto-correlation and providing the fine time synchronizer with a sample index that is located somewhere in the first long training symbol. The fine time synchronizer performs a quantized cross-correlation and selects the maximum magnitude result as the fine sample index estimate.

This synchronizer will be deployed as a piece of our implementation of the IEEE 802.11a PHY, for use as part of a larger system to deliver Triple-Play Media Services over the wireless channel.

ACKNOWLEDGMENTS

The authors wish to acknowledge...

APPENDIX

In Table III, the quantized long training sequence symbols are stated. In Figure 8, the sub-blocks which are part of the fine time synchronizer are illustrated.
Figure 8: Illustration of the sub-blocks in the fine time synchronizer architecture: (a) quantized complex multiplier, (b) complex
magnitude approximation.
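The complex magnitude approximation of sub-block (b), max plus half of min, and its use in selecting the fine timing index can be sketched in integer Python as follows. The function names are hypothetical, and halving is modeled as a 1-bit right shift on non-negative integers, which is an assumption about how the hardware truncates.

```python
def approx_mag(re, im):
    """Approximate |re + i*im| as max(|re|, |im|) + min(|re|, |im|) / 2,
    with the division by two done as a 1-bit right shift."""
    a, b = abs(re), abs(im)
    return max(a, b) + (min(a, b) >> 1)


def fine_index(lams):
    """Return the index of the correlation output with the largest
    approximate magnitude. `lams` is a list of (re, im) integer pairs."""
    mags = [approx_mag(re, im) for re, im in lams]
    return mags.index(max(mags))


print(approx_mag(3, 4), approx_mag(-8, 6))
print(fine_index([(1, 1), (10, -2), (3, 3)]))
```

In exact arithmetic this approximation never underestimates the true magnitude and overestimates it by at most about 11.8 percent, which is typically acceptable when only the location of the maximum matters.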