Design, Implementation, and Evaluation of a WLAN Synchronizer on FPGA
The Authors
Abstract: Synchronization is a critical operation required by a wireless local area network (WLAN) receiver. This paper presents the design, implementation, and evaluation of an IEEE 802.11a packet synchronizer for deployment on a Xilinx Virtex-5 field-programmable gate array (FPGA) device. Packet detection, carrier frequency offset estimation/correction, and time synchronization are all achieved by processing samples before the fast Fourier transform (FFT) computation on the receiver. The design is fully synchronous and operates at 20 MHz in a single clock domain, and particular attention is paid to the precision of the arithmetic computations. FPGA implementation results are reported and the performance of the design is evaluated by simulation under different additive white Gaussian noise (AWGN) channel conditions generated from MATLAB simulations.

Index Terms: IEEE 802.11a, OFDM, Wi-Fi, Xilinx Virtex-5.

I. INTRODUCTION

IEEE 802.11a [1] is a widely used standard in many wireless local area network (WLAN) communication systems that achieves transmission rates of up to 54 Mbps and operates in the industrial, scientific, and medical (ISM) 5.8 GHz unlicensed radio band. IEEE 802.11a uses orthogonal frequency-division multiplexing (OFDM) [2] for its physical layer (PHY), and it is well known that the performance of an OFDM system is degraded if there is any loss of orthogonality between sub-carriers or any inter-symbol interference (ISI).

For the very large-scale integration (VLSI) implementation of the 802.11a PHY, one of the most complicated operations is synchronization. In the presence of channel noise, the synchronizer on the receiver must be able to
• detect an incoming packet,
• estimate and correct any carrier frequency offset, and
• perform accurate time synchronization.

To allow for the robust implementation of these three operations, IEEE 802.11a [1] specifies that all packets begin with a preamble (training sequence) formed of periodic symbols, and to reduce the probability of ISI and increase the robustness to multipath fading, a cyclic prefix is added to each OFDM data symbol. Field-programmable gate arrays (FPGAs) offer an attractive platform for OFDM baseband processing because modern devices are inexpensive, reconfigurable, include built-in, high-performance digital signal processing (DSP) slices in their fabrics, and offer quick time-to-market.

In this paper, we present the design, implementation, and evaluation of an IEEE 802.11a PHY receiver synchronizer VLSI architecture on a Xilinx Virtex-5 FPGA device [3]. We present the proposed architecture in detail, including the precision of the arithmetic computations, assuming two 14-bit, two's complement analog-to-digital converters (ADCs) provide the input samples to the FPGA. The architecture is described using VHDL [4] and implementation results are reported. The performance of the architecture is evaluated under different additive white Gaussian noise (AWGN) channel models generated by MATLAB simulations. We note that although our work is for IEEE 802.11a, it can be easily translated to other packet-oriented communication standards that employ OFDM, such as HIPERLAN/2 [5].

Some publications that have contributed to WLAN synchronizer design are as follows. In [6], a method for estimating the carrier frequency offset in the frequency domain is presented. In [7], a method for detecting a packet and an improved carrier frequency estimation technique in the time domain are proposed. In [8], the performance of different cross-correlation methods for time synchronization is evaluated. In [9], a multiplierless cross-correlation strategy is presented, where quantized versions of the training sequence samples are used, reducing the area complexity while maintaining acceptable performance. In [10], a different quantized cross-correlation approach is presented that relies only on shifts to perform the multiplications. In [11], some modifications to the computations in [7] are presented, as well as the application of the frequency offset estimations in the top-level synchronizer design. In [12], a synchronizer architecture with an automatic gain control (AGC) circuit is presented. Finally, a Master's thesis [13] presents the design and evaluation of a WLAN synchronizer on FPGA.

The remainder of this paper is organized as follows. In Section II, the preliminaries are reviewed. In Section III, our design is presented. In Section IV, our results are reported. In Section V, the conclusion and future work are given.

II. PRELIMINARIES

A. IEEE 802.11a Packet Format

The IEEE 802.11a OFDM packet format is illustrated in Figure 1. For synchronization, the preamble training sequences are of interest. There are two training sequences, the short training sequence (STS) and the long training sequence (LTS). The STS consists of ten copies of a 16-sample sequence, while the LTS consists of a 32-sample cyclic prefix followed by two copies of a 64-sample sequence. The received analog signal is sampled by a pair of ADCs operating at 20 MHz; therefore, the duration of a sample is $T_s = 50$ ns, which translates to both the STS and LTS being 8 µs in length. Typically, the STS symbols are used for AGC, packet detection, coarse carrier frequency offset estimation, and coarse timing synchronization, while the LTS symbols are used for fine carrier frequency offset estimation and fine timing synchronization [?].
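As a quick check of the preamble timing quoted above, the following sketch recomputes the STS and LTS lengths from the sample counts and the 50 ns sample period. The constant and variable names are illustrative only and do not come from the paper.

```python
# Preamble timing check using the figures quoted in the text.
TS_NS = 50                     # sample period in nanoseconds (20 MHz sampling)

STS_SAMPLES = 10 * 16          # ten copies of a 16-sample short symbol
LTS_SAMPLES = 32 + 2 * 64      # 32-sample cyclic prefix + two 64-sample long symbols

# Convert sample counts to microseconds.
sts_duration_us = STS_SAMPLES * TS_NS / 1000
lts_duration_us = LTS_SAMPLES * TS_NS / 1000

print(STS_SAMPLES, LTS_SAMPLES)          # 160 160
print(sts_duration_us, lts_duration_us)  # 8.0 8.0
```

Both training sequences span 160 samples, which is why they have the same 8 µs duration despite their different internal structure.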
[Figure 1 content: preamble field with short training symbols t1 to t10 (10 × 0.8 = 8 µs) followed by GI2, T1, T2 (1.6 + 2 × 3.2 = 8 µs); signal field with GI and signal symbol (0.8 + 3.2 = 4 µs); payload field with GI and OFDM symbol pairs (0.8 + 3.2 = 4 µs each).]
Figure 1: Packet format of the IEEE 802.11a PHY.
B. Selected Xilinx Virtex-5 FPGA Device

The Xilinx Virtex-5 xc5vlx110t-1ff1136 device [3] is selected as the target FPGA for the synchronizer architecture. In terms of available resources, the xc5vlx110t-1ff1136 device contains 17,280 slices, 64 DSP48E slices, 148 dual-port 36 Kb random access memory (RAM) blocks, and 680 I/O pins. Each slice contains four look-up tables (LUTs) and four flip-flops (FFs). The DSP48E slices have an embedded signed 18-bit by 25-bit multiplier, adder, and accumulator. The block RAMs can be instantiated using the Xilinx primitive component RAMB36. RAMB36 instances are dual port, highly configurable in terms of addressing modes, and have a maximum bus width of 36 bits per port. The numerous I/O pins contain flip-flops which can sample the input signals or latch the output signals.

III. METHODOLOGY

In this section, we present the design of the packet detector, the carrier frequency offset estimator and corrector, and the time synchronizer; afterward, we explain how they function together to form the entire synchronizer. For each block, we present the related mathematics and then provide its design for implementation on FPGA. In terms of notation, we let $r[n] = r_{re}[n] + i r_{im}[n]$ denote the n-th 14-bit, two's complement, complex pair of samples output from the ADCs, and $N_s = 16$ and $N_l = 64$ denote the number of samples in an STS and LTS symbol, respectively. Note that we have completed the entire design of the synchronizer in the time domain, i.e., before the FFT operation on the OFDM receiver, thereby simplifying the high-level control logic on the receiver.

A. Packet Detector

The primary role of the packet detector in this synchronizer is to alert the other blocks that the current samples are part of the STS. The packet detector architecture presented in this paper uses the popular STS delayed auto-correlation/average power thresholding technique proposed in [7]. Various implementations of this approach can be found in [12], [13], [14], [15], [16], [17], [18], [19], and [20].

Mathematics: We opted to implement the thresholding technique across one STS symbol, with the delayed auto-correlation computation defined as

$$R[n] = \sum_{m=0}^{N_s-1} \bar{r}[n+m] \cdot r[n+m+N_s] \tag{1}$$

where $\bar{r}[n+m]$ denotes the complex conjugate of $r[n+m]$. The average power computation is defined as

$$P[n] = \sum_{m=0}^{N_s-1} |r[n+m]|^2. \tag{2}$$

Then, the STS of a packet can be detected when the ratio

$$M[n] = \frac{|R[n]|^2}{(P[n])^2} \tag{3}$$

crosses a predetermined threshold, i.e., $M[n] > th_{pd}$ [7].

It is known that (1) and (2) can be realized using a sliding window to reduce the area complexity of the computation [7]. For the windowing implementation of (1), its form is

$$R[n+1] = R[n] - \bar{r}[n-N_s] \cdot r[n] + \bar{r}[n] \cdot r[n+N_s] \tag{4}$$

and the approach is similar for (2), i.e.,

$$P[n+1] = P[n] - |r[n-N_s]|^2 + |r[n]|^2. \tag{5}$$

Since division is generally an expensive operation to implement in hardware, the detection metric in (3) can be realized alternatively as

$$|R[n]|^2 > th_{pd} \cdot (P[n])^2, \quad n > 0. \tag{6}$$

In the literature, different values are suggested for $th_{pd}$, e.g., 0.5, 0.75, and 0.81 in [19], [20], and [16], respectively. In our architecture, we select $th_{pd} = 0.75$ because it is determined to be a good choice by simulation and because the multiplication operation in (6) can then be implemented as an addition with the addends right-shifted [20], since $0.75x = x/2 + x/4$. Finally, to reduce the number of false positives from (6), one can employ a smoothing circuit which requires 8 of the last 32 comparisons to be true before declaring a packet to be detected [13].

Implementation: The implementation of the packet detector architecture is illustrated in Figure 2. The top and bottom branches realize the delayed auto-correlation (4) and average power (5) computations, respectively. The $N_s$-cycle delay blocks are each implemented using a RAMB36, a counter, and a simple finite state machine (FSM), with the input wired to RAMB36 port A and the delayed output on port B. The complex multiplication (with conjugation) is implemented using three DSP48E multipliers and five adder-subtractors as

$$(a_{re} + i a_{im}) \cdot (b_{re} - i b_{im}) = p_{re} + i p_{im},$$

where

$$p_{re} = a_{re} \cdot b_{re} + a_{im} \cdot b_{im}$$

and

$$p_{im} = (a_{re} + a_{im}) \cdot (b_{re} - b_{im}) - (a_{re} \cdot b_{re} - a_{im} \cdot b_{im}).$$
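The packet detector described above can be sketched in software as follows. This is a minimal integer model, not the authors' VHDL: it combines the three-multiplier conjugate product with a sliding-window accumulation in the spirit of (4) and (5) and the shift-and-add threshold test of (6) for a threshold of 0.75. The function names, the index convention (the returned index is that of the newest product term in the window), and the silent lead-in used in the demonstration are assumptions made for illustration. The 8-of-32 smoothing stage is omitted.

```python
NS = 16  # samples per STS symbol

def conj_mul(a_re, a_im, b_re, b_im):
    """(a_re + i*a_im) * (b_re - i*b_im) using three real multiplications."""
    m1 = a_re * b_re
    m2 = a_im * b_im
    m3 = (a_re + a_im) * (b_re - b_im)
    return m1 + m2, m3 - (m1 - m2)

def detect_packet(samples, ns=NS):
    """Return the first window index where |R|^2 > 0.75 * P^2, else None.

    samples is a list of (re, im) integer pairs.
    """
    prods, pows = [], []          # per-index product and power terms
    r_re = r_im = p = 0           # running window sums for R and P
    for n in range(len(samples) - ns):
        b_re, b_im = samples[n]            # delayed sample
        a_re, a_im = samples[n + ns]       # current sample
        t_re, t_im = conj_mul(a_re, a_im, b_re, b_im)
        t_pow = b_re * b_re + b_im * b_im
        prods.append((t_re, t_im))
        pows.append(t_pow)
        r_re += t_re
        r_im += t_im
        p += t_pow
        if n >= ns:                        # drop the term leaving the window
            o_re, o_im = prods[n - ns]
            r_re -= o_re
            r_im -= o_im
            p -= pows[n - ns]
        if n >= ns - 1:                    # only test once the window is full
            lhs = r_re * r_re + r_im * r_im
            p2 = p * p
            if lhs > (p2 >> 1) + (p2 >> 2):   # 0.75 * P^2 as two right shifts
                return n
    return None

# Hypothetical test signal: 40 silent samples, then a 16-sample pattern repeated 4 times.
pattern = [((37 * k) % 101 - 50, (53 * k) % 97 - 48) for k in range(NS)]
samples = [(0, 0)] * 40 + pattern * 4
print(conj_mul(3, 4, 1, -2))     # (-5, 10), i.e. (3 + 4i) * (1 + 2i)
print(detect_packet(samples))    # 40
```

With a silent lead-in, the metric fires as soon as the first periodic product enters the window; with noise ahead of the packet, the detection index would instead fall somewhere in the transition region, which is one reason a smoothing stage is useful.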
Figure 2: Illustration of the packet detector architecture.

The power is computed as the squared magnitude of the sampled complex signal, i.e., $|r[n]|^2 = r_{re}[n] \cdot r_{re}[n] + r_{im}[n] \cdot r_{im}[n]$, and this operation requires two DSP48E multipliers and one adder.

Since the ADC samples are 14-bit, the results of the delayed auto-correlation and average power multiplication operations have 29-bit precision. To prevent overflow in the three window accumulators, the widths of the adder-subtractors and registers must be selected appropriately. Considering that the largest magnitude value that a 14-bit ADC can output is $-2^{13}$, the window accumulators have to be capable of safely representing the sum of 16 copies of $2 \times (-2^{13}) \times (-2^{13}) = +2^{27}$, i.e., $+2^{31}$. Recall that representing $+2^{31}$ in two's complement requires 33 bits. However, if one clips the maximum magnitude ADC value from $-2^{13}$ to $-2^{13} + 1$, the window accumulator can be implemented using 28-bit input and 32-bit output widths, resulting in hardware savings. We note that this clipping technique is applied to many other computations in the synchronizer.

Assuming 32-bit outputs from all three window accumulators in Figure 2, we discuss the precision of the operations in (6). The output precision of the squared magnitude operation on the delayed auto-correlation branch is 65-bit, while the output precision of the squaring operation on the average power branch is 64-bit. With the threshold multiplication realized as a 64-bit adder, the result is sign extended and requires a 65-bit comparator. The final smoothing operation is implemented as a window accumulator (using an SRLC32E to delay the input bit) and an output equality tester.

B. Carrier Frequency Offset Estimator and Corrector

Due to channel conditions, a carrier frequency offset (CFO) will likely exist in the received packet samples. Similar to the approach described in [7], the architecture presented in this paper uses the phase angle of the delayed auto-correlated STS and LTS symbols for the computation and correction of the coarse and fine CFO estimates, respectively. We adopt the strategy presented in [11], where the coarse CFO estimate is applied to future input samples. Then, the fine CFO estimate is calculated over coarse CFO corrected samples and subsequently applied to them before they are passed on to the FFT. Some other papers which present variations of this approach include [12], [13], [14], [17], [18], [19], and [20].

Figure 3: Illustration of the carrier frequency offset architectures: (a) estimator, (b) STS corrector.

Mathematics: For two identical samples on the receiver, it is known that the phase difference $\Delta\phi$ is proportional to the frequency offset $f$ and the time difference $\Delta t$ between the samples [7], i.e.,

$$\Delta\phi = 2\pi \cdot \Delta t \cdot f. \tag{7}$$

If one considers $r[n]$ to be a sample somewhere in the middle of the STS, then $r[n] = r[n+N_s]$ and (1) can be rewritten as

$$R[n] = \sum_{m=0}^{N_s-1} \bar{r}[n+m] \cdot r[n+m] = e^{-j\Delta\phi} \cdot \sum_{m=0}^{N_s-1} |r[n+m]|^2.$$

Consequently, $\Delta\phi$ can be computed from

$$\Delta\phi = \angle R[n], \tag{8}$$

then rearranging (7), substituting (8), and setting $\Delta t = N_s \cdot T_s$, the coarse CFO can be estimated as

$$f_c = \frac{\angle R[n]}{2\pi \cdot (N_s \cdot T_s)}. \tag{9}$$

In [11], it is shown that the coarse CFO estimate can be applied to the input samples as

$$r'[n] = r[n] \cdot e^{-j 2\pi n T_s f_c}, \tag{10}$$

and by substituting (9) into (10), one obtains

$$r'[n] = r[n] \cdot e^{-j n \angle R[n] / N_s}. \tag{11}$$

Since $N_s = 2^4$, the division operation in (11) is implemented as a simple 4-bit right shift. The multiplication operation in (11) is realized iteratively by additions as

$$\theta[n] = \theta[n-1] + \frac{\angle R[n]}{N_s}. \tag{12}$$
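The coarse CFO estimation and correction steps in (9) to (12) can be illustrated numerically. The sketch below is a floating-point model under stated assumptions: it uses Python's `cmath` in place of the translation and rotation CORDICs, takes the angle once rather than per sample, and uses a made-up unit-magnitude periodic symbol rather than the real STS. All function and variable names are hypothetical.

```python
import cmath
import math

TS = 50e-9   # sample period from the text (20 MHz sampling)
NS = 16      # samples per STS symbol

def delayed_autocorr(r, n, ns=NS):
    """Equation (1): sum of conj(r[n+m]) * r[n+m+ns] over one symbol."""
    return sum(r[n + m].conjugate() * r[n + m + ns] for m in range(ns))

def correct(r, angle, ns=NS):
    """Equations (11) and (12): rotate sample n by -theta[n], theta[n] = n * angle / ns."""
    out, theta = [], 0.0
    step = angle / ns                 # the 4-bit right shift when ns = 16
    for sample in r:
        out.append(sample * cmath.exp(-1j * theta))
        theta += step                 # accumulate instead of multiplying by n
        if theta > math.pi:           # keep theta inside [-pi, +pi]
            theta -= 2 * math.pi
        elif theta < -math.pi:
            theta += 2 * math.pi
    return out

# Hypothetical periodic training symbol (not the real STS), repeated four times.
base = [cmath.rect(1.0, 0.3 * k * k) for k in range(NS)]
tx = base * 4

# Apply a 100 kHz carrier frequency offset to form the received samples.
f_true = 100e3
rx = [s * cmath.exp(2j * math.pi * f_true * n * TS) for n, s in enumerate(tx)]

angle = cmath.phase(delayed_autocorr(rx, 0))   # equation (8)
f_est = angle / (2 * math.pi * NS * TS)        # equation (9)
fixed = correct(rx, angle)                     # equations (11) and (12)

max_err = max(abs(a - b) for a, b in zip(fixed, tx))
print(round(f_est), max_err < 1e-9)
```

In this noiseless setting the estimate recovers the applied offset and the corrected samples match the transmitted ones to within floating-point error; the unambiguous range is limited by the requirement that the phase advance over $N_s$ samples stay within $\pm\pi$.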
In (12), $\theta[n-1] = (n-1) \angle R[n-1] / N_s$. Note that a similar formulation follows using the LTS samples with $N_l = 2^6$, and the fine CFO estimation and correction operations are calculated from and applied to the $r'[n]$ sequence, respectively. Since the STS and LTS are not processed concurrently, one can time-multiplex the CORDIC in the CFO estimator [13]. Finally, as the angle output from the CORDIC will fluctuate over the duration of the STS and LTS, we propose using an averaging circuit that operates over a 16-sample duration to stabilize the estimate.

Implementation: The implementations of the CFO estimator and STS corrector architectures are illustrated in Figures 3a and 3b, respectively. From Figure 3a, one observes that the delayed auto-correlation hardware for the STS can be shared with the packet detector architecture [13]. The angle in (9) is calculated by using a translation CORDIC algorithm [21] and implemented by the Xilinx IP CORDIC v4.0 LogiCORE [22]. The CFO corrector architecture is similar for both the STS and LTS, and we have illustrated it for the STS in Figure 3b. The central block is a rotation CORDIC that accepts the scaled angle from the translation CORDIC in Figure 3a.

The Cartesian inputs to Xilinx IP CORDIC v4.0 LogiCOREs must be in the interval [−1.0, +1.0] and are formatted as si.fff···f, while the angle inputs must be in the interval [−π, +π] and are formatted as sii.fff···f [22]. To guard against Cartesian overflow, we sign extend the inputs by one bit, i.e., ssi.fff···f, which is interpreted by the LogiCORE as ss.ifff···f, effectively dividing the input by 2 and ensuring it will fall inside the acceptable interval. As the value of $\theta[n]$ in (12) will grow and eventually fall outside of the interval $-\pi \le \theta[n] \le +\pi$, depending on the sign of $\angle R[n]$, a circuit is required to add $\pm 2\pi$ when necessary to prevent angle overflow.

Our baseband processing specification calls for the FFT operation to be completed with 16-bit precision. Consequently, our coarse CFO corrected samples $r'[n]$, output from the rotation CORDIC, can be 16-bit values. To accomplish this, we sign extend the 14-bit $r[n]$ values to 15-bit to avoid Cartesian overflow and pad two least significant zeros to increase the precision of the inputs and the computation to 17-bit, i.e., ssi.fff···f00. Afterward, we discard the most significant bit (MSB) from the two Cartesian outputs and obtain a 16-bit $r'[n]$ result. The procedure is similar for the fine CFO correction, with $r''[n]$ being computed from $r'[n]$, requiring only a 1-bit sign extension for the $r'[n]$ Cartesian inputs, and again the MSBs are discarded on the $r''[n]$ output. Therefore, we select the precision of both CFO correction rotation CORDIC LogiCOREs to be $N_{cr} = 17$.

We note that the dominating design parameter in terms of area complexity in these blocks is the size of the CORDIC LogiCOREs. In [20], the precision of the CFO estimator CORDIC is $N = 40$, which would occupy approximately X% and Y% of the available FPGA slice LUTs and FFs, respectively (see Table I). In fact, the output precision from the LTS delayed auto-correlation accumulator on the bottom branch in Figure 3a in this architecture is 38-bit. Therefore, a 1-bit sign extension results in the use of a 39-bit translation CORDIC, clearly at a large hardware penalty and latency cost.

Table I: Translation CORDIC LogiCORE area results for different input precision values on xc5vlx110t-1ff1136.
N     Slice FF (#)   Slice FF (%)   Slice LUT (#)   Slice LUT (%)
8
16
24
32
40

Now, we propose a method to significantly reduce the size of the translation CORDIC without sacrificing significant performance. Recall that the angle inputs for the rotation CORDICs in the CFO correctors are computed by the translation CORDIC in the CFO estimator (see Figure 3). Since $N_{cr} = 17$, the number of bits accepted for the angle input in the rotation CORDICs is 17; hence the minimum translation CORDIC precision is found as

$$N_{ct} = N_{cr} - \min(\log_2 N_s, \log_2 N_l) + 1$$

because of the division in (12). With $N_s = 16$ and $N_l = 64$, this gives $N_{ct} = 17 - 4 + 1 = 14$. Using any amount of precision greater than $N_{ct} = 14$ will not influence the rotation CORDIC, because the additional lower magnitude angle bits will be discarded. However, one must include a dynamic circuit which ensures that the significant portions of $R[n]$ are input to the translation CORDIC.

C. Time Synchronizer

Accurate time synchronization is required to determine which sample forms the starting point of OFDM frames and to reduce the chance of ISI in the receiver. In our design, the role of the coarse time synchronizer is to provide the index of a sample located in the first LTS symbol. The fine time synchronizer improves the timing estimate to give the fixed location of a sample in the second LTS symbol. This point is used to perform the cyclic prefix removal for the subsequent OFDM symbols in the packet payload. For coarse time synchronization, we employ the delayed auto-correlation drop-off detector proposed in [11] and modified by [13], which operates on the STS. For fine time synchronization, it is known that cross-correlation approaches typically outperform auto-correlations [8], however at a greater hardware cost.

Figure 4: Illustration of the coarse time synchronizer architecture.

Coarse Time Mathematics: For the coarse time synchronizer architecture, we utilize the delayed auto-correlation hardware on the packet detector, see (1). After a packet has been
detected, the coarse time synchronizer waits until the value of

$$|R[n]|^2 < th_{ct} \cdot \max |R[n]|^2, \quad n > n_{pd}$$

and declares that sample as the coarse timing estimate $n_{ct}$. At this point, the input samples form the LTS cyclic prefix [11].

Fine Time Mathematics: For the fine time synchronization operation, when the received samples are correlated with the preamble coefficients, the cross-correlation output is defined as

$$\Lambda[n] = \sum_{m=0}^{N_l-1} \bar{c}[m] \cdot r[n+m] \tag{13}$$

where $\bar{c}[m]$ denotes the conjugate of the m-th LTS sample. We arrive at an accurate synchronized estimate by using the following metric

$$\hat{n} = \arg\max_{n} |\Lambda[n]| \tag{14}$$

where $0 \le \hat{n} \le N_l$ denotes the estimated sample in the second copy of the LTS after the correlation is completed [8]. To identify $\hat{n}$ in (14) from the coarse timing estimate location in the first LTS, we perform $N_l$ separate $N_l$-sample cross-correlations in parallel using different offset LTSs as

$$\Lambda_i[n] = \sum_{m=0}^{N_l-1} \bar{c}_i[m] \cdot r[n+m] \tag{15}$$

where $\bar{c}_i[m] = \bar{c}[(i+m) \bmod N_l]$ for $0 \le i \le N_l - 1$.

From (15), one observes that $N_l$ complex multiplications are required to compute each $\Lambda_i[n]$ value. Our structure is a modification of the one presented in [8], using the quantized training sequences in [10] to reduce the correlation multiplications to shifts and adds. In this case, the complex-number samples of the LTS are mapped to real powers of two, i.e.,

$$c[n] = c_{re}[n] + i c_{im}[n] \rightarrow q[n] = q_{re}[n] + i q_{im}[n]$$

where

$$q_{re}[n], q_{im}[n] \in \{0, \pm 2^0, \pm 2^1, \cdots, \pm 2^i\}.$$

Let us define two functions $Q_1(x)$ and $Q_2(x)$ that together perform the quantization operation, as

$$Q_1(x) \triangleq \frac{2^i \cdot x}{\max(c_{re}[n], c_{im}[n])}$$

and

$$Q_2(x) \triangleq \begin{cases} +2^{\mathrm{int}(\log_2 x)}, & x > 0 \\ -2^{\mathrm{int}(\log_2 (-x))}, & x < 0 \\ 0, & x = 0 \end{cases}$$

where $i$ is the number of quantization levels, $\max(c_{re}[n], c_{im}[n]) = 0.160660$, and the function int returns the closest integer. Then, define the quantized cross-correlation sequence as

$$q[n] \triangleq Q_2(Q_1(c_{re}[n])) + i Q_2(Q_1(c_{im}[n])).$$

Now, the cross-correlation in (13) can be approximated as

$$\Lambda[n] \approx \sum_{m=0}^{N_l-1} \bar{q}[m] \cdot r[n+m].$$

The estimate of the cross-correlated value $\Lambda[n]$ can be calculated as

$$\mathrm{Re}\,\Lambda[n] \approx \sum_{m=0}^{N_l-1} \Big( s_{re}[m] \times (r_{re}[n+m] \ll l_{re}[m]) - s_{im}[m] \times (r_{im}[n+m] \ll l_{im}[m]) \Big)$$

$$\mathrm{Im}\,\Lambda[n] \approx \sum_{m=0}^{N_l-1} \Big( s_{re}[m] \times (r_{im}[n+m] \ll l_{re}[m]) + s_{im}[m] \times (r_{re}[n+m] \ll l_{im}[m]) \Big)$$

where $\ll$ denotes a left shift, and $s[m] = s_{re}[m] + i s_{im}[m]$ and $l[m] = l_{re}[m] + i l_{im}[m]$ are computed from $\bar{q}[m]$ as

$$s_{re}[m] = \begin{cases} +1, & q_{re}[m] > 0 \\ -1, & q_{re}[m] < 0 \\ 0, & q_{re}[m] = 0 \end{cases}, \quad s_{im}[m] = \begin{cases} +1, & q_{im}[m] > 0 \\ -1, & q_{im}[m] < 0 \\ 0, & q_{im}[m] = 0 \end{cases}$$

and

$$l_{re}[m] = \begin{cases} \log_2 |q_{re}[m]|, & q_{re} \ne 0 \\ 0, & q_{re} = 0 \end{cases}, \quad l_{im}[m] = \begin{cases} \log_2 |q_{im}[m]|, & q_{im} \ne 0 \\ 0, & q_{im} = 0 \end{cases}.$$

Note that this is different from what was reported in [10], where $\Lambda[n]$ is obtained with a single summation.

The magnitude operation in (14), if computed as in the packet detector, requires two multipliers per magnitude computation. In [20], the following approximation is used:

$$|\Lambda[n]| = \sqrt{(\mathrm{Re}\,\Lambda[n])^2 + (\mathrm{Im}\,\Lambda[n])^2} \approx \max(|\mathrm{Re}\,\Lambda[n]|, |\mathrm{Im}\,\Lambda[n]|) + \min(|\mathrm{Re}\,\Lambda[n]|, |\mathrm{Im}\,\Lambda[n]|)/2$$

which provides a considerable reduction in hardware complexity.

Figure 5: Illustration of the fine time synchronizer architecture.

Implementation: The implementations of the coarse and fine time synchronizer architectures are illustrated in Figures 4 and 5, respectively. Like the CFO estimator, the coarse time synchronizer shares a portion of its logic with the packet detector [13]. The circular shift register is implemented with FFs. The outputs from the positions in the circular shift register create the different correlation sequences. To represent the $q_{re}[n]$ and $q_{im}[n]$ values, we used signed magnitude format with $\lceil \log_2(i+2) \rceil + 1$ bits.
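The two multiplier-free building blocks of the fine time synchronizer, the shift-based product with a power-of-two coefficient and the max-plus-half-min magnitude approximation, can be sketched as below. This is an illustrative integer model, not the authors' hardware description; the function names and the sign/shift argument layout are assumptions, and the example coefficient is made up rather than taken from Table III.

```python
def quantized_mul(r_re, r_im, s_re, l_re, s_im, l_im):
    """Multiply (r_re + i*r_im) by the coefficient s_re*2**l_re + i*s_im*2**l_im.

    s_re and s_im are signs in {-1, 0, +1}; l_re and l_im are shift amounts.
    Only shifts, negations, and additions are needed in hardware.
    """
    re = s_re * (r_re << l_re) - s_im * (r_im << l_im)
    im = s_re * (r_im << l_re) + s_im * (r_re << l_im)
    return re, im

def approx_mag(re, im):
    """Magnitude approximation: max(|re|, |im|) + min(|re|, |im|) / 2."""
    hi = max(abs(re), abs(im))
    lo = min(abs(re), abs(im))
    return hi + (lo >> 1)

# Example: coefficient 8 - 4i applied to the sample 3 - 5i.
product = quantized_mul(3, -5, +1, 3, -1, 2)
print(product)                 # (4, -52), matching (8 - 4i) * (3 - 5i)
print(approx_mag(*product))    # 54, versus a true magnitude of about 52.2

# Selecting the peak as in (14) over a few hypothetical correlator outputs.
mags = [approx_mag(re, im) for re, im in [(4, -52), (3, 4), (0, 0)]]
n_hat = max(range(len(mags)), key=mags.__getitem__)
print(mags, n_hat)             # [54, 5, 0] 0
```

Because the fine timing decision in (14) only compares magnitudes against each other, a modest approximation error of this kind matters less than it would if the absolute value were used directly.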
D. High-Level Synchronizer

With each of the three individual blocks described, we proceed to discuss how they are connected together to form the entire synchronizer and which logic can be shared between them.

Figure 6: Illustration of the high-level synchronizer architecture.

Table II: Synchronizer implementation results on xc5vlx110t-1ff1136.
Block                Slice FF   Slice LUT   RAMB36   DSP48E   Timing (ns)
Packet Detector      123        531         4        17       29.654
Coarse Time Sync.    66         133         0        0        5.500
CFO Estimator
Coarse CFO Corr.     1276       1275        0        0        6.182
Fine CFO Corr.
Fine Time Sync.
Complete Sync.

IV. RESULTS

In this section, our FPGA implementation and MATLAB simulation results are summarized.

A. FPGA Implementations

In Table II, the area implementation results for the three blocks and the entire synchronizer are reported on the Virtex-5 xc5vlx110t-1ff1136. Note that some of the logic is shared between the packet detector and carrier frequency offset estimator. All the blocks met the timing constraint of 20 MHz and no pipelining techniques are required.

B. MATLAB Simulations

MATLAB was used to simulate the performance of our design under various channel conditions. We show the responses of the packet detector and coarse timing estimate, as well as the coarse frequency offset estimates, in Figures 7a and 7b, respectively.

Figure 7: STS and LTS responses: (a) packet detector and coarse timing synchronizer, (b) coarse frequency offset estimate.

V. SUMMARY AND FUTURE WORK

In this paper, we described our implementation of an IEEE 802.11a synchronizer on FPGA. The material presented can be used by future designers attempting to implement the synchronizer for other packet-based OFDM protocols. The packet detector uses the established delayed auto-correlation/average power approach. The CFO estimator computes the angle of the delayed auto-correlation of the STS and LTS symbols for the coarse and fine CFO estimates, respectively. The coarse time synchronizer is implemented by finding a drop-off in the delayed auto-correlation and providing the fine time synchronizer with a sample index located somewhere in the first long training symbol. The fine time synchronizer performs a quantized cross-correlation and selects the maximum magnitude result as the fine sample index estimate.

This synchronizer will be deployed as a piece of our implementation of the IEEE 802.11a PHY, for use as part of a larger system to deliver triple-play media services over the wireless channel.

ACKNOWLEDGMENTS

The authors wish to acknowledge...

APPENDIX

In Table III, the quantized long training sequence symbols are stated. In Figure 8, the sub-blocks which are part of the fine time synchronizer are illustrated.
Table III: Quantized LTS symbols with i = 3.

n c [n] q [n] n c [n] q [n] n c [n] q [n]
0 +0.156250 − i0.000000 +2^3 − i0 22 −0.060310 + i0.081286 +2^3 − i2^3 44 +0.082218 − i0.092357 +2^3 − i2^3
1 −0.005121 − i0.120330 +2^3 − i2^3 23 −0.056455 − i0.021804 45 −0.131260 − i0.065227
2 +0.039750 − i0.111160 24 −0.035041 − i0.150890 46 −0.057206 − i0.039299
3 +0.096832 + i0.082798 25 −0.121890 − i0.016566 47 +0.036918 − i0.098344
4 +0.021112 + i0.027886 26 −0.127320 − i0.020501 48 +0.062500 + i0.062500
5 +0.059824 − i0.087707 27 +0.075074 − i0.074040 49 +0.119240 + i0.004096
6 −0.115130 − i0.055180 28 −0.002806 + i0.053774 50 −0.022483 − i0.160660
7 −0.038316 − i0.106170 29 −0.091888 + i0.115130 51 +0.058669 + i0.014939
8 +0.097541 − i0.025888 30 +0.091717 + i0.105870 52 +0.024476 + i0.058532
9 +0.053338 + i0.004076 31 +0.012285 + i0.097600 53 −0.136800 + i0.047380
10 +0.000989 − i0.115000 32 −0.156250 + i0.000000 54 +0.000989 + i0.115000
11 −0.136800 − i0.047380 33 +0.012285 − i0.097600 55 +0.053338 − i0.004076
12 +0.024476 − i0.058532 34 +0.091717 − i0.105870 56 +0.097541 + i0.025888
13 +0.058669 − i0.014939 35 −0.091888 − i0.115130 57 −0.038316 + i0.106170
14 −0.022483 + i0.160660 36 −0.002806 − i0.053774 58 −0.115130 + i0.055180
15 +0.119240 − i0.004096 37 +0.075074 + i0.074040 59 +0.059824 + i0.087707
16 +0.062500 − i0.062500 38 −0.127320 + i0.020501 60 +0.021112 − i0.027886
17 +0.036918 + i0.098344 39 −0.121890 + i0.016566 61 +0.096832 − i0.082798
18 −0.057206 + i0.039299 40 −0.035041 + i0.150890 62 +0.039750 + i0.111160
19 −0.131260 + i0.065227 41 −0.056455 + i0.021804 63 −0.005121 + i0.120330
20 +0.082218 + i0.092357 42 −0.060310 − i0.081286
21 +0.069557 + i0.014122 43 +0.069557 − i0.014122
Figure 8: Illustration of the sub-blocks in the fine time synchronizer architecture: (a) quantized complex multiplier, (b) complex magnitude approximation.
REFERENCES

[1] IEEE Computer Society, IEEE Standard for Information technology-Telecommunications and Information exchange between systems-Local and metropolitan area networks-Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, New York, NY, June 2007, IEEE Std 802.11-2007.
[2] R. W. Chang, “Synthesis of Band-Limited Orthogonal Signals for Multichannel Data Transmission,” The Bell System Technical Journal, pp. 1775–1796, 1966.
[3] Xilinx Inc., Virtex-5 Family Overview, San Jose, CA, February 2009, Xilinx DS100 (v5.0).
[4] IEEE Computer Society, IEEE Standard for VHDL Register Transfer Level (RTL) Synthesis, New York, NY, October 2004, IEEE Std 1076.6-2004.
[5] European Telecommunications Standards Institute, Broadband Radio Access Networks (BRAN), HIPERLAN Type 2, Physical (PHY) Layer, Sophia Antipolis Cedex, France, December 2001, ETSI TS 101 475 V1.3.1 (2001-12).
[6] P. H. Moose, “A Technique for Orthogonal Frequency Division Multiplexing Frequency Offset Calculation,” IEEE Transactions on Communications, vol. 42, no. 10, pp. 2908–2914, 1994.
[7] T. M. Schmidl and D. C. Cox, “Robust Frequency and Timing Synchronization for OFDM,” IEEE Transactions on Communications, vol. 45, no. 12, pp. 1613–1621, 1997.
[8] A. Fort, J.-W. Weijers, V. Derudder, W. Eberle, and A. Bourdoux, “A Performance and Complexity Comparison of Auto-Correlation and Cross-Correlation for OFDM Burst Synchronization,” in 2003 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), vol. 2, 2003, pp. II.341–II.344.
[9] K.-W. Yip, Y.-C. Wu, and T.-S. Ng, “Design of Multiplierless Correlators for Timing Synchronization in IEEE 802.11a Wireless LANs,” IEEE Transactions on Consumer Electronics, vol. 49, no. 1, pp. 107–114, 2003.
[10] T. Ha, S. Lee, and J. Kim, “Low-complexity Correlation System for Timing Synchronization in IEEE 802.11a Wireless LANs,” in 2003 Radio and Wireless Conference (RAWCON 2003), 2003, pp. 51–54.
[11] J. Liu and J. Li, “Parameter Estimation and Error Reduction for OFDM-Based WLANs,” IEEE Transactions on Mobile Computing, vol. 3, no. 2, pp. 152–163, 2004.
[12] V. P. G. Jimenez, M. J. F.-G. Garcia, F. J. G. Serrano, and A. G. Armada, “Design and Implementation of Synchronization and AGC for OFDM-based WLAN Receivers,” IEEE Transactions on Consumer Electronics, vol. 50, no. 4, pp. 1016–1025, 2004.
[13] J. Pierri, “Design and Implementation of an OFDM WLAN Synchronizer,” Master’s thesis, University of Waterloo, Waterloo, ON, 2007, Supervisor: A. K. Khandani.
[14] L. Schwoerer, “VLSI Suitable Synchronization Algorithms and Architecture for IEEE 802.11a Physical Layer,” in IEEE International Symposium on Circuits and Systems (ISCAS 2002), vol. 5, 2002, pp. V.721–V.724.
[15] C. Dick and F. Harris, “FPGA Implementation of an OFDM PHY,” in 37th Asilomar Conference on Signals, Systems, and Computers, 2003, pp. 905–909.
[16] F. Manavi and Y. R. Shayan, “Implementation of OFDM modem for the Physical Layer of IEEE 802.11a Standard Based on Xilinx Virtex-II FPGA,” in IEEE 59th Vehicular Technology Conference 2004-Spring (VTC2004-Spring), vol. 3, 2004, pp. 1768–1772.
[17] K. Wang, J. Singh, and M. Faulkner, “FPGA Implementation of an OFDM-WLAN Synchronizer,” in Second IEEE International Workshop on Electronic Design, Test, and Applications (DELTA’04), 2004, pp. 89–94.
[18] M. J. Canet, F. Vicedo, V. Almenar, J. Valls, and E. R. de Lima, “Hardware Design of a FPGA-Based Synchronizer for Hiperlan/2,” in 14th International Conference on Field Programmable Logic and Applications (FPL 2004), 2004, pp. 494–504.
[19] M. J. Canet, F. Vicedo, V. Almenar, J. Valls, and E. R. de Lima, “A Common FPGA Based Synchronizer Architecture for Hiperlan/2 and IEEE 802.11a WLAN Systems,” in 15th IEEE International Symposium, 2004, pp. 531–535.
[20] L. Liu, T. Cheng, Q. Xiaoyu, and Q. Jiahui, “Research on Implementation of OFDM Burst Packet Transmission on Software Radio Platform of FPGA,” in 11th International Conference on Advanced Communication Technology (ICACT 2009), 2009, pp. 646–650.
[21] J. Volder, “The CORDIC Trigonometric Computing Technique,” IRE Transactions on Electronic Computers, vol. EC-8, pp. 330–334, 1959.
[22] Xilinx Inc., LogiCORE IP CORDIC v4.0, San Jose, CA, April 2009, Xilinx DS249.
