Abstract: When designing complex communication systems, such as MIMO-OFDM transceivers, prototypes have become an important tool for understanding the implementation trade-offs and the system behavior. This paper presents a real-time FPGA prototype for a 4-stream MIMO-OFDM transceiver capable of transmitting 216 Mbit/s in 20 MHz bandwidth. The paper covers all parts of the system from RF to channel decoding and considers both algorithm and implementation aspects. In particular, we discuss the initial parameter estimation, channel estimation, MIMO detection, parameter tracking, and channel decoding. FPGA implementation results are reported along with measurements that demonstrate the throughput of spatial multiplexing with four spatial streams.

Index Terms: MIMO, OFDM, prototype, FPGA implementation, measurements, 802.11n, VLSI.

I. INTRODUCTION

the hardware-efficient implementation of the transceiver algorithms is an important task. On the other hand, understanding the interaction of various algorithms and system components and assessing the performance of the overall system under real-world conditions is a crucial step in the development and verification process.

Prototype implementations of MIMO systems are an excellent vehicle to study these system-level and performance aspects. The real-time prototyping approach also offers opportunities to identify and work around complexity-bottlenecks, to consider complexity/performance trade-offs, and to assess the impact of hardware implementation on system performance. Hence, the engineering of fully functional systems based on programmable devices has frequently been adopted in the past to support the development of MIMO wireless systems [13]: The first prototype in this area has been a DSP-based, non real-

the hardware platform used for the field programmable gate array (FPGA) implementation. A detailed view on the architecture of the real-time 4×4 MIMO-OFDM physical layer is provided in Section V. Implementation and measurement results are reported in Sections VI and VII and conclusions are drawn in Section VIII.

Notation: Matrices are denoted by bold uppercase letters, the $r$th row of $\mathbf{A}$ by $\mathbf{A}_r$. Lowercase bold letters represent column vectors. The $r$th entry of vector $\mathbf{a}$ is written as $a_r$.

[Figure: transceiver block diagram. (a) Transmitter: parser, scrambler, conv. encoder, puncturer, interleaver, mapper, OFDM modulator; signals b[l], s[k, t], s[t]. Receiver: AGC & synchronization, OFDM demod., pilot-tracker, MIMO processing (spatial separation, soft-metric extraction), deinterleaver, deparser, depunc., Viterbi, descr.; input r[t].]

[Figure: frame structure on the transmit antennas. Preamble (timing and frequency synch., AGC setting), block-type channel training, header, payload data.]

2) Autocorrelation-based approaches rely on the detection of repeating portions of a received signal. The basic idea was first described in [25] and was further refined to allow frame-timing extraction in the case of

inversion lemma, the corresponding inverse can be updated on-the-fly. To this end, the corresponding iteration is initialized by setting

$$\mathbf{P}^{(1)} = \frac{1}{M_T \sigma^2}\,\mathbf{I}$$

and proceeds by computing

$$\mathbf{P}^{(n+1)} = \mathbf{P}^{(n)}\left(\mathbf{I} - \frac{\hat{\mathbf{H}}_n^H \hat{\mathbf{H}}_n \mathbf{P}^{(n)}}{1 + \hat{\mathbf{H}}_n \mathbf{P}^{(n)} \hat{\mathbf{H}}_n^H}\right), \qquad (3)$$

where $\hat{\mathbf{H}}_n$ denotes the $n$th row of $\hat{\mathbf{H}}$. After $M_R$ iterations, $\mathbf{P}^{(M_R+1)} = (\hat{\mathbf{H}}^H \hat{\mathbf{H}} + M_T \sigma^2 \mathbf{I})^{-1}$ and $\mathbf{G} = \mathbf{P}^{(M_R+1)} \hat{\mathbf{H}}^H$. The dependence on the OFDM tone index $k$ has been omitted for brevity in this paragraph.

D. Algorithms for Pilot Tracking

Initial carrier frequency offset estimates suffer from estimation uncertainties which induce a residual frequency offset (RFO). Additionally, the sampling rate offset (SRO) between transmitter and receiver introduces a tone dependent frequency shift that cannot be estimated efficiently during the preamble and causes a tone-dependent phase deviation⁵ [39]. Hence, both RFO and SRO inhibit coherent detection if not compensated. To perform this compensation, the impact of these impairments must be estimated and eliminated during the data section of the frame using a-priori known pilot constellations transmitted on a set of pilot tones (those in $\mathcal{K}$).

In the receiver architecture outlined in Fig. 1(b), pilot-tracking is performed immediately after OFDM demodulation. In receivers based on linear MIMO separation, however, this task can be postponed until after the MIMO detection. Both alternatives are considered in the following.

RFO and SRO Estimation after OFDM Demodulation: During the data section of the frame, the expected received pilot constellations $\hat{\mathbf{r}}[k,t]$ are calculated for $k \in \mathcal{K}$ based on the channel estimates $\hat{\mathbf{H}}[k]$ and the known pilot constellations $\mathbf{s}[k,t]$ according to

$$\hat{\mathbf{r}}[k,t] = \hat{\mathbf{H}}[k]\,\mathbf{s}[k,t]. \qquad (4)$$

Now, the phase offset due to RFO can be estimated as

$$\varphi_{\mathrm{rfo}}[t] = \angle \sum_{k \in \mathcal{K}} \exp\!\left[ j \angle\!\left( \hat{\mathbf{r}}[k,t]^H \mathbf{r}[k,t] \right) \right], \qquad (5)$$

where the $\angle(\cdot)$ and $\exp[\cdot]$ functions are needed to normalize the contributions of the individual pilot tones to $\varphi_{\mathrm{rfo}}$. The slope of the phase deviation is calculated as

$$\delta[t] = \frac{\sum_{k \in \mathcal{K}} k \left( \angle\!\left( \hat{\mathbf{r}}[k,t]^H \mathbf{r}[k,t] \right) - \varphi_{\mathrm{rfo}}[t] \right)}{\sum_{k \in \mathcal{K}} k^2}.$$

Note that for this algorithm to function properly, samples must be inserted or removed in the time-domain sample stream if the phase slope exceeds a threshold corresponding to the offset of one sample, i.e., $|\delta[t]| > 2\pi/N$. This correction is necessary to avoid phase wrapping due to SRO.

RFO and SRO Estimation after Spatial Separation: The computational complexity of this algorithm is lower compared to the previous algorithm since there is no need to estimate the received pilot symbols according to (4). Instead, the a-priori known BPSK pilot symbols $\mathbf{s}[k,t]$ can be used as reference. Hence, the calculation of $\varphi_{\mathrm{rfo}}$ is simplified to

$$\varphi_{\mathrm{rfo}}[t] = \angle \sum_{k \in \mathcal{K}} \mathbf{s}[k,t]^H \hat{\mathbf{s}}[k,t], \qquad (6)$$

and the phase offset slope induced by the SRO is obtained from

$$\delta[t] = \frac{\sum_{k \in \mathcal{K}} k \left( \angle\!\left( \mathbf{s}[k,t]^H \hat{\mathbf{s}}[k,t] \right) - \varphi_{\mathrm{rfo}}[t] \right)}{\sum_{k \in \mathcal{K}} k^2}. \qquad (7)$$

For the compensation of the impact of RFO and SRO the received signals on the data-bearing tones are multiplied with $\exp\!\left(-j(\varphi_{\mathrm{rfo}}[t] + \delta[t]\,k)\right)$. This compensation can be performed before or after linear detection, independently of the chosen estimation algorithm.

An in-depth discussion of the estimation and compensation of analog and RF impairments in MIMO-OFDM systems can be found in [23].

E. Generating Soft-Information

The computation of LLRs $L(b_\nu \mid \mathbf{r}[k,t], \hat{\mathbf{H}}[k])$ for channel decoding follows the algorithm described in [40]. The basic idea is to start from the input-output relation between the transmitted vector symbol $\mathbf{s}$ and the output $\hat{\mathbf{s}}$ of the spatial separation in (2) which can be written as

$$\hat{\mathbf{s}} = \mathbf{s} + (\mathbf{G}\mathbf{H} - \mathbf{I})\,\mathbf{s} + \mathbf{G}\mathbf{n},$$

where the time and tone indices have been omitted for brevity. Modeling the residual interference $(\mathbf{G}\mathbf{H} - \mathbf{I})\,\mathbf{s}$ as i.i.d. Gaussian noise, the MIMO system is partitioned into $M_T$ parallel SISO systems. For each of these SISO systems, the effective signal to interference plus noise ratio (SINR) that accounts for the thermal noise and for the residual interference from other spatial streams is given by [41]

$$\rho_i = \frac{1}{M_T \sigma_n^2 \left[ \left( \hat{\mathbf{H}}[k]^H \hat{\mathbf{H}}[k] + M_T \sigma_n^2 \mathbf{I} \right)^{-1} \right]_{i,i}} - 1. \qquad (8)$$

With this per-stream additive Gaussian noise model and by using the well known log-sum approximation, LLRs can be approximated efficiently for the $q$th bit in the $i$th stream independently of all other streams according to

$$L(b_\nu \mid \hat{\mathbf{s}}, \hat{\mathbf{H}}) \approx \rho_i\, \lambda(b_\nu, \hat{s}_i) \qquad (9)$$

with

$$\lambda(b_\nu, \hat{s}_i) = \min_{s \in \mathcal{A}_q^0} |\hat{s}_i - s|^2 - \min_{s \in \mathcal{A}_q^1} |\hat{s}_i - s|^2, \qquad (10)$$

where the bit-index $\nu = (i-1)Q + q$. In (10) the sets $\mathcal{A}_q^0$ and $\mathcal{A}_q^1$ contain the scalar constellation points for the one and zero hypothesis for the $q$th bit associated with $\hat{s}_i$. Note that due to the decoupling of the data streams, the evaluation of (9) for all bits in a vector symbol merely requires the computation of $Q M_T$ Euclidean distances in a 1-dimensional, complex-valued vector space, while the evaluation of exact LLRs would

⁵ In the system under consideration the impact of RFO and SRO are considered to be independent of each other.
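
The processing chain of this part of the paper, namely the rank-one inverse update (3), the per-stream SINR (8), and the bit metric (9)-(10), can be illustrated numerically. The following is a minimal floating-point sketch, not the fixed-point hardware implementation. All function names, the variable `noise_var` (standing in for the noise variance in (3) and (8)), and the Gray-mapped QPSK labelling are assumptions introduced for illustration only.

```python
import numpy as np

def mmse_inverse_rank_one(H_hat, noise_var):
    """Build P = (H^H H + M_T*noise_var*I)^-1 by M_R rank-one updates, cf. (3)."""
    M_R, M_T = H_hat.shape
    P = np.eye(M_T, dtype=complex) / (M_T * noise_var)   # initialization P^(1)
    for n in range(M_R):
        h = H_hat[n:n + 1, :]                    # nth row of H_hat, shape (1, M_T)
        hH = h.conj().T                          # its Hermitian transpose (column)
        denom = 1.0 + (h @ P @ hH).real.item()   # scalar 1 + h P h^H (real, > 0)
        P = P @ (np.eye(M_T) - (hH @ h @ P) / denom)
    return P

def per_stream_sinr(P, noise_var):
    """Per-stream SINR from the diagonal of P, cf. (8)."""
    M_T = P.shape[0]
    return 1.0 / (M_T * noise_var * np.real(np.diag(P))) - 1.0

# ASSUMPTION: hypothetical Gray-mapped QPSK alphabet and bit labels (Q = 2).
QPSK_POINTS = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j]) / np.sqrt(2)
QPSK_BITS = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])

def bit_metric(s_hat_i, q):
    """SINR-independent metric: min distance over A_q^0 minus min over A_q^1, cf. (10)."""
    d2 = np.abs(s_hat_i - QPSK_POINTS) ** 2
    return d2[QPSK_BITS[:, q] == 0].min() - d2[QPSK_BITS[:, q] == 1].min()

def soft_bits(r, H_hat, noise_var):
    """Linear separation followed by per-stream LLR approximation, cf. (9)."""
    P = mmse_inverse_rank_one(H_hat, noise_var)
    G = P @ H_hat.conj().T            # linear detector G = P H^H
    s_hat = G @ r                     # spatially separated symbols
    rho = per_stream_sinr(P, noise_var)
    Q = QPSK_BITS.shape[1]
    # Bits are ordered stream by stream, i.e. index (i-1)Q + q in 1-based terms.
    return np.array([rho[i] * bit_metric(s_hat[i], q)
                     for i in range(len(s_hat)) for q in range(Q)])
```

In this sketch a positive value indicates that the one hypothesis is closer, which follows from the sign convention of (10) as written. The diagonal of `P` is reused for the SINR, so no second matrix inversion is needed.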
882 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 26, NO. 6, AUGUST 2008
[Figure: block diagram with host PCI bus, Tx preambles, scrambling, convolutional encoding, puncturing, interleaving, symbol mapping, FFT processor (add prefix), DUC, 4×DAC, 4×ADC, DDC, AGC, synchronization (frame timing, frequency offset estimation and compensation), FIFO buffer, MIMO processing and channel estimation (H/G memory, H estimation), pilot tracking, bit-metric unit, deinterleaving, depuncturing, Viterbi decoding, descrambling, Rx buffer.]
Fig. 5. Overview of the digital signal processing architecture of the MIMO-OFDM testbed transceiver.

[Figure: receiver latency timeline covering synchronization, FFT, FIFO buffering, channel estimation, preprocessing, MIMO detection, channel decoding, and tracking.]

of the digital AGC gain. The signal-power estimation for each antenna is obtained by the accumulation of 64 successive squared samples, which proves to be more accurate than an infinite-impulse-response (IIR) filter when operating on the periodic preamble signal. Costly mathematical functions, such as square-root and division, which are necessary to obtain an accurate power correction factor, are implemented in an iteratively decomposed manner in order to reduce their impact on hardware complexity.

The DDC in the receive path and the digital up-conversion (DUC) in the transmit path are realized as polyphase finite-impulse-response filters with incorporated IQ-de/modulation and down-/up-sampling in order to minimize hardware cost, avoiding unnecessary multiplications.

Power-based frame-timing recovery is realized based on the architecture described in [9]. Two multipliers are used to obtain the power estimate for all four receive chains at baseband.

Initial carrier frequency offset estimation and compensation is active during the preamble and realizes the operations described in [25]. However, instead of a running mean, a low-pass filter in the form of an IIR filter reduces the memory requirements considerably. A coordinate rotation digital computer (CORDIC) architecture [42] is used to obtain the per-sample phase offset required for compensation of the frequency offset. One built-in 18 kbit random-access memory (RAM) per receive antenna stores the incoming complex-valued samples required to compute the auto-correlation. The size of the RAM allows correlation over a lag of either 16, 32, or 64 samples in order to enhance the accuracy of the frequency-offset estimate if necessary. A look-up table provides the rotating phasor and four real-valued multipliers are used to compensate the carrier frequency offset for all four receive antennas.

B. OFDM De/modulation

In transmit mode, the OFDM de/modulation unit maps binary data to complex-valued constellation points, computes the superposition of all modulated tones with an IFFT transform, and inserts the cyclic prefix. At the beginning of each frame, the de/modulation unit also outputs the preamble, whose time-domain representation is stored in RAMs instead of being generated at run-time to reduce the transmit latency to a minimum. In receive mode, the same unit demodulates the received OFDM symbols by means of FFT transforms. For the de/modulation of OFDM symbols, a 64-point I/FFT processor with a single radix-4 processing element [8] is shared among the transmit and receive data paths. The different spatial streams are processed in a time-interleaved fashion by the same hardware.

The architecture of the I/FFT processor is shown in Fig. 7. The memory unit stores the complex-valued vector to be processed. In order to provide sufficient memory bandwidth, the storage is divided into four separate, dual-ported memory banks, each holding up to 16 complex-valued data words. The bus barrel shifters multiplex the data words to the appropriate bank and are required to support an addressing scheme [43] that avoids access conflicts during the computation of I/FFTs. The processing unit performs the arithmetic operations and is based on a conventional decimation-in-time radix-4 butterfly [44], consisting of three complex-valued multipliers and eight complex-valued adders. A dedicated look-up table
(LUT) generates the twiddle coefficients for the I/FFT transforms [44]. This twiddle-LUT is modest in size and is thus realized with random logic. The computations are orchestrated by a central control unit which accepts instructions and generates all control signals and addresses for the RAMs and LUTs.

[Figure: control unit (control input); coefficient generation with FFT twiddle LUT and constant matrix LUT for channel estimation; processing unit (butterfly or multiplication operation, multiply-accumulate operation for channel estimation); memory unit with SRAM banks 0 to 3 and bus barrel shifters on the input and output data paths.]
Fig. 7. Top-level architecture of the radix-4 I/FFT processor.

C. Channel Estimation

The straightforward FDML channel estimation algorithm has a negligible hardware complexity since it involves only trivial multiplications with the known BPSK training sequence. Hence, the corresponding operations are implemented on dedicated hardware directly in the MIMO processing unit. The interpolation-based channel estimation refinement algorithm described in Section III-B, instead, has a much higher complexity. However, the interpolation algorithm can be implemented efficiently based on I/FFT operations that can be outsourced to the I/FFT processor in the OFDM de/modulation unit.

The implementation of the interpolation-based channel estimation requires the following steps for each spatial subchannel: First, the tones in $\mathcal{P}$, which have been computed on-the-fly during the demodulation of the training OFDM symbols, are multiplied with a constant 2×52 matrix to obtain estimates for the two untrained tones in $\mathcal{Q}$. Next, the two results of the matrix-vector multiplication, together with the remaining 14 tones that are part of $\mathcal{X}$, are transformed into time-domain by a 16-point IFFT. The result of this transformation is element-wise multiplied with a phase correcting vector. After (virtually) zero-padding the result to a length of 64, it is transformed back into the frequency-domain, yielding an estimate for the remaining untrained subcarriers (those in $\mathcal{S}$). These estimates are concatenated with the initial FDML estimates for the trained subcarriers and with the estimates for those in $\mathcal{Q}$. At this stage, an estimate for all subcarriers is available and the correlation between channel coefficients can be exploited to reduce the estimation error. This requires a 64-point IFFT, a brick-wall filter that sets to zero all elements of the result exceeding the length of the channel impulse response, and a 64-point FFT. On the I/FFT processor, the multiplication with the brick-wall mask is performed only virtually and does not require any clock cycles. Additionally, the last FFT, which operates on a vector containing many zeros, is optimized to skip a third of the butterfly operations.

To perform the above described algorithm on the I/FFT processor, its radix-4 processing unit was extended with additional multiplexers, two complex-valued adders, and two accumulation registers as reported in [30] and [8] to enable the execution of plain complex-valued multiplications and multiply-accumulate operations. Moreover, an additional LUT was added to the I/FFT processor shown in Fig. 7. This LUT stores the 2×52 matrix required to compute the initial estimates for the subcarriers in $\mathcal{Q}$. With these extensions, the I/FFT processor can be shared between channel estimation and OFDM de/modulation to save hardware resources.

For the MIMO case, the described sequence of operations is repeated for all $M_T M_R$ spatial subchannels sequentially, incurring an overall latency of 42 µs for channel-estimate refinement in a 4×4 system.

D. MIMO Processing

The MIMO processing unit performs both preprocessing and detection. The main reason for combining these two operations is the fact that in the present architecture they are never performed at the same time. The hardware required for the preprocessing can thus be reused for the detection of data symbols once preprocessing is complete. To ease hardware reuse, the moderately parallel architecture detailed in [4] has been chosen for the implementation of the MIMO processing unit. The corresponding circuit comprises a circular array of $M_T$ identical processing elements (PEs) and a common divider for the division in (3). The PEs are connected only to their neighbors and each PE contains a complex-valued multiplier, an adder and some local registers. The arithmetic is pipelined with one stage, which provides a reasonable compromise between potential for higher clock speeds and the need for additional clock cycles due to data dependencies. This architectural choice is suitable for both preprocessing and detection, requires almost no control overhead to switch between these modes, and provides a balance between speed and resource utilization.

In this configuration the proposed circuit runs at a clock frequency of 40 MHz on the targeted Xilinx XC2V6000-6 FPGA. For the preprocessing, this clock frequency entails a delay of 2.2 µs per OFDM tone and, for the detection, 40 tones can be processed in the duration of one OFDM symbol. Unfortunately, maintaining real-time performance for the system under consideration requires a detection throughput of 52 tones per OFDM symbol⁶. In addition to that, a FIFO buffer for 29 MIMO-OFDM symbols is required to store the data symbols arriving during the preprocessing latency of 114.4 µs to avoid a loss of data. The chosen straightforward solution to achieve real-time detection performance for the proposed system is to instantiate two identical MIMO processing units to process two tones in parallel. In this configuration, the preprocessing of 52 tones incurs a latency of 57.2 µs which reduces the size of the required FIFO buffer. Moreover, the MIMO detection can now process up to 80 instead of the required 52 received vectors $\mathbf{r}[k,t]$ per OFDM-symbol interval

⁶ Detection is performed for all 48 data-bearing tones and for the 4 pilot tones used for tracking as described in Section III-D.
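
The buffer depth of 29 symbols, the 57.2 µs latency with two units, and the catch-up time of 27 OFDM symbols quoted in this section can be cross-checked with simple arithmetic. The sketch below is one plausible way to reproduce these figures and is not taken from the paper. It assumes an OFDM symbol duration of 4 µs (`SYMBOL_US`), which is not stated in this section, and it assumes that detection starts only once preprocessing has finished. The function and constant names are illustrative.

```python
import math

TONES = 52                     # 48 data-bearing tones + 4 pilot tones
PREPROC_US_PER_TONE = 2.2      # preprocessing delay per tone for one unit
DETECT_TONES_PER_SYMBOL = 40   # detection rate of one unit per OFDM symbol
SYMBOL_US = 4.0                # ASSUMPTION: OFDM symbol duration in microseconds

def buffer_budget(units):
    """Latency, FIFO depth, and catch-up time for a given number of MIMO units."""
    # Tones are split evenly across the parallel units during preprocessing.
    latency_us = TONES * PREPROC_US_PER_TONE / units
    # Symbols that arrive while preprocessing is still running must be buffered.
    fifo_symbols = math.ceil(latency_us / SYMBOL_US)
    detect_rate = units * DETECT_TONES_PER_SYMBOL
    surplus = detect_rate - TONES          # tones of backlog removed per symbol
    backlog_tones = latency_us / SYMBOL_US * TONES
    # Without surplus capacity the receiver never catches up.
    catch_up = math.ceil(backlog_tones / surplus) if surplus > 0 else None
    return {
        "latency_us": latency_us,
        "fifo_symbols": fifo_symbols,
        "detect_rate": detect_rate,
        "catch_up_symbols": catch_up,
    }

if __name__ == "__main__":
    for units in (1, 2):
        print(units, buffer_budget(units))
```

With a single unit the detection rate of 40 tones per symbol is below the required 52, so the backlog can never be cleared, which is why the catch-up time is reported as `None` in that case.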
HAENE et al.: A REAL-TIME 4-STREAM MIMO-OFDM TRANSCEIVER: SYSTEM DESIGN, FPGA IMPLEMENTATION, AND CHARACTERIZATION 885
so that the receiver can catch up with the incoming data stream after 27 OFDM symbols.

E. Pilot Tracking

The pilot tracking implements (6) and (7) to estimate the impact of RFO and SRO using the post-linear-detection algorithm. To avoid the storage of an entire OFDM symbol, a prediction of the correcting phasor based on the estimates of the two last OFDM symbols is applied to the current OFDM symbol. A CORDIC circuit is employed to implement the $\angle(\cdot)$ function. For the compensation, the correcting phasor is retrieved from a LUT and a complex-valued multiplier is used to process all four streams in a time-interleaved fashion.

F. Bit-Metric Unit

The SINR-independent part $\lambda(b_\nu, \hat{s}_i)$ of the bit-metric is a piecewise linear function that can be scaled so that the slope of the different segments is one of 1, 2, 3, 4. Hence, $\lambda(b_\nu, \hat{s}_i)$ can be implemented with comparators and adders only. The product of the per-stream SINR $\rho_i[k]$ with $\lambda(b_\nu, \hat{s}_i)$ in (9) is computed using one multiplier for each received stream. This multiplication produces values with a dynamic range well in excess of the 5-bit input word width of the Viterbi decoder used in the testbed. Fortunately, the bit error rate performance is insensitive to the quantization of the bit-metrics.

G. Channel Decoding

Per-stream convolutional coding with subsequent cross-antenna interleaving, as shown in Fig. 1, allows the processing of the computed soft-metrics on $M_T$ parallel Viterbi decoders. This enables the real-time decoding on Virtex-II FPGAs. The parallel decoders are implemented following the approach introduced in [30] and [2], where a single hardware unit processes multiple data streams in an interleaved fashion. This allows pipelining of the add-compare-select recursion, which is the throughput bottleneck in conventional single-stream Viterbi decoders. With this architecture, real-time performance is achieved with reduced FPGA resources compared to $M_T$ conventional decoders. The decoders process 5-bit-wide soft-metrics. Traceback is implemented with the register-exchange technique and the traceback length was selected to be 54 trellis steps.

Interleaver and deinterleaver are based on dual-ported RAMs that allow the concurrent storage and retrieval, according to the interleaving pattern, of bits (or LLRs) pertaining to two different OFDM symbols. Puncturing and depuncturing are hardware-uncritical operations that require only a corresponding finite state machine. Scrambling and descrambling perform a bit-wise exclusive-OR combination of the payload data with the output of a linear feedback shift register.

VI. IMPLEMENTATION RESULTS

The digital signal processing blocks shown in Fig. 5 are integrated on a single XC2V6000-6 FPGA, with the exception of DUC, DDC, and digital AGC gain stages. For each antenna pair, the corresponding instances of these three blocks are mapped onto the XC2V1000-4 FPGA located on the associated data converter module. The required FPGA resources for the two FPGA designs are detailed in Table I, where "Mult" and "Ram" refer to the built-in signed 18 bit × 18 bit multipliers and 18 kbit RAMs, respectively. The resources reported under "Others" refer to PCI interface, debugging and analysis circuitry, intermodule-communication interfaces, and other functional units that are not strictly related to the physical layer signal processing.

TABLE I
REQUIRED FPGA RESOURCES

XC2V6000-6 FPGA
Block                                     Slice   %Slice   Mult   Ram
Synchronization and tracking               5038     17.6     17     5
FIFO memories                                 0      0        0    32
OFDM de/modulation                         2879     10.1     12    12
MIMO processing and channel estimation     8847     31.0     32    10
Channel decoding and bit-metric unit       9082     31.8      4     5
Others                                     2725      9.5      0    70
Total                                     28571    100       65   134

XC2V1000-4 FPGAs
Block                                     Slice   %Slice   Mult   Ram
DUC                                         980     19.5     20     0
DDC                                         997     19.8      6     0
AGC                                         940     18.7      4     0
Others                                     2113     42.0      0    39
Total                                      5030    100       30    39

In summary, the circuits for the signal processing need a total of 64 RAMs, whereof 32 are used for the FIFO buffer required to bridge the latency incurred by the MIMO preprocessing and channel estimation. This buffer was intentionally oversized (by at least a factor of two) to enable special debug modes that allow access to the received data directly after the FFT or after the pilot tracking. Hence, a significant amount of memory could be saved in a real-world system by eliminating these debug modes.

In terms of throughput, all blocks in the testbed are designed to meet the real-time requirements imposed by the 20 MHz communication bandwidth and by the 216 Mbit/s throughput supported by the system. In terms of latency, the delay between the start of the frame and the time when the first bits are available at the output of the receiver is dominated by the 57.2 µs preprocessing latency discussed in Sec. V-D. The use of the channel estimation refinement algorithm, described in Sec. III-B, increases this latency by an additional 42 µs. In comparison, the latency of all other blocks is insignificant.

VII. MEASUREMENTS AND CHARACTERIZATION

A. Measurement Setup

Measurements were taken with two real-time terminals communicating over a wideband multipath channel emulator, as shown in Fig. 8. In place of antennas, the multi-antenna RF transceivers are connected directly to a MIMO channel emulator, which supports RF inputs and outputs. Time-varying
[Figure: measurement setup with Tx MIMO-OFDM terminal (slave), multipath MIMO channel emulator, and Rx MIMO-OFDM terminal (master).]

TABLE II
CONSIDERED 4×4 DATA RATES $T_m$ IN [MBIT/S]

Fig. 9. Average throughput T for different channel models. [Plot: average throughput [Mbps] versus average receive power [dBm]; curves for TGn-A (0 ns), TGn-B (80 ns), TGn-C (200 ns), and 2 tap, flat PDP (50 ns).]

Fig. 10. Per-mode average throughput $T_m$ over TGn-A for the transmission modes in Table II with coding rate $R_c = 1/2$. [Plot: per-mode average throughput [Mbps] versus average receive power [dBm].]

Fig. 11. Impact of different antenna spacings on the average throughput T achieved over a TGn-A channel. [Plot: average throughput [Mbps] versus average receive power [dBm]; curves for flat Rayleigh fading and for TGn-A with antenna spacing 100, 2, 1, and 0.5.]

Fig. 12. Impact of channel estimation refinement and soft-information extraction on the average throughput T over a TGn-B channel.

multipliers must be increased beyond 18 bit (which is the word width of the FPGA built-in multipliers) for SNRs in excess of 30 dB, when operating in an uncorrelated Rayleigh-fading environment. These limitations explain why the measured average throughputs T in Fig. 9 saturate between 120 Mbit/s and 140 Mbit/s, even though the highest physical layer data rate in Table II is much higher (216 Mbit/s).

3) Impact of Antenna Correlation: When reducing the antenna spacing, the correlation between the received signals increases and linear MIMO detection becomes less reliable. The impact of this correlation is shown in Fig. 11 for the particular case of TGn-A, where the antenna spacing is set to 100, 2, 1, and 0.5 wavelengths. For comparison a flat Rayleigh-fading channel, corresponding to the special case of TGn-A without any spatial correlation, is considered. As expected, the system suffers significantly when the antenna spacing is reduced below one wavelength.

4) Impact of Channel Estimation Refinement and Soft-Output MIMO Detection: The impact of channel estimation refinement and soft-information extraction on the average throughput is investigated in the following. The FPGA resources required to support these algorithms are reported in Table III, where the slice percentage refers to the total number of used slices on the XC2V6000-6 FPGA (given in Table I).

In order to assess the impact of these algorithms on performance, channel estimation refinement can be skipped, and the extraction of soft-information can be replaced with simpler hard-decision MIMO detection. The measured average throughput T achieved over a TGn-B channel is shown in Fig. 12. The performance with all receiver algorithms enabled corresponds to the topmost curve. For comparison, the performance was measured after disabling channel estimation refinement and once again after additionally disabling the extraction of LLRs. The average throughput $T_m$ achieved with 16-QAM and $R_c = 1/2$ is shown in Fig. 13. From this figure, channel estimation refinement yields an SNR gain of about 2 dB. The impact of soft-information extraction, which is slightly harder to determine because the curves saturate at different levels due to the SNR limitation, amounts to about 2.5 dB to 3.7 dB.

In summary, the curves in Fig. 13 and the numbers in Table III show that with a combination of the two receiver algorithms, an overall SNR gain of about 4.5 dB to 5.7 dB can be achieved with roughly 5% increase in FPGA-resource utilization.

[Plot residue: horizontal axis average receive power [dBm]; curves for all receiver algorithms on, enhanced channel estimation off, and soft information extraction off; annotation 2.5 dB.]

⁸ Only the overhead related to the bit-metric unit is considered. The difference in silicon area between hard-input and soft-input Viterbi decoding is not considered.

VIII. CONCLUSION

A real-time MIMO-OFDM physical layer transmitting at

for MIMO BLAST over third-generation wireless system," IEEE J. Sel. Areas Commun., vol. 21, pp. 440–451, 2003.
[16] H. Sampath, S. Talwar, J. Tellado, V. Erceg, and A. Paulraj, "A fourth-generation MIMO-OFDM broadband wireless system: design, performance, and field trial results," IEEE Commun. Mag., vol. 40, no. 9, pp. 143–149, Sep. 2002.
[17] C. Dubuc, D. Starks, T. Creasy, and H. Yong, "A MIMO-OFDM prototype for next-generation wireless WANs," IEEE Commun. Mag., vol. 42, pp. 82–87, Dec. 2004.
[18] A. van Zelst and T. C. W. Schenk, "Implementation of a MIMO OFDM-based wireless LAN system," IEEE Trans. Signal Processing, vol. 52, no. 2, pp. 483–494, Feb. 2004.
[19] C. Mehlführer, M. Rupp, F. Kaltenberger, and G. Humer, "A scalable rapid prototyping system for real-time MIMO OFDM transmissions," in Proc. of the 2nd IEE/EURASIP Conference on DSP enabled Radio, Sep. 2005, pp. 7–14.
[20] Y. Heejung, K. Myung-Soon, C. Eun-young, J. Taehyun, and L. Sok-kyu, "Design and prototype development of MIMO-OFDM for next generation wireless LAN," IEEE Trans. Consumer Electron., vol. 51, pp. 1134–1142, Nov. 2005.
[21] T. Haustein, A. Forck, H. Gabler, V. Jungnickel, and S. Schiffermüller, "Real-time signal processing for multiantenna systems: Algorithms, optimization, and implementation on an experimental test-bed," EURASIP Journal on Applied Signal Processing, 2006, Article ID 27 573.
[22] IEEE 802.11a Standard, ISO/IEC 8802-11:1999/Amd 1:2000(E).
[23] D. Perels, "Frame-based MIMO-OFDM systems: Impairment estimation and compensation," Ph.D. dissertation, IIS / ETH-Zurich, Aug. 2007, advisors: Prof. W. Fichtner (ETH Zurich), Prof. H. Bölcskei (ETH Zurich).
[24] "Universal mobile telecommunications system (UMTS); spreading and modulation (FDD) (3GPP TS 25.213 version 7.2.0 release 7)," 3GPP, Technical specification ETSI TS 125 213, May 2007.
[25] T. M. Schmidl and D. C. Cox, "Robust frequency and timing synchronization for OFDM," IEEE Trans. Commun., vol. 45, no. 12, pp. 1613–1621, Dec. 1997.
[26] A. Fort and W. Eberle, "Synchronization and AGC proposal for IEEE 802.11a burst OFDM systems," in Proc. IEEE GLOBECOM, vol. 3, 2003, pp. 1335–1338.
[27] J. J. van de Beek, M. Sandell, and P. O. Börjesson, "ML estimation of time and frequency offset in OFDM systems," IEEE Trans. Signal Processing, vol. 45, pp. 1800–1805, 1997.
[28] H. Bölcskei, "Blind estimation of symbol timing and carrier frequency offset in wireless OFDM systems," IEEE Trans. Commun., vol. 49, pp. 988–999, 2001.
[29] L. Deneire, P. Vandenameele, L. van der Perre, B. Gyselinck, and M. Engels, "A low-complexity ML channel estimator for OFDM," IEEE Trans. Commun., vol. 51, no. 2, pp. 135–140, 2003.
[30] S. Haene, "VLSI circuits for MIMO-OFDM physical layer," Ph.D. dissertation, IIS / ETH-Zurich, Aug. 2007, advisors: Prof. W. Fichtner (ETH Zurich), Prof. H. Bölcskei (ETH Zurich).
[31] A. Burg, "VLSI circuits for MIMO communication systems," Ph.D. dissertation, IIS / ETH-Zurich, Feb. 2006, advisors: Prof. W. Fichtner (ETH Zurich), Prof. M. Rupp (TU-Vienna).
[32] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bölcskei, "VLSI implementation of MIMO detection using the sphere decoder algorithm," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566–1577, Jul. 2005.
[33] M. Wenk, M. Zellweger, A. Burg, N. Felber, and W. Fichtner, "K-best MIMO detection VLSI architectures achieving up to 424 Mbps," in Proc. IEEE Int. Symp. on Circuits and Systems, May 2006.
[34] C. Studer, A. Burg, and H. Bölcskei, "Soft-output sphere decoding: Algorithms and VLSI implementation," IEEE J. Select. Areas Commun., vol. 26, no. 2, pp. 290–300, Apr. 2007.
[35] E. Zimmermann and G. Fettweis, "Adaptive vs. hybrid iterative MIMO receivers based on MMSE linear and soft-SIC detection," in Proc. IEEE Symp. on Personal, Indoor and Mobile Radio Communications, Sep. 2006, pp. 1–5.
[36] M. Borgmann and H. Bölcskei, "Interpolation-based efficient matrix inversion for MIMO-OFDM receivers," in Proc. 38th Asilomar Conf. on Signals, Systems, and Computers, Nov. 2004, pp. 1941–1947.
[37] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins Univ. Press, 1996.
[38] B. Hassibi, "An efficient square-root algorithm for BLAST," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, Jun. 2000, pp. 737–740.
[39] M. Speth, S. A. Fechtel, G. Fock, and H. Meyr, "Optimum receiver design for wireless broad-band systems using OFDM," IEEE Trans. Commun., vol. 47, no. 11, pp. 1668–1677, Nov. 1999.
[40] I. B. Collings, M. R. G. Butler, and M. McKay, "Low complexity receiver design for MIMO bit-interleaved coded modulation," in IEEE Int. Symp. on Spread Spectrum Techniques and Applications, 2004, pp. 12–16.
[41] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge Univ. Press, 2003.
[42] B. Parhami, Computer Arithmetic, Algorithms and Hardware Design. Oxford University Press, 2000.
[43] L. G. Johnson, "Conflict free memory addressing for dedicated FFT hardware," IEEE Trans. Circuits Syst. II, vol. 39, pp. 312–316, 1992.
[44] E. O. Brigham, The fast Fourier transform and its applications. Prentice Hall, 1988.
[45] V. Erceg et al., TGn Channel Models, IEEE 802.11 document 03/940r4.

Simon Haene (S'03–M'08) was born in Basel, Switzerland, in 1978. He received the Diploma degree in electrical engineering from ETH Zurich, Switzerland, in 2002. He then joined the Integrated Systems Laboratory of ETH Zurich, and graduated with a Dr. sc. ETH degree in 2007. In the same year, he co-founded Celestrius, an ETH-spinoff in the field of MIMO wireless communication. In 2000, he held a summer researcher position at British Telecom, England.
His research interests include the design of VLSI circuits, digital signal processing for wireless communication systems, and FPGA-based prototyping.

David Perels (S'94–M'97) was born on April 15th 1972 in Heidelberg, Germany. He studied electrical engineering from 1992 to 1997 at the ETH Zurich, Switzerland, and received his Diploma degree in 1997.
From 1997 to 2001 he worked at Swisscom Mobile in the field of mobile communications and mobile data services. In 2001 he joined the Integrated Systems Laboratory at ETH Zurich where he graduated with a Dr. sc. techn. degree in 2007 in the field of VLSI design for wireless communications. Mr. Perels is currently working at Phonak, a hearing instrument manufacturer, in the research and development department.

Andreas Burg (S'97–M'05) was born in Munich, Germany, in 1975. He received his Dipl.-Ing. degree in 2000 from the Swiss Federal Institute of Technology (ETH) Zurich, Zurich, Switzerland. He then joined the Integrated Systems Laboratory of ETH Zurich, from where he graduated with the Dr. sc. techn. degree in 2006.
In 1998, he worked at Siemens Semiconductors, San Jose, CA. During his doctoral studies, he was a visiting researcher with Bell Labs Wireless Research for a total of one year. From 2006 to 2007, he held positions as postdoctoral researcher at the Integrated Systems Laboratory and at the Communication Technology Laboratory of the ETH Zurich. In 2007 he co-founded Celestrius, an ETH-spinoff in the field of MIMO wireless communication, where he is responsible for the VLSI development. His research interests include the design of digital VLSI circuits and systems, signal processing for wireless communications, and deep submicron VLSI design.
In 2000, Mr. Burg received the Willi Studer Award and the ETH Medal for his diploma and his diploma thesis, respectively. Mr. Burg was also awarded an ETH Medal for his Ph.D. dissertation in 2006. In 2008, Dr. Burg was awarded a 4-years grant from the Swiss National Science Foundation (SNF) on which he will join the ETH Zurich as an SNF Professor in 2008.