Beruflich Dokumente
Kultur Dokumente
Source
Abstract—The Third Generation Partnership Project (3GPP) Outer
Modulator
N-point Subcarrier M-point
Encoder DFT mapping IDFT
Long Term Evolution (LTE) standard is becoming the appro-
priate choice to pave the way for the next generation wireless
and cellular standards. While the popular OFDM technique Channel
where Ym , Hm m
n , Dn , and N
m
denote the DFT of ym , hin ,
i m
dn , and n , respectively.
equalizers, frequency domain equalizers [5] and combined
equalizers [6]. For high ISI channels, time domain equalizers
have high complexity and become unattractive to implement. Fig. 2. The BER performance of LTE uplink receiver, 2×2 MIMO, 16-QAM.
Among frequency domain equalizers (FDE), zero-forcing FDE
(ZF-FDE) and minimum mean-square error FDE (MMSE- Sphere detection (SD) solves the complexity problem of
FDE) equalizers are the simplest ones. The MMSE-FDE ML detection with some acceptable performance loss. In [8],
equalizer has better performance than the ZF-FDE. Some the authors proposed a computationally efficient SD and list
equalizers belong to the third type. For example, the block sphere detection (LSD) to achieve near-capacity performance
MMSE equalizer [2] is a type of equalizer operating in on a MIMO system. The candidates list L was used to compute
both time and frequency domains. This equalizer can achieve the APP information for each transmitted bit xk . It is assumed
better bit error rate (BER) performance with much higher that iterative detection and decoding of bits x that correspond
algorithmic complexity. Because of the simple architecture and to one channel usage are performed. Then the LLR value of
relatively good performance, the MMSE-FDE is chosen in our the bit xk , k = 0, · · · , M ·MC −1, conditioned on the received
implementation. vector symbol y can be expressed as:
MMSE-FDE minimizes the mean square error between its
output and the symbols transmitted from the transmitter. The
1 1
equation for an MMSE-FDE is: LE (xk |y) ≈ max − 2 y − H · s2 + xT[k]·LA,[k]
2 x∈L|Xk,+1 σ
(5)
1 1
Y = (ĤH Ĥ + σ 2 I)−1 ĤH Y, (3) − max − 2 y − H · s2 + xT[k]·LA,[k] ,
2 x∈L|Xk,−1 σ
where Ĥ is an estimated channel matrix of H for each where M is the number of transmit antennas, and MC is the
subcarrier. Ĥ is the output of channel estimation module. number of bits per constellation symbol.
Based on (5), we designed an APP unit to calculate the APP
B. MIMO Detection information used by the inner detector and outer decoder with
Maximum likelihood (ML) search is the optimum detection reduced hardware complexity. This APP unit is reconfigurable
method, which minimizes the BER. This scheme assumes and can be used in different soft MIMO detectors.
an exhaustive search over the set of all possible transmitted Notice that in (5), − σ12 y − H · s2 has been calculated
symbol vectors Λ for the minimum square error given by: as a partial Euclidean distance (PED) in sphere detector.
Therefore, the APP unit can take full advantage of the soft
ŝM L = arg min ||Hs − y||2 . (4) information from the inner detector to reduce the complexity
s∈Λ
of the hardware system.
However, the complexity of full ML search is too high. Even
with modern silicon technology the full ML search is still D. Simulation Results
not feasible, especially for the MIMO detection with multiple The 2×2 MIMO receiver is verified on the Rice WARPLab
antennas and high modulation orders [7]. platform [9]. WARPLab is a scalable and extensible pro-
grammable wireless platform based on software radio to
C. LLR Function for APP Detection prototype advanced wireless networks. Signals generated in
The outer soft decoder calculates the maximum a posteriori MATLAB can be transmitted in real-time over the air using
(MAP) or a posteriori probability (APP) values. The soft APP WARP nodes. This facilitates rapid prototyping of Physical
information is exchanged between inner detector and outer layer algorithms. The goal of this simulation is to verify that
decoder, and it is used as additional a priori knowledge in our receiver system satisfies the system requirements of the
the form of a vector of log-likelihood ratio (LLR) values. The LTE standard.
magnitude of the LLR value corresponds to the reliability of Fig. 2 shows the BER performance for 2×2 MIMO receiver,
the decision. The larger the LLR is, the more reliable the where 16-QAM modulation is used. The length of DFT and
decision for a decoded bit is. IDFT are 128 and 72, respectively. The length of codeword
249
Complex
FPGA symbol Bit
CP Subcarrier LDPC stream
DFT IFFT Bit
Removal demapping decoder Flex
DEMUX
MMSE Sphere APP . Demodulator LLR
Buffer
. . . .
Detector LLR
SD
Equalizer . Unit . LLR
. . .
(SD) LLR
LLR Value
. . .
Subcarrier
. .
LDPC Pre- Computation
Computation
CP Computation
Computation
Removal
DFT
demapping IFFT decoder Processing MUX Processor
Processor
Processor
Processor
Element
Element
Partial Element
Element
(PE)
(PE)
Euclidean (PE)
(PE)
Channel Estimation Block
distance (PED)
PED
Fig. 3. The system diagram of LTE uplink receiver. This system is 2×2
MIMO system.
Fig. 4. The top level diagram of the architecture of APP unit. The soft SD
TABLE I in this architecture is Flex SD.
PARAMETERS F OR A 2×2 MIMO LTE U PLINK R ECEIVER
250
Candidate APP Unit B. APP Function Unit
memory with
feedback We use Xilinx System Generator to implement the proposed
APP Unit
APP unit architecture. The APP unit processes the input data
Sphere Soft
without LDPC in parallel, and can be extended to support higher modulation
value
detector decoder
MUX feedback buffer orders and more antennas. By replacing multiplications with
shift and addition operations, we reduce the number of multi-
Feedback pliers required and get higher maximum frequency. Table III
soft value
buffer shows the Xilinx ISE synthesis result of the APP function unit.
TABLE III
Fig. 5. The architecture of the APP unit with iterative loop. Two buffers, FPGA RESOURCE UTILIZATION SUMMARY OF THE PROPOSED APP UNIT
one candidate memory and a multiplexing have been added. FOR THE X ILINX V IRTEX -4, XC4VFX100-10FF1517 D EVICE
Number of Slices 2,393 /42,176 (5%)
Number of 4 input LUTs 4,426 /84,352 (5%)
Number of DSP48s 0 /160 (0%)
Max. Frequency 208MHz
Max. Data Rate 1.628Gbps
C. Other Block Units
Table IV and Table V show the FPGA implementation
results for sphere detector and LDPC decoder blocks, respec-
tively. All of these parts are reconfigurable. We use the Xilinx
DFT core [15] to perform DFT and IDFT operations in our
system.
TABLE IV
FPGA RESOURCE UTILIZATION SUMMARY OF IDFT BLOCK F OR THE
Fig. 6. Performance comparison between fixed-point implementation and X ILINX V IRTEX -4
floating-point for MMSE-FDE block.
Number of Slices 3,748 /42,176 (8%)
subcarriers are transmitted and 1200 subcarriers are occupied Number of 4 input LUTs 5,699 /84,352 (6%)
by symbols. 16-QAM and 2304-bit LDPC code are used. It Number of DSP48s 16 /160 (10%)
is noticeable that since the system is reconfigurable, we could Max. Frequency 234MHz
easily change the parameters of the receiver so that it could Max. Data Rate 936Mbps
support other profiles in the LTE standard.
A. MMSE-FDE TABLE V
FPGA RESOURCE UTILIZATION SUMMARY OF SPHERE DETECTOR BLOCK
The performance comparison between fixed-point imple- F OR THE X ILINX V IRTEX -4, XC4VFX100-10FF1517 D EVICE
mentation and floating-point is shown in Fig. 6. Before 30dB,
the two curves are almost the same. The performance loss Number of Slices 7,780 /42,176 (18%)
after 30dB occurs since σ 2 becomes zero and the MMSE- Number of 4 input LUTs 14,300/84,352 (16%)
FDE reduces to the less accurate ZF-FDE as mentioned in Number of DSP48s 81 /160 (50%)
Section IV. Max. Frequency 220MHz
Xilinx System Generator is used to implement the proposed Max. Data Rate 220Mbps
MMSE-FDE. Table II shows the Xilinx ISE synthesis results
of the FPGA implementation. VI. S YSTEM P ERFORMANCE AND I MPLEMENTATION
TABLE II C ONSIDERATIONS
FPGA R ESOURCE U TILIZATION S UMMARY OF T HE P ROPOSED A. System Performance
MMSE-FDE F OR THE X ILINX V IRTEX -4, XC4VFX100-10FF1517
D EVICE In our LTE uplink receiver system, the parameters are
set as below: 2 receiving antennas, 2048 subcarriers, 1200
Number of Slices 5,145/42,176 (12%)
occupied subcarriers, 16-QAM, 2304-bit LDPC for 20MHz
Number of 4 input LUTs 9,019/84,352 (10%)
channel bandwidth. By using two clock domains, with MMSE-
Number of DSP48s 64/160 (40%)
FDE, IDFT, APP unit and LDPC decoder in one slower clock
Max. Frequency 99.461MHz
domain, and sphere detector in the other faster clock domain,
Max. Data Rate 397.844Mbps the current system can achieve a data rate of up to 220Mbps.
251
This data rate is much higher than the requirement given further optimize the whole system to reduce the usage of
by LTE standard, which is 115.2Mbps for 20MHz channel hardware resources by balancing the resource usage among
bandwidth under the 2×2 16-QAM scenario [1]. different parts of the system. For example, we could reuse
some modules and match the rates of different modules.
B. Higher Data Rate Furthermore, with this platform, we will investigate more
LTE standard specifies signal transmissions in six possible complicated algorithms with potentially better performance,
channel bandwidths ranging from 1.4MHz to 20MHz [1], such as the block MMSE equalizer and more sophisticated
[3]. There are 72 occupied subcarriers available in a 1.4MHz iterative detection-decoding schemes. We will also try to
channel and 1200 occupied subcarriers available in a 20MHz increase the reconfigurability of the receiver in order that it
channel. The data rate of a 20MHz channel for a 2×2 64-QAM can be configured on the fly.
uplink system is 172Mbps. To support this configuration, the
overall architecture of our system does not need to change. We
ACKNOWLEDGMENTS
only need to configure the sphere detector and APP unit to 64-
QAM mode. Accordingly, because the size of the transmission The authors would like to thank Nokia, Nokia Siemens
signal sequence becomes larger, we should increase the size Networks (NSN), Xilinx, Azimuth Systems, and US Na-
of the buffer between blocks. tional Science Foundation (under grants CCF-0541363, CNS-
0551692, CNS-0619767, EECS-0925942 and CNS-0923479)
C. System Scalability for their support of the research.
During implementing, in order to simplify the design pro-
R EFERENCES
cess, we assume that the input data comes from one user.
[1] 3rd Generation Partnership Project, “3GPP TS 36.211 - technical spec-
However, we could extend our receiver to support multi- ification group radio access networ; evolved universal terrestrial radio
user access. As is depicted in Fig. 3, most of the blocks access (E-UTRA); physical channels and modulation (Release 8),” Nov
do not need to change except for a few modifications. The 2007.
[2] P. Radosavljevic, “Sphere detection and LDPC decoding algorithms and
first difference is, instead of using an N-point IDFT block, architectures for wireless systems,” scholarship.rice.edu, Jan 2008.
we should replicate a few IDFT blocks with small length, [3] H. G. Myung and D. J. Goodman, “Single carrier FDMA: a new air
each of which is for one user. Another modification is to interface for long term evolution,” p. 185, Jan 2008.
[4] Y. Wu, X. Zhu, and A. Nandi, “Low complexity adaptive turbo space-
add LDPC decoders for multiple users. In order to separate frequency equalization for single-carrier multiple-input multiple-output
the codeword for each user, a de-multiplexing is required systems,” IEEE Transactions on Wireless Communications, vol. 7, no. 6,
between the APP computation unit and LDPC decoder. It is pp. 2050 – 2056, Jun 2008.
[5] D. Falconer, S. Ariyavisitakul, A. Benyamin-Seeyar, and B. Eidson,
noticeable that by replicating blocks for different users, we do “Frequency domain equalization for single-carrier broadband wireless
not modify the overall architecture or redesign the function systems,” IEEE Communications Magazine, vol. 40, no. 4, pp. 58 – 66,
unit blocks. More hardware resources and more chip area are Apr 2002.
[6] R. Koetter, A. Singer, and M. Tuchler, “Turbo equalization,” IEEE Signal
required when extending the system by replicating function Processing Magazine, vol. 21, no. 1, pp. 67 – 80, Jan 2004.
blocks. However, there are opportunities to reduce the usage [7] D. Garrett, L. Davis, S. ten Brink, and B. Hochwald, “Silicon complexity
of hardware resources. For example, we could exploit the for maximum likelihood MIMO detection using spherical decoding,”
IEEE Journal of Solid-State Circuits, Jan 2004.
potential reuse of the IDFT blocks and it is probable that some [8] B. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-
blocks could share specific hardware resources. antenna channel,” IEEE Transactions on Communications, vol. 51, no. 3,
pp. 389 – 399, Mar 2003.
VII. C ONCLUSION AND F UTURE W ORK [9] “Wireless open access research platform.” [Online]. Available:
http://warp.rice.edu/
This paper proposed a flexible architecture of the high data [10] K. Amiri, C. Dick, R. Rao, and J. Cavallaro, “Novel sort-free detector
rate LTE uplink receiver, which integrates several advanced with modified real-valued decomposition (M-RVD) ordering in MIMO
systems,” 2008 IEEE Global Telecommunications Conference, pp. 1 –
algorithms and features. A single FPGA prototype of this 5, Jan 2008.
architecture is presented. It supports different numbers of [11] Y. Sun, M. Karkooti, and J. Cavallaro, “VLSI decoder architecture for
antennas and modulation orders. The prototype is imple- high throughput, variable block-size and multi-rate LDPC codes,” 2007
IEEE International Symposium on Circuits and Systems, pp. 2104 –
mented using Xilinx System Generator and is verified on the 2107, Apr 2007.
WARPLab platform with channels generated by the Azimuth [12] S. Yoshizawa, Y. Yamauchi, and Y. Miyanaga, “VLSI implementation of
channel emulator. We also verified the system in real over-the- a complete pipeline MMSE detector for a 4 x 4 MIMO-OFDM receiver,”
IEICE Transactions on Fundamentals of Electronics, Jan 2008.
air indoor channels. [13] J. Eilert, D. Wu, and D. Liu;, “Implementation of a programmable
The FPGA prototype we built can be fit in one Xilinx linear MMSE detector for MIMO-OFDM,” 2008 IEEE International
Virtex4 FX140 FPGA. It supports data rates up to 220Mbps, Conference on Acoustics, Speech and Signal Processing, pp. 5396 –
5399, Jan 2008.
which is much higher than the data rate requirement of the [14] J. Bhatia, K. Mohammed, A. Shah, and B. Daneshrad, “A practical,
LTE standard. The prototype of our LTE uplink receiver can hardware friendly MMSE detector for MIMO-OFDM-based systems,”
be configured to support different transmission bandwidths EURASIP Journal on Advances in Signal Processing, Jan 2008.
[15] “Discrete fourier transform v3.0.” [Online]. Available:
specified by the LTE standard. http://www.xilinx.com/products/ipcenter/DFT.htm
The future work is to extend our LTE uplink receiver
to support multi-access from different users. We will
252