
Reduced-Complexity LLL Algorithm

for Lattice-Reduction-Aided MIMO Detection


Chun-Fu Liao and Yuan-Hao Huang
Institute of Communications Engineering and Department of Electrical Engineering
National Tsing-Hua University, Hsinchu, Taiwan, R.O.C. 30013.
Email: g9664505@oz.nthu.edu.tw, yhhuang@ee.nthu.edu.tw
Abstract: In this paper, we propose a low-complexity constant-throughput LLL algorithm
for lattice-reduction-aided (LRA) multi-input multi-output (MIMO) detection. The
traditional LLL algorithm for lattice reduction suffers from varying throughput because
its size-reduction and LLL-reduction checks run for a variable number of iterations.
To address this problem, we propose a constant-throughput LLL (CT-LLL) algorithm that
is well suited for real-time implementation. We further propose techniques that remove
redundant operations in the CT-LLL algorithm and thereby reduce its computational
complexity. Simulation and analysis results show that the proposed low-complexity
CT-LLL algorithm reduces the complexity of the CT-LLL algorithm for 4×4 and 8×8 MIMO
systems to 80% and 72.94%, respectively, with negligible performance degradation.

I. INTRODUCTION
With the evolution of wireless communication systems, traditional single-input
single-output (SISO) transmission cannot satisfy the high data-rate and
spectral-efficiency requirements of next-generation wireless communication systems.
To increase the transmission capacity, the multiple-input multiple-output (MIMO)
system has been proposed, but it requires a high-performance, low-complexity MIMO
detector. The maximum likelihood (ML) detector is known to be optimal; however, it is
impractical to implement owing to its high computational complexity. To address this
problem, researchers have proposed tree-based search algorithms, such as sphere
decoding [1] and K-best decoding [2], which achieve near-optimal performance at
reduced complexity, but their computational complexities are still very high. On the
other hand, linear methods, such as zero-forcing (ZF) and minimum mean square error
(MMSE) detectors, and non-linear methods, such as ordered successive interference
cancellation (OSIC) detectors, have lower complexities, but they fail to achieve full
diversity gain. The lattice-reduction-aided (LRA) detection technique [3] has been
proposed as a solution featuring full diversity gain and acceptable complexity.
Lattice reduction (LR) transforms the channel matrix into a more orthogonal one by
finding a better basis for the same lattice, thereby improving the diversity gain of
the MIMO detector.
The Lenstra-Lenstra-Lovász (LLL) algorithm is a well-known LR algorithm with
polynomial execution time. However, its variable execution time is a significant
problem for real-time implementation in the MIMO Rayleigh fading channel [4]. A
fixed-complexity LLL algorithm has already been proposed [5]; however, it is still not
practical enough for chip implementation because it lacks parallelism. We therefore
propose a constant-throughput LLL (CT-LLL) algorithm that combines the parallel LLL
algorithm [6] with the effective LLL algorithm [7] to achieve constant throughput.
Furthermore, we remove the redundant operations of this CT-LLL algorithm to close the
complexity gap between the structural LLL algorithm and the iterative LLL algorithm.

978-1-4244-5827-1/09/$26.00 ©2009 IEEE

The remainder of this paper is organized as follows. Section
II briefly describes the notations and system model. In Section
III, we introduce the lattice-reduction-aided MIMO detection,
the LLL algorithm and the simple CT-LLL algorithm. In Section IV, we demonstrate the computation complexity reduction
scheme for CT-LLL algorithm, and in Section V, we present
the simulation and complexity analysis results. Finally, we
summarize our conclusions in Section VI.
II. SYSTEM MODEL
A narrow-band $n_r \times n_t$ MIMO system consisting of $n_t$ transmitters and $n_r$
receivers can be modeled by

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (1)$$

where $\mathbf{x} \in A^{n_t}$ is the transmitted signal vector; $\mathbf{y} \in \mathbb{C}^{n_r}$
is the received signal vector; $\mathbf{H} = [\mathbf{h}_1, \mathbf{h}_2, \dots, \mathbf{h}_{n_t}]$
represents a flat-fading channel matrix; and $\mathbf{n} \in \mathbb{C}^{n_r}$ is white
Gaussian noise with variance $\sigma_n^2$. The vectors $\mathbf{h}_i$ are independent and
identically distributed complex Gaussian random vectors with zero mean and unit
variance. The set $A$ consists of the constellation points of the QAM modulation.
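To avoid costly complex-valued operations, we reformulate the system as an equivalent
real-valued model:

$$\mathbf{y}_r = \begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix}. \qquad (2)$$

Then, the dimension of $\mathbf{H}_r$ becomes $n \times m$, where $m = 2n_t$ and $n = 2n_r$.
The vectors $\mathbf{y}_r$ and $\mathbf{n}_r$ belong to $\mathbb{R}^n$, and $\mathbf{x}_r \in A^m$,
where $A = \{\pm\tfrac{1}{2}a, \dots, \pm\tfrac{\sqrt{M}-1}{2}a\}$ denotes the real
constellation points for the $M$-QAM modulation. We use the parameter
$a = \sqrt{6/(M-1)}$ for power normalization.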
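The real-valued reformulation in (2) can be checked numerically. The following is a minimal sketch, assuming NumPy, 16-QAM, a 4×4 system, and an arbitrary noise scale; the helper name `to_real_model` and the variable `n_rx` (standing for $n_r$) are illustrative and not part of the original paper.

```python
import numpy as np

def to_real_model(H, x, n):
    """Stack real and imaginary parts as in Eq. (2)."""
    H_r = np.block([[H.real, -H.imag],
                    [H.imag,  H.real]])
    x_r = np.concatenate([x.real, x.imag])
    n_r = np.concatenate([n.real, n.imag])
    return H_r, x_r, n_r

rng = np.random.default_rng(0)        # fixed seed, chosen only for reproducibility
n_t, n_rx = 4, 4                      # 4x4 system (n_rx stands for n_r in the text)
M = 16                                # 16-QAM, assumed for this example
a = np.sqrt(6.0 / (M - 1))            # power-normalization parameter
root_M = int(round(np.sqrt(M)))
levels = a * np.arange(-(root_M - 1), root_M, 2) / 2   # -3a/2, -a/2, a/2, 3a/2

# Complex channel with unit-variance entries, QAM symbols, and (arbitrary) noise.
H = (rng.standard_normal((n_rx, n_t)) + 1j * rng.standard_normal((n_rx, n_t))) / np.sqrt(2)
x = rng.choice(levels, n_t) + 1j * rng.choice(levels, n_t)
n = 0.1 * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))
y = H @ x + n

H_r, x_r, n_r = to_real_model(H, x, n)
y_r = np.concatenate([y.real, y.imag])
symbol_energy = 2 * np.mean(levels ** 2)   # average energy of one complex symbol
```

With this choice of $a$, the average energy of a complex symbol is one, and the stacked vector `y_r` equals `H_r @ x_r + n_r` up to rounding.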
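The QR decomposition is often applied in the preprocessing of MIMO detection because
it enables efficient decoding. The channel matrix $\mathbf{H}_r$ can then be expressed by

$$\mathbf{H}_r = \mathbf{Q}_r \mathbf{R}_r, \qquad (3)$$

where $\mathbf{Q}_r \in \mathbb{R}^{n \times m}$ is an orthogonal matrix, and
$\mathbf{R}_r \in \mathbb{R}^{m \times m}$ is an upper triangular matrix. By multiplying
both sides of (2) by $\mathbf{Q}_r^H$, we obtain

$$\tilde{\mathbf{y}}_r = \mathbf{Q}_r^H \mathbf{y}_r = \mathbf{R}_r \mathbf{x}_r + \mathbf{Q}_r^H \mathbf{n}_r, \qquad (4)$$

where $\mathbf{Q}_r^H \mathbf{n}_r$ is white Gaussian noise that has been rotated by an
orthonormal matrix. This formulation is applied in many MIMO detection algorithms,
e.g., QR-based successive interference cancellation (QR-SIC) and K-best algorithms.
In addition, a column-norm-based sorted QR decomposition (SQRD) [8] is often employed
because it not only enhances detection performance but also reduces the computational
complexity of the lattice reduction [9].

INPUT: Q, R, P (default P = I_N) such that H = QR
OUTPUT: Q̃, R̃, T such that HT = Q̃R̃
(1)  Initialize Q̃ = Q, R̃ = R, T = P
(2)  k = 2
(3)  while k ≤ N
(4)    for p = k−1 → 1
(5)      μ = ⌈R̃(p,k) / R̃(p,p)⌋
(6)      if μ ≠ 0
(7)        R̃(1:p, k) = R̃(1:p, k) − μ R̃(1:p, p)
(8)        T(:, k) = T(:, k) − μ T(:, p)
(9)      end
(10)   end
(11)   if δ |R̃(k−1,k−1)|² > |R̃(k,k)|² + |R̃(k−1,k)|²
(12)     swap columns k−1 and k in R̃ and T
(13)     calculate Givens rotation matrix Θ = [α β; −β α], with
           α = R̃(k−1,k−1) / ‖R̃(k−1:k, k−1)‖ and β = R̃(k,k−1) / ‖R̃(k−1:k, k−1)‖
(14)     R̃(k−1:k, k−1:N) = Θ R̃(k−1:k, k−1:N)
(15)     Q̃(:, k−1:k) = Q̃(:, k−1:k) Θ^H
(16)     k = max{k−1, 2}
(17)   else
(18)     k = k + 1
(19)   end
(20) end

Fig. 1: LLL algorithm [10].

Asilomar 2009

III. LATTICE REDUCTION
A lattice $L$ is defined as $\{t_1 \mathbf{h}_{r1} + t_2 \mathbf{h}_{r2} + \dots + t_N \mathbf{h}_{rN} \mid t_1, \dots, t_N \in \mathbb{Z}\}$,
where $\{\mathbf{h}_{r1}, \dots, \mathbf{h}_{rN} \in \mathbb{R}^n\}$ are the basis vectors and $N$
equals $m$. The LR algorithm aims to find a unimodular matrix $\mathbf{T}$
($|\det \mathbf{T}| = 1$ and all elements of $\mathbf{T}$ are integers) such that a more
orthogonal $\tilde{\mathbf{H}}_r = \mathbf{H}_r \mathbf{T}$ has the same lattice as
$\mathbf{H}_r$. Then, the signal model becomes

$$\mathbf{y}_r = \mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r = \tilde{\mathbf{H}}_r \mathbf{T}^{-1} \mathbf{x}_r + \mathbf{n}_r = \tilde{\mathbf{H}}_r \mathbf{s} + \mathbf{n}_r. \qquad (5)$$

Since $\mathbf{x}_r \in \mathbb{Z}^m$, $\mathbf{T}^{-1}\mathbf{x}_r = \mathbf{s} \in \mathbb{Z}^m$.
In practice, the transmitted signals do not belong to an integer set; however, we can
still transform the signals $\mathbf{x}_r \in A^m$ into an integer set by linear
operations such as scaling and shifting.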
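The scaling and shifting mentioned above can be illustrated as follows. This is a minimal sketch under the assumption that the real constellation is $\{\pm a/2, \pm 3a/2, \dots\}$, so that $\mathbf{z} = \mathbf{x}_r / a + \tfrac{1}{2}$ is an integer vector; the helper names `to_integer_domain` and `from_integer_domain`, the matrix sizes, and the example matrix `T` are illustrative choices, not taken from the paper.

```python
import numpy as np

def to_integer_domain(y_r, H_r, a):
    """Scale and shift the received vector so the unknown becomes an integer vector."""
    m = H_r.shape[1]
    return y_r / a + H_r @ np.full(m, 0.5)

def from_integer_domain(z, a):
    """Map integers back to constellation points (inverse of z = x_r / a + 1/2)."""
    return a * (np.asarray(z) - 0.5)

rng = np.random.default_rng(1)
M = 16
a = np.sqrt(6.0 / (M - 1))
H_r = rng.standard_normal((4, 4))
z_true = rng.integers(-1, 3, size=4)     # integers in {-1, 0, 1, 2}
x_r = from_integer_domain(z_true, a)     # points in {-3a/2, -a/2, a/2, 3a/2}
y_r = H_r @ x_r                          # noiseless, to make the identity easy to check
y_shifted = to_integer_domain(y_r, H_r, a)

# Any unimodular T keeps the unknown integer-valued: s = T^{-1} z.
T = np.array([[1, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, -2],
              [0, 0, 0, 1]])
s = np.linalg.solve(T, z_true)
```

In the noiseless case `y_shifted` equals `H_r @ z_true`; with noise, the same transform simply scales the noise by $1/a$. Because `T` is unimodular, `s` is again an integer vector, which is what (5) relies on.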
Several lattice-reduction algorithms are described in the
literature, and the LLL algorithm is the most popular approach,
as shown in Fig. 1. In the literature, Lines 4 to 19 are often
defined as an iteration loop that can be decomposed into two
parts: 1) Lines 4 to 10 deal with the size reduction operations;
and 2) Lines 11 to 19 handle LLL reduction operations. The
number of iterations performed in the size reduction depends
on the index k, and the LLL reduction operation may increase
or decrease the index k. Thus, both of the reduction operations
result in variable throughput. This issue makes the hardware
implementation infeasible because a large memory buffer is
required to realize real-time operation if the decoding time
varies for each received signal vector.
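For reference, the iteration described above can be sketched in a few lines of NumPy. This is a hedged illustration of the QR-based LLL procedure of Fig. 1 for a real matrix with full column rank, not the authors' implementation; indices are 0-based, so `k` here corresponds to k−1 in the figure, and the function name `lll_reduce` is an assumption.

```python
import numpy as np

def lll_reduce(H, delta=0.75):
    """QR-based LLL reduction of a real matrix H with full column rank.

    Returns Q, R, T with H @ T = Q @ R and T unimodular.
    """
    Q, R = np.linalg.qr(H)
    N = H.shape[1]
    T = np.eye(N, dtype=int)
    k = 1
    while k < N:
        # Size reduction of column k against columns k-1, ..., 0.
        for p in range(k - 1, -1, -1):
            mu = int(np.rint(R[p, k] / R[p, p]))
            if mu != 0:
                R[:p + 1, k] -= mu * R[:p + 1, p]
                T[:, k] -= mu * T[:, p]
        # LLL-reduction check.
        if delta * R[k - 1, k - 1] ** 2 > R[k, k] ** 2 + R[k - 1, k] ** 2:
            # Swap columns k-1 and k in R and T.
            R[:, [k - 1, k]] = R[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            # Givens rotation that zeroes R[k, k-1] again.
            norm = np.hypot(R[k - 1, k - 1], R[k, k - 1])
            alpha, beta = R[k - 1, k - 1] / norm, R[k, k - 1] / norm
            G = np.array([[alpha, beta], [-beta, alpha]])
            R[k - 1:k + 1, k - 1:] = G @ R[k - 1:k + 1, k - 1:]
            Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ G.T
            k = max(k - 1, 1)
        else:
            k += 1
    return Q, R, T

rng = np.random.default_rng(2)
H = rng.standard_normal((8, 8))          # e.g. the real model of a 4x4 system
Q, R, T = lll_reduce(H)
```

The number of passes through the `while` loop depends on `H`, which is exactly the variable-throughput behaviour discussed above.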
In [5], a fixed-complexity LLL algorithm is proposed; however, its structure is not
suited for real-time implementation. The characteristics of the varying channel
matrix determine how the index k evolves and thus lead to variable numbers of
size-reduction loops and LLL-reduction loops, as shown in Fig. 3(a). The effective LLL
algorithm [7] performs the size-reduction check only on the element $R_{k-1,k}$, which
is the one needed for the corresponding LLL-reduction check, as shown in Fig. 3(b).
Because an SIC-based detector is often employed, we can perform this weak size
reduction as suggested in [7]. Next, we prevent the irregular change of the index k
during the LLL reduction by introducing a parallel processing loop n, as shown in
Fig. 2. This concept was first proposed in [6]. It allows us to execute the
size-reduction and LLL-reduction checks on all even column-vector pairs or all odd
column-vector pairs in parallel. Thus, we define a stage as one parallel processing
operation of all even or all odd pairs in loop n, as shown in Fig. 3(b). Even-pair
stages and odd-pair stages are processed alternately. As a result, this algorithm
fixes the execution time by specifying the number of stages and achieves constant
throughput for real-time operation. Note that the number of stages strongly affects
both performance and computational complexity, so it forms a trade-off in this
algorithm. Moreover, there are still several redundant operations that could be
eliminated to further reduce the computational complexity.
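The stage structure described above can be sketched as follows. This is an illustrative NumPy sketch of a constant-throughput schedule, assuming that odd-numbered stages process the pairs (1,2), (3,4), ... and even-numbered stages the pairs (2,3), (4,5), ... in the 1-based numbering of the text; the function name `ct_lll` and the `checks` counter are assumptions, and the pairs of one stage are visited sequentially here although hardware would process them in parallel.

```python
import numpy as np

def ct_lll(H, stages, delta=0.75):
    """Constant-throughput LLL sketch with a fixed number of alternating stages.

    Indices are 0-based. Returns Q, R, T and the number of LLL-reduction checks.
    """
    Q, R = np.linalg.qr(H)
    N = H.shape[1]
    T = np.eye(N, dtype=int)
    checks = 0
    for s in range(1, stages + 1):
        first = 1 if s % 2 == 1 else 2   # column n of the first pair (n-1, n)
        for n in range(first, N, 2):
            # Effective (weak) size reduction: only R[n-1, n] is checked.
            mu = int(np.rint(R[n - 1, n] / R[n - 1, n - 1]))
            if mu != 0:
                R[:n, n] -= mu * R[:n, n - 1]
                T[:, n] -= mu * T[:, n - 1]
            # LLL-reduction check on the pair (n-1, n).
            checks += 1
            if delta * R[n - 1, n - 1] ** 2 > R[n, n] ** 2 + R[n - 1, n] ** 2:
                R[:, [n - 1, n]] = R[:, [n, n - 1]]
                T[:, [n - 1, n]] = T[:, [n, n - 1]]
                norm = np.hypot(R[n - 1, n - 1], R[n, n - 1])
                alpha, beta = R[n - 1, n - 1] / norm, R[n, n - 1] / norm
                G = np.array([[alpha, beta], [-beta, alpha]])
                R[n - 1:n + 1, n - 1:] = G @ R[n - 1:n + 1, n - 1:]
                Q[:, n - 1:n + 1] = Q[:, n - 1:n + 1] @ G.T
    return Q, R, T, checks

rng = np.random.default_rng(3)
H = rng.standard_normal((8, 8))
Q, R, T, checks = ct_lll(H, stages=11)
```

In this sketch the number of checks depends only on the matrix size and the stage count, not on the entries of `H`, which is the property that gives constant throughput.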
IV. COMPLEXITY REDUCTION FOR CT-LLL ALGORITHM
In LLL-based lattice-reduction algorithms with a fixed structure, such as the
fixed-complexity LLL algorithm and the CT-LLL algorithm, the computational complexity
can exceed that of the original iterative LLL algorithm because many of the
operations they perform are redundant. In this section, we propose two simple
techniques to reduce the computational complexity of the CT-LLL algorithm.


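INPUT: Q, R, P (default P = I_N) such that H = QR
OUTPUT: Q̃, R̃, T such that HT = Q̃R̃
(1)  Initialize Q̃ = Q, R̃ = R, T = P, STAGE = desired stage number, (N−1)-bit RVC registers = 0
(2)  p = 2
(3)  for (s = 1; s <= STAGE; s++)
(4)    for (n = p; n <= N; n = n + 2)    // parallel processing of size reductions and LLL reductions
(5)      clear the n-th LLL RVC register
(6)      μ = ⌈R̃(n−1,n) / R̃(n−1,n−1)⌋
(7)      if μ ≠ 0
(8)        R̃(1:n−1, n) = R̃(1:n−1, n) − μ R̃(1:n−1, n−1)
(9)        T(:, n) = T(:, n) − μ T(:, n−1)
(10)     end
(11)     if s <= 2 or μ ≠ 0 or the (n−1)-th or (n+1)-th RVC register is 1
(12)       if δ |R̃(n−1,n−1)|² > |R̃(n,n)|² + |R̃(n−1,n)|²
(13)         set the n-th LLL RVC register
(14)         swap columns n−1 and n in R̃ and T
(15)         calculate Givens rotation matrix Θ = [α β; −β α], with
               α = R̃(n−1,n−1) / ‖R̃(n−1:n, n−1)‖ and β = R̃(n,n−1) / ‖R̃(n−1:n, n−1)‖
(16)         R̃(n−1:n, n−1:N) = Θ R̃(n−1:n, n−1:N)
(17)         Q̃(:, n−1:n) = Q̃(:, n−1:n) Θ^H
(18)         if n == p
(19)           if p >= 3, p = p − 1
(20)           else p = p + 1
(21)           end
(22)         end
(23)       else if n == p
(24)         p = p + 1
(25)       end
(26)     else if n == p
(27)       p = p + 1
(28)     end
(29)   end
(30) end
(31) for n = 2 → N    // full size reduction; can be deleted if SIC-like detectors are used
(32)   for p = n−1 → 1
(33)     μ = ⌈R̃(p,n) / R̃(p,p)⌋
(34)     if μ ≠ 0
(35)       R̃(1:p, n) = R̃(1:p, n) − μ R̃(1:p, p)
(36)       T(:, n) = T(:, n) − μ T(:, p)
(37)     end
(38)   end
(39) end

Fig. 2: The proposed low-complexity CT-LLL algorithm.

[Plot: BER versus SNR (dB). Curves: no lattice reduction; LLL; LLL with fixed loops; CT-LLL; low-complexity CT-LLL.]
Fig. 4: BER versus SNR of the 4×4 QR-SIC MIMO detectors using different lattice-reduction algorithms.

[Plot: BER versus SNR (dB). Curves: no lattice reduction; LLL; LLL with fixed loops; CT-LLL; low-complexity CT-LLL.]
Fig. 5: BER versus SNR of the 8×8 QR-SIC MIMO detectors using different lattice-reduction algorithms.
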

First, we find that the LLL-reduction inequality check
($\delta|R(n-1,n-1)|^2 > |R(n,n)|^2 + |R(n-1,n)|^2$) is computationally expensive.
Therefore, we propose an LLL-reduction violation check technique that determines
whether the LLL-reduction inequality can hold from the results of the LLL-reduction
checks in the previous stage, without computing the inequality. We use $(N-1)$-bit
reduction-violation-check (RVC) registers to store the results of the previous stage's
LLL-reduction checks, as shown in Fig. 3(c). If the LLL-reduction check identified no
violation in the $(n-1)$-th loop and the $(n+1)$-th loop in the previous stage, and if
no size reduction takes place in the $n$-th loop in the current stage, then the
LLL-reduction check cannot present a violation in the $n$-th loop in the current
stage, since none of the entries in the inequality has changed since the pair was
last checked. Thus, no LLL-reduction check is required in the $n$-th loop. Since there
are no previous LLL-reduction checks in the first two stages, we do not apply this
prediction technique in these two stages. By identifying unnecessary LLL-reduction
checks in the current stage, we can omit a large number of LLL-reduction check
computations without any performance degradation.
Second, we find that the parallel LLL-reduction operations cause some redundant check
operations because, in the later stages, reductions are seldom needed for columns of
the R matrix with a small index $n$. We eliminate these operations by introducing an
index register $p$ that indicates the first column to be processed in each stage. The
rule for changing $p$ is similar to that for the index $k$ in the original LLL
algorithm, except that we always increase $p$ by one when $p$ equals 2. The complete
low-complexity CT-LLL algorithm for SIC-based detectors is shown in Fig. 2.
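The two decision rules described above are small enough to be written down directly. The sketch below is an assumption-laden illustration: the RVC registers are modeled as a 1-based Python list padded with zeros (a hypothetical software stand-in), the function names are invented for this example, and the threshold `p >= 3` is inferred from the statement that $p$ is always increased once it equals 2.

```python
def needs_lll_check(s, mu, rvc, n):
    """Decide whether pair n needs the LLL-reduction inequality in stage s.

    s   : current stage number (1-based)
    mu  : size-reduction coefficient of pair n in the current stage
    rvc : list of 0/1 flags indexed by pair number, with zero padding so that
          rvc[n - 1] and rvc[n + 1] always exist
    """
    return s <= 2 or mu != 0 or rvc[n - 1] == 1 or rvc[n + 1] == 1

def next_first_column(p, swapped):
    """Update the index register p after the pair n == p has been processed."""
    if swapped and p >= 3:
        return p - 1          # step back, as k does in the original LLL algorithm
    return p + 1              # no swap, or p == 2: move the starting column forward

N = 8
rvc = [0] * (N + 2)           # indices 0..N+1; pairs use n = 2..N
rvc[5] = 1                    # suppose pair 5 was swapped in the previous stage

check_pair_4 = needs_lll_check(s=3, mu=0, rvc=rvc, n=4)   # neighbour 5 was swapped
check_pair_8 = needs_lll_check(s=3, mu=0, rvc=rvc, n=8)   # neighbours unchanged
```

Only pairs adjacent to a swap from the previous stage, pairs that were just size-reduced, or pairs in the first two stages evaluate the inequality; the remaining pairs skip it.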

[Diagram: sequences of 8×8 upper-triangular R matrices. Panel (a) shows successive values of the index k. Panels (b) and (c) show successive stages s, alternating between even pairs and odd pairs up to s = STAGE, with the index p and a final full size reduction; panel (c) additionally shows the RVC register values. Legend: LLL-reduction check with violation result; LLL-reduction check with non-violation result; no LLL-reduction check.]
Fig. 3: (a) The LLL algorithm, (b) the constant-throughput LLL algorithm, and (c) the low-complexity constant-throughput
LLL algorithm.
TABLE I: Average Computational Complexity of Lattice Reduction Algorithms for 4×4 MIMO Detection.
(Percentages are relative to the CT-LLL algorithm.)

Algorithm                          Addition           Multiplication     Division          Square Root     Total
LLL (average)                      346.12             521.34             75.20             5.23            947.89
LLL, fixed 40 loops                335.56             431.25             72.31             5.04            844.16
CT-LLL, Stage=11                   297.15             511.59             49.15             5.08            863
Low-Complexity CT-LLL, Stage=11    245.34 (82.5%)     408.41 (79.8%)     32.34 (65.8%)     5.08 (100%)     691.17 (80%)

TABLE II: Average Computational Complexity of Lattice Reduction Algorithms for 8×8 MIMO Detection.
(Percentages are relative to the CT-LLL algorithm.)

Algorithm                          Addition           Multiplication     Division          Square Root     Total
LLL                                2046.6             2709.7             364.1             12.17           5132.6
LLL, fixed 190 loops               2044.7             2705.5             363.7             12.16           5126.1
CT-LLL, Stage=25                   1341.7             2289.3             211.85            11.92           3854.8
Low-Complexity CT-LLL, Stage=25    1029.50 (76.73%)   1665.44 (72.75%)   104.73 (49.44%)   11.92 (100%)    2811.6 (72.94%)
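The percentages in the last row of each table can be reproduced from the per-operation counts. The short sketch below only re-does this arithmetic with the values listed in Tables I and II; the variable and function names are illustrative.

```python
# Operation counts (addition, multiplication, division, square root) from Tables I and II.
ct_lll_4x4 = (297.15, 511.59, 49.15, 5.08)
low_4x4 = (245.34, 408.41, 32.34, 5.08)
ct_lll_8x8 = (1341.7, 2289.3, 211.85, 11.92)
low_8x8 = (1029.50, 1665.44, 104.73, 11.92)

def ratio(low, ref):
    """Total operation count of the low-complexity variant relative to the reference."""
    return sum(low) / sum(ref)

ratio_4x4 = ratio(low_4x4, ct_lll_4x4)
ratio_8x8 = ratio(low_8x8, ct_lll_8x8)
print(f"4x4: {100 * ratio_4x4:.1f}%  8x8: {100 * ratio_8x8:.1f}%")
```

The totals treat all four operation types with equal weight, as the "Total" column of the tables does.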

V. SIMULATION RESULTS
In this section, we compare the conventional LLL algorithm, the CT-LLL algorithm, and
the proposed low-complexity CT-LLL algorithm in terms of computational complexity and
BER performance. We simulate LRA-MIMO detection based on the MIMO system described in
Section II, and we employ sorted QR decomposition for the preprocessing in all MIMO
detectors. The LLL-reduction parameter $\delta$ equals 0.75, as suggested in [11]. We
set the stage number of the CT-LLL algorithm to the minimal stage number that yields
almost the same performance as the LLL algorithm. Thus, we employ 11 stages and 25
stages for the 4×4 and 8×8 MIMO detectors, respectively. To demonstrate the advantage
of the CT-LLL algorithm, we take the intuitive fixed-loop LLL algorithm as the
constant-throughput benchmark, whose iteration-loop number is chosen so that it
performs the same number of LLL-reduction checks as the CT-LLL algorithm with the
given stage number. Thus, we choose 40 and 190 loops for the respective 4×4 and 8×8
MIMO detectors in the fixed-loop LLL algorithm. The CT-LLL algorithm is described in
Section III, and the low-complexity CT-LLL algorithm employs the cost-reduction
techniques in Section IV. Fig. 4 and Fig. 5 show the BER performance of the LRA
QR-SIC detectors for the 4×4 and 8×8 MIMO systems, respectively. Clearly, the CT-LLL
algorithm outperforms the fixed-loop LLL algorithm, and it achieves the same BER
performance as the original LLL algorithm. Moreover, the low-complexity CT-LLL
algorithm reduces the complexity of the CT-LLL algorithm to 80% and 72.94% for the
4×4 and 8×8 MIMO systems, respectively, with almost no performance degradation. The
BER performance of the lattice-reduction algorithms for different MIMO detectors is
depicted in Fig. 6 and Fig. 7 for the 4×4 and 8×8 MIMO systems, respectively. The
proposed low-complexity CT-LLL algorithm causes negligible performance degradation in
SIC-based MIMO detection, such as the QR-SIC and K-best algorithms. However, for a
linear detector such as the ZF detector, the full size reduction must be preserved to
maintain the lattice-reduction effect.

[Plot: BER versus SNR (dB). Curves: ZF; low-complexity CT-LLL with ZF; CT-LLL with ZF; LLL with ZF; QR-SIC; low-complexity CT-LLL with QR-SIC; LLL with QR-SIC; K-best; low-complexity CT-LLL with K-best; LLL with K-best.]
Fig. 6: BER versus SNR of different 4×4 MIMO detectors using different lattice-reduction algorithms.

[Plot: BER versus SNR (dB). Curves: ZF; low-complexity CT-LLL with ZF; CT-LLL with ZF; LLL with ZF; QR-SIC; low-complexity CT-LLL with QR-SIC; LLL with QR-SIC; K-best; low-complexity CT-LLL with K-best; LLL with K-best.]
Fig. 7: BER versus SNR of different 8×8 MIMO detectors using different lattice-reduction algorithms.

VI. CONCLUSION
In this paper, we propose a low-complexity, constant-throughput LLL algorithm for
real-time LRA-MIMO detection. Combining effective size reduction with parallel LLL
reduction avoids the variable iteration time while keeping approximately the same
complexity as the original LLL algorithm. Since the CT-LLL algorithm performs many
redundant LLL-reduction check operations, the LLL-reduction violation check and the
scarce LLL-reduction indication (the index register p) further reduce the complexity
to 80% and 72.94% of that of the CT-LLL algorithm for 4×4 and 8×8 MIMO systems,
respectively. Therefore, we believe that the proposed low-complexity CT-LLL algorithm
offers a solid basis for the implementation of a real-time LRA-MIMO detector. We will
investigate this aspect in our future work.

REFERENCES

[1] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2201–2214, Aug. 2002.
[2] M. Shabany and P. G. Gulak, "The application of lattice-reduction to the K-best algorithm for near-optimal MIMO detection," in IEEE International Symposium on Circuits and Systems, May 2008, pp. 316–319.
[3] H. Yao and G. Wornell, "Lattice-reduction-aided detectors for MIMO communication systems," in IEEE Global Telecommunications Conference, vol. 1, Nov. 2002, pp. 424–428.
[4] J. Jaldén, D. Seethaler, and G. Matz, "Worst-case and average-case complexity of LLL lattice reduction in MIMO wireless systems," in IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, Mar. 2008, pp. 2685–2688.
[5] H. Vetter, V. Ponnampalam, M. Sandell, and P. A. Hoeher, "Fixed complexity LLL algorithm," IEEE Transactions on Signal Processing, vol. 57, no. 4, pp. 1634–1637, Apr. 2009.
[6] G. Villard, "Parallel lattice basis reduction," in International Conference on Symbolic and Algebraic Computation, 1992, pp. 269–277.
[7] C. Ling and N. Howgrave-Graham, "Effective LLL reduction for lattice decoding," in IEEE International Symposium on Information Theory, Jun. 2007, pp. 196–200.
[8] P. Luethi, A. Burg, S. Haene, D. Perels, N. Felber, and W. Fichtner, "VLSI implementation of a high-speed iterative sorted MMSE QR decomposition," in IEEE International Symposium on Circuits and Systems, May 2007, pp. 1421–1424.
[9] Y. H. Gan and W. H. Mow, "Novel joint sorting and reduction technique for delay-constrained LLL-aided MIMO detection," IEEE Signal Processing Letters, vol. 15, pp. 194–197, 2008.
[10] D. Wübben, R. Böhnke, V. Kühn, and K.-D. Kammeyer, "MMSE-based lattice-reduction for near-ML detection of MIMO systems," in ITG Workshop on Smart Antennas, May 2004, pp. 106–113.
[11] A. K. Lenstra, H. W. Lenstra, and L. Lovász, "Factoring polynomials with rational coefficients," Math. Annalen, vol. 261, pp. 515–534, 1982.
