Low Complexity Detection Algorithms in Large-Scale MIMO Systems

This article has been accepted for publication in a future issue of this journal, but has not been
fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/TWC.2015.2495163, IEEE Transactions on Wireless Communications
Ali Elghariani, Member, IEEE and Michael Zoltowski, Fellow, IEEE
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47906
Email: aelghari@purdue.edu and mikedz@purdue.edu
In this contribution, we present low-complexity detection algorithms for large-scale MIMO systems. These algorithms achieve significantly better bit error rate (BER) performance than known heuristic algorithms in the large-scale MIMO literature, such as the Likelihood Ascent Search (LAS) and Reactive Tabu Search (RTS) algorithms, especially at higher-order modulations. The proposed techniques are developed from the conventional Quadratic Programming (QP) detector. The first one performs two stages of QP detection with a novel combination of interference cancellation and shadow-area constraints on the constellation. The second one is based on the Branch and Bound search tree algorithm. The efficacy of the proposed algorithms is investigated for various QAM modulations. Computer simulations show that the proposed algorithms outperform the LAS and RTS algorithms in both uncoded and turbo-coded BER performance, especially at higher QAM levels, with no significant change in complexity as the modulation level increases. In addition, an extension of the QP detector to iterative detection and decoding is developed for the case of QPSK using a low-complexity approach.
Index Terms: Large-scale MIMO, Quadratic Programming, Two-stage Quadratic Programming, Branch and Bound, Complexity, Iterative Detection and Decoding.
I. INTRODUCTION
A large-scale multiple-input multiple-output (MIMO) system, also called a massive MIMO system, uses a large number of antennas at the transmitter and/or receiver. Such systems are one of the main components of future 5G wireless communication systems [2]. The capacity of such a system can be scaled up by installing more antennas at the transmitter and/or receiver to meet the demands of high-data-rate applications [3], [4], [5]. These systems, however, pose challenges in several design aspects, such as channel estimation, antenna correlation, hardware implementation, and detection complexity [6], [5]. In particular, a critical design challenge in a large-scale MIMO system is to design a reliable and computationally efficient detector that remains practical even as the number of antennas grows very large or the modulation order increases.
Many linear detectors and near-maximum-likelihood (near-ML) detectors have been proposed in the literature on conventional MIMO systems. However, they become noncompetitive when used in large-scale systems. One reason is that the computational complexity of some of them becomes exponential, as in the case of Sphere Decoding (SD) and its variants [7], [8], [9]. Another reason is that the performance of others worsens as the number of antennas increases. This happens with minimum mean square error (MMSE) detection, MMSE with ordered successive interference cancellation (MMSE-OSIC) [10], Chase [11], QR decomposition combined with an M algorithm (QRDM) [12], and Fixed Sphere Decoding (FSD) [13], [14].

(A preliminary version of this work was presented at IEEE WCNC 2015 [1], in which only one algorithm was considered. In this paper, further algorithms are considered, with extensive simulation results.)
Various algorithms in the large-scale MIMO literature exhibit large-system behavior, in which the BER performance improves as the number of antennas increases. Examples are the family of Likelihood Ascent Search (LAS) detectors and the reactive tabu search (RTS) detectors. LAS detectors have been proposed in [15], [16], [7] for large-scale MIMO systems. They successively search the local neighborhood of some good initial vectors, such as the MMSE vector. They show near-single-antenna AWGN performance, especially when hundreds of antennas are used, with an average per-received-vector complexity of $O(N_t^3)$. Here $N_t = N_r$, where $N_t$ and $N_r$ denote the numbers of transmit and receive antennas, respectively. LAS detectors have been generalized to higher-order modulations; however, they still suffer from performance deterioration as the modulation order increases. They also require a very large number of antennas, on the order of hundreds, to achieve near-single-antenna unfaded performance, and this number increases with the modulation level [16].

The RTS algorithm has also been proposed for large-scale MIMO systems with various QAM modulations in [17], [13], [18]. It is a heuristic combinatorial optimization technique that forces the search to visit several neighborhood solutions and then chooses the solution with the best ML metric among them. It achieves near-ML performance with much lower complexity than ML and SD detection, especially for low-order modulations. However, its computational complexity scales up significantly with increasing QAM level, and its performance deteriorates.
In this article, three algorithms are proposed for the large-scale MIMO detection problem. They can provide near-single-antenna AWGN performance with only tens of antennas and with nearly constant average complexity over all modulation orders. The first algorithm is simply the conventional quadratic programming detector, in which the ML problem is reformulated as a quadratic optimization problem. We show in this paper that it provides better performance than the LAS detector with no major increase in average complexity. We also show that its complexity does not grow significantly from a low-order to a high-order modulation. QP detectors have already been studied in conventional MIMO systems [19], [20], [21], [22], [23]. However, their performance has not been seriously compared with that of existing heuristic algorithms, especially in large-scale MIMO. In this work, we present performance and complexity comparisons of QP-based detectors with existing techniques, and we point out that they belong to the family of detectors that exhibit large-system behavior.

1536-1276 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The second proposed algorithm improves the performance of the first algorithm with a minor increase in complexity. It uses two stages of QP detection with a successive interference cancellation strategy, and a shadow-area constraint [24] measures symbol reliability. Finally, the third algorithm uses the Branch and Bound (BB) search tree algorithm to further improve the solution of the conventional QP detector. In this algorithm, we do not perform the standard BB search as in [25], [26]. Instead, we propose a reduced and controlled version that provides a flexible tradeoff between performance and complexity. Only a few nodes of the BB tree are explored, based on two criteria: one reduces the depth of the tree and the other reduces its width. This idea combines techniques we previously proposed in [27], [28]. The complexity of this algorithm remains high at all SNRs when $N_t$ is large. We reduce it dramatically, albeit only at high SNR, by applying a new pruning rule at each node of the BB search tree. The rule is based on the difference between the cost function of the integer problem and that of its relaxed problem.
To the best of our knowledge, the two proposed algorithms are new and have not been presented in the literature before, especially in conjunction with QP detectors or large-scale MIMO systems. In addition to these two algorithms, the contributions of this paper include:
(i) reducing the complexity of the standard QP solver with no major loss in performance, a reduction that is then used in implementing the two proposed techniques;
(ii) investigating the performance of the proposed techniques with a more realistic MIMO channel (the spatially correlated Kronecker model); and
(iii) presenting a low-complexity method that generates soft information from QP-based detectors, which can be used to implement an iterative detection and decoding receiver.
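II. SYSTEM MODEL

Consider a MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas employing the spatial multiplexing scheme known as Vertical Bell Laboratories Layered Space-Time (V-BLAST) [29], [10]. At the transmitter side, the information generated by the source is mapped to symbols of different alphabets. The mapped complex symbols are demultiplexed into $N_t$ separate independent data streams, forming the transmitted signal vector $\tilde{\mathbf{x}} = [\tilde{x}_1, \ldots, \tilde{x}_{N_t}]^T \in \mathbb{C}^{N_t \times 1}$. The general MIMO channel model is:

$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{x}} + \tilde{\mathbf{n}} \qquad (1)$$

Here $\tilde{\mathbf{y}} = [\tilde{y}_1, \ldots, \tilde{y}_{N_r}]^T \in \mathbb{C}^{N_r \times 1}$ is the received signal vector at all $N_r$ antennas. $\tilde{\mathbf{H}} \in \mathbb{C}^{N_r \times N_t}$ denotes the flat-fading channel gain matrix, whose entries are modeled as $\mathcal{CN}(0, 1)$. $\tilde{\mathbf{n}}$ represents the receiver AWGN noise vector, whose entries are modeled as i.i.d. $\mathcal{CN}(0, \sigma^2)$. A more realistic MIMO channel will be considered later in Section IV. The tilde in (1) distinguishes the complex model from the real model shown in the next section. We assume ideal channel estimation and synchronization at the receiver.

III. PROPOSED ALGORITHMS

A. Formulation of the Problem
The ML problem of model (1), which is equivalent to Euclidean distance minimization, can be expressed as:

$$\hat{\tilde{\mathbf{x}}} = \arg\min_{\tilde{\mathbf{x}} \in \tilde{\Omega}^{N_t}} \|\tilde{\mathbf{y}} - \tilde{\mathbf{H}}\tilde{\mathbf{x}}\|_2^2 \qquad (2)$$

where $\tilde{\Omega}^{N_t}$ is the set of all possible $N_t$-dimensional complex candidate vectors of the transmitted vector $\tilde{\mathbf{x}}$. The equivalent real-valued model of (1) is:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \quad \mathbf{y} = \begin{bmatrix} \Re\{\tilde{\mathbf{y}}\} \\ \Im\{\tilde{\mathbf{y}}\} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} \Re\{\tilde{\mathbf{x}}\} \\ \Im\{\tilde{\mathbf{x}}\} \end{bmatrix}, \quad \mathbf{n} = \begin{bmatrix} \Re\{\tilde{\mathbf{n}}\} \\ \Im\{\tilde{\mathbf{n}}\} \end{bmatrix} \qquad (3)$$

$$\mathbf{H} = \begin{bmatrix} \Re\{\tilde{\mathbf{H}}\} & -\Im\{\tilde{\mathbf{H}}\} \\ \Im\{\tilde{\mathbf{H}}\} & \Re\{\tilde{\mathbf{H}}\} \end{bmatrix} \qquad (4)$$

In this real-valued system model, the real parts of the complex data symbols are mapped to $[x_1, \ldots, x_{N_t}]$, and the imaginary parts are mapped to $[x_{N_t+1}, \ldots, x_{2N_t}]$. The equivalent ML detection problem of the real model can then be written as $\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \Omega^{2N_t}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2$. Here $\Omega = \{-\sqrt{C}+1, \ldots, -1, 1, \ldots, \sqrt{C}-1\}$, and $C$ is the QAM constellation size. Each element of this real set can be mapped to a nonnegative integer by the linear transformation $\mathbf{z} = (\mathbf{x} + (\sqrt{C}-1)\mathbf{1})/2$. Substituting $\mathbf{x} = 2\mathbf{z} - (\sqrt{C}-1)\mathbf{1}$ into $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|_2^2$, expanding, and dropping the terms that do not depend on $\mathbf{z}$ gives (up to a positive scale factor) the following optimization problem:

$$\hat{\mathbf{z}} = \arg\min_{\mathbf{z} \in \Psi^{2N_t}} \left\{ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} \qquad (5)$$

where:
- $\Psi = \{0, 1, 2, \ldots, \sqrt{C}-1\}$;
- $\mathbf{Q} = \mathbf{H}^T\mathbf{H}$ is a symmetric positive semidefinite matrix;
- $\mathbf{b} = -\mathbf{H}^T(\mathbf{y} + (\sqrt{C}-1)\mathbf{H}\mathbf{1})/2$;
- $\mathbf{1} = [1, 1, \ldots, 1]^T$ is a column vector of dimension $2N_t \times 1$.

B. Algorithm I: A QP Detector (Review)

One way to approximate the solution of (5) is to use QP solvers that relax the integer constraints. Problem (5) can thus be relaxed to:

$$\mathbf{z}^* = \arg\min_{\mathbf{z}} \ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{subject to} \quad \mathbf{0} \le \mathbf{z} \le (\sqrt{C}-1)\mathbf{1} \qquad (6)$$

where $\mathbf{0}$ represents a $2N_t \times 1$ vector of all zeros. The elementwise constraints $\mathbf{0} \le \mathbf{z} \le (\sqrt{C}-1)\mathbf{1}$ are the box constraints on all elements of $\mathbf{z}$: each element (symbol) of $\mathbf{z}$ is lower bounded by 0 and upper bounded by $\sqrt{C}-1$. This optimization problem is a convex QP minimization problem.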
A global continuous solution $\mathbf{z}^*$, which is unique when $\mathbf{H}$ has full column rank, can be obtained with efficient interior-point solvers of reduced computational complexity [30]. Interior-point solvers are important here because, in practice, the interior-point algorithm converges in an almost constant number of iterations, independent of the problem dimension [31]. This is attractive from a complexity point of view, especially as the number of antennas increases. Solving (6) provides a $2N_t$-dimensional solution vector $\mathbf{z}^* = [z_1^*, \ldots, z_{2N_t}^*]^T \in \mathbb{R}^{2N_t}$ and a scalar cost function value $f(\mathbf{z}^*)$. If all elements of $\mathbf{z}^*$ satisfy the integer constraints, then $\mathbf{z}^*$ is the optimum solution of both (5) and (6). In general, an integer solution is obtained from (6) by quantizing $\mathbf{z}^*$ to the nearest point of the constellation set $\Psi$, that is:

$$\hat{z}_i = \mathcal{Q}[z_i^*], \quad i = 1, 2, \ldots, 2N_t \qquad (7)$$

where $\mathcal{Q}[\cdot]$ is a quantization function to the appropriate constellation levels of the set $\Psi$.

In the next subsections, we propose improvements to the QP detector in a large MIMO system based on further analysis of problem (6). The first uses two-stage QP detection with interference cancellation, and the second uses the concept of the BB search tree [32], [25], [33]. In previous work [22], [23], a randomized rounding technique is shown to perform better than simple rounding (as in (7)), at an additional complexity of order $O(N_t^2)$. This technique can still be combined with any of our proposed algorithms.
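The Python sketch below walks through Algorithm I on a toy system. It builds the real-valued model (3)-(4), forms $\mathbf{Q}$ and $\mathbf{b}$ of (5), solves the box-constrained relaxation (6), and quantizes as in (7). It is an illustration under stated assumptions, not the authors' implementation:
- the function names are hypothetical;
- plain Python lists stand in for a linear-algebra library;
- the interior-point solver used in the paper is swapped for a simple projected-gradient method (step $1/L$, with $L$ a Gershgorin upper bound on the largest eigenvalue of $\mathbf{Q}$). This method also reaches the global minimizer of the convex problem (6), but needs many more iterations.

```python
import math


def complex_to_real(H, y):
    """Real-valued model (3)-(4): real parts stacked above imaginary parts."""
    Nr, Nt = len(H), len(H[0])
    Hr = [[0.0] * (2 * Nt) for _ in range(2 * Nr)]
    for r in range(Nr):
        for c in range(Nt):
            h = H[r][c]
            Hr[r][c], Hr[r][c + Nt] = h.real, -h.imag
            Hr[r + Nr][c], Hr[r + Nr][c + Nt] = h.imag, h.real
    yr = [v.real for v in y] + [v.imag for v in y]
    return Hr, yr


def qp_params(H, y, sqrtC):
    """Q = H^T H and b = -H^T (y + (sqrtC - 1) H 1) / 2, as in (5).

    Also returns r = y + (sqrtC - 1) H 1, which satisfies
    ||y - H x||^2 = 8 f(z) + ||r||^2 for x = 2 z - (sqrtC - 1) 1.
    """
    m, n = len(H), len(H[0])
    s = sqrtC - 1
    r = [y[k] + s * sum(H[k]) for k in range(m)]
    Q = [[sum(H[k][i] * H[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    b = [-sum(H[k][i] * r[k] for k in range(m)) / 2 for i in range(n)]
    return Q, b, r


def solve_box_qp(Q, b, ub, iters=2000):
    """Stand-in for the interior-point solver: projected gradient on 0 <= z <= ub."""
    n = len(b)
    step = 1.0 / (max(sum(abs(v) for v in row) for row in Q) or 1.0)
    z = [ub / 2.0] * n
    for _ in range(iters):
        grad = [sum(Q[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
        z = [min(max(z[i] - step * grad[i], 0.0), ub) for i in range(n)]
    return z


def qp_detect(H, y, sqrtC):
    """Algorithm I: relax (5) to (6), then quantize every entry as in (7)."""
    Hr, yr = complex_to_real(H, y)
    Q, b, _ = qp_params(Hr, yr, sqrtC)
    z_star = solve_box_qp(Q, b, float(sqrtC - 1))
    # Nearest level in {0, ..., sqrtC - 1}; floor(v + 0.5) avoids round-half-to-even.
    z_hat = [min(max(math.floor(v + 0.5), 0), sqrtC - 1) for v in z_star]
    x_hat = [2 * v - (sqrtC - 1) for v in z_hat]  # back to odd-integer PAM levels
    return z_hat, x_hat
```

Consider a noiseless toy example with 16-QAM (`sqrtC = 4`), the complex channel `[[2+0.5j, 0.3-0.2j], [0.1+0.4j, 1.5-0.3j]]`, and the transmitted vector `[3-1j, -1+3j]`. Here `qp_detect` returns `z_hat = [3, 1, 1, 3]` and `x_hat = [3, -1, -1, 3]`, which is the real-valued stacking of the transmitted symbols.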
C. Algorithm II: A Two-Stage QP Detector
This algorithm implements two stages of QP detection with interference cancellation to improve the detection of the unreliable symbols, i.e., the non-integer values of $\mathbf{z}^*$ in (6). One drawback of Algorithm I is that all symbols are quantized simultaneously, regardless of their reliability. Therefore, this algorithm combines interference cancellation with a measure of symbol reliability based on shadow-area constraints. A shadow area between the nonnegative integers of the constellation set (similar to [24]) is introduced before the quantization in (7). Any $z_i^*$ that falls in this shadow area is considered unreliable.

In other words, in the continuous solution of (6), the variables whose distance from their nearest integers is greater than $\delta$ are considered noisy. These need another stage of QP detection after the interference of the more reliable symbols has been cancelled. We denote the positions of these unreliable symbols by the set of indices $\mathcal{J}$. On the other hand, the variables with small fractions ($\le \delta$), or that are pure integers, can be quantized immediately. Their quantized values are taken as the integer solution for both (6) and (5). Their effects are then cancelled out so that the solution of the noisy variables can be improved. The positions of these near-integer variables form the set $\mathcal{I}$, which can be estimated using this criterion:

$$\mathcal{I} = \{\, i \in \{1, 2, \ldots, 2N_t\} : |z_i^* - \lfloor z_i^* \rceil| \le \delta \,\} \qquad (8)$$

where $\lfloor x \rceil$ denotes rounding $x$ to the nearest integer. Note that $0 < \delta < 0.5$ measures how close each $z_i^*$ is to its nearest integer. The maximum value of $\delta$ is 0.5, because the set $\Psi$ consists of consecutive integers: no real number is farther than 0.5 from its nearest integer.

Consider, for example, that $z_i^*$ from (6) satisfies condition (8), i.e., $\hat{z}_i = \lfloor z_i^* \rceil$, while the other elements of $\mathbf{z}^*$ do not. To cancel the interference of this symbol in the second QP stage, the modified received vector becomes $\bar{\mathbf{y}} = \mathbf{y} - x_i\mathbf{g}_i$. Here $x_i = 2\hat{z}_i - (\sqrt{C}-1)$, and $\mathbf{g}_i$ is the $i$th column of $\mathbf{H}$. With this known symbol, the new reduced-size ML problem is $\arg\min_{\bar{\mathbf{x}}} \|\bar{\mathbf{y}} - \bar{\mathbf{H}}\bar{\mathbf{x}}\|_2^2$. The matrix $\bar{\mathbf{H}} = \mathbf{H}_{[i]}$ is obtained by omitting the $i$th column of $\mathbf{H}$. Similarly, $\bar{\mathbf{x}}$ is obtained by omitting the $i$th element of $\mathbf{x}$. This generalizes to all indices satisfying condition (8) by replacing the index $i$ with the set $\mathcal{I}$. To obtain the reduced-size QP after interference cancellation, we repeat the formulation procedure of Section III-A for the reduced-size ML problem above. This yields the following QP problem:

$$\bar{\mathbf{z}}^* = \arg\min_{\bar{\mathbf{z}}} \ \tfrac{1}{2}\bar{\mathbf{z}}^T\bar{\mathbf{Q}}\bar{\mathbf{z}} + \bar{\mathbf{b}}^T\bar{\mathbf{z}} \quad \text{subject to} \quad \mathbf{0} \le \bar{\mathbf{z}} \le (\sqrt{C}-1)\bar{\mathbf{1}} \qquad (9)$$

where:
- $\bar{\mathbf{Q}} = \bar{\mathbf{H}}^T\bar{\mathbf{H}}$;
- $\bar{\mathbf{b}} = -\bar{\mathbf{H}}^T(\bar{\mathbf{y}} + (\sqrt{C}-1)\bar{\mathbf{H}}\bar{\mathbf{1}})/2$;
- $\bar{\mathbf{1}}$ is an all-ones column vector of length $2N_t - |\mathcal{I}|$, where $|\mathcal{I}|$ is the cardinality of the set $\mathcal{I}$.

To avoid recomputing $\bar{\mathbf{Q}}$ and $\bar{\mathbf{b}}$, we express them in terms of $\mathbf{Q}$ and $\mathbf{b}$. Fixing $\mathbf{z}(\mathcal{I}) = \hat{\mathbf{z}}(\mathcal{I})$ in the cost of (5) and collecting the terms that depend on $\mathbf{z}(\mathcal{J})$ gives:

$$\bar{\mathbf{Q}} = \mathbf{Q}(\mathcal{J}, \mathcal{J}), \qquad \bar{\mathbf{b}} = \mathbf{Q}(\mathcal{I}, \mathcal{J})^T\hat{\mathbf{z}}(\mathcal{I}) + \mathbf{b}(\mathcal{J}) \qquad (10)$$

Here $\mathbf{Q}(\mathcal{I}, \mathcal{J})$ denotes the submatrix of $\mathbf{Q}$ composed of rows $\mathcal{I}$ and columns $\mathcal{J}$. Similarly, $\mathbf{b}(\mathcal{J})$ is the subvector of $\mathbf{b}$ containing the elements indexed by $\mathcal{J}$.
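Identity (10) can be checked numerically. The Python sketch below computes $\bar{\mathbf{Q}}$ and $\bar{\mathbf{b}}$ in two ways on an arbitrary small real-valued channel:
- directly from the interference-cancelled model ($\bar{\mathbf{y}}$, $\bar{\mathbf{H}}$), using the formulas that follow (9);
- by slicing $\mathbf{Q}$ and $\mathbf{b}$ as in (10).

The helper names are hypothetical, and indices are 0-based rather than the 1-based indices of the text.

```python
def qp_params(H, y, sqrtC):
    """Q = H^T H and b = -H^T (y + (sqrtC - 1) H 1) / 2 (formulation of Section III-A)."""
    m, n = len(H), len(H[0])
    s = sqrtC - 1
    r = [y[k] + s * sum(H[k]) for k in range(m)]
    Q = [[sum(H[k][i] * H[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    b = [-sum(H[k][i] * r[k] for k in range(m)) / 2 for i in range(n)]
    return Q, b


def cancel_known(H, y, z_hat, I, sqrtC):
    """Interference cancellation: y_bar = y - sum_{i in I} x_i g_i; H_bar drops the columns in I."""
    s = sqrtC - 1
    J = [j for j in range(len(H[0])) if j not in I]
    y_bar = [y[k] - sum((2 * z_hat[i] - s) * H[k][i] for i in I) for k in range(len(y))]
    H_bar = [[row[j] for j in J] for row in H]
    return H_bar, y_bar


def reduce_qp(Q, b, z_hat, I):
    """Eq. (10): Q_bar = Q(J, J) and b_bar = Q(I, J)^T z_hat(I) + b(J)."""
    I_set = set(I)
    J = [j for j in range(len(b)) if j not in I_set]
    Q_bar = [[Q[i][j] for j in J] for i in J]
    b_bar = [sum(Q[i][j] * z_hat[i] for i in I) + b[j] for j in J]
    return Q_bar, b_bar, J
```

Both routes agree to rounding error. So (10) lets the second stage reuse $\mathbf{Q}$ and $\mathbf{b}$ without forming $\bar{\mathbf{H}}^T\bar{\mathbf{H}}$ again.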
Note that, unlike conventional successive interference cancellation techniques [10], [34], this algorithm orders the symbols using the non-integrality measure $\delta$. This measure determines which received symbols are deemed reliable and which unreliable. The parameter $\delta$ in (8) is a design parameter and needs to be optimized. When $\delta$ is very small (e.g., $\delta < 0.1$), many variables fail condition (8) and fall into the second QP stage. In this case, interference cancellation can do little to improve on the detection performance of the first QP, especially at low SNR. When $\delta$ is large (e.g., $\delta > 0.4$), most symbols pass the integer condition, even though they might be far from their nearest integers. With such a $\delta$, interference cancellation may still improve the detection of some symbols, especially at high SNR. In this algorithm, $\delta$ is optimized for both minimum BER and complexity across various SNRs through simulation experiments, since an analytical optimization appears cumbersome. We found that the optimum $\delta$ is around 0.2 to 0.3 for various QAM levels. A summary of the steps of Algorithm II is shown in Table I. In the sequel, we refer to this algorithm as 2QP.
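The steps of Algorithm II can be sketched in Python as follows. The sketch works directly on the real-valued model (3) with 0-based indices. It illustrates the method under stated assumptions and is not the authors' code:
- `two_stage_qp` and `solve_box_qp` are hypothetical names;
- the interior-point solver (`quadprog` in Table I) is replaced by a projected-gradient stand-in, which converges to the same global minimizer of the convex box-constrained problem, only more slowly;
- the default `delta = 0.25` follows the value selected in Section IV-A.

```python
import math


def solve_box_qp(Q, b, ub, iters=2000):
    """Stand-in for the interior-point solver: projected gradient on 0 <= z <= ub."""
    n = len(b)
    if n == 0:
        return []
    step = 1.0 / (max(sum(abs(v) for v in row) for row in Q) or 1.0)
    z = [ub / 2.0] * n
    for _ in range(iters):
        grad = [sum(Q[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
        z = [min(max(z[i] - step * grad[i], 0.0), ub) for i in range(n)]
    return z


def two_stage_qp(H, y, sqrtC, delta=0.25):
    """Algorithm II (2QP) on the real model y = H x + n; returns (z_hat, I, J)."""
    m, n = len(H), len(H[0])
    s = sqrtC - 1
    r = [y[k] + s * sum(H[k]) for k in range(m)]
    Q = [[sum(H[k][i] * H[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
    b = [-sum(H[k][i] * r[k] for k in range(m)) / 2 for i in range(n)]

    def nearest(v):
        """Nearest constellation level in {0, ..., sqrtC - 1}."""
        return min(max(math.floor(v + 0.5), 0), s)

    z_star = solve_box_qp(Q, b, float(s))                      # first QP stage, (6)
    I = [i for i in range(n) if abs(z_star[i] - math.floor(z_star[i] + 0.5)) <= delta]  # (8)
    I_set = set(I)
    J = [j for j in range(n) if j not in I_set]                # unreliable symbols
    z_hat = [0] * n
    for i in I:
        z_hat[i] = nearest(z_star[i])                          # accept reliable symbols
    Q_bar = [[Q[i][j] for j in J] for i in J]                  # (10)
    b_bar = [sum(Q[i][j] * z_hat[i] for i in I) + b[j] for j in J]
    z_bar = solve_box_qp(Q_bar, b_bar, float(s))               # second QP stage, (9)
    for k, j in enumerate(J):
        z_hat[j] = nearest(z_bar[k])                           # quantize re-detected symbols
    return z_hat, I, J
```

On a noiseless toy channel, every entry of the first-stage solution is numerically an integer. All indices therefore land in $\mathcal{I}$, and the second stage is empty. With noise, the entries farther than $\delta$ from an integer are re-detected by the smaller second QP.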
TABLE I: A Two-Stage QP Algorithm
Input: $\mathbf{Q}$, $\mathbf{b}$
$\mathbf{z}^* = \mathrm{quadprog}(\mathbf{Q}, \mathbf{b})$ from (6)
Find $\mathcal{I}$ that satisfies $|z_i^* - \lfloor z_i^* \rceil| \le \delta$, i.e., condition (8)
$\hat{\mathbf{z}}(\mathcal{I}) = \mathcal{Q}[\mathbf{z}^*(\mathcal{I})]$
Find the set of indices $\mathcal{J}$
Find $\bar{\mathbf{Q}} = \mathbf{Q}(\mathcal{J}, \mathcal{J})$, and
$\bar{\mathbf{b}} = \mathbf{Q}(\mathcal{I}, \mathcal{J})^T\hat{\mathbf{z}}(\mathcal{I}) + \mathbf{b}(\mathcal{J})$
$\bar{\mathbf{z}}^* = \mathrm{quadprog}(\bar{\mathbf{Q}}, \bar{\mathbf{b}})$ from (9)
$\hat{\mathbf{z}}(\mathcal{J}) = \mathcal{Q}[\bar{\mathbf{z}}^*]$

D. Algorithm III: A Controlled-Size BB Search Tree
In this section, we first briefly review the standard BB algorithm. We then introduce our proposed approximations, which control the size of the BB search tree and reduce its computational complexity.

Branch and Bound algorithm: BB is a search-tree-based algorithm that recursively forces the non-integer values of $\mathbf{z}^*$ in (6) to become integers. It does so using a search tree structure [32], [35], as shown in Fig. 1. The input problem to the BB search tree is problem (6). Its optimum continuous solution and cost function value are denoted $\mathbf{z}^{(0)}$ and $f(\mathbf{z}^{(0)})$, respectively. The remaining nodes of the BB search tree are denoted in the same way, according to their node numbers, as depicted in Fig. 1.

The standard BB search starts by solving problem (6) at node 0 and then checks all elements of the solution $\mathbf{z}^{(0)}$. If they all satisfy the integer constraints, node 0 needs no further exploration, because $\mathbf{z}^{(0)}$ is then the optimum integer solution of problem (6), and also of (5) [32]. Otherwise, $\mathbf{z}^{(0)}$ contains some fractional symbols. In that case, the BB algorithm splits the problem at node 0 into two subproblems by adding two mutually exclusive and exhaustive constraints, as shown in Fig. 1. The new subproblems are called children nodes, and the original problem is called the parent node. The relaxed problems at nodes 1 and 2 are similar to (6), except that the bounds of the branching variable (say variable $i$) are replaced with $z_i \le \lfloor z_i^{(0)} \rfloor$ and $z_i \ge \lceil z_i^{(0)} \rceil$, respectively. That is, the problems at node 1 and node 2 of level one can be written as:

$$\arg\min_{\mathbf{z}} \ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{subject to} \quad \mathbf{0} \le \mathbf{z} \le (\sqrt{C}-1)\mathbf{1}, \ \text{and} \ z_i \le \lfloor z_i^{(0)} \rfloor \qquad (11)$$

$$\arg\min_{\mathbf{z}} \ \tfrac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{subject to} \quad \mathbf{0} \le \mathbf{z} \le (\sqrt{C}-1)\mathbf{1}, \ \text{and} \ z_i \ge \lceil z_i^{(0)} \rceil \qquad (12)$$

Here $z_i$ is called the branching variable at index $i$ ($1 \le i \le 2N_t$). $\lfloor z_i^{(0)} \rfloor$ ($\lceil z_i^{(0)} \rceil$) denotes the largest (smallest) integer smaller (greater) than or equal to $z_i^{(0)}$. There are various strategies for choosing the branching variable [35]. In this paper, we choose the simplest one, which branches a node at its first non-integer variable. Solving the new subproblems with the interior-point algorithm returns $(\mathbf{z}^{(1)}, f(\mathbf{z}^{(1)}))$ and $(\mathbf{z}^{(2)}, f(\mathbf{z}^{(2)}))$ for nodes 1 and 2, respectively. If these solutions do not satisfy the integer constraints, each node is branched into two more subproblems. Branching continues until the optimal integer solution is found; see [33] and [35] for more details.

Two important pruning rules are used with the BB algorithm:
1) Whenever the cost function value of a node exceeds a known upper bound $f^{(up)}$, the node is pruned, because no better solution can be expected from the subtree below it. The initial upper bound can be set to a very large value, such as $\infty$, or computed from any available integer solution, such as the ZF or MMSE solution.
2) As mentioned above, if the solution of a node satisfies the integer constraints, no branching is needed and the node is pruned.

Fig. 1. Representation of the Breadth-First BB search tree.

In this paper, we focus on the Breadth-First (BF) search strategy [36], in which the nodes of the tree are explored level by level, as depicted in Fig. 1. We prefer this strategy because it suits our proposed approximations well. In general, applying the standard BB algorithm to (6) can lead to the ML solution, as shown in our previous work [27], [28]. However, our system of interest is large-scale MIMO, where the dimension of the problem is $2N_t$. This makes the standard algorithm computationally expensive, so simplifications are needed.
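The branching step of (11) and (12) and the two pruning rules can be sketched in Python as below. The helper names are hypothetical, and a node is represented only by its lower- and upper-bound vectors. The branching variable is the first non-integer entry, which is the rule chosen in this paper. The integrality tolerance `tol` is an assumed numerical threshold, since an iterative solver never returns exact integers.

```python
import math


def first_fractional(z, tol=1e-6):
    """Index of the first non-integer entry of a relaxed solution, or None if all are integer."""
    for i, v in enumerate(z):
        if abs(v - math.floor(v + 0.5)) > tol:
            return i
    return None


def branch(lb, ub, z, tol=1e-6):
    """Split a node with bounds lb <= z <= ub into the children (11) and (12).

    Returns None when z is already integer (pruning rule 2: no branching needed).
    """
    i = first_fractional(z, tol)
    if i is None:
        return None
    ub_left = list(ub)
    ub_left[i] = math.floor(z[i])    # child (11): z_i <= floor(z_i)
    lb_right = list(lb)
    lb_right[i] = math.ceil(z[i])    # child (12): z_i >= ceil(z_i)
    return (list(lb), ub_left), (lb_right, list(ub))


def prune_by_bound(f_node, f_up):
    """Pruning rule 1: the relaxed cost already exceeds the incumbent upper bound f^(up)."""
    return f_node > f_up
```

For a node with bounds `[0, 0, 0]` to `[3, 3, 3]` and relaxed solution `[1.0, 2.4, 0.7]`, the branching variable is the second entry. Child (11) tightens its upper bound to 2, and child (12) raises its lower bound to 3.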
Proposed Approximations: The algorithm proposed in this section adds the following three approximations to the standard BB algorithm.
1) Depth reduction: Instead of continuing the search all the way down the tree until the optimum integer solution is found, this approximation forces the BB search tree to stop at a predefined level (layer) $L$. The search stops there even if the optimum integer solution has not yet been reached. We denote the number of nodes in the stopping level $L$ as $m_L$. The solutions and corresponding cost function values of the nodes in this level are $\mathbf{z}_L^{(p)}$ and $f(\mathbf{z}_L^{(p)})$, respectively, where $p = 1, \ldots, m_L$. The approximate integer solution is then the quantized version of the solution with the minimum cost function value at the stopping level $L$:

$$\hat{\mathbf{z}} = \mathcal{Q}[\mathbf{z}_L^{(t)}], \quad t = \arg\min_{p} f(\mathbf{z}_L^{(p)}), \quad p = 1, \ldots, m_L \qquad (13)$$
This approximation relies on a property of the standard BB algorithm. Because of the branching rule, every time the algorithm moves down one layer in the tree, at least one node comes closer to the optimum integer solution. In other words, along the path that leads to the optimum integer solution, the absolute difference between $\mathbf{z}$ and its quantized version keeps shrinking. For example, in Fig. 1, assume that the standard BB algorithm finds the optimum integer solution at node 14, and that the path to this node passes through nodes 0, 2, 6, and 14. Then

$$|\mathbf{z}^{(14)} - \mathcal{Q}[\mathbf{z}^{(14)}]| \le |\mathbf{z}^{(6)} - \mathcal{Q}[\mathbf{z}^{(6)}]| \le |\mathbf{z}^{(2)} - \mathcal{Q}[\mathbf{z}^{(2)}]| \le |\mathbf{z}^{(0)} - \mathcal{Q}[\mathbf{z}^{(0)}]|.$$
2) Width reduction: Instead of exploring all nodes in every level of the search tree, this approximation explores only the $M$ most probable nodes that may lead to the optimum solution, and discards (prunes) the rest. The selection criterion uses the cost function as the metric. To accomplish this, we adopt the concept of the M algorithm [37]. This is a breadth-first algorithm widely used in the QRDM technique for conventional MIMO systems [38].
3) For faster simulation and fewer visited nodes (hence fewer computations), we propose a further approximation in conjunction with the BB(L,M) search tree. At any node in the tree, it compares two cost function values: that of the relaxed problem, using the continuous solution $\mathbf{z}^*$, and that of the integer problem, using the rounded solution $\lfloor \mathbf{z}^* \rceil$. Whenever this difference is small, according to the criterion below, the relaxed continuous solution is approximated by the integer solution. This adds one more pruning rule to the BB algorithm, because more integer solutions become available in the tree. As a result, the number of visited nodes drops significantly, especially at high SNR.

Following the notation of this section, $\mathbf{z}^{(k)}$ denotes the optimum continuous solution of the relaxed problem at node $k$, and $f(\mathbf{z}^{(k)})$ its objective function value. Here $k = 0, 1, 2, \ldots, N_v$, where $N_v$ is the number of visited nodes in the search tree. Similarly, $\mathcal{Q}[\mathbf{z}^{(k)}]$ denotes the quantized continuous solution of the same node $k$, and $f(\mathcal{Q}[\mathbf{z}^{(k)}])$ its cost function value. The approximation is then:

$$\mathbf{z}^{(k)} = \begin{cases} \mathcal{Q}[\mathbf{z}^{(k)}] & \text{if } |f(\mathbf{z}^{(k)}) - f(\mathcal{Q}[\mathbf{z}^{(k)}])| \le \epsilon\,|f(\mathbf{z}^{(k)})| \\ \mathbf{z}^{(k)} & \text{otherwise} \end{cases} \qquad (14)$$

Here $|\cdot|$ denotes the absolute value, and $\epsilon > 0$ is a small number that can be optimized for a tradeoff between performance and complexity: the larger $\epsilon$, the lower both the performance and the complexity. In the standard BB algorithm, $\epsilon = 0$. This approximation differs from the one in [25], which prunes a node only if its cost value is close to the best available upper bound.

TABLE II: BB(L,M) Algorithm Summary
Initialize node LIST = empty and $f^{(up)} = \infty$
Insert the values of $L$ and $M$
Initialize the search by adding problem (6) to the node LIST
Initialize tree level $l = 0$ (root node level)
while (node LIST is not empty) do
    for $m = 1 : m_l$
        Pick a problem from the node LIST (call it problem $P^{(m)}$)
        Solve $P^{(m)} \rightarrow \mathbf{z}^{(m)}$ and $f(\mathbf{z}^{(m)})$
        if $f(\mathbf{z}^{(m)}) > f^{(up)}$, prune node $m$ and delete it from the LIST
        else if $f(\mathbf{z}^{(m)}) \le f^{(up)}$, then
            if $\mathbf{z}^{(m)}$ is all integer or satisfies the if-condition in (14),
                update $f^{(up)} = f(\mathbf{z}^{(m)})$ and $\hat{\mathbf{z}} = \mathcal{Q}[\mathbf{z}^{(m)}]$
            else keep the node problem in the node LIST, end if
        end if
    end for
    if all nodes in level $l$ are pruned, exit the while loop
    else select the $M$ nodes with the smallest $f(\mathbf{z}^{(m)})$ in level $l$, and delete the rest, end if
    if $l = L$, then $\hat{\mathbf{z}} = \mathcal{Q}[\mathbf{z}^{(t)}]$, $t = \arg\min_p f(\mathbf{z}^{(p)})$, empty the node LIST, and exit the while loop
    else expand the selected $M$ nodes by branching each node problem into two new subproblems
        Push the new subproblems into the node LIST
        Delete the original $M$ nodes from the node LIST
    end if
    Set $l = l + 1$
end while
Note that in the sequel, we refer to Algorithm III as BB(L,M),
where L is the stopping level of the search tree and M is the
number of nodes maintained in each level. The summary of
BB(L,M) is shown in Table II.
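A compact Python sketch of BB(L,M) on the real-valued problem (5) is given below, following the structure of Table II. It is a hedged illustration, not the authors' implementation:
- the interior-point solver is replaced by a projected-gradient stand-in;
- the branching variable is the first non-integer entry;
- `tol` is an assumed integrality threshold;
- the names `bb_lm`, `solve_box_qp`, and `cost` are hypothetical.

Nodes within a level are processed in list order. The $M$ lowest-cost non-integer nodes survive (width reduction), and the search stops at level $L$ as in (13) (depth reduction). A node is treated as integer when condition (14) holds.

```python
import math


def solve_box_qp(Q, b, lb, ub, iters=2000):
    """Stand-in for the interior-point solver: projected gradient on lb <= z <= ub."""
    n = len(b)
    step = 1.0 / (max(sum(abs(v) for v in row) for row in Q) or 1.0)
    z = [(lo + hi) / 2.0 for lo, hi in zip(lb, ub)]
    for _ in range(iters):
        grad = [sum(Q[i][j] * z[j] for j in range(n)) + b[i] for i in range(n)]
        z = [min(max(z[i] - step * grad[i], lb[i]), ub[i]) for i in range(n)]
    return z


def cost(Q, b, z):
    """Objective of (5)/(6): f(z) = 0.5 z^T Q z + b^T z."""
    n = len(b)
    quad = sum(z[i] * Q[i][j] * z[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(b[i] * z[i] for i in range(n))


def bb_lm(Q, b, sqrtC, L=4, M=2, eps=0.01, tol=1e-6):
    """Controlled-size BB(L, M) search in the spirit of Table II; returns (z_hat, visited nodes)."""
    n, s = len(b), sqrtC - 1
    f_up, z_best, visited = math.inf, None, 0
    level = [([0] * n, [s] * n)]                 # root node: problem (6)
    for depth in range(L + 1):
        open_nodes = []
        for lb, ub in level:
            z = solve_box_qp(Q, b, lb, ub)
            f = cost(Q, b, z)
            visited += 1
            if f > f_up:                         # pruning rule 1 (upper bound)
                continue
            zq = [min(max(math.floor(v + 0.5), 0), s) for v in z]
            frac = [i for i in range(n) if abs(z[i] - zq[i]) > tol]
            if not frac or abs(f - cost(Q, b, zq)) <= eps * abs(f):  # integer, or rule (14)
                f_up, z_best = f, zq             # new incumbent; node needs no branching
            else:
                open_nodes.append((f, z, frac[0], lb, ub))
        if not open_nodes:                       # every node in this level was pruned
            break
        open_nodes.sort(key=lambda node: node[0])
        open_nodes = open_nodes[:M]              # width reduction: keep the M best nodes
        if depth == L:                           # depth reduction, eq. (13)
            z_min = open_nodes[0][1]
            return [min(max(math.floor(v + 0.5), 0), s) for v in z_min], visited
        level = []
        for _, z, i, lb, ub in open_nodes:       # branch on the first non-integer entry
            ub_left, lb_right = ub[:], lb[:]
            ub_left[i] = math.floor(z[i])        # child (11)
            lb_right[i] = math.ceil(z[i])        # child (12)
            level += [(lb, ub_left), (lb_right, ub)]
    return z_best, visited
```

Take the one-dimensional problem `Q = [[1.0]]`, `b = [-1.4]` with `sqrtC = 4`, whose relaxed root solution is 1.4. With `eps = 0.01`, the root fails (14) and is branched, so three nodes are visited before `[1]` is returned. With `eps = 0.1`, the root itself passes (14) and only one node is visited. This shows how a larger $\epsilon$ trades accuracy for fewer visited nodes.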
E. Complexity Analysis
Most of the computation in the QP detector is spent in the interior-point algorithm, which iteratively finds a point where the Karush-Kuhn-Tucker (KKT) conditions of optimization problem (6) hold. As shown in [30] and [39], each interior-point iteration boils down to solving a system of linear equations, which requires a matrix inversion. Therefore, one interior-point iteration has complexity of order $O(N_t^3)$, and $n$ iterations have complexity $O(nN_t^3)$. In practice, the interior-point algorithm almost always converges in a constant number of iterations, independent of the problem dimension [31]. This is very attractive in high-dimensional optimization problems. In our simulation experiments with the standard interior-point algorithm, the average number of iterations for various numbers of antennas was 6, 7, 8, and 9 for QPSK, 16QAM, 64QAM, and 256QAM, respectively. In this work, we further reduce the number of iterations to 2, 4, 5, and 6, respectively, without major loss in performance.

The idea is as follows. The QP detector does not need the exact continuous solution from the interior-point algorithm, only one accurate enough for the quantization step to land on the right integer. Early termination of the interior-point algorithm can therefore reach the integer solution faster. The early termination, applied before the quantization step in (7), is achieved by relaxing the convergence tolerance.
The second algorithm requires more computation than the first because of its second QP stage. Fortunately, the second QP is much smaller than the first, especially at medium to high SNR and when the parameter $\delta$ is optimized. As a result, Algorithms I and II have nearly the same computational complexity when the number of antennas becomes large. The interior-point algorithm in the second stage has complexity of order $O(n|\mathcal{J}|^3)$. Therefore, the total complexity of Algorithm II is of order $O(nN_t^3 + n|\mathcal{J}|^3)$.
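A back-of-the-envelope calculation shows why the second stage is cheap. Take the QPSK example reported in Section IV-A: $N_t = 32$, $\delta = 0.25$, and an average second-stage size of $|\mathcal{J}| \approx 6$ at $10^{-3}$ BER. Assume that the per-iteration interior-point cost scales with the cube of the actual problem dimension ($2N_t = 64$ for the first stage) and that both stages use the same $n$. Under these assumptions, the relative cost of the second stage is roughly

$$\frac{n\,|\mathcal{J}|^3}{n\,(2N_t)^3} \approx \frac{6^3}{64^3} = \frac{216}{262144} \approx 8.2 \times 10^{-4},$$

that is, well under 1% of the cost of the first stage.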
Finally, the proposed controlled-size BB algorithm needs more computation than the first two algorithms, because a QP must be solved at every visited node of the search tree. Its total complexity is therefore of order $O(N_v n N_t^3)$ per received vector, where $N_v$ is the number of visited nodes in the proposed BB search tree. In large-scale MIMO systems, $n \ll N_t$. $N_v$ depends on both $L$ and $M$: from simulations, approximately $N_v \approx LM$ at low SNR, whereas $N_v \ll LM$ at high SNR. Therefore, BB(L,M) requires nearly $N_v$ times the complexity of 2QP.
The complexity of the proposed algorithms does not change significantly across QAM modulations. The small variation that exists comes from the different number of interior-point iterations required for each modulation. For instance, 256QAM requires on average about 3 times as many interior-point iterations as QPSK (6 versus 2). This is an important advantage of QP-based detectors over other algorithms in the large-scale MIMO literature, such as RTS, R3TS [18], and Fixed Complexity SD [14]. The complexity of these algorithms varies widely as the modulation order changes from low to high. For example, it differs by a factor on the order of 100 between QPSK and 64QAM for R3TS [18], and by more than that for FSD.
As shown in [15], the complexity per received vector of MMSE-LAS is of order $O(N_t^3) + O(N_t^3)$: one $O(N_t^3)$ term comes from computing the MMSE initial vector, and the other from the LAS procedure. Therefore, the extra complexity of QP and 2QP over MMSE-LAS comes from the number of interior-point iterations, $n$, of the QP detector. Moreover, BB(L,M) requires approximately $nN_v$ times the complexity of MMSE-LAS.
IV. SIMULATION RESULTS
In this section, we present simulation results for uncoded and coded large-scale MIMO systems in a block flat-fading channel with $N_t = N_r$ and various QAM levels. We assume perfect knowledge of the channel state information at the receiver. We refer to our algorithms as QP for Algorithm I, 2QP for Algorithm II, and BB(L,M) for Algorithm III. We compare them with other detectors, including MMSE, MMSE-OSIC [10], MMSE-LAS [15], MIV-LAS [16], and RTS [17]. MIV-LAS is a LAS algorithm that uses three initial input vectors: matched filter (MF), zero forcing (ZF), and MMSE. The multiple-symbol-update LAS algorithm [7] offers only a small performance gain over MMSE-LAS, so we limit the comparison to MIV-LAS and MMSE-LAS. For a fair comparison, all detection techniques are implemented using the real system model in (3).
A. Optimizing $\delta$, $\epsilon$, and the Number of Iterations $n$
Figs. 2a and 2b demonstrate, using QPSK modulation as an example, that the choice of can significantly improve the performance of the 2QP detector over the conventional QP detector. In this 32×32 MIMO example, values of between 0.25 and 0.3 provide the best performance. For instance, when = 0.25, 2QP has a 2 dB improvement over QP at $10^{-3}$ BER. The problem size of the second stage of the 2QP detector decreases as the value of increases (see Fig. 2b), especially at high SNR, and in general it is far below the size of the first stage, which is $2N_t$. This keeps the complexity of the 2QP detector close to that of the QP detector. For example, in the QPSK case with $N_t = 32$ and = 0.25 at $10^{-3}$ BER, the average size of the second stage of 2QP is 6, compared to 64 in the first stage. For various QAM modulations, SNRs, and $N_t$, Fig. 2c demonstrates that = 0.25 is a good optimized value, which also corresponds to the heuristic value of = max /2. Thus, it is used in the rest of the paper.
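Algorithm II itself is defined earlier in the paper and is not reproduced here. As a hypothetical illustration of why the second stage shrinks as the threshold grows, assume the first-stage relaxed solution lies in the box $[0, 1]$ (QPSK, with $z = (x+1)/2$ as in Section V), that an entry is declared reliable when it lies within a distance `delta` of 0 or 1, and that reliable entries are quantized and cancelled from $\mathbf{y}$, leaving a smaller QP over the unreliable entries. Under this assumption the largest meaningful threshold is 0.5. The function name and the reliability rule are assumptions, not the authors' exact procedure.

```python
import numpy as np

def split_for_second_stage(H, y, z_relaxed, delta):
    """Hypothetical 2QP split: fix entries within `delta` of {0, 1}, cancel them from y.

    H, y      : real-valued model, y = H x + n with x in {-1, +1}^n
    z_relaxed : first-stage relaxed solution in [0, 1]^n (z = (x + 1) / 2)
    delta     : reliability threshold (assumed name), 0 <= delta <= 0.5
    Returns the unreliable indices, the reduced channel, and the residual observation.
    """
    dist = np.minimum(z_relaxed, 1.0 - z_relaxed)       # distance to the nearest of {0, 1}
    reliable = dist <= delta
    x_fixed = 2.0 * np.round(z_relaxed[reliable]) - 1.0  # quantize reliable entries to {-1, +1}
    y_residual = y - H[:, reliable] @ x_fixed            # interference cancellation
    unreliable = np.flatnonzero(~reliable)
    return unreliable, H[:, unreliable], y_residual

# Toy relaxed solution: a larger threshold leaves fewer entries for the second stage
z = np.array([0.02, 0.97, 0.42, 0.81, 0.63, 0.08])
H = np.eye(6)
y = 2.0 * z - 1.0
for delta in (0.1, 0.25, 0.4):
    idx, _, _ = split_for_second_stage(H, y, z, delta)
    print(delta, len(idx))  # prints 3, 2, and 1 unreliable entries, respectively
```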
As mentioned in Section III-E, the main computational burden in the QP detector comes from the interior-point solver. We proposed to reduce this burden by forcing the solver to terminate early, thus reducing the number of iterations, $n$. We performed simulation experiments using both the QP and 2QP detectors for QPSK and 16QAM modulations with various numbers of interior-point iterations. Figs. 3a and 3b show that 2 and 4 iterations for QPSK and 16QAM, respectively, are the minimum numbers that guarantee no major loss in BER performance. The same reduction procedure was applied to 64QAM and 256QAM, and the minimum number of iterations was found to be 5 for 64QAM and 6 for 256QAM. The same idea is used to optimize the value of in the BB(L,M) algorithm for various modulation levels, and we found that = 0.01 for QPSK, = 0.001 for 16QAM, and = 0.0001 for both 64QAM and 256QAM. These optimized numbers of iterations and values of are used in the rest of the simulation experiments.
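The interior-point solver used in the paper is described in Section III-E and is not reproduced here. As a rough sketch of what early termination means, the box-relaxed QP $\min \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z}$ subject to $\mathbf{0} \le \mathbf{z} \le \mathbf{1}$ can be solved with a textbook log-barrier method whose number of outer iterations is capped at `n_iter`. This is a stand-in, not necessarily the authors' solver; `t0`, `mu`, and `newton_steps` are arbitrary choices.

```python
import numpy as np

def box_qp_barrier(Q, b, n_iter, t0=1.0, mu=10.0, newton_steps=5):
    """Log-barrier method for min 0.5 z'Qz + b'z  s.t. 0 <= z <= 1.

    n_iter caps the number of outer (barrier) iterations, mimicking early termination.
    """
    z = np.full(len(b), 0.5)          # strictly feasible starting point
    t = t0
    for _ in range(n_iter):
        for _ in range(newton_steps):
            # Gradient and Hessian of t*f(z) - sum(log z) - sum(log(1 - z))
            grad = t * (Q @ z + b) - 1.0 / z + 1.0 / (1.0 - z)
            hess = t * Q + np.diag(1.0 / z**2 + 1.0 / (1.0 - z)**2)
            step = -np.linalg.solve(hess, grad)
            # Fraction-to-boundary rule keeps z strictly inside (0, 1)
            s = 1.0
            neg, pos = step < 0, step > 0
            if neg.any():
                s = min(s, 0.99 * np.min(-z[neg] / step[neg]))
            if pos.any():
                s = min(s, 0.99 * np.min((1.0 - z[pos]) / step[pos]))
            z = z + s * step
        t *= mu                        # tighten the barrier for the next outer iteration
    return z

# Toy problem whose box-constrained minimizer is the vertex [1, 0]
Q = np.eye(2)
b = np.array([-2.0, 1.0])
print(box_qp_barrier(Q, b, n_iter=1))  # still far from the vertex
print(box_qp_barrier(Q, b, n_iter=6))  # close to [1, 0]
```

Each outer iteration costs a few linear solves of size $2N_t$, so capping `n_iter` directly bounds the detector's cost; the experiments above indicate how small that cap can be before BER degrades.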
B. Uncoded BER Performance vs. SNR
We choose a relatively large number of antennas, $N_t = N_r = 32$, to demonstrate the performance of our techniques. In Fig. 4, we present the average uncoded BER performance for 32×32 MIMO with QPSK, 16QAM, 64QAM, and 256QAM modulations. Fig. 4 shows that both the 2QP and BB(L,M) algorithms improve the performance of the QP detector at all displayed SNRs and for all QAM modulations. When comparing 2QP with BB(L,M), say BB(16,2), as in Fig. 4a, 2QP performs better than BB(16,2) in QPSK, with a 0.5 dB improvement at $10^{-3}$ BER and with even lower complexity. On the other hand, 2QP steadily becomes worse than BB(16,2) as the modulation order increases (see Figs. 4b, c, and d).
A more detailed simulation of the BB(L,M) algorithm is shown in Fig. 5, using 16QAM as an example. It shows that as L increases, the BER performance improves. For instance, BB(4,4) outperforms BB(2,4), and BB(8,4) outperforms BB(4,4). From the same figure, it can be observed that the diversity of the system increases with L. Increasing the width of the BB tree, M, can also improve the performance, as in the case of BB(16,4) over BB(16,2); however, in some cases extending the width of BB(L,M) does not improve performance and only adds complexity, as shown in the same figure for BB(16,4) and BB(16,6). Note that in this paper we did not focus on finding the optimum values of L and M; we only show that some pairs, such as BB(16,2) and BB(4,4), are good choices for demonstrating how the algorithm works. For large $N_t$, especially with higher QAM levels, it is enough to pick $L = N_t/2$ and $M = 2$ to outperform the other existing algorithms.
Fig. 4 shows that the advantages of the QP-based detectors take effect when higher-order modulations are used. In the QPSK simulation shown in Fig. 4a, RTS outperforms all of our proposed techniques, and MMSE-LAS and MIV-LAS also outperform QP and BB(8,4) at certain SNRs. On the other hand, in Figs. 4b, c, and d, where the modulation order increases, QP, 2QP, and BB(L,M) become steadily superior to RTS and LAS at all displayed SNRs. For example, in Fig. 4d, the QP detector, whose BER is an upper bound on the BER of 2QP and BB(L,M), provides a 5 dB improvement over MMSE-LAS and a 3 dB improvement over RTS at $10^{-2}$ BER. The performance of RTS was improved using a hybrid of RTS and Belief Propagation (RTS-BP) in [40], but this achieved only a 1.6 dB improvement at $10^{-3}$ BER with 16QAM (see Fig. 3 in [40]), whereas our algorithms 2QP and BB(32,4) provide improvements of 2 dB and 3 dB over RTS, respectively. It is worth noting that the performance of our proposed algorithms can be further improved by combining them with the LAS or RTS algorithms, i.e., by using the output vector of QP, 2QP, or BB(L,M) as the initial vector of LAS or RTS. Simulation results for this claim are not shown extensively here, but two examples, QP with LAS using QPSK and BB(32,4) with LAS using 16QAM, are shown in Figs. 4a and 5, respectively.

Fig. 2 A two-stage QP detector (32×32 MIMO): (a) QPSK BER performance vs. average received SNR (dB), (b) problem size of the second stage, (c) QAM BER performance vs. the 2QP parameter.
Fig. 3 The effect of reducing interior-point (IP) iterations on the BER performance in a 32×32 MIMO system (QPSK and 16QAM): (a) QP detector, (b) two-stage QP detector. "Standard IP" denotes the standard interior-point algorithm.
Figs. 6a, b, c, and d present a sample of complexity computations, in terms of the average number of real operations versus $N_t$, measured at relatively low and relatively high SNR for both 16QAM and 256QAM. The important observations from these figures are as follows: (i) The complexities of QP and 2QP are almost the same, with 2QP offering superior performance. (ii) There is no significant increase in the computational complexity of the QP and 2QP detectors over the MMSE-LAS detector, yet their performance is substantially better, especially at higher QAM modulations. For example, for 256QAM 32×32 MIMO, 2QP has a 7 dB improvement over MMSE-LAS (see Fig. 4d), while requiring only about twice the computations of MMSE-LAS. (iii) At 256QAM, 2QP requires fewer computations than RTS, with even better performance (see Fig. 4d). (iv) At fixed $N_t = N_r$, the complexity of QP, 2QP, and BB(L,M) does not change significantly from 16QAM to 256QAM. (v) At relatively high SNR, the difference in complexity between BB(4,4), BB(16,2) and QP, 2QP is small, while at relatively low SNR the difference is clearly noticeable. (vi) At low SNR and low modulation order, such as 16QAM, RTS requires fewer computations than BB(L,M); however, at higher SNR and higher modulation order, such as 256QAM, BB(L,M) requires fewer computations. This is due to the pruning rule (14), whose effect becomes pronounced at high SNR. (vii) Even though the complexity of RTS is close to that of BB(4,4) and BB(16,2) at 256QAM with low SNR, BB(4,4) and BB(16,2) significantly outperform RTS in BER.
C. Uncoded BER Performance vs. Nt
In Figs. 7, 8, and 9, we plot the uncoded BER performance as a function of $N_t = N_r$ for various detectors at an average received SNR of 15 dB, 26 dB, and 39 dB for QPSK, 16QAM, and 256QAM, respectively. We compare the proposed algorithms against MMSE-LAS, RTS, MMSE-OSIC, and QRDM. MF and MMSE are also plotted for reference.

Fig. 4 Uncoded BER performance of a 32×32 MIMO system vs. average received SNR (dB): (a) QPSK, (b) 16QAM, (c) 64QAM, (d) 256QAM.
Fig. 5 16QAM BER performance of a 32×32 MIMO system using BB(L,M). The improvement is clear when moving to a deeper level.
In the case of QPSK, Fig. 7 shows that MMSE-LAS
provides better performance than QP and BB(4,4) at Nt 30
and Nt 40, respectively, while it is completely inferior to 2QP
at all displayed $N_t$. BB(L,M) can outperform MMSE-LAS if more levels are considered in the BB(L,M) search tree, as in the case of BB(16,2). Similarly, RTS outperforms QP, BB(4,4), and 2QP at all considered $N_t$; however, at higher values of L, such as 16, RTS is inferior to BB(L,M) when $N_t < 20$. On the other hand, at higher QAM orders (see Figs. 8 and 9), our algorithms clearly outperform the LAS and RTS algorithms.
An interesting result regarding the 2QP algorithm across various QAM modulations is that, although it requires lower complexity than BB(L,M), it has superior performance in some ranges of $N_t$. For example, in QPSK, it outperforms BB(4,4) and BB(16,2) at $N_t > 10$ and $N_t > 28$, respectively. At higher QAM modulations, the value of $N_t$ at which 2QP starts to outperform BB(L,M) increases (see Figs. 8 and 9).
We observe an error-floor behavior in the BB(L,M) performance. This is because, while $N_t$ increases, we keep the same depth L, which is not sufficient to correct additional errors. This effect can be reduced if L increases adaptively with $N_t$. Fig. 7 shows that the effect is reduced when BB(16,2) is replaced by BB($2N_t$,2).
MMSE-OSIC performs well only at smaller Nt ; using
QPSK, it performs better than QP at Nt 12; using 16QAM,
it performs better than QP and 2QP at Nt 16; using
256QAM, interestingly, it performs better than QP, 2QP, and
BB(4,4) at Nt 45; however, it requires more computations.
In general, MMSE-OSIC starts to exhibit a high error floor as
Nt increases, which is in line with the results shown in [15].
The reduced-complexity search tree algorithms studied in conventional MIMO, such as Fixed SD (FSD) [14], K-best SD, and QRDM, demonstrate poor performance in large-scale MIMO systems [41]. As an example, we present the performance of the QRDM algorithm for both QPSK with M = 4 and 16QAM with M = 16 (see Figs. 7 and 8). It can be seen that QRDM with M equal to the QAM constellation size provides the best performance at $N_t < 10$, which is the ML performance; however, as $N_t$ grows, its BER performance deteriorates because the reduced search space of QRDM becomes increasingly small relative to the ML search space.

Fig. 6 Average complexity in terms of the number of real arithmetic operations vs. $N_t$: (a) 16QAM at 19 dB SNR, (b) 16QAM at 26 dB SNR, (c) 256QAM at 35 dB SNR, (d) 256QAM at 45 dB SNR.
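For reference, the QRDM detector discussed above is a breadth-first tree search: after a QR decomposition of the channel, every surviving partial path is extended by all alphabet points at each level and only the M best are kept, so the visited fraction of the ML tree shrinks as $N_t$ grows. A minimal sketch follows, assuming a real-valued model with a per-dimension alphabet (a simplification of the complex-valued tree); `qrd_m_detect` is a hypothetical name.

```python
import numpy as np

def qrd_m_detect(H, y, alphabet, M):
    """Breadth-first M-algorithm on the real-valued model y = H x + n.

    After H = Q R, the tree is searched from the last layer upward,
    keeping at most M partial candidates (lowest accumulated metric) per level.
    """
    n = H.shape[1]
    Q, R = np.linalg.qr(H)          # reduced QR: R is n x n upper triangular
    yt = Q.T @ y
    survivors = [(0.0, [])]         # (metric, symbols for layers k..n-1)
    for k in range(n - 1, -1, -1):
        expanded = []
        for metric, tail in survivors:
            # Interference from the already-decided layers k+1..n-1
            interf = sum(R[k, k + 1 + j] * s for j, s in enumerate(tail))
            for a in alphabet:
                e = yt[k] - R[k, k] * a - interf
                expanded.append((metric + e * e, [a] + tail))
        expanded.sort(key=lambda c: c[0])
        survivors = expanded[:M]    # keep the M best partial paths
    return np.array(survivors[0][1])

rng = np.random.default_rng(1)
H = rng.standard_normal((6, 4))
x = np.array([1.0, -1.0, -1.0, 1.0])
print(qrd_m_detect(H, H @ x, [-1.0, 1.0], M=4))  # noiseless case recovers x
```

With M fixed, the number of stored paths per level stays constant while the full tree grows as $|\mathcal{A}|^{2N_t}$, which is consistent with the degradation of QRDM at large $N_t$ described above.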
D. Turbo Coded BER Performance
In this subsection, we evaluate the turbo-coded BER performance of the QP-based detectors compared to the MMSE, MMSE-LAS, and RTS detectors. A 32×32 MIMO system is examined with both 16QAM and 256QAM, and with a rate-1/3 turbo decoder using 10 iterations. A hard-decision output vector from each detector is fed as the input to the turbo decoder; performance can be improved if a soft-decision output vector is fed instead. Fig. 10 demonstrates that, similar to the uncoded BER performance, the turbo-coded BER performance of the QP-based detectors surpasses that of the RTS and LAS detectors as the modulation order increases. With 16QAM, RTS outperforms QP and 2QP by about 1.5 dB and 0.5 dB, respectively, at $10^{-2}$ BER, while with 256QAM, QP and 2QP outperform RTS by 4 dB and 4.5 dB, respectively. The configuration $N_t = N_r = 32$ with 16QAM and rate-1/3 turbo coding corresponds to a spectral efficiency of $32 \times \frac{1}{3} \times 4 = 42.67$ bit/s/Hz, which becomes 85.33 bit/s/Hz when 256QAM is used. The theoretical minimum SNR at which the channel capacity equals these spectral efficiencies is shown in Fig. 10.
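The spectral-efficiency figures above follow from the number of spatial streams times the code rate times the bits per QAM symbol. A small helper makes the arithmetic explicit; the function name is illustrative, and exact fractions are used so the rate-1/3 products are not rounded.

```python
import math
from fractions import Fraction

def spectral_efficiency(n_t, code_rate, qam_order):
    """Bits/s/Hz for n_t spatial streams, a given code rate, and square QAM."""
    bits_per_symbol = int(math.log2(qam_order))
    return n_t * code_rate * bits_per_symbol

se_16 = spectral_efficiency(32, Fraction(1, 3), 16)    # 128/3
se_256 = spectral_efficiency(32, Fraction(1, 3), 256)  # 256/3
print(round(float(se_16), 2), round(float(se_256), 2))  # 42.67 85.33
```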
E. Effect of MIMO Spatial Correlation
In this subsection, we investigate the performance of the 2QP detector in a more realistic MIMO channel. We adopt a spatially correlated MIMO fading model based on the Kronecker product model [42], [43], where the complex MIMO channel matrix can be written as:

$$\mathbf{H} = \mathbf{R}_r^{1/2}\,\mathbf{A}_{iid}\,\mathbf{R}_t^{1/2} \qquad (15)$$
where $\mathbf{R}_r$ and $\mathbf{R}_t$ are the correlation matrices of the receive and transmit antennas, respectively, and $\mathbf{A}_{iid}$ is an i.i.d. (independent and identically distributed) Rayleigh fading channel matrix. This model assumes that the fading statistics of the transmit and receive arrays are independent. In this paper, the correlation matrices at both the transmit and receive sides are computed from the distance between antenna elements [44], [45]. Note that this model does not take into account the structure of the scattering environment between the transmitter and receiver.
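A minimal sketch of drawing one realization of (15) follows. The paper computes $\mathbf{R}_r$ and $\mathbf{R}_t$ from the antenna spacing as in [44], [45]; purely as a stand-in, the sketch assumes an exponential correlation model $[\mathbf{R}]_{ij} = \rho^{|i-j|}$. The function names and the exponential model are assumptions, not the paper's setup.

```python
import numpy as np

def exp_correlation(n, rho):
    """Stand-in correlation matrix [R]_ij = rho^|i-j| (not the spacing-based model of [44], [45])."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(R):
    """Hermitian square root of a positive semidefinite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def kronecker_channel(n_r, n_t, R_r, R_t, rng):
    """One realization of H = R_r^{1/2} A_iid R_t^{1/2}, with CN(0, 1) entries in A_iid."""
    A = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
    return psd_sqrt(R_r) @ A @ psd_sqrt(R_t)

rng = np.random.default_rng(0)
H = kronecker_channel(16, 16, exp_correlation(16, 0.5), exp_correlation(16, 0.5), rng)
print(H.shape)  # (16, 16)
```

Because the correlation enters only through the two square-root factors, adding one receive antenna (as in the 16×17 case below) changes only $\mathbf{R}_r$ and the number of rows of $\mathbf{A}_{iid}$.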
Only the BER performance of the 2QP detector is considered here, for illustration. In this simulation, we consider a 16×16 MIMO system using 16QAM modulation for both i.i.d. fading and spatially correlated fading. The distance between antenna elements is taken to be 0.4 (mild correlation scenario). The effect of spatial correlation is examined for both uncoded and rate-1/3 turbo-coded BER performance. Fig. 11 shows a clear performance loss under correlated fading. For instance, at $10^{-2}$ uncoded BER, 2QP with correlated fading is degraded by 6 dB compared to i.i.d. fading, while for the turbo-coded BER performance, 2QP with correlated fading is degraded by 4 dB compared to i.i.d. fading. To alleviate the degradation caused by correlation, we increased the dimension of the receive array, similar to the work in [7]. Fig. 11 shows that increasing the number of receive antennas by just one (i.e., $N_r = 17$) can dramatically reduce this degradation. For instance, in the 16×17 scenario, the difference in performance at $10^{-2}$ uncoded BER is 1 dB, compared to 6 dB in the 16×16 scenario, whereas for the turbo-coded performance the difference reduces to 0.6 dB.

Fig. 7 QPSK BER performance vs. $N_t$ at SNR = 15 dB.
Fig. 8 16QAM BER performance vs. $N_t$ at SNR = 26 dB.
Fig. 9 256QAM BER performance vs. $N_t$ at SNR = 39 dB.
Fig. 10 16QAM and 256QAM turbo-coded BER performance with a rate-1/3 32×32 MIMO system.
Fig. 11 Uncoded/coded BER performance of a 2QP detector in i.i.d. fading as well as in correlated MIMO fading for both 16×16 and 16×17 cases.

V. ITERATIVE DETECTION AND DECODING USING A QP DETECTOR
In this section, the aim is to develop a turbo-equalization-type receiver using a QP detector. In the previous sections, the performance of QP-based detectors was studied in terms of uncoded/coded BER. To improve the performance of such detectors in the low-SNR regime, a turbo-equalization-type receiver can be used, in which a detector and a decoder exchange soft information with each other iteratively (called iterative detection and decoding (IDD)) until a stopping criterion is reached [46]. There are two challenges in using QP in an IDD setting. First, how to incorporate the a priori information provided by the channel decoder, in the form of Log-Likelihood Ratios (LLRs), into the QP optimization problem (6). Second, how to make the QP detector provide soft information, in the form of LLRs, so that it can be used as a priori information by the soft-input soft-output channel decoder. This section addresses these challenges, with implementation details and a performance study, for large-scale MIMO in a spatial multiplexing setup. We use the same technique as in [47] to incorporate a priori information into the QP optimization problem; however, we further propose to reduce the number of optimization problems needed to compute the LLRs by using local neighborhood solutions. Model (3) is used for the analysis, with a focus on QPSK modulation. A receiver block diagram with turbo equalization is shown in Fig. 12.
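Fig. 12 Receiver side for MIMO IDD with a QP detector.

Consider QPSK symbols, mapped from coded and interleaved bits, transmitted over a MIMO flat fading channel. At the receiver side, the complex channel model is transformed into an equivalent real one, as shown in Section III-A. The real parts of the complex data symbols are mapped to $[x_1, \ldots, x_{N_t}]$, and the imaginary parts to $[x_{N_t+1}, \ldots, x_{2N_t}]$, where bit $x_i \in \{-1, +1\}$, $i = 1, \ldots, 2N_t$. Therefore, the a posteriori LLR for bit $x_i$ is:

$$L_{post}(x_i) = \ln \frac{p(x_i = +1 \mid \mathbf{y}, \mathbf{H})}{p(x_i = -1 \mid \mathbf{y}, \mathbf{H})}, \quad i = 1, \ldots, 2N_t \qquad (16)$$

Using Bayes' theorem, (16) can be written as:

$$L_{post}(x_i) = \ln \sum_{\mathbf{x} \in \mathcal{X}_i^{+1}} p(\mathbf{y} \mid \mathbf{x}, \mathbf{H})\,P(\mathbf{x}) - \ln \sum_{\mathbf{x} \in \mathcal{X}_i^{-1}} p(\mathbf{y} \mid \mathbf{x}, \mathbf{H})\,P(\mathbf{x}) \qquad (17)$$

where $\mathcal{X}_i^{-1}$ ($\mathcal{X}_i^{+1}$) is the set of all possible vectors $\mathbf{x}$ satisfying $x_i = -1$ ($x_i = +1$). $P(\mathbf{x})$ is the a priori probability of $\mathbf{x}$, which, in the case of turbo equalization, is delivered by the outer channel decoder in the form of a priori LLRs:

$$L_a(x_i) = \ln \frac{p(x_i = +1)}{p(x_i = -1)}, \quad i = 1, \ldots, 2N_t \qquad (18)$$

If the noise in the system is white Gaussian with variance $\sigma^2$ per real dimension, the likelihood is $p(\mathbf{y} \mid \mathbf{x}, \mathbf{H}) \propto \exp\left(-\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 / (2\sigma^2)\right)$. Substituting this into (17) and using the max-log approximation $\ln\left(\sum_j \exp(a_j)\right) \approx \max_j \{a_j\}$ [48], (17) simplifies to:

$$L_{post}(x_i) \approx \min_{\mathbf{x} \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \ln P(\mathbf{x}) \right\} - \min_{\mathbf{x} \in \mathcal{X}_i^{+1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \ln P(\mathbf{x}) \right\} \qquad (19)$$

To relate the a priori probability $P(\mathbf{x})$ to $\mathbf{L}_a$, we follow the same work and assumptions as in [46] and [48]; with independent bits, $\ln P(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{L}_a$ up to a constant that cancels in (19). Thus, (19) can be written as:

$$L_{post}(x_i) \approx \min_{\mathbf{x} \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathbf{L}_a \right\} - \min_{\mathbf{x} \in \mathcal{X}_i^{+1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathbf{L}_a \right\} \qquad (20)$$

where $\mathbf{x} = [x_1, \ldots, x_{2N_t}]^T$ is the vector of all interleaved bits and $\mathbf{L}_a = [L_a(x_1), \ldots, L_a(x_{2N_t})]^T$ is the vector of a priori LLRs of all interleaved bits. Now let us focus on the first term on the right side of (20) and reformulate it as a QP problem:

$$\min_{\mathbf{x} \in \mathcal{X}_i^{-1}} \left\{ \frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \frac{1}{2}\mathbf{x}^T\mathbf{L}_a \right\} = \frac{4}{\sigma^2} \min_{\mathbf{z} \in \mathcal{Z}_i^{0}} \left\{ \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} + c \qquad (21)$$

where $\mathcal{Z} = \{0, 1\}^{2N_t}$, $\mathcal{Z}_i^{0} = \{\mathbf{z} \in \mathcal{Z} : z_i = 0\}$, $\mathbf{z} = (\mathbf{x} + \mathbf{1})/2$ with $\mathbf{1}$ the all-ones vector, $\mathbf{Q} = \mathbf{H}^T\mathbf{H}$, $\mathbf{b} = -\frac{1}{2}\mathbf{H}^T(\mathbf{y} + \mathbf{H}\mathbf{1}) - \frac{\sigma^2}{4}\mathbf{L}_a$, and $c$ is a constant that does not depend on $\mathbf{z}$. The result of (21) applies to both terms of (20), with $z_i = 0$ corresponding to $x_i = -1$ and $z_i = 1$ to $x_i = +1$. The constant $c$ cancels in the difference, and after relaxing the integer constraints, (20) becomes:

$$L_{post}(x_i) \approx \frac{4}{\sigma^2}\left[ \min_{\mathbf{0} \le \mathbf{z} \le \mathbf{1},\; z_i = 0} \left\{ \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} - \min_{\mathbf{0} \le \mathbf{z} \le \mathbf{1},\; z_i = 1} \left\{ \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right\} \right] \qquad (22)$$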
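As a sanity check on the chain from (16) to (22), the sketch below evaluates the max-log LLR (20) by exhaustive enumeration for a tiny real-valued QPSK system and compares it with the binary QP form obtained with $\mathbf{z} = (\mathbf{x}+\mathbf{1})/2$, $\mathbf{Q} = \mathbf{H}^T\mathbf{H}$, and $\mathbf{b} = -\frac{1}{2}\mathbf{H}^T(\mathbf{y}+\mathbf{H}\mathbf{1}) - \frac{\sigma^2}{4}\mathbf{L}_a$, scaled by $4/\sigma^2$. It assumes independent bits and noise variance $\sigma^2$ per real dimension; enumeration is only feasible for very small $2N_t$, so this is a reference check, not a detector. Function names are hypothetical.

```python
import itertools
import numpy as np

def maxlog_llr_bruteforce(H, y, sigma2, La):
    """Max-log a posteriori LLRs (20) by enumerating all x in {-1, +1}^n."""
    n = H.shape[1]
    best = {(i, s): np.inf for i in range(n) for s in (-1, 1)}
    for bits in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(bits)
        cost = np.sum((y - H @ x) ** 2) / (2 * sigma2) - 0.5 * x @ La
        for i in range(n):
            key = (i, int(x[i]))
            best[key] = min(best[key], cost)
    # L_post(x_i) ~ (min over x_i = -1) - (min over x_i = +1)
    return np.array([best[(i, -1)] - best[(i, 1)] for i in range(n)])

def maxlog_llr_binary_qp(H, y, sigma2, La):
    """Same LLRs from the binary QP form (before relaxing z to the box [0, 1])."""
    n = H.shape[1]
    Q = H.T @ H
    b = -0.5 * H.T @ (y + H @ np.ones(n)) - (sigma2 / 4) * La
    best = {(i, s): np.inf for i in range(n) for s in (0, 1)}
    for bits in itertools.product((0.0, 1.0), repeat=n):
        z = np.array(bits)
        f = 0.5 * z @ Q @ z + b @ z
        for i in range(n):
            key = (i, int(z[i]))
            best[key] = min(best[key], f)
    # z_i = 0 corresponds to x_i = -1; the factor 4/sigma^2 undoes the change of variables
    return (4 / sigma2) * np.array([best[(i, 0)] - best[(i, 1)] for i in range(n)])

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4))
x = np.array([1.0, -1.0, 1.0, 1.0])
sigma2 = 0.5
y = H @ x + np.sqrt(sigma2) * rng.standard_normal(4)
La = np.array([0.5, 0.0, -1.0, 2.0])
print(np.allclose(maxlog_llr_bruteforce(H, y, sigma2, La),
                  maxlog_llr_binary_qp(H, y, sigma2, La)))  # True
```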
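Equation (22) shows that evaluating the LLR of one bit requires solving two QP problems of dimension $2N_t - 1$ each. The LLR computations for all $2N_t$ bits therefore require solving a total of $4N_t$ QP problems, which is computationally heavy. Thus, as in [47], we first solve the following problem without any bit constraints:

$$\mathbf{z}^{*} = \mathcal{Q}\left[ \arg\min_{\mathbf{0} \le \mathbf{z} \le \mathbf{1}} \; \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \right] \qquad (23)$$

where $\mathcal{Q}[\cdot]$ denotes quantization of the relaxed solution to $\{0, 1\}^{2N_t}$, and second, we solve the same problem again $2N_t$ times with bit constraints as follows:

$$\min_{\mathbf{z}} \; \frac{1}{2}\mathbf{z}^T\mathbf{Q}\mathbf{z} + \mathbf{b}^T\mathbf{z} \quad \text{s.t.} \quad \mathbf{0} \le \mathbf{z} \le \mathbf{1}, \; z_i = \mathrm{xor}(z_i^{*}, 1), \quad i = 1, \ldots, 2N_t \qquad (24)$$

The cost function values resulting from the minimization problems in (23) and (24) are substituted back into (22) to find $L_{post}(x_i)$. This reduces the number of problems to be solved to $2N_t + 1$.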
As shown in [46], exchanging extrinsic information between the channel detector and the channel decoder is more effective in improving the performance of the turbo-equalization receiver. Thus, the required extrinsic information is calculated as follows:

$$L_E(x_i) = L_{post}(x_i) - L_a(x_i) \qquad (25)$$
Although the above technique may suit conventional small MIMO systems, where the QP size is small, it is not computationally efficient for large-scale MIMO systems. For instance, if $N_t = 64$ with QPSK modulation, 129 QP optimization problems must be solved to evaluate the LLRs of 128 bits using (23) and (24). Therefore, in this section, we propose a simple algorithm that solves only one optimization problem and then finds the neighborhood set of solutions of the vector $\mathbf{z}^{*}$ to compute the LLR of each bit. It can be summarized in the following steps:
1) Solve the QP problem (23) once to find $\mathbf{z}^{*}$.
2) Instead of solving problem (24) $2N_t$ times, construct the closest neighborhood solutions of $\mathbf{z}^{*}$, as described below.
3) Use the list of solution vectors provided by $\mathbf{z}^{*}$ and its neighborhood in (20) or (22) to compute $L_{post}(x_i)$.
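The three steps above can be sketched as follows for a real-valued QPSK model. Step 1 uses plain projected gradient descent on the box-relaxed QP as a simple stand-in for the interior-point solver, and the quantizer rounds the relaxed solution to $\{0, 1\}$; the candidate list is $\mathbf{z}^{*}$ plus its $2N_t$ one-bit-flip neighbors, and each LLR is the max-log difference (20) evaluated over that list only. Function names, the step-size rule, and the iteration count are assumptions.

```python
import numpy as np

def box_qp_projected_gradient(Q, b, iters=500):
    """Stand-in solver for min 0.5 z'Qz + b'z over 0 <= z <= 1."""
    step = 1.0 / np.linalg.norm(Q, 2)      # 1 / (largest eigenvalue of Q)
    z = np.full(len(b), 0.5)
    for _ in range(iters):
        z = np.clip(z - step * (Q @ z + b), 0.0, 1.0)
    return z

def neighborhood_llr(H, y, sigma2, La):
    """Approximate max-log LLRs from z* and its one-bit-flip neighbors."""
    n = H.shape[1]
    Q = H.T @ H
    b = -0.5 * H.T @ (y + H @ np.ones(n)) - (sigma2 / 4) * La
    z_star = np.round(box_qp_projected_gradient(Q, b))           # step 1: solve once, quantize
    flips = [np.where(np.arange(n) == j, 1 - z_star, z_star)      # step 2: 2Nt neighbors
             for j in range(n)]
    X = 2 * np.array([z_star] + flips) - 1                        # back to x in {-1, +1}
    cost = np.sum((y - X @ H.T) ** 2, axis=1) / (2 * sigma2) - 0.5 * X @ La
    L_post = np.empty(n)
    for i in range(n):                                            # step 3: max-log over the list
        L_post[i] = cost[X[:, i] < 0].min() - cost[X[:, i] > 0].min()
    return L_post

rng = np.random.default_rng(0)
H = rng.standard_normal((16, 8))
x = rng.choice([-1.0, 1.0], size=8)
y = H @ x + 0.1 * rng.standard_normal(16)
La = np.zeros(8)
L_post = neighborhood_llr(H, y, sigma2=0.01, La=La)
L_ext = L_post - La        # extrinsic LLRs, as in (25)
print(np.sign(L_post))     # hard decisions implied by the LLRs
```

For every bit, the list always contains one candidate with each value of $x_i$ ($\mathbf{z}^{*}$ and the neighbor that flips bit $i$), so both minima in (20) are defined while only one QP is solved.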
The neighborhood solutions can be constructed as follows. Let the alphabet set for QPSK modulation be $\{0, 1\}$, so that the symbol neighborhood of 0, $\mathcal{N}(0)$, is $\{1\}$, and $\mathcal{N}(1)$ is $\{0\}$. A vector neighbor of $\mathbf{z}^{*}$ is a vector that differs from $\mathbf{z}^{*}$ in exactly one coordinate; hence, there are $2N_t$ neighbor vectors of $\mathbf{z}^{*}$. Let the neighbor vectors be $\mathbf{Z}_{nb} = [\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(j)}, \ldots, \mathbf{z}^{(2N_t)}]$, where $\mathbf{z}^{(j)} = [z_1^{(j)}, \ldots, z_i^{(j)}, \ldots, z_{2N_t}^{(j)}]^T$, $i, j = 1, \ldots, 2N_t$, and

$$z_i^{(j)} = \begin{cases} z_i^{*} & \text{for } i \ne j \\ \mathcal{N}(z_i^{*}) & \text{for } i = j \end{cases} \qquad (26)$$

The simulations in this section use a soft-input soft-output rate-1/2 convolutional channel decoder based on the BCJR algorithm. On the transmit side, a convolutional encoder (rate R = 1/2, generator polynomials [133 171], and constraint length 7) is used with a random interleaver and a QPSK large-scale MIMO system with $N_t = N_r = 16$ and 64. The number of iterations is the number of times the soft-input soft-output MIMO detector and the soft-input soft-output channel decoder are used.

Fig. 13 shows the BER performance of three IDD iterations when the soft-input soft-output QP detector is used. As the number of iterations increases, a lower BER is obtained for both $N_t = 16$ and $N_t = 64$, although the difference in performance between $N_t = 16$ and $N_t = 64$ becomes visible only at higher iteration numbers, such as iteration 3. The uncoded and convolutionally coded cases are also plotted in the same figure to highlight the advantages of IDD at low SNR. The coded performance of 16×16 and 64×64 represents the case where a hard-decision QP detector is followed by a hard-decision Viterbi decoder. As expected, the performance difference between hard decision and soft decision (represented by IDD iteration 1) is about 2 dB. Note that in this figure, the large-system behavior between 16×16 and 64×64 can be observed in both the uncoded and coded cases; in IDD, however, it is observed only at higher iteration numbers.

The performance of our proposed technique for reducing LLR computations is shown in Fig. 14 for $N_t = 16$ and $N_t = 64$, where the LLR is computed from (23) together with the set of neighborhood solutions. This is compared to the case where the LLR is computed from multiple QP computations ((23) and (24)). When $N_t$ is relatively small, such as 16, the performance of the two techniques becomes very close as the number of iterations increases, as in the case of iteration 3 in Fig. 14a. For relatively large $N_t$, such as 64, the performance of the proposed technique is quite similar to that of the multiple-QP technique, and it becomes even slightly better at the third iteration, as shown in Fig. 14b. This may be due to the large-system effect, which appears more clearly at $N_t = 64$, because the proposed approach combines the QP technique with a LAS-like local search in computing the LLRs.

Fig. 13 BER performance of IDD using a QP detector (QPSK, rate-1/2 convolutional code; LLR computations are based on (23) and (24)).

VI. CONCLUSION
This paper proposes low-complexity detection algorithms suitable for large-scale MIMO with higher QAM modulations. The proposed algorithms are based on the QP detector. They improve the performance of the conventional QP detector with better trade-offs between complexity and performance. At high SNR and higher QAM modulations, the proposed algorithms outperform the LAS and RTS algorithms in both coded and uncoded BER performance. At large $N_t = N_r$, the 2QP algorithm is more suitable than BB(L,M) in terms of performance and complexity, while BB(L,M) provides better performance at relatively small $N_t = N_r$. This paper also demonstrated that QP-based detectors can be used for iterative detection and decoding with low complexity.
REFERENCES
[1] A. Elghariani and M. Zoltowski, “A quadratic programming-based detector for large-scale MIMO systems,” accepted in IEEE Wireless Communications and Networking Conference (WCNC), 2015.
[2] F. Boccardi, R. W. Heath Jr, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Communications Magazine, vol. 52, no. 2, pp. 74–80, 2014.
[3] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, no. 3, pp. 311–335, 1998.
[4] H. Bolcskei, “Principles of MIMO-OFDM wireless systems,” Chapter in CRC Handbook on Signal Processing for Communications, M. Ibnkahla, Ed., 2004.
[5] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40–60, 2013.
[6] J. Hoydis, K. Hosseini, S. t. Brink, and M. Debbah, “Making smart use of excess antennas: Massive MIMO, small cells, and TDD,” Bell Labs Technical Journal, vol. 18, no. 2, pp. 5–21, 2013.
[7] S. K. Mohammed, A. Zaki, A. Chockalingam, and B. S. Rajan, “High-rate space-time coded large-MIMO systems: Low-complexity detection and channel estimation,” IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 6, 2009.
[8] M. O. Damen, H. El Gamal, and G. Caire, “On maximum-likelihood detection and the search for the closest lattice point,” IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2389–2402, 2003.
[9] M. Hansen, B. Hassibi, A. G. Dimakis, and W. Xu, “Near-optimal detection in MIMO systems using Gibbs sampling,” in Global Telecommunications Conference (GLOBECOM). IEEE, 2009, pp. 1–6.
[10] P. W. Wolniansky, G. J. Foschini, G. Golden, and R. A. Valenzuela, “V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel,” in URSI International Symposium on Signals, Systems, and Electronics (ISSSE 98). IEEE, 1998, pp. 295–300.
[11] D. W. Waters and J. R. Barry, “The Chase family of detection algorithms for multiple-input multiple-output channels,” in Global Telecommunications Conference (GLOBECOM '04), vol. 4. IEEE, 2004, pp. 2635–2639.
Fig. 14 IDD BER performance with reduced LLR computation (QPSK; LLR using neighbor solutions vs. LLR using (23) and (24)): (a) 16×16, (b) 64×64.
[12] J. Yue, K. J. Kim, J. D. Gibson, and R. A. Iltis, “Channel estimation and data detection for MIMO-OFDM systems,” in Global Telecommunications Conference (GLOBECOM '03), vol. 2. IEEE, 2003, pp. 581–585.
[13] N. Srinidhi, S. K. Mohammed, A. Chockalingam, and B. S. Rajan, “Near-ML signal detection in large-dimension linear vector channels using reactive tabu search,” arXiv preprint arXiv:0911.4640, 2009.
[14] L. G. Barbero and J. S. Thompson, “Fixing the complexity of the sphere decoder for MIMO detection,” IEEE Trans. Wireless Communications, vol. 7, no. 6, pp. 2131–2142, 2008.
[15] K. Vishnu Vardhan, S. K. Mohammed, A. Chockalingam, and B. Sundar Rajan, “A low-complexity detector for large MIMO systems and multicarrier CDMA systems,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 3, 2008.
[16] P. Li and R. D. Murch, “Multiple output selection-LAS algorithm in large MIMO systems,” IEEE Communications Letters, vol. 14, no. 5, pp. 399–401, 2010.
[17] B. S. Rajan, S. K. Mohammed, A. Chockalingam, and N. Srinidhi, “Low-complexity near-ML decoding of large non-orthogonal STBCs using reactive tabu search,” in International Symposium on Information Theory. IEEE, 2009, pp. 1993–1997.
[18] T. Datta, N. Srinidhi, A. Chockalingam, and B. S. Rajan, “Random-restart reactive tabu search algorithm for detection in large-MIMO systems,” IEEE Communications Letters, vol. 14, no. 12, pp. 1107–1109, 2010.
[19] F. A. Bhatti, S. A. Khan, S. Ur Rehman, and F. Rasool, “MIMO OFDM signal detection using quadratic programming,” in 14th International Multitopic Conference (INMIC). IEEE, 2011, pp. 323–328.
[20] Y. Zhang, W. Lu, and T. Gulliver, “Integer QP relaxation based algorithms for ICI reduction in OFDM systems,” in Canadian Conference on Electrical and Computer Engineering (CCECE). IEEE, 2007, pp. 184–187.
[21] A. Mobasher, M. Taherzadeh, R. Sotirov, and A. K. Khandani, “A near-maximum-likelihood decoding algorithm for MIMO systems based on semi-definite programming,” IEEE Trans. Info. Theory, vol. 53, no. 11, pp. 3869–3886, 2007.
[22] Z.-q. Luo, W.-k. Ma, A.-C. So, Y. Ye, and S. Zhang, “Semidefinite relaxation of quadratic optimization problems,” IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 20–34, 2010.
[23] W.-K. Ma, C.-C. Su, J. Jalden, T.-H. Chang, and C.-Y. Chi, “The equivalence of semidefinite relaxation MIMO detectors for higher-order QAM,” IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 6, pp. 1038–1052, 2009.
[24] P. Li, R. C. de Lamare, and R. Fa, “Multiple feedback successive interference cancellation detection for multiuser MIMO systems,” IEEE Trans. on Wireless Communications, vol. 10, no. 8, pp. 2434–2439, 2011.
[25] Z. Li, Y. Cai, and M. Ni, “Low complexity MIMO detection based on branch and bound algorithm,” in IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). IEEE, 2007, pp. 1–5.
[26] A. Murugan, H. El Gamal, M. Damen, and G. Caire, “A unified framework for tree search decoding: Rediscovering the sequential decoder,” IEEE Trans. Info. Theory, vol. 52, no. 3, pp. 933–953, 2006.
[27] A. Elghariani and M. D. Zoltowski, “Branch and bound algorithm for code spread OFDM,” in Statistical Signal Processing Workshop (SSP). IEEE, 2012, pp. 844–847.
[28] A. Elghariani and M. Zoltowski, “Branch and bound with M algorithm for near optimal MIMO detection with higher order QAM constellation,” in Military Communications Conference (MILCOM). IEEE, 2012, pp. 1–5.
[29] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[30] C. V. Rao, S. J. Wright, and J. B. Rawlings, “Application of interior-point methods to model predictive control,” Journal of Optimization Theory and Applications, vol. 99, no. 3, pp. 723–757, 1998.
[31] J. Gondzio, “Interior point methods 25 years later,” EJOR, vol. 218, no. 3, pp. 587–601, 2012.
[32] E. Lawler and D. Wood, “Branch-and-bound methods: A survey,” Operations Research, pp. 699–719, 1966.
[33] W. Zhang, “Branch and bound search algorithm and their computational complexity,” USC/Information Sciences Institute, Tech. Rep., May 1996.
[34] L.-L. Yang, “Using multi-stage MMSE detection to approach optimum error performance in multiantenna MIMO systems,” in IEEE 70th Vehicular Technology Conference Fall (VTC 2009-Fall). IEEE, 2009, pp. 1–5.
[35] J. Clausen, “Branch and bound algorithms: Principles and examples,” Department of Computer Science, University of Copenhagen, pp. 1–30, 1999.
[36] T. Ibaraki, “Theoretical comparisons of search strategies in branch-and-bound algorithms,” International Journal of Parallel Programming, vol. 5, no. 4, pp. 315–344, 1976.
[37] J. Anderson and S. Mohan, “Sequential coding algorithms: A survey and cost analysis,” IEEE Transactions on Communications, vol. 32, no. 2, pp. 169–176, 1984.
[38] J. Zhang and K. Kim, “Near-capacity MIMO multiuser precoding with QRD-M algorithm,” in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers. IEEE, 2005, pp. 1498–1502.
[39] M. Lau, S. Yue, K. Ling, and J. Maciejowski, “A comparison of interior point and active set methods for FPGA implementation of model predictive control,” in Proc. European Control Conference, 2009, pp. 156–160.
[40] T. Datta, N. Srinidhi, A. Chockalingam, and B. S. Rajan, “A hybrid RTS-BP algorithm for improved detection of large-MIMO M-QAM signals,” in National Conference on Communications (NCC).
[41]
[42]
[43]
[44]
[45]
[46]
[47]
[48]
P. Svac, F. Meyer, E. Riegler, and F. Hlawatsch, Soft-heuristic detectors
for large mimo systems, IEEE Trans. Signal Processing, vol. 61, pp.
45734586, 2013.
Y. S. Cho, J. Kim, W. Y. Yang, and C. G. Kang, MIMO-OFDM wireless
communications with MATLAB. JW&S, 2010.
H. Ozcelik,
M. Herdin, W. Weichselberger, J. Wallace, and E. Bonek,
Deficiencies of kroneckermimo radio channel model, Electronics
Letters, vol. 39, no. 16, pp. 12091210, 2003.
J. W. Wallace and M. A. Jensen, Modeling the indoor mimo wireless
channel, IEEE Transactions on Antennas and Propagation, vol. 50,
no. 5, pp. 591599, 2002.
M. A. Jensen and J. W. Wallace, A review of antennas and propagation
for mimo wireless communications, IEEE Transactions on Antennas
and Propagation, vol. 52, no. 11, pp. 28102824, 2004.
M. Tuchler, R. Koetter, and A. C. Singer, Turbo equalization: principles
and new results, IEEE Trans. Communications, vol. 50, no. 5, pp. 754
767, 2002.
B. Steingrimsson, Z.-Q. Luo, and K. M. Wong, Soft quasi-maximumlikelihood detection for multiple-antenna wireless channels, IEEE
Transactions on Signal Processing, vol. 51, no. 11, pp. 27102719, 2003.
B. M. Hochwald and S. Ten Brink, Achieving near-capacity on
a multiple-antenna channel, IEEE Transactions on Communications,
vol. 51, no. 3, pp. 389399, 2003.
Ali Elghariani received both the B.S. and M.S. degrees in Electrical and Electronic Engineering from the University of Tripoli in 1999 and 2008, respectively, and the Ph.D. in Communications, Networking, and Signal Processing from the School of Electrical and Computer Engineering at Purdue University, West Lafayette, in 2014. He worked in industry for several years before starting his Ph.D. Currently he is a lecturer at the Department of Electrical and Electronic Engineering, University of Tripoli, Libya. During 2013 he was a system engineer intern with Qualcomm, Inc. in San Diego. He received the IEEE MILCOM conference travel grant award in 2012. His current research interests are signal detection and channel estimation in large-scale MIMO systems, symbol spreading OFDM systems, turbo equalization, and the application of quadratic programming optimization techniques in wireless communications.
Michael Zoltowski received both the B.S. and M.S. degrees in Electrical Engineering with highest honors from Drexel University in 1983 and the Ph.D. in Systems Engineering from the University of Pennsylvania in 1986. In Fall 1986, he joined the faculty of Purdue University, where he currently holds an Endowed Chaired Professorship in Electrical and Computer Engineering. In this capacity, he was the recipient of the Ruth and Joel Spira Outstanding Teacher Award for 1990-1991, the 2001-2002 Wilfred Hesselberth Award for Teaching Excellence, and the Engineering Distance Education Award for 2012. In 2001, he was named a University
Faculty Scholar by Purdue University. On 25 September 2008, he became
the Thomas J. and Wendy Engibous Professor of Electrical and Computer
Engineering, an Endowed Chair conferred by the Board of Trustees of Purdue
University. Prof. Zoltowski is a co-recipient of a 2014 IEEE Globecom Best
Paper Award and a 21st Humantech Paper Award: Silver Prize sponsored
by Samsung. He is also the recipient of a 2002 Technical Achievement
Award from the IEEE Signal Processing Society. In addition, he served as
a 2003 Distinguished Lecturer for the IEEE Signal Processing Society. He
is a Fellow of IEEE. He is a recipient of the 2006 Distinguished Alumni
Award from Drexel University. Prof. Zoltowski is a co-recipient of the IEEE
Communications Society 2001 Leonard G. Abraham Prize Paper Award in
the Field of Communications Systems. He is also the recipient of the IEEE
Signal Processing Society's 1991 Paper Award, the Fred Ellersick MILCOM
Award for Best Paper in the Unclassified Technical Program at the 1998
IEEE Military Communications Conference, and a Best Paper Award at the
2000 IEEE International Symposium on Spread Spectrum Techniques and
Applications. In addition, from 1998 to 2001, Dr. Zoltowski served as an
elected Member-at-Large of the Board of Governors and Secretary of the
IEEE Signal Processing Society. From 2003-2005, he served on the Awards
Board of the IEEE Signal Processing Society and also served as the Area
Editor in charge of Feature Articles for the IEEE Signal Processing Magazine.
Within the IEEE Signal Processing Society, he has been a member of the
Technical Committee for the Statistical Signal and Array Processing Area,
the Technical Committee for DSP Education and the Technical Committee
on Signal Processing for Communications (SPCOM). From 2003-2004, he
served as Vice-Chair of the Technical Committee on Sensor and Multichannel
(SAM) Processing, and served as Chair for 2005-2006. He has served as an
Associate Editor for both the IEEE Transactions on Signal Processing and the
IEEE Communications Letters. He was Technical Chair for the 2006 IEEE
Sensor Array and Multichannel Workshop. He served as Vice-President for
Awards & Membership for the IEEE Signal Processing Society, 2008-2010.

