
Genetic and Greedy User Scheduling for Multiuser MIMO Systems with Successive Zero-Forcing


Robert C. Elliott¹, Shreeram Sigdel¹, Witold A. Krzymien¹, Mazin Al-Shalash² and Anthony C. K. Soong²
¹ University of Alberta / TRLabs, Edmonton, Canada
² Huawei Technologies, Plano, TX, USA
Abstract: In this paper we consider efficient and low complexity scheduling algorithms for multiuser multiple-input multiple-output (MIMO) systems. The optimal user scheduling involves an
exhaustive search, which becomes very complex for realistic numbers of users and transmit antennas. Among various suboptimal
but low complexity algorithms, greedy algorithms with heuristic
scheduling metrics have been shown to achieve performance close
to an exhaustive search. Meanwhile, genetic algorithms (GAs)
are a rapid, though suboptimal, option of performing a utility
(in this case scheduling) metric optimization. In this paper, we
propose and analyze the performance and complexity of greedy
and genetic scheduling algorithms for multiuser MIMO systems
with successive zero-forcing precoding. We demonstrate that at
lower K, the genetic algorithm performs better than the greedy
algorithm, where K denotes the total number of users requesting
service. For large K, however, the greedy algorithm outperforms
the genetic algorithm. The greedy algorithm also achieves similar
sum-rate growth (with K) as the exhaustive search. A detailed
complexity analysis shows that the order of complexity of the
genetic algorithm is higher than that of the greedy algorithm by
a factor equal to $K_0^2$, where $K_0$ denotes the maximum number
of simultaneously served multiple-antenna users. Both algorithms
achieve a sum rate very close to the exhaustive search with much
less complexity for a small number of transmit antennas.

I. INTRODUCTION
Efficient user scheduling methods for linearly precoded
multiuser MIMO channels with multiple antennas at the base
station and multiple antennas at each user have captured much
research attention recently. It is known that the capacity of
a multiuser MIMO broadcast channel (BC) can be achieved
through the use of dirty paper coding (DPC) [1], [2]. However,
DPC is highly nonlinear and very complex to implement in
practice. Therefore, reduced complexity precoding methods
are of interest to reduce the effect of multiuser interference (MUI). Such methods include zero-forcing beamforming
(ZFB) [3] for systems with single-antenna users, and block
diagonalization (BD) [4] and successive zero-forcing (SZF)
[5] for systems with multiple-antenna users.
In particular, BD is a technique that completely nulls the
interference between users. However, this nulling operation
imposes a constraint that the total number of receive antennas
be no larger than the number of transmit antennas. This also
yields a reduction in the performance and number of users that
can be served relative to DPC. In many situations, a complete
removal of MUI is not beneficial. SZF is one example which
does not completely null the MUI. In [5] it is shown that with
multiple-antenna users, the achievable throughput of SZF is
higher than that of BD in several cases. Additionally, SZF can
relax the transmit/receive antenna constraints and sometimes
serve a higher number of users simultaneously compared to
BD.

Our work made use of the infrastructure and computational resources
of AICT (Academic Information and Communication Technologies) at the
University of Alberta. The authors also gratefully acknowledge funding
for this research provided by TRLabs, Huawei Technologies, the Rohit
Sharma Professorship, the Natural Sciences and Engineering Research Council
(NSERC) of Canada, the Alberta Informatics Circle of Research Excellence
(iCORE), and the Alberta Ingenuity Fund.

In multiuser systems, typically there are very many users
that request service simultaneously. The above-mentioned
antenna constraints, as well as a transmit power constraint,
make user scheduling necessary, since not all users requesting service can be served simultaneously. The optimal
scheduling method is combinatorially complex and involves
an exhaustive search over the user pool in order to find
the selection that maximizes some sort of utility function.
Additionally, for successive encoding methods such as SZF,
the user encoding order also affects the performance of the
system, further complicating the choice. Thus, various heuristic scheduling methods have been proposed to simplify the
scheduling process [3], [6], [7], which perform close to the
optimal exhaustive search.
In this paper, we propose and analyze the performance and
complexity of greedy and genetic user scheduling algorithms
to maximize the system throughput in a multiuser MIMO
system with multiple-antenna receivers, in the context of SZF.
The first is a low complexity greedy heuristic algorithm for
user selection and ordering, as proposed in [8]. The second
method involves the use of genetic algorithms (GAs) [9] for
utility function maximization. GAs are known for achieving
very good solutions to optimization problems very quickly.
GAs for scheduling remove almost all of the complexity from
the user selection process and move it to the utility function
calculation. We compare the performance of a GA similar to
that described in [10], adapted for use with SZF.
The remainder of this paper is organized as follows. Section
II describes the system model and the pertinent details of SZF.
Section III describes the proposed greedy and genetic scheduling algorithms. Section IV provides a novel and extensive
analysis of the algorithms' complexity. Simulation results are
provided in Section V, and concluding remarks are given in
Section VI.
II. SYSTEM MODEL
We consider the downlink of a multiuser MIMO system
with $M_T$ transmit antennas and $N_k$ receive antennas at
each of the $K$ multiple-antenna users requesting service. Let
$H_k \in \mathbb{C}^{N_k \times M_T}$ denote the downlink channel matrix of the
kth user, $k = 1, 2, ..., K$. We assume a spatially uncorrelated
flat Rayleigh fading channel model, i.e. the elements of $H_k$
are independent and identically distributed complex Gaussian
random variables with variance of 0.5 per dimension. We
assume that the transmitter has perfect knowledge of the
channel state information of all users (perfect CSIT), and
each user knows its channel perfectly. The data vector of
user $k$, $s_k \in \mathbb{C}^{N_k \times 1}$, is preprocessed at the transmitter with
the beamforming matrix $W_k \in \mathbb{C}^{M_T \times N_k}$ to produce the
transmitted signal vector $x_k \in \mathbb{C}^{M_T \times 1}$. The $N_k \times 1$ received
signal vector of the kth user can be expressed as

$$y_k = H_k \sum_{j=1}^{K} W_j s_j + n_k \tag{1}$$

where $n_k \in \mathbb{C}^{N_k \times 1}$ denotes zero mean additive white Gaussian
noise with $E\{n_k n_k^H\} = \sigma_n^2 I_{N_k}$.

978-1-4244-5626-0/09/$26.00 © 2009 IEEE

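The channel model above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not code from the paper: the function names `rayleigh_channel` and `frobenius_sq` are hypothetical, and matrices are stored as plain lists of rows so that only the standard library is needed.

```python
import math
import random

def rayleigh_channel(n_rx, n_tx, rng):
    """N_k x M_T matrix of i.i.d. complex Gaussian entries,
    with variance 0.5 per real dimension (unit total variance)."""
    s = math.sqrt(0.5)
    return [[complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n_tx)]
            for _ in range(n_rx)]

def frobenius_sq(H):
    """Squared Frobenius norm: sum of squared magnitudes of all entries."""
    return sum(abs(h) ** 2 for row in H for h in row)

rng = random.Random(1)            # fixed seed so the draw is repeatable
H = rayleigh_channel(2, 4, rng)   # N_k = 2 receive, M_T = 4 transmit antennas
print(len(H), len(H[0]), frobenius_sq(H))
```

With unit-variance entries, the expected squared F-norm of such a channel is $N_k M_T$, which is the quantity the greedy algorithm of Section III uses to pick its first user.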
Block diagonalization (BD) designs $W_k$ to pre-eliminate
multiuser interference such that $H_k W_j = 0$, $\forall k \neq j$ and
$1 \leq (j, k) \leq K$. This decomposes the multiuser channel into
equivalent single user channels, and the received signal vector
(1) becomes $y_k = H_k W_k s_k + n_k$. In comparison, successive zero-forcing (SZF) precoding does not completely pre-eliminate the multiuser interference.
Due to the successive nature of SZF, the user precoding
order is important for sum-rate maximization. For a given set
of users with encoding order $\pi$, for each user $j \in \{1, ..., K\}$
the received signal can be expressed as [5]

$$y_{\pi(j)} = H_{\pi(j)} \Big( W_{\pi(j)} s_{\pi(j)} + \sum_{i<j} W_{\pi(i)} s_{\pi(i)} + \sum_{i>j} W_{\pi(i)} s_{\pi(i)} \Big) + n_{\pi(j)} \tag{2}$$


The precoding matrix $W_{\pi(j)}$ is designed such that it lies in
the null space of the aggregate channel $\bar{H}_{j-1}$ of the $j-1$
previously precoded users' channels:

$$\bar{H}_{j-1} = [H_{\pi(1)}^T \; H_{\pi(2)}^T \; \ldots \; H_{\pi(j-1)}^T]^T \tag{3}$$

SZF of $K$ users' channels is possible$^1$ if $M_T > \mathrm{rank}(\bar{H}_{K-1})$.
Let us denote the SVD of (3) as

$$\bar{H}_{j-1} = \bar{U}_{j-1} \bar{\Sigma}_{j-1} \bar{V}_{j-1}^H = \bar{U}_{j-1} \bar{\Sigma}_{j-1} [\bar{V}_{j-1}^1 \; \bar{V}_{j-1}^0]^H \tag{4}$$

where $\bar{V}_{j-1} \in \mathbb{C}^{M_T \times M_T}$, and $\bar{V}_{j-1}^0$ holds the $M_T - \mathrm{rank}(\bar{H}_{j-1})$
right column vectors defining the null-space basis of $\bar{H}_{j-1}$.
The precoding matrix of the jth user $W_{\pi(j)}$ is constrained to
lie in the subspace defined by $\bar{V}_{j-1}^0$. Hence, the third term in
(2) is canceled by the subspace constraint on the design of the
precoding matrices for users $i > j$. Then, (2) reduces to

$$y_{\pi(j)} = H_{\pi(j)} \Big( W_{\pi(j)} s_{\pi(j)} + \sum_{i<j} W_{\pi(i)} s_{\pi(i)} \Big) + n_{\pi(j)} \tag{5}$$


Assuming the transmitted signal vectors to be Gaussian
distributed [4], [5], for a given set of ordered users the
maximum achievable rate of each user is given by

$$R_{\pi(j)} = \log_2 \frac{\left| I + H_{\pi(j)} \Big( \sum_{i=1}^{j} \bar{V}_{i-1}^0 Q_{\pi(i)} (\bar{V}_{i-1}^0)^H \Big) H_{\pi(j)}^H \right|}{\left| I + H_{\pi(j)} \Big( \sum_{i=1}^{j-1} \bar{V}_{i-1}^0 Q_{\pi(i)} (\bar{V}_{i-1}^0)^H \Big) H_{\pi(j)}^H \right|} \tag{6}$$

$^1$ The more general case of $K \leq M_T$ can be handled by coordinated beamforming, with joint design of the transmit precoding and receive processing
matrices, which entails additional complexity and is beyond the scope of this
paper.


where the precoder input covariance matrices $Q_{\pi(i)}$ of the
users are defined such that $W_{\pi(i)} W_{\pi(i)}^H = \bar{V}_{i-1}^0 Q_{\pi(i)} (\bar{V}_{i-1}^0)^H$.
The achievable sum rate of SZF precoding for a given user
order $\pi_i$ is

$$R_{SZF}^{\pi_i} = \max_{\{Q_j\}_{j \in \{1,...,K\}} :\; Q_j \succeq 0,\; \sum_j \mathrm{Tr}(Q_j) \leq P} \;\; \sum_{j=1}^{K} R_{\pi(j)} \tag{7}$$

The maximum achievable sum rate $R_{SZF}$ of SZF precoding
is then obtained by maximizing (7) over all $K!$ possible user
orders:

$$R_{SZF} = \max_{\pi_i,\; i=1,2,...,K!} R_{SZF}^{\pi_i} \tag{8}$$


In general, solving (8) to come up with optimal covariance
matrices is a complex problem. For a given user order $\pi_i$, the
authors in [5] have proposed a DPC-based numerical technique
to solve (7). It uses the sum-power iterative waterfilling
algorithm proposed in [11] for the dual multiple access channel
(MAC), and the MAC to BC covariance transformation in
[2]. Once the optimal BC covariance matrices for DPC are
obtained from [11] and [2], transformation of those matrices
to SZF is performed. Note that this transformation provides
the channel input covariance matrices, which include the $\bar{V}_{i-1}^0$
seen in (6).
On the other hand, incorporating user ordering is still a
difficult problem as it requires maximization of the achievable
sum rate over $\sum_{i=1}^{K} i! \binom{K}{i}$ possible different ordered subsets
of users. This makes the problem computationally even more
complex. In this paper, we use the algorithm proposed in [5]
to obtain the covariance matrices for SZF. Our objective in
this paper, however, is to propose and analyze simplified user
scheduling algorithms, where we assume that $K \gg M_T$. The
next section provides the details of the proposed algorithms.
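To give a sense of scale for the ordered-subset count used in this section, the following worked evaluation is our own arithmetic for a deliberately tiny pool of $K = 4$ users; it is an illustration and not a figure taken from the paper.

```latex
\sum_{i=1}^{4} i!\binom{4}{i}
  = 1!\binom{4}{1} + 2!\binom{4}{2} + 3!\binom{4}{3} + 4!\binom{4}{4}
  = 4 + 12 + 24 + 24
  = 64
```

Even four users already give 64 ordered subsets, each of which would need its own covariance optimization, which is why the count becomes impractical for realistic user pools.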
III. USER SCHEDULING ALGORITHMS
The optimal user scheduling requires an exhaustive search
through all possible combinations of subsets of simultaneously
served users and is thus computationally very complex. Hence,
the main objective of this paper is to investigate and analyze
the performance and complexity of reduced-complexity greedy
and genetic scheduling algorithms for SZF assuming perfect
CSIT. The proposed algorithms are outlined in the following.
A. Greedy Algorithm
Let $\mathcal{U} = \{1, 2, ..., K\}$ denote the set of all users requesting
service, and $\mathcal{U}_l \subset \mathcal{U}$ denote a possible subset of users
such that $|\mathcal{U}_l| \leq K_0$, $l = 1, 2, ..., \binom{K}{K_0}$, where $|\mathcal{U}_l|$ denotes the
cardinality of set $\mathcal{U}_l$; $K_0$ denotes the maximum number of
users that can be served simultaneously. Denote the selected
user subset as $\mathcal{U}_s$. In this paper, a simple Frobenius-norm (F-norm) based heuristic scheduling metric is designed as in [8]
for the sum-rate maximization objective. Further simplification
of the greedy scheduling algorithm is obtained by using an
intermediate user grouping technique as in [6]. Considering
(6), the proposed scheduling metric maximizes the signal-to-approximate-interference (from the previously selected users)
ratio to maximize the rate of a selected user. Once the users
to be served are selected using the proposed algorithm, the
SZF technique described in Section II is used to optimize the
channel input covariance matrices for the selected users. The
proposed algorithm is described in Table I.

TABLE I
SIMPLIFIED GREEDY USER SCHEDULING ALGORITHM FOR SZF

1. $i \leftarrow 1$; $\mathcal{U} = \{1, 2, ..., K\}$; $\mathcal{U}_s = \{\}$.
   Select a user $u_1$ such that $u_1 = \arg\max_{k \in \mathcal{U}} \|H_k\|_F^2$.
   Set $\mathcal{U}_s = \mathcal{U}_s \cup \{u_1\}$; $\mathcal{U}_1 = \mathcal{U} \setminus \{u_1\}$.
2. $i \leftarrow i + 1$.
   Define $\bar{H}(\mathcal{U}_s) = [H_{u_1}^T \; H_{u_2}^T \; ... \; H_{u_{i-1}}^T]^T = \bar{U}_i \bar{\Sigma}_i [\bar{V}_i^1 \; \bar{V}_i^0]^H$.
3. [a.] If ($|\mathcal{U}_s| < K_0$), intermediate user grouping:
   Find $\mathcal{U}_i = \left\{ k \in \mathcal{U}_{i-1},\, k \notin \mathcal{U}_s \;\middle|\; \frac{\|H_k \bar{V}_i^1\|_F}{\|H_k\|_F \|\bar{V}_i^1\|_F} < \alpha \right\}$.
   [b.] If ($|\mathcal{U}_i| \neq 0$), select a user such that
   $u_i = \arg\max_{k \in \mathcal{U}_i} \|H_k \bar{V}_i^0\|_F^2$ if $i = 2$,
   $u_i = \arg\max_{k \in \mathcal{U}_i} \dfrac{\|H_k \bar{V}_i^0\|_F^2}{\sum_{j=2}^{i-1} \|H_k \bar{V}_j^0\|_F^2}$ otherwise.
   Set $\mathcal{U}_s = \mathcal{U}_s \cup \{u_i\}$; $\mathcal{U}_i = \mathcal{U} \setminus \{u_i\}$; Go to Step 2.
   Else exit.

The algorithm starts by selecting a user with the maximum
F-norm of its channel. In Step 3a, an intermediate user
grouping is performed based on a specified threshold $\alpha$ to find
a subset $\mathcal{U}_i$. If the subset $\mathcal{U}_i$ is not empty, the next best user is
selected from that subset. Otherwise, the algorithm terminates.
The intermediate user grouping technique significantly reduces
the complexity of the regular greedy search algorithm by
limiting the search to $\mathcal{U}_i$, as $|\mathcal{U}_i| \ll K - i$. To avoid loss
in sum rate, the selection of $\alpha$ becomes crucial. A detailed
discussion on the effect of the threshold $\alpha$ is included in
[6], [8]. The user selection metric in Step 3b comprises
two components: a) the squared F-norm of the kth user's
projected channel (projected on the null space of the aggregate
channel of the previously selected users), and b) the sum of the
squared F-norms of the product of the kth user's channel and
null-space bases ($\bar{V}_2^0$ to $\bar{V}_{i-1}^0$) of the prior aggregate channels
($[H_{u_1}^T]^T$ to $[H_{u_1}^T, ..., H_{u_{i-1}}^T]^T$). The second component gives an
approximation of the interference from the previously selected
users to user $k \in \mathcal{U}_i$. The exact interference is known only
after the covariance matrices of the selected subset users are
optimized. Note that for the selection of the second user $u_2$, the
interference approximation cannot be made as the precoding
matrix of the first user does not involve a previously selected
user. Hence, the selection is done based only on the gain of the
null-space projected channel. The algorithm iterates through a
maximum of $K_0 - 1$ steps.
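The selection loop described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation, and it makes three assumptions of its own. First, it replaces the SVD by Gram-Schmidt orthogonalization of the selected users' channel rows, which spans the same row space. Second, it uses the identity that the energy in the null space equals the total energy minus the energy in the row space, which holds because the two bases together form a unitary matrix. Third, it takes the candidate pool for the next step to be the current group minus the selected user, as the complexity analysis in Section IV suggests. All function names are hypothetical, and channels are assumed to be nonzero.

```python
import math

def fro2(H):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(abs(x) ** 2 for row in H for x in row)

def inner(u, v):
    """Inner product <u, v> = sum_m u_m * conj(v_m)."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def extend_basis(basis, H, tol=1e-9):
    """Gram-Schmidt step: append the rows of H to an orthonormal row-space basis."""
    basis = list(basis)
    for row in H:
        v = list(row)
        for b in basis:
            c = inner(v, b)
            v = [x - c * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        if norm > tol:
            basis.append([x / norm for x in v])
    return basis

def proj2(H, basis):
    """Energy of H inside the span of the selected users' channels."""
    return sum(abs(inner(row, b)) ** 2 for row in H for b in basis)

def greedy_schedule(channels, k0, alpha):
    """Return scheduled user indices (0-based) in selection order."""
    norms = [fro2(H) for H in channels]
    first = max(range(len(channels)), key=lambda k: norms[k])    # Step 1
    selected, basis = [first], []
    candidates = [k for k in range(len(channels)) if k != first]
    denom = {k: 0.0 for k in candidates}    # running sums of past null-space gains
    while len(selected) < k0:
        basis = extend_basis(basis, channels[selected[-1]])      # Step 2
        inside = {k: proj2(channels[k], basis) for k in candidates}
        # Step 3a: keep users whose normalized correlation is below alpha
        group = [k for k in candidates
                 if math.sqrt(inside[k] / (norms[k] * len(basis))) < alpha]
        if not group:
            break
        gain = {k: norms[k] - inside[k] for k in group}          # null-space energy
        if len(selected) == 1:                                   # Step 3b, i = 2
            best = max(group, key=lambda k: gain[k])
        else:                                                    # Step 3b, i > 2
            best = max(group, key=lambda k: gain[k] / max(denom[k], 1e-12))
        for k in group:
            denom[k] += gain[k]
        selected.append(best)
        candidates = [k for k in group if k != best]
    return selected
```

The returned list doubles as the encoding order, since the users are precoded in the order in which they are selected.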
For SZF, the encoding order of the selected users $\mathcal{U}_s =
\{u_1, u_2, ..., u_{K_0}\}$ is important for sum-rate maximization. Optimal ordering involves the precoding matrix optimization proposed in [5], but in this case it is $K_0!$ times more complex. We
consider a simplified approach for user ordering according to
the lemmas proposed in our earlier work [8]. The proposition
in [8] is that the users can be successively precoded in the
order that our proposed algorithm selects them. Even though
suboptimal, our results show the performance of this ordering
is close to the optimal order from the exhaustive search.

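Fig. 1. Example of GA chromosomes for scheduling with $K_0 = 2$ and $K = 10$
in an SZF system, and typical breeding process during one generation. (a)
Two typical chromosomes, showing scheduling and ordering of users {1, 9}
and {4, 2}, respectively. Random crossover point also shown. (b) Crossover
operation. (c) Mutation. (d) Repair of invalid chromosomes.
Chromosome layout: scheduled users || encoding order
(a) 10000 00010 || 01     01010 00000 || 10
(b) 10000 00000 || 10     01010 00010 || 01
(c) 10000 00100 || 10     11010 00010 || 00
(d) 10000 00100 || 10     10010 00000 || 10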

B. Genetic Algorithm
Genetic algorithms to some degree mimic breeding in biological systems. Potential solutions to an optimization problem
are encoded in a set, or population, of data structures known
as chromosomes. The chromosomes crossbreed, mutate, and
evolve towards the optimal solution over several iterations, or
generations, of the algorithm. The "most fit" chromosomes,
as defined by the value of the utility function for the solution
they represent, are the most likely to pass on their solution
parameters to the next generation. In the case of scheduling,
those parameters include which users to schedule and the order
in which to encode the users' data. The utility function for
user scheduling with the GA in this work is the achievable
sum rate of SZF (7). The operation of the genetic algorithm
is described briefly below. More details of the operation are
included in [10].
Initialization: A set of $N_p$ chromosomes is initialized at
random. Each chromosome consists of two parts: the head
of the chromosome is a $K$-bit vector that indicates which
users are scheduled, and the tail contains $K_0 \lceil \log_2(K_0) \rceil$
bits indicating the encoding order of the scheduled users.
A 1 in position $k$ of the head denotes that user $k$ is to be
scheduled, and a 0 that it is not; the head is constrained
to have between 1 and $K_0$ 1s, since at most $K_0$ users can
be scheduled simultaneously. The nth group of $\lceil \log_2(K_0) \rceil$
bits in the tail denotes the relative encoding order of the nth
scheduled user (i.e. the nth 1 in the head); each of these
groups must have a unique value.

Selection: Two chromosomes are selected from the population with probability $p_i = G_i / (\sum_n G_n)$, where $G_i$ is the utility function value of the solution represented by chromosome
$i$, i.e. its fitness. These chromosomes are known as parents.

Breeding: A uniformly random position is defined within
the chromosome, and then the two selected parents swap all
bits after that point to form two child chromosomes. This
crossover operation occurs with probability $p_c = 1$. Next, the
children undergo the mutation operation, for which each bit
in the children has a $p_m = 1/(\beta_1 + \beta_2 \sigma_G / \mu_G)$ probability of
being toggled, where $\mu_G$ and $\sigma_G$ are the mean and standard
deviation of the current population's fitness before selection,
and $\beta_1$ and $\beta_2$ can be chosen anywhere on the line segment
$\beta_1 + 0.15\beta_2 = (K K_0 / 7.5773)^{1.2071}$, $\beta_1 \geq 1.1$, $\beta_2 \geq 3$, as
adapted from [12]. Finally, if the child chromosomes represent solutions that violate the constraints (i.e. too many/few
users scheduled, or non-unique encoding order values), the
chromosome is corrected to meet the constraints. Generally,
1s are toggled at random in the head until the number of
1s is reduced to $K_0$, or a single 0 is randomly toggled
to a 1 if there are no 1s. Non-unique order values are set
randomly to a unique value.

Iteration: The process of selection and breeding is repeated
until a new set of $N_p$ chromosomes is created. This new
set then replaces the old population. The whole process then
repeats for a total of $N_g$ generations.
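The chromosome handling in the steps above can be sketched as below. This is a minimal illustration under our own assumptions, not the authors' code: chromosomes are bit strings with the $K$-bit head followed by the order tail, all function names are hypothetical, one order bit is used when $K_0 = 1$, and the repair step re-draws duplicate order values from the unused ones.

```python
import math
import random

def order_bits(k0):
    """Bits per encoding-order group; at least 1 (our assumption for k0 = 1)."""
    return max(1, math.ceil(math.log2(k0)))

def decode(chrom, K, k0):
    """Return the scheduled users (1-based) in encoding order."""
    b = order_bits(k0)
    head, tail = chrom[:K], chrom[K:]
    users = [k + 1 for k, bit in enumerate(head) if bit == "1"]
    order = [int(tail[n * b:(n + 1) * b], 2) for n in range(len(users))]
    return [u for _, u in sorted(zip(order, users))]

def select_parents(population, fitness, rng):
    """Fitness-proportional selection of two parents."""
    return rng.choices(population, weights=fitness, k=2)

def crossover(a, b, point):
    """Swap all bits after the crossover point."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, p_m, rng):
    """Toggle each bit independently with probability p_m."""
    return "".join(("1" if c == "0" else "0") if rng.random() < p_m else c
                   for c in chrom)

def repair(chrom, K, k0, rng):
    """Enforce 1..k0 scheduled users and unique encoding-order values."""
    b = order_bits(k0)
    head, tail = list(chrom[:K]), chrom[K:]
    ones = [i for i, bit in enumerate(head) if bit == "1"]
    while len(ones) > k0:                        # too many users: clear random 1s
        head[ones.pop(rng.randrange(len(ones)))] = "0"
    if not ones:                                 # no users: set one random bit
        head[rng.randrange(K)] = "1"
    vals = [int(tail[g * b:(g + 1) * b], 2) for g in range(k0)]
    free = [v for v in range(2 ** b) if v not in vals]
    rng.shuffle(free)
    seen = set()
    for g, v in enumerate(vals):                 # duplicates get an unused value
        if v in seen:
            vals[g] = free.pop()
        seen.add(vals[g])
    return "".join(head) + "".join(format(v, "0{}b".format(b)) for v in vals)

print(decode("100000001001", 10, 2))   # users {1, 9}, as in Fig. 1(a)
print(decode("010100000010", 10, 2))   # users {4, 2}, as in Fig. 1(a)
```

The fitness passed to `select_parents` would be the sum rate (7) of the decoded user list, which is where nearly all of the computation lies.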
Our GA also employs elitism, where the best chromosome
$C^*$ of the past generation is kept in the new one. During
each generation, $N_p - 2$ chromosomes are created through the
selection and breeding processes, one is inserted as a copy
of $C^*$, and one is an additional copy of $C^*$, except that the
encoding order of two users is swapped at random. Fig. 1
shows an example of typical chromosomes and the operation
of the proposed genetic scheduling algorithm.
IV. COMPLEXITY ANALYSIS
In this section, we compare the complexity of the greedy and
genetic algorithms in terms of the number of flops required. A
flop is a real-value floating point operation; an addition, multiplication, or division are each 1 flop. A complex-value addition
and multiplication take 2 and 6 flops, respectively. In general,
most matrix operations require about an equal number of
multiplications and additions. Thus, we assume that complex-valued operations need 4 times the flops as the real-valued
ones. For the analysis, we assume $N_k = N, \forall k$, $K_0 = \lceil M_T/N \rceil$,
and that the algorithms schedule the maximum of $K_0$ users.
Since $K_0 = \lceil M_T/N \rceil = M_T/N + \epsilon$, $0 \leq \epsilon < 1$, $K_0$ grows on the same
order as $M_T/N$.
A. Complexity of Various Matrix Operations
For an $m \times n$ complex-valued matrix $A \in \mathbb{C}^{m \times n}$, we list
the complexity of various matrix operations required for our
proposed scheduling algorithms.
Multiplying an $m \times n$ matrix by an $n \times p$ matrix requires
$8mnp$ flops [13].
The F-norm $\|A\|_F^2$ requires a total of $4mn$ flops [7].
The pth root $A^{1/p}$ or inverse pth root $A^{-1/p}$ of an $n \times n$
matrix both require about $(112 + \frac{4}{3}(p-1))n^3$ flops [14]. In
particular, the (inverse) square root will require $\frac{340}{3}n^3$ flops.
The determinant $|A|$ of an $n \times n$ matrix is calculated by
first performing an LU decomposition ($A = LU$), with a
complexity of $\frac{8}{3}n^3$ flops [13]. The determinant is then the
product of the $n$ diagonal entries of $U$. Thus, $|A|$ has a total
complexity of $\frac{8}{3}n^3 + 6n$ flops.
A QR decomposition of an $m \times n$ matrix, $m \geq n$,
to find $R \in \mathbb{C}^{m \times n}$ and $Q \in \mathbb{C}^{m \times m}$ requires a total of
$16m^2n - 8mn^2 + \frac{8}{3}n^3$ flops [13].
Waterfilling over $j$ eigenmodes requires a maximum of
$2j^2 + 6j$ flops [7].
A full SVD ($A = U \Sigma V^H$) of an $m \times n$ matrix, $m \geq n$, requires
$16m^2n + 32mn^2 + 36n^3$ flops [13].
B. Complexity of Greedy Algorithm
The complexity of the greedy algorithm comes mainly
from F-norm calculations and matrix multiplications. The
complexity of each step is as follows.

Step 1: The F-norm of the $N \times M_T$ channel matrix is
calculated for each of the $K$ users, requiring $4KM_TN$ flops.
Step 2: For $i \geq 2$, the matrix $\bar{H}(\mathcal{U}_s)$ is $(i-1)N \times M_T$.
Performing an SVD of this matrix will require
$16M_T^2N(i-1) + 32M_TN^2(i-1)^2 + 36N^3(i-1)^3$ flops.
Step 3a: The term $\|H_k \bar{V}_i^1\|_F$ requires $(8M_T+4)N^2(i-1)$
flops for the matrix multiplication and F-norm calculation, for each of the $|\mathcal{U}_{i-1}|$ users. $\|H_k\|_F$ can be reused
from Step 1. Due to orthonormality, $\|\bar{V}_i^1\|_F = \sqrt{(i-1)N}$,
which is negligible to compute, as is the division for the
overall normalization. Thus, the complexity of Step 3a is
$|\mathcal{U}_{i-1}|(8M_T+4)N^2(i-1)$ flops.
Step 3b: The sum in the denominator need not be recalculated for each $i$. Instead, a running sum can be kept and
updated with each $i$, at the expense of the storage of at most
$|\mathcal{U}_2|$ real scalars. At each $i$, the term $\|H_k \bar{V}_i^0\|_F^2$ requires
$(8M_T+4)N(M_T-(i-1)N)$ flops to compute for each user
in $\mathcal{U}_i$. Then, 1 flop is required to divide by the sum in the
denominator, followed by 1 flop to update the sum for the
next $i$; we neglect these final 2 flops. Thus, the complexity
of Step 3b is $|\mathcal{U}_i|(8M_T+4)N(M_T-(i-1)N)$ flops.
Therefore, the total complexity of the greedy algorithm is

$$4KM_TN + \sum_{i=2}^{K_0} \Big[ 16M_T^2N(i-1) + 32M_TN^2(i-1)^2 + 36N^3(i-1)^3 + |\mathcal{U}_{i-1}|(8M_T+4)N^2(i-1) + |\mathcal{U}_i|(8M_T+4)N(M_T-(i-1)N) \Big] \tag{9}$$

After some simplification of the above, the highest
order terms are $(8M_T+4)N^2 \sum_{n=1}^{K_0-1} [n|\mathcal{U}_n| - n|\mathcal{U}_{n+1}|]$ and
$(8M_T+4)M_TN \sum_{i=2}^{K_0} |\mathcal{U}_i|$. Expanding the sum in the first term
gives $\sum_{n=1}^{K_0-1} [n|\mathcal{U}_n| - n|\mathcal{U}_{n+1}|] = -(K_0-1)|\mathcal{U}_{K_0}| + \sum_{n=1}^{K_0-1} |\mathcal{U}_n|$.
Thus, the complexity of the greedy algorithm is
$O\big(M_T^2 N \sum_{i=2}^{K_0} |\mathcal{U}_i|\big)$. $|\mathcal{U}_i|$ is a random variable in the
range $0 \leq |\mathcal{U}_i| \leq (K-i+1)$, whose value depends on the
threshold $\alpha$. In the worst case, the greedy algorithm must
search over $(K-i+1)$ users in Step 3. Thus, although the
search is generally simpler, the worst-case complexity of
the greedy algorithm is $O\big(M_T^2 N \sum_{i=2}^{K_0} (K-i+1)\big)$. After
further simplification, we conclude that the greedy algorithm's
complexity is $O(KK_0M_T^2N) \approx O(KM_T^3)$.
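The worst-case count in (9) is easy to evaluate numerically. The sketch below is our own illustration with hypothetical function names; it plugs the upper bound $|\mathcal{U}_i| = K - i + 1$ into every term of (9) and uses the SVD flop formula from the list in Section IV-A.

```python
def svd_flops(m, n):
    """Full complex SVD of an m x n matrix (m >= n), per Section IV-A."""
    return 16 * m * m * n + 32 * m * n * n + 36 * n ** 3

def greedy_flops(K, MT, N):
    """Worst-case flop count of (9), using |U_i| = K - i + 1 for every i."""
    K0 = -(-MT // N)                      # ceil(MT / N) in integer arithmetic
    total = 4 * K * MT * N                # Step 1: F-norms of all K channels
    for i in range(2, K0 + 1):
        n = (i - 1) * N                   # rows of the aggregate channel
        total += svd_flops(MT, n)                            # Step 2
        total += (K - i + 2) * (8 * MT + 4) * N * n          # Step 3a, |U_{i-1}| users
        total += (K - i + 1) * (8 * MT + 4) * N * (MT - n)   # Step 3b, |U_i| users
    return total

for K in (10, 20, 100):
    print(K, greedy_flops(K, 4, 2), greedy_flops(K, 8, 2))
```

For fixed antenna numbers the count grows linearly in $K$, in line with the $O(KK_0M_T^2N)$ conclusion; with a well-chosen threshold the actual group sizes, and hence the cost, are smaller than this bound.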
C. Complexity of Genetic Algorithm
The process of selecting users in the GA is mostly bit
manipulation operations with the chromosomes, which are of
negligible computational complexity. The complexity instead
lies in calculating the utility function for the selection of users
represented by each chromosome. This utility function is the
sum rate for the selected users and encoding order as given by
(7). To calculate the sum rate, the following steps are taken:
1) Find the covariance matrices $P_i$ for the dual MAC with the
iterative algorithm in [11]; 2) Convert the MAC matrices $P_i$
to BC matrices $\Sigma_i$ for DPC, as in [2]; 3) Convert the DPC
matrices $\Sigma_i$ to SZF covariance matrices $Q_i$ as in [5]; and 4)
Calculate the sum rate from (7). More details for each step can
be found in [5]. The complexity of each step is as follows.
Step 1: For each iteration of the algorithm in [11], first
effective channel matrices are calculated for each user $i$
by $G_i = H_i (I + \sum_{j \neq i} H_j^H P_j H_j)^{-1/2}$, which involves matrix
additions and multiplications, and an inverse square root. The
block-diagonal matrix formed by these $G_i$ is waterfilled in
order to obtain covariance matrices $S_i$, which are in turn
used to update each $P_i$ for the next iteration. Of all the
calculations, the most complex is the inverse square root
of an $M_T \times M_T$ matrix for each of the $K_0$ users during
each iteration. Thus, the complexity of Step 1 is $O(n_{it} K_0 M_T^3)$
flops, where $n_{it}$ is the number of iterations required for the
algorithm to converge. From the figures in [11] and from our
own simulations, 3-5 iterations are generally enough for the
algorithm to converge to less than 1% error in the DPC sum
rate, which is sufficient for scheduling purposes. Hence, the
overall complexity of Step 1 is $O(K_0 M_T^3)$.
Step 2: The DPC covariance matrices $\Sigma_j$ are determined
successively (assuming user 1 is encoded first on the MAC) by calculating
$A_j = I + H_j (\sum_{i=1}^{j-1} \Sigma_i) H_j^H$, $B_j = I + \sum_{i=j+1}^{K_0} H_i^H P_i H_i$, and
$\Sigma_j = B_j^{-1/2} F_j G_j^H A_j^{1/2} P_j A_j^{1/2} G_j F_j^H B_j^{-1/2}$. This involves
matrix sums and multiplications, square roots, and an SVD
to find $F_j$ and $G_j$ via $B_j^{-1/2} H_j^H A_j^{-1/2} = F_j \Lambda_j G_j^H$. As with
Step 1, the most complex operation is the inverse square
root of the $M_T \times M_T$ matrices $B_j$ for $K_0$ users. Thus, the
complexity of Step 2 is also $O(K_0 M_T^3)$.
Step 3: To convert the DPC matrices $\Sigma_j$ to SZF covariance
matrices $Q_j$, for each user $j$, the null space basis vectors
$\bar{V}_j^0$ for the aggregate channel matrix of the previous
$j-1$ encoded users are found through an SVD or a QR
decomposition; $\bar{V}_1^0 = I$. For users 1 through $K_0 - 1$, the
SZF matrices are found as $Q_j = \bar{V}_j^0 \bar{V}_j^{0H} \Sigma_j \bar{V}_j^0 \bar{V}_j^{0H}$. For
the final user $K_0$, first a temporary covariance matrix
$\tilde{Q}_{K_0}$ is found by waterfilling over an effective channel
matrix $H_{eff} = (I + H_{K_0} (\sum_{j=1}^{K_0-1} Q_j) H_{K_0}^H)^{-1/2} H_{K_0} \bar{V}_{K_0}^0 \bar{V}_{K_0}^{0H}$
with power constraint $P - \sum_{j=1}^{K_0-1} \mathrm{Tr}(Q_j)$, then
$Q_{K_0} = \bar{V}_{K_0}^0 \bar{V}_{K_0}^{0H} \tilde{Q}_{K_0} \bar{V}_{K_0}^0 \bar{V}_{K_0}^{0H}$. Over all users, the calculation
of the null space vectors is $O(K_0^2 M_T^2 N - K_0^3 M_T N^2 + K_0^4 N^3)$,
while the matrix multiplications for the $Q_j$ matrices are
$O(K_0 M_T^3)$. With $K_0 = \lceil M_T/N \rceil$, the above terms are all
about $O(K_0 M_T^3)$, which is therefore the overall complexity
of Step 3.
Step 4: In calculating the sum rate (7), each user requires
2 determinant calculations (except for the first, where an
identity matrix is in the denominator). The sum of $Q_j$
matrices is updated once per user at a cost of $2M_T^2$ flops. With
the sum calculated, each determinant value requires a total of
$8M_T^2N + 8M_TN^2 + N + \frac{8}{3}N^3 + 6N$ flops for the matrix
multiplications and the determinant value calculation. Lastly,
2 flops are required per user to multiply and divide all the
real determinant values together. Thus, the total complexity
of Step 4 is $(2K_0-1)(8M_T^2N + 8M_TN^2 + \frac{8}{3}N^3 + 7N) +
2K_0M_T^2 + 2K_0$, which is $O(K_0 M_T^2 N) \approx O(M_T^3)$.
Thus, one GA metric calculation requires $O(K_0 M_T^3)$ flops. The
GA calculates this metric $N_p N_g$ times. We use $N_p = 5K_0$
and $N_g = K/2$ in our simulations. Thus, the entire scheduling process is $O(KK_0^2M_T^3)$. Hence, the genetic algorithm is
more complex than the greedy algorithm by a factor of $K_0^2$.
However, it should be noted that the greedy algorithm will
still have to calculate (7) once (i.e. perform the equivalent of
one GA metric calculation) to find the transmit covariance
matrices, at a cost of $O(K_0 M_T^3)$ flops, whereas the GA
has already performed that calculation. In comparison, an
exhaustive search to find the user selection and ordering that
maximizes (8) will be $O\big(K_0! \binom{K}{K_0} K_0 M_T^3\big) \approx O(K^{K_0} K_0 M_T^3)$.

Fig. 2. Performance of exhaustive search, proposed greedy and genetic
scheduling algorithms; $M_T = 4$, $N_k = 2$, $K_0 = 2$; SNR = 5 and 10 dB.
(Plot: average sum rate in bits/s/Hz versus number of users $K$, for $K$ up to 100; panels (a) SNR = 5 dB and (b) SNR = 10 dB; curves for exhaustive search, proposed genetic algorithm, and proposed greedy algorithm.)
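To make the comparison concrete, the number of sum-rate metric evaluations can be tabulated for the two simulated configurations. This is our own back-of-the-envelope sketch with hypothetical function names: it uses $N_p = 5K_0$ and $N_g = K/2$ as stated above (rounding $K/2$ up for odd $K$ is our assumption) and counts only ordered selections of exactly $K_0$ users for the exhaustive search.

```python
from math import ceil, comb, factorial

def ga_metric_evaluations(K, K0):
    """N_p * N_g with N_p = 5*K0 and N_g = K/2 (rounded up here for odd K)."""
    return 5 * K0 * ceil(K / 2)

def exhaustive_metric_evaluations(K, K0):
    """K0! * C(K, K0): ordered selections of exactly K0 out of K users."""
    return factorial(K0) * comb(K, K0)

for K, K0 in [(10, 2), (100, 4)]:
    print(K, K0, ga_metric_evaluations(K, K0), exhaustive_metric_evaluations(K, K0))
```

For $K = 100$ and $K_0 = 4$ the GA needs 1000 evaluations against roughly 94 million for the exhaustive search, while the greedy algorithm needs a single evaluation once its user set has been chosen.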
V. SIMULATION RESULTS
In this section, we present simulation results demonstrating
the performance of our proposed algorithms. A performance
comparison of the optimal exhaustive search, the proposed greedy
algorithm and the proposed genetic algorithm (GA) for SZF is
presented. For the greedy algorithm, the optimal correlation
threshold (that maximizes the sum rate) is determined as in
[6] through simulation. It has been observed that the optimum
user grouping threshold decreases with increasing K [6]. For
example, a threshold in the range of 0.45 to 0.375 has been
observed to be optimal for K = 10 to 100. The optimal case
exhaustively searches through all possible user combinations
and orders.
Fig. 2 shows the performance for MT = 4. It is observed
that the proposed algorithms perform very close to the exhaustive search. For small K (e.g. K < 15 at SNR = 5
dB, and K < 30 at SNR = 10 dB in this example) the GA
outperforms the greedy algorithm, whereas for large K the
greedy algorithm outperforms the GA. At SNR = 10 dB, the
performance of the greedy algorithm and that of the GA are
close to each other (with the greedy algorithm performing
marginally better than the GA at larger K); hence the plots
in Fig. 2b are not clearly distinguishable. Both algorithms
fall less than 0.8 bits/s/Hz short of the optimum,
achieving about 95 to 98% of the sum rate of an exhaustive
search.
Similar results are observed for $M_T = 8$ in Fig. 3. A
crossover of the performance curves of the greedy algorithm
and GA can be seen from these figures. The reason for this
is as follows. When scheduling the maximum number of users
simultaneously, the cancelation of MUI becomes a significant
factor. It is often best to schedule fewer than the maximum
servable number of users in order to maximize the throughput
for small $K$. The greedy algorithm is biased towards scheduling the maximum number of users, whereas the GA is not.
Consequently, the performance of the greedy algorithm suffers
at small $K$. On the other hand, as $K$ increases, the likelihood
of users having near-orthogonal channels increases as a result
of multiuser diversity, making it more likely to be optimal
to schedule $K_0$ users. For example, at 10 dB, our exhaustive
search results indicate that at $K = 8$, it is optimal to schedule 3
users instead of 4 about 61% of the time, whereas at $K = 20$,
this drops to about 6.5%. For similar reasons, it becomes more
likely at higher $K$ that the user with the best channel should be
scheduled, which the GA does not guarantee. Furthermore, it is
observed that the proposed greedy algorithm achieves similar
sum-rate growth versus $K$ as the exhaustive search; the sum
rate of any beamforming (including SZF) grows as $\log(\log K)$
[15]. A plot of the greedy scheduling results vs. $\log(\log K)$
(not included due to lack of space) is indeed linear. The GA
does not keep up with the growth rate of the exhaustive search
at higher $M_T$. Reasons for this and ways to compensate are
discussed in [10], but these are beyond the scope of this paper.
We also note that since SZF can serve more users at once
and can handle more interference than block diagonalization
(BD), it is also optimal to schedule all $K_0$ users at comparatively lower values of $K$ for SZF than for BD. Hence,
the crossover in performance is observed for SZF, whereas
no such crossover was observed up to $K = 100$ in our related
scheduling work in the context of BD [16]. The performance
of both proposed algorithms is still quite close to that of an
exhaustive search, though not quite as close as for $M_T = 4$.
Full exhaustive search results are not available for larger $K$,
due to the combinatorially increasing complexity.

Fig. 3. Performance of exhaustive search, proposed greedy and genetic
scheduling algorithms; $M_T = 8$, $N_k = 2$, $K_0 = 4$; SNR = 5 and 10 dB.
(Plot: average sum rate in bits/s/Hz versus number of users $K$, for $K$ up to 100; panels (a) SNR = 5 dB and (b) SNR = 10 dB; curves for exhaustive search, proposed genetic algorithm, and proposed greedy algorithm.)
VI. CONCLUSIONS
We have proposed and analyzed low complexity greedy
and genetic user scheduling algorithms for linearly precoded
multiuser MIMO downlink. The proposed algorithms are much
less complex, but perform close to the highly complex optimal
exhaustive search. We demonstrate that at lower K, the
genetic algorithm performs better than the greedy algorithm,
but at large K the greedy algorithm outperforms the genetic
algorithm. The greedy algorithm achieves similar sum-rate
growth with K as the exhaustive search, whereas the genetic
algorithm does not for larger MT. A detailed complexity
analysis of the proposed algorithms has also been presented,
and it shows that the order of complexity of the genetic algorithm
is higher than that of the greedy algorithm by a factor of K0^2,
where K0 denotes the maximum number of simultaneously served users.
The current
work has shown that the proposed suboptimal algorithms can
potentially be used to improve the performance of fourth
generation wireless systems. However, in order to understand
the practicality of these scheduling algorithms, fairness should
also be incorporated into the scheduling metric, and the impact
of imperfect channel knowledge or correlation in the channel
on system performance should be investigated. Considering
fairness must first entail optimizing the transmit covariance
matrices for a weighted sum rate, which, to the best of our
knowledge, has not yet been considered in the literature.
REFERENCES
[1] M. H. M. Costa, "Writing on dirty paper," IEEE Trans. Inf. Theory,
vol. 29, no. 3, pp. 439-441, May 1983.
[2] S. Vishwanath, N. Jindal, and A. Goldsmith, "Duality, achievable rates,
and sum-rate capacity of Gaussian MIMO broadcast channels," IEEE
Trans. Inf. Theory, vol. 49, no. 10, pp. 2658-2668, Oct. 2003.
[3] T. Yoo and A. Goldsmith, "On the optimality of multiantenna broadcast
scheduling using zero-forcing beamforming," IEEE J. Sel. Areas
Commun., vol. 24, no. 3, pp. 528-541, March 2006.
[4] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, "Zero-forcing methods
for downlink spatial multiplexing in multiuser MIMO channels,"
IEEE Trans. Signal Process., vol. 52, no. 2, pp. 461-471, Feb. 2004.
[5] A. D. Dabbagh and D. J. Love, "Precoding for multiple antenna Gaussian
broadcast channels with successive zero-forcing," IEEE Trans. Signal
Process., vol. 55, no. 7, pp. 3837-3850, July 2007.
[6] S. Sigdel and W. A. Krzymien, "Simplified fair scheduling and antenna
selection algorithms for multiuser MIMO orthogonal space-division
multiplexing downlink," IEEE Trans. Veh. Technol., vol. 58, no. 3, pp.
1329-1344, March 2009.
[7] Z. Shen, R. Chen, J. G. Andrews, R. W. Heath, Jr., and B. L. Evans, "Low
complexity user selection algorithms for multiuser MIMO systems with
block diagonalization," IEEE Trans. Signal Process., vol. 54, no. 9, pp.
3658-3663, Sept. 2006.
[8] S. Sigdel and W. A. Krzymien, "Efficient user selection and ordering
algorithms for successive zero-forcing precoding for multiuser MIMO
downlink," in Proc. IEEE VTC'09-Spring, April 2009, pp. 1-6.
[9] J. H. Holland, Adaptation in Natural and Artificial Systems, 1st ed. Ann
Arbor, MI: Univ. of Michigan Press, 1975.
[10] R. C. Elliott and W. A. Krzymien, "Downlink scheduling via genetic
algorithms for multiuser single-carrier and multicarrier MIMO systems
with dirty paper coding," IEEE Trans. Veh. Technol., vol. 58, no. 7, pp.
3247-3262, Sept. 2009.
[11] N. Jindal, W. Rhee, S. Vishwanath, S. A. Jafar, and A. Goldsmith,
"Sum power iterative water-filling for multi-antenna Gaussian broadcast
channels," IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1570-1580,
April 2005.
[12] R. C. Elliott and W. A. Krzymien, "On the convergence of genetic
scheduling algorithms for downlink transmission in multi-user MIMO
systems," in Proc. Int. Symp. Wireless Pers. Multimedia Commun.
(WPMC'08), Sept. 2008.
[13] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed.
Baltimore, MD: The Johns Hopkins Univ. Press, 1996.
[14] C.-H. Guo and N. J. Higham, "A Schur-Newton method for the matrix
pth root and its inverse," SIAM J. Matrix Anal. Appl., vol. 28, no. 3, pp.
788-804, 2006.
[15] M. Sharif and B. Hassibi, "A comparison of time-sharing, DPC, and
beamforming for MIMO broadcast channels with many users," IEEE
Trans. Commun., vol. 55, no. 1, pp. 11-15, Jan. 2007.
[16] S. Sigdel, R. C. Elliott, W. A. Krzymien, and M. Al-Shalash, "Greedy
and genetic user scheduling algorithms for multiuser MIMO systems
with block diagonalization," accepted for publication in Proc. IEEE
VTC'09-Fall, Anchorage, AK, USA, Sept. 2009.
