
Dissertation

Wireline
Multiple - Input / Multiple - Output
Systems

submitted in fulfillment of the requirements for the academic degree of
Doktor der technischen Wissenschaften (Doctor of Technical Sciences)

submitted to the
Technische Universität Wien
Faculty of Electrical Engineering and Information Technology

by

Dipl.-Ing. Georg Tauböck


Vienna, November 2005

Supervisor

Prof. Johannes Huber


Lehrstuhl für Informationsübertragung
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany

Examiner

Prof. Johann Weinrichter


Institut für Nachrichtentechnik und Hochfrequenztechnik
Technische Universität Wien, Austria

ftw. Dissertation Series

Georg Tauböck

Wireline
Multiple - Input / Multiple - Output
Systems

telecommunications research center vienna

This work was carried out with funding from Kplus in the ftw. projects B0/I0/C4.

This thesis has been prepared using LaTeX.

September 2006
First edition
All rights reserved
Copyright © 2006 Georg Tauböck
Publisher: Forschungszentrum Telekommunikation Wien
Printed in Austria
ISBN 3-902477-06-7

KURZFASSUNG

In recent years, the idea emerged to use conventional telephone lines for high data rate transmission. It turned out that the existing telephone lines support fast data transmission, provided that they are not too long. In the relevant literature, the corresponding digital transmission methods are grouped under the term xDSL (Digital Subscriber Line) transmission.
The reason that the use of existing telephone lines is restricted to shorter cable lengths is that, in most practically used topologies, the individual twisted pairs are combined into cable bundles, so that two twisted pairs lying close to each other disturb each other via the electromagnetic field. In the corresponding literature, this is referred to as crosstalk. Depending on where the crosstalk originates, it is divided into Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT). In the present work, we restrict ourselves to FEXT.
We assume that data transmission over the individual cables is performed using Discrete Multitone modulation (DMT), which is the modulation scheme also employed in ADSL and VDSL. Since we consider simultaneous data transmission over several cables of a cable bundle, the complete transmission system can also be regarded as a Multiple - Input / Multiple - Output (MIMO) system, and we refer to the overall modulation scheme as MIMO DMT. The present work treats MIMO DMT in great detail.
We develop a theory for complex random vectors that also includes rotationally variant random vectors (those with a non-vanishing pseudo-covariance matrix); this theory is of great importance for us, since the random vectors occurring in DMT and MIMO DMT generally possess this property (this, too, is shown in this work). We prove a Generalized Maximum Entropy Theorem and obtain various capacity results whose expressions also include the pseudo-covariance matrix. We show that a non-vanishing pseudo-covariance matrix (of the noise) increases capacity, and we calculate the capacity loss incurred if the pseudo-covariance matrix is erroneously assumed to be the zero matrix.
We perform a detailed noise analysis for a DMT system and show that the noise at the input of the decision device is rotationally variant in general. We calculate the corresponding covariance and pseudo-covariance matrices, which are then specialized in a further step to obtain the variances of the real and imaginary parts of the noise and the correlations between the real and imaginary parts at fixed frequencies / subcarriers. By means of eigenvalue decompositions, it is possible to determine the eccentricities and rotations of the noise ellipses. It turns out that the rotation angles are independent of the noise characteristics at the receiver input; they depend solely on the index of the considered subcarrier (subcarrier frequency). Furthermore, it is shown that unequal noise variances and correlations do not occur if the noise (at the receiver input) is white. For colored noise, however, they do occur, and one should actually use rotated rectangular signal constellations instead of the usual (square) QAM constellations. Otherwise, one has to accept a capacity loss and an increased symbol error probability. We calculate both quantities and show how existing bit-loading algorithms have to be modified in order to obtain the optimum constellation parameters.
We likewise perform a detailed interference analysis for a DMT system. We consider the case in which the impulse response exceeds the cyclic prefix on both sides, which leads to precursors and postcursors from both neighboring DMT symbols (intersymbol interference) and to intercarrier interference. We derive closed-form formulas for both contributions and study their statistical properties. It turns out that both interference contributions are complex random vectors with equal first and second moments and a non-vanishing pseudo-covariance matrix.
We also show how the noise and interference results obtained can be used for the design of time domain equalizers.
In a second step, we generalize the noise and interference results from DMT to MIMO DMT. Again, it is possible to obtain closed-form solutions, even under very general assumptions regarding the correlations between the various twisted pairs of a cable bundle.
We present the general form of a transmission scheme that is matched to the MIMO DMT channel and is based on so-called joint processing functions. It enables the use of Single - Input / Single - Output (SISO) codes.
Furthermore, we deal with transmission schemes whose joint processing functions are determined by means of the singular value decomposition of the channel matrix. We show that the optimum joint processing functions can be obtained in this way, and we study variants of lower complexity. To obtain quantitative results, we perform simulations with realistic (practically used) parameters and compare the various methods.
Finally, we present the UP MIMO scheme (UP stands for Unitary Parametrization) and treat various aspects of it.

ABSTRACT

In recent years, the idea of using existing telephone lines for high data rate transmission took shape. It turned out that existing telephone lines can support high data rates, provided that the cables are not too long. In the corresponding literature, these transmission methods are referred to as xDSL (Digital Subscriber Line) transmission.
The reason that the use of existing telephone lines is limited to shorter loop
lengths is that the various twisted pairs are bundled together in cable bundles in
most practical topologies. It is quite clear that two loops that are close to each other disturb each other because of the induced electromagnetic field. In the literature, this effect is called crosstalk. Hence, there is ongoing research to reduce the performance degradation introduced by crosstalk. There are several issues regarding crosstalk and, therefore, different approaches to this problem. Crosstalk is subdivided into two types, i.e., Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT), depending on the location the crosstalk originates from. Throughout this manuscript, we consider only crosstalk that stems from the far-end side (FEXT).
We will assume that transmission over the individual loops is performed via
Discrete Multitone modulation (DMT), which is the modulation scheme used in
ADSL and VDSL. Since we are considering simultaneous transmission over several
loops in a cable bundle, the whole transmission system can be regarded as a Multiple - Input / Multiple - Output (MIMO) system, and we will refer to the overall modulation scheme as the MIMO DMT modulation scheme.
The present work gives a comprehensive treatment of Multiple - Input / Multiple - Output Discrete Multitone (MIMO DMT) transmission. We show that such
a transmission scenario can be modeled by a complex vector channel, i.e., by a
deterministic complex matrix and by a complex noise vector.
We develop a theory for complex random vectors that takes into account rotationally variant random vectors, and is therefore of great importance for our
purpose, since complex random vectors (in DMT and MIMO DMT) have a non-vanishing pseudo-covariance matrix in general (as we will also show in this manuscript). We prove a Generalized Maximum Entropy Theorem that includes the
pseudo-covariance matrix in its entropy inequality and therefore tightens the upper bound for rotationally variant random vectors. We show that the additional
correction term is independent of the specific probability distribution of the considered random vector. Furthermore, we obtain several capacity results for the
complex vector channel considered that take into account the pseudo-covariance matrix. We show that a non-vanishing pseudo-covariance matrix (of the noise) increases capacity and calculate the capacity loss if it is erroneously assumed that the pseudo-covariance matrix is the zero matrix. Note also that we derive a criterion
for a matrix to be a pseudo-covariance matrix. This generalizes the well-known
criterion that a matrix is a covariance matrix of a certain random vector if and only
if it is symmetric / Hermitian and non-negative definite.
We perform a detailed noise analysis for a DMT system and show that the
noise vector at the input of the Decision Device is rotationally variant in general.
We calculate the corresponding covariance matrix and pseudo-covariance matrix,
which is then specialized in order to obtain the noise variances of real and imaginary part and to obtain the correlations between real and imaginary part for a fixed
frequency / subcarrier. Via eigenvalue decompositions, we are able to determine
the eccentricities and the rotations of the noise ellipses. It turns out that the rotation
angles are independent of the actual noise characteristics. They only depend on the
number of the considered subcarrier. Furthermore, it is shown that different noise
variances and correlations of real and imaginary part do not occur in the presence
of white noise (at the input of the receiver). For colored noise, they do occur, and
one has to use rotated rectangular constellations instead of the common (square)
QAM constellations. Otherwise, one has to accept a capacity loss and increased
symbol error probability. We calculate both quantities. Furthermore, we show
how to modify the existing bit-loading algorithms in order to obtain the optimum
constellation parameters.
We also perform a detailed interference analysis for a DMT system. We consider the case when the channel impulse response exceeds the Cyclic Prefix on both
sides, which yields precursors and postcursors from both neighboring DMT symbols (intersymbol interference) and also intercarrier interference. We derive closed-form formulas for both contributions and consider their statistical properties as
well. It turns out that both interference contributions are complex random vectors
with equal first and second order moments and a non-vanishing pseudo-covariance
matrix.
We also show how the noise and interference results obtained can be utilized
for the design of Time Domain Equalizers.
In a second step, we generalize the noise and interference results from DMT to
the MIMO DMT case. Again, it is possible to obtain closed-form solutions, even
for very general assumptions with respect to correlations across the various loops
of the cable bundle.
We present the general form of a transmission scheme that is suited to the
MIMO DMT channel and is based on so-called joint processing functions. It allows
the use of Single - Input / Single - Output (SISO) codes, and we introduce the (sum-)capacity as a performance measure.
We deal with transmission schemes whose joint processing functions are based
on the Singular Value Decomposition (SVD) of the channel matrix. We show that
the optimum joint processing function can be obtained by means of the SVD. Furthermore, we study low(er)-complexity variations and discuss their performance.
To obtain quantitative results, we perform simulations with realistic (practically used) parameters and compare the various methods.
Finally, we present the UP MIMO scheme (UP stands for Unitary Parametrization), a scheme that was originally designed by the author for wireless transmission, and also has applications to wireline transmission. Specifically, it can be
used to reduce the computational complexity at the transmitter side (but not at the
receiver side). We treat various aspects of this scheme.

ACKNOWLEDGEMENT

I would like to thank Prof. Johannes Huber and Prof. Johann Weinrichter for their
support that goes far beyond what I would have expected.
I am grateful to all my colleagues at the Telecommunications Research Center
Vienna (ftw.), especially to Jossy Sayir for his continuous (in fact, continuous and
not piecewise continuous) assistance and encouragement, to Werner Henkel for his
support in every way, and to Driton Statovci for our fruitful discussions concerning
the practical aspects of my work. The collaboration with them was a constant
source of new ideas and entertaining hours. The professional, inspiring, and open
work environment at ftw., shaped by Markus Kommenda and Horst Rode, provided
the basis for the work on this thesis.
I would like to thank my wife, my family, and my friends for their continuous
sympathy during my research adventure, and especially my mother for being such
an enthusiastic grandmother with helping hands whenever the father is busy with
writing his thesis.
I would like to dedicate this work to my daughter Maria Shirin who is the
smiling sun in my life.


CONTENTS

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   1
2. The Subscriber-Line Network . . . . . . . . . . . . . . . . . . . . . . .   5
   2.1 Discrete Multitone Modulation . . . . . . . . . . . . . . . . . . . .   5
   2.2 Discrete Multitone Modulation on a Cable Bundle . . . . . . . . . . .  16
   2.3 The Channel Model . . . . . . . . . . . . . . . . . . . . . . . . . .  24
3. Entropy and Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . .  27
   3.1 The Maximum Entropy Theorem for Complex Random Vectors . . . . . . .  28
       3.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . .  28
       3.1.2 Differential Entropy of Complex Random Vectors . . . . . . . .  37
       3.1.3 The Euclidean Matrix Norm . . . . . . . . . . . . . . . . . . .  47
   3.2 Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  51
       3.2.1 Rotationally Invariant Noise . . . . . . . . . . . . . . . . .  51
       3.2.2 Rotationally Variant Noise . . . . . . . . . . . . . . . . . .  54
       3.2.3 Capacity Loss . . . . . . . . . . . . . . . . . . . . . . . . .  60
4. Noise and Interference Analysis of DMT . . . . . . . . . . . . . . . . . .  65
   4.1 Derivation of the Noise Characteristics . . . . . . . . . . . . . . .  66
       4.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  66
       4.1.2 First and Second Moments . . . . . . . . . . . . . . . . . . .  67
       4.1.3 Frequency Dependent Noise Analysis . . . . . . . . . . . . . .  71
       4.1.4 Powers and Statistical Dependencies of Real and Imaginary
             Part of the Noise . . . . . . . . . . . . . . . . . . . . . . .  73
       4.1.5 Consequences and Asymptotic Analysis of the Noise
             Characteristics . . . . . . . . . . . . . . . . . . . . . . . .  74
   4.2 Capacity Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . .  77
   4.3 Symbol Error Probability and Optimized Bit-Loading . . . . . . . . .  82
   4.4 Intersymbol Interference (ISI) and Intercarrier Interference (ICI) .  92
       4.4.1 Deterministic Interference Analysis . . . . . . . . . . . . . .  95
       4.4.2 Interference Statistics . . . . . . . . . . . . . . . . . . . . 101
       4.4.3 Time Domain Equalizer . . . . . . . . . . . . . . . . . . . . . 104
5. Multiple-Input / Multiple-Output Discrete Multitone . . . . . . . . . . . 107
   5.1 Noise and Interference . . . . . . . . . . . . . . . . . . . . . . . . 108
       5.1.1 Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
       5.1.2 Interference . . . . . . . . . . . . . . . . . . . . . . . . . 115
       5.1.3 MIMO Time Domain Equalizer . . . . . . . . . . . . . . . . . . 120
   5.2 The Transmission Scenario . . . . . . . . . . . . . . . . . . . . . . 121
   5.3 New Design Methods based on the Singular Value Decomposition . . . . 126
       5.3.1 Full Diagonalization . . . . . . . . . . . . . . . . . . . . . 127
       5.3.2 Approximate Diagonalization . . . . . . . . . . . . . . . . . . 131
       5.3.3 Diagonalization of Subsets . . . . . . . . . . . . . . . . . . 132
       5.3.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . 133
   5.4 The UP MIMO Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 136
       5.4.1 The Receiver Side . . . . . . . . . . . . . . . . . . . . . . . 137
       5.4.2 Parametrization of Unitary (Orthonormal) Matrices . . . . . . . 139
       5.4.3 The Transmitter Side . . . . . . . . . . . . . . . . . . . . . 142
       5.4.4 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6. Conclusions and Outlook . . . . . . . . . . . . . . . . . . . . . . . . . 147

Appendix                                                                      151

A. Notation and Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . 153
   A.1 Mathematical Notation . . . . . . . . . . . . . . . . . . . . . . . . 153
   A.2 Frequently Used Symbols . . . . . . . . . . . . . . . . . . . . . . . 154
   A.3 Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
B. Simulation Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
   B.1 Scenario 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
       B.1.1 Transmission Medium . . . . . . . . . . . . . . . . . . . . . . 157
       B.1.2 DMT Parameters . . . . . . . . . . . . . . . . . . . . . . . . 157
       B.1.3 Noise Model . . . . . . . . . . . . . . . . . . . . . . . . . . 157
   B.2 Scenario 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
       B.2.1 Transmission Medium . . . . . . . . . . . . . . . . . . . . . . 158
       B.2.2 DMT Parameters . . . . . . . . . . . . . . . . . . . . . . . . 158
       B.2.3 Noise Model . . . . . . . . . . . . . . . . . . . . . . . . . . 159
   B.3 Scenario 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
       B.3.1 Transmission Medium . . . . . . . . . . . . . . . . . . . . . . 160
       B.3.2 DMT Parameters . . . . . . . . . . . . . . . . . . . . . . . . 160
       B.3.3 Noise Model . . . . . . . . . . . . . . . . . . . . . . . . . . 160

1. INTRODUCTION

We are currently living in an age called the information age, characterized by a dramatic increase in the exchange of information. Technically, this information exchange is supported by the linking and networking of various systems, so that the growing demand for (cross-)communication can be served. However, this
boosts the overall information flow, and we have to think of techniques that enable
us to transport this enormous amount of data.
In principle, the technologies that support transmission at extremely high data
rates already exist, but their technical implementation is expensive. Optical data
transmission is one example that provides us with the necessary data rates, but
that is very costly. Hence, economic aspects also play a role in the decision of
which technology to use. There is no doubt that optical data transmission will be
the preferred solution in the long term, but for the near future, people will try to
develop cheaper methods that accommodate the desired demand for high data rate
transmission.
In recent years, the idea of using existing telephone lines for high data rate transmission took shape. This solution is relatively inexpensive, so that - from an economic point of view - there would be no objections to it. It turns out that
existing telephone lines can support high data rates, provided that the lengths of the
cables are not too long. In the corresponding literature, these transmission methods
are referred to as xDSL (Digital Subscriber Line) transmission.
However, the data rates supported are limited for longer loop lengths. In such
situations, the application of an xDSL transmission system is limited.
So, one might ask, what are the (physical) reasons that xDSL transmission is
not effective for longer loop lengths? After analyzing this question, it was found
that the limiting mechanisms are due to the fact that the various twisted pairs are
bundled together in cable bundles in most practical topologies. It is quite clear - even for people who are not electrical engineers - that two loops that are close to each other disturb each other because of the induced electromagnetic field. In the literature, this effect is called crosstalk.
So, is it possible to overcome these limitations? One trivial solution would be
not to bundle the different loops into cable bundles. On the other hand, this would foil the economic advantages of xDSL transmission.
Hence, there is recent research to reduce the performance degradation introduced by crosstalk. There are several issues regarding crosstalk, and, therefore,
different approaches to this problem. Note that crosstalk is subdivided into two
types, i.e., Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT), depending on the location the crosstalk originates from. Throughout this manuscript, we will assume that we know how to cope with NEXT and consider only the crosstalk that stems from the far-end side (FEXT). We want to emphasize that NEXT is still a topic of ongoing research and that our assumption is introduced to simplify matters by considering only one crosstalk source.
There is one way to deal with this problem that is based on the viewpoint that
considers crosstalk not as a disturbance but - instead - as part of the channel. In the
present work, we will pursue this idea and develop several methods for communication over channels with crosstalk. We also want to emphasize that the resulting
performance gains are in line with Information Theory, which has led to the fundamental result that reliable communication is only possible below a certain data
rate threshold (the capacity) for a given (physical) channel. If crosstalk is considered as part of the channel instead of as a disturbance source, this corresponds
to a change of the channel. Thus, Information Theory provides us with a new data
rate threshold.
In order to minimize the costs, we will assume that transmission over the individual loops is performed via Discrete Multitone modulation (DMT), which is the
modulation scheme used in ADSL and VDSL. Note that existing technologies are
usually cheaper than new technologies. Hence, a main part of this manuscript deals
with DMT, working out aspects not known before. We will show that there can be
situations (not only in theory but also in practice) where these aspects have an enormous influence on the transmission performance and should be taken into account.
The present work is structured as follows:
1. The first chapter contains this introduction.
2. The second chapter considers the Subscriber-Line Network and presents the
fundamentals of Discrete Multitone modulation (DMT). Furthermore, we consider transmission over cable bundles and show how the crosstalk can
be modeled in order to reflect the properties of the real world. If each loop
is equipped with DMT transmission, the crosstalk is transformed according
to the DMT modulation scheme. We will show that DMT transmission over
cable bundles can be described as a complex valued vector channel, i.e., as
a complex matrix - vector product plus an additive complex valued noise
vector. Since a vector channel has several inputs and outputs (the elements
of the vectors), we refer to such a scenario as a Multiple - Input / Multiple - Output (MIMO) system.
3. Since we are dealing with complex valued vectors, Chapter 3 presents a theory about complex random vectors. Note that most literature about complex
random vectors deals only with a sub-class, the so-called rotationally invariant complex random vectors. We will show in Chapter 4 that the complex
random vectors we are looking at are not elements of this sub-class. Instead,
they are rotationally variant, so that it makes sense to develop this theory.
Specifically, we generalize the Maximum Entropy Theorem to rotationally

variant complex random vectors, and then use it to obtain certain capacity
results. With the introduction of a rotationally invariant analogue of a complex random vector, we are able to show that the additional correction term in
the Generalized Maximum Entropy Theorem is independent of the actual distribution of the considered complex random vector. We want to emphasize
that it is not widely known that the complex random vectors are rotationally variant in general, and we will calculate the capacity loss that must be
accepted if this fact is neglected (using very general assumptions without
making use of any specific DMT properties).
4. As already mentioned, it is shown in Chapter 4 that we have to deal with rotationally variant random vectors. To be more precise, we will show in this
chapter that the noise vector (at the input of the Decision Device) has this
property (in general), and we will look at the implications resulting from
this observation. It turns out that colored noise at the input of the DMT receiver makes the noise vector at the input of the Decision Device rotationally
variant. In order to cope with rotationally variant noise, one should apply
rotated rectangular constellations instead of the common quadratic QAM
constellations, and we will derive analytical formulas for the (optimum) parameters / shape of these constellations. In case this is not done, one must
accept a capacity loss and higher symbol error probability. Both quantities
will be calculated.
Finally in this chapter, we consider the problem of determining the Time Domain Equalizer (TDE) coefficients. If the TDE coefficients are not suitably
adapted, there will be intersymbol and intercarrier interference. We will obtain closed form solutions for both interference contributions and study their
statistical properties. Again, it turns out that the interference is represented
by a rotationally variant complex random vector, which is a second argument for the use of rotated rectangular signal constellations.
5. The first part of Chapter 5 generalizes the results of Chapter 4 about noise
and interference to the MIMO case, i.e., instead of looking at one single loop
equipped with DMT transmission, we look at the whole cable bundle. This
is of great importance, since these results are fundamental for the design of
low-complexity MIMO Time Domain Equalizers. This design is much more
difficult than for the single loop case.
The second part presents the general form of a transmission scheme that
can cope with crosstalk. It is based on so-called joint processing functions
and allows the use of conventional Single - Input / Single - Output (SISO)
codes, so that we can again resort to existing technologies for this part of the
transmission scheme.
The third section deals with transmission schemes whose joint processing
functions are based on the Singular Value Decomposition (SVD) [19] of
the channel matrix. We will show that we can obtain the optimum joint processing functions by means of the SVD. Furthermore, we study low(er)-complexity transmission variants and discuss their performance. To obtain
quantitative results, we perform simulations of these transmission schemes
with realistic (practically used) parameters and compare the various methods.
The final section of this chapter presents the UP MIMO¹ scheme, a scheme
that was originally designed by the author for wireless transmission, and that
also has applications in wireline transmission. Specifically, it can be used to
reduce the computational complexity at the transmitter side (but not at the
receiver side). We will treat various aspects of this scheme.
6. The last chapter presents the overall conclusions and gives an outlook on
possible further developments.
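As a numerical illustration of the rotationally variant complex random vectors discussed in the outline above, the following sketch (an editor's addition using NumPy; the example vectors and sample count are made up and not taken from the thesis) estimates both the covariance matrix E[zz^H] and the pseudo-covariance matrix E[zz^T] from samples. For a circularly symmetric (rotationally invariant) vector the pseudo-covariance vanishes, while a vector with unequal real and imaginary variances has a clearly non-vanishing pseudo-covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000

def cov_and_pseudocov(z):
    """Estimate covariance E[z z^H] and pseudo-covariance E[z z^T]
    of a zero-mean complex random vector from samples (rows of z)."""
    c = z.conj().T @ z / len(z)    # covariance matrix
    p = z.T @ z / len(z)           # pseudo-covariance matrix
    return c, p

# Rotationally invariant (circularly symmetric) vector: i.i.d. CN(0, 1).
z_inv = (rng.standard_normal((n_samples, 2))
         + 1j * rng.standard_normal((n_samples, 2))) / np.sqrt(2)

# Rotationally variant vector: unequal real/imaginary variances.
z_var = (rng.standard_normal((n_samples, 2))
         + 1j * 0.3 * rng.standard_normal((n_samples, 2)))

c_inv, p_inv = cov_and_pseudocov(z_inv)
c_var, p_var = cov_and_pseudocov(z_var)

print(np.round(np.abs(p_inv), 2))   # approximately the zero matrix
print(np.round(np.abs(p_var), 2))   # clearly non-zero diagonal (about 0.91)
```

Here the second vector has real-part variance 1 and imaginary-part variance 0.09 per component, so its pseudo-covariance diagonal is about 1 − 0.09 = 0.91 while its covariance diagonal is about 1.09.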

¹ The name comes from the terminology unitary parametrization.

2. THE SUBSCRIBER-LINE NETWORK

In this chapter we will first review one of the most important modulation schemes
used in the subscriber line network. This modulation scheme is called Discrete
Multitone modulation (DMT) and is currently used in the ADSL [25, 26, 28–30] and VDSL [9–12, 27] standards. It allows an efficient implementation and exhibits
performance near capacity. Our main focus in this chapter is to find a compact
mathematical description for the relationship between the data to be transmitted
and the received data.
Secondly, we will extend these considerations to DMT transmission over cable
bundles. It is well known that, if several modems transmit over a cable bundle simultaneously, each modem disturbs the others, so that we will encounter severe performance degradations. In the literature, cf. [4–8], these interference mechanisms are called Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT). In this work we will mainly focus on FEXT, since there exist¹ several techniques [4–8] to compensate for NEXT, whereas FEXT cancellation methods are still a topic of research and development. Again, it is our goal to find a compact model for the input-output behavior.
It will turn out that both scenarios can be described essentially in the same way,
which puts us in a position to analyze and optimize both systems using the same
framework.

2.1 Discrete Multitone Modulation


Discrete Multitone modulation is a modulation scheme based on existing efficient
algorithms for the calculation of (inverse) discrete Fourier transforms (Fast Fourier
Transform - FFT, Inverse Fast Fourier Transform - IFFT, [59]). Adding and removing the so-called Cyclic Prefix (also called Guard Interval), cf. [13], in the transmitter and receiver, respectively, transforms the linear and time-invariant distortions introduced by the channel (the copper wire) into cyclic distortions, which can be completely removed by means of (inverse) discrete Fourier transforms. Alternatively, one can also say that DMT is an intelligent way to cope with intersymbol interference (ISI).
A complete (uncoded) DMT transmission system is depicted in Figure 2.1. For
a detailed discussion, we also refer to [13].
¹ There is no doubt that present and future research should treat the various aspects of NEXT as well.
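The effect of the cyclic prefix described above can be sketched in a few lines (an editor's illustration; the block length, prefix length, and channel taps are made-up values, not parameters from the thesis): prepending the last p samples of a block turns the linear convolution with a short channel impulse response into a circular convolution over the block, so the distortion reduces to a per-tone scalar factor that a one-tap frequency domain equalizer removes.

```python
import numpy as np

N, p = 16, 4                        # block length and cyclic prefix length
rng = np.random.default_rng(1)

x = rng.standard_normal(N)          # one (real-valued) transmit block
h = np.array([1.0, 0.5, -0.2])      # channel impulse response, length <= p + 1

# Transmitter: prepend the last p samples (the cyclic prefix).
tx = np.concatenate([x[-p:], x])

# Channel: ordinary linear convolution (noise omitted for clarity).
rx = np.convolve(tx, h)

# Receiver: drop the prefix; keep N samples.
y = rx[p:p + N]

# With the prefix, y equals the circular convolution of x and h, so the
# DFT turns the channel into per-tone scalars H[k]:
H = np.fft.fft(h, N)
x_hat = np.fft.ifft(np.fft.fft(y) / H).real   # one-tap frequency domain equalizer

print(np.max(np.abs(x_hat - x)))    # ~ 0: block recovered up to numerical precision
```

This works whenever the channel memory does not exceed p (and H[k] is non-zero on every tone); if the impulse response is longer, the residual appears as the intersymbol and intercarrier interference analyzed in Chapter 4.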

[Figure 2.1 shows the complete DMT transmission system as a block diagram: Source, Mapping, Conjugate Complex Extension, IFFT (length N), Add Cyclic Prefix (length p), Parallel → Serial conversion, D/A Converter, Transmit Filter / Driver Stage, the twisted-pair channel with additive noise, Receive Filter (Lowpass), A/D Converter, Time Domain Equalizer (adaptive FIR filter), Serial → Parallel conversion, Remove Cyclic Prefix, FFT (length N), Remove Conjugate Complex Extension, Frequency Domain Equalizer, Decision Device, Inverse Mapping, and Sink; the bit rate 1/Tb, the DMT symbol rate 1/Ts, and the channel symbol rate 1/T are indicated.]

Fig. 2.1: DMT transmission system.


The Source emits a sequence of bits that are converted in the block Mapping into a sequence of complex valued symbol vectors of dimension $\frac{N}{2}+1$, the so called DMT symbols. Note that there is the additional constraint that the first and last elements of the vectors (the DMT symbols) have to be real valued. Each element of the vector is chosen according to optimized discrete symbol constellations. The optimization is usually carried out in a pre-transmission phase in which channel and noise characteristics are measured, so that high performance is guaranteed during transmission and the required (signal power) constraints (cf. [9-12, 25-30]) are fulfilled. In the next two blocks, Conjugate Complex Extension and IFFT, each complex vector of dimension $\frac{N}{2}+1$ is extended to a complex vector of dimension $N$, with the property that one half of the vector is a conjugated version of the other half, and passed through an inverse discrete Fourier transform of length $N$. Note that the vectors at the output of the inverse discrete Fourier transform, which is implemented using fast Fourier algorithms, are now real valued. The block Add Cyclic Prefix takes the last $p$ elements of each vector and produces a new $(N+p)$-dimensional vector, which is obtained by stacking these $p$ elements and the original $N$-dimensional vector. In the next step (block Parallel → Serial), all elements of these vectors are put together into a sequence of numbers, which are then (block D/A Converter) transformed into the analog domain, passed through a transmit filter, and transmitted over the twisted pair (block Transmit Filter, Driver Stage).
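The transmitter chain just described (conjugate complex extension, IFFT, cyclic prefix) can be mirrored numerically. The following sketch is illustrative only and uses NumPy's `irfft`, which performs the Hermitian extension and the inverse DFT in one step (note it uses a $1/N$ rather than the unitary $1/\sqrt{N}$ normalization of the thesis); the parameter values are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 16, 4                     # IFFT length and cyclic-prefix length

# One DMT symbol: N/2 + 1 complex subcarrier symbols; the first and
# last must be real valued so that the time-domain signal is real.
e = rng.standard_normal(N // 2 + 1) + 1j * rng.standard_normal(N // 2 + 1)
e[0], e[-1] = e[0].real, e[-1].real

# Conjugate complex extension + IFFT: np.fft.irfft performs exactly
# this Hermitian extension followed by a length-N inverse DFT.
a = np.fft.irfft(e, n=N)         # real-valued vector of length N

# Add the cyclic prefix: prepend the last p samples.
q = np.concatenate([a[-p:], a])  # length N + p, ready for P/S conversion

assert not np.iscomplexobj(a)                 # IFFT output is real
assert q.shape == (N + p,)
```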
The received signal is passed through a Receive Filter (Lowpass) and converted back to the digital domain (block A/D Converter). It goes through an adaptive filter called Time Domain Equalizer, and the symbols are then stacked into a sequence of $(N+p)$-dimensional vectors (block Serial → Parallel). The first $p$ elements of each vector are removed (block Remove Cyclic Prefix, which is naturally implemented together with the block Serial → Parallel in practice), and for the resulting $N$-dimensional vectors a Fast Fourier Transform (FFT) is computed. The first $\frac{N}{2}+1$ elements of each vector (block Remove Conjugate Complex Extension) are multiplied with $\frac{N}{2}+1$ complex scalars² (block Frequency Domain Equalizer), which are calculated in the pre-transmission phase, so that the distortions introduced by the channel are compensated. The Decision Device maps back onto the signal constellations, and the block Inverse Mapping in turn produces a bitstream, which is (hopefully) equal to the bitstream generated by the Source.
Observe the relation between the DMT symbol rate $1/T_s$ and the channel symbol rate $1/T$, cf. also Figure 2.1, given by
$$\frac{1}{T} = \frac{N+p}{T_s}. \qquad (2.1)$$
In order to analyze and optimize this system, we will build up a mathematical model for the input-output behavior. We start with the part of the transmission system that models the channel, as shown in Figure 2.2.
All involved signals are real valued, discrete-time signals and we assume that

² The first and last scalars are real valued.

[Fig. 2.2: DMT: The Channel. D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair) with additive Noise s → Receive Filter (Lowpass) → A/D Converter.]

the input-output behavior can be approximated by a linear and time-invariant convolution of the input signal $\mathbf{t} = [t(n)]_{n=-\infty,\dots,\infty}$ with a real valued impulse response $\mathbf{g} = [g(n)]_{n=-\infty,\dots,\infty}$ plus an additive noise term $\mathbf{s} = [s(n)]_{n=-\infty,\dots,\infty}$, i.e.,
$$\mathbf{r} = \mathbf{g} * \mathbf{t} + \mathbf{s}, \qquad r(n) = \sum_{k=-\infty}^{\infty} g(k)\, t(n-k) + s(n). \qquad (2.2)$$

Next, we will include the so called Time Domain Equalizer (TDE) in our model, cf. Figure 2.3. Again, we model its behavior by a convolution with a real valued, time-discrete impulse response $\mathbf{e} = [e(n)]_{n=-\infty,\dots,\infty}$, i.e.,
$$\mathbf{u} = \mathbf{e} * \mathbf{r}, \qquad u(n) = \sum_{k=-\infty}^{\infty} e(k)\, r(n-k). \qquad (2.3)$$

Let $\mathbf{h} = [h(n)]_{n=-\infty,\dots,\infty}$ denote the convolution of $\mathbf{e}$ and $\mathbf{g}$, $\mathbf{h} = \mathbf{e} * \mathbf{g}$, and let $\mathbf{z} = [z(n)]_{n=-\infty,\dots,\infty}$ denote the convolution of $\mathbf{e}$ and $\mathbf{s}$, $\mathbf{z} = \mathbf{e} * \mathbf{s}$. Then, we obtain an input-output relationship as
$$\mathbf{u} = \mathbf{h} * \mathbf{t} + \mathbf{z}, \qquad u(n) = \sum_{k=-\infty}^{\infty} h(k)\, t(n-k) + z(n). \qquad (2.4)$$

It can be seen from Figure 2.3 that the TDE is an adaptive filter. Its coefficients
are determined in the pre-transmission phase, so that the overall channel impulse

[Fig. 2.3: DMT: The Channel and the Time Domain Equalizer. D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair) with additive Noise s → Receive Filter (Lowpass) → A/D Converter → Time Domain Equalizer (adaptive FIR filter) → u.]

response $\mathbf{h}$ has a length shorter than or equal to $p+1$, $p$ being the length of the Cyclic Prefix, cf. Figure 2.1, or, to be more precise, they are determined such that
$$h(n) = 0, \qquad n < 0 \ \text{or} \ n > p. \qquad (2.5)$$

Note that for practically occurring channel impulse responses $\mathbf{g}$, the resulting filter $\mathbf{e}$ will always be non-causal and therefore not implementable. Since a simple delay in the receiver solves this problem, assumption (2.5) will be maintained for simplicity.
We will address the issue of designing the coefficients of the TDE in more
detail (including an analysis of the case that (2.5) holds only approximately) later
in Section 4.4.
Due to assumption (2.5), the infinite sum in (2.4) is replaced by a finite sum, so that (2.4) simplifies to
$$u(n) = \sum_{k=0}^{p} h(k)\, t(n-k) + z(n). \qquad (2.6)$$

We can now reformulate this representation using a matrix-vector notation. With
$$\mathbf{b}_{n_0} = \begin{bmatrix} u(n_0) \\ u(n_0+1) \\ \vdots \\ u(n_0+N-1) \end{bmatrix} \in \mathbb{R}^{N}, \qquad \mathbf{v}_{n_0} = \begin{bmatrix} z(n_0) \\ z(n_0+1) \\ \vdots \\ z(n_0+N-1) \end{bmatrix} \in \mathbb{R}^{N},$$
$$\mathbf{q}_{n_0} = \begin{bmatrix} t(n_0-p) \\ \vdots \\ t(n_0-1) \\ t(n_0) \\ t(n_0+1) \\ \vdots \\ t(n_0+N-1) \end{bmatrix} \in \mathbb{R}^{N+p},$$
and
$$\mathbf{H} = \begin{bmatrix}
h(p) & \cdots & h(0) & & & \mathbf{0} \\
 & h(p) & \cdots & h(0) & & \\
 & & \ddots & & \ddots & \\
\mathbf{0} & & & h(p) & \cdots & h(0)
\end{bmatrix} \in \mathbb{R}^{N \times (N+p)},$$
we have
$$\mathbf{b}_{n_0} = \mathbf{H}\mathbf{q}_{n_0} + \mathbf{v}_{n_0}. \qquad (2.7)$$
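The Toeplitz structure of (2.7) can be checked numerically. The following sketch is illustrative only (arbitrary random test data); it builds the $N \times (N+p)$ matrix and verifies that $\mathbf{H}\mathbf{q}_{n_0}$ reproduces the finite convolution sum (2.6) in the noise-free case:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 8, 3
h = rng.standard_normal(p + 1)          # shortened impulse response h(0..p)
t = rng.standard_normal(N + p)          # q_{n0} = [t(n0-p), ..., t(n0+N-1)]

# Build the N x (N+p) Toeplitz matrix H of (2.7): row n contains
# h(p), ..., h(0) starting at column n.
H = np.zeros((N, N + p))
for n in range(N):
    H[n, n:n + p + 1] = h[::-1]

# Row n of H q reproduces u(n0+n) = sum_k h(k) t(n0+n-k) (noise-free);
# index p+n-k addresses t(n0+n-k) inside the stacked vector.
u_direct = np.array([sum(h[k] * t[p + n - k] for k in range(p + 1))
                     for n in range(N)])
assert np.allclose(H @ t, u_direct)
```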

From Figure 2.4, it can be seen that (2.7) describes the input-output behavior between the input of the Parallel → Serial block and the output of the Remove Cyclic Prefix block.

Let $\mathbf{a}_{n_0}$ denote the $N$-dimensional input vector of the Add Cyclic Prefix block, cf. also Figure 2.4. In this block³, the last $p$ elements of the vector are stacked together with all $N$ elements of the vector and are output as an $(N+p)$-dimensional vector. We can write this operation as a matrix-vector product, i.e., $\mathbf{q}_{n_0} = \mathbf{R}\mathbf{a}_{n_0}$, with
$$\mathbf{R} = \begin{bmatrix} \mathbf{0} \;\; \mathbf{I}_p \\ \mathbf{I}_N \end{bmatrix} \in \mathbb{R}^{(N+p)\times N}, \qquad (2.8)$$
$\mathbf{I}_p$ and $\mathbf{I}_N$ being identity matrices of dimension $p$ and $N$, respectively. With $\mathbf{H}_{\mathrm{cyc}} = \mathbf{H}\mathbf{R}$, we obtain the relation
$$\mathbf{b}_{n_0} = \mathbf{H}_{\mathrm{cyc}}\mathbf{a}_{n_0} + \mathbf{v}_{n_0}. \qquad (2.9)$$

³ We assume $N > p$.

[Fig. 2.4: DMT: The Channel, the Time Domain Equalizer, the Parallel/Serial Conversion, and the Cyclic Prefix. Input vectors …, a_{n_0}, a_{n_0+N+p}, … → Add Cyclic Prefix (length p) → …, q_{n_0}, q_{n_0+N+p}, … → Parallel/Serial → D/A Converter → Transmit Filter / Driver Stage → Channel (Twisted Pair) with additive Noise → Receive Filter (Lowpass) → A/D Converter → Time Domain Equalizer (adaptive FIR filter) → Serial/Parallel → Remove Cyclic Prefix (length p) → output vectors …, b_{n_0}, b_{n_0+N+p}, ….]

A straightforward calculation yields
$$\mathbf{H}_{\mathrm{cyc}}^{T} = \begin{bmatrix}
h(0) & h(1) & \cdots & h(p) & & \mathbf{0} \\
 & h(0) & h(1) & \cdots & h(p) & \\
 & & \ddots & & & \ddots \\
h(p) & & & \ddots & & \vdots \\
\vdots & \ddots & & & & h(1) \\
h(1) & \cdots & h(p) & & & h(0)
\end{bmatrix} \in \mathbb{R}^{N\times N},$$
which shows that the transposed matrix $\mathbf{H}_{\mathrm{cyc}}^{T}$ is a cyclic matrix: each row can be obtained by cyclic permutations of the other rows. It is well known [13, 59] that cyclic matrices can be diagonalized by means of the discrete Fourier transform (DFT) and the inverse discrete Fourier transform (IDFT). Let
$$\mathbf{F} = \left[\tfrac{1}{\sqrt{N}}\, e^{-j\frac{2\pi}{N}kl}\right]_{k,l=0,\dots,N-1} \quad \text{and} \quad \mathbf{F}^{-1} = \left[\tfrac{1}{\sqrt{N}}\, e^{+j\frac{2\pi}{N}kl}\right]_{k,l=0,\dots,N-1} \qquad (2.10)$$

denote the DFT - and the IDFT - matrix, respectively, and let
$$H(z) = \sum_{n=-\infty}^{\infty} h(n)\, z^{-n} = \sum_{n=0}^{p} h(n)\, z^{-n} \qquad (2.11)$$
denote the Z-transform [13, 59] of $\mathbf{h}$. Then,

$$\mathbf{F}^{-1}\mathbf{H}_{\mathrm{cyc}}^{T}\mathbf{F} = \begin{bmatrix}
H\!\left(e^{j\frac{2\pi}{N}0}\right) & & & \mathbf{0} \\
 & H\!\left(e^{j\frac{2\pi}{N}1}\right) & & \\
 & & \ddots & \\
\mathbf{0} & & & H\!\left(e^{j\frac{2\pi}{N}(N-1)}\right)
\end{bmatrix},$$
and transposing this equation (using $\mathbf{F} = \mathbf{F}^{T}$ and $\mathbf{F}^{-1} = \left(\mathbf{F}^{-1}\right)^{T}$),
$$\mathbf{F}\mathbf{H}_{\mathrm{cyc}}\mathbf{F}^{-1} = \begin{bmatrix}
H\!\left(e^{j\frac{2\pi}{N}0}\right) & & & \mathbf{0} \\
 & H\!\left(e^{j\frac{2\pi}{N}1}\right) & & \\
 & & \ddots & \\
\mathbf{0} & & & H\!\left(e^{j\frac{2\pi}{N}(N-1)}\right)
\end{bmatrix}. \qquad (2.12)$$

Note that the block IFFT in the transmitter performs a multiplication with $\mathbf{F}^{-1}$ and the block FFT in the receiver performs a multiplication with $\mathbf{F}$ using efficient fast Fourier algorithms, cf. Figure 2.5, such that $\mathbf{H}_{\mathrm{cyc}}$ is diagonalized and hence ISI is avoided.
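The diagonalization (2.12) can be confirmed numerically. This sketch is illustrative only: it builds the circulant matrix directly from its defining circular convolution, uses the unitary DFT matrix of (2.10), and exploits that the diagonal entries $H(e^{j 2\pi n/N})$ are exactly the length-$N$ DFT of the zero-padded impulse response:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 8, 3
h = rng.standard_normal(p + 1)              # shortened impulse response

# Circulant channel matrix: (H_cyc a)(n) = sum_k h(k) a((n-k) mod N).
Hcyc = np.zeros((N, N))
idx = np.arange(N)
for k in range(p + 1):
    Hcyc[idx, (idx - k) % N] += h[k]

# Unitary DFT matrix F of (2.10); F is symmetric, so F^{-1} = conj(F).
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
Finv = F.conj()

# F H_cyc F^{-1} is diagonal with entries H(e^{j 2 pi n / N}) = fft(h, N)[n].
D = F @ Hcyc @ Finv
assert np.allclose(D, np.diag(np.fft.fft(h, N)))
```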

[Fig. 2.5: DMT: The Channel, the Time Domain Equalizer, the Parallel/Serial Conversion, the Cyclic Prefix, the (I)FFT, the Conjugate Complex Extension, and the Frequency Domain Equalizer. Vector sequences: …, e_{n_0}, e_{n_0+N+p}, … → Conjugate Complex Extension → …, c_{n_0}, … → IFFT (length N) → Add Cyclic Prefix (length p) → Parallel/Serial → channel chain of Figure 2.3 → Serial/Parallel → Remove Cyclic Prefix (length p) → …, b_{n_0}, … → FFT (length N) → …, d_{n_0}, … → Remove Conjugate Complex Extension → …, f_{n_0}, … → Frequency Domain Equalizer → …, m_{n_0}, m_{n_0+N+p}, ….]

The blocks Conjugate Complex Extension and Remove Conjugate Complex Extension have the task of achieving a real valued transmit signal. This happens if the output vector $\mathbf{a}_{n_0}$ of the IDFT is real valued. In order to guarantee this constraint, the input vector of the IDFT has to fulfill a Hermitian symmetry condition, i.e., the vector (for even $N$)
$$\mathbf{c}_{n_0} = \begin{bmatrix} c_{n_0}(0) \\ \vdots \\ c_{n_0}(\tfrac{N}{2}) \\ \vdots \\ c_{n_0}(N-1) \end{bmatrix} \quad \text{with} \quad \mathbf{a}_{n_0} = \mathbf{F}^{-1}\mathbf{c}_{n_0}$$
has to satisfy [13, 59]
$$c_{n_0}(n) = c_{n_0}^{*}(N-n), \quad n = 1,\dots,\tfrac{N}{2}-1, \quad \text{and} \quad \Im\{c_{n_0}(0)\} = \Im\{c_{n_0}(\tfrac{N}{2})\} = 0, \qquad (2.13)$$
where the superscript $*$ denotes complex conjugation. The block Conjugate Complex Extension takes its $(\frac{N}{2}+1)$-dimensional complex input vector,
$$\mathbf{e}_{n_0} = \begin{bmatrix} e_{n_0}(0) \\ \vdots \\ e_{n_0}(\tfrac{N}{2}) \end{bmatrix},$$
for which the first and last elements are assumed to have vanishing imaginary parts, $\Im\{e_{n_0}(0)\} = \Im\{e_{n_0}(\tfrac{N}{2})\} = 0$, and extends it to the $N$-dimensional complex vector
$$\mathbf{c}_{n_0} = \begin{bmatrix} e_{n_0}(0) \\ \vdots \\ e_{n_0}(\tfrac{N}{2}) \\ e_{n_0}^{*}(\tfrac{N}{2}-1) \\ \vdots \\ e_{n_0}^{*}(1) \end{bmatrix}. \qquad (2.14)$$
On the other hand, the block Remove Conjugate Complex Extension inverts the operation of the block Conjugate Complex Extension. Mathematically, this can be written as a matrix-vector product, i.e., $\mathbf{f}_{n_0} = \mathbf{E}\mathbf{d}_{n_0}$, with
$$\mathbf{E} = \begin{bmatrix} \mathbf{I}_{\frac{N}{2}+1} & \mathbf{0} \end{bmatrix} \in \mathbb{R}^{(\frac{N}{2}+1)\times N} \qquad (2.15)$$
and $\mathbf{d}_{n_0}$ being the output vector of the DFT, $\mathbf{d}_{n_0} = \mathbf{F}\mathbf{b}_{n_0}$. For the nomenclature of the previously described input/output vectors, we also refer to Figure 2.5. Finally,


we obtain an input-output relation as
$$\begin{aligned}
\mathbf{f}_{n_0} &= \mathbf{E}\mathbf{d}_{n_0} \\
&= \mathbf{E}\mathbf{F}\mathbf{b}_{n_0} \\
&= \mathbf{E}\mathbf{F}\left(\mathbf{H}_{\mathrm{cyc}}\mathbf{a}_{n_0} + \mathbf{v}_{n_0}\right) \\
&= \mathbf{E}\mathbf{F}\mathbf{H}_{\mathrm{cyc}}\mathbf{a}_{n_0} + \mathbf{E}\mathbf{w}_{n_0} \\
&= \mathbf{E}\mathbf{F}\mathbf{H}_{\mathrm{cyc}}\mathbf{F}^{-1}\mathbf{c}_{n_0} + \mathbf{E}\mathbf{w}_{n_0} \\
&= \mathbf{D}\mathbf{e}_{n_0} + \mathbf{E}\mathbf{w}_{n_0}
\end{aligned} \qquad (2.16)$$
with the abbreviations
$$\mathbf{w}_{n_0} = \mathbf{F}\mathbf{v}_{n_0}$$
and
$$\mathbf{D} = \begin{bmatrix}
H\!\left(e^{j\frac{2\pi}{N}0}\right) & & & \mathbf{0} \\
 & H\!\left(e^{j\frac{2\pi}{N}1}\right) & & \\
 & & \ddots & \\
\mathbf{0} & & & H\!\left(e^{j\frac{2\pi}{N}\frac{N}{2}}\right)
\end{bmatrix}.$$
Let $\dagger$ be defined as
$$\dagger: \mathbb{C} \to \mathbb{C}, \qquad z \mapsto z^{\dagger} = \begin{cases} \frac{1}{z}, & z \neq 0 \\ 0, & z = 0 \end{cases}$$
and let $\mathbf{D}^{\dagger}$ denote the (Moore-Penrose) pseudo inverse [19] of $\mathbf{D}$, i.e.,
$$\mathbf{D}^{\dagger} = \begin{bmatrix}
H\!\left(e^{j\frac{2\pi}{N}0}\right)^{\dagger} & & & \mathbf{0} \\
 & H\!\left(e^{j\frac{2\pi}{N}1}\right)^{\dagger} & & \\
 & & \ddots & \\
\mathbf{0} & & & H\!\left(e^{j\frac{2\pi}{N}\frac{N}{2}}\right)^{\dagger}
\end{bmatrix}.$$
The Frequency Domain Equalizer (FDE), cf. also Figure 2.5, performs a multiplication of its input vector with the matrix $\mathbf{D}^{\dagger}$, i.e., $\mathbf{m}_{n_0} = \mathbf{D}^{\dagger}\mathbf{f}_{n_0}$, so that we finally obtain
$$\mathbf{m}_{n_0} = \mathbf{e}_{n_0} + \mathbf{D}^{\dagger}\mathbf{E}\mathbf{w}_{n_0}, \qquad (2.17)$$
with the additional requirement that $e_{n_0}(n) = 0$ if $H\!\left(e^{j\frac{2\pi}{N}n}\right) = 0$.
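Putting (2.16) and (2.17) together, a complete noise-free DMT round trip can be simulated in a few lines. This sketch is illustrative only: it relies on NumPy's `rfft`/`irfft` (which absorb the conjugate complex extension and its removal) and assumes $H(e^{j 2\pi n/N}) \neq 0$ on all bins, which holds almost surely for the random test channel:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 32, 8
h = rng.standard_normal(p + 1)        # overall (shortened) channel, length <= p+1

# Transmitter: Hermitian-symmetric subcarriers -> real IFFT -> cyclic prefix.
e = rng.standard_normal(N // 2 + 1) + 1j * rng.standard_normal(N // 2 + 1)
e[0], e[-1] = e[0].real, e[-1].real
a = np.fft.irfft(e, n=N)
tx = np.concatenate([a[-p:], a])

# Channel: linear convolution (noise-free here for clarity).
rx = np.convolve(tx, h)[:N + p]

# Receiver: remove prefix -> FFT -> remove extension -> one-tap FDE.
d = np.fft.rfft(rx[p:], n=N)          # first N/2 + 1 DFT bins
Hf = np.fft.rfft(h, n=N)              # H(e^{j 2 pi n / N}), n = 0..N/2
m = d / Hf                            # frequency-domain equalizer D^dagger

assert np.allclose(m, e)              # subcarrier symbols recovered exactly
```

The cyclic prefix is what makes the one-tap division exact: since the channel memory does not exceed $p$, the retained block is a circular convolution, so each subcarrier sees only a complex scaling.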
The block Mapping, see Figure 2.1, maps the bitstream onto complex symbols, according to specific constellations (usually QAM constellations). In principle, we are free to design these constellations, except that the constellations corresponding to subcarriers $0$ and $\frac{N}{2}$ have to be real valued (they correspond to $e_{n_0}(0)$ and $e_{n_0}(\frac{N}{2})$), and that $e_{n_0}(n) = 0$ if $H\!\left(e^{j\frac{2\pi}{N}n}\right) = 0$. However, there is another requirement that we have to take care of. We cannot transmit with arbitrarily high power, as this would violate the standards [9-12, 25-30] and is also impossible from an engineering point of view. In order to model this power constraint, we will assume that the average sum power⁴ of the vector $\mathbf{e}_{n_0}$ is smaller than or equal to a certain number $S_{\mathrm{DMT}}$. This power constraint now translates to a constraint on the possible signal constellations. We also want to emphasize that this model is only an approximation of the power constraint of a real system required by the standards [9-12, 25-30], but it allows an analysis of the system on the one hand, and is also accurate to some extent on the other hand. Note that in practical systems, there is an additional constraint on the power spectral density of the transmit signal.

There remain two blocks to explain. The Decision Device maps (rounds) the noisy elements of the vectors $\mathbf{m}_{n_0}$ back onto the constellations, whereas the block Inverse Mapping is an exact inversion of the block Mapping and produces a bitstream, which is hopefully equal to the bitstream emitted by the Source.

2.2 Discrete Multitone Modulation on a Cable Bundle


In real transmission scenarios, the various loops that are used for transmission are bundled together into a Cable Bundle. For example, consider the transmission between the Central Office and the Cabinet [48]. Since many customers have to be provided with high data rate links, the telecom providers try to use the same cable duct for the different loops as far as possible and to split the loops at the furthest possible position for economical reasons. Usually this split position is the Cabinet. A similar application for cable bundles might be data transmission from and to a mobile base station, where high data rates are required and a single loop does not have enough capacity. Of course, optical fibers would have sufficient capacity, but they are expensive⁵, and cable bundles could therefore be a good compromise between cost and data rate. On the other hand, it is intuitively clear that loops that are close to each other disturb each other and limit the performance. This is illustrated in Figure 2.6. The k-th receiver (here shown for the 2-nd receiver) receives not only the (distorted) signal from transmitter k plus (background) noise. It also receives signals from all other transmitters, no matter in which direction they transmit. We classify these additional receive signals by means of their origin: if the source of the signal is at the same location as the considered receiver, we call it Near-End Crosstalk (NEXT) [48], whereas if the source of the signal is at the other end of the transmission line, we call it Far-End Crosstalk (FEXT) [48]. Note that it is, at least conceptually, easier to cope with NEXT, since the NEXT producing signals are also known at receiver k (they are generated at the same location) and can therefore be subtracted [3]. This approach is called NEXT cancellation [48]. Another approach is to use different frequency bands for the different transmit directions. Therefore, we will assume in the subsequent sections and

⁴ To be more precise: the vector $\mathbf{e}_{n_0}$ is modeled as a random vector having a correlation matrix [34] with a trace smaller than or equal to $S_{\mathrm{DMT}}$.
⁵ In fact, it is not the fiber that is expensive but the equipment for modulating optical signals and, of course, laying the fiber. Fiber is cheaper than copper.

[Fig. 2.6: DMT: Transmission over a Cable Bundle. K transmitters and K receivers at each end of the bundle; FEXT arrives from transmitters at the far end of the line, NEXT from transmitters co-located with the considered receiver, in addition to background Noise.]

chapters that there is no Near-End Crosstalk, without (severe) loss of generality.


The more challenging disturbance is Far-End Crosstalk. In principle, there are two ways of dealing with these signals. One is to consider them as additional noise. The other, more intelligent, way is to develop algorithms that utilize the additional signals, which will result in better performance (higher capacity) but will require more computational effort. The design of such algorithms that achieve superior performance and are efficiently implementable is still an open problem. We will mainly treat this issue. To make things easier, from an analytical and economical point of view, we will assume that each loop is equipped with Discrete Multitone modulation (DMT). Note that DMT is an existing, widely used, and therefore inexpensive technology that is amenable to analysis, cf. Section 2.1.
The transmission system we are going to deal with is depicted in Figure 2.7
with a block definition as shown in Figure 2.8. Since it has several inputs and outputs, it can be considered as a Multiple - Input / Multiple - Output (MIMO) system.
We assume that all transmitters and receivers are synchronized, i.e., all transmitters use the same clock and sampling rate and, similarly, all receivers use the same
clock and sampling rate. Furthermore, all DMT modulators and demodulators are
assumed to operate with the same parameters: the (I)DFT lengths are all the same,
as well as the lengths of the Cyclic Prefixes.
Like in the single loop case, we start the analysis of this system with the part that models the channel, cf. Figure 2.9. Again, all involved signals are real valued, discrete-time signals and are denoted by $\mathbf{t}^{\langle k\rangle}$, $\mathbf{s}^{\langle k\rangle}$, $\mathbf{r}^{\langle k\rangle}$, and $\mathbf{u}^{\langle k\rangle}$, respectively, where the superscript bracket notation $\langle k\rangle$, $k = 1,\dots,K$, is chosen to avoid confusion between a signal on the k-th loop and the k-th power of a signal.

[Fig. 2.7: Multiple-Input / Multiple-Output DMT transmission system. K parallel chains: input sequences …, e_{n_0}^{⟨k⟩}, e_{n_0+N+p}^{⟨k⟩}, … → DMT Modulator → Transmit Unit → cable bundle (FEXT, Noise) → Time Domain Equalizer k → DMT Demodulator → output sequences …, f_{n_0}^{⟨k⟩}, f_{n_0+N+p}^{⟨k⟩}, … at the Receive Units, k = 1, …, K.]

[Fig. 2.8: Block definition. DMT Modulator: Conjugate Complex Extension → IFFT (length N) → Add Cyclic Prefix (length p) → Parallel/Serial; Transmit Unit: D/A Converter → Transmit Filter / Driver Stage; Receive Unit: Receive Filter (Lowpass) → A/D Converter; DMT Demodulator: Serial/Parallel → Remove Cyclic Prefix (length p) → FFT (length N) → Remove Conjugate Complex Extension.]

[Fig. 2.9: MIMO DMT: The Channel and the Time Domain Equalizers. Transmit signals t^{⟨1⟩}, …, t^{⟨K⟩} pass through their Transmit Units into the bundle (FEXT coupling, noise s^{⟨1⟩}, …, s^{⟨K⟩}); the receive signals r^{⟨1⟩}, …, r^{⟨K⟩} are filtered by Time Domain Equalizers 1, …, K, yielding u^{⟨1⟩}, …, u^{⟨K⟩}.]

The input-output behavior is approximated by linear and time-invariant convolutions of the input signals $\mathbf{t}^{\langle l\rangle} = [t^{\langle l\rangle}(n)]_{n=-\infty,\dots,\infty}$ with real valued impulse responses $\mathbf{g}^{\langle k,l\rangle} = [g^{\langle k,l\rangle}(n)]_{n=-\infty,\dots,\infty}$ plus additive noise terms $\mathbf{s}^{\langle k\rangle} = [s^{\langle k\rangle}(n)]_{n=-\infty,\dots,\infty}$. Note that $k, l = 1,\dots,K$. In contrast to the single loop scenario, the k-th received signal $\mathbf{r}^{\langle k\rangle} = [r^{\langle k\rangle}(n)]_{n=-\infty,\dots,\infty}$ does not only depend on the k-th transmit signal $\mathbf{t}^{\langle k\rangle}$. It also depends (linearly) on all other transmit signals $\mathbf{t}^{\langle l\rangle}$, $l \neq k$, i.e.,
$$\mathbf{r}^{\langle k\rangle} = \sum_{l=1}^{K} \mathbf{g}^{\langle k,l\rangle} * \mathbf{t}^{\langle l\rangle} + \mathbf{s}^{\langle k\rangle}, \qquad r^{\langle k\rangle}(n) = \sum_{l=1}^{K} \sum_{m=-\infty}^{\infty} g^{\langle k,l\rangle}(m)\, t^{\langle l\rangle}(n-m) + s^{\langle k\rangle}(n). \qquad (2.18)$$

The Time Domain Equalizers (TDEs) have a similar function as in the single loop case. Again, it is their task to shorten the impulse responses, and their behavior is modeled by convolutions with real valued, time-discrete impulse responses $\mathbf{e}^{\langle k\rangle} = [e^{\langle k\rangle}(n)]_{n=-\infty,\dots,\infty}$, i.e.,
$$\mathbf{u}^{\langle k\rangle} = \mathbf{e}^{\langle k\rangle} * \mathbf{r}^{\langle k\rangle}, \qquad u^{\langle k\rangle}(n) = \sum_{m=-\infty}^{\infty} e^{\langle k\rangle}(m)\, r^{\langle k\rangle}(n-m). \qquad (2.19)$$

For complexity reasons, we do not allow the k-th TDE to use input signals other than the k-th input signal $\mathbf{r}^{\langle k\rangle}$. Let $\mathbf{h}^{\langle k,l\rangle} = [h^{\langle k,l\rangle}(n)]_{n=-\infty,\dots,\infty}$ denote the convolution of $\mathbf{e}^{\langle k\rangle}$ and $\mathbf{g}^{\langle k,l\rangle}$, $\mathbf{h}^{\langle k,l\rangle} = \mathbf{e}^{\langle k\rangle} * \mathbf{g}^{\langle k,l\rangle}$, and let $\mathbf{z}^{\langle k\rangle} = [z^{\langle k\rangle}(n)]_{n=-\infty,\dots,\infty}$ denote the convolution of $\mathbf{e}^{\langle k\rangle}$ and $\mathbf{s}^{\langle k\rangle}$, $\mathbf{z}^{\langle k\rangle} = \mathbf{e}^{\langle k\rangle} * \mathbf{s}^{\langle k\rangle}$. Then, we obtain an input-output relation as
$$\mathbf{u}^{\langle k\rangle} = \sum_{l=1}^{K} \mathbf{h}^{\langle k,l\rangle} * \mathbf{t}^{\langle l\rangle} + \mathbf{z}^{\langle k\rangle}, \qquad u^{\langle k\rangle}(n) = \sum_{l=1}^{K} \sum_{m=-\infty}^{\infty} h^{\langle k,l\rangle}(m)\, t^{\langle l\rangle}(n-m) + z^{\langle k\rangle}(n). \qquad (2.20)$$

The coefficients of the TDEs are determined in the pre-transmission phase, such that all overall channel impulse responses $\mathbf{h}^{\langle k,l\rangle}$ have lengths shorter than or equal to $p+1$, $p$ being the length of the Cyclic Prefixes, or, to be more precise, they are determined such that
$$h^{\langle k,l\rangle}(n) = 0, \qquad n < 0 \ \text{or} \ n > p. \qquad (2.21)$$

Note that the calculation of the TDE coefficients is a nontrivial problem, since the impulse response of the k-th TDE, $\mathbf{e}^{\langle k\rangle}$, has to shorten all $\mathbf{g}^{\langle k,l\rangle}$, $l = 1,\dots,K$, simultaneously, and can therefore be an over-determined problem, which can only be solved in an approximate sense. We will address the issue of designing the coefficients of the TDEs in more detail later in Section 5.1.
The MIMO DMT transmission system, as depicted in Figure 2.7, has DMT demodulators at each TDE output. As seen in the previous section, the DMT demodulator performs a linear operation on its input signal
$$\mathbf{u}^{\langle k\rangle} = \sum_{l=1}^{K} \mathbf{h}^{\langle k,l\rangle} * \mathbf{t}^{\langle l\rangle} + \mathbf{z}^{\langle k\rangle}. \qquad (2.22)$$
We can therefore interchange the sum in (2.22) with the operation of the DMT demodulator and apply the analysis of single loop DMT (Section 2.1) to each element of the sum. The input-output behavior of the whole MIMO DMT system is then obtained by summation. Let
$$\dots, \mathbf{e}_{n_0}^{\langle k\rangle}, \mathbf{e}_{n_0+N+p}^{\langle k\rangle}, \dots \in \mathbb{C}^{\frac{N}{2}+1} \quad \text{and} \quad \dots, \mathbf{f}_{n_0}^{\langle k\rangle}, \mathbf{f}_{n_0+N+p}^{\langle k\rangle}, \dots \in \mathbb{C}^{\frac{N}{2}+1}$$
denote the input vector sequence of the k-th modulator and the output vector sequence of the k-th demodulator, respectively, and define
$$\mathbf{v}_{n_0}^{\langle k\rangle} = \begin{bmatrix} z^{\langle k\rangle}(n_0) \\ z^{\langle k\rangle}(n_0+1) \\ \vdots \\ z^{\langle k\rangle}(n_0+N-1) \end{bmatrix} \in \mathbb{R}^{N}.$$
Then the MIMO DMT input-output behavior can be immediately derived from equation (2.16) as
$$\mathbf{f}_{n_0}^{\langle k\rangle} = \sum_{l=1}^{K} \mathbf{D}^{\langle k,l\rangle}\, \mathbf{e}_{n_0}^{\langle l\rangle} + \mathbf{E}\mathbf{w}_{n_0}^{\langle k\rangle} \qquad (2.23)$$

with the abbreviations
$$\mathbf{w}_{n_0}^{\langle k\rangle} = \mathbf{F}\mathbf{v}_{n_0}^{\langle k\rangle}$$
and
$$\mathbf{D}^{\langle k,l\rangle} = \begin{bmatrix}
H^{\langle k,l\rangle}\!\left(e^{j\frac{2\pi}{N}0}\right) & & & \mathbf{0} \\
 & H^{\langle k,l\rangle}\!\left(e^{j\frac{2\pi}{N}1}\right) & & \\
 & & \ddots & \\
\mathbf{0} & & & H^{\langle k,l\rangle}\!\left(e^{j\frac{2\pi}{N}\frac{N}{2}}\right)
\end{bmatrix},$$
where
$$H^{\langle k,l\rangle}(z) = \sum_{n=0}^{p} h^{\langle k,l\rangle}(n)\, z^{-n} \qquad (2.24)$$


denotes the Z-transform of $\mathbf{h}^{\langle k,l\rangle}$, and $\mathbf{E}$ and $\mathbf{F}$ are defined in (2.15) and (2.10), respectively. With equation (2.23), we have found an expression that, in principle, can be used as a starting point for further analysis and optimization. Unfortunately, we have to deal with a set of K equations, since k takes values from 1 to K. For simplicity (and also for notational reasons) we will rewrite equation (2.23) in the following way. Let
$$\mathbf{x}_{n_0} = \begin{bmatrix} e_{n_0}^{\langle 1\rangle}(0) \\ \vdots \\ e_{n_0}^{\langle K\rangle}(0) \\ e_{n_0}^{\langle 1\rangle}(1) \\ \vdots \\ e_{n_0}^{\langle K\rangle}(1) \\ \vdots \\ e_{n_0}^{\langle 1\rangle}(\tfrac{N}{2}) \\ \vdots \\ e_{n_0}^{\langle K\rangle}(\tfrac{N}{2}) \end{bmatrix}, \quad
\mathbf{y}_{n_0} = \begin{bmatrix} f_{n_0}^{\langle 1\rangle}(0) \\ \vdots \\ f_{n_0}^{\langle K\rangle}(0) \\ f_{n_0}^{\langle 1\rangle}(1) \\ \vdots \\ f_{n_0}^{\langle K\rangle}(1) \\ \vdots \\ f_{n_0}^{\langle 1\rangle}(\tfrac{N}{2}) \\ \vdots \\ f_{n_0}^{\langle K\rangle}(\tfrac{N}{2}) \end{bmatrix}, \quad \text{and} \quad
\mathbf{n}_{n_0} = \begin{bmatrix} w_{n_0}^{\langle 1\rangle}(0) \\ \vdots \\ w_{n_0}^{\langle K\rangle}(0) \\ w_{n_0}^{\langle 1\rangle}(1) \\ \vdots \\ w_{n_0}^{\langle K\rangle}(1) \\ \vdots \\ w_{n_0}^{\langle 1\rangle}(\tfrac{N}{2}) \\ \vdots \\ w_{n_0}^{\langle K\rangle}(\tfrac{N}{2}) \end{bmatrix} \in \mathbb{C}^{(\frac{N}{2}+1)K}$$
denote stacked versions of the input, output and noise vectors, respectively, and let
$$\mathbf{H}_{\text{MIMO DMT}} = \begin{bmatrix}
\mathbf{H}^{(0)} & & & \mathbf{0} \\
 & \mathbf{H}^{(1)} & & \\
 & & \ddots & \\
\mathbf{0} & & & \mathbf{H}^{(\frac{N}{2})}
\end{bmatrix} \in \mathbb{C}^{(\frac{N}{2}+1)K \times (\frac{N}{2}+1)K}$$
with
$$\mathbf{H}^{(n)} = \begin{bmatrix}
H^{\langle 1,1\rangle}\!\left(e^{j\frac{2\pi}{N}n}\right) & \cdots & H^{\langle 1,K\rangle}\!\left(e^{j\frac{2\pi}{N}n}\right) \\
\vdots & \ddots & \vdots \\
H^{\langle K,1\rangle}\!\left(e^{j\frac{2\pi}{N}n}\right) & \cdots & H^{\langle K,K\rangle}\!\left(e^{j\frac{2\pi}{N}n}\right)
\end{bmatrix} \in \mathbb{C}^{K\times K}$$
denote a block diagonal matrix. Then, equation (2.23) translates to


$$\mathbf{y}_{n_0} = \mathbf{H}_{\text{MIMO DMT}}\, \mathbf{x}_{n_0} + \mathbf{n}_{n_0}, \qquad (2.25)$$
and we have found a compact mathematical description of the input-output behavior for synchronized DMT transmission over cable bundles. Like in the single loop case, we have to impose a power constraint. Instead of having power constraints for each loop, we will assume that the average sum power⁶ of the vector $\mathbf{x}_{n_0}$ is smaller than or equal to a certain number $S_{\text{MIMO DMT}}$. Mathematically, this is easier to handle, and if we set $S_{\text{MIMO DMT}} = K S_{\mathrm{DMT}}$, it can be shown by simulation that in real systems, this relaxed power constraint is sufficient to guarantee a correct power distribution on the various loops.
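The block diagonal structure in (2.25) means the large MIMO DMT matrix decouples into $\frac{N}{2}+1$ independent $K \times K$ matrices, one per subcarrier. A small illustrative sketch (arbitrary random shortened impulse responses; the per-bin matrices are obtained as DFT values of the $h^{\langle k,l\rangle}$):

```python
import numpy as np

rng = np.random.default_rng(5)
K, N, p = 2, 8, 3
M = N // 2 + 1

# Shortened overall impulse responses h^<k,l>(0..p) for all loop pairs.
h = rng.standard_normal((K, K, p + 1))

# Per-subcarrier K x K matrices H(n): entries H^<k,l>(e^{j 2 pi n / N}),
# i.e., the length-N DFT of each impulse response evaluated at bin n.
Hf = np.fft.rfft(h, n=N, axis=-1)            # shape (K, K, N/2 + 1)

# Assemble the block diagonal MIMO DMT channel matrix of (2.25).
H_mimo = np.zeros((M * K, M * K), dtype=complex)
for n in range(M):
    H_mimo[n * K:(n + 1) * K, n * K:(n + 1) * K] = Hf[:, :, n]

# Subcarriers decouple: y on subcarrier n depends only on x on subcarrier n.
x = rng.standard_normal(M * K) + 1j * rng.standard_normal(M * K)
y = H_mimo @ x
n0 = 2
assert np.allclose(y[n0 * K:(n0 + 1) * K], Hf[:, :, n0] @ x[n0 * K:(n0 + 1) * K])
```

This decoupling is what later allows per-subcarrier processing: instead of one huge matrix, only $\frac{N}{2}+1$ small $K \times K$ problems have to be handled.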

2.3 The Channel Model


In the previous two sections, we built up a mathematical model for the behavior of DMT transmission over one single loop and DMT transmission over a cable bundle. An inspection of the (final) equations (2.16) and (2.25) shows that both models have the same structure, i.e., the output vector is obtained by a multiplication of the input vector with a certain matrix plus an additive noise vector. As a consequence, both scenarios are special cases of the following general channel model,
$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}, \qquad (2.26)$$
where $\mathbf{y} \in \mathbb{C}^r$ and $\mathbf{x} \in \mathbb{C}^t$ denote the received and transmitted vectors, respectively. $\mathbf{A}$ is a deterministic $r \times t$ complex matrix, the channel matrix, and $\mathbf{n} \in \mathbb{C}^r$ is the noise vector. The transmitter is constrained in its total power⁷ to $S$,
$$\mathrm{E}\{\mathbf{x}^H\mathbf{x}\} \le S, \qquad (2.27)$$
or, equivalently, since $\mathbf{x}^H\mathbf{x} = \mathrm{tr}\left(\mathbf{x}\mathbf{x}^H\right)$, and expectation and trace commute,
$$\mathrm{tr}\left(\mathrm{E}\{\mathbf{x}\mathbf{x}^H\}\right) \le S. \qquad (2.28)$$
Observe that all occurring vectors are modeled as random vectors, so that the expectation operation $\mathrm{E}\{\cdot\}$ in equations (2.27) and (2.28) is well defined. We will
discuss related definitions and properties of complex random vectors in Section
3.1.
Furthermore, we want to emphasize that the channel matrix as well as the noise
characteristics are assumed to be known at the receiver and the transmitter. This
means that a real system has to determine (estimate) these parameters in a pretransmission phase before the effective information transmission starts.
If we compare the results obtained for DMT transmission over one single loop, cf. equations (2.16) and (2.17), with this channel model, we conclude that $\mathbf{A}$ is a diagonal matrix and we have $r = t = \frac{N}{2}+1$ in this case. The Frequency Domain Equalizer (FDE) even yields an identity channel matrix.

In the MIMO DMT case, cf. equation (2.25), the channel matrix is no longer a diagonal matrix. It is a block diagonal matrix. All sub-matrices on the diagonal have dimension $K \times K$, whereas the whole matrix has dimensions $r = t = \left(\frac{N}{2}+1\right)K$. Observe that there is no FEXT if and only if $\mathbf{A}$ is a diagonal matrix.
⁶ To be more precise: the vector $\mathbf{x}_{n_0}$ is modeled as a random vector having a correlation matrix [34] with a trace smaller than or equal to $S_{\text{MIMO DMT}}$.
⁷ The superscript $H$ denotes Hermitian transposition, i.e., transposition and complex conjugation.

In the following chapters, we will try to analyze and optimize transmission over channels of the form (2.26), (2.27) and (2.28). We will develop algorithms that are adapted to this transmission scenario and that can be efficiently implemented. We will show that in the special case of a diagonal or block diagonal channel matrix, the computational effort can be remarkably reduced. We also want to emphasize that the complex channel as defined in equations (2.26), (2.27) and (2.28) is equivalent to a real channel of doubled dimension. Let $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the element-wise real and imaginary part, respectively. With
$$\tilde{\mathbf{x}} = \begin{bmatrix} \Re\{\mathbf{x}\} \\ \Im\{\mathbf{x}\} \end{bmatrix} \in \mathbb{R}^{2t}, \quad \tilde{\mathbf{y}} = \begin{bmatrix} \Re\{\mathbf{y}\} \\ \Im\{\mathbf{y}\} \end{bmatrix} \in \mathbb{R}^{2r}, \quad \tilde{\mathbf{n}} = \begin{bmatrix} \Re\{\mathbf{n}\} \\ \Im\{\mathbf{n}\} \end{bmatrix} \in \mathbb{R}^{2r},$$
and
$$\tilde{\mathbf{A}} = \begin{bmatrix} \Re\{\mathbf{A}\} & -\Im\{\mathbf{A}\} \\ \Im\{\mathbf{A}\} & \Re\{\mathbf{A}\} \end{bmatrix} \in \mathbb{R}^{2r\times 2t},$$
the channel (2.26) can be similarly written as
$$\tilde{\mathbf{y}} = \tilde{\mathbf{A}}\tilde{\mathbf{x}} + \tilde{\mathbf{n}}. \qquad (2.29)$$
The power constraints (2.27) and (2.28) translate to
$$\mathrm{E}\{\tilde{\mathbf{x}}^T\tilde{\mathbf{x}}\} \le S \qquad (2.30)$$
and
$$\mathrm{tr}\left(\mathrm{E}\{\tilde{\mathbf{x}}\tilde{\mathbf{x}}^T\}\right) \le S, \qquad (2.31)$$
respectively. Note that the real and complex channel descriptions are fully equivalent.
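This equivalence can be confirmed numerically. A small illustrative check with random data, where `widen_vec` and `widen_mat` are hypothetical helper names for the $\tilde{\cdot}$ mappings above:

```python
import numpy as np

rng = np.random.default_rng(6)
r, t = 3, 4
A = rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))
x = rng.standard_normal(t) + 1j * rng.standard_normal(t)

def widen_vec(v):
    """v -> v~ = [Re v; Im v]."""
    return np.concatenate([v.real, v.imag])

def widen_mat(M):
    """M -> M~ = [[Re M, -Im M], [Im M, Re M]]."""
    return np.block([[M.real, -M.imag], [M.imag, M.real]])

# (2.29): the widened real system reproduces the complex one ...
assert np.allclose(widen_mat(A) @ widen_vec(x), widen_vec(A @ x))
# ... and the power is preserved: x^H x = x~^T x~, cf. (2.27) and (2.30).
assert np.isclose((x.conj() @ x).real, widen_vec(x) @ widen_vec(x))
```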
Since the noise is modeled as a random vector, the noise characteristics are specified by a (multivariate) joint probability density function (p.d.f.), if it exists. If not explicitly mentioned otherwise, we will assume throughout this manuscript that $\tilde{\mathbf{n}}$ is distributed according to a $2r$-dimensional, zero-mean, Gaussian distribution [34], i.e., its p.d.f. is given by
$$f_{\tilde{\mathbf{n}}}(\tilde{\mathbf{x}}) = \frac{1}{\sqrt{\det\left(2\pi \mathbf{C}_{\tilde{\mathbf{n}}}\right)}}\, e^{-\frac{1}{2}\left(\tilde{\mathbf{x}}-\boldsymbol{\mu}_{\tilde{\mathbf{n}}}\right)^T \mathbf{C}_{\tilde{\mathbf{n}}}^{-1} \left(\tilde{\mathbf{x}}-\boldsymbol{\mu}_{\tilde{\mathbf{n}}}\right)}, \qquad (2.32)$$
where $\boldsymbol{\mu}_{\tilde{\mathbf{n}}} = \mathrm{E}\{\tilde{\mathbf{n}}\} = \mathbf{0} \in \mathbb{R}^{2r}$ denotes the (zero) expectation/mean vector, and $\mathbf{C}_{\tilde{\mathbf{n}}} = \mathrm{E}\{(\tilde{\mathbf{n}}-\boldsymbol{\mu}_{\tilde{\mathbf{n}}})(\tilde{\mathbf{n}}-\boldsymbol{\mu}_{\tilde{\mathbf{n}}})^T\} = \mathrm{E}\{\tilde{\mathbf{n}}\tilde{\mathbf{n}}^T\} \in \mathbb{R}^{2r\times 2r}$ denotes the covariance matrix. As already mentioned, we will assume that the covariance matrix $\mathbf{C}_{\tilde{\mathbf{n}}}$ is known both at the receiver and the transmitter, since the covariance matrix completely specifies the probability distribution of a real valued, zero-mean, Gaussian random vector.
If we deal with the complex channel model, cf. equation (2.26), we have to consider a complex random noise vector as well. It is well known [41] that knowledge of the covariance matrix of a complex valued, zero-mean, Gaussian random vector is not sufficient for a complete statistical description. Only for a subclass, the so-called rotationally invariant random vectors, does the covariance matrix fully specify the p.d.f. We will treat the theory of complex random vectors in Section 3.1.

3. ENTROPY AND CAPACITY

In this chapter, it is our goal to calculate the capacity [6] for channels of the form
(2.26), (2.27) and (2.28). Since this is the complex channel representation, we have
to deal with complex random vectors. As already mentioned in Section 2.3, for
rotationally variant Gaussian complex random vectors, mean and covariance matrix are not sufficient for a statistical description. The knowledge of the so-called
pseudo-covariance matrix1 is also mandatory [41]. Therefore, the first section
(Section 3.1) of this chapter is dedicated to the theory of complex random vectors
and presents the main definitions and theorems. Furthermore, the concept of Entropy is extended from the real to the complex case, and a Generalized Maximum
Entropy Theorem is proved, cf. also [57]. In contrast to previous work [41, 58],
this theorem also takes into account the pseudo-covariance matrix and strengthens
the results known before. With the introduction of a rotationally invariant analogue of a complex random vector, we are able to show that the main statement
is independent of the probability distribution of the considered complex random
vector.
The reason why we give such a detailed introduction to complex random vectors is that we will show in Section 4.1 that the noise vector $\mathbf{E}\mathbf{w}_{n_0}$ in (2.16), and in turn the noise vector $\mathbf{n}_{n_0}$ in (2.25), are rotationally variant in general, and therefore the pseudo-covariance matrices of these noise vectors should be taken into account. In practical systems, as well as in the literature, this fact is simply neglected,
resulting in an unnecessary performance loss.
We will calculate the capacity for three cases. First, for rotationally invariant
noise, and second, for rotationally variant noise, where the pseudo-covariance matrix is taken into account. For the third case, see also [55], we assume that the
noise is rotationally variant but it is erroneously believed that the noise is rotationally invariant, so that the pseudo-covariance matrix is neglected. This results in a
decreased capacity and we will calculate the capacity loss. For related literature
we also refer to [33, 39, 62].
¹ Note that the prefix "pseudo" does not tell us anything about the definition of this matrix. This is probably one of the reasons why other appellations, such as complementary covariance matrix (see [34]), are circulating in the literature as well. None of these appellations is really satisfactory in our opinion. In fact, this matrix is the cross-covariance matrix [34] of the considered random vector and its complex conjugate. However, in the subsequent sections we will maintain the widely used terminology pseudo-covariance matrix.


3.1 The Maximum Entropy Theorem for Complex Random Vectors


3.1.1 Preliminaries
A complex random vector $\mathbf{x} \in \mathbb{C}^n$ is defined as a random vector of the form
$$\mathbf{x} = \Re\{\mathbf{x}\} + j\,\Im\{\mathbf{x}\}, \qquad (3.1)$$
where the real and imaginary parts, $\Re\{\mathbf{x}\}$ and $\Im\{\mathbf{x}\}$, are real random vectors. The expectation (mean) vector of a real random vector is naturally generalized to the complex case as $\boldsymbol{\mu}_{\mathbf{x}} = \mathrm{E}\{\mathbf{x}\} = \mathrm{E}\{\Re\{\mathbf{x}\}\} + j\,\mathrm{E}\{\Im\{\mathbf{x}\}\}$. The statistical properties of $\mathbf{x} = \Re\{\mathbf{x}\} + j\,\Im\{\mathbf{x}\}$ are determined by the joint probability density function (p.d.f.) of the real random vector $\tilde{\mathbf{x}} \in \mathbb{R}^{2n}$ consisting of its real and imaginary parts, $\tilde{\mathbf{x}} = \begin{bmatrix} \Re\{\mathbf{x}\} \\ \Im\{\mathbf{x}\} \end{bmatrix}$. A complex random vector $\mathbf{x}$ is said to be Gaussian if the real random vector $\tilde{\mathbf{x}}$ is Gaussian. Thus, to specify the distribution of a complex Gaussian random vector $\mathbf{x}$, it is necessary to specify the expectation vector $\boldsymbol{\mu}_{\tilde{\mathbf{x}}} = \mathrm{E}\{\tilde{\mathbf{x}}\} \in \mathbb{R}^{2n}$ (or, equivalently, $\boldsymbol{\mu}_{\mathbf{x}} \in \mathbb{C}^n$) and the covariance matrix of $\tilde{\mathbf{x}}$, namely,
$$\mathbf{C}_{\tilde{\mathbf{x}}} = \mathrm{E}\{(\tilde{\mathbf{x}}-\boldsymbol{\mu}_{\tilde{\mathbf{x}}})(\tilde{\mathbf{x}}-\boldsymbol{\mu}_{\tilde{\mathbf{x}}})^H\} = \mathrm{E}\{(\tilde{\mathbf{x}}-\boldsymbol{\mu}_{\tilde{\mathbf{x}}})(\tilde{\mathbf{x}}-\boldsymbol{\mu}_{\tilde{\mathbf{x}}})^T\} \in \mathbb{R}^{2n\times 2n}. \qquad (3.2)$$

Using the covariance matrix

C_x = E\{(x - \mu_x)(x - \mu_x)^H\}    (3.3)

and the pseudo-covariance matrix

P_x = E\{(x - \mu_x)(x - \mu_x)^T\},    (3.4)

one finds, after some algebra,

C_{\tilde{x}} = \begin{pmatrix} E\{\Re\{x - \mu_x\}\Re\{x - \mu_x\}^T\} & E\{\Re\{x - \mu_x\}\Im\{x - \mu_x\}^T\} \\ E\{\Im\{x - \mu_x\}\Re\{x - \mu_x\}^T\} & E\{\Im\{x - \mu_x\}\Im\{x - \mu_x\}^T\} \end{pmatrix}
= \frac{1}{2}\underbrace{\begin{pmatrix} \Re\{C_x\} & -\Im\{C_x\} \\ \Im\{C_x\} & \Re\{C_x\} \end{pmatrix}}_{\tilde{C}_x} + \frac{1}{2}\underbrace{\begin{pmatrix} \Re\{P_x\} & \Im\{P_x\} \\ \Im\{P_x\} & -\Re\{P_x\} \end{pmatrix}}_{\hat{P}_x},    (3.5)

from which it follows that the knowledge of \mu_x, C_x, and P_x also determines a complex Gaussian random vector completely. Conversely, by the following relations,

\Re\{C_x\} = E\{\Re\{x - \mu_x\}\Re\{x - \mu_x\}^T\} + E\{\Im\{x - \mu_x\}\Im\{x - \mu_x\}^T\},
\Im\{C_x\} = E\{\Im\{x - \mu_x\}\Re\{x - \mu_x\}^T\} - E\{\Re\{x - \mu_x\}\Im\{x - \mu_x\}^T\},
\Re\{P_x\} = E\{\Re\{x - \mu_x\}\Re\{x - \mu_x\}^T\} - E\{\Im\{x - \mu_x\}\Im\{x - \mu_x\}^T\},
\Im\{P_x\} = E\{\Im\{x - \mu_x\}\Re\{x - \mu_x\}^T\} + E\{\Re\{x - \mu_x\}\Im\{x - \mu_x\}^T\},    (3.6)

C_x and P_x can be calculated from C_{\tilde{x}} as well.
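Since (3.5) and (3.6) are purely algebraic identities, they hold exactly for the empirical distribution of any finite set of samples. The following NumPy sketch (our illustration, not part of the thesis; the sample construction and all variable names are assumptions) checks both directions to machine precision:

```python
import numpy as np

# Empirical check of decomposition (3.5) and the inverse relations (3.6).
rng = np.random.default_rng(0)
n, N = 3, 200

# N samples of a complex random vector, centered so that mu_x = 0;
# mixing in the conjugate makes the vector rotationally variant (P_x != 0).
Z = rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))
X = Z + 0.4 * Z.conj()
X -= X.mean(axis=1, keepdims=True)

C = X @ X.conj().T / N          # covariance matrix C_x = E{x x^H}
P = X @ X.T / N                 # pseudo-covariance matrix P_x = E{x x^T}

Xt = np.vstack([X.real, X.imag])
Ct = Xt @ Xt.T / N              # covariance of the stacked real vector x-tilde

# decomposition (3.5)
rhs = 0.5 * np.block([[C.real, -C.imag], [C.imag, C.real]]) \
    + 0.5 * np.block([[P.real, P.imag], [P.imag, -P.real]])
assert np.allclose(Ct, rhs)

# inverse relations (3.6): recover C_x and P_x from the blocks of Ct
assert np.allclose(C.real, Ct[:n, :n] + Ct[n:, n:])
assert np.allclose(C.imag, Ct[n:, :n] - Ct[:n, n:])
assert np.allclose(P.real, Ct[:n, :n] - Ct[n:, n:])
assert np.allclose(P.imag, Ct[n:, :n] + Ct[:n, n:])
```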


Proposition 3.1 Let x ∈ C^n denote a complex random vector with expectation vector \mu_x, covariance matrix C_x, and pseudo-covariance matrix P_x. Let A ∈ C^{m×n} and b ∈ C^m denote a deterministic matrix and a deterministic vector, respectively, and consider the complex random vector y ∈ C^m obtained by the affine transformation y = Ax + b. Then,

\mu_y = A\mu_x + b,    C_y = AC_xA^H,    and    P_y = AP_xA^T.

Proof. y - \mu_y = Ax + b - A\mu_x - b = A(x - \mu_x), and therefore C_y = E\{(y - \mu_y)(y - \mu_y)^H\} = E\{A(x - \mu_x)(x - \mu_x)^HA^H\} = AC_xA^H and P_y = E\{(y - \mu_y)(y - \mu_y)^T\} = E\{A(x - \mu_x)(x - \mu_x)^TA^T\} = AP_xA^T.

Definition 3.2 A complex random vector x ∈ C^n is called rotationally invariant (cf. [32]) or proper (cf. [41]) or circularly symmetric (cf. [58]) if its pseudo-covariance matrix P_x vanishes.

Corollary 3.3 Rotational invariance is preserved under affine transformations, i.e.,

P_x = 0  ⟹  P_{Ax+b} = 0.

Proof. Apply Proposition 3.1 to Definition 3.2.


Decomposition (3.5) suggests the following definition² of mappings \tilde{x}, \tilde{A}, and \hat{A} (note that \tilde{x} and \tilde{A} can be easily distinguished considering their domains):

Definition 3.4

\tilde{x}: C^n → R^{2n},    x ↦ \tilde{x} = \begin{pmatrix} \Re\{x\} \\ \Im\{x\} \end{pmatrix},

\tilde{A}: C^{n×m} → R^{2n×2m},    A ↦ \tilde{A} = \begin{pmatrix} \Re\{A\} & -\Im\{A\} \\ \Im\{A\} & \Re\{A\} \end{pmatrix},    and

\hat{A}: C^{n×m} → R^{2n×2m},    A ↦ \hat{A} = \begin{pmatrix} \Re\{A\} & \Im\{A\} \\ \Im\{A\} & -\Re\{A\} \end{pmatrix},

so that for a rotationally invariant random vector, C_{\tilde{x}} = \frac{1}{2}\tilde{C}_x.
These mappings have some remarkable properties, as stated in the next lemmas, propositions and theorems:

² Note that we will usually apply the mapping \tilde{A} to a covariance matrix C and the mapping \hat{A} to a pseudo-covariance matrix P (despite the general results about the mapping properties).


Lemma 3.5

C = AB  ⟹  \tilde{C} = \tilde{A}\tilde{B}
C = A + B  ⟹  \tilde{C} = \tilde{A} + \tilde{B}
C = A^H  ⟹  \tilde{C} = \tilde{A}^T
C = A^{-1}  ⟹  \tilde{C} = \tilde{A}^{-1}
\det\tilde{A} = |\det(A)|^2 = \det(AA^H),  A ∈ C^{n×n}
z = x + y  ⟹  \tilde{z} = \tilde{x} + \tilde{y}
y = Ax  ⟹  \tilde{y} = \tilde{A}\tilde{x}
\Re\{x^Hy\} = \tilde{x}^T\tilde{y}
U ∈ C^{n×n} is unitary  ⟹  \tilde{U} ∈ R^{2n×2n} is orthonormal
C ∈ C^{n×n} is non-negative definite  ⟹  \tilde{C} ∈ R^{2n×2n} is non-negative definite

Proof. [58]. The first three statements are immediate. The fourth statement follows from the first and the fact that \tilde{I}_n = I_{2n}, where I_n denotes the n×n identity matrix. The fifth statement follows from

\det\tilde{A} = \det\left(\begin{pmatrix} I_n & jI_n \\ 0 & I_n \end{pmatrix}\tilde{A}\begin{pmatrix} I_n & -jI_n \\ 0 & I_n \end{pmatrix}\right) = \det\begin{pmatrix} A & 0 \\ \Im\{A\} & \bar{A} \end{pmatrix} = \det(A)\det(\bar{A}) = |\det(A)|^2.

The statements 6 - 9 are immediate. The last statement follows from

\tilde{x}^T\tilde{C}\tilde{x} = \Re\{x^HCx\} = x^HCx ≥ 0,

where the statements 7 and 8 were used.
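The statements of Lemma 3.5 are easy to confirm numerically; the following NumPy sketch (ours, not from the thesis) checks statements 1-5, 7, and 8 on random matrices and vectors:

```python
import numpy as np

def tmap(A):
    """Tilde mapping of Definition 3.4 for matrices."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

def tvec(x):
    """Tilde mapping of Definition 3.4 for vectors."""
    return np.concatenate([x.real, x.imag])

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# statements 1-4: products, sums, Hermitian transposition (which becomes
# plain transposition) and inversion are respected by the tilde mapping
assert np.allclose(tmap(A @ B), tmap(A) @ tmap(B))
assert np.allclose(tmap(A + B), tmap(A) + tmap(B))
assert np.allclose(tmap(A.conj().T), tmap(A).T)
assert np.allclose(tmap(np.linalg.inv(A)), np.linalg.inv(tmap(A)))

# statement 5: det(A-tilde) = |det A|^2
assert np.isclose(np.linalg.det(tmap(A)), abs(np.linalg.det(A)) ** 2)

# statements 7 and 8 (np.vdot conjugates its first argument: x^H y)
assert np.allclose(tvec(A @ x), tmap(A) @ tvec(x))
assert np.isclose(np.vdot(x, y).real, tvec(x) @ tvec(y))
```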
Next, we will establish some important relations between the mappings³ \tilde{A} and \hat{A} of Definition 3.4:

Lemma 3.6

C = A\bar{B}  ⟹  \tilde{C} = \hat{A}\hat{B}
C = A\bar{B}  ⟹  \hat{C} = \hat{A}\tilde{B}
C = AB  ⟹  \hat{C} = \tilde{A}\hat{B}

Proof. The first two statements follow from

A\bar{B} = \Re\{A\}\Re\{B\} + \Im\{A\}\Im\{B\} + j(\Im\{A\}\Re\{B\} - \Re\{A\}\Im\{B\})

and

\hat{A}\hat{B} = \begin{pmatrix} \Re\{A\}\Re\{B\} + \Im\{A\}\Im\{B\} & \Re\{A\}\Im\{B\} - \Im\{A\}\Re\{B\} \\ \Im\{A\}\Re\{B\} - \Re\{A\}\Im\{B\} & \Re\{A\}\Re\{B\} + \Im\{A\}\Im\{B\} \end{pmatrix}

and

\hat{A}\tilde{B} = \begin{pmatrix} \Re\{A\}\Re\{B\} + \Im\{A\}\Im\{B\} & \Im\{A\}\Re\{B\} - \Re\{A\}\Im\{B\} \\ \Im\{A\}\Re\{B\} - \Re\{A\}\Im\{B\} & -\Re\{A\}\Re\{B\} - \Im\{A\}\Im\{B\} \end{pmatrix},

respectively, whereas the last statement follows from

AB = \Re\{A\}\Re\{B\} - \Im\{A\}\Im\{B\} + j(\Im\{A\}\Re\{B\} + \Re\{A\}\Im\{B\})

and

\tilde{A}\hat{B} = \begin{pmatrix} \Re\{A\}\Re\{B\} - \Im\{A\}\Im\{B\} & \Re\{A\}\Im\{B\} + \Im\{A\}\Re\{B\} \\ \Im\{A\}\Re\{B\} + \Re\{A\}\Im\{B\} & \Im\{A\}\Im\{B\} - \Re\{A\}\Re\{B\} \end{pmatrix}.

³ In contrast to Lemma 3.5, where the mappings \tilde{x} and \tilde{A} were studied.
The following proposition tells us that the eigenvalues of \hat{A} always occur in pairs ±λ:

Proposition 3.7 Let λ denote an eigenvalue of \hat{A} with algebraic multiplicity m [24]. Then -λ is also an eigenvalue of \hat{A} with the same algebraic multiplicity m.

Proof. We have,

\begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}\hat{A}\begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & -I_n \\ I_n & 0 \end{pmatrix}\begin{pmatrix} \Re\{A\} & \Im\{A\} \\ \Im\{A\} & -\Re\{A\} \end{pmatrix}\begin{pmatrix} 0 & I_n \\ -I_n & 0 \end{pmatrix} = -\hat{A},

and therefore,

\det(\hat{A} - λI_{2n}) = \det(-\hat{A} - λI_{2n}) = (-1)^{2n}\det(\hat{A} + λI_{2n}) = \det(\hat{A} + λI_{2n}),

which implies the statement.
For a symmetric (not Hermitian) A ∈ C^{n×n}, i.e., A^T = A, we are now able to express all eigenvalues of \hat{A} in terms of the original matrix A:

Theorem 3.8 Let A ∈ C^{n×n} denote a symmetric matrix, i.e., A^T = A. Then all eigenvalues of \hat{A} are real valued and the non-negative eigenvalues⁴ of \hat{A} are the singular values [19] of A.

Proof. Since A is a symmetric matrix, \Re\{A\} and \Im\{A\} are also symmetric matrices. Therefore \hat{A} is a symmetric matrix as well, which implies that its eigenvalues are real valued.
Due to the first statement of Lemma 3.6 (setting B = A^T = A, so that \bar{B} = A^H) we have,

\hat{A}\hat{A} = \widetilde{AA^H},

⁴ According to Proposition 3.7 we then have knowledge of all (also the negative) eigenvalues.


and furthermore (λ is real), applying Lemma 3.5,

\hat{A}\hat{A} - λ^2I_{2n} = \widetilde{AA^H - λ^2I_n},

and consequently,

\det(\hat{A} - λI_{2n})\det(\hat{A} + λI_{2n}) = \det(\hat{A}\hat{A} - λ^2I_{2n}) = |\det(AA^H - λ^2I_n)|^2.

The eigenvalues of \hat{A}\hat{A} are the squared eigenvalues of \hat{A} and are equal to the eigenvalues of AA^H, which are the squared singular values of A. Hence, the non-negative eigenvalues of \hat{A} are the singular values of A.
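Theorem 3.8 together with Proposition 3.7 predicts the complete spectrum of \hat{A} for a symmetric A: it consists of the singular values of A and their negatives. A NumPy check (ours, illustrative only):

```python
import numpy as np

def hmap(A):
    """Hat mapping of Definition 3.4."""
    return np.block([[A.real, A.imag], [A.imag, -A.real]])

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = M + M.T                                  # complex symmetric (not Hermitian)

evals = np.linalg.eigvalsh(hmap(A))          # hmap(A) is real symmetric
svals = np.linalg.svd(A, compute_uv=False)   # singular values of A

# eigenvalues of A-hat are exactly the singular values of A and their negatives
assert np.allclose(np.sort(evals), np.sort(np.concatenate([-svals, svals])))
```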
The previous result gives us knowledge about the eigenvalues of \hat{A} for a symmetric matrix A. But what about its eigenvectors? Can we obtain an analogous result? This question is mainly of theoretical interest, since Theorem 3.8 together with Proposition 3.7 is sufficient to prove the subsequent results. However, in order to get a deeper understanding of the underlying structure, we will present another proof of Theorem 3.8, which gives insight into the eigenvector problem as well. The proof is strongly related to the not so well known Takagi's factorization [24] that applies to symmetric complex matrices:
Theorem 3.9 (Takagi's factorization) Let A ∈ C^{n×n} denote a symmetric matrix, i.e., A^T = A. Then there exist a unitary matrix Q ∈ C^{n×n} and a real valued, diagonal matrix Σ ∈ R^{n×n} with non-negative entries, i.e.,

Σ = \begin{pmatrix} σ_1 & & 0 \\ & \ddots & \\ 0 & & σ_n \end{pmatrix},    σ_i ≥ 0,  i = 1, ..., n,

such that

A = QΣQ^T.    (3.7)

The columns of Q are an orthonormal set of eigenvectors for AA^H, and the corresponding diagonal entries of Σ are the non-negative square roots of the corresponding eigenvalues of AA^H, i.e., the singular values [19] of A.

Proof. [24].

We want to emphasize that factorization (3.7) represents at the same time the Singular Value Decomposition (SVD) [19] of A.
As a corollary we (re-)obtain a stronger version of Theorem 3.8 and Proposition 3.7 (for a symmetric A):

Corollary 3.10 Let A ∈ C^{n×n} denote a symmetric matrix, i.e., A^T = A. Then there exist a unitary matrix Q ∈ C^{n×n} and a real valued, diagonal matrix Σ ∈ R^{n×n} with non-negative entries, such that

\hat{A} = \tilde{Q}\hat{Σ}\tilde{Q}^T.    (3.8)

Q and Σ are as in Theorem 3.9; in particular, the diagonal entries of Σ are the singular values of A.


Proof.

\hat{A} = \widehat{Q(ΣQ^T)} = \tilde{Q}\,\widehat{ΣQ^T}    (3.9)
       = \tilde{Q}\,\hat{Σ}\,\widetilde{Q^H}    (3.10)
       = \tilde{Q}\,\hat{Σ}\,\tilde{Q}^T,    (3.11)

where (3.9) and (3.10) are consequences of statements 3 and 2 of Lemma 3.6, respectively (note that ΣQ^T = Σ\overline{Q^H}), and (3.11) follows from the third statement of Lemma 3.5.

We want to emphasize that factorization (3.8) represents the eigenvalue decomposition of \hat{A}. Note that the diagonal eigenvalue matrix \hat{Σ} has the structure

\hat{Σ} = \begin{pmatrix} Σ & 0 \\ 0 & -Σ \end{pmatrix} = \mathrm{diag}(σ_1, ..., σ_n, -σ_1, ..., -σ_n),

with σ_i being the singular values of A. It is very remarkable that the orthonormal eigenvector matrix is of the form \tilde{Q} for a unitary matrix Q.
Corollary 3.11 Let P_x = E\{(x - \mu_x)(x - \mu_x)^T\} denote the pseudo-covariance matrix of a complex random vector x ∈ C^n. Then there exist a unitary matrix Q_x ∈ C^{n×n} and a real valued, diagonal matrix Σ_x ∈ R^{n×n} with non-negative entries, such that

\hat{P}_x = \tilde{Q}_x\hat{Σ}_x\tilde{Q}_x^T.    (3.12)

The diagonal entries of Σ_x are the singular values of P_x.

Proof. P_x ∈ C^{n×n} and P_x^T = P_x.

Until now, we were able to express the eigenvalues of \hat{P}_x in terms of P_x. We will see later that it is very important to find a similar relation for the determinant of C_{\tilde{x}}. Due to decomposition (3.5) we can expect that \det(C_{\tilde{x}}) depends on the covariance matrix C_x and the pseudo-covariance matrix P_x. In order to solve this problem, we need the following
Definition 3.12 A matrix B ∈ C^{n×n} is called generalized Cholesky factor of a positive definite Hermitian matrix A ∈ C^{n×n} if it satisfies the equation

A = BB^H.


Since \det A = |\det B|^2, a generalized Cholesky factor is always a non-singular matrix. Note that the conventional Cholesky decomposition (cf. [19]), A = LL^H, where L is lower-triangular, yields a generalized Cholesky factor L. There are other ways of constructing a generalized Cholesky factor as well. Let A = UDU^H be the eigenvalue decomposition of A. Then U is a unitary matrix consisting of the normalized eigenvectors of A, and D is a diagonal matrix having the real and positive eigenvalues of A as diagonal entries. For any matrix T which satisfies D = TT^H, B = UT is a generalized Cholesky factor. The next theorem tells us that if we know one generalized Cholesky factor, we know them all:
Theorem 3.13 Suppose B is a generalized Cholesky factor of A. Then, for any unitary matrix U, C = BU is also a generalized Cholesky factor. Conversely, if B and C are generalized Cholesky factors, there exists a unitary matrix U such that C = BU.

Proof. The direct part follows from

CC^H = BUU^HB^H = BB^H = A,

where U^H = U^{-1} was used. For the converse part, we have

BB^H = CC^H,

and therefore

C^{-1}B = C^HB^{-H},

which is equivalent to

(B^{-1}C)^{-1} = (B^{-1}C)^H.

This shows that U = B^{-1}C is unitary.
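Both directions of Theorem 3.13 are easy to check numerically; in the following NumPy sketch (ours, not from the thesis) the conventional Cholesky factor is rotated by a random unitary matrix and then recovered:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = G @ G.conj().T + np.eye(n)        # Hermitian positive definite

B = np.linalg.cholesky(A)             # conventional Cholesky factor: A = B B^H
Q, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))   # random unitary Q
C = B @ Q                             # another generalized Cholesky factor

# direct part: C is again a generalized Cholesky factor of A
assert np.allclose(C @ C.conj().T, A)

# converse part: U = B^{-1} C is unitary
U = np.linalg.solve(B, C)
assert np.allclose(U @ U.conj().T, np.eye(n))
```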

Lemma 3.14 Suppose the complex random vector x ∈ C^n has a non-singular covariance matrix C_x and a pseudo-covariance matrix P_x. Let B_x be a generalized Cholesky factor of C_x and let σ_i denote the singular values of P_{B_x^{-1}x} = B_x^{-1}P_xB_x^{-T}, the pseudo-covariance matrix of the complex random vector B_x^{-1}x. Then \det(C_{\tilde{x}}) can be expressed as

\det(C_{\tilde{x}}) = 2^{-2n}(\det C_x)^2\prod_{σ_i>0}(1 - σ_i^2).
Proof. Consider the complex random vector y = B_x^{-1}x. Obviously, y has a covariance matrix C_y = I_n and a pseudo-covariance matrix P_y = P_{B_x^{-1}x}. This yields (\tilde{x} = \tilde{B}_x\tilde{y}, according to Lemma 3.5)

\det(C_{\tilde{x}}) = \det(\tilde{B}_xC_{\tilde{y}}\tilde{B}_x^T) = \det(C_{\tilde{y}})(\det\tilde{B}_x)^2 = \det(C_{\tilde{y}})(\det C_x)^2
= 2^{-2n}(\det C_x)^2\det(2C_{\tilde{y}}) = 2^{-2n}(\det C_x)^2\det(I_{2n} + \hat{P}_y),

and, applying Corollary 3.11,

\det(C_{\tilde{x}}) = 2^{-2n}(\det C_x)^2\det(I_{2n} + \hat{Σ}_y) = 2^{-2n}(\det C_x)^2\prod_{σ_i>0}(1 - σ_i^2),

where the product is over all positive eigenvalues (counted with multiplicity) of \hat{P}_{B_x^{-1}x}, or, equivalently (Corollary 3.11), over all (non-zero) singular values of P_{B_x^{-1}x}.
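Lemma 3.14 is a matrix-algebra identity, so it holds exactly for any empirical covariance/pseudo-covariance pair; a NumPy sketch (ours, with an assumed sample construction):

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 3, 200

# rotationally variant samples, centered (empirical distribution)
Z = rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))
X = Z + 0.4 * Z.conj()
X -= X.mean(axis=1, keepdims=True)

C = X @ X.conj().T / N                     # C_x
P = X @ X.T / N                            # P_x
Xt = np.vstack([X.real, X.imag])
Ct = Xt @ Xt.T / N                         # covariance of x-tilde

Bx = np.linalg.cholesky(C)                 # generalized Cholesky factor of C_x
Binv = np.linalg.inv(Bx)
sig = np.linalg.svd(Binv @ P @ Binv.T, compute_uv=False)
assert np.all(sig < 1)

# the determinant identity of Lemma 3.14
lhs = np.linalg.det(Ct)
rhs = 2.0 ** (-2 * n) * np.linalg.det(C).real ** 2 * np.prod(1 - sig ** 2)
assert np.isclose(lhs, rhs)
```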
Furthermore, the concept of generalized Cholesky factors enables us to formulate a criterion for a matrix to be a pseudo-covariance matrix:

Theorem 3.15 Let C ∈ C^{n×n} be an Hermitian positive definite matrix and B a generalized Cholesky factor of C. C and a matrix P ∈ C^{n×n} are covariance matrix and pseudo-covariance matrix of a complex random vector, respectively, if and only if P is symmetric and the singular values of B^{-1}PB^{-T} are smaller than or equal to 1.

Proof. We will first prove the necessary condition, i.e., the condition that P is symmetric and that the singular values of B^{-1}PB^{-T} are smaller than or equal to 1, if C and P are covariance and pseudo-covariance matrix of a complex random vector, respectively. Let us take a look at the linearly transformed random vector y = B^{-1}x, where x denotes the complex random vector with covariance matrix C and pseudo-covariance matrix P. Obviously, y has a covariance matrix C_y = I_n and a pseudo-covariance matrix P_y = P_{B^{-1}x} = B^{-1}PB^{-T}. From

C_{\tilde{y}} = \frac{1}{2}\tilde{C}_y + \frac{1}{2}\hat{P}_y = \frac{1}{2}\left(I_{2n} + \widehat{B^{-1}PB^{-T}}\right),

and the fact that C_{\tilde{y}} is non-negative definite, we conclude, using Corollary 3.11, that the singular values of B^{-1}PB^{-T} are smaller than or equal to 1. Note that a pseudo-covariance matrix is symmetric by definition.
In order to prove that P being symmetric and the singular values of B^{-1}PB^{-T} being smaller than or equal to 1 is also a sufficient condition that C and P are covariance and pseudo-covariance matrix of a complex random vector, respectively, we have to construct a complex random vector with covariance matrix C and pseudo-covariance matrix P. Define C_1 = I_n and P_1 = B^{-1}PB^{-T}, and consider

\frac{1}{2}\left(\tilde{C}_1 + \hat{P}_1\right) = \frac{1}{2}\left(I_{2n} + \widehat{B^{-1}PB^{-T}}\right),

which is a symmetric and non-negative definite (apply Corollary 3.10) matrix. Therefore, there exists⁵ a complex random vector y ∈ C^n, such that

\tilde{y} = \begin{pmatrix} \Re\{y\} \\ \Im\{y\} \end{pmatrix} ∈ R^{2n}

has the covariance matrix C_{\tilde{y}} = \frac{1}{2}(\tilde{C}_1 + \hat{P}_1). By construction, y has covariance matrix C_y = C_1 and pseudo-covariance matrix P_y = P_1 (cf. decomposition (3.5)). The complex random vector x = By has a covariance matrix C_x = C and a pseudo-covariance matrix P_x = P.
Corollary 3.16 Suppose the complex random vector x ∈ C^n has a non-singular covariance matrix C_x and a pseudo-covariance matrix P_x. C_{\tilde{x}} is non-singular if and only if all singular values of B_x^{-1}P_xB_x^{-T}, B_x being a generalized Cholesky factor of C_x, are smaller than 1.

Proof. Apply Theorem 3.15 and Lemma 3.14.
We conclude this subsection with the following useful theorem regarding complex random vectors of dimension 1:

Theorem 3.17 Let x ∈ C denote a 1-dimensional random vector with the 1-dimensional covariance and pseudo-covariance⁶ matrices

C_x = [C_x]    and    P_x = [P_x] = [r_xe^{jφ_x}],    r_x, φ_x ∈ R,

respectively. Then the covariance matrix of the equivalent 2-dimensional real random vector \tilde{x} is given by

C_{\tilde{x}} = \frac{1}{2}\begin{pmatrix} C_x + r_x\cos φ_x & r_x\sin φ_x \\ r_x\sin φ_x & C_x - r_x\cos φ_x \end{pmatrix}    (3.13)

and has an eigenvalue decomposition,

C_{\tilde{x}} = U\begin{pmatrix} λ_1 & 0 \\ 0 & λ_2 \end{pmatrix}U^T,    (3.14)

with

λ_1 = \frac{C_x + r_x}{2}    and    λ_2 = \frac{C_x - r_x}{2}    (3.15)

and

U = \begin{pmatrix} \cos\frac{φ_x}{2} & -\sin\frac{φ_x}{2} \\ \sin\frac{φ_x}{2} & \cos\frac{φ_x}{2} \end{pmatrix}.    (3.16)

Proof. (3.13) is a consequence of (3.5). For (3.14) with (3.15) and (3.16), observe that

C_{\tilde{x}} = \frac{1}{2}\begin{pmatrix} C_x & 0 \\ 0 & C_x \end{pmatrix} + \frac{r_x}{2}\underbrace{\begin{pmatrix} \cos φ_x & \sin φ_x \\ \sin φ_x & -\cos φ_x \end{pmatrix}}_{U\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}U^T}.

Note that we do not require r_x ≥ 0, i.e., we explicitly allow ambiguities of φ_x in this relaxed polar coordinate representation of P_x.

⁵ In fact, there are infinitely many such vectors, since covariance and pseudo-covariance matrices are invariant under (deterministic, vector-valued) translations of random vectors.
⁶ For rotationally invariant vectors, φ_x can be any number.
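The half-angle structure of (3.14)-(3.16) is easy to miss; a short NumPy sketch (ours; the parameter values are an assumed example with r_x ≤ C_x) confirms the decomposition:

```python
import numpy as np

# parameters of a 1-dimensional example: variance C_x and P_x = r_x e^{j phi_x}
Cx, rx, phi = 2.0, 1.2, 0.7

# covariance matrix (3.13) of the equivalent 2-dimensional real vector
Ct = 0.5 * np.array([[Cx + rx * np.cos(phi), rx * np.sin(phi)],
                     [rx * np.sin(phi), Cx - rx * np.cos(phi)]])

# eigenvalue decomposition (3.14)-(3.16): eigenvalues (C_x +/- r_x)/2 and a
# rotation by the HALF angle phi/2 as eigenvector matrix
lam1, lam2 = (Cx + rx) / 2, (Cx - rx) / 2
U = np.array([[np.cos(phi / 2), -np.sin(phi / 2)],
              [np.sin(phi / 2), np.cos(phi / 2)]])

assert np.allclose(U @ np.diag([lam1, lam2]) @ U.T, Ct)
assert np.allclose(U @ U.T, np.eye(2))      # U is orthonormal
```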

3.1.2 Differential Entropy of Complex Random Vectors

The differential entropy h(x) of a complex random vector x is defined as the differential entropy h(\tilde{x}) of the corresponding real random vector⁷ \tilde{x}:

Definition 3.18

h(x) = h(\tilde{x}) = -\int_{\mathrm{supp}\{f_{\tilde{x}}\}} f_{\tilde{x}}(\tilde{x})\log f_{\tilde{x}}(\tilde{x})\,d\tilde{x},

provided that the p.d.f. f_{\tilde{x}} of \tilde{x} and the integral exist. \mathrm{supp}\{f_{\tilde{x}}\} ⊆ R^{2n} is the support set of the random vector \tilde{x}.
For complex Gaussian random vectors, the differential entropy is calculated in [41] or [58] as

h(x) = h(\tilde{x}) = \frac{1}{2}\log\det(2πe\,C_{\tilde{x}}),    (3.17)

which, unfortunately, uses the covariance matrix of the corresponding real random vector \tilde{x}. We are interested in a result depending on the covariance matrix C_x and the pseudo-covariance matrix P_x of x.
It is well known, cf. [41] or [58], that a rotationally invariant complex Gaussian random vector maximizes entropy under the constraint of a given covariance matrix:

Theorem 3.19 (Maximum Entropy Theorem for Complex Random Vectors)
Suppose the complex random vector x ∈ C^n is zero-mean and has a non-singular covariance matrix C_x. Then the differential entropy of x satisfies

h(x) ≤ \log\det(πe\,C_x)

with equality if and only if x is rotationally invariant and Gaussian.

Proof. [41], [58].

⁷ Throughout this manuscript log = log₂; the unit of entropy is [bit].
Theorem 3.19 tells us that all zero-mean complex random vectors with a specified covariance matrix C have entropies smaller than or equal to \log\det(πeC). But if we are given a complex random vector with a certain covariance and non-vanishing pseudo-covariance matrix, the bound is not tight anymore (since it does not take into account the pseudo-covariance matrix). For this case, it should be possible to strengthen the theorem in the sense that it is a statement for a class of complex random vectors with a specified covariance and pseudo-covariance matrix. In order to prove such an extension, we need the real counterpart of Theorem 3.19:

Theorem 3.20 (Maximum Entropy Theorem for Real Random Vectors)
Suppose the real random vector⁸ \tilde{x} ∈ R^{2n} has a non-singular covariance matrix C_{\tilde{x}}. Then the differential entropy of \tilde{x} satisfies

h(\tilde{x}) ≤ \frac{1}{2}\log\det(2πe\,C_{\tilde{x}})

with equality if and only if \tilde{x} is Gaussian.

⁸ Note that this theorem is also valid for real random vectors which are not induced by complex random vectors. Furthermore, odd dimensions are allowed as well.
Proof. We will first prove this theorem for the special case of \tilde{x} being a zero-mean random vector (cf. also [6]). Let f_{\tilde{x}} denote the p.d.f. of \tilde{x}, and let \tilde{g} ∈ R^{2n} be a random vector with p.d.f.

f_{\tilde{g}}(\tilde{x}) = \frac{1}{\sqrt{\det(2πC_{\tilde{x}})}}\,e^{-\frac{1}{2}\tilde{x}^TC_{\tilde{x}}^{-1}\tilde{x}},    (3.18)

so that \tilde{g} is a zero-mean, Gaussian distributed random vector with covariance matrix C_{\tilde{x}}. Observe that (\tilde{x} = [\tilde{x}(1) ⋯ \tilde{x}(2n)]^T)

\int_{R^{2n}} f_{\tilde{x}}(\tilde{x})\,\tilde{x}(i)\tilde{x}(j)\,d\tilde{x} = \int_{R^{2n}} f_{\tilde{g}}(\tilde{x})\,\tilde{x}(i)\tilde{x}(j)\,d\tilde{x},    1 ≤ i, j ≤ 2n,

and that \log f_{\tilde{g}}(\tilde{x}) is an affine combination of the terms \tilde{x}(i)\tilde{x}(j). Thus,

E_{f_{\tilde{x}}}\{\log f_{\tilde{g}}(\tilde{x})\} = E_{f_{\tilde{g}}}\{\log f_{\tilde{g}}(\tilde{x})\}.

Then,

h(\tilde{x}) - h(\tilde{g}) = -\int_{\mathrm{supp}\{f_{\tilde{x}}\}} f_{\tilde{x}}(\tilde{x})\log f_{\tilde{x}}(\tilde{x})\,d\tilde{x} + \int_{R^{2n}} f_{\tilde{g}}(\tilde{x})\log f_{\tilde{g}}(\tilde{x})\,d\tilde{x}
= -\int_{\mathrm{supp}\{f_{\tilde{x}}\}} f_{\tilde{x}}(\tilde{x})\log f_{\tilde{x}}(\tilde{x})\,d\tilde{x} + \int_{\mathrm{supp}\{f_{\tilde{x}}\}} f_{\tilde{x}}(\tilde{x})\log f_{\tilde{g}}(\tilde{x})\,d\tilde{x}
= \int_{\mathrm{supp}\{f_{\tilde{x}}\}} f_{\tilde{x}}(\tilde{x})\log\frac{f_{\tilde{g}}(\tilde{x})}{f_{\tilde{x}}(\tilde{x})}\,d\tilde{x}
≤ \frac{1}{\ln 2}\int_{\mathrm{supp}\{f_{\tilde{x}}\}} f_{\tilde{x}}(\tilde{x})\left(\frac{f_{\tilde{g}}(\tilde{x})}{f_{\tilde{x}}(\tilde{x})} - 1\right)d\tilde{x}
= \frac{1}{\ln 2}\left(\int_{\mathrm{supp}\{f_{\tilde{x}}\}} f_{\tilde{g}}(\tilde{x})\,d\tilde{x} - 1\right)
≤ \frac{1}{\ln 2}(1 - 1) = 0,

with equality if and only if f_{\tilde{x}} = f_{\tilde{g}} almost everywhere (cf. [21]). Thus h(\tilde{x}) ≤ h(\tilde{g}).
The general case, where the random vector \tilde{x} has a non-vanishing expectation vector \mu_{\tilde{x}}, is an immediate consequence of the special case, since h(\tilde{x} - \mu_{\tilde{x}}) = h(\tilde{x}) and C_{\tilde{x}-\mu_{\tilde{x}}} = C_{\tilde{x}}.

Combining this theorem with Lemma 3.14, we end up with

Theorem 3.21 (Generalized Maximum Entropy Theorem)
Suppose the complex random vector x ∈ C^n has a non-singular covariance matrix C_x and a pseudo-covariance matrix P_x. Let B_x be a generalized Cholesky factor of C_x and let σ_i denote the singular values of B_x^{-1}P_xB_x^{-T}, which must⁹ be smaller than 1. Then the differential entropy of x satisfies

h(x) ≤ \log\det(πe\,C_x) + \underbrace{\frac{1}{2}\sum_i\log(1 - σ_i^2)}_{≤\,0},

with equality if and only if x is Gaussian.

⁹ According to Theorem 3.15, the singular values are smaller than or equal to 1. This additional constraint is mandatory for the validity of the statement.


Proof. We have (using Lemma 3.14),

\frac{1}{2}\log\det(2πe\,C_{\tilde{x}}) = \frac{1}{2}\log(2πe)^{2n} + \frac{1}{2}\log\det C_{\tilde{x}}
= n\log(2πe) + \frac{1}{2}\log 2^{-2n} + \frac{1}{2}\log(\det C_x)^2 + \frac{1}{2}\sum_i\log(1 - σ_i^2)
= n\log(2π) + n\log e - n\log 2 + \log\det C_x + \frac{1}{2}\sum_i\log(1 - σ_i^2)
= \log\det(πe\,C_x) + \frac{1}{2}\sum_i\log(1 - σ_i^2),

which, together with Definition 3.18, Theorem 3.20, and equation (3.17), implies the theorem.

Note that Theorem 3.19 is a corollary of Theorem 3.21. Therefore, Theorem 3.21 is really a generalization of Theorem 3.19.
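The chain of equalities in the proof of Theorem 3.21 can be checked numerically: for any valid (C_x, P_x) pair, the Gaussian entropy computed from the real representation via (3.17) must coincide with the complex-domain expression plus the correction term. A NumPy sketch (ours, with an assumed sample construction):

```python
import numpy as np

rng = np.random.default_rng(6)
n, N = 3, 200

# rotationally variant samples, centered (empirical distribution)
Z = rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))
X = Z + 0.4 * Z.conj()
X -= X.mean(axis=1, keepdims=True)

C = X @ X.conj().T / N
P = X @ X.T / N
Xt = np.vstack([X.real, X.imag])
Ct = Xt @ Xt.T / N

Bx = np.linalg.cholesky(C)
Binv = np.linalg.inv(Bx)
sig = np.linalg.svd(Binv @ P @ Binv.T, compute_uv=False)

# Gaussian entropy via the real representation, (3.17) ...
h_real = 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * Ct))
# ... and via C_x, P_x, as in the proof of Theorem 3.21
h_complex = np.log2(np.linalg.det(np.pi * np.e * C).real) \
    + 0.5 * np.sum(np.log2(1 - sig ** 2))
assert np.isclose(h_real, h_complex)

# the correction term is non-positive
assert np.sum(np.log2(1 - sig ** 2)) <= 0
```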
The Generalized Maximum Entropy Theorem (Theorem 3.21) compares two complex random vectors: the original one and its Gaussian distributed counterpart, i.e., a complex random vector with the same covariance matrix and pseudo-covariance matrix but with a Gaussian distribution. The differential entropy of this Gaussian random vector is equal to the differential entropy of a rotationally invariant Gaussian random vector with the same covariance matrix, modified by a correction term (\frac{1}{2}\sum_i\log(1 - σ_i^2), cf. Theorem 3.21) that comes from the non-vanishing pseudo-covariance matrix. We will show in the following that this correction term is a very fundamental quantity that does not only apply to the differential entropy of a Gaussian distributed random vector or to the upper bound of Theorem 3.21. To be more precise, we will show that this correction term is independent of the actual distribution, i.e., it always measures the deviation of the differential entropy from the "ideal" rotationally invariant differential entropy, no matter according to what law the random vector is distributed. Of course, we have to define carefully what we mean by "ideal" rotationally invariant differential entropy for an arbitrarily distributed random vector, but we will see quite soon that this can be done in a very intuitive and natural way. But before we proceed in this direction, we will introduce the concept of a widely affine transformation:
Definition 3.22 A mapping

A: C^n → C^m,    x ↦ A(x)

is called widely affine transformation or widely affine mapping if there exist two matrices A_1, A_2 ∈ C^{m×n} and a vector b ∈ C^m, such that

A(x) = A_1x + A_2\bar{x} + b,    x ∈ C^n.
3.1. The Maximum Entropy Theorem for Complex Random Vectors

41

Note that this definition is a modification of the definition of a widely linear transformation/mapping that can be found in the literature (see e.g. [33, 44]). The difference is the additional translation vector b.
A widely affine transformation can be equivalently described by an affine transformation on the real and imaginary part level:

Theorem 3.23 A mapping

A: C^n → C^m,    x ↦ A(x)

is a widely affine transformation if and only if there exist a matrix A ∈ R^{2m×2n} and a vector b ∈ C^m, such that

\widetilde{A(x)} = A\tilde{x} + \tilde{b},    x ∈ C^n.
Proof. We have (using Lemma 3.5),

\widetilde{A(x)} = \widetilde{A_1x} + \widetilde{A_2\bar{x}} + \tilde{b}
= \tilde{A}_1\tilde{x} + \tilde{A}_2\begin{pmatrix} I_n & 0 \\ 0 & -I_n \end{pmatrix}\tilde{x} + \tilde{b}
= \tilde{A}_1\tilde{x} + \hat{A}_2\tilde{x} + \tilde{b}    (3.19)
= \begin{pmatrix} \Re\{A_1 + A_2\} & \Im\{A_2 - A_1\} \\ \Im\{A_1 + A_2\} & \Re\{A_1 - A_2\} \end{pmatrix}\tilde{x} + \tilde{b},    (3.20)

which not only shows how to calculate A from A_1 and A_2, i.e., from (3.19),

A = \tilde{A}_1 + \hat{A}_2,

but, conversely, also how to calculate A_1 and A_2 from A, i.e., reading off the four n×n blocks of A from (3.20),

\Re\{A_1\} = \frac{1}{2}\Re\{A_1 + A_2\} + \frac{1}{2}\Re\{A_1 - A_2\},
\Im\{A_1\} = \frac{1}{2}\Im\{A_1 + A_2\} - \frac{1}{2}\Im\{A_2 - A_1\},
\Re\{A_2\} = \frac{1}{2}\Re\{A_1 + A_2\} - \frac{1}{2}\Re\{A_1 - A_2\},
\Im\{A_2\} = \frac{1}{2}\Im\{A_1 + A_2\} + \frac{1}{2}\Im\{A_2 - A_1\}.

The vector b of the present theorem is identical to the vector b of Definition 3.22.
Now we return to our original problem. Suppose we are given a rotationally variant random vector. Is it possible to associate with it, in a canonical way, a rotationally invariant random vector that behaves like the given random vector with the exception of being rotationally invariant? It is natural to demand that the associated random vector must have a distribution law that is as close as possible to the distribution law of the given random vector. For a given Gaussian random vector everything is clear: the associated random vector will be Gaussian as well and is fully specified by the mean vector, the covariance matrix, and rotational invariance. But what about a random vector that is not Gaussian?
Let us look again at the Gaussian case. On the real and imaginary part level we deal with two Gaussian random vectors with the same mean vector but with different covariance matrices (in the complex domain both random vectors have the same covariance matrix, but one random vector has a vanishing pseudo-covariance matrix whereas the other has a non-vanishing pseudo-covariance matrix). Recalling that affine transformations of Gaussian random vectors are again Gaussian (see e.g. [34]), we can view one vector as an affine transformed version of the other. It is a consequence of the theory of generalized Cholesky factors that it is possible, cf. also Theorem 3.27, to construct an affine transformation (on the real and imaginary part level, or a widely affine transformation on the complex level) that transforms one covariance matrix into the other. This observation, together with the fact that a (widely) affine transformation preserves the character of a random vector also in the non-Gaussian case, is used for the definition of a canonically associated rotationally invariant random vector to a given rotationally variant random vector:
Definition 3.24 Let y ∈ C^n denote a complex random vector with mean vector \mu_y, covariance matrix C_y, and pseudo-covariance matrix P_y. A complex random vector x ∈ C^n is called rotationally invariant analogon of y if its mean vector, covariance matrix, and pseudo-covariance matrix satisfy

\mu_x = \mu_y,    C_x = C_y,    P_x = 0,

and there exists a widely affine transformation

A: C^n → C^n,    x ↦ A(x),

such that y and A(x) are identically distributed [21].

Since we do not want to exclude the existence of a rotationally invariant analogon in advance, the mapping A is allowed to be widely affine and not only affine. According to Corollary 3.3, an affine A would imply P_y = 0. Furthermore, in order to ensure \mu_x = \mu_y, the mapping A is allowed to be widely affine and not only widely linear.
We want to emphasize that a rotationally invariant analogon of a given complex random vector is not uniquely defined: suppose x denotes a rotationally invariant analogon of a complex random vector y. Then, for all α ∈ [0, 2π[ (α is fixed), the random vectors x' = e^{jα}x + (1 - e^{jα})\mu_x are rotationally invariant analogons of y as well. To see this, observe that x and x' have the same mean vectors, covariance matrices, and pseudo-covariance matrices (equal to the zero matrix), respectively, and that the widely affine transformations A' are obtained from the widely affine transformation A via A'(x') = A(e^{-jα}x' + (1 - e^{-jα})\mu_x).


In the following we will deal with the existence of a rotationally invariant analogon, i.e., with the existence of the widely affine transformation of Definition 3.24.
First of all observe that invertible affine transformations do not change any existence statements regarding rotationally invariant analogons:
Proposition 3.25 Suppose the random vector x ∈ C^n is a rotationally invariant analogon of the random vector y ∈ C^n. Then, for any invertible matrix M ∈ C^{n×n} and any vector c ∈ C^n (both deterministic), the random vector Mx + c is a rotationally invariant analogon of the random vector My + c.

Proof. We have, cf. Proposition 3.1,

\mu_{Mx+c} = M\mu_x + c = M\mu_y + c = \mu_{My+c},
C_{Mx+c} = MC_xM^H = MC_yM^H = C_{My+c},
P_{Mx+c} = MP_xM^T = 0,

so that it remains to show the existence of the widely affine transformation of Definition 3.24. From Definition 3.24, there exists a widely affine transformation

A: C^n → C^n,    x ↦ A(x),

such that A(x) and y are identically distributed. Then the mapping

A': C^n → C^n,

defined by

A'(x') = MA(M^{-1}x' - M^{-1}c) + c,    x' ∈ C^n,

is a widely affine transformation that satisfies

A'(Mx + c) = MA(x) + c,

so that A'(Mx + c) and My + c are identically distributed.
Next, we will prove the existence of a rotationally invariant analogon for a simple special case:

Lemma 3.26 Let y ∈ C^n denote a zero-mean complex random vector with covariance matrix and pseudo-covariance matrix

C_y = I_n    and    P_y = \begin{pmatrix} σ_1 & & 0 \\ & \ddots & \\ 0 & & σ_n \end{pmatrix}    with 0 ≤ σ_i < 1,

respectively. Then there exists a rotationally invariant analogon x of y.


Proof. According to (3.5) we have,

C_{\tilde{y}} = \frac{1}{2}\,\mathrm{diag}(1 + σ_1, ..., 1 + σ_n, 1 - σ_1, ..., 1 - σ_n),

whereas a rotationally invariant analogon x would satisfy

C_{\tilde{x}} = \frac{1}{2}I_{2n}.

Let A ∈ R^{2n×2n} denote the (invertible) matrix

A = \mathrm{diag}\left(\sqrt{1 + σ_1}, ..., \sqrt{1 + σ_n}, \sqrt{1 - σ_1}, ..., \sqrt{1 - σ_n}\right).

Then the random vector x defined (from y) via

\tilde{x} = A^{-1}\tilde{y}

is a rotationally invariant analogon of y. To see this, consider the widely affine¹⁰ transformation

A: C^n → C^n,    x ↦ A(x),

defined by (apply Theorem 3.23)

\widetilde{A(x)} = A\tilde{x},    x ∈ C^n.

¹⁰ It is even a widely linear transformation.

Note that σ_i < 1 is mandatory for the validity of this proof, because otherwise the inverse A^{-1} would not exist. However, if we are given a rotationally invariant random vector x (with C_{\tilde{x}} = \frac{1}{2}I_{2n}) and consider the random vector y that is obtained from x via the same widely affine transformation y = A(x) as in the proof, then x is a rotationally invariant analogon of y also in the case σ_i = 1 (i ∈ {1, ..., n}). This fact is very natural and is the reason why the widely affine transformation of Definition 3.24 is chosen to be a mapping from the rotationally invariant random vector to the rotationally variant random vector and not from the rotationally variant random vector to the rotationally invariant random vector. We want to mention that it is even possible to prove Lemma 3.26 allowing σ_i = 1, i = 1, ..., n. Since this special case is not relevant for our work, we omit the more sophisticated proof.
Finally, we present the following (general) existence theorem regarding rotationally invariant analogons:

Theorem 3.27 Suppose the complex random vector y ∈ C^n has a non-singular covariance matrix C_y and a pseudo-covariance matrix P_y. Let B_y be a generalized Cholesky factor of C_y and assume that the singular values of B_y^{-1}P_yB_y^{-T} are smaller than 1. Then there exists a rotationally invariant analogon x of y.

Proof. According to Proposition 3.25 we can assume (maintaining full generality) that y is a zero-mean random vector. Consider Takagi's factorization (Theorem 3.9) of the pseudo-covariance matrix of the random vector y' = B_y^{-1}y, i.e.,

Q_{y'}Σ_{y'}Q_{y'}^T = B_y^{-1}P_yB_y^{-T}.

The random vector y'' = Q_{y'}^{-1}y' = Q_{y'}^{-1}B_y^{-1}y has a covariance matrix C_{y''} = I_n and a (diagonal) pseudo-covariance matrix P_{y''} = Σ_{y'} with entries within the interval [0, 1[. A combination of Lemma 3.26 with Proposition 3.25 (M = B_yQ_{y'} and c = 0) shows the existence of a rotationally invariant analogon x of y and, hence, concludes the proof.
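The σ_i entering Theorems 3.21, 3.27, and 3.28 do not depend on which generalized Cholesky factor is used (Theorem 3.13: two factors differ by a unitary matrix, which leaves singular values unchanged). The following NumPy sketch (ours; it builds a valid (C_y, P_y) pair from an assumed Takagi form rather than computing a Takagi factorization) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3

# build a valid (C_y, P_y) pair with known sigma_i, starting from a Takagi
# form P_1 = Q Sigma Q^T (Q unitary, sigma_i in [0, 1))
Q, _ = np.linalg.qr(rng.standard_normal((n, n))
                    + 1j * rng.standard_normal((n, n)))
sig = np.array([0.7, 0.4, 0.1])
P1 = Q @ np.diag(sig) @ Q.T
assert np.allclose(P1, P1.T)              # P_1 is symmetric

By = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Cy = By @ By.conj().T                     # non-singular covariance matrix
Py = By @ P1 @ By.T                       # matching pseudo-covariance matrix

# recover the sigma_i from (C_y, P_y) with an arbitrary generalized
# Cholesky factor B: B and B_y differ only by a unitary factor, so the
# singular values of B^{-1} P_y B^{-T} are unaffected
B = np.linalg.cholesky(Cy)
Binv = np.linalg.inv(B)
s = np.linalg.svd(Binv @ Py @ Binv.T, compute_uv=False)
assert np.allclose(np.sort(s), np.sort(sig))
assert np.all(s < 1)                      # Theorem 3.27 applies
```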

With the introduced framework we are now in the position to formulate the promised theorem that quantifies the deviation of the differential entropy of a rotationally variant random vector from the differential entropy of its rotationally invariant analogon¹¹ independently of the actual probability distribution:

Theorem 3.28 Suppose the complex random vector y ∈ C^n has a non-singular covariance matrix C_y and a pseudo-covariance matrix P_y. Let B_y be a generalized Cholesky factor of C_y and let σ_i denote the singular values of B_y^{-1}P_yB_y^{-T}, which must be smaller than 1. Furthermore, let x denote a rotationally invariant analogon of y (which exists according to Theorem 3.27). Then the differential entropy of y satisfies

h(y) = h(x) + \underbrace{\frac{1}{2}\sum_i\log(1 - σ_i^2)}_{≤\,0}.
Proof. From Definition 3.24 and Theorem 3.23 we know that there exist a deterministic matrix A ∈ R^{2n×2n} and a deterministic vector b ∈ C^n, such that \tilde{y} and A\tilde{x} + \tilde{b} are identically distributed. This implies

C_{\tilde{y}} = AC_{\tilde{x}}A^T,

and furthermore, applying Lemma 3.14 to \det C_{\tilde{y}} = \det C_{\tilde{x}}\,(\det A)^2,

2^{-2n}(\det C_y)^2\prod_i(1 - σ_i^2) = 2^{-2n}(\det C_x)^2(\det A)^2,

and finally, observing that \det C_y = \det C_x ≠ 0,

|\det A| = \sqrt{\prod_i(1 - σ_i^2)}.

The final result follows from the well known, see e.g. [6], transformation rule

h(A\tilde{x} + \tilde{b}) = \log|\det A| + h(\tilde{x}),

and the definition of the differential entropy for complex random vectors (Definition 3.18).

Note that we can re-obtain the Generalized Maximum Entropy Theorem (Theorem 3.21) from the previous theorem, if we apply the conventional Maximum Entropy Theorem (Theorem 3.19) to the rotationally invariant analogon x of y.

¹¹ To be more precise, this should read "... differential entropies of all of its rotationally invariant analogons ...", but it is a consequence of Theorem 3.28 that these entropies are all equal.
Using the inequality ln u ≤ u - 1, one easily finds a lower and an upper bound for the difference between the differential entropy of a random vector and the differential entropy of (all of) its rotationally invariant analogon(s):

Corollary 3.29 Suppose the complex random vector y ∈ C^n has a non-singular covariance matrix C_y and a pseudo-covariance matrix P_y. Let B_y be a generalized Cholesky factor of C_y and let d_{max} denote the largest eigenvalue of P_{y'}P_{y'}^H (y' = B_y^{-1}y), which must be smaller than 1. Furthermore, let x denote a rotationally invariant analogon of y. Then,

\frac{\mathrm{tr}(P_{y'}P_{y'}^H)}{2\ln 2} ≤ h(x) - h(y) ≤ \frac{\mathrm{tr}(P_{y'}P_{y'}^H)}{2\ln 2\,(1 - d_{max})}.

Observe that the deviation of the differential entropy from the "ideal" rotationally invariant differential entropy is approximately determined by the trace and the largest eigenvalue of P_{y'}P_{y'}^H.
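Note that \mathrm{tr}(P_{y'}P_{y'}^H) = \sum_i σ_i^2 and d_{max} = \max_i σ_i^2, so the bounds can be evaluated directly from the σ_i. A NumPy sketch (ours; the σ_i values are an assumed example):

```python
import numpy as np

# singular values sigma_i of B_y^{-1} P_y B_y^{-T} (an assumed example)
sig = np.array([0.7, 0.4, 0.1])

# entropy deviation h(x) - h(y) from Theorem 3.28
delta = -0.5 * np.sum(np.log2(1 - sig ** 2))

# bounds of Corollary 3.29: tr(P_{y'} P_{y'}^H) = sum sigma_i^2 and
# d_max = max sigma_i^2 (trace and largest eigenvalue)
tr = np.sum(sig ** 2)
dmax = np.max(sig ** 2)
lower = tr / (2 * np.log(2))
upper = tr / (2 * np.log(2) * (1 - dmax))
assert lower <= delta <= upper
```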
In the previous theorems and corollaries, we have the requirement that the covariance matrix is non-singular and that the singular (eigen-)values of a certain matrix are smaller than one. These assumptions are mandatory for the validity of the statements; they are not merely technical assumptions. They ensure that the occurring expressions have finite values. Otherwise, the considered random vector, or at least a transformed version of it, will have deterministic real and/or imaginary parts of certain elements (almost everywhere). Even if we allowed infinite values, we could not guarantee that the statements remain valid. Note that we can easily construct two random vectors, one being Gaussian and the other being non-Gaussian, that have deterministic real and/or imaginary parts of certain elements, so that their differential entropies are both -∞, i.e., both entropies are equal. Hence, this can serve as a counterexample for the (Generalized) Maximum Entropy Theorem if the mentioned requirements are loosened and infinite values are allowed.
We conclude this subsection with the extension of the definition of conditional differential entropy to the complex case:

Definition 3.30 Let $x$ and $y$ denote two complex random vectors with an existing joint p.d.f. $f_{\tilde{x},\tilde{y}}$ of $\tilde{x}$ and $\tilde{y}$. Then the conditional differential entropy $h(y|x)$ is defined as
$$h(y|x) = h(\tilde{y}|\tilde{x}) = -\int f_{\tilde{x},\tilde{y}}(\tilde{x},\tilde{y}) \log \frac{f_{\tilde{x},\tilde{y}}(\tilde{x},\tilde{y})}{f_{\tilde{x}}(\tilde{x})}\, d\tilde{x}\, d\tilde{y}$$
with the marginal p.d.f. of $\tilde{x}$,
$$f_{\tilde{x}}(\tilde{x}) = \int f_{\tilde{x},\tilde{y}}(\tilde{x},\tilde{y})\, d\tilde{y},$$
provided that both integrals exist.

3.1.3 The Euclidean Matrix Norm


For the following material, we also refer to [31].

Definition 3.31 (Normed Space, Banach Space) A normed space $V$ is a vector space with a norm defined on it. A Banach space is a complete normed space (complete in the metric defined by the norm; see (3.25), below). Here, a norm on a (real or complex) vector space $V$ is a real valued function on $V$ whose value at $x \in V$ is denoted by $\|x\|$ and which has the properties
$$\|x\| \ge 0, \tag{3.21}$$
$$\|x\| = 0 \iff x = 0, \tag{3.22}$$
$$\|\alpha x\| = |\alpha|\,\|x\|, \tag{3.23}$$
$$\|x + y\| \le \|x\| + \|y\|; \tag{3.24}$$
here, $x$ and $y$ are arbitrary vectors in $V$ and $\alpha$ is any scalar.

A norm on $V$ defines a metric $d$ on $V$ that is given by
$$d(x,y) = \|x - y\|, \qquad x, y \in V, \tag{3.25}$$
and is called the metric induced by the norm. The normed space just defined is denoted by $(V, \|\cdot\|)$ or simply by $V$.


Definition 3.32 (Inner Product Space, Hilbert Space) An inner product space (or pre-Hilbert space) is a vector space $V$ with an inner product defined on $V$. A Hilbert space is a complete inner product space (complete in the metric defined by the inner product; cf. (3.32), below). Here, an inner product on $V$ is a mapping of $V \times V$ into the scalar field $F$ of $V$; i.e., with every pair of vectors $x$ and $y$ there is associated a scalar which is written $\langle x, y\rangle$ and is called the inner product of $x$ and $y$, such that for all vectors $x, y, z$ and scalars $\alpha$, we have
$$\langle x + y, z\rangle = \langle x, z\rangle + \langle y, z\rangle, \tag{3.26}$$
$$\langle \alpha x, y\rangle = \alpha\,\langle x, y\rangle, \tag{3.27}$$
$$\langle x, y\rangle = \langle y, x\rangle^*, \tag{3.28}$$
$$\langle x, x\rangle \ge 0, \tag{3.29}$$
$$\langle x, x\rangle = 0 \iff x = 0. \tag{3.30}$$

An inner product on $V$ defines an induced norm on $V$ given by
$$\|x\| = \sqrt{\langle x, x\rangle} \tag{3.31}$$
and, according to (3.25), a metric on $V$ given by
$$d(x,y) = \|x - y\| = \sqrt{\langle x-y, x-y\rangle}. \tag{3.32}$$

Hence, inner product spaces are normed spaces, and Hilbert spaces are Banach spaces.
A well known example of a Hilbert space (and in turn of a Banach space) is the vector space of all $n$-dimensional complex vectors, i.e., the space $V = \mathbb{C}^n$, equipped with the inner product
$$\langle x, y\rangle_2 = y^H x. \tag{3.33}$$
We will call the induced norm defined by (3.31) the Euclidean (vector) norm and denote it by
$$\|x\|_2 = \sqrt{\langle x, x\rangle_2} = \sqrt{x^H x}. \tag{3.34}$$

Definition 3.33 (Bounded Linear Operator) Let $V_1$ and $V_2$ be normed spaces and let $T : V_1 \to V_2$ be a linear operator. The operator $T$ is said to be bounded if there is a real number $c$ such that, for all $x \in V_1$,
$$\|Tx\| \le c\,\|x\|. \tag{3.35}$$

3.1. The Maximum Entropy Theorem for Complex Random Vectors


Theorem 3.34 The vector space $V(V_1, V_2)$ of all bounded linear operators from a normed space $V_1$ into a normed space $V_2$ is itself a normed space with norm defined by
$$\|T\| = \sup_{x \in V_1 \setminus \{0\}} \frac{\|Tx\|}{\|x\|} = \sup_{x \in V_1,\ \|x\|=1} \|Tx\|. \tag{3.36}$$
Proof. The proof is straightforward by checking the axioms (3.21), (3.22), (3.23), and (3.24) of Definition 3.31.
Let us consider again the Hilbert / Banach spaces $(\mathbb{C}^n, \|\cdot\|_2)$ and $(\mathbb{C}^m, \|\cdot\|_2)$. The vector space $V(\mathbb{C}^m, \mathbb{C}^n)$ of all bounded linear operators from $\mathbb{C}^m$ into $\mathbb{C}^n$ is isomorphic to the space of all $n \times m$-dimensional complex matrices, i.e.,
$$V(\mathbb{C}^m, \mathbb{C}^n) \cong \mathbb{C}^{n \times m}. \tag{3.37}$$
Applying Theorem 3.34, we conclude that^{12} $(\mathbb{C}^{n \times m}, \|\cdot\|_2)$ is a normed space. We will call the corresponding (induced) norm, defined in (3.36), the Euclidean matrix norm. That is, for any matrix $A \in \mathbb{C}^{n \times m}$, we have to calculate
$$\|A\|_2 = \sup_{\|x\|_2 = 1} \|Ax\|_2. \tag{3.38}$$
The question remains whether it is possible to find a simpler rule for calculating the Euclidean matrix norm of a given matrix. But before we address this issue, we will first present some important properties of the norm of a normed / inner product space.
Lemma 3.35
$$\|Tx\| \le \|T\|\,\|x\| \quad \forall x, \tag{3.39}$$
$$\|T_1 T_2\| \le \|T_1\|\,\|T_2\|, \tag{3.40}$$
$$\text{If } \langle Tx, y\rangle = \langle x, T^{-1}y\rangle \ \forall x, y, \text{ then } \|T\| = 1 \text{ and } \|Tx\| = \|x\| \ \forall x. \tag{3.41}$$
Proof. The properties are immediate consequences of the definition of the (induced) norm in (3.36).
We now return to our examples, the Hilbert spaces $(\mathbb{C}^n, \|\cdot\|_2)$ and $(\mathbb{C}^m, \|\cdot\|_2)$, and the induced normed space $(\mathbb{C}^{n \times m}, \|\cdot\|_2)$ (it can be shown that it is even a Banach space [31]) with the Euclidean matrix norm, which has to be calculated according to (3.38). The following theorem establishes a relation between the Euclidean matrix norm of a matrix and its singular values, and provides us with a second possibility for computing the Euclidean matrix norm.

^{12} Note that we write $\|\cdot\|_2$ not only for the Euclidean vector norm, but also for the (induced) Euclidean matrix norm.


Theorem 3.36 Let $A \in \mathbb{C}^{n \times m}$ denote a complex matrix and let $\sigma_{\max}$ denote its greatest singular value. Then,
$$\|A\|_2 = \sigma_{\max}. \tag{3.42}$$
Proof. Let $A = U\Sigma V^H$ be the Singular Value Decomposition (SVD) [19] of $A$. Note that the SVD is well defined for all rectangular complex matrices and yields unitary matrices $U \in \mathbb{C}^{n \times n}$ and $V \in \mathbb{C}^{m \times m}$ and a diagonal^{13} matrix $\Sigma \in \mathbb{R}^{n \times m}$ with non-negative entries on the main diagonal (the singular values, which are usually^{14} ordered in descending order). Since
$$\langle Ux, y\rangle_2 = y^H U x = \left(U^{-1}y\right)^H x = \langle x, U^{-1}y\rangle_2 \quad \forall x, y,$$
for any unitary matrix $U$, we have
$$\|U\|_2 = 1 \quad\text{and}\quad \|Ux\|_2 = \|x\|_2 \ \forall x \in \mathbb{C}^n, \tag{3.43}$$
$$\|V\|_2 = 1 \quad\text{and}\quad \|Vy\|_2 = \|y\|_2 \ \forall y \in \mathbb{C}^m, \tag{3.44}$$
according to Lemma 3.35. This yields
$$\|A\|_2 = \sup_{\|x\|_2=1} \|Ax\|_2 = \sup_{\|x\|_2=1} \|U\Sigma V^H x\|_2 = \sup_{\|x\|_2=1} \|\Sigma V^H x\|_2 \tag{3.45}$$
$$= \sup_{\|Vy\|_2=1} \|\Sigma V^H (Vy)\|_2 = \sup_{\|y\|_2=1} \|\Sigma y\|_2 = \|\Sigma\|_2 = \sigma_{\max}.$$
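Theorem 3.36 lends itself to a quick numerical check (the random test matrix below is an arbitrary illustration): the supremum (3.38), approximated over many random unit vectors, never exceeds the largest singular value, which is also what NumPy's spectral norm returns.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
sigma_max = np.linalg.svd(A, compute_uv=False).max()
# sample the unit sphere: ||Ax||_2 stays below sigma_max for every unit x
X = rng.standard_normal((3, 10000)) + 1j * rng.standard_normal((3, 10000))
X /= np.linalg.norm(X, axis=0)
sup_est = np.linalg.norm(A @ X, axis=0).max()
assert sup_est <= sigma_max + 1e-12
assert np.isclose(np.linalg.norm(A, 2), sigma_max)  # Euclidean matrix norm (3.42)
```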

We want to emphasize that this result is the reason why we introduced the concept of normed spaces (Banach spaces) and inner product spaces (Hilbert spaces). It was already mentioned that statements like "the singular values of the matrix $B^{-1}PB^{-T}$ have to be smaller than 1", and sometimes also "the singular values of the matrix $B^{-1}PB^{-T}$ have to be smaller than or equal to 1", are mandatory for the validity of Theorems 3.15 and 3.21, and Corollary 3.16. According to Theorem 3.36, we can replace these statements by
$$\left\|B^{-1}PB^{-T}\right\|_2 < 1 \quad\text{and}\quad \left\|B^{-1}PB^{-T}\right\|_2 \le 1, \tag{3.46}$$
respectively. The following lemma gives a sufficient (but not necessary) condition for the validity of (3.46).
^{13} A rectangular diagonal matrix is a matrix for which all entries with different column and row indices are 0.
^{14} $U$ and $V$ can be chosen to guarantee this.


Lemma 3.37 Let $C \in \mathbb{C}^{n \times n}$ be a Hermitian positive definite matrix and $B$ a generalized Cholesky factor of $C$. For any matrix $P \in \mathbb{C}^{n \times n}$ that satisfies
$$\|P\|_2 < \frac{1}{\|C^{-1}\|_2}$$
we have
$$\left\|B^{-1}PB^{-T}\right\|_2 < 1,$$
whereas for any matrix $P \in \mathbb{C}^{n \times n}$ that satisfies
$$\|P\|_2 \le \frac{1}{\|C^{-1}\|_2}$$
we have
$$\left\|B^{-1}PB^{-T}\right\|_2 \le 1.$$

Proof. The implications follow from
$$\left\|B^{-1}PB^{-T}\right\|_2 \le \left\|B^{-1}\right\|_2\,\|P\|_2\,\left\|B^{-T}\right\|_2 = \underbrace{\left\|B^{-1}\right\|_2^2}_{=\,\|C^{-1}\|_2}\ \underbrace{\|P\|_2}_{<\,1/\|C^{-1}\|_2} < 1$$
(and analogously with "$\le$" in place of "$<$"), where Lemma 3.35 was used.


It is an immediate consequence of the Singular Value Decomposition of $C$ that
$$\frac{1}{\|C^{-1}\|_2} = \sigma_{\min}, \tag{3.47}$$
where $\sigma_{\min}$ denotes the smallest singular value of $C$; cf. also Theorem 3.36. Therefore, the sufficient conditions of Lemma 3.37 can also be formulated as "if the greatest singular value of $P$ is smaller than (or equal to) the smallest singular value of $C$, then ...".
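Both (3.47) and Lemma 3.37 can be checked numerically; the Hermitian positive definite matrix below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
C = M @ M.conj().T + np.eye(3)         # Hermitian positive definite
s = np.linalg.svd(C, compute_uv=False)
# (3.47): 1/||C^{-1}||_2 equals the smallest singular value of C
assert np.isclose(1 / np.linalg.norm(np.linalg.inv(C), 2), s.min())
B = np.linalg.cholesky(C)              # generalized Cholesky factor, C = B B^H
P = 0.5 * s.min() * np.eye(3)          # ||P||_2 < 1/||C^{-1}||_2
T = np.linalg.inv(B) @ P @ np.linalg.inv(B).T
assert np.linalg.svd(T, compute_uv=False).max() < 1   # Lemma 3.37
```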

3.2 Capacity
3.2.1 Rotationally Invariant Noise
Now we return to our channel model as presented in Section 2.3. We have
$$y = Ax + n, \tag{3.48}$$
where $y \in \mathbb{C}^r$ and $x \in \mathbb{C}^t$ denote the received and transmitted vectors, respectively. $A$ is a deterministic $r \times t$ complex matrix, and $n \in \mathbb{C}^r$ is zero-mean complex Gaussian noise. We assume that the covariance matrix $C_n = E\{nn^H\}$ is known and non-singular, and that the noise vector is rotationally invariant, i.e., the pseudo-covariance matrix $P_n = E\{nn^T\}$ is the zero matrix. This is the usual assumption in the literature [13, 58] as well as in practice, although a vanishing pseudo-covariance matrix occurs only in some special cases. We will derive a criterion for the noise vector to be (essentially) rotationally invariant in Section 4.1 (for the special cases given by equations (2.16) and (2.25)). The specification of the channel model is completed by the power constraint,
$$E\{x^H x\} = \operatorname{tr}\left(E\{xx^H\}\right) \le S, \tag{3.49}$$


and the assumption that the channel is memoryless, i.e., that the noise vectors for
different channel uses are independent. We want to emphasize that this is only
an approximation because in general, we can expect correlations / dependencies
between the noise vectors of different channel uses. For mathematical reasons, we
stick to this simplification. Note that such dependencies possibly increase capacity
but make an analytical derivation less tractable.
The mutual information $I(x;y)$ between transmit and receive vector is defined as
$$I(x;y) = h(y) - h(y|x), \tag{3.50}$$
and can therefore be written as
$$I(x;y) = h(y) - h(\tilde{y}|\tilde{x}) = h(y) - h\big(\widetilde{Ax+n}\,\big|\,\tilde{x}\big) = h(y) - h(\tilde{n}) = h(Ax+n) - h(n), \tag{3.51}$$
since $\tilde{n}$ and $\tilde{x}$ are independent.
The capacity $C$ of the channel is defined as the maximum of the mutual information over all possible transmit random vectors $x$ satisfying the power constraint (3.49), i.e., we have to choose the distribution (the p.d.f.) of $x$ (of $\tilde{x}$) that maximizes equation (3.50) but also fulfills the power constraint. Mathematically, this maximization can be written as^{15}
$$C = \max_{f_{\tilde{x}}:\ \operatorname{tr}(E\{xx^H\})\le S} \{I(x;y)\}, \tag{3.52}$$
and, applying (3.51), we obtain
$$C = \max_{f_{\tilde{x}}:\ \operatorname{tr}(E\{xx^H\})\le S} \{h(Ax+n)\} - h(n). \tag{3.53}$$
Let $x_{\max}$ denote a random vector that maximizes (3.53), i.e.,
$$x_{\max} = \arg\max_{f_{\tilde{x}}:\ \operatorname{tr}(E\{xx^H\})\le S} \{h(Ax+n)\}, \tag{3.54}$$
and $y_{\max} = Ax_{\max} + n$. Obviously,
$$\operatorname{tr}\left(E\{x_{\max}x_{\max}^H\}\right) = \operatorname{tr}\left(C_{x_{\max}}\right) + \underbrace{\operatorname{tr}\left(\bar{x}_{\max}\bar{x}_{\max}^H\right)}_{\ge 0} \le S,$$
and (since $x_{\max}$ and $n$ are independent and $P_n = 0$)
$$C_{y_{\max}} = AC_{x_{\max}}A^H + C_n, \qquad P_{y_{\max}} = AP_{x_{\max}}A^T + P_n = AP_{x_{\max}}A^T.$$

^{15} If the unit of entropy is [bit], the unit of capacity is [bit / channel use].


Applying the (Generalized) Maximum Entropy Theorem (Theorem 3.19 or 3.21), we conclude that $y_{\max}$ is Gaussian with $P_{y_{\max}} = AP_{x_{\max}}A^T = 0$, and we can restrict our search for the maximizing random vector to zero-mean, rotationally invariant, Gaussian complex random vectors $x$, so that the maximization problem (3.53) and (3.54) is simplified^{16} to^{17}
$$C = \max_{C_x:\ \operatorname{tr}(C_x)\le S}\left\{\log\det\left(AC_xA^H+C_n\right)\right\} - \log\det C_n, \tag{3.55}$$
$$C_{x_{\max}} = \arg\max_{C_x:\ \operatorname{tr}(C_x)\le S}\,\log\det\left(AC_xA^H+C_n\right),$$
where $C_x$ is the covariance matrix of $x$, i.e., the maximization goes over all non-negative definite Hermitian matrices with a trace smaller than or equal to $S$.
Let $B_n$ denote a generalized Cholesky factor of $C_n$, i.e.,
$$C_n = B_nB_n^H,$$
and let
$$B_n^{-1}A = UDV^H$$
denote the Singular Value Decomposition (SVD) [19] of $B_n^{-1}A$. We have
$$\log\det\left(AC_xA^H + C_n\right) = \log\det\left(AC_xA^H + B_nB_n^H\right)$$
$$= \log\det\left(B_n\left(B_n^{-1}AC_xA^HB_n^{-H} + I_r\right)B_n^H\right)$$
$$= \log\det\left(B_n^{-1}AC_xA^HB_n^{-H} + I_r\right) + \log\det B_n + \log\det B_n^H$$
$$= \log\det\left(UDV^HC_xVD^TU^H + I_r\right) + \log\det C_n$$
$$= \log\det\left(DV^HC_xVD^T + I_r\right) + \log\det C_n,$$
and, with $C_x = VC_aV^H$ ($a = V^Hx$), the maximization problem (3.55) is equivalent to
$$C = \max_{C_a:\ \operatorname{tr}(C_a)\le S}\,\log\det\left(DC_aD^T + I_r\right), \tag{3.56}$$
$$C_{a_{\max}} = \arg\max_{C_a:\ \operatorname{tr}(C_a)\le S}\,\log\det\left(DC_aD^T + I_r\right),$$
since $\operatorname{tr}(C_x) = \operatorname{tr}(C_a)$.
Let $d_i$, $i = 1, \ldots, s$ ($s = \min\{r,t\}$), denote the diagonal elements of $D$ (the singular values of $B_n^{-1}A$), i.e.,^{18}
$$D = \operatorname{diag}_{r \times t}\{d_1, \ldots, d_s\}.$$
^{16} For non-singular $M \in \mathbb{C}^{n \times n}$: $\log\det(eM) = \log\det(M) + n\log(e)$.
^{17} Do not mix up the capacity $C$ with the covariance matrix $C_{x_{\max}}$ of the maximizing random vector.
^{18} $\operatorname{diag}_{r \times t}\{d_1, \ldots, d_s\}$ with $s = \min\{r,t\}$ is defined to be a complex valued matrix with $r$ rows and $t$ columns for which all entries with different row and column indices are 0 and the entry with $i$-th row and $i$-th column index is equal to $d_i$ ($i = 1, \ldots, s$).


[Figure 3.1 shows the Water Filling principle: over the sub-channels corresponding to the diagonal elements, the "ground levels" $1/d_1^2, \ldots, 1/d_s^2$ are filled with the powers $c_{a_{\max}1}, \ldots, c_{a_{\max}s}$ up to the common Water Level $L$.]

Fig. 3.1: Water Filling.

Proceeding as in [58], we find that the maximizing covariance matrix $C_{a_{\max}}$ is diagonal, i.e.,
$$C_{a_{\max}} = \operatorname{diag}_{t \times t}\{c_{a_{\max}1}, \ldots, c_{a_{\max}t}\}, \tag{3.57}$$
and, with the definition $x^+ = \max\{0, x\}$, the optimal diagonal entries can be found via Water Filling, cf. Figure^{19} 3.1, to be
$$c_{a_{\max}i} = \begin{cases}\left(L - \frac{1}{d_i^2}\right)^+, & i \le s \text{ and } d_i \neq 0,\\ 0, & i \le s \text{ and } d_i = 0,\\ 0, & i > s,\end{cases} \tag{3.58}$$
where the Water Level $L$ is chosen to satisfy $\operatorname{tr}(C_{a_{\max}}) = \sum_{i=1}^{t} c_{a_{\max}i} = S$. The corresponding maximum mutual information (capacity) is given by
$$C = \sum_{i:\ d_i \neq 0}\left[\log\left(Ld_i^2\right)\right]^+, \tag{3.59}$$
and $C_{x_{\max}}$ is finally obtained from $C_{a_{\max}}$ via
$$C_{x_{\max}} = VC_{a_{\max}}V^H. \tag{3.60}$$
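The Water Filling recipe (3.56)–(3.60) translates directly into a short numerical sketch. The following Python function is illustrative only (the function name and the bisection search for the Water Level are our own choices, not part of the text); it returns the capacity (3.59) in bit / channel use together with the optimal powers (3.58):

```python
import numpy as np

def waterfilling_capacity(A, Cn, S):
    """Capacity of y = Ax + n with rotationally invariant Gaussian noise,
    following (3.55)-(3.60): whiten with a Cholesky factor of Cn, take the
    SVD, and pour the power S over the sub-channels."""
    Bn = np.linalg.cholesky(Cn)                    # generalized Cholesky factor
    d = np.linalg.svd(np.linalg.solve(Bn, A), compute_uv=False)
    d = d[d > 1e-12]                               # sub-channels with d_i != 0
    ground = 1.0 / d**2                            # noise levels 1/d_i^2, Fig. 3.1
    # bisection for the Water Level L such that sum_i (L - 1/d_i^2)^+ = S
    lo, hi = ground.min(), ground.max() + S
    for _ in range(200):
        L = 0.5 * (lo + hi)
        if np.maximum(L - ground, 0.0).sum() > S:
            hi = L
        else:
            lo = L
    powers = np.maximum(L - ground, 0.0)           # optimal entries, eq. (3.58)
    C = np.maximum(np.log2(L * d**2), 0.0).sum()   # capacity, eq. (3.59)
    return C, powers
```

For example, with $A = I_2$, $C_n = I_2$, and $S = 2$, each sub-channel receives unit power and the capacity is 2 bit / channel use.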

3.2.2 Rotationally Variant Noise


Again, we deal with the channel model
$$y = Ax + n, \tag{3.61}$$

^{19} This is the general, well known Water Filling illustration, which does not assume a special (descending) ordering of $\{d_i : i = 1, \ldots, s\}$; cf. also Footnote 14 in this chapter.


where $y \in \mathbb{C}^r$ and $x \in \mathbb{C}^t$ denote the received and transmitted vectors, respectively. $A$ is a deterministic $r \times t$ complex matrix, and $n \in \mathbb{C}^r$ is zero-mean complex Gaussian noise. Again, we assume that the covariance matrix $C_n = E\{nn^H\}$ is known and non-singular. However, we now assume that the noise vector is rotationally variant, i.e., the known pseudo-covariance matrix $P_n = E\{nn^T\}$ is not the zero matrix^{20}. Similarly to the previous subsection, we restrict the set of possible input vectors according to the power constraint,
$$\operatorname{tr}\left(E\{xx^H\}\right) \le S, \tag{3.62}$$
and we assume that the channel is memoryless, i.e., the noise vectors for different channel uses are independent.
In contrast to the case of rotationally invariant noise (cf. Subsection 3.2.1), we make two additional (technical) assumptions in order to simplify the analysis, so that it is tractable at all. We assume:
1. $t \ge r$. For our purposes, this restriction is not substantial, since, for our practical channels (2.16) and (2.25), we have $t = r$.
2. A high signal-to-noise ratio (SNR). Note that it is not quite clear what we mean by high SNR at the moment, because its definition is not straightforward for a channel of the form (3.61). We will give a precise definition later in this subsection. However, for transmission over cable (bundles) this high SNR assumption is usually fulfilled.
Again, the mutual information $I(x;y)$ between transmit and receive vector can be written as
$$I(x;y) = h(y) - h(y|x) = h(Ax+n) - h(n), \tag{3.63}$$
such that the capacity of the channel,
$$C = \max_{f_{\tilde{x}}:\ \operatorname{tr}(E\{xx^H\})\le S} \{I(x;y)\}, \tag{3.64}$$
is obtained as
$$C = \max_{f_{\tilde{x}}:\ \operatorname{tr}(E\{xx^H\})\le S} \{h(Ax+n)\} - h(n). \tag{3.65}$$

Let $x_{\max}$ denote a random vector that maximizes (3.65), i.e.,
$$x_{\max} = \arg\max_{f_{\tilde{x}}:\ \operatorname{tr}(E\{xx^H\})\le S} \{h(Ax+n)\}, \tag{3.66}$$
and $y_{\max} = Ax_{\max} + n$.
^{20} Note that the analysis of this subsection is valid for a vanishing pseudo-covariance matrix as well.


Obviously,
$$\operatorname{tr}\left(E\{x_{\max}x_{\max}^H\}\right) = \operatorname{tr}\left(C_{x_{\max}}\right) + \underbrace{\operatorname{tr}\left(\bar{x}_{\max}\bar{x}_{\max}^H\right)}_{\ge 0} \le S,$$
and, since $x_{\max}$ and $n$ are independent,
$$C_{y_{\max}} = AC_{x_{\max}}A^H + C_n, \qquad P_{y_{\max}} = AP_{x_{\max}}A^T + P_n.$$

Applying the Generalized Maximum Entropy Theorem (Theorem 3.21), we conclude that $y_{\max}$ is Gaussian. We can then restrict our search for the maximizing random vector to zero-mean, Gaussian, complex random vectors $x$, so that the maximization problem (3.65) is simplified to
$$C = \max_{C_x,P_x:\ \operatorname{tr}(C_x)\le S}\left\{\log\det\left(AC_xA^H+C_n\right) + \frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right)\right\} + r\log(\pi e) - h(n), \tag{3.67}$$
where $C_x$ is the covariance matrix of $x$, $P_x$ is the pseudo-covariance matrix of $x$, and the $\sigma_i$ denote the singular values of
$$B_y^{-1}P_yB_y^{-T} = B_y^{-1}\left(AP_xA^T + P_n\right)B_y^{-T},$$
which must be smaller than 1, $B_y$ being a generalized Cholesky factor of
$$C_y = AC_xA^H + C_n.$$
Hence, the maximization is over all non-negative definite Hermitian matrices $C_x$ with trace smaller than or equal to $S$ and over all symmetric matrices $P_x$, such that $\{C_x, P_x\}$ is a valid pair of covariance and pseudo-covariance matrices according to the criterion presented in Theorem 3.15.
Observe that the term
$$\frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right) \tag{3.68}$$
of (3.67) depends on the pseudo-covariance matrix $P_x$ and on the covariance matrix $C_x$ via the generalized Cholesky factor $B_y$, but is smaller than or equal to 0. So, if it is possible to take the covariance matrix $C_x$ that maximizes the term
$$\log\det\left(AC_xA^H + C_n\right) \tag{3.69}$$
of (3.67) and then find a pseudo-covariance matrix $P_x$, such that $\{C_x, P_x\}$ is a valid pair and (3.68) is equal to 0, we have found the maximizing covariance and pseudo-covariance matrix. Note that (3.69) does not depend on $P_x$.


We will show in the following that our technical assumptions are sufficient for finding the maximum using this argument.
The first observation we make, if we look at (3.69), is that we have already found the maximizing covariance matrix in the previous subsection, cf. (3.55). Let $B_n$ denote a generalized Cholesky factor of $C_n$, let
$$B_n^{-1}A = UDV^H$$
denote the Singular Value Decomposition of $B_n^{-1}A$, and let $d_i$, $i = 1, \ldots, r$, denote the diagonal elements of $D$ (the singular values of $B_n^{-1}A$), i.e.,
$$D = \operatorname{diag}_{r \times t}\{d_1, \ldots, d_r\}$$
with
$$d_{\max} = d_1 \ge d_2 \ge \ldots \ge d_r = d_{\min}.$$
Then, according to (3.57), (3.58), (3.59), and (3.60), the maximizing covariance matrix $C_{x_{\max}}$ is given by
$$C_{x_{\max}} = VC_{a_{\max}}V^H, \tag{3.70}$$
where
$$C_{a_{\max}} = \operatorname{diag}_{t \times t}\{c_{a_{\max}1}, \ldots, c_{a_{\max}t}\} \tag{3.71}$$
is a diagonal matrix with diagonal entries obtained by Water Filling, i.e.,
$$c_{a_{\max}i} = \begin{cases}\left(L - \frac{1}{d_i^2}\right)^+, & i \le r \text{ and } d_i \neq 0,\\ 0, & i \le r \text{ and } d_i = 0,\\ 0, & i > r,\end{cases} \tag{3.72}$$
where the Water Level $L$ is chosen to satisfy $\operatorname{tr}(C_{a_{\max}}) = \sum_{i=1}^{t} c_{a_{\max}i} = S$. Furthermore,
$$\max_{C_x,P_x:\ \operatorname{tr}(C_x)\le S}\,\log\det\left(AC_xA^H + C_n\right) = \sum_{i:\ d_i\neq 0}\left[\log\left(Ld_i^2\right)\right]^+ + \log\det C_n. \tag{3.73}$$
Next, we will state our high SNR assumption more precisely: we will assume in the following that the Water Level $L$ is lower bounded by
$$L \ge \frac{2}{d_{\min}^2}, \qquad d_{\min} > 0. \tag{3.74}$$
Comparing this with (3.72), as illustrated in Figure 3.1, we conclude that this is equivalent to having a signal-to-noise ratio on every virtual sub-channel^{21} greater than or equal to 1, i.e.,
$$\frac{c_{a_{\max}i}}{1/d_i^2} \ge 1, \qquad i = 1, \ldots, r. \tag{3.75}$$

^{21} By virtual sub-channel, we mean the scalar channels obtained by the diagonalization via the SVD. So every virtual sub-channel is used for communications.


Note that (3.72) and (3.73) can now be written as
$$c_{a_{\max}i} = \begin{cases} L - \frac{1}{d_i^2}, & i \le r,\\ 0, & i > r,\end{cases} \tag{3.76}$$
and
$$\max_{C_x,P_x:\ \operatorname{tr}(C_x)\le S}\,\log\det\left(AC_xA^H + C_n\right) = \sum_{i=1}^{r} \log\left(Ld_i^2\right) + \log\det C_n, \tag{3.77}$$
with Water Level
$$L = \frac{S}{r} + \frac{1}{r}\sum_{i=1}^{r}\frac{1}{d_i^2}. \tag{3.78}$$
We want to emphasize that (3.78) enables us to formulate the high SNR assumption (3.74) as a condition for the signal power $S$ as well. Using the Euclidean matrix norm, cf. Subsection 3.1.3, to replace $1/d_{\min}^2$, and the trace operator $\operatorname{tr}(\cdot)$ to replace $\sum_{i=1}^{r} 1/d_i^2$, it is even possible (not shown here) to express this condition in terms of the original matrix $B_n^{-1}A$ and the signal power $S$.
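Under the high SNR assumption, the Water Level thus no longer needs to be searched for; (3.78) is explicit. A small numerical sanity check (the singular values $d_i$ and the power $S$ below are arbitrary illustrative choices satisfying (3.74)):

```python
import numpy as np

d = np.array([2.0, 1.0, 0.5])        # sub-channel singular values d_i (arbitrary)
S = 20.0                             # signal power, large enough for (3.74)
r = d.size
L = S / r + np.mean(1.0 / d**2)      # closed-form Water Level, eq. (3.78)
assert L >= 2.0 / d.min()**2         # high SNR assumption (3.74) holds
powers = L - 1.0 / d**2              # eq. (3.76): every sub-channel is active
assert np.isclose(powers.sum(), S)   # the power budget is met exactly
assert (powers >= 1.0 / d**2).all()  # SNR >= 1 on every sub-channel, eq. (3.75)
```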

Let $D^+$ denote the (Moore-Penrose) pseudo inverse [19] of $D$, i.e.,
$$D^+ = \operatorname{diag}_{t \times r}\left\{\frac{1}{d_1}, \ldots, \frac{1}{d_r}\right\} \in \mathbb{R}^{t \times r}, \tag{3.79}$$
and let us define the following symmetric matrix
$$P_{x_{\max}} = -\underbrace{VD^+U^{-1}B_n^{-1}}_{A^+}\,P_n\,\underbrace{B_n^{-T}U^{-T}\left(D^+\right)^TV^T}_{\left(A^+\right)^T}, \tag{3.80}$$
where $A^+$ denotes the (Moore-Penrose) pseudo inverse [19] of $A$, such that
$$P_{y_{\max}} = AP_{x_{\max}}A^T + P_n \tag{3.81}$$
$$= -B_nU\underbrace{DV^HVD^+}_{I_r}U^{-1}B_n^{-1}\,P_n\,B_n^{-T}U^{-T}\underbrace{\left(D^+\right)^TV^TV^*D^T}_{I_r}U^TB_n^T + P_n$$
$$= -P_n + P_n = 0. \tag{3.82}$$
This yields
$$B_{y_{\max}}^{-1}P_{y_{\max}}B_{y_{\max}}^{-T} = 0,$$


vanishing singular values, and finally, in (3.67),
$$\frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right) = 0.$$
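The cancellation (3.80)–(3.82) is easy to verify numerically for a square, full-rank channel matrix (so that $AA^+ = I_r$); the random matrices below are placeholders, not data from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
r = t = 3
A = rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))
M = rng.standard_normal((r, r)) + 1j * rng.standard_normal((r, r))
Pn = 0.01 * (M + M.T)        # a toy noise pseudo-covariance (symmetric)
Ap = np.linalg.pinv(A)       # Moore-Penrose pseudo inverse A^+
Px = -Ap @ Pn @ Ap.T         # transmit pseudo-covariance, eq. (3.80)
Py = A @ Px @ A.T + Pn       # receive pseudo-covariance, eq. (3.81)
assert np.allclose(Py, 0)    # eq. (3.82): the choice (3.80) makes y_max
                             # rotationally invariant
assert np.allclose(Px, Px.T) # and Px is indeed symmetric
```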

It remains to be shown that $\{C_{x_{\max}}, P_{x_{\max}}\}$ is a valid pair of covariance and pseudo-covariance matrices. First of all, observe that the high SNR assumption (3.74) implies
$$2\left\|D^+\right\|_2^2 \le L, \tag{3.83}$$
and, furthermore,
$$\left\|D^+\right\|_2^2 \le L - \left\|D^+\right\|_2^2 = \frac{1}{\left\|C_{x_{\max}}^{-1}\right\|_2}, \tag{3.84}$$
where (3.47) was used. This, together with Lemma 3.35, yields
$$\left\|P_{x_{\max}}\right\|_2 = \left\|VD^+U^{-1}B_n^{-1}P_nB_n^{-T}U^{-T}\left(D^+\right)^TV^T\right\|_2 \le \left\|D^+\right\|_2\,\left\|B_n^{-1}P_nB_n^{-T}\right\|_2\,\left\|\left(D^+\right)^T\right\|_2, \tag{3.85}$$
since the unitary factors $V$, $U^{-1}$, $U^{-T}$, and $V^T$ all have Euclidean matrix norm 1, and, according to (3.46) and Theorem 3.15 ($\left\|B_n^{-1}P_nB_n^{-T}\right\|_2 \le 1$), combined with (3.84),
$$\left\|P_{x_{\max}}\right\|_2 \le \frac{1}{\left\|C_{x_{\max}}^{-1}\right\|_2}. \tag{3.86}$$
Applying Lemma 3.37 to Theorem 3.15, we have proven that our choice of $\{C_{x_{\max}}, P_{x_{\max}}\}$ is indeed a valid pair of covariance and pseudo-covariance matrices.
Finally, the capacity (see also (3.67) and the Generalized Maximum Entropy Theorem 3.21) is obtained as
$$C = \sum_{i=1}^{r}\log\left(Ld_i^2\right) + \log\det C_n + r\log(\pi e) - h(n) \tag{3.87}$$
$$= \sum_{i=1}^{r}\log\left(Ld_i^2\right) + \log\det\left(\pi eC_n\right) - \left[\log\det\left(\pi eC_n\right) + \frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right)\right]$$
$$= \sum_{i=1}^{r}\log\left(Ld_i^2\right) - \frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right),$$
where the $\sigma_i$ now denote the singular values of $B_n^{-1}P_nB_n^{-T}$, which must be smaller than 1, $B_n$ being a generalized Cholesky factor of $C_n$. Note that if a singular value


of $B_n^{-1}P_nB_n^{-T}$ is equal to 1, this would result in an infinite capacity, as can be shown by using the equivalent real channel model of doubled dimension defined in (2.29), (2.30), and (2.31), and using Corollary 3.16.
We would like to emphasize that rotationally variant noise always increases
capacity, as can be seen by comparing (3.87) with (3.59).

3.2.3 Capacity Loss


In this subsection, we consider the same situation as in the previous subsection, i.e., we deal with the same channel model (3.61), assume a non-vanishing pseudo-covariance matrix $P_n$, and make the same two technical assumptions, cf. the beginning of Subsection 3.2.2.
In principle, all encoding-decoding schemes for this channel can be classified into two types: those that utilize the knowledge of $P_n$, and those that neglect it, or, to be more precise, erroneously assume that $P_n$ is the (all) zero matrix. These two types of transmission have different capacities in general, $C_{\text{utilize }P_n}$ and $C_{\text{neglect }P_n}$, which motivates the calculation of the capacity loss
$$\Delta C = C_{\text{utilize }P_n} - C_{\text{neglect }P_n}. \tag{3.88}$$
We want to mention that we could equivalently call this quantity a capacity gain. Note that most conventional schemes do not make use of the pseudo-covariance matrix. Hence, if we compare the performance of a scheme that utilizes $P_n$ with a conventional scheme, we will encounter a gain and not a loss. However, capacity results are theoretical performance results of optimum schemes (of course, optimum with respect to a certain optimality criterion). A scheme that neglects the pseudo-covariance matrix is certainly not optimum, if we apply the optimality criterion that takes into account the pseudo-covariance matrix. In our opinion, it is more appropriate to compare a scheme with an optimum scheme, and not with a certain sub-optimum scheme, and therefore we speak of a capacity loss instead of a capacity gain.
The previous Subsection 3.2.2 was dedicated to the first type of transmission, i.e., we determined the capacity
$$C_{\text{utilize }P_n} = \sum_{i=1}^{r}\log\left(Ld_i^2\right) - \frac{1}{2}\sum_i \log\left(1-\rho_i^2\right), \tag{3.89}$$
where the $\rho_i$ denote the singular values of $B_n^{-1}P_nB_n^{-T}$, which must be smaller than 1, $B_n$ being a generalized Cholesky factor of $C_n$.
Next, we consider the second type of transmission, where it is erroneously assumed that $P_n$ vanishes. Then, $x_{\max}$ is chosen as in Subsection 3.2.1, although it does not maximize the expression for the mutual information
$$I(x;y) = h(y) - h(y|x) = h(Ax+n) - h(n), \tag{3.90}$$


i.e., $x_{\max}$ is a zero-mean, rotationally invariant ($P_{x_{\max}} = 0$), complex Gaussian random vector with covariance matrix
$$C_{x_{\max}} = \arg\max_{C_x:\ \operatorname{tr}(C_x)\le S}\,\log\det\left(AC_xA^H + C_n\right). \tag{3.91}$$
Applying the Generalized Maximum Entropy Theorem (Theorem 3.21), the capacity can be written as
$$C_{\text{neglect }P_n} = I(x_{\max}; y_{\max}) = h(Ax_{\max}+n) - h(n) \tag{3.92}$$
$$= \log\det\left(AC_{x_{\max}}A^H + C_n\right) + \frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right) - \log\det C_n - \frac{1}{2}\sum_i \log\left(1-\rho_i^2\right),$$
where the $\sigma_i$ denote the singular values of
$$B_{y_{\max}}^{-1}P_{y_{\max}}B_{y_{\max}}^{-T} = B_{y_{\max}}^{-1}P_nB_{y_{\max}}^{-T}, \qquad P_{y_{\max}} = A\underbrace{P_{x_{\max}}}_{0}A^T + P_n = P_n,$$
$B_{y_{\max}}$ being a generalized Cholesky factor of
$$C_{y_{\max}} = AC_{x_{\max}}A^H + C_n,$$
and the $\rho_i$ denote the singular values of
$$B_n^{-1}P_nB_n^{-T},$$
which must be smaller than 1, $B_n$ being a generalized Cholesky factor of $C_n$. Note that all $\sigma_i$ are smaller than 1, as we will show in the following. First of all, observe (compare (3.91) with the results of Subsections 3.2.1 and 3.2.2) that, using the same nomenclature as in these two subsections,
$$C_{x_{\max}} = V\operatorname{diag}_{t \times t}\left\{L - \frac{1}{d_1^2},\ \ldots,\ L - \frac{1}{d_r^2},\ 0,\ \ldots,\ 0\right\}V^H, \tag{3.93}$$


and, furthermore,
$$C_{y_{\max}} = AC_{x_{\max}}A^H + B_nB_n^H \tag{3.94}$$
$$= B_n\left(B_n^{-1}AC_{x_{\max}}A^HB_n^{-H} + I_r\right)B_n^H$$
$$= B_n\left(UDV^HC_{x_{\max}}VD^HU^H + I_r\right)B_n^H$$
$$= B_nU\left(D\operatorname{diag}_{t \times t}\left\{L - \tfrac{1}{d_1^2},\ldots,L - \tfrac{1}{d_r^2},0,\ldots,0\right\}D^H + I_r\right)U^HB_n^H$$
$$= B_nU\left(LDD^H - I_r + I_r\right)U^HB_n^H = LB_nUDD^HU^HB_n^H.$$
This yields ($C_{y_{\max}} = B_{y_{\max}}B_{y_{\max}}^H$)
$$B_n^{-1}B_{y_{\max}}B_{y_{\max}}^HB_n^{-H} = LUDD^HU^H, \tag{3.95}$$
and, taking the inverse,


$$B_n^HB_{y_{\max}}^{-H}B_{y_{\max}}^{-1}B_n = \frac{1}{L}U\left(D^+\right)^HD^+U^H, \tag{3.96}$$
such that (applying the results of Subsection 3.1.3)
$$\left\|B_{y_{\max}}^{-1}B_n\right\|_2^2 = \frac{1}{L}\left\|D^+\right\|_2^2 \le \frac{1}{2}, \tag{3.97}$$
where (3.83) was used. This enables us to calculate
$$\left\|B_{y_{\max}}^{-1}P_nB_{y_{\max}}^{-T}\right\|_2 = \left\|B_{y_{\max}}^{-1}B_n\left(B_n^{-1}P_nB_n^{-T}\right)B_n^TB_{y_{\max}}^{-T}\right\|_2 \tag{3.98}$$
$$\le \left\|B_{y_{\max}}^{-1}B_n\right\|_2^2\,\left\|B_n^{-1}P_nB_n^{-T}\right\|_2 \le \frac{1}{2},$$
where Theorem 3.15 and (3.46), applied to $n$, were used. Applying Theorem 3.36, we conclude that all singular values $\sigma_i$ are even smaller than $\frac{1}{2}$. As was also the case in Subsection 3.2.2, a singular value $\sigma_i$ equal to 1 would result in an infinite capacity, as can be shown by using the equivalent real channel model of doubled dimension defined in (2.29), (2.30), and (2.31), and using Corollary 3.16.
Inserting (3.77) into (3.92), we finally obtain the capacity
$$C_{\text{neglect }P_n} = \sum_{i=1}^{r}\log\left(Ld_i^2\right) + \frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right) - \frac{1}{2}\sum_i \log\left(1-\rho_i^2\right). \tag{3.99}$$
Now we are in a position to calculate the capacity loss
$$\Delta C = C_{\text{utilize }P_n} - C_{\text{neglect }P_n} = -\frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right), \tag{3.100}$$


[Figure 3.2 shows three power diagrams over the real and imaginary noise sub-channels with noise powers $\frac{A+B}{2}$ and $\frac{A-B}{2}$: (a) the noise power distribution, (b) Water Filling neglecting $P_n$, which puts the same power $\frac{S}{2}$ onto both parts, and (c) Water Filling utilizing $P_n$, which fills both parts up to the common level $\frac{S+A}{2}$.]

Fig. 3.2: Water Filling strategies illustrating the capacity loss.

where the $\sigma_i$ denote the singular values of $B_{y_{\max}}^{-1}P_nB_{y_{\max}}^{-T}$, $B_{y_{\max}}$ being a generalized Cholesky factor of $C_{y_{\max}} = AC_{x_{\max}}A^H + C_n$. Note that the capacity loss lies within the range
$$0 \le \Delta C \le r\log\frac{2}{\sqrt{3}}, \tag{3.101}$$
which follows from (3.98) and
$$-\frac{1}{2}\sum_i \log\left(1-\sigma_i^2\right) \le -\frac{r}{2}\log\left(1-\frac{1}{4}\right) = \frac{r}{2}\log\frac{4}{3} = r\log\frac{2}{\sqrt{3}}.$$
3

Example: Complex Scalar Channel


We would like to illustrate the mechanisms that lead to this capacity loss using the
following (simple) example of a 1 - dimensional complex scalar channel, i.e.,
y = x + n,

x, y, n C,

(3.102)

where the noise has a known real valued covariance matrix and a known real valued
pseudo-covariance matrix (of dimension 1 1)
Cn = [A]

and Pn = [B] ,

A B > 0.

(3.103)


Observe that the equivalent real noise vector $\tilde{n}$ (of dimension 2) then has a covariance matrix
$$C_{\tilde{n}} = \frac{1}{2}\begin{bmatrix} A+B & 0 \\ 0 & A-B \end{bmatrix}. \tag{3.104}$$
Figure 3.2 (a) shows how the noise power is subdivided into real and imaginary part. If it is erroneously assumed that the pseudo-covariance matrix is the (all) zero matrix, the conventional Water Filling algorithm distributes the same signal power onto the real and the imaginary part, as is illustrated in Figure 3.2 (b). But this is obviously not the optimum solution that maximizes the mutual information in case of a non-vanishing pseudo-covariance matrix. The optimum distribution of the signal power to the real and imaginary part that achieves capacity is shown in Figure 3.2 (c). Note that this solution is given by Water Filling on a real and imaginary part level. The difference in mutual information corresponding to Figures 3.2 (b) and (c) is exactly the capacity loss we are looking at in this subsection.
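For this scalar example, the loss can be computed in closed form. The following sketch compares the two power allocations of Figure 3.2; it assumes the high-SNR regime in which both real sub-channels stay active, so that the simple Water Filling formulas apply (the function name is our own choice):

```python
import numpy as np

def scalar_loss(S, A, B):
    """Mutual-information difference (in bits) for y = x + n with Cn = [A],
    Pn = [B], comparing the equal split of Fig. 3.2 (b) with the real/
    imaginary-part Water Filling of Fig. 3.2 (c)."""
    N = np.array([(A + B) / 2, (A - B) / 2])     # real/imag noise powers
    # (b) neglect Pn: put S/2 onto the real and the imaginary part each
    I_neglect = 0.5 * np.log2(1 + (S / 2) / N).sum()
    # (c) utilize Pn: common water level (S + A)/2 over both sub-channels
    L = (S + A) / 2
    I_utilize = 0.5 * np.log2(L / N).sum()
    return I_utilize - I_neglect
```

For $B = 0$ the two allocations coincide and the loss vanishes; for $B > 0$ the loss is strictly positive.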

4. NOISE AND INTERFERENCE ANALYSIS OF DMT

In Section 2.3, we came to the conclusion that we deal with a channel model of the form
$$y = Ax + n, \tag{4.1}$$
where $y \in \mathbb{C}^r$ and $x \in \mathbb{C}^t$ denote the received and transmitted vectors, respectively. $A$ is the channel matrix and $n \in \mathbb{C}^r$ is the noise vector. In order to obtain capacity results, cf. Section 3.2, it is not sufficient to know the channel matrix $A$. It is also necessary to know the statistical properties of the noise vector $n$. To be more precise, we need the covariance matrix $C_n = E\{nn^H\}$ and the pseudo-covariance matrix $P_n = E\{nn^T\}$. For the DMT system, this means that we have to calculate the covariance and pseudo-covariance matrix of the vector $Ew_{n_0}$ in (2.16). We will show in this chapter that the noise is rotationally variant in general, a fact that is simply neglected in practical systems as well as in the literature. Applying the results of Subsection 3.2.3, we will calculate the resulting capacity loss. Furthermore, beyond capacity considerations, we also have a look at the (uncoded) symbol error probability. We will show that rotated rectangular constellations are more appropriate than the common (quadratic) QAM constellations, and we will derive formulas for the rotation angles and constellation sizes / densities. Finally, we will show that this is of greatest importance if the noise at the input of the receiver is colored, which will be the case in practice.

We also want to emphasize that the statements are also valid for MIMO DMT, cf. Subsection 5.1.1. Note that the noise vector $n_{n_0}$ in (2.25) is then (this follows from the DMT case) clearly a rotationally variant complex random vector in general.

It was already mentioned in Section 2.1 that the Time Domain Equalizer has the task to shorten the impulse response to a length shorter than or equal to $p+1$, $p$ being the length of the Cyclic Prefix. If this holds only approximately, one has to accept intersymbol interference (ISI) and intercarrier interference (ICI), which can be regarded as additional noise sources. In Section 4.4, we outline the fundamental properties of the underlying mechanisms. The presented results are an extension of the results of [42] to post- and precursors from both neighboring frames, relying on a different derivation. Furthermore, we show that ISI and ICI are rotationally variant in general and have equal first and second order moments.

For most of the other material presented in this chapter, we also refer to [56].


4.1 Derivation of the Noise Characteristics


4.1.1 Introduction
Let us consider the Discrete Multitone (DMT) system as it is shown in Figure 2.1. We mainly focus on the receiver part, especially on the noise characteristics disturbing the transmission.
First of all, observe that all operations in the receiver before the input of the decision device are linear operations. So it is sufficient to study the noise characteristics in the absence of a transmitted signal.
Throughout this chapter, the noise at the input of the receiver is modeled as a discrete-time, real valued (due to baseband signalling), wide-sense stationary (not necessarily Gaussian) random process. According to (2.4), the noise at the output of the Time Domain Equalizer (TDE), $z = [z(n)]_{n=-\infty,\ldots,+\infty}$, is also a discrete-time, real valued, wide-sense stationary random process, and we will assume that we are given the mean^{1} $\mu_z(n) = E\{z(n)\} = \mu_z$ and the autocorrelation function^{2} $R_z(m+n, m) = E\{z(m+n)z^*(m)\} = R_z(n)$. Note that this model includes the practically more relevant assumption of colored noise^{3}.
In the following, we analyze the first and second order moments of the noise at the input of the frequency domain equalizer. From this, we derive the moments at the input of the decision device.
The first part of the receiver transforms the random process at the input into a sequence of real random vectors. Each of these vectors has a first and second order description of its statistical properties, i.e., a mean vector and a covariance matrix. Due to the stationarity of the input random process, these mean vectors and covariance matrices are time independent. Furthermore, they do not depend on (time) shifts of the serial to parallel conversion. In the next part of the receiver, each random vector of the sequence is passed through a discrete Fourier transform (DFT), which in turn produces a sequence of random vectors, now complex valued. As we have seen in Section 3.1, for complex random vectors, mean vector and covariance matrix are not sufficient for a complete first and second order description of their statistical properties; one also needs the pseudo-covariance matrix. Again, all mean vectors, covariance matrices, and pseudo-covariance matrices are the same and do not depend on previous (time) shifts of the serial to parallel conversion.
As it can be seen from (3.6) (also shown in [41]), for a rotationally invariant
¹ Note that this process is usually zero-mean in our application. However, there may be other applications in which a discrete-time, real-valued, wide-sense stationary random process that is not zero-mean is passed (block-wise) through a DFT. Our analysis still applies to such situations.
² Again, the superscript * denotes complex conjugation, which is of course redundant for real-valued random processes. Since we will also deal with complex-valued random processes later on, we write it here for completeness.
³ Crosstalk is colored noise, and the time domain equalizer transforms white noise into colored noise as well.


random vector, its real and imaginary part vectors have⁴ identical (auto-)covariance matrices and a skew-symmetric [24] cross-covariance matrix [34]. As an immediate consequence, real and imaginary part of an element of such a random vector have identical variances and are uncorrelated. Using the theory of proper random vectors developed in [41], it can easily be seen that the random vectors at the output of the DFT are rotationally variant, except for the case when the input random vectors are constant with probability 1, i.e., roughly speaking, when they are deterministic⁵. This suggests that at the output of the DFT (and also at the input of the decision device), real and imaginary part at certain frequencies have different variances and/or are correlated in general.
Remark: In the case of passband Orthogonal Frequency Division Multiplexing
(OFDM), the situation is different. At the input of the receiver, the demodulation of
the signal (passband to baseband conversion) requires the calculation of an analytical signal. It is shown in [43] that the analytical signal of any stationary signal is
rotationally invariant. It follows that all considered random vectors are rotationally
invariant as well.

4.1.2 First and Second Moments


To verify this conjecture about correlations and variance differences, we calculate mean vector, covariance matrix, and pseudo-covariance matrix of the complex random vector

w_{n_0} = \left[ w_{n_0}(0), \dots, w_{n_0}(N-1) \right]^T

at the output of the DFT (of even length N \geq 2) analytically. Let us recall (see Figure 4.1) that the real (DFT-)input random vector

v_{n_0} = \left[ v_{n_0}(0), \dots, v_{n_0}(N-1) \right]^T

is part⁶ of a discrete-time, real-valued, wide-sense stationary random process z = [z(n)]_{n=-\infty,\dots,+\infty} with mean \mu_z and autocorrelation function R_z(n), i.e.,

v_{n_0}(n) = z(n_0 + n), \qquad n = 0, \dots, N-1.

⁴ In fact, these properties can equivalently be used as the definition of rotational invariance. Wooding [63] and Goodman [20] were apparently the first to deal with random vectors satisfying these conditions.
⁵ That is the only situation in which a real random vector happens to be rotationally invariant.
⁶ According to the first block of the receiver (see Figure 4.1).


[Figure 4.1 shows the receiver chain: the received sequence ..., z(0), ... enters a serial-to-parallel converter, the cyclic prefix (length p) is removed, the resulting real vectors ..., v_{n_0}, v_{n_0+N+p}, ... are passed through a DFT of length N, and the complex DFT outputs ..., w_{n_0}, w_{n_0+N+p}, ... feed the Frequency Domain Equalization and, finally, the Decision Device.]

Fig. 4.1: Part of a DMT receiver.

Applying the DFT to v_{n_0}, we get

w_{n_0}(k) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} v_{n_0}(n)\, e^{-j\frac{2\pi}{N}nk} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} z(n_0+n)\, e^{-j\frac{2\pi}{N}nk}, \qquad k = 0, \dots, N-1,    (4.2)

and

\mu_w(k) = E\{w_{n_0}(k)\} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \mu_z\, e^{-j\frac{2\pi}{N}nk} = \begin{cases} \sqrt{N}\,\mu_z, & k = 0 \\ 0, & k = 1, \dots, N-1 \end{cases}.    (4.3)

Note that the mean vector \mu_w = \left[ \mu_w(0), \dots, \mu_w(N-1) \right]^T is real and does not depend on n_0. With⁷
Q_w(k,l) = E\{w_{n_0}(k)\, w_{n_0}(l)\} = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} E\{z(n_0+n)\, z(n_0+m)\}\, e^{-j\frac{2\pi}{N}(mk+nl)} = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} R_z(n-m)\, e^{-j\frac{2\pi}{N}(mk+nl)},    (4.4)

⁷ Q_w(k,l) is also independent of n_0.

[Figure 4.2 depicts the (n, s) summation region of (4.7), the strip 0 ≤ n ≤ N−1, n+1−N ≤ s ≤ n: it splits into the segment s = 0 (area A_1), the triangle 1 ≤ s ≤ N−1, s ≤ n ≤ N−1 (area A_2), and the triangle 1−N ≤ s ≤ −1, 0 ≤ n ≤ s+N−1 (area A_3).]

Fig. 4.2: Summation area in (4.7).

the elements of covariance and pseudo-covariance matrix of w_{n_0} can be written⁸ as

C_w(k,l) = Q_w(k,-l) - \mu_w(k)\, \mu_w^*(l),    (4.5)

P_w(k,l) = Q_w(k,l) - \mu_w(k)\, \mu_w(l),    (4.6)

where the frequency index -l is understood modulo N (for the DFT of a real vector, w_{n_0}^*(l) = w_{n_0}(-l)).
The next step is to simplify the expression for Q_w(k,l). The idea is to reorder the terms of the double sum so that (after some calculations) only one sum remains. We have

Q_w(k,l) = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} R_z(n-m)\, e^{-j\frac{2\pi}{N}(mk+nl)} = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{s=n+1-N}^{n} R_z(s)\, e^{-j\frac{2\pi}{N}(nk+nl-sk)},    (4.7)

⁸ Again, no dependency on n_0.

where the index change s = n − m has been performed, such that the summation over m is replaced by a summation over s. Next, we interchange the two sums. Due to the dependence on n of the inner sum, we have to investigate the summation over (n, s) in some more detail. Figure 4.2 shows the effective pairs that are used in the two sums. They are denoted by the areas A_1, A_2, and A_3. Hence,

Q_w(k,l) = \frac{1}{N} \sum_{n=0}^{N-1} R_z(0)\, e^{-j\frac{2\pi}{N}n(k+l)} \quad (A_1)
\; + \; \frac{1}{N} \sum_{s=1}^{N-1} \sum_{n=s}^{N-1} R_z(s)\, e^{-j\frac{2\pi}{N}(nk+nl-sk)} \quad (A_2)
\; + \; \frac{1}{N} \sum_{s=1-N}^{-1} \sum_{n=0}^{s+N-1} R_z(s)\, e^{-j\frac{2\pi}{N}(nk+nl-sk)}, \quad (A_3)    (4.8)

and, furthermore,

Q_w(k,l) = \frac{1}{N} \sum_{n=0}^{N-1} R_z(0)\, e^{-j\frac{2\pi}{N}n(k+l)}
\; + \; \frac{1}{N} \sum_{s=1}^{N-1} \sum_{t=0}^{N-1-s} R_z(s)\, e^{-j\frac{2\pi}{N}(tk+tl+sl)}
\; + \; \frac{1}{N} \sum_{s=1}^{N-1} \sum_{n=0}^{N-1-s} R_z(-s)\, e^{-j\frac{2\pi}{N}(nk+nl+sk)},    (4.9)
where the index change t = n − s has been performed for term (A_2) in (4.8), such that its summation over n is replaced by a summation over t, and in term (A_3) of (4.8) s has been replaced by −s. Since R_z(s) is an even function, we obtain (writing t instead of n)

Q_w(k,l) = \frac{1}{N} R_z(0) \sum_{t=0}^{N-1} e^{-j\frac{2\pi}{N}t(k+l)} + \frac{1}{N} \sum_{s=1}^{N-1} R_z(s) \sum_{t=0}^{N-1-s} \left( e^{-j\frac{2\pi}{N}(tk+tl+sl)} + e^{-j\frac{2\pi}{N}(sk+tk+tl)} \right).    (4.10)
Using the formula

\sum_{t=0}^{M-1} a^t = \begin{cases} M, & a = 1 \\ \frac{1-a^M}{1-a}, & a \neq 1 \end{cases},    (4.11)

(4.10) further simplifies⁹ to

⁹ After a long but straightforward calculation. For the general case (the MIMO case), this calculation is carried out in full detail in Subsection 5.1.1.
Q_w(k,l) =    (4.12)
\begin{cases} R_z(0) + 2 \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right), & k+l = 0 \text{ or } k+l = N \\ -\frac{1}{N} \left( \cot\left(\frac{\pi}{N}(k+l)\right) + j \right) \sum_{s=1}^{N-1} R_z(s) \left( \sin\left(\frac{2\pi}{N}ks\right) + \sin\left(\frac{2\pi}{N}ls\right) \right), & k+l \neq 0 \text{ and } k+l \neq N, \end{cases}

where \cot\left(\frac{\pi}{N}(k+l)\right) + j = \frac{e^{\,j\frac{\pi}{N}(k+l)}}{\sin\left(\frac{\pi}{N}(k+l)\right)}.
4.1.3 Frequency Dependent Noise Analysis
In this section, we use the results of the previous section to calculate variances and covariances of the real and imaginary parts of the noise at certain frequencies, i.e., we calculate covariance matrices of the real 2-dimensional random vectors

\begin{pmatrix} \Re\{w_{n_0}(k)\} \\ \Im\{w_{n_0}(k)\} \end{pmatrix}, \qquad k = 0, \dots, N-1,

where the frequency index k is considered as a fixed parameter. Splitting w_{n_0}(k) = \Re\{w_{n_0}(k)\} + j\,\Im\{w_{n_0}(k)\} = a_{n_0}(k) + j\, b_{n_0}(k) into real and imaginary part, one immediately finds (cf. also [41]) the relations
\mathrm{Cov}\{a_{n_0}(k), a_{n_0}(l)\} = \frac{1}{2} \Re\{C_w(k,l) + P_w(k,l)\},
\mathrm{Cov}\{b_{n_0}(k), b_{n_0}(l)\} = \frac{1}{2} \Re\{C_w(k,l) - P_w(k,l)\},    (4.13)
\mathrm{Cov}\{a_{n_0}(k), b_{n_0}(l)\} = \frac{1}{2} \Im\{P_w(k,l) - C_w(k,l)\},

with the usual definition of the covariance Cov{·,·}. Specializing these equations to k = l yields the variances and covariances
\sigma_a^2(k) = \mathrm{Cov}\{a_{n_0}(k), a_{n_0}(k)\},
\sigma_b^2(k) = \mathrm{Cov}\{b_{n_0}(k), b_{n_0}(k)\},    (4.14)
\sigma_{a,b}(k) = \mathrm{Cov}\{a_{n_0}(k), b_{n_0}(k)\},

and, furthermore, applying (4.5) and (4.6),

\sigma_a^2(k) = \frac{1}{2} \Re\left\{ Q_w(k,-k) + Q_w(k,k) - 2\left(\mu_w(k)\right)^2 \right\},
\sigma_b^2(k) = \frac{1}{2} \Re\left\{ Q_w(k,-k) - Q_w(k,k) \right\},    (4.15)
\sigma_{a,b}(k) = \frac{1}{2} \Im\left\{ Q_w(k,k) - Q_w(k,-k) \right\}.


Inserting, cf. (4.12),

Q_w(k,-k) = R_z(0) + 2 \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right),    (4.16)

Q_w(k,k) = \begin{cases} R_z(0) + 2 \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right), & k = 0 \text{ or } k = \frac{N}{2} \\ -\frac{2}{N} \left( \cot\left(\frac{2\pi}{N}k\right) + j \right) \sum_{s=1}^{N-1} R_z(s) \sin\left(\frac{2\pi}{N}ks\right), & k \neq 0 \text{ and } k \neq \frac{N}{2}, \end{cases}

and using (4.3), we finally obtain the variances and covariances as

\sigma_a^2(0) = R_z(0) + 2 \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) - N \mu_z^2,
\sigma_b^2(0) = \sigma_{a,b}(0) = 0,
\sigma_a^2\left(\frac{N}{2}\right) = R_z(0) + 2 \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) (-1)^s,    (4.17)
\sigma_b^2\left(\frac{N}{2}\right) = \sigma_{a,b}\left(\frac{N}{2}\right) = 0,

and, for k ≠ 0 and k ≠ N/2, as

\sigma_a^2(k) = \frac{R_z(0)}{2} + \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right) - \frac{\cot\left(\frac{2\pi}{N}k\right)}{N} \sum_{s=1}^{N-1} R_z(s) \sin\left(\frac{2\pi}{N}ks\right),
\sigma_b^2(k) = \frac{R_z(0)}{2} + \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right) + \frac{\cot\left(\frac{2\pi}{N}k\right)}{N} \sum_{s=1}^{N-1} R_z(s) \sin\left(\frac{2\pi}{N}ks\right),    (4.18)
\sigma_{a,b}(k) = -\frac{1}{N} \sum_{s=1}^{N-1} R_z(s) \sin\left(\frac{2\pi}{N}ks\right).

Note that the noise variances (and thus the noise powers) of real and imaginary part at certain frequencies are different in general. Due to the properties of the DFT of a real vector, cf. for example [59], this is not surprising for k = 0 and k = N/2. Equation (4.18) shows that this also happens for k ≠ 0 and k ≠ N/2 and that there are statistical dependencies between real and imaginary part of the noise samples. In the following subsection, we will study this effect in more detail.
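As a sanity check, the per-carrier moments (4.18) can be compared against a direct matrix computation: build the covariance matrix of one received frame from R_z, propagate it through the DFT, and read off the real/imaginary-part moments via (4.13). A minimal sketch (NumPy assumed; the two-tap autocorrelation R_z(0) = 1, R_z(±1) = 1/2 from (4.28) serves as an example):

```python
import numpy as np

N = 16
Rz = np.zeros(N); Rz[0], Rz[1] = 1.0, 0.5       # R_z(0)=1, R_z(+-1)=1/2, cf. (4.28)

# Covariance matrix of one zero-mean frame v = [z(n0), ..., z(n0+N-1)]
idx = np.arange(N)
Cv = np.array([[Rz[abs(n - m)] for m in idx] for n in idx])

# Unitary DFT matrix, cf. (4.2), and second-order description at the DFT output
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
Cw = F @ Cv @ F.conj().T        # covariance E{w w^H}
Pw = F @ Cv @ F.T               # pseudo-covariance E{w w^T}

k = 3
sa2 = 0.5 * (Cw[k, k] + Pw[k, k]).real          # (4.13) specialized to k = l
sb2 = 0.5 * (Cw[k, k] - Pw[k, k]).real
sab = 0.5 * (Pw[k, k] - Cw[k, k]).imag

# Closed forms (4.18)
s = np.arange(1, N)
S = np.sum(Rz[s] * np.sin(2*np.pi*k*s/N))
base = Rz[0]/2 + np.sum(Rz[s] * (1 - s/N) * np.cos(2*np.pi*k*s/N))
cot = np.cos(2*np.pi*k/N) / np.sin(2*np.pi*k/N)
print(np.isclose(sa2, base - cot*S/N),
      np.isclose(sb2, base + cot*S/N),
      np.isclose(sab, -S/N))                    # True True True
```

The matrix route makes no use of the reordering tricks of Subsection 4.1.2, so agreement confirms the closed forms independently.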


4.1.4 Powers and Statistical Dependencies of Real and Imaginary Part of the Noise

Let us consider the covariance matrices C_{a,b}(k) of the real 2-dimensional random vectors \left( a_{n_0}(k),\, b_{n_0}(k) \right)^T, k = 1, ..., N/2 − 1:

C_{a,b}(k) = \begin{pmatrix} \sigma_a^2(k) & \sigma_{a,b}(k) \\ \sigma_{a,b}(k) & \sigma_b^2(k) \end{pmatrix}.    (4.19)

Eigenvalue decompositions, cf. Theorem 3.17, expand them into diagonal and orthonormal matrices \Lambda_{a,b}(k) and U_{a,b}(k), such that

C_{a,b}(k) = U_{a,b}(k)\, \Lambda_{a,b}(k)\, U_{a,b}^T(k).    (4.20)
Using (4.18), one calculates the diagonal eigenvalue matrices

\Lambda_{a,b}(k) = \begin{pmatrix} \lambda_1(k) & 0 \\ 0 & \lambda_2(k) \end{pmatrix}

as

\lambda_1(k) = \frac{R_z(0)}{2} + \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right) - \frac{1}{N \sin\left(\frac{2\pi}{N}k\right)} \sum_{s=1}^{N-1} R_z(s) \sin\left(\frac{2\pi}{N}ks\right),    (4.21)

\lambda_2(k) = \frac{R_z(0)}{2} + \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right) + \frac{1}{N \sin\left(\frac{2\pi}{N}k\right)} \sum_{s=1}^{N-1} R_z(s) \sin\left(\frac{2\pi}{N}ks\right),

and the orthonormal eigenvector matrices as

U_{a,b}(k) = \begin{pmatrix} \cos\left(\frac{\pi}{N}k\right) & -\sin\left(\frac{\pi}{N}k\right) \\ \sin\left(\frac{\pi}{N}k\right) & \cos\left(\frac{\pi}{N}k\right) \end{pmatrix}.    (4.22)
It is important to observe that there are no correlations and no variance differences between real and imaginary part at a certain frequency k if and only if \lambda_1(k) = \lambda_2(k). Therefore, it is natural to look at the difference \lambda_2(k) - \lambda_1(k). From (4.21), we obtain

\lambda_2(k) - \lambda_1(k) = \frac{2}{N \sin\left(\frac{2\pi}{N}k\right)} \sum_{s=1}^{N-1} R_z(s) \sin\left(\frac{2\pi}{N}ks\right) = \frac{2}{N \sin\left(\frac{2\pi}{N}k\right)} \Im\left\{ \sum_{s=1}^{N-1} R_z(s)\, e^{\,j\frac{2\pi}{N}ks} \right\}    (4.23)

and, using the properties of the DFT / IDFT, cf. for example [59], we are now able to state the following theorem:

Theorem 4.1 Correlations and variance differences between real and imaginary part of the noise at the output of the DFT do not occur at any frequency k = 1, ..., N/2 − 1 if and only if the autocorrelation function R_z(n) satisfies the relation

R_z(n) = R_z(N - n), \qquad n = 1, \dots, N-1.    (4.24)
Note that essentially the same theorem holds for the noise at the input of the decision device, because the frequency domain equalizer performs simple multiplications with complex numbers for each frequency, which correspond to rotations (phase) and scalings (absolute value) in the complex plane (for each frequency separately).
Theorem 4.1 tells us that correlations and/or variance differences occur only in the presence of colored noise. This is not a big surprise: the cyclic prefix is not able to transform the linear convolution that models the noise correlations into a cyclic convolution. All effects of the linear convolution remain visible at the output of the DFT. It is obvious that in practical systems, the condition (4.24) on the autocorrelation function¹⁰ is never fulfilled, i.e., there will be (at least) one n ∈ {1, ..., N−1} with R_z(n) ≠ R_z(N−n). So one has to be aware that the noise powers of real and imaginary part are different.
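Theorem 4.1 can be illustrated numerically: for any set of lags satisfying (4.24), the difference λ_2(k) − λ_1(k), which by (4.23) is proportional to the imaginary part of Σ_s R_z(s) e^{j2πks/N}, vanishes at every carrier. A sketch with a randomly generated circularly symmetric lag sequence (NumPy assumed):

```python
import numpy as np

N = 16
rng = np.random.default_rng(0)
half = rng.standard_normal(N//2 - 1)

# Autocorrelation lags chosen to satisfy (4.24): R_z(n) = R_z(N - n)
Rz = np.zeros(N)
Rz[0] = 5.0
Rz[1:N//2] = half
Rz[N//2] = 0.3
Rz[N//2 + 1:] = half[::-1]

s = np.arange(1, N)
for k in range(1, N//2):
    # proportional to lambda_2(k) - lambda_1(k), cf. (4.23)
    diff = np.imag(np.sum(Rz[s] * np.exp(2j*np.pi*k*s/N)))
    assert abs(diff) < 1e-12
print("no correlations / variance differences at any carrier")
```

Pairing the terms s and N−s shows why: the two exponentials combine into a purely real cosine when R_z(s) = R_z(N−s).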

4.1.5 Consequences and Asymptotic Analysis of the Noise Characteristics


Consider again a common DMT system as depicted in Figure 2.1. In the block Mapping, the bit stream is mapped onto QAM symbols of quadratic (QAM) constellations whose size and density depend on the noise power of the corresponding carrier / frequency, e.g., according to the water filling solution. But this implicitly assumes that the noise powers of real and imaginary part are equal. As we showed in the previous subsections, this is not the case in general. In order to be better adapted to the discovered noise characteristics, one should use rotated (non-square) rectangular complex symbol alphabets instead, i.e., symbol constellations of the form

\left\{ \left( t_1 V_1 + j\, t_2 V_2 \right) e^{\,j\varphi} : \; t_1 \in \{\pm 1, \pm 3, \dots, \pm(M_1 - 1)\},\; t_2 \in \{\pm 1, \pm 3, \dots, \pm(M_2 - 1)\} \right\},    (4.25)

where \varphi is the rotation angle, V_1 and V_2 are certain gain factors, and M_1 and M_2 are the (even) numbers of signal points in the two orthogonal directions, respectively, cf. also Figure 4.3. The rotation angles can be obtained from the orthonormal

¹⁰ An inspection of (4.24) shows that this condition is equivalent to the requirement that the autocorrelation function shifted to the left by N/2 is an even function within the interval [−(N/2 − 1), +(N/2 − 1)]. Do not mix this up with the fact that any non-shifted autocorrelation function is an even function.

[Figure 4.3 shows a rotated rectangular constellation in the complex plane: M_1 = 8 signal points with spacing 2V_1 along one principal axis and M_2 = 4 points with spacing 2V_2 along the orthogonal axis, the whole grid rotated by the angle \varphi.]

Fig. 4.3: Rotated rectangular complex symbol constellation (M_1 = 8, M_2 = 4).

eigenvector matrices¹¹, cf. (4.22), and the rotations introduced by the Frequency Domain Equalizer, whereas size and density are determined by the eigenvalues, cf. (4.21), and the scalings introduced by the Frequency Domain Equalizer.

Theorem 4.2 The angles of the noise rotations at the output of the DFT,

\varphi(k) = \frac{\pi}{N}\, k, \qquad k = 1, \dots, \frac{N}{2}-1,

are independent of the mean \mu_z and of the autocorrelation function R_z(n) of the noise at the output of the Time Domain Equalizer (TDE).
To get a better picture of the deviations of the (ideal) rotated rectangular constellations from the common quadratic QAM constellations, we have to look at the relative differences

d(k) = \frac{\lambda_2(k) - \lambda_1(k)}{\lambda_1(k) + \lambda_2(k)} = \frac{\lambda_2(k) - \lambda_1(k)}{\sigma_a^2(k) + \sigma_b^2(k)},    (4.26)

¹¹ These matrices are rotation matrices and describe the noise rotations.


[Figure 4.4 plots the magnitudes |d(k)| of the relative differences over the carrier index for N = 512 and the filter parameters q = 2, q = 70, and q = 254.]

Fig. 4.4: Relative differences. The noise is zero-mean filtered white Gaussian noise with variance 1.

which can be viewed as a measure of the eccentricity of the noise ellipses and, in turn, of the non-squareness of the rotated rectangular constellations. Note that the scaling factor introduced by the Frequency Domain Equalizer cancels out in the normalized expression (4.26).
From (4.21), we can obtain the traces

\lambda_2(k) + \lambda_1(k) = R_z(0) + 2 \sum_{s=1}^{N-1} R_z(s) \left(1 - \frac{s}{N}\right) \cos\left(\frac{2\pi}{N}ks\right).
We omit an analytic treatment of the relative differences. Instead, we first provide some simulation results and then analyze what happens if N goes to infinity for a special choice of R_z(n).

Simulation Results

From Figure 4.4, one can see that significant differences between the ideal rotated rectangular constellations and the quadratic QAM constellations in use may occur. The noise is zero-mean filtered white Gaussian noise with filter impulse response¹²

¹² This is an artificially constructed example that merely shows the existence of extreme cases. Note that if |d(k)| = 1 for a certain frequency k, the (optimum) rotated rectangular constellation at this frequency k collapses into a constellation whose symbols lie on a straight line.


(q is a fixed parameter)

g_q(n) = \begin{cases} \left( 1 - \cos\left(\frac{2\pi n}{N}\right) \right) \cos\left(\frac{2\pi q n}{N}\right), & n = 0, \dots, N-1 \\ 0, & n \neq 0, \dots, N-1 \end{cases}.    (4.27)

Note that 1 is the maximum value |d(k)| can take (in this case, \lambda_1(k) = 0 or \lambda_2(k) = 0).
Asymptotic Analysis

In the following, we assume that the random noise process at the input of the receiver is zero mean and has the autocorrelation function

R_z(n) = \begin{cases} 1, & n = 0 \\ \frac{1}{2}, & n = 1 \text{ or } n = -1 \\ 0, & n \neq -1,\; n \neq 0,\; \text{and } n \neq 1 \end{cases}.    (4.28)
The corresponding relative (eigenvalue) difference at frequency N/2 − 1 is given by

d\left(\frac{N}{2} - 1\right) = \frac{1}{N - (N-1)\cos\left(\frac{2\pi}{N}\right)},    (4.29)

which tends to 1 as N tends to infinity. This shows that, in general, an increasing DFT length N does not diminish the effect of having correlations and / or variance differences of real and imaginary part. On the contrary, there are frequencies (dependent on N; here shown for frequency N/2 − 1) for which non-square symbol constellations become more and more appropriate. The bigger N, the more dramatic the achievable gain of using a non-square symbol constellation at frequency N/2 − 1 instead of a common quadratic QAM constellation.
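The asymptotic claim is easy to reproduce numerically by evaluating d(k) from the general formulas (4.21), (4.26) and comparing with the closed form (4.29); the values approach 1 as N grows. A sketch (NumPy assumed):

```python
import numpy as np

def d_rel(N, k, Rz):
    """Relative eigenvalue difference d(k), cf. (4.21) and (4.26)."""
    s = np.arange(1, N)
    S = np.sum(Rz(s) * np.sin(2*np.pi*k*s/N))
    trace = Rz(0) + 2*np.sum(Rz(s) * (1 - s/N) * np.cos(2*np.pi*k*s/N))
    return (2*S / (N*np.sin(2*np.pi*k/N))) / trace

# Two-tap autocorrelation (4.28)
Rz = lambda n: np.where(n == 0, 1.0, np.where(np.abs(n) == 1, 0.5, 0.0))

for N in (16, 64, 256, 1024, 4096):
    closed = 1.0 / (N - (N - 1)*np.cos(2*np.pi/N))     # (4.29)
    print(N, d_rel(N, N//2 - 1, Rz), closed)           # values approach 1
```

For this R_z, the general expression collapses analytically to (4.29), so the two printed columns agree to machine precision.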

4.2 Capacity Loss


In order to use rotated rectangular constellations, one has to modify the existing bit-loading / mapping algorithms such that they take into account the powers of the real and imaginary parts of the noise. Note that the noise rotation angles do not depend on the actual noise characteristics (see Theorem 4.2) and need not be estimated during transmission. (Of course, the rotations introduced by the Frequency Domain Equalizer have to be taken into account, too. But these parameters have to be estimated anyway.) On the other hand, if one sticks to the conventional quadratic QAM constellations, one has to accept significant increases in symbol error probability (at least if there is no appropriate coding, cf. Section 4.3) and decreased capacity. We will specialize the results of Subsection 3.2.3 in order to obtain this capacity loss. Of course, we will assume that all required technical assumptions (high-SNR assumption) of Subsection 3.2.3 (see also Subsection 3.2.2) are fulfilled.


First of all, observe that only the frequencies k = 1, ..., N/2 − 1 contribute to this capacity loss because, for the frequencies k = 0 and k = N/2, the corresponding elements of the input vector of the IDFT have to be real valued, cf. also (2.13), which is in line with (4.17). Mapping this to our nomenclature (of Subsection 3.2.3), we have x, y, n ∈ C^{N/2−1} (r = t = N/2 − 1, cf. the channel model (3.61) and especially the dimensions of the involved vectors and matrices) and, see (2.16),

A = \mathrm{diag}\left( H\left(e^{\,j\frac{2\pi}{N}}\right), \dots, H\left(e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}\right) \right),    (4.30)

H(z) being the Z-transform of the impulse response h. It remains to specify the
covariance matrix and the pseudo-covariance matrix of the complex noise vector n. An inspection of (4.3), (4.5), (4.6), and (4.12) shows that this has already been done in Section 4.1. However, these formulas are too complicated to serve as a basis for an analytical expression for the capacity loss. Instead, we will use an approximation that only takes into account the dependencies and power differences of real and imaginary part at the individual frequencies and neglects cross-correlations between different frequencies (subcarriers). Of course, the random noise vector is assumed to be Gaussian distributed. Using (the notation of) (4.14) or (4.19), we can write the covariance matrix of the equivalent real-valued noise vector ñ of doubled dimension N − 2 as

C_{\tilde n} = \begin{pmatrix} \mathrm{diag}\left(\sigma_a^2(1), \dots, \sigma_a^2\left(\frac{N}{2}-1\right)\right) & \mathrm{diag}\left(\sigma_{a,b}(1), \dots, \sigma_{a,b}\left(\frac{N}{2}-1\right)\right) \\ \mathrm{diag}\left(\sigma_{a,b}(1), \dots, \sigma_{a,b}\left(\frac{N}{2}-1\right)\right) & \mathrm{diag}\left(\sigma_b^2(1), \dots, \sigma_b^2\left(\frac{N}{2}-1\right)\right) \end{pmatrix},    (4.31)
from which we obtain, by applying (3.6),

\Re\{C_n\} = \mathrm{diag}\left( \sigma_a^2(1) + \sigma_b^2(1), \dots, \sigma_a^2\left(\frac{N}{2}-1\right) + \sigma_b^2\left(\frac{N}{2}-1\right) \right), \qquad \Im\{C_n\} = 0,

\Re\{P_n\} = \mathrm{diag}\left( \sigma_a^2(1) - \sigma_b^2(1), \dots, \sigma_a^2\left(\frac{N}{2}-1\right) - \sigma_b^2\left(\frac{N}{2}-1\right) \right), \qquad \Im\{P_n\} = 2\, \mathrm{diag}\left( \sigma_{a,b}(1), \dots, \sigma_{a,b}\left(\frac{N}{2}-1\right) \right),    (4.32)

and finally, by applying (4.20), (4.21), and (4.22),

C_n = \mathrm{diag}\left( \lambda_1(1) + \lambda_2(1), \dots, \lambda_1\left(\frac{N}{2}-1\right) + \lambda_2\left(\frac{N}{2}-1\right) \right),

P_n = \mathrm{diag}\left( \left(\lambda_1(1) - \lambda_2(1)\right) e^{\,j\frac{2\pi}{N}}, \dots, \left(\lambda_1\left(\frac{N}{2}-1\right) - \lambda_2\left(\frac{N}{2}-1\right)\right) e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)} \right),    (4.33)
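The diagonal entries of (4.33) can be cross-checked against a frame-based computation of the pseudo-covariance: for zero-mean noise, the (k, k) entry of F C_v F^T must equal (λ_1(k) − λ_2(k)) e^{j2πk/N}. A small sketch (NumPy assumed; example autocorrelation as in (4.28)):

```python
import numpy as np

N = 16
Rz = np.zeros(N); Rz[0], Rz[1] = 1.0, 0.5       # example autocorrelation, cf. (4.28)
idx = np.arange(N)
Cv = np.array([[Rz[abs(n - m)] for m in idx] for n in idx])
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)
Pw = F @ Cv @ F.T                                # pseudo-covariance at the DFT output

s = np.arange(1, N)
for k in range(1, N//2):
    S = np.sum(Rz[s] * np.sin(2*np.pi*k*s/N))
    lam_diff = -2*S / (N*np.sin(2*np.pi*k/N))    # lambda_1(k) - lambda_2(k), cf. (4.21)
    assert np.isclose(Pw[k, k], lam_diff * np.exp(2j*np.pi*k/N))
print("pseudo-covariance entries match (4.33)")
```

The phase of each entry is twice the rotation angle πk/N of Theorem 4.2, as expected for a pseudo-covariance.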

such that we are now in the position to specialize the results of Subsection 3.2.3. With the generalized Cholesky factor of C_n,

B_n = \mathrm{diag}\left( \sqrt{\lambda_1(1) + \lambda_2(1)}, \dots, \sqrt{\lambda_1\left(\frac{N}{2}-1\right) + \lambda_2\left(\frac{N}{2}-1\right)} \right),    (4.34)

we calculate

B_n^{-1} A = \mathrm{diag}\left( \frac{H\left(e^{\,j\frac{2\pi}{N}}\right)}{\sqrt{\lambda_1(1)+\lambda_2(1)}}, \dots, \frac{H\left(e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}\right)}{\sqrt{\lambda_1\left(\frac{N}{2}-1\right)+\lambda_2\left(\frac{N}{2}-1\right)}} \right)    (4.35)

= \underbrace{\mathrm{diag}\left( e^{\,j \arg\left\{H\left(e^{\,j\frac{2\pi}{N}}\right)\right\}}, \dots, e^{\,j \arg\left\{H\left(e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}\right)\right\}} \right) Q}_{U} \; \underbrace{Q^H\, \mathrm{diag}\left( \frac{\left|H\left(e^{\,j\frac{2\pi}{N}}\right)\right|}{\sqrt{\lambda_1(1)+\lambda_2(1)}}, \dots, \frac{\left|H\left(e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}\right)\right|}{\sqrt{\lambda_1\left(\frac{N}{2}-1\right)+\lambda_2\left(\frac{N}{2}-1\right)}} \right) Q}_{D} \; \underbrace{Q^H}_{V^H},

where Q is a permutation matrix such that D contains the singular values of B_n^{-1} A in descending order, cf. Footnote 14 in Subsection 3.1.3. Note that a permutation matrix is a square matrix that contains exactly one 1 in each column and row and has zero entries otherwise. It follows that any permutation matrix is unitary, and we have in fact found the SVD of B_n^{-1} A. From (3.94), we obtain

C_{y_{max}} = L\; \mathrm{diag}\left( \left| H\left(e^{\,j\frac{2\pi}{N}}\right) \right|^2, \dots, \left| H\left(e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}\right) \right|^2 \right),    (4.36)

L being the Water Level, and, furthermore,

B_{y_{max}} = \sqrt{L}\; \mathrm{diag}\left( \left| H\left(e^{\,j\frac{2\pi}{N}}\right) \right|, \dots, \left| H\left(e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}\right) \right| \right),    (4.37)

such that

B_{y_{max}}^{-1} P_n B_{y_{max}}^{-T} = \frac{1}{L}\; \mathrm{diag}\left( \frac{\left(\lambda_1(1)-\lambda_2(1)\right) e^{\,j\frac{2\pi}{N}}}{\left| H\left(e^{\,j\frac{2\pi}{N}}\right) \right|^2}, \dots, \frac{\left(\lambda_1\left(\frac{N}{2}-1\right)-\lambda_2\left(\frac{N}{2}-1\right)\right) e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}}{\left| H\left(e^{\,j\frac{2\pi}{N}\left(\frac{N}{2}-1\right)}\right) \right|^2} \right)    (4.38)

[Figure 4.5 plots the normalized capacity loss ΔC/C_{Rot Rect} in percent over the loop length in kilometers.]

Fig. 4.5: Normalized capacity loss (with respect to the capacity C_{Rot Rect} that assumes that correlations and variance differences of real and imaginary part of the noise are utilized) in terms of loop length, cf. also Section B.1 in the Appendix.

has singular values¹³

\kappa_k = \frac{\left| \lambda_1(k) - \lambda_2(k) \right|}{L \left| H\left(e^{\,j\frac{2\pi}{N}k}\right) \right|^2}, \qquad k = 1, \dots, \frac{N}{2}-1.    (4.39)

Inserting the \kappa_k into (3.100), we finally end up with the following (approximate) expression for the capacity loss [bit / channel use],

\Delta C \approx -\frac{1}{2(N+p)} \sum_{k=1}^{\frac{N}{2}-1} \log_2\left( 1 - \frac{\left(\lambda_1(k) - \lambda_2(k)\right)^2}{L^2 \left| H\left(e^{\,j\frac{2\pi}{N}k}\right) \right|^4} \right),    (4.40)

where p denotes the length of the Cyclic Prefix. Note that the division by N + p originates from the serial-to-parallel conversion, the conjugate complex extension, and the addition of the Cyclic Prefix (compare also with (2.1)).
We performed some simulations in order to demonstrate the capacity loss. We used the parameters of a real ADSL scenario, i.e., a DFT length of N = 512, a subcarrier spacing of 4312.5 Hz, and a transmit power of 100 mW. The transfer functions of the loops were obtained by measurements of Austrian cables. The noise model was as follows: we assumed two additive, statistically independent noise components. One was the typical noise environment in a cable bundle, including crosstalk and background noise. The other was stationary narrowband interference with a bandwidth of 10 kHz, a center frequency of 1.07 MHz, and 0 dBm power. For a full and detailed description of this simulation scenario, we refer to Section B.1 in the Appendix. The normalized capacity loss depending on the loop length can be seen in Figure 4.5. Obviously, the capacity loss is not severe for short and medium length cables.

¹³ Do not mix up the \kappa_k of Subsection 3.2.3 with the \lambda_1(k) and \lambda_2(k) of Section 4.1.
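As a toy evaluation of (4.40) (not the measured ADSL scenario: a hypothetical flat channel with |H| = 1, water level L = 1, cyclic prefix length p = 32, and the two-tap autocorrelation (4.28) are assumed):

```python
import numpy as np

N, p, L = 512, 32, 1.0                          # DFT length, CP length, water level (assumed)
k = np.arange(1, N//2)
H = np.ones(k.size)                             # hypothetical flat channel |H| = 1
s = np.arange(1, N)
Rz = np.zeros(N); Rz[0], Rz[1] = 1.0, 0.5       # autocorrelation (4.28)

S = np.array([np.sum(Rz[s] * np.sin(2*np.pi*kk*s/N)) for kk in k])
lam_diff = -2*S / (N*np.sin(2*np.pi*k/N))       # lambda_1(k) - lambda_2(k), cf. (4.21)
kappa = np.abs(lam_diff) / (L*np.abs(H)**2)     # singular values (4.39); here 1/N for all k
dC = -np.sum(np.log2(1 - kappa**2)) / (2*(N + p))   # capacity loss (4.40)
print(f"capacity loss: {dC:.2e} bit / channel use")
```

For this particular R_z the eigenvalue differences are 1/N at every carrier, so the loss is tiny; a strongly colored R_z (e.g., filtered with (4.27)) drives some κ_k toward 1 and makes the loss visible.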

4.3 Symbol Error Probability and Optimized Bit-Loading


In this section, we are again interested in the gain achieved by rotated (non-square) rectangular constellations over quadratic QAM constellations. But, in contrast to the previous section, where capacity was used to express this gain, we now use the uncoded symbol error probability (and the SNR gain) as a performance measure. Note that this measure can be regarded as the other extreme: while capacity results implicitly assume an optimum (not easily implementable) coding strategy, we do not take coding into account here at all.
Furthermore, we also address the issue of designing the optimum (rotated) rectangular constellation and present explicit formulas for the numbers of constellation points and the gain factors in the respective directions.
We start our analysis with a basic property of rectangular constellations. Let us assume that the following rectangular constellation

\left\{ t_1 V_1(k) + j\, t_2 V_2(k) : \; t_1 \in \{\pm 1, \pm 3, \dots, \pm(M_1(k)-1)\},\; t_2 \in \{\pm 1, \pm 3, \dots, \pm(M_2(k)-1)\} \right\}    (4.41)

is assigned to the k-th subcarrier, where V_1(k) and V_2(k) are certain gain factors and M_1(k) and M_2(k) are the (even) numbers of signal points in real and imaginary part direction, respectively. Note that the total number of signal points is M_{all}(k) = M_1(k)\, M_2(k). If these signal points are chosen with equal probabilities during transmission, the average signal power in real and imaginary part direction
Mi (k)
2

Si (k) =

X
2
(2m 1)2 (Vi (k))2
Mi (k)
m=1
2

2 (Vi (k))
Mi (k)

Mi (k)
2

4m2 4m + 1

m=1

(4.42)

1
2
2
(Vi (k)) (Mi (k)) 1 ,
=
3
P
P
M (M +1)
M (M +1)(2M +1)
2
where M
and M
were used. If the
m=1 m =
m=1 m =
2
6
number of signal points Mi (k) is not too small, we can approximate the obtained


signal power by

S_i(k) \approx \frac{1}{3} \left(V_i(k)\right)^2 \left(M_i(k)\right)^2.    (4.43)

Next, we assume that a certain conventional bit-loading algorithm assigns a square QAM constellation to the k-th subcarrier. This implies that the gain factors and the numbers of signal points in real and imaginary part direction, respectively, are equal, see (4.41), i.e.,

V(k) = V_1(k) = V_2(k), \qquad M(k) = M_1(k) = M_2(k).

Note that the total number of signal points is M_{all}(k) = (M(k))^2. Specializing (4.42) and (4.43), the average signal power (in real and imaginary part direction together) is obtained as

S_{all}(k) = S_1(k) + S_2(k) = \frac{2}{3} (V(k))^2 \left( M_{all}(k) - 1 \right) \approx \frac{2}{3} (V(k))^2 M_{all}(k),    (4.44)
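The power formulas (4.42) and (4.44) are easy to confirm by brute-force enumeration of the constellation points; a sketch (NumPy assumed; M and V are arbitrary example values):

```python
import numpy as np

def avg_power(M1, M2, V1, V2):
    """Average power of the rectangular constellation (4.41), equiprobable symbols."""
    t1 = np.arange(-(M1 - 1), M1, 2)            # +-1, +-3, ..., +-(M1-1)
    t2 = np.arange(-(M2 - 1), M2, 2)
    pts = (t1[:, None]*V1 + 1j*t2[None, :]*V2).ravel()
    return np.mean(np.abs(pts)**2)

M, V = 8, 0.7                                   # square 64-QAM, example gain factor
exact = avg_power(M, M, V, V)
formula = 2/3 * V**2 * (M*M - 1)                # (4.44) before the final approximation
print(exact, formula)                           # both approximately 20.58
```

The same helper evaluates non-square cases, where each direction contributes (1/3) V_i² (M_i² − 1) according to (4.42).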

the approximation being good provided that the number of signal points M_{all}(k) is not too small. At the input of the decision device, the eigenvalues of the covariance matrices of real and imaginary parts of the noise at certain frequencies are given by (see also (2.16))

\tilde\lambda_1(k) = \left| H\left(e^{\,j\frac{2\pi}{N}k}\right) \right|^{-2} \lambda_1(k), \qquad \tilde\lambda_2(k) = \left| H\left(e^{\,j\frac{2\pi}{N}k}\right) \right|^{-2} \lambda_2(k).    (4.45)
In the following, we will assume \tilde\lambda_1(k) > \tilde\lambda_2(k) without loss of generality. In the presence of Gaussian noise, the symbol error probabilities can be approximated by (see Figure 4.6)

P_{\text{Square QAM}}(k) \approx 2Q\left( \sqrt{\frac{(V(k))^2}{\tilde\lambda_1(k)}} \right) + 2Q\left( \sqrt{\frac{(V(k))^2}{\tilde\lambda_2(k)}} \right) - 4Q\left( \sqrt{\frac{(V(k))^2}{\tilde\lambda_1(k)}} \right) Q\left( \sqrt{\frac{(V(k))^2}{\tilde\lambda_2(k)}} \right) \approx 2Q\left( \sqrt{\frac{(V(k))^2}{\tilde\lambda_1(k)}} \right),    (4.46)

where the Q-function is defined as

Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{t^2}{2}}\, dt.    (4.47)
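The approximation chain in (4.46) can be checked by a Monte-Carlo simulation of an interior decision region with axis-aligned elliptic Gaussian noise (rotations neglected; all numerical values are made-up examples):

```python
import numpy as np
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))          # Q-function, cf. (4.47)

V, l1, l2 = 1.0, 0.16, 0.04                      # half-spacing and noise eigenvalues (assumed)
rng = np.random.default_rng(1)
n = 400_000
a = rng.normal(0.0, sqrt(l1), n)                 # real-part noise
b = rng.normal(0.0, sqrt(l2), n)                 # imaginary-part noise

P_mc = np.mean((np.abs(a) > V) | (np.abs(b) > V))        # interior-point error rate
P_exact = 2*Q(V/sqrt(l1)) + 2*Q(V/sqrt(l2)) - 4*Q(V/sqrt(l1))*Q(V/sqrt(l2))
P_dominant = 2*Q(V/sqrt(l1))                     # last step of (4.46)
print(P_mc, P_exact, P_dominant)
```

With these numbers, the term containing the smaller eigenvalue is several orders of magnitude below the dominant one, illustrating why dropping it is harmless.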


[Figure 4.6 sketches the decision region of an interior point of a square QAM constellation (side length 2V(k)) together with the noise ellipse with principal-axis standard deviations \sqrt{\tilde\lambda_1(k)} and \sqrt{\tilde\lambda_2(k)}.]

Fig. 4.6: Decision region of an interior point of a square QAM constellation (bounded by the solid line) and lower and upper bound areas (bounded by dashed lines). The edges of these lower and upper bound areas are chosen to be parallel to the eigenvectors of the noise and are margined by the inscribed circle and circumscribed circle of the square decision region. This is the reason for the factors \frac{1}{\sqrt{2}} and \sqrt{2}.

[Figure 4.7 plots Q(\sqrt{x}) on a logarithmic scale; the markers 0 dB, 6 dB, 9.5 dB, 12 dB, 14 dB, and 15.6 dB indicate the corresponding abscissa values.]

Fig. 4.7: The function Q(\sqrt{\cdot}).

Note that the first approximation in (4.46) originates from the fact that only interior points of the QAM constellation are considered and that the noise rotations are not taken into account. Observe that the noise rotations at the input of the decision device (in contrast to the noise rotations at the output of the DFT) are not only determined by Theorem 4.2: they also depend on \arg\left\{ H\left(e^{\,j\frac{2\pi}{N}k}\right) \right\}. The second approximation is due to the fact that Q(\sqrt{\cdot}) is a strictly monotone decreasing function having values close to zero for the arguments under consideration, cf. also Figure 4.7; or, to say it differently, the expression on the right-hand side of (4.46) is dominated by the first term. We can see from Figure 4.6 that the decision region always contains, and is contained in, a quadratic area whose edges are parallel to the noise (or, more precisely, to the eigenvectors of the noise). This enables us to determine a lower and an upper bound for the symbol error probability which are valid for all possible rotation angles, i.e.,

2Q\left( \sqrt{2\, \frac{(V(k))^2}{\tilde\lambda_1(k)}} \right) \lesssim P_{\text{Square QAM}}(k) \lesssim 2Q\left( \sqrt{\frac{1}{2}\, \frac{(V(k))^2}{\tilde\lambda_1(k)}} \right).    (4.48)
Again, these inequalities are only approximations, as indicated by the special less-or-equal symbol \lesssim, because boundary points of the QAM constellation are not incorporated in the formulas and some Q-function expressions are approximated by the value zero. But we want to emphasize that these approximations are quite good and do not affect the validity of the inequalities significantly.
Inserting (4.44), we can summarize the results obtained for the symbol error probability of a square QAM constellation in the presence of rotationally variant noise as follows,

P_{\text{Square QAM}}(k) \approx 2Q\left( \sqrt{\frac{3\, S_{all}(k)\, c}{2\, M_{all}(k)\, \tilde\lambda_1(k)}} \right),    (4.49)

where c = 1 is an approximation which neglects the noise rotations (but not the power differences), and c = 2 and c = \frac{1}{2} yield lower and upper bounds, respectively, which are valid for all possible rotation angles.
We continue our analysis by considering (optimized) rotated rectangular constellations. First of all, observe that we can restrict ourselves to the case where the noise eigenvectors are parallel to the constellation edges, because the rotation angles are a priori known from Theorem 4.2 and the additive contribution of \arg\left\{ H\left(e^{\,j\frac{2\pi}{N}k}\right) \right\}. Note that there is no additional estimation effort for these parameters, because H\left(e^{\,j\frac{2\pi}{N}k}\right) has to be estimated anyway in order to guarantee that the Frequency Domain Equalizer works properly. We will calculate the reduced symbol error probability that occurs if one applies an optimized (with respect to symbol error probability) rotated rectangular constellation which has the same number M_{all}(k) of signal points, i.e., it serves the same data rate, and uses the same signal power S_{all}(k) (compare with (2.27) or (2.28)).
Similarly to the square QAM case, the symbol error probabilities (for the following analysis we also refer to [13]) can be approximated by

P_{\text{Rot Rect}}(k) \approx 2Q\left( \sqrt{\frac{(V_1(k))^2}{\tilde\lambda_1(k)}} \right) + 2Q\left( \sqrt{\frac{(V_2(k))^2}{\tilde\lambda_2(k)}} \right) \approx 2Q\left( \sqrt{\frac{3 S_1(k)}{(M_1(k))^2\, \tilde\lambda_1(k)}} \right) + 2Q\left( \sqrt{\frac{3 S_2(k)}{(M_2(k))^2\, \tilde\lambda_2(k)}} \right),    (4.50)

where (4.43) was inserted. As already mentioned, the approximations are valid for all rotation angles and neglect only boundary points of the constellation and some small values of the Q-function. According to (4.43), we also assume that the number of signal points is not too small.
In order to determine the optimum constellation parameters, we have to minimize the function

F\left( S_1(k), S_2(k), M_1(k), M_2(k) \right) = 2Q\left( \sqrt{\frac{3 S_1(k)}{(M_1(k))^2\, \tilde\lambda_1(k)}} \right) + 2Q\left( \sqrt{\frac{3 S_2(k)}{(M_2(k))^2\, \tilde\lambda_2(k)}} \right)    (4.51)

under the side constraints

S_1(k) + S_2(k) = S_{all}(k), \qquad M_1(k)\, M_2(k) = M_{all}(k),    (4.52)
S_i(k) > 0, \quad i = 1, 2, \qquad M_i(k) > 0 \text{ and even}, \quad i = 1, 2.

Let us consider the function

f(x, y) = 2Q\left( \sqrt{xy} \right) + 2Q\left( \sqrt{(x_0 - x)\, \frac{y_0}{y}} \right)    (4.53)
and observe that

F\left( S_1(k),\; S_{all}(k) - S_1(k),\; M_1(k),\; \frac{M_{all}(k)}{M_1(k)} \right) = f(x, y),    (4.54)

if

x = S_1(k), \qquad y = \frac{3}{(M_1(k))^2\, \tilde\lambda_1(k)}, \qquad x_0 = S_{all}(k), \qquad y_0 = \frac{9}{(M_{all}(k))^2\, \tilde\lambda_1(k)\, \tilde\lambda_2(k)},    (4.55)
such that we can find the minimum of F (, , , ) (under the required side constraints) by minimizing f (, ). Note that y is regarded as a continuous, real valued
parameter during the minimization process; this conflicts with the requirement that
M1 (k) is an even number. But we will stick to this simplification and will fulfill
the requirement by rounding to an even number after having found the continuous
minimum. Using
x
1 e 2
d

Q x =
,
(4.56)
dx
2 2 x
we obtain the partial derivatives

f_x(x, y) = \frac{1}{\sqrt{2\pi}} \left[ \sqrt{\frac{y_0}{y\,(x_0 - x)}}\; e^{-\frac{(x_0 - x)y_0}{2y}} - \sqrt{\frac{y}{x}}\; e^{-\frac{xy}{2}} \right],   (4.57)

f_y(x, y) = \frac{1}{\sqrt{2\pi}} \left[ \sqrt{\frac{(x_0 - x)\,y_0}{y^3}}\; e^{-\frac{(x_0 - x)y_0}{2y}} - \sqrt{\frac{x}{y}}\; e^{-\frac{xy}{2}} \right].
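As a side check (added here, not in the original text), the derivative formula (4.56) can be verified numerically with a central difference:

```python
from math import erfc, exp, sqrt, pi

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))
dQ = lambda x: -exp(-x / 2.0) / (2.0 * sqrt(2.0 * pi) * sqrt(x))   # eq. (4.56)

x, eps = 1.7, 1e-6                                                 # arbitrary test point
numeric = (Q(sqrt(x + eps)) - Q(sqrt(x - eps))) / (2 * eps)        # central difference
assert abs(numeric - dQ(x)) < 1e-9
```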

Setting

\partial_x f(x, y) = f_x(x, y) = 0, \qquad \partial_y f(x, y) = f_y(x, y) = 0,   (4.58)

yields

\sqrt{\frac{y}{x}}\, e^{-\frac{xy}{2}} = \sqrt{\frac{y_0}{y\,(x_0 - x)}}\, e^{-\frac{(x_0 - x)y_0}{2y}} \quad \text{and} \quad \sqrt{\frac{x}{y}}\, e^{-\frac{xy}{2}} = \sqrt{\frac{(x_0 - x)\,y_0}{y^3}}\, e^{-\frac{(x_0 - x)y_0}{2y}},   (4.59)

which implies (dividing the first equation of (4.59) by the second)

\frac{x_0 - x}{x} = 1,   (4.60)

and furthermore

x = \frac{x_0}{2}.   (4.61)

Substituting this back into (4.59), we obtain the following equation for y,

\frac{y_0}{y}\, e^{-\frac{x_0}{2}\frac{y_0}{y}} = y\, e^{-\frac{x_0}{2} y},   (4.62)

which has the solution

y = \sqrt{y_0}.   (4.63)

Note that it can happen that this solution of (4.62) is not the unique solution. Whether it is unique or not depends on the parameters x_0 and y_0. To see this, observe that (4.62) can be written as

g_{x_0}\Big(\frac{y_0}{y}\Big) = g_{x_0}(y)   (4.64)

with the function

g_{x_0}(y) = y\, e^{-\frac{x_0 y}{2}},   (4.65)

which is not one-to-one (also called not injective) in general. However, we will not consider other possible solutions. Instead, we will show that the solution (x, y) = \big(\frac{x_0}{2}, \sqrt{y_0}\big) is indeed a (relative) minimum (and not, e.g., a maximum) of f(x, y).
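The claimed solution can also be checked numerically before working through the second-order conditions. The sketch below (added here; the parameters x_0 = 1 and y_0 = 40 are arbitrary choices for which (x_0/2)\sqrt{y_0} > 1 holds) verifies that the gradient of f vanishes at (x_0/2, \sqrt{y_0}) and that neighboring points have larger function values.

```python
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))

x0, y0 = 1.0, 40.0                 # arbitrary parameters with (x0/2)*sqrt(y0) > 1
f = lambda x, y: 2 * Q(sqrt(x * y)) + 2 * Q(sqrt((x0 - x) * y0 / y))   # eq. (4.53)

xs, ys = x0 / 2, sqrt(y0)          # candidate stationary point, eqs. (4.61), (4.63)
eps = 1e-6
fx = (f(xs + eps, ys) - f(xs - eps, ys)) / (2 * eps)
fy = (f(xs, ys + eps) - f(xs, ys - eps)) / (2 * eps)
assert abs(fx) < 1e-6 and abs(fy) < 1e-6          # gradient vanishes
for dx in (-1e-3, 1e-3):
    for dy in (-1e-3, 1e-3):
        assert f(xs + dx, ys + dy) >= f(xs, ys)   # relative minimum
```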


Note that (4.58) is only a necessary condition for (x, y) being a relative minimum. For a sufficient condition, we consider the 2 x 2 matrix consisting of the partial derivatives of second order evaluated at (x, y) = \big(\frac{x_0}{2}, \sqrt{y_0}\big), i.e.,

S\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) = \begin{pmatrix} f_{xx}\big(\frac{x_0}{2}, \sqrt{y_0}\big) & f_{xy}\big(\frac{x_0}{2}, \sqrt{y_0}\big) \\ f_{yx}\big(\frac{x_0}{2}, \sqrt{y_0}\big) & f_{yy}\big(\frac{x_0}{2}, \sqrt{y_0}\big) \end{pmatrix}.   (4.66)

It is lengthy but straightforward to compute these partial derivatives as


f_{xx}\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x_0}{4}\sqrt{y_0}}\, \sqrt{\Big(\frac{2}{x_0}\Big)^3}\; y_0^{1/4} \Big(\frac{x_0}{2}\sqrt{y_0} + 1\Big),

f_{xy}\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) = f_{yx}\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x_0}{4}\sqrt{y_0}}\, \sqrt{\frac{2}{x_0}}\; y_0^{-1/4} \Big(\frac{x_0}{2}\sqrt{y_0} - 1\Big),   (4.67)

f_{yy}\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x_0}{4}\sqrt{y_0}}\, \sqrt{\frac{x_0}{2}}\; y_0^{-3/4} \Big(\frac{x_0}{2}\sqrt{y_0} - 1\Big),

such that the determinant of S\big(\frac{x_0}{2}, \sqrt{y_0}\big) is obtained as

\det S\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) = \frac{1}{\pi}\, e^{-\frac{x_0}{2}\sqrt{y_0}} \Big(1 - \frac{2}{x_0\sqrt{y_0}}\Big).   (4.68)

It is a well known result of Differential Calculus, see e.g. [7, 8], that positive definiteness of S\big(\frac{x_0}{2}, \sqrt{y_0}\big) is sufficient for (x, y) = \big(\frac{x_0}{2}, \sqrt{y_0}\big) being a relative minimum of f(x, y). It is also shown in [7, 8] that a positive definite S\big(\frac{x_0}{2}, \sqrt{y_0}\big) is equivalent to

f_{xx}\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) > 0   (4.69)

and

\det S\Big(\frac{x_0}{2}, \sqrt{y_0}\Big) > 0.   (4.70)

Comparing (4.69) with (4.67), we conclude that the first condition (4.69) is always fulfilled. According to (4.68), (4.70) is equivalent to

\frac{x_0}{2}\sqrt{y_0} > 1,   (4.71)

which then - see (4.55) - translates to

\frac{3 S_{all}(k)}{2 M_{all}(k)\sqrt{\lambda_1(k)\,\lambda_2(k)}} > 1.   (4.72)

It is clear that this condition is usually fulfilled in practice, cf. (4.75) and Figure
4.7. Note that we are operating in a range (especially in wireline transmission),
where the (symbol) error probabilities are small.
We also want to emphasize that the preceding proof that the obtained solution is indeed a minimum could just as well be omitted from a practical point of view, because we will show in the following that the symbol error probability is substantially reduced when using rotated rectangular constellations with the obtained parameters. This is obviously sufficient for application purposes. On the other hand, we have shown that rotated rectangular constellations with these parameters


are really the optimum rotated rectangular constellations with respect to symbol
error probability.
Substituting (4.55) back into (4.61) and (4.63) and making use of the side constraints (4.52), we finally come to the conclusion that a bit-loading algorithm that takes into account power differences and statistical dependencies between real and imaginary parts of the noise distributes the same signal power onto the two axes of the rotated rectangular constellations, i.e.,

S_1(k) = S_2(k) = \frac{S_{all}(k)}{2}.   (4.73)

Furthermore, the optimum numbers of signal points in the two directions are derived as - applying (4.45) -

M_1(k) = \sqrt{M_{all}(k)}\, \Big(\frac{\lambda_2(k)}{\lambda_1(k)}\Big)^{1/4}, \qquad M_2(k) = \sqrt{M_{all}(k)}\, \Big(\frac{\lambda_1(k)}{\lambda_2(k)}\Big)^{1/4},   (4.74)

which, of course, have to be rounded to even numbers. Using (4.42), the gain factors V_i(k) are determined as well. The symbol error probabilities (4.50) are calculated as

P_{\mathrm{Rot\,Rect}}(k) \approx 4Q\left(\sqrt{\frac{3 S_{all}(k)}{2 M_{all}(k)\sqrt{\lambda_1(k)\,\lambda_2(k)}}}\right),   (4.75)
so that we obtain^{14} (approximate) SNR gains of

G(k) = \frac{\mathrm{SNR}_{\mathrm{Rot\,Rect}}(k)}{\mathrm{SNR}_{\mathrm{Square\,QAM}}(k)} = \frac{1}{c}\sqrt{\frac{\lambda_1(k)}{\lambda_2(k)}},   (4.76)

where c has the same meaning as in equation (4.49) and is explained below equation (4.49). Finally, we can express the gains in terms of the relative (eigenvalue) differences, cf. (4.26), as

G(k) \approx \frac{1}{c}\sqrt{\frac{1 + |d(k)|}{1 - |d(k)|}}.   (4.77)
We want to emphasize that in our example of Section 4.2, cf. also Section B.1
in the Appendix, the modulus of the relative differences is close to 1 for almost
all frequencies (see Figure 4.8), so that the overall SNR gain is very high, without
much influence from the value of c, cf. Figure 4.9.
^{14} The SNR is defined via P \approx Q\big(\sqrt{\mathrm{SNR}}\big).

[Figure 4.8: plot of |d(k)| versus subcarrier index, N = 512; the modulus of the relative differences is close to 1 for most subcarriers.]

Fig. 4.8: Relative differences. Real ADSL scenario with narrowband interference, cf. also Section B.1 in the Appendix.

[Figure 4.9: plot of the SNR gain G(k) in dB versus subcarrier index, N = 512.]

Fig. 4.9: SNR gains (c = 1). Real ADSL scenario with narrowband interference, cf. also Section B.1 in the Appendix.


The SNR gains are independent of the channel transfer function and of the signal
power and thus of the loop length, which is not the case for the capacity loss. Furthermore, the previous example shows that the use of rotated rectangular constellations is much more effective in reducing the (uncoded) symbol error probability
than in increasing capacity. Note that statements about capacity always assume an
optimum coding strategy which is not usually applicable in practice. For practical
en- / decoders, the overall gain will be somewhere in-between. It depends on the
ability of the code to use the safer transmission in one direction (corresponding to
one eigenvector) to correct the more frequent errors in the other direction (corresponding to the other eigenvector). The effort required to adapt the coding strategy
is much higher than for implementing rotated rectangular constellations.
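The optimized parameters and the resulting gain are straightforward to compute. The sketch below (added here; the mean eigenvalue and the relative difference d are arbitrary example values, c = 1, and the final rounding of M_1, M_2 to even integers is omitted) implements (4.73) and (4.74) and checks the consistency of (4.76) with (4.77).

```python
from math import sqrt

def rot_rect_params(S_all, M_all, lam1, lam2):
    """Optimum rotated-rectangular constellation parameters, eqs. (4.73)-(4.74).
    The rounding of M1, M2 to even integers is omitted in this sketch."""
    S1 = S2 = S_all / 2.0
    M1 = sqrt(M_all) * (lam2 / lam1) ** 0.25
    M2 = sqrt(M_all) * (lam1 / lam2) ** 0.25
    return S1, S2, M1, M2

lam_bar, d = 1e-3, 0.8                             # arbitrary example values
lam1, lam2 = lam_bar * (1 + d), lam_bar * (1 - d)  # eigenvalues via relative difference d
S1, S2, M1, M2 = rot_rect_params(1.0, 64, lam1, lam2)

assert abs(M1 * M2 - 64) < 1e-9                    # product constraint of (4.52)
G = sqrt(lam1 / lam2)                              # SNR gain (4.76) with c = 1
assert abs(G - sqrt((1 + d) / (1 - d))) < 1e-12    # consistency with (4.77)
```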

4.4 Intersymbol Interference (ISI) and Intercarrier Interference (ICI)


It was already mentioned in Section 2.1 that the Time Domain Equalizer (TDE) has the task to shorten the channel impulse response, so that the resulting impulse response has a length shorter than or equal to p + 1, p being the length of the Cyclic Prefix. In other words, e is chosen such that h = e * g, g being the (original) channel impulse response, satisfies

h(n) = 0, \qquad n < 0 \text{ or } n > p.   (4.78)

In the following, we are interested in what happens if this constraint (4.78) holds only approximately. To analyze the effects occurring, we relax (4.78) to

h(n) = 0, \qquad n < l^- \text{ or } n > l^+,   (4.79)

with l^- \le 0 and l^+ \ge p.

Again, the filter e (the TDE) will be always non-causal and therefore not implementable for practically occurring channel impulse responses g. Since a simple
delay in the receiver solves this problem, we will stick to (4.79) for the reason of
simplicity. We want to emphasize that this model together with a variable (adaptive) delay includes the case when the evaluation frame in the receiver is moved to
a certain extent in order to maximize performance. Furthermore, main parts of this
section are not limited to wireline DMT transmission. The derivation and (most of
the) results remain valid for wireless OFDM transmission as well.
Due to our new assumption (4.79), the input-output relationship (2.6) has to be modified, i.e.,

u(n_0 + n) = \sum_{k=l^-}^{l^+} h(k)\, t(n_0 + n - k) + z(n_0 + n)   (4.80)

 = \sum_{k=0}^{p} h(k)\, t(n_0 + n - k) + z(n_0 + n) + \sum_{k=l^-}^{-1} h(k)\, t(n_0 + n - k) + \sum_{k=p+1}^{l^+} h(k)\, t(n_0 + n - k).

[Figure 4.10: DMT symbol structure. The previous, present, and next symbols a_{n_0-(N+p)}, a_{n_0}, a_{n_0+(N+p)}, each preceded by its Cyclic Prefix (CP), form the transmit blocks q_{n_0-(N+p)}, q_{n_0}, q_{n_0+(N+p)}; the transmit samples t(n_0-p), ..., t(n_0+N-1) correspond to q_{n_0} and the received samples u(n_0), ..., u(n_0+N-1) to b_{n_0}.]

Fig. 4.10: DMT symbol structure (CP = Cyclic Prefix).

At the output of the DFT, we therefore obtain the vector

d_{n_0} = \begin{pmatrix} d_{n_0}(0) \\ \vdots \\ d_{n_0}(N-1) \end{pmatrix}   (4.81)

with

d_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{k=0}^{p} h(k)\, t(n_0+n-k)\, e^{-j\frac{2\pi}{N}nm}   (4.82)

 + \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} z(n_0+n)\, e^{-j\frac{2\pi}{N}nm}   (4.83)

 + \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{k=l^-}^{-1} h(k)\, t(n_0+n-k)\, e^{-j\frac{2\pi}{N}nm}   (4.84)

 + \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{k=p+1}^{l^+} h(k)\, t(n_0+n-k)\, e^{-j\frac{2\pi}{N}nm},   (4.85)

cf. also our nomenclature of Section 2.1,

d_{n_0} = F\, b_{n_0}


with

b_{n_0} = \begin{pmatrix} u(n_0) \\ u(n_0+1) \\ \vdots \\ u(n_0+N-1) \end{pmatrix}.
For an illustration of our nomenclature, we also refer to Figure 4.10. The first term
(4.82) is the good term, that describes the influence of that part of the impulse
response h that does not exceed the length of the Cyclic Prefix, and has already
been discussed in detail in Section 2.1. The second term (4.83) originates from the
(background) noise and has been analyzed in Sections 4.1, 4.2 and 4.3. The third
and fourth term (4.84) and (4.85), respectively, result from the weakening of (4.78)
to (4.79). In the following, we will consider these two expressions, which cause - as it is called - interference on the m-th subcarrier, i.e.,

i^-_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{k=l^-}^{-1} h(k)\, t(n_0+n-k)\, e^{-j\frac{2\pi}{N}nm},   (4.86)

i^+_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{k=p+1}^{l^+} h(k)\, t(n_0+n-k)\, e^{-j\frac{2\pi}{N}nm},   (4.87)

m = 0, \ldots, N-1.
First of all, observe that i^-_{n_0}(m) has contributions from

t(n_0+1), \ldots, t(n_0+N-1) \;\; (\text{elements of } q_{n_0}) \quad \text{and} \quad t(n_0+N), \ldots, t(n_0+N+|l^-|-1) \;\; (\text{elements of } q_{n_0+(N+p)}, q_{n_0+2(N+p)}, \ldots),

whereas i^+_{n_0}(m) has contributions from

t(n_0-l^+), \ldots, t(n_0-p-1) \;\; (\text{elements of } \ldots, q_{n_0-2(N+p)}, q_{n_0-(N+p)}) \quad \text{and} \quad t(n_0-p), \ldots, t(n_0+N-p-2) \;\; (\text{elements of } q_{n_0}),

cf. also Figure 4.10. Hence, both terms depend on the present DMT symbol (also called frame). This effect is called intercarrier interference (ICI). The dependence on the preceding and following DMT symbols (frames) is called intersymbol interference (ISI). More specifically, i^-_{n_0}(m) contains precursors from the following (future) symbols and i^+_{n_0}(m) contains postcursors from the preceding (past) symbols.

[Figure 4.11: summation area in the (n, l)-plane for (4.88), with the boundary lines l = l^+ - n, l = p + 1 - n and the areas A_1, A_2, A_3 used for interchanging the two sums.]

Fig. 4.11: Summation area in (4.88). Note that the area structure is slightly different for l^+ \ge N + p, since then l^+ - p - 1 \ge N - 1, but the basic idea behind the interchange of the two sums remains the same.

4.4.1 Deterministic Interference Analysis

We start our analysis with

i^+_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{k=p+1}^{l^+} h(k)\, t(n_0+n-k)\, e^{-j\frac{2\pi}{N}nm} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{l=p+1-n}^{l^+-n} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm},   (4.88)

where the index change l = k - n has been performed, such that the summation over k is replaced by a summation over l. Next, we interchange the two sums. Due to the dependence of the inner sum's limits on n, we have to investigate the summation over (n, l) in some more detail. Figure 4.11 shows the effective pairs that are used in the two sums. They are denoted by the areas A_1 and A_2. In addition to interchanging the two sums, we sum over area A_1, add the sum over areas A_2 + A_3, and then subtract the sum over A_3, i.e.,


i^+_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{l=p+1}^{l^+} \sum_{n=0}^{l^+-l} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm} \qquad (A_1)   (4.89)

 + \frac{1}{\sqrt{N}} \sum_{l=p+1-N}^{p} \sum_{n=p+1-l}^{l^+-l} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm} \qquad (A_2 + A_3)

 - \frac{1}{\sqrt{N}} \sum_{l=p+1-N}^{l^+-N} \sum_{n=N}^{l^+-l} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm} \qquad (A_3).

Another index change, i.e., k = n + l, such that the summation over n is replaced by a summation over k, yields

i^+_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{l=p+1}^{l^+} t(n_0-l)\, e^{j\frac{2\pi}{N}lm} \sum_{k=l}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}   (4.90)

 + \frac{1}{\sqrt{N}} \sum_{l=p+1-N}^{p} t(n_0-l)\, e^{j\frac{2\pi}{N}lm} \sum_{k=p+1}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}

 - \frac{1}{\sqrt{N}} \underbrace{\sum_{l=p+1-N}^{l^+-N} t(n_0-l)\, e^{j\frac{2\pi}{N}lm} \sum_{k=N+l}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}}_{= \sum_{l=p+1}^{l^+} t(n_0+N-l)\, e^{j\frac{2\pi}{N}lm} \sum_{k=l}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}}.

Replacing

\frac{1}{\sqrt{N}} \sum_{l=p+1-N}^{p} t(n_0-l)\, e^{j\frac{2\pi}{N}lm} = \frac{1}{\sqrt{N}} \sum_{l=-p}^{N-p-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm}   (4.91)

 = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-p-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm} + \frac{1}{\sqrt{N}} \sum_{l=-p}^{-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm}

 = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-p-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm} + \frac{1}{\sqrt{N}} \sum_{l=-p}^{-1} t(n_0+N+l)\, e^{-j\frac{2\pi}{N}lm}

 = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-p-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm} + \frac{1}{\sqrt{N}} \sum_{l=N-p}^{N-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm}

 = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm} = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} a_{n_0}(l)\, e^{-j\frac{2\pi}{N}lm} = c_{n_0}(m),

where the Cyclic Prefix, t(n_0+l) = t(n_0+N+l), l = -p, \ldots, -1, was utilized and the nomenclature of Section 2.1, in particular a_{n_0} = F^{-1} c_{n_0}, was taken into account (cf. also Figure 4.10), we obtain
i^+_{n_0}(m) = c_{n_0}(m) \sum_{k=p+1}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}   (4.92)

 + \frac{1}{\sqrt{N}} \sum_{l=p+1}^{l^+} t(n_0-l)\, \underbrace{e^{j\frac{2\pi}{N}lm} \sum_{k=l}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}}_{H_l^+\big(e^{j\frac{2\pi}{N}m}\big)}   (4.93)

 - \frac{1}{\sqrt{N}} \sum_{l=p+1}^{l^+} t(n_0+N-l)\, \underbrace{e^{j\frac{2\pi}{N}lm} \sum_{k=l}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}}_{H_l^+\big(e^{j\frac{2\pi}{N}m}\big)},   (4.94)

where

e^{j\frac{2\pi}{N}lm} \sum_{k=l}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km} = \sum_{k=0}^{l^+-l} h(k+l)\, e^{-j\frac{2\pi}{N}km} = \sum_{k=0}^{\infty} h(k+l)\, e^{-j\frac{2\pi}{N}km} = H_l^+\big(e^{j\frac{2\pi}{N}m}\big)   (4.95)

denotes the one-sided Z^+-transform of (the tail of) the impulse response h shifted to the left by the (non-negative) integer number l,

H_l^+(z) = \sum_{k=0}^{\infty} h(k+l)\, z^{-k},   (4.96)

evaluated at e^{j\frac{2\pi}{N}m}. Note that the first term (4.92) is a contribution to the channel matrix D of (2.16), i.e., instead of

\sum_{k=0}^{p} h(k)\, e^{-j\frac{2\pi}{N}km},

we now have

\sum_{k=0}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}   (4.97)

as diagonal entries. The second term (4.93) causes intersymbol interference (ISI). To be more precise, it expresses how postcursors from previous symbols disturb the transmission. Finally, the third term (4.94) describes intercarrier interference (ICI), provided that l^+ is not too large. As long^{15} as l^+ \le p + N, only the present symbol has influence. Otherwise, additional ISI - again postcursors from preceding symbol(s) - has to be accepted.
We continue our analysis with

i^-_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{k=l^-}^{-1} h(k)\, t(n_0+n-k)\, e^{-j\frac{2\pi}{N}nm} = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} \sum_{l=l^--n}^{-1-n} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm},   (4.98)

where the index change l = k - n has been performed, such that the summation over k is replaced by a summation over l. Next, we interchange the two sums. Due to the dependence of the inner sum's limits on n, we have to investigate the summation over (n, l) in some more detail. Figure 4.12 shows the effective pairs that are used in the two sums. They are denoted by the areas A_1 and A_2. In addition to interchanging the two sums, we sum over area A_1, add the sum over areas A_2 + A_3, and then subtract the sum over A_3, i.e.,
i^-_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{l=l^--N+1}^{-N} \sum_{n=l^--l}^{N-1} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm} \qquad (A_1)   (4.99)

 + \frac{1}{\sqrt{N}} \sum_{l=-N+1}^{0} \sum_{n=l^--l}^{-1-l} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm} \qquad (A_2 + A_3)

 - \frac{1}{\sqrt{N}} \sum_{l=l^-+1}^{0} \sum_{n=l^--l}^{-1} h(n+l)\, t(n_0-l)\, e^{-j\frac{2\pi}{N}nm} \qquad (A_3).

Again, another index change, i.e., k = n + l, such that the summation over n is replaced by a summation over k, yields

^{15} For wireline transmission and typical FFT-lengths, this is always true.

[Figure 4.12: summation area in the (n, l)-plane for (4.98), with the boundary lines l = -1 - n, l = l^- - n and the areas A_1, A_2, A_3 used for interchanging the two sums.]

Fig. 4.12: Summation area in (4.98). Note that the area structure is slightly different for |l^-| \ge N, but the basic idea behind the interchange of the two sums remains the same.


i^-_{n_0}(m) = \frac{1}{\sqrt{N}} \underbrace{\sum_{l=l^--N+1}^{-N} t(n_0-l)\, e^{j\frac{2\pi}{N}lm} \sum_{k=l^-}^{N-1+l} h(k)\, e^{-j\frac{2\pi}{N}km}}_{= \sum_{l=0}^{|l^-|-1} t(n_0+N+l)\, e^{-j\frac{2\pi}{N}lm} \sum_{k=l^-}^{-1-l} h(k)\, e^{-j\frac{2\pi}{N}km}}   (4.100)

 + \frac{1}{\sqrt{N}} \underbrace{\sum_{l=-N+1}^{0} t(n_0-l)\, e^{j\frac{2\pi}{N}lm}}_{= \sum_{l=0}^{N-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm} = \sqrt{N}\, c_{n_0}(m)} \sum_{k=l^-}^{-1} h(k)\, e^{-j\frac{2\pi}{N}km}

 - \frac{1}{\sqrt{N}} \sum_{l=l^-+1}^{0} t(n_0-l)\, e^{j\frac{2\pi}{N}lm} \sum_{k=l^-}^{l-1} h(k)\, e^{-j\frac{2\pi}{N}km}

 = c_{n_0}(m) \sum_{k=l^-}^{-1} h(k)\, e^{-j\frac{2\pi}{N}km} + \frac{1}{\sqrt{N}} \sum_{l=0}^{|l^-|-1} t(n_0+N+l)\, e^{-j\frac{2\pi}{N}lm} \sum_{k=l^-}^{-1-l} h(k)\, e^{-j\frac{2\pi}{N}km} - \frac{1}{\sqrt{N}} \sum_{l=0}^{|l^-|-1} t(n_0+l)\, e^{-j\frac{2\pi}{N}lm} \sum_{k=l^-}^{-1-l} h(k)\, e^{-j\frac{2\pi}{N}km},

where again the nomenclature of Section 2.1 was used (cf. also Figure 4.10). Let
e^{-j\frac{2\pi}{N}lm} \sum_{k=l^-}^{-1-l} h(k)\, e^{-j\frac{2\pi}{N}km} = \sum_{k=l^-+l}^{-1} h(k-l)\, e^{-j\frac{2\pi}{N}km} = \sum_{k=-\infty}^{-1} h(k-l)\, e^{-j\frac{2\pi}{N}km} = H_l^-\big(e^{j\frac{2\pi}{N}m}\big)   (4.101)
denote the one-sided Z^--transform of (the beginning of) the impulse response h shifted to the right by the (non-negative) integer number l,

H_l^-(z) = \sum_{k=-\infty}^{-1} h(k-l)\, z^{-k},   (4.102)

evaluated at e^{j\frac{2\pi}{N}m}. Then,
i^-_{n_0}(m) = c_{n_0}(m) \sum_{k=l^-}^{-1} h(k)\, e^{-j\frac{2\pi}{N}km}   (4.103)

 + \frac{1}{\sqrt{N}} \sum_{l=0}^{|l^-|-1} t(n_0+N+l)\, H_l^-\big(e^{j\frac{2\pi}{N}m}\big)   (4.104)

 - \frac{1}{\sqrt{N}} \sum_{l=0}^{|l^-|-1} t(n_0+l)\, H_l^-\big(e^{j\frac{2\pi}{N}m}\big).   (4.105)

Again, the first term (4.103) is a contribution to the channel matrix D of (2.16), i.e., instead of

\sum_{k=0}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}

according to (4.97), we now have

\sum_{k=l^-}^{l^+} h(k)\, e^{-j\frac{2\pi}{N}km}   (4.106)
as diagonal entries. With the Z-transform [13, 59] of h,


H(z) = \sum_{k=-\infty}^{\infty} h(k)\, z^{-k} = \sum_{k=l^-}^{l^+} h(k)\, z^{-k},   (4.107)

the diagonal entries of D can be written as H\big(e^{j\frac{2\pi}{N}m}\big), so that - from a notational point of view - the matrix D of (2.16) has not changed. Of course, the values of

its entries have changed. The second term (4.104) causes intersymbol interference
(ISI). To be more precise, it expresses how precursors from following (future) symbols disturb the transmission. Finally, the third term (4.105) describes intercarrier
interference (ICI), provided that |l^-| is not too large. As long^{16} as |l^-| \le N, only
the present symbol has influence. Otherwise, additional ISI - again precursors from
following symbol(s) - has to be accepted.
Finally, we can express the overall interference on the m-th subcarrier introduced by (4.79), including both the + and the - contribution, as

i_{n_0}(m) = \frac{1}{\sqrt{N}} \sum_{l=p+1}^{l^+} t(n_0-l)\, H_l^+\big(e^{j\frac{2\pi}{N}m}\big) \qquad (\text{postcursor ISI from previous symbol})   (4.108)

 - \frac{1}{\sqrt{N}} \sum_{l=p+1}^{l^+} t(n_0+N-l)\, H_l^+\big(e^{j\frac{2\pi}{N}m}\big) \qquad (\text{ICI, if } l^+ \le p+N)

 - \frac{1}{\sqrt{N}} \sum_{l=0}^{|l^-|-1} t(n_0+l)\, H_l^-\big(e^{j\frac{2\pi}{N}m}\big) \qquad (\text{ICI, if } |l^-| \le N)

 + \frac{1}{\sqrt{N}} \sum_{l=0}^{|l^-|-1} t(n_0+N+l)\, H_l^-\big(e^{j\frac{2\pi}{N}m}\big), \qquad (\text{precursor ISI from next symbol})

m = 0, \ldots, N-1.

We want to emphasize again that a too long impulse response according to (4.79) also changes the values of the channel matrix D, but not its Z-transform notation.
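The decomposition derived above can be verified numerically. The following sketch (added here; all parameters are small arbitrary example values, n_0 = 0, a 1/\sqrt{N}-normalized DFT is assumed, and the Cyclic Prefix structure of the present symbol is enforced) checks that the direct double sums (4.86) and (4.87) equal the channel-matrix contributions (4.92), (4.103) plus the interference (4.108) on every subcarrier.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, l_minus, l_plus = 8, 2, -2, 5        # l- <= 0 <= p <= l+, l+ <= p+N, |l-| <= N
h = {k: rng.standard_normal() for k in range(l_minus, l_plus + 1)}

lo, hi = -l_plus, N - 1 - l_minus          # index range of transmit samples needed
t = {n: rng.standard_normal() for n in range(lo, hi + 1)}
for n in range(-p, 0):                     # Cyclic Prefix of the present symbol (n0 = 0)
    t[n] = t[N + n]

W = lambda k, m: np.exp(-2j * np.pi * k * m / N)
sqN = np.sqrt(N)

def Hp(l, m):   # H_l^+(e^{j 2 pi m / N}), eq. (4.96)
    return sum(h[k + l] * W(k, m) for k in range(0, l_plus - l + 1))

def Hm(l, m):   # H_l^-(e^{j 2 pi m / N}), eq. (4.102)
    return sum(h[k - l] * W(k, m) for k in range(l_minus + l, 0))

c = [sum(t[l] * W(l, m) for l in range(N)) / sqN for m in range(N)]

for m in range(N):
    i_m = sum(h[k] * t[n - k] * W(n, m) for n in range(N) for k in range(l_minus, 0)) / sqN        # (4.86)
    i_p = sum(h[k] * t[n - k] * W(n, m) for n in range(N) for k in range(p + 1, l_plus + 1)) / sqN  # (4.87)
    interference = (sum((t[-l] - t[N - l]) * Hp(l, m) for l in range(p + 1, l_plus + 1))
                    + sum((t[N + l] - t[l]) * Hm(l, m) for l in range(0, -l_minus))) / sqN          # (4.108)
    diagonal = c[m] * (sum(h[k] * W(k, m) for k in range(p + 1, l_plus + 1))
                       + sum(h[k] * W(k, m) for k in range(l_minus, 0)))                            # (4.92), (4.103)
    assert abs(i_m + i_p - interference - diagonal) < 1e-9
```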

4.4.2 Interference Statistics


In the previous subsection, we have derived an explicit relation between the transmit signal t and intersymbol and intercarrier interference. In the present subsection, we are interested in the statistics of this interference. As usual, the transmit signal t is modeled as a discrete-time, real valued (due to baseband^{17} signalling) random process. Similarly to^{18} [42], we will assume that all elements of this process are zero-mean with a variance of \sigma_t^2, and that all elements are pairwise uncorrelated, except of course for the elements of the Cyclic Prefix, which are - by definition - identical to other elements. Note that two identical random variables are never uncorrelated. It is obvious that the interference is zero-mean as well.

^{16} For wireline transmission and typical FFT-lengths, this is always true.
^{17} This is only true for wireline DMT transmission, and not for wireless OFDM transmission.
^{18} For simplicity, we maintain these assumptions.
Assuming^{19}

l^+ + |l^-| \le N,   (4.109)

we have, according to (4.108),

E\{i_{n_0}(n)\,(i_{n_0}(m))^*\} = \frac{1}{N} \sum_{l=p+1}^{l^+} \sigma_t^2\, H_l^+\big(e^{j\frac{2\pi}{N}n}\big) \Big(H_l^+\big(e^{j\frac{2\pi}{N}m}\big)\Big)^* \qquad (\text{ISI})

 + \frac{1}{N} \sum_{l=p+1}^{l^+} \sigma_t^2\, H_l^+\big(e^{j\frac{2\pi}{N}n}\big) \Big(H_l^+\big(e^{j\frac{2\pi}{N}m}\big)\Big)^* \qquad (\text{ICI})

 + \frac{1}{N} \sum_{l=0}^{|l^-|-1} \sigma_t^2\, H_l^-\big(e^{j\frac{2\pi}{N}n}\big) \Big(H_l^-\big(e^{j\frac{2\pi}{N}m}\big)\Big)^* \qquad (\text{ICI})

 + \frac{1}{N} \sum_{l=0}^{|l^-|-1} \sigma_t^2\, H_l^-\big(e^{j\frac{2\pi}{N}n}\big) \Big(H_l^-\big(e^{j\frac{2\pi}{N}m}\big)\Big)^* \qquad (\text{ISI})

and

E\{i_{n_0}(n)\, i_{n_0}(m)\} = \frac{1}{N} \sum_{l=p+1}^{l^+} \sigma_t^2\, H_l^+\big(e^{j\frac{2\pi}{N}n}\big)\, H_l^+\big(e^{j\frac{2\pi}{N}m}\big) \qquad (\text{ISI})

 + \frac{1}{N} \sum_{l=p+1}^{l^+} \sigma_t^2\, H_l^+\big(e^{j\frac{2\pi}{N}n}\big)\, H_l^+\big(e^{j\frac{2\pi}{N}m}\big) \qquad (\text{ICI})

 + \frac{1}{N} \sum_{l=0}^{|l^-|-1} \sigma_t^2\, H_l^-\big(e^{j\frac{2\pi}{N}n}\big)\, H_l^-\big(e^{j\frac{2\pi}{N}m}\big) \qquad (\text{ICI})

 + \frac{1}{N} \sum_{l=0}^{|l^-|-1} \sigma_t^2\, H_l^-\big(e^{j\frac{2\pi}{N}n}\big)\, H_l^-\big(e^{j\frac{2\pi}{N}m}\big) \qquad (\text{ISI}),

so that we obtain the covariance matrix^{20} C_i of the interference vector

i_{n_0} = \begin{pmatrix} i_{n_0}(0) \\ \vdots \\ i_{n_0}(N-1) \end{pmatrix}

as

C_i = E\{i_{n_0} i_{n_0}^H\} = C_{i_{ISI}} + C_{i_{ICI}},   (4.110)

where C_{i_{ISI}} and C_{i_{ICI}} denote the covariance matrices of ISI and ICI, respectively, satisfying

C_{i_{ISI}} = C_{i_{ICI}},   (4.111)

^{19} For wireline transmission and typical FFT-lengths, even this is usually true.
^{20} There is no dependence on n_0 anymore.
and having entries

C_{i_{ISI}}(n, m) = C_{i_{ICI}}(n, m) = \frac{\sigma_t^2}{N} \left[ \sum_{l=p+1}^{l^+} H_l^+\big(e^{j\frac{2\pi}{N}n}\big) \Big(H_l^+\big(e^{j\frac{2\pi}{N}m}\big)\Big)^* + \sum_{l=0}^{|l^-|-1} H_l^-\big(e^{j\frac{2\pi}{N}n}\big) \Big(H_l^-\big(e^{j\frac{2\pi}{N}m}\big)\Big)^* \right],   (4.112)

and the pseudo-covariance matrix^{21} P_i of the interference vector i_{n_0} as

P_i = E\{i_{n_0} i_{n_0}^T\} = P_{i_{ISI}} + P_{i_{ICI}},   (4.113)

where P_{i_{ISI}} and P_{i_{ICI}} denote the pseudo-covariance matrices of ISI and ICI, respectively, satisfying

P_{i_{ISI}} = P_{i_{ICI}},   (4.114)

and having entries

P_{i_{ISI}}(n, m) = P_{i_{ICI}}(n, m) = \frac{\sigma_t^2}{N} \left[ \sum_{l=p+1}^{l^+} H_l^+\big(e^{j\frac{2\pi}{N}n}\big)\, H_l^+\big(e^{j\frac{2\pi}{N}m}\big) + \sum_{l=0}^{|l^-|-1} H_l^-\big(e^{j\frac{2\pi}{N}n}\big)\, H_l^-\big(e^{j\frac{2\pi}{N}m}\big) \right].   (4.115)

Note the important result that ISI and ICI have the same statistics with respect to
first and second order moments, and that both interference mechanisms are uncorrelated under the chosen assumptions, so that the statistics of the overall interference are obtained by a simple addition (or by multiplying one interference
contribution by a factor of 2).
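As a small consistency sketch for (4.110)-(4.112) (added here; the example channel values are arbitrary and \sigma_t^2 = 1): the ISI covariance matrix is a sum of outer products of the vectors formed by H_l^+ and H_l^-, and must therefore be Hermitian and positive semidefinite; the overall interference covariance is then simply twice this matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, l_minus, l_plus = 8, 2, -2, 5
h = {k: rng.standard_normal() for k in range(l_minus, l_plus + 1)}
W = lambda k, m: np.exp(-2j * np.pi * k * m / N)

def Hp(l, m):   # H_l^+(e^{j 2 pi m / N}), eq. (4.96)
    return sum(h[k + l] * W(k, m) for k in range(0, l_plus - l + 1))

def Hm(l, m):   # H_l^-(e^{j 2 pi m / N}), eq. (4.102)
    return sum(h[k - l] * W(k, m) for k in range(l_minus + l, 0))

# C_iISI per (4.112) with sigma_t^2 = 1
C = np.empty((N, N), dtype=complex)
for n in range(N):
    for m in range(N):
        C[n, m] = (sum(Hp(l, n) * np.conj(Hp(l, m)) for l in range(p + 1, l_plus + 1))
                   + sum(Hm(l, n) * np.conj(Hm(l, m)) for l in range(0, -l_minus))) / N

C_total = 2 * C                                   # C_i = C_iISI + C_iICI, eqs. (4.110), (4.111)
assert np.allclose(C, C.conj().T)                 # Hermitian
assert np.linalg.eigvalsh(C).min() > -1e-12       # positive semidefinite
```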
Furthermore, observe that intersymbol and intercarrier interference are rotationally variant in general, so that we can expect further performance improvements by using rotated rectangular constellations. As in Subsection 4.1.4, the
computation of the rotation angles and of the other constellation parameters requires eigenvalue decompositions of the covariance matrices of real and imaginary
part of the interference at certain frequencies (subcarriers). Applying Theorem
21

Again no dependence on n0 .

104

4. Noise and Interference Analysis of DMT

3.17 to the individual elements of the interference vector, we can immediately express the rotation angles and the other parameters in terms of the diagonal elements
of CiISI , CiICI , PiISI , and PiICI .
We also want to emphasize that if we are interested in the optimum constellation parameters for the overall system, considering both noise and interference, we
have to apply Theorem 3.17 to the individual elements of the sum vector of noise
and interference, which yields expressions depending on the diagonal elements of
covariance and pseudo-covariance matrix of this sum vector,
C_{n+i} = C_n + C_{i_{ISI}} + C_{i_{ICI}},   (4.116)

P_{n+i} = P_n + P_{i_{ISI}} + P_{i_{ICI}}.


Unfortunately, we cannot expect the result that rotation angles only depend on the
number of the considered subcarrier, see Theorem 4.2, if interference is taken into
account as well.

4.4.3 Time Domain Equalizer


The Time Domain Equalizer (TDE) has the task to shorten the impulse response,
because the longer the Cyclic Prefix, the greater the capacity loss due to the introduced redundancy. However, a shortening of the impulse response (in the time
domain) corresponds to a flattening of the transfer function in the frequency domain, i.e., a non-flat channel transfer function has to be multiplied by a function
(the transfer function of the TDE), so that the resulting transfer function is flat.
This means that the transfer function of the TDE has to amplify the incoming signal at various frequencies with the drawback that it also amplifies the noise (at
these frequencies). Therefore, people came up with the idea, see e.g. [22], that it is
possibly better to choose a TDE that does not shorten the impulse response completely, but avoids too strong noise amplifications (pursuing the typical idea of the
Minimum Mean Square Error approach). Of course, ISI and ICI will occur in such
a case, and there will be a tradeoff between noise amplification and intersymbol
and intercarrier interference. One can think of different measures for this tradeoff
in order to find the optimum balance between noise and interference. However,
all approaches will try to maximize the overall performance. In [22], the authors
use capacity as the objective function, which they try to maximize over the set of
admissible TDEs.
With our results, we are now in a position to adopt this idea to obtain (practical)
TDE algorithms.
First of all, observe that the overall channel matrix D of (2.16) has entries H\big(e^{j\frac{2\pi}{N}m}\big) that depend on the TDE via the following Z-transform equation,

H(z) = E(z)\, G(z),   (4.117)

where E(z) denotes the Z-transform of e, the impulse response of the TDE, and G(z) denotes the Z-transform of the channel g, see Figure 2.2.


The noise has a covariance and pseudo-covariance matrix according to (4.3), (4.5), (4.6) and (4.12), or, if we use an approximation that only takes into account dependencies and power differences of real and imaginary part of the noise at certain frequencies and neglects cross-correlations between different frequencies (subcarriers), cf. Theorem 3.17 and also (4.33),

C_n \approx \mathrm{diag}\Big(\sigma_a^2(0),\; \lambda_1(1)+\lambda_2(1),\; \ldots,\; \lambda_1\big(\tfrac{N}{2}-1\big)+\lambda_2\big(\tfrac{N}{2}-1\big),\; \sigma_a^2\big(\tfrac{N}{2}\big),\; \ldots\Big),   (4.118)

P_n \approx \mathrm{diag}\Big(\sigma_a^2(0),\; \lambda_1(1)-\lambda_2(1),\; \ldots,\; \lambda_1\big(\tfrac{N}{2}-1\big)-\lambda_2\big(\tfrac{N}{2}-1\big),\; \sigma_a^2\big(\tfrac{N}{2}\big),\; \ldots\Big),

whose entries are dependent on the mean \mu_z and on the autocorrelation function R_z of the noise at the output of the TDE via (4.17) and (4.21). This mean \mu_z and autocorrelation function R_z in turn depend on the TDE according to

\mu_z = \mu_s \sum_{k=-\infty}^{\infty} e(k),   (4.119)

R_z = e' * e * R_s,   (4.120)

with^{22}

e'(n) = e^*(-n) = e(-n), \qquad n \in \mathbb{Z},

where \mu_s denotes the mean^{23} and R_s denotes the autocorrelation function of the noise at the input of the receiver. Note that we have modeled the noise at the input of the receiver, s = [s(n)]_{n=-\infty,\ldots,+\infty}, as a discrete-time, real valued (due to baseband signalling), wide-sense stationary (not necessarily Gaussian) random process.

Finally, ISI and ICI have covariance and pseudo-covariance matrices according to (4.112) and (4.115), whose entries are dependent on the impulse response e of

^{22} The impulse response of the TDE is real valued.
^{23} Of course, the noise is usually zero-mean.

the TDE via

H_l^+(z) = \sum_{k=-\infty}^{\infty} e(k)\, G^+_{l-k}(z),   (4.121)

H_l^-(z) = \sum_{k=-\infty}^{\infty} e(k)\, G^-_{l+k}(z),

where G^+_l(z) and G^-_l(z) are defined as in (4.96) and (4.102) applied to the channel impulse response g, respectively. Note that l^+ and |l^-| need not be known beforehand, because (4.112) and (4.115) remain valid if these two parameters are chosen to be large, as long as (4.109) is satisfied.
If we want to use capacity as an overall performance measure (similarly as in
[22]), we can plug these matrices into the formulas of Section 3.2, and, concluding
that capacity is a function of the filter coefficients of the TDE, we can maximize
this function with respect to these parameters. Note that our analytical results provide us with an explicit relation between the capacity and e, so that e.g. a conventional numerical maximization algorithm can be used to solve this maximization
problem. Also, for the design of (practical) low-complexity TDE algorithms, the
results obtained can be very useful.
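As a very simple, self-contained starting point in this direction - and explicitly not the capacity-based design of [22] discussed above - the following sketch performs a plain least-squares impulse-response shortening: the TDE taps e are chosen to minimize the energy of h = e * g outside the Cyclic Prefix window, with the first tap fixed to one to exclude the trivial all-zero solution. All lengths and the example channel are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Lg, Le, p = 12, 6, 4                          # channel length, TDE length, CP length (arbitrary)
g = rng.standard_normal(Lg) * 0.5 ** np.arange(Lg)   # decaying example channel g

# convolution matrix: h = e * g = G @ e
Lh = Lg + Le - 1
G = np.zeros((Lh, Le))
for i in range(Le):
    G[i:i + Lg, i] = g

wall = np.ones(Lh, dtype=bool)
wall[:p + 1] = False                          # taps 0..p lie inside the Cyclic Prefix window

# minimize the out-of-window ("wall") energy ||(G e)_wall||^2 subject to e(0) = 1
b0, B1 = G[wall, 0], G[wall, 1:]
u = np.linalg.lstsq(B1, -b0, rcond=None)[0]
e = np.concatenate(([1.0], u))

E_opt = np.sum((G @ e)[wall] ** 2)            # residual ISI/ICI-producing energy
E_none = np.sum(g[p + 1:] ** 2)               # wall energy without equalization (e = delta)
assert E_opt <= E_none + 1e-12
```

A capacity-based design would instead evaluate the capacity expressions of Section 3.2 with the noise and interference matrices given above as a function of e, and optimize that objective numerically.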
At this point we will stop our considerations about the (design of the) Time
Domain Equalizer. We admit that this is possibly unsatisfactory for the reader
who is also interested in quantitative results. However, such results depend on
the chosen TDE algorithms, and in turn on the chosen design methods. A full
and meaningful analysis would need a lot of additional work and can be regarded
as a separate topic. This is beyond the scope of this manuscript. However, we
developed all analytical tools that are required for this research area and showed
that the utilization of the pseudo-covariance matrix plays an important role in the
design and analysis of Time Domain Equalizers.

5. MULTIPLE-INPUT / MULTIPLE-OUTPUT DISCRETE MULTITONE
In Section 2.3, we came to the conclusion that we are dealing with a channel model of the form

y = Ax + n,   (5.1)

where y \in \mathbb{C}^r and x \in \mathbb{C}^t denote the received and transmitted vectors, respectively. A is the channel matrix and n \in \mathbb{C}^r is the noise vector. In order to obtain capacity results, cf. Section 3.2, and to develop efficient transmission schemes, it is not sufficient to know the channel matrix A. It is also necessary to know the statistical properties of the noise vector n. To be more precise, we need the covariance matrix C_n = E\{nn^H\} and the pseudo-covariance matrix P_n = E\{nn^T\}.
For the MIMO DMT system, this means that we have to calculate covariance and
pseudo-covariance matrix of the vector nn0 in (2.25). This is done in the first section of this chapter. We extend the results of the previous chapter to a very general
noise model at the input of the receivers, allowing correlations between the noise
signals of different receivers. Again, it will turn out that the noise is rotationally
variant in general.
In the same section, we deal with the problem of a Cyclic Prefix that is too
short. It was already mentioned in Section 2.2 that the design of Time Domain
Equalizers is critical in the MIMO case, since over-determined problems have to
be solved. We will generalize the results of Section 4.4 and obtain closed form
formulas for the MIMO interference (not to be mixed up with crosstalk) and study
their statistical properties. We will show that the interference (in the cable bundle
case) is rotationally variant as well.
Finally in this section, we will show how the obtained noise and interference
results can be used to design a MIMO Time Domain Equalizer.
In the second section of this chapter, we present the general form of a transmission scheme that can cope with channels of the form of (5.1). It is based on
so-called joint processing functions and allows the use of conventional Single-Input / Single-Output (SISO) codes.
The third section deals with transmission schemes whose joint processing functions are based on the Singular Value Decomposition (SVD) of the channel matrix,
cf. also [36, 53, 54]. We will show that we can obtain the optimum joint processing functions by means of the SVD. Furthermore, we study low(er)-complexity
variants and discuss their performance. To obtain quantitative results, we perform
simulations with realistic (practically used) parameters and compare the various


methods.
The final section presents the UP MIMO1 scheme, a scheme that was originally
designed by the author, see [2, 37], for wireless transmission, and that also has
applications in wireline transmission. Specifically, it can be used to reduce the
computational complexity at the transmitter side (but not at the receiver side). We
will treat various aspects of this scheme.

5.1 Noise and Interference

5.1.1 Noise

In this subsection, we are interested in specifying the noise characteristics for our MIMO DMT transmission system of Section 2.2, i.e., we want to determine the covariance and pseudo-covariance matrix of n_{n_0} in (2.25). According to Section 2.2, Figure 2.9, at each receiver input there is an additive noise signal

s^{\langle k \rangle} = \big[s^{\langle k \rangle}(n)\big]_{n=-\infty,\ldots,\infty},

where k = 1, . . . , K denotes the receiver number. We will assume in the following that these signals are modeled as discrete-time, real valued (due to baseband signalling), (pairwise) jointly wide-sense stationary (not necessarily Gaussian) random processes with given means2 and cross-correlation (autocorrelation)
functions3 [34],
n
o
(5.2)
shki (n) = E shki (n) = shki ,
n
o

Rshk1 i ,shk2 i (m + n, m) = E shk1 i (m + n)shk2 i (m) = Rshk1 i ,shk2 i (n),


k, k1 , k2 = 1, . . . , K

and n, m = , . . . , .

Note that this model incorporates dependencies between different (k1 6= k2 ) noise
signals shk1 i and shk2 i via the cross-correlation function Rshk1 i ,shk2 i . Consider, e.g.,
the case when there is one dominant noise source in a cable bundle that influences
all of its loops. Of course, there is also the possibility that the disturbances are
independent of each other. Or one can think of hybrid situations, where a noise
source only influences some nearby loops but not loops that are far away. Our
model includes all cases mentioned so that we can assume that (5.2) covers a wide
range of possible (colored) noise environments in a cable bundle.
¹ The name comes from the terminology "unitary parametrization".

² Note that these processes are usually zero-mean in our application. However, there is the possibility that there are other applications where discrete-time, real valued, wide-sense stationary random processes that are not zero-mean are passed (block-wise) through DFTs. We can still apply our analysis to such situations.

³ Again, the superscript $^*$ denotes complex conjugation, which is of course redundant for real valued random processes. Since we are also dealing with complex valued random processes, we write it here for completeness.


According to (2.20), the noise signals at the output of the TDEs,

$$z^{\langle k\rangle} = \big[z^{\langle k\rangle}(n)\big]_{n=-\infty,\ldots,\infty}, \qquad k = 1,\ldots,K,$$

are also discrete-time, real valued, (pairwise) jointly wide-sense stationary random processes with means and cross-correlation (autocorrelation) functions,

$$\mu_{z^{\langle k\rangle}} = \mu_{s^{\langle k\rangle}} \sum_{n=-\infty}^{\infty} e^{\langle k\rangle}(n), \qquad (5.3)$$

$$R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}} = e^{\langle k_1\rangle} * \tilde e^{\langle k_2\rangle} * R_{s^{\langle k_1\rangle},s^{\langle k_2\rangle}}, \qquad k, k_1, k_2 = 1,\ldots,K,$$

with⁴

$$\tilde e^{\langle k_2\rangle}(n) = e^{\langle k_2\rangle *}(-n) = e^{\langle k_2\rangle}(-n), \qquad n \in \mathbb{Z}.$$
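For $k_1 = k_2 = k$, the second line of (5.3) states that the output autocorrelation is the input autocorrelation filtered with $e^{\langle k\rangle}$ and with $\tilde e^{\langle k\rangle}$. A minimal numerical sketch (with a hypothetical 3-tap TDE and a toy autocorrelation, both invented purely for illustration) checks this against the equivalent double sum $R_z(n) = \sum_{i,j} e(i)\,e(j)\,R_s(n-i+j)$:

```python
import numpy as np

e = np.array([0.5, 1.0, -0.25])            # hypothetical TDE impulse response e<k>(n), n = 0, 1, 2
R_s = np.array([0.1, 0.4, 1.0, 0.4, 0.1])  # toy autocorrelation R_s(n), n = -2, ..., 2

# (e * e~)(n) with e~(n) = e(-n) is the deterministic autocorrelation of the filter
ee = np.correlate(e, e, mode="full")       # lags n = -2, ..., 2 (array index n + 2)
R_z = np.convolve(ee, R_s)                 # R_z = e * e~ * R_s, lags n = -4, ..., 4 (index n + 4)

# direct evaluation of R_z(n) = sum_{i,j} e(i) e(j) R_s(n - i + j)
def R_s_at(n):
    return R_s[n + 2] if -2 <= n <= 2 else 0.0

for n in range(-4, 5):
    direct = sum(e[i] * e[j] * R_s_at(n - i + j)
                 for i in range(3) for j in range(3))
    assert np.isclose(R_z[n + 4], direct)
```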

Before we proceed as in Section 4.1, we introduce the following notions of a cross-covariance matrix and a pseudo-cross-covariance matrix. Let $x$ and $y$ denote two complex random vectors. We will call the matrix

$$C_{x,y} = E\big\{(x - \mu_x)(y - \mu_y)^H\big\} \qquad (5.4)$$

the cross-covariance matrix, and the matrix

$$P_{x,y} = E\big\{(x - \mu_x)(y - \mu_y)^T\big\} \qquad (5.5)$$

the pseudo-cross-covariance matrix of the random vectors $x$ and $y$. An inspection of $n_{n_0}$ in (2.25) shows that, for the knowledge of $C_{n_{n_0}}$ and $P_{n_{n_0}}$, it is sufficient (equivalent) to know, cf. (2.23),
$$C_{w^{\langle k_1\rangle}_{n_0},\, w^{\langle k_2\rangle}_{n_0}} \quad\text{and}\quad P_{w^{\langle k_1\rangle}_{n_0},\, w^{\langle k_2\rangle}_{n_0}}, \qquad k_1, k_2 = 1,\ldots,K, \qquad (5.6)$$

where $w^{\langle k\rangle}_{n_0} = F v^{\langle k\rangle}_{n_0}$ denote the noise vectors at the outputs of the DFTs, while

$$v^{\langle k\rangle}_{n_0} = \begin{pmatrix} z^{\langle k\rangle}(n_0) \\ \vdots \\ z^{\langle k\rangle}(n_0 + N - 1) \end{pmatrix} \in \mathbb{R}^N$$
denote the noise vectors at the inputs of the DFTs. We want to emphasize that this approach describes $C_{n_{n_0}}$ and $P_{n_{n_0}}$ as a matrix of matrices according to (5.6). Similarly to Section 4.1, the $l$-th elements of the mean vectors (which do not depend on $n_0$) are calculated as

$$\mu_{w^{\langle k\rangle}}(l) = \begin{cases} \sqrt{N}\,\mu_{z^{\langle k\rangle}}, & l = 0 \\ 0, & l = 1,\ldots,N-1, \end{cases} \qquad (5.7)$$

⁴ The impulse responses of the TDEs are real valued.


and we can write the $(l_1, l_2)$-th elements of (5.6) (again with no dependence on $n_0$) as

$$C_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}\big(l_1,\,(N-l_2) \bmod N\big) - \mu_{w^{\langle k_1\rangle}}(l_1)\,\mu^{*}_{w^{\langle k_2\rangle}}(l_2), \qquad (5.8)$$

$$P_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) - \mu_{w^{\langle k_1\rangle}}(l_1)\,\mu_{w^{\langle k_2\rangle}}(l_2), \qquad (5.9)$$

$$l_1, l_2 = 0,\ldots,N-1,$$

with

$$Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = E\big\{w^{\langle k_1\rangle}_{n_0}(l_1)\,w^{\langle k_2\rangle}_{n_0}(l_2)\big\} \qquad (5.10)$$

$$= \frac{1}{N}\sum_{n_1=0}^{N-1}\sum_{n_2=0}^{N-1} E\big\{z^{\langle k_1\rangle}(n_0+n_1)\,z^{\langle k_2\rangle}(n_0+n_2)\big\}\, e^{-j\frac{2\pi}{N}(n_1 l_1 + n_2 l_2)}$$

$$= \frac{1}{N}\sum_{n_1=0}^{N-1}\sum_{n_2=0}^{N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(n_1-n_2)\, e^{-j\frac{2\pi}{N}(n_1 l_1 + n_2 l_2)}.$$

Here we have used that, since the $z^{\langle k\rangle}$ are real valued, $w^{\langle k\rangle *}_{n_0}(l) = w^{\langle k\rangle}_{n_0}\big((N-l) \bmod N\big)$, so that both (5.8) and (5.9) can be expressed via the single quantity $Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}$.

The next step is to simplify the expression for $Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2)$. Again, the idea is to reorder the terms of the double sum, so that (after some calculations) only one sum remains. We have

$$Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = \frac{1}{N}\sum_{n_1=0}^{N-1}\sum_{n_2=0}^{N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(n_1-n_2)\, e^{-j\frac{2\pi}{N}(n_1 l_1 + n_2 l_2)} \qquad (5.11)$$

$$= \frac{1}{N}\sum_{n_1=0}^{N-1}\sum_{s=n_1+1-N}^{n_1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s)\, e^{-j\frac{2\pi}{N}(n_1 l_1 + n_1 l_2 - s l_2)},$$

where the index change $s = n_1 - n_2$ has been performed, such that the summation over $n_2$ is replaced by a summation over $s$. Next, we interchange the two sums. Due to the dependence of the inner sum on $n_1$, we have to investigate the summation over $(n_1, s)$ in some more detail. Figure 5.1 shows the index pairs that actually occur in the two sums. They are denoted by the areas $A_1$, $A_2$, and $A_3$. Hence,
$$Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = \frac{1}{N}\sum_{n_1=0}^{N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(0)\, e^{-j\frac{2\pi}{N}n_1(l_1+l_2)} \qquad (A_1) \qquad (5.12)$$

$$+\ \frac{1}{N}\sum_{s=1}^{N-1}\sum_{n_1=s}^{N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s)\, e^{-j\frac{2\pi}{N}(n_1 l_1 + n_1 l_2 - s l_2)} \qquad (A_2)$$

$$+\ \frac{1}{N}\sum_{s=1-N}^{-1}\sum_{n_1=0}^{s+N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s)\, e^{-j\frac{2\pi}{N}(n_1 l_1 + n_1 l_2 - s l_2)}, \qquad (A_3)$$


[Fig. 5.1: Summation area in (5.11). The effective index pairs $(n_1, s)$, $n_1 = 0,\ldots,N-1$, lie between the lines $s = n_1$ and $s = n_1 + 1 - N$ (equivalently $n_1 = s + N - 1$), with $s$ ranging over $1-N,\ldots,N-1$; they are partitioned into the areas $A_1$ ($s = 0$), $A_2$ ($s \ge 1$), and $A_3$ ($s \le -1$).]

and, furthermore,

$$Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = \frac{1}{N}\sum_{n_1=0}^{N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(0)\, e^{-j\frac{2\pi}{N}n_1(l_1+l_2)} \qquad (5.13)$$

$$+\ \frac{1}{N}\sum_{s=1}^{N-1}\sum_{t=0}^{N-1-s} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s)\, e^{-j\frac{2\pi}{N}(t l_1 + s l_1 + t l_2)}$$

$$+\ \frac{1}{N}\sum_{s=1}^{N-1}\sum_{n_1=0}^{N-1-s} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s)\, e^{-j\frac{2\pi}{N}(n_1 l_1 + n_1 l_2 + s l_2)},$$

where the index change $t = n_1 - s$ has been performed for term $(A_2)$ in (5.12), such that its summation over $n_1$ is replaced by a summation over $t$, and in term $(A_3)$ of (5.12) $s$ has been replaced by $-s$. Writing $t$ instead of $n_1$, we obtain


$$Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = \frac{1}{N}\, R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(0) \sum_{t=0}^{N-1} e^{-j\frac{2\pi}{N}t(l_1+l_2)} \qquad (5.14)$$

$$+\ \frac{1}{N}\sum_{s=1}^{N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s) \sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}(s l_1 + t l_1 + t l_2)}$$

$$+\ \frac{1}{N}\sum_{s=1}^{N-1} R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s) \sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}(s l_2 + t l_1 + t l_2)}.$$

Using the formula

$$\sum_{t=0}^{M-1} a^t = \begin{cases} M, & a = 1 \\[4pt] \dfrac{1-a^M}{1-a}, & a \neq 1, \end{cases} \qquad (5.15)$$

(5.14) further simplifies to

$$Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(0) + \sum_{s=1}^{N-1} \frac{N-s}{N}\Big( R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s)\, e^{-j\frac{2\pi}{N}s l_1} + R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s)\, e^{+j\frac{2\pi}{N}s l_1} \Big), \qquad (5.16)$$

$$\text{for } l_1 + l_2 = 0 \text{ or } l_1 + l_2 = N,$$

and

$$Q_{w^{\langle k_1\rangle},w^{\langle k_2\rangle}}(l_1,l_2) = \frac{e^{j\frac{2\pi}{2N}(l_1+l_2)}}{2jN\sin\!\big(\frac{2\pi}{2N}(l_1+l_2)\big)} \sum_{s=1}^{N-1} \Big( R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s)\,\big(e^{-j\frac{2\pi}{N}s l_1} - e^{+j\frac{2\pi}{N}s l_2}\big) \qquad (5.17)$$

$$+\ R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s)\,\big(e^{-j\frac{2\pi}{N}s l_2} - e^{+j\frac{2\pi}{N}s l_1}\big)\Big), \qquad \text{for } l_1 + l_2 \neq 0 \text{ and } l_1 + l_2 \neq N,$$

as we will show in the following. Suppose we have

$$l_1 + l_2 = 0 \quad\text{or}\quad l_1 + l_2 = N. \qquad (5.18)$$


Then, according to (5.15),

$$\sum_{t=0}^{N-1} e^{-j\frac{2\pi}{N}t(l_1+l_2)} = N,$$

$$\sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}(s l_1 + t l_1 + t l_2)} = e^{-j\frac{2\pi}{N}s l_1} \sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}t(l_1+l_2)} = e^{-j\frac{2\pi}{N}s l_1}\,(N-s),$$

$$\sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}(s l_2 + t l_1 + t l_2)} = e^{-j\frac{2\pi}{N}s l_2} \sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}t(l_1+l_2)} = e^{+j\frac{2\pi}{N}s l_1}\,(N-s),$$

where in the last equation assumption (5.18) was used twice. Applying this to (5.14) implies (5.16). Suppose now we have

$$l_1 + l_2 \neq 0 \quad\text{and}\quad l_1 + l_2 \neq N. \qquad (5.19)$$

Then, according to (5.15),

$$\sum_{t=0}^{N-1} e^{-j\frac{2\pi}{N}t(l_1+l_2)} = \frac{1-\overbrace{e^{-j\frac{2\pi}{N}N(l_1+l_2)}}^{=\,1}}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}} = 0,$$

$$\sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}(s l_1 + t l_1 + t l_2)} = e^{-j\frac{2\pi}{N}s l_1} \sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}t(l_1+l_2)} = e^{-j\frac{2\pi}{N}s l_1}\;\frac{1-e^{-j\frac{2\pi}{N}(N-s)(l_1+l_2)}}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}}$$

$$= e^{-j\frac{2\pi}{N}s l_1}\;\frac{1-e^{+j\frac{2\pi}{N}s(l_1+l_2)}}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}} = \frac{e^{-j\frac{2\pi}{N}s l_1} - e^{+j\frac{2\pi}{N}s l_2}}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}},$$

$$\sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}(s l_2 + t l_1 + t l_2)} = e^{-j\frac{2\pi}{N}s l_2} \sum_{t=0}^{N-1-s} e^{-j\frac{2\pi}{N}t(l_1+l_2)} = e^{-j\frac{2\pi}{N}s l_2}\;\frac{1-e^{-j\frac{2\pi}{N}(N-s)(l_1+l_2)}}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}}$$

$$= e^{-j\frac{2\pi}{N}s l_2}\;\frac{1-e^{+j\frac{2\pi}{N}s(l_1+l_2)}}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}} = \frac{e^{-j\frac{2\pi}{N}s l_2} - e^{+j\frac{2\pi}{N}s l_1}}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}},$$


which applied to (5.14) implies (5.17), if one observes

$$\frac{1}{1-e^{-j\frac{2\pi}{N}(l_1+l_2)}} = \frac{1}{e^{-j\frac{2\pi}{2N}(l_1+l_2)}\Big(e^{+j\frac{2\pi}{2N}(l_1+l_2)} - e^{-j\frac{2\pi}{2N}(l_1+l_2)}\Big)} = \frac{e^{+j\frac{2\pi}{2N}(l_1+l_2)}}{2j\sin\!\big(\frac{2\pi}{2N}(l_1+l_2)\big)}.$$
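The closed-form expressions (5.16) and (5.17) can be checked numerically against the defining double sum of (5.10). The following sketch (with arbitrary, randomly generated correlation values, purely for illustration) compares both evaluations for all pairs $(l_1, l_2)$:

```python
import numpy as np

N = 16
rng = np.random.default_rng(0)
# arbitrary cross-correlation values R(s) = R_{z<k1>,z<k2>}(s), s = -(N-1), ..., N-1
R = {s: rng.standard_normal() for s in range(-(N - 1), N)}

def Q_direct(l1, l2):                        # double sum of (5.10)
    return sum(R[n1 - n2] * np.exp(-2j * np.pi * (n1 * l1 + n2 * l2) / N)
               for n1 in range(N) for n2 in range(N)) / N

def Q_closed(l1, l2):
    if (l1 + l2) % N == 0:                   # (5.16): l1 + l2 = 0 or l1 + l2 = N
        return R[0] + sum((N - s) / N * (R[s] * np.exp(-2j * np.pi * s * l1 / N)
                                         + R[-s] * np.exp(+2j * np.pi * s * l1 / N))
                          for s in range(1, N))
    pre = np.exp(1j * np.pi * (l1 + l2) / N) / (
        2j * N * np.sin(np.pi * (l1 + l2) / N))          # prefactor of (5.17)
    return pre * sum(
        R[s] * (np.exp(-2j * np.pi * s * l1 / N) - np.exp(+2j * np.pi * s * l2 / N))
        + R[-s] * (np.exp(-2j * np.pi * s * l2 / N) - np.exp(+2j * np.pi * s * l1 / N))
        for s in range(1, N))

for l1 in range(N):
    for l2 in range(N):
        assert abs(Q_direct(l1, l2) - Q_closed(l1, l2)) < 1e-9
```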
As already mentioned, (5.16) and (5.17) fully specify the covariance and the pseudo-covariance matrix of $n_{n_0}$ in (2.25), and we have found the desired solution. However, if we are not interested in correlations between different frequencies (subcarriers), we can approximate⁵ the covariance and the pseudo-covariance matrix by block diagonal matrices, which is also in line with the block diagonal structure, see (2.25), of $H_{\text{MIMO DMT}}$, i.e.,

$$C_{n_{n_0}} \approx C_{n_{\text{MIMO DMT}}} = \begin{pmatrix} C_n^{(0)} & & 0 \\ & \ddots & \\ 0 & & C_n^{(N/2)} \end{pmatrix} \in \mathbb{C}^{(\frac{N}{2}+1)K \times (\frac{N}{2}+1)K},$$

$$P_{n_{n_0}} \approx P_{n_{\text{MIMO DMT}}} = \begin{pmatrix} P_n^{(0)} & & 0 \\ & \ddots & \\ 0 & & P_n^{(N/2)} \end{pmatrix} \in \mathbb{C}^{(\frac{N}{2}+1)K \times (\frac{N}{2}+1)K},$$

with

$$C_n^{(l)},\ P_n^{(l)} \in \mathbb{C}^{K \times K}. \qquad (5.20)$$

Specializing (5.8) and (5.9) to $l = l_1 = l_2$, while applying (5.7), (5.16) and (5.17), we obtain the elements of $C_n^{(l)}$ and $P_n^{(l)}$ as

$$C_n^{(0)}(k_1,k_2) = P_n^{(0)}(k_1,k_2) = R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(0) + \sum_{s=1}^{N-1} \frac{N-s}{N}\Big( R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s) + R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s) \Big) - N\,\mu_{z^{\langle k_1\rangle}}\mu_{z^{\langle k_2\rangle}}, \qquad (5.21)$$

$$C_n^{(N/2)}(k_1,k_2) = P_n^{(N/2)}(k_1,k_2) = R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(0) + \sum_{s=1}^{N-1} \frac{N-s}{N}\,(-1)^s\Big( R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s) + R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s) \Big),$$

⁵ This is then an extension of the results obtained in Subsection 4.1.3 to the MIMO case.

and, for $l \neq 0$ and $l \neq \frac{N}{2}$, as

$$C_n^{(l)}(k_1,k_2) = R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(0) + \sum_{s=1}^{N-1} \frac{N-s}{N}\Big( R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s)\, e^{-j\frac{2\pi}{N}s l} + R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s)\, e^{+j\frac{2\pi}{N}s l} \Big), \qquad (5.22)$$

$$P_n^{(l)}(k_1,k_2) = -\frac{e^{j\frac{2\pi}{N}l}}{N\sin\!\big(\frac{2\pi}{N}l\big)} \sum_{s=1}^{N-1} \sin\!\Big(\frac{2\pi}{N}s l\Big)\Big( R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(s) + R_{z^{\langle k_1\rangle},z^{\langle k_2\rangle}}(-s) \Big).$$

Note that we have expressed the first and second moments of the noise at the outputs of the DFTs in terms of the statistical properties of the noise at the inputs of the receivers.
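The per-subcarrier formulas (5.21) and (5.22) can be validated in the same spirit. The sketch below (zero-mean case, $\mu_{z^{\langle k\rangle}} = 0$, assuming a $1/\sqrt{N}$-normalized DFT and randomly generated correlation values, all invented for illustration) compares them with the directly evaluated second moments of $w(l)$:

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
R = {s: rng.standard_normal() for s in range(-(N - 1), N)}  # R_{z<k1>,z<k2>}(s)

def R_at(s):
    return R[s]

# direct second moments of w(l) = (1/sqrt(N)) sum_n z(n) exp(-2j*pi*n*l/N)
def C_direct(l):        # E{ w<k1>(l) w<k2>(l)* }
    return sum(R_at(n1 - n2) * np.exp(-2j * np.pi * l * (n1 - n2) / N)
               for n1 in range(N) for n2 in range(N)) / N

def P_direct(l):        # E{ w<k1>(l) w<k2>(l) }
    return sum(R_at(n1 - n2) * np.exp(-2j * np.pi * l * (n1 + n2) / N)
               for n1 in range(N) for n2 in range(N)) / N

# closed forms (5.21)/(5.22) in the zero-mean case
def C_closed(l):
    return R_at(0) + sum((N - s) / N * (R_at(s) * np.exp(-2j * np.pi * s * l / N)
                                        + R_at(-s) * np.exp(+2j * np.pi * s * l / N))
                         for s in range(1, N))

def P_closed(l):
    if l == 0 or 2 * l == N:                     # (5.21), with (-1)^s for l = N/2
        sgn = 1.0 if l == 0 else -1.0
        return R_at(0) + sum((N - s) / N * sgn ** s * (R_at(s) + R_at(-s))
                             for s in range(1, N))
    return -(np.exp(2j * np.pi * l / N) / (N * np.sin(2 * np.pi * l / N))) * sum(
        np.sin(2 * np.pi * s * l / N) * (R_at(s) + R_at(-s)) for s in range(1, N))

for l in range(N):
    assert abs(C_direct(l) - C_closed(l)) < 1e-9
    assert abs(P_direct(l) - P_closed(l)) < 1e-9
```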

5.1.2 Interference

In this subsection, it is our goal to translate the results of Section 4.4 about intersymbol and intercarrier interference to the MIMO case. It was already mentioned in Section 2.2 that the TDEs have the task to shorten all ($k, m = 1,\ldots,K$) channel impulse responses $g^{\langle km\rangle}$, so that the resulting impulse responses have lengths shorter than or equal to $p + 1$, $p$ being the length of the Cyclic Prefixes. In other words, the $e^{\langle k\rangle}$ are chosen such that $h^{\langle km\rangle} = e^{\langle k\rangle} * g^{\langle km\rangle}$ satisfy

$$h^{\langle km\rangle}(n) = 0, \qquad n < 0 \text{ or } n > p. \qquad (5.23)$$

Note that the calculation of the TDE coefficients is a nontrivial problem, since the impulse response of the $k$-th TDE, $e^{\langle k\rangle}$, has to shorten all $g^{\langle km\rangle}$, $m = 1,\ldots,K$, simultaneously; this can therefore be an over-determined problem, which can then only be solved in an approximate sense. In order to analyze the effects occurring, we relax (5.23), similarly to Section 4.4, to

$$h^{\langle km\rangle}(n) = 0, \qquad n < l^{\langle km\rangle -} \le 0 \ \text{ or } \ n > l^{\langle km\rangle +} \ge p. \qquad (5.24)$$

Applying the results of the same section, we conclude that the notation of the block diagonal $H_{\text{MIMO DMT}}$ with its blocks $H^{(n)}$ using Z-transforms $H^{\langle km\rangle}(z)$ of $h^{\langle km\rangle}$, see (2.25), does not change, whereas the values of its entries change according to (5.24). Let $t^{\langle m\rangle}$ denote the $m$-th transmit signal, such that the signal at the output of the $k$-th TDE, cf. (2.22), is

$$\sum_{m=1}^{K} h^{\langle km\rangle} * t^{\langle m\rangle} + z^{\langle k\rangle}. \qquad (5.25)$$

It is an immediate consequence of (4.108) that we can express the overall interference on the $l$-th subcarrier at the output of the $k$-th DFT (in the $k$-th receiver) introduced by (5.24), including both the $+$ and the $-$ contribution, as

$$i^{\langle k\rangle}_{n_0}(l) = \frac{1}{\sqrt{N}} \sum_{m=1}^{K} \Bigg( \sum_{n=p+1}^{l^{\langle km\rangle +}} t^{\langle m\rangle}(n_0 - n)\, H_n^{\langle km\rangle +}\big(e^{-j\frac{2\pi}{N}l}\big) \qquad \text{(postcursor ISI from previous symbol)} \qquad (5.26)$$

$$-\ \sum_{n=p+1}^{l^{\langle km\rangle +}} t^{\langle m\rangle}(n_0 + N - n)\, H_n^{\langle km\rangle +}\big(e^{-j\frac{2\pi}{N}l}\big) \qquad \text{(ICI, if } l^{\langle km\rangle +} \le p + N\text{)}$$

$$-\ \sum_{n=0}^{-l^{\langle km\rangle -}-1} t^{\langle m\rangle}(n_0 + n)\, H_n^{\langle km\rangle -}\big(e^{-j\frac{2\pi}{N}l}\big) \qquad \text{(ICI, if } l^{\langle km\rangle -} \ge -N\text{)}$$

$$+\ \sum_{n=0}^{-l^{\langle km\rangle -}-1} t^{\langle m\rangle}(n_0 + N + n)\, H_n^{\langle km\rangle -}\big(e^{-j\frac{2\pi}{N}l}\big) \Bigg), \qquad \text{(precursor ISI from next symbol)}$$

$$l = 0,\ldots,N-1, \qquad k = 1,\ldots,K,$$

where $H_n^{\langle km\rangle +}(z)$ and $H_n^{\langle km\rangle -}(z)$ are defined as in (4.96) and (4.102) applied to the impulse responses $h^{\langle km\rangle}$, respectively.
In the following, we are interested in the statistics of this interference. As usual, the $m$-th transmit signal $t^{\langle m\rangle}$ is modeled as a discrete-time, real valued (due to baseband signalling) random process. Extending the assumptions of Section 4.4 (and still maintaining their simplicity), we will assume that all elements of these processes are zero-mean and pairwise uncorrelated for different time instants (also across different processes), except of course for the elements of the Cyclic Prefixes, which are, by definition, identical to other elements. For fixed time indices $n$, the variances and correlations of the elements of the $K$ transmit signals are given by covariances

$$\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}} = E\big\{t^{\langle m_1\rangle}(n)\,t^{\langle m_2\rangle}(n)\big\}, \qquad m_1, m_2 = 1,\ldots,K,$$

that do not depend on these time indices $n$ (i.e., they are the same for all time instants $n$). It is obvious that the interference is zero-mean as well. Assuming⁶

$$l^{\langle km\rangle +} \le N^+ \quad\text{and}\quad l^{\langle km\rangle -} \ge -N^- \quad\text{with}\quad N^+ + N^- = N, \qquad (5.27)$$
we have, according to (5.26),

$$E\Big\{i^{\langle k_1\rangle}_{n_0}(l_1)\,\big(i^{\langle k_2\rangle}_{n_0}(l_2)\big)^{*}\Big\} = \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=p+1}^{N^+} H_n^{\langle k_1 m_1\rangle +}\big(e^{-j\frac{2\pi}{N}l_1}\big)\Big(H_n^{\langle k_2 m_2\rangle +}\big(e^{-j\frac{2\pi}{N}l_2}\big)\Big)^{*} \qquad \text{(ISI)}$$

$$+\ \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=p+1}^{N^+} H_n^{\langle k_1 m_1\rangle +}\big(e^{-j\frac{2\pi}{N}l_1}\big)\Big(H_n^{\langle k_2 m_2\rangle +}\big(e^{-j\frac{2\pi}{N}l_2}\big)\Big)^{*} \qquad \text{(ICI)}$$

$$+\ \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=0}^{N^- - 1} H_n^{\langle k_1 m_1\rangle -}\big(e^{-j\frac{2\pi}{N}l_1}\big)\Big(H_n^{\langle k_2 m_2\rangle -}\big(e^{-j\frac{2\pi}{N}l_2}\big)\Big)^{*} \qquad \text{(ICI)}$$

$$+\ \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=0}^{N^- - 1} H_n^{\langle k_1 m_1\rangle -}\big(e^{-j\frac{2\pi}{N}l_1}\big)\Big(H_n^{\langle k_2 m_2\rangle -}\big(e^{-j\frac{2\pi}{N}l_2}\big)\Big)^{*} \qquad \text{(ISI)}$$

and

$$E\Big\{i^{\langle k_1\rangle}_{n_0}(l_1)\,i^{\langle k_2\rangle}_{n_0}(l_2)\Big\} = \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=p+1}^{N^+} H_n^{\langle k_1 m_1\rangle +}\big(e^{-j\frac{2\pi}{N}l_1}\big)\,H_n^{\langle k_2 m_2\rangle +}\big(e^{-j\frac{2\pi}{N}l_2}\big) \qquad \text{(ISI)}$$

$$+\ \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=p+1}^{N^+} H_n^{\langle k_1 m_1\rangle +}\big(e^{-j\frac{2\pi}{N}l_1}\big)\,H_n^{\langle k_2 m_2\rangle +}\big(e^{-j\frac{2\pi}{N}l_2}\big) \qquad \text{(ICI)}$$

$$+\ \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=0}^{N^- - 1} H_n^{\langle k_1 m_1\rangle -}\big(e^{-j\frac{2\pi}{N}l_1}\big)\,H_n^{\langle k_2 m_2\rangle -}\big(e^{-j\frac{2\pi}{N}l_2}\big) \qquad \text{(ICI)}$$

$$+\ \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\sum_{n=0}^{N^- - 1} H_n^{\langle k_1 m_1\rangle -}\big(e^{-j\frac{2\pi}{N}l_1}\big)\,H_n^{\langle k_2 m_2\rangle -}\big(e^{-j\frac{2\pi}{N}l_2}\big), \qquad \text{(ISI)}$$

⁶ For wireline transmission and typical FFT-lengths, this is usually true.

so that we obtain the cross-covariance matrices⁷ $C_{i^{\langle k_1\rangle}, i^{\langle k_2\rangle}}$ of the interference vectors

$$i^{\langle k\rangle}_{n_0} = \begin{pmatrix} i^{\langle k\rangle}_{n_0}(0) \\ \vdots \\ i^{\langle k\rangle}_{n_0}(N-1) \end{pmatrix}, \qquad k = 1,\ldots,K,$$

as

$$C_{i^{\langle k_1\rangle}, i^{\langle k_2\rangle}} = E\Big\{i^{\langle k_1\rangle}_{n_0}\big(i^{\langle k_2\rangle}_{n_0}\big)^H\Big\} = C_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}} + C_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}, \qquad (5.28)$$

where $C_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}}$ and $C_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}$ denote the cross-covariance matrices of ISI and ICI, respectively, satisfying

$$C_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}} = C_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}, \qquad (5.29)$$

and having entries

$$C_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}}(l_1,l_2) = C_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}(l_1,l_2) \qquad (5.30)$$

$$= \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\Bigg(\sum_{n=p+1}^{N^+} H_n^{\langle k_1 m_1\rangle +}\big(e^{-j\frac{2\pi}{N}l_1}\big)\Big(H_n^{\langle k_2 m_2\rangle +}\big(e^{-j\frac{2\pi}{N}l_2}\big)\Big)^{*} + \sum_{n=0}^{N^- - 1} H_n^{\langle k_1 m_1\rangle -}\big(e^{-j\frac{2\pi}{N}l_1}\big)\Big(H_n^{\langle k_2 m_2\rangle -}\big(e^{-j\frac{2\pi}{N}l_2}\big)\Big)^{*}\Bigg),$$

and the pseudo-cross-covariance matrices⁸ $P_{i^{\langle k_1\rangle}, i^{\langle k_2\rangle}}$ of the interference vectors $i^{\langle k\rangle}_{n_0}$, $k = 1,\ldots,K$, as

$$P_{i^{\langle k_1\rangle}, i^{\langle k_2\rangle}} = E\Big\{i^{\langle k_1\rangle}_{n_0}\big(i^{\langle k_2\rangle}_{n_0}\big)^T\Big\} = P_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}} + P_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}, \qquad (5.31)$$

where $P_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}}$ and $P_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}$ denote the pseudo-cross-covariance matrices of ISI and ICI, respectively, satisfying

$$P_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}} = P_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}, \qquad (5.32)$$

⁷ There is no dependence on $n_0$ anymore.
⁸ Again no dependence on $n_0$.

and having entries

$$P_{i_{\mathrm{ISI}}^{\langle k_1\rangle}, i_{\mathrm{ISI}}^{\langle k_2\rangle}}(l_1,l_2) = P_{i_{\mathrm{ICI}}^{\langle k_1\rangle}, i_{\mathrm{ICI}}^{\langle k_2\rangle}}(l_1,l_2) \qquad (5.33)$$

$$= \frac{1}{N}\sum_{m_1=1}^{K}\sum_{m_2=1}^{K}\sigma_{t^{\langle m_1\rangle},t^{\langle m_2\rangle}}\Bigg(\sum_{n=p+1}^{N^+} H_n^{\langle k_1 m_1\rangle +}\big(e^{-j\frac{2\pi}{N}l_1}\big)\,H_n^{\langle k_2 m_2\rangle +}\big(e^{-j\frac{2\pi}{N}l_2}\big) + \sum_{n=0}^{N^- - 1} H_n^{\langle k_1 m_1\rangle -}\big(e^{-j\frac{2\pi}{N}l_1}\big)\,H_n^{\langle k_2 m_2\rangle -}\big(e^{-j\frac{2\pi}{N}l_2}\big)\Bigg).$$

To be in line with (the special ordering of) (2.25), we have to consider the covariance matrix $C_{i_{\text{MIMO DMT}}}$ and the pseudo-covariance matrix $P_{i_{\text{MIMO DMT}}}$ of the stacked interference vector

$$i_{\text{MIMO DMT},\,n_0} = \begin{pmatrix} i^{\langle 1\rangle}_{n_0}(0) \\ \vdots \\ i^{\langle K\rangle}_{n_0}(0) \\ \vdots \\ i^{\langle 1\rangle}_{n_0}\big(\tfrac{N}{2}\big) \\ \vdots \\ i^{\langle K\rangle}_{n_0}\big(\tfrac{N}{2}\big) \end{pmatrix}. \qquad (5.34)$$

The cross-covariance matrices and pseudo-cross-covariance matrices previously obtained contain all elements of the covariance matrix $C_{i_{\text{MIMO DMT}}}$ and the pseudo-covariance matrix $P_{i_{\text{MIMO DMT}}}$, respectively, so that our results remain compatible with (2.25), cf. also Subsection 5.1.1.
Note that the important result that ISI and ICI have the same statistics with respect to first and second order moments, and that both interference mechanisms are uncorrelated under the chosen assumptions, so that the statistics of the overall interference are obtained by a simple addition (or by multiplying one interference contribution by a factor of 2), is true also in the MIMO case. Furthermore, intersymbol and intercarrier interference are again rotationally variant in general, so that we can benefit from taking the pseudo-covariance matrix into account. As in Subsection 5.1.1, see (5.20), we can look at block diagonal approximations, but in contrast to the noise results, the corresponding formulas are not simplified. We also want to emphasize that if we are interested in the overall system, we have to consider the sum vector of noise and interference, i.e.,

$$C_{n_{\text{MIMO DMT}} + i_{\text{MIMO DMT}}} = C_{n_{\text{MIMO DMT}}} + C_{i_{\text{MIMO DMT}}}, \qquad (5.35)$$

$$P_{n_{\text{MIMO DMT}} + i_{\text{MIMO DMT}}} = P_{n_{\text{MIMO DMT}}} + P_{i_{\text{MIMO DMT}}}.$$

5.1.3 MIMO Time Domain Equalizer


We have already pointed out that the design of Time Domain Equalizers (TDEs) is a
nontrivial problem in the MIMO situation, because one TDE has to shorten several
impulse responses simultaneously. Secondly, as we have already mentioned, a
simple shortening algorithm usually amplifies the noise. Therefore, a solution to
this problem is usually given by minimizing some objective functions that represent
the interference and take into account the noise as well, or by maximizing certain
performance measures. It is a very common approach in practice to tackle such
minimizing / maximizing tasks by means of efficient numerical algorithms. In
Subsection 4.4.3, we showed how we can apply this idea to determine the filter
coefficients of a non-MIMO TDE. We will now explain how we can extend this
approach for the design of a MIMO TDE.
First, observe that the overall channel matrix $H_{\text{MIMO DMT}}$ of (2.25) has entries $H^{\langle km\rangle}\big(e^{-j\frac{2\pi}{N}l}\big)$ that depend on the TDEs via the following Z-transform equation,

$$H^{\langle km\rangle}(z) = E^{\langle k\rangle}(z)\,G^{\langle km\rangle}(z), \qquad (5.36)$$

where $E^{\langle k\rangle}(z)$ denotes the Z-transform of $e^{\langle k\rangle}$, the impulse response of the $k$-th TDE, and $G^{\langle km\rangle}(z)$ denote the Z-transforms of all (cross-)channels $g^{\langle km\rangle}$, see Figure 2.9.
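In the time domain, (5.36) says that the effective impulse response is the convolution $h^{\langle km\rangle} = e^{\langle k\rangle} * g^{\langle km\rangle}$. The toy example below (an exponentially decaying channel and a hand-picked, not optimized, 2-tap shortening filter, both invented purely for illustration) shows how a suitable $e^{\langle k\rangle}$ concentrates the energy of $h^{\langle km\rangle}$ at the front, leaving only the kind of small residual tail that (5.24) models:

```python
import numpy as np

g = np.array([1.0, 0.8, 0.64, 0.512, 0.4096])  # toy channel g<km>(n) = 0.8**n (truncated)
e = np.array([1.0, -0.8])                      # 2-tap TDE: its zero at 0.8 cancels the tail

h = np.convolve(e, g)                          # effective impulse response h = e * g
# h = [1, 0, 0, 0, 0, -0.32768]: all taps but the first are (almost) cancelled;
# the residual at n = 5 stems from the truncation of g and plays the role of the
# leakage outside the shortened window that (5.24) accounts for
assert np.allclose(h[1:5], 0.0)
```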
The noise has a covariance and pseudo-covariance matrix according to (5.7), (5.8), (5.9), (5.16) and (5.17), or, if we use an approximation that neglects cross-correlations between different frequencies (subcarriers), according to (5.20), (5.21) and (5.22). Observe that (5.3) connects those formulas to the impulse responses of the TDEs.
Finally, ISI and ICI have covariance and pseudo-covariance matrices according to (5.30) and (5.33), whose entries depend on the impulse responses $e^{\langle k\rangle}$ of the TDEs via

$$H_n^{\langle km\rangle +}(z) = \sum_{l=-\infty}^{\infty} e^{\langle k\rangle}(l)\, G_{n-l}^{\langle km\rangle +}(z), \qquad (5.37)$$

$$H_n^{\langle km\rangle -}(z) = \sum_{l=-\infty}^{\infty} e^{\langle k\rangle}(l)\, G_{n+l}^{\langle km\rangle -}(z),$$

where $G_n^{\langle km\rangle +}(z)$ and $G_n^{\langle km\rangle -}(z)$ are defined as in (4.96) and (4.102) applied to the channel impulse responses $g^{\langle km\rangle}$, respectively.
If we want to use capacity as an overall performance measure (similarly as in [22]), we can plug these matrices into the formulas of Section 3.2, and, concluding that capacity is a function of the filter coefficients of the TDEs, we can maximize
this function with respect to these parameters. Note that our analytical results
provide us with an explicit relation between the capacity and ehki , k = 1, . . . , K,
so that e.g. a conventional numerical maximization algorithm can be used to solve
this maximization problem. Also, for the design of (practical) low-complexity
MIMO TDE algorithms, the results obtained can be very useful.
Again, we will stop our considerations about the (design of) MIMO Time Domain Equalizers at this point. We admit that this is possibly unsatisfactory for the
reader who is also interested in quantitative results. However, such results depend
on the chosen MIMO TDE algorithms, and in turn on the chosen design methods.
A full and meaningful analysis would need a lot of additional work and can be regarded as a separate topic. This is beyond the scope of this manuscript. However,
we developed all analytical tools that are required for this research area and showed
that the utilization of the pseudo-covariance matrix plays an important role in the
design and analysis of MIMO Time Domain Equalizers.

5.2 The Transmission Scenario


In Section 2.3, it was pointed out that we have to deal with the following general channel model,

$$y = Ax + n, \qquad (5.38)$$

where $y \in \mathbb{C}^r$ and $x \in \mathbb{C}^t$ denote the received and transmitted vectors, respectively. $A$ is a deterministic $r \times t$ complex matrix, the channel matrix, and $n \in \mathbb{C}^r$ is the zero-mean noise vector. The transmitter is constrained in its total power to $S$,

$$E\{x^H x\} \le S, \qquad (5.39)$$

or, equivalently, since $x^H x = \operatorname{tr}(x x^H)$, and expectation and trace commute,

$$\operatorname{tr}\big(E\{x x^H\}\big) \le S. \qquad (5.40)$$

Since this is the complex channel model, we assume that we have knowledge of
both the covariance matrix Cn and the pseudo-covariance matrix Pn . This can be
justified due to our results of Section 5.1. For notational simplicity, we also assume
that the remaining interference (if there is interference at all) is also incorporated
in this noise vector, i.e., in fact we deal with the covariance matrix Cn+i and the
pseudo-covariance matrix Pn+i , cf. also (5.35). We can compute the capacity of
this channel by applying the results of Subsection 3.2.2. Of course, in order to
obtain this capacity, it is necessary to utilize the non-vanishing pseudo-covariance
matrix.
Another, but equivalent (cf. Chapter 3), approach to the channel (5.38) and (5.39) / (5.40) is to consider the equivalent real channel model

$$\tilde y = \tilde A \tilde x + \tilde n \qquad (5.41)$$


with

$$\tilde x = \begin{pmatrix} \Re\{x\} \\ \Im\{x\} \end{pmatrix} \in \mathbb{R}^{2t}, \qquad \tilde y = \begin{pmatrix} \Re\{y\} \\ \Im\{y\} \end{pmatrix} \in \mathbb{R}^{2r}, \qquad \tilde n = \begin{pmatrix} \Re\{n\} \\ \Im\{n\} \end{pmatrix} \in \mathbb{R}^{2r},$$

and

$$\tilde A = \begin{pmatrix} \Re\{A\} & -\Im\{A\} \\ \Im\{A\} & \Re\{A\} \end{pmatrix} \in \mathbb{R}^{2r \times 2t},$$

cf. also Section 3.1. The power constraints (5.39) and (5.40) translate to

$$E\{\tilde x^T \tilde x\} \le S \qquad (5.42)$$

and

$$\operatorname{tr}\big(E\{\tilde x \tilde x^T\}\big) \le S, \qquad (5.43)$$

respectively, and the covariance⁹ matrix $C_{\tilde n} \in \mathbb{R}^{2r \times 2r}$ of the real noise vector $\tilde n$ is obtained from the covariance matrix $C_n$ and the pseudo-covariance matrix $P_n$ (in fact from $C_{n+i}$ and $P_{n+i}$) according to (3.5). Note that the real and complex channel descriptions are fully equivalent.
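This equivalence is straightforward to verify numerically; the sketch below builds $\tilde A$ from a randomly generated complex $A$ and checks both the input/output relation (5.41) and the fact that the power constraint is unaffected ($x^H x = \tilde x^T \tilde x$):

```python
import numpy as np

rng = np.random.default_rng(1)
r, t = 3, 2
A = rng.standard_normal((r, t)) + 1j * rng.standard_normal((r, t))
x = rng.standard_normal(t) + 1j * rng.standard_normal(t)

# equivalent real channel matrix of (5.41): stacked real/imaginary parts
A_tilde = np.block([[A.real, -A.imag],
                    [A.imag,  A.real]])
x_tilde = np.concatenate([x.real, x.imag])

y = A @ x
assert np.allclose(A_tilde @ x_tilde, np.concatenate([y.real, y.imag]))
# the power constraint carries over unchanged: x^H x = x~^T x~
assert np.isclose(np.vdot(x, x).real, x_tilde @ x_tilde)
```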
The equivalent real channel model (5.41) with (5.42) / (5.43) has the advantage that it is sufficient to deal with only one noise matrix. There is no need
to consider a pseudo-covariance matrix. Also, certain high SNR assumptions,
cf. Subsections 3.2.2 and 3.2.3, can be dropped. On the other hand, if one really
wants to work out the differences between rotationally invariant and variant noise
or wants to study the implications resulting from rotationally variant noise, it is
mandatory to use the complex channel model.
Since we are now interested in the design of a transmission scheme rather than in effects that originate from a non-vanishing pseudo-covariance matrix, we will mostly consider the equivalent real channel model. The reader might ask at this point why we developed such extensive machinery for the utilization of the pseudo-covariance matrix, only to switch now to the equivalent real channel model, where the concept of a pseudo-covariance matrix does not make sense anymore. First of all, it provides us with a good argument for using the equivalent real channel model in this chapter. But this is by far not the only reason. One of the main reasons is that, apart from the approach of this chapter, there are other approaches to deal with MIMO DMT systems (e.g. [49, 50]) that do not make use of the equivalent real channel model. Since it was not known before that noise and interference in (MIMO) DMT systems are rotationally variant in general, the other methods are not adapted to this effect. The results of our previous chapters can now be used to study the effects originating from a non-vanishing pseudo-covariance matrix on such systems and to propose some modifications to benefit from this knowledge.
⁹ We will assume that this covariance matrix is non-singular, since otherwise the capacity is either infinite or there is a zero-sub-channel (the kernel [19] of $\tilde A$ has a dimension $\ge 1$, i.e., after a diagonalization of $\tilde A$, we obtain some vanishing diagonal elements, so that the sub-channels corresponding to these elements cannot be used for transmission; these sub-channels are called zero-sub-channels).


Note that we can expect (to some extent) that such modifications do not change
the underlying concepts of the existing approaches, whereas MIMO (DMT) systems relying on the equivalent real channel model seem to be more revolutionary
solutions.
We also want to emphasize that if the complex channel has a block diagonal
structure, as, e.g., the channel in (2.25) with (5.20), (5.21) and (5.22), this block
diagonal structure can be maintained in the equivalent real channel model by simple permutations of the matrices and vectors considered. To be more precise, the
equivalent real channel model is in turn equivalent to another (described in the
following) real (valued) channel model which is related to the original (first) real
channel model by permutations of the vectors and matrices and which has a block
diagonal structure. One can obtain this block diagonal real channel model by considering the sub-channels - introduced by the sub-matrices of the block diagonal
complex channel model - and computing the equivalent real channel models for
each of these sub-channels. Note that a diagonal complex channel, cf. (2.16), is
always a block diagonal channel (the sub-matrices are $1 \times 1$ matrices), so that the
above statements remain true also for a diagonal complex channel.
It is the goal of the remainder of this chapter to develop transmission schemes for channels of the form (5.41), (5.42) and (5.43). It is important to observe that these channels have some remarkable properties, i.e., the occurring dimensions ($2r$ and $2t$) are even numbers and the channel matrices have the special structure

$$\tilde A = \begin{pmatrix} \Re\{A\} & -\Im\{A\} \\ \Im\{A\} & \Re\{A\} \end{pmatrix}.$$

We want to emphasize that none of the transmission methods we propose in the following relies on the mentioned properties, i.e., they are designed for channels of the form

$$y = Ax + n, \qquad (5.44)$$

where $y \in \mathbb{R}^r$ and $x \in \mathbb{R}^t$ denote the received and transmitted vectors, respectively, $r$ and $t$ being arbitrary natural numbers. $A$ can be any deterministic $r \times t$ real valued matrix, the channel matrix, and $n \in \mathbb{R}^r$ is the zero-mean noise vector with known non-singular covariance matrix $C_n$. The transmitter is constrained in its total power to $S$,

$$E\{x^T x\} \le S, \qquad (5.45)$$

or, equivalently, since $x^T x = \operatorname{tr}(x x^T)$, and expectation and trace commute,

$$\operatorname{tr}\big(E\{x x^T\}\big) \le S. \qquad (5.46)$$


[Fig. 5.2: Joint en- and decoding of the elements of the transmit and receive vector, respectively (a single encoder feeds the channel, a single decoder processes its output).]

Note that if we substitute the variables according to

$$2r \to r, \qquad 2t \to t, \qquad \tilde A \to A, \qquad \tilde x \to x, \qquad \tilde y \to y, \qquad \tilde n \to n,$$

we can apply transmission schemes that are developed for the general vector channel (5.44), (5.45) and (5.46) also to channels of the form (5.41), (5.42) and (5.43). Equations (5.44), (5.45) and (5.46) have the advantage of having a simple notation (no $\tilde x$ and $\tilde A$ are necessary) and of being very general (e.g., odd dimensions are allowed as well). This is the reason why we deal with this channel model in the following.
It is obvious that a rate close to capacity can only be achieved if coding is applied. Furthermore, it is in general not sufficient to encode and decode each element of the transmit and receive vectors $x$ and $y$ separately; in order to achieve (a rate close to) capacity, one has to encode all elements of $x$ jointly and, similarly, decode all elements of $y$ jointly, see Figure 5.2.
On the one hand, there is a large body of work on codes for Single-Input / Single-Output (SISO) channels; on the other hand, research on joint coding, cf. also so-called Space Time Codes [52], has started only recently. Hence, what we propose subsequently is to decompose the joint encoder and the joint decoder into two parts. One performs joint processing, and this will be the part that we are looking at in the following; the other applies non-joint (SISO) en- and decoding algorithms, which can be chosen from a vast pool of research results. This corresponds to a transmission scenario as depicted in Figure 5.3. For related literature we also refer to [17, 39].

[Fig. 5.3: Joint processing of the elements of the transmit and receive vector according to (not necessarily linear) functions $T$ and $R$, cf. (5.47), and non-joint coding: Encoders $1,\ldots,s$ feed the joint transmit processing before the channel $A$; at the receiver, the joint processing is followed by Decoders $1,\ldots,s$. Here $s$ denotes the number of en-/decoders and depends on $T$ and $R$.]

We describe the joint processing by (not necessarily linear) functions, i.e.,

$$T : \mathbb{R}^s \to \mathbb{R}^t, \qquad R : \mathbb{R}^r \to \mathbb{R}^s, \qquad (5.47)$$

the joint processing functions, so that the system including this joint processing can be written as

$$r = R\big(A\,T(t) + n\big). \qquad (5.48)$$

Note that the parameter $s$ depends on the joint processing functions $T$ and $R$ and is included in the model to cope, e.g., with a non-square channel matrix $A$. In this case $s \le \min\{r, t\}$. Furthermore, if there is a zero-sub-channel, cf. Footnote 9 in this chapter, this channel cannot be used for transmission and $s$ has to be decreased.
We will assume that $t = [t(1) \cdots t(s)]^T$ is zero-mean and impose power constraints on its individual elements, i.e.,

$$E\{t(i)\} = 0 \quad\text{and}\quad E\{(t(i))^2\} = 1, \qquad i = 1,\ldots,s, \qquad (5.49)$$

which is only a normalization and not a restriction, since any other mean and power distribution can be incorporated in the transmit function $T$ as well. The reason for doing this is that we want to compare different transmission schemes defined by different joint processing functions $T$ and $R$, and due to our normalization (5.49) fairness is guaranteed. Without this normalization, the joint processing at the transmitter side according to Figure 5.3 would not be well defined, i.e., there would (could) still be some freedom in distributing the power onto the elements of $t$ such that (5.45) / (5.46) is satisfied, and different distributions would (could) yield different performances.
Note that we do not allow the different encoders and decoders, respectively, to cooperate. Depending on the joint processing, this implies that an individual sub-channel $i \in \{1,\ldots,s\}$ (corresponding to the $i$-th element of $t$ and $r$) is not only disturbed by the noise $n$; there will also be impairments due to crosstalk from the other sub-channels $j \neq i$. However, we can compute the mutual information [6] as

$$I\big(r(i);\, t(i)\big) = h\big(r(i)\big) - h\big(r(i)\,\big|\,t(i)\big), \qquad (5.50)$$
which can be maximized in order to calculate the capacity $C(i)$ of the $i$-th sub-channel. For simplicity, we will assume that all involved random variables are Gaussian distributed, especially also the crosstalk contributions, while maximizing (5.50). Finally, the (sum-)capacity (throughput) of the transmission scenario of Figure 5.3 is obtained by summation of the individual capacities,

$$C_{T,R} = \sum_{i=1}^{s} C(i). \qquad (5.51)$$

Note that the joint processing functions T and R determine whether the maximizing input distribution of (5.50) is Gaussian or not, even if all other involved
random variables are Gaussian. Nevertheless, we will assume in the following that
the maximizing input distribution is Gaussian.
It is obvious that the (sum-)capacity of (5.51) is smaller than or equal to the capacity of (5.44) and (5.45) / (5.46) with the transmission scenario of Figure 5.2. Therefore, it will be our goal in the following to design the joint processing functions $T$ and $R$ in such a way that the (sum-)capacity of (5.51) is close (or even equal) to this capacity.
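Under the stated Gaussian assumptions, and for linear joint processing functions, the sum capacity (5.51) can be evaluated directly from the effective matrix $RAT$: sub-channel $i$ sees the useful power $(RAT)_{ii}^2$, crosstalk from the off-diagonal entries of its row, and the noise power $(R C_n R^T)_{ii}$. A minimal sketch follows, with hypothetical (not optimized) $T$ and $R$ invented for illustration, and with the factor $1/2$ of a real valued channel:

```python
import numpy as np

rng = np.random.default_rng(3)
r, t, s, S = 4, 4, 4, 10.0
A = rng.standard_normal((r, t))
C_n = np.eye(r)                           # noise covariance (white here, for simplicity)
T = np.sqrt(S / s) * np.eye(t)            # hypothetical T with tr(T T^T) = S
R = np.eye(r)                             # hypothetical R (no receive processing)

M = R @ A @ T                             # effective sub-channel matrix R A T
C_rn = R @ C_n @ R.T                      # noise covariance after the receive function
C_i = []
for i in range(s):
    sig = M[i, i] ** 2                    # useful signal power (E{t(i)^2} = 1 by (5.49))
    xtalk = np.sum(M[i, :] ** 2) - sig    # crosstalk power from sub-channels j != i
    C_i.append(0.5 * np.log2(1.0 + sig / (xtalk + C_rn[i, i])))
C_TR = sum(C_i)                           # sum capacity (5.51), bits per channel use
print(C_TR)
```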
The simplest way to define the joint processing functions would be to use diagonal matrices $T \in \mathbb{R}^{t \times s}$ and $R \in \mathbb{R}^{s \times r}$ with the side constraint $\operatorname{tr}(TT^T) = S$, i.e.,

$$T : \mathbb{R}^s \to \mathbb{R}^t, \quad t \mapsto x = T(t) = Tt, \qquad T \text{ diagonal}, \quad \operatorname{tr}(TT^T) = S, \qquad (5.52)$$

$$R : \mathbb{R}^r \to \mathbb{R}^s, \quad y \mapsto r = R(y) = Ry, \qquad R \text{ diagonal}.$$

This configuration with $s = r = t = NK$, if all $K$ loops and all $\frac{N}{2} - 1$ complex valued and the two real valued subcarriers (at frequencies $0$ and $\frac{N}{2}$) are used, essentially corresponds to the conventional transmission over cable bundles, where the transmission is not coordinated and FEXT is regarded as noise. Later, in the simulation results, cf. Subsection 5.3.4, we will compare the proposed methods with this conventional approach.

5.3 New Design Methods based on the Singular Value Decomposition

In this section we consider exclusively linear joint processing functions $T$ and $R$, hence joint processing functions that can be described by matrices $T \in \mathbb{R}^{t \times s}$ and $R \in \mathbb{R}^{s \times r}$, i.e.,

$$T : \mathbb{R}^s \to \mathbb{R}^t, \quad t \mapsto x = T(t) = Tt, \qquad (5.53)$$

$$R : \mathbb{R}^r \to \mathbb{R}^s, \quad y \mapsto r = R(y) = Ry.$$

We will define three transmission schemes by means of these matrices, and in all three cases the central part of the calculation rule is based on the Singular Value Decomposition (SVD) [19]. The SVD will be used as a tool that is able to diagonalize an arbitrary (even rectangular) matrix. Note, however, that in the latter two cases we will omit full diagonalization for the benefit of a lower computational complexity.

5.3.1 Full Diagonalization


For the material of this subsection, we also refer to [36, 45, 53, 54]. Let $\mathbf{B}_n \in \mathbb{R}^{r \times r}$ denote a generalized Cholesky factor of the covariance matrix $\mathbf{C}_n \in \mathbb{R}^{r \times r}$ of the real valued noise vector n (we are considering the real channel model (5.44) and (5.45) / (5.46)). Note that a real valued generalized Cholesky factor of a real valued positive definite symmetric matrix can be obtained - as in the complex case - by conventional Cholesky decomposition [19] or by eigenvalue decomposition. According to Definition 3.12, we have

$$\mathbf{C}_n = \mathbf{B}_n \mathbf{B}_n^T.$$

Furthermore, let

$$\mathbf{B}_n^{-1} \mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{V}^T$$

denote the Singular Value Decomposition (SVD) [19] of $\mathbf{B}_n^{-1} \mathbf{A}$. In contrast to the complex case, the SVD yields orthonormal matrices $\mathbf{U} \in \mathbb{R}^{r \times r}$ and $\mathbf{V} \in \mathbb{R}^{t \times t}$, i.e., they satisfy

$$\mathbf{U}^{-1} = \mathbf{U}^T \quad \text{and} \quad \mathbf{V}^{-1} = \mathbf{V}^T.$$

The diagonal matrix

$$\mathbf{D} = \operatorname{diag}_{r \times t}\left\{d_1, \ldots, d_{\min\{r,t\}}\right\} \in \mathbb{R}^{r \times t}$$

has non-negative entries

$$d_1 \geq d_2 \geq \ldots \geq d_q > 0 \quad \text{and} \quad d_{q+1} = d_{q+2} = \ldots = 0 \tag{5.54}$$

on the main diagonal (the singular values). So $\mathbf{D}$ has the same properties as in the complex case. Let^{10}

$$\mathbf{C}_a = \operatorname{diag}_{t \times t}\left\{c_{a\,1}, \ldots, c_{a\,t}\right\} \tag{5.55}$$

^{10} Note that there exists in fact a random vector a, i.e., a = B_a t, cf. (5.57), with this covariance matrix C_a.

denote a diagonal matrix that is obtained from $\mathbf{D}$ - again using the definition $x^+ = \max\{0, x\}$ - via Water Filling, as

$$
c_{a\,i} = \begin{cases} \left(L - \dfrac{1}{d_i^2}\right)^+, & i \leq q \\ 0, & i > q, \end{cases}
$$

where the Water Level $L$ is chosen to satisfy $\operatorname{tr}\left(\mathbf{C}_a\right) = \sum_{i=1}^t c_{a\,i} = S$. Note that

$$c_{a\,1} \geq c_{a\,2} \geq \ldots \geq c_{a\,s} > 0 \quad \text{and} \quad c_{a\,s+1} = c_{a\,s+2} = \ldots = 0 \tag{5.56}$$

with $s \leq q$, so that we can finally write

$$\mathbf{B}_a = \operatorname{diag}_{t \times s}\left\{\sqrt{c_{a\,1}}, \ldots, \sqrt{c_{a\,s}}\right\}.$$

Although we have

$$\mathbf{C}_a = \mathbf{B}_a \mathbf{B}_a^T,$$

$\mathbf{B}_a$ is not a generalized Cholesky factor of $\mathbf{C}_a$ in general, since it can happen that $\mathbf{C}_a$ is a singular and $\mathbf{B}_a$ a rectangular (not square) matrix.
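The water-filling allocation above can be computed directly from the singular values. The following Python sketch is an illustration only (the function and variable names, e.g. `waterfill`, are invented for this example and do not appear in the thesis); it finds the Water Level $L$ by testing how many sub-channels remain active, largest gains first.

```python
import numpy as np

def waterfill(d, S):
    """Water filling over sub-channel gains d (singular values) with total
    power budget S: c_i = (L - 1/d_i^2)^+ such that sum(c) = S."""
    d = np.sort(np.asarray(d, dtype=float))[::-1]   # d_1 >= d_2 >= ...
    inv = 1.0 / d**2                                # inverse channel qualities 1/d_i^2
    # try k active sub-channels, strongest first
    for k in range(len(d), 0, -1):
        L = (S + inv[:k].sum()) / k                 # candidate water level
        if L > inv[k - 1]:                          # all k allocations positive?
            c = np.maximum(L - inv, 0.0)
            return L, c
    return 0.0, np.zeros_like(d)

L, c = waterfill([2.0, 1.0, 0.1], S=1.0)
```

For the toy gains above, the weakest sub-channel lies "above the water" and receives no power, while the power budget is met exactly.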
We define the transmit function and receive function to be the (here linear) functions

$$
\begin{aligned}
T &: \mathbb{R}^s \to \mathbb{R}^t, \quad \mathbf{t} \mapsto \mathbf{x} = T(\mathbf{t}) = \underbrace{\mathbf{V} \mathbf{B}_a}_{\mathbf{T}}\, \mathbf{t}, \\
R &: \mathbb{R}^r \to \mathbb{R}^s, \quad \mathbf{y} \mapsto \mathbf{r} = R(\mathbf{y}) = \underbrace{\left(\mathbf{D} \mathbf{B}_a\right)^+ \mathbf{U}^T \mathbf{B}_n^{-1}}_{\mathbf{R}}\, \mathbf{y},
\end{aligned}
\tag{5.57}
$$

where $\left(\mathbf{D} \mathbf{B}_a\right)^+$ denotes the (Moore-Penrose) pseudo inverse [19] of $\mathbf{D} \mathbf{B}_a$, i.e.,

$$
\left(\mathbf{D} \mathbf{B}_a\right)^+ = \operatorname{diag}_{s \times r}\left\{\frac{1}{d_1 \sqrt{c_{a\,1}}}, \ldots, \frac{1}{d_s \sqrt{c_{a\,s}}}\right\} \in \mathbb{R}^{s \times r},
\tag{5.58}
$$

which is well defined, because $d_i \sqrt{c_{a\,i}} \neq 0$, $i = 1, \ldots, s$, according to (5.54) and (5.56).
Note that this definition is in line with (5.45) / (5.46) since $\mathbf{B}_a \mathbf{t}$ has a covariance matrix $\mathbf{C}_a$ due to (5.49) and the assumption that different encoders do not cooperate, and $\operatorname{tr}\left(\mathbf{V} \mathbf{C}_a \mathbf{V}^T\right) = \operatorname{tr}\left(\mathbf{C}_a\right) = S$. Furthermore, this definition yields

$$
\begin{aligned}
\mathbf{r} = R\left(\mathbf{A}\, T(\mathbf{t}) + \mathbf{n}\right) &= \left(\mathbf{D}\mathbf{B}_a\right)^+ \underbrace{\mathbf{U}^T \mathbf{B}_n^{-1} \mathbf{A} \mathbf{V}}_{\mathbf{D}}\, \mathbf{B}_a \mathbf{t} + \left(\mathbf{D}\mathbf{B}_a\right)^+ \mathbf{U}^T \mathbf{B}_n^{-1} \mathbf{n} \\
&= \mathbf{t} + \mathbf{m},
\end{aligned}
\tag{5.59}
$$

$\mathbf{m}$ having a covariance matrix

$$
\mathbf{C}_m = \left(\mathbf{D}\mathbf{B}_a\right)^+ \mathbf{U}^T \underbrace{\mathbf{B}_n^{-1} \mathbf{C}_n \mathbf{B}_n^{-T}}_{\mathbf{I}_r} \mathbf{U} \left(\left(\mathbf{D}\mathbf{B}_a\right)^+\right)^T = \operatorname{diag}_{s \times s}\left\{\frac{1}{d_1^2\, c_{a\,1}}, \ldots, \frac{1}{d_s^2\, c_{a\,s}}\right\}.
\tag{5.60}
$$

Note that there is no remaining crosstalk and we obtain the mutual information of the $i$-th sub-channel, cf. (5.50),

$$
\begin{aligned}
I\left(r(i); t(i)\right) &= h\left(r(i)\right) - h\left(r(i) \mid t(i)\right) \\
&= h\left(t(i) + m(i)\right) - h\left(m(i)\right) \\
&= \frac{1}{2} \log\left(1 + \frac{1}{d_i^2\, c_{a\,i}}\right) - \frac{1}{2} \log \frac{1}{d_i^2\, c_{a\,i}} \\
&= \frac{1}{2} \log\left(1 + d_i^2\, c_{a\,i}\right) \\
&= \frac{1}{2} \log^+\left(L d_i^2\right),
\end{aligned}
\tag{5.61}
$$

and, furthermore,

$$
C_{T,R} = \sum_{i=1}^s C(i) = \sum_{i=1}^s I\left(r(i); t(i)\right) = \frac{1}{2} \sum_{i=1}^s \log^+\left(L d_i^2\right).
\tag{5.62}
$$

It is a consequence of (5.49) and of the Gaussian assumption that we do not have the
freedom anymore to maximize (5.61) over the input distributions of the individual
sub-channels.
On the other hand, we can compute the capacity of the channel (5.44) and (5.45) / (5.46) with the transmission scenario depicted in Figure 5.2. Applying the Maximum Entropy Theorem for Real Random Vectors (Theorem 3.20) we conclude - following the same line of arguments as in Subsection 3.2.1 - that this capacity is given by

$$
C = \max_{\mathbf{C}_x :\, \operatorname{tr}(\mathbf{C}_x) \leq S} \frac{1}{2} \log \det\left(\mathbf{A} \mathbf{C}_x \mathbf{A}^T + \mathbf{C}_n\right) - \frac{1}{2} \log \det \mathbf{C}_n,
\tag{5.63}
$$

where $\mathbf{C}_x$ is the covariance matrix of $\mathbf{x}$, i.e., the maximization goes over all non-negative definite, symmetric matrices with trace smaller than or equal to $S$. We have

log det ACx AT + Cn = log det ACx AT + Bn BT


n

T
T T
= log det Bn B1
n ACx A Bn + Ir Bn

= log det UDVT Cx VDT UT + Ir +


+ log det Cn

T
T
= log det DV Cx VD + Ir + log det Cn ,
and, with Cx = VCa VT (
a = VT x), the maximization problem (5.63) is equivalent to

1
T
C =
max
log det DCa D + Ir ,
(5.64)
2
C
a :tr(C
a )S
since tr (Cx ) = tr (Ca ). Proceeding as in [58], we find that the matrix Ca in (5.55)
maximizes (5.64). The corresponding maximum mutual information (capacity) is
given by
X 1
+
log Ld2i
C=
,
(5.65)
2
i:di 6=0

which is equal to the capacity CT,R in (5.62).


Hence, we have proven that the joint processing functions of (5.57) are optimal in the sense that they make it possible to achieve capacity using non-cooperating SISO codes. Of course, this is only true in the asymptotic limit.
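To make the construction concrete, the following Python sketch builds $\mathbf{T}$ and $\mathbf{R}$ of (5.57) for a small toy channel and checks that the overall channel $\mathbf{R}\mathbf{A}\mathbf{T}$ is the identity, i.e., crosstalk-free. It is an illustration only, under the assumption that water filling keeps all sub-channels active ($s = t = r$); the toy matrices and all names are invented for this example.

```python
import numpy as np

r = t = 3
S = 3.0                                   # total transmit power budget

A = np.array([[1.0, 0.2, 0.0],            # toy, well-conditioned channel matrix
              [0.1, 0.9, 0.3],
              [0.0, 0.2, 1.1]])
Cn = 0.1 * np.eye(r)                      # toy noise covariance
Bn = np.linalg.cholesky(Cn)               # generalized Cholesky factor, Cn = Bn Bn^T

# SVD of the whitened channel, cf. Bn^{-1} A = U D V^T
U, d, Vt = np.linalg.svd(np.linalg.inv(Bn) @ A)
V = Vt.T

# water filling on the singular values (toy example chosen so all c_i > 0)
inv = 1.0 / d**2
L = (S + inv.sum()) / len(d)
c = L - inv
assert (c > 0).all()                      # all sub-channels active, s = t

Ba = np.diag(np.sqrt(c))                  # Ba = diag{sqrt(c_a_i)}
T = V @ Ba                                # transmit matrix of (5.57)
R = np.linalg.inv(np.diag(d) @ Ba) @ U.T @ np.linalg.inv(Bn)

overall = R @ A @ T                       # should equal the identity matrix
```

Because $\mathbf{U}^T \mathbf{B}_n^{-1} \mathbf{A} \mathbf{V} = \mathbf{D}$, the receive matrix exactly inverts the diagonalized channel, which the final check confirms numerically.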
If we want to apply this transmission scheme to our MIMO DMT channel, we immediately recognize that we have to expect a high computational Online Complexity, i.e., the complexity of the operations performed during data transmission. For the complexity that is required only at the beginning of transmission (and each time the channel changes) we speak of Startup Complexity. It is obvious that Online Complexity is more critical, especially at the high data rates (clock rates) at which current systems usually run.
An inspection of (2.25) shows that the dimensions of the matrices $\mathbf{T}$ and $\mathbf{R}$ in (5.57) are given by $s = r = t = NK$ in general (if all $K$ loops and all $N/2 - 1$ complex valued and the two real valued subcarriers at frequencies $0$ and $N/2$ are used) and that these matrices can possess a block diagonal structure if the special ordering assumption of the singular values of the SVD [19] is dropped, cf. also Section 5.2. In this case, both matrices consist of $N/2 - 1$ $(2K \times 2K)$ and two $(K \times K)$ blocks. Therefore, the numbers of required operations for evaluating the joint processing functions (5.57) in the transmitter and receiver^{11} are

$$
\begin{aligned}
O_{\text{mult}}^T = O_{\text{mult}}^R &= (N-1)\, 2K^2, \\
O_{\text{add}}^T = O_{\text{add}}^R &= NK(2K-1) - 2K^2,
\end{aligned}
\tag{5.66}
$$

^{11} Another measure would be to count the numbers of performed operations in the transmitter and receiver per transmitted bit of information. For the reason of simplicity we stick to our complexity definition.

(real) multiplications and additions, respectively, which are quite large for typical ADSL or VDSL [9-12, 25-30] parameters. Note that these operations have to be performed for each transmitted and received DMT symbol vector.

5.3.2 Approximate Diagonalization


In this subsection, we will study - based on a very basic approach - what happens
if we try to reduce the Online Complexity. The underlying idea is to use the same
matrices T and R as in (5.57) and to simply set half of the elements, i.e., the
elements with the smallest absolute values, equal to zero. At the transmitter side,
the resulting matrix has to be scaled in order to meet the power constraint (5.45)
/ (5.46). Hence, the joint processing functions together with the newly defined
matrices T and R read
$$
\begin{aligned}
T &: \mathbb{R}^s \to \mathbb{R}^t, \quad \mathbf{t} \mapsto \mathbf{x} = T(\mathbf{t}) = \underbrace{\sqrt{\frac{S}{\operatorname{tr}\left(\overline{\mathbf{V}\mathbf{B}_a}\; \overline{\mathbf{V}\mathbf{B}_a}^{\,T}\right)}}\; \overline{\mathbf{V}\mathbf{B}_a}}_{\mathbf{T}}\; \mathbf{t}, \\
R &: \mathbb{R}^r \to \mathbb{R}^s, \quad \mathbf{y} \mapsto \mathbf{r} = R(\mathbf{y}) = \underbrace{\mathbf{D}_R\; \overline{\left(\mathbf{D}\mathbf{B}_a\right)^+ \mathbf{U}^T \mathbf{B}_n^{-1}}}_{\mathbf{R}}\; \mathbf{y},
\end{aligned}
\tag{5.67}
$$

where $\overline{\mathbf{X}}$ denotes the matrix in which half of the elements of $\mathbf{X}$ - the elements with the smallest absolute values - are set to zero. The diagonal matrix $\mathbf{D}_R$ is chosen such that the overall channel matrix $\mathbf{R}\mathbf{A}\mathbf{T}$ has diagonal elements equal to 1. If this is not possible, we have one (or more) zero-sub-channel(s), cf. Footnote 9 in this chapter, which can be neglected by decreasing $s$, the number of en- / decoders.
Therefore, we maintain this assumption without loss of generality. Note that this scheme has an Online Complexity of

$$
\begin{aligned}
O_{\text{mult}}^T = O_{\text{mult}}^R &= (N-1)\, K^2, \\
O_{\text{add}}^T = O_{\text{add}}^R &\approx \frac{N}{2} K(2K-1) - K^2,
\end{aligned}
\tag{5.68}
$$

that is, half of the complexity of the scheme of Subsection 5.3.1. In Subsection 5.3.4, we will calculate (by means of simulations) the corresponding (sum-) capacity $C_{T,R}$.
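The zeroing-and-rescaling step of (5.67) can be sketched in a few lines of Python. This is an illustration only; the helper name `truncate_half` and the toy matrix are invented for this example.

```python
import numpy as np

def truncate_half(X):
    """Return a copy of X in which the half of the entries with the smallest
    absolute values is set to zero (the bar operation in (5.67))."""
    Xbar = X.copy()
    flat = np.abs(Xbar).ravel()
    k = flat.size // 2                       # number of entries to zero
    idx = np.argsort(flat)[:k]               # indices of the k smallest |entries|
    Xbar.ravel()[idx] = 0.0
    return Xbar

# toy transmit matrix V Ba, rescaled to meet the power constraint tr(T T^T) = S
S = 2.0
VBa = np.array([[0.9, 0.1],
                [0.2, 1.1]])
Tbar = truncate_half(VBa)
T = np.sqrt(S / np.trace(Tbar @ Tbar.T)) * Tbar
```

After truncation, the scaling factor restores the transmit power budget exactly, as required by (5.45) / (5.46).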


5.3.3 Diagonalization of Subsets


Another approach to reduce the Online Complexity is based on the following partitionings of the channel matrix

$$
\mathbf{A} = \begin{pmatrix} \mathbf{A}^{\langle 11 \rangle} & \cdots & \mathbf{A}^{\langle 1u \rangle} \\ \vdots & \ddots & \vdots \\ \mathbf{A}^{\langle u1 \rangle} & \cdots & \mathbf{A}^{\langle uu \rangle} \end{pmatrix}, \quad \mathbf{A}^{\langle ij \rangle} \in \mathbb{R}^{r_p \times t_p},
\tag{5.69}
$$

and the covariance matrix of the noise

$$
\mathbf{C}_n = \begin{pmatrix} \mathbf{C}_n^{\langle 11 \rangle} & \cdots & \mathbf{C}_n^{\langle 1u \rangle} \\ \vdots & \ddots & \vdots \\ \mathbf{C}_n^{\langle u1 \rangle} & \cdots & \mathbf{C}_n^{\langle uu \rangle} \end{pmatrix}, \quad \mathbf{C}_n^{\langle ij \rangle} \in \mathbb{R}^{r_p \times r_p},
\tag{5.70}
$$

with

$$
r_p = \frac{r}{u} \quad \text{and} \quad t_p = \frac{t}{u}.
\tag{5.71}
$$

Note that we partition the channel matrix and the covariance matrix of the noise into $u^2$ sub-matrices with dimensions $(r_p \times t_p)$ and $(r_p \times r_p)$. It is obvious that $u$, $r_p$ and $t_p$ have to be integer numbers. Hence, our freedom to select $u$ is limited by the condition that $r_p$ and $t_p$ - according to (5.71) - are integer numbers.
If we neglect all off-diagonal sub-matrices and apply the results of^{12} Subsection 5.3.1 to the block diagonal matrices

$$
\begin{pmatrix} \mathbf{A}^{\langle 11 \rangle} & & 0 \\ & \ddots & \\ 0 & & \mathbf{A}^{\langle uu \rangle} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \mathbf{C}_n^{\langle 11 \rangle} & & 0 \\ & \ddots & \\ 0 & & \mathbf{C}_n^{\langle uu \rangle} \end{pmatrix},
\tag{5.72}
$$

we obtain joint processing functions together with transmit and receive matrices $\mathbf{T}$ and $\mathbf{R}$, respectively, as

$$
\begin{aligned}
T &: \mathbb{R}^s \to \mathbb{R}^t, \quad \mathbf{t} \mapsto \mathbf{x} = T(\mathbf{t}) = \underbrace{\begin{pmatrix} \mathbf{T}^{\langle 11 \rangle} & & 0 \\ & \ddots & \\ 0 & & \mathbf{T}^{\langle uu \rangle} \end{pmatrix}}_{\mathbf{T}} \mathbf{t}, \\
R &: \mathbb{R}^r \to \mathbb{R}^s, \quad \mathbf{y} \mapsto \mathbf{r} = R(\mathbf{y}) = \underbrace{\begin{pmatrix} \mathbf{R}^{\langle 11 \rangle} & & 0 \\ & \ddots & \\ 0 & & \mathbf{R}^{\langle uu \rangle} \end{pmatrix}}_{\mathbf{R}} \mathbf{y}
\end{aligned}
\tag{5.73}
$$

with

$$
\begin{aligned}
\mathbf{T}^{\langle ii \rangle} &= \mathbf{V}^{\langle ii \rangle} \mathbf{B}_a^{\langle ii \rangle}, \\
\mathbf{R}^{\langle ii \rangle} &= \left(\mathbf{D}^{\langle ii \rangle} \mathbf{B}_a^{\langle ii \rangle}\right)^+ \left(\mathbf{U}^{\langle ii \rangle}\right)^T \left(\mathbf{B}_n^{\langle ii \rangle}\right)^{-1}, \qquad i = 1, \ldots, u,
\end{aligned}
\tag{5.74}
$$

defined as in Subsection 5.3.1. For our MIMO DMT channel, this method yields an Online Complexity of

$$
\begin{aligned}
O_{\text{mult}}^T = O_{\text{mult}}^R &= (N-1)\, \frac{2K^2}{u}, \\
O_{\text{add}}^T = O_{\text{add}}^R &= NK\left(\frac{2K}{u} - 1\right) - \frac{2K^2}{u},
\end{aligned}
\tag{5.75}
$$

^{12} Again, we will drop the special ordering assumption of the singular values of the SVD [19], so that the block diagonal structure is maintained.
which is about $u$ times less than the complexity of the scheme of Subsection 5.3.1 that also makes use of the off-diagonal sub-matrices. We want to emphasize that different orderings of the elements of the input and output vectors of the channel correspond to permutations of the columns and rows of the channel matrix $\mathbf{A}$ and of the covariance matrix $\mathbf{C}_n$ and yield different partitionings. Hence, it is possible to improve the performance of this scheme while maintaining its Online Complexity by optimizing with respect to the ordering of the elements of $\mathbf{x}$, $\mathbf{y}$ (and $\mathbf{n}$). Note also that the physical transmitters / receivers corresponding to different partitions (subsets) of the partitioning need not be co-located. Therefore, this method can be applied to distributed physical transmitter / receiver topologies as well. In the next subsection, we will calculate (by means of simulations) the (sum-) capacity $C_{T,R}$ for $u = 2$, so that we have the same Online Complexity as for the approach of Subsection 5.3.2 and obtain comparable results.
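Extracting the diagonal blocks of the partitionings (5.69) / (5.70) is a simple slicing operation, sketched below for illustration (the function name `diagonal_blocks` and the toy matrix are invented for this example).

```python
import numpy as np

def diagonal_blocks(M, u):
    """Split M into a u x u grid of sub-matrices, cf. (5.69)-(5.71), and
    return only the u diagonal blocks M<11>, ..., M<uu>, cf. (5.72)."""
    rows, cols = M.shape
    assert rows % u == 0 and cols % u == 0   # r_p = r/u, t_p = t/u must be integers
    rp, tp = rows // u, cols // u
    return [M[i*rp:(i+1)*rp, i*tp:(i+1)*tp] for i in range(u)]

A = np.arange(16.0).reshape(4, 4)            # toy 4x4 channel matrix
blocks = diagonal_blocks(A, u=2)             # A<11> and A<22>
```

Each returned block would then be fed separately into the full-diagonalization procedure of Subsection 5.3.1.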

5.3.4 Simulation Results


In this subsection, we would like to demonstrate the performance of the previously defined schemes. As the relevant performance measure, we use the (sum-) capacity $C_{T,R}$, defined in (5.51), which we calculate by means of simulation. We use the parameters of a real ADSL scenario, i.e., a DFT length of $N = 512$, a subcarrier spacing of 4312.5 Hz, and a transmit power of 100 mW per twisted pair. We are
considering a cable bundle consisting of K = 20 loops, whose transfer functions
were obtained by measurements of Austrian cables. Of course, this applies also to
the cross transfer functions modeling FEXT. The simulation results are presented
for loop lengths from 1 km to 7 km. We are comparing two different noise models:
1. Only background noise, with a constant (one-sided) power spectral density (PSD) [34] of −140 dBm/Hz. For a full and detailed description of this
simulation scenario we refer to Section B.2 in the Appendix. Note that this
scenario models the situation, where one operator is owner of the whole

134

5. Multiple-Input / Multiple-Output Discrete Multitone

cable bundle and is therefore able to perform joint processing over all loops
in this cable bundle. The remaining noise is only background noise. The
obtained results are depicted in Figure 5.4 (channel symbol rate $1/T$ = 2.208 mega-symbols / second).
2. Typical noise environment in a cable bundle including crosstalk and background noise. For a full and detailed description of this simulation scenario
we refer to Section B.3 in the Appendix. Note that this scenario models the
situation, where an operator does not own the whole cable bundle alone, so
that he is not able to perform joint processing over all loops in this cable
bundle. He can merely perform joint processing over a subset (here again
K = 20 loops, to obtain comparable results) of the loops in the cable bundle
and has to accept crosstalk from the neighboring loops that do not belong to
his subset. Hence, this crosstalk has to be regarded as noise and is therefore
included in the power spectral density (PSD) of the (stationary) noise process. The obtained results are depicted in Figure 5.5 (channel symbol rate $1/T$ = 2.208 mega-symbols / second).
We start with the first case (only background noise), cf. Figure 5.4. First of
all, we observe that for loop lengths up to 3 km, the SVD scheme of Subsection
5.3.1 (full diagonalization) has a capacity that is (almost) twice the capacity of the
scheme that has no joint processing functions (so there is no cooperation between
the different transmitters / receivers and they encounter full FEXT). The second
observation is that the scheme of Subsection 5.3.2 (approximate diagonalization)
has a capacity that is within the range of the no joint processing function (NJPF)
scheme. Therefore, the idea to reduce complexity by simply setting small (half of
the) values equal to zero has a very negative impact on capacity. The other approach with the same reduced complexity, cf. Subsection 5.3.3 with u = 2, where
we looked at an optimized partitioning, has superior performance and approaches
the curve of the optimum SVD scheme with full diagonalization for loop lengths
above 4 km. Note also that all curves have the same asymptotic limit, because for
long loop lengths, the SNR is very low, so that the noise is extremely dominant and
any algorithm to reduce FEXT has very little impact on performance.
The simulation results for the second noise model are depicted in Figure 5.5.
Again, we conclude that the scheme of Subsection 5.3.2 (approximate diagonalization) yields no capacity gain and is - for sure - not worth applying. However,
the gain of the SVD scheme of Subsection 5.3.1 (full diagonalization) is highly
reduced as well and one must weigh whether this gain pays for the additional
computational effort. On the other hand, this computational effort can be reduced
because the subsets scheme of Subsection 5.3.3 has the optimum capacity curve
as well (for u = 2). The reason that the gain is smaller for all schemes compared
to the first noise scenario is due to the higher noise power. Again, the noise is
extremely dominant and any algorithm to reduce FEXT has very little impact on
performance.


Fig. 5.4: SVD based joint processing functions and corresponding (sum-) capacities $C_{T,R}$ (−140 dBm/Hz background noise, cf. also Section B.2 in the Appendix). [Figure: "SVD Based Schemes"; curves for Full Diagonalization, Diagonalization of Subsets, Approximate Diagonalization, No Joint Processing; x-axis: loop length in km; y-axis: (sum-) capacity in bits/s.]

Fig. 5.5: SVD based joint processing functions and corresponding (sum-) capacities $C_{T,R}$ (typical noise environment in a cable bundle including crosstalk and background noise, cf. also Section B.3 in the Appendix). [Figure: "SVD Based Schemes"; curves for Full Diagonalization, Diagonalization of Subsets, Approximate Diagonalization, No Joint Processing; x-axis: loop length in km; y-axis: (sum-) capacity in bits/s.]


Summing up, we come to the conclusion: as long as there is no substantial crosstalk outside the scheme that has to be regarded as noise (and cannot be compensated by the joint processing functions), it makes sense to use joint processing functions to improve performance. Furthermore, observe the trade-off between computational complexity and (sum-) capacity.

5.4 The UP MIMO Scheme


In this section, we consider the so-called UP MIMO^{13} scheme. This scheme was originally [2, 37] designed by the author for wireless MIMO transmission, since it allows the use of different amounts of channel state information (CSI) at the transmitter, i.e., it can actually adapt to the current situation. Consider, e.g., the situation where transmission takes place in a static environment for a certain time period (channel state information at the receiver can be justified) but suddenly transmitter and / or receiver start moving. Then - of course - channel knowledge at the transmitter side is lost and we cannot make use of it anymore. For related literature we refer to [5, 35, 38, 46, 47].
In wireline transmission, it can be assumed that full channel state information is present also at the transmitter side. This is due to the fact that there are no channel variations for long time periods, and all CSI parameters can be re-transmitted from the receiver to the transmitter and remain valid until these rare channel changes take place. Hence, the capability of the UP MIMO scheme to adapt to different amounts of CSI that can be made available at the transmitter seems to be no argument for the application of the UP MIMO scheme in the wireline domain.
However, there is another feature of the UP MIMO scheme that makes it interesting for wireline MIMO transmission. The UP MIMO scheme is essentially a weighted mixture of the V-BLAST scheme [15, 16, 18] and the optimum (full diagonalization) SVD scheme, cf. Subsection 5.3.1, and there is full freedom^{14} in choosing the weighting parameter, see Subsection 5.4.3. It is important to note, cf. Subsection 5.4.3, that the computational complexity of the V-BLAST scheme at the transmitter side is negligible, whereas the (full diagonalization) SVD scheme has a high computational complexity at the transmitter side, and that the weighting parameter is able to weight the computational complexity at the transmitter side as well. Hence, we can apply this scheme in order to reduce complexity at the transmitter side. The complexity at the receiver side is not reduced. We can think of applications of this scheme in situations where we have a lot of computing power on one side of the transmission and only little computing power on the other side.
We start the presentation of the UP MIMO scheme by defining the joint processing function at the receiver side.

^{13} The name comes from the terminology unitary parametrization.
^{14} In fact, this is the parameter that distinguishes between different amounts of CSI available at the transmitter, but in wireline transmission we need not adapt to, e.g., the capacity of a low-rate backward channel.

5.4.1 The Receiver Side


In order to specify the joint processing function at the receiver side, we have to assume that the joint processing function at the transmitter side is a linear function and can be described by a matrix^{15}, i.e.,

$$
T : \mathbb{R}^s \to \mathbb{R}^t, \quad \mathbf{t} \mapsto \mathbf{x} = T(\mathbf{t}) = \mathbf{T}\mathbf{t}.
\tag{5.76}
$$

Applying the QR-decomposition [19], we decompose (for simplicity we will assume $r \geq s$) the matrix $\mathbf{B}_n^{-1}\mathbf{A}\mathbf{T}$, $\mathbf{B}_n$ being a real valued generalized Cholesky factor of $\mathbf{C}_n$, into a product of an orthonormal matrix $\mathbf{Q} \in \mathbb{R}^{r \times r}$ with an upper triangular matrix $\mathbf{J} \in \mathbb{R}^{r \times s}$, i.e.,

$$
\mathbf{Q}\mathbf{J} = \mathbf{B}_n^{-1}\mathbf{A}\mathbf{T}.
\tag{5.77}
$$

The joint processing function at the receiver side is divided into two parts, i.e.,

$$
R : \mathbb{R}^r \to \mathbb{R}^s, \quad \mathbf{y} \mapsto \mathbf{r} = R(\mathbf{y}) = R_2\left(R_1(\mathbf{y})\right),
\tag{5.78}
$$

where the first function

$$
R_1 : \mathbb{R}^r \to \mathbb{R}^r, \quad \mathbf{y} \mapsto \mathbf{q} = R_1(\mathbf{y}) = \mathbf{Q}^T \mathbf{B}_n^{-1} \mathbf{y},
\tag{5.79}
$$

performs multiplications by $\mathbf{Q}^T \mathbf{B}_n^{-1}$, so that we obtain an input-output relationship between the transmit vector $\mathbf{t}$ and the vector $\mathbf{q}$ at the output of the first part of the joint processing function $R_1(\cdot)$ as

$$
\begin{aligned}
\mathbf{q} = \mathbf{Q}^T \mathbf{B}_n^{-1} \mathbf{y} &= \mathbf{Q}^T \mathbf{B}_n^{-1} \left(\mathbf{A}\mathbf{x} + \mathbf{n}\right) \\
&= \mathbf{Q}^T \mathbf{B}_n^{-1} \left(\mathbf{A}\mathbf{T}\mathbf{t} + \mathbf{n}\right) \\
&= \mathbf{Q}^T \mathbf{Q}\mathbf{J}\mathbf{t} + \mathbf{Q}^T \mathbf{B}_n^{-1} \mathbf{n} \\
&= \mathbf{J}\mathbf{t} + \mathbf{Q}^T \mathbf{B}_n^{-1} \mathbf{n} \\
&= \begin{pmatrix} J(1,1) & \cdots & J(1,s) \\ & \ddots & \vdots \\ 0 & & J(s,s) \\ & \mathbf{0} & \end{pmatrix} \mathbf{t} + \mathbf{m},
\end{aligned}
\tag{5.80}
$$

the transformed noise $\mathbf{m} = \mathbf{Q}^T \mathbf{B}_n^{-1} \mathbf{n}$ having a covariance matrix

$$
\mathbf{C}_m = \mathbf{I}_r.
\tag{5.81}
$$

^{15} At this point, this matrix is not specified yet and is allowed to be an arbitrary matrix.

The second part $R_2(\cdot)$ of the receiver joint processing function is then defined as follows,

$$
R_2 : \mathbb{R}^r \to \mathbb{R}^s, \quad \mathbf{q} = \begin{pmatrix} q(1) \\ \vdots \\ q(r) \end{pmatrix} \mapsto \mathbf{r} = \begin{pmatrix} r(1) \\ \vdots \\ r(s) \end{pmatrix} = R_2(\mathbf{q}),
\tag{5.82}
$$

via the recursion (Nulling and Cancelling)

$$
\begin{aligned}
r(s) &= \operatorname{dec}\left\{\frac{q(s)}{J(s,s)}\right\}, \\
r(k) &= \operatorname{dec}\left\{\frac{q(k) - \sum_{i=k+1}^{s} J(k,i)\, r(i)}{J(k,k)}\right\}, \qquad k = (s-1), \ldots, 1,
\end{aligned}
\tag{5.83}
$$

where $J(k,k) \neq 0$ for $k = 1, \ldots, s$ is assumed and $\operatorname{dec}\{\cdot\}$ denotes the mapping back onto the symbol constellations (Decision Device). Note that $R_2$ is a nonlinear function.
With these definitions, we can calculate (assuming error free previous decisions) the (sum-) capacity as

$$
C_{T,R} = \frac{1}{2} \sum_{i=1}^{s} \log\left(1 + \left(J(i,i)\right)^2\right).
\tag{5.84}
$$
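The Nulling and Cancelling recursion can be sketched in Python as follows. This is an illustration only, run noiselessly on a toy channel with integer symbols, where the decision device $\operatorname{dec}\{\cdot\}$ is simply rounding to the nearest integer; all names are invented for this example.

```python
import numpy as np

def nulling_cancelling(y, A, T, Bn, dec=np.round):
    """Successive decisions via QR, cf. (5.79)-(5.83): whiten and rotate the
    received vector by Q^T, then back-substitute through the upper triangular
    J, applying the decision device dec at every step."""
    Bn_inv = np.linalg.inv(Bn)
    Q, J = np.linalg.qr(Bn_inv @ A @ T)      # QJ = Bn^{-1} A T, cf. (5.77)
    q = Q.T @ Bn_inv @ y                     # first stage R1, cf. (5.79)
    s = J.shape[1]
    r = np.zeros(s)
    for k in range(s - 1, -1, -1):           # recursion (5.83)
        r[k] = dec((q[k] - J[k, k+1:] @ r[k+1:]) / J[k, k])
    return r

A = np.array([[1.0, 0.4], [0.3, 1.2]])       # toy channel matrix
T = np.eye(2)                                # trivial transmit matrix
Bn = np.eye(2)                               # white unit-variance noise
t = np.array([2.0, -1.0])                    # integer transmit symbols
r = nulling_cancelling(A @ T @ t, A, T, Bn)  # noiseless reception
```

Without noise, each back-substitution step removes the already decided interference exactly, so the transmit symbols are recovered perfectly.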

We want to emphasize that this receive joint processing function is very general in the sense that if we use appropriate transmit matrices $\mathbf{T}$, we obtain well known MIMO schemes as special cases.
Consider, e.g., the case when we use a permutation matrix as transmit matrix, i.e., a matrix which contains exactly one 1 per row and column, while all other entries are zero. It can be shown that if we use a certain (optimized) permutation matrix, the scheme is equivalent to the V-BLAST scheme [15, 16, 18]. Note that V-BLAST performs an efficient algorithm in order to find the optimum permutation matrix. It is obvious that a multiplication by a permutation matrix requires no additional complexity, i.e., we have $O_{\text{mult}}^T = O_{\text{add}}^T = 0$ in this case.
Let us also discuss the case when we use the same transmit matrix as it is used in the full diagonalization SVD scheme (Subsection 5.3.1). We have

$$
\begin{aligned}
\mathbf{Q}\mathbf{J} &= \mathbf{B}_n^{-1} \mathbf{A} \mathbf{V} \mathbf{B}_a \\
&= \mathbf{U} \mathbf{D} \mathbf{V}^T \mathbf{V} \mathbf{B}_a \\
&= \mathbf{U} \mathbf{D} \mathbf{B}_a,
\end{aligned}
\tag{5.85}
$$

and, furthermore,

$$
\mathbf{Q} = \mathbf{U} \quad \text{and} \quad \mathbf{J} = \mathbf{D}\mathbf{B}_a,
$$

since the columns of $\mathbf{U}$ are pairwise orthogonal and $\mathbf{D}\mathbf{B}_a$ is a diagonal matrix. Hence, the whole recursion (5.83) collapses into independent decisions, and we have shown that the scheme is equivalent to the SVD scheme (full diagonalization) with this choice of transmit matrix.
Finally, for our MIMO DMT scheme - utilizing the block diagonal structure of (2.25) - we can compute the (compared with the full diagonalization SVD scheme increased) Online Complexity at the receiver side as

$$
\begin{aligned}
O_{\text{mult}}^R &= \frac{N}{2} K(6K-1) - 3K^2, \\
O_{\text{add}}^R &= \frac{N}{2}\, 3K(2K-1) - 3K^2, \\
O_{\text{div}}^R &= NK.
\end{aligned}
\tag{5.86}
$$
The joint processing function we will choose at the transmitter side is based on
a special algorithm [40] that decomposes any unitary (orthonormal) matrix into a
product of basic rotation matrices. We will look at this algorithm in the following
subsection.

5.4.2 Parametrization of Unitary (Orthonormal) Matrices


First of all, observe that any real valued orthonormal matrix is unitary if it is considered as a complex matrix. Hence, we will present the algorithm for the more general case of a unitary matrix. We start by introducing a family of $n$-dimensional unitary matrices

$$\left[\mathbf{U}^{p\,q}\right]_{1 \leq p < q \leq n}$$

consisting of the unitary matrices

$$\mathbf{U}^{1\,2}, \mathbf{U}^{1\,3}, \ldots, \mathbf{U}^{1\,n}, \mathbf{U}^{2\,3}, \ldots, \mathbf{U}^{2\,n}, \ldots, \mathbf{U}^{(n-1)\,n}.$$

Each of these unitary matrices depends on two parameters $\theta_{p\,q}, \phi_{p\,q} \in [-\pi, \pi[$, which are indexed by $p$ and $q$, because they can be different for different matrices of this family, and is defined as

$$
\mathbf{U}^{p\,q}\left(\theta_{p\,q}, \phi_{p\,q}\right)(i,k) = \begin{cases}
1, & i = k \text{ and } i \neq p, q \\
\cos\left(\theta_{p\,q}\right), & i = k \text{ and } i \in \{p, q\} \\
e^{j\phi_{p\,q}} \sin\left(\theta_{p\,q}\right), & i = p \text{ and } k = q \\
-e^{-j\phi_{p\,q}} \sin\left(\theta_{p\,q}\right), & i = q \text{ and } k = p \\
0, & \text{otherwise},
\end{cases}
\tag{5.87}
$$

i.e., it is a rotation matrix. To illustrate such a $\mathbf{U}^{p\,q}$, we have

$$
\begin{pmatrix}
\cos\theta_{1\,3} & 0 & e^{j\phi_{1\,3}}\sin\theta_{1\,3} & 0 \\
0 & 1 & 0 & 0 \\
-e^{-j\phi_{1\,3}}\sin\theta_{1\,3} & 0 & \cos\theta_{1\,3} & 0 \\
0 & 0 & 0 & 1
\end{pmatrix},
\tag{5.88}
$$
5. Multiple-Input / Multiple-Output Discrete Multitone

140

which is the 4-dimensional basic unitary matrix $\mathbf{U}^{1\,3}\left(\theta_{1\,3}, \phi_{1\,3}\right)$. Note that

$$
\left(\mathbf{U}^{p\,q}\right)^H(\theta, \phi) = \mathbf{U}^{p\,q}(-\theta, \phi).
\tag{5.89}
$$
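The basic rotation (5.87) and the property (5.89) can be checked numerically. The following Python sketch is for illustration only (0-based indices instead of the 1-based indices of the text; the function name is invented for this example).

```python
import numpy as np

def basic_unitary(n, p, q, theta, phi):
    """n-dimensional basic rotation U^{pq}(theta, phi) of (5.87);
    p and q are 0-based here, whereas the text uses 1-based indices."""
    U = np.eye(n, dtype=complex)
    U[p, p] = U[q, q] = np.cos(theta)
    U[p, q] = np.exp(1j * phi) * np.sin(theta)
    U[q, p] = -np.exp(-1j * phi) * np.sin(theta)
    return U

U13 = basic_unitary(4, 0, 2, theta=0.7, phi=0.3)   # the example (5.88)
```

Unitarity and the Hermitian transpose relation (5.89) follow directly from the structure of the matrix.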

Theorem 5.1 Let $\mathbf{U} \in \mathbb{C}^{n \times n}$ denote a unitary matrix. Then there exist parameters $\theta_{p\,q}, \phi_{p\,q}$, for $p < q$, and $\alpha_1, \ldots, \alpha_n$, which all belong to the interval $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right[$, except for $\phi_{1\,n}, \phi_{2\,n}, \ldots, \phi_{(n-1)\,n}$ and $\alpha_n$, which can take values in the interval $[-\pi, \pi[$, so that^{16}

$$
\mathbf{U} = \begin{pmatrix} e^{j\alpha_1} & & 0 \\ & \ddots & \\ 0 & & e^{j\alpha_n} \end{pmatrix} \prod_{p=n-1}^{1} \prod_{q=p+1}^{n} \mathbf{U}^{p\,q}\left(\theta_{p\,q}, \phi_{p\,q}\right).
$$
Proof. Can be found in [1, 40]. For completeness, we will present the proof in the following. Given an $n$-dimensional unitary matrix $\mathbf{U}$, consider the unitary matrix $\mathbf{U}^{\langle 1 \rangle} = \mathbf{U}\, \mathbf{U}^{1\,n}\left(\theta_{1\,n}, \phi_{1\,n}\right)^H$ obtained by post-multiplying $\mathbf{U}$ with the Hermitian transpose of $\mathbf{U}^{1\,n}\left(\theta_{1\,n}, \phi_{1\,n}\right)$. We choose the parameters $\theta_{1\,n}$ and $\phi_{1\,n}$ in such a way that $\mathbf{U}^{\langle 1 \rangle}$ satisfies the following two constraints:

1. $U^{\langle 1 \rangle}(1, n) = 0$, i.e., the last entry of the first row of $\mathbf{U}^{\langle 1 \rangle}$ vanishes,

2. either $U^{\langle 1 \rangle}(1,1) = 0$ or $-\frac{\pi}{2} \leq \arg\left\{U^{\langle 1 \rangle}(1,1)\right\} < \frac{\pi}{2}$.

It can be verified that these conditions uniquely specify $\phi_{1\,n} \in [-\pi, \pi[$ and $\theta_{1\,n} \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right[$ (with the convention that we set $\phi_{1\,n}$ to zero in case it is indeterminate). Essentially, these constraints imply that we choose the parameters according to

$$
\tan\theta_{1\,n} = \frac{U(1,n)}{U(1,1)}\, e^{-j\phi_{1\,n}} \in \mathbb{R}.
\tag{5.90}
$$

Similarly, we can post-multiply $\mathbf{U}^{\langle 1 \rangle}$ by $\mathbf{U}^{1\,(n-1)}\left(\theta_{1\,(n-1)}, \phi_{1\,(n-1)}\right)^H$ to obtain $\mathbf{U}^{\langle 2 \rangle}$, such that

1. $U^{\langle 2 \rangle}(1, n-1) = 0$,

2. either $U^{\langle 2 \rangle}(1,1) = 0$ or $-\frac{\pi}{2} \leq \arg\left\{U^{\langle 2 \rangle}(1,1)\right\} < \frac{\pi}{2}$.

Again, using the same convention, these conditions uniquely specify $\theta_{1\,(n-1)} \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right[$ and $\phi_{1\,(n-1)} \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right[$. Note that the range of $\phi_{1\,(n-1)}$ is different from the range of $\phi_{1\,n}$ as a result of the condition $-\frac{\pi}{2} \leq \arg\left\{U^{\langle 1 \rangle}(1,1)\right\} < \frac{\pi}{2}$.

^{16} If $n_1 > n_2$, $\prod_{n=n_1}^{n_2} a_n = a_{n_1}\, a_{n_1 - 1} \cdots a_{n_2}$.


We can continue this process until all elements of the first row are zero with the exception of the first element. Hence, we obtain

$$
\begin{aligned}
\mathbf{U}^{\langle n-1 \rangle} &= \mathbf{U}\, \mathbf{U}^{1\,n}\left(\theta_{1\,n}, \phi_{1\,n}\right)^H \cdots \mathbf{U}^{1\,2}\left(\theta_{1\,2}, \phi_{1\,2}\right)^H, \\
\mathbf{U} &= \mathbf{U}^{\langle n-1 \rangle}\, \mathbf{U}^{1\,2}\left(\theta_{1\,2}, \phi_{1\,2}\right) \cdots \mathbf{U}^{1\,n}\left(\theta_{1\,n}, \phi_{1\,n}\right),
\end{aligned}
\tag{5.91}
$$

where $\mathbf{U}^{\langle n-1 \rangle}$ is a unitary matrix with only one nonzero entry in its first row at position $(1,1)$, i.e., its first row equals $\left(U^{\langle n-1 \rangle}(1,1), 0, \ldots, 0\right)$ (5.92). As the Euclidean norm of a row of a unitary matrix is 1, cf. Subsection 3.1.3, $U^{\langle n-1 \rangle}(1,1) = e^{j\alpha_1}$ for some $\alpha_1 \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right[$. From the orthogonality of the rows of a unitary matrix, it follows that $U^{\langle n-1 \rangle}(k,1) = 0$ for $k \neq 1$. Therefore, $\mathbf{U}^{\langle n-1 \rangle}$ can be represented as follows,

$$
\mathbf{U}^{\langle n-1 \rangle} = \begin{pmatrix} e^{j\alpha_1} & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & \mathbf{V} & \\ 0 & & & \end{pmatrix},
\tag{5.93}
$$

where $\mathbf{V}$ is an $(n-1)$-dimensional unitary matrix. We can operate on $\mathbf{V}$ in a similar manner by using matrices $\mathbf{U}^{2\,k}\left(\theta_{2\,k}, \phi_{2\,k}\right)$ for $k = n, \ldots, 3$ and pass to an $(n-2)$-dimensional unitary matrix. Continuing this process, we pass to a 1-dimensional unitary matrix, which consists of just one entry $e^{j\alpha_n}$. Therefore, by this process, we can parameterize the bounded and closed space of $n$-dimensional unitary matrices by

$$
1 + 3 + 5 + \ldots + (2n-1) = n^2
\tag{5.94}
$$

parameters. Except $\phi_{k\,n}$, for $k = 1, \ldots, n-1$, and $\alpha_n$, which take values in the interval $[-\pi, \pi[$, all other parameters belong to the interval $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right[$. Thus, out of $n^2$ parameters, $n$ belong to the interval $[-\pi, \pi[$, and all remaining parameters belong to the interval $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right[$. This concludes the proof.


After analyzing this algorithm, we come to the conclusion that a real valued orthonormal matrix yields

$$
\alpha_n \in \{0, \pi\}, \quad \alpha_1 = \ldots = \alpha_{n-1} = 0, \quad \phi_{p\,q} = 0 \ \forall p, q,
\tag{5.95}
$$

such that the basic unitary matrices $\mathbf{U}^{p\,q}$ obtained, and in turn the decomposition of Theorem 5.1, are real valued.
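For the real valued case ($\phi_{p\,q} = 0$, basic rotations reduce to Givens rotations), the proof's procedure can be sketched in Python: angles are extracted by successive post-multiplications with transposed rotations, and the matrix is rebuilt from them. This is an illustration only; 0-based indices are used and all names are invented for this example.

```python
import numpy as np

def givens(n, p, q, theta):
    """Real basic rotation U^{pq}(theta, 0) of (5.87), 0-based p < q."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[p, p] = G[q, q] = c
    G[p, q] = s
    G[q, p] = -s
    return G

def parametrize(U):
    """Zero the above-diagonal entries row by row by post-multiplying with
    U^{pq}(theta,0)^H = U^{pq}(-theta,0), recording the angles, cf. the proof
    of Theorem 5.1 (real case)."""
    n = U.shape[0]
    W = U.copy()
    angles = []
    for p in range(n - 1):
        for q in range(n - 1, p, -1):
            theta = np.arctan2(W[p, q], W[p, p])   # cf. (5.90) with phi = 0
            W = W @ givens(n, p, q, -theta)        # post-multiply by the transpose
            angles.append((p, q, theta))
    return angles, W                               # W ends up diagonal, entries +-1

# extract the parameters of a random orthonormal matrix and rebuild it
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))
angles, W = parametrize(U)
U_rec = W.copy()
for p, q, theta in reversed(angles):               # U = W G_m^T ... G_1^T
    U_rec = U_rec @ givens(4, p, q, theta)
```

The number of recorded angles equals $n(n-1)/2$, in line with the parameter count (5.94) restricted to the real case.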

5.4.3 The Transmitter Side


In order to describe our transmission scheme completely, we have to specify the transmit matrix $\mathbf{T} \in \mathbb{R}^{t \times s}$. Our starting point is the (full diagonalization) SVD method, since it yields the optimum performance, provided that the available transmit power is distributed according to the Water Filling rule. We have

$$\mathbf{B}_n^{-1} \mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{V}^T.$$

Applying Theorem 5.1 to $\mathbf{V}^T \in \mathbb{R}^{t \times t}$, we obtain

$$
\mathbf{V}^T = \begin{pmatrix} e^{j\alpha_1} & & 0 \\ & \ddots & \\ 0 & & e^{j\alpha_t} \end{pmatrix} \prod_{p=t-1}^{1} \prod_{q=p+1}^{t} \mathbf{U}^{p\,q}\left(\theta_{p\,q}, \phi_{p\,q}\right),
$$

where $\alpha_1 = \ldots = \alpha_t = 0$ without loss of generality, since these factors - if not already zero according to (5.95) - can be incorporated in the matrix $\mathbf{U}$. Then,

$$
\mathbf{V} = \prod_{p=1}^{t-1} \left( \prod_{q=t}^{p+1} \left(\mathbf{U}^{p\,q}\right)^H\left(\theta_{p\,q}, \phi_{p\,q}\right) \right).
\tag{5.96}
$$

Observe that we have

$$
\left(\mathbf{U}^{p\,q}\right)^H\left(0, \phi_{p\,q}\right) = \mathbf{I}_t
\tag{5.97}
$$

and

$$
\left\| \left(\mathbf{U}^{p\,q}\right)^H\left(\theta_{p\,q}, \phi_{p\,q}\right) - \mathbf{I}_t \right\| = 2 \left| \sin\frac{\theta_{p\,q}}{2} \right|,
\tag{5.98}
$$

which motivates us to set those basic rotation matrices equal to the identity matrix that have the smallest absolute angle values $\left|\theta_{p\,q}\right|$. To put it in another way, given any number $f \in \left\{0, \ldots, \frac{t(t-1)}{2}\right\}$, we obtain a unitary matrix $\mathbf{V}_f \in \mathbb{R}^{t \times t}$ according to

$$
\mathbf{V}_f = \prod_{p=1}^{t-1} \left( \prod_{q=t}^{p+1} \left(\mathbf{U}^{p\,q}\right)^H\left(\theta_{p\,q}, \phi_{p\,q}\right) \right),
\tag{5.99}
$$

where only the $f$ factors with greatest $\left|\theta_{p\,q}\right|$ are retained (all other factors are replaced by $\mathbf{I}_t$), using the convention $\mathbf{V}_0 = \mathbf{I}_t$. Let

$$
\mathbf{B} = \operatorname{diag}_{t \times s}\left\{B(1,1), \ldots, B(s,s)\right\}
\tag{5.100}
$$

denote a diagonal matrix with $\operatorname{tr}\left(\mathbf{B}\mathbf{B}^T\right) = S$. We will define the transmit matrix to be

$$
\mathbf{T} = \mathbf{T}_f = \mathbf{V}_f \mathbf{B},
\tag{5.101}
$$

where $\mathbf{B}$ is optimized according to the Water Filling rule applied to the diagonal elements of the matrix $\mathbf{J}'$, $\mathbf{J}'$ being the upper triangular matrix of the QR-decomposition of $\mathbf{B}_n^{-1}\mathbf{A}\mathbf{V}_f$, i.e.,

$$
\mathbf{Q}'\mathbf{J}' = \mathbf{B}_n^{-1}\mathbf{A}\mathbf{V}_f.
\tag{5.102}
$$

Note the relationship between $\mathbf{J}'$ and $\mathbf{J}$, $\mathbf{J}$ being the upper triangular matrix of the QR-decomposition of $\mathbf{B}_n^{-1}\mathbf{A}\mathbf{T}$, i.e.,

$$
\mathbf{Q}\mathbf{J} = \mathbf{B}_n^{-1}\mathbf{A}\mathbf{V}_f\mathbf{B} = \mathbf{Q}'\mathbf{J}'\mathbf{B},
\tag{5.103}
$$

from which follows

$$
\mathbf{Q} = \mathbf{Q}' \quad \text{and} \quad \mathbf{J} = \mathbf{J}'\mathbf{B},
\tag{5.104}
$$

and, furthermore,

$$
J(i,i) = J'(i,i)\, B(i,i), \qquad i = 1, \ldots, s,
\tag{5.105}
$$

so that we obtain the (sum-) capacity, cf. (5.84), as

$$
C_{T,R} = \frac{1}{2} \sum_{i=1}^{s} \log\left(1 + \left(B(i,i)\right)^2 \left(J'(i,i)\right)^2\right) = \frac{1}{2} \sum_{i=1}^{s} \log^+\left(L \left(J'(i,i)\right)^2\right),
\tag{5.106}
$$

where

$$
\left(B(i,i)\right)^2 = \left(L - \frac{1}{\left(J'(i,i)\right)^2}\right)^+, \quad i = 1, \ldots, s,
$$

with a Water Level $L$, chosen to satisfy $\sum_{i=1}^{s} \left(B(i,i)\right)^2 = S$.
We want to emphasize that using optimized permutation matrices in the transmitter - similar to the BLAST approach - may enhance the performance even further. For our method, this means that we deal with transmit matrices of the form

$$
\mathbf{T} = \mathbf{T}_f = \mathbf{V}_f \mathbf{P} \mathbf{B},
\tag{5.107}
$$

where the permutation matrix $\mathbf{P}$ is chosen such that the QR-decomposition of $\mathbf{B}_n^{-1}\mathbf{A}\mathbf{V}_f\mathbf{P}$, i.e.,

$$
\mathbf{Q}'\mathbf{J}' = \mathbf{B}_n^{-1}\mathbf{A}\mathbf{V}_f\mathbf{P},
\tag{5.108}
$$

yields optimized diagonal elements $J'(i,i)$. Again, permutation corresponds to re-labeling, and the multiplication by the permutation matrix does not increase the Online Complexity. For an optimization technique, we refer to [60].
Observe that if $f = 0$, the UP MIMO scheme is essentially equivalent to V-BLAST, whereas if $f = \frac{t(t-1)}{2}$, we have equivalence to the optimum SVD scheme (full diagonalization). Therefore, we can regard the UP MIMO scheme as a mixture of V-BLAST and the SVD scheme, having the freedom to adapt the parameter $f$ to the actual situation (environment).
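The selection of the $f$ rotations with greatest $|\theta_{p\,q}|$, cf. (5.99), can be sketched as follows. This is an illustration only: the exact factor ordering of (5.99) is glossed over, the angle list is a toy input, and all names are invented for this example.

```python
import numpy as np

def givens(n, p, q, theta):
    """Real basic rotation U^{pq}(theta, 0), 0-based p < q."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[p, p] = G[q, q] = c
    G[p, q] = s
    G[q, p] = -s
    return G

def build_Vf(angles, n, f):
    """Approximate V by the product of only the f rotations with the greatest
    |theta|, cf. (5.99); all other factors are replaced by the identity."""
    keep = set(sorted(range(len(angles)),
                      key=lambda i: -abs(angles[i][2]))[:f])
    Vf = np.eye(n)
    for i, (p, q, theta) in enumerate(angles):
        if i in keep:
            Vf = Vf @ givens(n, p, q, theta)
    return Vf

# toy angle list (p, q, theta): for f = 2 the small middle angle is dropped
angles = [(0, 1, 0.9), (0, 2, 0.05), (1, 2, -0.4)]
V2 = build_Vf(angles, n=3, f=2)
V0 = build_Vf(angles, n=3, f=0)
```

Note that $\mathbf{V}_f$ remains orthonormal for every $f$, since it is a product of rotations.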
The joint processing function at the transmitter side

$$
T : \mathbb{R}^s \to \mathbb{R}^t, \quad \mathbf{t} \mapsto \mathbf{x} = T(\mathbf{t}) = \mathbf{T}_f \mathbf{t},
\tag{5.109}
$$

can be evaluated in two different ways during transmission.
The conventional way would be to perform multiplications of the vector $\mathbf{t}$ with the transmit matrix $\mathbf{T}_f$, where $\mathbf{T}_f$ has been completely determined in the Startup phase before data transmission. This method would yield an Online Complexity of

$$
\begin{aligned}
O_{\text{mult}}^T &= (N-1)\, 2K^2, \\
O_{\text{add}}^T &= NK(2K-1) - 2K^2
\end{aligned}
\tag{5.110}
$$

for our MIMO DMT scheme (2.25), where its block diagonal structure has been utilized. It is equal to the transmit Online Complexity of the full diagonalization SVD scheme, cf. (5.66).
The other way to evaluate (5.109) is to make use of the factorization of T_f
into a diagonal matrix B, possibly a permutation matrix P, and f basic rotation
matrices according to (5.99). Observe that a multiplication by a real valued basic
rotation matrix U_pq (φ_pq ≠ 0), cf. (5.87), requires 4 real valued multiplications and
2 real valued additions. Hence, if we make use of the block diagonal structure of
our MIMO DMT scheme (2.25), and denote by f_0, f_{N/2} ∈ {0, . . . , K(K − 1)/2} and
f_n ∈ {0, . . . , K(2K − 1)}, n = 1, . . . , N/2 − 1, the number of rotation matrices
used per block^17, we obtain

    O_mult^T = 4f + N K,
    O_add^T  = 2f,                                                  (5.111)

with

    f = Σ_{n=0}^{N/2} f_n ∈ {0, . . . , (N/2) K (2K − 1) − K²}.     (5.112)

Therefore, if we only make use of a small number f of rotation matrices, the computational effort required at the transmitter is very low. But note that if f exceeds a

^17 Note that there is one block per subcarrier / frequency (n denotes the subcarrier / frequency index).


certain threshold, the Online Complexity of the multiplications will be even higher
than the corresponding quantity in (5.110). We can easily compute this threshold
to be

    (N/4) K (2K − 1) − K²/2,                                        (5.113)

which is exactly one half of all basic rotation matrices obtained from the decomposition of Theorem 5.1. The transmit Online Complexity of the required additions
will never exceed its conventional counterpart, as can be seen from comparing
(5.110) with (5.111) and (5.112). Note that this second approach will always yield
a high transmit Online Complexity if we choose f to be close to the (optimum)
SVD scheme. In order to obtain a small complexity, we have to operate close to
V-BLAST.
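These complexity figures can be made concrete with a small numerical sketch (the function names are ours, not from the text; the parameters N = 512, K = 20 are those of the simulation scenarios in Appendix B). It evaluates (5.110), (5.111), and the threshold (5.113):

```python
def conventional_mults(N, K):
    # (5.110): multiplying t by the block diagonal transmit matrix T_f directly
    return (N - 1) * 2 * K**2

def rotation_mults(N, K, f):
    # (5.111): 4 multiplications per basic rotation matrix,
    # plus N*K multiplications for the diagonal matrix B
    return 4 * f + N * K

N, K = 512, 20                                # parameters of the simulation scenarios
f_max = (N // 2) * K * (2 * K - 1) - K**2     # all basic rotations, cf. (5.112)
f_thr = N * K * (2 * K - 1) // 4 - K**2 // 2  # threshold (5.113)
```

For these parameters, f_thr is exactly half of f_max, and at f = f_thr both evaluation methods require the same number of multiplications; below the threshold, the rotation based evaluation is the cheaper one.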

5.4.4 Comments
As already mentioned in the beginning of this section, the UP MIMO scheme
was originally designed for wireless transmission in order to cope with different
amounts of channel state information at the transmitter side.
Whereas channel knowledge at the receiver side can be justified in practice by the
use of channel estimation and channel prediction techniques, channel knowledge
at the transmitter side can only be guaranteed if either reciprocity applies or there
is a backward channel. This implies that transmission schemes that utilize channel knowledge at the transmitter side are mostly of theoretical interest, since they
can yield upper bounds for achievable performance. However, there are some
scenarios, e.g. slowly varying channels, for which it would make sense to use
transmission schemes that require channel knowledge at the transmitter side.
In practice, we have the problem that we do not know in advance whether a
transceiver operates only in such a special environment, or if the environment may
suddenly change and channel knowledge at the transmitter side is lost. Consider
for example the situation where a user does not move during the first moments of
transmission, but, after a certain period of time, starts moving. Therefore, such
a transmission scheme is very inflexible and sensitive to (faster) variations of the
channel.
This was the original reason for designing the UP MIMO transmission scheme
[2, 37], since it can adapt to the current channel situation. According to (5.101) /
(5.107) with (5.99), the transmit matrix T can be described by a set of parameters,
i.e., the parameters of the basic rotation matrices and the diagonal elements of B.
Note that the transmitter must have knowledge of T but not of anything more. If
the channel is quasi-static, all parameters are re-transmitted to the transmitter using
a backward channel, and full (SVD) performance is obtained. If the channel starts
to vary (not too fast), only the most important parameters (the ones that correspond
to the greatest absolute angle values) are re-transmitted; and if the channel fades
really fast, no parameters are re-transmitted (V-BLAST performance). Note also
the possibility that there is only a low-rate backward channel, so that we can only
re-transmit a few channel parameters. Hence, the main advantage of the (wireless)
UP MIMO scheme is its flexibility in dealing with channel fluctuations.
We also want to mention that the author proposed a differential variant of the
UP MIMO scheme as well in [2]. The basic idea behind this differential UP MIMO
scheme is that it applies Theorem 5.1 not to the optimum transmit matrix obtained
by the SVD, but to the changes, i.e., the quotient, of the optimum transmit matrices
for consecutive time instants. With this method, it is feasible to reduce the number
of re-transmitted parameters even further (if the channel does not vary too fast).
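As a toy illustration of this differential idea (our own sketch, not the algorithm of [2]): for a 2 × 2 orthogonal transmit matrix, the "quotient" of two consecutive matrices is again a rotation, and for a slowly varying channel its angle is only the small increment, which can be quantized and re-transmitted more cheaply than the full angle.

```python
import math

def rotation(phi):
    # 2 x 2 orthogonal (rotation) matrix with angle phi
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def quotient(T_new, T_old):
    # T_new * T_old^T, the "change" between consecutive transmit matrices
    return [[sum(T_new[i][k] * T_old[j][k] for k in range(2))
             for j in range(2)] for i in range(2)]

def angle(T):
    # rotation angle of a 2 x 2 rotation matrix
    return math.atan2(T[1][0], T[0][0])

T_old, T_new = rotation(0.50), rotation(0.52)  # slowly varying channel
delta = angle(quotient(T_new, T_old))          # only the increment 0.02 is sent
```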
We performed simulations of the UP MIMO scheme with the same parameters
as in Subsection 5.3.4. It turned out that the UP MIMO scheme yields the same
curves as the full diagonalization SVD scheme, even for small numbers f of rotation matrices used (and often even for f = 0). The reason for this is that we
have a high SNR for shorter loop lengths, so that even a simple channel matrix
inversion (zero-forcing) would have a performance almost identical to the SVD
scheme (full diagonalization). For such an operating range, we will mostly gain
by crosstalk (FEXT) removal, with very little influence of the noise. For longer
loop lengths, we already came to the conclusion, cf. Subsection 5.3.4, that joint
processing yields less benefit, so that the gain achieved by applying the UP MIMO
scheme is limited.
So one might ask why the UP MIMO scheme should be used at all for wireline transmission (for wireless transmission, its advantages are unquestioned). The answer
is that there are situations where the UP MIMO scheme has a better performance
than other schemes. We found out that if the noise is correlated across the various
loops, the UP MIMO scheme, and especially the use of some rotation matrices at
the transmitter side, can improve performance, provided that the SNR is not too
high. Hence, one good argument for applying the UP MIMO scheme is to make
the transmission robust against certain impairments by utilizing the information
available and still having a low Online Complexity at the transmitter side.
In [61], it is shown that a Decision Feedback structure at the receiver side has
its dual at the transmitter side, which is called Precoding, cf. also [14]. Since we
have a decision feedback structure according to (5.83), we can just as well think
of a UP MIMO precoding scheme, which benefits from the use of basic rotation
matrices.
Finally, note that our algorithm for parameter determination / selecting the basic rotation matrices is very simple, since it merely has to find the parameters with
the greatest absolute values, cf. (5.99). However, it only takes into account the
matrix V^T obtained by the SVD of

    B_n^(−1) A = U D V^T,

ignoring any information contained in the matrix D. Note that the matrix U does
not affect the upper triangular matrix in the QR-decomposition of B_n^(−1) A T. So
one may look for an improved (and maybe more complex) algorithm that makes
additional use of this matrix (i.e., the singular values of B_n^(−1) A).
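The simple selection rule can be sketched as follows (illustrative only; the angle values are made up, and select_rotations is our name): given the angles φ_pq of the basic rotation matrices obtained from the decomposition (5.99) of V^T, keep the f angles of greatest absolute value and drop the rest.

```python
def select_rotations(angles, f):
    """angles maps a rotation index (p, q) to its angle phi_pq obtained
    from the decomposition (5.99); keep the f largest in absolute value."""
    ranked = sorted(angles, key=lambda pq: abs(angles[pq]), reverse=True)
    return ranked[:f]

# hypothetical angles of three basic rotations
angles = {(0, 1): 0.9, (0, 2): -1.2, (1, 2): 0.05}
kept = select_rotations(angles, 2)  # the two most important rotations
```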

6. CONCLUSIONS AND OUTLOOK

This work gives a comprehensive treatment of Multiple - Input / Multiple - Output
Discrete Multitone (MIMO DMT) transmission, applied to data transmission over
cable bundles. It was shown that such a transmission scenario can be modeled by a
complex vector channel, i.e., by a deterministic complex matrix and by a complex
noise vector.
We developed a theory for complex random vectors that takes into account
rotationally variant random vectors, and is therefore of great importance for our
purpose, since complex random vectors (in DMT and MIMO DMT) have a nonvanishing pseudo-covariance matrix in general. We proved a Generalized Maximum Entropy Theorem, that includes the pseudo-covariance matrix in its entropy
inequality and therefore tightens the upper bound for rotationally variant random
vectors. We showed that the additional correction term is independent of the specific probability distribution of the considered random vector. Furthermore, we
obtained several capacity results for the complex vector channel considered, which
take into account the pseudo-covariance matrix. We calculated the capacity loss
if we erroneously assume that the pseudo-covariance matrix is the zero matrix.
Note also that we derived a criterion for a matrix to be a pseudo-covariance matrix.
This generalizes the well-known criterion that a matrix is a covariance matrix of a
certain random vector if and only if it is symmetric / Hermitian and non-negative
definite.
We performed a detailed noise analysis for a DMT system and showed that
the noise vector at the input of the Decision Device is rotationally variant in general. We calculated the corresponding covariance matrix and pseudo-covariance
matrix, which were then specialized in order to obtain the noise variances of real
and imaginary part and to obtain the correlations between real and imaginary part
for a fixed frequency / subcarrier. Via eigenvalue decompositions, we were able to
determine the eccentricities and the rotations of the noise ellipses. It turned out that
the rotation angles are independent of the actual noise characteristics. They only
depend on the number of the considered subcarrier. Furthermore, it was shown
that different noise variances and correlations of real and imaginary part do not
occur in the presence of white noise (at the input of the receiver). For colored
noise, they do occur, and one has to use rotated rectangular constellations instead
of the common (square) QAM constellations. Otherwise, one has to accept a capacity loss and increased symbol error probability. We calculated both quantities
and found that the impact on capacity is not very substantial due to the high SNR

in wireline transmission. On the other hand, the loss¹ measured by (uncoded) symbol error probability can be quite large, so that we can expect enough benefit to
afford the additional effort required for implementation. Furthermore, we showed
how to modify the existing bit-loading algorithms in order to obtain the optimum
constellation parameters.
We also performed a detailed interference analysis for a DMT system. We considered the case when the channel impulse response exceeds the Cyclic Prefix on
both sides, which yields precursors and postcursors from both neighboring DMT
symbols (intersymbol interference) and also intercarrier interference. We derived
closed form formulas for both contributions and considered their statistical properties as well. We came to the conclusion that both interference contributions are
complex random vectors with equal first and second order moments and a nonvanishing pseudo-covariance matrix.
We also showed how the noise and interference results obtained can be utilized
for the design of Time Domain Equalizers.
In a second step, we generalized the noise and interference results from DMT
to the MIMO DMT case. Again, it was possible to obtain closed form solutions,
even for very general assumptions with respect to correlations across the various
loops of the cable bundle.
We presented the general form of a transmission scheme that is suited to the
MIMO DMT channel and is based on so-called joint processing functions. It allows
the use of SISO codes, and we introduced the (sum -) capacity as a performance
measure.
We dealt with transmission schemes whose joint processing functions were
based on the Singular Value Decomposition (SVD) of the channel matrix. We
showed that the optimum joint processing function can be obtained by means of
the SVD. Furthermore, we studied low(er)-complexity variations and discussed
their performance. To obtain quantitative results, we performed simulations with
realistic (practically used) parameters and compared the various methods.
Finally, we presented the UP MIMO scheme, a scheme that was originally designed by the author for wireless transmission, and also has applications to wireline
transmission. Specifically, it can be used to reduce the computational complexity
at the transmitter side (but not at the receiver side). We treated various aspects of
this scheme.
We also want to mention potential areas for further research:

- The application of rotated and / or non-square constellations is not compliant with the xDSL standards, since it requires a modification of the transmitter. In order to utilize the non-vanishing pseudo-covariance matrix in a standard compliant manner, one has to think of alternative solutions. A possible approach could be to adapt the decoding strategy (soft decoding) so that it makes use of the knowledge that transmission is more reliable, e.g., for the real part than for the imaginary part of the transmitted symbol.
¹ Or gain. This depends on the viewpoint.

- We explained how the noise and interference results obtained can be used for the design of Time Domain Equalizers both in the DMT and MIMO DMT case. However, we did not present explicit algorithms, and this is certainly a very interesting application.

- We already mentioned that the UP MIMO scheme has potential for further improvements if it utilizes the singular values of the channel matrix. A low complexity algorithm for parameter determination / selecting the basic rotation matrices that has better performance because it makes use of this data could be the goal of future research activities.

- Joint processing functions with small Online Complexity and good performance are of interest, as are methods that do not require synchronization between the different loops.

APPENDIX

A. NOTATION AND ABBREVIATIONS

In this appendix we introduce the mathematical notation used in this work and
summarize the most important abbreviations.

A.1 Mathematical Notation


The set of real numbers is written as R, the complex numbers as C = R + jR,
the integers as Z, and the natural numbers as N. For complex variables, ℜ{·} and ℑ{·} denote
the real and imaginary part (operators), respectively, i.e., c = ℜ{c} + jℑ{c} for
c ∈ C. Complex conjugation is written as

    c* = ℜ{c} − jℑ{c}.                                              (A.1)

Similarly, the set of real valued vectors of dimension n (real valued n - tuples)
is written as R^n and the set of complex valued vectors of dimension n (complex
valued n - tuples) is written as C^n = R^n + jR^n. Furthermore, the set of real valued
(n × m) - matrices is written as R^(n×m) and the set of complex valued (n × m) -
matrices is written as C^(n×m) = R^(n×m) + jR^(n×m). A boldface font is used to denote
vectors (lowercase letters, e.g. x) and matrices (uppercase letters, e.g. A), in
order to distinguish these objects from scalars. For complex vectors and matrices,
the real and imaginary part (operators), ℜ{·} and ℑ{·}, respectively, are defined
componentwise, such that we have x = ℜ{x} + jℑ{x} for x ∈ C^n and A =
ℜ{A} + jℑ{A} for A ∈ C^(n×m). Analogously to (A.1), complex conjugation is
written as

    x* = ℜ{x} − jℑ{x}   and   A* = ℜ{A} − jℑ{A}.                    (A.2)

Transpose and Hermitian transpose of a vector / matrix are denoted by the superscripts T and H (in a boldface font), respectively. The inverse of a non-singular
square matrix A is denoted by A^(−1), whereas the Moore - Penrose pseudo inverse
[19] of an arbitrary matrix A ∈ C^(n×m) or A ∈ R^(n×m) is denoted by A^+. Determinant and trace of a matrix A are denoted by det A and tr A, respectively.
A rectangular matrix is called diagonal if all entries with different column and
row indices are 0.
By diag_{r×t}{d_1, . . . , d_s} with s = min{r, t} we denote a complex (real) valued
matrix with r rows and t columns for which all entries with different row and
column indices are 0 and the entry with i-th row and i-th column index is equal to
d_i (i = 1, . . . , s).

A.2 Frequently Used Symbols


The following table summarizes some symbols used throughout the text.
Symbol: Meaning
j = √(−1): Imaginary unit
K: Number of (considered) loops in a cable bundle
p: Length of Cyclic Prefix
N: DFT and IDFT length
F = (1/√N) [e^(−j 2π kl / N)]_{k,l=0,...,N−1}: N - dimensional DFT - matrix
F^(−1) = (1/√N) [e^(j 2π kl / N)]_{k,l=0,...,N−1}: N - dimensional IDFT - matrix
I_n: n - dimensional identity matrix
E{·}: Expectation operator
μ_x: Mean / expectation vector of random vector x
C_x: Covariance matrix of random vector x
P_x: Pseudo-covariance matrix of random vector x
R_z: Autocorrelation function of random process z
R_{z1,z2}: Cross-correlation function between random processes z1 and z2
h(x): Differential entropy of random vector x
I(x; y): Mutual information between random vectors x and y
C: Channel capacity
C_{T,R}: (Sum -) capacity
ΔC: Capacity loss
L: Water Level (Water Filling)
d(k): Relative eigenvalue difference at frequency / subcarrier k
Q(x) = (1/√(2π)) ∫_x^∞ e^(−t²/2) dt: Q-function
H(z) = Σ_{k=−∞}^{+∞} h(k) z^(−k): Z - transform of h
H_l^+(z) = Σ_{k=0}^{+∞} h(k + l) z^(−k): One-sided Z^+ - transform of h shifted to the left by l
H_l^−(z) = Σ_{k=−∞}^{−1} h(k − l) z^(−k): One-sided Z^− - transform of h shifted to the right by l
T: Transmit joint processing function
R: Receive joint processing function
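The DFT / IDFT matrices defined above can be verified numerically; the following sketch (assuming the unitary normalization 1/√N listed in the table) checks that F · F^(−1) yields the identity matrix for a small N:

```python
import cmath, math

def dft_matrix(N, inverse=False):
    # F uses the negative exponent, F^{-1} the positive one; both scaled by 1/sqrt(N)
    sign = 1 if inverse else -1
    return [[cmath.exp(sign * 2j * math.pi * k * l / N) / math.sqrt(N)
             for l in range(N)] for k in range(N)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

N = 8
P = matmul(dft_matrix(N), dft_matrix(N, inverse=True))  # numerically I_8
```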


A.3 Abbreviations
The following table summarizes the acronyms used in this work.
Acronym: Meaning
ADSL: Asymmetric Digital Subscriber Line
CP: Cyclic Prefix
CSI: Channel State Information
DFE: Decision Feedback Equalization
DFT: Discrete Fourier Transform
DMT: Discrete Multitone Modulation
DSL: Digital Subscriber Line
FDE: Frequency Domain Equalizer
FEXT: Far-End Crosstalk
FFT: Fast Fourier Transform
IDFT: Inverse Discrete Fourier Transform
ICI: Intercarrier Interference
IFFT: Inverse Fast Fourier Transform
ISI: Intersymbol Interference
MIMO: Multiple - Input / Multiple - Output
MMSE: Minimum Mean Square Error
NEXT: Near-End Crosstalk
NJPF: No Joint Processing Function (scheme)
OFDM: Orthogonal Frequency Division Multiplexing
PSD: Power Spectral Density
QAM: Quadrature Amplitude Modulation
QR: Matrix decomposition (Gram - Schmidt orthogonalization)
SISO: Single - Input / Single - Output
SNR: Signal-to-Noise Ratio
SVD: Singular Value Decomposition
TDE: Time Domain Equalizer
UP MIMO: Unitary Parametrization Multiple - Input / Multiple - Output
V-BLAST: Vertical Bell Labs Layered Space-Time (detection algorithm)
VDSL: Very-high bit rate Digital Subscriber Line
xDSL: Acronym for all DSL systems


B. SIMULATION SCENARIOS

B.1 Scenario 1
For the used nomenclature we refer to Section 2.1.

B.1.1 Transmission Medium


- 1 twisted pair copper wire
- Manufacturer: Huber & Drott
- Type: F-02YHJA2Y
- Diameter: 0.4 mm
- Loop length: 1 km - 10 km

The transfer function and in turn the impulse response g are obtained by measurements.

B.1.2 DMT Parameters


- DFT-length: N = 512
- Length of Cyclic Prefix: p = 32
- Channel symbol rate: 1/T = 2.208 MHz
- Subcarrier spacing: 4312.5 Hz
- Transmit power: S_DMT = 100 mW
- No Time Domain Equalizer

B.1.3 Noise Model


We assume two additive, statistically independent noise components. One is the
typical noise environment in a cable bundle including crosstalk and background
noise. The other is stationary narrowband interference with a bandwidth of 10 kHz,
a center frequency of 1.07 MHz, and 0 dBm power. The overall (one-sided) power
spectral density (PSD) [34] of the stationary noise process s at the input of the
receiver - including these two noise components - is depicted in Figure B.1 (used
for all loop lengths). The noise process s is assumed to be zero-mean.
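The contribution of the narrowband interferer to the PSD plot can be related to its power by a simple conversion (a sketch under the assumption that the 0 dBm power is spread flat over the 10 kHz bandwidth; the function name is ours):

```python
import math

def power_dbm_to_psd_dbm_per_hz(power_dbm, bandwidth_hz):
    """One-sided PSD of a flat narrowband interferer:
    PSD [dBm/Hz] = power [dBm] - 10*log10(bandwidth [Hz])."""
    return power_dbm - 10 * math.log10(bandwidth_hz)

# the narrowband interferer of Scenario 1: 0 dBm spread over 10 kHz
psd = power_dbm_to_psd_dbm_per_hz(0.0, 10e3)  # -40 dBm/Hz
```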

[Figure: one-sided PSD in dBm/Hz versus frequency in kHz]

Fig. B.1: One-sided power spectral density (PSD) of the noise process s at the input of the receiver.

B.2 Scenario 2
For the used nomenclature we refer to Section 2.2.

B.2.1 Transmission Medium


- K = 20 twisted pair copper wires in a cable bundle
- Manufacturer: Huber & Drott
- Type: F-02YHJA2Y
- Diameter: 0.4 mm
- Loop length: 1 km - 7 km

The transfer functions (including the FEXT modeling cross transfer functions) and in turn the impulse responses g^⟨kl⟩, k, l = 1, . . . , K, are obtained by measurements.

B.2.2 DMT Parameters


The DMT parameters are identical for all K = 20 loops.


[Figure: one-sided PSD in dBm/Hz versus frequency in kHz]

Fig. B.2: One-sided power spectral density (PSD) of all noise processes s^⟨k⟩, k = 1, . . . , K, at the input of the receivers.

- DFT-length: N = 512
- Length of Cyclic Prefix: p = 32
- Channel symbol rate: 1/T = 2.208 MHz
- Subcarrier spacing: 4312.5 Hz
- Transmit power: S_DMT = 100 mW, S_MIMO DMT = 20 · 100 mW = 2 W
- No Time Domain Equalizers

B.2.3 Noise Model


We assume only background noise. To be more precise, we assume that the stationary noise processes s^⟨k⟩, k = 1, . . . , K, at the input of the receivers are zero-mean
and mutually independent, hence, R_{s^⟨k1⟩, s^⟨k2⟩} = 0 for k1 ≠ k2, cf. Subsection
5.1.1, and that they have constant (one-sided) power spectral densities (PSDs) [34]
of −140 dBm/Hz, i.e., they all have the same (one-sided) power spectral density
(PSD) that is depicted in Figure B.2 (used for all loop lengths).

B.3 Scenario 3
For the used nomenclature we refer to Section 2.2.


B.3.1 Transmission Medium


- K = 20 twisted pair copper wires in a cable bundle
- Manufacturer: Huber & Drott
- Type: F-02YHJA2Y
- Diameter: 0.4 mm
- Loop length: 1 km - 7 km

The transfer functions (including the FEXT modeling cross transfer functions) and in turn the impulse responses g^⟨kl⟩, k, l = 1, . . . , K, are obtained by measurements.

B.3.2 DMT Parameters


The DMT parameters are identical for all K = 20 loops.

- DFT-length: N = 512
- Length of Cyclic Prefix: p = 32
- Channel symbol rate: 1/T = 2.208 MHz
- Subcarrier spacing: 4312.5 Hz
- Transmit power: S_DMT = 100 mW, S_MIMO DMT = 20 · 100 mW = 2 W
- No Time Domain Equalizers

B.3.3 Noise Model


We assume the typical noise environment in a cable bundle including crosstalk
and background noise. To be more precise, we assume that the stationary noise
processes s^⟨k⟩, k = 1, . . . , K, at the input of the receivers are zero-mean and
mutually independent, hence, R_{s^⟨k1⟩, s^⟨k2⟩} = 0 for k1 ≠ k2, cf. Subsection 5.1.1,
and that their (one-sided) power spectral densities (PSDs) [34] are all equal to
the power spectral density (PSD) that is depicted in Figure B.3 (used for all loop
lengths).


[Figure: one-sided PSD in dBm/Hz versus frequency in kHz]

Fig. B.3: One-sided power spectral density (PSD) of all noise processes s^⟨k⟩, k = 1, . . . , K, at the input of the receivers.


BIBLIOGRAPHY

[1] D. Agrawal, T. J. Richardson, R. Urbanke, Multiple Antenna Signal Constellations for Fading Channels, IEEE Transactions on Information Theory,
vol. 47, no. 6, pp. 2618-2626, Sep. 2001.
[2] A. Burr, Y. Zacharov, H. Toeger, W. Qiu, M. Meurer, Ch. Stimming, A. Vanaev, G. Tauboeck, J. Shen, H. Mai, Selected MIMO Techniques and their Performance, IST-2001-32125 FLOWS Deliverable D14, 2003.
[3] A. Busboom, G. Herrmann, R. Tzschoppe, J. B. Huber, IFC - Aktive Kompensation des Nahnebensprechens für die DSL-Übertragung, Proc. of 12. ITG-Fachtagung Kommunikationskabelnetze, Köln, Germany, December 2005.
[4] S. Buzzi, M. Lops, A. M. Tulino, A new Class of Multiuser CDMA Receivers based on the Minimum Mean-Output-Energy Strategy, Proc. of IEEE International Symposium on Information Theory (ISIT), Naples, Italy, p. 355, June 2000.
[5] J. Choi, B. Mondal, R.W. Heath, Jr., Interpolation Based Unitary Precoding
for Spatial Multiplexing MIMO-OFDM with Limited Feedback, submitted to
IEEE Transactions on Signal Processing, December 2004.
[6] T.M. Cover, J.A. Thomas, Elements of Information Theory, John Wiley & Sons, Inc., 1991.
[7] H.J. Dirschmid, Mathematische Grundlagen der Elektrotechnik, 4.,
verbesserte Auflage, Vieweg Braunschweig, 1990.
[8] K. Endl, W. Luh, Analysis II: Eine integrierte Darstellung, 8. Auflage,
AULA-Verlag Wiesbaden, 1994.
[9] ETSI, Transmission and Multiplexing (TM); Access transmission systems on
metallic access cables; Very high speed Digital Subscriber Line (VDSL); Part
1: Functional requirements, ETSI, TM6 TS 101 270-1, Version 1.2.1, Oct,
1999.
[10] ETSI, Transmission and Multiplexing (TM); Access transmission systems
on metallic access cables; Very high speed Digital Subscriber Line (VDSL);
Part 2: Transceiver specification, ETSI, TM6 TS 101 270-2, Version 1.1.1,
Feb, 2001.


[11] ETSI, Transmission and Multiplexing (TM); Access transmission systems


on metallic access cables; Very high speed Digital Subscriber Line (VDSL);
Part 1: Functional requirements, ETSI, TM6 TS 101 270-1, Version 1.3.1, Jul,
2003.
[12] ETSI, Transmission and Multiplexing (TM); Access transmission systems
on metallic access cables; Very high speed Digital Subscriber Line (VDSL);
Part 2: Transceiver specification, ETSI, TM6 TS 101 270-2, Version 1.2.1,
Jul, 2003.
[13] R.F.H. Fischer, Mehrkanal- und Mehrträgerverfahren für die schnelle digitale Übertragung im Ortsanschlussleitungsnetz, PhD thesis, Universität Erlangen-Nürnberg, Germany, 1996.
[14] R.F.H. Fischer, Precoding and Signal Shaping for Digital Transmission,
John Wiley & Sons, Inc., New York, NY, USA, 2002.
[15] G. J. Foschini, Layered Space-Time Architecture for Wireless Communication in a Fading Environment When Using Multiple Antennas, Bell Laboratories Technical Journal, pp. 41-59, Autumn 1996.
[16] G. J. Foschini, G. D. Golden, R. A. Valenzuela, P.W. Wolniansky, Simplified Processing for High Spectral Efficiency Wireless Communication Employing Multi-Element Arrays, IEEE Journal on Selected Areas in Communications, pp. 1841-1852, Nov. 1999.

[17] W. Gerstacker, Entzerrverfahren für die schnelle digitale Übertragung über symmetrische Leitungen, PhD thesis, Universität Erlangen-Nürnberg, Germany, 1999.
[18] G. Ginis, J.M. Cioffi, On the Relation between V-BLAST and the GDFE,
IEEE Communications Letters, pp. 364-366, Sep. 2001.
[19] G.H. Golub, C.F. Van Loan, Matrix Computations, North Oxford Academic Publishers Ltd, a subsidiary of Kogan Page Ltd, 1986.
[20] N.R. Goodman, Statistical analysis based on a certain multivariate complex
Gaussian distribution, Ann. Math. Statist., vol. 34, pp. 152-176, 1963.
[21] Paul R. Halmos, Measure Theory, Springer-Verlag New York Inc., 1974.
[22] W. Henkel, Th. Kessler, Maximizing the Channel Capacity of Multicarrier Transmission by Suitable Adaption of the Time-Domain Equalizer, IEEE Transactions on Communications, vol. 48, no. 12, pp. 2000-2004, Dec. 2000.
[23] C.A.R. Hoare, Quicksort, Computer Journal, Vol. 5, 1, 10-15, 1962.
[24] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge University Press,
1999.


[25] ITU-T, Asymmetric Digital Subscriber Line ADSL Transceivers, ITU-T, G.992.1, Jul, 1999.
[26] ITU-T, Splitterless Asymmetric Digital Subscriber Line ADSL Transceivers, ITU-T, G.992.2, Jul, 1999.
[27] ITU-T, Very high speed digital subscriber line foundation, ITU-T, G.993.1, Nov, 2001.
[28] ITU-T, Asymmetric Digital Subscriber Line ADSL Transceivers 2 (ADSL2), ITU-T, G.992.3, Jul, 2002.
[29] ITU-T, Splitterless Asymmetric Digital Subscriber Line ADSL Transceivers 2 (splitterless ADSL2), ITU-T, G.992.4, Jul, 2002.
[30] ITU-T, Asymmetric Digital Subscriber Line ADSL Transceivers - Extended bandwidth ADSL2 (ADSL2plus), ITU-T, G.992.5, May, 2003.
[31] E. Kreyszig, Introductory Functional Analysis with Applications, John Wiley & Sons Inc., 1978.
[32] A. Lampe, R. Schober, W.H. Gerstacker, Iterative Multiuser Detection
for Complex Modulation Schemes, Proc. of IEEE International Symposium
on Information Theory (ISIT), (Washington D.C.), p. 33, 2001.
[33] A. Lampe, Multiuser Detection and Channel Estimation for DS-CDMA
Systems, PhD thesis, Universitat Erlangen/Nurnberg, Germany, 2003.
[34] E.A. Lee, D.G. Messerschmitt, Digital Communication, Second Edition,
Kluwer Academic Publishers Boston/Dordrecht/London, 1994.
[35] D.J. Love, R.W. Heath, Jr., Limited Feedback Unitary Precoding For Orthogonal Space-Time Block Codes, IEEE Transactions on Signal Processing,
vol. 53, no. 1, pp. 64-73, January 2005.
[36] T. Magesacher, W. Henkel, G. Taubock, T. Nordstrom, Cable measurements supporting xDSL technologies, Elektrotechnik und Informationstechnik (e&i), Feb, 2002.
[37] M. Meurer, W. Qiu, Ch. Stimming, G. Tauboeck, G. White, A. Burr,
Outline Design for Terminal Baseband Processing and Implementation Complexity, IST-2001-32125 FLOWS Deliverable D18, October 2004.
[38] B. Mondal, R.W. Heath, Jr., Channel Adaptive Quantization for Limited
Feedback MIMO Beamforming systems, submitted to IEEE Transactions on
Signal Processing, September 2004.

[39] R.R. Müller, Power and Bandwidth Efficiency of Multiuser Systems with Random Spreading, PhD thesis, Universität Erlangen-Nürnberg, Germany, 1999.


[40] F. D. Murnaghan, The Unitary and Rotation Groups, vol. III of Lectures on
Applied Mathematics, Spartan, Washington, DC, 1962.
[41] F.D. Neeser, Communication Theory and Coding for Channels with Intersymbol Interference, PhD thesis, ETH Zurich, Switzerland, 1993.

[42] P. Ödling, W. Henkel, P. O. Börjesson, G. Tauböck, N. Petersson, et al., The Cyclic Prefix of OFDM/DMT - An Analysis, Proc. of the International Zurich Seminar on Broadband Communications, Zurich, Switzerland, Feb, 2002.
[43] B. Picinbono, Random Signals and Systems, Englewood Cliffs, NJ:
Prentice-Hall, 1993.
[44] B. Picinbono, P.Chevalier, Widely Linear Estimation with Complex Data,
IEEE Transactions on Signal Processing, vol. 43, pp. 2030-2033, Aug. 1995.
[45] G. G. Raleigh, J. M. Cioffi, Spatio-Temporal Coding for Wireless Communication, IEEE Transactions on Communications, pp. 357-366, March 1998.
[46] J.C. Roh, B.D. Rao, Channel Feedback Quantization Methods for MISO
and MIMO Systems, Proc. of IEEE Symposium on Personal, Indoor and Mobile Radio Communications, Barcelona, Spain, September 2004.
[47] J.C. Roh, B.D. Rao, Vector Quantization Techniques for Multiple-Antenna
Channel Information Feedback, Proc. of International Conference on Signal Processing and Communications (SPCOM), Bangalore, India, December
2004.
[48] T. Starr, M. Sorbara, J. Cioffi, P. Silverman, Understanding Digital Subscriber Line Technology, Prentice Hall, 2003.
[49] D. Statovci, T. Nordstrom, Adaptive Subcarrier Allocation, Power Control, and Power Allocation for Multiuser FDD-DMT Systems, Proc. of IEEE
International Conference on Communications, ICC 2004, Paris, France, June
2004.
[50] D. Statovci, T. Nordstrom, Adaptive Resource Allocation in Multiuser
FDD-DMT Systems, Proc. of the 12th European Signal Processing Conference, EUSIPCO 2004, Vienna, Austria, Sept 7 - Sept 10, 2004.
[51] J. Stoer, R. Bulirsch, Introduction to Numerical Analysis Third Edition,
Springer New York, NY, 2002.
[52] V. Tarokh, N. Seshadri, A.R. Calderbank, Space-time codes for high data
rate wireless communication: performance criterion and code construction,
IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744-765, March
1998.


[53] G. Tauböck, W. Henkel, T. Nordström, Verfahren zur Übertragung von Daten, Patent angemeldet beim Österreichischen Patentamt am 30. Aug. 2000, Vienna, Austria, Aug 30, 2000.
[54] G. Taubock, W. Henkel, MIMO Systems in the Subscriber-Line Network,
Proc. of the 5th International ODFM Workshop, Hamburg, Germany, Sept
2000.
[55] G. Taubock, Rotationally Variant Complex Channels, Proc. of the 23rd
Symposium on Information Theory in the Benelux, Louvain-la-Neuve, Belgium, May 29-31, 2002.
[56] G. Taubock, Noise Analysis of DMT, Proc. of IEEE Global Communications Conference, GLOBECOM, San Francisco, CA, USA, Dec 1-5, 2003.
[57] G. Taubock, On the Maximum Entropy Theorem for Complex Random
Vectors, Proc. of IEEE International Symposium on Information Theory, ISIT,
Chicago, Illinois, USA, June 27 - July 2, 2004.
[58] I.E. Telatar, Capacity of Multi-antenna Gaussian Channels, European
Transactions on Telecommunications, Vol. 10, No. 6, pp. 585-595, Nov/Dec
1999.
[59] A.W.M Van Den Enden, N.A.M. Verhoeckx, Discrete-Time Signal Processing, An Introduction, Prentice Hall, 1989.
[60] C. Windpassinger, T. Vencel, R.F.H. Fischer, Optimising MIMO DFE for
systems with spatial loading, Electronics Letters, vol. 38, no. 24, pp. 15911593, Nov. 2002.
[61] C. Windpassinger, R.F.H. Fischer, T. Vencel, J. B. Huber, Precoding in Multi-Antenna and Multi-User Communications, IEEE Transactions on Wireless Communications, vol. 3, no. 4, pp. 1305-1316, Jul. 2004.
[62] C. Windpassinger, Detection and Precoding for Multiple Input Multiple Output Channels, PhD thesis, Universität Erlangen-Nürnberg, Germany, 2004.
[63] R.A. Wooding, The multivariate distribution of complex normal variables,
Biometrika, vol. 43, pp. 212-215, 1956.
[64] Y.C. Yoon, H. Leib, Maximizing SNR in Improper Complex Noise and
Applications in CDMA, IEEE Communications Letters, vol. 1, pp. 5-8, Jan.
1997.


BIOGRAPHY

Georg Tauböck was born in Mödling, Austria, in 1973. He received the Dipl.-Ing. degree in electrical engineering from Vienna University of Technology in 1999 and finished his studies in Violoncello with the Diploma Examination at the Conservatory of Vienna in 2000. He joined the Telecommunications Research Center Vienna (ftw.) in 1999, where he is still working as a researcher in the strategic I0 project. He is the author of several scientific papers, two book chapters, and one patent. His research interests include Multiple - Input / Multiple - Output (MIMO), Discrete Multitone modulation (DMT), Orthogonal Frequency Division Multiplexing (OFDM), Information Theory, Free Probability Theory, Time-Frequency Analysis, and mathematics in general.