
Forty-Sixth Annual Allerton Conference

Allerton House, UIUC, Illinois, USA


September 23-26, 2008


Bounds on the MMSE of Bad LDPC Codes at Rates Above Capacity

Amir Bennatan, A. Robert Calderbank and Shlomo Shamai (Shitz)

Abstract—We present bounds on the minimum mean square error (MMSE) of LDPC codes at rates above capacity. One potential application for MMSE estimation involves cooperative communication. A relay following a compress-and-forward (CF) strategy could first compute an estimate of the transmitted codeword, to reduce the level of noise in the retransmitted signal. Our first bound is based on an analysis of the LDPC belief-propagation decoder. A second bound relies on the relationship between the mutual information and the MMSE, which was discovered by Guo et al. [9]. We compute our bounds for bad LDPC codes (requiring SNRs that are far above the Shannon limit for reliable communication to be possible) and show that such codes substantially outperform good codes. This advantage of bad codes implies an interesting degree of freedom in the design of codes for cooperative communications.

Fig. 1. A relay channel.
I. INTRODUCTION

The problem of communication at rates above capacity was traditionally considered in the context of list decoding (e.g. [6]), which involves the point-to-point communication scenario. Our interest in the problem was motivated by the following cooperative communication scenario.

Cooperative communication frequently takes the form of one or more relays which facilitate the communication between a source and a destination (see Fig. 1). Two of the most common relay communication strategies (see e.g. [3], [10]) are decode-and-forward (DF) and compress-and-forward (CF). While DF focuses on the case when the channel to the relay is strong enough to enable it to decode, CF involves scenarios when decoding at the relay is not possible. With CF, the relay helps by retransmitting its observed signal so that the destination may combine it with its own channel observation and decode using both.¹
In this paper, we are interested in the following question. When the relay cannot decode (i.e., the communication is at a rate that exceeds the capacity of the channel from the source to the relay), perhaps it can use its knowledge of the code structure to compute an MMSE estimate of the transmitted codeword. This estimate would enjoy a lower variance in comparison to the noise in the original signal (similar approaches were suggested in [24], [21], [23], [19], [20]). Interestingly, the analysis indicates that with good codes, which were optimized for point-to-point communication (and approach the capacity of such a channel), MMSE estimation breaks down and produces no benefit. More precisely, the MMSE is similar to that of uncoded communication. Bad codes, however, typically yield a substantially lower MMSE.

Amir Bennatan and A. Robert Calderbank are with the Program in Applied and Computational Mathematics (PACM), Princeton University (email: abn@princeton.edu, calderbk@Math.Princeton.EDU). Shlomo Shamai (Shitz) is with the Department of Electrical Engineering, Technion - Israel Institute of Technology (email: sshlomo@ee.technion.ac.il).

¹See [3], [10] for a precise discussion of CF.
Measson et al. [11] examined the MMSE of LDPC codes at SNR values below the codes' threshold (the minimum SNR required for reliable communication). Peleg et al. [15] were also interested in communication at such SNRs, and considered the average extrinsic information (explained in [15]) of good codes. In [11], [15], the authors were interested in an analysis of the MMSE in the context of its usefulness in designing good LDPC codes at higher levels of SNR. Thus, the authors were interested in good codes, rather than the bad codes which, as mentioned above, are the focus of this paper.
Our main contribution in this paper is rigorous bounds on the MMSE of LDPC codes as a function of the code's parameters. Our first bound is based on an analysis of the LDPC belief-propagation decoding algorithm. This algorithm can be made to produce soft estimates of the values of the transmitted codebits, rather than hard decisions. The resulting mean square estimation error is an upper bound on the minimum achievable error, the MMSE. Rigorous bounds on the performance of belief-propagation, at the limit of large block lengths, are possible using density evolution, as defined by Richardson and Urbanke [17].
A second bound on the MMSE is based on the results of Guo et al. [9], which relate the MMSE to the mutual information I(X; Y) between the random transmitted codeword X and the received vector Y. Gallager [7] was the first to obtain bounds on a similarly defined I(X; Y) over the binary symmetric channel (BSC). These bounds were extended by Burshtein et al. [2] and later by Wiechman and Sason [22] to arbitrary binary-input symmetric-output channels, including binary-input AWGN channels. The results of Guo et al. [9], however, involve the derivative of I(X; Y) with respect to the SNR. None of the above-mentioned bounds has been proven to be tight, and thus their derivatives cannot straightforwardly be applied to obtain a bound on the MMSE. In this paper, we nonetheless extend the method of [22] to bound the MMSE. Our bounds are confined to regular LDPC codes (see [17]); an analysis of irregular codes is deferred to future work.
We begin in Sec. II by introducing our notation and some background. In Sec. III we present our main results, namely bounds on the MMSE of LDPC codes. We also present a theorem bounding the MMSE of good codes, which is a variation of a theorem of Peleg et al. [15], and provide numerical results comparing bad LDPC codes to good codes. In Sec. IV we provide the proof of the validity of our second bound. Finally, Sec. V concludes the paper.
II. BACKGROUND AND NOTATION

A. General Notation and Definitions

Vectors are denoted in boldface (e.g. x) and scalar values in normal face (e.g. x). Random variables are upper cased (e.g. X) and their realizations are lower cased (e.g. x). a ⊕ b denotes the bitwise XOR between two binary variables; for two binary vectors, ⊕ denotes the component-wise XOR. For p, q ∈ [0, 1] we define p ∗ q = p(1 − q) + q(1 − p). ‖·‖ denotes the Euclidean norm.
In this paper, we consider the transmission of binary codes over the AWGN channel. We assume that BPSK signals are used, with the digits 0 and 1 being mapped to 1 and −1, respectively. X is assumed to be over the BPSK alphabet, {±1}. For a given value of X, we let X^b denote its corresponding binary representation, using the above mentioned mapping. Greek-alphabet characters (e.g. Θ) are assumed to be defined over the binary alphabet {0, 1}, unless defined otherwise (e.g. Ω will be real-valued).

C_B(snr) is the capacity, in nats, of the binary-input AWGN channel², whose equation is given by,

$$Y = \sqrt{snr}\, X + N \tag{1}$$

where Y is the channel output, X is as defined above, snr ≥ 0 and N ∼ N(0, 1).
Throughout the paper, we will also be interested in transmission of vectors X over the channel. That is,

$$\mathbf{Y} = \sqrt{snr}\, \mathbf{X} + \mathbf{N} \tag{2}$$

where X is an n-dimensional vector with BPSK-valued components, N is a Gaussian vector with independent components, N_i ∼ N(0, 1), and Y is the channel output.

Throughout the paper, we follow the convention of [9] and use upper case (MMSE and SNR) to refer to the MMSE and SNR in a general context, and lower case (e.g. snr and mmse) to refer to specific values.
B. Good Codes

The concept of good codes is central to our analysis. In [18], [15] good codes were defined as being capable of transmission at rates arbitrarily close to capacity, at vanishingly low probability of error. Since transmission with any fixed code involves a nonzero probability of error, the formal definition involves sequences of codes.

In this paper, we use the following definition. Unlike [18], our definition focuses on codes that are within ε of channel capacity, but may not necessarily achieve it. The motivation for this definition is that analysis of LDPC codes frequently focuses on sequences of codes that have a fixed, given, edge distribution, corresponding to a fixed code rate³. Although such code sequences can be found at rates arbitrarily close to capacity, with vanishingly low probability of error [12], the rate of any one such sequence is bounded away from capacity.

Definition 1: A sequence of BPSK codes {C^{(n)}}_{n=1}^∞ is ε-good at an SNR of snr if the following hold:

1) For each n, the code rate R_n of C^{(n)} is greater than or equal to C_B(snr) − ε.

2) The probability of decoding error, when the codes are used over a BPSK AWGN channel with an SNR of snr, approaches zero with n.
C. MMSE and Mutual Information

We begin with the following definition, which focuses on an arbitrary random variable X.

Definition 2: Given an n-dimensional random variable X and a nonnegative real value snr,

$$\mathrm{mmse}(\mathbf{X}, snr) = E\left[\big\|\hat{\mathbf{X}}(\mathbf{Y}) - \mathbf{X}\big\|^2\right]$$

where Y is related to X through (2) and X̂(Y) is the MMSE estimate of X given Y.

In this paper, we wish to analyze the MMSE when estimating the transmitted codeword over a channel. We therefore define the following value.

Definition 3: Given a code C, the value mmse(C, snr) is defined to equal mmse(X, snr) where X is uniformly distributed over the codewords of C.

In our analysis, we will compare the MMSE of various codes with that of uncoded transmission. We will thus be interested in the following value.

Definition 4: The uncoded MMSE, denoted mmse(uncoded, snr), is defined to equal mmse(X, snr) where X is scalar, and takes the values 1 and −1 with equal probability.

²Not to be confused with the power-constrained AWGN channel, whose capacity is (1/2) log(1 + snr).

³More precisely, a fixed design rate; see [17].
In [9, Equation (17)] it is shown that,

$$\mathrm{mmse}(uncoded, snr) = 1 - E\left[\tanh(snr - \sqrt{snr}\, N)\right] \tag{3}$$

where N ∼ N(0, 1).
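As a quick illustration (ours, not part of the paper), (3) can be evaluated by Monte Carlo sampling and cross-checked against a direct simulation of the conditional-mean estimator, which for equiprobable BPSK over (1) works out to E[X | Y = y] = tanh(√snr · y). Sample sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def mmse_uncoded(snr, n_samples=10**6):
    # Equation (3): mmse(uncoded, snr) = 1 - E[tanh(snr - sqrt(snr) N)]
    n = rng.standard_normal(n_samples)
    return 1.0 - np.mean(np.tanh(snr - np.sqrt(snr) * n))

def mmse_uncoded_direct(snr, n_samples=10**6):
    # Direct check: simulate Y = sqrt(snr) X + N per (1) and apply the
    # conditional-mean estimator E[X | Y=y] = tanh(sqrt(snr) y).
    x = rng.choice([-1.0, 1.0], size=n_samples)
    y = np.sqrt(snr) * x + rng.standard_normal(n_samples)
    return np.mean((np.tanh(np.sqrt(snr) * y) - x) ** 2)

for snr in (0.25, 1.0, 4.0):
    print(snr, round(mmse_uncoded(snr), 4), round(mmse_uncoded_direct(snr), 4))
```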


As mentioned in Sec. I, our second bound on the MMSE is based on the following theorem, due to Guo et al. [9].

Theorem 1: [9, Theorem 2] Let X and Y be defined as in Definition 2. Assume that the expectation E‖X‖² exists and is finite. Then,

$$\frac{d}{d\,snr}\, I(\mathbf{X}; \mathbf{Y}) = \frac{1}{2}\,\mathrm{mmse}(\mathbf{X}, snr) \tag{4}$$

Theorem 1 relates the MMSE to the mutual information when transmitting over a channel. It is thus useful to define I(C, snr) to equal I(X; Y) where X and Y are defined as in Definition 3. Similarly, we define I(uncoded, snr) to equal I(X; Y) where X and Y are defined as in Definition 4.

Remark 1: The binary-input AWGN channel is symmetric in the sense of [8, Page 94]. Thus, its capacity achieving distribution is uniform, and therefore I(uncoded, snr) = C_B(snr).

III. ANALYSIS OF THE MMSE

A. A Bound Based on the Analysis of Belief-Propagation

Our first bound is based on an analysis of the belief-propagation decoding algorithm for LDPC codes (Gallager [7]). A simple modification of the algorithm enables the computation of a soft estimate, rather than a hard decision, on each of the transmitted bits. The algorithm is not guaranteed to be optimal, and thus its analysis produces an upper bound on the performance of an optimal estimation algorithm.

Belief-propagation works by computing, for each transmitted code bit i, an estimate p_i of the a posteriori probability (APP) that the transmitted value X_i was 1, given all the channel outputs.⁴ Typically, the estimates p_1, ..., p_n (n is the block length) are used to compute hard decisions on the values of the bits. However, these values can also straightforwardly be used to compute soft estimates X̂_1, ..., X̂_n of the values of the bits. That is,

$$\hat{X}_i = (+1)\cdot p_i + (-1)\cdot(1 - p_i) \tag{5}$$

Each value p_i is a function of the random channel outputs. As such, it is a random variable. Richardson and Urbanke [17] suggested an efficient algorithm, called density evolution, to compute the distribution of this random variable. The algorithm can thus be extended to obtain the distribution of X̂_i, and of the mean square error of this estimate.

⁴Typically, at each iteration of belief-propagation, several different messages corresponding to each code bit are computed, one for each outgoing edge in the LDPC bipartite graph (see e.g. [7], [17] for a detailed description of the algorithm). The messages are different because of the extrinsic information rule. However, at the very last iteration of the algorithm, this rule need not be obeyed, and thus one value per code bit is available.
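As a toy illustration of (5) (our sketch, not from the paper), the snippet below forms each APP from its own channel output alone, i.e., before any belief-propagation iterations, and measures the mean square error of the resulting soft estimates. Variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
snr, n = 1.0, 10**6

x = rng.choice([-1.0, 1.0], size=n)            # transmitted BPSK values
y = np.sqrt(snr) * x + rng.standard_normal(n)  # AWGN channel, as in (2)

llr = 2.0 * np.sqrt(snr) * y        # channel LLR (cf. Sec. IV)
p = 1.0 / (1.0 + np.exp(-llr))      # APP that X_i = +1, from the channel only

x_hat = (+1.0) * p + (-1.0) * (1.0 - p)   # soft estimate (5)
print("per-bit squared error:", np.mean((x_hat - x) ** 2))
```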

The precise application of density evolution requires a number of assumptions, which are justified in Appendix I. Relying on these assumptions, we obtain the following theorem. The theorem considers the average performance of a code selected at random from an ensemble of codes; specifically, the ensemble of LDPC codes defined using regular bipartite graphs (see e.g. [17]). This average performance depends on the parameters j and k which characterize the graphs.⁵

Theorem 2: Consider the ensemble of LDPC codes of length n, based on (j, k)-regular bipartite graphs. Assume a randomly chosen code from this ensemble is used to transmit over a Gaussian channel with SNR = snr. Let BP(j, k, snr) denote the mean square estimation error when using the belief-propagation algorithm, averaged over all codes in the ensemble. Let DE(j, k, snr) be the corresponding value as predicted by density evolution. Let mmse(j, k, snr) be the MMSE (i.e., corresponding to an optimal estimation algorithm), similarly averaged over all codes in the ensemble. Then the following holds,

$$\frac{1}{n}\,\mathrm{mmse}(j, k, snr) \le \frac{1}{n}\,\mathrm{BP}(j, k, snr) = \frac{1}{n}\,\mathrm{DE}(j, k, snr) + O\!\left(\frac{1}{n}\right)$$
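Density evolution proper propagates message densities; the sketch below (ours, not the authors' implementation) approximates it by tracking a large population of LLR samples ("population dynamics"), under the all-one codeword assumption of Appendix I, and evaluates the per-bit mean square error of the final soft estimates (5), i.e., an estimate of (1/n)DE(j, k, snr). Population size, iteration count and the clipping constant are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def de_soft_mse(j, k, snr, n_iter=50, pop=200_000):
    # Channel LLRs under the all-one codeword assumption (X_i = +1):
    # 2*sqrt(snr)*Y_i = 2*snr + 2*sqrt(snr)*N_i
    ch = 2.0 * snr + 2.0 * np.sqrt(snr) * rng.standard_normal(pop)
    v = ch.copy()                        # variable-to-check messages
    u = np.zeros(pop)                    # check-to-variable messages
    for _ in range(n_iter):
        t = np.ones(pop)
        for _ in range(k - 1):           # check node: tanh rule over k-1 inputs
            t *= np.tanh(rng.permutation(v) / 2.0)
        u = 2.0 * np.arctanh(np.clip(t, -0.9999999, 0.9999999))
        v = ch.copy()
        for _ in range(j - 1):           # variable node: channel LLR + j-1 inputs
            v += rng.permutation(u)
    app = ch.copy()                      # final APP uses all j inputs (cf. footnote 4)
    for _ in range(j):
        app += rng.permutation(u)
    x_hat = np.tanh(app / 2.0)           # soft estimate (5), since 2*p_i - 1 = tanh(LLR/2)
    return float(np.mean((x_hat - 1.0) ** 2))

print(de_soft_mse(2, 4, 1.04))
```

For (j, k) = (2, 4) near snr = 1.04, the returned value can be compared against the 0.248 figure quoted in Sec. III-D.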

B. A Bound Based on the Derivative of the Mutual Information

Our second bound is based on the approach of Guo et al. [9], as described in Sec. I. Its proof is provided in Sec. IV. The bound is stated in the theorem below. In this theorem, we use LDPC codes as defined by Gallager [7], which we refer to as the Gallager LDPC ensemble. For completeness, the details of this ensemble are provided in Appendix II.

Theorem 3: Let C be a code from the (j, k)-regular Gallager LDPC ensemble, of block-length n. Then:

$$\frac{1}{n}\,\mathrm{mmse}(C, snr) \le \mathrm{mmse}(uncoded, snr) - \frac{2}{k}\sum_{p=1}^{\infty} \frac{1}{2p(2p-1)}\,\frac{d}{d\,snr}\, g_p(snr)^k \tag{6}$$

where,

$$g_p(snr) \triangleq E\left[\tanh^{2p}(snr + \sqrt{snr}\, N)\right] \tag{7}$$

where N ∼ N(0, 1).

The derivative of g_p(snr)^k is easily computed to be

$$\frac{d}{d\,snr}\, g_p(snr)^k = k\, g_p(snr)^{k-1}\, E\left[2p\left(1 + \frac{N}{2\sqrt{snr}}\right) \tanh^{2p-1}(snr + \sqrt{snr}\,N)\,\mathrm{sech}^2(snr + \sqrt{snr}\,N)\right] \tag{8}$$

In numerical computations of (6), it is possible to compute only a finite number of the elements of the infinite sum it involves. The following lemma implies that the neglected elements are nonnegative, and so the computed value remains a valid upper bound. Its proof is provided in [1].

Lemma 1: g_p(snr)^k is a non-descending function of snr (i.e., its derivative is nonnegative).

⁵In this paper, we focus on regular LDPC codes. The extension of the theorem to irregular codes is immediate.
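For illustration (our sketch, not the authors' code), the truncated right-hand side of (6) can be computed with Gauss-Hermite quadrature for the expectations in (3), (7) and (8); the number of quadrature nodes and the truncation depth p_max are arbitrary, and the truncation remains a valid upper bound by Lemma 1.

```python
import numpy as np

def gh_expect(fn, n_nodes=80):
    # E[fn(N)] for N ~ N(0,1), via probabilists' Gauss-Hermite quadrature
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    return np.dot(weights, fn(nodes)) / np.sqrt(2.0 * np.pi)

def g_p(p, snr):
    # (7): g_p(snr) = E[tanh^{2p}(snr + sqrt(snr) N)]
    return gh_expect(lambda n: np.tanh(snr + np.sqrt(snr) * n) ** (2 * p))

def dgpk(p, k, snr):
    # (8): derivative of g_p(snr)^k with respect to snr
    f = lambda n: (2 * p * (1 + n / (2 * np.sqrt(snr)))
                   * np.tanh(snr + np.sqrt(snr) * n) ** (2 * p - 1)
                   / np.cosh(snr + np.sqrt(snr) * n) ** 2)
    return k * g_p(p, snr) ** (k - 1) * gh_expect(f)

def theorem3_bound(k, snr, p_max=50):
    # Right-hand side of (6), truncated at p_max terms.
    # Note: (6) depends on the ensemble only through k.
    unc = 1.0 - gh_expect(lambda n: np.tanh(snr - np.sqrt(snr) * n))  # (3)
    s = sum(dgpk(p, k, snr) / (2 * p * (2 * p - 1)) for p in range(1, p_max + 1))
    return unc - (2.0 / k) * s

print(theorem3_bound(k=4, snr=1.04))
```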


C. Good Codes

Theorem 3 relates the MMSE of LDPC codes to that of uncoded transmission. Theorem 4 below focuses on the MMSE of good codes, and similarly relates it to that of uncoded transmission. As noted in Sec. I, Peleg et al. [15] examined the average extrinsic information of good codes. They showed that this information approaches zero as the rate of the codes approaches the channel capacity and the probability of error approaches zero. Theorem 4 is a variation of this result, which focuses on the MMSE and on our slightly different definition of good codes. The proof of the theorem is provided in [1].

Theorem 4: Let {C^{(n)}} be an ε-good sequence of linear codes, at an SNR of snr*. Assume that the block length of C^{(n)} is n. Then the following holds:

1) If snr ≤ snr* − 8√ε and n is large enough, then:

$$\mathrm{mmse}(uncoded, snr) - 8\sqrt{\varepsilon} \le \frac{1}{n}\,\mathrm{mmse}(C^{(n)}, snr) \le \mathrm{mmse}(uncoded, snr)$$

2) If snr > snr*, then (1/n) mmse(C^{(n)}, snr) approaches zero with n.

Part 2 of this theorem is obtained from Definition 1, by upper bounding the MMSE with the mean-square error when maximum-likelihood decoding is used as a suboptimal estimator, and observing that the mean-square error, prior to normalization by n, is upper bounded by 4n (the maximum square-distance between any two words).
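In code form, the limiting good-codes curve used in Sec. III-D below is simply a thresholded version of the uncoded MMSE. The sketch (ours) neglects ε, as the text does, and takes the rate-1/2 threshold snr* = 1.044 from Sec. III-D.

```python
import numpy as np

rng = np.random.default_rng(3)

def mmse_uncoded(snr, n_samples=10**6):
    n = rng.standard_normal(n_samples)
    return 1.0 - np.mean(np.tanh(snr - np.sqrt(snr) * n))   # (3)

def good_code_mmse(snr, snr_star=1.044):
    # Theorem 4 with eps -> 0: uncoded MMSE below snr*, zero above it
    return 0.0 if snr > snr_star else mmse_uncoded(snr)

for s in (0.5, 1.0, 1.04, 1.05, 1.2):
    print(s, round(good_code_mmse(s), 4))
```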
D. Numerical Results

In Fig. 2, we used Theorems 2, 3 and 4 to compare bad LDPC codes to good codes. By [12], [22], good LDPC codes are characterized by large parameters j and k. In Fig. 2, we therefore considered LDPC (2,4) codes, i.e., codes characterized by low values of these parameters. In our application of the bounds of Theorems 2 and 4, we have assumed infinitely large n, and neglected ε (which, in Theorem 4, can be made arbitrarily small).

When examining the performance of good codes, the sharp transition at snr = 1.044, as prescribed by Theorem 4, may be observed. Our tightest bound on the MMSE of LDPC (2,4) codes was obtained using Theorem 2, relying on belief-propagation analysis. At an SNR of 1.04, the MMSE of LDPC (2,4) codes (normalized by n) is upper bounded by 0.248. The equivalent value for good codes of the same rate, by comparison, is 0.437, at least 76% higher than LDPC (2,4).
Fig. 2. Comparison between the MMSE of LDPC (2,4) codes and good codes of the same rate. (Curves: LDPC(2,4) belief-propagation bound; LDPC(2,4) mutual-information bound; "good" codes of rate 1/2. Axes: 1/n MMSE versus SNR.)

IV. PROOF OF THEOREM 3

A. Review of the Approach of Wiechman and Sason [22]

Theorem 1 enables us to shift our attention from mmse(C, snr) to the derivative of I(C, snr). As mentioned


in Sec. I, our analysis involves modifying the method of Wiechman and Sason [22], which focused on a similarly defined I(C, snr). We thus begin by reviewing their analysis. We repeat part of it here, first for completeness and to introduce some useful notation. Furthermore, in the analysis of [22], a number of inequalities can easily be strengthened to become equalities. Such strengthening, which was inconsequential to their results, is crucial to ours, which involve bounding the derivative of I(C, snr) rather than its value.
Let X and Y be defined as in Definition 3. Given a channel output Y_i, i = 1, ..., n, we define random variables as in [22, Proposition 4.1]:

$$\Omega_i = |2\sqrt{snr}\, Y_i|, \qquad \Theta_i = \begin{cases} 0, & Y_i > 0 \\ 1, & Y_i < 0 \\ 0 \text{ or } 1 \text{ w.p. } \tfrac{1}{2}, & Y_i = 0 \end{cases} \tag{9}$$

where |x| denotes the absolute value of x. Note that with the channel defined by (1), 2√snr Y_i coincides with the log-likelihood-ratio (LLR) value computed by the LDPC decoder in its first decoding iteration [22]. Thus, our definitions of Ω_i and Θ_i coincide with those of [22]. We also define Ỹ_i = (Ω_i, Θ_i). Clearly, switching from Y to Ỹ involves no loss of information, and in the sequel we assume that the channel output is Ỹ.

Following [22], we also observe that Ω_i is independent of the transmitted X. Combined with the Markov relation {Ω_j}_{j≠i} - X - X_i - Y_i - Ω_i, this implies that the random variables Ω_i, i = 1, ..., n are independent of one another.

Given a fixed Ω_i = ω_i, the transition probabilities from X_i^b (recall that X_i^b is the binary representation of the BPSK-valued X_i, see Sec. II-A) to Θ_i are equal to those of a binary-symmetric channel (BSC) with crossover probability p(ω_i), where,

$$p(\omega) = \frac{1}{2}\left(1 - \tanh\frac{\omega}{2}\right) \tag{10}$$

We can thus define the noise Φ_i, which is a binary random variable, dependent on Ω_i and distributed as Φ_i ∼ Bernoulli(p(Ω_i)), such that Θ_i = X_i^b ⊕ Φ_i. Using arguments similar to the ones involving the components of Ω, the random variables Φ_i, i = 1, ..., n are also independent of one another.

In the sequel, we will use normal face (rather than bold face) to correspond to uncoded transmission. That is, X and Y are defined as in Definition 4, and the scalars Ỹ, Ω, Θ, Φ are obtained from X and Y in exactly the same way as the vectors Ỹ, Ω, Θ, Φ were obtained from X and Y.
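As a sanity check of the BSC equivalence (9)-(10), the following sketch (ours; bin width and sample size arbitrary) compares the empirical crossover rate of Θ_i, conditioned on Ω_i falling near a value ω, with p(ω):

```python
import numpy as np

rng = np.random.default_rng(4)
snr, n = 1.0, 10**6

xb = rng.integers(0, 2, size=n)                 # binary codebits
x = np.where(xb == 0, 1.0, -1.0)                # BPSK map of Sec. II-A
y = np.sqrt(snr) * x + rng.standard_normal(n)   # channel (1)

omega = np.abs(2.0 * np.sqrt(snr) * y)          # (9): LLR magnitude
theta = (y < 0).astype(np.uint8)                # (9): sign bit

for w in (0.5, 1.0, 2.0, 4.0):
    sel = np.abs(omega - w) < 0.05
    emp = np.mean(theta[sel] != xb[sel])        # empirical crossover given Omega near w
    print(w, round(emp, 4), round(0.5 * (1.0 - np.tanh(w / 2.0)), 4))  # vs. (10)
```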
We are now ready to examine I(C, snr).

$$I(C, snr) = I(\mathbf{X}; \mathbf{Y}) = I(\mathbf{X}; \tilde{\mathbf{Y}}) = H(\tilde{\mathbf{Y}}) - H(\tilde{\mathbf{Y}} \mid \mathbf{X}) \tag{11}$$

We begin by examining the second of the two terms.

$$H(\tilde{\mathbf{Y}} \mid \mathbf{X}) \overset{(a)}{=} \sum_{i=1}^{n} H(\tilde{Y}_i \mid X_i) \overset{(b)}{=} n\,H(\tilde{Y} \mid X) = n\,H(\tilde{Y}) - n\,I(X; \tilde{Y}) \overset{(c)}{=} n\,H(\tilde{Y}) - n\,C_B(snr) \overset{(d)}{=} n\left(H(\Omega) + H(\Theta \mid \Omega)\right) - n\,C_B(snr) \overset{(e)}{=} n\left(H(\Omega) + 1\right) - n\,C_B(snr) \tag{12}$$

(a) follows by the Markov relation X - X_i - Ỹ_i. (b) follows by the fact that the channel X → Ỹ_i is symmetric, and thus H(Ỹ_i | X_i) = H(Ỹ_i | X_i = 1) = H(Ỹ_i | X_i = −1) = H(Ỹ | X). (c) follows by Remark 1 (Sec. II-C). (d) follows by the definition of Ỹ = (Ω, Θ), and finally (e) follows by the fact that given Ω, Θ is the output of a BSC whose input is X^b; X^b ∼ Bernoulli(1/2) and thus Θ ∼ Bernoulli(1/2). Note that the above analysis strengthens a similar analysis in [22] (the derivation leading to Equation (60) in that paper).

Turning to the first term in (11), we obtain,

$$H(\tilde{\mathbf{Y}}) \overset{(a)}{=} H(\mathbf{\Omega}) + H(\mathbf{\Theta} \mid \mathbf{\Omega}) \overset{(b)}{=} n\,H(\Omega) + H(\mathbf{\Theta} \mid \mathbf{\Omega}) \tag{13}$$

where Θ is defined as above. (a) follows by the definition of Ỹ = (Ω, Θ). (b) follows from the above mentioned independence between the components of Ω.

Combining (11), (12), and (13), we obtain:

$$I(C, snr) = n\,C_B(snr) + H(\mathbf{\Theta} \mid \mathbf{\Omega}) - n \tag{14}$$

We now focus on H(Θ | Ω). Following Gallager [7], we observe that Θ can be uniquely represented as (S, X̃^b) (the relation between X̃ and X̃^b is explained in Sec. II-A), where S = HΘ is the syndrome vector, H is the parity check matrix of the LDPC code in question, and X̃^b is a codeword (not to be confused with the true transmitted codeword X^b). Thus,

$$H(\mathbf{\Theta} \mid \mathbf{\Omega}) = H(\tilde{\mathbf{X}}^b, \mathbf{S} \mid \mathbf{\Omega}) = H(\tilde{\mathbf{X}}^b \mid \mathbf{S}, \mathbf{\Omega}) + H(\mathbf{S} \mid \mathbf{\Omega}) \tag{15}$$

We now argue that,

$$H(\tilde{\mathbf{X}}^b \mid \mathbf{S}, \mathbf{\Omega}) = nR \tag{16}$$

In doing so, we again strengthen an argument that was made in [22]. We may define X̃^b = Θ ⊕ r(S), where r(S) is some representative of the coset of C corresponding to the syndrome S. We now observe that Θ = X^b ⊕ Φ. Since X^b is a codeword, we have HX^b = 0 and thus S = HΘ = HΦ. Thus, we obtain,

$$\tilde{\mathbf{X}}^b = \mathbf{\Theta} \oplus r(\mathbf{H}\mathbf{\Phi}) = \mathbf{X}^b \oplus \left(\mathbf{\Phi} \oplus r(\mathbf{H}\mathbf{\Phi})\right)$$

X^b is independent of Ω, of Φ (as discussed above) and consequently of S = HΦ. Since X is uniformly distributed over C, we obtain that given the values of Ω and of S = HΦ, X̃ too is uniformly distributed over C. Thus H(X̃^b | S, Ω) = nR.

Combining (14), (15) and (16) we obtain,

$$I(C, snr) = n\,C_B(snr) + H(\mathbf{S} \mid \mathbf{\Omega}) - n(1 - R) \tag{17}$$

The analysis of [22] now proceeds to bound H(S | Ω) by Σ_{i=1}^{nj/k} H(S_i | Ω). In this paper, we follow a different course of action, designed to bound the derivative of I(C, snr) rather than its value.

B. Analysis of H(S | Ω)

We now focus on the second term in (17). We assume the syndrome components S = [S_1, ..., S_{jn/k}] are ordered as in Appendix II. Let S^(1) = [S_1, ..., S_{n/k}] denote the syndromes that correspond to the first submatrix in Fig. 3, and S^(2) = [S_{n/k+1}, ..., S_{jn/k}] denote the rest of the syndromes. Thus we may write,

$$H(\mathbf{S} \mid \mathbf{\Omega}) = H(\mathbf{S}^{(1)}, \mathbf{S}^{(2)} \mid \mathbf{\Omega}) = H(\mathbf{S}^{(1)} \mid \mathbf{\Omega}) + H(\mathbf{S}^{(2)} \mid \mathbf{S}^{(1)}, \mathbf{\Omega}) \tag{18}$$

We begin with the first term in (18). By the above discussion, S = HΦ. Letting H^(1) denote the first submatrix in the construction of H (Appendix II), we obtain that S^(1) = H^(1)Φ. Thus we obtain,

$$S_i = \bigoplus_{m=(i-1)k+1}^{ik} \Phi_m, \qquad i = 1, ..., n/k \tag{19}$$

where ⊕ denotes modulo-2 sum. The syndrome components {S_i}_{i=1,...,n/k} are functions of distinct sets of independent random variables {Φ_m}, and thus they are independent. We therefore obtain,

$$H(\mathbf{S}^{(1)} \mid \mathbf{\Omega}) = \sum_{i=1}^{n/k} H(S_i \mid \mathbf{\Omega}) \tag{20}$$

Following direct lines as in [22, Equation (65)], we obtain⁶,

$$H(S_i \mid \mathbf{\Omega}) = 1 - \sum_{p=1}^{\infty} \frac{1}{2p(2p-1)} \left(E\left[\tanh^{2p}\left(\frac{\Omega}{2}\right)\right]\right)^k \tag{21}$$

⁶A ln 2 factor that appears in [22, Equation (65)] was removed because, in the context of our discussion, I(X; Y) is evaluated in nats rather than bits as in [22].

We now observe that,

$$E\left[\tanh^{2p}\left(\frac{\Omega}{2}\right)\right] \overset{(a)}{=} E\left[\tanh^{2p}\left(\frac{|2\sqrt{snr}\,Y|}{2}\right)\right] \overset{(b)}{=} E\left[\tanh^{2p}(|snr\,X + \sqrt{snr}\,N|)\right] \overset{(c)}{=} E\left[\tanh^{2p}(|snr \cdot 1 + \sqrt{snr}\,N|)\right] \overset{(d)}{=} E\left[\tanh^{2p}(snr + \sqrt{snr}\,N)\right] \overset{(e)}{=} g_p(snr) \tag{22}$$

where (a) follows by (9), (b) follows by the channel equation (1), (c) follows by the symmetry of the distribution of N and by the fact that X ∈ {±1}, (d) follows by the fact that tanh is an odd function, and (e) follows by the definition (7) of g_p(snr). Combining (20), (21) and (22), we obtain:

$$H(\mathbf{S}^{(1)} \mid \mathbf{\Omega}) = \frac{n}{k}\left[1 - \sum_{p=1}^{\infty} \frac{1}{2p(2p-1)}\, g_p(snr)^k\right] \tag{23}$$

Combining (17), (18) and (23), we obtain,

$$I(C, snr) = n\,C_B(snr) + \frac{n}{k}\left[1 - \sum_{p=1}^{\infty} \frac{1}{2p(2p-1)}\, g_p(snr)^k\right] + H(\mathbf{S}^{(2)} \mid \mathbf{S}^{(1)}, \mathbf{\Omega}) - n(1-R)$$

Taking the derivative with respect to snr and multiplying by 2/n, we obtain,

$$\frac{2}{n}\frac{d}{d\,snr}\, I(C, snr) = 2\frac{d}{d\,snr}\, C_B(snr) - \frac{2}{k}\sum_{p=1}^{\infty} \frac{1}{2p(2p-1)}\frac{d}{d\,snr}\, g_p(snr)^k + \frac{2}{n}\frac{d}{d\,snr}\, H(\mathbf{S}^{(2)} \mid \mathbf{S}^{(1)}, \mathbf{\Omega}) \tag{24}$$

By Theorem 1, the left hand side of the above equality equals (1/n) mmse(C, snr). By Remark 1 and Theorem 1, the first term on the right hand side is equal to mmse(uncoded, snr). Thus, to complete the proof, we must show that the derivative of H(S^(2) | S^(1), Ω) is non-positive. This will be the focus of the following section.

C. Analysis of H(S^(2) | S^(1), Ω)

In this section, we show that H(S^(2) | S^(1), Ω) is a non-ascending function of snr, and thus its derivative is non-positive. We begin by defining,

$$f(\omega_1, ..., \omega_n) = H(\mathbf{S}^{(2)} \mid \mathbf{S}^{(1)}, (\Omega_1, ..., \Omega_n) = (\omega_1, ..., \omega_n)) \tag{25}$$

We proceed by proving the following lemma.

Lemma 2: Let ω_1, ..., ω_n and ω̄_1, ..., ω̄_n be non-negative real values, such that ω̄_i ≥ ω_i for all i = 1, ..., n. Then,

$$f(\bar\omega_1, ..., \bar\omega_n) \le f(\omega_1, ..., \omega_n) \tag{26}$$

Proof: Since S = HΦ (as discussed above), each syndrome is a sum of random variables from the set {Φ_1, ..., Φ_n}. The condition (Ω_1, ..., Ω_n) = (ω_1, ..., ω_n) in (25) determines the distributions of {Φ_1, ..., Φ_n}. That is, Φ_i ∼ Bernoulli(p(ω_i)) where p(·) was defined by (10). Consequently, it also determines the distributions of the syndromes S_1, ..., S_{jn/k}.

We let S′_1, ..., S′_{jn/k}, S′^(1), S′^(2) and Φ′_1, ..., Φ′_n denote random variables conditioned on (Ω_1, ..., Ω_n) = (ω_1, ..., ω_n). We similarly define S″_1, ..., S″_{jn/k}, S″^(1), S″^(2) and Φ″_1, ..., Φ″_n to correspond to a conditioning on (Ω_1, ..., Ω_n) = (ω̄_1, ..., ω̄_n).

For each i = 1, ..., n, since ω̄_i ≥ ω_i ≥ 0 (by the conditions of the lemma), we have by (10), 0 < p(ω̄_i) ≤ p(ω_i) ≤ 1/2. Therefore, the random noise variable Φ′_i is stochastically degraded with respect to Φ″_i, in the sense that we may define an independent random variable Δ_i such that Φ″_i ⊕ Δ_i is identically distributed with Φ′_i. In the sequel we will assume, with a slight abuse of notation, that Φ′_i = Φ″_i ⊕ Δ_i, i = 1, ..., n, where the random variables {Δ_i}_{i=1,...,n} are independent of one another and of {Φ″_i}_{i=1,...,n}. We also define Δ = [Δ_1, ..., Δ_n] and S̃ = HΔ. With these definitions, we have S′ = S″ ⊕ S̃, where S″ and S̃ are independent of one another.

Finally, we are ready to examine the function f(·).

$$\begin{aligned} f(\omega_1, ..., \omega_n) &= H(\mathbf{S}^{(2)} \mid \mathbf{S}^{(1)}, (\Omega_1, ..., \Omega_n) = (\omega_1, ..., \omega_n)) \\ &\overset{(a)}{=} H(\mathbf{S}'^{(2)} \mid \mathbf{S}'^{(1)}) \\ &\overset{(b)}{\ge} H(\mathbf{S}'^{(2)} \mid \mathbf{S}'^{(1)}, \tilde{\mathbf{S}}^{(1)}, \tilde{\mathbf{S}}^{(2)}) \\ &\overset{(c)}{=} H(\mathbf{S}'^{(2)} \oplus \tilde{\mathbf{S}}^{(2)} \mid \mathbf{S}'^{(1)} \oplus \tilde{\mathbf{S}}^{(1)}, \tilde{\mathbf{S}}^{(1)}, \tilde{\mathbf{S}}^{(2)}) \\ &= H(\mathbf{S}''^{(2)} \mid \mathbf{S}''^{(1)}, \tilde{\mathbf{S}}^{(1)}, \tilde{\mathbf{S}}^{(2)}) \\ &\overset{(d)}{=} H(\mathbf{S}''^{(2)} \mid \mathbf{S}''^{(1)}) \\ &\overset{(e)}{=} f(\bar\omega_1, ..., \bar\omega_n) \end{aligned}$$

Steps (a) and (e) follow by the definitions of S′^(1), S′^(2) and S″^(1), S″^(2), respectively. Step (b) follows by the fact that conditioning can only reduce entropy. In step (c), we have applied an invertible transformation on S′^(1) and S′^(2) (that is, it is invertible when S̃^(1), S̃^(2) are given). In step (d), we have exploited the fact that S″^(1) and S″^(2) are independent of S̃^(1) and S̃^(2). This concludes the proof of the lemma.

Recall that our objective is to show that H(S^(2) | S^(1), Ω) is a non-ascending function of snr. By (25), we have:

$$H(\mathbf{S}^{(2)} \mid \mathbf{S}^{(1)}, \mathbf{\Omega}) = E\, f(\Omega_1, ..., \Omega_n) \tag{27}$$

By (9) and (1), Ω_i = |2 snr X_i + 2√snr N_i|. It is thus clear that with increasing snr, larger values of Ω_i have larger probability. Thus, at first glance it would appear from (27) and Lemma 2 that H(S^(2) | S^(1), Ω) is a descending function of snr, as desired. However, Lemma 2 requires a simultaneous condition on all of the arguments of f(·) for (26) to be valid.

To obtain our desired result, we apply the following technique. Let F(ω; snr) denote the cumulative distribution function (CDF) of Ω_i, for a fixed given value of snr (F(·) is independent of i because the components of Ω are identically distributed). We let Ψ_i = F(Ω_i; snr). Since Ω_i is a continuous random variable, Ψ_i is uniformly distributed in [0, 1] (see e.g. [14]). Furthermore, F(ω; snr) is an invertible function of ω in the range ω ≥ 0, and thus we may write,

$$H(\mathbf{S}^{(2)} \mid \mathbf{S}^{(1)}, \mathbf{\Omega}) = E\, f\!\left(F^{-1}(\Psi_1; snr), ..., F^{-1}(\Psi_n; snr)\right) \tag{28}$$

where F^{-1}(ψ; snr) is the inverse of F(ω; snr) with respect to ω. In Appendix III we prove that F^{-1}(ψ; snr) is a non-descending function of snr for fixed ψ. Thus, if snr″ > snr′ then,

$$F^{-1}(\psi_i; snr'') \ge F^{-1}(\psi_i; snr'), \qquad i = 1, ..., n$$

And therefore, we may apply Lemma 2 to obtain

$$f\!\left(F^{-1}(\psi_1; snr''), ..., F^{-1}(\psi_n; snr'')\right) \le f\!\left(F^{-1}(\psi_1; snr'), ..., F^{-1}(\psi_n; snr')\right)$$

Recall that the distributions of Ψ_1, ..., Ψ_n are invariant to snr. Therefore, taking the expectations of both sides of this inequality, and invoking (28), we will have obtained that H(S^(2) | S^(1), Ω) at snr″ is less than or equal to its value at snr′. This would imply that H(S^(2) | S^(1), Ω) is non-ascending, as desired. Combined with the discussion following (24), this concludes the proof of Theorem 3.
V. CONCLUSION

Our analysis indicates that codes that are optimal for point-to-point communications (good codes) are not optimal in terms of the MMSE at a node which experiences low levels of SNR (e.g., a relay). In the context of the specific cooperative communication scenario of Sec. I, this implies that other codes, which are typically considered bad, may potentially outperform them, or narrow the gap to them (without sacrificing communication rates). The ability to compensate for bad point-to-point performance by better MMSE at a relay constitutes a new potential degree of freedom in the design of codes for cooperative communications.

In this context, an important open question involves the benefit of MMSE estimation in terms of the rate at the destination. MMSE estimation of codes over a BPSK alphabet is nonlinear. Thus, the resulting estimation error is not independent of the transmitted codeword, rendering the analysis hard. Further research will focus on this problem.

In this paper, we have focused on bad LDPC codes. However, other non-capacity-achieving codes, like convolutional, Reed-Muller and Reed-Solomon codes, may potentially exhibit similar advantages. Such codes provide many benefits in terms of delay and decoding complexity. Thus, an analysis of their application to cooperative communication with MMSE estimation at the relay is of practical interest.

Lastly, interest has recently arisen in structured codes, following the pioneering works of Philosof and Zamir [16] and Nazer and Gastpar [13]. In their work, they compared structured codes to random ones (good codes are frequently random). However, structured codes are not synonymous with bad ones. In fact, they are frequently capacity-approaching (see e.g. [5]) and therefore good in the sense of Definition 1. Thus, the focus of the discussions of [16], [13] is very different from ours.
APPENDIX I
APPLICATION OF DENSITY EVOLUTION

We now discuss two conditions which need to be satisfied in order for density evolution to be valid, that is, for the distribution computed by density evolution to correctly correspond to the distribution of each of the p_i, as defined in Sec. III-A.

The first condition is known as the all-one codeword assumption. With density evolution, the distribution of each of the values p_i that is computed is conditioned on the event that the transmitted codeword was the all-one codeword (assuming the BPSK representation of signals, see Sec. II-A). In reality, however, the transmitted codeword may be different. Nevertheless, Richardson and Urbanke [17] have shown that the probability of error, when belief-propagation decoding is used, is the same for all transmitted codewords. Thus, an analysis of the algorithm for one particular case is sufficient. A simple modification of [17, Lemma 1] allows us to extend the all-one codeword assumption to an analysis of the mean-square estimation error as well.

The second condition involves loops. Density evolution becomes invalid if the neighborhood graph, according to which p_i is computed, contains a loop [17]. However, the probability that this occurs for any particular p_i, in a randomly chosen code from an LDPC bipartite ensemble, diminishes as O(1/n) (n is the block length). Furthermore, by (5), the square estimation error is upper bounded by 4. Thus, the mean-square error as predicted by density evolution is correct up to an additive term of order O(1/n).
APPENDIX II
THE (j, k)-REGULAR GALLAGER LDPC ENSEMBLE

A Gallager (j, k)-regular LDPC code is defined for positive integers j and k. Its parity-check matrix is obtained [7] by combining j submatrices (see Fig. 3), constructed as follows. The first submatrix is given by Fig. 4. That is, in the i-th row, columns (i − 1)k + 1, ..., ik are 1, and all the rest are zero. Each of the other submatrices (numbered 2, ..., j) is obtained by applying a permutation π_l, l = 2, ..., j, on the columns of the first submatrix. The Gallager ensemble contains all codes whose parity-check matrices were constructed this way, i.e., each one corresponds to some choice of permutations π_2, ..., π_j.

Examining this construction, it is easy to observe that the weight (number of ones) of each row is k, and the weight of each column is j. Each submatrix contains n/k rows. The design rate of the LDPC code defined by this parity-check matrix is R = 1 − j/k. The true rate (see [7]) is guaranteed to be at least R.

Fig. 3. A Gallager LDPC code.

Fig. 4. The first submatrix in a Gallager LDPC code.
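A minimal constructive sketch of this ensemble (ours; the parameters n, j, k below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def gallager_parity_check(n, j, k):
    # First submatrix (Fig. 4): row i has ones in columns (i-1)k+1, ..., ik
    assert n % k == 0
    rows = n // k
    first = np.zeros((rows, n), dtype=np.uint8)
    for i in range(rows):
        first[i, i * k:(i + 1) * k] = 1
    # Submatrices 2, ..., j: column permutations pi_2, ..., pi_j of the first
    blocks = [first] + [first[:, rng.permutation(n)] for _ in range(j - 1)]
    return np.vstack(blocks)            # (jn/k) x n, row weight k, column weight j

H = gallager_parity_check(n=20, j=2, k=4)
print(H.sum(axis=1))                    # every row weight is k = 4
print(H.sum(axis=0))                    # every column weight is j = 2
print("design rate:", 1 - 2 / 4)        # R = 1 - j/k
```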

APPENDIX III
ANALYSIS OF F^{-1}(ψ; snr)

In this appendix, we prove that F^{-1}(ψ; snr), defined in Sec. IV-C, is a non-descending function of snr for fixed ψ. To do so, we first examine F(ω; snr) in the range ω ≥ 0. As noted in Sec. IV-C, by (9) and (1), Ω_i = |2 snr X_i + 2√snr N_i|. Using arguments similar to the ones leading to (22), the distribution of Ω_i is identical to that of |2 snr + 2√snr N| where N ∼ N(0, 1). Thus,

$$F(\omega; snr) = \Pr[\Omega \le \omega] = Q\!\left(\frac{2\,snr - \omega}{2\sqrt{snr}}\right) - Q\!\left(\frac{2\,snr + \omega}{2\sqrt{snr}}\right)$$

where Q(·) is the standard normal Q-function. Taking the derivative with respect to snr, and applying simple algebraic manipulations, we obtain

$$\frac{d}{d\,snr}\, F(\omega; snr) = -\frac{1}{2\sqrt{2\pi}\, snr^{3/2}}\, \exp\!\left(-\frac{\frac{1}{4}\omega^2 + snr^2}{2\,snr}\right) \left(\omega \cosh\frac{\omega}{2} + 2\,snr\, \sinh\frac{\omega}{2}\right)$$

The above derivative is clearly non-positive in the range ω ≥ 0. Thus, F(ω; snr) is non-ascending as a function of snr in this range. We now turn to F^{-1}(ψ; snr), for fixed ψ. Let snr″ ≥ snr′, ω″ = F^{-1}(ψ; snr″) and ω′ = F^{-1}(ψ; snr′). We wish to show that ω″ ≥ ω′. Now,

$$F(\omega''; snr'') \overset{(a)}{=} \psi \overset{(b)}{=} F(\omega'; snr') \overset{(c)}{\ge} F(\omega'; snr'')$$

Equalities (a) and (b) follow from the definitions of ω″ and ω′. Inequality (c) follows from the above-proven fact that F(ω; snr) is non-ascending as a function of snr. The function F(ω; snr″) is strictly ascending as a function of ω in the range ω ≥ 0. Thus, ω″ ≥ ω′ as desired.
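Numerically, this monotonicity is easy to observe. The sketch below is ours; it assumes SciPy is available for the Q-function (norm.sf) and for root finding (brentq).

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def F(w, snr):
    # CDF of Omega = |2 snr + 2 sqrt(snr) N|, as derived above
    s = 2.0 * np.sqrt(snr)
    return norm.sf((2.0 * snr - w) / s) - norm.sf((2.0 * snr + w) / s)

def F_inv(psi, snr):
    # inverse of F with respect to w, on the range w >= 0
    return brentq(lambda w: F(w, snr) - psi, 0.0, 1e3)

for psi in (0.1, 0.5, 0.9):
    # F^{-1}(psi; snr) should be non-descending in snr
    print(psi, [round(F_inv(psi, s), 3) for s in (0.5, 1.0, 2.0, 4.0)])
```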
ACKNOWLEDGEMENTS

The work of Shlomo Shamai was supported by the US-Israel Binational Science Foundation.

REFERENCES
[1] A. Bennatan, A. R. Calderbank and S. Shamai (Shitz), "Bounds on the MMSE of Bad LDPC Codes at Rates Above Capacity," to be submitted to IEEE Trans. Inf. Theory.
[2] D. Burshtein, M. Krivelevich, S. Litsyn, and G. Miller, "Upper bounds on the rate of LDPC codes," IEEE Trans. Inf. Theory, vol. 48, no. 9, pp. 2437-2449, Sep. 2002.
[3] T. M. Cover and A. A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Inf. Theory, vol. IT-25, no. 5, pp. 572-584, Sep. 1979.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley and Sons, 1991.
[5] U. Erez, S. Litsyn and R. Zamir, "Lattices which are good for (almost) everything," submitted for publication, IEEE Trans. Inf. Theory, October 2005.
[6] G. D. Forney, "Exponential error bounds for erasure, list, and decision feedback schemes," IEEE Trans. Inf. Theory, vol. 14, no. 2, pp. 206-220, Mar. 1968.
[7] R. G. Gallager, Low Density Parity Check Codes, M.I.T. Press, Cambridge, Massachusetts, 1963.
[8] R. G. Gallager, Information Theory and Reliable Communication, John Wiley and Sons, 1968.
[9] D. Guo, S. Shamai and S. Verdu, "Mutual information and minimum mean-square error in Gaussian channels," IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1261-1282, April 2005.
[10] G. Kramer, M. Gastpar, and P. Gupta, "Cooperative strategies and capacity theorems for relay networks," IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3037-3063, Sep. 2005.
[11] C. Measson, A. Montanari, T. Richardson and R. Urbanke, "Life Above Threshold: From List Decoding to Area Theorem and MSE," 2004 IEEE Inf. Theory Workshop (ITW 2004), San Antonio, Oct. 24-29, 2004.
[12] G. Miller and D. Burshtein, "Bounds on the maximum likelihood decoding error probability of LDPC codes," IEEE Trans. Inf. Theory, vol. 47, no. 7, pp. 2696-2710, Nov. 2001.
[13] B. Nazer and M. Gastpar, "Compute-and-Forward: Harnessing Interference with Structured Codes," 2008 International Symposium on Information Theory (ISIT 2008), Toronto, Ontario, Canada, July 6-11, 2008.
[14] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes, 4th Edition, McGraw-Hill.
[15] M. Peleg, A. Sanderovich and S. Shamai (Shitz), "On Extrinsic Information of Good Codes Operating Over Gaussian Channels," European Trans. Telecommunications, vol. 18, no. 2, pp. 133-139, 2007.
[16] T. Philosof and R. Zamir, "The Rate Loss of Single-Letter Characterization: The Dirty Multiple Access Channel," submitted to IEEE Trans. Inf. Theory, Mar. 2008, available at arXiv:0803.1120v3 [cs.IT].
[17] T. Richardson and R. Urbanke, "The capacity of low-density parity check codes under message-passing decoding," IEEE Trans. Inf. Theory, vol. 47, pp. 599-618, February 2001.
[18] S. Shamai (Shitz) and S. Verdu, "The Empirical Distribution of Good Codes," IEEE Trans. Inf. Theory, vol. 43, no. 3, pp. 836-846, May 1997.
[19] J. M. Shea, T. F. Wong, A. Avudainayagam and X. Li, "Collaborative Decoding on Block Fading Channels," to appear in IEEE Trans. Communications.
[20] H. H. Sneessens and L. Vandendorpe, "Soft Decoding and Forward Improves Cooperative Communications," 6th IEE International Conference on 3G and Beyond, Nov. 7-9, 2005.
[21] Y. Li, B. Vucetic, T. F. Wong and M. Dohler, "Distributed Turbo Coding With Soft Information Relaying in Multihop Relay Networks," IEEE Journal on Selected Areas in Communications, vol. 24, no. 11, pp. 2040-2060, November 2006.
[22] G. Wiechman and I. Sason, "Parity-Check Density Versus Performance of Binary Linear Block Codes: New Bounds and Applications," IEEE Trans. Inf. Theory, vol. 53, no. 2, pp. 550-579, Feb. 2007.
[23] S. Yang and R. Koetter, "Network Coding over a Noisy Relay: a Belief Propagation Approach," 2007 IEEE International Symposium on Information Theory (ISIT 2007), Nice, France, June 24-29, 2007.
[24] B. Zhao and M. C. Valenti, "Distributed turbo coded diversity for relay channel," Electronics Letters, vol. 39, pp. 786-787, May 2003.
