arXiv:0807.3917v5 [cs.IT] 20 Jul 2009

Abstract—A method is proposed, called channel polarization, to construct code sequences that achieve the symmetric capacity I(W) of any given binary-input discrete memoryless channel (B-DMC) W. The symmetric capacity is the highest rate achievable subject to using the input letters of the channel with equal probability. Channel polarization refers to the fact that it is possible to synthesize, out of N independent copies of a given B-DMC W, a second set of N binary-input channels {W_N^(i) : 1 ≤ i ≤ N} such that, as N becomes large, the fraction of indices i for which I(W_N^(i)) is near 1 approaches I(W) and the fraction for which I(W_N^(i)) is near 0 approaches 1 − I(W). The polarized channels {W_N^(i)} are well-conditioned for channel coding: one need only send data at rate 1 through those with capacity near 1 and at rate 0 through the remaining. Codes constructed on the basis of this idea are called polar codes. The paper proves that, given any B-DMC W with I(W) > 0 and any target rate R < I(W), there exists a sequence of polar codes {C_n; n ≥ 1} such that C_n has block-length N = 2^n, rate ≥ R, and probability of block error under successive cancellation decoding bounded as P_e(N, R) ≤ O(N^(−1/4)) independently of the code rate. This performance is achievable by encoders and decoders with complexity O(N log N) for each.

Index Terms—Capacity-achieving codes, channel capacity, channel polarization, Plotkin construction, polar codes, Reed-Muller codes, successive cancellation decoding.

I. INTRODUCTION AND OVERVIEW

A fascinating aspect of Shannon's proof of the noisy channel coding theorem is the random-coding method that he used to show the existence of capacity-achieving code sequences without exhibiting any specific such sequence [1]. Explicit construction of provably capacity-achieving code sequences with low encoding and decoding complexities has since then been an elusive goal. This paper is an attempt to meet this goal for the class of B-DMCs.

We will give a description of the main ideas and results of the paper in this section. First, we give some definitions and state some basic facts that are used throughout the paper.

A. Preliminaries

We write W : X → Y to denote a generic B-DMC with input alphabet X, output alphabet Y, and transition probabilities W(y|x), x ∈ X, y ∈ Y. The input alphabet X will always be {0, 1}; the output alphabet and the transition probabilities may be arbitrary. We write W^N to denote the channel corresponding to N uses of W; thus, W^N : X^N → Y^N with W^N(y_1^N | x_1^N) = ∏_{i=1}^{N} W(y_i | x_i).

Given a B-DMC W, there are two channel parameters of primary interest in this paper: the symmetric capacity

    I(W) ≜ ∑_{y∈Y} ∑_{x∈X} (1/2) W(y|x) log [ W(y|x) / ((1/2) W(y|0) + (1/2) W(y|1)) ]

and the Bhattacharyya parameter

    Z(W) ≜ ∑_{y∈Y} sqrt( W(y|0) W(y|1) ).

These parameters are used as measures of rate and reliability, respectively. I(W) is the highest rate at which reliable communication is possible across W using the inputs of W with equal frequency. Z(W) is an upper bound on the probability of maximum-likelihood (ML) decision error when W is used only once to transmit a 0 or 1.

It is easy to see that Z(W) takes values in [0, 1]. Throughout, we will use base-2 logarithms; hence, I(W) will also take values in [0, 1]. The unit for code rates and channel capacities will be bits.

Intuitively, one would expect that I(W) ≈ 1 iff Z(W) ≈ 0, and I(W) ≈ 0 iff Z(W) ≈ 1. The following bounds, proved in the Appendix, make this precise.

Proposition 1: For any B-DMC W, we have

    I(W) ≥ log( 2 / (1 + Z(W)) ),   (1)
    I(W) ≤ sqrt( 1 − Z(W)^2 ).      (2)

The symmetric capacity I(W) equals the Shannon capacity when W is a symmetric channel, i.e., a channel for which there exists a permutation π of the output alphabet Y such that (i) π^(−1) = π and (ii) W(y|1) = W(π(y)|0) for all y ∈ Y. The binary symmetric channel (BSC) and the binary erasure channel (BEC) are examples of symmetric channels. A BSC is a B-DMC W with Y = {0, 1}, W(0|0) = W(1|1), and W(1|0) = W(0|1). A B-DMC W is called a BEC if for each y ∈ Y, either W(y|0)W(y|1) = 0 or W(y|0) = W(y|1). In the latter case, y is said to be an erasure symbol. The sum of W(y|0) over all erasure symbols y is called the erasure probability of the BEC.

E. Arıkan is with the Department of Electrical-Electronics Engineering, Bilkent University, Ankara, 06800, Turkey (e-mail: arikan@ee.bilkent.edu.tr). This work was supported in part by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under Project 107E216 and in part by the European Commission FP7 Network of Excellence NEWCOM++ under contract 216715.
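As a concrete check of the two definitions above, both parameters can be computed directly from a channel's transition probabilities. The sketch below is ours, not the paper's (the function names are invented for illustration); it evaluates I(W) and Z(W) for a BSC with crossover probability 0.1 and confirms the bounds of Proposition 1.

```python
import math

def symmetric_capacity(W):
    """I(W) for a B-DMC given as W[x][y] = W(y|x), x in {0, 1}; base-2 logs."""
    I = 0.0
    for y in range(len(W[0])):
        q = 0.5 * W[0][y] + 0.5 * W[1][y]   # output distribution under uniform input
        for x in (0, 1):
            if W[x][y] > 0:
                I += 0.5 * W[x][y] * math.log2(W[x][y] / q)
    return I

def bhattacharyya(W):
    """Z(W) = sum over y of sqrt(W(y|0) W(y|1))."""
    return sum(math.sqrt(W[0][y] * W[1][y]) for y in range(len(W[0])))

W = [[0.9, 0.1], [0.1, 0.9]]          # BSC(0.1): rows are W(.|0) and W(.|1)
I, Z = symmetric_capacity(W), bhattacharyya(W)
assert math.log2(2 / (1 + Z)) <= I <= math.sqrt(1 - Z**2)   # Proposition 1
```

For a BSC(p) this reproduces I(W) = 1 − H(p) and Z(W) = 2·sqrt(p(1 − p)); for a BEC, Z(W) equals the erasure probability.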
3) Channel polarization:

Theorem 1: For any B-DMC W, the channels {W_N^(i)} polarize in the sense that, for any fixed δ ∈ (0, 1), as N goes to infinity through powers of two, the fraction of indices i ∈ {1, . . . , N} for which I(W_N^(i)) ∈ (1 − δ, 1] goes to I(W) and the fraction for which I(W_N^(i)) ∈ [0, δ) goes to 1 − I(W).

This theorem is proved in Sect. IV.

Fig. 3. Recursive construction of W_N from two copies of W_{N/2}. [Diagram: inputs u_1, . . . , u_N are combined pairwise into s_1, . . . , s_N, permuted by R_N into v_1, . . . , v_N, and fed through two copies of W_{N/2}, producing outputs y_1, . . . , y_N.]

Fig. 4. Plot of I(W_N^(i)) vs. i = 1, . . . , N = 2^10 for a BEC with ε = 0.5. [Axes: channel index 1-1024 vs. symmetric capacity 0-1.]
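Theorem 1 can be observed numerically in the one case where the synthesized channels stay inside a one-parameter family: for a BEC, one polarization step turns a BEC(ε) into a BEC(2ε − ε^2) and a BEC(ε^2), a BEC-specific fact established later in the paper. The sketch below is ours, not the paper's; it reproduces the behavior plotted in Fig. 4 for ε = 0.5 and N = 2^10.

```python
def bec_polarize(eps, n):
    """Erasure probabilities of the N = 2^n channels W_N^(i) synthesized from a BEC(eps)."""
    probs = [eps]
    for _ in range(n):
        probs = [p for e in probs for p in (2 * e - e * e, e * e)]
    return probs

eps = bec_polarize(0.5, 10)              # N = 1024, as in Fig. 4
I = [1 - e for e in eps]                 # for a BEC, I(W_N^(i)) = 1 - erasure probability
delta = 0.1
hi = sum(1 for x in I if x > 1 - delta)  # fraction tends to I(W) = 0.5
lo = sum(1 for x in I if x < delta)      # fraction tends to 1 - I(W) = 0.5
print(hi / len(I), lo / len(I))
```

Because 1 − (2ε − ε^2) = (1 − ε)^2, the multiset of erasure probabilities for ε = 0.5 is symmetric under ε ↔ 1 − ε, so the two fractions are equal at every N; both creep toward 0.5 as n grows.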
We stated the polarization result in Theorem 2 in terms of {Z(W_N^(i))} rather than {I(W_N^(i))} because this form is better suited to the coding results that we will develop. A rate of polarization result in terms of {I(W_N^(i))} can be obtained from Theorem 2 with the help of Prop. 1.

C. Polar coding

We take advantage of the polarization effect to construct codes that achieve the symmetric channel capacity I(W) by a method we call polar coding. The basic idea of polar coding is to create a coding system where one can access each coordinate channel W_N^(i) individually and send data only through those for which Z(W_N^(i)) is near 0.

1) G_N-coset codes: We first describe a class of block codes that contain polar codes—the codes of main interest—as a special case. The block-lengths N for this class are restricted to powers of two, N = 2^n for some n ≥ 0. For a given N, each code in the class is encoded in the same manner, namely,

    x_1^N = u_1^N G_N,   (8)

where G_N is the generator matrix of order N, defined above. For A an arbitrary subset of {1, . . . , N}, we may write (8) as

    x_1^N = u_A G_N(A) ⊕ u_{A^c} G_N(A^c),   (9)

where G_N(A) denotes the submatrix of G_N formed by the rows with indices in A.

If we now fix A and u_{A^c}, but leave u_A as a free variable, we obtain a mapping from source blocks u_A to codeword blocks x_1^N. This mapping is a coset code: it is a coset of the linear block code with generator matrix G_N(A), with the coset determined by the fixed vector u_{A^c} G_N(A^c). We will refer to this class of codes collectively as G_N-coset codes. Individual G_N-coset codes will be identified by a parameter vector (N, K, A, u_{A^c}), where K is the code dimension and specifies the size of A.¹ The ratio K/N is called the code rate. We will refer to A as the information set and to u_{A^c} ∈ X^(N−K) as frozen bits or vector.

For example, the (4, 2, {2, 4}, (1, 0)) code has the encoder mapping

    x_1^4 = u_1^4 G_4 = (u_2, u_4) [ 1 0 1 0 ]  + (1, 0) [ 1 0 0 0 ]    (10)
                                   [ 1 1 1 1 ]           [ 1 1 0 0 ]

For a source block (u_2, u_4) = (1, 1), the coded block is x_1^4 = (1, 1, 0, 1).

Polar codes will be specified shortly by giving a particular rule for the selection of the information set A.

2) A successive cancellation decoder: Consider a G_N-coset code with parameter (N, K, A, u_{A^c}). Let u_1^N be encoded into a codeword x_1^N, let x_1^N be sent over the channel W^N, and let a channel output y_1^N be received. The decoder's task is to generate an estimate û_1^N of u_1^N, given knowledge of A, u_{A^c}, and y_1^N. Since the decoder can avoid errors in the frozen part by setting û_{A^c} = u_{A^c}, the real decoding task is to generate an estimate û_A of u_A.

The coding results in this paper will be given with respect to a specific successive cancellation (SC) decoder, unless some other decoder is mentioned. Given any (N, K, A, u_{A^c}) G_N-coset code, we will use a SC decoder that generates its decision û_1^N by computing

    û_i ≜ u_i,                      if i ∈ A^c,
        ≜ h_i(y_1^N, û_1^(i−1)),    if i ∈ A,      (11)

in the order i from 1 to N, where h_i : Y^N × X^(i−1) → X, i ∈ A, are decision functions defined as

    h_i(y_1^N, û_1^(i−1)) ≜ 0, if W_N^(i)(y_1^N, û_1^(i−1) | 0) / W_N^(i)(y_1^N, û_1^(i−1) | 1) ≥ 1,
                          ≜ 1, otherwise,                                                          (12)

for all y_1^N ∈ Y^N, û_1^(i−1) ∈ X^(i−1). We will say that a decoder block error occurred if û_1^N ≠ u_1^N or, equivalently, if û_A ≠ u_A.

The decision functions {h_i} defined above resemble ML decision functions but are not exactly so, because they treat the future frozen bits (u_j : j > i, j ∈ A^c) as RVs, rather than as known bits. In exchange for this suboptimality, {h_i} can be computed efficiently using recursive formulas, as we will show in Sect. II. Apart from algorithmic efficiency, the recursive structure of the decision functions is important because it renders the performance analysis of the decoder tractable. Fortunately, the loss in performance due to not using true ML decision functions happens to be negligible: I(W) is still achievable.

3) Code performance: The notation P_e(N, K, A, u_{A^c}) will denote the probability of block error for a (N, K, A, u_{A^c}) code, assuming that each data vector u_A ∈ X^K is sent with probability 2^(−K) and decoding is done by the above SC decoder. More precisely,

    P_e(N, K, A, u_{A^c}) ≜ ∑_{u_A ∈ X^K} (1/2^K) ∑_{y_1^N ∈ Y^N : û_1^N(y_1^N) ≠ u_1^N} W^N(y_1^N | u_1^N).

The average of P_e(N, K, A, u_{A^c}) over all choices for u_{A^c} will be denoted by P_e(N, K, A):

    P_e(N, K, A) ≜ ∑_{u_{A^c} ∈ X^(N−K)} (1/2^(N−K)) P_e(N, K, A, u_{A^c}).

A key bound on block error probability under SC decoding is the following.

Proposition 2: For any B-DMC W and any choice of the parameters (N, K, A),

    P_e(N, K, A) ≤ ∑_{i∈A} Z(W_N^(i)).   (13)

Hence, for each (N, K, A), there exists a frozen vector u_{A^c} such that

    P_e(N, K, A, u_{A^c}) ≤ ∑_{i∈A} Z(W_N^(i)).   (14)

This is proved in Sect. V-B. This result suggests choosing A from among all K-subsets of {1, . . . , N} so as to minimize the RHS of (13). This idea leads to the definition of polar codes.

¹ We include the redundant parameter K in the parameter set because often we consider an ensemble of codes with K fixed and A free.
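The worked example can be reproduced mechanically. The sketch below is ours, not the paper's; it builds G_N = B_N F^{⊗n} (the paper's generator: the rows of the Kronecker power of F = [1 0; 1 1] listed in bit-reversed order) and re-encodes the (4, 2, {2, 4}, (1, 0)) code.

```python
def kron(A, B):
    """Kronecker product of two 0/1 matrices given as lists of lists."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def polar_generator(n):
    """G_N = B_N F^{(x)n}: rows of F^{(x)n} listed in bit-reversed order."""
    F = [[1, 0], [1, 1]]
    Fn = [[1]]
    for _ in range(n):
        Fn = kron(F, Fn)
    N = 2 ** n
    bitrev = [int(format(i, f"0{n}b")[::-1], 2) for i in range(N)]
    return [Fn[r] for r in bitrev]

def encode(u, G):
    N = len(u)
    return [sum(u[i] * G[i][j] for i in range(N)) % 2 for j in range(N)]

G4 = polar_generator(2)
u = [1, 1, 0, 1]      # frozen (u1, u3) = (1, 0) interleaved with data (u2, u4) = (1, 1)
print(encode(u, G4))  # [1, 1, 0, 1], the coded block of the worked example
```

The two matrices in (10) are exactly the rows of G4 with indices {2, 4} and {1, 3}, respectively.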
4) Polar codes: Given a B-DMC W, a G_N-coset code with parameter (N, K, A, u_{A^c}) will be called a polar code for W if the information set A is chosen as a K-element subset of {1, . . . , N} such that Z(W_N^(i)) ≤ Z(W_N^(j)) for all i ∈ A, j ∈ A^c.

Polar codes are channel-specific designs: a polar code for one channel may not be a polar code for another. The main result of this paper will be to show that polar coding achieves the symmetric capacity I(W) of any given B-DMC W.

An alternative rule for polar code definition would be to specify A as a K-element subset of {1, . . . , N} such that I(W_N^(i)) ≥ I(W_N^(j)) for all i ∈ A, j ∈ A^c. This alternative rule would also achieve I(W). However, the rule based on the Bhattacharyya parameters has the advantage of being connected with an explicit bound on block error probability.

The polar code definition does not specify how the frozen vector u_{A^c} is to be chosen; it may be chosen at will. This degree of freedom in the choice of u_{A^c} simplifies the performance analysis of polar codes by allowing averaging over an ensemble. However, it is not for analytical convenience alone that we do not specify a precise rule for selecting u_{A^c}, but also because it appears that the code performance is relatively insensitive to that choice. In fact, we prove in Sect. VI-B that, for symmetric channels, any choice for u_{A^c} is as good as any other.

5) Coding theorems: Fix a B-DMC W and a number R ≥ 0. Let P_e(N, R) be defined as P_e(N, ⌊NR⌋, A) with A selected in accordance with the polar coding rule for W. Thus, P_e(N, R) is the probability of block error under SC decoding for polar coding over W with block-length N and rate R, averaged over all choices for the frozen bits u_{A^c}. The main coding result of this paper is the following:

Theorem 3: For any given B-DMC W and fixed R < I(W), block error probability for polar coding under successive cancellation decoding satisfies

    P_e(N, R) = O(N^(−1/4)).   (15)

This theorem follows as an easy corollary to Theorem 2 and the bound (13), as we show in Sect. V-B. For symmetric channels, we have the following stronger version of Theorem 3.

Theorem 4: For any symmetric B-DMC W and any fixed R < I(W), consider any sequence of G_N-coset codes (N, K, A, u_{A^c}) with N increasing to infinity, K = ⌊NR⌋, A chosen in accordance with the polar coding rule for W, and u_{A^c} fixed arbitrarily. The block error probability under successive cancellation decoding satisfies

    P_e(N, K, A, u_{A^c}) = O(N^(−1/4)).   (16)

This is proved in Sect. VI-B. Note that for symmetric channels I(W) equals the Shannon capacity of W.

6) Complexity: An important issue about polar coding is the complexity of encoding, decoding, and code construction. The recursive structure of the channel polarization construction leads to low-complexity encoding and decoding algorithms for the class of G_N-coset codes, and in particular, for polar codes.

Theorem 5: For the class of G_N-coset codes, the complexity of encoding and the complexity of successive cancellation decoding are both O(N log N) as functions of code block-length N.

This theorem is proved in Sections VII and VIII. Notice that the complexity bounds in Theorem 5 are independent of the code rate and the way the frozen vector is chosen. The bounds hold even at rates above I(W), but clearly this has no practical significance.

As for code construction, we have found no low-complexity algorithms for constructing polar codes. One exception is the case of a BEC, for which we have a polar code construction algorithm with complexity O(N). We discuss the code construction problem further in Sect. IX and suggest a low-complexity statistical algorithm for approximating the exact polar code construction.

D. Relations to previous work

This paper is an extension of work begun in [2], where channel combining and splitting were used to show that improvements can be obtained in the sum cutoff rate for some specific DMCs. However, no recursive method was suggested there to reach the ultimate limit of such improvements.

As the present work progressed, it became clear that polar coding had much in common with Reed-Muller (RM) coding [3], [4]. Indeed, recursive code construction and SC decoding, which are two essential ingredients of polar coding, appear to have been introduced into coding theory by RM codes.

According to one construction of RM codes, for any N = 2^n, n ≥ 0, and 0 ≤ K ≤ N, an RM code with block-length N and dimension K, denoted RM(N, K), is defined as a linear code whose generator matrix G_RM(N, K) is obtained by deleting (N − K) of the rows of F^{⊗n} so that none of the deleted rows has a larger Hamming weight (number of 1s in that row) than any of the remaining K rows. For instance,

    G_RM(4, 4) = F^{⊗2} = [ 1 0 0 0 ]                 [ 1 0 1 0 ]
                          [ 1 1 0 0 ]  and  G_RM(4, 2) = [ 1 1 1 1 ].
                          [ 1 0 1 0 ]
                          [ 1 1 1 1 ]

This construction brings out the similarities between RM codes and polar codes. Since G_N and F^{⊗n} have the same set of rows (only in a different order) for any N = 2^n, it is clear that RM codes belong to the class of G_N-coset codes. For example, RM(4, 2) is the G_4-coset code with parameter (4, 2, {2, 4}, (0, 0)). So, RM coding and polar coding may be regarded as two alternative rules for selecting the information set A of a G_N-coset code of a given size (N, K). Unlike polar coding, RM coding selects the information set in a channel-independent manner; it is not as fine-tuned to the channel polarization phenomenon as polar coding is. We will show in Sect. X that, at least for the class of BECs, the RM rule for information set selection leads to asymptotically unreliable codes under SC decoding. So, polar coding goes beyond RM coding in a non-trivial manner by paying closer attention to channel polarization.

Another connection to existing work can be established by noting that polar codes are multi-level |u|u + v| codes, which are a class of codes originating from Plotkin's method for code combining [5].
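The RM rule can be stated as a three-line program. The sketch below is ours, not the paper's; ties between rows of equal Hamming weight are broken here toward higher row index, which is one valid choice and happens to reproduce the instance G_RM(4, 2) shown above.

```python
def kron(A, B):
    """Kronecker product of two 0/1 matrices given as lists of lists."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def rm_generator(n, K):
    """G_RM(N, K): keep the K rows of F^{(x)n} of largest Hamming weight."""
    F = [[1, 0], [1, 1]]
    Fn = [[1]]
    for _ in range(n):
        Fn = kron(F, Fn)
    N = 2 ** n
    by_weight = sorted(range(N), key=lambda i: (sum(Fn[i]), i))  # lightest rows first
    keep = sorted(by_weight[N - K:])                             # delete the N-K lightest
    return [Fn[i] for i in keep]

assert rm_generator(2, 4) == [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
assert rm_generator(2, 2) == [[1, 0, 1, 0], [1, 1, 1, 1]]   # rows {2, 4} of G_4
```

With all-zero frozen bits this selection is exactly the G_4-coset code (4, 2, {2, 4}, (0, 0)) mentioned above; a polar code would instead rank the rows by Z(W_N^(i)).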
This connection is not surprising in view of the fact that RM codes are also multi-level |u|u + v| codes [6, pp. 114-125]. However, unlike typical multi-level code constructions where one begins with specific small codes to build larger ones, in polar coding the multi-level code is obtained by expurgating rows of a full-order generator matrix, G_N, with respect to a channel-specific criterion. The special structure of G_N ensures that, no matter how expurgation is done, the resulting code is a multi-level |u|u + v| code. In essence, polar coding enjoys the freedom to pick a multi-level code from an ensemble of such codes so as to suit the channel at hand, while conventional approaches to multi-level coding do not have this degree of flexibility.

Finally, we wish to mention a "spectral" interpretation of polar codes which is similar to Blahut's treatment of BCH codes [7, Ch. 9]; this type of similarity has already been pointed out by Forney [8, Ch. 11] in connection with RM codes. From the spectral viewpoint, the encoding operation (8) is regarded as a transform of a "frequency" domain information vector u_1^N to a "time" domain codeword vector x_1^N. The transform is invertible with G_N^(−1) = G_N. The decoding operation is regarded as a spectral estimation problem in which one is given a time domain observation y_1^N, which is a noisy version of x_1^N, and asked to estimate u_1^N. To aid the estimation task, one is allowed to freeze a certain number of spectral components of u_1^N. This spectral interpretation of polar coding suggests that it may be possible to treat polar codes and BCH codes in a unified framework. The spectral interpretation also opens the door to the use of various signal processing techniques in polar coding; indeed, in Sect. VII, we exploit some fast transform techniques in designing encoders for polar codes.

E. Paper outline

The rest of the paper is organized as follows. Sect. II explores the recursive properties of the channel splitting operation. In Sect. III, we focus on how I(W) and Z(W) get transformed through a single step of channel combining and splitting. We extend this to an asymptotic analysis in Sect. IV and complete the proofs of Theorem 1 and Theorem 2. This completes the part of the paper on channel polarization; the rest of the paper is mainly about polar coding. Section V develops an upper bound on the block error probability of polar coding under SC decoding and proves Theorem 3. Sect. VI considers polar coding for symmetric B-DMCs and proves Theorem 4. Sect. VII gives an analysis of the encoder mapping G_N, which results in efficient encoder implementations. In Sect. VIII, we give an implementation of SC decoding with complexity O(N log N). In Sect. IX, we discuss the code construction complexity and propose an O(N log N) statistical algorithm for approximate code construction. In Sect. X, we explain why RM codes have a poor asymptotic performance under SC decoding. In Sect. XI, we point out some generalizations of the present work, give some complementary remarks, and state some open problems.

II. RECURSIVE CHANNEL TRANSFORMATIONS

We have defined a blockwise channel combining and splitting operation by (4) and (5) which transformed N independent copies of W into W_N^(1), . . . , W_N^(N). The goal in this section is to show that this blockwise channel transformation can be broken recursively into single-step channel transformations. We say that a pair of binary-input channels W′ : X → Ỹ and W″ : X → Ỹ × X are obtained by a single-step transformation of two independent copies of a binary-input channel W : X → Y, and write

    (W, W) ↦ (W′, W″),

iff there exists a one-to-one mapping f : Y^2 → Ỹ such that

    W′(f(y_1, y_2) | u_1) = ∑_{u′_2} (1/2) W(y_1 | u_1 ⊕ u′_2) W(y_2 | u′_2),   (17)

    W″(f(y_1, y_2), u_1 | u_2) = (1/2) W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2)   (18)

for all u_1, u_2 ∈ X, y_1, y_2 ∈ Y.

According to this, we can write (W, W) ↦ (W_2^(1), W_2^(2)) for any given B-DMC W because

    W_2^(1)(y_1^2 | u_1) ≜ ∑_{u_2} (1/2) W_2(y_1^2 | u_1^2) = ∑_{u_2} (1/2) W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2),   (19)

    W_2^(2)(y_1^2, u_1 | u_2) ≜ (1/2) W_2(y_1^2 | u_1^2) = (1/2) W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2),   (20)

which are in the form of (17) and (18) by taking f as the identity mapping.

It turns out we can write, more generally,

    (W_N^(i), W_N^(i)) ↦ (W_{2N}^(2i−1), W_{2N}^(2i)).   (21)

This follows as a corollary to the following:

Proposition 3: For any n ≥ 0, N = 2^n, 1 ≤ i ≤ N,

    W_{2N}^(2i−1)(y_1^{2N}, u_1^{2i−2} | u_{2i−1})
      = ∑_{u_{2i}} (1/2) W_N^(i)(y_1^N, u_{1,o}^{2i−2} ⊕ u_{1,e}^{2i−2} | u_{2i−1} ⊕ u_{2i}) · W_N^(i)(y_{N+1}^{2N}, u_{1,e}^{2i−2} | u_{2i})   (22)

and

    W_{2N}^(2i)(y_1^{2N}, u_1^{2i−1} | u_{2i})
      = (1/2) W_N^(i)(y_1^N, u_{1,o}^{2i−2} ⊕ u_{1,e}^{2i−2} | u_{2i−1} ⊕ u_{2i}) · W_N^(i)(y_{N+1}^{2N}, u_{1,e}^{2i−2} | u_{2i}).   (23)

This proposition is proved in the Appendix. The transform relationship (21) can now be justified by noting that (22) and (23) are identical in form to (17) and (18), respectively, after the following substitutions:

    W ← W_N^(i),   W′ ← W_{2N}^(2i−1),   W″ ← W_{2N}^(2i),
    u_1 ← u_{2i−1},   u_2 ← u_{2i},
    y_1 ← (y_1^N, u_{1,o}^{2i−2} ⊕ u_{1,e}^{2i−2}),   y_2 ← (y_{N+1}^{2N}, u_{1,e}^{2i−2}),
    f(y_1, y_2) ← (y_1^{2N}, u_1^{2i−2}).
set S(b_1, . . . , b_n) ∈ F_n and use Prop. 7 to write

    E[I_{n+1} | S(b_1, . . . , b_n)] = (1/2) I(W_{b_1···b_n 0}) + (1/2) I(W_{b_1···b_n 1}) = I(W_{b_1···b_n}).

Since I(W_{b_1···b_n}) is the value of I_n on S(b_1, . . . , b_n), (41) follows. This completes the proof that {I_n, F_n} is a martingale. Since {I_n, F_n} is a uniformly integrable martingale, by general convergence results about such martingales (see, e.g., [9, Theorem 9.4.6]), the claim about I_∞ follows.

It should not be surprising that the limit RV I_∞ takes values a.e. in {0, 1}, which is the set of fixed points of I(W) under the transformation (W, W) ↦ (W_2^(1), W_2^(2)), as determined by the condition for equality in (25). For a rigorous proof of this statement, we take an indirect approach and bring the process {Z_n; n ≥ 0} also into the picture.

Proposition 9: The sequence of random variables and Borel fields {Z_n, F_n; n ≥ 0} is a supermartingale, i.e.,

    F_n ⊂ F_{n+1} and Z_n is F_n-measurable,   (42)
    E[|Z_n|] < ∞,   (43)
    Z_n ≥ E[Z_{n+1} | F_n].   (44)

Furthermore, the sequence {Z_n; n ≥ 0} converges a.e. to a random variable Z_∞ which takes values a.e. in {0, 1}.

Proof: Conditions (42) and (43) are clearly satisfied. To verify (44), consider a cylinder set S(b_1, . . . , b_n) ∈ F_n and use Prop. 7 to write

    E[Z_{n+1} | S(b_1, . . . , b_n)] = (1/2) Z(W_{b_1···b_n 0}) + (1/2) Z(W_{b_1···b_n 1}) ≤ Z(W_{b_1···b_n}).

…

This result can be interpreted as saying that, among all B-DMCs W, the BEC presents the most favorable rate-reliability trade-off: it minimizes Z(W) (maximizes reliability) among all channels with a given symmetric capacity I(W); equivalently, it minimizes the I(W) required to achieve a given level of reliability Z(W).

Proof: Consider two channels W and W′ with Z(W) = Z(W′) ≜ z_0. Suppose that W′ is a BEC. Then, W′ has erasure probability z_0 and I(W′) = 1 − z_0. Consider the random processes {Z_n} and {Z′_n} corresponding to W and W′, respectively. By the condition for equality in (34), the process {Z_n} is stochastically dominated by {Z′_n} in the sense that P(Z_n ≤ z) ≥ P(Z′_n ≤ z) for all n ≥ 1, 0 ≤ z ≤ 1. Thus, the probability of {Z_n} converging to zero is lower-bounded by the probability that {Z′_n} converges to zero, i.e., I(W) ≥ I(W′). This implies I(W) + Z(W) ≥ 1.

B. Proof of Theorem 2

We will now prove Theorem 2, which strengthens the above polarization results by specifying a rate of polarization. Consider the probability space (Ω, F, P). For ω ∈ Ω, i ≥ 0, by Prop. 7, we have Z_{i+1}(ω) = Z_i(ω)^2 if B_{i+1}(ω) = 1 and Z_{i+1}(ω) ≤ 2Z_i(ω) − Z_i(ω)^2 ≤ 2Z_i(ω) if B_{i+1}(ω) = 0. For ζ ≥ 0 and m ≥ 0, define

    T_m(ζ) ≜ {ω ∈ Ω : Z_i(ω) ≤ ζ for all i ≥ m}.

For ω ∈ T_m(ζ) and i ≥ m, we have

    Z_{i+1}(ω)/Z_i(ω) ≤ 2, if B_{i+1}(ω) = 0,
                      ≤ ζ, if B_{i+1}(ω) = 1.
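One step of the process behind Proposition 9 can be checked numerically. The sketch below is ours, not the paper's; it forms W_2^(1) and W_2^(2) from a BSC via (19)-(20) and verifies that the '+' branch squares Z exactly, that the '−' branch falls strictly below 2Z − Z^2 for a BSC (a BEC attains that bound with equality), and that one step of (44) holds strictly.

```python
from itertools import product
from math import sqrt, isclose

p = 0.1                      # BSC crossover probability
W = {(0, 0): 1 - p, (1, 0): p, (0, 1): p, (1, 1): 1 - p}

def Z(ch):
    outs = {y for (y, _) in ch}
    return sum(sqrt(ch[(y, 0)] * ch[(y, 1)]) for y in outs)

# One polarization step, as in (19)-(20)
Wm = {((y1, y2), u1): sum(0.5 * W[(y1, u1 ^ u2)] * W[(y2, u2)] for u2 in (0, 1))
      for y1, y2 in product((0, 1), (0, 1)) for u1 in (0, 1)}
Wp = {((y1, y2, u1), u2): 0.5 * W[(y1, u1 ^ u2)] * W[(y2, u2)]
      for y1, y2, u1, u2 in product((0, 1), repeat=4)}

z = Z(W)
assert isclose(Z(Wp), z * z)              # the '+' branch squares Z exactly
assert Z(Wm) < 2 * z - z * z              # strict for a BSC; equality for a BEC
assert 0.5 * (Z(Wm) + Z(Wp)) < z          # one step of the supermartingale (44)
```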
where H is the binary entropy function. Define n_0(m, η, δ) as the smallest n such that the RHS of (46) is greater than or equal to 1 − δ/2; it is clear that n_0(m, η, δ) is finite for any m ≥ 0, 0 < η < 1/2, and δ > 0. Now, with m_1 ≜ m_1(δ) = m_0(ζ_0, δ) and n_1 ≜ n_1(δ) = n_0(m_1, η_0, δ), we obtain the desired bound:

    P[T_{m_1}(ζ_0) ∩ U_{m_1,n}(η_0)] ≥ I_0 − δ,  n ≥ n_1.

Finally, we tie the above analysis to the claim of Theorem 2. Define c ≜ 2^(−4+5m_1/4) and

    V_n ≜ {ω ∈ Ω : Z_n(ω) ≤ c 2^(−5n/4)},  n ≥ 0,

and note that

    T_{m_1}(ζ_0) ∩ U_{m_1,n}(η_0) ⊂ V_n,  n ≥ n_1.

So, P(V_n) ≥ I_0 − δ for n ≥ n_1. On the other hand,

    P(V_n) = ∑_{ω_1^n ∈ X^n} (1/2^n) 1{Z(W_{ω_1^n}) ≤ c 2^(−5n/4)} = (1/N) |A_N|,

where A_N ≜ {i ∈ {1, . . . , N} : Z(W_N^(i)) ≤ c N^(−5/4)} with N = 2^n. We conclude that |A_N| ≥ N(I_0 − δ) for n ≥ n_1(δ). This completes the proof of Theorem 2.

Given Theorem 2, it is an easy exercise to show that polar coding can achieve rates approaching I(W), as we will show in the next section. It is clear from the above proof that Theorem 2 gives only an ad-hoc result on the asymptotic rate of channel polarization; this result is sufficient for proving a capacity theorem for polar coding; however, finding the exact asymptotic rate of polarization remains an important goal for future research.²

V. PERFORMANCE OF POLAR CODING

We show in this section that polar coding can achieve the symmetric capacity I(W) of any B-DMC W. The main technical task will be to prove Prop. 2. We will carry out the analysis over the class of G_N-coset codes before specializing the discussion to polar codes. Recall that individual G_N-coset codes are identified by a parameter vector (N, K, A, u_{A^c}). In the analysis, we will fix the parameters (N, K, A) while keeping u_{A^c} free to take any value over X^(N−K). In other words, the analysis will be over the ensemble of 2^(N−K) G_N-coset codes with a fixed (N, K, A). The decoder in the system will be the SC decoder described in Sect. I-C.2.

… W_N, the input to the product-form channel W^N, the output of W^N (and also of W_N), and the decisions by the decoder. For each sample point (u_1^N, y_1^N) ∈ X^N × Y^N, the first three vectors take on the values U_1^N(u_1^N, y_1^N) = u_1^N, X_1^N(u_1^N, y_1^N) = u_1^N G_N, and Y_1^N(u_1^N, y_1^N) = y_1^N, while the decoder output takes on the value Û_1^N(u_1^N, y_1^N) whose coordinates are defined recursively as

    Û_i(u_1^N, y_1^N) = u_i,                                   i ∈ A^c,
                      = h_i(y_1^N, Û_1^(i−1)(u_1^N, y_1^N)),   i ∈ A,     (48)

for i = 1, . . . , N.

A realization u_1^N ∈ X^N for the input random vector U_1^N corresponds to sending the data vector u_A together with the frozen vector u_{A^c}. As random vectors, the data part U_A and the frozen part U_{A^c} are uniformly distributed over their respective ranges and statistically independent. By treating U_{A^c} as a random vector over X^(N−K), we obtain a convenient method for analyzing code performance averaged over all codes in the ensemble (N, K, A).

The main event of interest in the following analysis is the block error event under SC decoding, defined as

    E ≜ {(u_1^N, y_1^N) ∈ X^N × Y^N : Û_A(u_1^N, y_1^N) ≠ u_A}.   (49)

Since the decoder never makes an error on the frozen part of U_1^N, i.e., Û_{A^c} equals U_{A^c} with probability one, that part has been excluded from the definition of the block error event.

The probability of error terms P_e(N, K, A) and P_e(N, K, A, u_{A^c}) that were defined in Sect. I-C.3 can be expressed in this probability space as

    P_e(N, K, A) = P(E),
    P_e(N, K, A, u_{A^c}) = P(E | {U_{A^c} = u_{A^c}}),   (50)

where {U_{A^c} = u_{A^c}} denotes the event {(ũ_1^N, y_1^N) ∈ X^N × Y^N : ũ_{A^c} = u_{A^c}}.

B. Proof of Proposition 2

We may express the block error event as E = ∪_{i∈A} B_i, where

    B_i ≜ {(u_1^N, y_1^N) ∈ X^N × Y^N : u_1^(i−1) = Û_1^(i−1)(u_1^N, y_1^N), u_i ≠ Û_i(u_1^N, y_1^N)}   (51)

is the event that the first decision error in SC decoding occurs at stage i. We notice that

    B_i = {(u_1^N, y_1^N) ∈ X^N × Y^N : u_1^(i−1) = Û_1^(i−1)(u_1^N, y_1^N), …
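The SC decoder (48) can be exercised end-to-end at N = 4. The sketch below is ours, not the paper's: it evaluates the split-channel probabilities W_N^(i) by brute-force enumeration over the unseen future bits (cost 2^(N−i) per call, unlike the paper's O(N log N) recursions) and applies the decision rule (11)-(12) over a BEC.

```python
import itertools

# G_4 = B_4 F^{(x)2}: rows of the Kronecker square of F in bit-reversed order
G4 = [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]

def encode(u):
    return [sum(u[i] * G4[i][j] for i in range(4)) % 2 for j in range(4)]

def w_bec(y, x, e):
    """BEC transition probability; 'e' marks an erasure."""
    return e if y == 'e' else ((1 - e) if y == x else 0.0)

def split_prob(i, y, u_prev, ui, e):
    """W_N^(i)(y_1^N, u_1^{i-1} | u_i) by brute-force enumeration (N = 4)."""
    total = 0.0
    for tail in itertools.product((0, 1), repeat=4 - i):
        x = encode(list(u_prev) + [ui] + list(tail))
        prob = 1.0
        for yj, xj in zip(y, x):
            prob *= w_bec(yj, xj, e)
        total += prob / 2 ** 3
    return total

def sc_decode(y, e, frozen=None):
    """The SC decoder (11)-(12); frozen maps 1-indexed positions to known bits."""
    frozen = frozen or {}
    u_hat = []
    for i in range(1, 5):
        if i in frozen:
            u_hat.append(frozen[i])
        else:
            p0 = split_prob(i, y, u_hat, 0, e)
            p1 = split_prob(i, y, u_hat, 1, e)
            u_hat.append(0 if p0 >= p1 else 1)   # ties resolved in favor of 0
    return u_hat

# Over a noiseless channel (erasure probability 0) SC recovers every input exactly.
for u in itertools.product((0, 1), repeat=4):
    assert sc_decode(encode(list(u)), 0.0) == list(u)
```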
Thus, we have 0
[ X 10
E⊂ Ei , P (E) ≤ P (Ei ).
(i)
X 1 This bound on Pe (N, K, A, uAc ) is independent of the frozen
WN (y1N , ui−1
1 | ui ) = WN (y1N | uN
1 )
N
2 N −1 vector uAc . Theorem 4 is now obtained by combining Theo-
ui+1
rem 2 with Prop. 2, as in the proof of Theorem 3.
X 1 Note that although we have given a bound on P (E|{U1N =
= WN (aN N N N
1 GN · y1 | u1 ⊕ a1 )
2N −1 u1 }) that is independent of uN
N
1 , we stopped short of claiming
uN
i+1
i−1
that the error event E is independent of U1N because our
= WN (aN N
1 GN · y1 , u1 ⊕ ai−1
1 | u i ⊕ ai ) decision functions {hi } break ties always in favor of ûi = 0.
where we used the fact that the sum over uN N −i If this bias were removed by randomization, then E would
i+1 ∈ X can
be replaced with a sum over ui+1 ⊕ ai+1 for any fixed aN
N N become independent of U1N .
1
since {uN N N
i+1 ⊕ ai+1 : ui+1 ∈ X
N −i
} = X N −i .
(i)
C. Further symmetries of the channel WN
B. Proof of Theorem 4 We may use the degrees of freedom in the choice of aN1 in
(i)
We return to the analysis in Sect. V and consider a code (58) to explore the symmetries inherent in the channel WN .
ensemble (N, K, A) under SC decoding, only this time as- For a given (y1N , ui1 ), we may select aN i i
1 with a1 = u1 to
suming that W is a symmetric channel. We first show that the obtain
error events {Ei } defined by (52) have a symmetry property. (i) (i)
WN (y1N , ui−1
1 | ui ) = WN (aN N i−1
1 GN · y1 , 01 | 0). (65)
Proposition 14: For a symmetric B-DMC W , the event Ei
has the property that So, if we were to prepare a look-up table for the transition
(i)
probabilities {WN (y1N , ui−1 | ui ) : y1N ∈ Y N , ui1 ∈ X i },
(uN N
1 , y1 ) ∈ Ei iff (aN
1 ⊕ uN N
1 , a1 GN · y1N ) ∈ Ei (59) 1
for each 1 ≤ i ≤ N, (u_1^N, y_1^N) ∈ X^N × Y^N, a_1^N ∈ X^N.

Proof: This follows directly from the definition of E_i by using the symmetry property (58) of the channel W_N^{(i)}.

Now, consider the transmission of a particular source vector u_A and a frozen vector u_{A^c}, jointly forming an input vector u_1^N for the channel W_N. This event is denoted below as {U_1^N = u_1^N} instead of the more formal {u_1^N} × Y^N.

Corollary 1: For a symmetric B-DMC W, for each 1 ≤ i ≤ N and u_1^N ∈ X^N, the events E_i and {U_1^N = u_1^N} are independent; hence, P(E_i) = P(E_i | {U_1^N = u_1^N}).

Proof: For (u_1^N, y_1^N) ∈ X^N × Y^N and x_1^N = u_1^N G_N, we have

  P(E_i | {U_1^N = u_1^N}) = Σ_{y_1^N} W_N(y_1^N | u_1^N) 1_{E_i}(u_1^N, y_1^N)
    = Σ_{y_1^N} W_N(x_1^N · y_1^N | 0_1^N) 1_{E_i}(0_1^N, x_1^N · y_1^N)   (60)
    = P(E_i | {U_1^N = 0_1^N}).   (61)

Equality follows in (60) from (57) and (59) by taking a_1^N = u_1^N, and in (61) from the fact that {x_1^N · y_1^N : y_1^N ∈ Y^N} = Y^N.

it would suffice to store only the subset of probabilities

  {W_N^{(i)}(y_1^N, 0_1^{i-1} | 0) : y_1^N ∈ Y^N}.

The size of the look-up table can be reduced further by using the remaining degrees of freedom in the choice of a_{i+1}^N. Let X_{i+1}^N ≜ {a_1^N ∈ X^N : a_1^i = 0_1^i}, 1 ≤ i ≤ N. Then, for any 1 ≤ i ≤ N, a_1^N ∈ X_{i+1}^N, and y_1^N ∈ Y^N, we have

  W_N^{(i)}(y_1^N, 0_1^{i-1} | 0) = W_N^{(i)}(a_1^N G_N · y_1^N, 0_1^{i-1} | 0)   (66)

which follows from (65) by taking u_1^i = 0_1^i on the left hand side.

To explore this symmetry further, let X_{i+1}^N · y_1^N ≜ {a_1^N G_N · y_1^N : a_1^N ∈ X_{i+1}^N}. The set X_{i+1}^N · y_1^N is the orbit of y_1^N under the action group X_{i+1}^N. The orbits X_{i+1}^N · y_1^N over variation of y_1^N partition the space Y^N into equivalence classes. Let Y_{i+1}^N be a set formed by taking one representative from each equivalence class. The output alphabet of the channel W_N^{(i)} can be represented effectively by the set Y_{i+1}^N.

For example, suppose W is a BSC with Y = {0, 1}. Each orbit X_{i+1}^N · y_1^N has 2^{N-i} elements and there are 2^i orbits. In particular, the channel W_N^{(1)} has effectively two outputs, and being symmetric, it has to be a BSC. This is a great simplification since W_N^{(1)} has an apparent output alphabet size of 2^N. Likewise, while W_N^{(i)} has an apparent output alphabet size of 2^{N+i-1}, due to symmetry, the size shrinks to 2^i.

Further output alphabet size reductions may be possible by exploiting other properties specific to certain B-DMCs. For example, if W is a BEC, the channels {W_N^{(i)}} are known to be BECs, each with an effective output alphabet size of three.

The symmetry properties of {W_N^{(i)}} help simplify the computation of the channel parameters.

Proposition 15: For any symmetric B-DMC W, the parameters {Z(W_N^{(i)})} given by (7) can be calculated by the simplified formula

  Z(W_N^{(i)}) = 2^{i-1} Σ_{y_1^N ∈ Y_{i+1}^N} |X_{i+1}^N · y_1^N| · √( W_N^{(i)}(y_1^N, 0_1^{i-1} | 0) W_N^{(i)}(y_1^N, 0_1^{i-1} | 1) ).

We omit the proof of this result.

For the important example of a BSC, this formula becomes

  Z(W_N^{(i)}) = 2^{N-1} Σ_{y_1^N ∈ Y_{i+1}^N} √( W_N^{(i)}(y_1^N, 0_1^{i-1} | 0) W_N^{(i)}(y_1^N, 0_1^{i-1} | 1) ).

This sum for Z(W_N^{(i)}) has 2^i terms, as compared to 2^{N+i-1} terms in (7).

Fig. 8. An alternative realization of the recursive construction for W_N.
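As an illustration of Prop. 15, the smallest case N = 2, i = 1 can be checked by brute force for a BSC (a sketch I wrote for this note, not code from the paper; the convention W_2^{(1)}(y_1^2 | u_1) = (1/2) Σ_{u_2} W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2) is assumed). Here (0,1) G_2 = (1,1), so the orbit of y_1^2 is {y_1^2, y_1^2 ⊕ (1,1)}: two orbits of size two.

```python
# Brute-force check of Prop. 15 for N = 2, i = 1 over a BSC(p).
from itertools import product
from math import sqrt, isclose

p = 0.11

def W(y, x):
    """BSC transition probability."""
    return 1 - p if y == x else p

def W2_1(y, u1):
    """First split channel W_2^{(1)} at block-length 2 (assumed convention)."""
    return 0.5 * sum(W(y[0], u1 ^ u2) * W(y[1], u2) for u2 in (0, 1))

# Direct evaluation of Z(W_2^{(1)}) over all 2^N = 4 outputs
Z_direct = sum(sqrt(W2_1(y, 0) * W2_1(y, 1)) for y in product((0, 1), repeat=2))

# Orbit-based evaluation: orbits {y, y XOR (1,1)}, representatives (0,0), (0,1)
reps = [(0, 0), (0, 1)]
Z_orbit = 2 ** 0 * sum(2 * sqrt(W2_1(y, 0) * W2_1(y, 1)) for y in reps)

assert isclose(Z_direct, Z_orbit)
```

The sum over two representatives, weighted by the orbit size, reproduces the full 4-term sum, as the proposition asserts.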
VII. ENCODING

In this section, we will consider the encoding of polar codes and prove the part of Theorem 5 about encoding complexity. We begin by giving explicit algebraic expressions for G_N, the generator matrix for polar coding, which so far has been defined only in a schematic form by Fig. 3. The algebraic forms of G_N naturally point at efficient implementations of the encoding operation x_1^N = u_1^N G_N. In analyzing the encoding operation G_N, we exploit its relation to fast transform methods in signal processing; in particular, we use the bit-indexing idea of [11] to interpret the various permutation operations that are part of G_N.

A. Formulas for G_N

In the following, assume N = 2^n for some n ≥ 0. Let I_k denote the k-dimensional identity matrix for any k ≥ 1. We begin by translating the recursive definition of G_N as given by Fig. 3 into an algebraic form:

  G_N = (I_{N/2} ⊗ F) R_N (I_2 ⊗ G_{N/2}), for N ≥ 2,

with G_1 = I_1.

Either by verifying algebraically that (I_{N/2} ⊗ F) R_N = R_N (F ⊗ I_{N/2}) or by observing that the channel combining operation in Fig. 3 can be redrawn equivalently as in Fig. 8, we obtain a second recursive formula

  G_N = R_N (F ⊗ I_{N/2})(I_2 ⊗ G_{N/2})
      = R_N (F ⊗ G_{N/2}),   (67)

valid for N ≥ 2. This form appears more suitable to derive a recursive relationship. We substitute G_{N/2} = R_{N/2} (F ⊗ G_{N/4}) back into (67) to obtain

  G_N = R_N (F ⊗ (R_{N/2} (F ⊗ G_{N/4})))
      = R_N (I_2 ⊗ R_{N/2}) (F^{⊗2} ⊗ G_{N/4})   (68)

where (68) is obtained by using the identity (AC) ⊗ (BD) = (A ⊗ B)(C ⊗ D) with A = I_2, B = R_{N/2}, C = F, D = F ⊗ G_{N/4}. Repeating this, we obtain

  G_N = B_N F^{⊗n}   (69)

where B_N ≜ R_N (I_2 ⊗ R_{N/2})(I_4 ⊗ R_{N/4}) · · · (I_{N/2} ⊗ R_2). It can be seen by simple manipulations that

  B_N = R_N (I_2 ⊗ B_{N/2}).   (70)

We can see that B_N is a permutation matrix by the following induction argument. Assume that B_{N/2} is a permutation matrix for some N ≥ 4; this is true for N = 4 since B_2 = I_2. Then, B_N is a permutation matrix because it is the product of two permutation matrices, R_N and I_2 ⊗ B_{N/2}.

In the following, we will say more about the nature of B_N as a permutation.
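These identities are easy to check numerically. The following sketch (my own helper functions over GF(2), not code from the paper) builds G_N from the first recursion and verifies (69) and (70) for N = 8:

```python
# Verify G_N = (I_{N/2} (x) F) R_N (I_2 (x) G_{N/2}), (69) G_N = B_N F^{(x)n},
# and (70) B_N = R_N (I_2 (x) B_{N/2}) over GF(2), for N = 8.
F = [[1, 0], [1, 1]]

def kron(A, B):  # Kronecker product of 0/1 matrices (lists of lists)
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]

def matmul(A, B):  # matrix product over GF(2)
    return [[sum(a * b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

def R(N):  # reverse shuffle: v_{b1...bn} = u_{b2...bn b1}
    n = N.bit_length() - 1
    M = [[0] * N for _ in range(N)]
    for j in range(N):
        src = ((j << 1) | (j >> (n - 1))) & (N - 1)  # rotate bit-index left
        M[src][j] = 1
    return M

def B(N):  # bit-reversal permutation
    n = N.bit_length() - 1
    M = [[0] * N for _ in range(N)]
    for i in range(N):
        M[i][int(format(i, "0%db" % n)[::-1], 2)] = 1
    return M

def G(N):  # the recursive definition translated from Fig. 3
    if N == 1:
        return [[1]]
    return matmul(matmul(kron(identity(N // 2), F), R(N)),
                  kron(identity(2), G(N // 2)))

def F_kron(n):  # F^{(x)n}
    M = [[1]]
    for _ in range(n):
        M = kron(F, M)
    return M

assert G(4) == [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
assert G(8) == matmul(B(8), F_kron(3))                      # (69)
assert B(8) == matmul(R(8), kron(identity(2), B(4)))        # (70)
assert G(8) == matmul(F_kron(3), B(8))  # B_N also commutes past F^{(x)n}
```

The last assertion anticipates the commutation property established below: F^{⊗n} is bit-reversal invariant.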
B. Analysis by bit-indexing

To analyze the encoding operation further, it will be convenient to index vectors and matrices with bit sequences. Given a vector a_1^N with length N = 2^n for some n ≥ 0, we denote its ith element, a_i, 1 ≤ i ≤ N, alternatively as a_{b_1···b_n}, where b_1···b_n is the binary expansion of the integer i − 1 in the sense that i = 1 + Σ_{j=1}^n b_j 2^{n-j}. Likewise, the element A_{ij} of an N-by-N matrix A is denoted alternatively as A_{b_1···b_n, b′_1···b′_n}, where b_1···b_n and b′_1···b′_n are the binary representations of i − 1 and j − 1, respectively. Using this convention, it can be readily verified that the product C = A ⊗ B of a 2^n-by-2^n matrix A and a 2^m-by-2^m matrix B has elements C_{b_1···b_{n+m}, b′_1···b′_{n+m}} = A_{b_1···b_n, b′_1···b′_n} B_{b_{n+1}···b_{n+m}, b′_{n+1}···b′_{n+m}}.

We now consider the encoding operation under bit-indexing. First, we observe that the elements of F in bit-indexed form are given by F_{b,b′} = 1 ⊕ b′ ⊕ b b′ for all b, b′ ∈ {0, 1}. Thus, F^{⊗n} has elements

  F^{⊗n}_{b_1···b_n, b′_1···b′_n} = Π_{i=1}^n F_{b_i, b′_i} = Π_{i=1}^n (1 ⊕ b′_i ⊕ b_i b′_i).   (71)

Second, the reverse shuffle operator R_N acts on a row vector u_1^N to replace the element in bit-indexed position b_1···b_n with the element in position b_2···b_n b_1; that is, if v_1^N = u_1^N R_N, then v_{b_1···b_n} = u_{b_2···b_n b_1} for all b_1, ..., b_n ∈ {0, 1}. In other words, R_N cyclically rotates the bit-indexes of the elements of a left operand u_1^N to the right by one place.

Third, the matrix B_N in (69) can be interpreted as the bit-reversal operator: if v_1^N = u_1^N B_N, then v_{b_1···b_n} = u_{b_n···b_1} for all b_1, ..., b_n ∈ {0, 1}. This statement can be proved by induction using the recursive formula (70). We give the idea of such a proof by an example. Let us assume that B_4 is a bit-reversal operator and show that the same is true for B_8. Let u_1^8 be any vector over GF(2). Using bit-indexing, it can be written as (u_{000}, u_{001}, u_{010}, u_{011}, u_{100}, u_{101}, u_{110}, u_{111}). Since u_1^8 B_8 = u_1^8 R_8 (I_2 ⊗ B_4), let us first consider the action of R_8 on u_1^8. The reverse shuffle R_8 rearranges the elements of u_1^8 with respect to odd-even parity of their indices, so u_1^8 R_8 equals (u_{000}, u_{010}, u_{100}, u_{110}, u_{001}, u_{011}, u_{101}, u_{111}). This has two halves, c_1^4 ≜ (u_{000}, u_{010}, u_{100}, u_{110}) and d_1^4 ≜ (u_{001}, u_{011}, u_{101}, u_{111}), corresponding to odd-even index classes. Notice that c_{b_1 b_2} = u_{b_1 b_2 0} and d_{b_1 b_2} = u_{b_1 b_2 1} for all b_1, b_2 ∈ {0, 1}. This is to be expected since the reverse shuffle rearranges the indices in increasing order within each odd-even index class. Next, consider the action of I_2 ⊗ B_4 on (c_1^4, d_1^4). The result is (c_1^4 B_4, d_1^4 B_4). By assumption, B_4 is a bit-reversal operation, so c_1^4 B_4 = (c_{00}, c_{10}, c_{01}, c_{11}), which in turn equals (u_{000}, u_{100}, u_{010}, u_{110}). Likewise, the result of d_1^4 B_4 equals (u_{001}, u_{101}, u_{011}, u_{111}). Hence, the overall operation B_8 is a bit-reversal operation.

Given the bit-reversal interpretation of B_N, it is clear that B_N is a symmetric matrix, so B_N^T = B_N. Since B_N is a permutation, it follows from symmetry that B_N^{-1} = B_N.

It is now easy to see that, for any N-by-N matrix A, the product C = B_N^T A B_N has elements C_{b_1···b_n, b′_1···b′_n} = A_{b_n···b_1, b′_n···b′_1}. It follows that if A is invariant under bit-reversal, i.e., if A_{b_1···b_n, b′_1···b′_n} = A_{b_n···b_1, b′_n···b′_1} for every b_1, ..., b_n, b′_1, ..., b′_n ∈ {0, 1}, then A = B_N^T A B_N. Since B_N^T = B_N^{-1}, this is equivalent to B_N A = A B_N. Thus, bit-reversal-invariant matrices commute with the bit-reversal operator.

Proposition 16: For any N = 2^n, n ≥ 1, the generator matrix G_N is given by G_N = B_N F^{⊗n} and G_N = F^{⊗n} B_N, where B_N is the bit-reversal permutation. G_N is a bit-reversal invariant matrix with

  (G_N)_{b_1···b_n, b′_1···b′_n} = Π_{i=1}^n (1 ⊕ b′_i ⊕ b_{n-i+1} b′_i).   (72)

Proof: F^{⊗n} commutes with B_N because it is invariant under bit-reversal, which is immediate from (71). The statement G_N = B_N F^{⊗n} was established before; by proving that F^{⊗n} commutes with B_N, we have established the other statement: G_N = F^{⊗n} B_N. The bit-indexed form (72) follows by applying bit-reversal to (71).

Finally, we give a fact that will be useful in Sect. X.

Proposition 17: For any N = 2^n, n ≥ 0, b_1, ..., b_n ∈ {0, 1}, the rows of G_N and F^{⊗n} with index b_1···b_n have the same Hamming weight, given by 2^{w_H(b_1,...,b_n)}, where

  w_H(b_1, ..., b_n) ≜ Σ_{i=1}^n b_i   (73)

is the Hamming weight of (b_1, ..., b_n).

Proof: For fixed b_1, ..., b_n, the sum of the terms (G_N)_{b_1···b_n, b′_1···b′_n} (as integers) over all b′_1, ..., b′_n ∈ {0, 1} gives the Hamming weight of the row of G_N with index b_1···b_n. From the preceding formula for (G_N)_{b_1···b_n, b′_1···b′_n}, this sum is easily seen to be 2^{w_H(b_1,...,b_n)}. The proof for F^{⊗n} is similar.

C. Encoding complexity

For complexity estimation, our computational model will be a single processor machine with a random access memory. The complexities expressed will be time complexities. The discussion will be given for an arbitrary G_N-coset code with parameters (N, K, A, u_{A^c}).

Let χ_E(N) denote the worst-case encoding complexity over all (N, K, A, u_{A^c}) codes with a given block-length N. If we take the complexity of a scalar mod-2 addition as 1 unit and the complexity of the reverse shuffle operation R_N as N units, we see from Fig. 3 that χ_E(N) ≤ N/2 + N + 2 χ_E(N/2). Starting with an initial value χ_E(2) = 3 (a generous figure), we obtain by induction that χ_E(N) ≤ (3/2) N log N for all N = 2^n, n ≥ 1. Thus, the encoding complexity is O(N log N).

A specific implementation of the encoder using the form G_N = B_N F^{⊗n} is shown in Fig. 9 for N = 8. The input to the circuit is the bit-reversed version of u_1^8, i.e., ũ_1^8 = u_1^8 B_8. The output is given by x_1^8 = ũ_1^8 F^{⊗3} = u_1^8 G_8. In general, the complexity of this implementation is O(N log N), with O(N) for B_N and O(N log N) for F^{⊗n}.

An alternative implementation of the encoder would be to apply u_1^8 in natural index order at the input of the circuit in Fig. 9. Then, we would obtain x̃_1^8 = u_1^8 F^{⊗3} at the output. Encoding could be completed by a post bit-reversal operation: x_1^8 = x̃_1^8 B_8 = u_1^8 G_8.

The encoding circuit of Fig. 9 suggests many parallel implementation alternatives for F^{⊗n}: for example, with N processors, one may do a "column by column" implementation, and reduce the total latency to log N. Various other trade-offs are possible between latency and hardware complexity.
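The implementation just described can be sketched in a few lines (my own sketch, assuming the conventions above): bit-reverse the input and then apply the n butterfly stages of F^{⊗n}, for O(N log N) operations in total.

```python
def polar_encode(u):
    """Compute x = u G_N in O(N log N): permute by B_N, then apply the
    n stages of F butterflies; each stage maps (a, b) -> (a XOR b, b)."""
    N = len(u)
    n = N.bit_length() - 1
    x = [u[int(format(i, "0%db" % n)[::-1], 2)] for i in range(N)]  # u~ = u B_N
    half = N // 2
    while half:
        for start in range(0, N, 2 * half):
            for i in range(start, start + half):
                x[i] ^= x[i + half]
        half //= 2
    return x

# Rows of G_N are the encodings of unit vectors; their Hamming weights
# match Prop. 17: weight 2^{w_H(b_1...b_n)} for row index with bits of i.
for i in range(8):
    e = [0] * 8
    e[i] = 1
    assert sum(polar_encode(e)) == 2 ** bin(i).count("1")
```

The alternative implementation (natural input order, post bit-reversal) would simply move the permutation line to the end.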
û_1^{i-1}, and upon receiving them, computes the likelihood ratio (LR)

  L_N^{(i)}(y_1^N, û_1^{i-1}) ≜ W_N^{(i)}(y_1^N, û_1^{i-1} | 0) / W_N^{(i)}(y_1^N, û_1^{i-1} | 1)

and generates its decision as

  û_i = { 0, if L_N^{(i)}(y_1^N, û_1^{i-1}) ≥ 1;  1, otherwise };

if i ∈ A^c, the decision û_i is set to the known frozen value u_i, regardless of L_N^{(i)}(y_1^N, û_1^{i-1}).

[Fig. 9: encoder circuit for N = 8, with bit-reversed inputs ũ_1 = u_1, ũ_2 = u_5, ũ_3 = u_3, ũ_4 = u_7, ..., and outputs x_1, ..., x_8.]

To see where the computational savings will come from, we inspect (74) and (75) and note that each LR value in the pair

  (L_N^{(2i-1)}(y_1^N, û_1^{2i-2}), L_N^{(2i)}(y_1^N, û_1^{2i-1}))

is assembled from the same pair of LRs:

  (L_{N/2}^{(i)}(y_1^{N/2}, û_{1,o}^{2i-2} ⊕ û_{1,e}^{2i-2}), L_{N/2}^{(i)}(y_{N/2+1}^N, û_{1,e}^{2i-2})).

Thus, the calculation of all N LRs at length N requires exactly N LR calculations at length N/2.³ Let us split the N LRs at length N/2 into two classes, namely,

  {L_{N/2}^{(i)}(y_1^{N/2}, û_{1,o}^{2i-2} ⊕ û_{1,e}^{2i-2}) : 1 ≤ i ≤ N/2},
  {L_{N/2}^{(i)}(y_{N/2+1}^N, û_{1,e}^{2i-2}) : 1 ≤ i ≤ N/2}.   (78)

Let us suppose that we carry out the calculations in each class independently, without trying to exploit any further savings that may come from the sharing of LR values between the two classes. Then, we have two problems of the same type as the original but at half the size. Each class in (78) generates a set of N/2 LR calculation requests at length N/4, for a total of N requests. For example, if we let v̂_1^{N/2} ≜ û_{1,o}^{N/2} ⊕ û_{1,e}^{N/2}, the requests arising from the first class are

  {L_{N/4}^{(i)}(y_1^{N/4}, v̂_{1,o}^{2i-2} ⊕ v̂_{1,e}^{2i-2}) : 1 ≤ i ≤ N/4},
  {L_{N/4}^{(i)}(y_{N/4+1}^{N/2}, v̂_{1,e}^{2i-2}) : 1 ≤ i ≤ N/4}.

The decoder is visualized as consisting of N DEs situated at the left-most side of the decoder graph. The node with label (y_1^8, û_1^{i-1}) is associated with the ith DE, 1 ≤ i ≤ 8. The positioning of the DEs in the left-most column follows the bit-reversed index order, as in Fig. 9.

[Fig. 10: decoder graph for N = 8; each node carries an output label (y, û-combination) and an activation-order label 1 through 32.]

Fig. 10. An implementation of the successive cancellation decoder for polar coding at block-length N = 8.

Using this reasoning inductively across the set of all lengths {N, N/2, ..., 1}, we conclude that the total number of LRs that need to be calculated is N(1 + log N).
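A direct, unoptimized rendering of this recursion can be sketched as follows (my own sketch for a BSC with crossover probability p, assuming formulas (74) for odd indices and (75) for even indices; it recomputes shared LRs rather than caching them as the graph of Fig. 10 does, so it does not attain the N(1 + log N) count):

```python
def sc_decode(y, frozen, p):
    """Successive cancellation decoding over a BSC(p).
    y: received word (list of 0/1); frozen: dict {1-based index: frozen bit}."""

    def L(y, u_hat):
        # LR of bit-channel i = len(u_hat) + 1 at block-length len(y)
        N = len(y)
        if N == 1:
            return (1 - p) / p if y[0] == 0 else p / (1 - p)
        u_odd, u_even = u_hat[0::2], u_hat[1::2]
        if len(u_hat) % 2 == 0:  # odd i, formula (74)
            a = L(y[: N // 2], [o ^ e for o, e in zip(u_odd, u_even)])
            b = L(y[N // 2 :], u_even)
            return (a * b + 1) / (a + b)
        else:  # even i, formula (75); u_hat[-1] is the decision just made
            a = L(y[: N // 2], [o ^ e for o, e in zip(u_odd[:-1], u_even)])
            b = L(y[N // 2 :], u_even)
            return (a if u_hat[-1] == 0 else 1 / a) * b

    u_hat = []
    for i in range(1, len(y) + 1):
        if i in frozen:
            u_hat.append(frozen[i])
        else:
            u_hat.append(0 if L(y, u_hat) >= 1 else 1)
    return u_hat
```

With a noiseless observation of x = u G_N and no frozen bits, the decoder simply inverts the transform; e.g. sc_decode([1, 0, 1, 1], {}, 0.01) recovers u = (1, 0, 1, 1).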
So far, we have not paid attention to the exact order in which the LR calculations at various block-lengths are carried out. Although this gave us an accurate count of the total number of LR calculations, for a full description of the algorithm, we need to specify an order. There are many possibilities for such an order, but to be specific we will use a depth-first algorithm, which is easily described by a small example.

We consider a decoder for a code with parameter (N, K, A, u_{A^c}) chosen as (8, 5, {3, 5, 6, 7, 8}, (0, 0, 0)). The computation for the decoder is laid out in a graph as shown in Fig. 10. There are N(1 + log N) = 32 nodes in the graph, each responsible for computing an LR request that arises during the course of the algorithm. Starting from the left side, the first column of nodes corresponds to LR requests at length 8 (decision level), the second column of nodes to requests at length 4, the third at length 2, and the fourth at length 1 (channel level).

Each node in the graph carries two labels. For example, the third node from the bottom in the third column has the labels (y_5^6, û_2 ⊕ û_4) and 26; the first label indicates that the LR value to be calculated at this node is L_2^{(2)}(y_5^6, û_2 ⊕ û_4), while the second label indicates that this node will be the 26th node to be activated. The numeric labels, 1 through 32, will be used as quick identifiers in referring to nodes in the graph.

Decoding begins with DE 1 activating node 1 for the calculation of L_8^{(1)}(y_1^8). Node 1 in turn activates node 2 for L_4^{(1)}(y_1^4). At this point, program control passes to node 2, and node 1 will wait until node 2 delivers the requested LR. The process continues. Node 2 activates node 3, which activates node 4. Node 4 is a node at the channel level; so it computes L_1^{(1)}(y_1) and passes it to nodes 3 and 23, its left-side neighbors. In general, a node will send its computational result to all its left-side neighbors (although this will not be stated explicitly below). Program control will be passed back to the left neighbor from which it was received.

Node 3 still needs data from the right side and activates node 5, which delivers L_1^{(1)}(y_2). Node 3 assembles L_2^{(1)}(y_1^2) from the messages it has received from nodes 4 and 5 and sends it to node 2. Next, node 2 activates node 6, which activates nodes 7 and 8, and returns its result to node 2. Node 2 compiles its response L_4^{(1)}(y_1^4) and sends it to node 1. Node 1 activates node 9, which calculates L_4^{(1)}(y_5^8) in the same manner as node 2 calculated L_4^{(1)}(y_1^4), and returns the result to node 1. Node 1 now assembles L_8^{(1)}(y_1^8) and sends it to DE 1. Since u_1 is a frozen bit, DE 1 ignores the received LR, declares û_1 = 0, and passes control to DE 2, located next to node 16.

DE 2 activates node 16 for L_8^{(2)}(y_1^8, û_1). Node 16 assembles L_8^{(2)}(y_1^8, û_1) from the already-received LRs L_4^{(1)}(y_1^4) and L_4^{(1)}(y_5^8), and returns its response without activating any node. DE 2 ignores the returned LR since u_2 is frozen, announces û_2 = 0, and passes control to DE 3.

³ Actually, some LR calculations at length N/2 may be avoided if, by chance, some duplications occur, but we will disregard this.

DE 3 activates node 17 for L_8^{(3)}(y_1^8, û_1^2). This triggers LR requests at nodes 18 and 19, but no further. The bit u_3 is not frozen; so, the decision û_3 is made in accordance with L_8^{(3)}(y_1^8, û_1^2), and control is passed to DE 4. DE 4 activates node 20 for L_8^{(4)}(y_1^8, û_1^3), which is readily assembled and returned. The algorithm continues in this manner until finally DE 8 receives L_8^{(8)}(y_1^8, û_1^7) and decides û_8.

There are a number of observations that can be made by looking at this example that should provide further insight into the general decoding algorithm. First, notice that the computation of L_8^{(1)}(y_1^8) is carried out in a subtree rooted at node 1, consisting of paths going from left to right, and spanning all nodes at the channel level. This subtree splits into two disjoint subtrees, namely, the subtree rooted at node 2 for the calculation of L_4^{(1)}(y_1^4) and the subtree rooted at node 9 for the calculation of L_4^{(1)}(y_5^8). Since the two subtrees are disjoint, the corresponding calculations can be carried out independently (even in parallel if there are multiple processors). This splitting of computational subtrees into disjoint subtrees holds for all nodes in the graph (except those at the channel level), making it possible to implement the decoder with a high degree of parallelism.

Second, we notice that the decoder graph consists of butterflies (2-by-2 complete bipartite graphs) that tie together adjacent levels of the graph. For example, nodes 9, 19, 10, and 13 form a butterfly. The computational subtrees rooted at nodes 9 and 19 split into a single pair of computational subtrees, one rooted at node 10, the other at node 13. Also note that among the four nodes of a butterfly, the upper-left node is always the first node to be activated by the above depth-first algorithm and the lower-left node always the last one. The upper-right and lower-right nodes are activated by the upper-left node and they may be activated in any order or even in parallel. The algorithm we specified always activated the upper-right node first, but this choice was arbitrary. When the lower-left node is activated, it finds the LRs from its right neighbors ready for assembly. The upper-left node assembles the LRs it receives from the right side as in formula (74), the lower-left node as in (75). These formulas show that the butterfly patterns impose a constraint on the completion time of LR calculations: in any given butterfly, the lower-left node needs to wait for the result of the upper-left node, which in turn needs to wait for the results of the right-side nodes.

Variants of the decoder are possible in which the nodal computations are scheduled differently. In the "left-to-right" implementation given above, nodes waited to be activated. However, it is possible to have a "right-to-left" implementation in which each node starts its computation autonomously as soon as its right-side neighbors finish their calculations; this allows exploiting parallelism in computations to the maximum possible extent.

For example, in such a fully-parallel implementation for the case in Fig. 10, all eight nodes at the channel level start calculating their respective LRs in the first time slot following the availability of the channel output vector y_1^8. In the second time slot, nodes 3, 6, 10, and 13 do their LR calculations in parallel. Note that this is the maximum degree of parallelism possible in the second time slot. Node 23, for example, cannot calculate L_2^{(2)}(y_1^2, û_1 ⊕ û_2 ⊕ û_3 ⊕ û_4) in this slot, because û_1 ⊕ û_2 ⊕ û_3 ⊕ û_4 is not yet available; it has to wait until the decisions û_1, û_2, û_3, û_4 are announced by the corresponding DEs. In the third time slot, nodes 2 and 9 do their calculations. In time slot 4, the first decision û_1 is made at node 1 and broadcast to all nodes across the graph (or at least to those that need it). In slot 5, node 16 calculates û_2 and broadcasts it. In slot 6, nodes 18 and 19 do their calculations. This process continues until time slot 15, when node 32 decides û_8. It can be shown that, in general, this fully-parallel decoder implementation has a latency of 2N − 1 time slots for a code of block-length N.

IX. CODE CONSTRUCTION

The input to a polar code construction algorithm is a triple (W, N, K) where W is the B-DMC on which the code will be used, N is the code block-length, and K is the dimensionality of the code. The output of the algorithm is an information set A ⊂ {1, ..., N} of size K such that Σ_{i∈A} Z(W_N^{(i)}) is as small as possible. We exclude the search for a good frozen vector u_{A^c} from the code construction problem because the problem is already difficult enough. Recall that, for symmetric channels, the code performance is not affected by the choice of u_{A^c}.

In principle, the code construction problem can be solved by computing all the parameters {Z(W_N^{(i)}) : 1 ≤ i ≤ N} and sorting them; unfortunately, we do not have an efficient algorithm for doing this. For symmetric channels, some computational shortcuts are available, as we showed by Prop. 15, but these shortcuts have not yielded an efficient algorithm, either. One exception to all this is the BEC, for which the parameters {Z(W_N^{(i)})} can all be calculated in time O(N) thanks to the recursive formulas (38).

Since exact code construction appears too complex, it makes sense to look for approximate constructions based on estimates of the parameters {Z(W_N^{(i)})}. To that end, it is preferable to pose the exact code construction problem as a decision problem: Given a threshold γ ∈ [0, 1] and an index i ∈ {1, ..., N}, decide whether i ∈ A_γ where

  A_γ ≜ {i ∈ {1, ..., N} : Z(W_N^{(i)}) < γ}.

Any algorithm for solving this decision problem can be used to solve the code construction problem. We can simply run the algorithm with various settings for γ until we obtain an information set A_γ of the desired size K.

Approximate code construction algorithms can be proposed based on statistically reliable and efficient methods for estimating whether i ∈ A_γ for any given pair (i, γ). The estimation problem can be approached by noting that, as we have implicitly shown in (53), the parameter Z(W_N^{(i)}) is the expectation of the RV

  √( W_N^{(i)}(Y_1^N, U_1^{i-1} | U_i ⊕ 1) / W_N^{(i)}(Y_1^N, U_1^{i-1} | U_i) )   (79)
where (U_1^N, Y_1^N) is sampled from the joint probability assignment P_{U_1^N, Y_1^N}(u_1^N, y_1^N) = 2^{-N} W_N(y_1^N | u_1^N). A Monte-Carlo approach can be taken where samples of (U_1^N, Y_1^N) are generated from the given distribution and the empirical means {Ẑ(W_N^{(i)})} are calculated. Given a sample (u_1^N, y_1^N) of (U_1^N, Y_1^N), the sample values of the RVs (79) can all be computed in complexity O(N log N). A SC decoder may be used for this computation since the sample values of (79) are just the square-roots of the decision statistics that the DEs in a SC decoder ordinarily compute. (In applying a SC decoder for this task, the information set A should be taken as the null set.)

Statistical algorithms are helped by the polarization phenomenon: for any fixed γ and as N grows, it becomes easier to resolve whether Z(W_N^{(i)}) < γ because an ever growing fraction of the parameters {Z(W_N^{(i)})} tend to cluster around 0 or 1.

It is conceivable that, in an operational system, the estimation of the parameters {Z(W_N^{(i)})} is made part of a SC decoding procedure, with continual update of the information set as more reliable estimates become available.

X. A NOTE ON THE RM RULE

In this part, we return to the claim made in Sect. I-D that the RM rule for information set selection leads to asymptotically unreliable codes under SC decoding.

Recall that, for a given (N, K), the RM rule constructs a G_N-coset code with parameter (N, K, A, u_{A^c}) by prioritizing each index i ∈ {1, ..., N} for inclusion in the information set A w.r.t. the Hamming weight of the ith row of G_N. The RM rule sets the frozen bits u_{A^c} to zero. In light of Prop. 17, the RM rule can be restated in bit-indexed terminology as follows.

RM rule: For a given (N, K), with N = 2^n, n ≥ 0, 0 ≤ K ≤ N, choose A as follows: (i) Determine the integer r such that

  Σ_{k=r}^n (n choose k) ≤ K < Σ_{k=r-1}^n (n choose k).   (80)

(ii) Put each index b_1···b_n with w_H(b_1, ..., b_n) ≥ r into A. (iii) Put sufficiently many additional indices b_1···b_n with w_H(b_1, ..., b_n) = r − 1 into A to complete its size to K.

We observe that this rule will select the index

  0^{n-r} 1^r ≜ 0···0 1···1   (n − r zeros followed by r ones)

for inclusion in A. This index turns out to be a particularly poor choice, at least for the class of BECs, as we show in the remaining part of this section.

Let us assume that the code constructed by the RM rule is used on a BEC W with some erasure probability ǫ > 0. We will show that the symmetric capacity I(W_{0^{n-r} 1^r}) converges to zero for any fixed positive coding rate as the block-length is increased. For this, we recall the relations (6), which, in the bit-indexed channel notation of Sect. IV, can be written as follows. For any ℓ ≥ 1, b_1, ..., b_ℓ ∈ {0, 1},

  I(W_{b_1···b_ℓ 0}) = I(W_{b_1···b_ℓ})^2,
  I(W_{b_1···b_ℓ 1}) = 2 I(W_{b_1···b_ℓ}) − I(W_{b_1···b_ℓ})^2 ≤ 2 I(W_{b_1···b_ℓ}),

with initial values I(W_0) = I(W)^2 and I(W_1) = 2 I(W) − I(W)^2. These give the bound

  I(W_{0^{n-r} 1^r}) ≤ 2^r (1 − ǫ)^{2^{n-r}}.   (81)

Now, consider a sequence of RM codes with a fixed rate 0 < R < 1, N increasing to infinity, and K = ⌊NR⌋. Let r(N) denote the parameter r in (80) for the code with block-length N in this sequence. Let n = log_2(N). A simple asymptotic analysis shows that the ratio r(N)/n must go to 1/2 as N is increased. This in turn implies by (81) that I(W_{0^{n-r} 1^r}) must go to zero.

Suppose that this sequence of RM codes is decoded using a SC decoder as in Sect. I-C.2, where the decision metric ignores knowledge of frozen bits and instead uses randomization over all possible choices. Then, as N goes to infinity, the SC decoder decision element with index 0^{n-r} 1^r sees a channel whose capacity goes to zero, while the corresponding element of the input vector u_1^N is assigned 1 bit of information by the RM rule. This means that the RM code sequence is asymptotically unreliable under this type of SC decoding.

We should emphasize that the above result does not say that RM codes are asymptotically bad under any SC decoder, nor does it make a claim about the performance of RM codes under other decoding algorithms. (It is interesting that the possibility of RM codes being capacity-achieving codes under ML decoding seems to have received no attention in the literature.)

XI. CONCLUDING REMARKS

In this section, we go through the paper to discuss some results further, point out some generalizations, and state some open problems.

A. Rate of polarization

A major open problem suggested by this paper is to determine how fast a channel polarizes as a function of the block-length parameter N. In recent work [12], the following result has been obtained in this direction.

Proposition 18: Let W be a B-DMC. For any fixed rate R < I(W) and constant β < 1/2, there exists a sequence of sets {A_N} such that A_N ⊂ {1, ..., N}, |A_N| ≥ NR, and

  Σ_{i∈A_N} Z(W_N^{(i)}) = o(2^{-N^β}).   (82)

Conversely, if R > 0 and β > 1/2, then for any sequence of sets {A_N} with A_N ⊂ {1, ..., N}, |A_N| ≥ NR, we have

  max{Z(W_N^{(i)}) : i ∈ A_N} = ω(2^{-N^β}).   (83)

As a corollary, Theorem 3 is strengthened as follows.
Proposition 19: For polar coding on a B-DMC W at any fixed rate R < I(W), and any fixed β < 1/2,

  P_e(N, R) = o(2^{-N^β}).   (84)

This is a vast improvement over the O(N^{-1/4}) bound proved in this paper. Note that the bound still does not depend on the rate R as long as R < I(W). A problem of theoretical interest is to obtain sharper bounds on P_e(N, R) that show a more explicit dependence on R.

Another problem of interest related to polarization is robustness against channel parameter variations. A finding in this regard is the following result [13]: If a polar code is designed for a B-DMC W but used on some other B-DMC W′, then the code will perform at least as well as it would perform on W provided W is a degraded version of W′ in the sense of Shannon [14]. This result gives reason to expect a graceful degradation of polar-coding performance due to errors in channel modeling.

B. Generalizations

[Fig. 11: the input u_1^N is split into sub-blocks u_1^m, ..., u_{N-m+1}^N, each passed through a copy of the kernel F_m (with randomization inputs r_1, r_2, ...) to produce s_1^N; the permutation R_N routes s_1^N to m copies of W_{N/m}, which produce y_1^N.]

Fig. 11. General form of channel combining.

The polarization scheme considered in this paper can be generalized as shown in Fig. 11. In this general form, the channel input alphabet is assumed q-ary, X = {0, 1, ..., q−1}, for some q ≥ 2. The construction begins by combining m independent copies of a DMC W : X → Y to obtain W_m, where m ≥ 2 is a fixed parameter of the construction. The general step combines m independent copies of the channel W_{N/m} from the previous step to obtain W_N. In general, the size of the construction is N = m^n after n steps. The construction is characterized by a kernel F_m : X^m × R → X^m, where R is some finite set included in the mapping for randomization. The reason for introducing randomization will be discussed shortly.

The vectors u_1^N ∈ X^N and y_1^N ∈ Y^N in Fig. 11 denote the input and output vectors of W_N. The input vector is first transformed into a vector s_1^N ∈ X^N by breaking it into N/m consecutive sub-blocks of length m, namely, u_1^m, ..., u_{N-m+1}^N, and passing each sub-block through the transform F_m. Then, a permutation R_N sorts the components of s_1^N w.r.t. mod-m residue classes of their indices. The sorter ensures that, for any 1 ≤ k ≤ m, the kth copy of W_{N/m}, counting from the top of the figure, gets as input those components of s_1^N whose indices are congruent to k mod-m. For example, v_1 = s_1, v_2 = s_{m+1}, v_{N/m} = s_{(N/m-1)m+1}, v_{N/m+1} = s_2, v_{N/m+2} = s_{m+2}, and so on. The general formula is v_{kN/m+j} = s_{k+(j-1)m+1} for all 0 ≤ k ≤ (m − 1), 1 ≤ j ≤ N/m.

We regard the randomization parameters r_1, ..., r_{N/m} as being chosen at random at the time of code construction, but fixed throughout the operation of the system; the decoder operates with full knowledge of them. For the binary case considered in this paper, we did not employ any randomization. Here, randomization has been introduced as part of the general construction because preliminary studies show that it greatly simplifies the analysis of generalized polarization schemes. This subject will be explored further in future work.

Certain additional constraints need to be placed on the kernel F_m to ensure that a polar code can be defined that is suitable for SC decoding in the natural order u_1 to u_N. To that end, it is sufficient to restrict F_m to unidirectional functions, namely, invertible functions of the form F_m : (u_1^m, r) ∈ X^m × R ↦ x_1^m ∈ X^m such that x_i = f_i(u_i^m, r), for a given set of coordinate functions f_i : X^{m-i+1} × R → X, i = 1, ..., m. For a unidirectional F_m, the combined channel W_N can be split into channels {W_N^{(i)}} in much the same way as in this paper. The encoding and SC decoding complexities of such a code are both O(N log N).

Polar coding can be generalized further in order to overcome the restriction of the block-length N to powers of a given number m by using a sequence of kernels F_{m_i}, i = 1, ..., n, in the code construction. Kernel F_{m_1} combines m_1 copies of a given DMC W to create a channel W_{m_1}. Kernel F_{m_2} combines m_2 copies of W_{m_1} to create a channel W_{m_1 m_2}, etc., for an overall block-length of N = Π_{i=1}^n m_i. If all kernels are unidirectional, the combined channel W_N can still be split into channels W_N^{(i)} whose transition probabilities can be expressed by recursive formulas, and O(N log N) encoding and decoding complexities are maintained.

So far we have considered only combining copies of one DMC W. Another direction for generalization of the method is to combine copies of two or more distinct DMCs. For example, the kernel F considered in this paper can be used to combine copies of any two B-DMCs W, W′. The investigation of coding advantages that may result from such variations on the basic code construction method is an area for further research.

It is easy to propose variants and generalizations of the basic channel polarization scheme, as we did above; however,
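The sorter's index map is easy to sanity-check (a sketch; the function name mod_m_sort is mine): for m = 2 it reduces to the reverse shuffle R_N of the binary construction.

```python
def mod_m_sort(s, m):
    """Apply v_{k N/m + j} = s_{k + (j-1)m + 1} (1-based in the text;
    this implementation uses 0-based Python lists)."""
    N = len(s)
    v = [None] * N
    for k in range(m):
        for j in range(1, N // m + 1):
            v[k * (N // m) + j - 1] = s[k + (j - 1) * m]
    return v

s = list(range(1, 9))                                  # s_1, ..., s_8
assert mod_m_sort(s, 2) == [1, 3, 5, 7, 2, 4, 6, 8]    # = reverse shuffle R_8
assert mod_m_sort(s, 4) == [1, 5, 2, 6, 3, 7, 4, 8]
```

Each of the m output blocks collects one mod-m residue class of indices, as required by the figure.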
find k²/(1 + k²) = δ. So, the maximum occurs at δ_i = δR_i and has the value

Σ_{i=1}^{n} √(R_i² − δ²R_i²) = √(1 − δ²).

We have thus shown that Z(W) ≤ √(1 − d(W)²), which is equivalent to d(W) ≤ √(1 − Z(W)²).

From the above two lemmas, the proof of (2) is immediate.

W′(ỹ | u1) = P_{Ỹ|U1}(ỹ | u1),
W′′(ỹ, u1 | u2) = P_{ỸU1|U2}(ỹ, u1 | u2).

From these and the fact that (Y1, Y2) ↦ Ỹ is invertible, we get

I(W′) = I(U1; Ỹ) = I(U1; Y1Y2),
I(W′′) = I(U2; ỸU1) = I(U2; Y1Y2U1).

Thus, either there exists no y2 such that W(y2|0)W(y2|1) > 0, in which case I(W) = 1, or for all y1 we have W(y1|0) = W(y1|1), which implies I(W) = 0.
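The relations among I(W), Z(W), and d(W) and the conservation of mutual information under the transform can be checked numerically. The sketch below is ours, not the paper's: it restates the single-step transform (W, W) ↦ (W′, W′′) (cf. (17)), with W′ and W′′ written as W1 and W2 and a B-DMC stored as a dict of output-probability lists, and verifies I(W′) + I(W′′) = 2I(W), I(W′) ≤ I(W) ≤ I(W′′), and d(W) ≤ √(1 − Z(W)²) on a random channel.

```python
import math
import random

def transform(W):
    """One polarization step: from a B-DMC W build W1 ~ W' and W2 ~ W''.

    W maps input x in {0, 1} to a list of output probabilities.
    W1 has outputs (y1, y2); W2 has outputs (y1, y2, u1).
    (Transition-probability formulas restated from the single-step transform.)
    """
    n = len(W[0])
    W1 = {u1: [0.5 * sum(W[u1 ^ u2][y1] * W[u2][y2] for u2 in (0, 1))
               for y1 in range(n) for y2 in range(n)] for u1 in (0, 1)}
    W2 = {u2: [0.5 * W[u1 ^ u2][y1] * W[u2][y2]
               for y1 in range(n) for y2 in range(n) for u1 in (0, 1)]
          for u2 in (0, 1)}
    return W1, W2

def I(W):
    """Symmetric capacity of a B-DMC (uniform inputs), in bits."""
    total = 0.0
    for y in range(len(W[0])):
        q = 0.5 * (W[0][y] + W[1][y])
        for x in (0, 1):
            if W[x][y] > 0:
                total += 0.5 * W[x][y] * math.log2(W[x][y] / q)
    return total

def Z(W):
    """Bhattacharyya parameter of a B-DMC."""
    return sum(math.sqrt(W[0][y] * W[1][y]) for y in range(len(W[0])))

def d(W):
    """Variational distance between W(.|0) and W(.|1)."""
    return 0.5 * sum(abs(W[0][y] - W[1][y]) for y in range(len(W[0])))

# Random B-DMC with four output symbols.
random.seed(1)
p = [random.random() for _ in range(4)]
q = [random.random() for _ in range(4)]
W = {0: [v / sum(p) for v in p], 1: [v / sum(q) for v in q]}
W1, W2 = transform(W)

assert abs(I(W1) + I(W2) - 2 * I(W)) < 1e-9   # I(W') + I(W'') = 2 I(W)
assert I(W1) <= I(W) + 1e-9                   # I(W') <= I(W)
assert I(W) <= I(W2) + 1e-9                   # I(W) <= I(W'')
assert d(W) <= math.sqrt(1 - Z(W) ** 2) + 1e-9
```

Such a check is of course no substitute for the proofs; it merely confirms the identities on a sampled channel.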
D. Proof of Proposition 5

Proof of (26) is straightforward:

Z(W′′) = Σ_{y_1^2, u1} √(W′′(f(y1, y2), u1 | 0)) √(W′′(f(y1, y2), u1 | 1))
       = Σ_{y_1^2, u1} (1/2) √(W(y1 | u1) W(y2 | 0)) √(W(y1 | u1 ⊕ 1) W(y2 | 1))
       = Σ_{y2} √(W(y2 | 0) W(y2 | 1)) · Σ_{u1} (1/2) Σ_{y1} √(W(y1 | u1) W(y1 | u1 ⊕ 1))
       = Z(W)².

To prove (27), we put for shorthand α(y1) = W(y1|0), δ(y1) = W(y1|1), β(y2) = W(y2|0), and γ(y2) = W(y2|1), and write

Z(W′) = Σ_{y_1^2} √(W′(f(y1, y2)|0)) √(W′(f(y1, y2)|1))
      = Σ_{y_1^2} (1/2) √(α(y1)β(y2) + δ(y1)γ(y2)) √(α(y1)γ(y2) + δ(y1)β(y2))
      ≤ Σ_{y_1^2} (1/2) [√(α(y1)β(y2)) + √(δ(y1)γ(y2))] [√(α(y1)γ(y2)) + √(δ(y1)β(y2))]
        − Σ_{y_1^2} √(α(y1)β(y2)δ(y1)γ(y2)),

where the inequality follows from the identity

(αβ + δγ)(αγ + δβ) + 2√(αβδγ) (√α − √δ)² (√β − √γ)²
  = [(√(αβ) + √(δγ))(√(αγ) + √(δβ)) − 2√(αβδγ)]².

Next, we note that

Σ_{y_1^2} α(y1) √(β(y2)γ(y2)) = Z(W).

Likewise, each term obtained by expanding (√(α(y1)β(y2)) + √(δ(y1)γ(y2)))(√(α(y1)γ(y2)) + √(δ(y1)β(y2))) gives Z(W) when summed over y_1^2. Also, √(α(y1)β(y2)δ(y1)γ(y2)) summed over y_1^2 equals Z(W)². Combining these, we obtain the claim (27). Equality holds in (27) iff, for any choice of y_1^2, one of the following is true: α(y1)β(y2)γ(y2)δ(y1) = 0 or α(y1) = δ(y1) or β(y2) = γ(y2). This is satisfied if W is a BEC. Conversely, if we take y1 = y2, we see that for equality in (27), we must have, for any choice of y1, either α(y1)δ(y1) = 0 or α(y1) = δ(y1); this is equivalent to saying that W is a BEC.

To prove (28), we need the following result, which states that the parameter Z(W) is a convex function of the channel transition probabilities.

Lemma 4: Given any collection of B-DMCs Wj : X → Y, j ∈ J, and a probability distribution Q on J, define W : X → Y as the channel W(y|x) = Σ_{j∈J} Q(j) Wj(y|x). Then,

Σ_{j∈J} Q(j) Z(Wj) ≤ Z(W).   (88)

Proof: This follows by first rewriting Z(W) in a different form and then applying Minkowski's inequality [10, p. 524, ineq. (h)]:

Z(W) = Σ_y √(W(y|0) W(y|1))
     = −1 + (1/2) Σ_y [Σ_x √(W(y|x))]²
     ≥ −1 + (1/2) Σ_y Σ_{j∈J} Q(j) [Σ_x √(Wj(y|x))]²
     = Σ_{j∈J} Q(j) Z(Wj).

We now write W′ as the mixture

W′(f(y1, y2) | u1) = (1/2) [W0(y_1^2 | u1) + W1(y_1^2 | u1)]

where

W0(y_1^2 | u1) = W(y1 | u1) W(y2 | 0),
W1(y_1^2 | u1) = W(y1 | u1 ⊕ 1) W(y2 | 1),

and apply Lemma 4 to obtain the claimed inequality

Z(W′) ≥ (1/2) [Z(W0) + Z(W1)] = Z(W).

Since 0 ≤ Z(W) ≤ 1 and Z(W′′) = Z(W)², we have Z(W) ≥ Z(W′′), with equality iff Z(W) equals 0 or 1. Since Z(W′) ≥ Z(W), this also shows that Z(W′) = Z(W′′) iff Z(W) equals 0 or 1. So, by Prop. 1, Z(W′) = Z(W′′) iff I(W) equals 1 or 0.

E. Proof of Proposition 6

From (17), we have the identities

W′(f(y1, y2)|0) W′(f(y1, y2)|1)
  = (1/4) [W(y1|0)² + W(y1|1)²] W(y2|0) W(y2|1)
  + (1/4) [W(y2|0)² + W(y2|1)²] W(y1|0) W(y1|1),   (89)

W′(f(y1, y2)|0) − W′(f(y1, y2)|1)
  = (1/2) [W(y1|0) − W(y1|1)] [W(y2|0) − W(y2|1)].   (90)

Suppose W is a BEC, but W′ is not. Then, there exists (y1, y2) such that the left sides of (89) and (90) are both different from zero. From (90), we infer that neither y1 nor y2 is an erasure symbol for W. But then the RHS of (89) must be zero, which
is a contradiction. Thus, W′ must be a BEC. From (90), we conclude that f(y1, y2) is an erasure symbol for W′ iff either y1 or y2 is an erasure symbol for W. This shows that the erasure probability for W′ is 2ε − ε², where ε is the erasure probability of W.

Conversely, suppose W′ is a BEC but W is not. Then, there exists y1 such that W(y1|0)W(y1|1) > 0 and W(y1|0) − W(y1|1) ≠ 0. By taking y2 = y1, we see that the RHSs of (89) and (90) can both be made non-zero, which contradicts the assumption that W′ is a BEC.

The other claims follow from the identities (89) and (90).

F. Proof of Lemma 1

The proof follows that of a similar result from Chung [9, Theorem 4.1.1]. Fix ζ > 0. Let Ω0 = {ω ∈ Ω : lim_{n→∞} Zn(ω) = 0}. By Prop. 10, P(Ω0) = I0. Fix ω ∈ Ω0. Zn(ω) → 0 implies that there exists n0(ω, ζ) such that n ≥ n0(ω, ζ) ⇒ Zn(ω) ≤ ζ. Thus, ω ∈ Tm(ζ) for some m. So, Ω0 ⊂ ∪_{m=1}^{∞} Tm(ζ). Therefore, P(∪_{m=1}^{∞} Tm(ζ)) ≥ P(Ω0). Since Tm(ζ) ↑ ∪_{m=1}^{∞} Tm(ζ), by the monotone convergence property of a measure, lim_{m→∞} P[Tm(ζ)] = P[∪_{m=1}^{∞} Tm(ζ)]. So, lim_{m→∞} P[Tm(ζ)] ≥ I0. It follows that, for any ζ > 0, δ > 0, there exists a finite m0 = m0(ζ, δ) such that, for all m ≥ m0, P[Tm(ζ)] ≥ I0 − δ/2. This completes the proof.

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell System Tech. J., vol. 27, pp. 379–423, 623–656, July–Oct. 1948.
[2] E. Arıkan, "Channel combining and splitting for cutoff rate improvement," IEEE Trans. Inform. Theory, vol. IT-52, pp. 628–639, Feb. 2006.
[3] D. E. Muller, "Application of boolean algebra to switching circuit design and to error correction," IRE Trans. Electronic Computers, vol. EC-3, pp. 6–12, Sept. 1954.
[4] I. Reed, "A class of multiple-error-correcting codes and the decoding scheme," IRE Trans. Inform. Theory, vol. 4, pp. 39–44, Sept. 1954.
[5] M. Plotkin, "Binary codes with specified minimum distance," IRE Trans. Inform. Theory, vol. 6, pp. 445–450, Sept. 1960.
[6] S. Lin and D. J. Costello, Jr., Error Control Coding, 2nd ed. Upper Saddle River, NJ: Pearson, 2004.
[7] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley, 1983.
[8] G. D. Forney Jr., "MIT 6.451 Lecture Notes." Unpublished, Spring 2005.
[9] K. L. Chung, A Course in Probability Theory, 2nd ed. New York: Academic, 1974.
[10] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[11] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, no. 90, pp. 297–301, 1965.
[12] E. Arıkan and E. Telatar, "On the rate of channel polarization," Aug. 2008, arXiv:0807.3806v2 [cs.IT].
[13] A. Sahai, P. Glover, and E. Telatar. Private communication, Oct. 2008.
[14] C. E. Shannon, "A note on partial ordering for communication channels," Information and Control, vol. 1, pp. 390–397, 1958.
[15] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inform. Theory, vol. IT-8, pp. 21–28, Jan. 1962.
[16] G. D. Forney Jr., "Codes on graphs: Normal realizations," IEEE Trans. Inform. Theory, vol. IT-47, pp. 520–548, Feb. 2001.
[17] E. Arıkan, "A performance comparison of polar codes and Reed-Muller codes," IEEE Comm. Letters, vol. 12, pp. 447–449, June 2008.