
MATH32031 Coding Theory 1

0. Introduction
Synopsis. This lecture offers a general introduction to the subject matter of Coding Theory, and as such contains very little mathematics. We try to give some motivation and historical examples. We conclude with a list of topics from Pure Mathematics with which students should be familiar in order to follow the course.
It is fair to say that our age is the age of information. Huge quantities of information and data literally flow around us and are stored in various forms. Information processing gives rise to many mathematical questions. Information may need to be processed because we may need to:
- store the information;
- encrypt the information;
- transmit the information.
For practical purposes, information needs to be stored efficiently, which leads to problems such as compacting or compressing the information. For the purposes of data protection and security, information may need to be encrypted. We will NOT consider these problems here.
In this course, we will address problems that arise in connection with information trans-
mission.
We do not attempt to give an exhaustive definition of information. For our purposes, it will be enough to say:
to transmit information means to be able to convey different messages from the sender to
the receiver.
Note that there should exist at least two different messages which can be transmitted. If there is only one possible message, it does not carry any information.¹

¹ One may ask whether a recipient which waits for one specific signal is not receiving any information, as there is only one possible signal. In this case, the absence of a signal at a given time should count as one message, and the presence of a signal as another message, so that there are two possible messages. One should remember, though, that we present a simplistic mathematical model of information transmission.
A simple model of information transmission is as follows:

    sender → MESSAGE → [channel + NOISE] → RECEIVED WORD → user
The channel is the medium which conveys the information. Real-life examples of channels include a telephone line, a satellite communication link, a voice (in a face-to-face conversation between individuals), a CD (the sender writes information onto it, the user reads the information from it), etc.
In an absolute majority of cases, there is noise in the channel: this means that the words transmitted down the channel may arrive corrupted. If

    RECEIVED WORD ≠ MESSAGE,

we say that a transmission error occurred.
Coding Theory is motivated by the need to reduce the likelihood of transmission errors,
thus making the information transmission more robust. The general scheme for doing that
is as follows:
    sender → MESSAGE → encoder → CODEWORD → [channel + NOISE] → RECEIVED WORD → decoder → DECODED MESSAGE → user
By choosing codewords well, one can achieve the following: even if RECEIVED WORD ≠ CODEWORD, it may be possible to recover the CODEWORD and therefore the MESSAGE. However, this comes at a price. Codewords are longer (contain more characters) than messages, therefore their transmission typically costs more than transmission of bare messages. One should weigh the increased robustness of transmission against the increased costs of transmission.
The set of codewords is called a code. The goal of Coding Theory is to mathematically design codes which decrease the likelihood of transmission errors but at the same time are efficient: they do not increase the length of transmitted messages too much. (We will soon make this more precise.)
Let us consider some examples.
Example 1.
Let the message be either YES or NO. We want to encode it using the characters 0 and 1.
Approach # 1. Use codewords of length 1. NO = codeword 0, YES = codeword 1.
An error occurs if 0 is changed to 1 or vice versa. But looking at the received word, which
is 0 or 1, we have no way to determine whether an error has occurred.
Approach # 2. Use codewords of length 5. NO = codeword 00000, YES = codeword 11111.
Suppose that the received vector is 00100. What was the message?

Intuitively, the likelihood is that the message was NO, encoded by the codeword 00000. Indeed, we can observe that it takes only one error to change 00000 into 00100, but it takes four errors to change 11111 into 00100. We work under the following reasonable assumption: a smaller number of errors in a codeword is more likely to occur than a larger number of errors. We therefore decode the received vector 00100 as NO.

It is easy to see that the original message will thus be correctly decoded if no more than 2 errors occur in transmission of a codeword. But this comes at the price of multiplying the length of the message by 5.
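The distance comparison above is easy to check with a few lines of code (a Python sketch; the variable names are ours):

```python
received = "00100"
no_codeword, yes_codeword = "00000", "11111"

# Count the positions where the received word differs from each codeword.
errors_if_no = sum(a != b for a, b in zip(no_codeword, received))
errors_if_yes = sum(a != b for a, b in zip(yes_codeword, received))
print(errors_if_no, errors_if_yes)  # 1 4
```

Since 1 < 4, the received word 00100 is decoded as 00000, i.e., as NO.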
Essentially, repetition codes have been used for thousands of years: if we didn't hear a word in a conversation, we ask the speaker to repeat it. Amazingly, more efficient codes were only invented in the middle of the 20th century. The credit goes to Richard Hamming (1915–1998), who is considered to be the founder of modern coding theory.
Example 2.
Here is a real-world example of how Coding Theory is used in scientific research.
Voyager 1 is an unmanned spacecraft launched by NASA in 1977. Its primary mission was to explore Jupiter, Saturn, Uranus and Neptune. Voyager 1 sent a lot of precious photographs and data back to Earth. It has recently been in the news because NASA scientists concluded that it had reached interstellar space.²
² See for example the BBC News item dated 12 September 2013 at http://www.bbc.co.uk/news/science-environment-24026153

[Figure 1: The Voyager spacecraft. Image taken from http://voyager.jpl.nasa.gov/]

The messages from Voyager 1 have to travel through the vast expanses of interplanetary space. Given that the spacecraft is equipped with a mere 23-watt radio transmitter (powered by a plutonium-238 nuclear battery), it is inevitable that noise, such as cosmic rays, interferes with its transmissions. In order to protect the data from distortion, it is encoded with an error-correcting code called the extended binary Golay code. We will look at this code later in the course. More modern space missions employ more efficient and more sophisticated codes.
Example 3.
Here is a more down-to-earth example of the use of error-correcting codes. A CD can hold up to 80 minutes of music, represented by an array of zeros and ones. The data on the CD is encoded using a Reed–Solomon code. This way, even if a small scratch, a particle of dust or a fingerprint happens to be on the surface of the CD, it will still play perfectly well, all due to error correction.

However, every method has its limits, and larger scratches or stains may lead to something like a thunderclap during playback!
[Figure 2: A punch card. Image taken from http://www.columbia.edu/cu/computinghistory]
Example 4.
To finish this historical excursion, let us recall one of the very first uses of error-correcting codes.

In 1948, Richard Hamming was working at the famous Bell Laboratories. Back then, the data for computers was stored on punch cards: pieces of thick paper where holes represented ones and absences of holes represented zeros. Punchers who had to perforate punch cards sometimes made mistakes, which frustrated Hamming.
Hamming was able to come up with a code with the following properties: each codeword is 7 bits long, and if one error is made in a codeword (i.e., one bit is changed from 0 to 1 or vice versa), one can still recover the original codeword. This made the punch card technology more robust, as a punch card with a few mistakes would still be usable. The trade-off, however, was that the length of the data was increased by 75%: there are only 16 different codewords, so they can only be used to convey messages which are 4 bits long.
The original Hamming code will be introduced in the course very soon, and every student
will be expected to learn it by heart!
List of topics

The following topics or notions from pure mathematics will be of use when we construct error-correcting codes.
- Metric spaces. (We will use the basic terminology from metric spaces. We will define everything, so having taken a Metric Spaces course is not a prerequisite.)
- Finite fields, such as Z_p. They are finite sets with two operations, + and ×, satisfying standard axioms.
- Vector spaces over finite fields. They are finite sets but carry a linear structure: notions of a subspace, a basis, linear independence, etc., will be used.
- Matrices over finite fields, matrix multiplication.
- Polynomials with coefficients in a finite field, and their factorisation.
1. Basic notions
Synopsis. In this lecture, we begin to describe a mathematically rigorous setup for error-detecting and error-correcting codes. We introduce the relevant terminology and define the Hamming distance.
Recall the model of information transmission from the first lecture:
    sender → MESSAGE → encoder → CODEWORD → [channel + NOISE] → RECEIVED WORD → decoder → DECODED MESSAGE → user
Let us start defining the terminology we are going to use throughout the course.
Alphabet: a finite set F. We denote by q the number of elements of F: |F| = q, and assume q ≥ 2.

In this course, F will always be a finite field F_q with q elements. The reason for this is that we are going to design codes using linear algebra. In particular, q must be a prime power, q = p^k. We will mostly work with the field F_p = Z/pZ = {0, 1, 2, . . . , p − 1} where p is prime. Note that many books on coding theory allow more general alphabets, such as Z/26Z ≅ {A, . . . , Z}.
Symbol (or character): an element of the alphabet.
Word: an element of F^n. Note that F^n is the set of all n-tuples of symbols:

    F^n = {v = (v_1, v_2, . . . , v_n) | v_i ∈ F, 1 ≤ i ≤ n}.

We refer to elements of F_q^n as q-ary words of length n. We use the following traditional terminology: binary = 2-ary, ternary = 3-ary.

Code: a non-empty subset of F^n. We say "q-ary code of length n". We denote a code by C. That is, C ⊆ F^n, C ≠ ∅.
Codeword: an element of the code.
The number of errors in the received word: If a codeword v = (v_1, . . . , v_n) was transmitted and the received word is y = (y_1, . . . , y_n), the number of errors in y is the number of positions where the character in y differs from the character in v:

    d(v, y) = |{i ∈ {1, . . . , n} : y_i ≠ v_i}|.
Definition (Hamming distance)
If v, y are words of length n, the number d(v, y) defined above is called the Hamming distance between v and y.
Example
Consider the set F_2^3 of binary words of length 3. Explicitly,

    F_2^3 = {000, 001, 010, 011, 100, 101, 110, 111}.

Note that d(001, 100) = 2 because in the vectors 001 and 100, the symbols in the first and the third positions differ.
Clearly, if v, y ∈ F_q^n, then 0 ≤ d(v, y) ≤ n.
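In code, computing the Hamming distance is a one-liner (a Python sketch; the function name is ours):

```python
def d(v, w):
    """Hamming distance: the number of positions where v and w differ."""
    if len(v) != len(w):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(v, w))

print(d("001", "100"))  # 2: the words differ in the first and third positions
print(d("111", "111"))  # 0: d(v, v) = 0
```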
Lemma 1.1 (Properties of the Hamming distance)
For any words v, y, z ∈ F^n,

1. d(v, y) ≥ 0; d(v, y) = 0 if and only if v = y.
2. d(v, y) = d(y, v).
3. d(v, z) ≤ d(v, y) + d(y, z) (the triangle inequality).

Remark: a function d(·, ·) of two arguments which satisfies axioms 1.–3. is called a metric. This is familiar to those who have studied Metric Spaces. The Lemma says that the Hamming distance turns F^n into a metric space.
Proof. Recall that the Hamming distance d(v, y) between two vectors v, y ∈ F^n is the number of positions where these vectors differ.

1. Clearly, the number of positions where v and y differ is an integer between 0 and n. It is zero iff the two vectors are equal.

2. The number of positions in which v differs from y is equal to the number of positions in which y differs from v. This is obvious.

3. d(v, y) is the minimal number of symbol changes needed to get from v to y. Take the vector v and make d(v, y) changes to obtain y. Then make d(y, z) changes to obtain z. We have made d(v, y) + d(y, z) changes to get from v to z. This number of changes may not be minimal (because we may have changed some symbols twice), but it is at least d(v, z). ∎
We continue to define the terminology we are going to use.
Decoder (or decoding scheme, or decoding algorithm): a function DECODE: F^n → C.
Our model of transmission is as follows:

1. A codeword v ∈ C is sent via the channel.
2. A word y is received at the other end. It is possible that y ≠ v, because of noise.
3. By looking at y, we want to guess what v was. In other words: we apply the function DECODE to y.
We always work under the assumption that fewer errors during transmission are more likely than more errors. This motivates the following
Definition (nearest neighbour)
Let C ⊆ F^n be a code. A nearest neighbour of a word y ∈ F^n is a codeword v ∈ C such that

    d(v, y) = min{d(z, y) | z ∈ C}.
Notice that a word might have more than one nearest neighbour (there may be several
codewords at the same distance from y). So a nearest neighbour is not always unique.
Example
Consider the following binary code of length 3: C = {001, 100}. The word 000 ∈ F_2^3 has two nearest neighbours in C, namely, 001 and 100. Indeed, d(001, 000) = d(000, 100) = 1. The same can be said of the word 101.
Aside: Interestingly, both 000 and 101 are "midpoints" between v = 001 and w = 100. Indeed, the distance from 000 to each of v and w is exactly half the distance between v and w. So, in the metric space F^n defined by the Hamming distance, there can be more than one midpoint between a pair of points (or there can be none). Note that in a Euclidean space, there is always exactly one midpoint between any two points. This shows that F^n, as a metric space, is quite different from the more conventional Euclidean spaces.
Remark
Let C be a code in F^n.

- If y ∈ F^n is a codeword, then y has a unique nearest neighbour in C: namely, y itself.
- We will soon introduce a class of codes called perfect codes. It will turn out that if a code C is perfect, every word in F^n has a unique nearest neighbour in C.
Definition (nearest neighbour decoding)
A decoder for a code C is called nearest neighbour decoding if for any y ∈ F^n, DECODE(y) is a nearest neighbour of y in C.
Throughout the course, a decoder is always a nearest neighbour decoder.
Remark: The above assumption may not determine a decoder completely. Suppose that a received word y has more than one nearest neighbour (which may happen unless the code is perfect); which one of them will be DECODE(y)? This will depend on the particular implementation of the decoder.

Nevertheless, the Nearest Neighbour Principle is a strong assumption which allows us to prove a number of properties of codes. In particular, it guarantees that if v is a codeword, DECODE(v) = v.
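A minimal nearest neighbour decoder can be sketched as follows (Python; this particular implementation resolves ties by taking the first nearest neighbour found, which is just one arbitrary choice among the valid ones):

```python
def hamming(v, w):
    """Hamming distance between two equal-length words."""
    return sum(a != b for a, b in zip(v, w))

def decode(y, code):
    """Return a nearest neighbour of y in the code (ties broken by min's order)."""
    return min(code, key=lambda c: hamming(c, y))

C = ["000", "111"]          # binary repetition code of length 3
print(decode("010", C))     # 000: one symbol error corrected
print(decode("110", C))     # 111
print(decode("111", C))     # 111: a codeword decodes to itself
```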
We can now see that a transmission can have three possible outcomes. Let v be the codeword which is sent, and let y be the received word.

- DECODE(y) = v. The codeword is decoded correctly.
- y ∉ C (so we know for sure that y contains errors) but DECODE(y) ≠ v. The error is detected but not corrected.
- y ∈ C, so that automatically DECODE(y) = y, but y ≠ v. This is an undetected error.

Intuitively, it is easy to see that to avoid undetected errors, the codewords should be far apart, i.e., the distance between any two distinct codewords should be large. That way, a lot of symbol errors in the received word y are needed in order for it to be a codeword different from v, and this is not likely to happen. We will formalise this in the next lecture.
Synopsis. It turns out that the ability of a code C to detect and correct transmission errors is expressed by the minimum distance of C, denoted d. We look at simple examples of codes and learn to use the notation (n, M, d)_q and [n, k, d]_q.
Definition (parameters of a code)
Let C ⊆ F_q^n be a code. Then:

- n is the length of the code;
- M denotes the number of codewords, i.e., M = |C|;
- k = log_q M is the information dimension of C (warning: k may not be an integer, although we will see that k is an integer for interesting types of codes);
- d(C) = min{d(v, w) : v, w ∈ C, v ≠ w} is the minimum distance of C. It is defined if |C| ≥ 2.

We say that C is an (n, M, d)_q-code or an [n, k, d]_q-code.
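The parameters can be computed directly from a list of codewords (a Python sketch; the function name is ours):

```python
from itertools import combinations
from math import log

def parameters(code, q):
    """Return (n, M, d, k) for a code given as a list of distinct words."""
    n = len(code[0])
    M = len(code)
    # Minimum distance: smallest Hamming distance over all pairs (needs M >= 2).
    d = min(sum(a != b for a, b in zip(v, w)) for v, w in combinations(code, 2))
    k = log(M, q)  # information dimension; may not be an integer in general
    return n, M, d, k

# The length-5 repetition code {00000, 11111} from Example 1 is a
# (5, 2, 5)_2-code, i.e. a [5, 1, 5]_2-code.
print(parameters(["00000", "11111"], q=2))  # (5, 2, 5, 1.0)
```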
The importance of the minimum distance d is clear from the following
Theorem 1.2 (The number of errors detected/corrected by a code)
Let C be a code with d(C) = d.

1. A word at distance at most d − 1 from a codeword CANNOT be another codeword. We thus say that the code detects up to d − 1 errors.

2. A word at distance at most [(d − 1)/2] from a codeword v has a unique nearest neighbour, which is v. We say that the code corrects up to [(d − 1)/2] errors.

Remark: [a] denotes the integer part of a real number a. Thus, [3] = [3.5] = [π] = 3, [7 3/4] = 7, etc.
Proof and explanation. Let v ∈ C.

1. If y ∈ F^n and d(v, y) ≤ d − 1, then y cannot be another codeword, because the minimum distance from v to another codeword is d.

Thus, if at most d − 1 symbol errors occur in a codeword, i.e., up to d − 1 symbols in the codeword are changed during transmission via the channel, the received word will not be in the code, and the receiver will know that errors have occurred.
2. Suppose d(v, y) ≤ t = [(d − 1)/2]. If w ≠ v is a nearest neighbour of y, then d(y, w) ≤ t. Then

    d(v, w) ≤ d(v, y) + d(y, w) ≤ t + t = 2t ≤ d − 1,

which contradicts d being the minimum distance of C. Therefore, v is the only nearest neighbour of y.

Thus, if up to t symbol errors occur in a codeword v during transmission, the received word is decoded back to v by the Nearest Neighbour Decoding principle. This means that the errors are corrected. ∎
We will now discuss some easy codes and find out how many errors they detect and correct. The following are examples from the pre-Hamming era.
Example: the trivial code
Let C = F^n (every word is a codeword).

Then d(C) = 1. Indeed, d(C) is always positive, because it is measured between pairs of distinct codewords; and there are codewords at distance 1 in the trivial code, for example 000...0 and 100...0.

This code C is called the trivial q-ary code of length n. It is an [n, n, 1]_q-code. It detects 0 errors and corrects 0 errors.
Example: the repetition code
C = {00...0, 11...1, . . .} ⊆ F_q^n has q codewords of length n. Each codeword is made up of one symbol repeated n times.

The number of codewords is M = q. The q-ary repetition code of length n is an [n, 1, n]_q-code.

It detects n − 1 errors and corrects [(n − 1)/2] errors.
Is this an efficient error-correcting code? We will be able to answer this question when we learn how to compare codes and see more examples.
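For the repetition code, nearest neighbour decoding amounts to a majority vote among the received symbols. A sketch (Python; ties, which are possible when the symbol counts are equal, are resolved arbitrarily here):

```python
from collections import Counter

def decode_repetition(y):
    """Nearest neighbour decoding for the repetition code of length n = len(y):
    the nearest codeword repeats the most frequent received symbol."""
    symbol, _ = Counter(y).most_common(1)[0]
    return symbol * len(y)

print(decode_repetition("00100"))  # 00000: up to [(n-1)/2] = 2 errors corrected
print(decode_repetition("10110"))  # 11111
```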
To introduce the next example, we need the following
Definition (weight)
The weight, w(v), of a vector v ∈ F_q^n is the number of non-zero symbols in v.
Example: the even weight code
The binary even weight code of length n is defined as C = {v ∈ F_2^n : w(v) is even}.

For example, the binary even weight code of length 2 is {00, 11}.

Remark: 0 is an even number! The binary even weight code contains the codeword 00...0.

Minimum distance: let a binary word v be of even weight. If we change one bit (= binary symbol) in v, we get a word of odd weight. Indeed, if 0 is changed to 1 or 1 is changed to 0, this increases/decreases the weight by 1, making it odd.

It follows that the distance between two codewords cannot be 1, so d(C) ≥ 2. On the other hand, d(0000...0, 1100...0) = 2. This shows that d(C) = 2.

The code detects 1 error and corrects 0 errors.
The number of codewords: in a codeword v = (x_1, x_2, . . . , x_n), the first n − 1 bits (= binary symbols) can be arbitrary, and then the last one is determined by x_n = x_1 + x_2 + . . . + x_{n−1}, where + is the addition in the field F_2. This makes sure that the weight of the whole codeword is even. We thus have 2^{n−1} codewords.
Another argument to that effect is as follows. We can take a binary word and flip (change) its first bit. This operation splits the set F_2^n into pairs of vectors, such that the vectors in a pair differ only in the first bit. Each pair contains one vector of even weight and one vector of odd weight. Therefore, the number of vectors of even weight is equal to the number of vectors of odd weight, and is (1/2)|F_2^n| = 2^{n−1}.
Conclusion: this is an [n, n − 1, 2]_2-code.

Remark: This is a widely used code. If an error is detected, the recipient will request retransmission of the codeword where the error occurred. Error correction is not available.
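The parameters [n, n − 1, 2]_2 can be verified computationally for a small n (a Python sketch, here with n = 4):

```python
from itertools import product, combinations

n = 4
# The binary even weight code: all words of length n with an even number of 1s.
C = ["".join(bits) for bits in product("01", repeat=n)
     if "".join(bits).count("1") % 2 == 0]

M = len(C)
d = min(sum(a != b for a, b in zip(v, w)) for v, w in combinations(C, 2))
print(M, d)  # 8 2  -- that is, M = 2^(n-1) codewords and minimum distance 2
```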
Synopsis. We learn how to compare codes of different length using the code rate, R, and the relative distance, δ. We plot the simple families of codes on the δ–R plane, but are not satisfied with the results. We compare Hamming's approach to codes with Shannon's, and state Shannon's Theorem, which shows that codes make it possible to transmit information over a noisy channel with an arbitrarily small probability of error.
Recall that useful properties of a code C ⊆ F_q^n are reflected in its parameters (n, M, d)_q, otherwise written as [n, k, d]_q where k = log_q M. The power of the code to detect and correct errors is reflected in its minimum distance, d. There is a special term for codes that share the same set of parameters:
Definition (parameter equivalence)
We say that two codes are parameter equivalent if they are both [n, k, d]_q-codes for some n, k, d and q.

Clearly, parameter equivalent codes are of the same value if one considers how efficient they are in correcting transmission errors.³ But how should one compare codes of different length?
The accepted way in Coding Theory is to use the following two parameters:
Definition (code rate, relative distance)
Let C be an [n, k, d]_q-code.

The code rate of C is R = k/n.

The relative distance of C is δ = d/n.
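As a quick check of the definitions (a Python sketch; the function name is ours):

```python
from math import log

def rate_and_relative_distance(n, M, d, q):
    """Compute (R, delta) from the parameters (n, M, d)_q."""
    k = log(M, q)          # information dimension
    return k / n, d / n

# The binary repetition code of length 5 is a [5, 1, 5]_2-code,
# so R = 1/5 and delta = 1.
R, delta = rate_and_relative_distance(n=5, M=2, d=5, q=2)
print(R, delta)  # 0.2 1.0
```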
We will say that code C′ is better than code C if R(C′) ≥ R(C) and δ(C′) ≥ δ(C).
Remark: Intuitively, it is clear why the higher the code rate, the better. Indeed, when we encode information using a code C, we increase the length of the information by a factor of 1/R. This is easy to see if k is an integer. In this case, we can represent each of the M = q^k messages by a word of length k in the q-ary alphabet. After encoding, it becomes a codeword of length n, so that the length is increased (n/k)-fold.

It is less clear why we should compare the relative distance of codes. We will not give a formal argument at this stage.

³ One of two parameter equivalent codes can still be preferable for other reasons; for example, it may be easier to write a computer programme to encode/decode it.
[Figure 3: The δ–R plane]

The δ–R plane

It is customary to represent a code C by a dot in the δ–R coordinate plane, with coordinates δ(C) and R(C). See Figure 3.
Looking at Figure 3, we observe that code C′ is better than code C, because the dot representing C′ is higher and to the right of the dot representing C.
Note that δ and R are not arbitrary real numbers. In fact, they cannot be greater than 1:

Observation (trivial bound). For any code C one has R(C) ≤ 1. Moreover, R(C) = 1 if and only if C is a trivial code.
Proof. Let C be an [n, k, d]_q-code. Then, by definition, C is a non-empty subset of F_q^n. In particular, 1 ≤ M = |C| ≤ |F_q^n| = q^n. So 0 ≤ k = log_q M ≤ n and 0 ≤ R = k/n ≤ 1. The equality R = 1 is attained only if C = F_q^n, i.e., C is a trivial code. ∎

This upper bound on M, k and R is known as the trivial bound, because we use the fact that every code C is a subset of a trivial code F_q^n.
Observation (0 < δ ≤ 1). As noted earlier, the Hamming distance between any two words of length n is an integer between 0 and n. Therefore, 0 < d(C) ≤ n for any code of length n, hence 0 < δ = d(C)/n ≤ 1.

It follows that the dots representing codes all lie in the 1 × 1 square [0, 1] × [0, 1].
Discussion: are high δ and high R achievable?

Note that the trivial codes, the repetition codes and the even weight codes are families of codes that converge to points on one of the axes in the δ–R plane. That is, one of the limit parameters is zero for each family. This isn't too satisfactory.

Surely it would be great to have very good codes, with both the code rate and the relative distance very close to 1. (The code rate cannot equal 1 if we want non-trivial codes.) This would allow us to achieve a very low likelihood of an error when transmitting information over a noisy channel.
It would be even better to have a family of codes, say C_1, C_2, . . ., such that the length of C_m increases and δ(C_m) and R(C_m) both tend to 1. Because a code of fixed length n cannot correct more than [(n − 1)/2] errors, it is clear that we may never be able to make the likelihood of an uncorrected error as low as we want. The solution to that is to design a family of codes of increasing length.

Such a good family of codes would be represented by a sequence of points converging to (1, 1) in the δ–R plane, like the one shown in Figure 3.
There is only one problem with a good family of codes: namely, it does not exist. We will see why this is so in the next section. At the moment, let us say that not every point of the square [0, 1] × [0, 1] is attainable, i.e., has a family of codes converging to it. Far from that: there is a big region in [0, 1] × [0, 1] which contains the upper right corner of the square and consists of unattainable points. In fact, we'll see that C′ in Figure 3 is unattainable. We will start seeing that region when we talk about bounds.
End of discussion.
For the moment, let us plot the simple families of codes on the δ–R plane.
The family of trivial codes: the trivial code of length n is an [n, n, 1]_q-code, so δ = 1/n and R = 1. We can see that this family converges to the point (0, 1) in the δ–R plane.

The family of repetition codes: the parameters of a repetition code are [n, 1, n]_q, so δ = 1 and R = 1/n. They converge to the point (1, 0) in the δ–R plane.

The binary even weight codes: they are [n, n − 1, 2]_2-codes, represented by the points (2/n, 1 − 1/n), converging to (0, 1).
We plot these families of codes and their limit points in Figure 4.
[Figure 4: The code families of trivial, repetition and even weight codes in the δ–R plane]
Discussion: Hamming vs. Shannon

We will now talk about two approaches to codes, both of which date back to 1948. One is due to Hamming, the other due to Shannon.
Hamming's goal was to make sure that all the information is received and decoded correctly. This is possible if we know that the noise is such that there are at most t = [(d − 1)/2] symbol errors per codeword; this is known as the worst case assumption. One can rarely make such an assumption in practice. Nevertheless, codes designed with this assumption in mind tend to perform well in real-life situations (examples will follow later in the course).
On the other hand, Shannon assumed that the noise in the channel is random. Shannon's model of the channel is as follows. The data is sent down the channel, symbol after symbol. There are two possible outcomes when a symbol x is sent down the channel:

- x is received at the other end (no symbol error);
- some other symbol y ≠ x is received at the other end (symbol error).

Shannon assumed that symbol errors occur randomly, and that the channel is memoryless: a symbol error in symbol x is independent of errors in all previous symbols.
Shannon stated that the goal of a code must be to maximise

    P_corr = the probability for a codeword to be received and decoded correctly.

He was able to prove his main theorem: even if a channel is noisy, it is still possible to transmit information with P_corr arbitrarily close to 1, as long as the code rate R is less than the capacity of the channel, and the length n of the code is large.

A simplified statement of Shannon's theorem is given in the slides below.
Hamming vs. Shannon

Richard Hamming (1915–1998, USA), founder of Coding Theory.
Claude Shannon (1916–2001, USA), founder of Information Theory.

Hamming's vs. Shannon's approach to codes:

                      Hamming            Shannon
  Noise model         worst case         random noise
  Objectives          maximise R,        maximise R,
                      maximise d         maximise P_corr
  Who wins in
  practical
  applications?       ?                  ?
Shannon's Theorem.
Every memoryless channel has a capacity 0 ≤ Cap ≤ 1, such that for all ε > 0 and for sufficiently large length n:

- There exists a code with Cap − ε < R < Cap and P_corr > 1 − ε for all codewords;
- Every code with R > Cap + ε contains a codeword for which P_corr ≤ 1/2.
2. Bounds
Synopsis. To respond to the Information Theory challenge, coding theorists want to design
families of codes which have high code rate and high relative distance. There is a trade-off
between these parameters, which is expressed by inequalities known as bounds. We prove
two such inequalities: the Hamming bound and the Singleton bound. We discuss how they
affect the shape of the subset of the unit square called the code region.
Discussion (not examinable). Shannon's Theorem set a challenge for coding theorists. It said that for each memoryless channel, there exists a family of codes C_i with the code rate R(C_i) approaching the capacity Cap of the channel and the probability of correct decoding P_corr(C_i) approaching 1 as i → ∞. (Necessarily, the length n_i of C_i tends to infinity.)

Shannon's proof essentially said that for sufficiently large n, one can pick the codewords randomly. Provided that the code rate stays below the channel capacity, almost all random codes have P_corr close to 1. This was a fine example of an existence proof; moreover, the idea of considering a random object has since found many uses in mathematics.
But random codes are impractical for a number of reasons, such as:

- a randomly picked code is not guaranteed to perform well; it is merely likely to do so, but we might be unlucky;
- to decode a received word, we need to find a codeword closest to it, but for a random code this is only possible if we store a complete list of codewords, so a lot of storage memory is required.
Coding Theory aims to design more practical families of codes. Ideally, a family of codes should have high code rate R and high relative distance δ. The latter will ensure that, under reasonable assumptions about the random noise in the channel, P_corr will go to 1 as n increases.
Intuitively, there is a trade-off between R and d: high R means a large number of codewords, but then the codewords will be more densely packed in the set F^n, and the minimum distance, hence δ, will be low.
This is expressed rigorously by bounds: inequalities on the parameters of a code.
End of discussion.
Theorem 2.1 (Hamming bound)
For any (n, M, d)_q-code,

    M ≤ q^n / Σ_{i=0}^{t} C(n, i) (q − 1)^i,   where t = [(d − 1)/2].
Before we prove the theorem, recall that C(n, i) ("n choose i") is the number of ways to choose i positions out of n. This integer is called the binomial coefficient. It is given by the formula

    C(n, i) = n! / ((n − i)! i!) = n(n − 1)···(n − i + 1) / (1 · 2 ··· i).
Definition (Hamming sphere)¹
If y ∈ F^n, denote

    S_t(y) = {v ∈ F^n : d(v, y) ≤ t}.

We refer to the set S_i(v) as the Hamming sphere with centre v and radius i.
We can find the number of words in the Hamming sphere of radius t as follows:

Lemma 2.2 (the number of points in a Hamming sphere). |S_t(v)| = Σ_{i=0}^{t} C(n, i) (q − 1)^i.
Proof of Lemma 2.2. To get a vector v at distance i from y, we need to choose i positions out of n where y will differ from v. Then we need to change the symbol in each of the i chosen positions to one of the other q − 1 symbols. The total number of choices for a v which is at distance exactly i from y is thus C(n, i) (q − 1)^i.

The Hamming sphere contains all vectors at distance i from its centre, for 0 ≤ i ≤ t, so we sum over i from 0 up to t. ∎
Proof of Theorem 2.1. First of all, we prove that spheres of radius t centred at distinct codewords do not overlap. Suppose for contradiction that v, w ∈ C, v ≠ w, and S_t(v) and S_t(w) both contain a vector y. Then

    d(v, w) ≤ d(v, y) + d(y, w) ≤ t + t = 2t ≤ d − 1,

which contradicts the minimum distance of the code being d.

This means that the whole set F^n contains M disjoint spheres centred at codewords. Each of the M spheres contains Σ_{i=0}^{t} C(n, i) (q − 1)^i words. The total number of words in the spheres is M · Σ_{i=0}^{t} C(n, i) (q − 1)^i, and it does not exceed |F^n| = q^n. The bound follows. ∎
^1 A more modern term, Hamming ball, is also acceptable, and agrees better with the terminology used in Metric Spaces.
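The sphere count and the bound are easy to check numerically. A small sketch (the function names sphere_size and hamming_bound are ours, not from the notes):

```python
from math import comb

def sphere_size(n, q, t):
    # |S_t(v)| = sum_{i=0}^{t} (n choose i) (q-1)^i, as in Lemma 2.2
    return sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))

def hamming_bound(n, d, q):
    # Largest M permitted by Theorem 2.1: M <= q^n / |S_t|, t = floor((d-1)/2)
    t = (d - 1) // 2
    return q ** n // sphere_size(n, q, t)

print(hamming_bound(7, 3, 2))  # 16: attained by the binary Hamming code [7, 4, 3]
```

For the parameters (7, M, 3)_2, the bound gives M ≤ 128/8 = 16, and a code attaining it is perfect.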
Given the length n and the minimum distance d, we may wish to know whether there are codes with the number of codewords equal to the Hamming bound. Such a code would be the most economical (highest possible number M of codewords, hence highest rate R = log_q(M)/n). Such codes have a special name:
Definition (perfect code)
A code which attains the Hamming bound is called a perfect code.
("Attains the bound" means that the inequality in the bound becomes an equality for this code.)
Back to the above question: we may wish to ask whether there exists a perfect (n, M, d)_q code with given n, d and q.
It turns out that, unfortunately, useful perfect codes are quite rare. We will see their complete classification up to parameter equivalence later in the course.
The Singleton bound
Another upper bound on the number M of codewords can be conveniently stated for k = log_q M.
Theorem 2.3 (Singleton bound). For any [n, k, d]_q code, k ≤ n − d + 1.
Proof. See the solution to Question 3 on the example sheet, where the bound is established by puncturing the code d − 1 times.
Definition (maximum distance separable code)
A code which attains the Singleton bound is called a maximum distance separable (MDS) code.
A practical question is what these bounds mean for large n. Let us show, on a graph, the asymptotic form of the Hamming bound and the Singleton bound as n → ∞. We use the δ and R coordinates.
Fact (not examinable): Let q = 2. In the limit as n → ∞, the Hamming bound becomes the following inequality:
R ≤ 1 − H_2(δ/2), where H_2(p) = p log_2(1/p) + (1 − p) log_2(1/(1 − p)).
Note that 1 − H_2(δ/2) is a smooth decreasing function with value 1 at δ = 0 and value 0 at δ = 1.
Fact: The Singleton bound in terms of δ and R is R = k/n ≤ (n − d + 1)/n = (1 + 1/n) − δ. Taking the limit as n → ∞, we obtain
R ≤ 1 − δ,
independently of q. Let us plot the two asymptotic bounds on the same graph:
[Graph: the asymptotic Hamming and Singleton bounds for q = 2, plotted in the (δ, R) coordinates.]
We can observe that asymptotically, the Hamming bound is stronger than the Singleton bound. However, this is only true for q = 2. Here is a graph for q = 5, where the limit form of the Hamming bound is different:
[Graph: the asymptotic Hamming and Singleton bounds for q = 5, plotted in the (δ, R) coordinates.]
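The lost graphs can be reproduced numerically. A sketch for the case q = 2 only (the function name h2 is ours); it evaluates both asymptotic bounds at a few values of δ:

```python
from math import log2

def h2(p):
    # binary entropy function H_2(p), with H_2(0) = H_2(1) = 0 by convention
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

for delta in (0.1, 0.3, 0.5, 0.9):
    hamming = 1 - h2(delta / 2)     # asymptotic Hamming bound, q = 2
    singleton = 1 - delta           # asymptotic Singleton bound
    print(f"delta={delta}: Hamming {hamming:.3f}, Singleton {singleton:.3f}")
```

At every sampled δ the Hamming value is the smaller one, matching the observation below the first graph.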
The asymptotic form of the two bounds reveals some information about the following set:
Definition (code region)
The q-ary code region is the set of all points (δ, R) in the square [0, 1] × [0, 1] such that there exists a sequence {C_i}_{i≥1} of q-ary codes, where the length of C_i strictly increases, lim_{i→∞} δ(C_i) = δ and lim_{i→∞} R(C_i) = R.
It follows that the code region lies below both the asymptotic Hamming bound and the asymptotic Singleton bound. It is known, however, that the code region is strictly smaller than the region bounded by the two asymptotic bounds. The exact shape of the code region is still an open question in Coding Theory.
Important remark: a dot representing an individual code can lie outside the code region. This is because the code region is made up of limits of sequences of codes. For example, trivial codes have R = 1 and, as such, are all outside the code region; their limit is the point (0, 1), which is the uppermost point of the code region.
3. Linear codes
Synopsis. We define the most important class of codes, called linear codes. Their ability to correct errors is no worse than that of general codes, but linear codes are easier to implement in practice and allow us to use algebraic methods. In this lecture, we learn how to define a linear code by its generator matrix and how to find the minimum distance by looking at weights.
Reminder (vector spaces)
As usual, let F_q denote the field of q elements. Recall that the set F_q^n has the structure of a vector space over the field F_q. If the vectors u, v are in F_q^n, we can add the vectors together: u + v ∈ F_q^n, and multiply a vector by a scalar: λu ∈ F_q^n for all λ ∈ F_q.
The addition and the scalar multiplication are performed componentwise. We will often write vectors in compact form, as words:
011011, 100110 ∈ F_2^6, 011011 + 100110 = 111101 ∈ F_2^6.
Definition (linear code)
A code C ⊆ F_q^n is linear if C is a vector subspace of F_q^n.
Remark: this means that the zero vector 0 belongs to C, and that sums and scalar multiples of codewords are again codewords.
Discussion (not examinable). Why are linear codes useful?
They seem to be as efficient as general codes. In particular, it was proved that Shannon's Theorem about the capacity of a channel is still true for linear codes.
It is possible to define a linear code without specifying all the codewords (see below).
The minimum distance is easier to calculate than for general codes (see below).
We can use algebra to design linear codes and to construct efficient encoding and decoding algorithms.
The vast majority of codes designed by coding theorists are linear codes. In the rest of the course, (almost) all the codes we consider will be linear codes.
End of discussion.
In general, the only way to specify a code C ⊆ F_q^n is to list all the codewords of C. But if C is linear, we need only specify a basis of C. In Coding Theory, this is done in the form of a generator matrix.
Definition (generator matrix)
Let C ⊆ F_q^n be a linear code. A generator matrix of C is a matrix

    G = [ r_1 ]
        [ r_2 ]
        [ ... ]
        [ r_k ],

where the row vectors r_1, ..., r_k are a basis of C.
Remark (all codewords)
To obtain the list of all codewords of C from the generator matrix G as above, we use that
C = {λ_1 r_1 + ... + λ_k r_k | λ_1, ..., λ_k ∈ F_q}.
Because r_1, ..., r_k are a basis of C, each codeword is represented by one, and only one, linear combination λ_1 r_1 + ... + λ_k r_k.
Note that there are q^k such linear combinations: indeed, the first coefficient, λ_1, can be chosen from the field F_q in q possible ways, and the same is true for λ_2, ..., λ_k.
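The remark above translates directly into a short program; a sketch (the function name codewords is ours) that lists all q^k linear combinations of the rows of G:

```python
from itertools import product

def codewords(G, q):
    # All q^k linear combinations lambda_1 r_1 + ... + lambda_k r_k of the rows of G
    k, n = len(G), len(G[0])
    words = set()
    for coeffs in product(range(q), repeat=k):
        word = tuple(sum(c * row[j] for c, row in zip(coeffs, G)) % q
                     for j in range(n))
        words.add(word)
    return words

# A binary code generated by the rows 011 and 101 has 2^2 = 4 codewords:
print(sorted(codewords([[0, 1, 1], [1, 0, 1]], 2)))
```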
Remark
To better visualise the difference between storing all the q^k codewords of a linear code and storing only the k rows of a generator matrix, consider the following example. A binary code of dimension about 1500 was used in computer networks for error detection. While it is possible to store 1500 rows of a generator matrix, it is definitely not possible to store a list of all 2^1500 codewords. Indeed, the number 10^100 (the googol) is believed to be much bigger than the number of electrons in the visible Universe; and the googol is less than 2^340.
Properties of a generator matrix
Let G be a generator matrix of a q-ary linear code C. Then:
the rows of G are linearly independent;
the number n of columns of G is the length of the code;
the number k of rows of G is the dimension, dim(C), of the code;
the number of codewords is M = q^k;
the dimension of the code is equal to its information dimension: k = log_q M.
In particular, if an [n, k, d]_q code is linear, k is necessarily an integer.
Example
Consider the binary even weight code of length 3:
C = {000, 011, 101, 110}.
We know that the dimension of this code is 2. Therefore, a generator matrix has 2 rows and 3 columns.
To write down a generator matrix, we need to take two linearly independent codewords. We must not use the zero codeword, 000, because a linearly independent set must not contain the zero vector.
So we can use

    G = [ 0 1 1 ]   or   G = [ 0 1 1 ]   or   G = [ 1 0 1 ]   etc.
        [ 1 0 1 ]            [ 1 1 0 ]            [ 0 1 1 ]

Each of these matrices is a generator matrix for C.
Reminder (the weight of a vector)
Let y ∈ F_q^n. The weight, w(y), of y is the number of non-zero symbols in y.
Definition (the minimum weight of a code)
The minimum weight of a linear code C is
w(C) = min{w(v) | v ∈ C \ {0}}.
Recall that the minimum distance, d(C), of a code C is a very important parameter which tells us how many errors the code can detect and correct. The following theorem shows how one can find d(C) if C is linear:
Theorem 3.1. d(C) = w(C) for a linear code C.
Proof. Observe that for v, y ∈ F_q^n, d(v, y) = w(v − y). Indeed, d(v, y) is the number of positions i, 1 ≤ i ≤ n, where v_i ≠ y_i. Obviously, this is the same as the number of positions i where v_i − y_i ≠ 0. By definition of the weight, this is w(v − y).
Therefore,
d(C) = min{w(v − y) | v, y ∈ C, v ≠ y}.
Because C is linear, in the above formula v − y ∈ C. Thus, d(C) is the minimum of some weights of non-zero vectors from C, whereas w(C) is the minimum of all such weights. It follows that d(C) ≥ w(C).
On the other hand, let z ∈ C \ {0} be such that w(z) = w(C). Recall that 0 ∈ C, so
d(C) ≤ d(z, 0) = w(z − 0) = w(z) = w(C).
Hence d(C) ≤ w(C). We have proved that d(C) = w(C).
Remark 1. In the proof, we used that C is a linear code at least twice. This condition is essential.
Remark 2. Given a linear code C, one needs to check only M − 1 vectors to compute d(C) = w(C). For a non-linear code, one has to check M(M − 1)/2 pairs of vectors to compute the minimum distance d.
Remark 3. However, if a linear code C is specified by a generator matrix G, it may be difficult to compute the minimum weight of C. Of course, the minimum weight of C does not exceed, but is in general not equal to, the minimum weight of a row of G.
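Remark 2 is easy to illustrate in code. A sketch (function names ours) that computes w(C) from the M − 1 non-zero codewords and checks it against the brute-force minimum distance over all M(M − 1)/2 pairs, for a sample binary linear code:

```python
def weight(v):
    # w(v): the number of non-zero symbols in v
    return sum(1 for x in v if x != 0)

def min_weight(code):
    # w(C): checks only the M - 1 non-zero codewords
    return min(weight(v) for v in code if any(x != 0 for x in v))

def min_distance(code):
    # d(C) by brute force: all pairs of distinct codewords
    return min(sum(1 for a, b in zip(u, v) if a != b)
               for u in code for v in code if u != v)

C = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0)]
print(min_weight(C), min_distance(C))  # both equal 2, as Theorem 3.1 predicts
```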
Encoding using a generator matrix
Synopsis. Linear codes are the codes used in virtually all practical applications. In an [n, k, d]_q linear code, messages are all q-ary vectors of length k, and an encoder is a linear map determined by a generator matrix G of the code. This works especially nicely if G is in standard form. We learn how to find a generator matrix in standard form and look at an example of encoding.
Let an [n, k, d]_q linear code C be given by a generator matrix

    G = [ r_1 ]
        [ r_2 ]
        [ ... ]
        [ r_k ].

As we know, G has k rows and n columns, and the entries of G are in F_q.
Definition (encoder for a linear code)
The generator matrix G gives rise to the encoder for the code C, which is the linear map
ENCODE: F_q^k → C, ENCODE((u_1, ..., u_k)) = u_1 r_1 + ... + u_k r_k,
or, in matrix form,
ENCODE(u) = uG ∈ F_q^n.
In this situation, messages are vectors of length k, and the function ENCODE maps messages to codewords, which are of length n. In any non-trivial linear code, n > k, which means that encoding makes a message longer.
The encoder depends on the choice of a generator matrix. In practice, there is a best choice:
Definition (generator matrix in standard form)
A generator matrix G is in standard form if its leftmost columns form an identity matrix:

    G = [I_k | A] = [ 1 0 ... 0 | * ... * ]
                    [ 0 1 ... 0 | * ... * ]
                    [ ...       |   ...   ]
                    [ 0 0 ... 1 | * ... * ].

Note that the entries in the last n − k columns, denoted by *, are arbitrary elements of F_q.
If G is in standard form, then, after encoding, the first k symbols of the codeword show the original message:
u ∈ F_q^k, ENCODE(u) = uG = u[I_k | A] = [u | uA]
(note that we have started using multiplication of block matrices, as explained in the Week 01 examples class).
In this situation, the first k symbols of a codeword are called information digits. The last n − k symbols are called check digits; their job is to protect the information from noise by increasing the Hamming distance between codewords.
Theorem 3.2 (generator matrix in standard form)
If a generator matrix in standard form exists for a linear code C, it is unique, and any generator matrix can be brought to the standard form by the following operations:
(R1) Permutation of rows.
(R2) Multiplication of a row by a non-zero scalar.
(R3) Adding a scalar multiple of one row to another row.
Proof. Not given; this is a standard fact from linear algebra. We will do some examples to show how to find the generator matrix in standard form.
Remark. If we apply a sequence of the row operations (R1), (R2) and (R3) to a generator matrix of a code C, we again obtain a generator matrix of C. This is implied in the Theorem, and follows from the fact that a basis of a vector space remains a basis under permutations, multiplication of an element of the basis by a non-zero scalar, and adding a scalar multiple of an element to another element. This fact is known from linear algebra.
Example (finding a generator matrix in standard form)
A 5-ary code C is given by its generator matrix

    [ 0 1 1 0 2 0 ]
    [ 2 2 0 0 4 1 ]
    [ 3 0 0 0 3 3 ]
    [ 1 0 0 1 2 0 ].

Find a generator matrix G of C in standard form.
Solution. To find the generator matrix in standard form, we can apply the following sequence of operations.

1. Choose a row r_i whose first entry a_{i1} is not zero. Swap r_1 and r_i to ensure a_{11} ≠ 0. Here, r_1 ↔ r_3:

    [ 3 0 0 0 3 3 ]
    [ 2 2 0 0 4 1 ]
    [ 0 1 1 0 2 0 ]
    [ 1 0 0 1 2 0 ]

2. Multiply the first row by a_{11}^{−1} to ensure a_{11} = 1. Here, r_1 → 2r_1 (since 3^{−1} = 2 in F_5):

    [ 1 0 0 0 1 1 ]
    [ 2 2 0 0 4 1 ]
    [ 0 1 1 0 2 0 ]
    [ 1 0 0 1 2 0 ]

3. From each row, subtract the first row multiplied by an appropriate coefficient to ensure that all non-diagonal entries in column 1 are zero. Here, r_2 → r_2 − 2r_1 and r_4 → r_4 − r_1:

    [ 1 0 0 0 1 1 ]
    [ 0 2 0 0 2 4 ]
    [ 0 1 1 0 2 0 ]
    [ 0 0 0 1 1 4 ]

The first column has become the identity column. We now focus on column 2.

4. Similarly to the above, make sure that the diagonal entry in column 2 is 1 and all off-diagonal entries are 0. Here, r_2 → 3r_2 (since 2^{−1} = 3 in F_5) and r_3 → r_3 − r_2:

    [ 1 0 0 0 1 1 ]
    [ 0 1 0 0 1 2 ]
    [ 0 0 1 0 1 3 ]
    [ 0 0 0 1 1 4 ]

The second column has become the identity column. We would now focus on column 3, then column 4, but this is not necessary in this example: the matrix is already in standard form.
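The reduction can be automated. A sketch (the function name standard_form is ours; it assumes q is prime, so inverses can be computed as a^{q−2} mod q, and that the leftmost k columns can be reduced to I_k without column permutations):

```python
def standard_form(G, q):
    # Row-reduce G over F_q to standard form [I_k | A] using (R1)-(R3)
    G = [row[:] for row in G]
    k = len(G)
    for col in range(k):
        # (R1): swap up a row with a non-zero entry in this column
        pivot = next(r for r in range(col, k) if G[r][col] % q != 0)
        G[col], G[pivot] = G[pivot], G[col]
        # (R2): scale the pivot row so that the diagonal entry becomes 1
        inv = pow(G[col][col], q - 2, q)  # a^(q-2) = a^(-1) mod q for prime q
        G[col] = [(inv * x) % q for x in G[col]]
        # (R3): clear all other entries in this column
        for r in range(k):
            if r != col and G[r][col] % q != 0:
                c = G[r][col]
                G[r] = [(x - c * y) % q for x, y in zip(G[r], G[col])]
    return G

G = standard_form([[0, 1, 1, 0, 2, 0],
                   [2, 2, 0, 0, 4, 1],
                   [3, 0, 0, 0, 3, 3],
                   [1, 0, 0, 1, 2, 0]], 5)
for row in G:
    print(row)
```

The intermediate row operations this sketch performs may differ from those in the worked example, but by Theorem 3.2 the resulting standard form is the same.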
Example
Using the generator matrix G in standard form, encode the messages 0224 and 0304.
Solution. We compute:

    ENCODE(0224) = ( 0 2 2 4 ) [ 1 0 0 0 1 1 ]  =  ( 0 2 2 4 3 1 ).
                               [ 0 1 0 0 1 2 ]
                               [ 0 0 1 0 1 3 ]
                               [ 0 0 0 1 1 4 ]

Note that in the codeword 022431, the information digits 0224 repeat the original message. The last two symbols, 31, are the check digits. They are found as follows: 0·1 + 2·1 + 2·1 + 4·1 = 3 and 0·1 + 2·2 + 2·3 + 4·4 = 1. We continue:
    ENCODE(0304) = ( 0 3 0 4 ) [ 1 0 0 0 1 1 ]  =  ( 0 3 0 4 2 2 ).
                               [ 0 1 0 0 1 2 ]
                               [ 0 0 1 0 1 3 ]
                               [ 0 0 0 1 1 4 ]

This example illustrates how encoding increases the Hamming distance between messages. One has d(0224, 0304) = 2. However, after encoding, d(022431, 030422) = 4.
In fact, one can check that the code given by this generator matrix has minimum distance 3.
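The computation ENCODE(u) = uG is a one-liner; a sketch (the function name encode is ours):

```python
def encode(u, G, q):
    # ENCODE(u) = uG over F_q
    k, n = len(G), len(G[0])
    return [sum(u[i] * G[i][j] for i in range(k)) % q for j in range(n)]

G = [[1, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 2],
     [0, 0, 1, 0, 1, 3],
     [0, 0, 0, 1, 1, 4]]
print(encode([0, 2, 2, 4], G, 5))  # [0, 2, 2, 4, 3, 1]
print(encode([0, 3, 0, 4], G, 5))  # [0, 3, 0, 4, 2, 2]
```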
Remark. While not directly relevant to the code and the encoding algorithm, let us point out that the above messages were created using the following mnemonic key: a pair of 5-ary symbols is written as an English letter from A to Y:
00 = A, 01 = B, 02 = C, 03 = D, 04 = E, 10 = F, 11 = G, 12 = H, 13 = I, 14 = J, 20 = K, 21 = L, 22 = M, 23 = N, 24 = O, 30 = P, 31 = Q, 32 = R, 33 = S, 34 = T, 40 = U, 41 = V, 42 = W, 43 = X, 44 = Y.
The messages 0224, 0304 can be read as CODE.
The codewords 022431, 030422 are then read as COQ DEM.
Remark (trade-off between reliability and transmission costs)
The benefit of the increased Hamming distance is that it decreases the likelihood of a transmission error. Indeed, one symbol error in the message 0224 = CO can make it 1224 = HO, and there will be no way to recover the original message.
However, because the code has minimum distance 3, one symbol error in the codeword 022431 = COQ will always be corrected by the decoder: 122431 = HOQ will be decoded as 022431 = COQ, because 122431 is not in the code, and 022431 is the nearest neighbour of this vector in the code.
We have to pay for the increased robustness of transmission: if we encode the messages, we have to transmit 50% more symbols, so the transmission costs increase.
Synopsis. We have seen how a generator matrix of a linear code is used to encode messages and produce codewords. We now start discussing decoding. It is a more difficult operation because, unlike encoding, it is not given by a linear map. In this lecture, we see why coset leaders are important and construct a standard array of a linear code to find them.
Decoder, error vector
Let C ⊆ F_q^n be a linear code. By definition, a decoder for C is a function
DECODE: F_q^n → C.
Coding Theory suggests the following model for the transmission of a codeword of a linear code via a noisy channel:
codeword v ∈ C → (noise) → y = v + e,
where
y ∈ F_q^n is the received vector,
e = y − v is the error vector.
It is assumed that the error vector does not depend on v and is a random vector. Its probabilistic distribution depends on the assumptions about the channel.
The job of the decoder is, given y, to estimate the error vector e and then to decode y as y − e.
Even without assuming anything about the channel, we can see that knowing y restricts the possible values of e, as shown in Lemma 3.3.
Definition (coset)
Given a linear code C ⊆ F_q^n and a vector y ∈ F_q^n, the set
y + C = {y + c | c ∈ C}
is called the coset of y.
Lemma 3.3 (the error vector must lie in the coset of the received vector)
Any decoder for a linear code C ⊆ F_q^n must satisfy DECODE(y) = y − e, where e lies in the coset y + C of y.
Proof. Let v = DECODE(y). Then v ∈ C and e = y − v. Because C is a linear code, −v ∈ C, therefore e = y + (−v) ∈ y + C.
Let us state the basic properties of cosets. They are known from Group Theory (see Algebraic Structures 1). In Group Theory one speaks about left and right cosets. But a linear code C is a subgroup of the group F_q^n, which is abelian. Hence there is no difference between left and right cosets, and they are simply called cosets.
Facts about cosets
C = 0 + C is itself a coset. (C is called the trivial coset.) Moreover, C is the coset of any codeword c ∈ C.
If y, z ∈ F_q^n, then either y + C = z + C or (y + C) ∩ (z + C) = ∅.
|y + C| = |C|.
There are |F_q^n| / |C| distinct cosets. If n is the length and k is the dimension of C, the number of cosets is q^n / q^k = q^{n−k}.
Thus, the whole space F_q^n is split into q^{n−k} disjoint cosets:
F_q^n = C ∪ (a_1 + C) ∪ ... ∪ (a_{q^{n−k}−1} + C).
Nearest Neighbour decoders and coset leaders
Recall that in Coding Theory, we only consider decoders which satisfy the Nearest Neighbour Decoding principle. This means that DECODE(y) must be a codeword nearest to y in terms of Hamming distance, or one of such codewords. We will now see how this leads to the following notion:
Definition (coset leader). A coset leader of a coset y + C is a vector of minimum weight in y + C.
Indeed, let us make the following
Observation. For a linear code,
d(y, DECODE(y)) = d(y, y − e) = w(y − (y − e)) = w(e),
that is, the distance between the received vector and the decoded vector is equal to the weight of the error vector.
Conclusion. To decode y, a Nearest Neighbour decoder must estimate the error vector to be a coset leader of y + C.
The following construction is a way to find all cosets and coset leaders for a given linear code C.
The standard array construction
A standard array for a linear code C ⊆ F_q^n is a table with |C| = q^k columns and q^{n−k} rows. Each row is a coset. Row 0 is the trivial coset (i.e., C itself). The first column consists of coset leaders. The table contains every vector from F_q^n exactly once.
We will show how to construct a standard array, using the linear code C = {0000, 0111, 1011, 1100} ⊆ F_2^4 as an example.^1
Row 0 of the standard array lists all codewords (vectors in C = 0 + C). They must start from 0, but otherwise the order is arbitrary.
0000 0111 1011 1100
Row 1: choose a vector a_1 of smallest weight not yet listed. Because of its minimum weight, that vector will automatically be a coset leader. Fill in Row 1 by adding a_1 to each codeword in Row 0.
Say, a_1 = 0001. To list its coset, add it to Row 0:
0001 0110 1010 1101
Row 2: choose a_2 of smallest weight not yet listed, and do the same as for Row 1.
Say, a_2 = 0010, add it to Row 0:
0010 0101 1001 1110
Row 3: same with, say, a_3 = 0100:
0100 0011 1111 1000
^1 This code is the even weight code of length 3 with the last bit repeated. But the origin of the code is not important for the standard array construction.
We obtain the following standard array for the code C = {0000, 0111, 1011, 1100}:
0000 0111 1011 1100
0001 0110 1010 1101
0010 0101 1001 1110
0100 0011 1111 1000
Remarks
1. The coset C = 0 + C always has exactly one coset leader, 0.
2. Other cosets may have more than one coset leader.
Exercise: in the above standard array for C, find coset leaders which are not in column 1.
3. Knowing all the coset leaders is not important for decoding. It is enough to know one leader in each coset; see the next lecture.
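The construction can be sketched as follows (the function name standard_array is ours; ties between candidate leaders of equal weight are broken lexicographically, so other valid standard arrays exist):

```python
from itertools import product

def standard_array(code, n, q=2):
    # Row 0 is the code itself; each further row is the coset of the
    # smallest-weight (then lexicographically smallest) unlisted vector.
    code = [tuple(c) for c in code]
    remaining = set(product(range(q), repeat=n)) - set(code)
    rows = [list(code)]
    while remaining:
        leader = min(remaining, key=lambda v: (sum(x != 0 for x in v), v))
        row = [tuple((a + c) % q for a, c in zip(leader, cw)) for cw in code]
        rows.append(row)
        remaining -= set(row)
    return rows

C = [(0,0,0,0), (0,1,1,1), (1,0,1,1), (1,1,0,0)]
for row in standard_array(C, 4):
    print(row)   # the 2^(4-2) = 4 cosets, chosen leaders first
```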
Synopsis. For a linear code C, we explicitly describe a Nearest Neighbour decoder DECODE: F_q^n → C based on a standard array for C. For a binary code C sent via a binary symmetric channel, we find the probability P_undetect of an undetected transmission error. It is related to the weight distribution function of C.
The Standard Array decoding
Let C ⊆ F_q^n be a linear code. In the previous lecture, we proved that any Nearest Neighbour decoding algorithm for C must satisfy
DECODE(y) = y − COSET LEADER(y + C).
This suggests the following decoding algorithm for C:
Algorithm 3.4 (the standard array decoder)
Preparation. Construct a standard array for C.
Decoding.
Receive a vector y ∈ F_q^n.
Look up y in the standard array:
the row of y starts with its chosen coset leader a_i;
the column of y starts with y − a_i.
Return the topmost vector of the column of y as DECODE(y).
Example
Using the standard array decoder for the binary code C = {0000, 0111, 1011, 1100},
decode the received vectors 0011 and 1100;
give an example of one bit error occurring in a codeword and being corrected;
give an example of one bit error occurring in a codeword and not being corrected.
Solution. We already know a standard array for C:
0000 0111 1011 1100
0001 0110 1010 1101
0010 0101 1001 1110
0100 0011 1111 1000
The received vector 0011 is in the second column, so DECODE(0011) = 0111. The received
vector 1100 is a codeword (in the fourth column), so DECODE(1100) = 1100.
Suppose that the codeword 0000 is sent. If an error occurs in the last bit, the word 0001 is
received and decoded correctly as 0000. If an error occurs in the rst bit, the word 1000 is
received and decoded incorrectly as 1100.
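Algorithm 3.4 can be sketched directly (the function name decode is ours; the array is the one from the example, written out as rows of tuples):

```python
def decode(y, rows):
    # Standard array decoder: locate y, return the codeword at the top
    # of its column (equivalently, y minus its coset leader).
    y = tuple(y)
    for row in rows:
        if y in row:
            return rows[0][row.index(y)]
    raise ValueError("vector not in the array")

rows = [[(0,0,0,0), (0,1,1,1), (1,0,1,1), (1,1,0,0)],
        [(0,0,0,1), (0,1,1,0), (1,0,1,0), (1,1,0,1)],
        [(0,0,1,0), (0,1,0,1), (1,0,0,1), (1,1,1,0)],
        [(0,1,0,0), (0,0,1,1), (1,1,1,1), (1,0,0,0)]]

print(decode((0,0,1,1), rows))  # (0, 1, 1, 1)
print(decode((1,0,0,0), rows))  # (1, 1, 0, 0): the miscorrected one-bit error
```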
Remark (a standard array decoder is not unique)
Recall that there may be more than one possible standard array for the code C. Indeed, in
the above example the coset 0100+C has two coset leaders: 0100 and 1000. Thus, we could
construct a different standard array for C:
0000 0111 1011 1100
0001 0110 1010 1101
0010 0101 1001 1110
1000 1111 0011 0100
The decoder associated to this standard array is different from the decoder considered
above. Both decoders decode the same linear code C and both are Nearest Neighbour de-
coders. A linear code can have more than one Nearest Neighbour decoder.
However, if C is a perfect linear code, then
each coset has only one coset leader, so the Standard Array decoder is unique;
each vector has a unique nearest neighbour in C; thus, there is only one Nearest
Neighbour decoder, namely the Standard Array decoder.
The above facts about perfect codes are true by questions 8(b) and 13(c) on the example
sheets.
Reminder (the number of errors corrected by a code)
Recall that a code with minimum distance d corrects t = ⌊(d − 1)/2⌋ errors.
The code C in the above example is linear, hence d(C) = w(C) = 2 (it is easy to find the minimum weight of the code by inspection). This means that the code corrects ⌊(2 − 1)/2⌋ = 0 errors. And indeed, as we saw in the example, it is possible that one bit error occurs in a codeword and is not corrected.
So, from the point of view of Hamming's theory, this code C has no error-correcting capability. (It still detects one error.)
But in Shannon's theory, the error-detecting and error-correcting performance of a code are measured probabilistically.
Error-detecting and error-correcting performance of a linear code: Shannon's theory point of view
Shannon's Information Theory is interested in how likely it is that a transmission error in a codeword is not detected/corrected by a decoder of C.
We will answer these questions for a binary linear code C, but we need to have a stochastic model of the noise. Here it is:
Assumption. The channel is BSC(r), the binary symmetric channel with bit error rate r. This means that one bit (0 or 1), transmitted via the channel, arrives unchanged with probability 1 − r, and gets flipped with probability r:
[Diagram: the channel BSC(r). 0 → 0 and 1 → 1 with probability 1 − r; 0 → 1 and 1 → 0 with probability r.]
When a codeword v is transmitted, the channel generates a random error vector and adds it to v. By definition of BSC(r), for a given e ∈ F_2^n one has
P(the error vector equals e) = r^i (1 − r)^{n−i}, where i = w(e)
(similar to Q4(b) on the Example sheet).
Recall that an undetected error means that the received vector v + e is a codeword not equal to v. Note that, if v ∈ C and C is a vector space,
v + e ∈ C if and only if e ∈ C.
Therefore, an undetected error means that the error vector is a non-zero codeword. We can now calculate
P_undetect = P(undetected error) = Σ_{e∈C, e≠0} P(the error vector is e) = Σ_{i=1}^{n} A_i r^i (1 − r)^{n−i},
where A_i is the number of codewords of weight i in C.
The coefficients A_i give rise to the following useful function:
Definition (the weight distribution function)
Let C ⊆ F_q^n be a linear code. The weight distribution function of C is the polynomial
A(x) = A_0 + A_1 x + A_2 x^2 + ... + A_n x^n.
It is otherwise written as
A(x) = Σ_{v∈C} x^{w(v)}.
Lemma 3.5 (P_undetect via the weight distribution function)
Suppose that a binary linear code C of length n with weight distribution function A(x) is transmitted via BSC(r). Then the probability of an undetected transmission error is
P_undetect = (1 − r)^n ( A(r/(1 − r)) − 1 ).
Proof (not given in class, not examinable). One has A_0 = 1, because a linear code C contains exactly one codeword of weight 0, namely 0. Therefore
A(x) − 1 = A_1 x + A_2 x^2 + ... + A_n x^n,
which gives
A(r/(1 − r)) − 1 = A_1 r/(1 − r) + A_2 r^2/(1 − r)^2 + ... + A_n r^n/(1 − r)^n.
Multiplying the right-hand side by (1 − r)^n, one obtains the formula for P_undetect deduced earlier.
Example
For the binary code C = {0000, 0111, 1011, 1100} as above, the weight distribution function is
A(x) = 1 + x^2 + 2x^3,
because C has one codeword of weight 0, zero codewords of weight 1, one codeword of weight 2 and two codewords of weight 3. If a codeword of C is transmitted via BSC(r), then an undetected error occurs with probability
P_undetect = r^2 (1 − r)^2 + 2r^3 (1 − r).
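The weight distribution and P_undetect can be computed mechanically; a sketch (the function name p_undetect is ours):

```python
from collections import Counter

def p_undetect(code, r):
    # P_undetect = sum_{i>=1} A_i r^i (1-r)^(n-i), where A_i counts the
    # codewords of weight i (the coefficients of A(x))
    n = len(code[0])
    A = Counter(sum(x != 0 for x in v) for v in code)
    return sum(A[i] * r ** i * (1 - r) ** (n - i) for i in range(1, n + 1))

C = [(0,0,0,0), (0,1,1,1), (1,0,1,1), (1,1,0,0)]
print(p_undetect(C, 0.1))  # r^2 (1-r)^2 + 2 r^3 (1-r), approx 0.0099
```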
Remark
In an earlier example, we sent the codeword 0000 of the code C down the channel and received the vector 1000. In this case, the error was detected (because the received vector was not a codeword), but not corrected.
In the next lecture, we will find the probability of an error being corrected for C.
Error-correcting performance of a linear code
Synopsis. As in the previous lecture, we assume that a binary linear code C is transmitted via BSC(r). We find P_corr, the probability that a codeword is decoded correctly, in terms of the weights of the coset leaders of C. We conclude by stating the need for better codes and better decoding algorithms.
Let C be a binary linear code. A codeword v ∈ C is transmitted via the binary symmetric channel with bit error rate r, and a vector v + e is received and decoded. We say that v is decoded correctly if DECODE(v + e) = v. We denote
P_corr(C) = P(v is decoded correctly).
If we use a Standard Array decoder,
DECODE(v + e) = v + e − COSET LEADER(v + e) = v + e − COSET LEADER(e).
Therefore, correct decoding occurs whenever the error vector is a coset leader in the standard array.
Recall that e equals a given coset leader of weight i with probability r^i (1 − r)^{n−i}. We obtain
Lemma 3.6 (formula for P_corr)
For a binary linear code C, let α_i be the number of cosets where the coset leader is of weight i. Then
P_corr(C) = Σ_{i=0}^{n} α_i r^i (1 − r)^{n−i}.
Example
Recall the code with a standard array
0000 0111 1011 1100
0001 0110 1010 1101
0010 0101 1001 1110
1000 1111 0011 0100
One has α_0 = 1, α_1 = 3, α_2 = α_3 = α_4 = 0, so P_corr(C) = (1 − r)^4 + 3r(1 − r)^3.
Numerical example
Let r = 0.1 be the bit error rate of the channel.^1 Let us compare two scenarios:
1. a two-bit message is transmitted via the channel without any encoding;
2. a two-bit message is encoded using the code C as above, then the codeword of length 4 is transmitted via the channel and decoded at the other end.
In each case, we determine the probability that the message arrives uncorrupted.
Case 1: by definition of BSC(0.1), the probability of no errors in a message of length 2 is (1 − 0.1)^2 = 0.81.
Case 2: by the above calculation, the probability of correct decoding is P_corr(C) = (1 − 0.1)^4 + 3 · 0.1 · (1 − 0.1)^3 = 0.8748.
As we can see, the code C allows us to improve the reliability of transmission, but not by much (the error probability goes down from 0.19 to about 0.12, i.e., it is not even halved). The main reason for that is that C is not a good code. From Hamming's theory perspective, it does not correct even a single error.
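Lemma 3.6 and the comparison above are easy to verify numerically; a sketch (the function name p_corr is ours) taking the list of coset-leader weights:

```python
from collections import Counter

def p_corr(leader_weights, n, r):
    # P_corr = sum_i alpha_i r^i (1-r)^(n-i); alpha_i = number of cosets
    # whose chosen leader has weight i (Lemma 3.6)
    alpha = Counter(leader_weights)
    return sum(a * r ** i * (1 - r) ** (n - i) for i, a in alpha.items())

# Coset leaders 0000, 0001, 0010, 0100 have weights 0, 1, 1, 1:
print(p_corr([0, 1, 1, 1], 4, 0.1))  # approx 0.8748, versus 0.81 unencoded
```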
Discussion: how can one improve the performance of error-correcting codes?
1. One can use longer codes (increase n).
However, decoding is going to be a problem. We described the standard array decoding algorithm for linear codes. Its main disadvantage is the storage requirement: the standard array contains all vectors of length n and must be stored in memory while the decoder operates. This requires an amount of memory proportional to n q^n.
Example: The Voyager 1 and 2 deep space missions used a binary code of length 24 to transmit colour photographs of Jupiter and Saturn back to Earth in 1979-81. A standard array decoder for this code would require 24 · 2^24 bits of memory, which is 48 Mbytes.
This is little by today's standards; however, the Voyager 1 spacecraft has recently left the Solar system with 68 kbytes of onboard memory...
^1 This is a very bad channel. When telephone lines were first used for transmission of binary data, the bit error rate was of the order 10^{−3}.
In the next section, we will introduce an improved technique called syndrome decoding. It will require significantly less memory, but decoding a received vector will require more computation. Syndrome decoding is based on the notion of a dual linear code.
2. One can use codes which correct more errors.
In the rest of the course, we will construct several families of codes algebraically.
End of discussion.
4. Dual codes
Synopsis. We define the dual code C⊥ of a linear code C. If C has a generator matrix in standard form, we learn how to find a generator matrix of C⊥.
Definition (inner product)
For u, v ∈ F_q^n, the scalar (element of F_q) defined as
u · v = Σ_{i=1}^{n} u_i v_i
is called the inner product of the vectors u and v.
Properties of the inner product
(1) u · v = u v^T (matrix multiplication).
Explanation: we write elements of F_q^n as row vectors. Thus, u is a row vector (u_1, ..., u_n), and v^T is the transpose of v, so a column vector with entries v_1, v_2, ..., v_n. Multiplying u and v^T as matrices (a 1 × n matrix by an n × 1 matrix), we obtain a 1 × 1 matrix, i.e., a scalar in F_q.
(2) u · v = v · u (symmetry).
Explanation: this is easily seen from the definition.
(3) For a scalar λ ∈ F_q, we have (λu + w) · v = λ(u · v) + (w · v) and u · (λv + w) = λ(u · v) + (u · w) (this property is known as bilinearity).
Explanation: we know that the matrix product in u v^T is bilinear.
Definition (dual code)
Given a linear code C ⊆ F_q^n, we define the dual code C⊥ as
C⊥ = {v ∈ F_q^n | v · c = 0 for every c ∈ C}.
Example (the dual code of the binary repetition code)
Let Rep_n = {00...0, 11...1} ⊆ F_2^n be the binary repetition code of length n.
By definition, (Rep_n)⊥ = {v ∈ F_2^n | v · 00...0 = 0, v · 11...1 = 0}. The first condition, v · 00...0 = 0, is vacuous (it holds for all vectors v ∈ F_2^n). The second condition, v · 11...1 = 0, means v_1 + v_2 + ... + v_n = 0 in F_2, i.e., v ∈ E_n, the binary even weight code of length n. Thus, (Rep_n)⊥ = E_n.
Lemma 4.1

Let A be an m × n matrix over F_q. The following are equivalent:
i. A = 0 (zero matrix);
ii. ∀u ∈ F_q^m: uA = 0;
iii. ∀v ∈ F_q^n: Av^T = 0;
iv. ∀u ∈ F_q^m ∀v ∈ F_q^n: uAv^T = 0.

Remark: 0 denotes the column vector (0, ..., 0)^T = 0^T.

Proof (not given in class; easy exercise). Obviously, i. ⟹ ii. ⟹ iv.
Now, take u = (0, ..., 1, ..., 0) (the ith component is 1, the rest are 0) and v = (0, ..., 1, ..., 0) (the jth component is 1, the rest are 0), and observe that uAv^T = a_{ij}, the (i, j)-th entry of A. Therefore, iv. ⟹ i., and i. ⟺ ii. ⟺ iv.
In the same way, i. ⟺ iii. ⟺ iv.
Proposition 4.2. If G is a generator matrix for C, then C^⊥ = {v ∈ F_q^n | vG^T = 0}.

Proof. Recall that

    C = {uG : u ∈ F_q^k}

(see Encoding using a generator matrix in §3). Hence

    C^⊥ = {v ∈ F_q^n : ∀u ∈ F_q^k, v · (uG) = 0} = {v ∈ F_q^n : ∀u ∈ F_q^k, vG^T u^T = 0},

where we use the rule (AB)^T = B^T A^T for matrix multiplication. By Lemma 4.1 (iii. ⟹ i.), this is {v : vG^T = 0}.
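Proposition 4.2 gives a finite test for membership in the dual, so for small parameters the whole dual code can be enumerated by brute force. A sketch (Python, assuming q prime; `dual_code` is a name chosen here, not from the notes):

```python
from itertools import product

def dual_code(G, q):
    """All v in F_q^n with v G^T = 0 (Proposition 4.2), by brute force."""
    n = len(G[0])
    dual = []
    for v in product(range(q), repeat=n):
        # v is orthogonal to every row of G iff v is in the dual code
        if all(sum(vi * gi for vi, gi in zip(v, row)) % q == 0 for row in G):
            dual.append(v)
    return dual

G_rep3 = [[1, 1, 1]]          # generator matrix of Rep_3 over F_2
print(dual_code(G_rep3, 2))   # the even-weight code E_3
```

Running this on Rep_3 recovers exactly the even weight code E_3, in agreement with the example above.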
Corollary 4.3. C^⊥ is a linear code. (By Question 12 on the example sheets.)
Theorem 4.4

Assume that C has a k × n generator matrix G = [ I_k | A ] in standard form. Then C^⊥ has generator matrix

    H = [ −A^T | I_{n−k} ].

Proof. Notation: if v ∈ F_q^n, we will write v = [ x | y ] where x ∈ F_q^k and y ∈ F_q^{n−k}. Here x denotes the vector formed by the first k symbols in v, and y denotes the vector formed by the last n − k symbols in v.

Observe that A is a k × (n − k) matrix, so its transpose A^T is an (n − k) × k matrix, hence H is an (n − k) × n matrix. The last n − k columns of H are identity columns, therefore the rows of H are linearly independent. So H is a generator matrix of the linear code

    {yH : y ∈ F_q^{n−k}} = { [ −yA^T | y ] : y ∈ F_q^{n−k} }
                         = { [ x | y ] : y ∈ F_q^{n−k}, x + yA^T = 0 }
                         = { [ x | y ] : [ x | y ] G^T = 0 }    (since G^T is the block matrix with I_k on top of A^T)
                         = C^⊥

by Proposition 4.2.
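Theorem 4.4 is a purely mechanical recipe, so it is easy to sketch in code (Python; `check_matrix` is a name chosen here for illustration):

```python
def check_matrix(G, q):
    """Given G = [I_k | A] in standard form over F_q,
    return H = [-A^T | I_{n-k}] as in Theorem 4.4."""
    k, n = len(G), len(G[0])
    A = [row[k:] for row in G]                      # the k x (n-k) block A
    H = []
    for j in range(n - k):
        row = [(-A[i][j]) % q for i in range(k)]    # column j of -A^T
        row += [1 if t == j else 0 for t in range(n - k)]  # identity part
        H.append(row)
    return H

G = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 1]]
H = check_matrix(G, 2)
# sanity check: every row of G is orthogonal to every row of H
assert all(sum(g * h for g, h in zip(gr, hr)) % 2 == 0 for gr in G for hr in H)
print(H)
```

Over F_2 the sign is invisible (−1 = 1), which is why binary examples simply transpose A.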
Synopsis. We prove an important theorem about the properties of a dual code C^⊥. We introduce a new decoding algorithm for linear codes called syndrome decoding. It requires the knowledge of a check matrix of the code.

Definition (check matrix)

A check matrix for a linear code C means a generator matrix for C^⊥. If C is a binary code, we can also say parity check matrix. (The reason for this terminology will be explained soon.)

Question: what if the code C does not have a generator matrix G in standard form?
Answer: for practical purposes, we will replace C by a linearly equivalent code with G in standard form.
Definition (linearly equivalent codes)

Two linear codes C, C′ ⊆ F_q^n are linearly equivalent if C′ can be obtained from C by a sequence of linear transformations of the following types:
(C1) choose indices i, j; in every codeword, swap symbols x_i and x_j;
(C2) choose an index i and a non-zero λ ∈ F_q; in every codeword, multiply x_i by λ.

Exercise: linearly equivalent codes are parameter equivalent. Indeed, transformations (C1), (C2) do not change:
• the dimension, dim C, of the code;
• the weight, w(C), of the code.¹

Fact from linear algebra

Every generator matrix can be brought into the standard form by using row operations (R1), (R2), (R3) from §3 and column operations (C1). We can thus find a generator matrix in standard form for a linearly equivalent code.
Theorem 4.5

If C ⊆ F_q^n is a linear code of dimension k, then:
i. dim C^⊥ = n − k;
ii. (C^⊥)^⊥ = C;
iii. if H is a check matrix for C, then C = {v ∈ F_q^n : vH^T = 0}.

Proof. i. (sketch of proof) If C has a generator matrix in standard form, then Theorem 4.4 yields a generator matrix for C^⊥ with n − k rows, so we are done. Otherwise, using operations (C1), permute symbol positions in C to obtain a code C′, linearly equivalent to C, with a generator matrix in standard form. One has dim (C′)^⊥ = n − k. Now observe that the same permutation applied to C^⊥ gives (C′)^⊥, so dim C^⊥ = dim (C′)^⊥ = n − k.

ii. c ∈ C ⟹ (∀v ∈ C^⊥: v · c = c · v = 0) ⟹ c ∈ (C^⊥)^⊥. This shows that C ⊆ (C^⊥)^⊥. But dim (C^⊥)^⊥ = n − (n − k) = k = dim C. So C = (C^⊥)^⊥.

iii. is immediate by ii. and Proposition 4.2.

¹ The weight of C is the same as the minimum weight of C: the minimum weight of a non-zero codeword.
Remark

Thus, a check matrix H allows us to check whether a given vector v ∈ F_q^n is a codeword in C: this is true iff vH^T = 0.
Definition (syndrome)

Let H be a check matrix for a linear code C ⊆ F_q^n. Let y ∈ F_q^n. The vector

    S(y) = yH^T

is called the syndrome of y. The linear map S: F_q^n → F_q^{n−k} is the syndrome map.

We can see that the syndrome map can be used for error detection:

    S(y) = 0 ⟺ y is a codeword.
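The error-detection test is a single matrix-vector product; as a sketch (Python, with a small binary check matrix chosen for illustration — the same one found in the worked example later in this section):

```python
def syndrome(y, H, q):
    """S(y) = y H^T over F_q: one coordinate per row of the check matrix H."""
    return tuple(sum(yi * hi for yi, hi in zip(y, row)) % q for row in H)

H = [[0, 0, 1, 0, 0, 0],
     [1, 0, 0, 1, 0, 0],
     [1, 1, 0, 0, 1, 0],
     [0, 1, 0, 0, 0, 1]]
print(syndrome((1, 1, 0, 1, 0, 1), H, 2))  # (0, 0, 0, 0): a codeword
print(syndrome((1, 1, 1, 1, 1, 1), H, 2))  # non-zero: not a codeword
```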
We will now use the syndromes in error correction. Using the above notation:

Theorem 4.6. S(y) = S(v) ⟺ y and v are in the same coset of C.

Proof. yH^T = vH^T ⟺ (y − v)H^T = 0 ⟺ y − v ∈ C. By definition of cosets, this means that v is in the coset of y.
Syndrome decoding

If we know a check matrix H for a linear code C ⊆ F_q^n, we can improve the standard array decoder for C. We will write the same decoder differently; it will require much less memory but more calculations.
Algorithm 4.7 (the syndrome decoder)

Preparation. Construct a table of syndromes, with q^{n−k} rows, of the form

    Coset leader a_i | S(a_i)

The top row contains the codeword 0 and its syndrome S(0) = 0. At each step, choose a vector a_i ∈ F_q^n of smallest weight such that S(a_i) does not appear in the table; then a_i is a coset leader of a new coset.

Decoding.
• Receive a vector y ∈ F_q^n.
• Calculate S(y) = yH^T.
• In the table, find a_i with S(a_i) = S(y). Then a_i is the coset leader of the coset of y.
• Return DECODE(y) = y − a_i.
Remarks

The syndrome decoder is based on a choice of one coset leader in every coset. This is the same as for the standard array decoder. In fact, if the same coset leaders are chosen in both decoders, both decoders will yield the same function DECODE: F_q^n → C. They differ only in the way this function is computed.

The number of arithmetic operations required to calculate the syndrome S(y) = yH^T can be of order n², whereas the standard array decoder requires n operations to look up a vector. On the other hand, the amount of memory required by the syndrome decoder is proportional to q^{n−k}, which is better than q^n for the standard array, especially for codes with high code rate k/n.
Example of syndrome decoding

This example was given in the examples class. Let C be the binary linear code with generator matrix

    [ 1 0 0 1 1 0 ]
    [ 0 0 1 0 1 1 ]

(a) Find a code D linearly equivalent to C such that D has a generator matrix in standard form.
(b) Find a parity check matrix H for the code D.
(c) Construct the table of syndromes for D using the matrix H.
(d) Using the table of syndromes, decode the received vector y = 111111.

Solution. (a) Swap columns 2 and 3 to obtain a generator matrix

    G = [ 1 0 0 1 1 0 ]
        [ 0 1 0 0 1 1 ]

in standard form. Let D be the code generated by this matrix. We now forget about the code C and work only with D.
(b) We can find H using Theorem 4.4. Observe that G = [ I_2 | A ] with

    A = [ 0 1 1 0 ]
        [ 0 0 1 1 ].

Put

    H = [ 0 0 1 0 0 0 ]
        [ 1 0 0 1 0 0 ]
        [ 1 1 0 0 1 0 ]
        [ 0 1 0 0 0 1 ].
(c) When calculating syndromes, it is useful to observe that the syndrome of a vector 0...010...0 (with 1 in position i and 0s elsewhere) is equal to the ith column of H, transposed. The syndrome map is linear, so the syndrome of a sum of two vectors is the sum of their syndromes, etc. For example, S(011000) = 0011 + 1000 = 1011 (the sum of the second and the third columns of H, transposed).

    vector   syndrome  leader?
    000000   0000      yes
    000001   0001      yes
    000010   0010      yes
    000100   0100      yes
    001000   1000      yes
    010000   0011      yes
    100000   0110      yes

All vectors of weight 1 have different syndromes, so they all are coset leaders. We need more coset leaders, hence we start looking at vectors of weight 2, then weight 3:
    000011   0011      no, syndrome already in the table
    000101   0101      yes
    001001   1001      yes
    001010   1010      yes
    001100   1100      yes
    010100   0111      yes
    011000   1011      yes
    101000   1110      yes
    001101   1101      yes
    011100   1111      yes

When we try a vector, say of weight 2, and find that its syndrome is already in the table, we ignore that vector and try another one. We found 16 = 2^{6−2} coset leaders, so we stop.

(d) S(111111) = 1010, which is the syndrome of the coset leader 001010 in the table. Therefore, DECODE(111111) = 111111 − 001010 = 110101.

(Sanity check: the decoded vector must be a codeword. Indeed, we can see that the vector 110101 is the sum of the two rows of the generator matrix for D.)

End of example.
5. Hamming codes

Synopsis. Hamming codes are essentially the first non-trivial family of codes that we shall meet. We start by proving the Distance Theorem for linear codes; we will need it to determine the minimum distance of a Hamming code. We then give a construction of a q-ary Hamming code.

We already know how to read the length and the dimension of a linear code C off a check matrix H of C:
• the number of columns of H is the length of C;
• the number of columns minus the number of rows of H is the dimension of C.

The following theorem tells us how to determine the minimum distance of C using H.
Theorem 5.1 (Distance Theorem for Linear Codes)

Let C ⊆ F_q^n be a linear code with check matrix H. Then d(C) = d if and only if every set of d − 1 columns of H is linearly independent and some set of d columns of H is linearly dependent.

Proof. By Theorem 4.5, a vector x = (x_1, ..., x_n) ∈ F_q^n is a codeword if and only if xH^T = 0. This can be written as

    x_1 h_1 + x_2 h_2 + ... + x_n h_n = 0,

where h_1, ..., h_n are the columns of the matrix H. The number of columns of H that appear in the linear dependency x_1 h_1 + ... + x_n h_n = 0 with non-zero coefficient is the weight of x. Therefore, d(C), which is the minimum possible weight of a non-zero codeword of C, is the smallest possible number of columns of H that form a linear dependency.
Example

Use the Distance Theorem to find the minimum distance of the ternary linear code with check matrix

    H = [ 1 1 1 0 ]
        [ 1 2 0 1 ].

Solution. Step 1. Note that H has no zero columns. This means that every set of 1 column is linearly independent (a one-element set is linearly dependent iff that element is zero). So d ≥ 2.

Step 2. Any two columns of H are linearly independent, because no two columns are proportional to each other. So d ≥ 3.

Step 3. There are three linearly dependent columns in H: for example, columns 1, 3 and 4 form a linear dependency

    [ 1 ]   [ 1 ]   [ 0 ]
    [ 1 ] − [ 0 ] − [ 1 ] = 0.

Therefore, d = 3.

Remark: although, for small matrices, the algorithm for finding d can be used in practice, it requires a lot of computations for large n. Essentially, one needs to check all possible sets of columns of H for linear dependence. The required number of operations is exponential in n.
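For small parameters one can sidestep the column search and simply take the minimum weight over all non-zero vectors in the kernel of H^T; this is still exponential in n, but easy to write down as a sketch (Python, q prime):

```python
from itertools import product

def min_distance_from_check(H, q):
    """d(C) = minimum weight of a non-zero v with v H^T = 0 (brute force)."""
    n = len(H[0])
    best = None
    for v in product(range(q), repeat=n):
        if any(v) and all(sum(vi * hi for vi, hi in zip(v, row)) % q == 0
                          for row in H):
            w = sum(x != 0 for x in v)
            best = w if best is None else min(best, w)
    return best

H = [[1, 1, 1, 0],
     [1, 2, 0, 1]]
print(min_distance_from_check(H, 3))  # 3, as in the worked example
```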
The projective space P^{n−1}(F_q)

Definition (line, representative vector, projective space)

A line is a 1-dimensional subspace of the vector space F_q^n. A representative vector of a line is a non-zero vector u from that line. The line is then given by {λu | λ ∈ F_q}. The projective space P^{n−1}(F_q) is the set of all lines in F_q^n.

Remark: the terminology comes from euclidean geometry: in the euclidean plane, the set of all vectors proportional to a given non-zero vector is a straight line through the origin. Moreover, projective spaces over the field R of real numbers are well-studied geometric objects. For example, P^1(R), the set of all lines through the origin in the euclidean plane, can be thought of as the unit circle with antipodes identified. We are working over a finite field F_q, where these notions are less intuitive.
Definition (Hamming codes)

Let s ≥ 2 be given. We let Ham(s, q) denote an F_q-linear code whose check matrix has columns which are representatives of the lines in P^{s−1}(F_q), exactly one representative vector from each line.

Remark

Ham(s, q) is defined up to linear equivalence:
• we can multiply a column by a non-zero scalar to get another representative of the same line;
• we can put the columns in any order.

This means that Ham(s, q) is not just one code but a class of linearly equivalent codes. We will therefore say a Ham(s, q) code to mean any of the linearly equivalent codes. For convenience, we may make the last s columns the identity columns, to get a check matrix H in standard form.
Example: Ham(3, 2)

To construct a parity check matrix for Ham(3, 2), we need to take one non-zero column from each line in F_2^3. Note that for binary vectors, a line {λu | λ ∈ F_2} consists of only two points, 0 and u. Therefore, P^{s−1}(F_2) is the same as the set of all non-zero binary column vectors of size s.

We start filling in the parity check matrix by putting the identity columns at the end. We do this to obtain a parity check matrix in standard form. We can find a total of 7 non-zero binary vectors of size 3:

    H = [ 1 1 1 0 1 0 0 ]
        [ 1 1 0 1 0 1 0 ]
        [ 1 0 1 1 0 0 1 ]

From this parity check matrix in standard form, we can reconstruct the generator matrix:

    G = [ 1 0 0 0 1 1 1 ]
        [ 0 1 0 0 1 1 0 ]
        [ 0 0 1 0 1 0 1 ]
        [ 0 0 0 1 0 1 1 ]

This is, up to linear equivalence, the generator matrix of the original [7, 4, 3]_2 code invented by Hamming.
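The construction can be mirrored computationally; a sketch (Python), using the same column order as the check matrix above:

```python
from itertools import product

# Check matrix of Ham(3,2): one representative of each line in P^2(F_2),
# i.e. all non-zero binary columns of height 3, identity columns last.
cols = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
        (1, 0, 0), (0, 1, 0), (0, 0, 1)]
H = [[c[i] for c in cols] for i in range(3)]

# Codewords: all v in F_2^7 with v H^T = 0; there should be 2^4 = 16 of them.
code = [v for v in product(range(2), repeat=7)
        if all(sum(vi * hi for vi, hi in zip(v, row)) % 2 == 0 for row in H)]
print(len(code))                              # 16
print(min(sum(v) for v in code if any(v)))    # minimum distance 3
```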
A historical remark

Despite their name, the q-ary Hamming codes were not invented by Hamming. Richard Hamming told Claude Shannon (with whom he shared an office at Bell Labs) about his binary [7, 4, 3]-code, and Shannon mentioned it in his paper of 1948. That paper was read by Marcel J. E. Golay (1902–1989), a Swiss-born American mathematician and electronics engineer, who then suggested the Ham(s, q) construction in his paper published in 1949. Golay went further and constructed two perfect codes which are not Hamming codes. He asked whether there are any more perfect codes.

We will see the Golay codes, and will learn the answer to Golay's question about perfect codes, later in the course.
Synopsis. By calculating the parameters of a Ham(s, q) code explicitly, we prove that Hamming codes are a family of perfect codes with minimum distance 3. We show that syndrome decoding works for Hamming codes in an especially simple way.

We considered an example of a Ham(3, 2) code, which by looking at its generator matrix turns out to be a [7, 4, d]_2 code. It is not difficult to see that d = 3, see Question 10 on the example sheets. We know that any [7, 4, 3]_2-code is perfect, see Question 5 on the example sheets. We will now generalise this and show that all Hamming codes are perfect.
Theorem 5.2 (properties of Hamming codes)

Ham(s, q) is a perfect [n, k, d]_q code where

    n = (q^s − 1)/(q − 1),  k = n − s,  d = 3.

Proof. The length n of the code is equal to the number of columns in the check matrix, which is |P^{s−1}(F_q)|, the number of lines in F_q^s.

Observe that two lines intersect only at one point, namely 0. The set F_q^s \ {0} is therefore a disjoint union of lines. Each line {λu : λ ∈ F_q} contains q − 1 non-zero points. So the number of lines in F_q^s can be found as

    |F_q^s \ {0}| / (q − 1) = (q^s − 1)/(q − 1).

Note that k = dim Ham(s, q) = n − s because, by construction, the check matrix H has s rows.

To find the minimum distance, we use the Distance Theorem for linear codes. Note that any two columns of H are linearly independent because they are from different lines in F_q^s. (Two vectors can be linearly dependent only if they are proportional to each other, i.e., belong to the same line.) Therefore, d ≥ 3.

On the other hand, H has columns (a, 0, 0, ..., 0)^T, (0, b, 0, ..., 0)^T and (c, c, 0, ..., 0)^T, from three different lines, which are linearly dependent:

    a^{−1} (a, 0, ..., 0)^T + b^{−1} (0, b, ..., 0)^T − c^{−1} (c, c, ..., 0)^T = 0.

So d = 3 by the Distance Theorem.

It remains to show that Ham(s, q) is perfect. We compute the Hamming bound: put t = [(d − 1)/2] = [2/2] = 1 and calculate

    q^n / Σ_{i=0}^{t} (n choose i)(q − 1)^i = q^n / ( (n choose 0) + (n choose 1)(q − 1) ),

where (n choose 0) + (n choose 1)(q − 1) = 1 + n(q − 1) = 1 + (q^s − 1)/(q − 1) · (q − 1) = q^s. So the Hamming bound equals q^{n−s} = q^k = M (the number of codewords). Thus, Ham(s, q) is perfect.
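The equality case of the Hamming bound is easy to check numerically; a sketch (Python, using math.comb for binomial coefficients):

```python
from math import comb

def is_perfect(n, k, d, q):
    """True iff q^k Hamming balls of radius t = (d-1)//2 exactly fill F_q^n."""
    t = (d - 1) // 2
    ball = sum(comb(n, i) * (q - 1) ** i for i in range(t + 1))
    return q ** k * ball == q ** n

# Ham(s, q) parameters: n = (q^s - 1)/(q - 1), k = n - s, d = 3
for s, q in [(3, 2), (2, 3), (4, 2)]:
    n = (q ** s - 1) // (q - 1)
    assert is_perfect(n, n - s, 3, q)
print("all Hamming codes tested are perfect")
```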
Remark: n = (q^s − 1)/(q − 1) = q^{s−1} + q^{s−2} + ... + q + 1. The right-hand side is obviously an integer.
Syndrome decoding for Ham(s, q)

By Question 13 on the example sheets, the coset leaders of Ham(s, q) are the vector 0 and all the vectors of weight 1.

Syndromes of coset leaders: let a = (0, ..., 0, λ, 0, ..., 0) be a coset leader, where λ is in position i. Then the syndrome of a is S(a) = aH^T = λ [ the ith column of H ]^T.

Reminder about syndrome decoding: DECODE(y) = y − a, where a is a coset leader with S(a) = S(y). Hence we obtain the following decoding procedure for a Hamming code:

• Let a Hamming code be given by its check matrix H.
• Suppose that a vector y is received.
• Calculate S(y) = yH^T.
• Find a scalar λ and a column of H, say the ith column, such that S(y) is λ times that column, transposed.
• Subtract λ from the ith position in y. The result is the codeword DECODE(y).

See the answer to Question 17 on the example sheets for an example.
6. Cyclic codes

Cyclic codes are an interesting class of linear codes discovered in 1957.

Definition (cyclic code)

A cyclic code in F_q^n is a linear code C such that

    (a_0, a_1, ..., a_{n−1}) ∈ C ⟹ (a_{n−1}, a_0, ..., a_{n−2}) ∈ C.

Remark: the vector (a_{n−1}, a_0, ..., a_{n−2}) is called the cyclic shift of (a_0, ..., a_{n−1}).
We can iterate the cyclic shift, so if (a_0, a_1, ..., a_{n−1}) is a codeword of a cyclic code C, then (a_{n−2}, a_{n−1}, a_0, ..., a_{n−3}) is a codeword, ..., (a_1, ..., a_{n−1}, a_0) is a codeword.
To study cyclic codes, we need certain algebraic methods which go beyond basic linear algebra. We need to work with commutative rings.

Definition (commutative ring)

A commutative ring is a set R equipped with two binary operations, + (addition) and · (multiplication), which satisfy the eight standard axioms.

Remark: for reference, the axioms of a commutative ring are as follows. For all a, b, c ∈ R (writing ab for a · b):
1. (a + b) + c = a + (b + c);
2. a + b = b + a;
3. ∃0 ∈ R: a + 0 = a;
4. ∀a ∈ R ∃(−a) ∈ R: a + (−a) = 0;
5. (ab)c = a(bc);
6. ab = ba;
7. ∃1 ∈ R, 1 ≠ 0: a · 1 = a;
8. a(b + c) = ab + ac.

Note that the axiom ∀a ∈ R \ {0} ∃a^{−1} ∈ R: aa^{−1} = 1 is not part of the axioms of a commutative ring. Commutative rings which satisfy this additional axiom (the inverse axiom) are fields; see the examples class in Week 1.

Note that the word commutative refers to the commutativity of multiplication, ab = ba.
Example: the ring of polynomials

We will consider F_q[x], the ring of polynomials with coefficients in the field F_q. As a set,

    F_q[x] = {a_0 + a_1 x + ... + a_n x^n | n ≥ 0, a_0, a_1, ..., a_n ∈ F_q},

with addition and multiplication defined for polynomials in the standard way. Note that F_q[x] is an infinite ring, even though F_q is a finite field.

If f = a_0 + a_1 x + ... + a_n x^n is a polynomial with a_n ≠ 0, we say that the degree of f is n and write deg f = n. Every non-zero polynomial has a degree.
Synopsis. We would like to study cyclic codes using an algebraic structure which is richer than just a vector space. An appropriate structure is a ring. But the ring F_q[x] of polynomials is too big (infinite). New, smaller rings can be produced using the quotient ring construction. In particular, cyclic codes coincide with ideals of the ring R_n = F_q[x]/(x^n − 1).
We are going to define quotient rings. An essential ingredient of that construction is an ideal.

Definition (ideal)

An ideal of a commutative ring R is a subset I ⊆ R satisfying three conditions:
(1) 0 ∈ I;  (2) x, y ∈ I ⟹ x + y ∈ I;  (3) r ∈ R, x ∈ I ⟹ rx ∈ I.

We refer to (2) and (3) by saying that an ideal is closed under addition and is also closed under multiplication by arbitrary elements of the ring.

Remark: if I is an ideal, r ∈ R, x ∈ I, then xr ∈ I because xr = rx.

Examples: in an arbitrary commutative ring R, the subsets {0} and R are ideals. (But these ideals are not interesting.) A proper ideal is an ideal I ⊆ R such that I ≠ R.
Definition (quotient ring)

Let I be a proper ideal of a commutative ring R. We define a new ring R/I called the quotient ring of R modulo I.

Elements of R/I are cosets r + I, where r ∈ R. We denote the coset r + I by r̄. The addition and multiplication are introduced on R/I by

    r̄ + s̄ = (r + s)¯,  r̄ · s̄ = (rs)¯.

One can check that these two operations are well defined, i.e., that their result does not change if, instead of r, one chooses a different representative of the same coset r̄. One has to remember that

    r̄ = r̄_1  iff  r − r_1 ∈ I.

So one could verify, using the definition of an ideal, that if r̄ = r̄_1 then r̄ + s̄ = r̄_1 + s̄ and r̄ s̄ = r̄_1 s̄. We skip this verification.
The ring R_n

We would like to construct an explicit example of a quotient ring. Let f(x) ∈ F_q[x] be a polynomial. Denote

    (f) = {f(x)g(x) | g(x) ∈ F_q[x]}.

Observe that (f) is an ideal of F_q[x] (the three conditions that define an ideal are easy to check). The ideal (f) is called the ideal generated by f.

We make the situation even more explicit. Put f(x) = x^n − 1 and define R_n to be the quotient ring F_q[x]/(x^n − 1). The following result describes R_n explicitly as a finite-dimensional vector space over F_q.
Proposition 6.1

R_n is a vector space over the field F_q with basis {1̄, x̄, x̄^2, ..., x̄^{n−1}}. (In particular, R_n is a finite set with q^n elements.)

Proof. Elements of R_n are of the form ḡ where g is a polynomial in F_q[x]. That is, an element of R_n can be written as ḡ = a_0 1̄ + a_1 x̄ + ... + a_N x̄^N. It is clear that R_n is a vector space (the operations of addition and multiplication by a scalar come from F_q[x]).

To prove that {1̄, x̄, ..., x̄^{n−1}} is a basis, we must prove that this set spans R_n and is linearly independent.

Spanning set: note that x̄^n = (x^n − 1)¯ + 1̄ = 0̄ + 1̄ = 1̄. (We use the fact that if R/I is a quotient ring and r ∈ I, then r̄ = 0̄ in R/I: every element of the ideal I becomes zero when we pass to the quotient ring R/I. Of course, x^n − 1 belongs to the ideal (x^n − 1).) Then x̄^{n+1} = x̄^n x̄ = 1̄ x̄ = x̄, and, more generally, if k ≥ 0 and 0 ≤ r ≤ n − 1, then x̄^{kn+r} = x̄^r. This shows that every monomial is equal, in the ring R_n, to one of the monomials 1̄, x̄, ..., x̄^{n−1}. Every element of R_n is a linear combination of monomials, hence is equal to a linear combination of 1̄, x̄, ..., x̄^{n−1}. We have proved that these n monomials form a spanning set.

Linear independence: assume that b_0 + b_1 x̄ + ... + b_{n−1} x̄^{n−1} = 0̄ in R_n. This means that the polynomial b_0 + b_1 x + ... + b_{n−1} x^{n−1} ∈ F_q[x] belongs to the ideal (x^n − 1). That is, b_0 + b_1 x + ... + b_{n−1} x^{n−1} is of the form (x^n − 1)g(x) for some g(x) ∈ F_q[x]. But the degree of b_0 + b_1 x + ... + b_{n−1} x^{n−1} is less than n (the degree of x^n − 1), so this is only possible if b_0 + b_1 x + ... + b_{n−1} x^{n−1} is the zero polynomial. Thus, b_0 = b_1 = ... = b_{n−1} = 0. Linear independence is proved.
Identification of F_q^n with R_n

We will identify F_q^n with R_n = F_q[x]/(x^n − 1):

    a = (a_0, a_1, ..., a_{n−1})  ⟷  ā = a_0 + a_1 x̄ + ... + a_{n−1} x̄^{n−1}.

We will usually omit the bar over x when writing elements of R_n. Under this identification,

    100...00 ⟷ 1 ∈ R_n,  010...00 ⟷ x ∈ R_n,  etc.
How is the multiplication in the quotient ring R_n defined?

The real benefit of identifying F_q^n with the ring R_n is the multiplication operation which arises on F_q^n. Let us see how multiplication by x works:

    x(a_0 + a_1 x + ... + a_{n−1} x^{n−1}) = a_0 x + ... + a_{n−2} x^{n−1} + a_{n−1} x^n
                                           = a_{n−1} + a_0 x + ... + a_{n−2} x^{n−1}

(using x^n − 1 = 0, i.e., x^n = 1, in R_n). That is, multiplication by x is the same as the cyclic shift.
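On coefficient vectors this is a one-line operation; a sketch (Python, coefficient tuples in the order a_0, ..., a_{n-1}):

```python
def mult_by_x(a):
    """Multiply a_0 + a_1 x + ... + a_{n-1} x^{n-1} by x in R_n = F_q[x]/(x^n - 1).
    The wrap-around of x^n = 1 makes this exactly the cyclic shift."""
    return (a[-1],) + a[:-1]

a = (1, 0, 1, 1, 0)      # 1 + x^2 + x^3 in R_5
print(mult_by_x(a))      # (0, 1, 0, 1, 1), i.e. x + x^3 + x^4
```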
We come to the following key observation, which allows us to use ring theory to study cyclic codes:
Lemma 6.2 (Prange, 1957)

If vectors in F_q^n are identified with elements of R_n = F_q[x]/(x^n − 1) in the usual way, a cyclic code C ⊆ F_q^n becomes an ideal of R_n.

Proof. Let C ⊆ F_q^n be a cyclic code. Then, by definition, C is linear, so C contains zero and is closed under addition. Thus, conditions (1) and (2) from the definition of an ideal are fulfilled. It remains to check: r ∈ R_n, a ∈ C ⟹ ra ∈ C.

First of all, xa ∈ C because xa is the cyclic shift of a, and C is cyclic. Iterating this argument gives a, xa, x^2 a, ..., x^{n−1} a ∈ C. Finally, writing r ∈ R_n as b_0 + b_1 x + ... + b_{n−1} x^{n−1}, we conclude that ra = b_0 a + b_1 (xa) + ... + b_{n−1} (x^{n−1} a) ∈ C because C is linear.
Synopsis. We prove that every cyclic code C is an ideal generated by a unique monic polynomial called the generator polynomial. We also define a check polynomial of C. We can classify cyclic codes of length n by listing all monic divisors of the polynomial x^n − 1. We learn how to write a generator matrix of a cyclic code with a given generator polynomial.

To classify cyclic codes in F_q^n, we need to classify ideals of R_n. For example, given g ∈ R_n, the set (g) = gR_n = {gh : h ∈ R_n} is an ideal of R_n, called a principal ideal. We will now see that all ideals of R_n are principal.
Theorem 6.3 (structure of ideals of R_n)

If C is an ideal of R_n, then there exists a unique monic polynomial g ∈ F_q[x] of minimum degree such that C = gR_n. This polynomial g is a divisor of x^n − 1 in F_q[x]. (Monic means: the coefficient of the highest power of x in g is 1.)

Proof. Let g(x) ∈ F_q[x] be a polynomial of lowest degree with the property that ḡ ∈ C. Make g(x) monic by dividing it by its leading coefficient.

If f(x) ∈ F_q[x] is such that f̄ ∈ C, the polynomial division algorithm allows us to write f(x) = g(x)q(x) + r(x), where deg r(x) < deg g(x) or r(x) = 0. Then r̄ = f̄ − ḡq̄ ∈ C, as C is an ideal and f̄, ḡ ∈ C. Then, by the choice of g(x), it is impossible that deg r(x) < deg g(x). So r(x) = 0, f(x) = g(x)q(x) and f̄ = ḡq̄. This proves C ⊆ gR_n. But gR_n ⊆ CR_n ⊆ C as C is an ideal. So gR_n = C.

Uniqueness: another monic polynomial h of the same minimal degree with this property must be divisible by g, so, given that h and g are monic and of the same degree, h = g.

Finally, putting f(x) = x^n − 1, so that f̄ = 0̄ ∈ C, we observe as above that x^n − 1 = g(x)q(x). So g(x) divides x^n − 1.
Definition (generator polynomial, check polynomial)

If C ⊆ F_q^n is a cyclic code (i.e., an ideal of R_n), the polynomial g(x) ∈ F_q[x] given by Theorem 6.3 is called the generator polynomial of C. The polynomial h(x) such that g(x)h(x) = x^n − 1 is called the check polynomial of C. If deg g = r, then deg h = n − r, and h is monic.
The next result shows how to find a generator matrix and a check matrix of a cyclic code.

Theorem 6.4 (generator matrix and check matrix of a cyclic code)

Let C ⊆ F_q^n be a cyclic code with generator polynomial g(x) = g_0 + g_1 x + ... + g_{r−1} x^{r−1} + g_r x^r and check polynomial h(x) = h_0 + h_1 x + ... + h_k x^k. We note that k = n − r and that g_r = h_k = 1 (the polynomials are monic). Then

    G = [ g_0 g_1 ... g_{r−1} 1                     ]
        [     g_0 g_1 ... g_{r−1} 1                 ]
        [              ...             ...          ]
        [             g_0 g_1 ...  g_{r−1} 1        ]   (k rows; blank entries are 0),

    H = [ 1 h_{k−1} ... h_1 h_0                     ]
        [     1 h_{k−1} ... h_1 h_0                 ]
        [              ...             ...          ]
        [             1 h_{k−1} ... h_1 h_0         ]   (r rows; blank entries are 0)

are a generator matrix and a check matrix for C, respectively.

Comment: the matrix G has n columns and n − r = k rows. The first row of G is the codeword corresponding to the polynomial g(x). The rest of the rows are cyclic shifts of the first row. The matrix H has n columns and r rows. The first row of H contains the coefficients of h(x), but in reverse order. The rest of the rows are cyclic shifts of the first row.

These are not the only generator matrix and check matrix for C. As we know, a generator matrix is not unique. Moreover, these matrices are not usually in standard form. Note that a generator polynomial of C is unique.
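Building G from g(x) is mechanical enough to sketch (Python; `generator_matrix` is a name chosen here, and coefficients are listed lowest degree first):

```python
def generator_matrix(g, n):
    """Generator matrix of the cyclic code of length n with generator
    polynomial g = [g_0, ..., g_r] (monic), as in Theorem 6.4:
    row i is the coefficient vector of x^i * g(x), a shift of the first row."""
    r = len(g) - 1
    k = n - r
    return [[0] * i + list(g) + [0] * (n - r - 1 - i) for i in range(k)]

print(generator_matrix([1, 1], 3))      # g = 1 + x
print(generator_matrix([1, 1, 1], 3))   # g = 1 + x + x^2
```

Applied to the factors of x^3 − 1 over F_2, this reproduces the matrices that appear in the classification example below.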
To prove the Theorem, we will need the following lemma.

Lemma 6.5. In the above notation, C = {c̄ ∈ R_n : c̄h̄ = 0̄}.

The proofs of Theorem 6.4 and Lemma 6.5 will be given in the next lecture. We will now use Theorem 6.4 in our classification of cyclic binary codes of length 3.
Example

Classify the binary cyclic codes of length 3.

Solution. We need to determine all the possible generator polynomials, i.e., the monic factors of x^3 − 1 in F_2[x]. Observe:

    x^3 − 1 = (x − 1)(x^2 + x + 1).

The polynomial x − 1 = x + 1 is irreducible (i.e., cannot be written as a product of polynomials of lower degree), because it is of degree 1.

Can we factorise the polynomial x^2 + x + 1 in F_2[x]? If we could, we would have a factorisation (x + a)(x + b). But then ab = 1, which means a = b = 1 in F_2. Note that (x + 1)^2 = x^2 + 1 ≠ x^2 + x + 1 in F_2[x]. We have shown that x^2 + x + 1 is irreducible in F_2[x].

So the possible monic factors of x^3 − 1 in F_2[x] are:

    1;  1 + x;  1 + x + x^2;  1 + x^3.

We can now list all the cyclic codes in F_2^3 as ideals of R_3 generated by each of the above polynomials. For each code we can give a generator matrix G.

• g(x) = 1,

      G = [ 1 0 0 ]
          [ 0 1 0 ]
          [ 0 0 1 ],

  which corresponds to the trivial binary code of length 3: {000, 100, 010, 001, 110, 101, 011, 111}.

• g(x) = 1 + x,

      G = [ 1 1 0 ]
          [ 0 1 1 ].

  This is {000, 110, 011, 101}, the binary even weight code of length 3.

• g(x) = 1 + x + x^2, G = [ 1 1 1 ]. This is {000, 111}, the binary repetition code of length 3.

• g(x) = 1 + x^3. Theorem 6.4 forces the matrix G to have 3 − 3 = 0 rows, G = [ ]. In this case, the generator matrix does not make much sense. We can look at the dimension of the code, which by the Theorem equals 3 − 3 = 0. The only code of dimension 0 is the zero code, {000}. It is a useless code, but formally it is a linear and cyclic code, so we have to allow it for reasons of consistency.

End of example.
Synopsis. We conclude the section on cyclic codes by proving the theorem about generator and check matrices for a cyclic code. We define two Golay codes and give a complete classification of perfect codes up to parameter equivalence.

Proof of Lemma 6.5.

    C = {ḡā | a(x) ∈ F_q[x]}
      = {c̄ | ∃a(x) ∈ F_q[x] : c(x) = g(x)a(x)}
      = {c̄ | ∃a(x) ∈ F_q[x] : c(x)h(x) = g(x)h(x)a(x)}
      = {c̄ | ∃a(x) ∈ F_q[x] : c(x)h(x) = a(x)(x^n − 1)}
      = {c̄ | c(x)h(x) ∈ (x^n − 1)}
      = {c̄ | c̄h̄ = 0̄ in R_n}.
Proof of Theorem 6.4. The rows of the matrix G (respectively, H) are linearly independent, because each matrix is in row echelon form. Hence G and H are generator matrices of linear codes, say C_G and C_H, of dimension k and r, respectively.

Note that C_G ⊆ C: indeed, the first row of G is g, which is in C; the other rows are cyclic shifts of g, so they are also in C as C is cyclic. Hence dim C ≥ k.

Let c̄ = c_0 + ... + c_{n−1} x^{n−1} ∈ C. Write h = h_0 + ... + h_{k−1} x^{k−1} + x^k. By Lemma 6.5, c̄h̄ = 0̄ in R_n. When we expand 0 = c̄h̄, remembering that x^n = 1 in R_n, we get

    0 = c_0 h_0 + (c_0 h_1 + c_1 h_0) x + ... + (c_0 h_{n−1} + ... + c_{n−1} h_0) x^{n−1}
        + (c_1 h_{n−1} + ... + c_{n−1} h_1) · 1 + (c_2 h_{n−1} + ... + c_{n−1} h_2) x + ... + (...) x^{k−1}.

Note that 1, x, ..., x^{n−1} are a basis of R_n, and the coefficient of x^k is 0 = c_0 h_k + ... + c_k h_0. This is the inner product c · (h_k, h_{k−1}, ..., h_0, 0, ..., 0).

Because c ∈ C was arbitrary, we have proved that the first row of H lies in C^⊥. But C^⊥ is cyclic (Question 24(ii) on the example sheets), so C^⊥ contains all rows of H, because they are cyclic shifts of the first row. Hence dim C^⊥ ≥ r.

But then n = k + r ≥ dim C + dim C^⊥ = n. So dim C = k and C = C_G. Also, dim C^⊥ = r and C^⊥ = C_H.
Remark

Recall that:
• the only way to specify a general non-linear code in F_q^n is to list all the codewords, which consist of a total of q^k · n symbols;
• a linear code can be specified by a generator matrix, which has k · n entries;
• a cyclic code can be specified in an even more compact way, by giving its generator polynomial, which corresponds to a single codeword! We only need to specify n − k coefficients of the generator polynomial (its degree is n − k and its leading coefficient is 1).

What do we use check matrices for? For example, to find the minimum distance of a linear code.

Strategy for searching for interesting/perfect/etc. codes: look for divisors of x^n − 1 and hope that the cyclic codes they generate have a large minimum distance.
Example: two new perfect codes, the Golay codes

The following two codes were found by Marcel Golay in 1949. They are known as the binary Golay code and the ternary Golay code, respectively.

In F_2[x], x^23 − 1 = g(x)h(x), where g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1 and h(x) = x^12 + x^11 + x^10 + x^9 + x^8 + x^5 + x^2 + 1. (Exercise: check this!)
The cyclic code generated by g is of length 23 and dimension 12. It has minimum distance 7 and corrects 3 errors. It is a perfect binary code, denoted G_23.
Trivia
The code G_23 was used by the Voyager 1 and 2 spacecraft (NASA, Jupiter and Saturn, 1979-81).^1
The ternary Golay code

In F_3[x], x^11 − 1 = g(x)h(x), where g(x) = x^5 + x^4 + 2x^3 + x^2 + 2 and h(x) = x^6 + 2x^5 + 2x^4 + 2x^3 + x^2 + 1.

The ternary Golay code G_11 is the cyclic code of length 11 generated by g. It is an [11, 6, 5]_3-code. It is perfect.
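Both "check this!" factorizations above can be verified by machine. The following sketch (helper name is mine; polynomials are coefficient lists with the constant term first) multiplies two polynomials with coefficients reduced mod p and compares the product with x^n − 1.

```python
def polymul_mod(a, b, p):
    """Product of the polynomials a, b (coefficient lists, constant
    term first), with coefficients reduced mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out

# Binary case: g(x)h(x) = x^23 - 1 = x^23 + 1 in F_2[x].
g2 = [1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1]     # x^11+x^10+x^6+x^5+x^4+x^2+1
h2 = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1]  # x^12+x^11+x^10+x^9+x^8+x^5+x^2+1
assert polymul_mod(g2, h2, 2) == [1] + [0] * 22 + [1]

# Ternary case: g(x)h(x) = x^11 - 1 = x^11 + 2 in F_3[x].
g3 = [2, 0, 1, 2, 1, 1]                       # x^5+x^4+2x^3+x^2+2
h3 = [1, 0, 1, 2, 2, 2, 1]                    # x^6+2x^5+2x^4+2x^3+x^2+1
assert polymul_mod(g3, h3, 3) == [2] + [0] * 10 + [1]
```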
Remark: Golay found his two perfect codes in 1949, before cyclic codes were discovered. He defined the codes G_23 and G_11 by writing down their check matrices.

Amazingly, Golay's 1949 paper, in which he constructs all the Hamming codes and the two Golay codes, is barely half a page long. A copy of the paper is displayed on the course website.

Now we can state the classification theorem for perfect codes. It was proved in 1973, more than twenty years after Golay gave a conjecturally complete list of perfect codes. We will not give its proof here, but you should learn the statement of the theorem:
Theorem 6.6.
A perfect [n, k, d]_q-code is parameter equivalent to one of the following:

• a trivial code: n arbitrary, k = n, d = 1, q arbitrary;

• a binary repetition code of odd length: n odd, k = 1, d = n, q = 2;

• a Hamming code Ham(s, q): n = (q^s − 1)/(q − 1), k = n − s, d = 3, q arbitrary;

• the Golay code G_23, which is a [23, 12, 7]_2-code;

• the Golay code G_11, which is an [11, 6, 5]_3-code.
^1 In fact, the Golay code was used in a version extended to 24 bits by adding a parity check bit to each codeword, as shown in Question 9 on the example sheets. This increased the minimum distance to 8, thereby improving error detection (without affecting error correction). But the extended 24-bit code is no longer perfect.
7. Reed-Muller codes

Synopsis. The minimum distance of a perfect code cannot exceed 7 unless the code is a repetition code. This is disappointingly low. In this final section of the course, we construct Reed-Muller codes, a family of codes with large minimum distance. Unfortunately, they are not perfect. First, we need to cover the necessary material on the Boolean algebra.

The first notion we will need to define Reed-Muller codes is that of a Boolean function. We fix m ≥ 1.
Definition (Boolean functions)
Denote by V_m the set of all binary words of length m. (It is the same as F_2^m, but viewed without any vector space structure.) A Boolean function is a (set-theoretic) function f: V_m → F_2.
Remark (the number of Boolean functions)
The total number of Boolean functions on V_m is |F_2|^{|V_m|} = 2^{2^m}.
Remark (Boolean functions as rows of a truth table)
One has certainly met Boolean functions when constructing truth tables for statements in basic logic. To give an illustration, let m = 3. Consider statements which involve variables x_1, x_2, x_3, each of which can take the values 0 (FALSE) or 1 (TRUE).

We will represent a logical statement by a row (not a column) in a truth table. (We use rows because it is common in Coding Theory to think of codewords as row vectors; and we will see that in Reed-Muller codes, the role of codewords is played by certain Boolean functions.) In our example (m = 3), the table will have 8 columns:

x_1                      0 1 0 1 0 1 0 1
x_2                      0 0 1 1 0 0 1 1
x_3                      0 0 0 0 1 1 1 1
(x_1 AND x_2) ⇒ x_3      1 1 1 0 1 1 1 1
0                        0 0 0 0 0 0 0 0
1                        1 1 1 1 1 1 1 1
v_2v_3                   0 0 0 0 0 0 1 1
In this table, (x_1 AND x_2) ⇒ x_3 is a statement whose truth value depends on the values of x_1, x_2 and x_3. Therefore, it can be viewed as a Boolean function: its value at the binary word 000 is 1, at the word 100 the value is 1, and so on. The only binary word where this function takes the value 0 is the word 110: indeed, if x_1 and x_2 are TRUE, then x_1 AND x_2 is TRUE, but x_3 is FALSE, and the value of the implication TRUE ⇒ FALSE is FALSE.

(The other rows in the table will be explained below.)
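The implication row of the table can be reproduced by a few lines of code (a sketch with names of my own; the columns are listed in the standard ordering of binary words, which is defined formally later in these notes).

```python
m = 3
# Binary words of length m, listed so that the word x1 x2 x3 gets the
# number x1 + 2*x2 + 4*x3 (the standard ordering used in these notes).
words = [tuple((i >> j) & 1 for j in range(m)) for i in range(2 ** m)]

def implies(a, b):
    """Truth value of the implication a => b."""
    return 1 if (a == 0 or b == 1) else 0

row = [implies(x[0] * x[1], x[2]) for x in words]   # (x1 AND x2) => x3
print(row)  # [1, 1, 1, 0, 1, 1, 1, 1], as in the table
```

The single 0 sits at position 4, i.e. at the word 110, exactly as observed above.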
The Boolean algebra

Because Boolean functions take values in F_2 = {0, 1}, which is a field, Boolean functions can be added and multiplied pointwise: if f, g: V_m → F_2, one has the functions

f + g, fg: V_m → F_2;   (f + g)(x) = f(x) + g(x),   (fg)(x) = f(x)g(x),   x ∈ V_m.

Also, there are the constant functions 0 and 1. (They are shown in the 2nd, respectively 3rd, row of the truth table above.) The Boolean function 1 is often called the tautological truth. We conclude that Boolean functions form a commutative ring.

The traditional logical operations can be written in terms of the ring operations + and ·. Clearly, multiplication is the same as AND:

fg = f AND g.

Addition obeys the rule 0 + 0 = 0, 0 + 1 = 1 + 0 = 1, 1 + 1 = 0. The logical operation which corresponds to addition is called the exclusive OR:

f + g = f XOR g = (f OR g) AND NOT(f AND g).

The ring of Boolean functions on V_m is also a vector space over F_2. The ring/vector space of Boolean functions on V_m is called the Boolean algebra on V_m.
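As a sanity check, one can represent Boolean functions by their value vectors and verify the identities fg = f AND g and f + g = f XOR g pointwise. A sketch (the helper names are mine, not the notes'):

```python
def f_add(f, g):   # pointwise sum in F_2
    return tuple((a + b) % 2 for a, b in zip(f, g))

def f_mul(f, g):   # pointwise product in F_2
    return tuple(a * b for a, b in zip(f, g))

def XOR(a, b):     # (a OR b) AND NOT (a AND b)
    return int((a or b) and not (a and b))

f = (0, 1, 0, 1)   # value vectors of two Boolean functions on V_2
g = (0, 0, 1, 1)
assert f_mul(f, g) == tuple(int(a and b) for a, b in zip(f, g))   # AND
assert f_add(f, g) == tuple(XOR(a, b) for a, b in zip(f, g))      # XOR
assert f_add(f, f) == (0, 0, 0, 0)   # characteristic 2: f + f = 0
```

The last assertion records the characteristic-2 fact that every Boolean function is its own additive inverse.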
Dimension of the Boolean algebra

Because the Boolean algebra on V_m has 2^{2^m} elements, it is of dimension 2^m over F_2.

We will now introduce two special kinds of elements of the Boolean algebra: coordinate functions and, more generally, monomial functions.
Definition (coordinate function)
The Boolean function v_i: V_m → F_2 defined by v_i(x_1, x_2, ..., x_m) = x_i is called the ith coordinate function.
Definition (monomial function)
To each subset {i_1, ..., i_r} ⊆ {1, ..., m} there corresponds the monomial function (or monomial) v_{i_1}···v_{i_r}, of degree r.
Also, 1 is the monomial function corresponding to the set ∅, of degree 0.
Remark. Properties of monomials:

• Observation: because the values of any Boolean function are 0 and 1, one has v_i = v_i^2 = v_i^3 = .... This is the reason why there are no higher powers of the v_i in the definition of a monomial.

• There are 2^m monomials in the Boolean algebra on V_m (because there are 2^m subsets of {1, ..., m}).

• There are (m choose r) monomial functions of degree r.
Lemma 7.1. Monomials are a basis of the Boolean algebra.

Proof. There are 2^m monomials (which is exactly the dimension of the Boolean algebra on V_m). Therefore, it is enough to prove only one thing: that the monomials are linearly independent, or that the monomials span the Boolean algebra. We choose to show that they span the Boolean algebra.

To each point x ∈ V_m there is an associated delta-function

δ_x: V_m → F_2,   δ_x(y) = 1 if y = x, and δ_x(y) = 0 if y ≠ x.
There are 2^m delta-functions on V_m. We note that the delta-functions span the Boolean algebra. Indeed, let f: V_m → F_2 be a Boolean function. We have

f = Σ_{x ∈ V_m} f(x) δ_x,

which is easily seen if one evaluates both sides on an arbitrary vector y ∈ V_m: the left-hand side yields f(y), while the right-hand side yields f(y) · 1.
It remains to show that each delta-function lies in the linear span of the monomials. Let x = (x_1, ..., x_m) be a binary word. Observe that

δ_x = ∏_{i : x_i = 1} v_i · ∏_{j : x_j = 0} (1 + v_j).

Indeed, the product on the right-hand side becomes 1 if it is evaluated at the word x, and 0 if it is evaluated at a word which differs from x in at least one bit. Now open the brackets and expand the product to see that δ_x is a sum of products of some of the v_i, i.e., a sum of several monomials.
Synopsis. We define the Reed-Muller codes R(r, m) as certain subspaces of the Boolean algebra on V_m. We calculate the parameters of R(r, m).

Definition (Reed-Muller code)
The rth order Reed-Muller code on V_m, denoted R(r, m), is the subspace of the Boolean algebra on V_m spanned by the monomials of degree at most r. (Here 0 ≤ r ≤ m.)
How to write codewords of R(r, m) as row vectors in F_2^{2^m}?

Order all binary words of length m as y_0, ..., y_{2^m−1}. The standard ordering is obtained by giving the word x_1x_2...x_m the number x_1 + x_2 · 2 + ... + x_m · 2^{m−1}. Thus, the binary words of length 3 appear in the following order: 000, 100, 010, 110, 001, 101, 011, 111.
Let f: V_m → F_2 be a Boolean function. To f we associate the value vector of f, which is the binary vector (f(y_0), ..., f(y_{2^m−1})) of length 2^m.

We can say that R(r, m) is spanned by the value vectors of all monomials of degree at most r in the Boolean algebra on V_m.
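For example, the following sketch (the function name is mine) lists the value vectors of the monomials of degree at most r, i.e. the rows of a generator matrix of R(r, m), using the standard ordering of binary words:

```python
from itertools import combinations

def rm_generator_rows(r, m):
    """Value vectors of all monomials of degree <= r on V_m,
    in the standard ordering of binary words."""
    words = [[(i >> j) & 1 for j in range(m)] for i in range(2 ** m)]
    return [[int(all(w[i] for i in s)) for w in words]
            for d in range(r + 1) for s in combinations(range(m), d)]

for row in rm_generator_rows(1, 3):   # R(1, 3), an [8, 4] binary code
    print(row)
# [1, 1, 1, 1, 1, 1, 1, 1]   monomial 1
# [0, 1, 0, 1, 0, 1, 0, 1]   v_1
# [0, 0, 1, 1, 0, 0, 1, 1]   v_2
# [0, 0, 0, 0, 1, 1, 1, 1]   v_3
```

The number of rows agrees with the dimension formula of Lemma 7.2 below.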
Lemma 7.2. R(r, m) has length 2^m and dimension (m choose 0) + (m choose 1) + ... + (m choose r).

Proof. The length is 2^m by construction; the dimension is the number of monomials of degree 0, 1, ..., r, which form a basis of R(r, m).
To find the minimum distance of R(r, m), we need the following auxiliary construction.

The bar product (the [u | u + v] construction)

If u, v ∈ F_q^n, the vector [u | v] ∈ F_q^{2n} is formed by u followed by v.
Definition (the bar product)
Let C_1, C_2 ⊆ F_q^n be two linear codes. The linear code

|C_1|C_2| = { [u | u + v] : u ∈ C_1, v ∈ C_2 }

of length 2n is called the bar product of C_1 and C_2.
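For small binary codes given as full lists of codewords, the construction is a one-liner (a sketch; the name is mine, and the arithmetic is over F_2 for simplicity):

```python
def bar_product(C1, C2):
    """All words [u | u + v], u in C1, v in C2, for binary codes
    given as lists of codewords."""
    return [u + [(a + b) % 2 for a, b in zip(u, v)] for u in C1 for v in C2]

C1 = [[0, 0], [1, 1]]                    # binary repetition code, d = 2
C2 = [[0, 0], [1, 0], [0, 1], [1, 1]]    # the whole of F_2^2, d = 1
C = bar_product(C1, C2)
assert len(C) == 8 and all(len(c) == 4 for c in C)
# Minimum weight of a non-zero codeword: here it is 1.
assert min(sum(c) for c in C if any(c)) == 1
```

The minimum weight found here equals min{2 · 2, 1} = 1, in line with the theorem on the bar product that follows.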
Theorem 7.3. d(|C_1|C_2|) = min{ 2d(C_1), d(C_2) }.

Proof. Let [u | u + v] be a non-zero codeword in |C_1|C_2|, where u ∈ C_1 and v ∈ C_2. Its weight is w(u) + w(u + v).

If v = 0, this weight is

2w(u) ≥ 2d(C_1) ≥ min{ 2d(C_1), d(C_2) }.

If v ≠ 0, we use the (in)equalities w(−x) = w(x) and w(x + y) ≤ w(x) + w(y), explained in the solution to Question 13 (see example sheets):

w(u) + w(u + v) = w(−u) + w(u + v) ≥ w(−u + u + v) = w(v) ≥ d(C_2) ≥ min{ 2d(C_1), d(C_2) }.

This shows that d(|C_1|C_2|) ≥ min{ 2d(C_1), d(C_2) }.
But the reverse inequality also holds: if u_i ∈ C_i is of weight d(C_i), i = 1, 2, then w([u_1 | u_1 + 0]) = 2d(C_1) and w([0 | 0 + u_2]) = d(C_2); hence the minimum weight of |C_1|C_2| is at most min{ 2d(C_1), d(C_2) }.
Proposition 7.4. R(r + 1, m + 1) = |R(r + 1, m)|R(r, m)|.

Explanation. A function f ∈ R(r + 1, m + 1) corresponds to a linear combination of monomials in v_1, ..., v_{m+1} of degree at most r + 1. Each monomial contains a copy of v_{m+1} or none. Hence

f(v_1, ..., v_{m+1}) = g(v_1, ..., v_m) + v_{m+1} h(v_1, ..., v_m),

where g ∈ R(r + 1, m) and h ∈ R(r, m). Observe that our numbering of the binary words of length m + 1 is such that the value vector of v_{m+1} is

v_{m+1} = [ 0 0 ... 0 | 1 1 ... 1 ] ∈ F_2^{2^{m+1}},

and that g has value vector [g | g] and v_{m+1}h has value vector [0 | h]. Thus, f has value vector [g | g + h], and any g, h as above yield f = g + v_{m+1}h ∈ R(r + 1, m + 1).
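The value-vector identity in this explanation can be checked exhaustively for m = 2 (a sketch; the names are mine): for every pair of value vectors g, h on V_2, the function g + v_3 h on V_3 has value vector [g | g + h].

```python
from itertools import product

m = 2
words3 = [tuple((i >> j) & 1 for j in range(m + 1)) for i in range(2 ** (m + 1))]
index2 = {tuple((i >> j) & 1 for j in range(m)): i for i in range(2 ** m)}

def value_vector(g, h):
    """Value vector of f(x1, x2, x3) = g(x1, x2) + x3 * h(x1, x2),
    in the standard ordering of binary words of length 3."""
    return [(g[index2[w[:m]]] + w[m] * h[index2[w[:m]]]) % 2 for w in words3]

for g in product((0, 1), repeat=4):
    for h in product((0, 1), repeat=4):
        expected = list(g) + [(a + b) % 2 for a, b in zip(g, h)]
        assert value_vector(g, h) == expected   # f has value vector [g | g+h]
```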
Theorem 7.5. d(R(r, m)) = 2^{m−r}.

Proof. If r = 0, R(0, m) is spanned by the value vector of the monomial 1. Hence it is the binary repetition code of length 2^m. So d = 2^m, as claimed.

If r = m, R(m, m) is the trivial code and has d = 1 = 2^0, as claimed.

Now we use induction on m. The case m = 1 is done (here either r = 0 or r = 1).

Inductive step: R(r + 1, m + 1) = |R(r + 1, m)|R(r, m)|, hence, by Theorem 7.3 and the inductive hypothesis, d(R(r + 1, m + 1)) = min{ 2 · 2^{m−r−1}, 2^{m−r} } = 2^{m−r}, as claimed.
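The theorem can be confirmed by brute force for small parameters (a sketch; the names are mine, and enumerating all codewords is only feasible for small m):

```python
from itertools import combinations, product

def rm_rows(r, m):
    """Value vectors of all monomials of degree <= r on V_m."""
    words = [[(i >> j) & 1 for j in range(m)] for i in range(2 ** m)]
    return [[int(all(w[i] for i in s)) for w in words]
            for d in range(r + 1) for s in combinations(range(m), d)]

def min_distance(rows):
    """Minimum weight of a non-zero F_2-linear combination of the rows
    (= the minimum distance, since the code is linear)."""
    n = len(rows[0])
    best = n
    for coeffs in product((0, 1), repeat=len(rows)):
        if any(coeffs):
            word = [sum(c * row[i] for c, row in zip(coeffs, rows)) % 2
                    for i in range(n)]
            best = min(best, sum(word))
    return best

for m in (1, 2, 3):
    for r in range(m + 1):
        assert min_distance(rm_rows(r, m)) == 2 ** (m - r)
assert min_distance(rm_rows(1, 4)) == 8   # R(1, 4) is a [16, 5, 8] code
```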
Trivia

The code R(1, 5) was used by NASA's Mariner 9 space probe to transmit greyscale images of the surface of Mars to Earth in 1972. It is a [32, 6, 16]_2-code. Each pixel was a 6-bit message, representing one of 64 grey values, and was encoded as a 32-bit codeword. The code corrected up to 7 errors in a codeword.