Sie sind auf Seite 1von 209

The Fast Fourier Transform

A Mathematical Perspective

Todd D. Mateer

Contents
Chapter 1. Introduction

Chapter 2. Polynomials
1. Basic operations
2. The Remainder Theorem and Synthetic Division
3. Modular reduction of a polynomial
4. Multipoint polynomial evaluation

17
17
19
21
22

Chapter 3. Complex numbers


1. Number systems
2. Complex arithmetic
3. Polar Representation of Complex Numbers
4. Primitive Roots of Unity
5. Eulers Formula
6. Rotation transformations

27
27
29
30
34
37
38

Chapter 4. FFT algorithms


1. The Binary Reversal Function
2. Classical radix-2 FFT
3. Operation count of the classical radix-2 algorithm
4. Twisted radix-2 FFT
5. Classical radix-4 FFT
6. Twisted radix-4 FFT
7. Radix-8 FFT algorithms
8. Split-radix FFT
9. Other 2-adic FFT algorithms
10. Properties of the DFT
11. FFTs of real data

45
45
47
53
58
63
68
70
76
80
82
87

Chapter 5. Inverse FFT algorithms


1. Lagrangian Interpolation
2. Classical radix-2 inverse FFT
3. Twisted radix-2 IFFT
4. Other 2-adic inverse FFT algorithms
5. The duality property

93
93
94
98
100
103

Chapter 6. Polynomial multiplication algorithms


1. Classical Multiplication
2. Karatsuba multiplication
3. FFT-based multiplication

107
107
110
114

CONTENTS

Chapter 7. The engineering perspective


1. The Discrete Fourier Series
2. A mathematicians perspective of the engineers FFT
3. The two types of engineering FFT algorithms
4. Convolution

117
117
124
126
132

Chapter 8. FFT algorithms for other input sizes


1. The Ternary Reversal Function
2. Classical radix-3 FFT
3. Twisted radix-3 FFT
4. Radix-5 FFTs
5. Radix-6 algorithms

139
139
141
148
155
158

Chapter 9. Additional topics


1. Additive Fast Fourier Transforms
2. Computer Algebra Algorithms
3. Truncated Fast Fourier Transform
4. Implementation of FFT Algorithms
5. Applications of the FFT
6. Concluding Remarks

163
163
170
173
176
177
177

Appendix A. The relationship between the Fourier Transform and the


Discrete Fourier Transform

179

Appendix B. Residue Rings


1. Background
2. Definitions
3. The set of residues is a ring
4. The residue ring (D\m, ,
(D/(m), , )
5. Examples
6. Concluding Remarks

193
193
194
195
) is isomorphic to the quotient ring
197
198
200

Appendix C. The convolution theorem

203

Appendix.

207

Bibliography

CHAPTER 1

Introduction
During the second half of the twentieth century, the Fast Fourier Transform
(FFT) has become one of the most important techniques in Electrical Engineering.
This statement is supported by the fact that over 2,000 papers have been published
on the topic since the 1960s [22] and that a list of over 75 applications of the
FFT are given in [7]. But what is the Fast Fourier Transform? This is not an
easy question to answer and the response you get depends upon who you ask.
After some introductory definitions, we will attempt to provide an answer to this
question. First, according to [DSP] a signal is defined as follows:
Signal.
A signal is any physical quantity that varies with time, space,
or any other independent variable or variables.

Those that have completed a high school mathematics curriculum may think that
the definition of signal closely matches the definition of a function. The only difference seems to be that a function is a mathematical concept, whereas a signal
represents something tangible in the real world. In this text, the terms signal
and function will be used interchangably.
Signals can be classified in many different ways. The first type of signal of
interest in this introductory section is:
Analog signal.
An analog signal is a function that is defined for all inputs in
a specified interval.

A signal is said to be continuous if it can be drawn on a piece of paper without


lifting ones pencil off of the paper. A signal is said to be bandlimited if all of the
nonzero outputs of the function are restricted to a finite interval of input values.

1. INTRODUCTION

EXAMPLE

Let xa (t) be the analog signal shown in the figure below. It consists of two triangles, each of which has a
width of 40 milliseconds. The signal is zero for all inputs less than 20 milliseconds and for all inputs more
than 140 milliseconds, so it is bandlimited. The signal can be drawn without lifting ones pencil off of the
paper, so it is continuous.
xa (t)
1
.5
0

t
80ms

160ms

.5
1


Digital signal.
A digital signal is a function that is only defined at certain
values of time (usually multiples of an integer) and is a function that
only has a finite set of possible values.

Given an analog function x(t), a digital signal x( ) can be constructed by


sampling x(t) at evenly-spaced inputs determined by the sampling interval.

EXAMPLE

We are going to construct the digital signal x( ) by


sampling xa (t) above over the input range 0 ms t <
160 ms using 16 samples. To do so, the required sampling interval is 160 milliseconds 16 = 10 milliseconds. The resulting function is given below.
x( )
1
.5
0

.5
1

1. INTRODUCTION

A true digital signal only has a finite number of possible outputs, so x(t) is
typically rounded to the nearest finite value in this case. In this text, we will relax
this assumption and work with discrete-time signals which sample an analog
signal at evenly-spaced intervals, but place no restriction on the allowable output
values.
Now that we have seen the difference between an analog and discrete-time
signal, we are ready to return to the goal of figuring out what is meant by the
phrase Fast Fourier Transform. First, we will introduce the Fourier Transform
which operates on analog signals. Next, we will proceed to the Discrete Fourier
Transform which operates on discrete-time signals. Finally, we will explain what a
Fast Fourier Transform is and how it relates to the Discrete Fourier Transform.
In a typical signals analysis course in an Electrical Engineering curriculum, the
Fourier transform of some analog signal f (t) is usually defined by the following
formula

F (s) =

f (t) e2tsI dt.

Here I is a symbol that represents 1, used with complex numbers, t is a real


variable, and f (t) is a function that may produce either a real or a complex number
output. For a person studying the FFT for the first time, this formula might be
somewhat scary as it involves both Calculus and complex numbers. But the reader
should take comfort in the fact the optional appendix is the only part of this book
that will require any knowledge of Calculus and a full chapter of the book will be
devoted to basic properties of complex numbers.
The Discrete Fourier Transform of some discrete-time signal fe( ) is usually
computed using either the formula
F (k) =

N
1
X
=0

or
F (k)

fe( ) e2k/N I

N 1
1 X e
f ( ) e2k/N I
N =0

depending on what book one picks up. The mathematical notation used in these
formulas means to add up the expression that follows the symbol at each value
of from 0 to N 1. For example, if N = 4, then
N
1
X
=0

0+1+2+3=6

1. INTRODUCTION

A question that is often asked at this point is how the Fourier transform and the
discrete Fourier transform are related. This is also not an easy question to answer
and requires a knowledge of Calculus. A fairly detailed discussion of this topic can
be found in the appendix. For now, let it suffice to say that fe( ) is a function that
samples f (t) uniformly at N locations. As N increases, fe( ) becomes a better and
better approximation to f (t) and the Discrete Fourier Transform becomes a better
and better approximation to the Fourier Transform. From this point forward, we
will use the simpler notation f ( ) to represent the sampled version of f (t).
To illustrate how the Fourier Transform and Discrete Fourier Transform relate
to one another, let us consider what is probably the most popular example used in
courses that cover the Fourier Transform. The so-called rectangle function is a
function that takes on a value of 1 in the interval 1/2 < t < 1/2 and is defined to
be zero elsewhere.1 A graph of the rectangle function is given by

One nice feature about this function is that it is symmetric about the vertical
axis. This means that the graph to the left of the vertical line above is the mirror
image of the graph to the right of the vertical line. When a function is symmetric
about the vertical axis, the Fourier Transform definition simplifies to

F (s)

2 f (t) cos(2t s) dt

which eliminates the complex variables in this case.


Earlier, it was mentioned that Calculus is not required to understand the material in this book. However, for the benefit of those readers that do have a background in Calculus, the Fourier transform of the rectangle function can be computed
as follows:

1At the values of t = 1/2 and t = 1/2 where the function suddenly jumps between 0 and
1, the rectangle function is usually defined to have a value of 1/2.

1. INTRODUCTION

F (s) =

1/2

=
=

2 1 cos(2 t s) dt

sin(2 t s)
s
sin(s)
s

1/2

t=0

which is defined for all values of s 6= 0. The sinc function is defined using the
above formula with the added condition that sinc(0) = 1.
Two perspectives of the sinc function are given below. The domain of this
function is all of the real numbers, so it is impossible to show a graph of the entire
function. Observe that the sinc function has a value of zero for all integer inputs
with the exception that the function has an output of one when the input is zero.
This is a desirable feature of the sinc function which is useful in engineering.
1

To compute the Discrete Fourier Transform of the rectangle function, we sample


the above function at N evenly spaced points over an interval of size T0 . To simplify
the notation, we will use the interval T0 /2 x < T0 /2. The parameter T0 can
be chosen to be whatever value that one wishes. Here, we will let T0 = 2. Other
values for T0 will be explored in the exercises. The number of samples N can be
chosen to be whatever value one wishes as well.
Because the rectangle function is symmetric about the vertical axis, the Discrete
Fourier Transform formula simplifies to:

F (k) =

1
N

NX
2 1

=N2

f ( ) cos(2 k/N )

which also eliminates the complex variables. The sample values of the rectangle
function can now be used to determine the DFT of the rectangle function for each
value of k in the range N/2 to N/2 1. 2
For example, when N = 4, then
2Typically, the range 0 to N 1 is used for k, but this alternative range was selected so that
the DFT outputs more closely match the graph of the sinc function given earlier.

1. INTRODUCTION

F (1) =

=
=
=

0 cos(2 1/4 (2)) + 1/2 cos(2 1/4 (1))


4
1 cos(2 1/4 (0)) + 1/2 cos(2 1/4 (1))
+
4
0 cos() + 1/2 cos(/2) + 1 cos(0) + 1/2 cos(/2)
4
0+0+1+0
4
1
4

Graphs of the Discrete Fourier Transforms of the rectangle function with 4, 8,


16, and 32 samples are given in the right column of the figures on the next page.

1. INTRODUCTION

Observe that as N increases, the results of the Discrete Fourier Transform look
more and more like the sinc function.
But we still have not really answered the question What is the Discrete Fourier
Transform?. To resolve this question, we turn to the 1807 publication by Fourier

10

1. INTRODUCTION

that describes the construct that bears his name. It can be shown that any periodic function can be represented by an infinite series of sine and cosine functions
called a Fourier Series. It turns out that as the parameter T0 increases without
bound, then nearly any function defined over a finite interval of input values can
be represented by an infinite series of sine and cosine functions and the inverse of
the Fourier transform

f (t) =

F (s) e2tsI ds

provides this representation.


It turns out that the N function samples used in the Discrete Fourier Transform
can be generated by a function that is the sum of at most N sine and cosine
functions called a Discrete Fourier Series. The magnitudes of these sinusoids
are determined by the values found in the Discrete Fourier Transform. In the
example considered in this section, the function was symmetric about the vertical
axis. In this special case, the Discrete Fourier Transform is given by

F (k) =

1
N

NX
2 1

=N2

f ( ) cos(2 k/N )

and gives the magnitude of the sinusoid cos(k/N ) in the Discrete Fourier Series,
This is illustrated in the figure on the next page. In the left column, each of the
components of the Discrete Fourier Transform are shown for the case where N = 4.
In the right column are the corresponding cosine functions. The function cos(k/N
) is displayed with a light line in each graph and the scaled version of the function
(determined by the Discrete Fourier Transform component) is displayed with a
dark line. At the bottom of the page, the complete Discrete Fourier Transform is
displayed along with the result of adding all of the scaled sinusoids with the original
sample values displayed on the graph. This graph represents the function

f4 ( )

0 cos ( ) +

  1
 
1
1
cos + cos (0 ) + cos

4
2
2
4
2

=
Observe that
cos
(0

)
is
equal
to
1
for
all
values
of

and
that
cos

2


cos 2 due to the symmetry properties of the cosine function. Therefore, the
above function can also be represented as

f4 ( )

 
1 1
+ cos

2 2
2

which is a more compact way of expressing the Discrete Fourier Series in this case.

1. INTRODUCTION

11

12

1. INTRODUCTION

When N = 8, the Discrete Fourier Series becomes

f8 ( )

  1 2
2
1
+
cos
+
cos ( )
2
4
2
4

using the magnitudes of the Discrete Fourier Transform computed earlier. The
function f8 ( ) is displayed in the left graph below and the function is displayed
with the original sample values in the right graph.

The graphs of the Discrete Fourier Series for the cases N = 16 and N = 32
are given below. Observe that as the number of samples increases, the resulting
Discrete Fourier Series more closely matches the rectangle input function.

Thus,

1. INTRODUCTION

13

Discrete Fourier Transform.


The Discrete Fourier Transform is a method that determines the
magnitudes of up to N sinusoids that can be combined and used to
recover a particular sequence of N samples of some real input function
over the interval T0 /2 < x < T0 /2. The resulting function, called a
Discrete Fourier Series can be used to approximate this input
function. As the number of samples N increases, the Discrete Fourier
Series becomes a better and better approximation to the input signal
within this interval. As the parameter T0 increases, the Discrete
Fourier Series approximates the input function over a wider interval.

How much effort does it take to compute the Discrete Fourier Transform? It
appears that the Discrete Fourier Transform formula must be evaluated N times
and that there are N terms involved in each formula evaluation. Also, every term
in the summation involves one multiplication. Thus, one is tempted to conclude
that a total of N 2 multiplications and N 2 N additions are required.
In 1965, a short 5 page paper written by Cooley and Tukey [11] appeared in
the literature which forever changed the field of Electrical Engineering. This paper
described an algorithm which computes the Discrete Fourier Transform in roughly
1/2 N log2 (N ) multiplications and N log2 (N ) additions. This algorithm is now
called the Fast Fourier Transform (FFT) and significantly reduces the amount
of work needed to compute the Discrete Fourier Transform for large sizes.
The following figure illustrates how much the FFT improves the Discrete Fourier
Transform computation. The upper line represents the number of multiplications
required if the formulas prevented earlier in this section are evaluated literally. The
lower line (which is somewhat difficult to distinguish from the horizontal axis) represents the number of multiplications required using the Cooley-Tukey algorithm.

14

1. INTRODUCTION

This graph only covers FFT sizes up to 256 where the slow method requires
65,536 multiplications and the fast method requires about 1,000 multiplications.
Typical FFT sizes are in the thousands or higher, but the difference in the number
of operations for these sizes is so significant that one really would not be able to tell
the difference between the FFT graph and the horizontal axis. One can already see
the incredible improvement of the FFT in the amount of effort needed to compute
the Discrete Fourier Transform.
Although the Cooley-Tukey paper is one of the most influencial papers of the
20th century, it is not the first time that the technique was first described in the
literature. The account of [23] shows that what is now known as the FFT may
have been discovered as early as 1805 by Carl Gauss. The main difference between
the success of the Cooley-Tukey paper and the unappreciated earlier papers was
a new invention called a computer which could be used to actually perform the
computations. Once engineers were able to compute the Discrete Fourier Transform
so quickly, they found many uses for the technique.
The process of creating a function which has a specified collection of function
values is called interpolation. Thus, the Discrete Fourier Transform can be viewed
as an interpolation process of the given sample values. The Discrete Fourier Transform can be reversed to receive the Discrete Fourier Series as an input and recover
the N sample values. Because the Discrete Fourier Series is being evaluated at N
points, this process is called multipoint evaluation.
There are many good books (e.g. [7]) which consider the FFT from the above
perspective and give algorithms that can efficiently compute the FFT. This book
approaches the FFT from a different perspective which is more common among
some mathematicians. The alternative perspective defines the FFT as a special
case of multipoint evaluation and the inverse FFT to be an interpolation algorithm.
This alternative viewpoint was first introduced in a 1971 paper by Fiduccia [15]
and has since become popular with researchers who are interested in Computer
Algebra and Error-Correcting Codes. This is a somewhat unfortunate development
because papers are now written using both of the perspectives and it can become
confusing for the beginning student to read these documents and have the same
perspective on the problem as the author.
A nice feature of the Fiduccia perspective is that the FFT can be developed from
an algebraic perspective using the remainders that result when two polynomials
are divided. This algebraic perspective has been studied in much greater detail by
Daniel Bernstein whose work ([1], [2] , [3]) has been very influencial of the present
authors understanding of this alternative perspective of the FFT. This textbook
is intended to further expand upon the ideas of these two pioneers and present the
Ficuddia perspective of the FFT in a form that both undergraduate engineering
and mathematics students can understand and appreciate.
Because both viewpoints of the FFT are used in publications, both perspectives
will be treated in this book. After some background material in Chapters 2 and 3,
the FFT will be developed from the Fiduccia perspective in Chapters 4-5. Chapter
6 will then consider the problem of fast polynomial multiplication, an important
application of the FFT. Next in Chapter 7, we will return to the more traditional

1. INTRODUCTION

15

approach to the FFT considered in this first chapter. Some additional topics will
be considered in Chapters 8 and 9.
EXERCISES.
1. Use a computer package to produce a graph of the rectangle function and
the sinc function as shown earlier in this chapter.
2. Write a routine that computes the Discrete Fourier Transform of a function
that is symmetric about the vertical axis using the formula given in this chapter.
Use the routine to compute the Discrete Fourier Transform of the rectangle function.
Make a graph of the Discrete Fourier Transform for N = 4, 8, 16, or 32 to verify
the results given in this section.
3. Compute the Discrete Fourier Transform of the rectangle function for the
case N = 128 (or even higher). Use these components to construct the Discrete
Fourier Series for the case N = 128 and graph the result. The graph should really
look like the rectangle function now (see below).

However, there are spikes where the graph transitions between 0 and 1. This
is called Gibbs phenonenon and is a problem that engineers must deal with when
working with signals constructed from the Discrete Fourier Transform. Special
components are used to prevent a signal from exceeding a certain value (called
overshoot) or going below a certain value (called undershoot). The result of a
signal going through these components might look something like the following:
FIGURE
4. Throughout this chapter, assumed that T0 = 2. Select one or more of the
cases: T0 = 1/2, T0 = 1, T0 = 4, T0 = 8 and:
(A) Sample the function uniformly over one period of the function using N = 32
sample values. Produce a graph of the results.
(B) Compute the Discrete Fourier Transform of the sampled function and produce a graph of the results. Compare it with the graph of the sinc function produced
in this chapter. In particular, how does the height of the function compare with
the one produced in this chapter and how many values of the sinc function are
produced between each time that the function crosses the horizontal axis?

16

1. INTRODUCTION

(C) Try to reconstruct the rectangle function with the Discrete Fourier Series
based on your Discrete Fourier Transform. Comment on the success or failure of
your attempt.
(D) The results of this exercise only give scaled versions of the input function.
Based on you work completed for this exercise, can you guess what scaling factor
(in terms of N and T0 ) that each of these results should be multiplied by to recover
the original input function?

CHAPTER 2

Polynomials
1. Basic operations
In a typical introductory algebra course (e.g. [4]), one is introduced to the
concept of a polynomial. Although polynomials involving multiple variables are
frequently used in algebra, here we will restrict ourselves to polynomials involving
one variable. First, let us review the concept of a monomial.

Monomial.
A monomial is an expression of the form
a xn

where a is any real number called the coefficient and


n is any nonnegative integer called the exponent.

Monomials can be combined to form polynomials, the topic of this chapter.

Polynomial.
A polynomial is either a monomial or a sum or difference of
monomials. Each monomial which comprises a polynomial is called
a term.

Although the above definitions specify that the coefficients are real numbers, it
is possible to create polynomials for other number systems. We will create polynomials with complex number coefficients in the next chapter.
The degree of a monomial is simply its exponent in the case of single variables.
The degree of a polynomial is the largest exponent which appears in one of its terms.
The monomial which has this exponent in the polynomial is called the leading
term.
EXAMPLE
17

18

2. POLYNOMIALS

A polynomial should be simplified whenever possible. This involves combining


like terms or its monomials which have the same exponent. In this case, one
replaces these monomials with a new term whose coefficient is the sum or difference
of the coefficients of the terms with this common exponent.
EXAMPLE
We will now review the four basic arithmetic operations involving polynomials.
The symbols f (x), g(x), h(x), etc. are used to label polynomials with variable x.

Addition of polynomials.
The sum of the polynomials f (x) and g(x) is determined by
forming the expression f (x) + g(x) and combining like terms.

EXAMPLE.
Before reviewing the operation of subtraction, recall that an opposite or additive inverse of a number a is some other number b such that a + b = 0. If a is
positive, then b is formed by putting a minus sign in front of a. If a is negative, then
b is formed by removing the minus sign in front of a. The opposite of a polynomial
f (x) is formed by replacing each term of f (x) with a term of the same degree and
the opposite of the coefficient in f (x). We will denote the opposite of f (x) with
the notation f (x).
EXAMPLE
Subtraction of polynomials.
The difference of the polynomials f (x) and g(x) is determined by
forming the expression f (x) + (g(x)) and combining like terms.

To multiply two monomials, we multiply the coefficients of the two monomials


and add the degrees of the monomials.
EXAMPLE
Multiplication of polynomials.
The product of the polynomials f (x) and g(x) is determined by
multiplying every term of f (x) by every term of g(x) and
combining like terms.
EXAMPLE

2. THE REMAINDER THEOREM AND SYNTHETIC DIVISION

19

As one can see, polynomial multiplication requires much more effort than polynomial addition or subtraction. In Chapter 6, we will see that the FFT can be used
to significantly reduce the amount of work needed to multiply two polynomials of
large degree.
Polynomials are an example of what is called Euclidean Domain. This means
that given two polynomials a(x) and b(x), there exists a polynomial q(x) called a
quotient and r(x) called a remainder such that

a(x)

q(x) b(x) + r(x)

with the property that either r(x) is the zero polynomial or else the degree of r(x)
is less than the degree of b(x). It can be shown that q(x) and r(x) are unique in
this particular type of Euclidean Domain.
To determine the quotient and remainder of a(x) divided by b(x), we follow a
procedure that works much like the division of two integers.
Division of polynomials.
The quotient q(x) of the polynomial a(x) divided by b(x) with
remainder r(x) is determined by following the following sequence of
steps:
1. Initialize r(x) equal to a(x).
2. While deg(r(x)) deg(b(x)):
3.
Divide the leading term of r(x) by the leading term of b(x).
Call this term `(x) and add it to the quotient q(x).
4.
Subtract `(x) b(x) from r(x).
5. End while

EXAMPLE
It can be shown that there are no other polynomials q(x) and r(x) such that
a = q(x) b + r(x) that have the property that the degree of r(x) is less than the
degree of b(x).

2. The Remainder Theorem and Synthetic Division


Suppose that we wish to evaluate a polynomial a(x) at some value , i.e. we
wish to compute a(). In the previous section, we learned that there are unique
polynomials q(x) and r(x) such that

20

2. POLYNOMIALS

a(x)

q(x) (x ) + r(x)

if b(x) = x where either r(x) must be 0 or the degree of r(x) is less than the
degree of x , i.e. a polynomial of degree 0. In either case, r(x) must be a
constant. Let us call this constant C. If we evaluate the above equation at , we
obtain

a()

= q() ( ) + r()

= q() 0 + C
= C

We have proven the so-called Remainder Theorem:


Remainder Theorem.
If a(x), a polynomial with real coefficients, is divided by the
polynomial x , the remainder is equal to a().
EXAMPLE
If one divides a polynomial by another polynomial of the form x , i.e. a
polynomial of degree 1 with a coefficient of 1 on the variable term, we do not need
to write down the computations involving this variable term. This process is called
synthetic division.
EXAMPLE
One may encounter an alternative technique for evaluating a polynomial called
Horners Rule.
Horners Rule.
Given a point and a polynomial
a(x) = an1 xn1 + an2 xn2 + + a2 x2 + a1 x + a0 , we can
compute the polynomial evaluation a() using the formula
a() = ((. . . ((an1 x) + an2 x) + . . . ) + a1 x) + a0
EXAMPLE
By comparing the above two examples, we see that evaluation of a polynomial
using Horners Rule involves the exact same computations as the Remainder Theorem implemented through synthetic division, but in a slightly different presentation.

3. MODULAR REDUCTION OF A POLYNOMIAL

21

3. Modular reduction of a polynomial


We have already learned that the Remainder Theorem can be used to evaluate
a polynomial at a single point. To efficiently evaluate a polynomial at a number of
points, we will use the remainders that result from other polynomial divisions. In
this text, we will call the divisors in these calculations modulus polynomials and
the resulting remainders will be called residue polynomials. We will also use the
following notation 1 in the development of this technique.
f (x) mod M(x).
The modular reduction of the polynomial f (x) by the modulus polynomial M(x) is the remainder that results when f (x) is divided by
M(x).
In addition to the Remainder Theorem, two other results involving modular
reductions will be used to efficiently evaluate a polynomial at a number of points.
Modular reduction - result 1.
If f (x) has degree less than M(x), then f (x) mod M(x) = f (x).
This result can be established by recalling that the division of f (x) by M(x)
must have a unique quotient and a unique remainder where either the remainder
is zero or has degree less than the degree of M. Since f (x) has degree less than
M(x), then f (x) must be this unique remainder of the polynomial division.

1Mathematicians who have studied advanced algebra may object to this definition of

f (x) mod M(x). Traditionally, the notation f (x) mod M(x) is used to represent an element
of something called a quotient ring which consists of the set of all polynomials that have the
same remainder as the remainder of f (x) divided by M(x). However, these mathematicians then
select a representative element of this set for computational purposes. In terms of the representative elements, the definition of f (x) mod M(x) is the same as the one considered in this
section. The reader should be cautioned, however, that this view is only valid with polynomials
involving one variable. With polynomials involving two or more variables, one needs to learn the
concept of a quotient ring and the more advanced mathematical techniques used for computing in
these quotient rings. The multivariate case will not be encountered anywhere in this manuscript.
More details about the relationship between residue polynomials and quotient rings is given the
appendix for those with the appropriate algebra background.

22

2. POLYNOMIALS

Modular reduction - result 2.


If MA (x), MB (x), and MC (x) are polynomials such that
MA = MB MC and f (x) is any polynomial, then
f mod MB
f mod MC

(f mod MA ) mod MB
(f mod MA ) mod MC

The proof of the Result 2 is left as an exercise.


These two results allow a polynomial to be evaluated using multiple modular
reductions. For example, if f = X and M = (x 1) (x 2) (x 3) (x 4),
then f mod M = f by the first result. By the second result, then f (1) can also be
computed with the sequence of modular reductions
EXAMPLE
By comparing the effort needed to compute these two modular reductions with
the synthetic division computed earlier, we see that we have not reduced the number
of operations needed to compute the polynomial evaluation. With some additional
effort, a sequence of modular reductions can efficiently evaluate a polynomial at a
number of points.

4. Multipoint polynomial evaluation


To evaluate a polynomial at a number of points, we can build a tree of polynomials similar to the one below for evaluating a polynomial at the points
S = {4, 3, 2, 1, 1, 2, 3, 4}.

(x + 4)(x + 3)(x + 2)(x + 1)(x 1)(x 2)(x 3)(x 4)


(x + 4)(x + 3)(x + 2)(x + 1)

(x + 4)(x + 3)

x+4

x+3

(x + 2)(x + 1)

x+2

x+1

(x 1)(x 2)(x 3)(x 4)


(x 1)(x 2)
x1

x2

In simplified form, the polynomials are also given by

(x 3)(x 4)
x3

x4

4. MULTIPOINT POLYNOMIAL EVALUATION

23

x8 30x6 + 273x4 820x + 576


x4 + 10x3 + 35x2 + 50x + 24

x2 + 7x + 12

x+4

x+3

x2 + 3x + 2

x+2

x+1

x4 10x3 + 35x2 50x + 24


x2 3x + 2
x1

x2

x2 7x + 12
x3

x4

The multipoint evaluation algorithm works by selecting n points for which


we wish to evaluate some polynomial f (x) of degree less than n. By the first
result in the previous section, f mod M = f where M is the polynomial at the
top of this tree. Using the second result in the previous section, one can reduce
f mod MA into f mod MB where MA = MB MC at each branch of the tree.
Here, MA is the polynomial at the parent node of a branch and MB and MC are
the polynomials of the children nodes. For example, the first modular reduction
in the above computation tree would involve the transformation of f mod X into
f mod X and f mod X. Here, X and MA = MB MC . The resulting computations
are
COMP
The modular reductions proceed down each branch of the tree until one reaches
the leaf nodes at the bottom of the tree. At this point, one has computed f mod (x
) for each of the n points for which we wished to evaluate f . By the Remainder
Theorem, then these results represent the evaluation of f at each of the n points.
The sequence of modular reductions for the case of the example can be organized into the following tree.
TREE
By carefully counting the number of operations involved in these modular reductions, we obtain XX additions and XX multiplications. This is the same number
of operations required to compute the evaluations using synthetic division. So far,
we have not gained anything by using the technique discussed in this section.
Let us now rearrange the points in S according to the order
S = {4, 4, 3, 3, 2, 2, 1, 1}. The tree of polynomials now becomes

24

2. POLYNOMIALS

(x + 4)(x 4)(x + 3)(x 3)(x + 2)(x 2)(x + 1)(x 1)


(x + 4)(x 4)(x + 3)(x 3)
(x + 4)(x 4)
x+4

x4

(x + 3)(x 3)
x+3

x3

(x + 2)(x 2)(x + 1)(x 1)


(x + 2)(x 2)
x+2

x2

(x + 1)(x 1)
x+1

x1

or in simplified form

x8 30x6 + 273x4 820x + 576


x4 25x2 + 144
x2 16
x+4

x4

x4 5x2 + 4

x2 9
x+3

x3

x2 4
x+2

x2

x2 1
x+1

x1

Observe that many of the polynomials in this tree have fewer terms than the
nodes of the tree given in XXX. The sequence of modular reductions for evaluating
f in set S can be given by the tree
PICTURE
and this time only XX multiplications and XX additions are required. This is a
consequence of the fact that the modulus polynomials in the new tree involve fewer
terms.
Note that if MB is of the form xm b and MC is of the form xm + b, then
MA will be of the form x2m b2 . Each of these polynomials has only two terms
and resulted in the reduced operation count of the multipoint evaluation using the
second modulus tree. We were able to arrange the points of S so that we could
achieve this situation at the bottom of the modulus tree, but we were not able to
construct polynomials of the desired form higher in the tree. It turns out that it is
impossible to find two polynomials with two terms that multiply together to form
x2m b2 whenever b2 < 0 if we restrict ourselves to the real numbers. By selecting
the points in S from a extension of the real numbers called the complex numbers,

4. MULTIPOINT POLYNOMIAL EVALUATION

25

we will be able to reduce the number of terms of the modulus polynomials higher
in the tree and achieve a faster multipoint evaluation technique.

CHAPTER 3

Complex numbers
1. Number systems
Over the years, mathematicians have invented several new number systems to
handle cases where one cannot solve a particular problem with the existing number
systems.
In elementary school, one first learns the number system of the natural numbers which consists of all of the positive integers, i.e. 1, 2, 3, 4, .
Next, the number zero is introduced and the number system is expanded into
the whole numbers.
Then, one is asked to find a number (possibly represented by ) such that
 + 1 = 0. Here, the concept of a negative number is introduced and the students
number system is expanded to the integers,
i.e. , 4, 3, 2, 1, 0, 1, 2, 3, 4, .
Later, one is asked to find a solution to an equation similar to 2x = 1 and the
concept of a fraction is introduced. Now, the students number system has been
expanded to the rational numbers, i.e. all numbers that can be expressed as a
ratio of two integers.
An equation similar to x2 2 = 0 cannot be solved with the rational numbers
and the concept of an irrational number is introduced. The number system is next
expanded to the real numbers.
The number system must be expanded again so that
an equation of the form
x2 + 1 = 0 can be solved. We will introduce a new symbol, 1, that we will define
as the solution
to this equation. Mathematicians traditionally use the symbol i
to represent 1 which engineers typically choose the symbol j to represent the
symbol. In this book, we will use the symbol I which is used in some popular
computer algebra packages. The number system has now been expanded to the
complex numbers

27

28

3. COMPLEX NUMBERS

Complex numbers.
The number system of complex numbers consists of all expressions of
the form
A+IB

where A and B are real numbers.

We can extract the two components of a complex number using the following operations.
Components of a complex number.
If C = A + I B is a complex number, then
Re(C) = A is the real part of the complex number and
Im(C) = B is the imaginary part of the complex number.

EXAMPLE
It is unfortunate that one of the components of a complex number is called imaginary. This term was introduced because at first some people did not believe that
these numbers had any practical applications. If one thinks about it carefully, negative numbers can also be considered imaginary because they cannot be used to
count anything tangible in the real world. However, negative numbers have become
accepted because they can be used to represent the concepts of debt and loss. This
text is all about one of the important practical application of complex numbers.
So, while the term imaginary is traditionally applied to one of the components
of a complex number, this term should not be interpreted as a description of the
usefulness of complex numbers.
So what is the next expansion after the complex numbers? A consequence of
the Fundamental Theorem of Algebra introduced by Carl Gauss states that the
complex number system is complete, meaning that all equations with coefficients
in the complex numbers can be solved using only the complex numbers.
It is possible to expand the complex number system two more times, but each
time we lose a property that one usually associates with numbers. First, the
quaternions are a set of numbers of the form A + I B + J C + K D where A, B, C,
and D are real numbers and I2 = J2 = K2 = 1 and I J K also equals 1. This
number system is not commutative, which means that a + b can be different from
b + a. This number system was long thought to be of only theoretical interest, but
has recently been applied to computer graphics and video games. This number
system can again be expanded into the octonions which are like the quaternions,
but has eight components. This number system is also not commutative, but has
the additional property that it is not associative. That is to say, a + (b + c) may
be different from (a + b) + c. Currently, this number system seems to be mainly of

2. COMPLEX ARITHMETIC

29

theoretical interest. It turns out that the octonions represent the last expansion of
the number system that can be made where both addition and multiplication are
defined.
The following table summarizes the various expansions of the number system
discussed in this section.
TABLE

2. Complex arithmetic
In the previous section, we learned that a complex number is of the form A+I B
where A and B are real numbers. The two quantities A and I B are kept distinct
because one cannot combine real and imaginary numbers. In [30], Loy relates the
components of a complex number to the idea of apples and oranges in a fruit basket.
Apples can be added to apples and oranges can be added to oranges, but apples
cannot be added to oranges. This illustration may be useful as we give the following
definition of addition and subtraction with complex numbers
Complex addition.
The sum of the two complex numbers A + I B and C + I D is given by
(A + C) + I (B + D).

EXAMPLE

The sum of 1 + I 2 and 3 + I 4 is given by:


(1 + 3) + I (2 + 4) = 4 + I 6

Complex subtraction. The difference of the two complex numbers


A + B I and C + D I is given by (A C) + (B D) I.
EXAMPLE
Complex multiplication works by treating each complex number as a binomial
(a polynomial with two terms) and combining the four terms of the product using
the property that I2 = 1. In other words:
Complex multiplication.
The product of the two complex numbers A + B I and C + D I
is given by
(AC BD) + (AD + BC) I
EXAMPLE

30

3. COMPLEX NUMBERS

Before discussing complex division, we first introduce the concept of the complex conjugate
Complex conjugate.
The conjugate of the complex number A + B I is given by
ABI

Complex division.
The quotient of the two complex numbers A + B I and C + D I
is given by
A+BI
C +DI

BC AD
AC + BD
+I 2
C 2 D2
C D2

This division formula is established by multiplying the numerator and denominator of the quotient by the complex conjugate of C + D I and simplifying.
The complex numbers can be graphed on a Cartesian plane where the horizontal
axis is used for the real component and the vertical axis is used for the imaginary
component. The complex number A + B I is mapped on the complex plane
using the ordered pair (A, B).

Im
(A, B)
B
A

Re

EXAMPLE.
3. Polar Representation of Complex Numbers
In this section, we will consider a second method of representing a complex
number called polar form illustrated in the figure below.

3. POLAR REPRESENTATION OF COMPLEX NUMBERS

31

Im
(r, )
r

Re

Complex numbers (polar form).


The number system of complex numbers consists of all expressions of
the form
(r, )
where r is the distance from the point to the origin called the magnitude and is the angle that a line drawn from the point to the
origin makes with the real axis. This angle is traditionally called the
angle or argument of the complex number.

If r = 0, then is undefined as any angle can be used to represent this point.


One difficulty with working with complex numbers in polar form is that they
may have more than one representation. If the above complex number is (r, 40o ),
then a second representation of the number is (r, 40o +360o) = (r, 400o ). In fact, any
multiple of 360 degrees can be added or subtracted from the argument to obtain
another polar representation of the complex number. To make sure that every
complex number has a unique representation in polar form, one often restricts the
range of argument values allowed. In this text, we will assume that the angles are
in the range 0o < 360o or 0 < 2 in radians. If a computation results in a
value of outside of this range, multiples of 360o will be added or subtracted from
the result to bring back within 0 < 360o.
We are now going to develop methods of converting between the Cartesian
representation and the polar representation of a complex number. The following
figure which illustrates both representations of a complex number may be helpful
in developing these formulas.

32

3. COMPLEX NUMBERS

Im

Re

First, we will show how to convert from the Cartesian representation to the
polar representation. Observe that r can be computed using the Pythagorean
Theorem or the distance formula
p
A2 + B 2

= tan

and satisfies


B
A

To solve this equation for , one must exercise caution because tan1 is restricted
to the range 180o < < 180o. The following formulas give a solution for in the
range 0 < 360o for all cases except when r = 0.

tan1 (B/A)

tan1 (B/A) + 180o

tan1 (B/A) + 360o

0
=
o
90

180o

270o

undefined

if
if
if
if
if
if
if
if

A>0
A<0
A>0
A>0
A=0
A<0
A=0
A=0

and B > 0
and
and
and
and
and
and

B
B
B
B
B
B

>0
=0
>0
=0
<0
=0

To convert from polar form to Cartesian form, we will apply the following
results from trigonometry.

3. POLAR REPRESENTATION OF COMPLEX NUMBERS

A =
B

33

r cos()

r sin()

Substituting these formulas into A + B I, we obtain another form of the polar


representation of a complex number

r (cos() + I sin())
which is sometimes abbreviated as r cis . Engineers often instead use the abbreviation r and call the polar representation of a complex number a phasor.
One advantage of the polar representation of complex numbers is that multiplication is easy in this form.
Complex multiplication (polar form). The product of two
complex numbers written in polar form r1 1 and r2 2 is given by
(r1 1 ) (r2 2 )

= r1 r2 (1 + 2 )

Thus, all one needs to do is multiply the magnitudes and add the arguments of two
complex numbers to compute the product. This result is a consequence of the sum
and difference formulas learned in trigonometry. The derivation of the formula is
left as an exercise.
This multiplication formula can be used to derive an expression for the square
of a complex number

(r)2

r2 (2)

By repeated use of the multiplication formula, we can derive de Moivres Theorem which allows a complex number to be raised to any integer power n.
de Moivres Theorem. If r is a complex number with magnitude
r and argument , then
(r)n

rn (n)

By a similar derivation used to produce the formula for complex multiplication,


a formula for the division of r1 1 and r2 2 is given by

34

3. COMPLEX NUMBERS

Complex division (polar form). The product of two


complex numbers written in polar form r1 1 and r2 2 is given by
r1 1 r2 2

r1
(1 2 )
r2

There are no good formulas for addition and subtraction in polar form. The
best course of action for these operations is to convert the two numbers to be added
or subtracted into Cartesian form, compute the sum or difference, and then convert
the result back into polar form if desired.
4. Primitive Roots of Unity
In this section, we will restrict ourselves to complex numbers which have magnitude 1 when represented in polar form. These complex numbers are said to form
the unit circle in the complex plane.

Im(z)

1
Re(z)

Consider the equation z n = 1. Here, z is a complex variable, i.e. an expression of the form x + I y where x and y are unknown real numbers. Alternatively,
z can be a variable involving complex numbers represented in polar form.
A solution to the above equation is called an nth root of unity. By de
Moivres Theorem discussed in the previous section, we can verify that

1(360od/n) =

cos(360od/n) + I sin(360od/n)

is a solution to this equation for all 0 d < n. The Fundamental Theorem of


Algebra tells us that there are exactly n solutions to the equation and we see that
the above expression can be used to find all of them.

4. PRIMITIVE ROOTS OF UNITY

EXAMPLE

35

For n = 8, the solutions to z 8 = 1 are:


1(360o 0/8) = 10o
1(360o 1/8) = 145o

1(360o 2/8) = 190o


1(360o 3/8) = 1135o

1(360o 4/8) = 1180o


1(360o 5/8) = 1225o

1(360o 6/8) = 1270o

1(360o 7/8) = 1315o


16 90o

16 135o

16 45o
1

16 180o

16 0o

16 225o

16 315o
16 270o


An element is said to be a primitive nth root of unity if n = 1, but


6= 1 for any m that divides n. Another way of looking at a primitive root of
unity is that { 0 , 1 , 2 , . . . , n1 } should give all of the solutions to xn = 1.
m

EXAMPLE

If n = 8, then a primitive 8th root of unity is given


by = 145o. In the above example, we saw that
{ 0 , 1 , 2 , . . . , 7 } gives all eight solutions to z 8 = 1.
On the other hand, the element 1 = 190o is
not a primitive root of unity. Note that (1 )4 =
1 where 4 divides 8. Also, {1 0 , 1 1 , 1 2 , . . . , 1 7 }
only gives four of the solutions to z 8 = 1:
{10o, 190o , 1180o, 1270o}.


It can be shown that if is a primitive nth root of unity, then 1 = n1 is also


a primitive nth root of unity. Here, the notation 1 means the (multiplicative)
inverse of and is the unique element such that 1 = 1 = 1.

36

3. COMPLEX NUMBERS

EXAMPLE

Observe that (145o ) (1315o) = (1360o) = 1. So,


1315o is the multiplicative inverse of 145o .
So, 1315o is also a primitive 8th root of unity. One
can verify this by moving counterclockwise in increments of 45 degrees in the figure above.


Next, consider the equation z n = 1. A solution to the above equation is


called an nth root of 1. Again, by de Moivres Theorem, we can verify that

1(/n + 360od/n) =

cos(/n + 360o d/n) + I sin(/n + 360od/n)

is a solution to this equation for all 0 d < n. This gives all n solutions to this
equation.
EXAMPLE

Consider the equation z 4 = 1 = 1180o. The four


solutions to the equation are given by
1(180o/4 + 360o 0/4) = 145o

1(180o/4 + 360o 1/4) = 1135o

1(180o/4 + 360o 2/4) = 1225o


1(180o/4 + 360o 3/4) = 1315o

16 135o

16 45o
1

16 225o

16 315o

5. EULERS FORMULA

37

Finally, the solutions to z 4 1 are the called the 4th roots of unity and are
given by {1, I, 1, I}.

1
1
1

I
Here = I is a primitive 4th root of unity. Since I4 = 1, then the powers of
I cycle according to the following pattern:
0, 4, 8, . . . = 1
1, 5, 9, . . . = I
2 , 6 , 10 , . . . = 1
3 , 7 , 11 , . . . = I
5. Eulers Formula
In a typical Calculus course, one encounters the following Taylor expansions
for the cosine, sine, and exponential functions:

cos(x)

sin(x)

ex

x4
x6
x2
+

+
2!
4!
6!
3
5
7
x
x
x
x
+

+
3!
5!
7!
x2
x3
x4
x5
x6
x7
x
+
+
+
+
+
+
1+ +
1!
2!
3!
4!
5!
6!
7!
1

In the 18th century, Leonhard Euler saught a single formula that related these three
expressions. The expression

cos(x) + sin(x)

= 1+

x2
x3
x4
x5
x6
x7
x

+
+

+
1!
2!
3!
4!
5!
6!
7!

almost does it, but some of the signs are wrong.

38

3. COMPLEX NUMBERS

In order to make a formula that works, Euler decided to replace the x in the
definition of ex with I y where y is a real number. Then we obtain
eIy

=
=
=
=
=
=

(I y)3
(I y)4
(I y)5
(I y)6
(I y) (I y)2
+
+
+
+
+
+
1!
2!
3!
4!
5!
6!
Iy
I2 y 2
I3 y 3
I4 y 4
I5 y 5
I6 y 6
1+
+
+
+
+
+
+
1!
2!
3!
4!
5!
6!
y2
I y3
y4
I y5
I y6
Iy

+
+

+
1+
1!
2!
3!  4!
5!
6!


y4
I y5
I y I y3
y2
+
+ +

+
+
1
2!
4!
1!
3!
5!




4
3
5
2
y
y
y
y
y
+
+ + I

+
+
1
2!
4!
1!
3!
5!
cos(y) + I sin(y)
1+

Now, certain mathematicians would correctly raise several objections to the derivation of the above formula. Euler did not concern himself with such issues and
neither will we in this presentation. However, it should be mentioned that some
more advanced mathematics covered in a course in complex variables is needed to
properly derive the above result. In any event,

Eulers Formula.
eIy

= cos(y) + I sin(y)

is now widely accepted by mathematicians and engineers.


Another way of representing a complex number in polar form which uses Eulers
Formula is given by

r (cos() + I sin())

= r eI

and the nth roots of unity are sometimes expressed as


{eI2/n , eI4/n , eI6/n , . . . , eI2(n1)/n } in this form. Here, we used the fact that
360 degrees is equivalent to 2 radians. Two primitive nth roots of unity are given
by eI2/n and eI2/n in this form.
6. Rotation transformations
Suppose that we multiply a complex number r by I = 190o. What is the
effect of this multiplication? From the multiplication formula for complex numbers

6. ROTATION TRANSFORMATIONS

39

in polar form, we see that the result of the multiplication is r( + 90o ). The new
point is the same distance from the origin, but the angle between the horizontal axis
and the ray connecting the point to the origin has been increased by 90 degrees.

EXAMPLE

If the point 130o is multiplied by I = 190o , we


obtain the new point 1(30o + 90o ) = 1120o as illustrated in the figure below.

16 120o

16 90o

16 30o

We are now going to consider the effect of multiplying the complex number r
by 1 where is any angle. Again, from the multiplication formula in polar form,
the result of the multiplication is r( + ). The distance that the new point is
away from the origin remains unchanged, but the angle that the ray connecting the
point to the origin makes with the horizontal axis has been increased by degrees.

40

3. COMPLEX NUMBERS

EXAMPLE

If the point 130o is multiplied by 145o , we obtain


the new point 1(30o + 45o) = 175o as illustrated in
the figure below.
16 75o
16 45o

16 30o

Next, suppose that we transform the complex plane with the mapping z =
1 z. This means that every point in the complex plane should be multiplied by
1. In other words, the magnitude of every point is unchanged, but the argument
of every point is increased by degrees. The figure below illustrates how four points
are represented in a transformed complex plane where = 45o.

z = 16 45o z

We see here that a negative value for decreases the argument of every point or
rotates the points clockwise in the complex plane.
Another way of looking at this transformation is to leave the points fixed in
the transformed complex plane, but adjust the axis system instead. In this case,
the axis system is rotated by degrees. The following figure illustrates this idea
for the case where = 45o and thus the axis system is rotated counterclockwise
by (45o) = 45o .

41

6. ROTATION TRANSFORMATIONS

z = 16 45o z

The above two figures show two equivalent ways of looking at the same transformation. Another way of expressing the transformation is in terms of z instead
of z using the notation z = 1() z z. This will be the form used in the coming
chapters.

EXAMPLE

In the previous examples, we considered the transformation z = 1 45o z. Multiply both sides of this
equation by 145o to obtain
145o z = 145o 1 45o z
145o z = 1 z
So this transformation can be expressed with the
equivalent formula z = 145o z.

So if a transformation is expressed as z = 1 z, this means that one can


either rotate all of the points clockwise by degrees or rotate the axis system
counterclockwise by degrees. One can verify this in the figures above with the
example z = 145o z.
The main purpose of these mappings in this text is to transform a set of solutions to the equation z n = 1 to be roots of unity in a transformed complex
plane. The following result will help us achieve this goal.

42

3. COMPLEX NUMBERS

One can transform the equation z n = 1 with the mapping z =


(1(/n)) z to obtain the equation zn = 1 in the transformed complex
plane.

EXAMPLE

Consider the equation z 4 = 1 = 1180o which was


solved in a previous section with solutions illustrated
in the following figure.

16 135o

16 45o
1

16 225o

16 315o

We wish to perform a transformation so that these


four solutions are roots of unity. By the above
result, the transformation that should be used is
z = 1(180o/4) = 145o z.
Let us verify that this transformation actually accomplishes this effect.
z4
(1(45o ) z)4

(1(45o ))4 z4
1180o z4
z4

=
=

1180o
1180o

1180o

=
=

1180o
1

So the result indeed holds for this case.


We will conclude this section by reviewing the two ways of looking at the
transformation for the above example. First, one can rotate the points clockwise
by 45 degrees as illustrated in the following figures

6. ROTATION TRANSFORMATIONS

43

16 90o
16 135o

16 45o
1

1
o
6
1 180

16 225o

16 0o

16 315o
16 270o

or one can rotate the axis system counterclockwise by 45 degrees as illustrated in


the following figures.

16 135o

16 0o

16 45o

16 90o

16 225o

16 270o

16 315o
16 180o

These rotation transformations will play an important role in one of the FFT algorithms discussed in the next chapter.

CHAPTER 4

FFT algorithms
1. The Binary Reversal Function
Having completed a study of complex numbers, we are ready to resume our
search for a fast multipoint evaluation algorithm. The Fast Fourier Transform
(FFT) algorithms in this chapter evaluate a polynomial f at each of the nth roots
of unity for some n = 2k . To achieve this goal, we are going to construct modulus
polynomials of the form z 2m b2 which can be factored into z m b and z m + b at
every node of the modulus polynomial tree. Before presenting the algorithms, we
need a method to easily find the bs in the expressions above. This method requires
one to convert numbers in our system called decimal which is base-10 to the one
that computers use called binary which is base-2. Note that dec is a prefix which
means 10 and bi associated with 2, which may be helpful in figuring out what to
call other number systems.
In elementary school, one learned that each digit in a number represented a
power of 10. For example, the number 123 is used to represent 1 group of a hundred,
2 groups of ten, and 3 groups of one, i.e.

123 = 1 102 + 2 101 + 3 100


A computer uses a number system where each digit (called a bit, meaning binary
digit) is either a 0 or a 1 and each position in the number represents a power of 2.
For example, the number (101)2 is used to represent 1 group of four, 0 groups of
two, an 1 group of one, i.e.

(101)2

= 1 22 + 0 21 + 1 20

The notation ()2 is used to indicate that this represents a binary number.
To convert from decimal to binary, divide the number by two and record the
remainder. Then divide the quotient by two and again record the remainder. Continue the process until the quotient is zero. The binary representation of the number
is the sequence of remainders in reverse order.
45

46

4. FFT ALGORITHMS

EXAMPLE

Let us express 13 in binary form.


13 2
62

32
12

= 6
= 3

R1
R0

= 1
= 0

R1
R1

So in binary form, 13 = (1101)2 .

To convert from binary to decimal, expand the binary number according to


its place values. Then evaluate the resulting expression, treating all numbers as
decimal values.
EXAMPLE

Let us express (1101)2 in decimal form


(1101)2

1 23 + 1 22 + 0 21 + 1 20

=
=

18+14+02+11
8+4+0+1

13


To form the modulus polynomials used in the FFT algorithms, we need to be


able to convert between the two number systems. The tool that will be used to
construct these polynomials is:
Binary reversal function.
Given some integer j expressed in binary form, i.e.
j = (bk1 bk2 bk3 . . . b2 b1 b0 )2 , the binary reversal of j
(with respect to n = 2k ), denoted by (j)(n) is obtained by expressing
the bits of j in reverse order, i.e. (j)(n) = b0 b1 b2 . . . bk3 bk2 bk1 .

EXAMPLE

Let j = 5 and n = 16 = 24 .
In binary form, j = (0101)2.
So (j)(16) = (1010)2 ..
As a decimal number, (j)(16) = 10.

Note in the example that leading zeros should be included in the binary reversal of
the number. Also, n is often the same for many related calculations. If this is the

2. CLASSICAL RADIX-2 FFT

47

case and it is understood from the context of the situation what n is, then it is not
necessary to specify n in the notation and one can simply use (j) instead.
Properties of the binary reversal function.
(1). (j) = 2 (2j) for j < n/2
(2). (2j + 1) = (2j) + n/2 for j < n/2
(3). This function is permutation of the integers {0, 1, 2, , n 1}.
To establish the first property, let j < n/2 and write j in binary form, i.e.
j = (0bk2 bk3 . . . b2 b1 b0 )2 where k = log2 (n). Then
(j) = (b0 b1 b2 . . . bk3 bk2 0)2 . Now, 2j = (bk2 bk3 bk4 . . . b1 b0 0)2 and
(2j) = (0b0 b1 b2 . . . bk3 bk2 )2 . Multiplying this result by 2 gives (j) as desired.
The proof of the other two properties is left as exercises.
EXAMPLE

Let n = 8 = 23 . We are going to compute the binary


reversal of each number in the range from 0 to 7.
0 = (000)2

(0) = (000)2 = 0

1 = (001)2
2 = (010)2

(1) = (100)2 = 4
(2) = (010)2 = 2

3 = (011)2

(3) = (110)2 = 6

4 = (100)2
5 = (101)2

(4) = (001)2 = 1
(5) = (101)2 = 5

6 = (110)2
7 = (111)2

(6) = (011)2 = 3
(7) = (111)2 = 7


One can verify that the three properties above hold for when n = 8.

2. Classical radix-2 FFT


After all of this background material, it is finally time to consider the more
efficient multipoint evaluation algorithm called the Fast Fourier Transform. This
algorithm was published in a paper by Cooley and Tukey [11] for various input
sizes. In this section, we will describe the Cooley-Tukey algorithm which evaluates
a polynomial at n = 2k specially chosen points. A variant of the algorithm with
n = 3k will be discussed in Chapter 8. Because we will encounter a different
formulation of the FFT algorithm in the next section, we will call the CooleyTukey version of the FFT algorithm the classical FFT algorithm since it came
first.

48

4. FFT ALGORITHMS

The input to the reduction step of the classical radix-2 FFT algorithm is given
by f mod (z 2m b2 ) and computes f mod (z m b) and f mod (z m + b). Before
discussing the reduction step itself, let us consider the tree of modulus polynomials.
At the top of the tree is z n 1. Thus, we are going to evaluate f at each of
the nth roots of unity. At each reduction step with input size 2m, let b = (2j)
for some j < n/m. In the previous section, we saw that b2 = ( (2j) )2 = (j) and
b = (2j) = (2j+1) . So the input modulus polynomial is z 2m (j) and the
output modulus polynomials are z m (2j) and z m (2j+1) . At the bottom of
the tree are z (j) for all 0 j < n.
The modulus polynomial tree when n = 16 is given by:

z 16 1
z8 0
z4 1
z2 1
0

z4 + 1

z2 + 1
4

z8 8

12

z4 4

z2 4

z 2 12

10

14

z2 1
1

z 4 12

z2 + 1
5

13

z2 4

z 2 12

11

15

The next figure shows how each of the roots of unity


are subdivided at each

branch of the tree when n = 8. Here, = 2/2 + I 2/2.

2. CLASSICAL RADIX-2 FFT

49

z8 1

z4 1

z4 + 1

z2 1

z1

z+1

z2

z2 + 1

1 z +

z 1

z 5

z2 +

z 3

z 7

The reduction step is simple to perform with these modulus polynomials. Split
the input into two blocks of size m by writing f mod (z 2m (j) ) = fA xm + fB .
Then the outputs are given by fY = (2j) fA + fB and fZ = (2j) fA + fB .
The reduction step can also be expressed in matrix form as

fY
fZ

(2j)
(2j)

1
1


 
fA

fB

Engineers often represent the reduction step as a picture similar to the one below
and call it a butterfly operation.

fB

fA
(2j)

fY

fZ

50

4. FFT ALGORITHMS

EXAMPLE

Suppose that we are trying to compute an FFT of size


8 and the input to a reduction step in this process
is f = z 3 + z 2 z = f mod (z 4 4 ). Split this
input polynomial into two blocks of size 2 by writing
f = (z +1)z 2 +(z). So fA = z +1 and fB = z +0.
If b2 = 4 = (1) , then b = 2 = (21) = I.
So b fA = I (z + 1) = I z + I and
fY = b fA + fB

= (I z + I) + (z + 0)
= (1 + I) z + I

fZ = b fA + fB

= (I z + I) + (z + 0)

= (1 I) z I

In terms of butterfly operations, the reduction step


can be expressed as
f0

f1

I
(fY )0

1 + I
(fY )1

f2
2

I
(fZ )0

f3
2

1 I
(fZ )1


Suppose that we want to compute the FFT of a polynomial f of degree less than
n = 2k . Then f is equal to f mod (z n 1). We will recursively apply the reduction
step with appropriate selections of m and b. After all of the reduction steps have
been completed with input size 2m = 2, then we have f mod (z (j) ) = f ( (j) )
for all j < n, i.e. the desired FFT of f . In terms of butterfly operations, the FFT
of size 8 is expressed as:

2. CLASSICAL RADIX-2 FFT

f0

f1

f2

f3

f ( 0 )

f5

f ( 4 )

f ( 2 )

f ( 6 )

f7

f6

f4

51

f ( 1 )

f ( 5 )

f ( 3 )

f ( 7 )

The FFT of size 8 for the input polynomial f (z) = z 7 + 2z 6 + 3z 5 + z 4 + 2z 3 +


3z + 2z + 1 considered earlier in this section is given by
2

z 7 + 2z 6 + 3z 5 + z 4 + 2z 3 + 3z 2 + 2z + 1

3z 3 + 5z 2 + 5z + 2

8z + 7

15

z3 + z2 z

2z 3

3 + 2I

(1 + I) z + I

3 2I 2 + I

2+I

(1 I) z I

2I

2I

The above example was specially designed so that the exact answers could fit
nicely in the boxes for each step. Real-world problems use decimal approximations
instead as illustrated by the following example.

52

4. FFT ALGORITHMS

EXAMPLE

Suppose that we wish to compute the FFT of size 8


with input f (z) = z 3 + z 2 + z + 2.
Since this input is of degree less than 4, then
f (z) mod (z 4 1) = f (z) mod (z 4 + 1) = f (z).
Next,
f (z) mod (z 2 1) =

f (z) mod (z 2 + 1) =
2

f (z) mod (z I) =
2

f (z) mod (z + I) =

2z + 3
1
(1 + I) z + (2 + I)

(1 I) z + (2 I)

Finally,
f ( 0 ) =
4

f ( ) =
2

5
1

f ( ) =

f ( 6 ) =

f ( 1 ) =

2 + 2.4142 I

f ( ) =
3

f ( ) =
f ( 7 ) =

2 0.4142 I

2 + 0.4142 I

2 2.4142 I

In unscrambled form, the output is


f ( 0 ) =

f ( 1 ) =

2 + 2.4142 I

f ( 2 ) =
3

f ( ) =
4

2 + 0.4142 I

f ( ) =

f ( 5 ) =

2 0.4142 I

f ( 6 ) =
7

f ( ) =

2 2.4142 I

Looking at the unscrambled output in the example above, we see that f ( d ) is


the complex conjugate of f ( nd ) for each 0 < d < n. It turns out that whenever
the input to the FFT consists of purely real numbers, the output will have this
property. We will explore more properties of the FFT later in the chapter.
Pseudocode is a method of informally describing a sequence of steps (called an
algorithm) that is not dependent on a particular computer language. The pseudocode for the classical radix-2 FFT is given in Figure 1. It may be helpful to

3. OPERATION COUNT OF THE CLASSICAL RADIX-2 ALGORITHM

53

follow the steps of the algorithm using the two examples above to understand how
the algorithm works.
Algorithm : Classical radix-2 FFT
Input: f mod (z 2m (j) ), a polynomial of degree less than 2m
with complex number coefficients; an nth root of unity .
Here, m is a power of 2 where 2m n.
Output: f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+2m1) )
0. If (2m) = 1 then return f mod (z (j) ) = f ( (j) )
1. Split f mod (z 2m (j) ) into two blocks fA and fB each of size m
such that f mod (x2m (j) ) = fA xm + fB
2. Compute f mod (z m (2j) ) = fA (2j) + fB
3. Compute f mod (z m (2j+1) ) = fA (2j) + fB
4. Compute the FFT of f mod (xm (2j) ) to obtain
f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+m1) )
5. Compute the FFT of f mod (z m (2j+1) ) to obtain
f ( (j2m+m) ), f ( (j2m+m+1) ), . . . , f ( (j2m+2m1) )
6. Return f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+2m1) )
Figure 1. Pseudocode for classical radix-2 FFT

3. Operation count of the classical radix-2 algorithm


In this section, we are going to show how to determine the number of multiplications and the number of additions required to compute an FFT with size a power
of 2 using the algorithm provided in the previous section. The main purpose of this
section is to introduce several techniques which can be used to evaluate most of the
other algorithms discussed in this text.
First, let us determine the number of multiplications required to compute the
FFT with input size equal to 2m. We will denote this multiplication count using
the notation M (2m). The reason why the input size is expressed as a multiple of 2
is to avoid fractions which some people feel makes the pseudocode harder to read.
Line 0 simply returns the input when the input size is 1 to end the recursion involved in this algorithm. Clearly, this instruction requires no multiplications. Line
1 splits the input polynomial into two parts of size m used in the next few instructions. This instruction also does not require any multiplications. Line 2 involves
the addition of fA and fB , two polynomials of size m. Again, this instruction does
not require any multiplications. In line 3, we must multiply each of the components of fA by (2j) . Since fA has size of m, then a total of m multiplications
are required. Notice, however, that if j = 0, then (2j) = 1 and no multiplications are required. With the input polynomial reduced into f mod (z m (j) ) and
f mod (z m (j) ), the rest of the FFT of size 2m is determined by computing
these two FFTs of size m. We can say that the cost of these two instructions is
2 M (m). Line 6 simply returns the FFT after all of the components have been
determined and does not require any multiplications.

54

4. FFT ALGORITHMS

Combining all of the results of the analysis of this algorithm for the case that
j is never equal to 0, we can express the number of multiplications required using
the formula

M (2m) = 2 M (m) + m
At this point, it is more convenient to express the input as size n = 2m. With this
change of variables, the above equation becomes

M (n) =

2M

n
2

n
2

This is called a recurrence relation because it expresses the operation count in terms
of itself with a different input size.
To solve a recurrence relation, we need something called an initial condition.
This is a solution of the recurrence relation for some value of the input. In the case
of the multiplication count, we know that no multiplications are required when
n = 1, i.e. M (1) = 0.
There are several techniques that can be used to solve a recurrence relation.
A branch of mathematics called combinatorics involves the study of something
called generating functions, which are a powerful tool that can be used to solve
recurrence relations. The interested reader can study [32] to learn more about this
method. A simpler, but more limited method of solving recurrence relations is with
a technique called substitution.
Suppose that we let m = n/4 in the above recurrence relation. In this case, we
have

n
2

2M

n
4

n
4

By replacing M (n/2) with the above expression in the formula derived for M (n),
we obtain

M (n) =
=
=

n n
2M
+
 2  n 2 n  n
+
+
2 2M
4
2
n4
n
2
+2
2 M
4
2

The technique of substitution repeats this procedure until the initial condition is
reached. Usually, one has to discover a pattern to reduce the recurrence relation
to this point. The initial condition is then substituted into the formula to obtain
a closed-form solution for the recurrence relation. If n = 2k for some k, i.e. k =

3. OPERATION COUNT OF THE CLASSICAL RADIX-2 ALGORITHM

55

log2 (n), then the derivation of the multiplication count for the classical radix-2
FFT algorithm continues as follows:

n
n
+2
M (n) = 22 M
4

 n  2n 
n
2
= 2 2M
+
+2
8
2
n 8
n
+3
= 23 M
2
 8n 
n
4
= 2 M
+4
2
 16
n
n
5
= 2 M 5 +5
2
2
=
n
n
= 2k M k + k
2
2
n
n
k
= 2 M
+k
n
2
n
k
= 2 M (1) + k
2
n
k
= 2 0+k
2
1
n log2 (n)
=
2

We now need to handle the cases where j = 0. If the combinatorics approach is


used, then this case can be naturally modeled into a system of generating functions
to be solved. With the substitution approach, we will instead directly count the
number of multiplications that are not needed when j = 0 and then subtract this
amount from the number of multiplications required in the general case.
The only way that the case j = 0 can be encountered is if the algorithm is
initially called with this condition. In line 2, m multiplications are not needed
when j = 0. The recursive call in line 4 retains the condition that j = 0, but the
call in line 5 makes j = 1. So only one of the two recursive calls should be included
in the recurrence relation. Combining these results for 2m = n, we obtain

Ms (n)

Ms

n
2

n
2

for Ms (n), the number of multiplications saved in the computation of an FFT of


size n at each of the roots of z n 1. The initial condition for this recurrence relation
is Ms (1) = 0.
Applying the technique of substitution in this case, we obtain

56

4. FFT ALGORITHMS

Ms (n) =
=
=
=
=
=
=
=
=

n

n
+
2
2
n n
n
Ms
+ +
 4n  4n 2 n
n
Ms 3 + 3 + 2 + 1
2
2
2
2
 
n
n
n
n
n
Ms k + k + + 3 + 2 +
2
2
2
2
2

n
1
1
1
1
Ms
+n
+

+
+
+
n
2k
23
22
2
!
 k
 3  2
1
1
1
1
Ms (1) + n
+ +
+
+
2
2
2
2
 k !
 2  3
1
1
1
1
+
+ +
+
0+n
2
2
2
2
!
k  d
X
1
n
2

Ms

d=1

To complete the solution of this recurrence relation, we need the following result:

Geometric series.
If a is any real number other than 1, then
k
X

ak+1 aL
a1

ad

ad

= kL+1

d=L

If a = 1, then
k
X

d=L

To resolve the summation in the above recurrence relation, we let a = 1/2 and
L = 1. So,

3. OPERATION COUNT OF THE CLASSICAL RADIX-2 ALGORITHM

k  d
X
1
d=1

57


1 k+1
12
2
1
2 1


1 k+1
12
2
21
 k

1
2
1
= 1 k
2
1
= 1
n

+1

So the number of multiplications saved is

Ms (n) =
=
=

k  d
X
1

2
d=1


1
n 1
n
n1

and the total number of multiplications when j = 0 is

M (n) =

1
n log2 (n) n + 1
2

We can follow a similar procedure to determine the number of additions for the
classical radix-2 algorithm. The recurrence relation

A(n)

2A

n
2

+n

can be used to model this operation count. This formula is valid for all values of
j, so it is not necessary to solve a second recurrence relation for this case. The
number of additions required is given by the closed-form formula

A(n)

= n log2 (n)

The derivation of this formula is left as an exercise.

58

4. FFT ALGORITHMS

The techniques used in this section are sufficient to derive the operation counts
for most of the other algorithms in this text. These other operation count derivations will be left as exercises.

4. Twisted radix-2 FFT


In [20], Gentleman and Saude presented a different method to compute the
FFT for various input sizes. In this section, we will consider the case where n = 2k .
Both this version and the classical version appear frequently in other FFT literature.
In [REF], Bernstein calls the Gentleman-Saude algorithm the twisted FFT for
reasons we shall see shortly. We will adopt this terminology in this text as well.
Recall that our goal is to compute the evaluation of some polynomial f (z) at
each of the nth roots of unity. In the previous section, we learned that if the input
to the reduction step was in the form f (z) mod (z 2m 1), then fewer multiplications
were required for the reduction. We are going to use what we learned about rotation
transformations of complex numbers to make sure that every reduction step input
is in this form.
Let f (z) be some polynomial of degree less than 2m that we wish to evaluate
at each of the (2m)th roots of unity. The twisted radix-2 FFT will always receive
such an input f (z) = f (z) mod (z 2m 1) and produce outputs f (z) mod (z m 1)
and f (z) mod (z m + 1). Note that the first output is already in the form needed
to reuse the twisted radix-2 reduction step, but we need to transform the second
output to put it into the proper form.
We can interpret the second output as some polynomial that we wish to evaluate
at the solutions to z m = 1 = 1(180o). We learned in Chapter 3 that if we apply
the transformation z = (1(180o/m)) z to this equation, then it becomes zm = 1.
Recall that this transformation can be viewed as rotating the points in the complex
plane clockwise by 180/m degrees or rotating the axis system of the complex plane
counterclockwise by 180/m degrees. One may also describe this transformation as
twisting the complex plane. This was the motivation for Bernstein calling this
type of FFT the twisted FFT.
So we know the mapping needed to turn z m = 1 into zm = 1, but we need
a method of twisting the second output of the reduction step according to this
mapping. We learned earlier in this chapter that (1) = 1. Using the properties of complex numbers discussed in Chapter 3, the transformation can also be
expressed as z = (1)/m z. This formula says to replace every occurrence of z in
an expression with (1)/m z. This operation can be applied to the second output
of the reduction step with the following technique.
Twisted polynomial.
One can transform a polynomial f (z) according to the transformation
z = z by multiplying the coefficient of degree d by d for each degree
in f .

4. TWISTED RADIX-2 FFT

EXAMPLE

59

Suppose that we wish to twist the polynomial


f (z) = z 3+ z 2 z using
twisting factor
= = 2/2 + I 2/2.
The resulting polynomial is
=

3 z 3 + 2 z 2 1 z + 0 0

( 2/2 + I 2/2) z 3 + I z 2

( 2/2 + I 2/2) z

Assuming that the powers of are precomputed, the twisting of the polynomial
f (z) of degree less than m requires m 1 multiplications and no additions. A
multiplication is not needed for the constant term of f (z).
So now, both the first and second outputs of the twisted radix-2 reduction step
are in the proper form to be used as inputs for another application of the reduction
step. The process of applying the reduction step and twisting the polynomial can
also be expressed using butterfly operations. The following figure gives the butterfly
operation which is part of an FFT of size n = 16. The input to this reduction step
is of degree less than 8. So m = 4 and the value of used to compute the twisted
polynomial is = (1)/m = (n/2)/m = (16/2)/4 = 2 .

f0

(fY )0

f1

(fY )1

f2

(fY )2

f3

(fY )3

f4

f5

f6

f7

(fZ )0

(fZ )1

(fZ )2

(fZ )3

We now need to show that by recursively applying the reduction step to an


input polynomial, we obtain the desired FFT. The rotations of the complex plane
are cumulative throughout the process and we need to carefully keep track of how
much the complex plane has been rotated at every step of the FFT computation.
The following figure shows all of the rotations involved with a twisted FFT of
size 8. Observe that the figure takes the viewpoint of rotating the axis system of
the complex plane rather than the points. This makes it easier to compare this
figure with the one used for the classical algorithm.

60

4. FFT ALGORITHMS

z8 1

(z(1) )4 1

z4 1

z2 1

z1

z(4) 1

(z(2) )2 1

z(2) 1

z(6) 1

(z(1) )2 1

z(1) 1

z(5) 1

(z(3) )2 1

z(3) 1

z(7) 1

Here, z = z(1) so that z(1) = 1 z. Similarly,


z(2) = 2 z
z(4) = 4 z
z(6) = 4 z(2)

z(3) = 2 z(1)
z(5) = 4 z(1)
z(7) = 4 z(3)

One will probably agree that it is too complicated to create a new variable every
time the complex plane is transformed. Therefore, we need a different notation to
keep track of the transformations.
From this point forward, the inputs and outputs of the reduction steps will be
expressed in terms of the original untransformed complex plane using the notation
f ( z). This notation means to replace each z in the original polynomial f (z) with
z and simplify this into a new polynomial that is a function of z, i.e. compute
a twisted polynomial with = . However, the z of the twisted polynomial is
associated with a complex plane that has been rotated by . This may seem
confusing at first, but it is better than inventing a new variable every time the
complex plane is transformed. The main thing to remember is that when one
sees the notation f ( z), this means a polynomial that is a function of z in a
transformed complex plane where the axis system has been rotated counterclockwise
by .
The following example may help to clarify the new notation.
EXAMPLE.
We can now present the reduction step of the radix-2 twisted FFT algorithm in
terms of the original polynomial and the new notation. It can be shown that if the
input to the reduction step is f (z) = f ( (j)/(2m) z) mod (z 2m 1) for some j,

4. TWISTED RADIX-2 FFT

61

then the outputs are given by f (z) mod (z m 1) = f ( (2j)/m z) mod (z m 1) and
f ( (1)/m z) mod (z m 1) = f ( (2j+1)/m z) mod (z m 1). These expressions
can be determined by carefully keeping track of the cumulative rotations of the
complex plane using the new notation and using the fact that (j)/2m = (2j)/m.
The twisted radix-2 FFT algorithm is initialized with f (z) which equals f ( 0
z) mod (z n 1) if f has degree less than n. By recursively applying the reduction
step to f (z), we obtain f ( (j) z) mod (z 1) = f ( (j) 1) for all j in the range
0 j < n. This is the desired FFT of f (z).
The following figure shows how every intermediate result of this FFT calculation
relates to the original input polynomial f (z) for n = 8.

f (z) mod (z 8 1)
f (z) mod (z 4 1)
f (z) mod (z 2 1)

f ( 0 )

f ( 4 )

f ( z) mod (z 4 1)

f ( 2 z) mod (z 2 1)

f ( 2 )

f ( 6 )

f ( z) mod (z 2 1)

f ( 1 )

f ( 5 )

f ( 3 z) mod (z 2 1)

f ( 3 )

f ( 7 )

We now provide a complete butterfly diagram for this FFT of size 8.

f0

f1

f2

f3

f4

f ( 0 )

f ( 4 )

f ( 2 )

f ( 6 )

f5

f ( 1 )

f ( 5 )

f7

f6

f ( 3 )

f ( 7 )

62

4. FFT ALGORITHMS

Algorithm : Twisted radix-2 FFT


Input: f (z) = f mod ( (j) z 2m 1), the modular reduction of some
polynomial f (z) that has been twisted by (j) .
Here, m is a power of 2 where 2m n.
Output: f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+2m1) )
0. If (2m) = 1 then return f ( (j) z) mod (z 1) = f ( (j) )
1. Split f (x) into two blocks fA and fB each of size m
such that f (x) = fA z m + fB
2. Compute f mod ( (2j) z m 1) = f (z) mod (z m 1) = fA + fB
3. Compute f (z) mod (z m + 1) = fA + fB
4. Twist f (z) mod (z m + 1) by (1)/m to
obtain f mod ( (2j+1) z m 1)
5. Compute the FFT of f mod ( (2j) z m 1) to obtain
f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+m1) )
6. Compute the FFT of f mod ( (2j+1) z m 1) to obtain
f ( (j2m+m) ), f ( (j2m+m+1) ), . . . , f ( (j2m+2m1) )
7. Return f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+2m1) )
Figure 2. Pseudocode for twisted radix-2 FFT

The next diagram shows the intermediate results of the FFT of f (z) = z 7 +
2z + 3z 5 + z 4 + 2z 3 + 3z 2 + 2z + 1 using the twisted method. One can compare
these results with those given for the computation of the
the classical
FFT using
2/2
+
I

2/2 and 3 =
algorithm
provided
in
the
previous
section.
Here,

2/2 + I 2/2.
6

z 7 + 2z 6 + 3z 5 + z 4 + 2z 3 + 3z 2 + 2z + 1

3z 3 + 5z 2 + 5z + 2

8z + 7

15

3 z3 + I z2 z

2I z 3

3 + 2I

2z+I

3 2I 2 + I

2+I

2zI

2I

2I

Pseudocode for this FFT algorithm is given in Figure 2.


We leave the analysis of this algorithm as an exercise, but state that the final
operation counts are given by the recurrence relations

5. CLASSICAL RADIX-4 FFT

M (n) =
A(n)

63

n 1
+ n1
2M
2
 n2
2A
+n
2

where M (1) = 0 and A(1) = 0.


Using the technique discussed in the previous section, closed-form formulas for
the operation counts are given by

1
n log2 (n) n + 1
2
= n log2 (n)

M (n) =
A(n)

These computations are left as an exercise.


Note that this algorithm has the exact same operation count as the classical
radix-2 FFT algorithm. One can show that the operations saved by using the
simplified reduction step at every point in the algorithm is the same as the cost
of computing the twisted polynomials. Thus, any hybrid algorithm which combines the classical FFT and twisted FFT reduction steps will also require the same
number of operations.

5. Classical radix-4 FFT


A more efficient FFT algorithm can be obtained by exploiting the fact that
multiplication by I or I is simple to perform in the complex numbers.
Complex multiplication by I and I.
Given a complex number A + I B, multiplication by I is given by
I (A + I B) =

and multiplication by I is given by


I (A + I B) =

B + I A

B + I (A)

Observe that no arithmetic operations are needed to compute these multiplications.


Some of the elements need to be inverted, but we will see later in this section how
to avoid any computational effort that is associated with inverting these elements
in the FFT algorithms.
EXAMPLE
Observe that the definition of multiplication by I in Cartesian form is consistent
with the definition presented in Chapter 3 for multiplication by I in polar form.

64

4. FFT ALGORITHMS

The function has the following additional properties that will be useful in the
development of the radix-4 algorithm:
Additional properties of the binary reversal function.
(4).
(5).
(6).
(7).

(j) = 4 (4j) for j < n/4


(4j + 1) = (4j) + n/2 for j < n/4
(4j + 2) = (4j) + n/4 for j < n/4
(4j + 3) = (4j) + 3 n/4 for j < n/4

These properties are left as exercises for the reader to verify. It follows from these
properties that for any j < n/4,

(4j+1)
(4j+2)
(4j+3)

= (4j)

= I (4j)
= I (4j)

Each reduction step of the radix-4 FFT receives as input


f (z) = f (z) mod (z 4m b4 ) and computes
fW
fX

= f (z) mod (z m b)
= f (z) mod (z m + b)

fY
fZ

= f (z) mod (z m I b)
= f (z) mod (z m + I b)

where b = (4j) for some j < n/4m. By the above properties, then the input can
also be exprssed as f (z) mod (z 4m (j) ) and the output can be expressed as
fW
fX
fY
fZ

= f (z) mod (z m (4j) )

= f (z) mod (z m (4j+1) )

= f (z) mod (z m (4j+2) )

= f (z) mod (z m (4j+3) )

Each radix-4 reduction step essentially does two levels of radix-2 reduction steps.
A modulus polynomial tree for n = 16 is given by:

5. CLASSICAL RADIX-4 FFT

65

z 16 1
z4 1
0

12

z4 + 1

10

14

z4 4
1

z 4 12
13

11

15

One can verify that the bottom level results of the tree shown above are the
same as the tree produced when the radix-2 algorithm is used.
The reduction step of the radix-4 FFT is fairly simple to perform as well. Split
f into four blocks of size m by writing f = fA x3m + fB x2m + fC xm + fD .
Then the four outputs of the reduction step {fW , fX , fY , fZ } are given by the
matrix computation

fW
fX

fY
fZ

1
1
1
1 1 1

I 1 I
I 1 I

3(4j)

fA
1
2(4j)

1
fB

1 (4j) fC
1
fD

Here, there are 3m multiplications involved in the construction of the polynomials


3(4j) fA , 2(4j) fB , and (4j) fC , but there are no multiplications involved
for the rest of the calculation.
Looking at the matrix computation, there appears to be 12m additions or
subtractions involved in this reduction step. However, some of the operations are
repeated in the four results and only need to be computed one time. For example,
fD and 2(4j) fB are added together in two of the results and subtracted in two
of the other results. By carefully analyzing which operations only need to be done
one time, we can reduce the number of additions to 8m. The improved method of
performing the computations can be summarized in the following butterfly diagram.

66

4. FFT ALGORITHMS

fD

fC

fB

(4j)

fA

2(4j)

3(4j)

fW

fX

fY

fZ

The multiplication by I in the reduction step does not cost any arithmetic operations. Even the inversion given in the result at the beginning of this section does
not require any computational effort if it is combined with the subtraction that
immediately preceded this multiplication.
EXAMPLE.
If k = log(n) is even, then we can start with f = f mod (z n 1) and apply the
radix-4 reduction steps until we have f mod (z (j) ) = f ( (j) ) for all j < n, i.e.
the desired FFT of f . If k is odd, then one level of reduction steps from the classical
radix-2 FFT is needed to complete the FFT computation. The following butterfly
diagram illustrates this case for an FFT of size 8. The dotted lines show the location
of the radix-4 reduction step in this computation. The radix-2 reduction steps are
at the bottom of the diagram.

f0

f1

f2
0

f ( 0 )

f ( 4 )

f3
0

f4

2 0

f ( 2 )

f ( 6 )

f5

2 0

f6
3 0

f ( 1 )

f ( 5 )

f7
3 0

f ( 3 )

f ( 7 )

5. CLASSICAL RADIX-4 FFT

67

Algorithm : Classical radix-4 FFT


Input: f mod (z 4m (j) ), a polynomial of degree less than 4m.
Here, m is a power of 2 where 4m n.
Output: f ( (j4m+0) ), f ( (j4m+1) ), . . . , f ( (j4m+4m1) )
0A. If (4m) = 1 then return f mod (z (j) ) = f ( (j) )
0B. If (4m) = 2 then use a radix-2 algorithm to compute the desired FFT
1. Split f mod (z 4m (j) ) into four blocks
fA , fB , fC , and fD each of size m such that
f mod (z 4m (j) ) = fA z 3m + fB z 2m + fC z m + fD
2. Compute fA 3(4j) , fB 2(4j) , and fC (4j)
3. Compute f = fA 3(4j) + fC (4j)
4. Compute f = fB 2(4j) + fD
5. Compute f = I (fA 3(4j) + fC (4j) )
6. Compute f = fB 2(4j) + fD
7. Compute f mod (z m (4j) ) = f + f
8. Compute f mod (z m (4j+1) ) = f + f
9. Compute f mod (z m (4j+2) ) = f + f
10. Compute f mod (z m (4j+3) ) = f + f
11. Compute the FFT of f mod (z m (4j) ) to obtain
f ( (j4m+0) ), f ( (j4m+1) ), . . . , f ( (j4m+m1) )
12. Compute the FFT of f mod (z m (4j+1) ) to obtain
f ( (j4m+m) ), f ( (j4m+m+1) ), . . . , f ( (j4m+2m1) )
13. Compute the FFT of f mod (z m (4j+2) ) to obtain
f ( (j4m+2m) ), f ( (j4m+2m+1) ), . . . , f ( (j4m+3m1) )
14. Compute the FFT of f mod (z m (4j+3) ) to obtain
f ( (j4m+3m) ), f ( (j4m+3m+1) ), . . . , f ( (j4m+4m1) )
15. Return f ( (j4m+0) ), f ( (j4m+1) ), . . . , f ( (j4m+4m1) )
Figure 3. Pseudocode for classical radix-4 FFT

Pseudocode for this FFT algorithm is given in Figure 3. We leave the analysis
and operation count of this algorithm as an exercise for the reader. In this algorithm, one must remember to subtract multiplications in the case where j = 0 and
a slightly simplified reduction step is used.
The number of operations required to compute an FFT using the classical
radix-4 algorithm is given by

M (n) =
A(n) =

3
8
3
8

n log2 (n) n + 1
n log2 (n) 87 n + 1

if log2 (n) is even


if log2 (n) is odd

n log2 (n)

This algorithm has the same addition count as the classical radix-2 FFT algorithm,
but the multiplication count has been significantly reduced.

68

4. FFT ALGORITHMS

6. Twisted radix-4 FFT


Now, we will consider the twisted version of the radix-4 FFT. Observe that
when j = 0 in the classical version, then the reduction step simplifies to

fW
fX

fY
fZ

1
1
=
I
I

1
1
1 1
1 I
1 I


fA
1
fB
1

1 fC
fD
1

The significance of this case is that no complex multiplications are needed in the
transformation other than a multiplication by I which simply involves swapping
components.
The input to the twisted radix-4 FFT reduction step is always given by f (z) =
f (z) mod (z 4m 1) and the output is always f (z) mod (z m 1), f (z) mod (z m +
1), f (z) mod (z m I), and f (z) mod (z m + I).

This is implemented by splitting the input polynomial f (z) into four blocks
of size m by writing f (z) = fA z 3m + fB z 2m + fC z m + fD . and then using
the butterfly operation given in the previous section to produce {fW , fX , fY , fZ }.
Again, j = 0 and so no multiplications are required at the beginning of the reduction
step.
Now, fW is already in the form needed to apply the twisted FFT reduction
step, but fX , fY , and fZ are not in the required form. A transformation of the
complex plane implemented through the twisted polynomial can produce a result
that is in the required form for each of these cases. It is left as an exercise for the
reader to verify that the value of needed in the twisted polynomial calculation to
produce the desired effect for each of these three outputs is given by

fX = f (z) mod (z m + 1) : = (1)/m = 1(180/m)


fY = f (z) mod (z m I) : = (2)/m = 1(90/m)

fZ = f (z) mod (z m + I) : = (3)/m = 1(270/m)

The following figure illustrates the transformations involved with the same reduction steps considered in the previous section for n = 16.

6. TWISTED RADIX-4 FFT

69

f (z) mod (z 16 1)

f (z) mod (z 4 1)

f ()

f ( 9 )

f ( 2 z) mod (z 4 1)

f ( 5 )

f ( 13 )

f ( z) mod (z 4 1)

f ( 3 )

f ( 11 )

f ( 3 z) mod (z 4 1)

f ( 7 )

f ( 15 )

So the complete reduction step consists of reducing f (z) into fW , fX , fY , and


fZ and then twisting the final three of these results. The butterfly diagram given
in the previous section for an FFT of size 8 has been modified in the figure below
so that one can see how the twisted radix-4 FFT would perform this computation.
Again, the radix-2 FFT is needed for the final stage.

f0

f1

f2

f ( 0 )

f ( 4 )

f3

f ( 2 )

f4

f ( 6 )

f5

f ( 1 )

f6

f ( 5 )

f7

f ( 3 )

f ( 7 )

EXAMPLE.
As an exercise, the reader can write pseudocode to implement the twisted radix4 FFT algorithm and then analyze it. One can compare the differences between

70

4. FFT ALGORITHMS

the two radix-2 algorithms to help complete this task. As one might suspect, the
operation counts of the twisted radix-4 FFT is the same as the classical radix-4
FFT.

7. Radix-8 FFT algorithms


The radix-4 algorithms were based on the fact a multiplication by I or I is
easy to implement in the complex numbers. It turns out that a multiplication by
a primitive 8th root of unity in the complex numbers is cheaper than a multiplication by an arbitrary complex number. Using the definition discussed in Chapter
3, a complex multiplication requires 4 multiplications and 2 additions of real number numbers. The four primitive 8th roots of unity in the complex numbers are
illustrated in the figure below.

3 = 16 135o

= 16 45o
1

5 = 16 225o

7 = 16 315o

and the following result shows that a multiplication by one of these 8th roots of
unity is cheaper than a multiplication by an arbitrary complex number.

7. RADIX-8 FFT ALGORITHMS

71

Complex multiplication by the primitive 8th roots of unity.


The product of
a complex
number A + I B and a primitive 8th root
of unity = 2/2 + I 2/2 is given by

2
2
(A B) + I
(A + B)
(A + I B) =
2
2
Similarly, multiplication by the other primitive 8th roots of unity 3 ,
5 , and 7 is given by

2
2
3
(A + I B) =
(A B) + I
(B A)
2
2
2
2
(B A) + I
(A B)
5 (A + I B) =
2
2

2
2
7 (A + I B) =
(A + B) + I
(A B)
2
2
Note that each product requires2 real multiplications and
2 real additions (assuming that 2/2 has been precomputed).

The reduction step of the classical radix-8 algorithm receives as input f (z) mod
(z 8m b8 ) and produces as output

fS
fT

=
=

f (z) mod (z m b)
f (z) mod (z m + b)

fU
fV

=
=

fW
fX

=
=

f (z) mod (z m I b)
f (z) mod (z m + I b)

fY

fZ

f (z) mod (z m b)
f (z) mod (z m + b)

f (z) mod (z m 3 b)

f (z) mod (z m + 3 b)

By some additional properties of the function,

72

4. FFT ALGORITHMS

Still more properties of the binary reversal function.


(8). (j) = 8 (8j) for j < n/8
(9). (8j + 1) = (8j) + n/2 for j < n/8
(10). (8j + 2) = (8j) + n/4 for j < n/8
(11). (8j + 3) = (8j) + 3 n/4 for j < n/8
(12). (8j + 4) = (8j) + n/8 for j < n/8
(13). (8j + 5) = (8j) + 5 n/8 for j < n/8
(14). (8j + 6) = (8j) + 3 n/8 for j < n/8
(15). (8j + 7) = (8j) + 7 n/8 for j < n/8
one can let b = (8j) for some j < n/8m and show that each reduction step of the
classical radix-8 FFT receives as input f = f mod (z 4m (j) ) and computes

fS
fT
fU
fV
fW
fX
fY
fZ

= f (z) mod (z m (8j) )

= f (z) mod (z m (8j+1) )

= f (z) mod (z m (8j+2) )

= f (z) mod (z m (8j+3) )

= f (z) mod (z m (8j+4) )

= f (z) mod (z m (8j+5) )

= f (z) mod (z m (8j+6) )

= f (z) mod (z m (8j+7) )

A partial modulus polynomial tree for the case n = 64 is given in the figure
below

7. RADIX-8 FFT ALGORITHMS

73

z 64 1

z8 0

z 8 32 z 8 16 z 8 48

32 16 48

40 24 56

z8 8

z 8 40 z 8 24 z 8 56

39 23 55 15 47 31 63

The reduction step for the classical radix-8 algorithm is given by the matrix
transformation

fS
fT
fU
fV
fW
fX
fY
fZ

1
1
I
I
7
3
5

1
1
1 1
1 I
1 I
I 5
I
I
7
I
3

1
1
1
1 1 1
1 I 1
1
I 1
1 3
I
1 7
I
1 I
1 5 I

1
1
I
I

5
3
7

1
1
1
1
1
1
1
1

7(8j) fA
6(8j) fB
5(8j) fC
4(8j) fD
3(8j) fE
2(8j) fF
(8j) fG
fH

where the input to the reduction step has been subdivided into eight blocks of size
m, i.e.

f (z) mod (z 8m b8 ) =

fA z 7m + fB z 6m + fC z 5m + fD z 4m
+fE z 3m + fF z 2m + fG z m + fH

However, as with the radix-4 algorithms, the matrix computation does not
describe the best way to perform the reduction step. Instead, the butterfly diagram

74

4. FFT ALGORITHMS

fH

fG
(8j)

fF

3(8j)

fE

2(8j)

fD
4(8j)

fC
5(8j)

fS

fT

fU

fV

fA
7(8j)

fB
6(8j)

fW

fX

fY

fZ

illustrates the most efficient method for calculating the classical radix-8 FFT reduction step.
The rest of the algorithm details are similar to the classical radix-2 and radix-4
algorithms and are left as an exercise for the reader. One can show that a total of

M (n) =
A(n) =

3
8
3
8
3
8

n log2 (n) n + 1
if log2 (n) mod 3 = 0
n log2 (n) 78 n + 1 if log2 (n) mod 3 = 1
n log2 (n) n + 1
if log2 (n) mod 3 = 2

n log2 (n)

operations are needed to implement the classical radix-8 algorithm. This does not
appear to be an improvement compared to the radix-4 algorithms, but we have not
yet accounted for the special multiplications by the primitive 8th roots of unity.
One can show that

M8 (n)

1
12
1
12
1
12

n log2 (n)
n log2 (n)
n log2 (n)

1
12 n
1
6 n

if log2 (n) mod 3 = 0


if log2 (n) mod 3 = 1
if log2 (n) mod 3 = 2

of the multiplications above are multiplications by primitive 8th roots of unity. We


need to determine how many operations are saved by multiplying by these special
elements.
We mentioned at the beginning of this section that a general complex multiplication requires 4 real multiplications and 2 real additions. It turns out that
multiplication by any root of unity can be implemented using 3 real multiplications
and 3 real additions through the following result credited to Oscar Buneman [8].

7. RADIX-8 FFT ALGORITHMS

75

Complex multiplication by root of unity.


The product of a complex number C + I D and a root of unity
A + I B is given by
(A + I B) (C + I D) =

U +IV

where
U

= W T V

V
W

= D+BW
= C T D

B
1A
=
B
1+A

Here, T is precomputed and the root of unity is stored in a computer


as the ordered pair (T, B) rather than (A, B).
In the early days of computers, a real multiplication required significantly more
effort than a real addition and this technique was advantageous. Later, the time
needed to implement a real multiplication became very close to the same as the
effort needed for a real addition and so this technique is no longer as popular. Another reason why this new technique is not used too often is that results from this
technique are sometimes not as accurate as those from the traditional method due
to errors that show up when a number very close to zero appears in the denominator of a fraction. This is often called numerical instability. While this text will
always model the cost of a general complex multiplication as 4 real multiplications
and 2 real additions, one should be aware of the other method as it is sometimes
used in FFT literature.
Modeling an complex addition with 2 real additions, a complex multiplication
with 4 real multiplications and 2 real additions, and a multiplication by a primitive
8th root of unity with 2 real multiplications and 2 real additions, the total number
of real operations needed to implement the radix-8 algorithm for an input size which
is a power of 8 is given by

MR (n)

AR (n)

4
n log2 (n) 4 n + 4
3
11
n log2 (n) 2 n + 2
4

By determining the number of real operations needed for the radix-4 algorithms,
one can show that the classical radix-8 algorithm requires 1/6 n log2 (n) fewer real
multiplications compared to the radix-4 algorithms. The savings for other input

76

4. FFT ALGORITHMS

sizes are close to the above results, but not quite as attractive. These details are
left as exercises.
A twisted radix-8 FFT algorithm can be constructed by following the same
techniques used to convert the classical radix-2 and radix-4 algorithms into the
twisted versions. This also is left as an exercise for the reader. As one might
suspect, the twisted radix-8 algorithm has the same operation count as the classical
radix-8 algorithm.

8. Split-radix FFT
It is possible to improve upon the counts of the radix-8 algorithm by constructing an algorithm which combines the radix-2 and radix-4 FFT reduction steps. This
can be done for both the classical and twisted formulations of the algorithm. Here,
we will show how to develop such an algorithm for the twisted case. This algorithm
was introduced in [38], but first clearly described and named over 15 years later in
[13].
Consider the computation of f (z) = f ( z) mod (x42m 1) using the twisted
radix-4 algorithm where 0 < n/(8m). The reduction step transforms this input
polynomial into f (z) mod (z 2m 1), f (z) mod (z 2m + 1), f (z) mod (z 2m I)
and f (z) mod (z 2m + I). The final three of these outputs are then twisted in
order to continue using the simplied reduction step. However, it is not necessary
to twist f (z) mod (z 2m + 1) at this point. Rather, we can reduce this result
into f (z) mod (z m I) and f (z) mod (z m + I) without any multiplications in
R. The radix-4 algorithm does not exploit this situation and thus there is room
for improvement. It turns out that even greater savings can be achieved if we
reduce f (z) into f (z) mod (z 22m 1) instead and then reduce this polynomial
into f (z) mod (z m I) and f (z) mod (z m + I).
The split-radix algorithm is based on a reduction step which receives as input
f (z) = f ( z) mod (z 4m 1) where 0 < n/(4m) and produces as output

fW z m + fX

fY

fZ

f (z) mod (z 2m 1)
f (z) mod (z m I)

f (z) mod (z m + I)

The algorithm is called split-radix because this reduction step can be viewed
as a mixture of the radix-2 and radix-4 reduction steps. Here, fY and fZ need
to be twisted after the reduction step while the split-radix reduction step can be
directly applied to fW z m + fX . A modulus polynomial tree is given below for the
split-radix algorithm when n = 16.

8. SPLIT-RADIX FFT

77

z 16 0
z8 0
z4 0

z4 4

z2 0
0

12

z2 4

z 2 12

z2 2

10

14

z 4 12
z2 6

13

11

15

Let f (z) = f ( z) mod (z 4m 1), the input to the split-radix algorithm


reduction step, be expressed as fA z 3m + fB z 2m + fC z m + fD . We will use
part of the radix-2 reduction step to obtain fW z m + fX and part of the radix-4
reduction step to obtain fY and fZ . These results are computed using

fW
fX

= fA + fC
= fB + fD

fY
fZ

= I fA fB + I fC + fD
= I fA fB I fC + fD

As with the other algorithms, these formulas do not show the most efficient method
of performing the computations. The following butterfly diagram shows how to
perform the computations using only 6m additions.

fD

fC

fB

fA

fX

fW
+

fY

fZ

78

4. FFT ALGORITHMS

After the reduction step, fY should be twisted by (2)/m and fZ can be twisted
by (3)/m so that the split-radix algorithm reduction step can be applied to these
results. The output fW z m + fX is already in the proper form.
It can be shown that (2)/m can also be used to compute the twisted polynomial for fZ . This observation was the basis for an algorithm called the conjugatepair algorithm [27]. The only difference between this algorithm and the split-radix
reduction step described above is this different transformation used for fZ . Originally, it was claimed that this reduces the number of operations of the split-radix
algorithm, but it was later demonstrated ([21], [28]) that the two algorithms require the same number of operations. However, the conjugate-pair version requires
less storage by exploiting the fact that (2)/m and (2)/m are conjugate pairs.
The split-radix algorithm receives as input f = f mod (z n 1). By recursively
applying the reduction step, we obtain f ( (j) z) mod (z 2 1) for some values
of j and f ( (j) z) mod (z 1) for other values of j. Reduction steps from the
twisted radix-2 algorithm can be used to complete the computation of the FFT.
The following butterfly diagram shows how an FFT of size 8 can be computed using
the conjugate-pair version of the split-radix FFT. Dotted lines have been added to
the diagram to show the locations of the split-radix reduction steps.

f0

f1

f2

f3

f ( 0 )

f ( 4 )

f4

f ( 2 )

f ( 6 )

f5

f ( 1 )

f6

f ( 5 )

f7

f ( 3 )

f ( 7 )

Pseudocode for the conjugate pair version of the split-radix FFT algorithm is
given in Figure 4. It is left as an exercise for the reader to analyze this algorithm.
The resulting recurrence relations are given by

M (n) =
A(n)

n 1
n
+2M
+ n2
M
2
4
2
n 3
n
+2A
+ n
A
2
4
2

8. SPLIT-RADIX FFT

79

Algorithm : Split-radix FFT


Input: f ( (j)/(4m) z) mod (z 4m 1) where m is a power of 2
such that 4m < n.
Output: f ( (j) ), f ( (j+1) ), . . . , f ( (j+4m1) )
0A. If (4m) = 1, then return f ( (j) ) = f ( (j) z) mod (z 1)
0B. If (4m) = 2, then call a radix-2 algorithm to compute the FFT
1. Split f ( (j)/(4m) z) mod (z 4m 1) into four blocks fA , fB , fC , and fD
each of size m such that
f ( (j)/(4m) z) mod (z 4m 1) = fA z 3m + fB z 2m + fC z m + fD
2. Compute fW z m + fX
= f ( (2j)/(2m) z) mod (z 2m 1) = (fA + fC ) z m + (fB + fD )
3. Compute f = fB + fD
4. Compute f = I (fA + fC )
5. Compute fY = f + f
6. Compute fZ = f + f
7. Compute f ( (4j+2)/m z) mod (z m 1) by twisting fY by (2)/m
8. Compute f ( (4j+3)/m z) mod (z m 1) by twisting fZ by (2)/m
9. Compute the FFT of f ( (2j)/(2m) z) mod (z 2m 1) to obtain
f ( (j) ), f ( (j+1) ), . . . , f ( (j+2m1) )
10. Compute the FFT of f ( (4j+2)/m z) mod (z m 1) to obtain
f ( (j+2m) ), f ( (j+2m+1) ), . . . , f ( (j+3m1) )
11. Compute the FFT of f ( (4j+3)/m z) mod (z m 1) to obtain
f ( (j+3m) ), f ( (j+3m+1) ), . . . , f ( (j+4m1) )
12. Return f ( (j) ), f ( (j+1) ), . . . , f ( (j+4m1) )
Figure 4. Pseudocode for split-radix FFT

where M (1) = 0, A(1) = 0, M (2) = 0, and A(2) = 2. Because the reduction


step involves two different output sizes, somewhat more advanced mathematics is
required to obtain the closed-form operation count formulas.

M (n) =
A(n)

1
3
1
3

n log2 (n)
n log2 (n)

8
9
8
9

n log2 (n)

n+
n+

8
9
10
9

if log2 (n) is even


if log2 (n) is odd

The use of generating functions from combinatorics is recommended for these


derivations. The number of multiplications by primitive 8th roots of unity is given
by

M8 (n) =

M8

n
2

+ 2 M8

n
4

+2

where M8 (1) = 0, M8 (2) = 0 and M8 (4) = 0. The solution to this recursion is


given by

80

4. FFT ALGORITHMS

M8 (n) =

1
3
1
3

n
n

4
3
2
3

if log2 (n) is even


if log2 (n) is odd

The number of operations in the real numbers is given by

MR (n) =
AR (n) =

4
n log2 (n)
3
8
n log2 (n)
3

38
n+
9
16
n
9

2
(1)log2 (n) + 6
9
2
(1)log2 (n) + 2
9

The results for input sizes which are a power of 4 are slightly more attractive than
other input sizes. In any event, the split-radix algorithm requires less operations
than the radix-8 algorithm.

9. Other 2-adic FFT algorithms


Radix-16 algorithms have been proposed (e.g. [5]), but they do not improve
upon the operation counts given in this chapter. This is because there does not
appear to be anything special involved in multiplying by a primitive 16th root of
unity that can be exploited to reduce the overall effort.
An extended split-radix [35] algorithm which mixes the radix-2 and radix8 reduction steps has also been proposed, but has been shown [6] to require the
same total number of operations as the algorithm presented in this chapter. This
paper [6] also claims that split-radix algorithms which mix the radix-2 and radix16 reduction steps and radix-4 and radix-16 reduction steps also require the same
number of operations. Another claim given in [6] is that a split-radix algorithm
combining any two different radix-2c FFT reduction steps requires the same total
number of operations.
Just when people were beginning to think that there was no way to improve
upon the split-radix algorithm discussed earlier in this chapter, a new version of this
algorithm arrived on the scene in a paper written by Johnson and Frigo [26] which
is based on earlier work of van Biskirk. Bernstein [3] independently developed his
own version of the algorithm as well as the present author using a preprint of the
Johnson and Frigo paper.
The basic idea of this algorithm is that all of the roots of unity are scaled to
points on the square illustrated below.

9. OTHER 2-ADIC FFT ALGORITHMS

81

P
r=1

One can show that multiplication by any point on this square requires the same
number of multiplications as the special primitive 8th roots of unity considered
earlier in this chapter.
EXAMPLE

Let us consider multiplication by sec(22.5o )22.5o .


This is a primitive 16th root of unity that has been
scaled by 1/ cos(22.5o ) and labeled as P in the figure
above. Now, this point is equivalent to

1/ cos(22.5o) (cos(22.5o ) + I sin(22.5o ))


1 + I tan(22.5o )

Now, multiplication of a complex number A + I B by


1 + I tan(22.5o) can be computed using

(A + I B) (1 + I tan(22.5o ))

(A tan(22.5o) B) + I (B + tan(22.5o ) A)

using the multiplication formula discussed in Chapter


3. Observe 2 real multiplications and 2 real additions
are required assuming that tan(22.5o) is stored in a
lookup table. The is the same number of operations
required for multiplication by a primitive 8th root of
unity.


Because of the use of the tangent (and cotangent) function in multiplication by


points on this square, Bernstein calls his version of the algorithm the Tangent
FFT.

82

4. FFT ALGORITHMS

The new algorithm works by replacing the points on the circle used with the
split-radix algorithm with the points on the square instead. This reduces the number of operations needed to implement the reduction steps, but distorts all of the
answers to incorrect results.
The trick is to carefully keep track of how much each result has been distorted
and then unscale the results later in the algorithm. The scaling factors used in the
actual algorithm are somewhat more complicated, but allow the multiplications
by points on the square to be used as often as possible. The places where the
results are unscaled are more expensive than the reduction steps of the split-radix
algorithm, but the idea is to have more reduction steps involving the points on the
square than reduction steps where results are unscaled.
Because of all of this bookkeeping, the algorithm is somewhat complicated and
uses four different modified versions of the split-radix reduction step to achieve
the effect described above. The end result is that it reduces the number of real
multiplications by about six percent as the FFT size becomes large. Those that
would like to learn more about this algorithm are encouraged to read the papers
cited above, but the details are too complicated to present here.

10. Properties of the DFT


In this section, we will discuss a number of properties of the DFT for which
is the basis of the FFT algorithms in this chapter. In other words, if f (z) is a
polynomial with complex coefficients, then the DFT of f (z) is the set of function
evaluations {f ( 0 ), f ( 1 ), . . . , f ( n1 )} where is a primitive nth root of unity.
In later sections, we will show how these properties relate to the properties of the
DFT commonly found in engineering textbooks and we will show some applications
of the properties presented in this section.
First, if f (z) and g(z) are two polynomials with complex coefficients, then the
evaluation of these functions is considered a linear operation. Although the property holds for all complex numbers, we will present the property in terms of the
roots of unity used for the DFT. In other words,

Linearity.
Given complex numbers a and b, polynomials f (z), g(z), and
h(z) = a f (z) + b g(z), then
h( j ) = a f ( j ) + b g( j ) for all 0 j < n.

EXAMPLE
A related property involves the product of two input polynomials.

10. PROPERTIES OF THE DFT

83

Convolution.
Given polynomials f (z), g(z), and h(z) = f (z) g(z), then
h( j ) = f ( j ) g( j ) for all 0 j < n.
EXAMPLE
We will see in Chapter 7 why this property is called convolution and why it
is probably the most important property of the FFT.
Next, suppose that the coefficients of some polynomial f (z) of degree less than
n were circularly shifted by i positions to form a new polynomial g(z). Then the
multipoint evaluation of g(z) is related to the multipoint evaluation of f (z) as follows:

Shifting.
If f (z) is a polynomial of degree less than n and
g(z) = z i f (z) mod (z n 1), then
g( j ) =

ij f ( j )

for all 0 j < n.

EXAMPLE
A related property instead circularly shifts the output of the multipoint evaluation

Modulation.
If f (z) is a polynomial of degree less than n and g(z) is a polynomial
formed by multiplying the coefficient of degree k in f (z) by ik for
all 0 k < n, then
g( j ) =
for all 0 j < n.

f ( j+k )

84

4. FFT ALGORITHMS

EXAMPLE
A similar result holds if the coefficients of f (z) are reversed.
Reversal.
If f (z) is a polynomial of degree less than n and g(z) = z i f (z 1 ),
then
g( j )

= i f ( j ) = i f ( nj )

for all 0 j < n.


EXAMPLE
The reversal of f (z) with respect to degree n is sometimes denoted frevn (z). If it
can be understood from the context of the situation what n is, then the notation
can be simplified to frev (z). This simplified notation will be used throughout the
remainder of the text.
If each of the coefficients of f (z) are conjugated, we obtain the following result
Complex conjugation.
If f (z) is a polynomial of degree less than n and g(z) is formed by
computing the complex conjugate of f (z), i.e. g(z) = f (z), then
g( j ) =

f ( j ) = f ( nj )

for all 0 j < n.


EXAMPLE
Up to this point, the properties of the DFT were given for any polynomial
with complex number coefficients. We will now present several related symmetry
properties which hold when the polynomial has special characteristics.

10. PROPERTIES OF THE DFT

85

Symmetry Property I (for real polynomial).


If f (z) is a polynomial of degree less than n with real coefficients, then
f ( j ) =
f ( nj ) =

f ( nj )
f ( j )

for all 0 j < n.


This property can be easily established using the complex conjugate property.
EXAMPLE
Symmetry Property II (for purely imaginary polynomial).
If f (z) is a polynomial of degree less than n with purely imaginary
coefficients, then
f ( j ) =
f (

nj

) =

f ( nj )
f ( j )

for all 0 j < n.


This property can be easily established by factoring out an I from the polynomial
and then using symmetry property I.
EXAMPLE
The next two properties involve special symmetries that exist with the coefficients that comprise the polynomial which we will assume to be of degree less than
n. We are going to view the coefficients as a sequence of numbers of length n. In
other words, if

f (z) = fn1 z n1 + fn2 z n2 + + f1 z + f0 ,


then we are interested in the sequence of numbers [f0 , f1 , . . . , fn2 , fn1 ]. Two
special types of sequences are given by the following definitions:

Even sequence.
A sequence of numbers [f0 , f1 , . . . , fn2 , fn1 ] of length n is said to
be even if fj = fnj for all 0 < j < n.

86

4. FFT ALGORITHMS

EXAMPLE
Odd sequence.
A sequence of numbers [f0 , f1 , . . . , fn2 , fn1 ] of length n is said to
be odd if fj = fnj for all 0 < j < n.

EXAMPLE
Now that we are clear what is meant by an even sequence and an odd sequence of
numbers, we are ready to state the remaining symmetry properties
Symmetry Property III (Even sequence polynomial).
If the coefficients of a polynomial f (z) of degree less than n form
an even sequence, then f ( j ) has no imaginary component for any
0 j < n, i.e.
f ( j )

= Re(f ( j ))

EXAMPLE
Symmetry Property IV (Odd sequence polynomial).
If the coefficients of a polynomial f (z) of degree less than n form an
odd sequence, then f ( j ) has no real component for any 0 j < n,
i.e.
f ( j ) =

Im(f ( j ))

EXAMPLE
The next result shows how to transform any polynomial into the sum of one whose
coefficients form an even sequence and one whose coefficients form an odd sequence
of numbers.

11. FFTS OF REAL DATA

87

Decomposition.
Any polynomial f (z) can be decomposed into the sum of an even
sequence polynomial and an odd sequence polynomial as follows:
f (z) + frev (z) f (z) frev (z)
+
2
2
= fa (z) + fb (z)

f (z) =

where fa (z) is the even sequence polynomial


fa (z) =

f (z) + frev (z)


2

and fb (z) is the odd sequence polynomial


fb (z) =

f (z) frev (z)


.
2

Here, frev (z) is determined by computing the reversal of f (z) with


respect to degree n and then conjugating each of the coefficients in
the resulting polynomial.
EXAMPLE
These symmetry properties probably appear complicated, but are important
because they allow any polynomial to be transformed into the sum of two polynomials, where the DFT of one of the polynomials is the real part of the DFT of the
original polynomial and the DFT of the other is the imaginary part of the DFT of
the original polynomial. We will exploit this situation in the next section. There
is much more that can still be said about the symmetry properties of the DFT.
However, most of the good literature on this topic are written from the engineering
perspective. The reader is encouraged to study [REF] to learn more about these
symmetry properties after understanding the engineering perspective covered in
Chapter 7 of the present text.
There are also several more useful properties of the DFT which were omitted
from this section. The reader can refer to [7] to learn about these properties. We
will discuss the important duality property in Chapter 5 after the inverse FFT has
been presented and will return to the convolution property again in Chapter 7.

11. FFTs of real data


If the input to an FFT consists entirely of real numbers, we can reduce the work
needed to compute the coefficients of a discrete Fourier series by roughly half. One
might be tempted to use Symmetry Property I to compute f ( j ) for 0 j n/2,

88

4. FFT ALGORITHMS

taking advantage of the fact that f ( nj ) is the complex conjugate of f ( j ) for


all 0 j < n/2. In practice, f ( j ) and f ( nj ) appear adjacent to each other in
the scrambled order of the FFT algorithm output, and there appears to be little
effort saved by only computing half of the FFT outputs. We will indeed use the
symmetry properties, but will take a different approach so that the effort needed
to compute this FFT is indeed reduced by almost half.
Let us define fe (z) to be a polynomial consisting of all of the even-indexed
terms of f (z) and fo (z) to be a polynomial consisting of all of the odd-indexed
terms. Thus,

fe (z) =
fo (z) =

fn2 z n/21 + fn4 z n/22 + + f2 z + f0

fn1 z n/21 + fn3 z n/22 + + f3 z + f1

We are going to create a new polynomial g(z) where all of the even indexed coefficients of f (z) are in the real components of g(z) and all of the odd-indexed
coefficients of f (z) are in the imaginary components of g(z). In other words, if f (z)
is a polynomial with all real coefficients represented by

f (z) =
=

fn1 z n1 + fn2 z n2 + + f1 z + f0

fe (z 2 ) + z fo (z 2 ),

then

g(z) =
=

(fn2 + I fn1 ) z n/21 + + (f0 + I f1 )

fe (z) + I fo (z)

Observe that if fe ( 2j ) and fo ( 2j ) are known for all 0 j < n/2, then we can
compute

f ( j ) =

fe ( 2j ) + fo ( 2j )

to compute f ( j ) for all 0 j < n/2. Symmetry Property I can then be used to
easily recover f ( j ) for all n/2 j < n. So the problem of computing the FFT of
f (z) at each of the n powers of has been transformed into computing the FFT
of g(z) at each of the n/2 powers of 2 .
The size n/2 FFT of g(z) can be computed in roughly half of the effort as
computing the size n FFT of f (z), but we then need to relate the FFT of g(z)
to computing the FFT of f (z). Here is where the symmetry properties from the
previous section come into play.
At first glance, the solution appears simple because

11. FFTS OF REAL DATA

g( 2j )

89

= fe ( 2j ) + I fo ( 2j )

and one is tempted to extract the real part of this result and call it fe ( 2j ) and
assign fo ( 2j ) to the imaginary part. The problem with this approach is that
fe ( 2j ) and fo ( 2j ) are themselves complex numbers which are combined to form
g( 2j ), the output of the FFT. We need a way of separating this output into the
components fe ( 2j ) and fo ( 2j ).
From Symmetry Property I, we know that since fe (z) is a purely real function,
then

fe ( nj ) =

fe ( j )

Similarly, from Symmetry Property II, we know that since fo (z) is a purely imaginary function, then

fo ( nj )

= fo ( j ).

So,

g( nj ) =

fe ( nj + I fo ( nj )

fe ( j ) + I fo ( j )

=
and
g( nj )

= fe ( j ) + I fo ( j )

= fe ( j ) I fo ( j )

So now using the method of elimination on the set of equations

g( 2j ) =
g( n2j ) =

fe ( 2j ) + I fo ( 2j )

fe ( 2j ) I fo ( 2j )

we are able to obtain the following formulas for fe ( 2j ) and fo ( 2j ):

fe ( 2j ) =
fo ( 2j ) =

g( n2j ) + g( 2j )
2
g( n2j ) g( 2j )
I
2

90

4. FFT ALGORITHMS

The same formulas can also be obtained by applying Symmetry Properties III and
IV to

g( 2j )

= fe ( 2j ) + I fo ( 2j )

by decomposing the two polynomials on the right side of the equation into sums of
an even sequence polynomial and an odd sequence polynomial. This derivation is
left as an exercise.
One detail that needs to be worked out is how to compute f ( n/2 ) = f (1).
Note that the above formulas do not apply to this case and g( n/2 ) is not computed
as part of the FFT. However, it can be easily shown that f ( n/2 ) = fe ( n/2 )
fo ( n/2 ) and so the entire FFT of f (z) can be computed using just the components
of the FFT of g(z) with roughly half of the effort.
In summary, here are the steps that we need to do to compute the FFT of a
real polynomial:

11. FFTS OF REAL DATA

FFT of a real polynomial.


Given a polynomial f (z) = fn1 z n1 + fn2 z n2 + + f1 z + f0
with real coefficients, the FFT of this polynomial can be efficiently
computed as follows:
1. Construct
g(z) =

(fn2 + I fn1 ) z n/21 + + (f0 + I f1 )

2. Compute the size n/2 FFT of g(z) to obtain

{g( 0 ), g( 2 ), g( 4 ), . . . , g( n2 )}

3. For each 0 j < n/2, compute


fe ( 2j ) =
fo ( 2j ) =

g( n2j ) + g( 2j )
2
g( n2j ) g( 2j )
I
2

where
fe (z) =
fo (z) =

fn2 z n/21 + fn4 z n/22 + + f2 z + f0

fn1 z n/21 + fn3 z n/22 + + f3 z + f1

4. For each 0 j < n/2, compute


f ( j ) =

5. Compute

fe ( 2j ) + fo ( 2j ).

f ( n/2 ) =

fe ( 0 ) fo ( 0 )

6. For each n/2 < j < n, compute


f ( j ) =
EXAMPLE

f ( nj )

91

CHAPTER 5

Inverse FFT algorithms


1. Lagrangian Interpolation
The inverse of the task of evaluating a polynomial at a collection of points is
to interpolate these evaluations back into the original polynomial. In other words,
let f be an unknown polynomial of degree less than n and suppose that we are
given the evaluation of f at n arbitrary points {0 , 1 , . . . , n1 }. In other words,
we are given {a0 , a1 , . . . , an1 } where f (i ) = ai for 0 i < n. Our task is to
recover the unknown polynomial f . Here, we will assume that f is a polynomial
with complex number coefficients. The technique of Lagrangian interpolation can
be used to accomplish this goal.

LaGrange interpolating polynomials.


Given a set {0 , 1 , , n1 } of n points, let Li (z) be defined by
Li (z) =

n1
Y

j=0,j6=i

z j
i j

(z 1 ) (z i1 ) (z i+1 ) (z n1 )
(i 1 ) (i i1 ) (i i+1 ) (i n1 )
for all 0 i < n. These LaGrange interpolating polynomials
have the property that

0 if j 6= i
Li (j ) =
1 if j = i
=

Note that each Lagrange interpolating polynomial has degree n 1.

We will assume that {0 , 1 , , n1 } is fixed so that the Lagrangian interpolating


polynomials can be precomputed. For an arbitrary collection of points, it requires
significant effort to compute these polynomials.
EXAMPLE.
Consider the function g0 (z) = a0 L0 (z). Observe that g0 (0 ) = a0 1 = a0 and
that g0 (j ) = a0 0 = 0 for all j > 0. Similarly, define the function gi (z) = ai Li (z)
for all 0 < i < n. Observe that gi (i ) = ai 1 = ai and that gi (j ) = ai 0 = 0 for
all j 6= i.
93

94

5. INVERSE FFT ALGORITHMS

EXAMPLE.
We can construct f by combining g0 , g1 , . . . gn1 as given by

f (z) = a0 L0 (z) + a1 L1 (z) + + an1 Ln1 (z)


Using this equation, one can verify that f (i ) = ai for 0 i < n.
EXAMPLE.
Since f (z) expressed in terms of LaGrange interpolating polynomials has n
terms, each of which consists of multiplying a constant by a polynomial of degree
n 1, then a total of n2 multiplications and n2 n additions are required to recover
f , assuming that the Lagrangian interpolating polynomials are precomputed. For
the rest of this chapter, we will consider the case where the points are the primitive
nth roots of unity which will reduce these operation counts.

2. Classical radix-2 inverse FFT


The radix-2 inverse FFT algorithm interpolates n = 2k evaluations into a polynomial f of degree less than n. Here, the n points used for the function evaluations
are nth roots of unity and roots of z n 1, presented in the order determined by
the function.
To undo the classical radix-2 reduction step for an FFT of size n = 2k , we need
to receive as input

fY
fZ

= f mod (z m b)

= f mod (z m + b)

and produce as output fA z m + fB = f mod (z 2m b2 ). Here, m = 2i for some


i < k and b = (2j) where = eI2/n and 0 j < 2ki1 . Here, fY and fZ are
each polynomials of degree less than m.
From the classical radix-2 FFT algorithm reduction step, we know that


fY
fZ

b 1
b 1

 

fA

fB

or

fY
fZ

= b fA + fB

= b fA + fB

2. CLASSICAL RADIX-2 INVERSE FFT

95

The technique of elimination can be used to solve these equations for fA and fB .
The results of these computations are

= 1/2 b1 (fZ fY )
= 1/2 (fY + fZ )

fA
fB
or


fA
fB

(2j)
1

(2j)
1


 
fY

fZ

once b1 = (2j) )1 = (2j) has been substituted into these equations. Observe
that (2j) = n(2j) . As a butterfly operation, the computation is expressed as

fB

fA

1/2
+

fY

(2j) /2

fZ

This could be used for the interpolation step of the classical radix-2 inverse FFT
(or IFFT) algorithm, but it has the drawback that two multiplications by 1/2 are
required in every interpolation step. This adds an extra step to the interpolation
step of the radix-2 algorithms and prevents the use of the special multiplication by I
needed to construct the radix-4, radix-8, and split-radix inverse FFT interpolation
steps.
Instead, the classical radix-2 IFFT saves the multiplications by 1/2 in each of
the interpolation steps until the end of the inverse FFT computation. So the inputs
of every interpolation step are scaled by m = 2i where m is the maximum size of
the inputs and the output is scaled by 2m. In other words, the revised interpolation
step receives as input

fY
fZ

= m f mod (z m (2j) )

= m f mod (z m (2j+1) )

and produces fA z m +fB = (2m)f mod (z 2m (j) ) as output using the formulas

96

5. INVERSE FFT ALGORITHMS

fA
fB

=
=

(2j) (fY fZ )
fY + fZ

So the revised interpolation step can be expressed with the following butterfly
diagram

fB

fA
(2j)

fY

fZ

Again, the input polynomials are scaled by m, the output polynomials are scaled
by 2m, and the factor of 1/2 has been removed from the interpolation step.
EXAMPLE
The input to the IFFT algorithm is f ( (j) ) = f mod (z (j) ) for all 0
j < n. By recursively applying the interpolation step, we obtain n f mod (z n 1)
at the end of the algorithm. Assuming that f (z) has degree less than n, then this
output is equal to n f (z). By multiplying this result by 1/n, then the desired
polynomial f (z) is recovered.
The following butterfly diagram shows how the process would work on an FFT
of size 8.

2. CLASSICAL RADIX-2 INVERSE FFT

97

Algorithm : Classical radix-2 IFFT


Input: The evaluations f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+2m1) )
of some unknown polynomial f . Here, is a primitive
nth root of unity and m is a power of 2 where 2m n.
Output: (2m) f mod (x2m (j) )
0. If (2m) = 1 then return f mod (x (j) ) = f ( (j) )
1. Compute the IFFT of f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+m1) )
to obtain fY = m f mod (xm (2j) )
2. Compute the IFFT of f ( (j2m+m) ), f ( (j2m+m+1) ), . . . , f ( (j2m+2m1) )
to obtain fZ = m f mod (xm (2j+1) )
3. Compute fA = ( (2j) )1 (fY fZ )
4. Compute fB = fY + fZ
5. Return (2m) f mod (x2m (j) ) = fA xm + fB
Figure 1. Pseudocode for classical radix-2 IFFT

n f0

n f1

n f2

n f3

0
+

f ( 0 )

f ( 4 )

n f4

1
+

f ( 6 )

n f7

f ( 2 )

n f6

2
+

n f5

f ( 1 )

f ( 5 )

3
+

f ( 3 )

f ( 7 )

Note that the outputs need to be scaled by 1/n to produce the correct coefficients
of the desired polynomial.
Pseudocode for this IFFT algorithm is given in Figure 1. The cost analysis
of this algorithm is left as an exercise for the reader. The number of operations
required for each line of the algorithm is identical to one of the lines of the classical
radix-2 FFT algorithm. Thus, the total number of operations needed to compute
the scaled IFFT of size n is

M (n) =
A(n)

n n
+
2M
2
 n2
2A
+n
2

98

5. INVERSE FFT ALGORITHMS

where M (1) = 0 and A(1) = 0. Subtracting n 1 multiplications to account for the


cases where j = 0, but adding n multiplications to produce to unscaled polynomial
f results in the formulas

M (n) =
A(n)

1
n log2 (n) + 1
2
n log2 (n)

Thus, the classical radix-2 IFFT algorithm requires the same number of operations
as the companion FFT algorithm, except that n extra multiplications are required
to multiply the final result of the algorithm by 1/n to produce the correct answer.
Again, this scaling is necessary because there are factors of 1/2 ignored in each of
the k = log2 (n) levels of interpolation steps of the algorithm.
3. Twisted radix-2 IFFT
We now going to consider how to undo the twisted radix-2 FFT reduction step.
Recall that this process had two parts. First, we receive as input the polynomial
fA xm + fB = f = f ( (j)/(2m) z) mod (z 2m 1) and produce
= f mod (z m 1)
= f mod (z m + 1)

fY
fZ

as output. Now, fY is also equal to f ( (2j)/(m) z) mod (z m 1) and can also be


used as input to another application of this reduction step. However, fZ is equal
to f ( (2j)/(m) z) mod (z m + 1) and is not yet in the proper form. The second
part of the process was to compute the twisted polynomial of fZ with parameter
= (1)/m to transform this result into f ( (2j+1)/(m) z) mod (z m 1), which
can be used as input to another application of the reduction step.
The classical radix-2 inverse FFT discussed in the previous section ignored
factors of 1/2 that were introduced in the interpolation steps. We will follow this
same practice for the twisted radix-2 inverse FFT as well. So, the interpolation
step of the twisted radix-2 inverse algorithm will produce output 2m f ( (j)/(2m)
z) mod (z 2m 1) given the two inputs m f ( (2j)/(m) z) mod (z m 1) and m
f ( (2j+1)/(m) z) mod (z m 1). To do so, we will first transform mf ( (2j+1)/(m)
z) mod (z m 1) into m f ( (2j)/(m) z) mod (z m + 1). It can be shown that the
twisted polynomial with parameter = (1)/m can be used to achieve this goal.
The inputs to the revised interpolation step have been transformed into

fY

fZ

m f mod (z m 1)

m f mod (z m 1)

3. TWISTED RADIX-2 IFFT

99

where f = fA xm + fB is the desired output 2m f ( (j)/(2m) z) mod (z 2m 1).


By applying the interpolation step of the classical radix-2 inverse algorithm with
b = 1, the formulas

fA
fB

= fY fZ

= fY + fZ

can be derived to determine the two components of the output. Observe that there
are no multplications required in this part of the interpolation step.
EXAMPLE.
The algorithm is initialized with f ( (j) ) = f ( (j) z) mod (z 1) for all j
in the range 0 j < n. By recursively applying the interpolation step to these
results, we obtain n f (z) mod (z n 1) = n f (z) if f (z) has degree less than n.
By multiplying each coefficient of this result by 1/n, the desired polynomial f (z)
is recovered. The following butterfly diagram shows how the process works for an
inverse FFT of size 8.
We now provide a complete butterfly diagram for this FFT of size 8.

n f0

n f1

f ( 0 )

n f2

f ( 4 )

n f3

2
+

f ( 2 )

f ( 6 )

n f4

n f5

n f6

n f7

f ( 1 )

f ( 5 )

f ( 3 )

f ( 7 )

Pseudocode for this IFFT algorithm is given in Figure 2. The cost analysis
of this algorithm is again left as an exercise for the reader. The total number of
operations to compute the twisted radix-2 IFFT of size n is

100

5. INVERSE FFT ALGORITHMS

Algorithm : Twisted radix-2 IFFT


Input: The evaluations f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+2m1) )
of some polynomial with coefficients in a ring R.
Here R has a nth root of unity , and m is a power of 2 where 2m n.
Output: 2m f ( (j)/(2m) z) mod (z 2m 1)
0. If (2m) = 1 then return f ( (j) z) mod (z 1) = f ( (j) )
1. Compute the IFFT of f ( (j2m+0) ), f ( (j2m+1) ), . . . , f ( (j2m+m1) )
to obtain fY = m f ( (2j)/m z) mod (z m 1)
2. Compute the IFFT of f ( (j2m+m) ), f ( (j2m+m+1) ), . . . , f ( (j2m+2m1) )
to obtain m f ( (2j+1)/m z) mod (z m 1)
3. Twist m f ( (2j+1)/m z) mod (z m 1) by (1)/m
to obtain fZ = m f ( (2j)/m z) mod (z m + 1)
4. Compute fA = fY fZ
5. Compute fB = fY + fZ
6. Return (2m) f ( (j)/(2m) z) mod (z 2m 1) = fA z m + fB
Figure 2. Pseudocode for twisted radix-2 IFFT

n n
M (n) = 2 M
+ 1
2
 n2
+n
A(n) = 2 A
2
where M (1) = 0 and A(1) = 0. After solving these recurrence relations and including the extra n multiplications to unscale the final result of this algorithm, the
formulas

M (n) =
A(n)

1
n log2 (n) + 1
2
n log2 (n)

are obtained. This algorithm has the exact same operation count as the classical
radix-2 IFFT algorithm and differs from the companion FFT algorithm only in
the n extra multiplications needed to undo the scaling introduced by ignoring the
factors of 1/2 in the interpolation steps.

4. Other 2-adic inverse FFT algorithms


The main purpose of the previous section was to show how to construct an
inverse FFT algorithm given any FFT algorithm. Specifically, one should perform
the opposite of each step of the FFT algorithm in reverse order. Generally, this
will involve solving a system of equations to determine the formulas needed to
undo part of the reduction step. Any twisting of polynomials can be undone by
performing a twist with the opposite rotation used in the FFT algorithm.

4. OTHER 2-ADIC INVERSE FFT ALGORITHMS

101

A more advanced mathematical background is needed to solve the systems of


equations involved with the remaining FFT algorithms discussed in Chapter 4.
Those with this background are encouraged to try and develop the appropriate
formulas for the inverse algorithms using this approach. However, we learned in
Chapter 4 that a number of expressions from these formulas could be combined into
a more efficient method of computing the reduction step and butterfly operations
were drawn to summarize these results.
Given a butterfly diagram for an FFT algorithm, another way of obtaining
the steps needed to implement the inverse algorithm is to reverse the direction
of the arrows in the FFT algorithm and record the inverse of any multiplicative
factors involved in this process. We will now give the resulting butterfly diagrams
associated with the other FFT algorithms discussed in Chapter 4 for the case of an
FFT of size 8. These can be used as starting points for writing pseudocode for any
of these inverse FFT algorithms. The cost of any of these inverse FFT algorithms
is the same as the cost of the companion FFT algorithm, plus n multiplications
needed to implement the scaling at the end of the inverse FFT.
Classical radix-4 inverse FFT:

n f0

n f1

n f2

f ( 0 )

f ( 4 )

0
+

n f3

n f4

f ( 2 )

Twisted radix-4 inverse FFT:

f ( 6 )

n f6

n f7

20

30

30

2
+

n f5

20

1
+

f ( 1 )

f ( 5 )

3
+

f ( 3 )

f ( 7 )

102

5. INVERSE FFT ALGORITHMS

n f0

n f1

f ( )

n f2
+

2
+

f ( )

n f3

f ( )

n f4

n f5

1
+

f ( )

f ( )

n f6

f ( )

n f7

f ( )

f ( 7 )

Classical radix-8 inverse FFT:

n f0

n f1

n f2

20

n f3

30

n f4

n f5

n f6

50

60

70

I
+

f ( 0 )

f ( 4 )

f ( 2 )

f ( 6 )

n f7

40

I
+

f ( 1 )

f ( 5 )

f ( 3 )

f ( 7 )

Note: For the n = 8 case, the butterfly diagram of the twisted radix-8 inverse
FFT is the same as that shown above for the classical radix-8 inverse FFT, with
the exception that the factors { 0 , 20 , 70 } should instead appear on the
bottom row of the diagram to implement the twisted polynomial.
Split-radix inverse FFT:

5. THE DUALITY PROPERTY

n f0

n f1

n f2

n f3

f ( 0 )

f ( 4 )

f ( 2 )

f ( 6 )

n f5

I
+

n f4

103

1
+

f ( 1 )

f ( 5 )

n f6

n f7

1
+

f ( 3 )

f ( 7 )

In all of the inverse FFTs considered in this chapter, the input was in scrambled
order while the output was in the natural order. Some inverse FFT algorithms
instead have the input in the natural order {f ( 0 ), f ( 1 ), , f ( n1 ) and the
output in scrambled order. We will address this in more detail later in the text.
The material in the next section will help to explain why there are these different
versions of inverse FFT algorithms.
5. The duality property
If f (z) is a polynomial with complex number coefficients, then the discrete
Fourier transform of f (z) can be represented as a polynomial using

F (w)

= f ( n1 ) wn1 + f ( n2 ) wn2 + + f ( 1 ) w + f ( 0 )

So if Fd is the coefficient of wd in F (w), then

Fd

f ( d ) =

n1
X
j=0

fj ( d )j

for all 0 d < n where is a primitive nth root of unity. The discrete Fourier
transform can be efficiently computed using the FFT algorithms discussed in Chapter 4, but the outputs of the algorithm need to be rearranged according to the
function.
Suppose that we are given F (w), i.e. the evaluation of f (z) at each of the nth
roots of unity and we wish to recover the polynomial f (z). This is the problem of
computing the IFFT and several algorithms were given earlier in this chapter for

104

5. INVERSE FFT ALGORITHMS

determining f (z). In this section, we give an alternative technique for computing


the IFFT.
At the beginning of this chapter, we learned that given F0 , F1 , . . . , Fn1 , we
could recover f using Lagrangian interpolation given by the formula

f (z) =

n1
X
i=0

Fi Li (z)

where Li (z) is the Lagrange interpolating polynomial. In this case, Li (z) is given
by

Li (z) =
=

1
zn 1
i
n
z i

1  i n1
z
+ 2i z n2 + + (n1)i z + 1
n

The above formulas represent a rather complicated method of generating the


polynomial F (w). Fortunately, there is a much easier method. Observe that
1
n
1
+
n
1
+
n
..
.
1
+
n

f (z) =

F0 z n1 + F0 z n2 + + F0 z + F0


F1 z n1 + F1 2 z n2 + + F1 n1 z + F1


F2 2 z n1 + F2 4 z n2 + + F2 2(n1) z + F2

Fn1 n1 z n1 + Fn1 2(n1) z n2

+ + Fn1 (n1)(n1) z + Fn1

Collecting all common terms of z j together, the coefficient of z j in f , denoted by


fj , is given by

fj

1 
Fn1 (n1)(nj) + Fn2 (n2)(nj) +
n

+ F2 2(nj) + F1 nj + F0

1
F ( j )
n

In other words, fj can be determined by simply evaluating F (w) at j and scaling

5. THE DUALITY PROPERTY

105

the result by 1/n. So, f can obtained by computing the FFT of F (w), but replacing
with 1 in the computation.
This relationship between the FFT and IFFT algorithms is known as the duality property. In summary,
Duality Property.
Suppose that f (z) is a function and F (w) is the polynomial representation of the discrete Fourier transform of f (z).
(1). If F (w) is computed with an FFT algorithm with input
f (z) and primitive root of unity , then the IFFT of F (w) can be
computed with the same FFT algorithm, but with replaced by 1 .
(2). If f (z) can be computed with an IFFT algorithm with
input F (w) and primitive root of unity 1 , then the FFT of f (z)
can be computed with the same IFFT algorithm, but with 1
replaced by .

It should be noted that in part 1 of the duality property above, the final result
of the FFT computation needs to be scaled by 1/n to produce f (z). In part 2 of
the duality property, we will assume that the IFFT algorithm does not include the
scaling by 1/n and that this scaling should not be implemented to determine the
FFT of f (z).
EXAMPLE
The only drawback to the reuse of an FFT algorithm to compute the IFFT
is that any outputs of an FFT algorithm must be unscrambled according to the
function before being used as inputs to the FFT algorithm to compute the IFFT.
The outputs of the IFFT computation must be also be unscrambled to produce
the coefficients of f (z) in the correct order. At most 2n additional operations are
needed to complete these two shufflings of the algorithm data.
If the extra operations needed to reuse an FFT algorithm is too expensive to
justify having one algorithm to compute both the FFT and inverse FFT, then one
can construct a separate IFFT algorithm by essentially performing the inverse of
the steps of any FFT algorithm in reverse order as discussed in the previous sections
of this chapter. Neverthesless, duality is an important property of the FFT as we
shall see in Chapter 7.

CHAPTER 6

Polynomial multiplication algorithms


1. Classical Multiplication
In Chapter 2. we considered the following technique for multiplying two polynomials

Classical multiplication.
Suppose that f and g are two polynomials that we wish to multiply.
Select one of the polynomials, say f . Then multiply each term of f
by every term of g and collect like terms.

In an introductory algebra course, problems with only two or three terms in the
input polynomials are usually considered. This is because if input polynomial has
n terms, then a total of n n = n2 terms need to be computed to obtain the output polynomial and it requires significant effort to perform the computation for
values of n higher than those considered in the introductory algebra course. The
following example shows how much effort is required to compute the product of two
polynomials with four terms.

EXAMPLE

The product h(x) = f (x) g(x) where


f (x) = 4x3 + 3x2 + 2x + 1 and
g(x) = x3 + x2 + x + 2 is given by
h(x)

= (4x3 + 3x2 + 2x + 1) (x3 + x2 + x + 2)

= 4x3 x3 + 4x3 x2 + 4x3 x + 4x3 2


+ 3x2 x3 + 3x2 x2 + 3x2 x + 3x2 2
+ 2x x3 + 2x x2 + 2x x + 2x 2
+ 1 x3 + 1 x2 + 1 x + 1 2

= 4x6 + 4x5 + 4x4 + 8x3 + 3x5 + 3x4 + 3x3 + 6x2


+ 2x4 + 2x3 + 2x2 + 4x + x3 + x2 + x + 2
= 4x6 + 7x5 + 9x4 + 14x3 + 9x2 + 5x + 2

107

108

6. POLYNOMIAL MULTIPLICATION ALGORITHMS

Another way of organizing this work arranges the like terms in columns. This
technique resembles multiplication of integers and is illustrated by the following
example.

The product of f (x) = 4x3 + 3x2 + 2x + 1 and


g(x) = x3 + x2 + x+ 2 can also be computed as follows:

EXAMPLE

4 x3 + 3 x2 + 2 x + 1
1 x3 + 1 x2 + 1 x + 2
8 x3 + 6 x2 + 4 x + 2
4 x + 3 x3 + 2 x2 + 1 x
4 x5 + 3 x4 + 2 x3 + 1 x2
6
4 x + 3 x5 + 2 x4 + 1 x3
4

4 x6 + 7 x5 + 9 x4 + 14x3 + 9 x2 + 5 x + 2


Still yet another method of implementing classical multiplication generates the


terms of the product polynomial one at a time.

Classical multiplication (alternate version).


Suppose that
= fn1 xn1 + fn2 xn2 + + f1 x + f0

f (x)

= gn1 xn1 + gn2 xn2 + + g1 x + g0

g(x)

are two polynomials that we wish to multiply and the product of


these polynomials is given by
h(x)

h2n2 x2n2 + h2n1 x2n1 + + h1 x + h0

Then for any k in 0 k < 2n:


hk

n1
X
i=0

fi gki

Here, we will assume that gj = 0 for all j < 0.

1. CLASSICAL MULTIPLICATION

EXAMPLE

109

Let f (x) = 4x3 + 3x2 + 2x + 1 and let


g(x) = x3 + x2 + x + 2.
Then
f3 = 4

g3 = 1

f2 = 3
f1 = 2

g2 = 1
g1 = 1

f0 = 1

g0 = 2

and the product f (x) g(x) computed using the


alternative technique is given by
h0
h1

=
=

f0 g0 + f1 g1 + f2 g2 + f3 g3
12+01+01+01

f1 g0 + f0 g1 + f1 g2 + f2 g3

=
h2

=
=

h3

=
=

h4

=
=

h5

=
=

h6

=
=

22+11+01+01

f2 g0 + f1 g1 + f0 g2 + f1 g3
32+21+11+01

f3 g0 + f2 g1 + f1 g2 + f0 g3
42+31+21+11

f4 g0 + f3 g1 + f2 g2 + f1 g3
02+41+31+21

f5 g0 + f4 g1 + f3 g2 + f2 g3

= 2

= 5

= 9

= 14

= 9

02+01+41+31

= 7

f6 g0 + f5 g1 + f4 g2 + f3 g3
02+01+01+41

= 4

Thus, h(x) = 4x6 + 7x5 + 9x4 + 14x3 + 9x2 + 5x + 2.




Observe that the computation of the product with the alternative version of
classical multiplication matches the product computed with the traditional classical
multiplication method. The traditional classical multiplication method is recommended for computing the entire product polynomial; the alternative method is
recommended if just a few terms of the product polynomial are to be computed.
For those experienced with computer programming, this is because there are fewer
loops involved with the first method and there is a slight cost associated with each
loop in terms of the time required to perform the multiplication.

110

6. POLYNOMIAL MULTIPLICATION ALGORITHMS

All of the methods discussed in this section are quadratic multiplication methods because the effort needed to multiply two polynomials using these techniques is
proportional to the square of the maximum degree of the input polynomials. One
can show that each technique requires n2 coefficient multiplications and n2 2n + 1
coefficient additions to implement the polynomial multiplication. These derivations
are left as exercises. The rest of this chapter will explore several faster methods of
multiplying two polynomials.

2. Karatsuba multiplication
Suppose that we wish to multiply the two polynomials f1 x + f0 and g1 x + g0 .
If the classical multiplication method discussed in the previous section was used,
we would obtain the product f1 g1 x2 + (f1 g0 + f0 g1 ) x + f0 g0 . This requires 4
coefficient multiplications to compute the result.
Suppose we compute

(f1 + f0 ) (g1 + g0 ) = f1 g1 + f1 g0 + f0 g1 + f0 g0

The sum of the middle two terms of this expression is the coefficient of x in the
above polynomial product. The other two terms are the coefficient of x2 and the
constant term. Then an alternative method for computing the middle term of the
above product polynomial is

(f1 + f0 ) (g1 + g0 ) f1 g1 f0 g0

So one can compute the coefficient of x2 and the constant term as before and then
use this formula to determine the coefficient of x.

2. KARATSUBA MULTIPLICATION

EXAMPLE

111

Suppose that we wish to compute the product of


f (x) = 2x + 1 and g(x) = x + 2 using Karatsuba
multiplication.
Observe that f1 = 2, f0 = 1, g1 = 1, and g0 = 2.
Next, compute (f1 + f0 ) = 2 + 1 = 3 and
(g1 + g0 ) = 1 + 2 = 3.
Then determine the intermediate products
f1 g1
f0 g0

=
=

(f1 + f0 ) (g1 + g0 ) =

21 = 2
12 = 2

33 = 9

Now, (f1 + f0 ) (g1 + g0 ) f1 g1 f0 g0 is 9 2 2 = 5


and the desired product is
(f1 g1 )x2 +((f1 +f0 )(g1 +g0 )f1 g1 f0 g0 )x+f0 g0
or 2x2 + 5x + 2.

This technique reduces the number of coefficient multiplications from 4 to 3,
but the number of coefficient additions and subtractions required has increased. If
a coefficient multiplication is more expensive than a coefficient addition (such as
when the coefficients are complex numbers), this is a good trade. If the coefficients
are real numbers, then we have not yet improved classicial multiplication.
The real power of Karatsuba multiplication is that the above idea can be used
recursively. Let f (z) and g(z) be two polynomials of degree less than 2m which can
have complex number coefficients and split each polynomial into two polynomials
of degree less than m. Thus,

f (z) =
g(z) =

fA z m + fB
gA z m + gB

Observe that fA , fB , gA and gB are each polynomials of degree less than m. The
polynomials f (z) and g(z) can be multiplied into the product polynomial h(z) using

h(z) =
=
=

f (z) g(z)

(fA z m + fB ) (gA z m + gB )

fA gA z 2m + (fA gA + gB fA ) z m + fB gB

A total of four products of two polynomials of degree less than m are required for
this computation. However, the middle term can also be computed using:

112

6. POLYNOMIAL MULTIPLICATION ALGORITHMS

fB gA + gB fA

(fA + fB ) (gA + gB ) fA gA fB gB

So h(z) can be computed by

h(z) =

fA gA x2m + ((fA + fB ) (gA + gB ) fA gA fB gB ) xm


+fB gB

with only three products of two polynomials of degree less than m. Here, fA gA ,
fB gB , and (fA + fB ) (gA + gB ) can be computed with recursive calls to either
Karatsuba or classical multiplication. When each input polynomial has two terms,
the product can be computed using the technique described at the beginning of
this section and no more recursion is necessary.

2. KARATSUBA MULTIPLICATION

EXAMPLE

113

Suppose that we wish to compute the product of


f (x) = 4x3 + 3x2 + 2x + 1 and
g(x) = x3 + x2 + x + 2 using Karatsuba multiplication.
Construct the polynomials
fA
fB

=
=

4x + 3
2x + 1

gA
gB

=
=

x+1
x+2

by splitting each input polynomial into two parts.


Note that f (x) = fA x2 + fB and g(x) = gA x2 + gB .
Next, compute (fA +fB ) = (4x+3)+(2x+1) = 6x+4
and (gA + gB ) = (x + 1) + (x + 2) = 2x + 3.
The intermediate products
fA gA

fB gB
(fA + fB ) (gA + gB )

= 4x2 + 7x + 3
= 2x2 + 5x + 2
= 12x2 + 26x + 12

can be computed using either classical multiplication


or by recursively applying Karatsuba multiplication.
The first of these products was computed in a previous
example. The other two products are left as exercises.
Now, (fA + fB ) (gA + gB ) fA gA fB gB
is computed as follows:
12x2 + 26x + 12
4x2 + 7x + 3
2x2 + 5x + 2
6x2 + 14x + 7
Since m = 2 in this example, we multiply this result
by x2 to obtain 6x4 + 14x3 + 7x2 and add it to
(fA gA ) x4 = 4x6 + 7x5 + 3x4 and fB gB to obtain
the desired product.
4x6 + 7x5 + 3x4
+ 6x4 + 14x3 + 7x2
+ 2x2 + 5x + 2
4x6 + 7x5 + 9x4 + 14x3 + 9x2 + 5x + 2

It can be shown that Karatsubas method requires

114

6. POLYNOMIAL MULTIPLICATION ALGORITHMS

M (2n 1) =

nlog2 (3) n1.585

coefficient multiplications and

A(2n 1) =

6 nlog2 (3) 8n + 2 6 n1.585 8n + 2

coefficient additions to multiply two polynomials of degree less than n into a polynomial of degree less than 2n1. For even moderate-sized polynomials, this technique
is superior to classical multiplication.
3. FFT-based multiplication
An even faster technique for multiplying two polynomials is based on the observation that

h() =

f () g()

where is any element from the number system used for the coefficients of f , g,
and h.
To multiply polynomials f and g of degree less than n into a polynomial of
degree less than 2n, one could choose any selection of 2n points and evaluate f
and g at each of these points. The above formula can then be used to obtain the
evaluation of h at each of these points using only 2n coefficient multiplications.
These 2n evaluations can then be interpolated into the desired product polynomial
h. In summary:

FFT based multiplication.


Suppose that f (z) and g(z) are two polynomials that we wish to
multiply into h(z) = f (z) g(z). Follow the following steps:
1. Evaluate f (z) at n points.
2. Evaluate g(z) at the same n points.
3. Pointwise multiply f () g() to obtain h() for each of the points.
Here, simply represents one of the points.
4. Interpolate the n evaluations of h(z) into h(z).

3. FFT-BASED MULTIPLICATION

115

EXAMPLE
Based on the multipoint evaluation and interpolation techniques from Chapter
2, this technique will be comparable in performance to classical multiplication and
will likely be less efficient than Karatsubas multiplication if an arbitrary collection
of points is used. However, we learned that polynomial evaluation and interpolation
can be significantly improved if the powers of a primitive root of unity are used
instead. Since it does not matter what points are used to evaluate and interpolate
the polynomials, it is advantageous to select the powers of the primitive roots of
unity for the computations as use the algorithms of Chapters 4 and 5 for steps 1,
2, and 4.
EXAMPLE
The ordering of the points is not important for the FFT-based multiplication
technique as well. Thus, one can leave the evaluations of f (z) and g(z) in scrambled order since the inverse FFT algorithms of Chapter 5 expect the input to be
in this scrambled order.
The number of operations needed to implement FFT-based multiplication is
equal to 2 times the number of operations required to compute the multipoint
evaluation of a polynomial of degree less than n, plus the number of operations
needed to interpolate n points into a polynomial of degree less than n, plus n
multiplications for the pointwise products. If the powers of a primitive root of
unity are used for the evaluation points, then the effort needed for the computation
is roughly 3 times the effort needed to compute an FFT of size n.

CHAPTER 7

The engineering perspective


1. The Discrete Fourier Series
The previous four chapters were designed to introduce the reader to the version
of the Fast Fourier Transform used by computer algebraists and those that study
certain types of error-correcting codes in Electrical Engineering. The more popular
view of the FFT, however, is that which was introduced at the very beginning of the
text. In this first chapter, we introduced the concept of the Discrete Fourier Series
and how it could be used to approximate an analog signal over a finite interval.
In this setting, recall that the Fast Fourier Transform was defined as an efficient
method of interpolating a sequence of signal samples into the coefficients of this
Discrete Fourier Series.
The example in this introductory chapter was specially chosen to be symmetric
about the vertical axis so that the presentation would avoid the use of complex
variables. Now that the complex numbers have been studied in sufficient detail in
Chapter 3, we will now more carefully present the concept of the Discrete Fourier
Series, allowing for the possibility of sine functions which can appear as components
of the series for signals in general.
Suppose that we are given a discrete-time signal x( ) which consists of n samples over a finite interval. 1 It is possible to construct a function called the discrete
Fourier series which matches x( ) over this interval where this function is given by

Discrete Fourier Series.


x( )

a0 +

n2
X

k=1

(ak cos(2k/n ) + bk sin(2k/n ))

where n2 =

n/2
if n is even
(n 1)/2 if n is odd

Note that since the above function consists entirely of sine and cosine functions,
then x( ) will be a periodic function with a period of n samples. Since the original
1The use of the notation x( ) is more traditional than f ( ) in engineering literature, so we
will switch to this convention in this chapter.
117

118

7. THE ENGINEERING PERSPECTIVE

function is nonzero only over a finite interval, the discrete Fourier series will match
this function at all of the points of interest and will repeat this pattern of signal
samples outside of the specified interval.
EXAMPLE.
In chapter 1, we mentioned that the Discrete Fourier Series consisted of at
most n sinusoids. Let us more carefully investigate this claim in light of the above
formula for the Discrete Fourier Series.
First, we can interpret a0 as a0 cos(0 ) where cos(0 ) = 1 for all integer
values of . We can also view b0 (which does not appear in the above formula)
as b0 sin(0 ). This term does not appear because sin(0 ) = 0 for all integer
values of and thus does not contribute anything to the series. The following figure
illustrates these properties for these k = 0 terms where n = 8.

cos(0 )

sin(0 )

The component where k = n/2 when n is even also deserves some additional explanation. In this case, then bn2 = 0 because sin( ) = 0 for all integer values of .
The following figure illustrates this property for the case where n = 8 and n2 = 4.

sin( )

So with these two explanations, we can clearly see that there will always be at
most n sinusoids in the discrete Fourier series. The only way that there will be less
than n sinusoids is when one or more of the coefficients in the above formula turns
out to be zero. Because the frequency of each of the sinusoids in the above formula
is a multiple of 2/n, one can say that these sinusoids are harmonically related.

1. THE DISCRETE FOURIER SERIES

119

If n = 8, the following figure gives the eight functions involved in the discrete
Fourier series.

cos(0 )

cos(/4 )

sin(/4 )

cos(/2 )

sin(/2 )

cos( )

cos(3/4 )

sin(3/4 )

120

7. THE ENGINEERING PERSPECTIVE

But why is the Discrete Fourier Series comprised of sinusoids where k is restricted to the interval 0 k n/2? We will now explore several properties of
sinusoids to help us answer this question.
First, the cosine function has the property that it is an even function. In other
words,
Even function.
A function f (z) is said to be an even function if
f (z) =

f (z)

In

for all inputs z to the function.


the case of the cosine function cos( ) where is any constant, then cos( ) =
cos( ) for all inputs to the function. This is similar to the concept of the
even sequence polynomial introduced in Chapter 4, but involves the outputs of the
function rather than polynomial coefficients.
If a term with n = 8 and k = 1 were to be added to the Discrete Fourier
Series considered above, then we would have
1

cos(/4 )

cos(/4 )

Observe that this is the same sinusoid as n = 8 and k = 1 and does not contribute
anything new to the series.
Next, the sine function has the property that it is an odd function. In other
words,
Odd function.
A function f (z) is said to be an odd function if
f (z) = f (z)

In

for all inputs z to the function.

the case of the sine function sin( ) where is any constant, then sin( ) =
sin( ) for all inputs to the function. This is similar to the concept of the
odd sequence polynomial introduced in Chapter 4, but again involves the outputs
of the function rather than polynomial coefficients.

1. THE DISCRETE FOURIER SERIES

121

If a term with n = 8 and k = 1 were to be added to the Discrete Fourier


Series considered above, then we would have

sin(/4 )

sin(/4 )

Observe that this is the negative of the function where n = 8 and k = 1. The
coefficient associated with the leftmost sinusoid can be multiplied by -1 and can be
absorbed into the coefficient associated with the rightmost sinusoid.
So, sinusoid components with negative values of k do not contribute anything
new to the Fourier series. Although not traditional, we could have just as easily
defined the Discrete Fourier series to consist of sinusoids associated with values of k
in the interval n/2 k 0. The point is that only representative of each type of
sinusoid should be included in the Discrete Fourier Series. Because the traditional
method is also the simplest to work with, we will continue to define the Discrete
Fourier Series with sinusoids associated with values of k in the interval 0 k n/2.
Next, we will explore why there are no terms in the Fourier series greater than
or equal to n.
Let K = k + dn for any integer d. If cos(2K/n ) is only evaluated at integer
values of , then cos(2K/n ) = cos(2k/n ). For example, let n = 8, k = 0,
and K = k + 1 n = 8.

cos(0 )

cos(2 )

Similarly, if sin(2K/n ) is only evaluated at integer values of , then


sin(2K/n ) = sin(2k/n ). For example, let n = 8, k = 1, and K = k+1n = 9.

122

7. THE ENGINEERING PERSPECTIVE

sin(/4 )

sin(9/4 )

So, any terms with values of k which are n or higher will also not contribute anything
new to the discrete Fourier series.
The phenonemon where a sampled version of sin(9/4 ) looks like the function
sin(/4 ) is a called aliasing. This results when a particular signal is not sampled
often enough to accurately capture the characteristics of the signal. For example,
if a video camera does not sample the image of a car driving on a highway quickly
enough, it may appear on a television as if the wheels of the car are going backwards
when the car is driving forwards.
Now, let us combine the properties considered so far to handle the cases where
n/2 < k n. Let K = (n k). If cos(2K/n ) is only evaluated at integer values
of , then cos(2K/n ) = cos(2k/n ). For example, let n = 8, k = 3 and
K = n k = 5.

cos(3/4 )

cos(5/4 )

Similarly, if sin(2K/n ) is only evaluated at integer values of , then sin(2K/n


) = sin(2k/n ). For example, let n = 8, k = 3 and K = n k = 5.

1. THE DISCRETE FOURIER SERIES

123

sin(3/4 )

sin(5/4 )

So, none of the cosine or sine terms with k in the range n/2 < k n contributes
anything new to the discrete Fourier series either.
So, every discrete Fourier series associated with a discrete function with period
of n samples has exactly n terms in the series. In Chapter 1, we mentioned that
the coefficients of the discrete Fourier series can be computed using the Discrete
Fourier Transform. In the next section, we will more carefully explore this claim
using the Fast Fourier Transform. For this section, we will simply state the discrete
Fourier series associated with the x( ) function used in the above examples.
EXAMPLE.
There is another way of expressing the Discrete Fourier Series that is sometimes
used by engineers. It can be shown that for a given n and any 0 k n/2, that

ak cos(2k/n ) + bk sin(2k/n )

ck cos(2k/n + k )

where
c2k
tan()

=
=

a2k + b2k
bk /ak

With these results, we obtain the following:


Discrete Fourier Series (alternative representation).
x( )

= a0 +

n2
X

k=1

ck cos(2k/n + k )

where n2 =

n/2
if n is even
(n 1)/2 if n is odd

124

7. THE ENGINEERING PERSPECTIVE

There are fewer terms in this Discrete Fourier series, but the same number of
unknowns. Observe that ak and bk in the original series have been replaced with
ck and k . Also, observe that when bk = 0, then k = 0.
EXAMPLE
2. A mathematicians perspective of the engineers FFT
In this section, we are going to show that the discrete Fourier series representation of a given sequence of samples can be computed with the FFT algorithms
discussed in the previous chapters. To support this claim, we need to derive yet
another representation of the discrete Fourier series that involves complex numbers.
Eulers identity and the properties of sinusoids discussed in the previous section
can be used to show that
eI2k/n + eI2(nk)/n
2
eI2k/n eI2(nk)/n
= I
2

cos(2k/n )

sin(2k/n )

for all integer values of . Substituting these results into the formula for the discrete
Fourier series given in the previous section for an even value of n, we obtain

x( )

n1
X
k=0

ck e2k/n I

where ck are complex numbers given by

c0

= a0

ck

= 1/2 (ak + I bk )
= an/2

cn/2
ck

for 0 k n/2 1

= 1/2 (ank I bnk )

for n/2 + 1 k n 1

Observe that ck and cnk are complex conjugates if ak and bk are real numbers.
EXAMPLE
Let us define the function

f (z) =

n1
X
k=0

fk z k

2. A MATHEMATICIANS PERSPECTIVE OF THE ENGINEERS FFT

125

where fk = ck for all 0 k n. Next, let = e2I/n = 1(360o/n) which is a


primitive nth root of unity. Observe that evaluation of x( ) at = 0, 1, , n 1
is equivalent to evaluating f (z) at each of the n powers of .
We now have the connection between the computation of the coefficients of the
discrete Fourier series and the algorithms discussed in Chapter 4. If one is given
{x(0), x(1), , x(n 1)} which is equivalent to {f ( 0 ), f ( 1 ), , f ( n1 )}, then
these samples can be interpolated into x( ) using any IFFT algorithm discussed in
Chapter 5.
EXAMPLE
But this is not the approach typically taken by the engineer. Instead, he or she
defines the discrete Fourier transform using the formula

X(k) =

n1
X
=0

x( ) W k

where W = eI2/n and computes X(0), X(1), . . . , X(n 1) using one of the FFT
algorithms discussed in Chapter 4. Note that W , the primitive root of unity in this
approach, is equal to 1 . The duality property discussed at the end of Chapter 5
explains why the engineers FFT with primitive root of unity W = 1 is essentially
equivalent to the mathematicians IFFT with primitive root of unity .
EXAMPLE
Observe that X(d) = n cd for all 0 d < n. In other words, X(d) is the coefficient
of eI2/n in the discrete Fourier series, but scaled by n. This is because the
engineer does not scale the output of their FFT by 1/n which is typically required
at the end of the mathematicians IFFT interpolation algorithm.
The engineer uses the IFFT algorithms discussed in Chapter 5 to evaluate
n x( ) at each in {0, 1, n 1}. Here, W 1 = = eI2/n is used in
the IFFT algorithm. The principle of duality can again be used to explain the
application of the IFFT as an evaluation algorithm. The output of this algorithm
is n x(0), n x(1), , n x(n 1). These results are typically scaled by 1/n to
produce the desired evaluations.
EXAMPLE
In summary, the engineer views the FFT as an interpolation algorithm and
the IFFT as an evaluation algorithm for the discrete Fouier series. The property of
duality is used to relate this perspective to the mathematical views of the algorithms
discussed in the earlier chapters.

126

7. THE ENGINEERING PERSPECTIVE

3. The two types of engineering FFT algorithms


We learned in Chapter 4 that there are two types of FFT algorithms from
the mathematicial perspective: the classical algorithms based on the Cooley-Tukey
reduction step and the twisted algorithms based on the Gentleman-Saude reduction
step. In this section, we will discuss the two types of algorithms from the engineering
perspective and show that they are closely related to the classical and twisted
algorithms.
Recall the engineers definition of the discrete Fourier transform:

n1
X

X(k) =

=0

x( ) W k

where W = 1 = eI2/n. Since x( ) represents the signal samples, we will refer


to this function as being in the time domain. Similarly, since X(k) is proportional
to the coefficients of the discrete Fourier series, we will refer to this function as being
in the frequency domain since this series is essentially a summation of sinusoids.
We can decompose the problem of finding X(k) into two subproblems as follows:

X(k) =

n/21

=0
n/21

=0

x(2 ) W k(2 ) +

n/21

=0

x(2 + 1) W k(2 +1)


n/21

x(2 ) (W 2k ) + W k

=0

x(2 + 1) (W 2k )

Also, observe that

X(k + n/2) =

n/21

=0

x(2 ) (W 2k ) W k

n/21

=0

So, to compute X(k), one can first compute

n/21

Xa (k)

=0

x(2 ) (W 2k )

n/21

Xb (k)

=0

and then combine these results using

x(2 + 1) (W 2k )

x(2 + 1) (W 2k )

3. THE TWO TYPES OF ENGINEERING FFT ALGORITHMS

127

X(k) = Xa (k) + W k Xb (k)

X(k + n/2) = Xa (k) W k Xb (k)

We see here that the input (time domain) sequence has been separated or
decimated into two subsequences, one consisting of the even-indexed samples and
the other consisting of the odd-indexed samples. By recursively applying this reduction of the computation of the discrete Fourier series, the so-called decimationin-time (DIT) FFT is derived.
But how does this FFT relate to those studied in Chapters 4 and 5? Let us
construct a diagram of this algorithm for the case where n = 8. First, let us reduce
the problem of computing the FFT of size 8 into two subproblems of size 4.

x(0)

x(2)

x(4)

x(6)

x(1)

x(3)

DFT
size=4

X(0)

X(1)

x(5)

x(7)

DFT
size=4

X(2)

X(3)

W0

W1

W2

W3

X(4)

X(5)

Now, reduce the problems of size 4 into subproblems of size 2

X(6)

X(7)

128

7. THE ENGINEERING PERSPECTIVE

x(0)

x(4)

x(2)

DFT
size=2

x(1)

DFT
size=2

W2

X(1)

x(3)

W0

W2

W0

W1

W2

W3

X(3)

x(7)
DFT
size=2

X(2)

x(5)
DFT
size=2

W0

X(0)

x(6)

X(4)

X(5)

X(6)

X(7)

The following diagram shows the operations needed for the entire computation.
Here W has been replaced by 1 .

x(0)

x(4)

x(2)

X(0)

x(6)

x(1)

X(1)

X(2)

x(5)

x(3)

X(3)

x(7)
0

X(4)

X(5)

X(6)

X(7)

This is essentially the twisted radix-2 IFFT algorithm covered in Chapter 5.


This connection allows the mathematician to view the construction of X(k) as an

3. THE TWO TYPES OF ENGINEERING FFT ALGORITHMS

129

interpolation process. The repeated decimation of the input sequence is equivalent


to scrambling this sequence according to the (binary reversal) function.
Suppose that we rearrange the input sequence into the unscrambled order and
carefully preserve all of the operations used in this algorithm. In this case, we
obtain the following diagram for the n = 8 case.

x(0)

x(1)

x(2)

X(0)

x(4)

W0

W0

X(4)

x(7)

W0

W0

W2

W2

W1

X(2)

x(6)

W0

W2

x(5)

W0

W0

x(3)

X(6)

X(1)

W3

X(5)

X(3)

X(7)

This is essentially the classical radix-2 FFT algorithm covered in Chapter 4


where W is used as the primitive root of unity. As expected, the output of this
algorithm is in scrambled form. The property of duality allows this algorithm to
be used for the interpolation of the input samples into X(k).
So, the engineers decimation in time FFT algorithm is essentially equivalent
to the classical FFT algorithm if the input is not scrambled and the twisted IFFT
algorithm if the input is scrambled.
The second algorithm from the engineering perspective decomposes the problem
of finding X(k) into two subproblems as follows:

n/21

X(k) =

=0

n/21

Now,

=0

x( ) W k +
x( ) W k +

n1
X

=n/2
n/21

=0

x( ) W k

x( + n/2) W k( +n/2)

130

7. THE ENGINEERING PERSPECTIVE

n/21

X(2k) =

=0

n/21

=0

x( ) (W 2k ) +
x( ) (W 2k ) +

n/21

=0
n/21

=0

x( + n/2) (W 2k ) +n/2

x( + n/2) (W 2k )

since W nk = 1 for all integer values of k. Also,

X(2k + 1) =

n/21

=0

x( ) (W

n/21

=0

2k+1

n/21

=0

x( ) (W 2k ) W

x( + n/2) (W

n/21

=0

2k+1 +n/2

x( + n/2) (W 2k ) W

So, to compute X(k), one can compute the two discrete Fourier series

n/21

Xa (k) =

=0

x( ) (W 2k )

n/21

Xb (k) =

=0

x( + n/2) (W 2k )

and then combine them using

X(2k) =
X(2k + 1) =

Xa (k) + Xb (k)
Xa (k) Xb (k)

z W z

where z W z means to replace z with W z and simplify after Xa (k) and


Xb (k) have been combined. This accounts for the W factors in the formulas above
and is essentially the twisting operation discussed in Chapter 4.
In this case, the output (frequency domain) sequence has been decimated into
two subsequences, one consisting of the even indexed components of X(k) and the
other consisting of the odd indexed components. By recursively applying this reduction of the computation of the discrete Fourier series, the so-called decimationin-frequency (DIF) FFT is derived.
The following diagram shows the entire computation of the DIF-FFT for the
case where n = 8.

3. THE TWO TYPES OF ENGINEERING FFT ALGORITHMS

x(0)

x(1)

x(2)

x(3)

x(4)

W0

X(0)

W0

W2

W0

X(4)

X(2)

x(5)

W0

X(6)

X(1)

x(6)

W1

131

W2

W3

W0

W2

W0

X(5)

x(7)

X(3)

W0

X(7)

This is essentially the twisted radix-2 FFT algorithm covered in Chapter 4


where W is used as the primitive root of unity. Again, the output of this algorithm
is in scrambled form. The property of duality again allows this algorithm to be
used for interpolating the input samples into X(k).
The mathematician may wish to determine X(k) using an IFFT, viewing the
computation as an interpolation problem. Suppose that we scramble the input sequence of the DIF-FFT algorithm according to the function and carefully preserve
all of the operations used in this algorithm. In this case, we obtain the following
diagram for the n = 8 case. Here, W has been replaced by 1 .

x(0)

x(4)

X(0)

x(2)

x(6)

X(1)

X(2)

x(1)

x(5)

X(3)

X(4)

x(3)

x(7)

X(5)

X(6)

X(7)

132

7. THE ENGINEERING PERSPECTIVE

This is essentially the classical radix-2 IFFT algorithm covered in Chapter 5


and we have obtained the second interpolation algorithm for the mathematician.
In summary, the engineers decimation in time FFT algorithm is equivalent to
the classical FFT algorithm if the input is not scrambled and the twisted IFFT
algorithm if the input is scrambled. The engineers decimation in frequency FFT
algorithm is equivalent to the twisted FFT algorithm if the input is not scrambled
and the classical IFFT algorithm if the input is scrambled. The property of duality
provides a bridge between the FFT algorithms defined by the mathematicians and
those defined by the engineer.

4. Convolution
We learned in Chapter 6 that the FFT allows the mathematician to efficiently
compute the product of two polynomials of large degree. In the next section, we
will see that the engineer can use the FFT to efficiently compute a related operation
on two sequences of signal samples.
To introduce the engineering concepts involved, consider the system represented
by the following block-diagram

x( )

y( ) = 2 x( ) + x( 1) + x( 2) + x( 3)

y( )

Here, the notation x( ) is traditionally used to represent the input to a system,


and y( ) is used for the output of a system. The function inside the block specifies
that y( ) is determined by adding up twice the input at the current time index plus
the inputs at the previous 3 time indices. It is assumed that x( ) = 0 for all values
of not specified in the system input.

4. CONVOLUTION

EXAMPLE

133

Let x( ) be the sequence represented by [1, 2, 3, 4]


starting at time index 0. Now y( ) can be determined
by sliding this sequence against the coefficients of the
system function as follows:
4

y(0) :

1
2

y(0) = 2

y(1) = 5

y(2) = 9

2
4

y(1) :

4+1
4
y(2) :

6+2+1

y(3) :

y(3) = 14

8+3+2+1

y(4) :

1
y(4) = 9

4+3+2

y(5) :

1
y(5) = 7

4+3
4
y(6) :

1
y(6) = 4

Observe that we have reversed the coefficients of the


input function since 1 is the first sample to be encountered by the system. So y( ) = [2, 5, 9, 14, 9, 7, 4] 

Another way to characterize the transformation contained in system function


is to give the impulse response of the system. This is the output that results
when the input [1, 0, . . . ] is introduced to the system. The impulse response is
traditionally denoted by h( ). However in this manuscript, we are going to use the
alternative notation ~( ) since h has already been used to represent the output of
a polynomial multiplication.

134

7. THE ENGINEERING PERSPECTIVE

EXAMPLE

The impulse response for the above system is given by


1
h(0) :

h(0) = 2

h(1) = 1

h(2) = 1

2
1
h(1) :

1
1

1
h(2) :

1
1

1
h(3) :

h(3) = 1

So, ~( ) = [2, 1, 1, 1]. Observe that ~(k) is the same


as the coefficient of x( k) in the function contained
in the block of the above system for each k from 0 to
3.


In terms of the impulse response for a given system, the output of the system
for a given input is given by the convolution formula

4. CONVOLUTION

135

Convolution.
Given a system with initial response ~( ), the output of the system
y( ) for the input sequence x( ) is given by

y( )

n1
X

x(k) ~( k)

n1
X

x( k) ~(k)

k=0

or
y( )

k=0

The notation y( ) = x( ) ~( ) is often used to indicate this operation.

Although both of these formulas compute x( ) ~( ), the second formula more


closely matches the calculations given in the examples above.

136

7. THE ENGINEERING PERSPECTIVE

EXAMPLE

Let x( ) = [1, 2, 3, 4] and let ~( ) = [2, 1, 1, 1]. The


convolution x( ) ~( ) is given by
y(0) = x(0) ~(0) + x(1) ~(1)

+ x(2) ~(2) + x(3) ~(3)


= 12+01+01+01

y(1) = x(1) ~(0) + x(0) ~(1)


+ x(1) ~(2) + x(2) ~(3)
= 22+11+01+01

y(2) = x(2) ~(0) + x(1) ~(1)


+ x(0) ~(2) + x(1) ~(3)
= 32+21+11+01

y(3) =
=

= 2

= 5

= 9

x(3) ~(0) + x(2) ~(1)

+ x(1) h(2) + x(0) ~(3)


42+31+21+11

y(4) = x(4) ~(0) + x(3) ~(1)


+ x(2) ~(2) + x(1) ~(3)
= 02+41+31+21

= 14

= 9

y(5) = x(5) ~(0) + x(4) ~(1)

+ x(3) ~(2) + x(2) ~(3)


= 02+01+41+31

y(6) = x(6) ~(0) + x(5) ~(1)


+ x(4) ~(2) + x(3) ~(3)
= 02+01+01+41

= 7

= 4


This calculation is equivalent to the one given in the previous example, but
computed strictly in terms of the convolution formula.
However, there is a much simpler method of computing the convolution. Recall
that in Chapter 6, we discussed two methods of implementing classical multiplication. Observe the similarity between the second method for classical multiplication
and the convolution formula. The easier method of implementing the convolution
is to write x( ) and ~( ) as polynomials and then multiply the polynomials using
the first method of classical multiplication.

4. CONVOLUTION

EXAMPLE

137

Express the sequences x( ) = [1, 2, 3, 4] and


~( ) = [2, 1, 1, 1] as polynomials, i.e.
f (x) =
g(x) =

4 x3 + 3 x2 + 2 x + 1

x3 + x2 + x + 2

and multiply the polynomials using classical multiplication as follows:


4 x3 + 3 x2 + 2 x + 1
1 x3 + 1 x2 + 1 x + 2
8 x3 + 6 x2 + 4 x + 2
4 x4 + 3 x3 + 2 x2 + 1 x
5
4 x + 3 x4 + 2 x3 + 1 x2
4 x6 + 3 x5 + 2 x4 + 1 x3
4 x6 + 7 x5 + 9 x4 + 14x3 + 9 x2 + 5 x + 2
Thus, f (x) g(x) equals
h(x) = 4 x6 + 7 x5 + 9 x4 + 14 x3 + 9 x2 + 5 x + 2
and y( ) = [2, 5, 9, 14, 9, 7, 4].


Now that we have established a connection between convolution and polynomial


multiplication, we can use any of the other techniques from Chapter 6 to perform
the convolution. So, convolutions of small size can be implemented using classical
multiplication, convolutions of moderate size can be implemented using Karatsuba
multiplication, and convolutions of large size can be implemented using FFT multiplication.
FULL CONVOLUTION EXAMPLE
Because the convolution theorem is such an important application of the FFT
for engineers and the approach of Chapter 4 for the FFT is not the approach used
by engineers, a proof of the convolution theorem is given in the appendix using only
the definition of convolution and concepts typically seen in an engineering course.
The following diagram illustrates many of the relationships between fast multiplication and fast convolution using the FFT.

138

7. THE ENGINEERING PERSPECTIVE

Polynomials

Frequency domain

Evaluation

f (z)

g(z)

n h(z)

X(k)

H(k)

Y (k)

FFT

FFT

IFFT

FFT

FFT

IFFT

F (w)

G(w)

H(w)

x( )

h( )

n y( )

Interpolation

Pointwise products

Pointwise products
Evaluations

Time domain

Primitive root of unity:

Primitive root of unity: W = 1

In summary, the procedure of fast multiplication from the mathematicians


perspective can be considered to be the dual of fast convolution from the engineering
perspective.

CHAPTER 8

FFT algorithms for other input sizes


Up to this point, all of the FFT sizes have been a power of 2. In this chapter,
we are going to consider FFT algorithms that work for other input sizes. The first
part of this chapter will focus on FFT algorithms which work with input sizes that
are a power of 3. Next, we will sketch out some of the details for creating FFT
algorithms with input sizes that are a power of 5 using a similar approach as the
radix-3 algorithms. The reader can then create the radix-7 FFT algorithms as well
as radix-p algorithms for any prime value of p. Then, we will discuss how to combine
the radix-2 and radix-3 FFTs into an FFT algorithm that works with input sizes
that are a power of 6. The reader can then use this technique to construct an FFT
algorithm that works on any size of the form 2a 3b 5c 7d if desired.

1. The Ternary Reversal Function


The radix-3 FFT algorithms evaluate a polynomial f (z) at each of the powers of
an nth root of unity for some n = 3k . Like the 2-adic FFT algorithms, the radix-3
algorithms do not give their outputs in the natural order { 0 , 1 , 2 , , n1 },
but rather some scrambled order which we saw in Chapter 7 is a conseqence of the
decimation of the input sequence that occurs during the FFT algorithm. However,
in the radix-3 algorithms, the decimation involves powers of 3 rather than powers
of 2 and the binary reversal function is not appropriate for this setting.
The radix-3 algorithms require one to convert from decimal numbers to the
ternary number system. Here, each place value is a power of 3. For example,
the number (121)3 is used to represent 1 group of nine, 2 groups of three, and 1
group of one. In other words,

(121)3

= 1 32 + 2 31 + 1 30

The notation ()3 is used to indicate that this represents a ternary number.
To convert from decimal to ternary, divide the number by three and record
the remainder. Then divide the quotient by three and again record the remainder.
Continue the process until the quotient is zero. The ternary representation of the
number is the sequence of remainders in reverse order.
139

140

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

EXAMPLE

Let us express 65 in ternary form.


65 3
21 3
73
23

= 21
= 7

R2
R0

=
=

R1
R2

2
0

So in ternary form, 65 = (2102)3.

To convert from ternary to decimal, expand the binary number according to


its place values. Then evaluate the resulting expression, treating all numbers as
decimal values.
EXAMPLE

Let us express (2102)3 in decimal form


(2102)3

=
=
=
=

2 33 + 1 32 + 0 21 + 2 20
2 27 + 1 9 + 0 2 + 2 1
54 + 9 + 0 + 2
65

For the 2-adic FFT algorithms, we used the binary reversal function to give the
order of the outputs. For the radix-3 algorithm, we will use the following related
function to give the order of the outputs.

Ternary reversal function.


Given some integer j expressed in ternary form, i.e.
j = (tk1 tk2 tk3 . . . t2 t1 t0 )3 , the ternary reversal of j
(with respect to n = 3k ), denoted by (j)(n) is obtained by
expressing the trits (ternary digits) of j in reverse order, i.e.
(j)(n) = t0 t1 t2 . . . tk3 tk2 tk1 .

EXAMPLE

Let j = 11 and n = 27 = 33 .
In ternary form, j = (102)3 .
So (j)(27) = (201)3 .
As a decimal number, (j)(27) = 19.

2. CLASSICAL RADIX-3 FFT

141

As with the binary reversal function, leading zeros should be included in the ternary
reversal of the number. Also, if n is the same for many related calculations and it
is understood from the context of the situation what n is, then it is not necessary
to specify n in the notation and one can simply use (j) instead.
Properties of the ternary reversal function.
(1).
(2).
(3).
(4).

(j) = 3 (3j) for j < n/3


(3j + 1) = (3j) + n/3 for j < n/3
(3j + 2) = (3j) + 2n/3 for j < n/3
This function is permutation of the integers {0, 1, 2, , n 1}.

The proof of each of these properties closely follows a related property of the
binary reversal function. Each of these proofs are left as exercises.
EXAMPLE

Let n = 9 = 32 . We are going to compute the ternary


reversal of each number in the range from 0 to 8.
0 = (00)3
1 = (01)3

(0) = (00)3 = 0
(1) = (10)3 = 3

2 = (02)3
3 = (10)3

(2) = (20)3 = 6
(3) = (01)3 = 1

4 = (11)3
5 = (12)3

(4) = (11)3 = 4
(5) = (21)3 = 7

6 = (20)3
7 = (21)3

(6) = (02)3 = 2
(7) = (12)3 = 5

8 = (22)3

(8) = (22)3 = 8


One can verify that the four properties above hold for when n = 9.
2. Classical radix-3 FFT
A radix-3 FFT algorithm evaluates a polynomial of degree less than n at each
of the roots of z n 1 where n = 3k . It can be shown that = 1(360o/n) is a
primitive nth root of unity and that { 0 , 1 , 2 , , n1 } are the n evaluation
points that will be used in the process.
The radix-3 algorithm presented in this section is similar to a technique introduced by Winograd as presented in [33], but is believed to be easier to understand
than the Winograd algorithm. Another version of the radix-3 alorithm [12] requires the inputs to be transformed into a different number system. 1 The radix-3
1This number system is called the slanted complex number system which consists of num

bers of the form A + 3 1 B where A and B are real numbers and 3 1 is the element 1120o

142

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

algorithm discussed in this section requires fewer operations than the one found
in [12] and allows the inputs to remain in the traditional complex number system
presented in Chapter 3.
Let = n/3 so that 3 = 1. Note that also has the property that 2 + +
1 = 0 and that 2 is the complex conjugate of .

1
1

2 =

The input to the reduction step of the classical radix-3 FFT algorithm is given
by f (z) mod (z 3m b3 ) and the output is f (z) mod (z m b), f (z) mod (z m b),
and f (z) mod (z m 2 b). Before discussing the reduction step itself, let us consider
the tree of modulus polynomials.
At the top of the tree is z n 1. Thus, we are going to evaluate f (z) at each
of the nth roots of unity. At each reduction step with input size 3m, let b = (3j)
for some j < n/m. In the previous section, we saw that b3 = ( (3j) )3 = (j) ,
b = n/3 (3j) = (3j)+n/3 = 3j+1 and 2 b = 2n/3 (3j) = (3j)+2n/3 =
3j+2 . So the input modulus polynomial is z 3m (j) and the output modulus
polynomials are z m (3j) z m (3j+1) , and z m (3j+2) . At the bottom of
the tree are z (j) for all 0 j < n.
The modulus polynomial tree when n = 9 is given by:

z9 1
z3 1
0

z3 3
6

z3 6
7

from the traditional complex number system. It can be shown that the complex numbers and
slanted complex numbers are two different ways of representing the same collection of numbers.

2. CLASSICAL RADIX-3 FFT

143

where the 9 points used for the multipoint evaluation are given by

The reduction step is again simple to perform with these modulus polynomials.
Split the input into three blocks of size m by writing f (z) mod (z 3m (j) ) =
fA z 2m + fB z m + fC . Then the outputs are given by
fX

fY

fZ

2(3j) fA + (3j) fB + fC

2 2(3j) fA + (3j) fB + fC

2(3j) fA + 2 (3j) fB + fC

The reduction step can also be expressed in matrix form as

or

fX
fY
fZ

2(3j)
2 2(3j)
2(3j)

fX
fY
fZ

1
2

(3j)
(3j)
2 (3j)

1
fA
1 fB
fC
1

2(3j)
1

fA
1 (3j) fB
1
fC

where fA x2m + fB xm + fC = f mod (x3m (j) ).

We are going to compute the reduction step in a special way using the fact
that and 2 are complex conjugates. Express = R + I I and observe that
2 = R I I . Now let fR and fI be defined by
fR

fI

R ( 2(3j) fA + (3j) fB )

I I ( (3j) fB 2(3j) fA )

144

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

Without much difficulty, it can be shown that fY = fR + fC + fI and fZ =


fR +fC fI . To efficiently perform the computations, we first compute fR +fC and
fI . The two results are added to obtain fY and are subtracted to obtain fZ . Finally,
fX is computed using ( 2(3j) fA + (3j) fB ) + fC where 2(3j) fA + (3j) fB
was already computed to determine fR . The Winograd algorithm uses a somewhat
different sequence of operations to produce the three outputs at the same cost. A
butterfly diagram for this sequence of operations to implement the reduction step
is given by

fC

fB
(3j)

fX

fA
2(3j)

fY

fZ

2. CLASSICAL RADIX-3 FFT

EXAMPLE

145

Suppose that we are trying to compute an FFT of size


8 and the input to a reduction step in this process is
f (z) = z 3 + z 2 z = f mod (z 4 4 ). Split this
input polynomial into two blocks of size 2 by writing
f (z) = (z + 1) z 2 + (z). So fA = z + 1 and fB =
z + 0. If b2 = 4 = (1) , then b = 2 = (21) = I.
So b fA = I (z + 1) = I z + I and
fY = b fA + fB

fZ = b fA + fB

= (I z + I) + (z + 0)

= (1 + I) z + I

= (I z + I) + (z + 0)
= (1 I) z I

Suppose that we want to compute the FFT of a polynomial f (z) of degree


less than n = 3k . Then f (z) is equal to f (z) mod (z n 1). We will recursively
apply the above reduction step with appropriate selections of m and b. After all
of the reduction steps have been completed with input size 3m = 3, then we have
f (z) mod (z (j) ) = f ((j) ) for all j < n, i.e. the desired FFT of f (z). In
terms of butterfly operations, the FFT of size 9 is expressed as:

146

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

f8

f7

(3j)

f6

f5

f4

f2

f1

f0

fS

fU

fT

fR

f3

2(3j)

fV

fW

fX

fY

fZ

The FFT of size 9 for the input polynomial f (z) = X considered earlier in this
section is given by NEED UPDATED FIGURE

z 7 + 2z 6 + 3z 5 + z 4 + 2z 3 + 3z 2 + 2z + 1

3z 3 + 5z 2 + 5z + 2

8z + 7

15

3 z3 + I z2 z

2I z 3

3 + 2I

2z+I

3 2I 2 + I

2+I

2zI

2I

2I

Pseudocode for this FFT algorithm is given in Figure 1. It may be helpful to


follow the steps of the algorithm using the example above to understand how the
algorithm works.
Let us now analyze the cost of this algorithm. Line 0 is just used to end the
recursion and costs no operations. Line 1 logicially partitions the input into three

2. CLASSICAL RADIX-3 FFT

147

Algorithm : Classical radix-3 FFT


Input: f mod (z 3m (j) ), a polynomial of degree less than 3m.
Here, m is a power of 3 where 3m n.
Output: f ( (j3m+0) ), f ( (j3m+1) ), . . . , f ( (j3m+3m1) )
0. If (3m) = 1 then return f mod (z (j) ) = f ( (j) )
1. Split f mod (z 3m (j) ) into three blocks fA , fB , and fC each of size m
such that f mod (z 3m (j) ) = fA z 2m + fB z m + fC
2. Compute f = 2(3j) fA and f = (3j) fB .
3. Compute f = f + f and f = f f .
4. Compute fR + fC = R f + fC and fI = I f
5. Compute f mod (z m (3j) ) = f + fC
6. Compute f mod (z m (3j+1) ) = (fR + fC ) + fI
7. Compute f mod (z m (3j+2) ) = (fR + fC ) fI
8. Compute the FFT of f mod (z m (3j) ) to obtain
f ( (j3m+0) ), f ( (j3m+1) ), . . . , f ( (j3m+m1) )
9. Compute the FFT of f mod (z m (3j+1) ) to obtain
f ( (j3m+m) ), f ( (j3m+m+1) ), . . . , f ( (j3m+2m1) )
10. Compute the FFT of f mod (z m (3j+2) ) to obtain
f ( (j3m+2m) ), f ( (j3m+2m+1) ), . . . , f ( (j3m+3m1) )
11. Return f ( (j3m+0) ), f ( (j3m+1) ), . . . , f ( (j3m+3m1) )
Figure 1. Pseudocode for new classical radix-3 FFT

blocks of size m, requiring no operations. In line 2, we multiply the m components


of fA by 2(3j) and the m components of fB by (3j) . Each of these operations
costs m complex multiplications unless j = 0 in which case no multiplications are
required. Line 3 requires 2m complex additions and the cost of line 4 is equivalent
to m complex multiplications. Now, lines 5, 6, and 7 each require m complex
additions to add the two terms in each instruction. The cost of lines 8, 9, and 10 is
equal to the number of operations needed to compute three FFTs of size m. Line
11 costs no operations. Combining these results, the total number of operations
required to compute the FFT using the classical radix-3 algorithm is given by

M (n) =
A(n)

n
3M
+n
3
n 5
3A
+ n
3
3

where M (1) = 0 and A(1) = 0.


We must also subtract multiplications to account for the cases where j = 0. A
recurrence relation giving the number of multiplications saved is

Ms (n) =

Ms

n
3

2
n
3

148

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

where Ms (1) = 0.
Using the technique of substitution discussed earlier for the classical radix2 algorithm, closed-form solutions for the number of operations needed for the
classical radix-3 algorithm is given by

M (n) =
=

A(n)

=
=

n log3 (n) n + 1
1
n log2 (n) n + 1
log2 (3)
0.631 n log2 (n) n + 1
5
n log3 (n)
3
5
n log2 (n)
3 log2 (3)
1.051 n log2 (n)

It can be shown that this is the same number of operations required by the Winograd
algorithm.
Observe that this algorithm is less efficient than any of the algorithms discussed
in Chapter 4, both in terms of the number of multiplications and the number of
additions.

3. Twisted radix-3 FFT


We are now going to consider a twisted version of the radix-3 FFT. The algorithm presented in this section shares many of the same features with an algorithm
presented in [34]. In particular, this algorithm exploits complex conjugate properties of complex numbers as well as the fact that 1 + + 2 = 0 where is the
primitive 3rd root of unity.
Recall that our goal is to compute the evaluation of some polynomial f (z) at
each of the nth roots of unity where n = 3k . In the case of the classical radix-3
algorithm, if the input to the reduction step was in the form f (z) mod (z 3m 1),
then fewer multiplications were required for the reduction. As with the radix-2
algorithm, we are going to use rotation transformations of complex numbers to
make sure that every reduction step input is in this form.
Let f (z) be some polynomial of degree less than 3m that we wish to evaluate at
each of the (3m)th roots of unity. The twisted radix-3 FFT will always receive such
an input f (z) = f (z) mod (z 3m 1) and produce outputs f (z) mod (z m 1),
f (z) mod (z m ), and f (z) mod (z m 2 ). Note that the first output is already
in the form needed to reuse the twisted radix-3 reduction step, but we need to
transform the other two outputs to put them into the proper form.

3. TWISTED RADIX-3 FFT

149

We can interpret the second output as some polynomial that we wish to evaluate
at the solutions to z m = = 1(120o). Using what we learned in Chapter 2, if
we apply the transformation z = 1(120o/m) z to this equation, then it becomes
zm = 1. Recall that this transformation can be viewed as rotating the points
in the complex plane clockwise by 120/m degrees or rotating the axis system of
the complex plane counterclockwise by 120/m degrees. Again, this twisting of the
complex plane is why Bernstein calls this type of FFT the twisted FFT. The
twisted polynomial is again used to implement the transformation.
EXAMPLE
The third output f (z) mod (z m 2 ) can be interpreted as some polynomial that
we wish to evaluate at the solutions to z m = 2 = 1240o. If we apply the
transformation z = 1(240o/m) z to this equation, then it becomes zm = 1. So
now, all of the outputs of the twisted radix-3 reduction step are in the proper form
to be used as inputs for another application of the reduction step. As with the
radix-2 algorithm, it is too complicated to create a new variable each time the
complex plane is rotated, and the notation f ( z) will again be used.
It can be shown that if the input to the reduction step is f (z) = f ( (j)/(3m)
z) mod (z 3m 1) for some j, then the outputs are given by

fX
feY

feZ

=
=
=

f (z) mod (z m 1) = f ( (3j)/m z) mod (z m 1)

f ( (1)/m z) mod (z m 1) = f ( (3j+1)/m z) mod (z m 1)

f ( (2)/m z) mod (z m 1) = f ( (3j+2)/m z) mod (z m 1)

The notation feY and feZ is used to indicate that these are the results of the reduction
step after the the twisting has been completed. The above expressions can be
determined by carefully keeping track of the cumulative rotations of the complex
plane using the new notation and using the fact that (j)/3m = (3j)/m.
We will now present the reduction step of the twisted radix-3 FFT algorithm.
Split f (z) into three blocks of size m, i.e. f (z) = fA z 2m + fB z m + fC . We
need to compute

fX
fY

=
=

fZ

fA + fB + fC
2 f A + f B + f C
f A + 2 f B + f C

and then twist fY and fZ by the amounts indicated above.


It can be shown that = (1)/m is another representation for the twisting
factor used for fY . If the notation (feY )d is used to indicate the term of degree d in
feY after the twisting has been implemented, then

150

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

(feY )d


2 (fA )d + (fB )d + (fC )d d

Since ( (1)/m )m = (1) = n/3 = 2n/3 = (2) = ( (2)/m )m , then

is another value of that can be used to twist fz = f (z) mod (z m 2 )

into f ( z) mod (z m 1). So,


(1)/m

(feZ )d


(fA )d + 2 (fB )d + (fC )d d

The improved algorithm makes use of the fact that (1)/m and (1)/m are
conjugate pairs and that and 2 are complex conjugate pairs, i.e. 2 = . So
the formulas to compute the coefficient of degree d in fX , feY , and feZ are now
(fX )d
(feY )d
(feZ )d

= (fA )d + (fB )d + (fC )d


=
=


(fA )d + (fB )d + (fC )d d

(fA )d + (fB )d + (fC )d d

Several properties of conjugates from Chapter 3 will be used to reduce the


operation count of the reduction step. Since = 1, then we can rewrite the
formula for (feY )d as
(feY )d

= ( (fB fA )d + (fC fA )d ) d

= ( d ) (fB fA )d + d (fC fA )d

Using the property of complex conjugates that states a b = a b, we can similarly


rewrite the formula for (feZ )d as
(feZ )d

d (fB fA )d + d (fC fA )d

Once (feY )d has been computed, then complex conjugate properties can be used to
compute (feZ )d at a reduced cost. For both formulas, we will assume that d has
been precomputed and stored. By applying these concepts for all d in 0 d < m,
the entire reduction step can be computed with fewer operations. The following
example shows the technique involved with computing (feY )d and (feZ )d

3. TWISTED RADIX-3 FFT

EXAMPLE

151

Suppose that
(fB fA )d

d = 140o

(fC fA )d

so that
d

1+I2

3+I4

0.766 + I 0.642

= 1120o 140o

= 1160o
= 0.939 + I 0.342

Also observe that

d = 1 40o

= 1 160

=
=

0.766 0.642 I

0.939 0.342 I

Let us compute the following two expressions


fR

=
=
=
=

fI

=
=
=
=

0.939 (fB fA )d + 0.766 (fC fA )d


0.939 (1 + I 2) + 0.766 (3 + I 4)
(0.939 I1.878) + (2.298 + I3.064)
1.359 + 1.186 I

0.342I (fB fA )d 0.642I (fC fA )d


0.342I (1 + I 2) 0.642I (3 + I 4)
(0.684 I0.342) + (2.568 I1.926)
3.252 + 2.268 I

Finally,
(feY )d

fR + fI

=
=

(1.359 + I1.186) + (3.252 + I2.268)


4.611 + I3.454

=
=

fR fI
(1.359 + I1.186) (3.252 + I2.268)

and
(feZ )d

1.893 I1.082

152

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

The twisted radix-3 FFT algorithm is initialized with f (z) which equals f ( 0
z) mod (z n 1) if f has degree less than n. By recursively applying the reduction
step to f (z), we obtain f ( (j) z) mod (z 1) = f ( (j) 1) for all j in the range
0 j < n. This is the desired FFT of f (z).
The following figure shows how every intermediate result of this FFT calculation
relates to the original input polynomial f (z) for n = 9.

f (z) mod (z 9 1)
f (z) mod (z 3 1)
f ( 0 )

f ( 3 )

f ( 6 )

f ( z) mod (z 3 1)
f ( 1 )

f ( 4 )

f ( 2 z) mod (z 3 1)

f ( 7 )

f ( 2 )

f ( 5 )

f ( 8 )

The next diagram shows the intermediate results of the FFT of f (z) = z 7 +
2z + 3z 5 + z 4 + 2z 3 + 3z 2 + 2z + 1 using the twisted method. One can compare
these results with those given for the computation of the
the classical
FFT using
2/2
+
I

2/2 and 3 =
algorithm
provided
in
the
previous
section.
Here,

2/2 + I 2/2.
6

z 7 + 2z 6 + 3z 5 + z 4 + 2z 3 + 3z 2 + 2z + 1

3z 3 + 5z 2 + 5z + 2

8z + 7

15

3 z3 + I z2 z

2I z 3

3 + 2I

Butterfly diagram:

2z+I

3 2I 2 + I

2+I

2zI

2I

2I

3. TWISTED RADIX-3 FFT

f8

f7

f6

f5

f4

0
R

II0

0
R

II0

f ( 0 )

f ( 3 )

II1

II6

0
R

II0

f ( 6 )

f ( 1 )

2
R

6
R

f3

1
R

f ( 4 )

153

II2

f2

f1

6
R

II6

7
R

6
R

II7

0
R

f ( 7 )

II0

f ( 2 )

8
R

II8

II6

f0

f ( 5 )

6
R

II6

f ( 8 )

Pseudocode for this FFT algorithm is given in Figure ??.


Let us now analyze the cost of this algorithm. Line 0 is just used to end
the recursion and costs no operations. Line 1 logicially partitions the input into
three blocks of size m, requiring no operations. In line 2, we compute the sum
fA + fB + fC at a cost of 2m additions and no multiplications. Next in line 3, we
compute fB fA and fC fA at a cost of 2m subtractions. Line 4 requires 2m 1
multiplications 2 and m complex additions. By the technique used in the example
above, line 5 only requires m complex additions. Finally, the cost of lines 6, 7, and
8 is equal to the number of operations needed to compute three FFTs of size m.
Line 9 costs no operations. The total number of operations to compute the radix-3
FFT of size n using this twisted algorithm is given by

M (n) =
A(n)

n 2
+ n1
3M
3
 n3
3A
+2n
3

where M (1) = 0 and A(1) = 0. The method of substitution to solve these recurrence
relations and obtain the formulas given by

2We can subtract one multiplication for the case where d = 0 and so d = 1.

154

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

Algorithm : Twisted radix-3 FFT


Input: f (x) = f ( (j)/(3m) z) mod (z 3m 1), the modular reduction of
some polynomial f (z) that has been twisted by (j) .
Here, m is a power of 3 where 3m n.
Output: f ( (j3m+0) ), f ( (j3m+1) ), . . . , f ( (j3m+3m1) )
0. If (3m) = 1 then return f ( (j) z) mod (z 1) = f ( (j) )
1. Split f (z) into three blocks fA , fB , and fC each of size m
such that f (z) = fA z 2m + fB z m + fC
2. Compute f ( (3j)/m z) mod (z m 1)
= f (z) mod (z m 1) = fA + fB + fC
3. Compute fB fA and fC fA
4. Compute (feY )d = d (fB fA )d + ( d ) (fC fA )d
for all d in 0 d < m.
Combine the (feY )d s to obtain f ( (3j+1)/m z) mod (z m 1)
5. Compute (feZ )d = d (fB fA )d + d (fC fA )d
for all d in 0 d < m.
Combine the (feZ )d s to obtain f ( (3j+2)/m z) mod (z m 1)
6. Compute the FFT of f ( (3j)/m z) mod (z m 1) to obtain
f ( (j3m+0) ), f ( (j3m+1) ), . . . , f ( (j3m+m1) )
7. Compute the FFT of f ( (3j+1)/m z) mod (z m 1) to obtain
f ( (j3m+m) ), f ( (j3m+m+1) ), . . . , f ( (j3m+2m1) )
8. Compute the FFT of f ( (3j+2)/m z) mod (z m 1) to obtain
f ( (j3m+2m) ), f ( (j3m+2m+1) ), . . . , f ( (j3m+3m1) )
9. Return f ( (j3m+0) ), f ( (j3m+1) ), . . . , f ( (j3m+3m1) )
Figure 2. Pseudocode for improved twisted radix-3 FFT

1
1
2
n log3 (n) n +
3
2
2
1
1
2
n log2 (n) n +
=
3 log2 (3)
2
2
0.421 n log2 (n) 0.5 n + 0.5

M (n) =

A(n)

= 2 n log3 (n)
2
=
n log2 (n)
log2 (3)
1.262 n log2 (n)

Compared to the classical radix-3 algorithm, the number of multiplications has


decreased while the number of additions has increased by roughly the same amount.
As long as a multiplication is at least as expensive as an addition, then the algorithm
presented in this section is more efficient than the classical radix-3 algorithm.
A radix-9 algorithm and a split-radix (3/9) algorithm have been proposed, but
neither is superior to the radix-3 algorithm given in this section. It is left as a

4. RADIX-5 FFTS

155

research problem to try to improve these algorithms and obtain an algorithm with
a lower operation count than the one presented in this section.

4. Radix-5 FFTs
A radix-5 FFT algorithm evaluates a polynomial of degree less than n at each
of the roots of z n 1 where n = 5k . It can be shown that = 1(360o/n) is a
primitive nth root of unity and that { 0 , 1 , 2 , , n1 } are the n evaluation
points that will be used in the process.
Let = n/5 = 172o so that 5 = 1. Note that also has the property that
4 + 3 + 2 + + 1 = 0.

2
1
1
3 = 2
4 =

The input to the reduction step of the classical radix-5 FFT algorithm is given
by f (z) mod (z 5m b5 ) and the output is

fV
fW
fX
fY
fZ

= f (z) mod (z m b)
= f (z) mod (z m b)

= f (z) mod (z m 2 b)

= f (z) mod (z m 3 b)

= f (z) mod (z m 4 b)

The radix-2 and radix-3 FFT algorithms involved a reversal function needed to
determine the values of b in the reduction step. For this particular FFT, we need
the pentary reversal function which involves the base-5 representation of numbers.

156

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

EXAMPLE

Let j = 59 and n = 125 = 53 .


In pentary form, j = (214)5 .
So P(j)(125) = (412)5 .
As a decimal number,
P(j)(125) = 4 52 + 1 51 + 2 50 = 107.

The pentary reversal function has the following properties

b5
b
2 b
3 b
4 b

= ( P(5j) )5 = P(j)
= n/5 P(5j) = P(5j+1)
= 2n/5 P(5j) = P(5j+2)

= 3n/5 P(5j) = P(5j+3)


= 4n/5 P(5j) = P(5j+4)

The proof of each of these properties is left as an exercise for the reader.
At each reduction step with input size 5m, let b = P(5j) for some j < n/m.
Using this selection for b, the input to the reduction step is f (z) mod z 5m P(j)
and the outputs of the reduction step are

fV

fW

fX

fY

fZ

f (z) mod (z m P(5j) )

f (z) mod (z m P(5j+1) )

f (z) mod (z m P(5j+2) )

f (z) mod (z m P(5j+3) )

f (z) mod (z m P(5j+4) )

To perform the reduction step, split the input into five blocks of size m by
writing f (z) mod (z 5m P(j) ) = fA z 4m + fB z 3m + fC z 2m + fD z m + fE .
Then the outputs can be expressed as

fV

fW

fX

fY

fZ

4P(5j) fA + 3P(5j) fB + 2P(5j) fC + P(5j) fD + fE

4 4P(5j) fA + 3 3P(5j) fB + 2 2P(5j) fC + P(5j) fD + fE

3 4P(5j) fA + 3P(5j) fB + 4 2P(5j) fC + 2 P(5j) fD + fE

2 4P(5j) fA + 4 3P(5j) fB + 2P(5j) fC + 3 P(5j) fD + fE

4P(5j) fA + 2 3P(5j) fB + 3 2P(5j) fC + 4 P(5j) fD + fE

or in matrix form as

4. RADIX-5 FFTS

fV
fW
fX
fY
fZ

1
4
3
2

1
3

4
2

1
2
4

2
3
4

157

1
1
1
1
1

4P(5j) fA
3P(5j) fB
2P(5j) fC
P(5j) fD
fE

It is left as an exercise for the reader to determine the most efficient method
to implement the above reduction step. A starting point for this exercise is the
Winograd radix-5 algorithm discussed in [33]. However, one should attempt to
improve the readability of this algorithm as was done for the radix-3 case. The new
radix-5 algorithm should exploit the fact that and 4 are complex conjugates, the
fact that 2 and 3 are complex conjugates, and the fact that 4 +3 +2 ++1 =
0. The reader can also develop pseudocode for the classical radix-5 algorithm and
determine its operation count.
It is also possible to construct a twisted radix-5 algorithm. The input to each
reduction step is the polynomial f (z) = f ( P(j)/(5m) z) mod (z 5m 1) for some
value of j and the outputs are

fV
fW

=
=

fX

fY

fZ

f (z) mod (z m 1)
f (z) mod (z m )

f (z) mod (z m 2 )

f (z) mod (z m 3 )

f (z) mod (z m 4 )

by using the j = 0 case of the classical radix-5 reduction step. Now fV is already
in the form that can be used as input to another application of the twisted radix-5
reduction step, but an adjustment needs to made to each of the other outputs. The
twisted polynomial is again used to rotate the complex roots of unity to put each
output in the required form. It can be shown that the values of for each of the
outputs is given by

fW = f (z) mod (z m ) : = P(1)/m

fX = f (z) mod (z m 2 )
fY = f (z) mod (z m 3 )

fZ = f (z) mod (z m 4 )

: = P(2)/m

: = P(3)/m = P(2)/m
: = P(4)/m = P(1)/m

Note that two values for were given for fY and fZ . By adapting the twisted
algorithm given for the radix-3 case, the reader should be able to create a twisted
radix-5 algorithm that exploits the fact that P(1)/m and P(1)/m are complex
conjugate pairs as well as the fact that P(2)/m and P(2)/m are complex conjugate

158

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

pairs. These details are left for the reader to resolve. The reader is also encouraged
to develop pseudocode for the twisted radix-5 FFT and give an operation count for
the number of operations required to implement the algorithm.
In theory, the techniques discussed in this section can also be used to develop
a radix-p algorithm for any value of p. There appears to be some use of the radix7 algorithms in modern FFT software, but there does not appear to be much
value in creating FFT routines for higher values of p. The reader is encouraged
to create the classical and twisted versions of the radix-7 algorithms without any
further explanation. One can follow the steps used in creating the radix-3 and
radix-5 algorithms in the previous sections to complete this exercise. The radix-7
algorithms developed by Winograd [33] may also prove useful in completing this
exercise.

5. Radix-6 algorithms
In this section, we will show how to combine the classical radix-2 and radix-3
FFTs into the classical radix-6 FFT. By adapting the technique used in this section,
the reader can construct an FFT that works for any size of the form 2a 3b 5c 7d .
The input to the reduction step of the classical radix-6 FFT algorithm is given
by f (z) mod (z 6m b6 ) and computes

fU
fV
fW
fX
fY
fZ

= f (z) mod (z m b)

= f (z) mod (z m b)

= f (z) mod (z m 2 b)

= f (z) mod (z m 3 b)

= f (z) mod (z m 4 b)

= f (z) mod (z m 5 b)

where = 160o is a primitive 6th root of unity.


The radix-6 algorithm exploits the fact that 3 = 1, 2 = , and 4 =
2 where is the primitive 3rd root of unity used in the radix-3 algorithms as
illustrated by the following figure

5. RADIX-6 ALGORITHMS

2 =

159

3 = 1

4 = 2

One can implement the radix-6 reduction step by decomposing it into a radix-2
reduction step followed by two radix-3 reduction steps or into a radix-3 reduction
step followed by three radix-2 reduction steps. We will present the second option
in this section.
The first part of the radix-6 reduction step receives as input f (z) mod (z 6m b6 )
and produces f (z) mod (z 2m b2 ), f (z) mod (z 2m 2 b2 ), and f (z) mod (z 2m
4 b2 ) using the radix-3 reduction step. The radix-2 reduction step is then used
to implement the following reductions

f (z) mod (z
f (z) mod (z

2m

f (z) mod (z

2m

2m

b )
2 2

b )
4 2

b )

f (z) mod (z m b)
f (z) mod (z m 3 b)

f (z) mod (z m b)
f (z) mod (z m 4 b)

f (z) mod (z m 2 b)
f (z) mod (z m 5 b)

It is convenient to present the reduction step outputs in the order f (z) mod (z m b),
f (z) mod (z m b), f (z) mod (z m 2 b), , f (z) mod (z m 5 b). The
above outputs can be unscrambled into this order at no cost.
The following diagram shows how this reduction step can compute an FFT of
size 6 for an input polynomial of f (z) = fA z 5 +fB z 4 +fC z 3 +fD z 2 +fE z +fF .

160

8. FFT ALGORITHMS FOR OTHER INPUT SIZES

f5

f4

f3

f2

f ( 0 )

f ( 3 )

f0

f1

f ( 1 )

f ( 4 )

f ( 2 )

f ( 5 )

For FFTs of size 6k , one must develop a function that unscrambles the order of
the FFT outputs back into the natural order {f ( 0 ), f ( 1 ), , f ( n1 )}. The
required function is just the hexary (base-6) reversal function which involves writing
down an integer in base-6 form with respect to n which is some power of 6, reversing
the hexits and then returning the decimal version of the number obtained. We
will denote this function using the notation H(j).
EXAMPLE

Let j = 25 and n = 36 = 62 .
In hexary form, j = (41)6 .
So H(j)(36) = (14)6 .
As a decimal number, H(j)(36) = 1 6 + 4 = 10.

Several properties of the hexary reversal function need to be established to use it


in conjunction with the radix-6 FFT. This is left as an exercise for the reader, but
is closely related to the previous work done with the binary and ternary reversal
functions.
Developing the pseudocode for the classical radix-6 FFT and counting the number of operations needed to perform the computations is also left as an exercise for
the reader.
It is also possible to develop the twisted radix-6 FFT. A diagram of the twisted
radix-6 reduction step is given below for an input of size 6m. One must twist
the outputs according to the values given below before proceeding with the next
iteration of the reduction step.

5. RADIX-6 ALGORITHMS

f5

f ( 0 )

f4

f ( 3 )

f3

f2

161

f1

f0

f ( 1 )

f ( 4 )

f ( 2 )

f ( 5 )

It is left as an exercise for the reader to develop the theorems that demonstrate
that the twisted radix-6 algorithm works. The reader is also encouraged to develop
the pseudocode for the algorithm and count the number of multiplications and
additions required.

CHAPTER 9

Additional topics
The main goals for writing this book were to introduce the reader to the algebraic perspective of the FFT, to demonstrate the importance of the FFT for
performing the operation of convolution / polynomial multiplication, and to show
how an FFT algorithm can be constructed for any input size which is the product
of one or more powers of small primes. The previous eight chapters were designed
to meet these objectives. In this final chapter, we will introduce several additional
topics related to the Fast Fourier Transform which the reader can explore. Because
most of these topics require a higher mathematical background than that which is
assumed for this book, only an overview will be presented for each of these topics.
References will be given which can be used to obtain the necessary background
in each area. This chapter essentially surveys the highlights of the authors doctoral research and further details on many of these topics can also be found in the
resulting doctoral dissertation.

1. Additive Fast Fourier Transforms


In some advanced algebra courses and most coding theory classes in Electrical
Engineering, the student is introduced to new number systems called Finite Fields
(or Galois Fields). The simplest such number systems consist of all possible remainders resulting from the division of all of the integers by a second fixed integer called
the modulus. To perform addition and multiplication in these number systems, one
adds or multiplies the numbers in the usual way, but reports the remainder when
the sum or product is divided by the modulus. It can be shown that the modulus
is equal to the number of elements in the finite field using this construction. Below
are the addition and multiplication tables for the finite fields of size 2, 3, and 5.

+
0
1
2

+
0
1

0
0
1

1
1
0

0
1

0
0
0

1
0
1

0
0
1
2

1
1
2
0

2
2
0
1

0
1
2

0
0
0
0

1
0
1
2

163

2
0
2
1

164

9. ADDITIONAL TOPICS

+
0
1
2
3
4

0
0
1
2
3
4

1
1
2
3
4
0

2
2
3
4
0
1

3
3
4
0
1
2

4
4
0
1
2
3

0
1
2
3
4

0
0
0
0
0
0

1
0
1
2
3
4

2
0
2
4
1
3

3
0
3
1
4
2

4
0
4
3
2
1

In algebra classes, one learns several properties required for a number system
to have all four basic arithmetic operations (addition, subtraction, multiplication,
and division). Using the above construction, it can be shown that the modulus
must be a prime number for the operation of division to be possible. Thus, all such
finite fields must have a prime number of elements.
It is possible to create additional finite fields using polynomials with coefficients
from one of the prime finite fields. 1 It turns out that all of these additional finite
fields have q = pk elements where p is a prime and k is an integer. The elements
of such a finite field consist of all of the remainders possible when an arbitrary
polynomial is divided by some fixed polynomial of degree k called the generating
polynomial. Addition in such a structure consists of simply adding two of these
polynomials using the arithmetic of the finite field associated with the coefficients of
the polynomials. Multiplication consists of multiplying two of the polynomials with
the result reduced by the generating polynomial. In order for the multiplication
table to be produced which contains the necessary properties for division to be
possible, it must be the case that the generating polynomial cannot be factored
(this is called irreducible). It can be shown that every finite field with n elements
contains the zero element and some other element which is a primitive mth root
of unity where m = q 1. It turns out that the polynomial x will always be
one of these primitive roots of unity in the system consisting of the q possible
residue polynomials. If the generating polynomial has the property that x is an
mth primitive root of unity where m = q 1, then the generating polynomial is
said to be primitive. Because all finite fields of size q have the same algebraic
structure, we can use the notation GF (q) to denote THE finite field of size q.
For example, let us construct a finite field with 24 = 16 elements. This finite
field will consist of all remainders that are possible when an arbitrary polynomial
with coefficients in GF (2) is divided by some generating polynomial of degree 4 with
coefficients in GF (2). We are going to choose the generating polynomial x4 + x + 1.
It can be shown that this polynomial cannot be factored over GF (2) and thus it is
irreducible. We must also show that it is primitive.
Although there are better ways to show that x is a primitive 15th root of unity
in the system of residue polynomials for x4 + x + 1 with arithmetic in GF (2), we
are going to proceed by listing all of the powers of x in this system and showing
that d = 15 is the first positive exponent where xd = 1. Clearly, x2 = x x
and x3 = x2 x. Now x4 = x3 x, but since this result is of degree 4, we must
1Mathematicians prefer to define these other finite fields in terms of a concept called a
quotient ring rather than the method used in this section. However, most mathematicians then
switch over to the system used in this section for computational purposes. The presentation in
this section is not traditional, but is believed to be easier for the beginning student to understand.
This topic is discussed in more detail in the appendix.

1. ADDITIVE FAST FOURIER TRANSFORMS

165

compute the remainder resulting when x4 is divided by x4 + x + 1 which is x + 1.


Next, x5 = x4 x = (x + 1) x = x2 + x and similarly x6 = x3 + x2 . Then,
x7 = x6 x = (x3 + x2 ) x = x4 + x3 and reducing this result by x4 + x + 1 gives
x3 + x + 1. Continuing in this manner, we see that d = 15 is the first case where
xd = 1. So x4 + x + 1 is primitive and can be used to construct the finite field.
It is more convenient to represent each element of the finite field in terms of the
primitive root of unity. Because we are going to later construct polynomials with
coefficients in GF (16) and will also want to use the symbol x in these polynomials,
we are going to instead use the symbol to denote the primitive root of unity. The
following table shows all of the possible remainders in the example considered above
in terms of . From this point forward, we will use the powers of to represent the
elements of GF (16). Recall that all finite fields also contain a zero element which
is not a power of .

0 =
=

0x3 + 0x2 + 0x + 0
0x3 + 0x2 + 1x + 0

0x3 + 1x2 + 0x + 0

3
4

=
=

1x3 + 0x2 + 0x + 0
0x3 + 0x2 + 1x + 1

5
6

=
=

0x3 + 1x2 + 1x + 0
1x3 + 1x2 + 0x + 0

7
8

=
=

1x3 + 0x2 + 1x + 1
0x3 + 1x2 + 0x + 1

9
10

=
=

1x3 + 0x2 + 1x + 0
0x3 + 1x2 + 1x + 1

11
12

=
=

1x3 + 1x2 + 1x + 0
1x3 + 1x2 + 1x + 1

13
14

=
=

1x3 + 1x2 + 0x + 1
1x3 + 0x2 + 0x + 1

0x3 + 0x2 + 0x + 1

The addition and multiplication tables for this finite field with 16 elements is given
by

166

9. ADDITIONAL TOPICS

10 11 12 13 14

10 11 12 13 14

14

10 13 9

12 11 6

11 14 10 3

10

12 1

11 4

14 9

11 2

13

12 5

10 8

10

10 2

11 8

13 11 3

14 12 4

10 1

11

10 10 5

12 2

11 11 12 6

13 5

12 12 11 13 7

13 4

10 14 5

13 10 0
4
1

13 3

14

14 6

10 6

12 9
3

12

13

7
11

1
9

14 7

12

13 2

10 4

12

12 0

13 0

14 0

10 9

14 3

14

14 4

13 12 7

11 1

14 11 0

11

10

13 13 6

12 14 8

11 7

14 14 3

13 1

12 8

11 10 5

10 11 12 13 14

0
10

10 11 12 13 14 1

10 11 12 13 14 1

10 11 12 13 14 1

10 11 12 13 14 1

10 11 12 13 14 1

10 11 12 13 14 1

10 11 12 13 14 1

10 11 12 13 14 1

13

14

14

12

0
13

11

0
12

10

0
11

10 0

10 11 12 13 14 1

11 0

11 12 13 14 1

10

12 0

12 13 14 1

10 11

13 0

13 14 1

10 11 12

14 0

14 1

10 11 12 13

1. ADDITIVE FAST FOURIER TRANSFORMS

167

There is much more that can be said about the basic properties of finite fields.
In fact, entire books have been written about the topic. Probably the most popular
book is the one written by Lidl and Neidderriter [29]. However, the present author
also highly recommends a book written by Wan [36] which may be more accessible
for an undergraduate student.
Although a finite field can be constructed for any prime power, most finite
fields that are used in practice are of size which is a power of 2. This is because
it is convenient to store these elements using a computer and because addition can
be implemented by simply using the computers exclusive or (XOR) operator.
Suppose that we wish to efficiently evaluate a polynomial of degree less than
n = 2k with GF (2k ) coefficients at each of the elements of this finite field. Since a
finite field consists of the zero element and the mth roots of unity where m = n 1,
each of the elements of GF (2k ) is a root of x(xn1 1) = xn x. The number that
we think of as 1 in our number system is equivalent to the number +1 in GF (2k ),
so xn x = xn + x. Because n 1 is never the power of a small prime (2, 3, 5, 7)
or the product of powers of small primes, a finite field does not contain the correct
primitive root of unity to use one of the FFT algorithms discussed in Chapters 4
and 8. These algorithms are sometimes called multiplicative FFTs because they
take advantage of the multiplicative structure of the set of the roots of unity. The
algorithms introduced in this section are called additive FFTs because they have
advantage of an additive structure present in finite fields.
A basis is a special collection of elements that are linearly independent. Basically, this means that if you take any subset of the elements in the basis, you
cannot add or subtract any number of the elements in the subset and end up with
an element that is not in the subset. 2 The elements of a basis are like building
blocks that can be combined to form a set of elements.
In the example of GF (16), the only combinations of 1 = 1 are 0 1 = 0
and 1 1 = 1. Note that all finite fields with 2k elements have the property that
1+1 = 0, so what we think of as the number 2 is really equivalent to 0 in this number
system. If the element 2 = 5 is added to the basis, then we can add combinations
of 1 and 5 to form {0, 1, 5 , 10 }. Next, if we add 3 = to the basis, then we
can add linear combinations of 1, 5 , and to form {0, 1, , 2 , 4 , 5 , 8 , 10 }.
Finally, if we add 4 = 7 to the basis, then we obtain all of GF (16). A list of each
of the 16 elements in terms of the basis {1, 5 , , 7 } is given below.

2The definitions of basis and linear independence are actually more complicated than this,
but these definitions depend on a more advanced knowledge of algebra.

168

9. ADDITIONAL TOPICS

0 =
1 =
=
2

3
4

=
=

5
6

=
=

7
8

=
=

9
10

=
=

11
12

=
=

13
14

=
=

07 + 0 + 05 + 0 1

07 + 0 + 05 + 1 1
07 + 1 + 05 + 0 1

07 + 1 + 15 + 0 1

17 + 1 + 05 + 1 1
07 + 1 + 05 + 1 1

07 + 0 + 15 + 0 1
17 + 0 + 15 + 1 1

17 + 0 + 05 + 0 1
07 + 1 + 15 + 1 1

17 + 0 + 05 + 1 1
07 + 0 + 15 + 1 1

17 + 1 + 15 + 1 1
17 + 1 + 15 + 0 1

17 + 0 + 15 + 0 1
17 + 1 + 05 + 0 1

The additive FFT is a special case of multipoint evaluation, just like the multiplicative FFTs discussed in the previous chapters are special cases of multipoint
evaluation. However, the modulus polynomial tree is based on the factorization of
xn + x in this case. One design for a modulus polynomial tree that can be used to
efficiently evaluate a polynomial with coefficients in GF (16) at each of the elements
of this finite field is given below. Here, = x2 + x as space did not allow this polynomial to be explicitly included in the figure and this emphasizes the point that all
of the polynomials in each row of the table are the same except for their constant
term.

x16 + x
x8 + x4 + x2 + x
x4 + x
x2 + x
0

x4 + x + 1

+1
5

x8 + x4 + x2 + x + 1

10

+ 5

x4 + x + 5

+ 10
2

+
3

13

+ 4
7

x4 + x + 10
+ 2

12 11

+ 8
9

14

1. ADDITIVE FAST FOURIER TRANSFORMS

169

Note that at every branch of the tree, the elements are split into two equalsized groups based on whether or not one of the basis elements is used in a linear
combination involving the elements of that group. For example, in the first branch
of the tree above, all of the elements which are roots of x8 + x4 + x2 + x share the
property that 4 = 7 is not needed to express the element as a combination of the
basis {1, 5 , , 7 } (see the list of GF (16) in terms of the basis elements above).
Conversely, all of the elements that are roots of x8 + x4 + x2 + x + 1 require 4 if the
elements are to be expressed as a combination of the basis elements. At the second
level of the tree, the elements of each set are divided into two equal-sized groups
based on whether or not 3 = is needed in the linear combination. The third
level of the tree subdivides the elements based on 2 = 5 and the last level of the
tree subdivides the elements based on 1 = 1. Because the elements are subdivided
at each stage based on whether or not a basis element is needed to construct the
element and because the basis elements are added together to form the elements of
a finite field, an FFT that uses this technique to efficiently evaluate a polynomial
at each of the elements is sometimes called an additive FFT.
The additive FFT first appeared in the literature in a paper written by Wang
and Zhu in 1988 [37]. However, it did not work out the details of how to construct
the basis used to subdivide the elements at each stage of the FFT. Mathematicians
often use the big-O notation to measure the cost of an algorithm. The way that
this notation works is to only record the term of the operation count that grows
the fastest and omit the multiplicative coefficient of this term. So all of the multiplicative FFT algorithms require O(n log2 (n)) multiplications and O(n log2 (n))
additions. The Wang-Zhu algorithm requires O(n(log2 (n))2 ) operations in general.
Shortly after the publication of this paper, Cantor [9] (1989) published a paper that
uses a special basis to improve the number of operations needed to compute this
FFT to O(n (log2 (n))1.585 ) operations. The set of elements {1, 5 , , 7 } is an
example of a Cantor basis and has the special property that (i+1 )2 + i+1 = i
for each 1 i < k. In the case of the example used in this section, (5 )2 + 5 = 1,
()2 + = 5 , and (7 )2 + 7 = . Two important properties of the Cantor basis
in the additive FFT is the fact that all of the nonzero coefficients in the modulus
polynomials are ones except in the constant term and that if si (x) is the modulus
polynomial in the left node at a branch in the tree, then si (x) + 1 is the modulus
polynomial in the right node at that branch in the tree. These properties of the
modulus polynomials simplify the reduction step of the additive FFT algorithm
compared to general multipoint evaluation and lead to the O(n (log2 (n))1.585 )
operation count. Additional details of these additive FFT algorithms can also be
found in a paper by Gerhard and van zur Gathen [18] as well as the present authors
doctoral dissertation [31].
Another additive FFT algorithm is introduced in the present authors doctoral
dissertation and is based on earlier work by Shuhong Gao that was only published in
an informal set of notes [17] for a course that he taught in 2001. The new additive
FFT algorithm was improved upon as part of the present authors doctoral research
and is now in a form that requires fewer operations than the Cantor algorithm for
all input sizes where the finite field has 2k elements where k is itself a power of 2. 3
3Most practical uses of finite fields also have the property that k is itself a power of 2 because

this matches the most popular data type sizes used in a computer.

170

9. ADDITIONAL TOPICS

This O(n log2 (n) log2 log2 (n)) algorithm takes advantage of the fact that for any
finite field of this form, xd x divides xk x for all values of d which are powers of
2. Observe that GF (16) = GF (24 ) is a finite field of this form since 4 = 22 . Note
that x4 x divides x16 x since 4 = 22 and 2 = 21 . Also, x2 x divides x16 x since
2 = 21 and 1 = 20 . However, 8 = 23 and 3 is not a power of 2. In this case, x8 x
does not divide x16 x, but instead x8 + x4 + x2 + x does. This modulus polynomial
contains 4 terms while all of the other polynomial contain only 2 terms. Observe
that in the modulus polynomial tree above, the cases where the level is a power of
2 contain fewer terms than the modulus polynomials on level 3 (x8 + x4 + x2 + x
and x8 + x4 + x2 + x + 1). In general, the more terms that are contained in a
modulus polynomial for multipoint evaluation, the more expensive the reduction
step will be at that level. So the power-of-2 levels are less expensive than all of
the other levels. Instead of splitting a problem into 2
equal-sized subproblems,
the
new algorithm subdivides a problem of size n into n subproblems of size n.
So an FFT of size 16 is subdivided into 4 subproblems of size 4 and each of these
subproblems is subdivided into 2 subproblems of size 2. The details of the reduction
step of the new additive FFT algorithm are beyond the scope of this book, but the
important thing to understand is that all of the modulus polynomials involved in
the algorithm are of the form xd x + C where C is some constant. We skip over
all of the levels that contain a greater number of terms in the modulus polynomials
and achieve a lower operation count for the algorithm. Further details about how
this process works can be found in the authors doctoral dissertation.
One of the most important applications of finite fields is in the construction
of Reed-Solomon codes. These codes are important because they allow a compact
disk to work even when it is scratched or becomes dirty. Most of the information
on a CD is stored as finite field elements which can be easily converted to digital
form and used to produce music or information needed by a computer. However,
extra finite field elements are added to sections of data called a block. The extra
elements can be used to detect if there is a mistake somewhere in a block. If a
mistake is detected, then a process can be followed with the given information to
often correct the mistake and allow the CD to work properly anyway. This is called
forward error correction. A new algorithm for decoding Reed-Solomon codes is
included in the doctoral dissertation and is believed to be simpler to understand
than commonly used Reed-Solomon decoding routines. Because the block size on a
compact disc is typically small, the new algorithm is not currently of much practical
value. However, if data is stored differently on compact discs in the future that
uses different Reed-Solomon codes, the new algorithm may one day become more
advantageous.

2. Computer Algebra Algorithms


In this book, we have already seen how the FFT can be used to efficiently
to perform the operation of polynomial multiplication. The FFT can be applied
to several other polynomial operations as well. Probably the best reference that
the reader can use to further explore these topics is the text Modern Computer
Algebra by Gerhard and von zur Gathen [19].

2. COMPUTER ALGEBRA ALGORITHMS

171

In a typical Calculus course, one learns the technique of Newtons Method


to solve certain nonlinear equations. This method can be adapted to solving the
problem of polynomial division with remainder. It turns out that it is possible
to reverse the order of the coefficients for the input polynomials and only focus on
computing the quotient and not the remainder. To find the quotient, we must find a
solution g(x) to an equation of the form f g = 1 where f (x) is a known polynomial.
There is no such polynomial that satisfies this equation, but it is possible to obtain
polynomials that come closer and closer to the desired polynomial. Our goal is
equivalent to the problem to computing approximations for a solution to (g) = 0
where (g) = 1/g f . This is where Newtons Method is applied to this problem.
In a typical Calculus course, one learn that such approximations can be computed
using the formula

gi+1

=
=
=

(gi )
0 (gi )
1/gi f
gi
1/gi2
gi

2 gi f gi2

if g0 is chosen appropriately. In this case, g0 is a constant and must be chosen to be


the inverse of the constant term of f . Each iteration of Newtons Method doubles
the number of coefficients that are correct in the approximation of g. This is called
quadratic convergence.
For the polynomial division problem, once an approximation to g has been
recovered that is accurate in some specified number of positions using Newtons
Method, then this approximation can be substituted into a simple equation to
recover the quotient q(x). This result can then be used to easily recover r(x).
Further details of this Newton division are discussed in the Modern Computer
Algebra book and the present authors dissertation.
Newton division can also be computed using polynomials with finite field coefficients. This is amazing because finite fields do not have the property of limits
which is used to define the derivative in Calculus. Instead, the formal derivative
is used with these polynomials which has many of the same properties of the derivative (product rule, quotient rule, chain rule), but is defined algebraicly instead
of through the use of limits. An alternative approach is given in the authors dissertation which does not use any Calculus concepts to derive this application of
Newtons Method for the problem of polynomial division.
Another important problem in Computer Algebra is the computation of the
greatest common divisor or GCD. The GCD of two or more integers is the largest
integer that divides each of these integers.
EXAMPLE
The factorization of large integers is considered to be a difficult problem, so the
above method cannot be used to find the GCD of two or more integers in general.

172

9. ADDITIONAL TOPICS

However, there is another method that can be used that does not require one to
factor any integers. This method, called the Euclidean algorithm, was invented
around 400 B.C. and is one of the oldest techniques in mathematics that is still
used today. The Euclidean algorithm is based on the fact that if a > b, then the
GCD of a and b is the same as the GCD of b and r where r is the remainder
resulting when a is divided by b. This fact is reused as many times as it takes to
compute the desired GCD.
EXAMPLE
A variant of the Euclidean Algorithm called the Extended Euclidean Algorithm also finds integers u and v such that u a + v b = GCD(a, b). The algorithm
also determines values of ui and vi such that ui a + vi b = ri at each step of the
algorithm where ri is one of intermediate results of the Euclidean Algorithm.
EXAMPLE
The Euclidean Algorithm and Extended Euclidean Algorithm can also be applied to finite fields as well as polynomials with complex or finite field coefficients.
In fact, mathematicians call any algebraic structure where the Euclidean algorithm
can be applied a Euclidean Domain. The Extended Euclidean Algorithm is useful
for finding inverses of finite field elements.
EXAMPLE
Let us more carefully examine the computations of the Extended Euclidean
Algorithm in the above example. Observe that in the first step, XXXXXXXXX.
Similarly, in the second step, XXXXXXXXX. We really dont need the lower coefficients of the input polynomials to compute these results. For example, the first
result can also be computed as follows
EXaMPLE
Note that these input polynomials are formed from the upper coefficients of the
input polynomials of the above example. By carefully figuring out which coefficients
of the input polynomials can be discarded at each step of the Extended Euclidean
Algorithm, it possible to reduce the amount of effort needed to compute the GCD
for large-sized problems. The resulting algorithm is called the Fast Euclidean
Algorithm and is discussed in further detail in the Modern Computer Algebra
book. Most practical-sized integer GCD problems of the late 20th century were not
large enough to fully take advantage of the power of the Fast Euclidean Algorithm
and an algorithm by Kenneth Weber [REF] is more advantageous instead. However,
as problem sizes in the 21st century grow in size, the Fast Euclidean Algorithm is
expected to overtake Webers Accelerated Euclidean Algorithm in many cases.
It is possible to integrate FFT multiplication using the truncated Fast Fourier
Transform into the Fast Euclidean Algorithm as discussed in the present authors
doctoral dissertation. This chapter of the dissertation was motivated by a number
of homework problems and research suggestions given in the Modern Computer
Algebra book. More investigation is needed in this area, but most GCD problem

3. TRUNCATED FAST FOURIER TRANSFORM

173

sizes are not yet large enough for such an investigation to likely yield any practical
improvement over techniques currently in place.
One important application of these fast GCD algorithms is the problem of
factorization of polynomials with finite field coefficients. This topic is the subject
of one of the chapters of the Modern Computer Algbera book. The case where the
finite field has 2k elements is treated in a separate paper [18] written by the same
authors. It should also be pointed out that the problems of multipoint evaluation
and interpolation are also considered in another chapter of the Modern Computer
Algebra book.
Because computer algebra is such a new and active area of research, no book on
the topic can cover all of the advances in this branch of mathematics. The present
author has expanded on some of this material and included some of the recent
results of others in his doctoral dissertation. Hopefully, some of this material will
be included in future editions of Modern Computer Algebra and will be improved
upon by others in the future.

3. Truncated Fast Fourier Transform


In the previous chapter, we learned how to develop an FFT routine that works
for any input size of the form 2a 3b 5c 7d . One may wonder how to compute the
FFT for any integer n. If n can be expressed as a product involving a large prime,
then the resulting algorithm would probably be either very difficult to construct or
not much better than just explicitly evaluating the polynomial at the n points.
Since one of the main applications of the FFT is convolution, most of the time
we really dont care if there are exactly n points involved in the FFT or which points
are selected for the evaluations. Instead of computing an FFT of size n, we can
instead compute the FFT of some size bigger than n which can also be expressed
as 2a 3b 5c 7d . For example, instead of computing the FFT of size 59, one could
instead compute the FFT of size 60 = 2 3 5. One difficulty with this method
is finding the bigger input size that should be used with an arbitrary value for n,
especially when n becomes large. This is because factorization of integers becomes
a difficult problem for large values of n. One way to circumvent this issue is to
choose the next largest power of 2 for a given value of n, but this wastes significant
computational effort in the computation of the convolution.
Another technique has been recently introduced by van der Hoeven ([24] , [25])
which may one day prove to be better than the above methods. The presentation
in this section is a modified version of the technique developed for his doctoral
research. The method is based on the following factorization of z N 1 where
N = 2K for some K.

zN 1 =
=

(z 1) (z + 1) (z 2 + 1) (z 4 + 1) (z K1 + 1)
(z 1)

K1
Y

(z d + 1)

d=0

174

9. ADDITIONAL TOPICS

The truncated FFT evaluates a polynomial at each of the roots of some specially
chosen polynomial M(z) of degree n constructed from the factors of z N 1 where
n N . To determine M(z), express n in binary form and include a factor of
d
z 2 + 1 whenever a one appears in the 2d place value of the binary expansion. For
example, let n = 21 so that N = 32. The binary form of n is 21 = (10101)2,
so there are ones in the 24 = 16, 22 = 4, and 20 = 1 place values. In this case,
M(z) = (z 16 + 1) (z 4 + 1) (z + 1).
Recall that the reduction step of the radix-2 FFT algorithm receives as input
f (z) mod (z 2m b2 ) and produces as output fY = f (z) mod (z m b) and fZ =
f (z) mod (z m + b) where m is a power of 2. The truncated FFT algorithm focuses
on the cases where b = 1. The reduction step of the truncated FFT algorithm
receives as input f (z) mod (z 2m 1). If z m + 1 is included in M(z), then the
reduction step produces f (z) mod (z m + 1) and then calls any of the 2-adic FFT
algorithms to evaluate f (z) at each of the roots of z m + 1. Regardless of whether or
not z m +1 is included in M(z), then the reduction step produces f (z) mod (z m 1)
which is used in the next reduction step of the truncated FFT algorithm.
The following butterfly diagram shows how the algorithm would proceed in the
case where n = 21 so that M(z) = (z 16 + 1) (z 4 + 1) (z + 1). The top row of
the figure represents the input to the truncated FFT where squares indicate the
components of the input and the circles indicate coefficients of degree greater than
the input size which are known to be zero in advance. The bottom row of the figure
represents the 21 evaluations of the input polynomial. It can be shown that each
of these locations corresponds to a root of M(z) = (z 16 + 1) (z 4 + 1) (z + 1).

Now the tricky part is to interpolate the evaluations of a polynomial at M(z)


into the polynomial of degree less than n. The inverse radix-2 FFT can be used
for some of the interpolation steps, but there becomes a point where some of the
interpolation steps lack the required inputs to continue.
To continue with the inverse truncated FFT algorithm, we will take advantage of the known zeros that must appear in the upper coefficients of the output
polynomial. At these locations, the interpolation step of the inverse radix-2 FFT
produces the output polynomial fA z m + fB = f (z) mod (z 2m 1) given the input polynomials fY = f (z) mod (z m 1) and fZ = f (z) mod (z m + 1) using the
equations

3. TRUNCATED FAST FOURIER TRANSFORM

fA

fB

175

1
(fY fZ )
2
1
(fY + fZ )
2

At some of the places where the inverse truncated FFT algorithm seems to get
stuck, we know fZ and some of the coefficients of fA (these are the known coefficients in the output polynomial) and we would like to recover some of the coefficients
of fY . The first equation above can be solved for fY to obtain

fY

2 fA fB

We will call this a mixed butterfly operation because it is a mixture of the reduction step of the FFT algorithm and the interpolation step of the inverse FFT
algorithm. After applying the mixed butterfly operation, we have some of the coefficients of f (z) mod (z m 1). We can either apply the mixed butterfly operation
again with the coefficients of f (z) mod (z m + 1) or use the reduction step of the
FFT algorithm to recover some of these coefficients. At some point, we will have all
of the information that we need to recover the desired output polynomial using the
inverse FFT interpolation step and the two additional mixed butterfly operations

fB

fA + fZ

fB

fY fA

and

Again, the inverse truncated FFT is somewhat more complicated than the companion FFT algorithm. The following butterfly diagram is provided to illustrate
how the process works for the case where n = 21. Basically the algorithm proceeds
from the right of the figure to the left recovering values from the top of the figure
down to the bottom. Once enough coefficients have been recovered to undo the
recursion, the algorithm proceeds from the left of the figure to the right, recovering
coefficients from the bottom of the figure to the top.

176

9. ADDITIONAL TOPICS

The following legend is also provided to indicate how each result of the above
butterfly diagram is obtained

Known value at beginning of algorithm (zero value)


Known value at beginning of algorithm (not necessarily zero)

Value computed using FFT reduction step

Value computed using mixed butterfly operation A


Value computed using mixed butterfly operation B

Value computed using inverse FFT interpolation step


Value computed using mixed butterfly operation C

The truncated FFT algorithms are discussed in much more detail in the doctoral
dissertation, but is presented as a generalized version of the algorithm. The algorithm in the dissertation can also work with the additive FFT algorithms introduced
earlier in this chapter, but is presented more abstractly and is thus somewhat more
difficult to read than the description given in this section might suggest.
4. Implementation of FFT Algorithms
This textbook has used the model of computing the cost of an algorithm in
terms of the number of multiplications and the number of additions required to
implement the algorithm. While this is a good model when an algorithm has very
large input sizes, it is not a good model for practical-sized problems of the early
21st century. Computers of this time period have many advanced features that a
computer programmer needs to take advantage of in order to produce an algorithm
that runs the fastest. These features are often difficult to model mathematically,
so mathematicians continue to use operation counts as a measure of comparing
two algorithms. But most people are just interested in the FFT routine that runs
the fastest regardless of what the mathematicians think. The FFTW (Fastest
Fourier Transform in the West) [16] is widely regarded as one of the best software
packages available for computing the FFT. One can read the papers written by the
inventors of this package to better understand the computer science issues involved
with implementing the FFT. Although the current version of FFTW does not use
the most efficient FFT algorithms from the mathematicans viewpoint, one must
periodically revisit the question of the most efficient FFT technique as the input
sizes grow in size in the coming years and new computer architectures are invented
which may favor different methods of computing the FFT. Just by reading the
papers of Frigo and Johnson, one can see several changes in their approach to
computing the FFT over the lifetime of their product.
Another related topic for implementing FFT algorithms is parallel computing.
Here, one gets two or more computers and distributes the work needed to compute
the FFT between the two computers. Special algorithms are needed to communicate the intermediate answers of the FFT to the other computers as efficiently as
possible. The book [10] gives a number of algorithms for computing the FFT in

6. CONCLUDING REMARKS

177

parallel. It is also possible to use these algorithms for multiplying two polynomials
or performing convolution. However, as of this writing, no one seems to be able to
distribute the work of polynomial division, the greatest common divisor, or polynomial factorization in the same way. These are important research problems that
would have great benefits if someone could discover these parallel algorithms.
5. Applications of the FFT
Earlier in this book, it was suggested that the main application of the FFT is
for the efficient computation of the operation of convolution. For many people, this
probably does not sound very impressive because they are not aware of the many
places where convolution is an important component of many practical and useful
techniques in science and engineering.
The book by Brigham [7] contains a large listing of applications of the FFT
and references where the interested reader can explore some of these applications.
Additionally, the extensive bibliography by NAME contains many articles which
discuss practical applications of the FFT.
The present author has recently become interested in the role that some of
the concepts discussed in this book play in the theory of music. Recall that the
original application of the Discrete Fourier Transform was to express a given signal
input as the sum of harmonically related sinusoids. The sinusoids related to one
with a frequency of 440 Hz (also known as the note A) is the basis of much of
traditional music theory. While the FFT does not play a prominent role in the
composition of musical arrangements, the foundation upon which the FFT is built
is the basis for what many would consider to be beautiful music where beautiful
is determined by the ears natural pleasant response to encountering harmonically
related sinusoids. The reference [30] is recommended for exploring this interesting
topic.
6. Concluding Remarks
During his undergraduate education in mathematics and engineering, the present
author had a somewhat difficult time learning the FFT because of the different
perspectives used by the two communities in their publications on the topic. As
a review, the engineers view the FFT as a technique used to efficiently compute
the coefficients of a discrete Fourier series associated with a particular sequence
of signal samples while mathematicians view the FFT as an efficient technique for
solving the multipoint evaluation problem.
Only time will tell whether the algebraic perspective of the FFT becomes more
accepted in the mathematics and engineering communities.
The impact of this book also remains to be seen. It is hoped that other mathematicians and engineers will continue to explore the topics of this book, particularly
those given in this final chapter. Those that make further contributions to the body
of knowledge represented in this book or know of contributions of others that should
be included in the book are encouraged to contact the author (currently at e-mail

178

9. ADDITIONAL TOPICS

address: tmateer@howardcc.edu) so that additions may be made to future revisions


of the text.

APPENDIX A

The relationship between the Fourier Transform


and the Discrete Fourier Transform
Suppose that we are given some function f (t) where t is a real variable and
f (t) can produce either real or complex number outputs. We wish to compute the
Fourier Transform of f (t), i.e.

F (s) =

f (t) e2stI dt.

Since this is an improper integral, let us choose some finite interval of size T0 over
which to evaluate the integrand. This interval will start at a = T0 /2 and end at
b = T0 /2. So we will now consider the proper integral

f (t) e2stI dt

which can be evaluated as a Riemann sum. The Riemann sum is usually defined by dividing the interval of integration into N equal-sized parts and evaluating the function of interest at the left endpoint of each subinterval. Usually, these points are denoted t0 , t1 , t2 , . . . , tN 1 with corresponding function values
f (t0 ), f (t1 ), f (tN 1 ).
To illustrate this process, consider the function

f (t) =

t + 2 if 4 t 4
0
otherwise

for the case where s = 0 so that the complex exponential component of the integral
goes away. We will let T0 = 2 so that the region of integration is from t = 1 to
t = 1. Letting N = 8, the shaded region in the figure
179

180

A. FOURIER TRANSFORM AND DFT

3
2
1
1
1
t 0 t 1 t 2 t 3 t4 t 5 t 6 t 7
is an approximation of the desired integral.
In general, since
ba
i
N
T0
T0
i
= +
2
N

ti

= a+

for all integer values of i in the range 0 i < N and t = T0 /N , then the Riemann
sum can be computed with the formula
N
1
X
i=0

f (ti ) e2sti I t

which represents the shaded area in the figure above when s = 0.


As N increases in size, the Riemann sum becomes a better and better approximation to the given integral. For example, if N = 32 in the case illustrated above,
we obtain the new figure

3
2
1
1

By the power of Calculus,


Z

f (t) e2stI dt

lim

N 1
T0 X
f (ti ) e2sti I
N i=0

A. FOURIER TRANSFORM AND DFT

181

To evaluate the indefinite integral used to define the Fourier transform, we increase
T0 without bound. Since a = T0 /2 and b = T0 /2, then

F (s) =

=
=

f (t) e2tsI dt

lim

T0

lim

T0

T0

T0

f (t) e2stI dt


N 1 
T0
T0
T0
T0
T0 X
f +
i e2s( 2 + N i)I
lim
N N
2
N
i=0

The above formula can be used to compute the Fourier Transform at any input
s in the range to . One problem with this approach is that a computer can
only store the Fourier Transform at a finite number of values for s. So what we are
going to do now is to define a new function FT0 ,N (k) which approximates F (s) for
only N of the possible values for s. Here, we will restrict k to be the N integers
in the interval N/2 k < N/2 and for an input of k, the function FT0 ,N (k)
will approximate F (f0 k) where f0 = 1/T0 and T0 is again the size of the region
over which we will sample the function f (x). So one formula that can be used to
compute FT0 ,N (k) is given by

FT0 ,N (k) =
=


N 1 
T0
T0
T0 T0
T0 X
f +
i e2(f0 k)( 2 + N i)I
N i=0
2
N


N 1
1
1
T0 X
T0 T0
f +
i e2k( 2 + N i)I
N i=0
2
N

Observe that by restricting the values for which we will compute the Fourier Transform to multiples of f0 , we have eliminated T0 from part of the formula. We are
now going to introduce a new function that will simplify this formula even more.
Let fe( ) be a function defined as follows:
fe( )

f
f

T0
N
T0
N

if < N/2

if N/2
T0

where the inputs are restricted to integers in the interval 0 < N . In terms of
the example function for f (t) and the case where N = 8, then fe( ) is given by

182

A. FOURIER TRANSFORM AND DFT

3
2
1
0

We can now define the function FT0 ,N (k) in terms of fe( ) as follows:
FT0 ,N (k) =

N/21
1
T0 X e
f ( ) e2k( N )I
N =0

N 1
1
T0 X e
f ( ) e2k( N 1)I
N
=N/2

The reason why the multiplicative factor T0 /N is included in this formula can be
seen by observing the following figure

3
2
1
0

which uses fe( ) to compute FT0 ,N (k) in the case of the example used throughout
this section. Here, k = 0 which corresponds to the case where s = 0 earlier.
At figure glance, this figure looks somewhat like the earlier one which uses eight
rectangles to approximate the Fourier Transform of f (t) at s = 0 except that the
rectangles are presented in a different order. However, there is another difference
between the two figures. In the earlier case, the width of each rectangle was 1/4 or
T0 /N where T0 = 2 and N = 8. In the second figure, the width of each rectangle
is 1. In order for the summation which uses fe( ) to actually compute FT0 ,N (k), we
must multiply the resulting area by T0 /N .
The second term in the above expression can be simplified to

N 1
1
T0 X e
f ( ) e2k( N )I e2kI
N
=N/2

A. FOURIER TRANSFORM AND DFT

183

By Eulers Formula covered in Chapter 3, then for any integer value of k,

e2kI

=
=
=

cos(2k) + I sin(2k)
1+I0
1

and so FT0 ,N (k) can now be computed using the simplified formula

FT0 ,N (k) =

N 1
T0 X e
f ( ) e2k/N I
N =0

which almost matches the Discrete Fourier Transform formulas introduced in Chapter 1. The only difference between these formulas is the multiplicative constant that
appears in front of the summation.
Let us now define a new function FeT0 ,N (k) as follows:
FeT0 ,N (k) =

N
1
X
=0

fe( ) e2k/N I

where k is allowed to be any integer. Now select any integer c and observe that

FeT0 ,N (k + c N ) =
=

N
1
X
=0

N
1
X
=0

N
1
X
=0

fe( ) e2(k+cN )/N I

fe( ) e2k/N I e2c I


fe( ) e2k/N I 1

= FeT0 ,N (k)

where we have again used Eulers Formula to claim that e2c I = 1. This result
tells us that we can select any integer k1 and FeT0 ,N (k1 ) will be equal to TN0 FT0 ,N (k)
for one of the ks in the interval N/2 k < N/2. This means that we can choose
any N consecutive integers, evaluate FeT0 ,N (k) at each of these integers, and we
will have all N values of FT0 ,N (k), scaled by a multiplicative constant. In practice,
engineers typically use FeT0 ,N (k) to define the Discrete Fourier Transform with the
range of integers 0 k < N and do not multiply the final results by T0 /N . This
matches one of the Discrete Fourier Transform formulas introduced in Chapter 1.

184

A. FOURIER TRANSFORM AND DFT

Given a fixed value for T0 , we now wish to show that FT0 ,N (k) approximates
F (k f0 ) = F (k/T0 ) at the integer values for k in the interval N/2 k < N/2.
Let FT0 (k) be defined as the function that results where N is increased without
bound in FT0 ,N (k). Then

FT0 (k) =
=

=
=

lim FT0 ,N (k)

N 1
T0 X e
f ( ) e2k/N I
N N
=0

N
1 
1
1
T0 X
T0 T0
lim
f +
i e2k( 2 + N i)I
N N
2
N
i=0

N
1 
X
T0
T0
T0
T0
T0
i e2(f0 k)( 2 + N i)I
f +
lim
N
2
N
N
i=0

lim

lim

N
1
X
i=0

T0 /2

T0 /2

f (ti ) e2(f0 k)ti I t

f (t) e2(f0 k)tI dt

If f (t) is bandlimited (which means that there exists some M such that f (t) = 0
for all values of t that are greater than M and all values of t that are less than M ,
then FT0 (k) produces F (f0 k) exactly. Otherwise, as T0 increases, then FT0 (k)
becomes a better and better approximation to F (f0 k).
In the Discrete Fourier Transform FT0 ,N (k), there are two parameters that we
can control to specify how good of an approximation we can make to the Fourier
Transform F (s). The parameter N controls the number of values of for which
the function FT0 ,N (k) will be defined. The parameter T0 controls f0 = 1/T0 which
determines how close the valid inputs to FT0 ,N (k) will be to one another. Then the
function FT0 ,N (k) approximates F (s) for all values of s of the form k f0 for all
integer values of k in the interval N/2 k < N/2. So T0 controls the size of the
interval of values for which FT0 ,N (k) is defined and N controls how many of the
values inside the interval for which the function is defined. By increasing both of
the parameters without bound, FT0 ,N (k) will be a better and better approximation
to F (s).
We now want to show how the inverse Fourier Transform relates to the inverse
Discrete Fourier Transform. The inverse Fourier Transform is defined with the
formula

f (t) =

F (s) e2stI ds

A. FOURIER TRANSFORM AND DFT

185

We can evaluate this integral by letting the parameter fm increase without bound
in the proper integral
Z

F (s) e2stI ds

where a = fm /2 and b = fm /2. This integral can also be evaluated by using the
Riemann sum. We will again divide the interval of integration into N equal-sized
parts and evaluate the function of interest at the left endpoint of each subinterval.
These points can be denoted by s0 , s1 , s2 , . . . , sN 1 with corresponding function
values F (s0 ), F (s1 ), , F (sN 1 ). For example 1, if
F (s) =

(s 1)2 + 1 if 5 s 5
,
0
otherwise

fm = 2, t = 0, and N = 8, then the integration is approximated by


6

1
1
s0 s1 s2 s3 s4 s5 s6 s7

Since

sj

ba
j
N
fm
fm
+
j
=
2
N
= a+

for all integer values of j in the range 0 j < N and s = fm /N , then the
Riemann sum can be computed with the formula
N
1
X
j=0

F (sj ) e2tsj I s

and
1This example is not intended to be the inverse of the earlier example. It was just selected to
produce the graphs that illustrate the integration process for the inverse discrete Fourier transform.

186

A. FOURIER TRANSFORM AND DFT

F (s) e2tsi I ds

lim

N 1
fm X
F (sj ) e2tsj I
N j=0

If N = 32 is the above example, the following figure shows the much better approximation for the integral.
6

By letting fm increase without bound, then the inverse Fourier Transform formula
is obtained.
In theory, the above formula can be used to evaluate f (t) at whatever real
value of t that we want. In practice, however, we usually only know approximations to F (s) at a finite number of multiples of some predetermined fundamental
frequency which is the f0 from the forward Fourier Transform analysis. If N of
function values of these approximations F,N (k) are known for all values of s = kf0
where N/2 k < N/2, then f (t), an approximation to this Riemann sum is given
by

f (t) =

f0

N
1
X
j=0

F,N (j N/2) e2t(f0 (jN/2))I

We are now going to restrict ourselves to inputs of f (t) which are multiples
of T0 /N . Let us use the notation f ? ( ) to refer to the function f (x) with inputs
restricted to values of the form T0 /N In this case, f ? ( ) simplifies to
f ? ( )

= f0
= f0

N
1
X
j=0

N
1
X
j=0

F,N (j N/2) e2( T0 /N )(f0 (jN/2))I


F,N (j N/2) e2 (jN/2)/N I

Since the inputs to f ? ( ) are restricted to multiples of T0 , we can use the evaluations
of Fe,N (k) at all values of k in the interval 0 k < N in place of the evaluations
of F,N (k) given above. Now, f ? ( ) becomes

A. FOURIER TRANSFORM AND DFT

f ( )

187


T0 e
F,N (k) e2 k/N I
f0
N
k=0

N
1 
X
T0 e
F,N (k) e2 (kN )/N I
+ f0
N
N/21 

k=N/2

In terms of F,N (k) and f ? ( ), the example integral is approximated by

6
4
2
0

when N = 8.
Observe that the second term in the above formula simplifies to

f ( )


T0 e
F,N (k) e2 k/N I e2 I
= f0
N
k=0

N
1 
X
T0 e
= f0
F,N (k) e2 k/N I 1
N
N
1 
X

k=0

N 1
1 X e

F,N (k) e2 k/N I


N
k=0

where we have again used Eulers Formula to show that e2 I = 1. We have also
made use of the fact that f0 T0 = 1.
Assuming that limT0 limN F,N (k) = F (f0 k), fm = f0 N = N/T0 ,
and t = T0 /N , then

188

A. FOURIER TRANSFORM AND DFT

lim

lim f ? ( )

T0 N

N 1
1 X e

F,N (k) e2 k/N I


T0 N N

lim

lim

k=0

lim

lim f0

T0 N

lim

lim f0

T0 N

N
1
X
j=0

N
1
X
j=0

F,N (j N/2) e2 (jN/2)/N I

F (f0 (j N/2)) e2 (jN/2)/N I




N 1
N
fm X
fm
F
j
e2 /N (jN/2)I
=
lim lim
fm N N
N
2
j=0
=

=
=
=

N 1
fm X
F (sj ) e2 sj /fm I

fm N N
j=0

lim

lim

N 1
fm X
F (sj ) e2 T0 /N sj I

fm N N
j=0

lim

lim

lim

lim

fm N

lim

fm

N
1
X
j=0

fm /2

fm /2

F (sj ) e2xsj I s

F (s) e2sxI ds

F (s) e2sxI ds

= f (t)

So what we have shown is that given

f ? ( )

N 1
1 X e

F,N (k) e2 k/N I


N
k=0

which is what the engineers typically use as the Discrete Inverse Fourier Transform
formula, as the number of frequency samples (N) increases and the fundamental
frequency f0 = 1/T0 gets closer and closer to 0, then f ? ( ) becomes a better and
better approximation to f (t) for all values of t that are multiples of T0 . Since f0
gets closer and closer to 0, then this means that f (t) becomes a better and better
approximation to f (t) for all t.
The above derivation also shows that we can get away with omitting T0 /N from
the Discrete Fourier Transform formula, provided that the multiplicative factor 1/N
is included in the inverse Discrete Fourier Transform formula. This is typically what
is done if Fe,N (k) is used to define the Discrete Fourier Transform. If

A. FOURIER TRANSFORM AND DFT

FT0 ,N (k) =

189

N 1
T0 X e
f ( ) e2k/N I
N =0

is used for the definition of the Discrete Fourier Transform, then the above analysis
shows that the inverse Discrete Fourier Transform is computed using

f ? ( )

f0

N
1
X
j=0

F,N (j N/2) e2 (jN/2)/N I

Note the multiplicative factor f0 present in this definition.


It can also be shown that if

FbT0 ,N (k) =

N 1
1 X e
f ( ) e2k/N I
N =0

is used for the definition of the Discrete Fourier Transform, then the inverse Discrete
Fourier Transform can be computed using

f ? ( )

N
1
X
k=0

Fb,N (k) e2 k/N I

Note that there is no multiplicative factor present in this definition.


Thoughout the preceding discussion, we have used the term inverse Discrete
Fourier Transform to describe f ? ( ), but we have not yet shown that using either
FT0 ,N (k), FeT0 ,N (k), or FbT0 ,N (k) as input to the appropriately defined f ? ( ) can be
used to recover fe( ). We will now resolve this situation in the case of the most
common Discrete Fourier Transform definition, FeT0 ,N (k).
Before proceeding, one must show that

N
1
X
k=0

2k/N ( n)I

This is left as an exercise for the reader. Then

N
0

if = n
if =
6 n

190

A. FOURIER TRANSFORM AND DFT

f ? ( )

N 1
1 X e

F,N (k) e2 k/N I


N
k=0

N
1

N
1

N
1
X

N
1
X

n=0

fe(n)

k=0

N
1
X
N
1
X
n=0

n=0

fe(n)

fe(n) e2k/N nI
N
1
X
k=0

N
1
X

e2 k/N I

e2k/N nI e2 k/N I
e2k/N ( n)I

k=0

1
=
(0 + 0 + + 0 + fe( ) N + 0 + + 0)
N
= fe( )

Since f ? ( ) = fe( ), we are now justified to use the phrase inverse Discrete Fourier
Transform as f ? ( ) is the inverse of the Discrete Fourier Transform of fe( ). A
similar analysis can yield the same result for F,N (k) and Fb,N (k). This analysis
which relates each of these Discrete Fourier Transform definitions and their inverses
may also help to illuminate why there is a factor of 1/N somewhere in one of the
two definitions.
Let us now return to the expression

f (t) =

N/21
X
1

Fe,N (k) e2t(f0 k)I


N
k=0

N
1
X
1
Fe,N (k) e2t(f0 (kN ))I

N
k=N/2

where t is allowed to be any real number in the interval < t < . Note that
because t is permitted to be a value other than a multiple of T0 , we cannot collapse
the above two summations into a single expression. However, it is shown in Chapter
7 that if it is known that f (t) is a function with no imaginary components, then

N/21

f (t) =

a0 +

k=1

(ak cos(2 (f0 k) t) + bk sin(2 (f0 k) t))

+ aN/2 cos( f0 N t)
where (ak )2 + (bk )2 = |Fe,N (k)/N | and tan() = bk /ak where is the argument of
the complex value Fe,N (k)/N when written in polar form.

A. FOURIER TRANSFORM AND DFT

191

This function is called the Discrete Fourier Series. We already know that
f (t) = f ? ( ) for the N values t = T0 /N where 0 < N . As N increases
without bound, we obtain the Fourier Series

f (t)

= a0 +

k=1

(ak cos(2 (f0 k) t) + bk sin(2 (f0 k) t))

where f (t) = f (t) for any periodic function f (t). In other words, any periodic
function can be represented by an infinite summation of sinusoids where the magnitude of each sinusoid in determined by the Fourier transform. So we can generate
better and better approximations to any periodic function f (t) by increasing the
value of N in the Discrete Fourier Transform of f (t) and using the Discrete Fourier
Series to reconstruct the function.
If T0 is allowed to increase without bound so that f0 gets closer and closer to
zero, then we obtain
Z

f (t) =

F (s) e2stI ds

in other words, the inverse Fourier Transform. Let FA (s) and FB (s) be defined by
the functions such that FA (s1 )2 + FB (s1 )2 = |F (s1 )| and tan() = FB (s1 )/FA (s1 )
where is the argument of the complex value F (s1 )/N when written in polar form
for any s1 . Then the inverse Fourier Transform becomes

f (t) =

(FA (s) cos(2 s t) + FB (s) sin(2 s t)) ds

In particular, if F (s) is a real-valued function, then FB (s) = 0 and the Fourier


Transform formula reduces to

f (t) =

FA (s) cos(2 s t)ds

The importance of the above two formulas is that now nearly any real function f (t)
can be represented as an infinite summation of sinusoids. Instead of the sinusoids
in the summation having frequencies that are multiples of some common value f0 ,
the sinusoids in the summation are all possible frequency values. The Discrete
Fourier Transform can be used in conjunction with the Discrete Fourier Series to
approximate nearly any function f (t) over some finite interval T0 /2 t T0 /2.
For the Discrete Fourier Series to be valid over a wider interval, one should increase
T0 . For the Discrete Fourier Series to be a better approximation to f (t) within the
interval, one should increase N .
We have been careful to say that the above formulas can be used to approximate
f (t) for nearly all possible functions f (t). Some mathematicians who specialize in

192

A. FOURIER TRANSFORM AND DFT

Calculus enjoy constructing bizzarre function definitions to be used as counterexamples when one carelessly uses limits to claim a particular result as a variable
is increased without bound. The so-called Dirichlet conditions restrict the set of
all functions to a smaller subset that can be represented as a Fourier Series. The
Dirichlet conditions claim that if f (t) has a finite number of finite discontinuities,
f (t) has a finite number of maxima and minima, and f (t) is absolutely integrable,
i.e.
Z

|f (t)|dt <

then f (t) can be represented by the inverse Fourier series. The above conditions
are sufficient, but not necessary. This means that there are likely other functions
that can be represented by the inverse Fourier transform that are not covered by
the above conditions. Often the third condition above is relaxed to those functions
that have finite energy, i.e.
Z

|f (t)|2 dt <

Not all functions that meet the relaxed third condition can be represented by an
inverse Fourier transform, but most do. At this point, we will leave it to the
Calculus experts to further partition all possible functions into those that can be
represented by a discrete Fourier transform and those that cannot. Most practical
cases, however, are finite-energy functions that also satisfy the first two Dirichlet
conditions and can be represented by a discrete Fourier transform.
In summary, the Discrete Fourier Transform is a method for interpolating a set
of N values obtained by sampling f (t) uniformly over some interval T0 t T0
into the Discrete Fourier Series of at most N sinusoids that can be evaluated at
the N sample points and obtain the N sample values. As the number of samples
N within the interval increases, the Discrete Fourier Series becomes a better and
better approximation for f (t) within that interval. If f (t) has period T0 , then this
resulting Fourier Series represents f (t) exactly. As T0 increases without bound,
then the Discrete Fourier Series approaches the inverse Fourier Transform which can
represent nearly any function f (t) as an infinite summation of sinusoids. The inverse
Discrete Fourier Transform is used to efficiently evaluate the Discrete Fourier Series
at the N points used to generate this series. The Discrete Fourier Transform is an
approximation for the Fourier Transform, the inverse Discrete Fourier Transform
is an approximation for the inverse Fourier Transform, and the inverse Discrete
Fourier Transform is the inverse of the Discrete Fourier Transform. There are
several valid definitions of the Discrete Fourier Transform, each of which differ
from one another only by a multiplicative constant.
In the main part of the book, we will use the simpler and more conventional
notation f ( ) to represent fe( ) and F (k) to represent Fe,N (k).

APPENDIX B

Residue Rings
1. Background
In an abstract algebra course, a quotient ring is typically defined as R/A where
R is some ring and A is an ideal of R. This partitions R into a number of subsets
which are the elements of R/A. A standard result of an abstract algebra course is
to prove that these subsets form a ring using the operations

(a + A) (b + A)

(a + A) (b + A)

= (a + b) + A
= (a b) + A

Here, a, b, a + b, and a b are elements of R, while a + A, b + A, (a + b) + A,


and (a b) + A are elements of R/A. The + sign in the expressions a + A and
b + A is part of the representation of the element of R/A, while the + sign in the
expression (a + b) means to do an addition operation of the two elements a and b.
The operations of R/A actually combine two subsets of R into another subset of
R. The subsets are the elements of R/A.
For example, let R be the ring of integers and let A be the even integers in
R, i.e. A = {..., 4, 2, 0, 2, 4, ....} . This subgroup A of R satisfies all of the
conditions needed to be an ideal and so R/A is a ring. The only other subset of R
which is a member of R/A is 1 + A = {..., 5, 3, 1, 1, 3, 5, ....}. Thus, R/A has
two elements, namely A and 1 + A (i.e. the even integers and the odd integers)
which can be combined according to the operations above.
The quotient ring just constructed is essentially the binary number system
with the same arithmetic operations. Although this may appear to be a rather
complicated way of defining this number system, this is typically the approach taken
by algebra textbooks and this appears in engineering textbooks involving coding
theory as well. Then, as soon as this complicated construction has been achieved,
it is often frequently then swept under the carpet with a statement something like
Typically what is done is to choose a representative element from each subset
and let the representative denote the entire coset. Each representative is typically
chosen to be the smallest element of the subset. In the case of the example, the
representatives are 0 and 1 and the arithmetic tables for the binary number system
are defined in terms of these two elements. From this point forward, most people
forget about the fact that they are working with subsets and become comfortable
working with the representative elements. Most proofs involving such quotient rings
193

194

B. RESIDUE RINGS

assume that the representatives are really the elements of the quotient ring, when
in reality the elements are actually subsets of elements in some other ring which
are combined in a way isomorphic to the representative elements. This method
of instruction is traditional, but typically very confusing for the beginning algebra
student and for engineers who need to learn finite fields for coding theory.
Throughout this manuscript, we have claimed that the collection of residue
polynomials for a fixed modulus polynomial is equivalent to the algebraic structure
of a quotient ring in the case of univariate polynomials. In this section of the appendix, we will provide a proof of this claim. First, we will show that the collection
of residue polynomials is indeed a ring and second we will show that this set of
elements is isomorphic to a similarly defined quotient ring.

2. Definitions
There is some variation among abstract algebra textbooks about definitions
involving basic abstract algebra concepts. Here, we have adopted the definitions
given in [?] (except for Euclidean Domain with unique remainders and residue rings
which are introduced in this discussion).
Integral domain.
A commutative ring with identity 1 6= 0 is called an integral
domain if it has no zero divisors.

Euclidean domain.
An integral domain R is said to be a Euclidean Domain if
there is a norm N on R such that for any two elements a and b of R
with b 6= 0, there exist elements q and r in R with
(2)

a= qb+r

with r = 0 or N (r) < N (b). Also, if a 6= 0 and b 6= 0, then


N (ab) N (a).
A Euclidean Domain is said to have unique remainders for
any selection of a and b in R with b 6= 0, there exist unique elements
q and r in R that satisfies (2)

3. THE SET OF RESIDUES IS A RING

195

Residues.
Let D be a Euclidean domain with unique remainders and let
m be some element of R. Let D\m denote the set of all possible
elements of r in (2). We will call this collection of elements the
set of residues. This is also the set of elements a of D such that
N (a) < N (m) with any additional restictions placed on the remainders so that D has unique remainders.
Given elements f and m of D, a Euclidean domain with unique
remainders, the function f rem m is defined as the element r of D
obtained from (2).

Note: Since D has unique remainders, then r is unique and so the function f rem
m is well-defined.
Let us define addition and multiplication as follows over D\m, given elements
a and b D\m.
(3)

ab

(a + b) rem m

(4)

(a b) rem m

Here,  and are the operations defined over D\m, while + and are operations defined over D. Since + and are assumed to be well-defined over D and the
rem operation produces a unique result for any element of D, then  and
are
also well-defined.
3. The set of residues is a ring
Lemma 1. Let f be an element of D and let g be an element of D such that
f = g + k m for some k D. Then f rem m = g rem m.
Proof:
Let rf be result of computing f rem m. Thus, f = qf m + rf where qf
and rf are uniquely determined and rf satisfies the conditions to be a remainder
in D. Let rg be the result of computing g rem m. Thus, g = qg m + rg . Since
f = g + k m for some k D, then g + k m = qf m + rf and subtracting k m
from both sides of this equation yields g = qf m k m + rf = (qf k) m + rf .
Since rf satisfies the conditions to be a remainder in D and g rem m is unique,
then rf = rg and f rem m = g rem m. 
Using this lemma, we will now show that the set of elements D\m with addition
operation defined by  and multiplication operation defined by is a commutative
ring with identity. Some of the parts of this proof are left as exercises.

196

B. RESIDUE RINGS

(1) Closure of : By closure of + in D, then a + b is also an element of D. By


definition of rem, a + b rem m is also an element of D that satisfies the properties
to be a remainder in D. So a  b is also an element of D\m.
(2) Associativity of : Let a, b, and c be elements of D\m. Let r1 = a  b.
Note that r1 is an element of D\m that can be expressed as r1 = (a + b) q1 m
for some q1 . Now, let r2 = (a  b)  c = r1  c. Note that r2 can be expressed as
r2 = (r1 + c) q2 m = ((a + b) q1 m + c) q2 m for some q2 . Since D is a
Euclidean Domain, it is also an integral domain and is therefore commutative. By
commutativity and the distributive property of D, r2 = ((a + b) + c) (q1 + q2 ) m.
Since r2 satisfies the properties to be a remainder in D, then r2 is the unique
remainder ((a + b) + c) rem m.
Let r3 = b  c. Note that r3 is an element of D\m that can be expressed as
r3 = (b + c) q3 m for some q3 . Finally, let r4 = a  (b  c) = a  r3 . Note that r4
can be expressed as r4 = (a + r3 ) q4 m = (a + (b + c) q3 m) q4 m. Again by
the distributive property of D, r4 = (a + (b + c) (q3 + q4 ) m. Since r4 satisfies
the properties to be a remainder in D, then r4 is the unique remainder (a + (b + c))
rem m.
By the associative property of D, then r2 = ((a + b) + c) rem m = (a + (b + c))
rem m = r4 .
(3) Identity element: Let a be an element of D\m and 0 be the identity element
of D. Let z = 0 rem m. Then z satisifies the conditions to be a remainder in D
and is thus an element of D\m. Here, z can be expressed as 0 = q m + z. So
z = q m. Now, by the Lemma, a  z = (a + z) rem m = a q m rem m = a
rem m. So, z is the identity element in D\m.
(4) Additive inverse: Let a be an element of D\m. Let b be defined as the
element a rem m. Such an element exists because a is an element of D. This
element b is the unique remainder in D\m that satisfies a = qm+b. So b = qma
for some q in D. Now a  b = (a + b) rem m = (a + q m a) rem m = q m rem
m = 0. Thus, b is the additive inverse of a. Since a is arbitrary, then every element
of D\m has an additive inverse.
(5) Commutativity of : Let a and b be elements of D\m. Each of these
elements is also a member of D. Since D is a Euclidean Domain, it is also an integral
domain and is therefore commutative. By the commutativity of D, a  b = (a + b)
rem m = (b + a) rem m = b  a
(6) Closure of : By closure of in D, then a b is also an element of D. By
definition of rem, a b rem m is also an element of D that satisfies the properties
to be a remainder in D. So a b is also an element of D\m.
(7) Associativity of
proof of (2).

: This proof is left as an exercise and is similar to the

(8) Identity element in : Because D is a integral domain, it contains an


element 1 6= 0 such that a 1 = a for all a D. Let e = 1 rem m. Since e is a valid
remainder in D, then e D\m. By (2), e can be expressed by 1 = q1 m + e for

4. THE RESIDUE RING (D\m, ,

) IS ISOMORPHIC TO THE QUOTIENT RING (D/(m), , )


197

some q1 D. Now, let b be any element of D\m. Then, b e = (b e) mod m.


Thus, there exists a q2 D such that (b e) = q2 m + r2 where r2 = b e. So,
r2 = (b e) q2 m = (b (1 q1 m)) q2 m = b 1 (q1 + q2 ) m = b (q1 + q2 ) m.
Since b is a valid remainder in D, then r2 = b. Since b e is a unique remainder,
then b e = b. Since b is an arbitrary element of D\m and b e = b, then e is the
identity element in .
(9) Commutativity of
proof of (5).

: This proof is left as an exercise and is similar to the

(10) Distributive laws: Let a, b, and c be elements of D\m.


Let r1 = b  c. Note that r1 is an element of D\m that can be expressed as
r1 = (b+c)q1 m for some q1 . Next, let r2 = a (bc) = a r1 . Note that r2 can be
expressed as r2 = (ar1 )q2 m = (a((b+c)q1 m)q2 m = (a(b+c))(q1 +q2 )m.
Since r2 satisfies the properties to be a remainder in D, then r2 is the unique
remainder (a (b + c)) rem m.
Let r3 = a b. Note that r3 is an element of D\m that can be expressed
as r3 = (a b) q3 m for some q3 . Let r4 = a c. Similarly, r4 is an element
of D\m that can be expressed as r4 = (a c) q4 m for some q4 . Finally, let
r5 = (a b)  (a c) = r3  r4 . Now, r5 is an element of D\m that can be
expressed as r5 = (r3 + r4 ) q5 m for some q5 . Note that r5 can be expressed as
r5 = (((ab)q3 m)+((ac)q4 m))q5 m = ((ab)+(ac))(q3 +q4 q5 )m. Since
r5 satisfies the properties to be a remainder in D, then r5 is the unique remainder
((a b) + (a c)) rem m.
By the distributive property of D, then r2 = (a(b+c)) rem m = ((ab)+(ac))
rem m = r5 .
So, a (b  c) = (a b)  (a c). By (9), (b  c)
so both distributive properties hold.

a = (b

a)  (c

a) and

Since the set of elements D\m, we will now refer to this set of elements as a
residue ring.

4. The residue ring (D\m, , ) is isomorphic to the quotient ring


(D/(m), , )
We will now show that given a Euclidean Domain D with unique remainder
and some element m D that the residue ring D\m is isomorphic to the quotient
ring D/(m) where (m) is the ideal of all multiples of m.
Let be a mapping from D\m D/(m) be defined as follows:
(a) = a + (m)
Now is well-defined because it maps every element of D\m to exactly one
element of D/(m)

198

B. RESIDUE RINGS

Let (a) = (b) for two elements a and b of D\m. Then (a) and (b) are
elements of D/(m) and can be expressed as a + (m) and b + (m), respectively. Now
(a + (m)) (a + (m)) = (a  (a)) + (m) = z + (m), where z is the zero element of
D\m. Thus, (a + (m)) = (a + (m)). Since b + (m) = a + (m), then (b + (m))
(a + (m)) = z + (m) as well. Since (b + (m)) (a + (m)) = (b  (a)) + (m), then
(b  (a)) + (m) = z + (m). Let = b  (a) D\m. Since + (m) = z + (m),
there must exist an element C such that = z + C m = C m. Assume that
6= z. By definition of Euclidean Domain, then N () = N (C m) N (m). But
this violates the definition of D\m. So = z and therefore a = b. Thus, is a 1-1
function.
Let a + (m) be any element of D/(m) where a is an element of D. Let b = a
rem m. Here, b is an element of D which is also an element of D\m. Furthermore,
b can be expressed as b = a q m for some q D. Now (b) = (a q m) + (m).
By properties of ideals, (a q m) + (m) = a + (m). Thus, for every element a + (m)
of D/(m), there exists an element b of D\m such that (b) = a + (m). So is an
onto function.
Since is a well-defined, 1-1, onto function, is a bijection. To show that
D\m and D/(m) are isomorphic, we must show that (a  b) = (a) (b) and
(a b) = (a) (b) for all a and b in D\m. By definition of , (a) = a + (m)
and (b) = b + (m).
By definition of , (a) (b) = (a + b) + (m). Now, a  b = (a + b) rem m.
There is an element q1 of D such that a  b can be expressed as (a + b) q1 m. So
(a  b) = ((a + b) q1 m) + (m). By properties of ideals, ((a + b) q1 m) + (m) =
(a + b) + (m). So, (a  b) = (a) (b).
By definition of , (a) (b) = (a b) + (m). Now, a b = (a b) rem m.
There is an element q2 of D such that a b can be expressed as (a b) q2 m. So
(a b) = ((a b) q2 m) + (m). By properties of ideals, ((a b) q2 m) + (m) =
(a b) + (m). So, (a b) = (a) (b).
So is a bijection that satisfies (ab) = (a)(b) and (a b) = (a) (b).
Thus, D\m is isomorphic to D/(m) and we have shown that a residue ring is essentially equivalent to a quotient ring in the case where D has unique remainders. The
set of residues resulting from the division of all polynomials over a coefficient ring
by a fixed modulus polynomial is the residue ring used throughout this manuscript
that was argued to be isomorphic to a similarly defined quotient ring.

5. Examples
(1) The integers are a Euclidean domain with norm given by N (a) = |a|. It is
not a Euclidean domain with unique remainder, however. Suppose that a = 7 and
m = 3. Note that 7 = 2 3 + 1 = 3 3 2. Since both 1 and 2 have norms less
then N (m) = 3, then 1 and 2 are each valid reminders in this case. If we add the
additional restriction that the remainder be nonnegative, then we have a Euclidean
domain with unique reminder. Alternatively, one could add the restriction that

5. EXAMPLES

199

the remainder be nonpositive and obtain a different Euclidean domain with unique
remainder for the integers.
It is possible to restrict the allowable remainders to some other range of the
integers of length m by carefully selecting the definitions of norm and arithmetic
functions over D\(m). For example, if we wish the elements of D\(3) to be 3, 4,
and 5, then use N (a) = |a 3| (plus the restriction that the remainder be greater
than 2) and define addition and multiplication as follows:

ab =

3 + (a + b) rem 3

3 + (a b) rem 3

b =

In this case, z = 3 and the remainders are restricted to be 3, 4, and 5. Although


these more general remainder rings can be constructed in theory, it is unclear what
applications they have that cannot be handled by the simpler remainder rings that
involve N (a) = |a|.
Let us select the restriction of nonnegative remainders. Then Z\n is a remainder ring isomorphic to Z/(n). If n is a prime number p, then Z\p can be used to
construct the finite field GF (p).
(2) If F is a field, then the polynomial ring F [x] is a Euclidean Domain with
norm given by N (f ) = deg(f ) where f is a polynomial in F [x] and deg(f ) is
the degree of f . In this case, F [x] is a Euclidean Domain with unique remainder
with no additional restrictions required. If m(x) is any polynomial in F [x], then
F [x]\m(x) is a remainder ring isomorphic to F [x]/(m(x)). If F = Zp and m(x) is
an irreducible polynomial of degree n, then Zp [x]\m(x) can be used to construct
the finite field GF (pn ).
(3) The Gaussian integers Z[i] consist of the subset
of rational complex numbers
Q(i) given by Z(i) = {a + i b|a, b Z} where i = 1. It can be shown that
Z[i] is an integral domain. Here, we will show that Z[i] is a Euclidean domain
with unique remainder. Let N (a + b i) = a2 + b2 be the norm where the norm
is defined over the elements of Q(i) as well as the elements of Z[i]. This norm is
multiplicative, meaning that N (1 ) N (2 ) = N (1 2 ) for any 1 , 2 Z(i). Let
= a + b i and = c + d i 6= 0 be two elements of Z(i). Since Z[i] Q(i), then
and are also elements of Q(i), which is a field. Compute / = s + t i where
s = (ac+bd)/(c2 +d2 ) and t = (bcad)/(c2 +d2 ). Here, s and t are elements of Q,
so Q(i). Choose s0 to be the closest integer to s and t0 to be the closest integer to
t. Then let q = s0 +t0 i Z[i] and r = q Z[i]. Observe that (s+ti)(c+di) =
(/) () = , so r = (s + t i) (s0 + t0 i) = ((s s0 ) + (t t0 ) i) .
Since s0 and t0 were selected to be the closest integers to s and t respectively, then
|s s0 | 1/2 and |t t0 | 1/2 Thus, (s s0 )2 1/4 and (t t0 )2 1/4. So
N ((s s0 )+ (t t0)i) = (s s0 )2 + (t t0 )2 1/4 + 1/4 = 1/2. By the multiplicative
property of the norm, N (r) = N ((s s0 ) + (t t0 ) i) N () 1/2 N () N ().
So Z[i] is a Euclidean domain. The only case where a remainder is not uniquely
defined is when s or t is half of an odd integer. In this case, there are two possible
integers for s0 or t0 depending on which way we round. To resolve this discrepancy,

200

B. RESIDUE RINGS

one must specify whether to round up or round down. One can even specify
different rules for s0 and t0 if desired. Once these rules have been specified, then
Z[i] is a Euclidean domain with unique remainder. Then Z[i]\ is a remainder ring
for any 6= 0. The specific elements contained in Z[i]\ will vary depending on
the rounding rules selected for s0 and t0 , but the algebraic structure should be the
same for each case.
For example, let = 1 + i. If we round down for both s0 and t0 , then Z[i]\ =
{0, i}. If we round up for both s0 and t0 , then Z[i]\ = {0, i}. If we round down
for s0 , but up for t0 , then Z[i]\ = {0, 1}. If we round up for s0 , but down for t0 ,
then Z[i]\ = {0, 1}. Each of these rings is isomorphic to Z\2, Z/(2), and GF (2)
where the nonzero element is the identity element of the ring in each case.
By varying , we obtain other finite rings. Only when Z[i]\ has a prime
number of elements will the finite ring be isomorphic to a finite field.

domain
using the norm
(4)The Eisenstein integers Z[ 3] are a Euclidean

N (a + 3 b) = a2 + 3 b2 . for any element a + 3 b Z[ 3] and the method


of obtaining the quotient and remainder used for Z[i] . More complicated rounding
rules need to be developed to convert this into a Euclidean domain with unique
reminder, but this is theoretically possible. Then Z[]\ is a remainder ring for
any Z[] where 6= 0. It remains to be determined if this yields any algebraic
structures different from Z[i]\ for any Z[i] and if any of these structures have
any practical value.
(5) Other quadratic integer
rings. It is possible to create remainder rings for
other quadratic integer rings Z[ D] for certain values of D. The valid values of
D are determined by those values which yield Euclidean domains. Other examples
of valid values
D = 2, 7, and 11. In eachcase, the norm
of D are
is defined
2
2
by N (a
+
D

b)
=
a
+
D

b
.
for
any
element
a
+
D

Z[
D]. In order

for Z[ D] to be a Euclidean domain, then a method of obtaining a quotient


q and
remainder r must be developed such that N (r) < N () for any in Z[ D]. For
the valid values of D mentioned above, the technique of obtaining a quotient and
remainder used for Z[i] will work in each case. For other values of D such as 5,
it is not possible to meet this condition. To turn each Euclidean domain into one
with unique remainders,one must develop rules for specify what happens if the
quotient
obtained in Q( D) is the same distance
from several different points in

Z[ D].Once this has been determined, then Z[ D]\ is a remainder ring for any
Z[ D] where 6= 0. Again, it remains to be determined if this yields any new
algebraic structures and if these structures have any practical importance.
6. Concluding Remarks
This section of the appendix showed that the construction of residue rings is
isomorphic to quotient rings and finite fields involving Euclidean Domains with
unique remainders, but can be defined without the complicated concept of ideals
for beginning algebra students and engineers. Specifically, the representative elements commonly used in such quotient rings have an algebraic structure of their
own which can be used in place of the quotient ring.

6. CONCLUDING REMARKS

201

The theory of residue rings can be used for more than just constructung finite
fields. Residue rings can replace any application of quotient rings when a ring used
to construct a quotient ring is also a Euclidean domain with unique remainders.
The Gaussian integer rings mentioned in the previous section is one example of
these additional applications.

APPENDIX C

The convolution theorem


Because one of the main uses of the FFT for engineers is to efficiently compute
convolution and the approach of this book is not traditionally used by engineers, we
are going to show how to establish the convolution theorem using only engineering
techniques. The result is more difficult to establish from this approach, but shows
that the two perspectives of the FFT are consistent with one another.
Let x( ) be the input to an engineering system, ~( ) be the impulse response
for the system, and y( ) be the output defined by

y( )

n1
X
k=0

x( k) ~(k)

Now let X(k) be the (engineers) discrete Fourier transform (DFT) of x( ) and

H(k)
be the (engineers) DFT of ~( ). So, the input and transfer functions can be
expressed in terms of the discrete Fourier series using

x( )

~( )

n1
1 X
X(i) W i
n i=0

n1
1 X
H(j) W j
n j=0

where W = eI2/n . Recall that the engineers FFT scales the coefficients of
the discrete Fourier series by n/T0 , the engineers IFFT includes a scaling of the
result by T0 , and so a factor of 1/n is needed in the IFFT to produce the correct
coefficients for the series.
Using the definition of convolution, the output can now be expressed by
203

204

C. THE CONVOLUTION THEOREM

y( )

n
X

k=0

x( k) h(k)

n1
X
k=0

n2
1

n2
1

n2

n1
X
i=0

1
X(i) i( k)
n

n1
X n1
X n1
X
k=0 i=0 j=0

n1
X
X n1
X n1
i=0 j=0 k=0

n1
X n1
X
i=0 j=0

! n1
X1

H(j)
jk
n
j=0

X(i) i( k) H(j)
jk

X(i) H(j)
i (ji)k

X(i) H(j)
j

n1
X

(ji)k

k=0

Now consider the expression

n
X

ck

k=0

1 + c + 2c + + n/2c + (n/2+1)c + + (n1)c

where c = j i. It can be shown that this expression equals n if c is a multiple of n


and 0 otherwise. So in y( ), the summation is zero for all cases where i 6= j. Thus,

y( )

n
n1 n1
X
1 XX
j
X(i) H(j)
(ij)k

n2 i=0 j=0
k=0

=
=

1
n2

n1
X
i=0

n1
X
k=0

(n X(i) H(i)) i


1
X(k) H(k) k
n

Now the output sequence of the system can also be expressed in terms of the discrete
Fourier series using

y( )

n1
X
k=0

1
Y (k) k
n

where Y (k) is the discrete Fourier transform of y( ). Subtracting these two formulas
for y( ) and multiplying by n, we obtain

C. THE CONVOLUTION THEOREM

n1
X
k=0

(Y (k) X(k) H(k)) k

205

= 0

which holds for every 0 < n.


Let us now define ck = Y (k) X(k) H(k) for every 0 k < n and consider
the following system of equations:

c0 0
c0 1

+ c1 0
+ c1 2

+
+

+ cn1 0
+ cn1 n1
..
.

=
=

0
0

c0 n1

+ c1 2(n1)

+ cn1 (n1)(n1)

using each of the values of in the interval 0 < n. In matrix form, the linear
system of equations can be expressed as:

..
.

1
2
..
.

n1

2(n1)

..
.

1
n1
..
.

(n1)(n1)

c0
c1
..
.
cn1

0
0
..
.
0

Clearly, if we assign c0 = c1 = = cn1 = 0, then we have one solution to this


system of equations. Is it possible that there are other solutions to this system?
This is a question that requires a background in Linear Algebra to answer.
It turns out that the matrix in the above system of equations is a special type
of matrix called a Vandermonde matrix. Using linear algebra techniques, it can be
shown that something called the determinant of this matrix is nonzero if some
characteristics about the components of this matrix are satisfied. In the case at
hand, 1 6= 6= 2 6= =
6 n1 provided that is a primitive root of unity and
the condition is satisfied.
Because the determinant of the above matrix is nonzero, then the above system
of equations has the unique solution c0 = c1 = = cn1 = 0. Consequently, we
have established the convolution theorem

206

C. THE CONVOLUTION THEOREM

Convolution Theorem.
Let x( ) is the input to an engineering system with transfer function
~( ) and the output of the system for this input is y( ) = x( ) ~( ).
Now let X(k) be (the engineers) discrete Fourier transform (DFT) of

x( ), Y (k) be the DFT of y( ) and H(k)


be the DFT of ~( ). Then

Y (k) = X(k) H(k)


for each k in the interval 0 k < n.

To efficiently compute the convolution of an input signal x( ) and a transfer


function h( ), the engineer computes the FFT of these two signals (without scaling
the final result) to obtain X(k) and H(k). By the convolution theorem, the components of these FFTs can be pointwise multiplied to obtain the coefficients of the
discrete Fourier series of y( ), but scaled by n. These components can be scaled by
1/n and used as input to the inverse FFT to obtain y( ). Another valid approach
typically used by the engineer is to scale the output of the IFFT rather than doing
the scaling prior to the IFFT. Again, the property of duality provides a bridge
between the FFTs used in this computation and those used by the mathematician.

Bibliography
[1] Bernstein, D. Multidigit Multiplication for mathematicians.
<http://cr.yp.to/papers.html\#m3>.
[2] Bernstein, D. Fast Multiplication and its applications.
<http://cr.yp.to/papers.html\#multapps>.
[3] Bernstein, D. The Tangent FFT.
<http://cr.yp.to/papers.html\#tangentfft>.
[4] Bittinger, Marvin L. Intermediate Algebra, 9th Edition, Pearson Education (2003).
[5] Bouguezel, Saad, M. Omair Ahmad, and M.N.S. Swamy. An Improved Radix-16 FFT Algorithm, Canadian Conference on Electrical and Computer Engineering, 2: 1089-92, 2004.
[6] Bouguezel, Saad, M. Omair Ahmad, and M.N.S. Swamy. Arithmetic Complexity of the SplitRadix FFT Algorithms, International Conference on Acoustics, Speech, and Signal Processing, 5: 137-40, 2005.
[7] Brigham, E. Oran. The Fast Fourier Transform and its Applications, Prentice Hall (1988).
[8] Buneman, Oscar. Journal of Computational Physics, 12: 127-8, 1973.
[9] Cantor, David G. and Erich Kaltofen. On fast multiplication of polynomials over arbitrary
algebras, Acta Informatica, 28: 693-701, 1991.
[10] Chu, Eleanor and Alan George. Inside the FFT Black Box: Serial and Parallel Fast Fourier
Transform Algorithms, CRC Press (2000).
[11] Cooley, J. and J. Tukey. An algorithm for the machine calculation of complex Fourier series,
Mathematics of Computation, 19: 297-301, 1965.
[12] Dubois, Eric and Anastasios N. Venetsanopoulos. A New Algorithm for the Radix-3 FFT,
IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(3): 222-5, 1978.
[13] Duhamel, Pierre and H. Hollmann. Split-radix FFT algorithm, Electronic Letters, 20: 14-6,
1984.
[14] Duhamel, P. and M. Vetterli. Fast Fourier Transforms: A tutorial review and a state of the
art, Signal Processing, 19: 259-99, 1990.
[15] Fiduccia, Charles. Polynomial evaluation via the division algorithm: the fast Fourier transform revisited, Proceedings of the fourth annual ACM symposium on theory of computing,
88-93, 1972.
[16] Frigo, Matteo and Steven Johnson. The Design and Implementation of FFTW3, Proceedings
of the IEEE, 93(2): 216-231, 2005.
[17] Gao, Shuhong. Clemson University Mathematical Sciences 985 Course Notes, Fall 2001.
[18] von zur Gathen, Joachim and J
urgen Gerhard. Arithmetic and Factorization of Polynomials
over F2 . Technical report, University of Paderborn, 1996.
[19] von zur Gathen, Joachim and J
urgen Gerhard. Modern Computer Algebra, Cambridge University Press (2003).
[20] Gentleman, Morven and Gordon Sande. Fast Fourier Transforms for fun and profit, AFIPS
1966 Fall Joint Computer Conference. Spartan Books, Washington, 1966.
[21] Gopinath, R. A. Comment: Conjugate Pair Fast Fourier Transform, Electronic Letters, 25(16)
1084, 1989.
[22] Heideman, M. T. and C. S. Burrus, A Bibliography of Fast Transform and Convolution
Algorithms II, Technical Report Number 8402, Electrical Engineering Dept., Rice University,
Houston, TX 77251-1892, 1984.
[23] Heideman, Michael T., Don H. Johnson, and C. Sidney Burrus, Gauss and the History of the
Fast Fourier Transform.
[24] van der Hoeven, Joris. The Truncated Fourier Transform and Applications. ISSAC 04 Proceedings, 2004.
207

208

BIBLIOGRAPHY

[25] van der Hoeven, Joris. Notes on the Truncated Fourier Transform. Preprint., 2005.
[26] Johnson, Steven G. and Matteo Frigo. A modified split-radix FFT with fewer arithmetic
operations, IEEE Trans. Signal Processing, 55 (1): 111-119, 2007.
[27] Kamar, I. and Y. Elcherif. Conjugate Pair Fast Fourier Transform. Electronic Letters, 25 (5):
324-5, 1989.
[28] Krot, A. M. and H. B. Minervina. Comment: Conjugate Pair Fast Fourier Transform, Electronic Letters, 28(12): 1143-4, 1992.
[29] Lidl, Rudolf, and Harald Neiderreiter. Finite Fields. Encyclopedia of Mathematics and Its
Applications, Volume 20, Cambridge University Press (1987).
[30] Loy, Gareth. Musimathics: the mathematical foundations of music, volume 2, MIT Press,
2007.
[31] Mateer, Todd D. PhD Dissertation.
[32] Merris, Russell. Combinatorics (Second Edition), Wiley (2003).
[33] Nussbaumer, H.J. Fast Fourier Transforms and Convolution Algorithms, Springer (1990).
[34] Suzuki, Y
oiti, Toshio Sone, and Kenuti Kido. A New FFT algorithm of Radix 3, 6, and 12,
IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(2): 380-3, 1986.
[35] Takahashi, Daisuke. An Extended Split-Radix FFT Algorithm, IEEE Processing Letters, 8
(5): 145-7, 2001.
[36] Wan, Zhe-Xian. Lectures on Finite Fields and Galois Rings, World Scientific Publishing Co.
(2003).
[37] Wang, Yao and Xuelong Zhu. A Fast Algorithm for Fourier Transform Over Finite Fields
and its VLSI Implementation, IEEE Journal on Selected Areas in Communications, 6 (3):
572-7, 1988.
[38] Yavne, R. An economical method for calculating the discrete Fourier transform, Proc. Fall
Joint Computing Conference, 115-25, 1968.

Das könnte Ihnen auch gefallen