
Chapter 2

Some Mathematical and Computational Preliminaries

The development in Chapter 1 showed that the computation of the DFT involves the multiplication of a matrix M by a vector x, where the matrix has very special structure. In particular, it is symmetric, and each of its elements is a power of a single number $\omega$, where $\omega$ depends on the order N = 2n + 1 of the matrix¹. These numbers are called the twiddle factors. Moreover, since $\omega^N = 1$, only N of the $N^2$ entries in the matrix are actually different. Finally, since $\omega$ depends on N, in many contexts it will be necessary to distinguish $\omega$'s corresponding to different values of N; to do this, the notation $\omega_N$ will be used.
For example, the matrix M in (1.6) satisfies

$$
M = \begin{bmatrix}
1 & 1 & 1 & 1 & 1\\
1 & \omega & \omega^{2} & \omega^{3} & \omega^{4}\\
1 & \omega^{2} & \omega^{4} & \omega^{6} & \omega^{8}\\
1 & \omega^{3} & \omega^{6} & \omega^{9} & \omega^{12}\\
1 & \omega^{4} & \omega^{8} & \omega^{12} & \omega^{16}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 & 1\\
1 & \omega & \omega^{2} & \omega^{3} & \omega^{4}\\
1 & \omega^{2} & \omega^{4} & \omega & \omega^{3}\\
1 & \omega^{3} & \omega & \omega^{4} & \omega^{2}\\
1 & \omega^{4} & \omega^{3} & \omega^{2} & \omega
\end{bmatrix}.
$$

Given these features, it is not surprising that when N is a power of two, the structure of the matrix can be exploited to reduce the cost² of computing X from the $\Theta(N^2)$ resulting from a straightforward matrix-times-vector computation to $\Theta(N \log_2 N)$. Indeed, exploring the numerous variants of the fast Fourier transform (FFT) algorithm which exploit this structure is a main topic of this book. However, the price of this reduction is the use of complex arithmetic. This chapter deals with various aspects of complex arithmetic, together with the efficient computation of the twiddle factors. Usually the twiddle factors are computed in advance of the rest of the computation, although in some contexts they may be computed on the fly. This issue is explored more fully in Chapter 4 for implementing sequential FFTs and in Chapter 24 for implementing parallel FFTs.
¹In the remainder of this book, the order of the matrices M and $\bar{M}$ will be denoted by N. Moreover, as will be apparent in the development in the next chapter, N will often be a power of 2.

²The $\Theta$-notation defined in Appendix A is used to denote the arithmetic cost of each algorithm.


2.1 Computing the Twiddle Factors

Let N be a power of 2, and recall that

$$
\omega_N^{\,r} = e^{jr\theta} = \cos(r\theta) + j\sin(r\theta), \qquad \theta = \frac{2\pi}{N}, \qquad j = \sqrt{-1}\,.
$$

Since the $\omega_N^{\,r}$'s are the N complex roots of unity, they are symmetric and equally spaced on the unit circle. Thus, if $a + jb$ is a twiddle factor, then so are $\pm a \pm jb$ and $\pm b \pm ja$. By exploiting this property, one needs only to compute the first $N/8 - 1$ values, namely $\omega_N^{\,r}$ for $r = 1, \ldots, N/8 - 1$. (Note that $\omega_N^{\,r} = 1$ for $r = 0$.)

The most straightforward approach is to use the standard trigonometric function library procedures for each of the $N/8 - 1$ values of $\cos(r\theta)$ and $\sin(r\theta)$. The cost will be $N/4 - 2$ trigonometric function calls, and each call will require several floating-point arithmetic operations.
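
For concreteness, a minimal C sketch of this direct approach is shown below; the function name and array layout are illustrative assumptions, chosen to match Algorithm 2.1 later in this section.

```c
#include <math.h>

/* A minimal sketch of the direct approach: one cos and one sin library call
   per twiddle factor.  Assumes N is a power of two with N >= 8; the arrays
   wcos/wsin anticipate the layout used by Algorithm 2.1 below. */
void compute_twiddles_direct(int N, double wcos[], double wsin[])
{
    const double theta = 2.0 * acos(-1.0) / N;   /* theta = 2*pi/N */
    wcos[0] = 1.0;  wsin[0] = 0.0;               /* w^0 = 1, no calls needed */
    for (int r = 1; r <= N / 8 - 1; r++) {       /* N/4 - 2 calls in total */
        wcos[r] = cos(r * theta);
        wsin[r] = sin(r * theta);
    }
}
```
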
To avoid the relatively expensive trigonometric function calls, one can use the following algorithm proposed by Singleton [83]. It makes use of the trigonometric identities

$$
\begin{aligned}
\cos(a + b) &= \cos a \cos b - \sin a \sin b\,,\\
\cos a = \cos\Bigl(2 \cdot \frac{a}{2}\Bigr) &= 1 - 2\sin^2\frac{a}{2}\,,\\
\sin(a + b) &= \sin a \cos b + \sin b \cos a\,.
\end{aligned}
$$

Letting $\theta = 2\pi/N$ as above, $\cos((r+1)\theta)$ can be computed in terms of $\cos(r\theta)$ and $\sin(r\theta)$ according to the formula derived below.

$$
\begin{aligned}
\cos((r+1)\theta) &= \cos(r\theta + \theta)\\
&= \cos(r\theta)\cos\theta - \sin(r\theta)\sin\theta\\
&= \cos(r\theta)\bigl(1 - 2\sin^2(\theta/2)\bigr) - \sin(r\theta)\sin\theta\\
&= C\cos(r\theta) - S\sin(r\theta)\,,
\end{aligned}
$$

where the constants C and S are

$$
C = 1 - 2\sin^2(\theta/2) \quad\text{and}\quad S = \sin\theta\,.
$$

When $r = 0$, the initial values are $\cos(0) = 1$ and $\sin(0) = 0$. Using these same constants C and S, $\sin((r+1)\theta)$ can be computed in terms of $\cos(r\theta)$ and $\sin(r\theta)$ in a similar way:

$$
\begin{aligned}
\sin((r+1)\theta) &= \sin(r\theta + \theta)\\
&= \sin(r\theta)\cos\theta + \cos(r\theta)\sin\theta\\
&= \sin(r\theta)\bigl(1 - 2\sin^2(\theta/2)\bigr) + \cos(r\theta)\sin\theta\\
&= S\cos(r\theta) + C\sin(r\theta)\,.
\end{aligned}
$$

A pseudo-code program for computing the N/2 twiddle factors is given as Algorithm 2.1 below. Note that the array wcos stores the real part of the twiddle factors
and the array wsin stores the imaginary part of the twiddle factors.


Algorithm 2.1 Singleton's method for computing the N/2 twiddle factors.

begin
    θ := 2π/N
    S := sin(θ); C := 1 − 2 sin²(θ/2)              Call library function sin to compute S and C
    wcos[0] := 1; wsin[0] := 0                     Initialize ω_N^0 = cos(0) + j sin(0)
    for K := 0 to N/8 − 2 do                       Compute the first N/8 factors
        wcos[K + 1] := C · wcos[K] − S · wsin[K]
        wsin[K + 1] := S · wcos[K] + C · wsin[K]
    end for
    L := N/8                                       Store the next N/8 factors
    wcos[L] := √2/2; wsin[L] := √2/2               ω_N^{N/8} = cos(π/4) + j sin(π/4)
    for K := 1 to N/8 − 1 do
        wcos[L + K] := wsin[L − K]
        wsin[L + K] := wcos[L − K]
    end for
    L := N/4                                       Store the next N/4 factors
    wcos[L] := 0; wsin[L] := 1                     ω_N^{N/4} = cos(π/2) + j sin(π/2)
    for K := 1 to N/4 − 1 do
        wcos[L + K] := −wcos[L − K]
        wsin[L + K] := wsin[L − K]
    end for
end
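
A C realization of Algorithm 2.1 might look as follows. This is a sketch under the same assumptions as the pseudo-code (N a power of two with N ≥ 8, and arrays wcos and wsin each holding N/2 entries); the function name is illustrative.

```c
#include <math.h>

/* A C sketch of Algorithm 2.1 (Singleton's method).  Only two trigonometric
   library calls are made; all other factors come from the recurrence and
   from the symmetries of the roots of unity. */
void compute_twiddles(int N, double wcos[], double wsin[])
{
    const double theta = 2.0 * acos(-1.0) / N;      /* theta = 2*pi/N */
    const double S = sin(theta);
    const double C = 1.0 - 2.0 * sin(theta / 2.0) * sin(theta / 2.0);

    wcos[0] = 1.0;  wsin[0] = 0.0;                  /* w^0 = 1 */

    /* First N/8 factors by the recurrence. */
    for (int K = 0; K <= N / 8 - 2; K++) {
        wcos[K + 1] = C * wcos[K] - S * wsin[K];
        wsin[K + 1] = S * wcos[K] + C * wsin[K];
    }

    /* Next N/8 factors by reflection about pi/4. */
    int L = N / 8;
    wcos[L] = sqrt(2.0) / 2.0;  wsin[L] = sqrt(2.0) / 2.0;  /* w^{N/8} */
    for (int K = 1; K <= N / 8 - 1; K++) {
        wcos[L + K] = wsin[L - K];
        wsin[L + K] = wcos[L - K];
    }

    /* Next N/4 factors by reflection about pi/2. */
    L = N / 4;
    wcos[L] = 0.0;  wsin[L] = 1.0;                  /* w^{N/4} = j */
    for (int K = 1; K <= N / 4 - 1; K++) {
        wcos[L + K] = -wcos[L - K];
        wsin[L + K] =  wsin[L - K];
    }
}
```
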

2.2 Multiplying Two Complex Numbers

Recall from Chapter 1 that given complex numbers $z_1 = a + jb$ and $z_2 = c + jd$,

$$
z_1 + z_2 = (a + jb) + (c + jd) = (a + c) + j(b + d)\,,\tag{2.1}
$$

$$
z_1 \times z_2 = (a + jb) \times (c + jd) = (a \cdot c - b \cdot d) + j(a \cdot d + b \cdot c)\,.\tag{2.2}
$$

Note that the relation $j^2 = -1$ has been used in obtaining (2.2).

2.2.1 Real floating-point operation (FLOP) count

Since real floating-point binary operations are usually the only ones implemented by the computer hardware, the operation count of a computer algorithm is almost universally expressed in terms of flops, that is, real floating-point operations. According to rule (2.1), adding two complex numbers requires two real additions; according to rule (2.2), multiplying two complex numbers requires four real multiplications and two real additions/subtractions.
It is well known that multiplying two complex numbers can also be done using three multiplications and five additions. Letting $z = z_1 \times z_2 = (a + jb) \times (c + jd)$, the real part of $z$, denoted by $\operatorname{Re}(z)$, and the imaginary part of $z$, denoted by $\operatorname{Im}(z)$, can be obtained as shown in (2.3) below.

$$
\begin{aligned}
m_1 &= (a + b)\,c\\
m_2 &= (d + c)\,b\\
m_3 &= (d - c)\,a\\
\operatorname{Re}(z) &= m_1 - m_2\\
\operatorname{Im}(z) &= m_1 + m_3
\end{aligned}\tag{2.3}
$$

Compared to directly evaluating the right-hand side of (2.2), the method described in (2.3) saves one real multiplication at the expense of three additional real additions/subtractions. Consequently, this method is more economical only when a multiplication takes significantly longer than an addition/subtraction, which may have been true for some ancient computers, but is almost certainly not true today. Indeed, such operations usually take equal time. Note also that the total flop count is increased from six in (2.2) to eight in (2.3).
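
The trade-off is easy to see side by side. The following C sketch implements both the direct rule (2.2) and the scheme (2.3); the struct and function names are illustrative assumptions, not from the text.

```c
/* Complex numbers as a pair of doubles. */
typedef struct { double re, im; } cplx;

/* Direct evaluation of (2.2): 4 mults, 2 add/subs, 6 flops in total. */
cplx cmul_direct(cplx z1, cplx z2)
{
    cplx z = { z1.re * z2.re - z1.im * z2.im,    /* ac - bd */
               z1.re * z2.im + z1.im * z2.re };  /* ad + bc */
    return z;
}

/* Evaluation of (2.3): 3 mults, 5 add/subs, 8 flops in total. */
cplx cmul_3m5a(cplx z1, cplx z2)
{
    double m1 = (z1.re + z1.im) * z2.re;         /* m1 = (a + b) c */
    double m2 = (z2.im + z2.re) * z1.im;         /* m2 = (d + c) b */
    double m3 = (z2.im - z2.re) * z1.re;         /* m3 = (d - c) a */
    cplx z = { m1 - m2, m1 + m3 };               /* Re(z), Im(z)   */
    return z;
}
```
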

2.2.2 Special considerations in computing the FFT

Since a pre-computed twiddle factor is always one of the two operands involved in each complex multiplication in the FFT, any intermediate results involving only the real and imaginary parts of a twiddle factor can be pre-computed and stored for later use. For example, if $\omega_N^{\,r} = c + jd$, one may pre-compute and store $\alpha = d + c$ and $\beta = d - c$, which are the intermediate results used by (2.3). With $\alpha$, $\beta$, and the real part $c$ stored, each complex multiplication in the FFT involving $x = a + jb$ and $\omega_N^{\,r} = c + jd$ can be computed using three real multiplications and three real additions/subtractions following (2.4) below, where $z = x \times \omega_N^{\,r}$.

$$
\begin{aligned}
m_1 &= (a + b)\,c\\
m_2 &= \alpha\,b\\
m_3 &= \beta\,a\\
\operatorname{Re}(z) &= m_1 - m_2\\
\operatorname{Im}(z) &= m_1 + m_3
\end{aligned}\tag{2.4}
$$

Compared to using formula (2.2), the total flop count in (2.4) remains six, but one multiplication has been exchanged for one addition/subtraction. This will result in lower execution time only if an addition/subtraction costs less than a multiplication. As noted earlier, this is usually not the case on modern computers. The disadvantage is that 50% more space is needed to store the pre-computed intermediate results involving the twiddle factors: $c$, $\alpha$, and $\beta$ must be stored, rather than $c$ and $d$.

The paragraph above, together with (2.4), explains the common practice by researchers of counting a total of six flops for each complex multiplication when evaluating the various FFT algorithms. Following this practice, all complexity results provided in this book are obtained using six flops as the cost of a complex multiplication.
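
A sketch of how (2.4) might look in code is given below; the struct layout for the stored triple $(c, \alpha, \beta)$ and all names are illustrative assumptions.

```c
typedef struct { double re, im; } cplx;

/* Pre-computed data per twiddle factor w = c + jd:
   alpha = d + c and beta = d - c, stored alongside c. */
typedef struct { double c, alpha, beta; } twiddle;

/* Evaluation of (2.4): 3 mults, 3 add/subs, six flops in total,
   matching the direct rule (2.2). */
cplx cmul_twiddle(cplx x, twiddle w)
{
    double m1 = (x.re + x.im) * w.c;     /* m1 = (a + b) c */
    double m2 = w.alpha * x.im;          /* m2 = (d + c) b */
    double m3 = w.beta * x.re;           /* m3 = (d - c) a */
    cplx z = { m1 - m2, m1 + m3 };       /* Re(z), Im(z)   */
    return z;
}
```
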


2.3 Expressing Complex Multiply-Adds in Terms of Real Multiply-Adds

As noted earlier, most high-performance workstations can perform multiplications as fast as additions. Moreover, many of them can do a multiplication and an addition simultaneously. The latter is accomplished on some machines by a single multiply-add instruction. Naturally, one would like to exploit this capability [48]. To make use of such a multiply-add instruction, the computation of the complex $z = z_1 + \omega z_2$ may be formulated as shown below, where $z_1 = a + jb$, $\omega = c + js$, and $z_2 = d + je$.
$$
\begin{aligned}
\delta &= s/c\\
m_1 &= d - \delta\,e\\
m_2 &= e + \delta\,d\\
\operatorname{Re}(z) &= a + c\,m_1\\
\operatorname{Im}(z) &= b + c\,m_2\,.
\end{aligned}\tag{2.5}
$$

Thus, in total, one division and four multiply-adds are required to compute $z$. Formula (2.5) is derived below.

$$
\begin{aligned}
z &= z_1 + \omega z_2\\
&= (a + jb) + (c + js)(d + je)\\
&= (a + jb) + (c\,d - s\,e) + j(s\,d + c\,e)\\
&= a + (c\,d - s\,e) + j\bigl(b + (c\,e + s\,d)\bigr)\\
&= a + c\Bigl(d - \frac{s}{c}\,e\Bigr) + j\Bigl(b + c\Bigl(e + \frac{s}{c}\,d\Bigr)\Bigr)\\
&= a + c\,(d - \delta e) + j\bigl(b + c\,(e + \delta d)\bigr)\\
&= a + c\,m_1 + j(b + c\,m_2)\,.
\end{aligned}
$$
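
On a machine with a fused multiply-add, (2.5) maps onto four such operations. The following C sketch uses the C99 fma() library function; the names are illustrative, and $\delta = s/c$ is assumed to be pre-computed once per twiddle factor (so $c$ must be nonzero; the factors $\omega = \pm j$ need separate handling).

```c
#include <math.h>

typedef struct { double re, im; } cplx;

/* z = z1 + w*z2 with w = c + js, following (2.5): four fused
   multiply-adds, given the pre-computed ratio delta = s/c. */
cplx cmuladd(cplx z1, double c, double delta, cplx z2)
{
    double m1 = fma(-delta, z2.im, z2.re);   /* m1 = d - delta*e */
    double m2 = fma( delta, z2.re, z2.im);   /* m2 = e + delta*d */
    cplx z = { fma(c, m1, z1.re),            /* Re(z) = a + c*m1 */
               fma(c, m2, z1.im) };          /* Im(z) = b + c*m2 */
    return z;
}
```
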
As will be apparent in the following chapters, the FFT computation is dominated by complex multiply-adds. In [48], the idea of pairing up multiplications and additions is exploited fully in the implementation of the radix-2, 3, 4, and 5 FFT kernels. However, the success of this strategy depends on whether a compiler can generate efficient machine code for the new FFT kernels, as well as on other idiosyncrasies of different machines. The actual execution time may not be improved. See [48] for details on timing and accuracy issues associated with this strategy.

2.4 Solving Recurrences to Determine an Unknown Function

In various parts of subsequent chapters, it will be necessary to determine the computational cost of various algorithms. These algorithms are recursive; the solution of a problem of size N is the result of (recursively) solving two problems of size N/2, together with combining their solutions to obtain the solution to the original problem of size N. Thus, the task of determining the arithmetic cost of such algorithms is to determine a function T(N), where

$$
T(N) =
\begin{cases}
2\,T\!\left(\tfrac{N}{2}\right) + N & \text{if } N = 2^k > 1,\\[2pt]
\Theta(1) & \text{if } N = 1.
\end{cases}\tag{2.6}
$$

Here the term N is the cost of combining the solutions to the two half-size problems. The solution to this problem,

$$
T(N) = N \log_2 N + \Theta(N)\,,\tag{2.7}
$$

is derived in Appendix B; the $\Theta$-notation is defined in Appendix A. A slightly more complicated recurrence arises in the analysis of some generalized FFT algorithms that are considered in subsequent chapters. For example, some algorithms provide a solution to the original problem of size N by (recursively) solving $\alpha$ problems of size $N/\alpha$ and then combining their solutions to obtain the solution to the original problem of size N. The appropriate recurrence is shown below, where now the term N is the cost of combining the solutions to the $\alpha$ problems of size $N/\alpha$.

$$
T(N) =
\begin{cases}
\alpha\,T\!\left(\tfrac{N}{\alpha}\right) + N & \text{if } N = \alpha^k > 1,\\[2pt]
\Theta(1) & \text{if } N = 1.
\end{cases}\tag{2.8}
$$

The solution to this problem is

$$
T(N) = N \log_\alpha N + \Theta(N)\,.\tag{2.9}
$$

These results and their derivation, together with a number of generalizations, can be found in Appendix B. Some basic information about efficient computation, together with some fundamentals on complexity notions and notation, such as the big-Oh notation, the big-Omega notation, and the $\Theta$-notation, is contained in Appendix A.
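
As a quick numerical check of (2.7), the following C sketch evaluates recurrence (2.6) directly and compares it with the closed form. The base case T(1) = 1 is an assumption made for concreteness; any constant base case only changes the $\Theta(N)$ term.

```c
#include <stdio.h>

/* Direct evaluation of recurrence (2.6) with the assumed base case T(1) = 1,
   which gives exactly T(N) = N*log2(N) + N, i.e. N*log2(N) + Theta(N). */
long T(long N)
{
    return (N == 1) ? 1 : 2 * T(N / 2) + N;
}

int main(void)
{
    long k = 1;                                 /* k = log2(N) */
    for (long N = 2; N <= 1024; N *= 2, k++)
        printf("N = %4ld  T(N) = %6ld  N*log2(N) + N = %6ld\n",
               N, T(N), N * k + N);
    return 0;
}
```
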

