ABSTRACT
Small length Discrete Cosine Transforms (DCT's) are used for image data compression. In that case, length 8 or 16 DCT's need to be performed at video rate. We propose two new implementations of DCT's which have several interesting features, as far as VLSI implementation is concerned. A first one, using modulo arithmetic, needs only one multiplication per input point, so that a single multiplier is needed on-chip. A second one, based on a decomposition of the DCT into polynomial products and on the evaluation of these polynomial products by distributed arithmetic, results in a small chip with a great regularity and testability. Furthermore, the same structure can be used for FFT computation by changing only the ROM part of the chip. Both new architectures are mainly based on a new formulation of a length-2^n DCT as a cyclic convolution, which is explained in the first section of the paper.
I. INTRODUCTION
In recent years, many fast DCT algorithms have been proposed, among which three are of major interest: the CHEN-FRALICK [1] algorithm, the B.G. LEE [3] algorithm, and the VETTERLI-NUSSBAUMER [4] algorithm. The first one, proposed a long time ago, has been considered several times for VLSI implementation, although it does not meet the minimum arithmetic complexity [2]. The other ones meet the minimum known number of both multiplications and additions for a length-2^n DCT. Furthermore, it has been shown that, if these algorithms could be improved, the same approach would also improve a whole class of algorithms (i.e. 1-D and 2-D FFT's, DST, ...) [6]. From a practical point of view, the algorithm by LEE has a greater regularity than the VETTERLI-NUSSBAUMER algorithm, but has poor roundoff noise performance, due to the 1/cos coefficients. Both of them have been implemented in hardware (or silicon) [5].

With those considerations in mind, one can see that there is still some need for DCT algorithms meeting the following three characteristics altogether:
- [...],
- [...] of the graph (the availability of [...] of a length N DCT is often required),
- good noise performance.

While it is possible to obtain "classical" algorithms meeting these three points (the paper describing them is in the process of being written), we propose in this paper two completely new approaches that have several interesting features, as far as VLSI implementation is concerned. We give here only sketches of proofs for the derivations of the algorithms, since our aim is to show that understanding the underlying mathematical structure of the DCT can lead to new efficient algorithms.

II. THE LENGTH 2^n DCT AS POLYNOMIAL PRODUCTS
The DCT is defined as follows:

$$X_k = \sum_{i=0}^{N-1} x_i \cos\frac{2\pi(2i+1)k}{4N}, \qquad k = 0, \dots, N-1 \qquad (1)$$
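Eq. (1) can be evaluated directly with $N^2$ multiply-accumulate operations. This gives a reference against which the reformulations of this section can be checked. The following is a minimal plain-Python sketch. The function name `dct_direct` is hypothetical, and the unnormalized form of (1) is kept as is.

```python
import math

def dct_direct(x):
    """Direct O(N^2) evaluation of eq. (1), without normalization factors."""
    N = len(x)
    return [
        sum(x[i] * math.cos(2 * math.pi * (2 * i + 1) * k / (4 * N)) for i in range(N))
        for k in range(N)
    ]

# A constant input only excites X_0, which equals the sum of the inputs;
# the other coefficients vanish up to rounding errors.
X = dct_direct([1.0] * 8)
```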
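The equivalence between the above DCT and a cyclic convolution is obtained through two permutations of the input variables. The first one, already given in [4], is used to change the term $(2i+1)$ in (1) into $(4i+1)$. Let us define $x'_i$ as:

$$x'_i = x_{2i}, \qquad x'_{N-1-i} = x_{2i+1}, \qquad i = 0, \dots, N/2-1$$

Since $4(N-1-i)+1 = 4N-(4i+3)$ and $\cos\frac{2\pi(4N-m)k}{4N} = \cos\frac{2\pi mk}{4N}$, eq. (1) becomes:

$$X_k = \sum_{i=0}^{N-1} x'_i \cos\frac{2\pi(4i+1)k}{4N}, \qquad N = 2^n \qquad (*)$$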
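The first permutation can be checked numerically. The sketch below assumes the reordering $x'_i = x_{2i}$, $x'_{N-1-i} = x_{2i+1}$: even-indexed samples come first in increasing order, followed by odd-indexed samples in decreasing order. It then compares eq. (*) with eq. (1). All function names are hypothetical.

```python
import math

def first_permutation(x):
    """x'_i = x_{2i} and x'_{N-1-i} = x_{2i+1}, i = 0..N/2-1."""
    N = len(x)
    xp = [0.0] * N
    for i in range(N // 2):
        xp[i] = x[2 * i]               # even samples, increasing order
        xp[N - 1 - i] = x[2 * i + 1]   # odd samples, decreasing order
    return xp

def dct_permuted(xp):
    """Eq. (*): X_k = sum_i x'_i cos(2*pi*(4i+1)*k / 4N)."""
    N = len(xp)
    return [
        sum(xp[i] * math.cos(2 * math.pi * (4 * i + 1) * k / (4 * N)) for i in range(N))
        for k in range(N)
    ]

def dct_direct(x):
    """Eq. (1), for comparison."""
    N = len(x)
    return [
        sum(x[i] * math.cos(2 * math.pi * (2 * i + 1) * k / (4 * N)) for i in range(N))
        for k in range(N)
    ]

x = [0.5, -1.25, 2.0, 0.75, -0.5, 1.5, 0.25, -2.0]
```

For N = 8 the permutation maps the indices 0..7 to the order 0, 2, 4, 6, 7, 5, 3, 1.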
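The second one allows us to change a product of indices $(4i+1)(4k+1)$ into a sum of indices $u_i + v_k$. This result is obtained through a one-to-one correspondence between the set of integers of the form $(4i+1)$, $i = 0, \dots, 2^n-1$, and the successive powers of 5 modulo $2^{n+2}$. It is always possible to write:

$$4i+1 = \left\langle 5^{u_i} \right\rangle_{2^{n+2}} \qquad (3)$$

where $\langle \cdot \rangle_M$ denotes reduction modulo $M$. With $4k+1 = \langle 5^{v_k} \rangle_{2^{n+2}}$ as well, the product $(4i+1)(4k+1)$ becomes $\langle 5^{u_i+v_k} \rangle_{2^{n+2}}$.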
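Eq. (3) can be checked exhaustively for small n. In the sketch below (plain Python, hypothetical function names), `exponents(n)` tabulates the $u_i$ by running through the $2^n$ powers of 5 modulo $2^{n+2}$. `check_eq3(n)` then verifies two things. First, every $4i+1$ is reached. Second, $(4i+1)(4k+1) \bmod 2^{n+2}$ equals $\langle 5^{u_i+u_k}\rangle_{2^{n+2}}$, with exponents taken modulo $2^n$, the order of 5. In other words, the product of indices has become a sum of exponents.

```python
def exponents(n):
    """u_i with 4i+1 = <5^(u_i)>_{2^(n+2)}, for i = 0..2^n-1 (eq. (3))."""
    N, M = 2 ** n, 2 ** (n + 2)
    u = [None] * N
    for e in range(N):
        i = (pow(5, e, M) - 1) // 4   # every power of 5 is 1 mod 4
        u[i] = e
    return u

def check_eq3(n):
    """True if every 4i+1 is reached and products of indices become sums of exponents."""
    N, M = 2 ** n, 2 ** (n + 2)
    u = exponents(n)
    if None in u:                     # some 4i+1 was never reached
        return False
    return all(((4 * i + 1) * (4 * k + 1)) % M == pow(5, (u[i] + u[k]) % N, M)
               for i in range(N) for k in range(N))
```

For n = 3 (N = 8, modulus 32), the table of exponents is [0, 1, 6, 7, 4, 5, 2, 3].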
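The inputs can accordingly be reordered along the successive powers of 5:

$$x''_i = x'_{\left(\langle 5^i \rangle_{4N} - 1\right)/4}, \qquad i = 0, \dots, N-1 \qquad (4)$$

Let us now consider separately the even terms $X_{2k}$ and the odd terms $X_{2k+1}$ of the DCT. It is well known, and fairly obvious from eq. (1), that $X_{2k}$ is the output of a DCT of length N/2, applied to the sequence $x_i + x_{N-1-i}$, $i = 0, \dots, N/2-1$. The decomposition of the odd terms can therefore be applied recursively to the length-N/2 DCT arising from the computation of the even terms, and so on. This results in a complete formulation of the DCT as polynomial products.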
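The even/odd split can be verified numerically. The sketch assumes that the length-N/2 DCT is applied to the folded sequence $x_i + x_{N-1-i}$. This follows from $\cos\frac{2\pi(2N-(2i+1))2k}{4N} = \cos\frac{2\pi(2i+1)2k}{4N}$. The sketch compares the half-length DCT with the even outputs of the direct formula. Function names are hypothetical.

```python
import math

def dct_direct(x):
    """Eq. (1), without normalization factors."""
    N = len(x)
    return [
        sum(x[i] * math.cos(2 * math.pi * (2 * i + 1) * k / (4 * N)) for i in range(N))
        for k in range(N)
    ]

def even_terms(x):
    """X_{2k}, k = 0..N/2-1, as a length-N/2 DCT of the folded input x_i + x_{N-1-i}."""
    N = len(x)
    folded = [x[i] + x[N - 1 - i] for i in range(N // 2)]
    return dct_direct(folded)

x = [math.cos(0.7 * i) + 0.1 * i for i in range(16)]
```

Applying `even_terms` again to the folded sequence gives the recursion on lengths N/2, N/4, and so on.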
When considering these polynomial products, it is easily recognized that the polynomials involving the input of the DCT are reductions of $X(z)$ modulo the cyclotomic factors of $z^N - 1$ ($N = 2^n$), namely $z-1$, $z+1$, $z^2+1$, ..., $z^{N/2}+1$. Knowing this, one can see that the whole set of polynomial products is equivalent to a cyclic convolution (i.e. a polynomial product modulo $z^N - 1$) followed by a reduction modulo the cyclotomic factors of $z^N - 1$.
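The role of the reductions modulo cyclotomic factors can be made concrete on the odd outputs. The sketch below is an illustrative reconstruction under stated assumptions. It uses the permutation $x'_i = x_{2i}$, $x'_{N-1-i} = x_{2i+1}$ and re-indexes the inputs by powers of 5 as in eq. (3). It then correlates them with the cosine sequence $c_w = \cos(2\pi\langle 5^w\rangle_{4N}/4N)$.

Because $\langle 5^{N/2}\rangle_{4N} = 1 + 2N$, this sequence satisfies $c_{w+N/2} = -c_w$. The reordered input polynomial therefore enters only through its reduction modulo $z^{N/2}+1$ (the fold `z[u] - z[u + N//2]`), and each odd output costs N/2 multiplications. The sign and index conventions are assumptions of the sketch and may differ from those of the derivation in this paper. All names are hypothetical.

```python
import math

def dct_direct(x):
    """Eq. (1), used as the reference."""
    N = len(x)
    return [
        sum(x[i] * math.cos(2 * math.pi * (2 * i + 1) * k / (4 * N)) for i in range(N))
        for k in range(N)
    ]

def odd_terms(x):
    """Odd outputs X_m (m = 1, 3, ..., N-1) via a correlation on the powers of 5,
    with the input polynomial first reduced modulo z^(N/2) + 1 (N a power of 2)."""
    N, M = len(x), 4 * len(x)
    # first permutation: x'_i = x_{2i}, x'_{N-1-i} = x_{2i+1}
    xp = [0.0] * N
    for i in range(N // 2):
        xp[i], xp[N - 1 - i] = x[2 * i], x[2 * i + 1]
    # discrete logarithm base 5 modulo 4N: 5 has order N and reaches every 4i+1
    log5 = {pow(5, u, M): u for u in range(N)}
    # reorder the inputs along the powers of 5: z_u = x'_i with 4i+1 = <5^u>_{4N}
    z = [xp[(pow(5, u, M) - 1) // 4] for u in range(N)]
    # reduction modulo z^(N/2) + 1, valid because c[w + N/2] = -c[w]
    zr = [z[u] - z[u + N // 2] for u in range(N // 2)]
    c = [math.cos(2 * math.pi * pow(5, w, M) / M) for w in range(N)]
    out = {}
    for m in range(1, N, 2):
        # every odd m equals +5^v or -5^v modulo 4N; the cosine is even
        v = log5[m] if m in log5 else log5[M - m]
        out[m] = sum(zr[u] * c[(u + v) % N] for u in range(N // 2))
    return out

x8 = [0.5, -1.25, 2.0, 0.75, -0.5, 1.5, 0.25, -2.0]
x16 = [math.sin(1.3 * i) - 0.2 * i for i in range(16)]
```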
The sequence to be cyclically convolved with the input data remains to be found. Successive applications of eq. (10) to the DCT's of decreasing lengths N, N/2, N/4, ... give the expression of this unknown polynomial modulo each cyclotomic factor. From these residues, it is easy to reconstruct the initial polynomial, given in eq. (12).
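For the factors of $z^N - 1$, the reconstruction from residues can be done with butterflies, because $z^{2m}-1 = (z^m-1)(z^m+1)$. Splitting $p = \ell + z^m h$ gives the residues $\ell + h$ and $\ell - h$, and the inverse butterfly $((a+b)/2, (a-b)/2)$ recovers $\ell$ and $h$. The sketch below applies this recursively, using exact rational arithmetic via `fractions.Fraction`. The function names are hypothetical. This is one simple way to perform the reconstruction, not necessarily the one used to obtain eq. (12).

```python
from fractions import Fraction

def split(p):
    """Residues of p (taken modulo z^(2m) - 1) modulo z^m - 1 and z^m + 1."""
    m = len(p) // 2
    lo, hi = p[:m], p[m:]                         # p = lo + z^m * hi
    return [a + b for a, b in zip(lo, hi)], [a - b for a, b in zip(lo, hi)]

def cyclotomic_residues(p):
    """[p mod z^(N/2)+1, ..., p mod z^2+1, p mod z+1, p mod z-1], N = len(p) a power of 2."""
    residues = []
    while len(p) > 1:
        p, r = split(p)
        residues.append(r)
    residues.append(p)                            # length-1 list: p mod (z - 1)
    return residues

def reconstruct(residues):
    """Inverse butterflies: recover p mod z^N - 1 from its cyclotomic residues."""
    p = residues[-1]
    for r in reversed(residues[:-1]):
        # p is the residue mod z^m - 1, r the residue mod z^m + 1
        lo = [(a + b) / 2 for a, b in zip(p, r)]
        hi = [(a - b) / 2 for a, b in zip(p, r)]
        p = lo + hi                               # residue mod z^(2m) - 1
    return p

v = [Fraction(t) for t in (3, -1, 4, 1, -5, 9, 2, -6)]
```

For $p = 1 + 2z + 3z^2 + 4z^3$, the residues are $-2-2z$ (mod $z^2+1$), $-2$ (mod $z+1$) and $10$ (mod $z-1$).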
Hence, the decomposition of the odd terms given below applies recursively to the DCT's of reduced lengths.

When considering only the odd terms $X_{2k+1}$, eq. (1) is now symmetrical in i and k, and the two permutations described above are now feasible in k. The only difference is that there are N/2 terms $X_{2k+1}$ and N terms $x_i$, which results in the $-$ term of eq. (7) (see [X] for more details). Hence, we have as a result:
$$Y(z) = X(z)\,V(z) \bmod \left(z^{N/2}+1\right) \qquad (11)$$

where X(z) is built from the reordered inputs, Y(z) from the reordered odd outputs $X_{2k+1}$, and V(z) from the cosine coefficients $\cos\left(2\pi\langle 5^i\rangle_{4N}/4N\right)$. In other words, the odd terms of the DCT can be stated as a polynomial product of length N/2.

We have now established that the DCT of length $N = 2^n$ can be obtained as shown in fig. (1).

It has been shown by WINOGRAD [9] that the multiplicative complexity of a cyclic convolution of length $2^n$ is given by:

$$\mu_{\mathrm{conv}} = 2^{n+1} - n - 1$$

Furthermore, one of the multiplications involved in the DCT computed as a convolution, as shown in fig. (1), is trivial ($V(z) \bmod (z-1) = 1$). We then obtain, as an upper bound:

$$\mu_{\mathrm{DCT}} \le 2^{n+1} - n - 2$$

It is possible (but more intricate) to show, by using other results of WINOGRAD, that this upper bound is also the lower bound. This result was already obtained by M.T. HEIDEMAN [11].

Consequences of practical importance can be obtained by observation of table 1, which compares this lower bound with the practical algorithms for short lengths:

Table 1 : Number of multiplications for a length-N DCT
N     CHEN   LEE   VETTERLI   lower bound
4     6      4     4          4
8     16     12    12         11
16    44     32    32         26
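The two closed forms above can be reproduced by counting. This relies on the assumption (Winograd's theorem, not derived here) that a product modulo an irreducible polynomial of degree d costs 2d - 1 multiplications and that the costs of the cyclotomic factors add up. The total is then 2·deg minus the number of factors. The sketch checks $2^{n+1}-n-1$ and $2^{n+1}-n-2$ for the short lengths of interest. Function names are hypothetical.

```python
def cyclotomic_degrees(n):
    """Degrees of the factors z - 1, z + 1, z^2 + 1, ..., z^(N/2) + 1 of z^N - 1, N = 2^n."""
    return [1] + [2 ** j for j in range(n)]

def winograd_cyclic(n):
    """Multiplications for a length-2^n cyclic convolution: sum of (2*deg - 1) over the factors."""
    return sum(2 * d - 1 for d in cyclotomic_degrees(n))

def dct_multiplications(n):
    """Same count minus the trivial product modulo z - 1 (V(z) mod (z - 1) = 1)."""
    return winograd_cyclic(n) - 1

for n in (2, 3, 4):
    print(2 ** n, winograd_cyclic(n), dct_multiplications(n))
# prints:
# 4 5 4
# 8 12 11
# 16 27 26
```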
Since we have now established the DCT as a cyclic convolution, we can use Number Theoretic Transforms (NTT) [12] to compute the convolution, and the scheme of fig. (1) now becomes as shown in fig. (2). Furthermore, the results modulo the cyclotomic factors of $z^N - 1$ are obtained as intermediate variables inside the inverse NTT. The butterflies shown in fig. (2) can therefore be merged with the last operations involved in the computation of the NTT$^{-1}$ box. This results in the diagram shown in fig. 3 for N = 8.

It should be noted that this corresponds to a favorable case for the use of NTT's:
- Since the NTT's involved here are performed on short-length sequences (N = 16 seems to be a maximum), we avoid a usual problem with NTT's. Due to the relationship between α, the N-th root of unity, N, the length of the transform, and M, the modulus ($\alpha^N \equiv 1 \bmod M$), it is often impossible to use 2 as a root of unity (which would avoid multiplications in the NTT) for even moderate lengths.
- What is needed to compute the DCT is really a cyclic convolution, and there is no need for the overlap-add or overlap-save algorithms to obtain a linear convolution, as is needed in FIR filtering.
- The modulo arithmetic is not such a problem since, with the given constraints, we can work modulo a Fermat number, or a pseudo Fermat number [13], which gives one of the simplest known modulo arithmetics [14]. In this case, α in fig. (3) represents only a shift, and can be implemented by a rotation of the input word at the bit level.
- Furthermore, since a great precision on the $X_k$ is often needed (use of the DCT in adaptive feedback loops), the need for greater wordlengths when using NTT's, compared to the usual case, is not such a waste.
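The scheme of fig. (2) can be mimicked in software to see why a Fermat modulus is attractive. The sketch below assumes the Fermat number $F_3 = 2^8+1 = 257$ and $\alpha = 2$, which is a 16-th root of unity modulo 257 because $2^8 \equiv -1$. These choices, and the function names, are illustrative and not taken from the text.

Every multiplication by a power of α is done by `mul_pow2` with a shift, a split of the word and one subtraction, followed by a final reduction. Only the N pointwise products in the transform domain are general multiplications, i.e. one per input point. The transforms are computed naively in $O(N^2)$ to keep the sketch short.

```python
# Length-16 cyclic convolution through a Fermat number transform (illustrative).
B = 8
M = (1 << B) + 1          # Fermat number F3 = 257 (a prime)
N = 16                    # 2**8 = -1 mod 257, so alpha = 2 has order 16

def mul_pow2(x, k):
    """Return x * 2**k mod M using a shift, a split and one subtraction."""
    k %= 2 * B                    # 2**(2B) = 1 mod M; also handles k < 0
    y = x << k
    lo, hi = y & ((1 << B) - 1), y >> B
    return (lo - hi) % M          # y = hi * 2**B + lo = lo - hi (mod M)

def fnt(a, sign):
    """O(N^2) transform: A_k = sum_i a_i * 2**(sign*i*k) mod M."""
    return [sum(mul_pow2(a[i], sign * i * k) for i in range(N)) % M for k in range(N)]

def cyclic_convolution_fnt(a, b):
    A = fnt([v % M for v in a], +1)
    Bv = fnt([v % M for v in b], +1)
    C = [(p * q) % M for p, q in zip(A, Bv)]   # the only general multiplications
    # inverse transform; N = 2**4, so N**-1 = 2**-4 is again a shift
    return [mul_pow2(c, -4) for c in fnt(C, -1)]

def cyclic_convolution_direct(a, b):
    return [sum(a[i] * b[(k - i) % N] for i in range(N)) % M for k in range(N)]

a = list(range(N))
b = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
```

With M = 257, the result equals the true integer convolution only when its values fit in a range of 257 consecutive integers. Larger data need a larger Fermat modulus, which is the wordlength increase discussed in the last remark.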
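The polynomial products can also be evaluated as a set of inner products. Consider such an inner product

$$Y = \sum_{i=0}^{N-1} a_i x_i$$

to be computed, and let each $x_i$ be represented as an L-bit two's complement fraction, $x_i = -x_{i0} + \sum_{j=1}^{L-1} x_{ij} 2^{-j}$ with $x_{ij} \in \{0, 1\}$. Exchanging the order of the summations, we obtain:

$$Y = -\sum_{i=0}^{N-1} a_i x_{i0} + \sum_{j=1}^{L-1} 2^{-j} \left( \sum_{i=0}^{N-1} a_i x_{ij} \right) \qquad (17)$$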
In this equation, the double sum is a successive shift and add of elementary terms (between brackets). Each term is an inner product between the $a_i$ and a vector of bits ($x_{ij}$, $i = 0, \dots, N-1$). Let $f$ be this function. Since $f$ depends on N binary variables, it can take only $2^N$ different values. If these values are stored in a ROM at the address corresponding to the binary configuration of the input bits, an implementation of the inner product by distributed arithmetic is as shown in fig. 4.
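A bit-level sketch of eq. (17) and fig. 4 follows. It assumes L-bit two's complement fractions for the $x_i$ and a ROM holding all $2^N$ partial sums of the coefficients $a_i$. Python lists stand in for the ROM and the accumulator, and the function names and the default wordlength L = 12 are illustrative choices. A hardware accumulator would process the bits serially with shifts, whereas here each ROM output is simply weighted by $2^{-j}$.

```python
def build_rom(a):
    """ROM contents: f(address) = sum of the a_i whose bit i is set in the address (2^N words)."""
    N = len(a)
    return [sum(a[i] for i in range(N) if (addr >> i) & 1) for addr in range(1 << N)]

def to_bits(x, L):
    """Bits [x_0 (sign), x_1, ..., x_{L-1}] of x as an L-bit two's complement fraction in [-1, 1)."""
    q = round(x * 2 ** (L - 1)) % (1 << L)      # integer code of the fraction
    return [(q >> (L - 1 - j)) & 1 for j in range(L)]

def da_inner_product(a, x, L=12):
    """Eq. (17): one ROM look-up per bit weight j, then a weighted accumulation."""
    rom = build_rom(a)
    bits = [to_bits(xi, L) for xi in x]

    def address(j):
        # bits of weight j of all inputs form the ROM address
        return sum(bits[i][j] << i for i in range(len(x)))

    acc = -rom[address(0)]                        # sign-bit term
    for j in range(1, L):
        acc += rom[address(j)] * 2.0 ** (-j)      # shift-and-add of the ROM outputs
    return acc

a = [0.5, -0.25, 0.75, 0.125]
x = [0.3, -0.6, 0.9, -0.1]
y = da_inner_product(a, x)
```

The result is exactly the inner product of the $a_i$ with the quantized $x_i$, so it differs from the exact value only by the input quantization error.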
When used in a DCT algorithm, the distributed arithmetic implementation of polynomial products requires one inner product computation per coefficient of the resulting polynomial, plus some butterflies to decompose the initial DCT into polynomial products (see fig. 5).
A number of remarks are of interest:
- Since the ROM is addressed by the bits of same weight of the outputs of the butterflies, these butterflies can be implemented in serial arithmetic.
- The speed of a circuit implementing this architecture will be limited only by the output accumulator. If the required speed is lower, it is possible to reduce the size of the circuit by using the relationships between the different inner products involved [8], in a manner very similar to that explained in [15] for the computation of convolution.
- All the combinations of the input data are performed in serial arithmetic. Hence, the resulting architecture is very regular and easily implemented.
- Since the structure of the decomposition of the DCT into polynomial products is the same as for other transforms, the same structure can also be used for the computation of Fourier transforms by changing only the ROM part of the chip.
VI. CONCLUSION
We have first explained the equivalence between the DCT and cyclic convolution. We then used this relationship to obtain new DCT algorithms with some characteristics suitable for VLSI implementation. Other algorithms can also be obtained with such an approach. Further work will be reported.
REFERENCES
[7] P. DUHAMEL : "Dispositif de transformée en cosinus d'un signal numérique échantillonné". French patent n° 8601629, February 1986.
[8] P. DUHAMEL : "Dispositif de détermination de la transformée numérique d'un signal". French patent n° 8612431, September 1986.
[9] S. WINOGRAD : "Some bilinear forms whose multiplicative complexity depends on the field of constants". Math. Syst. Theory, Vol. 10, pp. 169-180, 1977.
[10] L. AUSLANDER, S. WINOGRAD : "The multiplicative complexity of certain semilinear systems defined by polynomials". Adv. in Applied Mathematics, Vol. 1, n° 3, pp. 257-299, 1980.
[11] M.T. HEIDEMAN : Private communication.
[12] H.J. NUSSBAUMER : "Fast Fourier Transform and Convolution Algorithms". Springer-Verlag, 1981.
[13] R.C. AGARWAL, C.S. BURRUS : "Fast convolutions using Fermat number transforms with application to digital filtering". IEEE Trans. on ASSP, Vol. 22, pp. 87-97, 1974.
[14] L.M. LEIBOWITZ : "A simplified binary arithmetic for the Fermat Number Transform". IEEE Trans. on ASSP, Vol. 24, pp. 356-359, 1976.
[15] S. CHU, C.S. BURRUS : "A prime factor FFT algorithm using distributed arithmetic". IEEE Trans. on ASSP, Vol. 30, n° 2, pp. 217-226, April 1982.
Fig. 2
Fig. 4 : Implementation of an inner product by distributed arithmetic
Fig. 5 : The DCT of length 8 by distributed arithmetic