INSTRUMENTS
Theory, Algorithms,
and Implementations
Volume 3
1990
Volume 3
Edited by
Panos Papamichalis, Ph.D.
Digital Signal Processing
Semiconductor Group
Texas Instruments
TEXAS
INSTRUMENTS
IMPORTANT NOTICE
CONTENTS
v
FOREWORD ....... .
PREFACE ............. .
. ......... vii
PART I. INTRODUCTION
1. The TMS320 Family and Book Overview ................................................... .
iii
PART V. COMPUTERS
12. A DSPBased ThreeDimensional Graphics System
(Nat Seshan) ........................................................................... 423
PART VI. TOOLS
13. The TMS320C30 Applications Board Functional Description
(Tony Coomes and Nat Seshan) ........................................................... 467
TMS320 BIBLIOGRAPHY .................................................................... 533
iv
Foreword
Much has happened in the TMS320 Family since Volume 1 of Digital Signal Processing
Applications with the TMS320 Family was published, and Volumes 2 and 3 are a timely update to
the family history.
The DSP microcomputers keep changing the perspective of the systems designers by offering more computational power and better interfacing capabilities. The steps of change are coming
more quickly, and the potential impact is greater and greater. Because things change so rapidly in
this area, there is a pressing need for ways to quickly learn how to utilize the new technology. These
new volumes respond to that need.
As with Volume 1, the purpose of these books is to teach us about the issues and techniques
that are important in implementing digital signal processing systems using microprocessors in the
TMS320 Family. Volume 2 highlights the TMS320C25; and Volume 3, the TMS320C30 chip. A
large part of the books is devoted to such matters as characteristics of the TMS320C25 and
TMS320C30 chips, useful program code for implementing special DSP functions, and details on
interfacing the new chips to external devices. The remainder of the books illustrates how these
chips can be used in communications, control, and computer graphics applications.
What these two volumes make clear is how remarkably fast the field ofDSP microcomputing
is evolving. IC technologists and designers are simply packing more and more of the right kind of
computing power into affordable microprocessor chips. The highspeed floatingpoint computing
power and huge address spaces of chips like the TMS320C30 open the door to a whole new class
of applications that were difficult or impractical with earlier generations of fixedpoint DSP chips.
The signal processing theorists and system designers are clearly being challenged to match the creativity of the chip designers.
The present books differ from Volume 1 in the inclusion of a small section on tools. This is
a hopeful sign, because it is progress in this area that is likely to have the greatest impact on speeding
the widespread application of DSP microprocessors. While useful design tools are beginning to
emerge, much more can be done to help system designers manage the complexity of sophisticated
DSP systems, which often involve a unique combination of theory, numerical and symbolic processing algorithms, realtime programming, and multiprocessing. No doubt future volumes ofDigital Signal Processing Applications with the TMS320 Family will have more to say about this important topic. Until then, Volumes 2 and 3 have much useful information to help system designers
keep up with the TMS320 Family.
Ronald W. Schafer
Atlanta, Georgia
November 14, 1989
vi .
Preface
The newer, floatingpoint DSP devices, such as the TMS320C30, have brought an added dimension to DSP applications. With the TMS20C30, programming is much easier because the designer does not have to worry about dynamic range and accuracy issues. An algorithm implemented
in floatingpoint in a highlevel language can be easily ported to such a device. The new architecture
contains other features, besides the floating point capability, that simplify programming. Some of
these features (such as the software stack, the large register file, etc.) were added to facilitate the
development of highlevel language compilers. Currently, C and Ada compilers have been introduced. In addition, Spectron Microsystems introduced an operating system for DSPs (called
SPOX) that further facilitates the development of algorithms on the DSP devices.
Volume 3 of Digital Signal Processing Applications with the TMS320 F amity contains application reports primarily on the third generation of the TMS320 Family (floatingpoint devices).
This book is a continuation of Volumes 1 and 2 in the sense that it addresses the same needs of the
designer. The designer still has the task of selecting the DSP device with the appropriate cost, performance, and support, developing the DSP algorithm that will solve the problem, and implementing the algorithm on the processor. This volume tries to help by bringing the designer up to date
on the applications of newer processors or in different applications of earlier processors.
The objectives remain the same as in earlier volumes. First, the application reports supply
examples of device use and serve as tutorials in programming the devices. Of course, the same purpose is served on a more elementary basis by the software and hardware applications sections of
the corresponding user's guides. Second, since the source code of each application is provided with
the report, the designer can take it intact (or extract a portion of it) and place it in the application.
I t is assumed that the reader has exposure to the TMS320 devices or, at least, has the necessary
manuals (such as the appropriate TMS320 user's guides) that will help the reader understand the
explanations in the reports. The reports themselves include as references the necessary background
material. Additionally, the Introduction gives a brief overview of the available devices at the time
of the writing and points to the source of more information.
The reports are grouped by application area. The term report is used here in a broad sense,
since some articles from technical publications are also included. The authors of the reports are either the digital signal processing engineering staff of the Texas Instruments Semiconductor Group
(including both field and factory personnel, and sUmmer students) or third parties.
The source code associated with the reports is also available in electronic form, and the reader
can download it from the TI DSP Electronic Bulletin Board (telephone (713) 2742323). If more
information is needed, the DSP Hotline can be called at (713) 2742320.
The editor thanks all the authors and the reviewers for their contribution to this volume of
application reports.
Panos E. Papamichalis, Ph.D.
Senior Member of Technical Staff
vii
viii
Part I. Introduction
1. The TMS320C20 Family and Book Overview
TMS320C4x
*TMS320C40
TMS320C30
*TMS320C3026,I'"'_ _
*TMS320C31
e f
I
TMS320C2x
o
o p
r
TMS320Clx
m /
a m
n i
C P
e s
TMS320Cl0, 14
TMS320Cl025
TMS320C15/E15
TMS320C1525
TMS320C17/E17
TMS320C14/E14
TMS32020
TMS320C25
TMS320E25
TMS320C2550
*TMS320C26
*TMS320C50
*TMS320C51
Generation
The following article, "The TMS320 Family of Digital Signal Processors," by Lin, et. al.,
is reprinted from the Proceedings of the IEEE and gives an overview of the TMS320 family. Since
additional devices have been developed from the time the article was written, this section highlights
these newer devices. Table 1 shows a comprehensive list ofthe currently available TMS320 devices
and their salient characteristics.
1st
2nd
Data
Type
Device
TMS320ClO ~
TMS320ClO25
TMS320ClO14
TMS320E14
TMS320C15'
TMS320CI525
TMS320E15'
TMS320E1525
TMS320C17
TMS320EI7
Integer
Integer
Integer
Integer
TMS32020 ~
TMS320C25 ~
TMS320C2550'
TMS320E25'
TMS320C26
Time
RAM
(ns)
Integer
Integer
Integer
Integer
Integer
200
160
280
160
200
160
200
160
200
200
144
144
144
256
256
256
256
256
256
256
Integer
Integer
Integer
Integer
Integer
200
100
80
100
100
544
544
544
544
UK
Integer
~
Cycle
OnChip
ROM
EPROM
UK
I.5K
UK
4K
4K
4K
4K
4K
4K
4K
4K
4K
4K
256
I/O
aITChip
8x16
8xl6
8xl6
7x16
8x16
8xl6
8xl6
8x16
6xl6
6xl6
128K
128K
128K
128K
128K
16xl6
16xl6
16xl6
16xl6
1
1
1
1
1
16Mx32
16xl6
3rd
TMS320C30
Float Pt
60
2K
4K
16M
TMS320C50
Integer
50
8.5K
2K
128K
Serial
4K
4K
4K
4K
4K
4K
4K
4K
4K
4K
5th
t
t
Parallel
DMA
OnChip
Timers
Package
2
2
1
1
DIPIPLCC
DlPIPLCC
DIPIPLCC
CERQUAD
DIPIPLCC
DIPIPLCC
DIP/CERQUAD
DIP/CERQUAD
DIPIPLCC
DIP/CERQUAD
t
t
t
t
t
1
1
1
1
1
PGA
PGAlPLCC
PGAlPLCC
CERQUAD
PLCC
PGA
CLCC
External DMA
Extemalnntemal DMA
For information on military versions of these devices, contact your local 11 sales office.
The additions to the first generation are the TMS320C14 and the TMS320E14; the latter is
identical with the former, except that the latter's onchip program memory is EPROM. The
TMS320C14/E14 devices have features that make them suitable for control applications. Figure
2 shows the components of these devices. The memory and the CPU are identical to
TMS320C15/E15, while the peripherals reflect the orientation of the devices toward control.
16x16bit
Multiply
32bitALU
Watchdog Timer
32bit ACC
0,1 Abit Shift
Timer/Counter 2
16 bitl/O
32bit PReg
2 Auxiliary Registers
SERIAL PORT
Event Manager
tation of certain applications, the TMS320C2550 was designed as a faster version of the
TMS320C25 and has a clock frequency of 50 MHz instead of 40 MHz.
The TMS320C26 is a modification of the TMS320C25 in which the program ROM has been
exchanged for RAM. The memory space of the TMS320C26 has l.5K words of onchip RAM and
256 words of onchip ROM, making it ideal for applications requiring larger RAM but minimal
external memory.
A new generation of higherperformance fixedpoint processors has been introduced in the
TMS320 Family: the TMS320C5x devices. This generation shares many features with the first and
the second generations, but it also encompasses significant new features. Figure 3 shows the basic
components of the first device in that generation, the TMS320C50.
Serial Port
Timer
S{WWaitsts
16x16
Inputs
16x16
Outputs
The software and hardware development tools for the TMS320 family make the development of applications easy. Such tools include assemblers, linkers, simulators, and Ccompilers for
the software. They include evaluation modules, software development boards, and extended development systems for hardware. These tools are mentioned in the following paper by Lin, et. al. The
interested reader can find much more information in the additional literature that is published by
Texas Instruments and mentioned in the next section. In particular, the TMS320 Family Development Support Reference Guide is an excellent source.
One important addition to the list of tools is the SPOX operating system, developed by Spectron Microsystems. SPOX permits you to write an application in a highlevel language (C) and run
it on actual DSP hardware. The operating system of SPOX hides the details of the interface from
you and lets you concentrate on your algorithm while running it at supercomputer speeds on the
TMS320C30.
References
Texas Instruments publishes an extensive bibliography to help designers use the TMS320 devices effectively. Besides the user's guides for corresponding generations, there are manuals for
the software and the hardware tools. The TMS320 Family Development Support Reference Guide
is particularly useful because it provides information, not only on development tools offered by TI,
but also on those produced by third parties. Here is a partial list of the literature available (the literature number is in parentheses)
TMS320 Family Development Support Reference Guide (SPRUOllA)
TMS320Clx User's Guide (SPRU013A)
You can request this literature by calling the Customer Response Center at 18002323200,
or the DSP Hotline at 17132742320.
10
KunShan Lin
Gene A. Frantz
Ray Simar, Jr.
Digital Signal Processor ProductsSemiconductor Group
Texas Instruments
Reprinted from
PROCEEDINGS OF THE IEEE
Vol. 75, No.9, September 1987
11
12
IEEE,
GENE A. FRANTZ,
AND
This paper begins with a discussion of the characteristics of digital signal processing, which are the driving force behind the design
of digital signal processors. The remainder of the paper describes
the three generations of the TMS320 family of digital signal processors available from Texas Instruments. The evolution in architec
tural design of these processors and key features of each generation of processors are discussed. More detailed information ~s
realtime operation,
= 2:
;=1
ali) x(n  i)
.! din
(1)
where yin) is the output sample at time n, ali) is the ith coefficient or weighting factor, and x(n  i) is the (n  i)th input
sample.
With this example in mind, we can discuss the various
 1)
.! din
5
 3) +
+.!'d(n5)
.! din
 2)
.! din
5
 4)
(2)
where din  i) is the daily stock closing price for the (n i)th day. Equation (2) assumes the same form as (1). This is
also the general form of the convolution of two sequences
of numbers, ali) and xli) [5], [6]. Both FIR filtering and convolution are fundamental to digital Signal processing.
Real Time Processing
In addition to being mathematically intensive, DSP algorithms must be performed in real time. Real time can be
defined as a process that is accomplished by the DSP without creating a delay noticeable to the user. In the stock market example, as long as the new average value can be computed priortothe next day when it is needed, it is considered
to be completed in real time. In digital signal processing
applications, processes happen faster than on a daily basis.
In the FIR filter example in (1), the sum of products must
13
able delay between a word being spoken and being recognized would be unacceptable and not considered realtime. Another example is in image processing, where it is
considered realtime if the processor finishes the processing within the frame update period. If the pixel information
cannot be updated within the frame update period, problems such as flicker, smearing, or missing information will
occur.
Application
Control
Telecommunications
Speech processing
Audio processing
Video frame rate
Video pixel rate
Nominal
Sample Rate
1 kHz
'8 kHz
810 kHz
4048 kHz
30 Hz
14MHz
14
A recent development in DSP technology is the singlechip digital Signal processor, such as the TMS320 family of
processors. These processors give the designer a DSP solution with its performance attainable only by the array processors a few years ago. Fig. 1 shows the TMS320 family in
graphical form with the yaxis indicating the hypothetical
performance and the xaxis being the evolution of the semiconductor processing technology. The first member of the
family, the TMS32010, was disclosed to the market in 1982
[11], [12]. It gave the system designer the first microcomputer capable of performing five million DSP operations
per second (5 MIPS), including the add and multiply functions [13] required in (1). Today there are a dozen spinoffs
from the TMS32010 in the first generation of the TMS320
family. Some of these devices are the TMS320C10,
TMS320C15, and TMS320C17 [14]. The second generation
of devices include the TMS32020 [15] and TMS320C25 [16].
The TMS320C25 can perform 10 MIPS [16]. In addition,
expanded memory space, combined singlecycle multiply/
accumulate operation, multiprocessing capabilities, and
expanded I/O functions have given the TMS320C25 a
2 to 4 times performance improvement over its predecessors. The third generation of the TMS320 family of processors, the TMS320C30 [26], [27]. has a computational rate ot'
33 million DSP floatingpoint operations per second (33
MFlOPS). Its performance (speed, throughput, and precision) has far exceeded the digital Signal processors available today and has reached the level of a supercomputer.
It we look closely at the TMS320 family as shown in Fig.
1, we can see that devices in the same generation, such as
the TMS320C10, TMS320C15, and TMS320C17, are assembly
objectcode compatible. Devices across generations, such
as the TMS320C10 and TMS320C25, are assembly sourcecode compatible. Software investment on DSP algorithms
therefore can be maintained during the system upgrade.
Another point is that since the introduction of the
TMS32010, semiconductor processing technology has
emerged from 3l'm NMOS to 2l'm CMOS to 1l'm CMOS.
2.0~m
CMOS
These concepts were designed into the TMS320 digital signal processors to handle the vast amount of data characteristic of DSP operations, and to allow most DSP operations to be executed in a singlecycle instruction.
Furthermore, the TMS320 processors are programmable
devices, providing the flexibility and ease of use of generalpurpose microprocessors. The following paragraphs discuss how each of the above concepts is used in the TMS320
family of devices to make them useful in digital signal processing applications.
Harvard Architecture
15
CLKOUT1
Pfefetch
decode
execute
N2
N1
(the closing price five days ago) was dropped and a new one
(today's closing price) was added. Or, each piece of the old
data is delayed or moved one sample period to make room
for the incoming most current sample. This delay is the
function of the DMOV instruction. Another special instruction in the TMS32010 is the LTD instruction. It executes the
LT, DMOV, and APAC instructions in a single cycle. The LTD
and MPY instruction then reduce the number of instruction
cycles per FIR filtertap from four to two. In the secondgeneration TMS320, such as the TMS320C25, two more special
instructions have been included (the RPT and MACD
instructions) to reduce the number of cycles per tap to one,
as shown in the following:
RPTK 255
MACD
(N
1)
DedicatedHardware Multiplier
As we saw in the general form of an FIR filter, multiplication is an important part of digital signal processing. For
each filter tap (denoted by i), a multiplication and an addition must take place. The faster a multiplication can be performed, the higher the performance of the digital signal
processor. In generalpurpose microprocessors, the multiplication instruction is constructed by a series of addi
16
give the DSP devices instruction cycle times less than 200
ns. The speCific instruction cycle times for the TMS320 family are given in Table 2. These fast cycle times have made
Table 2 TM5320 Cycle Times
Cycle Time
Device
(ns)
TMS320C10'
TMS32020
TMS320C25
TMS320C30
160200
160200
100125
6075
the TM5320 family of processors highly suited for mahy realtime DSP applications. Table 1 showed the sample rates for
some typical DSP applications. This table can be combined
with the cycle times indicated in Table 2 to show how many
instruction cycles per sample can be achieved by the various generations of the TMS320 for realtime applications
(see Fig. 3).
As we can see from Fig. 3, many instruction cycles are
ThirdGeneration TMS320
5000
SecondGeneration TMS320
FirstGeneration TMS320
600
0
l
50
I
0
1 kHz
I
0
E
0
c
0
M
j j
10 kHz
100 kHz
10 MHz
Sample Rata
Fig. 3. Number of instruction cycles/sample versus sample rate for the TMS320 family_
ond and third generations of the TMS320 with their multiprocessing capabilities and high throughput are better
suited.
Now that we have discussed the basic characteristics of
digital Signal processors, we can concentrate on specific
details of each of the three generations of the TMS320 family devices.
THE FIRST GENERATION
OF
17
I.
WE
DEN
MEN
BiO
INSTRUCTION
MC/MP
iNf
PROGRAM
OS
ROM
11536 x 161
LEGEND:
DATA RAM
1144" 161
Ace == Accumulator
AAP =
ARO =
Auxiliary register 0
DATA
PC
'" P register
T register
I.
I.
18
TMS320C151E15
The TMS320C15 and TMS320E15 are fu lIy objectcode and
pinforpin compatible with the TMS32010 and offer
expanded onchip RAM of 256 words and onchip program
ROM (TMS320C15) or EPROM (TMS320E15) of 4Kwords. The
TMS320C15 is available in either a 200ns version or a 160ns version (TMS320C1525).
TMS320C171E17
The TMS320C17/E17 is a dedicated microcomputer with
4K words of onchip program ROM (TMS320C17) or EPROM
(TMS320E17), a dualchannel serial port for fullduplex serial
communication, onchip companding hardware (uIawl
Alaw), a serial port timer for standalone serial communication, and a coprocessor interface for zero glue interface
between t~e processor and any 4lB/16bit microprocessor.
The TMS320C17/E17 isalsoobjectcode compatible with the
TMS32010 and can use the same development tools. The
Instruction
Cycle Time
Devices
(ns)
Process
TMS32010
TMS3201025
TMS3201014
TMS32011
TMS320Cl0
TMS320Cl025
TMS320C15
TMS320C1525
TMS320E15
TMS320C17
TMS320C1725
TMS320E17
200
160
280
200
200
160
200
160
200
200
160
200
NMOS
NMOS
NMOS
NMOS
CMOS
CMOS
CMOS
CMOS
CMOS
CMOS
CMOS
CMOS
OnChip
Prog ROM
(words)
OF
OnChip
Data RAM
OffChip
(words)
(words)
(words)
Ref
144
144
144
144
144
144
256
256
256
256
256
256
4K
4K
4K
[13]
[13]
[13]
[17]
[13]
[13]
[13]
[14]
1.5K
1.5K
l.5K
I.5K
l.5K
l.5K
4.0K
4.0K
4.0K
4.0K
4.0K
OnChip
Prog EPROM
forms (TMS320C25).
Extendedprecision arithmetic and adaptive filtering
support (TMS320C25).
Fullspeed operation of MAC/MACD instructions from
external memory (TMS320C25).
Accumulator carry bit and related instructions
(TMS320C25).
1.Bl'm CMOS technology (TMS320C25):
6Bpin grid array (PGA) package.
6Bpin lead chip carrier (PLCC) package.
2.4l'm NMOS technology (TMS32020):
68pin PGA package.
4.0K
Prog
4K
4K
4K
4K
4K
[14]
[14]
[14]
[14]
TMS320C25 Architecture
The TMS320C25 is the latest member in the second generation ofTMS320 digital signal processors. It is a pincompatible CMOS version of the TMS32020 microprocessor,
but with an instruction cycle time twice as fast and the inclusion of additional hardware and software features. The
instruction set is a superset of both the TMS32010 and
TMS32020, maintaining sourcecode compatibility. In addition, it is completely objectcode compatible with the
TMS32020 so that TMS32020 programs run unmodified on
the TMS320C25.
The 100ns instruction cycle time provides a significant
throughput advantage for many existing applications. Since
most instructions are capable of executing in a single cycle,
the processor is capable of executing ten million instruc
19
RIW
STRB
READY
iiii
XF
MP/MC
001201+''
A1SAO
D)(R(161
TlM{161
PROl161
IMRt61
GREGIBI
01500
LEGEND'
ACCH
 Accumulator high
ACCl
AlU
ARAU
ARB
ARP
OP
ORR
OXR
~
~
~
=.
=
Accumulator low
Arithmetic logic unit
Audla.y ,egiste. arithmetic unit
AUkiliel.,. regisle. pointe. buffer
AUk,lia.v regi$ler pointer
Oal8 memory pege pointer
Serial port data receive legister
Serial port data transmit register
IFR
PC
PFC
 Prefetch counter
IR
RPTC
GREG
RSR
XSR
AROAR7
STO.ST1
_
_
_
QlR
PRO
TIM
 Product .egister
Period 'egiste. for lime.
 Timer
Pfo9.em counter
 Tempore.y regisle.
20
a 16bit scaling shifter, a 16 x 16bit parallel multiplier, a 32bit Arithmetic logic Unit (AlU), and a 32bit accumulator.
The scaling shifter has a 16bit input connected to the data
bus and a 32bit output connected to the AlU. This shifter
produces a leftshift of 0 to 16 bits on the input data, as programmed in the instruction. Additional shifters at the outputs of both the accumulator and the multiplier are suitable
for numerical scaling, bit extraction, extendedprecision
arithmetic, and overflow prevention.
21
Operation
direct addressing
indirect; no change to AR.
indirect; current AR is incremented.
indirect; current AR is decremented.
indirect; ARO is added to current AR.
indirect; ARO is subtracted from
current AR.
indirect; ARO is added to current AR
(with reverse carry propagation).
indirect; ARO is subtracted from
current AR (with reverse carry
propagation),
Note: The optional NARP field specifies a new value of the ARP.
The flexibility of the TMS320C25 allows systems configurations to satisfy a wide range of application requirements
[16]. The TMS320C25 can be used in the following configurations:
a standalone system (a single processor using 4K
words of onchip ROM and 544words of onchip RAM),
parallel multiprocessing systems with shared global
data memory, or
host/peripheral coprocessing using interface control
signals.
A minimal processing system is shown in Fig. 6 using
external data RAM. and PROM/EPROM. Parallel multiprocessing and host/peripheral coprocessing systems can be
designed by taking advantage of the TMS320C25's direct
memory access and global memory configuration capabilities.
TMS320
FAMILY
The TMS320C30 [26][27] is Texas Instruments thirdgeneration member of the TMS320 family of compatible digital
signal processors. With a computational rate of 33 MFLOPS
(million floatingpoint operations per second), the
TMS320C30 far exceeds the performance of any programmable DSP available today. Total system performance has
been maximized through internal parallelism, more than
twentyfour thousand bytes of onchip memory, Singlecycle
floatingpoint operations, and concurrent I/O. Thetotal system cost is minimized with onchip memory and onchip
peripherals such as timers qnd serial ports. Finally, the user's
system design time is dramatically reduced with the availability of the floatingpoint operations, generalpurpose
instructions and features, and quality development tools.
The TMS320C30 provides the user with a level of performance that, at one time, was the exclusive domain of
supercomputers. The strong architectural emphasis of providing a lowcost system solution to demanding arithmetic
algorithms has resulted in the architecture shown in Fig. 7.
The key features of the TMS320C30 [26], [27] are as follows:
60ns singlecycle execution time, ll'm CMOS.
Two 1K x 32bit Singlecycle dualaccess RAM blocks.
One 4K x 32bit singlecycle dualaccess ROM block.
64 x 32bit instruction cache.
32bit instruction and data words, 24bit addresses.
32/40bit floatingpOint and integer multiplier.
32/40bit floatingpoint, integer, and logical ALU.
32bit barrel shifter.
Eight extendedprecision registers.
Two addressgenerators with eight auxiliary registers.
Onchip Direct Memory Access (DMA) controller for
concurrent I/O and CPU operation.
Peripheral bus and modules for easy customization.
Highlevel language support.
Interlocked instructions for multiprocessing support.
Zero overhead loops and singlecycle branches.
The architecture of the TMS320C30 is targeted at 60ns
and faster cycle times. To achieve such highperformance
Fig. 6. Minimal processing system With external data RAM and PROM/EPRO~.
22
RAM
PROGRAM
RAM
ROM
CACHE
BLOCK 0
BLOCK 1
BLOCK 0
(64 X 321
11K X 321
(1K X 321
(4K X 321
R5Y
H'OCD
Hl)[OA
mm
RiI'i
0(3101
A/2301
FSXO
OXO
IIlm
iNf(301
iACK
INTEGERJ
INTEGER!
FLOA TlNGPOINT
FlOA TlNGPOINT
ADDRESS GENERATORS
MULTiPliER
ALU
FSXl
EXTENDEDPRECISION
OX,
REGISTERS (ROR7)
CLKXl
X2/ClKIN
Vce/ 7D)
VSS11001
VBBP
SUBS
CLKRO
MC!MP
x,
FSRO
ORO
CONTROL REGISTERS
XF(lO)
CLKXO
ADDRESS
ADDRESS
GENERATOR 0
GENERATOR 1
FSRl
DR.
ClKRl
AUXILIARY REGISTERS
IAROAR7)
TClKD
CONTROL REGISTERS 1121
TClKl
The CPU consists of the following elements: floatingpoint/integer mu Itiplier; ALU for performing floatingpoi nt,
integer, and logical operations; auxiliary register arithmetic
The register file contains 28 registers, which may be operated upon by the multiplier and ALU. Thefirsteightofthese
registers (ROR7) are the extendedprecision registers,
which support operations on 40bit floatingpoint numbers
and 32bit integers.
The next eight registers (AROAR7) are the auxiliary registers, whose primary function is related to the generation
of addresses. However, they also may be used as generalpurpose 32bit registers. Two auxiliary register arithmetic
units (ARAUO and ARAU1) can generate two addresses in
a single cycle. The ARAUs operate in parallel with the multiplier and ALU. They support addressing with displacements, index registers (lRO and IR1), and circular and bitreversed addressing.
The remaining registers support a variety of system functions: addressing, stack management, processor status,
The three floatingpoint formats are assumed to be normalized, thus providing an extra bit of precision. The first
23
is a 16bit short floatingpoint format for immediate floatingpoint operands, which consists of a 4bit exponent, 1
sign bit, and an 11bit fraction. The second is a singleprecision format consisting of an 8bit exponent, 1 sign bit, and
a 23bit fraction. The third is an extendedprecision format
consisting of an 8bit exponent, 1 sign bit, and a 31bit fraction.
The total memory space of the TMS320C30 is 16M (million) x 32 bits. A machine word is 32 bits, and all addressing
is performed byword. Program, data, and 1/0 space are contained within the 16Mword address space.
RAM blocks 0 and 1 are each 1K x 32 bits. The ROM block
is 4K x 32 bits. Each RAM block and ROM block is capable
of supporting two data accesses in a single cycle. For example, the user may, in a single cycle, access a program word
and a data word from the ROM block.
The separate program data, and DMA buses allow for parallel program fetches, data reads and writes, and DMAoperations. Management of memory resources and busing is
handled by the memory controller. For example, a typical
mode of operation could involve a program fetch from the
onchip program cache, two data fetches from RAM block
0, and the DMA moving data from offchip memory to RAM
block 1. All of this can be done in parallel with no impact
on the performance of the CPU.
A 64 x 32bit instruction cache allows for maximum system performance with minimal system cost. The instruction
cache stores often repeated sections of code. The code may
then be fetched from the cache, thus greatly reducing the
number of offchip accesses necessary. This allows for code
to be stored offchip in slower, lower cost memories. Also,
the external buses are freed, thus allowing for their use by
the DMA or other devices in the system.
DMA
All peripheral modules are manipulated through memorymapped registers located on a dedicated peripheral bus.
This peripheral bus allows for the straightforward addition,
removal, and creation of peripheral modules. The initial
TMS320C30 peripheral library will include timers and serial
ports. The peripheral library concept allows Texas Instru
24
The TMS320C30 provides two external interfaces: the parallel interface and the 1/0 interface. The parallel interface
consists of a 32bit data bus, a 24bit address bus, and a set
of control Signals. The 110 interface consists of a 32bit data
bus, a 13bit address bus, and a set of control Signals. Both
ports support an external ready Signal for waitstate generation and the use of softwarecontrolled wait states.
TheTMS320C30 supports four external interrupts, a number of internal interrupts, and a nonmaskable external reset
signal. Two dedicated, generalpurpose, external 1/0 flags,
XFO and XF1, may be configured as input or output pins
under software control. These pins are also used by the
interlocked instructions to support multiprocessor communication.
Pipelining In the TMS320C30
While the pipelining of the different phases of an instruction is key to the performance of the TMS320C30, the
designers felt it essential to avoid pipelining the operation
of the multiplier or AlU. By ruling out this additional level
of pipelining it was possible to greatly improve the processor's useability.
Instructions
THREE MPYF
THREE MPYF
25
The interlocked operations instructions support multiprocessor communication. Through the use of external signals, these instructions allow for powerful synchronization
mechanisms, such as semaphores, to be implemented. The
interlocked operations use the two external flag pins, XFO
and XF1. XFO signals an interlockedoperation request and
XF1 acts as an acknowledge signal for the requested interlocked operation. The interlocked operations include interlocked loads and stores. When an interlocked operation is
performed the external request and acknowledge signals
can be used to arbitrate between multiple processors sharing memory, semaphores, or counters.
DEVELOPMENT. AND SUPPORT TOOLS
Digital signal processors are essentially applicationspecific microprocessors (or microcomputers). Like any other
microprocessor, no matter how impressive the performance of the processor or the ease of interfacing, without
26
guage according to his application. The C compiler is supported on the TMS320C25 and the TMS320C30.
Hardware Tools
Evaluation Modu/e (EVM): The EVM is a standalone singleboard module that contains all of the tools necessary
to evaluate the device as well as provide basic incircuit
emulation. The EVM contains a debug monitor, editor,
assembler, reverse assembler, and software communica
The TMS320 is designed for realtime DSP and other computationintensive applications [4]. In these applications,
the TMS320 provides an excellent means for executing signal processing algorithms such as fast Fourier transforms
(FFTs), digital filters, frequency synthesis, correlation, and
convolution. The TMS320 also provides for more generalpurpose functions via bitmanipulation instructions, block
data move capabilities, large program and data memory
address spaces, and flexible memory mapping.
To introduce applications performed by the TMS320, digital filters will be used as examples. The remaining portion
of this section will briefly cover applications, and conclude
by showing some benchmarks.
Digital Filtering
1) = bk(i)
where bk(i + 1) is the weighting coefficient for the next sample period, bk(i) is the weighting coefficient for the present
sample period, B is the gain factor or adaptation step size,
e(;) is the error function, and x(i  k) is the input of the filter.
In an adaptive filter, it is important to update the coefficients bk(i) in order to minimize the error function e(i),
which is the difference between the output of the filter and
a reference signal. Quantizationerrors are critical to the
performance of the filter when updating the coefficients
and can be minimized if the result is obtained by rounding
rather than truncating. For each coefficient in the filter at
a given point in time, the factor 2*B*e(i) is a constant. This
factor can then be computed once and stored in the T register for each of the. updates. Thus the computational
requirement has become one multiply/accumulate plus
rounding. Without the new instructions, the adaptation of
each coefficient is five instructions corresponding to five
clock cycles. This is shown in the following instruction
sequence:
LRLK
AR2,COEFFD
LRLK
AR3,LASTAP
LARP
LT
AR2
ERRF
ZALH
ADD
MPY
APAC
',AR3
ONE,15
',AR2
SACH
'+
; LOAD ADDRESS OF
COEFFICIENTS.
; LOAD ADDRESS OF DATA
SAMPLES.
; errf
2'B'e(i)
; ACC
; ACC
=
=
; ACC
bk(i)'2"16
bk(i)'2"16 + 2"15
bk(i)'2"16
+ errf'x(ik) + 2"15
; SAVE bk(i+l).
; LOAD ADDRESS OF
COEFFICIENTS.
; LOAD ADDRESS OF DATA
SAMPLES.
LRLK
AR2,COEFFD
LRLK
AR3,LASTAP
LARP
LT
AR2
ERRF
; errf
ZALR
MPYA
',AR3
',AR2
; ACC
; ACC
SACH
'+
2'B'e(i)
=
=
bk(i)'2"16 + 2"15
bk(i)'2"16
+ errf'x(ik) + 2"15
; PREG = errf'x(ik+l)
; SAVE bk(i+l).
The adaptive filter coefficient update can further be simplified using the TMS320C30 [27) as shown below. The first
instruction defines the number of times to repeat the kernel. The second instruction is the repeatblock instruction
(RPTB). The RPTB instruction allows the iterations ofthe kernel to be performed with zero overhead looping. The kernel
assumes that the error term is stored in register RO. It is
important to note that all of the calculations are performed
in floatingpoint arithmetic. The MPYF3 is a threeoperand
floatingpoint multiply of the input sample x(i  k), which
is stored in memory by the error term errf. The next step
is a threeoperand floatingpoint add (ADDF3) of the change
in the filtertap to the filter tap in parallel with the store (STF)
of the previously updated filter tap. That is, the store (STF)
is to be performed in parallel withADDF3. Thus the number
of cyles for a fioatingpoint adaptation is only two.
LDI
N,RC
RPTB
adapt
MPYF3
+ +ARO(l),RO,Rl
ADDF3
'+AR1(1),Rl,R2
STF
R2,'ARl + +(1)
adapt:
; b(k,i) + errf ' x(i  k)
R2
; R2  b(kl,i)
Since we have discussed the application of digital filtering, we can now describe several applications in the areas
of telecommunications, graphicslimage processing, highspeed control, instrumentation, and numeric processing,
and then conclude this section with several benchmarks.
If more detail is needed on any of these applications, the
reader is referred to [4].
Te/ecommun{cations Applications
27
HighSpeed Modems: The TMS320 can perform numerous functions such a modulation/demodulation, adaptive
equalization, and echo cancellation [21], [22]. For lower
speed modems, such as BeIl212A and V.22 bis modems, the
TMS320C17 provides the most costeffective Singlechip
solution to these applications. For higher speed modems,
such as the V.32, requiring more processing power and
multiprocessing capabilities, the TMS320C25 and TMS320C30 are the designer'S choice.
Voice Coding: Voicecoding techniques [3], [4], such
as fullduplex 32kbit/s ADPCM (CCIn G.721), CVSD,
16kbit/s sub band coders, and LPC, are frequently used in
voice transmission and storage. Arithmetic speed, nor
high datatransfer rate (ten, million 16bit words per second). Both devices can be used in closedloop systems for
control signal conditioning, filtering, highspeed computing, and multichannel multiplexing capabilities. The following demonstrates two typical control applications:
Disk Control: Digital filtering in a closedloop actuation
mechanism positions the read/write heads over the disk
surface. Supplemented with many generalpurpose features, the TMS320 can replace costly bitslice/custom/analog solutions to perform such tasks as compensation, filtering, fine/coarse tuning, and other signal conditioning
algorithms.
Robotics: Digital signal processing and bitmanipulation
power, coupled with host interface, allow the TMS320C25
to be useful in robotics control [24]. The TMS320C25 can
replace both the digital controllers and analog signal processing hardware for communication to a central host processor and for the performance of numerically intensive
control functions.
Instrumentation
Numeric Processing
replacement for a typical bitslice solution. The TMS320C30's floatingpoint precision, high throughput, and
interface flexibility are excellent for this application.
TMS320 Benchmarks
400 ns
FIR filter tap
9.25 kHz
256tap FIR sample rate
lMS adaptive FIR filter tap 700 ns
256tap adaptive FIR filter 5.4 kHz
sample rate
Biquad filter element (five
multiplies)
Echo canceler (single
100 ns
37 kHz
400 ns
9.5 kHz
2 ps
1 ~s
8 ms
32 ms
60 ns
>60 kHz
180 ns
>20 kHz
360 ns
>64 ms
chip)
SUMMARY
HighSpeed Control
28
TMS320 family were covered, and their support tools necessary to develop endapplications were briefly reviewed.
The paper concluded with an overview of digital signal pro
[14]
[15]
Inc., 1986.
REFERENCES
[1]
[19] S. Rosen, "Electronic computers: A historical survey," Comput. Surv., vol. 1, no. 1, Mar. 1969.
[20] M. Honig and D. Messerschmitt, Adaptive Filters. Dordrecht, The Netherlands: Kluwer, 1984.
(21] R.lucky et al., Principles of Data Communication. New York,
NY: McGrawHill, 1965.
[22]
lock, "A 40 MFlOPS digital signal processor: The first supercomputer on a chip," in Proe. IEEE Int'l Conf. on Acoustics,
Speech, and Signal Processing, Apr. 1987.
[27] TMS320C30 User's Guide. Houston, TX: Texas Instruments
Inc., 1987.
29
30
Panos Papamichalis
Ray Simar, Jr.
Digital Signal Processor ProductsSemiconductor Group
Texas Instruments
Reprinted from
IEEE MICRO MAGAZINE
Vol. 8, No.6, December 1988
31
32
The TMS320C30
FloatingPoint
Digital Signal Processor
Panos Papamichalis
Ray Simar, Jr.
Texas Instrumellts
33
34
RDV
'"
.5
ia
HOLD
HOLDA
STRB
Rfjj
it
D (310)
A (230)
RESET
Address
Inlegerl
floatingpoint
ALU
INT (30)
C
a
generators
Control registers
MC/MP
XI
Address
X2/CLKIN
Vee (70)
generator
Address
generator 1
Vss (100)
VBBP
SUBS
35
/'
Cache
(64 x 32)
I I
RAM
block 0
(1K x 32)
,.1,
RAM
block 1
(1K x 32)
I I
rb If,
1
ROM
block
(4K x 32)
.1_ [
PDATA BUS
I I
I I
PADDR BUS ~
IJ
'"
DDATA BUS
'"
.c
:::>
'E"
M
U
M
U
DADDR1 BUS
I
DADDR2 BUS
I
DMADATA BUS ~
DMAADDR BUS Vh
I
CPU
DMA
(PC/IR)
36
...
::I
status register,
index registers,
stack pointer,
interrupt mask and interrupt flag registers, and
repeatblock registers.
In particular, the stackpointer register points to the
software stack. The user has the flexibility of designating
where the stack resides, and even of changing its location
during the program execution. This feature also makes the
stack of essentially unlimited depth and permits its usage
not only for storing the program counter during subroutine
calls but also for passing arguments to subroutines. Such an
arrangement is particularly convenient in the development
of compilers, and we have used it extensively in the
320C30's optimizing C compiler.
The ALU performs floatingpoint, integer, and logical
operations. The ALU always stores the result in the register
file, but the input can come either from the register file or
from memory, or it can be an immediate value.
In the case of floatingpoint arithmetic, the input to the
ALU can originate from either a 40bit extendedprecision
register or a 32bit memory datum. Registers RO through
R7 store the 40bitword result. On the other hand, in
integer arithmetic, both input and output are 32bit numbers, and the output can move to either the lower 32 bits of
the RO through R7 registers or to any other register in the
register file.
The singlecycle hardware multiplier has been an integral part of DSPs because any realtime application relies
on the fast execution of multiplies. Following the same
distinction as in the previous paragraph on the ALU, the
multiplier performs both floatingpoint and integer multiplications. The 32bit inputs to a floatingpoint multiplication yield a 4Obitwide result for storage in one of the
extendedprecision registers.
In both the ALU and the multiplier the results of the
operations are automatically normalized, thus handling
any overflows of the mantissa. If there is an exponent
overflow, the result is saturated in the direction of overflow
and the overflow flag is set. Underflows are handled by
setting the result to zero and setting an underflow flag.
ALU
Multiplier
C\I
0::
Cl
Cl
Cl
0:
C\I
Cl
Cl
!C.
'",
Cl
a:
0::
32bit
barrel
shifter
at
*"
37
Software
MemOry}
space '
UI
"
"
.a
~
"0
"0
"0
E
J:
0!!~
<>.
..
J:
f>.
'ij
f>.
38
Integer and floatingpoint formats. A 32bit, twoscomplement notation represents the integers. In addition to
this singleprecision format, we have a short format, consisting of l6bit, twoscomplement numbers used only for
immediate operands. Every instruction of the 320C30
consists of one 32bit word.
We use three formats for floatingpoint numbers: short,
single precision, and extended precision. The singleprecision, 32bitwide format assigns 24 bits to the mantissa and
8 bits to the exponent. The exponent occupies the 8 most
significant bits, and it is represented in twoscomplement
notation, taking values between 128 and 127. The exponent value 128 is the result reserved to represent zero.
The mantissa, placed at the 24 least significant bits of a
32bit number, is normalized to a number with an absolute
value between 1.0 and 2.0. Since the mantissa is represented in a normalized, twoscomplement notation, the
leftmost bit, which corresponds to the sign, and its adjacent
bit will always be the complement of each other. As a
result, only the sign bit is represented, with the most
significant bit suppressed, In other words, the mantissa
contains 24 significant bits plus the sign bit, with the most
significant bit implied.
Addressing modes. The 320C30 supports several addressing modes that allow the user to access data from
memory, registers, and the instruction word. The basic
addressing modes are
register,
direct,
indirect,
short immediate,
long immediate, and
PC relative.
In register mode the operand is placed into a CPU
register that is explicitly specified in an instruction. In
direct mode the data memory address is formed by preceding the 16 least significant bits of the instruction word with
the 8 least significant bits of the data page pointer. To keep
all instructions one word long, we store only the 161east
significant bits from the address in the instruction word; the
rest become the data page pointer. This restriction implies
that in direct addressing the memory space is segmented
into 256 pages of 64K words each.
Table 1.
Addressing modes of the 320C30.
Mode
Example
Operation
Description
Register
Direct
Short
immediate
Long
immediate
PC relative
Indirect
ADDF RO,RI
ADDF @MEM, RI
Addr=MEM
Operand in RO
Operand in MEM
ADDF 3.14,RI
Operand = 3.14
BRLABEL
BGE LABEL
ADDF * +ARO(di),RI
Addr = ARO + di
Indirect
Addr=ARO di
Indirect
ADDF * + + ARO(di),RI
Indirect
ADDF *  ARO(di),RI
Indirect
Indirect
Indirect
Indirect
Indirect
Addr=ARO+di
ARO=ARO+di
Addr=AROdi
ARO=AROdi
Addr= ARO
ARO=ARO+di
Addr=ARO
ARO=AROdi
Addr= ARO
ARO = circ(ARO + di)
Addr= ARO
ARO = circ(AROdi)
Addr= ARO
ARO = B(ARO + IRO)
Branch to LABEL
Branch to LABEL
Predisplacement add
without modification
Predisplacement subtract
without modification
Predisplacement add and
modify
Predisplacement subtract
and modify
Postdisplacement add
and modify
Postdisplacement
subtract and modify
Postdisplacement add
and circular modify
Postdisplacement subtract
and circular modify
Postindex (IRO) add and
bitreversed modify
di is an integer between 0 and 255 or one of the index registers IRO and IRI.
at the input of the transform are scrambled at output (bitreversed order). To avoid moving the data around to place
them in the proper order. bitreversed addressing accesses
the data in scrambled order for any subsequent operation.
Circular addressing implements circular buffers. Such
buffers are very convenient for use in digitalfiltering
operations. In circular addressing. BK. one of the control
registers, specifies the size of the block. Then. when the
user modifies the contents of an auxiliary register (pointing
within that block) in a circular fashion. the final value is
tested to determine if it is still within the block. If it is not.
it is wrapped around using modulo arithmetic.
The shortimmediate mode encodes immediate, 16bitlong operands of arithmetic operations. The longimmediate mode encodes program control instructions (branch
instructions) for which it is useful to have a 24bit absolute
address contained in the instruction word. Finally. the PCrelative addressing also applies to program control instructions and uses the difference from the present location of
the PC counter rather than an absolute address. The last two
39
'"c:
ti
~
.E
Fetch
I
I
Decode
Fetch
4
5
Read
Fetch
Decode
I
I
*ARI,RO,RI
RI,*AR3
ADDF
STF
Cycle
Execute
Read
Decode
Fetch
I
I
I
Execute
Read
Decode
Fetch
I
I
I
Execute
Read
Decode
I
I
Execute
Read
40
ADDF3
STF
*ARO,Rl,R2
RO,*ARI
41
Table 2.
Instructions for the 320C30.
Instruction
Load and store
LDE
LDF
LDFcond
LDI
LDIcond
LDM
Description
instructions
Load floatingpoint exponent
Load floatingpoint value
Load floatingpoint value conditionally
Load integer
Load integer conditionally
Load floatingpoint mantissa
Twooperand instructions
ABSF
Absolute value of a floatingpoint
number
ABSI
Absolute value of an integer
ADDC t
Add integers with carry
ADDF t
Add floatingpoint values
ADDI
Add integers
t
AND
Bitwise logicalAND
t
Bitwise logicalAND with complement
ANDN t
ASH
Arithmetic shift
t
CMPF t
Compare floatingpoint values
Compare integers
CMPI
t
FIX
Convert floatingpoint value to integer
FLOAT
Convert integer to floatingpoint value
LSH
Logical shift
t
MPYF t
Multiply floatingpoint values
MPYI
Multiply integers
t
NEGB
Negate integer with borrow
NEGF
Negate floatingpoint value
Negate integer
NEGI
Program control instructions
Bcond
Branch conditionally (standard)
BcondD
Branch conditionally (delayed)
BR
Branch unconditionally (standard)
Branch unconditionally (delayed)
BRD
CALL
Call subroutine
CALLcond
Call subroutine conditionally
DBcond
Decrement and branch conditionally
(standard)
Decrement and branch conditionally
DBcondD
(delayed)
Instruction
Description
POP
POPF
PUSH
PUSHF
STF
STI
NORM
Bitwise logicalcomplement
Bitwise logicalOR
Round floatingpoint value
Rotate left
Rotate left through carry
Rotate right
Rotate right through carry
Subtract integers with borrow
Subtract integers conditionally
Subtract floatingpoint values
Subtract integer
Subtract reverse integer with borrow
Subtract reverse floatingpoint value
Subtract reverse integer
Test bit fields
Bitwise exclusiveOR
IDLE
NOP
RETIcond
RETScond
RPTB
RPTS
SWI
TRAPcond
Trap conditionally
NOT
OR
RND
ROL
ROLC
ROR
RORC
SUBB
SUBC
SUBF
SUBI
SUBRB
SUBRF
SUBRI
TSTB
XOR
42
Applications
Certain features of the 320C30 such as its high speed,
versatile architecture, and rich instruction set, make it easy
to implement very demanding algorithms. The large
memory space makes the device suitable for application
areas such as image processing in which memory addressing is one of the prime considerations. And the C compiler
makes it easy to construct algorithms with complicated
logic.
.
. General nsp algorithms. Almost every OSP applicatIOn needs to perform some kind of filtering, the first
application considered for a DSP device. Digital filters are
categorized as FIR (finitelength impulse response) and
IIR (infinite impulse response) filters,'" or, equivalently,
as filters that have only zeros or both poles and zeros. Each
of these categories can have either fixed or adaptive coefficients.
The 320C30 implements FIR filters very efficiently. For
instance, let an FIR filter have an impulse response h[O],
h[ 11, ... , h[N Xl], and let x[nl represent the input of the
filter at time n. Then, the following equation gives the
output y[n1 with the equation:
h[N  1] X x[n  N + 1]
43
Typical
Calling Sequence:
load
load
load
load
ARc)
ARl
RC
BK
CALL
FIR
Organization~
Data Memory
Impul se
response
++
Low
address
h (Nl)
h(N2)
++ +.+
Hi gh
++
address
h<O)
x (n1)
Newest ++
input
x(n)
++
x (n2)
++
x(n1)
:+
++
The physical address for the start of the input samples mLlst be on
a boundary wi th the LSBs set to zero according to the length of the
buff er. The poi nter to the input sequence (x) is incremented and
assumed to be moving from an older input to a ne .... er input~ At the
end of the subroutine AR1 will be pointing to the position for the
nei:t input sample.
.
Argument Assi gnments:
Argument
: Function
ARO
AR1
RC
Address of h (N1)
Address of x (N1)
; Length of fi 1 ter  2
: Length of f i 1 ter (N)
+B~~
(N2)
Prografll sIZe:
6 words
11 +
(Nl)
; ===========""====================""'=="'""'.;::='=====================::==cc========
FIR
.global
;
FIR
MPYF3
LDF
f i 1 ter
RPTS
MPYF3
ADDF3
::
ADDF
<=
initialize Rl):
h(N1) * x(n(Nl
initialize R2.
*ARO++(1),*AR1++(1}'l.,RO
I).O,R2
1
<
) RO
N)
RC
*AR()++(t) ,*ARl++(1)X,RO
RO,R2,R2
. RO,R2,RO
return sequence
RETS
return
end
.end
44
R2
ARC)
ARt
IRO.
load
load
IRI
BK
RC
CALL
IIR2
Low
++
address I
aiW)
b2(O)
++
++
al (0)
Initial delay
node values
Newest
delay
Final delay
node values
++ +.:..+
d(O,n)
dCO,nl)
:+
+_ ... _+ ++
dCO,n2): circular
queue
d(O,nU
oldest
delay
bl (0)
++
bO(O)
++ ++
I
d (O,n2)
d(O,n)
1+
++ ++
Empty
Empty
++ ++
++
++ ++
deN1,n)
: d(Nl,nl> :+
++ ++
++
++
++
I
I d(Nl,n2):
d(Nl,n)
:+
: d(Nl,n1>
a2CNl)
++
b2 (NI)
I d(Nl,n2)
t circular
queue
+~+
++
Empty
Empty
al (Nl)
bl (NI)
Hi gh
++
address I
bO<Nl)
++
The physical address for the start of each circular queue of delay node
values must be on a boundary with the LS8s set to zero according to thE"
; length of the buffer. The BK <block size) register must contain the
Two features of the 320C30 facilitate the implementation of the FIR filters: parallel multiply/add operations and
circular addressing. The first feature pennits a multiplication and an addition to execute in one machine cycle, while
the second makes a finite buffer of length N sufficient for
the data xl nJ. Figure 7 shows the arrangement of the data
and the assembly code for an FIR filter. Note that the filter
takes one cycle of execution per tap.
The transfer function of the IIR filters contains both
poles and zeros, and its output depends on both the input
and the past output. As a rule, these filters need less
computation than a FIR filter of similar frequency response, but they have the drawback of being sensitive to
coefficient quantization. Most often, the IIR filters are
implemented as a cascade of secondorder sections, called
biquads. To implement an IIR filter consisting ofNbiquads,
let al Ii], a2li] be the numerator coefficients of the ith biquad and bO[i], b I Ii], b2111 the denominator coefficients of
=
=
=
y[O.nj x[nj;
for (i=O; i<N; i++){
d[i.nj a2[ij*d[i.n2j + al[ij*d[i,nIj + y[iI,nj;
y[i,nj b2[ij*d[i,n2j + bl[ij*d[i,nIj +
bO[ij*d[i,nj;
}
y[nj = y[NI,nj;
45
?"$si Clnment s:
Argument
: FunctIon
R2
ARO
AR1
BI<
IRO
JR1
RC
Program size:
RC
17 words
Execution cycles:
23 + 6N
; ==:::::::;;::0=====:::=========================================,""""=========... ::11=0::::=.,""::1=
.global
I JR2
MPYF3
MPYF3
*ARO, *AR1,RO
*++ARO( 1) , *AR1 (1) 7., RI
a2 (O)
b2(Q)
MPYF3
*++ARO(l) , *ARt, RO
ADDF3
RO, R2, R2
MPYF3
;
IIR2
I:
*
*
d (O,n2) ) RO
d(O,n2) ) R1
:t
ADDF3
*++ARO(l) , *AR1(Ui!., RO
RO, R2, R2
b1 (0)
d (O,n1) ) RO
second sum term of d (O,n).
MPYF3
STF
bO(O)
:t
R2,
* d<O,n) > R2
store d (O,n); point to
d (0,n2).
RPTB
LOOP
MPYF3
<:= i <
loop for
a2(i)
d (i ,n2) ) RO
first sum term of y(il,n)
:t
ADDF3
..
MPYF3
AODF3
*++ARO(1) , *AR1(U7., RI
Rl,R2,R2
b2(i)
d(i,n2) > Rl
second sum term of y(il,n)
..
MPYF3
al ( i )
d (i ,n1) > RO
first sum term of dU ,n).
ADDF3
::
MPYF3
ADOF3
*++ARO(l) , *AR1
STF
MPYF3
*++ARO(I) , R2, R2
(1)
*
*
bl(i)
d(i,nl) > RO
second sum term of d (i ,n).
7., RO
RO, R2, R2
;
;
LOOP
flnal
bO
5ummatlon
AOOF
ADOF3
NOP
NOP
RO,R2
R1,R2,RO
return to
i rst b i quad
point to d (O,n1>
*AR1(IRll
,*AR1(1)%
return sequence
RETS
return
end
.end
Figure 8 (confd.)
46
Table 3.
Timing of an FFT on the 320C30.
Number of Radix2
points
(complex)
FFT timing (ms)
0.167
64
128
0.367
256
0.801
1.740
512
1,024
3.750
Code size
(Words)
55
Radix4
(complex)
Radix2
(real)
0.123
3.040
0.075
0.162
0.354
0.771
1.670
176
86
0.624
Table 4.
Program memory and timing
requirements for 320C30 routines.
Application
Inverse of a floatingpoint
number
Integer division
Doubleprecision integer
multiplication
Square root
Dot product of two vectors
Matrix times vector
operation
FIR filter
IIR filter (one biquad)
IIR filter (N) 1 biquads)
LMS adaptive filter
LPC lattice filter
Inverse LPC lattice filter
/Llaw compression
)Llaw expansion
Alaw compression
Alaw expansion
Words
Cycles
(best easel
worst case)
31
27
31
27/58
24
32
20/24
35
8 + (N 
10
10
5
7
16
9
11
9
16
13
18
15
1)
2 + R(C + 9)
7 + (N  I)
7
19+6N
8 + 3(N  1)
9 + 5(P  1)
9 + 3(P  1)
16
11116
18
14/21
47
JULY 16,
1987
FFT 51 ZE
N
M
SINE
INP,1024
LOG2 (N)
TEXT
INITIALIZE
.WORD
SPACE
FFTSIZ
WORD
LOGFFT
.WORD
. SINTAB
.WORD
INPUT
WORD
FFT.
LOP
LDI
LSH
LDI
LOI
LSH
LOI
LOI
LOI
;
OUTER LOOP
L[JOF':
NOP
LOI
ADDI
LOI
SUBI
FFT
100
N
M
SINE
INP
FFTSIZ
@FFTSIZ,IR1
2,IR1
O,AR6
@FFTSI Z , I RO
1,IRO
@FFTSIZ,R7
1,AR7
1,AR5
*++AF,6 ( 1 j
>
@INPUT A~';
R/,AF:O,I'~K
Af,/ ,Re
1.RC
AR(I,2 + 2.Nl
R2=X(I)X(L)
R!~Y(j)Y(Ll
RO=R2*SIN AND
K$=Y (I) +Y (L>
J R3""Rl *COS AND
48
II
II
BLK2
AND
II
END
STF
SUBF
MPYF
ADDF
MPYF
STF
ADDF
STF
Rl,R6,RO
*AR2, *ARO 1 R3
R2, *+AR4 (IRl) ,R3
R3,*ARO++(IRO)
RO,R3,RS
R5, *AR2++ (IRO)
STF
R4 t *+AR2
CMPI
BNE
R7,ARl
LSH
LSH
LDI
LSH
1,AR7
1,AR5
R7,IRO
l,R7
BR
LOOP
R3,*+ARO
Y'!)=Y(!)+Y(L)
RO,R3,R4
R4=Rl*CaSR2*SIN
R(J=Rl*SIN AND. ~.
R3=X <I )+X (L)
R3=R2*COS AND
X (I) =X (I) +X (L) AND ARO=ARO+2*Nl
INLOP
RS ... R2*COS+Rl*SIN
X (L) aR2*CaS+Rl*SIN I
;
I
INCR AR2
IE=2*IE
NI=N2
N2=N2/2
NEXT FFT STAGE
NOP
.END
Figure 9 (tont'd.)
References
I. K.S. Lin, G.A. Frantz, and R. Simar, "The TMS320 Family
of Digital Signal Processors," Proc. IEEE. Vol. 75, No.9,
Sept.1987,pp.11431159.
2. R. Simar and A. Davis, "The Application of HighLevel
Languages to SingleChip Digital Signal Pr~essors," Proc.
19881nt'l. Con! Acoustics, Speech, and Signal Processing,
Apr. 1988, pp. 16781681.
49
50
51
52
Panos Papamichalis
Digital Signal Processor ProductsSemiconductor Group
Texas Instruments
53
54
This report describes the implementation of several Fast Fourier Transforms (FFTs)
and related algorithms on the TMS320C30. The TMS320C30 is the first device in the
third generation of 32bit floatingpoint Digital Signal Processors (DSPs) in the Texas
Instruments TMS320 family. The algorithms considered here are the complex radix2 FFT,
the complex radix4 FFT, the realvalued radix2 FFT (both forward and inverse
transforms), the Discrete Hartley Transform (DHT), and the Discrete Cosine Transform
(DCT). These transforms have many applications, such as in image processing, sonar,
and radar.
The introduction briefly describes transforms and their implementation on the
TMS320 family of processors. Next, the different kinds of FFTs (including the real FFT),
the closelyrelated Hartley transform, and the Cosine transform are described and compared. This is followed by a description of the TMS320C30 features that permit efficient
implementations of these algorithms. Then, specific implementations, transforms, and
TMS320C30 C Compiler facts are outlined. Finally, the report discusses some implementation issues, and the appendices list actual TMS320C30 code for performing transforms.
The powerful architecture and instruction set of the TMS320C30 permit flexible
and compact coding of the algorithms in assembly language while preserving close correspondence to a highlevel language implementation. The efficiency of the architecture
and the speed of the device make faster realization of real and complex transforms possible. With the availability of a C compiler, these routines can be put in Ccallable form
and used as faster versions of FFT C functions.
Introduction
The Fast Fourier Transform (FFT) is an important tool used in Digital Signal Processing (DSP) applications. Its development by Cooley and Tuckey gave impetus to the
establishment of DSP as an independent discipline. The wellstructured form of the FFT
has also made it one of the benchmarks in assessing the performance of numbercrunching
devices and systems.
In recent years, because of the popularity of this signalprocessing tool, there have
been efforts to improve its performance by advances both at the algorithmic level and
in hardware implementation. Researchers have been developing efficient algorithms to
increase the execution speed of FFTs while keeping requirements for memory size low.
On the other hand, developers of VLSI systems are including features in their designs
that improve system performance for applications requiring FFTs. In particular, singlechip programmable DSP devices, currently available or under development, can realize
FFTs with speeds that allow the implementation of very complex systems in realtime.
55
j 
00
x(t) ejwtdt
(1)
00
determines the frequency content of the signal x (t). In other words, for every frequency,
the Fourier transform X(w) determines the contribution of a sinusoid of that frequency
in the composition of the signal x(t). For computations on a digital computer, the signal
x(t) is sampled at discretetime instants. If the input signal is digitized, a sequence of numbers
x(n) is available instead of the continuoustime signal x(t). Then, the Fourier transform
takes the form
00
X(dw) =
.E
x(n) ejwn
(2)
n=oo
56
x(n) e
_ j27rkn
N
(3)
n=O
When x(n) consists of N points x(O), x(I), . . . , x(Nl), the frequencydomain
representation is given by the set of N points X(k), k=O,I, .. . ,NI. Equation (3) is often
written in the form
Nl
X(k)
x(n)
wr;:
(4)
n=O
where W ': = e  j 27r:nk / N. The factor WN is sometimes referred to as the twiddle factor.
A detailed description of the DFT can be found in references [1,3,4]. The computational
requirements of the DFT increase rapidly with increasing block size N, having an impact
on the realtime system performance. This problem was alleviated with the development
of special fast algorithms, collectively known as Fast Fourier Transform (FFT).With an
FFT, the computational burden increases much less rapidly with N, and for any given
N, the FFT computational load, measured in terms of required mUltiplications and additions, is smaller than a bruteforce computation of the DFT.
The definition of the FFT is identical to the DFT: only the method of computation
differs. To achieve the efficiency of an FFT, it is important that N be a highly composite
number. Typically, the length N of the FFT is a power of 2: N = 2M , and the whole
algorithm breaks down into a repeated application of an elementary transform known as
a butterfly. If N is not a power of 2, the sequence x(n) is appended with enough zeroes
to make the total length a power of2. Again, references [1,3,4] contain a detailed development of the FFT. Reference [2] also discusses the same topic.
57
_i...,L.to
wNk
P+Q w~
.~
PQ W ~
1
+ Q w~
(5)
W~
(6)
P'
Q'
P  Q
and
The quantities P, Q, and P', Q' represent different points in the array being transformed, and they mayor may not occupy adjacent locations in that array. In an inplace
computation, the result P' will overwrite P, and Q' will overwrite Q. W k represents again
N
the twiddle factor, and its exponent is determined by the location of the corresponding
butterfly in the FFT algorithm.
Figure 2 shows an alternate form of the same FFT butterfly.
Although the notation is now less descriptive, it creates a clearer picture when several
butterflies are put together to form an FFT. Using the first notation, Figure 3 is the
flow graph of an 8point FFT example.
x(O)
X(O)
x(1)
X(4)
x(2)
X(2)
x(3)
X(6)
WO
x(4)
X(1)
WO
x(5)
X(5)
WO
x(6)
N
X(3)
WO
x(7)
X(7)
1
1
1
59
The same procedure can be repeated with all the scrambled numbers not occupying
the position that their index suggests. If the input sequence x(n) is rearranged to appear
in bitreversed form, the output X(k) appears in the correct order, as shown in Figure 4.
X(O)
x(O)
WO
N
X(1)
x(4)
X(2)
x(2)
WO
N
X(3)
x(6)
X(4)
x(1)
x(5)
WO
N
X(5)
X(6)
x(3)
x(7)
WO
N
X(7)
1
1
1
Figure 4. Alternate Form of 8Poiut FFT with Decimation in Time. The Input Is in
BitReversed Order and the Output Is in the Correct Order.
Since the only difference between Figures 3 and 4 is a rearrangement of the butterflies, the computational load and the final results are identical. In terms of implementation, this rearrangement means that the nesting of the two innermost loops in the FFT
routine is interchanged.
The butterflies and the FFT configurations presented thus far implement the FFT
with a decimation in time. This terminology essentially describes a way of grouping the
terms of the DFT definition; see Equation (3). An alternative way of grouping the DFT
terms together is called decimation infrequency. Figures 5 and 6 show the same example
of an 8point FFT: Figure 5 with the input in correct order and the output in bitreversed
order, and Figure 6 viceversa, and using the decimation in frequency (DIF).
60
X(O)
x(o)
Wo
N
X(4)
x(1 )
X(2)
x(2)
WO
x(3)
X(6)
X(1 )
x(4)
WO
N
X(5)
x(5)
X(3)
x(6)
WO
x(7)
1
1
1
X(7)
x(o)
X(O)
x(4)
X(1)
x(2)
X(2)
x(6)
X(3)
WO
N
x(1 )
X(4)
wO
N
x(5)
X(5)
WO
N
x(3)
X(6)
WO
N
x(7)
X(7)
1
1
1
61
In radix4 FFT, each butterfly has 4 inputs and 4 outputs, essentially combining
two stages of a radix2 algorithm in one. Figure 7 shows this combination graphically.
A
A1
A'
B1
B'
B'
C1
C'
C'
D'
D1
D'
62
Although four radix2 butterflies are combined into one radix4 butterfly, the computationalload of the latter is less than four times the load of a radix2 butterfly. Examples of radix4, 16point FFTs are shown in Figures 8 and 9 for decimation in time
and decimation in frequency, respectively.
0
4
2
12
4
5
13
10
10
11
14
12
13
14
11
15
15
63
4
2
12
4
5
13
10
10
11
14
12
13
14
11
15
15
64
A'
B'
C'
C'
B'
0'
0'
(a)
(b)
65
There are eight extendedprecision registers, ROR7, that can be used as accumulators or generalpurpose registers, and eight auxiliary registers, AROAR7, for
addressing and integer arithmetic. For many applications, these registers are sufficient
for temporary storage of values, and there is no need to use memory locations. This is
the case with the radix2 FFT algorithm, where no locations are required other than those
for the transformation of incoming data to be transformed. Also, arithmetic using these
registers greatly increases the programming efficiency. The two index registers, IRO and
IR 1, are used for indexing the contents of the auxiliary registers AROAR7, thus making
the access of the butterfly legs and the twiddle factors easy.
A powerful structure in the TMS320C30 is the blockrepeat capability that has the
form
LABEL
RPTB
LABEL
put instructions here
last instruction
Whatever occurs after the RPTB instruction and up to the LABEL is repeated one
time more than the number included in the repeat counter register, RC. The RC register
must be initialized before entering the blockrepeat construct. The net effect is that the
repeated code behaves as if it were straightline coded (no penalty for looping), with program size equal to the one in looped code. In this way, the FFT butterfly, being the core
of the program, can be implemented in a blockrepeat form, thereby saving execution time
while preserving the clarity of the program and conserving program space.
A bitreversed addressing mode is available to eliminate the need for swapping
memory locations at the beginning or the end of the FFT (depending on the FFT type).
When you use this addressing mode, you access a sequence of data points in bitreversed
order rather than sequentially, and you can recover the points in the correct order during
retrieval of the data instead of spending extra cycles to accomplish it in software.
66
Table 1. Program Memory Requirements for the Core of the FFT and Hartley
Transforms
Routine Type
Program Size
50 words
170 words
68 words
76 words
Hartley transform
71 words
The numbers in the table correspond only to the core program and do not include
the sine/cosine tables for the twiddle factors, any input/output, or any bitreversing operations. Note also that they are independent of the FFT data size.
The data memory requirements are, of course, dependent on the FFT size. The maximum length of a complex, radix2 FFT that can be implemented entirely on the internal
memory of the TMS320C30 is 1024 points. In the present implementation, the 1024point
radix4 FFT requires a few more locations (about 7) than are available onchip.
The code (provided in the appendices) has been written to be independent of the
FFT length. The length N, together with the sine/cosine tables for the twiddle factors,
should be provided separately to maintain the generic nature of the core FFT program.
An example of a file with the sine/cosine tables for a 64point FFT is given in the Appendix F. Note thatthe FFT size and the number of stages are declared .global in both files
(i.e., the main routine and the file with the table) so that the core program gets the actual
values during linking.
To reduce the storage requirements of a sine/cosine table, a full sine and a cosine
cycle are overlapped. The table stores 5/4 of a full sine wave, with the cosine table starting with a phase delay of 114 cycle from the sine table. This table size is larger than actually needed, and it is selected merely for testing convenience of the algorithms. The
minimum table size for a radix2 complex FFT includes 112 of a full sine wave, and 112
of a full cosine wave. If these two half waves are combined using the above quartercycle
phase delay, the minimum table size for this kind of FFT is 3/4 of a full sine wave. For
instance, for a 1024point FFT, the table can be the first 768 points of a sine wave, where
a full cycle would be 1024 points. In the case of a radix4 complex FFT, the minimum
table size should include 3/4 of a sine and 3/4 of a cosine wave. Overlapping these requirements, we get the minimum table size of a radix4 algorithm to be one full sine wave.
An example of a linking file is also included in Appendix F to show how the different segments are assigned. For a complete description of the assembler and linker, consult
the corresponding manual [6].
An Implementation of FFT, DCT, and Other Transforms on the TMS320C30
67
The timing of the FFT routines was done using the cyclecounting capability of the
TMS320C30 simulator. For the conversion of the number of cycles into seconds, a cycle
time of 60 ns was used. The timing refers only to the core FFT computation, ignoring
readin and writeout requirements, since such requirements are applicationdependent.
Also, no bit reversal is counted (although it may be included in the program), since it
is performed as part of the readin or readout. Table 2 gives the timing for the different
FFT routines and for the Hartley transform.
64
128
256
512
1024
1024
Radix2
Radix4
Radix2
Radix2
Complex
Complex
Real
Real
FFT
FFT
FFT
Inverse FFT
0.165
0.370
0.816
1.784
3.873
2.366
0.123
0.077
0.174
0.387
0.857
1.879
0.085
0.193
0.434
0.964
2.124
0.624
3.040
Hartley
Transform
0.081
0.181
0.403
1.132
2.430
For the complex FFTs, the radix4 algorithm reduces the execution time by 2025%
compared to radix2, depending on the FFT size. The last entry in this table represents
the timing of the radix2, DIT routine generated at the University of Erlangen [18] and
given in Appendix A. These numbers are typically used for benchmarking.
= R(k) + j
/(k)
(7)
and that the sequence has length N, R(k) and I(k) should satisfy the following relations:
R(k) = R(Nk), k = 1, ... , N121
/(k) = /(Nk), k = 1, ... , N121
/(0) = /(NI2) = O.
68
(8)
(9)
(10)
In other words, the real part of the transform is symmetric around zero frequency,
while the imaginary part is antisymmetric. Similar conditions hold if the transform is expressed in terms of magnitude and phase.
The savings are due to the fact that not all points need to be computed. Since the
notcomputed points do not need to be saved either, there are also storage savings. An
efficient algorithm for realvalued FFTs is described in [10]. This algorithm was implemented in the present study in such a way that, given the sequence of N real numbers
x(O), x(1), .. . ,x(NI), the resulting FFT, consisting of complex numbers, is stored as
R(O), R(I), .. . ,R(NI2), I(NI2I), I(NI22), ... ,/(1). R(k) and I(k) represent the real and
imaginary parts of the complex number X(k). Figure 11 shows the memory arrangement
for the FFT. Note that the input to the real FFT should be bitreversed, but the bit reversal can be done as the data is brought in. With this arrangement, an Npoint FFT uses
exactly N memory locations. If the full array X(k) is needed, the following relations should
be used:
X(O) = R(O)
X(k) = R(k) + j I(k), K = 1, ... , NI2I
X(NI2) = R(NI2)
X(k) = R(Nk)  j I(Nk), k = NI2+ 1, ... , NI
(11)
(12)
(13)
(14)
x(O)
R(O)
x(1)
R(1 )
x(2)
....
REAL
BITREVERSAL
II
FFT
.....
R(N/2)
I(N/21)
1(N/22)
x(N2)
x(N1)
1(1 )
69
H(k)
0, .. . ,Nl, by
(15)
n=O
Nl
x(n)
= _1
N
~
k=O
(16)
where cas(x) = cos(x) + sin(x). The DHT demonstrates a symmetry that is convenient
for implementations: The same program can be used for both the forward and the inverse
transforms, and the result is correct within a scale factor. Also, the real FFT and the DHT
can be derived from each other [12].
70
A radix2 Hartley transform was implemented on the TMS320C30, and the corresponding code is included in Appendix n. This code follows the structure of the real
FFT in Appendix C. Tables 1 and 2 show the program memory requirements and the
timing for the execution of Hartley transforms of different sizes. The sine/cosine table
sizes are the same as in the case ofa real FFT.
NI
x(k) = ~
n=O
1)1I"n
(17)
2N
NI
x(n)
k=O
e(O)
e(k)
= 11 J 2
= 1, for k
I)1I"n
(18)
2N
'* 0
(19)
(20)
Appendix E shows an implementation of the nCT based on the paper by Lee [14].
The appendix contains the algorithms for both the forward and the inverse transformations
and an example of a table for a 16point nCT. Note that, because of the structure of the
algorithm, the cosine table needed contains actually the inverses of the cosines (within
a scale factor), and it is not stored in the natural order. Instead, it is generated by the
following C pseudocode:
cos_table[N2] = cos[pi/4];
cos_table[N1 ]
2/N;
71
The last entry to the table is not part of the cosine itself; it is a constant that is used
by the algorithm, and it is placed at the end of the cosine table for convenience.
Table 3 shows the timing of the forward and inverse transforms for different transform
lengths. The difference in the timing between the forward and the inverse transforms is
due to the fact that more time was expended to optimize the performance of the inverse
transform. Since four of the smallest butterflies were done simultaneously in the center
program loop, the minimum permissible array size to be transformed is 8.
Forward
Inverse
Size
Transform
Transform
16
64
128
256
512
1024
0.023
0.105
0.230
0.502
1.094
2.378
0.020
0.088
0.193
0.416
0.905
1.982
Table 4. Execution Times in Seconds for a 1024Point Real FFT. The Numbers Should
Be Compared with 0.001879 s of a 1024Point Real FFT on the TMS320C30
Machine
Radix8
Mixradix
Splitradix
0.3634
0.3902
0.3021
0.2376
0.2948
0.2089
0.2545
0.1825
0.2600
0.2127
0.1672
0.1046
0.1107
0.1101
0.0796
0.0943
0.0811
0.0767
0.0871
0.0975
0.0633
0.2371
0.0539
0.0641
0.0217
0.0243
0.0235
0.1671
0.1846
0.1864
0.1299
0.1527
0.1419
0.0940
0.1184
0.0991
0.0885
0.1110
0.0845
0.0277
0.0319
0.0338
0.0277
0.0316
0.0337
0.0182
0.0171
0.0151
0.0180
0.0173
0.0150
0.2518
0.3365
0.2103
0.2806
0.3897
0.2802
0.7586
1.047
0.6955
0.7476
1.029
0.7033
0.6037
0.6895
0.5660
0.0983
0.1060
0.0946
0.3689
0.4126
0.3390
0.3530
0.4142
0.3297
0.2053
0.2244
0.1416
0.2206
0.2457
0.1326
0.9400
1.248
0.9478
2.6670
3.1600
2.8260
1.5040
2.0800
1.4630
73
74
To minimize or avoid such conflicts, there are some simple steps that the designer
can take. The TMS320C30 has three separate memory areas (one onchip, one accessed
by the primary bus, and one accessed by the expansion bus) that can be combined. For
instance, the program can be placed on the expansion port and the data on the primary
port. Or the data can first be brought into internal memory and then operated upon.
Alternatively, the program may be relocated to internal memory. A related approach is
to use the cache. All the transforms are implemented as loops that are executed many
times. If you activate the onchip cache after the first access of the code, the instructions
execute from the cache instead of the external memory.
If there are additional conflicts, they can typically be resolved by some rearrangement
of the code. For instance, consecutively writing to external memory takes two cycles per
write. If, however, a write is followed by some internal operation, then the second cycle
of the write is transparent, and the actual cost is one cycle.
Bit Reversal
The TMS320C30 has a special form of the indirect addressing mode for the bitreversing operation that is required at the beginning or the end of an FFT. Through this
addressing mode, the scrambled data are accessed in their proper order. This addressing
mode works as follows:
Let ARn (n=O .. 7) be the auxiliary register pointing to the array with scrambled
data. The index register IRO contains a. number equal to onehalf the size of the FFT.
Then, after every access of the data, ARn is incremented by IRO using the construct
* ARn + + [IRO]8
This causes the contents of ARn to be incremented by the contents of IRO, but if
there is a carry in this incrementing, the carry propagates to the right instead of to the
left. The result is the generation of the addresses in a bitreversed order. The bitreversed
addressing mode works correctly if the array with the data is aligned in memory so that
the first memory address is a multiple of the FFT size. This can be achieved if the first
memory address has zeros for the last M bits, where M = iog2N, with N being the FFT
size. For example, in the case of a 1024point FFT, the last 10 bits of the memory address
of the first datum should be zeros.
In the implementation of the complex FFT, the output is complex even when the
input is real. So, there is a need to consider both the real and the imaginary parts of the
data array. The above description of the bitreversed addressing mode assumed that the
real and the imaginary parts are stored as separate arrays in the memory. In this case,
each of the arrays (real or imaginary parts) can be accessed as described. However, in
most cases (including this report), the real and imaginary points alternate in the same array.
75
In this arrangement, the following simple modification achieves the same goal: set IRO
equal to N instead of N12, and access the N points of the transform. At every access, the
auxiliary register is pointing to the real part of the FFT. The imaginary part is located
in the next higher location, and it can be easily accessed.
With the bitreversed addressing mode, the unscrambling of the data can take place
when the FFT result is accessed for further processing or for 110. It is possible, though,
that certain applications demand the reordering of the data in the same array. Such a
rearrangement can be done very simply for a complex FFT with the following code.
@FFrSIZ,RC
1,RC
@FFrSIZ,IRO
@INPUT,ARO
@INPUT,AR1
: RC = FFr SIZE
; RC SHOULD BE ONE LESS THAN DESIRED #
; IRO = FFr SIZE
*
RPTB
CMPI
BGE
LDF
II
LDF
STF
II
STF
LDF
II
LDF
STF
II
STF
CONT NOP
BITRV NOP
BITRV
AR1,ARO
; EXCHANGE LOCATIONS ONLY
CONT
IF AROAR1
*ARO,RO
EXCHANGE REAL PARTS
*AR1,R1
RO,*AR1
R1, *ARO
* +ARO,RD
EXCHANGE IMAGINARY PARTS
* +AR1,R1
RO, * + AR1
R1,* +ARO
*ARO + + [2]
*AR1 + + [IRO]B
Note that ARt is pointing to the bitreversed version of the address contained in
ARO. For realvalued FFT, or for FFTs that store the real and the imaginary parts in
separate arrays, the realFFT routine in Appendix C contains a modified example of the
above code.
Use ofDMA
If the signal to be transformed arrives as a continuous stream of data, the DMA
could be used to collect the new data while the data already collected are processed. In
this case, the data source address of the DMA points to the memory location corresponding to a serial port, or to another port associated with an external device. The destination
is a memory space designated for storage.
76
There are two ways to use such buffers. One possibility is to designate one buffer
as the temporary storage and the other buffer as the working area. When the storage buffer
receives the necessary amount of data, the data is transferred to the working area, and
the DMA starts refilling the storage buffer. Alternatively, the two buffers are considered
equivalent: when the processor finishes processing and outputting the data from one and
the DMA has filled the other, the two buffers switch functions; i.e., the DMA starts filling
the first buffer while the CPU is processing the data in the buffer just filled.
Test Vector
For testing purposes, a vector with 64 (quasirandom) data points and the
corresponding FFT values is given in Appendix F. In this way, if any of the routines is
implemented, the test vectors can be used to verify the correct functionality of the routines.
Together with the test vectors, Appendix C gives a sine/cosine table for a 64point
transform, and the linking file for such a transform.
Summary
This report examined implementations of fast transforms on the Texas Instruments
TMS320C3x floatingpoint devices. The transforms considered were several forms of the
FFT, the Discrete Hartley Transform, and the Discrete Cosine Transform. Because of
the powerful architecture of the device, the implementation was done easily and efficiently.
It was shown that a TMS32OC30 executes the FFTs several times faster than large computers
such as VAX and SUN workstations. With the availability of the C compiler, these routines
can be put in Ccallable form and be used to compute the corresponding transforms
efficiently.
77
Appendices
Appendices A to F contain the TMS320C30 assembly language programs for the
different algorithms considered. The contents of the appendices are as follows:
Appendix A: Radix2 Complex FFT.
composed of
AI: Generic Program to Do a LoopedCode Radix2 FFT
Computation on the TMS320C30.
A2: ffL2  Radix2 Complex FFT to Be Called as a C
Function.
A3: Complex, Radix2 DIT FFT  R2DIT.ASM.
A4: Complex, Radix2 DIT FFT  R2DITB.ASM.
A5: TWIDIKBR.ASM  Table with Twiddle Factors for a FFT
up to a Length of 1024 Complex Points.
Appendix B: Radix4 Complex FFT.
composed of
BI: Generic Program to Do a LoopedCode Radix4 FFT on the
TMS320C30.
B2: fft_4  Radix4 Complex FFT to Be Called as a C
Function.
Appendix C: Radix2 Real FFT.
composed of
CI: Generic Program to Do a Radix2 Real FFT Computation
on the TMS320C30.
C2: fft~l  Radix2 Real FFT to Be Called as a C Function.
C3: Generic Program to Do a Radix2 Real Inverse FFT
Computation on the TMS320C30.
Appendix D: Discrete Hartley Transform.
composed of
DI: Generic Program to Do a Radix2 Hartley Transform on the
TMS320C30.
Appendix E:
composed of
EI:
E2:
E3:
E4:
78
Acknowledgements
Mr. Raimund Meyer and Mr. Karl Schwarz (Lehrstuhl fur Nachrichtentechnik,
University of Erlangen) provided the fast routines of Appendix A to do l024point, radix2,
DIT FFT. Mr. Paul Wilhelm of the University of Washington provided the routines for
the Fast Cosine Transform (FCT) together with the related explanations and the test vector
in Appendix E. Their contributions are gratefully acknowledged.
79
References
[1] Burrus, C. S., and Parks, T.W. DFTIFFTand Convolution Algorithms, John Wiley
and Sons, New York, 1985.
[2] Lin, K. S., Ed. Digital Signal Processing Applications with the TMS320 Family,
PrenticeHall, Englewood Cliffs, New Jersey, 1987.
[3] Oppenheim, A. V. and Schafer R.W. Digita.l Signal Processing, PrenticeHall,Inc.,
Englewood Cliffs, New Jersey, 1975.
[4] Rabiner, L. W., and Gold, B. Theory and Application of Digital Signal Processing,
PrenticeHall, Inc., Englewood Cliffs, New Jersey, 1975.
[5] Burrus, C.S. "Unscrambling for Fast DSP Algorithms," IEEE Transactions on
Acoustics, Speech, and Signal Processing, Vol. ASSP36, No.7, pp. 10861087,
July 1988.
[6] Papamichalis, Panos E., and Burrus, C.S."Conversion of DigitReversed to BitReversed Order in FFT Algorithms," Proceedings of 1989 IEEE International
Conference on Acoustics, Speech, and Signal Processing, May 1989.
[7] ThirdGeneration TMS320 User's Guide, Texas Instruments, Inc., Dallas, Texas,
August 1988.
[8] Papamichalis, Panos E., and Simar, Ray Jr. "The TMS320C30 FloatingPoint Digital
Signal Processor," IEEE Micro, Vol. 8, No.6, pp. 1329, Decemberl988.
[9] TMS320C30AssemblyLanguage Tools User's Guide, Texas Instruments Inc., Dallas,
Texas, July 1987.
[10] Sorensen, H.V., Jones, D.L., Heideman, M.T., and Burrus, C.S. "RealValued Fast
f/ourier Transform Algorithms", IEEE Transactions on Acoustics, Speech, and Signal
Processing, Vol. ASSP35, No.6, pp. 849863, June 1987.
[11] Bracewell, R.N. "The Fast Hartley Transform," Proceedings of IEEE, Vol. 72,
No.8, pp. 10101018, August 1984.
[12] Sorensen, H.V., Jones, D.L., Burrus, C.S., and Heideman, M.T. "On Computing
the Discrete Hartley Transform," IEEE Transactions on Acoustics, Speech, and Signal
Processing, Vol. ASSP33, No. 4,pp. 12311238, October 1985.
[13] Ahmed, N., Natarajan, T., and Rao, K.R. "Discrete Cosine Transform," IEEE
Transactions on Computers, Vol. C23, pp. 9093, January 1974.
[14] B. G. Lee, "FCT  A Fast Cosine Transform," Proceedings of 1984 IEEE
International Conference on Acoustics, Speech, and Signal Processing, pp.
28A.3.128A.3.4, March 1984.
[15] TMS320C30 C Compiler Reference Guide, Texas Instruments Inc., Dallas, Texas,
December 1988.
80
[16] Sorensen, H.V., Heideman, M.T., and Burrus, C.S. "On Computing the SplitRadix
FFT," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol.
ASSP34, No.1, pp. 152156, February 1986.
[17] Sorensen, H.V.and Burrus, C.S. "Computer Dependency of FFT Algorithms",
Proceedings of ASILOMAR, 1987.
[18] Schuessler, H.W., Meyer, R., and Schwarz, K. "FFT Implementation on DSP
ChipsTheory and Practice," Proposal for the 1990 IEEE International Conference
on Acoustics, Speech, and Signal Processing.
81
82
83
00
ADDI
LDI
SUBI
R7,ARO,ARl
AR7,RC
I,RC
>
"CI
"CI
FIST LOG'
THE PROGRAM IS TIl<EN FRc.1 HI' BURRUS L PARKS B1Xl<. P. III. TIE lCOII'lEXI
DATA RESIlE IN INTERNAl remy. TIE Cl)1PUTATION IS DCK INPLACE, BUT TIE
RESULT IS MOVED TO IiNlHR PEMORY SECTION TO DEMONSTRATE THE B1TR!:VERSED
ADDRESSING. THE TWIDDLE FACTffiS ARE Slf'PLiED IN A TABLE PUT IN A .DATA
;ECTJON. THIS DATA IS InUIED IN A SEPARATE FILE TO PRESERVE THE GElRIC
NATURE OF THE _AM. FCIl THE SAPE PLmlSE, THE SIZE DF TIE FFT N AND
LOG21NI ARE DEFItD IN A .GlOBL DIRECTIVE AND SPECIFIED DURING LINKING.
::...
;:,
~
'tj
~
~
;:,
'"is
6'
;:,
.s;,
IN?
.GLOBL
FFT
.GlOBL
.GlOBL
.GlOBL
M
SINE
,
,
,
:
.USECT
.BSS
"IN", 1024
OUTP,I024
.'"3
WORD
FFT
l:l
SPACE
100
(j
;:,
I:l
;:,
FFTSIZ
LOGFFT
SINTAn
INPUT
OUTPUT
WORD
WORD
WORD
WOOD
SINE
IN?
ruTP
fFT:
LOP
FFTSIZ
LOI
LSH
LDI
LDI
LSi<
LDI
LOI
tFFTSIZ,IRl
2,IRI
0, ARb
tFFTSI Z, IRO
I,IRO
tFFTSIZ,R7
l,AR7
LOI
1,AR5
~
<::>
""<::>
;:,
S
'"
10m
CMPI
B10
NOP
LDI
"
"
"
"
"
H+AR6( 1)
@INI'JT , ARO
RO=l1IltllU
RI=l1IJXIU
R2=YllltYIU
R3=Y1IIYIU
Y1II=R2 AND ...
YIU=RJ
: XII I=RO AND ...
XIU=RI AND ARO,Z = ARO,2 t 2t'NI
LDI
LDI
ADDI
LOI
ADDI
ADDI
ADDI
LDI
SUBI
LDF
I"
("j~
=
a ....
3
"CI ""i
a.~
.... ""i
END
=(JQQ
2,Ml
@SINTAB,AR4
ARS,AR4
ARI,ARO
2,ARI
@Itf'UT,ARO
R7,ARO,ARl
AR7,RC
I,RC
tAR4,Rb
RPTB
SUBF
SUBF
MPYF
AODF
MPYF
STF
SUBF
MPYF
ADDF
MPYF
STF
ADDF
STF
STF
BLK2
tAR2, *ARO, R2
t+AR2,I+ARO,Rl
Rl,Rb,RO
HAR2, HARO,R3
Rl, <tAR41 IRI I ,R3
R3,HAAQ
RO,R3,R4
Rl,Rb,RO
<AR2, tARO, R3
Rl, <tAR41 IRII, R3
R3,IAAO++(IRO)
RO,R3,R5
R5, <AR2ttIIROI
R4,HAR2
CMPI
R7,ARI
IILOP
BNE
.>
~
@lOGFFT , ARb
BLK2
LOOP:
1M2, "ARO, R3
R2,4MO
....=
,
,
,
,
,
0
0
BLKI
tARO, *AR2, RO
tAR2++, fARO++, R1
*AR2, *ARO, R2
SECOND LOOP
"HER LOOP
tv
[3
"
IIt.OP:
INITIALIZE
STF
STF
tl
S'...."
..
BLKI
TEXT
.'"3
l:l..
RPTB
ADDF
SUBF
ADDF
SUBF
STF
STF
Q.
,
,
,
,
;
,
,
,
,
,
,
,
,
R2=l1IJXIU
RI=YIJ IYIU
RO=R2<SIN AND ...
R3=YllltYIU
R3=Rl<COS AND ...
YlJIYlJltYIU
R4=RI<COSRl<SIN
RO=R1+SIN AND ...
R3=XlIJtXIU
R3=R2tCOS AND
X1II.11IItXIU AND _Ot2<NI
R5=R2tCOStR1+SIN
XIU=R2tCOStR1+SIN, INCR AR2 AND...
YIU=RltCOSfl2<SIN
""i
= =3
Q
="Q
130
~Q
00=
~t"'"
QQ
("jQ
~"CI
Q~
Q.
I
("j
Q
Q.
~
:=0
=
....
Q.
~
I
N
~
~
~
:l:.
;:s
3'
';l
1i>
3
~
LSI!
I,AR)
LSI!
LOI
LSI!
1,AR5
R),IRO
1,R)
L()(J>
!Ill
is
END'
=ii
.'l
tl
(":l
LOI
SUBI
LOI
LOI
LDI
LDI
@FFTSlZ,Re
I,Re
@FFTSlZ,IRO
2,IRI
fiNPlIT,ARO
to.J1PIJT,ARI
RPTB
Bl1RY
STF
STF
~
..,
"
8!TRY
HAROtll,RO
1/IRO++(!ROIB,RI
RO,HARlIll
RI,.IIRI++IIRll
SELF
!Ill
SELF
.'l
~
c
~
'"
So
~
~
~
o
C
00
VI
, N2=N2/2
, 111 FFT
ST~
~.
I~
IE=2flE
, NI=H2
L~
LDF
.END
, RC=N
, RC SIIlllI IE !IE LESS THAN lESIRED
, IRO=SIZE ~ FFT~
00
0
... NAI'E:
ffU  RADIX2.C!l'If'LEX FFT TO BE CAlLD AS A C RKTlON.
FP
SYtO'SIS:
INT ffUIN,~, DATAl
INT N
FFT SIZE: No2"~
NlI1IlER {F STAGES = UlG2(NI
INT ~
ARRAY WITH I~ AND OOTPUT OATA
FLOAT 'DATA
::..
:::
DESCRIPTION:
GNRIC FlKTlON TO 00 A RADIX2 FfT CM'UTATIOO ON TIE 32OC3O.
Tft: DATA rHlAY IS 2'NU}(;, WITH REAL AND IMAGINARY VALUES ALTERNATHI3.
THE _
IS BASED 00 THE FORTRAN _
IN Tft: lltIlRUS AND PARKS
BOOK, P. 111.
:::
is'
g.
:::
<5;,
.'3
t.:::?
FLOAT
VALUE(5N/41
\:l
\:l..
Sf.
....~
.,
~
:::
~
C
'"C
:::
_fiL2:
tv
00
.~ord
PUSH
LDI
MH
PUSti'
PUSHF
PUSH
PUSH
PUSH
PUSH
LOI
STI
Lal
STI
LDI
STI
= sin! (5+N/411*2tpi/N)
THE VALUES VALUEI, VALUE2, ETC., ARE Tft: SAllE WAVE VALUES. FOR AN
NP\JINT FFT, HRE ARE N+N/4 VALUES FOR A FLU AND A IlUARTER PERIOD {F
THE SINE WAVE. IN THIS WAY, A FULL SINE AND COSINE PERIOD ARE AVAILABlE
IS/.f'ERlt1POSEDI.
>
"C
"C
=
....
Q.
~
~
_sine
:::
FP
SP,FP
R4
RS
R6
R7
AR4
AR5
ARb
AR7
tFP(21,RO
; ~ ARGUMENTS TO LOCATIONS
; Tft: NMES I N THE PROOW1
RO,!f'FTSIZ
.FPi3I, RO
RD. !LOGFfT
=
....
.... C
I
:;:d
==
r':l
Q.
....
~
I
N
~TQHNG
~
~
++
FP(41
FP131
FP(11
FP(11
FPIOI
DATA
N
: RETlIIN AOOR
ruJ FP
++
REGISTERS USED: RD, RI, R2, R3, R4, R5, Rb, R7, ARC, ARI, AR2, AR4, AR5
ARb, AR7, IRO, IRI, RS, RE, fit
AUTHOR: PANOS E. P~MICHALIS
TEXAS INSTRUIENTS
!FFTSIZ ,IRI
1.IRI
LD!
1,AA5
O,AAb
!FFTSIZ.IRO
I.IRO
@fFTSIZ,R7
I,AR7
HHHffftUfHHnH**ffHffHHHUHtttHHHHU.UHHHHHftHIf.UfH
LOOP:
....
0=
~
=
~
(J.ITER LOOP
OCTOOCR 13, 1987
RO.!INPUT
LDI
LSH
LDI
LDI
LSH
LDI
LDI
~
0
"C
fFP(41,RO
Sf.
~
FFTSIZ.I
LOGFFT,I
Itf'UT,1
_Slne
TEXT
SINTAB
:::
. ass
PUSH
(j
_ffU
INITlALlZE C FUtUIOO
THE COMPlITATlOli IS OONE IN PlACE, AND Tft: ORIGINAL DATA IS [STROYED.
BIT REVrnSAL IS Itf?LEI1ENTED AT THE END (F THE FOCTlON. IF THIS IS t{)T
NECESSARY, THIS PART CAN BE ClMNTED OJT.
GLOBAL
DATA
_swe
FLOAT
.'3
AA3
.BSS
.SSS
"'<:S
"::!
.set
GLOB!.
GLOB!.
Q.
NaP
H+AR6(1)
LDI
ADDI
LDI
SUBI
@INPUT.ARO
R7.ARD,AR2
AR7, R(
1,RC
=
=
~
~
;:,
FIST LOW
RPTB
AIlIF
SUBf
ADIIF
SUBf
STF
STF
SIF
STF
"i:5
is
gO
ILK!
"
END:
YILI~3
XIII~O
AND ...
XfLI~1 AND ARO,2
ARO,2 + 2+NI
END
,:l
INLOP:
C)
So
'"...
~
~
LDI
LDI
ADDI
LDI
ADDI
ADDI
ADDI
LDI
SUBI
LDF
2,ARI
tsINTAB,AR4
AR5,AR4
ARI,ARO
2,ARI
~ltflJT ,ARO
RI,ARO,AR2
ARI,RC
I,RC
+AR4,R6
So
'"
~
~
N
C
o
C
I1LK2
"
RPTB
SUBf
SUBf
II'YF
ADDF
IIPYF
STF
SUBf
ItPYF
ADDF
IIPYF
STF
ADDF
STF
STF
CIIPI
BLK2
tAR2, tARO, R2
HAR2, '+MO,Rl
R2,R6,RO
,
,
R2~XIIIXILI
RI~YIIIYILl
,~tSIN
AND ...
R~YIII+YILI
f+AR2, ttARO,R3
RI, HAR4IIRII,R3
R3,HMO
RO,R3,R4
RI,R6,RO
tAR2,tMO,R3
R2, HAR41 IRII,R3
R3,tARO++llROI
RO,R3,R5
R5, +AR2++1 IROI
R4, HAR2
~/2
CONT
BITRV
LDI
SUBI
LDI
LDI
LDI
~FTSIl,RC
I,RC
~FTSIl,
IRO
IltflJT,ARO
!INPUT,ARI
RPTB
CMPI
IIGE
LIF
LDF
STF
STF
LDF
LDF
STF
STF
B!TRV
ARO,ARI
CONT
*ARO,RO
tARI,RI
ROt "'ARl
Rl,fARO
II(p
H+ARO(2)
NOP
tARI++llROIB
t+ARQ(1)
,RO
HAA1(t),Rl
RO,t+ARlIll
RI, t+AROI I I
SECOND LOOP
'"
NI~
RI,IRO
1,R7
LOW
@l.OGFfT,AR6
Cl'l'1
BZD
o
(j
, ~XIII+XILl
, RI~XIIIXILl
, ~YIII+YILl
, R3=YIIIYILl
, YIII~2 AND,
~
,:l
I::>..
ILKI
tMO, tAA2, RO
fAR2tt, tMO++, Rl
tAR2, tARO, R2
1M2, fAAO, R3
R2,+AROR3, tAA2ROt fMQH( lROJ
RI, tAA2++ I lRO I
LDI
LSH
DR
~I+COS
/MI ...
Y1I1~YIII+YILl
,
,
R~I tCOSR2tSIN
,
,
~tCOS+RI+SIN
~ltSIN
AND ...
R3=X III +X III
, R3=R2tCOS AND ...
XIII~XIII+XILI
AND
XILI~tCOS+R1tSIN,
~ARO+2tNI
YILI~RI tCOSR2+SIN
lINE
RI,ARI
INLOP
LSH
I,AR7
LSH
1,AR5
IE~2tIE
POP
POP
POP
POP
POPF
POPF
POP
POP
PIP
R!:TS
AR7
AR6
AR5
AR4
R7
R6
R5
R4
FP
00
00
R2DlT,ASII
THIS _
GENERIC _
IIOJIlES
19,07,89
!?
"<::j
[
i:;
'
~
.!"1
i:::l
<l
.!"1
$:l
;:,
$:l..
s....,
'"
$:j
;:,
~
<:l
'"
~INIIDI
'"
COEFFICIENT
RUoNIO)) = COSI2*PItO/321
!(WNIO)) = SINI2*PIt01321
R{WN(4)) = COSI2*PI*4/321
!(WNI4Il = SINI2*PI*41321
=1
=0
= 0,707
= 0.707
12
13
FIRST 1110 PASSES IIIlE REALIZED AS A FlU! BUTTERFLY LIXJ' SINCE TIE
rt.t.TIPLlESIIIlE TRIVIAL. THE rt.t.TIPLIER IS DIU USED FOR A LOAD IN
PARAlLEL WITH IW ADIIF IR SUBF.
14
R{WNI3IJ
!(WNI3))
R{WNI7Il
!(WNI7I)
= 0,831
= 0.550
= O,I9:i
= 0.981
15
= COSI21P!J3/321
= SINI2IPI*3/321
= COS I21PI>7I32 I
= SINI2*PI*7/321
229 WOOOS
512 IIMIIS
ttffttftHtftfftUttfHHffHHffHfHffHIHHflHfHflfllfHfHffHfHlfIHI
f
A~AGE
CYCLES/BUTTERFLY
7,275
TOTAL BUTTERFLYCYCLES
= 37249
INITIALIZATION OVERHEAD
= 2181 = 5,55, (F TOTAL TItE
TOTAL NIJ1BER OF INSTRUCTION CYClES = 39429
TOTAL TIME FOR A 1024 POINT FFT
2.36 os IEXCLUDING BIT
j AI/
/ \
/
RE~I
DR + J BI  I
cos 
/
\ +
j SIN I  DR' + j BI'
HUtHtHftHHIHnHHfHltftfHfHHtfftttftHHltlHHffHHfHfHltftfH
=~
cos
TR
TI = DR * SIN  Bl *
81 f SIN
cos
AR'= AA + TR
AI'=AITI
DR'= AR  TR
81/= AI + Tl
oc
J
~
a
.~
WHEN GENERATED FOR A FFT LENGTH IIF 1024, TIE TABLE IS FOR ALL
AVAILABLE FFT IIF LESS OR EIllW. LENGTH,
HtlHHHHHHffHHHfHHffHtHHHUHHHffHHHfHHtHHHHHHHft
s.
FILES'
TIE TWIDDLE FACTMS IIIlE SHRED IN BIT1If.IIERSED IRIEl AND WITH A TAa.E
LENGTH IF N/2 IN = FFTLENGTHI,
EX1<'I'LE' SOOWN FIR N=32, WNlnl = COSI2IPItn/NI  j>SINI21PItn/NI
ADIRESS
;:,
F!l.L~ING
ftfftHHftftfHHfHtHHfHfHffHftHHfHffHfHtHfHHffffHffHfffHftf
3
I
.~
>
~
;:,
.,lobil
.globil
.globil
.globl!
.global
.global
_global
"
;:,
.BSS
111'.2048
.BSS
OOTP,2048
"i:5
.;:!
~
~.
~
t:1
C)
~
I:>
;:,
..,'~"
~
I:>
;:,
~
c:::;
FFTSIZ
FG4Il2
FG4"3
FGSt12
FG2
FG2I13
LOOm
SINTAB
SINT"I
SINTP2
INPUT
INPUTP2
(UJTPUT
'"
0C
00
\0
SUBf
SINE
INPUT IlECTCll lENGTH = 2N IIEPENDS
IJj NI
OOTPUT IlECTCll lENGTH = 2N 1 _
IJj HI
.lIiord
,word
N'lIERT2
INIERT 3
.lIIord
.word
~TCHEL2
.lIIord
fWolB
fWolB3
."ord
.word
.lIIord
,lIIord
.word
.1II0rd
,word
.lIIord
FFT:
I.lf'
LOI
LDI
LOI
ADDI
ADDI
ADDI
LOI
LOI
LDI
LOI
LSH
LOI
SUBI
SIIE
SIIEI
SIIE+2
INP
111'+2
OOTP
AR3,ARb
2.IRI
I,IRO
lRO,RC
2,RC
"
"
SUIIf
ADIJ'
STF
SUIIf
"
STF
ADIF
II'YF
"
SlIIIF
"
"
FFTSIZ
fFG2,IOO
fSIHTAB,AR7
IINPUT,IIRO
lRO,IIRO,ARI
IRO,ARI,AR2
lRO,AR2,AR3
ARO,AR4
ARI,ARS
ADIJ'
II'YF
"
AIIIJ'
STF
SlIIIF
STF
ADIF
1M2,<ARO,R4
1M2, tAROtt ,RS
IMI, tAR3, Rb
tARl ++. IM3++ I R7
Rb,R4,RO
tAR3++,1M7,RI
Rb,R4,RJ
RI,tARI,RO
RO,.fAR4++
RI,lMltt,RI
RJ,tAR5++
RI,RS,R2
t+AR2,1M7,RI
RI,RS,RJ
RI,tllRO,R2
R2,lM2ttIlRII
Rt, tMO+t I R6
RJ,lMbtt
RO,R2,R4
R4=AR+CR
RS=ARCR
Rb=M+IIR
R7=MIIR
AR'=RO=R4+Rb
RI=OI, I1R'=RJ=R4Rb
; RO=BI +01, AR' =RO
; RI = 51  DI , BR' = R3
;CR'=R2=R5+RI
; RI=CI, M'=RJ=RSRI
; R2=AI +CI, CR' =R2
; Rb=AI CI, M' =RJ
; AI' = R4 = R2 + RO
RPTB
II'YF
::
ARO AA + Al
ARI SR + HI
AA2 CR ... CI ... CR' t CI~
M3:DR+OI
AR4 : AR' + AI'
ARS , SR' + HI'
ARb : M' + 01'
AR7 : FIRST TWIDIl FACTCll = I
S
ADIJ'
SUBF
ADIJ'
~TCIL
.tixt
FILL PIPELllE
N
IfW.B
N'lIERT
J:
::
SUIIf
II'YF
ADIF
ADIF
STF
SUBf
"
"
LDAD PAGE POIHTER
lRO = N/2 = IJ'FSET BETWEEN INPUTS
AR7 POINTS TO TWIDOLE FACTCll I
IIRO POINTS TO AR
ARI POINTS TO IIR
AR2 POIHTS TO CR
AR3 POINTS TO M
AR4 POIHTS TO AR'
ARS POINTS TO I1R'
ARb POINTS TO M'
ADMESS OFFSET
lRO = H/4 = lUUIER OF R4BUTTERflIES
::
::
::
STF
SlIIIF
ADIJ'
STF
SUIIf
STF
AIIIJ'
II'YF
SUIIf
ADIF
STF
SUIIf
:1
STF
ADIF
II'YF
::
SUIIf
"
STF
"
STF
ADIF
SUIIf
IlKI
1M2, 1M7, RO
RO,R2,R2
IMI tt, tAR7, RI
R7,Rb,RJ
RO,<ARO,R4
R4,1M4tt
RO, _ARO++,R5
R2,tAR5++
R7,Rb,R7
RI,tAR3,Rb
R7,tAR6++
RI, tM3tt, R7
R3, IM2tt
Rb,R4,RO
tAR3tt, 1M7, RI
Rb,R4,R3
RI,tARI,OO
RO,1M4tt
RI,lMltt,RI
R3, 1M5++
RI,RS,R2
ttAR2, 1M7,RI
RI,RS,R3
RI,<ARO,R2
R2, IM2ttllRIl
RI,_,Rb
R3,tM6++
; RO = CR , IBI' = R2 = R2  ROI
; RI = IIR , ICI' = RJ = Rb + R71
; R4 = AR+ CR , IAI' = R41
; R5 = AR  CR , IBI' = R21
; 101' = R7 = Rb  R71
; Rb = M + BR , 101' = R71
: R7 = M  BR , ICI' = R31
;AR'=OO=R4+Rb
; RI=DI, BR' =R3=R4Rb
; 00 = BI + DI , AR' = 00
; RI = BI  DI , I1R' = R3
;CR'=R2=R5+RI
; RI = CI , M' = R3 = RS  RI
;R2=AI+CI,CR'=R2
; Rb=AI CI, M' =R3
8LKI
ADIIf
RO,R2,R4
; AI' = R4 = R2 + RO
nEAR PIPELINE
SUIlf
ADIIf
STF
STF
SUIIf
STF
STF
"
RO,R2,R2
R7,Rb,R3
R4,IM4
R2,tAR5
R7,Rb,R7
R7,IMb
R3,'AR2
R2,IM3++
AR5,RC
FIRST BUTTERFLYTYPE;
; BI' = R2 = R2  RO
; CI' = R3 = Rb + R7
; AI' = R4 , SI' = R2
TR = BR cos + BI SIN
TI=BR'SINBI'1lIS
AR"= AR + TR
AI'= AI  TI
BR'= AR  TR
BI'= AI + TI
; DI' = R7 = Rb  R7
; DI' = R7 , CI' = R3
;:,
STF
llll
'tj
"
::l
~
;:,
is
~.
;:,
.Q,
STLH
tl
(j
~
I:l
;:,
I:l..
S.
~
...
\:l
;:,
::
llll
llll
SUBI
llll
fFG2,IRI
IRO,AR5
1,AR5
I,ARb
LDI
llll
llll
LDI
ADDI
llll
LSI!
LSI!
LSI!
LSH
!SINTAB,AR7
O,AR4
@INPUT,ARO
ARO,AR2
IRQ,ARO,ARJ
AR3,ARI
I,ARb
2,AR5
1,AR5
I,IRQ
LSH
ADDI
I,IRI
I,IRI
L!F
L!F
IMI++,Rb
tAR7,R7
tARt++, tAR7, RO
RO,RI,R3
'ARI++,R7,RI
R3, tARO,R2
tARO++,R3,RS
;RS=AR+TR,BR'=R2
FILL PIPELINE
'"
;:,
S.
~
"
"
a
C
"
"
"
"
"
IIFLYI
"
L!F
If'YF
ADIIf
MPYF
I1PYF
ADDF
If'VF
SUIIf
ADIIf
H+AR7,Rb
IMI,Rb,RI
H+AR4,RO,R3
IMI,R7,RO
; RI = BIt cos , R2 = AR  TR
IIfLY!
t+AR!, RO, R5
ADDF
SUIIf
STF
ADIIf
If'YF
SUIlf
If'YF
STF
ADIIf
STF
RS, tAR2++
RI,RO,R2
IMI,R7,RO
R2,'ARO,R3
R2, 'MOH ,R4
R3, IM3++
RQ,RS,R3
tARltt,Rb,RO
R3, 'ARO,R2
IMI++,R7,RI
R4, IM2++
IMO++, R3, RS
R2,IM3++
"
::
c:;
~
N
GRLI'PE
~
c:;
"
RPTB
If'YF
STF
SUIIf
If'YF
"
"
"
"
SUIIf
ADOF
STF
SUIIf
STF
NOP
If'YF
STF
If'YF
If'YF
SUIIf
I1PYF
SUBF
ADIIf
STF
llll
RI,RO,R2
R2, tARO,R3
; R2 = TI = RQ  RI
; R3 = AI + TI , AR' = RS
RS,IM2++
RQ,RI,R3
IMI++,Rb,RI
R3, tARO,R2
IMO++,R3,RS
R2, IM3++
AR5,RC
; R4 = AI  TI , BI' = R3
; AOOOfSS UPDATE
; RI = BI t cos , AI' = R4
; RQ = BR SIN
; R3 = TR = RI  RQ , RQ = BR cos
; RI = BI SIN, R2 = AR  TR
;RS=AR+TR,BR'=R2
~
~
LDI
LDI
LDI
LDI
"(5
SECOND BUTTERFLYTYPE:
;;J
~
IS
~.
;:
I. BUTTERFLY' 0('()
ADDF
SUBF
ADDF
SUBF
f*+.HHUHUfHHHHHHHHtHHfHHfHUHffHfHfUH+HHHHHHTHH
RPTB
BFLY2
:J
MPYF
STF
ADlIF
MPYF
ADDF
SUBF
STF
WBF
MPYF
SUBF
MPYF
STF
ADDF
STF
f+ARl,R7,R5
"'
~
..,'"
~
"
s,
c
;:
~
'"
tv
aa
BfLY2
"
*ARQ++ R3. R5
R2. fAR3++
.ARO, .ARI,R2
tAAt ++, fARO ... , R3
tARO,'ARI,RO
fAAl++,*A/lO++,Rl
1IR'=R2=IIR+OO
OO'=R3=AROO
AI' = RO = AI + BI
BI' = RI = AI  BI
2, BUTTERFLY' ."()
RS, tAA2++
RI,RO,R2
tARI,Rb,RO
R2, IAIlO,R3
R2,IARO+t ,R4
R3, tAR3++
RO,R5,R3
*ARl++,R7,RO
R3, .ARD,R2
fARl++,R6,Rl
R4,IAR2+t
LMIl OUTPUT
POINTER TO TWIDlLE FACTOR
OISTiVU IIETIEN TWO GRIlPS
FILL PIPELINE
:J
("J
IIRI,1IR3
ts1NTP2,1IR7
5,IRO
fFG8I12,RC
; (R2 = TI = RO + RII
, RO = 00 I SIN, (R3 = AI + TIl
; (RI = AI  TI , BI' = R31
;TR=R3=I15RO
; RO=OO.COS, R2=ARTR
ADDF
SUBF
ADDF
SUBF
STF
STF
STF
STF
fARO, IAIlI,Rb
; IIR' = Rb = AR + 00
<ARIH,*ARO++,R7
; 00' = R7 = AR  00
.ARO,IAIlI,RI
; AI' = R4 = AI < BI
tARIHIlROI,IAROHIIROI,RS; BI' = 115 = AI  BI
R2,IAR2++
; (AR' = R21
R3,IM3+t
; IBW = R31
RO,tAR2++
; (AI' =RO)
RI, 1AIl3H
; (81' = Rli
STF
R6,tAR2++
; ARf
R7, tAR3H
R4,tAA2++(JRO)
115, 1AIl3++IIROI
; 00' = R7
; AI' = R4
; BI' = 115
STF
, STF
STF
= Rb
3. BUTTERFLY: .rM/4
CLEAR PIPELINE
AOOF
ADIIF
STF
CMPI
BHED
SUBF
STF
L!IF
STF
N(f>
Rl,RO,~l
R2, <ARD,R3
RS, *AR2++
ARb,AR4
GIlUPPE
R2, .fARO++! IRt) ,R4
ADIIF
SUBF
ADDF
SUBF
R2=TI=RO'RI
R3=AI+TI
AR'=R5
ADLf
LIF
LOF
SUBF
STF
STF
STF
R7 = cos
AI' = R4
BRANCH HERE
I,IRO
snn
AR'
AI'
BI'
00'
= 115
= Rl
= Rb
= RT
= IIR
= AI
= AI
= AR
+ BI
 00
+ 00
 BI
4. BUTTERFLY' 1<'1114
, 00 FOLLOWING 3 INSTRUCTIONS
, RI = AI  TI , BI' = R3
tARo... , '+ARl, R5
IARl,fARO,R4
IftRl++,IARO,Rb
tARl t+, 1AIl0++, R7
HARI,H+ARO,R3
; AR' = R3 = AR + SI
+ART,RI
; RI = IFoo INNER LW'I
IAIlI++,RO
; RO = 00 (FOR INlIER LOO"I
IAIlIHIIROI,IARO++,R2; 00' = R2 = IIR  BI
R5,IIIR2++
(AR' = 1151
R7,tAR3++
100' = RTI
R6,fM3H
(81' =Rbl
5. TO M. BUTTERFLY:
APTB
BF2ND
LDF
STF
LIF
STF
AR7++,R7
RI, tAR2++
tAR7++,Rb
R2,tAR3++
\0
LDI
LDI
ADD!
~INPlJT,ARO
ARO,1IR2
IRO,ARO,IIRI
UPPER Itf'UT
Lf'PER OUTPUT
UlER Itf'UT
"
"
\0
tv
"
rl'VF
STF
ADlF
rl'YF
ADDF
SUBF
STF
ADDF
rtPVF
SUHF
;.,.
;::
~
':j
"
"
;:;:
'"
;::
"
Ei
"
"
.'"3
"
tI
"
.'"3
"
;::
C)
1:>
;::
;:,.
a
s'..."
"
"
R2, fARO++,R4
R3, tAR3++
, (R4 = AI  TI , BI'
RO,RS,R3
,R3=TRoRO'RS
; RO = DR + SIN, R2 = AR  TR
II'VF
STF
SUHF
II'YF
ADDF
SUHF
"3,
c:;
"
SUHF
'"
sc:;
"
"
'"
"
~
~
tv
00
"
"
,R3=TR=RO+RS
,RO=DR'SIN, R2=ARTR
SUHF
II'YF
STF
ADDF
STF
II'VF
STF
ADDF
II'YF
ADDF
SUBF
STF
SUBF
rtPVF
~l++,Rb,RO
R3,+ARO,R2
tAR I ++1 IROI ,R7,RI
RI, tAR2++
tARO++, R3, R3
R2. tAR3++
SUBF
rl'VF
ADlF
R3, 'ARO,R2
tARI++( IROI ,Rb, RI
tARO++, R3, R3
, RI = 01 SIN, R3 = AR + TR
R31
+tARI,R6,RS
RS,tAR2++
RI,RO,R2
tARI,R7,RO
R2, .ARO,R3
STF
ADDF
I1PVF
SUBF
rl'YF
STF
ADDF
STF
"
HF2END
"
a.EAR PIPELINE
, (R4 = AI  TI , BI'
rtPYF
STF
SUBF
rwYF
ADDF
STF
SUBF
rtPVF
;::
, (R2 = TI = RO + Rll
, RO = DR COS , (R3 = AI + TIl
R3,.ARO,R2
tARI++,R7,RI
R4,tAR2++(IROI
+ARO++, R3, RS
R2,tAR3++
"
fARl++,Rb,RO
rtPYF
STF
ADIF
STF
1:>
;::
f+ARI,Rb,RS
R3, tAR2++
RI,RO,R2
'ARl,R7,RO
R2,+ARO,R3
R2, 'ARO++(JROI ,RI
R3,tM3++ilROI
RO,RS,R3
"
"
::
,RS=AR+TR,DR'=R2
, RS
BI + SIN, (AR'
RSI
R2, fM3++
RI,tAR2++
Rl,RO,R2
R2, +ARO,R3
RJ,IM2++
R2, +ARO,R4
R3,tAR3
R4,+AR2
LDI
lOI
lOI
lOI
LDI
lOI
lOI
R31
tINPtiT,ARO
ARO,AR2
tINPtiTP2,ARI
ARI,AR3
ts I NTP2, AR7
3,IRO
@FG4r12,RC
FILL PIPELINE
,R3=AR+TR,DR'=R2
I. BUTTERFLY' ."0
f+ARI,R7,RS
R3, 'lM2++
RI,RO,R2
tARI,Rb,RO
R2, .ARO,R3
R2, +ARO++I IROI, R4
R3, tM3++{lRO)
RO,RS,R3
tMl++,R7,RO
R3,+ARO,R2
tARl++,Rb,Rl
R4,+AR2++IIROI
fAAO++,R3,RS
, RS = AR + TR , DR' = R2
, 1R2 = TI = RO  RII
, RO = DR + SIN, 1R3
, DR' = R2 , AI'
R4
,R2=TI=RO+Rl
, R3 = AI + TI , AR' = R3
, RI = AI  TI , 01'
R3
, AI' = R4
LAST STIa
, (R2 = TI = RO  RII
, RO = DR + COS , (R3 = AI + TIl
0
STF
STF
ADDF
ADIF
STF
SUBF
STF
STF
DDF
SUBF
ADDF
SUHF
AI + TIl
; UPPER II~UT
; UPPER ClJTPUT
,LMR IIf'UT
, LMR ClJTPUT
, POINTER TO TlUDIH FACTORS
, GRP OFFSET
,AR'oR6=AR+DR
tARO, tAR I ,Rb
,BR'=R7=ARDR
fAA 1++ *ARO++ R7
, AI' = R4 = AI + BI
tARO, tARI, R4
tARI++1 IROI, tARO++l IROI,RS , BI' = RS = AI  01
2. BUTTERFLY: w"tt/4
ADDF
LDF
,R3oTR=RSRO
,RO=DR'COS, R2=ARTR
::
LDF
SUHF
STF
STF
STF
R2, IM3++
,AR'=R3=AR+BI
, RI = 0 IFOO INf'ER LOiJ'I
, RO 0 DR IFOR IIH:R LOiJ'I
tMl++,RO
tAR1++( IRO). fARO++ ,R2 , DR' = R2 = AR  BI
, IAR' = Rbi
Rb, tUt2++
, IDR' = R71
R7,+AR3++
RS, iM3++(IROI
, IBI' = RSI
f+ARI,+ARO,R3
<AR7,RI
3. TO M. BUTTERFLY:
f+ARI,R7,RS
RS, tAR2++
RI,RO,R2
+ARI,R6,RO
R2, +ARO,R3
R2, fARO++,R4
R3, tAR3++
RO,RS,R3
tARl++,R7,RO
,R3=TR=RSRO
,ROoDR+COS, R2=ARTR
, 1R2 = TI = RO + Ril
, RO = DR SIN, 1R3 = AI + TIl
0
R31
::
::
"
LDF
STF
LDF
STF
rl'YF
STF
ADDF
rtPYF
+AR7++,R7
RI, +AR2++IIROI
+AR7++,Rb
R2, +AR3++
f+ARI,Rb,RS
R3,+AR2++
RI,RO,R2
tARI,R7,RO
, R7 = COS , IAI'
R41
;:s
II
~
'G
..
II
SUIIF
..
..
..
..
..
..
If>YF
SIF
ADIF
STF
!:i
~.
~
t:l
(")
....,
I:l
;:s
!:l..
So
ADIF
SUIIF
SIF
ADIF
rlf'YF
If>YF
SIF
SUIIF
If>YF
ADIF
SUlF
SIF
SUIIF
If>YF
SUIIF
lFLEND
...'"
~
~
So
..
..
SIF
SIF
ADIF
; R5"::: BI
ADIF
R3,IM2++
SUIIF
R2, <ARO,R4
R3,1AR3
R4,tAR2
END IF
BIT REVERSAl
a
C
..
..
BITRV
LOI
LOI
LOI
LOI
LOI
SUBI
I!FHSIl,IRO
2,IRI
@INPUT,ARO
!OUTPUT, ARI
I!FFTSIl,RC
2,RC
LDF
Rf'TB
BlTRY
ttAROlll,RO
IN
~++IIROlb,Rl
SIF
RO, _1 (I I
ttAROUl ,RO
RI,IMl++lIRIl
tARO++(JROlb,RI
RO,_11l1
LDF
SIF
LDF
STF
NIP
NIP
NIP
NIP
RI,IMI
SELF
BR
_I, R7, R5
R3,1M2tt
RI,RO,R2
IMI,R6,RO
R2,<ARO,R3
R2, tAROttilROl, R4
R3,1M3tt (JROI
RO,R5,R3
IMltt,R7,RO
R3,<ARO,R2
tARltt(JROI,R6,RI
tARO++ ,R3,R3
STF
END:
,R3=TR=RO+R5
, RO = BR SIN, R2 = AR  TR
R2,_
R2,lAR3tt
R4,1M2++(JROI
RI,RO,R2
R2,<ARO,R3
SIF
STF
II
~
N
IARO++,R3,R3
SIF
, (R4 = AI  TI , BI' = R31
:I
, 1R2 = TI = RO  Rll
, RO = BR SIN, 1R3 = AI + TIl
, IR4 = AI  TI , BI' = R31
,R3=TR=ROR5
,RO=BRt.COS, R2=ARTR
, RI = BI t SIN, R3 = AR + TR
Il.EAR PIPELINE
'"
\0
If>YF
ADIF
R2,<ARO,R3
R2, fMOttl IROI,R4
R3,IAR3++(JROI
RO,R5,R3
IMltt,R6,RO
R3,'ARO,R2
IMltt(JROI,R7,RI
R4,1AR2++( IROI
,R2=T1=RO+RI
,R3=AI+T1,AR'=R3
, R4 = AI  TI , BI' = R3
, AI'= R4
.tnd
SELF
'2
APPENIIIlM
COIfl.EX, RADIX2 DIT FFT
R2DITB.ASIt
THE TWIDIl.E FACTOOS ARE STORED IN BIT REVERSED ORDER AND WITH A TABLE
LOOTH IF N/2 (N = FFTlfNGTHI.
;:::
3'
"i:5
ADDRESS
"
12
13
14
15
FIRST Till PASSES ARE RAl.IZED AS A FlUl BUTTERFLY LIXP Situ TIE
I1lI.TlPLIES ARE TRIVI~. TIE I1lI.TlPLIER IS ONLY USED FOO A LOAD IN
PARALLEL WITH AN ADDF 00 SUBF
IIIN
"l
i::l
(J
I:l
IIIORY SIZE :
I:l..
PROO
DATA
;:::
S.
...,
'"
I:l
;:::
~
<::i
~
~
s.
'"
~
w
ac
COEFFICIENT
R{WNIO)) = COSI2.PltO/32)
I(WNIO)) = SINI2tPltO/32)
R{ljNI4)) = COSI2tPl.4/32)
I(WNI4)) = SINI2tPl'4/321
R(Wlmll
I(WNI3))
R{1IH17Il
I(WNl7Il
= COS(2'P!t3/32)
= SIN(2tPl'3/321
= COSI2'PIt7/321
= SINI2tPl'7/32)
= 0.831
= 0.550
= 0.195
= 0.981
~L
231 WOODS
512 WOODS
fHftIHHfffHffHfHUHffHfff+ffH4HfHIIHUffHfftfffHHf.fHffHfo+H.
4
8
8.25
AVERAGE CYCLES/BUTTERFLY
7.475
TOT~ BUTTERFLYCYCLES
= 38272
INITIALIIATIOO OVERIAD
= 2185 = 5.4 Z DF TOT~ TIlE
TOT~ NlIIIER IF INSTROCTIOO CYCLES = 40457
TOTAL TIlE FOR A 1024 POINT m
2.42 .5 IINCLUDIti BIT
REVERSIL)
fHHffH+ ...... H .. HfoHHHfHIt. .HHHffHttH.HHHHHHtHH.... fHfHHff
c::I.o
=1
=0
= 0.707
= 0.707
~TERED.
=
....
c
tfHtHHHHHfHffHnHHHfHHtHfHHfHtfHHtfHffHffHHHffHlfHff
:l
 j*SINC2tPItn/N)
'i
~
24.07.89
E;
= COS(2*PItn/N)
\ I
I \
/
I
TR
= BR
COS + SI
* SIN
TI = BR
SIN  BI
AR'= AR
TR
AI'= AI  TI
SR'= AR  TR
Bl'= Al + TI
\
\ +
cos
'H+UHffHHfflU.HfHfHHflltfHHHHHffHIHffHtHIIH'IIHHHHHHt
'"
r;
....c::I.o
~
~
::3
~I
>
~
::..
;::
FFT
H
IfW.B
FILL PIPW
Ei
.global
.global
,glob.1
.global
,glob.1
.global
.global
"SIN!:
;::
,bss
ltf',2048
.bS5
OOTP,2048
Sl
_"l
.text
"*
;:!
~
;::
~.
i::)
NVIERT
NACHTL
ADDF
SUBF
I"'UT I'ECToo LENGTH = IN IW'ENDS
ON HI
OOlI'UT I'ECTIll LEIf3TH = IN 1!EP9IDS
ON HI
"
"
FFTSIZ
"
,!IIord
.word
.word
FG4M2
FG4"3
FGBM2
..
,word
NVIERT2
NVIERT 3
NACHT[L2
I:l..
FG2
.word
~B
"
(j
_"l
;::
a
..,
S~
~
$:I
;::
~
~
FG2I13
LooFFT
SINTAB
SINT"I
SINTP2
INPUT
INPUTP2
OUTPUT
wm
'"
.word
ttiAUl3
.lIIord
"
,!IIord
.word
,word
,word
.liIord
.word
.\IIord
"
SINE
SINEl
SINE'2
Iii'
IhP+2
WTP
WTP.l
S~
APTB
MPYF
"
"
..
..
"
c:>
0c:>
\0
UI
FFT:
W'
LDI
LDI
LDI
ADDI
ADDI
ADDI
LOI
LOI
LDI
LDI
LSH
LDI
SUBI
FFTSIZ
1FG2,IRO
!SINTAB,AIl7
@IIf'UT,1IRO
lRO,AIlO,AIlI
IRO,AIlI,AR2
lRO,AR2,AR3
AIlO,AIl4
ARl,AR5
AR3,AIlb
2,IRI
1,IRO
lRO,RC
2,RC
SUBF
ADLl'
STF
SUBF
STF
ADDF
1AR2, <MO, R4
1AR2, .ARO++ , RS
<Ml,<M3,R6
tAR1++, *AR3++,R7
Rb,R4,RO
fAR3tt, tAR7, Rl
Rb,R4,R3
RI,'IIRI,RO
RO, fAR4++
Rt, fARt ++, Rt
RJ,IARS++
Rl,R5,R2
ttAR2,tM7,Rl
Rl,R5,R3
Rl,'IIRO,R2
R2,<M2++IlRII
Rt, IMO++, Rb
R4=AIl'CR
RS=AIlCR
Rb=M:+BR
R7 = III  IIR
AIl'=RO=R4.R6
RI = DI , IIR' = R3 = R4  R6
; RO=BI'Dl, All' =RO
; RI = BI  DI , IIR' = R3
; CR' = R2 = RS Rl
; RI = CI , 00' = R3 = R5  RI
; R2 = Al
CI , CR' = R2
; R6 = AI  CI , Ill' = R3
R3,IARb++
~~,R2,R4
; AI' = R4 = Rl RO
AIlO'AIl'AI
AIlI , IIR' BI
AR2 : CR .. CI + CR~ + CI I
AIl3 : III DI
AIl4 : All' AI'
AIlS : IIR' BI'
IIRb : DR' Dl'
AIl7 , FIRST TWIDDLE FACToo = I
;::
ADDF
SUBF
ADIF
I1PYF
SUBF
ADLl'
STF
SUBF
STF
ADLl'
II'YF
"
"
"
..
SUBF
MPYF
ADDF
ADLl'
STF
SUBF
STF
SUBF
ADLl'
STF
SUBF
STF
ADLl'
MPYF
SUBF
ADDF
STF
SUBF
STF
ADDF
.
..
MPYF
SUBF
ADLl'
STF
SUBF
BLKI
fAFU. tAR7 ,RO
RO,Rl,Rl
IARlt+,fM7,Rl
R7,R6,R3
RO, 'IIRO,R4
R4,tAR4++
; RO = CR , IBI' = R2 = Rl  ROI
; RI = BR , ICI' = R3 = R6 R71
; R4 = All CR , IAI' = R41
RO, IARO++, R5
R2,IARS++
R7,R6,R7
RI,'AIl3,Rb
, IDI' = R7 = Rb  R71
, R6 = III IIR , IDI' = R7l
R7, IMb++
Rt, fAR3++, R7
R3, tAR2++
,R7=IllIIR,ICI'=R31
Rb,R4,RO
tAR3++, <M7 ,RI
R6,R4,R3
Rl,+AIll,RO
RO, <MI++
RI, fARl++,Rl
R3, fAR5t+
Rl,RS,R2
; All' = RO = RI + R6
, RI = DI , IIR' = R3 = RI  Rb
; RO = BI DI , All'
= RO
, Rl = BI  Dl , IIR' = R3
; CR/ :; R2 :: R5 + RI
, RI = CI , 00' = R3 = R5  Rl
; Rl = AI CI , CR' = Rl
,R6=AICI, Ill' =R3
I:
STF
IlJCI
ADIF
Rl,W6++
RO,R2,R4
FIRST IlITTERflYTYPE:
a.EAR PIPEl.INE
SUIIF
STF
STF
::
~
::s
SUIIF
SIF
::
STF
LDI
LDI
SUBI
LDI
is
SllfE
t:l
(J
."'l
$:l..
c:>
.,So
'I>
..
!RPPE
M,_
112,_
R7,R6,R7
R7,<AR6
Rl,<AR2
~
c::s
LDI
LDI
LDI
LDI
ADDI
LDI
LSH
LSH
LSH
LSH
tsINTAB,AR7
O,AR4
111I'UT,ARO
ARO,AR2
IRO,ARO,ARJ
ARJ,ARI
I,AR6
2,AR5
1,AR5
I,IRO
LSH
ADDI
I,IRI
I,IRI
LDF
LDF
WI++,R6
<AR7,R7
tv
LDF
..
.
..
..
If'YF
ADIF
If'YF
If'YF
If'YF
SUIIF
ADIF
STF
::
1FG2,IRI
IRO,AR5
1,AR5
1,AR6
FILL PIPEl.INE
'"
So
'I>
TR = BR
TI = BR
AR'= AR
AI'= AI
BR'= AR
BI'= AI
"~
5"
::s_
RO,II2,II2
R7,R6,Rl
f++AR7,R6
WI,R6,Rl
++W4,RO,Rl
Wl,R7,RO
Wl++,W7,RO
RO,RI,R3
Wl++,R7,RI
Rl,<ARO,II2
_,Rl,RS
112,_
AR5,RC
LDI
, AI' = R4 = R2 + RO
::
::
,
,
,
,
,
,
,
,
,
,
TR
II
II
..
!FlYI
RPTB
!FlY!
If'YF
_1,R6,R5
R5,iM2++
RI,RO,R2
WI,R7,RO
R2,'MO,R3
STF
SUIIF
If'YF
ADIF
SUBF
SIF
ADIF
If'YF
SUIIF
If'YF
STF
STF
Rl,<AR3++
RO,R5,Rl
WI++,R6,RO
R3,'ARO,R2
WI++,R7,RI
R4,<AR2++
WO++,Rl,R5
112,_
SIN
tAR'
R5J
, 1112 = TI = RO  RII
; RO =lit f COS , (R3:1 AI + TI)
, IR4 = AI  TI , BI' = Rl)
;R3=TR=RO+R5
,RO=BRISIN,R2=ARTR
, RI = BI I COS , IAI' R4)
,R5=AR+TR,BR'=R2
::
.
.
II
II
II
STF
SUBF
SIF
HOP
If'YF
SIF
If'YF
If'YF
SUBF
If'YF
SUIIF
SIF
LDI
RI,RO,R2
II2,<ARO,Rl
R5,<AR2++
112, IARO++IIRII ,R4
Rl, <AR3++IIRII
WI++IIRII
WI,R7,RI
R4, <AR2++IIRII
WI,R6,RO
tARl++,'M7++,RO
RO,Rl,Rl
Wl++,R6,Rl
Rl,<ARO,II2
_,Rl,R5
R2,IAR3++
AR5,RC
SECOND IIJTTERFLYTYPE:
TR=BllCOSBRISIN
TI=BIISIN+BRICOS
AIl'= AR +TR
,RS=AR+TR,BR'=R2
R2,fARO++,R4
; R5 = BI
AI'= AI  TI
BR': AR  TR
,R2=TI=RORI
,Rl=AI+TI, AR' =R5
, R4 AI  TI , BI' = Rl
, ADalESS lPDATE
, RI BI I COS , AI' R4
,RO=BRISIN
, Rl = TR = RI  RO , RO = BR I COS
,Rl=BItSIN,R2=ARTR
,R5=AR+TR,BR'=R2
SlIIIF
~
8['= AI I TJ
;::
RPT9
IIfLY2
If'YF
STF
AJIlF
If'YF
HARI,R7,AS
AS,lAR2ff
Rl,RO,R2
fARI,R6,RO
R2, tARO,R3
R2,fMO++,R4
R3,tAR3ff
RO,AS,R3
fAR1+l, R7, RO
R3,.MO,R2
fARl ff ,R6,Rl
R4,lAR2ff
fAROff ,R3,AS
R2,fAR3ff
ADlf
;;;
::
!il
...S.
Q
I:
"
AIlIlF
SlIIIF
"
STF
;::
~
tl
<"'l
,:l
SlIIIF
If'YF
SlIIIF
"
If'YF
STF
ADlf
STF
II
IIfLY2
II
I:>
CLEAR PIPELIlE
ADlf
AJIlF
STF
O'f[
~
~
....
BlED
~
c::;
""
Q
;::
So
~
~
N
<::>
Q
<::>
, AS = 9[
: 1R2 = TJ = RO I Rll
, RO = BR f SIN, (R3 = AI I TIl
;:
,TR=R3=ASRO
,RO=BRtCOS,R2=ARTR
"
, Rl = 91
"
SUIIf
"
STF
"
STF
LIf'
NOP
END
, 91' = RI A[  91
STF
STF
STF
STF
STF
STF
STF
STF
AIlIlF
SUIIf
Rl,RO,R2
R2, tARO,R3
R5,IAR2++
AR6,AR4
GfilPPE
R2,tIIRO++URll,R4
R3, tAR3ffURll
fHAR7,R7
R4, IM2++( IRll
fARlIIURIJ
;R2=TJ=ROIRI
,R3=Alln
; AR' = RS
SUBF
, AR' = AS = AR I 91
,AI'=R4=A[BR
, BI' = Rb = AI I BR
;BR'=R7=AR91
4, BUTTERfLY; W"ft/4
, 00 Rl.I.~INl 3 INSTROCTJONS
, R4 = AI  n , 8[' = R3
;R7=COS
; A[' = R4
; BRAIli IRE
,AR'=Rb=ARIBR
: BR' = R7 = AR  BR
, AI' = R4 = AI I 91
tARO, fARI,R4
fARIIII IROI ,tIIRO++ I IROI ,AS , 91' = AS = A[  91
, (AR' = R21
R2,tAR2++
, IBR' = R31
R3,fAR3ff
, (A!' = ROI
RO,tM2++
, (91' = Rll
Rl,fAR3ff
R6, fM2++
,AR'=Rb
, BR' = R7
R7, fAR3H
R4,IAR2IHIROI
,AI'=R4
AS, tAR3ff( [ROI
, 9[' = AS
tARO, fARI,R6
3. IlUTTERfLY: W"ft/4
,AS=ARITR, BR'=R2
ADlf
"
LIf'
LIf'
SUIIf
"
STF
STF
STF
,
HARl,tIlARO,R3
tAR7,RI
;
,
fARlff,RO
fARlffUROI, tAROtl, R2
R5,fAR2++
,
R7,fAR3ff
;
R6,*M3++
;
AR' = R3 = AR I B[
RI = 0 IFOR IIftR UXI'I
RO = BR IFOR lItER LOOPI
;BR'=R2=AR91
IAR' = RSI
IBR' = Rll
191' = RbI
O'fl
aNZ
4,IRO
STLfE
lOl
LD[
lOl
lO[
!INPUT,ARO
ARO,AR2
[RIl,ARO,ARI
ARI,AR3
IS INTP2, ARl
S,IRIl
;
,
,
;
;
,
lPPER [NPUT
lPPER OOTl'\JT
L~ [NPUT
L~ OOTPUT
PO[NTER TO TW[Dll.E FACTOR
DISTANCE IIETIEEN TIIIJ OROOPS
!FGIlIt2,RC
1. BUTTERFLY'
ADlf
SUIIf
ADlf
LDF
fAR7ff,Rl
R4,IAR2ff
fAR7ff,Rb
R2,fAR3ff
HARI,R6,AS
R3,fAR2ff
RI,RO,R2
fARI,R7,RO
R2,tARO,R3
R2,.ARQ++<IROI,R4
R3, tAR3ffUROI
RO,AS,RJ
fARl++,R6,RO
R3,tARO,R2
fARlff,R7,RI
R4,iAR2++UROI
"
STF
If'YF
STF
ADlf
If'YF
"
0('()
; AR' = R2 = AR + BR
,BR'=R3=ARBR
,AI' = RO = AI + 91
IIf2END
STF
"
RPT9
"
"
FILL PIPELINE
\0
...I
SUIIf
ADlf
SUIIf
ADIf'
"
fARI++,tIIRO++,RI
2, IlUTTERfLY: lO"O
"
"
LDF
AIlIlF
SUIIf
STF
ADDF
If'YF
SlIIIF
If'YF
STF
; (R2 = TJ = RO I Rll
; RO = BR t COS , (R3 = A[ I TIl
; IR4 = A[  Tl , B[' = R31
:R3=TR=RO+R5
, RO = BR f SIN, R2 = AR  TR
, Rl = BI
10
00
AIlIF
:1
STF
II'I'F
STF
SlIIF
II'I'F
ADIF
SlIIF
STF
II
;l..
;:
~
'I::i
W
~
is
II'I'F
::
SlIIF
II'I'F
STF
n
r:
STF
II'I'F
g'
::
I:
II'I'F
ADIF
If
STF
STF
SlIIF
SUBF
SUBF
II'I'F
tl
::
!""'l
:f
STF
:I
AIlIF
STF
(')
SUBF
II'I'F
I:l
S.
C
S
...<1>
~
s::\
;:
II'YF
::
STF
AIlIF
II'I'F
"
SUBF
~
<:i
BF2ElCI
"
::
S<1>
Q
C
SlIIF
II'I'F
ADIF
,R5ollR+TR,IIR'=R2
t+ARI,Rt.,R5
R5,tIIR2++
RI,RO,R2
tllRl,R7,RO
R2,tMO,R3
R2, tARO++,R4
R3,_
RO,R5,R3
tllRl++,Rt.,RO
R3,tMO,R2
tllRl++t1ROI,R7,RL
R4,tAR2++
tARO++,R3,R3
R2,_
t+ARI'R7,R5
R3,tAR2++
RI,RO,R2
tllRl,R6,RO
R2,tMO,R3
R2, tIIRO++lIROI ,R4
R3,_tlROI
RO,R5,R3
tllRl++,R7,RO
R3,IARO,R2
tllRl++,R6,RI
R4, tAR2++tlROI
tIIRO++ , R3, R5
R2,_
t+ARI,R7,R5
R5,tAR2++
RI,RO,R2 ,
tllRl,R6,RO
R2,tMO,R3
R2, tARO++ ,R4
R3,tAR3++
RO,R5,R3
tllRl++,R7,RO
R3,tMO,R2
tllRl++tlROI,R6,RI
tMO++,R3,R3
"
::
STF
STF
AOOF
AIlIF
STF
SlIIF
"
LDI
LDI
LDI
LDI
LDI
LDI
LDI
LDI
R3,'M3
R4,t11R2
, AI' = R4
IINPUT,ARO
lOOTPUT, AR2
IINPUTP2,1IR1
1OOlP1,AR3
ISINTP2;1iR7
IFFTSIZ,IRO
3,IRI
, IJ'PER IIf'UT
;
,
;
,
,
IFtl4II2,fIC
ADIF
SUBF
SUBF
AOOF
UBF
"
"
,R5=Bllros,IIIR'=R51
LDF
LDF
ADIF
STF
STF
STF
IIR' = R3 = IIR  BI
RI = 0 IFOR IMlER LOOPI
RO = IIR tFOR IMlER LOOP)
,1IR'=R2=IIR+BI
tllR' = 1161
IAI' = R51
tllR' R71
PTB
BFLEND
17 CYCLES IF FFT SIZE (1024 IU: TO TI USE OF INTERNAL IEIDlY FOR BIT
1If\.RSAl, 21 CYCLES IF m SIZE = 1024 IU: TO TI USE OF EXTEANAI. IEItORY
FOR BIT IIf\.RSAl
LDF
"
STF
"
STF
KPYF
STF
AOOF
LDF
, IIR' = R2 , AI' = R4
II'I'F
,R2=T1=RO+RI
,R3=AI+T1,IIR'=R3
SlIIF
ADOF
STF
AOOF
I R4 = AI  TI , BI' = R3
,
t+ARI,tMO,R3
,
tAR7,RI
tllRl++,RO
,
tARl ++tlRII, IAROt+, R2
R6,I~++(IRO)b
,
,
R5,tIIR3++tlROlb
,
R7,1AR2++tlROlb
3. TO K. BUTTERFLY:
,tR2=T1=RO+RII
, RO = IIR I SIN, tR3 = AI + TIl
, RI = BI I SIN, R3 = IIR + TR
2, WfTERFLY: 0"11/4
"
R2,tAR3++
R4,tAR2++
RI,RO,R2
R2,tMO,R3
R3,tIIR2++
R2,tMO,R4
STF
STF
CLEAR PlPELIIE
STF
SlIIF
II'I'F
tMO++,R3,R5
R2,_
"
"
II'I'F
tllR7++,R7
R4, tllR3++IIROIB
tllR7++,R6
R2, tAR2++IIROIB
ttIlRI,R6,R5
R3, tllR2++IIROIB
RI,RO,R2
tllRI,R7,RO
R2,tMO,R3
R2, iARO++tIRII,R4
R3, tllR3++IIROIB
RO,R5,R3
tARl++,A6,AO
, R7 = ros , tlBI'
=R4))
;::
~
"t:l
;:'l
Ei
::;<.
<Q.,
~
_'l
tl
(J
~
::,
"
"
"
"
"
;::
~
<:>
"
"
'"
<:>
;::
~
~
<::::>
0<::::>
R2,IAR2++(lROIB
R4, tM3tt<1ROIB
RI,RO,R2
R2,IMO,R3
R3,IAIl2
R2,IMO,R4
R3, tM3++<1ROIB
R4,IAIl3
I'I'VF
ADIF
rl'VF
WBF
STF
ADlIF
WBF
STF
ADIF
STF
STF
END IF FFT
END:
S'I>
tv
STF
"
""t
~
::,
;::
.tMI,R7,RS
R3, tM2++<1ROIB
RI,RO,R2 "tMI,R.,RO
R2,IMO,R3
R2, IARO++<1Rll ,R4
R3, tM3++<1ROIB
RO,RS,FtJ
*ARl++,R7,RO
R3,IMO,R2
tMl++IIRII,Rb,RI
R3, tMO++, R3
; RI = BI I
cos ,
IBI' = R41
; IR2 = 11 = RO  RII
, RO = III I SIN, IAI' = R3 = Al  TIl
; IBI' = R4 = Al Tl , AI' = R31
,R3=TR=RORS
, RO = III I COS , All' = R2 = All + TR
, Rl = BI SIN, BIt' = R3 = All  TR
CLEAR PIPELINE
I:>...
C)
S'I>
rf>YF
STF
SIIBF
rl'VF
SIIBF
SIIBF
BFLEND
"
R3,IMO,R2
tMI++IlRlI,R7,RI
R4, tM3ttilROIB
R3, fARO++ ,R3
R2, tM2++<1ROIB
ADIF
STF
"
"
ADIF
rl'YF
STF
WBF
STF
NlP
NlP
N(f>
NlP
SELF
III
.end
SELF
;R2=Tl=RO+Rl
, AI' = R3 = Al  Tl , Ill' = R3
, Bl' = R4 = Al + 11 , AI' = R3
, BI' = R4
:g>
HH. .HHH*nHfHfUHHHUHfHfffHffHHHfHHHfHHfftfffHffHIftH
IIPPfNDlX A5
. fl,.,
TITLE: llIlD1KBR.ASII
.n"t
.fl .. t
.float
FOCT~
(F
7. 11432195745216fOOI
7.0275474445722'5<)1
0.1350041>4915452.003
9.99981175282601.)1
=
....
~
~
1024 CIl'f'lEX
POINTS.
WRITTEN BY :
'tj
~
is
LENlTH
.",l
0
.set
.set
.set
.set
.set
,:l
~
Cl
So
'"
~
~
Q
c::.
sine
0
nhalb
nvitrt
So
'"....
.global
.global
.global
.global
.globa.l
.global
ffHHfflfHfffHfHHHffHffHfllHfHHfHHH*HHHfH"HHffHHHfUHH
.Q.,
tl
14.07.89
ERI.A'aIHIJERNI
~~
~~
~""""
=
~
"0"""
'
(J
(F
.~
so.m
nachtel
II
~>
nhalb
nviert
naeMe I
1024
512
mtOOTH
128
0/2
0/4
o/B
10
NJ1IIER
2Sb
.set
.set
..
.set
fnhalb
fnviert
toacbtel
.set
.set
16
B
4
5
....,C"
.float
.float
.float
.float
.float
.floit
.float
.float
"""=~
~~
....
(1~
9~
'a
~.
~ ~
~
c;"
S1n!
. float
= Idlol
.dita
00
~~
I
=~
Q ~
(F STAGES
'0
o~
~ ~
I. OOOOOOOOOOOOOOet<lOO
O. OOOOOOOOOOOOOOet<lOO
7.07IOb7B11B6548t)1
7.07lOb7BIIB6548tD01
9.i3879';32511287.)1
3. B26B34323650911e)1
3. B26834323650911e)1
9.23879';32511287.)1
9.B07B52B040323Ot)1
s. ~
=
~
~~
""S
til
8'
""S
~
101
HfHHffHfHlffUHUIHHHHHffffffHHHHHfHfffffffHftHfHHfHHI
GCNERIC _
TIIS320C30.
THE TWIDCl..E FACTalS ARE SlPPLIED IN A TAILE PUT IN A .DATA SECTl(w, THIS
DATA IS IN:lUlD IN A SEPARaTE FILE TO PRESERVE TIE OENERIC NAME IF THE
PROORA/t. FOR THE SAlE PlIlPOSE, THE SIZE IF TIE FFT NAND LOO4IN) ARE
!FINED IN A .Il.OIIL DlRECTlE AND SPECIFIED MlNO LIN<INO,
~
"G
:=
is"
go
;:s
<Q.,
.'""l
IIf>
g
So
~
f:2
N
C
ac
FFl
N
"
SINE
TEItP
STORE
.USEeT
'IN',1024
.WORD
FFT
100
,WORD
.WORD
$+2
.IDlD
WORD
WORD
WORD
"SINE
.ass
.ass
.ass
ass
.ass
.ass
FFTSIZ, I
LOGFFT,I
SINTAII,I
INPUT,I
STAGE,I
RPTCNT, I
IEINDX,I
FFTSIZ
.""""==
!STeIlE,ARI
tAROH,RO
STl
RO. fARt++
IARO++.RO
RO, tAR1++
(WE
I'EIOOY TO TIE
~~
('D
('D
~;
LOI
'AROH,RO
STI
RO,fAR1++
LOI
'ARO,RO
STl
RO,'AIlI
LOP
LOI
LOI
LOI
LOI
STl
FFTSI2
IFFTSIZ.RO
IfFTSIZ,IRO
IFFTSIZ,IRI
O,AR7
AR7,tSTAGE
LSH
LSH
LOI
STl
I,IRO
2,IRI
I,AR7
AIl7, IRPTCNT
LSH
STl
ADDI
STl
SUBI
LSH
2,RO
AR7,@IEINDX
2,RO
, INITIAliZE IE INlX
RO,M
, JT=RO/2'2
2,RO
I,RO
, RO=N2
== ::!.
oof')
, WI1ANIl TO LOAD DATA PAGE POINTER
IIf>
FFl SIZE
L004IFFTSIZl
SINE/COSINE TABLE BASE
AREA WITH INPUT DATA TO PROCESS
FFT STAGE
REPEAT COJlTER
IE INlX Fell SINE/COSINE
ADDF
ADDF
S"
~
~
('D
9
!INPUT,ARO
RO,ARO,ARI
RO,ARI,AR2
RO,AR2,AR3
!RPTCNT,RC
I,RC
(WE
FIST LOOP
RPTB
ADIF
=9
"C
OUTER LOOP
LDI
ADDI
ADDI
ADDI
LOI
SUBI
~~
==
~(JQ
~.,
LOOP:
, BEGllfoIlNO IF TEIf' STORAGE AREA
tTEIIP,ARO
.SPACE
.BSS
TEll'
~g
SEcoo\.OOP mNT
JT mNTER IN _ , P. 117
IAI INlX IN _ , P. 117
NlI'IIIER
INITIAlIZE
...
~
.GLOIIL
.il.OIIL
.GLOIIL
.1l.01IL
TEXT
~
~
~
c::;
fHHUHHfIHHH:lHHfUIHHHHUffHHHftH*UHHHfftftHfH*HfHfH
LOP
LOI
LOI
LOI
STl
LDI
LPCNT,I
JT,I
IAI,I
FfT:
THE PRiJGRAII IS TAKEN FRIl1 THE IIlIlRUS AND PARKS 1100(, P. 117. TIE ctH'I.EX
DATA RESIl IN INTERNIi..~, AND THE IXfI'UTATl(W IS IIIH: INPLJ:E,
.ass
.ass
ass
APPENDIX BI
IILKI
*+ARO, f+AA2.Rl
t+AR3, 'tARI, R3
Rl,RI,R6
=
Ii'
~
J:..
't3
;:
"
"";::is'
"
LDF
LDF
ADDF
ADIF
STF
ADIF
SUBF
;::
<Q.,
STF
SUBF
SUBF
SUBF
STF
SUBF
ADIF
STF
STF
5tH
ADIF
STF
STF
"
."'l
"
i::1
(J
."'l
!:l
"
;::
!:l.
0
S.
""...
I1LKI
"
LDI
ADDI
CI1PI
BID
STI
~
C
tv
C
a
C
IJ,)
RPTB
1,AR7
AR7,fIAI
2,AR7
AR7,@LPCNT
LDI
ADDI
LDI
LDI
ADDI
ADDI
STI
ADDI
2,ARb
I!LPCHT ,ARb
@LPCNT,ARO
@IA1,AR7
fIElHOX,AR7
@1tf'UT,ARO
AR7,@IAI
RO,ARO,ARI
AR6, I!LPCHT
, IA2=IAI+IAH
, 1A3=1A2+IAH
"
ItPYF
ADDF
ItPYF
STF
If'YF
SUBF
IIPVF
"
STF
R6,++M3
; Y(l3)=R4*C03R2+S13
If'IF
ADDF
STF
R4, 'ARb,R7
R7,R6
Rb, tAR3++IIRO)
, R7oR4tSI3
, R6=R2tC03+R4tSI3
, XI!3)oR2tC03+R4'SI3
'"
"
ADDF
ADDF
SUBF
SUBF
ADIF
ADDF
N'YF
STF
ADIF
SUBF
SUBF
! ~!
! '!
"
"
END
LOI
STI
LDI
STI
I1LK2
HAR2, ttARO,RJ
nAR3 ....Ml,R5
RS,R3,R6
HAR2,*tMO,R4
R5,R3
IAR2, fMO,R1
tAR3,tARI,R5
R3, HAR5I1RIl, R6
Rb, .tARO
RS,RI,R7
.AR2 , tARO ,R2
RS,R1
RI,tAR5,R7
R7, tARo++ I lRO)
R),R.
HAR3, f+ARl, RS
RI, ..AR5(JRIl ,R7
Rb, .tARl
R3,*M5.R6
R7,R6
RS,R2,RI
R5,R2
tAR3, tARl ,115
RS,R4,R3
RS,R4
R3, HAR41 IRI), R6
Rb,tAR1++(IRO)
RI,tAR4,R7
R7,Rb
RI, ttAR4IIRIl ,Rb
Rb,ttAR2
R3, tAR4,R7
R7,Rb
R4, .tARbllRll ,Rb
Rb,tM2++(IRO)
R2, tARb,R7
R7,Rb
R2,.tARbIlRll,R6
ADIF
!STAGE,AR)
1,AR7
@LOGFFT,AR7
AR7,@5TAGE
@IAI,AR7
@IAI,AR4
@5INTAB,AR4
AR4,AR7,AR5
I,AIr.i
AR7,AR5,AR6
I,ARb
seCOND LOOP
, Ci.IlRENT m STAGE
If'YF
STF
SUBF
SUBF
If'YF
STF
If'YF
ADIF
ADDF
SUBF
SUBF
~JBF
"
INLCP:
ADIF
If'YF
STF
If'VF
SUBF
sn
; X<I2)::ft?R3
; X(J3)~+R3
LDI
LDI
ADDI
ADDI
SUBI
ADD!
SUBI
S.
""
~
R4=YII)Y(J2)
YlIloRl<fi3
RloRlR3
R5=XII2)
R7=mll
R3=XiIIl+X(13)
, RI=IIIl+XII2)
, VllIloRlRJ
,_I<fi3
, R2=XII lX ([2)
, XII)oRl<fi3
, RloRlR3
, R6=XI11 lX1!3)
, fl3=VIIIlV<l3)
, XllI)oRlR3
, R5=R4R6
, R4=R4+R6
, m2)oR4R6
, VII3)oR4+R6
, R5=R2R3 '"
, R20R2<fi3 '"
:::i
;::
'"
g
*+M2,f+ARO,R4
R6, ttARO
RJ,RI
tAR2,R5
t+ARI,R7
tAR3, tAR I ,R3
R5,tARO,RI
RI, ttARl
R3,RI,R6
115, tARO,R2
R6, tARO++<lRO)
RJ,RI
1M3, fM1,Rb
R7, ttAR3, RJ
RI, tARl++<lRO)
R6,R4,RS
R6,R4
RS, t+AR2
R4, t+AR3
RJ,R2,RS
R3,R2
115, tAR2++1 lRO)
R2,tAR3++IIRO)
SUBF
STF
SUBF
;::
N'VF
"
, IA1=IAI+IE
: IXIIl,VII)) POINTBl
, IXIIIl,VIII)) POINTBl
ADD!
RO,ARl,AR2
; U(I2),Y<I2)) POINTER
ADDI
LDI
SUBI
Clf'1
BlD
RO,AR2,ARJ
!RPTCNT,RC
I,Re
@JT,ARb
SPCL
, 1I(I3),VI13)) POINTER
, RC SIIJlU) !If M LESS THAN IESlREO
IF LPCNT=JT, GO TO
Sl'ECIAl OOTTERFLV
BLK2
STF
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
R3=V<ll+VII2)
R5=VI1Il+V(13)
R6=R3+RS
R4=VII)YI12)
R3=R3RS
RI=IIIl+XII2)
R5=XIIIl+XII3)
R6=R3tC02
YII)oR3+RS
R7oR1+RS
R2=XIIlXII2)
RloRlRS
R7oRltSI2
lI11oRl+RS
R6=R3tC02RI'SI2
R5=YIII)YI13)
R7oRltC02
VI1I)oR3tC02flltSI2
R6=R3'SI2
R6=R1tC02+R3tSI2
R10R2+RS
, R20R2RS
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
i
aPl
II
8ft
ILf'CNT,RO
III.OP
COO
STI
lOl
LSH
ADDI
STI
SUBl
IRI,M4
I,M4
tSINTAB,M4
RP18
IILJ(3
AIlIF
SIIF
AIlIF
SIIF
II'YF
AIlIF
II'YF
S1f
SIIF
II'YF
S1f
AIIF
II'YF
S1f
STF
IM2,_,RI
tM2,_,R2
<+IIR2,<+MO;Rl
<+IIR2,_,R4
tM3,tMI,RS
RI,RS,Rt.
RS,RI
<+M3,_I,RS
RS,Rl,R7
RS,Rl
Rl,_
RI,IARO++IlROI
tMl,tMI,RI
<+M3,_I,Rl
R6,ftM:l
R7,tMl++IlROI
Rl,R2,RS
R2,Rl,R2
RI,R4,Rl
RI,R4
RS,Rl,RI
tM4,RI
RS,Rl
_,Rl
RI .. <+11R2
R4,R2,RI
_,RI
Rl,tM2++IIROI
R4,R2
_,R2
RI,<+M3
R2,tM3++IIROI
aPl
l1'li
ILf'CNT,RO
III.OP
LDI
LDI
LSH
1RPTCNT,M7
IIEINDX,AR6
2,AR7
STI
LSH
AR7,WTOO
2,AR6
SI'CI.
i..SH
; POINT TO SIN(451
; CllEATE COSINE INIIEl M4>C021
8ft
AR6,tlEltill
RO,IRO
3,RO
2,RO
RO,IJT
2,RO
I,RO
LOOP
; NI'IQ
; JT'lQI2+2
; N2=H2/4
;:,;
't:i
:=
SlIIF
'"S;:,;
AIXF
SIIF
AIXF
g"
.a.
~
t::I
stF
:l
::
AIlIF
SIIF
SlIF
AIlIF
SlIF
S
'".,
..
..
IILJ(3
II
S
'"
~
~
Q
C
S1f
SIIF
SIIF
S1f
STF
coo
RI=XIII+1tI21
R211l1X 1121
R3=YIII+YII2I
R4aYIIIY1121
RS=XCIII+Xml
R6oA5RI
RI=RI+RS
RSoYIIII+Y1131
R7=RH5
R3=R3+RS
YIIIXIII=RI+RS
RI.11II1X 1131
R3=YCIIIyml
YCIIloR5ftl
; XCIII>113R5
; R50R2+R3
; R2=R2+R3 !!!
; 1l3oR4R1
I R4=R4tRl
; RI>113R5
; RloRltC021
; R3=R3+RS
; R3=R3tC02I
; YII21(RlRSltC02l
;Rl~'11
; RI=RltC021
; XII2I(R3+RSltC02l
; R2R2+R4 !!!
; R2=R2tC021 ! ! !
; Y(I3)(R4R2ltC02l !!'
; X(3)(R4+R2)tC021 III
END:
I!
SITRV
;:
SELF
LDI
SUBI
lOl
LDI
LDI
LOP
lOr
tFFTSlZ,Re
I,Re
ImSlZ,lRO
2,IAI
tlNPUT,ARO
STIR
tSTlR,AftI
RP18
BITRV
_(II,RO
tARO++IlROIB,RI
RO,_IUI
RI, IMI ++IIRII
L!F
LDF
STF
STF
8ft
END
SELF
:Re=N
; Re SIWJI BE ONE LESS THAN IESIRED
; IRO=SIZE IF FFT=N
;::
~
'i:l
~
~
is
=:t.
,:l
t::I
(J
,:l
~
;::
I:l...
.,S'"
tw1E: ffU 
lESCRIPTl(Jj:
GENERIC FlKTI(Jj TO 00 A RADIH m COI'UTATl(Jj (Jj TNE TIIS320C30.
TNE DATA ARRAY IS2tNol(JjG, WITH REAL AN) llIAGINMY VALIS AlTERNATlMl. TNE _
IS BASED (Jj TNE FIIHRAN _
IN TNE lUlRUS
AN) PMKS BOO<, p, 117,
,globoI
= sin(Ot2lpilNI
.floit
.floit
'l'lht.1
.'float
vi,lue(5N/41
vllut2 ;: 5il'l(1'2Ipi/N)
;::
S
'"
~
N
C
Q
C
TI VALlES vII ... l, .11 2, ETC., ARE TNE SINE WA\ VAlIS. FCIl AN
NPOINT m,' THERE ARE N+N/4 VALlES FCIl A All ANII A QlWlTER PERIOD
OF THE SINE WA\. IN THIS WAY, A FLU SINE AND alSlNE PERIOD ARE
AYAILAIU IStftRllI'OSElIl.
STACK STIUTLII lI'IJN TI ClU:
++
fP141
DATA
fP131
ft
fP121
N
: fIETIRj AOCIl
fP1ll
fPIOI
OLDFP
+.
REGISTERS USED: RO, RI, H2, Rl, R4, R5, R6, R7, MO, ARI, ARl, ARl, AR4,
Nr.>, M6, M7, IRO, IR1, RS, RE, RC
AR3
.[Loa.
GUll.
.IFT_4
_SINE
,BSS
.BSS
.BSS
FFTSlZ,1
LOGFFT,I
Itf'UT,1
>
'C
'C
,ENTAY POINT FCIl EIECUTI(Jj
, AOCIlESS OF SINE TAIU
SINTAB
.!IIord
t:=
~::;
PUSHF
PUSHF
PUSH
PUSH
PUSH
PUSH
Rb
LOI
ST!
LOI
ST!
LOI
STl
<FP121,RO
RO,IfFTSlZ
.FPI3I,RO
RO,@lOGFFT
R4
R5
~
~
I
'~
n
0
LOI
LOI
LDI
LOI
STl
IfFTSlZ,RO
IfFTSIl,IRO
IfFTSIl,IRI
O,AR7
AR7,!!STAGE
LSH
LSI!
LOI
STl
I,IRO
2,IRI
I,AR7
AR7,IRPTOO
"e.
~
fFP(41,RO
RO,!Itf'UT
STAGE,!
RPTCNT ,I
IEINDI,!
LPCNT,!
JT,!
IA1,1
=~
..
R7
AR4
AR5
ARb
AR7
.BSS
.BSS
.BSS
.BSS
.BSS
.BSS
ac I~
e 
INITIAlIZE C FlJI:Tl(Jj
FP
SP,FP
_SINE
PUSH
LOI
PUSH
PUSH
..=
~
.sine
dati
.sine
.SET
TEXT
_ffU:
IN CIlDER TO HA\ THE FINAL RESllT IN BITR\RSED CIlDER, TI TWO
ftID!l IIRAIDES OF THE RADIX4 IIUTTEflfLY ARE INTERCHIIHGfD IIIJIIMl
STCIlAGE. NOTE THIS DIFFERENCE WIN COII'ARIMl WITH THE PROGRAII (Jj
p, 117. TI CM'UTATl(Jj IS OONE IfH>LACE, ANI! TNE CIlIGlNAL DATA IS
lESTRDYED. BIT IIE\RSAL IS IIflEIIENTED AT THE END OF THE FlKT!(Jj.
IF THIS IS NOT NECESSARY, THIS PART CAN IE ClXXENTED OOT. THE
SINE/COSlNE'TAIU FCIl TNE TWID!l FACTCIlS IS EIPECTED TO IE StPPLIED
IIIJIII) LUt< TlIE, ANIIIT SHWl) HA\ TI FI1WIIMl FCIlMT:
;::
<::>
SYIO'SIS:
int
I:j
~
c
fP
Ii'PEIIIIl B2
, FFT STAGE I
, REPEAT al.WTER
: IE INtEl FCIl SINE/alSllE
, SECINH.IXP al.WT
, JT COltITER IN _ , P. 117
, IAI INtEl IN PROORAII, P. 117
t:=
~
!.
~
~
rIl
~
LSII
STI
AIIII
STI
SUBI
LSII
2,RO
M7,tlEIlIX
2,RO
RO,M
2,RO
I,RO
INITIALIIE IE INDEX
I JT=RO/2+2
I
RO=II2
IIJTER LOOP
::
1i'
'i;i
;;;l!I
<10
::
is
g"
ADIF
ADIF
ADIF
.~
t1
SUB'
STF
SUB'
:J
::
So
<10
~
0
is
So
<10
tv
Q
C
STF
ADIF
::
....
~
s::;
LDF
LDF
ADIF
ADIF
tINPUT,ARO
RO,ARO,IIRI
RO,IIRI,IIR2
RO,IIR2,IIR3
IRPTCNl,RC
I,RC
I
I
I
I
ARO
IIRI
IIR2
IIR3
POINTS
POINTS
POINTS
POINTS
TO
TO
TO
TO
RC SID.lII lIE
XCII
XliII
X1121
11131
(IE
FIST LOOP
RPTB
..
J:
..
IlJ(I
SUB'
STF
SUB'
SUB'
SUB'
STF
SUB'
AIIIF
STF
STF
SUB'
ADDF
STF
STF
IlJ(I
_ , t+IIR2,Rl
I
I
;
;
;
;
;
;
I
;
;
;
I
I
I
;
;
;
;
;
;
;
;
;
;
Rl=YUI+YII2I
R3=YIIII+Y( 131
_1+R3
R4=YCIIYU21
YIII=RI+R3
Rl=RIR3'
R5=X1I21
R7=YIIlI
R3XIIlItXII3I
RI=XIII+1II21
YIIII=RIR3
_1+R3
R2=XIII!II2I,
XCII=RI+R3
Rl=RlR3
R6=X III I! 1131
R3=YIIlIYU31
XlIlI=RIR3
R5=R4R6
R4=R4+R6
YU21=R4R6
YU31=R4+R6
R5=R2R3 ! ! !
R2=R2+R3 !! !
XU2}=R2R3 !!!
LDI
ADDI
LDI
LDI
ADDI
ADDI
STJ
ADDI
STJ
ADDI
ADDI
LDI
SUBI
Cl'PI
BiD
LDI
LDI
ADDI
ADDI
SUBI
ADDI
SIIII
2,ARl>
IlKNT,IIR6
ILPCNT,ARO
IIAl,M7
IIEIf()X,1IR7
IINPUT,ARO
1IR7,tIAI
RO,ARO,IIRI
ARl>,tLPCNT
RO,lIRl,lIR2
RO,IIR2,IIR3
IRPTCNl,RC
I,RC
M,ARl>
5I'CL
IIAl,M7
tlAl,IIR4
ISINTAB,IIR4
1IR4,1IR7,AR5
1,AR5
1IR7,AR5,1IR6
1,ARl>
I
I
IAl=IAl+IE
CXCII, YIIII POINTER
(XIIII,YUIlI POINTER
I
I
I
I
tsTAGE,M7
1,M7
ILOOFFT,M7
END
1IR7,ISTAGE
'"
STAGE
IlJ(2
ADIF
ftAR2.I+MO,R3
++IIR3,I+llRl,R5
R5,R3,R6
t+IIR2, _ , R4
ADIF
..
..
RPTB
ADDF
ADDF
SUB'
SUBF
ADDF
..
; aReIT
SECOND LOOP
; X(3)=R2+R3 !!!
I,M7
1IR7,tIAI
2,M7
M7,1LPCNT
11t.(J>:
LOOP'
LDI
ADDI
ADDI
ADDI
LDI
SIIII
LDI
STJ
LDI
STJ
IIPYF
STF
ADDF
SUBF
SUBF
IIPYF
STF
SUB'
SU8F
IIPYF
STF
IIPYF
ADDF
R5,R3
+IIR2,_,Rl
+AR3, +ARI,R5
R3, ++AR5I1RIl, R6
R6,t+ARO
R5,Rl,R7
tllR2,+ARO,R2
R5,Rl
RI,'AR5,R7
R7,tllRO++llROI
R7,R6
ttM3, HAR1, R5
Rl,'+AR5I1RIl,R7
R6,t+AR1
R3,'AR5,R6
R7,R6
; R3=YIII+YII21
I R5=YIIlItY(131
I 
; R4=YUIY(121
; R3=R3R5
;
;
;
;
;
;
;
;
;
;
;
;
;
;
Rl=XCII+1II21
R5=XIIII+XII3I
_3+CD2
YCII=R3+R5
R7=Rl+R5
R2=XIII! 1121
Rl=RlRS
R7=Rl+SI2
XIII=RI+R5
R6=R3+CD2Rl'SI2
R5=YIlIlYIl3I
R7=Rl'C02
Y(IlI=R3+CD2RI+sI2
R6=R3'SI2
_ I +CD2+R3'SI2
:::
AD!}"
SUBF
SUBF
~
"i::l
'""
;:;
:::
IS
"
:::to
C
:::
~
"
."'l
tl
"
(j
_"'l
i:l
:::
i:l..
(;)
"
lJL)(2
S....'"
SIJBF
ADIJ"
l'I'YF
STF
l'I'YF
SIJBF
I1PYF
STF
I1PYF
AD[/'
l'I'YF
STF
l'I'YF
SIJBF
I1PYF
STF
l'I'YF
ADDF
STF
Cl'I'I
BP
BR
RS,R2,RI
RS,R2
>AR3, >ARI,RS
RS,R4,R3
RS,R4
R3, ++AR4IIRII ,Rb
Rb,>ARI+t(IROI
RI,>AR4,R7
R7,Rb
RI, ++AR411RIl ,Rb
Rb, .+AR2
R3, >AR4,R7
R7,R6
R4, ++ARbllRl I ,Rb
Rb, 'AR2++IIROI
R2, +ARb,R7
R7,R6
R2, ++ARbIIRIl ,Rb
R6,HAR3
R4, 'ARb,R7
R7,Rb
Rb, IAR3++( IRQ)
I!I.PCNT,RO
Irt..OP
CONT
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
RI=R2+RS
R2=R2R5
RS=XIIIll{!31
R3=R4RS
R4=R4+RS
Rb=R3+C01
XIIIl=RI'C02+R3'SI2
R7=RI'SII
Rb=R3*COIRlfSIl
Rb=RI+CDl
YII21=R3'C01Rl'SII
R7=R3'SIl
Rb=RlfC01f.R3tSII
Rb=R4'CQ3
XI121=RI,COI+R3'SI1
R7=R2'S13
Rb=R4'CQ3R2'SI3
Rb=R2'CQ3
y(!31=R4'CQ3R2'SI3
R7=R4'SI3
Rb=R2fC03+R4+SI3
XI131=R2+C03+R4'SI3
~
C
'"C
:::
S
'"
LOI
LSH
ADD!
"
BLK3
"
CONT
;
;
;
;
;
;
;
;
;
;
;
;
;
;
RI=XI!)+l{!21
R2=XII1X1121
R3=YIlI+YI121
R4=YIIJY1121
RS=XIIIl+l{!31
Rb=RSll1
RI=RI+RS
RS=YIIII+YII31
R7=R3RS
R3=R3+RS
YIlI=R3+R5
XI!I=RI+RS
RI=XIlIIX(j3J
R3=YIIIIYI131
Rb,HARl
Y(I1I=R5Rl
R7,>ARJ++IIR()1
R3,R2,RS
R2,R3,R2
Rl,R4,R3
Rl,R4
; XIIII=R3RS
; RS=R2+R3
CONT
; R2=R2+R3 !!!
BITRV
AODF
AD[/'
SUE<!'
AD!}"
AD[/'
AD[/'
STF
STF
SUBf
SUBF
...I
END:
BLKJ
[(I
>AR2,>ARO,RI
1AR2, IARQ, R2
HARZ, HMO,R3
HAR2, HARO,R4
1M3, *ARt ,RS
RI,R5,Rb
RS,RI
HARJ, ++ARI, R5
RS,R3,R7
RS,R3
R3, HMO
RI, tM{)t+( IROl
.AR3,>AR1,RI
HAR3, ++ARt, R3
SUBF
; POINT TO SINI4S1
; CREATE COSI~ INDEX AR4=C021
AlJ[/'
SUBF
STF
STF
AD!}"
SlJBF
SlJBF
ADDF
: R3=R4ll1
; R4=R4+Rl
RS,R3,RI
.AR4,RI
RS,R3
LDI
LOI
LSH
@APTCNT,AR7
@IEINDX,ARb
2,AR7
STI
LSH
ST!
LOI
lSH
AR7, !RPICHI
2, ARb
ARb,mINDX
RO,IRO
3,RO
ADDI
2,RD
*AR4,R3
RI, f+AR2
R4,R2,RI
.AR4,RI
R3, *AR2++( lRO)
RI=R3RS
RI=RI>C021
R3=R3+RS
R3=R3>C021
YII2I=IR3RSI'C021
,,,
RI=R2ll4
RI=R1+C021
X1121=IR3+RSI+C021
R2R2+R4 ",
R2=R2+C021 ",
y(!31=IR4R21+C021
1Il3)=IR4+R21'C021
I~MENT
,,,
~XT
TIlE
RPTB
SUBF
tv
0
IRI,AR4
1,AR4
@SINTAB,AR4
I!I.PCNT, RO
I,,OP
STF
SIIBF
l'I'YF
STF
AD!}"
I1PYF
STF
STF
Cl'I'I
BFD
I1PYF
AD[/'
I1PYF
i:l
:::
R4,R2
.AR4,R2
RI, 'tAR3
R2. +AR3++IlROI
;
;
;
;
;
;
;
;
;
;
;
;
SUBF
STI
RO,M
SUBI
LSH
BR
2,RD
; JT=N2I2+2
I,RO
LOOP
; N2=N2/4
~XT
FH STAI
!fFTSIZ,Re
I.Re
!fFTSlZ,IRO
@INPUT,ARD
@INI'UT,ARI
RPTa
eMPI
BGE
L!}"
LDF
BlTRV
ARO,ARI
CONT
'MO,RO
>ARI,RI
RO, *ARI
Rl,fMO
STF
STF
"
; IE=4'IE
; NI=N2
LDF
LDF
srF
SIF
NOP
NOP
HARQ( 1)
Re=N
Re SIO.lD BE OOE LESS 1fW< DESIRED ,
lRO=SIZE Of FFT=N
,RO
HARHlI,Rl
RO,HARt(l)
RI,t+AROI11
f+tAf(O(Zl
'ARI++llROIB
108
109
CllPI
BGE
LLf
Pl'PENDlX CI
GENERIC PROORAII TO 00 A RADlX2 RfAL FFT COIf'UTATION ON TlE TI1S32OC30
THE PROGRAM IS TAKEN FROM HE PPI'ER BY SORENSEN ET AL, cUE 1987 ISSUE
OF THE TRIV<SACTIONS ON !\SSP,
..
LLf
STF
..
CONT
STF
tfl'
tfl'
BITRV
THE lREALI DATA RESIlE IN INTERNAL ~Y. TlE COMPUTATION IS lXJiE
INI'lACE. HE BIT REVERSAL IS lIOI AT TIE BEGINNItIJ Lf HE PROJRII/1.
~
;:
~
~
~
;:
AlJTti(IR:
i:i
~.
INP
\:j
<'l
.GLO~.
FFT
N
.GLOBL
; LOO2(NI
.GLOBL
SINE
.USECT
.BSS
"IN",I024
OOTP, 1024
s::,
;::
INITIALIZE
wORD
:::..
So
~
....
;::
~
0
'"
FFTSIZ
LOGfFT
SINTAB
INPUT
OUTPUT
FFT:
0
;::
~
~
t:i
a
C
BLKI
:=
=
~.
tINPUT,M~
lRO,RC
I,RC
BLKI
HARO, *ARO++, RO
tARO,*ARO,Rl
RO,*AAO
RI,lIARO++
, R(FXlll+X(J+11
, RIX(! IX(J+()
; xn )=X( I 1+x{I+lI
, X(!+IIX(1)X!l+()
fFT
SPACE
100
WOOO
WORD
WORD
WORD
.WOOO
SINE
INP
ClJTP
LOP
@INPUT,ARO
2,IRO
eFFTSIZ,RC
2,RC
I,RC
..
.
..
BLK2
NEGf
STf
STf
STF
BLK2
, R(FX(1)+XII+21
HARO{ IRO), *MO++( IRO),RO
tARO, .AAOIIROI,RI ; RIXIIIXII+21
, R(Fm+31
t+AAO,RO
; X(I)=X(J)+X{i+2J
RO, 'AROI lROI
, m+21m )X (J+21
Rl,*r:iKl++(JRQI
, x(!+31Xll+31
RO, HARO
FfTSIZ
LDI
SUBI
LDI
LSH
LDI
LOI
I!fFTSIZ.RC
I,Re
eFFTSlZ ,IRO
I,IRO
@INPUT,AAO
@INPUT.ARI
RPTS
Sl1RV
LOOF
; RC=N
; Re SHOOLD BE ONE LESS THAN lIESlRED I
LOI
LSH
LOI
LOI
LOI
LSH
LSH
LSH
eFFTSIZ,IRO
2,IRO
I,R4
; lROINDEX FOR E
; R5 IIlLDS TIE ClIlRENT STAGE NltIlIER
: R40N4
2,R3
, R30Ia
I,IRO
I,R4
1,R3
, EoE/2
; N4=2ftt4
3,R5
, N2020m
LOI
LDI
ADDI
LOI
tINPUT,AR5
lRO,AAO
@SINTAB,AAO
R4,IRI
~~
:3 :=
..,
~
aa~....
"0
.... Q..,
..,
:=trQ
Q
9
:=
:
r~
Q
~~
a:: Q
00=
~~
==
O~
=~
So
N
C
SEPTEMBER 8, 1987
"0
"0
~
TEXT
.'"l
::?
\:i
RPTB
ADDF
SUBF
STF
STF
PAN~;
(;LOft
~
.'"l
LDl
LDI
SUBI
= N/2.
E. PAPAMICtlAlIS
TEXAS INSTRlJMENTS
>
LENGTH 00 lIUTTERFLIES
'G
MI,AAO
CONT
'MO,RO
lIARl,RI
RO,lIARl
Rl,*MO
tMO++
tAR1 H( 1ROIB
~
~
!.
~
~
;:
N(p
~
:!
is
~.
~
tI
"
(J
a
S.
...'"
~
s::i
;:
~
<::i
"
<::i
f:::i
tv
C
a
C
LOF
ADOF
SUBF
STF
NEGF
'IIRS++IIRII,RO
HIIRSIIRII,RO,RI
RO, mARSIlRI I ,RO
RI,HIRSIlRlI
RO
HtM5(1RU,Rt
RO,'M5
RI,'IIRS
,
,
,
,
,
,
,
,
STF
STF
LOI
LSH
LDI
SUBI
@FFTSII,IRI
2,IRI
R4,Re
l,Re
RPTB
It'YF
If'YF
I'I'YF
ROOF
It'YF
SlJBF
ILK3
>AR3,>+ARQIIRII,RO
fAA4, 'ARO,R1
>AR4,"AROIIRII,RI
RO,RI,R2
'AR3,>ARO++lIROI,RO
RO,RI,RO
>AR2,RO,RI
'AR2,RO,RI
Rl.fAR3++
>ARI,HZ,AI
RI,>AR4R2,tARI,RI
Rl,IAR1++
RI, 'AR2
SUBF
;:
'"
ADDF
"
STF
"
STF
"
BLK3
ROOF
SUBF
STF
STF
SUBI
ADDI
Cl'l'I
SLID
ADDI
MlP
HOP
!INPUT,ARS
R4.11RS
@FFTSII,1IRS
INLOP
!ItflJT,1IRS
ADOI
Cl'l'1
BLE
I,RS
!LIXHT,RS
LOOP
N(p
I()p
MlP
RO=XUI
RI=XUI+XIl+N21
RO=XIlIHIl+N21
XIlI=XIIi+XIl+N21
RO=XIl IXI I +H21
RI=XII+N4+N21
XIl+N21=XIIIXII+N21
XII+N4+N21=KIl+N4+N21
INNERIIOST LOOP
I:>
;:
I:>..
S.
IIRS,ARI
I,ARI
ARI,AR3
R3,AR3
AR3,AR2
2,AR2
R3.M2,AR4
NEGF
"
."l
LDI
ADOI
LOI
ADDI
LDI
SUBI
ADOI
Tl~S
,RO=XII31'COS
, RI=XII41'SIN
, RI=XI141.COS
, R2=XtI31'COS+XI141'SIN
, RO=X!l3I'SIN
; RO=X(l3)ISINtX(J4)'COS !!!
; Rl=XII21+RO I!!
: Rl=X(J2I+RO !!'
; X( I31=X021+RO ! I!
, RI=XII1I<il2
; X(l4)=X(J2>+RO !! I
, RI=l(lIHl2
, XIlIl=XilIl+R2
, XII21=XIlIlR2
, IIRS=I+NI
, LOOP BACK TO TI INNER LOOP
END
BR
,END
END
FP
/\'PENDIK C2
NAI1E:
.BSS
lnt PI
~BER OF STAGES = LOO21NI
floit Idata.
MAAY WITH IIf'UT AND 0U1l'IJT IiITA
is
5'
::
[(SCRIPTION:
GNERIC Fl))C!ION TO JJO A RADIK2 FFT CMUTATION ON THE T11S32OO).
THE IiITA MAAY IS tHONG. WITH ONLY REAl. DATA. THE OUTPUT IS STORfD
IN TIE SME LOCATIONS WITH REAL AND IIIAGINARY POINTS RAND I AS
Fo..LOIIS: RIOI. Rill RINI2I. IINI2lI 1111
II
::
s:l..
= c.os( (N/4)t2*pi/NI
FP(4)
FP<3l
FP(2)
FPIIl
FP(O)
'"
<:::S
::
S.
""
++
DATA
"
N
: RETLIlN ADffi
0..0 FP
"
t+
"
a
C
_SINE
FP
SP,FP
PUSH
R4
RS
AR4
AR5
LDI
STI
LDI
STI
LDI
STI
+fPI21.RO
RO.@FFTSIZ
+fPI31.RO
RO.tLCXJ'FT
+FPI41.RO
RO.@INPUT
LDI
SUBI
LDI
LSH
LDI
LOI
IfFTSIZ.RC
I.RC
IfFTSIZ.IRO
I.IRO
@IIf'UT.ARO
tINPUT.ARI
IIPTB
ClIPI
BITRV
ARI.ARO
CONT
oARO.RO
+ARI.RI
RO.+ARI
RI.oARO
oARO++
.AR1++IIROIB
BGE
tv
C
UfUU'U'.HUfffffU,ffffffffffffff.'HUH,fHfUfffH fffffU'HU'HH
=
....
~
~
CONT
BITRV
LDf
LDf
STF
STF
IU'
IU'
LDI
LDI
SUBI
!INPUT.ARO
IRO.RC
I.RC
a; RC=N
; RC SIO.UI IE ONE L<:SS THAN IlESIRED I
; IRO=HALF TIE SIZE OF FFT=NI2
~
~
t3
.....
Q
=
tD
atD
~
=
=
n
til
L<:NGTHTWO BUTTERFLIES
OCTOBER 13. 1987
~
~
REGISTERS USEO: RO. RI. R2. R3. R4. RS. ARO. ARI, ARl. AR4. AR5; IRO.
IRI, RS. RE. RC
.n
TIE VALLS v.luel TO v.I"INI41 ARE THE FIRST QUARTER OF THE SINE
PERIOD AND; v.I"INI4+1I TO v.lu.INI2) ARE THE FIRST QUARTER OF THE
COSltE PERIOD.
~
1:1
::
~
<:::S
= siniOI2f pi/NI
"".,
=
;;:l'
FFTSIZ.I
LCXJ'FT.I
IIPUT.I
= sinlllltpl/NJ
.fluat value{NI2)
<:)
PUSH
PUSH
PUSH
PUSH
_sine
.floit vahe1
.float vilue2
.word
LDI
,dita
_sine
s:l
S.
..FFT _RL:
INITIALIZE C FtKTlON
tl
,:i
SINTAB
>
i
"I:S
TEXT
::
..FFTJI!.
_SItE
ass
.o..OIIl
.BSS
SYNCf'SIS:
M3
.0..011..
SET
.....
....~
Q
;:
rr
'::l
~
::!
~
ilK!
ti
RPTS
ADOF
SUSF
STF
STF
lUI
HARO AROt+ ,RO
fARO, fMO,Rl
RO,f~ARO
Rl,lIARQtt
;
;
;
;
RQ=XtII+xtl+11
RI=Xtl IXtl +1 I
XtII=XtlltXtl+11
Xtltll=ltIHtl+11
LDI
LDI
lDI
lSH
SUSI
@INPUT,ARQ
2,IRQ
tFFTSII,RC
2,Re
I,Re
i::l
RPTB
ADOF
."3
SUBF
NEGF
IIlK2
I+AROIIROI, 'AROt+IIROI, RO
; RO=XtlI+Xtl+21
'ARQ, fMOtlROI ,RI ; RI=XtlIXII+21
++ARO,RO
; RQ=Xtl+31
RQ, fARQ tl RO I
; XIII=XtlI+1iI+21
Rl,fMQH(IRO)
; Xtl+21=XIIHtl+21
RO, HMO
; Xllt31=Xtl+31
<Q.,
~
\)
;:
$:l..
..
IIlK2
..
S~
$:l
;:
~
<:j
lOOP
g'"
~
INlOP
tv
C
a
C
LDI
lSH
lDI
LDI
LDI
lSH
lSf!
lSH
tFFTSII,lRO
2,IRO
3,RS
I,R4
2,R3
I,IRQ
I,R4
I,RJ
;
;
;
;
;
;
;
IRQ=INDEX F~ E
R5 nos TI CLIlRENT STAG NLI1IIER
R4=N4
R3=N2
E=E/2
N4=2fN4
R2=2tN2
S
'"
...'"
~
STF
STF
STF
lDI
lDI
ADDI
lDI
!INPUT,AR5
IRO,ARO
ISINTAB,ARO
R4,IRI
LDI
ADDI
lDI
ADDI
LDI
SUBI
ADDI
AR5,ARI
I,ARI
ARI,AR3
RJ,AR3
AR3,AR2
2,AR2
R3,AR2,AR4
LOF
IM5++(lRU ,RO
RO=X(J)
ADDF
SUEIF
STF
NEGF
ftAR5IIRII,RQ,RI
RO, f ..AR5( IRI I ,RO
RI, fAR5tlRI I
RQ
RI=Xt I I+XtltN21
RQ=Xtl 1+11 1+N21
XtIl =XI! 1+1iI+N21
RQ= Xtlll tl tN2 I
NEGF
STF
STF
l++iIR5tIRll,RI
RQ,+i1R5
RI,tAR5
RI=Xtl+N4+N21
Xt ItN2I=XtllXtltN21
Xtl +N4tN2I=Xtl +N4+N21
INtJmOSTLOCf>
g'
$:l
..
LDI
LSH
lDI
SUSI
IFFTSll,IRI
2,IRI
R4,Re
2,Re
RPTB
It'YF
It'YF
It'YF
ADOF
II'YF
1lK3
tM3,ftARQtlRII,RQ ;
fAA4, .ARQ;RI
;
'AR4, ftARQt IRI I,RI ;
RQ,RI,R2
;
'AR3,tAAOt+IIRQI,RQ;
RQ,RI,RQ
;
fAR2,RQ,RI
;
fAR2,RQ,RI
;
SUIIf
.
..
..
1lK3
SUEIF
ADOF
STF
AOOF
STF
SUBF
STF
STF
SUBI
ADOI
CI1PI
!l.ED
ADDI
HOP
fa'
ADDI
C!l'1
!I.E
RQ=Xt!3lfCOS
RI=Xtl41'SIN
RI=xtl41'COS
R2=ltI3ItCO$+ltl4l'SIN
RQ=XI!3I'SIN
RO=XIl3)*S]N+X(l4}tCOS !! I
Rl=XlI2)+RO !!!
RI=X I 121+RQ .. ,
Rl,fM3H
; XIJ31=X(2)+RO !!!
fARI,R2,RI
RI, 'AR4R2,fAAI,RI
Rl,lARl++
RI, iAR2
;
;
;
;
;
!INPUT,AR5
R3,AR5
IFFTSIl,AR5
IILDP
!INPUT,AR5
RI=11ll 1+R2
X(J41=XII2HRO !!!
RI=XllIH12
ltlll=ltl1l+R2
XI121=XIllIR2
; AR5=ltNl
; LW' BACK TO TI IHR LW'
1,R5
eUGFT,R5
lOOP
RESTORE TI REGISTER
POP
POP
POP
POP
POP
RETS
AR5
AR4
R5
R4
FP
V~UES
AND RETlIlN

LID'
APPENJUC3
GENERIC I'ROGIWI TO 00 AlWIIX2 REIl. INIIERSE FFT CM'UTATlIII III TI
INl.IP
TIIS32OC3O.
TI lREIl.I DATA RESIlE IN INTERIW. tEItORY. TI COIf'UTATlIII IS In
INf'I.M:E. TI BIT REVERSIiL IS In AT TI IIEGlrtllNl IF TI I'ROGIWI. TI
1tf'UT DATA ARE SraD IN TI FOLLOIIINl 0RlER:
li'
'15
~
11tf'UT.AR5
lRO.1IRO
ISINTAB.IIRO
R4.IRI
LDI
ADDI,
LDI
ADDI
U11
SUBI
ADDI
AR5.ARI
I.ARI
ARI.AR3
Rl.AR3
ARl.AR2
2.AR2
Rl.AR2.AR4
NIP
AIIIF
SUIIF
t++IIR5URlI
, POINT TO IIl+N41
tAR51 IRlI.f+IIR5URI I.RO
t+AR5URI I. fARSURI I.RI
RO.fARSURlI
, IUI=IUI+XIl+N21
RI ....AR5URlI
, IU+N2IIIIIlU+N21
tAR5.RO
2.0.RO
RO. fARSURlI
, IU+N4I=2tIU+N41
t++IIR5URlI.RI
2.0.RI
RI.fAR5++URlI
, XU+N4+N2I~XIl+N4+N2lf2
STF
STF
is
~TIDI:
g"
PANOS PIIPMIClW.IS
TEIAS INSTRtIIEIITS
.a,
.Ill.OBL
. 1ll.OBL
.1ll.OBL
.1ll.OBL
"
SIIE
tl
.BSS
INP.I024
,:l
.TEIT
INITIALIZE
IFFT
N
LIF
NPYF
..
STF
LIF
NPYF
STF
1lK3
tAR2.tARI.RI
tAR2. tARI.RO
RI.f+llROIlRlI.RO
RO.tARI++
tAR3. tAR4.R2
tAR3._.R6
R2.fARO.R6
R6.tAR2R6.RO
R2.f+llROIIRlI.R6
RO,IAR3++
RI. tARO++I lROI.RO
R6.RO
RO.tAR4
RPTB
SUIIF
ADIF
.IOIIRII
.1oIIRII
.1oIIRII
.1oIIRII
N
"
SINE
IMP
UIP
FFTSIZ
g
So
"'
~
N
NPYF
'FFTSIZ
LOOFFT
SINTAB
1tf'UT
IFFH
U11
U11
U11
LSH
LDI
LSH
I.IRO
3.R5
IFFTSIZ.Rl
1.Rl
IfFTSIZ.R4
2.R4
STF
ADIF
SUIF
::
NPYF
STF
::
SUIIF
NPYF
STF
1lK3
NPYF
ADIF
STF
ac
::
, lRO=INlEX FOR E
, R5 IILDS TI CIIlRENT STAlE IUIIIER
, R3oilII2=N2
, R4=NII4=N4
SUBI
CNPI
BLTO
ADlII
IrtlER LID'
LDI
ADDI
t'D
I'C t'D
== :I.
;n
::.~
8ri5
=9
..
=
:s
t'D
2.Re
IFFT
n~
=
a :s
: 1"
100
IFFTSIZ.IRI
2.IRI
R4.RC
SPACE
.Q
U11
LSH
U11
SUBI
,WORD
So
INNERIIOST LID'
U11
U11
ADDI
U11
, RI=T1"IUlI11121
RO=T1tCOS
1111 1=1111 1+11121
R2=T2=III3I+XIl41
, R6=T2tSIN
, 11121=XIl41l1131
R6=T2tCOS
IU31=T1tCOST2tSIN
RO=T1tSIN
, XIl41=T1fSIN+T2tCDS
11tf'UT.1IR5
IfFTSlZ. AR5
Irt.OP
11tf'UT.AR5
lRO.ARO
ISINTAB. ARO
~o
::::=
00=
~~
==
Q~
=~
f
~
;:s
ADDI
~
~
I,RS
ILOIFFT,RS
L(IJ>
1,IRO
1,R4
1,R3
Cll'1
IlD
LSI!
LSH
LSI!
S!
go
lAST PASS
(F
"
E=E12
"
_12
N2=N212
11 MIN L(IJ>
;:s
tl
("')
!i
s::..
I:i
~
C
"
"
STF
BLK2
I:
..VI
II'VF
STF
UF
Rl=XIII+XU+21
Rl'XIIIXII+21
XIII=XIII+XU+21
XII+21=X!IlX1I+21
; Rl=2.OtXCI+1I
; XII+1I=2.OtXII+1I
; Rl=2.OtXCI+31
; XCI+31=2.0tXCI+31
; RO=XII+4+21
LENGTHTill MTERFLIES
""
~
ADDF
SUBF
STF
STF
UF
.~
t+MOUROI,RO
BLK2
RO,fIIR(}++(IROI,Rl
RO,tAROIlROI,Rl
Rl,'ARO(IRO)
Rl,tARO++
tARO,RI
2,O,Rl
Rl,tAROIIROI
tARO++,Rl
2,O,Rl
Rl,tARO
t+AROIlROl,RO
UF
II'IF
So
UF
:1
~
g
tINPIIT,ARO
2,IRO
tFFTSIZ,Re
2,Re
I,Re
IIPTB
!:>
;:s
a
So
..,""
LOI
LDI
LOI
LSH
SUBI
BLKI
I:
LOI
LDI
LSI!
51111
@INPIIT,ARO
IFFTSIZ,Re
1,Re
I,Re
IIPTB
ADDF
SUIIF
STF
STF
BLKI
t+AAO, tARO++ , RO
tARO, tARO, Rl
RO,tARO
Rl,tMOt+
;
;
;
;
RO=XIIl+XH+1I
Rl=X!IIXII+1I
XIII=X!Il+XCI+11
X11+1 l=XII lX1I+1 1
IFFTSIZ,Re
I,Re
IFFTSIZ,IRO
1,IRO
IIPTB
BITRV
ARl,ARO
Cll'1
; RC=N
; Re SIO..UI BE
(J,(
; lRO=HALF 11 SIZE
m=N12
IUfltJT,MO
@INPIIT,ARI
IIIi:
UF
UF
STF
STF
ClJjT
BITRV
10'
to'
END
III
.END
CIJjT
tARO,RO
tMl,Rl
RO,IARI
Rl,tARO
tMl>++
tARl++IIROIB
; IF ARO<Ml
END
116
117
UF
STF
STF
"
APPENDIX Dl
00
TI TlIS32OC30,
"
10'
10'
~T
BITR'!
lOl
lOl
SlIiII
THE TWIDDLE FACTORS ARE stfPLIED IN A TAlilE PUT IN A .DATA SECTION. THIS
DATA IS IIQUDEO IN A SEPARATE FILE TO PRESERVE TIE GENERIC NATLIIf OF TIE
PIlOGRAl1. FOR TIE SM PURPOSE, THE SIZE OF TI FHT N AND LlmlNI ARE
DEFINED IN A .GLOBL DIRECTIVE AND SPECIFIED OOUtIJ Llrt<IOO. TIE LENGTH OF
THE TABLE IS NI4 t NI4 = N/2.
;:,:
~
;:!
'is"
:::to
.il..0!Il
;:,:
..:;,
~
:J
tl
.GLOBL
.GLO!Il
.GLOBL
.BSS
FHT
N
RPTB
AIU
SlIIIF
STF
STF
"
"
SINE
;
;
;
;
INP,1024
~tQlY
TEXT
INITIALIZE
I:l
;:,:
I:l..
WORD
FHT
"
SPACE
100
"
'"....
I:l
;:,:
..;,
~
'"
~
FHTSIZ
LOGFHT
SINTAB
INPUT
WORD
INP
FHT<
LOP
FHTSIZ
WORD
WORD
WORD
N
C
RPTB
CMPI
BHRY
ARl,ARO
CONT
tARO,RO
'"
a
C
Q.
~.
.~
@INPUT,ARO
lRO,RC
1,RC
I"
BLKl
HARO, tARO++, RO
tARO, tARO, Rl
RO,HIRO
Rl, tARO++
;
;
;
;
RO=Xllltl(J+1I
Rl=XU)lU+1)
XU)=xtI)+xII+1I
XII+II=xtIIXU+1I
lOl
lOl
lOl
LSH
SlIiII
@INPUT,ARO
2,IRO
eFHTSIZ,RC
2,RC
1,RC
RPTB
AIU
SlIIIF
STF
UF
AODF
STF
SlIIIF
STF
STF
BLK2
; RO=XIJ)+XIL2)
HAROI lRO), 'ARO++I lROI, RO
f'ARO,*MO(IROI,Rl ; Rl=XIJIXIL2)
; X(JI=X(JI+x(L2)
RO, HIROURO)
; RO=XIL4)
HARO,RO
; Rl=XIL3)+XIL41
RO, tARO,Rl
; XIL2I=IIJIXIL2)
RI, tARO++
; Rl=XIL31XIL4)
RO,H.ROURO),Rl
; XIL3)=IIL3)tXIL4)
Rl, H.ROURO)
; XIL4)=XIL3lXIL4)
Rl,tARO++
=
=
.,..
~
=~
~ '"1
~r':)
~::p
~ci5
N'"1
~a
~=Q
~
t=
Q.
~.
I
~IN
S
"
BLK2
"SINE
LDI
SUBI
LOI
LSH
LDI
lOl
;:,:
(j
.,
BLKI
AUTHOR; PANOS PAPAIIICHALIS
TEXAS INSTRlINTS
;:,:
"t:S
LENGTHTIll WTTERFLIES
tARl,Rl
RO, 'ARI
Rl , tARO
tARO++
tARl++URO)B
; RC=N
; RC SIIllD BE ONE LESS TIWI DESIRED
LOOP
eFHTSIZ,IRO
2,IRO
3,R5
1,R4
2,R3
l,IRO
1,R4
1,R3
;
;
;
;
;
;
;
lRO=INDEX FOO E
R5 f.DS TIE ClIlRENT STAGE tuIBER
R4=N4
RJoII2
E=E/2
N4=2'N4
N2=21N2
~
'"1
~
'<i
'"1
=
~
[IJ
BGE
LDF
; XCHANGE LOCIITI~
IF ARO<ARI
(H.y
INlOP
lOl
lOl
ADDI
LDI
@INPUT,AR5
lRO,ARO
tsINTAB,ARO
R4,IRl
'"1
:3
:l:..
;::
LDI
AIIDI
LDI
ADDI
LDI
SUBI
ADDI
!?
'ti
i1!
'"is
'
;::
.s;,
LDF
AIIDF
SUBF
STF
.~
"
"
STF
LDF
AIIDF
"
STF
STF
t"l
.'3
i:l
;::
SUBF
i:l..
!:\
;::
~
'l
"
S.
~
c;:,
a
c;:,
RI,'IIR5!1RlI
RO
RO,f1iR5
f+AR5!1RlI ,RO
RO,'IIR5IIRlI,RI
RO, 'AR51 IRII ,RI
RI,'IIR5!1RlI
RI, .fIiR5lIRli
LDI
LSH
LDI
SUBI
@FHTSIZ,IRI
2,IRI
R4,Re
2,Re
RPTB
If'YF
If'YF
If'YF
ADDF
If'YF
AIIDF
STF
AIIDF
STF
SUBF
STF
STF
IlK3
tAR3,f+AIlO!lRlI,RO
fAR4,fMO,Rl
tAR4, f+AIlOI IRII ,RI
RO,RI,R2
tAR3,'ARO++IIROI,RO
RI,RO,RO
RO,'AR2,RI
fAR2,RO,RI
RI,'AR4tARI,R2,RI
RI,'AR2R2,tARI,RI
Rl,*ARl++
RI, tAR3++
SUBI
ADDI
Clf'I
IlTD
ADDI
fINPUT,ARS
R"J,1IR5
@FHTSIZ,1IR5
INLOP
fIIf'UT,1IR5
SUBF
SUBF
'"
tAR5++IIRlI,RO
<+IIR5!1RII,RO,RI
RO,utAR5(IRl),RO
"
"
::
BLK3
Ia
N(p
.....
.....
I,C)
AIIDI
Clf'I
1,R5
KOO'HT,RS
END
IIl
LOCI'
III
END
,END
; AR3 POINTS TO XIL31=XlU+N21
; AR2 POINTS TO XIL2I=XlJI+I+1121
; AR4 POINTS TO X!L41=11L2+1121
;
;
;
;
;
;
;
;
;
;
;
RO=IIJI
RI=XlJI+IIL21
RO=XlJ)+IIL2)
IIJ)=XlJI+IIL2)
RO=IIJHIL21
IIL2)=XIJHIL2)
RO=llL41
RI=XIL31+IIL41
RI=IIL31lIL4)
XIL31=IIL31+IIL41
IIL41=X!L3HIL41
INNI'III1OST LOCI'
~
..,'"
'l
;::
NEGF
IIR5,ARI
I,ARI
ARI,AR3
RJ,AR3
AR3,AR2
2,AR2
RJ,AR2,AR4
;
;
;
;
;
;
;
;
;
;
;
;
;
;
RO=XIL31.COS
RI=XIL41.SIN
RI=IIL411COS
R2=IIL3IICOS+XIL4lfSIN=T1
RO=IIL31.SIN
RO=IIL31'SIN1II41ICOS=T2
RI=IIL2H2
RI=IIL2)+T2
XIL41=IIL21T2
RI=XlL1I+T1
XIL2)=XIL21+T2
RI=IIUHI
XlU)=IIL1l+T1
XIL31=XlUHI
; AR5=I+NI
; LOCI' BACK TO TI lIfER LOCI'
120
121
tl
tl
OOTSllLLOOP.
ftlllllU.lD'l
IIPf'ENIIX EI
A FAST COSIIE TRAIISFORII
!lASED IJI TIE ALGaUM OOTLIIED IIY IIYElNl GI LEE IN HIS ARTIQ.E, FCT  A
FAST COSIIE TRANSFORII, PUllLISIEII IN 11 PROCEEDINGS IF 11 IEEE INTER
LEE'S AU~IUM HAS BEEN IIIDIFIED TG ALLIJI NATlRIL OIlER T1~ _IN
COO'FICIENTS RATHER THAN 11 LESS _
INPUT SUGGESTED IN HIS ARTIQ.E.
~
'e
...
is
C
;:s
.gl.btl
.gl.btl
g,.b.,
.glob.,
.~
tl
:'l
<J
[
0
.,S'"
~
!il
~
c::;
~
'"
S
FCT
"COSTAII
aEFF
1M3, tAR4,RI
lAR2.tARI,RO
RI,<++AR7,RI
Rl,tAR4,R3
SUIIF3
1I'YF3
::
AIIIIF3
..
If'Yf3
AlllF3
RO,<M7,RO
R2,tARl,R2
II
STF
STF
RI, tAR2++URIIX
R3,tAR4++URIIl
..
STF
STF
RO, tAR3++URlll
R2, tARl++C1Rlll
FCTSllE
_COS
..DATA
...ord
.ord
.word
ft
COS_TAlI
com
Fcn
LDI
LDI
IFCTSIZE,MO
IfCTSIIE,IIK
LDI
LDI
LDI
LDI
LDI
ADDI3
SIIII
LSH3
LDI
ADDI
ADDI3
ADDI3
LllATA,AR6 .
LCOS,M7
MO,IRI
I,IRO
M6,MI
AR6,MO,AR2
1,AR2
IRO,MO,ARl
1,AR5
Mb,ARl
IRO,ARl,M4
IRO. MS, ftC
tv
THIS LIIP SERIES lIlES ALL 1IIIUTTERFLY STAGES EXCEPT 11 FINAL 1IiIE.
RPTII
END_CSlTER..LIIP
ADDI3
IRO,MS,ftC
AIIIIF3
lIGTD
tAR3++ ,1AR2,RO
ARl,AR2
ftlDllLLIIP
AlllF3
*ARl++,tlR4, RO
ADDI
IR
2,ART
OIOOH,ST
CllPI
SUllF3
END_CSlTER..LIIP'
1AR2,R2
IM3,Rl
text
'"
LIF
LIF
;:s
,
,
>
!!
~
"C
"C
tr:l
I'
>
I'!!j
=
~
nC
III
13
""I
III
SIJIII3
I,IRI
M6,ARI
IRO,AR6,AR2
IRI,AR2
2,IRI
OOTSIlLLlIP
1,AR5
IRO,M4,ARl
ADDI3
IRO,MS,ftC
LSH
LDI
ADDI
ADDI
CllPI
IIGTD
LSH
~
""I
::...
;:,
~
"t5
LDI
ADDI
LSH
ADDI
ADDI3
tf>YF3
:'l
'"
;:,
is
~.
4.IRI
1,AR3
l,ARS
3,AR4
IRO,ARS,RC
fM7, HAR7, R4
;:,
RPTB
END_2ND_LOOP
1M2, *AA1,RO
SUBF3
SUBF3
tf>YF3
ADOF3
..
I1PYF3
AD(f3
..
I1PYF3
STF
I1PYF3
STF
Sl
."'l
tl
(J
"'l
I:>
;:,
i:l..
a
'..,"
So
~
l:l
;:,
~
Cl
AD(f3
STF
;:,
END_2ND_LOOP:
'"Cl
So
'"
STF
<::>
.N
LDI
ARS,RC
; SET LI' REPEAT m.wER.
>AR2++IIROIB,'AR4++f1ROIB,RO
; DATA POINTER LPDATE.
AR1,R4
; USE INITIAL !\R1 VALl( AS IIfi:R LOOP
CONTRQ.
1,RC
fAR4++tIROIB
; CONTINLl lPDATlNG POINTERS.
AR2,!\R3
RPT8
END_INSIDE
LDI
AD(f3
LDI
SUBI
to'
LASLlNSIDE_LOOP:
AD(f3
ADOF3
STF
END_INSIDE:
STr
Rl,.tAR3++iIRll%
IlNED
LDI
SUBI
OR
tllR1++IIROIB,>AR2++f1ROIB,RO
; LPDATE DATA POINTERS
'AR3++f1ROIB,'AR4++f1ROIB,RO
tllR3++1 IROIB, tllR4++1 IROIB,RO
fARl++( IROIB, *AR2++( IR01B,RO
R4,AR4
; IS THIS LOOP Clft'LETE'
LASLlNSIIiU.OOP
; DELAYED IIlAI:H, IF NOT.
ARS,RC
; SET LP REPEAT m.wER.
1,RC
0100H,ST
; SET REPEAT ~DE.
tv
<::>
[$
~
~
RPTB
AD(f3
LAST..BLOCK
; SIt< THERE ARE AN ODD ~ OF
tllR1,>AR2++f1RlIl,RO
;
ADDITIONS, TIE FINAL (lS ARE
10 ~.
LAST..BLOCK:
STF
THIS LOOP SERIES DOES ALL OF TIE BIT REVERSE ADDITIONS AT THE END OF FAST
COSINE TRANSfIlRl1.
LOI
LDI
ADDI
LDI
'LDI
2,IRO
!\Ro,ARl
4,AR1
LDI
LSH
AR2,AR4
l,ARS
LASUlJTSIDE_LOOP:
ARl,AR2
B,IR1
; Ll'DATE POIIITERS
WJNTERS.
RO,fARl++(IR1)Z
; SAVE
ADDITI~.
Ctf'1
LSH
ADDI
1,IRO
IRO,R4
1,ARS
tRTlPLY IRO BY 2.
LPDTEE IIfR LOOP CONTRQ REGISTER.
ARE CALCl.I.ATI ONS COtf'LHE ,
OOTD
LDI
LDI
LSH
LASLOOTSI[E_lOCF'
R4,AR2
R4,AR1
1,IR1
END IF LAST
l.O(p
SERIES.
;::
't:j
!f
~
I:i
.:l
\::l
(J
.:l
~....
~
~
~
~
~
;:;.
~
~
N
C
o
C
LIf'
BEQD
_,RO .
!M*T..5TOOE
LSH
24,AR5
SUBI3
ARS, _,ARI
10'
ARI,I/IR6
~
"
"i5
;;
lI1SED 00 1l ALOOUTItt
FAST COSINE TRANSFIRI,
MlTI __ COOFERfII:E !II
DIEGO, CA, 1921 IWICH
is
LEE'S ALGORITHII HAS 1IEEN IIlDIFlED TO ALl.IlI NATURAl. IIIIER TIlE lUtAIN
ClEFFICIEHTS.
<Q.,
~
tl
.global
IFCT
global
gl.b.l
.global
"COEFF
I:l
So
'"...
;:,
~
c
COS_TAB
STF
_cos
COS_TAB
"
LOI
LOI
'"
LDI
LOI
ADOI
SUBI
LOI
l.SH
LOi
LOI
ADOI
tv
ac
IfCTSlZE,ARO
IfCTSllE,lI(
UlATA,AR6
LCOS,AR7
ARO,AR7
2,AR7
ARO,IRO
2,IRO
ARO,IRI
ARb,ARI
IRO,ARI
D:
STF
ADOI
lRO,ARI
LOI
LOI
SUBI
ARI,AR2
IRO,RC
2,RC
LOF
!.OF
ADDF3
IFCT:
tAR3,RJ
, GET IJ'PER HALF IF SEC!WD ADDlTlON.
tARI,*AR2++IlROlB,RI ,DO FIRST ADOlTl!ll.
RO,tMI++IIROlB
, STORE ADDlTlON DONE 1l LAST LOOP OR
WIEN INlTlALlZATI!II lIAS DONE ABOVE
:1
So
..
ADDF3
:1
ADDF3
"C(EFF
END_CENTER'
text
rd
.lIIord
,word
END_CENTER
!.OF
ADDF3
FCTSIZE
DATA
RPTB
t
(")
l.SH
tAR2+t IlRO 1B
tMI++IlROlB, *AR2++IlROlB,RO ,FIND FIRST SI.II. IItAKES
ARI, AR3
"IDlLE LOOP ftORE EFFICIENTl
AR2,AR4
ARI,AIlS
tM3++UROlB,*AR4++IlROlB,RI, lUtIY ADD TO LfDATE
,
POINTERS.
I,IRO
, IPDATE INlEl RGISTER.
"!DOLE:
,:i
I:S
NOP
ADIF3
LOI
LDI
LOI
ADIF3
Iff'EIIIlIX E2
STF
ADIF3
LOI
CftPI
SlED
l.SH
SUBI
OR
I,IRO
OUTSII(
ARb,ARI
ADOI
lRO,ARI
lSH
~I,IRI
~
>
r
('i
....~
;
13
8'
~
~
8'
.....
N
0'1
ENIUENTER_Lru':
THIS LOOP INCUJIES TlE LAST BIT REVERSED ADDITION STAGE, TIE FIRST
IIIJTTERFLY, AND TlE COSINE InTlP\.ICATlOOS FOO THE SECOND IIIJTTERFLY
SERIES.
: ll'DATE DATA POINTER FOO THIS LOOP.
: INITHILIZE INDEX REGISTER.
: INITIALIZE REPEAT COUNTER.
;::
SUBI
LDI
3,AR2
B,IRI
I'/lO.RC
3,RC
fAR7,R7
I,RC
RC,AR5
Rl'TB
END_CfNTEILLru'
'"is
ADDF3
*+AR2, *AR2,R4
g.
tf>YF3
'ARl,R7,1I5
;::
It'YF3
R7, R4, RO
..Q.,
ADIF3
fAR4,'AR4,R3
ADIlF3
It'YF3
115,HIRI,R4
HAR7,R3,Rl
ADIF3
SUBF3
RO, tAR2,R2
115,HIRI,1I5
!f'YF3
tAR7, R2, RO
SUBI
LDI
LDI
LSH
LDF
"tj
:=!
;::
"
."l
tI
(J
,;l
"
I:l
;::
I:l.
()
S.
'"...,
~
"
SUBF3
RO, 'AR2,R2
"
STF
STF
STF
It'YF3
R4,HIRI
115, fARl++(JRlll
rt'YF3
tAR7,R2,RO
SU9'3
Rl, tAR4,R3
ADIF3
It'YF3
R4, .AR3,R5
'AR7,R3,Rl
ADIlF3
SUBF3
Rl, tAR4,R3
R4, HIR3,R4
rt'YF3
tAR7,R3,Rl
STF
STF
STF
Rl, 'AR4
RO, '!Il2#( IRl),
R5, *AR3
I:l
RQ, ttAR2
tAR3,R7,R4
;::
~
O
'"
"
0
;::
S.
'"
"
tv
aa
"
"
STF
STF
THIS SERIES OF LOOPS DOES ALL IIIJT TIE LAST IIIJTTERFLY STAGE. Iil TIE
COSINE COEFFICIENT InTlPLlCATlOOS ARE DONE, INClUDII3 TI InTlPLICATIONS FOO TlE LAST IIIJTTERFLY STAGE. (THIS _
FUll 1il(l(S FOO
FAST EXECUTION.)
Rl,*M4++{IRll%
R4, fAR3#(JRm
SUBI
SUBI
LDI
LIF
LDF
2,AR7
I,AR4
ARS,RC
fAR7, 115
tAR7,R4
,
,
,
,
Rl'TB
END_NTL
SUBF3
'AR4,tAR3,Rb
ADDF3
I1PYF3
fAR4,tAR3,R7
R5,Rb,RO
"
ADIF3
It'YF3
*AA2, 'ARl, R2
R4,R7,Rl
"
SUBF3
fAR2, fARl,R3
STF
STF
RO,'!Il3#(IR1lX
R2, tAR(++( IR1I4
STF
STF
Rl,tM4++(IRllx'
R3, tAR2tt( IRIIl
NTLLOOP:
"
END~TL:
"
LDF
Clf'1
BNED
ADIF3
ARS,RC
foPfl..7,R5
fAR7,R4
ARl,!Ilb
NTLLOOP
tAR4#, tAR3, RO
::a..
;::r
AIIII'3
til
tM2++,IMI,00
OIOOH,ST
'1:i
;:~
LDI
AIIll3
LSH
is'
"
Cll'1
IIGED
AIIll3
LSH
LDI
AR3,MI
IRI,IIRI,AR3
I,IRI
IRI,IIRO
NTLUlII'
IOO,AR3,AR4
1.1IR5
AR:i,RC
C.
"l
(')
~
~
~
LDI
AIlDI3
SIIII3
LDI
LSH
SIIII
2,IRI
IOO,IIR2,AR4
IOO,IIRI,AR3
1IRO,1I:
2,Re
I,Re
RPlB
ElLLAST..LOOP
So
L<F
tAR4,RO
AIU3
tM2,tARI,RI
tM2,IMI,R2
,
,
~
<:S
II
STF
SlIIF3
RO,IMl,R3
RI, tARIURIl
RO,IMl,R4
::
STF
STF
R2, tAR2++URI)
R3,IIIR3URll
R4,IAR4++URll
SlIIF3
Q
C
"
~
C
AIU3
ElLLAST..LOOP'
STF
.ft'
All)
**
**
*f
*f
*
APPEND IX E3
*
f
**
*
*
*
*
.global
.global
COS_TAB
M
.set
16
data
*COS_TAB
.float
float
.float
float
.float
.float
.float
.float
.float
float
.float
.float
.float
.float
.float
.float
0.5024193
5.1011487
0.5224986
1.7224471
0.5669440
1.0606777
0.6468218
0.7881546
0.5097956
2.5629154
0.6013449
0.8999762
0.5411961
1.3065630
0.7071068
0.1250000
.end
128
APPENDIX E4
DATA FILE
:*
~.
.glc.bal
*
*COEFF
COEFF
data
.float
.float
.float
.float
float
.float
float
.float
.float
.float
.float
.float
.float
.float
float
.float
.end
137.0
249.0
105.0
217.0
73.0
185.0
41.0
153.0
9.0
. 121.0
23:3.0
89.0
201.0
57.0
169.0
25.0
129
130
131
IV
~
~
't:j
'is"
~
5'
0.2113
0.0824
0.7599
0.0007
0.8096
0.8474
0.4524
0.8075
0.4832
0.6135
0.2749
0.8807
0.653S
0.4899
0.7741
0.9626
0.9933
_"l
O.83bO
tl
(J
_"l
s::.
~
s::...
S.
'"....
~
$:l
~
S.
Cj
~
Cj
~
S.
'"
~
~
tv
a
C
0.7469
0.0378
0.4237
0.2613
0.2403
0.3405
0.1167
O.62'j()
0.5510
0.=
0.4943
0.0365
0.2260
0.8159
0.2284
0.=
0.0621
0.7075
').2408
0.6907
(1.1062
0.2640
0.7034
0.4021
0.6553
0.9700
0.0380
0.0900
0.2560
>
0.5598
0.9160
0.1402
0.7054
0.0178
0.2611
0.1358
0.0503
0.5782
0.2432
0.9448
0.5876
0.7250
0.2849
0.6707
0.8642
0.1943
IIPPENDrx Fl
"C
"C
t!>
e:=
~
I'
tfj
~
~
"C
t!>
Y=
30.3774
1.7780  2.5S84i
1.0376  2.3999i
1.0123 + 2.4889i
0.6594 t 2.3639i
1.5228  0.7527i
3.8171  O.205Qi
2.7096 + l.284li
2.1622  1.68b3i
0.2879 + 1.867li
1.5479 + I.6298i
O.b36b  O.l176i
2.2902 + 1.5S49i
2.4837  0.5842i
1.733S + 0.073Si
0.2180  0.4726i
0.2104 + 0.4897i
1.7473  1.0213i
0.1233  2.3915i
0.6415  1.1144i
2.7719  0.4802i
o.OOb3  0.3885i
0.7163 + l.5682i
O.32IB
0.7823
o.2S53
1.OBI3
3.4869
3.0352
3.2099
1.9511
I.B755
 1.3316i
+ I.OW7i
+ 2.B270i
 2.78bli
+ 1.94B5i
+ 1.3B5Si
+ 2.3564i
 0.7714i
+ 0.28b7i
.,.
Q\
I
~
....
<=
...
t!>
n
Q
=t!>
tll
t!>
~
~
~
==
Q
....
t!>
tll
::....
;,:
~
't;j
~
~
;,:
~.
;,:
,:""3
\:)
(j
,:""3
I:>
;,:
""
a
..,So
!:l
;,:
~
<::>
Co
<::>
;,:
So
~
~
~
N
Cl
0
>'
1.5474
1.87SS
I. 9511
3.2099
3.0352
3.4&9
1.0013
0.2553
0.7823
0.3218
0.7163
0.0063
2.7719
0.6415
0.1233
I. 7473
0.2104
0.2180
1.7338
2.4837
2.2902
0. b3bb
1.5479
0.2879
 0.2Ilb7i
+ O. nl4i
 2.3504i
 1.3855i
 1.9485i
+ 2.7S61i
 2.8270i
 1.0607i
+ 1.3316i
 1.:ib82i
+ 0.388Si
+ 0.4802i
+ 1.1I44i
+ 2.3915i
+ 1.0213i
 0.4897i
+ 0.4726i
 0.0738i
+ 0.5842i
 1.5549i
+ 0.1I76i
 1.6298i
 1.1lb7li
2.1b22 + 1.6863i
2.7096
3.8171
1.5228
0.6594
1.0123
1.0376
1.7780
 1.2841i
+ 0.205Oi
+ 0.7527i
 2.3639i
 2.4889i
+ 2.3999i
+ 2.5584i
....
W
APPENDIX F2
FILE TO BE LINKED WITH TIE SWlCf COlE All A M1'OINT. RADIX4
.globl
.gl.bl
.globl
N
::to.
;:s
"
"G
.dota
.Hoat
.flool
.fl ..t
.fl t
.flo.t
.flo.t
'"is'
'
.:;,
.float
.float
.fl ..t
.fl t
.fl ..t
.fl t
.fl ..t
.flo.t
.fl ..t
.float
.~
tl
(J
."'l
S
...'"
I:i
;:s
~
<:l
~
'"
S
.fl ..t
.flod
.fl ..t
.float
. float
.float
.float
. float
.float
.flo.t
.float
.float
.fl ..t
.float
float
.flOit
.fl ..t
.fl ..t
tv
C
.11 ..t
float
.fl ..t
.f10i.t
.flod
.11 .. t
. float
.float
.flo.t
.float
.float
.float
.float
0.000000
0.098017
0.195090
0.290285
0.38U83
0.471397
0.555570
0.1>34393
0.707107
O.moIO
0.831470
0.881921
0.923880
0.956940
0.980785
0.995185
aJSII
'"
M
I>
SII
;:s
l:l..
.set
.set
SII
N
m.
. float
.f1od
.flo.t
.flo.t
.11 ..t
1.000000
0.995185
0.980785
0.956940
0.923880
0.881921
0.831470
0.773010
0.707107
0.1>34393
0.555570
0.471397
0.38U83
0.290285
0.195090
0.098017
0.000000
0.098017
0.195090
0.290285
0.382683
0.471397
.float
.float
.float
.float
.float
.fl.oI
.float
.floit
.float
.fl t
.float
.float
. fl .. t
.float
.fl ..t
.float
.float
.float
. float
float
float
.floit
.float
.float
.float
.float
.float
.flo.t
:g>
0.555570
0.634393
0.707107
rD
o.n30IO
0.831470
0.881921
0.923880
0.956940
0.980785
0.995185
1.000000
0.995185
0.980785
0.956940
0.923880
0.881921
0.831470
O.moiO
0.707107
0.634393
0.555570
0.471397
o.38U83
0.290285
0.195090
0.098017
0.000000
0.098017
0.195090
0.290285
0.3821>83
0.471397
0.555570
0.634393
0.707107
O.moIO
0.831470
0.881921
0.923880
0.956940
0.980785
0.995185
=
~
~
~~
~:
0
I
01:00=
~~
....
. r:=
~
Q.
:r;....
=~
rD
rI.l
8
~
n0
Ii"
.,a'
~
0\
01:00
I
....~
it
**
ApPEND! X f:3
*l!I!
,~
0
SECTIONS
{
text: {}
.data : {}
IN 809800h : { 12fopt.obj(IN) }
.bss 809COOh: {}
}
135
136
Doublelength FloatingPoint
Arithmetic on the TMS320C30
AI. Lovrich
137
138
Short floatingpoint format, used to represent immediate operands, consisting of a 4bit exponent and a 12bit mantissa.
Singleprecision format, used for regular floatingpoint value representation, consisting of an 8bit exponent and a 24bit mantissa.
139
f
Figure 1. SinglePrecision FloatingPoint Format of the TMS320C30
Operations are performed with an implied binary point between bits 23 and 22. When
the implied mostsignificant nonsign bit is made explicit, it is located to the immediate
left of the binary point after the sign bit. We show the implied bit explicitly throughout
this application report for clarity. The floatingpoint number x is expressed as follows:
x
if
if
if
s = 0;
s = 1;
e = 128, s = 0, andf= 0
The range and precision available with the TMS320C30 singleprecision floatingpoint format are illustrated by the following values:
Most Positive:
Least Positive:
Least Negative:
Most Negative:
x
x
x
x
=
=
=
=
+3.4028234 x 10+ 38
+5.8774717 X 10 39
5.8774724 X 10 39
3.4028236 X 10+ 38
140
the most significant portion of the floatingpoint operation, while zz represents the least
significant portion of the floatingpoint operation.
As an example, consider the result of the exact addition of two floatingpoint numbers
x and y that are expressed in the singleprecision format of the TMS320C30:
x = 217FFFFFh
Y = OC7FFFFFh
= 233
= 233
X
X
As can be seen in this example, most of the precision available for y will not be
available to carry out the addition. Maintaining full precision for floatingpoint addition
requires extra mantissa bits beyond the 24 bits available on the DSP. Since the need for
such precision is rare, software methods are used to represent the result of the operation
as a floatingpoint number pair (z,zz). In our example, the exact result is represented as
follows:
z = 234 X 01.000 0000 0000 0000 0000 00l1b
zz = 209 X 01.111 1111 1111 1111 1111 1000b
The corresponding hexadecimal representation of (z,zz) is shown below:
z = 22000003h
zz = 097FFFF8h
Some definitions are basic to the development of concepts in this report. First is
the definition of the floatingpoint operations over a system R. The system contains all
the possible floatingpoint numbers that the singleprecision format of the TMS320C30
can represent. All the floatingpoint arithmetic is carried out in base 2. Therefore, R can
be represented as follows on the TMS320C30:
141
multiplier on the TMS32OC30 saves the upper 40 bits of the mantissa in one of the extendedprecision registers [1] and drops the least significant byte of the result. By this definition,
the floatingpoint multiplication on the TMS320C30 is faithful. Since the algorithms require the floatingpoint result to be in singleprecision format, the floatingpointmultiplication on the DSP must therefore be followed by a second truncation step. Saving the contents
of the extendedprecision register to a memory location or masking off the low 8 bits results
in truncation.
A floatingpoint operation is optimal if for all x and y, the result of fl(x * y) is an element
of R nearest to (x * y). In other words, the roundoff error should not exceed onehalf
of the last remaining bit position. This is commonly referred to as rounding.
The results of floatingpoint operations on the TMS320C30 are stored in the extendedprecision registers [1]. The extendedprecision register adds 8 bits of precision to the
floatingpoint arithmetic result. Execution of the RND (round) instruction forces the result
of the floatingpoint arithmetic to be optimal. When you round the result of the addition
or subtraction operations on the TMS320C30, these floatingpoint operations become
optimal.
Z+zz=x+y
(2)
142
=y
(z  x)
(3)
(4)
In particular, Ixl > Iyl must be valid for Equation (4) to be valid. Implementation
of Equation (4) on the TMS320C30 always generates the exact correction term zz if the
result of floatingpoint addition operation is made optimal. This requirement guarantees
that the result of singleprecision floatingpoint add and subtract belongs to system R. By
swapping the x and y values 'when Ixl < IYI, the condition for obtaining an exact result
is met.
The algorithm requires that x and y be normalized. Normalization guarantees that
the floatingpoint number has only one sign bit, and that sign bit is followed by nonsign
bits [1]. Floatingpoint addition on the TMS320C30 assumes that the operands are normalized.
The TMS320C30 assembly code for obtaining the doublelength sum of two
singlelength floatingpoint numbers x and y is shown in Appendix A. First, the values
for x and yare interchanged when Ixl < Iyl. When you add x and y values, the number
with the smaller exponent, y, is shifted repeatedly until the exponents of x and yare equal
and their mantissas are aligned. We have now calculated the singlelength number,z, that
satisfies Equation (2). Since the floatingpoint addition on the TMS320C30 is made optimal by rounding, the extra precision is, in effect, dropped. The extra precision value,
zz, is obtained by implementing Equation (4). Figure 2 is a graphical representation of
the implemented algorithm. The figure also shows the relationship between doublelength
number pair (z,zz) and singlelength floatingpoint numbers and their representation on
the TMS320C30.
143
FaA'
e(X~
I I
e(y)
24
f(x)
'I
f(y)
x+y
z
zz
f2(normalized)
Doublelength Addition .
A natural extension of exact singlelength addition and subtraction is its application
to doublelength arithmetic. Figure 3 shows an algorithm for implementing doublelength
addition on the DSP. Using this algorithm, you can add two doublelength numbers (x,xx)
and (y ,yy) and represent the result as a doublelength number (z,zz).
The algorithm requires forming a doublelength number (r,rr) that represents an exact addition of x and y. Generating a second number, s = rr + yy) + xx), results in
a number pair (r,s) that approximates the addition of (x,xx) and (y,yy). Finally, an exact
addition of rand s generates a doublelength number (z,zz) that has the same value as (x,xx)
+ (y,yy).
To obtain exact results for addition and subtraction, subtraction and addition must
be optimal; this is guaranteed by following each subtraction or addition instruction on
.
the DSP with a round instruction.
144
r = x + y;
if (abs(x) >abs(y
s = x  r
else
s =y  r
z = r + s;
. zz = r  z + s;
+ y + yy + xx;
+ x + xx + yy;
(5)
145
;
;
;
;
hy
ty
= Y=y
P + p;
hy;
p = hx X hy;
q = hx X ty + tx X hy;
z = P + q;
zz = p  z + q + tx X ty;
Figure 4. Exact Singlelength Product
Doublelength Multiplication
The doublelength multiplication algorithm, shown in Figure 5, relies on the
singlelength algorithm discussed earlier. The algorithm generates a nearly doublelength
approximation of the output result (c,cc). Note that the exact singlelength mUltiplication
routine is used for this approximation. Exact addition is used to generate a doublelength
floatingpoint number that is the closest approximation to the actual result.
The doublelength product program implementation uses the TMS320C30 stack
capabilities to save some intermediate variables. These programs are written to be used
as callable functions or macros in your program. In either case, the stack pointer must
be set to a valid memory segment for proper code execution.
; Calculate the doublelength product of (x,xx) and (y,yy)
; the result being a nearly doublelength number (z,zz).
; Program uses exact singlelength multiplication, mult12 (.).
mult12 (x, y, c, cc);
cc = x X yy + xx X Y + cc;
z = c + cc;
zz = c  z + cc;
Figure 5. Exact Doublelength Product
146
xx  c
yy) / y;
147
Error Analysis
This section discusses and determines an upper bound for the error generated in
forming a doublelength result. The value of the doublelength number (z,zz) is equal to
z + zz. Singlelength addition, subtraction, and multiplication results are always exact.
In doublelength addition, any error introduced in the end result is generated by calculating
the zz term. An upper bound error magnitude has been calculated in Reference [2] and
is shown in Equation (6) as follows:
Iy +yyll x
2 2  2t
= IZI x
(6)
222t
where t = 24 for this system. This gives an upper bound of Izi X 246 , or approximately Izl x 1.42 x 10 14 . This translates to a theorical accuracy greater than 13 decimal
places. Table 1 shows an example of doublelength addition using the exact addition
algorithm previously described. The numbers in the left column represent TMS320C30
hexadecimal notation for the floatingpoint results, and (z,zz) is the decimal equivalent
of the doublelength output result. Appendix B shows a listing of C programs (exact) that
convert from TMS320C30 hexadecimal notation to decimal notation.
Table 1. Exact Singlelength Arithmetic Examples
Singlelength Addition
x
= 217FFFFFh
= OC7FFFFFh
= 22000003h
zz
= 097FFFF8h
= FC7C8923h
= OA29A7E5h
= OA29ABD8h
zz
= EFA46000h
148
= OF7FFFFFh
= 21FFFFFFh
= 30800000h
zz
= 18800002h
= FC7CB923h
= OA29A7E5h
= 07277BF7h
zz
= EBA714FOh
(DS~
The doublelength product, quotient, and squareroot algorithms all have a small
relative error. The upperbound error magnitude for each is given in Equations (7) through
(9).
IExl=(lx+xxl x Iy+yyl) x 11 x 2 48
(7)
IE + I =(Ix +xxl
(8)
(9)
149
x
xx
y
yy
Z
zz
x
xx
y
yy
Z
zz
=
=
=
=
=
=
=
=
=
=
=
=
22000000h
097FFFFEh
21000001h
097FFFFEh
43000002h
(Z,ZZ)
1.47573996570139475 x 10 20 (Exact)
1.47573996570139427 x 1020 (DSP)
2A7FFFFCh
22000003h
097FFFF8h
OA29ABD8h
EFA46000h
2C29ABDDh
(z,zz)
23319450552284.2434 (Exact)
23319450552284.1250 (DSP)
13907DC2h
Doublelength Quotient
x
xx
y
yy
z
zz
x
xx
y
yy
Z
zz
=
=
=
=
=
=
=
=
=
=
=
=
43000002h
2A7FFFFCh
2C29ABDDh
13907DC2h
1641205Ah
(Z,ZZ)
FC24BE20h
6328365.08044074177 (Exact)
6328365.08044075966 (DSP)
22000000h
097FFFFEh
21000001h
097FFFFEh
007FFFFDh
(z,zz)
1.99999964237223082 (Exact)
1.99999964237217398 (DSP)
D3400000h
x
xx
Z
zz
x
xx
Z
zz
150
=
=
=
=
=
=
=
=
2C2BDDOOh
3907DC2h
61451A4h
(Z,ZZ)
FB39EF11h
4860114.04539400958 (Exact)
4860114.04539400712 (DSP)
21000001h
097FFFFEh
103504F5h
F7BC0784h
(Z,ZZ)
92681.9110722252960 (Exact)
92681.9110722253099 (DSP)
Note that the results were obtained using the programs shown in Appendix B. The
C programs were created and compiled on a 80386based microcomputer running under
MSDOS 3.3.
Set up a local frame by saving the old frame pointer on the stack.
Assign the new frame pointer to the current value of stack pointer.
Allocate the frame.
Save any dedicated registers that the function modifies.
Execute function code.
Store a scalar value in RO.
Deallocate the frame.
Lastly, restore the old frame pointer [4].
The following code segment shows the singlelength addition routine modified to be
in Ccallable form. Note that registers R4 through R7 and AR4 through AR7 are dedicated
registers used by the compiler. These registers must be saved as floatingpoint values.
single
fp
x
y
z
zz
.set
.set
.set
.set
.set
.set
OFFh
ar3
rO
r1
r2
r3
151
w
x1
y1
.set
.set
.set
.global
.width
.text
r4
r2
r3
_add12:
push
pushf
push
Idi
Idi
Idi
absf
absf
cmpf
Idflt
Idflt
dflt
fp
r4
r4
sp,fp
* fp[2],rO
*  fp[3],r1
x,x1
y,y1
y1,x1
x,x1
y,x
x1,y
; Save old fp
addf3
rnd
subf3
rnd
subf3
rnd
pop
popf
pop
retsu
.end
x,y,z
z
x,z,w
w
w,y,zz
zz
r4
r4
fp
;z=x+y
96
_add12:
; Form w = z  x
; zz
= y  [y  w]
; Restore fp
Conclusion
This report presented an implementation of extendedprecision arithmetic routines
for the TMS320C30 DSP. The programs presented include singlelength floatingpoint addition, subtraction, and multiplication, which produce exact doublelength results.
Doublelength floatingpoint addition, subtraction, multiplication, division, and square root
were also presented. The doublelength floatingpoint routines all had a small relative error that appeared in the correction term zz. However, it has been shown that the accuracy
of the doublelength floatingpoint result is at least 13 decimal digits. Table 3 is a summary
of information about the routines contained in Appendices A and B. Execution times shown
152
in the table are given only for the routines in Appendix A. These times do not include
the call and return if the routine is implemented as a called function. They also do not
include any context saves and restores that may be required.
Table 3. Summary Information
Routine
Mnemonic
Appendix
Code Size
Execution
(Words)
(Cycles)
Singlelength Add
_add12
A1
12
12
Doublelength Add
_dbladd
A2
25
25
Singlelength Multiply
"1ult12
A3
35
35
Doublelength Multiply
_mult2
A4
51
51
Doublelength Divide
_div2
A5
115
115
_sqrt2
A6
163
163
C30DBL
B1
C30DBL2
B2
References
[1.]
[2.]
[3.]
[4.]
[5.]
[6.]
ThirdGeneration TMS320 User's Guide (literature number SPRU031), Texas Instruments, Inc., 1988.
Dekker, T.J., "A FloatingPoint Technique for Extending the Available Precision",
Numer. Math. 18, 1971, pp 224242.
Linnainmaa, S., "Software for DoubledPrecision FloatingPoint Computations",
ACM Transactions on Mathematical Software, Vol. 7, No.3, Sept. 1981, pp
272283.
TMS320C30 C Compiler (literature number SPRU034), Texas Instruments, Inc.,
1988.
Kernigan, B.W. and Ritchie, D.M., The C Programming Language, 2nd Revision,
PrenticeHall, Englewood Cliffs, New Jersey, 1978.
Kochan, S.G., Programming in C, Second Edition, Howard K. Sams, Indianapolis,
Indiana, 1988.
153
Appendix A
154
;::
<::!~
~~
~
2
S
~
~.
()Q
HHHfHHHHfIIIIIIIIIlIIIIIIIHfHfHfHHHHfH..
Entry Conditions:
Upon entry (rO,1'1l contains Ix,y)
Exit Conditionsl
Upon exit 11'2,1'31 contiins Iz,zz),
Registers Affected:
rO, 1'1, r2, 1'3, 1'4
Revision: Original
Execution n.t: 12 cycles
H*tHHHH+HHHJ:HfHfffHfHfHHffHffHfHfHff
single
::I.
.set
.set
;;!
....
~
<"\
~
~
.set
.9101>01
zz
xl
yl
.set
.set
.set
Offh
..iddl2
e:=
,0
,I
,2
.>
,3
,4
.Sit
,2
.set
. text
,3
...
=
rLJ.
_iddI2.
~
~
N
C
x,xl
absf
ibsf
capl
Idflt
y,yl
yl,xl
1df1t
y,'
Idllt
xl,y
iddl3
rod
x,y,z
5ubf3
X,Z,III
x,xl
; :xl ) 1'1: ?
; if not, exchuge x " y
.end
....
VI
VI
III,Y,ll
zz
t""
==~
(JQ
Iz=x+y
,od
(JQ
5ubf3
,od
retsu
>
"=
"=
;fol'."=zx
>
Q.
;zz=y~
AUHIOIl: AI Lovrici1
rexa~;
2/21/89
In~;trumenls,
Inc.
neqi~;ter5
AffeGtecl:
rN, r 1, r2, r3, r4, rh, rf}, r7
* Hev i s ion: Or i ~J i na I
* ~::xecu1ion time: 2~) Gycles
**** ***'Ie *** ******(lhl,,,ld
'" ********** **** *** **** **** **** *** ****
.~ll
xx
y
yy
z
lZ
x1
yl
obal
set
set
set
set
set
set
set
set
set
se t
rN
rl
r2
1<3
r4
15
rG
(!
r6
r7
.Iexl
clbI a(ld:
absl
absf
011p"t
I (If I I
I elf! t
I clf I I
I ell I t
I (If 11
I elf It
x,x1
xx, y1
ancl (y,yy)
vy. xx
x1, y
y1, yy
x, y, r
rnc1
subf3
rncl
adell3
!""ncJ
adelt
rnd
aelelf
rnd
subf3
I"nd
acldf3
rnd
retsu
.end
y,x
adell3
adelf'3
I"nd
156
><,x'l
y, y1
y1, x1
+ y
,x.:;
y, s, s
r +
yy, s
r + y .f yy
xx, s
r.f
r, lZ
YY
xx
r + s
s, r ,1
7.
77
"l..l.
S, 17,7'1
2/
Z + S
zz
subf3
f
~.
~~.
::...
;:;.
::!.
Al.JTt('R:
Entry Conditions:
Upon entry (rO,rlJ contains (x,y)
Exit Conditions:
Upon exit (rO,rH contains (Z,IZ),
Registers Affected:
rO, rl, r2, r3, r4, r5, NI, r7
Revision: Original
Execution TiH: 3S Cycles
~.
single
~
N
C
o
C
,gl.bal
...1
set
. set
, .. I
x
Ix
q
by
Iy
zz
.Ieop
set
.set
.5et
.set
set
.set
.set
.set
.text
.....:J
"",hy,p
;p=hXfhy
; fJl.> is faithful
.pyl3
andn
"py13
andn
addl3
rnd
subl3
rnd
addl
rnd
.pyl3
rO
rl
r7
single, te.p
tx,hy,q
single,q
q, te.p,q
;telp=hxfty
; f1(*) is fi.ithful
;q=txfhy
; f1 (f) is faithful
; q=hXfty+txfhy
p,q,z
>
;z=p+q
~
~
I'D
z
l,p,ZZ
zz
q,zz
zz
txt ty, te.p
andn
single, hap
addl3
rnd
ZZ, hiP, zz
;zz=pz
I."p,x,p
; p
single,p
; f1(.) is fiithful
subf3
PIX, hx
Ihx=xp
rnd
hx
addl3
rnd
"",p,""
hx
;hx=xp+p
s.bl3
rnd
hx,x, tx
I tx =xbx
"py13
te.p,y,p
single,p
; p = Y I constant
; fHI) is faithful
p,y,hy
hy
hy,p,hy
hy
,hy=yp
= x cons tint
.=
Q..
;zz=pz+q
;telp=txfty
; flit) is faithful
; zz=PZ+q+tXfty
zz
..
=
'J.l
rttsu
(JCl
,data
I'D
.11.al
'constant, teap
rnd
addl3
rnd
singJe,p
conshntl
Idl
.pyl3
andn
subf3
Ul
.... 1112
om
rO
rl
r2
r3
r4
,5
r5
r6
.... 1112'
indn
"py13
ando
addl3
rnd
HfHHfHftHHHHHfHfHHHHffHfHHHffH+tHf
;:;.
,Iy=yhy
AI Lovrich 2/21189
~;:;.
:!J
~
hy,y,ly
Iy
rnd
.end
4097
; constant
=2"'1242412)+1
==I'D
(JCl
.=.
""d
Ix
;hy=yp+p
u;
00
HHffHHIHtfHfHffHHfHffHflfHHfffHffHUHf
FlKTlOOIEF , "ult2
Al LOYl"'ich
212118'1
Texas InstruHots, Inc.
Entry Conditions:
Upon entry (1'0,1'1> conhins (x,y),
and (1"2,1'3) contains (xx,yy).
Exit Conditions:
Upon exit (1'0,1'1) contiins (z,zz),
Registers Affected:
rO, 1'1, 1'2, 1"3, 1'4, 1'5, 1'6, 1'7
!f
So
~
::l'.
;:.
~~.
~
:I.
So
~
::l'.
'"
'c:>"'
;:.
So
'"
a
C
,hx=xp
rnd
addl3
rnd
p,x,hx
hx
hx,p,hx
hx
subf3
rnd
hX,x,tx
tx
;tx=xhx
opyl3
tup,y,p
; p =Y f constant
andn
singlt,p
subl3
rnd
addl3
rnd
p,y,hy
hy
hY,p,hy
hy
addl
rod
constant
tup,cc
1 CC=X*YY+XX*Y+CC
"
addl3
rnd
;hx=xp+p
CC,C,1
;z=c+cc
.. zz=cz+cc
Algorithl used:
lult12tx, y, c, cel;
cc=x*yy+xx.y+cq
Revision; Original
Execution TiIH': 51 cycles
HHHfHfffHflllllllll.llllllllllllffHHHffHffHf
....1t2
,global
single
Offh
.set
.set
rO
r1
y
.set
p
r2
.set
r3
hx
.Stt
r~
set
Ix
q
r5
.set
r5
hy
.set
ty
.set
r'
.set
rO
r1
zz
.set
xx
.set
r2
yy
r3
.set
r~
.set
cc
.set
r'
t ..pO
.set
r'
.set
r7
tellP
.ttxt
...ult20
; teapO = ~yy
apyl3
x,yy, teapO
udn
single, teapO
; teap = ylxx
y,xx, teap
lIPyl3
single,teap
andn
; tHP = xfyy + ylxx
te.pO. teap
addl
leap
rnd
; (xiyy + y1xx)
telp
pushl
breik:
I ault12(x, y, e, cd
I cc=xlyt+xxfy+ec
subf3
IZ=C+CC
zz=cz+cq
Iconstant, tnp
teap,X, p
; p =X
single,p
AI.JTtI(fi:
z = c ... ee;
;;:
<:l"
~
Idl
apyl3
ando
subf3
hy,y,ty
,hy=yp
;hy=yp+p
,tY'yhy
rnd
ty
hx,hy,p
singie,p
lIjIyl3
indn
Ipyl3
ando
addl3
rnd
apyl3
andn
addl3
rnd
; ttlp = tx I ty
subf3
e, p,ee
;ee=pe
rnd
addl
rnd
addl
rnd
cc
;p=hxfny
;zz=cz
;zz=cz+cc
retsu
(t>
data
.end
>
"C
"C
4097
; conshnt
= 2"(242412)+1
=
e:
>
~
~
;tttp=hXfty
6(t>
;q=txlhy
;q=hxlty+txlhy
t"'4
==
(t>
(JCI
;e=p+q
;ee=pc+q
cc
teap,ee
cc
; ec=pc+q+txfty
teap
; XIy + ytxx
I restore variables
popl
Z,C,tZ
zz
zl,ee,zz
zz
constant:
.float
apyf3
andn
q,ee
subl3
rnd
addl3
rnd

=
...
"C
;:
HHIIIIIIIIIIIIIIIIIIIIIUII"IUllllfHHfHffHffH
I
0<>
15
g.
0<>
~
S
I
I
I
AlgoritM used:
IlUltl2(c, Y. u, uu);
::I.
S
I
I
cc = I,x u uu +xxc
Z = C + ee;
%z=cz+cq
~.
a
C
globol
.5tt
_div2
Offh
rO
rl
r2
r3
r4
r4
r5
r5
r6
rO
rl
r2
r3
r7
r3
rl
r2
r3
set
.5.t
zz
.5tt
y
p
.s.t
.s.t
.5It
hx
.5.t
Ix
yl
.Stt
hy
ty
.stt
.5et
,set
.5et
zz
.5.t
set
xx
yy
tt.p
ttopl
teap2
...t
set
.s.t
... t
.s.t
.5.t
cc
u
uu
VI
pushf
rO
iSh
24,rO
popf
f
xx
Sive x
; Now Rl
x[O]
:I
opyf3
.. dn
rO,r1,r2
singlf,r2
; R2
subrf
2.0,r2
r2
; R2 = 2.0 
r2.rO
lin91e,rO
; Rl
rO.rl,r2
sing1t,r2
2.0,r2
; R2 = V
andn
Sivt
24,rO
rO
rO
1.0" 2H(;1).
:I
udn
subr;
rnd
opyf
indn
I:
I:
Y f x[OJ
x[1]
V f x[O]
I:
x[OJ
(2.0  v
f x[O])
x[tJ
; R2 = 2.0 
V f
x[1]
r2
r2,rO
singl rO
~
~
6
~
...
<
rnd
lIIIyf
save yy
rO
1,rO
~.
rtfgi
subi
uh
push
f
Q..
opyf
yy
xx
rl
pop
the RI = 127  I = 128. Thus x[Ol = 0 ond this _ill (iuse tho olgoritha
tfxt
1.0
rl
_div2'
pushf
1"1,1"3
pushf
yy) I y;
pushf
ldf
..bsf
tHffHtHftfHHHHffHfHHflllllllllllllllllffHU
single
inv_f:
Rtvision: Original
<)
c=x/y;
'"
'"
to
Regist.rs Affected:
1'0, 1'1, 1'2. 1'3, 1'4, 1'5, r6, 1'7
S
Entry Conditions:
Upon entry 11'0,1'0 contains Ix,y),
and (r2,r3) contains Cxx,yy),
Exi t Conditions:
Upon exit (1'0,1'11 contains (Z,lZ),
...
;:s
; SlY' Y
Icx/y;
"
S
pushl
8l
"pyf
lAdn
subr
,nd
lI'yf
iOdo
1"0,1'1,1'2
li0911,1'2
2.0,,2
,2
;R2=vlx[21
1'2,1'0
; Rl = x[3] = x[2]
o,yf
iOcin
;R2:cv
.nd
lI'yl
1"0,1'1,1'2
Sillglt,rO
2.0.,2
.2
1'2,1'0
iOdn
singl.,tO
ldl
opyl3
udn
leonstant, tap
.. bf3
rnd
oddf3
,nd
p.x,hx
I hxx"p_
hx
hx.p.""
hx
I hx=xp.p
bI3
.nd
hx,x, tx
Ix
;tx=xhx
opyl3
top,y,p
singt."
; P . y I eGnstlRt
lndn
bI3
,nd
oddl3
.nd
p.y.hy
hy
hy.p.hy
hy
Ihy'yp
bI3
.nd
hy.y.ty
ty
IIY'yhy
opyl3
!\x,hy,p
singl."
I p""'hy
;I.... hx.ty
; R2 =.2.0 ..
x[2]
(2~O
subrf
; R2
x[31
=2.0  v
x[31
I::
1:3~
~S~
~.
~
~.
~
::I.
S;:
~.
"l
g
S'1>
N
C
Q
C
lI'yl
iOdo
Jubrf
.nd
lI'yl
iO'"
addf
.nd
Noll tho CUt .1 v
..,I
1'0,1'1,1'2
singll,r2
1.02
.2
1'0,1'2
lingll,1'2
.2 0
; R2 = v x[41 = 1.0.. 01 .. =) I
rO 1
<0 i.
hlndl,d.
Idl
ldln
1'1,1'2
.33
1'2,1'1
Idl
rl,t.
; MVt lly
o,yl
tncin
I IUlU2le, y, u, uu)
lI'yl3
Indn
lingl., ttlp
lI'yf3
.. dn
Iddl3
rod
lx.hY,q
singl',q
q, t ..... q
q
lI'y13
III'"
; rtstor. y
; rtstort x
; SlY. x
yl,x
singlt,x
Ie;,; x I
stv. vt.rilblts
puohl
phl
lndn
x
yl
; SlY' e
; SlY' lly
Ihy'yp+p
;qtx'hy
lqahxtty+txfhy
I rtstort vlritb1ts
popl
popl
pu.hl
I P x constant
 v x[2])
singl.,rO
1
t ..... x.P
singl",
Iddl3
rod
tty)
.vbl3
rod
oddl
.nd
tx, ty,ttlp
.t..,tx.ty
singlt, ttIP
P.,q,u
;up+q
U,p,UU
tUUpu
VI
q,uu
v.
I uupu+q
popl
popl
popl
subl3
end
subf
rnd
popl
oddl
rnd
popl
.pyl
indo
S
i\'
i\'
JJ
S.
~
~.
~
S
subf
...
S.
rod
opyf
indn
yl
c
restore lIy
res tOte ,
t p
testort x
U, tNP,CC
(c=xu
cc
UU,CC
;cc=xuuu
cc
hap
ttllP,CC
; restore xx
; cc =x  u 
YU
+ xx
cc
ttlp
c, ttlp
single, tMP
tUp, ec
; restore yy
; c yy
; cc=xuuu+xxc*Y)
cc
=( x 
yl,ec
single,cc
; cc
(,Ce,l
IZ=C+C'
l,C,ZZ.
;zz=cz
u  uu
xx 
:::So
;:
IZ=C+CC
oddl3
rod
ri'
S.
zz=cz+cc
subf
rnd
aW
rnd
rehu
'"
~
~
Q
C
0'1
zz
ce,ll
; ZI
=C 
+ CC
zz
dati.
cOIlsh.ot;
.floit
.end
4091
; conshnt
= 2"(2424121+1
C f
yy ) I Y
....
flll:TllII IEF
pu.hf
",yf
..d.
pushf
pop
uh
.sqrt2
li/liiii. Al ltYricb
2121189
EAtry Conditio.s
Upon try (rO,rU cooloi (x, xx).
Exi t Conditiensl
(ztu),
Rlglsl.r. Afftcl.d'
rOt 1'1, .1'2, ,.3, r4, rS, 1'6, 1'7
IHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHH
~
;:
Revisionl Drigi....1
Extculio. n ... 163 Cycl
.globtl
sinlle
...t
y
p
hx
Ix
q
by
tt
ot
.Ht
.Ht
.Ht
.Ht
If
... t
OQ
S~
~.
I
cl'
a"
.Ht
.HI
zz
xx
t..,
S;:
cc
::I.
"'
g'"'
::to
.
cI
nogi
h
push
popf
"'
tv
C
Q
C
C
I
:II
O.25,rO
singl.,rQ
single,r2
.o,r2
I r2 (v/2) [0]
zz
lIIyf
..2,1'1
singl',r1
.o,r2
I r2 (v/2) tIl
.1Il
singlt,r2
; r2' 1.5  (v/2) 1Il 1Il
r2,rl
singl',rl
I rl [21
= .[n
subrf
1.5,r2
r2
r2,rl
singl.,r!
; rl
rl,rl,r2
singl.,r2
rO,r2
lingl.,r2
; r2 x[3]
r.d
IIIYf
Inda
IIIYf
Indn
lIIyf
..d.
.[2]
singl.,r2
I r2
= x[31
= (v/2)
x(3]
[3]
t""
~
(;I.l
i!
~
rl,rl,r2
sin'gl.,r2
.o,r2
[2]
1:1"
1.5,r2
r2
= (v/2)
=
&
.1Il
lIIyf
.. d
indn
xx
.[01
; r2
IIIYf
Indn
1.5,r2
r2
I r2 111
rnd
IIIYf
.[0]
,.1,rl,r2
lingl.,r2
Indn
sub"f
; Slye y.
= I.S
Indn
lIIyf
rO,r3
5ing1.,r2
=....
~
&odn
lIIyf
andn
>
"=
"=
1:1
I r2 = x[O]
5ubrf
letr.ct tH ..p....t of v.
pu.hf
rl,rl,r2
rnd
.0
retslt
rl
24,rl
rl
rl
"pyf
lIIyf
..d
sqrtlx)
ldf
",yf
..d.
_.qrt2
Offb
rO
rl
r2
r3
r4
r5
r5
r6
rO
rl
rl
r7
r2
r3
.sqrt2t
S
...t
.Nt
.Nt
...t
...t
.HI
.Nt
.Ht
.to.t
~
~
rO
rl
25,1'1
Gent"t. v/2.
z .e+ce,
zzcz+cc,
I
UII),
cc ( x  u  uu + xx ) 10.:11 I CI
;:
; SlVI Ie
Algorilla u..d'
c ill 541"tex),
IUtt12ec, c, u.
2.0.rO
singl.lrQ
x[31
::::
<:l~
~
;:
S
S"
<><>
;0
g"
~
:::So
'"
tv
[3
C
0'1
W
single,r1
opyf
,..1,1'1,1'2
single,r2
rO,r2
5iogle,I'2
1.5,,2
,2
; 1'2
1'2,1'1
; 1'1
andn
<::>
S
andn
subrf
rod
apyf
i::l
'~"
"I'yf
~.
1'2,1'1
andn
<><>
SS!
1.5,r'2
,2
rod
opyf
subl'f
tl
<::>
andn
Idf
single, 1'1
rl,rO
apyf
1'3,rO
andn
5i091e,1'0
= (v/2)
opyf3
andn
opyf3
indo
oddf
rod
Idf
I"p,q
1 q=hxfty+txfhy
x[4]
11.5  (vl2l*d41tx
.pyf3
Indo
.ddf3
rod
tX,ty,ttap
;telp=tx+ty
single, teap
p,q,u
u
lu=p+q
subf3
u,p,Uu
;uu=pu
= sqrtlxl
rod
oddf
rod
oddf
,nd
; save c
x,y
singl',q
I"
Slve variables
pushf
;ttlp=hx+ty
;q=tx*hy
x(4] f x(4)
=x(5]
uu
;uu=pu+q
q,uu
uu
, uu=pu+q+txfty
telp,tltI
uu
.. cc =.!
X  U  Uti
+ xx I
popf
popf
subf3
'nd
subf
'nd
popf
addf
,nd
telp
teap,cc
cc
; restore xx
;cc=xuuu+xx
push!
pushf
cc
; sa~e cc
; save c
0.5 / c
It:onstant, hip
hlp,x,p
; p = x * constant
single,p
5ubf3
,nd
oddf
,nd
p,x,hx
hx
p,hx
hx
;hx=xp
subf3
,nd
hx,x,tx
Ix
;tx=xhx
lIflyf3
andn
tup,y,p
single,p
; p
subf3
,nd
oddf3
,nd
p,y,hy
hy
hY,p,hy
,hy=yp
hy
subf3
,nd
hy,y,ly
Iy
,Iy=yhy
lPyf3
andn
hx,hy,p
single,p
,p=hx1hy
;hx=xp+p
= y " constant
,hy=yp+p
hip
u, hap,cc
; r.storf C
; rutorf x
;cc=xu
cc
uU,ce
;cc=xuuu
cc
r2,r3
,2
; v is
.2
.1
uh
24,.1
of y
rl,r2,rO
single,rO
; Rl =
1.0,rO
rO
rl,rO
singlt,rO
rO,.1
; RO [5] (.[4]*(I.o(VOX[4llU+>[4]
rl,r2
y ..
x[4]
>
.pyf
tndo
subrf
d
opy!
Indn
add!
I:
1.0 01
negi
.1
iSh
1,.1
24,.1
posh
popf
.1
.1
Jlbi
i
l;;"
d
opyl
Indo
~
So
opyf
..do
1~
~.
~
::!.
So
I
g
So
<1>
Q
c
subrf
opyl
..do
rl,r2,rO
iinglt,rO
2.0,rO
rO
rO,.1
single,r'
rl.,2,"0
singl.,rO
2.0,rO
rO
rO,.1
ntgf
Idl
Idln
opyf3
..d.
sub.f
<0 is
rO,r2
.. restore YII'ilblu
popf
popf
opyf
indo
opyl
indo
; RI 2.0  v I .[0]
; RO x[IJ = x[O] I (2.0  v I x[Oll
t p
cc
0.5,ce
singl.,cc
r2,tc
; restore c
; restore cc
; cc ( x  u  au + )0(
; cc
=( x 
singl',ee
,RI=.lx[IJ
I=C+C(
; Rl: 2.0 
V t
xU]
oddl3
.nd
t.I"ce,l
; z=c+cc
I, tap,ll
I ZZCZ
zzcz+cc
,RI=v l .[2]
2.0,rO
rO
; RI = 2.0  v I .[2]
d
opyf
..d.
to,r!
siftJ 1t,rl
opyl
..d.
sllb.f
d
.,yl
rl,r2,rO
,RI =vo.[3]
2.0,.0
..d.
liD.I.,rt
5ubf
.nd
oddl
.nd
zz
CC,IZ
; zzcz+cc
zz
!'thu
s11l91.,rO
.ellto
constant:
; RI2.0vI.[3]
I RO [41 [3] I (2.0  * .[3ll
float
d
4'm
0.5
u  uu + xx ) .. 0.5 I c
single,rt
rO
rO,.1
IFYF
hlndltd.
r2,rO
r3,r3
; Rlv*x[Ol
rl,r2,rO
Jingl.,tO
opyf
..d.
slIbrf
I.
, const.. t 2"(2424/2)+1
Appendix B
165
166
r
~
!So
I
long
long
int
long
~.
...
::to.
~
;:
~
r;'
~
So
'"
if (operation = 1) z = x + Y;
if (operation ZIi: 2) z = x  y;
if (operation lIZ 3) z x I YI
if (operation = 4) z = x / YI
if (opetltion = 5) z II sqrt(x);
printfC\nz II 1.1Slg, z),
f.::i
N
a
<::>
z=x+Y;
printff'nz := 1.18L1~ z);
printfC\n\nTypt 0 to txU, ehe continue: It
scanf
.ill
,lit,
) IIIAn. (i != O}I
)
I'D
=
a
represent it lS O /
(=
t:=
""""
127) return(O);
t::'=
OIJQ
I'D
=
="~
;~
if Csignl .antiSSl
II
=
..0,=
o tIllJQ
, I'D
if (Mntiua. = OxOl000000H
exp++;
HntiS5I = 0x00900000,
return(Rntissa);
}
~~
I'D (')
I
exp += 127;
Mntissa = <M.ntissl "0x007fffff) I (Sign 8) : (exp
~OO
I'D .. '
...ntisH;
<::>
/f fxp=l28 corresponds to O.
i=l;
dol
printfCType h.o C30 hex nuab.r5:\n)1
printfl x = .);
sClAHXX,b1);
printfCy = .);
scanf('XX','YIl;
xl = c30tot(xl);
x = (long doublelilln ..t fltlxlll;
yl = c3Otoelyll,
y = (long double)(flfloat f)('Y1l I;
dol
p,;ntft'Addlll, SUbl21, lIpyl31, D;v141, Sq,t1511 "I;
scuf(Xd, '.ptrdion);
) wIli It (operation" :: operation)Sh
"CI
z,
double x, Y.
int xl, yl;
i, operation;
int c30toe(1ong inU;
if (exp
sign = x II 0x00800000;
exp = x
24;
MinD
too'
AU_bel'S
23)1
=
l!3,
~
til
.. ,
~
~
SO
...i.n
I
long doubl. x, Y.
I ... int Xl, yl, xxi. yyl,
ilt it ope....ti..n;
10., i.t c30t.. n.., I.tl,
jl,
I,
s.
:?J
~
~.
dol
p.i.tflMelIII. MI2I ..,,131. 010141. Sq.tISII '1,
1ClIf1'1d' .......U.. I,
(optrltio.<1 n optNtion)5);
) "il,
::...
if I.....ti .... II
if C.....ti 21
if C.....ti.... 31
if I.....U.... 4'
If lo....ti .... 51
l!'
::I.
S.
long int
ptintfll\u
II.
Lla.,I,
s.
.cull'U' .'YII,
xl c30tMlxll,
x non, d....I.II.lfl ..t '11"111;
yl c30tHlyll,
y' non, .....I.II.UI ..t 'II'YIIII
Z&x+Y1
"
Q
c
bp;
I: . . . .
ntissil
;
!'~
g.~
;':19
,=
l~g.
g' 7'
Fil'
~.~
t'I)
E.
.'
III
if I...ti ...
= 0.0100000011
exp++1
antiss.
0x008000001
if (exp
z),
prilltf(IZ )1
=
~
x,
sign. x 0x0Q800000.
expx241
I
Z =x )'l
z'xYI
z x "YI
I x / YI
Z Iq.tlxl,
output in hex/
print"'x '},
J:
In'
dol
.cutfl'U,bll,
prhUC'xx '1;
IcUf(U;' ,bxU,
prittfC'y I),
....fIU.'YII,
priAtfC' " II,
.clllfIU.,"11,
xl c30tHlxll,
xxi c30tHlxxll,
yl c30tHlyll,
yyl c30tHlyyll,
X' non, doubl.II'UI ..t '11..111
no., d.... I.II.lfl ..t 111...111,
y' non, doubl.II.lfl ..t 'II'YIII
non, _1.II'lfl ..t '11,"111,
>
>127)
scufC I U,lrxl),
printfClu III
~,
... =
=
~
51
gO
William Hobl
Digital Signal Processor ProductsSemiconductor Group
Texas Instruments
169
170
Introduction
In the general class of orthogonal transforms, there exists one in particular, the
discrete cosine transform (DCT), that has recently gained wide popularity in signal processing. The DCT has found applications in such areas as data compression, pattern recognition, and Weiner fIltering, primarily becauseofits close comparison to the KarhunenLoeve
Transform (KLT) with respect to rate distortion criteria [1]. Although the KLT is considered to be optimal, there is no fast algorithm to compute it. Since there is no fast KLT
algorithm, the DCT is an attractive alternative.
For image coding, the DCT works well because of the high correlation among adjacent data samples (pixel values). Because of this correlation, the DCT provides near optimal reduction while retaining high image quality. In a comparative study [2], the DCT
was shown to outperform the Fourier, Hartley, and cascas transforms for image com~
pression, providing even more motivation for finding fast implementations.
A number of algorithms have been developed, most notably those of HOll [3] and
Lee [4], which generate higherorder DCTs from lowerorder ones. This paper presents
two 8 X 8 DCT routines, one for the TMS320C25 and another for the TMS320C30, based
upon the routine in [3].
171
Zk =
 ex(k) NI
E
~
N
n=O
Xn
cos
(7r (2n+
I)k)
k = 0, I, . . . ,N  I
2N
(Ia)
xn
N.I
E ex(k)Zk cos (7r (2n+
~
N k=O
2N
I)k) k = 0, I, . . . ,N  I
(Ib)
~T(N)X'
(2)
where x and z are column vectors denoting the input and output data sequences, and T(N)
is the DCT matrix of order N. Actually, expanding the matrix (neglecting the factor of
172
Zo
Z2
ex
(3
ex
ex
ex
(j
(3
(3
(j
Xo
X2
(3)
(3
where
0(
and 0
= sin (~).
expressed as
Zo
Xo
Z4
0(
0(
0(
0(
0(
0(
0(
0(
x2
Z2
(3
0
(3
(3
0 (3
x4
Z6
(3
0
(3
(3
0 (3
x6
Zl
)..
P.
p
'Y
}..
p.
'Y
x7
Z5
P.
'Y
}..
p.
p
'Y
}..
X5
Z3
'Y
}..
P.
'Y
}..
p.
p
X3
Z7
'Y
}..
P.
p
'Y
}..
p.
Xl
where}..
cos
(~), 'Y =
cos
(4)
the input is no longer in natural order but has been rearranged according to the permutation
matrix P and the relation
x = Px,
(5)
where
An 8x 8
173
Upon examination, the matrix T(N) in (4), which is the matrix T(N) with the rows and
columns rearranged, can be described more compactly as
. T
(~)
15
(~)
T(N;
(6)
since the upper half of the 8point DCT is exactly the 4point DCT matrix previously
generated. Using the results obtained in [3], the relationship between
T
15 (~)
and
(~) is a given as
(7)
where
K
= RLRt,
R being the matrix that performs a bit reversal on the input data; L is the lower triangular
matrix
and Q
diag [ cos
1
2
1
2
2
2
1
2
2
2
2
2
1
2
2
2 2
(n
is now in bitreversed order. Signal flow graphs for 2point, 4point, and 8point DCTs
are shown in Figure 1, with the multipliers defined as in (4).
174
Zo
Zo
Xo
"
Zo
Xo
Z2
X,
X2
Za
a
X,
"
"
Z,
Z2
Z,
X3
1
Z,
2:1 MUX
"
Za
2ptDCT
1:2DEMUX
(b) 4Point
(a) 2Point
Zo
Xo
.z,
X,
Z2
X2
Zo
Xa
..
X.
"
"
Zo
Z,
"
"
Z3
I'
X.
"
"
Z,
"
Z2
Zo
Z.
tJ
X.
y
"
4ptDCT
X7
Z7
2:1 MUX
1:2DEMUX
(e) 8Point
Figure 1. Signal Flow Graphs for 2Point, 4Point, and 8Point DCTs
The structure of the algorithm looks very much like that of a Fast Fourier Transform
(FFT). since the most fundamental computation is a 2point butterfly. This routine is actua1ly
a generalized case of the CooleyTukey FFT algorithm with the addition of the recursion
at the end. If the equations for the signal flow graph are written explicitly. the recursive
nature of the DCT becomes clear; for a 4point DCT. we have
Zo
Z2
= Zoo
= Z2.
Z1
= z],
Z3 = 2Z3  Z],
175
Zo =
Z4
Z2
Z6
Z1
Z3
zs
Z7
ZO,
= Z4,
= Z2,
= Z6,
= Z],
= 2Z3
 Z],
= 2zs  Z3,
= 2Z7  zs
transform is obtained by completely reversing the direction of the signal flow graph; i.e.,
performing the bitreversal first, then the recursions and the butterflies, and finally, the
data permutation.
For the twodimensional case of interest, the OCT can be described in the form
z(k,I)
1 (,)
= 2
x(m,n)
= 1N
(11" (2m+
l)k) cos
2N
(11"
(11" {2n+
2N
1)0 (8a)
(8b)
where a (k) = ~ for k = 0, unity otherwise. Like the FFT, the OCT kernel is
separable, allowing the transform to be performed in two steps, first along the rows and
thenthe columns.
176
zo
Zl
A
'Y
P,
p
p,
Z2
fj
0
fj
fj
0
fj
X2
A
'Y
X3
a a
X4
A p,
x5
Xo
'Y A
Xl
Z3
'Y
p
A
p,
p,
Z4
a
a
Z5
p,
A
'Y
'Y
p
Z6
fj
fj
0
0
fj
fj
X6
Z7
p,
'Y
A
A
'Y
p,
p
X7
where A = cos
(~),
'Y = cos
(R)' p,
(7)
= sin (~)
16 .
The algorithm described above has been shown to be numerically stable for fixedpoint processors; however, to prevent serious data errors, truncation and roundoff must
be accounted for. A roundoff technique similar to the one in [6], is used to prescale the
matrix coefficients by (2 15  1). This product is then loaded into the accumulator with
a onebit left shift, effectively dividing it by 215. After a mUltiplication is performed, the
32bit value in the accumulator must be rounded to sixteen bits, where bits 13,14, and
15 are used to determine the value of the sixteenth bit. The TMS320C25 performs this
operation in a single instruction by adding 3000h to the accumulator product with a onebit left shift, as outlined in the code shown in Figure 2.
177
*
*
*
DCTINI
*
*
*
T2
LDPK
RSXM
SPM
LRLK
RPTK
BLKP
LRLK
RPTK
BLKP
RNDOFF
1
AR1,COEFF
EDATAIDATA
IDATA,* +
AR1,RNDOFF
10
EDATA,*+
SIGNEXTENSION MODE
LEFT SHIFT 1 BIT
COEFFICIENTS
VARIABLES
AR1,DST
MAR
LAR
LARK
LT
MPY
ZAC
RPTK
MAC
*+,AR2
AR2,SRC
AR3,7
* + ,AR2
C10
LTA
MPY
ADD
SACH
BANZ
* + ,AR1
C10
RNDOFF
*0+,AR3
t2,*,AR2
6
C11, * +
178
After the multiplications are computed, the results are stored in another array area
in transposed order; thus, a separate routine for transposing the matrix is not needed. Once
the rows are transformed, the pointers for the input and output matrices are exchanged.
When the procedure is repeated, the output is stored as rows, completing the transform.
Appendix A contains a complete program listing for the forward transform on the
TMS320C25. To perform an inverse nCT, the table of cosine coefficients should be
replaced with those used for an inverse transform.
*
BITREV
II
II
II
II
LDF
LDF
STF
STF
LDF
LDF
STF
STF
*ARO,RO
*AR2,R1
R1,*ARO
RO,*AR2
*AR1,RO
*AR3,R1
R1, *AR1
RO,*AR3
179
Because of the amount of data shuffling that occurs, an eightword scratchpad vector
has been created with four permanent pointers set up at every other memory location.
This allows access to each element in the vector (by predecrement or preincrement
addressing) without requiring constant alteration of one or two pointer locations. Although
there is no overhead for looping on the TMS320C30, straightline coding is used as much
as possible to increase performance.
You can transpose the DCT matrix in the same way as in the TMS320C25
implementation: namely, store the transformed row vector as a column vector in another
matrix and interchange the input and output pointers.
The complete routines for the forward and inverse transforms are given in Appendix B.
Results
The execution times and memory requirements for the two routines are given in
Table 1. For the TMS320C30 implementation, the forward transform contains the scale
so the transform is not unitary. When the signal flow is reversed,
factor of
1,
instructions accumulate and thetime required to perform the inverse transform actually
increases (see Table 1). This increase occurs because certain multiplications cannot be
performed in parallel with another instruction. The two times are identical on a TMS32OC25
because it uses a matrix routine to compute the transform.
Table 1. Execution Times and Memory Requirements
Device
Memory Required
Program
TMS320C25
TMS320C30
Time Required
'Data
(f'S)
232 words*
203 words
257.3 (forward)
232 words
203 words
257.3 (inverse)
148 words**
136 words
155 words
136 words
99.4 (forward)
107.9 (inverse)
180
Summary
Two routines for a twodimensional Discrete Cosine Transform are presented: one
for the TMS320C25 and one for the TMS320C30, with a development of the algorithm
given for clarification. This report also discussed the similarities of the DCT to the CooleyTukey FFT algorithm and arithmetic shortcuts which can reduce the DCT's execution
time. Although these implementations use the most recent formulation, there is still room
for investigation into more efficient methods. Another approach that might prove fruitful
is to deal with the entire 8 x 8 array all at once, as suggested by Haque [7], rather than
transforming the array by rows and columns. However, both routines given in the
appendices provide fast, numerically stable solutions for applications requiring the DCT.
Acknowledgements
The author thanks Steve Ford for supplying the original code for the TMS320C25
implementation. Francois Charlot helped in modifying the code for the TMS320C25, as
well as in preparing this manuscript. Daniel Chen improved the performance of the code
for both the TMS320C25 and the TMS320C30.
References
[1] Ahmed, N., Natarajan, T., and Rao, K.R. "Discrete Cosine Transform," IEEE
Transactions on Computing, vol. C23, pp. 9093, January 1974.
[2] Perkins, M. "A Comparison of the Hartley, CasCas, Fourier, and Discrete
Cosine Transforms for Image Coding," IEEE Transactions on Computing, yol.
36, pp. 758760, June 1988.
[3] Hou, H.S. "A Fast Recursive Algorithm for Computing the Discrete Cosine
Transform," IEEE Transactions on ASSP, vol. ASSP35, No. 10,
pp. 14551461, October 1987.
[4] Lee, B.G. "FCT  A Fast Cosine Transform," Proceedings of 1984 Conference
on ASSP, pp. 28.A.3.128.A.3.4, March 1984.
[5] Jayant, N.S., and Noll, P. Digital Coding of Waveforms, New York, PrenticeHall, 1984.
[6] Srinivasan, S., Jain, A.K., and Chin, T.M. "Cosine Transform Block Codec for
Images Using the TMS3201O," Proceedings of IEEE ISCAS '86, Cat. No.
86CH22558, vol. 1, pp. 299302.
[7] Haque, M.A. "A TwoDimensional Fast Cosine Tranform," IEEE Transactions
on ASSP, vol. ASSP33, pp. 15321539, December 1985.
181
00
fff*IHfHlltfffHfHHfHffHHfHHtftHHffHHfHitffHfHfffHIHHHHff*
MAC
COl,++
LTA
I1PY
ADD
t+,AR2
C_OO
IV
SACH
BANZ
**fHHHffHIHfHHffHHHfHHlffHfHfHHtHIH*HHHHUHHHffHHH
. title
LAR
12
OCTlNI
00
X
00
tl
1;;'
C ()
s.~
"'
~ ~.
DIMS
Q~
C C
1t ~'
~~
~;:!
1.A.i"'
N;:S
1
ARl,COEFF
EDATAIDATA
IDATA,*<
ARl,RNDOFF
10
EDATA,H
lARK
LARK
CNFP
AR7,1
MO,S
LAR
lAR
ADAK
AR7: DIMENSI(}Il
POINTER INCREI1ENT FOR DATA TRANSPOSITION
"MAC" NEEDS 1 OPERAND IN PROGRAtI MEMORY
T3
.fqU
T1
M3,7
ARl,SRC
AR2,DST
H
C_OO
>
'CI
'CI
(D
=
~
ft,AR2
UO
~
nI::'
13
Cll, ..
H,AR!
UO
RNDOFF
10+, AR3
T2, +,AR2
lARP
lARK
LT
I1PY
ZAC
RPTK
MAC
LTA
I1PY
ADD
SACH
BANZ
ARI,SRC
AR2,DST
2
1
AR3,7
H
UO
C21,ft
H,AR2
UO
RNDOFF
'0+,M3
T3,+,ARI
lAR
ciS
a 15
C;:S
>........
IJCl
0
""I
LARK
LAR
lAR
LT
I1PY
ZAC
RPTK
ARl,DST
H,AR2
AR2,SAC
1il3,7
; VARIABLES
v,'i+,
...
$PM
lAR
, LARK
LT
I1PY
ZAC
RPTK
MAC
LTAS
MPY
ADD
SACH
BANZ
; SIGNEXTENSI(}I MODE
; LEFT SHIFT 1 BIT
; COEFFICIENTS
(')
~~
~~
RNDOFF
LRLK
RPTK
BLKP
LRLK
RPTK
BLKP
;:s
;:s ~
LDPK
RSXM
'SxS OCT'
MAR
.sect
B
. text
RNIIOFF
<O+,1il3
n,<,ARI
14
AOOK
lARP
lAR
lARK
LT
I1PY
ZAC
ARl,DST
3
2
AR2,SAC
1il3,7
H
c..30
=a
8"
""I
;.
(D
13
~
'JJ
(M
N
Q
U1
5t
s.00X
~~
\.\l'"
~~
Q~
LAR
LAR
.
1;
LARP
~~
T5
~<:i
C~
Q~
C.
!f~
LARK
LT
If'y
lAC
RPTK
rIAC
LTA
If'y
ADD
SACH
BANI
TO
C41,H
".AR2
C.40
RNDOFF
to+.AR3
T5.".ARI
AlJRI(
LARP
Tb
LAR
LARK
LT
I'I'Y
lAC
RPTK
rIAC
LTA
If'Y
SEVENTH SET
LAR
LAR
ADRK
LARP
LARK
T7
LT
If'Y
lAC
LARP
It
BANI
LAC
lII10V
SACL
ClEFFICIENTS
ARI.DST
7
2
AR2.SRC
AR3.7
..
C.70
b
C71.*
H,ARI
C.70
RNDOFF
to+.AR3
T8,f,AR2
DST
SRC
SRC
AR7
DI/IS ..... ARI
CNFD
B
, STOP tRE
.i.Sfct
.labt1
.lIIord
ARI.SRC
AR2.DST
COO
COl
CO2
C03
C04
COS
.word
I
AR3.7
*
C.bO
COb
ord
,IIUIrd
.!liord
,lIIord
COEFFICIENTS
RNDOFF
to+.AR3
T7.".ARI
page
C51."
H,ARI
C.SO
RNDOFF
fO.,AR3
T6,f,AR2
~
Cbl,ft
".AR2
UO
STOP:
SACH
BANI
LAR
LARK
LT
If'y
lAC
RPTK
rIAC
LTA
If'Y
ADD
SACH
BANI
ARI.DST
5
2
AR2.SRC
AR3,7
C.SO
ADD
LARP
C.40
LAR
::to
AlJlI(
00
LAR
ARI.SRC
AR2.DST
4
I
AR3.7
AlJRI(
s.~
!II
~~
EIGHTH SET
1.1,("')
.,
rIAC
LTA
If'Y
ADD
SACH
BANI
C31,1+
H,ARt
UO
RNDOFF
'0+.AR3
T4, .... ,AR2
rIAC
LTA
If'y
ADD
SACH
BANI
!II
RPTJ(
RPTK
C07
CIO
Cll
CI2
.word
.lIIord
.word
,word
.word
'RC!lEF' .OFFOOh
IDATA
5792
5792
5792
5792
5792
5792
5792
5792
B034
6811
4551
,
,
SEcaGl ROW
COEFFICIENTS
CI3
CI4
CIS
Clb
CI7
C20
C21
.word
word
.word
.word
C22
.word
.word
,word
,word
.tlord
cn
C24
C25
C2b
C27
C30
C31
C32
~
00
X
00
tl
C
:I
1;;'
t')
C C
CbS
COb
lr~
COl
ClO
C71
cS
Q g,
C:I
oword
.word
,lIIord
,1II0rd
~~
~~
~'"
~~
tv:l
,lftIrd
,lIford
.1II0rd
.lIIord
C35
C3b
C37
C40
C41
C42
C43
C44
C45
C46
C47
C50
CSI
C52
CS3
C56
C57
C60
COl
CO2
C03
Cb4
.1II01"d
C34
C54
C55
9!
.,
.word
C33
s.f\
" n
~ ~,
,lIIOrd
,1II0rd
,1liioI'd
.word
.lIIord
,lIIord
.word
.word
,word
.lIIord
.word
,word
,lIIord
,lIIord
,word
,lIIIord
,\ford
.1II0rd
o!llord
,lIIord
.word
,lIIord
.1II0rd
,word
,lIIord
,lIIord
.1II0rd
,lIIord
en
1liiOI'd
C73
C74
C75
.lIIord
C77
,lIIord
cn
,lIIord
.word
.word
.libel
1598
1598
4551
0811
0034
75b8
3134
3134
7Sba
75b8
3134
3134
7Sba
ball
1598
0034
4551
4551
9034
1598
ball
5792
5792
5792
5792
5792
5792
5792
5792
4551
S034
159S
ball
ball
1598
0034
4551
3134
7Sba
75b8
3134
3134
7Sba
75b8
3134
1598
4551
ball
9034
0034
ball
4551
1598
DATA
.word
.word
.lfOrd
,lItOI'd
,word
,lIIor6
,word
. .,ord
.lfOrd
.,ord
.lfOrd
12288
PICT
RESULT
5792
0034
7Sba
ball
5792
4551
3134
1598
; RWOlfF FACTOR
;
;
;
;
;
;
;
;
;
;
AIDlESS IF PICT\H:
ADIIlE5S IF 1lESlX.T
COO ctEFFlCIEHT
CIO ClEFFICIEHT
C20 ctEFFICIEHT
C30 ctEFFlCIEHT
C40 ctEFFlCIEHT
C50 ClEFFiCIEHT
C60 ctEFFlCIEHT
C70 ctEFFlCIEHT
DATA rfFlNITIONS
; FOURTH ROW
; FIFTH ROW
[f
[f
CrfFFlCIENTS
COEFFICIENTS
CrfFF
.usect
,BSS
,BSS
,BSS
,BSS
,BSS
,ass
,ass
,ass
,BSS
,ass
,ass
,BSS
,BSS
; SIXTH ROW OF COEFFICIENTS
; EIGHTH ROW
[f
CrfFFICIENTS
.end
"COEFFS' ,b4
PICT,b4
IlESlX.T,b4
RHIKFF,I
SAC,I
OST,I
c..OO,1
UO,I
C20,1
C_lO,1
C_40,1
UO,I
c..bO,1
C_70, I
fHHHffflfffffffffHffffltHHffffHffHttHffHtfflfHHtHfHnHHHfHH
;:s
s.00X
<II
~~
td~
C~
n~
v,C"l
.
s.~
...
tffftfttttHfHlHtHfUtHlHffHttHtHtHHHfHffHHtttHtftHtHtHHHI
"
~~
~~
tdc
C~
C~
~
~
is
:=:o
;:s
_COS
INPUT
OUTPUT
SCRATCH
SCRlAST
RTNI
RTN2
.BSS
.BSS
.BSS
.global
.global
.di.ta
M,64
INF,64
SCR.8
COSTAS
START
.Mord
.word
.lIIord
.word
.Mord
.word
.liord
COSTAS
INP
M
LDI
LDI
LDI
LDI
LDP
LDI
lDI
LDI
lDF
LDF
LDI
RPTB
BRD
LDI
LDI
ADDI
00
VI
BLKl
5CR
5CR+7
TRIVlSl
TRANS2
. text
4
START
II
, SCRATCHPAD IIIORY
7,Re
2,IRO
S,IRI
S,B!(
!!SCRATCH
!SCRATCH, AR4
@ooTPUT,ARb
@INPUT,ARS
Q, 25, Rb
2.0,R7
!RTNl,R4
BU<1
OCT
ARS,ARO
ARS,ARI
I,ARl
4
TRANS2;
, SET BUFFER lOOTH=8
"
I:
,
,
;
;
VARIABLE lOCATIONS
00.05 IN'UT I1ATRIX
CONSTANT 0.25
CONSTANT 200
LDF
STF
INPUT I1ATRIX IS STORED IN RAIl, lIND THE RESUJS MI: STORED IN TIE SAllE
lOCATlCfI.
<II
a~
TRlillSlI
II
"
II
"
, POINTS TO IN'UT
"
"
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
SUBI
OJ, ARb
LDI
LDI
LDI
LDI
!SCRATCH,AR4
lOUTPUT, ARS
@INPUT,ARb
7,Re
LDI
RPTB
BRD
LDI
LDI
ADDI
!RTN2,R4
BLK3
IICT
ARS,ARO
ARS,ARI
I,ARl
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
STF
LDF
<AR4++IlIX,Rl
Rl,<ARb++IIRll
4AR4++IIIX,RI
Rl,<ARb++IIRll
<AR4++IlIX,Rl
Rl,tMb++IlRlI
<AR4++IllX,Rl
Rl,<ARb++IIRII
<AR4++IlIX,Rl
Rl,<ARb++IlRll
<AR4++IlIX,Rl
Rl,4ARb++IIRll
<AR4++IlIX,Rl
Rl,<AR6++IlRll
<AR4++IlIX,Rl
Rl,<ARb++IlRlI
<AR5++IlRll,RS
>
'CI
Q.
~.
CI::'
; DO OCT ON COLIiI'/j
, I'ECTMS
("')
03
IJQ
Q
"'!
=:
, POINTS TO INPUT
S[IJ
8"
"'!
e;.
~
03
a=
C'I'.l
~
=
=
("')
~
BU<3
SUBI
OJ,ARb
; IICREIIENT POINTERS
EHD
IIR
EHD
,END
00
0'\
LDI
LOI
LOI
AR4,AR2
@SCRLAST,AR3
!..COS,AR7
LDF
; POINTS TO OUTPUT
; TABLE POINTER
"
"
STF
STF
STF
STF
RI,+M3
R2,<ARI
RO,fAR3
R3,fARI
::
::
LDF
STF
STF
LDF
LDF
STF
STF
LDF
LDF
STF
STF
LOF
LDF
STF
STF
LOF
"
fAAOH( IROl,RO
fARI++IIROI,RI
RO, fAR2++ 111
RI,fAR31l1
fARO++1 lROI ,RO
fAR1++(IRO),Rl
"
"
RO. fAR2+t( 1)
"
RI, fAR311i
fARO++1 lROI ,RO
fARI++1 IROI ,RI
RO, fAR2HIIi
RI, fAR3111
"
I:
tDI
ADDI
LOI
ADDI
LOI
ADDI
LDI
ADDI
00
X
00
t;,
!:;'
C
;:s
<"')
~
s.~
~
('j
~ ~,
~~
~~
Q~
v.~
.,C
<::i
~~
"i5
"
"
"
"
"
~~
~~
rv;:S
oS
"
O;:S
"
a 6'
"
LOF
LDF
SUBF3
SUBF3
i'PYF3
ADOF3
MPYF3
ADDF3
STF
STF
STF
STF
LDF
LDF
SUBF3
SUBF3
i'PYF3
ADDF3
i'PYF3
ADDF3
AR4,ARO
I,ARO
ARO,ARI
2,ARI
ARI,AR2
2,AR2
AR2,AR3
2,AR3
; POINT TO OUTPUT
; SET UP POINTERS
fAR2,R2
;
'>AR2,R3
;
>AR2, fARO, RI
fAR2,fARO,RO ;
Rl,fAR7++(1),Rl
R3,*ARO,R3
;
RO,fAR7H III,RO
R2,fARO,R2
RI,fAR2
R2, 'ARO
RO,fAR2
R3,fARO
fAR3,R2
fAR3,R3
fAR3,fARI,RI
fAR3,fARl,RO
RI,fAR7++111 ,RI
R3,fARI,R3
ROt fAR7++( 1) IRO
R2,fARI,R2
"
"
"
"
LDF
SUBF3
SUBF3
i'PYF3
ADDF3
i'PYF3
ADDF3
STF
STF
STF
STF
LOF
LDF
SUBF3
SUBF3
i'PYF3
ADDF3
i'PYF3
ADDF3
STF
STF
STF
STF
fARI,R2
; THIS IS THE SAllE AS AIIOIIE EXCEPT THE
fARI,R3
PO INTERS CllANGE
<ARI, fMO,RI
fARl, fARO,RO
RI,fAR7++111 ,RI
R3, fARO,R3
RO,fAR7111 ,RO
R2, fARO, R2
Rl,fMl
R2, fARO
RO,fARI
R3,fARO
fAR3,R2
fAR3,R3
fAR3,fAR2,RI
fAR3, fAR2, RO
Rl.*AR7++(1l,Rl
R3, >AR2,R3
RO,fAR7t+{1l,RO
R2, fAR2,R2
Rl, fAR3
R2,fAR2
RO, fAR3
R3,fAR2
LOF
"
"
"
II
"
"
"
"
LDF
SUBF3
SUBF3
i'PYF3
ADDF3
MPYF3
ADDF3
STF
STF
STF
STF
LDF
UF
SUBF3
SUBF3
i'PYF3
ADDF3
i'PYF3
ADDF3
fARO,R2
fARI,R3
fARO,*"ARO,RI
fARt, IARl, RO
RI, fAR7,RI
R3, fARI,R3
RO,fIlR7,RO
R2,fARO,R2
RI, fARO
R2,fARQ
R3, <ARI
RO,fARI
fAR2,R2
fAR3,R3
fAR2, fAR2, RI
fAR3, fAR3,RO
RI,fAR7,RI
R3, +M3,R3
RO,fAR7,RO
R2, fAR2, R2
STF
STF
STF
STF
g :A.;:s
s.00X
!II
~~
~ ~.
t.)<')
ci;l
Q~
~g
to
....
So
!II
"
::
t.)c:i
?5~
~1i'
'G
is
g"
;:s
*
EXIT
LDF
LDF
STF
STF
LDF
LDF
STF
STF
tARO,RO
fAR2,Rl
RI.*IIRO
RO, *AR2
*ARI,RO
*1IR3,RI
RI.*IIRI
RO, ....AR3
COSTAS
MPYF3
I1PYF3
SUBF3
SUBF3
STF
STF
LASTLOOP MPYF3
I1PYF3
SUBF3
MPYF3
STF
SUBF3
SUBF3
STF
STF
R7,*1IR3,R2
R7,*1IR3,RI
ilARI.R2,R2
*ARI,RI,RI
RI, *AR3
R2, fAR3
R7, *ARI ,RO
R7, *AR2,RI
*AR0,RO,R2
R7, 1M3, R3
R2,IARl
*ARI,RI,RI
RI,R3,R3
; 2XI7lX(3)
, 2XI8lX(4)
;
,
,
;
X(4)=2*XI4)
X(6)=2*XI6)
R2=2XI4lX(2)
R3=2*X(3)
; RI=2X(6)x14)
, R3=2XIBlX(6)
RI,IM2
R3,*M3
00
~
I1PYF3
STF
.end
MPYF3
STF
I1PYF3
STF
I1PYF3
STF
I1PYF3
STF
MPYF3
STF
MPYF3
STF
MPYF3
STF
I1PYF3
STF
R6, *1IR3,RO
RD, *AR3 (I )
R6,*AR3,RI
RI,*1IR3(l)
R6,*AR3,RO
RO,tAR3111
R6,*AR3,RI
RI, tAR3 1II
R6, *AR3,RO
RO, *AR3 (I )
R6,*AR3,RI
RI,*AR31II
R6, *AR3, RD
RO, *AR311 )
Rb, *AR3,RI
RI, tAR3
R4
HIRO,RD
*AR7,RO,RO
RO,*ARO
; RETURN
; MLU BY IIS1lRT(2)
; STORE THE RESLI. T
BUD
LDF
!II
~~
~~
RI,*1IR2
R2, *AR2
R3,HIR3
RD,*AR3
; OK TO NOYE AR3
.float
.float
.float
.float
.fT oat
.float
. float
.end
0.980785280403
0.555570233019
0.195090322016
0.831469612303
0.923879532511
0. 382683432365
0.707106781188
LAMBDA
I1U
NU
GMIIA
BETA
DELTA
ALPHA
00
00
HH**fHHI***HH**************HfU*ffHHH**fUfHffHIHffHHHHHfD4
~ILLIAM
HOHL
INPUT MATRIX IS STORED. IN RAM, AND THE RESULTS ME STORED IN THE SAME
LOCATION.
H********II*******HH***************II**I************I**IHIIH*lIHHI*++*
.BSS
.global
.global
;:s
00
X
00
\:l
0;'
(")
,\I!ord
INPUT
OUTPUT
SCRATCH
RTNI
RTN2
.'ilord
;:s ~
s.~
'" \:l
~ ~.
~'"
~~
...,,~
C C
..., ~
1r~
'15
~~
~'"
tv;:S
cS
g'
TRANS!:
OOT,64
INP,64
SCR,S
COSIAB
START
"
::
:1
"
"
BLKI
.d~ta
_COS
START
tl
"
ass
.ass
"
.lIIord
.lIIord
.lIIord
.lIlol'd
COS_TAB
INP
OOT
SCR
TRANSI
TRANS2
TRANS2:
LOI
lOI
lOI
LOF
lOI
lOP
lOi
LOI
LOI
7,Re
2,IRO
S,IRI
2.0,R7
8,BK
@OUTPUT
MTPUT,AR6
SUBI
63, ARb
LOI
lOI
lOI
lOI
@INPUT,AR6
~OUTPUT, M5
&RATCH,AR4
7,RC
, REALIGN POINTERS
@RTN2.R4
, RETURN ADDRESS
@INPUT,AR5
LDl
RPTB
BRO
LDI
LOI
ADDI
@RTNl,R4
BLKI
IOCT
AR5,ARO
lOF
STF
'AR4++ (111., Rl
Rl, 'AR6++( IRll
, VARIABLE LOCATIONS
~_COS,AR7
BLK6
SUBI
63,M6
END
BR
END
::
"
"
, POINT TO INPllT
, TABLE POINTER
I,ARO
"
::
::
"
, POINT TO INPUT
, TABLE POINTER
IDCT
AR5,MO
IAR4++(11'l.,Rl
RI, *AR6++( lRlI
*AR4 tt ll)'/.,Rl
Rl,*AR6++IIRU
*AR4tt( lJX,R1
RI, <AR6++(JRI)
tM4++(U'l.,Rl
RI,<AR6++Rli
.AR4++(IlX,Rl
Rl,'AR6++(IRli
fAR4++(U'l.,Rl
Rl,<AR6++(IRli
fAR4+t(UX,Rl
Rl,'AR6++(1R1l
<AR4++(1II,Rl
Rl, <AR6++( 1R1l
<AR5++(IRllRS
"
, MULTIPLIER
, SH ruFFER LNGTH=64
BlJ(6
LOF
STF
LDF
STF
LOF
STF
IJlF
STF
lOF
STF
IJlF
STF
LOF
STF
IJlF
STF
LDF
~SCRATCH,AR4
I,ARO
fAR4++( 1II, Rl
Rl,fAR6++(lRlI
fAR4++(IlX,Rl
Rl,fAR6++(lRlI
IAR4++(lJX,Rl
Rl, <AR6++!IRli
*AR4 t t(llX,Rl
Rl, fAR6++( lRlI
*AR4++(1II,Rl
Rl,*AR6++(IRli
<AR4++(1II,RI
RI,<AR6++(lRI}
fAR4tt(l)X,Rl
Rl, <AR6++ (lRll
IAR5+t(lRU,RS
LOI
RPTB
Il!lD
lOI
lOI
ADOI
. text
~_COS,AR7
IJlF
STF
IJlF
STF
LDF
STF
lOF
STF
IJlF
STF
lOF
STF
LDF
STF
LDF
, END
(f
SUBROUTINE
l ~
IOCT
SOo
<b
X
~~
1..1.>'"
C<b
Nq
Q~
1.1,0
S~
<b
~~
~~
I..I.>'>
~~
a~
:1
"
"
C.
"
SECLOOP
!:i
::to
l
ARO,ARl
2,ARt
ARt , AJl2
2,AR2
AJl2,AR3
2,AR3
LDF
rlPYF3
STF
'ARO,RO
fAR7,RO,RO
RO, 'flRO
"
"
"
, MLU BY t/SOOT(2)
, STORE THE RESULT
1;
... .
C
LDI
ADD!
LDI
ADDI
LDI
ADDI
"
"
SUBF3
SUBF3
MPYF3
STF
SUBF3
MPYF3
STF
STF
STF
MPYF3
STF
STF
SUBF3
SUeF)
MPYF3
STF
MPYF3
STF
STF
STF
CORRECT ORDER
f
BITREV
LDF
LIlF
"
STF
STF
LDF
LDF
STF
STF
tAR3, *AR2,R2
R2, tARt,R3
*AR3,R7,RO
R2,*AR2
R3,*ARO,R2
fAR2,R7,Rt
R3,*ARl
RO, *AR3
Rt,'AR2
tARt,R7,RO
RO, fARt
R2, fARO
4AR3,fAAt,R2
'AR3, 'ARt,R)
R7, 'AA3, RO
R2, IARt
R7,.AR3,Rt
R3, *ARt
RO, 'AA3
Rt,*AR3
, XI6HIS)
, X14H16)
, 2XI8))RO
"
, X13H17)
, XI4HI81
, 2'XI7)
, 2*X181
00
\0
LDF
IARO,RO
*ARt,Rt
LOF
IAR2,R2
LDF
MPYF3
MPYFl
MPYF3
fAR3,R3
1AR7,Rt,Rt
*AR7,RO,RO
tAR7,R2,R2
"
:1
4AR7++lll,R3,R3
Rt,'ARt
RO, fARO
R2,4AR2
R3,4AR3
fARO,R2
,
fARI,R3
,
tARO, 4ARO, RO
4ARt,'ARt,Rt
RO, 'ARO
,
Rl,*+AR7,Rl
R3, 4ARt ,RO
RO, .AR7,RO
,
R2, *flRO,R2
R2, HRO
RO, HRt
Rt,*ARt
LDF
LDF
SUBF3
SUBF3
STF
MPYF3
ADDF3
MPYF3
ADDF3
STF
STF
STF
*AR2, R2
*AR3,R3
*AR2, *AR2, RO
*AR3, *AR3,Rl
DELTA
BETA
RO,*AR2
Rt, ftAR7,Rt
, DELTA ON NEXT GROUP
R3,HR3,RD
RO,fAR7++IIRO),RO ,BETA ON NEXT GROUP
R2. *AR2, R2
R2, *AR2
RO,HR3
Rt, *AR3
"
"
"
LDF
"
, X12H14)
, 2'XI6))Rt
"
If'YF3
STF
STF
STF
STF
LDF
UF
SUBF3
SUBF3
STF
MPYF3
ADDF3
MPYF3
ADDF3
STF
STF
STF
"
"
, PERFORM TI ALPHA MllT's
"
LDF
LDF
SUeF3
SUeF3
ADDF3
ADDF3
STF
STF
STF
STF
fARt,R2
, THIS IS THE SA/1E AS ABOVE, EXCEPT THE
, POINTERS CHANGE
*AIU,R3
*ARt,*ARO,Rt
*ARt, *ARO, RO
R3, 'ARO,R3
R2,HRD,R2
Rt,'ARt
R2, 'ARO
RO,*ARl
R3, 'ARQ
LDF
UF
SUeF3
SUeF3
MPYF3
ADDf3
MPVF3
ADDF)
MPYF3
MPYF3
'AR3,R2
fAR3,R3
'AR3, ,AR2, Rt
'AR3, 'AR2,RO
1AR7++11I,Rt,Rt , NU
R3, *AR2,R3
1AR7++11l,RO,RO ,IiAI1MA
R2, 'AR2,R2
R2,fM7++(U,R2 ; LAMBDA
R3,*AR7,R3
,MU
::
"
STF
STF
STF
STF
Rl,'AR3
R2,'AR2
RO,4AR3
R3, +AR2
COS_TAB
::
::
LDF
LDF
SUEr3
SUEr3
ADDF3
ADDF3
STF
STF
STF
STF
LOF
LDF
;::
00
X
00
<:)
"
"
LDI
LDI
ADD!
LDI
LD!
ADD!
LDF
s.~
<"'l
~ ,
~~
~~
"
v,~
"
fJ
<:)
..,
"
<::)
"
~~
"I:i
"
~~
~:!
"
ciS
"
~~
a
6'
c;::
;:: ~
~
SUBF3
SUEF3
ADDF3
ADDF3
STF
STF
STF
STF
+AR2,R2
'AR2,R3
*AR2,fARO,Rl
'AR2, 4ARO, RO
R3,'ARO,R3
R2,+ARO,R2
Rl,4AR2
R2,'ARO
RO. 'AR2
R3, 'ARO
'AR3,R2
'AR3,R3
lIAR3,'AR1,Rl
fAR3, fARl,RO
R3,4AR1,R3
R2,4ARI,R2
RI,4AR3
R2, 'ARI
RO,4AR3
R3,fARl
"
LDF
STF
STF
LDF
LDr
STF
STF
LDF
LDF
EUD
STF
STF
LDF
LDF
STF
STF
.~nd
AR4,ARO
AR4,ARI
I,ARI
AR5,AR2
7,AR3
AR2,AR3,AR3
+AR2++III,RO
+AR311l ,RI
RO, +ARO++lIROI
RI, 4ARI ++IIROI
+AR2++I1I,RO
+AR311I,RI
RO, +ARO++IIROI
Rl, *AR1++( IRO)
, POINTS TO SCRJITCH
, POINTS TO HI'UT
, VECTOR
, GOING UP
, GOING DOWN
fAR2++( 1) ,RO
4AR311l ,RI
R4
; RETURN HOlE
RO, .AAO++( IRO)
Rl, *ARl++<IRO)
fAR2++{ l",RO
4AR311I,RI
RO, 'ARO++IIRO)
RI,+ARI++IIROI
.global
.data.
.float
. float
.float
. float
.float
.float
.float
.end
COUAB
0.707106781188
0.923879532511
0.382683432365
0. 19509032L"Q16
0.831469612303
0.980785280403
0.555570233019
ALPHA
BETA
DELTA
NU
GAIi1A
LAMBDA
MU
Sen Kuo
Northern Illinois University
Chein Chen
Digital Signal Processor ProductsSemiconductor Group
Texas Instruments
191
192
Introduction
A filter selects or controls the characteristics of the signal it produces by conditioning the incoming signal. The coefficients of the filter determine its characteristics and output
a priori in many cases. Often, a specific output is desired, but the coefficients of the filter
cannot be determined at the outset. An example is an echo canceller; the desired output
cancels the echo signal (an output result of zero when there is no other input signal). In
this case, the coefficients cannot be determined initially since they depend on changing
line or transmission conditions. For applications such as this, it is necessary to rely on
adaptive filtering techniques.
An adaptive filter is a filter containing coefficients that are updated by an adaptive
algorithm to optimize the filter's response to a desired performance criterion. In general,
adaptive filters consist of two distinct parts: a filter, whose structure is designed to perform a desired processing function; and an adaptive algorithm, for adjusting the coefficients of that filter to improve its performance, as illustrated in Figure 1. The incoming
signal, x(n), is weighted in a digital filter to produce an output, yen). The adaptive algorithm
adjusts the weights in the filter to minimize the error, e(n), between the filter output, yen),
and the desired response of the filter, den). Because of their robust performance in the
unknown and timevariant environment, adaptive filters have been widely used from
telecommunications to control.
din)                ,
1....__ .In)
FILTER
STRUCTURE
I    ....~.........yln)
ADAPTIVE
FILTER
193
Adaptive filters can be used in various applications with different input and output
configurations. In many applications requiring realtime operation, such as adaptive prediction, channel equalization, echo cancellation, and noise cancellation, an adaptive filter
implementation based on a programmable digital signal processor (DSP) has many advantages over other approaches such as a hardwired adaptive filter. Not only are power,
space, and manufacturing requirements greatly reduced, but also programmability provides flexibility for system upgrade and software improvement.
The early research on adaptive filters was concerned with adaptive antennas [1] and
adaptive equalization of digital transmission systems [2]. Much of the reported research
on the adaptive filter has been based on Widrow's wellknown Least Mean Square (LMS)
algorithm, because the LMS algorithm is relatively simple to design and implement, and
it is wellunderstood and wellsuited for many applications. All the filter structures and
update algorithms discussed in this application report are Finite Impulse Response (FIR)
filter structures and LMStype algorithms. However, for a particular application, adaptive filters can be implemented in a variety of structures and adaptation algorithms [1,
3 through 9]. These structures and algorithms generally trade increased complexity for
improved performance. An interactive software package to evaluate the performance of
adaptive filters has also been developed [10].
The complexity of an adaptive filter implementation is usually measured in terms
of its multiplication rate and storage requirement. However, the data flow and data
manipulation capabilities of a DSP are also major factors in implementing adaptive filter
systems. Parallel hardware multiplier, pipeline architecture, and fast onchip memory size
are major features of most DSPs [11, 12] and can make filter implementation more efficient.
Two such devices, the TMS320C25 and TMS320C30 from Texas Instruments [13,
14], have been chosen as the processors for fixedpoint and floatingpoint arithmetic. They
combine the power, high speed, flexibility, and an architecture optimized for adaptive
signal processing. The instruction execution time is 80 ns for the TMS320C25 and only
60 ns for the TMS320C30. Most instructions execute in a single cycle, and the architectures of both processors make it possible to execute more than one operation per instruction. For example, in one instruction, the TMS320C25 processor can generate an instruction
address and fetch that instruction, decode the instruction, perform one or two data moves
(if the second data is from program memory), update one address pointe~, and perform
one or two computations (multiplication and accumulation). These processors are
designed for realtime tasks in telecommunications, speech processing, image processing, and highspeed control, etc.
194
To direct the present research toward realistic realtime applications, three adaptive
structures were implemented:
1. Transversal
2. Symmetric transversal
3. Lattice
Each structure utilizes five different update algorithms:
1.
2.
3.
4.
5.
LMS
Normalized LMS
Leaky LMS
Signerror LMS
Signsign LMS
Each structure with its adaptation algorithms is implemented using the TMS320C25
with fixedpoint arithmetic and the TMS320C30 with floatingpoint arithmetic. The processor assembly code is included in the Appendix for each implementation. The assembly
code for each .structure and adaptation strategy can be readily modified by the reader to
fit his/her applications and could be incorporated into a C function library as callable
routines.
In this application report, the applications of adaptive ftlters, such as adaptive prediction, adaptive eqUalization, adaptive echo cancellation, and adaptive noise cancellation
are presented first. Next, the implementation of the three filter structures and five adap,
tive algorithms with the TMS320C25 and TMS320C30 is described. This is followed by
the practical considerations on the implementation of these adaptive ftlters. The remainder
of the application report covers coding options, such as the routine libraries that support
both assembly and C languages.
Adaptive Prediction
Adaptive prediction [16 through 18] is illustrated in Figure 2. In the general application of adaptive prediction, the signals are x(n)  delayed version of original signal,
d(n)  original input signal, y(n)  predicted signal, and e(n)  prediction error or
residual.
Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30
195
dln)......            ,
1.......... eln)
DELAY
xln)
ADAPTIVE
FILTER
yIn)
a2* u(n2)
+ ...... +
am* u(nm)
v(n)
where at. a2, .... , am are the AR parameters. Thus, the present value of the process u(n)
equals a finite linear combination of past values of the process plus an error term v(n).
This adaptive AR model provides a practical means to measure the instantaneous frequency of input signal. The adaptive predictor can also be used to detect and enhance a narrow
band signal embedded in broad band noise. This Adaptive Line Enhancer (ALE) provides
at its output yen) a sinusoid with an enhanced signaltonoise ratio, while the sinusoidal
components are reduced at the error output e(n).
Adaptive Equalization
Figure 3 shows another model known as adaptive equalization [2, 9, 15]. The signals
in the adaptive equalization model are defined as x(n)  received signal (filtered version
of transmitted signal) plus channel noise, den)  detected data signal (data mode) or pseudo
random number (training mode), yen)  equalized signal used to detect received data,
and e(n)  residual intersymbol interference plus noise.
196
DATA
, . .       , MODE
yIn)
xln)
TRAINING
MODE ,        ,
PSEUDO
RANDOM
NUMBER
GENERATOR
ADAPTIVE
FILTER
SLICER
+
dIn)
rI
HYBRID I
,,
,,
,,,
,,
,
xln) ~~+9__1Ir
ADAPTIVE
FILTER
ECHO
PATH
yIn)
+
eln) ....414
,..__'1_
,
NEAREND
SIGNAL
I
'
IL.. _ _ _ _ _ _ _ _ _ I'
Figure 4. Block Diagram of an Echo Canceller
Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30
197
The adaptive echo cancellers are used in practical applications of cancelling echoes
for longdistance telephone voice communication, fullduplex voiceband data modems,
and highperformance audioconferencing systems. To overcome the echo problem, echo
cancellers are installed at both ends of the network. The cancellation is achieved by
estimating the echo and subtracting it from the return signal.
SIGNAL
SOURCE
"'..1" e(n)
xln)
NOISE SOURCE
ADAPTIVE
FILTER
yIn)
Application Summary
The above list of applications is not exhaustive and is limited primarily to applications within the field of telecommunications. Adaptive filtering has been used extensively
in the context of many other fields including, but not limited to, instantaneous frequency
tracking, intrusion detection, acoustic Doppler extraction, online system identification,
geophysical signal processing, biomedical sign~ processing, the elimination of radar clutter,
beamforming, sonar processing, active sound cancellation, and adaptive control.
198
(1)
i=O
where ~(n)=[x(n) x(nl) ... x(nN+l)]T is the input vector, ~(n)=[wo(n) wl(n) ...
wNl(n)]T is the weight vector, T denotes transpose, n is the time index, and N is the
order of filter. This example is in the form of a finite impulse response filter as well as
the convolution (inner product) of two vectors ~(n) and ~(n). The implementation of Equation (1) is illustrated using the following C program:
y[n] = 0.;
for (i = 0; i < N; i + +) [
y[n] + = wn[i]*xn[i];
[
where wn [i] denotes wi(n) and xn[i] represents x(n i).
199
xln2)
xln 1)
x(n)
xlnN+ 1)
Z1
Z1
Z1
PFC
Weights
80
Data Buffer
B1
ARn
Data Bus
Program
Bus
T(16)
MULTIPLER
P(32)
ACC(32)
200
The MACD instruction enables complete multiply/accumulate, data move, and pointer
update operations to be completed in a single instruction cycle (80 ns) if fIlter coefficients
are stored in onchip RAM or ROM or in offchip program memory with zero wait states.
Since the adaptive weights wj(n) need to be updated in every iteration, the fIlter coefficients must be stored in RAM. The implementation of the inner product in Equation (1)
can be made even more efficient with a repeat instruction, RPTK. An Nweight transversal fIlter can be implemented as follows [23]:
LARP
. LRLK
RPTK
MACD
ARn
ARn,LASTAP
Nl
COEFFP,*
(A)
Where ARn is an auxiliary address register that points to x(n  N + 1), and the Prefetch
Counter (PFC) points to the last weight wN 1 (n) indicated by COEFFP. When the MACD
instruction is repeated, the coefficient address is transferred to the PFC and is incremented
by_ one during its operation. Therefore, the components of weight vector 'y!::(n) are stored
in BO as
Low Address
PFC    
w1(n)
HIgh Address
The MACD in repeat mode will also copy data pointed to by ARn, to the next higher
onchip RAM location. The buffer memories of transversal fIlter are therefore stored as
Low Address
x(n)
x(n1)
....
x(nN+2)
ARn ____
x(nN+1)
HIgh Address
201
The architecture ofTMS320C30 [14] is quite different from TI's second generation
processors. Instead of using program/data memory, it provides two data address buses
to do the data memory manipulations. This feature allows two .data memory addresses
to be generated at the same time. Hence, parallel data store, load, or one data store with
one data load can be done simultaneously. Such capabilities make the programming much
easier and more flexible. Since the hardware multiplier and arithmetic logic unit (ALU)
of TMS320C30 are separated, with proper operand arrangement, the processor can do
one multiplication and one addition or subtraction at the same time. With these two combined features, the TMS320C30 can execute several other parallel instructions. These
parallel instructions can be found in Section 11 of the ThirdGeneration TMS320 User's
Guide [14]. Associating with single repeat instruction RPTS, an inner product in Equation (1) can be implemented as follows:
II
MPYF3
RPTS
MPYF3
ADDF3
ADDF3
; w[O].x[O]
; Repeat N 1 times
; y[] = w[].x[]
; Include last product
where auxiliary registers ARO and ARI point to x and w arrays. The addition in the parallel
instruction sums the previous values of Rl and R2. Therefore, Rl is initialized with the
first product prior to the repeat instruction RPTS.
Note that the implementation above does not move the data in the x array like MACD
does in TMS320C25. For filter delay taps, the TMS320C30 uses a circular buffer method
to implement the delay line. This method reserves a certain size of memory for the buffer
and uses a pointer to indicate the beginning of the buffer. Instead of moving data to next
memory location, the pointer is updated to point to the previous memory location.
Therefore, from the new beginning of the buffer, it has the effect of the tapped delay line.
When the value of the pointer exceeds the end of the buffer, it will be circled around
to the other end of the buffer. It works just like joining two ends of the buffer together
as a necklace. Thus, new data is within the circular queue, pointed to by ARO, replacing
202
the oldest value. However, from an adaptive filter point of view, data doesn't have to
be moved at this point yet.
TMS320C30 has a 32bit floating point multiplier and the result from the multiplier is
put and accumulated into a 40bit extended precision register. If the input from AID converter is equal to or less than 16 bits, there is no roundoff noise after multiplication.
Theoretically, the TMS320C30 can implement a very high order of adaptive filter.
However, for the most efficient implementation, the limitation of filter order is 2K because
the TMS320C30 external data write requires at least two cycles. If the filter coefficients
are put in somewhere other than internal data RAM, the instruction cycles will be increased.
(2)
den) yen),
where den) is the desired signal and yen) is the filter output. The input vector ~(n) and
e(n) are used to update the adaptive filter coefficients according to a criterion that is to
be minimized. The criterion employed in this section is the meansquare error (MSE)E:
E
= E[e2 (n)]
(3)
where E [.] denotes the expectation operator. Ify(n) from Equation (1) is substituted into
Equation (2), then Equation (3) can be expressed as
E
E[d2 (n)]
~T(n)R~(n)
 2
~T(n)
(4)
where R = E[x(n)x T(n)] is the N x N autocorrelation matrix, which indicates the sampletosample correlation within a signal, and ~ = E [den) ~(n)] is the N x 1 crosscorrelation
vector, which indicates the correlation between the desired signal d(n)and the input signal
vector ~(n).
The optimum solution w*
rived by solving the equation
= [wo* Wl*
OE
= 0
o~(n)
(5)
~6)
203
If the R matrix has full rank (i.e., R1 exists), the optimum weights are obtained by
~*
R1 ~
(7)
In Linear Predictive Coding (LPC) of a speech signal, the input speech is divided
into short segments, the quantities of Rand .Q are estimated, and the optimal weights corresponding to each segment are computed. This procedure is called a blockbyblock dataadaptive algorithm [24].
A widely used LMS algorithm is an alternative algorithm that adapts the weights
on a samplebysample basis. Since.this method can avoid the complicated computation
of R 1 and .Q, this algorithm is a practical method for finding close approximate solutions
to Equation (7) in real time. The LMS algorithm is the steepest descent method in which
the next weight vector w(n + 1) is increased by a change proportional to the negative gradient of meansquareerror performance surface in Equation (7)
~(n+ 1) = ~(n)
(8)
 u'V (n)
where u is the adaptation step size that controls the stability and the convergence rate.
For the LMS algorithm, the gradient at the nth iteration, 'V (n), is estimated by assuming
squared error e2 (n) as an estimate of the MSE in Equation (3). Thus, the expression for
the gradient estimate can be simplified to
o[e2(n)]
o~(n)
 2 e(n)
~(n)
(9)
Substitution of this instantaneous gradient estimate into Equation (8) yields the
WidrowHoff LMS algorithm
,
~(n +
(10)
204
~(n)
will
(11)
where Amax is the largest eigenvalue of the matrix R. Amax can be bounded by
Nl
= N r(O)
(12)
r (0)
i=O
For adaptive signal processing applications, the most important practical consideration is the speed of convergence, which determines the ability of the filter to track nonstationary signals. Generally speaking, weight vector convergence is attained only when the
slowest weight has converged. The time constant of the slowest mode is [1]
t
=UAmin
(13)
This indicates that the time constant for weight convergence is inversely proportional to u and also depends on the eigenvalues of the autocorrelation matrix of the input.
With the disparate eigenvalues, i.e., Amax> > Amin, the setting time is limited by the
slowest mode, Amin. Figure 8 shows the relaxation of the mean square error from its initial value EO toward the optimal value Emin.
Adaptation based on a gradient estimate results in noise in the weight vector, therefore
a loss in performance. This noise in the adaptive process causes the steady state weight
vector to vary randomly about the optimum weight vector. The accuracy of weight vector
in steady state is measured by excess mean square error (excess'MSE = E [E  Emin])'
The excess MSE in the LMS algorithm [1] is
excess MSE = u Tr[R] Emin
(14)
205
10 3
w,
Initial Wo = 0.2.
 1.0
30.00
22.50
15.00
7.50
.00
64.75
128.50
192.25
256.00
Iteration
206
Since u*e(n) is constant for N weights update, the error signal e(n) is first multiplied
by u to get ue(n). This constant can be computed first and then multiplied by x(n) to update w(n). An implementation method of the LMS algorithm in Equation (10) is illustrated
as
ue(n) = u*e[n];
for (i=O; i<N; i++) [
wn[i] + = uen * xn[i];
TMS320C25 Implementation
The TMS320C25 provides two powerful instructions (ZALR and MPYA) to perform the update example in Equation (10).
ZALR loads a data memory value into the highorder half of the accumulator while rounding the value by setting bit 15 of the accumulator
to one and setting bits 014 of the accumulator to zero. The rounding is
necessary because it can reduce the roundoff noise from multiplication.
Assuming that ue(n) is stored in T and the address pointer is pointing to AR3, the
adaptation of each weight is shown in the following instruction sequence:
LRLK ARl,N 1
LRLK AR2,COEFFD
LRLK AR3,LASTAP+ 1
MPY
. ADAP ZALR
MPYA
SACH
BANZ
*,AR2
* ,AR3
*,AR2
*+,O,ARI
ADAP,*,AR2
;
;
;
;
;
;
;
;
;
;
;
;
;
For each iteration, N instruction cycles are needed to perform Equation (1), 6N instruction cycles are needed to perform weight updates in Equation (10), and the total number
of instruction cycles needed is 7N + 28. An example of a TMS32OC25 program implementing a LMS transversal fIlter is presented in Appendix AI. Note that BANZ needs three
instruction cycles to execute. This can be avoided by using straight line code, which requires 4N + 33 instruction cycles [25].
Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30
207
TMS320C30 Implementation
Although the TMS32OC30 doesn't provide any specific instruction for adaptive filter
coefficients up<iate, it still can achieve the weight updating in two instructions because
of its powerful architecture. The TMS320C30 has a repeat block instruction RPTB, which
allows a block of instructions to be repeated a number of times without any penalty for
looping. A single repeat mode, RM, in the status register, ST, and three registers  repeat
start address (RS), repeat end address (RE), and repeat counter (RC)  control the block
repeat. When RM is set, the PC repeats the instructions between RS and RE a number
of times, which is determined by the value of RC. The repeat modes repeat a block of
code at least once in a typical operation. The repeat counter should be loaded with one
less than the desired number of repetitions. Assuming the error signal e(n) in Equation
(10) is stored in R7, the adaptation of filter coefficients is shown as follows:
MPYF3
LDI
RPTB
MPYF3
IIADDF3
LMS
STF
MPYF3
IIADDF3
STF
ADDF3
STF
*ARO+ +(1)%,R7,Rl
order3,RC
LMS
*ARO+ +(1)%,R7,Rl
*ARl,Rl,R2
R2, *ARI + +(1)%
; Rl = u*e(n)*x(n)
; Initialize repeat counter
; Do i = 0, N3
; Compute u*e(n)*x(n i 1)
; Compute wi(n) + u*e(n)*x(n i)
; Store wi(n+l)
*ARO,R7,Rl
*ARl,RI,R2
R2, *ARI + +(1)%
*ARl,Rl,R2
R2, *ARI + +(1)%
; For i = N2
; Store wN2(n+l)
; Include last w
; Store wNl(n+l)
where auxiliary register ARO and ARI point to x and w arrays. Rl is updated before loop
since the accumulation in the parallel instruction uses the previous value in Rl. In order
to update x array pointer to the new beginning of the data buffer for next iteration (i.e.,
perform the data move), one of the loop instruction set has been taken out of loop and
modified by eliminating the incrementation of ARO.
To perform an N weight adaptive LMS transversal filter on TMS320C30 requires
3N + 15 instruction cycles. There are N and 2N instruction cycles to perform Equations
(1) and (10), respectively. The TMS32OC30 example program is given in Appendix A2.
The LMS algorithm considerably reduces the computational requirements by using
a simplified mean square error estimator (an estimate of the gradient). This algorithm has
proved useful and effective in many applications. However, it has several limitations in
performance such as the slow initial convergence, the undesirable dependence of its convergence rate on input signal statistics, and an excess mean square error still in existence
after convergence.
208
______ _
xln).......... Z1
___
Z1
z1
I<~
yIn)
y(n)
(ISa)
i=O
where N is an even number. Note that, for fixedpoint processors, the addition in the
brackets may introduce overflow because the input signals x(n i) and x(n  N +i + 1) are
in the range of 1 and 1  2 15. This problem can be solved by shifting x(n) to the right
one bit. The update of the weight vector is
wj(n+I) = wj(n) + ue(n)[x(nl) + x(nN+i+I)]
(ISb)
209
for i=O,I, ... ,(NI21), which requires N/2 multiplications and N additions. Theoretically, this symmetric structure can also reduce computational complexity since such filters
require only half the multiplications of the general transversal filter. However, it is true
only for the TMS320C30 processor. When a filter is implemented on the TMS320C25,
the transversal structure is more efficient than the symmetric transversal structure due
to the pipeline multiplication and accumulation instruction MACD, which is optimized
to implement convolution in Equation (1).
TMS320C25 Implementation
For TMS320C25, in order to implement the instructions MAC, ZALR, and MPYA,
we can trade memory requirements for computation saving by defining
z(ni) = x(ni) + x(nN+i+l) , i=O,I, ... ,N/2 1
(16a)
y(n) =
wj(n) z(ni)
(16b)
i=O
Wj(n+ 1) = wj(n) + u e(n) z(ni) , i=O,I, ... ,N/2 1
(16c)
SYM
LARK
LRLK
LRLK
LRLK
LARP
LAC
ADD
SACL
BANZ
AR1, N/21
AR2,LASTX
AR3,FIRSTX
AR4,FIRSTZ
AR3
*+,O,AR2
*,O,AR4
*+,O,ARI
SYM,*,AR3
;
;
;
;
Counter
Point to
Point to
Point to
= N/2 1
x(nN+l)
x(n)
z(n)
The instruction sequence to implement the LMS algorithm in Equations (1) and(lO)
can be used to implement Equations (16b) and (16c), except using MAC instead ofMACD
in Program (A). Therefore, N instruction cycles are needed to shift data in x(n), 3N instruction cycles are needed to implement Equation (16a), N/2 for Equation (16b), and
3N for Equation (16c). The total number of instruction cycles required to implement the
symmetric transversal filter with the LMS algorithm is 7 .5N + 38. Where 7.5N is an integer because N is chosen as an even number. The O.5N instruction cycles come from
Equation (15a) since symmetric transversal structure folds the filter taps into half of the
order N (see Figure 9). The maximum filter length for most efficient code, 256, is the
210
same as for the FIR filter. The use of the additional data memory can be obtained from
the reduced data memory requirement for weights of the symmetric transversal filter. The
complete TMS320C2S program is given in Appendix Bl.
Note that instead of storing buffer locations x(n) contiguously, then using DMOV
to shift data in the buffer memory (requiring N cycles) at the end of each iteration, we
can use a circular buffer with pointers pointing to x(n) and x(n  N + 1). Since pointer updating requires several instruction cycles, compared with N cycles using DMOV to update the buffer memory contents, the circular buffer technique is more efficient ifN is large.
TMS320C30 Implementation
As mentioned above, the TMS320C30 uses a circular buffer instead of data move
technique. Therefore, it does not have to implement tapped delay line separately as
TMS320C2S. Equations (I) and (16a) can be combined and implemented in the same loop.
The advantage of this is that a parallel instruction reduces the number of the instruction
cycles. The implementation is shown as follows:
II
INNER
II
LDF
LDI
RPTB
ADDF3
MPYF3
STF
ADDF3
0.0,R2
;
order/22,RC
;
INNER
;
*AR4++(1)%,*ARS(I)%,Rl;
Rl,*ARI+ +(I),R3
;
Rl, *AR2+ +(1)
;
R3,R2,R2
;
ADDF3
MPYF3
STF
ADDF3
Clear R2
Set up loop counter
Do i = 0, N/2 2
z(i) = x(ni) + x(n+Ni)
R3 = w[] * z[]
Store z(i)
Accumulate the result for y
where AR4 and ARS point to x[O] and x[Nl]. ARI and AR2 point to wand z array,
respectively. IRO contains value of N/2 1. The same instruction codes of weight update
of transversal filter can be used in symmetric transversal structure by changing the x array pointer to the z array pointer. Appendix B2 presents an example program. The total
number of instructions needed is 2.SN + IS, which is less than that of the transversal
structure.
211
fo(n)
= bo(n) =
(17a)
x(n)
(17b)
(17c)
where fm(n) represents the forward prediction error, bm(n) represents the backward prediction error, km(n) is the reflection coefficients, m is the stage index, and M is the number
of cascaded stages. The lattice structure has the advantage of being orderrecursive. This
property allows adding or deleting of stages from the lattice without affecting the existing
stages.
r~., fm(n)
xln)
Stage
Stage
Stage
   .... bmln)
Stage m
.,I
f m 1 In)
i.,___t...
Z1
212
(19a)
(19b)
(20)
O<=m<=M
where the LMS algorithm is used to update the coefficients of the regression filter. For
noise cancellation application, em(n) corresponds to the output e(n) in Figure 5. For applications such as adaptive line enhancer and channel equalizer, filter output yen) is obtained as
M
yen)
(21)
gm(n) bm(n)
m=O
fo(n)
f,(n)
Stage
xln)
boln)
b,ln)
fm(n)
Stage
Stage
bm(n)
: 9,(n)
dIn)
)1' .....      f I I I f
yIn)
213
TMS320C2SITMS32OC30 Implementation
There are five memory locationsfm(n), bm(n) , bm(nl), km(n) , and gm(n)required for each stage. The limitation of onchip data RAM is 544 words for the
TMS320C25 and 2K words for the TMS320C30. A maximum of 102 stages can therefore
be implemented on a single TMS320C25 for the highest throughput. Here, another ad.vantage of TMS32OC30 architecture design is shown. Since the operands of the mathematic
operations can be either memory or register on the TMS320C30, and there is no need
to preserve the values of fm array for the next iteration (refer to Equations (17) and (18,
the fm array can be replaced by an extended precision register. Thus, for the most efficient codes, the stage limitation of lattice structure for TMS320C30 is 512, or onefourth
of the 2K onchip RAM.
Lattice structures have superior convergence properties relative to transversal structures and good stability properties; e.g., low sensitivity to coefficient quantization, low
roundoff noise, and the ability to check stability by inspection. The disadvantages of lattice filter algorithms are that they are numerically complex and require mathematical
sophistication to thoroughly understand their derivations. Furthermore, as shown in Appendixes Cl and C2, lattice structures cannot take advantage of the TMS320C25 and
TMS320C30's pipeline architecture to achieve high throughput. The total number of instruction cycles needed is 33M +32 for TMS320C25 and 14M +4 for TMS320C30.
214
=aI
var(n)
(22)
and
~(n
+ 1)
= ~(n)
u(n)e(n)~(n)
(23)
where a is a convergence parameter, and var(n) is an estimate of the input average power
at time n using the recursive equation
var(n) =(1  b) var(n 1)
b x2 (n)
(24)
AR3
AR3,FRSTAP
ERRF
VAR
VAR,SHIFT
ERRF ,SHIFT
VAR
ACC = var(nI)
ACC = (Ib) var(n 1)
ACC = (Ib) var(n 1) + b x2(n)
Store var(n)
215
TMS320C30 can be found in the user's guide for each device [13, 14]. Since the power
of input signal is always positive, those codes can be simplified to save computation time.
Since the power estimation in Equation (24) and step size normalization in Equation (22)
are performed once for each sample x(n), the computation increase can be ignored when
N is large. As shown in Appendixes Dl and D2, the total number of instruction cycles
needed for the normalized LMS algorithm (7N +57 for the TMS320C25 and 3N +47 for
the TMS32OC30) is slightly higher than for the LMS algorithm (7N + 34 and 3N + 15)
when N is large.
where
1) =
~(n)
sign[e(n)]
+ u sign[e(n)]
~(n)
(25)
1 , if e(n) ;;::: 0
 1 , if e(n) < 0
216
The programs in Appendixes El and E2 implement a transversal filter with signerror LMS algorithm in looped code. The total number of instruction cycles needed for
this algorithm using the TMS320C25 is 7N +26, which is slightly less than for the LMS
algorithm's 7N +28. Computing u*e(n) takes 5 instruction cycles. The signerror LMS
algorithm determines the sign of the u by checking the sign of e(n), which takes only 3
instruction cycles. The total number of instruction cycles needed for the signerror LMS
algorithm using the TMS320C30 is 3N + 16, which is slightly higher than for the LMS
algorithm. This occurs because the TMS320C30 takes only one instruction cycle to compute u*e(n) and two instruction cycles to determine the sign of the u.
Secondly, the signdata LMS algorithm is
~(n +
1) =
~(n)
+ u e(n)
sign[~
(n)]
(26)
(27)
which requires no multiplications at all and is used in the CCITT standard for ADPCM
transmission. As we can see from the above equations, the number of mUltiplications is .
reduced. This simplified LMS algorithm looks promising and is designed for VLSI or
discrete IC implementation to save multiplications.
The signsign LMS algorithm can be implemented as
for (i=0; i<N; H+) [
if (e[n] > = 0.) [
if (xn[i] >= 0.)
wn[i] + = u;
else
wn[i]  = u; J
else [
if (xn[i] > = 0.)
wn[i]= u;
217
else
wn[i] +
= u; J J
ARl,Nl
AR2,COEFFD
AR3,LASTAP+l
*,O,AR2
ERR
ERRF
LAC
ERRF
XORK
MU,15
ADD
SACH
BANZ
*,15
*+,l,AR1
ADAP,*,AR3
;
;
;
;
;
;
;
;
;
;
;
;
;
;
Set up counter
Point to wi(n)
Point to x(ni)
Load x(ni)
XOR with e(n)
Save sign bit, sign = 0 if same signs
Sign = 1 if different signs
Sign extension to ACCH,
ACCH = 0 If ERRF > = 0
ACCH = OFFFFh if ERRF < 0
Take one's complement of m
If sign = 1
Weight update
Save new weight
The one's complement of u is used instead of u, because they are only slightly
different and the step size does not require the exact number. The weight update with
this technique requires lON instruction cycles and FIR filtering requires Ninstruction cycles
so that the total number of instruction cycles needed is lIN + 21. The complete TMS32OC25
assembly program is given in Appendix Fl.
To determine whether a positive or negative u should be used without branching
is trickier in the TMS320C30. Fortunately, the extended precision registers ofTMS320C30
interpret the 32 mostsignificant bits of the 40bit data as the floatingpoint number and
the 32 leastsignificant bits of the 40bit data as an integer. When a floatingpoint number
218
changes its sign, its exponent remains the same. Therefore, the sign of step size u can
be determined by using XOR logic on its mantissa. The following code shows how the
signsign LMS algorithm is implemented on the TMS320C30.
II
SSLMS
II
=
=
=
=
=
=
ASH
XOR3
LDF
ASH
XOR3
ADDF3
31,R7
RO,R7,R5
*ARO+ +(1)%,R6
31,R6
R5,R6,R4
*AR1,R4,R3
;
;
;
;
;
;
R7
R5
R6
R6
R4
R3
LDI
RPTB
LDF
STF
ASH
XOR3
ADDF3
order3,RC
SSLMS
*ARO+ +(1)%,R6
R3, *AR1 + +(1)%
31,R6
R5,R6,R4
*ARl,R4,R3
;
;
;
;
;
;
;
LDF
STF
ASH
XOR3
ADDF3
STF
*ARO,R6
R3,*AR1++(1)%
31,R6
R5,R6,R4
*AR1,R4,R3
R3, *AR1 + +(1)%
;
;
;
;
;
;
Sign[e(n)]
Sign[e(n)] * u
x(n)
Sign[x(n i)]
Sign[x(n i)]*Sign[e(n)] * u
wi(n) + R4
Here, RO, R4, and R5 contain the value of u before updating. ARO and ARl point
to x array and w array, respectively. R7 contains the value of error signal e(n). The complete program is given in Appendix F2. The total number of instruction cycles is 5N + 16,
which is much higher than LMS algorithm.
The signsign LMS algorithm is developed to reduce the multiplication requirement
of the LMS algorithm. Since DSPs provide the hardware multiplier as a standard feature,
this modification does not provide any advantage when implementing this algorithm on
the DSPs. On the contrary, it causes some disadvantages since decision instructions will
destroy the instruction pipeline. If you use the XOR logic operation in order to avoid using the decision instructions, the complexity of the program will be increased and the total
number of instruction cycles will be greater than the regular LMS algorithm.
219
based upon adding a small forcing function, which tends to bias each filter weight toward
zero. The leaky LMS algorithm has the form
~(n +
1) = r
~(n)
+ u e(n)
(28a)
~(n)
1) =
~(n)
 c
~(n)
+ u e(n)
~(n)
(28b)
In order to achieve the highest throughput by using ZALR and MPYA, cw(n) can
be implemented by shifting wj(n) right by m bits where 2 m is close to c. Since the length
of the accumulator is 32 bits and the high word (bits 16 to 31) is used for updating w(n),
shifting right m bits of wj(n) can be implemented by loading wj(n) and shifting left
16  m bits. The sequence of TMS320C25 instructions to implement Equation (28b) is
shown as
LRLK
LRLK
LRLK
LT
MPY
ADAPT ZALR
MPYA
SUB
SACH
BANZ
ARl,Nl
AR2,COEFFD
AR3,LASTAP+ 1
ERRF
*,AR2
*,AR3
*,AR2
*,LEAKY
*+,O,ARI
ADAPT,*,AR2
;
;
;
;
Set up counter
Point to wj(n)
Point to x(n  i)
T = ERRF =u*e(n)
; LEAKY=16m
For each iteration, 7N instruction cycles are needed to perform the adaptation process (6N for the LMS algorithm). The total number of instruction cycles needed is 8N +28
(see Appendix Gl for the complete program). The leaky factor r has the same effect as
adding a white noise to the input. This technique not only can solve adaptive weights
overflow problem, but also can be beneficial in an insufficient spectral excitation and stalling
situation [5].
The method used above is especially for the TMS320C25, which has a free shift
feature. Since TMS320C30 is a floatingpoint processor, r can simply multiply to filter
coefficient. However, in order to reduce the instruction cycles, this mUltiplication can
combine with another instruction to be a parallel instruction inside the loop. The following code shows how to rearrange the instructions from the LMS algorithm to include this
multiplication without an extra instruction cycle.
220
II
II
LLMS
II
II
II
II
MPYF
MPYF3
MPYF3
ADDF3
LDI
RPTB
MPYF3
ADDF3
MPYF3
STF
@u_r,R7
*ARO+ +(I)%,R7,Rl
*ARO+ +(1)%,R7,Rl
*ARl,Rl,R2
order4,RC
LLMS
*AR2,R2,RO
*+ARl(I),Rl,R2
*ARO++(1)%,R7,Rl
RO, *ARI + +(1)%
;
;
;
;
;
;
;
;
;
;
MPYF3
ADDF3
MPYF3
STF
MPYF3
ADDF3
*AR2,R2,RO
* + ARl(I),Rl,R2
*ARO,R7,Rl
RO,*ARl++(I)%
*AR2,R2,RO
*+ARl(1),Rl,R2
;
;
;
;
;
;
MPYF3
STF
STF
*AR2,R2,RO
RO,*ARI + +(1)%
RO,*ARI + +(1)%
II
R7 = e(n)*u/r
Rl = e(n)*u*x(n)/r
Rl = e(n)*u*x(nl)/r
R2 = wa(n) + e(n)*u*x(n)/r
Initialize repeat counter
do i = 0, N4
RO = r*wj(n) + e(n)*u*x(ni)
R2 = Wj+l(n) + e(n)*u*x(nzil)/r
Rl= e(n)*u*x(ni2)/r
Store wj(n + 1)
RO = r*wN3(n) + e(n)*u*x(nN+3)
R2 = wN2(n) + e(n)*u*x(nN+2)/r
Rl = e(n)*u*x(n  N + 1)/r
Store wN3(n+l)
RO = r*wj(n) + e(n)*u*x(nN+2)
R2 = wNl(n) +
e(n)*u*x(n  N + 1)/r
; RO = r*wj(n) + e(n)*u*x(n  N + 1)
; Store wN2(n+l)
; Update last W
Auxiliary registers ARO and ARI point to x and Warrays. AR2 points to the memory
location that contains value r. R7 contains the value of error signal e(n). Rl and R2 are
updated before the loop because the parallel instructions inside the loop use the previous
values in Rl and R2. Note that Rl is updated twice before the loop because the updating
of R2 requires the previous value of Rl. In order to update x array pointer to the new
beginning of the data buffer for next iteration, two of the loop instruction sets have been
taken out ofloop and modified by eliminating the incrementation of ARO. The TMS320C30
assembly program of an adaptive transversal fIlter with the leakage LMS algorithm is listed
in Appendix G2 as an example. The total number of instruction cycles for this algorithm
is 3N + 15, which is the same as the LMS algorithm. This example shows the power and
flexibility of the TMS320C30.
221
Implementation Considerations
The adaptive filter structures and algorithms discussed previously were derived on
the basis of infinite precision arithmetic. When implementing these structures and algorithms
on a fixed integer machine, there is a limitation on the accuracy of these filters due to
the fact that the DSP operates with a finite number of bits. Thus, designers must pay attention to the effects of finite word length. In general, these effects are input quantization,
roundoff in the arithmetic operation, dynamic range constraints, and quantization of filter
coefficients. These effects can either cause deviations from the original design criteria
or create an effective noise at the filter output. These problems have been investigated
extensively, and techniques to solve these problems have been developed [28, 29].
The effects of finite precision in adaptive filters is an active research area, and some
significant results have been reported [30 through 32]. There are three categories of finite
word length effects in adaptive filters:
Design Issues (design of the optimum step size u that minimizes system
noise).
(29)
i=O
and
wj(n + 1)
wj(n)
(30)
where x(n i) is the input sequence and wj(n) are the filter coefficients.
If the input sequence and filter coefficients are properly normalized so that their
values lie between 1 and 1 using Q15 format, no error is introduced into the addition.
However, the sum of two numbers may become larger than one. This is known as overflow.
The TMS320C25 provides four features that can be applied to handle overflow management [13]:
222
A.
B.
C.
D.
< 1/
(31)
Iwj{n) I
i=O
where Xmax denotes the maximum of the absolute value of the input. The right shifter
of the TMS320C25, which operates with no cycle overhead, can be applied to implement
scaling to prevent overflow of multiplyaccumulate operations in Equation (29). By setting the PM bits of status register STl to 11 using the SPM or LSTl instructions, the
P register output is rightshifted 6 places. This allows up to 128 accumulations without
the possibility of an overflow. SFR instruction can also be used to right shift one bit of
the accumulator when it is near overflow.
Another effective technique to prevent overflow in the computation of Equation (29)
is using saturation arithmetic. As illustrated in Figure 12, if the result of an addition
overflows, the output is clamped at the maximum value. If saturation arithmetic is used,
it is common practice [28] to permit the amplitude of x(n i) to be larger than the upper
bound given in Equation (31). Saturation of the filter represents a distortion, and the choice
of scaling on the input depends on how often such distortion is permissible. The saturation arithmetic on the TMS320C25 is controlled by the OVM bit of status register STO
and can be changed by the SOVM (set overflow mode), ROVM (reset overflow mode),
or LST (load status register).
output
12 15
1
: 12 15
1.. . .
input
223
Filter coefficients are updated using Equation (30). As illustrated in Figure 13, a
new technique presented in reference 31 uses the scaling factor a to prevent filter's coefficients overflow during the weight updating operation. Suppose you use a = 2ffi. A right
shift by m bits implements multiplication by a, while a left shift by m bits implements
the scaling factor 1Ia. Usually, the required value of a is not expected to be very small
and depends on the application. Since a scales the desired signal, it does not affect the
rate of convergence.
dIn)   . . . , . . . .    ......
J.~
xln)t_.....
FILTER
STRUCTURE
1/8
eln)
yIn)
ADAPTIVE
ALGORITHM
o=
2 b ,
(32)
(b = 15), is called the width of quantization since the numbers are quantized in steps of o.
The products of the multiplications of data by coefficients within the filter must be
rounded or truncated to store in memory or a CPU register. lAs shown in Figure 14, the
roundoff error can be modeled as the white noise injected into the filter by each rounding
operation. This white noise has a uniform distribution over a quantization interval and
for rounding
 112 0 < e
224
~112
(33a)
and
(33b)
where
xt. .[:t4..,
.
E
t   ........... y
= Rounding Ix
8]
=x
225
Design Issues
The performance of digital adaptive algorithms differs from infinite precision adaptive algorithms. The finite precision LMS algorithm is given as
~(n
+ 1)
~(n)
Q[u*e(n)*~(n)]
(34)
where Q [.] denotes the operation of fixed point quantization. Whenever any correction
term u*e(n)*x(n i) in the update of the weight vector in Equation (34) is too small, the
quantized value of that term is zero, and the corresponding weight wj(n) remains unchanged. The condition for the ith component of the vector w(n) not to be updated when the
algorithm is implemented with the TMS320C25 is
I u e(n)
x(ni)
I <012
(35a)
2exp * 0/2
(35b)
(36b)
where ax 2is the power of input signal x(n) and fmin is the minimum mean squared error
at steady state.
In the Leaky LMS Algorithm section, it was mentioned that the exce.ss MSE given
in Equation (14) is minimized by using small u. However, this may result in a large quantization error since the most significant term in the total output quantization error is [31]
226
Noe2
(37)
2 a2 u
The optimum step size uo reflects a compromise between these conflicting goals.
The value of uo is shown to be too small to allow the adaptive algorithm to converge completely and also to give a slow convergence. In practice, u > uo is used for faster convergence. Hence, the excess MSE becomes larger, and the roundoff noise can typically
be neglected when compared with the excess mean square error.
Finally, recall Equations (11) and (12). The step size u has an upper limit to guarantee
the stability and convergence. Therefore, the adaptive algorithm requires
1
O<u<
(38)
Nox 2
On the other hand, the step size u also has a lower limit. The optimum uo, which
minimizes the sum of the excess MSE and roundoff noise, is smaller than Umin, i.e., too
small to allow the adaptive weight to converge. For an algorithm implemented on the
TMS320C25, the wordlength of 16 bits is fixed, and the minimum stepsize that can be
used is given in Equation (36). The most important design issue is to find .the best u to satisfy
Umin
<
< 
Nol
(39)
Therefore, in order to make the condition in Equation (39) valid, the initial values
of filter coefficients are better close to zero for the floatingpoint processor if the situation
in unknown.
Software Development
The TMS320C25 and TMS320C30 combine the high performance and the special
features needed in adaptive signal processing applications. The processors are supported
by a full set of software and hardware development tools. The software development tools
include an assembler, a linker, a simulator, and a C compiler. The most universal software development tool available is a macro assembler. However, the assembly language
programming for nsp can be tedious and costly. For adaptive filter applications, an
assembly language programmer must have knowledge of adaptive signal processing. The
challenge lies in compressing a great deal of complex code into the fairly small space and
most efficient code dictated by the realtime applications typical of adaptive signal processing.
227
Recently, C compilers for the processors were developed to make DSP programming easier, quicker, and less costly compared with the work associated with programming in assembly language. Due to the general characteristics of a compiler, the code
it generates is not the most efficient. Since the program efficiency consideration is important for adaptive filter implementation, the code generated from the C compiler has to
. be modified before implementing. Thus, two alternative ways, besides writing an assembly
program, to implement adaptive signal processing on DSP are presented. First is the
automatic adaptive filter code generator [12], which can be found on Texas Instruments
TMS320 Bulletin Board Service (BBS), and second are the adaptive filter function libraries
that support assembly and C programming languages.
In this report, two adaptive filter libraries have been developed: one can be called
from an assembly main program; the other can be called from the C main program. Note
that, for the TMS320C25 only, certain data memory locations have been reserved for storing
the ne.cessary filter coefficients, previous delayed signal, etc. In other words, these data
memories are used as global variables.
The foilowing example is the TMS320C25 assembly main routine, which performs
an adaptive line enhancement by calling the LMS algorithm subroutine. The filter order
is 64, delay is equal to one, and the convergence factor u is 0.01.
*
*
228
ORDER:
MU:
PAGEO:
.equ
.equ
.equ
20
327
*
*
**
XO:
XN:
WN:
.usect
.usect
.usect
*
ONE:
U:
ERR:
Y:
.usect
.usect
.usect
.usect
.usect
.usect
D:
ERRF:
**
*
"buffer" ,ORDERl
"buffer" ,1
"coeffs" ,ORDER
, 'parameters' , , 1
, 'parameters' , , 1
, 'parameters' , , 1
, 'parameters' , , 1
, 'parameters' , , 1
"parameters" , 1
INITIALIZATION
START
LDPK
SPM
SSXM
LRLK
LACK
SACL
LALK
SACL
PAGEO
1
AR7,XO
1
ONE
MU
U
;
;
;
;
;
Set DP = 0
Set PM equal to 1
Set sign extension mode
AR7 point to > 300
Initialize ONE = 1
************************************************************************
************************************************************************
INPUT:
IN
D,PA2
CALL
LMS
; Call subroutine
OUT
Y,PA2
LAC
LARP
SACL
B
.end
D
AR7
*
*
OUTPUT:
*
INPUT
229
The symbols, such as ORDER, U, ONE, D, LMS, Y, and ERR, are defined and
referred to for the purpose of modular programming. The uninitialized sections specified
by the directive .usect can be placed in any location of memory according to the linker
command file. Note that MACD instruction requires the sources of the operands on program memory and data memory separately, and CNFP instruction configures RAM block
oas program memory. Therefore, the coeffs section has to be in data RAM block 0, and
the buffer has to be in RAM block 1. Appendix HI contains the adaptive transversal filter
with LMS algorithm subroutine using the TMS320C2S, and Appendix H2 contains an
example of a linker command file.
*
N
mu
*
begin
230
$
N,BK
@xILaddr
@xILaddr,ARO
@wILaddr,ARI
O.O,RO
N1
RO,*ARO+ +(1)%
;
;
;
;
;
Set
Set
Set
Set
RO
up circular buffer
data page
pointer for x[]
pointer for w[]
= 0.0
; x[] = O.
IlsTF
LDI
LDI
; w[] = O.
; Set pointer for input ports
; Set pointer for output ports
nput:
LDF
IILDF
STF
STF
*AR6,R7
*+AR6(1),R6
R7,@d
R6,*ARO
;
;
;
;
*
*
*
*
CALL LMS30
OUTPUT y(n) AND e(n) SIGNALS
LDF
BD
LDF
STF
STF
@y,R6
input
@e,R7
R6,*AR7
R7,*+AR7(1)
;
;
;
;
;
Input d(n)
Input x(n)
Insert d(n)
Insert x(n) to buffer
Get y(n)
Delay branch
Get e(n)
Send out y(n)
Send out e(n)
*
DEFINE CONSTANTS
*
*
.usect "buffer" ,N
n
.usect "coeffs" ,N
wn
.usect "vars",1
iILaddr
out_addr .usect "vars",1
xILaddr .usect "vars",1
wILaddr .usect "vars",1
.usect "vars" ,1
u
.usect "vars",1
order
.usect "vars",1
d
.usect "vars",1
y
.usect "vars" ,1
e
" .cinit"
. sect
cinit
. word 6,iILaddr
.word 0804000h
.word 0804002h
.word xn
. word wn
Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30
231
.float
.word
.end
mu
N2
In the above example, data memory order is initialized to N  2 for computation convenience. The linker command files and the subroutine that implements the LMS transversal filter can be found in Appendixes H4 and H5.
C Function Libraries
The TMS32OC25 and TMS32OC30 C language compilers provide bighIevellanguage
support for these processors. The compilers allow application developers without an extensive knowledge of the device's architecture and instruction set to generate assembly
code for the device. Also, since C programs are not devicespecific, it is a relatively
straightforward task to port existing C programs from other systems.
To allow fast development of efficient programs for adaptive signal processing applications, C function libraries have been developed. These libraries include functions for
adaptive transversal, symmetric transversal, and lattice structures.
232
**
SUBK
SACL
LAC
SACL
LAC
SACL
LAC
LRLK
SACL
1
ORDER
*U
*D
*,O,AR3
AR3,FRSTAP
*
; ORDER = N  1
; Getting and storing the mu
; Getting and storing the D
; Insert the newest sample
ARl
AR2,*,AR2
y
*,O,ARl
AR2,*,AR2
ERR
*,O,ARI
Therefore, the parameters should be entered in the order given above. If there are
other parameters, they should be inserted right after the convergence factor mu. The leaky
LMS algorithm subroutine is given as an example.
llms(n,mu,r ,d,x,&y ,&e}
the r is defined in Equation (28a). Note that the values of the AR registers, which will
be used in subroutine, and the status registers must be saved at the beginning of the
subroutine and restored right before returning to calling routine. An example of a Ccallable
program is given in Appendix 11. Memory locations 0200h to 0200h + N 1 and 0300h
to 0300h + N 1 are reserved for filter coefficients and buffers, respectively. N denotes
the filter order.
TMS320C30 C Subroutine
As previously mentioned, the TMS320C30 architecture has features designed for
a highlevel language compiler. Note that the callable word is dropped in this section title
because the TMS320C30 is so flexible that the restrictions for the TMS320C25 no longer
exist. Since the memory locations of filter buffers and coefficients are determined by the
parameters that pass from the calling routine, the same subroutine can be used in different
places. However, the only restriction is that the memory locations of filter buffers must
align to the circular addressing boundary [14]. The features of TMS320C30 architecture
that make a major contribution toward these improvements are dual data address buses,
software stack, and flexible addressing mode. The parameters passed to subroutine are
pushed into the stack. Therefore, after returning from the subroutine, the stack pointer,
SP, must be updated to point to the location where SP pointed before pushing the parameters
Implementation of Adaptive Filters with the TMS320C25 or the TMS320C30
233
into the stack. "However, this will be done by the C compiler. The usage example of the
C function subroutine is given as follows:
tlms(n,u,d,&w,&x,&y,&e) where
n  Filter order
u  Step size
d  Desired signal
&w  Filter coefficients
&x  Input signal buffers
&y  Addr of output signal
&e  Addr of error signal
The example below shows how the C subroutine receives and manipulates the
parameters passed from the caller program and how the result is returned to the caller
routine.
FP
*
*
*
*
AR3
FP"
SP,FP
*FP(2),R4
; Get filter order
"*FP(6),ARO ; Get pointer for x(]
*FP(5),ARl ; Get pointer for w(]
IlsTF
LDI
STF
MPYF
POP
*FP(2),AR2
R2, * +FP(1),R7
R2, *AR2
*FP(3),AR2
R7, *AR2
* + FP(2),R7
FP
;
;
;
;
;
;
Note that AR3 is used as the frame pointer in TMS320C30 C compiler. Appendix
12 contains the complete LMS transversal filter example subroutine program.
234
on the TMS320C25. Figure 15 illustrates the flowchart of this procedure. Since the implementation on TMS320C30 is done only by the simulator, the last stage, realtime testing,
is not implemented.
Algorithm Analysis
and C Program
Implementation
J
.~
Rewrite C Program
to Emulate
DSP Sequence
~
Implement in DSP
Program and Testing
by DSP Simulator
RealTime
Testing
+
Figure 15. Adaptive Filter Implementation Procedure
In the first stage, algorithm design and study is performed on a personal computer.
Once the algorithm is understood, the filter is implemented using a highlevel C program
with double precision coefficients and arithmetic. This filter is considered an ideal filter.
In the second stage, the C program is rewritten in a way that emulates the same
sequence of operations with the same parameters and state variables that will be implemented
in the processors. This program then serves as a detailed outline for the DSP assembly
language program or can be compiled using TMS320C25 or TMS320C30 C compiler.
The effects of numerical errors can be measured directly by means of the technique shown
in Figure 16, where H(z) is the ideal filter implemented in the first stage and H'(z) is
a real filter. Optimization is performed to minimize the quantization error and produce
stable implementation.
235
H(z)
+
10(n)12
x(n)
ME
n=1
e 2 ln)
H(z)
236
PERSONAL
COMPUTER
TEK2235
SCOPE
FG504
FUNCTION
GENERATOR
HP3561A
DYNAMIC
SIGNAL
ANALYZER
1:
+
.......
PRECISION
NOISE
GENERATOR
HP PLOTTER
x(n)........,.,
d(n)
+
~~~e(n)
x(n1)
Adaptive
Filter
1    0     _ Enhanced
yIn)
Output
237
oscilloscope and signal analyzer, the significant SNR improvement, convergence speed,
ability to track nonstationary signals, and longterm stability of the adaptive predictor are
observed.
RANGE: 17 dBV
1/15
115dBV
STATUS: PAUSED
A:MAG
Amplitude
6dB/DIV
33
START: 0 Hz
BW: 47.742 Hz
STOP: 5,000 Hz
,238
RANGE: 13 dBV
1/15
STATUS: PAUSED
A:MAG
15 dBV
Time
Amplitude
6 dB/DIV
33
START:
0 Hz
BW: 47.742 Hz
Summary
Three adaptive structures and six update algorithms are implemented with the
TMS320C25 and TMS320C30. Applications of adaptive filters and implementation considerations have been discussed. Two subroutine libraries that support both C language
and assembly language for two processors were developed. These routines can be readily
incorporated into TMS320C25 or TMS320C30 users' application programs.
The advancements in the TMS320C25 and TMS320C30 devices have made the implementation of sophisticated adaptive algorithms oriented toward performing realtime
processing tasks feasible. Many adaptive signal processing algorithms are readily available
and capable of solving realtime problems when implemented on the DSP. These programs provide an efficient way to implement the widely used structures and algorithms
on the TMS320C25 and TMS320C30, based on assemblylanguage programming. They
are also extremely useful for choosing an algorithm for a given application. The performances of adaptive structures and algorithms that have been implemented using the
TMS320C25 and TMS320C30 have been summarized in Tables 1 and 2.
239
LMS
TMS320C25
Instruction Cycles
7N+28
33
Leaky
Instruction Cycles
8N+28
LMS
34
SignData
Instruction Cycles
11N +26
Transversal
LMS
41
Structure
SignError
Instruction Cycles
7N+26
LMS
30
SignSign
Instruction Cycles
11 N + 21
LMS
30
Normalized
Instruction Cycles
7N+S7
LMS
47
Instruction Cycles
7.SN+38
LMS
50
Leaky
Instruction Cycles
8N+38
Symmetric
Transversal
Structure
LMS
51
SignData
Instruction Cycles
9.SN+36
LMS
58
SignError
Instruction Cycles
7.5N+36
LMS
47
SignSign
Instruction Cycles
9.5N+31
LMS
47
Normalized
Instruction "Cycles
7.5N +69
LMS
66
Instruction Cycles
33N+32
LMS
63
Leaky
Instruction Cycles
35N+32
Lattice
LMS
65
Structure
SignError
Instruction" Cycles
36N+32
LMS
65
Normalized
Instruction Cycles
90N+34
92
LMS
"
240
Instruction Cycles,
3N+ 15
17
Leaky
Instruction Cycles
3N+ 15
LMS
19
SignData
Instruction Cycles
5N+ 16
Transversal
LMS
24
Structure
SignError
Instruction Cycles
3N+ 16
LMS
18
SignSign
Instruction Cycles
5N+ 16
LMS
24
Normalized
Instruction Cycles
3N+47
LMS
49
Instruction Cycles
2.5N+15
LMS
Symmetric
Transversal
Structure
23
Leaky
Instruction Cycles
2.5N + 19
LMS
26
SignData
Instruction Cycles
3.5N+ 18
LMS
30
SignError
Instruction Cycles
2.5N + 18
LMS
24
SignSign
Instruction Cycles
3.5N + 17
LMS
30
Normalized
Instruction Cycles
2.5N+50
LMS
56
Instruction Cycles
14N+9
LMS
20
Leaky
Instruction Cycles
16N+9
Lattice
LMS
22
Structure
SignError
Instruction Cycles
16N+9
LMS
22
Normalized
Instruction Cycles
67N+9
LMS
73
241
References
[1] B. Widrow and S. Stearns, Adaptive Signal Processing, PrenticeHall, 1985.
[2] R. Lucky, J. Salz, and E. Weldon, Principles of Data Communications, McGrawHill, 1968.
[3] S. Haykin, Adaptive Filter Theory, PrenticeHall, 1986.
[4] M. Honig and D. Messerschmit, Adaptive Filters: Structures, Algorithms, and Applications, Kluwer Academic, 1984.
[5] J.R. Treichler, C.R. Johnson, and M.G. Larimore, Theory and Design of Adaptive
Filters, Wiley, 1987.
[6] T. Alexander, Adaptive Signal Processing, SpringerVerlag, 1986.
[7] G. Goodwin and K. Sin, Adaptive Filtering Prediction and Control, PrenticeHall,
1984.
[8] M. Bellanger, Adaptive Digital Filters and Signal Analysis, Marcel Dekker, 1987.
[9] J. Proakis, Digital Communications, McGrawHill, 1983.
[10] C. Chen and S. Kuo, "An Interactive Software Package for Adaptive Signal Processing on an ffiM Person Computer," 19th Pittsburgh Conference on Modeling and
Simulation, May 1988.
[11] S. Kuo, G. Ranganathan, P. Gupta, and C. Chen, "Design and Implementation of
Adaptive Filters," IEEE 19881nternational Conference on Circuits and Systems, June
1988.
[12] S. Kuo, G. Ma, and C. Chen, "An Advanced DSP Code Generator for Adaptive
Filters," 1988 ASSP DSP workshop, Sept. 1988.
[13] Texas Instruments, SecondGeneration TMS320 User's Guide, 1987.
[14] Texas Instruments, ThirdGeneration TMS320 User's Guide, 1988.
[15] S. Qureshi, "Adaptive Equalization," Invited Paper, Proceedings of the IEEE, Sept.
1985.
[16] L. Rabiner and R. Schafer, Digital Processing of Speech Signals, PrenticeHall, 1978.
[17] N. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications
to Speech and Video, PrenticeHall, 1984.
[18] J. Makhoul, "Linear Prediction: A Tutorial Review," Proceedings ofthe IEEE, April
1975.
[19] C. Cowan and P. Grant, Adaptive Filters, PrenticeHall, 1985.
[20] C. Gritton and D. Lin, "Echo Cancellation Algorithms," IEEE ASSP Magazine,
April 1984.
[21] D. Messerschmitt, et al, "Digital Voice Echo Canceller with a TMS32020," in Digital
Signal Processing Applications with the TMS320 Family, PrenticeHall, 1986.
[22] B. Widrow, et al, "Adaptive Noise Cancelling: Principles and Applications," Proceedings of the IEEE, December 1975.
[23] A. Lovrich and R. Simar, "Implementation of FIRIIIR Filter with the
TMS3201O/TMS32020," in Digital Signal Processing Applications with the TMS320
Family, Texas Instruments, 1986.
242.
243
CI
C2
DI
D2
EI
E2
FI
F2
GI
G2
HI
H2
H3
H4
H5
Ii
12.
244
Title
Transversal Structure with LMS Algorithm Using the TMS320C25
Transversal Structure with LMS Algorithm Using the TMS320C30
Symmetric Transversal Structure with LMS Algorithm Using the
TMS320C25
Symmetric Transversal Structure with LMS Algorithm Using the
TMS320C30
Lattice Structure with LMS Algorithm Using the TMS320C25
Lattice Structure with LMS Algorithm Using the TMS320C30
Transversal Structure with Normalized LMS Algorithm Using the
TMS320C25
Transversal Structure with Normalized LMS Algorithm Using the
TMS320C30
Transversal Structure with SignError LMS Algorithm Using the
TMS320C25
Transversal Structure with SignError LMS Algorithm Using the
TMS320C30
Transversal Structure with SignSign LMS Algorithm Using the TMS320C25
Transversal Structure with SignSign LMS Algorithm Using the TMS320C30
Transversal Structure with Leaky LMS Algorithm Using the TMS320C25
Transversal Structure with Leaky LMS Algorithm Using the TMS320C30
Assembly Subroutine of Transversal Structure with LMS Algorithm Using
the TMS320C25
Linker Command File for Assembly Main Program Calling a TMS320C25
Adaptive LMS Transversal Filter Subroutine
TMS320C30 Adaptive Filter Initialization Program
Assembly Subroutine of Transversal Structure with LMS Algorithm Using
the TMS320C30
Linker Command/file for Assembly Main Program Calling the TMS320C30
Adaptive LMS Transversal Filter Subroutine
C Subroutine of Transversal Structure with LMS Algorithm Using the
TMS320C25
C Subroutine of Transversal Structure with LMS Algorithm Using the
TMS320C30
"t:j
.title
'TUtS'
*HfHfHHttUHHffHfHHHf*HIHH.HfUUIHHfHHHHHHHHHtU
is
d(n)
g'
..::
us@ct
.used
paraaeters ,1
paraathrs N ,1
.used
used
usect
.usect
AparaHters",1
"paraaeters ,I
"paraaeters, 1
"parilH!ters,1
ERRf:
:+
t(n)
~
=
.....
>
text
"t:j
ioo"
ESTII'IITE TI SIGNAl.. Y
::1.
~
~
Algorith.. :
LARP
un
63
yin) = $IJ1w(klfxlnk)
ko()
~.
s.
s.
Whete
~
t:l
N
Note:
k=O,1,2, ,b3
FIR
II/(!
IIU
= 0.01.
..,<:>
S.
'"
~FINE
XO:
XN:
loIN:
LT
ItPY
64
ADAPT
"buffer ,OODERl
buffer", 1
coeffs ,ORDER
ZIIlR
tl'YA
SACH
BANZ
FINISH
.end
CIl
..... <
CIl
(JQ (t>
"'1
CIl
C:l
=~
rJ).
~; ACe
ERR
LARK
LRLK
LRLK
PARArlETERS
.equ
.equ
SACH
PI
ADD
reDER:
OII+OfdOOh, .
SACH
PAGEO:
Using rounding
=  Yin)
; T = ERRln)
; P = U ERRln)
Ot,IS
ERRF
ARl,ORiERl
AR2,WN
AR3,XN+l
ERRF
;
;
;
;
;
;
;
;
;
f,AR3
f,M2
It,O,AR!
ADAPT, f , AR2
~=
N"'1
\.}~
ERR
U
f,AR2
~~
rJ).:::l
=~
HfffHffff+****+fff++HH*fHU+HUHHHff+HHfHfHHfHf*HHffff
~FINE
!HO,lS
AR3,XN
ffiDERl
NEG
ADI!
LT
ItPY
"'1
Initial condition:
1) PI1 status bit should be equal to 01.
2) SXtl status bit should be set to 1
3) The current [p (datil leaory page pointer) should be page O.
41 Dita lIe.ory ('j should be 1.
5) Data IIIe.ory U should be 327.
v,
lAC
LRlJ(
RPTK
NACD
CNFD
API
SACH
AR3
11'1'1(
w(U
'"
fHffIHIHI***lfHfHffftfHHfHI
~
t:l
N
>
't:j
't:j
UfH*HIH*HHfHffffffHHHffH
(SUl"Il)
D:
V:
ERR:
OOE:
U:
Set up counter
Point to the coefficients
Point to the data sample
T register = U f ERR(nl
P = U ERR(n) f X(nkl
Load ACCH with Alk,n) " round
W(k,n+ll = WO::,n) + P
P = U ERRln) f Xlnk)
Store W(k,n+ll
....
til;.
t"'"
e::
s.;.
a
fffHfHHHHHUHHHH+HH+HtHtHfHHIIIIIIIIIIIIIIIIIII
f
input:
LIF
(SlIt)) t(n)
:~
::) yIn)
Al goritha=
63
ylnl = SLI1 olkl.xlnkl k=O,1,2, ,,63
k=O
.Q.,
::...
"
LPDATE
,In)
= dlnl
 yIn)
;;;
IIIlkJ
=.. (kl
+ u*flnl*x(nkl k=O,l,2, . 63
~
;::;
:::l.
=bot
Chen, Cheinthung
and au
=0.01.
~rch,
1989
,copy
"
"adapfltr.int"
HfffHffHffHfHHfHfHHIHIHHfftHHfHftfHHHffff
*HffHltHfHfffHfffffHffHffftHHffHfHfffHftffHHf
order
u
. set
.Stt
b4
01
So
~
ac
~
~
btgin
. text
. set
LOI
LIP
LOI
LOI
LIF
RPTS
STF
:: STF
LOI
LOI
. $
order,BK
txn_addr
txo.a.ddr,MO
hn_iddr,ARI
O.O,RO
order1
RO,_I1IX
RO,>AR1++llIX
@in_addr,AR6
@ouLaddr Jifl
R2,R7
R2,'AR7
R7, ttAR711l
SUBF
STF
STF
~IGHTS
,In)
=dlnl
..,~
 yin)
Cj
I'I'YF
If'YF3
LOI
RPTB
If'YF3
ADIF3
STF
I'I'YF3
ADDF3
BD
STF
ADIF3
STF
@u,R7
; R7 = tin) f u
tARO++<1IX"R7,Rl ; Rl =t(n) f u .. xln)
order3,Rt
; Initialize repeat counter
, Do i = 0, N3
utS
lARO++lm,R7,Rl ; RI =tin) oJ u x(nill
; R2 = lIIilnl + tin) I u .. xlnil
>ARl,Rl,R2
; ~i(n+1) = "Hnl + e(n) I u I x(ni)
R2,>AR1++lm
; For i = N  2
+MO,R7,Rl
>ARl,Rl,R2
input
; De 1ay branch
; wi<n+1l = ~i<n) + e(n) I u I x(ni)
R2,*AR1++11IX
>ARl,Rl,R2
; Update h,st III
R2,>AR1++I1IX
Jj. <..,
[IJ
oln I
[IJ
[IJ
;. e:..
~
00
~
s;::i
ooll
~c:
N..,
Q~
n~
~
Q;.....
IEFH COOSTANTS
"
LIIS
fffHHHfHtHHHUHHHHffftHfHUtHHfffffffffffHffHtHf+H+H
G
v.
..,
=
....
COI1PUTE ERRal SIGNAL olnl AND OUTPUT ylnl AND .Inl SIGNALS
is
So
:g>
0.0,R2
, R2 = 0.0
_IIIX,>AR1++11lX,Rl
ordtr2
fARO++(lll, fMl++(11l,Rl
Rl,R2,R2
, ylnl = oU.xU
Rl,R2
; Include last result
LIF
I'I'YF3
RPTS
1'I'YF3
:: ADDF3
ADIF
:+
l?
'G
xln) :
Input din}
Input xln)
Insert xln) to buffer
dlnl              :
tARb,R7
.+M6UI,Rb
R6,+MO
:: LIF
STF
Set
Set
Set
Set
up drculu buffer
data page
pointer for xU
xC]
=0
pointer for
RO = 0.0
If[]
0[1 = 0
Set pointer for input ports
Set pointer for output ports
xn
.n
in_addr
ouLaddr
xLaddr
IiIn_addr
u
cinit
.usect
.usect
.used
.usect
.used
.usect
.UStct
.Stct
.lIIord
.lIIord
.word
1liioI'd
.word
.float
.end
'buffer" ,order
"coeffs,order
vars,l
"'lars, 1
vars,l
vars,l
"'lars' ,1
. cinit"
S, in_addr
II8OOlOOh
0804002h
xn
.n
au
00
~
~
..,
=....
title
lEFII PMAlETERS
"
00El2'
dlnl :
::...
zUnI
~
:::.
::
:
fj
v,
:t
:t
ISltII
ISltII
ISlI11
::
::
:t :ISltII : Z
::
::
::
::
Algorith
ERR:
(1:
U:
[RRF:
>~
riQoo
0
::!. 9
puaJltters",1
.used
.useet
para.aetersM ,I
.useet
.used
. useet
.useet
pa.rueters, 1
paraaehrs,l
piruehrs" ,I
;'9
9 ::a.
",
d r')
....
~rllfters,1
HtffHfHtHHHHffffffflfHHHf
(I>
.... 03
AlIAPTlIE FILTER
= =",
.=''(1)
. <=
(JC:/
31
ylnl = SltI .Ikifxlnkl k=O.1.2 31
k=O
LARP
LARK
LJ1U(
Where
LRLK
LJ1U(
11ft
SYI1
LAC
ADD
SACL
BAN!
Note: This source progru is tbe generic version; I/O configuration bas
not bun set up. User has to lodify the Rin routine for specific
application.
AR3
ARl.0RlER21
AR2.l.I\SMT
AR3.FRSlIAT
AR4. FRSIIlF
H,O,M2
..... Il.AR4
*+,O,AlU
S'm ..... AR3
ESTHIATE TI SIONN. Y
00
id~
='"
n~
N ....
u.=
........~
",
If'VI(
LAC
LJ1U(
FIR
RPTK
I1AC1I
CII'II
Il
01.15
AR3.L.A5IIlf
00El21
IIltOfdOOh.'
APIV::
03~
:::~
Cll'P
Initial condition:
1) At status bit should be equal to 01.
21 Sit status bit should be set to logic 1.
31 Tbe current If (u.ta IItllOry page pointer should be page O.
41 Di.ta ory Cf should be 1.
SI Dita ..aory U should be 327.
Chen, CheinChung February, 1989
....=~
~
Q
c
"buffer" ,I
coeHs" ,0RDER2
coeffs ,1lURI
coeffs",l
text
D:
Y:
PfRF(IItt TI
So
~
"CS
buffer" ,Il[R21
.useet
.used
.useet
.usect
.used
HlffHHHHHlfHfHffHfHHffH
c
....
~
tv
:
So
~
~
tv
::
:t
b4
32
z : :: z :::
::
::
FRSBlf.
L.A5IIlf'
Ill'
FRSlIAT:
LASlIAT:
::
zllnk)
:1
::
:: ::
;\i
.tqu
.equ
    yin)
+:t
: A. F. :)ISltII) .Inl
!IlDER'
is
'Y25'
tHtHtffHlffHtftHHfHfHH+*HHH...tH"HHHtHHffHofHffHf
SACH
COI1PIJTE TI
EftR(II
="
00
lEG
ADD
SACH
; ACe =  yen)
D,I~
ERR
; ERRI
=01
 Yin'
=EM(n)
LT
ilPY
PAC
ADD
SACH
ERR
U
; T
II,I~
lJIRt(
ARI,0RDER21
AR2,1oW
M3,LASBlf
ERIIf
.,AR2
1,M3
ERIIf
; P = U I ERRI
'I;j
LRI.J(
LRLK
IT
"
II'Y
AIJIIPT
~
~.
iti
~
~
s:
So
'"
~
~
B
VI
So
'"
tv
Q
c
lllLR
II'VA
1,M2
SACH
BAIIl
",O,AR1
ADAPT, t ,AR2
; Stt up counter
; Point to the coefficients
t Point to tht liSt buffer
; T register
= U. ERRln)
Set point.r
LRI.J(
AR2,lASDATI
RPTK
DIMlY
OODEN
R'put Nl
1
.tnd
tilNS
HHHHIIIIIIIIIIIIIIIIIIIIIIII III1IIIIIIIIIIIIIIIHf.IHHIH
'i:l
Algoritha=
zlnk) =xlnkl)
31
is
g"
1II1(k)
.I:
"'Mr .,
Ult,
filttr order
tfHHHHflfHl.llllllllllllllltHHfHffHfHHH
o::!
HHHHHHtHlHIHfHHHHfllllllllllllll.11111
;:;.
;:;.
ordtr
.Stt
IU
.set
~
~
Q
V,
...<::>
;:;.
'"
RPTS
STF
:: SlF
STF
SlF
LDI
LDI
LDI
STF
;
;
;
;
;
;
,
SUIIF
STF
STF
II'YF
/PYFJ
LDI
RPTB
II'YF3
:: AIIIIF3
SlF
UIS
IfYFJ
:: ADDF3
,xl]=O
BD
0,,,,,/22
RO,tMI+<1I)
RO, 1AR2++(1)
RO, tMlIlRO)
RO,tAR2lIRO)
linlidr i M6
tout...dd',AR7
,.1]=0
, ,II 0
; III[] II: 0
, ,I] = 0
; Set pointtr for iDput porb
; Set pOinter for output ports
_,R7
t+M611l,R6
1IRO,1R4
R6,tMGUl1
,
;
;
I
RI,tMI+<I1l,RJ
Rl,"M2++C1)
RJ,R2,R2
, yll
til
....=
=.[].,Il
Q,.
; Store zln)
.t:=
>~
00
ag
::::!. 9
9
="tIl
9 ::;
R2,R7
R2,tM7
R7,t+M711l
; tIn)
=dIn)
 yIn)
d r;.
....
rn
input:
l.lF
:: l.lF
, Fi I ttl' order
; Step sizt
>
"C
"C
COII'UTE ERROl SIGHIII. .In) AND 001PUI yIn) and .In) SIGHIILS
:1
adlpfltr.int64
0.01
life
II'YF3
:: SlF
AIIIF
ADAPTII FILTER
.copy
=0.0
_1111,1IIR5(1)1,RI
AIIIIF3
:!l
R2
A1111F3
:: SlF
AIIIIF3
~.
RPTB
IfYF3
IltER
0.0,R2
1IRO,1IR5
orderJ22,RC
k=O
'"
LDI
LDI
yen)
.sa,
s.
l.lF
Input ~In)
Input xln)
Set for.rd pointer for xU
Insert xln) to buffer
SlF
AIIIIF3
SlF
xn
tU,R7
IAR2++I1l,R7,RI
ordet/23,RC
UIS
IAR2++ 11 ) ,R7. RI
tMI,Rl,R2
R2,tMI+<(1)
tAR2IIRO),R7,RI
tMl,RI,R2
input
R2,tMI+<11l
tMI,RI.R2
R2, tMlIlRO)
cinit
used
.ustd
.used
used
.uJtct
.usect
.Ultet
.useet
.UJtct
Stet
....rd
;
;
;
,
DeilY brlnch
lIIiCn+1) II: lIIIiCn) + tin)
Include lISt III
l!pdot. lut
~Oi
~rs
00
~OO
N
C""l
I U'
zlni)
n~
~C=
""I
til
!FINE CQlSTIINlS
.n
zn
iruddr
ouLlddr
xnl.ddr
WL&ddr
zn_lddr
; R7 =tin) , U
; Rl =e~n) , u , zln)
; Initializt rt(tMt counter
,00;=0,N3
; Rl =tin) I u I zlnil)
; R2 =IIIUn) + tin) I u I zlni)
; llllien+!) =llllien) + tin) I u I z(ni)
; Fo, ; =N  2
= ~""I
P'tilfo=
="rn
(JQ
'buffet' ,OI'Mr
'coeffs' ,ordtrl2
'coeffs',order/2
'vau',l
'VlI'S',t
'vt.rs',.
YV'J,l
'v..,.s',1
'Vlrs',t
'.cinit'
6,i ....dd'
....~
e~
00
250
'G
.title
If~
::...
'g.
~
:::J
1lIIlER:
10lni
IlInl
IHlnl
i)ISlMI} i)ISlMI}filnl
ii
.
*
i::
.,0:1:
::
::
::
bUn)
S<I>
~
tv
Algorithl:
v.
Cl
....
S<I>
~
tv
ac
.used
"buffer ,mIER+1
.uslet
'buffer"~+l
~.
I"
'parileters ,I
M
.useet
lIj)i.rueters,l
y:
useet
.uslet
parueters ,I
paruehrs ,1
'paraaeters",1
"parueters,l
E:
U:
i=l,2, ,64
TEJI>:
iI
.useet
x:
i=1,2, ... ,b4
useet
i=1.2 64
.uslet
k=O
[IJ
64
00
="~ "1
ItU
Giln+!) = Giln) + IU
a:S00"1
INITIAlIZE TI POINTERS
f [ fitnlfbilInli + biln)lfittnl ]
f
~=
text
bUnHGiln)
i=O
eiCn)
IV
bUn)
i=1,2, 64
= 0.01.
Note: This source progru is the generic version; I/O configuration has
not betn set up. User has to .odify the !Nin routine for specific
application.
Initial condition:
!) ptf status bit should be equi.l to 01.
2) sxn stitus bit should be set to logic 1.
3) Tt.. current (P Ida.ta IItllOry page pointer) should bt pige O.
4) Data lIt.ory U should be "]27.
5) The 81 " BD1 pointer 1M3 " M4) should be exchanged every
iteration. For enap1e,
For odd iteration: AR3 ) 81
AR4 } 81
l.IIRf'
AR3
l.AAI(
ARI.IlIIlERI
M2,FI
lRlK
l.RI.J(
l.RI.J(
l.RI.J(
l.RI.J(
W~
~~
('1~
N="
AR3.BI
AR4.BDI
ARS,GI
AR6.KI
Vt~
a:
INITIAlIZE TI BI AlII Fl
00
lAC
SACL
SOlCL
.O,M2
I,O,AR3
"1
INITIIIlIZATJI*
LT
II'Y
....= .r,;.
(JQ
fffffHfHHfffffHHHfHHHHfH
= SU1
d~
HfffHffffffffffHfffffffHfffffH
Kiln+U = Kiln) +
Q.
('1
bil!n:l)
Ul
Bl=
8Dl:
* fiHnl
AR4 } 001
tv
'coeffs"~CER
I
i=O
eoeffs ,au:ER
coeffs .~+1
64
yin) = SIJ'I yUn)
.uSlet
.usect
D:
>
"CI
"CI
biHnl ::
;;"!
64
.usect
i
iZii}ISlMI} . iZii}ISlMJ)bilnJ
bOtn) ::
.equ
n:
1(1:
fkiJ :
::
Ftbruuy, 1989
fkil
xln):
.leO :
Clltin~hung
fHHUHfl,III IIIIIII.fHH+HfHHHHHHHHUHHtHHHfl4ff'HHH
IEFII PARAIETERS
is
.s;,
ehtn,
'us'
fHH.fH.......ufHfHfHH.HHtH.HHffHHfHHHf......ffH
',ARS
1.M2
PAC
SACH
IG
ADIII
SACH
E,
T = BI
P=BIIGI
ACe =81 f Gl
Initialize YIOI = B1 f 01
ra = 181 I Gil
ra = Dlnl  81 GI
Initialize EIOI = Din)  81 _ G1
... *
252
u:~
j~
~
~
ii!.
..
L30 I Adaptive lattice Structure Filter IIIith"UIS Algorithl
. using the 1l1S32OC3O
::l
Algorithll
1;;"'
"";osis
bi1(nl)
~:;;.
k=O
64
64
yin) = SlJI yiCn) =ilIt biCn)fG;iCn)
i=l
i=l
""~
kHn+l) = Kilnl + IU
GHn+1)
""
~
~
c:>
""
~
I [
I
filnlfbiHnl) + bUnlffiHnl ]
tiln)
* biln)
i=I,2, 64
=0.04.
; R3
copy
idipfltr.int l
OOTPUT y(nl
..
BD
SUBF
fHffHfllfffHfffHHffHHHfffHHHffHtHH
ord.r
.5.t
.s.t
64
0.04
STF
:I STF
LDI
II LDI
; Filter ord.r
; St.p siz
begin
c:>
a
c:>
01'<ier f 2,BK
f:kniddr
Ikruddr,MO
Ibn_iddr,lIRl
f! .....ddr,AR2
order,IRO
O.O,RO
ordtr*21
RO, tIIROttl lIX
RO,tllRltt(!lX
IIRI,IRO,1IR4
liruddr, AR6
f:outl.ddr,M1
ItOI'd
LDF
II LlIF
tIIR6,R7
ft/IR6ll ) ,R5
; Input dIn)
I Input xln}
.lIIOl'd
;
,
;
;
;
Set
Stt
Set
Set
Set
up Circular
d.ti page
pointtr for
pointer for
pointer for
buff.r
Ul
b[]
gEl
, RO = 0.0
, kll = 0.0 .nd gIl = 0.0
, bll = 0.0 ODd bd[] = 0.0
input:
f
kn
gn
bn
in_iddrouLaddr
kniddr
bn...ddr
gOlddr
cinit
.UStct
.usect
.usect
.usect
.usect
.usect
used
.usect
.usect
.Stct
.lIIord
.lIIord
.word
.lIIord
.fl ..t
.tnd
....=
.
Q.
~
~
N
~~
a.
.........
= ....
[Il
(JQ
DeIlY bl'lnch
Tik. out liSt tel'll
Stnd out ylnl
Send out e(n)
Upditt k[] pointer
Updit. g[] pointer
t")
~
;'00
.....
""I
13=
~~
00E;
~~
~~
~=:
~=
DEFIIf: cmsTANTS
.t.xt
.5.t
LDI
UP
LDI
LDI
LDI
LDI
LDF
Rl'TS
STF
Ii STF
ADDI
LDI
LDI
"C
"C
.(nl SIGNALS
input
R4,R6
R6,tAR7
R7,ttllR7<l1
fAROUROI,RS
tAR2UROI, R7
>
= kFil
fHffHlfHffHHfHffffllllllllllllllll tHH
~
N
N
UI
W
SUBF
ordtrl,RC
lattice
tARO,RS,R3
R7, tllRltt( m,RO
R3,tllR4,R3
I.,RO
RO,tAR2,RO
R3,.ARl
R5,tllRl,Rl
RO, tAR2tt( II
tARO, tM4,RO
RO,RS
RS,fllR4tt(IlX,RO
RI,RO
lu,RO
RO,tIIRO,RO
R3,tAR2,R4
RO, fAROtt1l1
R4,R6
R4,R7
BI f 01
Insert Bl
E=DBItGI
fHffHHHffftfHHHHHfHHtlllllllllllllllllllllHHHffHfl
VI
<:l
....
S
+ IU
S.
SS
=Giln)
Rl'TB
Pl'YF3
1I'YF3
SUBF3
"
II'YF
ADDF3
I I STF
Pl'YF3
II STF
II'YF3
SUBF
rfYF3
ADIF
II'YF
ADIF3
Pl'YF3
II STF
ADIF
;OS
RS,tAR2,R6
RS,tllRl
R6,R7
SUBF
un
5'
rfYF3
" STF
cotffs",ord.r
coeffs ,order
buffer".2i'order
vin,i
vars.l
vats,i
val's, 1
wrs,1
vars l ,l
.cinit
6.irLaddr
O804OOOh
l
080400211
kn
bn
gn
ou
=~
00
=....>
(JQ
""I
.....
e=
,htl.
'1JQ5'
tHtHHH+ftfHHIHHtHIfH+H++fHfHffHHHHffHH*HtHffHH
u.s
yarU) = (1.r)
Where
vu(Kl) + r
x(nl
lJIRP
LRLK
k=O,l,2, 63
SPH
x(nl
lilt
M3
M3,XO
=
=~
ADD
ACe = (1r)
oct = (11')
SACH
VM
Stort
~.
;;.
;;.
DEFIlE PARAlTERS
~ER:
SHIFT:
PAGEO:
...,<::>
.equ
.equ
.equ
..
I:>
D:
USfct
.used
.usect
RESERVE ADIR:SSES
.ustct
"buFfu" ,M[Rl
"buffer ,I
"coeffs.,flUR
F~
VM(nll
VM(ni) ... r
PIlRAlETERS
parutters,1
"'1
* X(n)
X(n)
W~Hn)
CNFP
If'YK
LAC
1JE,15
LRlJ(
FIR
to 07fffh.
IIPTK
I1ACIJ
CNFD
APr;;
SACH
M3,XN
~1
lIItOfdOOh, t
CllIIPUTE 11 ERROO
; ACe =  Vln)
D
ERR
lPDATE 11 IoEIGHTS
10:
IN:
lit:
~;;
=<
=~
d\ll
"'1
[IJ
lEG
ADm
SACH
b4
7
ACe = VMlnli
Initial condition:
1> PI1 status bit should be equal to 01.
2) SI~ status bit should be set to 1.
>~
rjQ~
o "'1
~
ERIIF
VM
VM,SHIFT
ERIIF,SHIFT
SUB
Note: This source progru is the generic version; 110 configuration has
not been set up. User has to .odify the Rin routine for specific
app 1ication.
Q
I:>
ZIUi
:3l
~
N
~.
SQRA
~.
'"
. text
."
;;.
ERRF:
VAA:
parae hrs,1
~
;:..
Q
v,
pa.rueters ,1
parueters,1
"parueters, 1
parueters",!
N
I:>
.usect
.used
.used
usect
u:
ko()
i.
pariHters.,l
M
63
'"
.used
.usect
HffffttHfffHH"tH""'.....HHf
Algorithll:
!?
'tl
Y:
ERR:
OOE:
LT
ERR
; T = ERRln)
; P ::: U f ERR(n)
1JE,15
rt'Y
pr;;
ADD
IUllli.I!E COOERGE
ABS
RPTK
SUBC
BIT
FACT~
14
VM
ERR, 0
e.
:r rJ1::t
. . . .....=
IJ'Q
=~
~e;
s=~
rJ1~
~
....
N .....
Q=(lZ
NO
Vie
e.
N'
~
=~
rJ1
:?
~
'i5
BBZ
'
~Y
ADAPT
::...
'i5
:::to
:::J
;:;
;t
:
So
~
~
~
N
C
Q
v,
....c
So
~
~
~
N
C
ac
N
Ul
Ul
SAel
LARl<
LRIJ(
LRLK
LT
.s;,
NEXT
lEG
IEXT
FINISH
; ERRF
ERRF
ARt,ORDERt
U .. :ERR(rd: I VAR
Set up counter
AR2,~
AA3, IN+1
Em
IALR
MPYA
",M2
',M3
1,M2
SACH
H,O,ARI
BANI
.end
=
; Store ERRF
P = U * ERR(n) .. X(nk)
lQad ACCH !!11th A(k,nl & round
W(k,n+ll = W(k,n) + P
p = U f ERRfnl .. X(nk)
Store W(k,o+1)
tv
ItfftHftHffHltHflHHHttHftHfffffHffHtfHftffotflfffHfl
VI
0\
If'YF
If'YF
~
I'PYF
Algorittll:
b3
yin) = SltI .Ik)>xlnk) k=O.1.2 ..... b3
k=O
Vitln)
= rlvarlnl)
tin)
=dIn)
 yin)
wlk)
=!IIlk)
::t
!;
c
.s;,
PFYF3
:: ADIF
STF
RPTS
u~(n)tx(nk)/var(n) k=O,I,2, 63
=64
ind au
copy
btgin
\.It
(;)
.,
...
64
0.01
1.0
0.996
0.004
; Filhr,ordtr
; Step 5Ut
; Input signil powr
..
STF
STF
tARO++(Ul,tMl++(1)l,Rl
R6,R3
R3,tvu
; Rutore varin)
ordtr"2
Rl.R7
, .In)
= din)
 yin)
t.xt
... t
LDI
LIf'
LDI
LDI
LIF
Rl'TS
STF
I I STF
LDJ
LDI
~
II ~
STF
R2,tM7
R7,ttM7(1)
, 1.0  olp"
_,R7
_Ml),Rb
Rb,<ARO
Input din)
Inpul xln)
Inmt xlnl to buff..
x[l = 0
.[1 = 0
Stt pointtr for iftput ports
Sft poinhr lor output ports
""r:....
II"""""
~ ~
.,
= ::a=
s
=:
~

...4
""1
rn
~ Qj
~. 
= =1
,Cooput. IIvarlnl
; min) = 0 2.
(JQ
~
~
......J
'"'"":I
a:
=
n
=
....
~
PUSH
POPF
Rl
24,Rl
Rl
1.R2
24,Rl
Rl
Rl
II'YF
SUIIIF
If'YF
Rl,~.RO
2.0.RO
RO.Rl
, RO = v x[0]
, RO = 2.0  v x[OI
, Rl = x[l] = x[O] 12.0  v x[OII
I'fYF
SUIIIF
II'YF
R2,Rl.RO
2.0,RO
RO,Rl
; RO =V f xU]
, RO = 2.0  v x[l]
, R2 " x[2] = x[1l 12.0  v x[l])
~
Q..
II'YF
SUIIIF
If'YF
Rl.~.RO
, RO =v X[2]
, RO = 2.0  v' x[2]
, R2 =x[3] = x[2] 12.0  v x[21)
!;;ioI
2.0.RO
RO.Rl
If'YF
Rl.~,RO
, RO = v x[31
IEGI
SUBI
"dtr.II(
","_oddr
fxruddr.MO
..._oddr,ARI
O.O.RO
order!
RO,tfIROt+llII
RO.tflRl++1III
lin~4dr 1Mb
loul_oddr;AR7
14
"''
r.I'J.
ASIi
.set
.set
.Stt
.5Ot
.5.t
PIJSW
v..>
~
....
SUBF
Q..
, Rl = 0.0
HfiffffHHHfHHtHlftHHfUHHMftHHH
...
O.O.Rl
'odoplltr.int'
"C
vulnlI
tfffffHfHffHfHIfHffHfHHHHfIHHHtH
:
~
HfftHffHfHHHfHfffHHHfHffHHfHftHftHf
~
order
:::;.u
~
pollltr
'"
olph.
;t
alphol
So
=t
::;.
~
; R3
>
If'YF3
>MO++II)I.tflRl++IIII.RI
:: ADIf3
Rl,R2.R2
; yin) = fII[].X[]
ADtf
Rl,R2
; Include liSt result
.
calPUTE ERROR SIGHAl .Inl
=0.01.
;:s
it.I::.
; R6 =x2
, Rb = IH) xl
+ l1rlfXln)fx(n)
R6,R6
"'_I.Rb
~~
lvat ,R3
ASIi
00 ::.a
W ....
(1
.,
W 0
~
..
!.
....
~
~
00
,.
;;li~
"'""
~i
!h
i~ ih
~.,
~~&
rhi ri
~i ~~
257
00
.title
1TSE25'
ERR:
.useet
HHtHHftHHllllllllllllllllHHfIHIHHtHHfflHfHHfHtHlfH
(1:
U:
.useet
.useet
ERRF:
NEGMU:
parueters",l
Rpal"ueters,1
parueters".1
"paruettrs,1
parueters R ,I
.useet
.ustet
>
"CI
*ff.l"HHHffHHHH~
Algorithll
IflflftfllfHIHit*HHHHfHHHH
63
ylnl 5lII.lkltxlnlc1 k=O,1,2, ,63
k=O
"
~.
01 llit.
Q
c
/GIll sh.uld be
NEG
ADDH
BGEI
LT
m.
ACe  Ylnl
ACe Dlnl  Ylnl
; T register = U
ADAPT
qu
qu
lEFllE AIDlSSES
64
.usect
.IStct
.usect
[f
buffer ,MlRl
buffer, 1
coeffs ,(Ji[R
I);
VI
.USlct
usect
ARI,ORDERI
; Set up counter
LRU<
l.Rl.K
II'Y
AR2,1fj
AR3,XN+1
IALR
II'YA
SACH
BANI
t+,O,ARI
ADAPT, ...., AR2
10=
INI
"U
lARK
.... ,AR2
1,M3
.... ,AR2
parueters\1
paI'ueters .1
FINISH
.end
rIQ~
Q .,
:1.
~
~tIl
e="~.,~
....
T register = U
D
NEXT
/GIll
>.
til _
~;
ERROR
[f
1I'IIATE TI !.EIGHTS
lEFllE PiIRAITERS
Ifj+OfdOOh, ....
LT
HtHHff+lfHHHHtffllllllllllllllllllllllHHH
PAlEO'
~.
~
MCD
SACH
NEXT
!JURI
if
_.y
ONE, 15
AR3,XN
tJlIEl1
tfj
Configure eo is progr... lHIIory
Clear the P register
Using rounding
Point to the oldest saple
Repeat N tiMS
Estiaate YCn)
Configure eo a5 data IBOry
RPTK
ClCK Tl SIGN
v.
II'YK
CNFD
APAC
Initial condition:
U PII stdus bit should be equal to 01.
21 SX" st.tus bit sh.uld be Stt t. I.
3} The current IF (dda ory ~ge pointer) should be page O.
4} Dlta ... ry CIE should be 1.
51 llit ....ry U should be m.
~
::;
AR3
LAC
~.
LARP
CNFP
LRU<
>= 0
<0
FIR
~
::t...
ESTlMTE TI SIGNN. Y
is'
So
So
~
. text
= .,00
(JCI~
~=
="n
tD
~E;
=::tD
00 ~.
~~
N="
=00
n
....
N(JCI
UI=
00
s::to
::...
~.
for k=O.l,2, 63
olkl olk) + ulxlnk) if tl,) )= 0.0
olk) olk)  Ulxlnk) if 01,) ( 0.0
Where
::::~
WI'
tv
<::>
~
STF
= 0.01.
.set
et
ASH
ildilpfltr.int l
b4
0.01
SEUtS
.Sft
begin
LDI
UI'
LDI
LDI
UF
RPTS
~
~
..
<::>
a
<::>
input:
STF
STF
LDI
LDI
UF
UF
order , 8K
txn..oddr
lxn..lddr.ARO
lIon...ddr.MI
O.O.RO
ordert
RO. tMO++lllI
RO.1M1++11l1
liruddr,AR6
lout..a.ddr ,Nfl
,xll'O
; .11 0
; Set .pointer for input ports
; Set pointer for output ports
".R4
;R4=.u
".R5
_.R7
t+ARl>lll R6
R6.1MO
IORJ
tFtF3
LOt
RPTB
1f'YF3
AOOF3
STF
If'VF3
.. ADW3
BD
STF
ADIF3
STF
I
R2. tAR7
R7,t+M7U)
31.R7
,
R4.R7.R5
,
offtRQ++CUJ.,R5,Rl;
order3, Re
;
SEUtS
xn
Mn
.usect
.useet
"bufftr,order
cotffs"lordfr
iLaddr
.useet
yus",1
outddr
. used
.used
"vars'''.
xJ\_addr
Im_addr
u
Get Signltln)]
R5 =S[tln)] U
Rl = S[e(n)) " U .. xCn)
Initialize reptit counter
00 i O. N3
Rl S[tln)] U I xlni[)
R2 oHn) + S[tln)] U I xlnj)
oHn+[) oiln) + S[eln)]lulxlnj)
For i = N  2
tMO++I1lI.R5.Rl;
IMI.RI.R2
;
R2.1M1++11l%
,
1MO.R5.RI
,
IMI.RI.R2
input
; Delay brancb
R2,tAR1++(1)1
; lIIiln+U = .itn} + S[.Cn)]tut')(Cni)
IMI.RI.R2
R2.1M1++11l%
, Update ).. t 0
IEFIIE COISTANTS
~
tfj
N
>.13
Q
rJ
:I. =
r ;:g
vars,'
vlrs,.
virs",l
tIl
~ ~
til
(lei
....
..,
53
t/.}
==
==
13 ..,
;.
~ ~
~
00
tc>
~
N=00=:
(is
==
(".N ~.
I
tr1
a
.,
; Input xCn)
.used
,used
. sect
.lIIord
word
word
0804002h
.lIIord
l1li1
00
,R5'"
cinit
UF
II UF
STF
....
fllfHfltHHHfffffHHflfHHfIHHHfIHHf
tU
tv
R2.R7
HfHHlfHHffHfffHHHfHHHHfHflKlHf
order
SUBf
.copy
>
"C
"C
; yIn) =oll.xll
f IRelucit last fesult
tHHfHfHffHfHfffHHHtHHfHlHHHHHfffH
....
So
~
10
=64 and au
Rl.R2.R211U
Rl,R2
Q
v,
:: STF
RPTS
tMO++IIlI.1M1++IIlI.RI
ord,,2
tMO++lIll.1M1++1Il%.RI
II'YF3
0.0.R2 II
;; ADW3
ADIF
; R2
=0.0
UF
63
So
So
~
_3
A)gorilh,"
~.
; bput dCn)
.float
.end
.,
",cinit"
5, iru.ddr
08040(l()h
.u
'TSS25'
.titl.
IHHHfHtlllllllllllllllllHHHffHlftHHfHHlfHHlHtHfflfHf
1):
.useet
pvueters.l
U:
ERRF:
ustet
useet
paruetfrs ,1
,aruettl's ,I
ffUHfIfHUHfHfffflffHffHlHf
UIIIP
"
!it
~
..,c
~
~
Q.
API
SET !P 11 POINTERS
lARK
l.RI.K
lRlK
ARI.ORIEl!
AR2.111
ARJ.XN+I
Set up counter
Point to the coefficients
Point to the dila supJe
ERR
OOJERI
PAGEO'
.equ.
.equ
!PIlATE 11 IoEIGHTS
ADAPT
64
.usect
.usect
.usect
buffer .0UR1
buffer ,1
coeffs".(JUER
.usect
.useet
.ustd
api.rueters 11
pitueters ,1
"rueters,1
LAC
1, O,M2
ACe = Xlnk)
lOR
ERR
ERRF
IORK
111.15
ADD
SACH
1,15
Upd.te Wlkl
SACL
LAC
FINISH
BANZ
.tnd
..... . . =
ERRF
".I.ARI
ADIIPT ..... ARJ
="tIl
=~
....
= 00
~~
~
til
.....
~~
13"1
HfIIHIHflIHfHfHHIHflHIHHIIHIHfIHIHH
IEFIIE PARAITERS
>~
~13
S; "1
CICI
/G
So
<1>
!x.15
ARJ.XN
ORDERI
IoN+OfdOOb ....
CN'D
LAC
l.RI.K
SACH
=0.01.
Initial condition:
~.
I!ACIl
....=
ARJ
If>\')(
RPTK
FIR
,pplication.
IU
)= 0
<0
Note: This source progru is the generic version; 110 configuration hiS
not been set up. User has to aodify the Hin routine for specific
So
<1>
CN'P
For k = 0.1.2 63
lII(k) = w(k) + u if e(nlfx(nkl
wlk) =1110:1  u iF e{n)~(nIc)
ESTiIlATE 11 SIGNAl. Y
I
s.So
text
63
yIn I = 5U1 .lkl",lnkl k=O.1.2 63
k=O
:g>
ft.,fHfHfIIH.tHtHHfHHHHH
~~
OO::;;!
~
N ....
....
==
(100
N ....
fJlCICI
=
....
=
I
00
CICI
00
fHHHfHlIHffHIHHHtHIHHHftHfflHHffHffHHfHtHH
g"
y(n)
~
~
~.
...,,.. 11ft
"'
~
N
9...
<
0.0
UIF
ASH
=M and .u =0.01.
XOR3
LUI
Ii'TB
.lHllHfHHHlIIIIIIIIII I.IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIH
order
copy
,Sft
ai.ckpf1tr. iDt'
64
..
,set
0.01
text
.s.t
LUI
LII'
LUI
LUI
UIF
UIF
UIF
UIF
begin
So
"'
IIPTS
STF
Ii STF
~
N
C
UIF
Ii STF
_'IS
order,.
l>auddr
I>auddr,lIRO
""'oddr,MI
I.,RO
;
;
;
;
u,
",RS
O.O,RO
; RO
orderl
lin_ldclr,M6
h.t...ddr,M7
UIF
_,R7
_U),R6
R6,eIIRO
LIIF
II STF
ASH
lID
Set
circulir buffer
s.t dAti pige
stt p.inter for x(]
Set pointer for !f(]
;RO=IIU
; R4=1U
RO,IAAQ++(lJX
RO,IMI++(lJX
XORJ
AIlDF3
STF
3I,R7
RO,R7,RS
IAAQ++(l)X,R6
3I,R6
RS,R6,R4
IMI,R4,R3
:g
~
~
R7
RS
=Signle(nll
=Sign[e(n)l
I
R6 = x(nl
R6 = Sign[x(nUl
R4 =Signlx(nilleSignl.(nll
R3 =IIIHn) + R4
;RS=.u
Input d(nl
Input xC D)
Insert xen) to buffer
II'Yf3
IAAQ++U IX, R6
R3,IMI++(l)X
31,R6
RS,R6,R4
IMI,R4,R3
eIIRO,R6
R3,IMI++UIX
3I,R6
input
RS,R6,R4
IMI,R4,R3
R3, eARI++( 111
N3
Update oN2(n+1I
Get the sign of dati
Delay brinch
Decide the sign of u
Coop.t. oNl(n+1I
Store list ,,(n+l)
.,
=
e
.... 11:1
="t'I.l
~
('t)
O~
....t'I.l
11:1
Ji~
~~
~=
a::~
....
....
'J).~
~
N
=="
.useet
'buffer',ordtr
r':l'J).
.usect
.useet
.useet
.useet
used
.usect
sect
'eotHs' ,order
vats.,1
'vars',l
'Yl.rs,1
=~
irt..addr
ouLaddr
xl'Ii.ddr
u
cinit
!ford
lIIord
=0.0
IAAQ++UlX,IMI++(lJX,RI
=0,
riQ~
xn
...rd
; R2
Do i
>~
IiII'I
Iali.ddr
O.O,R2
SSUIS
=0.0
xll =0
oil = 0
s.t pointer for input ports
Set pointer for oatput ports
input:
UIF
order3,RC
IEFIIoE caeTANTS
ASH
XORJ
ADIIF3
SSUtS
",R4
LDI
LDI
Ii UIF
STF
R2,R7
R2,<M7
R7,_7(l)
ASH
XORJ
>= 0.0
ADIF.l
;;"!
So
So
SUBF
STF
II STF
for k=O,1,2., 63
o(tl = oCt) + ., if x(n(cllt(n)
1iI(k) ;; .. Ck}  u, if xCriJftCn)
I,
ordtr2
COtPUTE ERROR SIGfW. .(nl AND IlJTPUT y(nl AND .(nl SIGfW.S
63
StII .. (U~lnk) PO,1,2, ,63
PO
s.
II'Yf3
II ADIF.l
ADIIF
Algorjth.:
is
R1'TS
.tnd
vars.,1
Yf.rs,1
.cinUS,iruddr
O804OOOh
OS04OO2h
01'4
xn
.lford
.float
IIIn
.u
'J).
t.
title
~Tl25~
fH*H+f.H+tHH+ffHHHHtfffHf .....fHHHHHHHHftHtffHHHf
u:
.USfct
ERRF:
.used
c>
pirueters, 1
Rpuuehrs,1
fffffHlHHHIffflflHffHff+ff+H
"'CI
("t)
"'tfHffHfIHffflfHlfHHHHHf
.
Algorithl:
Q..
~.
yin)
= SltI .Ikilxlnk)
eel'll
= dlnl
k=O
'"G
ff~
"'hue
'
Iff
~.
:31
~
tv
..."
...
<:l
IIMfdOOh....
~1
'EG
AD()l
SACH
IlIDER:
LEAKY:
PAGEO:
,equ
ACe = Vln)
D '
ERR
, ERRln) = Din)  Yin)
lPDATE TI IEIGHTS
.equ
.equ
AMOT
II
S;
= :=;
lARK
MI,ORDERl
; Set up counter
LRLK
AA2.~
LRLK
AR3, XN+l
LT
I'I'V
ERRF
.... AR2
ZALR
I,M3
rFYA
+,M2
SUB
f,~
coeffs",ORlER
SAO!
I+,O,ARI
AlW'T ..... AR2
BAN!
ERR:
;  '"l
buffer ,I
():
(JQ
....
buffer\auRl
tv
'"l
OPE, IS
ERIIF
.used
FINISH
til
ADD
SACH
.unct
.uud
pi.raeters ,I
puHfters ,I
puUtttrs ,I
"pt,rilltters ,I
ro
rJJ.
XO:
.usect
.used
.usect
.used
, T = ERRln)
, p = U ERRln)
IN:
WN:
ERR
U
LT
I'I'Y
b4
7
=
~
~~
DEFII PIlRAlTERS
~
C
AR3,XN
RPTK
I1IICll
POC
D:
V:
ac
L.RI..J(
~
......
~TI~
HfffHfHffHHHfftHHHHHfHHHftfHfHHfH
it
~
0
CfE,15
~
SACH
CWD
1?
FIR
Initial condition:
U Pft status bit should be equil to 01.
21 SXI'! status bit should be set to 1.
3) The current If (data Iflory pige pointer) should be pige O.
So:.
k~.1.2 63
Note: nis source progl'u is tl'tt generic version; JlO configurltion has
not been set up. User hu to lodify the Rio routine for sped fie
a.pplication.
~
::....
~
 yin)
is
~
CHFP
I'PYK
lAC
k=O.1.2..... b3
end
, P = U ERRln) I Xlnk)
; Load ~ lIIith A<k,n) &: round
: WO::,n+1) = "'(k,n} + P
: p = U ERfHn) f X(nkl
: ACe = R I W(I::,n) + P
; Store W(k,n t 1)
Cl=
N
tJI ~
to
~
~
~
~
~
~
00
>
riQ'
C
'"l
~
:;
:?
'i::j
;:
~
SUBF
b3
fin)
= din)
!}
lillie)
= r*wlkl
~.
1a
<S
s..
'"
~
N
0
=64,
=0.995
and
begin
...
s..
'"
~
~
N
"
.hxt
.set
LOI
LIP
LOI
LOI
LOI
LDF
RPTS
STF
STF
LOI
LOI
ordfr,BK
bn_addrl
txniddr, ARO
hln_addr, ARt
ir _addr ,M2
O.O,RO
orderl
RO, 'ARO++llIX
RO,tARlHllll
t!:in_addr ,ARb
@ouLaddr,M7
'ARb,R7
.+ARbill ,Rb
Rb, tARO
; Input din)
; Input x(n)
; Insert x(n) to buffer
;
;
;
;
,
Set
Set
Set
Set
data page
pointer for x[]
pointer for 'II[)
pointer for I'
RO = 0.0
,x[J=O
, .u = 0
; Set pointer for input ports
; Set pointer for output ports
input:
LIF
I I LDF
STF
= 0.01.
LUIS
V,
N
0\
IU
.copy
"a.dapfltr.int"
fffHHlfHffHfflffHlffffHHfHf,*Hfflfflf
PERFIHI ADAPTIVE FILTER
fHfHHHlllfffHHHfHffllfflfHlfflfllHlf
order
. set
64
lIIu_leaky
.Sft
0.01005
; au I leaky
leaky
.set
0.995
<;)
00
tl'YF
tl'YF3
tl'YF3
" ADIF3
LOI
RPTB
tl'YF3
I I AOIF3
tl'YF3
" STF
tl'VF3
" ADIF3
tl'YF3
" STF
iO
tl'VF3
ADDF3
"
tl'VF3
" STF
STF
IHHHHHlfHIHfffffffHIHIHHflHHlffHHIH
, ylnl = w[J.x[]
: include list result
>
"'0
'0
R2,R7
R2, 'AR7
R7,t+AR7111
=
.....
~
~
 yin)
'"2l
STF
II STF
<Q.,
:....
RI,R2,R2
RI,R2
COMPUTE ERROO SIGNI<. .Inl AND OOTPUT ylnl AND .Inl SIGNI<.S
AlgorithD:
....is
;:;:
I I ADIF3
ADDF
LIF
0.0,R2
; R2 = 0.0
tl'VF3
RPTS
tl'YF3
@u_r,R7
; R7
_IlIX,R7,RI ; Rl
.1IROt+l1lX,R7,Rl ; Rt
tARl,Rl,R2
; R2
order4,RC
LUIS
'AR2,R2,RO
t+AR1I11,RI,R2
tAROt+! 114, R7, Rl
RO, fARt ++( lIX
'AR2,R2,RO
HM1(U,Rl,R2
tARO,R7,RI
RO,'AR1++11II
input
tAR2,R2,RO
<+AR11l1,Rl,R2
'AR2,R2,RO
RO,tARl++1l1%
RO,fARl++(l)X
1l:F1J COOSTANTS
xn
'"in_addr
ouLaddr
xn_addr
IIIn_addr
''
r
r_addr
cinit
.usect
.usect
.usect
.used
.used
. used
.usect
.lJsect
.used
.sect
.word
. ord
.ord
.ord
.word
.float
. float
.word
.end
buffer",ordu
coeffs",order
vars",l
N vars ',l
vars,1
virs",l
vars",l
virs,1
'vars",1
. cinit"
7,in_addr
II8II4OOOh
0S040II2h
xn
.n
au_leaky
leaky
C".l
N
=tlnllu/r
= e{n)tulx(nl/r
= e(nlfu*xlnll/r
=.0(0) t eln)fulx(nl!t
; Initialize repeat counter
; Do i = 0, N4
; RO = rfwi(n) t e(nJtufx(ni)
; R2 = wi+1(n) .. elnlfulxlni1)/r
; Rt = eln)fulxlni2)/r
; store .i In+l)
; RO = rftM3(n) + eln)fufx(nN+3)
; R2 = ...2In) + e(n)fufx(nH+2)/r
; RI =fln)fulx(nH+UIr
; Store ~3{n+J)
; Delay branch
; RO = rltili(n) + e(n)tufxlnN+2)
; R2 = IIINUn) + e(n)lufx(nN+lI/r
; RO = rf"i(n) + e(n)lufx(nN+1l
; Stort ,,*"2(n+1)
; Update last .,
3
""l
=
CIl
d~
CIl
'"l
= ~
.... CIl
(JQ
rJ1
.....
::r .....
~ ""l
3~
~=
rJ1~
id~
=
..
n
.....
(.H::r
=~
~
=
~
rJ1
>....
(JQ
""I
.....
::r
9
'w.s'
.titl.
tHHfffIHIHffHfffHUnHHIHtflttHHH.u...nftffUtHUffffH
text
lARP
SAA
SAA
SAA
CNFP
/tPYI(
UIS
RPTK
0
OlE, 15
M3.IN
(lUElI
MCD
~fdOOh.l
LAC
111
yin} = SU1 .lk}IXlnk} kOO.I,2,oo.,NI
kOO
LRlJ(
FIR
'G
::!
~
tin) = din)
~ y(n)~
IIIlkJ = wlkJ
g.
;:s
::...
~
~
p.s.
'HfHfffHfttttH+'HunIHftHfHHHHfHftHH
Q
v,
a
c
ADAPT
l.ARI(
LRI.J(
LRlJ(
SACtI
LT
/tPY
Pri::
ADD
S~
S~
SACtI
BANZ
UIS,IlRlER,U.D.M. Y.ERR.IN.1oN
SA\'f2:
SAVE3:
ERRf'
.UStct
.usect
.USfct
.usect
Plruttus,l
~rueters, I
plrUtters , 1
Rptrueters , 1
fHtfHttHtfHfHHf+HHHtHfHf
ESTl11ATE Tl
SI~
....=
~
t"'I.==
~
~~
~
, ACe =  Yin}
D
>9
C'"
IJQ
~OO
ERR
, T =ERRln)
, P = U < ERRln}
==
9 ~
OlE, IS
ERRf
0[11
ARt,ORIEl!
AR2,1oN
M3.IN+I
ERRf
; Set up counter
f~,AR2
<,M3
... ,AR2
n,O,ARl
ADAPT ..... AR2
S.
........
==
IJQ
..... 0
=~
~ ~
~""I
~=
OO~
~<
N~
Q""I
LAR
LAR
LAR
LT
II'Y
ZILR
II'YA
Q..
00[11
Ll'DATE Tl IoEIGHTS
Initial conditions:
1) Ottl. Hlory M should be equi) to 1.
21 Diti It.ort U should bt equa.l to ttJ (QIS forlNtl.
3) Pt1 status bit should be equal to 01.
41 SIt'I sh,tu5 bit should be set to logic 1.
5) 0'hI stitus bit shourd be set to 1.
6) Tht current DP (dlta ItRiory pig'. pointer) should bt plge O.
~.
.,
; Estiaate Yin)
; Configure SO u aita, ory
Pl'ri::
SACH
ERR
"'CI
"'CI
; Reput N tiltS
af'D
16
ADIIl
SACH
>
Note: This subroutine perforls Adaptive Fi lter using the utS Algoritha.
There ire SOM initial conditions to Htt before cilling it.
<::>
;
;
;
;
;
;
;
;
COIIPUTE Tl ERRIll
is
M3
ARI.SAIIOI
AR2.SA'1f2
M3.SAllC:3
FINISH RET
.end
ARI.SA\I
AR2,SA'1f2
M3.SAVE3
n~
N
til 00
.....
"'I
=
.....
=
!")
""I
.........~
=
is
..
>
;~
~

~'< ~
..........
~~ h~
265
,.idth
132
';;l
!a
g'
~
~
~'
~
;;:;
;;
:
So
H**HfffHff*f+HHHftHffHHHffflofHHHHflfHfHHHHH
STAClCSIZE .set
FP
.set
RESET
sect
.word
stack
stacLaddr
ini Laddr
stack
UP
LDI
<:>
....
Cll'1
So
LDI
BZD
LDI
LDI
SUBI
BEQ
c:>
a
c:>
SI',FP
IIO AUTOINITJALIZATJIJI
Q
v.
; Address of stack
; Address of init tables
do_init:
iniLi.ddr
tin; Laddr ,ARO
I,ARO
done
_,RI
done
_,ARI
_,RO
I,RI
>
"'0
"'0
t'D
=
....
~
begin
.end
.ffi
a:
~
.word
.liIord
stacLa.ddr
@sh.cLiddr,SP
_,RO
I,RI
cinit
_,RO
done:
BR
RO,tARI++
RO,RI
do_init
_,AR
rJ1
vectors
adip_ini t
.useet
,text
LDI
LDI
c:>
LJIP
40h
M3
STI
:: LDI
LDI
BNZD
LDI
LDI
SUBI
fHIfHffHIH++HHfffftHHfHffHHfHfHHIHHH+HHHtff
=
=
(1
W
>
~
~
'S.
....
<
('D
~
....
Get page of stored address
LOid the i.ddress into SP
And into FP too
=
('D
"1
=
9"
....
....
e:..
....
IS
.o...
~
(JCl
RPTS
Rl
; Block copy
"1
:::to
el
= SUI1wlkltx(nkl
yIn)
tIn)
= din)
111(1;:)
~.
Where
'"i::l
k=O,l,2, .. ,Nl
= .. 00:)
IH:
 yin)
RPTB
= Nand
IIU
= 0.01.
Initial condition:
;;;
;t
:
So
Chen, CheinChung
~rctl,
1989
.global
L/'tS30,u,d,y,e,order
U*lfffffffUUUfflfffl**IIHfHHI**flfHfHtffff
fHHHflHfflffffffl**HffffHfHHfHffHffffHff
. text
.set
PUSH
PUSI'
PUSIF
UlS30
el
....
So
'"
PUSH
PUSI'
PIP
PlPF
POPF
POP
Cl
a
:=
"
=fin)
_(IIX,RJ,RI
= tin) f u f xln)
lorder ,Re
Initialize repeat counter
I,Re
UIS
; Do i = 0, N3
1ARO++IIIX,R3,RI ; Rl = tIn) f U f xlnili
; R2 = ~i(nl + tin) I u * xlnil
'ARI,RI,R2
R2,fARl++(1J~
; ~i(nt1) =IIIHn) + tin) I U f xlnil
IARO;RJ,RI
; for i =N  2
tARl,RI,R2
R2, 1AR1++(IIX
; lIIi(n+1) =SjilnJ + eln) f u .. xlni)
tARl,RI,R2
R2,tARl++CIIX
; Update last til
RJ
RJ
R2
Rl
Rl
s;;o
:=
.&;:..
t""I
~~
00'"
~
>5!
C'"
(JQ
~
00
_0
=:
=C'"
5!
""l
(JQ
dO
'" S.So
So
0
=~
~
~~
00'"
~<
N~
Q""l
, RJ
= 0.0
O.O,RJ
If'VF3
_Ill!, tARl++llIX,RI
@Order
'ARO++IIIX, tARl++IIIX,RI
RI,R3,RJ
, y(nl = o[J.x[J
RI,R3
; Include list result
If'VF3
AD1F3
ADDF
R3
Rl
~""l
Rl
Rl
R2
R3
R3
1.DF
RPTS
.end
IU,R3
POPF
RETS
UIS
If'VF3
" ADlIF3
STF
If'VF3
:: AD1F3
STF
AD1F3
STF
UfffHfHfffHffHfnffHfffffHfffHlfffffHHHf
>
"t:l
"t:l
If'VF
If'VF3
!.DI
SUBI
k=O
<Q.,
D
v,
Store yin)
.Inl = dlnl  ylnl
Store tin)
Q..
Nl
;::
RJ,@y
@d,R3
RJ,@<
Algorith.:
is
'"
STF
SUIIRF
STF
fHfH+HH*******HtHHHH****HHHHHHHHtH*,*",**HHff
~~
~
Qoo
""l
=:
~
=:
""l
~
~u~~n~
"""'''' ................... 8
oooo:::c:::::~
~ h~hHi
I;!,~
268
"<:j
. ti tIe
~ctJ1S'
~
~
Ei
Algotith.:
COFFP:
COEFFD:
FRSTAP:
.tqu
OffOO~
.equ
.equ
030Qh
NI
yin) =SLI1 wlk)f><lnk) k=O, 1,2, ,NI
k=O
::...
!ilk) :;
:!l
~(k)
Usage:
IIks(n,.u,d,x,f.:y.I!el
n  order of fi 1tel'
SST
SST!
OST!
SPI1
SSXM
.u  convergence factol'
d  desired signal
SIJVI1
;;.
x  input signal
L.lJ'I(
MIVl
LAC
SUBK
Note:
~
tv
SACL
ADLK
SACL
.def
<:>
...,
ac
Q..
~
i'
>(1
00
IJQ C
o
::!.
.....
f
Set
Set
Set
Set
Set
ACe
=N
I
ORDER
,ORDER=NI
FRSTAP
ADRlST
LAC
SACL
LAC
SACL
LAC
+HHHUHfffUHfffffffffHHf+HHHfffffHfffff
v.
tv
....=
TI REGISTERS
(F
ARI,SAVEI
AR2,SA12
AR3,SA\3
AR4,SAVE4
OSTO
SAIl
SAIl
SAIl
SAIl
::to
~
!'tI
. text
_1 lIS
"<:j
;;.
>
"Cl
"Cl
HHf***ffHHHHHHHHHHHHf
"
0200~
HHffHHHfHff**fffHfflftHf....
_las
OSTO:
OS11:
SAVE!:
SA\2:
SAVE3:
SAVE4:
ORDER:
X:
D:
U:
y:
ERR:
ERRF:
ADRLST:
.usect
.usect
.usect
.used
.usect
.used
.usect
usect
.usect
.usect
.usect
.used
usect
.usect
puaaeters", 1
puiJM!ters,l
puueters",l
puueters, t
puoueters,1
pariattersN,l
pariHters",l
pariaeters ,1
"paraaetersN ,I
"parueters,l
pualleters",l
para.eters ,1
paraJleters,l
"parueters ,1
LRLK
SACL
CNFP
FIR
LALK
1,15
AR3,AmLST
ORDER
aEFFP,'"
LAA
RPT
I1ACD
Ctf'D
IJQI3
..... '"S
=~
!'tI
[IJ
~~
~a
=. . .
Noo
NC
tI1~
.....
C
'"S
!'tI
....
.~
COI1PUTE TI ERROO
lEG
ADIi
=
=..,
=
13<
f!J. 0
(1'"S
APAC
SACH
53 ::to
Cj!'tl
OO[IJ
AR3,FRSTAP
MPYK
0"
'"S
0
=C
; ACe =  YIn)
00
>
"
i .
270
"t;j
HtftHflHlfHfHHfftfHfHtHfI4tff'fHfffUfHfHfff+lfHUf
f
[
~
Algorith.:
i=i
Nl
ylnl = SlIt .lkl*Xlnkl k=Q.l.2 Nl
k=Q
~.
=wlkl
wlkl
~.
Whet.
Usage~
lilt
+ ufe(n)fxlnkl
=Nand
IU
= 0.01.
'*lex  fiinput
1tel' coefficients
signal buffer
So
"
.globil
set
1989
_tlas
AR3
c
....
IfHffHHHftfHffHflHHHfHffffffHffHHfffff
So
_tl ..
'"
tztv
C
ac
*
"
HH.ffHfHfffHffHlfHHHHfllffHHfffHHffH
. text
.set
PUSH
LDI
PUSH
PUSH
PUSH
PUSH
PUSIF
PUSH
PUSIF
PUSH
PUSIF
PUSIF
FP
sp.FP
ARO
ARI
AR2
Rl
Rl
R2
R2
RI
Ab
R7
LIIS
tFPISI.ARI
2.R!
Get
Get
Get
Set
fi 1ttl' order
pointer for x[]
>
"C
"C
0.0.R2
(D
=
....
Q..
, R2 = 0.0
~
>~
00
LDI
SUIF3
STF
LDI
STF
~121.AR2
~""l
~131.AR2
Get
R7.<AR2
Send out
ten)
"
II'YF
If>YF3
LDI
RPTB
If'YF3
ADIF3
LDF
STF
STF
ADIF3
STF
POPF
POPF
POP
POPF
POP
POPF
POP
POP
POP
POP
POP
RETS
.end
ttFP121.R7
tARO(I). R7.Rl
RI.RC
~
tAROI1I.R7.RI
tAR1W.Rl.R2
tARO.Rb
R2.tARl
Ab._tAROW
tAR1UI.Rl.R2
R2.tARl
R7
Ab
RI
R2
R2
Rl
Rl
AR2
ARI
ARO
FP
; R7
; Rl
=tin)
=tin)
C"
~8
="=
9 eo
=
O(D
address
t(n)
IHflHffHIHHHHHHlHtfHtfHH.ttfHffHtfH
FP
~lbl.ARO
tllsln,MI,d,',b,&y,&el
n  order of fi 1tel'
1M! 
!j
convergenct factor
d  desired signal
If>Yf3
RPTS
If>Yf3
:: ADIF3
ADDF
k~,1.2, ,Nl
~121.RI
LDF
'"
~
tztv
LDI
LDI
LDI
SUBI
~. Q
..
u f xlnN+l1
, Get x((ntiN+ll
; .. iCn+lI = !!lien) + eCn)
, Shift x[]
; R2 =!!IiCn) + e(n)
; U,ditt liSt III
= l3~
IJQ
u f x(n)
x(nil
;;~
(D
l3~
~Pi
00
til
~~
N=00
~::;
~=
....
=~
=
~
$J
;:
~
00
272
A Collection of Functions
for the TMS320C30
Gary Sitton
Gaslight Software
273
274
Introduction
This report presents a collection of efficient machine language programs for advanced
applications with the TMS320C30. These programs provide basic math and transcendental functions. Other routines include vector functions, FFTs and linear algebra.
Library Overview
The set of programs fall into six categories:
I.
II.
III.
IV.
V.
VI.
xd  indicates that the relative accuracy of the implemented function is x decimal digits.
*  program name prefix stands for M or R.
M  selects the memory based parameter entry point.
R  selectthe register based parameter entry point.
X  indicates the extended precision program
version.
275
II.
A.
IV.
276
computes
computes
computes
computes
computes
computes
computes
computes
III.
SIN
COS
EXP
LN
ATAN
SQRT
FPINV
FDIV
SINX
COSX
EXPX
LNX
ATANX
SQRTX
FPINVX
FDlVX
FMULTX
computes
computes
computes
computes
computes
computes
computes
computes
computes
ILOG2
B.
C.
IMULT
IDiV
*CORMULT
B.
*CONMULT
C.
*CBITREV
D.
*FMIEEE
inplace computation of the complex vector product of two complex arrays using the complex conjugate of the second array.
inplace computation of the complex vector product of two complex arrays.
inplace bit reverse permutation on a complex array with separate real and imaginary arrays.
inplace fast conversion of an IEEE array to a
TMS320C30 array.
A Collection of Functions for the TMS320C30
V.
VI.
E.
*TOIEEE
F.
G.
H.
*VECMULT
*CONMOV
*VECMOV
CFFFT2
B.
CIFFT2
*SOLUTN
B.
*SOLUTNX
Solves a well conditioned system of linear equations with any number of dependent variable sets.
Uses no (diagonal) pivoting with normalprecision
floatingpoint math.
Solves a well conditioned system of linear equations with any number of dependent variable sets.
Uses no (diagonal) pivoting with extendedprecision floatingpoint math.
277
The extendedprecision versions of the programs in this report may be slower than
their normal precision counterparts. When using extendedprecision results in RO from
category II programs, note that the results may be stored in memory with or without rounding. A more accurate normalprecision result will generally be obtained by rounding. You
should never round before using an extendedprecision result as input to another extendedprecision program unless special circumstances exist. Note that truncation, not rounding,
will occur if an extendedprecision register is moved to any 32bit register or any memory
location. This will generally cause loss of accuracy in the amount of the value of the least
significant bit of the mantissa.
Program Utilization
Since all programs in this collection are intended to be invoked by a CALL instruction, you must have the stack pointer (SP register) appropriately set to an available memory
area, preferably in internal RAM. Programs in categories I and II save and restore the
data page register DP by using the stack area pointed to by SP. Programs in category
III do not alter or use the DP register at all. The programs in categories IV through VI
alter but do not restore the DP register.
All ofthe programs in categories I through III, except for ILOG2, are implemented
as straight line code. You may wish to disable the instruction cache while these programs
are being executing. This will cause no loss .of execution speed and will avoid flushing
out potentially reusable instructions in the cache. It is beneficial to have the cache enabled
when using most of the remaining programs (categories IV through VI) as they generally
contain multiinstruction loops.
Programs in categories IV through VI allow input through externally defined variables
addresses. The .global references indicate these addresses, where the input variable values
and/or addresses are located. The starting address of these memory locations is given by
the external variable $PARAMS. All of the addresses are assumed to be in the same
TMS320C30 memory page as $PARAMS. If this is not the case, the addresses or the
programs should be changed assure that the DP register gets set properly.
Programs in categories IV and VI also allow the use of registers to hold input
parameters. The exact registers to be used are found in the program source listings. When
using the register input entry point, refer to the program using the R prefix on the program name, e.g. RSOLUTN. The memory based parameter input entry uses the M prefix,
e.g. MSOLUTN. The .global references to the R prefix entry points may be deleted if
they are not needed.
278
Polynomial Approximations
The polynomial approximation method is fundamentally very simple. A limited part
of a function is approximated by a polynomial of some order sufficient to obtain the desired
accuracy. The polynomial is generally a series of the form:
n
Pen, x) =
E [a[i]xiJ,
(1)
i=O
where x is the independent variable, n the polynomial order (a fixed integer), and a[i]
is a set of n + I fixed coefficients.
The desired function, say f(x), is then approximated by a particular Pen, x) such that:
f(x) = Pen, x)
(2)
where xl and xu are the limits of the domain of x, and e(x) or e(x)/f(x) is the error function which has been usually minimized in the minmax (equiripple) sense. This is done
by selecting an appropriate means of calculating the coefficients a[i].
Various techniques and schemes are used in the selection of:
See Hastings [2] for an excellenttutorial on this numerical methodology. All of the
polynomial approximations used in here were obtained from the National Bureau of Standards reference edited by Abramowitz and Stegun [3].
279
+ h)
g(x)
(3)
where rex, h) is the remainder of the series (which can be assumed to be small), and g'(x)
is the derivative of the function g(x). Leaving off the remainder in (3) we get, in terms
of incremental values of x, the approximation:
to be quadratic, i.e. the error in the approximation at each iteration is proportional to the
square of the error in the previous iteration. Minimally, this requires a sufficiently close
starting value for x[O] and the condition that ig'(x)i > 0 for all iterated values of x.
= lIx[i]  c = 0,
(6)
which is solved when lIx = c or x = lIc. Taking the derivative of (6) and substituting
into (5) and simplifying gives us:
x[i + 1]
x[i][2  cx[iD,
(7)
280
g(x[iD = x[ij2  c = 0,
(8)
which is solved when x2 = c or x = sqrt(c). The solution for (8) leads to the classic square
root formula:
x[i+l] = 0.5[c/x[i] + x[iJ},
but this equation uses division. However, the iteration from f(x)
be shown to be:
x[i + 1] = x[iJ[1.5  c'x[ij2},
(9)
where c' = c/2 = 0.5c. Though (10) needs no division, the final desired result must be
transformed by an extra multiplication by the input c because:
sqrt(c) = c[1/sqrt(c)J.
(11)
Formula (10) will also converge, in the precision doubling fashion of the NewtonRaphson iteration, given a suitable close starting value for x[O] and the use of sufficiently
accurate arithmetic. Note that the extendedprecision version routines FPINVX and SQRTX
both use an extra iteration (for a total of 4) to achieve the needed 32bit accuracy for the
40bit format.
The initial guess x[O], for the iterations of l!sqrt(c) and 1/c, may be obtained using
an interesting approximation. A TMS320C30 floatingpoint number c = (1 + m)2e, where
o ~ m < 1 and 127 ~ e ~ 127. The extra 1, added to the fractional mantissa m,
is the implied bit. Then we can write the inverse of cas:
1/c = 1/(1 + m)2 e .
(12)
m)
1  m/2,
(13)
which is exact at the end points: m = 0 and m = 1. Then the approximation for the
reciprocal would be:
lIc
(1  ml2)2 e .
(14)
281
It turns out that this approximation can be achieved in a single logical operation.
If you compute the unlikely value of c' = c XOR OFF7FFFFFFFh, you would complement all bits in c except the sign bit. Including the implied bit and taking the effect of
one's complement arithmetic into account results in a final value of:
c'
= [1 +
(1  m)]2(e +
1),
(15)
(1 mI)2 e
(16)
lIc.
c' gives about 3 bits of precision, which is an excellent seed x[O] for the lIc iteration.
Using e/2, you have a start for the lIsqrt(c) iteration as well.
sin(x) = x
1:
[a[2i]x2i]
xe(x) ,
(16)
i=O
where Ixl ~ Pi/2 and Ixe(x)I ~ 2(10 9). Instead of using another power series for cos(x),
you can use the fact that:
cos(x) = sin(x
(17)
Pi/2).
The series given by (16) is only accurate in the 1st and 4th quadrants, i.e. Ixl ~
Pi/2. Sin(x) in the other two quadrants is found from:
sin(x)
(18)
sin(Pi  x).
The case for x < 0 is expediently handled by using Ix I for all calculations except
for the final multiply by x in (16).
Exponential Functions
The EXP and EXPX (exponential) routines use an approximation (see Section 4.2.45,
p. 71, in [3]). The expansion is of the form
exp(x)
1: [a[i]xi]
e(x),
(19)
i=O
where 0 ~ x ~ In(2) and le(x)1 ~ 2(10 10). The series for 2Y is found by substituting
y = x/ln(2) since:
exp(x) = exp(ln(2)y) = 2Y.
282
(20)
A Collection of Functions for the TMS320C30
2y =
[b[i]yiJ
+ e(x) ,
(21)
i=O
where b[i] = a[i](ln(2)i). See the coefficients in the EXP routine.
Values of exp(x) for x outside the convergent range are found by two means. First
for x < 0, note the relationship:
exp( x)
(22)
lIexp(x),
which does require an inverse (see the FPINV and FPINVX routines). For y > 1, let
y = n + f where n = 1, 2, ... and S f < 1. By substituting y in (20), you get
(23)
In(1
+ x) =
[a[i]xiJ
+ e(x) ,
(24)
i=1
where S x S 1 and ie(x)i s 3(10 8). The expansion for In(y) can be used if the
transformation y = x  I is applied.
Values ofln(x) for x outside the convergent range are found in the following way.
First, make the substitution x = f(2n) for 1 S f < 2 and n = 0, 1, ... ,and then write:
log2(x)
log2(f2n)
+ log2(f),
(25)
where log2(x) is the log base 2 of x. Using the relationship that log2(x) = hi(x)/ln(2),
you get the equation
In(x) = In(f)
+ nln(2).
(26)
Arctangent Functions
The ATAN and ATANX (arc or inverse tangent) routines use the approximation
from section 4.4.49, p. 81 in [3]. The series with only even terms for atan(x)/x is transformed to
8
atan(x) = x
[a[2i]x2iJ
+ xe(x) ,
(27)
i=O
A Collection of Functions for the TMS320C30
283
where 1 ~ x ~ 1 and Ixe(x)I ~ 2(10 8). Values for atan(x) for x outside the convergent range are obtained by noting the following identity:
atan(x)
(28)
= b(l/c).
(29)
(30)
The last term in (30) is always less than the 32bit precision in the mantissa of the
final result. Therefore, you need only to compute the first two terms in the product xc.
Also, note that all the indicated products in (30) may be computed using a normalprecision
native TMS320C30 multiply as long as the terms are collected in extendedprecision
registers. The additions are also done using the native TMS320C30 add as it is implemented
in extendedprecision.
284
(au)(bu)2 32
(31)
where the powers of two indicated are accomplished by shifts. Note that each product
in (31) must be represented as a 32bit integer. The adds in the sum must be done with
care to facilitate the carry between the two final 32bit components of the product.
Integer Divide
The IDIV routine is a modified version of the program DIVI in the TMS320C3x
User's Guide [1]. It has been modified to return the absolute value of the remainder of
the integer division. The remainder was originally computed, but was discarded during
the extraction process for the quotient. A few more instructions allow the extraction of
both the quotient and remainder from the result of the SUBC process. The program IDIV
may be used for the computation of the modulo function. The output of IDIV is the pair
[q, Irll = alb, with the property:
o~
r = (a modulo b) < a,
(32)
= bq + r, for positive
285
+
c1[k]conj(c2[k]), k = 1, ... , n,
(33)
where + means replaces. Each complex array is assumed to be stored as two separate
arrays, i.e. (c1] = (xl, yl] and (c2] = (x2, y2]. In cartesian complex representation, (33)
becom~s
(xl
iyl)
+
(xl
iyl)(x2  iy2) ,
(34)
where i represents the imaginary constant sqrt(  1). Separating the real and imaginary
parts, we have:
xl
+
xlx2
+ yly2, yl
+
ylx2  y2xl
(35)
This operation can be used for the frequency domain correlation of two FFTs to implement time domain correlation.
On the other hand, the array routine *CONMULT computes the pointbypoint complex multiply oftwo complex arrays. If the arrays are cl and c2, and are each oflength
n, then
c1[k]
+
cl[k](c2[k]), k
1, ... ,n,
(36)
iyl)
+
(xl
iyl)(x2
iy2).
(37)
+
xlx2  yly2, yl
+
ylx2
y2xl.
(38)
This operation can be useq for the frequency domain convolution of two FFTs to implement digital filtering.
286
Vector Primitives
The array routines *VECMULT, *CONMOV, and *VECMOVare a useful suite
of efficient programs for simple array operations. The first routine, *VECMULT, performs the simple operation x[k] ~ x[k]c which is a scalarvector multiply useful in uniformly scaling an array by a constant c. You can use this for scaling arrays after an inverse
FFT by choosing c = lin. The next routine, *CONMOV, performs the operation
x[k] ~ c which is useful in filling or initializing any portion of an array to a single constant c. The last routine, *VECMOV performs the simple operation x[k] ~ y[k] , an array
move, and is, therefore, generally useful.
FFT Routines
This category contains the two complementary radix 2 complex FFT programs
CFFFT2 and CIFFI'2. These programs differ from previously available TMS320C30 FFT
programs in that they operate on complex arrays which are stored as two separate and
independent real arrays. Both routines do the FFTs inplace and do no index permutations
or constant scaling (multiplication). Also these programs require only a 3/4 cycle external, precomputed sine table. As with previous FFT programs, these, too, have a special
multiplyless butterfly loop for the occurrence of unity twiddle or complex rotation factors.
287
The routine CFFFr2 is a DIF radix 2 complex forward FFT program and thus
assumes a normally indexed pair of input arrays. The output array is bitreverse permuted
and normally must be unscrambled to be of any use (see *CBITREV). The routine CIFFT2
is a DIT radix 2 inverse FFT program and thus assumes a bitreverse indexed pair of
input arrays. A normally indexed complex frequency spectrum must be bitreverse scrambled before using CIFFT2 (again, see *CBITREV). On the other hand, the output from
this inverse FFT is in normal indexed order, but lacks the traditional scaling by the factor
of lin. Therefore, backtoback calls of CFFFT2 and CIFFT2 will return the original
complex array (in proper order) but multiplied by a factor of n. Consult the handbook
by Burrus and Parks [4] for additional FFT algorithm details.
+
+
A[l, 2]x[2]
A[2, 2]x[2]
A[n, 2]x[2]
+ ... +
+ ... +
A[1, n]x[n]
A[2, n]x[n]
= y[1],
=
(39)
y[2],
damental problem in linear algebra, then, is to find the solution vector x. In fact, you
may desire to find the m different solutions to m sets of linear equations which share the
same coefficient matrix A, i.e. Ax[k] = y[k], for k = 1, . . . , m.
288
You can solve the general problem just stated by using *SOLUTN, or with more
accuracy with *SOLUTNX. This is done by constructing a tableau B (table of coefficients)
which is simply the coefficient matrix A (in row major storage format) with the negative
of the y vector(s) appended (:) as m extra columns to A. Thus you would have B = A
: y, as your problem, where B is a n by n+m matrix and typically m = 1. Thus, for
the common case of m = 1, the input array B can be written as:
A[1, 1], A[1, 2], ... , A[1, n], y[I],
A[2, 1], A[2, 2], ... , A[2, n], y[2],
(40)
2.
The maximum absolute value of the elements in A must be approximately unity. This is necessary to assure that no pivot element is encountered which is
smaller in magnitude than 10 8 for *SOLUTN, and 10 10 for *SOLUTNX.
This restriction monitors the system condition and assures an adequately accurate solution, but the fmal solution should always be verified by substitution. This is done by inspecting the elements of the error vector e = Ax Y computed by using the solution x, and the original A and y.
289
Summary
This report presented a set of routines that can be used in digital signal processing
applications. The appendix contains the source code of these routines. This source code
can also be obtained from the Texas Instruments Electronic Bulletin Board (713) 2742323.
If there are comments or corrections, please contact the author of this report:
Mr. Gary Sitton
Gas Light Software
5211 Yarwell
Houston, TX 77096
Tel (713) 7291257
References
(1)
(2)
(3)
(4)
Burrus, C.S. and Parks, T.W., "DFT/FFT and Convolution Algorithms", John
Wiley and Sons, New York N.Y., 1985.
(5)
Press, W.H., Flannery, B.P., Teukolsy, S.A., and Vettering, W.T., Numerical
Recipes in C  The Art of Scientific Programming, Cambridge University Press, Cambridge England, 1988.
290
Appendix
Program Librar'y
$MATH. ASM
II.
$MATHX.ASM
II 1. $MATH I. ASM
IV.
$VECTOR.ASM
V.
$FFT2.ASM
VI.
$LINALG.ASM
<5'
;::c
.s;,
~
;::c
~.
'c>
.,
So
(1)
~
~
tv
1.
tHlfHHffffHlfHHHHfHttfHffHHHHfHHtHIHffHffHtHfffHIH+fHf
SIN
cas
EIP
LN
a: =< 88.
>= O.
f
~
>
~
N
Hf4HUtfffHIHHlHHfffHtflflHltffHfHHHlfHffff
_:SIN
<
<
<
<
fffffHfHUHHHHHfHHffHfHffHffHfHffHffHfHf
.GLOBL SIN
.GLOBL ECOS
INTERNAL CONSTANTS
~
~
B.
Q
.FLOAT 0.636619n2
; 2/PI
COF
.FLOAT
FLOAT
.FLDAT
.FLOAT
.FLOAT
.FLOAT
; CI (PII21
1.570790327
0.0459640968 ;C3
0.07969260878 ;C5
0.00468166687 ; C7
0.00016025884 ;C9
0.000003433338; ell
;OS
ACOF
.IIJRD
COF
..,
CON
So
ACOO
.IIJRD
;OS
.Q.,
~
;OS
~
SHFT
; AIlMESS OF COEFFS.
too
'"
C
Q
C
CON
; ADDRESS OF COOSTS.
TEXT
~
tv
START OF SIN _
SIN:
PUSH
UP
; ROUND X
; R4 {= X
LDF
rf'YF
FIX
FLOAT
SUBF
NEGF
ADDI
AND
TSTB
RO
RO,RI
!NORM.RI
RI,IRO
IRO,R2
R2,RI,RO
RO,R3
I,IRO
3,IRO
2,IRO
R3,RO
@ACON,ARO
HRRO(IROI,RO
RO,R3
@ACOF,ARO
DP
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
RO {= :x:
RI {= RND IX:
RI {= Xt2lPI
IRO <= INTEGER GUADRANt Q
R2 {= FLOATING IlUADRANT Q
RO <= X, I ( X < I
R3 <= X
lRO <= Q + I
IRO <= TADLE INDEX
LOOK AT 2ND LSS
IF I THEN RO (= X
ARI ) CONST. TABLE
FINAL IlAPPING, RO <= 1 + C
R3 <= I
RRO } CCfFF. TADLE
UNSAVE DP
.DATA
NORM
RO
RO,R4
ECOS:
LDFNZ
LDI
ADDF
NEGF
LDI
POP
::t.
RND
LDF
If'
@ACOF
;SAVEDP
; LOAD DATR PAGE POINTER
rlPYF
RND
RO,RO,R2
R2
; R2 (= X....2
; ROUND X..2
rlPYF
ADDF
*ARO,R2,Rl
*ARO,Rl
; RI <= X"2tCl1
; Rl <= C9 + Rl
rfVF
ADDF
R2,Rl
<ARO,Rl
II'VF
R2,Rl
*ARo,RI
ADDF
RI
R2,RI
*ARO,Rl
ROUND BEFORE
Rl {= X"2'(C5 + RIl
Rl <= C3 + RI
AND
rlPVF
RDDF
Rl
R2,Rl
tRRO.RI
ROUND BEFORE
ADDF
RHO
MPVF
;:..
12~
Q
'
~
~
;:::
'
'"
'C'
....
~
~
~
R4,R4
R3,RO
pop
!!UD
R2
R2
RNO
RHO
MPVF
RO
RI
RI,flO
TEST ORIGINAl X
IF X ( 0 THEN RO (= x
R2 (= RETWN ADDRESS
RETURN (DELAYEDI
RIJLIill BEFORE
ROUND BEFORE
RI (= hlCI + RII
ffffHfftfHfHHfHHHfHffff****HH******fHIHtfHH
PROORAI1: COS
COSINE FUNCTION: RO
(=
COStROI.
*********H"**I*II**I,,**********************************
tv
.GLOBL COS
.GLOBl ECOS
TEXT
PUSH
LDP
DP
; SAVE DP
~ACOF
BRD
RND
ADDF
LDF
ECOS
RO
@SHFT,RO
RO,R4
\0
W
tv
nHfffl**uuun***UHfHftfHfHtHfltfffIHHHffHf
*
*
*
PROGRAN; EXP
WRITTEN BY: ()AAY A. SITTON
GAS LIGHT SOFTWARE
HOUSTON, TEXAS
MARCH 1989.
PUSH
II'
LDP
ffC7
RO
RO,R2
RO,RI
R2,RI
@ENRI1,RI
RI,R3
R3,RO
RO,RI
R3
24,R3
R3
R3
ffC7,ARO
lIP
RND
IGF
LOF
LDFN
II'VF
FIX
FLDAT
SUBF
IGl
LSH
PUSH
POPF
LOI
POP
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
;
SAVE II'
LOAD DATA PAGE POINTER
ROUND X
R2 <. x
RI <. x
IF X <0 THEN RI <. IX;
RI <. X IWLNI21
R3 <. I INTEGER IF X
RO <. FLT. PT. I
RI (. FRACTION OF :XI, 0
R3 <. I
MOVE I TO EXP.
SAVE AS INT.
R3 (. FLT. PT. 2*'1
ARO  >CEFF. TABLE
UNSAVE II'
<.
X( I
HHff*********HHH*tfflf***********HfHHIH*********
AND
NPYF
ADDF
.ARO,RO
; ROUND BEFORE *
; RO (. X+C7
; RO <. Cb + RO
MPVF
ADOF
RI,RO
'ARO,RO
NPYF
ADDF
RI,RO
'ARO,RO
; RO (= XHC5 t RO)
; RO (. (4 + RO
NPYF
ADDF
RI,RO
*ARO,RO
RND
NPYF
ADDF
RO
RI,RO
.ARD,RO
; RO
RND
NPYF
ADDF
RO
RI.RO
1AR0,RO
; ROUND BEFORE
; RO <. l'lC2 + ROI
; RO <. CI + RO
RND
MPYF
RO
RI,RO
; ROUND BEFORE *
; RO <. X*ICI + ROI
RI
.ARQ, RI, RO
INTERNAl CONSTANTS
;..
DATA
~
~
ENRM
<Q.,
~
;:
C7
.FLOAT
FLOAT
FLOAT
FLOAT
.FLOAT
.FLOAT
.FLOAT
.FLoAT
1.0000000000
().693147180
0.240226%9
0.055503654
0.009615978
0.001328240
0.000147491
0.000010863
17
.OIJRD
C7
'"l
::to
<::>
;:
...
S~
aa
CO
CI
C2
C3
C4
C5
C6
C7
; ROUND BEFORE *
; RO <. 11(C3 + ROI
(=
C2 + RO
LDF
TEXT
tv
; I/LNI21
;:
'0'">'
FLOAT 1.442695041
BND
ADDF
RND
II'YF
RETS
R2,R2
FPINV
tARO,RO
RO
R3,RO
;
;
;
;
;
TEST ORIGINAl X
IF X <0 THEN RO (. Ill, IDaAYEDI
RO (= 2HX = CO + RO
ROUND BEFORE *
RO <. 2*'11 + XI
;::::
f**H**HHfHUfHHHHHffHIHHHH*HH++Hf*HHH
PROORIII1: tR
'"B.
<:l
;:s
.s;,
~
;:s
B.
'0'"....>
S.
'"
PUSH
LOP
PUSIF
*
I
I
*
I
I
(=
POP
ASH
FLOAT
LOF
LDE
tRIRO).
SUBRF
LOF
tlPYF
LOF
LDI
POP
tUfHHtHHHHffUHffHfHHfHfHHffHHffHffHtH
.GLOB!. tR
tv
INTERNAL CONSTANTS
Q
C
DATA
SCALING CDEFFS. FOR LNIl+X)
LNRH
CO
.FLOAT O.693147100b
.FLOAT 1.0000000000
; LN(2)
; CO 11.0)
ca
FLOAT
.FLOAT
FLOAT
FLOAT
.FLOAT
FLOAT
FLOAT
.FLOAT
0.9999964239
0.4998741238
0.3317991l2S8
0.2407338084
0.1676540711
0.0953293897
D. D360B84937
0.0064535442
ACB
WORD
CB
(=
X ( 1.
; Til' OF (I
; TOP OF C2
;TDP~C3
; TOP Of C4
; TIP ~ C5
; Til' OF CO
;TOP~C7
; TOP Of ca
RO,RO
LOF
RETSLE
SAVE IF
LOAD DATA PAGE POINTER
SAVE AS FLT. PT.
R3 (= INTEGER FORMAT
R3 (= E = SIGNED EXP.
RI (= FLT. PT. E VALUE
R2 {= 1.0
EXP. RO (= 0 II (= X ( 2)
R2 (= X  I 10 (= X ( I)
RO (= tR(2)
RO (= E*tR(2)
R3 (= EILN(2)
ARO ) COEFF. i ABLE
UNSAVE DP
IARO,RO
; RI
; RO
; RO
tlPYF
ADDF
RI.RO
IARO,RO
; RO
; RO
HPYF
ADDF
RI,RO
IARO.RO
; RO
; RO
tlPYF
ADDF
RI.RO
IARO,RO
; RO
; RD
tlPYF
ADDF
RI.RO
IARoe.RO
; RO
; RO
RND
HPYF
ADOF
RO
RI.RO
IARo.RO
; ROUND BEFORE I
; RO (= XI(C3 + RO)
; RO (= C2 + RO
RND
HPYF
ADDF
RO
RI,RO
; ROl*W BEFORE I
; RO (= IIIC2 + RO)
; RO (= CI + RD
BUD
RND
HPYF
ADDF
START OF LN PROGRAM
\0
;
;
;
;
;
;
;
;
;
;
;
;
;
;
RND
HPYF
ADDF
POP
LN:
DP
lACS
RO
R3
24.R3
R3.RI
@CO,R2
R2.RO
RO,R2
@LNRM,RO
Rl.RO
RO,R3
@AC8.ARO
DP
R2,RI
'~RO,RI,RO
fARO,RO
(=
(=
(=
(=
(=
(=
(=
(=
(=
(=
(=
RND X
IIC8
C7 + RO
IIIC7 + RO)
CO + RO
lllCO + RO)
C5 + RO
XIIC5 + RO)
C4 + RO
X'(C4 + RO)
C3 + RO
TEXT
Ul
SCALE VARIABlE X
; TEST X
; RETI.IlN NOW IF X (= 0
R2
R2
RO
RI.RO
R3.RO
R2 (= RETURN ADDRESS
RETURN IllOLA YEO)
RIfflI) BEFORE I
RO (= XIICI + RO)
RO (= tRIX) + E'LN(2)
HHHftHHffHHf**fHHfHHHftHHHfHffHffHHHf
<
PROGRAII' ATAN
SCALE VARIABlE X
PUSH
LDP
<
<
<
<
<
<
<
(=
ABSF
SUBF
BLED
RND
RND
lOI
ATAN(RO).
PUSHF
ABSF
ADDF
1),
69.
<
LDF
CALL
R2
SKIP
flO,R3
RO,RI
I,IRO
,
;
,
;
,
NEGF
R3,R3
I,IRO
, R3 (= X'
, IRO < 2, (P1/4)
RI,RI,flO
!AC17,ARO
, flO (= 1><2
, ARO ) COEFF, TABLE
, tmAVE Ill'
.FLOAT 0,7853981035
.FLDAT 0.7853981635
,FLOAT O.OOOOO~
B.
'C.,'">
S,..
CI7
~17
,FLOAT
.FLOAT
.FLOAT
,FLDAT
FLOAT
FLOAT
,FLOAT
.FLOAT
FLOAT
1.0000000000
0,3333314528
O.lffl355085
o,1420S89944
0,1065626393
0.0752896400
0,0429096138
0,0161057367
0,OO2a662257
WORD
C17
~
N
TEXT
<:::>
<:::>
ATAN'
SUBI
; PI/4
SKIP'
, PI/4
, ZERO
;Rl<=lX~+l
, RO (= :x:  I
, RO (= <:X:  ll/(lX: + 1)
POPF
WED
RND
RNO
SUBI
;:s
, SAVE RND X
, RI (= IX:
.GLOIll ATAN
,GLOIll FDIV
::...
RI
RO,RI
@CI,RI
R2,RO
FDIV
INTRNAl CONSTANTS
, SAVE Il'
, LOAD DATA PAGE POINTER
, R2 (= IX:
, R2 ( :x:  I
, IF : X: ) I THEN SCALE (DELAYED)
,R3(=RNDX
, RI (= RND X
, lRO (= 0, POST SCALE INDEX
DATA
!AC17
RO,R2
@CI,R2
SKIP
RO,R3
RO,RI
O,IRO
HHHHHUHfH**HfHHfHHHIHHHfHHUfffHfHH
Ill'
, CI
;C3
,CS
; C7
, C9
,W
, C13
, CIS
, CI7
(=
X (= 1.
/PYF
lOI
POP
Ill'
GET ORIGINAl X
IF X ( 0 THEN RO (= X' (DELAYED)
R3 <= RND X'
RI (= RND X'
lRO <= I, (PI/4)
RO,RI
>ARO, Rl, RO
fARO,RO
, RI (= RND U<2
; RO <= XH2>C17
; RO (= CIS + flO
/PYF
ADOF
RI,RO
fARO,RO
PlPYF
ADOF
Rl,RO
IARO,RO
PlPYF
ADDF
RI,RO
fARO,RO
; RO (= X1f2f(Cll + ROI
, RO (= C9 flO
RND
PlPYF
RO
RI,RO
fARo,RO
R<X.tID IlEFORE 1
RO (= X"21(C9 RO)
RO <= C7 RO
ADDF
n
~
~
Q.
g'
AND
If'\'F
ADIf
RND
If'YF
AIlIlF
RND
If'VF
::
ADIf
~.
'c>
.,
So
'"
RO
RI,RO
<MO,RO
; RIX.ND BEmIE 0
; RO <= XH2<1C7 + ROI
;RO<=C5+RO
RO
RI,RO
<MO,RO
, RIX.ND BEFORE f
, RO <= Xo02f1C5 + ROI
,RO<=eJ+RO
RO
RI,RO
oARO,RO,RI
, ROONlI IIEFCIIE 0
; RO <= XH20leJ + ROI
;RI<=CI+RO
ffHHHfHfHfHHIHHHHfHHHJHfllffHHH1ffHtH
PROORM. SQRT
o
o
o
o
POP
BUD
R2
R2
RND
If'YF
RI
Rl,RI,RO
o++AROllROI,RO
AIlIlF
o
.0
HfHfHIHIHHHfHfffttHfHfHffHlfHHlHHHHfflf
.GLOIIL SQRT
ac
.DATA
INTERNAl. COOTANTS
OOTi
OOT2
SET
SET
CNST3
OOT4
.FLDAT 1.103553391
. FLDAT O. 780330086
SI1SK
.WORD
0.5
1.5
; ADJUSTED 1.0
; AIUISTED SQRTU121
0FF7FFFFFH
.TEXT
START If
SQRT
Llf
RETSLE
RO,R3
PROORM.
SQRT:
!S
PUSH
DP
SAllE DP
LIP
I!SIlSK
PUSHF
RO
R2
1SIISK,R2
R2,RI
POP
XDR
LDI
00
LDI
LSH
ASH
POSH
f'(ff
LIIE
LIIF
LSH
LIIFIf/
rl'Yf
POP
R2,R4
8,RI
I,R2
R2
R2
R2,RI
!CNST3,R2
7,R4
@CNST4,R2
R2,RI
lIP
;
;
;
;
;
,
,
,
,
,
,
R4 (= RI
RI (= RI EXP. REI1OI'EIl
R2 (= R2 WITII E/2 EXP,
SAVE R2 AS INTEGER
R2 (= FLT, PT,
RI (= (H/2)f2ffE/2
R2 (= 1,1 .. , fOR ODD E
TEST LS8 (f E (AS SIGN)
IF E EVEN R2 (= 0.78 .. ,
RI (= CORRECTED ESTlMTE
lNSAVEIIP
CNSTl, RO
RO
RO
, RO
(=
(=
V/2 TRIH:,
RHO Vl2
I1ARCH 1989.
(=
lIRO
ffHJ.IHHHflfHfffffffHHHHfffffffHHfHHfHHfHf
::t..
~.
~
::s
~
~.
'"
~
..,
So
~
N
C
ac
R2 (= X[O]"2
R2 (= (Vl2) X[O]ff2
R2 (= 1,5  (VI2) X[O]1I2
RI (= X[1] = no] * (1.5  (V/2)<x(O]ff2)
rl'Yf
I!PYf
SUBRf
rl'Yf
RI,RI,R2
RO,R2
CNST2,R2
R2,RI
I!PYf
rl'Vf
SUBRf
I!PVf
RI,RI,R2
RO,R2
CNST2,R2
R2,RI
,
;
R2 (= X[1]ff2
R2 (= (VI2) * XCI]ff2
R2 (= 1,5  (VI2) X[1] ..2
RI (= xm = XCI] (1.5  (Vl2)*X[lJ1I2)
I'I'VF
I1PYF
SUBRF
I1PYF
RI,RI,R2
RO.R2
CNST2,R2
R2,RI
,
,
;
,
R2 (= X[2]*,2
R2 (= (VI2) f m]"2
R2 (= 1.5  (VI2) X[2]"2
RI (= X[3] = X[21 * (1,5  (V/2)*X[2]*,2)
SET
SET
1.0
2.0
!15K
WORD
OFf7fFFFFH
TEXT
RHO
I1PVF
RNIl
I1PYF
SUBRF
RHD
II'Yf
RI
RI,RI,R2
R2
RO,R2
CNST2,R2
R2
R2,RI
;
;
;
,
,
;
ROUND
R2 (=
ROUND
R2 (,
R2 (=
ROUND
RI (=
BEFORE *
H3]*,2
BEFORE
(VI2) f H3]ff2
1.5  (Vf2) f X[3lff2
BEFORE
X[4] = XC3] (1.5  (V/2)'X[3]*,2)
R2
R2
R3
RI
RI,R3,RO
R2 (= RETlQlN ADDRESS
RETURN ([lAVED)
ROUND BEFORE ,
ROUND BEFORE
RO = SlIlHV) = V.SlIlTU/V)
OO,RO
TEST F
,RETURHNCIIIFF=O
PUSH
LDP
PUSHF
POP
XOR
PUSH
POPF
POP
lIP
MSK
RO
RI
MSK,RI
RI
RI
OP
;
,
;
;
g
~
Q.
g'
.s;,
~
;:::
Q.
g'
'"
~
...
s..
~
HHHffHHHHlflffUft**ff+fHfHtHH*****"*I**f**U
PROORAII: FDIV
MPVF
SUBRF
MPYF
RI,RO,R4
TWO,R4
R4,RI
, R4 (. F X[O]
, R4 (. 2  F X[Q]
, RI (. XCIl X[O] 12  F XCO]I
rlPYF
SUBRF
MPYF
RI,RO,R4
TWO,R4
R4,RI
, R4 (. F xm
, R4 (. 2  F X[I]
, RI (. X[2] ': XII] 12  F XlllI
MPYF
SUBRF
RI,RO,R4
TWO,R4
R4,RI
, R4 {= F X[2]
, R4 (. 2  F X[2]
, RI (. XI3] X[2] 12  F X[2]1
MPYF
RO,R4
RI,RO
RO,R4
ftH***U**IHfffflnl********fIfIHfHH**UHHU***flf
pop
BUD
SUBRF
MPYF
.GLOBL FDIV
.GLOBL FPINV
~
W
0C
ADIF
RZ
R2
0NE,R4
RO,R4
R4,RI,RO
R2 (. RETlilN ADORtSS
RETlilN (DELAYED I
R4 {= 1  F Xl3] = EPS
R4 (. X[3] EPS
RO (= XI4] 1X13]HI  1F<X[3]111 + x[3]
TEXT
START IF FDIV PROGRAM
FDIV:
RND
LOF
CALL
AND
MPYF
RETS
END
:8
R3 (. AND X
Rl (= Y
RO (. l/Y
RO,R3
RI,RO
FPINV
RO
R3,RO
ROiJND BEFDRE
RO (. X
,
RE~
f**H**fffffHfffff"****ffHffHHfffHHffffffIHHlffHHHtffHfff*HHlff
".'HfI*HH+HHfff**ffflffftHfHHf+HHHfHHHHff
PROORAI\: SINX
COSX
EXPX
L/iX
ATANX  COMPUTES AN
eo
IHIIHHIfIIHHHfIHIHI**HffffffHlfftHHHHfIHff
.GLDBL SINX
GLOBL ECOSX
OLDBL FMum
;:..
H**,*HIHfffHffflffflff**HHffffH*HfffHlfIfIIHfIHII*****HH*ff*HII*
INTERNAL CONSTANTS
DATA
~
~
~.
.s;,
SHF2
Q.
SHFI
COF
WORD
WORD
WORD
WORD
WORD
WORD
WORD
WORD
WORD
OOOOOOOA3H
00049QFDAH
OOOOOOODIH
OFFDAA2ISH
00OOOO0E3H
OFC2335EOH
OFSE69754H
OF3280B2SH
0E09997B4H
ACOF
WORD
COF
...
s.
'"
~
f:2
tv
0000OOO6FH
OFF22F9S3H
; BOTTOM OF 21PI
; TOP OF 2/PI
;:,
'
'"
'0>
WORD
WORD
; BOTTOM OF CI IPlI2)
; TOP OF CI IPlI2)
; BOTTct1 OF C3
; TOP OF C3
; BOTTct1 OF C5
; TOP OF C5
; TOP OF C7
; TDP OF C9
; TOP OF Cll
; ADORESS OF COEFFS.
CON
ACON
WORD
o
C
TEXT
CON
; ADORESS OF CONSTS.
;:...
~
~
PUSH
LDP
g'
~
;::s
~
g'
'"
~
..,
PUSHf
ABSf
LDf
OR
FIX
RO
00
@NRi1I,RI
I!NRI12,RI
FMULTX
RO,IRO
IRO,RI
RI,RO
RO,R3
I,IRO
3,100
2,IRO
R3,OO
~ON
@ACilN,ARO
++AROIIRO),RO
OO,R3
~Of,ARO
; SAVE ORIGINAL X
;OO(=:X:
; RI (= TOP II' 2/PI
; OR IN BOTTCI1 Of 2/PI
; RO (= a:*2/PI
; IRO (= INTEGER tlUADRANT G
; RI (= FLOATING Gli\DRANT Q
; RO (= X, I ( X { I
: R3 (= x
; R2 (= Q + I
; IRO (= TABLE INDEX
; LOOI< AT 2ND LSB
; If I THEN RO (= x
; LOAD DATA PAGE POINTER
; ARO ) CONST. TABLE
; FINAL MAPPING, RO (= X + C
; R3 (= x
; ARO } COEFF, TABLE
LDF
CALL
LDF
RO,RI
FMULTX
RO,RI
; RI (= X
; RO (= XH2
; Rl (= X**2
MPYf
ADDf
*ARO,RI, RO
*AROc,RO
; RO
{=
X**21C11
; RO
{=
C9 + RO
MPYF
ADDf
RI,OO
*ARO,RO
; RO (= X**2>1C9 + RO)
; 00 (= C7 + 00
~nF
OR
ADDf
RI,RO
*ARO,R2
*ARo,R2
R2,RO
00 (= XM2t(C7 + RD)
R2 (= TOP II' C5
OR IN BOTTOM Of C5
RO (= C5 + RO
CALL
LDf
OR
ADDf
FIUTX
*ARO,R2
*ARO,R2
R2,RO
R2 (= TOP (f C3
OR IN BOTTOI1 OF C3
00 (= C3 + 00
LDF
o
.
RO
(=
X**2*(C5 + RO)
FMULTX
*ARO, R2
*AOO,R2
R2,OO,RI
RO (= XH2*1C3 + 00)
R2 (= TOP (f CI
OR IN BOTTOI1 (f CI
RI(=CI+RO
FLOAT
SUBf
NEGf
ADDI
AND
TSTB
LDFNZ
LDP
LDI
ADDf
NEGF
LDI
oa
; SAVE OP
; LOAD DATA PAOE POINTER
ECOSX:
CALL
tv
OP
@NRi11
So,
~
CAll
LDF
OR
ADDf
SINX:
R3,OO
fMULTX
AS
R3,OO
OP
00 (= X
00 (= HRI = SINIX), (DELAYED>
TEST ORIGINAL X
IF X ( 0 THEN RO (= x
UNSIIVE OP
...,
13
H*ftHfHfHHfH*H*H*f*****HfHf*****HHH***,****f
**ffHHHHflf****Hf+Hf***************HHfUfHHHH
PflOORAI'\: COSX
PROGRAl1: EXPX
I
I
I
I
I
*HHffflfffHIHHHftHHHffHHHHHHHfffffHfHH
.GLOBL ElPX
GLOiIL FMULTX
.GLOBL FPIN~X
GLOlIL COSI
GLOlIL ECOSl
INTERNAL CONSTANTS
:A.
.TEXT
~.
<Q.,
~
;:::
DATA
SCALING COEFFS. FOR 2""X
COSI:
; SA~ lIP
, LOAD DATA PAGE POINTER
PUSH
LlIP
lIP
_1
II!D
ECOSI
tslFl,RI
tsIF2,RI
RI,RO
LIF
!XI
AIIIF
RO
Rl
III
RO
(=
(=
IN
(=
ENRM2
ENRMI
WORD
WORD
C7
AC7
WORD
""
RETIBI 0CClIiS
FRO'(
'C'
..,
it
~
<::>
Cl
<::>
, IIOTI!l1 OF l/LN(21
: TOP OF I/LNt2I
WORD
WORD
WORD
.WORD
.WORD
.WORD
WORD
.WORD
WORD
WORD
WORD
~
~.
00OOOO029H
0003SAA3BH
OOOOOOOOOH
OOOOOOOOAH
OFFCE8DEBH
00OOOOO6EH
OFD7SFDEDH
OOOOOOO4bH
OFB9CA833H
OF91D8CSbH
OFbDIE7A9H
OF3IAA7D7H
OEFC9BD9CH
C7
TEXT
START OF ElPX PROGrulI1
, CO (1.0)
, BOTIOM OF CI
, TOP OF CI
, BOTTOM OF C2
, TOP OF C2
; BOTI!l1 OF C3
,TOPOFC3
; TOP IF c\
; TOP OF CS
; TOP OF Cb
; TOP OF C7
g
~
~
;:s
;:s
;:s
'"
...
S.
~
ElPI'
BNO
ADDF
itIL VMIAIIlE X
PUSH
UP
II'
un
RO,II3
R3,RI
RI,RO,RI
113
24,113
113
113
IN:7,ARO
;S4M;1I'
; lOAD MTA PAGE POINTER
;R2<=X
; RI <= X
; IF X <OTIENRI <= IX:
; RI <= Ta> 0' I/LNI21
, m IN BOTTaI 0' I/LNI21
, RO <= X = :x:tLN121
, 113 <= I = INTEGER 0' X
, RI <= FLT. PT. I
, RI <= FROCTIOO 0' :X:, 0 <= X ( I
, 113 <= I
, Ifl'IE I TO EXP
, S4M; AS IN!.
; 113 (= FLT. PT. 2....1
, ARO ) COO'F. TABLE
pa>
II'
, LtiSAVE II'
LIF
UfII
LIF
Ol
au
FII
FlOAT
SUIF
IEGI
LSH
PUSH
POPF
1N:7
RO,R2
RO,RI
R2,RO
IBHIl,RI
_,RI
FlU.TX
tztv
If'iF
AllIIF
IARO,RI,RO
IARO,RO
, RO (= IIC7
, RO (= C6 + RO
If'YF
AllIIF
RI,RO
<ARO,RO
, RO (= X'IC6 + ROI
, RO (= a; + RO
If'YF
AllIIF
RI,RO
<MO,RO
; RO (= XI{CS + RO)
, RO (= C4 + RO
; RO (= Xf(C4 + RO)
, R4 (= TOP 0' C3
, m IN BOTTOM OF C3
,RO(=eJ+RO
If'YF
LIF
RI,RO
IARO,R4
fARO,R4
AllOF
R4,RO
If'YF
LIF
Ol
AllIIF
RI,RO
fARO,R4
'ARO,R4
R4,RO
, RO
(=
au
, RO
, R4
(=
AllIIF
Fl!UI.TX
<ARO,R4
'ARO,R4
R4,RO
, RO
(=
CI + RO
au
Fl!UI.TX
, RO
(=
IIICI + ROI
LIF
, RO
, R4
(=
{=
, m IN
(=
lI(eJ + ROI
TOP 0' C2
SOTTIli'I OF C2
C2 + RO
X<lC2 + ROI
TOP OF CI
, til IN BOTTOM OF CI
LIF
R2,R2
, TEST ORIGINAL
x
II'YF
LDI1
RETS
FPINVX
IARO,RO,RI
R3,RI,RO
RI,RO
(IF
tl FPINVX BRANCHI
HHHfHffHftHftHffffffHffHtffff***ffff*ffflffffff*
PIIOGRIIII: LHI
~'UTTEN8Y:
GARY. A.
.WffiI)
CB
TEXT
SlTT~
ACB
1NlERW. ClJj5TANTS
LDF
LDI
lIP
eACS
RO
R3
24,R3
R3,RI
@C0,R2
R2,RO
RO,R2
@lHR"I.RO
@lIIRI12,RO
FlU.Tl
RO,R3
tACS,AAO
DATA
POP
DP
EV~UATE
PUSH
LDP
PUSlf'
POP
ASH
FLOAT
LDF
EI~_NAIS
.a.ra
m.oa.
llIE
SUBRF
LHI
FlU.Tl
LDF
(Il
~L
~
.
~
~
!::
;:
URI2
URU
po"YIDII~
CO
CS
;:
'"
'C'
....
So
~
~
C
II
f(Il
.IQUJ
:IQUJ
,IQUJ
.IQUJ
.IQUJ
,IQUJ
.IQUJ
.IQUJ
.IQUJ
IIOOOOIIOFFH
i IIOTIOM OF CI
1IFF7FFFC3H
, TIIP OF CI
, IIOTIOM OF C2
,TllPOFC2
; IIOTI~ OF C3
; TIIP OF C3
;_OFC4
;TllPOFC4
; IIOTI~OFC5
; TIIP OF C5
; IIOTI~ OF Cb
,TllPOFCb
, IIOTI~ OF C7
; TIIP OF C7
; TIIPOFCS
000000084H
OFE80107fH
OOOOOOOIl'!ti
0fE29E11fH
00000009711
~13H
00000004IH
III'D2IIIDI2H
0000000E7H
IJ'CIICC3'IH
;IQUJ
.IQUJ
.IQUJ
1fB130187H
,IIIRI
COO'fS.
.FlOAT 1.0
.IQUJ
~
C
, 8OTIOM OF LHI21
, TIIP OF LHI21
0000000F7H
OFF317217H
LDF
~YF
LDF
.IOID
.IOID
000000043H
IF8M:lI?IfH
, TEST X
, RETUfIN IiJW IF I (= 0
OR
ADDF
~YF
LDF
OR
ADDF
~YF
LDF
OR
ADOF
~L
LDF
OR
ADOF
~L
LDF
,
:
,
,
,
,
,
,
,
,
,
,
,
,
,
SAVE lIP
LOAD DATA PAGE POINTER
SAVE AS FLT. PT.
R3 (= INTEGER FORMAT
R3 (= E = SIGNED EIP.
RI (= FLT. PT. E V~UE
R2 (= 1.0
EIP. RO (= 0 11 (= X ( 2)
R2 (= I  I 10 (= X ( 11
RO (= TIIP OF LHI21
(Il IN 8OTIOM OF LHI21
RO (= E'LH121
R3 (= EflNI21
ARO ) COEFF. TABlE
UNSAVE lIP
TRUNCATED SERIES
R2,RI
'ARo. RI, RO
fARO,R2
tARO,R2
R2,RO
, RI (= I
, RO (= x.es
,R2(=TIIPOFC7
, OR IN IIOTIOM OF C7
, RO (= C7 t RO
RI,RO
'ARO,R2
fARO.R2
R2,RO
,
,
,
,
RO
R2
OR
RO
RI,RO
'ARQ,R2
*ARO,R2
R2,RO
,
,
;
;
RO (= 1.ICb + ROI
R2 (= TOP OF C5
OR IN IIOITOM OF C5
RO (= C5 + RO
FII.JLTX
'ARO,R2
tARo,R2
R2,RO
,
,
,
,
RO
R2
OR
RO
FII.JLTX
'ARo,R2
, RO (= 1.IC4 + ROI
, R2 (= TOP OF C3
(=
(=
IN
(=
(=
(=
IN
(=
1.IC7 + RO)
TIIP OF Cb
IIOITOM OF Co
Cb t RO
XfIC5 + ROI
TIIP OF C4
IIOITOM OF C4
C4 + RO
g
~
B.
;:
B.
'C....'"i'
(II
au.
UF
(II
AIIF
au
UF
III
au
,IIIINIIOTTOIIFCl
,RO(=Cl+RO
AIIF
RETS
PROORAII: ATANX
I
Flll.TX
1IIRO,R2
1IIRO,R2
R2,RO
Flll.TX
1IIRO,R2
1IIRO,R2
R2,RO
, RO (= XIIC2 + 1101
, R2 (= TIP IF CI
, III IN IIOTTOIIF CI
,RO(=CI+RO
Flll.TX
R3,RO
I
I
I
I
, RO (= LNIXI + EtlNI21
So
'"
HHHtftHHHHtffHftHHHHfHIIHtIHHHHHHHH
1IIRO,R2
R2,RO
HfltHHHfftffHfHHtHflfHfHftHtHHtfffHHHHII
; IIET\Rl
~
N
GLOBL ATANX
.GLOBL FItULTX
.GLOBL FDIVX
Cl
INTERNAl. COOSTANTS
.MTA
SCAlING COEFFS. FOR ATANIXI
WORD
.WCIlII
OOOOOOOSDH
OFFB6F0251
WORD
0000000A2H
.WCIlII
OFF49OFMH
OOOOOOOOOH
OSOOOOOOOH
.IIORD
.WCIlII
BOTIon
TIP IF
IIOTIon
TIP OF
SOTTon
TIP OF
IF P1I4
P1I4
IF PII4
PII4
OF ZERO
ZERO
.WOOD
.IIIIRD
WORD
.IIORD
WORD
WORD
1lFD4C88E4H
OOOOOOOFFH
OFDEE8038H
000000056li
OFC5A3D83H
.IIORD
000000093H
.WCIlII
OFCE5CEeIIH
OOOOOOOBFH
OF82FCIFDH
.IIORO
WORD
.IIORO
OOOOOOOOOH
000OOOO6EH
0FEII55594H
OOOOOOOD9H
.IIORO
WORD
;
,
;
;
;
;
;
;
;
,
;
;
,
TIP IF
SOTIO"
TIP IF
IIOTTGn
TIP IF
IIOTIDn
TOP IF
IIOTTDn
TIP OF
IIOTIGn
TIP OF
IIOTTDn
TOP OF
CI (1,01
OF Cl
Cl
OF C5
C5
OF C7
C7
OF C9
C9
IF ell
Cll
OF ell
CI3
.1lRD
Cl7
.1IIIlII
OfAFB9IFEH
Of73BD74AH
ACl7
.1IIIlII
Cl7
, TOP OF Cl5
, TIP Of Cl7
UF_
UF
UlI
PUSIF
ABSF
AIIF
UF
au
IP
1AC17
RO,R2
1e1,R2
SKIP
RO,R3
RO,RI
O,IRO
RO
RO,RI
IeI,RI
R2,RO
FDIYX
15'
;:,;
POPF
&D
UF
UF
stili
IEOf
~.
'C'"....>
So
stili
SIClPI
,SAYEIP
, lOA1J OOTA PAGE POINTER
, R2 <= IX;
,R2(=:X:I
, IF IX: ) I TIN SCALE (InAYEDl
, R3 (= X
, Rl (= X
, lRO (= 0, POST SCALE INDEX
au
UlI
I'U'
SAYE X
Rl (= IX;
Rl <= IX: + I
RO <= IX:  I
RO (= !iX:  lImx: + 11
!lET ORIGINAL X
IF I ( 0 TIN RO <= I' (InAYEDI
R3 <= X'
Rl <= I'
lRO <= Z, IPII4J
R4
SKIP
RO,R3
RO,RI
2,IRO
,
,
,
,
,
RO,R3
Z,IRO
, R3 (= X'
, lRO <= 4, IP1I4J
RlLTX
1AC17,ARO
IP
, RO <= XH2
, ARO ) COO'F. TABLE
, lNSAI'E IP
II'Yf
,
,
,
,
RO
R2
OR
RO
(= IHZ'ICll + ROl
(= TOP OF C9
IN roTTCft OF C9
(= C9 + RO
FnTX
tARG,R2
tARO,R2
RZ,RO
FIIlUI
tARQ,R2
tARG,R2
R2,RO
,
,
,
,
1IF
00
ADDF
FIIlI.TX
tARo,R2
tARO,R2
R2,RO
, RO (= IHZt(C5 + ROJ
, R2 <= TOP OF C3
, OR IN ronCft OF C3
,RO(=C3+RO
CALL
FI1I.ILTX
; RO
UF
NIIf
IPYF
UF
011
RO,RI
_ _ ,RI,RO
_ _ ,RO
RI,RO
,Ri
_ _,R2
CALL
LDF
OR
ADOf
au
lDF
00
ADDF
RO
R2
OR
RO
(= IHZt(C7 + ROJ
(= TOP OF C5
IN roTTOI1 OF C5
(= C5 + RO
(=
XH2f(C3 + RO)
FINISH lP
[3
FnTX
'ARO,R2
fARG,R2
R2,RO
CALL
,
,
,
,
_,
<1>
~
tv
c
(= XHZ'ICI3 + ROl
(= TOP Of ell
IN ronOll OF Cll
(= Cll + RO
00
ADIF
;:,;
RO
R2
OR
RO
LDF
!LED
,
,
,
,
au
ABSF
5UBF
;:,;
RI,RO
IARO,R2
'ARQ,RZ
RZ,RO
LIP
tl'YF
00
ADIF
SCALE VARIABLE X
, RO (= Cl3 + RO
.TEXT
PUSH
R2,RO
L]f
ATANI'
ADIF
, Rl < XH2
, RO <= XH2ft17
, RO<=CI5+RO
RO <= XH2tICI5 + ROJ
R2 <= TIP Of Cl3
OR IN _
Of Cl3
ADDF
1IF
au
NOP
fARO,RO,RI
R3,RO
FIIlI.TX
_!lROJ
,
,
,
,
RI (= Cl + RO
RO (= I (SIGNEDl
RO (= ATANI!Xl = XI(1 + ROl
ARO ) C (0,0, PI/4 III P1/4J
LDF
00
ADDF
R4
R4
tARO,RI
tARO,RI
RI,RO
R4 (= RETI.IlN ADIflESS
RETURN IInAYEDJ
Rl (= TOP Of C
OR IN ronCft OF C
RO (= ATIIN!Xl + C
~
~
LSH
ASH
I
I
g'
.:;,
~
;:s
~
~.
I
I
I
I
.GLOIL SllRTX
GLOIIL FIlLTI
1 _ CONSTANTS
~
N
a
o
.DATA
CNSTl
SET
CNST2
CNST3
CNST4
.SET
S/tSI(
.IIlRD
0.5
1.5
.FLOAT 1.103553391
.FlOAT 0.7S033OOSI>
, ADJJSTED 1.0
, ADJUSTED SllRTIlI2)
OFF7FFFFFH
rl'VF
;
;
;
;
;
;
;
,
,
RI (= RI EXP. REIIlVED
R4 (= R4 WITH E12 EXP.
SAVE R4 AS INTEGER
R4 <= FlT. PT.
RI (= flIV2)12HE12
Ri <= 1.1 ... Fill ODD E
~ LSS OF E lAS SIGH)
IF E EVEN R2 <. O. 7B ...
RI <= COIRECTED ESTJIlATE
CNSTl, RO
R3,RO
, RO
, RO
(=
Vl2 TRIJ'lC.
All PREC.
<= Vl2
=X 
VH2
=0 ...
RI,RI,R2.
RO,R2
CNST2,R2
R2,RI
R2
R2
R2
RI
rl'VF
rl'VF
SUDRF
rI'VF
RI,RI,R2
RO,R2
CNST2,R2
R2,RI
, Ri <=
~VF
rI'VF
SUDRF
~VF
LDF
CALL
LDF
LDF
LDF
SORTX'
CALL
LDF
RETSLE
RO,R3
<= x[0]H2
<= IV/2) X[OIH2
<= 1.5  IVl2) x[0]H2
<= XIll = X[OI * 11.5  IVl2)'X[0]"2)
rl'VF
rI'VF
SUBRF
rl'VF
LDF
.TEXT
SUBRF
LDF
CALL
, R2
, R2
<= XIllH2
<= IVI2) * x[IlH2
,. RI
<=
RI,RI,R2
RO,R2
CNST2,R2
R2,RI
,
,
,
,
R2
R2
R2
RI
(= X12]1I2
(= IVl2) X[2)1I2
<= I.S  (Vl2) x[2]1I2
<= X[3] =X[2) U.5  (Vl2)1XI2]ff2)
RO,R2
RI,RO
FIt.l.TX
RI,R4
R2,RI
R4,R2
Flt.l.TX
CNST2,RO
R2,RI
FItLlTX
,
,
,
,
,
,
,
,
,
,
R2
RO
RO
R4
RI
R2
RO
RO
RI
RO
(= Vl2
(= x[3)
<= X[3)1I2
(= X13]
<= Vl2
<= x[3]
<= IVl2) f X[3]1I2
<= I.S  (Vl2) x[3)H2
<= X[3]
<= x[4] =X[3] U.S  (Vl2)fX[3)'f2)
IIRD
LDF
PUSH
LDP
PUSHF
I'(P
LSH
8,RI
1,R4
R4
R4
R4,RI
lCNST3,R2
7,R5
1CNST4,R2
R2,RI
"IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHHflllllllllllllllfH
It
::.l
f'(f'F
LD
LDF
LDFI'fj
I
I
'0...
PUSH
lOR
LDI
LDI
no] =SQRTUI2)'UK/2)f2HE/2
IlP
ISI1SK
RO
R4
@SltSI<,R4
R4,RI
R4,R5
SAVE II'
LOAD DATA PAOE POINTER
SAVE V AS FlT. PT. V = U+K)f211E
R4 <= v AS INTEGER
R4 (= COll'LEltENT ALL SUT SIGN
RI <= I H1/2)'2"E
RS <= RI
I'(P
NlP
FMULTX
R3,RI
DP
RO = SIlRTlV)
RI <= v
ltlSAVE IlP
DEAD CYCLE
=VtSllRTIl/V)
IDELAYED)
o00
HftffHHHfffHffHffHH*****HIU*HHHH*****Htflf
PROGRAII: FPINVX
(=
I/RO.
11.IIIII.J.I.tfHHHfHHfHHHHHHHHHfHfHflHH
EXTERNAl. PROORAII
"~.
,DATA
TIll
.SET
SET
I,D
2.0
l15li
.1IIJllJ
0fF7FFFFFH
(J
.TEXT
START IF FPINIIX _
;::
~.
F1'INIIX'
UF
RO,RO
'\)'
...
;:;.
'"
~
~
N
C
ac
; TEST F
; RETlRHDI IF F = 0
ITSZ
Il'
IV'f
RI
lP
I'll'
IIISK
RO
RI
IIISK,RI
RI
RI,RO,R4
TWO,R4
R4,RI
R4
R4
RI
(= F I HO]
<= 2  F I nOJ
<= X[I] = HO]
12  F
I1PYF
SUBIIF
I1PYF
RI,RO,R4
TWO,R4
R4,RI
R4
R4
RI
(= F I X[I]
<= 2  F I XI I]
<= X[2] = X[I] I
12  F X[I])
MPYF
SUBRF
MPYF
RI,RO,R4
TWO,R4
R4,RI
R4
R4
RI
(= F I X[2]
(= 2  F I X[2]
<= X[3] = X[2]
12  F I X[2])
CAll.
SUBRF
CAll.
ADDF
.END
INTERNAl. COOTIINTS
MPYF
SUBRF
I1PVF
RETS
,Gl.O!IL FPINIIX
,GlOII. F19JLTl
::...
SAVEll'
LOAD DATA PAIi: POINTER
SAlE AS FLT. PT, F = I1+H) I 2HE
FETCH lIACl( AS INTEGER
COfLEIIENT E ~ N BUT IflT SIGN BIT
SAlE AS INTEGER, AND BY MGIC ...
RI (= X[OJ = 1111/2) I 2HE,
lNSA\E Il'
FrI.IlTX
M,RO
FMlUX
RI,RO
RO
RO
RO
RO
= 1X[3]
<= F I
(=
<=
(=
11  IF
X[O])
X[3]))) X[3]
Xl3] = I + EPS
I  F X[3] = EPS
Xl3] I EPS
X[4] = 1X[3ltil  IFlm]) + X[3]
; RETURN
_ . FDIYX
'
..s;,
~
::s
~
~.
'&
....
So
'"
HHtHHHfHffl tHHHfHHI**"IHf**HHHfIHHH
: ..... 11111111111111111111111.11111111.
 
PROGRAII' FlU.TX
WRITTEN BY, GARY A. SITTCW
GAS LIGHT SlfTWARE
IIlJSTON, TEXAS
MARCH 1989.
(=
RO/RI.
1IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHtf
I
I
I
I
fHHfHfHfHfIHfHffffHf,**"**HfflfffHltiHtHtIU
.1l.OII. FDIVX
.1l.OII. FPINVX
.1l.OII. FlU.TX
.GLoa. FllULTX
,TEXT
ac
TEXT
FlIULTXI
FDIVX:
ABSF
LDF
LDF
CALL
LDF
IIR
RO,R3
Rl,RO
FPINVX
R3,RI
FlU.TX
ROfRI.
~
tv
(=
R3
Rl
RO
(=
ABSF
I1PVF
(=
I/V
LDF
RI<= X
RO
XOR
(=
(=
XIV
ANON
SUBRF
I1PYF
ADDF
LDF
ANON
SUBRF
Pl'YF
ADDF
IGF
RO,R4
Rl,RO
RI,R7
R4,R7,R6
R4,R5
OFFH,R5
R4,R5
R7,R5
Rb,R5
R7,R6
OFFH,R6
R7,R6
R4,R6
Rb,R5
RS,R6
,
,
,
;
,
,
,
;
,
R4
RO
R7
R6
R5
R5
R5
R5
R5
R6
(=
(=
(=
(=
(=
(=
(=
(=
(=
<=
, Rb (=
, R6
(=
, R6
(=
; RS
(=
(=
, R6
:XA:
SIGN INFO ..
:XB:
AlB
:XA:
A = XA  EAt2H24
EA12tt24
BlEAt2H24
AlB + BtEA'2"24
:XB:
B = XB  EB'2H24
EBt2H24
AtEB*2H24
:XAtXB: =AlB + (B*EAtA*EB)*2H24
 :XAIXB:
IN
f5
POP
R4
R4
!IUD
R4
RETtffl (DELAYED)
LDF
LDFN
LDF
RO,RO
Rb,R5
R5,RO
(=
RETURN ADDRESS
w
.....
o
HHHHfftHfHffHHflftHH**ffHfffff:lHHHffHfHIHHH**HHf**IHHH
IHHHlfHIHHH*******HIHHfHH***HIHHHHHflH
PROGRAi'I:
INPUT RESTRICTlONS: RO ) O.
REGISTERS FOR INPUT: RO.
REGISTERS USED AND RESTORED: SP.
REGISTERS ALTERED: lROl AND RO.
REGISTERS FOR OUTPUT: RO.
ROUTINES NEEDED: NONE.
PROGRAM: SI1ATHI.AS/1
INTEGER 132BITl I1ATH ROUTINES
SI1ATHI.AS/1 CONSISTS Of THE FOLLOWING ROUTINES:
ILOO2' COIfUTES M= LOO2INl, N=( 2nM FOR USE WITH RAOlX 2 FFT
PROORAIIS.
'1
I1ARCH 1989.
lum
*
*
I
<=
IINTEGERl LOG2IROl.
fffUfHflHHHHHfHHHHHHH*HHHHHHfHHHH
;:...
~.
;::
~
~.
'C....'">
;;.
'"
~
~
ac
LOOP:
LDI
lOI
1,IRO
l,IRl
; IRO (. I I IN IT. 11
; IRI (. M IINIT. 11
CMPI
IRO,RO
LOOP
1,IRO
1,IRl
100,00
; COMFARE ITO N
; LOOP IF N ) I IDElAYEDl
; I (= 2*1
; M =H + 1
; COMPARE I TO N
IRl,OO
; 00 (= LOO2INl
; RETURN
BOlD
L5H
ADOI
CMPI
LDI
RETS
::...
g
~
~
~.
~
....
So
o
o
o
o.
o
SUBRI
SUBRB
>
OONE'
HHHHfHHHfllllllllllllllfHHHUHfHHfHUfHfff
EXTERNAl. _
NArES
.GLOIL linT
ac~
.lEXT
START
(F
llU.T PROOIW1
IIU.TI
XIIl
ASSI
ASSI
RO,RI,ARO
RO
RI
16,ARI
ARI,RO,R2
OFFFFH,RO
ARI,RI,R3
OFFFFH,RI
ARI (= 16
R2 (= XI =
RO (= XO =
R3 (= YI =
RI (= YO =
(f(Il
SHIFTSI
UPPER 16 BITS
(F
LOWER 16 BITS OF
UPPER 16 BITS OF
LOWER 16 BITS OF
RO,RI,R4
R3,RO
R2,RI
RO,RI
R2,R3
R4
RO
RI
RI
R3
(=
(=
(=
(=
(=
XO<YO
XO<YI
XIfYO
P2+P3
XI>YI
PI
P2
P3
P4
LDI
LSH
CllPI
RI,R2
16,R2
O,ARO
IlOI
ARI,RI
R4,R2,RO
R3,RI
IF
RI
RO
RI
)=
(=
(=
(=
~
;:
~.
LSH
ADDI
ADDC
1JI.ISl'(Jj.lEXAS
IIAROi 1989.
~
~
BGED
PIaMI. llU.T
R2 (= P2+P3
R2 (= LOIIER 16 BITS OF P2+P3
CHECK THE SIGN OF THE PRODLtT
HETS
O,RO
O,RI
RO
, RI
(=
(=
(F
OPPOSITE SIGN
110
WI (WITH B!IlROW)
,RETURN
HHffffHfHllHHHfffH.HffHlfHfffffHtHHHHffU
PROGRM: IDIV
SUBI
LSH
IRO,IRI
IRI,RI
I1ARCH 1989.
IRI
RI,RO
LOI
SUBRI
LSH
NEGI
LSH
SUBRI
LSI!
INPUT RESTRICTIONS: RI != O.
RO,RI
31,IRI
IRl,RO
lRI
IRI,RO
32,IRI
IRI,RI
RI (= lREI1AINDER, IlOOTlENr:
IRI (= 32  (IRl+lI
RO {= RO SHIFT LEFT IRI
IRI (= IRI
RO {= lXi/IVI
IRI (= (JRI+D
RI (= lREI1AINGERI
tHtHHIIIII.I.I IIIIIIIIIHHHffHftHHfHfHfHfH
EXTERIW. _
NEGI
NAItES
ASH
LOINZ
Cll'1
.1i.OBL IDIV
RO,R3
31,R2
R3,RO
O,RO
RETS
R3 (= 1Xi/IVI
TEST SIGN BIT
IF SET RO {= RO
SET STATUS FROII RESULT
RETURN
START IF IDlY _
:t.
12
~
;:;
~.
~
~
;:;
.TEXT
ZERO:
IDlY:
RETS
XOO
ASSI
ASSI
RO,RI,R2
RO
RI
R2
RO
RI
{=
(=
(=
SIGNUIt (RO/Rli
IXI
IVI
0'
i:;
Cll'1
IIIUD
'0'
....
FlOo\T
PIJSIF
p(I'
LSH
oc
FlOo\T
tv
C
LOI
LOI
PIJSIF
p(I'
LSH
RO,RI
ZERQ
RO,R3
R3
IRI
24,IRI
R3 (= NlRIIAlIZED DIVIDEND
PUSH AS FLOAT
IRI (= INTEGER
IRI (= DIVIDEND EXPfIIENT
RI,R3
R3
IRO
24,IRO
R3 (= NIlM.IZED DIVISOR
PUSH AS FlOo\T
IRO (= INTEGER
IRO (= DIVISOR EXPONENT
RO,RI
0,00
RI (= lREI1AINDERI
RO (= IlOOTlENT
RETURN
::t..
1IIIIIIIIIIIIIIIIIHHHfffHfHffHffHHHfffHHHfHffflfHffHfffflHHH
ffffl**HfffffUHffHffffflffffltftHlffHHffHffHH+HfH
PROGRAM: tCffil1ULT
t
t
~.
'0'
...,
PROORIIIU '<'{CTal.ASII
<'{CTOO UTILITIES
g.
.s;,
~
;::
~
~
IV
C
ac
FEBRUAAY 1989
>
IHln**fHfHH**fffHnl*'*HHfHltnHHHftfffHlfflffl:lH
ARRAY LENGTH N
ADDRESS OF I NPUT
ADDRESS OF I NPUT
ADDRESS OF INPUT
ADDRESS OF Uf'UT
$N
SIAD!
SIAD2
SSAD!
S5AD2
X!
VI
X2
V2
w
w
TEXT
I'ICORItULT:
"""
LOP
LDI
LDI
LDI
LDI
LDI
tUUHItHIHHH.fHHHfHHltHHHHIHHUHHfIHHHHf
HPARIIS
@SN,Re
HIADI,ARO
HIAD2,ARI
HSADI,AR2
@SSAD2,AR3
PROORAI1: fCOO~lLT
I
I
~PIIJI. T
RCtRtJl.T:
I,Re
;Re(=NI
RPTB
LOOPI
~YF
fARO, fAR2,RI
fARI, fAR3,R3
lAR2++,fARI,RO
RI,R3,R2
fARO, fAR3++ ,RI
RI,RO,R3
R2,fARO++
R3,lAR1++
~YF
~YF
::
ADIF
~YF
SUBF
UXPJ: STF
::
STF
~
~
:4.
o
;=::
RETS
; RETlI\N
ENTRY PROTOCOl:
VARIABLES FOR 11f'IJT'
SIAD! ) mOl, SIAD2 ) mOl,
SSADI ) 12[01, SSAD2 ) Y2[0),
$N = N lLENGTH), SPARIIS = DATA PAGE.
INPUT RESTRICTJ~' $N ) O.
REGISTERS ALTERED' Re, DP, ARO3 AND RO3.
HHfHHHHfHHHfHHHfHfHHHHllllllllllllllllllllff
~.
.GlO81.
.GlOBL
.GlO81.
.GlOBL
,GlO81.
;=::
'"
'Cl'
...
s.
'"
~
~
N
C
a
c
ARRAY LENGTH N
ADmESS IF IIf'IJT
ADmESS OF IIf'IJT
ADmESS IF IIf'IJT
ADmESS IF INPUT
$N
SIADI
SIAD2
SSADI
SSAD2
EXTERNAL _
.GlO81. I1CONIU.T
.GlO81. Rrof1I.I.T
START OF _
XI
YI
X2
Y2
NAIS
TEXT
PEI1IllY BASED PARrITER ENTRY
::...
1tC3fU.T:
g
~
~
'
~
;::
~
.
'c'"s>'
HHHnfHHHfHf*H*ftHIHHlH+HHfHffHHHfHHI
f
UP
I$PMIIS
IlII
IlII
IlII
IlII
IlII
@SH,Re
I$IADI,ARO
I$IAlI2,ARI
HSADI,AR2
1$SAD2,AR3
....
SUBI
I,RC
, RC (= N  I
So
RPTB
If'YF
If'YF
If'YF
SUBF
If'YF
ADDF
STF
STF
LOOP2
tARO, tAR2,Rl
tARI, tAR3, R3
IAR2++,tARI,RQ
R3,RI,R2
tARO, fAR3++ , R1
RI,RQ,R3
R2,tARO++
R3,4AR1++
'"
e:J
N
::
ac
UJ(P2:
"
RETS
, RETURN
PROOlAII: fCBITREV
WRITTEN BY: GAAY A. SITTON
GAS LIGHT SOfTWARE
HOUSTON, TEXAS
MARCH 1m.
f
I
I
RCBITRE~
*
I
*
ENTRY PROTOCOL:
REGISTERS FOR INPUT:
ARO ) XlOl, ARI ) YIOl, RC = N (LENGTH).
INPUT RESTRICTIONS: RC )= 4.
REGISTERS ALTERED: RC, IRO, ARO3 AND RQ3.
IHHH*****HHHIHIHfffHfHfHHHHHHHfHIHHH
ARRAY lENGTH N
ADDRESS OF INPUT X
ADDRESS If INPUT Y
Vt
M<BITREV:
LOP
LDI
lOI
LDI
0 '\
UPARMS
@SN,Re
@SIADI,ARO
@SIAD2,ARI
*lfttltfHHfftftfHHHfHHIH*HnHntHftHIfHHHH
PROGRAM: 'F11IEEE
lOI
RC,IRO
:
3,Re
;
I,IRO
;
ARO,AR2
;
fAR2++IIRO)B
fARO++
;
ARI,AR3
;
IRO (= N
Re (= N  3
lRO (= N/2 FOR BIT REVERSE
AR2 J ARRAY I IBIT REV.)
: INCR. 8RIAR2) (OUTSIDE LOOP)
INCA. ARO (OUTSIDE LOOP)
AR3 J ARRAY Y (BIT REV.)
~MlEEE
ENTRY PROTOCOL:
VARIABLES Foo INPUT:
SIADI J 1[0l, SN = N lLENGTH),
SPARMS = DATA PAGE.
INPUT RESTRICTIONS: $N J O.
REGISTERS ALTERED: Re, OP, AROI ~ RDt.
::...
9
~
~
15'
;:;
<Q,
~
;:;
UF
UF
"
II
::
UO'3:
LDF
STF
STF
STF
STF
NP
RETS
LOIJ'3
;
AR2,ARO
;
LOO>3
;
fARl ++
;
'AR3++(!RO)B
'ARO++,RO ;
fAR2,R2
;
fARl,RI
;
fIIR3,R3
;
RO, fAR2
;
R2, fARD
;
RI,fAR3
;
R3, fARl
;
fAR2++(!RO)B
R2 (= XIJl
RI (= vm
R3 (= YrJl
x[Jl (= RO
ICll (= R2
YIJl (= Rt
YCll (= R3
; INCA. liR(AR2)
.'
HHHlffH**HHHHHHHHHHHHfHHfHHffHHHH
; RETURN
t
c
; ARRAY LENGTH N
; ADDRESS OF INPUT X
;:;
'a...'">
Gl.OBL MFMlEEE
GLOBL RFMlEEE
so:.
~
f2
N
<::>
a
<::>
WORD
WOOD
WORD
WOOD
WORD
OfFSOOOOOH
OFFOOO()()()!l
07FOOOOOOH
OSOOOOOOOH
OSIOOOOOOH
::....
TAIIA
g
~
HHHfffHfHHfHHfffHffffHHfffHHffHHfHHHfH
MEA
.TEXT
g.
.Q,
~
::s
CTAB
.I0Il)
START OF _
IflllEEE'
UP
"
~
...
So
'"
N.RC
ISIADI,ARO
RRUEEE.
SUBI
I,RC
ICTAB
ITAIIA,ARI
;RC<=NI
; lOAD DATA PAGE POINTER
; ARI ) aJNSTANT TABlE
L111
tv
L111
L111
::s
_ . *TOIEEE
<
SASEIl PAIWJER ENTRY
UP
RPTB
POPF
lOOP4
tARO, tARI,OO
tARO,OO
_1111,00
tMo,Rl
lOOP4
_1121,00
00
00
NEGF
00
GLOBl SPARtIS
; RETURN
.GLOBL SN
.GLOBL SIADI
AND
ADDI
L11IZ
L111
BGED
SUBI
PUSH
lOOP4' STF
RETS
RO,fARO++
fHfffHHfHfHHfltHlHfHHffHIH4Hfff*fffHHfHff
; ARRAY LENGTIl N
; ADDRESS OF INPUT X
.......J
mOIEEE.
....
00
LOP
LOI
LOI
HPARMS
HN,RC
@$iADI,ARC
**HHftHfHfffffH**HffHfHffHH*HffffHH...HfHHHH
PROGRN1: *VECI1ULT
RC{=NI
LOAD DATA PAGE POINTER
ARI ) CONSTANT TABLE
I,RC
@CTAB
eTASA,ARI
RPTB
ADDI
LSI!
:
:
,
,
,
,
,
,
,
,
OR
HAR1I3l,RO
RO,<ARO++
LSH
PUSHF
LDP
IlGED
POP
::to.
~
~
~
g'
~
;::
LOOI'5
<MO,RO
HAR1I4l,RO
I,RO
RO
tARO,RI
LOOPS
RO
HAR1!2l,RO
I,RO
ABSF
LDPI
tHffHHtfffHfHUHfHHfffffHfUHfHHHfffHfHffHfH
UXP5' STI
RETS
, RE1LIiN
.GLOBL IN
GLOBL SCNST
.GLOBL SIADI
I'
;:;.
~
....
GLOBL IIIWI.I.T
,GLOBL RVECI1ULT
tv
[3
C
TEXT
!VECI1ULT:
LOP
LOI
HPARI1S
HN,RC
LDI
UF
KIADI,ARO
KCNST,RO
, ARO ) nOl
,RO(=C
~
;:s
'"'
~.
~
....
RPTS
",YF
STF
So
~
SKIP!: STF
RETS
2,RC
RO,fARO,RI
O,RC
SKIPI
RC
RO,HtARO,RI
RI,>ARO
RI,'ARO
RC(=N2
RI (= c.no]
CIlIf'ARE RC TO 0
IF RC ( 0 THEN SKIP LIXP
; RETURN
ac
PROGRAM: fCON/lO'l
WRImN BY: GARY A. SITT~
GAS LIGHT SOFTWARE
HOUSTON, TEIAS
FEBRUARY 1989.
R'IECItl.T:
SUBI
"'YF
Cll'I
lilT
HtffHffHfHffffff**fHfftHHfHHIIIIIIIIIIIIIIHHHHH
ffHHfffHfHHHIHHHfHtHfHHHffHHffHftHfHHHH
ARRAY LENGTH N
ADDRESS OF CONSTANT C
ADDRESS OF INPUT X
MC~~MOV:
YJ
10
LOP
lDI
WARMS
@IN,RC
LDI
LlF
ISIADI,ARO
IItNST,RO
ARO ) xtoJ
,RO(=C
flfHHHfffftffHfHftHfHffHfffHHHffHfHHHfftHHH
I
PROORAI!' f\l:C/fJV
VECTOR
I1'IE~V
RWftI'i.
SUBI
I,RC
,RC<=NI
RPTS
STF
RETS
RC
RO, IARO++
~YE:
(N)= Il.
ENTRY PROTOCOL:
VAlUABLES Fill IIf'UT:
$IADI ) I[OJ, $1AD2 ) Y[OJ,
$N = N (LENGTHI, tPARI1S = DATA /'ME.
INPUT RESTR[CTlONS' $N ) O.
REGISTERS ALTERED: ftC, DP, AROI, AND RO.
,1m <= C
,RETLIlN
ffftHtHffffttHfHHHfffHHHHHflHHfffll.111111111111
g'
.GLOB!. tPARI1S
~
~
.GLOB!. $N
.GLOB!. $IADI
.GLOB!. $1AD2
ADDRESS OF IIf'UT I
ADIIRESS IF IIf'IJT Y
.GLOB!. I1'IECIIlII
.GLOB!. RVEOIJII
g.
'0'
.,
START OF _
So
Q
c
.Tm
""
f::J
N
ISPARI1S
m,RC
ISIADI,ARO
LDI
g
~
R
g.
.s;,
~
B.
'C'
...
s
Ilo
~
c
Q
c
ISIAIl2,1IR1
, IIRI ) YIOI
tHHHfHffffHIIIIIIIIIIIIIIII.lllllllllltHiHHHHHHIIIIIIIIIIIIIIIIIII
RADIX 2
R'iEOIlYI
SUBI
Uf
Cll'I
lILT
2,RC
tMO++,RO
O,RC
Sl(IP2
RC
S1F
RO,tARI++
RO,tARl
Sl(IP2: S1F
RETS
.EIID
tMO++tRO
CFFFT2 
I.
RPTS
Uf
I RE11JIN
ROOTINfS
IfFT2.ASII CCIISISTS
,RC<=N2
, RO (= UOI
, ilII'ARE RC TO 0
I IF RC ( 0 TIEN Sl(IP LOOP
(F
TI FIILLII/IM> ROOTINES:
CIlI'l)(
CIFFT2  COlt'LEX DIT INVERSE RADIX 2 FFT USING SEPARATE REAl lIND
IIVIGINARY ARRAYS lIND 314 CYCl SUE TABLE IlXES tlT ItnUlE
THE liN SCILE FACTOR.
ffffHftfHffHfftfffffHHHtfHHHfffHfHHtHffHfHHfHHlftHfffHtHf
HHUHHftffHHfHfHHHH**HHH**HfHHHfHIIHHHI
PROORAM: CFFFT2
LOI
LSH
LOI
lOI
LSH
lOI
MRCH 1989,
I
I
I
::...
~
OUTER LOOP
I
I
I
I
I
I
I
K (= K + 1
AR() ) XIOI
ARI ) XILI
AR2 ) YIOI
AIl3 ) YILI
SETUP 1ST INNER LOO' REPEAT COONTER,
RC lONE LESS THAN TI lESlRED II
I,AR6
@$IADI,ARO
R7,ARO,ARI
@$IAD2,AR2
R7,AR2,AR3
RS,RC
I,Re
STF
STF
FBLKI
fARO,lARI,RO
fARI,fARO,RI
IAR2,IAR3,R2
tARl, IAR2, R3
RO, tAR0++IIROI
RI,IARI++IIROI
R2,1AR2++IIROI
R3, fAR3++( IRO)
;
;
;
;
;
'
.s;,
.GLOBL S511
GLOBL WARMS
;:;
~'
~
....
$N
$M
SIAOI
SIAD2
,TEXT
CFFFT2:
UI'
ADDI
lOI
ADD!
LDI
ADOI
LOI
SUBI
twES
~
N
FLOOP:
HIIII""III'IIIIIIIIIIIIHflHfHHHlltHIHHHHfH.Hff
EXTERNAl _
IRI (= N
IRI (= N/4, OFFSET FOR COSIN!:
AR6 (= K IINIT, 01
R7 (= NI
R7 (= N2 IINIT, NI2I
RS (= IE IINIT,I1
lRO,IRI
2,IRl
O,AR6
IRO,R7
I,R7
I,RS
!.DI
tSN, lAO
; COMPARE M TO K
; IF K )= " TIN IlET\Rl
J (= 2, IPREINCREIlENTEDI
ARO (= I IINIT. II
AR2 (= I IINIT, II
ARS (= SINTABllAl IINIT. IA
2,AR7
I,ARO
I,AR2
@$SINE,AR5
RS,ARS
fARS,R6
AR5,IRI,AR4
@$IADI,ARO
@$IAD2,AR2
R7,ARO,ARI
R7,AR2,AR3
RS,RC
I,Re
RPTB
FBLK2
01
(= IA + IEl
R6 (= SINIXI, IX = 121P1IN)fIAI
AR4 ) COSIXI
ARO ) xm
AR2 } YIII
ARI ) XILI
AR3 } YILI
SETUP 2ND INNER LIXJ> REPEAT COONTER.
RC lONE LESS THAN THE lESlRED II
; AR5 ) SINTABlIA
;
;
;
;
;
;
;
;
;:....
SU!iF
g
~
'"Cl
::to
;::!
~
;::!
B.
Cl
;::!
'"
~
....
S<1\
~
~
a
C
:1
5IJBF
II'YF
ADIF
II'YF
STF
SUIIF
II'YF
::
ADDF
II'YF
II
STF
ADIF
FIiLK2' STF
STF
II
::
Cll'I
*MI, <MO, R2
tIIR3,tAR2,RI
R2,Rb,RO
tAR2,tIIR3,R3
RI,*M4,R3
R3, IM2++! lROI
RO,R3,R4
RI,R6,RO
<MO, *MI,R3
R2,<AR4,R3
R3,*MO++IIROI
RO,R3
R3, *MI ++ I lRO I
R4, IAR3++ II RO I
. R7,AR7
,
,
,
,
,
,
,
,
,
,
,
,
,
,
R2 (= XT = XIII  XILI
RI (= YT = Y1I1  YILI
RO (= XTtSIN AND ...
R3 (= YIII + Y!L1
R3 (= YTtCOS AND ...
YIII (= YIII + YILI, INCR. AIl2
R4 (= COStYT  SINIXT
RO (= SINtYT AND ...
R3 (= XIII + XILI
R3 (= eoslXT AND ...
XIII (= XIII + XILI , INCR. ARO
R3 (= eoslXT + SIN<YT
XILI (= eoslXT + SINIYT, INCR. ARI AND ...
YILI (= eoslYT  SINtXT, INCR. AR3
, COMPARE N2 TO J
IILTD
LOI
LOI
ADDI
FIN1.Cf
AR7,ARO
AR7,AR2
I,AR7
IIRD
LSI!
LOI
LSI!
FLOOP
I,RS
R7,IRO
I,R7
PROGRAM' CIF'l2
f
I
fIHHHfffHHHfHfltff**fffHHfHIHfHHHflffHfffff
)=
...,
LDI
@.?ARMS
I!$~. IRO
LDI
LSH
LDI
LDI
LDI
LSH
LDI
IRO,IRI
2,IRI
@$M,ARb
I,R7
IRO,RS
1,R5
2,IRO
IRI (= N
IRI {= N/4, OFFSET FOR COSINE
ARb (= K lINn. MI
R7 (= N2 IINIT. II
R5 (= N
RS (0 IE IINIT. N/21
IRO (= NI lINn. 21
OUTER LOOP
ILOOP: LDI
ADDI
LDI
ADDI
LDI
SUBI
FIRST
::....
g
~
B.
<Q,
~
;::
B.
Co
'&
..,
S~
RPTB
ADDf
SUBF
ADlF
SUBF
STF
I:
STf
IBLKI: STF
::
STf
CllPI
BEQD
!$lADI,ARO
R7,ARO,ARI
mAD2,AR2
R7,AR2,AR3
RS,RC
I,RC
I~
,
;
;
,
;
,
ARO ) 1101
ARI ) XILI
AR2 ) VIOl
M3 ) VILI
SETUP 1ST INNER LOOP REPEAT COUNTER.
RC lONE LESS mAN THE DESIRED II
"
"
;
,
;
,
;
;
,
;
R3, fM3tt( IRO) ,
@1II,ARb
SKIP
,CCMPAREMTOK
; If K = MTHEN SKIP TWIDDLED LOOP
MPYF
SUBf
MPYF
SUBf
ADDf
SUBF
STF
ADDF
STF
"
ADDF
IBLK2: STF
STF
"
IBLKI
tARO, tARI,RO
tARI, tARO, RI
*AR2, >AR3, R2
fM3, *AR2, R3
RO, tARO++ II RO I
RI,fARI++llROI
R2
Viel mehr als nur Dokumente.
Entdecken, was Scribd alles zu bieten hat, inklusive Bücher und Hörbücher von großen Verlagen.
Jederzeit kündbar.