Digital Signal Processing CSS55

Prof. Shankar Prakriya
Indian Institute of Technology, Delhi
12/9/14
Why DSP Processors?
Typical DSP operations: dot product, matrix product, convolution, filtering, FFT algorithm
Typical DSP operations required
Indexing issues in typical operations

$y = c^T x$, where $c = [c_1, \ldots, c_N]$ and $x = [x_1, \ldots, x_N]$
The multiply-accumulate (MAC) operation is required.
In a normal processor, $c_i$ and $x_i$ are accessed sequentially from memory before the multiply-accumulate, which is slow.
In a DSP processor, they are accessed simultaneously.

DSP processors use the Harvard or modified Harvard architecture (separate program and data memory).
The von Neumann architecture is used by general-purpose processors.

DSP processors like the C5510 allow the user to define sections of the memory as program or data memory.
A crossover path is allowed for some sections (both program and data memory).
The C5510 allows three quantities to be accessed simultaneously.

Indexing requirements for repeated MAC operations:
One pointer for $c_i$ and one for $x_i$.
Increment each pointer after every MAC.
Check whether N operations have been completed.
The C5510 uses the RPT instruction to repeat the MAC.

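The indexing steps listed for the repeated MAC can be sketched in Python. This is only an illustration of the pointer bookkeeping, not C5510 code: the function name `mac_dot` and the explicit index variables are assumptions made for this sketch, and on the DSP the post-increments and the repeat count are handled by hardware.

```python
def mac_dot(c, x):
    """Dot product written as an explicit multiply-accumulate loop."""
    if len(c) != len(x):
        raise ValueError("c and x must have the same length")
    acc = 0              # accumulator
    pc = 0               # pointer (index) into the coefficients c
    px = 0               # pointer (index) into the data x
    remaining = len(c)   # plays the role of the repeat counter
    while remaining > 0:         # check whether N operations are done
        acc += c[pc] * x[px]     # one MAC
        pc += 1                  # post-increment both pointers
        px += 1
        remaining -= 1
    return acc
```

In software, every line inside the loop costs instructions; the point of the slides is that a DSP performs the two fetches, the MAC, both increments and the loop test in one cycle.
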
When we multiply-accumulate with fixed-point arithmetic, the accumulator has to have extra guard bits to avoid overflow.
The C5510 has a 40-bit accumulator with 16-bit words.

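A small Python sketch can show what the guard bits buy. It counts how many worst-case 16-bit products can be summed before a signed accumulator of a given width overflows. The helper names `fits` and `max_worst_case_macs` are hypothetical, and the worst case assumed here is the product of the two most negative 16-bit values.

```python
def fits(value, bits):
    """True if value is representable as a signed two's complement number."""
    lo = -(2 ** (bits - 1))
    hi = 2 ** (bits - 1) - 1
    return hi >= value >= lo

def max_worst_case_macs(acc_bits, word_bits=16):
    """Count worst-case products that can be accumulated without overflow."""
    worst = (-(2 ** (word_bits - 1))) ** 2   # (-32768)^2 = 2^30 for 16-bit words
    acc, count = 0, 0
    while fits(acc + worst, acc_bits):
        acc += worst
        count += 1
    return count
```

Under these assumptions a 32-bit accumulator tolerates only one worst-case product, while a 40-bit accumulator (8 guard bits) tolerates 511.
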
Consider two finite-length sequences h and x of lengths $N$ and $N_x$ (the convolution has length $N + N_x - 1$).
Assume h is stored in program memory and x in data memory.
In practice, long convolutions are often implemented using the FFT.

$y[n] = \sum_{i=0}^{N-1} h[i]\, x[n-i]$, $n = 0, \ldots, N + N_x - 2$
To compute y[n], x[n], ..., x[n-N+1] are used along with h[0], ..., h[N-1].
To compute y[n+1], x[n+1], ..., x[n-N+2] are used along with h[0], ..., h[N-1].
The pointer for h needs to increase on each MAC and point back to h[0] for the computation of the next output.
The pointer to x decreases during the MAC loop but needs to increment by 1 in preparation for the next output.
This can be achieved without extra computation in DSP processors like the C5510.

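As a reference for the convolution sum, here is a minimal Python sketch of direct convolution. The function name `convolve` is an assumption for illustration; samples of x outside its support are treated as zero.

```python
def convolve(h, x):
    """Direct convolution of two finite-length sequences."""
    n_h, n_x = len(h), len(x)
    y = [0] * (n_h + n_x - 1)          # output has length N + Nx - 1
    for n in range(len(y)):
        acc = 0
        for i in range(n_h):           # pointer into h increases
            if n_x > n - i >= 0:       # pointer into x decreases
                acc += h[i] * x[n - i]
        y[n] = acc
    return y
```

For example, `convolve([1, 2, 3], [1, 1])` gives `[1, 3, 5, 3]`.
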
It can be seen that the pointer to h[n] can benefit from circular buffers.
In a circular buffer, a pointer incremented beyond the last address goes back to the start address.
When decremented below the start address, it goes forward to the last address automatically, which saves clock cycles.

Filtering is the most common operation in signal processing.
It is the same as convolution, but the input x[n] is an infinite-length stream (real-time filtering).
It is used to implement lowpass, highpass, bandpass or bandstop filters.
These frequency-selective filters change the frequency content of an input signal.

$y[n] = \sum_{i=0}^{N-1} h[i]\, x[n-i]$ is a Finite Impulse Response (FIR) filter.
$H(e^{j\omega}) = \sum_n h[n]\, e^{-j\omega n}$ is referred to as the frequency response of the filter.
Note that $H(e^{j\omega})$ is complex, and both the magnitude and the phase response should be studied.

FIR filters are preferred for their guaranteed stability under finite word-length effects and in adaptive systems where h[n] is constantly adapted.
FIR filters require a large number of coefficients but can be designed to have linear phase.

$y[n] = \sum_{i=0}^{N-1} h[i]\, x[n-i] - \sum_{i=1}^{M} g[i]\, y[n-i]$
is the difference equation of the Infinite Impulse Response (IIR) type.
Note the feedback of the output in the filter.
Stability is an issue for these causal filters, since the transfer function
$\frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{N-1} h[i] z^{-i}}{1 + \sum_{i=1}^{M} g[i] z^{-i}}$
may have poles outside the unit circle.

$y[n] = \sum_{i=0}^{N-1} h[i]\, x[n-i]$
Note that an input array of size N is all that is needed for x[n].
Computation of y[n] requires x[n], ..., x[n-N+1], but y[n+1] requires x[n+1], ..., x[n-N+2] (x[n-N+1] is no longer needed).

Option 1: After the computation of y[n], x[n-N+2] is shifted to the location of x[n-N+1], x[n-N+3] to the location of x[n-N+2], etc.
x[n+1] can then be stored in the location where x[n] was.
These move operations can consume a lot of time.
Processors like TI's C5510 perform this move while computing each output.

Option 2: It is also possible for x[n] to be stored in a circular buffer of size N.
A pointer is required to indicate the location where a new incoming sample is to be stored.

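Option 2 can be sketched in Python as a sample-by-sample FIR filter whose delay line is a circular buffer. The class name `CircularFIR` and its fields are assumptions for this illustration; the modulo operations stand in for the automatic pointer wrap-around that the DSP performs in hardware.

```python
class CircularFIR:
    """FIR filter that stores the last N input samples in a circular buffer."""

    def __init__(self, h):
        self.h = list(h)
        self.buf = [0] * len(self.h)   # delay line of size N
        self.pos = 0                   # where the next incoming sample is stored

    def step(self, sample):
        n = len(self.h)
        self.buf[self.pos] = sample    # overwrite the oldest sample, no data moves
        acc = 0
        idx = self.pos                 # points at x[n]
        for coeff in self.h:
            acc += coeff * self.buf[idx]
            idx = (idx - 1) % n        # decrement, wrapping below the start
        self.pos = (self.pos + 1) % n  # increment, wrapping beyond the end
        return acc
```

Feeding the samples 1, 1, 0, 0 into `CircularFIR([1, 2, 3])` produces 1, 3, 5, 3, the same values as the direct convolution of the two sequences.
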
The accumulator has to have a larger number of bits, including guard bits.
To avoid overflow, it is important to scale the inputs.
DSP processors should be able to scale numbers while accessing or storing them.
Having multiple MAC units speeds up filtering, but more buses are required.
Multiple MACs also assist in complex multiplication and complex filtering.

In many applications, it is desirable that the phase $\theta(e^{j\omega})$ of $H(e^{j\omega}) = \sum_n h[n]\, e^{-j\omega n}$ possess the generalized linear phase (GLP) property:
$\theta(e^{j\omega}) = \alpha + \beta\omega$ over $0 \le \omega \le 2\pi$
In these cases, the impulse response is either symmetric or antisymmetric.
This symmetry has to be exploited for computational and power efficiency.

There are four types of linear-phase FIR filters:
N even, symmetric
N even, antisymmetric
N odd, symmetric
N odd, antisymmetric
E.g., h[0]x[n] + h[1]x[n+1] + h[0]x[n+2] should first add x[n] and x[n+2], then multiply by h[0]. This saves multiplications and power (less use of the MAC units).
The C5510 provides the FIRSADD and FIRSSUB instructions for this.

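The add-before-multiply idea for symmetric coefficients can be sketched as follows. This is a Python illustration only; `fir_symmetric` is a hypothetical helper, not a model of the FIRSADD instruction, and it returns the number of multiplications so the saving is visible.

```python
def fir_symmetric(h, window):
    """One output of a symmetric FIR filter (h[i] == h[N-1-i]).

    `window` holds the N samples aligned with h. Paired samples are
    added first, so each coefficient pair costs a single multiplication.
    """
    n = len(h)
    if len(window) != n:
        raise ValueError("window must have the same length as h")
    acc, mults = 0, 0
    for i in range(n // 2):
        acc += h[i] * (window[i] + window[n - 1 - i])  # add, then multiply
        mults += 1
    if n % 2:                                          # unpaired middle tap
        acc += h[n // 2] * window[n // 2]
        mults += 1
    return acc, mults
```

With h = [1, 2, 1] the output is computed with 2 multiplications instead of 3; for an antisymmetric filter the addition would become a subtraction.
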
Some applications require FIR filters with complex coefficients to be implemented.
Some processors provide multiple MAC units to facilitate this.
Multiple MACs can also speed up FIR filtering.
Care has to be taken to avoid overflow/underflow.

Assume two matrices of size m x n and n x r are stored in memory (row-wise).
The first row of the first matrix and the first column of the second matrix need to be accessed for the first element of the product.
To access the first column, the pointer needs to be incremented by r (such pointer manipulation is provided for in the C5510).

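The strided access pattern can be made explicit with flat, row-major arrays. The sketch below is a Python illustration under that storage assumption; `matmul_flat` and its pointer variables are hypothetical names.

```python
def matmul_flat(a, b, m, n, r):
    """Multiply an m x n matrix by an n x r matrix, both stored row-wise as flat lists."""
    out = [0] * (m * r)
    for i in range(m):
        for j in range(r):
            pa = i * n       # start of row i of a, stepped by 1
            pb = j           # top of column j of b, stepped by r
            acc = 0
            for _ in range(n):
                acc += a[pa] * b[pb]   # one MAC
                pa += 1
                pb += r                # jump to the next row of b
            out[i * r + j] = acc
    return out
```

The only difference from a dot product is the step of the second pointer, which is why a programmable pointer increment removes the overhead.
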
Filtering is efficiently implemented using DFTs, which are implemented using FFTs.
DFT: $X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}$, where $W_N = e^{-j2\pi/N}$ (twiddle factor)
Direct implementation requires $N^2$ complex multiplications.
$X[k] = \sum_{n=0}^{N/2-1} x_e[n]\, W_{N/2}^{nk} + W_N^k \sum_{n=0}^{N/2-1} x_o[n]\, W_{N/2}^{nk}$
This is a sum of two DFTs, costing $(N/2)^2$ multiplications each, plus $N/2$ extra.
The complexity has decreased ($x_e[n]$ and $x_o[n]$ are the even-indexed and odd-indexed samples of the sequence).

$N \log_2 N$ computations are required in the $\log_2 N$ stages of the FFT.
This algorithm is referred to as the decimation-in-time algorithm.
The sequence x[n] is often complex.
The basic computation is implemented as a butterfly, which requires one complex multiplication.
We can utilize the symmetry in $W_N$:
$W_N^k = -W_N^{k+N/2}$ for $k = 0, 1, \ldots, N/2-1$

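The decimation-in-time split and the butterfly can be sketched recursively in Python. This is a textbook-style illustration, not the in-place form a DSP would use; the function name `fft_dit` is an assumption, and the input length is assumed to be a power of two.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_dit(x[0::2])   # N/2-point DFT of even-indexed samples
    odd = fft_dit(x[1::2])    # N/2-point DFT of odd-indexed samples
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor W_N^k
        t = w * odd[k]                          # the single complex multiply
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t           # uses W_N^(k+N/2) = -W_N^k
    return out
```

Each loop iteration is one butterfly: one complex multiplication feeds both outputs because of the sign symmetry of the twiddle factors.
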
The butterfly (note that only one complex multiply is required)

Note that the input is in bit-reversed order, but the output is in normal order.
Note the sequence of twiddle factors accessed in each stage. Use of a circular buffer with a variable pointer step can help in the implementation.
Computations are in place (in situ), and all numbers are complex.

Note that the sequence of inputs accessed is in bit-reversed order.
The last $\log_2 N$ bits are reversed in the first stage, the last $\log_2 N - 1$ in the second, etc.
The C5510 and other DSP processors implement bit-reversed ordering in hardware while implementing the FFT, so no extra instructions are required.

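Bit-reversed ordering is easy to state in software, even though a DSP does it in the address generator. A minimal Python sketch, with hypothetical helper names and N assumed to be a power of two:

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result * 2) | (index & 1)   # shift in the next low bit
        index >>= 1
    return result

def bit_reversed_order(n):
    """Input ordering for an n-point decimation-in-time FFT."""
    bits = n.bit_length() - 1                 # log2(n) for a power of two
    return [bit_reverse(i, bits) for i in range(n)]
```

For N = 8 the order is 0, 4, 2, 6, 1, 5, 3, 7.
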
The sequence is periodic, and its autocorrelation is a periodic impulse.
The period has a maximum length of $2^n - 1$.
Used widely in communications for scrambling, spread spectrum, etc.

The processor has to tap selected registers, XOR the bits and find the parity in a single instruction.
It should be able to rotate logically.
Concatenation of larger registers (accumulators) is desirable for longer sequences.

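Assuming these slides refer to maximal-length (PN) sequences generated by a linear feedback shift register, the tap, XOR and parity steps can be sketched in Python. The function name `lfsr_sequence`, the 4-bit register and the tap mask `0b0011` (recurrence a[n+4] = a[n] XOR a[n+1], i.e. the primitive polynomial x^4 + x + 1) are choices made for this illustration.

```python
def lfsr_sequence(taps, nbits, seed=1):
    """Return one period of output bits of a Fibonacci LFSR."""
    mask = 2 ** nbits - 1
    start = seed & mask
    if start == 0:
        raise ValueError("seed must be non-zero")
    state, out = start, []
    for _ in range(mask):                       # at most 2^n - 1 non-zero states
        out.append(state & 1)                   # output the low bit
        feedback = bin(state & taps).count("1") & 1   # parity of the tapped bits
        state = (state >> 1) | (feedback * 2 ** (nbits - 1))
        if state == start:
            break
    return out
```

With these taps the period is 15 = 2^4 - 1, and after mapping bits to +1/-1 the circular autocorrelation is 15 at zero shift and -1 at every other shift, which is the periodic impulse mentioned above.
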
Used in communication receivers and decoders
Computationally efficient implementation of ML receivers
Finite state machines

Calculation of path metrics and rapid comparisons of path metrics are required.
It is useful to perform multiple additions simultaneously.
Most DSPs allow addition of multiple quantities at the same time and access to multiple numbers simultaneously.

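Assuming the path-metric slides concern trellis decoding of the Viterbi type, the core add-compare-select step can be sketched in Python. The function name and the data layout (for each state, a list of `(previous_state, branch_metric)` pairs) are assumptions for this illustration, and smaller metrics are taken to be better.

```python
def add_compare_select(path_metrics, transitions):
    """One trellis step: update path metrics and record surviving predecessors.

    transitions[s] lists the (previous_state, branch_metric) pairs entering state s.
    """
    new_metrics, survivors = [], []
    for incoming in transitions:
        # add: candidate metric for every incoming branch
        candidates = [(path_metrics[prev] + bm, prev) for prev, bm in incoming]
        best, prev = min(candidates)     # compare and select the survivor
        new_metrics.append(best)
        survivors.append(prev)
    return new_metrics, survivors
```

The additions for different states are independent, which is why hardware that performs several additions and comparisons at once speeds up this step.
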
DSP requires very specialized functions to be implemented, with specific indexing and other requirements.
DSP processors are designed to take care of these.
Good programming requires a good understanding of the DSP architecture.

Fixed point arithmetic
Q format
Operations using Q format numbers
Floating point numbers
Finite word-length effects in filters (intro)

Many practical DSPs and FPGAs use fixed-point arithmetic.
Fixed-point processors perform operations strictly on integers.
This implies that there is a loss of precision when representing real numbers.
The C5510 processor uses 16 bits to represent numbers.
The C64x can deal with 32-bit integers too.

Precision: number of bits used to represent a digital value.
Resolution: smallest non-zero value that can be represented.
Quantization error: difference between the actual analog signal value and its digitized value.

Word-length effects: errors due to the fixed-point representation of inputs and coefficients, and the effect of mathematical operations on these.
Overflow: occurs when a computation results in a number larger (smaller) than the maximum (minimum) represented number.

Saturation mode: an operating mode that sets the result of an operation to the maximum (minimum) represented value in case of overflow (underflow).
Dynamic range: ratio of the absolute values of the maximum and minimum represented numbers, in dB: $20 \log_{10}(\mathrm{Max}/\mathrm{Min})$.
Range: difference between the most positive number and the most negative number represented.

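Saturation as defined above amounts to clamping a result to the representable range. A minimal Python sketch, where `saturate` is a hypothetical helper and a signed two's complement word is assumed:

```python
def saturate(value, bits=16):
    """Clamp value to the range of a signed two's complement word."""
    hi = 2 ** (bits - 1) - 1      # most positive representable value
    lo = -(2 ** (bits - 1))       # most negative representable value
    return max(lo, min(hi, value))
```

Without saturation, an overflowing result wraps around and can change sign; with it, the error is bounded by the distance to the nearest representable extreme.
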
Truncation: the represented point obtained after discarding the fractional part is used.
Roundoff: the nearest represented point is used.
Q format: a system in which the programmer assigns a decimal (binary) point in fixed-point numbers and operations.

All representations are similar for positive numbers.
Note that the 2's complement representation has only one zero and has a representation for the smallest negative number.
All DSP processors use the 2's complement representation.

Example: compute the product of -3 and 6 in fixed point on 4 bits.
      1101   = -3
    x 0110   =  6
  ----------
  00000000
  11111010
  11110100
  00000000
  ----------
  11101110   = -18
Notice the sign extension.
If we go back to the 4-bit representation, the result will be the last 4 bits, 1110, which is -2 (incorrect!).
A product that fits in 4 bits is not affected: 11111010 = -6 reduces to 1010 = -6.

A useful property: for any sequence of operations in which the final result can be represented in the given range, the final result is calculated correctly even when there is overflow in intermediate results.
Example: consider performing 7 + 6 - 8 = 5 using 4-bit 2's complement numbers (range [-8, 7]).
    0111      7
  + 0110    + 6
  ------
    1101     -3 (overflow)
  + 1000    - 8
  ------
    0101      5 (correct)

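The wrap-around behaviour of 4-bit two's complement arithmetic discussed above can be checked with a few lines of Python. The helper `wrap` is hypothetical; it reduces an integer to the signed value that the lowest `bits` bits represent.

```python
def wrap(value, bits=4):
    """Signed two's complement value of the lowest `bits` bits of an integer."""
    modulus = 2 ** bits
    value %= modulus                       # keep only the low bits
    return value - modulus if value >= modulus // 2 else value

# 7 + 6 overflows to -3, but subtracting 8 brings the result back in range.
intermediate = wrap(7 + 6)
final = wrap(intermediate - 8)
```

Because every addition is performed modulo 2^4, the intermediate overflow cancels as long as the true final answer lies in [-8, 7]. A product such as -3 x 6 = -18 does not lie in that range, so its low 4 bits give -2.
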
The C5510 fixed-point processor does arithmetic with 16-bit 2's complement integers.
When these numbers are multiplied, a 32-bit number results.
We would want to store only 16 bits of the answer.
Retaining the 16 most significant bits may not make sense.

Example (drawn from Kuo) using 4 bits:
7 (0111) x 6 (0110) = 0101010 (42). The MSBs are 0101 = 5.
We therefore have to store the entire answer.
When the numbers involved are fractional, there is no such issue.

Suppose we multiply 7/8 by 6/8 instead.
We represent 7/8 by 0111 = $2^{-1} + 2^{-2} + 2^{-3}$ = 0.5 + 0.25 + 0.125.
We represent 6/8 by 0110 = $2^{-1} + 2^{-2}$.
The product is $2^{-2} + 2 \cdot 2^{-3} + 2 \cdot 2^{-4} + 2^{-5} = 2^{-1} + 2^{-3} + 2^{-5}$,
which can be represented as 0101010 and approximately as 0101.
Note, however, that we have to use a consistent Q format for the 8-bit product too.

When a decimal point is introduced, the ignored bits can become insignificant.
This is achieved by sacrificing dynamic range.
Greater precision is, however, achieved.

Consider a Qm.n format with m integer bits and n fractional bits (N = m + n + 1 is the number of bits in the fixed-point representation).
$b_{N-1} b_{N-2} \ldots b_0$ in Qm.n equals
$(-b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_0)\, 2^{-n}$

The same number, say 2000H, represents 0.25 in Q0.15, 0.5 in Q1.14 and 1 in Q2.13.
The processor only operates on integers; the programmer has to keep track of the decimal point in fixed-point processors.
Qm.n is written as Qn when N = m + n + 1 is obvious (Q15 means Q0.15 for a 16-bit representation).

To store 0.59 in a computer, we multiply it by $2^{15} = 32768$ to get 19333.12.
This is rounded to 19333 = 4B85H, which is stored.
To convert the answer to fractional form, simply multiply by $2^{-15}$.
Notice the loss in precision involved.

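The conversion steps for 0.59 can be reproduced in Python. The helper names `to_q15` and `from_q15` are assumptions for this sketch; saturation at the ends of the 16-bit range is added here as a precaution and is not part of the slide.

```python
def to_q15(value):
    """Convert a real number to a Q15 integer (rounded, saturated to 16 bits)."""
    scaled = round(value * 2 ** 15)
    return max(-32768, min(32767, scaled))

def from_q15(word):
    """Interpret a 16-bit integer as a Q15 fraction."""
    return word / 2 ** 15
```

`to_q15(0.59)` gives 19333 (4B85H), and converting back gives about 0.589996, so the loss of precision is below half a least significant bit.
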
We compute $X \cdot 2^{15} = -294.8$ (approximately $-295$), which ...

c) Multiply -0.5 by 0.75 using the fractional representation on 4 bits.
An extra sign bit has to be eliminated when multiplying. Pay attention to the sign extension.
      1.100   = -0.5
    x 0.110   =  0.75
  -----------
  00000000
  11111000
  11110000
  00000000
  -----------
  11.101000   = -0.375
The leading bit is a supplementary sign bit.
Going back to the 4-bit representation: 1|1.101|000 gives 1.101 = -0.375, which is correct. In this example there is no truncation.

d) Multiply -0.5 by 0.625.
The extra sign bit has to be ignored. When multiplying -0.5 and 0.625, truncation errors occur.
1.100 x 0.101 = 11.101100 = -0.3125
With four bits, you get 1.101 = -0.375, which has a truncation error.

When you multiply numbers in Q0.15 format, the answer has a scale factor of $2^{-30}$ and occupies 32 bits in the accumulator.
Since the 32-bit fractional form is Q0.31, we need to multiply the answer by 2 to retain the MSBs (a left shift is all that is required).
One similarly has to take care when multiplying numbers in different Q formats.
Notice that there is no such problem in additions.

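The left shift after a Q15 multiplication can be sketched in Python. The function `q15_mul` is a hypothetical helper that truncates rather than rounds, and it does not handle the single overflow case (-1 times -1), which a real implementation would saturate.

```python
def q15_mul(a, b):
    """Multiply two Q15 integers and return the Q15 result (truncated)."""
    product = a * b      # Q30 value: two sign bits in a 32-bit result
    product *= 2         # left shift by one: now Q31, one sign bit
    return product >> 16 # keep the 16 most significant bits
```

For example, 0.5 (16384) times 0.5 gives 8192, which is 0.25 in Q15, and -0.5 times 0.75 (24576) gives -12288, which is -0.375.
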
Consider again the multiplication of 7/8 by 6/8 (4-bit representation).
7/8 and 6/8 are represented as 0111 and 0110.
The product is $2^{-2} + 2 \cdot 2^{-3} + 2 \cdot 2^{-4} + 2^{-5} = 2^{-1} + 2^{-3} + 2^{-5}$.
But the product reads as 00101010 x $2^{-6}$.
For the MSBs to be picked, it has to be left shifted to give 01010100 x $2^{-7}$, which is a true fractional format with 8 bits.

Suppose we want to multiply -7/8 by 6/8 instead.
Represent -7/8 as 1001 and 6/8 as 0110 in Q0.3 format.
$(-2^0 + 2^{-3})(2^{-1} + 2^{-2}) = -2^{-1} - 2^{-2} + 2^{-4} + 2^{-5} = -2^0 + 2^{-2} + 2^{-4} + 2^{-5}$
The 8-bit product of the integers -7 and 6 is -42, which reads as 11010110 x $2^{-6}$.
It has to be left shifted by one to give 10101100 x $2^{-7}$.

Q0.15 has the smallest dynamic range and the highest precision.

Frequently, the IEEE 754 floating-point standard is used (it ensures portability of code).
It is defined for 32-bit (single precision), at least 43-bit (extended single precision), 64-bit (double precision) and at least 79-bit (extended double precision) representations.
The floating-point number has three fields: the sign (s) bit, the exponent (e) bits and the mantissa (m) bits.

number = $(-1)^s \cdot 2^{e-127} \cdot 1.m$ for single precision
number = $(-1)^s \cdot 2^{e-1023} \cdot 1.m$ for double precision
Note that the exponent is biased by 127 (single) or 1023 (double) to deal with positive and negative exponents.

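The three fields of a single-precision number can be extracted with Python's standard `struct` module. The helper names are assumptions for this sketch, and the rebuild formula is the one given above, valid for normalized numbers only.

```python
import struct

def decode_single(value):
    """Return the (sign, biased exponent, mantissa) fields of a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    return s, e, m

def rebuild_single(s, e, m):
    """Evaluate (-1)^s * 2^(e-127) * 1.m for a normalized number."""
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)
```

For -12.0 = -1.5 x 2^3 the fields are s = 1, e = 130 and m = 400000H (fraction 0.5).
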
Single and double precision formats (Kuo)
The dynamic range of numbers is large for reasonable precision (Kuo).

The exponent of the smaller number has to be matched to that of the larger number.
E.g. (from Kuo):
If $x = (-1)^0 \cdot 2^{128-127} \cdot 1.22 = 2.44$ and
$y = (-1)^1 \cdot 2^{130-127} \cdot 1.52 = -12.16$, then
$y + x = [(-1)^1 \cdot 1.52 + (-1)^0 \cdot 1.22 \cdot 2^{-(130-128)}] \cdot 2^{130-127} = -1.215 \cdot 2^3 = -9.72$

The mantissas are multiplied, and the exponents are added.
$x = (-1)^{s_x} \cdot 2^{e_x-127} \cdot 1.m_x$ and
$y = (-1)^{s_y} \cdot 2^{e_y-127} \cdot 1.m_y$
The product is
$((-1)^{s_x}\, 1.m_x)((-1)^{s_y}\, 1.m_y) \cdot 2^{e_x+e_y-254}$

The C67x uses 32 bits: 24 for the mantissa (including sign) and 8 for the exponent.
The processor can also perform calculations using double precision (64 bits) but requires more cycles per operation.
The processor keeps only 48 bits of the intermediate products and avoids overflow bits or guard bits in the accumulator by use of the exponent.

Floating point requires more circuitry (not an important factor today with high transistor densities).
Floating point required a larger number of pins on the IC and hence greater cost (not an important factor in these days of SoCs).
In the past, code for floating point was easier to develop in C (and assembly was used for fixed point); this is no longer the case.

With fixed point, there is a need to re-test the performance of algorithms in fixed-point representations (finite word-length effects).
Floating point is therefore quicker to market (lower cost of development too).
Floating point was used for low-volume applications that were not cost sensitive, and fixed point in high-volume, low-cost applications.
These cost factors are becoming increasingly irrelevant.

Power consumption is increasingly the new deciding factor.
Other factors include the application: audio requires cascaded IIR filtering, and floating point is preferable (the ear is sensitive to loss of precision). Audio signals have a high dynamic range.
In video and communications, where transforms are used, fixed point is preferred.

Floating-point processors are still preferred in applications where high dynamic ranges are involved (matrix inversions, for example).
Floating point is preferred in feedback systems and when coefficients have a large dynamic range.
There is no need to check for overflow in floating-point processors.
Floating point is preferred in medical and military applications.

Quantization error due to A/D conversion
Quantization of the coefficients
Rounding due to lower precision of intermediate results
Overflow/underflow due to improper scaling

Assume V is the peak-to-peak value of the signal and N bits ($2^N$ levels) are used.
The step size is $\Delta = V/2^N$, and the variance of the quantization noise is $\sigma^2 = \Delta^2/12$, assuming a uniform distribution.
For a sine-wave input, $P = V^2/8$ and SQNR = 6.02N + 1.76 dB.

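The SQNR figure follows directly from the quantities defined above. A short derivation, assuming a full-scale sine wave of peak-to-peak value V and an N-bit uniform quantizer:

```latex
\begin{align*}
\mathrm{SQNR} &= \frac{P}{\sigma^2}
  = \frac{V^2/8}{\Delta^2/12}
  = \frac{V^2}{8}\cdot\frac{12 \cdot 2^{2N}}{V^2}
  = \frac{3}{2}\, 2^{2N} \\
10\log_{10}(\mathrm{SQNR}) &= 10\log_{10}\tfrac{3}{2} + 20N\log_{10} 2
  \approx 1.76 + 6.02N \ \text{dB}
\end{align*}
```

Each additional bit therefore improves the SQNR by about 6 dB.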
Coefficient quantization causes the frequency response to differ from the design (the locations of poles and zeros change).
Because of this, equivalent structures result in different frequency responses.
For this reason, a cascade of quadratic (second-order) sections is used instead of the direct form for IIR filters.

When inputs and coefficients are multiplied for filtering with finite-precision arithmetic and rounded, the error is modeled as additive noise.
Computer simulation and simplified analysis are used for studying which structures are optimal for implementation.

By assuming that the noise added is white and uniform, and uncorrelated with the signals, simplified analysis can be used to choose filter structures.

Extensive care is taken to ensure that there is no saturation due to repeated MACs (by scaling inputs/coefficients).
The C5510 allows one to saturate the accumulator at the largest/smallest value to control the error.

Scaling to take care of the worst-case scenario causes serious SNR loss.
$|y[n]| \le \sum_m |h(m)|\,|x[n-m]| \le \sum_m |h(m)|$ (for $|x[n]| \le 1$)
All that we need to do is to scale so that
$\sum_m |h(m)| \le 1$
This does not utilize the full dynamic range of the adder output registers, and results in poor SNR.

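Worst-case scaling can be sketched in Python by dividing the coefficients by the sum of their absolute values. The helper `l1_scale` is hypothetical; it leaves the coefficients unchanged when the bound already holds.

```python
def l1_scale(h):
    """Scale coefficients so that the sum of |h[m]| does not exceed 1."""
    total = sum(abs(c) for c in h)
    if total == 0 or 1 >= total:
        return list(h)                 # bound already satisfied
    return [c / total for c in h]
```

With inputs bounded by 1 in magnitude, the scaled filter can never overflow, but typical signals then use only a fraction of the available range, which is the SNR loss noted above.
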
Another approach is to normalize the input so that the energy of the impulse response does not exceed 1:
$\|h\|_2 \le 1$
This can be established using the Cauchy-Schwarz inequality.
Processors should have the ability to scale numbers while processing.

It is important to interpret numbers in Q formats when using fixed-point processors.
Finite word-length effects need to be well understood.
The inputs/coefficients need to be scaled to ensure there is no overflow.
Scaling is also required to maintain Q formats.

Feature(s) | Benefit(s)
A 32 x 16-bit instruction buffer queue | Buffers variable-length instructions and implements efficient block repeat operations
Two 17-bit x 17-bit MAC units | Execute dual MAC operations in a single cycle
One 40-bit ALU | Performs high-precision arithmetic and logical operations
One 40-bit barrel shifter | Can shift a 40-bit result up to 31 bits to the left, or 32 bits to the right
One 16-bit ALU | Performs simpler arithmetic in parallel to the main ALU
Four 40-bit accumulators | Hold results of computations and reduce the required memory traffic
Twelve independent buses (three data read, two data write, five data address, one program read, one program address) | Provide the instructions to be processed as well as the operands for the various computational units in parallel, to take advantage of the C55x parallelism
User-configurable IDLE domains | Improve flexibility of low-activity power management

Fetches program code, 32 bits at a time, and decodes it
The queue holds 64 bytes at a time
Local block repeats are executed within the queue
Speculative fetches during conditional calls/branches
Identifies instruction boundaries (8, 16, 24, 32, 40 and 48 bits)
Determines whether instructions are parallel or non-parallel
Sends data to the D and A units

Generates all program memory addresses
Controls the sequence of instructions
Generates 24-bit addresses
Tests whether a condition is true
Initiates interrupt servicing
Controls repetition of a single instruction or a block of instructions
Registers: PC; RETA (return address register); BRC0-BRC1 (block repeat counters); RSA0-RSA1 (block repeat start address registers); REA0-REA1 (block repeat end address registers); BRS1 (BRC1 save register)
RPTC is the single repeat counter; CSR is the computed single repeat register
Four status registers, ST0_55 to ST3_55, are memory mapped (bits can be toggled by specific instructions too)
Interrupt registers are also present

All data memory addresses are generated
All pointer manipulations are handled
8 address registers (AR) and one coefficient data pointer (CDP)
ARs/CDP can be incremented or decremented by one or by a specified amount
The A-unit ALU operates in parallel with the D unit
Handles circular addressing

XAR0-XAR7 are 23-bit extended address registers (operated in 64K pages)
XCDP consists of CDPH and CDP, and provides common data to the two MACs
SPH and SP are the high and low bits of the stack pointer
DPH is the data page high register, which allows the instruction to specify only the lower 16 bits; DP is 16 bits
BK03, BK47 and BKC are circular buffer size registers
BSA01, BSA23, BSA45, BSA67 and BSAC are circular buffer start address registers
T0-T3 are temporary registers that are used to specify operands in many instructions and are also useful for temporary storage, besides being used for index manipulations

The ARnLC/CDPLC bit in ST2_55 has to be set for circular addressing

Two MACs and an ALU plus shifter
Shifts 40-bit accumulator values up to 31 bits to the left or 32 bits to the right
Performs bit counting
Performs additions, subtractions, comparisons, rounding, saturation and Boolean logic
Tests, clears and sets bits in registers

AC0-AC3 are four accumulators of 40 bits (16 low + 16 high + 8 guard bits)
Can load memory locations with a shift
Can load two adjacent locations into an accumulator (32 bits in one cycle)
Can add/subtract the two halves of an accumulator with the two halves of another separately

ACOVx are the accumulator overflow flags
Use BSET/BCLR to set/clear flags
TC1-TC2 are test/control bits used in a variety of bit test and conditional instructions
C is the carry bit (it is also modified by logical shifts)

BRAF: block repeat active flag
C16, if set, makes the ALU perform two 16-bit operations in parallel
C54CM, if set, selects the compatibility mode of the C54x processor
FRCT, if set, causes a left shift of all multiply results
INTM: interrupt mode bit; if set, all maskable interrupts are disabled

SXMD, if 1, turns on sign-extension mode
SATD is the saturation mode bit
M40, if 1, ensures that the sign is extracted from bit 39, the carry is determined with respect to it, and the saturation value is with respect to 40 bits
XF is an external flag

CACLR, if set, clears the instruction cache
CAEN, if cleared, disables the cache
HINT is the host interrupt bit
When CLKOFF is set, the external CLKOUT pin is disabled
SATA is the saturation mode bit for the A unit

Many different addressing modes are possible
Geared towards making the code compact and faster
Absolute addressing modes are rarely used

Absolute addressing can make the instruction slow and bulky

Allows use of both MACs
CDP is incremented once, not twice

Broadly, the pipeline has a fetch phase and an execute phase

If an instruction is required to write to a location that a previous instruction has not yet read, NOPs are inserted.
If an instruction is supposed to read data that the previous instruction has not yet written, some extra cycles are added so that the write occurs first.
The programmer can therefore program with only some elementary rules in mind.

The processor has the features that can be expected in a typical DSP processor.
It has features that save power.
It is designed to speed up DSP operations.
Modern devices have multiple C5510 cores together with an ARM processor.