Kurt Keutzer
Processor Applications
Workstations, PCs
Increasing
Cost
Single program
DSP support
Increasing
volume
Microcontrollers
Processor Markets
$30B
32-bit
micro
$1.2B/4%
$5.2B/17%
32 bit DSP
DSP
$10B/33%
16-bit
micro
$5.7B/19%
8-bit
micro
$9.3B/31%
Performance
Microprocessors
Performance is
everything
& Software rules
Microcontrollers
Cost is everything
Cost
Mixed/
Signal
Analog
DSP
DSP Applications
Audio applications: MPEG Audio, Portable audio, Digital cameras
Networking: Cable modems, ADSL, VDSL
Wireless: Cellular telephones, Base station
Cable modem
gateways
Increasing
Cost
High-end
Mid-end
Low end
Storage products - TMS320C27
Portable phones
Wireless headsets
Consumer audio
Increasing
volume
Will provide a ubiquitous infrastructure for wireless data as well as voice
[Figure: Digital versus Analog plotted by Year; vertical axis from 0 to 600]
CELLULAR TELEPHONE SYSTEM
[Figure: block diagram with keypad and display, CONTROLLER, PHYSICAL LAYER PROCESSING, SPEECH ENCODE, SPEECH DECODE, A/D, DAC, BASEBAND CONVERTER, RF MODEM]
HW/SW/IC PARTITIONING
[Figure: the cellular telephone block diagram (keypad and display, CONTROLLER, PHYSICAL LAYER PROCESSING, SPEECH ENCODE, SPEECH DECODE, A/D, DAC, BASEBAND CONVERTER, RF MODEM) partitioned among MICROCONTROLLER, ASIC, DSP, and ANALOG IC]
[Figure: single-chip layout with two RAM blocks, DMA, ASIC LOGIC, and a DSP CORE. Functions shown: speech quality enhancement, voice recognition, phone book, keypad intfc, control protocol, de-intl & decoder, RPE-LTP speech decoder, demodulator and synchronizer, Viterbi equalizer]
C540
ARM7
[Figure: chip block diagram with an Embedded Processor Interface, Low Power Bus, FB, Data Flow, two Fifo blocks, Video Decomp, Pen, SRAM, Graphics, Audio, and Video]
Video I/O
Downlink Radio
Voice I/O
Pen In
Video Unit
Coms
Memory
custom
DSP
Optimized for a single program - code often in on-chip ROM or off-chip EPROM
Minimum code size (one of the motivations initially for Java)
Performance obtained by optimizing datapath
Low cost
Customizable core
Nintendo processor
Cellular phones
???
Nintendo processor
Cellular phones
Code size
BENCHMARKS - DSPstone
ZIVOJNOVIC, VELARDE, SCHLAGER: UNIVERSITY OF AACHEN
APPLICATION BENCHMARKS
REAL_UPDATE
COMPLEX_UPDATES
DOT_PRODUCT
MATRIX_1X3
CONVOLUTION
FIR
FIR2DIM
IIR_ONE_BIQUAD
LMS
FFT_INPUT_SCALED
Different history and different applications led to different terms, different metrics, and some new inventions
Convergence of markets will lead to architectural showdown
set of applications
End-user programmable
constraints, additional
performance may not be
useful/valuable
Differentiating features:
Differentiating features
power
cost
speed (must be predictable)
speed
DSPs are judged by whether they can keep the multipliers busy 100% of the time.
FFT, and
convolvers
People still write in assembly language for a product to minimize the die area for ROM in the DSP chip.
TMS320C80
TMS320C6000
TI TMS320C4X
MOTOROLA 96000
AT&T DSP32C
TI TMS320C2X
MOTOROLA 56000
AT&T DSP16
Speed
Code density
Low power
Architectural and micro-architectural features that are artifacts of the era in which they were designed
Fixed-point arithmetic
MAC- Multiply-accumulate
Harvard Architecture
Bit-reversed addressing
Circular buffers
Zero-overhead loops
-1 ≤ x < 1
[Figure: fixed-point word layouts showing the sign bit S and the position of the radix point]
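The fractional fixed-point range -1 ≤ x < 1 can be made concrete with a small sketch. It assumes a 16-bit word with 15 fractional bits (often called Q15); the slides do not name a word length, and the helper names `to_q15`, `from_q15`, and `q15_mul` are hypothetical.

```python
Q = 15  # assumed number of fractional bits in a 16-bit word (Q15)

def to_q15(x: float) -> int:
    """Quantize x to a signed 16-bit Q15 integer, clamping to [-1, 1)."""
    n = int(round(x * (1 << Q)))
    return max(-(1 << Q), min((1 << Q) - 1, n))

def from_q15(n: int) -> float:
    """Interpret a Q15 integer as a fraction."""
    return n / (1 << Q)

def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: the raw product has 30 fractional bits,
    so shift right by 15 to return to Q15."""
    return (a * b) >> Q

half = to_q15(0.5)
quarter = q15_mul(half, half)
```

Here +1.0 is not representable and clamps to the largest positive code, which is why the range is closed at -1 and open at 1.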
SW Libraries
Modulo Arithmetic???
Motorola DSP:
24b x 24b => 48b product, 56b Accumulator
Multiplier
Shift
ALU
Accumulator G
ALU
Accumulator
Data Path
DSP Processor
General-Purpose Processor
1 cycle.
Shifters
Guard bits
Saturation
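Guard bits and saturation can be illustrated with the Motorola widths quoted earlier (24b x 24b => 48b product, 56b accumulator). The sketch below is a behavioral model under that assumption, not a description of any specific chip's datapath; `saturate` and `mac` are hypothetical helper names.

```python
PRODUCT_BITS = 48  # width of a 24b x 24b product
ACC_BITS = 56      # accumulator width, leaving 8 guard bits above the product

def saturate(value: int, bits: int) -> int:
    """Clamp value to the signed two's-complement range of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))

def mac(acc: int, x: int, y: int) -> int:
    """One multiply-accumulate step into the wide accumulator."""
    return saturate(acc + x * y, ACC_BITS)

# Accumulate four worst-case products: (-2**23) * (-2**23) = 2**46 each.
acc = 0
for _ in range(4):
    acc = mac(acc, -(1 << 23), -(1 << 23))

# The sum no longer fits in 48 bits, but the guard bits hold it exactly.
stored = saturate(acc, PRODUCT_BITS)  # what a saturating 48-bit store would keep
```

The guard bits let intermediate sums exceed the product width without wrapping; saturation only applies when the result is finally narrowed.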
FIR Filtering:
A Motivating Problem
M most recent samples in the delay line (Xi)
New sample moves data down delay line
Tap is a multiply-add
Each tap (M+1 taps total) nominally requires:
Multiply
Accumulate
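The delay-line view of FIR filtering above can be sketched in a few lines. This is a plain software model with integer samples; the function name `fir_step` and the list-based delay line are assumptions for illustration, and a real DSP would avoid physically shifting the data.

```python
def fir_step(delay_line: list, coeffs: list, sample: int) -> int:
    """Shift one new sample into the delay line and return one output.

    delay_line holds the newest sample first and has the same length
    as coeffs (one entry per tap).
    """
    delay_line.insert(0, sample)  # new sample moves data down the delay line
    delay_line.pop()              # oldest sample falls off the end
    acc = 0
    for c, x in zip(coeffs, delay_line):
        acc += c * x              # one tap: multiply, then accumulate
    return acc

coeffs = [1, 2, 3]
delay = [0] * len(coeffs)
outputs = [fir_step(delay, coeffs, s) for s in [1, 0, 0, 0]]
```

Feeding a unit impulse returns the coefficients in order, followed by zero, which is the defining property of a finite impulse response.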
[Figure: tapped delay line of Z^-1 elements with tap coefficients C1, C2, ..., CN-1, CN]
$$y(n) = \sum_{m=0}^{N-1} h(m)\, x(n-m)$$
element of finite-impulse response filter computation
X
MPY
ADD/SUB
ACC REG
Xn X
2
n-1
Yn
The critical hardware unit in a DSP is the multiplier; much of the architecture is organized around allowing use of the multiplier on every cycle.
This means providing two operands on every cycle, through multiple data and address buses, multiple address units, and local accumulator feedback.
DSP Memory
FIR Tap implies multiple memory accesses
DSPs want multiple data ports
Some DSPs have ad hoc techniques to reduce memory bandwidth demand
PROGRAM
MEMORY
X MEMORY
Y MEMORY
GLOBAL
P DATA
X DATA
Y DATA
Memory Architecture
DSP Processor
General-Purpose Processor
Harvard architecture
Typically 1 access/cycle
No caches - on-chip SRAM
Program
Memory
Processor
Processor
Memory
Data
Memory
DSP Addressing
Have standard addressing modes: immediate, displacement, register indirect
Want to keep MAC datapath busy
Assumption: any extra instructions imply clock cycles of overhead in inner loop
=> complex addressing is good
=> don't use datapath to calculate fancy address
Autoincrement/Autodecrement register indirect
0 (000) => 0 (000)
1 (001) => 4 (100)
2 (010) => 2 (010)
3 (011) => 6 (110)
4 (100) => 1 (001)
5 (101) => 5 (101)
6 (110) => 3 (011)
7 (111) => 7 (111)
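The index mapping above is a 3-bit bit reversal, the reordering an 8-point FFT needs. A minimal software sketch follows; DSPs with bit-reversed addressing produce these addresses in the address unit rather than with a loop like this, and the function name `bit_reverse` is hypothetical.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return index with its lowest `bits` bits in reverse order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result

order = [bit_reverse(i, 3) for i in range(8)]
```

Applying the mapping twice returns the original index, so the same address mode serves both for scrambling inputs and for unscrambling outputs.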
[Figure: 8-point FFT flow graph. Inputs in bit-reversed order x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7), labeled with their binary indices; stages of four 2-point DFTs and two 4-point DFTs; outputs F(0) to F(7)]
CIRCULAR BUFFERS
Instructions accommodate three elements:
buffer address
buffer size
increment
Allows for cycling through:
delay elements
coefficients in data memory
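The three elements listed above (buffer address, buffer size, increment) are enough to compute the next address with wraparound. The sketch below models that modulo update in software; the function name `next_address` and the example base address are assumptions, and hardware address units typically constrain buffer alignment in ways not modeled here.

```python
def next_address(addr: int, base: int, size: int, increment: int) -> int:
    """Advance addr by increment, wrapping within [base, base + size)."""
    return base + (addr - base + increment) % size

BASE, SIZE = 100, 4  # hypothetical buffer location and length

addr = BASE
visited = []
for _ in range(6):
    visited.append(addr)
    addr = next_address(addr, BASE, SIZE, 1)

# A negative increment walks the buffer backwards and wraps at the start.
wrapped_back = next_address(BASE, BASE, SIZE, -1)
```

Because the wrap happens in the address calculation, the inner loop never needs a compare-and-reset instruction for the delay line or coefficient pointer.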
Addressing
DSP Processor
General-Purpose Processor
Specialized addressing
modes; e.g.:
General-purpose addressing
modes
Autoincrement
Modulo (circular)
DO X ...
Address Generation
PCS = PC + 1
if (PC == X && !condition)
    PC = PCS
else
    PC = PC + 1
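The program-counter rule above can be traced with a small simulation. It assumes the loop condition is a hardware iteration counter reaching its limit, which the slide does not spell out; `run`, `do_at`, `loop_end`, and `count` are hypothetical names.

```python
def run(program_length: int, do_at: int, loop_end: int, count: int) -> list:
    """Return the sequence of PC values fetched.

    do_at    : address of the DO instruction
    loop_end : address X of the last instruction in the loop body
    count    : number of times the body should execute (assumed condition)
    """
    pc, pcs, remaining = 0, None, 0
    trace = []
    while pc < program_length:
        trace.append(pc)
        if pc == do_at:
            pcs = pc + 1        # DO saves the loop start address: PCS = PC + 1
            remaining = count
        if pcs is not None and pc == loop_end and remaining > 1:
            remaining -= 1
            pc = pcs            # loop not finished: jump back with no branch instruction
        else:
            pc += 1             # normal sequential fetch
    return trace

trace = run(program_length=5, do_at=0, loop_end=2, count=3)
```

Every fetched address is either the DO itself or useful work: no cycle is spent on a decrement, compare, or branch, which is the sense in which the loop has zero overhead.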
Instruction Set
DSP Processor
Specialized, complex instructions
Multiple operations per instruction
    mac x0,y0,a  x:(r0)+,x0  y:(r4)+,y0
General-Purpose Processor
General-purpose instructions
Typically only one operation per instruction
    mov *r0,x0
    mov *r1,y0
    mpy x0,y0,a
    add a,b
    mov y0,*r2
    inc r0
    inc r1
Host ports
Parallel ports
Timers
Clock generators
Specialized peripherals
Fixed-point arithmetic
MAC- Multiply-accumulate
Harvard Architecture
Bit-reversed addressing
Circular buffers
Zero-overhead loops