Department of ECE
Bannari Amman Institute of Technology,
Sathyamangalam
Technology
Why DSP - Why Not Analog
ASP (analog signal processing) vs. DSP
Block diagram of communication system
Digital Signal Processing (DSP)
Processing: a series of operations performed according to programmed instructions.
Signal: a parameter (electrical quantity or effect) that can be varied in such a way as to
convey information.
Digital: operating by the use of discrete signals to represent data in the form of
numbers.
Digital Signal Processor (DSP): an electronic system that processes digital signals.
REAL-TIME DSP APPLICATIONS
• Cell phones.
• Fax machines.
• High-resolution printers.
• Digital cameras.
DSP ARCHITECTURES
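VON NEUMANN ARCHITECTURE
Three Buses
a) Data Bus.
b) Address Bus.
c) Control Bus.
• A single memory holds both instructions and data, so they are fetched one after the other over the same buses.
HARVARD ARCHITECTURE
• Instructions and data are stored in separate memories with separate buses, so instructions and data can be transferred simultaneously.
• This enhances performance, because an instruction and its data can be fetched in the same cycle.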
Executes multiple instructions per cycle.
Increased performance.
Easier to program.
DSP CLASSIFICATION
• By arithmetic format
– Fixed-point
– Floating-point
– Block floating-point
• By data width
– Typical fixed-point DSPs: 16-bit
– Typical floating-point DSPs: 32-bit
• By memory organization
• By multiprocessor support
Contd.,
• By speed
– Millions of instructions per second (MIPS)
– A basic operation (e.g. MAC)
– A basic algorithm (e.g. FFT, FIR or IIR filter)
– Benchmark programs
• By power consumption
– Operating voltage
– Sleep or idle mode
– Programmable clock dividers
– Peripheral control
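The fixed-point entries above can be made concrete with the Q15 format, a common convention for 16-bit fixed-point DSPs (an assumption here; the slides do not name a specific fractional format). A 16-bit signed integer represents value/32768, so it covers [-1, 1). The C sketch below converts to and from Q15 and multiplies two Q15 values; the 32-bit intermediate product is the reason a 16-bit DSP pairs its multiplier with a wider product register and accumulator. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Q15: a 16-bit signed integer representing value / 32768, range [-1, 1). */
typedef int16_t q15_t;

/* Convert a double to Q15, rounding half away from zero and saturating. */
static q15_t q15_from_double(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)
        return INT16_MAX;
    if (scaled <= -32768.0)
        return INT16_MIN;
    return (q15_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

static double q15_to_double(q15_t q)
{
    return q / 32768.0;
}

/* Q15 x Q15 gives an exact 32-bit Q30 product; shifting right by 15
   returns to Q15 (arithmetic right shift of negatives assumed, as on all
   mainstream compilers). Only (-1) x (-1) overflows, so it is saturated. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* Q30 */
    int32_t shifted = product >> 15;             /* Q15 */
    if (shifted > INT16_MAX)
        return INT16_MAX;
    return (q15_t)shifted;
}
```

For example, q15_mul(q15_from_double(0.5), q15_from_double(0.5)) gives 8192, which q15_to_double turns back into 0.25.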
• First generation (TI TMS32010)
• Second generation (Motorola DSP56001, AT&T
DSP16A, Analog Dev. ADSP-2100, TI TMS320C50)
• Third generation (Motorola DSP56301, TI
TMS320C541, TI TMS320C80, Motorola
MC68356)
• Fourth generation (TI TMS320C6201, Intel
Pentium MMX)
• 16-bit fixed-point.
• Harvard architecture.
• Accumulator.
• Specialized instruction set.
• 390 ns MAC time (228 ns today).
• 24-bit data and instructions
• 3 memory spaces (X, Y, P)
• Parallel moves
• Single- and multi-instruction hardware loops
• Modulo addressing
• 75 ns MAC (21 ns today)
• Enhanced conventional DSP architectures
• 3.0 or 3.3 volts.
• More on-chip memory.
• Application-specific function units in data path or
as co-processors.
• More sophisticated debugging and application
development tools.
• DSP cores (Pine & Oak from DSP Group, cDSP from TI)
• 20 ns MAC (10 ns today).
• Blazing clock speeds and superscalar
architectures.
• VLIW-like architectures achieve top
performance through high parallelism and increased
clock speeds.
• 3 ns MAC throughput.
• Expensive, power-hungry.
TMS320 Family & Its Applications
NOMENCLATURE
TMS320 DSP Family Overview
• Consists of fixed-point, floating-point, and multiprocessor digital signal processors.
Characteristics
• Very flexible instruction set
• Inherent operational flexibility
• High-speed performance
• Innovative parallel architecture
• Cost-effectiveness
• C-friendly architecture
C5x Architecture
TMS320C5x Overview
• Fixed-point, 16-bit processor operating at 40 MHz
• Consists of the ’C50, ’C51, ’C52, ’C53, ’C53S, ’C56, ’C57, and ’C57S DSPs
• Fabricated using CMOS integrated-circuit technology.
• Single instruction execution time is 50 ns.
• Executes up to 50 million instructions per second (MIPS) on the fastest devices.
• Operational flexibility and speed due to an advanced Harvard architecture
• CPU with application-specific hardware logic, on-chip peripherals, on-chip
memory, and a highly specialized instruction set.
Advantages
• Increased performance and versatility due to enhanced architectural design
• Low power consumption due to advanced integrated-circuit processing technology
• Source code compatibility with ’C1x, ’C2x, and ’C2xx DSPs for fast and easy
performance upgrades
• Enhanced instruction set for faster algorithms and for optimized high-level
language operation
General DSP System Block Diagram
Architecture of TMS320C5X
FOUR sub-blocks
• Bus Structure
• Central Processing Unit (CPU)
• On-chip Memory
• On-chip Peripherals
Bus Structure
Separate program and data buses allow simultaneous access to program
instructions and data, providing a high degree of parallelism.
Central Processing Unit (CPU)
Elements of the CPU
CALU
• Central arithmetic logic unit (CALU)
Consists of:
16 x 16-bit parallel multiplier
32-bit accumulator (ACC)
32-bit accumulator buffer (ACCB)
Product register (PREG)
Additional shifters at the outputs of the ACC and PREG
Used to perform two’s-complement arithmetic.
PLU
• Parallel logic unit (PLU)
A second logic unit that executes logic operations on data without affecting
the contents of the ACC or PREG.
Can set, clear, test, or toggle bits in a status register,
control register, or any data memory location.
Results are written back to the original data memory
location.
Contd.,
• Auxiliary register arithmetic unit (ARAU)
• Memory-mapped registers
• Program controller, which consists of:
Program counter (PC): contains the program memory address used to fetch the next instruction.
Control and status registers: 16-bit registers that contain the control and status bits of the CPU.
DSP Requires Multiply and Accumulate
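A minimal C sketch of why the MAC matters: one output of an N-tap FIR filter, y = h[0]x[0] + h[1]x[1] + ... + h[N-1]x[N-1], is exactly N multiply-accumulate steps. The comments map the variables onto the PREG and ACC described above only as an analogy; the function name and the plain integer (non-fractional) arithmetic are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One FIR output sample: y = sum over k of h[k] * x[k], where x[0] is the
   newest input and x[taps-1] the oldest. Each loop iteration is one MAC:
   a 16 x 16-bit multiply whose 32-bit product is added to a 32-bit sum.
   The caller must keep the sum within 32 bits (hardware would wrap or
   saturate; signed overflow in C is undefined). */
static int32_t fir_mac(const int16_t *h, const int16_t *x, size_t taps)
{
    assert(h != NULL && x != NULL);
    int32_t acc = 0;                                  /* like the ACC  */
    for (size_t k = 0; k < taps; k++) {
        int32_t product = (int32_t)h[k] * x[k];       /* like the PREG */
        acc += product;                               /* accumulate    */
    }
    return acc;
}
```

On a DSP with a single-cycle MAC the loop body costs roughly one cycle per tap, which is what the MAC-time figures in the DSP generation slides measure.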
On-Chip Memory
• The ’C5x has a total address range of 224K words x 16 bits.
Large on-chip memories include:
Contd.,
Data/program single-access RAM (SARAM)
Divided into 1K- and/or 2K-word blocks that are contiguous in program or data memory
space.
Each SARAM block can be accessed only once per machine cycle.
On-Chip Peripherals
On-chip peripherals are:
• Clock generator
• Hardware timer
• Serial port
• User-maskable interrupts
Contd.,
• Clock Generator
Consists of an internal oscillator and a phase-locked loop (PLL) circuit.
Can be driven internally by a crystal resonator circuit or driven externally by a
clock source.
PLL circuit can generate an internal CPU clock by multiplying the clock source
by a specific factor, so you can use a clock source with a lower frequency than
that of the CPU.
• Hardware Timer
A 16-bit hardware timer with a 4-bit prescaler is available.
It clocks at a rate that is between 1/2 and 1/32 of the machine cycle rate
(CLKOUT1), depending upon the timer’s divide-down ratio.
Can be stopped, restarted, reset, or disabled by specific status bits.
Contd.,
• Host Port Interface (HPI)
Available on the ’C57S and ’LC57, the HPI is an 8-bit parallel I/O port that provides
an interface to a host processor.
Information is exchanged between the DSP and the host processor through on-chip
memory that is accessible to both the host processor and the ’C57.
•Serial Port
Three different kinds of serial ports are available:
a general-purpose serial Port
a time-division multiplexed (TDM) serial port
a buffered serial port (BSP).
Each ’C5x contains at least one general-purpose, high-speed, synchronous, full-duplexed
serial port interface that provides direct communication with serial devices such as codecs,
serial analog-to-digital (A/D) converters, and other serial systems.
The serial port can operate at up to one-fourth of the machine cycle rate.
The serial port transmitter and receiver are double-buffered and individually controlled by
maskable external interrupt signals.
Data is framed either as bytes or as words.
Contd.,
• TDM Serial Port
Available on the ’C50, ’C51, and ’C53 devices, the TDM serial port is a full-duplexed serial port
that can be configured by software either for synchronous operation or for time-division
multiplexed operation.
It is commonly used in multiprocessor applications.
•Test/Emulation
On the ’C50, ’LC50, ’C51, ’LC51, ’C53, ’LC53, ’C57S and ’LC57S, an IEEE standard 1149.1
(JTAG) interface with boundary scan capability is used for emulation and test.
It provides the boundary scan to and from the interfacing devices.
It can be used to test pin-to-pin continuity and to perform operational tests on devices that
are peripheral to the ’C5x.
On the ’C52, ’LC52, ’C53S, ’LC53S, ’LC56, and ’LC57, an IEEE standard 1149.1 (JTAG)
interface without boundary scan capability is used for emulation purposes only.
On-board emulation can be performed by means of the IEEE standard 1149.1 serial scan pins
and the emulation-dedicated pins.
Pipelining
Definition
In the operation of the pipeline, the instruction fetch, decode, operand read,
and execute operations are independent, which allows overall instruction
executions to overlap.
(Or)
The process of fetching a new instruction while another instruction is being executed.
Advantages:
Pipeline Structure
The FOUR PHASES of pipeline structure and their functions are as
follows:
Fetch (F)
This phase fetches the instruction words from memory and
updates the program counter (PC).
Decode (D)
This phase decodes the instruction word and performs address
generation and ARAU updates of auxiliary registers.
Read (R)
This phase reads operands from memory, if required.
If the instruction uses indirect addressing mode, it reads the memory location
pointed at by the current auxiliary register (selected by the ARP), as it was
before the update made in the previous decode phase.
Execute (E)
This phase performs the specified operation and, if required,
writes the results of a previous operation to memory.
Prefetch (P), Fetch (F), Decode (D), Access (A), Read (R), Execute (E)
Four Level Pipeline Operation
Contd.,
Example: Pipeline Operation of 1-Word Instruction
ADD *+
SAMM TREG0
MPY *+
SQRA *+, AR2
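The overlap can be visualised with a short C sketch that prints an idealised occupancy table for the four 1-word instructions above. It assumes each instruction spends exactly one cycle in each of the four phases (F, D, R, E) with no wait states or pipeline conflicts, which real code does not always achieve. Under that assumption N instructions finish in 4 + N - 1 cycles instead of 4N. Function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define PHASES 4
static const char phase_name[PHASES] = {'F', 'D', 'R', 'E'};

/* Cycles needed for n instructions on an ideal k-phase pipeline. */
static int pipeline_cycles(int n, int k)
{
    assert(n > 0 && k > 0);
    return k + n - 1;
}

/* Phase occupied by instruction i (0-based) in cycle c (1-based), or '-'
   if it is not in the pipeline then. Instruction i is fetched in cycle i+1. */
static char phase_at(int i, int c)
{
    int p = c - 1 - i;
    return (p >= 0 && p < PHASES) ? phase_name[p] : '-';
}

/* Print one row per instruction and one column per cycle. */
static void print_pipeline(const char *const *instr, int n)
{
    int cycles = pipeline_cycles(n, PHASES);
    for (int i = 0; i < n; i++) {
        printf("%-14s", instr[i]);
        for (int c = 1; c <= cycles; c++)
            printf(" %c", phase_at(i, c));
        printf("\n");
    }
}
```

Calling print_pipeline with the strings "ADD *+", "SAMM TREG0", "MPY *+" and "SQRA *+, AR2" prints a 4 x 7 staircase: the first instruction executes in cycle 4 and the last in cycle 7.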
Addressing Modes of the ’C5x
Direct addressing
Indirect addressing
Immediate addressing
Circular addressing
Direct Addressing
Instruction contains the lower 7 bits of the data memory address (dma).
7-bit dma is concatenated with the 9 bits of the data memory page pointer (DP) in
status register 0 to form the full 16-bit data memory address.
This 16-bit data memory address is placed on an internal direct data memory address
bus (DAB).
DP points to one of 512 possible data memory pages and the 7-bit address in the
instruction points to one of 128 words within that data memory page.
Load the DP bits by using the LDP or the LST #0 instruction.
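The address formation above is a plain bit concatenation, sketched below in C (the function name is hypothetical). With LDP #100H, as used in the example programs later in this unit, dma 0 corresponds to data address 8000h, because 100h shifted left by 7 bits is 8000h.

```c
#include <assert.h>
#include <stdint.h>

/* Direct addressing: the 9-bit DP supplies the upper 9 bits and the
   instruction's 7-bit dma field supplies the lower 7 bits. */
static uint16_t direct_address(uint16_t dp, uint8_t dma)
{
    assert(dp < 512);    /* 9-bit DP: 512 pages         */
    assert(dma < 128);   /* 7-bit dma: 128 words a page */
    return (uint16_t)((dp << 7) | dma);
}
```

For example, direct_address(0x120, 0) is 0x9000, the address written by SACL 0 in the square-wave program after LDP #120H.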
Examples:
Immediate Addressing Mode
Used to handle constant data.
Examples:
Contd.,
The instruction word(s) contain the value of the immediate operand.
The ’C5x has both 1-word (8-bit, 9-bit, and 13-bit constant) short immediate
instructions and 2-word (16-bit constant) long immediate instructions.
Memory-Mapped Register Addressing
Examples:
Contd.,
Indirect Addressing
Example
Contd.,
Eight 16-bit auxiliary registers (AR0–AR7) provide flexible and powerful indirect
addressing. In indirect addressing, any location in the 64K-word data memory
space can be accessed using a 16-bit address contained in an AR. Figure 5–3
shows the hardware for indirect addressing.
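A simplified C model of the indirect-addressing hardware is sketched below: the ARP selects the current AR, the instruction uses that AR's value as the address, and the ARAU then post-modifies the AR. Only the *, *+, *-, *0+ and *0- modifiers are modelled (bit-reversed modes are omitted), and all type and function names are hypothetical. The optional next-ARP argument mirrors operands such as the AR2 in SQRA *+, AR2 from the pipeline example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the eight auxiliary registers and the ARP. */
typedef struct {
    uint16_t ar[8];   /* AR0-AR7 */
    unsigned arp;     /* auxiliary register pointer: index of the current AR */
} aux_regs;

/* Post-modification applied by the ARAU (a subset of the modifiers). */
typedef enum { IND_NONE, IND_INC, IND_DEC, IND_INC_AR0, IND_DEC_AR0 } ind_mode;

static uint16_t ar_modify(uint16_t ar, uint16_t ar0, ind_mode mode)
{
    switch (mode) {
    case IND_INC:     return (uint16_t)(ar + 1u);    /* *+                     */
    case IND_DEC:     return (uint16_t)(ar - 1u);    /* *-                     */
    case IND_INC_AR0: return (uint16_t)(ar + ar0);   /* *0+ : add index in AR0 */
    case IND_DEC_AR0: return (uint16_t)(ar - ar0);   /* *0- : subtract AR0     */
    default:          return ar;                     /* *   : no change        */
    }
}

/* Returns the address used by one indirect access (the current AR before
   modification), post-modifies that AR, then loads ARP with next_arp if
   next_arp >= 0. Addresses wrap within the 64K-word data space. */
static uint16_t indirect_access(aux_regs *r, ind_mode mode, int next_arp)
{
    assert(r->arp < 8);
    uint16_t addr = r->ar[r->arp];
    r->ar[r->arp] = ar_modify(addr, r->ar[0], mode);
    if (next_arp >= 0) {
        assert(next_arp < 8);
        r->arp = (unsigned)next_arp;
    }
    return addr;
}
```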
Circular Addressing
Many algorithms such as convolution, correlation, and finite impulse response (FIR)
filters can use circular buffers in memory to implement a sliding window, which
contains the most recent data to be processed.
’C5x supports two concurrent circular buffers operating via the ARs.
The following five memory-mapped registers control the circular buffer operation:
CBSR1 — Circular buffer 1 start register
CBSR2 — Circular buffer 2 start register
CBER1 — Circular buffer 1 end register
CBER2 — Circular buffer 2 end register
CBCR — Circular buffer control register
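The wrap-around rule can be sketched in C as follows. This is a simplified model under stated assumptions: the buffer spans CBSR to CBER inclusive, the AR is stepped by +1 only, and an AR that already equals CBER is reloaded with CBSR on the next step. CBCR (which enables the buffers and selects their ARs) is not modelled, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one circular buffer (CBCR not modelled). */
typedef struct {
    uint16_t cbsr;   /* circular buffer start address */
    uint16_t cber;   /* circular buffer end address   */
} circ_buf;

/* Step an AR by +1 inside the buffer: an AR equal to CBER wraps to CBSR. */
static uint16_t circ_next(const circ_buf *cb, uint16_t ar)
{
    assert(cb->cbsr <= cb->cber);
    assert(ar >= cb->cbsr && ar <= cb->cber);
    return (ar == cb->cber) ? cb->cbsr : (uint16_t)(ar + 1u);
}

/* Store n samples through the circular buffer. buf models the words
   CBSR..CBER (buf[0] is address CBSR), so only the most recent
   CBER - CBSR + 1 samples survive. Returns the AR after the last store. */
static uint16_t circ_write(const circ_buf *cb, int16_t *buf, uint16_t ar,
                           const int16_t *samples, int n)
{
    for (int i = 0; i < n; i++) {
        buf[ar - cb->cbsr] = samples[i];
        ar = circ_next(cb, ar);
    }
    return ar;
}
```

Writing six samples into a four-word buffer at 8000h-8003h leaves only the last four in memory and the AR at 8002h, which is the sliding window described above.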
Instruction Set
• The ’C5x instruction set supports numerically intensive signal-processing
operations as well as general-purpose applications, such as
multiprocessing and high-speed control.
• The instruction set is a superset of the ’C1x and ’C2x instruction sets and is
source-code upward compatible with both devices.
• Classifications:
Accumulator memory reference instructions
Auxiliary registers and data memory page pointer instructions
Parallel Logic Unit (PLU) instructions
Multiply instructions
Branch and call instructions
I/O and data memory operation instructions
Control instructions
Accumulator memory reference instructions
Examples:
Auxiliary registers and data memory page pointer instructions
Examples:
Contd.,
Parallel Logic Unit (PLU) instructions
Examples:
PREG & Multiply instructions
Examples:
BRANCH INSTRUCTIONS
Examples:
CONTROL INSTRUCTIONS
SIMPLE PROGRAMMING EXAMPLE
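64-Bit Addition and Subtraction
ADDITION
.MMREGS
.TEXT
START LDP #100H ; X in 0000-0003, Y in 0004-0007 (LS word first)
LACC 0001,10H ; ACC = X1 << 16
ADDS 0000 ; ACC = X1:X0 (no sign extension)
ADDS 0004 ; ACC += Y0
ADD 0005,10H ; ACC += Y1 << 16, carry left in C
SACL 0008 ; low words of the result
SACH 0009
LACC 0003,10H ; ACC = X3 << 16
ADDC 0002 ; ACC += X2 + carry
ADDS 0006 ; ACC += Y2
ADD 0007,10H ; ACC += Y3 << 16
SACL 0010 ; high words of the result
SACH 0011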
H: B H
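The program can be cross-checked on a host PC against the C reference model below (an illustration, not TI code; names are hypothetical). It assumes the layout used by the program: X in words 0000-0003 and Y in 0004-0007, least-significant word first. The result is the same whether the carry is propagated per 16-bit word, as here, or between the two 32-bit halves, as in the program.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two 64-bit numbers held as four 16-bit words each (least
   significant word first), propagating the carry from word to word.
   The result words go to sum[]; the packed 64-bit value is returned.
   A carry out of the top word is discarded, as in the program. */
static uint64_t add64_words(const uint16_t x[4], const uint16_t y[4], uint16_t sum[4])
{
    uint32_t carry = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t s = (uint32_t)x[i] + y[i] + carry;   /* at most 0x1FFFF */
        sum[i] = (uint16_t)(s & 0xFFFFu);
        carry = s >> 16;                              /* 0 or 1 */
    }
    uint64_t packed = 0;
    for (int i = 3; i >= 0; i--)
        packed = (packed << 16) | sum[i];
    return packed;
}
```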
SUBTRACTION
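.MMREGS
.TEXT
START LDP #100H ; X in 0000-0003, Y in 0004-0007 (LS word first)
LACC 0001,10H ; ACC = X1 << 16 (shift 10H = 16)
ADDS 0000 ; ACC = X1:X0
SUBS 0004 ; ACC -= Y0 (C = 0 on borrow)
SUB 0005,10H ; ACC -= Y1 << 16
SACL 0008 ; low words of the result
SACH 0009
LACL 0002 ; ACC = X2, upper half cleared (no sign extension)
SUBB 0006 ; ACC = ACC - Y2 - borrow (borrow = NOT C)
ADD 0003,10H ; ACC += X3 << 16
SUB 0007,10H ; ACC -= Y3 << 16
SACL 000AH ; high words of the result
SACH 000BH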
H: B H
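A matching C reference model for the subtraction is sketched below, under the same word-layout assumption (least-significant word first). A negative partial difference produces a borrow into the next word, which corresponds to the C bit being cleared and then consumed by SUBB in the program. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Computes x - y for 64-bit numbers held as four 16-bit words each
   (least significant word first), borrowing from word to word. The
   result words go to diff[]; the packed 64-bit value is returned. */
static uint64_t sub64_words(const uint16_t x[4], const uint16_t y[4], uint16_t diff[4])
{
    int32_t borrow = 0;
    for (int i = 0; i < 4; i++) {
        int32_t d = (int32_t)x[i] - (int32_t)y[i] - borrow;  /* -0x10000 .. 0xFFFF */
        borrow = (d < 0) ? 1 : 0;
        diff[i] = (uint16_t)d;                               /* modulo 2^16 */
    }
    uint64_t packed = 0;
    for (int i = 3; i >= 0; i--)
        packed = (packed << 16) | diff[i];
    return packed;
}
```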
MULTIPLICATION AND EXPRESSION EVALUATION
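16-Bit Multiplication:
.MMREGS
.TEXT
LDP #100H ; select data page 100h
LACL #0 ; clear ACC
LT 0000 ; TREG0 = multiplicand
MPY 0001 ; PREG = TREG0 * multiplier
PAC ; ACC = PREG (no shift when PM = 0)
SACL 0002,0 ; low word of the product
SACH 0003,0 ; high word of the product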
H: B H
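The register flow of this program can be mirrored in C as below, assuming the product shift mode PM = 0 so that PAC copies PREG into the ACC unshifted (an assumption; the program does not set PM explicitly). Function names are hypothetical; sacl() and sach() correspond to the two words the program stores at 0002 and 0003.

```c
#include <assert.h>
#include <stdint.h>

/* LT 0000 ; MPY 0001 ; PAC: signed 16 x 16-bit product copied to the ACC
   (PM = 0 assumed). Returns the 32-bit ACC contents. */
static uint32_t mul16(int16_t a, int16_t b)
{
    int32_t preg = (int32_t)a * b;   /* PREG = TREG0 * operand */
    return (uint32_t)preg;           /* PAC: ACC = PREG        */
}

/* SACL / SACH: the low and high 16-bit words of the ACC. */
static uint16_t sacl(uint32_t acc) { return (uint16_t)(acc & 0xFFFFu); }
static uint16_t sach(uint32_t acc) { return (uint16_t)(acc >> 16); }
```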
Y=A*X1+B*X2+C*X3
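.MMREGS
.TEXT
LDP #100H ; A, B, C in 0000-0002; X1, X2, X3 in 0003-0005
LACL #0 ; ACC = 0
LT 0000 ; TREG0 = A
MPY 0003 ; PREG = A*X1
LTA 0001 ; TREG0 = B, ACC += PREG
MPY 0004 ; PREG = B*X2
LTA 0002 ; TREG0 = C, ACC += PREG
MPY 0005 ; PREG = C*X3
APAC ; ACC += PREG
SACL 0006,0 ; Y (low 16 bits of ACC)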
H: B H
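The following C sketch traces TREG0, PREG and the ACC through the same instruction sequence, showing how LTA overlaps loading the next coefficient with accumulating the previous product. It assumes the data layout implied by the program (A, B, C in 0000-0002 and X1, X2, X3 in 0003-0005), PM = 0, and a wrapping (non-saturating) ACC; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register-level model of LT / MPY / LTA / APAC. mem[0..2] = A, B, C and
   mem[3..5] = X1, X2, X3. Returns the low 16 bits of the ACC, i.e. the
   word stored by SACL 0006. */
static uint16_t eval_sum_of_products(const int16_t mem[6])
{
    int32_t treg, preg;
    uint32_t acc = 0;              /* LACL #0                         */

    treg = mem[0];                 /* LT  0000 : T = A                */
    preg = treg * mem[3];          /* MPY 0003 : P = A*X1             */
    treg = mem[1];                 /* LTA 0001 : T = B ...            */
    acc += (uint32_t)preg;         /*          ... and ACC += P       */
    preg = treg * mem[4];          /* MPY 0004 : P = B*X2             */
    treg = mem[2];                 /* LTA 0002 : T = C ...            */
    acc += (uint32_t)preg;         /*          ... and ACC += P       */
    preg = treg * mem[5];          /* MPY 0005 : P = C*X3             */
    acc += (uint32_t)preg;         /* APAC     : ACC += P             */

    return (uint16_t)(acc & 0xFFFFu);   /* SACL 0006                  */
}
```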
GENERATION OF WAVEFORMS
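SQUARE WAVEFORM
.MMREGS
.TEXT
START: LDP #120H ; select data page 120h
LACC #0H ; ACC = 0 (low level)
LOOP: SACL 0 ; store the current level at dma 0
RPT #0FFH ; repeat the next instruction 256 times
OUT 0,04 ; write the level to I/O port 4
CMPL ; complement ACC: 0000h <-> FFFFh
B LOOP ; next half-period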
.END
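The sequence of values sent to port 4 can be modelled in C as below (a sketch; the function name is hypothetical). RPT #0FFH executes the following OUT 0FFh + 1 = 256 times, so the output holds each level for 256 writes before CMPL flips the ACC.

```c
#include <assert.h>
#include <stdint.h>

#define RPT_COUNT 0xFF   /* RPT #0FFH: next instruction runs 0xFF + 1 = 256 times */

/* Value written by the n-th OUT (0-based): the ACC starts at 0 and is
   complemented after every RPT_COUNT + 1 writes. */
static uint16_t square_sample(unsigned long n)
{
    unsigned long half_periods = n / (RPT_COUNT + 1);
    return (half_periods % 2 == 0) ? 0x0000 : 0xFFFF;
}
```

In time, each half-period is 256 OUT instructions plus the SACL, RPT, CMPL and B overhead, so the actual output frequency also depends on the clock rate and any I/O wait states.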