
Digital signal processors

Department of ECE
Bannari Amman Institute of Technology,
Sathyamangalam

Technology

Why DSP - Why Not Analog

Analog signal processing (ASP)          Digital signal processing (DSP)

• Inflexible to changes                 • More flexible
• Sensitive to electrical noise         • Highly immune to noise
• Limited repeatability due to          • Repeatable (reproducible) results
  temperature variations
• Limited accuracy                      • Accuracy controlled by word length
• Storage is difficult and costly       • Storage is easy and cheap
• Difficult to implement                • Easy to implement

Digital Signal Processing
Disadvantages

• Quantization and round-off errors are introduced

• Complex to design ADC and DAC circuits

Block diagram of communication system

Digital Signal Processing (DSP)
Processing: a series of operations performed according to programmed instructions.

Signal: a parameter (electrical quantity or effect) that can be varied in such a way as to
convey information.

Digital: operating by the use of discrete signals to represent data in the form of
numbers.

Digital Signal Processors (DSPs)

• Digital signal processor (DSP): an electronic system that processes digital signals.

Real-Time DSP Applications

• Cell phones.

• Fax machines.

• DVD players and other home audio equipment.

• Computer disk drives.

• High-resolution printers.

• Digital cameras.

DSPs ARCHITECTURES

VON NEUMANN ARCHITECTURE

• Developed by John von Neumann in 1946.

• Three buses:

a) Data bus
b) Address bus
c) Control bus

• Both instructions and data are stored in the same memory.


HARVARD ARCHITECTURE
• Named after the Harvard Mark I computer.

• Instructions and data are stored in separate memories so that both can be transferred simultaneously.

• Performance is enhanced because instructions and data can be fetched at the same time.

• Also has ALUs and input/output units.


MODIFIED HARVARD ARCHITECTURE

• Two or more memory buses.
• Two independent memory banks.
• Not interchangeable.
• Less flexible.

VLIW – Very Long Instruction Word / Multiple ALUs

• Increases the number of instructions processed per second.

• Executes multiple instructions per cycle.

• Transfers each instruction to an appropriate functional unit.

Advantages of VLIW Architecture

• Increased performance.

• Easier to program.

• More execution units can be added, allowing more instructions to be packed into a VLIW instruction.

Disadvantages of VLIW Architecture

• Increase in memory use.

• High power consumption.

DSPs CLASSIFICATION

• By arithmetic format
– Fixed-point
– Floating-point
– Block floating-point
• By data width
– Typical fixed-point DSPs: 16-bit
– Typical floating-point DSPs: 32-bit
• By memory organization
• By multiprocessor support

Contd.,
• By speed
– Millions of instructions per second (MIPS)
– A basic operation (e.g. MAC)
– A basic algorithm (e.g. FFT, FIR or IIR filter)
– Benchmark programs
• By power consumption
– Operating voltage
– Sleep or idle mode
– Programmable clock dividers
– Peripheral control

• First generation (TI TMS32010)
• Second generation (Motorola DSP56001, AT&T DSP16A, Analog Devices ADSP-2100, TI TMS320C50)
• Third generation (Motorola DSP56301, TI TMS320C541, TI TMS320C80, Motorola MC68356)
• Fourth generation (TI TMS320C6201, Intel Pentium MMX)

First generation
• 16-bit fixed-point.
• Harvard architecture.
• Accumulator.
• Specialized instruction set.
• 390 ns MAC time (228 ns today).

Second generation
• 24-bit data and instructions
• 3 memory spaces (X, Y, P)
• Parallel moves
• Single- and multi-instruction hardware loops
• Modulo addressing
• 75 ns MAC (21 ns today)

Third generation
• Enhanced conventional DSP architectures.
• 3.0 or 3.3 volts.
• More on-chip memory.
• Application-specific function units in the data path or as co-processors.
• More sophisticated debugging and application development tools.
• DSP cores (Pine and Oak from DSP Group, cDSP from TI).
• 20 ns MAC (10 ns today).

Fourth generation
• Very high clock speeds and superscalar architectures.
• VLIW-like architectures achieve top performance via high parallelism and increased clock speeds.
• 3 ns MAC throughput.
• Expensive and power-hungry.

TMS320 Family & Its Applications

NOMENCLATURE

TMS320 DSP Family Overview
• Consists of fixed-point, floating-point, and multiprocessor digital signal processors.

• Designed specifically for real-time signal processing.

Characteristics
• Very flexible instruction set
• Inherent operational flexibility
• High-speed performance
• Innovative parallel architecture
• Cost-effectiveness
• C-friendly architecture

C5x Architecture

TMS320C5x Overview
• 16-bit fixed-point processor operating at 40 MHz.
• Consists of the ’C50, ’C51, ’C52, ’C53, ’C53S, ’C56, ’C57, and ’C57S DSPs.
• Fabricated in CMOS integrated-circuit technology.
• Single-cycle instruction execution time is 50 ns.
• Executes up to 50 million instructions per second (MIPS).
• Operational flexibility and speed due to an advanced Harvard architecture.
• CPU with application-specific hardware logic, on-chip peripherals, on-chip memory, and a highly specialized instruction set.
Advantages
• Increased performance and versatility due to enhanced architectural design.
• Low power consumption due to advanced integrated-circuit processing technology.
• Source-code compatibility with ’C1x, ’C2x, and ’C2xx DSPs for fast and easy performance upgrades.
• Enhanced instruction set for faster algorithms and for optimized high-level language operation.

General DSP System Block Diagram

Architecture of TMS320C5X

Four Sub-blocks

• Bus Structure

• Central Processing Unit(CPU)

• On chip Memory

• On chip Peripherals

Bus Structure
• Separate program and data buses allow simultaneous access to program instructions and data, providing a high degree of parallelism.

Four major buses:

• Program bus (PB)
  – Carries the instruction code and immediate operands from program memory to the CPU.
• Program address bus (PAB)
  – Provides addresses to program memory space for both reads and writes.
• Data read bus (DB)
  – Interconnects various elements of the CPU to data memory space.
• Data read address bus (DAB)
  – Provides the address to access the data memory space.

Central Processing Unit (CPU)

Elements of CPU
• Central arithmetic logic unit (CALU)
  – Consists of a 16 × 16-bit parallel multiplier
  – 32-bit accumulator (ACC)
  – 32-bit accumulator buffer (ACCB)
  – Product register (PREG)
  – Additional shifters at the outputs of ACC and PREG
  – Used to perform 2’s-complement arithmetic.

• Parallel logic unit (PLU)
  – A second logic unit
  – Executes logic operations on data without affecting the contents of the accumulator or PREG.
  – Can set, clear, test, or toggle bits in a status register, control register, or any data memory location.
  – Results are written back to the original data memory location.

Contd.,
• Auxiliary register arithmetic unit (ARAU)
  – Register file containing eight 16-bit auxiliary registers (AR0–AR7), connected to the ARAU.
  – 3-bit auxiliary register pointer (ARP).
  – Unsigned 16-bit ALU.
  – The ARs are used for indirect addressing of data memory or for temporary data storage.

• Memory-mapped registers
  – 96 registers are mapped into page 0 of the data memory space.
  – They are part of the data memory space.
  – Used for indirect data address pointers, temporary storage, CPU status and control, or integer arithmetic processing through the ARAU.


Contd.,
• Program controller
  – Decodes the operational instructions, manages the CPU pipeline, stores the status of CPU operations, and decodes the conditional operations.
  – It consists of:
    Program counter: contains the program memory address used to fetch the next instruction.
    Control and status registers: 16-bit registers containing control and status bits for the CPU.
    Hardware stack: used for PUSH and POP operations.
    Address generation unit: holds table information.
    Instruction register: holds the instruction code.

DSP Requires Multiply and Accumulate
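
The multiply-and-accumulate (MAC) step named in this slide title can be illustrated with a short Python sketch of an FIR filter, where every output sample is a sum of coefficient-times-sample products. The function name `fir_mac` and the list-based delay line are assumptions made for illustration; on a real DSP the inner loop would map to one hardware MAC per tap.

```python
def fir_mac(samples, coeffs):
    """Filter `samples` with FIR `coeffs`, one multiply-accumulate per tap."""
    history = [0] * len(coeffs)          # delay line, newest sample first
    output = []
    for x in samples:
        history = [x] + history[:-1]     # shift in the newest sample
        acc = 0                          # accumulator cleared for each output
        for h, s in zip(coeffs, history):
            acc += h * s                 # the MAC step: multiply, then accumulate
        output.append(acc)
    return output


# A 3-tap moving sum applied to a short ramp.
print(fir_mac([1, 2, 3, 4], [1, 1, 1]))
```

An N-tap filter needs N MAC operations per output sample, which is why MAC time is the speed figure quoted for each DSP generation.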

On-Chip Memory
• The ’C5x has a total address range of 224K words × 16 bits.

• The memory space is divided into four individually selectable memory segments:
  – 64K-word program memory space
  – 64K-word local data memory space
  – 64K-word input/output ports
  – 32K-word global data memory space

Large on-chip memories include:

• Data/program dual-access RAM (DARAM)
  – The ’C5x carries a 1056-word × 16-bit on-chip dual-access RAM (DARAM).
  – The DARAM is divided into three individually selectable memory blocks:
      512-word data or program DARAM block B0
      512-word data DARAM block B1
      32-word data DARAM block B2
  – The DARAM can be configured in one of two ways:
      All 1056 words × 16 bits configured as data memory
      544 words × 16 bits configured as data memory and 512 words × 16 bits configured as program memory

Contd.,
• Data/program single-access RAM (SARAM)
  – The ’C5x carries a 16-bit on-chip single-access RAM (SARAM) of various sizes.
  – The SARAM is divided into 1K- and/or 2K-word blocks that are contiguous in program or data memory space.
  – All ’C5x CPUs support parallel accesses to these SARAM blocks; however, one SARAM block can be accessed only once per machine cycle.
  – SARAM can be configured by software in one of three ways:
      All SARAM configured as data memory
      All SARAM configured as program memory
      SARAM configured as both data memory and program memory

On-Chip Peripherals
On-chip peripherals are:

• Clock generator

• Hardware timer

• Software-programmable wait-state generators

• Parallel I/O ports

• Host port interface (HPI)

• Serial port

• Buffered serial port (BSP)

• Time-division multiplexed (TDM) serial port

• User-maskable interrupts

Contd.,
• Clock Generator
  – Consists of an internal oscillator and a phase-locked loop (PLL) circuit.
  – Can be driven internally by a crystal resonator circuit or externally by a clock source.
  – The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific factor, so you can use a clock source with a lower frequency than that of the CPU.

• Hardware Timer
  – A 16-bit hardware timer with a 4-bit prescaler is available.
  – It clocks at a rate between 1/2 and 1/32 of the machine cycle rate (CLKOUT1), depending on the timer’s divide-down ratio.
  – Can be stopped, restarted, reset, or disabled by specific status bits.

Contd.,
• Host Port Interface (HPI)
  – Available on the ’C57S and ’LC57, the HPI is an 8-bit parallel I/O port that provides an interface to a host processor.
  – Information is exchanged between the DSP and the host processor through on-chip memory that is accessible to both the host processor and the ’C57.

• Serial Port
  – Three different kinds of serial ports are available:
      a general-purpose serial port
      a time-division multiplexed (TDM) serial port
      a buffered serial port (BSP)
  – Each ’C5x contains at least one general-purpose, high-speed, synchronous, full-duplex serial port interface that provides direct communication with serial devices such as codecs, serial analog-to-digital (A/D) converters, and other serial systems.
  – Capable of operating at up to one-fourth the machine cycle rate.
  – The serial port transmitter and receiver are double-buffered and individually controlled by maskable external interrupt signals.
  – Data is framed either as bytes or as words.
Contd.,
• TDM Serial Port
  – Available on the ’C50, ’C51, and ’C53 devices, the TDM port is a full-duplex serial port that can be configured by software either for synchronous operations or for time-division multiplexed operations.
  – Commonly used in multiprocessor applications.

• Test/Emulation
  – On the ’C50, ’LC50, ’C51, ’LC51, ’C53, ’LC53, ’C57S and ’LC57S, an IEEE standard 1149.1 (JTAG) interface with boundary scan capability is used for emulation and test.
  – It provides the boundary scan to and from the interfacing devices.
  – It can be used to test pin-to-pin continuity and to perform operational tests on devices that are peripheral to the ’C5x.
  – On the ’C52, ’LC52, ’C53S, ’LC53S, ’LC56, and ’LC57, an IEEE standard 1149.1 (JTAG) interface without boundary scan capability is used for emulation purposes only.
  – On-board emulation can be performed by means of the IEEE standard 1149.1 serial scan pins and the emulation-dedicated pins.

Pipelining
Definition

In pipeline operation, the instruction fetch, decode, operand read, and execute operations are independent, which allows the execution of successive instructions to overlap.
(Or)
The process of fetching a new instruction while another instruction is being executed.

Advantages:

• Reduces the critical path.
• Increases the clock speed or sample speed.
• Reduces power consumption.
• Improves system performance.
• Increases efficiency.

Pipeline Structure
The four phases of the pipeline structure and their functions are as follows:
Fetch (F)
  – Fetches the instruction words from memory and updates the program counter (PC).
Decode (D)
  – Decodes the instruction word and performs address generation and ARAU updates of auxiliary registers.
Read (R)
  – Reads operands from memory, if required.
  – If the instruction uses indirect addressing mode, it reads the memory location pointed at by the ARP before the update of the previous decode phase.
Execute (E)
  – Performs any specified operation and, if required, writes results of a previous operation to memory.

Prefetch (P), Fetch (F), Decode (D), Access (A), Read (R), Execute (E)

• Prefetch: calculate the address of the instruction
• Fetch: collect the instruction
• Decode: interpret the instruction
• Access: collect the address of the operand
• Read: collect the operand
• Execute: perform the operation

Four Level Pipeline Operation

Example: Pipeline Operation of 1-Word Instruction

ADD *+
SAMM TREG0
MPY *+
SQRA *+, AR2
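
How these four one-word instructions overlap in the four-phase pipeline can be tabulated with a small Python sketch. It assumes an ideal pipeline with no stalls or pipeline conflicts, which a real ’C5x does not guarantee for every instruction sequence; the function name `pipeline_schedule` is hypothetical.

```python
STAGES = ["F", "D", "R", "E"]   # Fetch, Decode, Read, Execute


def pipeline_schedule(instructions):
    """Return {cycle: {stage: instruction}} for an ideal, stall-free pipeline."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1            # instruction i enters Fetch in cycle i + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


program = ["ADD *+", "SAMM TREG0", "MPY *+", "SQRA *+, AR2"]
table = pipeline_schedule(program)
for cycle in sorted(table):
    row = "  ".join(f"{st}:{table[cycle].get(st, '-'):<13}" for st in STAGES)
    print(f"cycle {cycle}: {row}")
```

In this model the four instructions finish in 7 cycles instead of the 16 that strictly sequential execution of four phases each would need, and from cycle 4 onward one instruction completes per cycle.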

Addressing Modes of the ’C5x

• Direct addressing

• Indirect addressing

• Immediate addressing

• Memory-mapped register addressing

• Circular addressing

Direct Addressing

• The instruction contains the lower 7 bits of the data memory address (dma).
• The 7-bit dma is concatenated with the 9 bits of the data memory page pointer (DP) in status register 0 to form the full 16-bit data memory address.
• This 16-bit address is placed on the internal direct data memory address bus (DAB).
• The DP points to one of 512 possible data memory pages, and the 7-bit address in the instruction points to one of 128 words within that page.
• Load the DP bits by using the LDP or the LST #0 instruction.

Examples:

• Bits 15 through 8 contain the opcode.
• Bit 7, with a value of 0, selects the direct addressing mode.
• Bits 6 through 0 contain the dma.
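
The address formation described above (9-bit DP concatenated with the 7-bit dma) can be checked with a few lines of Python. This is only a sketch of the bit arithmetic; the function names are hypothetical, and the opcode value `0x20` in the usage line is an arbitrary placeholder, not a real ’C5x opcode.

```python
def direct_address(dp, instruction_word):
    """Form the 16-bit data address from a 9-bit DP and a direct-mode instruction word."""
    if not 0 <= dp < 512:
        raise ValueError("DP is a 9-bit page pointer (0..511)")
    if instruction_word & 0x0080:
        raise ValueError("bit 7 is 1: not a direct-addressing instruction")
    dma = instruction_word & 0x007F      # bits 6..0: offset within the 128-word page
    return (dp << 7) | dma               # DP supplies the upper 9 address bits


def opcode_field(instruction_word):
    """Return bits 15..8 of the instruction word."""
    return (instruction_word >> 8) & 0xFF


# DP = 100H (as loaded by LDP #100H) and dma = 5 address word 8005H.
word = (0x20 << 8) | 0x05                # placeholder opcode, dma = 5
print(hex(direct_address(0x100, word)))
```

With DP = 100H, page 256 starts at 8000H, so the dma values 0000 to 000B used in the programming examples later in the deck refer to addresses 8000H to 800BH.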

Immediate Addressing Mode
• Used to handle constant data.

• The constant can be a 16-bit value or a shorter 8-, 9-, or 13-bit value.

• Depending on the length of the constant, the addressing mode is referred to as long immediate or short immediate.

• A # prefix specifies immediate addressing.

• Examples:

Contd.,
• The instruction word(s) contain the value of the immediate operand.
• The ’C5x has both 1-word (8-bit, 9-bit, and 13-bit constant) short immediate instructions and 2-word (16-bit constant) long immediate instructions.

Short Immediate Addressing

Long Immediate Addressing

Memory-Mapped Register Addressing

• Used to access the CPU and on-chip peripheral registers efficiently.
• Operates like direct addressing, except that the upper 9 bits of the address are assumed to be 0s, so page 0 is always accessed.
• Instructions that use memory-mapped register addressing include:

LAMM: load accumulator with memory-mapped register
LMMR: load memory-mapped register
SAMM: store accumulator in memory-mapped register
SMMR: store memory-mapped register

• Examples:

Indirect Addressing

• Uses an auxiliary register to hold the address of the operand in memory.

Example

Contd.,
Eight 16-bit auxiliary registers (AR0–AR7) provide flexible and powerful indirect
addressing. In indirect addressing, any location in the 64K-word data memory
space can be accessed using a 16-bit address contained in an AR. Figure 5–3
shows the hardware for indirect addressing.
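
A minimal Python model of this mechanism is sketched below: the ARP selects one of the eight ARs, the selected AR supplies the operand address, and the AR is then post-incremented (the `*+` form that appears in the pipeline example). The class and method names are hypothetical, and only the post-increment case is modeled.

```python
class IndirectAddressing:
    """Toy model of AR0-AR7, the ARP, and a sparse 64K-word data memory."""

    def __init__(self):
        self.ar = [0] * 8      # AR0..AR7, 16 bits each
        self.arp = 0           # 3-bit auxiliary register pointer
        self.memory = {}       # address -> word; unset locations read as 0

    def read_post_increment(self, next_arp=None):
        """Read the word addressed by AR[ARP], then add 1 to that AR."""
        address = self.ar[self.arp]
        value = self.memory.get(address, 0)
        self.ar[self.arp] = (address + 1) & 0xFFFF   # 16-bit wrap-around
        if next_arp is not None:
            self.arp = next_arp & 0x7                # optionally select a new AR
        return value


cpu = IndirectAddressing()
cpu.memory.update({0x8000: 10, 0x8001: 20})
cpu.ar[1], cpu.arp = 0x8000, 1
print(cpu.read_post_increment(), cpu.read_post_increment(), hex(cpu.ar[1]))
```

Because the address update happens as a side effect of the access, stepping through an array costs no extra instructions.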

Circular Addressing

• Many algorithms, such as convolution, correlation, and finite impulse response (FIR) filters, can use circular buffers in memory to implement a sliding window that contains the most recent data to be processed.
• The ’C5x supports two concurrent circular buffers operating via the ARs.
• The following five memory-mapped registers control the circular buffer operation:
  – CBSR1: circular buffer 1 start register
  – CBSR2: circular buffer 2 start register
  – CBER1: circular buffer 1 end register
  – CBER2: circular buffer 2 end register
  – CBCR: circular buffer control register
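
The wrap-around idea behind a circular buffer can be sketched in Python as follows. This is a simplified model that assumes a step of one word and that the pointer returns to the start address once it has reached the end address; the parameters `start` and `end` stand in for the contents of a start/end register pair, and the exact hardware update rules are not reproduced here.

```python
def circular_step(ar, start, end):
    """Advance an address register by one word inside the buffer [start, end]."""
    return start if ar == end else (ar + 1) & 0xFFFF


def walk(start, end, count):
    """List the addresses visited by `count` successive accesses."""
    ar, visited = start, []
    for _ in range(count):
        visited.append(ar)
        ar = circular_step(ar, start, end)
    return visited


# A 4-word buffer at 200H..203H: the fifth access wraps back to 200H.
print([hex(a) for a in walk(0x200, 0x203, 6)])
```

For an FIR filter this means the newest sample simply overwrites the oldest one, so no data has to be shifted through memory.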

Instruction Set
• The ’C5x instruction set supports numerically intensive signal-processing operations as well as general-purpose applications, such as multiprocessing and high-speed control.
• The instruction set is a superset of the ’C1x and ’C2x instruction sets and is source-code upward compatible with both devices.
• Classifications:
  – Accumulator memory reference instructions
  – Auxiliary registers and data memory page pointer instructions
  – Parallel logic unit (PLU) instructions
  – Multiply instructions
  – Branch and call instructions
  – I/O and data memory operation instructions
  – Control instructions

Accumulator memory reference instructions

Examples:

Auxiliary registers and data memory page pointer instructions

Examples:

Parallel Logic Unit (PLU) instructions

Examples:

PREG & Multiply instructions

Examples:

Branch instructions

Examples:

Control instructions

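SIMPLE PROGRAMMING EXAMPLE
64-bit Addition and Subtraction
ADDITION
        .MMREGS
        .TEXT
START:  LDP   #100H       ; select data page 100H
        LACC  0001,10H    ; ACC = A word 1, shifted left 16 (10H)
        ADDS  0000        ; add A word 0 with sign extension suppressed
        ADDS  0004        ; add B word 0
        ADD   0005,10H    ; add B word 1, shifted left 16
        SACL  0008        ; store low word of the lower 32-bit sum
        SACH  0009        ; store high word of the lower 32-bit sum
        LACC  0003,10H    ; ACC = A word 3, shifted left 16
        ADDC  0002        ; add A word 2 plus the carry from the lower half
        ADDS  0006        ; add B word 2
        ADD   0007,10H    ; add B word 3, shifted left 16
        SACL  0010        ; store low word of the upper 32-bit sum
        SACH  0011        ; store high word of the upper 32-bit sum
H:      B     H           ; halt: branch to self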
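
The same multi-precision idea can be followed in Python: the 64-bit operands are held as four 16-bit words each (least significant word first, as the dma offsets in the listing suggest), the lower 32 bits are added first, and the resulting carry is fed into the upper 32 bits. The function names are hypothetical, and the sketch models only the arithmetic, not the individual instructions.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to_words(value):
    """Split a 64-bit value into four 16-bit words, least significant first."""
    return [(value >> (16 * i)) & MASK16 for i in range(4)]


def add64(a_words, b_words):
    """Add two 64-bit numbers given as four 16-bit words; return (words, carry_out)."""
    lo = (a_words[0] | (a_words[1] << 16)) + (b_words[0] | (b_words[1] << 16))
    carry = lo >> 32                                 # carry out of the lower half
    hi = (a_words[2] | (a_words[3] << 16)) + (b_words[2] | (b_words[3] << 16)) + carry
    result = ((hi & MASK32) << 32) | (lo & MASK32)
    return to_words(result), hi >> 32


words, carry_out = add64(to_words(0x00000001FFFFFFFF), to_words(1))
print([hex(w) for w in words], carry_out)
```

The carry between the two halves is the reason the assembly listing uses ADDC for the first addition of the upper half.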

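SUBTRACTION
        .MMREGS
        .TEXT
START:  LDP   #100H       ; select data page 100H
        LACC  0001,10H    ; ACC = A word 1, shifted left 16 (10H)
        ADDS  0000        ; add A word 0 with sign extension suppressed
        SUBS  0004        ; subtract B word 0
        SUB   0005,10H    ; subtract B word 1, shifted left 16
        SACL  0008        ; store low word of the lower 32-bit difference
        SACH  0009        ; store high word of the lower 32-bit difference
        LACC  0002,0      ; ACC = A word 2, no shift
        SUBB  0006        ; subtract B word 2 and the borrow from the lower half
        ADD   0003,10H    ; add A word 3, shifted left 16
        SUB   0007,10H    ; subtract B word 3, shifted left 16
        SACL  000A        ; store low word of the upper 32-bit difference
        SACH  000B        ; store high word of the upper 32-bit difference
H:      B     H           ; halt: branch to self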

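MULTIPLICATION AND EXPRESSION EVALUATION
16-bit Multiplication:
        .MMREGS
        .TEXT
        LDP   #100H       ; select data page 100H
        LACL  #0          ; clear the accumulator
        LT    0000        ; TREG = first operand
        MPY   0001        ; PREG = TREG x second operand
        PAC               ; ACC = PREG
        SACL  0002,0      ; store low word of the product
        SACH  0003,0      ; store high word of the product
H:      B     H           ; halt: branch to self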

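Y = A*X1 + B*X2 + C*X3

        .MMREGS
        .TEXT
        LDP   #100H       ; select data page 100H
        LACL  #0          ; clear the accumulator
        LT    0000        ; TREG = first operand
        MPY   0003        ; PREG = first product
        LTA   0001        ; ACC += PREG, TREG = second operand
        MPY   0004        ; PREG = second product
        LTA   0002        ; ACC += PREG, TREG = third operand
        MPY   0005        ; PREG = third product
        APAC              ; ACC += PREG
        SACL  0006,0      ; store low word of Y
H:      B     H           ; halt: branch to self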
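
To see how the product register and accumulator cooperate in this listing, the sequence can be traced in Python with explicit `t`, `p`, and `acc` variables. This is a sketch for non-negative operands only; the function name is hypothetical, and the final mask mirrors the fact that SACL stores just the low 16 bits of the accumulator.

```python
def sum_of_products(coeffs, inputs):
    """Trace the LT / MPY / LTA / APAC pattern for equal-length operand lists."""
    acc = 0                        # LACL #0
    t = coeffs[0]                  # LT: load the first operand into T
    p = t * inputs[0]              # MPY: first product goes to P
    for c, x in zip(coeffs[1:], inputs[1:]):
        acc += p                   # LTA: add the previous product to ACC ...
        t = c                      # ... and load the next operand into T
        p = t * x                  # MPY: next product
    acc += p                       # APAC: add the last product
    return acc & 0xFFFF            # SACL: only the low 16 bits are stored


print(sum_of_products([2, 3, 4], [5, 6, 7]))   # 2*5 + 3*6 + 4*7
```

Each LTA folds the previous product into the accumulator while loading the next operand, so the three-term expression needs only one explicit accumulate instruction at the end.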

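GENERATION OF WAVEFORMS
SQUARE WAVEFORM
        .MMREGS
        .TEXT
START:  LDP   #120H       ; select data page 120H
        LACC  #0H         ; start with the low level
LOOP:   SACL  0           ; save the current level at dma 0
        RPT   #0FFH       ; repeat the next instruction 256 times
        OUT   0,04        ; write the level to port 04
        CMPL              ; complement ACC to switch levels
        B     LOOP
        .END

SAWTOOTH WAVEFORM

        .MMREGS
        .TEXT
START:  LDP   #120H       ; select data page 120H
        LACC  #0H         ; restart the ramp at 0
        SACL  0
        OUT   0,04H       ; write the value to port 04H
LOOP:   LACC  0           ; reload the current ramp value
        OUT   0,04H
        ADD   #05h        ; step the ramp by 5
        SACL  0
        SUB   #0FFFh      ; compare against the peak value 0FFFh
        BCND  LOOP,LEQ    ; keep ramping while value <= 0FFFh
        B     START       ; otherwise restart the ramp
        .END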
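
The sample sequence produced by the sawtooth program can be previewed with a Python generator that follows the same control flow. It is a sketch that models only the values written to the port, not instruction timing; the default `step` and `limit` are taken from the listing (05h and 0FFFh), and the function name is hypothetical.

```python
from itertools import islice


def sawtooth(step=0x05, limit=0x0FFF):
    """Yield the values the sawtooth loop writes to the output port, forever."""
    while True:                        # START
        value = 0
        yield value                    # first OUT, right after SACL 0
        while True:                    # LOOP
            yield value                # OUT 0,04H
            value += step              # ADD #05h, SACL 0
            if value - limit > 0:      # BCND LOOP,LEQ falls through when > 0
                break                  # B START


print(list(islice(sawtooth(), 6)))
```

With these defaults the ramp climbs in steps of 5 up to 4095 (0FFFh) and then restarts from 0, so changing `step` changes the slope and therefore the period of the waveform.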