
DSP Processors



Figure 4.1(a) The 4x4 binary multiplication

Figure 4.1(b) The structure of a 4x4 Braun multiplier

MAC using DSP






The MAC loop in DSP assembly (the mnemonics are shown in a generic DSP style for illustration; the operand syntax is as given):

    CLR  A                  ; clear accumulator A
    RPT  #N                 ; repeat the next instruction N times
    MAC  *(R0)+, *(R1)+, A  ; fetch the two memory locations pointed to by
                            ; R0 and R1, multiply them together, and add the
                            ; result to A; the final result is stored back in A
    MOV  A, *R2             ; move the result to memory

Figure 4.4 A MAC unit
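In higher-level terms, the repeated MAC instruction above computes an inner (dot) product of two N-point buffers, which is the core of FIR filtering. A minimal Python sketch of the same accumulate loop (function and variable names are illustrative):

```python
def mac_dot(x, h):
    """Multiply-accumulate: return the sum of x[n] * h[n], as a MAC unit
    would hold in its accumulator after N repeat cycles."""
    acc = 0                       # clear accumulator A
    for xn, hn in zip(x, h):      # one MAC per tap: fetch, multiply, add
        acc += xn * hn
    return acc

# e.g. one 4-tap FIR output sample
print(mac_dot([1, 2, 3, 4], [4, 3, 2, 1]))   # 1*4 + 2*3 + 3*2 + 4*1 = 20
```

Each loop iteration corresponds to one MAC cycle: two operand fetches, one multiply, and one accumulate.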

In numerical analysis, one or more guard digits can be used to reduce the amount of roundoff error.
For example, suppose that the final result of a long, multi-step calculation can be safely rounded off to N decimal places. That is to say, the roundoff error introduced by this final rounding makes a negligible contribution to the overall uncertainty.
However, it is quite likely that it is not safe to round off the intermediate steps in the calculation to the same number of digits. Be aware that roundoff errors can accumulate. If M decimal places are used in the intermediate calculation, we say there are M - N guard digits.
Guard digits are also used in floating-point operations in most computer systems.
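A small numeric sketch of the effect (the term values and digit counts are illustrative): rounding every intermediate sum to the final precision N = 2 lets roundoff accumulate, while carrying M = 6 places, i.e. four guard digits, keeps the final 2-place result correct.

```python
def accumulate(terms, places):
    """Sum the terms, rounding each intermediate partial sum to `places`
    decimal places (simulating limited working precision)."""
    total = 0.0
    for t in terms:
        total = round(total + t, places)
    return total

terms = [1 / 7] * 1000           # exact sum is 1000/7 = 142.857142...
exact = 1000 / 7

no_guard = accumulate(terms, 2)              # N = 2 places everywhere
guarded  = round(accumulate(terms, 6), 2)    # M = 6, so M - N = 4 guard digits

print(no_guard)    # 140.0  -- each step rounds away ~0.003, and it accumulates
print(guarded)     # 142.86 -- matches round(exact, 2)
```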

Figure 4.5 A MAC unit with accumulator guard


Von Neumann architecture

Figure 4.8(a) The bus structure of von Neumann architecture (related material in Ramesh Babu, 5th edition, page 11.8; also covered in the 4th edition)

The von Neumann architecture, also known as the von Neumann model and Princeton architecture, is a computer architecture based on that described in 1945 by the mathematician and physicist John von Neumann and others.
The design of a von Neumann architecture is simpler than the more modern Harvard architecture, which is also a stored-program system but has one dedicated set of address and data buses for reading data from and writing data to memory, and another set of address and data buses to fetch instructions.

Harvard architecture

Figure 4.8(b) The bus structure of Harvard architecture (related material in Ramesh Babu, 5th edition, page 11.9; also covered in the 4th edition)

The Harvard architecture is a computer architecture with separate storage and signal pathways for instructions and data. The term originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape (24 bits wide) and data in electro-mechanical counters. These early machines had data storage entirely contained within the central processing unit, and provided no access to the instruction storage as data. Programs needed to be loaded by an operator; the processor could not initialize itself.

Contrast with von Neumann architectures

Under pure von Neumann architecture the CPU can be either reading an instruction or reading/writing data from/to the memory. Both cannot occur at the same time, since the instructions and data use the same bus system. In a computer using the Harvard architecture, the CPU can both read an instruction and perform a data memory access at the same time, even without a cache.
A Harvard architecture computer can thus be faster for a given circuit complexity, because instruction fetches and data accesses do not contend for a single memory pathway.
Also, a Harvard architecture machine has distinct code and data address spaces: instruction address zero is not the same as data address zero. Instruction address zero might identify a twenty-four-bit value, while data address zero might indicate an eight-bit byte that is not part of that twenty-four-bit value.
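The speed claim can be made concrete with a toy cycle-count model (the one-cycle-per-access costs below are simplifying assumptions, not figures for any particular processor):

```python
def cycles(n_instructions, data_accesses_per_instr, shared_bus):
    """Toy model: every instruction needs 1 fetch cycle; each data access
    needs 1 memory cycle. On a shared (von Neumann) bus the accesses
    serialize; on split (Harvard) buses they can overlap."""
    fetch = n_instructions
    data = n_instructions * data_accesses_per_instr
    if shared_bus:
        return fetch + data      # fetches and data accesses contend
    return max(fetch, data)      # fetches and data accesses proceed in parallel

# 1000 instructions, one data access each
print(cycles(1000, 1, shared_bus=True))    # 2000 (von Neumann)
print(cycles(1000, 1, shared_bus=False))   # 1000 (Harvard)
```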

A modified Harvard architecture machine is very much like a Harvard architecture machine, but it relaxes the strict separation between instruction and data while still letting the CPU concurrently access two (or more) memory buses. The most common modification includes separate instruction and data caches backed by a common address space. While the CPU executes from cache, it acts as a pure Harvard machine. When accessing backing memory, it acts like a von Neumann machine (where code can be moved around like data, which is a powerful technique). This modification is widespread in modern processors, such as the ARM architecture and x86 processors. It is sometimes loosely called a Harvard architecture, overlooking the fact that it is actually "modified".
Figure 4.8(c) The bus structure for the architecture with one program
memory and two data memories

GPP (General Purpose Processor) Data Path

(Figure: GPP datapath with Memory Data, Register 1 and Register 2)

A GPP, unlike a single-purpose processor, can accomplish various tasks via programs written in an instruction set that the microprocessor can recognize. Most processors are built from a controller, a datapath and memory. The same memory holds both the program and the data.

Digital Signal Processors Data Path Only

(Figure: DSP datapath with separate Program Memory Data and Data Memory Data buses)

A DSP chip is a microprocessor specially designed for DSP. Its Harvard architecture allows multiple memory reads per cycle, and its architecture is optimized to provide rapid processing of discrete-time signals, e.g. a multiply-and-accumulate (MAC) in one cycle.





Memory structures


Pipelining is a technique which allows two or more operations to overlap during execution.
In pipelining, a task is broken down into a number of distinct subtasks which are then overlapped during execution. It is used extensively in digital signal processors to increase speed.
An instruction can be broken down into three steps. Each step in the instruction can be regarded as a stage in a pipeline and so can be overlapped. By overlapping the instructions, a new instruction is started at the start of each clock cycle.
The figure shows the timing diagram for a three-stage pipeline, drawn to highlight the instruction steps. Typically, each step in the pipeline takes one machine cycle. Thus, during a given cycle up to three different instructions may be active at the same time, although each will be at a different stage of completion.
The key to an instruction pipeline is that the three parts of the instruction (that is, fetch, decode and execute) are independent, and so the execution of multiple instructions can be overlapped.
It is seen that at the ith cycle, the processor could be simultaneously fetching the ith instruction, decoding the (i-1)th instruction, and executing the (i-2)th instruction.

Figure 4.13 Pipelining for speeding up the execution of an instruction
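The three-stage overlap described above can be tabulated with a short sketch (stage names as in the text; the instruction numbering is illustrative):

```python
STAGES = ["fetch", "decode", "execute"]

def pipeline_schedule(n_instructions):
    """Return {cycle: {stage: instruction index}} for a 3-stage pipeline
    that starts one new instruction per clock cycle."""
    schedule = {}
    for i in range(n_instructions):           # instruction i enters at cycle i
        for s, stage in enumerate(STAGES):    # and occupies stage s at cycle i+s
            schedule.setdefault(i + s, {})[stage] = i
    return schedule

sched = pipeline_schedule(4)
# At cycle 2 the pipeline is full: executing i0, decoding i1, fetching i2.
print(sched[2])   # {'execute': 0, 'decode': 1, 'fetch': 2}
```

With the pipeline full, one instruction completes per cycle, so n instructions finish in n + 2 cycles instead of 3n.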

Types of DSP
- Low-end fixed point
- High-end fixed point: ADSP215XX, DSP56800
- Floating point: TMS320C3X, C67XX, ADSP210XX, DSP96000


Figure 3.1(a) Fixed-point format to represent signed integers

Figure 3.1(b) Fixed-point format to represent signed fractions

Figure 3.2 IEEE 754 format for floating-point numbers
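The IEEE 754 single-precision layout of Figure 3.2 (1 sign bit, 8 biased-exponent bits, 23 fraction bits) can be inspected directly; this sketch uses Python's standard struct module:

```python
import struct

def float_fields(x):
    """Unpack an IEEE 754 single-precision float into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # raw 32-bit pattern
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # biased by 127
    fraction = bits & 0x7FFFFF        # 23-bit fraction (hidden leading 1)
    return sign, exponent, fraction

# -1.5 = (-1)^1 * 1.1b * 2^0  ->  sign 1, biased exponent 127, fraction 2^22
print(float_fields(-1.5))   # (1, 127, 4194304)
```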

Figure 3.3 (a) An A/D converter with b bits for signal representation,
(b) quantization model for the A/D converter

Figure 3.3 (c) quantization error in truncation A/D converter

Figure 3.3 (d) quantization error in rounding A/D converter, (e)

probability density function for truncation error, (f) probability
density function for rounding error
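The error ranges sketched in Figures 3.3(c)-(f) follow from the quantization step: for b bits over a full-scale range R, the step is Δ = R / 2^b. Rounding keeps the error within ±Δ/2, while one-sided truncation lets it reach a full step. A quick numeric check (the full-scale range of 2, for inputs in [-1, 1), is an assumed parameter):

```python
def quant_step(bits, full_scale=2.0):
    """Quantization step of a b-bit A/D over the given full-scale range."""
    return full_scale / 2 ** bits

def quantize(x, bits, mode="round"):
    """Quantize x to a b-bit grid by rounding or by truncation toward zero."""
    delta = quant_step(bits)
    if mode == "round":
        return round(x / delta) * delta
    return int(x / delta) * delta     # truncation drops the low-order part

delta = quant_step(8)                            # 2 / 256 = 0.0078125
err_round = 0.3 - quantize(0.3, 8, "round")
err_trunc = 0.3 - quantize(0.3, 8, "trunc")
assert abs(err_round) <= delta / 2               # rounding: within +/- delta/2
assert 0 <= err_trunc < delta                    # truncation: one-sided, up to delta
```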

Figure 3.4 An example showing the D/A converter error due to the
zero-order hold at its output: (a) DSP output, (b) D/A output,

Figure 3.4 An example showing the D/A converter error due to the
zero-order hold at its output: (c) the convolving pulse that generates
(b) from (a), (d) frequency contents of the convolving pulse in (c)

Fixed Point vs Floating Point

Fixed-point processors:
- consume less power
- are harder to program (watch for errors: truncation, overflow, rounding)
- have limited dynamic range
- are used in 95% of consumer products

Floating-point processors:
- have greater accuracy
- are much easier to program
- can access larger memory

It is harder to create an efficient program in C on a fixed-point processor than on a floating-point processor.
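The truncation, overflow and rounding pitfalls listed above are easy to demonstrate in Q15, a common 16-bit fixed-point fraction format (the helpers below are a sketch; real processors differ in saturation and rounding behavior):

```python
Q15_MAX, Q15_MIN = 2 ** 15 - 1, -2 ** 15   # 16-bit two's-complement range

def to_q15(x):
    """Encode a real number in [-1, 1) as a Q15 integer, saturating on overflow."""
    n = int(round(x * 2 ** 15))
    return max(Q15_MIN, min(Q15_MAX, n))

def q15_mul(a, b):
    """Q15 * Q15 -> Q15: the product is shifted right by 15,
    discarding the low bits (truncation -> roundoff error)."""
    return (a * b) >> 15

half = to_q15(0.5)                         # 16384
print(q15_mul(half, half) / 2 ** 15)       # 0.25
print(to_q15(1.0))                         # 32767: +1.0 overflows and saturates
```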


Fixed Point vs Floating Point: typical applications

Fixed point:
- Portable Products
- 2G, 2.5G and 3G Cell Phones
- Digital Audio Players
- Digital Still Cameras
- Electronic Books
- Voice Recognition
- GPS Receivers
- Fingerprint Recognition

Floating point:
- Digital Subscriber Line (DSL)
- Wireless Basestations
- Central Office Switches
- Private Branch Exchange (PBX)
- Digital Imaging
- 3D Graphics
- Speech Recognition
- Voice over IP