53 views

Original Title: 3 Pipelining and Parallel Processing

Uploaded by Raghunandan Komandur

- Digital Signal Processing Using MATLAB
- Ece-Vii-dsp Algorithms & Architecture [10ec751]-Assignment
- Article on Dsp
- DRU Unit Description
- advanced microprocessor questions
- Dsp
- sprp499
- Amateur Radio Digital Modes
- IT Dept Odd Sem TT14
- Syllabus 4 07 Electrical 2007
- Vhdl Adders and Vedic Multipliers
- altera_dspbook
- 40
- ROM Based Scince Driect
- ncs_index
- tiduac1
- a107320201-Digital Signal Processing
- 00567299
- 53
- Chap 13

You are on page 1of 34

Pipelining and

Parallel Processing

Shao-Yi Chien

Shao-Yi Chien 2 DSP in VLSI Design

Introduction

Pipelining and parallel are the most important

design techniques in VLSI DSP systems

Make use of the inherent parallel property of

DSP algorithms

Pipelining: different function units working in parallel

Parallel: duplicated function units working in parallel

Shao-Yi Chien 3 DSP in VLSI Design

An Example:

Car Production Line

Throughput: how

many cars produced

in one hour

Latency: how long will

it take to produce one

car

mT

Cost: 1

Throughput: 1/mT

Latency: mT

T T

T

mT

mT

mT

m

Cost: >1

Throughput: 1/T

Latency: >mT

Cost: >n

Throughput: (1/mT)*n

Latency: >mT

n

Shao-Yi Chien 4 DSP in VLSI Design

Pipelining of Digital Filters (1/3)

FIR filter

Critical path: T

M

+2T

A

Shao-Yi Chien 5 DSP in VLSI Design

Pipelining of Digital Filters (2/3)

Pipelined FIR filter

A M sample

T T T +

A M

sample

T T

f

+

1

Shao-Yi Chien 6 DSP in VLSI Design

Pipelining of Digital Filters (3/3)

Schedule

Shao-Yi Chien 7 DSP in VLSI Design

Pipeline

Can reduce the critical path to increase

the working frequency and sample rate

T

M

+2T

A

T

M

+T

A

Shao-Yi Chien 8 DSP in VLSI Design

Drawbacks of Pipelining

Increasing latency (in cycle)

For M-level pipelined system, the number of

delay elements in any path from input to

output is (M-1) greater than the origin one

Increase the number of latches (registers)

Shao-Yi Chien 9 DSP in VLSI Design

How to Do Pipelining?

Put pipelining latches across any feed-forward

cutset of the graph

Cutset

A cutset is a set of edges of a graph such that if these

edges are removed from the graph, the graph

becomes disjoint

Feed-forward cutset

The data move in the forward direction on all the

edges of the cutset

Shao-Yi Chien 10 DSP in VLSI Design

How to Do Pipelining?

Feed-forward cutset

Shao-Yi Chien 11 DSP in VLSI Design

Example

In the SFG in Fig. (a),

all the computation time

for each node is 1 u.t.

(a) Calculate the critical

path

(b) The critical path is

reduced to 2 u.t. by

inserting 3 extra delay

elements. Is it a valid

pipelining?

Shao-Yi Chien 12 DSP in VLSI Design

Answer

(a) The critical path

is A3A5A4A6,

4 u.t.

(b) No, it is not a

valid pipelining. Fig.

(c) is the corrected

one

Shao-Yi Chien 13 DSP in VLSI Design

Fine-Grain Pipelining

Critical path (T

M

=10, T

A

=2)

T

M

+2T

A

=14

T

M

+T

A

=12

T

M1

=6 or T

M2

+T

A

=6

Shao-Yi Chien 14 DSP in VLSI Design

Notes for Pipelining (1/2)

Pipelining is a very simple design technique

which can maintain the input output data

configuration and sampling frequency

T

clk

=T

sample

Supported in many EDA tools

Still has some limitations

Pipeline bubbles

Has some problems for recursive system

Introduces large hardware cost for 2-D or 3-D data

Communication bound

Shao-Yi Chien 15 DSP in VLSI Design

Notes for Pipelining (2/2)

Effective pipelining

Put pipelining registers on the critical path

Balance pipelining

10(2+8): critical path=8

10(5+5): critical path=5

Shao-Yi Chien 16 DSP in VLSI Design

Parallel of Digital Filters (1/5)

Single-input single-output (SISO) system

Multiple-input multiple-output (MIMO) system

3-Parallel System!

Shao-Yi Chien 17 DSP in VLSI Design

Parallel of Digital Filters (2/5)

Parallel processing, block processing

Block size (L): the number of data to be

processed at the same time

Block delay (L-slow)

A latch is equivalent to L clock cycles at the

sample rate

Shao-Yi Chien 18 DSP in VLSI Design

Parallel of Digital Filters (3/5)

The critical path is the same

T

clk

is not equal to T

sample

L-slow

Shao-Yi Chien 19 DSP in VLSI Design

Parallel of Digital Filters (4/5)

Whole system

Shao-Yi Chien 20 DSP in VLSI Design

Parallel of Digital Filters (5/5)

Parallel-to-serial converter

Serial-to-parallel converter

Shao-Yi Chien 21 DSP in VLSI Design

Parallel Processing v.s.

Pipelining

Parallel processing is superior than pipelining

processing for the I/O bottleneck

(communication bounded)

Pipelining only can increase the clock rate

System clock rate = sample rate for pipelining system

Use parallel processing can further lower the required

working frequency

Shao-Yi Chien 22 DSP in VLSI Design

Pipelining-Parallel Architecture

Shao-Yi Chien 23 DSP in VLSI Design

Notes for Parallel Processing

The input/output data access scheme

should be carefully designed, it will cost a

lot sometimes

T

clk

>T

sample

, f

clk

<f

sample

Large hardware cost

Combine with pipelining processing

Shao-Yi Chien 24 DSP in VLSI Design

Low Power Issues

Pipelining and parallel processing are also

beneficial for low power design

Propagation delay

Power consumption

Assume the sampling frequency is the

same

Shao-Yi Chien 25 DSP in VLSI Design

Pipelining for Low Power (1/2)

For M-level pipelining

Critical path is reduced to 1/M

The capacitance is also reduced to C

charge

/M

The supply voltage can be reduced to , and the

propagation delay remains unchanged

Shao-Yi Chien 26 DSP in VLSI Design

Pipelining for Low Power (2/2)

Power consumption:

How about the parameter ?

Solve this equation to get

Shao-Yi Chien 27 DSP in VLSI Design

Example

Parameters

T

M

=10 u.t.

T

A

=2 u.t.

T

m1

=6 u.t.

T

m2

=4 u.t.

C

M

=5C

A

V

t

=0.6V

Normal V

cc

=5V

(a) New supply

voltage?

(b) Power saving

percentage?

Shao-Yi Chien 28 DSP in VLSI Design

Answer

(a)

Origin system:

Pipelined system:

(b)

Invalid value, less

than threshold

voltage

Shao-Yi Chien 29 DSP in VLSI Design

Parallel Processing for Low

Power (1/2)

For L-parallel system

Clock period: T

seq

LT

seq

C

charge

remains unchanged

C

totol

LC

total

Have more time to charge the capacitance, the supply

voltage can be lower

Shao-Yi Chien 30 DSP in VLSI Design

Parallel Processing for Low

Power (2/2)

Power consumption

To derive the parameter

Shao-Yi Chien 31 DSP in VLSI Design

Example

Parameters

T

M

=8 u.t.

T

A

=1 u.t.

T

sample

=9 u.t.

C

M

=8C

A

V

t

=0.45V

Normal V

cc

=3.3V

(a) New supply

voltage?

(b) Power saving

percentage?

Shao-Yi Chien 32 DSP in VLSI Design

Answer

(a)

Origin:

2-parallel system:

(b)

Invalid value, less

than threshold

voltage

Shao-Yi Chien 33 DSP in VLSI Design

Pipelining-Parallel for Low

Power

Shao-Yi Chien 34 DSP in VLSI Design

Conclusion

Pipelining

Reduce the critical path

Increase the clock speed

Reduce power consumption at the same speed

Parallel

Increase effective sampling rate

Reduce power consumption at the same speed

- Digital Signal Processing Using MATLABUploaded by오유택
- Ece-Vii-dsp Algorithms & Architecture [10ec751]-AssignmentUploaded byMuhammadMansoorGohar
- Article on DspUploaded byOnur Cem Erdoğan
- DRU Unit DescriptionUploaded byWelly Lukmanulhakim
- advanced microprocessor questionsUploaded byNikhil Pathak
- DspUploaded byKavitha Subramaniam
- sprp499Uploaded byMochamad Rizal Jauhari
- Amateur Radio Digital ModesUploaded byAravind Balasubramanian
- IT Dept Odd Sem TT14Uploaded bySrini92
- Syllabus 4 07 Electrical 2007Uploaded byBindu Chipiri
- Vhdl Adders and Vedic MultipliersUploaded byamta1
- altera_dspbookUploaded byaankka
- 40Uploaded bysubbareddy
- ROM Based Scince DriectUploaded byjebas_ece
- ncs_indexUploaded byRoberto Auquilla
- tiduac1Uploaded byphysicsnewblol
- a107320201-Digital Signal ProcessingUploaded byKoteswararao Mallidi
- 00567299Uploaded byKranthi Kumar
- 53Uploaded byArvind Baronia
- Chap 13Uploaded bySuriya Skariah
- lecture notes for DSPUploaded byAbhimanyu Chowdary
- Exercise List and Examination RulesUploaded bygiulia
- Motivation Letter ScribdUploaded byDavid
- DATA COMMUploaded byBilal Ashfaq Ahmed
- Comparison of Impedance Measurements in a DSP Using Ellipse-fit AndUploaded byskull82
- APD___Tema_3.pdfUploaded byIrina MTr
- probenwin2Uploaded byRobinJamesPayyappillil
- ChecklistUploaded bymidhun manikandan
- Lab 5Uploaded bySameera මලිත් Withanachchi
- CSE III SyllabusUploaded byMonika Kohli

- Gate 15 BrochUploaded byViveen Charan
- DocumentUploaded byRaghunandan Komandur
- Ra 1412008010035Uploaded byRaghunandan Komandur
- Exercise NoUploaded byRaghunandan Komandur
- Exercise No1Uploaded byRaghunandan Komandur
- Verilog LabUploaded byRaghunandan Komandur
- Library IeeeUploaded byRaghunandan Komandur
- hw2s04solUploaded byRaghunandan Komandur
- lecture5 (1)Uploaded byRaghunandan Komandur

- QN902x_Datasheet_v0.98_Nov_2013Uploaded byRichard Zerpa
- VL9252 Low Power Vlsi Desing 2Uploaded byRakesh Kumar D
- AREA OPTIMIZED LOW POWER ARITHMETIC AND LOGIC UNIT.pdfUploaded bysherin
- Low power and speed M-GDI based full adderUploaded byeditor3854
- PDF Low Power PDF Iep Ppt AsdUploaded bysandeep_sggs
- TCASIIadc06Uploaded byGurkaranjot Singh
- A Novel Smart Billing System for Energy Consumption Through Wireless Network (EIE)Uploaded byMATHANKUMAR.S
- CMOSUploaded byUday Kumar
- IRJET-Designing of Sram Using Lector Technique to Reduce Leakage PowerUploaded byIRJET Journal
- VTU M.Tech (VLSI)syllabusUploaded byNaveen Gowdru
- Datasheet (9)Uploaded bychandreshpate
- Energy-Efficient Signal Processing via Algorithmic Noise-ToleranceUploaded byeballiri
- ECE-AppliedElectronics.pdfUploaded byJacob Chako
- Dual Mode Logic DesignUploaded byLucas Weaver
- MURALI_VLSI Design Engineer (1) (1)Uploaded byMurali
- ImportantUploaded byLakshmi Narayanan
- Fujitsu - ASIC/COT - 40nm CMOS Technology CS302 SeriesUploaded byFujitsu Semiconductor Europe
- CMOS Full Adder Circuit TopologiesUploaded byAamodh Kuthethur
- 33. M.E.VLSI DesignUploaded byThahsin Thahir
- Estimation of Power in VLSI Circuit Using Various SimulationUploaded byBT Krishna
- Low Power Low Voltage Bulk Driven Balanced OTAUploaded byAnonymous e4UpOQEP
- NOC Error correctionUploaded byNavin Kumar
- Design High Speed, Low Noise, Low Power Two Stage CMOSUploaded bysanjeevsoni64
- Study of Sram and Its Low Power TechniquesUploaded byIAEME Publication
- IRJET-Fixed Width Replica Redundancy Block MultiplierUploaded byIRJET Journal
- Data SheetUploaded byMartin Elias Ortiz
- MIP_1_2008.pdfUploaded bylazaros
- AN2255Uploaded byNacer Mezghiche
- Design of Low Power TPG Using LP-LFSR.docUploaded byNsrc Nano Scientifc
- GREEN PPTUploaded byPrateek Sharma