Deepak 2

PIPELINING
-Deepak Haran (2000B5A3710)
WHAT IS PIPELINING?
Pipelining is an implementation technique in which the execution of multiple instructions is overlapped to make CPUs faster.

It exploits the parallelism among the instructions in a sequential instruction stream.
THE METHODOLOGY
In a pipeline, each step is called a pipe stage or pipe segment and completes one part of an instruction. The stages are connected one to the next to form a pipe. Instructions enter at one end, progress through the stages and exit at the other end.
THE NEED FOR PIPELINING

To make CPUs faster. This is accomplished by increasing the CPU throughput (the number of instructions completed per unit time). Pipelining yields a reduction in the average execution time per instruction. For a machine with multiple clock cycles per instruction, pipelining can be viewed as a reduction in CPI (clock cycles per instruction).

Under ideal conditions, with perfectly balanced pipe stages:

Time per instruction on pipelined machine = (Time per instruction on unpipelined machine) / (Number of pipe stages)
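As a rough worked example of the relation above, the following Python sketch computes the ideal time per instruction. The function name and the 50 ns figure are illustrative assumptions, not values from the slides, and the result ignores pipeline overhead and hazards.

```python
def ideal_pipelined_time(unpipelined_time_ns: float, num_stages: int) -> float:
    """Ideal average time per instruction, assuming perfectly balanced stages."""
    if num_stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return unpipelined_time_ns / num_stages


# Hypothetical machine: 50 ns per instruction unpipelined, 5 pipe stages.
print(ideal_pipelined_time(50.0, 5))  # 10.0
```

Real pipelines fall short of this figure because the stages are never perfectly balanced and the pipeline registers add overhead.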
IMPLEMENTATION OF THE DLX INSTRUCTION SET

The DLX architecture has been chosen because its simplicity makes it easy to demonstrate the principles of pipelining. Each DLX instruction can be implemented in at most 5 clock cycles. The implementation requires several temporary registers, which simplify pipelining.

The five clock cycles are as follows:

Instruction Fetch cycle (IF): the instruction at the memory address held in the PC is fetched into the IR, and PC+4 is stored in NPC.

Instruction Decode/Register Fetch cycle (ID): decoding is done in parallel with reading the registers, because the register fields are at fixed locations in the instruction format (fixed-field decoding).

Execution/Effective Address cycle (EX): the ALU operates on the operands prepared in the prior cycle, performing a function that depends on the DLX instruction type.

Memory Access/Branch Completion cycle (MEM): the only instructions active in this cycle are loads, stores and branches.
* Memory reference: if the instruction is a load, data from memory is placed in the LMD register. If the instruction is a store, data from the B register is written into memory. In both cases the address used is the value in register ALUOutput.
* Branch: if the branch is taken, the PC is replaced with the branch destination address in ALUOutput; otherwise, it is replaced with the incremented PC in register NPC.

Write Back cycle (WB): the result is written into the register file, whether it comes from the memory system or from the ALU.
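The five cycles can be traced for a single load with a small Python sketch. This is a simplified model, not the DLX datapath itself: the function name, the decoded-tuple instruction format and the temporary `a` (the register value read in ID) are assumptions made for illustration, while IR, NPC, ALUOutput and LMD follow the names used above.

```python
def run_load(regs, data_mem, instr_mem, pc):
    """Step one load, LW rt,imm(rs), through the five cycles.

    instr_mem maps an address to a decoded tuple ("LW", rt, rs, imm);
    real hardware would decode a 32-bit word instead (simplification).
    """
    # IF: fetch the instruction and compute the next sequential PC.
    ir = instr_mem[pc]
    npc = pc + 4
    # ID: decode the fixed fields and read the register file.
    op, rt, rs, imm = ir
    if op != "LW":
        raise ValueError("this sketch only handles loads")
    a = regs[rs]
    # EX: the ALU forms the effective address.
    alu_output = a + imm
    # MEM: the load reads data memory into LMD.
    lmd = data_mem[alu_output]
    # WB: the loaded value is written into the register file.
    regs[rt] = lmd
    return npc


regs = {"R1": 0, "R2": 100}
data_mem = {108: 42}
instr_mem = {0: ("LW", "R1", "R2", 8)}  # LW R1,8(R2)
next_pc = run_load(regs, data_mem, instr_mem, pc=0)
print(regs["R1"], next_pc)  # 42 4
```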

Single Cycle vs Multiple Cycle Implementation

Multiple cycle implementation: each instruction takes multiple clock cycles to execute. In DLX, each instruction takes at most five clock cycles.
Single cycle implementation: each instruction takes one long clock cycle.

However, the single cycle implementation is not used, for two reasons:
1. It is inefficient for machines in which the amount of work, and hence the clock cycle time needed, varies considerably between instructions.
2. It requires the duplication of functional units that could be shared in a multicycle implementation.
THE BASIC PIPELINE FOR DLX

Each instruction takes up to 5 clock cycles to complete. In the pipeline, the hardware initiates a new instruction on every clock cycle, so during any clock cycle it is executing some part of five different instructions.
The pipeline must never be asked to perform two different operations with the same datapath resource during the same clock cycle.
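The overlap can be visualised with a short Python sketch that lists which stage each instruction occupies in each clock cycle. It assumes an ideal pipeline with no stalls; the function name and the output layout are illustrative choices.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions: int) -> list[list[str]]:
    """Row i lists the stage instruction i occupies in each clock cycle."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters IF in cycle i
            if 0 <= stage_index < len(STAGES):
                row.append(STAGES[stage_index])
            else:
                row.append("")  # not yet started, or already finished
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(5)):
    print(f"i+{i}: " + " ".join(f"{stage:>3}" for stage in row))
```

Reading one column of the output top to bottom shows that, once the pipeline is full, every stage is busy with a different instruction and no stage is used twice in the same cycle.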

Furthermore, pipelining the datapath requires that values passed from one pipe stage to the next be placed in registers called pipeline registers. These registers convey values and control information from one stage to the next.

In the DLX pipeline, the major functional units, such as the ALU, are used in different cycles, so overlapping the execution of multiple instructions introduces relatively few conflicts.
This is possible for the following reasons:

Separate instruction and data memories eliminate the conflict for a single memory that would otherwise arise between the instruction fetch of one instruction and the data memory access of another.

The register file is used in two stages: for reading in the ID stage and for writing in the WB stage. Both can take place in the same clock cycle, with the write performed in the first half of the cycle and the read in the second half.

To start a new instruction every clock cycle, the PC must be incremented and stored every clock cycle. This is done in the IF stage, where either the incremented PC or the branch target of an earlier branch is written into the PC.
PIPELINE HAZARDS
WHAT ARE PIPELINE HAZARDS?
Hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. They reduce performance from the ideal speedup gained by pipelining.
CLASSIFICATION OF HAZARDS
Structural hazards: arise from resource conflicts when the hardware can't support all possible combinations of instructions in simultaneous overlapped execution.
Data hazards: arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
Control hazards: arise from the pipelining of branches and other instructions that change the PC.
STRUCTURAL HAZARDS
For a machine to be free of structural hazards, its functional units must be pipelined and its resources duplicated sufficiently to allow all possible combinations of instructions in the pipeline. Structural hazards arise for the following reasons:

When a functional unit is not fully pipelined, a sequence of instructions using that unit cannot proceed at the rate of one per clock cycle.
When a resource is not duplicated enough to allow all possible combinations of instructions. Ex: a machine may have only one register file write port, but the pipeline may need to perform 2 writes in the same clock cycle.

Example: a machine with a single memory shared by data and instructions. An instruction containing a data memory reference will conflict with the instruction fetch of a later instruction. This is resolved by stalling the pipeline for one clock cycle when the data memory access occurs.
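The cost of the shared-memory stall can be estimated with a small Python sketch. It assumes an otherwise ideal CPI of 1 and exactly one stall cycle per load or store, which holds only while a later instruction is always waiting to be fetched; the function name and the five-instruction mix are made up for illustration.

```python
def cpi_with_shared_memory(ops, ideal_cpi=1.0):
    """Average CPI when every data memory reference costs one stall cycle."""
    if not ops:
        raise ValueError("need at least one instruction")
    mem_refs = sum(1 for op in ops if op in {"LW", "SW"})
    return ideal_cpi + mem_refs / len(ops)


# Hypothetical instruction mix: 2 of the 5 instructions reference data memory.
ops = ["LW", "ADD", "SUB", "SW", "ADD"]
print(f"{cpi_with_shared_memory(ops):.2f}")  # 1.40
```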
DATA HAZARDS
Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen when the instructions are executed sequentially on an unpipelined machine.
CLASSIFICATION OF DATA HAZARDS

RAW (read after write): consider two instructions i and j, with i occurring before j. j tries to read a source before i writes it, so j incorrectly gets the old value.
Ex:
ADD R1,R2,R3
SUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
XOR R10,R1,R11

This hazard is overcome by a simple hardware technique called forwarding. In forwarding, the ALU result from the EX/MEM register is always fed back to the ALU input latches. If the forwarding hardware detects that the previous ALU operation has written the register corresponding to a source for the current ALU operation, the control logic selects the forwarded result as the ALU input rather than the value read from the register file.
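The dependences in the example sequence can be found mechanically. The Python sketch below records, for each source register, the most recent earlier instruction that writes it. The tuple encoding of the instructions and the function name are assumptions made for this illustration, and the sketch models only the dependence check, not the forwarding hardware.

```python
# Each instruction is (opcode, destination, source1, source2).
program = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]


def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for each RAW dependence."""
    found = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for j, (_, dest, *sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:
                found.append((last_writer[reg], j, reg))
        last_writer[dest] = j
    return found


for i, j, reg in raw_dependences(program):
    print(f"instruction {j} reads {reg} written by instruction {i} (distance {j - i})")
```

All four later instructions depend on the ADD. Whether a given dependence actually becomes a hazard depends on its distance from the producer and on the pipeline in question.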

WAW (write after write): j tries to write an operand before it is written by i. The writes are thus performed in the wrong order, leaving the value written by i, rather than the value written by j, as the final value. This hazard is present only in pipelines that write in more than one pipe stage. In DLX this is not a hazard, as DLX writes a register only in the WB stage.
Ex:
LW R1,0(R2)
ADD R1,R2,R3

WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the new value. This doesn't happen in DLX, as all reads occur early (in the ID stage) and all writes occur late (in the WB stage).
Ex:
SW 0(R1),R2
ADD R2,R3,R4

HAZARDS REQUIRING STALLS:
Consider a load immediately followed by a SUB whose source register is the destination register of the load. This hazard cannot be removed by forwarding alone, because the loaded data is not available until the end of the load's MEM cycle, which is after the SUB needs it in its EX cycle. Hence hardware called a pipeline interlock is added to detect the hazard and stall the pipeline until the hazard is cleared. The hazard is checked in the ID stage, and the instruction that wants to use the data is stalled until the source instruction produces it.
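The check made by the interlock can be sketched in Python as a scan over adjacent instruction pairs. The sketch assumes forwarding is otherwise available, so that the only stall is a single cycle when a load is immediately followed by an instruction that reads the loaded register; the tuple encoding and the function name are illustrative.

```python
def load_use_stalls(program):
    """Count stall cycles, assuming one stall per load followed at once by a user."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, *_ = prev
        _, _, *curr_sources = curr
        if prev_op == "LW" and prev_dest in curr_sources:
            stalls += 1
    return stalls


# Each instruction is (opcode, destination, sources...).
program = [
    ("LW", "R1", "R2"),         # LW R1,0(R2)
    ("SUB", "R4", "R1", "R5"),  # reads R1 right after the load: 1 stall
    ("AND", "R6", "R1", "R7"),  # far enough from the load: no stall
]
print(load_use_stalls(program))  # 1
```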
CONTROL HAZARDS
Control hazards can cause a greater performance loss than data hazards. The simplest method of dealing with branches is to stall the pipeline as soon as the branch is detected in the ID stage, until the MEM stage, where the new PC is finally determined.

Each branch then causes a 3-cycle stall in the DLX pipeline, which is a significant loss when as many as 30% of the instructions executed are branches.
The number of stall cycles per branch is reduced by testing the branch condition in the ID stage and computing the destination address in the ID stage using a separate adder. With this change there is only a one-clock-cycle stall on branches.
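The effect of the branch stall on performance can be put in numbers with a short Python sketch. It assumes an ideal CPI of 1 with no other stalls and uses the 30% branch frequency quoted above; the function name is illustrative.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, ideal_cpi=1.0):
    """Average CPI when every branch adds a fixed number of stall cycles."""
    return ideal_cpi + branch_fraction * stall_cycles


print(f"{cpi_with_branch_stalls(0.30, 3):.2f}")  # 1.90 with a 3-cycle stall
print(f"{cpi_with_branch_stalls(0.30, 1):.2f}")  # 1.30 with a 1-cycle stall
```

Under these assumptions the 3-cycle stall nearly doubles the CPI, while resolving the branch in the ID stage keeps the increase to 30%.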
WHAT MAKES PIPELINING HARD TO IMPLEMENT?

EXCEPTIONAL SITUATIONS: situations in which the normal order of instruction execution is changed. Instructions that raise exceptions may force the machine to abort the instructions in the pipeline before they complete.

Some of the exceptions include:
o Integer arithmetic overflow/underflow.
o Power failure.
o Hardware malfunctions.
o I/O device request.

The five categories used to define what action is needed for the different exception types are:
1. Synchronous versus asynchronous
2. User requested versus coerced
3. User maskable versus nonmaskable
4. Within versus between instructions
5. Resume versus terminate

EXCEPTIONS IN DLX:
1. IF: page fault on instruction fetch, misaligned memory access.
2. ID: undefined/illegal opcode.
3. EX: arithmetic exceptions.
4. MEM: page fault on data fetch, misaligned memory access.
5. WB: none.
DLX FP PIPELINE
The floating point pipeline is the same as the pipeline for the integer instructions, except for the following two important changes.

The EX cycle may be repeated as many times as needed to complete the operation.

There are multiple functional units:
1. The main integer unit, which handles loads and stores, integer ALU operations and branches.
2. FP and integer multiplier.
3. FP adder.
4. FP and integer divider.

Not all of these functional units have pipelined execution stages.

Floating point operations have a longer latency. Latency is defined as the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.
The latency is also one cycle less than the number of stages from the EX stage to the stage that produces the result. Using the above definition, the functional units have different latencies, as shown below.
1. Integer ALU: 0
2. Data memory: 1
3. FP add: 3
4. FP multiply: 6
5. FP divide: 24
The pipeline structure is implemented with the above latencies by introducing additional pipeline registers between the additional pipe stages.

FEATURES:
The FP multiplier is pipelined with 7 stages.
The FP adder is pipelined with 4 stages.
The FP divider is not pipelined and requires 24 clock cycles to complete an operation.
Structural hazards and both RAW and WAW data hazards are possible.
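The latency figures translate directly into RAW stall counts. The Python sketch below assumes full forwarding and no structural hazards, so a dependent instruction stalls only for the part of the producer's latency that is not covered by independent instructions placed in between. The table keys and the function name are illustrative.

```python
# Latencies in clock cycles, as listed for the DLX FP pipeline.
LATENCY = {
    "Integer ALU": 0,
    "Data memory": 1,
    "FP add": 3,
    "FP multiply": 6,
    "FP divide": 24,
}


def raw_stall_cycles(producer: str, intervening: int) -> int:
    """Stalls seen by a dependent instruction that has `intervening`
    independent instructions between it and its producer."""
    if intervening < 0:
        raise ValueError("intervening must be non-negative")
    return max(0, LATENCY[producer] - intervening)


print(raw_stall_cycles("FP multiply", 0))  # 6: consumer directly follows the multiply
print(raw_stall_cycles("FP add", 2))       # 1: two independent instructions in between
```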
INTERDEPENDENCE OF INSTRUCTION SET DESIGN AND PIPELINING

o Variable instruction lengths and execution times lead to imbalance among pipeline stages and complicate hazard detection.
o Sophisticated addressing modes, such as postincrement, that update registers complicate hazard detection.
o Architectures such as the 80x86 that allow writes into the instruction space complicate pipelining.
MIPS R4000 PIPELINE

FEATURES:
Implements the MIPS-3 instruction set, a 64-bit instruction set.
Deeper pipeline than DLX: 8 stages.
Higher clock rate achieved.
Both load and branch delays are increased.
Basic branch delay is 3 cycles.

The MIPS R4000 floating point unit consists of 3 functional units: a floating point divider, a floating point multiplier and a floating point adder.

The primary reasons for stalls in the MIPS R4000 pipeline are the following:
Load stalls: delays arising from the use of a load result one or two cycles after the load.
Branch stalls: a two-cycle stall on every taken branch.
FP result stalls: stalls due to RAW hazards for an FP operand.
FP structural stalls: delays arising from conflicts for functional units.
THANK YOU
