Hennessy and Patterson, Computer Architecture a Quantitative Approach (4th Ed) Appendix A Instruction Set Pipelining

Pipelines - #1 RISC ISA without pipe


RISC (Reduced Instruction Set Computer), or load-store machine, is an ISA style that came to prominence in the 1980s and popularized the use of pipelines in instruction set architectures (IBM mainframe computers pioneered the use of pipelines two decades earlier!). RISC architectures are generally considered to have the following 3 properties,
o All operations on data apply to registers alone, affecting the entire register content (32 or 64 bit);
o Only load (memory to register transfer) and store (register to memory) operations may involve memory;
o Minimalist provision for different instruction formats, with all instructions typically being the same number of bytes, e.g.

Opcode | Target | Source1 | Immediate / Source2

Register-level Transfer Language (RTL):
(a) Instruction with immediate operand: Target ← Source OPCODE Immediate
(b) Instruction with two registers: Target ← Source1 OPCODE Source2

Opcode examples:
ADD       Add
DADD      Add, double
DADDI     Add, double, immediate
DADDUI    Add, double, unsigned immediate
AND, OR   Logical, no differentiation between 32 or 64 bit

The above generic series of properties results in the fetch-decode-execute cycle having predictable properties (with respect to a Complex Instruction Set Computer, CISC). In addition most RISC architectures have three classes of instruction,

1. ALU instructions
Source: Two registers, or Register and sign-extended immediate
Target: Register

Malcolm Heywood CSCI3121 Dalhousie University FCS

2. Load-Store instructions
Operand (address): Base register AND (sign extended) offset
(Effective) Address Calculation: Offset + base register content
Operand (register): Load: register detailing destination for loaded data. Store: register detailing data for storage.

3. Branch or jump instructions
Comparison Operand: Register to register, or Register to zero
Test: Specified by opcode
Destination ID: Add sign extended offset to PC
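The address arithmetic used by the load-store and branch classes can be sketched in a few lines of Python. This is only an illustration: the 16-bit width of the offset field and the function names (`sign_extend`, `effective_address`, `branch_target`) are assumptions made for the example, not something fixed by the notes.

```python
def sign_extend(value: int, bits: int = 16) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number.

    The 16-bit default is an assumption for this sketch.
    """
    value &= (1 << bits) - 1          # keep only the field's bits
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(base_register: int, offset_field: int) -> int:
    """Load/store: offset (sign extended) + base register content."""
    return base_register + sign_extend(offset_field)


def branch_target(pc: int, offset_field: int) -> int:
    """Branch: add the sign extended offset to the PC."""
    return pc + sign_extend(offset_field)


print(effective_address(0x1000, 0x0008))  # 4104
print(effective_address(0x1000, 0xFFFC))  # 4092 (offset field encodes -4)
```

The second call shows why sign extension matters: without it the offset 0xFFFC would be treated as a large positive number rather than -4.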

NOTE: The RTL and tables summarize an example format, but should not be considered a practical pipelined architecture!

Implementation of RISC ISA


Having defined the basic instruction types, we are now in a position to consider the requirements of a generic fetch-decode-execute cycle. Consider the case of NO PIPELINE first and then modify it for the pipelined case. Generic design decisions,
o Determine the number of stages;
o Determine any special purpose register requirements.

Consider,

1. Instruction Fetch (IF)
IR ← Mem[PC];                      // load 32 bit word
NPC ← PC + 4;                      // memory is byte addressable

2. Instruction Decode/ Register Fetch (ID)
A ← Reg[IR(source1)];              // extract source1
B ← Reg[IR(source2)];              // extract source2
Imm ← (sign ext.)##IR(Immediate);  // extract immediate

3. Execution/ Effective Address (EX)
One of 4 cases depending on the ID stage,
(a) Memory reference, ALU(out) ← A + Imm
(b) Register-Register ALU operation, ALU(out) ← A opcode B
(c) Register-Immediate ALU instruction, ALU(out) ← A opcode Imm
(d) Branch, ALU(out) ← NPC + Imm; Cond ← (A opcode 0)

4. Memory Access (MEM)
PC ← NPC
One of either,
(a) Memory reference, LMD ← Mem[ALU(out)]; or Mem[ALU(out)] ← B
(b) Branch, IF (cond) THEN PC ← ALU(out)

5. Write-back (WB)
In the case of a register ALU operation OR a load instruction we need to update the register contents to reflect the new values.
(a) Register-Register ALU instruction, Reg[target] ← ALU(out)
(b) Register-Immediate ALU instruction, Reg[target] ← ALU(out)
(c) Load instruction, Reg[target] ← LMD
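The five stages above can be walked through as a small unpipelined interpreter. The sketch below is illustrative only: the `Instr` record, the `kind` labels, the dictionary-based memories and the choice of "equal to zero" as the branch test are all assumptions made for this example (the notes only say the test is specified by the opcode).

```python
from dataclasses import dataclass
import operator


@dataclass
class Instr:
    """Hypothetical decoded instruction; `imm` is already sign extended."""
    kind: str            # "alu_rr", "alu_ri", "load", "store" or "branch"
    op: str = "add"
    target: int = 0
    source1: int = 0
    source2: int = 0
    imm: int = 0


ALU_OPS = {"add": operator.add, "and": operator.and_, "or": operator.or_}


def step(pc, reg, imem, dmem):
    """Execute one instruction through IF, ID, EX, MEM, WB; return the new PC."""
    # 1. IF
    ir = imem[pc]
    npc = pc + 4
    # 2. ID
    a, b, imm = reg[ir.source1], reg[ir.source2], ir.imm
    # 3. EX
    cond = False
    if ir.kind in ("load", "store"):
        alu_out = a + imm
    elif ir.kind == "alu_rr":
        alu_out = ALU_OPS[ir.op](a, b)
    elif ir.kind == "alu_ri":
        alu_out = ALU_OPS[ir.op](a, imm)
    elif ir.kind == "branch":
        alu_out = npc + imm
        cond = (a == 0)          # assumed test: branch if register is zero
    else:
        raise ValueError(f"unknown instruction kind: {ir.kind}")
    # 4. MEM
    pc = npc
    lmd = None
    if ir.kind == "load":
        lmd = dmem.get(alu_out, 0)
    elif ir.kind == "store":
        dmem[alu_out] = b
    elif ir.kind == "branch" and cond:
        pc = alu_out
    # 5. WB
    if ir.kind in ("alu_rr", "alu_ri"):
        reg[ir.target] = alu_out
    elif ir.kind == "load":
        reg[ir.target] = lmd
    return pc


reg = [0] * 32
reg[1] = 100
dmem = {108: 7}
imem = {
    0: Instr("load", target=2, source1=1, imm=8),            # R2 <- Mem[R1 + 8]
    4: Instr("alu_ri", "add", target=3, source1=2, imm=5),   # R3 <- R2 + 5
    8: Instr("store", source1=1, source2=3, imm=12),         # Mem[R1 + 12] <- R3
}
pc = 0
while pc in imem:
    pc = step(pc, reg, imem, dmem)
print(reg[2], reg[3], dmem[112])  # 7 12 12
```

Each instruction here finishes completely before the next one starts, which is exactly the behaviour the pipelined design later overlaps.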

Figure 1 summarizes the 5 stages. Note,
(a) PC and registers are shown in stages corresponding to their read position.
(b) Multiplexers indicate the location of the write for PC and registers AND the temporal dependency in this decision.
(c) Backward flowing dependencies will cause problems for the efficient operation of any pipelining.

Figure 1: MIPS RISC data path.

Comments,
o Branch and Store complete in 4 cycles;
o All other instruction types complete in 5 cycles;
o ALU instructions idle in the MEM stage, so an optimal implementation (without pipelining) would complete ALU instructions here.
o Two ALUs would not be necessary if we were not interested in basing our pipeline implementation on this.
o Registers for storing intermediate results would also not be necessary.

Naïve RISC Instruction Pipeline


The general objective of our instruction pipeline is to overlap the execution of each stage of the fetch-decode-execute cycle in the above example RISC ISA. Ideally this results in the following summary of operation,

             Clock
Instruction  1    2    3    4    5    6    7    8    9
i            IF   ID   EX   MEM  WB
i+1               IF   ID   EX   MEM  WB
i+2                    IF   ID   EX   MEM  WB
i+3                         IF   ID   EX   MEM  WB
i+4                              IF   ID   EX   MEM  WB
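The ideal overlap in the table follows a simple rule: instruction i+k enters stage s in clock cycle k + s + 1. A minimal Python sketch of that rule is given below; the function name `pipeline_chart` and the print layout are assumptions for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(n_instructions):
    """Return one row per instruction; row[c] is the stage occupied in clock c+1.

    Assumes a full pipe with no hazards, so a new instruction starts every cycle.
    """
    n_cycles = n_instructions + len(STAGES) - 1
    chart = []
    for k in range(n_instructions):
        row = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            row[k + s] = stage        # instruction k is in stage s at clock k+s+1
        chart.append(row)
    return chart


for k, row in enumerate(pipeline_chart(5)):
    label = "i" if k == 0 else f"i+{k}"
    print(f"{label:<5}" + "".join(f"{cell:<5}" for cell in row))
```

With 5 instructions and 5 stages the chart is 9 cycles wide, and from cycle 5 onward one instruction completes every clock.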

Figure 1 indicates that the following pipeline hazards exist,
o Structural or Resource Hazards: hardware cannot be shared between different stages (two ALUs in figure 1 to avoid this).
o Data Hazards: operation of one instruction is dependent on the result from the previous instruction. Why is this not permissible?
o Control Hazards: contents of PC changed by a branch or jump instruction. Why might this be a problem?

Some Structural Hazards


The data path for our RISC fetch-decode-execute pipe is shown in figure 2.
o Major functional units need to appear in different cycles. Is this the case?
o Instruction and Data memory should be independent (use different buses):
  avoids any conflict between DM and IM;
  implies memory BW increases by a factor of 5 if the clock period is unchanged with respect to an architecture without the pipeline. Why?
o Internal registers are utilized twice in the same cycle, i.e.
  Read from register in ID;
  Write to register in WB;
  How can we deal with this?

Figure 2: Generic 5-stage pipe.

o PC needs to change every clock cycle: implications? Figure 1 indicates that:
  a separate ALU is necessary for NPC calculation AND branch offset calculation;
  we don't know the result of a branch until 3 stages later!
o It is necessary to save the complete state of each instruction as it progresses between different pipeline stages. Pipeline registers provide this, as per figure 3;

Figure 3: Pipeline registers isolate each stage by propagating the value of local registers as each instruction passes through the pipe.

Summary
The RISC ISA provided a suitable framework for understanding instruction pipelines, leading to their widespread introduction into microprocessor chip sets in the 1980s. Instruction pipelines
o force a series of constraints on the ISA;
o result in duplicated hardware between stages;
o are very sensitive to branch type instructions.

Example #1
With respect to the naïve 5-stage RISC piped and unpiped architectures, let,
o Clock period be 1ns;
o ALU and branch instructions take 4 cycles;
o Memory operations take 5 cycles;
o Frequency of ALU instructions is 40%;
o Frequency of branch instructions is 20%;
o Frequency of memory instructions is 40%;
o Additional complexity of a pipeline adds 0.2ns to the clock period;

Questions:
1. Why do ALU and branch instructions complete one clock cycle faster than memory instructions?
2. Assuming a full pipe and no hazards, how much speed up does the piped architecture provide with respect to the architecture without a pipe?
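One way to work through question 2 is to compare the average time per instruction in the two designs. The sketch below uses only the figures given in the example; the variable names are illustrative, and it assumes the full, hazard-free pipe completes one instruction per (longer) clock cycle.

```python
# Unpiped machine: average cycles per instruction weighted by the instruction mix.
clock_unpiped_ns = 1.0
mix = {                      # class: (frequency, cycles)
    "alu":    (0.40, 4),
    "branch": (0.20, 4),
    "memory": (0.40, 5),
}
avg_cycles = sum(freq * cycles for freq, cycles in mix.values())
time_unpiped_ns = clock_unpiped_ns * avg_cycles        # about 4.4 ns

# Piped machine: one instruction per clock, but the clock is 0.2 ns slower.
time_piped_ns = clock_unpiped_ns + 0.2                 # 1.2 ns

speedup = time_unpiped_ns / time_piped_ns
print(f"{time_unpiped_ns:.1f} ns vs {time_piped_ns:.1f} ns, speedup {speedup:.2f}")
# 4.4 ns vs 1.2 ns, speedup 3.67
```

Note that the speedup is well below the pipeline depth of 5, because the unpiped machine does not spend 5 cycles on every instruction and the pipe lengthens the clock period.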

Example #2
Consider the case of an architecture in which separate instruction and data buses do not exist. With respect to figure 4, a data load at instruction #0 causes a structural hazard with the instruction fetch at instruction #1. The instruction being fetched is therefore stalled, delaying instruction #3 and all following instructions by one clock cycle and increasing the CPI of the pipeline, figure 5. Let the following conditions hold,
o CPI of the pipe without stalls is 1;
o Data load/ store instructions constitute 40% of the instruction mix;
o Clock frequency of the pipe encountering stalls is 1.05 times higher than that of the pipe without stalls.

Questions:
1. What is the speedup of the pipe that avoids stalls?
2. If such a stall is so detrimental to the operation of a pipeline, are there any conditions under which structural hazards of this nature might be permitted?
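Question 1 can be worked as a comparison of average instruction times. The sketch below rests on two assumptions that should be checked against the intended reading of the example: that every load/store costs exactly one stall cycle, and that the "1.05 times higher" figure refers to the clock rate of the pipe with the hazard. The variable names are illustrative.

```python
ideal_cpi = 1.0
mem_fraction = 0.40          # loads/stores in the instruction mix
stall_cycles_per_mem = 1     # assumption: one stall cycle per data access
clock_rate_ratio = 1.05      # assumption: hazard pipe's clock is 1.05x faster

# CPI rises by one stall cycle for every data memory access.
cpi_with_hazard = ideal_cpi + mem_fraction * stall_cycles_per_mem   # 1.4

# Express both times in units of the hazard-free clock period.
time_no_hazard = ideal_cpi * 1.0
time_with_hazard = cpi_with_hazard * (1.0 / clock_rate_ratio)

speedup = time_with_hazard / time_no_hazard
print(f"pipe without the hazard is {speedup:.2f}x faster")
# pipe without the hazard is 1.33x faster
```

Under these assumptions the faster clock does not compensate for the extra stall cycles, so the pipe without the structural hazard still wins.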

Figure 4: Effect of single port memory on generic 5-stage pipe.

             Clock
Instruction  1    2    3    4     5    6    7    8    9
i            IF   ID   EX   MEM   WB
i+1               IF   ID   EX    MEM  WB
i+2                    IF   ID    EX   MEM  WB
i+3                         stall IF   ID   EX   MEM  WB
i+4                                    IF   ID   EX   MEM

Figure 5: Case of structural hazard introduced by single port memory.
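The stall pattern in figure 5 can be reproduced with a small scheduling sketch. It assumes a single memory port shared by IF and MEM, that a data access in MEM always wins the port, and that once fetched an instruction flows through the pipe without further stalls. The function name `schedule_single_port` is hypothetical.

```python
def schedule_single_port(is_mem_op):
    """Return the clock cycle (1-based) in which each instruction is fetched.

    `is_mem_op[k]` is True when instruction k is a load or store. A fetch is
    stalled in any cycle where an earlier load/store occupies the MEM stage.
    """
    fetch_cycles = []
    port_busy = set()        # cycles in which a MEM stage uses the memory port
    cycle = 1
    for mem_op in is_mem_op:
        while cycle in port_busy:
            cycle += 1       # structural hazard: IF waits one cycle
        fetch_cycles.append(cycle)
        if mem_op:
            port_busy.add(cycle + 3)   # MEM is three stages after IF
        cycle += 1
    return fetch_cycles


# Instruction i is a load; i+1 .. i+4 make no data memory access.
fetch = schedule_single_port([True, False, False, False, False])
print(fetch)                       # [1, 2, 3, 5, 6]
print([c + 4 for c in fetch])      # WB cycles: [5, 6, 7, 9, 10]
```

The fetch of i+3 slips from cycle 4 to cycle 5, and i+4 inherits the one-cycle delay, matching the table for figure 5.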
