
Advanced Computer Architecture (CS2354)

ILP Speculation
By Rejin Paul, Lecturer, CSE Dept.

Outline
Speculation to Greater ILP
Speculative Tomasulo Example

3/24/2013

CPE 731, ILP4

Speculation to Greater ILP


Greater ILP: overcome control dependences by having the hardware speculate on the outcome of branches and execute the program as if the guesses were correct.
  Speculation: fetch, issue, and execute instructions as if branch predictions were always correct.
  Dynamic scheduling alone: only fetches and issues such instructions.

Essentially a data-flow execution model: operations execute as soon as their operands are available.


Speculation to Greater ILP


To exploit ILP (instruction-level parallelism) well, e.g. with pipelining, Tomasulo, etc., it is critical to handle control dependences (= branch dependences) efficiently.
Key idea: speculate on the outcome of branches (= predict) and execute instructions as if the predictions are correct.
  Of course, we must proceed in such a manner that we can recover if our speculation turns out to be wrong.

Modern processors such as the PowerPC 603/604, MIPS R10000, Intel Pentium II/III/4, and Alpha 21264 extend Tomasulo's approach to support speculation.


Speculation to Greater ILP


Key ideas:
  Separate execution from completion: allow instructions to execute speculatively, but do not let them update registers or memory until they are no longer speculative.
  Therefore, add a final step, called instruction commit, in which an instruction that is no longer speculative is allowed to make its register and memory updates.
  Allow instructions to execute and complete out of order, but force them to commit in order.
  Add a hardware buffer, called the reorder buffer (ROB), with registers to hold the result of an instruction between completion and commit.
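
The rule "complete out of order, commit in order" can be illustrated with a minimal Python sketch. It is not taken from the slides: the function name `commit_in_order` and the use of bare instruction names as stand-ins for ROB entries are assumptions made for illustration only.

```python
from collections import deque

def commit_in_order(issue_order, completion_order):
    """For each completion event, list the instructions that commit after it.

    Hypothetical helper: instructions are plain names, the ROB is a FIFO of
    those names, and 'done' records which ones have finished executing.
    """
    rob = deque(issue_order)      # head (left end) is the oldest instruction
    done = set()
    committed_per_event = []
    for instr in completion_order:
        done.add(instr)
        committed_now = []
        # Only the head may commit, and only once it has completed.
        while rob and rob[0] in done:
            committed_now.append(rob.popleft())
        committed_per_event.append(committed_now)
    return committed_per_event

if __name__ == "__main__":
    # DIVD finishes first but must wait behind LD and ADDD.
    print(commit_in_order(["LD", "ADDD", "DIVD"], ["DIVD", "LD", "ADDD"]))
    # prints [[], ['LD'], ['ADDD', 'DIVD']]
```

Nothing commits when DIVD finishes early; it leaves the buffer only after the two older instructions ahead of it have committed.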

Speculation to Greater ILP


3 components of HW-based speculation:
1. Dynamic branch prediction to choose which instructions to execute
2. Speculation to allow execution of instructions before control dependences are resolved
   + ability to undo the effects of an incorrectly speculated sequence
3. Dynamic scheduling to deal with scheduling of different combinations of basic blocks


Adding Speculation to Tomasulo


Must separate execution from allowing the instruction to finish, or commit.
This additional step is called instruction commit.
When an instruction is no longer speculative, allow it to update the register file or memory.
Requires an additional set of buffers to hold results of instructions that have finished execution but have not committed.
This reorder buffer (ROB) is also used to pass results among instructions that may be speculated.


Adding Speculation to Tomasulo

[Figure: block diagram of Tomasulo with a reorder buffer, showing the FP Op Queue, the Reorder Buffer, the FP Regs, and the reservation stations in front of the FP units.]


Reorder Buffer (ROB)


In Tomasulo's algorithm, once an instruction writes its result, any subsequently issued instructions will find the result in the register file.
With speculation, the register file is not updated until the instruction commits (when we know definitively that the instruction should execute).

Thus, the ROB supplies operands in the interval between completion of instruction execution and instruction commit.
  The ROB is a source of operands for instructions, just as reservation stations (RS) provide operands in Tomasulo's algorithm.
  The ROB extends the architected registers, like the RS.
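
As a rough sketch of how an issuing instruction might locate a source operand under this scheme, consider the Python function below. The names `read_operand`, `reg_status`, `rob`, and `regfile`, and the dictionary layout, are assumptions for illustration; they do not come from the slides.

```python
def read_operand(reg, reg_status, rob, regfile):
    """Return ("value", v) if the operand is available, else ("wait", rob_tag).

    reg_status maps a register name to the ROB entry number that will write
    it (hypothetical layout); a register with no in-flight writer is absent.
    """
    tag = reg_status.get(reg)
    if tag is None:
        # No uncommitted instruction writes this register: read the register file.
        return ("value", regfile[reg])
    entry = rob[tag]
    if entry["ready"]:
        # Completed but not yet committed: the ROB supplies the operand.
        return ("value", entry["value"])
    # Still executing: wait for this ROB number to appear on the CDB.
    return ("wait", tag)

if __name__ == "__main__":
    regfile = {"F0": 0.0, "F4": 1.5}
    rob = {1: {"ready": False, "value": None}}   # ROB1 will write F0
    reg_status = {"F0": 1}
    print(read_operand("F4", reg_status, rob, regfile))  # ('value', 1.5)
    print(read_operand("F0", reg_status, rob, regfile))  # ('wait', 1)
```

The two calls correspond to an instruction such as ADDD F10,F4,F0 issued behind an unfinished load of F0: one operand comes from the register file, and the other is recorded as a ROB number to wait for.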


Reorder Buffer Entry


Each entry in the ROB contains four fields:
1. Instruction type
   A branch (has no destination result), a store (has a memory address destination), or a register operation (ALU operation or load, which has a register destination)
2. Destination
   Register number (for loads and ALU operations) or memory address (for stores) where the instruction result should be written
3. Value
   Value of the instruction result until the instruction commits
4. Ready
   Indicates that the instruction has completed execution and the value is ready
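
The four fields map naturally onto a small record type. The Python sketch below is one possible encoding, not a description of any real processor; the names `InstrType`, `ROBEntry`, and `complete` are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union

class InstrType(Enum):
    BRANCH = "branch"            # no destination result
    STORE = "store"              # destination is a memory address
    REGISTER_OP = "register_op"  # ALU operation or load; destination is a register

@dataclass
class ROBEntry:
    instr_type: InstrType
    destination: Optional[Union[str, int]] = None  # register name or memory address
    value: Optional[float] = None                  # result held until commit
    ready: bool = False                            # True once execution has completed

    def complete(self, value):
        """Record the result and mark the entry ready."""
        self.value = value
        self.ready = True

if __name__ == "__main__":
    entry = ROBEntry(InstrType.REGISTER_OP, destination="F0")
    entry.complete(3.0)
    print(entry.ready, entry.value)  # True 3.0
```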


Reorder Buffer operation


Holds instructions in FIFO order, exactly as issued.
When instructions complete, their results are placed into the ROB.
  The ROB supplies operands to other instructions between execution complete and commit, so it provides more registers, like the RS.
  Results are tagged with the ROB entry number instead of the reservation station number.

Instructions commit: values at the head of the ROB are placed in the registers.
As a result, it is easy to undo speculated instructions on mispredicted branches or on exceptions.

[Figure: FP Op Queue, Reorder Buffer, FP Regs, and reservation stations in front of the FP units, with the commit path leading from the Reorder Buffer to the FP Regs.]

Recall: 4 Steps of Speculative Tomasulo Algorithm


1. Issue: get instruction from FP Op Queue
   If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination. (This stage is sometimes called "dispatch".)

2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not, watch the CDB for the result. When both operands are in the reservation station, execute. This step checks RAW hazards. (It is sometimes called "issue".)

3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.

4. Commit: update register with reorder result
   When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer. (Commit is sometimes called "graduation".)
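
The steps above can be sketched as a toy Python model. It is a simplification under stated assumptions: reservation stations and the execute step are not modeled, the CDB broadcast is reduced to filling in the ROB entry, and the class `SpeculativeCore`, its method names, and the `"mispredicted"` marker are all hypothetical.

```python
from collections import deque

class SpeculativeCore:
    """Toy model of issue, write result, and commit (execute is abstracted away)."""

    def __init__(self, regs, rob_size=8):
        self.regs = dict(regs)   # architectural register file
        self.memory = {}         # architectural memory
        self.rob = deque()       # issue order; left end is the head
        self.rob_size = rob_size
        self.next_tag = 1

    def issue(self, kind, dest=None):
        """Step 1: allocate a ROB entry; return its number, or None if the ROB is full."""
        if len(self.rob) >= self.rob_size:
            return None
        entry = {"tag": self.next_tag, "kind": kind, "dest": dest,
                 "value": None, "ready": False}
        self.next_tag += 1
        self.rob.append(entry)
        return entry["tag"]

    def write_result(self, tag, value):
        """Step 3: deliver a result to the ROB entry with the matching number."""
        for entry in self.rob:
            if entry["tag"] == tag:
                entry["value"], entry["ready"] = value, True
                return

    def commit(self):
        """Step 4: retire the head entry if it is ready; report what happened."""
        if not self.rob or not self.rob[0]["ready"]:
            return "stall"
        entry = self.rob.popleft()
        if entry["kind"] == "branch":
            if entry["value"] == "mispredicted":
                self.rob.clear()   # discard all younger, speculative entries
                return "flush"
            return "branch"
        if entry["kind"] == "store":
            self.memory[entry["dest"]] = entry["value"]
        else:
            self.regs[entry["dest"]] = entry["value"]
        return "commit"

if __name__ == "__main__":
    core = SpeculativeCore({"F0": 0.0, "F4": 0.0})
    ld = core.issue("reg", "F0")
    br = core.issue("branch")
    spec = core.issue("reg", "F4")       # issued past the unresolved branch
    core.write_result(spec, 9.0)         # speculative result completes first
    print(core.commit())                 # stall: the head (load) is not ready
    core.write_result(ld, 2.0)
    core.write_result(br, "mispredicted")
    print(core.commit(), core.regs)      # commit {'F0': 2.0, 'F4': 0.0}
    print(core.commit(), core.regs)      # flush {'F0': 2.0, 'F4': 0.0}
```

Because the speculative write to F4 sits in the ROB rather than in the register file, flushing the buffer is enough to undo it.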

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB1   F0            LD F0,10(R2)      N      (oldest)
ROB2 to ROB7 are empty.

Reservation stations:
Unit            Dest  Contents
FP adders       (empty)
FP multipliers  (empty)
Memory (loads)  1     10+R2

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB2   F10           ADDD F10,F4,F0    N      (newest)
ROB1   F0            LD F0,10(R2)      N      (oldest)
ROB3 to ROB7 are empty.

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  (empty)
Memory (loads)  1     10+R2

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB3   F2            DIVD F2,F10,F6    N      (newest)
ROB2   F10           ADDD F10,F4,F0    N
ROB1   F0            LD F0,10(R2)      N      (oldest)
ROB4 to ROB7 are empty.

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  1     10+R2

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB6   F0            ADDD F0,F4,F6     N      (newest)
ROB5   F4            LD F4,0(R3)       N
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10           ADDD F10,F4,F0    N
ROB1   F0            LD F0,10(R2)      N      (oldest)
ROB7 is empty.

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
                6     ADDD ROB5,R(F6)
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  1     10+R2
                5     0+R3

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    ROB5    ST F4,0(R3)       N      (newest)
ROB6   F0            ADDD F0,F4,F6     N
ROB5   F4            LD F4,0(R3)       N
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10           ADDD F10,F4,F0    N
ROB1   F0            LD F0,10(R2)      N      (oldest)

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
                6     ADDD ROB5,R(F6)
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  1     10+R2
                5     0+R3

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0            ADDD F0,F4,F6     N
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10           ADDD F10,F4,F0    N
ROB1   F0            LD F0,10(R2)      N      (oldest)

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
                6     ADDD M[10],R(F6)
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  1     10+R2

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0    <val2>  ADDD F0,F4,F6     Y
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10           ADDD F10,F4,F0    N
ROB1   F0            LD F0,10(R2)      N      (oldest)

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  1     10+R2

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0    <val2>  ADDD F0,F4,F6     Y
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10           ADDD F10,F4,F0    N
ROB1   F0    M[20]   LD F0,10(R2)      Y      (oldest)

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),M[20]
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  (empty)

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0    <val2>  ADDD F0,F4,F6     Y
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10           ADDD F10,F4,F0    N      (oldest)
ROB1 is empty.

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),M[20]
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  (empty)

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0    <val2>  ADDD F0,F4,F6     Y
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10   <val3>  ADDD F10,F4,F0    Y      (oldest)
ROB1 is empty.

Reservation stations:
Unit            Dest  Contents
FP adders       (empty)
FP multipliers  3     DIVD val3,R(F6)
Memory (loads)  (empty)

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0    <val2>  ADDD F0,F4,F6     Y
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --            BNE F2,<>         N
ROB3   F2    <val4>  DIVD F2,F10,F6    Y      (oldest)
ROB1 and ROB2 are empty.

Reservation stations:
Unit            Dest  Contents
FP adders       (empty)
FP multipliers  (empty)
Memory (loads)  (empty)

Tomasulo With Reorder buffer:


Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0    <val2>  ADDD F0,F4,F6     Y
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --    Wrong   BNE F2,<>         Y      (oldest)
ROB1 to ROB3 are empty.

Reservation stations:
Unit            Dest  Contents
FP adders       (empty)
FP multipliers  (empty)
Memory (loads)  (empty)

Tomasulo With Reorder buffer:


What about memory hazards?

Reorder buffer:
Entry  Dest  Value   Instruction       Done?
ROB7   --    M[10]   ST F4,0(R3)       Y      (newest)
ROB6   F0    <val2>  ADDD F0,F4,F6     Y
ROB5   F4    M[10]   LD F4,0(R3)       Y
ROB4   --            BNE F2,<>         N
ROB3   F2            DIVD F2,F10,F6    N
ROB2   F10           ADDD F10,F4,F0    N
ROB1   F0            LD F0,10(R2)      N      (oldest)

Reservation stations:
Unit            Dest  Contents
FP adders       2     ADDD R(F4),ROB1
FP multipliers  3     DIVD ROB2,R(F6)
Memory (loads)  1     10+R2

THANK YOU

