
Dynamic Scheduling Using Tomasulo's Algorithm

Lotzi Bölöni

EEL 5708

Acknowledgements
All the lecture slides were adapted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), Copyright 1998-2002, University of California, Berkeley.

Dynamic Scheduling
A major limitation of simple pipelining techniques is in-order execution: if an instruction stalls in the pipeline, all the instructions behind it must wait, even if there are enough hardware resources to execute them.

Solution: let the instructions behind the stalled instruction proceed.

Split the Instruction Decode phase of the pipeline into:
  Issue: decode the instruction and check for structural hazards.
  Read operands: wait until there are no data hazards, then read the operands.
The result is out-of-order execution and out-of-order completion of instructions.

Problems with dynamic scheduling


Out-of-order execution introduces the possibility of WAR and WAW hazards.
  These are handled by register renaming.

Problems with handling exceptions:
  We don't care about the internals, but we expect a program to generate exactly the same exceptions as if it were executed in order.
  Imprecise exceptions: the right exceptions are generated, but the state of the processor is not the same as if the program had been executed in order. This makes it difficult to recover from exceptions.
  Precise exceptions can be achieved by speculation.
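To make the hazard types concrete, the following minimal Python sketch classifies the hazards between two instructions by comparing destination and source registers. The `Instr` record and the `hazards` helper are illustrative names, not part of the original slides; the instructions are taken from the floating-point example used later in this lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    srcs: tuple


def hazards(earlier, later):
    """Return the hazard kinds `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes (true dependence)
    if later.dest in earlier.srcs:
        found.add("WAR")   # later overwrites a register earlier still reads
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found


ld_f6 = Instr("LD", "F6", ("R2",))
subd = Instr("SUBD", "F8", ("F6", "F2"))
divd = Instr("DIVD", "F10", ("F0", "F6"))
addd = Instr("ADDD", "F6", ("F8", "F2"))

print(hazards(divd, addd))          # {'WAR'}: ADDD writes F6, which DIVD reads
print(hazards(ld_f6, addd))         # {'WAW'}: both write F6
print(sorted(hazards(subd, addd)))  # ['RAW', 'WAR']
```

Only the RAW case is a true data dependence; the WAR and WAW cases come from reusing the register name F6, which is why renaming can remove them.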


Tomasulo's Algorithm
Designed for the IBM 360/91 by Robert Tomasulo.
Goal: high performance without special compilers.
The IBM 360 had only 4 FP registers.
  Solution: register renaming.

Why study it? It leads to the Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, ...

Tomasulo's Algorithm
Control and buffers are distributed with the function units (FUs).
  FU buffers are called reservation stations (RSs); they hold pending operands.
Registers in instructions are replaced by values or by pointers to reservation stations; this is called register renaming.
  It avoids WAR and WAW hazards.
  There are more reservation stations than registers, so the hardware can do optimizations that compilers can't.
Results go to the FUs from the RSs, not through registers, over a Common Data Bus (CDB) that broadcasts results to all FUs.
Loads and stores are treated as FUs with RSs as well.
Integer instructions can go past branches, allowing FP ops beyond the basic block in the FP queue.
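The renaming idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the 360/91 hardware: each instruction gets a fresh tag (`T1`, `T2`, ... are hypothetical names standing in for reservation stations), and every source that has a pending producer is replaced by that producer's tag.

```python
def rename(program):
    """Rewrite (op, dest, src1, src2) so that sources name their producing tag."""
    latest = {}    # architectural register -> tag of its newest producer
    renamed = []
    for n, (op, dest, *srcs) in enumerate(program, start=1):
        tag = f"T{n}"   # hypothetical tag for this instruction's station
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        renamed.append((op, tag, *new_srcs))
        latest[dest] = tag   # later readers of `dest` wait on this tag
    return renamed


program = [
    ("MULTD", "F0", "F2", "F4"),
    ("SUBD", "F8", "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6", "F8", "F2"),   # WAR hazard with DIVD on F6
]

for row in rename(program):
    print(row)
# ('MULTD', 'T1', 'F2', 'F4')
# ('SUBD', 'T2', 'F6', 'F2')
# ('DIVD', 'T3', 'T1', 'F6')
# ('ADDD', 'T4', 'T2', 'F2')
```

After renaming, ADDD produces T4 instead of writing F6 directly, so it no longer conflicts with the F6 operand that DIVD reads; only the true dependences (DIVD on T1, ADDD on T2) remain.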


Tomasulo organization


Reservation Station Components


Op: operation to perform in the unit (e.g., + or -)
Vj, Vk: values of the source operands
  Store buffers have a V field: the result to be stored
Qj, Qk: reservation stations producing the source registers (the value to be written)
  Qj, Qk = 0 => ready
  Store buffers have only Qi, for the RS producing the result
Busy: indicates that the reservation station or FU is busy

Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
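The fields above map naturally onto a small record type. The sketch below is one possible Python representation, not a description of the actual hardware; the class name, the use of `None` for "Qj = 0", and the string placeholders for values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """One reservation-station entry with the fields listed on the slide."""
    name: str
    busy: bool = False
    op: Optional[str] = None   # operation to perform, e.g. "MULTD"
    vj: Optional[str] = None   # value of the first source operand, if known
    vk: Optional[str] = None   # value of the second source operand, if known
    qj: Optional[str] = None   # station producing Vj; None stands for "Qj = 0"
    qk: Optional[str] = None   # station producing Vk; None stands for "Qk = 0"

    def ready(self) -> bool:
        """An entry can start executing once neither operand is pending."""
        return self.busy and self.qj is None and self.qk is None


# Register result status: which station will write each FP register (None = blank).
register_status = {f"F{i}": None for i in range(0, 32, 2)}

# MULTD F0, F2, F4 whose F2 operand is still being loaded by Load2.
mult1 = ReservationStation("Mult1", busy=True, op="MULTD", vk="R(F4)", qj="Load2")
register_status["F0"] = mult1.name
print(mult1.ready())   # False: still waiting on Load2

# Once Load2 delivers its value, Qj is cleared and the entry becomes ready.
mult1.vj, mult1.qj = "M(45+R3)", None
print(mult1.ready())   # True
```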


Three Stages of Tomasulo Algorithm


1. Issue: get an instruction from the FP Op Queue
   If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).

2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.

3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting units; mark the reservation station available.

Normal data bus: data + destination ("go to" bus)
Common data bus: data + source ("come from" bus)
   64 bits of data + 4 bits of Functional Unit source address
   A unit writes the value if the source matches the Functional Unit it expects (the one that produces the result)
   Does the broadcast
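The "come from" behaviour of the Common Data Bus in the write-result stage can be sketched as follows. This is a minimal Python illustration under assumptions: stations are plain dictionaries, values are symbolic strings, and the function name `broadcast` is hypothetical. The state shown corresponds to the moment Load2 delivers its value in the example that follows.

```python
def broadcast(tag, value, stations, reg_status, reg_file):
    """Deliver (value, source tag) to every consumer that is waiting on `tag`."""
    for rs in stations.values():
        if rs["Qj"] == tag:              # first operand was waiting on this source
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:              # second operand was waiting on this source
            rs["Vk"], rs["Qk"] = value, None
    for reg in reg_status:
        if reg_status[reg] == tag:       # register still expects this producer
            reg_file[reg] = value
            reg_status[reg] = None


stations = {
    "Add1":  {"Op": "SUBD",  "Vj": "M(34+R2)", "Vk": None,    "Qj": None,    "Qk": "Load2"},
    "Mult1": {"Op": "MULTD", "Vj": None,       "Vk": "R(F4)", "Qj": "Load2", "Qk": None},
}
reg_status = {"F0": "Mult1", "F2": "Load2", "F8": "Add1"}
reg_file = {}

broadcast("Load2", "M(45+R3)", stations, reg_status, reg_file)
print(stations["Add1"]["Vk"], stations["Mult1"]["Vj"])   # M(45+R3) M(45+R3)
print(reg_file, reg_status["F2"])                        # {'F2': 'M(45+R3)'} None
```

Consumers match on the source tag rather than on a destination address, so a single broadcast updates every waiting station and register at once.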


Tomasulo Example Cycle 0


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2
LD     F2   45+  R3
MULTD  F0   F2   F4
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 0)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        -      -         -   -         -      -       -         -

Tomasulo Example Cycle 1


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1
LD     F2   45+  R3
MULTD  F0   F2   F4
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Register result status (clock 1)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        -      -         -   Load1     -      -       -         -

Tomasulo Example Cycle 2


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1
LD     F2   45+  R3  2
MULTD  F0   F2   F4
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Register result status (clock 2)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        -      Load2     -   Load1     -      -       -         -

Tomasulo Example Cycle 3


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3
LD     F2   45+  R3  2
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2
DIVD   F10  F0   F6
ADDD   F6   F8   F2

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -         R(F4)     Load2
0     Mult2  No

Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Register result status (clock 3)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  Load2     -   Load1     -      -       -         -

Note: register names are removed (renamed) in the reservation stations.
Load1 completing; what is waiting for Load1?

Tomasulo Example Cycle 4


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4
DIVD   F10  F0   F6
ADDD   F6   F8   F2

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   Yes   SUBD   M(34+R2)  -         -      Load2
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -         R(F4)     Load2
0     Mult2  No

Load buffers
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Register result status (clock 4)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  Load2     -   M(34+R2)  Add1   -       -         -

Load2 completing; what is waiting for it?

Tomasulo Example Cycle 5


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
2     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   No
0     Add3   No
10    Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 5)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   M(34+R2)  Add1   Mult2   -         -

Tomasulo Example Cycle 6


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
1     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   Yes   ADDD   -         M(45+R3)  Add1
0     Add3   No
9     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 6)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   Add2      Add1   Mult2   -         -

Tomasulo Example Cycle 7


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   Yes   SUBD   M(34+R2)  M(45+R3)
0     Add2   Yes   ADDD   -         M(45+R3)  Add1
0     Add3   No
8     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 7)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   Add2      Add1   Mult2   -         -

Add1 completing; what is waiting for it?

Tomasulo Example Cycle 8


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
2     Add2   Yes   ADDD   (M-M)     M(45+R3)
0     Add3   No
7     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 8)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   Add2      (M-M)  Mult2   -         -

Tomasulo Example Cycle 9


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
1     Add2   Yes   ADDD   (M-M)     M(45+R3)
0     Add3   No
6     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 9)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   Add2      (M-M)  Mult2   -         -

Tomasulo Example Cycle 10


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   Yes   ADDD   (M-M)     M(45+R3)
0     Add3   No
5     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 10)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   Add2      (M-M)  Mult2   -         -

Add2 completing; what is waiting for it?

Tomasulo Example Cycle 11


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
4     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 11)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Tomasulo Example Cycle 12


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
3     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 12)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Note: all the quick instructions have already completed.

Tomasulo Example Cycle 13


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
2     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 13)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Tomasulo Example Cycle 14


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
1     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 14)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Tomasulo Example Cycle 15


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3      15
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  M(45+R3)  R(F4)
0     Mult2  Yes   DIVD   -         M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 15)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        Mult1  M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Mult1 completing; what is waiting for it?

Tomasulo Example Cycle 16


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3      15             16
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
40    Mult2  Yes   DIVD   M*F4      M(34+R2)

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 16)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        M*F4   M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Note: just waiting for the divide.

Tomasulo Example Cycle 55


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3      15             16
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
1     Mult2  Yes   DIVD   M*F4      M(34+R2)

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 55)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        M*F4   M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Tomasulo Example Cycle 56


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3      15             16
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5      56
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  Yes   DIVD   M*F4      M(34+R2)

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 56)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        M*F4   M(45+R3)  -   (M-M+M)   (M-M)  Mult2   -         -

Mult2 completing; what is waiting for it?

Tomasulo Example Cycle 57


Instruction status
Instruction j    k   Issue  Exec complete  Write result
LD     F6   34+  R2  1      3              4
LD     F2   45+  R3  2      4              5
MULTD  F0   F2   F4  3      15             16
SUBD   F8   F6   F2  4      7              8
DIVD   F10  F0   F6  5      56             57
ADDD   F6   F8   F2  6      10             11

Reservation stations
Time  Name   Busy  Op     Vj        Vk        Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status (clock 57)
Register  F0     F2        F4  F6        F8     F10     F12  ...  F30
FU        M*F4   M(45+R3)  -   (M-M+M)   (M-M)  M*F4/M  -         -

Again: in-order issue, out-of-order execution and completion.
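The issue, execution-complete and write-result clocks of this example can be reproduced with a short Python sketch. It is a simplified dataflow calculation, not a full simulator: it assumes one instruction issues per clock, a reservation station is always free, and the Common Data Bus is never contended. The latencies (load 2, add/subtract 2, multiply 10, divide 40 clocks) are inferred from the countdown values in the tables, and the names `LATENCY`, `PROGRAM` and `schedule` are hypothetical.

```python
# Latencies inferred from the example tables (an assumption of this sketch).
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}

# (op, destination, FP source registers); load address registers are treated as ready.
PROGRAM = [
    ("LD", "F6", ()),
    ("LD", "F2", ()),
    ("MULTD", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]


def schedule(program, latency):
    """Return (op, dest, issue, exec_complete, write_result) for each instruction."""
    last_write = {}   # register -> clock at which its newest producer broadcasts
    rows = []
    for issue, (op, dest, srcs) in enumerate(program, start=1):
        # Sources are bound at issue to the newest producer so far (renaming),
        # so a later write to the same register cannot disturb this instruction.
        operands_ready = max([issue] + [last_write.get(s, 0) for s in srcs])
        # Execution starts the clock after the last operand arrives.
        exec_complete = operands_ready + latency[op]
        write_result = exec_complete + 1
        rows.append((op, dest, issue, exec_complete, write_result))
        last_write[dest] = write_result
    return rows


for row in schedule(PROGRAM, LATENCY):
    print(row)
# ('LD', 'F6', 1, 3, 4)
# ('LD', 'F2', 2, 4, 5)
# ('MULTD', 'F0', 3, 15, 16)
# ('SUBD', 'F8', 4, 7, 8)
# ('DIVD', 'F10', 5, 56, 57)
# ('ADDD', 'F6', 6, 10, 11)
```

The output shows the point of the example: instructions issue in order (clocks 1 to 6), but ADDD writes its result at clock 11, long before the earlier DIVD finishes at clock 57.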

Tomasulo Drawbacks
Complexity
  Delays of the 360/91, MIPS 10000, IBM 620?
Many associative stores (CDB) at high speed
Performance limited by the Common Data Bus
  Multiple CDBs => more FU logic for parallel associative stores

Tomasulo Loop Example


Loop:  LD     F0, 0(R1)
       MULTD  F4, F0, F2
       SD     F4, 0(R1)
       SUBI   R1, R1, #8
       BNEZ   R1, Loop

Assume multiply takes 4 clocks.
Assume the first load takes 8 clocks (cache miss?) and the second load takes 4 clocks (hit).
To be clear, we will show clocks for SUBI and BNEZ.
In reality, the integer instructions run ahead.

Loop Example Cycle 0


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1
MULTD  F4     F0  F2  1
SD     F4     0   R1  1
LD     F0     0   R1  2
MULTD  F4     F0  F2  2
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Register result status (clock 0, R1 = 80)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        -      -   -      -   -   -    -         -

Loop Example Cycle 1


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1
SD     F4     0   R1  1
LD     F0     0   R1  2
MULTD  F4     F0  F2  2
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Register result status (clock 1, R1 = 80)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load1  -   -      -   -   -    -         -

Loop Example Cycle 2


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1
LD     F0     0   R1  2
MULTD  F4     F0  F2  2
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Register result status (clock 2, R1 = 80)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load1  -   Mult1  -   -   -    -         -

Loop Example Cycle 3


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2
MULTD  F4     F0  F2  2
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Register result status (clock 3, R1 = 80)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load1  -   Mult1  -   -   -    -         -

Note: Mult1 has no register names in its reservation station.

Loop Example Cycle 4


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2
MULTD  F4     F0  F2  2
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Register result status (clock 4, R1 = 72)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load1  -   Mult1  -   -   -    -         -

Loop Example Cycle 5


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2
MULTD  F4     F0  F2  2
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Register result status (clock 5, R1 = 72)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load1  -   Mult1  -   -   -    -         -

Loop Example Cycle 6


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6
MULTD  F4     F0  F2  2
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Register result status (clock 6, R1 = 72)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load2  -   Mult1  -   -   -    -         -

Note: F0 never sees the Load1 result.
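Why F0 never receives the Load1 value can be seen in a small Python sketch of the register result status. It is an illustration under assumptions: the dictionaries, the helper names `issue_load` and `write_result`, and the placeholder string for the old register contents are hypothetical.

```python
reg_status = {"F0": None}      # register result status: pending producer of F0
reg_file = {"F0": "old F0"}    # register contents (placeholder value)


def issue_load(dest, buffer_name):
    """Issue: the destination register now waits on this load buffer."""
    reg_status[dest] = buffer_name


def write_result(buffer_name, value):
    """Write result: only registers still tagged with this buffer take the value."""
    for reg in reg_status:
        if reg_status[reg] == buffer_name:
            reg_file[reg] = value
            reg_status[reg] = None


issue_load("F0", "Load1")        # iteration 1, clock 1
issue_load("F0", "Load2")        # iteration 2, clock 6: overwrites the Load1 tag
write_result("Load1", "M(80)")   # Load1 broadcasts later; F0 no longer expects it

print(reg_status["F0"], reg_file["F0"])   # Load2 old F0
```

The Load1 value is not lost: the first MULTD captured the Load1 tag in its reservation station when it issued, so it still receives M(80) from the broadcast even though the register file does not.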

Loop Example Cycle 7


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  Yes   MULTD  -      R(F2)  Load2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Register result status (clock 7, R1 = 72)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load2  -   Mult2  -   -   -    -         -

Loop Example Cycle 8


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  Yes   MULTD  -      R(F2)  Load2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 8, R1 = 72)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load2  -   Mult2  -   -   -    -         -

Loop Example Cycle 9


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load1
0     Mult2  Yes   MULTD  -      R(F2)  Load2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 9, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load2  -   Mult2  -   -   -    -         -

Load1 completing; what is waiting for it?

Loop Example Cycle 10


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
4     Mult1  Yes   MULTD  M(80)  R(F2)
0     Mult2  Yes   MULTD  -      R(F2)  Load2

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 10, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load2  -   Mult2  -   -   -    -         -

Load2 completing; what is waiting for it?

Loop Example Cycle 11


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
3     Mult1  Yes   MULTD  M(80)  R(F2)
4     Mult2  Yes   MULTD  M(72)  R(F2)

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 11, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult2  -   -   -    -         -

Loop Example Cycle 12


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
2     Mult1  Yes   MULTD  M(80)  R(F2)
3     Mult2  Yes   MULTD  M(72)  R(F2)

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 12, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult2  -   -   -    -         -

Loop Example Cycle 13


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
1     Mult1  Yes   MULTD  M(80)  R(F2)
2     Mult2  Yes   MULTD  M(72)  R(F2)

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 13, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult2  -   -   -    -         -

Loop Example Cycle 14


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  M(80)  R(F2)
1     Mult2  Yes   MULTD  M(72)  R(F2)

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 14, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult2  -   -   -    -         -

Mult1 completing; what is waiting for it?

Loop Example Cycle 15


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14             15
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7      15
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  Yes   MULTD  M(72)  R(F2)

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       Mult2
Store3  No

Register result status (clock 15, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult2  -   -   -    -         -

Mult2 completing; what is waiting for it?

Loop Example Cycle 16


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14             15
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7      15             16
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load3
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       M(72)*R(F2)
Store3  No

Register result status (clock 16, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult1  -   -   -    -         -

Loop Example Cycle 17


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14             15
SD     F4     0   R1  1          3
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7      15             16
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load3
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Register result status (clock 17, R1 = 64)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult1  -   -   -    -         -

Loop Example Cycle 18


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14             15
SD     F4     0   R1  1          3      18
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7      15             16
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load3
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  Yes   80       M(80)*R(F2)
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Register result status (clock 18, R1 = 56)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult1  -   -   -    -         -

Loop Example Cycle 19


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14             15
SD     F4     0   R1  1          3      18             19
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7      15             16
SD     F4     0   R1  2          8

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load3
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  No
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Register result status (clock 19, R1 = 56)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult1  -   -   -    -         -

Loop Example Cycle 20


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14             15
SD     F4     0   R1  1          3      18             19
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7      15             16
SD     F4     0   R1  2          8      20

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load3
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  No
Store2  Yes   72       M(72)*R(F2)
Store3  Yes   64       Mult1

Register result status (clock 20, R1 = 56)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult1  -   -   -    -         -

Loop Example Cycle 21


Instruction status
Instruction   j   k   Iteration  Issue  Exec complete  Write result
LD     F0     0   R1  1          1      9              10
MULTD  F4     F0  F2  1          2      14             15
SD     F4     0   R1  1          3      18             19
LD     F0     0   R1  2          6      10             11
MULTD  F4     F0  F2  2          7      15             16
SD     F4     0   R1  2          8      20             21

Reservation stations
Time  Name   Busy  Op     Vj     Vk     Qj     Qk
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD  -      R(F2)  Load3
0     Mult2  No

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   Yes   64
Store1  No
Store2  No
Store3  Yes   64       Mult1

Register result status (clock 21, R1 = 56)
Register  F0     F2  F4     F6  F8  F10  F12  ...  F30
Qi        Load3  -   Mult1  -   -   -    -         -

Tomasulo Summary
Reservation stations: renaming to a larger set of registers + buffering of source operands
  Prevents the registers from becoming a bottleneck
  Allows loop unrolling in hardware
Not limited to basic blocks (the integer unit gets ahead, beyond branches)
Helps with cache misses as well
Lasting contributions
  Dynamic scheduling
  Register renaming
  Load/store disambiguation

360/91 descendants are the Pentium II, PowerPC 604, MIPS R10000, HP-PA 8000, and Alpha 21264