
ECEN 6253 Advanced Digital Computer Design

Register Renaming
Recall from our earlier discussion of instruction level parallelism (ILP) that name dependences arise from reuse of the same register for different data. The two kinds of name dependence, anti-dependence (WAR hazard) and output dependence (WAW hazard), both involve writing new data to a register, and that write must not occur until previous instructions have finished reading or writing the register. Name dependences can be eliminated by using new registers for new data so that each register is written only once. Since nothing is in a register before it is written for the first time, there are no name dependences.

Completely eliminating name dependences this way is not practical for a real processor with a finite number of registers. The compiler is limited to the architected set of registers. For example, the MIPS instruction set has 32 architected registers (really 31, if you do not count R0) that must be reused to do all the different calculations in a program. Code generated by compilers typically has many name dependences.

With the large number of transistors in modern processors, there is no reason to limit the hardware implementation to just the architected registers. We can provide extra internal registers (rename registers), invisible to the programmer, for writing new data values. As long as we do not run out of rename registers, name dependences are avoided and instructions with name dependences can be allowed to execute in parallel.

Actually, we have already provided rename registers in the reorder/completion buffer. Instruction results (data) are stored (written) there after execution finishes and are held until the instructions are completed (data written to the architected registers) in program order. The storage in the reorder/completion buffer serves as rename registers. A hardware implementation may have the rename registers in a separate Rename Register File (RRF) or as part of the reorder buffer (fig. 5-15, p. 240).
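The effect of writing each register only once can be illustrated with a small sketch. It is an idealization, not a hardware description: it assumes an unlimited supply of fresh registers, represents an instruction as a hypothetical (dest, src1, src2) tuple of register numbers, and ignores the special role of R0 in MIPS. The function name `rename` and the three-instruction program are illustrative only.

```python
from itertools import count

def rename(instrs, num_arch_regs=32):
    """Give every destination a fresh register so each register is written once.

    instrs: list of (dest, src1, src2) architected register numbers.
    Returns the same instructions expressed in physical register numbers.
    """
    # Initially architected register i lives in physical register i.
    mapping = {r: r for r in range(num_arch_regs)}
    fresh = count(num_arch_regs)  # unlimited supply: an idealization
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources use the most recent mapping, so true (RAW) dependences are kept.
        p1, p2 = mapping[src1], mapping[src2]
        # The destination gets a register that has never been written before.
        mapping[dest] = next(fresh)
        renamed.append((mapping[dest], p1, p2))
    return renamed

program = [
    (1, 2, 3),  # R1 <- R2 op R3
    (4, 1, 5),  # R4 <- R1 op R5   (RAW on R1 with instruction 0)
    (1, 6, 7),  # R1 <- R6 op R7   (WAW with instruction 0, WAR with instruction 1)
]
renamed = rename(program)
```

After renaming, the third instruction writes a different register from the one the first instruction writes and the second instruction reads, so only the true dependence between the first two instructions remains.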
The register specifier input to the Architected Register File (ARF) is the architected register number from the instruction code. The busy bit is set when a (non-completed) instruction will write into the architected register at a later time. The tag points to the location in the RRF where the new value will be stored. The valid bit in the RRF is set when the new data is written after execution of the instruction finishes.

Register renaming requires the connection of the ARF and RRF as shown in fig. 5-16, p. 241 to accomplish the following five operations.

1. source read: The decode or dispatch stage must find the source operand values to pass on to the execution stage.
   a. ARF.Busy = 0: The correct value is read from the ARF.
   b. ARF.Busy = 1 and RRF.Valid = 1: The correct value is read from the RRF.
Register Renaming January 17, 2006 page 1 of 7

   c. ARF.Busy = 1 and RRF.Valid = 0: The value has not yet been calculated. The tag is passed instead to the reservation station, where the instruction waits for the value to be calculated before starting execution.
2. destination allocate: The decode or dispatch stage must find a rename register to use for the destination operand value.
   a. Find a register in the RRF with RRF.Busy = 0 and set its RRF.Busy = 1 and RRF.Valid = 0. To avoid repeatedly reading the RRF.Busy values, the RRF.Busy bits are hardwired to a priority encoder which selects the first unused RRF register(s).
   b. Copy the RRF register number from step 2a into the tag field of the architected register in the ARF and simultaneously set its ARF.Busy = 1.
3. rename register update: The execution finish stage must store the destination operand value into the rename register.
   a. The destination tag (passed to execution by dispatch) is used to select a rename register in the RRF.
   b. The destination value is written from the end of the execute pipeline into the selected rename register, and RRF.Valid = 1 is set simultaneously.
4. architected register update: The complete stage takes the instruction at the top of the reorder buffer and copies the data from the RRF to the ARF, waiting for the valid bit to be set if necessary.
   a. If RRF.Valid = 0, the destination data is not ready and completion cannot take place.
   b. If RRF.Valid = 1, completion takes place by copying RRF.data to ARF.data and freeing the rename register by resetting RRF.Busy = 0. If ARF.Tag = RRF register number, then reset ARF.Busy = 0. Since entries in the reorder buffer are in program order (dispatch reserves a place in the reorder buffer for each instruction in program order), reads and writes to the ARF are in original program order. In-order reads and writes are as the compiler intended and are free of WAR and WAW hazards.
5. mispredicted branch recovery: When a mispredicted branch is at the top of the reorder buffer, the complete stage discards the rename registers by setting RRF.Busy = 0 and ARF.Busy = 0 for all entries in the RRF and ARF.

The five activities of register renaming must take place simultaneously every clock cycle. To see how this might be done, let us assume that the ARF and the RRF are implemented as multiported static RAM which is capable of a read or write every half clock cycle. Two operations per clock are made possible by pre-decoding the address (as was done for the register file for the scalar pipeline last semester).

The following diagram shows the timing of the connections of the ARF and RRF to the various stages in the pipeline. Each individual bus line and port in the diagram is actually a set of s busses and ports in parallel for a superscalar design capable of simultaneously fetching s instructions per clock.
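The five operations listed above can be sketched as a small software model. This is a behavioral illustration under stated assumptions, not the hardware: the class and method names are hypothetical, a linear search stands in for the priority encoder, the register counts are arbitrary, and the (register, tag) pair handed to `complete` is assumed to come from the reorder buffer entry. Stalling when no rename register is free is also an assumption.

```python
class ArfRrfRename:
    def __init__(self, n_arf=32, n_rrf=8):
        self.arf = [{"data": 0, "busy": False, "tag": None} for _ in range(n_arf)]
        self.rrf = [{"data": 0, "busy": False, "valid": False} for _ in range(n_rrf)]

    def source_read(self, reg):
        """Operation 1: return ('value', v) or ('tag', t)."""
        a = self.arf[reg]
        if not a["busy"]:
            return ("value", a["data"])          # 1a: read from the ARF
        r = self.rrf[a["tag"]]
        if r["valid"]:
            return ("value", r["data"])          # 1b: read from the RRF
        return ("tag", a["tag"])                 # 1c: wait in a reservation station

    def dest_allocate(self, reg):
        """Operation 2: return the allocated tag, or None if the RRF is full."""
        for tag, r in enumerate(self.rrf):       # stands in for the priority encoder
            if not r["busy"]:
                r["busy"], r["valid"] = True, False   # 2a
                self.arf[reg]["tag"] = tag            # 2b
                self.arf[reg]["busy"] = True
                return tag
        return None                              # assumed: dispatch stalls

    def finish(self, tag, value):
        """Operation 3: write the result and set RRF.Valid."""
        self.rrf[tag]["data"] = value
        self.rrf[tag]["valid"] = True

    def complete(self, reg, tag):
        """Operation 4: return False if the result is not ready yet."""
        r = self.rrf[tag]
        if not r["valid"]:
            return False                         # 4a: completion must wait
        a = self.arf[reg]
        a["data"] = r["data"]                    # 4b: copy RRF.data to ARF.data
        r["busy"] = False                        # free the rename register
        if a["tag"] == tag:                      # no younger rename is pending
            a["busy"] = False
        return True

    def recover(self):
        """Operation 5: discard all rename registers."""
        for r in self.rrf:
            r["busy"] = False
        for a in self.arf:
            a["busy"] = False

m = ArfRrfRename()
t1 = m.dest_allocate(1)             # first write to R1
t2 = m.dest_allocate(1)             # second write to R1 gets its own rename register
pending = m.source_read(1)          # younger value not computed yet: a tag
m.finish(t2, 99)                    # the younger instruction finishes first
ready = m.source_read(1)            # now read from the RRF
early = m.complete(1, t1)           # older result not valid: completion waits
m.finish(t1, 7)
m.complete(1, t1)                   # ARF gets 7, but stays busy (tag is t2)
still_busy = m.arf[1]["busy"]
m.complete(1, t2)                   # ARF gets 99 and the busy bit clears
```

The usage lines show two writes to the same architected register finishing out of order while the ARF is still updated in program order, which is how the WAW hazard is avoided.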


[Figure: timing of the ARF and RRF connections to the pipeline stages. Recoverable labels: Decode Buffer (dest, src1, src2), Dispatch Buffer, RRF Busy priority encoder, ARF (Tag, Data, Busy), RRF (Busy, Valid, Data), Reservation Stations, Execution Pipelines, Execution Finish, Completion Buffer.]

The address decoders for each s-wide port are shown as the small squares. The trapezoids are multiplexers. Additional logic (not shown) is required if the same register is renamed more than once in the same fetch group. In general, it is assumed that the address inputs to a port must be valid a half cycle before the data transfer takes place. The decoder outputs (select lines) are latched in the decoder to keep them stable during the data transfer.

The different numbers of ports required for the different parts of the ARF and RRF indicate that ARF.Busy, RRF.Data and RRF.Valid should be implemented as separate memory modules. The large number of ports for these modules would be very expensive to implement. The cost goes roughly (see the VLSI course for details) as $(n_r + n_w)^2 \times b$, where $n_r$ is the number of read ports, $n_w$ is the number of write ports and $b$ is the number of bits. Since ARF.Busy and RRF.Valid are only 1 bit wide, it does not cost much to add more ports. RRF.Data is as wide as the word size (32 bits in the MIPS instruction set). It would be attractive to consider an alternative design that implements RRF.Data more cheaply with fewer ports.

Pooled Register File. The reason that RRF.Data is expensive to implement is that an extra set of busses is needed to transfer the register contents from the RRF to the ARF when an instruction completes. If the ARF and RRF are pooled into a single register file, then it is unnecessary to transfer data between them. The registers in the pooled register file can be used as either architected registers or rename registers. A mapping table is used to translate the logical register number (from the instruction code) into the (possibly renamed) physical register number. This was done for the floating point unit in the IBM RS/6000 (fig. 5-17, p. 242). An entry in the map table is selected by the operand logical register number. The entry in the table is the physical register number (just like the tag in the ARF).
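As a side calculation, the port-cost estimate given above can be evaluated for a few cases. The port counts below are hypothetical values picked only to show the trend, and the result is a relative figure in arbitrary units; the function name `port_cost` is illustrative.

```python
def port_cost(n_read, n_write, bits):
    """Relative cost (n_r + n_w)^2 * b, in arbitrary units."""
    return (n_read + n_write) ** 2 * bits

# Hypothetical port counts, not taken from the text.
n_read, n_write = 8, 4

data_cost = port_cost(n_read, n_write, 32)   # a 32-bit data module
flag_cost = port_cost(n_read, n_write, 1)    # a 1-bit busy or valid module

# Extra cost of adding one more read port to each module.
data_extra = port_cost(n_read + 1, n_write, 32) - data_cost
flag_extra = port_cost(n_read + 1, n_write, 1) - flag_cost
```

With these numbers the extra read port costs 32 times more on the data module than on the 1-bit module, which is the reason the text singles out RRF.Data as the part worth redesigning.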
Only load destination registers were renamed in the IBM design, but all destination registers could be renamed if desired. A register is renamed by moving its current physical register number from the map table to the Pending Target Return Queue (PTRQ) and getting a new physical register number from the Free List. The PTRQ corresponds to the reorder buffer, and the Free List corresponds to the RRF.Busy bits in the previous design.

When an instruction completes, the destination register data does not move; it stays in the same physical register. Instead, the old physical register number is moved from the PTRQ into the Free List. Since the physical register number (6 bits) is much smaller than the data word size (32 or more bits), it is cheaper to add extra ports to the PTRQ and Free List than it is to add extra ports to RRF.Data in the previous design.
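The bookkeeping described above can be sketched with a map table and two queues. This is an illustration under assumptions rather than the IBM implementation: the class and method names are hypothetical, the register counts are arbitrary, logical register i is assumed to start in physical register i, and returning None when the Free List is empty stands in for a dispatch stall. Recovery from a mispredicted branch is left out because the text does not describe it for this design.

```python
from collections import deque

class PooledRename:
    def __init__(self, n_logical=32, n_physical=40):
        # Map table: indexed by logical register, holds a physical register number.
        self.map_table = list(range(n_logical))
        # Free List: physical registers not currently holding any value.
        self.free_list = deque(range(n_logical, n_physical))
        # PTRQ: old physical register numbers, kept in program order.
        self.ptrq = deque()
        # One pooled data array serves architected and rename registers alike.
        self.data = [0] * n_physical

    def lookup(self, logical):
        """Translate a source operand's logical number to a physical number."""
        return self.map_table[logical]

    def rename_dest(self, logical):
        """Rename a destination; return the new physical register or None."""
        if not self.free_list:
            return None                               # assumed: dispatch stalls
        self.ptrq.append(self.map_table[logical])     # old number goes to the PTRQ
        new_phys = self.free_list.popleft()           # new number from the Free List
        self.map_table[logical] = new_phys
        return new_phys

    def complete(self):
        """Oldest renamed instruction completes: recycle the old register."""
        self.free_list.append(self.ptrq.popleft())

rf = PooledRename()
p_old = rf.lookup(3)          # physical register holding logical register 3
p_new = rf.rename_dest(3)     # a fresh physical register for the new value
rf.data[p_new] = 1.5          # the result is written once and never copied
rf.complete()                 # only the small register number moves
```

Note that `complete` touches only register numbers; the data array is not read or written, which mirrors the point that the data stays in the same physical register.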

OSU Virtual Register Design. Our research group at OSU has been exploring a different version of the pooled register design for the last several years. It has two mapping tables that translate the (virtual) register number in the instruction code into a physical register number. One table contains pointers to the current (possibly renamed) physical registers and the other contains pointers to the current architected physical registers. The problem with the IBM design is that moving the physical register number from the mapping table into the PTRQ requires an extra port on the mapping table. Our design avoids this by maintaining a second mapping table that remains pointing at the architected physical register, making it unnecessary to move the pointer into the reorder buffer (PTRQ).

Let us use the following notation.

RF: combined register file.
RRP: rename register pointer.
ARP: architected register pointer.

The five operations required for register renaming can be accomplished as follows.

1. source read: The decode or dispatch stage must find the source operand values to pass on to the execution stage.
   a. Use the register number in the instruction to read the ARP and RRP to get the physical register pointer to the RF. If RRP.Valid = 1, the register has been renamed and the physical register pointer comes from the RRP; otherwise the register has not been renamed and the pointer comes from the ARP.
   b. If RF.Valid = 1: The correct value is read from the RF.
   c. If RF.Valid = 0: The value has not yet been calculated. The physical register pointer is passed instead to the reservation station, where the instruction waits for the value to be calculated before starting execution.
2. destination allocate: The decode or dispatch stage must find a rename register to use for the destination operand value.
   a. Find a register in the RF with RF.Busy = 0 and set its RF.Busy = 1 and RF.Valid = 0. To avoid repeatedly reading the RF.Busy values, the RF.Busy bits are hardwired to a priority encoder which selects the first unused RF register(s).
   b. Copy the RF register number from step 2a into the RRP.
3. rename register update: The execution finish stage must store the destination operand value into the rename register.
   a. The (renamed) physical register number (passed to execution by dispatch) is used to select a register in the RF.
   b. The destination value is written from the end of the execute pipeline into the selected register, and RF.Valid = 1 is set simultaneously.
4. architected register update: The complete stage takes the instruction at the top of the reorder buffer and changes the ARP to point to the new physical register, waiting for the valid bit to be set if necessary.
   a. If RF.Valid = 0, the destination data is not ready and completion cannot take place.
   b. If RF.Valid = 1, completion takes place by copying the physical register number to the ARP and setting RF.Busy = 0 for the old register.
5. mispredicted branch recovery: When a mispredicted branch is at the top of the reorder buffer, the complete stage discards the rename registers by setting RRP.Valid = 0 for all entries in the RRP.

The following diagram shows the timing of the connections of the ARP, RRP and RF to the various stages in the pipeline. Let us make the same assumptions as before: the ARP, RRP and RF are implemented as multiported static RAM which is capable of a read or write every half clock cycle. Remember that each individual bus line and port in the diagram is actually a set of s busses and ports in parallel for a superscalar design capable of simultaneously fetching s instructions per clock.

In this design, only RF.Valid has an excessive number of ports. RF.Data has only 2s read ports and s write ports, which is the minimum possible for an s-wide superscalar.

Register Renaming

January 17, 2006

page 6 of 7

E C E N

6 2 5 3

A d v a n c e d

D i g i t a l

C o m p u t e r

D e s i g n

[Figure: timing of the ARP, RRP and RF connections to the pipeline stages. Recoverable labels: Decode Buffer (dest, src1, src2), Dispatch Buffer, RF Busy priority encoder, RRP (Pointer, Valid), ARP (Pointer), RF (Busy, Valid, Data), Reservation Stations, Execution Pipelines, Execution Finish, Completion Buffer.]
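The five operations of the two-table design described above can be sketched in the same behavioral style. Several details here are assumptions that the text does not state: the class and method names, the register counts, the initial mapping of virtual register i to physical register i, setting RRP.Valid = 1 when a pointer is copied into the RRP, and, on recovery, freeing every RF register that the ARP does not point to. A linear search again stands in for the priority encoder.

```python
class VirtualRegisterRename:
    def __init__(self, n_virtual=32, n_physical=48):
        # RF: combined register file with Busy and Valid bits per register.
        self.rf = [{"data": 0, "busy": False, "valid": False}
                   for _ in range(n_physical)]
        # ARP: architected pointers (assumed initial mapping i -> i).
        self.arp = list(range(n_virtual))
        for p in self.arp:
            self.rf[p]["busy"] = True
            self.rf[p]["valid"] = True
        # RRP: rename pointers, each with a valid bit.
        self.rrp = [{"ptr": None, "valid": False} for _ in range(n_virtual)]

    def source_read(self, reg):
        """Operation 1: return ('value', v) or ('pointer', p)."""
        r = self.rrp[reg]
        p = r["ptr"] if r["valid"] else self.arp[reg]     # 1a
        if self.rf[p]["valid"]:
            return ("value", self.rf[p]["data"])          # 1b
        return ("pointer", p)                             # 1c

    def dest_allocate(self, reg):
        """Operation 2: return the new physical register, or None if none is free."""
        for p, r in enumerate(self.rf):                   # priority encoder stand-in
            if not r["busy"]:
                r["busy"], r["valid"] = True, False       # 2a
                # 2b; setting Valid here is an assumption implied by step 1a.
                self.rrp[reg] = {"ptr": p, "valid": True}
                return p
        return None                                       # assumed: dispatch stalls

    def finish(self, p, value):
        """Operation 3: write the result and set RF.Valid."""
        self.rf[p]["data"] = value
        self.rf[p]["valid"] = True

    def complete(self, reg, p):
        """Operation 4: repoint the ARP; no data is copied."""
        if not self.rf[p]["valid"]:
            return False                                  # 4a
        old = self.arp[reg]
        self.arp[reg] = p                                 # 4b
        self.rf[old]["busy"] = False                      # free the old register
        return True

    def recover(self):
        """Operation 5: invalidate every rename pointer."""
        for r in self.rrp:
            r["valid"] = False
        # Assumption (not stated in the text): registers that no architected
        # pointer refers to are released so they can be allocated again.
        live = set(self.arp)
        for p, r in enumerate(self.rf):
            if p not in live:
                r["busy"] = False

v = VirtualRegisterRename()
v.rf[v.arp[5]]["data"] = 11      # architected value of virtual register 5
p = v.dest_allocate(5)           # rename virtual register 5
before = v.source_read(5)        # result not computed yet: a pointer
v.finish(p, 42)
after = v.source_read(5)         # now the renamed value
done = v.complete(5, p)          # the ARP entry now points at p
```

Completion changes one pointer and one busy bit, so the wide data array needs no extra ports for it, consistent with the port count the text gives for RF.Data.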
