COMPUTER
ARCHITECTURE
BY
DR. RADWA M. TAWFEEK
SUPERSCALAR PROCESSORS
LAST LECTURE
• Cache Memory
• Block Location
• Replacement Strategy
• Write Strategy
THIS LECTURE
• ILP
• Superscalar Processor
THE METHOD FOR EXPLOITING PARALLELISM
+ multiple processors
PARALLEL PROCESSING
How would the register file know which output values would be directed to which ALU?
SUPERSCALAR WITH DIFFERENT ALUS
FUNCTIONS
DIVERSIFIED PIPELINE
• Advantages:
• Each pipe can be customized for a particular instruction type, resulting in an efficient hardware design
• Considerations:
• Number and mix of functional units
• Order in which instructions update registers and memory values (order of completion)
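The effect of the number and mix of functional units can be illustrated with a small sketch. It assumes a hypothetical machine with two integer pipes, one floating-point pipe, and one memory pipe; the names UNIT_MIX, OPCODE_TO_UNIT, and dispatch are illustrative and do not come from the lecture. Each instruction is routed to a pipe of the matching type, and any instruction that finds no free pipe in the current cycle has to wait.

```python
# Hypothetical mix of functional units: pipe type -> number of copies.
UNIT_MIX = {"int": 2, "fp": 1, "mem": 1}

# Hypothetical mapping from opcode to the type of pipe that executes it.
OPCODE_TO_UNIT = {
    "add": "int", "sub": "int",
    "fadd": "fp", "fmul": "fp",
    "load": "mem", "store": "mem",
}

def dispatch(opcodes, unit_mix=UNIT_MIX):
    """Route opcodes to free pipes of the matching type for one cycle.

    Returns (assigned, waiting): assigned is a list of (opcode, pipe type)
    pairs, waiting lists the opcodes that found no free pipe.
    """
    free = dict(unit_mix)          # remaining free pipes of each type
    assigned, waiting = [], []
    for op in opcodes:
        kind = OPCODE_TO_UNIT[op]
        if free[kind] > 0:
            free[kind] -= 1        # claim one pipe of this type
            assigned.append((op, kind))
        else:
            waiting.append(op)     # structural conflict: no pipe left
    return assigned, waiting

assigned, waiting = dispatch(["add", "fmul", "fadd", "sub", "add"])
print(assigned)
print(waiting)
```

With only one floating-point pipe, the second floating-point instruction waits, and the third integer instruction waits because both integer pipes are taken. Changing UNIT_MIX shows how a different mix shifts the bottleneck.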
Standard Categories:
• In-order issue with in-order completion
• In a scalar pipeline, state registers (buffers) sit between the stages
• In a superscalar pipeline, a multi-entry buffer is needed to hold the data of every instruction being executed in parallel
DYNAMIC PIPELINING (2)
• In a scalar pipeline, a stall prevents the data in the buffer from flowing to the next stage
• In a superscalar pipeline, stalling the whole buffer would also stall instructions that do not need to be stalled (review the example shown in slide 17)
• So a more complex multi-entry buffer design is required.
• One enhancement is to make each entry in the buffer individually addressable, so that the reading and writing of each entry can be controlled independently.
• In a superscalar pipeline, trailing instructions can then bypass a stalled leading instruction, which causes out-of-order execution. Such superscalar pipelines are called dynamic pipelines.
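The individually addressable buffer can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a hardware description: the class MultiEntryBuffer and its methods write and advance are hypothetical names, and the stall condition is supplied by the caller as a predicate.

```python
class MultiEntryBuffer:
    """Pipeline buffer whose entries are read and written independently."""

    def __init__(self, size):
        self.entries = [None] * size   # None marks an empty entry

    def write(self, index, instr):
        """Write one entry without touching the others."""
        self.entries[index] = instr

    def advance(self, is_stalled):
        """Release every entry that is not stalled to the next stage.

        Stalled entries stay where they are, so trailing instructions
        can bypass a stalled leading instruction.
        """
        released = []
        for i, instr in enumerate(self.entries):
            if instr is not None and not is_stalled(instr):
                released.append(instr)
                self.entries[i] = None
        return released

buf = MultiEntryBuffer(3)
for i, name in enumerate(["I1", "I2", "I3"]):
    buf.write(i, name)

# Assume only I1 is stalled, for example while waiting for an operand.
print(buf.advance(lambda instr: instr == "I1"))
print(buf.entries)
```

I2 and I3 move on while I1 stays in its entry. A buffer that could only be stalled as a whole would have held all three.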
DYNAMIC PIPELINING (3)
SUPERSCALAR EXECUTION
COMMITTING OR RETIRING INSTRUCTIONS
• Three separate functional units (e.g., two integer arithmetic and one floating-point arithmetic)
• Two instances of the write-back pipeline stage
• A six-instruction code fragment with the following constraints:
‣ I1 requires two cycles to execute
‣ I3 and I4 conflict for the same functional unit (e.g., both need floating-point arithmetic)
‣ I5 depends on the value produced by I4
‣ I5 and I6 conflict for a functional unit
• When there is a conflict for a functional unit, or when a functional unit requires more than one cycle to generate a result, instructions temporarily stall.
IN-ORDER ISSUE -- IN-ORDER COMPLETION
Again:
• I1 requires 2 cycles to execute
• I3 & I4 conflict for the same functional unit
• I5 depends upon value produced by I4
• I5 & I6 conflict for a functional unit
OUT-OF-ORDER ISSUE -- OUT-OF-ORDER COMPLETION
Again:
• I1 requires 2 cycles to execute
• I3 & I4 conflict for the same functional unit
• I5 depends upon value produced by I4
• I5 & I6 conflict for a functional unit
Note: I5 depends upon I4, but I6 does not
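The difference between in-order and out-of-order issue for this six-instruction fragment can be explored with a small simulation. It is a simplified sketch that models only the execute stage: decode, the write-back ports, and in-order completion are not modeled, so its cycle numbers are not meant to reproduce the slide diagrams. The assignment of instructions to units A, B, and C is an assumption chosen to satisfy the stated conflicts, and an issue width of two per cycle is also assumed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    unit: str            # functional unit the instruction needs (assumed)
    latency: int = 1     # execution cycles
    deps: tuple = ()     # names of instructions whose results are needed

PROGRAM = [
    Instr("I1", "A", latency=2),     # I1 requires two cycles
    Instr("I2", "B"),
    Instr("I3", "C"),                # I3 and I4 share unit C
    Instr("I4", "C"),
    Instr("I5", "B", deps=("I4",)),  # I5 needs I4's result
    Instr("I6", "B"),                # I5 and I6 share unit B
]

def simulate(program, in_order, width=2):
    """Return {name: cycle in which the instruction starts executing}."""
    start, finish, busy_until = {}, {}, {}
    pending = list(program)
    cycle = 0
    while pending:
        cycle += 1
        issued = 0
        for instr in list(pending):
            if issued == width:
                break
            unit_free = busy_until.get(instr.unit, 0) < cycle
            deps_done = all(finish.get(d, cycle) < cycle for d in instr.deps)
            if unit_free and deps_done:
                start[instr.name] = cycle
                finish[instr.name] = cycle + instr.latency - 1
                busy_until[instr.unit] = finish[instr.name]
                pending.remove(instr)
                issued += 1
            elif in_order:
                break    # in-order issue: a blocked instruction blocks the rest
    return start

print(simulate(PROGRAM, in_order=True))
print(simulate(PROGRAM, in_order=False))
```

Under these assumptions, in-order issue starts I6 only after I5, whereas out-of-order issue lets I6 start as soon as unit B is free, because I6 does not depend on I4. I5 still has to wait for I4 in both cases.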
SOME ARCHITECTURES
• PowerPC 604
• six independent execution units:
• Branch execution unit
• Load/Store unit
• 3 Integer units
• Floating-point unit
• in-order issue
• register renaming
• PowerPC 620
• adds out-of-order issue to the features of the 604
• Pentium
• three independent execution units:
• 2 Integer units
• Floating point unit
• in-order issue
ASSIGNMENT 2