James E. Stine
Advanced architectures
Instruction Flow
• Challenges:
– Branches: control dependences
– Branch target misalignment
– Instruction cache misses
• Solutions
– Code alignment (static vs. dynamic)
– Prediction/speculation
[Figure: fetch example; PC marks the fetch address, 3 instructions fetched]
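The cost of branch target misalignment can be sketched numerically. The example below assumes a hypothetical fetch unit that reads one aligned group of 4 instructions per cycle; the group width and the name `instructions_fetched` are illustrative assumptions, not taken from the slides.

```python
def instructions_fetched(pc_index: int, group_width: int = 4) -> int:
    """Useful instructions delivered in one fetch cycle.

    pc_index is the fetch target expressed as an instruction index
    (byte address divided by instruction size). The model assumes the
    fetch unit reads exactly one aligned group per cycle, so slots that
    precede the target inside the group are wasted.
    """
    offset = pc_index % group_width   # position of the target inside its group
    return group_width - offset       # only slots from the target onward are useful


if __name__ == "__main__":
    for pc in range(4):
        print(f"target slot {pc}: {instructions_fetched(pc)} instructions fetched")
```

Under these assumptions, a branch target in slot 1 of a 4-wide group yields 3 instructions in that cycle. Static code alignment (padding so targets start a group) or dynamic alignment hardware both aim to push this number back toward the full width.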
Issues in Decoding
• Primary Tasks
– Identify individual instructions (!)
– Determine instruction types
– Determine dependences between instructions
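The third task, determining dependences between instructions, can be illustrated with a small pairwise check. This is a minimal sketch, assuming each instruction has one destination register and a tuple of source registers; the `Instr` type and `find_dependences` function are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str                # destination register name
    srcs: Tuple[str, ...]    # source register names


def find_dependences(group: List[Instr]) -> List[Tuple[int, int, str]]:
    """Compare every instruction with every earlier one in the group.

    Returns (earlier_index, later_index, kind) tuples, where kind is
    RAW (true), WAR (anti), or WAW (output). The check is purely
    pairwise and conservative: it does not filter out pairs whose
    register is redefined by an instruction in between.
    """
    deps = []
    for j, later in enumerate(group):
        for i in range(j):
            earlier = group[i]
            if earlier.dest in later.srcs:
                deps.append((i, j, "RAW"))
            if later.dest in earlier.srcs:
                deps.append((i, j, "WAR"))
            if later.dest == earlier.dest:
                deps.append((i, j, "WAW"))
    return deps


if __name__ == "__main__":
    group = [
        Instr("r1", ("r2", "r3")),
        Instr("r4", ("r1", "r5")),
        Instr("r1", ("r4", "r6")),
    ]
    for dep in find_dependences(group):
        print(dep)
```

The nested loop makes the cost visible: the number of comparisons grows quadratically with the number of instructions decoded per cycle, which is one reason wide decode is expensive.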
Pentium Pro Fetch/Decode
Predecoding in the AMD K5
Instruction Dispatch and Issue
• Parallel pipeline
– Centralized instruction fetch
– Centralized instruction decode
• Diversified pipeline
– Distributed instruction execution
Necessity of Instruction Dispatch
Centralized Reservation Station
Distributed Reservation Station
Issues in Instruction Execution
• Current trends
– More parallelism: bypassing becomes very challenging
– Deeper pipelines
– More diversity
Bypass Networks
[Figure: pipeline diagram with Fetch Q, BR Scan, BR Predict, and Decode stages feeding the FX1, FX2, LD1, LD2, CR, BR, FP1, and FP2 units, plus StQ and D-Cache]
[Modern Processor Design – Shen / Lipasti]
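Why bypassing gets harder with more parallelism can be seen from a rough count of forwarding paths. The model below is a simplifying assumption, not a figure from the slides: every functional unit produces one result per cycle, results can be forwarded from `forward_stages` pipeline stages, and every unit has `operands_per_unit` operand inputs that may need any of those results. The function name `bypass_paths` is hypothetical.

```python
def bypass_paths(units: int, forward_stages: int = 1, operands_per_unit: int = 2) -> int:
    """Rough count of point-to-point bypass paths in a full network.

    sources: one result per unit per stage it can be forwarded from.
    sinks:   every operand input of every unit.
    A full bypass network connects each source to each sink.
    """
    sources = units * forward_stages
    sinks = units * operands_per_unit
    return sources * sinks


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} units: {bypass_paths(n)} paths, "
              f"{bypass_paths(n, forward_stages=2)} with 2 forwarding stages")
```

In this model the path count grows with the square of the number of units and linearly with the number of stages results are forwarded from, so wider and deeper pipelines both add to the bypass network's wiring and multiplexing.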
Specialized units
New Instruction Types
Issues in Completion/Retirement
• Out-of-order execution
– ALU instructions
– Load/store instructions
• In-order completion/retirement
– Precise exceptions
– Memory coherence and consistency
• Solutions
– Reorder buffer
– Store buffer
– Load queue snooping (later)
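How a reorder buffer reconciles out-of-order execution with in-order retirement can be sketched as a queue kept in program order. This is a minimal illustration under stated assumptions: instructions are identified by a tag, and the class and method names (`ReorderBuffer`, `dispatch`, `complete`, `retire`) are hypothetical, not a description of any particular processor.

```python
from collections import deque
from typing import List


class ReorderBuffer:
    """Entries are kept in program order; only the oldest may retire."""

    def __init__(self) -> None:
        self._entries = deque()  # oldest instruction at the left

    def dispatch(self, tag: str) -> None:
        # Allocate an entry in program order when the instruction is dispatched.
        self._entries.append({"tag": tag, "done": False})

    def complete(self, tag: str) -> None:
        # Execution may finish in any order; just mark the entry as done.
        for entry in self._entries:
            if entry["tag"] == tag:
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self) -> List[str]:
        # Retire from the head only while the oldest entry has finished.
        retired = []
        while self._entries and self._entries[0]["done"]:
            retired.append(self._entries.popleft()["tag"])
        return retired


if __name__ == "__main__":
    rob = ReorderBuffer()
    for tag in ("i0", "i1", "i2"):
        rob.dispatch(tag)
    rob.complete("i2")
    print(rob.retire())   # i2 finished first but cannot retire yet
    rob.complete("i0")
    print(rob.retire())
    rob.complete("i1")
    print(rob.retire())
```

Because architectural state is updated only at retirement, an exception raised by an instruction can be taken when that instruction reaches the head of the buffer, with all older instructions retired and no younger ones committed. That is what makes the exception precise.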
A Dynamic Superscalar Processor
Impediments to High IPC
[Figure: pipeline from FETCH (I-cache, Branch Predictor) through Instruction Buffer, DECODE, EXECUTE, and Reorder Buffer (ROB) to COMMIT (Store Queue, D-cache), annotated with three flows: Instruction Flow, Register Data Flow, Memory Data Flow]
Superscalar Summary
• Instruction flow
– Branches, jumps, calls: predict target, direction
– Fetch alignment
– Instruction cache misses
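Direction prediction can be illustrated with a table of 2-bit saturating counters, a common textbook scheme chosen here for illustration; the slides do not say which predictor is meant. The sketch assumes 4-byte instructions and a small table indexed by low-order PC bits; `TwoBitPredictor` and its parameters are hypothetical names.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low PC bits.

    Counter values 0-1 predict not taken, 2-3 predict taken.
    """

    def __init__(self, table_bits: int = 4) -> None:
        self._mask = (1 << table_bits) - 1
        self._counters = [1] * (1 << table_bits)  # start weakly not taken

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) & self._mask

    def predict(self, pc: int) -> bool:
        return self._counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self._counters[i] = min(3, self._counters[i] + 1)
        else:
            self._counters[i] = max(0, self._counters[i] - 1)


if __name__ == "__main__":
    predictor = TwoBitPredictor()
    pc = 0x40
    for outcome in (True, True, False, True):
        print(f"predicted taken={predictor.predict(pc)}, actual taken={outcome}")
        predictor.update(pc, outcome)
```

The two bits give hysteresis: once a branch has been taken twice in a row, a single not-taken outcome (for example a loop exit) does not flip the prediction. Target prediction, the other half of the bullet above, would need a separate structure such as a table of previously seen targets.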