Advanced Computer Architecture

Chapter 4
Processors and Memory Hierarchy
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani

Source diginotes.in Save the earth. Go paperless


In this chapter…

• Design Space of Processors


• Superscalar and Vector Processors
• Memory Hierarchy Technology

Dr. Vasanthakumar G U, Department of CSE, Cambridge Institute of Technology
ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• Processor families mapped onto a coordinate space of clock rate versus CPI
o Clock rates have moved from lower to higher speeds
o CPI has been lowered over time

• Broad Categorization
o CISC
o RISC

ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• The Design Space
o CISC Computers
o RISC Computers
o Superscalar Processors
o VLIW Processors
o Vector Supercomputers

ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• Instruction Pipelines
o Instruction Cycle Phases
o Pipeline and Pipeline Cycle
o Instruction Pipeline Cycle
o Instruction issue Latency
o Instruction Issue Rate (degree of superscalar processor)
o Simple Operation Latency
o Resource Conflicts
o Base scalar processor

ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• Vector Processors
o Memory-to-memory VP
• Memory-based instructions
• Longer instructions
• Instructions include memory addresses
o Register-to-register VP
• Shorter instructions
• Vector register files

ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• Vector Processors
o Vector Instructions (register-to-register)
• Binary vector: V1 ∘ V2 → V3
• Scaling: s1 ∘ V1 → V2
• Binary reduction: V1 ∘ V2 → s1
• Vector load: M(1:n) → V1
• Vector store: V1 → M(1:n)
• Unary vector: ∘ V1 → V2
• Unary reduction: ∘ V1 → s1

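The register-to-register vector instruction types listed above can be sketched in plain Python. This is a minimal illustration, not the book's notation made executable: vector registers are modelled as lists, and the operator ∘ is passed in as a function; all values are made up.

```python
# Register-to-register vector instruction types, sketched in plain Python.
# Vector registers are lists; "op" plays the role of the operator (∘).

def binary_vector(v1, v2, op):            # V1 ∘ V2 → V3
    return [op(a, b) for a, b in zip(v1, v2)]

def scaling(s1, v1, op):                  # s1 ∘ V1 → V2
    return [op(s1, a) for a in v1]

def binary_reduction(v1, v2, op):         # V1 ∘ V2 → s1 (e.g. dot product)
    total = 0
    for a, b in zip(v1, v2):
        total += op(a, b)
    return total

def unary_vector(v1, op):                 # ∘ V1 → V2
    return [op(a) for a in v1]

V1, V2 = [1, 2, 3], [4, 5, 6]
V3 = binary_vector(V1, V2, lambda a, b: a + b)       # elementwise add
s = binary_reduction(V1, V2, lambda a, b: a * b)     # dot product
```

Vector load and store are simply copies between a memory slice M(1:n) and a vector register, so they are omitted here.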
ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• Vector Processors
o Vector Instructions (memory-to-memory)
• M1(1:n) ∘ M2(1:n) → M3(1:n)
• s1 ∘ M1(1:n) → M2(1:n)
• ∘ M1(1:n) → M2(1:n)
• M1(1:n) ∘ M2(1:n) → M(k)
o Vector Pipelines

• Symbolic Processors
o Prolog processors, Lisp processors, or symbolic manipulators.
o Deal with logic programs, symbolic lists, objects, scripts, production systems, semantic networks,
frames and artificial neural networks.

ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• Symbolic Processors
o Attributes & Characteristics of Symbolic Processing
• Knowledge Representation
o Lists, relational databases, semantic nets, frames etc.
• Common Operations
o Search, sort, pattern matching, filtering etc.
• Memory Requirements
o Large memory with intensive access pattern, content-based
• Communication Patterns
o Varying traffic size, granularity and format of messages

ADVANCED PROCESSOR TECHNOLOGY
Design Space of Processors
• Symbolic Processors
o Attributes & Characteristics of Symbolic Processing
• Properties of Algorithms
o Non-deterministic, parallel and distributed computations
• I/O Requirements
o User-guided programs, intelligent person-machine interfaces
• Architecture Features
o Parallel update of knowledge bases, dynamic load balancing, dynamic memory
allocation, hardware based garbage collection etc.

ADVANCED PROCESSOR TECHNOLOGY
Memory Hierarchy Technology

• Memory Hierarchy
- Need & Significance

ADVANCED PROCESSOR TECHNOLOGY
Memory Hierarchy Technology
• Memory Hierarchy
o Parameters
• Access time
• Memory size
• Cost per byte
• Transfer bandwidth
• Unit of transfer
o Properties
• Inclusion
• Coherence
• Locality

ADVANCED PROCESSOR TECHNOLOGY
Memory Hierarchy Technology
• Memory Hierarchy
o Parameters
• T(i): Access time (round-trip time from CPU to i-th level memory)
o T(i−1) < T(i) < T(i+1)
• S(i): Memory size (number of bytes or words in the i-th level memory)
o S(i−1) < S(i) < S(i+1)
• C(i): Cost per byte (per-byte cost of i-th level memory; total cost estimated by C(i)·S(i))
o C(i−1) > C(i) > C(i+1)
• B(i): Transfer bandwidth (rate at which information is transferred between adjacent levels)
o B(i−1) > B(i) > B(i+1)
• X(i): Unit of transfer (grain size for data transfer between levels i and i+1)
o X(i−1) < X(i) < X(i+1)

ADVANCED PROCESSOR TECHNOLOGY
Memory Hierarchy Technology
• Memory Hierarchy
o Properties
• Inclusion Property
o M1 ⊂ M2 ⊂ M3 ⊂ … ⊂ Mn
o M(i-1) is a subset of M(i)
• Coherence Property
o Copies of the same information item at successive memory levels must be kept consistent
o Strategies to maintain Coherence:
• 1) Write-Through (WT)
• 2) Write-Back (WB)
• Locality of Reference
o Temporal: recently referenced items are likely to be referenced again in near future
o Spatial: tendency of a process to access items whose addresses are near one another
(Figure: the working sets model.)
ADVANCED PROCESSOR TECHNOLOGY
Memory Hierarchy Technology
• Memory Capacity Planning
o Hit ratios
• h_i: hit ratio at the i-th level memory
o Access frequency to i-th level memory
• f_i = (1 − h_1)(1 − h_2) … (1 − h_{i−1}) · h_i
o Effective access time
• Teff = Σ_{i=1..n} f_i · t_i
o Total cost of memory hierarchy
• Ctotal = Σ_{i=1..n} c_i · s_i
o Hierarchy optimization: minimize Teff
• Subject to: s_i > 0, t_i > 0 (for i = 1 to n) and Ctotal = Σ_{i=1..n} c_i · s_i < C_0

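A worked example of the capacity-planning formulas above, as a small Python sketch. The hit ratios and access times are made-up illustrative numbers (cache, main memory, disk), not figures from the book; the last level is assumed to always hit.

```python
# Capacity-planning formulas: access frequencies f_i and effective access
# time Teff, for hit ratios h_i and per-level access times t_i.

def access_frequencies(h):
    # f_i = (1 - h_1)(1 - h_2) ... (1 - h_{i-1}) * h_i
    f, miss = [], 1.0
    for hi in h:
        f.append(miss * hi)
        miss *= (1.0 - hi)
    return f

def effective_access_time(h, t):
    # Teff = sum over levels of f_i * t_i
    return sum(fi * ti for fi, ti in zip(access_frequencies(h), t))

def total_cost(c, s):
    # Ctotal = sum over levels of c_i * s_i
    return sum(ci * si for ci, si in zip(c, s))

h = [0.95, 0.99, 1.0]      # hit ratios: cache, main memory, disk (always hits)
t = [4, 60, 10_000_000]    # access times in ns (illustrative)
Teff = effective_access_time(h, t)   # dominated by the rare disk accesses
```

Note how the tiny fraction of references reaching the slowest level dominates Teff, which is why each level's hit ratio matters so much.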
ADVANCED PROCESSOR TECHNOLOGY
Memory Hierarchy Technology
Virtual Memory Technology
• Virtual memory resides on secondary storage and extends the effective memory capacity.
• Works with primary memory to load applications.
• Reduces the cost of expanding physical memory capacity.
• Implementations differ from one OS to another.

Virtual Memory Technology

• Each process's address space is partitioned into parts used for code, data
and stack.
• Parts are loaded into primary memory when needed and written back to
secondary storage otherwise.
• The logical address space is referred to as virtual memory.
• Virtual memory is much larger than the physical memory.
• Virtual memory uses: Virtual address and Physical address.
• CPU translates Virtual address to Physical address.
• Virtual memory system uses paging.

Locating an Object in a Cache

1. Search for a matching tag (SRAM cache): the object's name is compared with the tag stored
alongside each cache line; a matching tag (e.g. “X”) selects the associated data.
2. Use indirection to look up the actual object location (DRAM “cache”, i.e. virtual memory): the
object's name indexes a lookup table recording where each object actually resides, and that
location is then used to access the data.
A System with Physical Memory Only

• Examples:
o most Cray machines, early PCs, nearly all embedded systems, etc.
• Addresses generated by the CPU point directly to bytes in physical memory.
(Figure: CPU issuing physical addresses 0 … N−1 directly to memory.)
A System with Virtual Memory

• Examples:
o workstations, servers, modern PCs, etc.
• Address Translation: the hardware converts virtual addresses into physical addresses via
an OS-managed lookup table (the page table).
(Figure: CPU virtual addresses translated through a page table to physical memory, with
non-resident pages kept on disk.)
Page Faults (Similar to “Cache Misses”)

• What if an object is on disk rather than in memory?
o The page table entry indicates that the virtual address is not in memory
o An OS exception handler is invoked, moving data from disk into memory
• the current process suspends; others can resume
• the OS has full control over the placement
(Figure: page-table state before and after the fault; the faulting entry changes from pointing
at disk to pointing at memory.)
Servicing a Page Fault

• (1) Initiate block read: the processor signals the I/O controller to read a block of length P
starting at disk address X and store it starting at memory address Y.
• (2) The read occurs by Direct Memory Access (DMA), under control of the I/O controller.
• (3) The I/O controller signals completion: it interrupts the processor, and the OS resumes the
suspended process.
(Figure: processor, cache, memory and disk controllers attached to the memory-I/O bus.)
Solution: Separate Virtual Addr. Spaces

o Virtual and physical address spaces divided into equal-sized blocks
• blocks are called “pages” (both virtual and physical)
o Each process has its own virtual address space
• the operating system controls how virtual pages are assigned to physical memory
(Figure: two processes' virtual address spaces mapped into one physical address space; a
physical page such as read-only library code in PP 7 can be shared by both processes.)
Protection

• The page table entry contains access-rights information
o hardware enforces this protection (trap into the OS if a violation occurs)

Example page tables:
Process i:  VP 0: Read yes, Write no,  PP 9
            VP 1: Read yes, Write yes, PP 4
            VP 2: no access
Process j:  VP 0: Read yes, Write yes, PP 6
            VP 1: Read yes, Write no,  PP 9
            VP 2: no access
Virtual Memory Address Translation

V = {0, 1, …, N−1} virtual address space (N > M)
P = {0, 1, …, M−1} physical address space

MAP: V → P ∪ {∅} address mapping function

MAP(a) = a′ if data at virtual address a is present at physical address a′ in P
MAP(a) = ∅ if data at virtual address a is not present in P (page fault)

(Figure: the processor presents virtual address a to the hardware address-translation
mechanism in the on-chip memory management unit (MMU); a hit yields physical address a′
for main memory, while a miss raises a page fault whose handler has the OS transfer the
page from secondary memory.)
Virtual Memory Address Translation

• Parameters
o P = 2^p = page size (bytes)
o N = 2^n = virtual address limit
o M = 2^m = physical address limit

A virtual address is split into a virtual page number (bits n−1 … p) and a page offset
(bits p−1 … 0); translation replaces the virtual page number with a physical page number
(bits m−1 … p). Notice that the page offset bits don't change as a result of translation.
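The virtual-to-physical translation described above can be sketched in a few lines of Python. The page size, the page-table contents, and the addresses are all illustrative assumptions; the point is the bit split and the pass-through of the offset.

```python
# Minimal address-translation sketch: split a virtual address into a virtual
# page number (VPN) and an offset, map VPN -> PPN via a page table, and
# reassemble. A missing/None entry models a non-resident page (page fault).

PAGE_BITS = 12                       # assume p = 12, i.e. 4 KiB pages
PAGE_SIZE = 1 << PAGE_BITS

page_table = {0: 9, 1: 4, 2: None}   # VPN -> PPN; None means "on disk"

def translate(vaddr):
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    ppn = page_table.get(vpn)
    if ppn is None:
        raise RuntimeError("page fault at VPN %d" % vpn)
    # The offset bits pass through translation unchanged.
    return (ppn << PAGE_BITS) | offset

paddr = translate(0x1234)            # VPN 1 -> PP 4, offset 0x234 preserved
```

In real hardware this lookup is done by the MMU (usually via a TLB first), and the page-fault case traps into the operating system instead of raising an exception.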
Page Tables

(Figure: a memory-resident page table indexed by virtual page number; each entry holds a
valid bit plus either a physical page number (valid = 1) or a disk address in the swap file
or a regular file-system file (valid = 0).)
Integrating VM and Cache

CPU → (VA) → translation → (PA) → cache; on a cache miss, the physical address goes to
main memory.

• Most Caches “Physically Addressed”
o Accessed by physical addresses
o Allows multiple processes to have blocks in cache at the same time
o Allows multiple processes to share pages
o The cache doesn't need to be concerned with protection issues
• access rights are checked as part of address translation
Speeding up Translation with a TLB

• “Translation Lookaside Buffer” (TLB)
o Small hardware cache in the MMU
o Maps virtual page numbers to physical page numbers
o Contains complete page table entries for a small number of pages

CPU → (VA) → TLB lookup; on a TLB hit the physical address goes straight to the cache,
while on a TLB miss the full translation is performed first and the cache is then accessed
with the resulting physical address.
Page Replacement

• Page replacement refers to the process in which a resident page in


main memory is replaced by a new page transferred from the disk.
• The goal of a page replacement policy is to minimize the number of
page faults.
• Reduce the effective memory access time.
• R(t): Resident set of all pages residing in the main memory at time t.
• Forward distance ft(x): The number of time slots from time t to the
first repeated reference of page x in the future.
• Backward distance bt(x): The number of time slots from time t to the
most recent reference of page x in the past.

Page Replacement Policies
o Least recently used (LRU): Replaces the page in R(t) which has the
longest backward distance.
o Optimal (OPT) algorithm: Replaces the page in R(t) with longest forward
distance.
o First-in-first-out (FIFO): Replaces the page in R(t) which has been in
memory for the longest time.
o Least frequently used (LFU): Replaces the page in R(t) which has been
least referenced in the past.
o Circular FIFO: Joins all the page frame entries into a circular FIFO
queue using a pointer to indicate the front of the queue.
o Random replacement: Trivial algorithm which chooses any page for
replacement randomly.
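Two of the policies above can be compared with a small simulation. This sketch counts page faults for FIFO and LRU on one reference string with three page frames; the reference string is a standard illustrative example, not one from the book.

```python
# Count page faults under FIFO and LRU replacement for a given reference
# string and a fixed number of page frames.

from collections import OrderedDict, deque

def count_faults_fifo(refs, frames):
    mem, queue, faults = set(), deque(), 0
    for p in refs:
        if p not in mem:
            faults += 1
            if len(mem) == frames:
                mem.discard(queue.popleft())   # evict oldest arrival
            mem.add(p)
            queue.append(p)
    return faults

def count_faults_lru(refs, frames):
    mem, faults = OrderedDict(), 0             # insertion order = recency order
    for p in refs:
        if p in mem:
            mem.move_to_end(p)                 # refresh: backward distance -> 0
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)        # evict longest backward distance
            mem[p] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
fifo_faults = count_faults_fifo(refs, 3)       # 9 faults
lru_faults = count_faults_lru(refs, 3)         # 10 faults
```

On this particular string FIFO happens to beat LRU; neither policy dominates on all workloads, which is why OPT (longest forward distance) serves only as an unrealizable lower bound.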
Advanced Computer Architecture

Chapter 6
Pipelining and Superscalar
Techniques
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani



In this chapter…

• Linear Pipeline Processors


• Non-linear Pipeline Processors
• Instruction Pipeline Design
• Arithmetic Pipeline Design
• Superscalar Pipeline Design

LINEAR PIPELINE PROCESSORS
• Linear Pipeline Processor
o Cascade of processing stages which are linearly connected to perform a fixed function over a stream of
data flowing from one end to the other.
o Instruction execution, arithmetic computations, and memory-access operations.
• Models of Linear Pipeline
o Synchronous Model
o Asynchronous Model
o (Corresponding reservation tables)
• Clocking and Timing Control
o Clock Cycle
o Pipeline Frequency
o Clock skewing
o Flow-through delay
o Speedup factor
o Optimal number of Stages and Performance-Cost Ratio (PCR)
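The speedup factor listed above follows from the timing of a k-stage linear pipeline: n tasks take (k + n − 1) cycles pipelined versus n·k cycles unpipelined, a standard result in Hwang's text. A minimal sketch (the sample k and n are illustrative):

```python
# k-stage linear pipeline timing: the first task takes k cycles to flow
# through; each subsequent task completes one cycle later.

def pipeline_time(k, n, tau):
    # Total time for n tasks, with pipeline cycle time tau.
    return (k + n - 1) * tau

def speedup(k, n):
    # Speedup over a non-pipelined unit that takes k*tau per task.
    return (n * k) / (k + n - 1)   # approaches k as n grows large

one_task = pipeline_time(4, 1, 1.0)   # flow-through delay: 4 cycles
S = speedup(4, 100)                   # 400/103, close to but below k = 4
```

The optimal number of stages then comes from trading this speedup against the latch overhead and cost added per stage (the performance/cost ratio noted above).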

NON-LINEAR PIPELINE PROCESSORS
• Dynamic Pipeline
o Configured to perform variable functions at different times.
o Allows streamline, feed-forward and feedback connections

• Static Pipeline
o Used to perform fixed functions

• Reservation and Latency Analysis


o Reservation tables
o Evaluation time
• Latency Analysis
o Latency
o Collision
o Forbidden latencies
o Latency Sequence, Latency Cycle and Average Latency

INSTRUCTION PIPELINE DESIGN

• Instruction Execution Phases


o E.g. Fetch, Decode, Issue, Execute, Write-back
o In-order Instruction issuing and Reordered Instruction issuing
• E.g. X = Y + Z , A = B x C
• Mechanisms/Design Issues for Instruction Pipelining
o Pre-fetch Buffers
o Multiple Functional Units
o Internal Data Forwarding
o Hazard Avoidance
• Dynamic Scheduling
• Branch Handling Techniques

INSTRUCTION PIPELINE DESIGN

• Fetch: fetches instructions from memory; ideally one per cycle


• Decode: reveals instruction operations to be performed and identifies the resources needed
• Issue: reserves the resources and reads the operands from registers
• Execute: actual processing of operations as indicated by instruction
• Write Back: writing results into the registers

INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Pre-fetch Buffers
o Sequential Buffers
o Target Buffers
o Loop Buffers

INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Multiple Functional Units
o Reservation Station and Tags
o Slow-station as Bottleneck stage
• Subdivision of Pipeline Bottleneck stage
• Replication of Pipeline Bottleneck stage

INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Internal Forwarding and Register Tagging
o Internal Forwarding:
• A “short-circuit” technique to replace unnecessary memory accesses by register-register
transfers in a sequence of fetch-arithmetic-store operations
o Register Tagging:
• Use of tagged registers, buffers and reservation stations for exploiting concurrent activities
among multiple arithmetic units
o Store-Fetch Forwarding
• (M ← R1, R2 ← M) replaced by (M ← R1, R2 ← R1)
o Fetch-Fetch Forwarding
• (R1 ← M, R2 ← M) replaced by (R1 ← M, R2 ← R1)
o Store-Store Overwriting
• (M ← R1, M ← R2) replaced by (M ← R2)

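The three forwarding/overwriting transformations above can be expressed as a peephole rewrite over adjacent move operations. This sketch models each move as a (destination, source) pair and uses a single memory location "M", a deliberate simplification for illustration only.

```python
# Peephole sketch of internal forwarding over two adjacent moves (dst <- src).
# "M" stands for one memory location; "R1", "R2", ... are registers.

def forward(pair1, pair2):
    (d1, s1), (d2, s2) = pair1, pair2
    if d1 == "M" and s2 == "M":
        # Store-fetch: (M <- R1, R2 <- M)  =>  (M <- R1, R2 <- R1)
        return [(d1, s1), (d2, s1)]
    if s1 == "M" and s2 == "M":
        # Fetch-fetch: (R1 <- M, R2 <- M)  =>  (R1 <- M, R2 <- R1)
        return [(d1, s1), (d2, d1)]
    if d1 == "M" and d2 == "M":
        # Store-store: (M <- R1, M <- R2)  =>  (M <- R2); first store is dead
        return [(d2, s2)]
    return [pair1, pair2]                      # no transformation applies

out = forward(("M", "R1"), ("R2", "M"))        # store-fetch forwarding
```

Each rewrite removes a memory access in favour of a register-register transfer, which is exactly the "short-circuit" the slide describes; a real implementation would of course track distinct memory addresses and intervening writes.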
INSTRUCTION PIPELINE DESIGN
Mechanisms/Design Issues of Instruction Pipeline
• Hazard Detection and Avoidance
o Domain or Input Set of an instruction
o Range or Output Set of an instruction
o Data Hazards: RAW, WAR and WAW
o Resolution using Register Renaming approach
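The domain/range definitions above give a direct way to classify hazards between an instruction I and a later instruction J. This sketch represents domains and ranges as sets of register names (the registers in the example are illustrative):

```python
# Classify data hazards between instruction I (earlier) and J (later),
# given their domains (input sets) and ranges (output sets).

def hazards(dom_i, rng_i, dom_j, rng_j):
    found = []
    if rng_i & dom_j:
        found.append("RAW")   # read-after-write: J reads what I writes
    if dom_i & rng_j:
        found.append("WAR")   # write-after-read: J writes what I reads
    if rng_i & rng_j:
        found.append("WAW")   # write-after-write: both write the same location
    return found

# I: R1 = R2 + R3    J: R4 = R1 * R5   -> RAW hazard on R1
hz = hazards({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"})
```

WAR and WAW are name dependences only, which is why register renaming (giving J's write a fresh physical register) resolves them while a true RAW dependence must be honoured.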

INSTRUCTION PIPELINE DESIGN
Dynamic Instruction Scheduling
• Idea of Static Scheduling
o Compiler-based scheduling strategy to resolve interlocking among instructions

• Dynamic Scheduling
o Tomasulo’s Algorithm (Register-Tagging Scheme)
• Hardware based dependence-resolution
o Scoreboarding Technique
• Scoreboard: the centralized control unit
• A kind of data-driven mechanism

INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
• Branch Taken, Branch Target, Delay Slot
• Effect of Branching
o Parameters:
• k: No. of stages in the pipeline
• n: Total no. of instructions or tasks
• p: Percentage of branch instructions over n
• q: Percentage of successful branch instructions (branch taken) over p.
• b: Delay Slot
• τ: Pipeline Cycle Time
o Branch Penalty = q of (p of n) * bτ = pqnbτ
o Effective Execution Time:
• Teff = [k + (n-1)] τ + pqnbτ = [k + (n-1) + pqnb]τ
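The effective execution time formula above can be evaluated directly. The parameter values below are illustrative (a 5-stage pipeline with the worst-case delay slot b = k − 1), not numbers from the book:

```python
# Effective execution time with branching:
#   Teff = [k + (n - 1) + p*q*n*b] * tau

def effective_time(k, n, p, q, b, tau):
    return (k + (n - 1) + p * q * n * b) * tau

k, n, tau = 5, 1000, 1.0     # 5 stages, 1000 instructions, 1 ns cycle
p, q, b = 0.2, 0.6, k - 1    # 20% branches, 60% taken, delay slot b = k-1
T = effective_time(k, n, p, q, b, tau)

penalty = p * q * n * b * tau          # the pqnb*tau branch-penalty term
baseline = (k + (n - 1)) * tau         # time with no branching at all
```

Here the branch penalty (480 cycles) adds nearly 50% to the baseline 1004 cycles, which motivates the prediction and delayed-branch techniques on the following slides.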

INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
• Effect of Branching
o Effective Throughput:
• Heff = n/Teff
• Heff = n / {[k + (n-1) + pqnb]τ} = nf / [k + (n-1) + pqnb]
• As n → ∞ and with b = k − 1:
o H*eff = f / [pq(k−1) + 1]
• If p = 0 and q = 0 (no branching occurs):
o H**eff = f = 1/τ
o Performance Degradation Factor
• D = 1 − H*eff / f = pq(k−1) / [pq(k−1) + 1]

27
INSTRUCTION PIPELINE DESIGN
Branch Handling Techniques
• Branch Prediction
o Static Branch Prediction: based on branch code types
o Dynamic Branch prediction: based on recent branch history
• Strategy 1: Predict the branch direction based on information found at decode stage.
• Strategy 2: Use a cache to store target addresses at effective address calculation stage.
• Strategy 3: Use a cache to store target instructions at fetch stage
o Branch Target Buffer organization

• Delayed Branches
o A delayed branch of d cycles allows at most d-1 useful instructions to be executed following the
branch taken.
o Execution of these instructions should be independent of branch instruction to achieve a zero
branch penalty

ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
• Finite-precision arithmetic
• Overflow and Underflow
• Fixed-Point operations
o Notations:
• Signed-magnitude, one’s complement and two’s complement notation
o Operations:
• Addition: (n bits, n bits) → (n-bit) sum, 1-bit output carry
• Subtraction: (n bits, n bits) → (n-bit) difference
• Multiplication: (n bits, n bits) → (2n-bit) product
• Division: (2n bits, n bits) → (n-bit) quotient, (n-bit) remainder

ARITHMETIC PIPELINE DESIGN
Computer Arithmetic Operations
• Floating-Point Numbers
o X = (m, e) representation
• m: mantissa or fraction
• e: exponent with an implied base or radix r
• Actual value: X = m · r^e
o Operations on numbers X = (mx, ex) and Y = (my, ey)
• Addition: (mx · r^(ex−ey) + my, ey)
• Subtraction: (mx · r^(ex−ey) − my, ey)
• Multiplication: (mx · my, ex + ey)
• Division: (mx / my, ex − ey)
• Elementary Functions
o Transcendental functions like: Trigonometric, Exponential, Logarithmic, etc.
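The four (m, e) operations above can be written as a toy floating-point model. This sketch assumes radix r = 10 and, like the slide's formulas, performs no normalization or rounding; the sample values are illustrative:

```python
# Toy (mantissa, exponent) floating-point arithmetic with radix R = 10,
# following the slide's formulas (no normalization or rounding).

R = 10

def fadd(x, y):
    (mx, ex), (my, ey) = x, y
    # Align x's mantissa to y's exponent, then add mantissas.
    return (mx * R ** (ex - ey) + my, ey)

def fsub(x, y):
    (mx, ex), (my, ey) = x, y
    return (mx * R ** (ex - ey) - my, ey)

def fmul(x, y):
    (mx, ex), (my, ey) = x, y
    return (mx * my, ex + ey)       # multiply mantissas, add exponents

def fdiv(x, y):
    (mx, ex), (my, ey) = x, y
    return (mx / my, ex - ey)       # divide mantissas, subtract exponents

def value(x):
    m, e = x
    return m * R ** e

X, Y = (3, 2), (5, 1)               # X = 300, Y = 50
s = fadd(X, Y)                      # (35, 1), i.e. 350
```

Real hardware additionally normalizes the mantissa, rounds, and handles overflow/underflow of the exponent, which is where most of the pipeline stages in a floating-point adder go.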

ARITHMETIC PIPELINE DESIGN
Static Arithmetic Pipelines
• Separate units for fixed point operations and floating point operations
• Scalar and Vector Arithmetic Pipelines
• Uni-functional or Static Pipelines
• Arithmetic Pipeline Stages
o Majorly involve hardware to perform: Add and Shift micro-operations
o Addition using: Carry Propagation Adder (CPA) and Carry Save Adder (CSA)
o Shift using: Shift Registers

• Multiplication Pipeline Design


o E.g. multiplying two 8-bit numbers to yield a 16-bit product, using a Wallace tree of carry-save adders (CSAs) with a final carry-propagate adder (CPA).

ARITHMETIC PIPELINE DESIGN
Multifunctional Arithmetic Pipelines
• Multifunctional Pipeline:
o Static multifunctional pipeline
o Dynamic multifunctional pipeline

• Case Study: the TI-ASC static multifunctional pipeline architecture

SUPERSCALAR PIPELINE DESIGN
• Pipeline Design Parameters
o Pipeline cycle, Base cycle, Instruction issue rate, Instruction issue Latency, Simple Operation Latency
o ILP to fully utilize the pipeline
• Superscalar Pipeline Structure
• Data and Resource Dependencies
• Pipeline Stalling
• Superscalar Pipeline Scheduling
o In-order Issue and in-order completion
o In-order Issue and out-of-order completion
o Out-of-order Issue and out-of-order completion
• Superscalar Performance

SUPERSCALAR PIPELINE DESIGN

Parameter                       | Base Scalar Processor | Superscalar Processor (degree = K)
Pipeline cycle                  | 1 (base cycle)        | 1
Instruction issue rate          | 1                     | K
Instruction issue latency       | 1                     | 1
Simple operation latency        | 1                     | 1
ILP to fully utilize pipeline   | 1                     | K

SUPERSCALAR PIPELINE DESIGN

• Time required by the base scalar machine:
o T(1,1) = k + N − 1
• The ideal execution time required by an m-issue superscalar machine:
o T(m,1) = k + (N − m)/m
o Where:
• k is the time required to execute the first m instructions through the m pipelines of k stages
simultaneously
• the second term corresponds to the time required to execute the remaining N − m instructions,
m per cycle, through m pipelines
• The ideal speedup of the superscalar machine:
o S(m,1) = T(1,1)/T(m,1) = m(N + k − 1)/[N + m(k − 1)]
• As N → ∞, S(m,1) → m
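The timing and speedup formulas above, evaluated for one set of illustrative parameters (k stages, N instructions, issue width m):

```python
# Superscalar timing: base scalar time T(1,1), m-issue time T(m,1), and the
# ideal speedup S(m,1); all times in pipeline cycles.

def t_scalar(k, N):
    return k + N - 1                 # T(1,1)

def t_superscalar(k, N, m):
    return k + (N - m) / m           # T(m,1)

def ideal_speedup(k, N, m):
    return t_scalar(k, N) / t_superscalar(k, N, m)

k, N, m = 4, 1200, 3                 # illustrative values
S = ideal_speedup(k, N, m)           # just below the asymptotic limit m = 3
```

Even with N in the thousands, S stays strictly below m; the limit m is reached only as N grows without bound, and real dependences and resource conflicts push achieved speedup lower still.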