Wen-mei Hwu and S. J. Patel, 2005
ECE 511, University of Illinois
Lecture 3: Instruction Set Architecture
Outline
Instruction Set Architecture
Traditional issues
The (old) debate: RISC vs. CISC
New issues
The Big Picture
Requirements
Algorithms
Prog. Lang./OS
ISA
uArch
Circuit
Device
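Problem Focus
Performance Focus
[Figure: illustrations for the levels above. Device: transistor cross-section labeled Gate, Source, Drain, Si fin (body), and BOX. Software: the code fragment and instruction sequence below. Other labels: f1 to f5; s, q, p, j, i, fp; SPEC.]
f2() {
    f3(s2, &j, &i);
    *s2->p = 10;
    i = *s2->q + i;
}

i1: ld r1, b <p1>
i2: ld r2, c <p1>
i3: ld r5, z <p3>
i4: mul r6, r5, 3 <p3>
i5: add r3, r1, r2 <p1>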
Instruction Set Architecture
Application
Instruction Set Architecture: SPARC, MIPS, ARM, x86, HP-PA, IA-64
Implementation: Intel Pentium X; AMD K6, Athlon, Opteron; Transmeta Crusoe TM5x00
Instruction Set Architecture
Strong influence on cost/performance

New ISAs are rare, but versions are not
16-bit, 32-bit, and 64-bit x86 versions
Longevity is a strong function of marketing prowess
Traditional Issues
Strongly constrained by the number of bits available for instruction encoding
Opcodes/operands
Registers/memory
Addressing modes
Orthogonality
0, 1, 2, 3 address machines
Instruction formats
Decoding uniformity
Instruction Formats
Alpha (fixed length, 32 bits, 6-bit opcode)
TRAP: opcode
Branch: opcode, RA
Mem: opcode, RA, RB
Operate: opcode, RA, RB, RC
x86 (variable length)
Layout: prefixes, opcode, addr mode, displ, imm
prefixes: 0 to 4 bytes
opcode: 1 or 2 bytes
addr mode (ModR/M and SIB): 0 to 2 bytes
displ and imm: 0 to 8 bytes
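As a rough illustration of why a fixed-length format decodes uniformly, the sketch below extracts the register fields of an Alpha Operate-format word with shifts and masks. The bit positions used (opcode in bits 31:26, RA in 25:21, RB in 20:16, RC in 4:0) are those of the Alpha register Operate format; the function name `decode_operate` and the sample word are made up for this example.

```python
def decode_operate(word):
    """Split a 32-bit Alpha Operate-format word into its register fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26, 6 bits
        "ra": (word >> 21) & 0x1F,      # bits 25:21, 5 bits
        "rb": (word >> 16) & 0x1F,      # bits 20:16, 5 bits
        "rc": word & 0x1F,              # bits 4:0, 5 bits
    }

# Hypothetical word: opcode 0x10, RA=1, RB=2, RC=3 (function bits left at zero).
sample = (0x10 << 26) | (1 << 21) | (2 << 16) | 3
fields = decode_operate(sample)
```

Every field sits at a fixed position, so all of them can be extracted in parallel without first looking at the opcode.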
The (old) Debate: RISC vs. CISC
At the time, IBM 370 and VAX dominated

CMOS was an up-and-coming technology
Small number of transistors per chip
RISC was appealing
lower design complexity
easier to pipeline
higher performance when the design fits on a chip
IBM 801 (Cocke et al., 1982)
RISC I (Patterson et al., 1982)
MIPS (Hennessy et al., 1982)

What is RISC?
Fixed length instructions
Few formats
Load/Store
Few addressing modes
Simple decode/control

Many registers
Few unpipelinable instructions
Compiler Complexity
Hardware Complexity
The MIPS Pipeline
Compiler knows the pipeline organization
Schedules instructions around hazards
Branches are handled by delay slots
No need to interlock the pipeline
Pipeline stages: Fetch, Decode, ALU, Memory, WriteBack
Pipelining a CISC [Patt et al 85]
[Figure: pipeline stages Fetch (instruction bytes), Decode (emits RISC-like micro-operations into an op store), RF Read, Execute, Mem, WB]
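The decode step that emits RISC-like micro-operations can be sketched as below. This is a toy model, not the scheme of any real processor: the instruction tuples, the bracket notation for memory operands, the temporaries `t0`/`t1`, and the micro-op names are all assumptions made for illustration.

```python
def crack(inst):
    """Translate one toy CISC instruction into a list of RISC-like micro-ops.

    inst is (op, dest, src); an operand written as "[name]" is a memory operand.
    """
    op, dest, src = inst
    uops = []
    if src.startswith("["):
        # Memory source: load it into a temporary first.
        uops.append(("ld", "t0", src[1:-1]))
        src = "t0"
    if dest.startswith("["):
        # Memory destination: load, operate, then store back.
        addr = dest[1:-1]
        uops.append(("ld", "t1", addr))
        uops.append((op, "t1", "t1", src))
        uops.append(("st", "t1", addr))
    else:
        uops.append((op, dest, dest, src))
    return uops

mem_dest = crack(("add", "[x]", "r2"))   # read-modify-write on memory
reg_only = crack(("add", "r1", "r2"))    # register-to-register
```

The register-only form maps to a single micro-op, while the memory-destination form becomes a load, an ALU operation, and a store, each of which fits the RF Read, Execute, Mem, WB stages.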
RISC Baggage
In hindsight, even RISC architectures, like CISC, suffered from legacy effects.
Delay slots
Used for dealing with branches in short pipelines
Help primarily with target generation
Become a burden for future generations whose pipelines need to be deeper
Register windows
Quick save/restore of state for procedure calls
Reduce procedure call overhead to 1 cycle
Make register renaming and out-of-order execution more complex

The Dynamic-Static Interface
Perhaps the main contribution of the RISC Revolution

John Cocke (IBM) is credited with the original idea.

John Hennessy was a major driving force later, followed by the IMPACT team at Illinois.

A willingness to make design tradeoffs freely between the architecture and implementation (Colwell et al., 1985).

This legacy is still alive and kicking today.
DSI and Static Optimization
[Figure: ISAs plotted by granularity of ISA instruction against potential for static optimization. Points shown: ISAs for reconfigurable architectures, VAX ISA, MIPS ISA, Itanium ISA.]
Variable Instruction Format
Motivation 1: to accommodate a large number of opcodes with nonuniform frequency of occurrence
VAX has 304 opcodes. If we insisted on using uniform opcode encoding, we would need 9 bits.
Due to the policy of byte alignment, one would need two bytes to encode each VAX opcode.
An observation: some opcodes are used more often than others. The top 200 opcodes account for about 98% of the dynamic opcode usage.
Instead of using 2 bytes to encode all 304 opcodes, use 1 byte to encode the frequently used ones and 2 bytes to encode the infrequently used ones.
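The arithmetic behind Motivation 1 can be checked with a short sketch. The 304 opcodes and the 98% figure come from the slide; treating the frequent opcodes as exactly 1 byte and all others as exactly 2 bytes is a simplifying assumption.

```python
import math

NUM_OPCODES = 304        # VAX opcode count from the slide
FREQUENT_SHARE = 0.98    # dynamic usage covered by the one-byte opcodes

# Uniform encoding: every opcode gets the same number of bits.
bits_uniform = math.ceil(math.log2(NUM_OPCODES))
bytes_uniform = math.ceil(bits_uniform / 8)   # rounded up by byte alignment

# Variable encoding: 1 byte for frequent opcodes, 2 bytes for the rest.
avg_bytes_variable = FREQUENT_SHARE * 1 + (1 - FREQUENT_SHARE) * 2
```

Under these assumptions the average dynamic opcode size drops from 2 bytes to about 1.02 bytes.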
Variable Instruction Format (cont.)
Motivation 2: to allow each instruction exactly the number of operands it needs.
RET (0), INC (1), ADD (2 or 3), etc.
Motivation 3: to allow each operand specifier exactly the number of bytes it needs.
Reg (1), Disp (1 to 8), etc.
All motivations come from reducing the number of bytes needed to
represent the program
be fetched during execution
VIF Cost
Sequential Decoding Problem
The decoder cannot be sure where the 1st operand specifier is until the opcode is decoded.
The decoder cannot locate the ith operand specifier until the (i-1)th operand specifier is decoded.
The decoder cannot be sure where the jth instruction starts until the last operand specifier of the (j-1)th instruction is decoded.
Typical solution: instruction buffering with a multi-stage decode pipeline, plus a post-decode I-cache or trace cache
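The chain of dependences in the sequential decoding problem can be seen in a small sketch. The encoding below is a hypothetical toy format, not the VAX or x86 format: the opcode byte fixes the number of operand specifiers, and the high nibble of each specifier's first byte fixes that specifier's length.

```python
# Hypothetical toy encoding used only for this example.
OPERAND_COUNT = {0x01: 0, 0x02: 1, 0x03: 2}   # opcode byte -> number of specifiers
SPECIFIER_LEN = {0x0: 1, 0x1: 2, 0x2: 5}      # specifier mode -> total bytes

def instruction_starts(code):
    """Return the byte offset at which each instruction begins."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        n_operands = OPERAND_COUNT[code[pc]]   # must decode the opcode first
        pc += 1
        for _ in range(n_operands):
            mode = code[pc] >> 4               # must decode each specifier in order
            pc += SPECIFIER_LEN[mode]
    return starts

code = bytes([
    0x03, 0x05, 0x10, 0x7F,   # 2 operands: a 1-byte and a 2-byte specifier
    0x01,                     # no operands
    0x02, 0x20, 1, 2, 3, 4,   # 1 operand: a 5-byte specifier
    0x01,
])
starts = instruction_starts(code)
```

Each value of `pc` depends on everything decoded before it, which is why the loop cannot be parallelized without buffering or predecoding.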
VIF Cost (cont.)
Non-aligned Instruction Access: instructions are not aligned to any byte position in each memory word.
Instruction opcodes and operand specifiers are not aligned to the decoding logic when fetched from memory.
Instructions may spill over cache block boundaries and page boundaries
Typical solution: instruction buffer that decouples fetch and decode
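A small sketch of the spill-over problem: given a sequence of instruction lengths laid out back to back, it reports which instructions cross a cache block boundary. The instruction lengths and the 16-byte block size are invented for the example.

```python
def straddlers(lengths, block_size):
    """Return indices of instructions whose bytes cross a block boundary."""
    crossing = []
    addr = 0
    for i, n in enumerate(lengths):
        first_block = addr // block_size
        last_block = (addr + n - 1) // block_size
        if first_block != last_block:
            crossing.append(i)
        addr += n
    return crossing

# Hypothetical variable-length stream versus a fixed 4-byte stream.
variable = straddlers([3, 6, 2, 7, 5, 4, 6], block_size=16)
fixed = straddlers([4] * 8, block_size=16)
```

With fixed 4-byte instructions and a block size that is a multiple of 4, no instruction ever straddles a block, so the fetch unit never needs two blocks to deliver one instruction.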
Data Dependent Decoding
What an instruction does depends on the values of the explicit and/or implicit input operands.
Cause: generality of instructions
Example: string move instructions in x86 generate a different number of loads and stores depending on an input operand value
Typical solution: use microcode when executing these instructions
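The string move example can be sketched as a toy expansion in which the number of micro-operations depends on a run-time count, in the way the count register controls an x86 REP MOVSB. The micro-op tuples and the `src+i`/`dst+i` address notation are assumptions for illustration, not actual x86 microcode.

```python
def rep_movsb_uops(count):
    """Expand a toy byte-wise string move into its load/store micro-ops.

    count plays the role of the count register of an x86 REP MOVSB.
    """
    uops = []
    for i in range(count):
        uops.append(("ld", "tmp", f"src+{i}"))   # read one byte from the source
        uops.append(("st", "tmp", f"dst+{i}"))   # write it to the destination
    return uops

short_move = rep_movsb_uops(3)
empty_move = rep_movsb_uops(0)
```

The decoder cannot know the length of this sequence from the instruction bytes alone, which is why such instructions are typically handed to a microcode sequencer.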
Number of Registers
A large number of registers allows the compiler to eliminate memory references and redundant computation by storing more values in the register file
Cost: more bits to encode register operands
Benefit: support for the compiler to achieve high performance
MIPS had 32, IA-64 has 128 (the number of levels of metal lines is a factor here)
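The encoding cost can be made concrete with a short calculation. The register counts are from the slide; the assumption of three register operands per instruction is made only for this example.

```python
import math

def register_field_bits(num_registers, operands=3):
    """Bits spent naming registers in one instruction with `operands` register fields."""
    return operands * math.ceil(math.log2(num_registers))

mips_bits = register_field_bits(32)    # 3 fields of 5 bits
ia64_bits = register_field_bits(128)   # 3 fields of 7 bits
```

Under that assumption, going from 32 to 128 registers adds 6 bits of register specifiers to every three-operand instruction.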

Subtle Compatibility Issues
Most arise from incomplete ISA specification
Incompleteness is needed for extensibility
User-imposed requirements
Inappropriate use of ISA
Undefined bits being used
Implementation-imposed compatibility
Bug compatibility
Pentium II had to reproduce the bugs of Pentium
Today's Issues
Where should the DSI be placed? What control is given to the compiler (static) and what is relegated to the hardware (dynamic)?
This is becoming a more pressing issue as the power crisis continues to grow
Information flow across the DSI
Speculation, predication, registers, analysis info
There is an emerging difference between the target architecture and the implementation architecture
Java, .NET