
Lecture-3

Instruction Set Architecture (ISA)


Classifications and Addressing Modes

Recap

EE/CS520- Comp. Archi.

9/6/2012

What's Performance?
Two common measures

Latency (total time taken to do task X)
  Also called response time or execution time
  Generally of interest to a desktop user

Throughput (how often task X can be done in a given time)
  Of interest to the administrator of a large data processing center

Measuring Performance
Benchmarks
Real applications and application suites
  E.g., SPEC CPU2000, SPEC CPU2006, TPC-C, etc.
Kernels
  Key pieces of real applications
  Easier and quicker to set up and run
  Often not really representative of the entire app
Toy programs, synthetic benchmarks, etc.
  Not very useful for reporting
  Sometimes used to test/stress specific functions/features
Synthetic benchmarks
  Fake programs designed to imitate real applications
The last three are discredited, because they can be rigged to make a product look faster

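Amdahl's Law

$$\text{Speedup} = \frac{\text{Execution Time}_{\text{without enhancement}}}{\text{Execution Time}_{\text{with enhancement}}} = \frac{\text{Execution Time}_{\text{old}}}{\text{Execution Time}_{\text{new}}}$$

What if the enhancement does not enhance everything?

$$\text{Speedup} = \frac{\text{Execution Time without using enhancement at all}}{\text{Execution Time using enhancement when possible}}$$

$$\text{Execution Time}_{\text{new}} = \text{Execution Time}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$

$$\text{Overall Speedup} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$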
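Amdahl's Law is easy to check numerically. The sketch below is only an illustration: the function name `overall_speedup` and the sample inputs (40% of the time sped up 10x) are assumptions made for this example and do not come from the lecture.

```python
def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup by Amdahl's Law.

    fraction_enhanced: share of the old execution time that can use the enhancement (0..1)
    speedup_enhanced:  speedup of that share while the enhancement is in use (> 0)
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical inputs: 40% of the execution time is sped up 10x.
example = overall_speedup(0.4, 10.0)
print(f"overall speedup: {example:.4f}")

# Even an extremely large speedup of the enhanced part is capped by the
# part that is not enhanced: the limit here is 1 / (1 - 0.4).
print(f"near the limit:  {overall_speedup(0.4, 1e9):.4f}")
```

The second call shows why a large speedup applied to a small fraction of the run gives a modest overall gain.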

Price Vs. Performance


CPU Performance: Example 2

Freq. of FP ops = 25%
CPI_avg of FP ops = 4.0
CPI_avg of other ops = 1.33
Freq. of FPSqr = 2%
CPI of FPSqr = 20

Either
  Option 1: decrease CPI of FPSqr to 2
  Option 2: decrease CPI_avg of all FP ops to 2.5

Result: Option 2 is better than Option 1
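The result of Example 2 can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that the slide does not state explicitly: FPSqr is counted inside the 25% of FP ops, and instruction count and clock rate stay the same, so the option with the lower average CPI is the faster one. The variable and function names are made up for this illustration.

```python
# Instruction mix from the example: (frequency, CPI)
FREQ_FP, CPI_FP = 0.25, 4.0          # all FP ops, FPSqr included (assumption)
FREQ_OTHER, CPI_OTHER = 0.75, 1.33   # everything else
FREQ_FPSQR, CPI_FPSQR = 0.02, 20.0   # FPSqr alone


def average_cpi(mix):
    """Weighted average CPI for a list of (frequency, CPI) pairs."""
    return sum(freq * cpi for freq, cpi in mix)


cpi_original = average_cpi([(FREQ_FP, CPI_FP), (FREQ_OTHER, CPI_OTHER)])

# Option 1: FPSqr drops from CPI 20 to CPI 2; only 2% of instructions benefit.
cpi_option1 = cpi_original - FREQ_FPSQR * (CPI_FPSQR - 2.0)

# Option 2: the average CPI of all FP ops drops from 4.0 to 2.5.
cpi_option2 = average_cpi([(FREQ_FP, 2.5), (FREQ_OTHER, CPI_OTHER)])

for name, cpi in [("option 1", cpi_option1), ("option 2", cpi_option2)]:
    print(f"{name}: CPI = {cpi:.4f}, speedup = {cpi_original / cpi:.3f}")
```

Under these assumptions Option 2 ends up with the lower CPI, which matches the result on the slide.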

Price vs. Performance Trade-Off

Without optimized FPSqr
  System costs PKR 40,000 to manufacture
  Selling price is PKR 55,000 -> PKR 15K profit per system
  If we sell 10,000 systems, that's PKR 150M in profit

With optimized FPSqr
  System costs an extra PKR 10,000
  Selling price is PKR 70,000 -> PKR 20K profit per system
  But only a few people care to buy that system:
  we sell only 4,000 systems and make PKR 80M in profit
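The profit figures above follow from (price - cost) x volume. A minimal sketch, with `total_profit` as a hypothetical helper name:

```python
def total_profit(unit_cost: int, unit_price: int, units_sold: int) -> int:
    """Total profit in PKR: per-system margin times the number of systems sold."""
    return (unit_price - unit_cost) * units_sold


# Without optimized FPSqr: cost 40,000, price 55,000, 10,000 systems sold.
profit_without = total_profit(40_000, 55_000, 10_000)

# With optimized FPSqr: cost rises by 10,000, price 70,000, only 4,000 sold.
profit_with = total_profit(40_000 + 10_000, 70_000, 4_000)

print(f"without FPSqr: PKR {profit_without:,}")
print(f"with FPSqr:    PKR {profit_with:,}")
```

The higher per-system margin does not make up for the lower sales volume.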

Price vs. Performance Trade-Off

How much effective performance do I get out of it?
  A 10x speedup for a small fraction of instructions isn't that efficient
How much more do I have to invest in it?
  R&D, testing, marketing costs
How much more can I charge for it?
  Does the market even care?
How does the price change affect the volume?

Price vs. Performance Trade-Off

[Figure: price vs. performance comparison of five systems, priced at $3346, $3099, $2907, $5201 and $2145]

Instruction Set Architecture


Classes of Computer
Desktop Computing

First and still the largest market in monetary terms


Well organized in terms of applications and benchmarks
Price vs. Performance is the most critical comparison

Servers

large-scale & more reliable computing services


Dependability
Scalability
Computational capacity, storage, I/O bandwidth, memory

Embedded Computing

The fastest growing portion of computing market


Numerous applications, stringent constraints:
Power consumption, price, memory (area), response time


Instruction Set Architecture

ISA is the portion of the computer visible to the programmer or the compiler writer

Basic domains of computer applications
  Desktop computing
    Concentrates on integer and floating point (FP) ops
    Little regard for program size or power
  Servers
    Concentrates on integer ops and character strings
  Embedded computing
    Targets code size (memory footprint) and power
    FP ops can be omitted if not needed

ISAs for all three are pretty similar
  A MIPS-like ISA mostly serves all of them

Hybrid ISAs
Example: 80x86 (CISC) and RISC

Pentium 4 uses hardware to translate 80x86 instructions into RISC-like internal operations
  Programmer writes 80x86 program code (externally)
  Processor executes RISC-like insts (internally)

ISA Classifications
Stack based
Accumulator based
Register-based
  Register-Register (aka Load-Store) based
  Register-Memory based
  Memory-Memory based

Stack based ISA

Implicit operands on Top of the Stack (ToS)

[Figure: stack datapath with ALU input and output; labels A, B, C]

Code for C = A + B
  PUSH A
  PUSH B
  ADD
  POP C
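The stack code for C = A + B can be traced with a tiny interpreter. This is a sketch for illustration only: the function `run_stack_program`, the dictionary used as memory and the values stored in A and B are assumptions, not part of any real ISA.

```python
def run_stack_program(program, memory):
    """Execute PUSH/ADD/POP instructions; operands are implicit on the stack."""
    stack = []
    for instruction in program:
        opcode, *operands = instruction.split()
        if opcode == "PUSH":      # memory -> top of stack
            stack.append(memory[operands[0]])
        elif opcode == "ADD":     # pops two values, pushes their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "POP":     # top of stack -> memory
            memory[operands[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


# Hypothetical memory contents for A and B.
memory = {"A": 2, "B": 3, "C": 0}
run_stack_program(["PUSH A", "PUSH B", "ADD", "POP C"], memory)
print(memory["C"])
```

Note that ADD names no operands at all; that is what makes the operands implicit.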

Accumulator based ISA

One implicit operand is the accumulator itself

[Figure: accumulator datapath with ALU input and output; labels A, B, C]

Code for C = A + B
  Load A     # mem to accum
  Add B
  Store C    # accum to mem

Register-Memory based ISA

Explicit operands are used

[Figure: register-memory datapath; R1 = A, result in R3]

Code for C = A + B
  Load R1, A
  Add R3, R1, B
  Store R3, C

Register-Register based ISA

Explicit operands are used

[Figure: register-register datapath; R1 = A, R2 = B, R3 = C]

Code for C = A + B
  Load R1, A
  Load R2, B
  Add R3, R1, R2
  Store R3, C

What's the Popular Choice?

Register-based architecture (aka GPR* architecture)
  Load-Store (Register-Register)
  Used by virtually every new architecture designed since 1980

Why?
  Registers are internal to the processor, so faster than memory
  Registers can hold variables
    Once the variables are loaded in regs, memory traffic is reduced
    Program code density improves, as regs can be named with fewer bits than memory locations
      e.g. 32 regs are encoded in 5 bits, while a byte address in a 128 MB memory needs 27 bits

*GPR = General Purpose Register
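The code-density argument comes down to how many bits it takes to name one operand. A short sketch, assuming a byte-addressable memory; `bits_to_name` is a hypothetical helper:

```python
def bits_to_name(count: int) -> int:
    """Minimum number of bits that gives each of `count` items a distinct name."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


register_bits = bits_to_name(32)             # 32 general purpose registers
address_bits = bits_to_name(128 * 2**20)     # 128 MB, one address per byte

# Operand fields of a three-operand instruction in each style.
print(f"three register operands: {3 * register_bits} bits")
print(f"three memory operands:   {3 * address_bits} bits")
```

With these numbers, three register fields fit in 15 bits, while three full memory addresses would need 81 bits.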

GPR-Architecture (1)
Two major ISA characteristics

No. of operands supported by an ALU instruction (2 or 3)
  Add R1, R2        # R1 is both src1 and dst
  Add R1, R2, R3    # R1 is dst, R2 & R3 are src

How many of the operands may be memory addresses

# of mem adr.   # of operands   Archi. type   Examples
0               3               Load-Store    Alpha, ARM, MIPS, PowerPC
1               2               Reg-Mem       IBM 360/370, Intel 80x86
2               2               Mem-Mem       VAX (obsolete)
3               3               Mem-Mem       VAX (obsolete)

GPR-Architecture (2)

Type (#mem, #ops): Reg-Reg (0,3)
  Advantages: simple, fixed-length encoding, equal CPI
  Disadvantages: higher IC, larger program size

Type (#mem, #ops): Reg-Mem (1,2)
  Advantages: extra load inst not needed, easy to encode, good code density
  Disadvantages: operands are not equivalent since one operand is destroyed, CPI varies by operand location

Type (#mem, #ops): Mem-Mem (2,2) or (3,3)
  Advantages: most compact
  Disadvantages: large variations in inst size, large CPI, memory bottleneck

Memory Addressing (1)

How memory addresses are interpreted

Little Endian
  Byte at address xx000 is put at the least significant position
Big Endian
  Byte at address xx000 is put at the most significant position

Little Endian ordering is awkward for character strings: strings are laid out in Big Endian fashion, so a string read as a Little Endian word appears reversed
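Both byte orders can be inspected with Python's built-in `int.to_bytes` and `int.from_bytes`. The 32-bit value and the string "ABCD" below are arbitrary examples chosen for this sketch.

```python
value = 0x0A0B0C0D

# Byte at the lowest address comes first in each bytes object.
little = value.to_bytes(4, byteorder="little")   # least significant byte first
big = value.to_bytes(4, byteorder="big")         # most significant byte first

print("little endian:", little.hex(" "))
print("big endian:   ", big.hex(" "))

# The same four characters in memory, read as one 32-bit word.
text = b"ABCD"
word_little = int.from_bytes(text, byteorder="little")
word_big = int.from_bytes(text, byteorder="big")

# In the Big Endian word 'A' (0x41) stays on the left, matching the string;
# in the Little Endian word the characters appear in reverse order.
print(f"as little endian word: {word_little:#010x}")
print(f"as big endian word:    {word_big:#010x}")
```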

Memory Addressing (2)

Accesses to objects larger than a byte must be aligned
  An access to an object of s bytes at byte address A is said to be aligned when A mod s = 0
  E.g. a 32-bit (4B) object has to be placed at an address that is evenly divisible by 4, i.e. 0x2000 or 0x2004, and not 0x2001

Complications with misaligned memory access?
  Memories are aligned on multiples of word or double-word boundaries
  A misaligned reference is inefficient, as it needs multiple aligned memory references to implement a single access
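The alignment rule and the cost of a misaligned access can be sketched as follows. The helper names are hypothetical, and the 4-byte default for `width` is an assumption about how wide one aligned memory reference is.

```python
def is_aligned(address: int, size: int) -> bool:
    """An access of `size` bytes at `address` is aligned when address mod size == 0."""
    return address % size == 0


def aligned_references_needed(address: int, size: int, width: int = 4) -> int:
    """Number of aligned `width`-byte memory references that cover the access."""
    first_block = address // width
    last_block = (address + size - 1) // width
    return last_block - first_block + 1


for address in (0x2000, 0x2004, 0x2001):
    print(
        f"{address:#06x}: aligned={is_aligned(address, 4)}, "
        f"references={aligned_references_needed(address, 4)}"
    )
```

A 4-byte access at 0x2001 straddles two aligned words, so the memory has to be referenced twice for a single access.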

Memory Addressing (3)

Aligned (A) and misaligned (M) accesses by object width, following A mod s = 0

                  Low-order 3 bits of the byte address
Width             0   1   2   3   4   5   6   7
1B                A   A   A   A   A   A   A   A
2B (HW)           A   M   A   M   A   M   A   M
4B (W)            A   M   M   M   A   M   M   M
8B (DW)           A   M   M   M   M   M   M   M

Addressing Modes (1)


Addressing Modes (2)

Addressing modes can reduce the instruction count by folding more work into a single complex instruction; however, they can increase the resulting CPI and the hardware complexity

Addressing Modes (3)

Immediate and Displacement are the dominant modes

[Figure: memory addressing mode frequency on the VAX for three different programs]

Displacement Mode

add R1, 100(R2)

Displacement values are widely distributed

[Figure: percentage of displacements vs. no. of bits of displacement]

Immediate Mode (1)

Widely used in arithmetic ops
  In comparisons, for example
    cmp R1, #400

Also in moves
  When a constant value is needed in a reg.
    mov R1, #400
  Both constants written in the code and address constants

Immediate Mode (2)

A sizable share of loads and ALU ops use immediate mode in integer programs; overall, about 1/5 of all instructions

Immediate Mode (3)

Small imm. values are used most often; large imm. values are seldom used

[Figure: percentage of immediates vs. no. of bits needed for the immediate]

Operations in the ISA

Operator type        Examples
Arith. or Logical    Integer arithmetic: add, sub, and, or, shift
Data transfer        Load, store
Control              Branch, jump, procedure call, procedure return
System               OS call, virtual memory management inst
FP                   FP ops: add, sub, mul
Decimal              Decimal add, decimal mul, decimal-to-character conversion
String               String move, string compare, string search
Graphics             Pixel and vertex operations, (de)compression

Top 10 instructions in 80x86


Inst Frequency

[Figure: occurrence of RISC instructions; x-axis: different RISC insts (load, store, add, sub, or, and, xor, sl, sr, mult, div, sqrt)]

RISC Machines
90-10 Rule:
By profiling program performance, we note that
  Only 10% of the instructions are used 90% of the time
  The remaining 90% of instructions, which are rarely used, are costly in time and silicon area

Idea: we limit the instructions in an ISA to those that are most frequently used
  We carry out complex instructions by combining simple instructions
  Instructions executable in 1 cycle -> higher clock frequency
  Multiple simple instructions at 1 cycle each, instead of 1 complex instruction in n cycles

RISC processor: Reduced Instruction Set Computer

Assignment-1: Review of Assembly Language

Assignment:
  Some C code to be converted into MIPS assembly code
  Will be available on LMS under the assignments tab on Thursday, 6th September 2012 (today) by 12:00 pm

Helping material:
  A PDF about the basics of assembly language and C->MIPS conversion
  Will be available on LMS under reading material
  You can discuss it with the TA during the tutorial slot on Friday

Submission Deadline: Thursday, 13th September 2012, 12:00 pm
Late assignments: 25% marks deduction per day
Submission Format: Hard copy to the TA or to me in my office
