
Computer Architecture

Chapter 1
Fundamentals

Chapter 1 - Fundamentals 1
Introduction

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Chapter 1 - Fundamentals 2
Art and
Architecture

What’s the difference between Art and Architecture?

Lyonel Feininger,
Marktkirche in Halle

Chapter 1 - Fundamentals 3
Art and Architecture

Notre Dame
de Paris

What’s the difference between Art and Architecture?


Chapter 1 - Fundamentals 4
What’s Computer Architecture?
The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
Amdahl, Blaauw, and Brooks, 1964


Chapter 1 - Fundamentals 5
What’s Computer Architecture?
• 1950s to 1960s: Computer Architecture Course
Computer Arithmetic.
• 1970s to mid 1980s: Computer Architecture Course
Instruction Set Design, especially ISA appropriate for
compilers. (What we’ll do in Chapter 2)
• 1990s to 2000s: Computer Architecture Course
Design of CPU, memory system, I/O system,
Multiprocessors. (All evolving at a tremendous rate!)

Chapter 1 - Fundamentals 6
The Task of a
Computer Designer
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

[Design-cycle figure: Evaluate Existing Systems for Bottlenecks -> Simulate New Designs and Organizations -> Implement Next Generation System -> back to evaluation; the cycle is driven by Benchmarks, Technology Trends, Workloads, and Implementation Complexity.]

Chapter 1 - Fundamentals 7
Technology and
Computer Usage Trends
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

When building a cathedral, numerous very practical considerations need to be taken into account:
• available materials
• worker skills
• willingness of the client to pay the price.

Similarly, Computer Architecture is about working within constraints:
• What will the market buy?
• Cost/Performance
• Tradeoffs in materials and processes

Chapter 1 - Fundamentals 8
Trends
Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed onto a chip doubles every year (he later revised the rate to roughly every two years).
This trend has CONTINUED to hold since then.
[Chart: Transistors Per Chip, 1970-2005, log scale from 1.E+03 to 1.E+08; data points include the 4004, 8086, 80286, 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III, Power PC 601, and Power PC G3.]
Chapter 1 - Fundamentals 9
Trends
Processor performance, as measured by the SPEC benchmarks, has also risen dramatically.

5000
Alpha 6/833

4000

3000

2000
DEC Alpha 5/500
DEC
1000 Sun MIPS
IBM AXP/
RS/ 500 DEC Alpha 4/266 DEC Alpha 21264/600
-4/ M
6000
260 2000
0
87
88
89
90
91
92
93
94
95
96
97
98
99
2000
Chapter 1 - Fundamentals 10
Trends
Memory Capacity (and Cost) have changed dramatically in the last 20
years.

[Chart: DRAM capacity per chip by year, 1970-2000, log scale.]

Year   Size (Mb)   Cycle time
1980   0.0625      250 ns
1983   0.25        220 ns
1986   1           190 ns
1989   4           165 ns
1992   16          145 ns
1996   64          120 ns
2000   256         100 ns
Chapter 1 - Fundamentals 11
Trends
In terms of speed, the CPU has improved dramatically, while memory and disk have improved only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.

Capacity Speed (latency)


Logic 2x in 3 years 2x in 3 years
DRAM 4x in 3 years 2x in 10 years
Disk 4x in 3 years 2x in 10 years

Chapter 1 - Fundamentals 12
Measuring And
Reporting Performance
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

This section talks about:

1. Metrics – how do we describe in a numerical way the performance of a computer?

2. What tools do we use to find those metrics?

Chapter 1 - Fundamentals 13
Metrics
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAC/Sud Concorde   3 hours       1350 mph   132          178,200

(pmph = passenger-miles per hour = passengers x speed)

• Time to run the task (ExTime)
  – Execution time, response time, latency
• Tasks per day, hour, week, sec, ns … (Performance)
  – Throughput, bandwidth
Chapter 1 - Fundamentals 14
Metrics - Comparisons
"X is n times faster than Y" means

ExTime(Y) Performance(X)
--------- = ---------------
ExTime(X) Performance(Y)

Speed of Concorde vs. Boeing 747

Throughput of Boeing 747 vs. Concorde
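A minimal C sketch of the two comparisons, using only the figures from the table on the previous slide (speed favors the Concorde, throughput favors the 747):

#include <stdio.h>

int main(void) {
    /* Figures from the plane table above. */
    double concorde_mph = 1350.0, b747_mph = 610.0;
    double concorde_pmph = 178200.0, b747_pmph = 286700.0;

    /* "X is n times faster than Y" means Performance(X)/Performance(Y) = ExTime(Y)/ExTime(X). */
    printf("Speed: the Concorde is %.1f times faster than the 747\n",
           concorde_mph / b747_mph);              /* about 2.2 */
    printf("Throughput: the 747 is %.1f times faster than the Concorde\n",
           b747_pmph / concorde_pmph);            /* about 1.6 */
    return 0;
}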


Chapter 1 - Fundamentals 15
Metrics - Comparisons
Pat has developed a new product, "rabbit," whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used performance-engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons

Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle    30                    0.0333                3
Rabbit    60                    0.0166                1

Which of the following statements reflect the performance comparison of rabbit and turtle?

o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.

Chapter 1 - Fundamentals 16
Metrics - Throughput
Level of the system            Throughput metric
Application                    Answers per month / Operations per second
Programming Language
Compiler
ISA                            (Millions of) instructions per second: MIPS
                               (Millions of) (FP) operations per second: MFLOP/s
Datapath
Control                        Megabytes per second
Function Units
Transistors, Wires, Pins       Cycles per second (clock rate)
Chapter 1 - Fundamentals 17
Methods For Predicting
Performance
• Benchmarks, Traces, Mixes
• Hardware: Cost, delay, area, power estimation
• Simulation (many levels)
– ISA, RT, Gate, Circuit
• Queuing Theory
• Rules of Thumb
• Fundamental “Laws”/Principles

Chapter 1 - Fundamentals 18
Benchmarks
SPEC: System Performance Evaluation
Cooperative
• First Round 1989
– 10 programs yielding a single number (“SPECmarks”)
• Second Round 1992
– SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
• Compiler flags unlimited. Example flags (March 93, DEC 4000 Model 610):
spice: unix.c:/def=(sysv,has_bcopy,”bcopy(a,b,c)=
memcpy(b,a,c)”
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
• Third Round 1995
– new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating
point)
– “benchmarks useful for 3 years”
– Single flag setting for all programs: SPECint_base95, SPECfp_base95

Chapter 1 - Fundamentals 19
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000):

Program Language What Is It


164.gzip C Compression
175.vpr C FPGA Circuit Placement and Routing
176.gcc C C Programming Language Compiler
181.mcf C Combinatorial Optimization
186.crafty C Game Playing: Chess
197.parser C Word Processing
252.eon C++ Computer Visualization
253.perlbmk C PERL Programming Language
254.gap C Group Theory, Interpreter
255.vortex C Object-oriented Database
256.bzip2 C Compression
300.twolf C Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
Chapter 1 - Fundamentals 20
Benchmarks
CFP2000 (Floating Point Component of SPEC
CPU2000):
Program Language What Is It
168.wupwise Fortran 77 Physics / Quantum Chromodynamics
171.swim Fortran 77 Shallow Water Modeling
172.mgrid Fortran 77 Multi-grid Solver: 3D Potential Field
173.applu Fortran 77 Parabolic / Elliptic Differential Equations
177.mesa C 3-D Graphics Library
178.galgel Fortran 90 Computational Fluid Dynamics
179.art C Image Recognition / Neural Networks
183.equake C Seismic Wave Propagation Simulation
187.facerec Fortran 90 Image Processing: Face Recognition
188.ammp C Computational Chemistry
189.lucas Fortran 90 Number Theory / Primality Testing
191.fma3d Fortran 90 Finite-element Crash Simulation
200.sixtrack Fortran 77 High Energy Physics Accelerator Design
301.apsi Fortran 77 Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/
Chapter 1 - Fundamentals 21
Benchmarks Sample Results For
SpecINT2000
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

System under test: Intel OR840 (1 GHz Pentium III processor)

                    Base      Base      Base     Peak      Peak      Peak
Benchmark           Ref Time  Run Time  Ratio    Ref Time  Run Time  Ratio
164.gzip            1400      277       505*     1400      270       518*
175.vpr             1400      419       334*     1400      417       336*
176.gcc             1100      275       399*     1100      272       405*
181.mcf             1800      621       290*     1800      619       291*
186.crafty          1000      191       522*     1000      191       523*
197.parser          1800      500       360*     1800      499       361*
252.eon             1300      267       486*     1300      267       486*
253.perlbmk         1800      302       596*     1800      302       596*
254.gap             1100      249       442*     1100      248       443*
255.vortex          1900      268       710*     1900      264       719*
256.bzip2           1500      389       386*     1500      375       400*
300.twolf           3000      784       382*     3000      776       387*
SPECint_base2000                        438
SPECint2000                                                          442
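Each ratio above is 100 x (reference time / run time), and the single SPECint figure is the geometric mean of the twelve per-benchmark ratios. A small C sketch, using the base columns from the table, reproduces both:

#include <math.h>
#include <stdio.h>

int main(void) {
    /* Base reference and run times from the table above. */
    double ref[] = {1400, 1400, 1100, 1800, 1000, 1800, 1300, 1800, 1100, 1900, 1500, 3000};
    double run[] = { 277,  419,  275,  621,  191,  500,  267,  302,  249,  268,  389,  784};
    int n = sizeof(ref) / sizeof(ref[0]);

    double log_sum = 0.0;
    for (int i = 0; i < n; i++) {
        double ratio = 100.0 * ref[i] / run[i];   /* e.g. 164.gzip: 100*1400/277 = 505 */
        log_sum += log(ratio);
    }
    /* SPECint_base2000 is the geometric mean of the per-benchmark ratios. */
    printf("Geometric mean = %.0f\n", exp(log_sum / n));   /* about 438 */
    return 0;
}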

Chapter 1 - Fundamentals 22
Benchmarks
Performance Evaluation
• “For better or worse, benchmarks shape a field”
• Good products are created when we have:
  – Good benchmarks
  – Good ways to summarize performance
• Since sales are, in part, a function of performance relative to the competition, companies invest in improving the product as it is reported by the performance summary.
• If the benchmarks or the summary are inadequate, a company must choose between improving its product for real programs and improving it just to get better reported numbers and more sales.
  Sales almost always wins!
• Execution time is the measure of computer performance!

Chapter 1 - Fundamentals 23
Benchmarks
How to Summarize Performance
Management would like to have one number.
Technical people want more:
1. They want to have evidence of reproducibility – there should be enough
information so that you or someone else can repeat the experiment.
2. There should be consistency when doing the measurements multiple
times.

How would you report these results?


                    Computer A   Computer B   Computer C
Program P1 (secs)   1            10           20
Program P2 (secs)   1000         100          20
Total Time (secs)   1001         110          40
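Here is a sketch of three common ways to summarize the numbers above: total time, arithmetic mean, and (taking machine A as the reference, an arbitrary choice) the geometric mean of per-program speedups. Which summary you pick changes how the machines compare, which is exactly the difficulty this slide is pointing at.

#include <math.h>
#include <stdio.h>

int main(void) {
    /* Execution times (seconds) for programs P1 and P2 from the table above. */
    const char *name[] = {"A", "B", "C"};
    double t[3][2] = {{1, 1000}, {10, 100}, {20, 20}};

    for (int m = 0; m < 3; m++) {
        double total = t[m][0] + t[m][1];
        double amean = total / 2.0;
        /* Geometric mean of speedups relative to machine A. */
        double gmean = sqrt((t[0][0] / t[m][0]) * (t[0][1] / t[m][1]));
        printf("Computer %s: total %6.0f s, arithmetic mean %6.1f s, geometric-mean speedup vs A %.2f\n",
               name[m], total, amean, gmean);
    }
    return 0;
}

Note that the geometric mean rates A and B as equal (each is 10x faster on one program and 10x slower on the other), while total time says B is about 9x faster than A.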

Chapter 1 - Fundamentals 24
Quantitative Principles
of Computer Design
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Make the common case fast.

Amdahl’s Law: relates the total speedup of a system to the speedup of some portion of that system.

Chapter 1 - Fundamentals 25
Quantitative Design: Amdahl's Law

Speedup due to enhancement E:

Speedup(E) = Execution Time Without Enhancement / Execution Time With Enhancement
           = Performance With Enhancement / Performance Without Enhancement

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
Chapter 1 - Fundamentals 26
Quantitative Design: Amdahl's Law

ExTime_new = ExTime_old x [ (1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [ (1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced ]

[Diagram: the old execution time with the enhanced fraction highlighted, next to the shorter new execution time.]

Chapter 1 - Fundamentals 27
Quantitative Design: Amdahl's Law

• Floating point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.

ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old

Speedup_overall = 1 / 0.95 = 1.053
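The same calculation as a small C function, so other fraction/speedup combinations can be tried (a sketch; the function and variable names are mine, not from the text):

#include <stdio.h>

/* Amdahl's Law: overall speedup when a fraction f of execution time
 * is sped up by a factor s and the rest is unchanged. */
static double amdahl(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void) {
    printf("FP example (f=0.10, s=2):  %.3f\n", amdahl(0.10, 2.0));   /* 1.053 */
    printf("f=0.50, s=10:              %.3f\n", amdahl(0.50, 10.0));  /* 1.818 */
    return 0;
}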

Chapter 1 - Fundamentals 28
Quantitative Design: Cycles Per Instruction

CPI = (CPU Time x Clock Rate) / Instruction Count = Cycles / Instruction Count

CPU_Time = Cycle_Time x Σ (CPI_i x I_i)   (sum over i = 1..n)

where I_i is the number of instructions of type i.

CPI = Σ (CPI_i x F_i)   where F_i = I_i / Instruction_Count is the "instruction frequency"

Invest resources where time is spent!


Chapter 1 - Fundamentals 29
Quantitative Design: Cycles Per Instruction
Suppose we have a machine where we can count the frequency with which
instructions are executed. We also know how many cycles it takes for
each instruction type.

Base Machine (Reg / Reg)


Op Freq Cycles CPI(i) (% Time)
ALU 50% 1 .5 (33%)
Load 20% 2 .4 (27%)
Store 10% 2 .2 (13%)
Branch 20% 2 .4 (27%)
Total CPI 1.5

How do we get CPI(i)?


How do we get %time?
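A sketch answering both questions with the numbers from the table: CPI(i) is the frequency of class i times its cycle count, and the time share of class i is CPI(i) divided by the total CPI (valid because every class is counted in cycles of the same clock).

#include <stdio.h>

int main(void) {
    const char *op[] = {"ALU", "Load", "Store", "Branch"};
    double freq[]    = {0.50, 0.20, 0.10, 0.20};   /* instruction mix */
    double cycles[]  = {1, 2, 2, 2};               /* cycles for one instruction of that class */
    int n = 4;

    double total_cpi = 0.0;
    for (int i = 0; i < n; i++)
        total_cpi += freq[i] * cycles[i];          /* CPI(i) = Freq_i * Cycles_i */

    printf("Total CPI = %.2f\n", total_cpi);       /* 1.5 */
    for (int i = 0; i < n; i++)
        printf("%-6s CPI(i)=%.2f  %%time=%.0f%%\n",
               op[i], freq[i] * cycles[i], 100.0 * freq[i] * cycles[i] / total_cpi);
    return 0;
}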
Chapter 1 - Fundamentals 30
Quantitative Design: Locality of Reference
Programs access a relatively small portion of the address space at
any instant of time.

There are two different types of locality:

Temporal Locality (locality in time): If an item is referenced, it will


tend to be referenced again soon (loops, reuse, etc.)

Spatial Locality (locality in space/location): If an item is referenced,


items whose addresses are close by tend to be referenced soon
(straight line code, array access, etc.)
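A small C illustration: sum and i are reused on every iteration (temporal locality), while a[i] walks through consecutive addresses (spatial locality), which is why caches fetch whole blocks rather than single words.

#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];
    long sum = 0;

    for (int i = 0; i < N; i++)
        a[i] = i;            /* sequential writes: spatial locality */

    for (int i = 0; i < N; i++)
        sum += a[i];         /* sum and i reused every pass: temporal locality;
                                a[0], a[1], ... touched in order: spatial locality */

    printf("sum = %ld\n", sum);
    return 0;
}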

Chapter 1 - Fundamentals 31
The Concept of
Memory Hierarchy
1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Fast memory is expensive.

Slow memory is cheap.

The goal is to get the best price/performance at a particular price point.

Chapter 1 - Fundamentals 32
Memory Hierarchy

                Registers    Level 1 cache   Level 2 cache   Memory          Disk
Typical Size    4 - 64       <16K bytes      <2 Mbytes       <16 Gigabytes   >5 Gigabytes
Access Time     1 nsec       3 nsec          15 nsec         150 nsec        5,000,000 nsec
Bandwidth       10,000 -     2000 - 5000     500 - 1000      500 - 1000      100
(in MB/sec)     50,000
Managed By      Compiler     Hardware        Hardware        OS              OS/User

Chapter 1 - Fundamentals 33
Memory Hierarchy
• Hit: the data appears in some block in the upper level (example: Block X)
  – Hit Rate: the fraction of memory accesses found in the upper level
  – Hit Time: time to access the upper level, which consists of
    RAM access time + time to determine hit/miss
• Miss: the data needs to be retrieved from a block in the lower level (Block Y)
  – Miss Rate = 1 - (Hit Rate)
  – Miss Penalty: time to replace a block in the upper level +
    time to deliver the block to the processor
• Hit Time << Miss Penalty (500 instructions on the 21264!)

Chapter 1 - Fundamentals 34
Memory Hierarchy

[Hierarchy: Registers -> Level 1 cache -> Level 2 cache -> Memory -> Disk]

What is the cost of executing a program if:

• Stores are free (there’s a write pipe)
• Loads are 20% of all instructions
• 80% of loads hit (are found) in the Level 1 cache
• 97% of loads hit in the Level 2 cache.
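The slide leaves the latencies open, so here is one way to set the calculation up in C, with assumed values (1-cycle L1 hit, 15-cycle L2 hit, 150-cycle memory access, loosely following the hierarchy table two slides back) and with the 97% read as the cumulative fraction of loads that hit in L1 or L2. Change the assumptions and the answer follows.

#include <stdio.h>

int main(void) {
    /* Assumptions (not given on the slide): access costs in clock cycles. */
    double l1_hit = 1.0, l2_hit = 15.0, mem = 150.0;

    double load_frac = 0.20;         /* loads are 20% of all instructions   */
    double p_l1      = 0.80;         /* fraction of loads that hit in L1    */
    double p_l2      = 0.97 - p_l1;  /* assumption: 97% is cumulative through L2 */
    double p_mem     = 1.0 - 0.97;   /* the rest go to memory               */

    double cycles_per_load = p_l1 * l1_hit + p_l2 * l2_hit + p_mem * mem;
    double memory_cpi      = load_frac * cycles_per_load;   /* stores assumed free */

    printf("Average cycles per load       = %.2f\n", cycles_per_load);
    printf("Memory cycles per instruction = %.2f\n", memory_cpi);
    return 0;
}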

Chapter 1 - Fundamentals 35
Wrap Up

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

Chapter 1 - Fundamentals 36
Computer Architecture
Chapter 2
Instruction Sets

Chapter 1 - Fundamentals 37
Introduction
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
Bonus

Chapter 1 - Fundamentals 38
Introduction
The Instruction Set Architecture is that portion of the machine visible
to the assembly level programmer or to the compiler writer.

software

instruction set

hardware

1. What are the advantages and disadvantages of various instruction set alternatives?
2. How do languages and compilers affect the ISA?
3. Use the DLX architecture as an example of a RISC architecture.
Chapter 1 - Fundamentals 39
Classifying Instruction Set Architectures

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Classifications can be by:
1. Stack/accumulator/register
2. Number of memory operands.
3. Number of total operands.

Chapter 1 - Fundamentals 40
Instruction Set Architectures: Basic ISA Classes

Accumulator:
  1 address     add A        acc ← acc + mem[A]
  1+x address   addx A       acc ← acc + mem[A + x]

Stack:
  0 address     add          tos ← tos + next

General Purpose Register (ALU instructions can have two or three operands):
  2 address     add A B      EA(A) ← EA(A) + EA(B)
  3 address     add A B C    EA(A) ← EA(B) + EA(C)

Load/Store (ALU instructions can have 0, 1, 2, 3 operands; shown here are the cases of 0 and 1):
  0 memory      load R1, Mem1
                load R2, Mem2
                add  R1, R2
  1 memory      add  R1, Mem2
Chapter 1 - Fundamentals 41
Instruction Set Architectures: Basic ISA Classes

The results of the different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.

Stack      Accumulator   Register (register-memory)   Register (load-store)
Push A     Load A        Load R1, A                   Load R1, A
Push B     Add B         Add R1, B                    Load R2, B
Add        Store C       Store C, R1                  Add R3, R1, R2
Pop C                                                 Store C, R3

Registers are the class that won out. The more registers on the CPU, the better.

Chapter 1 - Fundamentals 42
Instruction Set Architectures: Intel 80x86 Integer Registers
GPR0 EAX Accumulator
GPR1 ECX Count register, string, loop
GPR2 EDX Data Register; multiply, divide
GPR3 EBX Base Address Register
GPR4 ESP Stack Pointer
GPR5 EBP Base Pointer – for base of stack seg.
GPR6 ESI Index Register
GPR7 EDI Index Register
CS Code Segment Pointer
SS Stack Segment Pointer
DS Data Segment Pointer
ES Extra Data Segment Pointer
FS Data Seg. 2
GS Data Seg. 3
PC EIP Instruction Counter
Eflags Condition Codes
Chapter 1 - Fundamentals 43
Memory Addressing

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Sections include:
  Interpreting Memory Addresses
  Addressing Modes
  Displacement Address Mode
  Immediate Address Mode

Chapter 1 - Fundamentals 44
Memory Addressing: Interpreting Memory Addresses

What object is accessed as a function of the address and length?

Objects have byte addresses – an address refers to the number of bytes


counted from the beginning of memory.
Little Endian – puts the byte whose address is xx00 at the least
significant position in the word.
Big Endian – puts the byte whose address is xx00 at the most significant
position in the word.
Alignment – data must be aligned on a boundary equal to its size.
Misalignment typically results in an alignment fault that must be
handled by the Operating System.
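A classic way to see which convention a machine uses, as a short C sketch: store a known multi-byte value and look at the byte that ends up at the lowest address.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t word = 0x01020304;
    unsigned char *p = (unsigned char *)&word;   /* byte at the lowest address */

    /* Little Endian: least significant byte (0x04) at address xx00.
       Big Endian:    most significant byte (0x01) at address xx00. */
    printf("first byte = %02x -> %s endian\n",
           p[0], (p[0] == 0x04) ? "little" : "big");
    return 0;
}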

Chapter 1 - Fundamentals 45
Memory Addressing
Addressing Modes
This table shows the most common modes. A more complete set is in Figure 2.6.

Addressing Mode     Example Instruction   Meaning                           When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]            When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3                For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100 + R[R1]]   Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]         Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]          Used for static data.
Chapter 1 - Fundamentals 46
Addressing Modes
Mode Example Meaning
Register add r4, r3 R[4]←R[4]+R[3]

Immediate add r4, #3 R[4]←R[4]+3


Displacement add r4, 100(r1) R[4]←R[4]+M[100+R[1]]
Register indirect add r4, (r1) R[4]←R[4]+M[R[1]]
Indexed add r3, (r1+r2) R[3]←R[3]+M[R[1]+R[2]]
Direct/Absolute add r1, (1001) R[1]←R[1]+M[1001]
Memory indirect add r1, @(r3) R[1]←R[1]+M[M[R[3]]]
Autoincrement add r1, (r2)+ R[1]←R[1]+M[R[2]]
R[2]←R[2]+d
Autodecrement add r1, – (r2) R[2]←R[2] – d
R[1]←R[1]+M[R[2]]
Scaled add r1, 100(r2)[r3] R[1]←R[1]+M[100+R[2]+R[3]*d]

( ) → memory access; [ ] → accessing a register or memory location

Chapter 1 - Fundamentals 47
Memory Addressing: Displacement Addressing Mode
How big should the displacement be?

For addresses that do fit in displacement size:


Add R4, 10000 (R0)
For addresses that don’t fit in displacement size, the compiler
must do the following:
Load R1, address
Add R4, 0 (R1)

How big it should be depends on the typical displacements seen in real programs.

On both IA32 and DLX, the space allocated is 16 bits.

Chapter 1 - Fundamentals 48
Memory Addressing: Immediate Address Mode

Used where we want a numerical value available directly in the instruction.

At high level:       At assembler level:

a = b + 3;           Load   R2, 3
                     Add    R0, R1, R2

if ( a > 17 )        Load   R2, 17
                     CMPBGT R1, R2

goto Addr            Load   R1, Address
                     Jump   (R1)
So how would you get a 32 bit value into a register?


Chapter 1 - Fundamentals 49
Operations In The Instruction Set

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Sections include:
  Detailed information about types of instructions.
  Instructions for Control Flow (conditional branches, jumps)

Chapter 1 - Fundamentals 50
Operations In The Instruction Set: Operator Types
Arithmetic and logical - and, add
Data transfer - move, load
Control - branch, jump, call
System - system call, traps
Floating point - add, mul, div, sqrt
Decimal - add, convert
String - move, compare
Multimedia - 2D, 3D? e.g., Intel MMX and Sun VIS

Chapter 1 - Fundamentals 51
Operations In The Instruction Set: Control Instructions

Conditional branches are 20% of all instructions!!

Control Instructions Issues:


• taken or not
• where is the target
• link return address
• save or restore
Instructions that change the PC:
• (conditional) branches, (unconditional) jumps
• function calls, function returns
• system calls, system returns

Chapter 1 - Fundamentals 52
Operations In The Instruction Set: Control Instructions

There are numerous tradeoffs in how the branch condition is evaluated:

Compare and branch
  + no extra compare, no state passed between instructions
  -- requires an ALU op, restricts code scheduling opportunities

Implicitly set condition codes - Z, N, V, C
  + can be set ``for free''
  -- constrains code reordering, extra state to save/restore

Explicitly set condition codes
  + can be set ``for free'', decouples branch/fetch from pipeline
  -- extra state to save/restore

Condition in general-purpose register
  + no special state, but uses up a register
  -- branch condition separate from branch logic in pipeline

Some data for MIPS:
  > 80% of branches use immediate data, > 80% of those zero
  50% of branches use == 0 or <> 0
  Compromise in MIPS: branch==0, branch<>0; compare instructions for all other compares

Chapter 1 - Fundamentals 53
Operations In The Instruction Set: Control Instructions

Link return address:
  implicit register - many recent architectures use this
    + fast, simple
    -- s/w must save the register before the next call; surprise traps?
  explicit register
    + may avoid saving a register
    -- register must be specified
  processor stack
    + recursion is direct
    -- complex instructions

Save or restore state - what state?
  function calls: registers
  system calls: registers, flags, PC, PSW, etc.
  Hardware need not save registers:
    the caller can save the registers in use, or the callee saves the registers it will use
  Hardware register save: IBM STM, VAX CALLS - faster?
  Many recent architectures do no register saving, or do implicit register saving with register windows (SPARC)

Chapter 1 - Fundamentals 54
Type And Size of Operands

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

The type of the operand is usually encoded in the opcode – a LDW implies loading of a word.

Common sizes are:
  Character (1 byte)
  Half word (16 bits)
  Word (32 bits)
  Single Precision Floating Point (1 word)
  Double Precision Floating Point (2 words)

Integers are two’s complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.

Chapter 1 - Fundamentals 55
Encoding an Instruction Set

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

This section has to do with how an assembly-level instruction is encoded into binary. Ultimately, it’s the binary that is read and interpreted by the machine.

We will be using the Intel instruction set, which is defined at:
http://developer.intel.com/design/Pentium4/manuals.
Volume 2 has the instruction set.

Chapter 1 - Fundamentals 56
Encoding an Instruction Set: 80x86 Instruction Encoding

Here’s some sample code that’s been disassembled. It was compiled with the debugger option, so it is not optimized. This code was produced using Visual Studio.

   for ( index = 0; index < iterations; index++ )
0040D3AF C7 45 F0 00 00 00 00   mov   dword ptr [ebp-10h],0
0040D3B6 EB 09                  jmp   main+0D1h (0040d3c1)
0040D3B8 8B 4D F0               mov   ecx,dword ptr [ebp-10h]
0040D3BB 83 C1 01               add   ecx,1
0040D3BE 89 4D F0               mov   dword ptr [ebp-10h],ecx
0040D3C1 8B 55 F0               mov   edx,dword ptr [ebp-10h]
0040D3C4 3B 55 F8               cmp   edx,dword ptr [ebp-8]
0040D3C7 7D 15                  jge   main+0EEh (0040d3de)
   long_temp = (*alignment + long_temp) % 47;
0040D3C9 8B 45 F4               mov   eax,dword ptr [ebp-0Ch]
0040D3CC 8B 00                  mov   eax,dword ptr [eax]
0040D3CE 03 45 EC               add   eax,dword ptr [ebp-14h]
0040D3D1 99                     cdq
0040D3D2 B9 2F 00 00 00         mov   ecx,2Fh
0040D3D7 F7 F9                  idiv  eax,ecx
0040D3D9 89 55 EC               mov   dword ptr [ebp-14h],edx
0040D3DC EB DA                  jmp   main+0C8h (0040d3b8)
Chapter 1 - Fundamentals 57
Encoding an Instruction Set: 80x86 Instruction Encoding

Here’s some sample code that’s been disassembled. It was compiled with optimization. This code was produced using Visual Studio.

   for ( index = 0; index < iterations; index++ )
00401000 8B 0D 40 54 40 00      mov   ecx,dword ptr ds:[405440h]
00401006 33 D2                  xor   edx,edx
00401008 85 C9                  test  ecx,ecx
0040100A 7E 14                  jle   00401020
0040100C 56                     push  esi
0040100D 57                     push  edi
0040100E 8B F1                  mov   esi,ecx
   long_temp = (*alignment + long_temp) % 47;
00401010 8D 04 11               lea   eax,[ecx+edx]
00401013 BF 2F 00 00 00         mov   edi,2Fh
00401018 99                     cdq
00401019 F7 FF                  idiv  eax,edi
0040101B 4E                     dec   esi
0040101C 75 F2                  jne   00401010
0040101E 5F                     pop   edi
0040101F 5E                     pop   esi
00401020 C3                     ret
Chapter 1 - Fundamentals 58
Encoding an Instruction Set: 80x86 Instruction Encoding

Here’s some sample code that’s been disassembled. It was compiled with optimization. This code was produced using gcc and gdb. For details, see Lab 2.1.

   for ( index = 0; index < iterations; index++ )
0x804852f <main+143>:  add   $0x10,%esp
0x8048532 <main+146>:  lea   0xfffffff8(%ebp),%edx
0x8048535 <main+149>:  test  %esi,%esi
0x8048537 <main+151>:  jle   0x8048543 <main+163>
0x8048539 <main+153>:  mov   %esi,%eax
0x804853b <main+155>:  nop
0x804853c <main+156>:  lea   0x0(%esi,1),%esi
   long_temp = (*alignment + long_temp) % 47;
0x8048540 <main+160>:  dec   %eax
0x8048541 <main+161>:  jne   0x8048540 <main+160>
0x8048543 <main+163>:  add   $0xfffffff4,%esp

Note that the representation of the code is dependent on the compiler/debugger!
Chapter 1 - Fundamentals 59
Encoding an Instruction Set: 80x86 Instruction Encoding

A morass of disjoint encodings!! (This is Figure D.8.)

ADD    4-bit opcode | 3-bit Reg | 1-bit W | 8-bit Displacement
SHL    6-bit opcode | 2-bit V/W | 8-bit postbyte | 8-bit Displacement
TEST   7-bit opcode | 1-bit W | 8-bit postbyte | 8-bit Immediate

Chapter 1 - Fundamentals 60
Encoding an Instruction Set: 80x86 Instruction Encoding

JE     4-bit opcode | 4-bit Condition | 8-bit Displacement
CALLF  8-bit opcode | 16-bit Offset | 16-bit Segment Number
MOV    6-bit opcode | 2-bit D/W | 8-bit postbyte | 8-bit Displacement
PUSH   5-bit opcode | 3-bit Reg

Chapter 1 - Fundamentals 61
Encoding an Instruction Set: 80x86 Instruction Encoding

Here’s the instruction that we had several pages ago:

0040D3AF C7 45 F0 00 00 00 00   mov dword ptr [ebp-10h],0

It is described in:
http://developer.intel.com/design/pentium4/manuals/245471.htm
(I found it on page 479, but this is obviously version dependent.)

C7 /0 MOV r/m32,imm32 Move an immediate 32 bit data item to a register or to memory.

Copies the second operand (source operand) to the first operand (destination operand).
The source operand can be an immediate value, general purpose register, segment
register, or memory location. Both operands must be the same size, which can be a
byte, a word, or a doubleword.
In our case, because of the “C7” Opcode, we know it’s a sub-flavor of MOV putting an
immediate value into memory.

C7 45 F0 00 00 00 00    mov dword ptr [ebp-10h],0

  C7            Opcode for MOV immediate
  45            Target register + use the next 8 bits as a displacement
  F0            Displacement of -10 hex
  00 00 00 00   This is 32 bits of 0

Chapter 1 - Fundamentals 62
The Role of Compilers

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture

Compiler goals:
• All correct programs execute correctly
• Most compiled programs execute fast (optimizations)
• Fast compilation
• Debugging support

Chapter 1 - Fundamentals 63
The Role of Compilers: Steps In Compilation
Parsing --> intermediate representation
Jump Optimization
Loop Optimizations
Register Allocation
Code Generation --> assembly code
Common Sub-Expression
Procedure in-lining
Constant Propagation
Strength Reduction
Pipeline Scheduling

Chapter 1 - Fundamentals 64
The Role of Compilers: Steps In Compilation

Optimization Name    Explanation                                         % of the total number of optimizing transformations
High Level           At or near the source level; machine-independent    Not Measured
Local                Within straight-line code                           40%
Global               Across a branch                                     42%
Machine Dependent    Depends on machine knowledge                        Not Measured

Chapter 1 - Fundamentals 65
The Role of Compilers: What Compiler Writers Want

• regularity
• orthogonality
• composability

Compilers perform a giant case analysis
• too many choices make it hard

Orthogonal instruction sets
• operation, addressing mode, data type

One solution or all possible solutions
• 2 branch conditions - eq, lt
• or all six - eq, ne, lt, gt, le, ge
• not 3 or 4

There are advantages to having instructions that are primitives. Let the compiler put the instructions together to make more complex sequences.

Chapter 1 - Fundamentals 66
The MIPS Architecture

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture

MIPS is very RISC oriented.

MIPS will be used for many examples throughout the course.

Chapter 1 - Fundamentals 67
The MIPS Architecture: MIPS Characteristics

There’s MIPS-32, which we learned in CS140:
  32-bit byte addresses, aligned
  Load/store - only displacement addressing
  Standard datatypes
  3 fixed-length formats
  32 32-bit GPRs (r0 = 0)
  16 64-bit (32 32-bit) FPRs, FP status register
  No condition codes

There’s MIPS-64 - the current architecture:
  Standard datatypes
  4 fixed-length formats (8, 16, 32, 64)
  32 64-bit GPRs (r0 = 0)
  64 64-bit FPRs

Addressing modes
  • Immediate
  • Displacement
  • (Register mode used only for ALU)

Data transfer
  • load/store word, load/store byte/halfword signed?
  • load/store FP single/double
  • moves between GPRs and FPRs

ALU
  • add/subtract signed? immediate?
  • multiply/divide signed?
  • and, or, xor immediate?, shifts: ll, rl, ra immediate?
  • sets immediate?
Chapter 1 - Fundamentals 68
The MIPS Architecture: MIPS Characteristics
Control
• branches == 0, <> 0
• conditional branch testing FP bit
• jump, jump register
• jump & link, jump & link register
• trap, return-from-exception

Floating Point
• add/sub/mul/div
• single/double
• fp converts, fp set

Chapter 1 - Fundamentals 69
The MIPS Architecture: The MIPS Encoding

Register-Register
  Op (bits 31-26) | Rs1 (25-21) | Rs2 (20-16) | Rd (15-11) | Opx (10-0)

Register-Immediate
  Op (bits 31-26) | Rs1 (25-21) | Rd (20-16) | immediate (15-0)

Branch
  Op (bits 31-26) | Rs1 (25-21) | Rs2/Opx (20-16) | immediate (15-0)

Jump / Call
  Op (bits 31-26) | target (25-0)

Chapter 1 - Fundamentals 70
RISC versus CISC
BONUS

The comparison combines 3 features
• architecture
• implementation
• compilers and OS
and argues that
• implementation effects are second order
• compilers are similar
• RISCs are better than CISCs: fair comparison?

Chapter 1 - Fundamentals 71
RISC versus CISC
BONUS

RISC factor = (CPI_VAX x Instr_VAX) / (CPI_MIPS x Instr_MIPS)

Benchmark   Instruction Ratio   CPI MIPS   CPI VAX   CPI Ratio   RISC factor
li          1.6                 1.1        6.5       6.0         3.7
eqntott     1.1                 1.3        4.4       3.5         3.3
fpppp       2.9                 1.5        15.2      10.5        2.7
tomcatv     2.9                 2.1        17.5      8.2         2.9
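Reading the table's Instruction Ratio as Instr_MIPS/Instr_VAX (the reading that makes the table's numbers consistent), the RISC factor is simply the CPI ratio divided by the instruction ratio. A quick C check with the li row:

#include <stdio.h>

int main(void) {
    /* li benchmark, figures from the table above. */
    double instr_ratio = 1.6;                    /* assumed to be Instr_MIPS / Instr_VAX */
    double cpi_mips = 1.1, cpi_vax = 6.5;

    double cpi_ratio   = cpi_vax / cpi_mips;     /* about 6.0 */
    double risc_factor = cpi_ratio / instr_ratio;/* = (CPI_VAX*Instr_VAX)/(CPI_MIPS*Instr_MIPS), about 3.7 */

    printf("CPI ratio = %.1f, RISC factor = %.1f\n", cpi_ratio, risc_factor);
    return 0;
}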

Chapter 1 - Fundamentals 72
RISC versus CISC
BONUS
Compensating factors
• Increase VAX CPI but decrease VAX instruction count
• Increase MIPS instruction count
• e.g. 1: loads/stores versus operand specifiers
• e.g. 2: necessary complex instructions: loop branches

Factors favoring VAX
• Big immediate values
• Not-taken branches incur no delay

Factors favoring MIPS
• Operand specifier decoding
• Number of registers
• Separate floating point unit
• Simple branches/jumps (lower latency)
• No complex instructions
• Instruction scheduling
• Translation buffer
• Branch displacement size

Chapter 1 - Fundamentals 73
Wrapup
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding an Instruction Set
2.7 The Role of Compilers
2.8 The DLX Architecture
Bonus

Chapter 1 - Fundamentals 74
