Edition
The Hardware/Software Interface
Chapter 1
Computer Abstractions
and Technology
Classes of Computers
Personal computers
General purpose, variety of software
Subject to cost/performance tradeoff
Server computers
Network based
High capacity, performance, reliability
Range from small servers to building sized
Embedded computers
Hidden as components of systems
Stringent power/performance/cost constraints
Hierarchy of memories
[Figure: bar charts comparing the BAC/Sud Concorde and the Douglas DC-8-50; horizontal axes run 0 to 500 and 0 to 10000]
[Figure: clock (cycles), with data transfer and computation followed by a state update in each cycle]
\[ \frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2 \]
…by this much
Chapter 1 — Computer Abstractions and Technology — 25
CPI in More Detail
If different instruction classes take different
numbers of cycles
\[ \text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i) \]
Relative frequency
Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1
Sequence 1: IC = 5
Clock Cycles = 2×1 + 1×2 + 2×3 = 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles = 4×1 + 1×2 + 1×3 = 9
Avg. CPI = 9/6 = 1.5
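The two sequences above can be checked with a short Python sketch. The per-class CPI values and instruction counts come from the table; the function and variable names are illustrative only.

```python
# CPI per instruction class, taken from the table above.
CPI = {"A": 1, "B": 2, "C": 3}

def clock_cycles(counts):
    """Sum of CPI_i * IC_i over all instruction classes."""
    return sum(CPI[cls] * n for cls, n in counts.items())

def average_cpi(counts):
    """Total clock cycles divided by total instruction count."""
    return clock_cycles(counts) / sum(counts.values())

seq1 = {"A": 2, "B": 1, "C": 2}  # IC = 5
seq2 = {"A": 4, "B": 1, "C": 1}  # IC = 6

print(clock_cycles(seq1), average_cpi(seq1))  # 10 2.0
print(clock_cycles(seq2), average_cpi(seq2))  # 9 1.5
```

Sequence 2 executes more instructions but needs fewer clock cycles, because its mix leans on the cheap class A.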
Performance Summary
The BIG Picture
Performance depends on
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
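\[ \sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i} \]
\[ \text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6} \]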
CPI varies between programs on a given CPU
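As a rough illustration of the MIPS formula, the sketch below computes it both from instruction count and execution time and from clock rate and CPI. The machine parameters (4 GHz, CPI 2.0, 10 billion instructions) are made-up values for the example, not figures from the slides.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

def mips_from_clock(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical machine and program, chosen only for illustration.
clock_rate = 4e9           # 4 GHz
cpi = 2.0                  # average CPI of this program
instruction_count = 10e9   # 10 billion instructions

execution_time = instruction_count * cpi / clock_rate  # 5.0 seconds
print(mips_from_time(instruction_count, execution_time))  # 2000.0
print(mips_from_clock(clock_rate, cpi))                   # 2000.0
```

Both forms agree because execution time is instruction count × CPI / clock rate. A program with a different CPI on the same CPU would yield a different MIPS rating.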
§1.9 Concluding Remarks
Concluding Remarks
Cost/performance is improving
Due to underlying technology development
Hierarchical layers of abstraction
In both hardware and software
Instruction set architecture
The hardware/software interface
Execution time: the best performance
measure
Power is a limiting factor
Use parallelism to improve performance
Chapter 2
Instructions: Language
of the Computer
HW#1:
1.3 all, 1.4 all, 1.6.1, 1.14.4, 1.14.5, 1.14.6, 1.15.1, and 1.15.4
Due date: one week.
Practice:
1.5 all, 1.6 all, 1.10 all, 1.11 all, 1.14 all, and 1.15 all
§2.1 Introduction
Instruction Set
The repertoire of instructions of a
computer
Different computers have different
instruction sets
But with many aspects in common
Early computers had very simple
instruction sets
Simplified implementation
Many modern computers also have simple
instruction sets
op   rs   rt   rd   sa   funct      R format
op   rs   rt   immediate            I format
0    17   18   8    0    0x22
35   19   8    24 (decimal)
[Figure: memory of data words with word addresses (hex) 0x00000000, 0x00000004, 0x00000008, 0x0000000c, up to 0xffffffff; the effective address 24 (decimal) + $s3 selects the word at 0x120040ac]
Dr. Yahya Tashtoush
Byte Addresses
Since 8-bit bytes are so useful, most architectures
address individual bytes in memory
Alignment restriction - the memory address of a word must be a multiple of 4 (the word size in bytes)
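1101   -3
1110   -2
1111   -1
0000    0
0001    1
0010    2
0011    3
0100    4
0101    5
0110    6
0111    7   (= 2^3 - 1)
To negate: complement all the bits and add a 1
0101: complement all the bits gives 1010, add a 1 gives 1011
1010: complement all the bits gives 0101, add a 1 gives 0110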
2s-Complement Signed Integers
Given an n-bit number
\[ x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0 \]
\[ x + \bar{x} = 1111\ldots111_2 = -1 \]
\[ \bar{x} + 1 = -x \]
Example: negate +2
+2 = 0000 0000 … 0010_2
–2 = 1111 1111 … 1101_2 + 1
   = 1111 1111 … 1110_2
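The negation example can be reproduced with a small Python sketch of "complement and add 1" on a fixed-width bit pattern. The helper names `negate` and `to_signed` are illustrative, and the 32-bit default width is an assumption matching the 32-bit words used in these slides.

```python
def negate(x, bits=32):
    """Two's-complement negation: complement all the bits, then add 1."""
    mask = (1 << bits) - 1
    return ((x ^ mask) + 1) & mask

def to_signed(x, bits=32):
    """Interpret a bits-wide pattern as a signed two's-complement value."""
    return x - (1 << bits) if x & (1 << (bits - 1)) else x

minus_two = negate(2)
print(format(minus_two, "032b"))  # 11111111111111111111111111111110
print(to_signed(minus_two))       # -2

# 4-bit case: negating 0101 (5) gives 1011 (-5).
print(format(negate(0b0101, bits=4), "04b"))  # 1011
```

Masking after the addition keeps the result within the chosen width, so negating 0 gives 0 again rather than overflowing.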
Instruction fields
op: operation code (opcode)
rs: first source register number
rt: second source register number
rd: destination register number
shamt: shift amount (00000 for now)
funct: function code (extends opcode)
0 17 18 8 0 32
000000 10001 10010 01000 00000 100000_2 = 02324020_16
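A minimal sketch of how the six R-format fields pack into one 32-bit word, using the field values shown above (0, 17, 18, 8, 0, 32). The function names are illustrative; the field widths (6, 5, 5, 5, 5, 6 bits) are the standard MIPS R-format layout.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def decode_r(word):
    """Unpack a 32-bit word back into its R-format fields."""
    return (
        (word >> 26) & 0x3F,  # op
        (word >> 21) & 0x1F,  # rs
        (word >> 16) & 0x1F,  # rt
        (word >> 11) & 0x1F,  # rd
        (word >> 6) & 0x1F,   # shamt
        word & 0x3F,          # funct
    )

word = encode_r(0, 17, 18, 8, 0, 32)
print(format(word, "08x"))  # 02324020
print(decode_r(word))       # (0, 17, 18, 8, 0, 32)
```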
lui $s0, 61          0000 0000 0011 1101 0000 0000 0000 0000
ori $s0, $s0, 2304   0000 0000 0011 1101 0000 1001 0000 0000
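The two-instruction sequence for loading a 32-bit constant can be modeled in a few lines of Python. This is only a sketch of the register effect of load-upper-immediate followed by OR-immediate; the helper names mirror the mnemonics but are ordinary Python functions.

```python
def lui(imm16):
    """Load upper immediate: place the 16-bit constant in bits 31..16, zero the rest."""
    return (imm16 & 0xFFFF) << 16

def ori(reg, imm16):
    """OR immediate: zero-extend the 16-bit constant and OR it into the register."""
    return reg | (imm16 & 0xFFFF)

s0 = lui(61)
print(format(s0, "032b"))  # 00000000001111010000000000000000
s0 = ori(s0, 2304)
print(format(s0, "032b"))  # 00000000001111010000100100000000
print(s0)                  # 4000000
```

The upper half holds 61 (0x003D) and the lower half holds 2304 (0x0900), which together form the constant 4000000.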
op       rs       rt       constant or address
6 bits   5 bits   5 bits   16 bits
PC-relative addressing
Target address = PC + offset × 4
PC already incremented by 4 by this time
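The target computation can be sketched as follows. Here `branch_pc` is the address of the branch instruction itself, so the already-incremented PC is `branch_pc + 4`; the addresses in the example are hypothetical.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement offset."""
    return value - 0x10000 if value & 0x8000 else value

def branch_target(branch_pc, offset16):
    """Target address = (PC already incremented by 4) + offset * 4."""
    incremented_pc = branch_pc + 4
    return (incremented_pc + sign_extend16(offset16) * 4) & 0xFFFFFFFF

# Hypothetical branch at address 80012 with offset field 2:
print(branch_target(80012, 2))            # 80024
# A backward branch with offset -1 (0xFFFF) targets the branch itself:
print(hex(branch_target(0x100, 0xFFFF)))  # 0x100
```

Because the offset counts words rather than bytes, the 16-bit field reaches about ±2^15 instructions around the branch.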
Jump Control Flow Instructions
MIPS also has an unconditional branch instruction or
jump instruction:
j label #go to label
[Figure: forming the 32-bit jump target from 4 bits of the PC and a low-order 00]
Static linking
Multiplexer: Y = S ? I1 : I0
Arithmetic/Logic Unit: Y = F(A, B)
[Figure: logic element symbols with inputs A, B, I0, I1, select S, function F, and output Y]
[Figure: register (D, Clk, Q) and register with write control (D, Write, Clk, Q)]
Increment by 4 for next instruction
32-bit register
Sign-bit wire replicated
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Branch       4          rs      rt      address
             31:26      25:21   20:16   15:0
Four loads:
Speedup = 8/3.5 = 2.3
Non-stop:
Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
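The speedup figures above follow from a simple timing model, sketched here in Python. It assumes four stages of 0.5 time units each, which is what the 2n and 0.5n + 1.5 terms imply; the constant and function names are illustrative.

```python
STAGES = 4        # assumed from "2n" = n loads * 4 stages * 0.5
STAGE_TIME = 0.5  # assumed time units per stage

def sequential_time(n):
    """Each load finishes all stages before the next one starts: 2n."""
    return n * STAGES * STAGE_TIME

def pipelined_time(n):
    """Fill the pipeline once (1.5), then one load completes per stage time: 0.5n + 1.5."""
    return (STAGES - 1) * STAGE_TIME + n * STAGE_TIME

def speedup(n):
    return sequential_time(n) / pipelined_time(n)

print(round(speedup(4), 1))      # 2.3
print(round(speedup(10**6), 3))  # 4.0
```

For small n the fill time dominates, so four loads only reach 2.3. As n grows, the speedup approaches the number of stages.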
Prediction correct
Prediction incorrect
Right-to-left flow (MEM, WB) leads to hazards
Wrong register number
Need to stall for one cycle
Stall inserted here
Or, more accurately…
Chapter 4 — The Processor — 80
Datapath with Hazard Detection
Flush these instructions (set control values to 0)
[Figure: pipeline stages IF ID EX MEM WB, with beq stalled in IF and ID]
[Figure: datapath overview with PC, instruction memory, registers (register numbers and data), ALU, data memory, and two adders]
[Figure: clock waveform with the falling edge marked, and a latch with inputs R and S and outputs Q and its complement]
"logically true",
— could mean electrically low
• Two inputs:
– the data value to be stored (D)
– the clock signal (C) indicating when to read & store D
• Two outputs:
– the value of the internal state (Q) and its complement
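[Figure: D latch with inputs C and D and outputs Q and its complement, with waveforms for D, C, and Q]
[Figure: D flip-flop built from two D latches driven by the clock C]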
[Figure: state element 1, combinational logic, state element 2, spanning one clock cycle]
• In the figure on the previous slide, two state elements
surround a block of combinational logic that operates
in a single clock cycle:
– All signals must propagate from state element 1, through
the combinational logic, and to state element 2 in the time
of one clock cycle.
– The time for signals to reach state element 2 defines the
length of the clock cycle.
• Both the clock signal and the write control signal are inputs,
and the state element is changed only when the write control
signal is asserted and a clock edge occurs.
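The write-control behavior described above can be sketched as a tiny Python model of an edge-triggered register. The class and method names are illustrative only: each call to `clock_edge` stands for one active clock edge, and the stored value changes only when `write` is asserted on that edge.

```python
class Register:
    """Edge-triggered state element with a write control signal."""

    def __init__(self, value=0):
        self.q = value  # current stored state, always readable

    def clock_edge(self, d, write):
        """Model one clock edge: capture d only if write is asserted."""
        if write:
            self.q = d
        return self.q

r = Register()
r.clock_edge(d=5, write=False)
print(r.q)  # 0 (clock edge occurred, but write was not asserted)
r.clock_edge(d=5, write=True)
print(r.q)  # 5 (write asserted and clock edge occurred)
r.clock_edge(d=9, write=False)
print(r.q)  # 5 (state held)
```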
[Figure: instruction fetch elements: PC, instruction memory (instruction address in, instruction out), and an adder (Add, Sum)]
[Figure: register file read ports: two multiplexers, selected by read register number 1 and read register number 2, choose among Register 0 … Register n – 1 to produce read data 1 and read data 2]
[Figure: register file write port: an n-to-2^n decoder on the register number, combined with Write, drives the C input of Register 0 … Register n – 1; register data feeds every D input]
[Figure: a. Registers (5-bit read register 1, read register 2, and write register numbers; write data; read data 1 and read data 2; RegWrite) b. ALU (4-bit ALU control, Zero, ALU result)]
[Figure: instruction memory, PC, and adder (Add, Sum) used for instruction fetch]
[Figure: a. Data memory unit (address, write data, read data, MemWrite, MemRead) b. Sign-extension unit (16 to 32 bits)]
[Figure: datapath for R-type or load instructions: registers, ALU, data memory, and sign extend]
[Figure: branch datapath: adder on PC and 4, sign extend, shift left 2, branch-target adder, PCSrc multiplexer, and the ALU Zero output going to branch control logic]
©2004 Morgan Kaufmann Publishers 37
Combining datapaths for memory instructions, R-type
instructions, and instruction fetch
[Figure: combined datapath: PC, instruction memory, adders (PC + 4 and branch target), shift left 2, registers, ALU (ALU operation, Zero, ALU result), data memory, sign extend, and multiplexers, with control signals ALUSrc, MemtoReg, MemWrite, MemRead, and RegWrite]
op   rs   rt   rd   shamt   funct
35   2    1    100
op   rs   rt   16 bit offset
0000 AND
0001 OR
0010 add
0110 subtract
0111 set-on-less-than
1100 NOR
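The six control codes listed above can be exercised with a small behavioral sketch of a 32-bit ALU. This models the function table only, not the gate-level design; the names `alu`, `to_signed`, and `MASK` are illustrative.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement value."""
    return x - (1 << 32) if x & 0x80000000 else x

def alu(control, a, b):
    """Return (result, zero) for the 4-bit ALU control codes listed above."""
    if control == 0b0000:    # AND
        result = a & b
    elif control == 0b0001:  # OR
        result = a | b
    elif control == 0b0010:  # add
        result = a + b
    elif control == 0b0110:  # subtract
        result = a - b
    elif control == 0b0111:  # set-on-less-than (signed compare)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:  # NOR
        result = ~(a | b)
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    result &= MASK
    return result, result == 0

print(alu(0b0010, 5, 7))  # (12, False)
print(alu(0b0110, 5, 5))  # (0, True)
```

The Zero output of a subtract is what a branch-on-equal uses: it is true exactly when the two operands match.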
Instruction [5–0]
Inputs: Op5, Op4, Op3, Op2, Op1, Op0
Instruction classes: R-format, lw, sw, beq
Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0
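The control unit maps the opcode inputs to the listed outputs. The slide text here does not preserve the table entries, so the values in this sketch are an assumption: they are the settings commonly given for the textbook single-cycle MIPS design, keyed by the opcodes used elsewhere in these slides (0 for R-format, 35 for lw, 43 for sw, 4 for beq). `None` marks a don't-care.

```python
X = None  # don't care

FIELDS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
          "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# Assumed values from the standard single-cycle design, not from this slide text.
CONTROL = {
    0:  ("R-format", (1, 0, 0, 1, 0, 0, 0, 1, 0)),
    35: ("lw",       (0, 1, 1, 1, 1, 0, 0, 0, 0)),
    43: ("sw",       (X, 1, X, 0, 0, 1, 0, 0, 0)),
    4:  ("beq",      (X, 0, X, 0, 0, 0, 1, 0, 1)),
}

def control_signals(opcode):
    """Look up the instruction class and its control outputs for a 6-bit opcode."""
    name, values = CONTROL[opcode]
    return name, dict(zip(FIELDS, values))

name, signals = control_signals(35)
print(name, signals["MemRead"], signals["RegWrite"])  # lw 1 1
```

Only lw asserts MemRead, only sw asserts MemWrite, and only beq asserts Branch, so a mis-decoded opcode cannot change memory or the PC by accident.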
[Figure: multicycle datapath overview: PC, memory (instruction or data), instruction register, memory data register, registers, A, B, ALU, and ALUOut]
[Figure: multicycle datapath detail: multiplexers select the memory address, the write register (Instruction [20–16] or [15–11]), the write data, and the ALU inputs (PC or A; B, 4, sign-extended Instruction [15–0], or that value shifted left 2)]
• Break it down into steps following our rule that data flows through at
most one major functional unit (e.g., balance work across steps)
• Instruction Fetch
• Write-back step
IR <= Memory[PC];
PC <= PC + 4;
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
• Memory Reference:
• R-type:
ALUOut <= A op B;
• Branch:
The write actually takes place at the end of the cycle on the edge
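The register-transfer steps above can be traced in Python for one R-type instruction. This is a sketch under stated assumptions: the operation is fixed to add, the final write `Reg[IR[15:11]] <= ALUOut` is assumed from the standard multicycle design because the slide text does not show it, and the names `bits`, `sign_extend16`, and `run_add` are illustrative.

```python
MASK = 0xFFFFFFFF

def bits(word, hi, lo):
    """Extract word[hi:lo], inclusive, as an unsigned value."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend16(x):
    return x - 0x10000 if x & 0x8000 else x

def run_add(memory, reg, pc):
    """Step one R-type add through the multicycle stages; returns (new PC, cycles)."""
    # Step 1, instruction fetch: IR <= Memory[PC]; PC <= PC + 4
    ir = memory[pc]
    pc = (pc + 4) & MASK
    # Step 2, decode and register fetch; the branch target is computed speculatively
    a = reg[bits(ir, 25, 21)]
    b = reg[bits(ir, 20, 16)]
    alu_out = (pc + (sign_extend16(bits(ir, 15, 0)) << 2)) & MASK
    # Step 3, R-type execution: ALUOut <= A op B (op assumed to be add)
    alu_out = (a + b) & MASK
    # Step 4, completion (assumed): Reg[IR[15:11]] <= ALUOut
    reg[bits(ir, 15, 11)] = alu_out
    return pc, 4

memory = {0: 0x02324020}  # fields 0, 17, 18, 8, 0, 32
reg = [0] * 32
reg[17], reg[18] = 5, 7
pc, cycles = run_add(memory, reg, 0)
print(pc, cycles, reg[8])  # 4 4 12
```

The speculative branch-target value from step 2 is simply overwritten in step 3. Computing it early costs nothing for R-type instructions and lets a branch finish in its next step.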
lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume not
add $t5, $t2, $t3
sw $t5, 8($t3)
Label: ...