Beruflich Dokumente
Kultur Dokumente
Control
Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexor inputs)
Information comes from the 32 bits of the instruction
Example:
add $8, $17, $18
Instruction Format:
000000
10001
10010
01000
op
rs
rt
rd
00000 100000
shamt
funct
op
rs
rt
100
16 bit offset
ALUOp
Instruction
operation
Funct field
Desired ALU
action
ALU control
input
LW
00
load word
XXXXXX
Add
010
SW
00
store word
XXXXXX
Add
010
Branch equal
01
Branch
equal
XXXXXX
Subtract
110
R-type
10
Add
100000
Add
010
R-type
10
Subtract
100010
Subtract
110
R-type
10
AND
100100
And
000
R-type
10
OR
100101
Or
001
R-type
10
Set on less
than
101010
Set on less
than
111
F5
X
X
X
X
X
X
X
F u n c t fie ld
F4 F3 F2 F1
X
X
X
X
X
X
X
X
X
0
0
0
X
0
0
1
X
0
1
0
X
0
1
0
X
1
0
1
O p e r a tio n
F0
X
X
0
0
0
1
0
010
110
010
110
000
001
111
Operation2
F3
Operation
F2
Operation1
F (5 0)
F1
Operation0
F0
rs
rt
rd
shamt
funct
31-26
25-21
20-16
15-11
10-6
5-0
rs
rt
address
31-26
25-21
20-16
15-0
Branch instruction:
rs
rt
address
31-26
25-21
20-16
15-0
10
11
The 16-bit offset for branch equal, load, and store is always in
positions 15-0.
12
Add
ALU
Add result
4
Shift
left 2
RegWrite
rs
Instruction [25 21]
Read
address
Instruction
[31 0]
PC
Instruction
memory
rt load
0
M
u
Instruction [15 11] x
1
rd R-type
Read
register 1
Read
register 2
Address
offset
Read
data 1
MemWrite
ALUSrc
Read
Write
data 2
register
Write
Registers
data
RegDst
Instruction [15 0]
16
Sign
extend
0
M
u
x
1
Zero
ALU ALU
result
MemtoReg
Address
Write
data
32
ALU
control
Read
data
Data
memory
1
M
u
x
0
MemRead
Instruction [5 0]
0
M
u
x
1
ALUOp
2-bit
PC does not require a write control, since it is written once at the end of
every clock cycle. Also, since all the multiplexors have two inputs, they
each require a single control line.
There are seven 1-bit control lines + 2-bit ALUOp control signal.
13
RegDst
RegWrite
None
ALUSrc
PCSrc
MemRead
None
MemWrite
None
MemtoReg
14
The input to the control unit is the 6-bit opcode field from the
instruction; except the PCSrc control line.
For PCSrc control line: an AND gate is used to combine the branch
control signal from the control unit and the Zero output from the
ALU.
The AND gate output controls the selection of the next PC.
15
opcode
Instruction [3126]
R egDst
Branch
MemRead
MemtoReg
Control ALUOp
MemWrite
ALUSrc
RegWrite
Instruction [2521]
PC
Read
address
Instruction
memory
Instruction [1511]
Zero
ALU ALU
result
Address
Read
register 1
Instruction [2016]
Instruction
[310]
ALU
Add result
Shift
left 2
0
M
u
x
1
Read
data1
Read
register 2
Registers Read
Write
data2
register
0
M
u
x
1
Write
data
Write
data
Instruction [150]
16
Sign
extend
Read
data
Data
memory
1
M
u
x
0
32
ALU
control
Instruction [50]
16
Add
4
In s tru c tio n [3 1 2 6 ]
C o n tro l
In s tru c tio n [2 5 2 1 ]
PC
Read
a d d re s s
In s tru c tio n
m e m o ry
In s tru c tio n [1 5 1 1 ]
ALU
re s u lt
A LU
ALU
r e s u lt
S h ift
le ft 2
R e gD st
B ra n c h
M e m R e ad
M e m to R e g
ALUO p
M e m W r ite
A L U S rc
R e g W rite
Read
r e g is te r 1
In s tru c tio n [2 0 1 6 ]
In s tru c tio n
[3 1 0 ]
Add
0
M
u
x
1
R e ad
d a ta 1
Read
r e g is te r 2
R e g is te rs R e a d
W rite
d a ta 2
r e g is te r
Z e ro
0
M
u
x
1
W rite
d a ta
A d d re s s
W rite
d a ta
In s tru c tio n [1 5 0 ]
16
S ig n
e x te n d
Read
d a ta
D a ta
m e m o ry
1
M
u
x
0
32
ALU
c o n tr o l
In s tru c tio n [5 0 ]
17
18
The second and third rows of table in previous slide give the control
signal settings for lw and sw.
ALUSrc and ALUOp fields are set to perform the address
calculation.
MemRead and MemWrite are set to perform the memory access.
RegDst and RegWrite are set for a load to cause the result to be
stored into the rt register.
The branch instruction is similar to an R-format operation, since it
sends the rs and rt registers to the ALU.
ALUOp field for branch is set for a subtract (ALU control = 01),
which is used to test for equality.
Note: MemtoReg field is irrelevant when RegWrite signal is 0
since register is not being written, the value of data on register data
write port is not used. So, entry of MemtoReg in last two rows of
table is replaced with X (Dont care).
Dont care can also be added to RegDst when RegWrite is 0.
19
20
PC
RegDst
Branch
MemRead
MemtoReg
Control ALUOp
MemWrite
ALUSrc
RegWrite
Instruction [2521]
Read
address
Instruction
memory
Instruction [1511]
Zero
ALU ALU
result
Address
Read
register 1
Instruction [2016]
Instruction
[310]
ALU
Add result
Shift
left 2
0
M
u
x
1
Read
data1
Read
register 2
Registers Read
Write
data2
register
0
M
u
x
1
Write
data
Write
data
Instruction [150]
16
Sign
extend
Read
data
Data
memory
1
M
u
x
0
32
ALU
control
Instruction [50]
21
22
ALU
result
Shift
left 2
MemRead
Instruction [31 26]
MemtoReg
Control
ALUOp
MemWrite
ALUSrc
RegWrite
Read
register 1
Read
address
Instruction [20 16]
Instruction
[31 0]
Instruction
memory
0
M
u
x
1
Read
data 1
Read
register 2
Registers Read
Write
data 2
register
0
M
u
x
1
Write
data
Zero
ALU ALU
result
Address
Write
data
Instruction [15 0]
16
Sign
extend
Read
data
Data
memory
1
M
u
x
0
32
ALU
control
Instruction [5 0]
23
24
Example:
25
PC
Instruction [1511]
Zero
ALU ALU
result
Address
Read
register 1
Instruction [2016]
Instruction
[310]
Instruction
memory
RegDst
Branch
MemRead
MemtoReg
Control ALUOp
MemWrite
ALUSrc
RegWrite
Instruction [2521]
Read
address
ALU
Add result
Shift
left 2
0
M
u
x
1
Read
data1
Read
register 2
Registers Read
Write
data2
register
0
M
u
x
1
Write
data
Write
data
Instruction [150]
16
Sign
extend
Read
data
Data
memory
1
M
u
x
0
32
ALU
control
Instruction [50]
26
The inputs of the control unit are the 6-bit opcode field.
The encoding for each of the opcodes, both as a decimal number and
as a series of bits that are input to the control unit:
Name
Opcode in
decimal
Opcode in binary
Op5
Op4
Op3
Op2
Op1
Op0
R-format
lw
35
sw
43
beq
27
Inputs
Outputs
R-format
lw
sw
beq
Op5
Signal name
Op4
Op3
Op2
Op1
Op0
RegDst
ALUSrc
MemtoReg
RegWrite
MemRead
MemWrite
Branch
ALUOp1
ALUOp0
28
opcode
35
43
4
O u tp u ts
R -fo r m a t
Iw
sw
beq
R e gD st
A LU S rc
M e m to R e g
R e g W r ite
M em Read
M e m W r ite
B ra n c h
A LU O p1
A LU O pO
29
Implementing Jumps
The jump instruction looks like a branch instruction but
computes the target PC differently and is not conditional.
Instruction format for jump instruction (opcode = 2):
Field
Bit Position
Address
31-26
25-0
30
Shift
left 2
Jump address
[31-0]
M
u
x
M
u
x
ALU
Add result
Zero
ALU ALU
result
Address
Read
data
28
PC+4 [31-28
Jump
Add
4
Instruction [3126]
Shift
left 2
R egDst
Branch
MemRead
MemtoReg
Control
ALUOp
MemWrite
ALUSrc
RegWrite
PC
Instruction[2016]
Instruction
[310]
Instruction
memory
Read
register 1
Instruction[2521]
Read
address
Instruction[1511]
0
M
u
x
1
Read
data1
Read
register 2
Registers Read
Write
data2
register
0
M
u
x
1
Write
data
Write
data
Instruction [150]
16
Sign
extend
Data
memory
1
M
u
x
0
32
ALU
control
Instruction[50]
31
32
33
Assume that the operation times for the major functional units in
the single-cycle implementation are the following:
Memory units: 2 nanoseconds (ns)
ALU and adders: 2 ns
34
Add
ALU
Add result
4
Shift
left 2
RegWrite
Instruction [25 21]
PC
Read
address
Instruction
[31 0]
Instruction
memory
Read
register 1
Read
register 2
Read
data 1
MemWrite
ALUSrc
Read
Write
data 2
register
Write
Registers
data
16
Sign
extend
1
M
u
x
0
1
M
u
x
0
Zero
ALU ALU
result
MemtoReg
Address
Write
data
32
ALU
control
Read
data
Data
memory
1
M
u
x
0
MemRead
Instruction [5 0]
ALUOp
35
36
Answer (Cont.)
The critical path for different instruction classes is as follows:
Instruction
class
ALU type
Instruction
fetch
Register
access
ALU
Register
access
Load word
Instruction
fetch
Register
access
ALU
Memory
access
Store word
Instruction
fetch
Register
access
ALU
Memory
access
Branch
Instruction
fetch
Register
access
ALU
Jump
Instruction
fetch
Register
access
37
Answer (Cont.)
Using these critical paths from the previous slide, we can
compute the required length for each instruction class:
Instruction
class
Instruction
memory
Register
read
ALU
operation
Data
memory
Register
write
Total
ALU type
6 ns
Load word
8 ns
Store word
Branch
Jump
7 ns
5 ns
2 ns
38
Answer (Cont.)
The clock cycle for a machine with a single clock for all
instructions will be determined by the longest
instruction, which is 8 ns.
A machine with a variable clock will have a clock cycle
that varies between 2 ns and 8 ns.
We can find the average clock cycle length for a
machine with a variable-length clock using the
information in previous tables and the instruction
frequency distribution.
The average time per instruction with a variable clock
is:
CPU clock cycle = 8x24% + 7x12% + 6x44% + 5x18% + 2x2%
= 6.3 ns
39
Answer (Cont.)
40
41
General Remarks
With the single-cycle implementation, each functional unit
can be used only once per clock; therefore, some
functional units must be duplicated, raising the cost of the
implementation.
A single-cycle design is inefficient both in its performance
and in its hardware cost!
In the next chapter, we will look at another implementation
technique, called pipelining, that uses a datapath very
similar to the single-cycle datapath, but is much more
efficient.
Pipelining gains efficiency by overlapping the execution of
multiple instructions, increasing hardware utilization and
improving performance.
42