
CS1104: Computer Organisation

http://www.comp.nus.edu.sg/~cs1104
School of Computing
National University of Singapore

PII Lecture 6: Processor: Datapath and Control

Datapath:
  Single-bus Organization
  Multiple-bus Organization
MIPS: Multicycle Datapath and Control
  Stages of Instructions
  Datapath Walkthroughs
Processor and Logic Design

CS1104-P2-6

Processor: Datapath a

Reading:
  Chapter 9 of textbook, which is Chapter 7 in Computer Organization by Hamacher, Vranesic and Zaky.
  Optional reading: Chapter 5 in Computer Organization & Design by Patterson and Hennessy.


Datapath


Recap: Organisation

[Figure: block diagram of a computer: Processor (Control, Datapath, Registers), Cache, Memory and Devices (Input, Output), connected by a Bus.]

Fundamental Concepts

Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making).
Datapath: portion of the processor which contains hardware necessary to perform all operations required by the computer (the brawn).
Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain).

Fundamental Concepts (2)

Instruction execution cycle: fetch, decode, execute.
  Fetch: fetch next instruction (using PC) from memory into IR.
  Decode: decode the instruction.
  Execute: execute instruction.

[Figure: instruction execution cycle: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.]

Fundamental Concepts (3)

Fetch: Fetch next instruction into IR (Instruction Register).
  Assume each word is 4 bytes and each instruction is stored in a word, and that the memory is byte addressable.
  PC (Program Counter) contains address of next instruction.
    IR ← [[PC]]
    PC ← [PC] + 4
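
The two register transfers of the fetch step can be sketched in Python. This is only an illustration, not part of the original slides: the dictionary used as byte-addressable memory, the function name fetch, and the sample addresses and instruction words are all assumptions made for the example.

```python
WORD_BYTES = 4  # each instruction occupies one 4-byte word


def fetch(memory, pc):
    """Model IR <- [[PC]] followed by PC <- [PC] + 4.

    `memory` is a hypothetical dict mapping word-aligned byte
    addresses to 32-bit words.
    """
    ir = memory[pc]              # contents of the location addressed by PC
    return ir, pc + WORD_BYTES   # PC now points to the next instruction


# Arbitrary sample words placed at two consecutive word addresses.
memory = {0x100: 0x00221820, 0x104: 0x8C230028}
ir, pc = fetch(memory, 0x100)
```

After the call, ir holds the word stored at address 0x100 and pc has advanced by one word to 0x104.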


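Single-bus Organization

[Figure: single-bus datapath. Attached to the internal processor bus: PC, MAR (driving the memory-bus address lines), MDR (connected to the memory-bus data lines), IR with the instruction decoder and control logic (source of the control signals), register Y, general-purpose registers R0 to R(n-1), TEMP and Z. A MUX controlled by Select feeds either Y or the constant 4 to ALU input A. ALU control lines include Add, Sub, ..., XOR and Carry-in; the ALU output goes to Z.]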

Instruction Execution

An instruction can be executed by performing one or more of the following operations in some specified sequence:
  Transfer a word of data from one register to another or to the ALU (Arithmetic Logic Unit).
  Perform an arithmetic or a logic operation and store the result in a register.
  Fetch the contents of a given memory location and load them into a register.
  Store a word of data from a register into a given memory location.

Register Transfer

Register to register transfer:
  For each register Ri, two control signals:
    Riin is used to load the data on the bus into the register.
    Riout is used to place the register's contents on the bus.
  Example: To transfer contents of R1 to R4:
    Set R1out to 1. This places contents of R1 on the bus.
    Set R4in to 1. This loads data from the processor bus into R4.

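Register Transfer (2)

[Figure: input and output gating for the registers. Switches controlled by Riin/Riout, Yin and Zin/Zout connect Ri, Y and Z to the internal processor bus; a MUX (Select) passes Y or the constant 4 to ALU input A, and the ALU result is latched in Z.]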

Arithmetic/Logic Operation

ALU: Performs arithmetic and logic operations on its A and B inputs.
To perform R3 ← [R1] + [R2]:
  1. R1out, Yin
  2. R2out, SelectY, Add, Zin
  3. Zout, R3in
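
The three control steps can be traced with a small Python sketch. It is an illustration under stated assumptions rather than anything from the slides: registers are held in a dict, the function name add_r1_r2_into_r3 is made up, and results are wrapped to 32 bits.

```python
def add_r1_r2_into_r3(regs):
    """Trace R3 <- [R1] + [R2] over a single internal bus.

    `regs` is a hypothetical dict with keys R1, R2, R3, Y and Z.
    Only one value can be on the bus in each step.
    """
    regs = dict(regs)  # work on a copy

    # Step 1: R1out, Yin (R1 drives the bus, Y latches it)
    bus = regs["R1"]
    regs["Y"] = bus

    # Step 2: R2out, SelectY, Add, Zin (A input = Y, B input = bus)
    bus = regs["R2"]
    regs["Z"] = (regs["Y"] + bus) & 0xFFFFFFFF  # assumed 32-bit ALU

    # Step 3: Zout, R3in (Z drives the bus, R3 latches it)
    bus = regs["Z"]
    regs["R3"] = bus
    return regs


result = add_r1_r2_into_r3({"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0})
```

Registers Y and Z are needed because the single bus can carry only one operand at a time: Y holds the first operand while the second is on the bus, and Z holds the sum until the bus is free.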

[Figure: the register gating diagram from Register Transfer (2), repeated.]

Arithmetic/Logic Operation (2)

If there are n operations, do we need n ALU control lines?
  We could use encoding, which requires only ⌈log2 n⌉ control lines for n operations.
  However, this will increase complexity and hardware (additional decoder needed).
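
The trade-off can be made concrete with a short Python sketch. The function names and the list-of-bits representation are assumptions chosen for illustration; the second function plays the role of the additional decoder mentioned above.

```python
def alu_control_lines(n_ops, encoded):
    """Number of ALU control lines needed for n_ops operations."""
    if not encoded:
        return n_ops  # one dedicated line per operation
    # (n_ops - 1).bit_length() equals ceil(log2(n_ops)) for n_ops >= 1;
    # at least one line is kept even for a single operation.
    return max(1, (n_ops - 1).bit_length())


def decode_one_hot(code, n_ops):
    """The extra decoder: expand a binary code into one line per operation."""
    return [1 if i == code else 0 for i in range(n_ops)]


lines_plain = alu_control_lines(16, encoded=False)
lines_encoded = alu_control_lines(16, encoded=True)
```

For 16 operations the unencoded scheme needs 16 lines while the encoded one needs 4, at the cost of the decoder that turns the code back into individual operation signals.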
[Figure: ALU with control lines Add, Sub, ..., XOR and a Carry-in input.]

Reading a Word from Memory

Move (R1), R2   /* R2 ← [[R1]] */
  1. MAR ← [R1]
  2. Start a Read operation on the memory bus
  3. Wait for the MFC response from the memory
  4. Load MDR from the memory bus
  5. R2 ← [MDR]

MDR has four control signals: MDRin, MDRout, MDRinE and MDRoutE.

[Figure: MDR sits between the memory-bus data lines (gated by MDRinE and MDRoutE) and the internal processor bus (gated by MDRin and MDRout).]

Reading a Word from Memory (2)

Move (R1), R2   /* R2 ← [[R1]] */
Sequence of control steps:
  1. R1out, MARin, Read
  2. MDRinE, WMFC
  3. MDRout, R2in

WMFC: Wait for arrival of MFC (Memory-Function-Completed) signal.
MFC: To accommodate variability in response time, the processor waits until it receives an indication that the Read/Write operation has been completed. The addressed device sets MFC to 1 to indicate this.

Storing a Word in Memory

Move R2, (R1)   /* [R1] ← [R2] */
Sequence of control steps:
  1. R1out, MARin
  2. R2out, MDRin, Write
  3. MDRoutE, WMFC

Executing a Complete Instruction

Add (R3), R1   /* R1 ← [R1] + [[R3]] */
Adds the contents of a memory location pointed to by R3 to register R1.
Sequence of control steps:
  1. PCout, MARin, Read, Select4, Add, Zin
  2. Zout, PCin, Yin, WMFC
  3. MDRout, IRin
  4. R3out, MARin, Read
  5. R1out, Yin, WMFC
  6. MDRout, SelectY, Add, Zin
  7. Zout, R1in, End
Steps 1 to 3: instruction fetch.
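
As a rough check of this sequence, the sketch below interprets each control step as a set of asserted signals. It is a simplified model built on assumptions that are not in the slides: timing is ignored, WMFC is treated as the moment MDR is loaded from memory (so MDRinE is implied), arithmetic wraps at 32 bits, and the names run, STEPS and the sample register and memory contents are hypothetical.

```python
STEPS = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]


def run(steps, regs, memory):
    """Apply single-bus control steps to a register dict and return the result."""
    r = dict(regs)
    pending = None  # word requested from memory but not yet in MDR
    for step in steps:
        s = set(step)
        # Exactly one "...out" signal drives the single bus in a step.
        source = next(name for name in s if name.endswith("out"))
        bus = r[source[:-3]]
        # ALU: input A is the constant 4 or Y, input B is the bus.
        if "Add" in s:
            a = 4 if "Select4" in s else r["Y"]
            alu = (a + bus) & 0xFFFFFFFF
        # Latch every register whose "...in" signal is asserted.
        for name in s:
            if name.endswith("in"):
                r[name[:-2]] = alu if name == "Zin" else bus
        if "Read" in s:
            pending = memory[r["MAR"]]
        if "WMFC" in s:  # assumption: MDR is loaded when MFC arrives
            r["MDR"] = pending
    return r


regs = {"PC": 0x100, "R1": 10, "R3": 0x200,
        "Y": 0, "Z": 0, "MAR": 0, "MDR": 0, "IR": 0}
memory = {0x100: 0xABCD, 0x200: 32}  # arbitrary instruction word and operand
final = run(STEPS, regs, memory)
```

With these sample values the operand 32 at address 0x200 is added to R1, the PC advances by 4 during the fetch, and IR receives the word that was stored at the old PC.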


Multiple-Bus Organization

Single-bus structure: Control sequences are long as only one data item can be transferred over the bus in a clock cycle.
Figure on next slide shows a three-bus structure.
All registers are combined into a single block called register file with three ports: 2 outputs allowing 2 registers to be accessed simultaneously and have their contents put on buses A and B, and 1 input allowing data on bus C to be loaded into a third register.
Buses A and B are used to transfer source operands to the A and B inputs of ALU, and the result is transferred to the destination over bus C.

Multiple-Bus Organization (2)

[Figure: three-bus datapath with buses A, B and C. Components shown: register file, PC with Incrementer, IR and instruction decoder, MUX with Constant 4, ALU (inputs A and B, output R), MAR to the memory-bus address lines, MDR to the memory-bus data lines.]

Multiple-Bus Organization (3)

For the ALU, R=A (or R=B) means that its A (or B) input is passed unmodified to bus C.

Add R4, R5, R6   /* R6 ← [R4] + [R5] */
Adds the contents of R4 and R5 and places the sum in R6.
Sequence of control steps:
  1. PCout, R=B, MARin, Read, IncPC
  2. WMFC
  3. MDRoutB, R=B, IRin
  4. R4outA, R5outB, SelectA, Add, R6in, End

Control

Hardwired control or microprogrammed control.
Hardwired control:

[Figure: hardwired control unit. A clock (CLK) drives the control step counter; the decoder/encoder combines the step counter with IR, external inputs and condition codes to generate the control signals.]

Control (2)

Microprogrammed control:
  Control signals generated by a program.
  Control word (CW) is a microinstruction that contains individual bits that represent the various control signals.
  Vertical organization: highly encoded scheme that uses compact codes to specify only a small number of control functions in each microinstruction.
  Horizontal organization: minimally encoded scheme in which many resources can be controlled with a single microinstruction.
  Popular in Complex Instruction Set Architectures (CISC) because complex instruction sets require complex controllers that can more easily be implemented as microprograms.

Control (3)

Example of a horizontal organization scheme:

Control steps:
  1. PCout, MARin, Read, Select4, Add, Zin
  2. Zout, PCin, Yin, WMFC
  3. MDRout, IRin
  4. R3out, MARin, Read
  5. R1out, Yin, WMFC
  6. MDRout, SelectY, Add, Zin
  7. Zout, R1in, End

Microinstruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End  ..
1                 0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                 1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                 0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                 0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                 0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                 0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                 0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1
..

Select=0: SelectY
Select=1: Select4
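
The mapping from control steps to horizontal control words can be sketched in Python as follows. This is an illustrative encoding only: the column order in SIGNALS follows the table's signal labels, while the names SIGNALS, STEPS and control_word and the string-of-bits representation are assumptions made for the example.

```python
# One bit per control signal, in the column order of the table.
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

STEPS = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]


def control_word(step):
    """Encode one control step as a string of 0/1 bits, one per signal."""
    active = set(step)
    if "Select4" in active:
        active.add("Select")  # Select=1 means Select4, Select=0 means SelectY
    return "".join("1" if name in active else "0" for name in SIGNALS)


WORDS = [control_word(step) for step in STEPS]
```

Each entry of WORDS corresponds to one row of the table. Most bits in every word are 0, which illustrates why horizontal microinstructions are wide but need no decoding before they drive the datapath.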


MIPS: Multicycle Datapath and Control


Adapted from D. Patterson's CS61C
http://www.cs.berkeley.edu/~pattrsn/61CF00
Copyright 2000 UCB

Stages of a Datapath

Problem: a single, atomic block which executes an instruction (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient.

Solution: break up the process of executing an instruction into stages, and then connect the stages to create the whole datapath.
  Smaller stages are easier to design.
  Easy to optimize (change) one stage without touching the others.

Stages of a Datapath (2)

There is a wide variety of MIPS instructions, so what general steps do they have in common?

Stages
  1. Instruction Fetch
  2. Instruction Decode
  3. ALU
  4. Memory Access
  5. Register Write

Stages of a Datapath (3)

Stage 1: Instruction Fetch.
  No matter what the instruction is, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy).
  Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction; byte addressing so + 4).

Stages of a Datapath (4)

Stage 2: Instruction Decode
  Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data).
  First, read the opcode to determine instruction type and field lengths.
  Second, read in data from all necessary registers.
    For add, read two registers.
    For addi, read one register.
    For jal, no read necessary.
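
The decode step can be sketched in Python using the standard 32-bit MIPS field layout (opcode in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shamt in 10-6, funct in 5-0, immediate in 15-0). The function names and the hand-assembled sample words are assumptions made for this example, and only the three instructions named above are handled.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm": word & 0xFFFF,
    }


def registers_to_read(word):
    """Return the register numbers that must be read in the decode stage."""
    f = decode(word)
    if f["opcode"] == 0x00:  # R-type such as add: read rs and rt
        return [f["rs"], f["rt"]]
    if f["opcode"] == 0x08:  # addi: read rs only
        return [f["rs"]]
    if f["opcode"] == 0x03:  # jal: no register read
        return []
    raise ValueError("opcode not covered by this sketch")


ADD_3_1_2 = 0x00221820    # add  $3, $1, $2
ADDI_3_1_17 = 0x20230011  # addi $3, $1, 17
JAL_0 = 0x0C000000        # jal  0
```

The opcode is examined first because it decides which of the remaining fields are meaningful: rd and funct only matter for R-type words, while imm only matters for immediate-type words.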


Stages of a Datapath (5)

Stage 3: ALU (Arithmetic-Logic Unit)
  The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt).
  What about loads and stores?
    lw $t0, 40($t1)
    The address we are accessing in memory = the value in $t1 plus the value 40.
    We do this addition at this stage.

Stages of a Datapath (6)

Stage 4: Memory Access
  Only the load and store instructions do anything during this stage; the other instructions remain idle.
  Since loads and stores have this unique step, we need this extra stage to account for them.
  As a result of the cache system, this stage is expected to be just as fast (on average) as the others.

Stages of a Datapath (7)

Stage 5: Register Write
  Most instructions write the result of some computation into a register.
    Examples: arithmetic, logical, shifts, loads, slt
  What about stores, branches, jumps?
    They do not write anything into a register at the end.
    These remain idle during this fifth stage.

Datapath: Generic Steps

[Figure: generic five-step datapath: PC (with +4 adder) and instruction memory (1. Instruction Fetch); register file addressed by rd, rs, rt, plus the immediate imm (2. Decode/Register Read); ALU (3. Execute); data memory (4. Memory); write-back to the registers (5. Reg. Write).]

Datapath Walkthroughs: add

add $r3,$r1,$r2 # r3 = r1+r2
  Stage 1: Fetch this instruction, increment PC.
  Stage 2: Decode to find that it is an add instruction, then read registers $r1 and $r2.
  Stage 3: Add the two values retrieved in stage 2.
  Stage 4: Idle (nothing to write to memory).
  Stage 5: Write result of stage 3 into register $r3.

Datapath Walkthroughs: add (2)

[Figure: generic datapath annotated for add r3, r1, r2: the register file supplies reg[1] and reg[2], the ALU produces reg[1]+reg[2], data memory is bypassed, and the sum is written to register 3.]

Datapath Walkthroughs: slti

slti $r3,$r1,17
  Stage 1: Fetch this instruction, increment PC.
  Stage 2: Decode to find it is an slti, then read register $r1.
  Stage 3: Compare value retrieved in stage 2 with the integer 17.
  Stage 4: Go idle.
  Stage 5: Write the result of stage 3 in register $r3.

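Datapath Walkthroughs: slti (2)

[Figure: generic datapath annotated for slti r3, r1, 17: the register file supplies reg[1], the immediate field supplies 17, the ALU computes reg[1]-17 for the comparison, data memory is bypassed, and the result is written to register 3.]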

Datapath Walkthroughs: sw

sw $r3, 20($r1)
  Stage 1: Fetch this instruction, increment PC.
  Stage 2: Decode to find it is an sw, then read registers $r1 and $r3.
  Stage 3: Add 20 to value in register $r1 (retrieved in stage 2).
  Stage 4: Write value in register $r3 (retrieved in stage 2) into memory address computed in stage 3.
  Stage 5: Go idle (nothing to write into a register).

Datapath Walkthroughs: sw (2)

[Figure: generic datapath annotated for sw r3, 20(r1): the register file supplies reg[1] and reg[3], the immediate field supplies 20, the ALU computes reg[1]+20, and data memory performs MEM[r1+20] <- r3.]

Why Five Stages?

Could we have a different number of stages?
  Yes, and other architectures do.
So why does MIPS have five stages, if instructions tend to go idle for at least one stage?
  There is one instruction that uses all five stages: the load.

Datapath Walkthroughs: lw

lw $r3, 40($r1)
  Stage 1: Fetch this instruction, increment PC.
  Stage 2: Decode to find it is a lw, then read register $r1.
  Stage 3: Add 40 to value in register $r1 (retrieved in stage 2).
  Stage 4: Read value from memory address computed in stage 3.
  Stage 5: Write value found in stage 4 into register $r3.
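
The four walkthroughs (add, slti, sw, lw) can be condensed into one Python sketch that passes every instruction through the same five stages and records which stages are idle. It is a behavioural illustration only: the tuple format (op, reg, base, operand), the function name run_instruction and the dict-based registers and memory are assumptions, and the already-decoded tuple stands in for the fetched instruction word.

```python
def run_instruction(instr, regs, mem, pc):
    """Run one instruction through the five stages.

    Returns (new_pc, idle_stages); `regs` and `mem` are updated in place.
    """
    op, reg, base, operand = instr  # e.g. ("lw", 3, 1, 40) for lw $r3, 40($r1)
    idle = []

    # Stage 1: instruction fetch (instr is already in hand), increment PC.
    pc += 4

    # Stage 2: decode and read the first source register.
    a = regs[base]

    # Stage 3: ALU.
    if op == "add":
        alu = a + regs[operand]          # operand names a second register
    elif op == "slti":
        alu = 1 if a < operand else 0    # operand is an immediate
    elif op in ("lw", "sw"):
        alu = a + operand                # effective address
    else:
        raise ValueError("instruction not covered by this sketch")

    # Stage 4: memory access (only loads and stores are active).
    if op == "lw":
        value = mem[alu]
    elif op == "sw":
        mem[alu] = regs[reg]
    else:
        idle.append(4)
        value = alu

    # Stage 5: register write (stores have nothing to write back).
    if op == "sw":
        idle.append(5)
    else:
        regs[reg] = value
    return pc, idle


regs = {1: 100, 2: 5, 3: 0}
mem = {140: 77}
pc, idle = run_instruction(("lw", 3, 1, 40), regs, mem, 0)
```

Of the four instructions, only the load returns an empty idle list, which matches the observation that it is the one instruction using all five stages.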


Datapath Walkthroughs: lw (2)

[Figure: generic datapath annotated for lw r3, 40(r1): the register file supplies reg[1], the immediate field supplies 40, the ALU computes reg[1]+40, data memory reads MEM[r1+40], and the loaded value is written to register 3.]

What Hardware Is Needed?

PC: a register which keeps track of address of the next instruction.
General Purpose Registers
  Used in stages 2 (read) and 5 (write).
  We are currently working with 32 of these.
Memory
  Used in stages 1 (fetch) and 4 (R/W).
  Cache system makes these two stages as fast as the others, on average.

Datapath: Summary

Construct datapath based on register transfers required to perform instructions.
Control part causes the right transfers to happen.

[Figure: the generic datapath (PC, +4, instruction memory, registers with rd/rs/rt, imm, ALU, data memory) with a Controller driven by the opcode and funct fields.]

Where is Logic Design Used?

Combinational circuits for ALU and other parts of the datapath.
Sequential circuits: different control signals are needed for different clock cycles and different instructions for the ALU, registers and other parts of the datapath.

[Figure: ALU with its ALU Control block.]

Where is Logic Design Used? (2)

[Figure: high-level view of finite state machine control: from Start, a common "Instruction fetch/decode and register fetch" phase branches into memory access instructions, R-type instructions, branch instruction and jump instruction.]

Sequential logic design can be used to assert the correct control signals at the correct times.

Summary

Datapath is the hardware that performs operations necessary to execute programs.
Control instructs datapath on what to do next.
Datapath needs:
  access to storage (general purpose registers and memory)
  computational ability (ALU)
  helper hardware (local registers and PC)

Summary (2)

Five stages of datapath (executing an instruction):
  1: Instruction Fetch (Increment PC)
  2: Instruction Decode (Read Registers)
  3: ALU (Computation)
  4: Memory Access
  5: Write to Registers
ALL instructions must go through ALL five stages.
Datapath designed in hardware.

End of file
