
Datorarkitektur I F 6/7- 1

Petru Eles, IDA, LiTH


INTERNAL STRUCTURE AND
FUNCTIONING OF THE CPU
1. Internal Structure of the CPU
2. Register Organization
3. The Instruction Cycle
4. Instruction Pipelining
5. Pipeline Hazards
6. Structural Hazards
7. Data Hazards
8. Control Hazards
Datorarkitektur I F 6/7- 2
Internal Structure of the CPU
[Figure: block diagram of the CPU, showing the ALU, the Control Unit, the Registers, IR and PC, connected by the Internal CPU Bus; control lines, data lines and address lines connect the CPU to the System Bus.]
Datorarkitektur I F 6/7- 3
Register Organization
The set of registers within the CPU represents the
top level of the memory hierarchy inside the
computer system.
- User visible registers: can be accessed by assembly
language programmers.
- Control and Status registers: used by the Control
Unit to control the operation of the CPU; not
directly accessible by the programmer.
User Visible Registers
Some architectures provide a set of registers which
can be used without restrictions as operands for
any opcode and as address registers; these are
so-called general-purpose registers.
Often the architecture creates a separation between:
- data registers: can be used to hold only data.
Some architectures impose restrictions on the
use of data registers: for example there can be
disjoint sets of registers for integer and for
floating-point computation.
- address registers: registers used only for address
representation and computation: base
registers, index registers, stack pointer, etc. In
some architectures address registers can be
specialized for some of the previous functions.
Datorarkitektur I F 6/7- 4
Some Trade-offs
A large number of general-purpose registers requires a
large number of bits for encoding register operands;
specialization of registers reduces this need.
Too small a number of registers creates problems for
the programmer and leads to increased memory
traffic.
The number of general-purpose or data registers is often
between 8 and 32.
RISC processors often have a very large number of
registers (~100).
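The encoding cost mentioned above can be made concrete with a little arithmetic: selecting one of n registers needs ceil(log2 n) bits per operand field. The sketch below is only an illustration; the function name `register_field_bits` and the three-operand instruction format are assumptions made for the example, not something taken from the slides.

```python
def register_field_bits(num_registers: int) -> int:
    """Bits needed to select one register out of num_registers."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1
    return (num_registers - 1).bit_length()


# Hypothetical three-operand format (destination, source 1, source 2)
for n in (8, 32, 128):
    bits = register_field_bits(n)
    print(f"{n:3d} registers: {bits} bits per operand, "
          f"{3 * bits} bits for three operands")
```

Going from 8 to 128 registers more than doubles the space the operand fields take in every instruction, which is the pressure that register specialization relieves.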
Control and Status Registers
Program Counter (PC): holds the address of the
instruction to be fetched.
Instruction Register (IR): holds the last instruction
fetched.
Memory Address Register (MAR): holds the
address of a memory location that is to be read or
written.
Memory Buffer Register (MBR): holds the data to
be written to memory or the data most recently
read.
Program Status Word (PSW): Condition Code
Flags + other bits defining the status of the CPU
(interrupt enabled/disabled, supervisor, etc.)
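How PC, MAR, MBR and IR cooperate during an instruction fetch can be sketched as follows. This is a toy model under stated assumptions: memory is a word-addressed Python list, every instruction occupies one word (so PC is incremented by one), and the names `CPU` and `fetch` are hypothetical.

```python
class CPU:
    """Toy model of the control registers involved in an instruction fetch."""

    def __init__(self, memory):
        self.memory = memory  # word-addressed memory (assumption)
        self.pc = 0           # Program Counter
        self.ir = None        # Instruction Register
        self.mar = 0          # Memory Address Register
        self.mbr = None       # Memory Buffer Register

    def fetch(self):
        self.mar = self.pc                # address of the instruction to fetch
        self.mbr = self.memory[self.mar]  # memory read: the word lands in MBR
        self.ir = self.mbr                # IR holds the last instruction fetched
        self.pc += 1                      # PC now points to the next instruction
        return self.ir


cpu = CPU(["MUL R2,R3", "ADD R1,R2"])
print(cpu.fetch(), cpu.pc)
print(cpu.fetch(), cpu.pc)
```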
Datorarkitektur I F 6/7- 5
Some Examples of Register Organizations
Z8000: 16 general-purpose registers; no restrictions in use.
Intel 80X86, Pentium:
- 4 data registers
- 4 index and address registers
- 4 base (segment) registers
Some of the address registers can also be used as general-purpose registers.
PowerPC: 2 groups of general-purpose registers, each of 32 registers; one group is for integer (fixed-point) computation, the other one for floating-point computation.
Datorarkitektur I F 6/7- 6
The Instruction Cycle
[Figure: the instruction cycle as a sequence of four steps, refined into six stages.]
Fetch instruction: FI
Decode: DI
Fetch operand:
- Calculate operand address (CO)
- Fetch operand (FO)
Execute instruction:
- Execute instruction (EI)
- Write back operand (WO)
Datorarkitektur I F 6/7- 7
Instruction Pipelining
Instruction execution is extremely complex and
involves several operations which are executed
successively (see slide 6). This implies a large
amount of hardware, but only one part of this
hardware works at a given moment.
Pipelining is an implementation technique whereby
multiple instructions are overlapped in execution.
This is achieved without additional hardware, only
by letting different parts of the hardware work on
different instructions at the same time.
The pipeline organization of a CPU is similar to an
assembly line: the work to be done in an instruction
is broken into smaller steps (pieces), each of which
takes a fraction of the time needed to complete the
entire instruction. Each of these steps is a pipe
stage (or a pipe segment).
Pipe stages are connected to form a pipe:
Stage 1 → Stage 2 → ... → Stage n
The time required for moving an instruction from
one stage to the next is a machine cycle (often this is
one clock cycle). The execution of one instruction
takes several machine cycles as it passes through
the pipeline.
Datorarkitektur I F 6/7- 8
Acceleration by Pipelining
Two-stage pipeline:
FI: fetch instruction
EI: execute instruction
We consider that each instruction takes execution time $T_{ex}$.
Execution time for the 7 instructions, with pipelining:
$(T_{ex}/2) \cdot 8 = 4 \cdot T_{ex}$

Clock cycle   1     2     3     4     5     6     7     8
Instr. i      FI    EI
Instr. i+1          FI    EI
Instr. i+2                FI    EI
Instr. i+3                      FI    EI
Instr. i+4                            FI    EI
Instr. i+5                                  FI    EI
Instr. i+6                                        FI    EI
Datorarkitektur I F 6/7- 9
Acceleration by Pipelining (cont'd)
Six-stage pipeline (see also slide 6):
FI: fetch instruction
DI: decode instruction
CO: calculate operand address
FO: fetch operand
EI: execute instruction
WO: write operand
Execution time for the 7 instructions, with pipelining:
$(T_{ex}/6) \cdot 12 = 2 \cdot T_{ex}$
After a certain time (N-1 cycles) all the N stages of
the pipeline are working: the pipeline is filled. Now,
theoretically, the pipeline works providing maximal
parallelism (N instructions are active simultaneously).

Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12
Instr. i      FI    DI    CO    FO    EI    WO
Instr. i+1          FI    DI    CO    FO    EI    WO
Instr. i+2                FI    DI    CO    FO    EI    WO
Instr. i+3                      FI    DI    CO    FO    EI    WO
Instr. i+4                            FI    DI    CO    FO    EI    WO
Instr. i+5                                  FI    DI    CO    FO    EI    WO
Instr. i+6                                        FI    DI    CO    FO    EI    WO
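The two execution times quoted for the 7 instructions follow one pattern: the first instruction needs N cycles to pass through an N-stage pipeline, and each further instruction completes one cycle later. The sketch below reproduces both figures under the slides' idealization that a cycle lasts T_ex/N and that no hazards occur; the function names are hypothetical.

```python
from fractions import Fraction


def pipelined_time(num_instructions: int, num_stages: int) -> Fraction:
    """Execution time in units of T_ex for an ideal pipeline (no hazards)."""
    # First instruction: num_stages cycles; every further one adds one cycle.
    cycles = num_stages + num_instructions - 1
    # Idealization used on the slides: one cycle lasts T_ex / num_stages.
    return Fraction(cycles, num_stages)


def speedup(num_instructions: int, num_stages: int) -> Fraction:
    """Ratio of unpipelined time (num_instructions * T_ex) to pipelined time."""
    return Fraction(num_instructions) / pipelined_time(num_instructions, num_stages)


for stages in (2, 6):
    t = pipelined_time(7, stages)
    s = float(speedup(7, stages))
    print(f"{stages} stages: {t} * T_ex, speedup {s:.2f}")
```

For 7 instructions this gives 4 T_ex with two stages and 2 T_ex with six stages. As the number of instructions grows, the speedup approaches the number of stages, because the N-1 cycles needed to fill the pipeline are amortized.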
Datorarkitektur I F 6/7- 10
Acceleration by Pipelining (cont'd)
Apparently a greater number of stages always
provides better performance. However:
- a greater number of stages increases the overhead
in moving information between stages
and in synchronization between stages.
- the complexity of the CPU grows with the number
of stages.
- it is difficult to keep a large pipeline at maximum
rate because of pipeline hazards.
80486 and Pentium: five-stage pipeline for integer instructions,
eight-stage pipeline for FP instructions.
PowerPC: four-stage pipeline for integer instructions,
six-stage pipeline for FP instructions.
Datorarkitektur I F 6/7- 11
Pipeline Hazards
Pipeline hazards are situations that prevent the
next instruction in the instruction stream from
executing during its designated clock cycle. The
instruction is said to be stalled. When an instruction
is stalled, all instructions later in the pipeline than
the stalled instruction are also stalled. Instructions
earlier than the stalled one can continue. No new
instructions are fetched during the stall.
Types of hazards:
1. Structural hazards
2. Data hazards
3. Control hazards
Datorarkitektur I F 6/7- 12
Structural Hazards
Structural hazards occur when a certain resource
(memory, functional unit) is requested by more than
one instruction at the same time.
Instruction ADD R4,X fetches in the FO stage operand X
from memory. The memory doesn't accept another
access during that cycle.

Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12
ADD R4,X      FI    DI    CO    FO    EI    WO
Instr. i+1          FI    DI    CO    FO    EI    WO
Instr. i+2                FI    DI    CO    FO    EI    WO
Instr. i+3                      stall FI    DI    CO    FO    EI    WO
Instr. i+4                                  FI    DI    CO    FO    EI    WO

Penalty: 1 cycle
Certain resources are duplicated in order to avoid
structural hazards. Functional units (ALU, FP unit)
can be pipelined themselves in order to support
several instructions at a time. A classical way to
avoid hazards at memory access is by providing
separate data and instruction caches.
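The conflict in the ADD R4,X example can be located mechanically. The sketch below assumes an ideal six-stage schedule (instruction k enters FI in cycle k+1) and a single memory port shared by FI and FO; the helper names `stage_of` and `memory_conflicts` are hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def stage_of(instr_index, cycle):
    """Stage occupied by instruction instr_index (0-based) in a cycle (1-based)."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def memory_conflicts(memory_operand_instrs, num_instructions, num_cycles):
    """Return (cycle, fetching_instr, operand_instr) for each FI/FO collision."""
    conflicts = []
    for cycle in range(1, num_cycles + 1):
        for i in memory_operand_instrs:
            if stage_of(i, cycle) != "FO":
                continue  # instruction i does not access memory in this cycle
            for j in range(num_instructions):
                if stage_of(j, cycle) == "FI":
                    conflicts.append((cycle, j, i))
    return conflicts


# Instruction 0 is ADD R4,X (memory operand); instructions 1..4 follow it.
print(memory_conflicts({0}, 5, 12))  # [(4, 3, 0)]
```

The single collision is in cycle 4, between the operand fetch of ADD R4,X and the instruction fetch of Instr. i+3, which is therefore the instruction that gets stalled for one cycle.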
Datorarkitektur I F 6/7- 13
Data Hazards
We have two instructions, I1 and I2. In a pipeline
the execution of I2 can start before I1 has
terminated. If in a certain stage of the pipeline, I2
needs the result produced by I1, but this result has
not yet been generated, we have a data hazard.
I1: MUL R2,R3    R2 ← R2 * R3
I2: ADD R1,R2    R1 ← R1 + R2

Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12
MUL R2,R3     FI    DI    CO    FO    EI    WO
ADD R1,R2           FI    DI    CO    stall stall FO    EI    WO

Before executing its FO stage, the ADD instruction is
stalled until the MUL instruction has written the result
into R2. Instr. i+2 follows with its six stages
(FI DI CO FO EI WO), delayed behind the stalled ADD.
Penalty: 2 cycles
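The dependency between MUL and ADD is a read-after-write dependency: ADD reads a register that MUL writes. A minimal check for it can be sketched as below. It assumes the two-operand register form used in the example, where the first operand is both a source and the destination; the helpers `parse` and `raw_hazard` are hypothetical.

```python
def parse(instr):
    """Split 'ADD R1,R2' into (opcode, destination, set of source registers).

    Assumes the two-operand form of the example, in which the first
    operand is both a source and the destination (R1 <- R1 + R2).
    """
    opcode, operands = instr.split()
    dest, src = operands.split(",")
    return opcode, dest, {dest, src}


def raw_hazard(first, second):
    """True if `second` reads a register that `first` writes."""
    _, dest_first, _ = parse(first)
    _, _, sources_second = parse(second)
    return dest_first in sources_second


print(raw_hazard("MUL R2,R3", "ADD R1,R2"))  # True: ADD reads R2, written by MUL
print(raw_hazard("MUL R2,R3", "ADD R1,R4"))  # False: no shared register
```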
Datorarkitektur I F 6/7- 14
Data Hazards (cont'd)
Some of the penalty produced by data hazards can
be avoided using a technique called forwarding
(bypassing).
The ALU result is always fed back to the ALU input.
If the hardware detects that the value needed for the
current operation is the one produced by the previous
operation (but which has not yet been written back)
it selects the forwarded result as the ALU input,
instead of the value read from register or memory.
After the EI stage of the MUL instruction the result is
available by forwarding. The penalty is reduced to one cycle.
[Figure: forwarding hardware. Each of the two ALU inputs is selected by a MUX, either from register or memory or from a bypass path that feeds back the ALU result; the ALU result also goes to register or memory.]

Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12
MUL R2,R3     FI    DI    CO    FO    EI    WO
ADD R1,R2           FI    DI    CO    stall FO    EI    WO
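The selection made by each MUX can be sketched in a few lines. This is a simplified software model, not a description of real hardware: the register file is a dictionary, only the immediately preceding operation is considered, and the function name `alu_input` and its parameters are hypothetical.

```python
def alu_input(operand_reg, register_file, prev_dest, prev_result):
    """Select one ALU input the way the MUX in the figure does.

    prev_dest and prev_result describe the previous ALU operation, whose
    result may not have been written back to the register file yet.
    """
    if prev_dest is not None and operand_reg == prev_dest:
        return prev_result             # bypass path: forwarded ALU result
    return register_file[operand_reg]  # normal path: value read from the register


# The register file still holds the old R2; MUL R2,R3 has just produced 2 * 3.
regs = {"R1": 10, "R2": 2, "R3": 3}
mul_result = regs["R2"] * regs["R3"]

# ADD R1,R2: R1 comes from the register file, R2 from the bypass path.
a = alu_input("R1", regs, prev_dest="R2", prev_result=mul_result)
b = alu_input("R2", regs, prev_dest="R2", prev_result=mul_result)
print(a + b)  # 16
```

Without the bypass path the ADD would have read the stale value 2 from R2 and produced 12.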
Datorarkitektur I F 6/7- 15
Control Hazards
Control hazards are produced by branch
instructions.
Unconditional branch
- - - - - - - - - - - - - -
BR TARGET
- - - - - - - - - - - - - -
TARGET - - - - - - - - - - - - - -
Penalty: 3 cycles

Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12
BR TARGET     FI    DI    CO    FO    EI    WO
target              FI    stall stall FI    DI    CO    FO    EI    WO
target+1                                    FI    DI    CO    FO    EI    WO

Cycle 2: the instruction following the branch is fetched;
before the DI is finished it is not known that a branch is
executed. Later the fetched instruction is discarded.
Cycle 5: after the FO stage of the branch instruction the
address of the target is known and it can be fetched.
Datorarkitektur I F 6/7- 16
Control Hazards (cont'd)
Conditional branch
ADD R1,R2     R1 ← R1 + R2
BEZ TARGET    branch if zero
instruction i+1
- - - - - - - - - - - - -
TARGET - - - - - - - - - - - - -

Branch is taken

Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12
ADD R1,R2     FI    DI    CO    FO    EI    WO
BEZ TARGET          FI    DI    CO    FO    EI    WO
target                    FI    stall stall FI    DI    CO    FO    EI    WO

At the end of cycle 5, both the condition (set by ADD) and
the target address are known.
Penalty: 3 cycles

Branch not taken

Clock cycle   1     2     3     4     5     6     7     8     9     10    11    12
ADD R1,R2     FI    DI    CO    FO    EI    WO
BEZ TARGET          FI    DI    CO    FO    EI    WO
Instr. i+1                FI    stall stall DI    CO    FO    EI    WO

At the end of cycle 5 the condition is known and
instr. i+1 can go on.
Penalty: 2 cycles
Datorarkitektur I F 6/7- 17
Control Hazards (cont'd)
With a conditional branch we have a penalty even if
the branch has not been taken. This is because we
have to wait until the branch condition is available.
Branch instructions represent a major problem in
assuring an optimal flow through the pipeline.
Several approaches have been taken for reducing
branch penalties (see slides of the following
lecture).
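To get a feeling for how much the branch penalties cost, the two figures from the six-stage example (3 cycles if the branch is taken, 2 if not) can be combined into an average. The sketch below is a rough estimate only: the branch frequency and the fraction of taken branches are made-up inputs, all other hazards are ignored, and the function names are hypothetical.

```python
def average_branch_penalty(taken_fraction, taken_penalty=3, not_taken_penalty=2):
    """Average stall cycles per conditional branch.

    The default penalties are those of the six-stage example; the
    fraction of taken branches is an assumption supplied by the caller.
    """
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must be between 0 and 1")
    return (taken_fraction * taken_penalty
            + (1 - taken_fraction) * not_taken_penalty)


def cycles_per_instruction(branch_fraction, taken_fraction):
    """Ideal rate of one cycle per instruction plus branch stalls only."""
    return 1 + branch_fraction * average_branch_penalty(taken_fraction)


# Hypothetical workload: 20% conditional branches, 60% of them taken.
print(round(cycles_per_instruction(0.2, 0.6), 2))
```

With these made-up numbers the pipeline needs about 1.5 cycles per instruction instead of 1, which illustrates why reducing branch penalties matters.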
Datorarkitektur I F 6/7- 18
Summary
The main components of the CPU are: Control
Unit, ALU and Register set. They are
interconnected through the internal CPU Bus.
Interconnection with external modules is through
the System Bus. Control signals issued by the
Control Unit coordinate the functionality and data
flow inside the CPU and between CPU and external
modules.
The register set is the top level of the memory
hierarchy. Only a part of the registers is user visible.
User visible registers can be general-purpose or
specialised.
Instructions are executed by the CPU as a
sequence of steps. Instruction execution can be
substantially accelerated by instruction pipelining.
A pipeline is organized as a succession of N
stages. At a certain moment N instructions can be
active inside the pipeline.
Keeping a pipeline at its maximal rate is prevented
by pipeline hazards. Structural hazards are due to
resource conflicts. Data hazards are produced by
data dependencies between instructions. Control
hazards are produced as a consequence of branch
instructions.
