
Topic 4 Processor Performance

AH Computing
Introduction
6502
8-bit processor, 16-bit address bus
Intel 8086/88 (1979)
Used in the IBM PC
16-bit data bus (8-bit on the 8088), 20-bit address bus
Motorola 68000
16-bit data bus and 24-bit address bus
PowerPC (1992)
Incorporated pipelining and superscalar execution
Introduction

Technological developments
RISC processors
SIMD
Pipelining
Superscalar processing
CISC and RISC
CISC - Complex Instruction Set Computer

Memory in those days was expensive:
a bigger program -> more storage -> more money.
Hence the need to reduce the number of instructions per program.
The number of instructions is reduced by having multiple operations within a single instruction.
Multiple operations lead to many different kinds of instructions that access memory,
in turn making instruction length variable and fetch-decode-execute time unpredictable.
Thus the hardware handles the complexity.
Example: the x86 ISA
CISC

CISC Language Development


Increase the size of instruction
sets (by providing more operations)
Design ever more complex instructions
Provide more addressing modes
Implement some HLL constructs in
machine instruction sets
CISC
Intel 8086, 80286, 80386, 80486, Pentium
The logic for each instruction has to be hard-wired
into the control unit
As new instructions were developed, they were
added to the original instruction set
This made processors difficult and expensive to design and build
One way of solving this problem is to use
microprogramming
CISC

Microprogramming: complex
instructions are split into a series of
simpler instructions
When a complex instruction is executed,
the CPU executes a small microprogram
stored in a control memory
This simplifies the design of the processor and
allows the addition of new complex
instructions
RISC

An attempt to make the architecture simpler:

Reduce the number of instructions
Make them all the same format if possible
Reduce the number of memory accesses
required by increasing the number of
registers
Reduce the number of addressing modes
Allow pipelining of instructions
RISC

The characteristics of most RISC
processors are:

A large number of general-purpose (GP) registers
A small number of simple instructions
that mostly have the same format
A minimal number of addressing modes
Optimisation of the instruction pipeline
RISC
                          CISC processor   RISC processor
                          Intel 80486      Sun SPARC
Year developed            1989             1987
No. of instructions       235              69
Instruction size (bytes)  1-11             4
Addressing modes          11               1
GP registers              8                40-520
RISC in the Home
Your home is likely to have many devices with
RISC-based processors.
Devices using RISC-based processors include
the Nintendo Wii, Microsoft Xbox 360, Sony
PlayStation3, Nintendo DS and many
televisions and phones.
However, the x86 processors found in
nearly all of the world's personal computers
are CISC. This is a limitation born of necessity:
adopting a new instruction set for PC
processors would mean that all the software
used in PCs would no longer function.
Scholar Activity

Characteristics of RISC processors

Review Questions: Q6, Q7
2010: Q14a-c
Parallel Processing
At least two microprocessors handle parts of an
overall task.
A computer scientist divides a complex problem into
component parts using special software specifically
designed for the task.
He or she then assigns each component part to a
dedicated processor.
Each processor solves its part of the overall
computational problem.
The software reassembles the data to reach the end
conclusion of the original complex problem.
SISD
Single Instruction, Single Data (SISD)
computers have one processor that handles
one algorithm using one source of data at a
time. The computer tackles and processes
each task in order, and so sometimes people
use the word "sequential" to describe SISD
computers. They aren't capable of performing
parallel processing on their own.
SIMD
Single Instruction, Multiple Data (SIMD)
computers have several processors that follow
the same set of instructions, but each
processor inputs different data into those
instructions. SIMD computers run different
data through the same algorithm. This can be
useful for analyzing large chunks of data
based on the same criteria. Many complex
computational problems don't fit this model.
SIMD
A single computer instruction performs the same
action (retrieve, calculate, or store)
simultaneously on two or more pieces of data.
Typically this consists of many simple processors,
each with a local memory in which it keeps the data
which it will work on.
Each processor simultaneously performs the same
instruction on its local data progressing through the
instructions in lock-step, with the instructions issued by
the controller processor.
The processors can communicate with each other in
order to perform shifts and other array operations.
SIMD
SIMD Example

A classic example of data parallelism is
inverting an RGB picture to produce its
negative.
You have to iterate through an array of
uniform integer values (pixels) and
perform the same operation (inversion)
on each one:
multiple data points, a single operation.
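The operation can be sketched in plain Python. Here each inversion happens one element at a time; SIMD hardware (or a vectorising library such as NumPy) would apply the same inversion to many pixels in a single instruction.

```python
# Invert 8-bit pixel values to produce a negative image:
# one operation (255 - p) applied to many data points.
pixels = [255, 0, 128, 10, 20, 30]      # flattened 8-bit RGB values
negative = [255 - p for p in pixels]

print(negative)  # -> [0, 255, 127, 245, 235, 225]
```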
MMX (implementation of SIMD)
Short for Multimedia Extensions, a set of 57
multimedia instructions built into Intel
microprocessors and other x86-compatible
microprocessors.
MMX-enabled microprocessors can handle
many common multimedia operations, such as
digital signal processing (DSP), that are
normally handled by a separate sound or
video card.
SIMD

The Pentium III chip introduced eight
128-bit registers which could be
operated on by the SIMD instructions
SIMD

The Motorola PowerPC 7400 chips used in
Apple G4 computers
also provided SIMD
instructions, which can
operate on multiple
data items held in 32
128-bit registers.
SIMD

Huge impact on the processing of
multimedia data
Improves performance on any type of
processing which requires the same
instruction to be applied to multiple data
items
Other examples: voice-to-text
processing, data encryption/decryption
SIMD PP Questions

2008 Q15
Pipelining

Instruction pipelining = assembly line:
the processor works on different steps of
the instruction at the same time,
so more instructions can be executed in a
shorter period of time.
Analogy: washing, drying and folding
clothes
Execution of instructions without a pipeline

fetch decode execute
                     fetch decode execute
                                          fetch decode exec...
--------------------------------------------------------> time

Execution of instructions with a pipeline

fetch decode execute
      fetch decode execute
            fetch decode execute
----------------------------> time
Example - 5 Stage Pipeline

1. Instruction fetch (IF)


2. Instruction Decode (ID)
3. Execution (EX)
4. Memory Read/Write (MEM)
5. Result Writeback (WB)

All modern processors use pipelining with
5 or more stages
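The speed-up is easy to count, assuming (as a simplification) that every stage takes one clock cycle and nothing stalls:

```python
def cycles_without_pipeline(n_instructions, n_stages):
    # Each instruction must finish every stage before the next begins.
    return n_instructions * n_stages

def cycles_with_pipeline(n_instructions, n_stages):
    # The first instruction fills the pipeline; after that,
    # one instruction completes per cycle.
    return n_stages + (n_instructions - 1)

# 100 instructions through a 5-stage pipeline:
print(cycles_without_pipeline(100, 5))  # -> 500
print(cycles_with_pipeline(100, 5))     # -> 104
```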
Problems with Pipelining

Pipelining led to an increase in performance

It works best when
all instructions are the same length and
follow in direct sequence

This is not always the case!


Problems with Pipelining

3 problems that can arise during pipelining


Varying instruction lengths
Data Dependency
Branch instructions
Problems with Pipelining 1

Instruction Length
In CISC-based designs, instructions can
vary in length
A long, slow instruction can hold up the
pipeline
Less of a problem in RISC-based
designs as most instructions are fairly
short
Problems with Pipelining 2

Data dependency
If one instruction relies on the result
produced by a previous instruction,
the data required for the 2nd instruction may
not yet be available because the 1st
instruction is still being executed
The pipeline must be stalled until the data is
ready for the 2nd instruction
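The cost of a stall can be made visible with a rough cycle count. This is a sketch, not a model of any real processor; the 2-cycle stall penalty is an assumption.

```python
STAGES = 5   # pipeline depth, as in the 5-stage example
STALL = 2    # assumed cycles lost waiting for the earlier result

def pipeline_cycles(n_instructions, dependent):
    # 'dependent' holds the indices of instructions that need the
    # result of the instruction immediately before them.
    cycles = STAGES                      # first instruction fills the pipe
    for i in range(1, n_instructions):
        cycles += 1 + (STALL if i in dependent else 0)
    return cycles

print(pipeline_cycles(4, set()))  # -> 8  (no stalls)
print(pipeline_cycles(4, {2}))    # -> 10 (one dependency, two lost cycles)
```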
Problems with Pipelining 3

Branch instructions
BCC 25 - branch 25 bytes ahead if the
carry flag is clear
If the carry flag is set, the next
instruction is carried out as normal
If the carry flag is clear then the
instruction 25 bytes ahead is next
Example: instruction 3 is a branch instruction
requiring a jump to instruction 15, so 4
instructions are flushed from the pipeline.
Optimising the Pipeline

Techniques include
Branch prediction
Data flow analysis
Speculative loading of data
Speculative execution of instructions
Predication
Optimising the Pipeline

Branch prediction
Some processors predict branch "taken"
for some op-codes and "not taken" for
others.
The most effective approaches,
however, use dynamic techniques.
Optimising the Pipeline

Branch Prediction - Example


Many branch instructions are repeated often in a
program (e.g. the branch instruction at the end of
a loop).
The processor can then note whether or not the
branch was "taken" previously, and assume that
the same will happen this time.
This requires the use of a branch history table, a
small area of cache memory, to record the
information. This method is used in the
AMD29000 processor.
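The history-table scheme can be sketched as a 1-bit predictor: remember, per branch address, whether the branch was taken last time, and predict the same outcome again. The address and loop below are illustrative, not taken from any real ISA.

```python
history = {}  # branch address -> was it taken last time?

def predict(addr):
    return history.get(addr, False)   # assume "not taken" on first sight

def update(addr, taken):
    history[addr] = taken             # record what actually happened

# A loop branch at (illustrative) address 0x40: taken 9 times, then not.
correct = 0
for i in range(10):
    taken = i < 9
    if predict(0x40) == taken:
        correct += 1
    update(0x40, taken)

print(correct)  # -> 8: only the first and last iterations mispredict
```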
Optimising the Pipeline

Data Flow Analysis


Used to overcome dependency
Processor analyses instructions for
dependency
Then allocates instructions to the
pipeline in an order which prevents
dependency stalling the flow
Optimising the Pipeline

Speculative loading of data


Processor looks ahead and processes
early any instructions which load data
from memory
Data stored in registers for later use (if
required)
Discarded if not required
Optimising the Pipeline

Speculative execution
Processor carries out instructions before
they are required
Results stored in temporary registers
Discarded if not required
Optimising the Pipeline

Predication
Tackles conditional branches by
executing instructions from both
branches until it knows which branch is
to be taken
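Predication can be illustrated with a branch-free select, similar in spirit to a conditional-move instruction: both candidate results are already computed, and a predicate picks one, so there is no branch outcome to predict. The function is a hypothetical sketch, not a real instruction.

```python
def select(pred, a, b):
    # Build an all-ones or all-zeros mask from the predicate
    # (two's complement: -1 is all ones, 0 is all zeros), then
    # choose between a and b with pure bitwise arithmetic.
    mask = -int(pred)
    return (a & mask) | (b & ~mask)

print(select(True, 7, 9))   # -> 7
print(select(False, 7, 9))  # -> 9
```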
Optimising the Pipeline

All of these techniques are possible due to:

The increasing speed
The increasing complexity
The increasing number of transistors
available in modern processors
Pipelining PP Questions

2010: Q11b,c; Q13a,b
2009: Q13f
2008: Q14c, Q16e
2007: Q16a-d
2006: Q18a,b,d
2011: Q13b,c
Superscalar Processing

More than one pipeline within the
processor
The pipelines can work independently
Superscalar processors try to take
advantage of instruction-level parallelism
Superscalar Processing
A superscalar CPU architecture implements a
form of parallelism called instruction-level
parallelism within a single processor.
It thereby allows faster CPU throughput than
would otherwise be possible at the same clock
rate.
A superscalar processor executes more than
one instruction during a clock cycle by
simultaneously dispatching multiple
instructions to redundant functional units on
the processor.
Superscalar Processing

Superscalar processors try to take advantage of
instruction-level parallelism:
the degree to which instructions in a
program can be executed in parallel

a = a + 2
b = b + c    Can be executed in parallel

a = a + 2
b = a + c    Cannot be executed in parallel. Why?
             (The second instruction needs the result of the first.)
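The "Why?" can be answered mechanically: a dispatcher checks whether the second instruction reads any register the first one writes (a read-after-write hazard). A minimal sketch:

```python
def can_issue_together(writes_first, reads_second):
    # True only if the second instruction reads nothing that
    # the first instruction writes (no read-after-write hazard).
    return not (set(writes_first) & set(reads_second))

# a = a + 2 ; b = b + c  -> second reads {b, c}, first writes {a}
print(can_issue_together(["a"], ["b", "c"]))  # -> True

# a = a + 2 ; b = a + c  -> second reads the 'a' the first writes
print(can_issue_together(["a"], ["a", "c"]))  # -> False
```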
Superscalar Processing

While early superscalar CPUs would
have two ALUs and a single FPU, a
modern design such as the PowerPC
970 includes four ALUs, two FPUs, and
two SIMD units.
Scholar Activity

Review Questions: Q8-14
2011: Q13a,b,c
2009: Q13e
2008: Q17
2006: Q18c
