SISD
A single processor executes a single instruction stream to operate on data stored in a single memory. Example: the uniprocessor.
SIMD
A single machine instruction controls the simultaneous execution of a number of processing elements. Each instruction is executed on a different set of data by a different processor. Examples: vector and array processors.
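The SIMD idea can be sketched in a few lines of Python. This is purely illustrative (`simd_add` and the notional per-element "processing elements" are made up for the example, not a real ISA):

```python
# Minimal sketch of SIMD: one "add" instruction applied
# simultaneously to many data elements. Each list index models
# one processing element executing the same instruction on its
# own data item. (Illustrative only; not a real instruction set.)

def simd_add(vec_a, vec_b):
    """One instruction, many data: element-wise addition."""
    return [a + b for a, b in zip(vec_a, vec_b)]

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

The key point is that a single operation (the `+`) is specified once and applied across all element pairs, which is what vector and array processors do in hardware.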
MISD
A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. This organisation has never been implemented.
MIMD
A set of processors simultaneously execute different instruction sequences on different sets of data. Examples: SMPs, clusters and NUMA systems.
MIMD - Overview
General-purpose processors; each can process all the instructions necessary. Further classified by the method of processor communication.
NUMA
Non-uniform memory access: access times to different regions of memory may differ.
Parallel Organizations
SISD
SIMD
Symmetric Multiprocessors
A stand-alone computer with the following characteristics:
Two or more similar processors of comparable capacity
Processors share the same memory and are connected by a bus or other internal connection such that memory access time is approximately the same for each processor
All processors share access to I/O
All processors can perform the same functions (symmetric)
System controlled by an integrated operating system, which provides interaction between the processors and their programs
SMP Advantages
Performance
If some work can be done in parallel
Availability
Since all processors can perform the same functions, failure of a single processor does not halt the system
Incremental growth
Increase performance by adding additional processors
Scaling
Vendors can offer range of products based on number of processors
Flexibility
Easy to expand the system by attaching more processors to the bus.
Reliability
Bus is a passive medium, and the failure of any attached device should not cause failure of the whole system
Vector Computation
High-precision, repeated floating-point calculations on large arrays of numbers. Supercomputers handle these types of problem.
Hundreds of millions of floating-point operations per second
Cost $10-15 million
Optimised for calculation
Limited market: research, government agencies, meteorology
Processor Designs
Pipelined ALU
Decomposition of floating-point operations into stages. Different stages can operate on different sets of data in parallel.
Performance can be further enhanced if the vector elements are available in registers rather than fetched from main memory. Parallelism may be exploited both within operations and across operations.
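The gain from staging can be sketched with a simple cycle count, assuming a 4-stage floating-point pipeline (e.g. compare exponents, align significands, add, normalise) that accepts one new operand pair per cycle; the stage count and single-cycle stage latency are illustrative assumptions:

```python
# Sketch of the pipelining gain under the stated assumptions.
STAGES = 4

def unpipelined_cycles(n_ops, stages=STAGES):
    # Each operation runs all stages before the next one starts.
    return stages * n_ops

def pipelined_cycles(n_ops, stages=STAGES):
    # Operations overlap: fill the pipe once, then one result per cycle.
    return stages + n_ops - 1

print(unpipelined_cycles(100), pipelined_cycles(100))  # 400 103
```

For long vectors the pipelined time approaches one result per cycle, which is why vector lengths matter so much to supercomputer performance.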
Chaining
Used in Cray supercomputers. A vector operation may start as soon as the first element of the operand vector is available and the functional unit is free. The result from one functional unit is fed immediately into another; if vector registers are used, intermediate results do not have to be stored in memory.
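A rough timing model shows why chaining helps. This is a sketch under simplifying assumptions (vectors of length n, each functional unit a pipeline of latency L producing one result per cycle; the function names are made up) for computing C = (A * B) + D:

```python
# Timing sketch of Cray-style chaining, under the stated assumptions.

def unchained_cycles(n, L=4):
    # The add unit waits for the complete multiply result vector.
    return (L + n - 1) + (L + n - 1)

def chained_cycles(n, L=4):
    # The add unit starts as soon as the first product emerges,
    # so the two pipelines overlap almost completely.
    return (L + L) + n - 1

print(unchained_cycles(64), chained_cycles(64))  # 134 71
```

With chaining, the second unit's latency is paid once rather than once per full vector pass, roughly halving the time for long vectors.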
Parallel ALUs
Parallel processors
Break the task up into multiple processes to be executed in parallel. Effective only if the software and hardware support effective coordination of the parallel processors.
OS
The OS is a program that controls the execution of application programs and acts as an interface between the user and the hardware. It manages the computer's resources, provides services for programmers, and schedules the execution of other programs.
Efficiency
Allowing better use of computer resources
Early Systems
Late 1940s to mid-1950s. No operating system: programs interacted directly with the hardware. Two main problems:
Scheduling
Setup time
Timer
To prevent a single job monopolising the system
Privileged instructions
Can only be executed by the monitor, e.g. I/O instructions
Interrupts
Allow the monitor to regain control from a user program
Single Program
Scheduling
Key to multiprogramming. Types:
Long term
Medium term
Short term
I/O
PCB Diagram
Scheduling Example
Process Scheduling
Cache memory
IBM S/360 model 85 in 1969
Pipelining
Introduces parallelism into the fetch-execute cycle
Multiple processors
Comparison of processors
Leads to:
Large instruction sets
More addressing modes
Special machine instructions for HLL constructs, e.g. the CASE (switch) instruction on the VAX
Execution Characteristics
Studies have been done to determine the characteristics of execution of the machine instructions generated from HLL programs. These suggest a different approach: make the architecture that supports the HLL simpler, guided by:
Operations performed
Operands used
Execution sequencing
Operations
Assignments predominate
Movement of data is of high importance
Operands
Mainly local scalar variables, so optimisation should concentrate on accessing local variables.
Procedure Calls
Very time-consuming. The cost depends on the number of parameters passed and on the depth of nesting. Most programs do not do a lot of calls followed by lots of returns.
Implications
Attempting to make the instruction set architecture close to HLLs is not the most effective approach. The best support is given by optimising the most used and most time-consuming features:
A large number of registers
Careful design of pipelines
A simplified (reduced) instruction set
Smaller programs?
The program takes up less memory, but memory is now cheap. It may also not occupy fewer bits; it just looks shorter in symbolic form.
It is far from clear that a trend to increasingly complex instruction sets is appropriate
RISC Characteristics
One instruction per cycle
Register-to-register operations
Few, simple addressing modes
Few, simple instruction formats
RISC v CISC
Not clear-cut: many designs borrow from both philosophies, e.g. the PowerPC and the Pentium II.
RISC Pipelining
Most instructions are register to register, giving two phases of execution:
I: Instruction fetch
E: Execute (an ALU operation with register input and output)
Load and store need a third phase:
D: Memory (a register-to-memory or memory-to-register operation)
Effects of Pipelining
Optimization of Pipelining
Delayed branch
Makes use of a branch that does not take effect until after execution of the following instruction.
Delayed load
The target register is locked by the processor, which continues executing the instruction stream until the register is required, then idles until the load completes. Re-arranging instructions can allow useful work to be done during the delay.
Loop unrolling
Replicate the body of a loop a number of times. This reduces loop overhead, increases instruction parallelism, and improves register, data cache, or TLB locality.
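Loop unrolling can be illustrated in source form. This is a sketch (the `scale_*` helper names are made up for the example); in practice compilers unroll at the machine-instruction level:

```python
# Illustrative sketch of loop unrolling: replicating the loop
# body halves the number of iterations (and the per-iteration
# branch/overhead steps) while computing the same result.

def scale_rolled(data, k):
    out = []
    for x in data:              # one body, len(data) iterations
        out.append(x * k)
    return out

def scale_unrolled(data, k):
    out = []
    # Assumes len(data) is even, for simplicity of the sketch.
    for i in range(0, len(data), 2):
        out.append(data[i] * k)      # replicated body, copy 1
        out.append(data[i + 1] * k)  # replicated body, copy 2
    return out

print(scale_unrolled([1, 2, 3, 4], 3))  # [3, 6, 9, 12]
```

Both versions compute the same result; the unrolled one executes half as many loop-control steps and exposes two independent multiplies per iteration to the pipeline.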
Delayed branch
Controversy
Quantitative
Compare program sizes and execution speeds
Qualitative
Examine issues of high-level language support
Problems
No pair of RISC and CISC machines that are directly comparable
No definitive set of test programs
Most comparisons done on toy machines rather than production machines
Most commercial devices are a mixture of both
Micro-Operations
A computer executes a program via the fetch/execute cycle. Each cycle has a number of steps, which is the basis of pipelining.
Fetch - 4 Registers
Memory Address Register (MAR)
Connected to the address bus; specifies the address for a read or write operation
Memory Buffer Register (MBR)
Connected to the data bus; holds the data to be written to memory or the data last read
Program Counter (PC)
Holds the address of the next instruction to be fetched
Instruction Register (IR)
Holds the instruction last fetched
Fetch Sequence
The address of the next instruction is in the PC and is moved to the MAR
The control unit issues a READ command
The result (data from memory) appears on the data bus
The data on the data bus is copied into the MBR
The PC is incremented by 1 (in parallel with the data fetch from memory)
The data (the instruction) is moved from the MBR to the IR
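The fetch sequence can be sketched as register transfers over a small register dictionary and a memory list. The register names follow the text; the memory contents are made-up example values:

```python
# Sketch of the fetch sequence as micro-operation register
# transfers. Memory contents are arbitrary example values.

def fetch(regs, memory):
    regs["MAR"] = regs["PC"]           # t1: MAR <- (PC)
    regs["MBR"] = memory[regs["MAR"]]  # t2: MBR <- memory (READ)
    regs["PC"] += 1                    # t2: PC <- (PC) + 1, in parallel
    regs["IR"] = regs["MBR"]           # t3: IR <- (MBR)
    return regs

regs = {"PC": 2, "MAR": 0, "MBR": 0, "IR": 0}
memory = [11, 22, 33, 44]
fetch(regs, memory)
print(regs)  # {'PC': 3, 'MAR': 2, 'MBR': 33, 'IR': 33}
```

The comments group the transfers into time units: the memory read and the PC increment share the second time unit because they use different data paths.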
The second and third micro-operations (the memory read into the MBR and the PC increment) can both take place during the second time unit.
Indirect Cycle
MBR contains an address
Interrupt Cycle
This is a minimum:
There may be additional micro-ops to get addresses
Saving the full context is done by the interrupt-handler routine, not by micro-ops
Instruction Cycle
Each phase is decomposed into a sequence of elementary micro-operations, e.g. the fetch, indirect, and interrupt cycles. Assume a new 2-bit register: the instruction cycle code (ICC), which designates which part of the cycle the processor is in:
00: Fetch
01: Indirect
10: Execute
11: Interrupt
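The ICC can be sketched as a tiny state machine. The transition rules below are a plausible reading of the text (fetch may go to indirect or straight to execute; execute goes to interrupt only if one is pending), not any specific processor's definition:

```python
# Sketch of the 2-bit instruction cycle code (ICC) as a state
# machine. Transition rules are illustrative assumptions.

FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11

def next_icc(icc, indirect_needed=False, interrupt_pending=False):
    if icc == FETCH:
        return INDIRECT if indirect_needed else EXECUTE
    if icc == INDIRECT:
        return EXECUTE
    if icc == EXECUTE:
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH  # after the interrupt cycle, fetch the next instruction

print(next_icc(FETCH, indirect_needed=True))  # 1
```

At the end of each subcycle the control unit sets the ICC, so the 2-bit register alone records which phase the processor is in.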
Functional Requirements
Define the basic elements of the processor
Describe the micro-operations the processor performs
Determine the functions that the control unit must perform to cause the micro-operations to be performed
Types of Micro-operation
Transfer data between registers
Transfer data from a register to an external interface
Transfer data from an external interface to a register
Perform arithmetic or logical operations
Execution
Causing the performance of each micro-op
Control Signals
Clock
This is how the control unit keeps time.
Instruction register
Op-code for current instruction Determines which micro-instructions are performed
Flags
Status of CPU Results of previous ALU operations
To control bus:
To memory
To I/O modules
Implementation
Two categories:
Hardwired implementation
Microprogrammed implementation
In a hardwired implementation, the control unit is essentially a combinational circuit: its input logic signals are transformed into a set of output logic signals, which are the control signals.
CPU Structure
CPU must:
Fetch instructions
Interpret instructions
Fetch data
Process data
Write data
Registers
The CPU must have some working space (temporary storage), called registers. Their number and function vary between processor designs. Registers form the top level of the memory hierarchy. They perform two roles:
User-visible registers Control and status registers
Addressing
Segment registers
Index registers
Stack pointer
How big?
Large enough to hold a full address
Large enough to hold a full word
It is often possible to combine two data registers to hold longer values
Other Registers
May have registers pointing to:
Process control blocks Interrupt Vectors
Indirect Cycle
May require a memory access to fetch operands
Indirect addressing requires more memory accesses
Can be thought of as an additional instruction subcycle
Pipelining
Fetch instruction
Decode instruction
Calculate operands (i.e. effective addresses)
Fetch operands
Execute instruction
Write result
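The timing of this six-stage pipeline can be sketched with a simple model, assuming no branches, hazards, or memory conflicts (so instruction i, counted from 0, occupies stage s in cycle i + s + 1); the stage mnemonics FI, DI, CO, FO, EI, WO are the conventional abbreviations of the stages above:

```python
# Timing sketch for an ideal six-stage instruction pipeline.

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def stage_in_cycle(instr_index, cycle):
    """Stage occupied by an instruction in a cycle, or None."""
    s = cycle - 1 - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None

def total_cycles(n_instructions):
    # Fill the pipeline once, then complete one instruction per cycle.
    return len(STAGES) + n_instructions - 1

print(stage_in_cycle(0, 1), stage_in_cycle(1, 2))  # FI FI
print(total_cycles(9))  # 14
```

This is the calculation behind the usual timing diagram: 9 instructions need 14 cycles instead of 54, and branches or dependencies only subtract from that ideal.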
Timing Diagram