Topics Covered
An Overview of Parallel Processing
Parallelism in Uniprocessor Systems
Organization of Multiprocessor Systems
An Analogy of Parallelism
The task of ordering a shuffled deck of cards by suit and then by rank can be done faster if it is carried out by two or more people. By splitting up the deck, performing the sub-tasks simultaneously, and then combining the partial solutions at the end, you have performed parallel processing.
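The analogy maps directly onto a split-work-combine pattern. A minimal sketch in Python (the thread pool, the suit ordering, and the two-worker split are illustrative assumptions; the threads stand in for the people):

```python
import random
from concurrent.futures import ThreadPoolExecutor

SUITS = ["clubs", "diamonds", "hearts", "spades"]
RANKS = list(range(2, 15))  # 2..10, J=11, Q=12, K=13, A=14

def sort_key(card):
    # Order first by suit, then by rank.
    suit, rank = card
    return (SUITS.index(suit), rank)

def parallel_sort(deck, workers=2):
    # Split the shuffled deck into one chunk per worker.
    size = (len(deck) + workers - 1) // workers
    chunks = [deck[i:i + size] for i in range(0, len(deck), size)]
    # Each worker sorts its own chunk simultaneously.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(lambda c: sorted(c, key=sort_key), chunks))
    # Combine the partial solutions into one ordered deck
    # (here a simple re-sort plays the role of the final merge).
    combined = [card for chunk in sorted_chunks for card in chunk]
    return sorted(combined, key=sort_key)

deck = [(s, r) for s in SUITS for r in RANKS]
random.shuffle(deck)
ordered = parallel_sort(deck)
```

In CPython, threads will not actually speed up this CPU-bound sort because of the global interpreter lock; a process pool would give a real speedup, but the split-work-combine structure is identical.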
Parallelism can exist even within a uniprocessor system. Some examples are the instruction pipeline, the arithmetic pipeline, and the I/O processor. Note that a system performing several operations on the same instruction is not considered parallel; only if the system processes two different instructions simultaneously can it be considered parallel.
A Reconfigurable Pipeline with Data Flow for the Computation A[i] = B[i] * C[i] + D[i]
[Figure: reconfigurable pipeline. Each stage consists of a latch and a functional unit (*, +, or pass-through) selected by a 4-to-1 MUX; the MUX select lines S1 S0 configure the data path from the data inputs to memory and registers.]
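The configured data flow can be modelled in software. The sketch below is a behavioural simulation (not a hardware description) of the two active stages, multiply then add, with a latch between them; at steady state iteration i+1 is in the multiplier while iteration i is in the adder:

```python
def reconfigured_pipeline(B, C, D):
    """Simulate A[i] = B[i] * C[i] + D[i] on a two-stage pipeline."""
    n = len(B)
    latch = None  # inter-stage latch holding the product B[i] * C[i]
    A = []
    for cycle in range(n + 1):  # n issue cycles plus one drain cycle
        # Stage 2 (adder): consume the product latched on the previous cycle.
        if latch is not None:
            A.append(latch + D[cycle - 1])
        # Stage 1 (multiplier): start the next iteration, if any remain.
        latch = B[cycle] * C[cycle] if cycle < n else None
    return A
```

Once the pipeline is full, one result completes per cycle even though each result requires two operations.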
Although arithmetic pipelines can perform many iterations of the same operation in parallel, they cannot perform different operations simultaneously. To perform different arithmetic operations in parallel, a CPU may include a vectored arithmetic unit.
[Figure: vectored arithmetic unit; separate functional units operate on the data inputs simultaneously, producing the results of different operations in parallel.]
Flynn's classification was proposed by Michael J. Flynn in 1966. It is the most commonly accepted taxonomy of computer organization. In this classification, computers are classified by whether they process a single instruction at a time or multiple instructions simultaneously, and whether they operate on one or multiple data sets.
SISD
SIMD
MISD
MIMD
The four categories of Flynn's classification of multiprocessor systems, by their instruction and data streams.
SISD (Single Instruction, Single Data) machines execute a single instruction on individual data values using a single processor. Based on the traditional von Neumann uniprocessor architecture, instructions are executed sequentially (serially), one step after the next. Until recently, most computers were of the SISD type.
SISD
Simple Diagrammatic Representation
[Figure: a control unit issues the instruction stream (IS) to a single processor, which exchanges a single data stream (DS) with memory.]
SIMD (Single Instruction, Multiple Data) machines execute a single instruction on multiple data values simultaneously using many processors. Since there is only one instruction stream, each processor does not have to fetch and decode each instruction; instead, a single control unit does the fetching and decoding for all processors. SIMD architectures include array processors.
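A toy interpreter makes the fetch-and-decode saving concrete. In this sketch (the two-opcode instruction set and all names are invented for illustration), one loop plays the role of the single control unit: each instruction is decoded once and then applied by every processing element to its own data:

```python
def run_simd(program, lanes):
    """Execute one instruction stream over many data lanes."""
    accs = list(lanes)  # each lane's accumulator (its own data stream)
    for op, operand in program:         # fetched and decoded once...
        for i in range(len(accs)):      # ...executed by every lane
            if op == "add":
                accs[i] += operand
            elif op == "mul":
                accs[i] *= operand
            else:
                raise ValueError(f"unknown opcode: {op}")
    return accs
```

For example, run_simd([("add", 1), ("mul", 2)], [0, 1, 2]) applies the same two instructions to all three lanes at once.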
SIMD
Simple Diagrammatic Representation
[Figure: one control unit (C) broadcasts the instruction stream (IS) to multiple processors (P), each operating on its own data stream (DS).]
MIMD (Multiple Instruction, Multiple Data) machines are multiprocessors or multicomputers. Unlike SIMD machines, they may execute multiple instructions simultaneously. Each processor must therefore include its own control unit; the processors can be assigned parts of one task or entirely separate tasks. MIMD has two subclasses: shared memory and distributed memory.
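The shared-memory subclass can be sketched with two OS threads: each runs its own instruction stream (a different function) on its own data, and the two coordinate through a shared variable protected by a lock. All names here are illustrative:

```python
import threading

shared = {"results": {}}   # the common memory
lock = threading.Lock()

def summer(data):
    # First instruction stream: add up its data set.
    total = sum(data)
    with lock:
        shared["results"]["sum"] = total

def maxer(data):
    # Second, different instruction stream: find the maximum of its data set.
    biggest = max(data)
    with lock:
        shared["results"]["max"] = biggest

t1 = threading.Thread(target=summer, args=([1, 2, 3],))
t2 = threading.Thread(target=maxer, args=([7, 4, 6],))
t1.start(); t2.start()
t1.join(); t2.join()
```

In a distributed-memory (message-passing) system the same two tasks would exchange messages instead of writing to a common dictionary.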
MIMD
[Figure: multiple control units, each issuing its own instruction stream (IS) to its processor; each processor operates on its own data stream (DS) to memory (M).]
MISD
[Figure: multiple instruction streams (IS) applied to a single data stream (DS) that flows through the processors to memory (M).]
SISD: a single desk.
SIMD: many desks and a supervisor with a megaphone giving instructions that every desk obeys.
MIMD: many desks working at their own pace, synchronized through a central database.
System Topologies
A system may also be classified by its topology. A topology is the pattern of connections between processors. The cost-performance tradeoff determines which topology to use for a multiprocessor system.
Topology Classification
A topology is characterized by its diameter, total bandwidth, and bisection bandwidth:
Diameter: the maximum distance between two processors in the computer system.
Total bandwidth: the capacity of a communications link multiplied by the number of such links in the system.
Bisection bandwidth: the maximum data transfer that could occur at the bottleneck in the topology, that is, across the links that divide the system into two halves.
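The diameter can be computed directly from a topology's adjacency structure. The sketch below does so with breadth-first search and builds two example topologies to compare (the six-processor graphs are illustrative):

```python
from collections import deque

def diameter(adj):
    """Maximum shortest-path distance between any pair of processors."""
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())
    return max(eccentricity(p) for p in adj)

n = 6
# Ring: each processor links to its two neighbours.
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
# Completely connected: each processor links to every other.
fully = {i: [j for j in range(n) if j != i] for i in range(n)}
```

A ring of n processors has diameter n // 2, while a completely connected system always has diameter 1 at the cost of far more links: one face of the cost-performance tradeoff.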
System Topologies
Shared Bus Topology
[Figure: processors, each with local memory (M), attached to a single shared bus.]
Processors communicate with each other via a single bus that can handle only one data transmission at a time. In most shared-bus systems, processors communicate directly with their own local memory.
System Topologies
Ring Topology
Uses direct connections between processors instead of a shared bus. This allows several communication links to be active simultaneously, but data may have to travel through several processors to reach its destination.
[Figure: processors (P) connected in a ring.]
System Topologies
Tree Topology
Uses direct connections between processors, each processor having up to three connections. There is only one unique path between any pair of processors.
System Topologies
Mesh Topology
In the mesh topology, every processor connects to the processors above and below it, and to the processors on its right and left.
System Topologies
Hypercube Topology
The hypercube is a multidimensional mesh topology. Each processor connects to all other processors whose binary addresses differ from its own by exactly one bit. For example, processor 0 (0000) connects to processors 1 (0001), 2 (0010), 4 (0100), and 8 (1000).
[Figure: processors (P) connected in a hypercube; each is linked to the processors one bit-flip away.]
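The differ-by-one-bit rule means a processor's neighbours can be enumerated by flipping each bit of its address in turn. A small sketch:

```python
def hypercube_neighbors(proc, dims):
    # Flip each of the dims address bits with XOR; two addresses that
    # differ in exactly one bit are directly connected.
    return sorted(proc ^ (1 << b) for b in range(dims))
```

In a 4-dimensional hypercube, hypercube_neighbors(0, 4) yields processors 1, 2, 4, and 8; every processor has exactly dims links, and any message needs at most dims hops.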
System Topologies
Completely Connected Topology
In a system of n processors, every processor has n-1 connections, one to each of the other processors. Complexity increases as the system grows, but this topology offers maximum communication capability.
[Figure: completely connected processors (P); every processor is linked directly to every other.]
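Since each of the n processors holds n-1 connections and every link joins exactly two processors, the system needs n(n-1)/2 links in total, which is what drives the growth in complexity. A one-line check:

```python
def total_links(n):
    # n processors with n - 1 connections each; each link is counted
    # twice (once from each end), hence the division by two.
    return n * (n - 1) // 2
```

Doubling the processor count roughly quadruples the number of links, e.g. total_links(6) is 15 but total_links(12) is 66.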
A system's architecture, in contrast to its topology, refers to its connections to its system memory. Systems may also be classified by their architectures; two of these are:
Parallel/Vector Computers
Parallel computers execute programs in MIMD mode. They are of two types:
Shared-memory multiprocessors
Message-passing multicomputers
The processors in a shared-memory multiprocessor communicate through shared variables in a common memory.
Vector computers are equipped with vector pipelines. There are two families of pipelined vector processors: memory-to-memory and register-to-register.
Development layers
Applications
Programming environment
Languages supported
Communication model
Addressing space
Hardware architecture
The clock has a cycle time τ; the clock rate is f = 1/τ. The size of a program is its instruction count (Ic). Cycles per instruction (CPI) is the number of clock cycles needed to execute each instruction.
Performance factors
The CPU time needed to execute the program is T = Ic x CPI x τ or, equivalently, T = Ic x (p + m x k) x τ, where p = number of processor cycles per instruction, m = number of memory references per instruction, and k = ratio of the memory cycle time to the processor cycle time.
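In the decomposed form the effective CPI is p + m x k, so both expressions give the same time. A quick numeric sketch (the parameter values are made up for illustration):

```python
def cpu_time(ic, cpi, tau):
    # T = Ic x CPI x tau
    return ic * cpi * tau

def cpu_time_split(ic, p, m, k, tau):
    # T = Ic x (p + m * k) x tau: processor cycles plus memory-reference
    # cycles, with memory cycles expressed in processor-cycle units via k.
    return ic * (p + m * k) * tau

# Example: 1,000,000 instructions, 1 ns cycle time, p = 1 cycle,
# m = 0.5 memory references per instruction, k = 4.
t = cpu_time_split(1_000_000, 1, 0.5, 4, 1e-9)
```

Here the effective CPI is 1 + 0.5 x 4 = 3, so the program takes about 3 ms.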
System attributes
Instruction set architecture Compiler technology CPU implementation and control Cache and memory hierarchy
MIPS rate
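The MIPS rate follows from the performance factors above: MIPS = Ic / (T x 10^6) = f / (CPI x 10^6), i.e. millions of instructions completed per second. A sketch (the clock rate and CPI values are illustrative):

```python
def mips_rate(f_hz, cpi):
    # MIPS = f / (CPI x 10^6): with clock rate f and effective CPI,
    # the machine completes f / CPI instructions per second.
    return f_hz / (cpi * 1e6)
```

For example, a 500 MHz clock with an effective CPI of 2 gives a rate of 250 MIPS.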
UMA (Uniform Memory Access) is the architecture of a symmetric multiprocessor, or SMP, which has two or more processors that perform symmetric functions. UMA gives all CPUs equal (uniform) access to all memory locations in shared memory. They interact with shared memory through some communications mechanism, such as a simple bus or a complex multistage interconnection network.
[Figure: processors 1 through n connected to shared memory through a common communications mechanism.]
NUMA (Nonuniform Memory Access) architectures do not allow uniform access to all shared memory locations. This architecture still allows all processors to access all shared memory locations, but in a nonuniform way: each processor can access its local shared memory more quickly than the memory modules not next to it.
[Figure: processors 1 through n, each with its own local memory module, connected by a communications mechanism through which remote memories are reached.]
COMA (Cache-Only Memory Architecture) is a computer memory organization for use in multiprocessors in which the local memories (typically DRAM) at each node are used as caches. This is in contrast to using the local memories as actual main memory, as in NUMA organizations.
In COMA, the distributed memories are converted into caches:
There is no memory hierarchy at each processor node.
All the caches form a global address space.
Remote cache access is assisted by the distributed cache directories.
Depending on the interconnection network used, hierarchical directories may be used to help locate copies of cache blocks.
Initial data placement is not critical, because data will eventually migrate to where it will be used.
[Figure: nodes, each consisting of a processor (P), cache (C), and directory (D), attached to an interconnection network.]
Vector Supercomputers
Epitomized by the Cray-1 (1976):
Scalar unit + vector extensions
Load/store architecture
Vector registers
Vector instructions
Hardwired control
Highly pipelined functional units
Interleaved memory system
No data caches
No virtual memory
THE END