Session 2
Multiprocessor and Multicomputer Models
Speedup
For a fixed problem size (input data set):
$$\text{Performance} = \frac{1}{\text{Time}}$$
$$\text{Speedup}(p\ \text{processors}) = \frac{\text{Performance}(p\ \text{processors})}{\text{Performance}(1\ \text{processor})} = \frac{\text{Time}(1\ \text{processor})}{\text{Time}(p\ \text{processors})}$$
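As a small worked example of the speedup definition, the sketch below computes speedup from two measured run times and checks that the performance ratio gives the same number. The function names and the timings (120 s on one processor, 20 s on eight) are made up for illustration.

```python
def performance(time_seconds: float) -> float:
    """Performance is the reciprocal of execution time."""
    if time_seconds <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / time_seconds


def speedup(time_1: float, time_p: float) -> float:
    """Speedup on p processors = Time(1 processor) / Time(p processors)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("execution times must be positive")
    return time_1 / time_p


# Hypothetical measurements for one fixed input data set.
t1 = 120.0  # seconds on 1 processor
t8 = 20.0   # seconds on 8 processors

print(speedup(t1, t8))                     # 6.0
print(performance(t8) / performance(t1))   # same ratio, up to rounding
```

Both forms of the definition agree because dividing two reciprocals of time inverts the ratio of the times.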
Parallel Computers (MIMD)
• Two major classes:
1. Shared-memory multiprocessors
2. Message-passing multicomputers
• Major difference:
– Memory sharing
– Interprocessor communication mechanism
Memory Sharing
• Multiprocessor systems:
– All processors have access to a common
memory
• Multicomputer systems:
– Each computer node has a local memory
which is not shared with other nodes
Processor Communication
• Multiprocessor systems:
– Processors communicate through shared
variables in a common memory
• Multicomputer systems:
– Processors communicate through message
passing among the nodes
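The two communication styles can be contrasted with a minimal sketch. It is only an analogy: Python threads inside one process stand in for processors or nodes, and a `queue.Queue` stands in for the interconnection network. In a real multicomputer the nodes would share no memory at all. The function names are hypothetical.

```python
import queue
import threading


def shared_variable_sum(values, n_workers=4):
    """Multiprocessor style: workers update one shared variable."""
    total = {"sum": 0}          # the shared variable in common memory
    lock = threading.Lock()     # serializes updates to the shared variable

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total["sum"] += partial

    chunks = [values[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["sum"]


def message_passing_sum(values, n_workers=4):
    """Multicomputer style: nodes keep data local and send messages."""
    mailbox = queue.Queue()     # stands in for the interconnection network

    def node(chunk):
        # The chunk plays the role of the node's local memory;
        # the only thing that leaves the node is a message.
        mailbox.put(sum(chunk))

    chunks = [values[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=node, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(n_workers))


data = list(range(100))
print(shared_variable_sum(data), message_passing_sum(data))  # 4950 4950
```

In the first function the result lives in memory every worker can touch, so access must be coordinated. In the second, each partial result travels as an explicit message.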
Shared-Memory Multiprocessors
• Three models of shared-memory
multiprocessors:
1. Uniform Memory Access (UMA)
2. Nonuniform Memory Access (NUMA)
3. Cache-Only Memory Architecture (COMA)
The UMA Model
[Figure: processors P1, P2, …, Pn attached to a system interconnect (bus, crossbar, or multistage network)]
[Figure: processors P1, …, Pn, each with its own local memory LM 1, …, LM n, attached to an interconnection network]
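[Figure: clusters (Cluster 1, …) of processors P, each cluster with a cluster interconnection network (CIN), joined by a global interconnection network (GIN)]
[Figure: nodes each consisting of a processor (P), a cache (C), and a directory (D)]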
[Figure: nodes, each a processor (P) with a local memory (M), connected by a message-passing interconnection network]
[Figure: vector supercomputer. A host computer with mass storage and I/O; main memory (program and data); a scalar processor with scalar control unit and scalar functional pipelines; a vector processor with vector control unit, vector registers, and vector functional pipelines]
• Program and data are first loaded into the main memory through a host computer
• All instructions are first decoded by the scalar control unit
• If the instruction is decoded as a scalar operation, it will be executed by the scalar processor
• If the instruction is decoded as a vector operation, it will be sent to the vector control unit
• The vector control unit will supervise the flow of vector data between main memory and vector
functional pipelines
• A number of vector pipeline functional units may be built into a vector processor
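The decode-and-route flow described in the bullets above can be sketched as a toy dispatcher. The opcode names and the `VECTOR_OPCODES` set are invented for illustration and do not correspond to any real instruction set.

```python
# Hypothetical opcodes that the scalar control unit recognizes as vector operations.
VECTOR_OPCODES = {"VLOAD", "VADD", "VMUL", "VSTORE"}


def dispatch(program):
    """Decode every instruction in the scalar control unit and route it.

    Returns a list of (opcode, destination) pairs, where destination names
    the unit that carries out the instruction.
    """
    trace = []
    for opcode in program:
        if opcode in VECTOR_OPCODES:
            # Vector operations are handed to the vector control unit.
            trace.append((opcode, "vector control unit"))
        else:
            # Everything else runs on the scalar processor.
            trace.append((opcode, "scalar processor"))
    return trace


for opcode, unit in dispatch(["LOAD", "VLOAD", "VADD", "ADD", "VSTORE"]):
    print(f"{opcode:7s} -> {unit}")
```

The point of the sketch is that a single decoder sees every instruction first, and only vector operations are forwarded.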
Vector Processor Models
[Figure: vector registers supplying data to the vector functional pipelines]
• Vector registers are used to hold vector operands and intermediate and final
results
• The vector functional pipelines retrieve operands from the vector registers
• Results are written back into the vector registers by the vector functional
pipelines
• The length of each vector register is usually fixed
• In some cases the length is reconfigurable
• In general there is a fixed number of vector registers and functional
pipelines in a vector processor
• Both resources must be reserved in advance to avoid resource conflicts
between different vector operations
Memory-to-Memory Architecture
• A vector stream unit replaces the vector
registers
• Vector operands and results are directly
retrieved from the main memory in
superwords (for example 512 bits)
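To give a feel for superword transfers, the sketch below counts how many superword fetches one vector operand would need. The 512-bit superword comes from the slide. The 64-bit element width and the function name are assumptions made only for this example.

```python
import math


def superword_fetches(n_elements, element_bits=64, superword_bits=512):
    """Number of superword transfers needed to stream one vector operand.

    element_bits=64 is an assumed operand width, not taken from the slides.
    """
    if superword_bits % element_bits != 0:
        raise ValueError("element width must divide the superword width")
    elements_per_superword = superword_bits // element_bits
    return math.ceil(n_elements / elements_per_superword)


print(superword_fetches(100))  # 13 superwords of 8 elements each
```

Under these assumptions a 100-element vector needs 13 fetches, with the last superword only half full.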
SIMD Supercomputers
[Figure: a control unit driving processing elements PE 0, PE 1, …, PE N-1, which are linked by an interconnection network]
An operational model of SIMD supercomputers is specified by:
…
– Compute
– Write memory
• The operation is synchronized
Shared Memory Access Methods in
PRAM Models
How are concurrent reads and concurrent writes of memory handled?
Four options:
• ER: Exclusive Read
– At most one processor can read from a memory location in a cycle
• EW: Exclusive Write
– At most one processor can write into a memory location in a cycle
• CR: Concurrent Read
– Multiple processors can read the same information from the same
memory location in one cycle
• CW: Concurrent Write
– Multiple processors can write to the same memory location in one
cycle
– A policy is needed to resolve the resulting write conflicts
PRAM Variants
• EREW-PRAM Model:
– Exclusive Read, Exclusive Write
• Forbids more than one processor from reading or
writing the same memory cell simultaneously
• CREW-PRAM Model:
– Concurrent Read, Exclusive Write
• ERCW-PRAM Model:
– Exclusive Read, Concurrent Write
• CRCW-PRAM Model:
– Concurrent Read, Concurrent Write
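As an illustration of how the four variants differ, the sketch below takes the memory accesses issued in a single cycle and names the most restrictive variant that would allow them. The function name and the `(processor_id, address)` input format are assumptions for this example. Reads and writes are counted separately, on the assumption that they happen in distinct phases of the cycle.

```python
from collections import Counter


def most_restrictive_model(reads, writes):
    """Name the most restrictive PRAM variant that permits one cycle's accesses.

    reads and writes are lists of (processor_id, address) pairs.
    """
    read_counts = Counter(address for _, address in reads)
    write_counts = Counter(address for _, address in writes)

    # An access is concurrent when more than one processor touches the same cell.
    concurrent_read = any(n > 1 for n in read_counts.values())
    concurrent_write = any(n > 1 for n in write_counts.values())

    read_mode = "CR" if concurrent_read else "ER"
    write_mode = "CW" if concurrent_write else "EW"
    return f"{read_mode}{write_mode}-PRAM"


# Two processors read cell 5 together, then write to different cells.
print(most_restrictive_model(reads=[(0, 5), (1, 5)], writes=[(0, 1), (1, 2)]))
# CREW-PRAM
```

A cycle classified as EREW-PRAM runs on every variant, while one classified as CRCW-PRAM runs only on the least restrictive model.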
Concurrent Write Conflicts
• Policies to resolve conflicting writes:
• Common:
– All simultaneous writes store the same value to the hot-
spot memory location
• Arbitrary:
– Any one of the values written may remain; the others are
ignored
• Minimum:
– The value written by the processor with the minimum index will remain
• Priority:
– The values being written are combined using an
associative function such as summation or maximum
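The four policies can be sketched as one resolver for a single memory location in a single cycle. This is an illustration, not a definitive PRAM simulator. The function name and the `(processor_index, value)` input format are hypothetical. "Minimum" is read here as "the write from the lowest-indexed processor wins", and "arbitrary" simply keeps the first write listed, since any choice is acceptable.

```python
import operator
from functools import reduce


def resolve_write(writes, policy, combine=operator.add):
    """Decide what one memory cell holds after concurrent writes.

    writes  -- non-empty list of (processor_index, value) pairs
    policy  -- "common", "arbitrary", "minimum", or "priority"
    combine -- associative function used by the "priority" policy
    """
    if not writes:
        raise ValueError("at least one write is required")
    values = [value for _, value in writes]

    if policy == "common":
        # All processors must be writing the same value.
        if any(v != values[0] for v in values):
            raise ValueError("common policy requires identical values")
        return values[0]
    if policy == "arbitrary":
        # Any single value may survive; this sketch keeps the first one listed.
        return values[0]
    if policy == "minimum":
        # Assumed reading: the lowest-indexed processor's write survives.
        return min(writes, key=lambda w: w[0])[1]
    if policy == "priority":
        # As named on the slide: combine all values with an associative function.
        return reduce(combine, values)
    raise ValueError(f"unknown policy: {policy}")


writes = [(3, 10), (1, 7), (2, 9)]
print(resolve_write(writes, "minimum"))                # 7
print(resolve_write(writes, "priority"))               # 26 (summation)
print(resolve_write(writes, "priority", combine=max))  # 10
```

The same set of writes leaves a different value in the cell under each policy, which is why an algorithm for a concurrent-write model has to state which policy it assumes.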