Evolution in Microprocessors
Key Points
Illustration
[Figure: four plots, each with an axis labeled Year]
1: 100x faster
2: 62.5x faster
3: 39x faster
10: 0.9x faster
Power efficiency
Thermal packaging limit vs. cost
Types of parallelism
Pipelining
A (a load)   IF   ID   EX   MEM  WB
                  IF   ID   EX   MEM  WB
                       IF   ID   EX   MEM  WB
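The overlap in the pipeline diagram above can be put into numbers with a small sketch. It assumes an ideal pipeline: every stage takes one cycle and there are no stalls, which the slide does not state. The function names are hypothetical and used only for illustration.

```c
/* Cycles to finish n instructions on a k-stage pipeline.
 * Assumption (not from the slide): one cycle per stage, no stalls. */
unsigned pipelined_cycles(unsigned n, unsigned k) {
    /* The first instruction needs k cycles; each later one adds one. */
    return n == 0 ? 0 : k + n - 1;
}

/* Cycles when each instruction runs through all k stages alone. */
unsigned unpipelined_cycles(unsigned n, unsigned k) {
    return n * k;
}
```

For the three instructions and five stages (IF, ID, EX, MEM, WB) shown in the diagram, pipelined_cycles(3, 5) gives 7 cycles, against 15 without overlap.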
Superscalar/ VLIW
Original:
LD   F0, 34(R2)
ADDD F4, F0, F2
LD   F7, 45(R3)
ADDD F8, F7, F6
Schedule as:
LD   F0, 34(R2) | LD F7, 45(R3)
ADDD F4, F0, F2
Cache size
             Intel Clovertown   AMD Barcelona                         IBM Cell
# cores                                                               8+1
Clock freq   2.66 GHz           2.3 GHz                               3.2 GHz
Core type    OOO superscalar    OOO superscalar                       2-issue SIMD
Caches       2x4MB L2           512KB L2 (private), 2MB L3 (shared)   256KB local store
Chip power   120 Watts          95 Watts                              100 Watts
Historical Perspectives
Parallel computers
and cooperate
Autonomy
What domains?
Highly (embarrassingly) parallel apps
Cost/performance
Power/performance
Reliability and availability
Parallelism
Flynn taxonomy:
Programming Model:
Shared Memory
Message passing
Hybrid
Subroutines:
Cost = getCost();
A = computeSum();
B = A + Cost;
gcc -c code1.c    // assign to proc1
gcc -c code2.c    // assign to proc2
gcc -c main.c     // assign to proc3
gcc main.o code1.o code2.o
+ No communication
- Hard to balance
- Few opportunities
Parallelism
Flynn taxonomy:
Programming Model:
Shared Memory
Message passing
Hybrid
[Figure: a control unit sends an instruction stream to an ALU, which processes a data stream]
SIMD
SISD:
for (i=0; i<8; i++)
    a[i] = b[i] + c[i];
SIMD:
a = b + c;    // vector addition
[Figure: one control unit sends a single instruction stream to ALU 1, ALU 2, ..., ALU n; each ALU has its own data stream, 1 through n]
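The SISD loop and the SIMD statement a = b + c can be compared side by side in C. The sketch below is one possible rendering, not the slide's own code: it relies on the GCC/Clang vector extension (the vector_size attribute), and the names VEC_LEN, v8si, add_scalar and add_vector are hypothetical. Whether the vector add becomes a single hardware instruction depends on the target and compiler flags.

```c
#include <string.h>

#define VEC_LEN 8

/* GCC/Clang extension: a vector type holding VEC_LEN ints. */
typedef int v8si __attribute__((vector_size(VEC_LEN * sizeof(int))));

/* SISD style: one scalar add per loop iteration. */
void add_scalar(int *a, const int *b, const int *c) {
    for (int i = 0; i < VEC_LEN; i++)
        a[i] = b[i] + c[i];
}

/* SIMD style: one vector add covers all VEC_LEN elements. */
void add_vector(int *a, const int *b, const int *c) {
    v8si vb, vc, va;
    memcpy(&vb, b, sizeof vb);   /* load b into a vector */
    memcpy(&vc, c, sizeof vc);   /* load c into a vector */
    va = vb + vc;                /* element-wise a = b + c */
    memcpy(a, &va, sizeof va);   /* store the result */
}
```

Both functions produce the same array; the difference is that the second expresses the whole addition as one operation on vectors, which is the form a SIMD machine executes.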
MISD machine
[Figure: data stream]
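[Figure: systolic array with weights w1 to w4, inputs x1 to x7, and outputs y1 to y3. Each cell takes xin and yin, holds a weight w and a value x, and produces xout and yout.]
xout = x
x = xin
yout = yin + w * xin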
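The three cell rules of the systolic array (xout = x, x = xin, yout = yin + w xin) can be written as a single step function. This is a minimal sketch of one cell only, applying the rules in the order the slide lists them; the Cell struct and cell_step name are hypothetical, and how cells are chained and clocked in the full array is not modeled here.

```c
/* One systolic cell: a fixed weight w and a stored value x. */
typedef struct {
    double w;
    double x;
} Cell;

/* One step of a single cell, following the slide's three rules. */
void cell_step(Cell *cell, double xin, double yin,
               double *xout, double *yout) {
    *xout = cell->x;               /* xout = x  (pass on the old value) */
    cell->x = xin;                 /* x = xin   (latch the new input)   */
    *yout = yin + cell->w * xin;   /* yout = yin + w * xin              */
}
```

With w = 2 and x initially 0, a step with xin = 3 and yin = 1 yields xout = 0 and yout = 7, and leaves 3 stored in the cell for the next step.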
General purpose systems work well for same algorithms (locality etc.)
Fundamentals of Computer Architecture - Chapter 1
MIMD machine
Programming abstraction:
Distributed Memory:
Clusters, Grid
[Figure: processors (P) with caches and memories (M), connected by a network]
[Figure: caches connected by a network]
Distributed System/Memory:
- Also called clusters, grid
- Don't confuse it with distributed shared memory
[Figure: nodes with caches and I/O, connected by a network]
[Figure: two plots comparing distributed computing and parallel computing; axis labels: Perf, size]
Parallelism
Flynn taxonomy:
Programming Model:
Shared Memory
Message passing
Hybrid (e.g., UPC)
Data parallel
Programming Models
Shared M
Architectural model
Original motivations
Matches simple differential equation solvers
Centralize high cost of instruction fetch/sequencing
Other examples:
Common Today
Systolic Arrays:
Dataflow:
Shared memory:
Data parallel/SIMD:
http://www.top500.org
Let's look at the Earth Simulator
Hardware:
Programming Model
Instruction level
Loop level
Algorithm level
BlueGene
65536 processors
Each processor: PowerPC 440, 700 MHz (2.8 GFlops)
Rpeak: 183 TFLOPS
Rmax: 136 TFLOPS
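The BlueGene figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers on the slide (65536 processors, 2.8 GFlops each, 700 MHz, Rmax of 136 TFLOPS); the macro and function names are hypothetical, and the flops-per-cycle value is simply what the slide's numbers imply rather than a stated specification.

```c
/* Figures taken from the slide. */
#define BG_PROCESSORS       65536
#define BG_GFLOPS_PER_PROC  2.8
#define BG_CLOCK_GHZ        0.7
#define BG_RMAX_TFLOPS      136.0

/* Peak rate in TFLOPS: processors * GFlops per processor / 1000. */
double bg_rpeak_tflops(void) {
    return BG_PROCESSORS * BG_GFLOPS_PER_PROC / 1000.0;
}

/* Fraction of the peak rate that Rmax reaches. */
double bg_efficiency(void) {
    return BG_RMAX_TFLOPS / bg_rpeak_tflops();
}

/* Floating-point operations per clock cycle implied by the slide. */
double bg_flops_per_cycle(void) {
    return BG_GFLOPS_PER_PROC / BG_CLOCK_GHZ;
}
```

The product comes to roughly 183.5 TFLOPS, in line with the quoted Rpeak, and Rmax is about 74% of that peak. The per-processor rate works out to 4 floating-point operations per cycle.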
Niche market
Power wall
Programming wall