[Figure: two quads joined by a link. Each quad has four processors with L2 caches, an I/O bridge, a memory controller, local memory, a link, and a remote cache.]
The platform is built from 4-processor SMP blocks, called quads, each with memory, I/O and a remote cache
Each remote cache serves the 4 processors of its quad
The remote cache is 4-way set associative
The remote cache is 32 MB in the PentiumPro systems and 128 MB in the Xeon systems
The PentiumPro L2 is 4-way, 1 MB; the Xeon L2 is 4-way, 2 MB
The L2/SMP bus line size is 32 bytes, while the remote cache line size is 64 bytes
The interconnect is an SCI ring
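The cache parameters above fix the geometry of each cache. As a rough sketch, the number of sets follows from size / (ways x line size). The function and dictionary names here are hypothetical, and reading "M" as 2^20 bytes is an assumption.

```python
MB = 1 << 20  # assumption: "M" on the slide means 2**20 bytes


def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    lines, remainder = divmod(size_bytes, line_bytes)
    if remainder or lines % ways:
        raise ValueError("size must be a multiple of ways * line size")
    return lines // ways


# (size, ways, line size) as stated on the slide
CACHES = {
    "PentiumPro L2": (1 * MB, 4, 32),
    "Xeon L2": (2 * MB, 4, 32),
    "PentiumPro remote cache": (32 * MB, 4, 64),
    "Xeon remote cache": (128 * MB, 4, 64),
}


def geometry() -> dict:
    """Map each cache name to its number of sets."""
    return {name: num_sets(*cfg) for name, cfg in CACHES.items()}


if __name__ == "__main__":
    for name, sets in geometry().items():
        print(f"{name}: {sets} sets")
```

Because the remote cache line is 64 bytes and the L2/SMP bus line is 32 bytes, each remote cache line spans two L2 lines.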
Test Configurations
[Chart: Processor Data Miss, as % of all data references, per test configuration; y-axis 0.00% to 1.00%.]
[Chart: percentages per test configuration, annotated with remote cache hit rates and SCI bandwidth reductions in MB/s.]
Remote references (RC Hit + RC Miss) account for about 10% to 80% of L2 misses
The remote cache hit rate (RC Hit / (RC Hit + RC Miss)) ranges from 53% to 89%
The remote cache satisfies 8% to 70% of the L2 misses
The remote cache reduces SCI bandwidth by 58 MB/s to 372 MB/s
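The relationship between these figures can be sketched as follows: the share of L2 misses satisfied by the remote cache is the remote share of L2 misses multiplied by the remote cache hit rate. The function names and the sample inputs are illustrative assumptions, not measurements from the slides.

```python
def remote_cache_hit_rate(rc_hits: int, rc_misses: int) -> float:
    """RC Hit / (RC Hit + RC Miss)."""
    remote_refs = rc_hits + rc_misses
    if remote_refs == 0:
        raise ValueError("no remote references recorded")
    return rc_hits / remote_refs


def l2_misses_satisfied(remote_fraction: float, hit_rate: float) -> float:
    """Fraction of all L2 misses served by the remote cache.

    remote_fraction: (RC Hit + RC Miss) / all L2 misses
    hit_rate:        RC Hit / (RC Hit + RC Miss)
    """
    return remote_fraction * hit_rate


if __name__ == "__main__":
    # Hypothetical counter values, for illustration only
    rate = remote_cache_hit_rate(rc_hits=800, rc_misses=200)
    print(f"hit rate: {rate:.0%}")
    # A low-remote workload and a high-remote workload (made-up inputs)
    print(f"{l2_misses_satisfied(0.10, 0.80):.0%} of L2 misses satisfied")
    print(f"{l2_misses_satisfied(0.80, 0.875):.0%} of L2 misses satisfied")
```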
Remote Reference Components
[Chart: remote references per test configuration as stacked components, 0% to 100%. Legend: Inv Miss, RdInv Miss Inv, RdInv Miss Data, Rd Miss, Wb Hit, Inv Hit, RdInv Hit, Rd Hit.]
Remote references are broken down by reference type and by remote cache hit or miss
Reads are the most frequent reference type in most cases
Reads have a good hit rate
Invalidates have a poor hit rate
Simulated Relative Performance
[Chart: relative performance, No RC vs. RC; y-axis 0.00 to 1.80.]
The remote cache contributes an 18% to 54% performance increase with a 10:1 remote-to-local latency ratio
The remote cache even helps TPCC 8 Xeon, which has only a 10% remote reference rate but a high L2 miss rate (1.3%)
TPCD Q5 8 Xeon DB-B gains the least from a remote cache because of its high communications component
Simulated Relative Performance
[Chart: Relative Performance, No RC vs. RC; y-axis 0.94 to 1.12. Series: TPCC1 8 Ppro, TPCC2 8 Xeon, TPCD Q5 8 Ppro, TPCD Q9 16 Xeon, TPCD Q5 8 Xeon DB-B.]
The remote cache gives a 3% to 10% performance increase with a 2:1 remote-to-local latency ratio
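One way to see why the benefit shrinks as the latency ratio falls is a simple mean-latency model. This is only a sketch, not the simulator behind the charts: it looks at L2 miss latency alone, and it assumes a remote cache hit costs the same as a local memory access. The function name and the inputs are hypothetical.

```python
def mean_l2_miss_latency(remote_fraction: float, hit_rate: float,
                         ratio: float) -> float:
    """Mean L2 miss latency, in units of local memory latency.

    remote_fraction: share of L2 misses that reference remote memory
    hit_rate:        remote cache hit rate (use 0.0 for no remote cache)
    ratio:           remote-to-local latency ratio
    Assumption: a remote cache hit costs one local memory latency.
    """
    local_part = 1.0 - remote_fraction
    remote_part = remote_fraction * (hit_rate + (1.0 - hit_rate) * ratio)
    return local_part + remote_part


if __name__ == "__main__":
    remote_fraction, hit_rate = 0.5, 0.8  # illustrative values only
    for ratio in (10.0, 2.0):
        without_rc = mean_l2_miss_latency(remote_fraction, 0.0, ratio)
        with_rc = mean_l2_miss_latency(remote_fraction, hit_rate, ratio)
        print(f"{ratio:g}:1  no RC {without_rc:.2f}  RC {with_rc:.2f}  "
              f"latency reduced {without_rc / with_rc:.2f}x")
```

In this model the latency saving is much larger at 10:1 than at 2:1, which matches the direction of the simulated results. The simulated performance gains are smaller than the latency savings because memory stalls are only part of execution time.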
Simulated Relative Performance
[Chart: RC vs. No RC; y-axis 0.00 to 12.00. Series: TPCC1 8 Ppro, TPCC2 8 Xeon, TPCD Q5 8 Ppro.]
A system with no remote cache would need a remote-to-local latency ratio of 2.5:1 to 5.4:1 to match the performance of a system with a remote cache and a 10:1 ratio
Simulated Relative Performance
[Chart: Remote to Local Ratio, No RC vs. RC; y-axis 0.00 to 7.00. Series: TPCC1 8 Ppro, TPCC2 8 Xeon, TPCD Q5 8 Ppro, TPCD Q9 16 Xeon, TPCD Q5 8 Xeon DB-B.]
A system with a remote cache can have a remote-to-local latency ratio of 2.7:1 to 6.2:1 and still match the performance of a system with no remote cache and a 2:1 ratio
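The idea of an equivalent latency ratio can be sketched with a first-order model. It assumes a remote cache hit costs one local memory latency and ignores everything else the simulation captures (contention, communication, reference mix), so it will not reproduce the ranges quoted above. The function names are hypothetical.

```python
def equivalent_ratio_without_rc(hit_rate: float, ratio_with_rc: float) -> float:
    """Ratio a no-RC system needs to match an RC system's mean remote latency.

    Assumption: a remote cache hit costs one local memory latency, so the
    mean remote latency with an RC is hit_rate * 1 + (1 - hit_rate) * ratio.
    """
    return hit_rate + (1.0 - hit_rate) * ratio_with_rc


def equivalent_ratio_with_rc(hit_rate: float, ratio_without_rc: float) -> float:
    """Ratio an RC system can tolerate while matching a no-RC system.

    This is the inverse of equivalent_ratio_without_rc.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return (ratio_without_rc - hit_rate) / (1.0 - hit_rate)


if __name__ == "__main__":
    hit_rate = 0.5  # illustrative value only
    print(equivalent_ratio_without_rc(hit_rate, 10.0))
    print(equivalent_ratio_with_rc(hit_rate, 2.0))
```

A higher hit rate lowers the ratio a no-RC system would need, and raises the ratio an RC system can tolerate.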
Conclusions