
MESSENGERS 3.1 –
multiple daemon processes
on one host
Physical node
• darwin.ics.uci.edu
• darwin.ics.uci.edu:0
• % messengers –n darwin.ics.uci.edu:0
• Binds the daemon process to CPU 0 on darwin
• The binding code can be commented out; multiple
daemon processes can still run on one host
Daemon binding to a CPU?
• Benefit:
– Extends data-CPU affinity to the processor/core
level – may speed up the program if accessing the local
cache is significantly faster than accessing a shared
cache or memory, and if the OS is not smart enough in
scheduling (more later)
• Drawback:
– Multiple threads in the same daemon are restricted to
one CPU
• Whether the daemons are bound or not, a host can
still utilize multiple CPUs
Bind daemon
// Solaris: bind the calling process to the given processor
int bind_daemon(unsigned int cpu_id)
{
    int ret;

    ret = processor_bind(P_PID, getpid(), cpu_id, NULL);
    return ret;
}

// Linux: set the calling process's CPU affinity mask
int bind_daemon(unsigned int cpu_id)
{
    int ret = 0;
    cpu_set_t mask;
    unsigned int len = sizeof(mask);

    CPU_ZERO(&mask);
    CPU_SET(cpu_id, &mask);
    ret = sched_setaffinity(0, len, &mask);
    return ret;
}
– [ the Linux approach seems to be more flexible ]
Inter-daemon communication on
the same host
• Still uses sockets
• It is not known whether any same-host optimization
is applied
Clusters
• hermod, hayes – one CPU per host
• Solaris 10, Intel – 2 CPUs per host –
always busy
• gamera (Solaris 9, UltraSparc) - 2 CPUs
per host
• NACS gradea – 2 CPUs per host
Performance running Crout on
dual-processor gamera
                  1 CPU   2 CPUs   gamera+rodan
• [2000 1 1]        38       30             29
• [3000 1 1]       130      104            101
• [3000 2 10]      131       87            109 ?!
• [2000 2 1]        38      183            !!!???

• [2000 2 1] on non-dedicated rodan: 28/29 vs 165

• [2000 2 1] on rodan and gamera – 3.1      228/235/211
•                                – 3.0.8    213
•                                – 1.2.04   209
Performance running Crout on
uni-processor uni-core hayes01
(Linux Pentium hyperthreading)
                  1 CPU   2 CPUs   hayes01+hayes02
• [3000 1 1]        48       39                38
• [3000 1 10]       48       44                41
• [3000 2 1]        48      447              4154
• [3000 2 10]       49       63               320
• These results seem to suggest that, given the right
data distribution, the new MESSENGERS can take
advantage of Simultaneous Multi-Threading (SMT)
Working on …
• Verifying that logical nodes and physical nodes
are where they should be
• Investigating the details of CPU IDs – do they
refer to virtual or physical CPUs? How can
one tell which CPUs (cores) are on the same
socket?
• Rough edges in code: processor_bind(),
IP+CPU_ID, …
• Performance comparison: bound vs. unbound
daemons, …
Daemon binding enables
data distribution study
on a finer level
• Computation and data distribution is about
fixing computation and data on a physical
node (during a certain period of
program execution)
• If computation hops from one node to
another beyond the programmer's control, it is
impossible to study the effect of
computation/data allocation
Daemon binding enables
data distribution study
on a finer level
• Binding a daemon to a processor/core
allows one to study data placement at the
processor/core level
• The effect is more pronounced on a multi-processor
(i.e. multi-socket) host than on a multi-core host
• [multi-core vs many-core]
Multi-level data distribution
• Just an immature idea of mine
• One data distribution pattern (e.g. block)
on the host level, another (e.g. cyclic) on
the processor/core level
Multi-level data distribution
• Block          1 2 3 4 5 6 7 8 | 9 10 11 12 13 14 15 16
• Cyclic         1 3 5 7 9 11 13 15 | 2 4 6 8 10 12 14 16
• Blk-cyclic(2)  1 2 5 6 9 10 13 14 | 3 4 7 8 11 12 15 16

• Blk on host    1 2 3 4 5 6 7 8 | 9 10 11 12 13 14 15 16
• Cyclic on CPU  1 3 5 7 | 2 4 6 8 | 9 11 13 15 | 10 12 14 16

• Cyc on host    1 3 5 7 9 11 13 15 | 2 4 6 8 10 12 14 16
• Cyc on CPU     1 5 9 13 | 3 7 11 15 | 2 6 10 14 | 4 8 12 16
