
Implementing Virtual Interface Architecture

on top of the GM Message Passing Interface

Guillaume Chelius
Ecole Normale Superieure de Lyon
46 allée d'Italie, 69346 Lyon Cedex 07, France
Guillaume.Chelius@ens-lyon.fr

0-7695-1010-8/01 $10.00 © 2001 IEEE

Abstract

In this paper, two different strategies to provide a Virtual Interface Architecture (VIA) layer on top of the Myrinet interconnect network are presented. The VIA protocol is based on user-level network access to allow high performance and implements a point-to-point, bidirectional, connected communication model.

The first alternative is the integration of a Myrinet device driver in the Modular-VIA architecture. This particular implementation allows an easy development of drivers to support VIA on top of new network devices. However, performance is not fully satisfactory.

The second strategy is the development of a VIA interface on top of the Myricom message passing interface. It is a user-level library which achieves great portability, performance and scalability. It can also be integrated in a multi-protocol environment based on the Myrinet media.

1. Introduction

VIA, standing for Virtual Interface Architecture, is an Application Programming Interface (API) developed for use in clusters of workstations. It aims at reducing the system-processing overhead of traditional communication models by providing a protected, user-level access to a network interface. The VIA architecture relies on the Virtual Interface (VI) abstraction. VIs are communication endpoints which must be connected to allow bidirectional, point-to-point data transfers.

A Myrinet interconnect network is a switched, wormhole-routed network. One feature of the Myrinet hardware is the integration of a RISC (Reduced Instruction Set Computer) processor as well as memory directly into the board. These features allow the development of “smart” drivers and enable the move of some of the communication work from the host to the board. A detailed presentation of Myrinet is given in [10].

The aim of the presented work was to provide a VIA interface on top of the Myrinet interconnect network. Indeed, no efficient implementation is available at this time. To fulfill this task, two different strategies have been studied. The first one is to use an already existing implementation of the protocol, Modular-VIA (see [5] for details about M-VIA), which allows the integration of new device drivers. The second approach is to write a VIA layer on top of an existing message passing interface. The two alternatives lead to two different implementations, with different benefits and drawbacks.

In the second section, a brief overview of VIA is given. The information presented here is extracted from the VIA Specifications V1.0 (see [12]) and the Intel Virtual Interface Architecture Developer’s Guide (see [15]). The third section explains the work done to associate the Myrinet media with M-VIA. The last two sections describe GVIA, an implementation of VIA on top of GM, the message passing interface developed at Myricom, Inc (see [4] for details). Performance and drawbacks of GVIA are presented.

2. Overview of VIA

In a traditional network architecture, the operating system (OS) virtualizes the network hardware into a set of logical communication endpoints available to network consumers. The OS multiplexes access to the hardware among these endpoints. The drawback of this organization is that all communication operations require a call or trap into the operating system kernel, which can be quite expensive.

The VI architecture eliminates the system-processing overhead of the traditional model by providing each consumer process with a protected, directly accessible interface to the network hardware: a Virtual Interface. Each VI represents a communication endpoint. VIs can be logically connected to support bidirectional, point-to-point transfers. A network adapter performs the endpoint virtu-
alization directly and subsumes the tasks of multiplexing, de-multiplexing, and data transfer scheduling normally performed by an OS kernel and device driver.

2.1. The Global Architecture

Figure 1. The VIA Architectural Model

Figure 2. A Virtual Interface

The VI architecture is composed of four basic components: Virtual Interfaces (VIs), Completion Queues (CQs), a VI Provider and a VI Consumer. The architecture is illustrated in Figure 1.

The VI Consumer is generally composed of an application program and an operating system communication facility. The VI Provider is the set of hardware and software components responsible for instantiating a Virtual Interface. The VI Provider consists of a network interface controller and a Kernel Agent. The VI Network Interface Controller (NIC) implements the Virtual Interfaces and Completion Queues and directly performs data transfer functions. The Kernel Agent is a privileged part of the OS that performs the setup and management needed to maintain a Virtual Interface between VI Consumers and VI NICs. This includes creation/destruction of VIs, Completion Queues, management of system memory, interrupt management, VI connection setup and error handling.

A Virtual Interface is the mechanism that allows a VI Consumer to directly access a VI Provider to perform data transfer operations. Figure 2 illustrates a Virtual Interface. A VI consists of a pair of Work Queues: a send queue and a receive queue. VI Consumers post requests, in the form of Descriptors, in the Work Queues to send or receive data. VI Providers asynchronously process the posted Descriptors from the Work Queues and mark them with a status when completed. VI Consumers remove completed Descriptors and use them for subsequent requests. Each Work Queue has an associated Doorbell that is used to notify the VI network adapter that a Descriptor has been posted. This mechanism is implemented directly in the adapter and requires no OS intervention. A Completion Queue allows a VI Consumer to coalesce notification of Descriptor completions from the Work Queues of multiple VIs in a single location.

2.2. Memory, Connection and Error Management

Most computer system designs require that the memory pages used to hold messages are locked down and that their virtual addresses be translated into physical locations before a NIC can access them. Pages are unlocked when the transfer is completed. Traditional network transports either perform these operations on every data transfer request or copy the data into a pre-registered buffer. These processes contribute significant overhead to the data transfer operation. The VI Architecture requires the VI Consumer to identify memory used for a data transfer prior to submitting the request. Only memory that has been registered with the VI Provider can be used for data transfer.

Memory registration consists of locking the pages of a virtually contiguous memory region into physical memory and providing the virtual to physical translation to the VI NIC. When registering, the VI Consumer gets an opaque handle which is used in subsequent calls to the VI Provider.

VIA includes several memory protection facilities. When registered, a memory region is associated with a protection tag and several permission attributes that define the allowed memory operations. Whenever a user operation involves a memory block, the VI Consumer has to provide the correct protection tag for the operation to take effect.

The VI Provider is responsible for handling the connection and error management. VIA only defines the connection scheme/model. The underlying protocol remains im-
plementation dependent. Concerning errors, the specifications propose several behaviors. They define different levels of reliability: unreliable, reliable delivery and reliable reception.

2.3. Existing Implementations

Several implementations of VIA exist. Most of them have been developed to deal with proprietary media and little information is available about them. Examples are Giganet, Tandem, Fujitsu System Technologies or NEC. Two other implementations are the product of academic efforts. In both cases, the aim was to build a reference implementation of the protocol.

Berkeley VIA is a multi-platform (Solaris, Linux and Windows NT) native implementation of VIA for Myrinet boards. It has been developed at UC Berkeley (see [17] and [2] for more information about Berkeley VIA). It is an example of embedded VIA since the protocol is almost completely implemented directly in the Myrinet board.

M-VIA, standing for Modular VIA, is an implementation of VIA for the Linux operating system, developed at the National Energy Research Scientific Computing Center (NERSC) of the Lawrence Berkeley National Laboratory (see [16] and [5] for more information about M-VIA). It supports several physical media such as Ethernet cards and the GNIC-II Gigabit Ethernet device and allows an easy extension to new devices.

3. Integration of Myrinet in M-VIA

A first solution to support VIA on a Myrinet network is the porting of M-VIA to Myrinet boards. Indeed, M-VIA has been designed to allow an easy integration of new network interfaces.

3.1. M-VIA Architecture

A primary design goal of M-VIA is to enable support of VIA for multiple network interfaces, including both VIA-aware and VIA-unaware ones. This is achieved by the modular implementation of the protocol. M-VIA proposes a complete framework, but also allows a device to override some VIA functionalities in its own specific way. This allows VIA-aware devices to use their special abilities. Its architecture is composed of a user-level library and several Linux kernel modules. The core module is device-independent and implements the kernel agent of the VI Provider. The other modules are device modules and implement device-specific functionalities. They are composed of an Ethernet device driver to which several VIA abilities have been added.

The Core module is responsible for the management of resources and connections. All the functionalities provided by this module are default ones. A device module is free to override any of them by registering the new functions when loaded.

The VI Provider library implements the API defined in the VIA specifications. The main characteristic of this library lies in the implementation of the kernel agent calls. Indeed, a device driver can decide whether these calls are performed through the use of classical system calls or, to improve efficiency, through fast-traps. A fast-trap enables the execution of privileged code with minimum overhead (no scheduling operations and signal processing as in classical system calls).

A device module provides the abstraction of a VI NIC. When a device registers itself with the core module, it provides its specific abilities (e.g. Maximum Transfer Unit, resource limitations, doorbell mechanisms) and its own implementation of some of the VI Kernel Agent calls (e.g. the send call with the DEC Tulip card). Basically (for VIA-unaware hardware) an M-VIA device module is an Ethernet driver enhanced with a (de-)multiplexing ability.

3.2. The Myrinet Device Module

Like classical M-VIA devices, the M-VIA Myrinet driver is based upon the Ethernet driver (the GM Ethernet driver in our study). Resources (buffer rings, interrupt handlers) are shared and a (de-)multiplexing of several events is required. For example, at the reception of a packet, the receive interrupt handler is responsible for handling the packet, checking its type and transmitting it to the correct upper-layer protocol. The send interrupt handling is also de-multiplexed. To achieve this, a permanent record of the send ring state and, more precisely, a record of the packets being processed is kept up-to-date. This information enables the send interrupt handler to decide which protocol the sent packet was associated with.

Compared to other M-VIA drivers, one feature of the Ethernet driver based on Myricom’s GM library is its ability to handle gather descriptors. This feature is used when a VIA packet is sent. Instead of fragmenting and emitting each of the VIA segments until the termination of the descriptor, the VIA gather list is mapped into one or several GM gather lists. The translation of VIA descriptors into GM ones is not trivial since the constraints of both descriptors are not the same (number of segments, Maximum Transfer Unit, etc.); but the fragmentation of the sent data is, in most cases, considerably reduced.

In addition to classical optimizations, two optimization schemes have been added to the driver. The first one concerns the handling of send interrupts on the host side and the second one is related to the Myrinet DMA engine and the
processing of gather lists.

An interrupt is a costly process. It requires the LANai, the processor on the Myrinet board, to access the PCI bus and eventually to stall on a pending interrupt. It also requires the host to suspend its activity to access the PCI bus to get the interrupt-related information and finally to execute the corresponding handler. Often, this handler does not execute any critical operation and could have been delayed. This observation led us to aggregate the handling of certain interrupts. The firmware on the Myrinet board (the MCP, standing for Myrinet Control Program) was modified to allow an interrupt-on-request version of the Ethernet sending functions. When processing an output request, a completion interrupt is requested only when resources are critically needed. Otherwise, a mute send is performed. In the handling of a send interrupt, the operations corresponding to all sends completed since the last interrupt are performed (see also the techniques developed in [9]).

The new generation of Myrinet boards (LANai >= 7.0) includes a versatile DMA controller. It uses a chain of control blocks stored in the LANai memory to initiate DMA-mastering operations. The current version of the GM MCP does not use this ability but only processes one DMA block at a time. In the case of an Ethernet I/O request, the MCP loops on the descriptor, DMAing one segment after another. To correct this deficiency, a given number of DMA blocks is statically allocated in the LANai memory. When processing an Ethernet request, the different blocks are chained according to the gather list.

3.3. Performances and Conclusions

Hardware   Latency (packet size: 1 Byte)   Bandwidth (packet size: 31487 Bytes)
Myrinet    35 usec                         58 MBytes/s

Table 1. Performance of M-VIA on a dual Pentium II 450 MHz with LANai 7.0 and 32 bits/33 MHz PCI buses.

Device                     M-VIA   GAMMA
GNIC-II Gigabit Ethernet   18.3    9.5
DEC Tulip                  22.6    14.3

Table 2. Latency (usec) comparison between M-VIA and GAMMA.

Device                     M-VIA   GAMMA
GNIC-II Gigabit Ethernet   59.7    93.7
DEC Tulip                  11.9    12.1

Table 3. Bandwidth (MBytes/s) comparison between M-VIA (results extracted from [16]) and GAMMA (results extracted from [3]).

Table 1 presents the performance of the Myrinet driver. Though the latency is high compared to classical Ethernet devices (see table 2 for a reference), the bandwidth is equivalent to the one reached with the GNIC-II Gigabit Ethernet card (see table 3), far from the theoretical maximum. Since this maximum can be reached with other protocols, M-VIA seems to be the bottleneck. The high latency is also a real issue. Since GM achieves a 15 usec latency on the same system, it can be concluded that the Ethernet support in the MCP and kernel is far from being efficient. The GM Ethernet driver has to be re-designed, including other optimizations such as a receive interrupt aggregation or a transfer of the interrupt information from the LANai to the host. Some related work is presented in [9].

The success of M-VIA, and one of its goals, is the modularity of the system. It is the only VIA implementation that allows the use of different devices in a transparent way. Moreover, it supplies new hardware with support for VIA at a low development cost. Another good aspect of M-VIA is the access it provides to the Ethernet driver of a device. This may present several opportunities, notably for testing purposes. For example, a performance program can directly benchmark the Ethernet performance of a given device by minimizing the overhead associated with the access to the driver.

However, another goal of this implementation, high performance, has not been reached. To justify this statement, let us consider two different situations: VIA-unaware and VIA-aware devices. The first kind of device usually gives M-VIA its best results (M-VIA performance against hardware performance). This is even truer if the device is intrinsically slow. For example, as shown in table 3, the DEC Tulip nearly reaches its maximum theoretical bandwidth: 11.9 MBytes/s against a maximum of 12.5 MBytes/s. However, in the case of a faster device, the GNIC-II Gigabit Ethernet, M-VIA becomes the bottleneck: 59.7 MBytes/s against a maximum of 125 MBytes/s. It can be stated that this last theoretical value can be approached since, for example, the GAMMA project (see [3] and [11] for more information) achieves a bandwidth of 93.7 MBytes/s with its own protocol (not a VIA implementation) and the same hardware. A comparison of latency (see table 2) leads to the same conclusion. Compared to the GAMMA API, M-VIA introduces a signif-
icant overhead of approximately 9 usec for both devices. This overhead is OS overhead. Indeed, M-VIA does not provide a user-level access to the network interface. The cost associated with fast-traps and system management, even if lowered, is still present. Secondly, M-VIA is not a 0-copy protocol. Although on the send side, memory registration and address translation allow avoiding a costly copy, this is not true on the receive side.

Even with VIA-aware devices, there is still a problem: despite its design, M-VIA does not provide enough modularity to take full advantage of the device. With Myrinet for example, it is possible to use the gather ability of the hardware, but not to implement a 0-copy receive operation. The consequence is a performance far from what could be expected. As a demonstration, bandwidths achieved by both Myrinet and the GNIC-II Gigabit Ethernet card can be compared: in both cases, the VIA implementation is the bottleneck and the bandwidth cannot go over 60 MBytes/s. As another example of the difficult integration of special hardware capacities, we can simply say that, as far as we know, the Myrinet device driver is the only M-VIA driver of a “smart” device.

4. Design of GVIA

Since results of the M-VIA implementation are not satisfactory given the potential of the Myrinet hardware, we have to look elsewhere to provide an efficient VIA for the Myrinet network. A first possibility is the development of a native Myrinet VIA. It means developing a VIA-aware MCP, a host driver and a user library. This strategy has already been studied in [17]. The second possibility, an unexplored one, is to port the VIA protocol on top of an existing Myrinet API, GM for example. Although we can expect the best performance from a native VIA, a port on GM has several benefits. It allows the integration of the protocol into a bigger system which allows the use of several communication interfaces on the same media. Another point of interest is the great portability achieved by such an interface. All these reasons led to the development of GVIA, a VIA implementation on top of GM.

Implementing the same specifications, meeting the same requirements and constraints, VIA implementations are similar in numerous parts. GVIA is no exception to this rule. However, being a pure software and user-level implementation, it is relatively different from other ones. Indeed, many features that are normally handled by the network interface or the host system, namely the ones not implemented in GM, have to be handled at the user level.

Figure 3. Global Architecture of GVIA.

4.1. Global architecture

Though GVIA is built with the classical bricks described in the VIA specification (Virtual Interfaces, Completion Queues, Network Interface Controller, etc.), several major differences in the global architecture have been introduced. They concern both the abstraction and the implementation made from these elements.

GM provides a reliable and connectionless model of communication between endpoints called Ports. A Port is the interface between a consumer and the GM library. It is owned by one process to access the Myrinet media. When a packet travels through a Myrinet board, a multiplexing or de-multiplexing step is performed to dispatch the packet to the targeted port. Ports do not fit well in the VIA model because, though the two objects are similar, GM does not provide enough ports to abstract them as Virtual Interfaces. They have to be considered as Network Interfaces (NI). So a single Myrinet board is seen as several NIs (one per port). A process can own several NIs belonging to the same or different media and several NIs of the same board can be owned by different processes. This offers a great flexibility in the use of a Myrinet interface and also allows the use of other communication protocols on the free remaining ports (e.g. MPI-over-GM or PVM-over-GM ...).

Just as GM resources are accessed by a consumer using a Port, VIA defines Handles to allow a VI consumer to reference VIA resources. This mechanism allows the setup of a protection scheme that prevents the user from passing cor-
rupted references to the VIA library. In most existing implementations of VIA, the common policy is to abstract a handle as a pointer to the object it references. The benefit is an efficient translation from handle to object in the library. However, it prevents proper validation of the handles. To remedy this, the architecture shown in Figure 3 is used. In every NIC, for each object type, the existing resources are listed through the use of a reference array. In this scheme, handles are just indexes. By use of mutual exclusion resources, the architecture protects the library against almost any misbehavior from the VIA consumer.

4.2. A user layer

GM already implements some functionality required by VIA: memory registration, address translation and a Domain Name Service feature. In a classical VIA implementation, all these operations are performed by the Kernel Agent of the VI Provider. In GVIA, a Kernel Agent is useless since the GM driver can handle most of its work. If its remaining tasks can be handled directly in user space, GVIA can be reduced to a lone user library.

In addition to the tasks performed by the GM driver, the remaining responsibilities of the Kernel Agent are resource management (creation, destruction), protection validation (use of Protection Tags), error handling and connection management. Without entering into details, the architecture briefly described in the previous section makes it easy to handle resource management and protection validation from user space. To handle error and connection management, intensive use of GM’s flow control abilities was made. A careful handling of GM flow control errors and the use of alarms allow the library to translate GM events into a VIA situation that can be handled by the library.

The user status of GVIA led to a last particularity. In a classical VIA implementation, numerous calls are blocking: connection requests, connection wait, blocking receive, etc. Usually, the blocking mechanism lies in the kernel, as for classical data transfer protocols. However, GVIA only provides a user-level access to the communication media, without any intervention from the kernel. As a consequence, the blocking mechanism has to be implemented directly in the library. It is achieved using mutex resources and by recording all blocking threads as well as the operations they are blocking on.

4.3. Data Transfer

VIA requires from the network interface a high level of de-multiplexing; an incoming message has to be dispatched among potentially thousands of VIs (the specification requires a minimal support of 1024 VIs per NIC). Basically, GM only performs a de-multiplexing between ports. Since ports cannot be used as VIs, this operation has to be implemented in the library. Another issue is the support for scatter and gather lists. Although it is supported by the Ethernet driver, the user GM API does not accept it. The proposed solution differentiates two cases depending on the transferred data size. For short messages, a copy operation is used. In the case of long messages, a rendezvous protocol is set up.

The copy of a small amount of data does not introduce a high overhead in comparison to the transport cost. This fact justifies the use of shared buffers to receive and send short VIA packets (see Figure 3). When a short message is received in the Myrinet board, it is copied into the receive buffer ring of the targeted NIC, that is the targeted Port. The library is responsible for copying the packet into the scatter list of the first free descriptor in the Receive Work Queue of the target VI. It is also responsible for notifying the VI of the completed operation. On the send side, the data is copied from the different segments of the gather list into the send ring buffer. This communication scheme also applies to all control messages. In this case, the copy is even avoided since messages are treated on-the-fly.

In the case of a large data transfer, a copy is not acceptable. To achieve a 0-copy transfer, a rendezvous protocol is established. It is made possible by one of GM’s features, the directed send. A directed send is a Put operation. It transfers data from a source buffer on the local host to a target buffer on the remote host, both buffers being specified by the initiator of the operation. At the termination of the transfer, no notification is performed on the remote host. A directed send requires an active resource acquisition only on the sender node. As the receive side needs to be notified of the reception completion, an acknowledgment message is sent by the sender to the receiver. As the sender also needs to know where to send the data, the receiver sends its list of gather segments to the sender at the request of the latter. Finally, the complete protocol requires three control packets in addition to the data transfer. They use the short message scheme and are directly handled from the receive buffer ring.

5. Status of GVIA

GVIA is an implementation of the VIA specifications on top of the message passing interface GM. It implements the unreliable service but offers reliable communications. GVIA is a user library which includes no system-dependent code and very little hardware-dependent code (only little endian to big endian translation macros). It is compatible with any thread library that supports mutex operations. The result is a high portability which is only limited by the porting of GM to different systems.

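The two data paths of Section 4.3 can be sketched as follows. This is a minimal in-memory model, not GVIA or GM code: the names `Endpoint`, `directed_send`, `transfer`, the packet labels and the `SHORT_LIMIT` cutoff are all hypothetical, since the paper does not give the size threshold or the packet formats. Only the receive ring is modelled; the send-side ring copy is omitted.

```python
from dataclasses import dataclass, field

SHORT_LIMIT = 64  # assumed cutoff in bytes; the paper does not state one


@dataclass
class Endpoint:
    """Toy stand-in for one side of a connected VI (hypothetical)."""
    name: str
    recv_ring: list = field(default_factory=list)   # shared ring for short packets
    posted: list = field(default_factory=list)      # posted receive buffers
    completed: list = field(default_factory=list)   # completed receive buffers


def directed_send(target: bytearray, data: bytes) -> None:
    """Models a Put: writes into the remote buffer, no remote notification."""
    target[:len(data)] = data


def transfer(sender: Endpoint, receiver: Endpoint, data: bytes) -> list:
    """Move data to the receiver; return the control packets exchanged."""
    control = []
    buf = receiver.posted.pop(0)  # first free receive descriptor
    if len(buf) < len(data):
        raise ValueError("posted receive buffer is too small")
    if len(data) <= SHORT_LIMIT:
        # Short path: the payload lands in the receive ring, then the
        # library copies it into the posted buffer.
        receiver.recv_ring.append(data)
        payload = receiver.recv_ring.pop(0)
        buf[:len(payload)] = payload
    else:
        # Long path: rendezvous, three control packets around one Put.
        control.append(("REQUEST", sender.name))     # sender asks for a target
        control.append(("SEGMENTS", receiver.name))  # receiver describes its buffer
        directed_send(buf, data)                     # models the 0-copy transfer
        control.append(("ACK", sender.name))         # sender signals completion
    receiver.completed.append(buf)
    return control


a, b = Endpoint("a"), Endpoint("b")
b.posted = [bytearray(16), bytearray(1024)]
short_ctl = transfer(a, b, b"hi")
long_ctl = transfer(a, b, b"x" * 500)
```

In this model the short message needs no control packet, while the long one produces exactly the three control packets the text counts in addition to the data transfer.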
GVIA was tested with the Intel Virtual Interface Architecture Conformance Suite (see [1] and [14] for more information). It passed all (about 120) but 5 tests corresponding to its implementation. Of these 5 tests, two test the behavior of the VI Provider when confronted with non-aligned data. Since the DMA engine of the Myrinet board does not require any particular alignment, we chose to weaken the VIA constraint. The three other failed tests correspond to behaviors that are unspecified in the VIA specifications.

5.1. Performances

Figure 4. Latency comparison between GM and GVIA (LANai 9 133 MHz on a Pentium III 600 MHz).

Figure 5. Bandwidth comparison between GM and GVIA (LANai 9 133 MHz on a Pentium III 600 MHz).

The performance achieved by the library is fully satisfactory. A latency comparison between GM and GVIA is presented in figure 4 and a bandwidth comparison in figure 5. All test machines have a 66 MHz/64 bits PCI bus and are running Linux. For small packets (0 to 100 Bytes), the overhead introduced by the VIA protocol is less than 1 usec. For bigger packets, the overhead never exceeds 30 usec, which is the cost of the extra messages in the hand-shake protocol. This last overhead will be reduced in the near future by the introduction of a new hand-shake protocol.

Compared to existing implementations of VIA, a GVIA based on the new Myricom 2000 boards achieves the highest bandwidth ever (215 MBytes/s) and the shortest latency. For comparison, Giganet's latency is 8.5 usec whereas the latency of GVIA with the latest Myrinet cards (LANai 9 200 MHz) is 7.5 usec. It is however important to notice that the hardware conditions are not the same since Giganet boards cannot take full advantage of 64 bits/66 MHz PCI buses.

5.2. Flaws and solutions

A rendezvous protocol requires the exchange of several messages between the two concerned hosts. Since our rendezvous is handled by the host, a mechanism is needed to notify the host that the next step of the protocol can be taken. Yet, GM only provides an event polling mechanism and no notification mechanism. It basically means that the protocol cannot advance if the VIA consumer stops dealing with the GVIA library. This problem is common with this kind of architecture and has already been discussed in [19]. A solution is to mark each received packet with a timestamp. If the packet is not handled before the expiration of a delay, an interrupt is raised and the handling of the packet is forced.

The connection mechanism, as it is implemented right now, is fully satisfactory. However, it does not allow an extension to new connection models. For example, the peer-to-peer one, described in [15], cannot be implemented using only the flow control ability of GM. It requires the capacity to record connection requests on a host even if no VIA consumer is running. With our architecture, the use of a kernel daemon is not acceptable but it is possible to open a user connection daemon on a well-known port of the Myrinet media with the task of managing connection establishment. Such a solution would both allow the easy transition towards new connection schemes and respect the user status of the implementation.

Finally, the current version of the library does not include any direct memory copy facility to use with shared memory multiprocessors (SMPs). Local data transfers are processed like distant ones. To reduce the cost of local communications, a memory copy alternative has to be implemented. Such work has already been done with the implementation of MPI over BIP presented in [13]. BIP, standing for Basic
Interface for Parallelism, is a message passing interface for the aim is not to get a fully embedded VIA, but only to in-
Myrinet boards (see [18] for a description). troduce some VIA facilities into GM.
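The timestamp-based remedy proposed in Section 5.2 for the polling-only limitation can be sketched as follows. This is an illustrative model under stated assumptions, not GVIA code: the class `ReceiveRing`, its methods and the `max_delay` parameter are hypothetical, and the `watchdog` method stands in for the interrupt handler that would force the handling of overdue packets.

```python
import time
from collections import deque


class ReceiveRing:
    """Toy receive ring whose packets carry an arrival timestamp (hypothetical)."""

    def __init__(self, max_delay: float):
        self.max_delay = max_delay   # assumed deadline, in seconds
        self.pending = deque()       # (arrival_time, packet) in arrival order
        self.handled = []            # packets processed so far

    def deliver(self, packet, now=None):
        """Packet arrival: stamp the packet with its arrival time."""
        stamp = time.monotonic() if now is None else now
        self.pending.append((stamp, packet))

    def poll(self):
        """Normal path: the consumer calls the library, which drains the ring."""
        while self.pending:
            self.handled.append(self.pending.popleft()[1])

    def watchdog(self, now=None):
        """Forced path: handle every packet older than max_delay.

        Returns the number of packets whose handling was forced.
        """
        now = time.monotonic() if now is None else now
        forced = 0
        while self.pending and now - self.pending[0][0] > self.max_delay:
            self.handled.append(self.pending.popleft()[1])
            forced += 1
        return forced


ring = ReceiveRing(max_delay=0.5)
ring.deliver("rendezvous-request", now=0.0)
ring.deliver("segment-list", now=0.4)
forced = ring.watchdog(now=0.6)   # only the first packet is overdue
ring.poll()                       # the consumer later drains the rest
```

Passing `now` explicitly keeps the example deterministic; a real implementation would rely on a clock and on the board raising the interrupt.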

5.3. Future Works References

The development of GVIA should follow two influences. [1] http://developer.intel.com/design/servers/vi/.


[2] http://www.cs.berkeley.edu/ philipb/via/.
The first one consists of adding more VIA features in the [3] http://www.disi.unige.it/project/gamma/.
library and implementing derivative versions. New fea- [4] http://www.myri.com.
tures include the peer-to-peer connection model, notifi- [5] http://www.nersc.gov/research/ftg/via/.
cation functions or the system information management [6] http://www.viarch.org/.
[7] A. Basu, V. Buch, W. Vogels, and T. von Eicken. U-net:
which are described in [6] and [15]. They are the last
A user-level network interface for parallel and distributed
steps in achieving the VIA Full Conformance described in computing. In 15th ACM Symposium on Operating Systems
the Intel Virtual Interface Architecture Developer’s Guide Principles, December 1995.
(see [ 151) and does not present any issue. An experimen- [8] R. Bhoedjang, T. Ruhl, and H. Bal. User-Level Network In-
tal version of the library already includes some of these terface Protocols. IEEE Computer, 3 1(11):53-60, Novem-
features. Derivative versions of the library include thread ber 1998.
safety and the introduction of the reliable delivery and reli- [9] R. Bhoedjang, K. Verstoep, H. E. Bal, and T. Ruhl. Reduc-
able reception services. Here again, no issue is expected. ing Data and Control Transfer Overhead through Network-
Interface Support. In Proc. First Myrinet User Group Con-
The evolution of GM is a second influence for poten-
ference, Lyon, France, Sept. 2000.
tial changes in the library. For example, the expected in- 101 Boden, Cohen, Feldermann, Kulwik, Seitz, Seizovic, and
troduction of a GET operation in the GM API will lead to Su. MYRINET: A Gigabit per second Local Area Network.
deep modifications in the rendezvous protocol and, hope- IEEE-Micro, 15:29-36, February 1995.
fully, better performances. Other structural optimizations, 1 I ] G. Chiola and G. Ciaccio. GAMMA: a Low-cost Network
such as the pipelining of small packets and the fusion of of Workstations Based on Active Messages. In PDP’97 (5th
the receive and send rings are also being studied in order to EUROMICRO workshop on Parallel and Distributed Pro-
improve the data transfer operations. cessing), London, UK, 1997.
121 Compaq Computer Corp, Intel Corporation, Microsoft Cor-
poration. Virtual Intei$ace Architecture Specifcation, De-
6. Conclusions cember 1997. Draft Revision l .
[13] P. Geoffray, L. Prylli, and B. Tourancheau. BIP-SMP: High
performance message passing over a cluster of commodity
Two solutions have been proposed to provide a VIA in- SMPs. In Proceedings of Supercomputing (SC ’99), Port-
terface to Myrinet network interfaces. The first one is asso- land, OR, Nov. 1999.
ciated to the use of M-VIA. It was allowed by the extension [ 141 Intel Corporation. Virtual Interface Architecture Confor-
and optimization of the GM Ethernet driver. Though this mance Suite User’s Guide, December 1998. Preliminary
approach has several advantages, such as a great flexibility Version 0.5.
[ 1.51 Intel Corporation. Virtual Interface Archifecture Devel-
( i t allows the use of different interfaces) or a direct access oper’s Guide, September 1998. Revision 1.O.
to the Ethernet driver, its interest for other-than-academical [ 161 B. S. Patrick Bozeman. A modular high performance imple-
studies is doubtful because of bad performances. The sec- mentation of the virtual interface architecture. In Proceed-
ond solution developed in this study is a VIA layer on top of ings of the Extreme linux workshop, Monterey Califomia,
GM. It achieves really high performances as well as a great June 1999. USENIX Annual Technical Conference.
portability and scalability. It also has drawbacks, the main [17] D. C. Philip Buonadonna, Andrew Geweke. An implemen-
one being the processing of the hand-shake protocol on the tation and analysis of the virtual interface architecture. In
Proceedings of SC98, Orlando, Florida, November 1998.
host. At this time, the library is being tested in its alpha ver-
[ 181 L. Prylli and l3. Tourancheau. BIP: a new protocol designed
sion and integrated in the Myricom software offer. A first for high performance networking on Myrinet. In 1st Work-
release. supported by Myricom, is planned. shop on Personal Computer based Networks Of Worksta-
Future works does not only include enhancements and tions (PC-NOW ’98). Held in conjunction with IPPS/SPDP
optimizations of GVIA. Indeed, if some progress can be 1998. IEEE, Apr. 1998.
made, this architecture has its own limitations: the API of [ 191 L. Prylli, B.Tourancheau, and R. Westrelin. The design for a
G M . To take a full advantage of the Myrinet hardware, it high performance MPI implementation on the Myrinet net-
is needed to slowly introduce some VIA features directly work. In Recent Advances in Parallel Virtual Machine and
Message Passing Interface. Proc. 6th European PVM/MPI
into the MCP in order to get a VIA-aware network interface.
Users’ Group (EuroPVM/MPI ‘99),Barcelona, Spain, Sept.
Such changes include, for example, a scatter and gather sup-
1999.
port or a particular memory protection scheme. However,
