Myricom's 10-Gigabit Ethernet NICs support both Ethernet and Myrinet network protocols at the Data Link level (layer 2). Myri-10G switches retain the efficiency and scalability of layer-2 Myrinet switching internally, but can have a mix of 10Gb Ethernet and 10Gb Myrinet ports externally.
Myricom's Myri-10G Software Support for Network Direct (MS MPI) on either 10-Gigabit Ethernet or 10-Gigabit Myrinet
30 March 2009, 2nd Meeting of the WinHPC UG Dresden
Dr. Markus Fischer, Senior Software Architect, fischer@myri.com
www.myri.com © 2009 Myricom, Inc.
Myri-10G
- Based on 10-Gigabit Ethernet PHYs (layer 1)
  - Standard 10-Gigabit Ethernet cabling, copper and fiber
- Myri-10G NICs support both Ethernet and Myrinet network protocols at the Data Link level (layer 2) and, like earlier Myricom NICs, include processors and firmware for offload and kernel-bypass operation
- Myri-10G switches retain the efficiency and scalability of layer-2 Myrinet switching internally but can have a mix of 10-Gigabit Myrinet and 10-Gigabit Ethernet ports externally
4th-generation Myricom products: a convergence at 10-Gigabit/s data rates of Myrinet with Ethernet
The Spectrum of Myri-10G Applications
- 10-Gigabit Ethernet solutions for general networking
  - 10-Gigabit Ethernet NICs with wire-speed TCP/IP performance, low cost, fully compliant with Ethernet standards, interoperable with the 10-Gigabit Ethernet products of other companies
  - 10-Gigabit Ethernet layer-2 switches that scale efficiently to thousands of ports, because they are Myrinet switches inside
  - Optional capabilities enabled by NIC firmware for low-latency, low-host-CPU-load, kernel-bypass operation over either 10Gb Ethernet or 10Gb Myrinet networks: MX (Myrinet Express) for HPC; Video Pump(TM) for video streaming; RAP for packet capture and playback; and more
- 10-Gigabit Ethernet and Myrinet solutions for HPC
  - A complete, low-latency, scalable, cluster-interconnect solution: NICs, software, and switches software-compatible with Myrinet-2000, and highly interoperable with 10-Gigabit Ethernet
Myri-10G NICs
- Protocol-offload 10-Gigabit Ethernet NICs
  - Myri10GE software distribution
  - 10G Ethernet switch
  - 9 µs TCP/IP latency
  - 9.5-9.9 Gbit/s TCP/IP data rate
  - Low host-CPU utilization
- Kernel-bypass, low-latency, 10-Gigabit Myrinet NICs
  - MX-10G software distribution
  - 10G Myrinet switch
  - 2 µs MPI latency
  - 1.2 GByte/s MPI data rate
  - Very low host-CPU utilization
PCI Express x8 NICs, available with many different 10GbE PHYs and form factors. These NICs are all based on the Myricom Lanai-Z8E or -Z8ES chips, and share the same software support and optional firmware-enabled performance features.
[Photos: NIC with two SFP+ ports for Direct Attach, 10GBase-SR, or 10GBase-LR; NIC with one QSFP port for Direct Attach or EOE fiber cables]
Lanai Z8ES Chip
- Lanai Z8E core with 2MB of on-chip fast SRAM, plus functional enhancements: 2 ports for failover, SR-IOV, SMBus, and more
- I/O is PCIe x8 and two XAUI ports
- 21.5% higher clock rate than the Z8E
- 2.5W typical at 364.6MHz
- 25mm square FCBGA package
[Figure: Block Diagram of the Lanai Z8ES Chip]
Examples of Lanai-Z8ES-based NICs
- Simple NIC, two ports for failover (10G-PCIE-8B-2S, -2I, -2C): one Lanai Z8ES, PCIe x8, network ports at 10 Gb/s
- Double NIC for blades and motherboards (10G-PCIE-8B2-4I): two Lanai Z8ES chips, each with PCIe x8, network ports at 20 Gb/s
- Double NIC Gen2 PCIe add-in card (10G-PCIE2-8B2-2QP, -2S, -2C): two Lanai Z8ES chips behind a PCIe switch, Gen 2 PCIe x8, network ports at 20 Gb/s
Preferred NIC for HPC Applications
Myricom 10G-PCIE-8B-QP (Advance Product Information)
- QSFP network port; PCI Express x8 host port
- 3W with copper cables; add 1W for EOE cable
- Indiana University and TU Dresden used these fast but low-power NICs in an IBM iDataplex system together with a Myri-10G switch to win the SC08 Cluster Challenge (performance under power limits)
2-Port NIC for the Data Center
Myricom 10G-PCIE-8B-2S
- 2 x SFP+ network ports; PCI Express x8 host port
- Two ports for failover (HA)
- Low-profile PCI Express x8 add-in card
- Dual-protocol network ports: 10-Gigabit Ethernet or 10-Gigabit Myrinet
- Wire-speed performance from either port
- Firmware-controlled offloads
- 6 Watts typical, including power for two SFP+ transceivers
Gen2 PCIe v2.0 (5GT/s) NIC with 2 SFP+ ports for Performance (20Gb/s) and Failover
Product code: 10G-PCIE2-8B2-2S
- 2 SFP+ network ports; PCI Express 5GT/s x8 host port
- IP: The two active ports can carry TCP/IP traffic concurrently at an aggregate data rate of 19.8 Gb/s with a 9KB MTU, or 18.9 Gb/s with a 1500B MTU.
- MX: Can use both ports for data transfers, allowing for 2x bandwidth (NIC bonding)
Myri-10G High Speed Expansion Cards (HSECs) for the IBM BladeCenter H
- Product code 10G-PCIE-8B2-4I: 4 ports (2 for performance, 2 for failover), 20 Gb/s throughput, two PCIe devices, 6 Watts
- Product code 10G-PCIE-8B-2I: 2 ports for failover, 10 Gb/s throughput, single PCIe device, 3 Watts
Fastest, lowest-cost, lowest-power Ethernet expansion cards available for the IBM BladeCenter H
Myri-10G Modular Switches
Myricom's modular Myri-10G switches are based on a 110ns-latency 32-port Myrinet-protocol crossbar-switch chip. The photos show the current choice of enclosures for these modular switches. Different mixes of Ethernet or Myrinet external ports, as well as ports with different PHYs, are configured by using different types of line cards. The line cards plug into a backplane that connects the switch chips on the line cards in a diameter-3 full-bisection Clos network. For applications requiring more than 512 host ports, the switch networks are scalable by interconnecting the internal Myrinet fabrics.
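As a rough check on the 512-port figure, the sketch below sizes a two-level (diameter-3) full-bisection Clos network built from 32-port crossbars. The even split of each leaf crossbar into host-facing and spine-facing ports, and the function name `clos_capacity`, are assumptions made for illustration; the actual line-card and backplane layout may differ, and cable and PHY delays are ignored.

```python
def clos_capacity(radix: int, hop_latency_ns: float) -> dict:
    """Size a two-level full-bisection Clos built from radix-port crossbars."""
    if radix % 2:
        raise ValueError("radix must be even for a full-bisection split")
    down = radix // 2   # host-facing ports per leaf crossbar
    up = radix - down   # spine-facing ports per leaf crossbar
    leaves = radix      # each spine port connects to a distinct leaf
    spines = up         # one spine crossbar per leaf uplink
    return {
        "host_ports": leaves * down,
        "crossbars": leaves + spines,
        "worst_case_hops": 3,  # leaf -> spine -> leaf
        "crossbar_latency_ns": 3 * hop_latency_ns,
    }


result = clos_capacity(radix=32, hop_latency_ns=110)
print(result)
```

Under these assumptions, 32-port crossbars give 512 host ports on 48 crossbars, and the longest path accumulates 330 ns of crossbar latency.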
Myri-10G Software
- The driver and firmware for 10-Gigabit Ethernet operation (Myri10GE software distribution) is included with the NIC
- The broader software support for 10-Gigabit Myrinet and Low-Latency 10-Gigabit Ethernet is MX-10G (Myrinet Express)
Software distribution: MX-10G
- Host APIs: MX kernel bypass (Sockets over MX + MPI over MX); IP Sockets
- Host protocols: TCP/IP, UDP/IP host-OS network stack
- Network protocols on Myrinet: IPoM (IP over Myrinet), MXoM (MX over Myrinet)
- Network protocols on Ethernet: IPoE (IP over Ethernet), MXoE (MX over Ethernet)
Software distribution: Myri10GE
- Host APIs: IP Sockets
- Host protocols: TCP/IP, UDP/IP host-OS network stack
- Network protocols on Ethernet: IPoE (IP over Ethernet)
MX-10G, Video Pump, and other software distributions are functional supersets of the Myri10GE software.
Myri-10G 10-Gigabit Ethernet NICs
- Hopefully, HPC people will excuse this short story about Myricom's very popular 10GbE NICs
- Remember that Ethernet is the most widely used interconnect for HPC clusters, and even dominates the TOP500 list
Myri-10G NICs in the 10GbE Market
- Demonstrated interoperability with 10-Gigabit Ethernet switches from Myricom, Foundry, Extreme, HP, Quadrics, Fujitsu, SMC, Force10, Cisco, Blade Network, Broadcom, Arista, ... (the list keeps growing)
- The Myri10GE driver and firmware are currently available for Linux, Windows, Solaris, Mac OS X, FreeBSD, and VMware ESX
  - The driver was contributed to and accepted in the Linux kernel; it is included in the 2.6.18 and later kernels and in many Linux distributions
  - WHQL-certified for Windows XP, Windows Server 2003 and 2008, and Vista; NDIS 5.1, 5.2, and NDIS 6.0
- Highly effective stateless offloads: zero-copy send, checksum offloads, LRO, TSO, RSS, interrupt coalescing, multicast filtering, ...
  - Not a TOE; no troublesome stateful offloads; no OS patches. See http://www.linuxfoundation.org/en/Net:TOE
Myri-10G TCP/IP Performance
- The excellent netperf benchmark results below are with Linux (2.6.20) between servers with two Intel quad-core 2.66GHz Xeon X5355s:

  Netperf Test  MTU   BW       TX_CPU %  RX_CPU %
  ------------  ----  -------  --------  --------
  TCP_STREAM    9000  9910.26  11.32     5.89
  TCP_SENDFILE  9000  9894.46   3.37     5.91
  TCP_STREAM    1500  9463.06  10.54     8.71
  TCP_SENDFILE  1500  9354.66   2.75     8.67

- See http://www.myri.com/scs/performance/Myri10GE/ for additional TCP/IP and UDP/IP performance benchmarks
- NIC firmware allowed Myricom to take advantage of advances such as PCIe Direct Cache Access (DCA) as soon as they appeared in servers
- See http://www.10gbe.net/ for independent comparisons
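To put the TCP_STREAM numbers in context, the following sketch estimates the theoretical TCP payload ceiling on a 10 Gbit/s Ethernet link for each MTU. It assumes the BW column is in 10^6 bits/s (netperf's default unit), IPv4 and TCP headers without options, and no VLAN tag; none of these settings are stated on the slide, and the function name is illustrative.

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, MAC header, FCS, inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP headers, assuming no options (bytes)
LINE_RATE_MBPS = 10_000         # 10-Gigabit Ethernet data rate


def tcp_goodput_ceiling(mtu: int) -> float:
    """Upper bound on TCP payload throughput in Mbit/s for a given MTU."""
    payload = mtu - IP_TCP_HEADERS  # TCP payload bytes per frame
    on_wire = mtu + ETH_OVERHEAD    # bytes the frame occupies on the wire
    return LINE_RATE_MBPS * payload / on_wire


measured = {9000: 9910.26, 1500: 9463.06}  # TCP_STREAM rows from the table above
for mtu, bw in measured.items():
    ceiling = tcp_goodput_ceiling(mtu)
    print(f"MTU {mtu}: ceiling {ceiling:.1f} Mbit/s, measured is {bw / ceiling:.2%} of it")
```

Under these assumptions the ceilings are about 9913.7 Mbit/s (MTU 9000) and 9492.8 Mbit/s (MTU 1500), so both measured rates sit within roughly half a percent of what the wire can carry.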
Myri-10G TCP/IP Performance on Windows
Myri-10GE Performance on Windows: Ntttcp Results, MTU 9000
Commands:
  Sender:   ntttcps -m 1,1,10.0.130.50 -l 1048576 -n 100000 -w -v -a 8
  Receiver: ntttcpr -m 1,1,10.0.130.50 -l 1048576 -rb 2097152 -n 1000000 -w -v -a 8
Results on the Sender:
  Total Bytes(MEG)  Realtime(s)  Average Frame Size  Total Throughput(Mbit/s)
  ================  ===========  ==================  ========================
  104857.600000     85.500       60667.263           9811.237

  Packets Sent  Packets Received  Total Retransmits  Total Errors  Avg. CPU %
  ============  ================  =================  ============  ==========
  1728405       281845            2                  0             10.70
Results on the Receiver:
  Total Bytes(MEG)  Realtime(s)  Average Frame Size  Total Throughput(Mbit/s)
  ================  ===========  ==================  ========================
  104857.600000     85.735       8959.587            9784.345

  Packets Sent  Packets Received  Total Retransmits  Total Errors  Avg. CPU %
  ============  ================  =================  ============  ==========
  281837        11703396          0                  0             27.27
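The reported figures can be cross-checked from the sender's command-line parameters and the packet counts. The sketch below is only an arithmetic check; the decimal interpretation of "MEG" and "Mbit" (10^6) is inferred from the numbers on the slide rather than taken from ntttcp documentation, and the helper name is hypothetical.

```python
def ntttcp_summary(buffer_bytes: int, buffers: int, seconds: float, packets: int) -> dict:
    """Recompute the derived columns of an ntttcp report from raw counts."""
    total_bytes = buffer_bytes * buffers
    return {
        "total_meg": total_bytes / 1e6,                        # Total Bytes(MEG)
        "throughput_mbit_s": total_bytes * 8 / 1e6 / seconds,  # Total Throughput(Mbit/s)
        "avg_frame_bytes": total_bytes / packets,              # Average Frame Size
    }


# -l 1048576 -n 100000 on the sender; frame sizes use packets sent (sender)
# and packets received (receiver), as listed in the results.
sender = ntttcp_summary(1048576, 100000, 85.500, 1728405)
receiver = ntttcp_summary(1048576, 100000, 85.735, 11703396)
print(sender)
print(receiver)
```

The large average frame size on the sender (about 60 KB, well above the 9000-byte MTU) is consistent with segmentation being offloaded to the NIC, while the receiver sees wire-sized frames of just under 9000 bytes.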
Myri-10G for HPC Applications
The Power of Programmable NICs
MX Software Interfaces
[Figure: software stack diagram. Labels, top to bottom: Applications; MPI, Sockets, Other Middleware (kernel bypass); UDP, TCP, IP (initialization & IP traffic); Ethernet driver and MX driver, in the host OS kernel; Ethernet NIC, MX firmware in the Myri NIC; Myrinet-2000, 10G Myrinet, or 10G Ethernet ports]
MX Software
- MX-10G is the low-level message-passing system for low-latency, low-host-CPU-utilization, kernel-bypass operation of Myri-10G NICs over either 10Gb Myrinet or 10Gb Ethernet networks
- MX-2G for Myrinet-2000 PCI-X NICs was released in June 2005
  - Myricom software support always spans two generations of NICs
  - MX-2G and MX-10G are fully compatible at the application level
- Includes TCP/IP, UDP/IP, MPICH-MX, and Sockets-MX
  - Also available: MPICH2-MX, OpenMPI, HP-MPI, Intel-MPI, ...
- Supports NIC bonding (teaming)
  - With N NICs, nearly N times the data rate at the same latency
- Cluster file systems operate directly over MX
  - Lustre-MX, PVFS2-MX, others in progress
MX over Ethernet
- Myricom extended MX-10G to operate over 10-Gigabit Ethernet as well as 10-Gigabit Myrinet
- MXoE works with Myri-10G NICs (kernel bypass) and standard 10-Gigabit Ethernet switches
- MXoE uses Ethernet as a layer-2 network with an MX EtherType to identify MX packets (frames)
  - MX has its own efficient protocols for reliable delivery
- The Myri-10G NICs can carry IP traffic (IPoE) together with MX traffic (MXoE)
- Myricom has made the MXoE protocols open for use with other Ethernet NICs
  - See the Open-MX project (http://open-mx.gforge.inria.fr/)
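The EtherType-based separation of MX and IP traffic described above can be illustrated with a small frame classifier. This is a minimal sketch, not Myricom's firmware logic: the constant `ASSUMED_MX_ETHERTYPE` (0x86DF, believed to be the value used by Open-MX) is an assumption that should be verified before use, the function name is hypothetical, and VLAN-tagged frames are not handled.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ASSUMED_MX_ETHERTYPE = 0x86DF  # assumption, not stated in the slides; verify before relying on it


def classify_frame(frame: bytes) -> str:
    """Return 'MXoE', 'IPoE', or 'other' based on the Ethernet EtherType field."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType (big-endian)
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ASSUMED_MX_ETHERTYPE:
        return "MXoE"
    if ethertype == ETHERTYPE_IPV4:
        return "IPoE"
    return "other"


macs = bytes(12)  # placeholder destination and source MAC addresses
print(classify_frame(macs + struct.pack("!H", ASSUMED_MX_ETHERTYPE) + b"mx payload"))
print(classify_frame(macs + struct.pack("!H", ETHERTYPE_IPV4) + b"ip payload"))
```

Because the decision depends only on a layer-2 header field, standard Ethernet switches forward both kinds of frame without needing to understand MX.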
Myri-10G Software Stacks for Windows
- MX drivers are WHQL certified
- Carry both IP and MX traffic efficiently
- Windows CCS: WSD-MX allows for TCP Sockets acceleration, achieving line rate even for Pingpong
- HPC Server 2008:
  - WSD-MX (lower latency)
  - NdProv-MX as a Provider for Network Direct (ND Logo tested)
[Figure: MSMPI Performance using NdProv-MX]
Thank you! More questions?
Extras
Or to help answer likely questions
Myri-10G in the TOP500 List
- We're pleased about the Myrinet 10G clusters in the Nov-2008 TOP500 list, i.e., the T2K cluster at the University of Tokyo (#27), and the clusters at Clemson and USC (#60 & #61).
- However, in keeping with Myricom's diversification, our Myri-10G products are used invisibly in many other TOP500 systems, for example:
  - IBM used several hundred Myri-10G 10GbE NICs in the Roadrunner system (#1) for storage
  - The IBM Blue Gene/P system at Argonne National Laboratory (#5) uses a Myri-10G dual-protocol switch with nearly 1000 ports, 10Gb Ethernet to the Blue Gene/P racks and 10Gb Myrinet to the file-system cluster
  - Other TOP500 Blue Gene/P systems that use Myri-10G switches for storage switching: IDRIS (#16) and EDF (#24), both in France
  - ... and similar off-the-list use in many other TOP500 systems
Photos of the T2K Cluster at U Tokyo
Photos taken while the cluster was being installed. The installation uses 14 21U switches and CX4 cabling. This cluster is much too large and too tightly packed in its room to capture in a single photograph.
Cabling Change in our HPC Offerings
- We are phasing out CX4 cabling for HPC clusters and inter-switch links for switches with more than 512 host ports
- We recommend the use of components with QSFP (QP) ports, with QSFP-terminated copper cables up to 5m, or with QSFP-terminated EOE (Electrical-Optical-Electrical) cables for distances larger than 5m
- Additional cabling options, such as QSFP-terminated copper cables with active circuitry in the cable ends, are expected
[Photo: Myricom 10G-QP-10M QSFP-terminated EOE cable]
The Dominance of Ethernet
- The Myricom technical team that designed Myri-10G (starting in 2002) was not willing to settle for just a 4th generation of Myrinet, but decided that Myricom should diversify from a specialty network company into mainstream Ethernet.
- In retrospect, Myri-10G anticipated the current revolution in Ethernet, in which the standards bodies and many companies are engaged today in developing what is variously called Low Latency Ethernet, Data Center Ethernet, Convergence Enhanced Ethernet, or Lossless Ethernet.
  - Application targets: HPC, Storage, Fibre Channel over Ethernet (FCoE), and others
  - The goal: Ethernet domination; the elimination of specialty networks
- Standards-based Ethernet will soon be capable of everything that the specialty networks Myrinet, Fibre Channel, and InfiniBand do today, but in the meantime Myri-10G comes closer than any other commercial product to achieving this goal.
- Aided in part by the firmware in the programmable NICs, Myri-10G already supports FCoE and multiple forms of HPCoE.
Extra: Myth and Truth about I/O Performance
Data Rate vs Signaling Rate (8b/10b Encoding)
- Traditional networking such as Ethernet advertises the data rate. 10Gb/s is the data rate of 10-Gigabit Ethernet, whatever the signal encoding.
- IB marketing advertises the signaling rate with 8b/10b encoding, so don't be disappointed when running performance tests. 10Gb/s IB is really 8Gb/s data rate; 20Gb/s IB is 16Gb/s data rate; and 40Gb/s IB is 32Gb/s data rate.
- PCI Express also lists the signaling rate (GT/s = GigaTransfers/s = GBaud per lane), and is also 8b/10b encoded, so multiply by 8/10 and the number of lanes to get the data rate.
  - Gen 1 PCI Express (2.5GT/s) x8 = 8 * 2.5 Gb/s (signaling) = 20 Gb/s (signaling), i.e. 16 Gb/s (data rate) in each direction
    - Good implementations achieve 13 Gb/s data rate after protocol overhead
    - Well suited for 10Gb Ethernet
  - Gen 2 PCI Express (5GT/s) x8 = 8 * 5.0 Gb/s (signaling) = 40 Gb/s (signaling), i.e. 32 Gb/s (data rate) in each direction
    - A good implementation achieves 25 Gb/s data rate after protocol overhead
    - Well suited for dual-port 10Gb Ethernet NICs, allowing true 2 * 10Gb/s
- Now, think about how to handle 40Gb/s or 80Gb/s?
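The 8b/10b arithmetic above can be written out as a short calculation. This is a minimal sketch of the conversion only; the function name is illustrative, and the post-protocol-overhead figures (13 Gb/s and 25 Gb/s) are measured values from the slide that the formula does not derive.

```python
def data_rate_gbps(signaling_gbps: float, lanes: int = 1) -> float:
    """Convert an 8b/10b signaling rate to a data rate: 8 data bits per 10 line bits."""
    return signaling_gbps * lanes * 8 / 10


# PCI Express x8: the per-lane signaling rate equals the GT/s figure
print("PCIe Gen 1 x8:", data_rate_gbps(2.5, lanes=8), "Gb/s per direction")
print("PCIe Gen 2 x8:", data_rate_gbps(5.0, lanes=8), "Gb/s per direction")

# InfiniBand: the advertised figures are aggregate signaling rates
for advertised in (10, 20, 40):
    print(f"{advertised} Gb/s IB:", data_rate_gbps(advertised), "Gb/s data rate")
```

Ethernet needs no such conversion because, as noted above, its advertised 10 Gb/s is already the data rate.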