However, selecting the right products, technologies, and partners can be critical to
success. The book High Performance Computing for Dummies gives a brief
introduction to HPC and explains how HPC helps enterprises gain a competitive
edge.
Supercomputer
A supercomputer is a computer that is at the frontline of current processing capacity,
particularly speed of calculation. Supercomputers were introduced in the 1960s and
were designed primarily by Seymour Cray at Control Data Corporation (CDC), which
led the market into the 1970s until Cray left to form his own company, Cray
Research. He then took over the supercomputer market with his new designs, holding
the top spot in supercomputing for five years (1985–1990). In the 1980s a large
number of smaller competitors entered the market, in parallel to the creation of the
minicomputer market a decade earlier, but many of these disappeared in the mid-
1990s "supercomputer market crash".
The term supercomputer itself is rather fluid, and today's supercomputer tends to
become tomorrow's ordinary computer. CDC's early machines were simply very fast
scalar processors, some ten times the speed of the fastest machines offered by other
companies. In the 1970s most supercomputers were dedicated to running a vector
processor, and many of the newer players developed their own such processors at a
lower price to enter the market. The early and mid-1980s saw machines with a modest
number of vector processors working in parallel to become the standard. Typical
numbers of processors were in the range of four to sixteen. In the later 1980s and
1990s, attention turned from vector processors to massive parallel processing systems
with thousands of "ordinary" CPUs, some being off the shelf units and others being
custom designs. Today, parallel designs are based on "off the shelf" server-class
microprocessors, such as the PowerPC, Opteron, or Xeon, and most modern
supercomputers are now highly-tuned computer clusters using commodity processors
combined with custom interconnects.
Contents
• 1 Hardware and software design
o 1.1 Supercomputer challenges, technologies
o 1.2 Processing techniques
o 1.3 Operating systems
o 1.4 Programming
o 1.5 Software tools
• 2 Modern supercomputer architecture
• 3 Special-purpose supercomputers
• 4 The fastest supercomputers today
o 4.1 Measuring supercomputer speed
o 4.2 The Top500 list
o 4.3 Current fastest supercomputer system
o 4.4 Quasi-supercomputing
• 5 Research and development
• 6 Timeline of supercomputers
• 7 See also
• 8 Notes
• 9 External links
Hardware and software design
Supercomputer challenges, technologies
As with all highly parallel systems, Amdahl's law applies, and supercomputer designs
devote great effort to eliminating software serialization and to using hardware to
address the remaining bottlenecks. Technologies developed for supercomputers
include:
• Vector processing
• Liquid cooling
• Non-Uniform Memory Access (NUMA)
• Striped disks (the first instance of what was later called RAID)
• Parallel filesystems
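The Amdahl's law point above can be made concrete with a small illustrative calculation (plain Python; `amdahl_speedup` is our own name, not a library function):

```python
# Amdahl's law: if a fraction s of a workload is inherently serial, the
# best possible speedup on n processors is 1 / (s + (1 - s) / n).
def amdahl_speedup(serial_fraction, n_procs):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# Even 5% serialization caps the speedup near 20x regardless of CPU count,
# which is why supercomputer designs work so hard to eliminate it.
print(round(amdahl_speedup(0.05, 1024), 1))       # ~19.6
print(round(amdahl_speedup(0.05, 1_000_000), 1))  # still only ~20.0
```

Doubling the processor count helps less and less as the serial fraction comes to dominate, which motivates both the software effort and the hardware bottleneck work listed above.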
Processing techniques
Vector processing techniques were first developed for supercomputers and continue to
be used in specialist high-performance applications. Vector processing techniques
have trickled down to the mass market in DSP architectures and SIMD (Single
Instruction Multiple Data) processing instructions for general-purpose computers.
Modern video game consoles in particular use SIMD extensively and this is the basis
for some manufacturers' claim that their game machines are themselves
supercomputers. Indeed, some graphics cards have the computing power of several
teraFLOPS. The applications to which this power could be applied were initially
limited by the special-purpose nature of early video processing. As video processing has become
more sophisticated, graphics processing units (GPUs) have evolved to become more
useful as general-purpose vector processors, and an entire computer science sub-
discipline has arisen to exploit this capability: General-Purpose Computing on
Graphics Processing Units (GPGPU).
Operating systems
Programming
The parallel architectures of supercomputers often dictate the use of special
programming techniques to exploit their speed. The base language of supercomputer
code is, in general, Fortran or C, using special libraries to share data between nodes.
In the most common scenario, environments such as PVM and MPI for loosely
connected clusters and OpenMP for tightly coordinated shared memory machines are
used. Significant effort is required to optimize a problem for the interconnect
characteristics of the machine it will be run on; the aim is to prevent any of the CPUs
from wasting time waiting on data from other nodes.
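The pattern described above, splitting a problem across processing elements and then combining the pieces, can be sketched in plain Python with a stdlib thread pool standing in for MPI ranks. This is an illustration of the decomposition idea only, not real MPI (which would be Fortran or C using MPI_Send/MPI_Reduce and friends):

```python
# Illustrative only: domain decomposition in the spirit of MPI, with a
# stdlib thread pool standing in for compute nodes. A real supercomputer
# code would be Fortran/C using MPI (distributed memory) or OpenMP
# (shared memory) rather than Python threads.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each "rank" computes on its own slice of the data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_ranks=4):
    if not data:
        return 0
    # Split the domain into roughly equal chunks, one per rank.
    size = (len(data) + n_ranks - 1) // n_ranks
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_ranks) as pool:
        # A reduction combines the partial results (like MPI_Reduce).
        return sum(pool.map(partial_sum, chunks))

print(parallel_sum_of_squares(list(range(1000))))  # 332833500
```

The coarse-grained structure, where each worker touches only its own chunk and communicates just one number back, is exactly the property that keeps CPUs from idling while waiting on data from other nodes.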
Software tools
Software tools for distributed processing include standard APIs such as MPI and
PVM, VTL, and open source-based software solutions such as Beowulf, WareWulf,
and openMosix, which facilitate the creation of a supercomputer from a collection of
ordinary workstations or servers. Technology like ZeroConf (Rendezvous/Bonjour)
can be used to create ad hoc computer clusters for specialized software such as
Apple's Shake compositing application. An easy programming language for
supercomputers remains an open research topic in computer science. Several utilities
that would once have cost several thousands of dollars are now completely free thanks
to the open source community that often creates disruptive technology.
As of November 2009 the fastest supercomputer in the world is the Cray XT5 Jaguar
system at the National Center for Computational Sciences, with more than 19,000
computers and 224,000 processing elements based on standard AMD processors. The
fastest heterogeneous machine is IBM Roadrunner, a cluster of 3,240 computers, each
with 40 processing cores, combining both AMD and Cell processors. By contrast,
Columbia is a cluster of 20 machines, each with 512 processors, each of which
processes two data streams concurrently.
Moore's Law and economies of scale are the dominant factors in supercomputer
design: a single modern desktop PC is now more powerful than a ten-year-old
supercomputer, and the design concepts that allowed past supercomputers to out-
perform contemporaneous desktop machines have now been incorporated into
commodity PCs. Furthermore, the costs of chip development and production make it
uneconomical to design custom chips for a small run and favor mass-produced chips
that have enough demand to recoup the cost of production. A current model quad-core
Xeon workstation running at 2.66 GHz will outperform a multimillion dollar Cray
C90 supercomputer used in the early 1990s; most workloads requiring such a
supercomputer in the 1990s can now be done on workstations costing less than 4,000
US dollars. Supercomputing is also moving toward greater density: desktop
supercomputers are becoming available, packing the computing power that in 1998
required a large room into less than a desktop footprint.
In addition, many problems carried out by supercomputers are particularly suitable for
parallelization (in essence, splitting up into smaller parts to be worked on
simultaneously) and, in particular, fairly coarse-grained parallelization that limits the
amount of information that needs to be transferred between independent processing
units. For this reason, traditional supercomputers can be replaced, for many
applications, by "clusters" of computers of standard design, which can be
programmed to act as one large computer.
Special-purpose supercomputers
Special-purpose supercomputers are high-performance computing devices with a
hardware architecture dedicated to a single problem. This allows the use of specially
programmed FPGA chips or even custom VLSI chips, allowing higher
price/performance ratios by sacrificing generality. They are used for applications such
as astrophysics computation and brute-force codebreaking. Historically a new special-
purpose supercomputer has occasionally been faster than the world's fastest general-
purpose supercomputer, by some measure. For example, GRAPE-6 was faster than
the Earth Simulator in 2002 for a particular special set of problems.
"Petascale" supercomputers can process one quadrilion (1015) (1000 trillion) FLOPS.
Exascale is computing performance in the exaflops range. An exaflop is one
quintillion (1018) FLOPS (one million teraflops).
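The prefixes are easy to mix up, so here is the arithmetic written out (plain Python, illustrative only):

```python
# FLOPS prefixes used above: tera = 10^12, peta = 10^15, exa = 10^18.
tera, peta, exa = 10**12, 10**15, 10**18

assert peta == 1000 * tera   # a petascale machine does 1000+ teraflops
assert exa == 1000 * peta    # exascale is 1000x petascale...
assert exa == 10**6 * tera   # ...i.e. one million teraflops

# Jaguar's 1.759 PFLOPS, expressed in teraflops:
print(round(1.759 * peta / tera))  # 1759
```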
Since 1993, the fastest supercomputers have been ranked on the Top500 list according
to their LINPACK benchmark results. The list does not claim to be unbiased or
definitive, but it is a widely cited current definition of the "fastest" supercomputer
available at any given time.
In November 2009, the AMD Opteron-based Cray XT5 Jaguar at the Oak Ridge
National Laboratory was announced as the fastest operational supercomputer, with a
sustained processing rate of 1.759 PFLOPS.[4] [5]
Quasi-supercomputing
The BOINC platform hosts a number of distributed computing projects. As of
December 2009, BOINC recorded a
processing power of over 4.725 petaflops through over 609,000 active computers on
the network.[7] The most active project (measured by computational power),
MilkyWay@home, reports processing power of over 1.4 petaflops through over
30,000 active computers.[8]
The PlayStation 3 Gravity Grid uses a network of 16 machines, and exploits the Cell
processor for the intended application, which is binary black hole coalescence using
perturbation theory.[13][14] The Cell processor has a main CPU and 6 floating-point
vector processors, giving the machine a net of 16 general-purpose machines and 96
vector processors. The machine has a one-time cost of $9,000 to build and is adequate
for black-hole simulations, which would otherwise cost $6,000 per run on a
conventional supercomputer. The black hole calculations are not memory-intensive
and are highly localized, and so are well-suited to this architecture.
Other notable computer clusters are the flash mob cluster and the Beowulf cluster.
The flash mob cluster allows the use of any computer in the network, while the
Beowulf cluster still requires uniform architecture.
In May 2008 a collaboration was announced between NASA, SGI and Intel to build a
1 petaflops computer, Pleiades, in 2009, scaling up to 10 PFLOPs by 2012.[18]
Meanwhile, IBM is constructing a 20 PFLOPs supercomputer at Lawrence Livermore
National Laboratory, named Sequoia, which is scheduled to go online in 2011.
Given the current speed of progress, supercomputers are projected to reach 1 exaflops
(10^18 FLOPS, one quintillion FLOPS) in 2019.[19]
Timeline of supercomputers
This is a list of the record-holders for fastest general-purpose supercomputer in the
world, and the year each one set the record. For entries prior to 1993, this list refers to
various sources[22][citation needed]. From 1993 to present, the list reflects the Top500
listing[23], and the "Peak speed" is given as the "Rmax" rating.
Year | Supercomputer | Peak speed (Rmax) | Location
1938 | Zuse Z1 | 1 OPS | Konrad Zuse, Berlin, Germany
1941 | Zuse Z3 | 20 OPS | Konrad Zuse, Berlin, Germany
1943 | Colossus 1 | 5 kOPS | Post Office Research Station, Bletchley Park, UK
1944 | Colossus 2 (Single Processor) | 25 kOPS | Post Office Research Station, Bletchley Park, UK
1946 | Colossus 2 (Parallel Processor) | 50 kOPS | Post Office Research Station, Bletchley Park, UK
1946 | UPenn ENIAC (before 1948+ modifications) | 5 kOPS | Department of War, Aberdeen Proving Ground, Maryland, USA
1954 | IBM NORC | 67 kOPS | Department of Defense, U.S. Naval Proving Ground, Dahlgren, Virginia, USA
1956 | MIT TX-0 | 83 kOPS | Massachusetts Inst. of Technology, Lexington, Massachusetts, USA
1958 | IBM AN/FSQ-7 | 400 kOPS | 25 U.S. Air Force sites across the continental USA and 1 site in Canada (52 computers)
1960 | UNIVAC LARC | 250 kFLOPS | Atomic Energy Commission (AEC), Lawrence Livermore National Laboratory, California, USA
1961 | IBM 7030 "Stretch" | 1.2 MFLOPS | AEC-Los Alamos National Laboratory, New Mexico, USA
1964 | CDC 6600 | 3 MFLOPS | AEC-Lawrence Livermore National Laboratory, California, USA
1969 | CDC 7600 | 36 MFLOPS | AEC-Lawrence Livermore National Laboratory, California, USA
1974 | CDC STAR-100 | 100 MFLOPS | AEC-Lawrence Livermore National Laboratory, California, USA
1975 | Burroughs ILLIAC IV | 150 MFLOPS | NASA Ames Research Center, California, USA
1976 | Cray-1 | 250 MFLOPS | Energy Research and Development Administration (ERDA), Los Alamos National Laboratory, New Mexico, USA (80+ sold worldwide)
1981 | CDC Cyber 205 | 400 MFLOPS | (~40 systems worldwide)
1983 | Cray X-MP/4 | 941 MFLOPS | U.S. Department of Energy (DoE), Los Alamos National Laboratory; Lawrence Livermore National Laboratory; Battelle; Boeing
1984 | M-13 | 2.4 GFLOPS | Scientific Research Institute of Computer Complexes, Moscow, USSR
1985 | Cray-2/8 | 3.9 GFLOPS | DoE-Lawrence Livermore National Laboratory, California, USA
1989 | ETA10-G/8 | 10.3 GFLOPS | Florida State University, Florida, USA
1990 | NEC SX-3/44R | 23.2 GFLOPS | NEC Fuchu Plant, Fuchū, Tokyo, Japan
1993 | Thinking Machines CM-5/1024 | 59.7 GFLOPS | DoE-Los Alamos National Laboratory; National Security Agency
1993 | Fujitsu Numerical Wind Tunnel | 124.50 GFLOPS | National Aerospace Laboratory, Tokyo, Japan
1993 | Intel Paragon XP/S 140 | 143.40 GFLOPS | DoE-Sandia National Laboratories, New Mexico, USA
1994 | Fujitsu Numerical Wind Tunnel | 170.40 GFLOPS | National Aerospace Laboratory, Tokyo, Japan
1996 | Hitachi SR2201/1024 | 220.4 GFLOPS | University of Tokyo, Japan
1996 | Hitachi/Tsukuba CP-PACS/2048 | 368.2 GFLOPS | Center for Computational Physics, University of Tsukuba, Tsukuba, Japan
1997 | Intel ASCI Red/9152 | 1.338 TFLOPS | DoE-Sandia National Laboratories, New Mexico, USA
1999 | Intel ASCI Red/9632 | 2.3796 TFLOPS | DoE-Sandia National Laboratories, New Mexico, USA
2000 | IBM ASCI White | 7.226 TFLOPS | DoE-Lawrence Livermore National Laboratory, California, USA
2002 | NEC Earth Simulator | 35.86 TFLOPS | Earth Simulator Center, Yokohama, Japan
2004 | IBM Blue Gene/L | 70.72 TFLOPS | DoE/IBM Rochester, Minnesota, USA
2005 | IBM Blue Gene/L | 136.8 TFLOPS; 280.6 TFLOPS | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA
2007 | IBM Blue Gene/L | 478.2 TFLOPS | DoE/U.S. National Nuclear Security Administration, Lawrence Livermore National Laboratory, California, USA
2008 | IBM Roadrunner | 1.026 PFLOPS; 1.105 PFLOPS | DoE-Los Alamos National Laboratory, New Mexico, USA
2009 | Cray Jaguar | 1.759 PFLOPS | DoE-Oak Ridge National Laboratory, Tennessee, USA
Server-side hardware offloads of network communications:-
“Ah!” says the typical cluster-HPC user. “You mean RDMA, like InfiniBand!” (some
people might even remember to cite OpenFabrics, which includes iWARP).
No, that’s not what I mean, and that’s the one of the points of this entry.
The hardware offload that I’m referring to is a host-side network adapter that offloads
most of the networking “work” so that the server’s main CPU(s) don’t have to. In this
way, you can have dedicated (read: very fast/optimized) hardware do the heavy lifting
while the rest of the server’s resources are free to do other stuff. Among other things,
this means that the main CPU(s) don’t have to process all that network traffic,
protocol, and other random associated overhead. Depending on the network protocol
used, offloading to dedicated hardware may or may not save a lot of processing
cycles. Sending and receiving TCP data, for example, may take a lot of cycles in a
software-based protocol stack. Sending and receiving raw ethernet frames may not
(YMMV, of course—depending on your networking hardware, server hardware,
operating system, yadda yadda yadda).
That being said, it’s not just processor cycles that are saved. Caches—both
instruction and data—are likely not to be thrashed. Interrupts may be fired less
frequently. There may be (slightly) less data transferred across internal buses. ...and
so on. All of these things add up: server-side network hardware offload is a Good
Thing; it can make a server generally more efficient because of the combination of
several effects.
Over the years, many MPI implementations have benefited from one form of network
offload or another. MPI implementations that take advantage of hardware offload
typically not only increase efficiency as described above, but also provide true
communication / computation overlap (C/C overlap issues have been discussed in
academic literature for many years). True overlap allows a well-coded MPI
application to start a particular communication, then go off and do other
application-meaningful work while the MPI implementation (the network offload
hardware, for the purposes of this blog entry) progresses most, if not all, of the
message passing independently of the main server processor(s).
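The overlap pattern is easy to see in code. Below is an illustrative Python sketch of the nonblocking start/compute/wait sequence, with a background thread standing in for the offload hardware (`FakeRequest` is our invention for this sketch, not an MPI or vendor API):

```python
# Illustrative sketch of communication/computation overlap: start a
# nonblocking "send", compute while it proceeds, then wait for completion
# (the MPI_Isend/MPI_Wait pattern). FakeRequest and its thread are our
# stand-in for offload hardware, not a real API.
import threading
import time

class FakeRequest:
    def __init__(self, payload):
        self.done = threading.Event()
        # The "offload hardware" (a thread here) progresses the transfer
        # independently of the caller.
        threading.Thread(target=self._progress, args=(payload,)).start()

    def _progress(self, payload):
        time.sleep(0.01)  # pretend the wire transfer takes a while
        self.done.set()

    def wait(self):
        self.done.wait()

req = FakeRequest(b"x" * 1024)  # like MPI_Isend: returns immediately
result = sum(range(10000))      # overlap: useful work while the send runs
req.wait()                      # like MPI_Wait: block until it completes
print(result)                   # 49995000
```

The payoff is entirely in the middle line: any computation done between starting the communication and waiting on it is time the main processor would otherwise have spent pushing bytes.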
Network offload is typically most beneficial with long messages—sending a short
message in its entirety can frequently cost exactly the same as starting a
communication action and then polling it for completion later. The effective overlap
for short messages can be negligible (or even negative). Hence, the biggest “win” of
hardware offload is for well-coded applications that send and receive large messages.
That being said, hardware offload for small messages may also benefit, such as when
associated with deep server-side hardware buffering and the ability to continue
progressing flow control and protocol issues independent of the main processor.
All this being said, note that Remote Direct Memory Access (RDMA) is a popular /
well-known flavor of hardware offload these days—but it is one of many. Vendors
have churned out various forms of network hardware offload over the past 20+ years.
Indeed, there have been many academic discussions over the past few years
discussing returning to the idea of using a “normal” CPU/processor in “dedicated
network” mode (fueled by the fires of manycore, of course): if you have scads and
scads of cores, who’s going to miss one [or more?] of them? Dedicate a few of them
to act as the network proxies for the rest of the cores. Such schemes have both
benefits and drawbacks, of course (and it’s been tried before, but not necessarily in
exactly the same context as manycore). The jury’s still out on how both the
engineering and market forces will affect these ideas.
MPI will use whatever is available when trying to attain high performance—including
hardware offload (such as RDMA). But to be totally clear: RDMA is not what
enables high performance in MPI—hardware offload is (one way) to attain high
performance in an MPI implementation. RDMA just happens to be among the most
recent flavors of network hardware offload.
Data center:-
A data center or datacenter (or datacentre), also called a server farm,[1] is a facility
used to house computer systems and associated components, such as
telecommunications and storage systems. It generally includes redundant or
backup power supplies, redundant data communications connections,
environmental controls (e.g., air conditioning, fire suppression) and security
devices.
Contents
• 1 History
• 2 Requirements for modern data centers
• 3 Data center classification
• 4 Physical layout
• 5 Network infrastructure
• 6 Applications
History
Data centers have their roots in the huge computer rooms of the early ages of the
computing industry. Early computer systems were complex to operate and maintain,
and required a special environment in which to operate. Many cables were necessary
to connect all the components, and methods to accommodate and organize these were
devised, such as standard racks to mount equipment, elevated floors, and cable trays
(installed overhead or under the elevated floor). Also, old computers required a great
deal of power, and had to be cooled to avoid overheating. Security was important –
computers were expensive, and were often used for military purposes. Basic design
guidelines for controlling access to the computer room were therefore devised.
During the boom of the microcomputer industry, and especially during the 1980s,
computers started to be deployed everywhere, in many cases with little or no care
about operating requirements. However, as information technology (IT) operations
started to grow in complexity, companies grew aware of the need to control IT
resources. With the advent of client-server computing, during the 1990s,
microcomputers (now called "servers") started to find their places in the old computer
rooms. The availability of inexpensive networking equipment, coupled with new
standards for network cabling, made it possible to use a hierarchical design that put
the servers in a specific room inside the company. The use of the term "data center,"
as applied to specially designed computer rooms, started to gain popular recognition
about this time.
The boom of data centers came during the dot-com bubble. Companies needed fast
Internet connectivity and nonstop operation to deploy systems and establish a
presence on the Internet. Installing such equipment was not viable for many smaller
companies. Many companies started building very large facilities, called Internet data
centers (IDCs), which provide businesses with a range of solutions for systems
deployment and operation. New technologies and practices were designed to handle
the scale and the operational requirements of such large-scale operations. These
practices eventually migrated toward the private data centers, and were adopted
largely because of their practical results.
Requirements for modern data centers
IT operations are a crucial aspect of most organizational operations. One of the main
concerns is business continuity; companies rely on their information systems to run
their operations. If a system becomes unavailable, company operations may be
impaired or stopped completely. It is necessary to provide a reliable infrastructure for
IT operations, in order to minimize any chance of disruption. Information security is
also a concern, and for this reason a data center has to offer a secure environment
which minimizes the chances of a security breach. A data center must therefore keep
high standards for assuring the integrity and functionality of its hosted computer
environment. This is accomplished through redundancy of both fiber optic cables and
power, which includes emergency backup power generation.
Data center classification
The four tier levels are defined, and copyrighted, by the Uptime Institute, a Santa Fe,
New Mexico-based think tank and professional services organization. The tiers
describe the availability of data from the hardware at a location: the higher the tier,
the greater the availability.[4][5][6]
Physical layout
A data center can occupy one room of a building, one or more floors, or an entire
building. Most of the equipment is often in the form of servers mounted in 19 inch
rack cabinets, which are usually placed in single rows forming corridors between
them. This allows people access to the front and rear of each cabinet. Servers differ
greatly in size from 1U servers to large freestanding storage silos which occupy many
tiles on the floor. Some equipment such as mainframe computers and storage devices
are often as big as the racks themselves, and are placed alongside them. Very large
data centers may use shipping containers packed with 1,000 or more servers each;
when repairs or upgrades are needed, whole containers are replaced (rather than
repairing individual servers).[7]
• Air conditioning is used to control the temperature and humidity in the data
center. ASHRAE's "Thermal Guidelines for Data Processing Environments"[8]
recommends a temperature range of 16–24 °C (61–75 °F) and humidity range
of 40–55% with a maximum dew point of 15°C as optimal for data center
conditions.[9] The electrical power used heats the air in the data center. Unless
the heat is removed, the ambient temperature will rise, resulting in electronic
equipment malfunction. By controlling the air temperature, the server
components at the board level are kept within the manufacturer's specified
temperature/humidity range. Air conditioning systems help control humidity
by cooling the return space air below the dew point. Too much humidity, and
water may begin to condense on internal components. In case of a dry
atmosphere, ancillary humidification systems may add water vapor if the
humidity is too low, which can result in static electricity discharge problems
which may damage components. Subterranean data centers may keep
computer equipment cool while expending less energy than conventional
designs.
• Modern data centers try to use economizer cooling, where they use outside air
to keep the data center cool. Washington state now has a few data centers that
cool all of the servers using outside air 11 months out of the year. They do not
use chillers/air conditioners, which creates potential energy savings in the
millions.[10]
• Backup power consists of one or more uninterruptible power supplies and/or
diesel generators.
• To prevent single points of failure, all elements of the electrical systems,
including backup system, are typically fully duplicated, and critical servers are
connected to both the "A-side" and "B-side" power feeds. This arrangement is
often made to achieve N+1 Redundancy in the systems. Static switches are
sometimes used to ensure instantaneous switchover from one supply to the
other in the event of a power failure.
• Data centers typically have raised flooring made up of 60 cm (2 ft) removable
square tiles. The trend is towards 80–100 cm (31–39 in) void to cater for better
and uniform air distribution. These provide a plenum for air to circulate below
the floor, as part of the air conditioning system, as well as providing space for
power cabling. Data cabling is typically routed through overhead cable trays in
modern data centers, but some still recommend under-floor cabling for security
reasons, or to keep the space above the racks free for the later addition of cooling
systems. Smaller/less expensive
data centers without raised flooring may use anti-static tiles for a flooring
surface. Computer cabinets are often organized into a hot aisle arrangement to
maximize airflow efficiency.
• Data centers feature fire protection systems, including passive and active
design elements, as well as implementation of fire prevention programs in
operations. Smoke detectors are usually installed to provide early warning of a
developing fire by detecting particles generated by smoldering components
prior to the development of flame. This allows investigation, interruption of
power, and manual fire suppression using hand held fire extinguishers before
the fire grows to a large size. A fire sprinkler system is often provided to
control a full scale fire if it develops. Fire sprinklers require 18 in (46 cm) of
clearance (free of cable trays, etc.) below the sprinklers. Clean agent fire
suppression gaseous systems are sometimes installed to suppress a fire earlier
than the fire sprinkler system. Passive fire protection elements include the
installation of fire walls around the data center, so a fire can be restricted to a
portion of the facility for a limited time in the event of the failure of the active
fire protection systems, or if they are not installed. For critical facilities these
firewalls are often insufficient to protect heat-sensitive electronic equipment,
however, because conventional firewall construction is only rated for flame
penetration time, not heat penetration. There are also deficiencies in the
protection of vulnerable entry points into the server room, such as cable
penetrations, coolant line penetrations and air ducts. For mission critical data
centers fireproof vaults with a Class 125 rating are necessary to meet NFPA
75[11] standards.
• Physical security also plays a large role with data centers. Physical access to
the site is usually restricted to selected personnel, with controls including
bollards and mantraps.[12] Video camera surveillance and permanent security
guards are almost always present if the data center is large or contains
sensitive information on any of the systems within. The use of fingerprint-recognition
mantraps is starting to become commonplace.
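As a back-of-the-envelope illustration of why the duplicated A-side/B-side power feeds described in the bullets above matter: if each independent feed is available with probability a, the facility loses power only when both fail at once (a sketch assuming independent failures; the function name is ours):

```python
# Availability of two independent, fully duplicated power paths:
# an outage requires both to fail simultaneously, so
# combined availability = 1 - (1 - a)^2.
def dual_path_availability(a):
    return 1 - (1 - a) ** 2

# Two "two nines" (99%) feeds combine to roughly "four nines":
print(round(dual_path_availability(0.99), 6))  # ~0.9999
```

Real N+1 designs are more involved (shared components, correlated failures, switchover time), but the squaring effect is the basic reason full duplication is worth the cost.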
Network infrastructure
Communications in data centers today are most often based on networks running the
IP protocol suite. Data centers contain a set of routers and switches that transport
traffic between the servers and to the outside world. Redundancy of the Internet
connection is often provided by using two or more upstream service providers (see
Multihoming).
Some of the servers at the data center are used for running the basic Internet and
intranet services needed by internal users in the organization, e.g., e-mail servers,
proxy servers, and DNS servers.
Network security elements are also usually deployed: firewalls, VPN gateways,
intrusion detection systems, etc. Also common are monitoring systems for the
network and some of the applications. Additional off site monitoring systems are also
typical, in case of a failure of communications inside the data center.
Applications
The main purpose of a data center is running the applications that handle the core
business and operational data of the organization. Such systems may be proprietary
and developed internally by the organization, or bought from enterprise software
vendors. Such common applications are ERP and CRM systems.
A data center may be concerned with just operations architecture or it may provide
other services as well.
Often these applications will be composed of multiple hosts, each running a single
component. Common components of such applications are databases, file servers,
application servers, middleware, and various others.
Data centers are also used for off site backups. Companies may subscribe to backup
services provided by a data center. This is often used in conjunction with backup
tapes. Backups of servers can be taken locally onto tapes; however, tapes stored on
site pose a security threat and are also susceptible to fire and flooding. Larger
companies may also send their backups off site for added security. This can be done
by backing up to a data center. Encrypted backups can be sent over the Internet to
another data center where they can be stored securely.
For disaster recovery, several large hardware vendors have developed mobile
solutions that can be installed and made operational in very short time. Vendors such
as Cisco Systems,[13] Sun Microsystems,[14][15] IBM, and HP have developed systems
that could be used for this purpose.[16]
InfiniBand:-
Contents
• 1 Description
o 1.1 Signaling rate
o 1.2 Latency
o 1.3 Topology
o 1.4 Messages
• 2 Programming
• 3 History
Description
Like Fibre Channel, PCI Express, Serial ATA, and many other modern interconnects,
InfiniBand offers point-to-point bidirectional serial links intended for the connection
of processors with high-speed peripherals such as disks. It supports several signalling
rates and, as with PCI Express, links can be bonded together for additional bandwidth.
Effective theoretical throughput in different configurations:
    | Single (SDR) | Double (DDR) | Quad (QDR)
1X  | 2 Gbit/s     | 4 Gbit/s     | 8 Gbit/s
4X  | 8 Gbit/s     | 16 Gbit/s    | 32 Gbit/s
12X | 24 Gbit/s    | 48 Gbit/s    | 96 Gbit/s
Signaling rate
The serial connection's signalling rate is 2.5 gigabits per second (Gbit/s) in each
direction per connection. InfiniBand supports double (DDR) and quad (QDR) data
rates, for 5 Gbit/s or 10 Gbit/s respectively, at the same data-clock rate. Links use
8B/10B encoding (every 10 bits sent carry 8 bits of data), making the useful data
transmission rate four-fifths the raw rate. Thus single, double, and quad data rates
carry 2, 4, or 8 Gbit/s of useful data respectively.
Implementers can aggregate links in units of 4 or 12, called 4X or 12X. A quad-rate
12X link therefore carries 120 Gbit/s raw, or 96 Gbit/s of useful data. As of 2009 most
systems use either a 4X 10 Gbit/s (SDR), 20 Gbit/s (DDR) or 40 Gbit/s (QDR)
connection. Larger systems with 12x links are typically used for cluster and
supercomputer interconnects and for inter-switch connections.
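The rates in the table and paragraphs above follow directly from the 8B/10B encoding and lane aggregation; a quick Python check (the function name is ours, illustrative only):

```python
# Useful InfiniBand bandwidth: every 10 bits on the wire carry 8 data bits,
# so effective rate = per-lane signalling rate * lanes * 8/10.
def effective_gbps(signal_gbps, lanes):
    return signal_gbps * lanes * 8 / 10

print(effective_gbps(2.5, 1))   # SDR 1X:  2.0 Gbit/s useful
print(effective_gbps(10, 4))    # QDR 4X:  32.0 of the 40 Gbit/s raw
print(effective_gbps(10, 12))   # QDR 12X: 96.0 of the 120 Gbit/s raw
```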
Latency
The single data rate switch chips have a latency of 200 nanoseconds, and DDR switch
chips have a latency of 140 nanoseconds. End-to-end MPI latency ranges from
1.07 microseconds (Mellanox ConnectX HCAs) to 1.29 microseconds (Qlogic
InfiniPath HTX HCAs) to 2.6 microseconds (Mellanox InfiniHost III
HCAs).[citation needed] As of 2009 various InfiniBand host channel adapters
(HCA) exist in the market, each with different latency and bandwidth characteristics.
InfiniBand also provides RDMA capabilities for low CPU overhead. The latency for
RDMA operations is less than 1 microsecond (Mellanox ConnectX HCAs).
Topology
As in the channel model used in most mainframe computers, all transmissions begin
or end at a channel adapter. Each processor contains a host channel adapter (HCA)
and each peripheral has a target channel adapter (TCA). These adapters can also
exchange information for security or quality of service.
Messages
Data is transmitted in packets that are taken together to form a message. A message
can be:
• a direct memory access read from, or write to, a remote node (RDMA)
• a channel send or receive
• a transaction-based operation (that can be reversed)
• a multicast transmission
• an atomic operation
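As a purely illustrative grouping of these operation types (the names below are
descriptive labels, not identifiers from the InfiniBand specification):

```python
from enum import Enum, auto

# Descriptive labels for the message operation classes listed above;
# these are not actual wire-protocol or verbs identifiers.
class IBOperation(Enum):
    RDMA_READ = auto()       # direct memory read from a remote node
    RDMA_WRITE = auto()      # direct memory write to a remote node
    SEND = auto()            # channel send
    RECEIVE = auto()         # channel receive
    TRANSACTION = auto()     # reversible transaction-based operation
    MULTICAST = auto()       # multicast transmission
    ATOMIC = auto()          # atomic operation
```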
Programming
InfiniBand has no standard programming interface. The standard only lists a set of
"verbs": functions that must exist. The syntax of these functions is left to the
vendors. The most common syntax to date is the one developed by the OpenFabrics
Alliance, which has been adopted by most of the InfiniBand vendors for both Linux
and Windows. The InfiniBand software stack developed by the OpenFabrics Alliance is
released as the "OpenFabrics Enterprise Distribution" (OFED), under a choice of two
licenses: GPLv2 or a BSD license.
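As a rough sketch of how a verbs-style interface is typically driven, the stub
below traces the canonical setup-and-send sequence. The string labels are
hypothetical stand-ins loosely modelled on the OpenFabrics C verbs
(`ibv_open_device`, `ibv_post_send`, and so on); no real hardware is touched:

```python
# Hypothetical stand-ins for the verbs call sequence; real OFED programs
# use C functions such as ibv_open_device(), ibv_alloc_pd(), ibv_reg_mr(),
# ibv_create_cq(), ibv_create_qp(), ibv_post_send() and ibv_poll_cq().

def rdma_send_sequence():
    """Canonical verbs-style setup followed by one send operation (stubbed)."""
    calls = []
    calls.append("open_device")  # acquire a host channel adapter (HCA)
    calls.append("alloc_pd")     # allocate a protection domain
    calls.append("reg_mr")       # register (pin) a memory region
    calls.append("create_cq")    # create a completion queue
    calls.append("create_qp")    # create a queue pair (send + receive queues)
    calls.append("post_send")    # post a send/RDMA work request
    calls.append("poll_cq")      # poll the completion queue for the result
    return calls

print(" -> ".join(rdma_send_sequence()))
```

Vendors implement these steps with their own syntax; the ordering (device, then
protection domain, then memory registration, then queues) is the common pattern.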
History
InfiniBand originated from the 1999 merger of two competing designs: Future I/O,
backed by Compaq, IBM and Hewlett-Packard, and Next Generation I/O (NGIO), backed
by Intel, Microsoft and Sun. From the Compaq side, the roots of the technology
derived from Tandem's ServerNet. For a short time before the group settled on a new
name, InfiniBand was called System I/O.[1]
As of 2009 InfiniBand has become the de facto interconnect of choice for high
performance computing, and its adoption as seen in the TOP500 supercomputers list
has grown faster than that of Ethernet.[2] Note, however, that the TOP500 ranking
uses the Linpack benchmark, a highly parallel workload that places relatively
modest demands on the interconnect; InfiniBand should not be confused with the
custom-built interconnects of vector supercomputers. For example, the NEC SX-9
provides 128 GB/s of low-latency interconnect bandwidth between computing nodes,
compared to the 96 Gbit/s of an InfiniBand 12X quad data rate link. Adoption in
enterprise datacenters has been more limited. Today InfiniBand is used mostly for
performance-focused computer cluster applications, and there are some efforts to
adapt it as a "standard" interconnect between low-cost machines as well. A number
of the TOP500 supercomputers have used InfiniBand, including the former[3] reigning
fastest supercomputer, the IBM Roadrunner. In another example of InfiniBand use
within high performance computing, the Cray XD1 uses built-in Mellanox InfiniBand
switches to create a fabric between HyperTransport-connected Opteron-based
compute nodes.[citation needed]
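Since the SX-9 comparison above mixes units (gigabytes versus gigabits per
second), a small conversion makes the roughly tenfold gap explicit:

```python
# The SX-9 figure is quoted in gigabytes per second, the InfiniBand
# figure in gigabits per second; convert (1 byte = 8 bits) to compare.

def gbit_to_gbyte(gbit_per_s: float) -> float:
    return gbit_per_s / 8

infiniband_12x_qdr = gbit_to_gbyte(96)  # useful data rate in GB/s
sx9_per_node = 128.0                    # GB/s, as quoted in the text
print(infiniband_12x_qdr, sx9_per_node / infiniband_12x_qdr)
```

So a 12X QDR link delivers 12 GB/s of useful data, roughly a tenth of the SX-9's
per-node interconnect bandwidth.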
SGI, LSI and DDN, among others, have also released storage systems utilizing
InfiniBand "target adapters". These products essentially compete with architectures
such as Fibre Channel, SCSI, and other more traditional connectivity methods. Such
target-adapter-based disks can become part of the fabric of a given network, in a
fashion similar to DEC VMS clustering. The advantage of this configuration is lower
latency and higher availability to nodes on the network (because of the fabric
nature of the network). In 2009, the Jaguar Spider storage system used this type of
InfiniBand-attached storage to deliver over 240 gigabytes per second of bandwidth.
InfiniBand uses copper CX4 cable, which is also commonly used to connect SAS
(Serial Attached SCSI) HBAs to external (SAS) disk arrays. With SAS, this is known
as an SFF-8470 connector and is referred to as an "InfiniBand-style" connector.[4]
In 2008 Oracle Corporation released its HP Oracle Database Machine, built as a RAC
database (Real Application Clusters database) with storage provided by its Exadata
Storage Server, which uses InfiniBand as the back-end interconnect for all I/O and
interconnect traffic. Updated versions of the Exadata storage system, now using Sun
computing hardware, continue to use InfiniBand infrastructure.
In 2009, IBM announced a December 2009 release date for their DB2 pureScale
offering, a shared-disk clustering scheme (inspired by Parallel Sysplex for DB2 on
z/OS) that uses a cluster of IBM System p servers (POWER6/7) communicating with
each other over an InfiniBand interconnect.