

Definition: Cluster Computing or Computer Clusters

Cluster computing is the technique of linking two or more computers into a network (usually a local area network, or LAN) in order to take advantage of their combined parallel processing power. A computer cluster consists of a set of loosely or tightly connected computers that work together so that in many respects they can be viewed as a single system. The components of a cluster are usually connected to each other through fast LANs, with each node (a computer used as a server) running its own instance of an operating system. At the most fundamental level, when two or more independent computers are combined into a unified system through software and networking to solve a problem, the result is considered a cluster.

Clusters are typically used either for high availability (HA), to provide greater reliability, or for high-performance computing (HPC), to provide greater computational power than a single computer can. As HPC clusters grow in size, they become increasingly complex and time-consuming to manage; tasks such as deployment, maintenance, and monitoring can be handled effectively with an automated cluster computing solution. Computer clusters emerged from the convergence of several computing trends: the availability of low-cost microprocessors, high-speed networks, and software for high-performance distributed computing.


Computational Science and Engineering

Computational science and engineering (CSE) is a relatively new discipline that deals with the development and application of computational models and simulations, often coupled with high-performance computing, to solve complex physical problems arising in engineering analysis and design (computational engineering) as well as in natural phenomena (computational science). CSE has been described as the "third mode of discovery" (alongside theory and experimentation). In many fields, computer simulation is integral, and therefore essential, to business and research: it provides the capability to enter fields that are either inaccessible to traditional experimentation or where traditional empirical inquiry would be prohibitively expensive. CSE should be confused neither with pure computer science nor with computer engineering, although a wide domain of the former is used in CSE (e.g., certain algorithms, data structures, parallel programming, high-performance computing) and some problems in the latter can be modeled and solved with CSE methods (as an application area). In mechanical engineering, CSE applications include combustion simulation, structural dynamics, computational fluid dynamics, computational thermodynamics, computational solid mechanics, vehicle crash simulation, biomechanics, and trajectory calculation of satellites.

CAE - Computer-Aided Engineering

1. Computational fluid dynamics: CFD is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows. Computers are used to perform the calculations required to simulate the interaction of liquids and gases with surfaces defined by boundary conditions. With high-speed supercomputers, better solutions can be achieved. Ongoing research yields software that improves the accuracy and speed of complex simulation scenarios such as transonic or turbulent flows. Initial experimental validation of such software is performed using a wind tunnel with the final validation coming in full-scale testing, e.g. flight tests.

2. Computer-aided design: CAD is the use of computer systems to assist in the creation, modification, analysis, or optimization of a design. In mechanical design it is also known as computer-aided drafting, which describes the process of creating a technical drawing with the use of computer software.

3. Computer-aided manufacturing: CAM is the use of computer software to control machine tools and related machinery in the manufacturing of workpieces. Its primary purpose is to create a faster production process and components and tooling with more precise dimensions and material consistency, which in some cases, uses only the required amount of raw material (thus minimizing waste), while simultaneously reducing energy consumption.

4. Computer-integrated manufacturing: CIM is the manufacturing approach of using computers to control the entire production process. This integration allows individual processes to exchange information with each other and initiate actions. Through the integration of computers, manufacturing can be faster and less error-prone, although the main advantage is the ability to create automated manufacturing processes. Typically CIM relies on closed-loop control processes based on real-time input from sensors. It is also known as flexible design and manufacturing.


NX CAE is a fully integrated, modern engineering environment for modeling and simulation of real-world problems. NX simulation applications include dynamic motion simulation, linear and nonlinear stress analysis, system-level performance simulation, dynamic response simulation, vibration analysis, fluid flow and thermal analyses, durability analysis, multi-physics engineering analysis, and analysis to physical test correlation.

NX Nastran is a finite element solver that analyzes stress, vibration, structural failure/durability, heat transfer, noise/acoustics and flutter/aeroelasticity.

Femap is a CAD-independent, Windows-native pre- and post-processor for advanced engineering FEA. It provides engineers and analysts with an FEA modeling solution to handle even the most complex tasks easily, accurately and affordably.

Solid Edge Simulation is a built-in FEA tool for design engineers to validate part and assembly designs digitally within the Solid Edge environment. Based on proven Femap finite element modeling technology, Solid Edge Simulation significantly reduces the need for physical prototypes, thereby reducing material and testing costs, while saving design time.

The following software components are used by CAE software developers as the foundation for their applications:

Parasolid is 3D geometric modeling component software, enabling users of Parasolid-based products to model complex parts and assemblies. It is used as the geometry engine in hundreds of different CAD, CAM and CAE applications.

D-Cubed Components are six software libraries that can be licensed by software developers for integration into their products. The capabilities they provide include parametric sketching, part and assembly design, motion simulation, collision detection, clearance measurement and hidden line visualization.

NX CAM and CAM Express allow NC programmers to maximize the value of their investments in the latest, most efficient and most capable machine tools. NX CAM provides the full range of functions to address high speed surface machining, multi-function mill-turning, and 5-axis machining. CAM Express provides powerful NC programming with low total cost of ownership.

NX Tooling and Fixture Design offers a set of automated applications for mold and die design, fixture design and other tooling processes built on a foundation of industry knowledge and best practices.

Tecnomatix Part Planning and Validation allows manufacturing engineers, NC programmers, tool designers, and managers to work together to define and validate the part manufacturing process digitally. They can share tooling and resource libraries, and connect the plan data directly to shop floor systems such as DNC and tool management

Ansys provides engineering simulation solution sets spanning the range of simulation that a design process requires. Companies in a wide variety of industries use ANSYS software.


There are several different varieties of computer clusters, each offering different advantages to the user:

High-availability clusters: HA clusters are designed to ensure constant access to service applications. They maintain redundant nodes that can act as backup systems in the event of a failure. The minimum number of nodes in an HA cluster is two (one active and one redundant), though most HA clusters use considerably more. HA clusters aim to solve the problems that arise from mainframe failure in an enterprise: rather than losing all access to IT systems, they ensure 24/7 access to computational power. This is especially important in business, where data processing is usually time-sensitive.

Load-balancing clusters: Load-balancing clusters operate by routing all work through one or more load-balancing front-end nodes, which then distribute the workload efficiently among the remaining active nodes. Load-balancing clusters are extremely useful for those working with limited IT budgets: devoting a few nodes to managing the workflow of a cluster ensures that limited processing power can be optimized.

High-performance clusters: HPC clusters are designed to exploit the parallel processing power of multiple nodes. They are most commonly used for functions that require nodes to communicate as they perform their tasks, for instance when calculation results from one node will affect future results from another. The best-known HPC cluster is Berkeley's SETI@home project, consisting of over 5 million volunteer home computers devoting processing power to the analysis of data from the Arecibo Observatory radio telescope.
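The routing behavior of a load-balancing front-end can be sketched in a few lines. The following is an illustrative round-robin dispatcher, not any particular cluster product; the node names are made up for the example:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Minimal round-robin front-end: each incoming job is routed
    to the next active node in turn."""
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = cycle(self.nodes)

    def route(self, job):
        """Return (node, job): the node chosen to run this job."""
        return (next(self._next), job)

balancer = RoundRobinBalancer(["node01", "node02", "node03"])
assignments = [balancer.route(f"job-{i}") for i in range(6)]
# With 6 jobs and 3 nodes, each node receives exactly 2 jobs.
```

Real load balancers weigh node load, health checks, and session affinity rather than pure rotation, but the front-end/worker split is the same.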


Computer clusters offer a number of benefits over mainframe computers:

Reduced cost: The price of off-the-shelf consumer desktops has plummeted in recent years, and this drop in price has corresponded with a vast increase in their processing power and performance. The average desktop PC today is many times more powerful than the first mainframe computers.

Processing power: The parallel processing power of a high-performance cluster can, in many cases, prove more cost-effective than a mainframe with similar power. This reduced price per unit of computing power enables enterprises to get a greater return on investment from their IT budget.

Improved network technology: Driving the development of computer clusters has been a vast improvement in networking technology, along with a reduction in its price. Computer clusters are typically connected via a single virtual local area network (VLAN), and the network treats each computer as a separate node. Information can be passed through these networks with very little lag, ensuring that data doesn't bottleneck between nodes.

Scalability: Perhaps the greatest advantage of computer clusters is the scalability they offer. While mainframe computers have a fixed processing capacity, computer clusters can easily be expanded as requirements change by adding additional nodes to the network.

Availability: When a mainframe computer fails, the entire system fails. However, if a node in a computer cluster fails, its operations can simply be transferred to another node within the cluster, ensuring that there is no interruption in service.

Introduction

Cluster is a widely used term meaning independent computers combined into a unified system through software and networking. At the most fundamental level, when two or more computers are used together to solve a problem, it is considered a cluster. Clusters are typically used for High Availability (HA) for greater reliability or High Performance Computing (HPC) to provide greater computational power than a single computer can provide.

As high-performance computing (HPC) clusters grow in size, they become increasingly complex and time-consuming to manage. Tasks such as deployment, maintenance, and monitoring of these clusters can be effectively managed using an automated cluster computing solution.

Choosing a processor
The first step in designing a cluster is to choose the building block. The processing power, memory, and disk space of each node as well as the communication bandwidth between the nodes are all factors that can be chosen. You will need to decide which are important based on the mixture of applications you intend to run on the cluster, and the amount of money you have to spend.

Best performance for the price ==> PC (currently dual-Xeon systems)
If maximizing memory and/or disk is important, choose faster workstations
For maximum bandwidth, more expensive workstations may be needed

PCs running Linux are by far the most common choice. They provide the best performance for the price at the moment, offering good CPU speed with cheap memory and disk space. They have smaller L2 cache sizes than some more expensive workstations, which can limit SMP performance, and less main-memory bandwidth, which can limit performance for applications that do not reuse data cache well. The availability of 64-bit PCI-X slots and memory up to 16 GBytes removes several bottlenecks, but newer 64-bit architectures will still perform better for large-memory applications. For applications that require more networking than Gigabit Ethernet can provide, more expensive workstations may be the way to go: you will have fewer but faster nodes, requiring less overall communication, and the memory subsystem can support communication rates upwards of 200-800 MB/sec. When in doubt, it is always a good idea to benchmark your code on the machines that you are considering. If that is not possible, there are many generic benchmarks that you can look at to help you decide. The HINT benchmark developed at the SCL, or a similar benchmark based on the DAXPY kernel shown below, shows the performance of each processor for a range of problem sizes. If your application uses little memory, or heavily reuses data cache, it will operate mainly on the left side of the graph; here the clock rate is important, and the compiler choice can make a big difference. If your application is large and does not reuse data much, the right side will be more representative and the memory speed will be the dominant factor.
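The DAXPY kernel is the BLAS Level-1 operation y ← a·x + y. The benchmark referred to above was not reproduced in this text; as an illustrative stand-in, a minimal Python timing harness over a range of problem sizes might look like this (the original SCL benchmark would have been compiled code, so absolute numbers here are not comparable):

```python
import time

def daxpy(a, x, y):
    """DAXPY kernel: y <- a*x + y, element by element."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y

def benchmark(n, repeats=3):
    """Time DAXPY on vectors of length n and report MFLOP/s
    (2 floating-point operations per element)."""
    x = [1.0] * n
    y = [2.0] * n
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        daxpy(3.0, x, y)
        best = min(best, time.perf_counter() - t0)
    return (2.0 * n / best) / 1e6

# Sweep problem sizes: small vectors stay in cache, large ones
# stress main-memory bandwidth -- the left and right sides of the
# performance graph described above.
for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9}: {benchmark(n):8.1f} MFLOP/s")
```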

Designing the network

Along with the basic building block, you will need to choose the fabric that connects the nodes. As explained above, this depends greatly on the applications you intend to run, the processors you choose, and how much money you have to spend. Gigabit Ethernet is clearly the cheapest. If your application can function with a lower level of communication, this is cheap, reliable, but scales only to around 14 nodes using a flat switch (completely connected cluster, no topology). There are many custom network options such as Myrinet and Giganet. These are costly, but provide the best performance at the moment. They do not scale well, and therefore will force you to have a multilayered network topology of some sort. Don't go this route unless you know what you're doing.

Network            Bandwidth       Cost                     Scalability
Fast Ethernet      11.25 MB/sec    ~free                    > 100 nodes
Gigabit Ethernet   ~110 MB/sec     ~$100 / machine          24-node stackable switches
Myrinet            ~200 MB/sec     > $1000 / machine        small switches
SCI                150? MB/sec     $1100 / 4-port card      2D mesh
InfiniBand         ~800 MB/sec     $900/card + $1000/port   limited to small switches now

NetPIPE graphs can be very useful in characterizing the different network interconnects, at least from a point-to-point view. These graphs show the maximum bandwidth, and the effects of latency pulling the performance down for smaller message sizes. They can also be very useful for fine-tuning a system, from the hardware to the drivers to the message-passing layer.
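The shape of those curves follows from a simple two-parameter model: the time to move an n-byte message is roughly latency + n/bandwidth, so effective throughput approaches the wire rate only for large messages. A small illustrative calculation (the 50 µs latency and 110 MB/s peak are assumed, Gigabit-Ethernet-like figures, not measurements):

```python
def effective_bandwidth(n_bytes, latency_s, peak_bw_bytes_per_s):
    """Effective point-to-point bandwidth under the simple
    t(n) = latency + n/bandwidth transfer-time model."""
    t = latency_s + n_bytes / peak_bw_bytes_per_s
    return n_bytes / t

# Assumed parameters: 50 us latency, 110 MB/s peak bandwidth.
LAT, BW = 50e-6, 110e6
for n in (1_000, 100_000, 10_000_000):
    mb_s = effective_bandwidth(n, LAT, BW) / 1e6
    print(f"{n:>10}-byte message: {mb_s:6.1f} MB/s")
```

Under these assumptions a 1 kB message achieves well under 20 MB/s while a 10 MB message gets close to the 110 MB/s peak, which is exactly the latency-dominated left side and bandwidth-dominated right side of a NetPIPE plot.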

Which OS?
The choice of an OS is largely dictated by the machine that you choose. Linux is always an option on any machine, and is the most common choice. Many of the cluster computing tools were developed under Linux. Linux, and many compilers that run on it, are also available free. With all that being said, there are PC clusters running Windows NT, IBM clusters running AIX, and we have even built a G4 cluster running Linux.

Loading up the software

Message-passing libraries: MPICH, LAM/MPI, MPI/Pro, MP_Lite, PVM
Compilers: GNU gcc and g77, Intel, Portland Group, Compaq compilers
Libraries: Intel MKL (BLAS), ScaLAPACK, ASCI Red BLAS for Linux PCs
Queues: PBS, DQS
Status monitors: statmon

I would recommend choosing one MPI implementation and going with that. PVM is still around, but MPI is the way to go (IMHO). LAM/MPI is distributed as an RPM, so it is the easiest to install, and it also performs reasonably well on clusters. There are many free compilers available, and the availability will of course depend on the OS you choose. For PCs running Linux, the GNU compilers are acceptable. The Intel compilers provide better performance in most cases for the Intel processors, and pricing is reasonable. The Intel or PGI compilers may help on the AMD processors; however, the cluster licenses for the PGI compilers are prohibitively expensive at this point. For Linux on the Alpha processors, Compaq freely distributes the same compilers that are available under Tru64 Unix. There are also many parallel libraries, such as ScaLAPACK, available. For Linux PCs, you may also want to install a BLAS library like the Intel MKL or the one developed at Sandia. If you have many users on a cluster, it may be worthwhile to put on a queueing system. PBS (Portable Batch System) is currently the most advanced and is under heavy development; DQS can also handle multiprocessor jobs, but is not quite as efficient. You will also want users to have a quick view of the status of the cluster as a whole. There are several status monitors freely available, such as statmon, developed locally. None are yet up to where I'd like them to be, although commercial versions give a more active and interactive view.

Assembling the cluster

A freestanding rack costs around $100, and can hold 16 PCs. If you want to get fancier and reduce the footprint of your system, most machines can be ordered with rackmount attachments. You will also need a way to connect a keyboard and monitor to each machine for when things go wrong. You can do this manually, or spend a little money on a KVM (keyboard, video, mouse) switch that makes it easy to access any computer.


Each of the machines in a cluster can be a complete system, usable for a wide range of other computing applications. This leads many people to suggest that cluster parallel computing can simply claim all the "wasted cycles" of workstations sitting idle on people's desks. It is not really so easy to salvage those cycles, and it will probably slow down your co-worker's screen saver, but it can be done.

The current explosion in networked systems means that most of the hardware for building a cluster is being sold in high volume, with correspondingly low "commodity" prices as the result. Further savings come from the fact that only one video card, monitor, and keyboard are needed for each cluster (although you will have to swap these to each machine to perform the initial installation of Linux; once running, a typical Linux PC does not need a "console"). In comparison, SMP* and attached processors are much smaller markets, tending towards somewhat higher price per unit performance.

Cluster computing can scale to very large systems. While it is currently hard to find a Linux-compatible SMP with many more than four processors, most commonly available network hardware easily builds a cluster of up to 16 machines. With a little work, hundreds or even thousands of machines can be networked; in fact, the entire Internet can be viewed as one truly huge cluster.

The fact that replacing a "bad machine" within a cluster is trivial compared to fixing a partly faulty SMP yields much higher availability for carefully designed cluster configurations. This becomes important not only for particular applications that cannot tolerate significant service interruptions, but also for general use of systems containing enough processors that single-machine failures are fairly common. (For example, even though the average time to failure of a PC might be two years, in a cluster with 32 machines the probability that at least one will fail within 6 months is quite high.)
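The parenthetical availability example can be checked directly. Assuming independent, exponentially distributed failure times (an assumption of this sketch, not stated in the text), each machine survives a 6-month window with probability e^(−6/24) ≈ 0.78, and all 32 must survive for the cluster to see no failure:

```python
import math

def prob_at_least_one_failure(n_machines, mttf_months, window_months):
    """P(at least one of n machines fails within the window),
    assuming independent exponential failure times with the
    given mean time to failure."""
    p_survive_one = math.exp(-window_months / mttf_months)
    return 1.0 - p_survive_one ** n_machines

# 32 machines, 2-year (24-month) MTTF, 6-month window.
p = prob_at_least_one_failure(32, 24, 6)
print(f"P(at least one failure in 6 months) = {p:.4f}")
```

The result is above 0.999, confirming that with 32 machines a single-node failure within 6 months is all but certain, which is why trivially replaceable nodes matter so much.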

OK, so clusters are free or cheap and can be very large and highly available... why doesn't everyone use a cluster? Well, there are problems too:

With a few exceptions, network hardware is not designed for parallel processing. Typically latency is very high and bandwidth relatively low compared to SMP and attached processors. For example, SMP latency is generally no more than a few microseconds, but is commonly hundreds or thousands of microseconds for a cluster. SMP communication bandwidth is often more than 100 MBytes/second, whereas even the fastest ATM network connections are more than five times slower. There is very little software support for treating a cluster as a single system.

Thus, the basic story is that clusters offer great potential, but that potential may be very difficult to achieve for most applications. The good news is that there is quite a lot of software support that will help you achieve good performance for programs that are well suited to this environment, and there are also networks designed specifically to widen the range of programs that can achieve good performance.

*SMP: Multiple processors were once the exclusive domain of mainframes and high-end servers. Today, they are common in all kinds of systems, including high-end PCs and workstations. The most common architecture used in these devices is symmetrical multiprocessing (SMP). The term "symmetrical" is both important and misleading: multiple processors are, by definition, symmetrical if any of them can execute any given function.

Cluster Computing vs. Grid Computing

Cluster computing characteristics: tightly coupled computers; a single system image (SSI); a centralized job management and scheduling system. Cluster computing is used for high-performance computing and high-availability computing.

Grid computing characteristics: loosely coupled; distributed job management and scheduling; no single system image. Grid computing is a superset of distributed computing; it is used both for high-throughput computing and for high-performance computing, depending on the underlying installation setup.

Concurrent with this evolution, more capable instrumentation, more powerful processors, and higher-fidelity computer models serve to continually increase the data throughput required of these clusters. This trend puts pressure on the storage systems used to support these I/O-hungry applications, and has prompted a wave of new storage solutions based on the same scale-out approach as cluster computing.

Anatomy of Production High-Throughput Computing Applications

Most of these high-throughput applications can be classified as one of two processing scenarios: data reduction or data generation. In the former, large input datasets, often taken from some scientific instrument, are processed to identify patterns and/or produce aggregated descriptions of the input. This is the most common scenario for seismic processing, as well as for similarly structured analysis applications such as microarray data processing or remote sensing. In the latter scenario, small input datasets (parameters) are used to drive simulations that generate large output datasets, often time-sequenced, that can be further analyzed or visualized. Examples here include crash analysis, combustion models, weather prediction, and the computer-graphics rendering applications used to generate special effects and feature-length animated films.

Divide and Conquer

To address these problems, today's cluster computing approaches utilize what is commonly called a scale-out or shared-nothing approach to parallel computing. In the scale-out model, applications are developed using a divide-and-conquer approach: the problem is decomposed into hundreds, thousands, or even millions of tasks, each of which is executed independently (or nearly independently). The most common decomposition approach exploits a problem's inherent data parallelism, breaking the problem into pieces by identifying the data subsets, or partitions, that comprise the individual tasks, then distributing those tasks and the corresponding data partitions to the compute nodes for processing.
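The decompose/scatter/gather pattern described above can be sketched with Python's standard multiprocessing module standing in for a real cluster scheduler; on an actual cluster the same structure would be expressed with MPI. The sum-of-squares task here is an arbitrary stand-in for a data-reduction workload:

```python
from multiprocessing import Pool

def partition(data, n_parts):
    """Decompose: split the input dataset into roughly equal,
    independent chunks (the data partitions)."""
    k, r = divmod(len(data), n_parts)
    out, start = [], 0
    for i in range(n_parts):
        end = start + k + (1 if i < r else 0)
        out.append(data[start:end])
        start = end
    return out

def reduce_task(chunk):
    """Per-node task: a simple data reduction (sum of squares)."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000))
    parts = partition(data, 4)                  # decompose
    with Pool(4) as pool:
        partials = pool.map(reduce_task, parts)  # scatter + compute
    total = sum(partials)                        # gather / combine
    print(total)  # identical to the serial answer
```

Because each chunk is processed independently and the combine step is a simple sum, the parallel result matches the serial computation exactly; that independence is what makes the problem "embarrassingly" scale-out friendly.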

These scale-out approaches typically employ single- or dual-processor compute nodes in a 1U configuration that facilitates rack-based implementations. Hundreds or even thousands of these nodes are connected to one another with high-speed, low-latency proprietary interconnects such as Myricom's Myrinet or InfiniBand, or with commodity Gigabit Ethernet switches. Each compute node may process one or more application data partitions, depending on the node configuration and the application's computation, memory, and I/O requirements. These partitioned applications are often developed using the Message Passing Interface (MPI) program development and execution environment.

The scale-out environment allows accomplished programmers to exploit a common set of libraries to control overall program execution and to support the processor-to-processor communication required of distributed high-performance computing applications. Scale-out approaches provide effective solutions for many problems addressed by high-performance computing. These two data-intensive scenarios are depicted in figure 1.

It's All About Data Management

However, scalability and performance come at a cost, namely the additional complexity required to break a problem into pieces (data partitions), exchange or replicate information across the pieces when necessary, then put the partial result sets back together into the final answer. This data-parallel approach requires the creation and management of the data partitions and replicas used by the compute nodes. Managing these partitions and replicas poses a number of operational challenges, especially in large cluster and grid computing environments shared among a number of projects or organizations, and in environments where core datasets change regularly. This is typically one of the most time-consuming and complex development problems facing organizations adopting cluster computing.

Scalable shared data storage, equally accessible by all nodes in the cluster, is an obvious vehicle to provide the requisite data storage and access services to compute cluster clients. In addition to providing high bandwidth aggregate data access to the cluster nodes, such systems can provide non-volatile storage for computing checkpoints and can serve as a results gateway for making cluster results immediately available to downstream analysis and visualization tools for an emerging approach known as computational steering.

Yet until recently, these storage systems, implemented using traditional SAN and NAS architectures, have only been able to support modest-sized clusters, typically no more than 32 or 64 nodes.

Shared Storage Architectures

Storage Area Networks (SANs) and Network Attached Storage (NAS) are the predominant storage architectures of the day. SANs extend the block-based, direct attached storage model across a high-performance dedicated switching fabric to provide device sharing capabilities and allow for more flexible utilization of storage resources. LUN and volume management software supports the partitioning and allocation of drives and storage arrays across a number of file or application servers (historically, RDBMS systems). SATA-based SAN storage and further commoditization of FC switching and HBAs have fueled recent growth in the SAN market-and more widespread adoption in high performance computing applications.

Network Attached Storage (NAS) systems utilize commodity computing and networking components to provide manageable storage directly to users through their standard client interconnection infrastructure (100 Mbit or 1 Gbit Ethernet) using shared file-access protocols (NFS and CIFS). This class of storage includes both NAS appliances and systems constructed from DAS and SAN components that export their file systems over NFS and CIFS. NAS devices deliver large numbers of file "transactions" (ops) to a large client population (hundreds or thousands of users) but can be limited in implementation by what has been characterized as the filer bottleneck. Figure 2 depicts traditional DAS, SAN, and NAS storage architectures.

Each of these traditional approaches has its limitations in high performance computing scenarios. SAN architectures improve on the DAS model by providing a pooled resource model for physical storage that can be allocated and reallocated to servers as required. But data is not shared between servers and the number of servers is typically limited to 32 or 64. NAS architectures afford file sharing to thousands of clients, but run into performance limitations as the number of clients increases.

While these storage architectures have served the enterprise computing market well over the years, cluster computing represents a new class of storage system interaction, one requiring high concurrency (thousands of compute nodes) and high aggregate I/O. This model pushes the limits of traditional storage systems.

Enter Scale-Out Storage

So, why not just "scale out" the storage architecture in the same way as the compute cluster, i.e. with multiple file servers supporting the large cluster? In fact, many organizations have tried exactly this. However, bringing additional file servers into the environment greatly complicates storage management: new volumes and mount points are introduced into the application suite, and developers are taxed with designing new strategies for balancing both capacity and bandwidth across the multiple servers or NAS heads. Additionally, such an approach typically requires periodic reassessment and rebalancing of the storage resources, often accompanied by system downtime. In short, these approaches "don't scale", particularly from a manageability perspective.

Now on the horizon are a number of clustered storage systems capable of supporting multiple petabytes of capacity and tens of gigabytes per second of aggregate throughput, all in a single global namespace with dynamic load balancing and data redistribution. These systems extend current SAN and NAS architectures, and are being offered by Panasas (ActiveScale), Cluster File Systems (Lustre), Red Hat (Sistina GFS), IBM (GPFS), SGI (CxFS), Network Appliance (SpinServer), Isilon (IQ), Ibrix (Fusion), TerraScale (Terragrid), ADIC (StorNext), Exanet (ExaStore), and PolyServe (Matrix).

These solutions use the same divide-and-conquer approach as scale-out computing architectures: spreading data across the storage cluster, enhancing data and metadata operation throughput by distributing load, and providing a single point of management and a single namespace for a large, high-performance file system. Clustered storage systems provide:

Scalable performance, in both bandwidth and IOPS.
Uniform shared data access for compute cluster nodes.
Effective resource utilization, including automatic load and capacity balancing.
Multi-protocol interoperability to support a range of production needs, including in-place post-processing and visualization.

The commoditization of computing and networking technology has advanced the penetration of cluster computing into mainstream enterprise computing applications. The next wave of technology commoditization, scalable networked storage architectures, promises to accelerate this trend, fueling the development of new applications and approaches that leverage the increased performance, scalability, and manageability afforded by these systems.

Conclusion

With the advent of cluster computing technology and the availability of low-cost cluster solutions, more research computing applications are being deployed in a cluster environment rather than on a single shared-memory system. High-performance cluster computing is more than just having a large number of computers connected with high-bandwidth, low-latency interconnects: to achieve the intended speed-up and performance, the application itself has to be well parallelized for the distributed-memory environment.


1. Select three or more computers to form the nodes of your cluster. Only one computer will need a monitor, mouse, and keyboard. This computer is called the master node, on which you will perform your operations; it acts as the mainframe of the network.
2. Connect the computers to one another through a flat switch with a basic Ethernet connection.
3. Ensure that all the computers, or nodes, are operating correctly and identically. It is important that all the computers share the same hardware, software, and settings.
4. Install the Linux distribution, Red Hat, onto all the computers in the cluster. Each computer will need a unique host name and IP address.
5. Set up identical user accounts on each node computer using the adduser command as root.
6. Create the necessary host files. Make an ".rhosts" file in both the user and root directories. Also, create a hosts file in the "/etc" directory.
7. Allow all nodes to connect to the other nodes of the private cluster by adding "ALL+" to the "/etc/security" file.
8. Download the most recent version of MPICH to the master node and set it up under the user name that is common to all the nodes, and in the root directory. Issue the command "tar zxfv mpich.tar.gz", or another tar extraction command, to unpack the files, and clone them to all the nodes.
9. Change to the mandel directory, create a "pmandel" executable, and then test your cluster.
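Steps 4 and 6 above require every node to have a unique host name and IP address recorded consistently in the host files. A small helper script can generate matching entries for all nodes; the "node" naming scheme and the 192.168.1.x private address range are illustrative assumptions, not requirements:

```python
def hosts_entries(n_nodes, base_ip="192.168.1.", start=100, prefix="node"):
    """Generate /etc/hosts lines for an n-node private cluster.
    node01 is conventionally the master node."""
    lines = []
    for i in range(1, n_nodes + 1):
        lines.append(f"{base_ip}{start + i}\t{prefix}{i:02d}")
    return lines

# Print entries for a four-node cluster; append these to /etc/hosts
# (and list the host names in .rhosts) on every node.
for line in hosts_entries(4):
    print(line)
```

Generating the entries from one script, rather than typing them on each machine, helps satisfy the requirement in step 3 that all nodes be configured identically.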
