
Amazon Web Services:

Performance Analysis of High Performance Computing Applications on the


Amazon Web Services Cloud

Summarized by: Michael Riera


9/17/2011

University of Central Florida CDA5532

Agenda

Purpose
Benchmarks used
Machine Setups (including EC2)
Experiment Setup
Results
Conclusions

Introduction
The purpose of this paper is to compare
Amazon EC2 service performance against
industry standard benchmarks for High
Performance Computing data centers.
This paper draws comparisons between
known supercomputers, an HPC cluster,
and AWS EC2.

Benchmarks
NERSC Framework
Workload includes:

Climate modeling
Materials science
Fusion
Accelerator modeling
Astrophysics
Quantum Chromodynamics

Integrated Performance Monitoring


Used to quantify the computation and communication
of applications that use MPI.
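IPM works by interposing on MPI calls (via the PMPI profiling layer) and accumulating time per call. A minimal Python sketch of the same interposition idea, with a hypothetical stand-in function rather than real MPI:

```python
import time
from collections import defaultdict

# Total wall-clock time accumulated per wrapped call name.
profile = defaultdict(float)

def interpose(fn):
    """Wrap a function and accumulate its wall-clock time under its name,
    loosely mimicking how IPM interposes on MPI calls via the PMPI layer."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            profile[fn.__name__] += time.perf_counter() - start
    return wrapper

@interpose
def fake_mpi_allreduce(data):
    # Hypothetical stand-in for a communication call; just a local sum here.
    return sum(data)

result = fake_mpi_allreduce(range(10))
```

After a run, `profile` holds the time spent per "communication" call, which is the kind of compute-vs-communication breakdown IPM reports.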

Machine Setup
Carver
National Energy Research Scientific Computing
Center at Lawrence Berkeley National Labs.
400 nodes
Quad-core Intel Nehalem, 2.67 GHz
Dual-socket nodes with a single Quad Data Rate (QDR)
InfiniBand link per node
Each node has 24 GB of RAM (3 GB per core)

Machine Setup
Franklin
National Energy Research Scientific Computing
(NERSC) Center at Lawrence Berkeley National
Labs.
9,660 nodes
Cray XT4 supercomputer
Single quad-core 2.3 GHz AMD Opteron Budapest
processor per node
6.4 GB/s interconnect (node-to-node)
Each node has 8 GB of RAM (2 GB per core)

Machine Setup
Lawrencium
Information Technology Division at Berkeley
198 nodes (1584 core)
Dell PowerEdge 1950 server
Two quad-core 64-bit Intel Xeon 2.66 GHz Harpertown
processors
DDR InfiniBand network
Each node has 16 GB of RAM (2 GB per core)

Machine Setup
Amazon EC2
Virtual configuration
CPU capacity is defined in terms of an abstract Amazon
EC2 Compute Unit (ECU).
One ECU is approximately equivalent to a 1.0-1.2 GHz
2007-era Opteron or Xeon processor.
The large instance has:

4 EC2 Compute Units


2 Virtual Cores
7.5 GB of memory
Interconnect: Gigabit Ethernet
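A back-of-envelope sketch of what the ECU abstraction implies for aggregate clock rate per instance. The 1.0-1.2 GHz equivalence follows AWS's stated ECU definition; treating ECUs as additive clock capacity is a simplifying assumption for illustration only:

```python
# One EC2 Compute Unit (ECU) is roughly a 1.0-1.2 GHz 2007-era
# Opteron or Xeon, per AWS's definition.
ECU_GHZ_LOW, ECU_GHZ_HIGH = 1.0, 1.2

def instance_ghz_range(ecus):
    """Approximate aggregate clock-rate range (GHz) for a given ECU count."""
    return ecus * ECU_GHZ_LOW, ecus * ECU_GHZ_HIGH

low, high = instance_ghz_range(4)  # large instance: 4 ECUs over 2 virtual cores
```

For the large instance this gives roughly 4.0-4.8 GHz aggregate across its two virtual cores, which is why per-core performance lands near commodity, not supercomputer, levels.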

Machine Setup
/proc/cpuinfo
Different CPU combinations observed (no control over
assignment):
Intel Xeon E5430 2.66 GHz quad-core processor
AMD Opteron 270 2.0 GHz dual-core
AMD Opteron 2218 HE 2.6 GHz dual-core
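The assigned hardware can be identified by parsing `/proc/cpuinfo` on the instance. A small sketch of that parsing, run here against an abbreviated illustrative sample rather than output captured from a real EC2 node:

```python
def cpu_models(cpuinfo_text):
    """Return the distinct 'model name' values from /proc/cpuinfo output."""
    models = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("model name"):
            models.add(line.split(":", 1)[1].strip())
    return models

# Abbreviated, illustrative sample (not captured from a real EC2 instance):
SAMPLE = """\
processor\t: 0
model name\t: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
processor\t: 1
model name\t: Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
"""

models = cpu_models(SAMPLE)
```

On a live Linux system the same function can be fed `open("/proc/cpuinfo").read()`; running it across many launched instances reveals the hardware variance the slides describe.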

Experiment Setup
CAM
The Community Atmosphere Model (CAM) is the
atmospheric component of the Community
Climate System Model (CCSM)

GAMESS
Uses socket-based communication
Characterized by stride-1 memory access, which
stresses memory bandwidth and interconnect
collective performance

Experiment Setup
GTC

Fully self-consistent, gyrokinetic 3-D Particle-in-Cell (PIC) code with a
non-spectral Poisson solver

IMPACT-T

Integrated Map and Particle Accelerator Tracking Time
Uses Hockney's FFT algorithm

MAESTRO

Used to simulate astrophysical flows, such as those
leading up to ignition in Type Ia supernovae

MILC

Represents lattice computations used to study Quantum
Chromodynamics (QCD)

PARATEC

Performs Density Functional Theory quantum-mechanical
total-energy calculations using pseudopotentials

Results

Results

In GAMESS, Franklin, Lawrencium, and EC2 are 1.4x, 2.6x, and 2.7x slower than Carver,
respectively. The worst case is PARATEC, where EC2 is more than 50x slower than Carver:
PARATEC performs a 3-D FFT, and EC2 ran it 52x slower than Carver.
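The slowdown factors above are simple runtime ratios against the Carver baseline. A minimal sketch of that normalization, using illustrative runtimes chosen to reproduce the reported GAMESS ratios (these are NOT measured values from the paper):

```python
# Illustrative runtimes (arbitrary units), NOT measured data; chosen so the
# ratios match the GAMESS slowdowns reported on this slide.
runtimes = {"Carver": 100.0, "Franklin": 140.0, "Lawrencium": 260.0, "EC2": 270.0}

def slowdown_vs(baseline, times):
    """Slowdown factor of each machine relative to the baseline machine."""
    base = times[baseline]
    return {machine: t / base for machine, t in times.items()}

ratios = slowdown_vs("Carver", runtimes)
```

The same normalization applied to each benchmark yields the per-application comparison the results figures present.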

Results

Results:
AWS Cloud HW Variance

CONCLUSION
Cannot control the type of hardware assigned in the cloud
Brings near-supercomputer speeds within reach of every
household
