Presentation by:
Lokesh Pradhan
Introduction
File System
A way to organize data that is expected to be retained after the program using it terminates
Examples
Optical discs
DB2
Amazon S3
In the HPC world
Equally large applications
Large input data sets (e.g., astronomy data)
Parallel execution on large clusters
Strong requirements on accessibility and serviceability
History of GPFS
Shark video server
Video streaming from single RS/6000
Complete system, including file system, network driver, and control server
Large data blocks, admission control, deadline scheduling
Bell Atlantic video-on-demand trial (1993-94)
Tiger Shark multimedia file system
Multimedia file system for RS/6000 SP
Data striped across multiple disks, accessible from all nodes
Hong Kong and Tokyo video trials, Austin video server products
GPFS parallel file system
General purpose file system for commercial and technical computing
Parallel I/O: multiple tasks (possibly on multiple nodes) participate in the I/O
Application-level parallelism
The file is stored on multiple disks of a parallel file system
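The striping idea above (file blocks spread round-robin across multiple disks) can be sketched as a simple offset-to-disk mapping. This is a minimal illustration with made-up block size and disk count, not GPFS's actual layout logic:

```python
# Minimal sketch of round-robin block striping across disks.
# BLOCK_SIZE and NUM_DISKS are illustrative values, not GPFS defaults.
BLOCK_SIZE = 256 * 1024   # 256 KiB per file-system block
NUM_DISKS = 4

def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset in a file to (disk index, offset within the block)."""
    block = offset // BLOCK_SIZE
    disk = block % NUM_DISKS          # round-robin striping
    return disk, offset % BLOCK_SIZE

# Consecutive blocks land on different disks, so a large sequential
# read can be serviced by all disks in parallel.
print([locate(i * BLOCK_SIZE)[0] for i in range(8)])
# [0, 1, 2, 3, 0, 1, 2, 3]
```

Because adjacent blocks live on different disks, a single large read keeps every disk busy at once, which is the source of the parallelism the slide refers to.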
[Architecture diagram: Compute Nodes, Interconnect, I/O Server Nodes, and Disks forming a cluster]
Maintains a consistent view across all nodes for the same file
A programming model allowing programs to access file data distributed over multiple nodes, from multiple tasks running on multiple nodes
Physical distribution of data across disks and network entities eliminates bottlenecks both at the disk interface and the network, providing more effective bandwidth to the I/O resources
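As a back-of-the-envelope sketch of why physical distribution raises effective bandwidth: aggregate throughput grows with the number of disks until some other resource saturates. All numbers below are invented for illustration, not measurements of any real system:

```python
# Aggregate bandwidth grows with the number of disks/servers until
# another resource (here, the interconnect) becomes the limit.
# All figures are illustrative only.
disk_bw_mb_s = 100           # per-disk streaming bandwidth
interconnect_bw_mb_s = 1200  # total network capacity

def effective_bw(num_disks: int) -> int:
    """Deliverable bandwidth: striping helps until the network saturates."""
    return min(num_disks * disk_bw_mb_s, interconnect_bw_mb_s)

for n in (1, 4, 16):
    print(n, effective_bw(n), "MB/s")
```

With 4 disks the aggregate is 400 MB/s; at 16 disks the (hypothetical) interconnect caps it at 1200 MB/s, showing why both disks and network must scale together.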
Allowing for distributed token (lock) management.
Distributing token management reduces system delays associated with a lockable object waiting to obtain a token.
Allowing for the specification of other networks for
GPFS daemon communication and for GPFS
administration command usage within your cluster.
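One way to picture distributed token (lock) management is to hash each lockable object to a manager node, so no single node serializes all lock traffic. This is a toy sketch under that assumption; the node names are hypothetical and GPFS's real token protocol is far more involved:

```python
# Toy sketch of distributing token (lock) management across nodes.
# Hashing each lockable object to a manager node spreads lock traffic
# so no single node becomes a bottleneck. Purely illustrative.
import hashlib

MANAGER_NODES = ["node0", "node1", "node2"]  # hypothetical cluster nodes

def token_manager_for(object_id: str) -> str:
    """Pick the node responsible for granting tokens on this object."""
    h = int(hashlib.sha256(object_id.encode()).hexdigest(), 16)
    return MANAGER_NODES[h % len(MANAGER_NODES)]

class TokenManager:
    """Grants a token to one holder at a time; other nodes must wait."""
    def __init__(self):
        self.holders = {}  # object_id -> node currently holding the token

    def acquire(self, object_id: str, node: str) -> bool:
        if self.holders.get(object_id, node) != node:
            return False   # token held by another node; caller must wait
        self.holders[object_id] = node
        return True

    def release(self, object_id: str, node: str):
        if self.holders.get(object_id) == node:
            del self.holders[object_id]
```

Because different objects hash to different manager nodes, delays waiting on one object's token do not queue behind unrelated lock traffic, which is the point the slide makes.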
Virtual shared disks (VSD)
All access to permanent data through disk I/O
interface
Distributed protocols, e.g., distributed locking,
coordinate disk access from multiple nodes
Fine-grained locking allows parallel access by
multiple clients
Logging and shadowing restore consistency after node failures
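The fine-grained locking bullet above can be illustrated with a byte-range overlap check, a common granularity for parallel file access. This is a simplification for illustration; GPFS's actual byte-range token protocol is more elaborate:

```python
# Sketch of fine-grained (byte-range) lock conflict checking.
# Two requests conflict only if their byte ranges overlap and at
# least one is a write; disjoint writers can proceed in parallel.
def conflicts(start1: int, end1: int, write1: bool,
              start2: int, end2: int, write2: bool) -> bool:
    overlap = start1 < end2 and start2 < end1  # half-open ranges [start, end)
    return overlap and (write1 or write2)

# Two writers on disjoint halves of a file: no conflict.
print(conflicts(0, 512, True, 512, 1024, True))    # False
# A reader overlapping a writer: conflict.
print(conflicts(0, 1024, False, 512, 1024, True))  # True
```

This is why multiple clients can write different regions of the same file in parallel: only overlapping ranges with a writer involved must serialize.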
Designed to scale to petabytes; the largest system in production is 75 TB
Failure detection and recovery protocols to
handle node failures
Replication and/or RAID protect against disk /
storage node failure
On-line dynamic reconfiguration (add, delete,
replace disks and nodes; rebalance file system)
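The rebalancing step after adding a disk can be pictured, under the naive round-robin placement sketched earlier, as recomputing each block's home and moving the blocks whose home changed. A toy sketch, not GPFS's algorithm:

```python
# Toy sketch of rebalancing after growing a stripe group.
# With plain round-robin (block % num_disks) placement, changing the
# disk count changes the home of many blocks, all of which must move.
def moved_blocks(num_blocks: int, old_disks: int, new_disks: int) -> list[int]:
    """Blocks whose round-robin home disk changes when disks are added."""
    return [b for b in range(num_blocks)
            if b % old_disks != b % new_disks]

# Growing from 4 to 5 disks relocates half of these 8 blocks:
print(moved_blocks(8, 4, 5))  # [4, 5, 6, 7]
```

Note how modulo placement forces a large fraction of blocks to move on every reconfiguration; this is one reason production systems use more careful placement maps so rebalancing can proceed incrementally while the file system stays online.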
Distributed token (lock) operations.
Supports data replication to increase availability in the event of a storage media failure.
Offers time-tested reliability and has been installed on thousands of nodes across industries
Basis of many cloud storage offerings
GPFS's Achievements
Used on six of the ten most powerful supercomputers in the world
Conclusion
Efficient for managing large data volumes
Provides world-class performance, scalability, and availability
References
"File System." Wikipedia, the Free Encyclopedia. Web. 20 Jan. 2012.
<http://en.wikipedia.org/wiki/File_system>.
"IBM General Parallel File System for AIX: Administration and Programming Reference -
Contents." IBM General Parallel File System for AIX. IBM. Web. 20 Jan. 2012.
<https://support.iap.ac.cn/hpc/ibm/ibm/gpfs/am3admst02.html>.
"IBM General Parallel File System." Wikipedia, the Free Encyclopedia. Web. 20 Jan. 2012.
<http://en.wikipedia.org/wiki/IBM_General_Parallel_File_System>.
Intelligent Storage Management with IBM General Parallel File System. Issue brief. IBM, July.
Architectural and Design Issues in the General Parallel File System. IBM Haifa Research Lab, May 2005. Web. 21 Jan. 2012.
"NCSA Parallel File Systems." National Center for Supercomputing Applications at the
<www.dell.com/powersolutions>.
Welch, Brent. "What Is a Cluster Filesystem?" Brent B Welch. Web. 21 Jan. 2012.
<http://www.beedub.com/clusterfs.html>.
Questions?