
Vignesh Gawali

vg975

Paper Review: The Google File System

The Google File System (GFS) is a scalable, distributed file system designed to run on commodity
hardware and serve thousands of clients with high performance and reliability.
The file system was designed with the following considerations in mind:

The system runs on commodity hardware, where component failures are the norm rather than the
exception; fault tolerance and automatic recovery are therefore essential.
Files are typically gigabytes in size, so read and write operations must be efficient. Both large
sequential writes and small random writes must be handled, and synchronization must be provided
for multiple clients writing to the same file.
High sustained bandwidth is more important than low latency, since most operations process large
amounts of data in bulk and few have strict latency requirements.

File System Interface: The interface is similar to that of other distributed file systems and supports all
standard operations, along with two additional ones: Snapshot and Record Append. Snapshot creates a
low-cost copy of a file or directory tree, while Record Append allows multiple clients to append to the
same file concurrently.
A GFS cluster comprises a single master node and multiple chunkservers, which are essentially
commodity Linux machines. Files are divided into fixed-size chunks of 64 MB. The chunks are stored on
the chunkservers and identified by globally unique chunk handles assigned by the master at creation
time. Each chunk has three replicas stored on different chunkservers by default. For any read or write
operation, the client asks the master for the location of the chunk; once the master replies with the
chunk handle and replica locations, the client performs the requested operation directly against a
chunkserver.
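The lookup step described above can be sketched in a few lines. This is a hypothetical toy model, not the paper's implementation: the `Master` class, its method names, and the chunkserver names `cs1`..`cs3` are all illustrative; only the 64 MB chunk size and the offset-to-chunk-index mapping come from the paper.

```python
# Toy sketch of the GFS read path: the client turns a byte offset into a
# chunk index, asks the master for that chunk's handle and replica
# locations, then reads from a chunkserver directly.
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunk size, per the paper

def chunk_index(byte_offset):
    """Map a byte offset within a file to the index of its chunk."""
    return byte_offset // CHUNK_SIZE

class Master:
    """Hypothetical master holding (filename, chunk index) -> (handle, replicas)."""
    def __init__(self):
        self.chunk_table = {}

    def add_chunk(self, filename, index, handle, replicas):
        self.chunk_table[(filename, index)] = (handle, replicas)

    def lookup(self, filename, byte_offset):
        # The master returns only metadata; file data never flows through it.
        return self.chunk_table[(filename, chunk_index(byte_offset))]

master = Master()
master.add_chunk("/logs/web.log", 0, "handle-0001",
                 ["cs1", "cs2", "cs3"])  # three replicas by default

# A read at offset 10 MB falls inside chunk 0 of the file.
handle, replicas = master.lookup("/logs/web.log", 10 * 1024 * 1024)
```

Because the master hands out locations rather than data, it stays off the data path and does not become a bandwidth bottleneck.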
The master stores three major types of metadata: the file and chunk namespaces, the mapping from
files to chunks, and the locations of each chunk's replicas. All metadata is kept in memory. The master
polls each chunkserver for its chunk locations at startup, and afterwards monitors chunkserver state
through periodic HeartBeat messages.
System Interactions:
The client asks the master for the locations of the chunks it needs to modify. The master grants a lease
to one replica, designating it the primary chunkserver, and returns the locations of all replicas of the
requested chunk. The client then pushes the data to all replicas and asks the primary to initiate the
mutation; the primary assigns the mutation an order and forwards it to the secondary replicas. The
lease has an initial timeout of 60 seconds but can be extended.
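The lease timing can be illustrated with a minimal sketch. The `Lease` class and its methods are hypothetical names for illustration; the 60-second initial timeout and the possibility of extension are from the paper (extensions are piggybacked on HeartBeat messages, which this sketch simplifies to a direct call).

```python
# Hypothetical sketch of the lease mechanism: the master grants one replica
# a 60-second lease, making it the primary; while the lease is valid, that
# primary decides the order of mutations for its chunk.
LEASE_TIMEOUT = 60  # initial lease duration in seconds, per the paper

class Lease:
    def __init__(self, primary, granted_at):
        self.primary = primary
        self.expires_at = granted_at + LEASE_TIMEOUT

    def valid(self, now):
        return now < self.expires_at

    def extend(self, now):
        # In GFS, extension requests ride on HeartBeat messages.
        self.expires_at = now + LEASE_TIMEOUT

lease = Lease(primary="cs1", granted_at=0)
lease.extend(now=30)  # lease now runs until t = 90
```

When a lease expires without extension, the master can safely grant a new lease to another replica, which is how the system recovers from a failed primary.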
Atomic record appends: A technique for appending data to chunks in which the client supplies only the
data; GFS appends it to the chunk at an offset of GFS's own choosing and returns that offset to the
client. This guarantees that concurrent appends by multiple clients do not overwrite one another.
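The key property, that the server rather than the client picks the offset, can be shown with a toy model. The `Chunk` class below is illustrative, not the paper's implementation, and it models a chunk as a list of record slots rather than raw bytes.

```python
# Hypothetical sketch of atomic record append: the chunkserver, not the
# client, chooses the offset, so two concurrent appenders always land in
# distinct slots and never overwrite each other.
class Chunk:
    def __init__(self):
        self.data = []  # record slots; index stands in for a byte offset

    def record_append(self, record):
        offset = len(self.data)   # offset chosen server-side
        self.data.append(record)
        return offset             # returned to the client

chunk = Chunk()
o1 = chunk.record_append(b"client-A")
o2 = chunk.record_append(b"client-B")
```

Contrast this with a plain write, where each client names its own offset and concurrent writers to the same region would need external locking.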

Snapshot: Snapshots are copy-on-write; when a snapshotted chunk is first modified, the new copy is
created on the same chunkserver as the original, since local disk writes are faster than network
transfers.
Garbage Collection: A deleted file is renamed to a hidden name, and its resources are not reclaimed
immediately. Reclamation happens during the master's regular file system scan, once the hidden name
has existed for more than three days (the interval is configurable). Until the scan removes it, the
deleted file can be recovered simply by renaming it back to its original name.
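The rename-then-reclaim scheme can be sketched as follows. The `Namespace` class, the `.deleted.` prefix, and the method names are all hypothetical; the three-day default grace period and the hidden-rename behavior are from the paper.

```python
# Hypothetical sketch of lazy deletion: a deleted file is renamed to a
# hidden name, and only a later namespace scan reclaims entries older
# than the grace period (three days by default in the paper).
GRACE_PERIOD = 3 * 24 * 3600  # seconds

class Namespace:
    def __init__(self):
        self.files = {}  # name -> deletion timestamp (None means live)

    def create(self, name):
        self.files[name] = None

    def delete(self, name, now):
        # Deletion is just a rename; data stays on disk for now.
        self.files[".deleted." + name] = now
        del self.files[name]

    def undelete(self, hidden, now):
        # Recovery before the scan: rename back to the original name.
        self.files[hidden[len(".deleted."):]] = None
        del self.files[hidden]

    def scan(self, now):
        """Reclaim hidden entries older than the grace period."""
        for name, ts in list(self.files.items()):
            if ts is not None and now - ts > GRACE_PERIOD:
                del self.files[name]

ns = Namespace()
ns.create("log.txt")
ns.delete("log.txt", now=0)
ns.scan(now=GRACE_PERIOD + 1)  # hidden entry is reclaimed here
```

Deferring reclamation to a background scan keeps deletion cheap and makes accidental deletes recoverable for the whole grace period.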
Fault Tolerance:
The master's state (its operation log and checkpoints) is replicated on multiple machines. If the master
process fails, it restarts almost instantly. If its disk or machine fails, a monitoring mechanism starts a
new master process elsewhere using the replicated state. The system also runs shadow master nodes
that provide read-only access when the master is down.
The file system maintains data integrity through a checksum mechanism that each chunkserver verifies
on its own replicas.
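Per-replica checksumming can be sketched briefly. The function names below are illustrative, and CRC-32 stands in for whatever checksum GFS actually uses; the idea of keeping a checksum per 64 KB block of each chunk and verifying it on read is from the paper.

```python
# Hypothetical sketch of chunkserver data integrity: keep one checksum per
# 64 KB block of a chunk and verify on every read, so corruption on one
# replica is detected locally without comparing replicas across servers.
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks, as in the paper

def block_checksums(chunk_data):
    """Compute one CRC-32 per 64 KB block of the chunk."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify(chunk_data, checksums):
    """Return True only if every block still matches its stored checksum."""
    return block_checksums(chunk_data) == checksums

data = b"x" * (2 * BLOCK_SIZE)
sums = block_checksums(data)
corrupted = b"y" + data[1:]  # a single flipped byte in the first block
```

Because each chunkserver checks its own blocks, a corrupted replica can be reported to the master and re-replicated from a healthy copy without any cross-server comparison.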
Despite a few trade-offs, such as relying on a single master node, Google has successfully built a
scalable file system that addresses most of the issues of a large, high-performance distributed system
using the techniques described above, and GFS has served as a foundation for other distributed
systems.