
Vignesh Gawali

vg975 | vg975@nyu.edu
Paper Review 2 - Big Table
Bigtable is a distributed storage system for managing structured data. It is designed to scale to very large amounts of data (petabytes) spread across thousands of commodity servers. It provides high performance, scalability, low latency, and high availability, and it supports a wide range of applications.
Bigtable shares many implementation strategies with other database systems, such as parallel databases and main-memory databases, but it differs from them in important ways. It does not support a full relational data model. Instead, it offers clients a simple data model that gives them dynamic control over the layout and format of their data.
A Bigtable is a sparse, distributed, persistent, sorted multi-dimensional map. Each value is indexed by a row key, a column key, and a timestamp, and each value is an uninterpreted array of bytes:
(row: string, column: string, time: int64) → string
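The following minimal Python sketch makes the data model concrete. It assumes an in-memory dictionary keyed by (row, column, timestamp) tuples. The class name `ToyTable` and its methods are hypothetical illustrations, not Bigtable's client API.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical key type: (row key, column key, timestamp).
CellKey = Tuple[str, str, int]


class ToyTable:
    """In-memory toy of Bigtable's data model: (row, column, timestamp) -> bytes."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        # Values are stored as uninterpreted bytes.
        self._cells[(row, column, ts)] = value

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        # Newest version with timestamp <= ts, or the newest version if ts is None.
        versions = [
            (t, v)
            for (r, c, t), v in self._cells.items()
            if r == row and c == column and (ts is None or t <= ts)
        ]
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]

    def scan(self) -> Iterator[Tuple[CellKey, bytes]]:
        # Sorted by row, then column, then newest timestamp first.
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]
```

For a usage example, take the Webtable example from the Bigtable paper: store two versions of `contents:` under row `com.cnn.www`. A `get` without a timestamp returns the newer version, while `get(..., ts=...)` reads an older snapshot. The toy captures only the map's shape. It is not sparse-aware, distributed, or persistent.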

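Row Key: an arbitrary string of up to 64 KB, although typically 10-100 bytes. Data is kept in lexicographic order by row key, and the row range of a table is dynamically partitioned into tablets. Tablets are the unit of distribution and load balancing.
Column Key: column keys (written family:qualifier) are grouped into sets called column families, which are the basic unit of access control. All data stored in a column family is usually of the same type. Access control, as well as disk and memory accounting, is performed at the column-family level.
Timestamp: each cell can contain multiple versions of the same data. The versions are indexed by 64-bit integer timestamps and stored in decreasing timestamp order, so the most recent version is read first.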
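The partitioning of rows into tablets and the family:qualifier syntax of column keys can be sketched as follows. This sketch assumes each tablet is described only by its start row key and that row keys are Python strings, whereas Bigtable treats them as arbitrary byte strings. `TabletMap` and `column_family` are hypothetical names.

```python
import bisect
from typing import List


class TabletMap:
    """Hypothetical: tablet i holds the row range [start_keys[i], start_keys[i + 1])."""

    def __init__(self, start_keys: List[str]) -> None:
        if not start_keys or start_keys[0] != "":
            raise ValueError("the first tablet must start at the empty row key")
        if start_keys != sorted(start_keys):
            raise ValueError("start keys must be sorted")
        self.start_keys = start_keys

    def tablet_for(self, row_key: str) -> int:
        # Index of the rightmost tablet whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1


def column_family(column_key: str) -> str:
    # Column keys are written family:qualifier; the family is the part before ':'.
    family, _sep, _qualifier = column_key.partition(":")
    return family
```

Rows are sorted, so finding a row's tablet is a binary search over tablet boundaries. Rows with nearby keys, such as pages from the same reversed domain, tend to land in the same tablet.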

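Bigtable builds on several other pieces of Google infrastructure. It uses the Google File System (GFS) to store data files and logs, and it runs on shared pools of machines that also run other applications. Data is stored internally in the Google SSTable file format, which is a persistent, ordered, immutable map from keys to values. Bigtable also relies on Chubby, a highly available and persistent distributed lock service. Chubby is used to ensure there is at most one active master, to store the bootstrap location of Bigtable data, to discover tablet servers and finalize their deaths, and to store Bigtable schema information and access control lists.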
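Because SSTables are the on-disk format, a small model helps show why lookups are cheap. In the Bigtable paper, an SSTable is a sequence of blocks plus a block index that is loaded into memory. A lookup is therefore a binary search in the index followed by a read of a single block. The sketch below keeps everything in memory and measures block size in entries rather than bytes. `ToySSTable` and `block_size` are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


class ToySSTable:
    """Immutable sorted key/value sequence split into blocks, plus a block index."""

    def __init__(self, items: List[Tuple[str, bytes]], block_size: int = 2) -> None:
        entries = sorted(items)  # assumes keys are unique
        # Partition the sorted entries into fixed-size blocks.
        self._blocks = [entries[i:i + block_size] for i in range(0, len(entries), block_size)]
        # Block index: the first key of each block (held in memory in a real SSTable).
        self._index = [block[0][0] for block in self._blocks]

    def get(self, key: str) -> Optional[bytes]:
        # Binary search the index for the only block that could hold the key.
        i = bisect.bisect_right(self._index, key) - 1
        if i < 0:
            return None
        # "Read" that single block and scan it.
        for k, v in self._blocks[i]:
            if k == key:
                return v
        return None
```

The data is immutable once written, so the index never changes. New writes go elsewhere, to the memtable, and are later written out as new SSTables.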
The Bigtable implementation has three major components:
- A library that is linked into every client.
- One master server: assigns tablets to tablet servers, detects the addition and expiration of tablet servers, balances tablet-server load, garbage-collects files in GFS, and handles schema changes.
- Many tablet servers: each manages a set of tablets, handles read and write requests to the tablets it has loaded, and splits tablets that have grown too large. Tablet servers can be added or removed dynamically.
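Of the tablet-server duties listed above, splitting is the easiest to illustrate. The following sketch picks a split point for an oversized tablet at a row boundary near the middle of its data. The threshold value, the size-based midpoint policy, and the name `choose_split_row` are assumptions for illustration. The paper does not prescribe this exact rule.

```python
from typing import List, Optional, Tuple

# Hypothetical split threshold; the paper reports tablets of roughly 100-200 MB by default.
SPLIT_THRESHOLD_BYTES = 200 * 1024 * 1024


def choose_split_row(
    rows: List[Tuple[str, int]], threshold: int = SPLIT_THRESHOLD_BYTES
) -> Optional[str]:
    """rows: (row_key, size_in_bytes) pairs sorted by row key.

    Returns the first row key of the new second tablet, or None if no split
    is needed. Splits happen only at row boundaries.
    """
    total = sum(size for _, size in rows)
    if total <= threshold or len(rows) < 2:
        return None
    running = 0
    for i, (_row, size) in enumerate(rows):
        running += size
        if running * 2 >= total:
            # Split after row i, but keep at least one row in each half.
            return rows[min(i + 1, len(rows) - 1)][0]
    return None  # not reached when sizes are non-negative
```

Splitting at a row boundary keeps each row entirely inside one tablet, which matches the row-range definition of a tablet.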
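Tablet location information is stored in a three-level hierarchy, analogous to a B+-tree:
- A file stored in Chubby contains the location of the root tablet.
- The root tablet contains the locations of all tablets of a special METADATA table. The root tablet is never split, so the hierarchy never has more than three levels.
- Each METADATA tablet contains the locations of a set of user tablets.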
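The three-level lookup can be sketched as a chain of range searches. This sketch makes several assumptions. Each level is a sorted list of (end key, next location) pairs. METADATA keys use a made-up `table:row` encoding. The string "\xff" serves as a sentinel end key. `locate`, `lookup`, and `metadata_key` are hypothetical names.

```python
import bisect
from typing import Dict, List, Tuple

# One level of the hierarchy: sorted (end_key, next_location) pairs. The entry with
# the smallest end_key >= key covers that key. "\xff" is a hypothetical sentinel
# end key that sorts after every key used here.
Index = List[Tuple[str, str]]


def lookup(index: Index, key: str) -> str:
    ends = [end for end, _ in index]
    return index[bisect.bisect_left(ends, key)][1]


def metadata_key(table: str, row: str) -> str:
    # Hypothetical encoding of (table, row) into a single METADATA row key.
    return f"{table}:{row}"


def locate(root_name: str, tablets: Dict[str, Index], table: str, row: str) -> str:
    """Walk Chubby file -> root tablet -> METADATA tablet -> user tablet.

    root_name plays the role of the location read from the Chubby file, and
    `tablets` maps a (hypothetical) tablet location to that tablet's index.
    """
    key = metadata_key(table, row)
    metadata_tablet = lookup(tablets[root_name], key)  # root tablet -> METADATA tablet
    return lookup(tablets[metadata_tablet], key)        # METADATA tablet -> user tablet
```

In the paper, the client library caches tablet locations. This walk is therefore needed only on a cache miss or when a cached location turns out to be stale.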

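Tablet Assignment: Each tablet is assigned to one tablet server at a time. The master keeps track of the set of live tablet servers and the current assignment of tablets, including which tablets are unassigned. When a tablet server starts, it creates and acquires an exclusive lock on a uniquely named file in a Chubby directory. The master periodically asks each tablet server for the status of its lock. If a server reports that it has lost its lock, or if the master cannot reach it after several attempts, the master tries to acquire the lock on that server's file itself. If the master succeeds, the tablet server is either dead or cut off from Chubby. The master then deletes the server's file, which ensures the server can never serve again, and moves all of its tablets into the set of unassigned tablets.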
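The failure-detection protocol can be simulated in a few lines. In this sketch, `ToyChubby` is a hypothetical stand-in for Chubby that tracks only files and exclusive lock holders. The `/servers/` directory name is invented, and reachability is passed in as a set rather than discovered over the network. The reassignment policy in `assign`, which gives each unassigned tablet to the least-loaded server, is an assumption and not the paper's algorithm.

```python
from typing import Dict, List, Set


class ToyChubby:
    """Hypothetical stand-in for Chubby: named files with exclusive locks."""

    def __init__(self) -> None:
        self.files: Set[str] = set()
        self.owners: Dict[str, str] = {}  # file path -> current lock holder

    def create_and_lock(self, path: str, who: str) -> None:
        self.files.add(path)
        self.owners[path] = who

    def try_acquire(self, path: str, who: str) -> bool:
        # An exclusive lock can be taken only if the file exists and is unlocked.
        if path in self.files and path not in self.owners:
            self.owners[path] = who
            return True
        return False

    def expire(self, who: str) -> None:
        # Models a Chubby session expiring: every lock held by `who` is released.
        for path in [p for p, o in self.owners.items() if o == who]:
            del self.owners[path]

    def delete(self, path: str) -> None:
        self.files.discard(path)
        self.owners.pop(path, None)


def check_servers(chubby: ToyChubby, assignment: Dict[str, List[str]],
                  unassigned: List[str], reachable: Set[str]) -> None:
    for server in list(assignment):
        path = f"/servers/{server}"  # hypothetical Chubby directory layout
        if chubby.owners.get(path) == server and server in reachable:
            continue  # healthy: still holds its lock and answers the master
        # Lock lost or server unreachable: the master tries to take the lock itself.
        if chubby.try_acquire(path, "master"):
            # Deleting the file guarantees the server can never serve again.
            chubby.delete(path)
            unassigned.extend(assignment.pop(server))


def assign(assignment: Dict[str, List[str]], unassigned: List[str]) -> None:
    # Hypothetical policy: give each unassigned tablet to the least-loaded server.
    while unassigned and assignment:
        target = min(assignment, key=lambda s: len(assignment[s]))
        assignment[target].append(unassigned.pop())
```

Suppose a server is unreachable but still holds its lock. The master's acquisition then fails and nothing is reassigned. This reflects the case where only the master-to-server link is broken.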
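Tablet Serving: The persistent state of a tablet is stored in GFS. Updates are committed to a commit log that stores redo records. The most recently committed updates are kept in memory in a sorted buffer called the memtable, while older updates are stored in a sequence of SSTables. To recover a tablet, a tablet server reads its metadata from the METADATA table. This metadata contains the list of SSTables that make up the tablet and a set of redo points, which are pointers into the commit logs that may contain data for the tablet. The server then rebuilds the memtable by replaying the updates committed since the redo points. Before a write is performed, the server checks that it is well-formed and that the sender is authorized to perform it. Only then is the mutation written to the commit log and inserted into the memtable. A read undergoes the same checks and is then executed on a merged view of the SSTables and the memtable. Both are sorted lexicographically, so this merged view can be formed efficiently.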
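The write path, read path, and recovery described above fit in a short model. This sketch rests on several simplifying assumptions:
- Keys are plain strings with no columns or timestamps.
- The commit log and SSTables are in-memory Python objects rather than GFS files.
- There is a single redo point.
- `ToyTabletServer`, `minor_compaction`, and `recover` are hypothetical names.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


class ToyTabletServer:
    """Hypothetical single-tablet model of the write, read, and recovery paths."""

    def __init__(self, writers: Set[str], readers: Set[str]) -> None:
        self.writers, self.readers = writers, readers
        self.log: List[Tuple[str, bytes]] = []      # commit log of redo records
        self.memtable: Dict[str, bytes] = {}        # recent updates, in memory
        self.sstables: List[Dict[str, bytes]] = []  # older updates, oldest first
        self.redo_point = 0                         # log entries before this are in SSTables

    def write(self, user: str, key: str, value: bytes) -> None:
        if user not in self.writers:
            raise PermissionError(f"{user} may not write")
        self.log.append((key, value))  # 1. commit to the log
        self.memtable[key] = value     # 2. then insert into the memtable

    def minor_compaction(self) -> None:
        # Freeze the memtable into a new immutable, sorted SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.redo_point = len(self.log)

    def recover(self) -> None:
        # Rebuild the memtable by replaying updates committed after the redo point.
        self.memtable = {}
        for key, value in self.log[self.redo_point:]:
            self.memtable[key] = value

    def read(self, user: str, key: str) -> Optional[bytes]:
        if user not in self.readers:
            raise PermissionError(f"{user} may not read")
        # Newest data wins: memtable first, then SSTables from newest to oldest.
        for layer in [self.memtable, *reversed(self.sstables)]:
            if key in layer:
                return layer[key]
        return None

    def scan(self, user: str) -> List[Tuple[str, bytes]]:
        if user not in self.readers:
            raise PermissionError(f"{user} may not read")
        layers = [self.memtable, *reversed(self.sstables)]  # newest first
        # Each source is sorted, so a k-way merge yields a sorted merged view.
        sources = [[(k, age, v) for k, v in sorted(layer.items())]
                   for age, layer in enumerate(layers)]
        merged: List[Tuple[str, bytes]] = []
        for key, _age, value in heapq.merge(*sources):
            if not merged or merged[-1][0] != key:  # keep the newest version of each key
                merged.append((key, value))
        return merged
```

The merged scan uses a k-way merge. This illustrates why sorted memtables and SSTables make the combined view cheap to produce: no source ever needs to be re-sorted.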
Several refinements are built on top of these mechanisms to provide the high performance, high availability, and low latency that applications require:
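- Locality groups: clients can group related column families into a locality group. A separate SSTable is generated for each locality group in each tablet, so column families that are not typically accessed together are stored apart, which makes reads more efficient.
- Two-level caching for read performance: a Scan Cache holds key/value pairs returned by the SSTable interface, and a Block Cache holds SSTable blocks read from GFS.
- Bloom filters: these are created for the SSTables in a locality group. They let a read ask whether an SSTable might contain data for a given row/column pair, which reduces the number of disk accesses.
- Compactions: a minor compaction writes a frozen memtable out as a new SSTable. This shrinks memory usage and reduces the amount of commit log that must be replayed during tablet recovery. Merging and major compactions periodically combine SSTables into fewer files.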
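Of the refinements listed, Bloom filters are the most self-contained to show. The sketch below is a standard Bloom filter that assumes SHA-256-derived bit positions and arbitrary sizes (1024 bits, 3 hashes). Bigtable's actual hashing and sizing are not described here. `pair_key` and `sstables_to_read` are hypothetical helpers. They show how a read can skip SSTables whose filter rules out a row/column pair.

```python
import hashlib
from typing import Iterator, List, Tuple


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def pair_key(row: str, column: str) -> str:
    # Hypothetical encoding of a row/column pair; "\x00" separates the two parts.
    return f"{row}\x00{column}"


def sstables_to_read(filters: List[Tuple[str, BloomFilter]], row: str, column: str) -> List[str]:
    # Only SSTables whose filter admits the pair need a disk read.
    key = pair_key(row, column)
    return [name for name, f in filters if f.might_contain(key)]
```

A Bloom filter can return false positives but never false negatives. An SSTable that is skipped is therefore guaranteed not to contain the pair. An SSTable that passes the filter may still be read and turn out not to contain it.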
In conclusion, Bigtable is a simple yet very efficient distributed storage system for structured data that provides high performance, high availability, and low-latency operations. At the time of the paper, it was used by many Google products, such as web indexing, Google Earth, and Google Analytics.
