Research Authors: James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
In this paper, the authors present a database system called Spanner. Spanner is a scalable, globally distributed relational database designed by Google that shards and replicates data across datacenters around the world and supports externally consistent distributed transactions. Spanner provides consistency guarantees that match or exceed those of a traditional RDBMS while storing far more data than a single datacenter can hold: the paper reports that it scales to trillions of database rows across millions of machines spread over hundreds of datacenters. When data is read, Spanner directs the user to a replica in a datacenter geographically close to that user; when data is written, it is replicated to multiple datacenters. If a datacenter fails, a read is simply served from another datacenter that holds a replica of the data.
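The read path described above can be sketched very simply. The datacenter names, latencies, and function below are invented for illustration and are not part of Spanner's API; they only show the idea of serving a read from the nearest reachable replica and failing over when a datacenter is down.

```python
# Hypothetical sketch of nearest-replica reads with failover.
# Distances are assumed round-trip latencies (ms) from one user.
REPLICAS = {"us-east": 100, "eu-west": 40, "asia-east": 180}

def read(key, available):
    """Serve a read of `key` from the nearest reachable replica.

    `available` is the set of datacenters currently up; if the
    closest one is down, the read falls over to the next closest.
    """
    candidates = [dc for dc in REPLICAS if dc in available]
    if not candidates:
        raise RuntimeError("no replica of %r is reachable" % key)
    return min(candidates, key=REPLICAS.get)

print(read("row-42", {"us-east", "eu-west"}))  # nearest replica: eu-west
print(read("row-42", {"us-east"}))             # eu-west down: us-east serves
```

Because every datacenter in the set holds a full replica, the failover requires no recovery step: any surviving replica can answer the read.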
Further enhancements were made to increase scalability and performance. To offer this degree of geographic redundancy while letting applications read (and, to a lesser extent, write) data without incurring huge latencies, the developers introduced TrueTime, which provides accurate time synchronization in a distributed system by representing the uncertainty of time explicitly. Data is versioned, and each version is automatically stamped with its commit time by the TrueTime API; as a result, applications can read, write, and replicate data across countries and continents while keeping reads extremely fast. One current user of this database is F1, the backend of Google's advertising business, which can specify which datacenters contain which pieces of data so that frequently read data is located nearer to its users, reducing read latency.
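The core TrueTime idea can be illustrated with a short sketch. The epsilon value, clock source, and function names below are assumptions made for illustration, not Spanner's actual implementation: the point is that the API returns an interval guaranteed to contain absolute time, and a commit waits out that uncertainty before its timestamp is exposed.

```python
import time

# Assumed worst-case clock uncertainty (the paper's epsilon is dynamic;
# a fixed 4 ms is invented here purely for illustration).
EPSILON = 0.004

def tt_now():
    """Return an interval (earliest, latest) guaranteed to bracket
    true absolute time, mimicking TrueTime's TT.now()."""
    t = time.time()
    return (t - EPSILON, t + EPSILON)

def commit(version_store, key, value):
    """Stamp a write with the interval's latest bound, then wait until
    that timestamp is certainly in the past ("commit wait") before
    making the version visible."""
    _, ts = tt_now()
    while tt_now()[0] < ts:      # wait out the clock uncertainty
        time.sleep(EPSILON / 4)
    version_store.setdefault(key, []).append((ts, value))
    return ts
```

A usage sketch: `store = {}; ts = commit(store, "row-1", "v1")` leaves `store["row-1"]` holding the single version `(ts, "v1")`. Because every version carries such a timestamp, a reader can be given a snapshot timestamp and read a consistent view from any replica that is sufficiently up to date.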
A present limitation of this solution is that, although Spanner is scalable in the number of nodes, its node-local data structures perform relatively poorly on complex SQL queries because they were designed for simple key-value accesses (Corbett et al., 2012, p. 263). In addition, there is a need to move client-application processes between datacenters in an automated, coordinated fashion; moving processes raises the even more difficult problem of managing resource acquisition and allocation between datacenters (Corbett et al., 2012, p. 263).
In closing, Spanner was built over five years of intensive development by combining, blending, and extending ideas from a multitude of research communities. From the database community, Spanner took familiar, easy-to-use transactions and a SQL-based query language, while integrating the concepts of scalability, wide-area distribution, failure resistance, automatic sharding, data replication, and consistency from other research communities. In addition, Spanner incorporated TrueTime, which provides accurate time synchronization in a distributed system by expressing the uncertainty of time explicitly in the time API (Corbett et al., 2012, p. 263). As a result, Spanner achieved its goal of a scalable, globally distributed database, something that had previously been impossible with Bigtable in globally distributed environments.
References

Corbett, J. C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J. J., Ghemawat, S., Gubarev, A., Heiser, C., Hochschild, P., Hsieh, W., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L., Saito, Y., Szymaniak, M., Taylor, C., Wang, R., & Woodford, D. (2012). Spanner: Google's Globally-Distributed Database. Proc. of OSDI 2012.

Shute, J., et al. (2012). F1: The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business. Proc. of SIGMOD, May 2012, pp. 777-778.
BSc Final Year Project 2010, University of the West Indies, Agricultural Economics and Extension Department, Natural Sciences, Agricultural Extension Portal