
NoSQL Databases

Vamshi Krishna Reddy V


International Institute of Information Technology
Bangalore, India
vamshikrishnareddy.v@iiitb.net

ABSTRACT

From the invention of the internet until around 2003, websites were mostly read-only: a publisher would post some content and users would read or use that content. Such applications needed a kind of data storage that is predefined and predictable, so the major part of research went into storing data and presenting it to users. Now, with the rise of social interaction on the web, every user has become a publisher, and databases have changed from read-heavy architectures to read-write and write-heavy architectures. In this paper I present the basic need for alternative database systems, the purpose of using NoSQL databases, the available options with their distinctive features, and the challenges involved in implementing them in a real world occupied by relational databases.

Keywords
NoSQL Databases, Non Relational Databases

1. INTRODUCTION

In the present scenario of information explosion, an enormous amount of data is being generated by different applications and organizations over the internet. Traditional ways of storing data about payrolls, employee management, etc. have remained the same for decades, without much change in the way they are done. But in areas where user-generated content dominates and applications enable users to post their content online, the data being generated is humongous; moreover, it is a new kind of problem which did not exist a few years ago. For needs of this kind, where the data generated is about the users and is important but not safety critical, the storage systems need not be so stringent about the rules and schemas imposed on them. Enforcing these reduces performance by providing many facilities which are not necessary. So there is a need for a different genre of data storage systems for specific usage requirements.

2. NOSQL

We have long used traditional SQL databases, also known as relational database systems, for managing our data. But due to the above-mentioned reasons and the information explosion, there is a need for systems which are more flexible than SQL databases. Huge amounts of data also result in a large number of requests for it, which requires swift response, i.e. the performance of the system should be high enough to handle the load. Data is becoming more and more connected (more and more joins, as relationships are exploding). Datasets are becoming larger and larger. Data is becoming less and less structured. As the existing systems are unable to provide optimized solutions for these challenges, we are in quest of features beyond what SQL databases provide.

As a solution to the above-mentioned requirements, NoSQL has come into the picture in recent days. A lot of SQL Server professionals themselves do not really know what NoSQL is. Some think it is a product, but most think it is a competitor and potential threat to SQL Server and relational databases in general. NoSQL literally means “Not Only SQL”, not “Anti SQL” or “No SQL”, as it is misinterpreted by a lot of people.

NoSQL Definition

Next-generation databases mostly addressing some of the following points: being non-relational, distributed, open-source and horizontally scalable. The term covers a wide range of technologies, data architectures and priorities. The original intention was modern web-scale databases. The movement began in early 2009 and is growing rapidly. Further characteristics such as being schema-free, easy replication support, a simple API and eventual consistency are also featured by NoSQL databases. NoSQL is not a competitor to traditional RDBMS/SQL; rather, it is a solution for many use cases where using an RDBMS was perhaps a poor fit. Thus the architect's task is not to decide which of the two competing options (RDBMS or NoSQL) is better in general, but to choose one of them as the preferred storage strategy for the case at hand. This is simply a design decision of choosing the more appropriate storage system for the application under consideration.

Along these lines, many of the online giants who generate data have themselves come up with new ways of storing data for efficient lookup and high performance. These systems are not all that different; many of them are inspired by one another and borrow features from each other. Most of them are open source. Memcached, developed by LiveJournal about a decade ago, was the first to appear in this category. The most influential developments in this area, which suddenly catalysed the development of NoSQL databases, are Google's BigTable and Amazon's paper on Dynamo. Later Facebook, the largest social networking website as of today, came up with its own way of storing and querying data. This is called Cassandra, and it was open sourced. It was developed in order to improve the performance of inbox messages. Just as NoSQL presents new challenges, it also offers significant rewards to those who can successfully incorporate it into their solution portfolio. The key benefits are going to emerge around improved data comprehension, flexible scaling solutions and productivity.

3. DRIVING FACTORS OF NOSQL

There are three key drivers behind the increased interest in NoSQL.

1. A whole new form of traffic profile, driven by what might be referred to as Web 2.0 or the Social Web, which is now maturing over the internet. As the world becomes more connected, it is possible for sites to experience massive variations in traffic. These may be expected or unexpected: on events like Christmas or the World Cup, traffic may increase enormously but predictably, whereas during unexpected events like the 9/11 attacks many websites faced a gruesome challenge in serving the traffic, and most of them actually failed to do so.

2. The second factor is that data changes over time. As the business model evolves, concepts and data models often struggle to evolve and keep pace with the changes.

3. The final factor is that NoSQL technology is now starting to become a commodity. Once, an Amazon or a Google had no choice but to create a solution for themselves that answered their problems of scale. Now the Apache Foundation and other open source groups provide community-driven support and development, which has led to the possibility of small businesses using extremely sophisticated code, as it is provided in a simpler way by abstracting the complexity through RESTful APIs.

Some use cases that show the need for NoSQL databases:

• An application generating a large amount of temporary data which is not actually part of the main database, such as shopping carts, retained searches, personalization and questionnaires left incomplete by users.

• Web applications like Twitter, where people follow each other as well as get followed by somebody. Maintaining these relations in relational tables is a big problem, as it needs redundant data and a lot of joins and foreign keys. This can be done easily with graphs, so we go for graph databases.

• Write-heavy architectures like Twitter, where users write content to the data store at a rate of about 600 tweets per second, i.e. that many writes per second. But this is not the only job of that database; it also has to cater to the read requests made to it by third-party applications.

• In Facebook, the number of messages exchanged is about 50 million per day, which is more than the number of mails exchanged over the internet.

4. TYPES OF NOSQL SOLUTIONS

The different types of NoSQL databases currently available are:

1. Column-Oriented Databases store all the data of a column sequentially on disk, so all the data for that column can be read very quickly. Some examples of this kind of database are SenSage, SybaseIQ and FluidDB. A typical application scenario is statistics.

2. Document-Oriented Databases eliminate the “gaps” left by lots of empty fields in database records. The lack of enforced structure makes them very flexible. Examples are CouchDB, MongoDB, etc. Typical applications are most web applications. Tolerance of incomplete data is their strength.

3. Key-Value Stores maintain the relation between keys and their corresponding values. These values may in turn be documents, etc. Major examples in this area are Tokyo Cabinet/Tyrant, Redis, Voldemort and Oracle BDB.

4. Graph Databases use graph theory concepts to store the data. Data is stored as nodes, and the relations between them are realized in the form of edges. Examples of graph databases are Neo4J, InfoGrid and Infinite Graph. Typical applications include social networking and recommendations. Their strengths include graph algorithms, e.g. shortest path, connectedness, n-degree relationships, etc.

5. Distributed Databases are databases in which data may be stored on multiple computers located in the same physical location, or dispersed over a network of interconnected computers. Examples: Cassandra, HBase, Riak. Typical applications include the implementation of distributed file systems. Fast lookups and good distributed storage of data are their strengths.

5. TARGETED USERS

As said earlier, different types of NoSQL databases have been developed to cater to the specific needs of users. The decision to use this kind of database also depends upon the user base and the frequency of usage. Users also need to drill down into the points of concern of their application and the extent of relaxation they can allow on factors like reliability or fault tolerance. If it is going to be a safety-critical system, an RDBMS may be the best choice. Choosing NoSQL means solving a problem that relational databases are a bad fit for. SQL will be around for a while. It's good at doing what it was designed to do. However, there are many times when people use SQL simply because there is nothing better out there. As data complexity rises, new methods for accessing and persisting that data will have to be investigated.
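
As a small illustration of how a different data model can fit a problem better, the follower relations from the Twitter use case in Section 3 can be held as nodes and edges, in the way Section 4 describes for graph databases. The sketch below is plain in-memory Python and not the API of any graph database product; the names FollowGraph, follow, followers_of and degrees_between are hypothetical and chosen only for this example.

```python
from collections import deque


class FollowGraph:
    """Toy in-memory graph: users are nodes, 'follows' relations are directed edges."""

    def __init__(self):
        # Maps each user to the set of users they follow.
        self._follows = {}

    def follow(self, follower, followee):
        """Add a directed edge follower -> followee, creating nodes as needed."""
        self._follows.setdefault(follower, set()).add(followee)
        self._follows.setdefault(followee, set())

    def followers_of(self, user):
        """Return every user that has an edge pointing at `user`."""
        return {u for u, out in self._follows.items() if user in out}

    def degrees_between(self, start, target):
        """Breadth-first search: fewest 'follows' hops from start to target, or None."""
        if start not in self._follows or target not in self._follows:
            return None
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            user, hops = queue.popleft()
            if user == target:
                return hops
            for nxt in self._follows[user]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, hops + 1))
        return None


graph = FollowGraph()
graph.follow("alice", "bob")
graph.follow("bob", "carol")
graph.follow("carol", "alice")
graph.follow("dave", "carol")
```

Here `graph.followers_of("carol")` walks the edges directly, and `graph.degrees_between("alice", "carol")` is the kind of n-degree relationship query that would otherwise need repeated self-joins on a relational follower table.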
Most social networking, music streaming and video streaming websites can use these NoSQL databases, as eventual consistency is sufficient rather than strict overall consistency. Read and write access for these applications will be on the order of millions of requests per day, and the size of the data will be on the order of petabytes. Moreover, a user cannot afford to miss a comment, but will be fine if a comment shows up on the second refresh even though it had already been made at the time of the first refresh. This is not the case for stocks, safety-critical systems like air traffic control databases, orders through secure payment, etc. NoSQL databases are meant not for 100% transactional reliability but for scalability and performance. Not all organizations or applications will have 100 million users accessing them at the same time. So data-intensive organizations and applications can use NoSQL databases, while normal, predictable usage with limited records can be supported more efficiently by an RDBMS.

6. FEATURES OF NOSQL

Horizontal Scalability: Most NoSQL databases, such as document-oriented databases, key-value stores and distributed databases, are designed for horizontal scalability. This means that as your database grows, you can simply add more hardware, or more resources from the cloud. This can be achieved because these databases operate on something similar to Distributed Hash Tables (DHTs). DHTs store key/value pairs in hash buckets. Each bucket holds a number of key/value pairs indexed by a “hash value”. This hash value is a number generated from the data in such a way that all key/value pairs are distributed evenly among the hash buckets. For example, if the DHT has 5 hash buckets and 50 key/value pairs are stored, each hash bucket should have about 10 key/value pairs. One of the advantages here is that this is extremely easy to parallelize. If the application demands more database servers, adding more hash buckets solves the problem. As the database grows, just add more servers, and none of them needs to be a supercomputer. This is what a “horizontally scalable” system means.

One of the problems with SQL servers is that they don't work well in a cloud. As databases grow and as traffic increases, larger and faster computers are required. Load balancing can be achieved by mirroring the servers, but they still need to be large and fast. This just doesn't fit with the cloud model. NoSQL servers, on the other hand, can simply add more nodes.

Schema-Less: Another differentiating feature of NoSQL databases is that they are schemaless. This is somewhat different from the design of traditional databases, so it can be hard to digest for users who have worked with relational databases for a long time. Instead of each record existing in a row of carefully designed columns, each record exists in a document. This is like a file on the file system. Each document can store any amount of data without following a schema. Though these documents are schemaless, they're not freeform. Many databases opt to use open formats such as JSON, which helps you store key/value pairs in a formatted way. A document can have any number of key/value pairs, and these entries need not be fixed.

7. CONFLICTS OF NOSQL WITH SQL

ACID Properties

• The major appeal of relational databases is that they make the ACID promise:

– Atomicity - a transaction should either commit or roll back as a whole.
– Consistency - only valid data should be written to the database.
– Isolation - even when transactions run concurrently, the result should be as if they had executed serially, and one transaction should not be able to see another's uncommitted writes.
– Durability - once committed or written, data should be persistent.

• The problem with ACID is that it provides much more than what many applications require. It trips the application up when one tries to scale a system across multiple nodes. Downtime is unacceptable, so the system needs to be reliable. Reliability requires multiple nodes to handle machine failures. To make scalable systems that can handle lots and lots of reads and writes, many more nodes are required.

• Scaling ACID across many machines causes problems with network failures and delays. The algorithms don't work in a distributed environment at any acceptable speed.

Brewer's CAP Theorem

Most NoSQL databases compromise on ACID properties in order to provide high performance and scalability. In this process, out of the following three properties only two can be satisfied by a given database at the same time.

• Consistency: The data should be the same at all times. In simple words, “what you write is what you see”. A system is consistent if an update is applied to all relevant nodes at the same logical time.

• Availability: All operations on the data store should return successfully. For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response.

• Partition-Tolerance: The capability of a system to keep operating and respond to client requests even when one or more nodes in the network fail to transport messages.

From ACID and the CAP theorem, the following conclusions can be drawn about such database systems; together they are given the acronym BASE.

• Basically Available - The system should be up and running all the time.
• Soft State - Consistency can be compromised at a given point of time.

• Eventually Consistent - But the system should become consistent at some later time.

Some big applications, such as the systems built by Google, Yahoo, eBay, Facebook, Amazon, etc., are being built using CAP and BASE properties.

Eg: In Facebook, a person can afford to get a comment on his second refresh even though it had been made by the time he refreshed first. So the overall system should be consistent, but consistency can be relaxed at a given point of time.

8. CHALLENGES INVOLVED IN NOSQL

In the process of incorporating a NoSQL strategy into large-scale monolithic systems with relational data storage, the systems may actually depend on certain properties of relational data, for example data types or transactional consistency; in that case the problem is much harder. In some ways, decoupling data provision needs to be the first task, rather than migrating data storage.

During migration there must be a clear analysis of what data is truly relational and what is currently stored in relational stores only due to the lack of alternatives. It is also important to review historic decisions to see if they were made with historical constraints in mind.

Businesses mine information in corporate databases to improve their efficiency and competitiveness, and business intelligence (BI) is a key issue in Information Technology for all medium to large companies. NoSQL databases offer few facilities for ad-hoc query and analysis. Even a simple query requires significant programming expertise, and commonly used BI tools do not provide connectivity to NoSQL.

There is a lack of experienced experts in the area of NoSQL databases, because most NoSQL developers are still in learning mode, whereas it is easy to find experienced RDBMS programmers.

9. CONCLUSION

Ease of use is a key concern for any kind of system. Similarly, in this case the simpler setup and ease of use of NoSQL databases is attracting a lot of users and developers. Traditional RDBMS/SQL needs more stringent rule building, as in building schemas and defining key constraints, but those schemas are exactly what give relational databases higher performance opportunities in parallel DBMS implementations.

NoSQL offers the chance to think differently about data, i.e. how to store and query the data efficiently. It allows users to understand what is required and what can be ignored while building a database. NoSQL is just another tool in the application developer's toolbox. Just as a hammer cannot be used for every job around the house, a relational database cannot be expected to be the right choice for every data access application. NoSQL implementations solve a data storage problem that relational databases were not really designed to address. So the bottom line is that NoSQL is not a technology that is going to replace relational database systems except in very specialized instances. Relational databases will continue to work efficiently for the tasks they were designed for, and NoSQL databases will serve the applications which require features that are Not Only SQL.

10. REFERENCES

[1] The end of SQL and relational databases. http://blogs.computerworld.com/15510/the_end_of_sql_and_relational_databases_part_1_of_3, 2010.
[2] Why Enterprises Are Uninterested in NoSQL - Communications of the ACM. http://cacm.acm.org/blogs/blog-cacm/99512-why-enterprises-are-uninterested-in-nosql/fulltext, 2010.
[3] CAP Theorem, Eventual Consistency, NoSQL. http://venublog.com/2010/04/07/cap-theorem-eventual-consistency-nosql/, 2010.
[4] My Thoughts on NoSQL. http://www.eflorenzano.com/blog/post/my-thoughts-nosql/, 2009.
[5] The dark side of NoSQL. http://codemonkeyism.com/dark-side-nosql/, 2009.
[6] CAP Confusion: Problems with ‘partition tolerance’ - Cloudera - Apache Hadoop for the Enterprise. http://www.cloudera.com/blog/2010/04/cap-confusion-problems-with-partition-tolerance/, 2010.
[7] NoSQL: A Modest Proposal. http://voodootikigod.com/nosql-a-modest-proposal, 2009.
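
As a closing illustration, the hash-bucket scheme described under Horizontal Scalability in Section 6 can be sketched in a few lines. This is a minimal single-process sketch, assuming SHA-256 as the hash function and simple modulo placement; the class HashBuckets and its methods are hypothetical names for this example and do not correspond to any particular NoSQL product. The stored values are schemaless dictionaries, in the spirit of the JSON documents mentioned in the same section.

```python
import hashlib


class HashBuckets:
    """Toy stand-in for a DHT: keys are spread over a fixed number of buckets."""

    def __init__(self, num_buckets):
        # Each bucket plays the role of one server holding key/value pairs.
        self.buckets = [dict() for _ in range(num_buckets)]

    def _index(self, key):
        # A stable hash of the key decides which bucket owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.buckets)

    def put(self, key, value):
        self.buckets[self._index(key)][key] = value

    def get(self, key, default=None):
        return self.buckets[self._index(key)].get(key, default)

    def sizes(self):
        """Number of key/value pairs currently held by each bucket."""
        return [len(bucket) for bucket in self.buckets]


# 50 key/value pairs over 5 buckets, as in the example in Section 6.
store = HashBuckets(num_buckets=5)
for i in range(50):
    # Documents need not share the same fields.
    doc = {"id": i} if i % 2 else {"id": i, "nickname": f"user{i}"}
    store.put(f"user:{i}", doc)

print(store.sizes())
```

The printed sizes should each be in the neighbourhood of 10, though not exactly equal, since a hash only spreads keys evenly on average. With plain modulo placement, changing the number of buckets relocates most keys; Dynamo-style systems use consistent hashing to limit that movement, which this sketch does not attempt.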
