
A WHITE PAPER

Neo Technology

NOSQL for the Enterprise


November 2011

Summary
Businesses are struggling to cope with an explosion of data, growing at 40% per year (McKinsey Global Institute). The business need to leverage complex and connected data is driving the adoption of scalable, high-performance NOSQL databases to expand and enhance data management strategies.

Why NOSQL? What is missing from traditional relational databases that would create a need for a new breed of database solutions? The answer lies in the types of data being managed, the volume of information available today, and the relationships between individual records. Ironically, relational databases were not designed to navigate the types of relationships that bring enormous value to many of today's enterprise applications.

A number of NOSQL databases have emerged over the past decade, not only to handle the terabytes and petabytes of data generated by enterprises and consumers, but also to handle the types of data created. NOSQL databases contain structured, semi-structured and unstructured data such as text, audio, video, social network feeds, Web logs and more that cannot be managed in traditional databases. However, the first NOSQL databases were designed for high-volume Web properties, such as Amazon and Google, which needed to store a lot of data; they were not necessarily designed for the high-transaction, business-critical needs of today's enterprise applications.

Meetings with hundreds of developers, architects and CIOs at Fortune 500 companies have surfaced a few key enterprise requirements for a NOSQL database. Not surprisingly, many of them are characteristics that have been proven for years by traditional enterprise-strength databases. A NOSQL database should:

- Enable high-performance queries on complex, connected data
- Easily represent the complex, connected data stored in today's applications
- Demonstrate mature support for end-to-end transactions
- Ensure enterprise-grade durability
- Provide native support for Java, the most widely used platform in today's enterprises

Today's NOSQL databases include Key-Value stores, Column Family databases, Document databases, and Graph databases. Each type stores data in a different way and is designed with a specific purpose in mind. The type of NOSQL database you choose depends on what type of data you need to store and how you want to access it. A graph database, for instance, models real-world connections better than other NOSQL databases. It can support today's complex and connected data types, and scale to billions of nodes and relationships. It is ideally suited for any application where knowledge is obtained from relationships. An enterprise delivering modern applications needs a database that can manage today's complex and connected data while still delivering the enterprise strength, transactions and durability IT departments have counted on for years.

2011 Neo Technology


The Rise of NOSQL


Businesses are struggling to cope with an explosion of data, growing at 40% per year (McKinsey Global Institute). The business need to leverage complex and connected data is driving the adoption of scalable, high-performance NOSQL databases to expand and enhance data management strategies. What is missing from traditional relational databases that would create a need for a new breed of database solutions? The answer lies in the types of data being managed, the volume of information available today, and the need to connect individual records through relationships. Ironically, relational databases were not designed to manage the types of relationships that are so essential in today's applications.

The relational database represents one of the most important developments in the history of computer science. Upon its arrival over 40 years ago, it revolutionized the way the industry views data management, and today it is practically ubiquitous. However, in some cases relational database technology shows its age. The relational database model is built to handle structured data, that is, data with a defined and complete schema in tables, and it does that very well. The problem is that a lot of the data used today is not rigidly structured.

"Big Data" has emerged as a term that refers not only to the massive amounts of data generated by enterprises and consumers (terabytes and petabytes of information) but also to the types of data created: structured, semi-structured and unstructured data such as text, audio, video, social network feeds and Web logs that cannot be managed in traditional databases. Many organizations have introduced complex (semi-structured and/or unstructured) data to their applications, but because they are so heavily invested in their relational database, they believe that the only way to model this new data is to force it into a tabular structure and then try to work around it in upper layers.
For complex data, the relational database offers poor runtime characteristics. Today's data is highly interrelated, and the relationships, or connections, between records are important. Traditional databases were designed to perform set-based operations, that is, to consider an entire data set when performing a query and then filter down to reach a conclusion. They were not optimized for today's applications, which want to see how individual records relate to one another. In a relational database, querying at that level is costly and comes with a performance penalty.

Over the past few years, some of the best-known Web properties felt they had no option but to build their own custom NOSQL databases to manage huge volumes of ever-changing data. Amazon's Dynamo and Google's Bigtable are examples of homegrown databases that can store lots of data, although such systems at times sacrifice consistency for availability. Not every company can design its own custom NOSQL database, and so a few categories of open source and commercially available NOSQL databases have emerged. But NOSQL databases built to manage these large Web properties are not necessarily designed for the majority of enterprise applications. The enterprise has seen the same explosion in data complexity and volume as the Web world, yet few of the NOSQL databases available today can meet the demands of the enterprise.

What are the Enterprise Needs for NOSQL Databases?


As enterprises introduce new interactive applications, from banks offering self-service applications to retailers suggesting additional products based on a customer's business network, they expect their database to perform much as it did before, even though their data is much more complex and connected. Meetings with hundreds of developers, architects and CIOs at Fortune 500 companies have surfaced a few essential enterprise requirements for a NOSQL database. Not surprisingly, many are characteristics that have been proven for years by traditional enterprise-strength databases.

Ability to Handle Today's Complex and Connected Data


The biggest difference between a relational database and a NOSQL database is the ability to store not only huge volumes of data, but also data types that are complex and connected. In other words, data such as audio, video, social network feeds, Web logs, email, documents, and other text-centric information is very difficult, if not impossible, to squeeze into the confines of a traditional relational database. A NOSQL database should enable high-performance queries on the complex, connected data inherent in today's applications. Users should be able to ask questions such as "Who are all my contacts in Europe?" and "Which of my contacts ordered from this catalog?"

Simplify the Development of Applications Using Complex and Connected Data


A NOSQL database should be able to easily represent the complex and connected data that makes up today's enterprise applications. Unlike the rigid schema of a traditional database, a flexible schema that allows for multiple data types enables developers to easily change applications without disrupting live systems. More collaborative development practices such as Agile have replaced waterfall processes, and databases must be flexible and adaptable to keep the lights on amid constantly changing infrastructures.

Support for End-to-End Transactions


Surprisingly few of the NOSQL databases commercially available today can conduct all-or-nothing transactions the way traditional databases do. Although this is a must-have for relational databases, not all NOSQL databases can do it. Enterprise developers want to be able to group operations and have all of them succeed or none at all. An example is a transfer that takes $100 out of one bank account: the database should confirm that the $100 has been deposited into another account before committing the transaction to the database log. Twitter will probably survive if a single Tweet is lost, but an enterprise application such as online banking cannot afford such a mistake.

A NOSQL database for the enterprise should support ACID transactions, including XA-compliant distributed two-phase commits. The connections between data should be stored on disk, in a structure designed for high-performance retrieval of connected data sets, all while enforcing strict transaction management. This design delivers significantly better performance for connected data than relational database technologies offer.
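The all-or-nothing bank transfer described above can be sketched in a few lines. This is only an illustration of the transactional contract, not of any particular NOSQL product: it assumes Python's built-in sqlite3 module as a stand-in store, and the table name `accounts` and the helpers `make_bank`, `transfer` and `balances` are hypothetical names chosen for this example.

```python
import sqlite3


def make_bank():
    """Create an in-memory store with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 150), ("bob", 20)]
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as one all-or-nothing unit."""
    try:
        # The connection context manager commits if the block succeeds and
        # rolls back every statement in it if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("unknown source account or insufficient funds")
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                raise ValueError("unknown destination account")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

If the deposit step fails, for instance because the destination account does not exist, the withdrawal that already ran inside the same block is rolled back, so no money disappears.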

Enterprise-grade Durability so that Data is Never Lost


A NOSQL database for the enterprise needs enterprise-grade durability that ensures any transaction committed to the database will not be lost. In database systems, durability is the ACID property that ensures committed transactions will be there, no matter what. In other words, if you book an airline ticket and the system goes down, that seat should still be booked after the system is recovered. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any software or hardware failures. Some NOSQL databases tout single-machine durability, but how can a business-critical application put all its eggs in one basket? Relational databases have employed replication for years to guarantee enterprise-strength durability. NOSQL databases should also be able to ensure durability.
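The role of a transaction log in durability can be illustrated with a minimal sketch. It assumes a single append-only log file on one machine, which is exactly the single-machine case the text warns is not sufficient on its own; replication is out of scope here. The class name `DurableStore` and its methods are hypothetical.

```python
import json
import os


class DurableStore:
    """Toy key-value state backed by an append-only transaction log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()
        self._log = open(log_path, "ab")

    def _replay(self):
        """Rebuild state from the log, discarding a torn final record."""
        if not os.path.exists(self.log_path):
            return
        good_bytes = 0
        with open(self.log_path, "rb") as log:
            for raw in log:
                if not raw.endswith(b"\n"):
                    break  # incomplete tail: that commit was never acknowledged
                try:
                    record = json.loads(raw.decode("utf-8"))
                except ValueError:
                    break
                self.data.update(record)
                good_bytes += len(raw)
        # Cut off any partial record so later commits start on a clean line.
        with open(self.log_path, "r+b") as log:
            log.truncate(good_bytes)

    def commit(self, changes):
        """Write the changes to the log and force them to disk before applying."""
        self._log.write((json.dumps(changes) + "\n").encode("utf-8"))
        self._log.flush()
        os.fsync(self._log.fileno())
        self.data.update(changes)

    def close(self):
        self._log.close()
```

Because `commit` returns only after the record has been flushed and synced, a booking such as `store.commit({"seat:12A": "booked"})` is rebuilt from the log when a new `DurableStore` is opened on the same path after a crash.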

Java Still Reigns for Enterprise Development


To be serious about enterprise development, a NOSQL database must support Java. Java remains the most prevalent programming language in today's enterprises. Developers need a Java-friendly way to handle complex, connected data using the transactional guarantees necessary for critical business applications. While hooks to other languages such as Ruby, Python and Groovy are convenient, a NOSQL database must first and foremost support Java to be a serious contender in the enterprise arena.

Emerging Categories of NOSQL Databases


There are four emerging categories of NOSQL databases available today: Key-Value stores, Column Family databases, Document databases and Graph databases. Each was designed to accommodate the huge volumes of data stored today as well as the new data types that are not easily stored within the confines of a traditional relational database. The type of NOSQL database you choose should be based on the type of data you need to store, its size and complexity.

Key-Value Stores are the Simplest of NOSQL Databases


A Key-Value data model is simple: it stores data in key and value pairs, where every key maps to a value. It can scale across many machines, but it cannot support richer data types. A Key-Value store is ideal for applications that require massive amounts of simple data, like sensor data, or for rapidly changing data such as stock quotes. Key-Value stores support massive data sets of very primitive data (hence the term "store" and not "database"). They are ideal for capturing time-series data, like every vital statistic from your morning run, and everyone else's morning run, over the last decade. Amazon's Dynamo was built as a Key-Value store.
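A minimal sketch of the Key-Value model shows why it scales across machines so easily: each key can be routed to a partition by hashing, with no knowledge of any other key. The sketch uses plain modulo hashing over in-memory dictionaries as stand-ins for machines; it is not the partitioning scheme of Dynamo or any other product, and the class name `PartitionedKV` is hypothetical.

```python
import hashlib


class PartitionedKV:
    """Toy key-value store spread over a fixed number of partitions."""

    def __init__(self, n_partitions):
        # Each dict stands in for one machine.
        self.partitions = [{} for _ in range(n_partitions)]

    def _partition_for(self, key):
        """Pick a partition from a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self._partition_for(key)].get(key, default)
```

The store can answer "what is the value for this key?" and nothing else: there is no way to ask how one value relates to another without fetching both and comparing them in application code.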

Column Family Databases Store Large Amounts of Data, But Not Rich Data
A Column Family database can handle semi-structured data because, in theory, every row can have its own schema. It has few mandatory attributes and few optional attributes. It's a powerful way to capture semi-structured data, but it often sacrifices consistency for availability. Column Family databases can accommodate huge amounts of data, with basic organization to help sift through the information. Writes are faster than reads, so one natural niche is real-time data analysis. Logging real-time events is a perfect use case, as is any situation where you need random, real-time read/write access to your Big Data. Google's Bigtable was built as a Column Family database. Apache Cassandra, which was originally developed at Facebook to store billions of columns per row, is another example. However, this type of database is unable to support unstructured data types or end-to-end transactions.
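The idea that every row can carry its own set of columns can be sketched as a mapping from row keys to sparse column dictionaries. This is an illustrative in-memory model only, not the storage layout of Bigtable or Cassandra; the class name `ColumnFamily` and the sample event fields are hypothetical.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: each row holds only the columns it actually uses."""

    def __init__(self):
        self.rows = defaultdict(dict)  # row key -> {column name: value}

    def insert(self, row_key, columns):
        """Add or overwrite columns on a row; other rows are unaffected."""
        self.rows[row_key].update(columns)

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def columns_of(self, row_key):
        """Return the column names present on this particular row."""
        return sorted(self.rows.get(row_key, {}))
```

Logging two events with different shapes, for example one with only `ts` and `level` and another that also carries `stack`, requires no schema change: the second row simply has one more column than the first.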

Document Databases Store Multiple Data Types, But Lack Transaction Support
A document database contains a collection of key-value pairs stored in documents. While it is good at storing documents, it was not designed with enterprise-strength transactions and durability in mind. Document databases are the most flexible of the key-value style stores, perfect for storing a large collection of unrelated, discrete documents. A good application would be a product catalog, which can display individual items, but not related items. You can see what's available for purchase, but you cannot connect it to what other products similar customers bought after they viewed it. MongoDB and CouchDB are examples of document databases.
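The product catalog example can be sketched as a collection of self-contained documents looked up by identifier or by field value. This is a simplified in-memory model and does not reflect the API of MongoDB or CouchDB; the class name `DocumentCollection` and the sample products are hypothetical.

```python
import copy


class DocumentCollection:
    """Toy document collection: independent documents keyed by id."""

    def __init__(self):
        self._docs = {}

    def save(self, doc_id, document):
        # Store a copy so later changes by the caller do not leak in.
        self._docs[doc_id] = copy.deepcopy(document)

    def get(self, doc_id):
        doc = self._docs.get(doc_id)
        return copy.deepcopy(doc) if doc is not None else None

    def find(self, **criteria):
        """Return ids of documents whose top-level fields match all criteria."""
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        )
```

Each document stands alone, so listing every product in a category is a simple scan with `find(category="audio")`. Answering "what did similar customers buy after viewing this item?" would require links between documents that this model does not represent.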

Graph Databases Show the Connections Between Data


A graph database uses nodes, relationships between nodes and key-value properties instead of tables to represent information. This model is typically substantially faster for associative data sets and uses a schema-less, bottom-up model that is ideal for capturing ad hoc and rapidly changing data. Much of today's complex and connected data can easily be stored in a graph database where there is great value in the relationships among data sets.

Figure 1: In a graph database, everything is represented by nodes, relationships and properties.

A graph database accesses data using traversals. A traversal is how you query a graph, navigating from starting nodes to related nodes according to an algorithm, finding answers to questions like "What music do my friends like that I don't yet own?" or "If this power supply goes down, what Web services are affected?" Using traversals, you can easily conduct end-to-end transactions that represent real user actions.
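The first traversal question above can be sketched with a small in-memory graph of nodes, typed relationships and properties. This is an illustration of the idea only and is not Neo4j's API; the class `Graph`, the function `music_to_discover` and the relationship types `FRIEND`, `LIKES` and `OWNS` are hypothetical names.

```python
class Graph:
    """Toy property graph: nodes with properties and typed relationships."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # (node id, relationship type) -> target node ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def relate(self, src, rel_type, dst):
        self.edges.setdefault((src, rel_type), []).append(dst)

    def neighbors(self, node_id, rel_type):
        """Follow one relationship type outward from a node."""
        return self.edges.get((node_id, rel_type), [])


def music_to_discover(graph, person):
    """Albums liked by the person's friends that the person does not own."""
    owned = set(graph.neighbors(person, "OWNS"))
    liked_by_friends = set()
    for friend in graph.neighbors(person, "FRIEND"):
        liked_by_friends.update(graph.neighbors(friend, "LIKES"))
    return sorted(liked_by_friends - owned)
```

The traversal touches only the starting node, its friends and their liked albums. Its cost depends on the size of that neighborhood rather than on the total number of people or albums stored.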


Neo4j is the leading graph database available today, and includes enterprise-ready support for complex and connected data, transactions, durability, and Java.

To summarize, there are four categories of NOSQL databases: Key-Value stores, Column Family databases, Document databases, and Graph databases. The type of NOSQL database you choose depends on what type of data you need to store, its size and complexity. Each was designed to handle today's data that could not be successfully managed in traditional databases. Key-Value stores and Column Family databases handle size very well, but when it comes to complexity, Document databases and Graph databases are better suited to represent rich data. Graph databases are ideal for applications with complex and connected data.

Figure 2: NOSQL Databases are designed for the size and complexity of today's data.

In the Enterprise, There is Value in Relationships


A graph database models real-world connections better than other NOSQL databases. It can support today's complex and connected data types, and scale to billions of nodes and relationships. It is ideally suited for any application where knowledge is obtained from relationships. For example, you may want to know which of your customers on the East Coast have made a purchase in the last six months AND will be attending an upcoming conference. The ability to cross-reference these data points gives you much more context on an individual customer than a single record does. Take it a step further and you can find out more about an individual customer, such as whether you have worked in a similar industry or play soccer on the weekends, all of which you can reference when you meet in person at the show. A NOSQL graph database can easily perform these queries without impacting performance or being as cost-prohibitive as traditional databases. Graph databases were designed to quickly and easily compare how individual records relate to one another.

Enterprise Use Cases for NOSQL Graph Databases


A graph database is ideal for any enterprise application that has structured and unstructured data and relies on the relationships between records, such as:

Master Data Management
Today's Fortune 500 companies need a database that can easily overlay information to make business-critical decisions. Imagine the ramifications if an enterprise found that its traditional database could not handle the joins when mapping its sales force coverage to its customer base: if the system could not recognize which sales representative was responsible for deals in the Southeast region, he or she might not be compensated for a sale. A graph database can capture complex inter-relationships directly in a graph and keep the lights on for such revenue-impacting systems.

Network Data Management
While the cloud is a promising new way to utilize computing resources, adding yet more layers to be managed presents a significant challenge. Managing the towering hierarchies of applications, services, switches, servers and power requires a focus on how things are connected. A graph database helps perform what-if analysis, and can respond in real time to a changing topology of networked entities.

Social Networks
Social networks are not just the Facebooks of today; they are now a part of enterprise applications where a buyer would like to know what other companies used this product, or what other products they also purchased. The relationships among entities are a natural application for a graph database. Every user wants their own view on the world, resulting in extremely localized queries of the data. With a graph database, local queries are always efficient, no matter how many users are added to the entire set.

Recommendation Engines
Recommendations are increasingly prevalent in today's enterprise applications. In one highly collaborative, global application, creative marketing teams wanted to share and query similar assets. With a graph database, users can quickly and easily find out which assets their peers used and get recommendations on what they want even before they ask for it.
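The what-if analysis mentioned under Network Data Management can be sketched as a breadth-first traversal over "depends on" relationships. The sketch assumes a deliberately simple model in which a component is affected as soon as anything it depends on fails, with no redundancy. The function name `affected_by` and the component names in the usage note are hypothetical.

```python
from collections import deque


def affected_by(depends_on, failed):
    """Return every component that directly or indirectly depends on `failed`.

    `depends_on` maps each component to the components it relies on.
    """
    # Invert the relationships: component -> components that rely on it.
    dependents = {}
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents.setdefault(requirement, set()).add(component)

    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent != failed and dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)
```

Given a topology in which `server-1` depends on `psu-A` and `checkout-service` depends on `server-1`, calling `affected_by(topology, "psu-A")` walks outward from the failed power supply and reports both the server and the service, while components wired only to other power supplies are never visited.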


NOSQL for the Enterprise


NOSQL has emerged to manage new data types, huge volumes of data and the relationships between the complex and connected data inherent in modern applications. The type of NOSQL database you choose depends on what type of data you need to store and how you want to access it. Each of the NOSQL databases discussed serves a specific purpose. A graph database models real-world connections better than other NOSQL databases. It can support today's complex and connected data types, and scale to billions of nodes and relationships. It is ideally suited for any application where knowledge is obtained from relationships.

NOSQL databases often coexist with traditional relational databases. That's why the term NOSQL has evolved to mean "Not Only SQL." Enterprises are too big and too complex for a one-size-fits-all solution. Transactions span multiple data stores and need seamless integration. But when evaluating a NOSQL database, it is critical to demand enterprise readiness. An enterprise delivering modern applications needs a NOSQL database that can manage today's complex and connected data while still delivering the enterprise strength, transactions and durability IT departments have relied on for years.


About Neo Technology

Neo Technology is the NOSQL database company for the enterprise. Proven by eight years of 24/7 production use, Neo4j is a fully transactional database that enables customers, including Adobe and Cisco, to tackle complex data problems. Neo Technology is a privately held company funded by Fidelity Growth Partners Europe, Sunstone Capital and Conor Venture Partners, and is headquartered in Menlo Park, CA. For more information, visit www.neotechnology.com.

World Headquarters
Neo Technology
1370 Willow Road
Menlo Park, CA 94025 USA
U.S. & Canada: 1 (855) 636-4532

European Lab
Neo Technology
Anckargripsgatan 3
211 19 Malmo, Sweden
Tel: 0808-189 0493

www.neotechnology.com
Copyright 2011 Neo Technology. All rights reserved.
