Introduction to Big Data and NoSQL

SQL Azure Saturday, April 21, 2012


Don Demsak
Advisory Solutions Architect, EMC Consulting
www.donxml.com

Meet Don
Advisory Solutions Architect, EMC Consulting
Application Architecture, Development & Design

DonXml.com
Twitter: donxml
Email: don@donxml.com
SlideShare: http://www.slideshare.net/dondemsak

The era of Big Data

How did we get here?


Expensive
Processors
Disk space
Memory
Operating Systems
Software
Programmers

Monoculture
Limit CPU cycles
Limit disk space
Limit memory
Limited OS development
Limited software
Programmers: mono-lingual, mono-persistence

Typical RDBMS Implementations


Fixed table schemas
Small but frequent reads/writes
Large batch transactions
Focus on ACID: Atomicity, Consistency, Isolation, Durability

How we scale RDBMS implementations

1st Step: Build a relational database

[Diagram: a single database]

2nd Step: Table Partitioning

[Diagram: one database whose table is split into partitions p1, p2, p3]

3rd Step: Database Partitioning

[Diagram: each customer (#1, #2, #3) gets its own stack: Browser, Web Tier, B/L Tier, Database]
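
The per-customer split in this step can be sketched as a small routing table. This is only an illustration under assumptions: the customer ids, the database names, and the `routeByCustomer` helper are hypothetical and do not come from the deck.

```javascript
// Hypothetical mapping of customers to their own databases (names are illustrative).
const customerDatabases = new Map([
  ["customer-1", "db-server-1/customer1"],
  ["customer-2", "db-server-2/customer2"],
  ["customer-3", "db-server-3/customer3"],
]);

// Look up the one database that holds all of a customer's data.
function routeByCustomer(customerId) {
  const target = customerDatabases.get(customerId);
  if (target === undefined) {
    throw new Error(`No database registered for ${customerId}`);
  }
  return target;
}

const target = routeByCustomer("customer-2");
```

Every request carries the customer id, so the business-logic tier only ever talks to one database per request.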

4th Step: Move to the cloud?

[Diagram: the same per-customer stacks (Browser, Web Tier, B/L Tier), with each database replaced by a SQL Azure Federation]

There have to be other ways

Polyglot Persistence

Polyglot Programmer

Where Did NoSQL Originate?


1998 - Carlo Strozzi
NoSQL project - lightweight open-source relational DB with no SQL interface

2009 - Eric Evans (Rackspace) & Johan Oskarsson (Last.fm) wanted to organize an event to discuss open-source distributed databases

NoSQL (loose) Definition


(often) Open source
Non-relational
Distributed
(often) don't guarantee ACID

Atlanta 2009
No:sql(east) conference
select fun, profit from real_world where relational=false

Billed as the conference of no-rel datastores

Types Of NoSQL Data Stores

5 Groups of Data Models


Relational

Document

Key Value

Graph

Column Family

Document Store
Apache Jackrabbit
CouchDB
MongoDB
SimpleDB

XML Databases
MarkLogic Server
eXist

Document?
Okay, think of a web page...
Relational model requires a column per tag
Lots of empty columns
Wasted space

Document model just stores the pages as is
Saves on space
Very flexible

Graph Storage
AllegroGraph
Core Data
Neo4j
DEX
FlockDB
Microsoft Trinity (research project): http://research.microsoft.com/en-us/projects/trinity/

What's a graph?
A graph consists of
Nodes (stations of the graph)
Edges (lines between them)

FlockDB
Created by the Twitter folks
Nodes = Users
Edges = Nature of relationship between nodes

Key/Value Stores
On disk
Cache in RAM
Eventually Consistent

Weak Definition
If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent

Strong Definition
For a given update and a given replica, eventually either the update reaches the replica or the replica retires

Ordered
Distributed Hash Table allows lexicographical processing

Key/Value Examples
Azure AppFabric Cache
Memcached
VMware vFabric GemFire

Object Databases
db4o
GemStone/S
InterSystems Caché
Objectivity/DB
ZODB

Tabular
BigTable
Mnesia
HBase
Hypertable
Azure Table Storage
SQL Server 2012

Azure Table Storage Demo

Big Data

Big Data Definition


Volumes & volumes of data
Unstructured
Semi-structured
Not suited for Relational Databases

Often utilizes MapReduce frameworks

Big Data Examples


Cassandra
Hadoop
Greenplum
Azure Storage
EMC Atmos
Amazon S3
SQL Azure (with Federations support)

Real World Example


Twitter
The challenges
Needs to store many graphs
Who you are following
Who's following you
Who you receive phone notifications from
etc.

Delivering a tweet requires rapid paging of followers
Heavy write load as followers are added and removed
Set arithmetic for @mentions (intersection of users)

What did they try?


Started with Relational Databases

Tried Key-Value storage of denormalized lists


Did it work?
Nope
Each approach was good at either handling the write load or paging large amounts of data, but not both

What did they need?


Simplest possible thing that would work
Allow for horizontal partitioning
Allow write operations to arrive out of order or be processed more than once
Failures should result in redundant work, not lost work!

The Result was FlockDB


Stores graph data
Not optimized for graph traversal operations
Optimized for large adjacency lists

Adjacency list
List of all edges in a graph
Key is the edge; value is a set of the node end points

Optimized for fast read and write
Optimized for page-able set arithmetic
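
Page-able set arithmetic over adjacency lists can be sketched in a few lines. This is a minimal illustration, not FlockDB's actual storage or query code: it assumes follower lists are kept as ascending-sorted arrays of numeric user ids, and the names `intersectSorted` and `page` are made up for the example.

```javascript
// Intersect two ascending-sorted arrays of user ids with a merge-style walk.
function intersectSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

// Return one page of ids strictly greater than the cursor, plus the next cursor.
function page(sortedIds, cursor, pageSize) {
  const items = sortedIds.filter((id) => id > cursor).slice(0, pageSize);
  const nextCursor = items.length === pageSize ? items[items.length - 1] : null;
  return { items, nextCursor };
}

const followersOfAlice = [2, 5, 8, 13, 21];
const followersOfBob = [3, 5, 8, 21, 34];
const both = intersectSorted(followersOfAlice, followersOfBob); // [5, 8, 21]
const firstPage = page(both, 0, 2); // items [5, 8], nextCursor 8
const secondPage = page(both, firstPage.nextCursor, 2); // items [21], nextCursor null
```

Because the lists are sorted, a caller can resume from the last id it saw instead of re-reading the whole result.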

How Does it Work?


Stores graphs as sets of edges between nodes
Data is partitioned by node
All queries can be answered by a single partition

Write operations are idempotent
Can be applied multiple times without changing the result

And commutative
Changing the order of operands doesn't change the result
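
One way to get writes that are both idempotent and commutative is to stamp each write and keep only the newest state per edge. The sketch below assumes that approach; the field names (`from`, `to`, `state`, `timestamp`), the tie-break rule, and the helpers `applyWrite` and `applyAll` are hypothetical and not taken from FlockDB.

```javascript
// Apply one edge write to the store. Each write carries a sender-assigned timestamp.
function applyWrite(store, write) {
  const key = `${write.from}->${write.to}`;
  const current = store.get(key);
  const wins =
    current === undefined ||
    write.timestamp > current.timestamp ||
    // Deterministic tie-break so equal timestamps still commute: "removed" wins.
    (write.timestamp === current.timestamp &&
      write.state === "removed" &&
      current.state !== "removed");
  if (wins) {
    store.set(key, { state: write.state, timestamp: write.timestamp });
  }
  return store;
}

// Fold a list of writes into a fresh store.
function applyAll(writes) {
  return writes.reduce(applyWrite, new Map());
}

const follow = { from: 1, to: 2, state: "added", timestamp: 100 };
const unfollow = { from: 1, to: 2, state: "removed", timestamp: 200 };

// In order, out of order, and with a duplicate: the final state is the same.
const inOrder = applyAll([follow, unfollow]);
const outOfOrder = applyAll([unfollow, follow, follow]);
```

A retry after a failure just replays the write, which is redundant work rather than lost or corrupted work.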

Working With Big Data

ACID
Atomicity
All or Nothing

Consistency
Valid according to all defined rules

Isolation
No transaction should be able to interfere with another transaction

Durability
Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors

BASE
Basically Available
High availability but not always consistent

Soft state
Background cleanup mechanism

Eventual consistency
Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent.
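
Eventual consistency can be illustrated with two in-memory replicas and a background synchronization pass. This is a toy sketch under assumptions: versions are plain numbers where the higher one wins, and `put` and `synchronize` are hypothetical helpers, not the API of any product named in this deck.

```javascript
// A replica is a Map of key -> { value, version }; the higher version wins.
function put(replica, key, value, version) {
  const current = replica.get(key);
  if (current === undefined || version > current.version) {
    replica.set(key, { value, version });
  }
}

// Background cleanup: copy every entry from each replica to the other.
function synchronize(a, b) {
  for (const [key, entry] of a) put(b, key, entry.value, entry.version);
  for (const [key, entry] of b) put(a, key, entry.value, entry.version);
}

const replicaA = new Map();
const replicaB = new Map();
put(replicaA, "profile:ann", "v1", 1);
put(replicaB, "profile:ann", "v2", 2);

// Soft state: at this point the replicas disagree (A holds v1, B holds v2).
const beforeSync = replicaA.get("profile:ann").value;

// Once updates stop and synchronization has run, both replicas hold v2.
synchronize(replicaA, replicaB);
const afterSyncA = replicaA.get("profile:ann").value;
const afterSyncB = replicaB.get("profile:ann").value;
```

Both replicas stay available for reads and writes the whole time; they simply may return different answers until the synchronization pass catches up.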

Traditional (relational) Approach

[Diagram: Transactional Data Store -> Extract -> Transform -> Load -> Data Warehouse]

Big Data Approach


MapReduce Pattern/Framework
An Input Reader
Map Function: to transform to a common shape (format)
A partition function
A compare function
Reduce Function
An Output Writer
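
The six parts listed above can be wired together in a tiny single-process driver. This is a sketch for illustration only: `runMapReduce` and its option names are hypothetical, and a real framework would run the partitions on separate machines.

```javascript
// A tiny single-process MapReduce driver. All names here are illustrative.
function runMapReduce({ readInput, map, partition, compare, reduce, writeOutput, partitions }) {
  // One bucket per partition; each bucket groups emitted values by key.
  const buckets = Array.from({ length: partitions }, () => new Map());

  // Input reader yields records; map emits [key, value] pairs in a common shape.
  for (const record of readInput()) {
    for (const [key, value] of map(record)) {
      // The partition function picks which bucket receives this key.
      const bucket = buckets[partition(key, partitions)];
      if (!bucket.has(key)) bucket.set(key, []);
      bucket.get(key).push(value);
    }
  }

  // Keys in each bucket are ordered by the compare function, reduced, then written.
  for (const bucket of buckets) {
    const keys = [...bucket.keys()].sort(compare);
    for (const key of keys) writeOutput(key, reduce(key, bucket.get(key)));
  }
}

const results = {};
runMapReduce({
  readInput: () => [{ tags: ["nosql", "azure"] }, { tags: ["nosql"] }],
  map: (doc) => doc.tags.map((tag) => [tag, 1]),
  partition: (key, n) => key.length % n,
  compare: (a, b) => (a < b ? -1 : a > b ? 1 : 0),
  reduce: (key, values) => values.reduce((sum, v) => sum + v, 0),
  writeOutput: (key, value) => { results[key] = value; },
  partitions: 2,
});
// results.nosql is 2 and results.azure is 1
```

The sample job counts tags, so it mirrors the shape of a tag-counting map and reduce pair without needing a database.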

MongoDB Example
> // map function: emit one { count: 1 } per tag on each document
> m = function () {
...   this.tags.forEach(function (z) {
...     emit(z, { count: 1 });
...   });
... };
> // reduce function: sum the counts emitted for one tag
> r = function (key, values) {
...   var total = 0;
...   for (var i = 0; i < values.length; i++)
...     total += values[i].count;
...   return { count: total };
... };
> // execute: write the results to the "myoutput" collection
> res = db.things.mapReduce(m, r, { out: "myoutput" });

MongoDB Demo

Big Data on Azure


Azure Table Storage
Azure Service Bus

SQL Azure Federations


MongoDB on Azure
http://www.mongodb.org/display/DOCS/MongoDB+on+Azure

Hadoop on Azure
https://www.hadooponazure.com/

Using Azure for Computing

[Diagram: a Client, a Master Job/Task Scheduler, several Workers, and their Data stores]

Moving to Event Based Architecture


[Diagram: Web Roles send requests (Req) to a Queue; Worker Roles take requests from the Queue]

Monitor queue length against users' expectations
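
Monitoring queue length against expectations can be reduced to a small calculation. The sketch below is an assumption-laden illustration: `WorkQueue` is an in-memory stand-in for a real cloud queue, and `workersNeeded`, the 2 seconds per message, and the 60-second target are made-up values.

```javascript
// In-memory stand-in for a cloud queue (hypothetical; a real queue service would replace this).
class WorkQueue {
  constructor() {
    this.messages = [];
  }
  enqueue(message) {
    this.messages.push(message);
  }
  dequeue() {
    return this.messages.shift();
  }
  get length() {
    return this.messages.length;
  }
}

// Estimate how many workers are needed to drain the backlog within the target time.
function workersNeeded(queueLength, secondsPerMessage, targetSeconds) {
  if (queueLength === 0) return 1; // keep one worker running when idle
  return Math.ceil((queueLength * secondsPerMessage) / targetSeconds);
}

const queue = new WorkQueue();
// Web roles enqueue requests instead of processing them inline.
for (let i = 0; i < 120; i++) queue.enqueue({ requestId: i });

// 120 messages * 2 s each / 60 s target = 4 workers
const needed = workersNeeded(queue.length, 2, 60);
```

The web tier stays responsive because it only enqueues; the number of worker roles is what changes as the backlog grows or shrinks.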

Aggregate Stores

Visualizing Aggregates
[Diagram: one order spread across relational tables: Customers, Orders, Order Lines, Credit Cards]

ID: 1001
Customer: Ann

Line Items (item, quantity, unit price, line total)
32411234   2  $48  $96
707423234  1  $56  $56
125145     1  $24  $24

Payment Details
Card: AmEx
CC#: 12343
Expiration: 07/2015

Visualizing Aggregates
ID: 1001
Customer: Ann

Line Items (item, quantity, unit price, line total)
32411234   2  $48  $96
707423234  1  $56  $56
125145     1  $24  $24

Payment Details
Card: AmEx
CC#: 12343
Expiration: 07/2015

Stored as a single aggregate:
{ SalesOrdersView: { ID: 1001, Customer: Ann, LineItems: [...], ... } }
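
The aggregate on this slide can be written out as one document. This is a sketch under assumptions: the slide only names ID, Customer, and LineItems, so the other field names are illustrative, and the line-item columns are read here as quantity and unit price.

```javascript
// The sales order as one aggregate: everything needed to show the order lives in one document.
// Field names beyond id, customer, and lineItems are illustrative assumptions.
const salesOrder = {
  id: 1001,
  customer: "Ann",
  lineItems: [
    { item: "32411234", quantity: 2, unitPrice: 48 },
    { item: "707423234", quantity: 1, unitPrice: 56 },
    { item: "125145", quantity: 1, unitPrice: 24 },
  ],
  payment: { card: "AmEx", ccNumber: "12343", expiration: "07/2015" },
};

// No joins: the total is computed from the aggregate itself.
function orderTotal(order) {
  return order.lineItems.reduce(
    (sum, line) => sum + line.quantity * line.unitPrice,
    0
  );
}

const total = orderTotal(salesOrder); // 96 + 56 + 24 = 176
```

Reading or writing the order touches a single document, which is what makes aggregates easy to partition across machines.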

MongoDB on Azure Demo

Next Steps
Learn a NoSQL product
Great places to start: AppFabric Cache, Azure Table Storage, MongoDB

Pick a new programming language to learn
Not Java or C#/VB
Try Node.js, JavaScript, or F#

THANK YOU
