
Evaluating and Modeling
NoSQL and SQL Databases
Part 1
2016 EDW
by Michael Bowers
2016-03-14
v. 4.9
mike@cssDesignPatterns.com

Abstract
We are in the middle of a database revolution. NoSQL is disrupting the database world with many innovations.
How do we model in these new paradigms?
How does the old SQL paradigm fit in this brave new world?
What paradigm is best for your project?
We are in a new data paradigm:
- New database architectures (software and hardware) handle the large and ever-growing velocity and volume of data that is dispersed across geographically distant data centers
- New graph, document, and wide-column modeling paradigms compete with relational and dimensional modeling
- Schemaless databases enable maximum agility in software development and rapid changes to huge data sets

What will you learn?
You will be able to choose the best database to meet your needs for velocity, volume, variety, variability, relevance, productivity, data model, scale, consistency, and cost.
You will know the trade-offs of ACID or BASE consistency models and when it is OK or not OK to compromise consistency.
You will understand the strengths and weaknesses of relational, dimensional, document, key-value, and triple models, and which SQL and NoSQL databases support which models.

About the Author
Michael Bowers
Principal Architect, LDS Church
Author of:
- Pro CSS and HTML Design Patterns, published by Apress, 2007
- Pro HTML5 and CSS3 Design Patterns, published by Apress, 2011
mike@cssDesignPatterns.com

Church of Jesus Christ of Latter-day Saints
- 15 million members (29,621 congregations worldwide)
- Humanitarian assistance in 185 countries
- Thousands of documents in 188 published languages
- 192 websites and applications in production, with billions of page views annually, running on hundreds of MarkLogic servers

We are in a Database Revolution
Existing paradigms are being challenged:
- Models
- Hardware
- Software
- Languages
Will tweaking relational databases be enough?
Source: Wikipedia

Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms


Databases (Ranked by popularity as of 2016-03-14)
[Chart: databases plotted from high-bandwidth analytical volume (PBs, TBs, GBs) to low-latency operational velocity (0.1Kt to 100Kt), with latency from hours down to microseconds, and from more structure (schema) to less structure (schemaless): relational, dimensional, wide column/key value, document, graph, raw. The sample document shown is a hospital operation record (Hospital Name: John Hopkins; Operation Number: 13; Operation Type: Heart Transplant; Surgeon Name: Dorothy Oz) with three drug doses: Minicillan, Drugs R Us, 200 mg; Maxicillan, Canada4Less Drugs, 400 mg; Minicillan, Drug USA, 150 mg.]
SQL: #1 Oracle DB, #2 MySQL, #3 SQL Server, #5 PostgreSQL, #6 DB2, #10 SQLite, #12 SAP AS, #19 SAP HANA, #21 Informix, #22 MariaDB
newSQL: #58 GemFire, #69 Oracle x10
Live Analytics: #1 Oracle Exalytics, #19 SAP HANA
Data Warehouse: #1 Oracle Exadata, #13 Teradata, #16 Hive, #28 Netezza, #29 Vertica, #33 Greenplum, #36 Amazon Redshift
Wide Column (complex key): #8 Cassandra, #15 HBase
Key/Value (simple key): #9 Redis, #23 Memcached, #26 DynamoDB, #31 Riak
Document (JSON): #4 MongoDB, #24 Couchbase, #25 CouchDB, #32 MarkLogic, #41 OrientDB, #48 Cloudant
Graph/RDF: #20 Neo4j, #32 MarkLogic, #41 OrientDB, #44 Titan
Doc Warehouse (XML): #11 Elasticsearch, #14 Solr, #35 MarkLogic, #37 Sphinx
Big Data (raw): Hadoop, #18 Splunk
Databases (Ranked for enterprises as of 2016-03-14)
[Chart: databases plotted by velocity versus volume and by data structure, from more structure (schema) to less structure (schemaless).]
SQL: Oracle DB, SQL Server, DB2, MySQL, EnterpriseDB
newSQL: Oracle x10
Live Analytics: Oracle Exalytics, SAP HANA
Data Warehouse: Oracle Exadata, Teradata, Netezza
Wide Column: Cassandra
Key/Value: DynamoDB
Document (JSON): MarkLogic, MongoDB
Graph/RDF: MarkLogic
Doc Warehouse (XML) and Big Data (raw): MarkLogic, Splunk, Hadoop

Multimodel SQL Databases
[Chart: the database landscape (velocity versus volume, schema to schemaless) with Oracle DB, the #1 multimodel SQL database, and Oracle Exadata placed across multiple model categories.]

Multimodel NoSQL Databases
[Chart: the database landscape (velocity versus volume, schema to schemaless) with MarkLogic, the #1 multimodel NoSQL database, placed across multiple model categories.]


Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms

What's wrong with SQL DBs?
Velocity
- SQL DBs are serialized to ensure consistency
- They use high-latency disk
- This prevents them from scaling horizontally and limits velocity
Volume
- SQL DBs are serialized to share resources: cores, caches, and storage
- This prevents them from scaling horizontally and limits volume

Storage Cost and Performance Defines Physical Database Architecture

Storage | Cost/Blade | GB/Blade | Cost/GB | Bandwidth (MB/s) | Cost per MB/s | IOPS (1000/s) | Latency (µs) | Cost/IOPS
RAID0 HDDs | $2,500 | 4,000 | $0.63 | 200 | $12.58 | | 7,000 | $2.10
Flash* | $8,600 | 1,200 | $7.00 | 975 | $10.00 | 115 | 81 | $0.20
RAM | $11,700 | 768 | $15.23 | 12,800 | $0.91 | 1,333,000 | 0.02 | $0.00

Capacity and bandwidth columns measure volume; IOPS and latency columns measure velocity.
*Flash is the average of SSDs and Flash PCIe cards

What do you need your databases to be optimized for?
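A minimal sketch of the arithmetic behind the table, showing why disk is the volume choice and RAM the velocity choice: dividing each blade's cost by its capacity gives cost per GB, and latency is compared directly. The `TIERS` dictionary and helper names are hypothetical; the Flash inputs are averages, so its recomputed cost per GB differs slightly from the table.

```python
"""Compare storage tiers on capacity cost and latency (illustrative sketch)."""

# Per-blade figures copied from the table above; the layout and names are hypothetical.
# Flash inputs are averages of SSDs and PCIe cards, so its recomputed cost/GB
# (about $7.17) differs slightly from the table's $7.00.
TIERS = {
    "RAID0 HDDs": {"cost_per_blade": 2_500, "gb_per_blade": 4_000, "latency_us": 7_000},
    "Flash": {"cost_per_blade": 8_600, "gb_per_blade": 1_200, "latency_us": 81},
    "RAM": {"cost_per_blade": 11_700, "gb_per_blade": 768, "latency_us": 0.02},
}


def cost_per_gb(name: str) -> float:
    """Dollars per GB of capacity: the figure that matters for volume."""
    tier = TIERS[name]
    return tier["cost_per_blade"] / tier["gb_per_blade"]


def cheapest_capacity() -> str:
    """Tier with the lowest cost per GB."""
    return min(TIERS, key=cost_per_gb)


def lowest_latency() -> str:
    """Tier with the lowest access latency: the figure that matters for velocity."""
    return min(TIERS, key=lambda name: TIERS[name]["latency_us"])


if __name__ == "__main__":
    for name in TIERS:
        print(f"{name}: ${cost_per_gb(name):.2f}/GB, {TIERS[name]['latency_us']} µs")
    print("Volume (cheapest capacity):", cheapest_capacity())
    print("Velocity (lowest latency):", lowest_latency())
```

Running it reports RAID0 HDDs as the cheapest capacity and RAM as the lowest latency, which matches the Hardware Takeaway later in the deck.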

Hardware Architecture
[Diagram: a velocity-optimized (OLTP) RAM DB and a volume-optimized (warehouse) disk DB, each storing JSON and XML documents with sync or async transactions.]

Problem: Serialized DB Design
SQL DBs use serialization and synchronization for consistency:
- Logging data
- Locking rows of data
- Loading data from disk to RAM
- Locking buffered/cached data
Percentages from Michael Stonebraker, 2011 NoSQL Conference, San Jose

Cost of Synchronization
Single-threaded operations are 746 times faster than multiple threads with locks

Operation | Time (ns)
Single thread | 300
Single thread with memory barrier | 4,700
Single thread with CAS | 5,700
Single thread with lock | 10,000
Two threads with CAS | 30,000
Two threads with lock | 224,000

Slide from James Gates' SORT presentation, "When Performance Really Matters"
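The headline ratio follows directly from the chart: 224,000 / 300 is about 746.7. This small sketch recomputes each configuration's slowdown relative to the single-thread baseline; the dictionary simply restates the chart's values, and because every value uses the same unit, the ratios are unit-free.

```python
"""Slowdown of each configuration in the synchronization chart versus one thread."""

# Values restated from the chart above (same unit throughout, so ratios are unit-free).
TIMINGS = {
    "Single thread": 300,
    "Single thread with memory barrier": 4_700,
    "Single thread with CAS": 5_700,
    "Single thread with lock": 10_000,
    "Two threads with CAS": 30_000,
    "Two threads with lock": 224_000,
}


def slowdown(label: str, baseline: str = "Single thread") -> float:
    """How many times slower `label` is than `baseline`."""
    return TIMINGS[label] / TIMINGS[baseline]


if __name__ == "__main__":
    for label in TIMINGS:
        print(f"{label:<36}{slowdown(label):8.1f}x")
```

Even a single thread that takes an uncontended lock is about 33 times slower than the baseline, before any contention is added.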

Future is More Cores, Not Faster Cores
[Chart, 1970 to 2010, log scale from 1 to 10,000,000: transistor counts (thousands) grow exponentially while clock speed (MHz), power (W), and instruction-level parallelism flatten out. The gap between them causes multiple cores.]
- Flat clock (a faster clock needs more power)
- Flat power (power is expensive and hot)
- Flat ILP (instruction-level parallelism)

Database Velocity

Volume per day | Real-world 1K transactions per day | Real-world 1K transactions per second | Relational DB | Document DB | Wide Column or Key Value
8 GB | 8,640,000 | 100 | As Is | |
86 GB | 86,400,000 | 1,000 | Tuned* | As Is |
432 GB | 432,000,000 | 5,000 | Appliance | Tuned* | As Is
864 GB | 864,000,000 | 10,000 | Clustered Appliance | Clustered Servers | Tuned*
8,640 GB | 8,640,000,000 | 100,000 | | Many Clustered Servers | Clustered Servers
43,200 GB | 43,200,000,000 | 500,000 | | | Many Clustered Servers

* Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)
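The table's first three columns are linked by simple arithmetic: transactions per day = transactions per second × 86,400, and volume per day = transactions per day × transaction size. A minimal sketch, assuming each "1K transaction" writes about 1 KB (1,000 bytes); that size and the function names are assumptions, not taken from the source.

```python
"""Relate transactions per second to daily transactions and daily volume."""

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
BYTES_PER_TRANSACTION = 1_000   # assumption: a "1K transaction" writes about 1 KB


def transactions_per_day(tps: int) -> int:
    """Sustained transactions per second, run for a full day."""
    return tps * SECONDS_PER_DAY


def gb_per_day(tps: int, bytes_per_txn: int = BYTES_PER_TRANSACTION) -> float:
    """Daily data volume in decimal gigabytes (10**9 bytes)."""
    return transactions_per_day(tps) * bytes_per_txn / 10**9


if __name__ == "__main__":
    for tps in (100, 1_000, 5_000, 10_000, 100_000, 500_000):
        print(f"{tps:>7,} tps -> {transactions_per_day(tps):>14,} per day, {gb_per_day(tps):>9,.1f} GB")
```

At 100 transactions per second this gives 8.64 GB per day, which the table shows as 8 GB; the other rows match the table exactly.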

Hardware Takeaway
Choose a DB designed to meet your scaling needs for velocity and volume at the lowest hardware cost:
- Leverage RAM when you need maximum velocity (low latency)
- Leverage disk when you need massive volume (high bandwidth)
- Scale horizontally for maximum parallel processing
- Choose the right mix of synchronous and asynchronous transactions

What velocity and volume do you need?
Thoughts?


Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms

Availability vs. Consistency
Availability: asynchronous replication
[Diagram: Receiver -> Input Disruptor (Unmarshaller, Journaler, Replicator) -> Business Logic Processor -> Output Disruptor (Marshaller) -> Publisher]
Do I want multiple data centers to have consistent data immediately or eventually?
Do I want reads and/or writes to be available when data centers are down?
Various NoSQL databases give you different options

Brewer's 1998 CAP Theorem
Wikipedia: It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it succeeded or failed)
- Partition tolerance (the system continues to operate despite arbitrary partitioning due to network failures)
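A toy illustration of the trade-off the theorem describes, assuming two replicas with asynchronous replication: once a partition cuts the link, the secondary must either refuse reads (choosing consistency) or answer with possibly stale data (choosing availability). The `ReplicatedStore` class and its "CP"/"AP" modes are hypothetical, not any product's API.

```python
"""Toy two-replica store showing the consistency/availability choice under a partition."""


class ReplicatedStore:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary: dict = {}
        self.secondary: dict = {}
        self.pending: list = []  # writes not yet shipped to the secondary
        self.partitioned = False

    def write(self, key, value):
        """Writes land on the primary and replicate asynchronously when the link is up."""
        self.primary[key] = value
        self.pending.append((key, value))
        self._replicate()

    def _replicate(self):
        if not self.partitioned:
            for key, value in self.pending:
                self.secondary[key] = value
            self.pending.clear()

    def partition(self):
        self.partitioned = True

    def heal(self):
        self.partitioned = False
        self._replicate()

    def read_secondary(self, key):
        """Read from the replica on the far side of a possible partition."""
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # Availability first: always answer, even if the value is stale.
        return self.secondary.get(key)


if __name__ == "__main__":
    for mode in ("AP", "CP"):
        store = ReplicatedStore(mode)
        store.write("x", 1)
        store.partition()
        store.write("x", 2)  # only the primary sees this write
        try:
            print(mode, "read during partition:", store.read_secondary("x"))
        except RuntimeError as err:
            print(mode, "read during partition failed:", err)
        store.heal()
        print(mode, "read after healing:", store.read_secondary("x"))
```

In AP mode the read during the partition returns the stale value 1; in CP mode it fails. Both modes return 2 once the partition heals.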

CAP in Practice Today
[Spectrum from one CPU core through multiple cores, multiple CPUs, servers, and availability zones to global data centers. One end is real-time, synchronous, few data copies, less compute, vertical scale, and less availability; the other end is near-time, asynchronous, many data copies, more compute, horizontal scale, and global availability.]
As distance increases, communication latency increases. This makes communication less reliable, and real-time access to data becomes slower and less available.
This does not have to affect the point-in-time consistency of data.
Solutions:
- Globally Consistent Clusters
- Multimaster Clusters

Clusters Mitigate CAP Limitations
[Spectrum from one CPU core to global data centers: consistent, real-time, few data copies, less compute, vertical scale, and less availability at one end; inconsistent, near-time, many data copies, more compute, horizontal scale, and global availability at the other.]
Purposes of Database Clusters
1. Availability
2. Scalability
   - Data
   - Processing

Availability Clusters
[Diagram: data centers, each with Zone 1 and Zone 2; replication is synchronous (sync) within a data center and asynchronous (async) between data centers; one configuration uses independent storage.]

Scale Clusters
Data is automatically spread across all servers to scale the storage, processing, and querying of big data:
- Data can be dispersed randomly across all servers for maximum parallel query performance
- Data can be sharded onto specific servers for data co-location, predictable data processing, predictable lookups, etc.
- Common data can be replicated across all servers for quick local access
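The three placement strategies above can be sketched as functions that map a document to servers. The server names, the `customer` shard key, and the choice of CRC-32 as the hash are assumptions for illustration; CRC-32 is used because Python's built-in `hash()` of a string changes between runs.

```python
"""Three ways a scale cluster might place documents on servers (illustrative sketch)."""
import random
import zlib

SERVERS = ["server-1", "server-2", "server-3"]  # hypothetical cluster members


def place_randomly(rng: random.Random) -> list[str]:
    """Disperse randomly: spreads load evenly for parallel queries."""
    return [rng.choice(SERVERS)]


def place_by_shard_key(doc: dict, key: str = "customer") -> list[str]:
    """Shard on a key: documents with the same key value are co-located on one server."""
    digest = zlib.crc32(str(doc[key]).encode("utf-8"))
    return [SERVERS[digest % len(SERVERS)]]


def place_everywhere() -> list[str]:
    """Replicate common data (e.g. lookup tables) to every server for local reads."""
    return list(SERVERS)


if __name__ == "__main__":
    orders = [{"customer": "c42", "order": n} for n in range(3)]
    print([place_by_shard_key(o) for o in orders])  # all three orders share one server
    rng = random.Random(7)
    print([place_randomly(rng) for _ in range(3)])
    print(place_everywhere())
```

Sharding trades some load balance for predictability: a lookup by customer touches exactly one server, whereas random placement forces a query to ask every server.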

Globally Consistent Clusters
[Diagram: Data Center 1 and Data Center 2, each with Zone 1 and Zone 2; zones synchronize synchronously (sync) and data centers asynchronously (async).]
Writes only go to one scale cluster at a time, with automatic failover between zones and data centers.
Pros
- Global availability
- Real-time to near-time reads & writes
- Simple development
- Consistent data (some clusters get consistent data sooner than others)
Cons
- All writes around the world go to one cluster; this slows writes for distant locations
- Local reads may not come from the latest writes
- When a data center fails, any data committed in it but not yet transmitted to other data centers is lost until the failed data center comes back online

Multimaster Clusters
[Diagram: Data Center 1 and Data Center 2, each with Zone 1 and Zone 2; zones synchronize synchronously (sync) and data centers replicate to each other asynchronously (async) in both directions.]
Writes eventually go to all clusters, and the application deals with conflicts and inconsistent data.
Pros
- Global availability
- Real-time to near-time reads & writes
- All writes go to the closest location
- Local reads usually come from the latest writes
Cons
- Inconsistent data (at any point in time, each cluster has different data)
- Complex development to handle conflict resolution
- When a data center fails, any data committed in it but not yet transmitted to other data centers is lost until the failed data center comes back online
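Multimaster clusters leave conflict resolution to the application. One common, and lossy, policy is last-writer-wins. This sketch assumes each write carries a timestamp and the name of the cluster that accepted it, and it breaks timestamp ties by cluster name so that every cluster picks the same winner. The `Version` type and the policy are illustrative choices, not something the slide prescribes.

```python
"""Last-writer-wins merge for conflicting writes from two clusters (one possible policy)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # when the write was accepted (assumes roughly synchronized clocks)
    cluster: str      # which cluster accepted the write


def resolve(a: Version, b: Version) -> Version:
    """Keep the newer write; break timestamp ties by cluster name so all clusters agree."""
    return max(a, b, key=lambda v: (v.timestamp, v.cluster))


if __name__ == "__main__":
    from_dc1 = Version("qty=5", timestamp=100.0, cluster="datacenter-1")
    from_dc2 = Version("qty=7", timestamp=100.5, cluster="datacenter-2")
    # Both clusters converge on the same winner regardless of arrival order;
    # the losing write (qty=5) is discarded.
    print(resolve(from_dc1, from_dc2), resolve(from_dc2, from_dc1))
```

The discarded write is exactly the kind of data loss that globally consistent clusters avoid. Other policies, such as merging values or keeping both versions for a user to reconcile, cost more development effort.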

Choosing Between Global Clusters
Consistency and Development Simplicity vs. Write Locality
[Diagram: Globally Consistent Clusters and Multimaster Clusters side by side, each spanning Data Center 1 and Data Center 2 with Zone 1 and Zone 2.]

ACID vs. BASE
ACID (H2SO4, sulfuric acid):
- Atomic
- Consistent
- Isolated
- Durable
BASE (NaOH, sodium hydroxide):
- Basically Available
- Soft state
- Eventual consistency

ACID vs. BASE in Large-Scale Clusters
ACID transactions between nodes (within clusters or across data centers) are synchronously coupled through a two-phase commit:
1. The transaction coordinator pre-commits the transaction on each node, and each node indicates whether the commit is possible.
2. If all nodes agree a commit is possible, the transaction coordinator asks all nodes to apply the commit.
3. If any node vetoes the commit, the transaction coordinator asks all nodes to roll back the transaction.
4. Adding nodes reduces availability and slows inserts, updates, and deletes.
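The two-phase commit steps above can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions: the `Node` class is hypothetical, and a real coordinator must also write its decision to a durable log and handle timeouts and crashed participants, which are omitted here.

```python
"""In-memory sketch of a two-phase commit coordinator and its participants."""
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    can_commit: bool = True                     # set False to simulate a veto
    committed: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)  # pre-committed, not yet visible

    def prepare(self, txn_id: str, writes: dict) -> bool:
        """Phase 1: pre-commit the writes and vote on whether the commit is possible."""
        if not self.can_commit:
            return False
        self.staged[txn_id] = dict(writes)
        return True

    def commit(self, txn_id: str) -> None:
        """Phase 2 (success): make the staged writes visible."""
        self.committed.update(self.staged.pop(txn_id))

    def rollback(self, txn_id: str) -> None:
        """Phase 2 (failure): discard anything staged for this transaction."""
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id: str, writes: dict, nodes: list) -> bool:
    votes = [node.prepare(txn_id, writes) for node in nodes]  # phase 1
    if all(votes):
        for node in nodes:
            node.commit(txn_id)
        return True
    for node in nodes:  # any veto rolls back every node
        node.rollback(txn_id)
    return False


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b")
    print(two_phase_commit("t1", {"id1": 1}, [a, b]), a.committed, b.committed)
    b.can_commit = False
    print(two_phase_commit("t2", {"id1": 5}, [a, b]), a.committed, b.committed)
```

Every participant must vote before anything commits. That waiting is why adding nodes slows writes, and any unreachable node blocks the commit, which is why it also reduces availability.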

BASE transactions between nodes (within clusters or across data centers) are asynchronously decoupled:
1. Use asynchronous, guaranteed-delivery, ordered messages.
2. Add a log table to the target database to track successful execution of queue messages.
3. Entries are written to the log table when messages are successfully executed in the target database.
4. Messages are dequeued only after the log confirms they have been executed.
5. Adding nodes increases availability, parallel processing, data movement, and inconsistency.
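A sketch of the log-table pattern above, assuming messages carry a unique id and an increment that must be applied exactly once. The in-memory queue, the `TargetDatabase` class, and the simulated crash are hypothetical stand-ins for a real message queue and database. In a real target, the data change and the log-table insert would commit in the same local transaction.

```python
"""Apply queued replication messages exactly once using a log table (illustrative sketch)."""
from collections import deque


class TargetDatabase:
    def __init__(self):
        self.quantities: dict = {}
        self.message_log: set = set()  # stands in for the log table of executed message ids

    def execute(self, message) -> None:
        msg_id, item, delta = message
        if msg_id in self.message_log:
            return  # redelivered message that already executed: skip it
        # A real database would commit these two changes in one local transaction.
        self.quantities[item] = self.quantities.get(item, 0) + delta
        self.message_log.add(msg_id)


class SimulatedCrash(Exception):
    pass


def drain(queue: deque, target: TargetDatabase, crash_after=None) -> None:
    """Execute messages in order; dequeue each only after the log confirms it ran."""
    while queue:
        message = queue[0]  # peek: the message stays queued until confirmed
        target.execute(message)
        if message[0] == crash_after:
            raise SimulatedCrash(message[0])  # executed, but never dequeued
        if message[0] in target.message_log:
            queue.popleft()


if __name__ == "__main__":
    queue = deque([("m1", "id1", 1), ("m2", "id2", 3), ("m3", "id1", 4)])
    target = TargetDatabase()
    try:
        drain(queue, target, crash_after="m2")
    except SimulatedCrash:
        pass  # m2 is redelivered on restart
    drain(queue, target)
    print(target.quantities)  # m2 was not applied twice
```

Because the log records each executed message, a message that is redelivered after a crash is skipped instead of being applied a second time.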

What's wrong with NoSQL?
In most NoSQL solutions, the developer is responsible for ensuring consistency.
Imagine programming an app to coordinate thousands of concurrent threads across gigabytes of data structures.
Imagine writing code to handle all threading issues:
- Locks
- Contention
- Serialization
- Deadlocks
- Race conditions
- Threading bugs
NOTE: MarkLogic is currently the only NoSQL database that is fully ACID compliant

What are ACID transactions?
Relational databases use the ACID model to make it easy, reliable, and fast for concurrent processes to query and modify the same data consistently. Few NoSQL databases use the ACID model.
Atomic: All data and commands in a transaction succeed, or all fail and roll back.
Consistent: All committed data must be consistent with all data rules, including constraints, triggers, cascades, atomicity, isolation, and durability.
Isolated: No transaction can interfere with other concurrent transactions.
Durable: Once a transaction is committed, its data will survive system failures and can be reliably recovered after an unwanted deletion.
(H2SO4, sulfuric acid)

ACID-compliant NoSQL Databases
[Chart: the database landscape (velocity versus volume, schema to schemaless) showing the ACID-compliant NoSQL databases MarkLogic, placed across multiple model categories, and FoundationDB.]

What is Durability?
Durable data survives system failures
- Once a transaction has committed, its data is guaranteed to survive system failures.
  - Failures may occur in the server, operating system, disk, and database.
  - Failures may be caused by server crashes, full disks, corrupted disks, power outages, etc.
- Durability requires storage to be redundant.
- Durability requires logs to replay asynchronously written data.
- Durability requires logs to be archived to another location so they can be recovered.
- Durability works with atomicity to ensure that partially written data is not durable.
- Without durability, you can have faster inserts, updates, and deletes because you have no logs to write, and you can store data in volatile memory while lazily writing it to disk.
Durable data can be recovered after unwanted deletion
- Durability requires backups to be durable.
- Durability requires recoveries to be durable.
- Durability allows data to be recovered to a point in time before the system failed or before applications or users incorrectly destroyed or modified data.
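A minimal sketch of log-based durability, assuming a single-node key-value store. Each write is appended to a log and forced to disk with `os.fsync` before success is reported, and recovery replays the log. A torn final record (a crash mid-write) is skipped, which is how durability and atomicity work together so that partially written data is not durable. The `LoggedStore` class and the JSON-lines log format are assumptions; redundancy, log archiving, and backups are out of scope.

```python
"""Append-only log with fsync before acknowledging writes, and replay on recovery."""
import json
import os
import tempfile


class LoggedStore:
    def __init__(self, log_path: str):
        self.log_path = log_path
        self.data: dict = {}

    def put(self, key: str, value) -> None:
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # on disk before we report success
        self.data[key] = value      # the in-memory copy is lost in a crash; the log is not

    @classmethod
    def recover(cls, log_path: str) -> "LoggedStore":
        """Rebuild state by replaying the log, ignoring a torn final record."""
        store = cls(log_path)
        if not os.path.exists(log_path):
            return store
        with open(log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # partially written record: not durable, so not replayed
                store.data[record["key"]] = record["value"]
        return store


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.log")
        store = LoggedStore(path)
        store.put("id1", 1)
        store.put("id2", 3)
        with open(path, "a", encoding="utf-8") as log:
            log.write('{"key": "id3", "va')  # simulate a crash mid-write
        del store                          # lose the in-memory state
        print(LoggedStore.recover(path).data)
```

The fsync on every write is the cost that the slide's "without durability, you can have faster inserts, updates, and deletes" refers to.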

How much durability do you need?
How durable is the database's data?
- Does the database ensure data is durable before it returns success?
- Does the database use logs and archive its logs?
- Does it write data to multiple servers?
- Does it allow developers to override durability at runtime for high performance?
How well does the database do backups and restores?
- How easy is it to manage backups and restores? Schedules? User interface?
- Does the database provide full, incremental, and differential backups?
- Can it restore to a point in time?
How much code will you have to write
- to back up data and configurations?
- to recover accidentally deleted data?
- to leverage the physical layout of the database cluster?
- to propagate data durably across the cluster?
- to restore database data and cluster configurations to a point in time?

What is Atomicity?
An atomic transaction is all or nothing
- Atomicity means all parts of a transaction succeed or nothing does.
- This requires partially written data to be automatically removed.
- A transaction is a single command or a set of commands that execute atomically.
- A single command to a programmer often represents multiple commands to the database, because the database needs to replicate data to multiple disks, update indexes, execute triggers, verify constraints, cascade deletes and updates, etc.
- Without atomicity, you can have faster transactions because they don't need two-phase commit.
Sets of data
- An operation may need to process multiple data items.
- All data in the set needs to be changed or none; otherwise it becomes arbitrarily inconsistent.
- For example, you want to delete part of a set of data, and partway through, the transaction fails. If the changes are not all automatically rolled back, the data is left in an inconsistent state that can affect the results of other transactions.
Sets of commands
- A series of commands often needs to work as a unit to produce accurate results.
- All commands need to succeed or all need to fail; otherwise data becomes arbitrarily inconsistent.
- Inconsistent data is hard to fix: data may contradict other data, and there may be extra data or missing data. Without atomicity to roll back failures, there may be no way to fix the data.
- The classic example is where you need to debit one account and credit another as a single transaction. If one command succeeds and the other fails, account totals are inaccurate.
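The classic debit/credit example can be sketched with Python's built-in `sqlite3` module. Used as a context manager, its connection commits the transaction when the block succeeds and rolls it back when an exception escapes. The `account` table and the simulated mid-transfer failure are illustrative.

```python
"""Atomic debit/credit with sqlite3: both updates commit together or not at all."""
import sqlite3


def open_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    with conn:
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("checking", 100), ("savings", 0)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int,
             fail_midway: bool = False) -> None:
    with conn:  # commit on success, roll back if an exception escapes the block
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    bank = open_bank()
    transfer(bank, "checking", "savings", 30)
    print(balances(bank))  # {'checking': 70, 'savings': 30}
    try:
        transfer(bank, "checking", "savings", 50, fail_midway=True)
    except RuntimeError:
        pass
    print(balances(bank))  # unchanged: the debit was rolled back
```

Without the transaction, the failed transfer would leave checking debited by 50 with no matching credit, which is the inaccurate account total the slide warns about.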

How much atomicity do you need?
How well does the database support atomic transactions? How important is it for the database to support a transaction
- that spans more than one document?
- that spans more than one command?
- that spans more than one server, cluster, or data center?
- that can be rolled back while in process?
Is saving some data more important than saving all or nothing?
How much code will you have to write
- to handle corner cases where parts of transactions succeed and others fail?
- to find and fix inconsistent data caused by partially failed transactions?
- to determine whether data inconsistency is caused by a bug or a partially completed transaction?
- to implement two-phase commits?
- to implement atomic sequences, constraints, triggers, and data replication?
- to handle inconsistencies caused by multimaster updates?

What is Isolation?
Isolation prevents concurrent transactions from affecting each other
- Read isolation ensures queries return accurate results as of a point in time.
- Write isolation locks data to prevent race conditions during updates, deletes, and inserts.
- Without isolation, queries and transactions run faster because the database doesn't have to provide a consistent view using locks, snapshots, or system version numbers.
Sets of data
- An operation can only produce accurate results as of a point in time.
- It takes time for a command to process a set of data.
- During this time, concurrent transactions may insert, update, and delete data.
- Without isolation, a single command executes against data while it is being changed by other concurrent transactions:
  - Records may be added after the command started running.
  - Records may be deleted or changed after the command has processed them.
  - Changes during a query violate joins across different types of records.
- This creates inconsistent results: aggregate functions produce wrong answers.
Sets of commands
- A series of commands needs to work on a consistent view of data to produce accurate results. Without isolation, each command in a series executes against arbitrarily different data.

Isolation Examples
Isolated query: the total quantity is unaffected by data changes that happen after the query starts, no matter how long it runs.
Unisolated multimaster query: the total quantity is affected by data changes that happen while the query runs and during cluster synchronization.

Cluster 1 data actions: Insert id1, qty:1; Update id1, qty:5; Update id1, qty:1
Cluster 2 data actions: Insert id2, qty:3; Insert id3, qty:2; Delete id2

Point-in-time results for aggregating total quantity:
Data actions (always correct) | Total | Cluster 2 synced actions (sometimes correct) | Total
Insert id1, qty:1 | 1 | Insert id2, qty:3 | 3
Insert id2, qty:3 | 4 | Insert id1, qty:1 | 4
Update id1, qty:5 | 8 | Insert id3, qty:2 | 6
Insert id3, qty:2 | 10 | Update id1, qty:5 | 10
Update id1, qty:1 | 6 | Delete id2 | 7
Delete id2 | 3 | Update id1, qty:1 | 3

The query starts when the total is 4, so the correct answer is 4.
Records queried by the unisolated query before it ends: id2, qty:3; id1, qty:5; id3, qty:2. Its answer is 10.
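The example can be replayed in a few lines. This sketch assumes the interleaved order of data actions given in the example. The points at which the unisolated query reads each record are chosen to reproduce the slide's answer of 10, and the function names are hypothetical.

```python
"""Replay the isolation example: snapshot (isolated) vs. read-as-you-go (unisolated) totals."""

# Data actions in the interleaved order given in the example.
ACTIONS = [
    ("insert", "id1", 1),
    ("insert", "id2", 3),
    ("update", "id1", 5),
    ("insert", "id3", 2),
    ("update", "id1", 1),
    ("delete", "id2", None),
]


def history(actions):
    """Database state after each action; index 0 is the empty database."""
    states = [{}]
    for op, record_id, qty in actions:
        state = dict(states[-1])
        if op == "delete":
            del state[record_id]
        else:  # insert and update both set the record's quantity
            state[record_id] = qty
        states.append(state)
    return states


def isolated_total(states, start):
    """Sum over the snapshot taken when the query starts."""
    return sum(states[start].values())


def unisolated_total(states, reads):
    """Sum each record as it looks at the moment it is read: (state index, record id)."""
    return sum(states[i][record_id] for i, record_id in reads if record_id in states[i])


if __name__ == "__main__":
    states = history(ACTIONS)
    print([sum(s.values()) for s in states[1:]])  # running totals: [1, 4, 8, 10, 6, 3]
    print(isolated_total(states, start=2))         # query starts after two inserts: 4
    # The unisolated scan reads id2, then id1 after its update, then id3 after its insert.
    print(unisolated_total(states, [(2, "id2"), (3, "id1"), (4, "id3")]))  # 10
```

The snapshot read is what a database with read isolation provides automatically. Without it, the application would have to build something similar itself.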

How much isolation do you need?
How well does the database support read isolation?
- How important is query accuracy?
How well does the database support write isolation?
- How important is it to prevent race conditions and deadlocks?
- How important is it to ensure commands run on a consistent set of data at a point in time?
How much code will you have to write
- to hide concurrent updates, inserts, and deletes from queries?
- to handle update conflicts, race conditions, and deadlocks?
- to handle joins to lookup tables to ensure they do not change during a query or a write?
- to ensure aggregate queries operate on unchanging data?

What is Consistency?
Consistency is the product of atomicity, isolation, and durability
- Atomicity ensures that if data rules, such as constraints and triggers, are violated, the transaction fails and all changes are rolled back.
- Isolation ensures a query sees a consistent set of data while other concurrent commands are modifying the underlying data.
- Isolation ensures bulk updates lock sets of data so they can be processed as a consistent unit without other concurrent commands modifying their data.
- Durability ensures that data is consistently replicated to other nodes in a cluster, so the loss of a node won't cause a loss of data.
All committed data must be consistent with all data rules
- Constraints, triggers, cascades, atomicity, isolation, and durability
Data must always be in a consistent state at any point in time
"Consistency is the last refuge of the unimaginative." (Oscar Wilde)
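A small sketch of "all committed data must be consistent with all data rules." It assumes the rules are declared as SQL `CHECK` constraints in Python's built-in `sqlite3`. When one row in a batch violates a rule, the database raises an error and the whole transaction rolls back, so no partial batch is ever committed. The `dose` table reuses the sample drug doses from the database charts; the schema and the allowed units are hypothetical.

```python
"""Data rules as CHECK constraints: a violating batch is rejected as a whole."""
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dose (
           drug TEXT NOT NULL,
           size INTEGER NOT NULL CHECK (size > 0),
           uom  TEXT NOT NULL CHECK (uom IN ('mg', 'g', 'ml'))
       )"""
)


def insert_doses(rows) -> None:
    """Insert a batch in one transaction; any rule violation rolls back the whole batch."""
    with conn:
        conn.executemany("INSERT INTO dose (drug, size, uom) VALUES (?, ?, ?)", rows)


def dose_count() -> int:
    return conn.execute("SELECT COUNT(*) FROM dose").fetchone()[0]


if __name__ == "__main__":
    insert_doses([("Minicillan", 200, "mg"), ("Maxicillan", 400, "mg"), ("Minicillan", 150, "mg")])
    try:
        insert_doses([("Minicillan", 100, "mg"), ("Maxicillan", -5, "mg")])  # second row breaks a rule
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
    print(dose_count())  # 3: the valid first row of the failed batch was rolled back too
```

Here the database enforces the rule and atomicity undoes the partial batch together. A store without these guarantees would leave both checks to application code.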

Do you need complete consistency?
Not necessarily. Instead, you may prefer:
- Absolute fastest performance at the lowest hardware cost
- Highest global data availability at the lowest hardware cost
- Working with one document at a time
- Writing advanced code to create your own consistency model
In exchange, you accept:
- Eventually consistent data
- Some inconsistent data that can't be reconciled
- Some missing data that can't be recovered
- Some inconsistent query results
"Consistency is the last refuge of the unimaginative." (Oscar Wilde)

Consistency Takeaway
Choose a database that meets your needs for write locality or consistency
Multimaster Clusters (BASE, NaOH): write locality
Globally Consistent Clusters (ACID, H2SO4): point-in-time consistency
- Less data loss (durability)
- More query accuracy (isolation)
- More data integrity (atomicity)
- Less code to compensate for data inconsistencies and conflicts
[Diagram: both cluster types span Data Center 1 and Data Center 2, each with Zone 1 and Zone 2.]

What do you need most?
Thoughts?
- Highest performance for queries and transactions
- Lowest hardware cost across multiple data centers
- Write locality
- Less data loss (i.e., durability)
- More query accuracy & fewer deadlocks (i.e., isolation)
- More data integrity (i.e., atomicity)
- Less code to compensate for lack of ACID compliance

Agenda
1. Defining NoSQL and Big Data
2. Optimizing for Velocity or Volume
3. Optimizing for Availability or Consistency
4. Optimizing for Modeling Paradigms
5. Summary

Modeling Takeaway
No one physical data model meets all needs, so choose a multimodel DB
- Dimensional: business intelligence reporting and analytics
- Relational: flexible queries, joins, updates; mature, standard
- Wide Column: simple, fast puts and gets; massively scalable
- Document: fast development; schemaless JSON/XML; searchable
- Graph/RDF: modeling anything at runtime, including relationships
Documents combined with graph are the future
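To make the document-versus-relational trade-off concrete, here is the hospital operation record from the database charts modeled two ways. One form is a single nested document, natural for document databases. The other is normalized relational rows: an operation row plus one row per drug dose. This is a minimal sketch; the field names, column order, and `line` numbering are assumptions, and the round trip shows that both forms hold the same facts.

```python
"""One hospital operation record as a document and as relational rows (illustrative)."""

OPERATION_DOC = {  # field names are hypothetical
    "operationNumber": 13,
    "hospitalName": "John Hopkins",
    "operationType": "Heart Transplant",
    "surgeonName": "Dorothy Oz",
    "drugs": [
        {"name": "Minicillan", "manufacturer": "Drugs R Us", "doseSize": 200, "doseUom": "mg"},
        {"name": "Maxicillan", "manufacturer": "Canada4Less Drugs", "doseSize": 400, "doseUom": "mg"},
        {"name": "Minicillan", "manufacturer": "Drug USA", "doseSize": 150, "doseUom": "mg"},
    ],
}


def to_relational(doc):
    """Normalize: one operation row, plus dose rows keyed by (operation number, line)."""
    operation = (doc["operationNumber"], doc["hospitalName"], doc["operationType"], doc["surgeonName"])
    doses = [
        (doc["operationNumber"], line, d["name"], d["manufacturer"], d["doseSize"], d["doseUom"])
        for line, d in enumerate(doc["drugs"], start=1)
    ]
    return operation, doses


def to_document(operation, doses):
    """Denormalize: join the dose rows back into the nested document."""
    number, hospital, op_type, surgeon = operation
    drugs = [
        {"name": name, "manufacturer": maker, "doseSize": size, "doseUom": uom}
        for op_no, _line, name, maker, size, uom in sorted(doses, key=lambda row: row[1])
        if op_no == number
    ]
    return {
        "operationNumber": number,
        "hospitalName": hospital,
        "operationType": op_type,
        "surgeonName": surgeon,
        "drugs": drugs,
    }


if __name__ == "__main__":
    operation_row, dose_rows = to_relational(OPERATION_DOC)
    print(operation_row)
    for row in dose_rows:
        print(row)
    print(to_document(operation_row, dose_rows) == OPERATION_DOC)  # True
```

The document form is read and written as one unit with no join. The relational form needs a join to reassemble the record, but it can query doses across all operations directly.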

Velocity Takeaway
Choose a DB that handles your required velocity

Volume per day | Real-world 1K transactions per day | Real-world 1K transactions per second | Relational | Document | Wide Column or Key Value
8 GB | 8,640,000 | 100 | As Is | |
86 GB | 86,400,000 | 1,000 | Tuned* | As Is |
432 GB | 432,000,000 | 5,000 | Appliance | Tuned* | As Is
864 GB | 864,000,000 | 10,000 | Clustered Appliance | Clustered Servers | Tuned*
8,640 GB | 8,640,000,000 | 100,000 | | Many Clustered Servers | Clustered Servers
43,200 GB | 43,200,000,000 | 500,000 | | | Many Clustered Servers

* Tuned means tuning the model, queries, and/or hardware (more CPU, RAM, and Flash)

Hardware Takeaway
Choose a DB designed to meet your scaling needs for velocity and volume at the lowest hardware cost, one that:
- Leverages RAM when you need maximum velocity (low latency)
- Leverages disk when you need massive volume (high bandwidth)
- Scales horizontally for maximum parallel processing
- Lets you choose the right mix of synchronous and asynchronous transactions

Consistency Takeaway
Choose a database that meets your needs for write locality or consistency
Multimaster Clusters (BASE, NaOH): write locality
Globally Consistent Clusters (ACID, H2SO4): point-in-time consistency
- Less data loss (durability)
- More query accuracy (isolation)
- More data integrity (atomicity)
- Less code to compensate for data inconsistencies and conflicts
[Diagram: both cluster types span Data Center 1 and Data Center 2, each with Zone 1 and Zone 2.]

Modeling Takeaway
Choose a database that meets your multiple modeling needs
- Dimensional: business intelligence reporting and analytics
- Relational: flexible queries, joins, updates; mature, standard
- Wide Column: simple, fast puts and gets; massively scalable
- Document: fast development; schemaless JSON/XML; searchable
- Graph/RDF: modeling anything at runtime, including relationships
Documents combined with graph are the future

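NoSQL Enterprise Readiness Takeaways
[Chart derived from the Gartner Hype Cycle for Data Management: technologies move from technology trigger through inflated expectations, disillusionment, and enlightenment to productivity. Plotted technologies: NoSQL, DBaaS, DB appliances, SQL, and MapReduce, with time to enterprise readiness shown as 1 to 5 years or 5 to 10 years.]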

