
DATA MODELING WITH GRAPH DATABASES

Ross McNeely
Principal Consultant, Practice Manager, Business Intelligence

“Data Junkie”

-- Speaker contact details stored as type/value pairs in a temp table
CREATE TABLE #Information
(Info_Type VARCHAR(25)
,Info_Value VARCHAR(50));

INSERT INTO #Information VALUES
('Name','Ross McNeely')
,('Email','rmcneely@tailwindbi.com')
,('Company','Tail Wind Informatics')
,('CompanySite','www.tailwindtech.com')
,('LinkedIn','www.linkedin.com/in/rossmcneely')
,('Blog','www.mcneelydwbi.wordpress.com');

SELECT Info_Type, Info_Value FROM #Information;
SPEAKER BIO

Ross McNeely is the Principal Consultant & BI Practice Manager at Tail Wind Informatics.
Ross has been working with the MS SQL Server BI stack for over a decade.
Enterprise Information Management & Business Intelligence are Ross’ primary focus.
Business Intelligence Solutions
“Go Farther, Faster”

HTTP://TAILWINDTECH.COM
Agenda
• Introduction to the Graph Model (15 min)
• Data Modeling with Graph Databases (15 min)
• Relational and Graph Models (10 min)
• Healthcare Use Case (20 min)
• Deeper Dive into Graph Databases (20 min)
• Logistics Use Case (15 min)
• Security Use Case (15 min)
• Summary (5 min)

Introduction to the Graph Model
• Defining the Graph Database
• Overview of the Graph Market
• Benefits of the Graph Data Model



DEFINING THE GRAPH DATABASE
INTRODUCTION TO THE GRAPH MODEL
NoSQL Primary Groupings
• Key Value
• Column Store
• Document
• Graph
Graph
Graph Defined:
“Formally, a graph is just a collection of vertices and edges - or, in less intimidating language, a set of nodes and the relationships that connect them.” [1]
Less Formally Defined:
- A graph is a set of nodes, relationships, and properties.
- A network of connected objects.
Property Graph
• Nodes (“vertices”)
• Relationships (“edges”)
• Properties
Nodes
• Nodes represent entities.
Nodes contain properties. Think of nodes as documents that store properties in the form of arbitrary key-value pairs.
Example: (name: bode miller)
Relationships
• Relationships are the lines between nodes.
Relationships connect and structure nodes.
Example: Olympic_Address
Properties
• Properties are values about the node or relationship.
Properties can be added to nodes and relationships. This lets you add semantics to relationships.
Example: (name: bode miller) -[Address, Type: Olympic]-> (Address: 123 Fake Street)
Basic Graph
Legend: Node, Property, Relationship
• Ross knows Jack
• Ross knows Megan
• Jack knows Megan
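
The basic graph above can be sketched in a few lines of Python. This is only an illustration, not any product's API: the `Node` and `Relationship` classes and the `neighbors` helper are hypothetical names, and the direction of each `knows` relationship is an assumption, since the slide only shows which people are connected.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Properties are arbitrary key-value pairs.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    label: str
    end: Node
    properties: dict = field(default_factory=dict)


ross = Node({"name": "Ross"})
jack = Node({"name": "Jack"})
megan = Node({"name": "Megan"})

# Direction is assumed for illustration.
relationships = [
    Relationship(ross, "knows", jack),
    Relationship(ross, "knows", megan),
    Relationship(jack, "knows", megan),
]


def neighbors(node, label):
    """Names of nodes reached from `node` over outgoing `label` relationships."""
    return [r.end.properties["name"] for r in relationships
            if r.start is node and r.label == label]


print(neighbors(ross, "knows"))
```

Keeping properties as plain dictionaries mirrors the "arbitrary key-value pairs" idea: a new property needs no schema change.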


OVERVIEW OF THE GRAPH MARKET
Figure: graph products positioned by Graph Processing and Graph Storage.
Products shown: ArangoDB, Trinity, Neo4J, BigData, Bitsy, BrightstarDB, DEX/Sparksee, Filament, GraphBase, Horton, HyperGraphDB, FlockDB, Allegro, OpenLink, R2DF, Titan, VelocityGraph, VertexDB
Graph model types and an example product for each:
• Property Graph: Neo4j
• Triples*: AllegroGraph
• Hypergraph: HyperGraphDB

*Triple stores come from the Semantic Web movement. A triple is a subject-predicate-object data structure.
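
As a rough illustration of the subject-predicate-object structure, a triple can be represented as a three-element tuple. The facts below reuse the names from the basic graph slide; the `objects` helper is a hypothetical name, and a real triple store would expose a query language rather than a Python list.

```python
# Each triple is (subject, predicate, object).
triples = [
    ("Ross", "knows", "Jack"),
    ("Ross", "knows", "Megan"),
    ("Jack", "knows", "Megan"),
]


def objects(subject, predicate):
    """All objects recorded for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("Ross", "knows"))
```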
BENEFITS OF THE GRAPH MODEL
What does the graph database offer?
• It is an agile modeling approach
• No pre-defined schema
• General purpose graph data schema
• Ease of use with the business
Performance
• Performance increase when dealing with connected data.

Flexibility
• We can add nodes/relationships as the business domain dictates.

Agility
• Agile and test-driven software development practices.
Data Modeling with Graph Databases
• Why Data Model with a Graph Database?
• Graph Modeling



WHY DATA MODEL WITH A GRAPH DATABASE

Q: Why did I want to use a graph database?

A: Here is the simplified version of my requirements.

• Requirement #1: It is all about the relationships.


• Requirement #2: First learn requirement #1.
Graph Structure
Nodes:
• Name: Ross, Age: 34 (Label: Member)
• Name: Jack, Age: 7 (Label: Member)
• Name: Martial Arts (Type: Activity)
Relationships:
• Label: Knows, Since: 5/20/2006
• Label: Knows, Since: 5/20/2008
• Label: isMember, Since: 1/20/2014
• Label: isMember, Since: 6/15/2013
GRAPH MODELING
The Modeling Half: Graph
The Database Half: CRUD

CRUD Matrix
Function \ Entity    Appointment
Enter                C
Confirm              RU
Cancel               D
Graph Database:
• “A graph database management system (G-DBMS) is an online database management system with Create, Read, Update, and Delete (CRUD) methods that expose a graph data model.” [1]
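
To make the CRUD idea concrete, here is a toy in-memory sketch. `TinyGraph` and its method names are hypothetical and do not correspond to any real graph database API; the patient node and the `has_appointment` label are invented for illustration. The usage lines follow the Appointment column of the CRUD matrix (Enter = C, Confirm = R and U, Cancel = D).

```python
class TinyGraph:
    """Toy in-memory graph exposing Create, Read, Update, Delete for nodes."""

    def __init__(self):
        self.nodes = {}          # node id -> properties dict
        self.relationships = []  # (start id, label, end id)
        self._next_id = 0

    def create_node(self, **properties):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = dict(properties)
        return node_id

    def read_node(self, node_id):
        return self.nodes[node_id]

    def update_node(self, node_id, **properties):
        self.nodes[node_id].update(properties)

    def delete_node(self, node_id):
        # Remove the node and every relationship that touches it.
        del self.nodes[node_id]
        self.relationships = [r for r in self.relationships
                              if node_id not in (r[0], r[2])]

    def create_relationship(self, start, label, end):
        self.relationships.append((start, label, end))


g = TinyGraph()
patient = g.create_node(name="Pat")
appt = g.create_node(status="entered")            # Enter   -> Create
g.create_relationship(patient, "has_appointment", appt)
g.update_node(appt, status="confirmed")           # Confirm -> Read, Update
confirmed_status = g.read_node(appt)["status"]
g.delete_node(appt)                               # Cancel  -> Delete
```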
Graph Modeling Rules: “By the book” [1]

• Nodes for Things, Relationships for Structure
• Use nodes to represent entities, that is, the things that are of interest
• Use relationships to build structure:
  • Express connections between entities
  • Establish semantic context for each entity
• Use node properties to represent entity attributes, plus metadata
• Use relationship properties to express the strength, weight, or quality of a relationship, plus metadata
How do you use a graph database?
• Traversal of the database.

Query
• Follow the relationships from node to node

Result Options
• A set
• A path
• A pattern

Set Path Pattern
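
The three result shapes can be illustrated with a small traversal sketch. The adjacency list reuses the names from the basic graph slide with assumed directions, and the function names are hypothetical; a real graph database would express these as queries rather than hand-written loops.

```python
from collections import deque

# Adjacency list: node -> nodes reached by following one relationship.
graph = {
    "Ross": ["Jack", "Megan"],
    "Jack": ["Megan"],
    "Megan": [],
}


def reachable(start):
    """Result as a set: every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def shortest_path(start, goal):
    """Result as a path: a shortest list of nodes from `start` to `goal`, or None."""
    queue, visited = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def two_hop_pattern(start):
    """Result as a pattern: every (a, b, c) where a -> b and b -> c."""
    return [(start, b, c) for b in graph[start] for c in graph[b]]
```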


Relational and Graph Models
• The Similarities
• The Differences



RELATIONAL AND GRAPH MODELS

Similarities
• Define and agree upon the domain entities
• Define the interactions and governing rules
• Whiteboard stage is the same

Differences
• Few changes from conceptual to logical to physical
• The graph storage model matches the logical model
• After the initial domain definition, we enhance the graph instead of defining tables.
Figure panels: Relational, Graph
Healthcare Use Case
• Patient Matching



HEALTHCARE USE CASE

Accountable Care Organizations (ACOs)
• Patient Protection and Affordable Care Act of 2010
• Transform health providers into ACOs

What does this boil down to?
• Patient Matching
Patient Matching
• Two specific objectives [6]
  • Identify common attributes
  • Define processes and best practices

Scope of Problem
• Up to 14 percent of medical records contain erroneous data [6]
Master Data Lookup
• FirstName
• LastName
• DOB
• Gender
• SSN
• Address1

PatientMaster
PK        PatientMasterID
FK1,FK2   FirstName
FK1,FK2   LastName
FK1,FK2   DOB
FK1,FK2   Gender
FK1       SSN
FK1       Address1

PatientSourceB
PK        FirstName
PK        LastName
PK        DOB
PK        Gender
PK        SSN
PK        Address1

PatientSourceC
PK        FirstName
PK        LastName
PK        DOB
PK        Gender
Normalization

PatientExternal
PK        PatientExternalID
          FirstNameOriginal
          LastNameOriginal
          DOBOriginal
          GenderOriginal
          SSNOriginal
          Address1Original

PatientSourceRef
PK        PatientSourceRefID
FK1       PatientExternalID
FK2       PatientMasterID
          IsActiveRecord

PatientMaster
PK        PatientMasterID
FK1,FK2   FirstName
FK1,FK2   LastName
FK1,FK2   DOB
FK1,FK2   Gender
FK1       SSN
FK1       Address1
I created a matching site based on a social graph database example.

Figure (graph model):
• Top-row nodes: Source A, Source B, Address C
• Patient nodes: William, Bill, Pat, Pat, Joe
• Attribute nodes: Address, State, DOB, Gender
• Relationships: Came_From, Lives_At, Lives_In
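
One way to picture patient matching on a graph is to treat each distinct attribute value as a shared node and look for patients connected to the same ones. The sketch below assumes that approach; the records, attribute values, the `candidate_matches` helper, and the threshold of two shared attributes are all invented for illustration and are not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical patient records, keyed by "source:name". All values are made up.
patients = {
    "A:William": {"DOB": "1980-01-01", "Gender": "M", "Address": "123 Fake Street"},
    "B:Bill":    {"DOB": "1980-01-01", "Gender": "M", "Address": "123 Fake Street"},
    "C:Joe":     {"DOB": "1975-06-30", "Gender": "M", "Address": "9 Elm Road"},
}

# Each distinct (attribute, value) pair acts as a shared node that patients link to.
attribute_nodes = defaultdict(set)
for patient_id, attrs in patients.items():
    for key, value in attrs.items():
        attribute_nodes[(key, value)].add(patient_id)


def candidate_matches(min_shared=2):
    """Pairs of patients connected to at least `min_shared` common attribute nodes."""
    shared = defaultdict(int)
    for linked in attribute_nodes.values():
        for pair in combinations(sorted(linked), 2):
            shared[pair] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


print(candidate_matches())
```

Here William and Bill share a date of birth, gender, and address, so they surface as a candidate match, while Joe shares only gender with the others and does not.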


Deeper Dive into Graph Databases
• Graph Modeling Continued
• Graph Modeling Mistakes
• Patterns
• Misc.



GRAPH MODELING CONTINUED
DEEPER DIVE INTO GRAPH DATABASES
Graph Modeling Guidelines:

• The query patterns drive the data model
• Normalization is a natural trend in graph modeling
• In general normalization has a low cost
• Complexity with normalization will drive traversal speeds up
• The SIP Methodology [2]
• Use in-graph indices for range queries*
• Node and relationship redundancy is not bad
• Schema development over time
• Database extensions*
*Have not used myself

Graph Modeling Dilemmas:

• Q: Should I create a Relationship or a Property?


• Q: Should every node with the same key/value
(property) be connected?
• A: It depends.
GRAPH MODELING MISTAKES
What was I thinking?
• CHAOS
• I started without a plan
This is easy!
• DESIGN PATTERNS [3]
• Linked List
• Multiple Relationships
• Tags and Categories
• Multi Level Tree
• R-Tree (spatial)
• Activity Stream
• Anti-pattern: Unconnected graph
PATTERNS
Pattern: Linked List

Anti-pattern: a single Olympian node with properties name, country, sport1_name, sport1_rank, sport2_name, sport2_rank, sport3_name, sport3_rank

Pattern:
• (name: bode miller) -[Competes_for]-> (country: usa)
• Sport nodes chained by Sport_order relationships:
  • Name: downhill, Rank: 12
  • Name: super-g, Rank: 3
  • Name: super combined downhill, Rank: 12
  • Name: super combined slalom, Rank: 7
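
A rough sketch of the linked-list pattern: each sport node points to the next one, so the ordering lives in the relationships rather than in numbered properties such as sport1_name. The dictionary layout, the `next` and `first_sport` keys, and the order of the sports are assumptions for illustration.

```python
# Each sport node carries its own properties plus a link to the next sport.
sports = {
    "downhill":                {"Rank": 12, "next": "super-g"},
    "super-g":                 {"Rank": 3,  "next": "super combined downhill"},
    "super combined downhill": {"Rank": 12, "next": "super combined slalom"},
    "super combined slalom":   {"Rank": 7,  "next": None},
}
athlete = {"name": "bode miller", "first_sport": "downhill"}


def sports_in_order(athlete):
    """Follow the Sport_order links from the athlete's first sport to the end."""
    ordered, current = [], athlete["first_sport"]
    while current is not None:
        ordered.append((current, sports[current]["Rank"]))
        current = sports[current]["next"]
    return ordered


print(sports_in_order(athlete))
```

Adding a fifth sport means adding one node and one link; the anti-pattern would need a new pair of sportN properties on the Olympian node.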
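Pattern: Multiple Relationships

Anti-pattern: a single Olympian node with properties name, country, sport1_name, sport1_rank, sport2_name, sport2_rank, sport3_name, sport3_rank

Pattern:
• (name: bode miller) -[Competes_for]-> (country: usa)
• (name: bode miller) -[Competes_in, Order: 1]-> (Downhill)
• (name: bode miller) -[Placed, Rank: 8]-> (Downhill)
• (name: bode miller) -[Competes_in, Order: 2]-> (Super Combined)
• (name: bode miller) -[Placed, Rank: 12]-> (Super Combined)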
Pattern: Tags and Categories [1]

Anti-pattern: Data Center, Database_server, Application, Virtual Machine, Server, Rack

Pattern (every node has an Id and a Status: Up/Down):
• Id: App 1, Id: App 2 -[Runs_on]-> Id: Vir Machine 15, Id: Vir Machine 16, Id: Vir Machine 17
• Id: Vir Machine 15, 16, 17 -[Hosted_by]-> Id: Server 1, Id: Vir Machine 2
• Id: Server 1, Id: Vir Machine 2 -[In]-> Id: Rack 1
Pattern: Multi-Level Tree [1]

timeline
  Year 2013
    Month december
      Day 15 -[on]-> Event A
      Day 25 -[on]-> Event B
  Year 2014
    Month january
      Day 1 -[on]-> Event C
      Day 2 -[on]-> Event D
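
The timeline tree can be sketched with nested dictionaries standing in for the Year, Month, and Day nodes. This is a simplified illustration using the values on the slide; the function names are hypothetical, and a graph database would store each level as nodes joined by relationships.

```python
# timeline -> year -> month -> day -> events that happened "on" that day
timeline = {
    2013: {"december": {15: ["Event A"], 25: ["Event B"]}},
    2014: {"january": {1: ["Event C"], 2: ["Event D"]}},
}


def events_on(year, month, day):
    """Walk year -> month -> day and return the events on that day."""
    return timeline.get(year, {}).get(month, {}).get(day, [])


def events_in_month(year, month):
    """Collect events under every day node of one month, in day order."""
    days = timeline.get(year, {}).get(month, {})
    return [event for day in sorted(days) for event in days[day]]


print(events_on(2013, "december", 25))
print(events_in_month(2014, "january"))
```

Because each level is its own node, a query for a whole month or year starts at that node and never touches the rest of the timeline.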


Pattern: Stream Analysis [5]

http://blog.bruggen.com/2013/11/clickstreams-are-so-much-nicer-in-neo4j.html
MISCELLANEOUS
Fine-Grained Relationships
(name: bode miller) -[Olympic_Address]-> (Address: 123 Fake Street)

Generic Relationships
(name: bode miller) -[Address, Type: Olympic]-> (Address: 123 Fake Street)
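
The trade-off between the two styles shows up in how a query selects the relationship. In this sketch, relationships are plain tuples and both helper functions are hypothetical: the fine-grained version matches on the label alone, while the generic version matches the label and then filters on a relationship property.

```python
# Relationships as (start, label, properties, end) tuples.
fine_grained = [("bode miller", "Olympic_Address", {}, "123 Fake Street")]
generic = [("bode miller", "Address", {"Type": "Olympic"}, "123 Fake Street")]


def olympic_address_fine(rels, person):
    # Fine-grained: the label alone selects the relationship.
    return [end for start, label, _, end in rels
            if start == person and label == "Olympic_Address"]


def olympic_address_generic(rels, person):
    # Generic: match the label, then filter on a relationship property.
    return [end for start, label, props, end in rels
            if start == person and label == "Address"
            and props.get("Type") == "Olympic"]


print(olympic_address_fine(fine_grained, "bode miller"))
print(olympic_address_generic(generic, "bode miller"))
```

The generic form makes it easy to ask for all addresses regardless of type; the fine-grained form avoids the extra property check when only one type is wanted.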
OLTP
• Graph Databases
• Native Graph Storage
• Index-free adjacency processing

OLAP
• Graph Compute Engines
• Identify clusters in data
• Optimized for scanning and processing large sets
Enterprise Ready
• Monitoring
• Live backups
• High performance caches
• HA clustering
Physical Model
• Joins have a low cost
• Index-free “adjacency of entities”
• Performance is related to the result size
• CONS
  • Tabular Data Items
  • Blobs
Common Use Cases
• Social
• Recommendations
• Geo
• Master Data Management
• Network & Data Center Mgmt
• Authorization and Access Control
Logistics Use Case
• Multiple Picks
• Multiple Drops



LOGISTICS USE CASE

Multiple Picks/Drops
• Carriers need to optimize
• Make multiple pickups
• Make multiple drop-offs

Numerous Examples
• MIT Supply Chain Management
Carrier
PK    CarrierID
      Name
      SomeAttribute

PickDropRef
PK    PickDropRefID
FK1   CarrierID
FK2   SiteID
      SiteType
      Sequence

Site
PK    SiteID
      Name
      SomeAttribute
Figure (graph model) nodes:
• Carrier
• Pickup A, Pickup B
• Drop B, Drop C
• Package 1, Package 2, Package 3
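
A small sketch of how a multi-pick, multi-drop route might be checked. The stops use the site letters and package numbers from the slide, but which package is picked up or dropped at which site is an assumption, as is the `load_after_each_stop` helper; the slide does not give these details.

```python
# Ordered stops for one carrier: (action, site, packages handled at that stop).
# The package-to-site assignments are assumed for illustration.
route = [
    ("Pickup", "A", ["Package 1", "Package 2"]),
    ("Drop",   "B", ["Package 1"]),
    ("Pickup", "B", ["Package 3"]),
    ("Drop",   "C", ["Package 2", "Package 3"]),
]


def load_after_each_stop(route):
    """Packages on board after each stop; raises if a package is dropped before pickup."""
    on_board, history = set(), []
    for action, site, packages in route:
        for package in packages:
            if action == "Pickup":
                on_board.add(package)
            elif package in on_board:
                on_board.remove(package)
            else:
                raise ValueError(f"{package} dropped at {site} before pickup")
        history.append((site, sorted(on_board)))
    return history


print(load_after_each_stop(route))
```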
Security Use Case
• Users



SECURITY USE CASE [1]

Authorization & Access Control
• “Ensure that users and administrators see and change only those parts of the organization and the products and services they are entitled to manage.” [1]
“This model comprises two hierarchies. In the first hierarchy, admins within each customer organization are assigned to groups; these groups are then granted various permissions against that organization’s structure.”

“Graph Databases” [1]
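
The admin-to-group-to-organization idea can be sketched as a short traversal. Every name below (admins, groups, organization units, and the `manageable_units` helper) is hypothetical, and the rule that a permission on a unit also covers its child units is an assumption made for this illustration.

```python
# admin -> groups, group -> org units it may manage, org unit -> child units
member_of = {"admin_ann": ["group_1"], "admin_bob": ["group_2"]}
allowed = {"group_1": ["Acme"], "group_2": ["Acme East"]}
children = {"Acme": ["Acme East", "Acme West"], "Acme East": [], "Acme West": []}


def manageable_units(admin):
    """Follow admin -> group -> org unit, then descend the organization hierarchy."""
    result = set()
    stack = [unit
             for group in member_of.get(admin, [])
             for unit in allowed.get(group, [])]
    while stack:
        unit = stack.pop()
        if unit not in result:
            result.add(unit)
            stack.extend(children.get(unit, []))
    return result


print(sorted(manageable_units("admin_ann")))
print(sorted(manageable_units("admin_bob")))
```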
Summary
• Graph Modeling
• Graph Databases
• Tail Wind Informatics
• Ross McNeely



REFERENCES
[1] Ian Robinson, Jim Webber, Emil Eifrem. “Graph Databases” (O’Reilly). Copyright 2013 Neo Technology, Inc., 978-1-449-35626-2.
[2] Roger Sessions. “Controlling Complexity in Enterprise Architectures: The SIP Methodology” (ObjectWatch).
[3] http://www.neo4j.org/develop/modeling (Michael Hunger)
[4] http://en.wikipedia.org/wiki/R-tree
[5] http://blog.bruggen.com/2013/11/clickstreams-are-so-much-nicer-in-neo4j.html
[6] http://www.himss.org/News/NewsDetail.aspx?ItemNumber=22312

General References
• https://www.gartner.com/doc/2081316
• http://www.neo4j.org/learn/neo4j
• http://franz.com/agraph/allegrograph/
• http://www.hypergraphdb.org/index
• http://scm.mit.edu/research
