MongoUK 2011 Schema Design

Schema Design
Alvin Richards - alvin@10gen.com
Topics
- Introduction
- Basic Data Modeling
- Evolving a schema
- Common patterns
  - Single table inheritance
  - One-to-Many & Many-to-Many
  - Trees
  - Queues
So why model data?
http://www.flickr.com/photos/42304632@N00/493639870/
A brief history of normalization

1970 E.F. Codd introduces 1st Normal Form (1NF)
1971 E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:
- Avoid anomalies when inserting, updating or deleting
- Minimize redesign when extending the schema
- Make the model informative to users
- Avoid bias towards a particular style of query
* source : wikipedia
The real benefit of relational

Before relational
- Data and Logic combined

After relational
- Separation of concerns
- Data modeled independent of logic
- Logic freed from concerns of data design
MongoDB continues this separation
Relational made normalized data look like this
Document databases make normalized data look like this
Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
So today's example will use...
Design Session
Design documents that simply map to your application
> post = {author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.posts.save(post)
Find the document

> db.posts.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "adventure" ] }

Notes:
- ID must be unique, but can be anything you'd like
- MongoDB will generate a default ID if one is not supplied
Add an index, find via index

Secondary index for author

// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Hergé'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: ISODate("2011-09-18T09:56:06.298Z"),
  author: "Hergé",
  ... }
Examine the query plan

> db.posts.find({author: 'Hergé'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [ [ "Hergé", "Hergé" ] ]
  }
}
Query operators
Conditional operators:
- $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
- $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find({tags: {$exists: true}})
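To make the operator semantics concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB evaluates queries; `satisfies`, `find`, the `votes` field and the sample documents are assumptions made for illustration, and only a handful of operators are covered. Unlike MongoDB, it does not look inside array fields.

```javascript
// Hypothetical sample documents; "votes" is not part of the slides' schema.
const posts = [
  { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"], votes: 5 },
  { author: "Kyle", text: "Untagged note", votes: 2 },
];

// Returns true when `value` satisfies every operator in `cond`.
function satisfies(value, cond) {
  return Object.entries(cond).every(([op, arg]) => {
    switch (op) {
      case "$exists": return (value !== undefined) === arg;
      case "$gt": return value > arg;
      case "$lte": return value <= arg;
      case "$ne": return value !== arg;
      case "$in": return arg.includes(value);
      default: throw new Error("operator not covered by this sketch: " + op);
    }
  });
}

// Returns the documents whose fields satisfy every condition in `query`.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) => satisfies(doc[field], cond))
  );
}

const tagged = find(posts, { tags: { $exists: true } });
const popular = find(posts, { votes: { $gt: 3 }, author: { $in: ["Hergé", "Kyle"] } });
console.log(tagged.length, popular.length);
```

Conditions on different fields are combined with AND, which mirrors how several keys in one query document behave.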
Query operators

Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i})

Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
Extending the Schema

> new_comment = {author: "Kyle", date: new Date(), text: "great book"}
> db.posts.update(
    {text: "Destination Moon"},
    {$push: {comments: new_comment},
     $inc: {comments_count: 1}})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ],
  comments_count: 1 }
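The effect of combining $push and $inc in one update can be sketched in memory as follows. `applyUpdate` is a hypothetical helper written for this illustration, not a MongoDB function; it only mimics the two modifiers used above on a plain object.

```javascript
// Applies the $push and $inc parts of an update document to one in-memory object.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // $push creates a missing array
    if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // $inc treats a missing field as 0
    doc[field] = (doc[field] === undefined ? 0 : doc[field]) + amount;
  }
  return doc;
}

const post = { text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", date: new Date(), text: "great book" };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count); // prints: 1 1
```

Neither `comments` nor `comments_count` existed before the update, which is the point of the slide: the schema is extended by the write itself, with no separate migration step.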

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)
When sorting, check if you need an index
Watch for full table scans

> db.posts.find({text: 'Destination Moon'}).explain()
{
  "cursor" : "BasicCursor",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : { }
}
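Why the cursor type matters can be shown with a small in-memory sketch. A `Map` stands in for the index here purely for illustration (MongoDB's indexes are B-trees, as the "BtreeCursor" output indicates); `scanByAuthor`, `buildIndex`, `findByIndex` and the sample data are assumptions.

```javascript
// Hypothetical sample collection.
const posts = [
  { author: "Hergé", text: "Destination Moon" },
  { author: "Kyle", text: "First post" },
  { author: "Kyle", text: "Second post" },
  { author: "James", text: "Hello" },
];

// Full scan: every document is examined, like a BasicCursor.
function scanByAuthor(docs, author) {
  let nscanned = 0;
  const results = [];
  for (const doc of docs) {
    nscanned++;
    if (doc.author === author) results.push(doc);
  }
  return { results, nscanned };
}

// Builds a lookup from field value to the documents holding that value.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// Indexed lookup: only the matching entries are touched.
function findByIndex(index, key) {
  const results = index.get(key) || [];
  return { results, nscanned: results.length };
}

const fullScan = scanByAuthor(posts, "Hergé");
const indexed = findByIndex(buildIndex(posts, "author"), "Hergé");
console.log(fullScan.nscanned, indexed.nscanned); // prints: 4 1
```

Both lookups return the same single document; only the number of documents examined differs, and that gap grows with the size of the collection.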
Group
- Equivalent to a Group By in SQL
- Specify the attributes to group the data by
- Process the results in a Reduce function
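As a rough picture of how the key, the initial value and the Reduce function fit together, here is an in-memory sketch in plain JavaScript. The `group` function below is a hypothetical stand-in, not the MongoDB command: it takes a single field name as the key, and the sample documents are invented.

```javascript
// Groups docs by one field; each group starts as a copy of `initial`
// and is updated by calling `reduce(doc, groupAccumulator)` per document.
function group(docs, { key, initial, reduce }) {
  const groups = new Map();
  for (const doc of docs) {
    const keyValue = doc[key];
    if (!groups.has(keyValue)) {
      groups.set(keyValue, { [key]: keyValue, ...initial });
    }
    reduce(doc, groups.get(keyValue));
  }
  return [...groups.values()];
}

// Hypothetical sample collection: one post by Hergé, three by Kyle.
const posts = [
  { author: "Hergé", text: "Destination Moon" },
  { author: "Kyle", text: "First" },
  { author: "Kyle", text: "Second" },
  { author: "Kyle", text: "Third" },
];

const result = group(posts, {
  key: "author",
  initial: { count: 0 },
  reduce: (obj, prev) => { prev.count++; },
});
console.log(JSON.stringify(result));
```

The reduce function mutates the per-group accumulator rather than returning a value, which matches the shape of the reduce function used with the shell's group command.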
Group - Count post by Author

cmd = { key: { "author": true },
        initial: { count: 0 },
        reduce: function(obj, prev) {
          prev.count++;
        }
      };
result = db.posts.group(cmd);

[
  { "author" : "Hergé", "count" : 1 },
  { "author" : "Kyle", "count" : 3 }
]
Common Patterns
Inheritance
Single Table Inheritance - RDBMS

shapes table

id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2
Single Table Inheritance - MongoDB

> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, d: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1})
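The same idea in plain JavaScript, as a sketch: documents of different "subtypes" live in one array and simply omit the fields that do not apply. The filter below mimics the radius query in memory; it is not the MongoDB query engine, and `fieldsByType` is a helper invented for this illustration.

```javascript
// The three shape documents from the slide, held in memory.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Documents without a radius never match: undefined > 0 is false.
const withRadius = shapes.filter((shape) => shape.radius > 0);

// Lists the type-specific fields per shape type (everything beyond the shared ones).
const shared = ["_id", "type", "area"];
const fieldsByType = Object.fromEntries(
  shapes.map((shape) => [
    shape.type,
    Object.keys(shape).filter((field) => !shared.includes(field)),
  ])
);

console.log(withRadius.map((shape) => shape.type)); // prints: [ 'circle' ]
console.log(fieldsByType);
```

Compared with the relational table, no row carries empty columns for attributes that belong to another shape type.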
One to Many
One to Many relationships can specify
- degree of association between objects
- containment
- life-cycle
One to Many
- Embedded Array / Array Keys
  - $slice operator to return subset of array
  - some queries harder, e.g. find latest comments across all documents

blogs: {
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ]}
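A small in-memory sketch of both points: taking the last few comments of one post is a cheap slice, while "latest comments across all documents" needs every embedded array flattened and sorted. `lastComments`, `latestComments` and the sample posts are assumptions for illustration; plain JavaScript `Date` objects stand in for ISODate values.

```javascript
// Hypothetical sample posts with embedded comments.
const posts = [
  { text: "Destination Moon", comments: [
      { author: "Kyle", date: new Date("2011-09-19T09:56:06.298Z"), text: "great book" },
      { author: "James", date: new Date("2011-09-21T08:00:00Z"), text: "agreed" },
  ] },
  { text: "Second post", comments: [
      { author: "Kyle", date: new Date("2011-09-20T12:00:00Z"), text: "nice" },
  ] },
  { text: "No comments yet" },
];

// Last n comments of ONE post: a slice of its embedded array.
function lastComments(post, n) {
  return n > 0 ? (post.comments || []).slice(-n) : [];
}

// Latest n comments across ALL posts: flatten every array, then sort by date.
function latestComments(allPosts, n) {
  return allPosts
    .flatMap((post) => (post.comments || []).map((c) => ({ post: post.text, ...c })))
    .sort((a, b) => b.date - a.date)
    .slice(0, n);
}

console.log(lastComments(posts[0], 1).map((c) => c.author)); // prints: [ 'James' ]
console.log(latestComments(posts, 2).map((c) => c.text));    // prints: [ 'agreed', 'nice' ]
```

The second function has to touch every post, which is the cost the slide hints at when it calls such queries harder.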
One to Many
- Embedded tree
  - Single document
  - Natural
  - Hard to query

blogs: {
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book",
      replies: [ { author : "James", ... } ] }
  ]}
One to Many
- Normalized (2 collections)
  - most flexible
  - more queries

blogs: {
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [ {comment : ObjectId(1)} ]}

comments : {
  _id : 1,
  author : "James",
  date : ISODate("2011-09-19T09:56:06.298Z")}
One to Many - patterns
- Embedded Array / Array Keys
- Embedded tree
- Normalized
Many - Many
Example:
- Product can be in many categories
- Category can have many products
Many - Many
products:
{ _id: ObjectId("10"),
  name: "Destination Moon",
  category_ids: [ObjectId("20"), ObjectId("30")]}
Many - Many
categories:
{ _id: ObjectId("20"),
  name: "adventure",
  product_ids: [ObjectId("10"), ObjectId("11"), ObjectId("12")]}

// All categories for a given product
> db.categories.find({product_ids: ObjectId("10")})
Alternative
products:
{ _id: ObjectId("10"),
  name: "Destination Moon",
  category_ids: [ObjectId("20"), ObjectId("30")]}

categories:
{ _id: ObjectId("20"),
  name: "adventure"}
Alternative

// All products for a given category
> db.products.find({category_ids: ObjectId("20")})

// All categories for a given product
// (findOne returns the document itself; find would return a cursor)
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
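The two lookups of this alternative can be sketched in memory as below. Plain numbers stand in for ObjectIds; the second product, the "comic" category and both helper functions are assumptions made for illustration.

```javascript
// Only products carry the link; categories know nothing about products.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Hypothetical second book", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" },
];

// One step: match on the array of category ids.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two steps: load the product, then fetch categories whose _id is in its list ($in).
function categoriesForProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p._id));     // prints: [ 10, 11 ]
console.log(categoriesForProduct(10).map((c) => c.name));  // prints: [ 'adventure', 'comic' ]
```

The trade-off is visible in the code: the link is stored once, so there is nothing to keep in sync, but one direction of the relationship costs two lookups.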
Trees
Full Tree in Document
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...", replies: [] }
      ]}
  ]
}

Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit

Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
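The two link styles hold the same tree, stored from opposite ends. The sketch below uses invented node ids and hypothetical helpers (`toChildLinks`, `pathToRoot`) to show the conversion and what walking up via parent links costs; it runs on in-memory arrays, not on a collection.

```javascript
// Parent links: each node document names its parent (null for the root).
const nodes = [
  { _id: "a", parent: null },
  { _id: "b", parent: "a" },
  { _id: "c", parent: "b" },
  { _id: "e", parent: "a" },
];

// Derives the child-links form: each node lists the ids of its children.
function toChildLinks(parentLinked) {
  const children = new Map(parentLinked.map((n) => [n._id, []]));
  for (const n of parentLinked) {
    if (n.parent !== null) children.get(n.parent).push(n._id);
  }
  return parentLinked.map((n) => ({ _id: n._id, children: children.get(n._id) }));
}

// Walking to the root with parent links takes one lookup per level.
function pathToRoot(parentLinked, id) {
  const byId = new Map(parentLinked.map((n) => [n._id, n]));
  const path = [];
  let current = byId.get(id);
  while (current && current.parent !== null) {
    path.push(current.parent);
    current = byId.get(current.parent);
  }
  return path;
}

console.log(toChildLinks(nodes));
console.log(pathToRoot(nodes, "c")); // prints: [ 'b', 'a' ]
```

Against a real collection each step of `pathToRoot` would be a separate query, which is what motivates storing more than the immediate parent.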
Array of Ancestors
- Store all Ancestors of a node

{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }

// find all descendants of b:
> db.tree2.find({ancestors: "b"})

// find all direct descendants of b:
> db.tree2.find({parent: "b"})

// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: "f"}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
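The three queries can be traced against the same six documents held in memory. This is a plain JavaScript sketch, not the shell; `descendantsOf`, `childrenOf` and `ancestorsOf` are hypothetical helper names.

```javascript
// The tree from the slide, as in-memory documents.
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
];

// All descendants: any node whose ancestors array contains the id.
function descendantsOf(id) {
  return tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);
}

// Direct descendants: nodes whose parent is the id.
function childrenOf(id) {
  return tree.filter((n) => n.parent === id).map((n) => n._id);
}

// Ancestors: read the array off the node, then fetch those nodes ($in).
function ancestorsOf(id) {
  const node = tree.find((n) => n._id === id);
  const ids = (node && node.ancestors) || [];
  return tree.filter((n) => ids.includes(n._id)).map((n) => n._id);
}

console.log(descendantsOf("b")); // prints: [ 'c', 'd' ]
console.log(childrenOf("a"));    // prints: [ 'b', 'e' ]
console.log(ancestorsOf("f"));   // prints: [ 'a', 'e' ]
```

Each question is answered with one pass (or one read plus one pass), with no level-by-level traversal; the price is that moving a subtree means rewriting the ancestors array of every node in it.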
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use text search to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post", path: "/" },
    { author: "Jim", text: "jim's comment", path: "/jim" },
    { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" }
  ]
}

// Find the conversations Jim was part of
// (path is a field of the embedded comments and starts with the delimiter)
> db.posts.find({"comments.path": /^\/jim/i})
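A prefix match on the path is all it takes to select a subtree, as this in-memory sketch shows. The "/jimmy" comment and the `subtree` helper are assumptions added to illustrate one pitfall: a bare prefix test also matches sibling paths that merely start with the same characters.

```javascript
// Comments with materialized paths; the last entry is hypothetical.
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
  { author: "Jimmy", text: "unrelated thread", path: "/jimmy" },
];

// Subtree rooted at `prefix`: the node itself plus everything below it.
function subtree(all, prefix) {
  return all.filter((c) => c.path === prefix || c.path.startsWith(prefix + "/"));
}

// A bare prefix regex is looser: it also picks up "/jimmy".
const loose = comments.filter((c) => /^\/jim/i.test(c.path));
const strict = subtree(comments, "/jim");

console.log(loose.map((c) => c.path));  // prints: [ '/jim', '/jim/kyle', '/jimmy' ]
console.log(strict.map((c) => c.path)); // prints: [ '/jim', '/jim/kyle' ]
```

Whether the looser match matters depends on how node names are chosen; anchoring the pattern on the delimiter avoids the question.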
Queue
Need to maintain order and state
Ensure that updates to the queue are atomic

{ inprogress: false,
  priority: 1,
  ...
}

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
        query: {inprogress: false},
        sort: {priority: -1},
        update: {$set: {inprogress: true, started: new Date()}},
        new: true})
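The claim step (query, sort, update, return the updated document) can be sketched in memory as follows. `claimNextJob` and the sample jobs are assumptions for illustration. In single-threaded JavaScript the select-and-mark sequence cannot be interleaved; against a shared database that guarantee has to come from the server, which is why the slide uses findAndModify rather than a separate find and update.

```javascript
// Hypothetical job documents.
const jobs = [
  { name: "resize", inprogress: false, priority: 1 },
  { name: "email", inprogress: false, priority: 5 },
  { name: "report", inprogress: false, priority: 3 },
];

// Picks the highest-priority job that is not in progress, marks it, returns it.
// Returns null when nothing is left to claim.
function claimNextJob(queue, now = new Date()) {
  const candidates = queue.filter((job) => !job.inprogress); // query
  if (candidates.length === 0) return null;
  const next = candidates.reduce((best, job) =>              // sort: priority descending
    job.priority > best.priority ? job : best
  );
  next.inprogress = true;                                    // update ($set)
  next.started = now;
  return next;                                               // new: true
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
console.log(first.name, second.name); // prints: email report
```

Because a claimed job is flagged before it is returned, a second caller can never receive the same job.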
Summary
- Schema design is different in MongoDB
- Basic data design principles stay the same
- Focus on how the app manipulates data
- Rapidly evolve schema to meet your requirements
- Enjoy your new freedom, use it wisely :-)
download at mongodb.org
alvin@10gen.com
conferences, appearances, and meetups

http://www.10gen.com/events
http://bit.ly/mongo1
Facebook | Twitter | LinkedIn

@mongodb
http://linkd.in/joinmongo
