
Schema Design

Alvin Richards - alvin@10gen.com

Topics
- Introduction
- Basic Data Modeling
- Evolving a schema
- Common patterns
  - Single table inheritance
  - One-to-Many & Many-to-Many
  - Trees
  - Queues

So why model data?

http://www.flickr.com/photos/42304632@N00/493639870/

A brief history of normalization


- 1970: E.F. Codd introduces 1st Normal Form (1NF)
- 1971: E.F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
- 1974: Codd & Boyce define Boyce/Codd Normal Form (BCNF)
- 2002: Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:
- Avoid anomalies when inserting, updating or deleting
- Minimize redesign when extending the schema
- Make the model informative to users
- Avoid bias towards a particular style of query

* Source: Wikipedia

The real benefit of relational


Before relational:
- Data and logic combined

After relational:
- Separation of concerns
- Data modeled independent of logic
- Logic freed from concerns of data design

MongoDB continues this separation

Relational made normalized data look like this

Document databases make normalized data look like this

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

So today's example will use...

Design Session
Design documents that simply map to your application
> post = {author: "Herg",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.posts.save(post)

Find the document


> db.posts.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Herg",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "adventure" ] }

Notes:
- ID must be unique, but can be anything you'd like
- MongoDB will generate a default ID if one is not supplied
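The default ID is an ObjectId, and its first four bytes are a big-endian timestamp in seconds since the Unix epoch. The sketch below decodes that timestamp by hand from the hex string shown above, to make the layout concrete; `objectIdTimestamp` is a hypothetical helper, not a driver or shell function.

```javascript
// Hypothetical helper (not a driver API): decode the creation time that is
// embedded in an ObjectId given as a 24-character hex string.
// Bytes 0-3 of an ObjectId are a big-endian Unix timestamp in seconds.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters, got: " + hex);
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes = 8 hex chars
  return new Date(seconds * 1000);
}

// The _id from the find() output above
const createdAt = objectIdTimestamp("4c4ba5c0672c685e5e8aabf3");
console.log(createdAt.toISOString()); // prints 2010-07-25T02:47:28.000Z
```

In the mongo shell the same value is available as `ObjectId("4c4ba5c0672c685e5e8aabf3").getTimestamp()`. It records when the id was generated, which is independent of the `date` field the application stores.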

Add an index, find via index


Secondary index for author

// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Herg'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: ISODate("2011-09-18T09:56:06.298Z"),
  author: "Herg", ... }

Examine the query plan


> db.posts.find({author: 'Herg'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [ [ "Herg", "Herg" ] ]
  }
}

Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find({tags: {$exists: true}})


Regular expressions:

// posts where author starts with h
> db.posts.find({author: /^h/i })


Counting:

// number of posts written by Herg
> db.posts.find({author: "Herg"}).count()
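To make the operator semantics concrete, here is a rough in-memory model of a few of them in plain JavaScript. The functions `find`, `matchValue` and `anyElement` and the second sample post are hypothetical; only a handful of operators are modeled, and the point is the behavior, for example that `{tags: "comic"}` matches a post whose `tags` array contains "comic".

```javascript
// Hypothetical in-memory matcher for a few of the operators above. It models
// the semantics only; it is not how the server evaluates queries.
// A condition on an array field matches if any element matches, as in MongoDB.
function anyElement(value, pred) {
  return Array.isArray(value) ? value.some(pred) : pred(value);
}

function matchValue(value, cond) {
  if (cond instanceof RegExp) {
    return anyElement(value, v => typeof v === "string" && cond.test(v));
  }
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (value !== undefined) === Boolean(arg);
        case "$ne":     return !anyElement(value, v => v === arg);
        case "$in":     return anyElement(value, v => arg.includes(v));
        case "$gt":     return anyElement(value, v => v > arg);
        case "$gte":    return anyElement(value, v => v >= arg);
        case "$lt":     return anyElement(value, v => v < arg);
        case "$lte":    return anyElement(value, v => v <= arg);
        default: throw new Error("operator not modeled: " + op);
      }
    });
  }
  return anyElement(value, v => v === cond); // plain equality
}

function find(collection, query) {
  return collection.filter(doc =>
    Object.entries(query).every(([field, cond]) => matchValue(doc[field], cond)));
}

const posts = [
  { author: "Herg", text: "Destination Moon", tags: ["comic", "adventure"] },
  { author: "Kyle", text: "notes without tags" }, // hypothetical second post
];

console.log(find(posts, { tags: { $exists: true } }).length); // 1
console.log(find(posts, { author: /^h/i }).length);           // 1
console.log(find(posts, { author: "Herg" }).length);          // 1
```

Counting is then just the length of the filtered result, which is what `.count()` reports for the real cursor.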

Extending the Schema



> new_comment = {author: "Kyle", date: new Date(), text: "great book"}

> db.posts.update(
    {text: "Destination Moon"},
    { $push: {comments: new_comment},
      $inc:  {comments_count: 1} })

Extending the Schema


{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ],
  comments_count: 1
}
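The update that produced this document combines two modifiers. This sketch models what they do to a single document, using a hypothetical `applyUpdate` function that only understands `$push` and `$inc`. On the server, all modifiers in one update to a single document are applied atomically, which is what keeps `comments_count` in step with the `comments` array.

```javascript
// Hypothetical in-memory version of the update (only $push and $inc).
// $push appends to an array, creating it if the field is missing;
// $inc adds to a number, treating a missing field as 0.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      if (op === "$push") {
        if (doc[field] === undefined) doc[field] = [];
        if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
        doc[field].push(arg);
      } else if (op === "$inc") {
        doc[field] = (doc[field] === undefined ? 0 : doc[field]) + arg;
      } else {
        throw new Error("operator not modeled: " + op);
      }
    }
  }
  return doc;
}

const post = { author: "Herg", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", date: new Date("2011-09-19T09:56:06.298Z"), text: "great book" };
applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count); // 1 1
```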

Extending the Schema


// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)
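Dot notation reaches into arrays: `{"comments.author": "Kyle"}` matches a post if any of its comments has that author. The sketch below models that lookup and the sort-then-limit queries in memory; `valuesAtPath`, `findByPath`, `topN` and the sample posts are hypothetical. Without an index on the sort field, the server likewise has to sort the matching documents itself before applying the limit.

```javascript
// Hypothetical helper: collect the values reached by a dotted path such as
// "comments.author", stepping into every element of an array along the way,
// which is how dot notation reaches fields inside arrays of subdocuments.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const v of current) {
      const candidates = Array.isArray(v) ? v : [v];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  return current;
}

// like db.posts.find({"comments.author": value})
function findByPath(collection, path, value) {
  return collection.filter(doc => valuesAtPath(doc, path).includes(value));
}

// like .sort({field: -1}).limit(n); assumes every document has a numeric field
function topN(collection, field, n) {
  return [...collection].sort((a, b) => b[field] - a[field]).slice(0, n);
}

const samplePosts = [
  { text: "Destination Moon", comments_count: 1, comments: [{ author: "Kyle" }] },
  { text: "Another post", comments_count: 3,
    comments: [{ author: "James" }, { author: "Kyle" }, { author: "Jim" }] },
  { text: "Draft", comments_count: 0, comments: [] },
];

console.log(findByPath(samplePosts, "comments.author", "Kyle").map(p => p.text).join(", "));
console.log(topN(samplePosts, "comments_count", 1)[0].text); // Another post
```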

When sorting, check if you need an index

Watch for full table scans


> db.posts.find({text: 'Destination Moon'}).explain()
{
  "cursor" : "BasicCursor",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : { }
}
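The `explain()` output distinguishes a `BasicCursor`, which examines every document, from a `BtreeCursor`, which walks an index. The following rough model counts examined entries for both; `fullScan`, `buildIndex`, `indexLookup` and the 1,000 filler documents are hypothetical, and a sorted array stands in for the real B-tree.

```javascript
// Hypothetical model of the two cursor types that explain() reports.
// Assumes every document has the field and that its values are strings.
function fullScan(docs, field, value) {
  // BasicCursor: every document is examined
  const result = docs.filter(d => d[field] === value);
  return { cursor: "BasicCursor", n: result.length, nscanned: docs.length };
}

function buildIndex(docs, field) {
  // A sorted array of [key, position] pairs stands in for the B-tree.
  return docs
    .map((d, i) => [d[field], i])
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

function indexLookup(docs, index, field, value) {
  // BtreeCursor: binary-search to the first key >= value, then read only
  // the entries equal to value (a simplification of the real index bounds).
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1; else hi = mid;
  }
  const result = [];
  for (let i = lo; i < index.length && index[i][0] === value; i++) {
    result.push(docs[index[i][1]]);
  }
  return { cursor: "BtreeCursor " + field + "_1", n: result.length, nscanned: result.length };
}

// Hypothetical data: 1,000 filler posts plus the one we look for.
const sampleDocs = Array.from({ length: 1000 }, (_, i) => ({ author: "user" + i, text: "post " + i }));
sampleDocs.push({ author: "Herg", text: "Destination Moon" });

console.log(JSON.stringify(fullScan(sampleDocs, "text", "Destination Moon")));
const authorIndex = buildIndex(sampleDocs, "author");
console.log(JSON.stringify(indexLookup(sampleDocs, authorIndex, "author", "Herg")));
```

The gap between `nscanned` and `n` is the signal to look for: with the full scan it grows with the collection, with the index it stays close to the number of matches.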

Group
- Equivalent to a GROUP BY in SQL
- Specify the attributes to group the data by
- Process the results in a reduce function

Group - Count posts by Author

> cmd = { key: { "author": true },
          initial: { count: 0 },
          reduce: function(obj, prev) { prev.count++; } };
> result = db.posts.group(cmd);
[
  { "author" : "Herg", "count" : 1 },
  { "author" : "Kyle", "count" : 3 }
]

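`group()` buckets documents by key, seeds each bucket from `initial` and folds each document in with `reduce`. The sketch below reproduces that flow in memory; the `group` function and the four sample posts are hypothetical, chosen so that the output matches the result shown above.

```javascript
// Hypothetical in-memory equivalent of db.posts.group({key, initial, reduce}):
// documents are bucketed by the key fields, each bucket starts as a copy of
// `initial`, and reduce(doc, bucket) is called once per document.
function group(docs, { key, initial, reduce }) {
  const fields = Object.keys(key).filter(f => key[f]);
  const buckets = new Map();
  for (const doc of docs) {
    const keyValues = {};
    for (const f of fields) keyValues[f] = doc[f];
    const id = JSON.stringify(keyValues);
    if (!buckets.has(id)) {
      // JSON round-trip copies `initial`, which is assumed to be plain data
      buckets.set(id, Object.assign({}, keyValues, JSON.parse(JSON.stringify(initial))));
    }
    reduce(doc, buckets.get(id));
  }
  return [...buckets.values()];
}

// Hypothetical posts chosen to reproduce the result shown above
const groupSample = [
  { author: "Herg", text: "Destination Moon" },
  { author: "Kyle", text: "a" },
  { author: "Kyle", text: "b" },
  { author: "Kyle", text: "c" },
];

const counts = group(groupSample, {
  key: { author: true },
  initial: { count: 0 },
  reduce: function (obj, prev) { prev.count++; },
});
console.log(JSON.stringify(counts)); // [{"author":"Herg","count":1},{"author":"Kyle","count":3}]
```

Keying buckets on a JSON string of the key values is a shortcut that is adequate for simple keys such as strings and numbers.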
Common Patterns

Inheritance

Single Table Inheritance - RDBMS


shapes table

id  type    area  radius  d  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10               5       2

Single Table Inheritance - MongoDB

> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, d: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1})
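The point of the single-collection layout is that a field only one subtype has acts as a filter on its own. A minimal in-memory sketch, with hypothetical helpers `radiusGreaterThan` and `byType` over the three shapes above:

```javascript
// Hypothetical sketch: all shapes live in one array standing in for the
// shapes collection. Only circles have a radius, so a range condition on
// radius selects circles without any explicit type check.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, d: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// like db.shapes.find({radius: {$gt: min}}): documents without a numeric
// radius never satisfy the comparison
function radiusGreaterThan(docs, min) {
  return docs.filter(s => typeof s.radius === "number" && s.radius > min);
}

// like db.shapes.find({type: t}): filtering on the explicit discriminator field
function byType(docs, t) {
  return docs.filter(s => s.type === t);
}

console.log(JSON.stringify(radiusGreaterThan(shapes, 0).map(s => s._id))); // ["1"]
console.log(byType(shapes, "rect")[0].length);                              // 5
```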

One to Many
One-to-many relationships can specify:
- degree of association between objects
- containment
- life-cycle

One to Many
- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - some queries are harder, e.g. find the latest comments across all documents

blogs: {
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ]
}
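In the shell, a projection such as `db.posts.find({}, {comments: {$slice: -5}})` returns each post with only its last five comments. The hypothetical `applySlice` below re-implements the documented forms on a plain array so their results can be compared side by side.

```javascript
// Hypothetical re-implementation of the $slice projection rules on a plain
// array, to show what each form returns:
//   n > 0          -> first n elements
//   n < 0          -> last |n| elements
//   [skip, limit]  -> skip elements (counted from the end if skip < 0),
//                     then return at most limit elements
function applySlice(arr, spec) {
  if (Array.isArray(spec)) {
    const [skip, limit] = spec;
    const start = skip < 0 ? Math.max(arr.length + skip, 0) : Math.min(skip, arr.length);
    return arr.slice(start, start + limit);
  }
  return spec >= 0 ? arr.slice(0, spec) : arr.slice(spec);
}

const comments = ["c1", "c2", "c3", "c4", "c5", "c6"];
console.log(JSON.stringify(applySlice(comments, -2)));     // ["c5","c6"]
console.log(JSON.stringify(applySlice(comments, [1, 3]))); // ["c2","c3","c4"]
```

Because `$slice` trims the array inside each document, it does not help with the harder query from the slide, the latest comments across all documents; that query still has to look at every post.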

One to Many
- Embedded tree
  - Single document
  - Natural
  - Hard to query

blogs: {
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book",
      replies : [ { author : "James", ... } ] }
  ]
}
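Each level of an embedded tree needs its own dotted path (`comments.author`, `comments.replies.author`, and so on), which is why the slide calls it hard to query. Walking the tree in application code is one workaround; `findInTree` and the sample thread below are hypothetical.

```javascript
// Hypothetical helper: walk an embedded comment tree of any depth and return
// every comment that satisfies `predicate`, together with its depth.
function findInTree(comments, predicate, depth = 0, out = []) {
  for (const c of comments) {
    if (predicate(c)) out.push({ depth, author: c.author, text: c.text });
    if (Array.isArray(c.replies)) findInTree(c.replies, predicate, depth + 1, out);
  }
  return out;
}

// Hypothetical thread, three levels deep
const blog = {
  author: "Herg",
  comments: [
    { author: "Kyle", text: "great book", replies: [
      { author: "James", text: "agreed", replies: [
        { author: "Kyle", text: "thanks", replies: [] },
      ] },
    ] },
  ],
};

console.log(JSON.stringify(findInTree(blog.comments, c => c.author === "Kyle")));
```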

One to Many
- Normalized (2 collections)
  - most flexible
  - more queries

blogs: {
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [ { comment : ObjectId("1") } ]
}

comments : {
  _id : ObjectId("1"),
  author : "James",
  date : ISODate("2011-09-19T09:56:06.298Z")
}
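With the normalized layout, showing a blog post with its comments takes a second query. This sketch models that step in memory; `resolveComments`, the sample data and the use of strings in place of ObjectIds are all assumptions for illustration.

```javascript
// Hypothetical sketch of the second query the normalized layout needs.
// String ids stand in for ObjectIds. The comments are looked up by id
// (on the server: db.comments.find({_id: {$in: ids}})) and then put back in
// the order the blog lists them, because $in does not return results in the
// order of the array. Ids with no matching comment are dropped.
function resolveComments(blog, commentsCollection) {
  const ids = blog.comments.map(ref => ref.comment);
  const byId = new Map(
    commentsCollection.filter(c => ids.includes(c._id)).map(c => [c._id, c])
  );
  return ids.filter(id => byId.has(id)).map(id => byId.get(id));
}

const commentsCollection = [
  { _id: "2", author: "Kyle" },
  { _id: "1", author: "James" },
  { _id: "3", author: "Jim" },
];
const normalizedBlog = { author: "Herg", comments: [{ comment: "1" }, { comment: "2" }, { comment: "9" }] };

console.log(resolveComments(normalizedBlog, commentsCollection).map(c => c.author).join(", "));
// prints James, Kyle
```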

One to Many - patterns

- Embedded Array / Array Keys
- Embedded tree
- Normalized

Many - Many
Example:
- Product can be in many categories
- Category can have many products

Many - Many
products:
{ _id: ObjectId("10"),
  name: "Destination Moon",
  category_ids: [ ObjectId("20"), ObjectId("30") ] }

categories:
{ _id: ObjectId("20"),
  name: "adventure",
  product_ids: [ ObjectId("10"), ObjectId("11"), ObjectId("12") ] }

// All categories for a given product
> db.categories.find({product_ids: ObjectId("10")})
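Storing ids on both sides means that linking a product to a category touches two documents. The hypothetical `linkProduct` below updates both arrays with `$addToSet`-style semantics (append only if absent); on the server this would be two separate updates, each atomic on its own but not as a pair, so one can succeed without the other. The sample data and string ids are assumptions.

```javascript
// Hypothetical sketch: keep both id arrays in sync when linking.
// On the server this corresponds to two updates, for example
//   db.products.update({_id: p}, {$addToSet: {category_ids: c}})
//   db.categories.update({_id: c}, {$addToSet: {product_ids: p}})
function linkProduct(products, categories, productId, categoryId) {
  const product = products.find(p => p._id === productId);
  const category = categories.find(c => c._id === categoryId);
  if (!product || !category) throw new Error("unknown product or category");
  // $addToSet semantics: append only if not already present
  if (!product.category_ids.includes(categoryId)) product.category_ids.push(categoryId);
  if (!category.product_ids.includes(productId)) category.product_ids.push(productId);
}

// like db.categories.find({product_ids: productId})
function categoriesOf(categories, productId) {
  return categories.filter(c => c.product_ids.includes(productId));
}

const mmProducts = [{ _id: "10", name: "Destination Moon", category_ids: ["20"] }];
const mmCategories = [
  { _id: "20", name: "adventure", product_ids: ["10"] },
  { _id: "30", name: "comic", product_ids: [] },
];

linkProduct(mmProducts, mmCategories, "10", "30");
linkProduct(mmProducts, mmCategories, "10", "30"); // second call changes nothing
console.log(JSON.stringify(categoriesOf(mmCategories, "10").map(c => c.name))); // ["adventure","comic"]
```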

Alternative
products:
{ _id: ObjectId("10"),
  name: "Destination Moon",
  category_ids: [ ObjectId("20"), ObjectId("30") ] }

categories:
{ _id: ObjectId("20"), name: "adventure" }

// All products for a given category
> db.products.find({category_ids: ObjectId("20")})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
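With ids stored only on the product, both directions are still answerable, at the cost of a second query for a product's categories. A minimal in-memory sketch, with hypothetical helper names, made-up sample data, and string ids in place of ObjectIds:

```javascript
// Hypothetical sketch of the one-sided layout: only products carry
// category_ids, and both directions are answered from that array.
function productsInCategory(products, categoryId) {
  // like db.products.find({category_ids: categoryId})
  return products.filter(p => p.category_ids.includes(categoryId));
}

function categoriesOfProduct(products, categories, productId) {
  // like product = db.products.findOne({_id: productId}) followed by
  // db.categories.find({_id: {$in: product.category_ids}})
  const product = products.find(p => p._id === productId);
  if (product === undefined) return [];
  return categories.filter(c => product.category_ids.includes(c._id));
}

const products = [
  { _id: "10", name: "Destination Moon", category_ids: ["20", "30"] },
  { _id: "11", name: "Another product", category_ids: ["20"] },
];
const categories = [
  { _id: "20", name: "adventure" },
  { _id: "30", name: "comic" },
];

console.log(productsInCategory(products, "20").map(p => p._id).join(","));             // 10,11
console.log(categoriesOfProduct(products, categories, "10").map(c => c.name).join(",")); // adventure,comic
```

Linking a product to a category now changes a single document, so one update is enough and it is atomic.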

Trees
Full Tree in Document

{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...", replies: [] }
      ] }
  ] }

Pros:
- Single document
- Performance
- Intuitive

Cons:
- Hard to search
- Partial results
- 16MB document size limit

Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / child)
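With parent links, direct children are one query, but collecting a whole subtree takes one query per level. The sketch below models that breadth-first walk in memory; `childrenOf`, `descendantsViaParents` and the small six-node tree are hypothetical.

```javascript
// Hypothetical sketch using parent links: each node stores only its parent.
// childrenOf corresponds to one query such as db.tree.find({parent: id});
// descendantsViaParents repeats it level by level (breadth-first).
function childrenOf(nodes, id) {
  return nodes.filter(n => n.parent === id).map(n => n._id);
}

function descendantsViaParents(nodes, id) {
  const result = [];
  let frontier = [id];
  while (frontier.length > 0) {
    const next = [];
    for (const p of frontier) next.push(...childrenOf(nodes, p)); // one query per node
    result.push(...next);
    frontier = next;
  }
  return result;
}

const nodes = [
  { _id: "a" },
  { _id: "b", parent: "a" }, { _id: "c", parent: "b" }, { _id: "d", parent: "b" },
  { _id: "e", parent: "a" }, { _id: "f", parent: "e" },
];

console.log(childrenOf(nodes, "b").join(","));            // c,d
console.log(descendantsViaParents(nodes, "a").join(",")); // b,e,c,d,f
```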

Array of Ancestors
- Store all ancestors of a node

{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }

// find all descendants of b:
> db.tree2.find({ancestors: "b"})

// find all direct descendants of b:
> db.tree2.find({parent: "b"})

// find all ancestors of f:
> ancestors = db.tree2.findOne({_id: "f"}).ancestors
> db.tree2.find({_id: {$in: ancestors}})
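These reads are single queries because every node carries its full ancestry. The cost moves to writes: a new node's `ancestors` is its parent's `ancestors` plus the parent's id. This sketch models the insert side; `insertNode`, `descendantsByAncestors` and the in-memory `tree2` array are hypothetical stand-ins for the collection.

```javascript
// Hypothetical sketch of the write side of the array-of-ancestors pattern.
// A new node's ancestors are its parent's ancestors plus the parent itself,
// so one read of the parent at insert time keeps the array complete.
function insertNode(nodes, id, parentId) {
  if (parentId === undefined) {
    const root = { _id: id };
    nodes.push(root);
    return root;
  }
  const parent = nodes.find(n => n._id === parentId);
  if (parent === undefined) throw new Error("unknown parent: " + parentId);
  const doc = { _id: id, ancestors: [...(parent.ancestors || []), parentId], parent: parentId };
  nodes.push(doc);
  return doc;
}

// like db.tree2.find({ancestors: id})
function descendantsByAncestors(nodes, id) {
  return nodes.filter(n => (n.ancestors || []).includes(id)).map(n => n._id);
}

const tree2 = [];
insertNode(tree2, "a");
insertNode(tree2, "b", "a");
insertNode(tree2, "c", "b");
insertNode(tree2, "d", "b");
insertNode(tree2, "e", "a");
insertNode(tree2, "f", "e");

console.log(JSON.stringify(tree2.find(n => n._id === "f"))); // {"_id":"f","ancestors":["a","e"],"parent":"e"}
console.log(descendantsByAncestors(tree2, "b").join(","));   // c,d
```

The flip side is that moving a subtree means rewriting the `ancestors` array of every node inside it.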

Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use regular expressions to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post", path: "/" },
    { author: "Jim", text: "Jim's comment", path: "/jim" },
    { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" }
  ] }

// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^\/jim/})
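Building paths and matching subtrees can be sketched with two small helpers. `childPath`, `subtreeRegex` and `escapeRegExp` are hypothetical; the regex is a slightly stricter variant of a plain prefix match, since it also requires a "/" or the end of the string after the prefix. MongoDB can use an index on the path field for a case-sensitive regular expression anchored with `^`.

```javascript
// Hypothetical helpers for the path pattern. A reply's path is its parent's
// path plus one segment, and a subtree is selected with an anchored prefix
// regex. Input is escaped so characters such as "." match literally, and the
// pattern requires "/" or the end after the prefix, so "/jim" does not also
// match "/jimmy".
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function childPath(parentPath, segment) {
  return parentPath.endsWith("/") ? parentPath + segment : parentPath + "/" + segment;
}

function subtreeRegex(path) {
  if (path === "/") return /^\//; // the root's subtree is every path
  return new RegExp("^" + escapeRegExp(path) + "(/|$)");
}

const thread = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "Jim's comment", path: childPath("/", "jim") },
  { author: "Kyle", text: "Kyle's reply to Jim", path: childPath("/jim", "kyle") },
  { author: "Jimmy", text: "unrelated", path: childPath("/", "jimmy") }, // hypothetical
];

const jimThread = thread.filter(c => subtreeRegex("/jim").test(c.path));
console.log(jimThread.map(c => c.path).join(" ")); // /jim /jim/kyle
```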

Queue
- Need to maintain order and state
- Ensure that updates to the queue are atomic

{ inprogress: false, priority: 1, ... }

// find highest priority job and mark as in-progress
> job = db.jobs.findAndModify({
      query:  {inprogress: false},
      sort:   {priority: -1},
      update: {$set: {inprogress: true, started: new Date()}},
      new:    true })
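A minimal in-memory model of the claim step, using a hypothetical `claimJob` function and made-up jobs. In single-threaded JavaScript the select and the update cannot interleave; on the server, `findAndModify` gives the equivalent guarantee by finding and updating one document as a single atomic operation, so two workers cannot claim the same job.

```javascript
// Hypothetical in-memory model of the findAndModify call: choose the
// highest-priority job that is not in progress, mark it as started, and
// return the updated document (as with new: true), or null if none is waiting.
// Ties go to the job that appears first in the array.
function claimJob(jobs, now = new Date()) {
  let best = null;
  for (const job of jobs) {
    if (!job.inprogress && (best === null || job.priority > best.priority)) best = job;
  }
  if (best === null) return null;
  best.inprogress = true;
  best.started = now;
  return best;
}

const jobs = [
  { name: "resize images", inprogress: false, priority: 1 },
  { name: "send emails", inprogress: false, priority: 5 },
  { name: "rebuild index", inprogress: true, priority: 9 },
];

console.log(claimJob(jobs).name); // send emails
console.log(claimJob(jobs).name); // resize images
console.log(claimJob(jobs));      // null
```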

Summary
- Schema design is different in MongoDB
- Basic data design principles stay the same
- Focus on how the application manipulates data
- Rapidly evolve the schema to meet your requirements
- Enjoy your new freedom, use it wisely :-)

download at mongodb.org
alvin@10gen.com

conferences, appearances, and meetups


http://www.10gen.com/events

http://bit.ly/mongo1

Facebook | Twitter | LinkedIn


@mongodb

http://linkd.in/joinmongo
