
Foursquare & MongoDB

Harry Heymann May 21, 2010

Foursquare Overview
A location-based social network.
Allows users to "check in" at bars, restaurants, shopping destinations, etc. to share their location with friends.
Rewards users with virtual prizes (points, badges) that can sometimes lead to real-life rewards (5 checkins at a restaurant might get you a free appetizer).
~1.3M registered users. ~615k checkins/day. Nearly 50M checkins total. Very rapid growth.

Basic technical details


Written in Scala (a somewhat new language; it's what you'd get if Java & ML had a baby, and it runs on the JVM).
Uses a web framework called Lift.
Originally used a single PostgreSQL instance as the data store.
Scaling up on a SQL database can be frustrating (replication & sharding don't work as easily as one would like), so we're moving to MongoDB.
Built-in geospatial capabilities (obviously very important to foursquare) are a very nice bonus.

Transition to Mongo
Currently writing checkins, tips, venues (and various things related to venues) to MongoDB. All writes still go to PostgreSQL as well.
Slowly migrating various reads.
Exclusively use Mongo for our "Who's here" server: a short-lived record of where our users are at any given time (contains 3 hours' worth of data).
Migrating geo-related queries first, other items later.
Checkins are a high priority (they represent the bulk of our data).
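The dual-write approach described above can be sketched roughly as follows. This is only an illustration of the pattern, not foursquare's code: `MemoryStore` and `DualWriteRepository` are hypothetical names, and the two stores are in-memory stand-ins for PostgreSQL and MongoDB.

```javascript
// Hypothetical in-memory stand-in for a data store (PostgreSQL or MongoDB).
class MemoryStore {
  constructor(name) {
    this.name = name;
    this.rows = new Map();
  }
  save(kind, id, doc) {
    this.rows.set(`${kind}:${id}`, { ...doc });
  }
  find(kind, id) {
    return this.rows.get(`${kind}:${id}`) || null;
  }
}

// Writes go to both stores; reads are switched over one record kind at a time.
class DualWriteRepository {
  constructor(primary, secondary, readFromSecondary = new Set()) {
    this.primary = primary; // the existing store
    this.secondary = secondary; // the store being migrated to
    this.readFromSecondary = readFromSecondary; // kinds whose reads have moved
  }
  save(kind, id, doc) {
    // Every write lands in both stores, so either one can serve reads.
    this.primary.save(kind, id, doc);
    this.secondary.save(kind, id, doc);
  }
  find(kind, id) {
    const store = this.readFromSecondary.has(kind) ? this.secondary : this.primary;
    return store.find(kind, id);
  }
}

const postgres = new MemoryStore("postgres");
const mongo = new MemoryStore("mongo");

// Only venue reads have been migrated in this example.
const repo = new DualWriteRepository(postgres, mongo, new Set(["venue"]));
repo.save("venue", 1, { venuename: "Gramercy Tavern" });
repo.save("checkin", 7, { venueId: 1 });
```

Because every write reaches both stores, a read path can be moved (or moved back) by changing only the set of migrated kinds.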

Geospatial Indexes
MongoDB conveniently supports geospatial indexes out of the box.
Currently limited to Earth-like dimensions (+/- 180 degrees in each of 2 axes).
It cheats to make the math easier/faster by assuming a flat earth where 1 degree of latitude or longitude is always the same distance. This is fine as long as you are dealing with relatively small distances (as foursquare does).
Implemented using geographic hash codes (geohashes) atop standard MongoDB b-trees.
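To show why a geographic hash fits on top of an ordinary b-tree, here is a minimal sketch of the general idea: quantize each coordinate, then interleave the bits so that a 2-D point becomes a single sortable key. It is not MongoDB's actual implementation; the function names and the 16-bit precision are assumptions made for illustration.

```javascript
// Scale a coordinate in [-180, 180) to an integer in [0, 2^bits).
function quantize(value, bits) {
  const cells = 2 ** bits;
  const scaled = Math.floor(((value + 180) / 360) * cells);
  return Math.min(cells - 1, Math.max(0, scaled));
}

// Interleave the bits of x and y, most significant first, into a string key.
// A string of "0"/"1" characters keeps the sketch simple and sorts correctly.
function geoHash(x, y, bits = 16) {
  const qx = quantize(x, bits);
  const qy = quantize(y, bits);
  let key = "";
  for (let i = bits - 1; i >= 0; i--) {
    key += (qx >> i) & 1;
    key += (qy >> i) & 1;
  }
  return key;
}

// Number of leading characters two keys have in common.
function commonPrefixLength(a, b) {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

const manhattanA = geoHash(40.72, -73.99);
const manhattanB = geoHash(40.73, -73.98);
const sydney = geoHash(-33.87, 151.21);
```

Points that are close together usually share a long key prefix, so they sit near each other in a sorted index. The exception is a pair of points that straddle a cell boundary, which is why a real implementation also has to examine neighboring cells.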

Creating Geospatial Indexes


Indicate the "2d" index type:

    db.venues.ensureIndex({latlng: "2d"})

Specify additional fields if you plan on using compound geospatial queries (more on these in a moment):

    db.venues.ensureIndex({latlng: "2d", closed: 1, keywordList: 1})

Take care: 1k limit on key size


If you have a compound geospatial index and you don't take care, it can be easy to go over the MongoDB key size limit of 1K. For example, if the index is on {latlng: "2d", keywordList: 1}, then the following venue would be a problem:

    {latlng: [40, -72],
     keywordList: ["some", "venue", "with", "a", "whole", "lot", "of",
                   "different", "words", "in", "the", "name", "of", "the",
                   "venue", "it", "just", "keeps", "going", "on", "and", "on",
                   "forever", "without", "seeming", "to", "ever", "stop"]}

In these cases the individual item will be dropped from the index, making it impossible to find through that index.
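One way to stay under the limit is to trim the keyword list before saving. The sketch below is a rough heuristic, not MongoDB's own size accounting: the 800-byte budget, the per-element overhead, and the function names are all assumptions chosen to leave headroom for the rest of the index key. It relies on Node's `Buffer` to count UTF-8 bytes.

```javascript
// Assumed budget, kept well under 1K because the real index key also holds
// the geo hash and other fields that this estimate ignores.
const KEY_BUDGET_BYTES = 800;

// Rough estimate: UTF-8 bytes of each keyword plus a guessed per-element
// overhead. This is a heuristic, not how MongoDB measures key size.
function estimateKeywordBytes(keywords, perElementOverhead = 8) {
  return keywords.reduce(
    (total, word) => total + Buffer.byteLength(word, "utf8") + perElementOverhead,
    0
  );
}

// Keep keywords in their original order until the budget would be exceeded.
function trimKeywords(keywords, budget = KEY_BUDGET_BYTES) {
  const kept = [];
  let used = 0;
  for (const word of keywords) {
    const cost = estimateKeywordBytes([word]);
    if (used + cost > budget) break;
    kept.push(word);
    used += cost;
  }
  return kept;
}

const shortList = trimKeywords(["nyc", "seminar"]);
const longList = trimKeywords(Array.from({ length: 200 }, () => "keyword"));
```

Trimming from the end assumes the most important keywords come first; a different ordering policy may suit other data better.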

Basic Geospatial Queries


Find the closest 20 venues to a given location:

    db.venues.find({latlng: {$near: [40.72, -73.99]}}).limit(20)

Find up to the closest 20 venues to a given location that are within 1 degree of the location:

    db.venues.find({latlng: {$near: [40.72, -73.99, 1]}}).limit(20)

Foursquare uses this to find nearby venues, tips, specials, and various other geolocated data.
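To make the semantics of these queries concrete, here is an in-memory sketch of what a $near query with an optional maximum distance and a limit returns, using the same flat-plane distance in degrees. It illustrates the behavior only and says nothing about how the server evaluates the query; `flatDistance`, `near`, and the sample venues are made up for the example. For scale, 1 degree of latitude is roughly 111 km.

```javascript
// Flat-plane distance in degrees, mirroring the "flat earth" simplification.
function flatDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x1 - x2, y1 - y2);
}

// Return up to `limit` documents ordered by distance from `center`,
// dropping anything farther away than `maxDistance` degrees.
function near(docs, field, center, { maxDistance = Infinity, limit = 20 } = {}) {
  return docs
    .map((doc) => ({ doc, distance: flatDistance(doc[field], center) }))
    .filter((entry) => entry.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance)
    .slice(0, limit)
    .map((entry) => entry.doc);
}

// Made-up sample data: two points in New York City and one in Boston.
const sampleVenues = [
  { venuename: "Midtown spot", latlng: [40.75, -73.98] },
  { venuename: "Boston spot", latlng: [42.36, -71.06] },
  { venuename: "Downtown spot", latlng: [40.72, -73.99] },
];

const nearby = near(sampleVenues, "latlng", [40.72, -73.99], { maxDistance: 1 });
const closestOne = near(sampleVenues, "latlng", [40.72, -73.99], { limit: 1 });
```

With a maximum distance of 1 degree the Boston venue is excluded, and the remaining two come back nearest first.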

Complex Geospatial Queries


If you have a compound geospatial index defined, you can query on additional fields and still use the index:

    db.venues.find({latlng: {$near: [40.72, -73.99]}, closed: false})

(because we generally don't want closed venues)

Basic search:

    db.venues.find({latlng: {$near: [40.72, -73.99]}, closed: false, keywordList: {$all: ["nyc", "seminar"]}})
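A small helper can assemble the query document for this kind of search. This is a sketch under stated assumptions: `buildVenueSearch` is a hypothetical name, and lowercasing the search terms assumes that keywordList is stored in lowercase, which the slides do not say.

```javascript
// Build the filter for "open venues near a point, matching all keywords".
function buildVenueSearch(lat, lng, keywords = []) {
  const filter = {
    latlng: { $near: [lat, lng] },
    closed: false, // skip closed venues
  };
  if (keywords.length > 0) {
    // Assumption: keywordList holds lowercase terms, so normalize the input.
    filter.keywordList = { $all: keywords.map((k) => k.toLowerCase()) };
  }
  return filter;
}

const searchFilter = buildVenueSearch(40.72, -73.99, ["NYC", "Seminar"]);
const nearbyOnly = buildVenueSearch(40.72, -73.99);
```

The resulting object can be passed straight to `db.venues.find(...)`, followed by `.limit(20)` as in the earlier examples.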

Bounded geospatial queries


Foursquare doesn't do much of this, but it's possible to find all of the items in a collection that are within a given box or circle:

    db.venues.find({latlng: {"$within": {"$box": [[40, -73], [41, -72]]}}})
    db.venues.find({latlng: {"$within": {"$center": [[40, -72], 0.5]}}})

The box is given as its bottom-left and upper-right corners; the circle as a center point and a radius in degrees.
Can be combined with the complex geospatial queries demonstrated on the previous slide.
In general, though, $near will be more useful/performant than $within.

MongoDB & Scala/Lift


Lift has a generic ORM layer called Record, for which there is a MongoDB implementation.
Originally called scamongo, but as of Lift 2.0 M5 it's integrated into the core Lift codebase as lift-mongodb.
It's a very thin wrapper around the Java driver provided by 10gen.
It's new, so it has some warts/oddities, but should improve fairly rapidly (foursquare is working on this).

A foursquare venue using lift-mongo


    class Venue extends MongoRecord[Venue] with MongoId[Venue] with GeolocationMongo[Venue] {
      object venuename extends StringField(this, 255)
      object address extends OptionalStringField(this, 50)
      object closed extends BooleanField(this)
      // etc
    }

Basic ORM operations work as you might expect


    // Create and save a venue
    val venue = Venue.createRecord
      .venuename("NYC Seminar & Conference Center")
      .address("71 W 23rd St")
      .city("New York").state("NY").zip("10010")
    venue.save

    // Build a query with the Java driver's QueryBuilder and run it
    val query = QueryBuilder.start(Venue.venuename.name)
      .is("Gramercy Tavern")
      .get()
    val venues = Venue.findAll(query)

PS: We're hiring


All sorts of roles: engineering, operations, business. See http://foursquare.com/jobs or come talk to me.
