
Machine Learning with Large Networks of People and Places

Blake Shaw, PhD Data Scientist @ Foursquare @metablake

What is foursquare?
An app that helps you explore your city and connect with friends
A platform for location-based services and data

What is foursquare?
People use foursquare to:
share with friends
discover new places
get tips
get deals
earn points and badges
keep track of visits

What is foursquare?
Mobile
Social
Local

Stats

15,000,000+ people 30,000,000+ places 1,500,000,000+ check-ins 1500+ actions/second

Video: http://vimeo.com/29323612

Overview
Intro to Foursquare Data
The Place Graph
The Social Graph
Explore
Conclusions

The Place Graph
30m places interconnected w/ different signals:
flow
co-visitation
categories
menus
tips and shouts

NY Flow Network

People connect places over time


Places people go after the Museum of Modern Art (MOMA):
MOMA Design Store, Metropolitan Museum of Art, Rockefeller Center, The Modern, Abby Aldrich Rockefeller Sculpture Garden, Whitney Museum of American Art, FAO Schwarz

Places people go after the Statue of Liberty:

Ellis Island Immigration Museum, Battery Park, Liberty Island, National September 11 Memorial, New York Stock Exchange, Empire State Building

Predicting where people will go next
Cultural places (landmarks etc.)
Bus stops, subways, train stations
Airports
College places
Nightlife
After bars: American restaurants, nightclubs, pubs, lounges, cafes, hotels, pizza places
After coffee shops: offices, cafes, grocery stores, dept. stores, malls

Collaborative filtering
How do we connect people to new places they'll like?
People ↔ Places

Collaborative filtering
[Koren, Bell 08]

Item-Item similarity
Find items which are similar to items that a user has already liked
User-User similarity
Find items from users similar to the current user
Low-rank matrix factorization
First find latent low-dimensional coordinates of users and items, then find the nearest items in this space to a user
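To make the low-rank route concrete, here is a minimal sketch using scipy's truncated SVD (the users x places matrix layout is an assumption for illustration, not Foursquare's production pipeline):

```python
import numpy as np
from scipy.sparse.linalg import svds

def low_rank_coords(X, d=10):
    """Factor a (users x places) check-in matrix X into latent
    d-dimensional coordinates for users and places."""
    U, s, Vt = svds(X, k=d)           # truncated SVD of the sparse matrix
    user_coords = U * np.sqrt(s)      # rows: latent user coordinates
    place_coords = Vt.T * np.sqrt(s)  # rows: latent place coordinates
    return user_coords, place_coords
```

Recommendation then reduces to a nearest-neighbor search: rank places by their distance to the user's latent coordinates.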

Collaborative filtering
Item-Item similarity
Pro: can easily update w/ new data for a user
Pro: explainable, e.g. people who like Joe's Pizza also like Lombardi's
Con: not as performant as richer global models
User-User similarity
Pro: can leverage social signals here as well... "similar" can mean people you are friends with, whom you've colocated with, whom you follow, etc.
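A minimal sketch of the item-item variant in Python (`K` is assumed to be a precomputed venue-venue similarity matrix like the one described below; this is an illustration, not Foursquare's code):

```python
import numpy as np

def recommend(visited, K, top_n=5):
    """Item-item recommendation: score every venue by its total
    similarity to the venues this user has already visited."""
    visited = list(visited)
    scores = K[visited].sum(axis=0)  # aggregate similarity to liked items
    scores[visited] = -np.inf        # never re-recommend visited venues
    return np.argsort(scores)[::-1][:top_n]
```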

Finding similar items
Large sparse k-nearest neighbor problem
Items can be places, people, brands
Different distance metrics
Need to exploit sparsity, otherwise intractable

Finding similar items
Metrics we find work best for recommending:

Places: cosine similarity
\mathrm{sim}(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\| \|x_j\|}

Friends: intersection
\mathrm{sim}(A, B) = |A \cap B|

Brands: Jaccard similarity
\mathrm{sim}(A, B) = \frac{|A \cap B|}{|A \cup B|}
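In code, the three metrics above might look like this (a sketch; the vector and set representations are assumptions):

```python
import numpy as np

def place_sim(xi, xj):
    """Cosine similarity between two venues' check-in count vectors."""
    return (xi @ xj) / (np.linalg.norm(xi) * np.linalg.norm(xj))

def friend_sim(a, b):
    """Intersection: raw overlap between two users' sets of contacts."""
    return len(a & b)

def brand_sim(a, b):
    """Jaccard similarity between two brands' follower sets."""
    return len(a & b) / len(a | b)
```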

Computing venue similarity
X \in \mathbb{R}^{n \times d}, where each entry is the log(# of check-ins at place i by user j); one row for each of the 30m venues...

K_{ij} = \mathrm{sim}(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\| \|x_j\|}, \quad K \in \mathbb{R}^{n \times n}

Computing venue similarity
Naive solution for computing K: O(n^2 d)
Requires ~4.5m machines to compute in < 24 hours!!! and 3.6PB to store! (with n = 30m venues, K has n^2 = 9 \times 10^{14} entries; at 4 bytes each that is ~3.6PB)

K_{ij} = \mathrm{sim}(x_i, x_j) = \frac{x_i \cdot x_j}{\|x_i\| \|x_j\|}, \quad K \in \mathbb{R}^{n \times n}

Venue similarity w/ map reduce
map: for each user, emit all pairs of visited venues (v_i, v_j) along with that user's score contribution
reduce: for each pair (v_i, v_j), sum up each user's score contribution to produce the final score
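The same pattern on a single machine, in plain Python rather than Hadoop (venue names and log-counts below are made up for illustration):

```python
from collections import defaultdict
from itertools import combinations

def map_user(venue_counts):
    """Map: from one user's {venue: log check-in count}, emit a partial
    dot-product contribution for every pair of venues they visited."""
    for (vi, ci), (vj, cj) in combinations(sorted(venue_counts.items()), 2):
        yield (vi, vj), ci * cj

def reduce_pairs(emitted):
    """Reduce: sum all users' contributions per venue pair; dividing by
    the two venues' norms afterwards gives cosine similarity."""
    totals = defaultdict(float)
    for pair, contribution in emitted:
        totals[pair] += contribution
    return totals

# toy usage: two users' log check-in counts
users = [{"moma": 1.6, "rock center": 1.0}, {"moma": 0.7, "rock center": 1.4}]
scores = reduce_pairs(kv for u in users for kv in map_user(u))
```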

The Social Graph
15m person social network w/ lots of different interaction types:
friends
follows
dones
comments
colocation

What happens when a new coffee shop opens in the East Village?

A new coffee shop opens...

The Social Graph

The Social Graph


How can we better visualize this network?

A \in \mathbb{B}^{n \times n} \rightarrow L \in \mathbb{R}^{n \times d}

Graph embedding
Spring embedding - simulate a physical system by iterating Hooke's law
Spectral embedding - decompose adjacency matrix A with an SVD and use the eigenvectors with the largest eigenvalues as coordinates
Laplacian eigenmaps [Belkin, Niyogi 02] - form the graph Laplacian from the adjacency matrix, L = D - A, apply an SVD to L and use the eigenvectors with the smallest non-zero eigenvalues as coordinates

Preserving structure
A connectivity algorithm G(K), such as k-nearest neighbors, should be able to recover the edges from the coordinates, such that G(K) = A:
Edges → Embedding → Points → Connectivity G(K) → Edges
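Checking structure preservation is then mechanical: run k-nearest neighbors on the embedded points and compare the recovered edges to A. A small sketch:

```python
import numpy as np

def knn_graph(points, k):
    """Recover an adjacency matrix from embedded points by connecting
    each point to its k nearest neighbors (then symmetrizing)."""
    D = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    G = np.zeros_like(D, dtype=int)
    for i, nbrs in enumerate(np.argsort(D, axis=1)[:, :k]):
        G[i, nbrs] = 1
    return np.maximum(G, G.T)    # structure preserved iff this equals A
```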

Structure Preserving Embedding [Shaw, Jebara 09]
SDP to learn an embedding K from A
Linear constraints on K preserve the global topology of the input graph
Convex objective favors low-rank K close to the spectral solution, ensuring a low-dimensional embedding
Use eigenvectors of K with largest eigenvalues as coordinates for each node

Structure Preserving Embedding [Shaw, Jebara 09]

\max_{K \in \mathcal{K}} \mathrm{tr}(KA)
\text{s.t. } D_{ij} > (1 - A_{ij}) \max_m (A_{im} D_{im}) \quad \forall i,j

where \mathcal{K} = \{K \succeq 0, \; \mathrm{tr}(K) \le 1, \; \sum_{ij} K_{ij} = 0\}
and D_{ij} = K_{ii} + K_{jj} - 2K_{ij}
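A toy-scale sketch of this SDP using cvxpy (an off-the-shelf convex solver assumed here for illustration; the full SDP is only tractable for small graphs):

```python
import cvxpy as cp
import numpy as np

def spe_sdp(A, eps=1e-4, d=2):
    """Solve the SPE SDP for a small symmetric 0/1 adjacency matrix A,
    then read off d-dim coordinates from K's top eigenvectors."""
    n = A.shape[0]
    K = cp.Variable((n, n), PSD=True)
    one = np.ones((n, 1))
    diag = cp.reshape(cp.diag(K), (n, 1))
    D = diag @ one.T + one @ diag.T - 2 * K  # D_ij = K_ii + K_jj - 2K_ij
    cons = [cp.trace(K) <= 1, cp.sum(K) == 0]
    for i in range(n):                       # every non-neighbor of i must
        for j in np.flatnonzero(A[i] == 0):  # be farther than any neighbor
            if j == i:
                continue
            for m in np.flatnonzero(A[i]):
                cons.append(D[i, j] >= D[i, m] + eps)
    cp.Problem(cp.Maximize(cp.trace(K @ A)), cons).solve(solver=cp.SCS)
    w, V = np.linalg.eigh(K.value)           # ascending eigenvalues
    return V[:, ::-1][:, :d] * np.sqrt(np.maximum(w[::-1][:d], 0))
```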

A \in \mathbb{B}^{n \times n} \xrightarrow{\text{SDP}} K \in \mathbb{R}^{n \times n} \xrightarrow{\text{SVD}} L \in \mathbb{R}^{n \times d}

Large-scale SPE [Shaw, Jebara 11]

The full SDP doesn't scale, so instead learn coordinates L \in \mathbb{R}^{d \times n} (initialized randomly) directly, by maximizing

f(L) = \mathrm{tr}(L^\top L A) - \lambda \sum_l \max(\mathrm{tr}(C_l L^\top L), 0)

with projected stochastic subgradient ascent. Each constraint matrix C_l encodes a randomly chosen triplet (i, j, k) with A_{ij} = 1 and A_{ik} = 0, such that \mathrm{tr}(C_l K) = D_{ij} - D_{ik} = K_{jj} - K_{kk} + 2K_{ik} - 2K_{ij}. If \mathrm{tr}(C_l L^\top L) > 0 the constraint is violated, and we step along the subgradient

\nabla f(L) = 2L(A - \lambda C_l), \quad L_{t+1} = L_t + \eta \, \nabla f(L_t), \quad \eta = 1/\sqrt{t}

After each step, project: enforce \sum_{ij} (L^\top L)_{ij} = 0 by subtracting the mean from L, and \mathrm{tr}(L^\top L) \le 1 by dividing each entry of L by its Frobenius norm. The same recipe handles different connectivity algorithms G(K): k-nearest neighbors, b-matching, or maximum weight spanning trees.
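A runnable sketch of this procedure (dense numpy for readability; the sampling, step size, and projections follow the update rules above, but this is an illustration, not the authors' code):

```python
import numpy as np

def spe_sgd(A, d=2, lam=1.0, n_iters=20000, seed=0):
    """Projected stochastic subgradient ascent for large-scale SPE."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    L = 0.01 * rng.standard_normal((d, n))     # L0 <- rand(d, n)
    for t in range(1, n_iters + 1):
        eta = 1.0 / np.sqrt(t)                 # decaying step size
        grad = 2 * L @ A                       # gradient of tr(L^T L A)
        i = rng.integers(n)                    # sample a triplet (i, j, k):
        nbrs = np.flatnonzero(A[i])            # j a neighbor of i,
        nons = np.flatnonzero(A[i] == 0)       # k a non-neighbor
        nons = nons[nons != i]
        if len(nbrs) and len(nons):
            j, k = rng.choice(nbrs), rng.choice(nons)
            d_ij = np.sum((L[:, i] - L[:, j]) ** 2)
            d_ik = np.sum((L[:, i] - L[:, k]) ** 2)
            if d_ij > d_ik:                    # tr(C_l L^T L) > 0: violated
                C = np.zeros((n, n))
                C[j, j], C[k, k] = 1.0, -1.0
                C[i, k] = C[k, i] = 1.0
                C[i, j] = C[j, i] = -1.0
                grad -= 2 * lam * (L @ C)      # hinge subgradient
        L = L + eta * grad
        L = L - L.mean(axis=1, keepdims=True)  # project: center L
        fro = np.linalg.norm(L)
        if fro > 1.0:
            L = L / fro                        # project: tr(L^T L) <= 1
    return L.T                                 # n rows of d-dim coordinates
```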

Video: http://vimeo.com/39656540 (notes on next slide)

Notes for previous slide: Each node in this network is a person; each edge represents friendship on foursquare. The size of each node is proportional to how many friends that person has. We can see the existence of dense clusters of users on the right, the top, and the left. There is a large component in the middle. There are clear hubs. We can now use this low-dimensional representation of this high-dimensional network to better track what happens when a new coffee shop opens in the East Village. As expected, it spreads like a virus across this social substrate. We see that as each person checks in to La Colombe, their friends light up. People who have discovered the place are shown in blue. The current check-in is highlighted in orange. It's amazing to see how La Colombe spreads. Many people have been talking about how ideas, tweets, and memes spread across the internet. For the first time we can track how new places opening in the real world spread in a similar way.

The Social Graph
What does this low-dimensional structure mean?
Homophily: location, demographics, etc.

The Social Graph

Influence on foursquare
Tip network: sample of 2.5m people doing tips from other people and brands
avg. path length 5.15, diameter 22.3

How can we find the authoritative people in this network?

Measuring influence w/ PageRank [Page et al 99]

Iterative approach: start with random values and iterate; works great w/ map-reduce

PR(i) = (1 - d) + d \sum_{j \in \{A_{ij} = 1\}} \frac{PR(j)}{\sum_k A_{jk}}
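A compact power-iteration sketch of this update (dense numpy; at foursquare's scale each iteration would be a map-reduce pass, as the slide notes):

```python
import numpy as np

def pagerank(A, d=0.85, n_iters=50):
    """Iterate the PageRank update above: each node j spreads
    PR(j) / outdegree(j) to the nodes it links to."""
    pr = np.ones(A.shape[0])                # start from uniform values
    out_deg = A.sum(axis=1)
    for _ in range(n_iters):
        contrib = np.where(out_deg > 0, pr / np.maximum(out_deg, 1), 0.0)
        pr = (1 - d) + d * (A.T @ contrib)  # gather neighbors' shares
    return pr
```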

Measuring influence w/ PageRank [Page et al 99]

Equivalent to finding the principal eigenvector of the normalized adjacency matrix, A \in \mathbb{B}^{n \times n}:

PR(i) \propto v_i \quad \text{where} \quad P^\top v = \lambda_1 v, \quad P_{ij} = \frac{A_{ij}}{\sum_j A_{ij}}
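A quick numerical check of this equivalence on a toy graph (the 3-node adjacency matrix is made up; with d close to 1, the iterative PageRank values align with this eigenvector):

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)      # row-normalize: P_ij = A_ij / sum_j A_ij
w, V = np.linalg.eig(P.T)                 # left eigenvectors of P
v = np.abs(V[:, np.argmax(w.real)].real)  # principal eigenvector (lambda_1 = 1)
print(v / v.sum())                        # PR(i) proportional to v_i
```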

Influence on foursquare
Most influential brands:
History Channel, Bravo TV, National Post, Eater.com, MTV, Ask Men, WSJ, Zagat, NY Magazine, visitPA, Thrillist, Louis Vuitton

Most influential users:
Lockhart S, Jeremy B, Naveen S

Explore
A social recommendation engine built from check-in data

Foursquare Explore
Realtime recommendations from signals:
location
time of day
check-in history
friends' preferences
venue similarities

Putting it all together
Nearby relevant venues
Friends' check-in history, similarity
Similar venues
User's check-in history
MOAR signals
< 200 ms

Our data stack
MongoDB
Amazon S3, Elastic MapReduce
Hadoop
Hive
Flume
R and Matlab

Open questions
What are the underlying properties and dynamics of these networks?
How can we predict new connections?
How do we measure influence?
Can we infer real-world social networks?

Conclusion
Unique networks formed by people interacting with each other and with places in the real world
Massive scale -- today we are working with millions of people and places here at foursquare, but there are over a billion devices in the world constantly emitting this signal of userid, lat, long, timestamp

Join us!
foursquare is hiring! 110+ people and growing
foursquare.com/jobs
Blake Shaw
@metablake
blake@foursquare.com
