Talk BedCon Berlin 2012

How to build a recommender system based on Mahout and Java EE
Berlin Expert Days

29. 30. March 2012 Manuel Blechschmidt CTO Apaxo GmbH
All the web content will be personalized in three to five years. Sheryl Sandberg COO Facebook 09.2010
Agenda
- What is personalization? - What algorithms can be used? - Architecture of a recommender system - How to bundle Mahout into an Java EE application - Interface with the recommender - Conclusion
What is personalization?
Personalization involves using technology to accommodate the differences between individuals. Once confined mainly to the Web, it is increasingly becoming a factor in education, health care (i.e. personalized medicine), television, and in both "business to business" and "business to consumer" settings.
Source: https://en.wikipedia.org/wiki/Personalization
Amazon.com
TripAdvisor.com
eBay
criteo.com - Retargeting
Zalando
Plista
YouTube
Naturideen.de (coming soon)
Recommender
This talk will concentrate on recommender technology based on collaborative filtering (cf) to personalize a web site - a lot of research is going on - cf has shown great success in movie and music industry - recommenders can collect data silently and use it without manual maintenance
What is a recommender?
Let U be a set of users of the recommendation system and I be the set of items from which the users can choose. A recommender r is a function which produces for a user ui a set of recommended items Rk with k entries and a binary, transitive, antisymmetric and total relation prefers_overui which can be used for sorting the recommendations for the user. The recommender r is often called a top-k recommender.
WARNING!
The next slides contain math which takes normally 3 years to understand so don't be disappointed if you don't get it.
What should wolf and sheep eat?

Ms. Baumgartner your sun is ignoring again the food chain. SON!
Source: http://static.nichtlustig.de/comics/full/081009.jpg
Demo Data
Carrots Rabbit Cow Dog Pig Chicken Pinguin Bear Lion Tiger Antilope Wolf Sheep 10 7 ? 5 7 2 2 ? ? 6 1 ? Grass 7 10 1 6 6 2 ? ? ? 10 ? 8 Pork 1 ? 10 4 2 ? 8 9 8 1 ? ? Beef 2 ? 10 ? ? 2 8 10 ? 1 8 ? Corn ? ? ? 7 10 2 2 2 ? ? ? ? Fish 1 ? ? 3 ? 10 7 ? 5 ? 3 2
Characteristics of Demo Data

Ratings from 1 10 Users: 12 Items: 6 Ratings: 43 (unusual normally 100,000 100,000,000) Matrix filled: ~60% (unusual normally sparse around 0.5-2%) Average Number of Ratings per User: ~3.58 Average Number of Ratings per Item: ~7.17 Average Rating: ~5.579
https://github.com/ManuelB/facebook-recommender-demo/tree/master/docs/BedConExamples.R
YouTube Rating Distribution
http://techcrunch.com/2009/09/22/youtube-comes-to-a-5-star-realization-its-ratings-are-useless/
Model and Memory Approaches

- Item- or User- Based Collaborative Filtering - Idea suggest what similair users did - Matrix Factorization e.g Singular Value Decomposition - Try to create a profile based of factors for users and items Main difference: A model base approach tries to extract the underlying logic from the data.
User Based Approach

- Find similar or opposite animals like wolf - Checkout what these other animals like - Recommend this to wolf
Find animals which voted for beef, fish and carrots too
Carrots Wolf Pinguin Bear Rabbit Cow Dog Pig Chicken Lion Tiger Antilope Sheep 1 2 2 10 7 ? 5 7 ? ? 6 ? Grass ? 2 ? 7 10 1 6 6 ? ? 10 8 Pork ? ? 8 1 ? 10 4 2 9 8 1 ? Beef 8 2 8 2 ? 10 ? ? 10 ? 1 ? Corn ? 2 2 ? ? ? 7 10 2 ? ? ? Fish 3 10 7 1 ? ? 3 ? ? 5 ? ?
Pearson Correlation
- 1 = very similar - (-1) = complete opposite votings - similarity between wolf and pinguin: -0.2401922 - cor(c(8,3,1),c(2,10,2)) - similarity between wolf and bear: 0.8196562 - cor(c(8,3,1),c(8,7,2)) - similarity between wolf and rabbit: -0.6465846 - cor(c(8,3,1),c(2,1,10))
Weighted sum of the ratings

Pork (10): (0.8196 * 8 + (-0.6466) * 1) / (0.8196 + (-0.6465) = 34 ~ 10 Grass (5.65): (2*(-0.2401)+7*(-0.6465)) / ((-0.2401) + (-0.6465)) = 5,65 Corn (2,0): 2*(-0.2401)+2*(0.8196)) / (-0.2401+0.8196) = 2,0
Predicted ratings
- Wolf should eat: Pork Rating: 10.0 - Wolf should eat: Grass Rating: 5.645701 - Wolf should eat: Corn Rating: 2.0
SVD
http://public.lanl.gov/mewall/kluwer2002.html
Factorized Matrixes
Predicted Matrix (k = 2)
Predicted Ratings
- Sheep should eat: Carrots Rating: 6.01 - Sheep should eat: Corn Rating: 5.83
What other algorithms can be used?

Similarity Measures for Item or User based: - LogLikelihood Similarity - Cosine Similarity - Pearson Similarity - etc. Estimating algorithms for SVD: - ALSWRFactorizer - ExpectationMaximizationSVDFactorizer
Let's built it
Mahout - Taste
Taste is a flexible, fast collaborative filtering engine for Java. It was primarly written by Sean Owen and is part of the Apache Mahout library. Another important author is: Sebastian Schelter It offers all the algorithms introduced here and is Open Source
http://mahout.apache.org/taste.html
Architecture of the recommender
Packaging
Eclipse Project
Maven pom.xml
Demo and live Debugging
Conclusion
Recommendation is a lot of math You shouldn't implement the algorithms again There are a lot of unsanswered questions - Scalibility, Performance, Usability You can gain a lot from good personalization
More sources
http://www.apaxo.de/ http://mahout.apache.org/ http://research.yahoo.com/ http://www.grouplens.org/ http://recsys.acm.org/ https://github.com/ManuelB/facebook-recommender-demo/ http://docs.oracle.com/javaee/6/tutorial/doc/

Talk BedCon Berlin 2012

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Talk BedCon Berlin 2012

Hochgeladen von

Copyright:

Verfügbare Formate

How to build a recommender system based on Mahout and Java EE

Berlin Expert Days

Naturideen.de (coming soon)

What should wolf and sheep eat?

Characteristics of Demo Data

YouTube Rating Distribution

Model and Memory Approaches

User Based Approach

Weighted sum of the ratings

What other algorithms can be used?

Architecture of the recommender

Demo and live Debugging

Das könnte Ihnen auch gefallen