
All Together Now: A Perspective on the Netflix Prize
Robert M. Bell, Yehuda Koren, and Chris Volinsky

When the Netflix Prize was announced in October of 2006, we initially approached it as a fun diversion from our ‘day jobs’ at AT&T. Our group had worked for many years on building profiles of customer patterns for fraud detection, and we were comfortable with large data sets, so this seemed right up our alley. Plus, it was about movies, and who doesn’t love movies? We thought it would be a fun project for a few weeks.

Boy, were we wrong (not about the fun part, though). Almost three years later, we were part of a multinational team named as the winner of the $1 million prize for having the greatest improvement in root mean squared error (RMSE) over Netflix’s internal algorithm, Cinematch.

The predominant discipline of participants in the Netflix Prize appears to have been computer science, more specifically machine learning. While something of a stereotype, machine learning methods tend to center on algorithms (black boxes), where the focus is on the quality of predictions rather than ‘understanding’ what drives particular predictions.

In contrast, statisticians tend to think more in terms of models with parameters that carry inherent interest for explaining the world. Leo Breiman’s article, “Statistical Modeling: The Two Cultures,” which was published in Statistical Science, provides various views on this contrast. Our original team consisted of two statisticians and a computer scientist, and the diversity of expertise and perspective across these two disciplines was an important factor in our success.

Fundamental Analysis Challenge

The Netflix Prize challenge concerns recommender systems for movies. Netflix released a training set consisting of data from almost 500,000 customers and their ratings on 18,000 movies. This amounted to more than 100 million ratings. The task was to use these data to build a model to predict ratings for a hold-out set of 3 million ratings. These models, known as collaborative filtering, use the collective information of the whole group to make individualized predictions.

Movies are complex beasts. Besides the most obvious characterization into genres, movies differ on countless dimensions describing setting, plot, characters, cast, and many more subtle features such as tone or style of the dialogue. The Movie Genome Project (www.jinni.com/movie-genome.html) reports using “thousands of possible genes.” Consequently, any finite model is likely to miss some of the signal, or explanation, associated with people’s ratings of movies.

On the other hand, complex models are prone to overfitting, or matching small details rather than the big picture, especially where data are scarce.
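
To make the evaluation criterion concrete, here is a minimal Python sketch of RMSE computed on a hold-out set. It is not taken from the article: the ratings are invented toy data, and the names `rmse`, `movie_means`, and `shrinkage` are hypothetical. It compares a single global mean with per-movie means, a very crude way of using the group's collective information. The optional shrinkage toward the global mean is an assumed, generic guard against overfitting movies that have few ratings, not the authors' method.

```python
import math
from collections import defaultdict


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Toy (user, movie, rating) triples, invented for illustration only.
train = [
    ("u1", "m1", 5), ("u2", "m1", 4), ("u3", "m1", 5),
    ("u1", "m2", 2), ("u2", "m2", 1),
    ("u3", "m3", 3),
]
holdout = [("u4", "m1", 5), ("u3", "m2", 2), ("u1", "m3", 4)]

global_mean = sum(r for _, _, r in train) / len(train)


def movie_means(ratings, shrinkage=0.0):
    """Per-movie mean rating, pulled toward the global mean.

    `shrinkage` acts like that many pseudo-ratings at the global mean,
    so movies with few ratings move the most (an assumed regularizer).
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for _, movie, rating in ratings:
        totals[movie] += rating
        counts[movie] += 1
    return {
        m: (totals[m] + shrinkage * global_mean) / (counts[m] + shrinkage)
        for m in totals
    }


def predict(means, pairs):
    """Look up each movie's mean; unseen movies fall back to the global mean."""
    return [means.get(movie, global_mean) for _, movie, _ in pairs]


actual = [r for _, _, r in holdout]
rmse_global = rmse([global_mean] * len(holdout), actual)
rmse_movie = rmse(predict(movie_means(train), holdout), actual)
rmse_shrunk = rmse(predict(movie_means(train, shrinkage=1.0), holdout), actual)

print(f"global mean: {rmse_global:.3f}")
print(f"movie means: {rmse_movie:.3f}")
print(f"shrunk means: {rmse_shrunk:.3f}")
```

On this toy data the per-movie means score a lower hold-out RMSE than the global mean. Nothing should be read into the size of the gap, since the data are made up.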
24 VOL. 23, NO. 1, 2010
