Recommendation Systems

What is a Recommendation System?
A recommendation system is an information filtering technique that provides users with information they may be interested in.
Examples:
Types of Recommendation
 1. Simple Recommender
Simple recommenders are basic systems that recommend the top
items based on a certain metric or score.
Weighted Rating (WR):
$$WR = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C$$
where
v is the number of votes for the movie;
m is the minimum votes required to be listed in the chart;
R is the average rating of the movie; and
C is the mean vote across the whole report.
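The weighted rating can be sketched in pandas as follows. This is a minimal illustration, not the deck's actual code: the toy data, the column names `vote_count` and `vote_average`, and the choice of the median vote count as the cutoff m are all assumptions.

```python
import pandas as pd

def weighted_rating(v, R, m, C):
    """Weighted rating: (v / (v + m)) * R + (m / (v + m)) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical toy data; titles, counts and column names are made up.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_count": [1000, 50, 400, 10],
    "vote_average": [8.0, 9.5, 7.0, 10.0],
})

C = movies["vote_average"].mean()         # mean vote across all movies
m = movies["vote_count"].quantile(0.50)   # assumed cutoff: median vote count

# Keep only movies with at least m votes, then score and sort them.
qualified = movies[movies["vote_count"] >= m].copy()
qualified["score"] = weighted_rating(
    qualified["vote_count"], qualified["vote_average"], m, C
)
chart = qualified.sort_values("score", ascending=False)
```

Movies with few votes are pulled toward the global mean C, so a single perfect rating cannot top the chart.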
2. Content Based Recommender

Plot Description Based Recommender
• In this recommender, we will compute pairwise similarity scores for all movies based on their plot descriptions and recommend movies based on that similarity score.
• The plot description is available as the overview feature in the dataset.
• To do this, we will compute the word vectors of each overview or document.
• Specifically, we compute Term Frequency-Inverse Document Frequency (TF-IDF) vectors for each document. This will give us a matrix where each column represents a word in the overview vocabulary and each row represents a movie.
• TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document).
• IDF(t) = log_e(Total number of documents / Number of documents with term t in it).
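A plot-description recommender along these lines might be sketched with scikit-learn as below. The three overviews are invented, and cosine similarity is an assumed choice of similarity score. Note also that `TfidfVectorizer` uses a smoothed IDF and raw term counts by default, so its values differ slightly from the textbook TF and IDF definitions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical overviews standing in for the dataset's "overview" feature.
overviews = [
    "A young wizard attends a school of magic.",
    "A wizard and a hobbit travel to destroy a magic ring.",
    "Two detectives investigate a series of murders in the city.",
]

tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(overviews)  # rows = movies, columns = words

# Rows are L2-normalized by default, so the dot product is the cosine similarity.
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

def most_similar(idx):
    """Return the index of the movie most similar to movie `idx`."""
    scores = cosine_sim[idx].copy()
    scores[idx] = -1.0  # exclude the movie itself
    return int(scores.argmax())
```

Calling `most_similar(0)` looks up the row of the similarity matrix for the first movie and returns its nearest neighbor.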
Credits, Genres and Keywords Based Recommender

The quality of the recommender can be improved by using better metadata. Here we build a recommender based on the following features: the 3 top actors, the director, related genres and the movie plot keywords.
• Introduce a popularity filter: this recommender would take the list of the 30 most similar movies, calculate the weighted ratings (using the IMDB formula from above), sort movies based on this rating and return the top 10 movies.
• Other crew members: other crew member names, such as screenwriters and producers, could also be included.
• Increasing the weight of the director: to give more weight to the director, he or she could be mentioned multiple times in the soup (the combined string of metadata features) to increase the similarity scores of movies with the same director.
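The effect of repeating the director in the soup can be sketched as follows. The movie records, the helper names `clean`, `make_soup` and `soup_similarity`, and the use of `CountVectorizer` with cosine similarity are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def clean(name):
    """Lowercase and strip spaces so 'Director A' becomes one token."""
    return name.lower().replace(" ", "")

def make_soup(movie, director_weight=1):
    """Join metadata into one string; repeat the director to add weight."""
    tokens = [clean(a) for a in movie["cast"][:3]]           # the 3 top actors
    tokens += [clean(movie["director"])] * director_weight   # repeated director
    tokens += [clean(g) for g in movie["genres"]]
    tokens += [clean(k) for k in movie["keywords"]]
    return " ".join(tokens)

def soup_similarity(movies, director_weight=1):
    """Cosine similarity between the count vectors of the soups."""
    soups = [make_soup(m, director_weight) for m in movies]
    counts = CountVectorizer().fit_transform(soups)
    return cosine_similarity(counts)

# Hypothetical records; all names are placeholders.
movies = [
    {"cast": ["Actor One", "Actor Two", "Actor Three"], "director": "Director A",
     "genres": ["Action"], "keywords": ["heist"]},
    {"cast": ["Actor Four", "Actor Five", "Actor Six"], "director": "Director A",
     "genres": ["Drama"], "keywords": ["family"]},
    {"cast": ["Actor Seven", "Actor Eight", "Actor Nine"], "director": "Director B",
     "genres": ["Action"], "keywords": ["space"]},
]

sim_plain = soup_similarity(movies, director_weight=1)
sim_boosted = soup_similarity(movies, director_weight=3)
```

The first two movies share only the director, so their similarity rises when the director token is repeated, while the pair that shares only a genre becomes relatively less similar.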
 3. Collaborative Filtering
• Collaborative filtering, also referred to as social filtering, filters information by using the recommendations of other people.
• It is based on the idea that people who agreed in their evaluation of certain items in the past are likely to agree again in the future.
• A person who wants to see a movie, for example, might ask for recommendations from friends. The recommendations of friends who have similar interests are trusted more than recommendations from others. This information is used in the decision on which movie to see.
Neighborhood-based approach
• Most collaborative filtering systems apply the so-called neighborhood-based technique.
• In the neighborhood-based approach, a number of users are selected based on their similarity to the active user. A prediction for the active user is made by calculating a weighted average of the ratings of the selected users.
Pearson’s correlation coefficient
• The weight given to a person’s ratings is determined by the correlation between that person and the person for whom the prediction is made. The Pearson correlation coefficient can be used as a measure of correlation.
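• The ratings of persons X and Y for item k are written as $X_k$ and $Y_k$, while $\bar{X}$ and $\bar{Y}$ are the mean values of their ratings. The correlation between X and Y is then given by:
$$r(X, Y) = \frac{\sum_k (X_k - \bar{X})(Y_k - \bar{Y})}{\sqrt{\sum_k (X_k - \bar{X})^2 \, \sum_k (Y_k - \bar{Y})^2}}$$
• In this formula, k ranges over all the items that both X and Y have rated.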
• A prediction for the rating of person X for item i, based on the ratings of people who have rated item i, is computed as follows:
$$p(X_i) = \frac{\sum_Y Y_i \cdot r(X, Y)}{n}$$
• Here Y ranges over all the n people who have rated item i.
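The correlation and prediction steps might look like this in plain Python. This is a sketch under stated assumptions: the means are taken over the co-rated items only, and the prediction is the correlation-weighted sum of the other people's ratings divided by the number n of raters. Other normalizations, such as dividing by the sum of absolute correlations, are also common. The users and ratings are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation over the items rated by both x and y.

    x and y map item -> rating. Means are computed over co-rated items
    only (an assumption); returns 0.0 when the correlation is undefined.
    """
    common = [k for k in x if k in y]
    if len(common) < 2:
        return 0.0
    x_mean = sum(x[k] for k in common) / len(common)
    y_mean = sum(y[k] for k in common) / len(common)
    num = sum((x[k] - x_mean) * (y[k] - y_mean) for k in common)
    den = math.sqrt(
        sum((x[k] - x_mean) ** 2 for k in common)
        * sum((y[k] - y_mean) ** 2 for k in common)
    )
    return num / den if den else 0.0

def predict(ratings, user, item):
    """Correlation-weighted sum of others' ratings of `item`, divided by n."""
    raters = [u for u in ratings if u != user and item in ratings[u]]
    if not raters:
        return None
    total = sum(ratings[u][item] * pearson(ratings[user], ratings[u]) for u in raters)
    return total / len(raters)

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 1},
    "bob": {"a": 4, "b": 3, "c": 2, "d": 4},
    "cat": {"a": 1, "b": 3, "c": 5, "d": 2},
}
prediction = predict(ratings, "ann", "d")
```

In this toy data bob agrees with ann and cat disagrees, so bob's rating of item d counts positively and cat's counts negatively.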
Selecting neighborhood
• Many collaborative filtering systems have to be able to handle a large number of users. Making a prediction based on the ratings of thousands of people has serious implications for run-time performance. Therefore, when the number of users reaches a certain size, a selection of the best neighbors has to be made.
• Two techniques, correlation-thresholding and best-n-neighbor, can be used to determine which neighbors to select.
• The first technique selects only those neighbors whose correlation is greater than a given threshold. The second technique selects the n neighbors with the highest correlation.
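Both selection techniques can be sketched in a few lines. The function names and the correlation values below are hypothetical. The input is assumed to be a mapping from each candidate neighbor to their correlation with the active user.

```python
def select_by_threshold(correlations, threshold):
    """Correlation-thresholding: keep neighbors whose correlation exceeds the threshold."""
    return {user: r for user, r in correlations.items() if r > threshold}

def select_best_n(correlations, n):
    """Best-n-neighbor: keep the n neighbors with the highest correlation."""
    ranked = sorted(correlations.items(), key=lambda pair: pair[1], reverse=True)
    return dict(ranked[:n])

# Hypothetical correlations with the active user.
correlations = {"bob": 0.9, "cat": -0.4, "dan": 0.3, "eve": 0.75}

above_half = select_by_threshold(correlations, 0.5)
top_three = select_best_n(correlations, 3)
```

Thresholding returns a variable number of neighbors (possibly none), whereas best-n always returns up to n neighbors even if some of them correlate weakly.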
Item based approach
• This approach is simply an inversion of the neighborhood-based approach. Instead of measuring the similarities between people, the ratings are used to measure the correlation between items.
• The Pearson correlation coefficient can again be used as a measure. For example, the ratings of the movies “Fargo” and “Pulp Fiction” have a perfect correlation. Based on this correlation, one might predict that Ken likes “Fargo” given that he liked “Pulp Fiction”.
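An item-to-item correlation matrix might be computed with pandas as below. The rating values, the users other than Ken, and the third movie are invented for illustration. They are chosen so that “Fargo” and “Pulp Fiction” correlate perfectly over the users who rated both.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: rows are users, columns are movies, NaN = not rated.
ratings = pd.DataFrame(
    {
        "Fargo": [5, 3, 1, np.nan],
        "Pulp Fiction": [4, 3, 2, 5],
        "Movie C": [1, 5, 2, 4],
    },
    index=["User 1", "User 2", "User 3", "Ken"],
)

# Pairwise Pearson correlation between columns (items), using only
# the users who rated both movies in each pair.
item_sim = ratings.corr(method="pearson")

def most_correlated(item):
    """Return the other item whose ratings correlate most with `item`."""
    return item_sim[item].drop(item).idxmax()
```

Here Ken has rated “Pulp Fiction” highly but not “Fargo”, so the perfect correlation between the two columns is what would suggest “Fargo” to him.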
Unsupervised learning
Unsupervised learning: given data, i.e. examples, but no labels
Given some examples without labels, do something!

Unsupervised learning applications
learn clusters/groups without any label
customer segmentation (i.e. grouping)
image compression
bioinformatics: learn motifs
find important features
…
Unsupervised learning: clustering
[Figure: raw data → extract features (f1, f2, f3, …, fn for each example) → group into classes/clusters]
No “supervision”: we’re only given data and want to find natural groupings
Clustering
Clustering: the process of grouping a set of objects into classes of similar objects
Applications?
Face clustering
A data set with clear cluster structure
What are some of the issues for clustering?
What clustering algorithms have you seen/used?
K-means
The best-known and most popular clustering algorithm:
Start with some initial cluster centers
Iterate:
• Assign each example to the closest center
• Recalculate each center as the mean of the points in its cluster
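The two iterated steps can be sketched in NumPy as follows. This is a minimal illustration rather than a production implementation: initial centers are drawn at random from the data points, Euclidean distance is assumed, and the toy points are invented.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Basic k-means: returns (centers, labels) for an (n, d) float array."""
    rng = np.random.default_rng(seed)
    # Start with k distinct data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign each example to the closest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recalculate each center as the mean of the points in its cluster;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # no changes: done
        centers = new_centers
    return centers, labels

# Hypothetical data: two well-separated groups of three points each.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(points, k=2)
```

The result depends on the initial centers in general, which is why library implementations typically run several random initializations and keep the best one.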
K-means: an example
K-means: Initialize centers randomly
K-means: assign points to nearest center
K-means: readjust centers
No changes: Done
Algorithm
Choosing the right number of clusters
Elbow Method
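One common reading of the elbow method is to fit k-means for a range of k, record the within-cluster sum of squares, and look for the k after which the improvement flattens out. The sketch below assumes scikit-learn's `KMeans` and its `inertia_` attribute as that measure. The data is synthetic, with three blobs generated on purpose.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data: three Gaussian blobs of 50 points each (made up).
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(50, 2))
    for center in [(0, 0), (5, 5), (0, 5)]
])

# Fit k-means for k = 1..6 and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Improvement gained by adding one more cluster at each k.
drops = {k: inertias[k - 1] - inertias[k] for k in range(2, 7)}
```

Plotting `inertias` against k would show a sharp bend at the number of blobs used to generate the data. On real data the bend is often less clear, so the choice remains a judgment call.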
Literature Survey
• https://cs.nyu.edu/courses/spring17/CSCI-GA.3033-006/final_projects/movie-recommender-system.pdf
Tools used
• Python
• pandas
• NumPy
• scikit-learn
• Anaconda
