Recommendation Systems
What is a Recommendation System?
A recommendation system is an information filtering technique that provides users with information they may be interested in.
Examples:
Types of Recommendation
1. Simple Recommender
Simple recommenders are basic systems that recommend the top items based on a certain metric or score, such as the IMDB weighted rating:
$$ \text{Weighted Rating (WR)} = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C $$

where
v is the number of votes for the movie;
m is the minimum number of votes required to be listed in the chart;
R is the average rating of the movie; and
C is the mean vote across the whole report (i.e. across all movies considered).
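A minimal plain-Python sketch of this simple recommender follows. The function names and the movie fields `title`, `vote_count` (v) and `vote_average` (R) are hypothetical, and m is passed in by the caller; how to choose it (for example, as a high percentile of all vote counts) is left as an assumption.

```python
def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating: (v/(v+m))*R + (m/(v+m))*C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


def simple_recommender(movies, m, n=10):
    """Return the titles of the top-n movies by weighted rating.

    `movies` is a list of dicts with hypothetical keys
    "title", "vote_count" (v) and "vote_average" (R).
    """
    # C: mean vote across all movies in the list
    C = sum(mv["vote_average"] for mv in movies) / len(movies)
    # Only movies with at least m votes are listed in the chart
    qualified = [mv for mv in movies if mv["vote_count"] >= m]
    qualified.sort(
        key=lambda mv: weighted_rating(mv["vote_count"], mv["vote_average"], m, C),
        reverse=True,
    )
    return [mv["title"] for mv in qualified[:n]]


# Hypothetical toy data
movies = [
    {"title": "A", "vote_count": 1000, "vote_average": 8.0},
    {"title": "B", "vote_count": 10, "vote_average": 9.5},
    {"title": "C", "vote_count": 500, "vote_average": 7.0},
]
print(simple_recommender(movies, m=100))  # ['A', 'C']: B has too few votes to be listed
```

The m/(v+m) term pulls a movie with few votes toward the global mean C, so a high average from a handful of votes does not automatically top the chart. With m=10 instead of 100, movie B qualifies and ranks first.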

2. Content-Based Recommender



Plot Description Based Recommender

Here we compute pairwise similarity scores for all movies based on their plot descriptions and recommend the movies with the highest similarity scores.

The plot description is available as the overview feature in the dataset.
To do this, we compute a word vector for each overview (document). Specifically, we compute Term Frequency-Inverse Document Frequency (TF-IDF) vectors for each document. This gives a matrix in which each column represents a word in the overview vocabulary and each row represents a movie.
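The steps above can be sketched with scikit-learn, assuming it is installed. `build_plot_recommender` and the toy titles and overviews are hypothetical. Note that `TfidfVectorizer` uses a smoothed IDF by default, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its weights differ slightly from the plain TF and IDF definitions on this slide even though the idea is the same.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def build_plot_recommender(titles, overviews):
    """Return a function that recommends titles with similar overviews."""
    # Rows = movies, columns = words of the overview vocabulary (English stop words dropped)
    tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(overviews)
    # Rows are L2-normalised by default, so the dot product equals cosine similarity
    cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
    index = {title: i for i, title in enumerate(titles)}

    def recommend(title, n=10):
        i = index[title]
        # Sort all movies by similarity to the query movie, most similar first
        ranked = sorted(range(len(titles)), key=lambda j: cosine_sim[i, j], reverse=True)
        return [titles[j] for j in ranked if j != i][:n]

    return recommend


# Hypothetical toy data
titles = ["Space Trip", "Planet Quest", "City Crime"]
overviews = [
    "A space crew travels to a distant planet",
    "Astronauts explore a distant planet in deep space",
    "A detective solves a murder in the city",
]
recommend = build_plot_recommender(titles, overviews)
print(recommend("Space Trip", n=1))  # ['Planet Quest']
```

Because the rows are unit-length, `linear_kernel` (a plain dot product) gives the same scores as `cosine_similarity` while skipping the normalization step.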
$$ TF(t) = \frac{\text{number of times term } t \text{ appears in a document}}{\text{total number of terms in the document}} $$
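$$ IDF(t) = \ln\left(\frac{\text{total number of documents}}{\text{number of documents containing term } t}\right) $$

The TF-IDF weight of term t in a document is the product TF(t) · IDF(t).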
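To make the two definitions concrete, here is a from-scratch sketch that follows them literally. Documents are assumed to be already tokenized into lists of words, and the function names are illustrative.

```python
import math
from collections import Counter


def tf(term, doc):
    """Occurrences of `term` in `doc` divided by the number of terms in `doc`."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Natural log of (number of documents / number of documents containing `term`).

    Assumes `term` occurs in at least one document; otherwise this divides by zero.
    """
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)


def tf_idf(term, doc, corpus):
    """TF-IDF weight of `term` in `doc`: the product of TF and IDF."""
    return tf(term, doc) * idf(term, corpus)


# Hypothetical tokenized documents
corpus = [
    ["space", "planet", "space"],
    ["planet", "crew"],
    ["city", "crime"],
]
print(tf_idf("space", corpus[0], corpus))   # (2/3) * ln(3) ≈ 0.732
print(tf_idf("planet", corpus[0], corpus))  # (1/3) * ln(3/2) ≈ 0.135
```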

Credits, Genres and Keywords Based Recommender


The quality of the recommender can be improved by using better metadata. Here we build a recommender based on the following features: the top 3 actors, the director, related genres, and the movie plot keywords.
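One way to combine these features, sketched below under stated assumptions, is a metadata "soup": a single string per movie containing its keywords, top 3 actors, director and genres, which is then vectorized and compared with cosine similarity. The dictionary keys and helper names are hypothetical, and the scikit-learn calls assume the library is installed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def clean(name):
    """Lower-case and drop spaces: "Johnny Depp" -> "johnnydepp".

    Keeps two different people who share a first name from matching.
    """
    return name.replace(" ", "").lower()


def create_soup(movie):
    """Join keywords, top 3 actors, director and genres into one string."""
    parts = [clean(k) for k in movie["keywords"]]
    parts += [clean(a) for a in movie["cast"][:3]]
    parts.append(clean(movie["director"]))
    parts += [clean(g) for g in movie["genres"]]
    return " ".join(parts)


def metadata_similarity(movies):
    """Cosine similarity between the soups of all movies."""
    soups = [create_soup(mv) for mv in movies]
    # Raw counts (not TF-IDF), so an actor or director in many films is not down-weighted
    count_matrix = CountVectorizer().fit_transform(soups)
    return cosine_similarity(count_matrix)


# Hypothetical toy data
films = [
    {"cast": ["Ann Lee", "Bob Ray", "Cy Fox"], "director": "Dee Kay",
     "genres": ["Action"], "keywords": ["heist"]},
    {"cast": ["Ann Lee", "Eve Moe", "Fay Ng"], "director": "Dee Kay",
     "genres": ["Action"], "keywords": ["car"]},
    {"cast": ["Gus Po", "Hal Wu", "Ida Xu"], "director": "Jo Zed",
     "genres": ["Drama"], "keywords": ["family"]},
]
sim = metadata_similarity(films)
print(round(sim[0, 1], 3))  # 0.5: shared actor, director and genre
```

Raw counts are used here instead of TF-IDF on the reasoning that an actor or director who appears in many movies should not be down-weighted; that is a design choice, not a requirement.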

Introduce a popularity filter: the recommender would take the list of the 30 most similar movies, calculate their weighted ratings (using the IMDB formula from above), sort the movies by this rating, and return the top 10.
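The popularity filter can be sketched as a re-ranking step on top of any precomputed similarity matrix. The function name, the dictionary keys and the toy data are hypothetical; m and C are assumed to be computed beforehand, as for the simple recommender.

```python
def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating (same formula as the simple recommender)."""
    return (v / (v + m)) * R + (m / (v + m)) * C


def improved_recommendations(idx, sim, movies, m, C, candidates=30, n=10):
    """Re-rank the `candidates` movies most similar to movie `idx` by weighted rating.

    `sim` is a precomputed similarity matrix (e.g. from the metadata soup);
    `movies` uses hypothetical keys "title", "vote_count", "vote_average".
    """
    # Most similar movies first, excluding the query movie itself
    order = sorted(range(len(movies)), key=lambda j: sim[idx][j], reverse=True)
    similar = [j for j in order if j != idx][:candidates]
    # Sort this shortlist by weighted rating and keep the top n
    similar.sort(
        key=lambda j: weighted_rating(movies[j]["vote_count"], movies[j]["vote_average"], m, C),
        reverse=True,
    )
    return [movies[j]["title"] for j in similar[:n]]


# Hypothetical toy data
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.5, 0.2],
    [0.8, 0.5, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
movies = [
    {"title": "Q", "vote_count": 50, "vote_average": 7.5},
    {"title": "A", "vote_count": 10, "vote_average": 9.0},
    {"title": "B", "vote_count": 1000, "vote_average": 8.0},
    {"title": "C", "vote_count": 1000, "vote_average": 9.5},
]
print(improved_recommendations(0, sim, movies, m=100, C=7.0, candidates=2))  # ['B', 'A']
```

In the example (shortlist of 2 instead of 30 for brevity), movie C has the best rating of all but is not among the most similar candidates, so the filter never considers it: the weighted rating only reorders the similar movies, it does not replace similarity.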
Other crew members: the names of other crew members, such as screenwriters and producers, could also be included.
Increasing the weight of the director: to give the director more weight, his or her name could be added several times to the soup (the string of combined features), which increases the similarity scores of movies with the same director.
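Both suggestions amount to small changes in how the soup is built. In this sketch, `director_weight` (default 3) and the optional `crew` key are hypothetical choices, not values from the slides.

```python
def clean(name):
    """Lower-case and drop spaces so multi-word names become single tokens."""
    return name.replace(" ", "").lower()


def create_weighted_soup(movie, director_weight=3, include_crew=True):
    """Metadata soup with the director repeated `director_weight` times.

    Extra crew members (hypothetical key "crew", e.g. screenwriters, producers)
    are appended when `include_crew` is True.
    """
    parts = [clean(k) for k in movie["keywords"]]
    parts += [clean(a) for a in movie["cast"][:3]]
    # Repeating the director's token raises its count in the count vector
    parts += [clean(movie["director"])] * director_weight
    if include_crew:
        parts += [clean(c) for c in movie.get("crew", [])]
    parts += [clean(g) for g in movie["genres"]]
    return " ".join(parts)


# Hypothetical movie record
movie = {
    "keywords": ["heist"],
    "cast": ["Ann Lee", "Bob Ray", "Cy Fox"],
    "director": "Dee Kay",
    "genres": ["Action"],
    "crew": ["Max Pen"],
}
print(create_weighted_soup(movie))
# heist annlee bobray cyfox deekay deekay deekay maxpen action
```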
3. Collaborative Filtering

Collaborative filtering, also referred to as social filtering, filters information by using the recommendations of other people.
It is based on the idea that people who agreed in their evaluation of certain items in the past are likely to agree again in the future.
A person who wants to see a movie, for example, might ask friends for recommendations. The recommendations of friends with similar interests are trusted more than recommendations from others, and this information is used to decide which movie to see.
Neighborhood-based approach
Most collaborative filtering systems apply the so-called neighborhood-based technique.
In the neighborhood-based approach, a number of users are selected based on their similarity to the active user. A prediction for the active user is made by calculating a weighted average of the ratings of the selected users.
Pearson’s correlation coefficient
The weight given to a person's ratings is determined by the correlation between that person and the person for whom the prediction is made. The Pearson correlation coefficient can be used as the measure of correlation.
The ratings of persons X and Y for item k are written as $x_k$ and $y_k$, while $\bar{x}$ and $\bar{y}$ are the mean values of their ratings. The correlation between X and Y is then given by:

$$ r_{XY} = \frac{\sum_k (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_k (x_k - \bar{x})^2 \sum_k (y_k - \bar{y})^2}} $$
In this formula, k ranges over all items that both X and Y have rated.
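The formula translates directly into Python. Ratings are held in one dict per user (an assumed representation), the means are taken over the co-rated items only (some systems use each user's overall mean instead), and the ratings for Ken are made up for illustration.

```python
import math


def pearson(ratings_x, ratings_y):
    """Pearson correlation between two users over the items both have rated.

    `ratings_x`, `ratings_y`: dicts mapping item -> rating.
    Returns 0.0 when fewer than two items are shared or a user's ratings have no variance.
    """
    common = sorted(set(ratings_x) & set(ratings_y))
    if len(common) < 2:
        return 0.0
    xs = [ratings_x[k] for k in common]
    ys = [ratings_y[k] for k in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings
ken = {"Fargo": 4, "Pulp Fiction": 5, "Titanic": 2}
amy = {"Fargo": 3, "Pulp Fiction": 4, "Titanic": 1, "Heat": 2}
print(pearson(ken, amy))  # ≈ 1.0: Amy rates every shared item one point lower than Ken
```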
A prediction for the rating of person X for item i, based on the ratings of people who have rated item i, is computed as follows:

where Y ranges over all n people who have rated item i.
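One common form of this prediction, assumed here, is X's mean rating plus a correlation-weighted average of the other raters' mean-centred ratings: $p_{X,i} = \bar{x} + \frac{\sum_Y r_{XY}(y_i - \bar{y})}{\sum_Y |r_{XY}|}$. Normalizing by the sum of absolute correlations is an assumption of this sketch; other systems normalize differently. The users and ratings are hypothetical.

```python
import math


def pearson(rx, ry):
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    common = sorted(set(rx) & set(ry))
    if len(common) < 2:
        return 0.0
    mx = sum(rx[k] for k in common) / len(common)
    my = sum(ry[k] for k in common) / len(common)
    cov = sum((rx[k] - mx) * (ry[k] - my) for k in common)
    vx = sum((rx[k] - mx) ** 2 for k in common)
    vy = sum((ry[k] - my) ** 2 for k in common)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def predict(ratings, x, item):
    """Predict user x's rating for `item` from every other user who rated it.

    `ratings`: dict user -> dict item -> rating.
    """
    rx = ratings[x]
    mean_x = sum(rx.values()) / len(rx)
    num = den = 0.0
    for y, ry in ratings.items():
        if y == x or item not in ry:
            continue
        r = pearson(rx, ry)
        mean_y = sum(ry.values()) / len(ry)
        num += r * (ry[item] - mean_y)  # correlation-weighted, mean-centred rating
        den += abs(r)
    # With no correlated raters, fall back to x's own mean rating
    return mean_x + num / den if den else mean_x


# Hypothetical ratings
ratings = {
    "Ken": {"Pulp Fiction": 5, "Titanic": 1},
    "Amy": {"Pulp Fiction": 4, "Titanic": 2, "Fargo": 5},
    "Bob": {"Pulp Fiction": 1, "Titanic": 5, "Fargo": 2},
}
print(predict(ratings, "Ken", "Fargo"))  # ≈ 4.0
```

Bob correlates negatively with Ken and rated Fargo below his own average, so his rating pushes Ken's prediction up rather than down.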
Selecting the neighborhood
Many collaborative filtering systems have to handle a large number of users. Making a prediction based on the ratings of thousands of people has serious implications for run-time performance. Therefore, when the number of users reaches a certain size, a selection of the best neighbors has to be made.
Two techniques, correlation thresholding and best-n-neighbor selection, can be used to determine which neighbors to select.
The first technique selects only those neighbors whose correlation is greater than a given threshold. The second technique selects the n neighbors with the highest correlation.
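Both selection strategies are short once the correlations with the active user are known. The dictionary of correlations and the user names are hypothetical.

```python
def threshold_neighbors(correlations, threshold):
    """Correlation thresholding: keep neighbors whose correlation exceeds `threshold`."""
    return [user for user, r in correlations.items() if r > threshold]


def best_n_neighbors(correlations, n):
    """Best-n-neighbor: keep the n neighbors with the highest correlation."""
    return sorted(correlations, key=correlations.get, reverse=True)[:n]


# Hypothetical correlations between the active user and four other users
correlations = {"Amy": 0.9, "Bob": 0.2, "Cat": -0.5, "Dan": 0.6}
print(threshold_neighbors(correlations, 0.5))  # ['Amy', 'Dan']
print(best_n_neighbors(correlations, 3))       # ['Amy', 'Dan', 'Bob']
```

Thresholding yields a neighborhood whose size varies from user to user, while best-n fixes the size but may include weakly correlated users when few good neighbors exist.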
Item-based approach
This approach is simply an inversion of the neighborhood-based approach. Instead of measuring the similarities between people, the ratings are used to measure the correlation between items.
The Pearson correlation coefficient can again be used as the measure. For example, the ratings of the movies “Fargo” and “Pulp Fiction” have a perfect correlation. Based on this correlation, one might predict that Ken likes “Fargo” given that he liked “Pulp Fiction”.
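An item-based sketch with NumPy, assuming a complete ratings matrix (every user has rated every item) so that `np.corrcoef` can be applied to whole columns; with missing ratings, the correlation would instead be computed over the users who rated both items. The ratings are invented so that “Fargo” and “Pulp Fiction” correlate perfectly.

```python
import numpy as np

items = ["Fargo", "Pulp Fiction", "Titanic"]
# Hypothetical ratings: rows are users, columns are items
R = np.array([
    [5, 4, 1],
    [3, 2, 4],
    [4, 3, 2],
    [2, 1, 5],
])


def item_correlations(R):
    """Pearson correlation between every pair of item columns."""
    return np.corrcoef(R, rowvar=False)


def most_similar_item(R, items, name):
    """Item whose ratings correlate most strongly with those of `name`."""
    corr = item_correlations(R)
    i = items.index(name)
    # Highest correlation first, skipping the item itself
    for j in np.argsort(-corr[i]):
        if j != i:
            return items[j]


print(most_similar_item(R, items, "Pulp Fiction"))  # Fargo
```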
Unsupervised learning

Unsupervised learning: given data, i.e. examples, but no labels



Given some examples without labels, do something!


Unsupervised learning applications
learn clusters/groups without any labels
customer segmentation (i.e. grouping)
image compression
bioinformatics: learn motifs
find important features


Unsupervised learning: clustering

[Figure: raw data → extract features (each example becomes a feature vector f1, f2, f3, …, fn) → group into classes/clusters]

No “supervision”: we’re only given data and want to find natural groupings.
Clustering

Clustering: the process of grouping a set of objects into classes of similar objects

Applications?
Face Clustering
A data set with clear cluster structure

What are some of the issues for clustering?

What clustering algorithms have you seen/used?
K-means

Most well-known and popular clustering algorithm:

Start with some initial cluster centers.

Iterate until the assignments no longer change:
  Assign each example to the closest center.
  Recalculate each center as the mean of the points in its cluster.
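The loop can be sketched in NumPy as follows. The function name, the random choice of k data points as initial centers, and the rule of keeping an old center when its cluster becomes empty are assumptions of this sketch, not part of the slide's description.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Initial centers: k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assign each example to its closest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no changes: done
        labels = new_labels
        # Recalculate each center as the mean of the points assigned to it
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:  # keep the old center if the cluster is empty
                centers[c] = members.mean(axis=0)
    return centers, labels


# Hypothetical data: two well-separated groups of three points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(X, k=2)
print(labels)
```

The label values themselves (0 or 1) depend on the random initialization; only the grouping is meaningful.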
K-means: an example
[Figure sequence: initialize centers randomly; assign points to nearest center; readjust centers; the assign and readjust steps repeat until an assignment produces no changes, at which point the algorithm is done]
Algorithm
Choosing the right number of clusters
Elbow Method
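The elbow method runs k-means for a range of k, records the within-cluster sum of squared distances (scikit-learn's `inertia_`), and picks the k after which the curve stops dropping sharply. This sketch assumes scikit-learn is installed; the three-blob data set is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, max_k=8):
    """Within-cluster sum of squared distances (inertia) for k = 1..max_k."""
    inertias = []
    for k in range(1, max_k + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(km.inertia_)
    return inertias


# Hypothetical data: three well-separated blobs of 10 points each
rng = np.random.default_rng(0)
blob_centers = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in blob_centers])

for k, inertia in enumerate(elbow_inertias(X, max_k=6), start=1):
    print(k, round(inertia, 1))
```

For this data the curve should fall steeply up to k = 3 and flatten afterwards, so 3 is the elbow.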
Literature Survey
https://cs.nyu.edu/courses/spring17/CSCI-GA.3033-006/final_projects/movie-recommender-system.pdf
Tools used
Python
Pandas
NumPy
Scikit-Learn
Anaconda
