This project uses Association Rule Mining within a hybrid recommendation system to recommend
movies similar to the one selected by the user.
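The core rule-mining idea can be illustrated with a short Python sketch. It is a simplified, pair-only form of association rule mining (not a full Apriori implementation) and covers only the rule-mining component, not the hybrid part of the system. It assumes each transaction is the set of movies one user liked. The transactions, the thresholds, and the names `mine_pair_rules` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.2, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) rules between single movies."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        # Count every unordered pair of movies that appear together in one transaction.
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields two directed rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


def recommend(selected, rules, top_n=3):
    """Rank consequents of rules whose antecedent is the selected movie."""
    matches = [r for r in rules if r[0] == selected]
    # Highest confidence first, then highest support, then title for a stable order.
    matches.sort(key=lambda r: (-r[3], -r[2], r[1]))
    return [r[1] for r in matches[:top_n]]


# Hypothetical transactions: the movies each of five users liked.
TRANSACTIONS = [
    {"Toy Story", "Shrek", "Up"},
    {"Toy Story", "Shrek"},
    {"Toy Story", "Up"},
    {"Heat", "Ronin"},
    {"Toy Story", "Shrek", "Heat"},
]

rules = mine_pair_rules(TRANSACTIONS, min_support=0.4, min_confidence=0.5)
print(recommend("Toy Story", rules))  # ['Shrek', 'Up']
```

Support filters out rare co-occurrences (here "Heat" and "Ronin" appear together only once), while confidence orders the candidates for the selected movie.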
Project Aim:
This project aims to serve as a foundation for a robust recommendation system that gives
users a seamless viewing experience.
Purpose of Report:
The purpose of this report is to describe the methods used to build this recommendation
system: the dataset used, the pre-processing and data cleaning required to use the data
effectively, and how recommendations are generated from the movie selected by the user.
Limitations of Report:
Model accuracy is not covered in this report. Movie recommendations are a matter of taste,
and the accuracy of the model is difficult to express in numerical terms without conducting
user surveys or collecting feedback.
Project Objective
To serve as a base for developing a robust movie recommendation system using Association
Rule Mining.
Methodology
Data Source:
GroupLens Research has collected and made available rating data sets from the MovieLens
web site (http://movielens.org). The data sets were collected over various periods of time,
depending on the size of the set.
The data set used for this project was MovieLens Latest Datasets – Full
(https://grouplens.org/datasets/movielens/latest/).
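Loading the movie list from the downloaded archive can be sketched as follows. The sketch assumes the movies.csv layout described in the MovieLens README (columns movieId,title,genres, with genres separated by "|" and the placeholder "(no genres listed)" for movies without a genre). The function name `load_movies`, the sample rows, and the `ml-latest/` path are illustrative assumptions.

```python
import csv
import io


def load_movies(fileobj):
    """Map movieId -> (title, genre list) from a MovieLens movies.csv stream."""
    movies = {}
    for row in csv.DictReader(fileobj):
        genres = row["genres"]
        # Movies without a genre carry the placeholder "(no genres listed)".
        genre_list = [] if genres == "(no genres listed)" else genres.split("|")
        movies[int(row["movieId"])] = (row["title"], genre_list)
    return movies


# Illustrative rows in the movies.csv layout; titles containing commas are quoted.
SAMPLE = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
11,"American President, The (1995)",Comedy|Drama|Romance
99999,Example Film (2018),(no genres listed)
"""

movies = load_movies(io.StringIO(SAMPLE))
print(movies[11])  # ('American President, The (1995)', ['Comedy', 'Drama', 'Romance'])

# Against the real download (the path assumes where the archive was unzipped):
# with open("ml-latest/movies.csv", encoding="utf-8", newline="") as f:
#     movies = load_movies(f)
```

Using the csv module rather than splitting on commas matters here, because many MovieLens titles contain commas inside quotes.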
Key Features:
- Contains 27,753,444 ratings and 1,108,997 tag applications across 58,098 movies
- Created by 283,228 users between January 09, 1995 and September 26, 2018
- Users were selected at random for inclusion
- All selected users had rated at least 1 movie
- No demographic information is included
- Each user is represented by an id, and no other information is provided
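The headline figures above can be recomputed from a download with one streaming pass over ratings.csv, which avoids loading tens of millions of rows into memory at once. This minimal sketch assumes the userId,movieId,rating,timestamp layout from the MovieLens README with Unix timestamps in seconds. The function names and sample rows are hypothetical. The rated-movie count only includes movies with at least one rating, so it may be lower than the number of movies listed in movies.csv.

```python
import csv
import io
from datetime import datetime, timezone


def to_utc_date(timestamp):
    """Convert a Unix timestamp (seconds since the epoch) to a UTC calendar date."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date()


def summarize_ratings(fileobj):
    """Stream ratings.csv once; count ratings, users, rated movies and the date range."""
    n_ratings = 0
    users, movies = set(), set()
    first = last = None
    for row in csv.DictReader(fileobj):
        n_ratings += 1
        users.add(row["userId"])
        movies.add(row["movieId"])
        ts = int(row["timestamp"])
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
    return {
        "ratings": n_ratings,
        "users": len(users),
        # Only movies with at least one rating are counted here.
        "movies_rated": len(movies),
        "first": to_utc_date(first) if first is not None else None,
        "last": to_utc_date(last) if last is not None else None,
    }


# Hypothetical rows in the ratings.csv layout.
SAMPLE = """userId,movieId,rating,timestamp
1,307,3.5,1000000000
1,481,3.5,1200000000
2,307,4.0,1500000000
"""

stats = summarize_ratings(io.StringIO(SAMPLE))
print(stats)
```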
The rationale for using this dataset is that GroupLens updates it over time, so a fresh
download captures recent ratings recorded on the website. This is useful for recommendation
systems: OTT catalogues are constantly updated with new movies, and any dataset used for
recommendations should reflect the latest relevant additions.
Nature of Data:
The data primarily used for this analysis is the genres or tags of each movie. These are
category labels with no inherent order or magnitude, so the data is nominal in nature.
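Because genres and tags are nominal, a common way to make them usable for rule mining or similarity is to give each distinct label its own 0/1 indicator column instead of mapping labels onto a single numeric scale. The report does not state which encoding it uses, so this is a minimal sketch under that assumption. The names `one_hot` and `normalize_tag`, the tag-cleaning rule (lower-casing and collapsing whitespace), and the sample labels are hypothetical.

```python
def normalize_tag(tag):
    """Lower-case a free-text tag and collapse runs of whitespace."""
    return " ".join(tag.lower().split())


def one_hot(labels_by_movie):
    """Encode nominal labels as 0/1 indicator vectors over a sorted vocabulary.

    Nominal labels have no order or magnitude, so each distinct label gets its
    own column instead of being mapped to a number on a single scale.
    """
    vocabulary = sorted({label for labels in labels_by_movie.values() for label in labels})
    vectors = {}
    for movie, labels in labels_by_movie.items():
        present = set(labels)
        vectors[movie] = [1 if label in present else 0 for label in vocabulary]
    return vocabulary, vectors


genres = {
    "Toy Story (1995)": ["Adventure", "Animation", "Children", "Comedy", "Fantasy"],
    "Heat (1995)": ["Action", "Crime", "Thriller"],
}
vocabulary, vectors = one_hot(genres)
print(vocabulary)
print(vectors["Heat (1995)"])  # [1, 0, 0, 0, 0, 1, 0, 1]

# Free-text tags need cleaning first so that variants collapse to one label.
tags = {"Toy Story (1995)": [normalize_tag(t) for t in ["Pixar", "  pixar ", "Computer  Animation"]]}
print(one_hot(tags)[0])  # ['computer animation', 'pixar']
```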