
as a music recommender system

(for classifying music based on emotion detection)

STUDENT NAME: POOJA VISHAL PANCHOLI


TUTOR NAME : KASUN BANDARA
Student ID : 29984939
FIT5145 Introduction to Data Science Business and Data Case Study

Table of Contents
1. PROJECT INFORMATION
a. Introduction
b. What is Musixmatch?
c. Data Roles
i. Data Scientist
ii. Data Analysis Engineer
iii. Machine Learning Engineer
2. BUSINESS MODEL
a. Aim of the system
b. Data Science Perception
c. Challenges in the project
d. Benefits of the project
3. DATA CHARACTERISTICS
a. Data Source
b. DataSet Information
c. The five V’s
4. Data Processing
a. Clustering
b. Visualization
c. Prediction
5. RESOURCES
a. Artist’s Songs data
b. User’s data
c. Tracks data
6. DATA SCIENCE TOOLS AND SOFTWARE
7. DATA ANALYSIS
a. Text based
b. Audio based
c. Analysis Evaluation
8. REFERENCES

Student ID : 29984939 Pooja Vishal Pancholi i|Page



1. PROJECT INFORMATION
a. Introduction
Digital music has gained much attention in the last few years thanks to the ever-increasing
number of people who access music in their daily lives. Although there are many techniques
for classifying and grouping music according to listeners' preferences, the music industry
continues to research new and better classification techniques. Music is often described as
the language of expression and emotion. If asked which song you would like to listen to at a
given moment, you would most likely choose one that suits your mood at that moment. This has
created a need for a classification system based on human emotions. This report examines how
Musixmatch can be used to classify and suggest music based on the relationship between music
and human emotions.

b. What is Musixmatch?
Musixmatch is a platform on which users can search for and share song lyrics along with their
translations. It is the world’s largest lyrics database platform. Musixmatch displays lyrics
synchronized with the music being played. Its native applications scan all the songs in a
user’s library and find the lyrics for each of them, and the app can also be used as a music
player.

c. Data Roles
From the job listings at https://about.musixmatch.com/careers, we can summarize that there
are three main data roles at Musixmatch.
i. Data Scientist
A Data Scientist at Musixmatch combines data science methods with engineering to
make sense of Musixmatch’s very large volume of data. The role suits someone who is
well versed in Python, MATLAB and R.
ii. Data Analysis Engineer
A Data Analysis Engineer at Musixmatch must be able to tackle complicated data-related
problems using the large datasets available at Musixmatch. The Data Analysis Engineer
must also have a sound knowledge of visualization in order to assess the feasibility
of the emotion detection system.
iii. Machine Learning Engineer
A Machine Learning Engineer is responsible for implementing machine learning and
artificial intelligence mechanisms that extract emotion-related data from the music.
The profile suits someone with extensive knowledge of machine learning frameworks and
toolkits such as NLTK, CNTK, Torch and TensorFlow.


2. BUSINESS MODEL
a. Aim of the system
Detecting emotions in music is a growing trend in the music industry. It allows music data
to be categorized according to human feelings, so that collections organized by mood can be
presented to users. The aim of this system is to give users suitable suggestions based on
the emotion they are feeling at that moment. To do this, the system will use text-based and
audio-based emotion classification with deep learning architectures.

b. Data Science Perception


From a data scientist’s viewpoint, a business like Musixmatch delivers great value and
succeeds only if it:
• synchronizes the lyrics with the music correctly
• manages huge music catalogs according to trends and users’ demands
• sorts the music data into appropriate categories
• supports categorization by genre and artist as well as by human emotion.

c. Challenges in the project


• One of the greatest challenges is the ambiguity of word meanings. A single word can
have different meanings, so the system may fail to recognize the emotion linked with
the word and may categorize the song incorrectly (Jamdar, Abraham, Khanna & Dubey, 2015).

• Another challenge arises when the system cannot find a suitable category for a song
and is therefore unable to categorize it.

d. Benefits of the project


• Improved user search: users save time by searching for songs based on their current
feelings.

• Data efficiency: wrangling the data into emotion categories improves the quality of
users’ search results and reduces data redundancy.
• Covering a wide range of emotions: only nine broad categories cover a wide range of
emotions, which reduces the cost of classifying and storing each song in redundant
categories.
• Efficient use of resources: signals such as the user’s mood can be used to predict
playlists and the time of day when a user wishes to play certain songs. For example, a
user may like to play energetic, motivational songs while exercising.

3. DATA CHARACTERISTICS
a. Data Source
The main data source for the project is MusixMatch’s lyrics dataset.

b. DataSet Information
To classify emotions based on segments of the lyrics, the dataset needs to store five
elements:
• Track ID
• The lyrics
• Starting time of the audio file segment
• Ending time of the audio file segment
• Mood related to the segment
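The five elements could be held in a record type like the minimal Python sketch below. The class name `LyricSegment`, the field names, the use of seconds for the start and end times and the sample values are all assumptions for illustration; Musixmatch’s actual schema is not described here. A `None` mood stands for the NULL case discussed under Veracity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LyricSegment:
    """Hypothetical row of the emotion dataset: one timed slice of a track's lyrics."""
    track_id: int               # identifier shared with the track dataset
    lyrics: str                 # lyric text sung during this segment
    start_s: float              # segment start, in seconds from the start of the audio (assumed unit)
    end_s: float                # segment end, in seconds
    mood: Optional[str] = None  # None models a NULL mood (segment not yet classified)

    def __post_init__(self) -> None:
        # Reject segments that do not have a positive duration.
        if self.end_s <= self.start_s:
            raise ValueError("end_s must be greater than start_s")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


segment = LyricSegment(track_id=101, lyrics="hypothetical lyric line", start_s=12.5, end_s=18.0)
print(segment.duration_s, segment.mood)  # 5.5 None
```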

c. The five V’s

Data Characteristic | Description
Volume | This dataset is provided by MusixMatch and contains details of over 14 million lyrics in 50 different languages (MusixMatch, 2019).
Variety | The data is structured: all lyrics are associated with a track ID for tracks stored in the track dataset. Lyrics are available in more than one language.
Velocity | The MusixMatch data grows constantly as new lyrics are added or new songs are classified by emotion. More than 1 billion people search for song lyrics every day.
Variability | Songs differ in duration and content, and many songs can belong to multiple categories.
Veracity | Ambiguous emotions can assign two moods to one song, so the data may not be accurate. The emotion type may also be NULL when a song has not been classified, which can prevent the data from being analyzed properly.


4. Data Processing
a. Clustering
This process is very tedious. The lyrics dataset is analyzed, and each segment of the lyrics
is separated and examined to detect the emotion expressed in the song. Categories are defined
for the common expressions shown by people while listening to music, and each category
contains the emotions that fit it best. This approach is termed categorization based on
clustering. From a data scientist’s point of view, it is good practice because it groups
together all the emotions that convey a similar feeling.
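The report does not name a clustering algorithm. As one possible illustration, the sketch below groups emotion words with plain k-means over (valence, arousal) coordinates. The coordinates, the word list, k = 4 and the choice to seed the centroids with the first four entries are all assumptions picked to give a small, deterministic example; they are not MusixMatch’s categories.

```python
import math

# Hypothetical (valence, arousal) coordinates in [-1, 1]; illustrative values only.
# The first four entries lie in different quadrants and seed the initial centroids.
EMOTIONS = {
    "happy": (0.8, 0.5),
    "calm": (0.6, -0.6),
    "sad": (-0.7, -0.5),
    "angry": (-0.6, 0.8),
    "excited": (0.7, 0.8),
    "relaxed": (0.7, -0.5),
    "bored": (-0.5, -0.7),
    "afraid": (-0.7, 0.6),
}


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=20):
    """Plain k-means with deterministic initialisation from the first k points."""
    centroids = list(points[:k])
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments stopped changing: converged
            break
        centroids = updated
    return centroids


def group_emotions(emotions, k):
    names = list(emotions)
    centroids = kmeans([emotions[n] for n in names], k)
    groups = [[] for _ in range(k)]
    for name in names:
        groups[nearest(emotions[name], centroids)].append(name)
    return groups


print(group_emotions(EMOTIONS, 4))
# [['happy', 'excited'], ['calm', 'relaxed'], ['sad', 'bored'], ['angry', 'afraid']]
```

k-means needs the number of categories fixed in advance, which matches the report’s idea of a fixed set of broad categories; a different k would simply produce coarser or finer groups.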

b. Visualization
Once the data is categorized, the already existing data is supposed to be visualized in
order to train the models. The visualization is done using the Russell’s Valence-Arousal
Space (Parisi, Francia, Olivastri & Tavella, 2019) .
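Russell’s model places each emotion on two axes: valence (pleasant versus unpleasant) and arousal (calm versus energetic). A minimal sketch of reading a point in this space as one of the four quadrants follows. The quadrant labels and the convention that a value of exactly zero counts as positive are assumptions for illustration, not the project’s category scheme.

```python
def russell_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) point to one of the four quadrants of Russell's space.

    Labels are illustrative; zero is treated as positive on both axes.
    """
    if valence >= 0 and arousal >= 0:
        return "happy/excited"  # pleasant, high energy
    if valence < 0 and arousal >= 0:
        return "angry/tense"    # unpleasant, high energy
    if valence < 0:
        return "sad/depressed"  # unpleasant, low energy
    return "calm/relaxed"       # pleasant, low energy


for point in [(0.8, 0.5), (-0.6, 0.8), (-0.7, -0.5), (0.6, -0.6)]:
    print(point, russell_quadrant(*point))
```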


c. Prediction
Prediction is performed before clustering, to obtain the emotion that then needs to be
clustered with other emotions into a particular category.

5. RESOURCES
a. Artist’s Songs data
Like any other music or lyrics service, MusixMatch needs a user-oriented system that
attracts both song artists and song listeners to visit MusixMatch for the lyrics of a
particular song.

b. User’s data
It is quite obvious that human emotions change over time. Storing a user’s personal
music-access activity can help predict the user’s emotion at a particular time of day, and
based on those characteristics, music can be recommended to the user on his/her next visit
to MusixMatch. MusixMatch has a feature where users tag their mood alongside a song’s lyrics,
and these tags in turn help train the model for future predictions.
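As one simple illustration of this idea, the sketch below counts a user’s mood tags per period of the day and predicts the most frequent one for the next visit. The listening history, the period boundaries and the function names are hypothetical; a production system would more likely use a trained model than a frequency count.

```python
from collections import Counter, defaultdict

# Hypothetical listening history: (hour of day 0-23, mood tag the user attached).
HISTORY = [
    (7, "energetic"), (7, "energetic"), (8, "happy"),
    (13, "calm"), (14, "calm"), (14, "happy"),
    (22, "sad"), (23, "calm"), (23, "sad"),
]


def time_bucket(hour: int) -> str:
    """Coarse period of the day; the boundaries are an assumption."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "night"


def mood_profile(history):
    """Most frequent mood tag for each period of the day."""
    counts = defaultdict(Counter)
    for hour, mood in history:
        counts[time_bucket(hour)][mood] += 1
    return {period: c.most_common(1)[0][0] for period, c in counts.items()}


def predict_mood(profile, hour: int, default: str = "unknown") -> str:
    """Predicted mood for a visit at the given hour, or default if no history exists."""
    return profile.get(time_bucket(hour), default)


profile = mood_profile(HISTORY)
print(predict_mood(profile, 8))   # energetic
print(predict_mood(profile, 22))  # sad
```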

c. Tracks data
Data should also be stored for each track on which mood detection is performed, so that the
user can easily access that song’s lyrics again. Each track is segmented to identify the mood
in each segment, and the tracks data will help MusixMatch perform segmentation and clustering
for many tracks.


6. DATA SCIENCE TOOLS AND SOFTWARE


MusixMatch’s data scientists use the following machine learning techniques:
• Support Vector Machines
• Deep Neural Networks
• Convolutional Neural Networks
• Feature extraction
From a data scientist’s point of view, the project should also use basic visualization tools
to address two problems: songs that are assigned multiple emotion categories, and songs that
are assigned none. The latter causes the data to contain null values.

To remove or replace the null or undefined values, the project should use wrangling tools
such as Python and R, or edit the CSV files for the data manually.
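Using Python’s standard `csv` module, the replacement step could look like the sketch below. The column names, the `|` separator for songs with several moods and the `uncategorized` placeholder are assumptions; dropping such rows instead is an equally valid choice, depending on the analysis.

```python
import csv
import io

# Hypothetical CSV export of the segment dataset; the second row has no mood (NULL).
RAW = """track_id,start_s,end_s,mood
101,0,30,happy
101,30,60,
102,0,30,sad|calm
"""

UNCATEGORIZED = "uncategorized"  # explicit placeholder instead of a NULL value


def clean_rows(text):
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        mood = (row["mood"] or "").strip()
        # Replace a missing mood with the placeholder, and split multi-mood
        # cells (assumed to be '|'-separated) into a list of moods.
        row["moods"] = mood.split("|") if mood else [UNCATEGORIZED]
        del row["mood"]
        rows.append(row)
    return rows


for row in clean_rows(RAW):
    print(row["track_id"], row["start_s"], row["moods"])
```

Keeping every mood in a list handles both problems named above: a song with several categories keeps all of them, and a song with none gets an explicit label that later analysis can filter on.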

7. DATA ANALYSIS
The project uses neural networks trained on the Musixmatch data. Two models are trained:
a. Text based


The text-based model uses fastText, representing each word in a segment as a
300-dimensional embedding. Once all the embeddings are computed, the emotion of the music is
detected from them.
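The report does not say how the word embeddings of a segment are combined. The sketch below assumes one common approach, averaging (mean-pooling) the word vectors, and then swaps in a simple nearest-prototype classifier using cosine similarity in place of the trained neural network. The tiny 3-dimensional vectors stand in for 300-dimensional fastText vectors, and every vector value is hypothetical.

```python
import math

# Toy 3-dimensional vectors standing in for 300-dimensional fastText embeddings.
# All values are hypothetical.
WORD_VECTORS = {
    "sunshine": [0.9, 0.1, 0.0],
    "dance": [0.7, 0.6, 0.1],
    "tears": [-0.8, -0.2, 0.1],
    "alone": [-0.6, -0.5, 0.0],
}

# Hypothetical prototype vector per emotion category.
PROTOTYPES = {
    "happy": [1.0, 0.3, 0.0],
    "sad": [-1.0, -0.3, 0.0],
}


def segment_vector(words):
    """Mean-pool the vectors of the known words in a lyric segment."""
    vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not vectors:
        return None  # no known words: the segment cannot be embedded
    return [sum(axis) / len(vectors) for axis in zip(*vectors)]


def cosine(a, b):
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0:
        return 0.0  # guard against a zero vector
    return sum(x * y for x, y in zip(a, b)) / norm


def classify_segment(text):
    """Emotion whose prototype is most similar to the segment vector, or None."""
    vec = segment_vector(text.lower().split())
    if vec is None:
        return None
    return max(PROTOTYPES, key=lambda emotion: cosine(vec, PROTOTYPES[emotion]))


print(classify_segment("Dance in the sunshine"))  # happy
print(classify_segment("Tears when I am alone"))  # sad
```

A segment with no known words returns `None`, which is one concrete way the "no emotion linked" case described later can arise.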

b. Audio based
The audio-based classification uses a convolutional neural network (CNN) and splits the
music into 30-second time frames, so that emotion is detected for each segment as shown in
the figure below. The vocals are separated from the rest of the audio to identify words more
reliably.
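The 30-second segmentation step can be sketched as follows: compute the window boundaries for a track and group synchronized lyric lines by the window in which they start. This only prepares the inputs; the CNN that classifies each window and the vocal separation are not shown. The function names and the sample timestamps are hypothetical.

```python
from typing import List, Tuple

SEGMENT_S = 30.0  # window length in seconds, matching the 30-second frames in the text


def segment_bounds(duration_s: float, length_s: float = SEGMENT_S) -> List[Tuple[float, float]]:
    """Split a track into consecutive windows; the last one may be shorter."""
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        bounds.append((start, min(start + length_s, duration_s)))
        start += length_s
    return bounds


def lines_per_segment(lyric_lines, duration_s, length_s=SEGMENT_S):
    """Group synchronized (timestamp_s, text) lyric lines by the window they start in."""
    bounds = segment_bounds(duration_s, length_s)
    if not bounds:
        return []
    grouped = [[] for _ in bounds]
    for ts, text in lyric_lines:
        # Clamp timestamps at or past the end of the track into the last window.
        index = min(int(ts // length_s), len(bounds) - 1)
        grouped[index].append(text)
    return list(zip(bounds, grouped))


# Hypothetical synchronized lyrics for a 75-second clip.
lyrics = [(2.0, "line one"), (28.5, "line two"), (31.0, "line three"), (70.0, "line four")]
print(segment_bounds(75))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
for bounds, lines in lines_per_segment(lyrics, 75):
    print(bounds, lines)
```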

c. Analysis Evaluation
To meet the project goals, prediction and analysis must be performed on the MusixMatch data.
The project relies mostly on deep learning and neural networks, but regression can also be
used for pattern detection. Machine learning can be used in this context to find the most
suitable algorithm for detecting emotion changes within a song and for clustering the
emotions accordingly. Both the text model and the audio model must be trained properly on the
lyrical data; otherwise their outputs will conflict, and a song may end up with no emotion
linked to it.
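The report does not specify how the outputs of the text and audio models are combined. One common option, shown here as an assumption, is late fusion: average the per-emotion probabilities of the two models and keep the top emotion only if it is confident enough. The equal weighting and the 0.4 threshold are arbitrary illustrative values, and the probability dictionaries are hypothetical model outputs.

```python
from typing import Dict, Optional


def fuse(text_probs: Dict[str, float], audio_probs: Dict[str, float],
         text_weight: float = 0.5, min_confidence: float = 0.4) -> Optional[str]:
    """Late fusion: weighted average of two models' per-emotion probabilities.

    Returns None when no emotion reaches min_confidence, i.e. the song is left
    without an emotion because the two models disagree.
    """
    emotions = set(text_probs) | set(audio_probs)
    if not emotions:
        return None
    fused = {
        e: text_weight * text_probs.get(e, 0.0) + (1 - text_weight) * audio_probs.get(e, 0.0)
        for e in emotions
    }
    best = max(fused, key=fused.get)
    return best if fused[best] >= min_confidence else None


# The two hypothetical models agree: a label is assigned.
print(fuse({"happy": 0.7, "sad": 0.3}, {"happy": 0.6, "sad": 0.4}))  # happy
# The models disagree: no emotion is confident enough.
print(fuse({"happy": 0.5, "calm": 0.3, "sad": 0.2}, {"sad": 0.5, "angry": 0.3, "calm": 0.2}))  # None
```

Raising `text_weight` trusts the lyrics model more; lowering `min_confidence` labels more songs at the cost of more uncertain labels.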

The project lacks good visualization for identifying the types of categories and emotions;
as a result, the categories are static and fixed by the MusixMatch dataset. They should be
made dynamic, based on what the text and audio models learn.


In addition, regression analysis should be performed to find the time of day that best
matches each emotion, in order to provide better recommendations to users according to
their current activity.
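As a minimal sketch of such a regression, the code below fits an ordinary least-squares line to a hypothetical log of the hour at which songs were played and the arousal (energy) of those songs, then predicts the expected energy for a given hour. The data and the choice of arousal as the target are assumptions for illustration only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical listening log: hour of day a song was played and that song's arousal (energy).
hours = [7, 9, 12, 15, 18, 21, 23]
arousal = [0.9, 0.7, 0.5, 0.4, 0.2, -0.1, -0.3]

slope, intercept = fit_line(hours, arousal)


def predicted_arousal(hour):
    """Expected arousal of songs played at the given hour, from the fitted line."""
    return slope * hour + intercept


print(f"slope per hour: {slope:.3f}")
print(f"expected arousal at 08:00: {predicted_arousal(8):.2f}")
```

A straight line treats 23:00 and 00:00 as far apart, although they are adjacent; encoding the hour as sine and cosine features before fitting is a common way to respect the daily cycle.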

8. REFERENCES

Jamdar, A., Abraham, J., Khanna, K., & Dubey, R. (2015). Emotion analysis of songs based on
lyrical and audio features. Paper presented at International Journal of Artificial
Intelligence & Applications, Mumbai. Retrieved from
https://www.researchgate.net/publication/278733877_Emotion_Analysis_of_Songs_Based_on_Lyrical_and_Audio_Features

MusixMatch. (2019). Retrieved from https://developer.musixmatch.com/

MusixMatch. (2019). Retrieved from https://about.musixmatch.com/business/artificial-intelligence

Parisi, L., Francia, S., Olivastri, S., & Tavella, M. S. (2019). Exploiting synchronized lyrics
and vocal features for music emotion detection. Retrieved from
https://www.researchgate.net/publication/330410156_Exploiting_Synchronized_Lyrics_And_Vocal_Features_For_Music_Emotion_Detection

Total Words excluding References : 1900 words
