
Sentiment Analysis of Movie Reviews: A study on Feature Selection & Classification Algorithms

Tirath Prasad Sahu
National Institute of Technology, Raipur, India
tirsahu.it@nitrr.ac.in

Sanjeev Ahuja
National Institute of Technology, Raipur, India
sanju_a17@live.com

Abstract: Sentiment analysis is a sub-domain of opinion mining in which the analysis focuses on extracting the emotions and opinions of people towards a particular topic from structured, semi-structured or unstructured textual data. In this paper, we focus our sentiment analysis task on the IMDB movie review database. We examine the sentiment expression to classify the polarity of a movie review on a scale of 0 (highly disliked) to 4 (highly liked), perform feature extraction and ranking, and use these features to train our multi-label classifier to assign each movie review its correct label. Because movie reviews follow informal jargon and lack strong grammatical structure, we follow an approach based on structured N-grams. In addition, a comparative study of different classification approaches has been performed to determine the most suitable classifier for our problem domain. We conclude that our proposed approach to sentiment classification supplements the existing movie rating systems used across the web and will serve as a base for future research in this domain. Our approach using classification techniques achieves a best accuracy of 88.95%.

Keywords: Feature selection, Movie Review, Sentiment Analysis, Information Retrieval, Opinion Mining, Classifier.

I. INTRODUCTION

In the present era, the Internet has become a huge cyber database hosting a gigantic amount of data that is created and consumed by its users. The database has been growing at an exponential rate, giving rise to a new industry built around it, in which users express their opinions across channels such as Facebook, Twitter, Rotten Tomatoes and Foursquare. Opinions expressed in the form of reviews provide an opportunity for new explorations to find the collective likes and dislikes of the cyber community. One such domain is that of movie reviews, which affects everyone from the audience and film critics to the production company. The movie reviews posted on websites are not formal reviews; they are very informal and grammatically unstructured. Opinions expressed in movie reviews give a very true reflection of the emotion being conveyed. The heavy use of sentiment words to express a review inspired us to devise an approach that classifies the polarity of a movie using these sentiment words.

Sentiment analysis is a technology that will be very important in the next few years. With opinion mining, we can distinguish poor content from high-quality content. With the technologies available, we can know whether a movie has more good opinions than bad opinions and find the reasons why those opinions are positive or negative. Much of the early research in this field centered on product reviews, such as reviews of different products on Amazon.com [1], defining sentiments as positive, negative, or neutral. Most sentiment analysis studies now focus on social media sources such as IMDB, Twitter [2] and Facebook, requiring the approaches to be tailored to serve the rising demand for opinions in the form of text. Furthermore, performing phrase-level analysis of movie reviews proves to be a challenging task.

In this paper, we follow a lexical approach [3] using SentiWordNet [4] to determine the overall polarity of a movie review. We analyze and study the features that affect the sentiment score of the movie review text. We also use state-of-the-art classification algorithms to evaluate the performance and accuracy of the approach. Beyond studying the approach itself, we try to gain a deeper understanding of the problem domain.

II. BACKGROUND

Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic, or the overall contextual polarity of a document. The attitude may be the writer's judgment or evaluation, the affective state (the emotional state while writing), or the intended emotional communication (the emotional effect the writer wishes to have on the reader). Sentiment is defined as any kind of emotion; in the context of sentiment analysis, it is the opinion being expressed in the form of text or speech.

A. Feature Selection

Most researchers apply standard feature selection in their approach to improve computational performance, with a handful using more sophisticated approaches. Papers focusing entirely on feature selection to improve sentiment analysis are few. The best known among them is by Pang & Lee [8], who removed objective sentences on a testbed consisting of objective and subjective text, trained on SVM. Initially they found that the sentiment classification result actually deteriorated. They then concluded that it was more likely that sentences adjacent to discarded sentences improved the classification result over their baseline.
978-1-4673-6621-2/16/$31.00 © 2016 IEEE
Another work used sophisticated feature selection and found that using either information gain (IG) or a genetic algorithm (GA) improves accuracy.

B. Information Gain

Let D be a dataset of labeled texts, and let p_D be the probability that a random text in D is classified as positive. Classification should be fairly simple if the set is heavily biased towards positive or negative instances. On the contrary, if the set is evenly distributed, with equal likelihood of positive and negative instances, the task is difficult. The disorder in the set D is measured by its entropy:

\[ H(D) = -p_D \log_2 p_D - (1 - p_D) \log_2 (1 - p_D) \tag{1} \]

This can be interpreted as the average number of bits required to communicate the classification of each item in the corpus.

We need to choose relevant features that help us classify the set D. A feature is useful if it helps to lower the disorder in the corpus. Choosing a feature x divides the corpus into the instances where x is 0 and the instances where x is 1. Let these subsets be D_0 and D_1. If both of these sets are relatively organized, then we have reduced the disorder. Quantitatively, information gain is calculated by:

\[ IG(D, x) = H(D) - \frac{|D_0|}{|D|} H(D_0) - \frac{|D_1|}{|D|} H(D_1) \tag{2} \]

This is the difference between the entropy of the original dataset D and the weighted average entropy of the sets D_0 and D_1. The Information Gain Criterion chooses the features x_1, ..., x_k that maximize IG(D, x), selecting one feature at a time.

C. SentiWordNet

SentiWordNet [4] (SWN) is an extension of WordNet [10] developed by Esuli & Sebastiani, which augments the information in WordNet with the sentiment of the words in it. Each synset in SWN carries sentiment scores: a positive score and a negative score, along with an objectivity score. Together, these three scores give the relative strength of positivity, negativity and objectivity of each synset. These values were obtained using several semi-supervised ternary classifiers, each capable of determining whether a word was positive, negative, or objective. If all the classifiers agreed on a result, the highest value was assigned to the corresponding score; otherwise, the values for the positive, negative and objective scores were proportional to the number of classifiers that assigned the word to each class.

III. PREVIOUS WORK

Our work takes into account the previous work performed in the problem domain of sentiment analysis of movie reviews.

Baccianella, Esuli & Sebastiani [4] present SentiWordNet 3.0, a lexical resource created especially to assist sentiment classification. SentiWordNet 3.0 is an open resource for research purposes and is currently licensed to more than 300 research groups. The work focuses especially on the improvements that SentiWordNet 3.0 incorporates in the algorithm used for automatically annotating WordNet.

Godbole, Srinivasaiah & Skiena [5] present a system that assigns a positive or negative opinion score to each distinct entity in the text corpus. Their system consists of two phases: a sentiment recognition phase, in which opinion-expressing entities are determined, and a scoring phase, in which a relative score for each entity is determined.

Annett & Kondrak [6] observed that machine learning techniques for sentiment classification of movie reviews are quite successful, and that the type of features chosen has a dramatic impact on the accuracy of the classifier. They also demonstrated that there is an upper bound on the accuracy that a dictionary-based (lexical) approach can achieve.

The work of Pang et al. [7] is considered a standard in sentiment analysis of movie reviews. They consider the problem of classifying documents not by topic but by overall sentiment, e.g. determining whether a review is positive or negative. They inferred that classical machine learning techniques give better results than human-produced baselines. However, the three machine learning methods they employed (Naive Bayes, maximum entropy classification, and support vector machines) do not give results on sentiment classification as good as those on traditional topic-based categorization. In further work [8], Pang & Lee extract the subjective portions of a document using efficient techniques for finding minimum cuts in graphs; this greatly facilitates the incorporation of cross-sentence contextual constraints and provides an efficient means of integrating inter-sentence contextual information with traditional dictionary-of-words features.

Singh et al. [9] present experimental work on the performance evaluation of the SentiWordNet approach for document-level sentiment classification of movie reviews and blog posts. They varied the semantic features, scoring schemes and thresholds of the SentiWordNet approach, and compared it with two popular machine learning approaches for sentiment classification: Naive Bayes and SVM. The comparative performance of the approaches on both movie reviews and blog posts is illustrated through the standard evaluation metrics of Accuracy, F-measure and Entropy.

IV. A CLOSER LOOK AT THE PROBLEM

The domain of opinion mining is vast and has numerous applications. Sentiment analysis is tough because the same topic can be expressed in different ways. Moreover, words that express a positive sentiment in one statement may be negative in another. The movie reviews posted on the Internet are grammatically unstructured, and the expression of opinions on a topic is never standardized; one person's appreciation may differ from another's.

In this section we describe the problem statement for Sentiment Analysis of Movie Reviews:
1. Extracting Sentiment Words: This is the heart of sentiment analysis; every review statement contains sentiment words, which make a major contribution to determining the polarity of the review. For example, in “The movie was good and interesting”, the sentiment words good and interesting tell us that the polarity of the review is positive.
2. Sarcasm: It is really difficult to know the tone of the author in textual sentences; we cannot definitely say whether bad means bad or good. For example, “The movie was supposed to be hilarious?”
3. Parsing: What do the verb and/or adjective of a subjective or objective sentence really refer to?
4. Scaling: What quantity of data is input as a proportion of the total universe of users? 10% of the IMDB corpus gives a rough idea of what is going on, but the results are nowhere close to the resolution obtained with 50% of the reviews.
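
Item 1 above can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon (`TOY_LEXICON`, with made-up scores that are not taken from SentiWordNet) and two hypothetical helper functions; it is meant only to show how sentiment words drive the polarity decision, not to reproduce the method evaluated in this paper.

```python
import re

# Hypothetical toy lexicon: word -> polarity score.
# The values are illustrative assumptions, NOT SentiWordNet scores.
TOY_LEXICON = {
    "good": 0.6,
    "interesting": 0.4,
    "bad": -0.6,
    "boring": -0.5,
}


def extract_sentiment_words(text, lexicon=TOY_LEXICON):
    """Return (word, score) pairs for every token that appears in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]


def polarity(text, lexicon=TOY_LEXICON):
    """Classify a text by the sign of the summed scores of its sentiment words."""
    total = sum(score for _, score in extract_sentiment_words(text, lexicon))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    review = "The movie was good and interesting"
    print(extract_sentiment_words(review))
    print(polarity(review))
    # The sarcasm example from item 2 contains no lexicon word at all,
    # so a purely lexical lookup cannot detect its negative tone.
    print(polarity("The movie was supposed to be hilarious?"))
```

The last call hints at why sarcasm (item 2) is hard for a purely lexical method: the sentence carries a negative tone, yet none of its words is in the toy lexicon, so the sketch falls back to "neutral".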
V. PROPOSED METHODOLOGY

The basic methodology for determining polarity is a lexical approach: we look at the words comprising the document, apply algorithms that assign a sentiment score to those words, and determine the collective polarity. We have based our computational method on the publicly available library SentiWordNet [4]. In this work, for determining the polarity of the document, we have focused on two areas: 1) Feature Selection and Ranking, and 2) Classification using Machine Learning techniques. We use the Rotten Tomatoes movie review dataset comprising 8000 polar movie reviews. We label the polarity as follows: 0 - Strong Negative, 1 - Weak Negative, 2 - Neutral, 3 - Weak Positive, 4 - Strong Positive. The proposed methodology is summarized in the flow chart shown in Figure 1.

Figure 1: Proposed Methodology

A. Pre-processing

Pre-processing is the preparation of the dataset before any algorithm is applied to it. It is done to speed up the process of labeling the polarity of the document. Our approach to preprocessing the document is as follows:

1. Porter Stemming: The process of removing the more common morphological endings from words in English, reducing words to their root form. For example, abate, abates and abated are all reduced to the same stem.
2. Stopping: A technique that removes the most common words according to a stop-word list in order to reduce the size of the document. We have used a stop-word list containing prepositions (e.g. above, across) and determiners (e.g. a, an, the). For example, the text “Ram has made a good comeback” will be processed to “Ram made good comeback”.
3. Part of Speech Tagging: A preprocessing technique in which each word is marked with its part of speech, such as noun, verb or adverb, based on its relationship with adjacent and related words in the phrase, sentence or paragraph. The text “this sentence serves as an example of annotated text” will be tagged as “this (Determiner) sentence (Noun) serves (Verb) as (Preposition) an (Determiner) example (Noun) of (Preposition) annotated (Verb) text (Noun)”.

B. Feature Extraction

After the preprocessing phase, the next step was analyzing the data to find common observable patterns that may affect the polarity of the document. To calculate the document polarity, it is necessary to understand that the sentiment score of a word may be enhanced or diminished by its usage as well as by its relationship with nearby words. This allowed us to identify the features that affect the polarity of the document, which are as follows:

1. Positive Sentiment Words: Words having a positive sentiment score according to SentiWordNet. Example: good, awesome etc.
2. Negative Sentiment Words: Words having a negative sentiment score according to SentiWordNet. Example: bad, awful etc.
3. Positive Sentiment Bi-grams: Two consecutive words, both having a positive sentiment score according to SentiWordNet. Example, “Interesting and entertaining”.
4. Negative Sentiment Bi-grams: Two consecutive words, both having a negative sentiment score according to SentiWordNet. Example, “Gloomy and Boring”.
5. Positive Sentiment Tri-grams: Three consecutive words, all having a positive sentiment score according to
SentiWordNet. Example, “Interesting and Entertaining and Intriguing”.
6. Negative Sentiment Tri-grams: Three consecutive words, all having a negative sentiment score according to SentiWordNet. Example, “Gloomy and Boring and Uninteresting”.
7. Positive Sentiment words coupled with Adjective: Positive sentiment words preceded by an adjective. Example, “An awesome well-scripted movie”.
8. Negative Sentiment words coupled with Adjective: Negative sentiment words preceded by an adjective. Example, “A boring weak-scripted movie”.
9. Positive Sentiment words with repeated letters: Positive sentiment words with repetition in letters. Example, Awwwesome.
10. Negative Sentiment words with repeated letters: Negative sentiment words with repetition in letters. Example, Awwwfull.

C. Feature Impact Analysis and Reduction

Having identified the features by observation, we need to find the impact of each feature on the polarity of the document in order to set the scaling factor for each feature. To find the impact, we computed the Information Gain of each feature and used a Feature Ranking Algorithm to rank all the features.

Table 1: Feature Impact

Feature                                 Information Gain
Positive Sentiment Words                0.312
Positive Sentiment Bi-grams             0.26
Positive Sentiment words+Adjective      0.22
Negative Sentiment Words                0.207
Negative Sentiment Bi-grams             0.153
Negative Sentiment words+Adjective      0.116

As Table 1 shows, the feature Positive Sentiment Words has the highest information gain. Features whose information gain is zero can be neglected, which leaves us with the six features listed in Table 1.

D. Proposed Algorithm

Based on the results of the feature impact analysis, we propose the algorithm given in Algorithm 1.

Algorithm 1. Proposed Algorithm

Initialize all features to 0.
For each sentence, extract features.
For each extracted feature do:
    If negative sentiment word, then:
        If (word - 1) is negative && adjective, then:
            SentiScore = Score(word) + (Score(word - 1) * 0.5)
        Else if (word - 1) is negative sentiment, then:
            SentiScore = Score(word) + (Score(word - 1) * 0.6)
    If positive sentiment word, then:
        If (word - 1) is positive && adjective, then:
            SentiScore = Score(word) + (Score(word - 1) * 0.7)
        Else if (word - 1) is positive, then:
            SentiScore = Score(word) + (Score(word - 1) * 1.3)
    Else, SentiScore += Score(word)

The SentiScore (Sentiment Score) is calculated according to the six features we defined. The weight of each feature differs, based on the information gain ratio and the feature ranking we performed.

Once we have the SentiScore, we determine the label using the following algorithm:

Algorithm 2. Proposed Algorithm for determining the label

Initialize Count = 0
Count = No. of positive sentiment words + No. of negative sentiment words
Average Score = SentiScore / Count
If Average Score > 0.25, then:
    SentiLabel = 4
Else if Average Score > 0.00 && <= 0.25, then:
    SentiLabel = 3
Else if Average Score < 0.00 && >= -0.25, then:
    SentiLabel = 1
Else if Average Score < -0.25, then:
    SentiLabel = 0
Else SentiLabel = 2

E. Classification

With this we generated our result table, to which we applied classification techniques. We use the well-known classifiers Bagging, Random Forest, Decision Tree, Naive Bayes, K-Nearest Neighbor and Classification via Regression. Classification is performed in our methodology so that a machine can predict the class level of a movie review whenever one arrives.

VI. EXPERIMENT AND RESULT ANALYSIS

A. Dataset Description

We constructed a collection of 50,000 reviews from IMDB, with a maximum of 30 reviews per movie. The constructed dataset contains evenly distributed positive and negative reviews, so a random guess gives 50% accuracy. Following previous work on polarity classification, we consider only highly polarized reviews. A negative review has a score < 4, a neutral review has a score > 4 and < 7, and a positive review has a score > 7 out of 10. We split the dataset equally into training and test sets. We cross-validate classifier parameters on the training set using Random Forest to determine classifier performance. The data set contains the movie reviews along with their sentimental polarity levels. These
levels are on a scale of 0-10, so we use this scale to define the class label of the movie reviews, which is divided as follows:

1. Sentiment polarity level 0-2 belongs to class 0
2. Sentiment polarity level 2-4 belongs to class 1
3. Sentiment polarity level 5 belongs to class 2
4. Sentiment polarity level 6-8 belongs to class 3
5. Sentiment polarity level 8-10 belongs to class 4

B. Evaluation Measure

Five popular evaluation measures, namely precision, recall, F-measure, accuracy and area under the curve, have been used to evaluate the performance of the method used in the experiment.

1) Precision
In the field of information retrieval, precision is the fraction of retrieved documents that are relevant to the query:

\[ \text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} \tag{3} \]

2) Recall
Recall in information retrieval is the fraction of the documents relevant to the query that are successfully retrieved:

\[ \text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} \tag{4} \]

3) F-Measure
In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It computes the score from both the precision p and the recall r as follows:

\[ F = \frac{2 p r}{p + r} \tag{5} \]

4) Accuracy
The accuracy is the proportion of true results (both true positives and true negatives) among the total number of cases examined, i.e., true positives, true negatives, false positives and false negatives.

5) Area under the Curve
The ROC curve is a graphical plot that demonstrates the performance of a binary classifier by plotting the true positive rate against the false positive rate at various threshold settings; the area under this curve (AUC) summarizes that performance in a single value.

C. Case Study

We demonstrate our methodology with an example that gives a brief view of how it works and how the sentiment label is determined. We also compare our label with the label in the IMDB dataset to assess how well our algorithm performs.

Let us consider an example with the following review:

“This is the worst thing the TMNT franchise has ever spawned. I was a kid when this came out and I still thought it was deuce, even though I liked the original cartoon. There’s this one scene I remember when the mafia ape guy explains to his minions what rhetorical questions are. It’s atrocious. Many fans hate on the series for including a female turtle, but that didn’t bother me. So much so that I didn’t even remember her until I read about the show recently. All in all, it’s miserably forgettable. The only okay thing was the theme song. Guilty pleasure, they call it... Nananana ninja....”

Reading the movie review, we can readily observe that the polarity of the review is negative. The IMDB score for the review is 3/10, which indicates that the author displays a negative sentiment towards the movie.

Now let us see how our algorithm performs on this movie review. Table 2 shows the result table obtained when our algorithm is applied to the review.

Table 2: Features

Id   Pos   Pos Bigrams   Neg   PosAdj   Neg Bigrams   NegAdj   Label
33   13    2             11    3        3             3        1

Our algorithm gave this review a label of 1, which on our scale denotes a negative sentiment, in agreement with the IMDB rating.

D. Experiment Results

1) Evaluation Measure Results
The comparison has been performed on the evaluation measures discussed above. Table 3 clearly shows that Random Forest has the highest values among all the classification techniques, whereas Naive Bayes has the lowest.

Table 3: Evaluation Measure

Classification Technique   Precision   Recall   F-Measure   AUC     Accuracy
Random Forest              0.892       0.89     0.89        0.983   88.95%
Decision Tree              0.879       0.875    0.876       0.975   87.53%
COCR                       0.824       0.825    0.824       0.958   82.53%
Bagging                    0.888       0.886    0.886       0.966   88.57%
KNN                        0.891       0.889    0.889       0.98    88.86%
Naive Bayes                0.538       0.548    0.541       0.834   54.77%

2) Comparison with previous Models
Among the previous studies on sentiment analysis of movie reviews, the work by Pang & Lee [8], one of the earlier works in this domain, achieves an accuracy of 87.2%. Other works based on Pang & Lee's work have aimed at improving the baseline accuracy, such as those by Mullen [11] and Prabowo [12], which use a pattern-based approach with 10-fold and 5-fold cross validation and achieve a maximum accuracy of 87.3% [12]. On comparing our multi-class result with the previous methods, we can see in Table 4 that our approach achieves the
accuracy level of 88.95%, which is a promising result: it is comparable with previous state-of-the-art results that use only binary classes, whereas our approach uses five class levels for predicting the sentiment in a movie review.

Table 4: Comparison with Previous Model

Author                    Data Split   Classifier                      Cross Validation   Feature Selection   Baseline Accuracy   Best Accuracy
Pang et al. [7]           700+         NB, ME, SVM                     3-fold             No                  N/A                 82.9%
Pang & Lee [8]            1000+        NB, SVM                         10-fold            Yes                 87.15%              87.2%
Mullen & Collier [11]     700+         Hybrid SVM                      10-fold            No                  83.5%               86%
Prabowo & Thelwall [12]   1000+        Hybrid SVM                      10-fold            No                  87.3%               87.3%
Proposed Method           1000+        NB, KNN, Bagging, COCR, RF, DT  10-fold            Yes                 81.87%              88.95%

VII. CONCLUSION AND FUTURE SCOPE

In this work, we extracted new features that have a strong impact on determining the polarity of movie reviews and applied computational linguistic methods for the preprocessing of the data. We then performed the feature impact analysis by computing the information gain for each feature in the feature set and used it to derive a reduced feature set. Among six classification techniques, we found that the highest accuracy was given by Random Forest, with an accuracy of 88.95%.

In future, we would like to evaluate the effectiveness of the proposed sentiment classification features and techniques for other tasks, such as sentiment classification. We would like to apply in-depth concepts of NLP for better prediction of the polarity of the document. We would also like to extend this technique to other domains of opinion mining, like newspaper articles, product reviews, political discussion forums etc.

REFERENCES

[1] Gregory, Michelle L., et al. "User-directed sentiment analysis: Visualizing the affective content of documents." Proceedings of the Workshop on Sentiment and Subjectivity in Text. Association for Computational Linguistics, 2006.
[2] Pak, Alexander, and Patrick Paroubek. "Twitter as a Corpus for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.
[3] Taboada, Maite, et al. "Lexicon-based methods for sentiment analysis." Computational linguistics 37.2 (2011): 267-307.
[4] Baccianella, Stefano, Andrea Esuli, and Fabrizio Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.
[5] Godbole, Namrata, Manja Srinivasaiah, and Steven Skiena. "Large-Scale Sentiment Analysis for News and Blogs." ICWSM 7 (2007): 21.
[6] Annett, Michelle, and Grzegorz Kondrak. "A comparison of sentiment analysis techniques: Polarizing movie blogs." Advances in artificial intelligence. Springer Berlin Heidelberg, 2008. 25-35.
[7] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up?: sentiment classification using machine learning techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002.
[8] Pang, Bo, and Lillian Lee. "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts." Proceedings of the 42nd annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, 2004.
[9] Singh, V. K., et al. "Sentiment analysis of movie reviews: A new feature-based heuristic for aspect-level sentiment classification." Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), 2013 International Multi-Conference on. IEEE, 2013.
[10] Andreevskaia, Alina, and Sabine Bergler. "Mining WordNet for a Fuzzy Sentiment: Sentiment Tag Extraction from WordNet Glosses." EACL. Vol. 6. 2006.
[11] Mullen, Tony, and Nigel Collier. "Sentiment Analysis using Support Vector Machines with Diverse Information Sources." EMNLP. Vol. 4. 2004.
[12] Prabowo, Rudy, and Mike Thelwall. "Sentiment analysis: A combined approach." Journal of Informetrics 3.2 (2009): 143-157.
