
MINOR PROJECT – I

TWITTER SENTIMENT ANALYSIS USING NLTK IN PYTHON

PRESENTED TO:
Dr. Atul Kumar

PRESENTED BY:
Vanshika Gupta - 9916102005
Aditi Pareek - 9916102035
Ankit Pathak - 9916102084

OUR AIM
Sentiment classification is a way to analyze the subjective
information in a text and mine the opinion it expresses. Sentiment
analysis is the procedure by which information is extracted from the
opinions, appraisals and emotions of people regarding entities,
events and their attributes. The opinions of others have a
significant effect on decision making, for example when customers
shop online or choose between events, products and entities.
The main objective of our work is to perform sentiment analysis on
Indian political leaders such as Narendra Modi and Rahul Gandhi, using
people's opinions about these leaders, their progress, workers,
policies, etc., as extracted from Twitter. To achieve this objective,
we build a classifier based on supervised learning and perform live
sentiment analysis on the data collected for the different political
leaders.
MACHINE LEARNING
Machine learning is a core sub-area of Artificial Intelligence (AI) that uses
statistical techniques to give computer systems the ability to “learn” from data
without being explicitly programmed.

The process of learning begins with observations or data, such as examples,
direct experience, or instruction, in order to look for patterns in the data and
make better decisions in the future based on the examples that we provide. The
primary aim is to allow computers to learn automatically, without human
intervention or assistance, and to adjust their actions accordingly.

WHY MACHINE LEARNING?
Machine learning is starting to reshape how we live, and it’s time we
understood what it is and why it matters.
To better understand the uses of machine learning, consider some of the
instances where it is applied: the self-driving Google car, cyber fraud
detection, and online recommendation engines such as friend suggestions on
Facebook, Netflix showcasing the movies and shows you might like, and “more
items to consider” and “get yourself a little something” on Amazon.
All these examples echo the vital role machine learning has begun to take in
today’s data-rich world. Machines can aid in filtering useful pieces of
information that help in major advancements.

TECHNIQUES OF MACHINE LEARNING
I. CLASSIFICATION

▫ Classification assigns a document to a predefined category.
▫ Documents can be text or images.
▫ A popular method is the Naive Bayes classifier.
▫ Steps:
– Step 1: Train the program (build a model) using a training set in
which each document is labeled with a category, e.g. sports,
cricket, news, etc.
– For each word, the classifier computes the probability that it
makes a document belong to each of the considered categories.
– Step 2: Test the model against a test data set.

▫ http://en.wikipedia.org/wiki/Naive_Bayes_classifier
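
The two steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: it assumes a bag-of-words model with add-one (Laplace) smoothing, and the function names and toy training data are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Step 1: build a model from (list_of_words, category) pairs."""
    doc_counts = Counter()              # number of documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocabulary = set()
    for words, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(words)
        vocabulary.update(words)
    return doc_counts, word_counts, vocabulary


def classify(model, words):
    """Return the category with the highest log-probability for the words."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in doc_counts:
        # Log prior: how common the category is in the training set.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in words:
            # Add-one smoothing (an assumption of this sketch) keeps an
            # unseen word from driving the probability to zero.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def accuracy(model, labeled_docs):
    """Step 2: fraction of test documents that are classified correctly."""
    correct = sum(classify(model, words) == category
                  for words, category in labeled_docs)
    return correct / len(labeled_docs)


# Toy data, invented for illustration only.
training_set = [
    ("great speech today".split(), "positive"),
    ("love the new policy".split(), "positive"),
    ("terrible policy decision".split(), "negative"),
    ("worst speech ever".split(), "negative"),
]
test_set = [
    ("love the speech".split(), "positive"),
    ("terrible decision".split(), "negative"),
]

model = train_naive_bayes(training_set)
print(classify(model, "great policy".split()))
print(accuracy(model, test_set))
```

Working with log-probabilities instead of raw products avoids numerical underflow when documents contain many words.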
II. REGRESSION

▫ Regression is a measure of the relation between the mean value of
one variable (e.g. output) and corresponding values of other
variables (e.g. time and cost).
▫ Regression analysis is a statistical process for estimating the
relationships among variables.
▫ Regression means predicting the output value using training data.
▫ A popular one is logistic regression (binary regression), which
estimates the probability of a binary outcome.

▫ http://en.wikipedia.org/wiki/Logistic_regression
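
As a rough illustration of logistic regression, the sketch below fits a single weight and bias by gradient descent on the log-loss. The learning rate, epoch count, function names and the toy "hours studied" data are all assumptions chosen for the example; a library implementation would normally be used in practice.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight w and bias b by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Toy data, invented for illustration: hours studied and pass (1) / fail (0).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
passed = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print(sigmoid(w * 1.0 + b) < 0.5)  # low input: predicted class 0
print(sigmoid(w * 6.0 + b) > 0.5)  # high input: predicted class 1
```

The model outputs a probability, and thresholding it at 0.5 turns the regression into a two-class decision, which is why it appears later among the tweet classifiers.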

ML APPLICATIONS
▪ Spam Email Detection
▪ Image Search (Similarity)
▪ Clustering (K-Means) : Amazon Recommendations
▪ Text Summarization - Google News
▪ Rating a Review/Comment: Yelp
▪ Fraud detection : Credit card Providers
▪ Decision Making : e.g. Bank/Insurance sector
▪ Sentiment Analysis
▪ Speech Understanding – iPhone with Siri
▪ Face Detection – Facebook’s Photo tagging

SENTIMENT ANALYSIS
▪ Sentiment Analysis refers to the use of Natural
Language Processing (NLP) to identify the sentiment
expressed in text. NLP is a field of computer
science that focuses on the interaction between
computers and human (natural) languages.
▪ It extracts feelings: for/against, like/dislike,
good/bad, etc.
▪ It uses machine learning methods to extract,
identify, or otherwise characterize the sentiment
content of a text unit.
▪ It is also known as opinion mining.
▪ It derives the opinion or attitude of a speaker.

WHY SENTIMENT ANALYSIS?

Sentiment Analysis is highly useful in certain fields, such as:

▪ Business: In marketing, companies use it to develop their strategies and
to understand customers’ feelings towards products or brands, how
people respond to their campaigns or product launches, and why
consumers don’t buy some products.
▪ Politics: In the political field, it is used to keep track of political views and
to detect consistency and inconsistency between statements and actions at
the government level. It can also be used to predict election results.

FLOWCHART OF PROJECT
1. Stream Tweets: A Twitter Developer app is created and Tweets are
streamed from Twitter.
2. Pre-process Tweets: Using techniques like tokenizing, stop-word
removal, stemming, etc.
3. Classify Tweets: Using supervised learning models, pre-processed
Tweets are classified as Positive Tweet or Negative Tweet.
4. After classifying the Tweets, we check the accuracy of the
classification models.
TWEET EXTRACTION
Twitter provides a platform through which we can access data from a Twitter account
and use it for our own purposes. For this we have to log in with our Twitter credentials
on the dev.twitter.com website. On this website, we first create an application, providing
the necessary details, which will be used for streaming tweets. Once our application is
created, we obtain the consumer key, consumer secret key, access token key and access
secret key. These keys are used to authenticate the user when the user wants to access
Twitter data.
As the objective of this project is to analyze the sentiment of tweets posted about
political leaders, only tweets related to them should be collected. For this we create a
Python script which is used to fetch tweets from Twitter.
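
The fetching script itself depends on the Twitter API keys and a client library, which the slides do not show, so it is not reproduced here. The sketch below covers only the filtering idea, keeping tweets that mention a tracked leader. The keyword lists, function names and sample tweets are hypothetical.

```python
# Hypothetical keyword lists; the slides do not give the actual search terms.
TRACKED_LEADERS = {
    "Narendra Modi": ["narendra modi", "@narendramodi"],
    "Rahul Gandhi": ["rahul gandhi", "@rahulgandhi"],
}


def leaders_mentioned(tweet_text):
    """Return the tracked leaders whose keywords appear in the tweet."""
    text = tweet_text.lower()
    return [leader for leader, keywords in TRACKED_LEADERS.items()
            if any(keyword in text for keyword in keywords)]


def group_by_leader(tweets):
    """Collect tweets per leader and drop tweets that mention none of them."""
    grouped = {leader: [] for leader in TRACKED_LEADERS}
    for tweet in tweets:
        for leader in leaders_mentioned(tweet):
            grouped[leader].append(tweet)
    return grouped


# Sample tweets, invented for illustration only.
sample_tweets = [
    "Narendra Modi addressed a rally today",
    "Watching the cricket match tonight",
    "Rahul Gandhi spoke about the new policy",
]

grouped = group_by_leader(sample_tweets)
print(len(grouped["Narendra Modi"]), len(grouped["Rahul Gandhi"]))
```

In a streaming setup, the same keyword lists would typically be passed to the API as the terms to track, so that unrelated tweets are never downloaded.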

PREPROCESSING TWEETS
Data obtained from Twitter is not fit for extracting features. Tweets mostly consist
of a message along with usernames, empty spaces, special characters, stop words,
emoticons, abbreviations, hashtags, time stamps, URLs, etc. To make this data fit
for mining, we pre-process it using various functions of NLTK. In pre-processing we
first extract the main message from the tweet, then remove all empty spaces, stop
words (like is, a, the, he, them, etc.), hashtags, repeated words, URLs, etc. We
then replace all emoticons and abbreviations with their corresponding meanings; for
example, :-), =D, =), LOL, ROFL, etc. are replaced with happy or laugh. Once this is
done, the processed tweet is ready to be provided to the classifier for the required
results.
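
A minimal sketch of these pre-processing steps follows. It uses only Python's standard library as a stand-in for the NLTK functions the project relied on; the short stop-word list, the emoticon map and the function name are assumptions made for the example (NLTK's stopwords corpus is far more complete).

```python
import re

# Small illustrative stop-word list (assumption of this sketch).
STOP_WORDS = {"is", "a", "an", "the", "he", "she", "them", "to", "of", "and"}

# Illustrative emoticon and abbreviation map (assumption of this sketch).
REPLACEMENTS = {
    ":-)": "happy", "=)": "happy",
    "=d": "laugh", "lol": "laugh", "rofl": "laugh",
}


def preprocess_tweet(tweet):
    """Return the cleaned list of words for one raw tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"[@#]\w+", " ", text)                # remove usernames, hashtags
    tokens = []
    for raw in text.split():                             # split() also drops extra spaces
        word = REPLACEMENTS.get(raw)                     # emoticon, e.g. ":-)"
        if word is None:
            stripped = re.sub(r"[^a-z]", "", raw)        # drop special characters
            word = REPLACEMENTS.get(stripped, stripped)  # abbreviation, e.g. "lol!"
        if not word or word in STOP_WORDS:
            continue
        if tokens and tokens[-1] == word:                # skip immediately repeated words
            continue
        tokens.append(word)
    return tokens


print(preprocess_tweet("@user The rally is good good :-) LOL #India https://t.co/xyz"))
```

Emoticons are looked up before special characters are stripped, since stripping first would erase them entirely.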

CLASSIFYING TWEETS
To classify tweets into different classes (positive and negative), we build a classifier
which consists of several machine learning classifiers. To build our classifier we
used a Python library called Scikit-learn. Scikit-learn is a powerful and widely used
Python library which provides many classification algorithms. It also includes tools
for clustering, regression and visualization. To install Scikit-learn we use a single
command at the command line: 'pip install scikit-learn'.
To build our classifier, we use five built-in classifiers that come with the Scikit-learn
library, namely the Naïve Bayes Classifier, MultinomialNB Classifier, BernoulliNB
Classifier, Logistic Regression Classifier and Linear SVC, and one Ensemble Classifier.
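
One common way to build such an ensemble is majority voting, sketched below. The slides do not say how the project's ensemble combines its members, so voting is an assumption here, and the five stand-in classifiers are trivial keyword rules invented purely to make the example runnable; in the project they would be the trained models listed above.

```python
from collections import Counter


class VoteClassifier:
    """Combine several classifiers by majority vote."""

    def __init__(self, *classifiers):
        self.classifiers = classifiers

    def classify(self, features):
        """Return the winning label and the share of votes it received."""
        votes = Counter(clf(features) for clf in self.classifiers)
        label, count = votes.most_common(1)[0]
        return label, count / len(self.classifiers)


def has_word(word, label_if_present, label_otherwise):
    """Hypothetical stand-in for a trained classifier: a one-word rule."""
    return lambda words: label_if_present if word in words else label_otherwise


# Five stand-ins, so a two-class vote can never tie.
stand_ins = [
    has_word("good", "positive", "negative"),
    has_word("great", "positive", "negative"),
    has_word("happy", "positive", "negative"),
    has_word("bad", "negative", "positive"),
    has_word("worst", "negative", "positive"),
]

ensemble = VoteClassifier(*stand_ins)
print(ensemble.classify({"rally", "good", "happy"}))
```

The vote share gives a rough confidence measure: a 5-to-0 decision is more trustworthy than a 3-to-2 one.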

CONCLUSION
To perform sentiment analysis of tweets, the proposed system first
extracts the posts from Twitter by user. The system can also
compute the frequency of each term in a tweet.
We performed analysis on around 1,500 tweets in total for each political
leader, so that we could analyze the results, understand the patterns and
review people's opinions. We saw that different leaders have different
sentiment results according to their progress and working procedure.
We also saw how a social event, speech or rally causes a fluctuation in
people's sentiment, and learned which of the policies started by these
leaders are getting more support from people. The results showed that,
based on people's opinions, Narendra Modi is the more successful
political leader at present.
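
Term frequency can be computed with a simple counter. The sketch below assumes whitespace tokenization and lower-casing; the function name and sample tweet are invented for the example.

```python
from collections import Counter


def term_frequencies(tweet_text):
    """Count how often each term occurs in a single tweet."""
    return Counter(tweet_text.lower().split())


# Sample tweet, invented for illustration only.
freq = term_frequencies("Good policy good speech")
print(freq["good"], freq["policy"], freq["speech"])
```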

FUTURE SCOPE
Some of the future extensions that can be included in our project are:
1. A web-based application can be built for our work.
2. We can improve our system so that it can deal with sentences that
have multiple meanings.
3. We can work on multiple languages, such as Hindi, to provide
sentiment analysis to more local users.

