
PROJECT REPORT ON TWITTER FEED SENTIMENT ANALYSIS

BY USING NATURAL LANGUAGE PROCESSING (NLP)


A dissertation submitted in partial fulfillment of the
Requirements for the award of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
Submitted by
KOTHPALLI RAJA SEKHAR N140695
SHEIK BASHEERUDDIN N140945
KAMBALA BHANU PRASAD N140991
Under the Esteemed Guidance of
Ms. N. SWATHI
Faculty, Dept. of CSE
RGUKT- IIIT, Nuzvid.

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


RAJIV GANDHI UNIVERSITY of KNOWLEDGE
TECHNOLOGY
IIIT NUZVID 521202 KRISHNA Dist. A.P
ACADEMIC YEAR 2018-2019
RAJIV GANDHI UNIVERSITY of KNOWLEDGE TECHNOLOGY
IIIT NUZVID 521202 KRISHNA Dist. A.P
ACADEMIC YEAR 2018-2019
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE OF PROJECT COMPLETION


This is to certify that the project work entitled “Twitter Feed Sentiment Analysis by using
Natural Language Processing (NLP)” is a bona fide work done by

KOTHPALLI RAJA SEKHAR N140695


SHEIK BASHEERUDDIN N140945
KAMBALA BHANU PRASAD N140991
in partial fulfillment of the requirements for the award of marks for the Mini Project in the Department of
Computer Science and Engineering under my guidance during the academic year 2018-2019. This
project, in my opinion, is worthy of consideration for the award of marks in accordance with the
Department and University regulations.
Date:
Place:

Project Guide
Ms. N. SWATHI
Faculty, Department of Computer Science & Engineering

Head of the Department
R. UPENDAR RAO
Department of Computer Science & Engineering
DECLARATION
We certify that
A. The work contained in the project report is original and has been done by ourselves under the
general supervision of our supervisor.
B. The work has not been submitted to any other institute for any degree or diploma.
C. We have followed the guidelines provided by the institute in writing the thesis.
D. We have conformed to the norms and guidelines given in the Ethical Code of Conduct of the
Institute.
E. Whenever we have used material (data, theoretical analysis and text) from other sources, we
have given due credit by citing them in the text of the thesis and giving their details in the
references.
F. Whenever we have quoted written material from other sources, we have placed it under
quotation marks and given due credit by citing the sources and giving their details in the
references.
KOTHPALLI RAJA SEKHAR N140695
SHEIK BASHEERUDDIN N140945
KAMBALA BHANU PRASAD N140991

Table of Contents
1. Abstract
2. Introduction
2.1 Purpose
2.2 Scope
3. Problem Definition
3.1 Existing System
3.2 Proposed System
4. System Analysis
4.1 Data Flow Diagrams
4.2 UML Diagrams
4.3 Feasibility Analysis
5. Software Requirement Specification
5.1 Definition of SRS
5.2 Requirement Analysis
5.3 Requirement Specification
6. Document Design
6.1 System Design
6.2 Software Requirements
7. About Software
7.1 Overview of R
8. About Algorithms
8.1 Overview of Natural Language Processing (NLP)
9. Coding
10. Testing
11. Screens
Fig 1: Input-1
Fig 2: Output-1
Fig 3: Input-2
Fig 4: Output-2
12. Conclusion
13. Bibliography
1. ABSTRACT
By sampling a number of tweets and their respective retweets for a given keyword, we
analyse whether each is positive or negative. As people spend hours daily on social media sharing
their opinions in the form of tweets, we can easily gather these and run sentiment analysis on the words
across different topics. We run experiments on queries ranging from politics to humanity and present the
results in the form of a bar graph. Analysing public sentiment is important for many applications, such
as firms gauging the market response to their products and predicting political elections.
2. INTRODUCTION
2.1 Purpose:
Twitter Feed Sentiment Analysis is a tweet-analysing system that uses Natural Language
Processing (NLP) to decide whether a tweet is positive, negative or neutral.

2.2 Scope:
 Open source and freely available.
 TFSA is a good approach to producing the result of a survey by analysing all tweets and retweets on
the survey topic.
 TFSA uses certain specified keywords to decide whether a tweet is positive, negative or neutral; a
minimal sketch of this keyword-based scoring follows this list.
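To make the keyword-based scoring concrete, here is a minimal sketch using the "bing" lexicon from
the tidytext package (the same lexicon the implementation in Section 9 uses); the tweet text is
illustrative only, not taken from the collected dataset:

library(tidytext)   # unnest_tokens() and get_sentiments()
library(dplyr)

## One illustrative tweet (hypothetical text)
tweet <- data.frame(text = "The new update is great but the battery life is terrible",
                    stringsAsFactors = FALSE)

tweet %>%
  unnest_tokens(word, text) %>%                        # split the tweet into word tokens
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only words in the bing lexicon
  count(sentiment)                                     # here: 1 negative, 1 positive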
3. PROBLEM DEFINITION
3.1 Existing System:
In the existing system, we cannot know whether a tweet is positive or negative without reading it.
Twitter conducts polls to obtain survey results or to learn people's opinions on a certain problem. It is
difficult to analyse each and every tweet in a poll, so it takes a lot of time to declare the result of
the poll. Since Twitter holds so much tweet data, analysing even a group of tweets manually is
impractical. To overcome this problem we propose Twitter Feed Sentiment Analysis.

3.2 Proposed System:


As we cannot read every tweet and its respective retweets, we propose Twitter Feed Sentiment
Analysis. We register an app with a Twitter account and use the Twitter API to fetch tweets matching a
user-given keyword. The tweets are saved to a .CSV file, and we take just the column of tweet text for
preprocessing. After preprocessing, we split each tweet into tokens (tokenization) and apply
sentiment analysis to each token to determine whether it is positive, negative or neutral. The results are
displayed in the form of a bar graph, which is automatically saved to the current working directory in
.JPG format.
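A condensed sketch of this pipeline's first stage is given below, assuming the twitteR package; the
credentials are placeholders and the keyword is illustrative (the full script appears in Section 9):

library(twitteR)

## Placeholder credentials -- substitute the keys generated for your registered app
setup_twitter_oauth("CONSUMER_KEY", "CONSUMER_SECRET",
                    "ACCESS_TOKEN", "ACCESS_SECRET")

## Fetch English tweets for a keyword and flatten them into a data frame
tweets <- searchTwitter("elections", n = 100, lang = "en")
tweets_df <- twListToDF(tweets)

## Persist the raw pull so preprocessing can be rerun without refetching
write.csv(tweets_df, "Retrieved_tweets.csv", row.names = FALSE)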

4. SYSTEM ANALYSIS
System analysis is the first stage of the System Development Life Cycle model, and is a process
that starts with the analyst. Analysis is a detailed study of the various operations performed by a system
and of their relationships within and outside the system. One aspect of analysis is defining the
boundaries of the system and determining whether or not a candidate system should consider other
related systems.

4.1 DATAFLOW DIAGRAMS:


A data flow diagram is a graphical tool used to describe and analyse the movement of data
through a system, manual or automated, including the processes, stores of data, and delays in the
system. Data flow diagrams are the central tool, and the basis from which other components are
developed. The transformation of data from input to output, through processes, may be described
logically and independently of the physical components associated with the system. The DFD is also
known as a data flow graph or a bubble chart.
DFD Symbols:
Dataflow:
Data moves in a specific direction from an origin to a destination.

Process:
People, procedures, or devices that use or produce (transform) data; the physical component is
not identified.

Source:
An external source or destination of data, which may be people, programs, organizations or other
entities.

Data Store:
A place where data is stored or referenced by a process in the system.

Fig: Data flow diagram
4.2 UML DIAGRAMS:
We prepare UML diagrams to understand the system in a better and simpler way. A single
diagram is not enough to cover all aspects of the system, so UML defines various kinds of diagrams to
cover most of the aspects of a system.

Use Case Diagrams:


Use case diagrams are a set of use cases, actors, and their relationships. They represent the use
case view of a system. A use case represents a particular functionality of a system; hence, a use case
diagram is used to describe the relationships among the functionalities and their internal/external
controllers. These controllers are known as actors.

Fig: Use case diagram


Sequence Diagram:
Sequence diagrams describe interactions among classes in terms of an exchange of messages
over time. They're also called event diagrams. A sequence diagram is a good way to visualize and
validate various runtime scenarios. These can help to predict how a system will behave and to discover
responsibilities a class may need to have in the process of modeling a new system.

Fig: Sequence diagram

Class Diagrams:
Class diagrams are one of the most useful types of diagrams in UML, as they clearly map out the
structure of a particular system by modeling its classes, attributes, operations, and the relationships
between objects.
Fig: Class diagram

4.3 FEASIBILITY ANALYSIS:


The feasibility study is an important phase in the software development process. It enables the
developer to assess the product being developed. It covers the feasibility of the product in terms of its
development, operational use, and the technical support required for implementing it.
5. SOFTWARE REQUIREMENT SPECIFICATION
5.1 Definition of SRS:
The SRS is the means of translating the ideas in the minds of the clients into a formal document. It
fully describes what the software will do and how it will be expected to perform.

5.2 Requirement Analysis:


This stage aims to obtain a clear picture of the needs and requirements of the end user and of the
organization. Analysis involves interaction between the clients and the analysts. Usually analysts
research a problem by asking questions and by reading existing documents. The analysts have to
uncover the real needs of the user even if the users themselves do not know them clearly.
 The information domain of the problem must be represented and understood.
 The functions that the software is to perform must be defined.
 The behavior of the software as a consequence of external events must be defined.
 The analysis process must move from essential information to implementation detail.

5.3 Requirement Specification:


Specification Principles:
Software Requirements Specification plays an important role in creating quality software
solutions. Specification is basically a representation process. Requirements are represented in a manner
that ultimately leads to successful software implementation.
Requirements may be specified in a variety of ways. However, there are some guidelines worth
following:
 Representation format and content should be relevant to the problem.
 Information contained within the specification should be nested.
 Diagrams and other notational forms should be restricted in number and consistent in use.
 Representations should be revisable.

Software Requirements Specifications:


The software requirements specification is produced at the culmination of the analysis task. The
function and performance allocated to the software as part of system engineering are refined by
establishing a complete information description, a detailed functional and behavioral description, an
indication of performance requirements and design constraints, appropriate validation criteria, and other
data pertinent to requirements.

External Interface Requirements:


User Interfaces:
The user interface for the software shall be compatible with any browser such as Internet Explorer,
Mozilla Firefox or Chrome, through which the user can access the system. The user interface shall be
implemented using R x64 3.4.4.

Hardware Interfaces:
Since the application imports libraries and packages over the internet, all the hardware required to
connect to the internet forms the hardware interface for the system.
Software Interfaces:
We use R x64 3.4.4 and several packages for NLP: the "twitteR" library to retrieve
tweets from Twitter, the "tidyverse" library to load the core tidyverse packages, and the "dplyr" library
to manipulate the data.

Security Requirements:
The sentiment analyser is designed to analyse users' tweets. It analyses all tweets and
retweets, and implements Natural Language Processing to produce a result from the analysed
tweets.

Software Quality Attributes:


The product is adaptable to change; for example, it could be modified to handle not
only text but also image, audio and video files. The product is maintainable, i.e. in future the properties
of the product can be changed to meet new requirements.
6. DOCUMENT DESIGN
6.1 SYSTEM DESIGN:
Hardware Requirements:
 I3 processor
 2GB/4GB RAM
 60GB to 80GB Hard disk space

6.2 SOFTWARE REQUIREMENTS:


Language : R Language
Software : R Studio x64 3.4.4
Dataset : Amazon, Flipkart, Snapdeal.
Library : twitteR, tidyverse, dplyr, ggplot2, wordcloud2, tidytext and stringr.
Operating System : Windows 7/8/8.1/10
7. ABOUT SOFTWARE
7.1 OVERVIEW OF R:
R is a programming language and free software environment for statistical computing and
graphics supported by the R Foundation for Statistical Computing. The R language is widely used
among statisticians and data miners for developing statistical software and data analysis. Polls, data
mining surveys, and studies of scholarly literature databases show substantial increases in popularity in
recent years.
R is an implementation of the S programming language combined with lexical
scoping semantics, inspired by Scheme. S was created by John Chambers in 1976, while at Bell Labs.
There are some important differences, but much of the code written for S runs unaltered. R was created
by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently
developed by the R Development Core Team (of which Chambers is a member). R is named partly after
the first names of the first two R authors and partly as a play on the name of S. The project was
conceived in 1992, with an initial version released in 1995 and a stable beta version in 2000.
The capabilities of R are extended through user-created packages, which allow specialized
statistical techniques, graphical devices, import/export capabilities, reporting tools (knitr, Sweave), etc.
These packages are developed primarily in R, and sometimes in Java, C, C++, and FORTRAN. The R
packaging system is also used by researchers to create compendia to organize research data, code and
report files in a systematic way for sharing and public archiving. A core set of packages is included with
the installation of R, with more than 15,000 additional packages (as of September 2018) available at the
Comprehensive R Archive Network (CRAN), Bioconductor, Omegahat, GitHub, and other repositories.
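For this project, the packages listed in Section 6.2 can be installed from CRAN and loaded in a few
lines; a minimal sketch:

## Install any missing project packages from CRAN, then load them all
pkgs <- c("twitteR", "tidyverse", "tidytext", "dplyr", "ggplot2",
          "wordcloud2", "stringr")
missing <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing) > 0) install.packages(missing)
invisible(lapply(pkgs, library, character.only = TRUE))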
8. ALGORITHMS
8.1 Natural Language Processing (NLP):
Natural language processing (NLP) is a subfield of computer science, information engineering,
and artificial intelligence concerned with the interactions between computers and human (natural)
languages, in particular how to program computers to process and analyze large amounts of natural
language data. Challenges in natural language processing frequently involve speech recognition, natural
language understanding, and natural language generation.
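To make this concrete in the project's own language, the sketch below applies two basic NLP steps,
tokenization and stop-word removal, to one illustrative sentence (assuming the tidytext package and
its bundled stop_words table):

library(tidytext)
library(dplyr)

sentence <- data.frame(text = "Natural language processing lets computers analyze large amounts of text",
                       stringsAsFactors = FALSE)

sentence %>%
  unnest_tokens(word, text) %>%       # tokenization: one row per word, lower-cased
  anti_join(stop_words, by = "word")  # drop common function words ("of", "the", ...)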
The history of natural language processing generally started in the 1950s, although work can be
found from earlier periods. In 1950, Alan Turing published an article titled "Computing Machinery and
Intelligence" which proposed what is now called the Turing test as a criterion of intelligence.
The Georgetown experiment in 1954 involved fully automatic translation of more than sixty
Russian sentences into English. The authors claimed that within three to five years, machine translation
would be a solved problem. However, real progress was much slower, and after the ALPAC report in
1966, which found that ten years of research had failed to fulfill expectations, funding for machine
translation was dramatically reduced. Little further research in machine translation was conducted until
the late 1980s, when the first statistical machine translation systems were developed.
9. CODING
Packages:
## Load the required packages (tidyverse supplies tidyr's spread() and the tibble type)
library(twitteR)
library(tidyverse)
library(tidytext)
library(dplyr)
library(ggplot2)

## Initiate the handshake with Twitter. These are placeholder credentials;
## substitute the keys generated for your own registered Twitter app.

consumer_key <- 'YOUR_CONSUMER_KEY'
consumer_secret <- 'YOUR_CONSUMER_SECRET'
access_token <- 'YOUR_ACCESS_TOKEN'
access_secret <- 'YOUR_ACCESS_SECRET'

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

setwd("C:/Users/Bhanu Prasad/Desktop/mp")

## Read the number of keywords, coercing the console input to an integer
No_of_Keys <- as.integer(readline(prompt = "Enter number of keywords: "))

keys <- character(No_of_Keys)
for (i in 1:No_of_Keys) {
  keys[i] <- readline(prompt = "Enter keyword: ")
}

## searchTwitter() expects a single query string, so join multiple keywords with OR
query <- paste(keys, collapse = " OR ")
tweets <- searchTwitter(query, since = '2019-01-01', until = '2019-04-25', n = 1000, lang = "en")

## Flatten the list of status objects into one data frame (twListToDF(tweets) is equivalent)
tweets <- do.call("rbind", lapply(tweets, as.data.frame))


## Persist the raw pull, then reload it so later steps can be rerun without refetching
write.csv(tweets, file = "C:/Users/Bhanu Prasad/Desktop/mp/Retrieved_tweets.csv")
tweets <- read.csv("Retrieved_tweets.csv")

## Pre-processing the data:

Atest <- tweets$text
## Strip non-ASCII characters (emoji etc.), then remove URLs
tweets_list <- lapply(Atest, function(x) iconv(x, "latin1", "ASCII", sub = ""))
tweets_list <- lapply(tweets_list, function(x) gsub("htt.*", ' ', x))

## Flatten to a plain character vector ready for tokenization
tweets_list <- unlist(tweets_list)

## Tokenization: one row per word
Atokens <- tibble(text = tweets_list) %>% unnest_tokens(word, text)

## Score tokens against the bing lexicon and tally the counts per sentiment
Ascore <- Atokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)  # no. of positive words - no. of negative words
## Plot data on sentiments:
## Reshape the one-row score table into a (sentiment, score) data frame for ggplot
Ascore <- data.frame(
  Senti  = c("Negative", "Positive", "Net"),
  Scores = c(Ascore$negative, Ascore$positive, Ascore$sentiment)
)

graph <- ggplot(Ascore, aes(x = Senti, y = Scores, fill = Senti)) +
  geom_bar(stat = "identity", alpha = 0.7) +  # alpha is a fixed aesthetic, not a mapping
  labs(x = "Sentiment", y = "Scores", title = "Sentiment plot")
graph

## Save the plot; print() is required for ggplot objects when writing to a device from a script
jpeg("rplot.jpg", width = 700, height = 700)
print(graph)
dev.off()
10. TESTING
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. Testing presents an interesting anomaly for the
software engineer.
Testing Principles:
 All tests should be traceable to end-user requirements.
 Tests should be planned long before testing begins.
 Testing should begin on a small scale and progress towards testing in the large.
 Exhaustive testing is not possible.
 To be most effective, testing should be conducted by an independent third party; a minimal sketch of
an automated test for the scoring step follows this list.
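As one example of a small, repeatable test, the sketch below wraps the lexicon-scoring step from
Section 9 in a hypothetical helper function and checks it against two words whose bing labels are
known; it assumes the testthat package, which is not part of the project's stated library list:

library(testthat)
library(tidytext)
library(dplyr)

## A hypothetical wrapper around the scoring step from Section 9
score_words <- function(words) {
  data.frame(word = words, stringsAsFactors = FALSE) %>%
    inner_join(get_sentiments("bing"), by = "word") %>%
    count(sentiment)
}

test_that("known lexicon words are classified as expected", {
  result <- score_words(c("great", "terrible"))
  expect_equal(result$n[result$sentiment == "positive"], 1L)
  expect_equal(result$n[result$sentiment == "negative"], 1L)
})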
11. SCREENS
Input:

Fig 1: Input-1

Output:

Fig 2: Output-1
Input:

Fig 3: Input-2

Output:

Fig 4: Output-2
12. CONCLUSION
Twitter Feed Sentiment Analysis (TFSA) is an approach to analysing tweets on Twitter and
categorizing them as positive, negative or neutral using sentiment analysis. We focused on Twitter
and implemented an R program to perform sentiment analysis. Natural Language Processing
techniques have been used for sentiment analysis on Twitter. The system is very useful for getting
feedback on surveys or policies.
13. BIBLIOGRAPHY
1. https://www.r-bloggers.com/twitter-sentiment-analysis-with-r/
2. https://link.medium.com/RLVd1kh0XV
3. https://link.medium.com/rVqSgqt0XV
4. http://dataaspirant.com/2018/03/22/twitter-sentiment-analysis-using-r/
