categorization of Tweets as neutral, positive or negative. In recent years, the trend of sharing health-related issues on social networks has increased, and analyzing user-posted content has emerged as a challenge in semantic analysis [28]. Adverse Drug Reactions (ADRs) are a leading cause of death among patients admitted to hospitals, and patients report ADRs on social media on their own initiative. Twitter and other forums focusing on Spanish respondents were explored by the TrendMiner project to monitor content related to drugs and their reactions. Social networks are vast repositories of publicly created data that contain information representing the general mood about brands and events [26]. In the airline industry, collecting feedback from customers is an important task for which the conventional methods are inefficient and inaccurate [29]. The study created a dataset of 107,866 Tweets for the major airlines in North America. Tweets were labeled as positive, negative or neutral for the segregation process.

The performance of various sentiment classification approaches was compared to develop an ensemble approach. As per the study, sentiment analysis and classification in the airline industry is accurate enough to be used for customer satisfaction investigation [30]. Researchers applied social media opinion mining to a discussion dataset on Poland's 2010 political crisis, which contained a high conflict level and deep polarization [31]. The study also analyzed the relation between online sentiment and its impact on the Java Governing Board's decision to open-source Java.

The researchers studied the application of the sentiment analysis process in economic and financial modeling and concluded that Tweet sentiment analysis is one of the best methods to automate the sentiment analysis process [32]. They proposed an ontology-based method that segments an individual Tweet rather than treating each Tweet as a single expression assigned one sentiment score. The research suggested that a brand's social network is an effective way of attracting consumers and attaching them emotionally to the brand [33]. Also, existing consumers' bonding with the brand is reflected in the brand's social network on different social network websites. Sentiment analysis helps organizations to scrutinize social media content on a real-time basis and act accordingly [34]. Sentiment analysis of social media can also help in picking promising stocks for better returns.

III. RESEARCH METHOD

Social networks have become a valuable source of knowledge in the form of sentiments for a number of sectors such as public opinion management, brand building and customer relationship management [35]. Social networks provide APIs to query them for a particular keyword, generally preceded by # (hashtag). As most social networks have users on the order of millions, any current news item, trend, product, organization or service is flooded with numerous user opinions. This information is of great use for governments and corporate managers to assess the general public's opinion and capture experience feedback. Current trends, marketing promotions and media releases get quick attention, and the user feedback can be accessed and monitored on a real-time basis.

a. DATASET CREATION

In the research, tweets for #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein, the Twitter handles for the Telecomm companies, were extracted on a daily basis from 1st March 2017 to 31st July 2017. This period was selected because Reliance Jio had announced that it would start charging its customers for all its services from 1st April 2017 (though actual charging of services started on 15th April). This could be a real test for Reliance Jio of whether the general public was inclined towards its services amidst the presence of operators like Aircel, Bharti Airtel, Idea Cellular and Vodafone India, which had been operating for many years. The combined sentiment of the general public needed to be compared to check whether the public was showing interest in the new entrant Reliance Jio or in the existing market leaders, when all of them were now offering mobile services at similarly competitive tariffs.

The number of tweets extracted in a particular instance is restricted by Twitter rate limiting. During the said period, 18,457 tweets for #aircel, 37,546 tweets for #airtelindia, 14,592 tweets for #ideacellular, 51,478 tweets for #reliancejio and 31,578 tweets for #vodafonein were extracted. A filter was then applied to the Tweet column of this dataset, which contains the Tweet content, to remove duplicate tweets generated from retweets. As a result, the distinct tweets for #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein were 4,045, 7,674, 5,346, 10,591 and 7,894 respectively. The tweets in an individual dataset mention the other operators in some cases, where the user wants to compare them on parameters such as network usage charges, coverage, call drops and Internet speed. Figure 2 shows a snapshot of the #reliancejio dataset.

b. DATASET PROCESSING

This process discards incomplete, incorrect or irrelevant data. Tweets contain advertisement links of different companies involved in paid marketing, and other irrelevant data. Many users post tweets in non-English text, which cannot be used in general text processing or Natural Language Processing. Such data must be removed from the dataset. This module identifies and removes the unwanted data, and the cleaned data are passed to the next module.
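The duplicate filtering and cleaning steps can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the `RT @user:` retweet convention, the URL pattern and the ASCII-ratio heuristic used in place of real language detection are all assumptions made for illustration.

```python
import re

# Assumed patterns: links start with http(s)://, retweets start with "RT @user:".
URL_PATTERN = re.compile(r"https?://\S+")
RETWEET_PREFIX = re.compile(r"^RT\s+@\w+:\s*")


def dedup_key(text):
    """Key used to match a retweet with its original tweet."""
    return RETWEET_PREFIX.sub("", text).lower()


def is_mostly_ascii(text, threshold=0.9):
    """Crude stand-in for language detection (assumption): mostly-ASCII text is kept."""
    if not text:
        return False
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return ascii_chars / len(text) >= threshold


def clean_tweets(tweets):
    """Strip links, drop non-English-looking text and drop duplicates from retweets."""
    seen = set()
    cleaned = []
    for raw in tweets:
        # Remove links and collapse the whitespace they leave behind.
        text = " ".join(URL_PATTERN.sub("", raw).split())
        key = dedup_key(text)
        if not key or key in seen or not is_mostly_ascii(text):
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


sample = [
    "Great speed on #reliancejio",
    "RT @user: Great speed on #reliancejio",
    "Offer! http://ad.example/x #airtelindia",
    "\u0916\u0930\u093e\u092c network",
]
print(clean_tweets(sample))
```

A production pipeline would normally use a proper language identifier instead of the ASCII ratio, which misclassifies English tweets containing many emoji.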
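A. COMMUNITY DETECTION

Popular tweets attract user attention and get shares, retweets and responses. Multiplicity of edges amongst nodes leads to community evolution and growth. The communities are the channels for online word-of-mouth propagation. The research used Python libraries to detect communities in the datasets by using the betweenness centrality measure of network graphs. The betweenness centrality of a node v, BC(v), is defined in its standard form as:

BC(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}

where \sigma_{st} is the number of shortest paths between nodes s and t, and \sigma_{st}(v) is the number of those paths that pass through v.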
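Betweenness centrality measures, for each node, the share of shortest paths between other node pairs that pass through it. The text does not say which Python library was used, so the following is only a self-contained sketch using Brandes' algorithm on an unweighted, undirected graph; the adjacency-dictionary input format and the function name are assumptions.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness centrality via Brandes' algorithm.

    graph: dict mapping each node to an iterable of its neighbours.
    Every neighbour must also appear as a key (undirected, unweighted graph).
    """
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source: count shortest paths (sigma)
        # and record the predecessors of each node on those paths.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair (s, t) was counted twice in an undirected graph.
    return {v: c / 2.0 for v, c in centrality.items()}


# Hypothetical three-user chain: "b" relays everything between "a" and "c".
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness_centrality(chain))  # {'a': 0.0, 'b': 1.0, 'c': 0.0}
```

Nodes with high scores act as bridges between groups of users, which is why the measure is useful for separating communities.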
Semantic analysis was performed on each hashtag's dataset to classify the Tweets into five levels of polarity: N+, N, NEU, P and P+. N+ and N represent a negative polarity generated for a negative opinion in the text, whereas P and P+ represent positive polarity. NEU polarity is generated for a neutral opinion or when the polarity cannot be calculated. Table I shows the weights assigned to the standard polarities. Tweets in the NEU category contain a neutral opinion about the concept or entity and hence have been given a weight of 0 (zero). Negative opinion Tweets (N and N+) have been given weights of -1 and -2, depending upon the magnitude of the negative opinion. Similarly, positive opinion Tweets (P and P+) have weights of +1 and +2.

A sentiment dictionary is a critical part of sentiment analysis, used to recognize the sentiment tokens in any document [36]. This dictionary contains words, phrases, related concepts, sentiment polarity and other valuable information used to identify significant phrases in the source and establish agreements and weights for them. Table II shows the custom dictionary created for performing sentiment analysis of the Tweet datasets. Some of the most influential keywords for describing likes and dislikes about a telecom brand have been included in this dictionary to capture the essence of user comments.
Keyword             Type
App                 Entity
Call drop           Concept
Cost                Concept
Kbps                Concept
Mbps                Concept
Network coverage    Concept
Port to             Concept
Signal              Entity
Support             Concept
Figure 3 describes the overall architecture of the sentiment analysis based prediction model for monthly subscriber addition of Telecomms. The sentiment analysis was performed separately on the datasets of distinct tweets of #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein. The sentiment score of a particular Tweet is calculated from its identified polarity and the weight assigned to that polarity. The total sentiment score of a hashtag is obtained by summing the scores of all its tweets, and the following results were obtained.

Table III and Figure 4 show the polarity distribution for tweets of #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein for the base month, March 2017. It is evident that there are a significant number of Tweets with polarity value 0. These are the neutral sentiment Tweets for which a positive or negative sentiment could not be determined. The frequency of neutral sentiment does not contribute to the overall sentiment score.
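The weighted scoring described above can be sketched as follows. The polarity weights are those stated in the text (N+ = -2, N = -1, NEU = 0, P = +1, P+ = +2); the function name and the sample labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter

# Weights for the five polarity levels, as stated in the text.
POLARITY_WEIGHTS = {"N+": -2, "N": -1, "NEU": 0, "P": 1, "P+": 2}


def hashtag_sentiment_score(polarity_labels):
    """Sum of (number of tweets with a polarity) x (weight of that polarity)."""
    counts = Counter(polarity_labels)
    unknown = set(counts) - set(POLARITY_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown polarity labels: {sorted(unknown)}")
    return sum(POLARITY_WEIGHTS[label] * n for label, n in counts.items())


# Hypothetical labels for six tweets of one hashtag.
labels = ["P+", "P", "NEU", "N", "N+", "P"]
print(hashtag_sentiment_score(labels))  # 2 + 1 + 0 - 1 - 2 + 1 = 1
```

Because NEU carries a weight of zero, adding or removing neutral tweets leaves the score unchanged, which matches the statement that neutral frequency does not contribute to the overall score.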
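[Figure 3. Overall architecture of the sentiment analysis based prediction model. Labels recovered from the figure: Web (www.coai.com, www.twitter.com), Ontology assignment, Sentiment Analysis.]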
Figure 4. Sentiment count for positive and negative tweets of #aircel, #airtelindia, #ideacellular, #reliancejio, #vodafonein
Figure 4 shows the number of positive and negative sentiment tweets for #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein. For #airtelindia and #vodafonein, the number of negative sentiment tweets is greater than the number of positive sentiment tweets, and hence the overall score of #airtelindia and #vodafonein is negative. The overall sentiment score of #reliancejio is positive, as its positive sentiment count is greater than its negative sentiment count. A sentiment count without weight or polarity does not carry the essence of the dataset. The sentiment score is calculated by multiplying the number of tweets having a particular polarity by the weight of that polarity.

Table IV shows that the overall sentiment scores of #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein are 855, 2841, 1353, 7530 and 3752 respectively. The scores indicate that the general public has a comparatively stronger positive opinion about Reliance Jio, the new entrant in the Telecomm field. This reflects the respondents' individual experience of the brand in terms of different parameters, based on different issues related to mobile network service and pricing. It affects their bonding with the brand they are using, or their decision about the brand to which they want to switch.

Table V shows the month-wise overall sentiment score for the study duration. Table VI shows the month-wise subscriber addition data. The data for the number of subscribers added each month for Aircel, Bharti Airtel, Idea Cellular, Reliance Jio and Vodafone India was downloaded from the website of the Cellular Operators Association of India [37]. The data for March has been used as the base data. The difference between an operator's sentiment score and its previous month's sentiment score, expressed relative to the previous month's score, gives the growth rate percentage.
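The growth rate calculation can be checked with a short sketch. It assumes the growth rate is the month-over-month change divided by the previous month's score, which is what the tabulated #airtelindia values reproduce (2841 followed by 2871 gives 1.06); the function names and the use of the absolute value for a negative baseline are illustrative assumptions.

```python
def growth_rate_percent(previous_score, current_score):
    """Percentage change of the current score relative to the previous month."""
    if previous_score == 0:
        raise ValueError("previous score must be non-zero")
    # abs() keeps the sign meaningful if a baseline score is negative (assumption).
    return (current_score - previous_score) / abs(previous_score) * 100.0


def monthly_growth(scores):
    """Growth rate (%) of each month over the month before it, rounded to 2 places."""
    return [round(growth_rate_percent(p, c), 2) for p, c in zip(scores, scores[1:])]


# Sentiment scores tabulated for #airtelindia, starting with the March base value.
airtel_scores = [2841, 2871, 2895, 2916, 976]
print(monthly_growth(airtel_scores))  # [1.06, 0.84, 0.73, -66.53]
```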
TABLE IV. TELECOMM SENTIMENT SCORES FOR MARCH 2017

Polarity →      N+    N     NEU   P     P+    Total
Weight (W) →    -2    -1    0     1     2

Month-wise overall sentiment score and growth rate (%) over the previous month:

Hashtag         March   April   Growth   May     Growth   June    Growth   July    Growth
#airtelindia    2841    2871    1.06     2895    0.84     2916    0.73     976     -66.53
#ideacellular   1353    1347    -0.44    1350    0.22     1351    0.07     1334    -1.26
#reliancejio    7530    10722   42.39    11676   8.90     12868   10.21    1554    -87.92
#vodafonein     3752    3767    0.40     3788    0.56     3806    0.48     3779    -0.71
TABLE VII. MONTH-WISE PREDICTED AND ACTUAL GROWTH RATE: CORRELATION ANALYSIS

(a) APRIL MONTH
                                                   Predicted growth rate (%)   Actual growth rate (%)
Predicted growth rate (%)   Pearson Correlation    1                           .993
                            Significant value                                  .0001
                            Number of companies    5                           5
Actual growth rate (%)      Pearson Correlation    .993                        1
                            Significant value      .0001
                            Number of companies    5                           5

(b) MAY MONTH
                                                   Predicted growth rate (%)   Actual growth rate (%)
Predicted growth rate (%)   Pearson Correlation    1                           .949
                            Significant value                                  .025
                            Number of companies    5                           5
Actual growth rate (%)      Pearson Correlation    .924                        1
                            Significant value      .025
                            Number of companies    5                           5

(c) JUNE MONTH
                                                   Predicted growth rate (%)   Actual growth rate (%)
Predicted growth rate (%)   Pearson Correlation    1                           .865
                            Significant value                                  .058
                            Number of companies    5                           5
Actual growth rate (%)      Pearson Correlation    .865                        1
                            Significant value      .058
                            Number of companies    5                           5

(d) JULY MONTH
                                                   Predicted growth rate (%)   Actual growth rate (%)
Predicted growth rate (%)   Pearson Correlation    1                           .893
                            Significant value                                  .041
                            Number of companies    5                           5
Actual growth rate (%)      Pearson Correlation    .893                        1
                            Significant value      .041
                            Number of companies    5                           5
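The correlation analysis relies on the Pearson coefficient between the predicted and actual growth rates of the five operators. The sketch below shows how such a coefficient, and the t statistic behind its significance value, can be computed with the standard library only. The function names are assumptions, and the sample growth rates are invented for illustration; they are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Invented growth rates (%) for five operators, for illustration only.
predicted = [1.0, -0.5, 40.0, 0.4, 2.0]
actual = [0.8, -0.2, 35.0, 0.6, 1.5]
r = pearson_r(predicted, actual)
print(round(r, 3), round(t_statistic(r, len(predicted)), 2))
```

With only five companies there are three degrees of freedom, so even a high coefficient needs a large t value to be significant. For example, `t_statistic(0.865, 5)` is about 2.99; converting it to a significance value requires the t distribution, which the standard library does not provide.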
suited to their requirements. Data mining and sentiment analysis techniques can be used by managers to take timely actions to predict and prevent such customer churn [38], [39].

REFERENCES

[1] J. Poushter, “Smartphone Ownership and Internet Usage Continues to Climb in Emerging Economies,” Pew Research Center, pp. 1–5, 2016.
[2] S. I. World, “World Internet Users Statistics and 2018 World Population Stats,” Internet world stats, 2018. [Online]. Available: https://www.internetworldstats.com/stats.htm. [Accessed: 26-May-2018].
[3] S. Ranjan and S. Sood, “Exploring Twitter for Large Data Analysis,” International Journal of Advance Research in Computer Science and Software Engineering, vol. 6, no. 7, pp. 325–330, 2016.
[4] Y. M. Li and Y. L. Shiu, “A diffusion mechanism for social advertising over microblogs,” Decision Support System, vol. 54, no. 1, pp. 9–22, 2012.
[5] I. Taxidou and P. Fischer, “Realtime analysis of information diffusion in social media,” Proc. VLDB Endowment, vol. 6, no. 12, pp. 1416–1421, 2013.
[6] J. Prager, “Open-Domain Question–Answering,” Foundations and Trends® in Information Retrieval, vol. 1, no. 2, pp. 91–231, 2006.
[7] L. Harris and A. Rae, “Social networks: The future of marketing for small business,” Journal of Business Strategy, vol. 30, no. 5, pp. 24–31, 2009.
[8] S. K. Janane, M. S. Keerthana, and B. Subbulakshmi, “Hybrid Classification for Sentiment Analysis of Movie Reviews,” International Journal of Engineering Sciences and Research, vol. 7, no. 4, pp. 724–728, 2018.
[9] S. Ranjan, “Online Word of Mouth Communication in Bollywood Tweet Dataset,” International Journal for Research in Applied Science & Engineering Technology, vol. 5, no. 12, pp. 1442–1449, 2017.
[10] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a Social Network or a News Media?,” in Proc. The International World Wide Web Conference Committee (IW3C2), 2010, pp. 1–10. https://doi.org/10.1145/1772690.1772751
[11] W. X. Zhao et al., “Comparing Twitter and Traditional Media Using Topic Models,” in Proc. European Conf. on Information Retrieval, 2011, pp. 338–349.
[12] “Twitter Usage Statistics - Internet Live Stats,” 2018. [Online]. Available: http://www.internetlivestats.com/twitter-statistics/. [Accessed: 07-May-2018].
[13] S. Kumar, F. Morstatter, and H. Liu, “Twitter Data Analytics,” Springer, p. 89, 2013.
[14] T. Press Release, “Telecom Regulatory Authority of India,” July, no. 65, pp. 1–19, 2017.
[15] “Statistics.” [Online]. Available: http://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx. [Accessed: 29-Jun-2017].
[16] “About Bharti Airtel.” [Online]. Available: http://www.airtel.in/about-bharti/about-bharti-airtel. [Accessed: 25-May-2018].
[17] “Company History | About Vodafone India.” [Online]. Available: https://www.vodafone.in/about-us/company-history?section=consumer. [Accessed: 23-May-2018].
[18] “Products & Brands :: Reliance Industries Limited.” [Online]. Available: http://www.ril.com/OurCompany/ProductsAndBrands.aspx. [Accessed: 20-May-2018].
[19] S. Bharathi and A. Geetha, “Sentiment Analysis for Effective Stock Market Prediction,” International Journal Intell. Eng. Syst., vol. 10, no. 3, pp. 146–153, 2017.
[20] S. Bharathi, A. Geetha, and R. Sathiynarayanan, “Sentiment analysis of twitter and RSS news feeds and its impact on stock market prediction,” International Journal Intell. Eng. Syst., vol. 10, no. 6, pp. 68–77, 2017.
[21] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Eng. J., vol. 5, no. 4, pp. 1093–1113, 2014.
[22] T. Wilson, J. Wiebe, and P. Hoffman, “Recognizing contextual polarity in phrase level sentiment analysis,” Acl, vol. 7, no. 5, pp. 12–21, 2005.
[23] M. Kanakaraj and R. M. R. Guddeti, “NLP based sentiment analysis on Twitter data using ensemble classifiers,” in 2015 3rd International Conference on Signal Processing, Communication and Networking, ICSCN 2015, 2015.
[24] T. Nasukawa and J. Yi, “Sentiment analysis: Capturing favorability using natural language processing,” Proc. 2nd International Conf. Knowl. capture, pp. 70–77, 2003.
[25] W. Fan and M. D. Gordon, “The Power of Social Media Analytics,” Commun. ACM, vol. 57, no. 6, pp. 74–81, 2014.
[26] A. Giachanou and F. Crestani, “Like it or not: A survey of Twitter sentiment analysis methods,” ACM Comput Surv, vol. 49, no. 2, Article 28, pp. 1–41, 2016.
[27] X. Liu, A. C. Burns, and Y. Hou, “An Investigation of Brand-Related User-Generated Content on Twitter,” J. Advert., vol. 46, no. 2, pp. 236–247, Apr. 2017.
[28] P. Martínez, J. L. Martínez, I. Segura-Bedmar, J. Moreno-Schneider, A. Luna, and R. Revert, “Turning user generated health-related content into actionable knowledge through text analytics services,” Comput. Ind., vol. 78, pp. 43–56, 2016.
[29] Y. Wan and Q. Gao, “An Ensemble Sentiment Classification System of Twitter Data for Airline Services Analysis,” Proc. - 15th IEEE International Conf. Data Min. Work. ICDMW 2015, pp. 1318–1325, 2016.
[30] M. M. Mostafa, “An emotional polarity analysis of consumers’ airline service tweets,” Soc. Netw. Anal. Min., vol. 3, no. 3, pp. 635–649, Sep. 2013.
[31] P. Sobkowicz, M. Kaschesky, and G. Bouchard, “Opinion mining in social media: Modeling, simulating, and forecasting political opinions in the web,” Gov. Inf. Q., vol. 29, no. 4, pp. 470–479, 2012.
[32] E. Kontopoulos, C. Berberidis, T. Dergiades, and N. Bassiliades, “Ontology-based sentiment analysis of twitter posts,” Expert Syst. Appl., vol. 40, no. 10, pp. 4065–4074, 2013.
[33] H. Park and Y. K. Kim, “The role of social network websites in the consumer-brand relationship,” J. Retail. Consum. Serv., vol. 21, no. 4, pp. 460–467, 2014.
[34] R. Feldman, “Techniques and applications for sentiment analysis,” Commun. ACM, vol. 56, no. 4, pp. 82–89, 2013.
[35] M. M. Mostafa, “More than words: Social networks’ text mining for consumer brand sentiments,” Expert Syst. Appl., vol. 40, no. 10, pp. 4241–4251, 2013.
[36] A. C. Tsai, “Building a Concept-Level Sentiment on Commonsense Knowledge,” IEEE Intell. Syst., vol. MARCH/APRI, pp. 22–30, 2013.
[37] “Cellular Operators Association of India.” [Online]. Available: https://coai.com/#research-and-reports. [Accessed: 17-Jul-2017].
[38] A. M. Almana, M. S. Aksoy, and R. Alzahrani, “A Survey On Data Mining Techniques In Customer Churn Analysis For Telecom Industry,” J. Eng. Research Appl., vol. 4, no. 5, pp. 165–171, 2014.
[39] W. Verbeke, D. Martens, and B. Baesens, “Social network analysis for customer churn prediction,” Appl. Soft Comput. J., vol. 14, no. PART C, pp. 431–446, 2014.