Beruflich Dokumente
Kultur Dokumente
Volume: 5 Issue: 3 91 96
_______________________________________________________________________________________________
Public Opinion Analysis Using Hadoop
AbstractRecent technological advances in devices, computing, and social networking have revolutionized the world
but have also increased the amount of data produced by humans on a large scale. If you collect this data in the form of disks, it
may fill an entire football field. According to studies, 2.5 billion gigabytes of new data is generated every day and 2.5 petabytes of
data is collected every hour. This rate is still growing enormously. Though all this information produced is meaningful and can be
useful when processed, it gets neglected. Social media has gained massive popularity nowadays. Twitter makes it easy to engage
users in expressing, sharing and discussing hot latest topics but these public expressions and views are hard to analyze due to the
bigger size of the data created by Twitter. In order to perform analysis and predictions over the hot topics in society, latest
technologies are needed. The most popular solution for this is Hadoop. Hadoop acts as an open-source framework for developing
and executing distributed applications that process very large amounts of data. It stores and process big data in a distributed
fashion on large clusters of commodity hardware. The risk, of course, in running on commodity machines is how to handle failure.
Hadoop is built with the assumption that hardware will fail and as such, it can easily handle most failures. Hadoop can be used for
developing and executing distributed applications that process very large amounts of data. It provides a suitable environment
needed for treating or processing huge data. Our job is to extract and store data into its file system and query the data according to
the desired output.
We propose to perform analysis on Public opinion expressed over Twitter regarding the trending topics of the society by
using Apache Hadoop framework along with its services Apache Flume and Apache Hive.
Keywords- Analysis; Twitter; Hadoop; BIGDATA; Hive; Flume; Knime; AFINN; Structured; JSON; HQL
__________________________________________________*****_________________________________________________
93
IJRITCC | March 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 3 91 96
_______________________________________________________________________________________________
95
IJRITCC | March 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________
International Journal on Recent and Innovation Trends in Computing and Communication ISSN: 2321-8169
Volume: 5 Issue: 3 91 96
_______________________________________________________________________________________________
challenges with big databases are creation, duration, storage, a lot of Research and wecame to know about so many new
sharing, search, analysis and visualization. So to manage these concepts for which we are really thankful to them.
databases we need, highly parallel softwares. Data is
retrieved from different sources such as social media, REFERENCES
traditional enterprise data or sensor data etc. Flume can be
used to retrieve data from social media network. Then, this [1] Apoorv Agarwal, Jasneet Singh Sabarwal, End to End
data can be organized and stored using distributed file systems Sentiment Analysis of Twitter Data
such as Google File System or Hadoop File System. These file [2] Real Time Sentiment Analysis of Twitter Data Using
systems are very efficient and optimal when number of reads Hadoop Sunil B. Mane, YashwantSawant, SaifKazi,
are very high as compared to writes. At last, data can be VaibhavShinde
processed & analyzed using map reducer with various Hadoop [3] Luciano Barbosa and Junlan Feng. 2010. Robust
services and tools. In this project, we have queried the data sentiment detection on twitter from biased and noisy
acquired from twitter and have performed successful Public data.
Opinion Analysis using Hadoop with the help of Apache [4] Semantic Sentiment Analysis of Twitter Hassan Saif,
Flume, Apache Hive and KNIME Analytics Platform. Yulan He and Harith Alani Knowledge Media Institute,
The Open University, United Kingdom.
[5] Kouloumpis, E., Wilson, T., Moore, J.: Twitter
ACKNOWLEDGMENT sentiment analysis: The good the bad and the omg! In:
We would like to express our special thanks of gratitude to Proceedings of the ICWSM (2011).
our Guide Prof. A. E. Patil, as well as our Head of [6] Go, A., Bhayani, R., Huang, L.: Twitter sentiment
DepartmentProf. Sunil Wankhade, who gave us the golden classification using distant supervision. CS224N
opportunity to do this wonderful project on the topic Public Project Report, Stanford (2009).
Opinion Analysis using Hadoop, which also helped usin doing
96
IJRITCC | March 2017, Available @ http://www.ijritcc.org
_______________________________________________________________________________________