Jiewen Xiao
U07648520
When we try to understand people’s perception of a particular issue, social media
mining has two advantages over traditional surveys. The cost and effort of
reaching a broad audience are low, and the anonymity of social media makes people more likely
to express their real opinions without self-censorship (Das, Sun, and Dutta, 2015).
Twitter, as one of the largest social platforms, provides us with vast resources for
conducting social research as well as understanding the target market where a company plans to
operate. The data I plan to work with are individual tweets that contain keywords of
interest; in my case, the keyword is “dock-less bike.” The tweets include introductory
information about reformed bike-sharing programs that are relatively new to the market. Most
importantly, the unstructured text contains people’s opinions of this new form of
bike-sharing program. Some people may think the bikes are convenient and environmentally friendly;
others may dislike them because the bikes take up public space and could be dangerous.
A company that provides bike-sharing services can utilize these positive or negative sentiments
to decide when and where to initiate the program, how to improve the service and how to address
I installed several packages to facilitate the Twitter data extraction process. I followed the
tutorial of Roy (2017) to use the R packages “twitteR”, “ROAuth”, and “RCurl” to set up a
search-and-extract mechanism to get the data from Twitter. Using the function
“searchTwitter”, I can customize the keywords, length, language, location, and other
characteristics of the search results (Gentry, 2016). I also installed the “tm” package for further
text cleaning and analysis, the “wordcloud” package for presenting word frequency in a word
cloud, the “ggplot2” package for visualizing the data, the “XML” package for parsing
XML and HTML documents, the “stringr” package for simpler string handling, and the
“RTextTools” package for applying machine learning to simplify data
processing. I may find more packages and tools useful as the work progresses.
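As a rough illustration of the text-cleaning and word-frequency step described above, here is a minimal sketch. It is written in Python with only the standard library rather than with the R packages listed above, so it does not show the “tm” or “wordcloud” interfaces. The sample tweets, the stop-word list, and the function names clean_tweet and word_frequencies are all hypothetical and stand in for whatever the real extraction returns.

```python
import re
from collections import Counter

# Hypothetical, hand-written sample tweets; these are not real data.
SAMPLE_TWEETS = [
    "Dock-less bikes are so convenient! https://example.com/a #bikeshare",
    "@city Dock-less bikes block the sidewalk again...",
    "RT @rider: convenient and green, love dock-less bikes",
]

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "so", "the", "rt", "again"}


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions, hashtags and punctuation,
    and return the remaining words that are not stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop @mentions and #hashtags
    text = re.sub(r"[^a-z\s-]", " ", text)     # keep letters, whitespace, hyphens
    words = (w.strip("-") for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def word_frequencies(tweets):
    """Count how often each cleaned word appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(clean_tweet(tweet))
    return counts


if __name__ == "__main__":
    freq = word_frequencies(SAMPLE_TWEETS)
    # Print the most frequent words; these counts are what a word cloud would size by.
    for word, count in freq.most_common(5):
        print(word, count)
```

Counts like these are the input a word cloud needs. The same counts could also be compared against lists of positive and negative words as a first, crude look at sentiment.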
References:
1. Das, S., Sun, X., & Dutta, A. (2015). Investigating user ridership sentiments for bike
2. Jia, Z., Xie, G., Gao, J., & Yu, S. (2016, December). Bike-Sharing System: A Big-Data
https://www.researchgate.net/post/How_do_I_extract_tweets_using_R