Individual Assignment #1

Data Source and Libraries

Jiewen Xiao
U07648520

When we are trying to understand people’s perception of a particular issue, social media mining has two advantages over traditional surveys. The cost and effort of reaching a broad audience are low, and the anonymity of social media makes people more likely to express their real opinions without self-censorship (Das, Sun, and Dutta, 2015).

Twitter, as one of the largest social platforms, provides vast resources for conducting social research and for understanding the target market where a company plans to operate. The data I plan to work with are individual tweets that contain keywords of interest; in my case, the keyword is “dock-less bike.” The tweets include introductory information about reformed bike-sharing programs that are relatively new to the market. Most importantly, the unstructured text contains people’s opinions of this new form of bike sharing. Some people may find the bikes convenient and environmentally friendly; others may dislike them because the bikes take up public space and could be dangerous. A company that provides bike-sharing services can use these positive and negative sentiments to decide when and where to launch the program, how to improve the service, and how to address common concerns.

I installed several packages to facilitate the Twitter data extraction process. Following the tutorial by Roy (2017), I used the R packages “twitteR”, “ROAuth”, and “RCurl” to set up a search-and-extract mechanism for getting data from Twitter. Using the function “searchTwitter”, I can customize the keywords, number of tweets returned, language, location, and other characteristics of the search results (Gentry, 2016). I also installed the “tm” package for further text cleaning and analysis, the “wordcloud” package for presenting word frequencies in a word cloud, the “ggplot2” package for visualizing the data, the “XML” package for parsing XML and HTML documents, the “stringr” package for simpler and easier string handling, and the “RTextTools” package for applying machine learning to simplify data processing. I may find more packages and tools useful as the work progresses.
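As a rough illustration of the cleaning and word-frequency step that would follow the search, here is a minimal sketch in base R. It is not the “tm” or “wordcloud” API; the sample tweets, the function names clean_tweet and word_frequencies, and the short stop-word list are all hypothetical and chosen only for this example. The commented twitteR calls show how real tweets might be fetched, assuming valid API credentials are available.

```r
# Hypothetical sample tweets, standing in for text returned by a Twitter search.
# With valid API credentials, the twitteR calls would look roughly like:
#   setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
#   tweets <- twListToDF(searchTwitter("dock-less bike", n = 100, lang = "en"))
#   sample_tweets <- tweets$text
sample_tweets <- c(
  "Love the new dock-less bike near campus! So convenient https://t.co/abc",
  "Another dock-less bike blocking the sidewalk... @cityhall please fix",
  "Dock-less bike programs are convenient and green #bikeshare"
)

# Lowercase the text, drop URLs and @mentions, replace everything except
# letters, spaces, and hyphens with a space, then collapse repeated spaces.
clean_tweet <- function(text) {
  text <- tolower(text)
  text <- gsub("https?://[^[:space:]]+", " ", text)  # remove URLs
  text <- gsub("@[[:alnum:]_]+", " ", text)          # remove @mentions
  text <- gsub("[^a-z -]", " ", text)                # keep letters, spaces, hyphens
  text <- gsub(" +", " ", text)                      # collapse runs of spaces
  trimws(text)
}

# Count how often each word occurs across all tweets, skipping a small,
# illustrative stop-word list (an assumption, not a standard list).
word_frequencies <- function(texts,
                             stopwords = c("the", "a", "and", "are", "so", "near", "new")) {
  words <- unlist(strsplit(clean_tweet(texts), " ", fixed = TRUE))
  words <- words[nzchar(words) & !(words %in% stopwords)]
  counts <- table(words)
  # Convert to a plain named integer vector, most frequent words first.
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

freq <- word_frequencies(sample_tweets)
print(head(freq, 5))
```

A named count vector like freq is the kind of input a word cloud needs (a set of words and their frequencies), so the same idea could later be handed to a plotting package.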

References:

1. Das, S., Sun, X., & Dutta, A. (2015). Investigating user ridership sentiments for bike sharing programs. Journal of Transportation Technologies, 5(02), 69.
2. Jia, Z., Xie, G., Gao, J., & Yu, S. (2016, December). Bike-Sharing System: A Big-Data Perspective. International Conference on Smart Computing and Communication, 548-557. Springer, Cham.
3. Tweets. Retrieved from Twitter: https://www.twitter.com
4. Developer Website for Twitter. Retrieved from https://dev.twitter.com/docs/auth/oauth
5. Roy, S. (2017). Tutorial on how to extract tweets using R. Retrieved from https://www.researchgate.net/post/How_do_I_extract_tweets_using_R
6. Gentry, J. (2016). Package ‘twitteR’. R package version, 1(9).
