The document summarizes a data mining project analyzing the relationship between Bitcoin price and sentiment expressed on Twitter. The project collected over 1 million tweets mentioning Bitcoin from May 24-31, 2014 and used sentiment analysis and association rule mining to discover correlations. Several rules were found including: (1) high tweet intensity and Bitcoin price were associated with happier moods; (2) happier moods and higher prices led to more intense tweets. The results suggest sentiment analysis of social media can provide insights into cryptocurrency price fluctuations.
BitCoin price-sentiment analysis
AA, AB, AK, MM
June 2014, Sarajevo
Contents
1. Project Definition
2. Data location and collection
3. Data preparation, pre-processing, integration and exploration
   List of attributes
4. Data Mining and Evaluation
4.1 Association Rules
   Rule 1
   Rule 2
   Rule 3
   Rule 4
5. Result Interpretation
1. Project Definition
Bitcoin is a peer-to-peer version of electronic cash that allows online payments to be sent directly from one party to another without going through a financial institution. During the last quarter of 2013 the bitcoin price grew very rapidly and reached a record of $1,242 per coin on November 29. For comparison, on the same day the spot price of gold was $1,240 per ounce. Currently there are more than 12 million bitcoins in circulation, and the rate at which new bitcoins are created is halved every four years until the maximum of 21 million coins is reached. After the record in November, the price plunged to around $600 and then stagnated around that level with slight ups and downs. Today the price of bitcoin is $617, after slight growth in May 2014. Given that the price of a virtual currency briefly surpassed the price of an ounce of gold, we try to analyze whether there is a correlation between Twitter posts, called tweets, and the price of bitcoin. If there is a correlation, it can be a good starting point for predicting future plunges or jumps in the bitcoin price.
2. Data location and collection
The data source we chose is Twitter, a social platform that allows users to post short pieces of text called tweets. For data mining purposes we can access 1% of all tweets and select tweets containing certain keywords. The keyword we used is #bitcoin. We collected data for the period from 24 to 31 May 2014. Because this is essentially real-time data in raw format, each tweet arrives with a large amount of junk that must be discarded to get the data we need. After that, the raw data must be adjusted so that it can be inserted into tables which can later be used for analysis.
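As a minimal sketch of discarding the junk that arrives with each raw tweet, the function below keeps only the fields that map onto the attributes used later in this report. It assumes each collected tweet is stored as one JSON object per line using the field names of Twitter's v1.1 tweet objects, and it is written for Python 3 rather than the Python 2.7 used in the project. The function name `project_tweet` and the `TEXT` and `LANG` keys are hypothetical.

```python
import json
from datetime import datetime

# Assumed timestamp layout of the "created_at" field, e.g.
# "Sat May 24 10:00:00 +0000 2014" (day and month names require an English locale).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def project_tweet(raw_line):
    """Parse one raw JSON tweet and keep only the attributes needed for mining."""
    tweet = json.loads(raw_line)
    user = tweet.get("user", {})
    created = tweet.get("created_at")
    timestamp = datetime.strptime(created, CREATED_AT_FORMAT) if created else None
    return {
        "USERID": user.get("id"),
        "VERIFIED_USER": user.get("verified"),
        "FOLLOWERS_COUNT": user.get("followers_count"),
        "TWEET_FAVOURITE_COUNT": tweet.get("favorite_count"),
        "TWEET_RETWEETED": tweet.get("retweeted"),
        "TWEET_RETWEET_COUNT": tweet.get("retweet_count"),
        "TIMESTAMP": timestamp,
        "TEXT": tweet.get("text"),   # kept for the later sentiment step
        "LANG": tweet.get("lang"),   # kept for the later language filter
    }

# Illustrative input only, not a record from the collected data.
sample_line = json.dumps({
    "created_at": "Sat May 24 10:00:00 +0000 2014",
    "text": "Feeling good about #bitcoin today",
    "lang": "en",
    "favorite_count": 2,
    "retweet_count": 1,
    "retweeted": False,
    "user": {
        "id": 42,
        "verified": False,
        "followers_count": 150,
        "profile_image_url": "http://example.com/avatar.png",
    },
})
record = project_tweet(sample_line)
```

Everything not listed in the returned dictionary, such as profile pictures and backgrounds, is dropped at this point.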
3. Data preparation, pre-processing, integration and exploration
Before preprocessing and discarding unnecessary data we had a 13.7 GB file. To ease the process of data mining we had to preprocess the data. During preprocessing we discarded all irrelevant attributes such as profile pictures, backgrounds etc. The file we obtained after this step was 1.76 GB (7.2 million records), which we considered sufficient, given that one tweet with all relevant attributes takes approximately 256 bytes.
Next, we removed all non-English tweets by language filtering and removed spam by taking the most frequent words and using a naive Bayes classifier to decide which tweets are spam. To make things faster we included missing-data handling in the spam filter. Because we need the timestamp for our mining process and time is very hard to impute for missing values, we discarded all tweets without a timestamp. For spam reduction we also discarded all records with a word count lower than 3 and records whose tweet contains non-ASCII characters, because those are the ones we cannot analyze with confidence. After these filtering steps we had around 1.1 million records (252 MB).
With the filtered data we began the sentiment analysis. Sentiment analysis is done with a list of words annotated with valence, arousal and dominance. A successful sentiment analysis yields a three-dimensional map of tweet moods, but it also removes records that cannot be analyzed. After sentiment analysis we were left with 80 MB of data, or 335,000 individual records, each with new derived attributes related to sentiment: mood, mean valence, mean arousal, mean dominance and intensity of the mood. The attribute mood has 20 different values, and each of them can have a different arousal, valence, dominance and intensity.
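The structural filters and the lexicon-based scoring described above could be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the tiny `VAD_LEXICON` and its values are invented for the example, the tokenizer is a simple regular expression, and the derivation of the MOOD and INTENSITY attributes is not shown because the report does not specify it. The naive Bayes spam filter and the language filter are also omitted.

```python
import re

# Hypothetical miniature lexicon: word -> (valence, arousal, dominance).
# The numbers are illustrative and not taken from the project's word list.
VAD_LEXICON = {
    "happy": (8.2, 6.5, 6.6),
    "profit": (7.6, 5.9, 6.7),
    "crash": (2.3, 6.9, 3.4),
    "scam": (2.1, 5.8, 3.6),
}

def keep_tweet(tweet):
    """Return True if the tweet passes the filters described in the text."""
    text = tweet.get("text") or ""
    if not tweet.get("timestamp"):
        return False  # missing timestamps are not imputed, the record is dropped
    if len(text.split()) < 3:
        return False  # fewer than 3 words is treated as spam
    if any(ord(ch) > 127 for ch in text):
        return False  # non-ASCII text cannot be scored with confidence
    return True

def mean_vad(text):
    """Average valence, arousal and dominance over the lexicon words in text.

    Returns None when no word is in the lexicon, mirroring the removal of
    records that cannot be analyzed.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [VAD_LEXICON[w] for w in words if w in VAD_LEXICON]
    if not scores:
        return None
    n = len(scores)
    return tuple(sum(score[i] for score in scores) / n for i in range(3))

# Illustrative tweets only.
tweets = [
    {"text": "happy about the bitcoin profit", "timestamp": "2014-05-24T10:00:00"},
    {"text": "buy now", "timestamp": "2014-05-24T10:00:05"},
    {"text": "bitcoin price update today", "timestamp": None},
]
scored = [(t["text"], mean_vad(t["text"])) for t in tweets if keep_tweet(t)]
```

Only the first illustrative tweet survives both the filters and the scoring; the second is too short and the third has no timestamp.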
After preparing the data acquired from Twitter, we took historic bitcoin price data with time information from one of the largest bitcoin exchange websites. The next step is to match each tweet with the corresponding bitcoin price by using the relevant timestamp. Finally, we need to adjust our data set for WEKA. WEKA can load CSV files (alongside its native ARFF format), so we convert our data set to CSV.
List of attributes:
Attribute 1: USERID - ID of the Twitter user
Attribute 2: VERIFIED_USER - True or false, whether this user is a verified user on Twitter
Attribute 3: FOLLOWERS_COUNT - Numerical value representing the follower count of the user
Attribute 4: TWEET_FAVOURITE_COUNT - Numerical value, how many favorites the tweet has
Attribute 5: TWEET_RETWEETED - True or false, whether this tweet has been retweeted
Attribute 6: TWEET_RETWEET_COUNT - Number of retweets of this tweet
Attribute 7: TIMESTAMP - Time of tweet creation
Attribute 8: MEAN_VALENCE - Represents whether the tweet sentiment is good or bad
Attribute 9: MEAN_AROUSAL - Represents the amount of excitement and involvement in the tweet
Attribute 10: MEAN_DOMINANCE - Represents the level of assertiveness
Attribute 11: MOOD - Which mood is expressed by the tweet, 20 moods in total
Attribute 12: INTENSITY - The intensity of the mood
Attribute 13: BTC_PRICE - Numerical value of the bitcoin price at the time of tweet creation
Attribute 14: BTC_VOL - Average number of bitcoins sold in that second
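The timestamp matching and CSV conversion described before the attribute list could be sketched as follows. The report does not say how a tweet is paired with a price, so this sketch assumes one plausible rule: each tweet receives the most recent exchange price at or before its timestamp. The helper name `match_price`, the Unix-second timestamps and all values are illustrative, and only a subset of the attributes is written out.

```python
import bisect
import csv
import io

def match_price(tweet_ts, tick_times, tick_prices):
    """Return the latest price at or before tweet_ts, or None if there is none.

    tick_times must be sorted ascending and aligned with tick_prices.
    """
    i = bisect.bisect_right(tick_times, tweet_ts)
    return tick_prices[i - 1] if i > 0 else None

# Illustrative price ticks (Unix seconds, USD), not real exchange data.
tick_times = [1400900000, 1400900060, 1400900120]
tick_prices = [550.0, 551.5, 549.8]

# Illustrative tweets with only a few of the attributes.
tweets = [
    {"USERID": 1, "TIMESTAMP": 1400900030, "MOOD": "happy"},
    {"USERID": 2, "TIMESTAMP": 1400900120, "MOOD": "happy"},
]

# Write the joined records as CSV text that WEKA's CSV loader can read.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["USERID", "TIMESTAMP", "MOOD", "BTC_PRICE"])
writer.writeheader()
for tweet in tweets:
    price = match_price(tweet["TIMESTAMP"], tick_times, tick_prices)
    writer.writerow(dict(tweet, BTC_PRICE=price))
csv_text = buffer.getvalue()
```

Binary search keeps the lookup logarithmic in the number of price ticks, which matters when several hundred thousand tweets have to be matched.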
4. Data Mining and Evaluation
For the data mining process we used the scripting language Python 2.7 to acquire and preprocess all the data. Afterwards we used WEKA 3.6.10 to analyze our final data set and extract knowledge from it, using the Apriori algorithm to generate association rules. We picked rules based on their lift, because confidence alone was not enough to produce meaningful results. We also filtered the rules based on what they imply, mostly bitcoin price, as we wanted to see a correlation between it and Twitter.
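To illustrate why confidence alone can mislead and why lift is the more useful ranking metric, the following sketch computes support, confidence and lift for a single rule over a toy data set. It is not WEKA's Apriori implementation and does not generate rules; the function name `rule_metrics`, the item labels and the transactions are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence and lift for the rule antecedent ==> consequent.

    Each transaction is a set of "ATTRIBUTE=value" items.
    """
    antecedent = frozenset(antecedent)
    consequent = frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = n_ac / n_a
    # Lift compares the rule's confidence with the base rate of the consequent.
    lift = confidence / (n_c / n)
    return {"support": n_ac / n, "confidence": confidence, "lift": lift}

# Toy transactions, invented for the example.
transactions = [
    {"MOOD=happy", "PRICE=high", "LANG=en"},
    {"MOOD=happy", "PRICE=high", "LANG=en"},
    {"MOOD=happy", "PRICE=low", "LANG=en"},
    {"MOOD=sad", "PRICE=low", "LANG=en"},
]

# Perfect confidence but lift 1.0: LANG=en is in every record, so the rule says nothing.
trivial = rule_metrics(transactions, {"MOOD=sad"}, {"LANG=en"})

# Lower confidence but lift above 1: happy tweets co-occur with high prices
# more often than the base rate of high prices would predict.
informative = rule_metrics(transactions, {"MOOD=happy"}, {"PRICE=high"})
```

A lift of 1 means the antecedent and consequent are statistically independent in the data, regardless of how high the confidence is.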
4.1 Association Rules
Rule 1:
INTENSITY='(4.123189-4.567537]' BTC_PRICE='(548.820107-557.316128]' 68291 ==> MOOD=happy 67397 conf:(0.99)
If the overall intensity of the mood is high and the bitcoin price is high compared to the recent past, the mood of the tweet is happy.
Rule 2:
MOOD=happy BTC_PRICE='(557.316128-565.812149]' 73378 ==> INTENSITY='(4.123189-4.567537]' 69220 conf:(0.94)
If the user's mood is happy and the price of bitcoin is high, the level of emotion expressed in the tweet will be high.
Rule 3:
MOOD=happy INTENSITY='(4.123189-4.567537]' 181989 ==> BTC_PRICE='(548.820107-557.316128]' 67397 conf:(0.37) < lift:(1.13)> lev:(0.02) [7954] conv:(1.07)
If the user is happy and the intensity of the tweet is high, we can expect a tendency towards a bitcoin price slightly higher than average.
Rule 4:
MOOD=happy INTENSITY='(4.123189-4.567537]' 181989 ==> BTC_PRICE='(557.316128-565.812149]' 69220 conf:(0.38) < lift:(1.12)> lev:(0.02) [7685] conv:(1.07)
If the user is happy and the intensity of the tweet is high, we can expect a tendency towards a bitcoin price considerably higher than average.
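The figures printed with the four rules can be cross-checked from the instance counts alone. The sketch below recomputes confidence as the rule count divided by the antecedent count. For Rules 3 and 4 it also recomputes lift, assuming the bracketed number after lev is the leverage expressed as an instance count, that is, the number of instances covered beyond what independence would predict. Under that assumption the expected count is the rule count minus the leverage count, and lift is the rule count divided by the expected count. The helper names are hypothetical.

```python
# (antecedent count, rule count, leverage count or None), copied from the rules above.
reported = {
    "Rule 1": (68291, 67397, None),
    "Rule 2": (73378, 69220, None),
    "Rule 3": (181989, 67397, 7954),
    "Rule 4": (181989, 69220, 7685),
}

def confidence(n_antecedent, n_rule):
    """Fraction of antecedent instances that also satisfy the consequent."""
    return n_rule / n_antecedent

def lift_from_leverage(n_rule, leverage_count):
    """Observed rule count divided by the count expected under independence."""
    expected = n_rule - leverage_count
    return n_rule / expected

for name, (n_a, n_rule, lev) in reported.items():
    line = f"{name}: conf={confidence(n_a, n_rule):.2f}"
    if lev is not None:
        line += f" lift={lift_from_leverage(n_rule, lev):.2f}"
    print(line)
```

The recomputed values round to the reported ones: confidence 0.99, 0.94, 0.37 and 0.38, and lift 1.13 and 1.12 for Rules 3 and 4.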
5. Result Interpretation
From the generated association rules we can see that the mood of a user's tweet, its intensity and the bitcoin price are significantly correlated. Rules 1 and 2 show that in this period the overall sentiment towards bitcoin was very positive and correlates with a tendency towards slight price increases. Rules 3 and 4 confirm that a positive user attitude towards the topic correlates with slightly to highly increased bitcoin prices. All of this can be confirmed by checking recent online sources (May 2014), which show that, after the bitcoin crash last year, the last two weeks were the first time bitcoin recovered to over USD 500. This broke the psychological barrier that had existed for months and resulted in a lot of positive news, further fueling the price beyond USD 565. We can see similar findings in our results.