This document discusses using machine learning and deep learning methods to forecast Bitcoin prices. Specifically, it:
1) Reviews previous literature on applying techniques like linear regression, decision trees, and deep learning to predict Bitcoin price.
2) Proposes using two variants of the LSTM model and XGBoost algorithm on univariate time series data to forecast prices, with the univariate LSTM outperforming other models.
3) Provides context on Bitcoin and blockchains, discusses the volatility of Bitcoin prices, and the potential for accurately predicting price trends using novel techniques like machine learning.
Devansh R. Parikh NMIMS University, devansh.parikh@gmail.com
Abstract - Bitcoin is a cryptocurrency introduced in 2008 as a means of transaction between members without any intermediaries. Among its many characteristics, the Bitcoin price is highly volatile, prone to skyrocketing or falling suddenly. Hence, traders are always looking for methods of accurately predicting the price trend. In this paper, we apply machine learning and deep learning methods to forecast the price of Bitcoin and compare the results with the traditional time series algorithm ARIMA. We use two variants of the popular LSTM model and the XGBoost algorithm, with the univariate LSTM outperforming all other models.

Index Terms - Bitcoin, Time Series, Deep Learning, LSTM, XGBoost, SARIMA

INTRODUCTION

Bitcoin is the world's most popular and valuable cryptocurrency and is traded on over 40 exchanges worldwide. It is a decentralized digital currency without a central bank or single administrator that can be sent from user to user on the peer-to-peer bitcoin network without the need for intermediaries.

Bitcoin was invented by an unknown person or group of people using the name Satoshi Nakamoto and released as open-source software in 2009. Bitcoins are created as a reward for a process known as mining. They can be exchanged for other currencies, products, and services. They are all stored in a digital cryptocurrency wallet.

Bitcoin prices are highly volatile, with sudden jumps and drops, especially during the period from August 2017 to July 2018, when the price went as high as US$ 19,343 (December 16, 2017).

The underlying technology of every cryptocurrency, including Bitcoin, is the Blockchain. Blockchain acts as a decentralized public database that preserves anonymity and creates trust between members. Trust is the result of consensus protocols like Proof-of-Stake (PoS), Proof-of-Work (PoW), Proof-of-Knowledge (PoK), and distributed consensus. A ledger is maintained that holds the record of every valid transaction ever made in the cryptocurrency's network. This decentralized environment and the verification measures implemented make Blockchains tamper proof, and these properties lay the foundation for cryptocurrencies.

Cryptocurrency not only involves the exchange of digital assets, but also smart contracts and decentralized applications. This transformation of cryptocurrencies is categorized as Blockchain 1.0, 2.0 and 3.0. Bitcoin is an example of Blockchain 1.0, since it only allows the transfer of digital tokens, i.e. Bitcoins.

As a currency, Bitcoin offers a novel opportunity for price prediction due to its young age and high volatility, which is far greater than that of traditional currencies. However, of all the papers published on Bitcoin, very few deal with machine learning and deep learning algorithms. To allow for a comparison with traditional forecasting approaches, an ARIMA model is also developed for performance comparison with the machine learning and deep learning models.

LITERATURE SURVEY

Ruchi Mittal, Shefali Arora and M.P.S Bhatia (2018)[1] have explored how the features in the bitcoin network explain its price hike. They monitor the change in the activities over time and relate them to economic theories. Their dataset consists of over nine features relating to the cryptocurrency price, recorded daily over a period of 6 months. Multivariate linear regression has been used to predict the highest and lowest prices of the cryptocurrency. In this model, multiple independent variables contribute to a dependent feature with the help of multiple coefficients. For this study, they acquired data from the public Blockchain of Bitcoin and APIs of online resources. In the dataset, they have made use of the following independent features to determine the highest price of cryptocurrencies: Open, Low, Close.

Muhammad Saad, Aziz Mohaisen (2018)[2] monitor the changes in Bitcoin activities over time and relate them to economic theories to identify key network features and determine the demand and supply dynamics of a cryptocurrency. Finally, they use machine learning methods to construct models that predict the Bitcoin price. They collect Bitcoin's data over more than 20 months and estimate the most significant features that influence the price. They computed the correlation between features such as hash rate, number of users, transaction rate, total bitcoins and price. They then map the change in features onto users and network activities to understand the dynamics of Bitcoin. For this study, they acquired data from the public Blockchain of Bitcoin and APIs of online resources. The dataset included features such as the number of wallets, unspent transaction outputs (UTXOs), mempool size, block size, mean confirmation time, miner's income, transactions per day, transactions per block, unique Bitcoin addresses, cumulative network hashing rate, network difficulty, fee, fee per transaction, system-wide total bitcoins, trade volume and the market price of Bitcoin. The paper implements a regression approach with Linear Regression, Decision Tree and Gradient Descent. It also implements a deep learning approach and uses the conjugate gradient algorithm with line search for price prediction.

Siddhi Velankar, Sakshi Valecha, Shreya Maji (2018)[3] attempt to predict the Bitcoin price accurately, taking into consideration various parameters that affect the Bitcoin value. For the first phase of the analysis, they aim to understand and identify daily trends in the Bitcoin market while gaining insight into optimal features surrounding the Bitcoin price. The data set consists of various features relating to the Bitcoin price and payment network over the course of five years, recorded daily. For the second phase of the analysis, they try to predict the sign of the daily price change with the highest possible accuracy. The data is collected from Quandl and CoinMarketCap. This is time-series data covering a period of five years, sampled at different time instances based on the nature of Bitcoin transactions. It is also normalized and smoothed using methods like log normalization, standard deviation normalization and Z-score normalization. Two algorithms, i.e. Bayesian Regression and GLM/Random Forest, were proposed.

Sean McNally, Jason Roche, Simon Caton (2018)[4] attempt to ascertain with what accuracy the direction of the Bitcoin price in USD can be predicted. This paper achieves varying degrees of success through the implementation of a Bayesian optimised recurrent neural network (RNN) and a Long Short Term Memory (LSTM) network. The popular ARIMA model for time series forecasting is implemented as a comparison to the deep learning models. As expected, the non-linear deep learning methods outperform the ARIMA forecast, which performs poorly. The LSTM achieves the highest classification accuracy of 52% and an RMSE of 8%. Manual grid search and Bayesian optimisation were utilised in this study. Grid search, implemented for the Elman RNN, is the process of selecting two hyperparameters with a minimum and maximum for each. Similar to the RNN, Bayesian optimisation was chosen for selecting LSTM parameters where possible. This is a heuristic search method which works by assuming the function was sampled from a Gaussian process and maintains a posterior distribution for this function as the results of different hyperparameter selections are observed.

Finally, both deep learning models are benchmarked on both a GPU and a CPU, with the training time on the GPU outperforming the CPU implementation by 67.7%. The independent variable for this study is the closing price of Bitcoin in USD taken from the Coindesk Bitcoin Price Index. To assess the performance of the models, the root mean squared error (RMSE) of the closing price is used, and the predicted price is further encoded into a categorical variable reflecting price up, down or no change. The dependent variables for this paper come from the Coindesk website and Blockchain.info. In addition to the closing price, the opening price, daily high and daily low are also included, as well as Blockchain data, i.e. the mining difficulty and hash rate. The features which have been engineered (considered as technical analysis indicators) include two simple moving averages (SMA) and a de-noised closing price. The RNN and LSTM both performed well; however, the LSTM performed better and was more capable of recognising long-term dependencies.

Wu, Chih-Hung, Ma, Yu-Feng, Lu, Chih-Chiang, Lu, Ruei-Shan (2018)[5] discuss Long Short-Term Memory (LSTM) networks, which are a state-of-the-art sequence learning method in deep learning for time series forecasting. The paper proposes two LSTM-based forecasting techniques: a conventional LSTM and an LSTM with an AR(2) model. The paper is essentially a comparison between the two models and indicates how the LSTM with AR(2) model is superior and achieves higher accuracy than the conventional LSTM. The metrics evaluated are mean squared error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE). The paper states that due to the lack of seasonality in cryptocurrency markets and their high volatility, time series methods are not very effective for this task. In the first stage, two methods are used. In the first method, the ACF and PACF are calculated directly, without considering the limitations of time series methods, and the number of periods of price lag is found from their graphical features; this lag is then used in the LSTM. The second method is the traditional time series approach of performing an ADF unit root test on the Bitcoin price. The second stage is the demonstration of three models. The first model is the conventional LSTM method, which uses the original bitcoin closing price series and trade volumes directly for training and prediction with the LSTM. The second model is the proposed hybrid LSTM method. This method first uses the ACF and PACF graphical features of the bitcoin price to find p (the price lag period) and q (the moving average period), takes these together with the transaction volume as the predictive variables, and then uses the LSTM for training and prediction. The third model is the conventional time series method, which is based on the time series verification procedure described in stage 1 and is predicted by ARIMA or the Transfer Function Method (TF). In the final stage, the MSE, RMSE, MAE and MAPE of the prediction results of the three models are calculated to evaluate which of the three models has better predictive ability. In conclusion, the proposed LSTM forecasting framework can overcome the problem of input variable selection in LSTM, thereby eliminating the need for researchers to rely on domain knowledge and trial and error to determine the optimal selection of input variables.

METHODOLOGY

A. DATASET

The variable of interest for this study is the closing price of Bitcoin in USD taken from the Kaggle Historic Coin dataset, which consists of minute-to-minute data of Open, High, Low, Close, Volume in BTC and in the indicated currency, and the weighted bitcoin price. The Bitcoin dataset used ranges from the 1st of January 2012 until the 13th of March 2019.

B. DATA PREPROCESSING

The data has a lot of missing values. Volume/trades are single events, so we fill NaNs with zeros for the relevant fields. Next, we need to fix the OHLC (open, high, low, close) data, which is a continuous time series, so we forward-fill those values. We then group the minute-to-minute data into hourly data by taking the mean of the values. The data is scaled using a Min-Max scaler, that is, scaled between 0 and 1 using the maximum and minimum values in the dataset. This normalises the data within a particular range. Sometimes, it also helps in speeding up the calculations in an algorithm. A plot of all the cleaned data features can be seen in Figure 1.

C. MODELS

We explore how the following models work on the Bitcoin data: LSTM, XGBoost and SARIMA time series. We compare them on metrics like MAE (Mean Absolute Error) and RMSE (Root Mean Square Error).

Appropriate methods of finding parameters are critical to model performance. We used Grid Search CV to optimise the parameters for XGBoost, and tested various deep learning models for the LSTM using both univariate and multivariate input data. We use the SARIMA model for traditional time series forecasting and grid search various parameters until we converge on the best model. All of the implementation was done in Python, where we used Keras for the deep learning models. We have predicted the future values of "Close" from the dataset.

IMPLEMENTATION

A. LONG SHORT TERM MEMORY (LSTM) NETWORK

LSTMs are a type of Recurrent Neural Network. Recurrent networks take as their input not just the current input example they see, but also what they have perceived previously in time. The decision a recurrent net reached at time step t-1 affects the decision it will reach one moment later at time step t. So recurrent networks have two sources of input, the present and the recent past, which combine to determine how they respond to new data. Adding memory to neural networks has a purpose: there is information in the sequence itself, and recurrent nets use it to perform tasks that feedforward networks can't.

In the mid-90s, a variation of the recurrent net with so-called Long Short-Term Memory units, or LSTMs, was proposed by the German researchers Sepp Hochreiter and Juergen Schmidhuber as a solution to the vanishing gradient problem.

LSTMs help preserve the error that can be backpropagated through time and layers. By maintaining a more constant error, they allow recurrent nets to continue to learn over many time steps (over 1000), thereby opening a channel to link causes and effects remotely.

LSTM networks are composed of an input layer, one or more hidden layers, and an output layer. The structure of the LSTM memory cell is illustrated in Fig. 2. Each memory cell in an LSTM network has three types of gates (the forget gate, the input gate, and the output gate) to maintain and adjust its cell state C_t:

1. The forget gate f_t defines which information is deleted from the memory (cell state).
2. The input gate i_t specifies which information is added to the memory (cell state).
3. The output gate o_t specifies which information from the memory (cell state) is used as output information [34].

At every time step t, the LSTM decides what information to remove from the cell state through a transformation function (sigmoid or tanh) in the forget gate layer. This layer takes the input x_t and the output h_{t-1} of the memory cell at the previous time step t-1, and outputs a number between 0 and 1 for each number in the cell state C_{t-1}. A 1 denotes "completely keep this" while a 0 denotes "completely get rid of this" [6].

We have implemented 2 LSTM models, one with multivariate input data (Model 1) and the other with univariate input data (Model 2). Both models were configured with early stopping criteria and each was run for 100 epochs. The architecture of both models is the same: 2 stacked LSTM layers and 1 densely connected output neuron.

B. XGBOOST

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. It is an ensemble learning method. Ensemble learning offers a systematic solution to combine the predictive power of multiple learners. The result is a single model which gives the aggregated output from several models.

The models that form the ensemble, also known as base learners, could come either from the same learning algorithm or from different learning algorithms. Bagging and boosting are two widely used ensemble learners. In boosting, each tree learns from its predecessors and updates the residual errors. Hence, the tree that grows next in the sequence will learn from an updated version of the residuals.

In contrast to bagging techniques like Random Forest, in which trees are grown to their maximum extent, boosting makes use of trees with fewer splits. Such small trees, which are not very deep, are highly interpretable. Parameters like the number of trees or iterations, the rate at which the gradient boosting learns, and the depth of the tree can be optimally selected through validation techniques like k-fold cross validation. Having a large number of trees might lead to overfitting, so it is necessary to carefully choose the stopping criteria for boosting.

Boosting consists of three simple steps:

● An initial model F0 is defined to predict the target variable y. This model will be associated with a residual (y - F0).
● A new model h1 is fit to the residuals from the previous step.
● Now, F0 and h1 are combined to give F1, the boosted version of F0. The mean squared error from F1 will be lower than that from F0.

To improve the performance of F1, we could model the residuals of F1 and create a new model F2. This can be done for m iterations, until the residuals have been minimized as much as possible. Here, the additive learners do not disturb the functions created in the previous steps. Instead, they impart information of their own to bring down the errors [7].

Here, we implemented an XGBoost model using the "xgboost" package in Python. Using Grid Search on an XGBRegressor model, we get the best parameters with a minimum child weight of 10 and a max depth of 15, among others, with a learning rate of 0.1.

C. SARIMA

Seasonal Autoregressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component. It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality. In addition to the 3 trend elements of ARIMA (autoregression, difference order and moving average), 4 seasonal elements (autoregression, difference order, moving average and number of time steps per seasonal period) are added for configuration. A seasonal ARIMA model uses differencing at a lag equal to the number of seasons (s) to remove additive seasonal effects. As with lag 1 differencing to remove a trend, the lag s differencing introduces a moving average term. The seasonal ARIMA model includes autoregressive and moving average terms at lag s. We study the autocorrelation and partial autocorrelation plots and determine the range of parameters to grid search [8].

When performing SARIMA, we group the data by month instead of by hour, to maintain seasonality. We use the SARIMA model from the statsmodels package in Python, test the AIC value of the different models, and then choose the best model for analysis, with parameters (1, 1, 1)x(2, 1, 0, 12).

RESULTS

We measure the performance of the models using metrics like mean absolute error and root mean square error.

Mean Absolute Error (MAE): MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average over the test sample of the absolute differences between prediction and actual observation, where all individual differences have equal weight.
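As a minimal sketch of how the two metrics named in this section can be computed, the snippet below implements MAE and RMSE (the square root of the mean squared difference between prediction and observation) in plain Python. The function names and the price values are illustrative assumptions; they are not taken from the paper's code or dataset.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between observations and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference between observations and predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mean_sq)


# Hypothetical closing prices in USD (illustrative only, not from the paper's data).
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 100.0, 101.0, 109.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE:  {mae:.2f}")   # errors are 1, 2, 0, 4 -> mean 1.75
print(f"RMSE: {rmse:.2f}")  # squared errors 1, 4, 0, 16 -> sqrt(5.25)
```

Because RMSE squares each error before averaging, the single 4 USD miss in this toy series raises RMSE more than it raises MAE, which is why the two metrics are usually reported together.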
Root mean squared error (RMSE): RMSE is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of squared differences between prediction and actual observation.

Model             MAE       RMSE
LSTM (Model 1)    294.26    393.98
LSTM (Model 2)    43.64     45.15
XGBoost           365.84    387.96
SARIMA            152.44    169.31

The results of all the models for predicting the closing price of Bitcoin for 6269 days are shown in Fig. (2), respectively. It can be seen from the two figures that the predicted values of the models are close to the actual values, and the trend direction of the changes is also highly consistent. The results showed the train and validation loss meeting, which revealed a good fit for our LSTM model. Based on the loss function history of the model, the forecasting error remains stable after 50 epochs.

CONCLUSION

Deep learning models such as the LSTM are evidently effective for Bitcoin prediction, with the LSTM very capable of recognising longer-term dependencies. However, a high variance task of this nature makes it difficult to translate this into impressive validation results. As a result, it remains a difficult task. There is a fine line between overfitting a model and preventing it from learning sufficiently. Dropout is a valuable feature to assist in improving this. Despite the metrics of sensitivity, specificity and precision indicating good performance, the actual performance of the SARIMA forecast based on error was significantly worse than that of the neural network models. The LSTM outperformed the other models marginally, but not significantly. However, the LSTM takes considerably longer to train. The XGBoost model gave good results, but was overfit during training.

One limitation of the research is that the model has not been implemented in a practical or real-time setting for predicting into the future, as opposed to learning what has already happened. In addition, the ability to predict using streaming data should improve the model. There are many more social, economic and political factors involved in Bitcoin prices that can largely affect the prices, like social hype. However, the LSTM and even XGBoost models scale well to the data and predict the price well, with an error in the 10's of US Dollars.
Figure: Prediction of the SARIMA time series model, with the red line being the values predicted by the model.

REFERENCES

1. Mittal, Ruchi, Shefali Arora, and M.P.S. Bhatia. "Automated Cryptocurrencies Price Prediction Using Machine Learning." International Journal of Soft Computing 8 (2018): 1758-1761. doi:10.21917/ijsc.2018.0245.
2. Saad, Muhammad, and Aziz Mohaisen. "Towards characterizing blockchain-based cryptocurrencies for highly-accurate predictions." IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (2018): 704-709.
3. Velankar, Siddhi, et al. "Bitcoin price prediction using machine learning." 2018 20th International Conference on Advanced Communication Technology (ICACT) (2018): 144-147.
4. McNally, Sean, et al. "Predicting the Price of Bitcoin Using Machine Learning." 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) (2018): 339-343.
5. Wu, Chih-Hung, Chih-Chiang Lu, Yu-Feng Ma, and Ruei-Shan Lu. "A New Forecasting Framework for Bitcoin Price with LSTM." (2018): 168-175. doi:10.1109/ICDMW.2018.00032.
6. Olah, C. "Understanding LSTM Networks." 2015. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
7. Brownlee, Jason. "A gentle introduction to XGBoost for Applied Machine learning." https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
8. Brownlee, Jason. "A gentle introduction to SARIMA for Time Series Forecasting in Python." https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/