
Bitcoin Price Forecasting using Deep Learning

Devansh R. Parikh
NMIMS University, devansh.parikh@gmail.com

Abstract - Bitcoin is a cryptocurrency launched in 2008 as a means of transaction between members without any intermediaries. Among its many characteristics, the Bitcoin price is highly volatile, skyrocketing or falling suddenly. Hence, traders are always looking for methods of accurately predicting the price trend. In this paper, we apply machine learning and deep learning methods to forecast the price of Bitcoin and compare the results with the traditional time series algorithm ARIMA. We use two variants of the popular LSTM model and the XGBoost algorithm, with the univariate LSTM outperforming all other models.

Index Terms - Bitcoin, Time Series, Deep Learning, LSTM, XGBoost, SARIMA

INTRODUCTION

Bitcoin is the world's most popular and valuable cryptocurrency, traded on over 40 exchanges worldwide. It is a decentralized digital currency without a central bank or single administrator that can be sent from user to user on the peer-to-peer Bitcoin network without the need for intermediaries.

Bitcoin was invented by an unknown person or group of people using the name Satoshi Nakamoto and released as open-source software in 2009. Bitcoins are created as a reward for a process known as mining. They can be exchanged for other currencies, products, and services, and are stored in a digital cryptocurrency wallet.

Bitcoin prices are highly volatile, with sudden jumps and drops, especially between August 2017 and July 2018, when the price went as high as US$ 19,343 (December 16, 2017).

The underlying technology of every cryptocurrency, including Bitcoin, is the Blockchain. A Blockchain acts as a decentralized public database that preserves anonymity and creates trust between members. Trust is the result of consensus protocols like Proof-of-Stake (PoS), Proof-of-Work (PoW), Proof-of-Knowledge (PoK), and distributed consensus. A ledger is maintained that holds the record of every valid transaction ever made on the cryptocurrency network. This decentralized environment and the verification measures implemented make Blockchains tamper-proof, and these lay the foundation for cryptocurrencies.

Cryptocurrency not only involves the exchange of digital assets, but also smart contracts and decentralized applications. This transformation of cryptocurrencies is categorized as Blockchain 1.0, 2.0 and 3.0. Bitcoin is an example of Blockchain 1.0, since it only allows the transfer of digital tokens, i.e. Bitcoins.

As a currency, Bitcoin offers a novel opportunity for price prediction due to its young age and high volatility, which is far greater than that of traditional currencies. However, of all the papers published on Bitcoin, very few deal with machine learning and deep learning algorithms. To allow for a comparison with traditional forecasting approaches, an ARIMA model is also developed for performance comparison with the machine learning and deep learning models.
LITERATURE SURVEY

Ruchi Mittal, Shefali Arora and M.P.S. Bhatia (2018)[1] have explored how the features in the Bitcoin network explain its price hikes. They monitor the change in activities over time and relate them to economic theories. Their dataset consists of over nine features relating to the cryptocurrency price, recorded daily over a period of 6 months. Multivariate linear regression is used to predict the highest and lowest prices of the cryptocurrency; in this model, multiple independent variables contribute to a dependent feature with the help of multiple coefficients. For this study, they acquired data from the public Blockchain of Bitcoin and the APIs of online resources. In the dataset, they used the following independent features to determine the highest price of cryptocurrencies: Open, Low, Close.

Muhammad Saad, Aziz Mohaisen (2018)[2] monitor the changes in Bitcoin activities over time and relate them to economic theories to identify key network features and determine the demand and supply dynamics of a cryptocurrency. Finally, they use machine learning methods to construct models that predict the Bitcoin price. They collect Bitcoin data over more than 20 months and estimate the most significant features that influence the price. They computed the correlation between features such as hash rate, number of users, transaction rate, total bitcoins and price, and then map the change in features onto user and network activities to understand the dynamics of Bitcoin. For this study, they acquired data from the public Blockchain of Bitcoin and the APIs of online resources. The dataset included features such as the number of wallets, unspent transaction outputs (UTXOs), mempool size, block size, mean confirmation time, miners' income, transactions per day, transactions per block, unique Bitcoin addresses, cumulative network hashing rate, network difficulty, fee, fee per transaction, system-wide total bitcoins, trade volume and the market price of Bitcoin. The paper implements a regression approach with Linear Regression, Decision Trees and Gradient Descent. It also implements a deep learning approach, using the conjugate gradient algorithm with line search for price prediction.

Siddhi Velankar, Sakshi Valecha, Shreya Maji (2018)[3] have made an attempt to predict the Bitcoin price accurately, taking into consideration the various parameters that affect the Bitcoin value. For the first phase of the analysis, they aim to understand and identify daily trends in the Bitcoin market while gaining insight into optimal features surrounding the Bitcoin price. The dataset consists of various features relating to the Bitcoin price and payment network over the course of five years, recorded daily. For the second phase of the analysis, they try to predict the sign of the daily price change with the highest possible accuracy. Data are collected from Quandl and CoinMarketCap. This is time-series data over a period of five years, at different time instances based on the nature of Bitcoin transactions. It is also normalized and smoothed using methods like log normalization, standard deviation normalization and Z-score normalization. Two algorithms, Bayesian Regression and GLM/Random Forest, were proposed.

Sean McNally, Jason Roche, Simon Caton (2018)[4] attempt to ascertain with what accuracy the direction of the Bitcoin price in USD can be predicted. The paper achieves varying degrees of success through the implementation of a Bayesian-optimised recurrent neural network (RNN) and a Long Short-Term Memory (LSTM) network. The popular ARIMA model for time series forecasting is implemented as a comparison to the deep learning models. As expected, the non-linear deep learning methods outperform the ARIMA forecast, which performs poorly. The LSTM achieves the highest classification accuracy of 52% and an RMSE of 8%. Manual grid search and Bayesian optimisation were utilised in the study. Grid search, implemented for the Elman RNN, is the process of selecting two hyperparameters with a minimum and maximum for each. Similar to the RNN, Bayesian optimisation was chosen for selecting LSTM parameters where possible; this is a heuristic search method which works by assuming the function was sampled from a Gaussian process and maintaining a posterior distribution for this function as the results of different hyperparameter selections are observed. Finally, both deep learning models are benchmarked on a GPU and a CPU, with the training time on the GPU outperforming the CPU implementation by 67.7%. The independent variable for the study is the closing price of Bitcoin in USD taken from the CoinDesk Bitcoin Price Index. To assess the performance of the models, the root mean squared error (RMSE) of the closing price is used, and the predicted price is further encoded into a categorical variable reflecting price up, down or no change. The dependent variables come from the CoinDesk website and Blockchain.info. In addition to the closing price, the opening price, daily high and daily low are included, as well as Blockchain data, i.e. the mining difficulty and hash rate. The engineered features (considered as technical analysis indicators) include two simple moving averages (SMA) and a de-noised closing price. The RNN and LSTM both performed well; however, the LSTM performed better and was more capable of recognising long-term dependencies.

Wu, Chih-Hung, Ma, Yu-Feng, Lu, Chih-Chiang, Lu, Ruei-Shan (2018)[5] discuss Long Short-Term Memory (LSTM) networks, a state-of-the-art sequence learning technique in deep learning for time series forecasting. The paper proposes two LSTM forecasting techniques, a conventional LSTM and an LSTM with an AR(2) model, and is essentially a comparison between the two, showing that the LSTM with the AR(2) model is superior and achieves higher accuracy than the conventional LSTM. The metrics evaluated are mean squared error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE). The paper states that due to the lack of seasonality in cryptocurrency markets and their high volatility, time series methods are not very effective for this task. Two methods are used. In the first, the ACF and PACF are calculated directly, without considering the limitations of time series methods, and the number of periods of price lag is found from their graphical features; this is used in the LSTM. The second method uses the traditional time series approach to perform an ADF unit root test on the Bitcoin price. The second stage is the demonstration of three models. The first model is the conventional LSTM method, which uses the original Bitcoin closing price series and trade volumes directly for training and prediction with an LSTM. The second model is the proposed hybrid LSTM method, which first uses the ACF and PACF graphical features of the Bitcoin price to find p (the price lag period) and q (the moving average period), takes these together with the transaction volume as the predictive variables, and then uses an LSTM for training and prediction. The third model is the conventional time series method, which is based on the time series verification procedure described in stage 1 and predicts using ARIMA or the Transfer Function (TF) method. In the final stage, the prediction results of the three models are evaluated with metrics such as MSE, RMSE, MAE and MAPE to determine which model has better predictive ability. In conclusion, the proposed LSTM forecasting framework can overcome the problem of input variable selection in LSTM, thereby eliminating the need for researchers to rely on domain knowledge and trial and error to determine the optimal selection of input variables.
METHODOLOGY

A. DATASET

The independent variable for this study is the closing price of Bitcoin in USD taken from the Kaggle Historic Coin dataset, which consists of minute-to-minute data of Open, High, Low, Close, volume in BTC and in the indicated currency, and the weighted Bitcoin price. The dataset used ranges from the 1st of January 2012 until the 13th of March 2019.
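As a minimal sketch of the loading step: the file name and column labels below are assumptions based on the public Kaggle minute-level Bitcoin CSV, not details given in the paper.

    import pandas as pd

    # Hypothetical file name; the Kaggle dataset ships minute bars
    # keyed by a Unix timestamp column.
    df = pd.read_csv("bitstampUSD_1-min_data_2012-01-01_to_2019-03-13.csv")
    df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="s")
    df = df.set_index("Timestamp")
    print(df.columns.tolist())
    # assumed layout: ['Open', 'High', 'Low', 'Close',
    #                  'Volume_(BTC)', 'Volume_(Currency)', 'Weighted_Price']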
B. DATA PREPROCESSING

The data has many missing values. Volume and trades are single events, so NaNs are filled with zeros for the relevant fields. Next, we fix the OHLC (Open, High, Low, Close) data, which is a continuous time series, so we forward-fill those values. We then group the minute-to-minute data into hourly data by taking the mean of the values. The data is scaled using a Min-Max scaler, i.e. scaled between 0 and 1 using the maximum and minimum values in the dataset. This normalises the data within a particular range and can also speed up the computations in an algorithm. A plot of all the cleaned data features can be seen in Figure I.
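The pipeline above can be sketched as follows, continuing from the loading sketch; the exact column names are assumptions.

    from sklearn.preprocessing import MinMaxScaler

    # Volume/trade columns record single events: a gap means "nothing traded".
    vol_cols = ["Volume_(BTC)", "Volume_(Currency)"]  # assumed names
    df[vol_cols] = df[vol_cols].fillna(0.0)

    # OHLC is a continuous price series, so carry the last known price forward.
    ohlc_cols = ["Open", "High", "Low", "Close"]
    df[ohlc_cols] = df[ohlc_cols].ffill()

    # Aggregate minute bars into hourly bars by taking the mean.
    hourly = df.resample("1H").mean()

    # Min-max scale every feature into [0, 1].
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = pd.DataFrame(scaler.fit_transform(hourly),
                          index=hourly.index, columns=hourly.columns)

Model outputs would then be mapped back to US dollars with scaler.inverse_transform before computing errors, so that MAE and RMSE are on the original price scale, as in the Results section.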

C. MODELS

We explore how the following models work on the Bitcoin data: LSTM, XGBoost and the SARIMA time series model, and compare them on metrics like MAE (Mean Absolute Error) and RMSE (Root Mean Square Error). Appropriate methods of finding parameters are critical to model performance. We used Grid Search CV to optimise the parameters for XGBoost, and tested various deep learning models for the LSTM using both univariate and multivariate input data. We use the SARIMA model for traditional time series forecasting and grid search various parameters until we converge on the best model. All of the implementation was done in Python, where we used Keras for the deep learning models. We predict the future values of "Close" from the dataset.

IMPLEMENTATION

A. LONG SHORT TERM MEMORY (LSTM) NETWORK

LSTMs are a type of Recurrent Neural Network. Recurrent networks take as their input not just the current input example they see, but also what they have perceived previously in time. The decision a recurrent net reaches at time step t-1 affects the decision it will reach one moment later at time step t. So recurrent networks have two sources of input, the present and the recent past, which combine to determine how they respond to new data. Adding memory to neural networks has a purpose: there is information in the sequence itself, and recurrent nets use it to perform tasks that feedforward networks can't.

In the mid-90s, a variation of the recurrent net with so-called Long Short-Term Memory units, or LSTMs, was proposed by the German researchers Sepp Hochreiter and Juergen Schmidhuber as a solution to the vanishing gradient problem.

LSTMs help preserve the error that can be backpropagated through time and layers. By maintaining a more constant error, they allow recurrent nets to continue to learn over many time steps (over 1000), thereby opening a channel to link causes and effects remotely.

LSTM networks are composed of an input layer, one or more hidden layers, and an output layer. The structure of the LSTM memory cell is illustrated in Fig. 2. Each memory cell in an LSTM network has three types of gates to maintain and adjust its cell state St: 1. the forget gate ft defines which information is deleted from the memory (cell state); 2. the input gate it specifies which information is added to the memory (cell state); 3. the output gate Ot specifies which information from the memory (cell state) is used as output [6]. At every time step t, the LSTM decides what information to remove from the cell state through a transformation function (sigmoid or tanh) in the forget gate layer, which takes the input xt and the output ht-1 of the memory cells at the previous time step t-1 and outputs a number between 0 and 1 for each number in the cell state Ct-1. A value of 1 denotes "completely keep this" while 0 denotes "completely get rid of this" [6].
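For reference, the standard LSTM update equations behind this description (the common formulation from [6]; the paper itself does not print them) are, with W and b the learned weights and biases and \odot the element-wise product:

    f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
    i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
    \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)
    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
    o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
    h_t = o_t \odot \tanh(C_t)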
We have implemented 2 LSTM models, one with multivariate input data (Model 1) and the other with univariate input data (Model 2). Both models were configured with an early stopping criterion, and each was run for 100 epochs. The architecture of both models is the same: 2 stacked LSTM layers and 1 densely connected output neuron.
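Grounded in the details the paper gives (Keras, two stacked LSTM layers, one dense output neuron, early stopping, 100 epochs), a minimal sketch of the univariate Model 2 might look as follows; the window length, unit counts, batch size and train/validation split are assumptions.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense
    from tensorflow.keras.callbacks import EarlyStopping

    def make_windows(series, n_steps=24):
        # Slide a fixed-length window over the series; predict the next value.
        X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
        y = series[n_steps:]
        return X.reshape(-1, n_steps, 1), y

    X, y = make_windows(scaled["Close"].to_numpy())  # univariate Model 2
    split = int(0.8 * len(X))                        # chronological split (ratio assumed)
    X_train, X_val = X[:split], X[split:]
    y_train, y_val = y[:split], y[split:]

    model = Sequential([
        LSTM(50, return_sequences=True, input_shape=(24, 1)),  # unit count assumed
        LSTM(50),   # second stacked LSTM layer
        Dense(1),   # single densely connected output neuron
    ])
    model.compile(optimizer="adam", loss="mse")

    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              epochs=100, batch_size=64,             # 100 epochs as in the paper
              callbacks=[EarlyStopping(monitor="val_loss", patience=5,
                                       restore_best_weights=True)])

For the multivariate Model 1, the same architecture would take windows over all features, i.e. an input shape of (n_steps, n_features).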
B. XGBOOST

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance. It is an ensemble learning method. Ensemble learning offers a systematic solution to combining the predictive power of multiple learners; the result is a single model which gives the aggregated output of several models. The models that form the ensemble, also known as base learners, can come from the same learning algorithm or from different learning algorithms. Bagging and boosting are two widely used ensemble methods. In boosting, each tree learns from its predecessors and updates the residual errors; hence, the tree that grows next in the sequence will learn from an updated version of the residuals. In contrast to bagging techniques like Random Forest, in which trees are grown to their maximum extent, boosting makes use of trees with fewer splits. Such small trees, which are not very deep, are highly interpretable. Parameters like the number of trees or iterations, the rate at which the gradient boosting learns, and the depth of the tree can be optimally selected through validation techniques like k-fold cross-validation. Having a large number of trees might lead to overfitting, so it is necessary to carefully choose the stopping criteria for boosting.

Boosting consists of three simple steps:

● An initial model F0 is defined to predict the target variable y. This model will be associated with a residual (y – F0).
● A new model h1 is fit to the residuals from the previous step.
● Now, F0 and h1 are combined to give F1, the boosted version of F0. The mean squared error from F1 will be lower than that from F0.

To improve the performance of F1, we can model the residuals of F1 and create a new model F2. This can be done for 'm' iterations, until the residuals have been minimized as much as possible. Here, the additive learners do not disturb the functions created in the previous steps; instead, they impart information of their own to bring down the errors [7].

We implemented an XGBoost model using the "xgboost" package in Python. Using grid search on an XGBRegressor model, we obtained the best parameters with a minimum child weight of 10 and a max depth of 15, among others, with a learning rate of 0.1; a sketch of this search is given below.
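The tuning step might look as follows with GridSearchCV over an XGBRegressor; the search grid here is illustrative, since the paper reports only the winning values.

    from xgboost import XGBRegressor
    from sklearn.model_selection import GridSearchCV

    # XGBoost takes flat 2-D inputs, so the LSTM windows are reshaped
    # to (samples, features).
    X_flat = X_train.reshape(len(X_train), -1)

    param_grid = {                       # illustrative grid; the full grid is not listed
        "min_child_weight": [1, 5, 10],
        "max_depth": [5, 10, 15],
        "learning_rate": [0.01, 0.1, 0.3],
    }
    search = GridSearchCV(
        XGBRegressor(n_estimators=100, objective="reg:squarederror"),
        param_grid, scoring="neg_mean_absolute_error", cv=3)
    search.fit(X_flat, y_train)
    print(search.best_params_)
    # paper reports: min_child_weight=10, max_depth=15, learning_rate=0.1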

C. SARIMA

Seasonal Autoregressive Integrated Moving Average, SARIMA or Seasonal ARIMA, is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component. It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality. Along with the 3 trend elements of ARIMA (autoregression, difference order and moving average), 4 seasonal elements (autoregression, difference order, moving average and the number of time steps per season) are added for configuration. A seasonal ARIMA model uses differencing at a lag equal to the number of seasons (s) to remove additive seasonal effects. As with lag-1 differencing to remove a trend, the lag-s differencing introduces a moving average term, so the seasonal ARIMA model includes autoregressive and moving average terms at lag s. We study the autocorrelation and partial autocorrelation plots and determine the range of parameters to grid search [8].

When performing SARIMA, we group the data by month instead of by hour, to maintain seasonality. We use the SARIMA model from the statsmodels package in Python, test the AIC values of the different models, and choose the best model for analysis, with parameters (1, 1, 1)x(2, 1, 0, 12).
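A sketch of this AIC-driven selection with statsmodels, reusing the hourly frame from the preprocessing sketch; the grid ranges are illustrative assumptions.

    import itertools
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    monthly_close = hourly["Close"].resample("M").mean()  # monthly grouping

    best = (None, float("inf"))
    for p, d, q, P, D, Q in itertools.product(range(2), range(2), range(2),
                                              range(3), range(2), range(2)):
        try:
            fit = SARIMAX(monthly_close, order=(p, d, q),
                          seasonal_order=(P, D, Q, 12)).fit(disp=False)
            if fit.aic < best[1]:          # keep the lowest-AIC model
                best = ((p, d, q, P, D, Q), fit.aic)
        except Exception:
            continue                       # some configurations fail to converge
    print(best)  # the paper converges on (1, 1, 1)x(2, 1, 0, 12)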
RESULTS

We measure the performance of the models using metrics like mean absolute error and root mean square error.

Mean Absolute Error (MAE): MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average over the test sample of the absolute differences between prediction and actual observation, where all individual differences have equal weight.

Root Mean Squared Error (RMSE): RMSE is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of the squared differences between prediction and actual observation.
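Written out (standard definitions, not printed in the original), with y_i the actual and \hat{y}_i the predicted closing price over n test points:

    \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert

    \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}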
Model             MAE      RMSE
LSTM (Model 1)    294.26   393.98
LSTM (Model 2)    43.64    45.15
XGBoost           365.84   387.96
SARIMA            152.44   169.31

The results of all the models for predicting the closing price of Bitcoin for 6269 days are shown in the prediction plots. It can be seen from the figures that the predicted values of the models are close to the actual values, and the directions of the changes are also highly consistent. The results showed the training and validation losses converging, revealing a good fit for our LSTM model. Based on the loss-function history of the model, the forecasting error remains stable after 50 epochs.

(Figure: prediction of the SARIMA time series model, with the red line showing the values predicted by the model.)

CONCLUSION

Deep learning models such as the LSTM are evidently effective for Bitcoin prediction, with the LSTM very capable of recognising longer-term dependencies. However, the high variance of a task of this nature makes it difficult to translate this into impressive validation results; as a result, it remains a difficult task. There is a fine line between overfitting a model and preventing it from learning sufficiently, and dropout is a valuable feature to assist in improving this. Despite the metrics of sensitivity, specificity and precision indicating good performance, the actual performance of the SARIMA forecast based on error was significantly worse than the neural network models. The LSTM outperformed the other models marginally, but not significantly; however, the LSTM takes considerably longer to train. The XGBoost model gave good results, but was overfit during training. One limitation of the research is that the model has not been implemented in a practical or real-time setting for predicting into the future, as opposed to learning what has already happened. In addition, the ability to predict using streaming data should improve the model. There are many more social, economic and political factors involved in Bitcoin prices that can largely affect the prices, like social hype. However, the LSTM and even the XGBoost models scale well to the data and predict the price well, with an error in the tens of US dollars.

REFERENCES

1. Mittal, Ruchi, Shefali Arora and Mohinder Pal Singh Bhatia. "Automated Cryptocurrencies Price Prediction Using Machine Learning." International Journal of Soft Computing 8 (2018): 1758-1761. doi:10.21917/ijsc.2018.0245.
2. Saad, Muhammad and Aziz Mohaisen. "Towards Characterizing Blockchain-Based Cryptocurrencies for Highly-Accurate Predictions." IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (2018): 704-709.
3. Velankar, Siddhi, Sakshi Valecha and Shreya Maji. "Bitcoin Price Prediction Using Machine Learning." 2018 20th International Conference on Advanced Communication Technology (ICACT) (2018): 144-147.
4. McNally, Sean, Jason Roche and Simon Caton. "Predicting the Price of Bitcoin Using Machine Learning." 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP) (2018): 339-343.
5. Wu, Chih-Hung, Chih-Chiang Lu, Yu-Feng Ma and Ruei-Shan Lu. "A New Forecasting Framework for Bitcoin Price with LSTM." 2018 IEEE International Conference on Data Mining Workshops (ICDMW) (2018): 168-175. doi:10.1109/ICDMW.2018.00032.
6. C. Olah, "Understanding LSTM Networks," http://colah.github.io/posts/2015-08-Understanding-LSTMs/, 2015.
7. Jason Brownlee, "A Gentle Introduction to XGBoost for Applied Machine Learning," https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/.
8. Jason Brownlee, "A Gentle Introduction to SARIMA for Time Series Forecasting in Python," https://machinelearningmastery.com/sarima-for-time-series-forecasting-in-python/.
