

Article
Study of Precipitation Forecast Based on Deep
Belief Networks
Jinglin Du 1,2,*, Yayun Liu 1,2 and Zhijun Liu 3
1 School of Electronic and Information Engineering, Nanjing University of Information Science and
Technology, Nanjing 210044, China; 20152281493@nuist.edu.cn
2 Jiangsu Key Laboratory of Meteorological Observation and Information Processing, Nanjing University of
Information Science and Technology, Nanjing 210044, China
3 Jiangsu Longchuan Water Conservancy Construction Co., Ltd., Yangzhou 225200, China;
liuzhij2018@163.com
* Correspondence: jldu@nuist.edu.cn; Tel.: +86-025-5823-5191

Received: 21 July 2018; Accepted: 30 August 2018; Published: 4 September 2018

Abstract: Weather forecasting affects human life worldwide. To better reflect current trends in weather change, it is necessary to study precipitation prediction and to provide timely and complete precipitation information for climate prediction and early-warning decisions, so that serious meteorological disasters can be avoided. For the precipitation prediction problem in the era of climate big data, we propose a new method based on deep learning. In this paper, we apply deep belief networks to weather precipitation forecasting. Deep belief networks transform the feature representation of the data in the original space into a new feature space with semantic features, improving predictive performance. The experimental results, compared with other forecasting methods, show the feasibility of deep belief networks in the field of weather forecasting.

Keywords: deep belief networks; precipitation prediction

1. Introduction
Whether it is sailing, navigation, agriculture, or travel, accurate weather forecasts are always
needed. In recent years, science and technology have developed rapidly, the methods of weather data
collection have become more diverse, and more and more meteorological data can be collected. It is
always a challenge to improve the accuracy of traditional weather forecasting with large amounts of
collected meteorological data. Information and computer technology have driven the development of
massive data analysis capabilities, stimulated the interest of many research groups in using machine
learning techniques for big data research, and pushed them to explore hidden correlations of weather
forecasting with big data sets. Our mission is to build a powerful weather forecasting model that uses
a large amount of weather data to reveal hidden data associations in the data and ultimately improve
the accuracy of weather forecasts.
Precipitation forecasting is the core of the meteorological forecasting system. Improving the
accuracy of precipitation prediction results is crucial to improving the forecast results of the entire
meteorological forecasting system. Precipitation prediction is a complicated systematic project.
The establishment of a meteorological forecasting system involves not only the collection and storage
of data, such as climate, geography, and environment, but also accurate predictions based on the
obtained data. This has always been a hot issue in the field of meteorological forecasting. Currently,
precipitation data is collected mainly in the following three ways: measurement by rain gauges,
satellite-derived rainfall data, and radar rainfall estimation [1]. The three acquisition methods have

Algorithms 2018, 11, 132; doi:10.3390/a11090132 www.mdpi.com/journal/algorithms



their own advantages and disadvantages. Although the precipitation data obtained by rain gauges
is accurate, it only reflects the precipitation in a small area, with poor spatial representativeness.
Precipitation data from satellites and radars have a high coverage area, but the data accuracy is not
very satisfactory. Therefore, the precipitation data collected by the automatic weather station is the
most reliable data among the precipitation observation data. However, due to the limitation of the
geographical environment and funds, the automatic weather station cannot be evenly distributed,
so the observation data is inevitably distributed unevenly in time and space. Although the
accuracy of precipitation data from rain gauges is high, the data lacks continuity in time and space,
and it is difficult to reflect the overall trend of regional climate change. Therefore, the existing ground
weather station cannot meet the increasingly demanding accuracy requirements of today’s precipitation
products, and there is an urgent need for research breakthroughs.
Nowadays, how to improve the accuracy of forecasts is a hot and difficult topic in the field of
forecasting. In the era of big data, using the large amount of collected weather data to improve
the accuracy of traditional weather forecasts has always been a challenge.
Our task is to create a powerful weather forecasting model that uses a large amount of weather data
to reveal hidden data associations in the weather data and eventually improve the accuracy of the
weather forecast.

2. Related Work
For the requirement of high quality and high resolution precipitation products, the National
Oceanic and Atmospheric Administration (NOAA), National Severe Storms Laboratory (NSSL),
and the National Weather Service (NWS) Hydrology Development Office jointly developed the
NMQ plan (the National Mosaic and Multi-sensor QPE Project), which produces real-time quantitative
precipitation estimations and a variety of high-resolution QPE products [2,3]. Meanwhile, the freely
available MPing software (Version 2.0) is used to collect crowd-sourced meteorological data: installed
on smart mobile devices, it gathers the meteorological information around each user's location and
transmits it to a server to complement the dual-polarization radar data.
The weather forecasting method has achieved long-term development in recent years, with
a lot of scholars establishing some prediction models about precipitation forecasting, such as the
ARIMA (Autoregressive Integrating Moving Average) model [4–7], Markov model [8–10], gray
theory-based prediction model [11], and so on. These studies have contributed to the development
of the precipitation forecast. However, there are some shortcomings that should be further studied.
The Markov model and gray model-based forecasting model are more suitable for the exponential
growth of rainfall. The prediction error of extremum is larger in the ARIMA model. Deep learning
is a novel machine learning method proposed in the field of artificial intelligence in recent years.
Deep learning can be an effective big data processing method by training big data, mining and
capturing the deep connection between big data to improve the classification and prediction accuracy.
In addition, the deep learning model trains quickly, and, as the number of training samples increases,
its performance improves more than that of conventional methods. The weather forecast model
based on deep learning is better equipped to overcome the shortcomings of the existing forecast
methods. Due to the achievements of deep learning algorithms in various fields, more and more
people have tried to use deep learning algorithms in the field of weather forecasting, and some
progress has been made. Researchers have done a lot of important work in this area. For example,
Hsu demonstrated the potential of identifying the structure and parameters of a three-layer feed-forward
artificial neural network (ANN) model, which provided a better representation of the rainfall-runoff
relationship of the medium-size Leaf River basin near Collins, Mississippi, than the linear ARMAX
(autoregressive moving average with exogenous inputs) and the conceptual SAC-SMA (Sacramento
soil moisture accounting) model [12]. Liu proposed that the deep neural network (DNN) model
may granulate the features of the raw weather data layer by layer to process massive volumes of
Algorithms 2018, 11, 132 3 of 11

weather data [13]. Belayneh and Adamowski evaluated the effectiveness of three data-driven models,
the artificial neural networks (ANNs), support vector regression (SVR), and wavelet neural networks
(WN), by using the standard precipitation index for forecasting drought conditions in the Awash River
Basin of Ethiopia [14]. Afshin proposed a long term rainfall forecasting model using the integrated
wavelet and neuro-fuzzy long term rainfall forecasting model [15]. The other study has shown
the applicability of an ensemble of artificial neural networks and learning paradigms for weather
forecasting in southern Saskatchewan, Canada [16]. Valipour forecasted annual precipitation based on
a non-linear autoregressive neural network (NARNN) and non-linear input–output (NIO) models with
historical precipitation data, and the results showed that the accuracy of the NARNNX (non-linear
autoregressive neural network with exogenous input) was better than that of the NARNN and NIO,
based on values of r [17]. Ha used DBN (Deep belief
networks) to improve precipitation accuracy with the past precipitation, temperature, and parameters
of the sun and moon’s motion in Seoul. Their experiment proved that the DBN performed better than
the MLP (Multi-Layer Perceptron) for forecasting precipitation [18].
Under the background of meteorological big data, deep learning technology can use massive
multi-source meteorological data and take sufficient observation data as training samples to ensure the
accuracy of the weather forecasting model. The deep learning model can explore the inherent data
relationship between meteorological elements in depth, and establish a more accurate proxy model of
complex mechanism models between weather conditions and meteorological elements.
Based on the above reasons, this study proposes an effective rainfall forecasting model based
on deep belief networks. The purpose of our survey is to explore the potential of deep learning
technologies for weather forecasting. We will compare several prediction models, namely the support
vector machine (SVM), the support vector machine based on particle swarm optimization (PSO-SVM),
and the deep belief network (DBN). The remainder of this paper is organized as follows: in the
following section, an introduction to PSO-SVM and DBN is presented; the Results and Discussion
section describes the features of the data used for precipitation prediction, as well as the comparative
results among these prediction models; and the last section states our conclusions.

3. Material and Methods

3.1. SVM Based on the PSO


The support vector machine with particle swarm optimization (PSO-SVM) was used to analyze
and process the precipitation data in the literature [19]. SVM has many unique advantages in solving
small-sample, nonlinear, and high-dimensional pattern recognition problems. It was proposed by Vapnik
in 1995 [20]. An artificial neural network model based on a hybrid genetic algorithm and particle
swarm optimization (HGAPSO) optimization was proposed as an intelligent method for predicting
natural depletion of asphaltenes [21]. SVM is a classification method based on statistical learning and
Vapnik–Chervonenkis (VC) dimensional theories [22].
The literature [19] introduced the particle swarm optimization algorithm to search the training
parameters in global space to improve the accuracy of the model for precipitation prediction.
A combination of the PSO algorithm and the SVM model can effectively solve the parameter selection
problem of the SVM algorithm. In the literature [19], the PSO algorithm is used to optimize the
parameters as follows: initialize the particle swarm parameters, including the population size,
the maximum iteration number, and the SVM parameters, C and g; then search for the optimal particle
in the global space using a cross-validation algorithm. Inputting the training data into the SVM model
with the optimal parameters yields the trained PSO-SVM model. Figure 1 is the flowchart of the
PSO-SVM model.
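The search procedure above can be sketched as follows. This is a minimal illustration, not the implementation used in [19]: the particle update constants, parameter search ranges, and the toy regression data are assumptions, and scikit-learn's SVR stands in for the SVM model.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

# Toy stand-in for the precipitation training data.
X, y = make_regression(n_samples=120, n_features=6, noise=0.1, random_state=0)

def fitness(params):
    """Cross-validated score of an SVR with log-scaled C and gamma."""
    C, g = 10.0 ** params[0], 10.0 ** params[1]
    return cross_val_score(SVR(C=C, gamma=g), X, y, cv=3).mean()

rng = np.random.default_rng(0)
n_particles, n_iter = 10, 20
w, c1, c2 = 0.7, 1.5, 1.5                        # inertia / acceleration weights (assumed)
pos = rng.uniform(-2, 2, size=(n_particles, 2))  # particles encode log10(C), log10(gamma)
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    # Standard PSO velocity/position update toward personal and global bests.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -3, 3)
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

# Train the final model with the PSO-selected parameters.
C_opt, g_opt = 10.0 ** gbest
model = SVR(C=C_opt, gamma=g_opt).fit(X, y)
```

The cross-validation score plays the role of the particle fitness, so the swarm converges toward (C, g) pairs that generalize rather than merely fit the training set.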
Figure 1. The flowchart of the support vector machine with particle swarm optimization (PSO-SVM) model.
3.2. Deep Belief Network

Deep belief networks are developed based on multi-layer restricted Boltzmann machines (RBMs) [23].
The algorithm is composed of a multi-layer restricted Boltzmann machine (RBM) and a BP (back
propagation) network. The top layer of the BP network fine-tunes the RBM layers below to improve
the performance of the algorithm. The restricted Boltzmann machine is divided into two parts.
The first part is a visible layer, V, which is used to receive the feature data, and the second part
is a hidden layer, H, which is used as a feature detector, including abstract features of the data [24].
This is illustrated in Figure 2 below.

[Figure: visible layer v connected upward through weights W1, W2, W3 to hidden layers h1, h2, h3.]

Figure 2. The structure chart of deep belief networks.
The study [25] showed that RBMs can be stacked and trained in a greedy manner to form so-called
deep belief networks (DBN). DBNs learn to extract a deep hierarchical representation of the training
data. The joint distribution between the input vector, x, and the l hidden layers, h^k, is as follows:

P(x, h^1, ..., h^l) = ( ∏_{k=0}^{l−2} P(h^k | h^{k+1}) ) P(h^{l−1}, h^l)    (1)

where x = h^0, P(h^{l−1}, h^l) is the visible–hidden (V–H) joint distribution of the top-level RBM,
and P(h^k | h^{k+1}) is the conditional distribution for the units of the visible layer conditioned
on the units of the hidden layer of the RBM at level k.

This paper uses a fast layer learning algorithm for deep belief nets proposed by Hinton [26].
The principle of greedy layer-wise unsupervised training can be applied to DBNs with RBMs as the
building blocks for each layer [25,27]. The process is as follows:

Step 1. Train the raw input, x = h^(0), as the first RBM layer. The first layer is its visible layer.
Step 2. The hidden layer of the first RBM layer is used as the visible layer of the second RBM layer.
The output of the first layer is used as the input of the second layer. This representation can be
chosen as the samples of p(h^(1) | h^(0)) or the mean activations of p(h^(1) = 1 | h^(0)).
Step 3. Take the transformed samples or mean activations as training examples to train the second
layer as an RBM.
Step 4. Repeat Step 2 and Step 3, propagating upward either samples or mean activations at each iteration.
Step 5. When the training period is reached, or the stop condition is satisfied, end the iteration.
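The greedy layer-wise procedure above can be sketched with scikit-learn's BernoulliRBM, which trains a single RBM by contrastive divergence and returns the mean hidden activations via transform. The layer sizes, learning rates, and toy data below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.random((500, 7))          # toy data in [0, 1], e.g. normalized weather features

# Step 1: the raw input x = h(0) is the visible layer of the first RBM.
rbm1 = BernoulliRBM(n_components=16, learning_rate=0.01, n_iter=20, random_state=0)
h1 = rbm1.fit_transform(X)        # mean activations p(h1 = 1 | h0)

# Steps 2-3: the hidden activations of the first RBM become the visible layer
# of the second RBM, which is trained on them as ordinary input data.
rbm2 = BernoulliRBM(n_components=8, learning_rate=0.01, n_iter=20, random_state=0)
h2 = rbm2.fit_transform(h1)

# Step 4 would repeat this upward for deeper stacks; Step 5 stops here at a fixed depth.
```

Each fit call is purely unsupervised, which is exactly what makes the stacking "greedy": every layer is trained to completion before the next one ever sees data.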
In this paper, we focus on fine-tuning by supervised gradient descent (SGD). We use a logistic
regression classifier (LRC) to classify the input vector, x, based on the output of the last hidden
layer, h^(l), of the DBN. Fine-tuning is then performed via SGD of the negative log-likelihood cost
function. The program module is shown in Figure 3.

Figure 3. The flow chart of the deep belief networks model.

Each module in Figure 3 is described as follows:

(1) Import data sets
Import preprocessed weather data from the database. The data is divided into three data sets:
training data, verification data, and test data. Store the data and the data labels in data_set
and data_label_set.
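A minimal sketch of this module, using synthetic stand-ins for the database records; the 60/20/20 split ratio is an assumption, since the paper does not state the exact proportions of the three sets.

```python
import numpy as np

rng = np.random.default_rng(0)
data_set = rng.random((1000, 7))              # preprocessed weather records (toy stand-in)
data_label_set = rng.integers(0, 2, 1000)     # 1 = precipitation, 0 = none (assumed labels)

# Shuffle once, then cut into training / verification / test sets (assumed 60/20/20).
idx = rng.permutation(len(data_set))
n_train, n_valid = int(0.6 * len(idx)), int(0.2 * len(idx))
train_idx = idx[:n_train]
valid_idx = idx[n_train:n_train + n_valid]
test_idx = idx[n_train + n_valid:]

train_set = (data_set[train_idx], data_label_set[train_idx])
valid_set = (data_set[valid_idx], data_label_set[valid_idx])
test_set = (data_set[test_idx], data_label_set[test_idx])
```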
(2) Convert data format

The format of the read data is a matrix, which requires further processing. The data set can be
loaded into a shared variable, reducing the overhead of repeatedly copying data and improving the
efficiency of the program. At the same time, the labels of the data set are converted into a
one-dimensional vector to facilitate computation.

(3) Establish DBN model

Initialize the parameters. Set the fine-tuning learning rate, finetune_lr = 0.1, the maximum iteration
number of pre-training, pretraining_epochs = 100, the learning rate of pre-training, pretrain_lr = 0.01,
the maximum iteration number of training, training_epochs = 100, and the batch size, batch_size = 10.

(4) Pre-training model


Divide the training data into several minibatches to reduce the overhead of building models.
At the same time, we give an index to every minibatch. An RBM is trained on each minibatch according
to the index value, establishing a function model. Through continuous iteration, we build function
models on all minibatches, obtaining a series of model functions.
(5) Fine-tuning the model

Create three functions to complete the fine-tuning of the model. The three functions calculate the
loss function on a batch of the training, verification, and test sets. During the fine-tuning,
we perform stochastic gradient descent through the MLP to find the DBN model with the best loss value.
(6) Results and Models

We test the test data with the best DBN model and obtain the test results: the best test accuracy
on the test set and the time spent in the pre-training and fine-tuning phases of the program.
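Modules (3)–(6) can be approximated with a scikit-learn pipeline: stacked BernoulliRBM layers are pre-trained greedily, and a logistic regression head is trained on the last hidden layer. Note that this sketch does not back-propagate through the RBM layers during fine-tuning, so it is only an approximation of the full DBN training described above; the layer sizes and toy data are assumptions.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.random((600, 7))                      # normalized features in [0, 1]
y = (X[:, -1] > 0.5).astype(int)              # toy rain / no-rain labels (assumed)

dbn = Pipeline([
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.01, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.01, n_iter=20, random_state=0)),
    ("lrc", LogisticRegression(max_iter=500)),   # supervised head over h(l)
])
dbn.fit(X[:480], y[:480])                     # pre-train the stack, then fit the LRC head
err = 1.0 - dbn.score(X[480:], y[480:])       # error rate on the held-out test portion
```

The logistic regression step minimizes the negative log-likelihood, matching the cost function named in the text, but only the head's weights are updated here.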

4. Results and Discussion


4.1. Data Collection and Preprocessing
For this study, we use one year of ground-based meteorological data from Nanjing Station
(No. 58238). The data sets are downloaded from the China Meteorological Data Network, and the
information of the Nanjing station is shown in Table 1.

Table 1. Station information of China Ground Weather Station.

Province  Station No.  Station Name  Latitude  Longitude  Air Pressure Sensor Pull Height (m)  Observatory Height (m)
Jiangsu   58238        Nanjing       31.56     118.54     36.4                                 35.2

The dataset contains atmospheric pressure, sea level pressure, wind direction, wind speed, relative
humidity, and precipitation. Data is collected every three hours. As shown in Table 2, the first line is
the attributes of the original meteorological data set; PRS represents atmospheric pressure, PRS_Sea is
the sea level pressure, WIN_D is the wind direction, WIN_S is the wind speed, TEM is the temperature,
RHU is the relative humidity, and PRE_1h is the hourly precipitation.

Table 2. Original meteorological data.

PRS (hPa)  PRS_Sea (hPa)  WIN_D (°)  WIN_S (0.1 m/s)  TEM (°C)  RHU (%)  PRE_1h (mm)
1031.2     1035.8         89         2.5              77        2        0
1030.8     1035.4         113        2.9              61        6.4      0
1027.3     1031.9         153        2.1              49        8.3      0
1026.2     1030.8         122        2                55        7.1      0
...
1027.1     1031.7         121        0.7              71        4.1      0

4.2. Data Normalization

In general, data need to be normalized when the sample data are scattered and the sample span is
large, so that the data span is reduced for model building and prediction. In the DBN modeling,
to improve the accuracy of prediction and smooth the training procedure, all the sample data were
normalized into the interval [0, 1] using the following linear mapping formula:

X′ = (xi − xmin) / (xmax − xmin), i = 1, 2, 3, ..., N    (2)

where X′ is the mapped value; xi is the i-th input value from the experimental data; N is the total
number of samples; and xmax and xmin denote the maximum and minimum values of the initial data,
respectively.
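The [0, 1] min–max mapping described above can be implemented directly; the helper name below is our own, and the sample PRS values are taken from Table 2 purely for illustration.

```python
import numpy as np

def min_max_normalize(x):
    """Map each value of x into [0, 1] via (x_i - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

# Example with a few PRS readings (hPa) from Table 2:
prs = [1031.2, 1030.8, 1027.3, 1026.2, 1027.1]
normalized = min_max_normalize(prs)   # values scaled into [0, 1]
```

Applied column-wise to the full feature matrix (min/max taken per column with axis=0), this puts pressure, wind, temperature, and humidity on a common scale before they enter the DBN.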

4.3. Algorithm Validation

For the assessment of the algorithms' results, calibration and external testing were carried out.
In the calibration assessment, the models were developed using the training set and validated with
the same one. Finally, the prediction results were obtained via an external validation, training and
testing the models with the training and test datasets, respectively.

In this paper, the number of samples was 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, and 2000.
For each sample size, 80% of the data were randomly selected as the training set for constructing
the algorithm model, and the remaining 20% of the data were used as the test set to validate the
model accuracy.

From the analysis in the third section, we know the process of establishing the DBN model. We used
the data processing method described in the third section to establish a DBN model for the
pre-processed meteorological data. According to the data processing flow, the following results
are obtained.

Figure 4 shows how the time required for the pre-training part changes with the number of samples.
The figure shows that as the number of samples increases, the time required for pre-training
increases too. When the number of samples is small, the curve has a linear relationship with the
number of samples; when the number of samples reaches a certain size, such as 1400, 1800, and
2000 points, the time consumed is almost the same, but the training time of the 1600 point is less
than that of the 1400 and 1800 points, and higher than the pre-training time of 1200 samples.

Figure 4. Time-varying curve of pre-training with different sample sizes.

We fine-tuned the pre-training model that was obtained, used the verification set to obtain the
error rate of the fine-tuned model, and selected the model with the smallest error rate as the best
model. The result in Figure 5 is the best error rate obtained by verifying the prediction effect of
the model on the verification set. When the number of samples gradually increases, the error of the
pre-training model on the validation set gradually becomes stable. The error rate at the 200 point
is the highest compared with the other sample sizes. Because of the small size of the data set,
it cannot establish a better pre-training model, so the error is higher than the others.
The subsequent curve changes smoothly, and the error rate is stable at around 11%.
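The evaluation protocol above (vary the sample size from 200 to 2000, split 80/20 at random, record the error rate) can be sketched as follows. LogisticRegression is only a placeholder for the DBN model, and the synthetic data is an assumption; the point is the looping/splitting protocol, not the classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_all = rng.random((2000, 7))                                  # toy weather features
y_all = (X_all[:, 0] + 0.1 * rng.standard_normal(2000) > 0.5).astype(int)

error_rates = {}
for n in range(200, 2001, 200):              # sample sizes 200, 400, ..., 2000
    idx = rng.permutation(len(X_all))[:n]    # random subset of n samples
    cut = int(0.8 * n)                       # 80% training, 20% test
    tr, te = idx[:cut], idx[cut:]
    clf = LogisticRegression(max_iter=500).fit(X_all[tr], y_all[tr])
    error_rates[n] = 1.0 - clf.score(X_all[te], y_all[te])
```

Plotting error_rates against n reproduces the kind of sample-size curves shown in Figures 5 and 6.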
Algorithms 2018, 11, x FOR PEER REVIEW 8 of 11

Algorithms 2018, 11, 132 8 of 11


Algorithms 2018, 11, x FOR PEER REVIEW 8 of 11
Algorithms 2018, 11, x FOR PEER REVIEW 8 of 11

Figure 5. The error curve of the pre-training model in the validation set with different sample sizes.

The curve in
Figure Figurecurve 6 is thethe error rate curve obtained by training withthe test datasample
on the model. The
Figure5. 5.The
Theerror
error curveof of thepre-training
pre-training model
model in the validation set with different sample sizes.

The curve in Figure 6 is the error rate curve obtained by running the test data through the model. The average error rate on the verification data is 11.17%, and the average error rate on the test data is 10.55%, which is lower than the error rates on the training and verification data. The lowest error rate occurs at the 200-sample point; due to the small sample size, the error rate at this point changes significantly. The error rate at the 1600-sample point is 10%: although the pre-training time at this point is shorter than at the others, its error rate is no higher than that of the other sample sizes. On the whole, the error rate when the sample size of the data is small is larger than the error rate when the sample size of the data is large.
Figure 6. The error rate curve of the test data with the model.
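The error rates quoted above are simple misclassification fractions. A minimal sketch of how such a rate is computed (the label and prediction arrays below are illustrative, not the paper's data):

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# Illustrative check: 2 mismatches out of 8 samples -> 25% error rate.
labels      = [0, 1, 1, 0, 1, 0, 0, 1]
predictions = [0, 1, 0, 0, 1, 1, 0, 1]
print(error_rate(labels, predictions))  # 0.25
```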
Figure 7 shows the time curve of the model training test data. As the number of samples increases, the model training time on the test data gradually increases; only the time at the 1000-sample point is slightly higher than that of the surrounding points.
Figure 7. The time curve of the model training test data.
Experiments show that the application of DBN is effective and feasible in the field of precipitation prediction.
The deep belief network used in this article is implemented on the Python platform using the Theano package, which gives the program very good scalability and facilitates its application. The Python platform has a large number of open-source frameworks that can be applied to big data processing, and it has very broad prospects in this field.
As shown in Figure 8, the blue line indicates the time required for the DBN to make predictions. M-SVM is based on mesh (grid) optimization, GA-SVM is based on genetic algorithm optimization, and PSO-SVM is based on particle swarm optimization. Because the DBN consumes a lot of time on model establishment beforehand, the time it consumes when the model is used for prediction is lower than that of all the SVM methods, which must still establish the model; additionally, a certain accuracy rate is guaranteed. In practice, it is feasible to establish a good model from historical data in advance, and doing so does not affect the result of predicting precipitation. Therefore, an SVM method with excellent performance can be used when the data set is small, while the high efficiency of the DBN method can be exploited for large-scale data sets.
Figure 8. Several model time comparison curves.
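The train-once, predict-fast property described above can be made concrete with a toy timing comparison. This is a sketch under the assumption that model establishment dominates; the stand-in "model" below is deliberately trivial and is not any of the paper's models.

```python
import time

def train_model(n_iter=500_000):
    """Stand-in for expensive model establishment (pre-training + fine-tuning)."""
    acc = 0.0
    for i in range(n_iter):
        acc += i * 1e-9
    return acc

def predict(model, x):
    """Stand-in for a single cheap forward pass through an already-built model."""
    return model * x

t0 = time.perf_counter()
model = train_model()
train_time = time.perf_counter() - t0

t0 = time.perf_counter()
predict(model, 3.0)
predict_time = time.perf_counter() - t0

# Training dominates; once the model exists, each prediction is nearly free.
print(predict_time < train_time)
```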
Data preprocessing can improve the running results of the model: after preprocessing, the results of the operation are relatively smooth, so it is very necessary to preprocess the source data. According to Figures 4–8, in terms of training accuracy on the test set, the DBN model is more stable than the PSO-SVM model. However, the steps for building the two models differ: PSO-SVM optimizes the parameters (C, g) on the training set and then uses the resulting (C, g) values to rebuild the model on the test set, whereas the DBN establishes and fine-tunes its model on the training set, after which the model is invoked directly on the test set, which saves the time for building the model. Therefore, in applications with large data volumes, the model can be trained in advance and the response speed is faster.
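Preprocessing of the kind described can be as simple as per-feature min-max scaling. A minimal sketch (the three-column pressure/temperature/humidity sample is an illustrative assumption, not the paper's data set):

```python
import numpy as np

def min_max_scale(x, lo=0.0, hi=1.0):
    """Rescale each feature column of x to the range [lo, hi]."""
    x = np.asarray(x, dtype=float)
    xmin = x.min(axis=0)
    xrange = x.max(axis=0) - xmin
    xrange[xrange == 0] = 1.0  # guard against constant columns
    return lo + (hi - lo) * (x - xmin) / xrange

# Illustrative meteorological-style features: [pressure hPa, temp C, humidity %]
raw = np.array([[1013.0, 25.0, 60.0],
                [ 998.0, 18.0, 90.0],
                [1005.0, 30.0, 45.0]])
scaled = min_max_scale(raw)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

Scaling all features to a common range keeps no single element (e.g. pressure, with its large absolute values) from dominating the training signal.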
5. Conclusions
The results of this study are expected to contribute to weather forecasting for a wide range of application domains, including flight navigation, agriculture, and tourism.
This paper focuses on the increasing size of meteorological datasets, discusses the application of big data processing technology in the meteorological field, and proposes a meteorological precipitation forecasting method based on deep learning. The method is built on the deep belief network and establishes a statistical model between precipitation features and other meteorological elements from historical meteorological data. It uses meteorological big data to train the model, fully taps the potential features between data elements, and achieves precipitation forecasts based on meteorological data. The validity of the DBN model in precipitation forecasting was verified by comparison with classical machine learning prediction methods. The research shows that a forecasting method based on deep learning can overcome the shortcomings of traditional forecasting methods, especially in the context of big data, and can better tap the value of meteorological big data and improve its application effect.
Author Contributions: J.D., Y.L. and Z.L. conceived and designed the experiments, analyzed the data and wrote
the paper.
Funding: This research was funded by the National Nature Science Foundation of China grant number [41575155],
and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Acknowledgments: The authors would like to thank the anonymous reviewers for their constructive comments,
which greatly helped to improve this paper.
Conflicts of Interest: The authors declare no conflict of interest.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).