
2016 International Conference on Computational Intelligence and Networks

WEATHER MONITORING USING ARTIFICIAL INTELLIGENCE


T. R. V. Anandharajan¹, G. Abhishek Hariharan², K. K. Vignajeth³, R. Jijendiran⁴, Kushmita⁵

¹,²,³,⁴ Velammal Institute of Technology, Chennai, Tamil Nadu, India

¹ Assistant Professor; ²,³,⁴,⁵ Graduate Students
{trvanandharajan, abhishekhariharan1995, vignajeth, jijendiranravichandran, kushmithavijay}@gmail.com
Abstract - Weather forecasting is a statistical estimate rather than a binary decision. We develop an intelligent weather-prediction module, since such a tool has become a necessity. The tool analyzes measurements such as maximum temperature, minimum temperature and rainfall over a sampled period of days. It then makes an intelligent prediction from the available data using machine learning techniques. The analysis and prediction are based on linear regression, which predicts the next day's weather with good accuracy: an accuracy of more than 90% is obtained on the dataset. Recent studies have shown that machine learning techniques achieve better performance than traditional statistical methods. Machine learning, a branch of artificial intelligence, has proved to be a robust method for analyzing a given data set and making predictions from it. The module plays a vital role in agricultural, industrial and logistical fields, where the weather forecast is an important criterion.

Index Terms: Weather forecasting, machine learning, artificial intelligence, linear regression.

I. INTRODUCTION
Weather prediction is, in general, a complex and challenging task that requires various parameters. Monitoring and predicting the weather helps in fields such as agriculture, travel, pollution dispersal, communication and disaster management. Hence, weather forecasting plays a vital role in day-to-day life, serving needs that range from those of a common man to those of research scientists. This is why forecasting cannot rely on simple means. At present, high-definition satellite images are used to forecast the upcoming days accurately, but the process is neither simple nor economical. The module presented here predicts the weather from past data and analyzes it with a good rate of accuracy, while remaining simple. The module uses concepts from artificial intelligence and machine learning tools; among the various tools, we have chosen the linear regression technique. The user must update the previous days' weather parameters; otherwise the module cannot apply linear regression to predict. This is because every machine learning tool relies on the constant renewal of past data.

II. RELATED WORKS
The authors of [1] dealt with the prediction of atmospheric temperature using support vector machines (SVM), which helped us understand the limitations of SVM. Prediction intervals computed from hydrological data, which helped us understand prediction uncertainty, were discussed in [2]. The authors of [4] predicted the amount of solar energy generated using weather forecasts, providing an example of how forecasts are used in daily life. Prediction of the maximum temperature using a support vector machine, discussed in [3], helped us with the temperature-prediction process. In [5], the authors gave an intuition of the different kernels used in support vector machines. Weather forecasting using an artificial neural network (ANN) was presented in [7]. From the literature survey we gained an intuition of how to proceed with our work.

2375-5822/16 $31.00 © 2016 IEEE
DOI 10.1109/CINE.2016.26

III. FUNCTIONS

The normal equation plays a vital role in predicting the type of the forecast day, i.e., whether it will be a rainy, sunny or cloudy day.

A. HYPOTHESIS:

In a supervised learning problem, the output is predicted from the given inputs, and for the training data we know which output corresponds to each input. The aim is to find a hypothesis function whose predictions are as close as possible to the outputs. Put simply, the cost function selects the hypothesis that has the least distance to the output.

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n \tag{1}$$

where $h_\theta(x)$ is the hypothesis function, $x_1, \dots, x_n$ are the inputs (collected, one training example per row, in the matrix X) and θ is the parameter vector corresponding to X. θ is the set of values that minimizes the mean squared error.

B. COST FUNCTION:

The cost function J(θ) is a mathematical function that measures how close the hypothesis is to the output; minimizing it minimizes the mean squared error. The cost function is given in equation (2):

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \tag{2}$$

where J(θ) is the cost function, $h_\theta(x)$ is the hypothesis function, m is the number of training examples and y is the output. The cost function measures the distance between the hypothesis curve and the output curve, and it should be made as small as possible.

C. GRADIENT DESCENT:

Gradient descent is an iterative optimization algorithm. It uses the derivative of the cost function to update θ so that the cost function decreases over repeated iterations. In MATLAB, the update in equation (3) is implemented to make the cost function converge by adjusting θ in consecutive steps.

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta) \tag{3}$$

where $\theta_j$ is the j-th parameter at the current iteration, α is the learning rate and J(θ) is the cost function. All $\theta_j$ are updated simultaneously.

D. NORMAL EQUATION:

The normal equation gives the best value of θ for the hypothesis directly, without the iterations required by gradient descent:

$$\theta = (X^{T} X)^{-1} X^{T} Y \tag{4}$$

where X is the input matrix and Y is the output vector.

E. MULTICLASS CLASSIFICATION:

In multiclass classification the output takes more than two values, i.e., more than the usual 0 and 1. In this case we need to predict one of three classes: sunny, rainy or cloudy. Hence we use multiclass classification, which is based on the logistic regression algorithm but can handle more than two classes.

IV. FLOWCHART

[Figure: flowchart of the module. Its blocks are Weather data; Labelling data to different climates; Hypothesis; Cost function; Gradient descent; Error verification (training set, cross validation set, test set); Weather prediction.]
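Equations (1) to (3) can be illustrated with a short sketch. The Python code below is not the authors' MATLAB implementation. It is a minimal batch gradient descent for the linear hypothesis of equation (1). The function names, the learning rate `alpha`, the iteration count and the toy data are assumptions made only for illustration.

```python
def hypothesis(theta, x):
    """Equation (1): h(x) = theta0 + theta1*x1 + ... + thetan*xn (x excludes the bias term)."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def cost(theta, X, y):
    """Equation (2): J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    return sum((hypothesis(theta, x) - yi) ** 2 for x, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha=0.1, iterations=2000):
    """Equation (3): theta_j := theta_j - alpha * dJ/dtheta_j, all j updated together.

    alpha and iterations are illustrative defaults, not values from the paper.
    """
    m, n = len(y), len(X[0])
    theta = [0.0] * (n + 1)  # start from zeros, so h(x) = 0 initially
    history = []
    for _ in range(iterations):
        errors = [hypothesis(theta, x) - yi for x, yi in zip(X, y)]
        # Partial derivatives of J: bias term first, then one per feature.
        grad = [sum(errors) / m]
        grad += [sum(e * x[j] for e, x in zip(errors, X)) / m for j in range(n)]
        # Simultaneous update of every theta_j.
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        history.append(cost(theta, X, y))
    return theta, history


if __name__ == "__main__":
    # Hypothetical toy data with one feature, exactly y = 1 + 2*x1.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [1.0, 3.0, 5.0, 7.0]
    theta, history = gradient_descent(X, y)
    print([round(t, 4) for t in theta])  # -> [1.0, 2.0]
```

The sketch starts from θ = 0, so h(x) = 0 at first, matching the initialization described in Section VI. When `alpha` is small enough, the recorded cost falls at every step. Unscaled weather features would probably need a smaller learning rate or feature scaling.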

V. METHODOLOGY
Weather cannot be predicted with a high percentage of accuracy. Forecasting the weather with very low deviation and consistently good results is an art; in practice, forecasts tend to deviate and have only moderate accuracy.

The error verification process is explained in the subsection on errors and their detection.

VI. WEATHER CLASSIFIER ANALYSIS

The plots in Fig. 1.3, Fig. 1.4 and Fig. 1.5 show three curves: maximum temperature (°C) vs. number of days, minimum temperature (°C) vs. number of days, and rainfall (mm) vs. number of days. The curves are obtained by minimizing the cost function with gradient descent, and the resulting theta is used to plot the curve (hypothesis). The curves are nonlinear because a polynomial function is used to fit the data.
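The curved hypotheses of Figs. 1.3 to 1.5 can be sketched as a polynomial in the day number, fitted by gradient descent. In the sketch below, several details are assumptions the paper does not state: the polynomial degree (2), the centring and scaling of the day number, the learning rate and the synthetic series. The function names are hypothetical.

```python
def poly_features(day, degree, center, scale):
    """Feature vector [1, z, z^2, ..., z^degree] with z = (day - center) / scale."""
    z = (day - center) / scale
    return [z ** p for p in range(degree + 1)]


def fit_curve(days, values, degree=2, alpha=0.1, iterations=5000):
    """Fit a polynomial hypothesis to (day, value) pairs by batch gradient descent.

    Returns a function that evaluates the fitted curve at any day, so the
    curve can also be extended beyond the last observed day.
    """
    center = sum(days) / len(days)
    scale = (max(days) - min(days)) / 2 or 1.0  # assumed scaling of the day number
    X = [poly_features(d, degree, center, scale) for d in days]
    m = len(values)
    theta = [0.0] * (degree + 1)  # start from zeros
    for _ in range(iterations):
        errors = [sum(t * x for t, x in zip(theta, row)) - v
                  for row, v in zip(X, values)]
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m
                for j in range(degree + 1)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]

    def curve(day):
        return sum(t * x for t, x in zip(theta, poly_features(day, degree, center, scale)))

    return curve


if __name__ == "__main__":
    # Synthetic, smooth "temperature" series for days 1..21 (not the paper's data).
    days = list(range(1, 22))
    values = [30 + 2 * ((d - 11) / 10) - ((d - 11) / 10) ** 2 for d in days]
    curve = fit_curve(days, values)
    print(round(curve(22), 2))  # extend the curve to day 22 -> 30.99
```

Scaling the day number keeps the powers of the feature in a similar range, which lets a single learning rate work for every θ. Calling `curve` on a day after the last observed one corresponds to extending the hypothesis curve.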

All the published results were obtained with MATLAB using a vectorized implementation. First, three weeks of data are placed in the columns of a matrix X: maximum temperature, minimum temperature, rainfall and the corresponding day. The type of each day is encoded as 1 for sunny, 2 for cloudy and 3 for rainy, and these labels are placed in a column vector Y. Theta is initialized to zeros, so the initial hypothesis h(x) is zero. The hypothesis and Y are used to compute the cost function (equation 2). The cost is then passed to gradient descent (equation 3) to obtain updated theta values, which are fed back into the cost function to obtain the new cost. This process is repeated for several iterations until theta converges and the cost function reaches its lowest value among all iterations. The hypothesis obtained from theta is plotted against the number of days; this is the hypothesis curve. The next day's maximum temperature is found by extending the hypothesis curve and reading off y (maximum temperature) at x (the corresponding day). The same procedure is used for the minimum temperature and rainfall. Finally, the predicted maximum temperature, minimum temperature and rainfall are multiplied by theta to obtain a value in the range 1 to 3. In this range, 1 represents a sunny day, 2 a cloudy day and 3 a rainy day.
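The final step turns the predicted maximum temperature, minimum temperature and rainfall into a type of day, and can be sketched as follows. The 1/2/3 encoding comes from the paper. Rounding to the nearest class and the "fraction towards the next class" are our assumptions about how a value such as 2.893 is read as a cloudy day with roughly an 89% chance of being rainy. The function names are hypothetical.

```python
# Encoding used in the paper: 1 = sunny, 2 = cloudy, 3 = rainy.
LABELS = {1: "sunny", 2: "cloudy", 3: "rainy"}


def forecast_score(theta, features):
    """Linear score theta[0] + theta[1]*f1 + ... for the predicted day's features."""
    return theta[0] + sum(t * f for t, f in zip(theta[1:], features))


def interpret_score(score):
    """Return (nearest class, lower class, upper class, fraction towards upper).

    The score is clamped to [1, 3]; 'fraction' is 0.0 at the lower class and
    1.0 at the upper class (an assumed reading of the paper's percentages).
    """
    s = min(max(score, 1.0), 3.0)
    nearest = int(s + 0.5)   # round half up; s is always positive here
    lower = min(int(s), 2)   # lower neighbouring class: 1 or 2
    return LABELS[nearest], LABELS[lower], LABELS[lower + 1], s - lower


if __name__ == "__main__":
    nearest, lower, upper, frac = interpret_score(2.893347)
    print(nearest, lower, upper, round(frac, 3))  # -> rainy cloudy rainy 0.893
```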

Fig. 1.3. Maximum temperature vs. days

In this process, finding a good value of theta is the major task in predicting the type of the day. Theta can also be found with the normal equation, by substituting the X and Y matrices into equation (4). The obtained theta is then multiplied by the predicted maximum temperature, minimum temperature and rainfall to obtain the type of the day (sunny, cloudy or rainy).
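Equation (4) can also be evaluated directly. The sketch below forms XᵀX and XᵀY and solves (XᵀX)θ = XᵀY by Gaussian elimination instead of computing the inverse explicitly. This gives the same θ whenever XᵀX is invertible. Prepending a column of ones for θ₀, the function name and the toy data are assumptions. In MATLAB, `(X'*X)\(X'*Y)` performs the same solve.

```python
def normal_equation(X, y):
    """Equation (4): theta = (X^T X)^{-1} X^T y, with a bias column of ones prepended."""
    A = [[1.0] + [float(v) for v in row] for row in X]  # design matrix with intercept
    n = len(A[0])
    XtX = [[sum(r[i] * r[j] for r in A) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(n)]
    # Augmented system [X^T X | X^T y], solved by elimination with partial pivoting.
    M = [XtX[i] + [Xty[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; check for redundant features")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    theta = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(M[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (M[i][n] - acc) / M[i][i]
    return theta


if __name__ == "__main__":
    # Hypothetical data generated exactly from y = 2 + 1*x1 - 3*x2.
    X = [[1, 2], [2, 0], [3, 1], [0, 5], [4, 4]]
    y = [-3, 4, 2, -13, -6]
    print([round(t, 6) for t in normal_equation(X, y)])  # -> [2.0, 1.0, -3.0]
```

No learning rate or iteration count is needed. The cost is solving an (n+1)×(n+1) linear system, which is cheap for the handful of weather features used here.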

Fig. 1.4. Minimum temperature vs. days

Fig. 1.5. Rainfall vs. days

The hypothesis in Figs. 1.3, 1.4 and 1.5 is a curve with the least total squared vertical distance between itself and the output points. Gradient descent runs for numerous iterations, adjusting theta to minimize the cost function; theta determines the shape of the curve (hypothesis).

A. ERRORS AND THEIR DETECTION:
The training set error and the cross validation error are calculated to understand the errors present in the prediction. Errors normally arise from overfitting (high variance) or underfitting (high bias) of the curves. 20% of the data is taken as the cross validation set and another 20% as the test set; the remaining 60% is used as the training set.

The cross validation set error and the training set error are plotted with the number of training examples on the x axis and the error on the y axis. From this plot we can infer whether the hypothesis suffers from high bias or high variance; both increase the error and should be compensated for. If the training set error is high (and the cross validation error is similarly high), the hypothesis has high bias. If the training set error is low but the cross validation set error is much higher, the hypothesis has high variance.

The values for the training set error curve are obtained from the 60% training set. The training examples are loaded into a stack one at a time. Each time, theta is computed from the data in the stack through gradient descent, and the corresponding cost function value is recorded. These cost function values are plotted in the error curves. After each cost value is obtained, another training example is loaded into the stack. The procedure is repeated for the entire training set to obtain the error curve.

For the cross validation error curve, theta is obtained through gradient descent using the entire cross validation set at once, without the stack. The stack concept is then applied: one cross validation example is loaded at a time, and the cost function value is computed with that theta.

The training and cross validation set errors for maximum temperature, minimum temperature and rainfall are shown in Figs. 1.6, 1.7 and 1.8, respectively.

Fig. 1.6. Error verification for maximum temperature vs. days

Fig. 1.7. Error verification for minimum temperature vs. days

Fig. 1.8. Error verification for rainfall vs. days
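The 60/20/20 split and the error curves described above can be sketched as follows. For the cross validation curve, this sketch uses the common textbook learning-curve variant, which differs from the procedure described in this paper. In that variant, θ is fitted on the growing training "stack" and evaluated on the whole cross validation set. Several details are assumptions: the one-feature model, the learning rate (chosen for inputs scaled to [0, 1]), the random seed and the synthetic data.

```python
import random


def split_60_20_20(rows, seed=0):
    """Shuffle rows and split them into training (60%), cross validation (20%) and test (20%) sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 6 // 10
    n_cv = len(rows) * 2 // 10
    return rows[:n_train], rows[n_train:n_train + n_cv], rows[n_train + n_cv:]


def fit(rows, alpha=0.5, iterations=2000):
    """Batch gradient descent for h(x) = t0 + t1 * x on (x, y) pairs, starting from zeros.

    alpha = 0.5 assumes x lies in [0, 1]; larger inputs need a smaller step.
    """
    t0 = t1 = 0.0
    m = len(rows)
    for _ in range(iterations):
        errors = [t0 + t1 * x - y for x, y in rows]
        g0 = sum(errors) / m
        g1 = sum(e * x for e, (x, _) in zip(errors, rows)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1


def cost(theta, rows):
    """Equation (2) for the one-feature hypothesis."""
    t0, t1 = theta
    return sum((t0 + t1 * x - y) ** 2 for x, y in rows) / (2 * len(rows))


def learning_curves(train, cv):
    """Grow the training 'stack' one example at a time, refit, and record J_train and J_cv."""
    j_train, j_cv = [], []
    for i in range(1, len(train) + 1):
        theta = fit(train[:i])
        j_train.append(cost(theta, train[:i]))
        j_cv.append(cost(theta, cv))
    return j_train, j_cv


if __name__ == "__main__":
    data = [(k / 20, 1 + 2 * k / 20) for k in range(20)]  # synthetic, noiseless
    train, cv, test = split_60_20_20(data)
    j_train, j_cv = learning_curves(train, cv)
    print(len(train), len(cv), len(test))  # -> 12 4 4
```

With noisy data, the training error typically rises and the cross validation error falls as examples are added. A large final gap between the two curves suggests high variance, while two high curves that meet suggest high bias.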

All the curves in the plot should tend towards the x axis and approach it as the number of training examples increases. The height of the end of each curve above the x axis gives the error that occurs in prediction. In our plots the curves reach the x axis, indicating very little error in the prediction process.

From the results obtained with the normal equation and with linear regression with multiple variables, we conclude that the forecast day is going to be a cloudy day with an 89% chance of being a rainy day. The forecast value of about 2.89 lies between cloudy (encoded as 2) and rainy (encoded as 3). Although the regression module fits the data well and predicts accurate results, a small amount of error is observed. The error is estimated by running the module for the parameters of past days and comparing the output with the data present in the dataset. The error for the 12th day is given in Table 1.2.

Table 1.2. ERROR VALUE
METRIC | ACTUAL VALUE | PREDICTED VALUE | % DEVIATION
Max temp (°C) | 34 | 33.57 | 1.26%
Min temp (°C) | 30 | 28.95 | 3.5%
Rainfall (mm) | | | 0%

VII. RESULTS
The next day's weather is predicted with a good rate of accuracy from the plots in Figs. 1.3, 1.4 and 1.5. The prediction comprises the maximum temperature, minimum temperature, rainfall and type of day (sunny, cloudy or rainy). Each hypothesis curve is extended, and the y-axis value is read off at the x-axis value that corresponds to the date of the next day. Using the predicted maximum temperature, minimum temperature and rainfall, the forecast is obtained in two forms.

For the 22nd day:

Using linear regression with multiple variables (theta computed from gradient descent):
Theta 1 | Theta 2 | Theta 3 | Theta 4
2.00000 | -0.858447 | 0.183849 | 0.086995
The forecast is 2.899958e+00

Using the normal equation (theta computed from the normal equations):
Theta 1 | Theta 2 | Theta 3 | Theta 4
11.447015 | -0.457802 | 0.176650 | 0.007970
The forecast is 2.893347e+00

Table 1.3. FORECASTED WEATHER
Parameters or features | 22nd day prediction | 23rd day prediction
Max. temperature (°C) | 27.8138 | 26.5903
Min. temperature (°C) | 25.42222 | 24.6687
Rainfall (mm) | 6.8656 | 8.5715
Type of the day | Rainy day | Rainy day

Table 1.3 shows the predicted parameters for the 22nd and 23rd days.

CONCLUSION
The results can also be verified with multiclass classification using logistic regression and with an artificial neural network. However, an artificial neural network or multiclass classification has a disadvantage: it outputs only the type of the day, not a graded value indicating how close the day is to each type. A support vector machine can also be used to predict the data. It works best when there are many features and classes, but redundant features must be avoided. The only effort required of the user is to keep the data set updated so that the module shows accurate results. The module works well when the data set is large enough that the forecast period is at most one-seventh of it. For example, if the data set covers 365 days, the forecast will be accurate for the first 52 days (365/7 ≈ 52). The forecast improves as the number of features and training examples increases. The module helps in monitoring and predicting the weather efficiently and with a good rate of accuracy.

ACKNOWLEDGEMENT
The idea and implementation of this paper would not have been successful without the help of our supporting professors, Dr. T. R. V. Anandharajan and M. Jeeva Bharathi, and our preceptor, Dr. Andrew Ng.

REFERENCES
[1] Y. Radhika, M. Shashi: Atmospheric Temperature Prediction using Support Vector Machines. Vol. 1, No. 1, April 2009.
[2] Durga L. Shrestha, Dimitri P. Solomatine: Machine learning approaches for estimation of prediction interval for the model output. Neural Networks 19 (2006) 225-235.
[3] Paniagua-Tineo, S. Salcedo-Sanz, C. Casanova-Mateo, E. G. Ortiz-García, M. A. Cony, E. Hernández-Martín: Prediction of daily maximum temperature using a support vector regression algorithm. Renewable Energy 36 (2011) 3054-3060.
[4] Navin Sharma, Pranshu Sharma, David Irwin, Prashant Shenoy: Predicting Solar Generation from Weather Forecasts Using Machine Learning. IEEE SmartGridComm.
[5] N. Cristianini, J. Shawe-Taylor: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.
[6] Kabir Rasouli, William W. Hsieh, Alex J. Cannon: Daily streamflow forecasting by machine learning methods with weather and climate inputs. Journal of Hydrology 414-415 (2012) 284-293.
[7] S. Santhosh Baboo, I. Kadar Shereef: An Efficient Weather Forecasting System using Artificial Neural Network. IJESD Vol. 1, No. 4, October 2010.
