
Name : 維里雅 / Liadira Kusuma Widya

Student ID : P660087047

Homework 3

Air pollution is a serious problem in the city. In this homework, please complete the
PM2.5 prediction task using a neural network trained with back propagation. Use the
RMSE to evaluate model performance on more than 100 testing records that are
independent of the training data.
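The evaluation protocol above can be sketched in Python as follows. This is a minimal sketch under assumptions: the records are taken to be in time order, and the last `n_test` records are held out as the independent test set (a chronological split, which the source does not specify). The helper names `chronological_split` and `rmse` are hypothetical, and the data are synthetic.

```python
import numpy as np


def chronological_split(x, y, n_test):
    """Hold out the last n_test records as an independent test set (no shuffling).

    Hypothetical helper: assumes x and y are ordered in time.
    """
    if n_test <= 100:
        raise ValueError("the assignment asks for more than 100 testing records")
    return x[:-n_test], y[:-n_test], x[-n_test:], y[-n_test:]


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in for 1000 time-ordered records (NOT the report's data)
x = np.arange(1000, dtype=float).reshape(-1, 1)
y = np.arange(1000, dtype=float)
x_train, y_train, x_test, y_test = chronological_split(x, y, n_test=200)
print(len(x_train), len(x_test))     # 800 200
print(rmse([3.0, 4.0], [0.0, 0.0]))  # sqrt((9 + 16) / 2), about 3.5355
```

Holding out the most recent records keeps later hours out of training. With hourly air-quality series, neighbouring hours are strongly correlated, so a random split can place near-duplicate hours in both the training and the testing set.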

1. Please tell us how to deal with missing data.


Answer:
Before developing the regression model, the variables to be used must be determined.
Missing values can then be imputed with a constant value, for example with the Hmisc
package in R, so that the coefficient values can be estimated without bias from the
missing records. Alternatively, the missing values can be replaced with the mean or
median of the observed data. A third option is to leave them as NaN values (do
nothing with the data).
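The options above can be sketched in Python with NumPy. This is an illustrative sketch, not the R/Hmisc workflow the answer mentions: `impute` and `drop_incomplete_rows` are hypothetical helpers, and the toy values are made up. Leaving NaN in place only works if the training tool can skip those records; in a plain NumPy model NaN propagates into every computation, so the sketch also shows dropping incomplete rows.

```python
import numpy as np


def impute(data, strategy="mean", constant=0.0):
    """Fill NaN entries column by column (hypothetical helper, not the Hmisc API).

    strategy: "mean", "median" or "constant".
    """
    data = np.array(data, dtype=float)  # work on a copy; the caller's array is untouched
    for j in range(data.shape[1]):
        col = data[:, j]  # a view, so assignments below write into `data`
        missing = np.isnan(col)
        if not missing.any():
            continue
        if strategy == "mean":
            col[missing] = np.nanmean(col)
        elif strategy == "median":
            col[missing] = np.nanmedian(col)
        elif strategy == "constant":
            col[missing] = constant
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return data


def drop_incomplete_rows(data):
    """Remove rows containing NaN; NaN would otherwise propagate through a NumPy model."""
    data = np.asarray(data, dtype=float)
    return data[~np.isnan(data).any(axis=1)]


# Toy columns (made-up values): AMB_TEMP, PM2.5
raw = np.array([[20.0, 35.0],
                [np.nan, 40.0],
                [24.0, np.nan],
                [22.0, 60.0]])
print(impute(raw, "mean"))        # NaNs become 22.0 and 45.0
print(impute(raw, "median"))      # NaNs become 22.0 and 40.0
print(drop_incomplete_rows(raw))  # keeps rows 0 and 3
```

Mean and median imputation give every gap in a column the same value, so they reduce the spread of that column; the median is less affected by extreme PM2.5 episodes than the mean.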

2. Consider different network settings, such as the number of hidden layers and the
number of neurons in each layer. Please tell us which one is best.
Answer:
Table 1. Statistical results of PM2.5 (2017) prediction from temperature for different network architecture settings
Setting  Iterations  Layer 1  Layer 2  Layer 3  Layer 4  MSE      RMSE
1        18          10       8        -        -        223.365  14.9453906
2        9           10       8        5        2        243.491  15.6042014
3        12          10       10       10       -        217.886  14.7609756
4        14          8        10       8        -        232.174  15.2372406
5        14          15       25       10       10       220.096  14.8356429
6        7           25       15       10       5        246.813  15.7102737
7        19          10       10       10       10       215.69   14.6864019
(Layer 1 to Layer 4: number of neurons in each hidden layer; "-" means the layer is not used.)
Seven different network settings were tested to model the relationship between
PM2.5 and temperature. The settings differ in the number of hidden layers and in the
number of neurons in each layer. As the statistical results (root mean squared error,
RMSE) in Table 1 show, the seventh setting performs better than the first six, with
an RMSE of 14.686. This setting uses the same number of neurons (10) in each of its
four hidden layers and was trained for 19 iterations.
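To make the "number of hidden layers and neurons per layer" setting concrete, here is a minimal NumPy sketch of a fully connected network trained with back propagation, where the architecture is just a list of layer sizes such as [1, 10, 10, 10, 10, 1] for setting 7. It is an illustration under assumptions, not the code behind Table 1: tanh hidden units, a linear output, full-batch gradient descent on the MSE, the learning rate, and the synthetic data are all choices made for this example, and all function names are hypothetical.

```python
import numpy as np


def init_network(layer_sizes, rng):
    """Random weights and zero biases for a fully connected net, e.g. [1, 10, 10, 10, 10, 1]."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Scale by 1/sqrt(n_in) so tanh units start in their near-linear range
        w = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        params.append((w, np.zeros(n_out)))
    return params


def forward(params, x):
    """Return the activations of every layer (tanh hidden layers, linear output)."""
    acts = [x]
    for i, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if i == len(params) - 1 else np.tanh(z))
    return acts


def train_step(params, x, y, lr):
    """One full-batch gradient-descent step on the MSE loss via back propagation."""
    acts = forward(params, x)
    # Gradient of mean((out - y)^2) w.r.t. the output (single output neuron)
    delta = 2.0 * (acts[-1] - y) / len(x)
    updated = []
    for i in range(len(params) - 1, -1, -1):
        w, b = params[i]
        grad_w, grad_b = acts[i].T @ delta, delta.sum(axis=0)
        if i > 0:
            # Back-propagate through tanh, whose derivative is 1 - tanh^2
            delta = (delta @ w.T) * (1.0 - acts[i] ** 2)
        updated.append((w - lr * grad_w, b - lr * grad_b))
    return updated[::-1]


def mse(params, x, y):
    return float(np.mean((forward(params, x)[-1] - y) ** 2))


# Synthetic stand-in data (NOT the 2017 PM2.5/temperature records)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(300, 1))
y = np.sin(3.0 * x) + 0.1 * rng.normal(size=x.shape)
x_train, y_train, x_test, y_test = x[:180], y[:180], x[180:], y[180:]  # 120 test records

# Hidden-layer settings to compare, in the spirit of Table 1
settings = {"10-8": [10, 8], "10-10-10": [10, 10, 10], "10-10-10-10": [10, 10, 10, 10]}
results = {}
for name, hidden in settings.items():
    params = init_network([1, *hidden, 1], rng)
    for _ in range(2000):
        params = train_step(params, x_train, y_train, lr=0.02)
    results[name] = float(np.sqrt(mse(params, x_test, y_test)))  # test RMSE
    print(f"{name:12s} test RMSE = {results[name]:.4f}")
```

The ranking printed here reflects only the synthetic data; on the real records each setting would be compared on the same independent test set, as in Table 1.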

Figure 1. (a) Training performance and (b) training regression of the best neural network (setting number 7)

3. Consider different variable combinations, such as temperature (AMB_TEMP),
RH (relative humidity), wind direction (WIND_DIREC) and wind speed
(WIND_SPEED). Please list the performance in each case.
Answer:
Table 2. Statistical results of PM2.5 (2017) prediction for different input variables
Setting  Iterations  Input variable               MSE       RMSE
1        10          temperature (AMB_TEMP)       213.592   14.61479
2        14          relative humidity (RH)       250.0952  15.8144
3        22          wind direction (WIND_DIREC)  243.4932  15.60427
4        22          wind speed (WIND_SPEED)      222.9005  14.92985
(All settings use four hidden layers with 10 neurons each and PM2.5 as the output.)

Table 2 compares four input variables: temperature, relative humidity, wind direction
and wind speed. Each network uses the best setting from Table 1, with 10 neurons in
each of the four hidden layers. Among the four inputs, temperature gives the best
prediction of PM2.5, with an RMSE of 14.615, followed by wind speed (14.930).
Figure 2 shows the training performance and regression for temperature as the best
input variable.
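Because RMSE is the square root of MSE, the RMSE column of Table 2 can be recomputed from the MSE column, and sorting it reproduces the ranking stated above. This small check uses only the values reported in Table 2; the dictionary names are illustrative.

```python
import math

# MSE values copied from Table 2 (10-10-10-10 hidden layers, PM2.5 output)
table2_mse = {
    "temperature (AMB_TEMP)": 213.592,
    "relative humidity (RH)": 250.0952,
    "wind direction (WIND_DIREC)": 243.4932,
    "wind speed (WIND_SPEED)": 222.9005,
}

# RMSE = sqrt(MSE), so ranking by either metric gives the same order
table2_rmse = {name: math.sqrt(m) for name, m in table2_mse.items()}
ranking = sorted(table2_rmse, key=table2_rmse.get)
for name in ranking:
    print(f"{name:30s} RMSE = {table2_rmse[name]:.5f}")
```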

Figure 2. (a) Training performance and (b) training regression of the best neural network (temperature as the input variable)

4. Could you predict the PM2.5 concentration using neural network based on your
best setting and variables? Not only predicting PM2.5 concentration at t+1, but
also predicting the concentration at t+2, t+3, t+4, t+5. Please list the error e.g.
RMSE at t+1, t+2, t+3, t+4, t+5.
Answer:
Table 3. Statistical results of PM2.5 concentration prediction
Horizon  Iterations  MSE      RMSE
t+1      15          161.599  12.7121635
t+2      11          168.22   12.9699499
t+3      11          191.837  13.8505235
t+4      14          176.84   13.2981239
t+5      9           195.797  13.992741
(All horizons use four hidden layers with 10 neurons each.)
The neural network was used to predict the PM2.5 concentration from temperature.
As the statistical results (root mean squared error, RMSE) in Table 3 show, the
network is able to predict the PM2.5 concentration from temperature. The best RMSE
is obtained for the t+1 prediction (12.712). The error generally grows with the
prediction horizon, reaching 13.993 at t+5, although t+4 (13.298) is slightly better
than t+3 (13.851): a longer horizon adds uncertainty, which makes the RMSE worse.
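Predicting at t+1 to t+5 amounts to training one model per horizon h on pairs (inputs at time t, PM2.5 at time t+h). The sketch below shows how those pairs can be built, assuming hourly records with no gaps; `make_horizon_dataset` is a hypothetical helper and the toy arrays are made up.

```python
import numpy as np


def make_horizon_dataset(features, target, horizon):
    """Pair the inputs observed at time t with the target at time t + horizon.

    Hypothetical helper. features: shape (T, k); target: shape (T,).
    Returns X with shape (T - horizon, k) and y with shape (T - horizon,).
    Assumes consecutive rows are consecutive, gap-free time steps.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    if not 1 <= horizon < len(target):
        raise ValueError("horizon must be between 1 and len(target) - 1")
    return features[:-horizon], target[horizon:]


# Toy series (NOT the report's data): temperature inputs and PM2.5 targets
temps = np.arange(10, dtype=float).reshape(-1, 1)
pm25 = np.arange(100, 110, dtype=float)
for h in range(1, 6):
    X, y = make_horizon_dataset(temps, pm25, h)
    # One separate network (or OLS model) would be trained per horizon h
    print(f"t+{h}: {len(X)} pairs, first pair {X[0, 0]:.0f} -> {y[0]:.0f}")
```

Each extra step of horizon removes one usable pair and widens the time gap between the input and the target it must explain.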

Figure 3. (a) Training regression and (b) training performance of the best neural network (t+1)

5. Compare to linear regression. Could you predict the PM2.5 concentration using
linear regression? Please list the error e.g. RMSE at t+1, t+2, t+3, t+4, t+5.
Answer:
The PM2.5 concentration was predicted from temperature with both the ANN and
ordinary least squares (OLS) linear regression. Of the two methods, the ANN performs
better: the RMSE of its best setting (t+1) is 6.163181, while the RMSE of OLS is
6.831984.
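A minimal sketch of an OLS baseline of the kind compared above, assuming a single temperature input with an intercept and NumPy's least-squares solver (`numpy.linalg.lstsq`). The source does not show its OLS code; the function names are hypothetical and the data are synthetic.

```python
import numpy as np


def fit_ols(x, y):
    """Ordinary least squares with an intercept: y is modelled as b0 + x @ b."""
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    design = np.column_stack([np.ones(len(x)), x])  # prepend a column of ones
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_ols(coef, x):
    x = np.asarray(x, dtype=float).reshape(-1, len(coef) - 1)
    return coef[0] + x @ coef[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


# Synthetic example (NOT the report's data): PM2.5 = 5 + 2 * temperature + noise
rng = np.random.default_rng(1)
temperature = rng.uniform(10.0, 35.0, size=400)
pm25 = 5.0 + 2.0 * temperature + rng.normal(0.0, 3.0, size=400)

# Hold out the last 150 records (more than 100) as the independent test set
coef = fit_ols(temperature[:250], pm25[:250])
test_rmse = rmse(pm25[250:], predict_ols(coef, temperature[250:]))
print(coef, test_rmse)
```

For the t+1 to t+5 comparison, the same fit would be repeated on the horizon-shifted pairs for each h, using the same test records as the ANN.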

Figure 4. Comparison between ANN and OLS

Please provide the output result, attach your code and data.
