Original Article
not. The author uses Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) methods to test the stationarity of the
checked by mean absolute deviation (MAD), mean squared error (MSE), root mean squared error (RMSE), and mean
absolute percentage error (MAPE) formulas with different progressive months. Generally speaking, the errors calculated
by the MAD, MSE, and RMSE equations are smallest when three months are forecast at a time. As for the MAPE method, a
severe penalty may occur if the observed number of earthquakes is small, and the method fails entirely if no earthquake is
observed in a month, because zero appears in the denominator. Using ARIMA(2,0,0) as the seismic pattern for forecasting,
the author obtains that two earthquakes per month will occur in Hualien from June to September, and one earthquake per
month from October to December 2016.
KEYWORDS: CWB Archive, ARIMA (p,d,q) Model, Seismic Pattern
Received: Sep 23, 2016; Accepted: Oct 21, 2016; Published: Oct 24, 2016; Paper Id.: IJCSEIERDDEC20161
INTRODUCTION
Hualien County, located on the eastern coast of Taiwan, is the area with the most frequent
earthquakes in Taiwan. In the 257 months from January 1995 to May 2016, a total of 1,248 of the 3,063 labeled
earthquakes (40.74%) occurred in that county. The seismic archive of the Central Weather
Bureau (CWB) of Taiwan reports two kinds of records, labeled and unlabeled. The unlabeled ones are
smaller in magnitude and are felt only locally. The labeled ones, on the other hand, are stronger
and affect more than two counties among the twenty municipal areas of Taiwan. The labeled sequence always starts from
www.tjprc.org
editor@tjprc.org
Ko-Ming Ni
number one at the beginning of a new year. In this study, only labeled earthquakes in Hualien retrieved from the CWB archive
are taken into consideration.
A time series is stationary if its mean and variance are constant over time, and if the covariance between two
values from the series depends only on the length of time separating the two values, and not on the actual times at which
the values are observed (Hill et al.). If a time series is not stationary, there is a danger that unrelated data yield a
statistically significant regression result. Such regressions are said to be spurious (Hill et al., Hanke and Wichern). In this
study, unit-root tests such as the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) methods are used to check the
stationarity of the time series of the monthly number of earthquakes from January 1995 to May 2016.
Forecasting is an interesting topic. Dozens of algorithms, from the simplest naive method to more advanced neural networks,
have been proposed (Hanke and Wichern). The author uses one of the most popular of them, the ARIMA
(Autoregressive Integrated Moving Average) method, to forecast the number of earthquakes per month in Hualien, because
it is versatile and can handle stationary, trending, cyclical, and even seasonal series (Hanke and Wichern, Hyndman and
Athanasopoulos). Furthermore, with different combinations of the autoregressive order (p), integration degree (d), and moving
average order (q), ARIMA(p,d,q) can be equivalent to methods such as simple exponential smoothing, Holt's, and
Winters' (Hyndman and Athanasopoulos). Once a suitable ARIMA(p,d,q) model is found, it can be used to
forecast the monthly number of earthquakes for Hualien in the near future. The autocorrelation function (ACF) and the
Ljung-Box Q (LBQ) statistic are used to check the randomness of the residuals of the regression results in order to confirm
the suitability of the chosen ARIMA(p,d,q) model.
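For reference, the ARIMA(2,0,0) model adopted later in the paper has the standard second-order autoregressive form written below. The symbols $c$, $\phi_1$, $\phi_2$, and $\varepsilon_t$ are conventional notation assumed here for illustration, and the inclusion of a constant term is also an assumption; the fitted coefficient values are not reported in this text.

```latex
% ARIMA(2,0,0): p = 2 autoregressive terms, d = 0 (no differencing), q = 0 (no moving-average terms)
y_t = c + \phi_1\, y_{t-1} + \phi_2\, y_{t-2} + \varepsilon_t
```

Here $y_t$ is the monthly number of earthquakes and $\varepsilon_t$ is a white-noise error, which is why the residuals are later checked for randomness.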
[Plot: monthly number of earthquakes against Year/Month, with mean X-bar = 4.86, UCL = 15.18, and LCL = -5.47]
Figure 1: Number of Earthquakes Per Month in Hualien from January 1995 to May 2016
Impact Factor (JCC): 6.3724
Randomness
From Figure 1, one finds that the mean number of earthquakes per month in Hualien is 4.86.
The period from January 1995 to May 2016 covers twenty-one years and five months, or 257 months. During this
period, only fourteen (14) months have counts above the three-sigma upper control limit (UCL) of 15.18. An
interesting finding is that a month with a large number of earthquakes (above the UCL) may be
followed by smaller numbers in the next few months, usually below the average. For example, in June 2012 there
were 51 earthquakes, but in the next month there were only three (3), which is below the average of 4.86. This phenomenon is
plausible because an earthquake is energy released from the crust. More earthquakes mean more released energy; hence,
a quiet month is expected after an active one.
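The randomness can be checked by whether the autocorrelation coefficients ($r_k$) between the observed monthly
number of earthquakes at time t ($y_t$) and at any time lag k ($y_{t-k}$) are close to zero or not (Hanke and Wichern).
The autocorrelation coefficient has the form (Hanke and Wichern):
$$ r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}, \quad k = 0, 1, 2, \ldots $$ (1)
where $r_k$ is the autocorrelation coefficient for a lag of k periods, $y_t$ is the observed monthly number of earthquakes
at time t, $y_{t-k}$ is the observation k periods earlier, $\bar{y}$ is the mean of the series, and n is the number of observations.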
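As an illustration of Equation (1), the short Python sketch below computes $r_k$ directly from its definition. The function names and the toy series are assumptions made for this example; they are not the author's software or the CWB data.

```python
def autocorrelation(series, lag):
    """Lag-k autocorrelation coefficient r_k, following Equation (1)."""
    n = len(series)
    mean = sum(series) / n
    # Denominator: total sum of squared deviations from the mean.
    denom = sum((y - mean) ** 2 for y in series)
    # Numerator: cross-products of deviations that are `lag` periods apart
    # (0-based t = lag..n-1 corresponds to t = k+1..n in Equation (1)).
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom


def acf(series, max_lag):
    """Collect r_0, r_1, ..., r_max_lag into a list."""
    return [autocorrelation(series, k) for k in range(max_lag + 1)]


toy_counts = [1, 2, 3, 4, 5]  # illustrative values only, not CWB data
toy_acf = acf(toy_counts, 2)
print(toy_acf)
```

Applied to the 257 monthly counts, the same function would give the coefficients that make up the ACF.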
The autocorrelation function (ACF, which is the collection of the autocorrelation coefficients $r_k$, k = 0, 1, 2, ...) of
the time series of the monthly number of earthquakes in Hualien from January 1995 to May 2016 is shown as follows:
[Plot: Autocorrelation Function for Hualien; x-axis: Lag (1 to 60), y-axis: Autocorrelation]
One common portmanteau test is based on the Ljung-Box Q (LBQ) statistic. If the autocorrelations are computed
from a random (white noise) process, the LBQ has a chi-square distribution with m (the number of time lags to be tested)
degrees of freedom (Hanke and Wichern). The LBQ values for different lags of the Hualien time series are given in Table A1 in
Appendix A. Comparing the calculated LBQ values with the chi-square table, the author judges that the
time series of the monthly number of earthquakes in Hualien is not random, because the calculated LBQ values are much
larger than the chi-square critical values for the corresponding m degrees of freedom at the 5% significance level.
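The LBQ column of Table A1 can be reproduced from the ACF column with the standard Ljung-Box definition, $Q_m = n(n+2)\sum_{k=1}^{m} r_k^2/(n-k)$. The formula itself is not printed in the paper and is assumed here. In the sketch below, the function name is hypothetical and the 5% critical value for 12 degrees of freedom (21.03) is taken from a standard chi-square table rather than from the paper.

```python
def ljung_box_q(acf_values, n_obs):
    """Cumulative Ljung-Box Q for lags 1..m, given r_1..r_m and the series length."""
    q_values = []
    running = 0.0
    for k, r_k in enumerate(acf_values, start=1):
        running += r_k ** 2 / (n_obs - k)
        q_values.append(n_obs * (n_obs + 2) * running)
    return q_values


# r_1 .. r_12 as listed in Table A1; n = 257 monthly observations.
TABLE_A1_ACF = [0.253363, 0.134621, 0.032357, 0.035332, 0.071311, 0.099146,
                0.065798, 0.032586, 0.045569, 0.071694, 0.258491, 0.09773]
N_MONTHS = 257
CHI2_CRIT_5PCT_DF12 = 21.03  # assumption: standard chi-square table value

lbq = ljung_box_q(TABLE_A1_ACF, N_MONTHS)
for lag, q in enumerate(lbq, start=1):
    print(lag, round(q, 2))
print("randomness rejected at lag 12:", lbq[-1] > CHI2_CRIT_5PCT_DF12)
```

The computed values agree with the tabulated LBQ entries to within rounding, and the lag-12 value is far above the critical value, which is consistent with the conclusion that the series is not random.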
Stationarity
A stationary variable is one that is neither explosive nor trending, and does not wander aimlessly without returning to
its mean (Hill et al.). One can check the stationarity of a time series by visual inspection of Figure 1, or by more formal tests,
such as unit-root tests. Two unit-root tests are used to check stationarity in this paper: the Dickey-Fuller
(DF) and Phillips-Perron (PP) tests. The Dickey-Fuller test has several variants, listed in Appendix B, which are generally
referred to as the Augmented Dickey-Fuller (ADF) test (Hill et al.).
The Dickey-Fuller Critical Values
To test the hypothesis in all the cases, we simply estimate the test equation by least squares and examine the
t-statistic for the hypothesis that $\gamma = 0$ (Equation B5). Unfortunately, this t-statistic no longer has the t-distribution; rather,
the tau ($\tau$) statistic has to be used (Hill et al.). The critical values of the tau ($\tau$) statistic are given in Table B1 in Appendix B.
Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) Tests
The test statistic and critical values of the Augmented Dickey-Fuller (ADF) test of the time series of the monthly number of
earthquakes in Hualien are shown in Table 1.
Table 1: The critical values and Dickey-Fuller unit-root test results
Test Statistic ($\tau$)    1% Critical Value    5% Critical Value
-12.742                    -3.990               -3.430
MacKinnon approximate p-value for $\tau$ = 0.0000
From the above table, one finds that the $\tau$ test statistic -12.742 < -3.430 (5% critical value), so the null hypothesis
$H_0: \gamma = 0$ (nonstationary) is rejected, and $H_1: \gamma < 0$ (stationary) is not rejected. In other words, the time series of the
monthly number of earthquakes is stationary.
As a double check, the author also runs the Phillips-Perron (PP) unit-root test. The results are in Table 2.
Table 2: Phillips-Perron Unit-Root Test for Stationarity of Time Series of Earthquakes per Month in Hualien
Test Statistic ($\tau$)
-12.757
MacKinnon approximate p-value for $\tau$ = 0.0000
As with the ADF test, $H_0$ (nonstationary) is rejected, and $H_1$ (stationary) is not rejected. Hence, the stationarity of the time series of earthquakes per month in Hualien is confirmed.
[Plot: autocorrelation function; x-axis: Lag (1 to 60), y-axis: Autocorrelation (-1.0 to 1.0)]
$\chi^2_{0.05} = 27.59$ (dof = 17) > 24.89 (LBQ at lag 20) in Table D1. This means that the time series of residuals for
the ARIMA(2,0,0) model is white noise (random). In other words, ARIMA(2,0,0) is a suitable model
for forecasting Hualien's monthly number of earthquakes.
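$$ \mathrm{MAD} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| $$ (2)
$$ \mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 $$ (3)
$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2} $$ (4)
$$ \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| y_t - \hat{y}_t \right|}{y_t} $$ (5)
where $y_t$ is the observed monthly number of earthquakes and $\hat{y}_t$ is the forecasted monthly number of earthquakes
in Hualien.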
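The four error measures of Equations (2) to (5) can be written compactly in Python. This is a minimal sketch with hypothetical function names and made-up numbers; `mape` returns a fraction (multiply by 100 for a percentage) and deliberately raises an error when an observed value is zero, which is the failure case the paper describes.

```python
import math


def mad(observed, forecast):
    """Mean absolute deviation, Equation (2)."""
    return sum(abs(y - f) for y, f in zip(observed, forecast)) / len(observed)


def mse(observed, forecast):
    """Mean squared error, Equation (3)."""
    return sum((y - f) ** 2 for y, f in zip(observed, forecast)) / len(observed)


def rmse(observed, forecast):
    """Root mean squared error, Equation (4)."""
    return math.sqrt(mse(observed, forecast))


def mape(observed, forecast):
    """Mean absolute percentage error as a fraction, Equation (5)."""
    if any(y == 0 for y in observed):
        # A month with no earthquakes puts zero in the denominator.
        raise ZeroDivisionError("MAPE is undefined when an observed value is zero")
    return sum(abs(y - f) / y for y, f in zip(observed, forecast)) / len(observed)


obs = [2, 4, 5]   # illustrative observed counts, not CWB data
fc = [1, 4, 7]    # illustrative forecasts
print(mad(obs, fc), mse(obs, fc), rmse(obs, fc), mape(obs, fc))
```

Because MAPE divides each error by the observed count, a miss of one earthquake in a month with two observed events already contributes 50%, which explains the large percentages for months with few events.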
With ARIMA(2,0,0) as the forecasting model, all the data from 1995 to 2012 are taken as a base file and used
to forecast the number of earthquakes per month in 2013; the observed 2013 data are then added to the base file, which is used to
forecast the numbers in 2014, and so on. This procedure is repeated until forecasting is finished. To
compare the effect of the progressing time on forecasting accuracy, four progressing time periods are
selected: one (1) month, three (3) months, six (6) months, and 12 months. In the following tables, the number in
parentheses denotes how many months are forecasted in each analysis. For example, MAD(3) means that each calculation
forecasts three months, and the forecasts are compared with the observed data to check the errors by the
mean absolute deviation (MAD) method. The errors from January 2013 to May 2016 under the different error calculation
methods are given in Tables 3 to 6.
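The expanding-base procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the AR(2) coefficients are fitted here by ordinary least squares on lagged values, which may differ from the estimation method of the software the author used; all function names are hypothetical; and the toy series is generated by a noise-free AR(2) recursion, not taken from the CWB archive.

```python
def solve_linear(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar2(series):
    """Least-squares fit of y_t = c + phi1*y_{t-1} + phi2*y_{t-2} + e_t."""
    rows = [[1.0, series[t - 1], series[t - 2]] for t in range(2, len(series))]
    targets = [series[t] for t in range(2, len(series))]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear(normal, rhs)  # [c, phi1, phi2]


def forecast_ar2(series, coef, horizon):
    """Iterate the fitted recursion `horizon` steps beyond the end of `series`."""
    c, phi1, phi2 = coef
    history = list(series[-2:])
    out = []
    for _ in range(horizon):
        nxt = c + phi1 * history[-1] + phi2 * history[-2]
        out.append(nxt)
        history.append(nxt)
    return out


def rolling_origin_forecasts(series, first_origin, step):
    """Refit on all data before each origin, forecast `step` months, then advance."""
    preds = []
    origin = first_origin
    while origin < len(series):
        horizon = min(step, len(series) - origin)
        base = series[:origin]  # base file grows as observed data are added
        preds.extend(forecast_ar2(base, fit_ar2(base), horizon))
        origin += horizon
    return preds


# Toy series from a noise-free AR(2) recursion (illustrative, not CWB data).
toy = [0.0, 2.0]
for _ in range(18):
    toy.append(1.0 + 0.5 * toy[-1] - 0.3 * toy[-2])
preds = rolling_origin_forecasts(toy, first_origin=12, step=3)
print(len(preds), fit_ar2(toy))
```

Setting `step` to 1, 3, 6, or 12 corresponds to the four progressing time periods, and the resulting forecasts can then be scored against the observed values with the error measures of Equations (2) to (5).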
Table 3: Error Evaluation by the Mean Absolute Deviation (MAD) Method with
Different Progressing Time Periods from Year 2013 to May 2016
Year    MAD(1)    MAD(3)    MAD(6)    MAD(12)
2013    4.72      4.07      4.26      6.67
2014    2.88      2.65      2.37      2.94
2015    3.51      3.72      3.50      3.87
2016    6.31      5.14      6.89      7.03
From Table 3, one finds that the MAD errors are the largest in every year when 12 months are forecasted at a time, in
comparison with progressing one, three, or six months. However, progressing one month at a time (MAD(1)) does
not give the most accurate results, as one might have expected. Forecasting three months at a time (MAD(3)) is the most
accurate in 2013 and 2016, while MAD(6) is slightly lower in 2014 and 2015.
Table 4: Error Evaluation by the Mean Squared Error (MSE) Method with
Different Progressing Time Periods from Year 2013 to May 2016
Year    MSE(1)    MSE(3)    MSE(6)    MSE(12)
2013    57.2      52.1      55.9      93.0
2014    9.81      8.09      8.51      14.4
2015    18.2      18.6      17.8      35.4
2016    56.6      36.6      62.9      89.9
Forecasting three months at a time (MSE(3)) is again the most accurate in the analysis, except in year
2015.
Table 5: Error Evaluation by the Root Mean Squared Error (RMSE) Method with
Different Progressing Time Periods from Year 2013 to May 2016
Year    RMSE(1)    RMSE(3)    RMSE(6)    RMSE(12)
2013    7.56       7.22       7.47       9.65
2014    3.13       2.84       2.92       3.80
2015    4.26       4.31       4.22       5.95
2016    7.53       6.05       7.93       9.48
As with MSE, forecasting three months at a time (RMSE(3)) is the most accurate in the analysis, except in year
2015.
Table 6: Error Evaluation by the Mean Absolute Percentage Error (MAPE) Method with
Different Progressing Time Periods from Year 2013 to May 2016
Year    MAPE(1)(%)    MAPE(3)(%)    MAPE(6)(%)    MAPE(12)(%)
2013    57.77         41.33         41.42         73.40
2014    143.9         142.0         148.8         68.88
2015    136.2         160.4         142.4         65.30
2016    178.0         168.9         219.1         130.5
From the equation of MAPE (Equation 5), if the denominator is zero (no earthquake in that month), this
algorithm fails. If the forecasted number differs too much from the observed one, the result of this equation is severely
penalized. From Table 6, the results are most accurate when progressing 12 months in each forecast (MAPE(12)), except in
2013. In the MAPE(12) column, the average errors of the monthly number of earthquakes forecasted in Hualien from
2013 to 2015 are between 65.30% and 73.40%, and the average error of the first five months in 2016 is 130.48%, almost twice
those of the previous three years. The reason may be that the volatility of the earthquake data in the first five
months of 2016 makes it difficult for the ARIMA(2,0,0) model to track. The numbers of earthquakes from January to March 2016
were fewer than two per month, but in April and May there were 16 and 17, respectively. Those are above the three-sigma
upper limit of 15.18, and the ARIMA(2,0,0) model responds sluggishly to them. The number of earthquakes in the remaining
months of 2016 can be forecasted by the ARIMA(2,0,0) model as follows:
Table 7: Forecast of the Monthly Number of Earthquakes by the
ARIMA(2,0,0) Model from June to December in 2016
Month      June    July    August    September    October    November    December
Numbers    2       2       2         2            1          1           1
The model forecasts two earthquakes per month from June to September, and one earthquake per month from October
to December in 2016.
CONCLUSIONS
After analyzing the characteristics of the monthly number of earthquakes in Hualien and recognizing its pattern, the
following conclusions can be drawn:
The time series of the monthly number of earthquakes from January 1995 to May 2016 is not random, as shown by visual
inspection as well as by the Ljung-Box Q (LBQ) statistics.
Unit-root tests such as the Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) tests are used to test the stationarity of
the time series of Hualien's monthly number of earthquakes. The null hypothesis $H_0: \gamma = 0$ (nonstationary) is
rejected, and $H_1: \gamma < 0$ (stationary) is not rejected. In other words, the time series of the monthly number of
earthquakes is stationary.
The Autoregressive Integrated Moving Average (ARIMA(p,d,q)) model is used to explore the pattern of the time
series of the monthly number of earthquakes in Hualien. Through careful inspection, the author finds that
ARIMA(2,0,0) is a good model for forecasting the number of earthquakes in Hualien County.
The historical seismic archive from the Central Weather Bureau (CWB) is evaluated with the proposed ARIMA(2,0,0)
model using four progressive periods (one, three, six, and 12 months) to check the forecasting errors.
Generally speaking, forecasting three months at a time gives the best results when the errors are checked with the mean
absolute deviation (MAD), mean squared error (MSE), and root mean squared error (RMSE) formulas. As for the
mean absolute percentage error (MAPE) equation, because the observed monthly number of earthquakes is in the
denominator, the MAPE algorithm fails whenever no earthquake occurs in a month.
Using ARIMA(2,0,0) as the forecasting pattern for the monthly number of earthquakes, the author forecasts that two
earthquakes per month will occur from June to September, and one earthquake per month from October to
December in 2016.
REFERENCES
1. Central Weather Bureau (CWB) of Taiwan. http://www.cwb.gov.tw. Accessed June 20, 2016.
2. Hanke, J. E., and Wichern, D. W. Business Forecasting, 9th ed. (2009). New Jersey: Pearson Prentice Hall.
3. Hill, R. C., Griffiths, W. E., and Lim, G. C. Principles of Econometrics, 4th ed. (2012). John Wiley & Sons, Inc.
4. Salvatore, D., and Reagle, D. Statistics and Econometrics, 2nd ed. (2011). McGraw-Hill Companies, Inc.
5.
APPENDICES
Appendix A
The first 20 lags of the ACF for the time series of the number of earthquakes per month in Hualien from January 1995 to
May 2016 are in Table A1.
Table A1: Autocorrelation Function (ACF) of Time
Series of Earthquakes per Month in Hualien
Lag    ACF         t       LBQ
1      0.253363    4.06    16.69
2      0.134621    2.03    21.42
3      0.032357    0.48    21.7
4      0.035332    0.52    22.02
5      0.071311    1.06    23.37
6      0.099146    1.46    25.97
7      0.065798    0.96    27.13
8      0.032586    0.48    27.41
9      0.045569    0.66    27.97
10     0.071694    1.04    29.35
11     0.258491    3.75    47.43
12     0.09773     1.34    50.03
13                         51.91
14                         51.99
15                         52.28
16                         53.06
17                         56.5
18                         56.5
19                         57.12
20                         58.53
NAAS Rating: 3.01
Appendix B
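B1: Dickey-Fuller Unit-Root Tests (Hill et al.)
(B1.1) Dickey-Fuller Test 1 (No constant and no trend)
$$ \Delta y_t = \gamma y_{t-1} + v_t $$ (B1)
where
$$ \Delta y_t = y_t - y_{t-1} $$ (B2)
$$ \gamma = \rho - 1 $$ (B3)
$$ v_t = \text{residuals} $$ (B4)
$$ H_0: \gamma = 0 \ \text{(nonstationary)} $$ (B5)
$$ H_1: \gamma < 0 \ \text{(stationary)} $$ (B6)
With a constant but no trend:
$$ \Delta y_t = \alpha + \gamma y_{t-1} + v_t $$ (B7)
With a constant and a trend:
$$ \Delta y_t = \alpha + \gamma y_{t-1} + \lambda t + v_t $$ (B8)
Augmented form with lagged first differences:
$$ \Delta y_t = \alpha + \gamma y_{t-1} + \sum_{s=1}^{m} a_s \Delta y_{t-s} + v_t $$ (B9)
As many lagged first-difference terms are added as are needed to ensure that the residuals are not autocorrelated. The number of
lagged terms can be determined by examining the autocorrelation function (ACF) of the residuals $v_t$, or the significance of
the estimated lag coefficients $a_s$.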
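To make the mechanics of the test concrete, the sketch below estimates the simplest form, Equation (B1) with no constant and no trend, by least squares and returns the tau statistic, i.e. the estimate of $\gamma$ divided by its standard error. The function name and the toy series are assumptions for illustration only. The 5% critical value of -1.94 is the no-constant, no-trend entry of Table B1, whereas the statistics in Table 1 are compared with a different set of critical values.

```python
import math


def dickey_fuller_tau(series):
    """Estimate gamma in  dy_t = gamma * y_{t-1} + v_t  and return (gamma, tau)."""
    lagged = series[:-1]                                  # y_{t-1}
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]  # dy_t
    sxx = sum(x * x for x in lagged)
    gamma = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    residuals = [d - gamma * x for x, d in zip(lagged, diffs)]
    dof = len(diffs) - 1                                  # one estimated parameter
    s2 = sum(e * e for e in residuals) / dof              # residual variance
    std_err = math.sqrt(s2 / sxx)
    return gamma, gamma / std_err


TAU_CRIT_5PCT = -1.94  # Table B1, model without constant and trend

# Illustrative mean-reverting values, not CWB data.
toy_series = [0.5, -0.3, 0.8, -0.6, 0.1, -0.4, 0.7, -0.2, 0.3, -0.5]
gamma_hat, tau = dickey_fuller_tau(toy_series)
print("gamma:", gamma_hat, "tau:", tau)
print("unit root rejected at 5%:", tau < TAU_CRIT_5PCT)
```

A tau value below the critical value leads to rejecting $H_0: \gamma = 0$, which is the same decision rule applied to the Hualien series in Table 1.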
Table B1: Critical Values for the Dickey-Fuller Test (Hill et al.)
Model                                                         1%       5%       10%
$\Delta y_t = \gamma y_{t-1} + v_t$                           -2.56    -1.94    -1.62
$\Delta y_t = \alpha + \gamma y_{t-1} + v_t$                  -3.43    -2.86    -2.57
$\Delta y_t = \alpha + \gamma y_{t-1} + \lambda t + v_t$      -3.96    -3.41    -3.13
Standard critical values (t-statistic)                        -2.33    -1.65    -1.28
Appendix C
The partial autocorrelation function (PACF) for the time series of monthly number of earthquakes in Hualien from
January 1995 to May 2016 is shown in Figure C1.
[Figure C1: Partial Autocorrelation Function for Hualien; x-axis: Lag (1 to 60), y-axis: Partial Autocorrelation]
and Reagle)
$$ y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_k y_{t-k} + v_t $$ (C1)
Appendix D
The first 20 lags of the ACF for the residuals of the ARIMA (2,0,0) model are shown in Table D1.
Table D1: The Autocorrelation Function of the Residuals of ARIMA(2,0,0)
Lag    ACF          t        LBQ
1      0.001306     0.02     0.00
2      0.003682     0.06     0.00
3      -0.031164    -0.50    0.26
4      -0.006395    -0.10    0.27
5                            0.67
6                            2.20
7                            2.47
8                            2.47
9                            2.48
10                           2.48
11                           18.51
12                           18.68
13                           19.32
14                           20.25
15                           20.28
16                           20.47
17                           23.61
18                           24.22
19                           24.35
20                           24.89