
Atmospheric Environment 44 (2010) 4252-4265

Contents lists available at ScienceDirect

Atmospheric Environment
journal homepage: www.elsevier.com/locate/atmosenv

GARCH modelling in association with FFT-ARIMA to forecast ozone episodes


Ujjwal Kumar*, Koen De Ridder
Environmental Modelling Unit, VITO-Flemish Institute for Technological Research, Boeretang 200, 2400 Mol, Belgium

Article info

Article history:
Received 6 April 2010
Received in revised form 24 June 2010
Accepted 28 June 2010

Keywords: GARCH; FFT; ARIMA; O3-episodes; Air quality modelling; Air pollution

Abstract

In operational forecasting of surface O3 by statistical modelling, it is customary to assume that the O3 time series is generated by a homoskedastic process. In the present work, we have taken the heteroskedasticity of the O3 time series explicitly into account and have shown how this results in O3 forecasts with improved forecast confidence intervals. Moreover, it also enabled us to make more accurate probability forecasts of ozone episodes in urban areas. The study has been conducted on daily maximum O3 time series for four urban sites of two major European cities, Brussels and London. The sites are: Brussels (Molenbeek) (B1), Brussels (PARL.EUROPE) (B2), London (Brent) (L1) and London (Bloomsbury) (L2). The Fast Fourier Transform (FFT) has been used to model the periodicities exhibited by the time series (the annual periodicity is especially distinct). The residuals, obtained by subtracting the FFT component from the observed data, were stationary and have been modelled using an ARIMA (Autoregressive Integrated Moving Average) process. The MAPEs (mean absolute percentage errors) of FFT-ARIMA for 100 out-of-sample one-day-ahead forecasts were 20%, 17.8%, 19.7% and 23.6% at the sites B1, B2, L1 and L2, respectively. The residuals obtained through FFT-ARIMA have been modelled using a GARCH (Generalized Autoregressive Conditional Heteroskedastic) process. The conditional standard deviations obtained using GARCH have been used to estimate improved forecast confidence intervals and to make probability forecasts of ozone episodes. At the sites B1, B2, L1 and L2, the probability forecasts of ozone episodes (for 30 out-of-sample one-day-ahead forecasts) were correct 91.3%, 90%, 70.6% and 53.8% of the time using GARCH, as against 82.6%, 80%, 58.8% and 38.4% without GARCH. The incorporation of GARCH also significantly reduced the number of false alarms raised by the models. © 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Surface O3 is one of the six criteria air pollutants and a critical air quality indicator (Masters, 1998). Therefore, forecasting O3 concentration and investigating its statistical nature in the ambient urban environment have been the subject of many studies (e.g., Prior et al., 1981; Simpson and Layton, 1983; Robeson and Steyn, 1990; Hubbard and Cobourn, 1998, 2007; Slini et al., 2002; Kumar et al., 2009; Tsai et al., 2009; Demuzere and van Lipzig, 2010; etc.). Prior et al. (1981) applied a regression model to forecast the daily maximum ozone occurring later in the day in terms of solar radiation intensity, temperature, wind speed and NOx data taken earlier in the day at St. Louis. Although their model had an overall 83% accuracy in predicting daily maximum O3 concentration, it was not quite successful in predicting high-ozone days (O3 > 120 ppb). To develop a probabilistic forecast of ozone concentrations, Robeson and Steyn (1989) suggested that use be made of the inherent properties of

* Corresponding author. Tel.: +32 14 336761. E-mail addresses: ujjwal.kumar@vito.be, ujjwal.kumar@yahoo.co.in (U. Kumar).
1352-2310/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.atmosenv.2010.06.055

seasonality and autocorrelation in O3 time series. A nonstationary, autocorrelated stochastic process is used to simulate a conditional probability density function (p.d.f.) which quantifies the effects of seasonality and autocorrelation. Robeson and Steyn (1990) used three models, namely (1) a univariate deterministic/stochastic model, (2) a univariate Autoregressive Integrated Moving Average (ARIMA) model, and (3) a bivariate temperature and persistence based regression model, to estimate daily maximum O3 concentration in the lower Fraser valley of British Columbia. They concluded that the ARIMA model had nearly the same predictive capability as the persistence model, while the mixed deterministic/stochastic model performed the worst. Hubbard and Cobourn (1998) used a 10-parameter multiple linear regression model to predict daily domain-level peak O3 and found that 50% of the forecasts were within 7.6 ppb and 80% were within 14.8 ppb. Slini et al. (2002) applied an ARIMA model to maximum ozone concentration forecasts in Athens, Greece, analysing a 9-year air quality observation record. Their results show a good index of agreement, accompanied by a weakness in forecasting alarms. Cobourn (2007) applied a Takagi-Sugeno fuzzy system and a nonlinear regression (NLR) model and reported their performance


results in terms of mean absolute error (8.0 ppb for the Takagi-Sugeno fuzzy system and 8.1 ppb for NLR). Tsai et al. (2009) applied two cost-sensitive artificial neural network (ANN) methods to forecast O3-episodes. They found that the cost-sensitive ANN methods perform better than the standard ANN. It should be noted that a simple multiple linear regression model is often difficult to construct for air pollutants, as there exists no direct linear relationship among the atmospheric variables. Various studies have also used neural networks for air quality forecasting (e.g., Tsai et al., 2009). Some studies compare the forecasting performances of ARIMA and neural networks (e.g., Choon and Chuin, 2008; Chatfield, 2004; Ho et al., 2002; Shabri, 2001; Tang et al., 1991; etc.). However, there is no single conclusive result. In general, ARIMA works as well as a neural network and often outperforms it for one-step-ahead forecasts [Chatfield (2004), pp. 230-235, provides a review]. Even multivariate methods most often do not improve the forecasts. When the purpose is forecasting and the variable in question is affected by innumerable factors for which the physical relations are not very well structured (as is the case with air pollutants), univariate time series techniques most often outperform multivariate methods in terms of forecasting (Chatfield, 2004, pp. 90-103). There are also various CTMs (chemical transport models) available [e.g., LOTOS (van Loon et al., 2000), CHIMERE (Schmidt et al., 2001)] that take meteorology, atmospheric processes, chemical reactions etc. into account in order to produce the air quality scenario of a region. These models also most often need to undergo a statistical adjustment exercise called data assimilation (see e.g., van Loon et al., 2000; Denby et al., 2008).
This adjusts and improves the spatial scenario; however, producing good future forecasts using CTMs is still a subject of intensive research (see e.g., Denby et al., 2008; Honoré et al., 2008). The present study focuses on the forecasting of daily maximum O3 concentration in urban areas. As noted above, when the focus is purely on forecasting, stochastic models often perform well. It is noteworthy that in forecasting surface O3 concentration by stochastic models, the heteroskedasticity of the O3 time series has most often been ignored in earlier studies. In the present study, the deterministic part of the time series has been modelled using FFT (Fast Fourier Transform) and the stochastic part using ARIMA (Autoregressive Integrated Moving Average). In addition to the stationary stochastic ARIMA model, we have taken the heteroskedasticity of the O3 time series explicitly into account and modelled it through GARCH (Generalized Autoregressive Conditional Heteroskedastic). The GARCH models have been used to reconstruct the forecast confidence intervals and to make probability forecasts of O3-episodes. Section 2 describes the data and sites used in the study. The methodology is discussed in Section 3, and the results and discussion are presented in Section 4. Section 5 concludes the study.

2. Data and sites

In the present work, four sites have been studied from two major European cities, viz., London and Brussels, two sites from each city. The O3 data for London's two sites have been procured from the UK air quality archive (http://www.airquality.co.uk). The Automatic Networks in the UK produce hourly pollutant concentrations, with data being collected from individual sites by modem. The website also provides the statistics for daily maximum O3 concentration from the hourly average data, which have been used in the current study.
The data for the Brussels region have been obtained from the European air quality database, i.e., the AIRBASE data archive (http://air-climate.eionet.europa.eu/databases). The daily maximum O3 data for the Brussels sites have been extracted from

the hourly average O3 AIRBASE data file of the corresponding sites for the current study. The study sites are as follows:

1. Brussels (Molenbeek) (B1 henceforth) (AIRBASE station code: BE0184A) (Lat: 50°51'01", Lon: 4°20'06"). The AIRBASE data archive defines the characteristics of this site as traffic urban residential. The monitor is located very close to a busy road.
2. Brussels (PARL.EUROPE) (B2 henceforth) (AIRBASE station code: BE0403A) (Lat: 50°50'33", Lon: 4°22'32"). In terms of characteristics, this site is defined as urban background residential by the AIRBASE data archive.
3. London (Brent) (L1 henceforth) (Lat: 51.589618°, Lon: 0.275519°). The UK air quality archive categorizes this as a suburban site (a residential area).
4. London (Bloomsbury) (L2 henceforth) (Lat: 51.522287°, Lon: 0.125848°). The characteristics of this site are defined as urban background by the UK air quality data archive. The monitoring station is within a self-contained, air-conditioned housing located in the north-east corner of a central London garden. The gardens are generally laid to grass with many mature trees. All four sides of the gardens are surrounded by a 2/4 lane one-way road system, which is subject to frequent traffic.

EU air quality standards (http://ec.europa.eu/environment/air/quality/standards.htm) prescribe a limit of 120 µg m⁻³ for the daily maximum 8-h mean O3 concentration. UK air quality standards (http://www.airquality.co.uk/standards.php) prescribe that the 8-hourly running or hourly mean O3 concentration should not exceed 100 µg m⁻³ more than 10 times in a year. WHO air quality guidelines (http://www.who.int/phe/health_topics/outdoorair_aqg/en/) suggest a threshold of 100 µg m⁻³ (8-h mean) of O3 concentration for adequate protection of public health. In the present study, we have followed a threshold of 100 µg m⁻³ (WHO air quality guidelines and UK air quality standards) while calculating probability forecasts of O3-episodes.

3. Methodology

3.1. Fast Fourier Transform (FFT)

The FFT is a variant of the Discrete Fourier Transform (DFT), the only difference being that the FFT is computationally faster (Press et al., 2002). Computationally, the DFT is of the order of $O(N^2)$ while the FFT is of the order of $O(N \log_2 N)$. If $x_0, x_1, x_2, \ldots, x_{N-1}$ denote a time series, its DFT $H_n$ is given by

$$H_n = \sum_{k=0}^{N-1} x_k \, e^{2\pi i k n / N} \tag{1}$$

which can be inverted by the inverse Fourier transform as follows:

$$x_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n \, e^{-2\pi i k n / N} \tag{2}$$

Equation (1) is periodic in $n$ with period $N$. Thus, the frequency ranges from $-1/2$ to $1/2$ at the discrete values $n/N$. The periodogram estimates of the power spectrum at the different frequencies are then given as (Press et al., 2002)

$$P(0) = P(f_0) = \frac{1}{N^2} \left| H_0 \right|^2, \qquad P(f_k) = \frac{1}{N^2} \left[ \left| H_k \right|^2 + \left| H_{N-k} \right|^2 \right] \tag{3}$$

where $f_k = k/N$ is defined only for the zero and positive frequencies (for a real-valued series the Fourier transform (1) is symmetric in magnitude, i.e., $H_k$ and $H_{-k}$ have the same magnitude). In the present study, we have plotted power vs. period.
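As an illustration of Eq. (3), the following minimal Python sketch computes the periodogram of a synthetic daily series and reads off the dominant period. It is not the MATLAB code used in the study; the function name `periodogram` and the synthetic series (mean 60, annual cycle of amplitude 20) are assumptions made only for this example. NumPy's FFT uses the opposite sign in the exponent to Eq. (1), which does not change the magnitudes entering Eq. (3).

```python
import numpy as np

def periodogram(x):
    """Periodogram of Eq. (3) at the zero frequency and the positive frequencies below 1/2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.fft.fft(x)                        # sign convention differs from Eq. (1); |H| is unaffected
    k = np.arange(1, (n - 1) // 2 + 1)       # indices of positive frequencies strictly below 1/2
    freq = np.concatenate(([0.0], k / n))    # f_k = k / N
    power = np.concatenate((
        [np.abs(H[0]) ** 2],
        np.abs(H[k]) ** 2 + np.abs(H[n - k]) ** 2,
    )) / n ** 2
    return freq, power

# Synthetic daily series (illustrative only): mean 60 with an annual cycle of amplitude 20.
t = np.arange(3650)
x = 60.0 + 20.0 * np.cos(2.0 * np.pi * t / 365.0)

freq, power = periodogram(x)
peak = 1 + np.argmax(power[1:])              # skip the zero frequency, which carries the mean
dominant_period = 1.0 / freq[peak]           # period in days
```

Plotting `power` against `1.0 / freq[1:]` gives a power vs. period view of the kind described above.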


The inspection of the power spectrum helps us to identify and select the frequencies (periods) that are dominant in the time series. After retaining only the selected dominant frequencies for a particular time series, we constructed the Fourier transform and took its inverse. In the present study, the time series obtained by the inverse Fourier transform of the selected frequencies is called the FFT component of the time series.

3.2. ARIMA modelling

A time series $\{x_t;\ t = 0, \pm 1, \pm 2, \ldots\}$ is ARMA(p, q) if it is covariance stationary and can be represented as

$$x_t = \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \tag{4}$$

(Shumway and Stoffer, 2006) with $\phi_p \neq 0$, $\theta_q \neq 0$, where $\varepsilon_t$ are the innovations, distributed as $N(0, \sigma_\varepsilon^2)$ with $\sigma_\varepsilon^2 > 0$. The parameters p and q are called the autoregressive [AR(p)] and the moving average [MA(q)] orders, respectively. When a time series does not appear covariance stationary, differencing may be applied to make it stationary. The ARMA(p, q) model can then be applied to the stationary differenced time series, and the model so constructed is called an ARIMA(p, d, q) model, where d denotes the order of differencing (Shumway and Stoffer, 2006; Brockwell and Davis, 2002). In the present study, the parameters $\phi$ and $\theta$ have been estimated using the maximum likelihood method (Brockwell and Davis, 2002). An inspection of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) helps in identifying the orders AR(p) and MA(q). In addition, more objectively defined criteria such as the Akaike Information Criterion (AIC), the Hannan-Quinn Information Criterion (HIC), the Bayesian Information Criterion (BIC) and the Final Prediction Error (FPE) can also be used to identify the correct orders p and q (Brockwell and Davis, 2002; Kumar and Jain, 2009).

3.3. GARCH modelling

Let $\varepsilon_t$ denote a real-valued discrete-time stochastic process. In this study, $\varepsilon_t$ are the innovations of the ARMA process in Eq. (4). Engle (1982) defined them as an autoregressive conditional heteroskedastic process where all $\varepsilon_t$ are of the form

$$\varepsilon_t = z_t \sigma_t, \tag{5}$$

where $z_t$ is an independent and identically distributed process with zero mean and unit variance. Although $\varepsilon_t$ is serially uncorrelated by definition, its conditional variance equals $\sigma_t^2$, which might be autocorrelated and, therefore, may change over time. The variance equation of the GARCH(p, q) can be expressed as (Bollerslev, 1986; Aradhyula and Holt, 1988; Shumway and Stoffer, 2006; Brockwell and Davis, 2002)

$$z_t \sim D_\theta(0, 1), \tag{6}$$

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2 \tag{7}$$

$$\sigma_t^2 = \alpha_0 + \alpha(B)\, \varepsilon_{t-1}^2 + \beta(B)\, \sigma_{t-1}^2 \tag{8}$$

where $\alpha(B)$ and $\beta(B)$ are the appropriate polynomials of the lag operator $B$, $D_\theta(0, 1)$ is the probability density function of the innovations or residuals with zero mean and unit variance, and

$$p \geq 0, \quad q \geq 0, \tag{9}$$

$$\alpha_0 > 0, \quad \alpha_i \geq 0, \quad i = 1, 2, \ldots, q, \quad \text{and} \tag{10}$$

$$\beta_i \geq 0, \quad i = 1, 2, \ldots, p. \tag{11}$$

It should be noted that, for $p = 0$, the process reduces to an ARCH(q) process. Also, for $p = q = 0$ the conditional variance is constant, as in ARMA, and the innovation $\varepsilon_t$ simply reduces to white noise. As an ARMA analogue, the GARCH process could be justified through a Wald's decomposition type of argument as a more parsimonious description. Bollerslev (1986) shows that the GARCH(1,1) process is wide-sense stationary with $E(\varepsilon_t) = 0$, $\operatorname{var}(\varepsilon_t) = \alpha_0 / (1 - \alpha_1 - \beta_1)$ and $\operatorname{cov}(\varepsilon_t, \varepsilon_s) = 0$ for $t \neq s$ if and only if $\alpha_1 + \beta_1 < 1$. The GARCH model parameters have also been estimated using the maximum likelihood method (Shumway and Stoffer, 2006; Brockwell and Davis, 2002).

The key insight of GARCH lies in the distinction between the conditional and unconditional variances of the innovations process $\{\varepsilon_t\}$. The term conditional implies explicit dependence on a past sequence of observations. The term unconditional is more concerned with the long-term behaviour of a time series and assumes no explicit knowledge of the past. When the conditional variance parameters satisfy the inequalities in Eqs. (9)-(11), the unconditional variance (i.e., the time-independent, or long-run, variance expectation) of the innovations process $\{\varepsilon_t\}$ is

$$\sigma^2 = E\left(\varepsilon_t^2\right) = \frac{\alpha_0}{1 - \sum_{i=1}^{q} \alpha_i - \sum_{j=1}^{p} \beta_j}$$

Equivalently, the long-run conditional variance expectation depends explicitly on the GARCH model parameters and becomes equal to the unconditional variance of the innovations process $\{\varepsilon_t\}$.

3.3.1. Test for the presence of ARCH/GARCH effects (Engle, 1982)

Since the ARCH model requires iterative procedures, it may be desirable to test whether it is appropriate before going to the effort of estimating it. The Lagrange multiplier (LM) test is ideal for this, as in many similar cases (e.g., Breusch and Pagan, 1978, 1980; Godfrey, 1978; Engle, 1979). Under the null hypothesis, $\alpha_1 = \alpha_2 = \cdots = \alpha_p = 0$. The test is based upon the score under the null and the information matrix under the null. Consider the ARCH model with $\sigma_t^2 = h(z_t \alpha)$, where $h$ is some differentiable function which, therefore, includes both the linear and exponential cases as well as many others, and $z_t = (1, \varepsilon_{t-1}^2, \ldots, \varepsilon_{t-p}^2)$, where $\varepsilon_t$ are the ordinary least squares residuals (here $z_t$ denotes the vector of regressors, not the i.i.d. process of Eq. (5)). Under the null, $\sigma_t^2$ is a constant, denoted by $\sigma_0^2$. Engle (1982) shows that the LM test statistic can be consistently estimated by

$$\xi^* = \frac{1}{2}\, f^{0\prime} z \left( z' z \right)^{-1} z' f^0$$

where $z' = (z_1', \ldots, z_T')$ and $f^0$ is the column vector of $\left( \varepsilon_t^2 / \sigma_0^2 - 1 \right)$. It should be noted that $\operatorname{plim} f^{0\prime} f^0 / T = 2$ because normality has been assumed. Thus, an asymptotically equivalent statistic would be

$$\xi = \frac{T\, f^{0\prime} z \left( z' z \right)^{-1} z' f^0}{f^{0\prime} f^0} = T R^2$$

where $R^2$ is the squared multiple correlation between $f^0$ and $z$. Since adding a constant and multiplying by a scalar will not change the $R^2$ of a regression, this is also the $R^2$ of the regression of $\varepsilon_t^2$ on an intercept and p lagged values of $\varepsilon_t^2$. The statistic will be asymptotically distributed as chi-square with p degrees of freedom when the null hypothesis is true. Thus, the test procedure is to run the ordinary least squares regression and save the residuals, then regress the squared residuals on a constant and p lags and test $T R^2$ as a $\chi_p^2$.
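The GARCH(1,1) recursion of Eqs. (5)-(7) and the $TR^2$ form of Engle's LM test described in Section 3.3 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the ucsd_garch MATLAB routines used in the study: the function names `simulate_garch11` and `arch_lm_stat`, the parameter values and the random seeds are all hypothetical. The critical value 18.307 is the 5% point of the chi-square distribution with 10 degrees of freedom.

```python
import numpy as np

def simulate_garch11(n, alpha0, alpha1, beta1, seed=0):
    """Simulate eps_t = z_t * sigma_t with the GARCH(1,1) variance recursion of Eq. (7)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)                    # i.i.d. N(0, 1) innovations z_t
    eps = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = alpha0 / (1.0 - alpha1 - beta1)   # start at the unconditional variance
    eps[0] = z[0] * np.sqrt(sigma2[0])
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * sigma2[t - 1]
        eps[t] = z[t] * np.sqrt(sigma2[t])
    return eps, sigma2

def arch_lm_stat(resid, lags):
    """Engle's LM statistic T * R^2 from regressing squared residuals on a constant and `lags` lags."""
    e2 = np.asarray(resid, dtype=float) ** 2
    n = e2.size
    T = n - lags                                  # number of usable observations
    y = e2[lags:]
    columns = [np.ones(T)] + [e2[lags - i:n - i] for i in range(1, lags + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_ss = np.sum((y - X @ coef) ** 2)
    total_ss = np.sum((y - y.mean()) ** 2)
    return T * (1.0 - resid_ss / total_ss)

CRIT_05_10LAGS = 18.307                           # chi-square 5% critical value, 10 d.o.f.

# Hypothetical parameter values chosen only for illustration (alpha1 + beta1 < 1).
eps, sigma2 = simulate_garch11(3000, alpha0=1.0, alpha1=0.15, beta1=0.80, seed=0)
white = np.random.default_rng(1).standard_normal(3000)

lm_garch = arch_lm_stat(eps, lags=10)             # heteroskedastic series
lm_white = arch_lm_stat(white, lags=10)           # homoskedastic reference series
reject_null_garch = lm_garch > CRIT_05_10LAGS
```

Comparing `lm_garch` and `lm_white` with the critical value mirrors the decision rule above: a statistic above the critical value rejects the null hypothesis of no ARCH/GARCH effects.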


4. Results and discussion

All the computations involved in the present work have been carried out on the MATLAB 7.7 platform. The freely available MATLAB toolboxes ucsd_garch (available from http://www.kevinsheppard.com/wiki/UCSD_GARCH) and the econometric toolbox (available from http://www.spatial-econometrics.com/) have been used. To validate the forecasting performance of an FFT-ARIMA model, the last 100 data points of each time series have been kept out of the modelling procedure. The model's performance for out-of-sample forecasts has been evaluated on the basis of mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean square error (RMSE). While evaluating the performance of the GARCH models in making probabilistic forecasts of O3-episodes, 30 out-of-sample data points have been used. This is related to the fact that long-term conditional variances essentially converge to the unconditional variance (Section 3.3); thus, taking too many out-of-sample data points cannot properly exhibit the effectiveness of the GARCH model. Hence, we limit ourselves to 30 out-of-sample data points while using GARCH. Section 4.1 presents a general description of the time series at the four urban sites. Section 4.2 carries out FFT modelling for each time series. ARIMA modelling results for the residuals obtained after subtracting the FFT component from the observed data are presented in Section 4.3. Section 4.4 discusses the forecasting performances of the FFT-ARIMA models applied to each time series. GARCH modelling outcomes and their effect on the models' performance are discussed in Section 4.5.

4.1. Time series of daily maximum ambient O3 concentration

Fig. 1(a)-(d) present the time series plots of daily maximum ambient O3 concentration at the four urban sites of Brussels and London. A total of 65, 48, 138 and 221 missing data points have been encountered out of 3499, 2191, 4170 and 3196 data points at Brussels (Molenbeek) (B1), Brussels (PARL.EUROPE) (B2), London (Brent) (L1) and London (Bloomsbury) (L2), respectively. These missing values have been filled using a linear interpolation technique based on the state space method (Ljung, 1999). A visual inspection of these time series clearly reveals that an annual cycle is present in each of them. This feature has been exploited using the FFT technique.

4.2. FFT modelling

Fig. 2(a)-(d) show the power vs. period plot of each time series with the most dominant periods marked. The frequencies corresponding to these dominant periods have been chosen to construct the FFT component of the time series. For each site, the first three predominant frequencies [the corresponding periods are marked in Fig. 2(a)-(d)] have been chosen to reconstruct the periodic (FFT) component of the time series. Fig. 3(a)-(d) show the FFT component of each time series along with the original time series.
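The reconstruction of the FFT component from the three predominant frequencies can be sketched as follows. This is a minimal Python illustration rather than the authors' MATLAB code; the function name `fft_component`, the synthetic series and the decision to keep the zero-frequency term (the series mean) alongside the selected frequencies are assumptions made for this example.

```python
import numpy as np

def fft_component(x, n_freq=3):
    """Inverse transform of the mean plus the n_freq strongest non-zero frequencies."""
    x = np.asarray(x, dtype=float)
    H = np.fft.rfft(x)                      # transform of a real series (non-negative frequencies)
    power = np.abs(H) ** 2
    power[0] = 0.0                          # do not rank the zero frequency (the mean)
    keep = np.argsort(power)[-n_freq:]      # indices of the dominant frequencies
    H_sel = np.zeros_like(H)
    H_sel[0] = H[0]                         # assumption: the mean is retained in the component
    H_sel[keep] = H[keep]
    return np.fft.irfft(H_sel, n=x.size), keep

# Synthetic daily series (illustrative only): three periodicities plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(3650)
seasonal = (60.0
            + 20.0 * np.cos(2.0 * np.pi * t / 365.0)
            + 5.0 * np.cos(2.0 * np.pi * t / 182.5)
            + 3.0 * np.sin(2.0 * np.pi * t / 730.0))
x = seasonal + rng.normal(0.0, 10.0, t.size)

component, kept = fft_component(x, n_freq=3)
fft_residuals = x - component               # series passed on to the ARIMA step
periods = x.size / np.sort(kept)            # selected periods in days
```

Subtracting `component` from the observed series yields the FFT residuals that the subsequent ARIMA step works on.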

Fig. 1. Time series plots of daily maximum O3 concentration at (a) Brussels (Molenbeek) (B1) for the period 1-Jan-98 to 31-Jul-07, (b) Brussels (PARL.EUROPE) (B2) for the period 1-Jan-02 to 30-Jun-07, (c) London (Brent) (L1) for the period 1-May-96 to 30-Sep-07, and (d) London (Bloomsbury) (L2) for the period 1-Jan-00 to 30-Sep-08.


Fig. 2. Power vs. Time-period plots for the daily maximum O3 time series at, (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2).

Fig. 3. FFT component (in red) of the original daily max O3 time series (in blue) at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). (For the interpretation of the reference to color in this figure legend the reader is referred to the web version of this article.)


Fig. 4. The FFT residuals (original data minus the corresponding FFT component) at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2).

Fig. 5. ACF (autocorrelation function) of FFT residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the reference to color in this figure legend the reader is referred to the web version of this article.)
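The ACF and its 95% confidence bounds of the kind shown in Fig. 5 can be computed along the following lines. This is a hedged Python sketch, not the code behind the figure: the function name `sample_acf`, the simulated AR(1) series with coefficient 0.8 and the use of the large-sample bounds ±1.96/√N are assumptions made for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    return np.array([np.dot(xc[:x.size - k], xc[k:]) / denom
                     for k in range(max_lag + 1)])

# Simulated AR(1) series (illustrative only) standing in for the FFT residuals.
rng = np.random.default_rng(1)
n = 5000
phi = 0.8                                   # hypothetical AR(1) coefficient
noise = rng.standard_normal(n)
series = np.empty(n)
series[0] = noise[0]
for t in range(1, n):
    series[t] = phi * series[t - 1] + noise[t]

acf = sample_acf(series, max_lag=20)
bound = 1.96 / np.sqrt(n)                   # large-sample 95% bounds for white noise
significant_lags = np.flatnonzero(np.abs(acf[1:]) > bound) + 1
```

For an AR(1) process the ACF decays roughly geometrically, which is the exponential-decline pattern used to motivate an AR(1) term.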


Fig. 6. PACF (partial autocorrelation function) of FFT residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the reference to color in this figure legend the reader is referred to the web version of this article.)

4.3. ARIMA modelling

In the next step, the FFT component has been subtracted from the original time series to obtain the FFT residuals. These FFT residuals are presented in Fig. 4(a)-(d). A visual examination of Fig. 4(a)-(d) suggests that these time series can be considered covariance stationary, as there is no upward, downward, exponential, sinusoidal, diminishing, expanding or any other apparent trend. The autocorrelation function (ACF) patterns of these FFT residuals are shown in Fig. 5(a)-(d). The ACF in each figure clearly exhibits a pattern somewhat similar to an exponential decline. Such a pattern favours an AR(1) process. In addition, the PACF pattern (Fig. 6(a)-(d)) also shows that the AR(1) coefficients are the most significant part of the process. We also applied the AIC, BIC, HIC and FPE criteria in order to ascertain the orders p, q of the ARMA(p, q) models as correctly as possible (Section 3). Based on these criteria and the tests of stationarity and invertibility of ARIMA models (Kumar and Jain, 2009), the final selected model for the FFT residuals is ARIMA(1,0,2) for all the study sites. Table 1 lists the AR and MA coefficients with their statistics at the different sites. The t-statistics and the p values in Table 1 clearly indicate that these coefficients in the ARIMA models are statistically significant.

4.4. Forecasting performance of FFT-ARIMA models

The forecasting performances of the applied FFT-ARIMA models have been evaluated against 100 out-of-sample one-day-ahead forecasts. The appropriateness of the applied FFT-ARIMA models has been tested against the whiteness of their residuals (Shumway and Stoffer, 2006; Kumar and Jain, 2009). Fig. 7(a)-(d) show the ACF of the ARIMA residuals. Almost all the ACF values are within the confidence bounds and hence the residuals can effectively be considered to follow a white noise process. Table 2 shows the forecasting performances of the selected FFT-ARIMA models against the indicators MAE, RMSE, MAPE and observed/predicted mean. The MAPE values for 100 out-of-sample forecasts were 20%, 17.8%, 19.7% and 23.6% at Brussels (Molenbeek), Brussels (PARL.EUROPE), London (Brent) and London (Bloomsbury), respectively. The first 20 out-of-sample one-day-ahead FFT-ARIMA forecasts with their forecast confidence intervals are shown in Figs. 9 and 10. Only the first 20 out-of-sample forecasts are shown in Figs. 9 and 10 so that a clear comparison can be made between the forecast confidence intervals obtained from the FFT-ARIMA and FFT-ARIMA-GARCH models (Section 4.5).
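The three accuracy indicators used above follow their standard definitions and can be written compactly. The Python sketch below is illustrative only; the function names and the three-value example arrays are hypothetical and are not data from the study.

```python
import numpy as np

def mae(observed, predicted):
    """Mean absolute error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return np.mean(np.abs(observed - predicted))

def rmse(observed, predicted):
    """Root mean square error."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((observed - predicted) ** 2))

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    observed, predicted = np.asarray(observed, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((observed - predicted) / observed))

# Hypothetical observed and one-day-ahead predicted daily maxima.
observed = [100.0, 50.0, 80.0]
predicted = [90.0, 60.0, 80.0]

scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
}
```

Because MAPE divides by the observed value, it penalises a given absolute error more heavily on low-concentration days than MAE or RMSE do.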

Table 1
The AR and MA coefficients of the ARIMA models applied to the FFT residuals at the different sites.

Site                          Term    Coefficient   Std error   t-statistic   P value
Brussels (Molenbeek) (B1)     AR(1)   0.8042        0.0230      34.9          <0.0001
                              MA(1)   0.2065        0.0298       6.9          <0.0001
                              MA(2)   0.1545        0.0241       6.4          <0.0001
Brussels (PARL.EUROPE) (B2)   AR(1)   0.7764        0.0317      24.5          <0.0001
                              MA(1)   0.1942        0.0401       4.8          <0.0001
                              MA(2)   0.0886        0.0319       2.8          0.0052
London (Brent) (L1)           AR(1)   0.8141        0.0194      41.9          <0.0001
                              MA(1)   0.2529        0.0257       9.8          <0.0001
                              MA(2)   0.1214        0.0210       5.8          <0.0001
London (Bloomsbury) (L2)      AR(1)   0.8275        0.0232      35.7          <0.0001
                              MA(1)   0.2998        0.0305       9.8          <0.0001
                              MA(2)   0.1556        0.0243       6.4          <0.0001


Fig. 7. ACF (autocorrelation function) of ARMA residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the reference to color in this figure legend the reader is referred to the web version of this article.)

4.5. GARCH modelling, forecast confidence intervals and probability forecasts of O3-episodes

Fig. 7(a)-(d) show the ACF of the residuals of the applied FFT-ARIMA models, while Fig. 8(a)-(d) show the ACF of the squared residuals of the same models. In Fig. 7(a)-(d), all the ACF values are within the confidence bounds, i.e., there exists no autocorrelation among the FFT-ARIMA residuals. In other words, the FFT-ARIMA residuals follow a white noise process. However, Fig. 8(a)-(d) clearly show that a significant number of ACF values are outside the confidence bounds, i.e., the squared residuals do not obey the white noise assumption and exhibit correlation in the variances. This clearly indicates that heteroskedasticity exists in the process. The correlation structure of the squared residuals can be exploited using the GARCH modelling process. To reaffirm that the squared residuals exhibit ARCH/GARCH effects, we have also applied Engle's hypothesis test for ARCH/GARCH effects (Section 3) to these squared residuals. Table 3 presents the test results; it clearly shows that the null hypothesis of no ARCH/GARCH effects is rejected in each case (H = 1 in each case). Hence, the application of GARCH

Table 2
Forecasting performance of FFT-ARIMA models for 100 out-of-sample forecasts.

Indicator                  Brussels (Molenbeek) (B1)   Brussels (PARL.EUROPE) (B2)   London (Brent) (L1)   London (Bloomsbury) (L2)
MAE (µg m⁻³)               15.5                        15.5                          12.6                  10.9
RMSE (µg m⁻³)              21.2                        20.9                          16.7                  14.6
MAPE (%)                   20.0                        17.8                          19.7                  23.6
Observed mean (µg m⁻³)     80.2                        92.8                          65.6                  53.5
Predicted mean (µg m⁻³)    77.0                        92.5                          67.6                  52.2

modelling may be useful to remove the ARCH/GARCH effects present in the corresponding time series. The GARCH models have been estimated on the assumption that the conditional distribution of the FFT-ARIMA residuals is Gaussian. For each time series of squared FFT-ARIMA residuals, GARCH(1,1) models were estimated first because they are parsimonious and are often the most likely candidates in applied analysis (Aradhyula and Holt, 1988). After these initial estimates were obtained, several alternative specifications of the conditional variance equation were examined. Each alternative was examined for improvements in model fit and parameter significance relative to the GARCH(1,1) process. Following this identification and selection process, it was determined that a GARCH(1,1) process was adequate for explaining the conditional variances at sites B2, L1 and L2, while a GARCH(2,1) model is a good fit for the residual time series at site B1. Table 4 reports the GARCH model parameters and their statistics for each time series. The t-statistics and p values show that each coefficient of the respective GARCH models is statistically significant. It can easily be verified that $\alpha_1 + \beta_1 < 1$ [$\alpha_1 \equiv$ ARCH(1), $\beta_1 \equiv$ GARCH(1), following Eq. (7)] in each case, i.e., for each site the defined GARCH processes are clearly stationary. To verify that no further ARCH/GARCH effects (or heteroskedastic effects) are present in the model results, we have applied Engle's hypothesis test for the presence of ARCH/GARCH effects (Engle, 1982) to the standardized residuals of the GARCH models. Table 5 reports the test results for 10, 15 and 20 lags of squared sample residuals at the 0.01 and 0.05 levels of significance. Table 5 shows that, for site B1, the null hypothesis of no ARCH/GARCH effects is accepted at 10 lags but rejected at 15 and 20 lags at the 0.05 level of significance; however, the same null hypothesis is accepted at all lags at the 0.01 level of significance. Moreover, the test statistic is quite close to the critical value at the 0.05 level of significance for lag 15. Thus, for practical purposes, we


Fig. 8. ACF (autocorrelation function) of squared ARMA residuals at (a) Brussels (Molenbeek) (B1), (b) Brussels (PARL.EUROPE) (B2), (c) London (Brent) (L1), and (d) London (Bloomsbury) (L2). The two (blue) straight lines parallel to the x-axis show the 95% confidence bounds. (For the interpretation of the reference to color in this figure legend the reader is referred to the web version of this article.)

Fig. 9. One-day-ahead forecasts for 20 out-of-sample days of daily maximum O3 concentration at Brussels (Molenbeek) (B1) and Brussels (PARL.EUROPE) (B2) using FFT-ARIMA and FFT-ARIMA-GARCH.


Fig. 10. One-day-ahead forecasts for 20 out-of-sample days of daily maximum O3 concentration at London (Brent) (L1) and London (Bloomsbury) (L2) using FFT-ARIMA and FFT-ARIMA-GARCH.
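The way a GARCH conditional standard deviation feeds into a forecast confidence interval of the kind compared in Figs. 9 and 10, and into an O3-episode probability, can be sketched as below. This is a hedged Python illustration under a Gaussian assumption, not the authors' procedure: the function names, the constant term `alpha0`, the previous residual and variance, and the point forecast of 57 are illustrative values, and the exceedance formula is one plausible way to turn a forecast and its conditional standard deviation into a probability.

```python
import math

def garch11_next_variance(alpha0, alpha1, beta1, last_resid, last_variance):
    """One-step-ahead conditional variance from the GARCH(1,1) recursion of Eq. (7)."""
    return alpha0 + alpha1 * last_resid ** 2 + beta1 * last_variance

def confidence_interval(point_forecast, cond_std, z=1.96):
    """Gaussian 95% forecast confidence interval (z = 1.96)."""
    return point_forecast - z * cond_std, point_forecast + z * cond_std

def exceedance_probability(point_forecast, cond_std, threshold=100.0):
    """P(O3 > threshold) if the forecast error is N(0, cond_std**2)."""
    zscore = (threshold - point_forecast) / cond_std
    return 0.5 * math.erfc(zscore / math.sqrt(2.0))

# Illustrative inputs only: alpha0, the last residual and the last variance are assumed.
variance = garch11_next_variance(alpha0=20.0, alpha1=0.090, beta1=0.866,
                                 last_resid=10.0, last_variance=400.0)
cond_std = math.sqrt(variance)

lower, upper = confidence_interval(57.0, cond_std)
p_episode = exceedance_probability(57.0, cond_std, threshold=100.0)
```

A smaller conditional standard deviation narrows the interval and lowers the episode probability for a forecast below the threshold, which is why a time-varying variance can change which days trigger an alarm.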

can accept that the GARCH(2,1) model explains the heteroskedasticity present in the squared FFT-ARIMA residuals at site B1. For all the other sites, it is amply clear from Table 5 that the null hypothesis of no ARCH/GARCH effects is accepted in each case (H = 0 in each case); hence the selected GARCH models [i.e., GARCH(1,1)] sufficiently explain the heteroskedasticity present in the time series of the squared FFT-ARIMA residuals at the sites B2, L1 and L2. Since the GARCH model gives the estimate of the conditional variances on the basis of its own lagged values and the lagged squared residuals, the conditional standard deviations can easily be obtained by taking the square root of the estimated conditional

Table 3. Engle's hypothesis test to detect the presence of ARCH/GARCH effects on the residuals of FFT-ARIMA models [H = 0 means accepting the null hypothesis and H = 1 means rejecting the null hypothesis].

Site  Lag (M)  H (0.05)  H (0.01)  P value  Stat     Critical value (0.05)  Critical value (0.01)
B1    10       1         1         0        233.732  18.307                 23.209
B1    15       1         1         0        252.176  24.996                 30.577
B1    20       1         1         0        259.465  31.410                 37.566
B2    10       1         1         0        122.432  18.307                 23.209
B2    15       1         1         0        142.610  24.996                 30.577
B2    20       1         1         0        154.404  31.410                 37.566
L1    10       1         1         0        303.498  18.307                 23.209
L1    15       1         1         0        306.662  24.996                 30.577
L1    20       1         1         0        318.340  31.410                 37.566
L2    10       1         1         0        186.150  18.307                 23.209
L2    15       1         1         0        192.948  24.996                 30.577
L2    20       1         1         0        196.209  31.410                 37.566
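The statistic reported in Tables 3 and 5 can be illustrated with a minimal sketch of Engle's Lagrange multiplier test: the squared residuals are regressed on a constant and M of their own lags, and T·R² is compared with a chi-square critical value. The sketch below is not the authors' code. The function names, the simulated series and the ARCH(1) parameters are assumptions for illustration; only the 5% critical values are copied from the tables.

```python
import random

# 5% chi-square critical values for 10, 15 and 20 lags, as listed in Tables 3 and 5.
CRITICAL_5PCT = {10: 18.307, 15: 24.996, 20: 31.410}

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def arch_lm_statistic(residuals, lags):
    """T * R^2 from regressing e_t^2 on a constant and `lags` lags of e_t^2."""
    sq = [e * e for e in residuals]
    y = sq[lags:]
    rows = [[1.0] + [sq[t - k] for k in range(1, lags + 1)]
            for t in range(lags, len(sq))]
    k = lags + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return len(y) * (1.0 - ss_res / ss_tot)

def simulate_arch1(n, omega=1.0, alpha=0.5, seed=0):
    """Hypothetical ARCH(1) series used only to exercise the test."""
    rng = random.Random(seed)
    previous, series = 0.0, []
    for _ in range(n):
        sigma2 = omega + alpha * previous ** 2
        previous = rng.gauss(0.0, 1.0) * sigma2 ** 0.5
        series.append(previous)
    return series

noise_rng = random.Random(1)
iid_series = [noise_rng.gauss(0.0, 1.0) for _ in range(2000)]
arch_series = simulate_arch1(2000)

iid_stat = arch_lm_statistic(iid_series, 10)
arch_stat = arch_lm_statistic(arch_series, 10)
for name, stat in (("iid noise", iid_stat), ("ARCH(1)", arch_stat)):
    h = int(stat > CRITICAL_5PCT[10])  # H = 1 rejects "no ARCH/GARCH effects"
    print(f"{name}: stat = {stat:.2f}, H = {h}")
```

The decision rule matches the tables: H = 1 whenever the statistic exceeds the critical value for the chosen lag and significance level.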

Table 4. GARCH model parameters and their statistics.

Site                         Coefficient  Value  Std error  t-statistic  P value
Brussels (Molenbeek) (B1)    GARCH(1)     0.381  0.115      3.3          0.0010
Brussels (Molenbeek) (B1)    GARCH(2)     0.363  0.102      3.6          0.0003
Brussels (Molenbeek) (B1)    ARCH(1)      0.161  0.018      8.9          <0.0001
Brussels (PARL.EUROPE) (B2)  GARCH(1)     0.866  0.023      36.7         <0.0001
Brussels (PARL.EUROPE) (B2)  ARCH(1)      0.090  0.015      5.8          <0.0001
London (Brent) (L1)          GARCH(1)     0.733  0.017      41.5         <0.0001
London (Brent) (L1)          ARCH(1)      0.171  0.012      13.9         <0.0001
London (Bloomsbury) (L2)     GARCH(1)     0.833  0.013      60.8         <0.0001
London (Bloomsbury) (L2)     ARCH(1)      0.118  0.011      10.7         <0.0001
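The stationarity claim made for the fitted models can be checked directly from Table 4. The sketch below assumes the usual covariance-stationarity condition for a GARCH(p, q) process, namely that all GARCH and ARCH coefficients sum to less than one; for the GARCH(2,1) model at B1 this means adding both GARCH terms. The dictionary layout and function names are illustrative, not taken from the paper.

```python
# GARCH and ARCH coefficients copied from Table 4; the layout is illustrative.
TABLE_4 = {
    "B1": {"garch": [0.381, 0.363], "arch": [0.161]},
    "B2": {"garch": [0.866], "arch": [0.090]},
    "L1": {"garch": [0.733], "arch": [0.171]},
    "L2": {"garch": [0.833], "arch": [0.118]},
}

def persistence(coeffs):
    """Sum of all GARCH and ARCH coefficients of one fitted model."""
    return sum(coeffs["garch"]) + sum(coeffs["arch"])

def is_covariance_stationary(coeffs):
    """Assumed condition: the coefficient sum must stay below 1."""
    return persistence(coeffs) < 1.0

for site, coeffs in TABLE_4.items():
    print(f"{site}: persistence = {persistence(coeffs):.3f}, "
          f"stationary = {is_covariance_stationary(coeffs)}")
```

All four sums lie between about 0.90 and 0.96, so the processes are stationary but highly persistent: a shock to the variance decays slowly.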

variances. These conditional standard deviations have also been estimated for 100 out-of-sample one-day-ahead forecasts. Subsequently, they have been used to estimate the 95% forecast confidence intervals. The first 20 out-of-sample forecasts (explained below) with the newly estimated 95% forecast confidence intervals are presented in Figs. 9 and 10 for the sites B1, B2, L1 and L2. Since the long-term conditional variances tend to the unconditional variance in the limit (Section 3), the forecast confidence intervals obtained from FFT-ARIMA and FFT-ARIMA-GARCH tend to merge in the long run. The difference between the two is therefore practically indiscernible in long time series plots, so we show only the first 20 out-of-sample forecasts on the plots. The newly constructed forecast confidence intervals using GARCH models can be compared with the earlier estimated forecast confidence intervals (using FFT-ARIMA) [Figs. 9 and 10]. Figs. 9 and 10 show that the forecast confidence intervals obtained from FFT-ARIMA-GARCH are narrower than those obtained from FFT-ARIMA. Though both sets of intervals tend to merge after a sufficiently long period, there is a clear improvement in the short term. Note that the narrower the forecast confidence interval, the greater the reliability of the forecasts. For example, the information that the next day's maximum O3 concentration at site B1 is likely to be 57 and that, with 95% confidence, it is likely to stay between 31.1 and 82.1 is more precise than the information that the next day's O3 concentration at site B1 is likely to be 57 and that, with 95% confidence, it is likely to stay between 19.3 and 94.6. The first statement comes from the FFT-ARIMA-GARCH model and the second from the FFT-ARIMA model. Thus, the reliability of forecasts is improved if we are able to provide narrower forecast confidence intervals. This has been made possible by exploiting the correlation structure in the variances, as is the case with FFT-ARIMA-GARCH models [Figs. 9 and 10]. The usefulness of the FFT-ARIMA-GARCH model becomes more evident when we make probability forecasts of ozone episodes.

The main purpose of air pollutant forecasts is to forewarn the public whether the pollutant concentration will exceed the prescribed threshold. Thus, the information whether the next day's O3 is likely to exceed the prescribed threshold might be more useful than point forecasts of the O3 concentration. With this in mind, we make probability forecasts of O3-episodes first with the help of FFT-ARIMA models and then with the help of FFT-ARIMA-GARCH models. To test the utility of such probability forecasts, we also estimate them for 30 out-of-sample members. In this study, we have followed the WHO air quality guidelines and the UK air quality standards, which prescribe an ambient O3 concentration of less than 100 μg m⁻³ as safe (details in Section 2). Since the models have been constructed on the assumption that the conditional distribution of the residuals follows a Gaussian process, the probability of O3-episode occurrence (≥100 μg m⁻³) has also been calculated using the Gaussian probability function. To validate the approach, 30 consecutive days were selected from the time series when O3-episodes were more frequent.
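The Gaussian probability calculation can be sketched with the example intervals quoted above. This is a hedged illustration, not the authors' procedure: it assumes the quoted 95% intervals are symmetric Gaussian intervals of the form forecast ± 1.96σ, recovers σ from their width, and then evaluates P(O3 ≥ 100 μg m⁻³). The function names are hypothetical.

```python
import math

Z_95 = 1.96        # two-sided 95% Gaussian quantile (assumed interval construction)
THRESHOLD = 100.0  # O3-episode threshold in ug/m3, as stated in the text

def sigma_from_interval(lower, upper, z=Z_95):
    """Standard deviation implied by a symmetric Gaussian interval."""
    return (upper - lower) / (2.0 * z)

def exceedance_probability(forecast, sigma, threshold=THRESHOLD):
    """P(X >= threshold) for X ~ N(forecast, sigma^2)."""
    z = (threshold - forecast) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

forecast = 57.0
sigma_garch = sigma_from_interval(31.1, 82.1)  # FFT-ARIMA-GARCH interval from the text
sigma_arima = sigma_from_interval(19.3, 94.6)  # FFT-ARIMA interval from the text

p_garch = exceedance_probability(forecast, sigma_garch)
p_arima = exceedance_probability(forecast, sigma_arima)
print(f"sigma (GARCH) = {sigma_garch:.2f}, p = {p_garch:.5f}")
print(f"sigma (no GARCH) = {sigma_arima:.2f}, p = {p_arima:.5f}")
```

With the same point forecast well below the threshold, the wider FFT-ARIMA interval assigns a larger exceedance probability than the GARCH-based interval, which is one way a model without GARCH can overstate the chance of an episode.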
These 30 days were treated as out of sample and kept out of the modelling procedure, and the models were constructed using all the time series data before these 30 days. Note that the model structure essentially remains the same, because the original models were built on a sufficiently long time series and the series that remains after withholding these days is still sufficiently long. For site B1, these 30 days were from 28-Jun-06 to 27-July-06; for site B2, from 9-Jun-06 to 8-July-06; for site L1, from 18-Jun-06 to 17-July-06; and for site L2, from 3-July-06 to 1-Aug-06. When GARCH modelling was ignored, the probability (p) of O3-episode occurrence was calculated on the basis of the unconditional standard deviation obtained using the FFT-ARIMA models. When

Table 5. Engle's hypothesis test to detect the presence of ARCH/GARCH effects on the standardized residuals of GARCH models [H = 0 means accepting the null hypothesis and H = 1 means rejecting the null hypothesis].

Site  Lag (M)  H (0.05)  H (0.01)  P value  Stat    Critical value (0.05)  Critical value (0.01)
B1    10       0         0         0.141    14.755  18.3070                23.209
B1    15       1         0         0.029    26.866  24.9958                30.577
B1    20       1         0         0.017    35.570  31.4104                37.566
B2    10       0         0         0.353    11.049  18.3070                23.209
B2    15       0         0         0.114    22.734  24.9958                30.577
B2    20       0         0         0.223    24.453  31.4104                37.566
L1    10       0         0         0.053    18.097  18.3070                23.209
L1    15       0         0         0.068    23.816  24.9958                30.577
L1    20       0         0         0.133    27.056  31.4104                37.566
L2    10       0         0         0.878    5.188   18.3070                23.209
L2    15       0         0         0.766    10.806  24.9958                30.577
L2    20       0         0         0.853    13.908  31.4104                37.566

Table 6. O3-episodes analysis results: percentage of O3-episodes correctly forecasted and percentage of false alarms for 30 out-of-sample one-day-ahead forecasts (O3 critical limit: 100 μg m⁻³).

Site (period)                  Correctly forecasted,  Correctly forecasted,  False alarms,  False alarms,
                               FFT-ARIMA              FFT-ARIMA-GARCH        FFT-ARIMA      FFT-ARIMA-GARCH
B1 (28-Jun-06 to 27-July-06)   82.6%                  91.3%                  9.5%           8.7%
B2 (9-Jun-06 to 8-July-06)     80%                    90%                    20%            18.2%
L1 (18-Jun-06 to 17-July-06)   58.8%                  70.6%                  9.1%           7.7%
L2 (3-July-06 to 1-Aug-06)     38.4%                  53.8%                  16.7%          12.5%

GARCH modelling was taken into account, conditional standard deviations were used to calculate the probability (p) of O3-episode occurrence. We say that an O3-episode is likely to occur if p ≥ 0.6, i.e., the O3 concentration is forecast to be ≥100 μg m⁻³ if the calculated p ≥ 0.6. A forecast is said to be successful if both of these (i.e., p ≥ 0.6 and O3 concentration ≥ 100 μg m⁻³) occur together the next day. The criterion p ≥ 0.6 is qualitative but reasonable in the sense that a probability p ≥ 0.6 represents a greater likelihood of occurrence of a phenomenon than of its non-occurrence. We define a forecast to be a false alarm if the model gives p ≥ 0.6 but the observed O3 concentration turns out to be <100 μg m⁻³ the next day. The total percentage of correctly forecasted O3-episodes and the total percentage of false alarms have been calculated as follows:

$$\text{Percentage of successful forecasts} = 100 \times \frac{\text{No. of successful forecasts}}{\text{Total no. of O}_3\text{-episodes actually occurred}},$$

$$\text{Percentage of false alarms} = 100 \times \frac{\text{No. of false alarms}}{\text{Total no. of O}_3\text{-episodes forecasted}}.$$

Both of these quantities have been calculated for the 30 out-of-sample days considered. The results for each site are presented in Table 6. The observations, forecasts and forecast confidence intervals (of FFT-ARIMA and FFT-ARIMA-GARCH) for the corresponding days are depicted in Figs. 11 and 12. Table 6 shows that the percentage of correct probability forecasts of O3-episodes ranges from 53.8% to 91.3% using GARCH, an improvement of 8.7 to 15.4 percentage points over FFT-ARIMA alone. The percentages of false alarms raised by the two models can also be compared using Table 6. The results show that the number of false alarms raised by FFT-ARIMA-GARCH is either lower than or comparable to that raised by the FFT-ARIMA models at all the sites.
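The two scores defined above can be computed as in the following minimal sketch. The function name and the five-day toy data are hypothetical; the cut-off p ≥ 0.6 and the 100 μg m⁻³ threshold are the ones stated in the text.

```python
def score_episode_forecasts(probabilities, observed, p_cut=0.6, threshold=100.0):
    """Return (% of episodes correctly forecasted, % of false alarms)."""
    forecasted = [p >= p_cut for p in probabilities]
    occurred = [o >= threshold for o in observed]
    hits = sum(f and o for f, o in zip(forecasted, occurred))
    false_alarms = sum(f and not o for f, o in zip(forecasted, occurred))
    n_occurred = sum(occurred)
    n_forecasted = sum(forecasted)
    # Each percentage is undefined when its denominator is zero.
    pct_success = 100.0 * hits / n_occurred if n_occurred else float("nan")
    pct_false = 100.0 * false_alarms / n_forecasted if n_forecasted else float("nan")
    return pct_success, pct_false

# Hypothetical five-day example: forecast probabilities and next-day observations.
toy_probabilities = [0.80, 0.70, 0.30, 0.65, 0.20]
toy_observed = [120.0, 90.0, 110.0, 105.0, 80.0]

success, false_alarm = score_episode_forecasts(toy_probabilities, toy_observed)
print(f"correctly forecasted: {success:.1f}%, false alarms: {false_alarm:.1f}%")
```

Note that the two percentages use different denominators: successes are counted against episodes that actually occurred, false alarms against episodes that were forecasted.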

Fig. 11. Representation of one-day-ahead 20 out-of-sample daily maximum O3 concentration forecasts at Brussels (Molenbeek) (B1) and Brussels (PARL.EUROPE) (B2) for which probabilistic forecasts of O3-episodes have been made using FFT-ARIMA and FFT-ARIMA-GARCH.


Fig. 12. Representation of one-day-ahead 20 out-of-sample daily maximum O3 concentration forecasts at London (Brent) (L1) and London (Bloomsbury) (L2) for which probabilistic forecasts of O3-episodes have been made using FFT-ARIMA and FFT-ARIMA-GARCH.
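The interval bands in these figures rest on conditional standard deviations, and the text notes that the GARCH and non-GARCH intervals merge in the long run. A minimal sketch of a GARCH(1,1) variance recursion illustrates both points. The ARCH and GARCH coefficients are those reported for site B2 in Table 4; the constant term `OMEGA`, the residual values and all function names are hypothetical, since the paper does not report them in this section.

```python
import math

def conditional_variances(residuals, omega, arch, garch, initial_variance):
    """sigma2[t+1] = omega + arch * e[t]^2 + garch * sigma2[t].

    The last element is the one-step-ahead variance after the final residual.
    """
    sigma2 = [initial_variance]
    for e in residuals:
        sigma2.append(omega + arch * e * e + garch * sigma2[-1])
    return sigma2

def variance_forecasts(sigma2_next, omega, arch, garch, horizon):
    """Multi-step forecasts: each step applies omega + (arch + garch) * previous."""
    forecasts = [sigma2_next]
    for _ in range(horizon - 1):
        forecasts.append(omega + (arch + garch) * forecasts[-1])
    return forecasts

def interval_95(point_forecast, sigma2):
    """Symmetric Gaussian 95% interval around a point forecast."""
    half_width = 1.96 * math.sqrt(sigma2)
    return point_forecast - half_width, point_forecast + half_width

ARCH_B2, GARCH_B2 = 0.090, 0.866  # Table 4, site B2
OMEGA = 10.0                      # hypothetical constant term
unconditional = OMEGA / (1.0 - ARCH_B2 - GARCH_B2)

residuals = [5.0, -3.0, 8.0, -2.0, 1.0]  # hypothetical recent calm period
sigma2 = conditional_variances(residuals, OMEGA, ARCH_B2, GARCH_B2, unconditional)
path = variance_forecasts(sigma2[-1], OMEGA, ARCH_B2, GARCH_B2, horizon=500)

print("unconditional variance:", round(unconditional, 2))
print("1-step conditional variance:", round(path[0], 2))
print("500-step forecast variance:", round(path[-1], 2))
print("1-step 95% interval around 57:", interval_95(57.0, path[0]))
```

After a run of small residuals the one-step conditional variance sits below the unconditional variance, giving a narrower short-term interval, while the multi-step forecasts return to the unconditional level.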

Thus, at all the sites, there is a significant improvement both in the correctly forecasted episodes and in the reduction of the number of false alarms when the GARCH modelling procedure is introduced.

5. Conclusion

The present study has applied the GARCH modelling technique in association with FFT-ARIMA in order to forecast daily maximum O3 concentration and to make probabilistic forecasts of ozone episodes at four urban sites in two major European cities (London and Brussels). In the modelling process, the ARIMA model structure [ARIMA(1,0,2)] is the same for all the sites; however, the GARCH model structure differs at site B1, where it is GARCH(2,1), while it is GARCH(1,1) for all the other sites. This might be related to the fact that site B1 is a busy traffic site while the rest of the sites have urban background characteristics. At a busy traffic site, a large number of factors play a role in governing air pollutant concentrations. Often, local disturbances such as traffic jams may also play a significant role in governing the transport and dispersal of air pollutants. All of these might introduce characteristics in the O3 time series that are absent at urban background sites. In other words, there are many more random perturbations at a traffic site than at urban background sites. This possibly also makes the nature of heteroskedasticity at a traffic site different from that at urban background sites, as heteroskedasticity in a time series is mainly an outcome of short-term random perturbations introduced in the time series. Thus, the GARCH model structure differs at a busy traffic site. On the other hand, FFT captures long-term cyclic trends in a time series, while ARIMA exploits the long-term stationary characteristics of the time series, which, for a traffic site, might remain similar or comparable to those of urban background sites. This possibly explains why the FFT-ARIMA model structure remains the same at all the sites. However, the results clearly reveal that modelling heteroskedastic effects in O3 time series using GARCH not only improves the short-term forecast confidence intervals but also makes short-term probability forecasts of O3-episodes more accurate. At all the sites, the introduction of GARCH models has significantly improved the probability forecasts of ozone episodes. In addition to the improvement in correctly forecasted O3-episodes, the number of false alarms has also been reduced at these sites. Although the present study has been conducted for four urban sites, the methodology is quite general in nature and can be extended to many other sites where a similar structure of the daily maximum O3 time series can be exploited.

References
Aradhyula, S.V., Holt, M.T., 1988. GARCH time-series models: an application to retail livestock prices. Western Journal of Agricultural Economics 13, 365-374.
Bollerslev, T., 1986. Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31, 307-327.
Breusch, T.S., Pagan, A.R., 1978. A simple test for heteroscedasticity and random coefficient variation. Econometrica 46, 1287-1294.
Breusch, T.S., Pagan, A.R., 1980. The Lagrange multiplier test and its application to model specification. Review of Economic Studies 47, 239-254.
Brockwell, J.B., Davis, R.A., 2002. Introduction to Time Series and Forecasting. Springer-Verlag Inc, New York.
Chatfield, C., 2004. The Analysis of Time Series: An Introduction. Chapman & Hall/CRC, New York Washington, D.C. (also published in the Taylor & Francis e-Library, 2009).
Choon, O.H., Chuin, J.L.T., 2008. A Comparison of Neural Network Methods and Box-Jenkins Model in Time Series Analysis. From Proceeding (605) Advances in Computer Science and Technology - 2008, 605-024.
Cobourn, W.G., 2007. Accuracy and reliability of an automated air quality forecast system for ozone in seven Kentucky metropolitan areas. Atmospheric Environment 41, 5863-5875.
Demuzere, M., van Lipzig, N.P.M., 2010. A new method to estimate air-quality levels using a synoptic-regression approach. Part I: present-day O3 and PM10 analysis. Atmospheric Environment 44, 1341-1355.
Denby, B., Schaap, M., Segers, Arjo, Builtjes, Peter, Horálek, Jan, 2008. Comparison of two data assimilation methods for assessing PM10 exceedances on the European scale. Atmospheric Environment 42, 7122-7134.
Engle, R.F., 1979. A General Approach to the Construction of Model Diagnostics Based upon Lagrange Multiplier Principle. University of California, San Diego. Discussion Paper 79-43.
Engle, R.F., 1982. Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007.
Godfrey, L.G., 1978. Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46, 1293-1302.
Ho, S.L., Xie, M., Goh, T.N., 2002. A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers and Industrial Engineering 42, 371-375.
Honoré, C., Rouïl, L., Vautard, R., et al., 2008. Predictability of European air quality: assessment of 3 years of operational forecasts and analyses by the PREV'AIR system. Journal of Geophysical Research 113 (D04301). doi:10.1029/2007JD008761.
Hubbard, M.C., Cobourn, M.C., 1998. Development of a regression model to forecast ground-level ozone concentration in Louisville, KY. Atmospheric Environment 32, 2637-2647.
Kumar, U., Prakash, A., Jain, V.K., 2009. A multivariate time series approach to study the interdependence among O3, NOx and VOCs in ambient urban atmosphere. Environmental Modeling and Assessment 14, 631-643.
Kumar, U., Jain, V.K., 2009. ARIMA forecasting of ambient air pollutants (O3, NO, NO2 and CO). Stochastic Environmental Research and Risk Assessment. doi:10.1007/s00477-009-0361-8.
Ljung, L., 1999. System Identification - Theory for the User. NJ, Prentice Hall PTR.
Masters, G.M., 1998. Introduction to Environmental Engineering and Science. Pearson Education, Singapore.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 2002. Numerical Recipes in C: the Art of Scientific Computing. Cambridge University Press, Cambridge.
Prior, E.J., Schiess, J.R., McDougal, D.S., 1981. Approach to forecasting daily maximum ozone levels in St. Louis. Environmental Science and Technology 15, 430-436.
Robeson, S.M., Steyn, D.G., 1989. A conditional probability density function for forecasting ozone air quality data. Atmospheric Environment 23, 689-692.
Robeson, S.M., Steyn, D.G., 1990. Evaluation and comparison of statistical forecast models for daily maximum ozone concentrations. Atmospheric Environment 24B, 303-312.
Schmidt, H., Derognat, C., Vautard, R., Beekmann, M., 2001. A comparison of simulated and observed O3 mixing ratios for the summer of 1998 in Western Europe. Atmospheric Environment 35, 6277-6297.
Shabri, A., 2001. Comparison of time series forecasting methods using neural networks and Box-Jenkins models. Mathematika 17, 25-32.
Shumway, R.H., Stoffer, D.S., 2006. Time Series Analysis and its Applications - With R Examples. Springer Science+Business Media, LLC.
Simpson, R.W., Layton, A.P., 1983. Forecasting peak ozone levels. Atmospheric Environment 17, 1649-1654.
Slini, Th., Karatzas, K., Moussiopoulos, N., 2002. Statistical analysis of environmental data as the basis of forecasting: an air quality application. The Science of the Total Environment 288, 227-237.
Tang, Z., Almeida, C.De, Fishwick, P.A., 1991. Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation 57, 303-310.
Tsai, C.-h., Chang, L.-c., Chiang, H.-c., 2009. Forecasting of ozone episode days by cost-sensitive neural network methods. Science of the Total Environment 407, 2124-2135.
van Loon, M., Builtjes, P.J.H., Segers, A., 2000. Data assimilation of ozone in the atmospheric transport chemistry model LOTOS. Environmental Modelling & Software 15, 603-609.
