Monitoring and prediction of indoor air quality (IAQ) in subway or metro systems
using season dependent models
MinJeong Kim (a), B. SankaraRao (a), OnYu Kang (a), JeongTai Kim (b), ChangKyoo Yoo (a)
(a) Department of Environmental Science and Engineering, Center for Environmental Studies, Kyung Hee University, Yongin 446-701, Republic of Korea
(b) Department of Architectural Engineering, Center for Sustainable Healthy Buildings, Kyung Hee University, Yongin 446-701, Republic of Korea
Abstract
Various types of hazardous pollutants usually accumulate in the underground stations of subway (metro) systems. Current strategies for controlling indoor air quality (IAQ) in subway stations rely on predictive models that do not account for the effect of temperature due to seasonal variations. In this paper, season dependent models for monitoring and predicting IAQ, which account for seasonal changes, are proposed. Real-time data on various variables (namely, the concentrations of PM10 and PM2.5 on the platform, temperature, humidity and the concentration of nitrogen oxides) were obtained from a Seoul subway station from March 2008 to February 2009. A MANOVA test was carried out to quantify the differences among the data sets of three seasons (spring and fall, summer, and winter). PCA and PLS regression methods were applied to the data set of one year (to develop a global model) and to the data sets of four seasons (to develop seasonal models) to monitor and predict the IAQ. The results of this study show that the seasonal models predict future PM10 and PM2.5 concentrations more precisely than the global model.
Keywords: Indoor air quality (IAQ); Multivariate analysis of variance (MANOVA); PCA; PLS regression; Season dependent models; Subway station; Metro system
© 2011 Elsevier B.V. All rights reserved.
ISSN 0378-7788; doi:10.1016/j.enbuild.2011.10.047
M. Kim et al. / Energy and Buildings 46 (2012) 48–55
obtained for one year and one season. However, they did not propose a quantitative method for determining, from a given one-year data set, whether IAQ depends on seasonal variations.
In the first part of the present study, a multivariate analysis of variance (MANOVA) approach is used to determine whether seasonal variations affect the IAQ. To examine the influence of seasonal variations on the IAQ, data under the following two scenarios are compared: (1) the real data, which include the effect of temperature, and (2) the data obtained after removing the effect of temperature. To eliminate the influence of seasonal temperature from the real data, a method of external analysis is used.
Next, PCA and PLS regression are applied to the data sets of one year and of four seasons to monitor and to predict the IAQ, respectively. These methods are applied to real-time data obtained from a tele-monitoring system (TMS) in S-station, Korea. Monitoring the IAQ with PCA yields sets of mutually dependent IAQ variables, which reveal the dependency between the different IAQ variables. PLS regression is used to predict the IAQ of a particular day from the data of the previous day. This procedure is followed to predict the IAQ for one year (namely, the global prediction curve) and for four seasons (namely, the seasonal prediction curves). "Global prediction model" is a commonly used term for a single model that is valid over the entire phase space (or entire period) of the data set. The RMSE values of the concentration prediction curves (of PM10 and PM2.5) obtained using the global and seasonal PLS regression models are compared to determine whether the seasonal regression model is superior to the global one.
The outline of this paper is as follows. Section 2 introduces the theory of MANOVA, external analysis and multivariate statistical methods. In Section 3, the motivation of the present study is introduced, and the proposed methods for monitoring and predicting the IAQ for the four seasons and the full year are explained. Section 4 presents the results for the data obtained from the Seoul metro station. Finally, the conclusions of this article are presented.
2. Theory
2.1. Multivariate analysis of variance (MANOVA)
MANOVA is used to investigate whether the population mean vectors are the same and, if not, which mean components differ significantly. Suppose a data set has g populations, population l has n_l vectors, and each vector has p elements. Then vector j of population l can be decomposed according to the following model:
x_{lj} = \mu + \tau_l + e_{lj}, \qquad j = 1, \ldots, n_l, \quad l = 1, \ldots, g   (1)
where \mu is the overall mean vector, \tau_l is the effect of the lth population and e_{lj} is a random error vector. The test statistic is Wilks' lambda,
\Lambda^* = \frac{|W|}{|B + W|}   (2)
where B and W are the treatment and residual sum of squares and cross products (SSP) matrices given in Table 1.
The F-test rejects the null hypothesis H0: \tau_1 = \tau_2 = \cdots = \tau_g = 0 at level \alpha if
F = \frac{\sum_{l=1}^{g} n_l - p - 2}{p} \left( \frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}} \right) > F_{2p,\, 2(\sum_{l=1}^{g} n_l - p - 2)}(\alpha)   (3)
Here \tau_l is estimated by \bar{x}_l - \bar{x}, where \bar{x}_l = (x_{l1} + x_{l2} + \cdots + x_{l n_l})/n_l is the mean vector of the lth population and \bar{x} is the overall mean vector. F_{2p, 2(\sum n_l - p - 2)}(\alpha) is the upper (100\alpha)th percentile of the F-distribution with 2p and 2(\sum_{l=1}^{g} n_l - p - 2) degrees of freedom. Under multivariate normality, this form of the test is exact for g = 3 populations, which is the case in this study.
2.2. External analysis
Process variables can usually be classified into three groups: external variables, main variables and other variables. The concentrations of the main variables depend on the external and other variables. Hence, the expressions for the main variables (such as PM10 and PM2.5) can be divided into two parts:
(a) the part affected by the external variables, such as temperature, humidity and number of passengers;
(b) the part affected by the other variables, such as the ventilation system, air filtration and air curtains [22,23].
External analysis is carried out on the main variables to remove the effect of the external variables, as follows.
In external analysis, the data matrix X (of main and external variables) can be written as a combination of external and main variables:
X = \begin{bmatrix} G & H \end{bmatrix}
where G is the matrix of external-variable data and H is the matrix of main-variable data. The main matrix H can be split into two parts, GC and E. GC is the part affected by the external variables, and E is the part affected by the other variables. Thus H = GC + E, and
E = H - GC
where C is the coefficient matrix calculated as C = (G^T G)^{-1} G^T H [22]. External analysis offers three advantages [24]:
(1) it detects abnormal variations in the main variables;
(2) it locates the group of variables that causes the abnormal variation;
(3) it filters abnormal variations out of the main variables.
In this study, we focus on the E matrix. It is obtained by removing the effect of changes in the external variables (i.e., temperature in this study) from the changes in the main variables (i.e., PM10 and PM2.5 in this study).
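The external analysis step above reduces to a least-squares projection. The following minimal NumPy sketch illustrates it under two assumptions that the source does not state.

- The columns of G and H are mean-centered, because the formula for C has no intercept term.
- A linear solve replaces the explicit inverse (G^T G)^{-1}. This yields the same C.

The function name `external_analysis` and the synthetic temperature/PM data are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def external_analysis(G, H):
    """Remove the part of the main variables H explained by the external variables G.

    Assumes G (n x q) and H (n x m) are mean-centered, so no intercept column is needed.
    Returns C = (G^T G)^{-1} G^T H and the residual E = H - G C.
    """
    G = np.asarray(G, dtype=float)
    H = np.asarray(H, dtype=float)
    # Solve (G^T G) C = G^T H instead of forming the inverse explicitly.
    C = np.linalg.solve(G.T @ G, G.T @ H)
    E = H - G @ C  # part of H attributed to the "other" variables
    return C, E


# Hypothetical data: daily temperature (external) and two PM series (main).
rng = np.random.default_rng(0)
temperature = rng.normal(20.0, 8.0, size=(200, 1))
pm = 1.5 * temperature + rng.normal(0.0, 2.0, size=(200, 2))
G = temperature - temperature.mean(axis=0)
H = pm - pm.mean(axis=0)
C, E = external_analysis(G, H)
```

By construction G^T E = 0. E therefore carries no component that is linearly correlated with the external variable. In this sense it plays the role of the "temperature removed" data compared in Section 4.1.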
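Table 1
Matrices B and W required for calculation of Λ* (see Eq. (2)).
Source of variation | Matrix of sum of squares and cross products (SSP) | Degrees of freedom (d.f.)
Treatment | B = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})^T | g - 1
Residual | W = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)^T | \sum_{l=1}^{g} n_l - g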
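The MANOVA test defined by Table 1 and Eq. (3) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code:

- each population is supplied as an (n_l × p) array;
- Wilks' lambda is computed as Λ* = |W|/|B + W| from the treatment and residual SSP matrices of Table 1;
- the F transformation of Eq. (3) is applied only for g = 3 populations.

The function names and the synthetic populations are hypothetical. The populations are sized 134, 98 and 88, like the three populations of this study.

```python
import numpy as np


def ssp_matrices(groups):
    """Treatment (B) and residual (W) SSP matrices of Table 1."""
    all_x = np.vstack(groups)
    grand_mean = all_x.mean(axis=0)
    p = all_x.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for x in groups:
        mean_l = x.mean(axis=0)
        d = (mean_l - grand_mean).reshape(-1, 1)
        B += x.shape[0] * (d @ d.T)      # n_l (xbar_l - xbar)(xbar_l - xbar)^T
        centered = x - mean_l
        W += centered.T @ centered       # sum_j (x_lj - xbar_l)(x_lj - xbar_l)^T
    return B, W


def manova_three_groups(groups):
    """Wilks' lambda and the F statistic of Eq. (3); valid for g = 3 only."""
    if len(groups) != 3:
        raise ValueError("Eq. (3) is the exact transformation for g = 3 populations")
    B, W = ssp_matrices(groups)
    lam = np.linalg.det(W) / np.linalg.det(B + W)   # Wilks' lambda
    n_total = sum(x.shape[0] for x in groups)
    p = B.shape[0]
    root = np.sqrt(lam)
    f_stat = (n_total - p - 2) / p * (1.0 - root) / root
    dof = (2 * p, 2 * (n_total - p - 2))
    return lam, f_stat, dof


# Hypothetical example: three populations with p = 2 variables and shifted means.
rng = np.random.default_rng(42)
groups = [
    rng.normal([0.0, 0.0], 1.0, size=(134, 2)),
    rng.normal([3.0, 0.0], 1.0, size=(98, 2)),
    rng.normal([0.0, 3.0], 1.0, size=(88, 2)),
]
lam, f_stat, dof = manova_three_groups(groups)
# Reject H0 at level alpha if f_stat exceeds the upper-alpha point of F(dof[0], dof[1]),
# e.g. scipy.stats.f.ppf(1 - alpha, *dof) when SciPy is available.
```

The critical value is left out of the code to keep the sketch free of dependencies beyond NumPy. For the real data, the text reports F = 172.76 against a critical value of 2.11.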
written as the sum of the outer products of the vectors t_i and p_i and the residual matrix E:
X = TP^T + E = \sum_{i=1}^{n} t_i p_i^T + E   (4)
where t_i is a score vector that contains information on the relationship between different samples, p_i is a loading vector that contains information on the relationship between different variables, and n is the number of independent variables [25].
In practice, only a few principal components (PCs) are sufficient to explain most of the variation in the data. Applying the PCA model to the data makes it possible to determine the optimum number of PCs. The choice weighs the reduction in dimensionality against the loss of information. Several techniques exist for estimating the number of PCs, including scree plots, parallel analysis and cross-validation [21,26].
X = TP^T + E = \sum_{i=1}^{n} t_i p_i^T + E   (5)
Y = UQ^T + F = \sum_{i=1}^{n} u_i q_i^T + F   (6)
In Eqs. (5) and (6):
- p_i and q_i are the loading vectors that contain information on the relationships among the process (input) variables and among the output variables, respectively;
- n is the number of latent variables (LVs);
- T and U are score matrices, which contain the information on the relationships between different samples;
- E and F are the residuals.
The PLS regression model that relates X and Y can be expressed as
Y = X B_{LV} + F   (7)
where the coefficient matrix is given by B_{LV} = W(P^T W)^{-1} Q^T, in which W is the weight matrix whose columns are proportional to X^T Y q [29–31].
3. Materials and methods
3.1. Motivation of this study
Constructing models from time series with nontrivial dynamics is a difficult problem. Conventional data-based modeling methods usually focus on global approaches, which adopt a single model to represent the input–output behavior of the system. However, global approaches have three drawbacks for large environmental data sets:
(1) they are inefficient at characterizing the data belonging to a particular period (or duration);
(2) the formulated problem is difficult to optimize;
(3) they are difficult to update on-line once the process operating mode changes.
Local approaches, on the other hand, use multiple models, each valid in a certain localized operating region. To estimate and produce an accurate forecast of the system output at a given time, local models are chosen according to a model (or criterion) defined on the input data [32–34].
To find an accurate IAQ prediction model for underground subway stations, a season dependent local model (namely, a seasonal model) needs to be developed, since the IAQ of the metro system is influenced by seasonal temperature variations. In this study, the effects of seasonal temperature variations on the concentrations of PM10 and PM2.5 are studied, and valid seasonal models are developed.
The data for spring and fall are from March 2008 to May 2008 and from October 2008 to November 2008, respectively. The data for summer and winter are from June 2008 to September 2008 and from December 2008 to February 2009, respectively. Note that the data sets of spring and fall are treated as one population, and the data sets of summer and winter are treated as the second and third populations, respectively. The numbers of samples obtained for spring and fall, summer and winter are 134, 98 and 88, respectively.
Before the MANOVA test is carried out on the IAQ data, external analysis is performed to remove the influence of seasonal changes (due to temperature variations) from the IAQ data. It is well known that air pollutants, especially PM concentrations, are affected by temperature changes [9]. Therefore, in the external analysis, temperature is treated as the external variable. The concentrations of PM10 and PM2.5 (particulate matter with aerodynamic diameters of up to 10 μm and 2.5 μm, respectively) are treated as the main variables.
MANOVA tests are carried out to determine whether seasonal variations affect IAQ. For MANOVA, the one-year data set is divided into three populations: one for spring and fall (the data of these two seasons are taken as one population), one for summer, and one for winter. Significant differences among these three populations according to changes in seasonal temperature are observed. To examine the influence of seasonal variations on the IAQ, data under the following two scenarios are compared: (1) the
real data, which include the effect of temperature, and (2) the data obtained after removing the effect of temperature from the real data by external analysis. The influence of temperature on the indoor air pollutants of the metro system must be evaluated before a seasonal IAQ model can be proposed.
Finally, multivariate statistical methods are used to propose monitoring and prediction models for IAQ that account for the variations between seasons. Monitoring and prediction models of IAQ are developed for three seasons: one for spring/fall, one for summer, and one for winter. PCA is used to develop the monitoring models, while PLS regression is used to develop the prediction models for the three seasons. In the prediction models, the output (response) variables (Y) are the current concentrations PM10(t) and PM2.5(t). These variables are chosen because PM concentrations are taken as the criteria for monitoring IAQ [19]. The input variables (X) are the current NOX concentration, the previous day's PM concentrations (t − 1), temperature and humidity. The temperature dependent models are then evaluated against a global IAQ model. The performance of the seasonal models (three models for the 1-year period) is compared with that of a global model (one model for the 1-year period) that does not account for seasonal variations.
3.3. Seoul station in Seoul metro system
The system studied is the underground subway station on line number 4 of Seoul Metro (i.e., Seoul station). Indoor air pollutant data are collected from a real-time TMS installed in the station. The TMS is located at the center of the platform. It measures, at fixed intervals, the concentrations of seven air pollutants (NO, NO2, NOX, PM10, PM2.5, CO, CO2) together with temperature and humidity. The measurement principles are as follows:
- NO, NO2 and NOX are measured by the chemiluminescence of the reaction between nitric oxide and ozone.
- PM10 and PM2.5 are measured by the β-ray attenuation principle with the corresponding size-distribution filters.
- CO and CO2 are measured by non-dispersive infrared (NDIR) absorption.
The specific characteristics (such as detection limit and measurement accuracy) of the TMS measuring equipment are presented in Table 2 [19].
Table 2
Characteristic features of the measuring equipment of the TMS in S-station (Kim et al. [19]).
Device (component analyzer) | Detection limit (measuring range) | Measurement accuracy (measurement repeatability)
NOX analyzer (NA-623) | 0.5 ppb (0–1 ppb) | Within 1% of span gas concentration
PM10 analyzer (SPM-613D) | Less than 1 μg/m3 (0–0.5/1/2/5 mg/m3) | Less than 0.5% of full scale (FS)
PM2.5 analyzer (SPM-613) | Less than 1 μg/m3 (0–0.5/1/2/5 mg/m3) | Less than 2% FS
CO2 analyzer (NDIR gas analyzer) | 0.1 ppm (0–5000 ppm) | Within 1% FS
The daily mean values of each TMS variable recorded from March 2008 to February 2009 are used in the present study. The average values of the indoor air pollutants obtained from these data are shown in Table 3. Meteorological data are also used in this study. They were collected from (1) the Seoul Metropolitan Research Institute of Public Health and Environment and (2) online material from the meteorological office.
Table 3
Average values of indoor air pollutants obtained using the data taken from March 2008 to February 2009 at Seoul subway station.
4. Results and discussion
4.1. MANOVA test for evaluation of changes in seasonal temperature
To examine the influence of seasonal variations on the IAQ, the IAQ data under the following two scenarios are compared: (1) the real data, which include the effect of temperature, and (2) the data obtained after removing the effect of temperature. To obtain the second data set, external analysis is carried out on the original data from Seoul subway station to remove the effect of temperature on the IAQ data. PCA score plots (not shown in the present paper) are obtained for the two scenarios. Score 1 is the score plot for the data that include the influence of temperature. Score 2 is the score plot for the data from which external analysis has removed the influence of temperature variations. Significant differences between the seasonal population data sets of summer and winter are clearly observed in score 1. However, once the influence of temperature is eliminated from the original data, the boundaries between the seasonal populations fade. These results show that the data sets obtained for different seasons are affected by temperature. This is consistent with
the results of previous researchers that IAQ (especially PM concentrations) depends strongly on variations in meteorology and temperature. Hence, we can conclude that seasonal temperature variation is an important variable that affects the IAQ data obtained in the metro system.
To statistically verify the influence of seasonal variations on the IAQ data, a MANOVA test is carried out. The MANOVA test gives a quantitative measure of the differences among the data sets of different seasons. For the MANOVA test, the original data obtained from Seoul subway station, which include the effect of temperature, are divided into three populations: (1) spring and fall, (2) summer, and (3) winter. The null hypothesis for MANOVA is that there is no significant difference between the seasonal population mean vectors (H0: τ_spring/fall = τ_summer = τ_winter = 0). It is rejected if the inequality in Eq. (3) holds. The MANOVA test gives F = 172.76, whereas the critical F value is 2.11 (the upper 5% point, i.e., the 95th percentile, of the F-distribution). Since the F value is much higher than the critical value, the null hypothesis is rejected at the 5% significance level. This means that the IAQ data differ significantly between seasons.
Consequently, a global model (for the 1-year period) that does not account for seasonal changes is not valid for the underground subway system. Temperature dependent models for monitoring and predicting IAQ are therefore proposed in the next step.
4.2. Proposal of seasonal monitoring and prediction model for IAQ
Fig. 2 shows a loading plot (in the PC1–PC2 plane) obtained for the global data using PCA. A loading plot is used to interpret the relationships between variables. In general, variables that cluster together in a loading plot (shown in different circles) are strongly dependent on each other. Fig. 2 shows that the seven IAQ variables in the S-station are grouped into three clusters:
(1) the first cluster contains temperature and humidity;
(2) the second contains the concentrations PM2.5(t) and PM2.5(t − 1);
(3) the last contains the concentrations PM10(t) and PM10(t − 1).
Here PMs(t − 1) denotes the concentration on the previous day. The first cluster corresponds to the general correlation between temperature and humidity: the relative humidity decreases as the temperature increases, because the saturation vapor pressure increases. The second and third clusters arise because the PMs of the previous day accumulate inside the underground subway station and affect the PM concentrations of the current and next days.
Fig. 2. Loading plot of global data obtained using PCA.
Fig. 3 shows loading plots (in the PC1–PC2 plane) obtained for the seasonal data sets using PCA. The number of groups of dependent variables differs among the three seasonal loading plots:
- Fig. 3a (spring and fall): the seven IAQ variables are grouped into two clusters. The first consists of the concentrations PM2.5(t) and PM2.5(t − 1). The second contains temperature, humidity and the concentrations PM10(t) and PM10(t − 1).
- Fig. 3b (summer): the variables are also grouped into two clusters, but the variables in these clusters differ from those of Fig. 3a.
- Fig. 3c (winter): the boundary between the groups of dependent variables is ambiguous compared with the other seasonal monitoring results. This is because, in that season, the main variables depend less on the other variables than they do over the whole year.
PLS regression is used to build the seasonal IAQ prediction model. To determine whether the seasonal regression model is superior to the global one, the two are compared by root mean square error (RMSE). The comparison uses the RMSE of the PM10 and PM2.5 concentration prediction curves obtained with the global and seasonal PLS models. RMSE is frequently used to measure the gap between model and observed values. It is defined as:
RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left( y_{i,\mathrm{observed}} - y_{i,\mathrm{model}} \right)^2}{n - 1}}
where y_{i,observed} are the actual observed values, y_{i,model} are the predicted values and n is the number of experiments [35–37].
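The global-versus-seasonal comparison described above can be sketched as follows. The sketch has two parts:

- a minimal NIPALS PLS2 implementation that forms the coefficient matrix B_LV = W(P^T W)^{-1} Q^T of Eq. (7);
- an RMSE helper with the n − 1 denominator of the definition above.

It rests on assumptions that the source does not state. X is autoscaled and Y is mean-centered before fitting. The two synthetic "seasons" differ only in the sign of one input's effect. The function names, data and choice of two latent variables are hypothetical. This is not the authors' implementation. Their models use the inputs PM10(t − 1), PM2.5(t − 1), NOX, temperature and humidity.

```python
import numpy as np


def pls_nipals(X, Y, n_lv, tol=1e-10, max_iter=500):
    """PLS2 regression by NIPALS; returns a predict(X_new) function."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = Y.mean(axis=0)
    Xr, Yr = (X - x_mean) / x_std, Y - y_mean   # autoscale X, center Y
    W, P, Q = [], [], []
    for _ in range(n_lv):
        u = Yr[:, np.argmax(Yr.var(axis=0))]
        for _ in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)              # X weight vector
            t = Xr @ w                          # X score vector
            q = Yr.T @ t / (t @ t)              # Y loading vector
            u_new = Yr @ q / (q @ q)            # Y score vector
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = Xr.T @ t / (t @ t)                  # X loading vector
        Xr = Xr - np.outer(t, p)                # deflate X and Y
        Yr = Yr - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.inv(P.T @ W) @ Q.T        # B = W (P^T W)^{-1} Q^T, Eq. (7)

    def predict(X_new):
        Xs = (np.asarray(X_new, dtype=float) - x_mean) / x_std
        return Xs @ B + y_mean

    return predict


def rmse(y_observed, y_model):
    """Column-wise RMSE with the n - 1 denominator used in the text."""
    diff = np.asarray(y_observed, dtype=float) - np.asarray(y_model, dtype=float)
    return np.sqrt((diff ** 2).sum(axis=0) / (diff.shape[0] - 1))


rng = np.random.default_rng(1)


def make_season(slope, n=120):
    """Hypothetical season: same inputs, season-specific slope on the first input."""
    X = rng.normal(size=(n, 3))
    y = slope * X[:, [0]] + 0.5 * X[:, [1]] + rng.normal(scale=0.3, size=(n, 1))
    return X, y


(Xa, ya), (Xb, yb) = make_season(2.0), make_season(-2.0)
tr, te = slice(0, 90), slice(90, None)
global_model = pls_nipals(np.vstack([Xa[tr], Xb[tr]]), np.vstack([ya[tr], yb[tr]]), n_lv=2)
model_a = pls_nipals(Xa[tr], ya[tr], n_lv=2)
model_b = pls_nipals(Xb[tr], yb[tr], n_lv=2)
y_test = np.vstack([ya[te], yb[te]])
rmse_global = rmse(y_test, global_model(np.vstack([Xa[te], Xb[te]])))
rmse_seasonal = rmse(y_test, np.vstack([model_a(Xa[te]), model_b(Xb[te])]))
```

Fitting one model per season applies the local-model idea of Section 3.1. Each seasonal model only has to represent the input–output relation of its own period. A single global model cannot do this when that relation changes between seasons.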
Fig. 3. Loading plots obtained for seasonal data sets using PCA: (a) spring and fall, (b) summer and (c) winter.
Fig. 4. Indoor air quality (IAQ) prediction results obtained using localized temperature dependent prediction models for (a) spring and fall, (b) summer and (c) winter.
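The loading plots in Figs. 2 and 3 come from PCA models (Eq. (4)) fitted to the global and seasonal data sets. A minimal NumPy sketch of obtaining the PC1–PC2 loading coordinates of each variable is shown below. It assumes autoscaled variables and an SVD-based PCA. The variable names and synthetic data are hypothetical. To choose the number of PCs, the sketch uses a simple cumulative-explained-variance rule, swapped in for brevity. The techniques actually cited in the text are scree plots, parallel analysis and cross-validation.

```python
import numpy as np


def pca_loadings(X, n_pc=2):
    """PCA of autoscaled data via SVD, X_s = T P^T + E as in Eq. (4).

    Returns P (loadings, one column per PC), T (scores) and the fraction of
    variance explained by every PC.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    P = Vt[:n_pc].T
    T = Xs @ P
    return P, T, explained


def n_pcs_for(explained, threshold=0.8):
    """Smallest number of PCs whose cumulative explained variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)


# Hypothetical seasonal data set: 6 variables driven by 2 latent factors plus noise.
names = ["PM10(t)", "PM10(t-1)", "PM2.5(t)", "PM2.5(t-1)", "temperature", "humidity"]
rng = np.random.default_rng(7)
factors = rng.normal(size=(100, 2))
mixing = np.array([[1.0, 0.9, 0.2, 0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.8, 0.9, 1.0, -0.8]])
X_season = factors @ mixing + 0.2 * rng.normal(size=(100, 6))
P, T, explained = pca_loadings(X_season, n_pc=2)
for name, (pc1, pc2) in zip(names, P):
    print(f"{name:12s} PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

The sign of each loading vector returned by an SVD is arbitrary, so a plot produced this way may be mirrored relative to another tool's loading plot. The clustering of variables, which is what the discussion of Figs. 2 and 3 relies on, is unaffected.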
Fig. 4a shows the PM10 and PM2.5 concentrations predicted by the spring and fall seasonal model. In this model, the PLS regression is constructed with three latent variables (LVs), which capture about 75% of the variance of the original data. That is, the five X variables (PM10(t − 1), PM2.5(t − 1), NOX, temperature and humidity) are reduced to three LVs, which are strongly and linearly correlated with the Y variables [PMs(t)]. The PLS regression model for spring and fall is
Table 4
Comparison of RMSE values of the PM10 and PM2.5 prediction curves obtained using the global and seasonal prediction models.
Model | Variable | Spring | Summer | Fall | Winter
Localized (seasonal) prediction model | PM10 | 23.93 | 13.10 | 18.02 | 16.34
Localized (seasonal) prediction model | PM2.5 | 9.98 | 11.17 | 16.18 | 14.71
Global prediction model | PM10 | 26.20 | 13.90 | 18.20 | 18.77
Global prediction model | PM2.5 | 10.18 | 11.92 | 17.75 | 15.40
results show that the seasonal models predict more precisely than the global model. As Table 4 shows, the seasonal models give a lower RMSE than the global model for both PM10 and PM2.5 in every season. The prediction results obtained using the proposed seasonal models (which account for the variations between seasons) are more robust than those of the global prediction model. The proposed seasonal model can effectively predict the future concentrations of indoor particulate matter in underground subway stations.
Acknowledgement
References
[23] Z. Ge, C. Yang, Z. Song, H. Wang, Robust online monitoring for multimode processes based on nonlinear external analysis, Industrial & Engineering Chemistry Research 47 (2008) 4775–4783.
[24] H.J. Ramaker, E.N.M. van Sprang, S.P. Gurden, J.A. Westerhuis, A.K. Smilde, Improved monitoring of batch processes by incorporating external information, Journal of Process Control 12 (2002) 569–576.
[25] C. Rosen, A chemometric approach to process monitoring and control, Ph.D. Thesis, Lund University, Sweden, 1998.
[26] J.E. Jackson, A User Guide to Principal Component Analysis, Wiley, New York, 1991.
[27] B.S. Dayal, J.F. MacGregor, Recursive exponentially weighted PLS and its applications to adaptive control and prediction, Journal of Process Control 3 (1997) 169–179.
[28] T. Yamamoto, Application of statistical process monitoring with external analysis to an industrial monomer plant, in: IFAC Symposium on Advanced Control of Chemical Process, 2004, pp. 405–410.
[29] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Computers & Chemical Engineering 22 (1998) 503–514.
[30] J.M. Park, H.W. Lee, The monitoring of biological wastewater treatment plant using multivariate statistical analysis, DICER Techinfo Part 1, vol. 3, 2004, pp. 193–202.
[31] M. Barry, PLS Toolbox 3.5 User's Guide, Eigenvector Research, U.S.A., 2004.
[32] V. Babovic, S.A. Sannasiraj, E.S. Chan, Error correction of a predictive ocean wave model using local model approximation, Journal of Marine Systems 53 (2005) 1–17.
[33] Z. Ge, Z. Song, Online monitoring of nonlinear multiple model processes based on adaptive local model approach, Control Engineering Practice 16 (2008) 1427–1437.
[34] L.G.M. Souza, G.A. Barreto, On building local models for inverse system identification with vector quantization algorithms, Neurocomputing 73 (2010) 1993–2005.
[35] M.H. Kim, A.S. Rao, C.K. Yoo, Dual optimization strategy for N and P removal in a biological wastewater treatment plant, Industrial & Engineering Chemistry Research 48 (2009) 6363–6371.
[36] H.B. Liu, M.J. Kim, O.Y. Kang, B.S. Rao, J.T. Kim, J.C. Kim, C.K. Yoo, Sensor validation for monitoring indoor air quality in a subway station, Indoor and Built Environment (2011), doi:10.1177/1420326X11419342.
[37] T.G. Özbalta, A. Sezer, Y. Yıldız, Models for prediction of daily mean indoor temperature and relative humidity: education building in Izmir, Turkey, Indoor and Built Environment (2011), doi:10.1177/1420326X11422163.