
Atmospheric Pollution Research 6 (2015) 365-375

Atmospheric Pollution Research
www.atmospolres.com
Anomaly detection and assessment of PM10 functional data at several
locations in the Klang Valley, Malaysia
Norshahida Shaadan (1), Abdul Aziz Jemain (2), Mohd Talib Latif (3), Sayang Mohd Deni (1)
(1) Center for Statistical and Decision Science Studies, Faculty of Computer & Mathematical Sciences, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor, Malaysia
(2) DELTA, School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia (UKM), 43600 Bangi, Selangor, Malaysia
(3) School of Environmental and Natural Resource Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia (UKM), 43600 Bangi, Selangor, Malaysia

ABSTRACT
In environmental data sets, the occurrence of a high concentration of an unusual pollutant, more formally known as an anomaly, may indicate air quality problems. Thus, a critical understanding of the behavior of anomalies is becoming increasingly important for air pollution investigations. This study was conducted to detect anomalies in daily PM10 functional data, to investigate the patterns of behavior, and to identify possible factors that determine PM10 anomalies at three selected air quality monitoring stations (Klang, Kuala Selangor and Petaling Jaya) in the Klang Valley, Malaysia. The statistical method employed to detect these anomalies consisted of a combination of the robust projection pursuit and the robust Mahalanobis distance methods, using air quality data recorded from 2005 to 2010. Analysis of the obtained anomalous PM10 profiles showed that data recorded during El Nino years (2005, 2006 and 2009) contained the highest frequency of anomalies. More frequent anomalies appeared during the southwest (SW) monsoon, which occurs in the months of July and August, as well as during the northeast (NE) monsoon in February. Fewer anomalies were also observed during weekends compared to weekdays. The weekend and monsoonal effect phenomena were shown to exist significantly at all stations, while wind speed was positively associated with extreme PM10 anomalies at the Klang and Petaling Jaya stations. In conclusion, anomaly detection was found useful for air pollution investigation in this study. The findings of this study imply that the location and background of a station, as well as wind speed, seasonal (monsoon) and weekday-weekend variations, play an important role in influencing PM10 anomalies.

Keywords: PM10, functional data, anomaly detection, air quality monitoring

Corresponding Author: Norshahida Shaadan
+60355435323
+60355435501
shahida@tmsk.uitm.edu.my

Article History:
Received: 24 April 2014
Revised: 30 September 2014
Accepted: 26 October 2014

doi: 10.5094/APR.2015.040

1. Introduction

Particulate matter is the main pollutant in ambient air, particularly in urban areas (Wrobel et al., 2000; Acero et al., 2012). In normal conditions, particulate matter usually originates from natural and anthropogenic sources such as sea spray, road dust, soil, motor vehicle usage, industrial activities, domestic activities, and biomass burning (Schauer et al., 1996; Keuken et al., 2013). Particulate matter could also be generated through the accumulation of small-sized particles or via secondary interactions between gases and ions (Shon et al., 2012). Measurement of particulate matter is usually based on its aerodynamic diameter. Generally, particulate matter with a diameter below 10 µm is referred to as PM10, while particulate matter with a diameter of less than 2.5 µm is referred to as PM2.5. PM10 has long been recognized as the main parameter for determining particulate matter in Malaysia. Additionally, it has been consistently used as the principal specification for the calculation of the Malaysian Air Pollution Index (API) (Afroz et al., 2003; Awang et al., 2000). Elevated concentrations of PM10 have also been implicated in respiratory mortality, particularly in busy areas such as the Klang Valley (Mahiyuddin et al., 2013).

PM10 has been found to have a significant connection with haze episodes from biomass burning, a challenge which has become both typical and recurring in Southeast Asia since the 1980s (Abas et al., 2004; Field et al., 2009). Local anthropogenic activities involving, for example, motor vehicle usage and industrial activity are the major sources of PM10 pollution during non-haze periods, particularly in the Klang Valley region. Meteorological factors also contribute to the amount of particulate matter in the region (Juneng et al., 2011; Dominick et al., 2012). Higher concentrations of particulate matter have been recorded during the dry season, notably during the El Nino/Southern Oscillation (ENSO) events (Matsueda et al., 1999; Mahmud, 2009). The southwesterly wind that arrives from Sumatra between July and September brings a high amount of particulate matter to the Klang Valley. The concentration of particulate matter is also influenced by local wind direction, e.g. sea and land breezes and the movement of wind within the valley. The amount of rain during the rainy season (northeast monsoon) plays an important role in reducing the quantity of particulate matter in the ambient air. Other factors, such as activities on weekdays and weekends, also influence the amount of particulate matter in the Klang Valley (Azmi et al., 2010).

Understanding the behavior of anomaly occurrences is becoming more important in air pollution investigation. The term anomalies refers to a small portion of the data set that is unusual or dissimilar to the rest of the data. Anomalies may consist of noisy data due to random errors; alternatively, they may be irregular items of data resulting from unusual or unexpected events which may indicate abnormal behavior (Torres et al., 2011). Intensive study of anomalies helps in identifying potential sources of the

© Author(s) 2015. This work is distributed under the Creative Commons Attribution 3.0 License.
Shaadan et al. Atmospheric Pollution Research (APR) 366

occurrences. Anomaly or outlier detection refers to statistical techniques used to detect abnormal data or outliers (Muniz et al., 2012). In environmental research and other fields, outlier detection is among the most important tasks in data analysis (Filzmoser, 2005; Garces and Sbarbaro, 2011). The practical application of this technique ranges from detecting financial fraud (Sharma and Panigrahi, 2012), network intrusion (Garcia-Teodoro et al., 2009; Davis and Clark, 2011), system health monitoring (Hauskrecht et al., 2013) and criminal incidence analysis (Lin and Brown, 2006) to many other areas. On the other hand, anomaly detection is less often applied to environmental data, particularly in the monitoring of air pollution, although the application is important because it could be used to evaluate polluted air in an area (Torres et al., 2011). Furthermore, as supported by Hawkins et al. (2002), it is reasonable to assume that values for possibly polluted air behave as outliers or anomalies.

Several studies that focus on air pollutant variations in the Klang Valley region have been conducted (Azmi et al., 2010; Juneng et al., 2011; Ahamad et al., 2014). Notably, functional data were rarely employed in those studies. A previous study by Shaadan et al. (2012) highlighted the advantage of the functional data approach in assessing and comparing PM10 behavior during and between the two extreme haze years (1997 and 2005) that have been reported in Malaysia. In this study, besides aiming to provide a complementary technique for the evaluation of air pollution problems, functional data were further used to increase understanding of PM10 anomalies and the associated influential factors.

Specifically, the objective of this study is to detect and analyze the profiles of anomalies in daily PM10 functional data, as well as to investigate possible factors associated with the existence of anomalies at three air quality monitoring stations (Klang, Kuala Selangor and Petaling Jaya) with differing locational backgrounds in the Klang Valley region of Peninsular Malaysia.

2. Methodology

2.1. Description of data and study location

The Klang Valley region is considered to be the heartland of Malaysia's industrial and commercial sectors, with a high-density multiracial population. The climate of Malaysia is very much influenced by two major monsoon seasons, the southwest (SW) and the northeast (NE), as well as two inter-monsoon periods. During the SW monsoon, which is reported to occur from late May until September, drier weather conditions are normally experienced. Meanwhile, the NE monsoon, which takes place between November and March, receives a higher precipitation level, particularly during the first few months of the season.

The air pollutant and meteorological variable of concern in this study are PM10 and wind speed, respectively. The data were made available by the Air Quality Division, Department of Environment Malaysia (DOE). To ensure the reliability of the measurement process, continuous monitoring and calibration of the equipment was carried out by Alam Sekitar Sdn Bhd (ASMA), a private company mandated by the DOE for this purpose. Hourly PM10 (µg m⁻³) was recorded each day using a β-ray attenuation mass monitor (BAM-1020), while wind speed data (km h⁻¹) were determined using a Met One 010C sensor.

Three air quality monitoring stations located within the Klang Valley region of Peninsular Malaysia were selected for this study: Klang (S1), Kuala Selangor (S2) and Petaling Jaya (S3). Table 1 describes background information and the data, while Figure 1 shows the location of the sampling stations. The Klang air monitoring station is located in the city centre, in close proximity to a busy, traffic-laden industrialized area, surrounded by main roads and a busy port (Port Klang). The Kuala Selangor air monitoring station is located in a residential area on the outskirts of a small town, near both the coast and the main road. Meanwhile, the Petaling Jaya air monitoring station is the nearest station to Kuala Lumpur city centre and is surrounded by industrial, residential and commercial areas as well as a heavily congested road.

Figure 1. Location of air quality monitoring stations.

Missing data were treated using the column median value computed from the available data. This method was chosen because the percentage of missing values was small (<5%). This approach is supported by Acuna and Rodriguez (2004). Since the study set out to detect anomalies in the form of functional data (curves), 2192 (N) daily PM10 curves were used for the analysis of data obtained at each station. The curve data at time [hour (t)] and at any point (j) is defined as follows:

(1)

Data processing and analysis were conducted using the free R software (R Development Core Team, 2008) together with the "rainbow" (Shang and Hyndman, 2013) and "fda" (Ramsay et al., 2013) packages.

2.2. Data analysis

Data analysis in this study involved several stages. The first stage involved data conversion from point values into functional or curve forms. The second stage focused on the detection of anomalies in PM10 functional data at the three selected stations. This was subsequently followed by an assessment of the effectiveness of the preferred detection method and by profile construction, wherein detected anomalies were extracted and summarized. Finally, a statistical test was conducted to ascertain a phenomenon that may be indicated by the anomaly profile and also to investigate the association between anomalies and wind speed.

Data conversion from points to functional data. Hourly recorded data were converted into functional data for each day i, $x_i(t)$, using the basis function expansion given by the following equation:

$$x_i(t) = \sum_{k=1}^{K} \beta_{ik} M_k(t) \qquad (2)$$


Table 1. Data and station information
Station               Background   Latitude        Longitude        Missing data (%)
Klang (S1)            Urban        N 03°00.597'    E 101°24.507'    2.17
Kuala Selangor (S2)   Suburban     N 03°19.592'    E 101°15.532'    1.94
Petaling Jaya (S3)    Industry     N 03°06.553'    E 101°38.322'    2.62

which consists of a linear combination of K independent basis functions $M_k(t)$ and the basis coefficients $\beta_{ik}$. Although various kinds of basis function can be used in the modeling process, the best choice of basis depends on the nature of the data. The Fourier basis, for example, is suitable for periodic data, while the spline basis is more appropriate for non-periodic data (Ramsay and Silverman, 2006). The appropriate value of K is determined using the Bayesian Information Criterion (BIC) based on the construction of the mean functional data from the data set (Huang and Shen, 2004). The appropriate K is the one that gives the minimum BIC. In this study, data conversion was conducted using the B-spline basis with the number of basis functions, K, equal to 15 for Klang and 17 for both Kuala Selangor and Petaling Jaya air monitoring stations. Further information on the theory and application of the statistical approaches in functional data can be obtained from Ramsay and Silverman (2002; 2006).

Anomaly detection. The multivariate robust Mahalanobis distance method reported by Hyndman and Shang (2010) was used for the detection of anomalies in this study. Since computation using the robust Mahalanobis distance method adopts a multivariate approach, n equally spaced discretized points that span the curve's interval are needed to represent each functional datum. Some trade-offs were considered in choosing the appropriate value of n. A small n ensures stability in the computation of the algorithm, while a large n better approximates the curve. However, too large an n causes problems in the computation of multivariate statistics due to the singularity of the covariance matrix (Liebl, 2013).

The robust multivariate Mahalanobis distance approach consists of two sub-procedures. The first is the projection pursuit procedure and the second is the computation of the measures of curve outlyingness. The aim of the first procedure is to search for specific linear projections of the discretized curves. Using principal component analysis (PCA), the discretized curves were projected into p-dimensional space. Considering a matrix X of size N rows by n columns, the search for the projected curves follows the eigen equation:

$$V u = \lambda u \qquad (3)$$

where V is the sample variance-covariance matrix, that is $V = N^{-1} X^{T} X$, the term u is an eigenvector of V and $\lambda$ is an eigenvalue of V. The solution can be obtained by finding the first p projection weight vectors (i.e. eigenvectors) that maximize the variance in the data. The projected scores are given by $s_1 = X u_1$, $s_2 = X u_2$, ..., $s_p = X u_p$. Eigenvectors are orthogonal to each other; the first corresponds to the largest variation in the data set, the second to the second largest variation, and so on.

In the second procedure, by considering the first two projections (i.e. principal components), which describe the two major proportions of variation in the data set, the squared robust Mahalanobis distance for each curve, $D^2(x_i)$, was computed. This was achieved using the projected scores obtained from the first procedure, where the new data matrix was defined to be $X = [s_1, s_2]$ of bivariate covariates. The term $D^2(x_i)$ is a measure of the outlyingness of a curve. The larger the value, the more outlying a curve is from the centre of the group. $D^2(x_i)$ for each curve is computed as follows:

$$D^2(x_i) = (x_i - \hat{\mu})^{T} \hat{\Sigma}^{-1} (x_i - \hat{\mu}) \qquad (4)$$

where $x_i$ is a vector of measured points for curve i, $\hat{\mu}$ is the location estimator (i.e. mean vector) and the matrix $\hat{\Sigma}$ is the robust estimate of the covariance matrix of X. Under the assumption that the squared distances follow a chi-squared distribution, the cut-off point to differentiate between anomalous and non-anomalous curves was based on the critical value $\chi^2_{1-\alpha,p}$, that is, a predefined quantile of the distribution with p degrees of freedom (Filzmoser, 2005). The cut-off point was determined by the choice of $\alpha$. Lower values of $\alpha$ gave higher cut-off points, which resulted in lower percentages of detected anomalies. In this study, anomalies were detected when their distance exceeded the critical value $\chi^2_{0.99,p}$, that is, with $\alpha = 0.01$. The larger the squared value of the robust Mahalanobis distance, the more outlying the curve was from the centre of the group.

The robust estimate of the covariance matrix was taken as the covariance of the optimal subsample h. The subsample was considered optimal if it had the minimum determinant of the covariance matrix, an approach more formally known as the minimum covariance determinant (MCD) (Rousseeuw and Van Driessen, 1999). The value of h was taken to be the minimum number of curves which must not be outlying. Using MCD, the outlying points were ignored in the computation process. MCD was defined as follows:

(5)

where L is the matrix of the subsample h that has the minimum determinant of the covariance matrix with [...], and thus [...] is the sample covariance estimate. The discussed method was favored due to the convincing application results for detecting functional outliers as reported by Hyndman and Shang (2010), and also due to the efficiency of the approach in handling large data sets (Rousseeuw and Van Driessen, 1999).

3. Results and Discussion

The main goal of the analysis was to detect anomalies. Anomalies for which all hours of the day lie above the median level have increasingly become an issue of concern due to their potential impact on human health. These anomalies are known as red anomalies (RA). The second goal was to investigate the effectiveness of the employed anomaly detection method, followed by the establishment of the anomaly profiles and finally an examination of the possible existence of a phenomenon that might be indicated by the profile, as well as the influence of wind speed on PM10 anomalies.

3.1. Descriptive statistics of anomalies

A prior analysis was conducted to determine the appropriate number n of discretized points to be used in the multivariate computation for detecting anomalies. Based on the computation of the percentage of RA (Pred), the results in Table 2 show that the choice of n=40 was appropriate for the data at the Klang and Petaling Jaya stations, while n=50 was suitable for the Kuala Selangor station. These values yielded identical results or very

small difference in deviance when n (the number of discretized points) increased. Furthermore, for n ranging from 20 to 70, there was not much difference in the deviance of the overall mean. The anomaly detection for the data set was then conducted using the determined size of the discretized function.

Based on the analysis conducted over the six-year study period, the frequency of anomaly occurrence was found to be 6.80% for Klang, 9.04% for Kuala Selangor and 5.20% for Petaling Jaya monitoring station. Figure 2a shows that the maximum level for all of the detected anomalies was above the maximum level of the median curve. Thus, the results indicate that none of the anomaly curves was totally below the median curve. On the other hand, the percentage of RA was found to be the highest at the suburban site (Kuala Selangor) (3.15%) followed by the industrial site (Petaling Jaya) (2.46%), while the lowest percentage was recorded at the urban site (Klang) (1.14%). The larger percentage of days with abnormal PM10 levels at a quieter area such as Kuala Selangor suggests a stronger influence from non-local sources, particularly transboundary pollution. Both Kuala Selangor and Klang are located downwind and near to the island of Sumatra, while Petaling Jaya is further downwind towards the central part of the Klang Valley region. Even though Abas et al. (2004) stated that transboundary pollution is the major source of pollution during haze incidences, background sources and meteorological factors are also believed to relate to high anomaly occurrences (Lee et al., 2011). Therefore, with respect to the station's background, the results suggest that apart from the transported haze, emissions from heavy traffic along with industrial activity at Klang and Petaling Jaya exacerbate the conditions. Ultimately, this makes them poorer in terms of air quality.

Figure 2. Behavior of the maximum level of anomalies (a) and the RA curves (b), in which each line represents a detected diurnal PM10 anomaly and the solid bold curve at the bottom is the median curve.
Panels in (a) and (b): Klang, Kuala Selangor, Petaling Jaya. Axes: (a) maximum PM10 (µg m⁻³) against anomaly day, with the maximum of the median level marked; (b) PM10 (µg m⁻³) against hour.
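The red anomaly (RA) definition used above, an anomaly whose level lies above the median level at every hour of the day, can be illustrated with a short Python sketch. The pointwise median used here is an assumption (the study's median curve may be defined differently, for example through a functional depth), the percentage is taken over all curves as the reported values suggest, and the three-hour toy curves are hypothetical.

```python
from statistics import median

def median_curve(curves):
    """Pointwise median across daily curves (one value per hour)."""
    return [median(hour_values) for hour_values in zip(*curves)]

def is_red_anomaly(curve, med):
    """True when the curve lies above the median curve at every hour."""
    return all(x > m for x, m in zip(curve, med))

def red_anomalies(curves, anomaly_idx):
    """Return the RA indices and their percentage of all curves."""
    med = median_curve(curves)
    red = [i for i in anomaly_idx if is_red_anomaly(curves[i], med)]
    return red, 100.0 * len(red) / len(curves)

# Hypothetical three-hour curves; indices 3 and 4 play the detected anomalies.
curves = [
    [10, 10, 10],
    [12, 12, 12],
    [11, 11, 11],
    [50, 60, 70],  # above the median at every hour: a red anomaly
    [5, 40, 5],    # above the median at one hour only: not a red anomaly
]
red, percent_red = red_anomalies(curves, anomaly_idx=[3, 4])
print(red, percent_red)
```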




Table 2. Different sizes (n) of discretized points on the curve, with the estimated overall mean, deviance and percentage of red anomalies (Pred)

                       Klang                       Kuala Selangor              Petaling Jaya
Size (n)               Mean     Deviance  Pred     Mean     Deviance  Pred     Mean     Deviance  Pred
20                     67.323                      57.985                      51.462
30                     67.029   0.293     1.32     57.808   0.178     3.33     51.421   0.041     2.65
40                     66.883   0.146     1.14     57.721   0.087     3.24     51.401   0.020     2.46
50                     66.795   0.088     1.14     57.670   0.051     3.15     51.390   0.012     2.51
60                     66.736   0.059     1.14     57.636   0.034     3.15     51.381   0.008     2.46
70                     66.694   0.042     1.14     57.612   0.024     3.15     51.376   0.006     2.46
Selected size (n)      n=40                        n=50                        n=40
Frequency of anomaly   149 (6.80)                  198 (9.04)                  114 (5.20)
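As a quick arithmetic check on Table 2, the deviance column appears to be the change in the overall mean between consecutive sizes n, and the bracketed frequency is the share of anomalous days among the 2192 daily curves. The Python sketch below recomputes both for Klang. The function name is hypothetical, and a difference in the last digit is expected because the table was presumably computed from unrounded means.

```python
def successive_deviance(means):
    """Absolute change in the overall mean between consecutive sizes n."""
    return [round(abs(a - b), 3) for a, b in zip(means, means[1:])]

# Overall means for Klang from Table 2, for n = 20, 30, 40, 50, 60, 70.
klang_means = [67.323, 67.029, 66.883, 66.795, 66.736, 66.694]
klang_deviance = successive_deviance(klang_means)

# Share of anomalous days: 149 detected anomalies out of 2192 daily curves.
klang_percent = round(100.0 * 149 / 2192, 2)

print(klang_deviance, klang_percent)
```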

Figure 2b shows the functional form of the detected RA. The diurnal levels of the RA fluctuate without a stable direction, with the majority of them exhibiting peaks during daytime at Klang and Petaling Jaya stations. Meanwhile, the peak at Kuala Selangor occurred during the night time. It is also shown that a few of the severest RA at Klang and Petaling Jaya shared the same diurnal pattern, with a maximum peak that occurred after midday at around 3:00 to 5:00 pm. Kuala Selangor experienced the most extreme RA, which reached a peak at midnight and a minimum at 12:00 noon.

3.2. Examining the effectiveness of the method used in detecting anomalies

The effectiveness of the anomaly detection method employed in this study was also investigated so that further analysis and conclusions could be drawn. Here, the effectiveness of the method is defined as its ability to detect anomalies whose dates match the periods of the reported haze incidences that have occurred in the country. Not only the date but also the magnitude of the concentration levels is considered, to show that those days are anomalous. According to Tangang et al. (2010), several serious haze episodes have occurred in various parts of the Malaysian region, including those that occurred in 1982-83, 1987, 1991, 2002, 2004, 2005, 2006 and 2009. Due to the limited information available on the reported incidences, only a group of the top anomaly curves, which lie in the upper percentile of the distribution, was sampled and fed into the analysis. These detected anomalies were believed to be the consequence of abnormal events. As reported in Afroz et al. (2003), high PM10 levels are often associated with haze incidences in Malaysia. Thus, we inferred that the most significant anomalous behavior occurred as a result of the severe haze incidents that were reported. For the data used in the analysis, some dates and periods of reported incidences were taken from the Malaysian Environmental Quality Report for the years 2005, 2006 and 2009 (DOE, 2006; DOE, 2007; DOE, 2010). The information was recorded and summarized in the first three columns of Table 3. The dates of occurrence of the top 20 anomalies detected from the data (2005-2010) at all three air quality monitoring stations are reported in the remaining columns of Table 3, including the maximum (point value) of the concentration level.

Remarkably, the results in Table 3 show that the detected times of anomaly occurrence match the recorded times and periods of the haze incidences that had been reported. The severity ranking (number in brackets) of the detected anomaly curves was determined using the computed Mahalanobis distance value D(xi), while the maximum level achieved indicated that those detected days were anomalies, since all of them contained high PM10 concentration levels whose maximum was far above the maximum of the median curve (i.e. Klang = 73 µg m⁻³, Kuala Selangor = 55 µg m⁻³ and Petaling Jaya = 49 µg m⁻³). The ranking result shows that 10th August 2005 (indicated by number 1 in brackets) was the most severely polluted day at the Klang and Kuala Selangor stations, while it was 11th August 2005 at the Petaling Jaya station. Based on these findings, the results support the effectiveness of the anomaly detection method used in this study. These results revealed that the top 20 anomalous levels occurred mostly during the SW monsoon season, with the majority of the severe incidences occurring in the year 2005. The drier climate during the SW monsoon and more serious forest fires in Sumatra are believed to be the main reasons for this. In accordance with the study conducted by Fuller and Murphy (2006), forest clearing fires are strongly linked to the monsoonal system, where the dry season is the favored season for burning activity.

3.3. The influence of wind variable on the severity of anomaly curves

Variations in air pollutant concentrations are strongly related to variations in meteorological conditions (Chang and Lee, 2008). Juneng et al. (2011), using a regression model, showed that meteorological factors, including temperature, humidity and wind speed, are significant in modulating the variation of PM10 over the Klang Valley region during the southwest (SW) monsoon. The relationship between local wind speed and average PM10 level was found to be negative at all air monitoring stations, namely Klang, Kuala Selangor and Petaling Jaya. However, since the model used focused on the average PM10 data, it does not explain the extreme value observations.

In this study, in order to examine the possible contribution of meteorological factors, particularly focusing on wind and the diurnal fluctuations of the extreme anomalies, two graphs were plotted (as depicted in Figure 3): a sample of the 10 most extreme PM10 curves and the graph of the corresponding wind speed curves. The results indicated that wind speed positively influenced the extreme PM10 anomalies at Klang and Petaling Jaya. On the contrary, the relationship was negative at Kuala Selangor. At the 5% significance level, Spearman correlation coefficient analysis provided evidence of a positive relationship between wind speed and extreme PM10 anomalies at Klang, with a coefficient value r=0.39 and a corresponding p-value=0.03. A positive relationship (r=0.66, p-value=0.00) between wind speed and extreme PM10 anomalies was also observed for Petaling Jaya. On the other hand, a negative correlation was observed between wind speed and extreme PM10 anomalies at Kuala Selangor (r=-0.74, p-value=1.00).

3.4. The profile of anomaly occurrences with respect to an annual, monthly and day-of-the-week basis

All of the anomalies detected in the data set have also been extracted and investigated to identify and study the patterns of abnormal behavior of daily PM10. Hence, several profiles of anomaly occurrences were used to describe and summarize the changes, both relating to time (temporal: on an annual, monthly

and day-of-the-week basis) and stations (spatial); see Figure 4. Noticeably, the trend of the frequency of anomaly occurrences at all three stations indicated that the years 2005, 2006 and 2009 were the most affected and that 2010 was the least affected by abnormal PM10 levels (Figure 4a). In terms of individual stations, Klang and Petaling Jaya were observed to have experienced the most frequent anomaly occurrences in the year 2005, while Kuala Selangor encountered them in the year 2006. In general, Kuala Selangor could be said to be the station most prone to being affected by anomalous PM10 behavior, followed by Klang, with Petaling Jaya the least prone. The reason could be the influence of warmer and drier weather during the El Nino period. The increase in forest fire activity in the Southeast Asian Maritime Continent during the dry season (Reid et al., 2012) consequently exacerbated the PM10 pollution to degrading levels.

The highest frequency of occurrence was also observed in the drier weather period, namely the SW monsoon. In the case of Klang, this was in August, whilst for Kuala Selangor and Petaling Jaya, this occurred in July, as shown in Figure 4b. A secondary peak of occurrence took place during the NE monsoon in Klang and Kuala Selangor, while Petaling Jaya station exhibited two secondary peaks: one in the month of February and the other in the month of April during the inter-monsoon period. The graph also shows that almost zero anomalies were obtained in November. This could be attributed to the washout effect from the higher precipitation levels during this time (early in the NE monsoon). The results clearly indicate the "monsoonal effect" on the frequency of anomaly occurrences.

On a weekly scale, Figure 4c shows that the frequencies of anomalies fluctuate with an increasing pattern from Monday to Friday at Kuala Selangor and Petaling Jaya stations. The frequencies, however, drop on Saturday and Sunday. On the other hand, only a slight increase in frequency between Friday and Saturday was observed at Klang station; this was then followed by a drop in the value on Sunday. The main road to Kuala Lumpur (the capital city of Malaysia), which links many districts, such as Kuala Langat, Banting and Kuala Selangor, is located in Klang. Since Saturday is a school holiday, the high number of anomalies could be due to short vacations or mini-trip activity. It is possible that the background of the station leads to the depletion rate of PM10 at Klang station being lower than that observed at the other stations. Based on these results, the increase in frequency during weekdays (Monday to Friday) and the decrease during the weekend (Saturday and Sunday) may indicate the existence of the "weekend effect" phenomenon.
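The annual, monthly and day-of-the-week profiles discussed in this section amount to counting the detected anomaly dates along three calendar groupings. A minimal Python sketch of that tabulation is given below. The function names are hypothetical; 10 and 11 August 2005 are the most severe days named in the text, while the third date is invented purely to populate a weekend entry.

```python
from collections import Counter
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def occurrence_profiles(anomaly_dates):
    """Count anomalies per year, per month and per day of the week."""
    by_year = Counter(d.year for d in anomaly_dates)
    by_month = Counter(MONTHS[d.month - 1] for d in anomaly_dates)
    by_day = Counter(DAYS[d.weekday()] for d in anomaly_dates)
    return by_year, by_month, by_day

def weekend_share(by_day):
    """Fraction of anomalies that fall on Saturday or Sunday."""
    total = sum(by_day.values())
    return (by_day["Sat"] + by_day["Sun"]) / total

# Two dates from the text plus one hypothetical date (15 July 2006).
dates = [date(2005, 8, 10), date(2005, 8, 11), date(2006, 7, 15)]
by_year, by_month, by_day = occurrence_profiles(dates)
print(by_year, by_month, by_day, weekend_share(by_day))
```

Comparing the weekend share with 2/7, the share expected if anomalies were spread evenly over the week, gives a first informal indication of a weekend effect before any formal test.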


Figure 3. Behavior of the extreme PM10 anomalies (i) and wind speed (ii) at Klang (a), Kuala Selangor (b) and Petaling Jaya (c).
Axes: (i) PM10 (µg m⁻³) against hour; (ii) wind speed (km h⁻¹) against hour.
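The Spearman correlation used to relate wind speed to the extreme PM10 anomalies is the Pearson correlation of the ranks of the two series, with tied values sharing their average rank. The Python sketch below shows that calculation. How the study paired the wind speed and PM10 values is not stated, so the pairing and the five toy observations here are assumptions for illustration only.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical paired observations: wind speed (km/h) and PM10 (ug/m3).
wind = [4.0, 6.5, 9.0, 12.5, 15.0]
pm10 = [210.0, 260.0, 330.0, 410.0, 330.0]
r = spearman(wind, pm10)
print(round(r, 4))
```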



a Red anomalies (RA) of functional data




Additionally, the monthly and day-of-week profiles of the anomaly occurrences indicated the potential existence of the "monsoonal effect" and the "weekend effect" phenomena at the study locations. Thus, using the mean distribution of PM10 concentration levels, the hypotheses of significant differences in the mean PM10 levels between the SW and NE monsoons and between weekdays and weekends were tested. In this study, the PM10 "weekend effect" phenomenon was defined as the difference in the PM10 level between weekdays (Monday to Friday) and the weekend (Saturday and Sunday). During the weekend, the emissions of anthropogenic precursors are believed to decrease from weekday values because major sources of precursors, such as motor vehicles and power plants, may be less active on weekends.

On the whole, t-test analysis of the mean level (p<0.05) provided evidence that the average diurnal PM10 level during the SW monsoon was significantly higher than during the NE monsoon. The p-value obtained for the test of hypothesis on the "weekend effect" also produced the same result. It was thus established that the phenomenon exists at all considered stations. The t-test results only represent the difference over all hours combined. Specifically, on an hourly scale, as shown by the functional mean of the PM10 concentration level in Figure 5a, significantly higher levels were observed during the SW monsoon at all hours of the day except the hours between 10:00 am and 12:00 noon at the Klang station. For the Kuala Selangor and Petaling Jaya stations, the levels were always lower during the NE monsoon as compared to the SW monsoon at all hours of the day.

On a temporal basis, the functional descriptive statistical mean of the diurnal level for weekdays and weekends (see Figure 5b) shows that the dominant difference in level occurred from after dawn until midnight (i.e. during the time of anthropogenic activity), when the level was always higher on weekdays than on weekends at both Klang and Petaling Jaya stations. On the other hand, the same pattern was observed to occur across all hours of the day at Kuala Selangor station. Of the three stations, the "weekend effect" phenomenon in Petaling Jaya was far more significant. It is believed that this was due to the more active emission sources on weekdays as compared to weekends.
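The comparison of mean PM10 levels between two groups of days (SW against NE monsoon, or weekdays against weekends) can be sketched as a two-sample t-test. The text does not say which form of the test was used, so the Python example below assumes Welch's unequal-variance version and, because the standard library has no t distribution, approximates the one-sided p-value with the normal distribution, which is only reasonable for large samples. The group values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch t statistic and approximate degrees of freedom for mean(a) - mean(b)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def approx_one_sided_p(t):
    """Large-sample normal approximation to P(T > t); not an exact t-test p-value."""
    return 1.0 - NormalDist().cdf(t)

# Hypothetical daily mean PM10 levels (ug/m3) for two seasons.
sw_monsoon = [60.0, 62.0, 64.0, 66.0, 68.0]
ne_monsoon = [50.0, 52.0, 54.0, 56.0, 58.0]
t_stat, dof = welch_t(sw_monsoon, ne_monsoon)
p_approx = approx_one_sided_p(t_stat)
print(t_stat, dof, p_approx < 0.05)
```

With the small toy samples above the normal approximation understates the p-value; with several hundred days per season, as in the study period, the difference between the two distributions becomes negligible.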

Figure 4. Annual (a), monthly (b) and day-of-week (c) profile of anomaly occurrences.
Panel (b), "Monthly Profile": frequency (0 to 50) by month, January to December, for Klang, Kuala Selangor and Petaling Jaya.



Figure 5. Functional mean of PM10 diurnal level between the SW and NE monsoon (a) and between the weekdays and weekend (b).
Vertical axis in both panels: PM10 (µg m⁻³).

4. Summary and Conclusion

A combination of the robust projection pursuit and the Mahalanobis distance method used by Hyndman and Shang (2010) was employed to identify the daily PM10 anomalies in the form of curves or functional data at three selected air quality monitoring stations in the Klang Valley region of Peninsular Malaysia. This study shows that anomaly detection is a useful statistical technique for studying and investigating abnormalities in the daily PM10 process system. Using functional data analysis, the whole structure of the daily diurnal patterns of anomalies could be visualized. It is also shown that functional data for extreme anomalies and wind speed offer a way to investigate the relationship between two sets of extreme data. The approach could overcome the problem faced by Juneng et al. (2011), which was due to the limitations of the regression method used.

The detected anomalies show interesting annual, monthly and day-of-the-week patterns in their frequency of occurrence. Years with El Nino events, such as 2005, 2006 and 2009, had the highest frequency of occurrences. The dry season characterized by the SW monsoon was the dominant period of anomalies, with July and August being the months in which anomalies occurred most frequently. Transboundary sources were identified as being a major influence. Another interesting peak was in the month of February during the NE monsoon season, where the causes were attributed to local sources. The higher frequency of anomalies on weekdays compared to weekends indicated the impact of active sources of PM10 such as motor vehicles.

The study has also provided evidence to demonstrate the existence of the monsoonal effect and weekend effect phenomena at the study locations. Of the three stations, Kuala Selangor was found to experience the most significant monsoonal effects while Petaling Jaya experienced the most significant weekend effect. Wind speed was shown to positively influence the extreme anomalies at the Klang and Petaling Jaya stations.

Based on the study findings, the stations' location and background and wind speed, along with seasonal (monsoon) and weekday-weekend variation, play an important role in influencing PM10 anomalies. In addition, the profile of anomalies could be utilized as a guideline for analyzing the effectiveness of current air quality control regulations or even for the planning of new mitigation policies. Given the appropriateness of the application, we suggest the incorporation of anomaly detection as an important step in data quality control systems as well as in efforts aimed at air pollution monitoring.
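The distance-based flagging step summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of the study or of Hyndman and Shang (2010): each daily curve is assumed to have already been reduced to two scores (for example by a robust projection pursuit step done elsewhere), the robust centre and scatter come from a crude trimming iteration that stands in for a proper minimum covariance determinant estimator and applies no consistency correction, and the 0.99 chi-square cutoff with 2 degrees of freedom is an assumed threshold. The names `cov2`, `sq_dist` and `flag_anomalies` and the toy scores are hypothetical.

```python
from math import log
from statistics import mean, median


def cov2(points):
    """Mean vector and 2x2 sample covariance (sxx, sxy, syy) of (x, y) pairs."""
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    n = len(points) - 1
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def sq_dist(p, centre, cov):
    """Squared Mahalanobis distance of point p for a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def flag_anomalies(scores, keep=0.75, level=0.99, steps=5):
    """Return indices whose robust squared distance exceeds the chi-square cutoff."""
    # Robust start: coordinate-wise medians and squared median absolute deviations.
    cx = median(p[0] for p in scores)
    cy = median(p[1] for p in scores)
    sx = median(abs(p[0] - cx) for p in scores)
    sy = median(abs(p[1] - cy) for p in scores)
    centre, cov = (cx, cy), (sx ** 2, 0.0, sy ** 2)
    h = max(3, int(keep * len(scores)))
    for _ in range(steps):
        # Re-estimate centre and scatter from the h points closest to the current fit.
        ranked = sorted(scores, key=lambda p: sq_dist(p, centre, cov))
        centre, cov = cov2(ranked[:h])
    # Closed-form chi-square quantile for 2 degrees of freedom.
    cutoff = -2.0 * log(1.0 - level)
    return [i for i, p in enumerate(scores) if sq_dist(p, centre, cov) > cutoff]


# Toy scores: eight ordinary days and one clearly unusual day (index 8).
scores = [(0, 0), (1, 0.5), (-1, -0.4), (0.5, -1), (-0.5, 1),
          (1.2, -0.8), (-1.1, 0.9), (0.3, 0.2), (8, 7)]
flagged = flag_anomalies(scores)
print(flagged)
```

Because the trimmed covariance underestimates the true scatter, this sketch can also flag ordinary points in small samples; a production analysis would use an established robust estimator with the appropriate correction factors.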

Acknowledgments

The authors would like to thank the Department of Environment Malaysia for providing the information and data. The work is supported by the UKM's Research University Grant [UKM AP2011_19]. The authors are also grateful to the three anonymous reviewers for their helpful comments.

References

Abas, M.R., Oros, D.R., Simoneit, B.R.T., 2004. Biomass burning as the main source of organic aerosol particulate matter in Malaysia during haze episodes. Chemosphere 55, 1089-1095.

Acero, J.A., Simon, A., Padro, A., Coloma, O.S., 2012. Impact of local urban design and traffic restrictions on air quality in a medium-sized town. Environmental Technology 33, 2467-2477.

Acuna, E., Rodriguez, C., 2004. The treatment of missing values and its effect in the classifier accuracy, in Classification, Clustering and Data Mining Applications, edited by Banks, D., House, L., McMorris, F.R., Arabie, P., Gaul, W., Springer-Verlag Berlin Heidelberg, New York, pp. 639-648.

Afroz, R., Hassan, M.N., Ibrahim, N.A., 2003. Review of air pollution and health impacts in Malaysia. Environmental Research 92, 71-77.

Ahamad, F., Latif, M.T., Tang, R., Juneng, L., Dominick, D., Juahir, H., 2014. Variation of surface ozone exceedance around Klang Valley, Malaysia. Atmospheric Research 139, 116-127.

Awang, M.B., Jaafar, A.B., Abdullah, A.M., Ismail, M.B., Hassan, M.N., Abdullah, R., Johan, S., Noor, H., 2000. Air quality in Malaysia: Impacts, management issues and future challenges. Respirology 5, 183-196.

Azmi, S.Z., Latif, M.T., Ismail, A.S., Juneng, L., Jemain, A.A., 2010. Trend and status of air quality at three different monitoring stations in the Klang Valley, Malaysia. Air Quality Atmosphere and Health 3, 53-64.

Chang, S.C., Lee, C.T., 2008. Evaluation of the temporal variations of air quality in Taipei City, Taiwan, from 1994 to 2003. Journal of Environmental Management 86, 627-635.

Davis, J.J., Clark, A.J., 2011. Data preprocessing for anomaly-based network intrusion detection: A review. Computers & Security 30, 353-375.

DOE (Department of Environment Malaysia), 2010. Malaysia Environ Quality Report 2009. Ministry of Science, Technology and Environment, Kuala Lumpur.

DOE (Department of Environment Malaysia), 2007. Malaysia Environ Quality Report 2006. Ministry of Science, Technology and Environment, Kuala Lumpur.

DOE (Department of Environment Malaysia), 2006. Malaysia Environ Quality Report 2005. Ministry of Science, Technology and Environment, Kuala Lumpur.

Dominick, D., Juahir, H., Latif, M.T., Zain, S.M., Aris, A.Z., 2012. Spatial assessment of air quality patterns in Malaysia using multivariate analysis. Atmospheric Environment 60, 172-181.

Field, R.D., van der Werf, G.R., Shen, S.S.P., 2009. Human amplification of drought-induced biomass burning in Indonesia since 1960. Nature Geoscience 2, 185-188.

Filzmoser, P., 2005. Identification of multivariate outliers: A performance study. Australian Journal of Statistics 34, 127-138.

Fuller, D.O., Murphy, K., 2006. The ENSO-Fire dynamic in insular Southeast Asia. Climatic Change 74, 435-455.

Garces, H., Sbarbaro, D., 2011. Outliers detection in environmental monitoring databases. Engineering Applications of Artificial Intelligence 24, 341-349.

Garcia-Teodoro, P., Diaz-Verdejo, J., Macia-Fernandez, G., Vazquez, E., 2009. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security 28, 18-28.

Hauskrecht, M., Batal, I., Valko, M., Visweswaran, S., Cooper, G.F., Clermont, G., 2013. Outlier detection for patient monitoring and alerting. Journal of Biomedical Informatics 46, 47-55.

Hawkins, S.J., Gibbs, P.E., Pope, N.D., Burt, G.R., Chesman, B.S., Bray, S., Proud, S.V., Spence, S.K., Southward, A.J., Southward, G.A., Langston, W.J., 2002. Recovery of polluted ecosystems: The case for long-term studies. Marine Environmental Research 54, 215-222.

Huang, J.H.Z., Shen, H.P., 2004. Functional coefficient regression models for nonlinear time series: A polynomial spline approach. Scandinavian Journal of Statistics 31, 515-534.

Hyndman, R.J., Shang, H.L., 2010. Rainbow plots, bagplots and boxplots for functional data. Journal of Computational and Graphical Statistics 19, 29-45.

Juneng, L., Latif, M.T., Tangang, F., 2011. Factors influencing the variations of PM10 aerosol dust in Klang Valley, Malaysia during the summer. Atmospheric Environment 45, 4370-4378.

Keuken, M.P., Moerman, M., Voogt, M., Blom, M., Weijers, E.P., Rockmann, T., Dusek, U., 2013. Source contributions to PM2.5 and PM10 at an urban background and a street location. Atmospheric Environment 71, 26-35.

Lee, S., Ho, C.H., Choi, Y.S., 2011. High-PM10 concentration episodes in Seoul, Korea: Background sources and related meteorological conditions. Atmospheric Environment 45, 7240-7247.

Liebl, D., 2013. Modeling and forecasting electricity spot prices: A functional data perspective. Annals of Applied Statistics 7, 1562-1592.

Lin, S., Brown, D.E., 2006. An outlier-based data association method for linking criminal incidents. Decision Support Systems 41, 604-615.

Mahiyuddin, W.R.W., Sahani, M., Aripin, R., Latif, M.T., Thach, T.Q., Wong, C.M., 2013. Short-term effects of daily air pollution on mortality. Atmospheric Environment 65, 69-79.

Mahmud, M., 2009. Simulation of equatorial wind field patterns with TAPM during the 1997 haze episode in Peninsular Malaysia. Singapore Journal of Tropical Geography 30, 312-326.

Matsueda, H., Inoue, H.Y., Ishii, M., Tsutsumi, Y., 1999. Large injection of carbon monoxide into the upper troposphere due to intense biomass burning in 1997. Journal of Geophysical Research-Atmospheres 104, 26867-26879.

Muniz, C.D., Nieto, P.J.G., Fernandez, J.R.A., Torres, J.M., Taboada, J., 2012. Detection of outliers in water quality monitoring samples using functional data analysis in San Esteban Estuary (Northern Spain). Science of the Total Environment 439, 54-61.

R Development Core Team, 2008. R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing. http://www.R-project.org, accessed in January 2013.

Ramsay, J.O., Silverman, B.W., 2006. Functional Data Analysis, Springer, New York.

Ramsay, J.O., Silverman, B.W., 2002. Applied Functional Data Analysis: Methods and Case Studies, Springer, New York.

Ramsay, J.O., Wickham, H., Graves, S., Hooker, G., 2013. http://www.functionaldata.org, accessed in June 2013.

Reid, J.S., Xian, P., Hyer, E.J., Flatau, M.K., Ramirez, E.M., Turk, F.J., Sampson, C.R., Zhang, C., Fukada, E.M., Maloney, E.D., 2012. Multi-scale meteorological conceptual analysis of observed active fire hotspot activity and smoke optical depth in the Maritime Continent. Atmospheric Chemistry and Physics 12, 2117-2147.

Rousseeuw, P.J., Van Driessen, K., 1999. A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212-223.

Schauer, J.J., Rogge, W.F., Hildemann, L.M., Mazurek, M.A., Cass, G.R., Simoneit, B.R.T., 1996. Source apportionment of airborne particulate matter using organic compounds as tracers. Atmospheric Environment 30, 3837-3855.

Shaadan, N., Deni, S.M., Jemain, A.A., 2012. Assessing and comparing PM10 pollutant behaviour using functional data approach. Sains Malaysiana 41, 1335-1344.

Shang, H.L., Hyndman, R.J., 2013. http://sites.google.com/site/hanlinshangswebsite/, accessed in July 2013.

Sharma, A., Panigrahi, P.K., 2012. A Review of financial accounting fraud detection based on data mining techniques. International Journal of Computer Application 39, 37-47.

Shon, Z.H., Kim, K.H., Song, S.K., Jung, K., Kim, N.J., Lee, J.B., 2012. Relationship between water-soluble ions in PM2.5 and their precursor gases in Seoul megacity. Atmospheric Environment 59, 540-550.

Tangang, F.T., Latif, M.T., Juneng, L., 2010. Climate change: Is Southeast Asia up to the Challenge?: The roles of Climate Variability and Climate Change on Smoke Haze Occurrences in Southeast Asia region, IDEAS Reports - Special Reports, edited by Kitchen, N., LSE IDEAS, London School of Economics and Political Science, London, UK.

Torres, J.M., Nieto, P.J.G., Alejano, L., Reyes, A.N., 2011. Detection of outliers in gas emissions from urban areas using functional data analysis. Journal of Hazardous Materials 186, 144-149.

Wrobel, A., Rokita, E., Maenhaut, W., 2000. Transport of traffic-related aerosols in urban areas. Science of the Total Environment 257, 199-211.



