Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting

Locally weighted regression, or loess, is a way of estimating a regression surface through a multivariate smoothing procedure, fitting a function of the independent variables locally and in a moving fashion analogous to how a moving average is computed for a time series. With local fitting we can estimate a much wider class of regression surfaces than with the usual classes of parametric functions, such as polynomials. The goal of this article is to show, through applications, how loess can be used for three purposes: data exploration, diagnostic checking of parametric models, and providing a nonparametric regression surface. Along the way, the following methodology is introduced: (a) a multivariate smoothing procedure that is an extension of univariate locally weighted regression; (b) statistical procedures that are analogous to those used in the least-squares fitting of parametric functions; (c) several graphical methods that are useful tools for understanding loess estimates and checking the assumptions on which the estimation procedure is based; and (d) the M plot, an adaptation of Mallows's $C_p$ procedure, which provides a graphical portrayal of the trade-off between variance and bias, and which can be used to choose the amount of smoothing.

1. INTRODUCTION

Locally weighted regression, or loess, is a procedure for fitting a regression surface to data through multivariate smoothing: The dependent variable is smoothed as a function of the independent variables in a moving fashion analogous to how a moving average is computed for a time series. The basic framework is this. Let $y_i$ ($i = 1, \ldots, n$) be measurements of the dependent variable, and let $x_i = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, n$, be $n$ measurements of $p$ independent variables. Suppose that the data are generated by $y_i = g(x_i) + \varepsilon_i$. As in the most commonly used framework for regression, we suppose that the $\varepsilon_i$ are independent normal variables with mean 0 and variance $\sigma^2$. In the usual framework, we would also suppose that $g$ is a member of a parametric class of functions, such as polynomials, but here we will suppose only that $g$ is a smooth function of the independent variables. With local fitting we can estimate a wide class of smooth functions, much wider, in fact, than what we could reasonably expect from any specific parametric class of functions.

Smoothing by local fitting is actually an old idea that is deeply buried in the methodology of time series, where data measured at equally spaced points in time were smoothed by local fitting of polynomials (Macaulay 1931). Local-fitting methods were later introduced into the more general case of regression analysis. Hastie and Tibshirani (1986) took local fitting one step further; in any situation where a dependent variable depends on independent variables, we can carry out a local likelihood procedure. Cleveland (1979) introduced the specific local-fitting methodology that is the subject of this article, locally weighted regression, and Devlin (1986) expanded the methodology and addressed mathematical properties; in this article we further expand the methodology. The original methodology also included a robust version in which M estimation is incorporated so that the assumption of normality can be relaxed, but we do not address robustness here.

The applications in this article illustrate three major uses of the local-fitting methodology. The first is simply to provide an exploratory graphical tool; graphing smooth surfaces that are fitted to the data can give us insight into the behavior of the data and help us choose parametric models. The second is to provide additional regression diagnostics to check the adequacy of parametric models fitted to the data. The third is to use the loess estimate as the estimated regression surface, without resorting to a parametric class of functions. While presenting these three uses we introduce new methods and review and apply some old ones.

In Section 2 we introduce the multivariate smoother: It is a straightforward extension of the univariate loess smoother discussed by Cleveland (1979). Section 3 has an application to velocity measurements of galaxy NGC 7531. Locally weighted regression is used to fit a velocity surface as a function of position on the celestial sphere. In Section 4 we discuss the statistical properties of loess. Fortunately, analogs of the statistical procedures used in parametric function fitting, for example analysis of variance (ANOVA) and $t$ intervals, involve statistics whose distributions are well approximated by familiar distributions. Section 5 has an application to measurements of ozone concentration and three meteorological variables. Locally weighted regression is used to provide a regression surface and to carry out prediction. In Section 6 we introduce the M plot, using Mallows's $C_p$ idea (Mallows 1966, 1973) with appropriate modifications for the new context, and graphing an estimate of mean squared error against degrees of freedom of the fit. The principal use of the M plot is to choose the amount of smoothing, that is, the neighborhood size of the multivariate smoother. Section 7 has an application to data from an industrial experiment measuring the abrasion loss of rubber specimens. A locally weighted regression analysis suggests that there is no interaction between the two independent variables, so the regression surface is estimated by additive fitting (Hastie and Tibshirani 1986). Section 8 has an application to measurements of $\mathrm{NO}_x$ in engine exhaust. The history of these data includes an estimation of the regression surface by alternating conditional expectations (ACE) (Breiman and Friedman 1985), a procedure that transforms the dependent variable and fits an additive surface to the data. An analysis by locally weighted regression shows that the regression surface of these data is such that no nontrivial transformation of the data could lead to additivity. Section 9 describes simulations that investigate the distributional approximations of Section 4. Section 10 discusses qualifications to the methodology and discusses other methods.

We also introduce graphical methodology in addition to the M plot. Because it is easier to discuss these methods with graphs at hand, however, we introduce this methodology in the applications sections. Sections 5 and 7 set forth conditioning plots, Section 7 presents component-residual plots, and Section 5 discusses diagnostic plots for checking the assumptions made about $\varepsilon_i$.

The shortened name loess has some semantic substance. A loess (pronounced "lo is") is a deposit of fine clay or silt along river valleys; in a vertical cross-section of earth, a loess would appear as a narrow, curve-like stratum running through the section.

* William S. Cleveland is in Statistics Research, AT&T Bell Laboratories, Murray Hill, NJ 07974. Susan J. Devlin is in Measurements Research, Bell Communications Research, Piscataway, NJ 08854. This work benefited greatly from discussions with Trevor Hastie, who shared his substantial experience with the backfitting algorithm. The authors are grateful to John Chambers, Trevor Hastie, Jon Kettenring, and Colin Mallows for helpful suggestions about the methods. They also thank two referees whose comments led to a substantial improvement of the exposition.

© 1988 American Statistical Association. Journal of the American Statistical Association, September 1988, Applications & Case Studies.

2. THE MULTIVARIATE SMOOTHER

Locally weighted regression provides an estimate $\hat{g}(x)$ of the regression surface at any value $x$ in the $p$-dimensional space of the independent variables. Let $q$ be an integer, where $1 \le q \le n$. The estimate of $g$ at $x$ uses the $q$ observations whose $x_i$ values are closest to $x$. That is, we define a neighborhood in the space of the independent variables. Each point in the neighborhood is weighted according to its distance from $x$; points close to $x$ have large weight, and points far from $x$ have small weight. A linear or a quadratic function of the independent variables is fitted to the dependent variable using weighted least squares with these weights; $\hat{g}(x)$ is taken to be the value of this fitted function at $x$. Of course, we must do this computation for each value of $x$ for which we want $\hat{g}(x)$, and thus loess is a computer-intensive method, but algorithms exist for doing the computations efficiently (Cleveland, Devlin, and Grosse 1988).

To carry out locally weighted regression we must have a distance function $\rho$ in the space of the independent variables. For one independent variable we let $\rho$ be Euclidean distance. For the multiple-regression case it is sensible to take $\rho$ to be Euclidean distance in applications where the independent variables are measurements of position in physical space; for example, the independent variables might be geographical location and the dependent variable temperature. If the independent variables are measured on different scales, then it is typically sensible to divide each variable by an estimate of scale before applying a standard distance function. For the applications of Sections 5 and 7, we divide each independent variable by its standard deviation and then use Euclidean distance. (In applications where one or more of the univariate sample distributions of the independent variables has outliers, it is sensible to standardize with a resistant measure of scale such as the interquartile range.) For the application of Section 3 we use Euclidean distance without adjusting the scale.

Locally weighted regression also requires a weight function and a specification of neighborhood size. The weight function used in all of our examples is the tricube function: $W(u) = (1 - u^3)^3$ for $0 \le u < 1$, and 0 otherwise. We now show how the weight function is used. Let $d(x)$ be the distance of the $q$th-nearest $x_i$ to $x$. Then the weight for the observation $(y_i, x_i)$ is

$$w_i(x) = W(\rho(x, x_i)/d(x)).$$

Thus $w_i(x)$ as a function of $i$ is a maximum for $x_i$ close to $x$, decreases as the $x_i$ increase in distance from $x$, and becomes 0 for the $q$th-nearest $x_i$ to $x$. Instead of thinking in terms of $q$, the number of points in the neighborhood, we think in terms of $f = q/n$, the fraction of points in the neighborhood. As $f$ increases, $\hat{g}(x)$ becomes smoother. The M plot, which is discussed in Section 6, is an aid to choosing $f$ in applications.

If locally linear fitting is used, the fitting variables are just the independent variables. If locally quadratic fitting is used, the fitting variables are the independent variables, their squares, and their cross-products. Locally quadratic fitting tends to perform better in situations where the regression surface has substantial curvature, such as local maxima and minima (e.g., see the application in Sec. 3).

3. NGC 7531 VELOCITY DATA: AN APPLICATION ILLUSTRATING THE BEHAVIOR OF THE MULTIVARIATE SMOOTHER

NGC 7531 is a spiral galaxy in the Southern Hemisphere with a very bright inner ring. Buta (1987) made measurements of the velocities of this galaxy at a collection of points in the celestial sphere that covered about 200 arc seconds in the north-south direction and about 135 arc seconds in the east-west direction. The measurements were derived from nine spectrograms taken at Cerro Tololo Inter-American Observatory in July and October 1981. Each spectrogram was made along a narrow slit, and the velocity measurements were made at points along the slit by observing the red shift. The locations of these velocity measurements are shown in Figure 1. As can be seen from the figure, there are seven unique positions of the nine slits, since two positions were used twice; the seven unique slit lines intersect at a point in the middle of the observation region. The maximum velocity measurement is 1,785 kilometers per second (km/sec) and the minimum is 1,409 km/sec. The data are scattered because of measurement noise and do not form a smooth velocity field.

The velocity surface was estimated by locally quadratic fitting with $f = .4$. Figure 2 is a contour plot. The fitted surface does a good job of following the underlying pattern in the data.
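The smoother of Section 2, used for the fit just described, can be sketched in a few lines. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation (Section 2 cites efficient algorithms): the names `tricube`, `fitting_variables`, `solve`, and `loess_at` are illustrative, the neighborhood size is taken as $q = \mathrm{round}(fn)$, the independent variables are assumed to be already scaled, and $d(x)$ is assumed to be positive.

```python
import math
from itertools import combinations_with_replacement

def tricube(u):
    """Tricube weight W(u) = (1 - u^3)^3 on [0, 1), and 0 elsewhere."""
    return (1.0 - u ** 3) ** 3 if 0.0 <= u < 1.0 else 0.0

def fitting_variables(z, degree):
    """Constant and linear terms, plus squares and cross-products if degree is 2."""
    terms = [1.0] + list(z)
    if degree == 2:
        pairs = combinations_with_replacement(range(len(z)), 2)
        terms += [z[j] * z[k] for j, k in pairs]
    return terms

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, m):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, m + 1):
                aug[r][k] -= factor * aug[c][k]
    sol = [0.0] * m
    for r in reversed(range(m)):
        s = aug[r][m] - sum(aug[r][k] * sol[k] for k in range(r + 1, m))
        sol[r] = s / aug[r][r]
    return sol

def loess_at(x, xs, ys, f=0.4, degree=2):
    """Loess estimate of g at the point x from observations (xs[i], ys[i])."""
    n = len(xs)
    q = max(1, min(n, round(f * n)))         # neighborhood size, f = q / n
    dist = [math.dist(x, xi) for xi in xs]   # Euclidean rho(x, x_i)
    d = sorted(dist)[q - 1]                  # distance to the q-th nearest x_i
    w = [tricube(r / d) for r in dist]       # w_i(x) = W(rho(x, x_i) / d(x))
    # Weighted least squares on fitting variables centered at x, so that the
    # fitted value at x is the intercept of the local polynomial.
    rows = [fitting_variables([a - b for a, b in zip(xi, x)], degree) for xi in xs]
    m = len(rows[0])
    ata = [[sum(wi * r[j] * r[k] for wi, r in zip(w, rows)) for k in range(m)]
           for j in range(m)]
    aty = [sum(wi * r[j] * yi for wi, r, yi in zip(w, rows, ys)) for j in range(m)]
    return solve(ata, aty)[0]
```

Because each local fit is an exact weighted least-squares fit, `loess_at(..., degree=2)` reproduces any quadratic surface exactly, which makes a convenient check of an implementation.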
For example, the surface follows the peaks and troughs in the data: The maximum value of the estimates at the positions where the measurements were made is 1,757 km/sec, and the minimum value is 1,440 km/sec. If locally linear fitting is used, the fit is poorer and cannot track the substantial curvature unless $f$ is taken to be very small, about .1, in which case the estimated surface is very noisy.

The velocity pattern revealed by the contours is interesting. There appears to be an axis of symmetry of about 108° (the axis is shown by the dotted line in Fig. 2). As we move from north to south along this axis, the velocity increases by about 320 km/sec. Suppose that the only motions of the galaxy (relative to the earth) were a rotation about an axis through its center and a recession due to the expansion of the universe. Then the velocity surface would be linear, the contours would be straight lines parallel to the projection of the axis of rotation on the viewing plane from the earth, and the velocity along this projection would be equal to the recession velocity. Figure 2 does not follow such a pattern. The velocity is not linear along the 108° axis: As we move from the center outward along the axis, the rate of change of the velocity decreases rather than staying constant. Furthermore, the contours are curved, bending one way below the 1,580 km/sec contour and the other way above this contour. Nevertheless, the contours suggest that the predominant motion of the galaxy (aside from the recession) is circular. The motion superimposed on this rotation, which results in the bending of the contours, is not yet known (Buta 1987).

Figure 1. NGC 7531 Velocity Data. The plot shows the locations in the celestial sphere at which the NGC 7531 velocity measurements were made. (Horizontal axis: East-West Coordinate, in arc seconds.)

Figure 2. NGC 7531 Velocity Data. The velocity surface was estimated by locally quadratic fitting with $f = .4$. The figure shows surface contours. The dotted line has a slope of 108°; the surface is roughly symmetric about this line. (Horizontal axis: East-West Coordinate, in arc seconds.)

4. STATISTICAL PROPERTIES

The loess estimate, $\hat{g}(x)$, is a linear combination of the $y_i$,

$$\hat{g}(x) = \sum_{i=1}^{n} l_i(x)\, y_i,$$

where the $l_i(x)$ depend on $x_k$ for $k = 1, \ldots, n$, $W$, $\rho$, and $f$, but not on the $y_i$. Let $\hat{y}_i = \hat{g}(x_i)$ be the fitted values, let $\hat{\varepsilon}_i = y_i - \hat{y}_i$ be the residuals, and let $y = (y_1, \ldots, y_n)'$, $\hat{y} = (\hat{y}_1, \ldots, \hat{y}_n)'$, and $\hat{\varepsilon} = (\hat{\varepsilon}_1, \ldots, \hat{\varepsilon}_n)'$. Since each $\hat{y}_i$ is a linear combination of the elements of $y$, we have that $\hat{y} = Ly$, where $L$ (locally weighted regression) is an $n \times n$ matrix and $\hat{\varepsilon} = (I - L)y$, where $I$ is the $n \times n$ identity matrix. This is analogous to parametric least squares: For least squares, the fitted values are $Gy$, where $G$ (Gauss) is the projection operator onto the space spanned by the fitting variables. If we apply both $G$ and $L$ to the values of one of the fitting variables, we get the same values back. One way to write this is $GG = G$ and $LG = G$. But unlike $G$, $L$ is neither symmetric nor idempotent (Devlin 1986).

There are three key ingredients for discussing the sampling variability of the loess estimate: (a) that $\hat{g}(x)$ is a linear combination of the $y_i$; (b) the assumption that $y$ has a normal distribution; and (c) the assumption that $\hat{g}(x)$ estimates $g$ with no bias. For locally linear fitting the assumption of no bias can only be exactly true when $g$ is linear, and for locally quadratic fitting it can only be exactly true when $g$ is quadratic. Nevertheless, the goal of part of the diagnostic checking (discussed in Sec. 5) and the M plot (discussed in Sec. 6) is to find estimates with
negligible bias. Note that lack of bias also underlies the results of parametric regression.

The major conclusion of this section is that several statistics defined analogously with those used in fitting parametric functions by least squares have distributions that are well approximated by those used in parametric regression. This is good news, because familiar techniques can thus be used in making inferences based on loess. In the remainder of this section we present the distributional approximations, and in Section 9 we describe simulations that studied the quality of the approximations.

4.1 Distributions of Residuals, Fitted Values, and Residual Sum of Squares

Because of the linearity and normality, $\hat{y}$ and $\hat{\varepsilon}$ have normal distributions with covariance matrices $\sigma^2 LL'$ and $\sigma^2 (I - L)(I - L)'$, respectively. Let

$$\hat{\varepsilon}'\hat{\varepsilon} = \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \text{residual sum of squares}.$$

Because of the unbiasedness, $E(\hat{\varepsilon}'\hat{\varepsilon}) = \sigma^2\, \mathrm{tr}\,[(I - L)(I - L)']$, and we can estimate $\sigma^2$ by

$$\hat{\sigma}^2 = \hat{\varepsilon}'\hat{\varepsilon} \,/\, \mathrm{tr}\,[(I - L)(I - L)'].$$

Thus, since the variance of $\hat{g}(x)$ is

$$\sigma^2(x) = \sigma^2 \sum_{i=1}^{n} l_i^2(x),$$

we can estimate it by

$$\hat{\sigma}^2(x) = \hat{\sigma}^2 \sum_{i=1}^{n} l_i^2(x).$$

We can approximate the distribution of a quadratic form in normal variables such as $\hat{\varepsilon}'\hat{\varepsilon}$ by the distribution of a constant multiplied by a $\chi^2$ variable; the degrees of freedom and the constant are chosen so that the first two moments of the approximating distribution match those of the distribution of the quadratic form (Kendall and Stuart 1977). Let $\delta_1 = \mathrm{tr}\,[(I - L)(I - L)']$ and let $\delta_2 = \mathrm{tr}\,[(I - L)(I - L)']^2$. Using this method of approximation, the distribution of $(\delta_1^2 \hat{\sigma}^2)/(\delta_2 \sigma^2)$ is approximated by a $\chi^2$ distribution with $\delta_1^2/\delta_2$ df, and the distribution of $(\hat{g}(x) - g(x))/\hat{\sigma}(x)$ is approximated by a $t$ distribution with $\delta_1^2/\delta_2$ df. We can use this result to get approximate confidence intervals for $g(x)$ based on $\hat{g}(x)$.

4.2 Analysis of Variance

Suppose that $Ny$ and $Ay$ are two vectors of fitted values for two regression procedures. We think of $N$ as yielding a fit for a null hypothesis and $A$ as yielding a fit for an alternative hypothesis. For example, $N$ might be linear least squares so that $N = G$ and $A$ might be loess so that $A = L$, or $A$ might be loess with a small value of $f$, say .3, and $N$ might be loess with a larger value of $f$, say .9, so that $N = L_{.9}$ and $A = L_{.3}$. Let $y'R_N y = y'(I - N)'(I - N)y$ and $y'R_A y = y'(I - A)'(I - A)y$ be the residual sums of squares of the two fits. If we want to test $N$ against $A$, the likelihood ratio test leads us to $(y'R_N y)/(y'R_A y) > c$. Thus we will use in analogy with ANOVA a test based on $(y'R_N y - y'R_A y)/y'R_A y$. In this test the reduction due to $A$ in the residual sum of squares is compared with the residual sum of squares of $A$. [Devlin (1986) discussed a somewhat different approach to testing for the special case where $N = G$.] Let $\nu_1 = \mathrm{tr}(R_N - R_A)$, $\nu_2 = \mathrm{tr}(R_N - R_A)^2$, $\delta_1 = \mathrm{tr}\, R_A$, and $\delta_2 = \mathrm{tr}\, R_A^2$. The idea is to use the two-moment $\chi^2$ approximation for the numerator of the aforementioned statistic and the denominator, and approximate the test statistic by an $F$ distribution. The test statistic is

$$\hat{F} = \frac{(y'R_N y - y'R_A y)/\nu_1}{(y'R_A y)/\delta_1},$$

and its distribution is approximated by an $F$ distribution with $\nu_1^2/\nu_2$ and $\delta_1^2/\delta_2$ df. We refer to $\nu_1$ as the numerator divisor of the $F$ test and to $\nu_1^2/\nu_2$ as the numerator degrees of freedom. Similar terminology holds for $\delta_1$ and $\delta_1^2/\delta_2$.

5. OZONE AND METEOROLOGICAL DATA: THE STATISTICAL DIAGNOSTIC AND CONDITIONING

The data in this application are 111 measurements of four variables (ozone, an air pollutant; solar radiation; temperature; and wind speed) on 111 days between May 1 and September 30, 1973, at sites in the New York City metropolitan region (Bruntz, Cleveland, Kleiner, and Warner 1974). We analyzed these data to describe the dependence of ozone on the meteorological variables so that ozone concentrations can be predicted from forecasts of the meteorology. Figure 3 is a scatterplot matrix of the data. The first step in the analysis of these data was to smooth ozone as a function of the meteorological variables by a locally linear fitting with $f = .4$.

Figure 3. Ozone and Meteorological Data. The figure is a scatterplot matrix of 111 measurements of ozone, wind speed, temperature, and solar radiation. The goal is to predict the ozone concentrations from the meteorological variables.
The loess methodology discussed in this article widens the domain of applicability compared with the much-practiced parametric-function fitting; nevertheless, the methodology is still based on certain critical assumptions. One is that the errors, $\varepsilon_i$, are independently and normally distributed with constant variance. Another is that the fitted function follows the pattern of the data, that is, provides a nearly unbiased estimate. Such assumptions must be checked. When assumptions are violated we can often take corrective actions similar to those used in parametric regression. There already exists a wealth of diagnostic procedures for regression models (Belsley, Kuh, and Welsch 1980; Chambers, Cleveland, Kleiner, and Tukey 1983; Cook and Weisberg 1982; Daniel and Wood 1971). Much of it is applicable to locally weighted regression; for example, one can make a normal probability plot of $\hat{\varepsilon}_i$ to check the normality assumption, make a plot of $|\hat{\varepsilon}_i|$ against $\hat{y}_i$ to check the assumption of a constant variance, and graph $\hat{\varepsilon}_i$ against the independent variables to check for bias.

Figures 4 and 5 are diagnostic plots for the locally linear fit to the ozone data. The top panel of Figure 4 is a normal probability plot of the $\hat{\varepsilon}_i$. The curvature suggests that the $\varepsilon_i$ have a distribution that is skewed to the right. The bottom panel of Figure 4 is a plot of $|\hat{\varepsilon}_i|$ versus $\hat{y}_i$. The smooth curve is a locally linear fit to the points of the plot with $f = 2/3$. The plot suggests that the variance of $\varepsilon_i$ depends on the level of $g$. Figure 5 shows plots of $\hat{\varepsilon}_i$ against the independent variables. The curves on the graphs are locally linear fits with $f = 2/3$. No distortion appears in the top panel, but a small effect appears in the middle panel and a more serious one appears in the bottom panel, which suggests that the estimated surface is not following the pattern in the data. Of course, it is possible that the distortion is also causing the inadequacies in Figure 4.

Figure 4. Ozone and Meteorological Data. Ozone was regressed on the meteorological variables using locally linear fitting and $f = .4$. The top panel is a normal probability plot of the residuals. The bottom panel is a graph of the absolute residuals against the fitted values; the smooth curve is a loess fit to the data of the plot, with $f = 2/3$. The plots show nonnormality and a dependence of variance on the level of the dependent variable.

Figure 5. Ozone and Meteorological Data. The residuals for the ozone data are graphed against the independent variables; the smooth curves are loess fits to the data of the plots, with $f = 2/3$. The plots indicate that the estimated regression surface does not fit the data.
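The coordinates behind the two kinds of panel in Figure 4 are simple to compute from the residuals and fitted values. The following is a minimal sketch, assuming plotting positions $(i - 0.5)/n$ for the normal probability plot (the article does not state which positions were used); the function names are illustrative, and the smooth curve through the second set of points would be added with a loess fit.

```python
from statistics import NormalDist

def normal_probability_points(residuals):
    """Pairs (standard normal quantile, ordered residual) for a normal probability plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()
    # Assumed plotting positions (i - 0.5) / n for i = 1, ..., n
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, ordered))

def spread_level_points(fitted, residuals):
    """Pairs (fitted value, absolute residual), sorted by fitted value."""
    return sorted((g, abs(e)) for g, e in zip(fitted, residuals))
```

Points of the first plot that bend away from a straight line indicate nonnormality; an upward trend in the second plot indicates that the variance grows with the level of the fit.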
We could reduce the distortion by decreasing the value of $f$, which is .4. But since $f$ is already fairly small, and since the fitted surface has substantial curvature, we decided to combat the distortion by switching to locally quadratic fitting with $f = .8$. The distortion disappeared, but the inadequacies of Figure 4 remained. Thus we took the cube roots of the ozone concentrations and again computed a locally quadratic fit with $f = .8$. This estimate passed the diagnostic checks.

Figures 6-8 are three-variable conditioning plots for the locally quadratic fit. In each panel of Figure 6, $\hat{g}$ is graphed against temperature for fixed values of solar radiation and wind speed, and confidence intervals (computed as described in Sec. 4.1) are shown at five values of temperature. For example, in the panels of the bottom row, solar radiation is 50 langleys; in the panels of the leftmost column, wind speed is 5 miles per hour. Figures 7 and 8 graph $\hat{g}$ against solar radiation and wind speed, respectively, for fixed values of the other variables. The conditioning plots show clearly the nonlinearity of the regression surface and the interaction among the independent variables.

One major reason for fitting a regression surface to ozone data is prediction, either retrospective or prospective. We want to predict the severity of ozone pollution from actual or predicted values of the meteorological variables. For example, during the period of measurement, May 1-September 30, 1973, there were many days with missing ozone measurements because of malfunctioning equipment. Two of these days, August 10 and 11, followed three days of relatively high concentrations of 122, 89, and 110 parts per billion (ppb), all of which were above the federal standard of 80 ppb. Did the pollution episode continue on these two days, or was it reduced? We can use the loess surface to estimate the missing ozone concentrations from the meteorological measurements. The left and right endpoints of approximate 95% confidence intervals, all on the ppb scale, are the following: August 10, 68 and 97; August 11, 34 and 57. Thus ozone might have been somewhat elevated on the 10th, but with high probability it dropped on the 11th.

Figure 6. Ozone and Meteorological Data. Because of the problems revealed by the plots in Figures 4 and 5, ozone was transformed by cube roots and locally quadratic fitting with $f = .8$ was used. The figure shows a conditioning plot. Each panel shows a slice of the regression surface as a function of temperature for fixed values of solar radiation and wind speed. The lines are 95% confidence intervals.

Figure 7. Ozone and Meteorological Data. Each panel of this conditioning plot shows a slice of the regression surface as a function of solar radiation for fixed values of temperature and wind speed.

Figure 8. Ozone and Meteorological Data. Each panel of this conditioning plot shows a slice of the regression surface as a function of wind speed for fixed values of temperature and solar radiation.
6. THE M PLOT

Mallows (1966) invented a procedure called $C_p$ for choosing a subset of the independent variables in least-squares estimates. Mallows (1973) extended this to a more general class of estimates and applied it to choosing the parameter in ridge regression. We can also extend it to locally weighted regression to help choose the value of $f$. The expected mean squared error summed over the $x_i$ in the sample and divided by $\sigma^2$ is

$$M_f = \sum_{i=1}^{n} E\big(\hat{g}_f(x_i) - g(x_i)\big)^2 \big/ \sigma^2,$$

where the notation for the fitted values, $\hat{g}_f(x_i)$, now has a subscript to show the dependence on $f$. Suppose that $\hat{\sigma}_s^2$ is an estimate of $\sigma^2$ from a smoothing where $s$, the value of $f$, is small, usually in the range from .2 to .4. The idea is to choose a small $s$ so that the bias of $\hat{g}_s(x_i)$ will be negligible, which results in a nearly unbiased estimate of $\sigma^2$. Now, let

$$\hat{B}_f = \frac{\hat{\varepsilon}_f'\hat{\varepsilon}_f}{\hat{\sigma}_s^2} - \mathrm{tr}\,(I - L_f)'(I - L_f)$$

and

$$V_f = \mathrm{tr}\, L_f'L_f.$$

A simple derivation shows that we can estimate $M_f$ by $\hat{M}_f = \hat{B}_f + V_f$. $\hat{B}_f$ is the contribution of bias to the estimated mean squared error, and $V_f$ is the contribution of variance. If, for a particular $f$, $\hat{g}_f$ is a nearly unbiased estimate, then using a standard $\delta$-method argument (Kendall and Stuart 1977) the expected value of $\hat{B}_f$ is nearly 0, so the expected value of $\hat{M}_f$ is nearly $V_f$. If as $f$ increases bias is introduced, $\hat{B}_f$ has a positive expected value, so the expected value of $\hat{M}_f$ exceeds $V_f$.

Here $V_f$ is the equivalent number of parameters of the fit, a measure of the amount of smoothing done by the local-fitting procedure. We use this name because if we had done ordinary linear least squares, then the operator matrix $L_f$ would be replaced by $G$ and $\mathrm{tr}\, G'G = \mathrm{tr}\, G$, the number of parameters used in the fit. In the forthcoming applications, $V_f$ decreases as $f$ increases, so more smoothing results in a smaller equivalent number of parameters.

The M plot is a graph of $\hat{M}_f$ against $V_f$ for a selection of $f$ values between $s$ and 1; this lets us see the trade-off between the contributions of variance and bias to the mean squared error as $f$ changes. It is also helpful, for judging variation on the plot, to show information about the distribution of $\hat{M}_f$ when there is no bias. We can proceed exactly as in Section 4.2. Let $R_N$ be the matrix for the residual sum of squares when the smoothing parameter is $f$, that is, $\hat{\varepsilon}_f'\hat{\varepsilon}_f = y'R_N y$, and let $R_A$ be the matrix when the parameter is $s$. Then

$$\hat{M}_f = \nu_1\, \frac{(y'R_N y - y'R_A y)/\nu_1}{(y'R_A y)/\delta_1} + \delta_1 - n + 2\,\mathrm{tr}\, L_f = \nu_1 \hat{F} + \delta_1 - n + 2\,\mathrm{tr}\, L_f.$$

As before, we approximate the distribution of $\hat{F}$ by an $F$ with $\nu_1^2/\nu_2$ and $\delta_1^2/\delta_2$ df, and thereby approximate the distribution of $\hat{M}_f$.

It is important to emphasize that the M plot is not intended to produce hard-and-fast rules for the choice of $f$. Rather, by showing the trade-off between variance and bias as $f$ changes and some information about sampling variability, it assists in our judgment of an appropriate $f$. Sometimes we want to minimize the mean squared error; this might be the case when we want to use $\hat{g}(x)$ for prediction. In other applications we may decide that low variance is important and thus choose an $f$ that inflates the bias somewhat; this might be the case when the sample size is small or we are searching for a simple description of the data structure that captures the salient features. In still other applications we might decide that low bias is critical; this is often the case when the loess estimate is used for graphical exploration, since our eyes can tolerate some noise but cannot recover a missed effect. Routinely choosing $f$ by minimizing $\hat{M}_f$ is a poor procedure because it ignores variance and bias, which are important to consider in most applications. [Mallows (1973) made the same point about the use of $C_p$.] Furthermore, at the minimum, $\hat{M}_f$ is often flat compared with its sampling variability, so a range of values of $f$ with different variance and bias properties gives the same mean squared error.

The M plot can be used for more general purposes than comparing loess smoothings with different values of $f$. For example, we can add $\hat{M}$ from any parametric fit or $\hat{M}$ from other local-fitting procedures such as additive fitting (discussed in Sec. 8). We do this by computing a value of $\hat{M}$ in a manner analogous to the computation of $\hat{M}_f$, and with $\sigma^2$ still estimated by $\hat{\sigma}_s^2$.

7. ABRASION-LOSS DATA

An industrial experiment was run measuring three variables for each of 30 rubber specimens (Davies 1957). Each specimen was rubbed with an abrasive material, and the abrasion loss was measured; the experiment was to relate this loss to measurements of the hardness and tensile strength of the specimens. Figure 9 is a scatterplot matrix of the data, which were analyzed by fitting a linear regression model. We intend to evaluate this model. In an initial pass over the data an outlier was found and removed; we analyze the remaining 29 observations. (Since the outlier did not result in extreme values in any of the univariate sample distributions, the independent variables were standardized based on sample standard deviations computed from all 30 observations.)

Figure 10 is an M plot with $s = .3$. The circles show $\hat{M}_f$ versus $V_f$ for $f$ ranging from $f = 1$ (the leftmost circle) to $f = .3$ (the rightmost circle) in steps of .05. The line $\hat{M}_f = V_f$ has been drawn; note that $\hat{M}_s$ must lie on this line.
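The points of an M plot can be computed directly from the definitions of Section 6. The sketch below is illustrative and makes several assumptions that are not in the article: it uses one independent variable with locally linear fitting (so that $L_f$ has a closed form), takes $q = \mathrm{round}(fn)$, and runs on a small synthetic data set rather than the abrasion-loss data; the names `operator_row` and `m_statistics` are hypothetical.

```python
import math

def tricube(u):
    """Tricube weight W(u) = (1 - u^3)^3 on [0, 1), and 0 elsewhere."""
    return (1.0 - u ** 3) ** 3 if 0.0 <= u < 1.0 else 0.0

def operator_row(x, xs, f):
    """Row of L_f at x for one independent variable and locally linear fitting."""
    n = len(xs)
    q = max(3, min(n, round(f * n)))          # neighborhood size, f = q / n
    u = [xi - x for xi in xs]
    d = sorted(abs(ui) for ui in u)[q - 1]    # distance to the q-th nearest x_i
    w = [tricube(abs(ui) / d) for ui in u]
    s0 = sum(w)
    s1 = sum(wi * ui for wi, ui in zip(w, u))
    s2 = sum(wi * ui * ui for wi, ui in zip(w, u))
    det = s0 * s2 - s1 * s1
    return [wi * (s2 - ui * s1) / det for wi, ui in zip(w, u)]

def m_statistics(xs, ys, f_values, s):
    """Return (f, V_f, B_hat_f, M_hat_f) for each f, with sigma^2 estimated at f = s."""
    n = len(xs)

    def summaries(f):
        L = [operator_row(x, xs, f) for x in xs]
        rss = sum((ys[i] - sum(L[i][j] * ys[j] for j in range(n))) ** 2
                  for i in range(n))
        tr_l = sum(L[i][i] for i in range(n))
        v = sum(l * l for row in L for l in row)   # V_f = tr L_f' L_f
        tr_r = n - 2.0 * tr_l + v                  # tr (I - L_f)'(I - L_f)
        return rss, tr_r, v

    rss_s, tr_r_s, _ = summaries(s)
    sigma2_s = rss_s / tr_r_s                      # sigma_hat_s^2
    results = []
    for f in f_values:
        rss, tr_r, v = summaries(f)
        b = rss / sigma2_s - tr_r                  # B_hat_f
        results.append((f, v, b, b + v))           # M_hat_f = B_hat_f + V_f
    return results

# Hypothetical data: a smooth curve plus a small deterministic disturbance
xs = [float(i) for i in range(30)]
ys = [math.sin(x / 5.0) + 0.1 * ((7 * i) % 5 - 2) for i, x in enumerate(xs)]
m_plot_points = m_statistics(xs, ys, [0.3, 0.5, 0.7, 1.0], s=0.3)
```

Plotting the last element of each tuple against the second gives the circles of an M plot. At $f = s$ the bias term vanishes by construction, so that point falls on the line $\hat{M}_f = V_f$, as noted in the text.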
Cleveland and Devlin:Locally WeightedRegression 603

120 160 200 240 35 3 6 9

0 0 0 0
00 0 0 0 00 o-;?0
0 0o
0 0 0 0
0 0 00 0
Hardness 0 0 0 0 000 0
0 0 0 1 0
000 0.0
0 0
0 0
240- 00000
0 0 00
0 0 0 0 * 115
200- Oo 0 0 C,)
0 0 0 0
0 TensileStength 0
160 i00 r 0 00 0 0 0 00
.00 00 0 0 0 00 0

00 0 0 5-
120- 0 0
0 0 0 0

0 0

- 0
#00 00 000 0 00
0 0 0 0
00 0 0 0 0
0 0 3 6 9 12
00 00

EquivalentNumberof Parameters
50 70 90 50 150 250 350

Figure 9. Abrasion-Loss Data. The figure is a scatterplot matrix of data from an industrial experiment in which abrasion loss was studied as a function of hardness and tensile strength.

Figure 10. Abrasion-Loss Data. The M plot is a graphical method for choosing the smoothing parameter, f, in locally weighted regression. The filled circles show M statistics, estimates of the mean squared error, for f ranging from .3 (rightmost circle) to 1.0 (leftmost circle). The G shows the M statistic for a linear least-squares fit. The M statistics are graphed against their expected values under an assumption of no bias. The slanted line on the plot is y = x, so the vertical distance of an M statistic to the line is the contribution of bias to the estimate of the mean squared error. The ends of the vertical lines show 90% intervals, and the tick marks show 80% intervals of the distributions of the M statistics under an assumption of no bias. On the basis of this plot, f was chosen to be .5.

95% point, the upper tick mark is the 90% point, the lower tick mark is the 10% point, and the bottom of the line is the 5% point. The G on the plot is the value of M for linear least squares. Note that the equivalent number of parameters for the least-squares fit is less than that for any local-regression smoothing, because least squares does more data smoothing than local regression. In Figure 10 there is no clearly defined point where the $\hat{M}_f$ begin a precipitous rise, and $\hat{M}_f$ is flat compared with its sampling variability, from f = .3 to f = .5; we chose f to be .5, preferring an estimate that had as low a variance as possible, in view of the small sample size, without introducing undue bias. Note that the $\hat{M}$ value for least squares shows that the linear-model fit in the original analysis is inappropriate.

Figure 11 plots the fit with f = .5 in the following way: Consider the top curve in the bottom panel. The value of hardness has been set to 60. The curve is a graph of the fitted surface against tensile strength for this fixed value of hardness. For the other curves on the panel, hardness has been set to other values. The graph in the top panel is similar, but the conditioning is on tensile strength. This graphical tool is a two-variable conditioning plot that can be used generally to explore loess fits with two independent variables. Of course, it is analogous to the three-variable conditioning plots of Figures 6 to 8. Figure 11 reveals several important properties of the estimated surface. On each panel the four curves have roughly the same shape, varying mostly in level, suggesting that there is little interaction between tensile strength and hardness. Furthermore, the estimated surface appears to be a linear function of hardness and a nonlinear function of tensile strength.

Figure 11 suggests that we incorporate lack of interaction and linearity of hardness into the smoothing. We can do this by following the additive-estimation approach of Hastie and Tibshirani (1986). An additive estimate consists of a sum of smooth functions of the independent variables, $g_1(x_{i1}) + \cdots + g_p(x_{ip})$. The $g_k$ are the component functions. The salient feature of the estimate is that although the regression surface is nonlinear, there is no interaction among the independent variables.

Additive estimation can be carried out by using the backfitting algorithm from projection selection (Breiman and Friedman 1985; Friedman and Stuetzle 1981; Hastie and Tibshirani 1986). Backfitting is an iterative procedure. In each iteration a component function, say the kth, is updated by smoothing $y_i$ minus the sum of the other component functions as a function of $x_{ik}$. In our implementation the smoothing is carried out by loess. The final fit is a linear operator applied to y. For this reason the distributional results of Section 4 apply to backfitting as well, but with L replaced by the backfitting operator.

Additive fitting was used for the abrasion-loss data, with the component function for hardness estimated by linear least squares and the component function for tensile strength estimated by loess with varying values of f. Figure 12 shows $\hat{M}_f$ for these fits. The values of f used in the loess smoothing range from f = 1 (leftmost circle) to f = .3 (rightmost circle) in steps of .05, just as for the multivariate
604 Journal of the American Statistical Association, September 1988

[Figure 12: M plot; horizontal axis: Equivalent Number of Parameters.]

Figure 12. Abrasion-Loss Data. Figure 11 suggests that an additive nonparametric smoothing with no interaction will fit the data. This figure is an M plot for additive fits with hardness estimated by linear least squares and tensile strength estimated by loess, with f ranging from .3 to 1 in steps of .05. On the basis of this plot, f was chosen to be .75.
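The additive fits summarized in Figure 12 can be computed by backfitting. The following is a minimal sketch of that idea, not the authors' implementation: the function names (`tricube`, `loess_1d`, `backfit_additive`), the fixed iteration count, and the simplified locally linear smoother (which assumes distinct x values) are all illustrative assumptions. One component is fitted by linear least squares and the other by a univariate loess-type smoother, mirroring the abrasion-loss fit.

```python
import numpy as np

def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def loess_1d(x, y, f):
    """Locally linear, tricube-weighted fit evaluated at each x[i] (simplified)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    q = max(3, min(n, int(np.ceil(f * n))))  # number of points in each neighborhood
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = np.sort(d)[q - 1]                # distance to the q-th nearest point
        sw = np.sqrt(tricube(d / h))         # square roots of the local weights
        X = np.column_stack([np.ones(n), x - x[i]])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]                  # intercept = fitted value at x[i]
    return fitted

def backfit_additive(x_lin, x_smooth, y, f=0.75, n_iter=20):
    """Fit y ~ alpha + slope * x_lin + g(x_smooth) by backfitting."""
    x_lin = np.asarray(x_lin, dtype=float)
    y = np.asarray(y, dtype=float)
    alpha = y.mean()
    xc = x_lin - x_lin.mean()                # centered, so this component has mean 0
    g_lin = np.zeros_like(y)
    g_smooth = np.zeros_like(y)
    for _ in range(n_iter):
        # Update the linear component from y minus the other component.
        r = y - alpha - g_smooth
        slope = (xc @ r) / (xc @ xc)
        g_lin = slope * xc
        # Update the smooth component the same way, with loess as the smoother.
        r = y - alpha - g_lin
        g_smooth = loess_1d(x_smooth, r, f)
        g_smooth = g_smooth - g_smooth.mean()
    return alpha, slope, g_lin, g_smooth
```

Every update in the loop is a linear operation on y, so the resulting fit is itself a linear operator applied to y; this is the property that lets the distributional results carry over to backfitting.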

[Figure 11: conditioning plots, two panels.]

Figure 11. Abrasion-Loss Data. Conditioning plots show a loess fit to the abrasion-loss data, with f = .5. The graphs suggest the dependence on hardness is linear and that there is no interaction between tensile strength and hardness.

loess in Figure 10. Also, the estimate of $\sigma^2$ is the same as that in Figure 10. The plot shows that an additive smoothing can provide an acceptable fit to the data; we chose f to be .75, preferring a low-variance estimate without unduly inflating the mean squared error, again in view of the small sample size.

Additive fits can be graphed by component-residual plots. As before, let $\hat{g}_k(x_{ik})$ be the estimated component functions, and let $\hat{\varepsilon}_i$ be the residuals. To study the properties of the fit we can make one plot for each component function: $\hat{g}_k(x_{ik})$ is graphed against $x_{ik}$ for i = 1 to n by connecting successive points by line segments, and $\hat{g}_k(x_{ik}) + \hat{\varepsilon}_i$ is graphed against $x_{ik}$ by circles. These plots allow us to see the form of the estimated surface and to see whether any signal has leaked into the residuals. The plotting method follows that used in partial residual plots (Landwehr 1983; Larsen and McCleary 1972), where the component functions have a different form.

Figure 13 shows component-residual plots for the additive fit to the abrasion-loss data with f = .75. The top panel shows clearly the form that the nonlinearity takes; there is a hockey-stick dependence. A logical next step in the analysis of these data would be to fit a parametric model in which the dependence on tensile strength is continuous and piecewise linear.

8. NOx DATA: AN APPLICATION IN WHICH THE [...]

The data in this application are from an experiment in which a single-cylinder engine was run with ethanol or indolene (Brinkman 1981). There are 110 measurements of compression ratio (C), equivalence ratio (E), and NOx in the exhaust. The purpose of the analysis was to see how NOx depends on E and C. There were 88 runs with ethanol; for these runs, E varied from .535 to 1.232, C took one of five values ranging from 7.5 to 18, and the values of E and C were nearly uncorrelated. There were 22 runs with indolene; for these runs, C took just one value, 7.5, and E ranged from .665 to 1.224.

Rodriguez (1985) analyzed these data using ACE (Breiman and Friedman 1985) and MORALS (Young, DeLeeuw, and Takane 1976), with type of fuel as a categorical variable and C and E as continuous variables. In ACE analysis the resulting surface is an additive fit to a transformation of the dependent variable. Thus an ACE fit to the NOx concentrations results in a surface with no interaction.

Our goal was to explore the data to see if an additive fit was reasonable. To allow for general interactions, we treated C and type of fuel as a single categorical variable with six levels, since C was equal to 7.5 for all indolene runs and to five values ranging from 7.5 to 18 for the
Cleveland and Devlin: Locally Weighted Regression 605

[Figure 14: M plot; horizontal axis: Equivalent Number of Parameters.]

Figure 14. NOx Data. The data are from an experiment studying the dependence of NOx exhaust emissions on equivalence ratio, compression ratio, and type of fuel. The figure is an M plot for locally quadratic fitting with f ranging from .4 to 1 in steps of .05. The type of fuel and the level of compression ratio, which took on one of five values, were both entered as categorical variables. On the basis of the plot, f was chosen to be .85.
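The M statistic plotted in Figures 10, 12, and 14 is defined earlier in the article, outside this passage. The sketch below therefore assumes a Cp-style form, a bias estimate plus a variance term, for any fit of the form $\hat{y} = Ly$ with operator matrix L. The function names and the exact formula are assumptions made for illustration and should be checked against the article's own definition. Under this assumed form the variance term, tr(L'L), is the value M is expected to take when there is no bias, which is the quantity on the horizontal axis of an M plot.

```python
import numpy as np

def m_statistic(L, y, sigma2_hat):
    """Assumed Cp-style statistic for a linear smoother with operator matrix L.

    Returns (M, variance_term). variance_term = tr(L'L) is what M is
    expected to equal when the fit has no bias.
    """
    L = np.asarray(L, dtype=float)
    y = np.asarray(y, dtype=float)
    R = np.eye(len(y)) - L                      # residual operator I - L
    resid = R @ y
    variance_term = np.trace(L.T @ L)
    # Residual sum of squares in units of sigma^2, minus its no-bias expectation.
    bias_term = resid @ resid / sigma2_hat - np.trace(R.T @ R)
    return bias_term + variance_term, variance_term

def least_squares_operator(X):
    """Hat matrix of an ordinary least-squares fit on the columns of X."""
    X = np.asarray(X, dtype=float)
    return X @ np.linalg.pinv(X)
```

For a least-squares fit with p columns the operator is a projection, so the variance term equals p and the statistic reduces to Mallows's Cp, RSS / sigma2_hat - n + 2p. This corresponds to the point marked G on the M plots.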
[Figure 13: component-residual plots, two panels.]

Figure 13. Abrasion-Loss Data. Component-residual plots for the additive fit to the abrasion-loss data, with f = .75. The curve on each plot is the estimated component function for one independent variable, and the plotting symbols show the component function value plus the residual for each observation.

ethanol runs. Thus there are two independent variables, E and this categorical variable. Furthermore, the NOx concentrations were transformed by cube roots. Thus the loess analysis consists of six separate smoothings of cube root NOx as a function of E, one for each level of the categorical variable. The smoother in this case was locally quadratic fitting because, as we shall see, the functional dependence of cube root NOx on the equivalence ratio has a local maximum and substantial curvature.

Figure 14 is an M plot for the locally quadratic smoother; the value of s is .4, and in moving from left to right f goes from 1 to .4 in steps of .05. On the basis of this plot f was chosen to be .85; $\hat{M}_f$ jumps considerably for larger values of f.

The top panel of Figure 15 shows the six local-regression estimates, $\hat{h}_k(x)$ for k = 1 to 6, for the six levels of the categorical variable. Each estimate was computed at 50 equally spaced values of E from .6 to 1.15; let the 50 values be denoted by $x_j^*$ for j = 1 to 50. Each estimate is graphed in the top panel by connecting successive values by line segments. The bottom panel of Figure 15 is an interaction plot. Each curve is a graph of

$$\hat{h}_k(x_j^*) - \bar{h}(x_j^*)$$

against $x_j^*$.

Figure 15 shows something important: For the ethanol runs, there is a substantial interaction between C and E. As C increases $\mathrm{NO}_x^{1/3}$ generally increases, but the effect is reduced as E increases and eventually becomes nearly 0 when E is at its maximum value. Indolene adds to this interaction, because its behavior as a function of E is different from that of ethanol with C equal to 7.5. Thus an additive fit is completely inappropriate for these data. (The M plot for the additive fits, as one would expect, shows very large biases.) Furthermore, Figure 15 shows that the form of the interaction is such that a nontrivial transformation of NOx cannot possibly remove the interaction, which means that ACE cannot lead to a satisfactory model for these data.

Monte Carlo simulations with normal $\varepsilon_i$ were run to investigate the distributional approximations discussed in Section 4. We constructed a wide collection of design configurations (i.e., sets of values of the independent variables) for up to five independent variables. Three items were studied in the simulations: (a) distribution of $\hat{\sigma}^2/\sigma^2$, (b) confidence intervals for g(x), and (c) ANOVA for N = linear least squares and A = locally linear fitting. The

[Figure 15: two panels plotted against Equivalence Ratio; curves for Indolene C = 7.5 and Ethanol C = 7.5, 9.0, 12.0, 15.0, and 18.0.]

Figure 15. NOx Data. The top panel shows the six separate smoothings of NOx as a function of equivalence ratio. The bottom panel shows the curves with a mean curve subtracted. The graphs show a strong interaction among the independent variables that cannot be removed by a nontrivial transformation of the dependent variable. Thus an additive fit is not possible for these data.

[Figure 16: panels arranged by p and by n (n = 100, 50, 25); horizontal axis: Numerator Degrees of Freedom.]

Figure 16. Simulations. The figure shows the results of laboratory simulations of the ANOVA test for global linearity. [...] of independent variables, and n, the number of observations. The figure shows that the distributional approximations work exceedingly well.

distributional approximations of Section 4 were exceedingly close to the true distributions for (a) and (b). For (c) they were close, except when the degrees of freedom of the fit were a large fraction of n; however, this situation is not relevant in practice. We refer to these simulations as laboratory simulations, because they employ artificially constructed design configurations. In Section 9.1, (c) is investigated; in Section 9.2, (a) and (b) are investigated; and in Section 9.3, (c) is investigated for a modification of the approximation.

For normal $\varepsilon_i$, the true distributions of the statistics involved in (a)-(c) depend on the value of f and the design configuration. A data analyst can check the distributional approximation for any particular application through a simulation using the design configuration of the data and the value of f used in the smoothing. We call these field simulations. If the diagnostic checking of the residuals shows that the sample distribution of the residuals is well approximated by a normal distribution, then the field simulation can use samples from the normal. If significant nonnormality appears in the residuals, then sampling can be from the sample distribution of the residuals. Field simulations are discussed in Section 9.4.

9.1 Laboratory Simulations: Analysis of Variance

In this section we discuss laboratory simulations for testing N = linear least squares against A = locally linear fitting. Figure 16 shows some of the results for one collection of 60 simulations; each simulation employed 16,000 replications, which gave high accuracy even at the .01 significance level. The 60 simulations employed 18 design configurations and 4 values of f; not all values of f were used with each configuration, since we limited our investigations to practical situations.

There were nine design configurations for p = 1. For each of three values of n, 100, 50, and 25, there were three sets of values of the independent variable. Each set was of the form $F^{-1}[(i - .5)/n]$ for i = 1, ..., n, where F was either the uniform, normal, or Cauchy distribution. Simulations with f = .3, .5, and .7 were run for each configuration, resulting in 27 simulations.

There were six design configurations for p = 2. For each of two values of n, 50 and 100, there were three sets of values of the independent variables. Each set was derived in the following manner: One independent variable was initially set equal to one of the sets of values used for p = 1; the second variable was initially set equal to a random permutation of these values; and then the two variables were rotated and scaled to have correlation 0 and variance 1. Simulations with f = .3, .5, .7, and .9

were run for each configuration, resulting in 24 simulations.

There were three design configurations for p = 3. Only the value of n = 100 was used, and the configurations were generated in a manner analogous to that for the case with p = 2 and n = 100. Simulations with f = .5, .7, and .9 were run for each configuration, resulting in nine simulations.

Figure 16 shows information about the test at the 5% level of significance. The values plotted on the vertical scales are 5% minus the actual significance, and the horizontal scales are the degrees of freedom of the numerator, that is, $\nu_1^2/\nu_2$. The panels are arranged by p and n. Most important, Figure 16 shows that the approximating 5% significance level is close to the true levels in each of the 60 simulations. The largest absolute deviation is 1.59%. In fact, the situation is even better than that, because the largest departures occur for the largest degrees of freedom, and these values are somewhat larger than those typically used in practice. For the cases with less than 10 df, the largest absolute deviation is .94%. Similar results hold for the deviations at the 10% and 2.5% levels of significance. For the former, the largest absolute deviation is 2.18%; for the latter, the largest is 1.05%. Figure 16 also shows that the deviation of the true level from the nominal level increases as p increases, as n decreases, or as the degrees of freedom increase.

The good performance of the approximations for ANOVA occurs even though the numerator of the test statistic is not independent of the denominator. The approximation works partly because the dependence is not strong and partly because unless n or f is very small the numerator is contributing the most to the variability of the statistic.

9.2 Laboratory Simulations: Confidence Intervals for $\sigma^2$ and g(x)

The 60 simulations described in Section 9.1 were also used to investigate confidence intervals for $\sigma^2$. For the 90% confidence level, the maximum absolute deviation of the actual level from the nominal level was .50%; for the 95% level the maximum was .48%. Clearly, the approximating distributions performed excellently in these cases.

The 27 simulations for p = 1 that were described in Section 9.1 were also used to investigate confidence intervals for g(x) at two values of x: the mean of the $x_i$ and the largest of the $x_i$. For the 90% confidence interval, the largest absolute deviation was .44% for the mean and .65% for the extreme. For the 95% interval, the largest absolute deviation was .45% for the mean and .65% for the extreme. Again, the approximations performed excellently.

9.3 Other Laboratory Simulations

In distributional approximations for ANOVA, the divisors for the sums of squares, $\nu_1$ for the numerator and $\delta_1$ for the denominator, are not generally the same as the degrees of freedom for the approximating F distribution, $\nu_1^2/\nu_2$ for the numerator and $\delta_1^2/\delta_2$ for the denominator. Nevertheless, one might hope that $\nu_1$ is close to $\nu_2$ and that $\delta_1$ is close to $\delta_2$, and then take the degrees of freedom to be $\nu_1$ and $\delta_1$. The 60 simulations described in Section 9.1 were also used to investigate this one-moment approximation. For the 10%, 5%, and 2.5% levels of significance, the maximum absolute deviations are 3.84%, 2.68%, and 1.62%, respectively. The corresponding values for the two-moment approximation (given in Sec. 9.1) are 2.18%, 1.59%, and 1.05%. The degradation in the approximation for the one-moment case is just large enough that we have continued with the somewhat more complicated two-moment approximation.

9.4 Field Simulations

As we stated earlier, a data analyst can check the performance of the approximating distribution in any application by a field simulation. If the approximating distribution performed poorly, the simulation distribution could be used to make inferences. But we have not yet encountered an application in which the residuals have a sample distribution that is well approximated by the normal and the approximating distribution performed poorly. We will illustrate the use of two field simulations for two of the applications in this article.

For the estimation of the ozone surface in Section 5, it is sensible to ask whether the observed curvature in the fitted surface is significant, because the estimate of the standard error of the residuals is $\hat{\sigma}$ = .43, which is not small compared with the sample standard deviation of the cube root ozone concentrations, which is .89. To address whether data with this much noise can support other than a global fit, we carried out ANOVA (described in Sec. 4.2), testing the locally weighted regression fit against a quadratic least-squares fit. The F statistic is 2.10 and the approximating distribution is F with 19.2 and 89.0 df. The significance level is .011, so the curvature is highly significant. We also ran a field simulation with 1,200 replications: The simulated significance level was .010, which is quite close to the approximating level.

The result of the abrasion-loss application in Section 7 was a nonlinear additive fit. Since the number of observations (29) is small, we might reasonably ask whether the data really support a nonlinear regression surface. Thus we tested the additive model against a linear least-squares fit: The significance level was .00256, making the nonlinearity highly significant. (Of course, the test needs to be viewed with some caution, because the model arose after several passes of the fitting process and because f was selected from the M plot.) A field simulation was also run: The simulated significance level was .00211, which is quite close to the approximating level.

10.1 Locally Weighted Regression for Applications

The methodology introduced here can be an integral part of the analysis in many regression studies. In fact, it represents a new approach, compared with what is most

often practiced today. This methodology can potentially penetrate a regression study most deeply when the dependent variable is a nonlinear function of the independent variables. Today, the two most common approaches to fitting nonlinear surfaces in applications are searching for transformations of the variables that linearize the surface and fitting polynomials of the independent variables. These methods, however, do not lead to a nearly rich enough class of surfaces to model adequately the wide variety of surfaces encountered in practice. But even when the final result of a regression study is a parametric surface, the methodology can help substantiate the validity of the fit.

10.2 Current Restrictions to the Methodology

One current restriction of the applicability of our methodology is the assumption of normality and constant variance of the errors. Nevertheless, future work might relax this restriction. A method for estimating g(x) when the $\varepsilon_i$ are assumed only to be symmetric already exists: robust locally weighted regression (Cleveland 1979). What is needed for this robust procedure, however, is distributional results similar to those in Sections 4 and 6. Smoothing techniques without distributional results often leave the analyst with too little methodology to make informed inferences.

Another current restriction is to studies in which the relevance of each independent variable in explaining the dependent variable has already been ascertained. To remove this restriction, work is needed to determine how to incorporate into loess methodology procedures for selecting a subset of the independent variables.

10.3 The Curse of Dimensionality

As the number of independent variables, p, increases, a fixed number of points, n, rapidly becomes sparse. This is referred to as the curse of dimensionality. Some have mistakenly supposed that the curse makes multivariate smoothing (that is, smoothing with p > 1) a method to avoid. What must be avoided is allowing f to remain fixed as p increases, because for fixed f the equivalent number of parameters of the fit increases as p increases. Of course, we must maintain control of the equivalent number of parameters; this is done by increasing f. As long as we maintain control and do not allow the equivalent number of parameters to become a large fraction of n, we can expect multivariate smoothing to behave reliably. In this article we have successfully carried out multivariate smoothing for data sets with two and three independent variables. Fowlkes (1986) demonstrated that smoothing with more than three independent variables is reasonable in certain circumstances, even for moderately sized data sets. Of course, as p and f increase for fixed n there will be a decrease in the amount of curvature that can be estimated without serious bias. This is not a defect in the method but a statement that the more complicated a regression surface becomes, the larger n must be to get good estimates of it. Exactly the same considerations obtain whatever the method of estimation.

10.4 Weight Functions and the Poor Performance of the Uniform

The general form of the tricube weight function, particularly the smooth contact with 0 at 1, enhances the performance of locally weighted regression. Any reasonable function with smooth contact can also be expected to perform well. Nevertheless, the uniform weight function, with the discontinuity at 1, performs poorly.

A problem with the uniform is that its discontinuity results in local roughness in $\hat{g}(x)$ that is almost always noise and not signal. This reflects a well-known phenomenon in digital filtering and spectrum analysis: boxcar windows have Fourier transforms with side lobes that fall off slowly as a function of frequency and thus pass unacceptably large amounts of high-frequency noise (Bloomfield 1976). A second problem with the uniform weight function is that it leads to less satisfactory distributional approximations, because for the uniform, the eigenvalues of L (which, again, are related to the Fourier transform of the weight function) do not lend themselves as well to the approximations as do those of a continuous weight function such as tricube (Devlin 1986).

We mention the weight-function issue, in part, because asymptotic results for nonparametric regression show that the overall form of the weight function does not have an appreciable effect with respect to mean squared error (e.g., Priestley and Chao 1972). This, however, should not be interpreted to mean that the form of the weight function does not matter in all respects.

10.5 Other Methods

Another approach to smoothing a dependent variable as a function of two or more independent variables is projection pursuit, an iterative procedure (Friedman and Stuetzle 1981). At each stage of the iterations, $y_i$ is smoothed as a function of a linear combination of the independent variables. The linear combination is chosen to give a maximum reduction in the residual sum of squares. The smoother is similar to univariate locally weighted regression, but with modifications to decrease the computation time and with a method for choosing the amount of smoothing. The multivariate smoothing introduced here is attractive because of its simplicity: For a particular f, $\hat{g}(x)$ has a straightforward definition and is simply a linear combination of the $y_i$, so the statistical properties are easy to fathom. This simplicity leads to much of the additional methodology in this article. The full projection-pursuit algorithm results in a considerably more complicated function of the $y_i$, because the linear combinations of the independent variables are chosen to minimize the residual sum of squares. Consequently, almost nothing is known about its distributional properties (Huber 1985). In addition, full projection pursuit also has its restricted domain of applicability; not all regression surfaces can be well approximated by a moderate number of smooth functions of linear combinations of the independent variables (Huber 1985).

Locally weighted regression falls into a class of regression procedures that some call nonparametric regression.
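Sections 10.4 and 10.5 rest on two facts: for a given f the estimate $\hat{g}(x)$ is a linear combination of the $y_i$, and the tricube weight meets 0 smoothly at 1 whereas the uniform weight does not. The sketch below is an illustration under assumptions, not the authors' code. It computes the coefficients of that linear combination for a univariate locally linear fit and measures the roughness of the fitted curve. The function names, the locally linear degree, the nearest-neighbor bandwidth rule, and the second-difference roughness measure are choices made here for illustration.

```python
import numpy as np

def tricube(u):
    """Tricube weight: reaches 0 smoothly at |u| = 1."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def uniform(u):
    """Uniform (boxcar) weight: drops from 1 to 0 at |u| = 1."""
    return np.where(np.abs(u) < 1.0, 1.0, 0.0)

def operator_row(x, x0, f, weight=tricube):
    """Coefficients l such that the locally linear fit at x0 equals l @ y."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    q = max(3, min(n, int(np.ceil(f * n))))   # points in the neighborhood of x0
    d = np.abs(x - x0)
    h = np.sort(d)[q - 1]                     # distance to the q-th nearest point
    w = weight(d / h)
    X = np.column_stack([np.ones(n), x - x0])
    XtW = (X * w[:, None]).T                  # X'W, shape (2, n)
    coef = np.linalg.solve(XtW @ X, XtW)      # (X'WX)^{-1} X'W
    return coef[0]                            # row that yields the intercept

def smooth_on_grid(x, y, grid, f, weight=tricube):
    """Fitted values on a grid, each computed as a linear combination of y."""
    return np.array([operator_row(x, g, f, weight) @ y for g in grid])

def roughness(values):
    """Sum of squared second differences of a curve sampled on an even grid."""
    return float(np.sum(np.diff(values, n=2) ** 2))
```

Stacking `operator_row(x, x[i], f)` for i = 1 to n gives an operator matrix of the kind denoted L in the text. Comparing `roughness` for the two weight functions on the same noisy data is one simple way to see the local roughness that the uniform weight introduces.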

Stone (1977), Collomb (1981), Wegman and Wright (1983), and Titterington (1985) reviewed other procedures. Many studies of nonparametric regression focused on asymptotic properties such as consistency, normality, and rates of convergence (e.g., Benedetti 1977; Devlin 1986; Hardle and Gasser 1984; Stone 1977, 1982; Wahba 1979). For example, Stone (1977), using elegant arguments, showed the asymptotic consistency of a wide class of nonparametric estimates.

One well-known nonparametric regression procedure is smoothing splines (Henderson 1924; Reinsch 1967; Silverman 1985; Wahba 1978; Whittaker 1923). Splines have an attractive property: They are the solution to an intuitively appealing mathematical criterion. Another attractive property is that they have a Bayesian interpretation (Wahba 1978; Whittaker 1923). [Weerahandi and Zidek (1985) provided a Bayesian interpretation for univariate locally weighted regression with a particular weight function.] But splines also have some unattractive properties. First, they optimize a global criterion and are not generally local. [Although, as Silverman (1985) pointed out, when n is large and the amount of smoothing is neither large nor small, spline methods behave, to a good approximation, as smoothing by local fitting with a weight function with exponential decay; thus splines are nearly local in this case.] A second unattractive property is that because splines arise as the result of an optimization, it can be difficult to determine how they operate on the data. In contrast, the operational characteristics of local-fitting procedures are easier to fathom because they are defined directly. For example, because of its definition, one knows that the locally weighted regression estimate, $\hat{g}(x)$, is determined by 100f% of the data at each x, for any n and for any configuration of the $x_i$ (except when ties in the $x_i$ leave more than 100f% of the data at a particular point). It is considerably more difficult to determine the effective bandwidth of a spline estimate at x (Silverman 1984). In many cases this is only possible by numerically working out the coefficients of the linear combination of $y_i$ that make up the estimate.

The most serious problem with splines is computational. Although fast O(n) algorithms exist for one independent variable (Silverman 1985; Whittaker 1923), fitting "thin plate" splines to two or more independent variables is an O(n^3) computation (Wahba 1984). The expected computation time of a loess estimate at a single value of x is O(n). For a fixed value of f (i.e., a fixed number of degrees of freedom of the fit), the number of points at which one needs to compute $\hat{g}$ to characterize it for practical applications is fixed: By using blending functions and k-d trees, local-fitting computations in practice can be kept to O(n) time (Cleveland et al. 1988) and are thus feasible even in computing environments that do not have fast, powerful processors. Note that this strategy is not available in spline smoothing, because one cannot get $\hat{g}$ at a single value of x without carrying out the full optimization. Thus another strategy that has been employed for splines is to solve an altered optimization that requires less computation and that yields a solution close to the original one when n is large (Wahba 1984). But the computing is still substantial and complex, and many questions remain (Silverman 1985).

Two popular methods for choosing the smoothing parameter in spline-fitting are cross-validation (Stone 1974) and generalized cross-validation (Craven and Wahba 1979). Unfortunately, users of these methods generally focus exclusively on the mean squared error, which in Section 6 we criticized as too limiting. One exception, however, is the work by Clark (1980). Of course, one could use cross-validation or generalized cross-validation in place of the M statistic to choose the amount of smoothing for locally weighted regression, or one could use the M statistic for splines. That is, these methods for choosing the amount of smoothing are not dependent on the method of smoothing.

[Received September 1986. Revised December 1987.]

REFERENCES

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980), Regression Diagnostics, New York: John Wiley.
Benedetti, J. K. (1977), "On the Nonparametric Estimation of Regression Functions," Journal of the Royal Statistical Society, Ser. B, 39, [...].
Bloomfield, P. (1976), Fourier Analysis of Time Series: An Introduction, New York: John Wiley.
Breiman, L., and Friedman, J. H. (1985), "Estimating Optimal Transformations for Correlation and Regression," Journal of the American Statistical Association, 80, 580-598.
Brinkman, N. D. (1981), "Ethanol Fuel-A Single-Cylinder Engine Study of Efficiency and Exhaust Emissions," SAE Transactions, 90, No. [...].
Bruntz, S. M., Cleveland, W. S., Kleiner, B., and Warner, J. L. (1974), "The Dependence of Ambient Ozone on Solar Radiation, Wind, Temperature, and Mixing Height," in Symposium on Atmospheric Diffusion and Air Pollution, Boston: American Meteorological Society, pp. 125-128.
Buta, R. (1987), "The Structure and Dynamics of Ringed Galaxies, III: Surface Photometry and Kinematics of the Ringed Nonbarred Spiral NGC 7531," The Astrophysical Journal Supplement Series, 64, 1-37.
Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. (1983), Graphical Methods for Data Analysis, Monterey, CA: Wadsworth.
Clark, R. M. (1980), "Calibration, Cross-Validation, and Carbon-14, II," Journal of the Royal Statistical Society, Ser. A, 143, 177-194.
Cleveland, W. S. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, 74, 829-836.
Cleveland, W. S., Devlin, S. J., and Grosse, E. (1988), "Regression by Local Fitting: Methods, Properties, and Computational Algorithms," Journal of Econometrics, 37, 87-114.
Collomb, G. (1981), "Estimation Non-Parametrique de la Regression: Revue Bibliographique," International Statistical Review, 49, 75-93.
Cook, R. D., and Weisberg, S. (1982), Influence and Residuals in Regression, New York: Chapman & Hall.
Craven, P., and Wahba, G. (1979), "Smoothing Noisy Data With Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation," Numerische Mathematik, 31, 377-403.
Daniel, C., and Wood, F. (1971), Fitting Equations to Data, New York: [...].
Davies, O. L. (ed.) (1957), Statistical Methods in Research and Production (3rd ed.), New York: Hafner Press.
Devlin, S. J. (1986), "Locally-Weighted Multiple Regression: Statistical Properties and Its Use to Test for Linearity," technical memorandum, Bell Communications Research, Piscataway, NJ.
Fowlkes, E. B. (1986), "Some Diagnostics for Binary Logistic Regression Via Smoothing" (with discussion), in Proceedings of the Statistical Computing Section, American Statistical Association, pp. 54-64.
Friedman, J. H., and Stuetzle, W. (1981), "Projection Pursuit Regression," Journal of the American Statistical Association, 76, 817-823.
Hardle, W., and Gasser, T. (1984), "Robust Non-Parametric Function Fitting," Journal of the Royal Statistical Society, Ser. B, 46, 42-51.
Hastie, T. J., and Tibshirani, R. J. (1986), "Generalized Additive Models," Statistical Science, 1, 297-318.

Henderson, R. (1924), "A New Method of Graduation," Transactions of the Actuarial Society of America, 25, 29-40.
Huber, P. (1985), "Projection Pursuit" (with discussion), The Annals of Statistics, 13, 435-525.
Kendall, M., and Stuart, A. S. (1977), The Advanced Theory of Statistics (Vol. 1, 4th ed.), New York: Macmillan.
Landwehr, J. M. (1983), "Using Partial Residual Plots to Detect Nonlinearity," technical memorandum, AT&T Bell Laboratories, Murray Hill, NJ.
Larsen, W. A., and McCleary, S. J. (1972), "The Use of Partial Residual Plots in Regression Analysis," Technometrics, 14, 781-790.
Macaulay, F. R. (1931), The Smoothing of Time Series, New York: National Bureau of Economic Research.
Mallows, C. L. (1966), "Choosing a Subset Regression," unpublished paper presented at the annual meeting of the American Statistical Association, Los Angeles.
Mallows, C. L. (1973), "Some Comments on Cp," Technometrics, 15, 661-675.
Priestley, M. B., and Chao, M. T. (1972), "Non-parametric Function Fitting," Journal of the Royal Statistical Society, Ser. B, 34, 385-392.
Reinsch, C. (1967), "Smoothing by Spline Functions," Numerische Mathematik, 10, 177-183.
Rodriguez, R. N. (1985), "A Comparison of the ACE and MORALS Algorithms in an Application to Engine Exhaust Emissions Modeling," in Computer Science and Statistics: Proceedings of the Sixteenth Symposium on the Interface, ed. L. Billard, New York: North-Holland, pp. 159-167.
Silverman, B. W. (1984), "Spline Smoothing: the Equivalent Variable Kernel Method," The Annals of Statistics, 12, 898-916.
Silverman, B. W. (1985), "Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting" (with discussion), Journal of the Royal Statistical Society, Ser. B, 47, 1-52.
Stone, C. J. (1977), "Consistent Nonparametric Regression," The Annals of Statistics, 5, 595-620.
Stone, C. J. (1982), "Optimal Rates of Convergence for Nonparametric Regression," The Annals of Statistics, 10, 1040-1053.
Stone, M. (1974), "Cross-Validatory Choice and Assessment of Statistical Predictions" (with discussion), Journal of the Royal Statistical Society, Ser. B, 36, 111-147.
Titterington, D. M. (1985), "Common Structure of Smoothing Techniques in Statistics," International Statistical Review, 53, 141-170.
Wahba, G. (1978), "Improper Priors, Spline Smoothing, and the Problem of Guarding Against Model Errors in Regression," Journal of the Royal Statistical Society, Ser. B, 40, 364-372.
Wahba, G. (1979), "Convergence Rates of 'Thin Plate' Smoothing Splines When the Data Are Noisy," in Smoothing Techniques for Curve Estimation, eds. T. Gasser and M. Rosenblatt, Berlin: Springer-Verlag, pp. 233-245.
Wahba, G. (1984), "Cross-Validated Spline Methods for the Estimation of Multivariate Functions From Data on Functionals," in Statistics: An Appraisal, eds. H. A. David and H. T. David, Ames: Iowa State University Press, pp. 205-235.
Watson, G. S. (1964), "Smooth Regression Analysis," Sankhya, Ser. A, 26, 359-372.
Weerahandi, S., and Zidek, J. V. (1985), "Smoothing Locally Smooth Processes by Bayesian Nonparametric Methods," Technical Report 26, University of British Columbia, Dept. of Statistics.
Wegman, E. J., and Wright, I. W. (1983), "Splines in Statistics," Journal of the American Statistical Association, 78, 351-365.
Whittaker, E. T. (1923), "On a New Method of Graduation," in Proceedings of the Edinburgh Mathematical Society (Vol. 41), pp. 63-75.
Young, F. W., DeLeeuw, J., and Takane, Y. (1976), "Regression With Qualitative and Quantitative Variables: An Alternating Least-Squares Method With Optimal Scaling Features," Psychometrika, 41, 505-529.

