
EARTHSC/ENVIRSC/GEOG 4GA3

Applied Spatial Statistics
Area Data Analysis IV

Patrick DeLuca, MA, GISP
November 3, 2015

Exploring 2nd Order Effects

Do attributes in neighbouring zones show spatial dependency? i.e. do they covary?

Spatial Autocorrelation
  Involves correlation between values of the same variable at different spatial locations
  It is conceptually and empirically the two-dimensional equivalent of redundancy.

Spatial Autocorrelation

Null hypothesis: No spatial autocorrelation
  Values observed at a location do not depend on values observed at neighbouring locations
  Observed spatial pattern of values is equally likely as any other spatial pattern

Alternative Hypotheses of SA

Positive Spatial Autocorrelation
  Like values tend to cluster in space
  Neighbours are similar

Negative Spatial Autocorrelation
  Neighbours are dissimilar
  Checkerboard pattern

Spatial Autocorrelation

Why is spatial autocorrelation important?
  Most statistics are based on the assumption that the values of observations in each sample are independent
  If the observations, however, are spatially correlated in some way, the estimates obtained will be biased and overly precise.
    Biased: the areas with higher concentration of events will have a greater impact on the model estimate
    Overestimated precision: since events tend to be concentrated, there are actually fewer independent observations than are being assumed

Indices of Spatial Correlation

Most common global approaches
  Join Count Statistics
  Moran's I
  Getis-Ord General G

Local Approaches
  Local Moran's I
  Local Getis Statistic

Join Count Statistics

Applied to binary variables mapped as two colours (Black and White) such that a join, or edge, is classified as either WW (0-0), BB (1-1) or BW (1-0)
Interested in number of occurrences of each possible join between neighbouring cells
Can show
  Positive spatial autocorrelation (clustering) if the number of BW joins is significantly lower than what we would expect by chance
  Negative spatial autocorrelation (dispersion) if the number of BW joins is significantly higher than what we would expect by chance
  Null spatial autocorrelation if the number of BW joins is the same as expected
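A minimal sketch of how the three join types might be counted on a small binary grid, assuming rook contiguity (shared edges only). The function name `join_counts` and the two example grids are illustrative assumptions, not taken from the slides.

```python
def join_counts(grid):
    """Count BB, WW and BW joins on a binary grid under rook contiguity.

    grid is a list of rows; 1 = black, 0 = white. Each join (shared edge)
    is counted once by looking only at the right and lower neighbour.
    """
    rows, cols = len(grid), len(grid[0])
    counts = {"BB": 0, "WW": 0, "BW": 0}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):  # right neighbour, lower neighbour
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    a, b = grid[r][c], grid[rr][cc]
                    if a == 1 and b == 1:
                        counts["BB"] += 1
                    elif a == 0 and b == 0:
                        counts["WW"] += 1
                    else:
                        counts["BW"] += 1
    return counts


# Two made-up 4 x 4 grids: a checkerboard and a grid split into two halves.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
halves = [[1, 1, 0, 0] for _ in range(4)]

print(join_counts(checkerboard))  # {'BB': 0, 'WW': 0, 'BW': 24}
print(join_counts(halves))        # {'BB': 10, 'WW': 10, 'BW': 4}
```

The checkerboard has only BW joins (dispersion), while the split grid has very few (clustering).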

Join Count Statistics

Observed join counts for the three example grids

Grid   BB   WW   BW   TOTAL
1       0    0   24   24
2       5    7   12   24
3      10   10    4   24

Key is the BW
  O_BW = E_BW: random
  O_BW ≠ E_BW: not random
  O_BW > E_BW: more dispersed
  O_BW < E_BW: more clustered

Join Count Statistics

Expected values under free sampling
  Free (or normal) sampling is used when you can determine the probability of an area being black or white
  $J_{BB}^{E} = k p_B^2 = 6$
  $J_{WW}^{E} = k p_W^2 = 6$
  $J_{BW}^{E} = 2 k p_B p_W = 12$
    k = total number of joins (= 24)
    p_B = probability of being coded black (= 0.5)
    p_W = probability of being coded white (= 0.5)

Join Count Statistics

Standard Deviations
  Need to compute the total set of all possibilities
  Given by

$$m = \frac{1}{2} \sum_{i=1}^{n} k_i (k_i - 1) = 52$$

$$\sigma_{BB} = \sqrt{k p_B^2 + 2 m p_B^3 - (k + 2m) p_B^4} = 3.32$$

$$\sigma_{BW} = \sqrt{2 (k + m) p_B p_W - 4 (k + 2m) p_B^2 p_W^2} = 2.45$$

  where k_i is the number of joins of zone i
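The free-sampling quantities can be checked numerically. The sketch below assumes the example is a 4 x 4 grid with rook joins (which gives k = 24) and p_B = p_W = 0.5. The function names, and the reading of each standard deviation as the square root of the expression on the slide, are assumptions.

```python
import math


def rook_join_degrees(rows, cols):
    """Return k_i, the number of rook joins of every cell in a rows x cols grid."""
    degrees = []
    for r in range(rows):
        for c in range(cols):
            # Booleans add up as 0/1: one join per existing neighbour.
            degrees.append((r > 0) + (r < rows - 1) + (c > 0) + (c < cols - 1))
    return degrees


def free_sampling_z(observed_bb, observed_bw, degrees, p_b):
    """Expected counts, standard deviations and z-scores under free sampling."""
    p_w = 1.0 - p_b
    k = sum(degrees) // 2                        # every join is shared by two cells
    m = sum(d * (d - 1) for d in degrees) // 2   # m = 1/2 * sum of k_i (k_i - 1)
    e_bb = k * p_b ** 2
    e_bw = 2 * k * p_b * p_w
    sd_bb = math.sqrt(k * p_b ** 2 + 2 * m * p_b ** 3 - (k + 2 * m) * p_b ** 4)
    sd_bw = math.sqrt(2 * (k + m) * p_b * p_w
                      - 4 * (k + 2 * m) * p_b ** 2 * p_w ** 2)
    return {
        "k": k, "m": m, "E_BB": e_bb, "E_BW": e_bw,
        "sd_BB": sd_bb, "sd_BW": sd_bw,
        "z_BB": (observed_bb - e_bb) / sd_bb,
        "z_BW": (observed_bw - e_bw) / sd_bw,
    }


# A checkerboard pattern: no BB joins, all 24 joins are BW.
result = free_sampling_z(observed_bb=0, observed_bw=24,
                         degrees=rook_join_degrees(4, 4), p_b=0.5)
for name, value in result.items():
    print(name, round(value, 2))
```

For the checkerboard this gives k = 24, m = 52, standard deviations of about 3.32 and 2.45, and z-scores of about -1.81 (BB) and 4.90 (BW).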

Join Count Statistics

Z-Scores for the three example grids

Join Type   Grid 1   Grid 2   Grid 3
BB          -1.81    -0.30     1.20
WW          -1.81     0.30     1.20
BW           4.90     0.00    -3.27


Join Count Statistics

Contiguity Matrix
  Used Rook's Case
    Total Joins = 214/2 = 107
  This one here is Queen's Case
    Total Joins = 218/2 = 109

O'Sullivan and Unwin, 2003

Join Count Statistics

Obama won
  59,647,121 votes, p(Obama) = 0.511
  303 electoral votes, p_e(Obama) = 0.595
Romney won
  57,022,021 votes, p(Romney) = 0.489
  206 electoral votes, p_e(Romney) = 0.405

k = 107
m = 421

Join Count Statistics

Obama-Obama joins = 33.5
Romney-Romney joins = 40
Obama-Romney joins = 33.5

Join Count Statistics

P based on Votes

Join Type        Measured   Estimated   Std. Dev   Z-Score
Obama-Obama      33.5       27.94       8.694       0.640
Romney-Romney    40         25.586      8.353       1.726
Obama-Romney     33.5       53.474      3.066      -6.514

P based on Electoral votes

Join Type        Measured   Estimated   Std. Dev   Z-Score
Obama-Obama      33.5       37.881      9.813      -0.446
Romney-Romney    40         17.551      6.925       3.242
Obama-Romney     33.5       51.569      15.592     -1.133

Join Count Statistics

US Election using Non-Free Sampling
  Used when there is no a priori knowledge of what should be B or W.
  Different method to compute E_BW and σ_BW

$$E_{BW} = \frac{2 J B W}{N (N - 1)}$$

  J = total number of joins, B = # Black, W = # White

$$\sigma_{BW} = \sqrt{E_{BW} + \frac{2J(2J - 1) B W}{N (N + 1)} + \frac{4 [J(J - 1) - 2J(2J - 1)] B (B - 1) W (W - 1)}{N (N - 1)(N - 2)(N - 3)} - E_{BW}^2}$$

$$Z_b = \frac{O_{BW} - E_{BW}}{\sigma_{BW}} = \frac{33.5 - 53.5}{5.888} = -3.4$$

Join Count Statistics

Limitations
  The join count statistic can only be used on binary data.
    But a lot of data can be transformed into binary.
    e.g. rainfall data can easily be converted into "wet" or "dry" regions by determining those areas above or below the mean.
  Equations for determining the standard deviations are reasonably complex
    Easy to make a mistake when implementing them
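The rainfall example can be sketched in a few lines. The rainfall figures below are invented for illustration, and coding "above the mean" as 1 (wet) is an assumption about how the split would be made.

```python
def to_binary(values):
    """Code each value 1 if it lies above the mean of all values, else 0."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


# Hypothetical rainfall totals (mm) for six regions.
rainfall_mm = [12.0, 30.5, 8.2, 45.1, 22.4, 5.0]
wet = to_binary(rainfall_mm)
print(wet)  # [0, 1, 0, 1, 1, 0]
```

The resulting 0/1 codes can then be treated as the white/black classes of a join count analysis.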

Moran's I

One of the oldest indicators of spatial autocorrelation (Moran, 1950). The de facto standard for determining spatial autocorrelation
Applied to zones or points with continuous variables associated with them.
Compares the value of the variable at any one location with the value at all other locations, using a spatial weights matrix W

Moran's I

Behaves in a manner similar to Pearson's correlation coefficient

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\sum_{i=1}^{n} (z_i - \bar{z})^2 \sum_{i \neq j} w_{ij}}$$

Values bounded by -1 to +1
  -ve has a checkerboard pattern
  0 is uncorrelated
  +ve is clustered (no distinction between high and low values)
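As a rough illustration of the formula, the sketch below computes Moran's I for six zones arranged in a row, assuming binary weights (1 for adjacent zones, 0 otherwise). The helper `chain_weights` and the data values are made up for the example.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(weights[i][j] for i, j in pairs)                   # sum of the weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    return n * cross / (sum(d * d for d in dev) * s0)


def chain_weights(n):
    """Binary weights for n zones in a row: only adjacent zones are neighbours."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]
alternating = [1, 5, 1, 5, 1, 5]

print(round(morans_i(clustered, w), 3))    # 0.6
print(round(morans_i(alternating, w), 3))  # -1.0
```

Like values next to each other give a positive I, while the strictly alternating (checkerboard-like) sequence gives a strongly negative I.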

Moran's I

Positive Spatial Autocorrelation
  I > -1/(n-1)
  Spatial clustering of high and/or low values

Negative Spatial Autocorrelation
  I < -1/(n-1)
  Checkerboard pattern

Moran's I

Assessing significance using the Normal approximation method
  Null hypothesis states that values represent one of many possible samples of values
  If you randomly select values to distribute across your study area, most of the time it would produce a pattern and distribution of values that would not be markedly different from the observed pattern
  Assuming that your data and its arrangement are one of many, many possible random samples

Moran's I

Assessing significance using the Normal approximation method
  Empirical distribution can be compared to the theoretical distribution through Z test

$$Z(I) = \frac{I - E(I)}{SE(I)}$$

$$SE(I) = \sqrt{\frac{N^2 \sum_{ij} w_{ij}^2 + 3 \left( \sum_{ij} w_{ij} \right)^2 - N \sum_{i} \left( \sum_{j} w_{ij} \right)^2}{(N^2 - 1) \left( \sum_{ij} w_{ij} \right)^2}}$$
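A hedged sketch of the Z test follows. It assumes the standard error on this slide reads SE(I) = sqrt([N² Σ w_ij² + 3 (Σ w_ij)² - N Σ_i (Σ_j w_ij)²] / [(N² - 1)(Σ w_ij)²]) and that E(I) = -1/(N-1). The function names, the six-zone chain and the data are illustrative assumptions, and the weights matrix is assumed to have a zero diagonal.

```python
import math


def morans_i(values, weights):
    """Moran's I; assumes the weights matrix has a zero diagonal."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * cross / (s0 * sum(d * d for d in z))


def normal_approx_z(values, weights):
    """Return observed I, E(I), SE(I) and the z-score."""
    n = len(values)
    s0 = sum(sum(row) for row in weights)                   # sum of all w_ij
    sum_w_sq = sum(w * w for row in weights for w in row)   # sum of w_ij squared
    sum_row_sq = sum(sum(row) ** 2 for row in weights)      # sum_i (sum_j w_ij)^2
    expected = -1.0 / (n - 1)
    se = math.sqrt((n ** 2 * sum_w_sq + 3 * s0 ** 2 - n * sum_row_sq)
                   / ((n ** 2 - 1) * s0 ** 2))
    observed = morans_i(values, weights)
    return observed, expected, se, (observed - expected) / se


# Six zones in a row; adjacent zones are neighbours (binary weights).
weights = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
observed, expected, se, z = normal_approx_z([1, 1, 1, 5, 5, 5], weights)
print(round(observed, 3), round(expected, 3), round(se, 3), round(z, 2))
```

With so few zones the normal approximation is only indicative; the z-score is compared with the usual critical values (for example 1.96 for a two-sided test at the 5% level).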

Moran's I

Normal Approximation example

Moran's I

Assessing significance using random permutations
  Suppose we have n values y_i relating to our study area
  Then n! permutations of the map are possible, each corresponding to a different arrangement of the n data values
  The value of I can be calculated for any of these permutations, so we can create an empirical distribution for possible values of I under random permutations of the n data values.
  Plot the distribution of these permutations and compare our observed value to the distribution

Moran's I

Random permutation example

Moran Scatterplots

Linear association between value at i and weighted average of neighbours
Four quadrants
  High-High, Low-Low = spatial clusters
  High-Low, Low-High = spatial outliers
What can be read off of this graph
  Slope = Moran's I
  Outliers
  High leverage points
  Spatial regimes
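The claim that the slope equals Moran's I can be checked numerically. The sketch below assumes row-standardized weights (each row sums to 1), so that the spatial lag is the average of the neighbours' deviations; the data, the helper names and the convention of labelling a zero deviation as "H" are assumptions.

```python
def row_standardize(weights):
    """Divide every row by its sum so the lag becomes a neighbour average."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def moran_scatter(values, weights):
    """Return deviations, their spatial lags, the fitted slope and quadrant labels."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w = row_standardize(weights)
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    # OLS slope of lag on z; the z values have mean zero, so this is sum(z*lag)/sum(z*z).
    slope = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
    quadrants = [("H" if a >= 0 else "L") + ("H" if b >= 0 else "L")
                 for a, b in zip(z, lag)]
    return z, lag, slope, quadrants


def morans_i(values, weights):
    """Moran's I; assumes the weights matrix has a zero diagonal."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * cross / (s0 * sum(d * d for d in z))


binary = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
values = [2, 1, 3, 6, 5, 7]
z, lag, slope, quadrants = moran_scatter(values, binary)
global_i = morans_i(values, row_standardize(binary))
print(round(slope, 4), round(global_i, 4), quadrants)
```

The two printed numbers agree, and the quadrant labels (HH, LL, HL, LH) separate cluster members from potential spatial outliers.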

Correlograms

Using Moran's I to produce correlogram
  Use proximity matrix W_k, where k is lag
Visualization
  Spatial autocorrelation statistics for increasing lag
Interpretation
  Identification of spatial process
  Range of association
  Possible indication of misspecified spatial weights

Correlograms

Malczewski, 2009

Correlograms

Correlogram of Respiratory Disease, Hamilton, 2008

Figure: Moran's I (vertical axis, 0 to 0.7) against Spatial Lag (horizontal axis)

Correlograms

Malczewski, 2009

Correlograms

Correlogram & Partial Correlogram of Respiratory Disease

Figure: Moran's I (vertical axis, -0.2 to 0.7) against Spatial Lag (horizontal axis); series: Correlogram, Partial Correlogram

Getis-Ord General G

Measures how concentrated the high or low values are for a given study area.
The null hypothesis: "there is no spatial clustering of the values".
  Significant and positive Z-scores indicate high values cluster
  Significant and negative Z-scores indicate low values cluster
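A small sketch of the General G statistic itself, assuming the usual form G = Σ w_ij x_i x_j / Σ x_i x_j (both sums over i ≠ j) with expected value S0 / (n(n-1)), where S0 is the sum of the weights. The variance needed for the Z-score is omitted, so the sketch only compares G with its expectation. The data are invented, and the statistic assumes non-negative values.

```python
def general_g(values, weights):
    """Return the General G statistic and its expected value under the null."""
    n = len(values)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    num = sum(weights[i][j] * values[i] * values[j] for i, j in pairs)
    den = sum(values[i] * values[j] for i, j in pairs)
    s0 = sum(weights[i][j] for i, j in pairs)
    return num / den, s0 / (n * (n - 1))


weights = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]

g_high, expected = general_g([9, 8, 9, 1, 2, 1], weights)   # high values side by side
g_mixed, _ = general_g([9, 1, 9, 1, 9, 1], weights)         # high values kept apart

print(round(g_high, 3), round(g_mixed, 3), round(expected, 3))
```

G above its expectation points towards clustering of high values; G below it points towards high values being kept apart or low values clustering.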

Global and Local Autocorrelation

Global
  One statistic to summarize pattern
  Informs if clustering exists in the data

Local
  Location-specific statistics
  Shows us where the clusters are located

LISA Statistics

Local Indicator of Spatial Association
Satisfies two requirements:
  Indicates significant spatial clustering for each location
  Sum of LISA proportional to a global indicator of spatial association

LISA forms of global statistics
  Local Moran's I
  Local Getis-Ord Gi*
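The second requirement can be illustrated with Local Moran's I. The sketch assumes the common definition I_i = (z_i / m2) Σ_j w_ij z_j with m2 = Σ z_i² / n, under which the local values sum to S0 times the global I (S0 being the sum of the weights). The data and names are illustrative.

```python
def local_morans_i(values, weights):
    """Local Moran's I for every zone."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    return [z[i] / m2 * sum(weights[i][j] * z[j] for j in range(n) if j != i)
            for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I; assumes the weights matrix has a zero diagonal."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return n * cross / (s0 * sum(d * d for d in z))


weights = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
values = [1, 1, 1, 5, 5, 5]

local = local_morans_i(values, weights)
s0 = sum(sum(row) for row in weights)
print([round(v, 2) for v in local])
print(round(sum(local), 6), round(s0 * morans_i(values, weights), 6))
```

The two numbers on the last line match, and the individual I_i values show which zones contribute most to the global statistic.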

LISA Statistics

Use:
  Identify hot spots
    Significant local clusters in absence of global association
    Significant local outliers
      High surrounded by low and vice versa
  Indicate local instability
    Local deviations from global pattern of spatial association

Local Moran

Interpretation
  Assessing lack of spatial randomness
  Suggests significant spatial structure
  Suggests interesting locations
  Does not explain them

Getis-Ord Gi*

Local version of Getis-Ord G
  The local sum is compared proportionally to the sum of all features
  When the local sum is very different from the expected local sum, and that difference is too large to be the result of random chance, a statistically significant Z-score results.
    Significant +ve Z-scores: the larger the Z-score is, the more intense the clustering of high values
    Significant negative Z-scores: the smaller the Z-score is, the more intense the clustering of low values.
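A hedged sketch of the Gi* Z-score, assuming the widely used form Gi* = (Σ_j w_ij x_j - x̄ Σ_j w_ij) / (S · sqrt((n Σ_j w_ij² - (Σ_j w_ij)²) / (n - 1))), where S is the population standard deviation and zone i counts as its own neighbour. The data and the binary chain weights are made up for illustration.

```python
import math


def gi_star(values, weights):
    """Getis-Ord Gi* z-score for every zone."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = list(weights[i])
        w[i] = 1                                    # the "star": include zone i itself
        sum_w = sum(w)
        sum_w_sq = sum(x * x for x in w)
        local_sum = sum(wj * xj for wj, xj in zip(w, values))
        denom = s * math.sqrt((n * sum_w_sq - sum_w ** 2) / (n - 1))
        scores.append((local_sum - mean * sum_w) / denom)
    return scores


weights = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
scores = gi_star([9, 8, 9, 1, 2, 1], weights)
print([round(z, 2) for z in scores])
```

Zones in the middle of the high values get positive scores (hot spots) and zones in the middle of the low values get negative scores (cold spots).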


Modeling Area Data

Consider a multiple regression equation

  y_i = a + b_1 x_1 + b_2 x_2 + ... + b_n x_n + e_i

  y_i = dependent variable
  x_1, x_2 ... x_n = independent variables
  a = constant (intercept)
  b_1, b_2 ... b_n = regression coefficients
  e_i = error term (residual, or difference between predicted and observed values of y_i)

Regression Analysis: Assumptions

No multicollinearity: there is no intercorrelation of independent variables
Normality: error terms (e_i) are normally distributed, and the mean of the error term is 0
Homoskedasticity (equal variance): the residuals are dispersed randomly throughout the range of the estimated dependent variable
Spatial independence: there is no spatial autocorrelation of the residuals

Example: Average Age of Death

What explains average age of death?
Variables that were statistically significant in a bivariate regression

Variable    t-test
Dwell       5.04
MedInc      4.41
LICO_All    3.97
NoEdu       5.31
Univ        5.53
DropOut     4.02
Seniors     6.47

Example: Cardiac Admissions

Analysis of Residuals

Multicollinearity: condition number > 30 is problematic
Jarque-Bera: tests joint hypothesis of skewness = 0 and excess kurtosis = 0
  Is the data consistent with having skewness and excess kurtosis equal to 0?
  When p > 0.05 it is consistent with 0 skew and 0 excess kurtosis
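The Jarque-Bera statistic is simple enough to sketch directly. The version below assumes the textbook form JB = n/6 · (S² + (K - 3)²/4), compared with a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-JB/2). The residual vectors are invented, and the chi-square reference is asymptotic, so p-values for samples this small are only indicative.

```python
import math


def jarque_bera(residuals):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2                 # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)             # chi-square upper tail, 2 d.f.
    return jb, p_value


symmetric = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
skewed = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]

jb_sym, p_sym = jarque_bera(symmetric)
jb_skew, p_skew = jarque_bera(skewed)
print(round(jb_sym, 3), round(p_sym, 3))
print(round(jb_skew, 2), p_skew < 0.05)
```

The symmetric residuals give a large p-value (consistent with normality), while the heavily skewed residuals give a p-value far below 0.05.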

Analysis of Residuals

Breusch-Pagan
  Tests null hypothesis that the error variances are all equal vs the alternative that the error variances are a multiplicative function of one or more variables
  Alt. hyp. states that the error variances increase or decrease as the predicted values of y increase or decrease
  P < 0.05 indicates heteroskedasticity
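For intuition, the sketch below implements the studentized (Koenker) variant of the test for a regression with a single predictor: the squared OLS residuals are regressed on x, and LM = n · R² is compared with a chi-square distribution with 1 degree of freedom. Using this variant instead of the original Breusch-Pagan form is an assumption, as are the deterministic toy data and all names.

```python
import math


def simple_ols(x, y):
    """Intercept, slope and residuals of y regressed on a single predictor x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, residuals


def r_squared(x, y):
    """Share of the variance of y explained by the regression on x."""
    _, _, residuals = simple_ols(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def koenker_bp(x, y):
    """LM statistic and p-value for variance that changes with x."""
    _, _, residuals = simple_ols(x, y)
    squared = [e * e for e in residuals]
    lm = max(len(x) * r_squared(x, squared), 0.0)   # guard against rounding below 0
    p_value = math.erfc(math.sqrt(lm / 2))          # chi-square upper tail, 1 d.f.
    return lm, p_value


x = [float(v) for v in range(1, 21)]
sign = [1 if int(v) % 2 == 0 else -1 for v in x]
y_constant = [2 + a + s * 1.0 for a, s in zip(x, sign)]    # same spread everywhere
y_growing = [a + s * 0.3 * a for a, s in zip(x, sign)]     # spread grows with x

print(koenker_bp(x, y_constant)[1] > 0.05)
print(koenker_bp(x, y_growing)[1] < 0.05)
```

A small p-value rejects the null of equal error variances, which is the signal of heteroskedasticity.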

Analysis of Residuals

Are errors independent?
  Map of residuals

Myocardial Infarction Example

OLS Output

Suggests non-normality
Suggests heteroskedasticity
Suggests spatial autocorrelation

Modeling Area Data

What is the conclusion regarding this model?
  Model shows error autocorrelation

Two types of models possible based on the two primary types of spatial dependence
  Spatial Error Model
  Spatial Lag Model

Modeling Spatial Dependence

Spatial error
  Observations interdependent through unmeasured variables that are correlated across space OR measurement error that is correlated with space
    Arises because we cannot model all the facets of a geographical region that may influence all nearby locations
    May also arise from boundaries that are not perfect measures

Diagram: X_i, X_j, Y_i, Y_j

Modeling Spatial Dependence

Spatial error
  Theoretically possible to eliminate this type of spatial dependence with proper explanatory variables and correct boundaries of observations
  Space matters only in the error process, not in the substantive portion of the model
  Assumption of uncorrelated error terms is violated
    Indicative of omitted (spatially correlated) covariates

Modeling Spatial Dependence

Spatial Lag
  Dependent variable is affected by the values of the dependent variable in nearby places
    E.g. land value in a census tract (CT) is a function of land value in nearby CTs, not just related to common unmeasured variables
  Assumption of uncorrelated error terms is violated
  Assumption of independent observations is violated

Diagram: X_i, X_j, Y_i, Y_j

Analysis of Residuals

LM-Lag and Robust LM-Lag
  Pertain to Spatial Lag model as alternative
  Robust: tests for lag dependence in presence of missing error

LM-Error and Robust LM-Error
  Pertain to Spatial Error model as alternative
  Robust: tests for error dependence in presence of missing lag

Analysis of Residuals

From: Anselin 2005

Spatial Lag Model Results

Regression Diagnostics

No obvious pattern in residuals
No funnel-like pattern, no increase/decrease, suggests homoskedasticity

Final Lag Model Specification

Spatial Lag Model in notation form

  $y = a + \rho W y + X\beta + \varepsilon$

  Myocardial ~ 0.86 + 0.69 W(Myocardial) + 0.31 (Jarman Score) + ε
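To see what the lag term implies, the sketch below solves y = a + ρWy + βx + ε for a toy map of five zones in a row, using the coefficients quoted on this slide and invented Jarman scores. Solving by repeated substitution (which converges here because |ρ| < 1 and W is row-standardized), the helper names and the zero error terms are all assumptions for illustration.

```python
def row_standardized_chain(n):
    """Weights for n zones in a row; each row sums to 1 over adjacent zones."""
    rows = []
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        rows.append([1.0 / len(neighbours) if j in neighbours else 0.0
                     for j in range(n)])
    return rows


def spatial_lag_y(w, x, a, rho, beta, eps, iterations=200):
    """Solve y = a + rho*W*y + beta*x + eps by repeated substitution."""
    n = len(x)
    base = [a + beta * x[i] + eps[i] for i in range(n)]
    y = list(base)
    for _ in range(iterations):
        y = [rho * sum(w[i][j] * y[j] for j in range(n)) + base[i]
             for i in range(n)]
    return y


w = row_standardized_chain(5)
x = [10.0, 12.0, 9.0, 15.0, 11.0]          # hypothetical Jarman scores
eps = [0.0] * 5                            # error terms switched off for clarity

y = spatial_lag_y(w, x, a=0.86, rho=0.69, beta=0.31, eps=eps)

# Raise x by one unit in the first zone only and see how every zone responds.
x_shock = [x[0] + 1.0] + x[1:]
y_shock = spatial_lag_y(w, x_shock, a=0.86, rho=0.69, beta=0.31, eps=eps)
spillover = [after - before for after, before in zip(y_shock, y)]
print([round(v, 3) for v in spillover])
```

The change in the first zone exceeds the coefficient 0.31 and the neighbouring zones change as well, which is the feedback through Wy that separates the lag model from OLS.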

Hypothetical Error Model Specification

Spatial Error Model in notation form

  $y = a + X\beta + \lambda W \varepsilon + \xi$

  Myocardial ~ 75.99 + 0.25 (Jarman Score) + 0.71 Wε + ξ

Summary of Steps for Modeling

Exploration
  Aspatial: examine dependent variable for normality
    Histogram
    Boxplot
    Normality statistics
  Spatial
    Compute Moran Coefficient Scatterplot and Moran's I to search for evidence of the presence of spatial and/or aspatial outliers
    Can also examine on a local level

Summary of Steps for Modeling

Compute OLS results
Using OLS residuals, compute Moran's I
  If significant autocorrelation is detected in the residuals, then rerun model with a spatial model and estimate the respective parameters
Continue with other regression diagnostics
  Non-normality (Jarque-Bera Test)
  Heteroskedasticity (Breusch-Pagan, Koenker-Bassett)
  Multicollinearity (Condition Number)
  Moran's I for spatial dependence of residuals

Summary of Steps for Modeling

Fit a spatial model only if warranted
Use theory if possible to decide which model to fit; if not possible, use the diagnostics
