
Improved Sampling and Prediction Techniques for Spatial Econometric Models
Chris Frazier
PB Consult Inc.
November 13, 2005
Introduction
Population and land use/cover models are important for a wide range of applications
Travel demand forecasting
Transportation planning and policy decisions
Urban growth modeling
Might expect variables of interest to exhibit:
Spatial correlation
Time-dependence (dynamic effects)
Dependence on land use/cover
Previous work incorporated these elements
into a rigorous modeling framework

Introduction, contd
Previous work required sampling from a very large data set, which led to problems when the model was used to generate predictions
Loss of spatial resolution
Regional effects not fully accounted for

This work attempts to address the sampling problems previously encountered with the model
Model Background: Data
Land-Use/Land-Cover (LULC) data
Derived from LandSat satellite imagery
Covers Austin, TX region
30 m × 30 m resolution
9 LULC classifications: Water, Shrubland, Barren, Herbaceous Natural, Herbaceous Planted, Forest, Fallow, Commercial, Residential
4 data panels: 1983, 1991, 1997, 2000


Model Background: Data, contd
[Figure: data panels for 1983, 1991, 1997, and 2000]
Statistics derived from LULC data
Land mix: measure of homogeneity
Land entropy (balance): measure of heterogeneity
U.S. Census (by Block Group)
Approximations estimated for non-census years
Other spatial variables
Distances to CBD & nearest highway
Combination grid needed to reduce the number of data points (~3 million to ~30,000) and to use Census data in a grid structure
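The aggregation from pixels to grid cells and the entropy statistic can be illustrated with a small sketch. The slides do not give the formulas, so the normalized Shannon entropy used here, the function names, and the square `block` aggregation are all assumptions made for illustration only.

```python
import math
from collections import Counter


def aggregate(raster, block):
    """Group a 2-D list of pixel labels into block x block grid cells.

    ASSUMPTION: square blocks; the slides do not describe the grid layout.
    """
    cells = {}
    for r, row in enumerate(raster):
        for c, label in enumerate(row):
            cells.setdefault((r // block, c // block), []).append(label)
    return cells


def land_entropy(pixels, n_classes=9):
    """Normalized Shannon entropy of the class shares within one grid cell.

    ASSUMPTION: illustrative formula, not taken from the slides.
    Returns 0 when a single class fills the cell and 1 when all
    n_classes classes have equal shares.
    """
    total = len(pixels)
    shares = [n / total for n in Counter(pixels).values()]
    h = -sum(p * math.log(p) for p in shares)
    return h / math.log(n_classes)


# Hypothetical 4 x 4 raster aggregated into 2 x 2-pixel cells.
raster = [
    ["Water", "Water", "Forest", "Fallow"],
    ["Water", "Water", "Forest", "Forest"],
    ["Barren", "Barren", "Forest", "Forest"],
    ["Barren", "Fallow", "Forest", "Forest"],
]
cells = aggregate(raster, block=2)
entropy_by_cell = {key: land_entropy(px) for key, px in cells.items()}
```

With roughly 3 million pixels reduced to roughly 30,000 cells, each cell would hold on the order of 100 pixels; the block size that achieves this in the actual model is not stated in the slides.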
Model Background: Methodology
Panel data linear regression with random effects and spatial autocorrelation
$$Y = X\beta + Z\gamma + \nu + (I - \rho W)^{-1}\varepsilon$$
$$\nu_i \sim \text{Normal}(0, \sigma_\nu^2), \qquad \varepsilon_{it} \sim \text{Normal}(0, \sigma^2)$$
$$W = \begin{pmatrix} 0 & \omega_{12} & \cdots & \omega_{1N} \\ \omega_{12} & 0 & \cdots & \omega_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{1N} & \omega_{2N} & \cdots & 0 \end{pmatrix}, \qquad \omega_{ij} = \left(\text{distance between cell } i \text{ and cell } j\right)^{-2}$$
X = non time lagged variables
Z = time lagged variables
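As a rough illustration of the spatial weight matrix W, the sketch below builds a symmetric matrix with a zero diagonal, assuming each off-diagonal weight is the inverse squared distance between two cells. The function name and the use of cell-center coordinates are assumptions, not details from the slides.

```python
import math


def spatial_weights(coords):
    """Build W with W[i][i] = 0 and W[i][j] = distance(i, j) ** -2.

    ASSUMPTION: `coords` holds hypothetical (x, y) cell centers and
    no two cells share the same location (distance must be non-zero).
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            w[i][j] = w[j][i] = d ** -2  # symmetric by construction
    return w


# Three hypothetical cell centers.
W = spatial_weights([(0.0, 0.0), (0.0, 2.0), (3.0, 4.0)])
```

Because W is dense (every pair of cells gets a weight), its size grows with the square of the number of cells, which is one reason a model of this kind is estimated on samples rather than on the full grid.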
Model Background: Methodology, contd
Lagged variables used to capture dynamics
Different time gaps between lags need to be accounted for
$$x'_{it}\beta + a(t - t')\, z'_{i,t'}\gamma$$
Time Adjustment Factor a estimated along with model coefficients using maximum likelihood estimation

Sampling Strategy
Rather than using random samples, regional
samples are used
These make more sense because one
underlying assumption of spatial econometrics
is that distance matters
25 samples used each 400 400 cells
16 Samples in one layer
9 samples in other layer
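One way to picture the two layers of regional samples is as two tilings of the same grid, so that every cell falls in exactly one sample per layer. The sketch below assumes a 4 × 4 tiling for the 16-sample layer and a 3 × 3 tiling for the 9-sample layer, numbered 1 to 16 and 17 to 25; the actual sample boundaries are not given in the slides, so this layout is hypothetical.

```python
def sample_ids(row, col, n_rows, n_cols):
    """Return the (layer-1, layer-2) sample numbers for one grid cell.

    ASSUMPTION: layer 1 is a 4 x 4 tiling (samples 1-16) and layer 2 is
    a 3 x 3 tiling (samples 17-25), each covering the whole region.
    """
    def tile(k):
        # Index of the k x k tile containing (row, col), counted row by row.
        return (row * k // n_rows) * k + (col * k // n_cols)

    return tile(4) + 1, 16 + tile(3) + 1


# Hypothetical 12 x 12 grid: collect the samples seen in each layer.
layer1 = {sample_ids(r, c, 12, 12)[0] for r in range(12) for c in range(12)}
layer2 = {sample_ids(r, c, 12, 12)[1] for r in range(12) for c in range(12)}
```

Because the two tilings have different boundaries, a cell near the edge of a sample in one layer tends to sit closer to the interior of its sample in the other layer.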
Region Combination Strategy
Each cell is part of two samples (denoted α and β), so model results can be coordinated to create more robust projections:
$$P_i = \alpha_i P_{i,\alpha} + \beta_i P_{i,\beta}$$
First: α and β determined independent of model results
Simple average: $\alpha_i = \beta_i = \tfrac{1}{2}$
Inverse distance: $\alpha_i = \dfrac{D_{i,\alpha}^{-1}}{D_{i,\alpha}^{-1} + D_{i,\beta}^{-1}}, \quad \beta_i = \dfrac{D_{i,\beta}^{-1}}{D_{i,\alpha}^{-1} + D_{i,\beta}^{-1}}$
Inverse distance²: $\alpha_i = \dfrac{D_{i,\alpha}^{-2}}{D_{i,\alpha}^{-2} + D_{i,\beta}^{-2}}, \quad \beta_i = \dfrac{D_{i,\beta}^{-2}}{D_{i,\alpha}^{-2} + D_{i,\beta}^{-2}}$
Second: α and β determined based on predictions of known 2000 data from 1997 data
$$P_{i,2000} = \alpha_i P_{i,\alpha,2000} + \beta_i P_{i,\beta,2000}$$
Inverse error²: $\dfrac{\alpha_i}{\beta_i} = \dfrac{\varepsilon_{i,\beta}^{2}}{\varepsilon_{i,\alpha}^{2}}$, where $\varepsilon_{i,\alpha} = P_{i,2000} - P_{i,\alpha,2000}$ and $\varepsilon_{i,\beta} = P_{i,2000} - P_{i,\beta,2000}$
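The weighting schemes named on this slide (simple average, inverse distance, inverse distance squared, and inverse error squared) can be sketched as follows. The function names are hypothetical, the weights are assumed to sum to one, and the distances and errors are assumed to be non-zero.

```python
def simple_average():
    """Equal weights for the two samples."""
    return 0.5, 0.5


def inverse_distance(d_a, d_b, power=1):
    """Weights proportional to D ** -power (power=2: inverse distance squared).

    ASSUMPTION: d_a and d_b are non-zero distances for the cell in each sample.
    """
    w_a, w_b = d_a ** -power, d_b ** -power
    return w_a / (w_a + w_b), w_b / (w_a + w_b)


def inverse_error_squared(actual, pred_a, pred_b):
    """Weights proportional to 1 / error ** 2 on known data (e.g. year 2000).

    ASSUMPTION: neither prediction matches the actual value exactly.
    """
    e_a, e_b = actual - pred_a, actual - pred_b
    w_a, w_b = e_a ** -2, e_b ** -2
    return w_a / (w_a + w_b), w_b / (w_a + w_b)


def combine(pred_a, pred_b, alpha, beta):
    """Coordinated prediction: alpha * P_a + beta * P_b."""
    return alpha * pred_a + beta * pred_b


# Hypothetical cell: actual 100, sample predictions 90 and 120.
alpha, beta = inverse_error_squared(100.0, 90.0, 120.0)
combined = combine(90.0, 120.0, alpha, beta)
```

In this made-up case the prediction with the smaller error (90) receives four times the weight of the other, since its error is half as large and the weights scale with the inverse square.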
Model Forms/Results
Two variables modeled: Population and % of urban land cover
ln(Population) used to ensure results > 0
% land cover contained in [0,1], so regress
$$\ln\left(\frac{Y_{it}}{1 - Y_{it}}\right)$$
implying
$$Y = \frac{\exp(X\beta + Z\gamma)}{1 + \exp(X\beta + Z\gamma)}$$
Same independent variables from older estimations used
Model results are similar to the older estimations, with expected regional differences
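The two transformations can be checked numerically with a short sketch: the log-odds transform maps a share in (0, 1) to the real line, and its inverse maps any linear predictor back into (0, 1), while exponentiating a predicted ln(Population) always gives a positive value. The function names and the example values are illustrative only.

```python
import math


def logit(y):
    """ln(y / (1 - y)) for a share y strictly between 0 and 1."""
    return math.log(y / (1.0 - y))


def inverse_logit(linear_predictor):
    """exp(v) / (1 + exp(v)), which always lies strictly between 0 and 1."""
    return math.exp(linear_predictor) / (1.0 + math.exp(linear_predictor))


def population_from_log(log_prediction):
    """Back-transform a predicted ln(Population); the result is always > 0."""
    return math.exp(log_prediction)


# Hypothetical values: a 30% urban share and a negative log prediction.
round_trip = inverse_logit(logit(0.3))
small_population = population_from_log(-5.0)
```

Shares of exactly 0 or 1 have no finite log-odds, so such cells would need separate handling; the slides do not say how they were treated.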
[Figure: Old 2005 and Old 2020 Population Predictions; 2000 Actual Population]
[Figure: Old 2005 and Old 2020 % Urban Predictions; 2000 Actual % Urban]
[Figure: 2000 Actual Population; 2000 Simple Average, Inverse Distance, and Inverse Distance Squared]
[Figure: 2000 Actual Population; 2005, 2010, and 2020 Simple Average Coordination]
[Figure: 2000 Actual Population; 2005, 2010, and 2020 Inverse Distance Coordination]
[Figure: 2000 Actual Population; 2005, 2010, and 2020 Inverse Distance Squared Coordination]
[Figure: 2000 Actual Population; 2005, 2010, and 2020 Error Squared Coordination]
[Figure: 2000 Actual Population; 2005 and 2020 Error Squared Coordination; Old 2005 and Old 2020 Population Predictions]
[Figure: 2020 Error Squared Coordination; 2020 Predictions for Cell Groups 1-16 and Cell Groups 17-25]
[Figure: 2000 Actual % Urban; 2000 Simple Average, Inverse Distance, and Inverse Distance Squared]
[Figure: 2000 Actual % Urban; 2005 and 2010 Simple Average Coordination]
[Figure: 2000 Actual % Urban; 2005 and 2010 Inverse Distance Coordination]
[Figure: 2000 Actual % Urban; 2005 and 2010 Inverse Distance Squared Coordination]
[Figure: 2000 Actual % Urban; 2005 and 2010 Error Squared Coordination]
[Figure: 2000 Actual % Urban; 2005 and 2010 Error Squared Coordination; Old 2005 and Old 2010 % Urban Predictions]
[Figure: 2010 Inverse Distance Squared Coordination]
[Figure: 2020 Error Squared Coordination; 2020 Predictions for Cell Groups 1-16 and Cell Groups 17-25]
Conclusions
Smarter sampling strategies can vastly improve model performance while maintaining a similar level of computational effort
The models discussed here now show promise for real-world applications
Extending predictions beyond the time span of the original model can lead to problematic results
How to combine samples is an area that could warrant further research
Thank You!




Questions/Comments?