
RENR 690 Geostatistics Lab

Introduction

This lab is designed as a quick introduction to doing geostatistics in R. Geostatistics is used primarily in the resource industries for estimation and the quantification of uncertainty. Increasingly, these techniques are being applied to environmental datasets.

This lab will cover the basics. The exploratory data analysis section goes over location maps and histograms. The variogram section covers omnidirectional variograms and automatic variogram fitting. The Kriging section covers Ordinary Kriging and cross validation.

The dataset for this lab is a synthetic, 2D example with acid rock drainage data. The concepts apply equally well to ecological data such as pine beetle counts or soil contamination.

Getting Started

This lab uses the geoR package. For more information, please visit the following link:

http://cran.r-project.org/web/packages/geoR/index.html

geoR uses a data format called a geodata object. It has one container for location data and another for variable data. To begin, the data will be imported and loaded as a geodata object.
# Install and Load Package
install.packages("geoR")
library(geoR)

# Load Data
data <- read.csv("2Dtestdata.csv")
fix(data)      # optional: view/edit the data in a spreadsheet window
attach(data)   # makes the columns LocX, LocY and ACID directly available

# Create a "geodata" Object


geodat <- as.geodata(data,coords.col = 1:2, data.col = 3)
summary(geodat)

Exploratory Data Analysis

The first step in a geostatistics workflow is exploratory data analysis. Location maps should be made of all variables. The code below creates a location map with the points color coded by the variable of interest. The variable we are plotting, ACID, ranges from 2 to 8.
# Plot Data
mycol <- seq(8, 2, -0.01)
mycol <- terrain.colors(length(mycol))[rank(mycol)]   # reversed terrain palette

# Map each ACID value (2 to 8) to a palette index
acidcol <- round((ACID - 2.0) * 100.)
acidcol <- mycol[acidcol]

plot(LocY ~ LocX, pch = 16, col = acidcol, xlab = "Easting", ylab = "Northing",
     main = "Data Locations")

Here we can see the data was collected on a regular sampling grid. The extents are 100 meters by 100 meters and the data are spaced at 5 meter intervals. There's a minor trend from high to low in the 135 degree direction.
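
As a quick sketch, we can confirm the spacing from the coordinates themselves (LocX and LocY come from the attached data frame):

# Check the sampling grid spacing along each axis
unique(diff(sort(unique(LocX))))   # expect a single value: 5
unique(diff(sort(unique(LocY))))   # expect a single value: 5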

It is also a good idea to look at histograms of the variables you will be mapping. This helps identify any skewness in the data, the existence of multiple populations (e.g., a bimodal histogram), and any outliers in the data.

# Histogram
hist(ACID, col="grey", labels=T, breaks=17)
After plotting the histogram, we can see the data has a slight degree of negative skew and has a single
point which we may consider a high outlier. For this lab, well leave the data as is.

When looking at your data, there are a few things to consider.

1. Are there any spatial outliers? If most data points follow a somewhat regular sampling grid
but a few points lie spatially far removed from the rest, consider removing them. Kriging
interpolates the area between data points, and these spatial outliers can cause large swaths of
land to be estimated from just a couple of points.
2. Are the data clustered? Humans have a tendency to preferentially sample in areas with
positive results. That is often sensible field practice, but it can lead to clusters of samples in
areas with positive results and bias the summary statistics. If this exists, consider declustering
to down-weight the over-represented samples.
3. Are there any extreme values? The Kriging system of equations is particularly sensitive to high
outliers, which can lead to large areas being assigned a high value. If your dataset has a few
outliers, consider capping them to the P90 or P95 value, as in the sketch after this list.
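
A minimal sketch of capping to the P95 using base R (ACID comes from the attached data frame; ACID.cap is an illustrative name):

# Cap high outliers at the 95th percentile (P95)
p95 <- quantile(ACID, probs = 0.95)
ACID.cap <- pmin(ACID, p95)   # values above P95 are pulled down to P95
summary(ACID.cap)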

Variograms

The cornerstone of any geostatistical workflow is the variogram. Recall that it is a measure of variability as a function of distance. To start, we'll compute an omnidirectional (isotropic) variogram and fit it with a variogram model. An omnidirectional variogram assumes that spatial variability is the same in all directions. We'll start by computing the variogram and then looking at a plot and summary of the data.

# Compute Variogram and Summarize
v1 <- variog(geodat)

# Tabulate lag number, semi-variance and pair count for each lag
v1.summary <- cbind(1:length(v1$v), v1$v, v1$n)
colnames(v1.summary) <- c("lag", "semi-variance", "# of pairs")
v1.summary

plot(v1, xlab = "Distance, m", ylab = "Semivariance", main = "Variogram")
abline(1, 0, lty = 2)   # dashed reference line at the sill (1.0)

Your summary output should show the lag number, the semi-variance and the number of data pairs. The semi-variance rises with distance and reaches the sill of 1.0 at lag number 6; the distance at which this happens is the range of continuity. On the plot, we can see the points rise towards the sill and reach it at a lag distance of approximately 65 meters.

Next, we'll fit the experimental variogram points with a variogram model. The experimental points tell us the variability at discrete lag distances. The variogram model is a best fit curve through these points and will tell us the variability at any lag distance.

The initial covariance parameters are filled in with the sill and the range, which we estimate by looking at the experimental variogram points. We'll use a sill of 1.0 and a range of 68 meters.
# Fit Variogram
plot(v1, xlab = "Distance, m", ylab = "Semivariance", main = "Variogram", ylim = c(0, 1.1))
abline(1, 0, lty = 1, lwd = 3)   # solid line marking the sill at 1.0

# Fit a spherical model, starting from sill = 1.0 and range = 68 m
autofit <- variofit(v1, ini.cov.pars = c(1.0, 68), cov.model = "sph")
lines(autofit, lty = 2, lwd = 2)

legend(50, 0.3, c("SILL", "AUTO FIT"), lty = c(1, 2), lwd = c(3, 2))

After running the code, you should have a plot like the following. The program creates a best fit through the experimental points. With this, we can move on to estimating.
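
Because the model is defined at every distance, we can read semivariance from it directly. A minimal sketch using geoR's cov.spatial(), assuming the fitted object stores its parameters in the nugget and cov.pars components:

# Semivariance of the fitted model at arbitrary lags:
# gamma(h) = nugget + sill - covariance(h)
h <- c(10, 34, 68, 90)
gamma.h <- autofit$nugget + autofit$cov.pars[1] -
  cov.spatial(h, cov.model = "spherical", cov.pars = autofit$cov.pars)
cbind(distance = h, semivariance = gamma.h)
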
Kriging

To estimate, we first need to define a grid. This is easy for this synthetic dataset. We'll define a 100 meter by 100 meter grid with 100 nodes along each axis, a spacing of roughly 1 meter. This gives us 10,000 locations on the estimation grid. In the real world, your grid and cell size will depend on data spacing. You want cells small enough to provide good resolution, but you probably want to keep the total number of cells under 1 million.
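
A quick sketch to sanity-check the node count and spacing before building the grid (gx is an illustrative name):

# Check the estimation grid dimensions
gx <- seq(0, 100, l = 100)   # 100 nodes spanning 100 m
length(gx)^2                 # 10,000 estimation locations
diff(gx)[1]                  # node spacing, roughly 1.01 m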

For the Kriging equations, you will need the covariance parameters from the variogram section. Rather than typing the sill and range in again, we'll pass the fitted model object, autofit, which carries them.

# Kriging
loci <- expand.grid(seq(0, 100, l = 100), seq(0, 100, l = 100))

# Ordinary Kriging using the fitted spherical variogram model
kc <- krige.control(type.krige = "ok", cov.model = "spherical", obj.model = autofit)
krige <- krige.conv(geodat, loc = loci, krige = kc)

contour(krige, filled = TRUE, color = terrain.colors, xlab = "Easting, m",
        ylab = "Northing, m", main = "Kriged Map")

Your result should look something like this. We can see that the overall trend in the data is well
reproduced and the values change smoothly.
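
Kriging also returns an estimation variance at every grid node, which is the "quantification of uncertainty" mentioned in the introduction. A minimal sketch of mapping it with base graphics, assuming the krige.conv output stores the variances in the krige.var component:

# Map the kriging standard deviation (higher = more uncertain)
sd.grid <- matrix(sqrt(krige$krige.var), nrow = 100, ncol = 100)
image(seq(0, 100, l = 100), seq(0, 100, l = 100), sd.grid,
      col = heat.colors(64), xlab = "Easting, m", ylab = "Northing, m",
      main = "Kriging Standard Deviation")
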
Cross Validation

It's a good idea to cross validate the estimated results against the true data points. Leave-one-out cross validation removes each data point in turn and estimates the value at the now-missing location. We can then make a cross plot between the estimated results and the true data points.

# Leave-One-Out Cross Validation
kcx <- xvalid(geodat, model = autofit)

plot(kcx$data ~ kcx$predicted, xlab = "Estimated Value", ylab = "True Value",
     main = "Cross Validation", xlim = c(2, 8), ylim = c(2, 8))
curve(1 * x, add = T, lty = 2)   # 45-degree line of perfect agreement

p <- cor(kcx$data, kcx$predicted)
legend(5.5, 3, paste("COR =", round(p, 2)))

Looking at the cross validation results, we can see the data points are well reproduced and the correlation is approximately 0.85. There is one point that is noticeably wrong, with a true value of around 8 and an estimated value of around 5. If you recall from the beginning of the lab, this data point is the high outlier. As you can see, these data points need to be managed carefully.
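
Beyond the cross plot, a few summary error statistics are worth checking. A minimal sketch, assuming the xvalid object carries error (true minus predicted) and std.error (error divided by the kriging standard deviation) components:

# Cross-validation error statistics
mean(kcx$error)           # mean error; near 0 indicates unbiased estimates
sqrt(mean(kcx$error^2))   # root mean squared error
sd(kcx$std.error)         # near 1 suggests realistic kriging variances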
