The objective of this lab is to try out some basic geostatistical tools with R. Geostatistics is used
primarily in the resource and environmental areas for estimation, uncertainty quantification,
and the integration of data with different volumetric supports, precisions and sources. This lab will
introduce you to variograms and kriging for spatial data using the R package gstat.
These concepts will be introduced using a 2D data set from a West Texas oil deposit. This is a
nice small data set consisting of 62 wells from a carbonate-siltstone reservoir in West Texas. It
includes measurements of porosity (void fraction) and permeability (measure of how easily
fluid flows through the rock). The X and Y coordinates in the data set correspond to Easting and
Northing values (in ft), respectively. We are interested in mapping porosity for this reservoir
since this directly correlates to the oil in place. If you have time at the end, you can try the same
procedure with permeability since this directly correlates with our ability to extract the oil. Even
better, you could apply this procedure to some of your own spatial data!
1. Getting Started
Download the dataset “2dwelldata.csv”. Load this data set into R and check that it imported
correctly.
welldata = read.csv("2dwelldata.csv")
fix(welldata)
attach(welldata)
Let’s have a look at a map of the well locations. We can plot a simple map of where the wells
were drilled.
plot(Y~X, xlab="Easting",ylab="Northing")
This isn’t very interesting though. So we could colour this map by porosity. The porosity varies
between about 4 and 12%. To make a plot coloured by porosity where the high porosity rock is
coloured red and the low porosity rock is coloured yellow, a set of possible commands would
be:
mycol=seq(12,4,-0.01)
mycol=heat.colors(length(mycol))[rank(mycol)]
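# +1 because R vectors are indexed from 1; pmin/pmax keep values outside 4-12% within the colour table
porcol=pmin(pmax(round((Porosity-4.0)*100.)+1,1),length(mycol))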
porcol=mycol[porcol]
plot(Y~X,pch=16,col=porcol,xlab="Easting", ylab="Northing")
[Figure: map of the well locations coloured by porosity; x-axis Easting, y-axis Northing]
The high porosity rock is all in the East of the area and concentrated in the North-East of the
reservoir – now the cluster of wells in the North-East makes sense. A large number of wells
were drilled in the high porosity region.
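The colour assignment used for the map can also be wrapped in a small function. The sketch below is one possible way to do it in base R; the name value_to_colour and the example values are hypothetical and not part of the lab data. It clamps values to the chosen range so that every value maps to a valid colour index.

```r
# Hypothetical helper (not part of the lab code): map numeric values onto a
# colour palette, clamping to [lo, hi] so every value gets a valid 1-based index.
value_to_colour <- function(values, lo, hi, palette) {
  n <- length(palette)
  scaled <- (values - lo) / (hi - lo)   # rescale to 0..1
  idx <- round(scaled * (n - 1)) + 1    # rescale to 1..n
  idx <- pmin(pmax(idx, 1), n)          # clamp out-of-range values
  palette[idx]
}

# heat.colors() runs from red to pale yellow, so reverse it to get
# pale yellow for low values and red for high values.
pal <- rev(heat.colors(801))

# Illustrative values only; with the lab data this would be Porosity.
example_cols <- value_to_colour(c(4, 8, 12, 3.5), lo = 4, hi = 12, palette = pal)
```

With the well data loaded, the result of value_to_colour(Porosity, 4, 12, pal) could be passed to the col argument of plot in the same way as porcol.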
2. Variograms
Download and install the package “gstat”, an R package with basic geostatistics
functionality, then load it in R.
install.packages('gstat')
library(gstat)
library(sp)
gstat works with the “sp” package, which provides spatial data frame classes. Depending on the
gstat version, sp may already be attached by library(gstat); loading it explicitly is harmless.
To use gstat with this data, the data needs to be coerced into a spatial data frame. This is done
by first assigning the coordinates as a set of spatial points and then adding on the porosity data.
R code to do this:
xyspatial=SpatialPoints(cbind(X,Y))
porspatial=data.frame(Porosity)
spatialdata=SpatialPointsDataFrame(xyspatial,porspatial)
Check the spatial data to make sure that it assembled correctly by printing it and comparing the
first few rows with the original data:
> spatialdata
We can now calculate the experimental variogram of porosity:
porvario=variogram(Porosity~1,spatialdata)
This is an omnidirectional (isotropic) variogram. This means that we are assuming that the
spatial variability is the same in all directions. This is fine for this exercise although we can tell
by the map that this is not the case! The reservoir is much more continuous in the North-South
direction than the East-West direction.
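To make concrete what an omnidirectional experimental semivariogram is, the sketch below computes one by hand in base R: for each distance bin, it takes half the average squared difference between all pairs of values whose separation falls in that bin, regardless of direction. The function name empirical_semivariogram and the four-point example are hypothetical and only illustrate the calculation; gstat's variogram function chooses its own distance bins, so its numbers will differ.

```r
# Hypothetical helper: omnidirectional experimental semivariogram in base R.
# x, y   : coordinates
# z      : measured values
# breaks : distance bin edges
empirical_semivariogram <- function(x, y, z, breaks) {
  d  <- as.matrix(dist(cbind(x, y)))  # pairwise separation distances
  sq <- outer(z, z, "-")^2            # pairwise squared differences
  keep <- upper.tri(d)                # count each pair only once
  bin <- cut(d[keep], breaks = breaks)
  data.frame(
    np    = as.vector(table(bin)),                     # number of pairs per bin
    dist  = as.vector(tapply(d[keep], bin, mean)),     # mean distance per bin
    gamma = as.vector(tapply(sq[keep], bin, mean)) / 2 # semivariance per bin
  )
}

# Illustrative data only: four points on a line, one unit apart.
sv <- empirical_semivariogram(x = c(0, 1, 2, 3), y = c(0, 0, 0, 0),
                              z = c(1, 2, 4, 7),
                              breaks = c(0.5, 1.5, 2.5, 3.5))
```

In this toy example the three pairs one unit apart have squared differences 1, 4 and 9, so the first semivariogram value is (1 + 4 + 9) / 3 / 2.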
A variogram plot should always include a line marking the sill. The sill is the variogram value at
which there is no longer any spatial correlation, and it corresponds to the data variance.
(Figure from University of Alberta MIN E 310 Lecture Notes.)
One quick note: technically the semivariogram (what we are calculating) is the variogram value
divided by 2, but in practice the terms variogram and semivariogram are used interchangeably.
We can now calculate the variogram sill (variance) and plot the variogram.
porsill=var(Porosity)
plot(porvario$dist,porvario$gamma,xlim=c(0,8000),ylim=c(0,porsill+1)
,xlab="Distance (ft)",ylab="Semivariogram")
abline(h = porsill)
text(6000,porsill+0.2,paste("Sill =",round(porsill,3)))
[Figure: experimental semivariogram plotted against distance (ft), with a horizontal line at the sill, Sill = 3.679]
Right now we know experimental variogram values at a few specific distances, but we need to
model this variogram so that we know the variogram values at all distances. This means that we
need to determine three things: the nugget effect, the type of variogram model, and the range.
For simple cases like this, it is a good idea to pick the nugget effect yourself based on your
knowledge of the variable. Recall that the nugget effect can be thought of as the y-axis intercept.
In this case, the nugget effect looks pretty low. We might estimate a nugget effect of about 0.3.
There are a number of permissible variogram models – the reason we need to pick a defined
variogram model rather than fitting the curve with any function is that the calculated
covariance matrix must be positive definite. With the defined variogram models (spherical,
exponential, Gaussian) this is always true. If we were to use another function this might not be
the case. Here we could choose the spherical variogram model.
The variogram range is the distance at which the variogram reaches the sill. It looks like this occurs
at around 8000 ft, but we can use the variogram fitting function in the gstat package to help us
pick the range. The variogram range is determined by fitting the model; printing the fitted model porvm gives:
> porvm
model psill range
1 Nug 0.300 0.000
2 Sph 3.379 8769.015
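A fit of this form, with the nugget and partial sill held fixed and only the range adjusted, could be set up with gstat's vgm and fit.variogram functions. The sketch below is an assumption about how such a fit might be written, not necessarily the command used to produce the output above. The helper name fit_spherical_range and the starting range are hypothetical.

```r
library(gstat)

# Hypothetical helper: fit only the range of a spherical variogram model,
# holding the nugget and the partial sill at the values we chose ourselves.
fit_spherical_range <- function(vario, nugget, total_sill, start_range) {
  # Partial sill = total sill minus nugget, so the model tops out at the sill
  start_model <- vgm(psill = total_sill - nugget, model = "Sph",
                     range = start_range, nugget = nugget)
  # fit.sills = FALSE keeps the nugget and partial sill fixed;
  # fit.ranges = TRUE lets the fitting routine adjust the range
  fit.variogram(vario, start_model, fit.sills = FALSE, fit.ranges = TRUE)
}

# With the objects from this lab, a call might look like (assumed values):
# porvm <- fit_spherical_range(porvario, nugget = 0.3,
#                              total_sill = porsill, start_range = 8000)
```

Fixing the sills this way reflects the choices made above: the nugget was picked by eye and the total sill is the data variance, which leaves the range as the only parameter to fit.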
So it fit a range of 8769 ft. This means that our variogram model equation is zero for a distance
of zero (no variability at zero distance!) and is the nugget effect plus our spherical variogram
model equation for distances larger than 0:
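\[
\gamma(h) =
\begin{cases}
0, & h = 0 \\
0.3 + 3.379\left[1.5\left(\dfrac{h}{8769}\right) - 0.5\left(\dfrac{h}{8769}\right)^{3}\right], & 0 < h \le 8769 \\
3.679, & h > 8769
\end{cases}
\]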
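The spherical variogram model with these parameters can also be written as an R function, which is handy if you want model values at arbitrary distances. This is a minimal sketch in base R; the function name spherical_gamma is hypothetical, and the defaults are simply the nugget, partial sill and range from the fit above. Beyond the range it holds the curve at the sill (0.3 + 3.379 = 3.679).

```r
# Hypothetical helper: spherical variogram model with a nugget effect.
# h      : separation distance(s) in ft
# nugget : nugget effect
# psill  : partial sill (total sill minus nugget)
# range  : distance at which the model reaches the sill
spherical_gamma <- function(h, nugget = 0.3, psill = 3.379, range = 8769) {
  r <- pmin(h / range, 1)                      # cap at 1 so the curve stays at the sill
  g <- nugget + psill * (1.5 * r - 0.5 * r^3)  # spherical model plus nugget
  ifelse(h == 0, 0, g)                         # no variability at zero distance
}

# Model values at zero distance, half the range, the range, and beyond it
model_values <- spherical_gamma(c(0, 4384.5, 8769, 12000))
```

Unlike a bare polynomial expression, this version does not turn back down for distances beyond the range, so it is safe to plot over any distance interval.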
We can now plot the variogram model and the experimental points together to check if our fit is
reasonable. There is a built in plotting capability in gstat, but the plots aren’t that pretty so we
can do this ourselves given the above function for the spherical variogram model. If you still
have your variogram plot open you can add a plot of the model with:
curve(0.3+3.379*(1.5*(x/8769)-0.5*(x/8769)^3),add=TRUE)
You should now have a reasonable variogram model that looks like:
[Figure: experimental semivariogram with the fitted spherical model; x-axis Distance (ft), y-axis Semivariogram]
3. Kriging
We can now use our variogram model to estimate porosity over the entire area by kriging. To
do this we need a list of locations at which we are going to estimate. We can do this by creating
a regular grid which “paves” the area. Look back at the area quickly. The area can be
summarized as spanning Easting (X) values of 0 to 10500 ft and Northing (Y) values of 0 to
10500 ft. We could consider estimating on a grid where the cells are 250 ft by 250 ft. This would
mean we would have 42 cells each in the X and Y directions (10500 / 250 = 42). The procedure for
generating this grid is then:
gt = GridTopology(cellcentre.offset=c(125,125), cellsize=c(250,250),
cells.dim=c(42,42))
grd=SpatialGrid(gt)
The generated grid can be checked:
> summary(grd)
Object of class SpatialGrid
Coordinates:
min max
[1,] 0 10500
[2,] 0 10500
Is projected: NA
proj4string : [NA]
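The numbers passed to GridTopology follow from simple arithmetic on the extent and the cell size. The sketch below works them out in base R. The variable names are hypothetical, and the extent and cell size are the ones chosen above.

```r
# Grid arithmetic for one direction (the X and Y directions are identical here)
extent_min <- 0        # ft
extent_max <- 10500    # ft
cellsize   <- 250      # ft

# Number of cells needed to pave the extent
n_cells <- (extent_max - extent_min) / cellsize

# The first cell centre sits half a cell in from the edge
first_centre <- extent_min + cellsize / 2

# All cell centres along this direction
centres <- first_centre + cellsize * (seq_len(n_cells) - 1)
```

The first centre is at 125 ft, which is the cellcentre.offset value, and the last centre is half a cell short of 10500 ft, so the cell edges span exactly 0 to 10500 ft.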
Kriging with a known mean (simple kriging) also needs the mean of the data:
> mean(Porosity)
[1] 8.401975
To krige, we provide the variable we are kriging (porosity), the spatial data, the grid of points at
which to estimate, the variogram model and the mean (the beta parameter here). The results can be
plotted using the spatial plotting utility.
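A kriging call with these ingredients could be written with gstat's krige function, as sketched below. This is an assumption about how the step might be coded, not necessarily the exact command used in this lab. The helper name simple_krige is hypothetical, and the commented usage lines assume the objects created earlier (spatialdata, grd, porvm).

```r
library(sp)
library(gstat)

# Hypothetical helper wrapping gstat's krige() for kriging with a known mean.
# data_sp    : SpatialPointsDataFrame holding the variable
# grid       : locations at which to estimate
# model      : fitted variogram model
# mean_value : known mean, passed as beta (this selects simple kriging)
simple_krige <- function(data_sp, grid, model, mean_value,
                         formula = Porosity ~ 1) {
  krige(formula, data_sp, newdata = grid, model = model, beta = mean_value)
}

# With the objects from this lab, a call might look like:
# krigedpor <- simple_krige(spatialdata, grd, porvm, mean(Porosity))
# spplot(krigedpor["var1.pred"])   # map of the kriged estimates
```

The returned object holds the estimates in var1.pred and the kriging variances in var1.var.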
Kriging is exact – so it reproduces the data points exactly. We should have a look and check that
this is the case. To do this we first extract the kriged estimates:
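krigedpoints=cbind(coordinates(grd),krigedpor$var1.pred)
We then load the lattice package for plotting. lattice ships with standard R installations, so the
install step is usually unnecessary:
install.packages('lattice')
library(lattice)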
We can make a level plot using the same colors as before:
levelplot(krigedpoints[,3]~krigedpoints[,1]*krigedpoints[,2],
cuts=length(mycol), col.regions=mycol,xlab="Easting",
ylab="Northing",groups=1)
To add the points, we can use the lattice trellis method:
trellis.focus("panel", 1, 1, highlight=FALSE)
panel.xyplot(X,Y,pch=21,cex=1.3,fill=porcol)
trellis.unfocus()
[Figure: level plot of the kriged porosity estimates with the well data overlaid; x-axis Easting, y-axis Northing]
We see a transition from low to high porosity values moving to the East and all the data values
are reproduced. You should see that in the far South East corner where there is very little data,
the estimated values tend towards the mean. Now if you have time you could try the same
procedure with the permeability data and see if you can estimate permeability over the area.
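Besides the visual check, the exactness of kriging can be checked numerically by kriging back onto the data locations themselves and comparing the estimates with the measured values. The sketch below is one possible way to do this with gstat. The helper name max_error_at_data is hypothetical, and the commented usage line assumes the objects created earlier in this lab.

```r
library(sp)
library(gstat)

# Hypothetical helper: krige at the data locations and return the largest
# absolute difference between the kriged estimate and the measured value.
max_error_at_data <- function(data_sp, model, mean_value,
                              formula = Porosity ~ 1) {
  response <- all.vars(formula)[1]   # name of the kriged variable
  at_data <- krige(formula, data_sp, newdata = data_sp,
                   model = model, beta = mean_value)
  max(abs(at_data$var1.pred - data_sp[[response]]))
}

# With the objects from this lab, a call might look like:
# max_error_at_data(spatialdata, porvm, mean(Porosity))
```

A result that is zero to within rounding error indicates that the data values are reproduced at the well locations.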