
ITC e-learning module
Introduction to Applied Geostatistics and Open-Source Statistical Computing
Module Information 2011

D G Rossiter
University of Twente, Faculty ITC, Enschede (NL)
January 7, 2011

Contents
1 Objectives
2 Learning method
3 How to complete a lesson
4 Communication
5 Assessment
6 Module schedule
7 Topics
8 Datasets
A Prerequisite knowledge
B Learning resources
  B.1 Textbooks: geostatistics
  B.2 Textbooks: R
  B.3 Web pages
  B.4 ITC library access
C Acknowledgements
References

Copyright © University of Twente, Faculty ITC 2011

Thank you for choosing to invest your time in this ITC e-learning module. The topic is quite technical: both the subject matter of applied geostatistics and the open-source computing environment. We believe that when you complete this module, even if you do not fully grasp all aspects the first time through, you will be well prepared to go further on your own and apply both the subject knowledge and the computing environment in your own work. This is the fourth version of this distance education module; in addition, much of the material is based on elective modules taught at ITC for the past eight years in a classroom/computer-lab setting (face-to-face instruction). Still, there can always be improvements, and we welcome your suggestions and comments.

1 Objectives
From the course announcement (http://www.itc.nl/Pub/Study/Courses/C10-AES-DE-02): This course is aimed at postgraduate students and working professionals who wish to apply spatial statistics and geostatistical computing in research and consulting projects. On completion of this module, students should be able to:
1. select and apply appropriate visualisation and numerical techniques to explore the structure of a spatial data set;
2. select and apply appropriate procedures to model the structure of a spatial data set;
3. select and apply appropriate procedures to predict data values at unvisited locations using parametric and non-parametric models;
4. design a sampling strategy to reveal or account for spatial structure;
5. use the R environment for statistical computing at an intermediate level and be able to improve their skills by self-study and experimentation.

The main objective is not to teach as many techniques as possible in the time available (although there is certainly a lot of material here), but to equip you to continue learning and applying correct geostatistical techniques to your own problems throughout your career. It is clearly impossible to cover all topics that might be relevant to all students, and even if it were possible, there are always new developments. Thus the emphasis is on learning how to learn and how to find resources as necessary.

In keeping with this philosophy, we use the R Project for Statistical Computing [19] as the main computing environment. This open-source environment provides unlimited opportunities for exploratory and statistical analysis and graphics; once you learn this environment you will never be blocked when you have to apply new methods.

2 Learning method
There are six elements to learning in this module:
1. Lectures in the form of presentation slides, covering essential theory, with self-study questions (and their answers) to allow you to check your understanding;
2. A set of tutorial computer exercises that explain and illustrate key concepts; these have the form:
   2.1 Task description;
   2.2 Suggested computer procedures (typically R code);
   2.3 Questions to check understanding;
   2.4 Answers with explanation;
   2.5 Optional challenges.
3. Feedback on the exercises from the instructor.
4. A discussion forum on Blackboard (http://bb.itc.nl/) where students and instructors exchange questions, answers, extra information, etc.
5. An exercise on literature search and critical reading, in which the student must find and evaluate an application of geostatistics; see the separate document Literature search and critical reading.
6. A final project, in which the student goes deeper into some aspect of geostatistics, perhaps with their own data; see the separate document Final Project.

3 How to complete a lesson
For the set lessons, we recommend the following procedure:
1. Skim the lecture to see its scope, and to see if you are already familiar with some of the concepts;
2. Skim the exercise (just the section titles) to see what will be covered;
3. Read the lecture and answer (for yourself) the self-study questions. These are repeated, with possible answers, at the end of the lecture. Mark the sections that are confusing. Do not spend too much time on the lecture at this point. Remember that the lecture notes are not a textbook; see Appendix B for a list of textbooks. The lectures introduce and explain the key concepts; however, most of the real learning takes place in the exercises.
4. Follow the exercise in detail, and answer (for yourself) the self-study questions. Possible answers are given at the end of each section. This is where most of your time will be spent. The advantage of the exercises is that you actually compute and view results. If you are confused by a result or code, experiment! You can't break the computer with R code. If you get stuck in an exercise:
   - Read the instructions again slowly and make sure you've followed all the steps;
   - Review your output against that shown in the exercise;
   - Post a question on the Blackboard discussion group; both the instructor and fellow students will read this.
5. At the end of each exercise is a self-test of how well you have mastered the exercise. You should be able to complete the tasks and answer the questions with the knowledge you have gained from the exercise. Please submit your answers (including graphical output) to the instructor for grading and sample answers. Submit the answers as a single document (preferably PDF, but word-processor files are accepted) in the Blackboard environment, under the same lesson as the lecture and exercise, labelled with your ITC e-mail name and the exercise number, e.g. luo619_ex1.pdf (for the student with ITC e-mail name luo619). These answers will not be graded except for completion, on four levels: 0 = not at all addressing the task; 1 = some deficiency; 2 = satisfactory; 3 = extra work and insight. All must be completed with at least a 2; if you receive a lower grade, you can re-do the exercise after seeing the instructor's solution.
6. Review the lecture notes; by now most of the concepts mentioned in the lecture notes have been covered in the corresponding exercise.

Missing background
What should you do if you come across unfamiliar background material? It is not possible to go back and review, for example, linear regression theory in the time period of this module. We suggest that you review just enough to understand what is going on in this module, and save detailed review for later.

4 Communication
Almost all communication with the instructor, and between students, will take place in the Blackboard on-line environment. Students will receive an ITC e-mail ID and password, and be enrolled in this module's corresponding Blackboard module. The same ID and password are used for e-mail and Blackboard. See the separate document for how to use this environment.
- The instructor will post information in the announcements, visible on the opening page.
- Students should post their experiences and ask questions on the group discussion page (under the Communication link on Blackboard); anyone can answer. The instructor will read all new discussions on this page at least twice a day, at 09:00 and 16:00 Central European Time, Monday to Friday, and comment as necessary.
- The course CD is reproduced in the Documents and Assignments folders; updated versions will be posted to Blackboard and announced as necessary.
- Students should upload assignments as explained above.
The exceptions are:
1. Private matters such as sickness or other inability to complete work on time: e-mail the instructor directly.
2. Administrative matters (grades, enrollment etc.): e-mail the course secretary directly.

5 Assessment
Students must complete the required portion of the six set computer exercises; these have self-check questions and answers. The exercises are not graded (i.e. completion counts as satisfactory), but the instructor will review your answers and then supply a model answer sheet for your review. Completing the exercises counts for 50% of the module grade.

Students must also complete the Literature search and critical reading exercise; this will be graded on the ITC scale, 0 to 10, for 10% of the grade.

The report on the data analysis project (or substitute exercises) will be graded on the usual ITC scale (< 60 = fail, 60 = pass, 70 = good, 80 = very good, 90 = excellent, 100 = perfect) and according to ITC standards for in-house modules; this will be weighted as 40% of the module grade.

For example: a student who completes all exercises, receives 8 of 10 on the literature exercise, and 70 of 100 on the data analysis project would receive a grade of 50 + 8 + (70 × 0.4) = 86, qualified as Very Good in the Dutch system.

Students who have successfully completed the course receive a Certificate, which can lead to an exemption at ITC (i.e. it is equivalent to having taken a similar module within ITC). Please see the attached Assessment Regulations for details.
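The weighting is simple arithmetic, but the three components sit on different scales (completion only, 0 to 10, and 0 to 100), so it is easy to mis-scale one of them. The following Python sketch reproduces the worked example under stated assumptions: the names `module_grade` and `qualification` are hypothetical, the exercise share is treated as all-or-nothing because this handbook does not say how partial completion is credited, and the ITC bands are read as lower bounds. It is an illustration only; the Assessment Regulations govern actual grading.

```python
# Sketch of the weighted module grade described under Assessment.
# Assumptions (not stated in the handbook): the exercise share is
# all-or-nothing, and the ITC qualification bands are lower bounds.

EXERCISE_POINTS = 50  # six set exercises, completion only: 50% of the grade
LITERATURE_MAX = 10   # literature exercise, ITC 0-10 scale: 10% of the grade
PROJECT_WEIGHT = 40   # project report, 0-100 scale: 40% of the grade


def module_grade(all_exercises_completed: bool, literature: float, project: float) -> float:
    """Combine the three assessment components into a 0-100 module grade."""
    if not 0 <= literature <= LITERATURE_MAX:
        raise ValueError("literature score must be on the 0-10 scale")
    if not 0 <= project <= 100:
        raise ValueError("project score must be on the 0-100 scale")
    # Hypothetical simplification: no partial credit for the exercise share.
    exercises = EXERCISE_POINTS if all_exercises_completed else 0
    # A 0-10 score weighted at 10% contributes its own value in points;
    # the project contributes 40/100 of its 0-100 score.
    return exercises + literature + project * PROJECT_WEIGHT / 100


# ITC bands from the handbook, checked from the top down (assumed lower bounds).
BANDS = [(100, "perfect"), (90, "excellent"), (80, "very good"),
         (70, "good"), (60, "pass")]


def qualification(grade: float) -> str:
    """Return the ITC qualification label for a 0-100 grade."""
    for threshold, label in BANDS:
        if grade >= threshold:
            return label
    return "fail"


if __name__ == "__main__":
    g = module_grade(True, 8, 70)
    print(g, qualification(g))  # 86.0 very good
```

Running the script prints `86.0 very good`, matching the handbook's example. The project weight is applied as `* 40 / 100` rather than `* 0.4` simply to keep the arithmetic exact for whole-number scores.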

6 Module schedule
The 2010 module runs from Monday 25 January through Friday 05-March. Work will be accepted for two more weeks, i.e. until Saturday 20-March. There are six set topics (lectures, with accompanying exercises), an exercise on literature search and critical reading, and a final project. The lectures and exercises are spread out over the first three weeks, i.e. two per week.

Due dates:
  Exercise 1                               Wednesday 26-January
  Exercise 2                               Friday 28-January
  Exercise 3                               Wednesday 02-February
  Exercise 4                               Friday 04-February
  Exercise 5                               Wednesday 09-February
  Exercise 6                               Friday 11-February
  Project selection                        Wednesday 16-February
  Literature search and critical reading   Friday 18-February
  Project preliminary results              Wednesday 23-February
  Project submission                       Friday 04-March

Assignments are due at 18:00 your local time. Please try to be on time with assignments, so that we stay in sync and can support each other via Blackboard discussions.

7 Topics
The distance course is equivalent to three weeks of full-time resident instruction at ITC. Thus, if you work on the course for half-days over the six weeks, you have done enough! Please do not feel pressured to do more; we know there is a lot of material, which you can review at your own speed after the course is finished. We've supplied more material than can be comfortably covered in three full-time weeks. Of course, if you have time and feel motivated, feel free to submit as much work as you can; we will review it all.

The first two thirds of the course (i.e. four weeks) comprise six mandatory topics; there are also five optional topics, which you may want to explore if you have time, or for your project in the last third of the course. All topics have some optional sections, which can be skipped if you are pressed for time.

Reading the lecture notes and answering the self-study comprehension questions in the notes should take about four hours per topic. If you find you are spending more time, you can either (i) continue because you find it so interesting, or (ii) skim the material you don't fully understand and review it after the module is over.

Each of the lecture topics is accompanied by a computer exercise. Completing the required sections of the exercises and answering the comprehension questions should take about six hours per exercise. Most exercises have optional sections, which may be done at the student's convenience either during or after the course.

The six mandatory topics are (file names in brackets):
1. Geo-statistical computing (ov1.pdf, ex1.pdf)
   1.1 The added value of spatial statistics
   1.2 Inventory of packages
   1.3 The R Project for Statistical Computing: what and why?
   1.4 Introduction to the R environment and S language
2. Exploring and visualizing spatial data (ov2.pdf, ex2.pdf)
   2.1 Visualizing spatial structure: postplots, quantile plots
   2.2 Visualizing regional trends
   2.3 Visualizing spatial dependence: h-scatterplots, variogram cloud, experimental variogram
   2.4 Visualizing anisotropy: variogram surfaces, directional variograms
3. Modelling spatial structure from point samples (ov3.pdf, ex3.pdf)
   3.1 Trend surfaces
   3.2 Random fields
   3.3 Stationarity; the intrinsic hypothesis
   3.4 Models of spatial covariance
   3.5 Variogram analysis; variogram model fitting
4. Spatial prediction from point samples (Part 1) (ov4.pdf, ex4.pdf)
   4.1 A taxonomy of prediction methods
   4.2 Non-geostatistical methods
   4.3 Introduction to Ordinary Kriging
5. Spatial prediction from point samples (Part 2) (ov5.pdf, ex5.pdf)
   5.1 Derivation of the Kriging equations
   5.2 Block Kriging
   5.3 Universal Kriging
6. Assessing the quality of spatial predictions (ov6.pdf, ex6.pdf)
   6.1 Kriging variance
   6.2 Model validation with an independent data set
   6.3 Cross-validation
   6.4 Spatial simulation

The five optional topics are:
- Spatial prediction from point samples (Part 3: Using secondary information) (ov5.pdf, ex5a.pdf)
  1. Feature-space modelling
  2. Kriging with external drift
  3. Universal Kriging
- Geostatistical risk mapping (ov7.pdf, ex7.pdf)
  1. Indicator variables
  2. Indicator variograms
  3. Probability kriging with indicator variables
- Spatial sampling (ov8.pdf, ex8.pdf)
  1. Design-based sampling in the presence of spatial dependence
  2. Optimal sampling grid with known variogram
  3. Sampling to model the variogram
- Spatial simulated annealing for sampling design (ex8a.pdf)
- Interfacing R spatial statistics with GIS (ex9.pdf)
  1. Projections and coordinate systems
  2. Creating Google Earth layers
  3. Importing and exporting grids

Several important topics will not be covered in this introductory course. You can choose to work on one of these during the final project:
- Modelling anisotropy; anisotropic kriging
- Multivariate geostatistics: co-kriging
- Spatial prediction by splines
- Spatial means and centroids
- Point-pattern analysis
- Detection and modelling of periodic patterns; spectral analysis; wavelets
- 3D geostatistics
- Circular statistics

8 Datasets
- For Exercise 1 (all), Exercises 2 and 3 (anisotropy), and Exercise 4 (design-based prediction): River Maas (Meuse) soil pollution [17];
- For Exercises 2-8: Jura geochemistry (soil samples), used as a running example in the text of Goovaerts [12]; other references are [26, 1];
- For Exercises 2, 3 and 4 (trend surfaces): part of the Cameroon TROPENBOS soil properties dataset [27];
- For the Exercise 2 self-test: the Walker Lake synthetic dataset of Isaaks and Srivastava [13];
- For the Exercise 5 self-test: Oxfordshire soil samples [3];
- For the Exercise 9 self-test: Kansas aquifer depth [8].

A Prerequisite knowledge
- Proficiency in reading technical English; no official test is required, but the materials are at a fairly advanced level. We suggest TOEFL 500, IELTS 5.5, Michigan 75, or Cambridge CPE/CAE.
- Proficiency in writing technical English; not advanced, but enough to write a coherent technical report.
- Good basic computer skills; familiarity with a standard web browser; ability (and sufficient privileges) to install software packages.
- A first university course or equivalent in probability and statistics; the specific knowledge we assume is listed below.

We assume no prior knowledge of either statistical computing or spatial (geo-)statistics.

Students should be familiar with the concepts listed below. They should also have access to a statistics textbook where these are covered; a useful (but not comprehensive) source is the Electronic Statistics Textbook from StatSoft (http://www.statsoft.com/textbook/stathome.html).

Prerequisite: Basic concepts of probability and statistics
1. Measurement systems: nominal, ordinal, interval and ratio scales;
2. Populations vs. samples;
3. Population distribution vs. sampling distribution;
4. Unimodal vs. multimodal distributions;
5. Skewness vs. symmetry;
6. Transformation to the natural logarithm;
7. Origin of the binomial probability distribution;
8. Origin of the normal (Gaussian) probability distribution;
9. Shape and properties of the normal (Gaussian) probability distribution; Z-values (normal scores);
10. Null and alternate hypotheses;
11. Hypothesis testing; Type I and Type II errors;
12. Significance levels (α): meaning and interpretation;
13. Confidence intervals: meaning and interpretation.

Prerequisite: Exploratory data analysis
1. Histograms (frequency and density); box plots;
2. Empirical cumulative distribution plots;
3. Bivariate scatterplots;
4. Sample range;
5. Sample arithmetic and geometric mean;
6. Sample variance, standard deviation and coefficient of variation;
7. Sample quantiles, median and mode.

Prerequisite: Feature-space modelling and prediction
1. Covariance; Pearson correlation; correlation coefficient, r;
2. Rank (Spearman) correlation;
3. Linear regression of one dependent (regressed) variable on one independent (regressor) variable;
4. Coefficient of determination of a linear model, R²;
5. Regression diagnostics: residuals, leverage;
6. Prediction from regression equations;
7. Confidence and prediction intervals;
8. Validation against an independent dataset; gain and bias;
9. Analysis of Variance for categorical variables (1-, 2-way, interactions).

B Learning resources

B.1 Textbooks: geostatistics
There are many geostatistics texts, varying widely in their mathematical level and application focus. Here are a few recommended ones that you might find useful after completing this course. We did not select any one text because of the diverse backgrounds of students taking this module.
Texts with a mathematical focus include Chilès and Delfiner [4], Cressie [6] and Ripley [18]; some of the Ripley text is repeated with R code in the advanced R modelling reference by Venables and Ripley [23]. The text by Deutsch and Journel [9] is also mathematical, and aimed at users of the GSLIB codes. A newer text by Diggle and Ribeiro Jr [10] takes a more modern, unified approach to geostatistics than we take here; the ideas in this text are implemented in the geoR R package.
Texts in an application field but with a strong mathematical basis include Goovaerts [12] (natural resources), which uses the same Jura dataset used in our exercises as a running example, Webster and Oliver [25] (soil science), and the classic by Isaaks and Srivastava [13] (generic but aimed at geoscientists). This last text rewards slow, careful study and is aimed at practical results rather than extensive theory.

The text of Webster and Oliver [25] is available from the ITC library as an e-book; select the e-books link from the digital library web page (http://www.itc.nl/library/digital_library.asp), and once you have authenticated yourself, you can read it on-line.
Texts with some geostatistics but mainly aimed at an application field include Davis [8] (geology), Kitanidis [14] (hydrology), Fotheringham et al. [11] (geography), and Stein et al. [22] (remote sensing).

B.2 Textbooks: R
Dalgaard [7] is an introduction to basic statistics using R. You most likely know all the statistics; here you can see how to compute them in R.
The Use R! series from Springer is relatively affordable and is aimed at getting you to use R for specific applications. The most relevant for this course is Applied Spatial Data Analysis with R by Bivand et al. [2]; this has comprehensive coverage of sp and gstat by these packages' authors, as well as of spatstat, rgdal and others. There are also books in this series on time series analysis, wavelets [15] and the lattice graphics package [20], among others. These are also available by subscription as e-books from Springer.

The R book by Crawley [5] covers many topics and is also available as an e-book from the ITC library. Another book using R, available both in print and as an e-book, is by Reimann et al. [16]; this is a comprehensive introduction to environmental statistics, beginning from a basic mathematical level. Shumway and Stoffer [21] is a comprehensive introduction to time series analysis using R. A useful R book for advanced statistical methods is Modern Applied Statistics with S (4th Edition) by Venables and Ripley [24], which covers a wide variety of modern methods, including geostatistics and time series analysis.

B.3 Web pages
Many instructors have put some material from their geostatistics courses on the web; this varies widely in quality. Wikipedia entries are also of variable quality. For general statistics, the Electronic Statistics Textbook from StatSoft (http://www.statsoft.com/textbook/stathome.html), mentioned above, is good.

B.4 ITC library access
All students with an ITC e-mail address, including distance education students, can access all resources of the ITC library (http://www.itc.nl/library). This includes the digital library, with an extensive list of journals (full-text), search engines (e.g. Web of Science, Elsevier Science Direct), and reference works, as well as the e-books listed in the previous section. Among the most relevant journals for applied geostatistics are:
- Mathematical Geosciences
- Computers & Geosciences
- Geoderma
- Water Resources Research
- Agricultural Systems
These and many others are available to you to search and download as full-text PDFs. Take advantage of your enrollment in this module to dig deeper into the literature in your application field.

C Acknowledgements
Several ITC staff members and students have contributed to the design and development of this ITC distance course. Only the key persons are mentioned here.
- Course content: DG Rossiter and Prof. Alfred Stein
- Course design: DG Rossiter and Prof. Alfred Stein
- Author of course materials: DG Rossiter, with advice from Prof. Alfred Stein
- E-learning support: Ineke ten Dam
- Technical support: Support unit e-learning; Linlin Pei (Blackboard); Cecile Plomp (Video)
- Course secretary: Cecile Plomp
- Content testing: Kerstin Mhlner, Tomoko Doko

References
[1] O Atteia, J-P Dubois, and R Webster. Geostatistical analysis of soil contamination in the Swiss Jura. Environmental Pollution, 86(3):315-327, 1994.
[2] Roger S. Bivand, Edzer J. Pebesma, and V. Gómez-Rubio. Applied Spatial Data Analysis with R. Use R! Springer, 2008. URL http://www.asdar-book.org/.
[3] P A Burrough, P H T Beckett, and M G Jarvis. The relation between cost & utility in soil survey (I-III). Journal of Soil Science, 22(3):359-394, 1971.
[4] J-P Chilès and P Delfiner. Geostatistics: modeling spatial uncertainty. Wiley series in probability and statistics. John Wiley & Sons, New York, 1999.
[5] M. J. Crawley. The R book. Wiley & Sons, Chichester, 2007.
[6] N Cressie. Statistics for spatial data. John Wiley & Sons, New York, revised edition, 1993.
[7] Peter Dalgaard. Introductory Statistics with R. Springer, 2002. ISBN 0-387-95475-9.
[8] J C Davis. Statistics and data analysis in geology. John Wiley & Sons, New York, 3rd edition, 2002.
[9] C V Deutsch and A G Journel. GSLIB: Geostatistical software library and user's guide. Oxford University Press, Oxford, 1992.
[10] P J Diggle and P J Ribeiro Jr. Model-based geostatistics. Springer, 2007.
[11] A S Fotheringham, C Brunsdon, and M Charlton. Quantitative geography: perspectives on spatial data analysis. Sage Publications, London; Thousand Oaks, Calif., 2000.
[12] P Goovaerts. Geostatistics for natural resources evaluation. Applied Geostatistics. Oxford University Press, New York; Oxford, 1997.
[13] E H Isaaks and R M Srivastava. An introduction to applied geostatistics. Oxford University Press, New York, 1990.
[14] P K Kitanidis. Introduction to geostatistics: applications to hydrogeology. Cambridge University Press, Cambridge, England, 1997.
[15] G. P. Nason. Wavelet methods in statistics with R. Use R! Springer, New York; London, 2008.
[16] Clemens Reimann, Peter Filzmoser, Robert G. Garrett, and Rudolf Dutter. Statistical data analysis explained: applied environmental statistics with R. Wiley & Sons, Chichester, 2008. URL http://ezproxy.itc.nl:2585/depp/reader/protected/external/AbstractView/S9780470987599.


[17] M G J Rikken and R P G Van Rijn. Soil pollution with heavy metals - an inquiry into spatial variation, cost of mapping and the risk evaluation of copper, cadmium, lead and zinc in the floodplains of the Meuse west of Stein, the Netherlands. Doctoraalveldwerkverslag, Dept. of Physical Geography, Utrecht University, 1993.
[18] B D Ripley. Spatial statistics. John Wiley and Sons, New York, 1981.
[19] D G Rossiter. Introduction to the R Project for Statistical Computing for use at ITC. International Institute for Geo-information Science & Earth Observation (ITC), Enschede (NL), 3.7 edition, 2009. URL http://www.itc.nl/personal/rossiter/teach/R/RIntro_ITC.pdf.
[20] Deepayan Sarkar. Lattice: multivariate data visualization with R. Use R! Springer, New York, 2008. URL http://lmdvr.r-forge.r-project.org/.
[21] Robert H. Shumway and David S. Stoffer. Time Series Analysis and Its Applications, with R examples. Springer Texts in Statistics. Springer, 2nd edition, 2006. URL http://www.stat.pitt.edu/stoffer/tsa2/index.html.
[22] A Stein, Freek van der Meer, and B G F Gorte, editors. Spatial statistics for remote sensing. Kluwer Academic, Dordrecht, 1999.
[23] W N Venables and B D Ripley. Modern applied statistics with S. Springer-Verlag, New York, fourth edition, 2002.
[24] William N. Venables and Brian D. Ripley. Modern Applied Statistics with S. Fourth Edition. Springer, 2002. ISBN 0-387-95457-0.
[25] R Webster and M A Oliver. Geostatistics for environmental scientists. Wiley & Sons, Chichester, 2001.
[26] R Webster, O Atteia, and J P Dubois. Coregionalization of trace metals in the soil in the Swiss Jura. European Journal of Soil Science, 45(2):205-218, 1994.
[27] M Yemefack, D G Rossiter, and R Njomgang. Multi-scale characterization of soil variability within an agricultural landscape mosaic system in southern Cameroon. Geoderma, 125(1-2):117-143, 2005.
