
Master in Geographical Information Systems & Science

Master in Geospatial Technologies

Geostatistics

TUTORIAL
Exploratory spatial data analysis
ANA CRISTINA COSTA | ccosta@novaims.unl.pt
April 2016
TUTORIAL: Exploratory Spatial Data Analysis 2

Table of Contents
1 Introduction
1.1 Learning outcomes
2 Exploratory data analysis in Excel
2.1 Filter tool
2.2 Analysis ToolPak
2.3 Descriptive statistics
2.3.1 Univariate analysis
2.3.2 Bivariate analysis
3 Exploratory spatial data analysis in ArcGIS
3.1 Query Builder tool
3.2 Exporting to shapefile and enabling Extensions
3.3 ESDA tools
3.3.1 Data posting
3.3.2 Regional histogram
3.3.3 Indicator maps
3.3.4 Voronoi maps
3.3.5 Local Moran's I statistic
3.3.6 Global Moran's I statistic



List of Figures
Figure 1: R5D.xls file in Excel
Figure 2: Filter tool from the DATA menu
Figure 3: Filtering the year to select the 1990’s data
Figure 4: Selecting all data in order to copy and paste it in a new worksheet
Figure 5: Changing the name of the worksheet
Figure 6: Deleting the row where R5D = –9999 in 1990
Figure 7: Add-Ins dialog box
Figure 8: Data Analysis button of the Data menu
Figure 9: Data Analysis window of the Analysis ToolPak
Figure 10: Descriptive Statistics dialog box
Figure 11: Illustration of the relationship between the mean, the median and the mode
Figure 12: Scatterplots illustrating different types of association between two variables
Figure 13: Correlation dialog box
Figure 14: R5D index versus the weather stations’ elevation
Figure 15: Add XY Data... dialog box
Figure 16: Query Builder tool
Figure 17: Extensions dialog box
Figure 18: Geostatistical Analyst toolbar activation
Figure 19: Setting symbol properties for graduated colours example
Figure 20: Setting graduated colours for R5D data posting
Figure 21: R5D data posting using graduated colours
Figure 22: Setting points’ labels for R5D data posting
Figure 23: R5D data posting using points’ labels
Figure 24: Histogram on the Explore Data menu of the Geostatistical Analyst toolbar
Figure 25: Regional histogram and descriptive statistics of the R5D index
Figure 26: Selecting units in the map to investigate spatial regimes
Figure 27: Setting the threshold value of the indicator map in the Classification window
Figure 28: Indicator map of the R5D index using 67.4 mm as threshold
Figure 29: Adding a new field to an attribute table
Figure 30: Naming the new field of the attribute table
Figure 31: Python parser to code the values of an indicator variable in the Field Calculator window
Figure 32: Attribute table of the R5D_1990 layer with the new indicator variables (Quartile1, Quartile2, Quartile3)
Figure 33: Symbology of the 1st indicator variable map (1st quartile of R5D)
Figure 34: Indicator maps of the R5D index
Figure 35: Voronoi Map dialog box
Figure 36: Cluster Voronoi map of the R5D index
Figure 37: Illustration of the Cluster and Outlier Analysis (Anselin Local Moran's I) tool
Figure 38: Illustration of z-score peaks in the Spatial Autocorrelation by Distance graph
Figure 39: Incremental Spatial Autocorrelation dialog-box
Figure 40: Spatial autocorrelation of the R5D index by distance (z-scores of the Local Moran’s I statistic)
Figure 41: Spatial autocorrelation of the R5D index by distance (z-scores of the Local Moran’s I statistic) with the Beginning Distance parameter equal to 20000 meters
Figure 42: Cluster and Outlier Analysis (Anselin Local Moran's I) dialog-box
Figure 43: Anselin’s Local Moran's I map
Figure 44: Spatial Autocorrelation (Morans I) dialog-box


List of Tables
Table 1: Summary statistics of the R5D index in 1990 and stations’ elevation
Table 2: Summary of when to use the mean, median and mode
Table 3: Correlation matrix
Table 4: Interaction of the Conceptualization of Spatial Relationships parameter with possible Distance Band or Threshold Distance values
Table 5: Global Moran's I Summary for the R5D index



1 INTRODUCTION
The purpose of this tutorial is to explain how to explore data using descriptive statistics
in Excel and, mainly, how to use Exploratory Spatial Data Analysis (ESDA) tools in
ArcGIS/ArcMap. This tutorial was produced using ArcGIS 10.1.

The concepts and tools are illustrated with R5D index data collected in the south of
Portugal during the 1990s (provided in the R5D Excel file), particularly the data for
1990. The study area boundary (polygon) is provided in the StudyArea shapefile
(Limit folder). For further details on the data used in this tutorial, refer to the
LabData_info.pdf file.

Additional Internet resources, covering the theoretical background or YouTube videos
demonstrating the use of the tools, are listed at the end of several topics.

1.1 Learning outcomes


At the end of this tutorial students should be able to:
• Use the Filter tool in Excel to select a subset of data.
• Explore data using descriptive statistics with the Analysis ToolPak in Excel.
• Explore the possible existence of an association between two variables in Excel.
• Add XY data stored in an Excel file to an ArcMap project.
• Use the Query Builder tool to select a subset of data in ArcMap.
• Export table data to a Shapefile or Feature Class in ArcMap.
• Enable Extensions and Toolbars in ArcMap.
• Explore data using Data posting in ArcMap.
• Explore data using descriptive statistics and the Regional histogram [Histogram
tool of the Geostatistical Analyst in ArcGIS].
• Explore data using Indicator maps [Query Builder tool in ArcMap].
• Explore data using Voronoi maps [Voronoi tool of the Geostatistical Analyst in
ArcGIS].
• Explore the possible existence of local clusters and spatial outliers [Cluster and
Outlier Analysis (Anselin Local Moran's I) tool of the Spatial Statistics Tools in
ArcMap].
• Assess global spatial autocorrelation using the Global Moran's I statistic [Spatial
Autocorrelation (Global Moran's I) tool of the Spatial Statistics Tools in ArcMap].

2 EXPLORATORY DATA ANALYSIS IN EXCEL


Open the R5D.xls file in Excel (Figure 1). The table columns correspond to the following
fields:
• x and y: projected spatial coordinates, in meters (Lisboa_Hayford_Gauss_IGeoE), of the
weather stations where the daily precipitation was measured.
• year: the year for which the R5D index was computed.
• R5D: values of the R5D index (our study variable), which is computed as the
greatest consecutive 5-day precipitation total per year (in millimetres). Some
authors consider this index a flood indicator.
• elev: elevation of the weather stations (in meters).
• ID: code of the weather stations.
• LOCATION: name of the location of the weather stations.

Figure 1: R5D.xls file in Excel
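
To make the definition of the R5D index concrete, the following minimal Python sketch computes the greatest consecutive 5-day precipitation total from a list of daily values. The function name `r5d` and the daily values are hypothetical illustrations; they are not taken from the tutorial data, and the exact rules used to build the R5D.xls file are described in LabData_info.pdf.

```python
def r5d(daily_precip_mm, window=5):
    """Return the greatest total over `window` consecutive days (in mm)."""
    if len(daily_precip_mm) < window:
        raise ValueError("need at least %d daily values" % window)
    # Slide a 5-day window over the series and keep the largest sum.
    return max(
        sum(daily_precip_mm[i:i + window])
        for i in range(len(daily_precip_mm) - window + 1)
    )


# Hypothetical daily precipitation (mm) for ten consecutive days.
daily = [0, 10, 20, 0, 5, 30, 0, 0, 1, 2]
print(r5d(daily))  # 65: days 2 to 6 sum to 10 + 20 + 0 + 5 + 30
```

In practice the function would be applied to the daily series of one station and one year, giving one R5D value per station and year, as in the R5D column.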

The R5D index data are available for the whole of the 1990s, but in this tutorial we will
only use the data for 1990. Therefore, the first step of the analysis is to filter the data
and create a new table with the rows corresponding to that year. Afterwards, we will
explore the data using descriptive statistics and graphic tools in Excel and
ArcGIS/ArcMap.

2.1 Filter tool


The following steps explain how to create a new worksheet, named R5D_1990, with the
values of the R5D index corresponding to the year 1990.

1. Click in any cell of the first row of data (or select the row with the fields/variables
names).

2. Turn on the Filter tool by clicking the Filter button in the DATA menu (Figure 2).
This adds a drop-down menu to each field name, which is opened by clicking the small
arrow in each cell of the first row.

3. In the menu (small arrow) of the year field, clear the “Select All” check box and then
select 1990 (Figure 3).

4. Select all rows and columns, and then copy and paste them into a new worksheet (Figure
4).

5. Change the name of this worksheet from Sheet2 to R5D_1990 (Figure 5).

Figure 2: Filter tool from the DATA menu

Figure 3: Filtering the year to select the 1990’s data



Figure 4: Selecting all data in order to copy and paste it in a new worksheet

Figure 5: Changing the name of the worksheet

Before analysing the R5D index, we need to remove the missing values of R5D, which are
coded with the NODATA value –9999. This code indicates that daily precipitation data are
available for the corresponding year and location, but that they are not sufficient to
compute the R5D index (see the LabData_info.pdf file). This is why these missing values
were assigned a code instead of being left blank.

To delete the R5D values equal to –9999, we will use the Filter tool:

1. Turn on the Filter tool in the worksheet R5D_1990.



2. In the menu (small arrow) of the R5D field, clear the “Select All” check box and then
select –9999.

3. Delete the (only) row that is displayed after the previous step (Figure 6).

4. Turn off the Filter tool by clicking the Filter button in the DATA menu.

Figure 6: Deleting the row where R5D = –9999 in 1990
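
The two filtering operations above (keep the 1990 rows, then drop the rows coded –9999) can also be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the Excel steps: the rows below are made-up values that mimic some of the columns of R5D.xls.

```python
NODATA = -9999  # code used in R5D.xls for missing R5D values

# Hypothetical rows mimicking the columns of R5D.xls (values are made up).
rows = [
    {"ID": 1, "year": 1990, "R5D": 81.3},
    {"ID": 2, "year": 1990, "R5D": NODATA},
    {"ID": 1, "year": 1991, "R5D": 95.0},
    {"ID": 3, "year": 1990, "R5D": 49.0},
]

# Step 1: keep only the rows for 1990 (Filter tool on the year field).
rows_1990 = [row for row in rows if row["year"] == 1990]

# Step 2: drop the rows whose R5D value is the NODATA code.
r5d_1990 = [row for row in rows_1990 if row["R5D"] != NODATA]

print([row["ID"] for row in r5d_1990])  # [1, 3]
```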

2.2 Analysis ToolPak


Basic summary statistics can be easily computed in Excel using the Analysis ToolPak
add-in. The Analysis ToolPak is a free add-in for Microsoft Excel that can save time and
effort when generating statistical analyses. The following steps describe how to enable
the Analysis ToolPak:

1. In Excel 2007, click the MS Office button, and then click the Excel Options
button. In Excel 2013, select Options from the File menu.

2. Select Add-Ins in the left panel.

3. Click the Go… button.

4. In the Add-Ins dialog box, select the Analysis ToolPak item (Figure 7). Afterwards,
click the OK button.

Notes:

- If the Analysis ToolPak is not listed in the Add-Ins dialog box, click the Browse…
button to locate it.

- If a pop-up message appears informing you that the Analysis ToolPak is not installed
on the computer, click the Yes button to install it.

5. The (new) Data Analysis button is added to the Data menu (Figure 8).

Additional resources:

• See a video from YouTube on How to enable the Analysis ToolPak add-in in Excel
2007: https://www.youtube.com/watch?v=6nCP65Nbm0E [accessed: 13 April, 2016]

• See a video from YouTube on How to enable the Analysis ToolPak add-in in Excel
2013: https://www.youtube.com/watch?v=c-lp4RKxHIM&annotation_id=annotation_3061641839&feature=iv&src_vid=E_apzh8oCU8
[accessed: 13 April, 2016]

• See a video from YouTube on How to enable a Data Analysis Plug in Alternative
for Mac: https://www.youtube.com/watch?v=LRZTvAFfKEU&nohtml5=False
[accessed: 13 April, 2016]

2.3 Descriptive statistics


Descriptive statistics are used to describe the basic features of the data in a study. They
provide simple summaries about the study variables, as well as insights on the possible
relationship between them. Together with simple graphics analysis, they form the basis
of every quantitative analysis of data.

First, we will explore the data of the R5D and elev variables separately, using
univariate descriptive statistics¹. A dataset with two variables contains what is called
bivariate data. Hence, afterwards, we will investigate the possible existence of a
relationship between the R5D index and the stations’ elevation using a bivariate analysis,
which is based on the correlation coefficient and the scatterplot.

2.3.1 Univariate analysis


The following steps describe how to compute descriptive statistics using the Analysis
ToolPak:
1. Click the Data Analysis button from the Data menu (Figure 8).

2. In the Data Analysis window, select Descriptive Statistics (Figure 9).

3. Fill in the fields of the Descriptive Statistics dialog box as follows (Figure 10):
a) Click the Input Range field, and select (with the mouse) all data of the R5D and
elev variables, including the names of these variables from the first row.

b) Select the Labels in First Row option.

c) Verify that Output Options is set to New Worksheet Ply. This option specifies
that the statistics are presented in a new worksheet.

d) Select the Summary Statistics option.

e) Click the OK button.

¹ Univariate descriptive statistics means that the statistics are computed separately for each variable.

Figure 7: Add-Ins dialog box

Figure 8: Data Analysis button of the Data menu

Figure 9: Data Analysis window of the Analysis ToolPak



Figure 10: Descriptive Statistics dialog box

The descriptive statistics of the R5D index (Table 1) lead to the following conclusions
for 1990:

• The R5D index was measured at 92 [count] weather stations in the south of
Portugal.

• The distribution of the R5D index is slightly skewed with a right tail [positively
asymmetric], because the mean is greater than the median (Figure 11) and the
skewness is positive (0.904).

• The regional average of the R5D index is 88 mm [mean], and the typical
deviation from this value is 26.9 mm [standard deviation].

• The R5D index is smaller than 81.25 mm [median] at 50% of the weather stations.

• The R5D index varies widely in the south of Portugal: the minimum value
was observed in Castro Verde (49 mm) and the maximum in Comporta (184.1
mm).

The descriptive statistics of the weather stations’ elevation (Table 1) lead to the
following conclusions:

• The distribution of the stations’ elevation is slightly positively asymmetric (i.e.,
skewed with a right tail).

• The average elevation is 188 m, and the typical deviation from this value
is 113.5 m.

• 50% of the weather stations are located at an elevation below 174.5 m.

• The stations’ elevation varies widely in the south of Portugal: the lowest
station is located in Montevil (5 m) and the highest in São Julião (530 m).

Table 1: Summary statistics of the R5D index in 1990 and stations’ elevation

                      R5D        Elevation
Mean                  88.004     188.098
Standard Error        2.802      11.829
Median                81.25      174.5
Mode                  115.7      170
Standard Deviation    26.872     113.465
Sample Variance       722.107    12874.199
Kurtosis              0.809      0.359
Skewness              0.904      0.665
Range                 135.1      525
Minimum               49         5
Maximum               184.1      530
Sum                   8096.4     17305
Count                 92         92

Figure 11: Illustration of the relationship between the mean, the median and the mode
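
Several entries of Table 1 are linked by simple formulas, which offers a quick consistency check of the Excel output. The short Python sketch below recomputes the mean, the range and the standard error of the R5D index from other values copied from Table 1; the variable names are illustrative only.

```python
import math

# Values copied from Table 1 (R5D index, 1990).
count, total = 92, 8096.4
minimum, maximum = 49.0, 184.1
std_dev = 26.872

mean = total / count                       # Mean = Sum / Count
value_range = maximum - minimum            # Range = Maximum - Minimum
std_error = std_dev / math.sqrt(count)     # Standard Error = SD / sqrt(Count)

print(round(mean, 3), round(value_range, 1), round(std_error, 3))
# 88.004 135.1 2.802
```

The three results agree with the Mean, Range and Standard Error rows of Table 1.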

NOTES:
The value of the Mode computed by Excel is not meaningful for continuous data. The mode
is the most frequent value in the data set, but with continuous data it is likely that no
single value occurs more often than the others. For continuous data, the mode is better
represented by the highest bar in a histogram. Normally, the mode is used for categorical
data, where we wish to know which category is the most common.

The mean is dragged in the direction of extreme values in the tail of the distribution
(skewed distributions). The more skewed the distribution, the greater the difference
between the median and mean, and the greater emphasis should be placed on using the
median as opposed to the mean. The best measure of central tendency with respect to
the different types of variables is shown in Table 2.
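
The two notes above can be illustrated with a small made-up sample, using only Python's standard `statistics` module. The values below are hypothetical; they were chosen so that one large value forms a right tail and no value repeats, as is typical of continuous data.

```python
import statistics

# Hypothetical right-skewed sample (mm): one large value forms the right tail.
sample = [50.2, 61.7, 70.4, 75.9, 81.3, 88.6, 184.1]

mean = statistics.mean(sample)
median = statistics.median(sample)
# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(sample)

print(mean > median)                 # True: the tail drags the mean upwards
print(len(modes) == len(sample))     # True: no value repeats, so no useful mode
```

Because every value occurs exactly once, all of them are tied as "the mode", which is why a single mode reported for continuous data should not be interpreted.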

Table 2: Summary of when to use the mean, median and mode

Type of variable                   Measure of central tendency
Categorical – nominal              Mode
Categorical – ordinal              Median
Discrete (counts; not skewed)      Mean
Discrete (counts; skewed)          Median
Continuous (not skewed)            Mean
Continuous (skewed)                Median

Additional resources:

• See a video from YouTube on How to produce a Histogram and compute
Descriptive Statistics using the Analysis ToolPak:
https://www.youtube.com/watch?v=c-lp4RKxHIM&annotation_id=annotation_3061641839&feature=iv&src_vid=E_apzh8oCU8
[accessed: 13 April, 2016]

• See this interactive tutorial to learn more about descriptive statistics and
exploratory data analysis tools: “Summarizing Distributions”, in: Online Statistics
Education: An Interactive Multimedia Course of Study, Rice University (Lead
Developer), University of Houston Clear Lake, Tufts University.
http://onlinestatbook.com/2/summarizing_distributions/summarizing_distributions.html
[accessed: 14 April, 2016]

2.3.2 Bivariate analysis


The scatterplot gives a good visual picture of the relationship between two variables (X,
Y). Each pair of values (x, y) contributes one point to the scatterplot, on which points are
plotted but not joined. The resulting pattern indicates the type and strength of the
relationship between the two variables (Figure 12). Therefore, the scatterplot aids the
interpretation of the correlation coefficient (or regression model).

When the points cluster along a straight line, the relationship is called linear. If the
points cluster along a curved line, it is called nonlinear. The strength
of the relationship is depicted by the cloud of points in the scatterplot. The tighter the
points cluster about a line (linear or nonlinear), the stronger the relationship between
the two variables. If the points are not clustered very closely about a line, the association
is weaker. Diffuse clouds of points indicate the absence of a relationship between the
two variables. When one variable (Y) increases with the second variable (X), we say that
X and Y have a positive association. Conversely, when Y decreases as X increases, we say
that they have a negative association.

Figure 12: Scatterplots illustrating different types of association between two variables

The Pearson product-moment correlation coefficient is a measure of the strength of the
linear relationship between two variables. It is referred to as Pearson's correlation or
simply as the correlation coefficient. If the relationship between the variables is not
linear, then the correlation coefficient does not adequately represent the strength of the
relationship between the variables. The sample correlation value can range from -1 to 1.
A value of -1 indicates a perfect negative linear relationship between variables, a value
of 0 indicates no linear relationship between variables, and a value of 1 indicates a
perfect positive linear relationship between variables. For further details, see the
additional resources section below.
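
As a minimal sketch of why the correlation coefficient only captures linear relationships, the Python code below computes Pearson's r for two made-up datasets: one perfectly linear and one perfectly curved. The function name `pearson_r` and the data are hypothetical; the formula is the standard ratio between the sum of cross-products of the deviations and the square root of the product of the sums of squared deviations.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]      # perfect positive linear relationship
curved = [x ** 2 for x in xs]         # perfect, but nonlinear, relationship

print(round(pearson_r(xs, linear), 6))   # 1.0
print(round(pearson_r(xs, curved), 6))   # 0.0
```

The second result shows that a coefficient of 0 does not rule out an association: y is completely determined by x, yet the relationship is not linear. This is why the scatterplot should always accompany the coefficient.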

The following steps describe how to compute the correlation coefficient between the
R5D index and the other quantitative variables (x-longitude, y-latitude, elev-elevation),
using the Analysis ToolPak:
1. Delete the year column, because it is redundant.

2. Click the Data Analysis button from the Data menu (Figure 8).

3. In the Data Analysis window, select Correlation.

4. Fill in the fields of the Correlation dialog box as follows (Figure 13):
a) Click the Input Range field, and select (with the mouse) all data of the x, y,
R5D and elev variables, including the names of these variables from the first
row.
b) Select the Labels in First Row option.
c) Verify that Output Options is set to New Worksheet Ply.
d) Click the OK button.

Figure 13: Correlation dialog box

The correlation between the R5D index and the other quantitative variables is low
(Table 3; the strongest is –0.31, with x), which indicates either that the relationships
are not linear or that there is no association between the variables. Since R5D is a
precipitation index, we would expect the R5D values to increase with elevation (positive
association). To investigate the relationship between the R5D index and the stations’
elevation further, we produce the scatterplot of these two variables, which shows no
association between them (Figure 14).

Table 3: Correlation matrix

        x          y          R5D        elev
x       1
y       0.2774     1
R5D     -0.3100    -0.1581    1
elev    0.3987     0.2282     -0.0063    1

Figure 14: R5D index versus the weather stations’ elevation



Additional resources:

• See this interactive tutorial to learn more about exploring the possible
relationship between two variables: “Describing Bivariate Data”, in: Online
Statistics Education: An Interactive Multimedia Course of Study, Rice University
(Lead Developer), University of Houston Clear Lake, Tufts University.
http://onlinestatbook.com/2/describing_bivariate_data/bivariate.html [accessed:
14 April, 2016]

3 EXPLORATORY SPATIAL DATA ANALYSIS IN ARCGIS


The following describes how to add the study area boundary and the R5D data to
ArcGIS/ArcMap.

1. Add the StudyArea shapefile (Limit folder) by clicking the Add Data button.

2. Add the Excel table with the R5D data by selecting File > Add Data > Add XY
Data....

3. In the Add XY Data... dialog box, browse for the R5D.xls file on your computer, and
select the R5D$ worksheet. The X and Y fields are automatically recognised (Figure
15).

4. Click the OK button in the Table Does Not Have Object-ID Field pop-up window.

Figure 15: Add XY Data... dialog box

3.1 Query Builder tool


As in the previous analysis in Excel, we first need to filter the data in order to
select the year 1990 and the R5D values that are different from –9999. In
ArcGIS/ArcMap, queries are used to select a subset of features and table records. All
query expressions in ArcGIS Pro use Structured Query Language (SQL) to formulate these
search specifications. The Query Builder may be used somewhat like a wizard, as it
allows you to use buttons and lists to construct your query²:
- You can construct valid SQL queries regardless of your data source.
- You can build common queries with no prior knowledge of SQL.
- The conditional operators are filtered based on the chosen field type.

There are a few ways you can gain access to the Query Builder if you need to perform a
query on your feature layer or table records, including the Layer Properties dialog box as
follows:

1. Right-click the R5D$ Events layer in the Table of Contents, and select Properties… to
open the Layer Properties dialog box.

2. On the Definition Query tab, click the Query Builder… button. Proceed as follows,
or type the query directly: `year` = 1990 AND `R5D` <> -9999 (Figure 16).

a) Double-click the ‘year’ attribute.

b) Click the “=” sign.

c) Click the Get Unique Values button.

d) Double-click “1990” in the list of years.

e) Click the “And” operator.

f) Double-click the ‘R5D’ attribute.

g) Click the “<>” sign. This is equivalent to the ≠ sign.

h) Click the Get Unique Values button.

i) Double-click “-9999” in the list of R5D values.

3. Click the OK button in the Query Builder and then in the Layer Properties dialog box.

² “Write a query in the query builder”, ArcGIS Pro Tool reference,
http://pro.arcgis.com/en/pro-app/help/mapping/navigation/write-a-query-in-the-query-builder.htm.
[Accessed: 15 April 2016]

Figure 16: Query Builder tool

If you open up the Attribute Table (right-click the R5D$ Events layer, and click Open
Attribute Table), you can see that all 92 records correspond to the year 1990, and that
all R5D values are different from –9999.
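
The definition query is an ordinary SQL WHERE clause, so its effect can be tried outside ArcMap. The sketch below uses Python's built-in `sqlite3` module on an in-memory table with made-up records; only the column names follow R5D.xls. Note that field delimiters (such as the backquotes used in the Query Builder expression above) depend on the data source, so plain field names are assumed here.

```python
import sqlite3

# In-memory table with hypothetical records; only the column names follow R5D.xls.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE r5d (ID INTEGER, year INTEGER, R5D REAL)")
con.executemany(
    "INSERT INTO r5d VALUES (?, ?, ?)",
    [(1, 1990, 81.3), (2, 1990, -9999), (1, 1991, 95.0), (3, 1990, 49.0)],
)

# Same conditions as the definition query built in the Query Builder.
where_clause = "year = 1990 AND R5D <> -9999"
selected = con.execute(
    "SELECT ID, year, R5D FROM r5d WHERE " + where_clause + " ORDER BY ID"
).fetchall()
con.close()

print(selected)  # [(1, 1990, 81.3), (3, 1990, 49.0)]
```

Both conditions must hold for a record to be kept, which is why the 1990 record coded -9999 and the 1991 record are excluded.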

3.2 Exporting to shapefile and enabling Extensions


The interactive feature (brushing the histogram bars) of the Histogram tool in
Geostatistical Analyst is not available for the R5D$ Events layer, so we need to export
these data to a Shapefile or Feature Class. Before using that tool, we also need to enable
the Geostatistical Analyst extension and the corresponding toolbar.

The following describes how to export the previously selected data in the R5D$ Events
layer to a Shapefile (or Feature Class):

1. Right-click the R5D$ Events layer, and select Data > Export Data….

2. Click the Browse button, and navigate to the location where you want to store
the shapefile.

3. Change the Save as type drop-down menu to Shapefile. Type the following name for
the new shapefile: R5D_1990.

4. Click the OK button.

5. Click the Yes button to add the shapefile as a layer to the current map.

After adding the R5D_1990 shapefile as a layer to the map, the R5D$ Events layer can be
removed: right-click it and select Remove.

The following describes how to enable the Geostatistical Analyst extension:

1. Select Extensions from the Customize menu.

2. The Extensions dialog box lists the extensions currently installed on your system that
work with the application you are using (i.e., ArcMap). Extensions are listed in this
dialog box whether or not you have registered them and whether or not licenses are
currently available for them on your License Manager. To enable the Geostatistical
Analyst extension, check the box next to it.

3. Click the Close button.

Figure 17: Extensions dialog box

Enabling an extension does not cause the extension's user interface to appear
automatically; it simply enables any controls that the extension provides. If the
extension's controls are on a toolbar, such as the ArcGIS Geostatistical Analyst extension
toolbar, you will still need to display the toolbar by choosing it from the Toolbars pull-
right menu in the Customize menu (Figure 18).

Figure 18: Geostatistical Analyst toolbar activation



Additional resources:

• This YouTube video introduces beginners to ArcMap 10.
Topics include opening projects, the organization of the data view, adding data to
the project, using tools and toolbars, and saving projects in different ArcMap
version formats. A Basic Introduction to ArcMap 10:
https://www.youtube.com/watch?v=hqHCJUudPvs&list=PL63EB94891DE02AA9
[accessed: 15 April, 2016]

3.3 ESDA tools


The ESDA environment of ArcGIS Geostatistical Analyst allows you to graphically
investigate your dataset to gain a better understanding of it. Each ESDA tool provides a
different view of the data and is displayed in a separate window. The different views are
Histogram, Voronoi Map, Normal QQPlot, Trend Analysis, Semivariogram/Covariance
Cloud, General QQPlot, and Crosscovariance Cloud. All views interact with one another
and with the ArcMap map. In this tutorial we will focus only on the use of the Histogram
and Voronoi Map tools. For further details, refer to the following tutorial, which is
provided as a PDF file:

Johnston K, Ver Hoef JM, Krivoruchko K, Lucas N (2001) Using ArcGIS™
Geostatistical Analyst. ESRI, Redlands (California), USA, Chapter 4 (pp. 81-90, 96-105).

Besides the Regional Histogram (Histogram tool) and the Thiessen polygons (Voronoi
Map tool), the following sections illustrate the use of Data Posting, Indicator Maps, and
the Local and Global Moran's I statistics.

3.3.1 Data posting


ArcGIS includes many layer display options that are used to portray geographic
information. There are numerous ways to represent layers using symbols, colours, and
labels. This tutorial will cover how to use graduated colours, and how to display data
values as labels next to the data points in the map.

Proceed as follows to display the R5D values using graduated colours (Figure 19 and
Figure 20):

1. Right-click the R5D_1990 layer in the Contents pane, and select Properties….

2. Click the Symbology tab on the Layer Properties dialog box.

3. Click Quantities, then click Graduated colors.

4. In the Value field select R5D, which is the numeric field that contains the
quantitative data we want to map.

5. Optionally, change the classification method and the number of classes.


- The default classification method is Natural Breaks, which seeks to reduce the
variance within classes and maximize the variance between classes.

- The Quantile method displays the distribution of values in categories with an
equal number of observations in each category.

- The Equal Interval method sets the value ranges in each category equal in
size. The entire range of data values is divided equally into however many
categories have been chosen.

6. Optionally, select a Normalization field to normalize the data. The values in this field
will be used to divide the Value field to create ratios.

7. Click the Ok button.
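
To make the difference between the Equal Interval and Quantile methods concrete, the following minimal Python sketch computes the class breaks of both methods for a small set of made-up values. The sample values and the helper names `equal_interval_breaks` and `quantile_breaks` are assumptions made for illustration; they are neither the R5D data nor ArcGIS functions. Natural Breaks is left out because it requires an optimisation step.

```python
import statistics

def equal_interval_breaks(values, n_classes):
    """Upper class limits that split the data range into equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes + 1)]

def quantile_breaks(values, n_classes):
    """Upper class limits that put roughly the same number of values in each class."""
    cuts = statistics.quantiles(values, n=n_classes, method="inclusive")
    return cuts + [max(values)]

# Hypothetical, right-skewed sample (not the tutorial's R5D data)
sample = [50, 55, 60, 62, 65, 70, 75, 80, 90, 100, 130, 170]

print("Equal interval:", equal_interval_breaks(sample, 4))
print("Quantile:      ", quantile_breaks(sample, 4))
```

With skewed data such as this sample, the equal-interval classes at the upper end contain very few values, whereas the quantile breaks are packed closely together where the data are dense.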

Figure 19: Setting symbol properties for graduated colours example.


Source: ESRI, “Using graduated colors”, ArcGIS Resource Center,
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Using_graduated_colors/00s500000029000000/.
[Accessed: 15 April 2016]

Figure 20: Setting graduated colours for R5D data posting

The map of graduated colours provides a first insight into the spatial distribution of the
R5D index (Figure 21):

- The spatial distribution of the R5D index is fairly homogeneous in the centre of
the study region, varying between 49 mm and 104.7 mm, with the exception of
two locations with higher values.

- The highest values are located in the southern part (Algarve region), in the
northwest corner (Setubal and Troia peninsulas), and in the northeast corner (S.
Mamede mountain range).

- The lowest values are located inland, in the centre of the study region.

- There is no apparent trend over the study domain.

- There is no evidence of outliers (errors or anomalies), because the highest values
are located in mountainous or coastal areas, which is consistent with the R5D
definition. Moreover, the daily precipitation data used to compute the R5D index
was subject to a thorough quality analysis, thus we can assume that there are no
measurement errors.

Figure 21: R5D data posting using graduated colours

Proceed as follows to display the data values as labels next to the data points (Figure
22):

1. Right-click the R5D_1990 layer in the Contents pane, and select Properties….

2. Click the Labels tab on the Layer Properties dialog box.

3. Check the box next to ‘Label features in this layer’.

4. Select R5D from the Label Field drop-down menu. Click the Ok button.

Figure 22: Setting points’ labels for R5D data posting



Adding the values of the R5D index to the points’ locations in the map (Figure 23) does
not improve our knowledge of the data distribution and patterns, because the density
of points is high. In this case, the simple use of graduated colours is more appropriate.

Figure 23: R5D data posting using points’ labels

3.3.2 Regional histogram


The ESDA environment of the Geostatistical Analyst extension allows you to graphically
investigate your dataset to gain a better understanding of it. The views in ESDA are
interconnected by selecting (brushing) and highlighting the selected points on all maps
and graphs (linking). The following illustrates the use of the Histogram tool to explore
the R5D index data:

1. Click the Geostatistical Analyst drop-down arrow in the Geostatistical Analyst
toolbar, point to Explore Data, then click Histogram (Figure 24).

2. Check if the Layer field is set to R5D_1990. Otherwise, click the Layer drop-down
arrow and select R5D_1990.

3. Click the Attribute drop-down arrow and select R5D. You may want to resize the
Histogram dialog box so you can also see the map.

Figure 24: Histogram on the Explore Data menu of the Geostatistical Analyst toolbar
Source: ESRI, “Exercise 2: Exploring your data”, ArcGIS Resource Center, Desktop 10,
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Exercise_2_Exploring_your_data/0031000000p8000000/.
[Accessed: 15 April 2016]

The distribution of the R5D attribute is depicted in the histogram (Figure 25) with the
range of values separated into 10 classes (number of Bars). The frequency of data within
each class is represented by the height of each bar.

Generally, the important features of the distribution are its central value, spread, and
symmetry. The histogram indicates that the data is unimodal (one hump) and slightly
asymmetric, as expected from the exploratory data analysis in Excel. The right tail of the
distribution indicates the presence of a few sample points with large R5D values.
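
As a rough illustration of what the Histogram tool displays, the sketch below assigns values to 10 equal-width bars and compares the mean with the median, which is one simple way to spot a right-skewed distribution. The sample values and the function name `histogram_counts` are assumptions made for illustration; they are not the R5D data, and the tool's own binning rules may differ in detail.

```python
import statistics

def histogram_counts(values, n_bars=10):
    """Count how many values fall in each of n_bars equal-width classes.

    Assumes the values are not all identical (the class width must be > 0).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bars
    counts = [0] * n_bars
    for v in values:
        # The maximum value is placed in the last bar rather than in a new one.
        index = min(int((v - lo) / width), n_bars - 1)
        counts[index] += 1
    return counts

# Hypothetical right-skewed values in mm (not the tutorial's R5D data)
sample = [50, 55, 60, 62, 65, 70, 75, 80, 90, 100, 130, 170]

counts = histogram_counts(sample)
mean, median = statistics.mean(sample), statistics.median(sample)
print("bar counts:", counts)
print(f"mean = {mean:.1f}, median = {median:.1f}")
```

A mean larger than the median, together with a few isolated bars on the right, is the numerical counterpart of the long right tail described above.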

While keeping the Ctrl key pressed, click the histogram bars with R5D values ranging
from 1.44 to 1.84 (144 to 184 mm). Note that the x-axis values are shown multiplied by
10^-2 (i.e., divided by 100) to make them easier to read. The sample points within this range are
selected on the map (Figure 25). These sample points are located within the Troia
peninsula (northwest corner) and Algarve (southern area). Clicking in the white
background of the histogram clears the selection.

Figure 25: Regional histogram and descriptive statistics of the R5D index

Click the Select Features button in the Tools toolbar. Select a few points by
dragging the mouse over the map. The bars corresponding to the selected observations
are highlighted in the histogram (Figure 26). This functionality is useful to investigate the
possible existence of spatial regimes. For example, a proportional effect might be expected
in the southern area, because this is where the R5D index exhibits higher values. However,
the points in this area are spread throughout the histogram, thus there is no evidence of a
proportional effect in this area.

Descriptive statistics are also depicted in the upper-right corner of the Histogram
window. Besides the summary statistics previously discussed in the Exploratory data
analysis in Excel section, the 1st and 3rd quartile are also presented:

- The 1st quartile equal to 67.4 means that the R5D index is smaller than 67.4 mm
in 25% of the weather stations (sample points).

- The 3rd quartile equal to 107.95 means that the R5D index is smaller than 107.95
mm in 75% of the weather stations (sample points).

Note that the median corresponds to the 2nd quartile (50% of the sample values are
smaller than the median).

Optionally, you can add the histogram to the layout by pressing the Add to Layout
button. Close the histogram dialog box.

Figure 26: Selecting units in the map to investigate spatial regimes



Additional resources:

- For further details, see this ArcGIS Help Library topic: “Histograms”, in: ArcGIS
Help Library, ArcGIS Resource Center, ESRI.
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Histograms/00310000000p000000/
[accessed: 15 April, 2016]

- This YouTube video shows how to use the Histogram tool in Geostatistical
Analyst. histogram: https://www.youtube.com/watch?v=ZGPmcBq4Eac
[accessed: 15 April, 2016]

3.3.3 Indicator maps


An indicator variable is an artificial binary variable created to represent an attribute with
two distinct categories/classes (0 or 1). A variable that is continuous can be made into a
binary variable by choosing a threshold, or cut-off value. It is possible to create several
indicator variables for the same dataset by choosing multiple thresholds. There is no
general rule to choose the thresholds. These values may depend on the expert
knowledge about the attribute. For example, if the attribute is a pollutant concentration,
the thresholds can be defined by different hazard or vulnerability levels. Another
approach is to set the thresholds equal to a set of percentile values of the attribute’s
distribution (e.g., the deciles3 will correspond to 9 indicator maps).

To illustrate how to create indicator maps of the R5D index, we will use the quartile
values to define the thresholds, which were determined in the previous section:

- The 1st quartile (percentile of 25%) is equal to 67.4 mm;
- The 2nd quartile (median = percentile of 50%) is equal to 81.25 mm;
- The 3rd quartile (percentile of 75%) is equal to 107.95 mm.

The indicator maps can be produced by simply using the Symbology tool, or using a more
sophisticated approach that consists in creating the indicator variables in the attribute
table.
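
The idea of deriving several indicator variables from percentile thresholds can be sketched in a few lines of Python. The example below uses the nine deciles of a made-up sample as thresholds and codes each value as 1 when it is smaller than or equal to the threshold, the same convention used for the quartiles in this tutorial. The sample values and the function name `indicator` are assumptions made for illustration.

```python
import statistics

def indicator(values, threshold):
    """Binary indicator: 1 where the value is <= threshold, 0 otherwise."""
    return [1 if v <= threshold else 0 for v in values]

# Hypothetical attribute values (not the tutorial's R5D data)
sample = [50, 55, 60, 62, 65, 70, 75, 80, 90, 100, 130, 170]

# Nine decile thresholds give nine indicator variables (one per indicator map)
deciles = statistics.quantiles(sample, n=10, method="inclusive")
indicators = [(t, indicator(sample, t)) for t in deciles]

for threshold, codes in indicators:
    share = sum(codes) / len(codes)
    print(f"threshold = {threshold:6.1f}  share of ones = {share:.2f}  {codes}")
```

Each higher threshold switches additional values from 0 to 1, so the share of ones never decreases from one indicator to the next.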

Using the Symbology tool

The following explains how to create indicator maps using the Symbology tool:

1. Copy the R5D_1990 layer, and paste it using Paste Layer(s).

2. Right-click the duplicate R5D_1990 layer in the Contents pane, and select Properties.

3. Click the Symbology tab on the Layer Properties dialog box.

4. Click Quantities > Graduated colors.

3
Deciles are similar to quartiles. While quartiles divide the sorted data into four quarters (at the 25th, 50th and 75th
percentiles), deciles divide the sorted data into ten equal parts (at the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th and
90th percentiles).

5. In the Value field select R5D, and click the Classify button.

6. In the Classification window (Figure 27): change the number of Classes to 2; change
the Classification Method to ‘Manual’; insert 67.4 (or 67,4 if your regional settings use a
decimal comma) in the first value of the Break Values field. Click Ok. Figure 28 shows the
resulting indicator map.

7. Repeat all the previous steps considering the necessary modifications for the 2nd and
3rd quartiles.

Figure 27: Setting the threshold value of the indicator map in the Classification window

Figure 28: Indicator map of the R5D index using 67.4 mm as threshold

Creating indicator variables

To create an indicator variable for each quartile proceed as follows:

1. Open the Editor toolbar (Customize menu > Toolbars > Editor).



2. Right-click the R5D_1990 layer in the Contents pane, and select Open Attribute
Table.

3. Editing the attributes of features and values in tables takes place within an edit
session. When you've completed your edits, you can save them and end the edit
session. To start an edit session for the R5D_1990 layer, select Start Editing from the
Editor button in the Editor toolbar.

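4. Click the Options button in the Table window, and select Add Field… (Figure 29). If
Add Field… is greyed out, stop the edit session, add the field, and then start editing
again, because ArcMap does not allow fields to be added during an edit session.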

5. Type in the new field name as 'Quartile1' and keep the type as Short Integer (Figure
30).

6. The new indicator variable 'Quartile1' takes the value 1 if the R5D values are smaller
than or equal to 67.4, and takes the value 0 otherwise.

a) Right click on the 'Quartile1' field name and select Field Calculator….

b) In the Field Calculator window, switch to Python parser and type (Figure 31):
1 if !R5D! <= 67.4 else 0

c) Click Ok. You can now see the values of the 'Quartile1' field equal to 1 when the
R5D record is smaller than or equal to 67.4. Otherwise, the 'Quartile1' field is equal
to 0.

d) Select Save Edits from the Editor button in the Editor toolbar.

e) Select Stop Editing from the Editor button in the Editor toolbar.

7. Repeat steps 4 to 6 with the necessary modifications for the 2nd and 3rd
quartiles. At the end of this process, the attribute table of the R5D_1990 layer should
look like Figure 32.
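
Because the same rule is applied three times with a different threshold, a small helper function can be reused. The sketch below is plain Python; in the Field Calculator it could be pasted into the Pre-Logic Script Code box (after ticking Show Codeblock) and called with an expression such as indicator(!R5D!, 81.25). The function name `indicator` and the test values are assumptions made for illustration.

```python
def indicator(r5d, threshold):
    """Return 1 if the R5D value is <= threshold, and 0 otherwise."""
    return 1 if r5d <= threshold else 0

# Quartile thresholds reported in the Histogram window (mm)
thresholds = {"Quartile1": 67.4, "Quartile2": 81.25, "Quartile3": 107.95}

# Outside ArcMap, the helper can be checked on a few hypothetical R5D values
for value in (60.0, 81.25, 120.0):
    codes = {name: indicator(value, t) for name, t in thresholds.items()}
    print(value, codes)
```

A value exactly equal to a threshold is coded as 1, which matches the "smaller than or equal to" rule of step 6.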

Figure 29: Adding a new field to an attribute table



Figure 30: Naming the new field of the attribute table

Figure 31: Python parser to code the values of an indicator variable in the Field Calculator window

Figure 32: Attribute table of the R5D_1990 layer with the new indicator variables (Quartile1, Quartile2,
Quartile3)

Now, we just need to map the three indicator variables:

1. Copy the R5D_1990 layer, and paste it using Paste Layer(s).

2. Right-click the duplicate R5D_1990 layer in the Contents pane, and select Properties.

3. Click the Symbology tab on the Layer Properties dialog box.

4. Click Categories > Unique values.

5. In the Value field select Quartile1, and click the Add All Values button.

6. Edit the Labels of the values 0 and 1, for example as in Figure 33. Change the symbol
properties to your preference. Click Ok.

7. Repeat all the previous steps considering the necessary modifications for the 2nd and
3rd quartiles.

Figure 33: Symbology of the 1st indicator variable map (1st quartile of R5D)

Using five or six indicator maps would be more informative. Nevertheless, the indicator
maps of the R5D index (Figure 34) confirm the previous conclusions. There is no
apparent trend over the study domain. The lowest values are located inland, in the
centre of the study region. The highest values are located in the southern part (Algarve
region), in the northwest corner (Setubal and Troia peninsulas), and in the northeast
corner (S. Mamede mountain range). Moreover, in the centre of the study domain, there are
two locations with high values surrounded by lower values.

Figure 34: Indicator maps of the R5D index



3.3.4 Voronoi maps


The Voronoi Map tool of ArcGIS Geostatistical Analyst helps to investigate local data
variability. A Voronoi map is created by defining Thiessen polygons around each point in
the dataset. The designations ‘Thiessen polygons’ or ‘Voronoi maps’ are both
appropriate when the statistic displayed in the polygons is the attribute value at that
point. In the field of GIS, these maps are usually referred to as Thiessen polygons, after the
American meteorologist who made frequent use of them. In other fields, particularly
mathematics and computer science, they are generally referred to as Voronoi diagrams,
in honour of the mathematician Heorhii Voronyi (also spelled Georgy Voronoy).

Many spatial statistics analysis techniques assume that data are stationary, meaning the
relationship between two points and their values depends on the distance between
them, not their exact location. Any location inside a Thiessen polygon represents the
area closer to that data point than to any other data point. This allows exploring the
variation of each sample point based on its relationship to surrounding sample points.
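
The defining property of a Thiessen polygon (every location inside it is closer to its sample point than to any other) can be checked with a brute-force nearest-point search. The following Python sketch is only an illustration of that property, not of how ArcGIS builds the polygons; the coordinates, values and the function name `nearest_sample` are assumptions.

```python
import math

def nearest_sample(location, samples):
    """Return the sample whose point is closest to `location`.

    `samples` is a list of (x, y, value) tuples. The Thiessen polygon of a
    sample is the set of locations for which this function returns it.
    """
    x, y = location
    return min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))

# Hypothetical sample points: (x, y, attribute value)
samples = [(0.0, 0.0, 60.0), (10.0, 0.0, 85.0), (5.0, 8.0, 140.0)]

# A 'Simple' Voronoi map gives each location the value of its nearest sample
for location in [(1.0, 1.0), (9.0, 2.0), (5.0, 6.0)]:
    x, y, value = nearest_sample(location, samples)
    print(location, "-> sample at", (x, y), "with value", value)
```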

The following illustrates the use of the Voronoi Map tool of ArcGIS Geostatistical Analyst
to explore the R5D index data:

1. Click the Geostatistical Analyst drop-down arrow in the Geostatistical Analyst
toolbar, point to Explore Data, and then click Voronoi Map.

2. In the Voronoi Map dialog box (Figure 35):

a) Check if the Layer field is set to R5D_1990. Otherwise, click the Layer drop-down
arrow and select R5D_1990.

b) Click the Attribute drop-down arrow and select R5D.

c) Click the Clip Layer drop-down arrow and select StudyArea.

d) Optionally, change the statistic used to assign values to the polygons in the Type
field. The default is ‘Simple’, which corresponds to the R5D value recorded at the
sample point within each polygon.

e) Optionally, you can add the Voronoi map to the layout by pressing the Add to
Layout button.

f) Optionally, use the Export… button to save the Voronoi map as a Shapefile or
Feature Class, which can be added as a layer. All the statistics in the Type field are
saved in the attribute table of the Voronoi map layer, thus they can be easily
displayed using the Symbology tool. The Simple statistic corresponds to the R5D
field of the Voronoi map layer. Try it!

3. The polygons that are selected in the tool view are linked to points in the ArcMap
data view, which are also highlighted.

4. Close the Voronoi Map dialog box.



Figure 35: Voronoi Map dialog box

The (simple) Voronoi map of the R5D index leads to the following conclusions:

- The spatial distribution of the R5D index is fairly homogeneous in the centre of
the study region, where the spatial autocorrelation pattern seems to be
anisotropic (i.e., ellipse shaped) with the major continuity direction being
south-southwest/north-northeast (points separated by a large distance in this direction
are similar, as opposed to the perpendicular direction). However, in the southern
region (Algarve), the major continuity direction seems to be west-east. This
indicates the presence of two different spatial regimes in the study domain.

If the (global) spatial autocorrelation needs to be modelled (e.g., before using
kriging interpolation), there are two alternatives. The first is to assume
that the R5D index is isotropic, so that the anisotropy patterns are not modelled.
The second is to split the study domain in two and analyse both
regions separately (stratified analysis). The latter has the limitation of creating
discontinuities at the border between the two regions (strata).

- The highest values are located in the southern part (Algarve region), in the
northwest corner (Setubal and Troia peninsulas), and in the northeast corner (S.
Mamede mountain range).

- The lowest values are located inland, in the centre of the study region.

- There is no apparent trend over the study domain.

- There is no evidence of outliers (errors or anomalies), because the highest values
are located in mountainous or coastal areas, which is consistent with the R5D
definition. This conclusion was previously stated in the Data posting section.

Nevertheless, the Cluster Voronoi map (Figure 36) depicts six polygons that
might correspond to spatial outliers. However, as stated before, the daily
precipitation data used to compute the R5D index was subject to a thorough
quality analysis, thus we can assume that there are no measurement errors, or
anomalies in the data. The R5D values of these polygons most likely reflect local
meteorological conditions that are different from the surrounding areas.

Figure 36: Cluster Voronoi map of the R5D index

3.3.5 Local Moran's I statistic


Local spatial pattern analysis tools work by considering each feature [point or polygon]
within the context of neighbouring features and determining if the local pattern (a target
feature and its neighbours) is statistically different from the global pattern (all features
in the dataset).

Given a set of features and an analysis attribute4, the Cluster and Outlier Analysis
(Anselin Local Moran's I) tool identifies statistically significant local clusters and spatial
outliers using the Anselin’s Local Moran's I statistic. This tool creates a new output
feature class with the following attributes for each feature in the input feature class:
Local Moran's I value, z-score, p-value, and COType, which is a code representing the
cluster type for each statistically significant feature (Figure 37). The cluster/outlier type
(COType) field distinguishes between a statistically significant cluster of high values (HH),
cluster of low values (LL), outlier in which a high value is surrounded primarily by low
values (HL), and outlier in which a low value is surrounded primarily by high values (LH).
Results are only reliable if the dataset contains at least 30 features.
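
To give a feeling for how the Local Moran's I value behaves, the sketch below computes one common form of the statistic for a tiny made-up transect. It is a simplified illustration under stated assumptions: the weights are inverse distance squared with a cut-off distance, the variance term excludes the target feature, and the z-scores, p-values and COType codes produced by the ArcGIS tool are not computed. The function name `local_morans_i` and the data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def local_morans_i(points, values, band):
    """Local Moran's I for each point, using inverse distance squared
    weights and ignoring neighbours farther away than `band`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    result = []
    for i in range(n):
        # Variance of all values except the target (assumed normalisation)
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        lag = 0.0
        for j in range(n):
            if j == i:
                continue
            d = math.dist(points[i], points[j])
            if 0 < d <= band:
                lag += dev[j] / d ** 2
        result.append(dev[i] / s2 * lag)
    return result

# Hypothetical transect: one high value surrounded by low values
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
values = [1.0, 1.0, 10.0, 1.0, 1.0]

for point, i_value in zip(points, local_morans_i(points, values, band=1.5)):
    kind = "similar neighbours" if i_value > 0 else "dissimilar neighbours"
    print(point, round(i_value, 3), kind)
```

In this example the isolated high value in the middle obtains a negative I (its neighbours are dissimilar), while the low values at both ends obtain a positive I (their neighbours are similar).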

4
This tool requires an input field such as a count, rate, or other numeric measurement.

Figure 37: Illustration of the Cluster and Outlier Analysis (Anselin Local Moran's I) tool
Source: ESRI, “Cluster and Outlier Analysis (Anselin Local Moran's I)”, ArcGIS for Desktop.
ArcGIS Pro, http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/cluster-and-outlier-analysis-anselin-local-moran-s.htm.
[Accessed: 21 April 2016]

A positive value for I indicates that a feature [point or polygon] has neighbouring
features with similarly high or low attribute values; this feature is part of a cluster. A
negative value for I indicates that a feature has neighbouring features with dissimilar
values; this feature is an outlier. In either instance, the p-value for the feature must be
small enough for the cluster or outlier to be considered statistically significant. By
default, they are considered statistically significant if the p-value is smaller than 0.05
(95% confidence level). This analytical approach creates issues with both multiple testing
and dependency. The following paragraphs explain these issues and how to deal with
them5:

Multiple Testing — With a confidence level of 95 percent, probability theory tells us that
there are 5 out of 100 chances that a spatial pattern could appear structured (clustered
or dispersed, for example) and could be associated with a statistically significant p-value,
when in fact the underlying spatial processes promoting the pattern are truly random.
We would falsely reject the ‘complete spatial randomness’ null hypothesis in these cases
because of the statistically significant p-values. Five chances out of 100 seems quite
conservative until you consider that local spatial statistics perform a test for every
feature in the dataset. If there are 10,000 features, for example, we might expect as
many as 500 false results.
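
The arithmetic behind this statement can be reproduced with a short Python sketch. The function names are hypothetical, and the second function assumes independent tests, which is a simplification for spatial data.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of tests that look significant purely by chance."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha=0.05):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 30, 10_000):
    print(n, expected_false_positives(n), round(prob_at_least_one(n), 3))
```

Even with only 30 features, the chance of at least one false positive is already well above the nominal 5% of a single test.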

Spatial Dependency — Features near to each other tend to be similar; more often than
not spatial data exhibits this type of dependency. Nonetheless, many statistical tests
require features to be independent. For local pattern analysis tools this is because
spatial dependency can artificially inflate statistical significance. Spatial dependency is
exacerbated with local pattern analysis tools because each feature is evaluated within
the context of its neighbours, and features that are near each other will likely share
many of the same neighbours. This overlap accentuates spatial dependency.

5
ESRI (2016). “What is a z-score? What is a p-value?”. ArcGIS Pro, Tool Reference, Spatial Statistics toolbox,
http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/what-is-a-z-score-what-is-a-p-value.htm.
[Accessed: 21 April 2016]
ESRI (2016). “Modeling spatial relationships”. ArcGIS Pro, Tool Reference, Spatial Statistics toolbox,
http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/modeling-spatial-relationships.htm.
[Accessed: 21 April 2016]

In ArcGIS 10.2 or later, the Cluster and Outlier Analysis (Anselin Local Moran's I) tool
provides an optional Boolean parameter, the False Discovery Rate (FDR) Correction,
which will potentially account for multiple testing and spatial dependency. For this
method, statistically significant p-values are ranked from smallest (strongest) to largest
(weakest), and based on the false positive estimate, the weakest are removed from this
list. The remaining features with statistically significant p-values are identified by the
COType field in the output feature class. When no FDR correction is applied, features
with p-values smaller than 0.05 are considered statistically significant. The FDR
correction reduces this p-value threshold from 0.05 to a value that better reflects the 95
percent confidence level given multiple testing. While not perfect, empirical tests show
this method performs much better than assuming that each local test is performed in
isolation, or applying the classical, overly conservative, multiple test methods (e.g.,
Bonferroni or Sidak corrections)6.
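
The ranking idea can be illustrated with the Benjamini-Hochberg step-up rule, a standard FDR procedure. Whether the ArcGIS tool applies exactly this rule is an assumption; the p-values and the function name `fdr_threshold` below are hypothetical.

```python
def fdr_threshold(p_values, alpha=0.05):
    """Largest p-value still declared significant under the
    Benjamini-Hochberg step-up rule (0.0 if none qualifies)."""
    ranked = sorted(p_values)
    m = len(ranked)
    threshold = 0.0
    for rank, p in enumerate(ranked, start=1):
        # The rank-th smallest p-value is compared with alpha * rank / m
        if p <= alpha * rank / m:
            threshold = p
    return threshold

# Hypothetical p-values from ten local tests
p_values = [0.001, 0.004, 0.012, 0.030, 0.041, 0.048, 0.20, 0.35, 0.60, 0.90]

cutoff = fdr_threshold(p_values)
uncorrected = [p for p in p_values if p < 0.05]
corrected = [p for p in p_values if p <= cutoff]
print("significant without correction:", uncorrected)
print("FDR cut-off:", cutoff, "-> significant with correction:", corrected)
```

In this made-up example six tests fall below 0.05, but only the three strongest remain significant after the correction, which illustrates how the p-value threshold is lowered.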

Distance Band or Threshold Distance sets the scale of analysis for most
conceptualizations of spatial relationships (e.g., Inverse distance and Fixed distance
band). It is a positive numeric value representing a cut-off distance. Features outside the
specified cut-off for a target feature are ignored in the analysis for that feature. The
Calculate Distance Band from Neighbor Count tool will evaluate minimum, average, and
maximum distances for a specified number of neighbours and can help you determine
an appropriate distance band value to use for analysis. See also the ESRI help topic “Selecting
a fixed distance band value” for additional guidelines. These are a few recommendations:

- Use a distance band that is large enough to ensure all features will have at least
one neighbour, or results will not be valid.

- No feature should have all other features as neighbours.

- Especially if the values for the input field are skewed, each feature should have
about eight neighbours.

- Use a distance band that reflects maximum spatial autocorrelation. Run the
Incremental Spatial Autocorrelation tool and note where the resulting z-scores
seem to peak. Use the distance associated with the peak value for your analysis.
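
The first two recommendations can be checked numerically. The Python sketch below computes the smallest distance band for which every point has at least one neighbour (the largest nearest-neighbour distance) and then counts the neighbours of each point for that band. The coordinates and function names are assumptions made for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_neighbour_distances(points):
    """Distance from each point to its closest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]

def neighbour_counts(points, band):
    """Number of other points within `band` of each point."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= band)
        for i, p in enumerate(points)
    ]

# Hypothetical point coordinates (map units)
points = [(0, 0), (3, 0), (3, 4), (10, 4), (20, 4)]

# Smallest band that leaves no feature without a neighbour
min_band = max(nearest_neighbour_distances(points))
counts = neighbour_counts(points, min_band)
print("minimum band:", min_band)
print("neighbours per point:", counts)
```

If the maximum count equals the number of points minus one, the band is too large, because some feature then has all other features as neighbours.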

Before applying the Cluster and Outlier Analysis (Anselin Local Moran's I) tool, we also
need to select an appropriate Conceptualization of Spatial Relationships7:

- Inverse distance, Inverse distance squared: most appropriate with continuous
data or to model processes where the closer two features are in space, the more
likely they are to interact/influence each other.

6
Caldas de Castro, M., & Singer, B. H. (2006). Controlling the false discovery rate: a new application to
account for multiple and dependent tests in local statistics of spatial association. Geographical Analysis,
38(2), 180-208.
7
If none of the options for the Conceptualization of Spatial Relationships parameter work well for your
analysis, you can create an ASCII text file or table with the feature-to-feature relationships you want and
then use these to build a spatial weights matrix file. If one of the options above is close, but not perfect for
your purposes, you can use the Generate Spatial Weights Matrix tool to create a basic SWM file, then edit
your spatial weights matrix file.

With this spatial conceptualization, every feature is potentially a neighbour of
every other feature, and with large datasets, the number of computations
involved will be enormous. You should always try to include a Distance Band or
Threshold Distance value when using the inverse distance conceptualizations.
This is particularly important for large datasets. If you leave the Distance Band or
Threshold Distance parameter blank, a threshold distance will be computed for
you, but this may not be the most appropriate distance for your analysis; the
default distance threshold will be the minimum distance that ensures every
feature has at least one neighbour.

- Fixed distance band: works well for point data. It is often a good option for
polygon data when there is a large variation in polygon size (very large polygons
at the edge of the study area and very small polygons at the centre of the study
area, for example), and you want to ensure a consistent scale of analysis.

- Zone of indifference: works well when fixed distance is appropriate but imposing
sharp boundaries on neighbourhood relationships is not an accurate
representation of your data. Keep in mind that the zone of indifference
conceptual model considers every feature to be a neighbour of every other
feature. Consequently, this option is not appropriate for large datasets since the
Distance Band or Threshold Distance value supplied does not limit the number of
neighbours but only specifies where the intensity of spatial relationships begins
to wane.

- Contiguity edges only, Contiguity edges corners: these polygon contiguity
conceptualizations are effective when polygons are similar in size and
distribution, and when spatial relationships are a function of polygon proximity
(the idea that if two polygons share a boundary, spatial interaction between
them increases).

- K nearest neighbours: effective when you want to ensure you have a minimum
number of neighbours for your analysis. Especially when the values associated
with your features are skewed (are not normally distributed), it is important that
each feature is evaluated within the context of at least eight or so neighbours
(this is a rule of thumb only).
When the distribution of your data varies across your study area so that some
features are far away from all other features, this method works well. Note,
however, that the spatial context of your analysis changes depending on
variations in the sparsity/density of your features. When fixing the scale of
analysis is less important than fixing the number of neighbours, the K nearest
neighbours method is appropriate.

- Delaunay triangulation: good option when your data includes island polygons
(isolated polygons that do not share any boundaries with other polygons), or in
cases where there is a very uneven spatial distribution of features. It is not
appropriate when you have coincident features, however.

Table 4 indicates how different choices for the Conceptualization of Spatial Relationships
parameter behave for each possible input type of the Distance Band or Threshold
Distance.

Table 4: Interaction of the Conceptualization of Spatial Relationships parameter with possible Distance
Band or Threshold Distance values
Adapted from: ESRI (2016). “Modeling spatial relationships”. ArcGIS Pro, Tool Reference, Spatial
Statistics toolbox, http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/modeling-spatial-relationships.htm.
[Accessed: 21 April 2016]
| Distance Band / Threshold Distance | Inverse Distance, Inverse Distance Squared | Fixed Distance Band, Zone of Indifference | Polygon Contiguity, Delaunay Triangulation, K Nearest Neighbours |
|---|---|---|---|
| 0 | No threshold or cut-off is applied; every feature is a neighbour of every other feature. | Invalid. Runtime error will be generated. | Ignored. |
| blank | A default distance will be computed. This default will be the minimum distance to ensure that every feature has at least one neighbour. | A default distance will be computed. This default will be the minimum distance to ensure that every feature has at least one neighbour. | Ignored. |
| positive number | The nonzero, positive value specified will be used as a cut-off distance; neighbour relationships will only exist among features within this distance of each other. | For fixed distance band, only features within this specified cut-off of each other will be neighbours. For zone of indifference, features within this specified cut-off of each other will be neighbours; features outside the cut-off are neighbours too, but they are assigned a smaller and smaller weight/influence as distance increases. | Ignored. |
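
The behaviour summarised in Table 4 can be mimicked with a small weight function. The Python sketch below is an illustration only: the function name, the model labels and, in particular, the decay used outside the band for the zone of indifference are assumptions, not the formulas implemented in ArcGIS. The "blank" case (a default distance computed by the tool) is not covered.

```python
def spatial_weight(distance, model, band=0.0):
    """Weight between two features `distance` apart (distance > 0)."""
    if model in ("inverse_distance", "inverse_distance_squared"):
        if band > 0 and distance > band:
            return 0.0                      # beyond the cut-off: not a neighbour
        power = 1 if model == "inverse_distance" else 2
        return 1.0 / distance ** power      # band == 0: no cut-off applied
    if band <= 0:
        # Mirrors the "Invalid" entry of Table 4 for a band of 0
        raise ValueError("a positive distance band is required for " + model)
    if model == "fixed_distance_band":
        return 1.0 if distance <= band else 0.0
    if model == "zone_of_indifference":
        # Full weight inside the band; the decay outside it (band / distance)
        # is an illustrative assumption, not the ArcGIS formula.
        return 1.0 if distance <= band else band / distance
    raise ValueError("unknown model: " + model)

models = ("inverse_distance_squared", "fixed_distance_band", "zone_of_indifference")
for d in (50.0, 100.0, 200.0):
    print(d, [round(spatial_weight(d, m, band=100.0), 5) for m in models])
```

With a band of 100, a feature 200 units away receives no weight under the first two models but still a reduced weight under the zone of indifference, which is why that option does not limit the number of neighbours.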

Considering that the R5D index corresponds to a continuous variable, we will select
Inverse Distance Squared for the Conceptualization of Spatial Relationships. We will use
the Incremental Spatial Autocorrelation tool to select an appropriate Threshold
Distance. This tool measures spatial autocorrelation for a series of distances and
optionally creates a line graph of those distances and their corresponding z-scores of the
Global Moran’s I statistic. Z-scores reflect the intensity of spatial clustering, and
statistically significant peak z-scores indicate distances where spatial processes
promoting clustering (i.e., positive spatial autocorrelation) are most pronounced. These
peak distances are often appropriate values to use in tools that require a Distance Band
or Threshold Distance parameter.

When more than one statistically significant peak is present, clustering is pronounced at
each of those distances. Select the peak distance that best corresponds to the scale of
analysis you are interested in; often this is the first statistically significant peak
encountered (Figure 38).

If you are working with point data and the z-score never peaks (in other words, it just
keeps decreasing), it means there are many different spatial processes operating at a
variety of spatial scales and you will likely need to come up with different criteria for
determining the fixed distance to use in your analysis.
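
The peak-selection rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the sample distances and z-scores are hypothetical, and the critical value of 1.96 assumes a two-sided test at the 95% confidence level. A peak is taken to be an interior point whose z-score is higher than both of its neighbours.

```python
def first_peak(distances, z_scores, critical_z=1.96):
    """Return (distance, z) of the first significant local maximum, or None."""
    for k in range(1, len(z_scores) - 1):
        is_peak = z_scores[k] > z_scores[k - 1] and z_scores[k] > z_scores[k + 1]
        if is_peak and z_scores[k] > critical_z:
            return distances[k], z_scores[k]
    return None  # no interior peak, e.g. a monotonic curve

# Hypothetical distances (metres) and z-scores, for illustration only
distances = [10000, 20000, 30000, 40000, 50000, 60000]
z_scores = [2.1, 3.4, 5.0, 6.2, 5.8, 5.1]
print(first_peak(distances, z_scores))  # (40000, 6.2)

# A curve that never peaks yields None
print(first_peak([10000, 20000, 30000, 40000], [6.0, 5.0, 4.0, 3.0]))  # None
```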

Figure 38: Illustration of z-score peaks in the Spatial Autocorrelation by Distance graph.
Source: ESRI (2016). “Incremental Spatial Autocorrelation”. ArcGIS Pro, Tool Reference, Spatial Statistics
toolbox > Analyzing Patterns toolset,
http://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/incremental-spatial-autocorrelation.htm.
[Accessed: 21 April 2016]

Proceed as follows to run the Incremental Spatial Autocorrelation tool:

1. Open ArcToolbox, and browse to Spatial Statistics Tools > Analyzing Patterns.

2. Double-click Incremental Spatial Autocorrelation.

3. In the Incremental Spatial Autocorrelation dialog-box (Figure 39):

a) Select the R5D_1990 layer in the Input Features field. This is the input
feature class.

b) Select R5D in the Input Field field. This is the analysis attribute.

c) Uncheck the ‘Row Standardization’ parameter, because row standardization is more
appropriate for polygon features.

d) Select a file name (e.g., Incremental_Spatial_Autocorrelation.pdf) and a
location on your computer to save the Output Report File (optional).

e) Keep the default values of all other fields.

4. Click the OK button.

Notes: Before using this tool, be sure to project your data if your study area extends
beyond 30 degrees.
If the tool fails to execute and returns the message "Error 001143 background
server threw an exception", disable background processing and run the tool
again:

1) Open the Geoprocessing menu (in the Standard toolbar) and select Geoprocessing Options.

2) Uncheck ‘Enable’ in Background Processing.

Figure 39: Incremental Spatial Autocorrelation dialog-box

In the Output Report File (Incremental_Spatial_Autocorrelation.pdf), the Spatial
Autocorrelation by Distance graphic (Figure 40) exhibits a single peak at 42409.39
meters. Hence, we will set the Distance Band or Threshold Distance parameter to 42500
(rounded up to be safe).
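
The rounding step can be written out explicitly. In the minimal sketch below, the helper name and the rounding step of 500 meters are assumptions made for illustration; the tutorial only states the resulting value.

```python
import math

def round_up(value, step):
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step

peak_distance = 42409.39  # meters, peak reported in the Output Report File
threshold = round_up(peak_distance, 500)  # the 500 m step is an assumption
print(threshold)  # 42500
```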

If we run the Incremental Spatial Autocorrelation tool again, but with the Beginning
Distance parameter equal to 20000, we obtain the graphic in Figure 41. However, the
Output Report File also provides the information that “At least one distance increment
resulted in features with no neighbours which may invalidate the significance of the
corresponding results” for the first three points (20000, 30894.87 and 41789.75 meters). In
other words, the z-score might not be significant for at least one of these points.
Therefore, the previous value found seems to be adequate.
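
To see why short distances can trigger the "no neighbours" warning, the following Python sketch lists the features that have no other feature within a given distance, and computes the smallest distance at which every feature has at least one neighbour. Function names and coordinates are hypothetical; this is a brute-force illustration, not the tool's own algorithm.

```python
import math

def features_without_neighbours(points, distance):
    """Return indices of points with no other point within `distance`."""
    isolated = []
    for i, p in enumerate(points):
        has_neighbour = any(
            i != j and math.dist(p, q) <= distance
            for j, q in enumerate(points)
        )
        if not has_neighbour:
            isolated.append(i)
    return isolated

def min_distance_all_have_neighbour(points):
    """Largest nearest-neighbour distance over all points."""
    return max(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )

# Hypothetical projected coordinates, in meters
points = [(0, 0), (15000, 0), (15000, 20000), (70000, 20000)]
print(features_without_neighbours(points, 20000))  # [3]
print(min_distance_all_have_neighbour(points))     # 55000.0
```

Any distance increment below the second value leaves at least one feature without neighbours, which is the situation the report warns about.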

Figure 40: Spatial autocorrelation of the R5D index by distance (z-scores of the Global Moran’s I statistic)

Figure 41: Spatial autocorrelation of the R5D index by distance (z-scores of the Global Moran’s I statistic)
with the Beginning Distance parameter equal to 20000 meters

Finally, we have all the information we need to use the Cluster and Outlier Analysis
(Anselin Local Moran's I) tool:

1. Open ArcToolbox, and browse to Spatial Statistics Tools > Mapping Clusters.

2. Double-click Cluster and Outlier Analysis (Anselin Local Moran's I).

3. In the Cluster and Outlier Analysis (Anselin Local Moran's I) dialog-box (Figure 42):

a) Select the R5D_1990 layer in the Input Feature Class field.

b) Select R5D in the Input Field field. This is the analysis attribute.

c) In the Output Feature Class field, click the Browse button and navigate
to the location where you want to store the resulting feature class. Save it as a
shapefile named R5D_LocalMoran.

d) Select INVERSE_DISTANCE_SQUARED in the Conceptualization of Spatial
Relationships field.

e) Enter 42500 in the Distance Band or Threshold Distance (optional) field.

f) In ArcGIS 10.2 or later, check the False Discovery Rate (FDR) Correction
option.

g) Keep the default values of all other fields.

4. Click the OK button.
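
For readers who want to see what the tool computes conceptually, here is a minimal Python sketch of a local Moran's I calculation with the usual cluster/outlier labels (HH, LL, HL, LH). It follows one common formulation of the statistic; the scaling term used by ArcGIS may differ slightly, and the significance test (z-scores, p-values) is deliberately omitted. The function name, the sample values and the simple adjacency weights are hypothetical.

```python
def local_morans_i(values, weights):
    """Return a list of (local I, label) tuples, one per feature."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment of the attribute
    results = []
    for i in range(n):
        # Weighted sum of the neighbours' deviations from the mean
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        local_i = dev[i] / m2 * lag
        if dev[i] == 0 or lag == 0:
            label = "NA"
        elif dev[i] > 0:
            label = "HH" if lag > 0 else "HL"  # high value among high / low
        else:
            label = "LL" if lag < 0 else "LH"  # low value among low / high
        results.append((local_i, label))
    return results

# Hypothetical example: six features along a line, adjacent features are neighbours
values = [9.0, 10.0, 8.0, 1.0, 2.0, 1.0]
weights = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
labels = [label for _, label in local_morans_i(values, weights)]
print(labels)  # ['HH', 'HH', 'HH', 'LL', 'LL', 'LL']
```

A positive local I corresponds to a cluster (HH or LL) and a negative one to a potential spatial outlier (HL or LH); only statistically significant features would be mapped as such.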

Results show that there are no spatial outliers (Figure 43). A cluster of low values is
located in the centre of the study domain, and two clusters of high values are located in
the southern area (Algarve), more specifically near the Monchique (west) and Caldeirão
(east) mountain ranges. A single point in the northwest corner (Troia peninsula)
corresponds to a third cluster of high values, which confirms that this extreme value is
not an outlier. The remaining 69 points are not statistically significant, which means that
we do not have enough evidence to reject the ‘complete spatial randomness’ hypothesis
with 95% confidence. Note that these results were obtained without the False Discovery
Rate (FDR) Correction, which is not available in ArcGIS 10.1.
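
To give an idea of how an FDR correction changes which features count as significant, the sketch below applies the Benjamini-Hochberg procedure, a standard FDR method, to a handful of p-values. Whether ArcGIS implements exactly this procedure is an assumption, and the function name and p-values are hypothetical.

```python
def fdr_significant(p_values, alpha=0.05):
    """Benjamini-Hochberg: True where a p-value stays significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

# Hypothetical p-values for five features
p_values = [0.001, 0.012, 0.045, 0.035, 0.20]
print([p < 0.05 for p in p_values])  # uncorrected: four features significant
print(fdr_significant(p_values))     # corrected: only the two strongest remain
```

The corrected test is stricter, so running the tool with the FDR option checked can be expected to flag the same or fewer features than the results discussed above.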

Figure 42: Cluster and Outlier Analysis (Anselin Local Moran's I) dialog-box

Figure 43: Anselin’s Local Moran's I map

3.3.6 Global Moran's I statistic


The Spatial Autocorrelation (Global Moran's I) tool calculates the Moran's I statistic
value and both a z-score and p-value to evaluate the significance of that statistic. Note
that results are not reliable with fewer than 30 features (points or polygons).

Given a set of features and an associated attribute, this tool evaluates whether the
pattern expressed is clustered, dispersed, or random. When the z-score or p-value
indicates statistical significance, a positive Moran's I index value indicates tendency
toward clustering while a negative Moran's I index value indicates tendency toward
dispersion.

In general, the Global Moran's I statistic is bounded by –1 and 1. This is always the case
when your weights are row standardized. When you do not row standardize the
weights, there may be instances where the statistic value falls outside the –1 to 1 range,
and this indicates a problem with your parameter settings. For polygon features, you will
almost always want to row standardize.
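
The statistic and the effect of row standardization can be illustrated with a short Python sketch. It uses the textbook form of Global Moran's I (n divided by the sum of all weights, times the weighted cross-product of deviations over the sum of squared deviations). Function names, sample values and weights are hypothetical, and the significance test is not included.

```python
def row_standardize(weights):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result

def global_morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n) for j in range(n) if i != j
    )
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical example: six features along a line, adjacent features are neighbours
values = [9.0, 10.0, 8.0, 1.0, 2.0, 1.0]
weights = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
i_raw = global_morans_i(values, weights)
i_std = global_morans_i(values, row_standardize(weights))
print(round(i_raw, 3), round(i_std, 3))
```

Both values are positive here because high values sit next to high values and low next to low, which is the clustering pattern the statistic is designed to detect.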

Similarly to the Cluster and Outlier Analysis (Anselin Local Moran's I) tool, we must
choose an appropriate Conceptualization of Spatial Relationships method, as well as the
Distance Band or Threshold Distance. All previous recommendations on these issues are
valid.

Proceed as follows to use the Spatial Autocorrelation (Morans I) tool with the R5D index
data:

1. Open ArcToolbox, and browse to Spatial Statistics Tools > Analyzing Patterns.

2. Double-click Spatial Autocorrelation (Morans I).

3. In the Spatial Autocorrelation (Morans I) dialog-box (Figure 44):



a) Select the R5D_1990 layer in the Input Features field. This is the input
feature class.

b) Select R5D in the Input Field field. This is the analysis attribute.

c) Check the Generate Report (optional) option. The path to the HTML report
will be included with the messages summarising the tool execution
parameters (see footnote 8). If the Results window is closed, open it from the
Geoprocessing menu by selecting Results.

d) Select INVERSE_DISTANCE_SQUARED in the Conceptualization of Spatial
Relationships field.

e) Enter 42500 in the Distance Band or Threshold Distance (optional) field.

f) Keep the default values of all other fields.

4. Click the OK button.

Figure 44: Spatial Autocorrelation (Morans I) dialog-box

For the R5D index data, the Global Moran's I value of 0.51 is statistically significant
(p-value < 0.05; Table 5), which provides evidence that the spatial distribution of high
values and/or low values in the dataset is more spatially clustered than would be
expected if the underlying spatial processes were random.

8 The HTML report is saved in the default folder. For example:
C:\...\My Documents\ArcGIS\Default.gdb\MoransI_Result.html.

Table 5: Global Moran's I Summary for the R5D index

Moran's Index     0.514248
Expected Index   -0.010989
Variance          0.005620
z-score           7.006316
p-value           0.000000
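
The figures in Table 5 can be cross-checked by hand. The sketch below recomputes the z-score as (observed index minus expected index) divided by the square root of the variance, derives a two-sided p-value under a normal approximation, and recovers the number of features from the standard relation E[I] = -1/(n - 1). Variable names are illustrative, and small differences from the table are expected because the inputs are rounded.

```python
import math

morans_index = 0.514248
expected_index = -0.010989
variance = 0.005620

# z-score: standardized difference between observed and expected index
z_score = (morans_index - expected_index) / math.sqrt(variance)

# Two-sided p-value under a normal approximation
p_value = math.erfc(abs(z_score) / math.sqrt(2))

# Number of features implied by E[I] = -1 / (n - 1)
n_features = round(1 - 1 / expected_index)

print(round(z_score, 3))  # close to the 7.006316 reported in Table 5
print(p_value < 0.000001)
print(n_features)
```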
