
Exploratory data analysis

Introduction
Exploratory data analysis (EDA) is an approach to analyzing data in order to formulate hypotheses worth testing. It complements the statistical tools used for testing hypotheses.

Data evaluation forms an essential part of every mineral inventory estimate. It involves organizing and understanding the data that are the basis of a resource/reserve estimate.

The ultimate purpose of exploratory data evaluation in mineral inventory work is to improve the quality of estimation.

Aims of EDA
Specific aims include the following:
1. To minimize the error of estimation.
2. To provide a comprehensive knowledge of the statistical characteristics of all variables of interest for resource/reserve estimation.
3. To document and understand the interrelations among the variables of interest.
4. To recognize any systematic spatial variation of variables such as grade and thickness of mineralized zones.
5. To recognize and define distinctive geologic domains that must be evaluated independently for mineral inventory estimation.
6. To evaluate the similarity/dissimilarity of various types of raw data, especially samples of different supports.

Variance
         X    X - mean    (X - mean)^2
         3     -3.35         11.22
         3     -3.35         11.22
         4     -2.35          5.52
         4     -2.35          5.52
         4     -2.35          5.52
         5     -1.35          1.82
         5     -1.35          1.82
         5     -1.35          1.82
         6     -0.35          0.12
         6     -0.35          0.12
         6     -0.35          0.12
         6     -0.35          0.12
         7      0.65          0.42
         7      0.65          0.42
         8      1.65          2.72
         8      1.65          2.72
         9      2.65          7.02
        10      3.65         13.32
        10      3.65         13.32
        11      4.65         21.62
Sum             0           106.55

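The average of the squared deviations from the mean is called the variance.

The deviations themselves sum to zero, so to remove the minus signs we square each deviation before finding the average. With $\bar{X}$ the mean and $n$ the number of data values:

$$\text{Variance} = \frac{\sum (X - \bar{X})^2}{n} = \frac{106.55}{20} = 5.33$$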
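The calculation can be checked with a few lines of Python. This is only a sketch: the function name `population_variance` is chosen here for illustration, and it divides by n as in the formula above (many statistics packages divide by n - 1 by default).

```python
# X values from the variance table above.
x = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]


def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return sum(squared_deviations) / n


mean_x = sum(x) / len(x)
variance_x = population_variance(x)

print(f"mean     = {mean_x:.2f}")
print(f"variance = {variance_x:.4f}")
```

The printed variance should agree with the 5.33 quoted above once rounded to two decimals.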

Mean
Mean is the average of a set of data.
To calculate the mean, find the sum of the data values and then divide by the number of values.

Example data: 12, 15, 11, 11, 7, 13
First, find the sum of the data: 12 + 15 + 11 + 11 + 7 + 13 = 69
Then divide by the number of values: 69 / 6 = 11.5
Thus the mean is 11.5.

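In exploratory data analysis we often calculate a weighted mean. The weighted mean takes the weight of each sample into account, which reduces the error of the estimate when samples are not equally representative, for example when they are clustered. It is calculated as:

$$\text{Weighted mean} = \frac{\sum (\text{sample} \times \text{weight})}{\sum \text{weights}}$$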
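As a rough illustration of the weighted mean, the sketch below reuses the six values from the mean example. The function name `weighted_mean` and the second set of weights are hypothetical, chosen only to show that equal weights give back the ordinary mean while unequal weights shift it.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the sum of the weights must be non-zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


data = [12, 15, 11, 11, 7, 13]                 # values from the mean example
equal_weights = [1.0] * len(data)              # reduces to the ordinary mean
example_weights = [1.0, 0.5, 1.0, 1.0, 0.5, 1.0]  # hypothetical weights

plain = weighted_mean(data, equal_weights)
weighted = weighted_mean(data, example_weights)

print(f"equal weights:   {plain:.2f}")
print(f"unequal weights: {weighted:.2f}")
```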

Declustering
Declustering: a technique to remove the spatial bias caused by clustered samples.

Cell declustering
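The weight of a sample is inversely proportional to the number of samples in its cell:
1. The whole deposit is divided into cells.
2. The number of samples in each cell is counted.
3. The weight of each sample is then calculated from that count (for example, four samples sharing one cell each receive a weight of 1/4 = 0.25).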
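The cell-counting steps can be sketched in Python as follows. This is a minimal illustration, not a production routine: it assumes square cells on a regular grid and a weight of 1 / (number of samples in the cell). The names `cell_declustering_weights`, `cell_size` and `origin`, and the sample positions, are hypothetical.

```python
import math
from collections import Counter


def cell_declustering_weights(points, cell_size, origin=(0.0, 0.0)):
    """Return one weight per (x, y) point: 1 / (samples sharing its cell)."""

    def cell_of(point):
        # Integer (column, row) index of the square cell containing the point.
        x, y = point
        return (
            math.floor((x - origin[0]) / cell_size),
            math.floor((y - origin[1]) / cell_size),
        )

    cells = [cell_of(p) for p in points]
    counts = Counter(cells)          # number of samples in each occupied cell
    return [1.0 / counts[c] for c in cells]


# Hypothetical positions: three isolated samples and a cluster of three.
points = [(5, 5), (35, 5), (5, 35), (32, 32), (36, 34), (38, 38)]
weights = cell_declustering_weights(points, cell_size=30)

for point, weight in zip(points, weights):
    print(point, round(weight, 3))
```

With this weighting the weights in each occupied cell add up to 1, so every cell contributes equally to a weighted mean. The result depends on the chosen cell size and grid origin, which are assumptions here.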

Example 1

Consider the following set of sample data taken from a drive within an orebody
Sample   Value
1        1
2        3
3        12
4        3
5        10
6        2
7        8
8        2
9        3
10       9
11       2
12       3

What is the average grade?

Example 1 - Continued
The average grade is 4.833, but is it representative? Consider the sample positions
[Figure: plan of the sample positions. The X and Y axes both run from -20 to 120, and each point is labelled with its sample value.]

What is the representative mean grade now?

Example 1 - Continued
X       Y       Value   Weight   Weighted value
0       0       1       1.00     1.00
50      0       3       1.00     3.00
100     0       3       1.00     3.00
0       50      2       1.00     2.00
50      50      12      0.25     3.00
100     50      2       1.00     2.00
0       100     3       1.00     3.00
50      100     2       1.00     2.00
100     100     3       1.00     3.00
45      45      10      0.25     2.50
55      45      8       0.25     2.00
45      55      9       0.25     2.25
Sum                     9.00     28.75
Mean grade                       3.19

The values need to be declustered by assigning each sample a weight according to its proximity to other samples. The mean value is then the sum of the weighted values divided by the sum of the weights: 28.75 / 9.00 = 3.19.
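The Example 1 figures can be reproduced with a short script. The coordinates, values and weights below are copied from the table above; the variable names are illustrative only.

```python
# (x, y, value, weight) rows copied from the Example 1 table.
samples = [
    (0, 0, 1, 1.00), (50, 0, 3, 1.00), (100, 0, 3, 1.00),
    (0, 50, 2, 1.00), (50, 50, 12, 0.25), (100, 50, 2, 1.00),
    (0, 100, 3, 1.00), (50, 100, 2, 1.00), (100, 100, 3, 1.00),
    (45, 45, 10, 0.25), (55, 45, 8, 0.25), (45, 55, 9, 0.25),
]

values = [s[2] for s in samples]
weights = [s[3] for s in samples]

# Unweighted average grade: every sample counts equally.
naive_mean = sum(values) / len(values)

# Declustered average grade: sum of weighted values / sum of weights.
sum_weighted = sum(v * w for v, w in zip(values, weights))
declustered_mean = sum_weighted / sum(weights)

print(f"naive mean grade       = {naive_mean:.3f}")
print(f"sum of weights         = {sum(weights):.2f}")
print(f"sum of weighted values = {sum_weighted:.2f}")
print(f"declustered mean grade = {declustered_mean:.2f}")
```

The four high-grade samples near (50, 50) share a single unit of weight, which is why the declustered mean is lower than the unweighted average.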

Sample Support
Samples measured over a length of 2m are said to have a support of 2m. However, not all 2m samples have the same support. This may be due to:
1. Different core diameters producing different sample volumes
2. The impact of core recovery

Consider the following where 1m samples are combined into 2m, 4m and 8m samples.
Sample     1m support   2m support   4m support   8m support
1          4.00
2          2.00         3.00
3          7.00
4          5.00         6.00         4.50
5          8.00
6          3.00         5.50
7          10.00
8          5.00         7.50         6.50         5.50
9          4.00
10         3.00         3.50
11         6.00
12         9.00         7.50         5.50
13         8.00
14         2.00         5.00
15         4.00
16         6.00         5.00         5.00         5.25
Mean       5.38         5.38         5.38         5.38
Variance   6.12         2.70         0.73         0.03

The mean grade does not change, but the variance decreases as support increases.
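The effect of support on variance can be checked by combining the sixteen 1m grades in Python. This is a sketch only: the helper `composite` is a hypothetical name, it assumes equal-length samples so that a simple average is appropriate, and it uses the n - 1 divisor (`statistics.variance`), which appears to be what the tabulated variances are based on.

```python
from statistics import mean, variance

# 1m sample grades from the support table above.
one_metre = [4.0, 2.0, 7.0, 5.0, 8.0, 3.0, 10.0, 5.0,
             4.0, 3.0, 6.0, 9.0, 8.0, 2.0, 4.0, 6.0]


def composite(grades, n):
    """Average consecutive groups of n equal-length samples."""
    if len(grades) % n != 0:
        raise ValueError("number of samples must be a multiple of n")
    return [mean(grades[i:i + n]) for i in range(0, len(grades), n)]


results = {}
for length in (1, 2, 4, 8):
    grades = composite(one_metre, length)
    # statistics.variance divides by (number of samples - 1).
    results[length] = (mean(grades), variance(grades))
    m, v = results[length]
    print(f"{length}m support: mean {m:.2f}, variance {v:.2f}")
```

Each pass averages neighbouring samples, so the overall mean is preserved while the spread of the combined grades shrinks.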

Unfortunately, variance is important for geostatistical estimation, so we need to have a good understanding of it.
Not all samples are of the same length, especially where sampling is done to geological contacts. Hence we need to produce samples of uniform length, which can only be achieved by combining samples.

Thanks