Statistics For Analytical Chemistry

The General Analytical Problem
Select sample
Extract analyte(s) from matrix
Separate analytes
Detect, identify and quantify analytes
Determine reliability and significance of results
Errors in Chemical Analysis

Impossible to eliminate errors.
How reliable are our data?
Data of unknown quality are useless!
Carry out replicate measurements

Analyse accurately known standards
Perform statistical tests on data
Mean
Defined as follows:
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
where xi = individual values of x and N = number of replicate measurements
Median
The middle result when data are arranged in order of size (for even
numbers the mean of middle two). Median can be preferred when
there is an outlier - one reading very different from rest. Median
less affected by outlier than is mean.
Illustration of Mean and Median

Results of 6 determinations of the Fe(III) content of a solution, known to
contain 20 ppm:
Note: The mean value is 19.78 ppm (i.e. 19.8ppm) - the median value is 19.7 ppm
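The difference between mean and median can be checked numerically. The sketch below is only illustrative: the six values are hypothetical, chosen so that they reproduce the mean and median quoted above, and are not taken from the original overhead.

```python
from statistics import mean, median

# Hypothetical replicate results (ppm Fe). Assumption: these are NOT the
# original slide data, only values that reproduce the quoted mean and median.
results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

x_bar = mean(results)    # arithmetic mean of the N values
x_med = median(results)  # for even N, the mean of the two middle values

print(f"mean   = {x_bar:.2f} ppm")  # mean   = 19.78 ppm
print(f"median = {x_med:.1f} ppm")  # median = 19.7 ppm
```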
Precision
Relates to reproducibility of results.
How similar are values obtained in exactly the same way?
Useful for measuring this:
Deviation from the mean:
$$d_i = |x_i - \bar{x}|$$
Accuracy
Measurement of agreement between experimental mean and
true value (which may not be known!).
Measures of accuracy:
Absolute error: E = xi - xt (where xt = true or accepted value)
Relative error:
$$E_r = \frac{x_i - x_t}{x_t} \times 100\%$$
(the latter is more useful in practice)
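As a small worked example of the two definitions, the sketch below applies them to the Fe(III) illustration (experimental mean 19.78 ppm, accepted value 20 ppm). The function names are illustrative only.

```python
def absolute_error(x, x_true):
    """E = x - x_t, in the units of the measurement."""
    return x - x_true


def relative_error_percent(x, x_true):
    """E_r = (x - x_t) / x_t * 100 %."""
    return (x - x_true) / x_true * 100.0


E = absolute_error(19.78, 20.0)
Er = relative_error_percent(19.78, 20.0)
print(f"E = {E:+.2f} ppm, Er = {Er:+.1f} %")  # E = -0.22 ppm, Er = -1.1 %
```

The sign is kept, so a negative value indicates a result below the accepted value.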
Illustrating the difference between accuracy and precision
Low accuracy, low precision
Low accuracy, high precision
High accuracy, low precision
High accuracy, high precision
Some analytical data illustrating accuracy and precision

[Figure: analytical results for benzyl isothiourea hydrochloride and nicotinic acid]
Analyst 1: precise, accurate
Analyst 2: imprecise, accurate
Analyst 3: precise, inaccurate
Analyst 4: imprecise, inaccurate
Types of Error in Experimental Data
Three types:
(1) Random (indeterminate) Error
Data scattered approx. symmetrically about a mean value.
Affects precision - dealt with statistically (see later).
(2) Systematic (determinate) Error
Several possible sources - later. Readings all too high
or too low. Affects accuracy.
(3) Gross Errors
Usually obvious - give outlier readings.
Detectable by carrying out sufficient replicate
measurements.
Sources of Systematic Error

1. Instrument Error
Need frequent calibration - both for apparatus such as
volumetric flasks, burettes etc., but also for electronic
devices such as spectrometers.
2. Method Error
Due to inadequacies in physical or chemical behaviour
of reagents or reactions (e.g. slow or incomplete reactions)
Example from earlier overhead - nicotinic acid does not
react completely under normal Kjeldahl conditions for
nitrogen determination.
3. Personal Error
e.g. insensitivity to colour changes; tendency to estimate
scale readings to improve precision; preconceived idea of
true value.
Systematic errors can be

constant (e.g. error in burette reading less important for larger values of reading) or
proportional (e.g. presence of given proportion of
interfering impurity in sample; equally significant
for all values of measurement)
Minimise instrument errors by careful recalibration and good
maintenance of equipment.
Minimise personal errors by care and self-discipline
Method errors - most difficult. True value may not be known.
Three approaches to minimise:
analysis of certified standards
use 2 or more independent methods
analysis of blanks
Statistical Treatment of
Random Errors
There are always a large number of small, random errors
in making any measurement.
These can be small changes in temperature or pressure;
random responses of electronic detectors (noise) etc.
Suppose there are 4 small random errors possible.
Assume all are equally likely, and that each causes an error
of U in the reading.
Possible combinations of errors are shown on the next slide:
Combination of Random Errors

Combinations of errors                                   Total error   No.   Relative frequency
+U+U+U+U                                                 +4U           1     1/16 = 0.0625
-U+U+U+U  +U-U+U+U  +U+U-U+U  +U+U+U-U                   +2U           4     4/16 = 0.250
-U-U+U+U  -U+U-U+U  -U+U+U-U  +U-U-U+U  +U-U+U-U  +U+U-U-U   0         6     6/16 = 0.375
+U-U-U-U  -U+U-U-U  -U-U+U-U  -U-U-U+U                   -2U           4     4/16 = 0.250
-U-U-U-U                                                 -4U           1     1/16 = 0.0625
The next overhead shows this in graphical form
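The relative frequencies for four equally likely errors of +U or -U can be reproduced by brute-force enumeration. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

n_errors = 4
n_combos = 2 ** n_errors  # 16 equally likely sign combinations

# Each error contributes +1 or -1 (in units of U); tally the total errors.
totals = Counter(sum(combo) for combo in product((+1, -1), repeat=n_errors))

for total in sorted(totals, reverse=True):
    count = totals[total]
    print(f"{total:+d}U  {count}/{n_combos} = {count / n_combos:.4f}")
```

Changing `n_errors` to 10 or more shows the distribution approaching the bell shape discussed next.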
Frequency Distribution for Measurements Containing Random Errors
[Figure panels: 4 random uncertainties; 10 random uncertainties; a very large number of random uncertainties]
For a very large number of random uncertainties this is a Gaussian or normal error curve.
Symmetrical about the mean.
Replicate Data on the Calibration of a 10 ml Pipette

No.   Vol, ml     No.   Vol, ml     No.   Vol, ml
1     9.988       18    9.975       35    9.976
2     9.973       19    9.980       36    9.990
3     9.986       20    9.994       37    9.988
4     9.980       21    9.992       38    9.971
5     9.975       22    9.984       39    9.986
6     9.982       23    9.981       40    9.978
7     9.986       24    9.987       41    9.986
8     9.982       25    9.978       42    9.982
9     9.981       26    9.983       43    9.977
10    9.990       27    9.982       44    9.977
11    9.980       28    9.991       45    9.986
12    9.989       29    9.981       46    9.978
13    9.978       30    9.969       47    9.983
14    9.971       31    9.985       48    9.980
15    9.982       32    9.977       49    9.983
16    9.983       33    9.976       50    9.979
17    9.988       34    9.983

Mean volume          9.982 ml
Median volume        9.982 ml
Spread               0.025 ml
Standard deviation   0.0056 ml
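The summary figures for the pipette calibration can be recomputed from the 50 replicate volumes. A minimal sketch using only the Python standard library; the list is transcribed from the table above in order of measurement number.

```python
from statistics import mean, median, stdev

# Volumes (ml) for measurements 1 to 50, transcribed from the table.
volumes_ml = [
    9.988, 9.973, 9.986, 9.980, 9.975, 9.982, 9.986, 9.982, 9.981, 9.990,
    9.980, 9.989, 9.978, 9.971, 9.982, 9.983, 9.988, 9.975, 9.980, 9.994,
    9.992, 9.984, 9.981, 9.987, 9.978, 9.983, 9.982, 9.991, 9.981, 9.969,
    9.985, 9.977, 9.976, 9.983, 9.976, 9.990, 9.988, 9.971, 9.986, 9.978,
    9.986, 9.982, 9.977, 9.977, 9.986, 9.978, 9.983, 9.980, 9.983, 9.979,
]

mean_vol = mean(volumes_ml)
median_vol = median(volumes_ml)
spread = max(volumes_ml) - min(volumes_ml)  # range: highest minus lowest
s = stdev(volumes_ml)                       # sample standard deviation (N - 1)

print(f"mean = {mean_vol:.3f} ml, median = {median_vol:.3f} ml")
print(f"spread = {spread:.3f} ml, s = {s:.4f} ml")
```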
Calibration data in graphical form
A = histogram of experimental results

B = Gaussian curve with the same mean value, the same precision (see later)
and the same area under the curve as for the histogram.
SAMPLE = finite number of observations

POPULATION = total (infinite) number of observations
Properties of Gaussian curve defined in terms of population.
Then see where modifications needed for small samples of data
Main properties of Gaussian curve:

Population mean (μ): defined as earlier (N → ∞). In absence of systematic error, μ is the true value (maximum on Gaussian curve).
Remember, sample mean (x̄) defined for small values of N.
(Sample mean → population mean when N ≥ 20)
Population Standard Deviation (σ) - defined on next overhead
σ: measure of precision of a population of data, given by:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
where μ = population mean; N is very large.

The equation for a Gaussian curve is defined in terms of μ and σ, as follows:
$$y = \frac{e^{-(x - \mu)^2 / 2\sigma^2}}{\sigma\sqrt{2\pi}}$$
Two Gaussian curves with two different standard deviations, σA and σB (= 2σA)

General Gaussian curve plotted in units of z, where
z = (x - μ)/σ
i.e. deviation from the mean of a datum in units of standard deviation. Plot can be used for data with given value of mean, and any standard deviation.
Area under a Gaussian Curve
From equation above, and illustrated by the previous curves, 68.3% of the data lie within ±σ of the mean (μ), i.e. 68.3% of the area under the curve lies between ±σ of μ.
Similarly, 95.5% of the area lies between ±2σ, and 99.7% between ±3σ.
There are 68.3 chances in 100 that for a single datum the random error in the measurement will not exceed ±σ.
The chances are 95.5 in 100 that the error will not exceed ±2σ.
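These percentages follow from integrating the Gaussian curve, which can be written in terms of the error function. A minimal sketch, assuming a perfectly Gaussian population; `area_within` is an illustrative name.

```python
import math


def area_within(z):
    """Fraction of a Gaussian population within ±z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2.0))


for z in (1, 2, 3):
    print(f"within ±{z}σ: {100 * area_within(z):.2f} %")
```

The value for ±2σ is about 95.45%, which is quoted above rounded to 95.5%.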
Sample Standard Deviation, s

The equation for σ must be modified for small samples of data, i.e. small N
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$
Two differences cf. the equation for σ:

1. Use sample mean instead of population mean.
2. Use degrees of freedom, N - 1, instead of N.

Reason is that in working out the mean, the sum of the differences from the mean must be zero. If N - 1 values are known, the last value is defined. Thus only N - 1 degrees of freedom. For large values of N, used in calculating σ, N and N - 1 are effectively equal.
Alternative Expression for s

(suitable for calculators)
$$s = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \dfrac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N - 1}}$$
Note: NEVER round off figures before the end of the calculation
Standard Deviation of a Sample

Reproducibility of a method for determining the % of selenium in foods. 9 measurements were made on a single batch of brown rice.

Sample   Selenium content (μg/g), xi   xi²
1        0.07                          0.0049
2        0.07                          0.0049
3        0.08                          0.0064
4        0.07                          0.0049
5        0.07                          0.0049
6        0.08                          0.0064
7        0.08                          0.0064
8        0.09                          0.0081
9        0.08                          0.0064
         Σxi = 0.69                    Σxi² = 0.0533

Mean = Σxi/N = 0.077 μg/g
(Σxi)²/N = 0.4761/9 = 0.0529

Standard deviation:
$$s = \sqrt{\frac{0.0533 - 0.0529}{9 - 1}} = 0.00707106 \approx 0.007\ \mu\text{g/g}$$
Coefficient of variation = 9.2%
Concentration = 0.077 ± 0.007 μg/g
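The selenium calculation can be reproduced with the calculator form of the standard deviation. A minimal sketch; intermediate values are deliberately left unrounded, in line with the note on rounding.

```python
import math

# Nine selenium determinations on one batch of brown rice.
selenium = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]

n = len(selenium)
sum_x = sum(selenium)                  # Σxi
sum_x2 = sum(x * x for x in selenium)  # Σxi²

mean_x = sum_x / n
# Calculator form: s = sqrt((Σxi² - (Σxi)²/N) / (N - 1))
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
cv_percent = 100 * s / mean_x

print(f"mean = {mean_x:.3f}, s = {s:.5f}, CV = {cv_percent:.1f} %")
```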
Standard Error of a Mean

The standard deviation relates to the probable error in a single measurement.
If we take a series of N measurements, the probable error of the mean is less than
the probable error of any one measurement.
The standard error of the mean, sm, is defined as follows:
$$s_m = \frac{s}{\sqrt{N}}$$
Pooled Data
To achieve a value of s which is a good approximation to σ, i.e. N ≥ 20, it is sometimes necessary to pool data from a number of sets of measurements (all taken in the same way).
Suppose that there are t small sets of data, comprising N1, N2, ..., Nt measurements.
The equation for the resultant sample standard deviation is:
$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{i=1}^{N_2} (x_i - \bar{x}_2)^2 + \sum_{i=1}^{N_3} (x_i - \bar{x}_3)^2 + \cdots}{N_1 + N_2 + N_3 + \cdots - t}}$$
(Note: one degree of freedom is lost for each set of data)
Pooled Standard Deviation
Analysis of 6 bottles of wine for residual sugar.

Bottle   Sugar % (w/v)   No. of obs.   Deviations from mean
1        0.94            3             0.05, 0.10, 0.08
2        1.08            4             0.06, 0.05, 0.09, 0.06
3        1.20            5             0.05, 0.12, 0.07, 0.00, 0.08
4        0.67            4             0.05, 0.10, 0.06, 0.09
5        0.83            3             0.07, 0.09, 0.10
6        0.76            4             0.06, 0.12, 0.04, 0.03

For set 1:
$$\sum (x_i - \bar{x})^2 = (0.05)^2 + (0.10)^2 + (0.08)^2 = 0.0189$$
$$s_1 = \sqrt{\frac{0.0189}{2}} = 0.0972 \approx 0.097$$
and similarly for all sn.

Set n    Σ(xi - x̄)²   sn
1        0.0189        0.097
2        0.0178        0.077
3        0.0282        0.084
4        0.0242        0.090
5        0.0230        0.107
6        0.0205        0.083
Total    0.1326

$$s_{pooled} = \sqrt{\frac{0.1326}{23 - 6}} = 0.088\%$$
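The pooled standard deviation for the wine data can be recomputed from the tabulated deviations. A minimal sketch; the dictionary layout is just one convenient way to hold the six sets.

```python
import math

# Deviations from each bottle's own mean, keyed by bottle number.
deviations = {
    1: [0.05, 0.10, 0.08],
    2: [0.06, 0.05, 0.09, 0.06],
    3: [0.05, 0.12, 0.07, 0.00, 0.08],
    4: [0.05, 0.10, 0.06, 0.09],
    5: [0.07, 0.09, 0.10],
    6: [0.06, 0.12, 0.04, 0.03],
}

sum_sq = sum(d * d for devs in deviations.values() for d in devs)
n_total = sum(len(devs) for devs in deviations.values())
dof = n_total - len(deviations)  # one degree of freedom lost per set

s_pooled = math.sqrt(sum_sq / dof)
print(f"sum of squares = {sum_sq:.4f}, dof = {dof}, s_pooled = {s_pooled:.3f} %")
```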
Two alternative methods for measuring the precision of a set of results:
VARIANCE:
This is the square of the standard deviation:
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
COEFFICIENT OF VARIATION (CV)
(or RELATIVE STANDARD DEVIATION):
Divide the standard deviation by the mean value and express as a percentage:
$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$
Use of Statistics in Data Evaluation
How can we relate the observed mean value (x̄) to the true mean (μ)?
The latter can never be known exactly.
The range of uncertainty depends on how closely s corresponds to σ.
We can calculate the limits (above and below) around x̄ within which μ must lie, with a given degree of probability.
Define some terms:
CONFIDENCE LIMITS
interval around the mean that probably contains μ.
CONFIDENCE INTERVAL
the magnitude of the confidence limits
CONFIDENCE LEVEL
fixes the level of probability that the mean is within the confidence limits
Examples later.
First assume that the known s is a good approximation to σ.
Percentages of area under Gaussian curves between certain limits of z (= (x - μ)/σ)

Percentage of area   Lies between
50%                  ±0.67σ
80%                  ±1.29σ
90%                  ±1.64σ
95%                  ±1.96σ
99%                  ±2.58σ

What this means, for example, is that 80 times out of 100 the true mean will lie within ±1.29σ of any measurement we make.
Thus, at a confidence level of 80%, the confidence limits are ±1.29σ.
For a single measurement: CL for μ = x ± zσ (values of z on next overhead)

For the sample mean of N measurements (x̄), the equivalent expression is:
$$\text{CL for } \mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$
Values of z for determining Confidence Limits

Confidence level, %   z
50                    0.67
68                    1.0
80                    1.29
90                    1.64
95                    1.96
95.5                  2.00
99                    2.58
99.7                  3.00
99.9                  3.29

Note: these figures assume that an excellent approximation to the real standard deviation is known.
Confidence Limits when σ is known

Atomic absorption analysis for copper concentration in aircraft engine oil gave a value of 8.53 μg Cu/ml. Pooled results of many analyses showed s → σ = 0.32 μg Cu/ml.

Calculate 90% and 99% confidence limits if the above result were based on (a) 1, (b) 4, (c) 16 measurements.

(a) N = 1
90% CL = 8.53 ± (1.64)(0.32)/√1 = 8.53 ± 0.52 μg/ml, i.e. 8.5 ± 0.5 μg/ml
99% CL = 8.53 ± (2.58)(0.32)/√1 = 8.53 ± 0.83 μg/ml, i.e. 8.5 ± 0.8 μg/ml

(b) N = 4
90% CL = 8.53 ± (1.64)(0.32)/√4 = 8.53 ± 0.26 μg/ml, i.e. 8.5 ± 0.3 μg/ml
99% CL = 8.53 ± (2.58)(0.32)/√4 = 8.53 ± 0.41 μg/ml, i.e. 8.5 ± 0.4 μg/ml

(c) N = 16
90% CL = 8.53 ± (1.64)(0.32)/√16 = 8.53 ± 0.13 μg/ml, i.e. 8.5 ± 0.1 μg/ml
99% CL = 8.53 ± (2.58)(0.32)/√16 = 8.53 ± 0.21 μg/ml, i.e. 8.5 ± 0.2 μg/ml
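The six confidence limits in the copper example all come from the same expression, zσ/√N. A minimal sketch using the tabulated z values for 90% and 99%; `half_width` is an illustrative name.

```python
import math


def half_width(z, sigma, n):
    """Half-width of the confidence limits for the mean: z * sigma / sqrt(N)."""
    return z * sigma / math.sqrt(n)


Z_VALUES = {90: 1.64, 99: 2.58}  # from the table of z values
x_bar, sigma = 8.53, 0.32        # concentration and sigma in the units of the example

for n in (1, 4, 16):
    for level, z in Z_VALUES.items():
        print(f"N = {n:2d}, {level}% CL: {x_bar} ± {half_width(z, sigma, n):.2f}")
```

Quadrupling the number of measurements halves the width of the interval, since it scales as 1/√N.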
If we have no information on σ, and only have a value for s, the confidence interval is larger, i.e. there is a greater uncertainty.
Instead of z, it is necessary to use the parameter t, defined as follows:
t = (x - μ)/s
i.e. just like z, but using s instead of σ.
By analogy we have:
$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$
(where x̄ = sample mean for N measurements)
The calculated values of t are given on the next overhead
Values of t for various levels of probability

Degrees of freedom (N-1)   80%    90%    95%    99%
1                          3.08   6.31   12.7   63.7
2                          1.89   2.92   4.30   9.92
3                          1.64   2.35   3.18   5.84
4                          1.53   2.13   2.78   4.60
5                          1.48   2.02   2.57   4.03
6                          1.44   1.94   2.45   3.71
7                          1.42   1.90   2.36   3.50
8                          1.40   1.86   2.31   3.36
9                          1.38   1.83   2.26   3.25
19                         1.33   1.73   2.10   2.88
59                         1.30   1.67   2.00   2.66
∞                          1.29   1.64   1.96   2.58

Note:
(1) As (N-1) → ∞, so t → z
(2) For all values of (N-1) < ∞, t > z, i.e. greater uncertainty
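Confidence Limits where σ is not known

Analysis of an insecticide gave the following values for % of the chemical lindane: 7.47, 6.98, 7.27. Calculate the CL for the mean value at the 90% confidence level.

xi (%)          xi²
7.47            55.8009
6.98            48.7204
7.27            52.8529
Σxi = 21.72     Σxi² = 157.3742

$$\bar{x} = \frac{\sum x_i}{N} = \frac{21.72}{3} = 7.24$$
$$s = \sqrt{\frac{\sum x_i^2 - \dfrac{(\sum x_i)^2}{N}}{N - 1}} = \sqrt{\frac{157.3742 - \dfrac{(21.72)^2}{3}}{2}} = 0.246 \approx 0.25\%$$
$$90\%\ \text{CL} = \bar{x} \pm \frac{ts}{\sqrt{N}} = 7.24 \pm \frac{(2.92)(0.25)}{\sqrt{3}} = 7.24 \pm 0.42\%$$
If repeated analyses showed that s → σ = 0.28%:
$$90\%\ \text{CL} = \bar{x} \pm \frac{z\sigma}{\sqrt{N}} = 7.24 \pm \frac{(1.64)(0.28)}{\sqrt{3}} = 7.24 \pm 0.27\%$$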
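The lindane example can be followed step by step in code. A minimal sketch: the t and z values are simply copied from the tables (2 degrees of freedom, 90% level) rather than computed, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

lindane = [7.47, 6.98, 7.27]  # % lindane, three replicate analyses
n = len(lindane)

x_bar = mean(lindane)
s = stdev(lindane)            # sample standard deviation, N - 1 in the denominator

T_90_2DOF = 2.92              # tabulated t, 90% level, N - 1 = 2
Z_90 = 1.64                   # tabulated z, 90% level

cl_t = T_90_2DOF * s / math.sqrt(n)   # only s known
cl_z = Z_90 * 0.28 / math.sqrt(n)     # sigma = 0.28% known from repeated analyses

print(f"mean = {x_bar:.2f} %, s = {s:.3f} %")
print(f"90% CL (t): ±{cl_t:.2f} %   90% CL (z): ±{cl_z:.2f} %")
```

The interval based on t is wider, reflecting the extra uncertainty when σ has to be estimated from only three results.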
Testing a Hypothesis
Carry out measurements on an accurately known standard.
Experimental value is different from the true value.
Is the difference due to a systematic error (bias) in the method - or simply to random error?
Assume that there is no bias
(NULL HYPOTHESIS),
and calculate the probability
that the experimental error
is due to random errors.
Figure shows (A) the curve for the true value (μA = μt) and (B) the experimental curve (μB)
Bias = μB - μA = μB - xt.
Test for bias by comparing x̄ - xt with the difference caused by random error.
Remember that the confidence limit for μ (assumed to be xt, i.e. assume no bias) is given by:
$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$
At the desired confidence level, random errors can lead to:
$$\bar{x} - x_t = \pm\frac{ts}{\sqrt{N}}$$
If
$$|\bar{x} - x_t| > \frac{ts}{\sqrt{N}}$$
then at the desired confidence level bias (systematic error) is likely (and vice versa).
Detection of Systematic Error (Bias)

A standard material known to contain 38.9% Hg was analysed by atomic absorption spectroscopy. The results were 38.9%, 37.4% and 37.1%. At the 95% confidence level, is there any evidence for a systematic error in the method?

x̄ = 37.8%
|x̄ - xt| = 1.1%
Σxi = 113.4
Σxi² = 4288.38
$$s = \sqrt{\frac{4288.38 - (113.4)^2/3}{2}} = 0.964\%$$
Assume null hypothesis (no bias). Only reject this if
$$|\bar{x} - x_t| > \frac{ts}{\sqrt{N}}$$
But t (from Table) = 4.30, s (calc. above) = 0.964% and N = 3, so
$$\frac{ts}{\sqrt{N}} = \frac{4.30 \times 0.964}{\sqrt{3}} = 2.39\%$$
i.e. |x̄ - xt| < ts/√N.
Therefore the null hypothesis is maintained, and there is no evidence for systematic error at the 95% confidence level.
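The bias test for the mercury standard can be sketched as follows. The t value is copied from the table (2 degrees of freedom, 95% level). Working from the three raw results without intermediate rounding gives s of about 0.96%, and the conclusion is unchanged.

```python
import math
from statistics import mean, stdev

hg_results = [38.9, 37.4, 37.1]  # % Hg found
x_true = 38.9                    # certified content of the standard
T_95_2DOF = 4.30                 # tabulated t, 95% level, N - 1 = 2

x_bar = mean(hg_results)
s = stdev(hg_results)
limit = T_95_2DOF * s / math.sqrt(len(hg_results))  # ts / sqrt(N)

# Reject the null hypothesis (no bias) only if the difference exceeds the limit.
bias_indicated = abs(x_bar - x_true) > limit

print(f"|mean - true| = {abs(x_bar - x_true):.1f} %, limit = {limit:.2f} %")
print("bias indicated" if bias_indicated else "no evidence for bias")
```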
Are two sets of measurements significantly different?

Suppose two samples are analysed under identical conditions.
Sample 1: mean x̄1 from N1 replicate analyses
Sample 2: mean x̄2 from N2 replicate analyses
Are these significantly different?

Using definition of pooled standard deviation, the equation on the last overhead can be re-arranged:
$$\bar{x}_1 - \bar{x}_2 = \pm\, t\, s_{pooled} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
Only if the difference between the two samples is greater than the term on
the right-hand side can we assume a real difference between the samples.
Test for significant difference between two sets of data

Two different methods for the analysis of boron in plant samples, spectrophotometry and fluorimetry, gave mean values of 28.0 and 26.25 μg/g.
Each based on 5 replicate measurements.
At the 99% confidence level, are the mean values significantly different?
Calculate spooled = 0.267. There are 8 degrees of freedom, therefore (Table) t = 3.36 (99% level).
Level for rejecting null hypothesis is
$$\pm\, t\, s_{pooled} \sqrt{\frac{N_1 + N_2}{N_1 N_2}} = \pm (3.36)(0.267) \sqrt{\frac{10}{25}}$$
i.e. ±0.5674, or ±0.57 μg/g.
But x̄1 - x̄2 = 28.0 - 26.25 = 1.75 μg/g
$$\text{i.e. } |\bar{x}_1 - \bar{x}_2| > t\, s_{pooled} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
Therefore, at this confidence level, there is a significant difference, and there must be a systematic error in at least one of the methods of analysis.
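The comparison of the two boron means reduces to one threshold calculation. A minimal sketch using the pooled standard deviation and tabulated t quoted in the example; `significance_threshold` is an illustrative name.

```python
import math


def significance_threshold(t, s_pooled, n1, n2):
    """t * s_pooled * sqrt((N1 + N2) / (N1 * N2))."""
    return t * s_pooled * math.sqrt((n1 + n2) / (n1 * n2))


threshold = significance_threshold(t=3.36, s_pooled=0.267, n1=5, n2=5)
difference = abs(28.0 - 26.25)  # difference between the two mean values

significant = difference > threshold
print(f"threshold = {threshold:.4f}, difference = {difference:.2f}")
print("significant difference" if significant else "no significant difference")
```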
Detection of Gross Errors
A set of results may contain an outlying result - out of line with the others.
Should it be retained or rejected?
There is no universal criterion for deciding this.
One rule that can give guidance is the Q test.
Consider a set of results.
The parameter Qexp is defined as follows:
$$Q_{exp} = \frac{|x_q - x_n|}{w}$$
where xq = questionable result
xn = nearest neighbour
w = spread of entire set
Qexp is then compared to a set of values Qcrit:

Qcrit (reject if Qexp > Qcrit)

No. of observations   90%     95%     99% confidence level
3                     0.941   0.970   0.994
4                     0.765   0.829   0.926
5                     0.642   0.710   0.821
6                     0.560   0.625   0.740
7                     0.507   0.568   0.680
8                     0.468   0.526   0.634
9                     0.437   0.493   0.598
10                    0.412   0.466   0.568

Rejection of outlier recommended if Qexp > Qcrit for the desired confidence level.
Note:
1. The higher the confidence level, the less likely is rejection to be recommended.
2. Rejection of outliers can have a marked effect on mean and standard deviation, esp. when there are only a few data points. Always try to obtain more data.
3. If outliers are to be retained, it is often better to report the median value rather than the mean.
Q Test for Rejection of Outliers

The following values were obtained for the concentration of nitrite ions in a sample of river water: 0.403, 0.410, 0.401, 0.380 mg/l.
Should the last reading be rejected?
$$Q_{exp} = \frac{|0.380 - 0.401|}{0.410 - 0.380} = 0.7$$
But Qcrit = 0.829 (at 95% level) for 4 values
Therefore, Qexp < Qcrit, and we cannot reject the suspect value.
Suppose 3 further measurements taken, giving total values of:
0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411 mg/l. Should 0.380 still be retained?
$$Q_{exp} = \frac{|0.380 - 0.400|}{0.413 - 0.380} = 0.606$$
But Qcrit = 0.568 (at 95% level) for 7 values
Therefore, Qexp > Qcrit, and rejection of 0.380 is recommended.
But note that 5 times in 100 it will be wrong to reject this suspect value!
Also note that if 0.380 is retained, s = 0.011 mg/l, but if it is rejected, s = 0.0056 mg/l, i.e. precision appears to be twice as good, just by rejecting one value.
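The Q test on the nitrite data can be sketched as below. The critical values are the 95% column of the Qcrit table; the function `q_exp` is an illustrative helper and assumes the suspect value is the lowest or highest result in the set.

```python
from statistics import stdev

# 95% confidence level column of the Qcrit table, keyed by number of observations.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_exp(values, suspect):
    """Q = |suspect - nearest neighbour| / spread, for an extreme suspect value."""
    ordered = sorted(values)
    if suspect == ordered[0]:
        neighbour = ordered[1]
    elif suspect == ordered[-1]:
        neighbour = ordered[-2]
    else:
        raise ValueError("suspect value must be the lowest or highest result")
    return abs(suspect - neighbour) / (ordered[-1] - ordered[0])


first = [0.403, 0.410, 0.401, 0.380]
second = first + [0.400, 0.413, 0.411]

for data in (first, second):
    q = q_exp(data, 0.380)
    verdict = "reject" if q > Q_CRIT_95[len(data)] else "retain"
    print(f"N = {len(data)}: Qexp = {q:.3f}, Qcrit = {Q_CRIT_95[len(data)]}, {verdict}")

without_outlier = [x for x in second if x != 0.380]
print(f"s with 0.380 = {stdev(second):.3f}, without = {stdev(without_outlier):.4f}")
```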
Obtaining a representative sample

Homogeneous gaseous or liquid sample
No problem - any sample representative.
Solid sample - no gross heterogeneity
Take a number of small samples at random from throughout the bulk - this will give a suitable representative sample.
Solid sample - obvious heterogeneity
Take small samples from each homogeneous region and mix these in the same proportions as between each region and the whole.
If it is suspected, but not certain, that a bulk material is heterogeneous, then it is necessary to grind the sample to a fine powder, and mix this very thoroughly before taking random samples from the bulk.
For a very large sample - a train-load of metal ore, or soil in a field - it is always
necessary to take a large number of random samples from throughout the whole.
Sample Preparation
and Extraction
May be many analytes present - separation - see later.
May be small amounts of analyte(s) in bulk material.
Need to concentrate these before analysis, e.g. heavy metals in animal tissue, additives in polymers, herbicide residues in flour etc.
May be helpful to concentrate complex mixtures selectively.
Most general type of pretreatment: EXTRACTION.
Classical extraction method is SOXHLET EXTRACTION (named after developer).
Apparatus
Sample in porous thimble.
Exhaustive reflux for up to 1-2 days.
Solution of analyte(s) in volatile solvent (e.g. CH2Cl2, CHCl3 etc.)
Evaporate to dryness or suitable concentration, for separation/analysis.
