Stefano Siboni 0
Ph. D. in Materials Engineering Statistical methods
1. EXPERIMENTAL ERRORS
Measurement error:
difference between the experimentally measured value of a
quantity and the true value of the same quantity
never known exactly, it can only be estimated
Systematic errors:
reproducible errors due to incorrect measurement techniques
or to the use of facilities not correctly working or calibrated
if recognized, they can be removed or corrected
Random errors:
describe the dispersion of results of repeated experimental
measurements
not related to well-defined and recognized sources
not reproducible
Absolute error:
estimate of the measurement error of a quantity
(of which it has the same physical units)
Relative error:
ratio between the absolute error and the estimated true value
of a quantity
pure number, typically expressed as a percentage
it quantifies the precision of a measurement
Stefano Siboni 1
Ph. D. in Materials Engineering Statistical methods
Accuracy of a measurement:
determines how close the result of an experimental
measurement can be considered to the true value of the
measured quantity
Precision of a measurement:
expresses how exactly the result of a measurement is
determined, no matter what the true value of the measured
quantity is. Synonym of reproducibility:
precision ⟺ reproducibility ⟺ unimportance of random errors ⟺ smallness of the relative error
Statistical population:
infinite and hypothetical set of points (values) of which the
experimental results are assumed to be a random sample
2. RANDOM VARIABLES
Discrete RV:
RV assuming only a finite number or a countable infinity of
real values
Continuous RV:
RV taking an uncountable infinity of real values, typically any
real value of an interval
Tchebyshev theorem:
The probability that a continuous RV takes values in the interval
centred on the mean μ and of half-width kσ, with k > 0, is
not smaller than 1 − 1/k²:

p[μ − kσ ≤ x ≤ μ + kσ] = ∫_{μ−kσ}^{μ+kσ} p(x) dx ≥ 1 − 1/k²

Proof

σ² = ∫_{−∞}^{+∞} p(x)(x − μ)² dx
   ≥ ∫_{−∞}^{μ−kσ} p(x)(x − μ)² dx + ∫_{μ+kσ}^{+∞} p(x)(x − μ)² dx
   ≥ ∫_{−∞}^{μ−kσ} p(x) k²σ² dx + ∫_{μ+kσ}^{+∞} p(x) k²σ² dx
   = k²σ² [ 1 − ∫_{μ−kσ}^{μ+kσ} p(x) dx ]

whence

∫_{μ−kσ}^{μ+kσ} p(x) dx ≥ 1 − 1/k²
Bernoulli variable
This RV can assume only a finite number of real values xᵢ,
i = 1, …, n, with probabilities wᵢ = p(xᵢ). The condition of
normalization Σ_{i=1}^{n} wᵢ = 1 holds. Moreover:

μ = Σ_{i=1}^{n} wᵢ xᵢ    σ² = Σ_{i=1}^{n} wᵢ (xᵢ − μ)²
Binomial variable
The probability (mass) distribution is

p(x) = n!/[x!(n−x)!] · pˣ (1−p)^{n−x}    x = 0, 1, …, n

for given n ∈ ℕ and p ∈ (0, 1). Mean and variance are:

μ = np    σ² = np(1−p)

This RV arises in a natural way by considering n independent
identically distributed Bernoulli RVs (x₁, x₂, …, xₙ), with values
{1, 0} and probabilities p and 1−p respectively: the sum
x = Σ_{i=1}^{n} xᵢ is a binomial variable.
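The construction above is easy to verify by simulation. A sketch, with n, p and the replication count chosen arbitrarily:

```python
import random

# A binomial RV as the sum of n i.i.d. {0,1} Bernoulli RVs with success
# probability p; the sample mean and variance of many such sums should
# approach np and np(1-p).
random.seed(1)
n, p = 20, 0.3
sums = [sum(1 for _ in range(n) if random.random() < p) for _ in range(50_000)]
mean = sum(sums) / len(sums)
var = sum((s - mean) ** 2 for s in sums) / len(sums)
print(mean, var)   # close to np = 6.0 and np(1-p) = 4.2
assert abs(mean - n * p) < 0.1
assert abs(var - n * p * (1 - p)) < 0.1
```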
Poisson variable
It has probability (mass) distribution

p(x) = (mˣ/x!) e^{−m}    x = 0, 1, 2, …

with m a positive constant. Mean and variance take the same
value:

μ = m    σ² = m
Uniform variable
All the values of the RV x within the interval [a, b] are equally
likely, the others are impossible. The probability (density)
distribution is:

p(x) = 1/(b−a) if x ∈ [a, b],  0 otherwise    x ∈ ℝ

μ = (a+b)/2    σ² = (b−a)²/12
Normal (Gaussian) variable
The probability (density) distribution is

p(x) = 1/(σ√(2π)) · e^{−(x−μ)²/2σ²}    x ∈ ℝ

while the standardized variable z = (x − μ)/σ has distribution

p(z) = 1/√(2π) · e^{−z²/2}    z ∈ ℝ
p(z, X 2 ) = p(z) pn (X 2 )
F = (n₂ X₁²)/(n₁ X₂²) = (X₁²/n₁)/(X₂²/n₂)

where:
zₙ = (1/(σ√n)) Σ_{i=1}^{n} (xᵢ − μ)    xᵢ ∈ ℝ

Note:
We say that the sequence of RVs (zₙ)_{n∈ℕ} converges in
distribution to the standard normal z.
The condition that the variables are identically distributed can
be weakened (⟹ more general results).
p(x₁, x₂) = 1/(2π σ₁σ₂) · e^{−(x₁−μ₁)²/2σ₁² − (x₂−μ₂)²/2σ₂²} =
= [1/(√(2π) σ₁) · e^{−(x₁−μ₁)²/2σ₁²}] · [1/(√(2π) σ₂) · e^{−(x₂−μ₂)²/2σ₂²}]

while the RVs x and y with joint distribution

p(x, y) = (√3/(2π)) · e^{−x²−xy−y²}

are stochastically dependent
() Mean of xᵢ:
μᵢ = E(xᵢ) = ∫_{ℝⁿ} xᵢ p(x₁, …, xₙ) dx₁ … dxₙ ∈ ℝ

() Covariance of xᵢ and xⱼ, i ≠ j:
cov(xᵢ, xⱼ) = E[(xᵢ − μᵢ)(xⱼ − μⱼ)] = ∫_{ℝⁿ} (xᵢ − μᵢ)(xⱼ − μⱼ) p(x₁, …, xₙ) dx₁ … dxₙ

() Variance of xᵢ:
var(xᵢ) = E[(xᵢ − μᵢ)²] = ∫_{ℝⁿ} (xᵢ − μᵢ)² p(x₁, …, xₙ) dx₁ … dxₙ > 0
() Correlation of xᵢ and xⱼ, i ≠ j:

corr(xᵢ, xⱼ) = cov(xᵢ, xⱼ) / √(var(xᵢ) var(xⱼ)) ∈ [−1, +1]

The correlation of a variable with itself is defined but not
meaningful, since corr(xᵢ, xᵢ) = 1 ∀ i = 1, …, n

Uncorrelated RVs:
RVs with null correlation
xᵢ and xⱼ uncorrelated ⟺ corr(xᵢ, xⱼ) = 0
(x₁, …, xₙ) uncorrelated ⟺ V diagonal ⟺ C diagonal

Theorem
Stochastic independence is a sufficient but not necessary
condition for the uncorrelation of two or more RVs.
For instance, if x₁ and x₂ are independent we get

cov(x₁, x₂) = ∫_{ℝ²} (x₁ − μ₁)(x₂ − μ₂) p₁(x₁) p₂(x₂) dx₁ dx₂ =
= ∫_ℝ (x₁ − μ₁) p₁(x₁) dx₁ · ∫_ℝ (x₂ − μ₂) p₂(x₂) dx₂ = 0
Mean of x:

E(x) = ∫_{ℝⁿ} x · (det A)^{1/2}/(2π)^{n/2} · e^{−(1/2)(x−μ)ᵀ A (x−μ)} dx₁ … dxₙ = μ

Covariance matrix of x:
coincides with the inverse of the structure matrix
C = A⁻¹

Theorem
For multivariate normal RVs, stochastic independence is
equivalent to uncorrelation
Warning: If two RVs x and y are normal, the pair (x, y) may
not be a bivariate normal RV!

1st counterexample:
If x is standard normal and y = −x, then y is standard normal:

p_y(y) = p_x(x) |dx/dy|_{x=−y} = 1/√(2π) · e^{−y²/2} · 1 = 1/√(2π) · e^{−y²/2}

but the joint distribution is concentrated on the line x = −y:

p(x, y) = 1/√(2π) · e^{−x²/2} δ(x + y)

2nd counterexample:
For a standard normal x, the RV

y = x if |x| ≤ 1,    y = −x if |x| > 1

is again standard normal, but the joint distribution

p(x, y) = 1/√(2π) · e^{−x²/2} δ(y − x) if |x| ≤ 1
p(x, y) = 1/√(2π) · e^{−x²/2} δ(y + x) if |x| > 1

is not bivariate normal
3. INDIRECT MEASUREMENTS
FUNCTIONS OF RANDOM VARIABLES
3.1 Generalities
() Direct measurement:
obtained by a comparison with a specimen or the use of a
calibrated instrument

() Indirect measurement:
performed by a calculation, from a relationship among
quantities which are in turn measured directly or indirectly
x = f (x1 , x2 , . . . , xn )
y = Bx
E(Bx) = B E(x) = Bμ ,
with

A = | 0  1 |     B = | 1  2 |
    | 1  0 |         | 2  1 |

We have that:

AB = | 0  1 | | 1  2 |  =  | 2  1 |  ≠ 0
     | 1  0 | | 2  1 |     | 1  2 |

and analogously

B² = | 1  2 | | 1  2 |  =  | 5  4 |  ≠ B
     | 2  1 | | 2  1 |     | 4  5 |
Qᵣ = xᵀ Aᵣ x = Σ_{i,h=1}^{n} (Aᵣ)_{ih} xᵢ xₕ    r = 1, 2, …, k

Σ_{i=1}^{n} xᵢ² = Σ_{j=1}^{k} Qⱼ
Q₁ + Q₂ = Σ_{i=1}^{n} xᵢ²

Q₁ + Q₂ = Q
z = f(x, y) = x sin y + x² y

∂f/∂x (0.5, 0.2) = sin 0.2 + 2 · 0.5 · 0.2 = 0.398669331

∂f/∂y (0.5, 0.2) = 0.5 cos 0.2 + 0.5² = 0.740033289
and variance
Ax = b
Theorem
Let x be a RV in ℝ with probability distribution p(x) and let
g : ℝ → ℝ be a monotonic increasing C¹ function. The RV
y = g(x)
has then the range (g(−∞), g(+∞)), with probability
distribution

p(y) = p[g⁻¹(y)] · [dg/dx(x)]⁻¹ evaluated at x = g⁻¹(y)

Remark
If y = g(x) = P(x), the cumulative distribution of x, we get

p(y) = p[g⁻¹(y)] · [p(x)]⁻¹|_{x=g⁻¹(y)} = 1

so that y is uniformly distributed on (0, 1)
x = P 1 (z)
xn = yn /2147483647
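The modulus 2147483647 = 2³¹ − 1 identifies a Lehmer-type congruential generator. A minimal sketch; the multiplier 16807 is the classic Park-Miller choice, assumed here since the text shows only the modulus:

```python
# Minimal-standard Lehmer generator y_{k+1} = 16807 * y_k mod (2^31 - 1),
# with uniform deviates x_k = y_k / 2147483647 as in the text.
# The multiplier 16807 (Park-Miller) is an assumption: only the modulus
# 2147483647 appears in the original.
M = 2147483647
A = 16807

def lehmer(seed, count):
    """Return `count` uniform deviates in (0, 1) from integer seed."""
    y = seed
    out = []
    for _ in range(count):
        y = (A * y) % M
        out.append(y / M)
    return out

# Classic self-check from Park & Miller: starting from seed 1,
# the 10000th integer state is 1043618065.
y = 1
for _ in range(10000):
    y = (A * y) % M
print(y)   # 1043618065
assert y == 1043618065
```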
4. SAMPLE THEORY
SAMPLE ESTIMATES OF μ AND σ²
In particular:
Sample estimate of mean and variance of the distribution
In general:
Hypothesis testing on the distribution and the parameters
therein
E[(x̄ − μ)²] = (1/n²) Σ_{i,j=1}^{n} E[(xᵢ − μ)(xⱼ − μ)] = (1/n²) Σ_{i,j=1}^{n} δᵢⱼ σ² = σ²/n

∀ ε > 0:    lim_{n→+∞} p[μ − ε ≤ x̄ₙ ≤ μ + ε] = 1
Therefore, also for the sample estimate of the variance, the
probability distribution tends to shrink and concentrate around
the variance σ²
() the RV

X² = (n−1)s²/σ² = (1/σ²) Σ_{i=1}^{n} (xᵢ − x̄)²

obeys a X² distribution with n−1 d.o.f.

Proof
X² is a positive semidefinite quadratic form of the independent
standard RVs ξᵢ = (xᵢ − μ)/σ:

X² = Σ_{i=1}^{n} [ ξᵢ − (1/n) Σ_{j=1}^{n} ξⱼ ]² = Σ_{i,j=1}^{n} (δᵢⱼ − 1/n) ξᵢ ξⱼ

with an idempotent representative matrix:

Σ_{j=1}^{n} (δᵢⱼ − 1/n)(δⱼₖ − 1/n) = Σ_{j=1}^{n} [ δᵢⱼ δⱼₖ − δᵢⱼ/n − δⱼₖ/n + 1/n² ] = δᵢₖ − 1/n

of rank n−1 (the eigenvalues are 1 and 0, with multiplicities
n−1 and 1, respectively)
() the RV

t = (x̄ − μ)/(s/√n)

is a Student's t with n−1 d.o.f.

Proof
There holds

t = (x̄ − μ)/(s/√n) = √n (x̄ − μ)/σ · σ/s = √(n−1) · [ √n (x̄ − μ)/σ ] / √((n−1)s²/σ²)

so that

t = √(n−1) · z/√(X²)

with z and X² stochastically independent, the former standard
normal and the latter X² with n−1 d.o.f.
t[α](n) = −t[1−α](n)
with(stats):
statevalf[icdf, studentst[n]](α);

For instance:

statevalf[icdf, studentst[12]](0.7);
0.5386176682

so that
t[12](0.7) = 0.5386176682

The Excel function
TINV(α; n)
The condition

X²[α/2](n−1) ≤ (n−1)s²/σ² ≤ X²[1−α/2](n−1)

is then verified with probability 1−α and defines the (two-sided)
confidence interval of the variance as

(n−1)s² · 1/X²[1−α/2](n−1) ≤ σ² ≤ (n−1)s² · 1/X²[α/2](n−1)
with(stats):
statevalf[icdf, chisquare[n]](α);

For instance:

statevalf[icdf, chisquare[13]](0.3);
9.925682415

and therefore
X²[13](0.3) = 9.925682415

The Excel function
CHIINV(α; n)
returns the value of X²[1−α](n). Consequently, X²[α](n) can be
calculated as
CHIINV(1 − α; n)
 i     xi        i     xi
 1   1.0851     13   1.0768
 2   1.0834     14   1.0842
 3   1.0782     15   1.0811
 4   1.0818     16   1.0829
 5   1.0810     17   1.0803
 6   1.0837     18   1.0811
 7   1.0857     19   1.0789
 8   1.0768     20   1.0831
 9   1.0842     21   1.0829
10   1.0786     22   1.0825
11   1.0812     23   1.0796
12   1.0784     24   1.0841
1.08039 ≤ μ ≤ 1.08257

or, equivalently,

μ = 1.08148 ± 0.00109

(n−1)s²/X²[0.025](23) = 23 · 6.6745·10⁻⁶ / 11.689 = 13.1332·10⁻⁶

and the CI becomes
For a fixed z[1−α/2] > 0, the probability that the RV zₙ takes any
value within the interval (symmetric with respect to zₙ = 0)

−z[1−α/2] ≤ zₙ ≤ z[1−α/2]

is approximately given by the integral between −z[1−α/2] and
z[1−α/2] of the standard normal distribution

∫_{−z[1−α/2]}^{z[1−α/2]} 1/√(2π) · e^{−z²/2} dz

the critical value z[1−α/2] > 0 being thus defined by the equation

∫_{0}^{z[1−α/2]} 1/√(2π) · e^{−ζ²/2} dζ = 1/2 − α/2
−z[1−α/2] ≤ (x̄ − μ)/(s/√n) ≤ z[1−α/2]

with(stats):
statevalf[icdf, normald[0, 1]](1 − α/2);

so that
z[0.8] = 0.8416212336
The fourth central moment E[(xᵢ − μ)⁴] is estimated by

m₄ = 1/(n−1) · Σ_{i=1}^{n} (xᵢ − x̄)⁴
zₙ = (s² − σ²) / √[ m₄/n + s⁴ (3−n)/(n(n−1)) ]
α = 0.05 ⟹ α/2 = 0.025 ⟹ 1 − α/2 = 1 − 0.025 = 0.975

On the table of the standard normal distribution we then read
and finally

0.81812 cm ≤ μ ≤ 0.8298 cm .
μ = 0.824 ± 0.0058 cm .

that is

0.824 − 2.58 · 0.042/√200 ≤ μ ≤ 0.824 + 2.58 · 0.042/√200

and finally

0.8163 cm ≤ μ ≤ 0.8317 cm .
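The interval above is a one-line computation. A sketch reproducing it from the quoted summaries (x̄ = 0.824 cm, s = 0.042 cm, n = 200, z critical value 2.58):

```python
from math import sqrt

# 99% confidence interval for the mean from the summaries in the text
x_bar, s, n, z = 0.824, 0.042, 200, 2.58
half = z * s / sqrt(n)
lo, hi = x_bar - half, x_bar + half
print(f"{lo:.4f} <= mu <= {hi:.4f}")   # 0.8163 <= mu <= 0.8317
assert round(lo, 4) == 0.8163 and round(hi, 4) == 0.8317
```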
5. HYPOTHESIS TESTING
The set of the possible values of the test variable is divided into
two regions: a rejection region and an acceptance region.
As a rule, these regions are (possibly unions of) real intervals
If the value of the test variable for the given sample falls in the
rejection region, the null hypothesis is rejected
An illustrative example
() Process to be analysed:
repeated toss of a die (independent tosses)
() Probability distribution of f:
H0 true ⟹ p = 1/6
H1 true ⟹ p = 1/4
() Testing strategy:
f ≤ f_cr ⟹ acceptance of H0
PARAMETRIC TESTS

t-test on the mean of a normal population
X²-test on the variance of a normal population
z-test to compare the means of 2 independent normal populations

NON-PARAMETRIC TESTS

Sign test for the median
Sign test to check if 2 paired samples belong to the same population
.................
X² = Σᵢ (fᵢ − nᵢ)²/nᵢ

wherein the sum is over all the bins of the histogram, follows
approximately a X² distribution with k−1−c d.o.f., being:

k the number of bins;
c the number of parameters of the distribution which are
not a priori assigned, but calculated by using the
same sample (the so-called constraints)
1 − (1 − α) = α
xi xi xi xi
0.101 0.020 0.053 0.120
0.125 0.075 0.033 0.130
0.210 0.180 0.151 0.145
0.245 0.172 0.199 0.252
0.370 0.310 0.290 0.349
0.268 0.333 0.265 0.287
0.377 0.410 0.467 0.398
0.455 0.390 0.444 0.497
0.620 0.505 0.602 0.544
0.526 0.598 0.577 0.556
0.610 0.568 0.630 0.749
0.702 0.657 0.698 0.731
0.642 0.681 0.869 0.761
0.802 0.822 0.778 0.998
0.881 0.950 0.922 0.975
0.899 0.968 0.945 0.966
k − 1 − c = 8 − 1 − 0 = 7
P(x̄ ≤ x < x̄ + 0.5s) = ∫_{0}^{0.5} 1/√(2π) · e^{−z²/2} dz = 0.19146

P(x̄ + 0.5s ≤ x < x̄ + s) = ∫_{0.5}^{1} 1/√(2π) · e^{−z²/2} dz =
= ∫_{0}^{1} 1/√(2π) · e^{−z²/2} dz − ∫_{0}^{0.5} 1/√(2π) · e^{−z²/2} dz =
= 0.34134 − 0.19146 = 0.14988

P(x̄ + s ≤ x < x̄ + 1.5s) = ∫_{1}^{1.5} 1/√(2π) · e^{−z²/2} dz =
= ∫_{0}^{1.5} 1/√(2π) · e^{−z²/2} dz − ∫_{0}^{1} 1/√(2π) · e^{−z²/2} dz =
= 0.43319 − 0.34134 = 0.09185

P(x̄ + 1.5s ≤ x) = ∫_{1.5}^{+∞} 1/√(2π) · e^{−z²/2} dz =
= ∫_{0}^{+∞} 1/√(2π) · e^{−z²/2} dz − ∫_{0}^{1.5} 1/√(2π) · e^{−z²/2} dz =
= 0.5 − 0.43319 = 0.06681
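These band probabilities can be computed directly with the error function, since the integral of the standard normal density from 0 to z equals erf(z/√2)/2. A sketch (note that 0.34134 − 0.19146 = 0.14988 for the second band):

```python
from math import erf, sqrt

def Phi0(z):
    """Integral of the standard normal density from 0 to z."""
    return 0.5 * erf(z / sqrt(2))

bands = [
    (0.0, 0.5),   # P(x_bar        <= x < x_bar + 0.5s)
    (0.5, 1.0),   # P(x_bar + 0.5s <= x < x_bar + s)
    (1.0, 1.5),   # P(x_bar + s    <= x < x_bar + 1.5s)
]
probs = [Phi0(b) - Phi0(a) for a, b in bands]
tail = 0.5 - Phi0(1.5)          # P(x_bar + 1.5s <= x)
print([round(p, 5) for p in probs], round(tail, 5))
assert round(probs[0], 5) == 0.19146
assert round(probs[2], 5) == 0.09185
assert round(tail, 5) == 0.06681
```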
X² = Σ_{i=1}^{8} (fᵢ − nᵢ)²/nᵢ = 17.370884

with 8 − 1 − 2 = 5 d.o.f.
Test variable
We consider the maximum residual between the theoretical and
empirical cumulative distributions

Dₙ = sup_{x∈ℝ} |P(x) − P_{e,n}(x)|

Theorem
If the random variables x₁, x₂, … are i.i.d. with continuous
cumulative distribution P(x), for any given λ > 0 there holds

Probability[ Dₙ > λ/√n ] ⟶ Q(λ)    as n → +∞

with
Rejection region
A large value of the variable Dₙ must be considered as an
indication of a bad overlap between the empirical cumulative
distribution P_{e,n}(x) and the conjectured theoretical
distribution P(x).
Therefore the null hypothesis will be rejected, with significance
level α, if

Dₙ > Q⁻¹(α)/√n

In particular

Q⁻¹(0.10) = 1.22384    Q⁻¹(0.05) = 1.35809
Q⁻¹(0.01) = 1.62762    Q⁻¹(0.001) = 1.94947
i xi i xi
1 0.750 17 0.473
2 0.102 18 0.779
3 0.310 19 0.266
4 0.581 20 0.508
5 0.199 21 0.979
6 0.859 22 0.802
7 0.602 23 0.176
8 0.020 24 0.645
9 0.711 25 0.473
10 0.422 26 0.927
11 0.958 27 0.368
12 0.063 28 0.693
13 0.517 29 0.401
14 0.895 30 0.210
15 0.125 31 0.465
16 0.631 32 0.827
Solution
To apply the test we need to compute the empirical cumulative
distribution and compare it to the cumulative distribution of a
uniform RV in [0, 1].
The cumulative distribution of a uniform RV in [0, 1] is deter-
mined in an elementary way
P(x) = 0 for x < 0,    P(x) = x for 0 ≤ x < 1,    P(x) = 1 for x ≥ 1
To compute the empirical cumulative distribution we must
rearrange the sample data into increasing order:
xi xi xi xi
0.020 0.297 0.517 0.779
0.063 0.310 0.581 0.802
0.102 0.368 0.602 0.827
0.125 0.401 0.631 0.859
0.176 0.422 0.645 0.895
0.199 0.465 0.693 0.927
0.210 0.473 0.711 0.958
0.266 0.508 0.750 0.979
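The statistic can be evaluated from the sorted sample. A sketch using the standard two-sided evaluation at the jump points of the empirical step function; this gives D₃₂ ≈ 0.059 rather than the 0.048 quoted below (the original presumably evaluates the gap on a coarser grid), but either value is far below the 5% threshold:

```python
# One-sample Kolmogorov-Smirnov statistic against the uniform CDF
# P(x) = x on [0, 1], for the sorted sample in the text.
data = sorted([
    0.020, 0.063, 0.102, 0.125, 0.176, 0.199, 0.210, 0.266,
    0.297, 0.310, 0.368, 0.401, 0.422, 0.465, 0.473, 0.508,
    0.517, 0.581, 0.602, 0.631, 0.645, 0.693, 0.711, 0.750,
    0.779, 0.802, 0.827, 0.859, 0.895, 0.927, 0.958, 0.979,
])
n = len(data)
# largest gap between the empirical step function and P at its jumps
D = max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(data, start=1))
print(round(D, 3))
# well below the 5% rejection threshold 1.35809 / sqrt(32) = 0.240
assert D < 1.35809 / 32 ** 0.5
```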
D32 = 0.048
Q⁻¹(0.05) = 1.35809

so that H0 would be rejected only if

D₃₂ > Q⁻¹(0.05)/√32 = 1.35809/5.656854249 = 0.2400786621
t = (x̄ − μ₀)/(s/√n)

(i) H0: μ = μ₀ versus H1: μ ≠ μ₀
The test is two-sided. The rejection region consists in the union
of two intervals placed symmetrically with respect to the origin
t = 0:

t ≤ −t[1−α/2](n−1)    or    t ≥ t[1−α/2](n−1)
i xi i xi
1 449 16 398
2 391 17 472
3 432 18 449
4 459 19 435
5 389 20 386
6 435 21 388
7 430 22 414
8 416 23 376
9 420 24 463
10 381 25 344
11 417 26 353
12 407 27 400
13 447 28 438
14 391 29 437
15 480 30 373
Solution
Number of data: n = 30
Sample mean: x̄ = (1/n) Σ_{i=1}^{n} xᵢ = 415.66
Sample variance: s² = 1/(n−1) · Σ_{i=1}^{n} (xᵢ − x̄)² = 1196.44

Estimated standard deviation: s = √s² = 34.59

t = (415.66 − 425)/(34.59/√30) = −1.47896
x̄ − t[0.99](29) · s/√30 = 415.66 − 2.462 · 34.59/√30 = 400.11

x̄ + t[0.99](29) · s/√30 = 415.66 + 2.462 · 34.59/√30 = 431.21

400.11 ≤ μ ≤ 431.21

Generally speaking:
The test variable takes a value in the acceptance region of H0:
μ = μ₀ if and only if the tested value μ₀ of the mean belongs
to the confidence interval of μ (the significance level of the test,
α, and the confidence level of the CI, 1 − α, are complementary
to 1).

X² = (n−1)s²/σ₀²

(i) H0: σ² = σ₀² versus H1: σ² ≠ σ₀²
Two-sided test with rejection region of H0:

X² ≤ X²[α/2](n−1)    or    X² ≥ X²[1−α/2](n−1)
i Vi i Vi
1 1.426 16 1.497
2 1.353 17 1.484
3 1.468 18 1.466
4 1.379 19 1.383
5 1.481 20 1.411
6 1.413 21 1.428
7 1.485 22 1.358
8 1.476 23 1.483
9 1.498 24 1.473
10 1.475 25 1.482
11 1.467 26 1.435
12 1.478 27 1.424
13 1.401 28 1.521
14 1.492 29 1.399
15 1.376 30 1.489
Number of data: n = 30
Sample mean: V̄ = (1/n) Σ_{i=1}^{n} Vᵢ = 1.4467
Sample variance:

s² = 1/(n−1) · Σ_{i=1}^{n} (Vᵢ − V̄)² = 0.00221318

Sample standard deviation: s = √s² = 0.04704448
29 s²/X²[0.99](29) = 29 · 0.00221318/49.588 = 12.94311 · 10⁻⁴

whereas

29 s²/X²[0.01](29) = 29 · 0.00221318/14.256 = 45.02125 · 10⁻⁴

X² = (n−1)s²/σ₀²

X² = (n−1)s²/σ₀² = 29 · 0.002213183/(32 · 10⁻⁴) = 20.057
For two independent normal samples (y₁, y₂, …, y_p) and
(z₁, z₂, …, z_q) we consider the sample means

ȳ = (1/p) Σ_{i=1}^{p} yᵢ    z̄ = (1/q) Σ_{j=1}^{q} zⱼ

and the sample variances, through

(p−1)s_y²/σ₁² = (1/σ₁²) Σ_{i=1}^{p} (yᵢ − ȳ)²    (q−1)s_z²/σ₂² = (1/σ₂²) Σ_{j=1}^{q} (zⱼ − z̄)²
∫_{0}^{+∞} Γ((n₁+n₂)/2)/[Γ(n₁/2) Γ(n₂/2)] · (n₁/n₂)^{n₁/2} F^{n₁/2 − 1} (1 + (n₁/n₂)F)^{−(n₁+n₂)/2} dF = 1
with(stats):
statevalf[icdf, fratio[n₁, n₂]](α);

so that
F[0.85](10,13) = 1.841918109

The Excel function
FINV(α; n₁; n₂)
F = s²_A/s²_B

{F ≤ F[0.025](9,11)} ∪ {F ≥ F[0.975](9,11)}

where
The F-values are not available on the table, but can be easily
calculated by numerical tools. For instance, by the Maple
commands above one obtains

F = 0.49822695 ∉ {F ≤ 0.2556188} ∪ {F ≥ 3.587898}
s² = [(p−1)s_y² + (q−1)s_z²]/(p + q − 2) ,

the random variable

t = (ȳ − z̄) / [s √(1/p + 1/q)]
X² = (1/σ²) [ Σ_{i=1}^{p} (yᵢ − ȳ)² + Σ_{j=1}^{q} (zⱼ − z̄)² ]

where:

(1/σ²) Σ_{k=1}^{p+q} (xₖ − x̄)² is a X² variable with p+q−1 d.o.f.

(1/σ²) [ Σ_{i=1}^{p} (yᵢ − ȳ)² + Σ_{j=1}^{q} (zⱼ − z̄)² ] is a X² with p+q−2 d.o.f.

(1/σ²) · pq/(p+q) · (ȳ − z̄)² is a X² with 1 d.o.f.

The previous properties imply the stochastic independence
of the random variables

(1/σ²) [ Σ_{i=1}^{p} (yᵢ − ȳ)² + Σ_{j=1}^{q} (zⱼ − z̄)² ]    and    (1/σ²) · pq/(p+q) · (ȳ − z̄)²

and therefore of

Σ_{i=1}^{p} (yᵢ − ȳ)² + Σ_{j=1}^{q} (zⱼ − z̄)²    and    ȳ − z̄
Process 1 Process 2
9 14
4 9
10 13
7 12
9 13
10 8
10
x̄ = (1/p) Σ_{i=1}^{p} xᵢ = 8.17    s_x² = 1/(p−1) · Σ_{i=1}^{p} (xᵢ − x̄)² = 5.37

ȳ = (1/q) Σ_{j=1}^{q} yⱼ = 11.29    s_y² = 1/(q−1) · Σ_{j=1}^{q} (yⱼ − ȳ)² = 5.24

s² = [(p−1)s_x² + (q−1)s_y²]/(p + q − 2) = (5 · 5.37 + 6 · 5.24)/11 = 5.299

t = (x̄ − ȳ)/[s √(1/p + 1/q)] = (8.17 − 11.29)/[√5.299 · √(1/6 + 1/7)] = −2.436
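A sketch reproducing the pooled two-sample t statistic from the raw process data:

```python
from math import sqrt

# Pooled two-sample t statistic for the two process samples above
x = [9, 4, 10, 7, 9, 10]          # Process 1 (p = 6)
y = [14, 9, 13, 12, 13, 8, 10]    # Process 2 (q = 7)
p, q = len(x), len(y)
xm, ym = sum(x) / p, sum(y) / q
sx2 = sum((v - xm) ** 2 for v in x) / (p - 1)
sy2 = sum((v - ym) ** 2 for v in y) / (q - 1)
s2 = ((p - 1) * sx2 + (q - 1) * sy2) / (p + q - 2)   # pooled variance
t = (xm - ym) / sqrt(s2 * (1 / p + 1 / q))
print(round(t, 3))   # -2.436
assert round(t, 3) == -2.436
```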
dᵢ = yᵢ − zᵢ ,    i = 1, …, n

t = d̄/(s_d/√n) = (ȳ − z̄)/(s_d/√n)

with

s_d² = 1/(n−1) · Σ_{i=1}^{n} (yᵢ − zᵢ − ȳ + z̄)²

since

t[1−α/2](n−1) = t[0.975](7) = 2.365

ȳ = (1/8) Σ_{i=1}^{8} yᵢ = 8.425

z̄ = (1/8) Σ_{i=1}^{8} zᵢ = 10.1875

while

s_d² = 1/(n−1) · Σ_{i=1}^{n} (yᵢ − zᵢ − ȳ + z̄)² = 21.89875/7 = 3.12839286

and therefore

s_d = √3.12839286 = 1.76872634
Remark
All the t-tests for the comparison of the means of two normal
populations are implemented in Excel by the function TTEST:
unpaired with equal variances
unpaired with different variances
paired
in both the two-sided and the one-sided cases.
Question
Are the mean values of the quantity x significantly different for
the various treatments?
Or are they essentially equal, so that the various treatments
do not modify the property x in a significant way?
The model
Set of random variables with two indices
xij i = 1, . . . , r j = 1, . . . , c
Hypothesis to be tested
H0: μᵢ = μ ∀ i = 1, …, r (equal means)
Method
ANOVA
For a set of data ranked in r samples of c data each, the 1-
factor analysis of variance consists in the interpretation (or
explanation) of data spread by distinguishing
a contribution due to the choice of the sample (row)
an intrinsic contribution of the single samples
1-factor ANOVA
when the factor is related to the first index

xᵢⱼ = μ + αᵢ + εᵢⱼ    Σᵢ αᵢ = 0    εᵢⱼ i.i.d. N(0, σ)

etc. etc.
Mean of data

x̄ = (1/rc) Σ_{i=1}^{r} Σ_{j=1}^{c} xᵢⱼ = (1/rc) Σ_{i,j} xᵢⱼ

Mean of a sample
the mean of the data in a row, i.e. within a sample

x̄ᵢ = (1/c) Σ_{j=1}^{c} xᵢⱼ

Residual
difference between the value of a variable and the mean of data
xᵢⱼ − x̄

Total variation
the (residual) sum of squares

SSₜ = Σ_{i,j} (xᵢⱼ − x̄)²

Variation explained by the factor

SS₁ = Σ_{i,j} (x̄ᵢ − x̄)² = c Σ_{i=1}^{r} (x̄ᵢ − x̄)²

It expresses the data variation due to the fact that the samples
(rows) have unequal means

Unexplained variation
the sum of squares of unexplained residuals

SSᵤ = Σ_{i,j} (xᵢⱼ − x̄ᵢ)²
1-factor ANOVA
Model:
xᵢⱼ = μ + αᵢ + εᵢⱼ    i = 1, …, r    j = 1, …, c
with:
Σᵢ αᵢ = 0
and
εᵢⱼ i.i.d. random variables N(0, σ) ∀ i, j
We make the strong assumption that σ is the same for all the
variables

(1/σ²) Σ_{i,j} (xᵢⱼ − x̄ − αᵢ)² = (εᵢⱼ/σ)ᵀ (I − P₁P₂) (εᵢⱼ/σ) = X²_{rc−1}

(1/σ²) Σ_{i,j} (x̄ᵢ − x̄ − αᵢ)² = (εᵢⱼ/σ)ᵀ P₂(I − P₁) (εᵢⱼ/σ) = X²_{r−1}

(1/σ²) Σ_{i,j} (xᵢⱼ − x̄ᵢ)² = (εᵢⱼ/σ)ᵀ (I − P₂) (εᵢⱼ/σ) = X²_{r(c−1)}

with X²_{r−1} and X²_{r(c−1)} stochastically independent
List of symbols:
(vᵢⱼ) = (v₁₁, v₁₂, …, v_{rc}) ∈ ℝ^{rc} (arbitrary vector of ℝ^{rc})
Fundamental theorem
For any set of data xij the total variation is the sum of the
variation explained by the factor related to i and of the unex-
plained one
SSt = SS1 + SSu
If moreover the variables xij are independent and identically
distributed, with normal distribution of mean and variance
2 , we have that
SSt / 2 is a X 2 variable with rc 1 d.o.f.
SS1 / 2 is a X 2 variable with r 1 d.o.f.
SSu / 2 is a X 2 variable with rc r d.o.f.
SS1 / 2 and SSu / 2 are stochastically independent
Remarks
we have a balance on the number of d.o.f.
rc 1 = (r 1) + (rc r)
the (residual) sums of squares divided by the respective number
of d.o.f. are defined as variances:

SSₜ/(rc − 1)    total variance
SS₁/(r − 1)    variance explained by the factor considered
SSᵤ/(rc − r)    unexplained variance

the ratio between the explained variance and the unexplained one

F = [SS₁/(r−1)] / [SSᵤ/(r(c−1))] = [r(c−1)/(r−1)] · SS₁/SSᵤ

follows a Fisher distribution with (r−1, rc−r) d.o.f.
The hypothesis H0: μᵢ = μ ∀ i = 1, …, r is rejected, with
significance level α, when the explained variance is significantly
larger than the unexplained one, i.e. when

F ≥ F_critical = F[1−α](r−1, r(c−1))
Remark
In the conditions assumed for the model (same variance σ²
and means μᵢ for the various samples) it is often important
to determine the confidence intervals of the mean differences.
We can apply the following result

Scheffé theorem
For any r-uple of real coefficients (C₁, C₂, …, C_r) with zero
mean

Σ_{i=1}^{r} Cᵢ = 0
Question
Do the chemical reagent or the treatment temperature have a
significant effect on the property x of the material?

Model
For any i = 1, …, r and j = 1, …, c the random variable xᵢⱼ
is normal with mean μᵢⱼ and variance σ²

Remark
The mean of the variable xᵢⱼ may depend on both the indices
i and j (on both the chemical reagent and the temperature),
while the variance σ² is the same

Hypothesis to be tested
H0: μᵢⱼ = μ ∀ i = 1, …, r, j = 1, …, c (equal means)
Definitions
Mean of data

x̄ = (1/rc) Σ_{i=1}^{r} Σ_{j=1}^{c} xᵢⱼ = (1/rc) Σ_{i,j} xᵢⱼ

Mean of the i-th row

x̄ᵢ = (1/c) Σ_{j=1}^{c} xᵢⱼ

Mean of the j-th column

x̄ⱼ = (1/r) Σ_{i=1}^{r} xᵢⱼ
Residual
xᵢⱼ − x̄

Residual explained by the factor related to index i
x̄ᵢ − x̄

Residual explained by the factor related to index j
x̄ⱼ − x̄

Unexplained residual
xᵢⱼ − x̄ᵢ − x̄ⱼ + x̄

Total variation

SSₜ = Σ_{i,j} (xᵢⱼ − x̄)²

Variation explained by the first factor

SS₁ = Σ_{i,j} (x̄ᵢ − x̄)² = c Σ_{i=1}^{r} (x̄ᵢ − x̄)²

Variation explained by the second factor

SS₂ = Σ_{i,j} (x̄ⱼ − x̄)² = r Σ_{j=1}^{c} (x̄ⱼ − x̄)²

Unexplained variation

SSᵤ = Σ_{i,j} (xᵢⱼ − x̄ᵢ − x̄ⱼ + x̄)²
Model:
xᵢⱼ = μ + αᵢ + βⱼ + εᵢⱼ    i = 1, …, r    j = 1, …, c
with:
Σᵢ αᵢ = 0    Σⱼ βⱼ = 0
and
εᵢⱼ i.i.d. random variables N(0, σ) ∀ i, j

(1/σ²) Σ_{i,j} (xᵢⱼ − x̄ − αᵢ − βⱼ)² = (εᵢⱼ/σ)ᵀ (I − P₁P₂) (εᵢⱼ/σ) = X²_{rc−1}

(1/σ²) Σ_{i,j} (x̄ᵢ − x̄ − αᵢ)² = (εᵢⱼ/σ)ᵀ P₂(I − P₁) (εᵢⱼ/σ) = X²_{r−1}

(1/σ²) Σ_{i,j} (x̄ⱼ − x̄ − βⱼ)² = (εᵢⱼ/σ)ᵀ (I − P₂)P₁ (εᵢⱼ/σ) = X²_{c−1}

(1/σ²) Σ_{i,j} (xᵢⱼ − x̄ᵢ − x̄ⱼ + x̄)² = (εᵢⱼ/σ)ᵀ (I − P₂)(I − P₁) (εᵢⱼ/σ) = X²_{(r−1)(c−1)}

with

X²_{r−1}    X²_{c−1}    X²_{(r−1)(c−1)}

stochastically independent
Fundamental theorem
For any set xᵢⱼ the total variation coincides with the sum of
the variation explained by the factor related to i,
the variation explained by the factor associated to j,
and the unexplained variation:

SSₜ = SS₁ + SS₂ + SSᵤ

Remarks
rc − 1 = (r − 1) + (c − 1) + (r − 1)(c − 1)

Test variable for the first factor:

F = [SS₁/(r−1)] / [SSᵤ/((r−1)(c−1))]    F_critical = F[1−α](r−1,(r−1)(c−1))

with rejection of H0 when F > F[1−α](r−1,(r−1)(c−1))

Test variable for the second factor:

F = [SS₂/(c−1)] / [SSᵤ/((r−1)(c−1))]    F_critical = F[1−α](c−1,(r−1)(c−1))

with rejection of H0 when F > F[1−α](c−1,(r−1)(c−1))
Model:
xᵢⱼₖ = μ + αᵢ + βⱼ + γᵢⱼ + εᵢⱼₖ
i = 1, …, r    j = 1, …, c    k = 1, …, s
with:
Σᵢ αᵢ = 0    Σⱼ βⱼ = 0
Σᵢ γᵢⱼ = 0 ∀ j    Σⱼ γᵢⱼ = 0 ∀ i
and
εᵢⱼₖ i.i.d. random variables N(0, σ) ∀ i, j, k,
stochastically independent
Hypothesis to be tested:
H0: the median of the distribution is m
H1: the median of the distribution is different from m

Test variable
For a set x₁, x₂, …, xₙ of outcomes of x we pose

sᵢ = 1 if xᵢ − m > 0,    sᵢ = 0 if xᵢ − m ≤ 0    i = 1, …, n

and introduce as a test variable the number of positive
differences with respect to the median

z = Σ_{i=1}^{n} sᵢ

Rejection region
We are led to reject the hypothesis H0 whenever the number
of positive differences with respect to the median turns out to
be too large or too small

{z ≤ z_min} ∪ {z ≥ z_max}

For a given significance level α of the test, the bounds z_min and
z_max of the rejection region are therefore defined by

Σ_{z=0}^{z_min} C(n,z)/2ⁿ = α/2    Σ_{z=z_max}^{n} C(n,z)/2ⁿ = α/2

where C(n,z) = n!/[z!(n−z)!] denotes the binomial coefficient
The sign test is very useful to check whether two paired samples
are drawn from the same statistical population,
i.e. if two paired samples follow the same probability
distribution

The test relies on the remark that for any two independent
continuous random variables x and y with the same probability
distribution p(x) or p(y) the difference RV

z = x − y

has an even distribution:

p_z(−z) = p_z(z)    ∀ z ∈ ℝ

Consequently, since

p_z(z) = ∫_ℝ p(y + z) p(y) dy

positive and (null or) negative differences have the same
probability to occur:

p(z > 0) = 1/2    p(z ≤ 0) = 1/2 .

Σ_{z=0}^{z_min} C(n,z)/2ⁿ = α/2    Σ_{z=z_max}^{n} C(n,z)/2ⁿ = α/2
Assuming that the drive conditions are the same, check if there
is any difference in the performances of the engines due to the
use of the additive, with a significance level of 5% and 1%.

Solution
We must test the null hypothesis H0 that no difference of
performances occurs between the cars fed with additivated fuel
and the same vehicles filled up with normal gasoline, versus
the alternative hypothesis H1 that the cars filled up with
additivated fuel have better performances.

The hypothesis H0 is rejected in favour of H1 only if the number
of positive differences is sufficiently large: the test is one-sided.

The distances per litre covered by the additivated and the
non-additivated cars show 12 positive differences and 4
negative differences.

If H0 is correct, the probability of having at least 12 positive
signs out of 16 differences writes
Σ_{z=12}^{16} C(16,z)/2¹⁶ = [C(16,12) + C(16,13) + C(16,14) + C(16,15) + C(16,16)]/2¹⁶ = 0.0384
Let us pose

z = |xₖ − x̄|/s ≃ |xₖ − μ|/σ

The probability of a deviation larger than zσ from the mean is

p(z) = 1 − ∫_{μ−zσ}^{μ+zσ} 1/(σ√(2π)) · e^{−(x−μ)²/2σ²} dx

that is

p(z) = 1 − 2 ∫_{0}^{z} 1/√(2π) · e^{−ζ²/2} dζ = 1 − erf(z/√2)

where

erf(x) = (2/√π) ∫_{0}^{x} e^{−t²} dt
n critical z n critical z
1 0.674489750 27 2.355084009
2 1.150349380 28 2.368567059
3 1.382994127 29 2.381519470
4 1.534120544 30 2.393979800
5 1.644853627 40 2.497705474
6 1.731664396 50 2.575829304
7 1.802743091 60 2.638257273
8 1.862731867 70 2.690109527
9 1.914505825 80 2.734368787
10 1.959963985 90 2.772921295
11 2.000423569 100 2.807033768
12 2.036834132 150 2.935199469
13 2.069901831 200 3.023341440
14 2.100165493 250 3.090232306
15 2.128045234 300 3.143980287
16 2.153874694 350 3.188815259
17 2.177923069 400 3.227218426
18 2.200410581 450 3.260767488
19 2.221519588 500 3.290526731
20 2.241402728 550 3.317247362
21 2.260188991 600 3.341478956
22 2.277988333 650 3.363635456
23 2.294895209 700 3.384036252
24 2.310991338 800 3.420526701
25 2.326347874 900 3.452432937
26 2.341027138 1000 3.480756404
var(x) = ∫_{ℝ²} p(x) δ(ax + b − y)(x − μₓ)² dy dx = ∫_ℝ p(x)(x − μₓ)² dx

var(y) = ∫_{ℝ²} p(x) δ(ax + b − y)(y − μ_y)² dy dx =
= ∫_ℝ p(x)(ax − aμₓ)² dx = a² ∫_ℝ p(x)(x − μₓ)² dx

cov(x, y) = ∫_{ℝ²} p(x) δ(ax + b − y)(x − μₓ)(y − μ_y) dy dx =
= ∫_ℝ p(x)(x − μₓ)(ax − aμₓ) dx = a ∫_ℝ p(x)(x − μₓ)² dx

corr(x, y) = cov(x, y)/√(var(x) var(y)) = a/|a| = +1 if a > 0, −1 if a < 0
with

μ_y = ∫_ℝ p(x) f(x) dx
r = Σ_{i=1}^{n} (xᵢ − x̄)(yᵢ − ȳ) / √[ Σ_{i=1}^{n} (xᵢ − x̄)² · Σ_{i=1}^{n} (yᵢ − ȳ)² ]

z = arctanh r = (1/2) ln[(1 + r)/(1 − r)]

follows approximately a normal distribution
of mean

μ(z) = (1/2) ln[(1 + ρ)/(1 − ρ)] + ρ/(2(n−1))

and standard deviation

σ(z) ≃ 1/√(n−3)

p₀,ₙ(r) = Γ[(n−1)/2]/(√π Γ[(n−2)/2]) · (1 − r²)^{(n−4)/2}
i xi yi
1 1.8 1.1
2 3.2 1.6
3 4.3 1.1
4 5.9 3.5
5 8.1 3.8
6 9.9 6.2
7 11.4 5.6
8 12.1 4.0
9 13.5 7.5
10 17.9 7.1
1 1
10 10
x = xi = 8.8100 y = yi = 4.1500
10 10
i=1 i=1
and sums of squares
\[ SS_{xx} = \sum_{i=1}^{10} (x_i - \bar{x})^2 = 233.2690 \qquad SS_{yy} = \sum_{i=1}^{10} (y_i - \bar{y})^2 = 51.9050 \]
by means of the RV
\[ t \;=\; \sqrt{n-2}\; \frac{r}{\sqrt{1-r^2}} \]
which, under H0 : ρ = 0, follows a Student distribution with n − 2 d.o.f.
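For the 10-point sample tabulated above, r and the test statistic t can be computed directly (a sketch; only the standard library is used):

```python
from math import sqrt

x = [1.8, 3.2, 4.3, 5.9, 8.1, 9.9, 11.4, 12.1, 13.5, 17.9]
y = [1.1, 1.6, 1.1, 3.5, 3.8, 6.2, 5.6, 4.0, 7.5, 7.1]
n = len(x)
xm, ym = sum(x) / n, sum(y) / n

ss_xy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
ss_xx = sum((xi - xm) ** 2 for xi in x)   # 233.2690, as above
ss_yy = sum((yi - ym) ** 2 for yi in y)   # 51.9050, as above

r = ss_xy / sqrt(ss_xx * ss_yy)
t = sqrt(n - 2) * r / sqrt(1 - r ** 2)
print(round(r, 4), round(t, 2))  # 0.9052 6.02
```

Since t ≈ 6.02 exceeds t[0.975](8) = 2.306, the correlation is significant at the 5% level.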
6.3 Remark
7. DATA MODELING
Measurements
Data of the form (xi , yi ), i = 1, . . . , n
Synthesis of data into a model
We assume that
so that
best fit ⟺ search for the minimum of the merit function
Merit function
The choice of the merit function often relies on the so-called
maximum likelihood method
\[ P(x_1, y_1, \ldots, x_n, y_n) \;\propto\; \prod_{i=1}^{n} e^{-[y_i - R(x_i)]^2 / 2\sigma_i^2} \;=\; \exp\Big( -\frac{1}{2} \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \big[ y_i - R(x_i) \big]^2 \Big) \]
requiring that P(x1, y1, . . . , xn, yn) be maximum
Let
\[ (x_i, y_i) \in \mathbb{R}^2 , \quad 1 \le i \le n \]
with
\[ \mu_i \;=\; \sum_{j=1}^{p} \alpha_j \varphi_j(x_i) + \varepsilon_i , \quad 1 \le i \le n , \qquad (2) \]
\[ E(\varepsilon_i) = 0 \qquad E(\varepsilon_i \varepsilon_k) = \delta_{ik} \sigma_k^2 \qquad 1 \le i, k \le n . \]
The fitting model writes
\[ R(x) \;=\; \sum_{j=1}^{p} a_j \varphi_j(x) , \qquad x \in \mathbb{R} , \]
\[ L(\alpha_1, \ldots, \alpha_p) \;=\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \Big[ y_i - \sum_{j=1}^{p} \alpha_j \varphi_j(x_i) \Big]^2 \qquad (3) \]
The minimum condition leads to the normal equations
\[ F a \;=\; \Phi\, \Sigma^{-1} y , \qquad (4) \]
where
\[ F_{jk} \;=\; \sum_{i=1}^{n} \frac{\varphi_j(x_i)\, \varphi_k(x_i)}{\sigma_i^2} \qquad 1 \le j, k \le p \qquad (5) \]
and
\[ \Phi_{ki} \;=\; \frac{\varphi_k(x_i)}{\sigma_i} \qquad 1 \le k \le p , \; 1 \le i \le n \qquad (6) \]
while
\[ a = \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix} \qquad
\Sigma^{-1} = \begin{pmatrix} 1/\sigma_1 & & O \\ & \ddots & \\ O & & 1/\sigma_n \end{pmatrix} \qquad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \]
so that the best-fit estimates are
\[ a \;=\; F^{-1} \Phi\, \Sigma^{-1} y . \qquad (7) \]
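The normal equations above can be sketched in a few lines of Python; a small Gaussian-elimination solver stands in for F⁻¹, and the basis functions, data and σi below are illustrative:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def chi_square_fit(phis, xs, ys, sigmas):
    """Weighted least-squares estimates a = F^{-1} Phi Sigma^{-1} y."""
    p = len(phis)
    F = [[sum(phis[j](x) * phis[k](x) / s**2 for x, s in zip(xs, sigmas))
          for k in range(p)] for j in range(p)]
    rhs = [sum(phis[j](x) * yv / s**2 for x, yv, s in zip(xs, ys, sigmas))
           for j in range(p)]
    return solve(F, rhs)

# Illustrative data on the exact line y = 1 + 2x: the fit recovers (1, 2)
a = chi_square_fit([lambda x: 1.0, lambda x: x],
                   [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0],
                   [1.0, 1.0, 1.0, 1.0])
print([round(v, 6) for v in a])  # [1.0, 2.0]
```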
Remark
a1 , . . . , ap independent ⟺ F diagonal
NSSAR is a X² RV with n − p d.o.f.
7.2.3 Goodness of fit
Too large values of NSSAR suggest that the model is presumably incorrect and should be rejected
\[ Q \;=\; \int_{NSSAR}^{+\infty} p_{n-p}(X^2)\, dX^2 \;=\; \frac{1}{\Gamma[(n-p)/2]\; 2^{(n-p)/2}} \int_{NSSAR}^{+\infty} e^{-X^2/2}\, (X^2)^{\frac{n-p}{2} - 1}\, dX^2 \]
If, however, the variances σi² are unknown but can be assumed all equal to a common value σ², we can:
pose σ = 1;
estimate the common variance a posteriori as
\[ \hat{\sigma}^2 \;=\; \frac{NSSAR|_{\sigma^2 = 1}}{n-p} \;=\; \frac{1}{n-p} \sum_{i=1}^{n} \Big[ y_i - \sum_{j=1}^{p} \varphi_j(x_i)\, a_j \Big]^2 \]
since
\[ E\big[ NSSAR|_{\sigma^2 = 1} \big] \;=\; \sigma^2\, E[NSSAR] \;=\; \sigma^2 (n-p) \]
\[ \mu_i \;=\; \alpha_1 + \sum_{j=2}^{p} \alpha_j \varphi_j(x_i) + \varepsilon_i \qquad i = 1, \ldots, n \]
\[ R(x) \;=\; a_1 + \sum_{j=2}^{p} a_j \varphi_j(x) \]
\[ NSSAR \;=\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \Big[ y_i - a_1 - \sum_{j=2}^{p} a_j \varphi_j(x_i) \Big]^2 \;=\; X^2_{n-p} \]
The total variation of the data splits as
\[ \sum_{i=1}^{n} \frac{1}{\sigma_i^2} (y_i - \bar{y})^2 \;=\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \big[ R(x_i) - \bar{y} \big]^2 \;+\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \big[ y_i - R(x_i) \big]^2 \]
where
\[ \bar{y} \;=\; \Big( \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \Big)^{-1} \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\, y_i \qquad \text{weighted mean of the } y_i\text{'s} \]
\[ \sum_{i=1}^{n} \frac{1}{\sigma_i^2} (y_i - \bar{y})^2 \qquad \text{total variation of data} \]
\[ \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \big[ R(x_i) - \bar{y} \big]^2 \qquad \text{variation explained by the regression} \]
\[ \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \big[ y_i - R(x_i) \big]^2 \qquad \text{residual variation (or NSSAR)} \]
total variation = variation explained by the regression + residual variation
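For an ordinary least-squares straight line with intercept (all σi = 1) the decomposition holds exactly and can be verified numerically (the data below are illustrative):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(x)
xm, ym = sum(x) / n, sum(y) / n

# Ordinary least-squares straight line through the data
q = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / \
    sum((xi - xm) ** 2 for xi in x)
m = ym - q * xm
R = [m + q * xi for xi in x]

total = sum((yi - ym) ** 2 for yi in y)
explained = sum((Ri - ym) ** 2 for Ri in R)
residual = sum((yi - Ri) ** 2 for yi, Ri in zip(y, R))
print(abs(total - explained - residual) < 1e-9)  # True
```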
\[ X^2_{n-1} \;=\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2} (y_i - \bar{y})^2 \]
\[ X^2_{p-1} \;=\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \big[ R(x_i) - \bar{y} \big]^2 \]
that is, the hypothesis
H0 : y independent of x
is rejected whenever
\[ F \;>\; F_{[1-\alpha]}(p-1, n-p) \]
\[ NSS_{\mathrm{m.d.}} \;=\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2 / n_i} \Big[ \bar{y}_i - \sum_{j=1}^{p} a_j \varphi_j(x_i) \Big]^2 \]
where
\[ \bar{y}_i \;=\; \frac{1}{n_i} \sum_{k=1}^{n_i} y_{ik} \]
is obtained by minimizing the merit function
\[ L(\alpha_1, \ldots, \alpha_p) \;=\; \sum_{i=1}^{n} \frac{1}{\sigma_i^2 / n_i} \Big[ \bar{y}_i - \sum_{j=1}^{p} \alpha_j \varphi_j(x_i) \Big]^2 \]
More precisely, the model is rejected whenever
\[ \frac{N-n}{n-p}\; \frac{NSS_{\mathrm{m.d.}}}{NSS_{\mathrm{m.i.}}} \;>\; F_{[1-\alpha]}(n-p, N-n) \]
where
\[ SS_{\mathrm{m.d.}} \;=\; \sum_{i=1}^{n} n_i \Big[ \bar{y}_i - \sum_{j=1}^{p} a_j \varphi_j(x_i) \Big]^2 \]
An estimate of σ² is given by
\[ \hat{\sigma}^2 \;=\; \frac{SS_{\mathrm{m.i.}}}{N-n} \;=\; \frac{1}{N-n} \sum_{i=1}^{n} \sum_{k=1}^{n_i} (y_{ik} - \bar{y}_i)^2 \]
\[ F_A \;=\; \frac{NSSAR_p - NSSAR_{p+1}}{NSSAR_{p+1} / (n-p-1)} \]
with 1, n − p − 1 d.o.f.
Therefore:
small F_A ⟹ the p-parameter model is more satisfactory (H0)
large F_A ⟹ the (p + 1)-parameter model is more adequate (H1)
As a conclusion:
\[ \mu_i \;=\; \sum_{j=1}^{p} \alpha_j \varphi_j(x_i) + \varepsilon_i \qquad i = 1, \ldots, n \]
\[ \mu_0 \;=\; \sum_{j=1}^{p} \alpha_j \varphi_j(x_0) + \varepsilon_0 \]
The predicted value is
\[ y_0 \;=\; \sum_{j=1}^{p} a_j \varphi_j(x_0) + \varepsilon_0 \]
and variance
\[ \mathrm{var}(y_0) \;=\; \sum_{j,k=1}^{p} \varphi_j(x_0)\, \varphi_k(x_0)\, \mathrm{cov}(a_j, a_k) + \sigma_0^2 \;=\; \sum_{j,k=1}^{p} \varphi_j(x_0)\, \varphi_k(x_0)\, (F^{-1})_{jk} + \sigma_0^2 , \]
\[ C \;=\; \begin{pmatrix} \Big( \displaystyle\sum_{i=1}^{n} \frac{1}{\sigma_i^2} \Big)^{-1} & 0 \\[2ex] 0 & \Big( \displaystyle\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma_i^2} \Big)^{-1} \end{pmatrix} \]
and therefore a correlation coefficient equal to zero
The random variables m and q are independent!
(while a and b are not, in the previous case)
\[ \mu_i \;=\; \alpha + \beta x_i + \varepsilon_i \qquad i = 1, \ldots, n \]
We have again
Remark
The probability that the intercept and the slope belong simultaneously to the respective confidence intervals is not (1 − α)² = (1 − α)(1 − α)
The variables a and b are not independent!
\[ \mu_i \;=\; \alpha + \beta (x_i - \bar{x}) + \varepsilon_i \qquad i = 1, \ldots, n \]
\[ \beta \;=\; q \,\pm\, \Big[ \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma_i^2} \Big]^{-1/2} \sqrt{\frac{NSSAR}{n-2}}\; t_{[1-\frac{\alpha}{2}]}(n-2) \]
Remark
The probability that the intercept and the slope belong simultaneously to the respective confidence intervals is (1 − α)² = (1 − α)(1 − α)
The variables m and q are independent!
\[ \mu_i \;=\; \alpha + \beta (x_i - \bar{x}) + \varepsilon_i \]
\[ \mu_0 \;=\; \alpha + \beta (x_0 - \bar{x}) + \varepsilon_0 \]
\[ y_0 \;=\; m + q (x_0 - \bar{x}) + \varepsilon_0 \]
assuming σi = σ, i = 1, . . . , n, and σ0 = σ,
with
\[ \bar{x} \;=\; \frac{1}{n} \sum_{i=1}^{n} x_i \qquad SSAR \;=\; \sum_{i=1}^{n} \big[ m + q(x_i - \bar{x}) - y_i \big]^2 \]
\[ V \;=\; \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\displaystyle\sum_{i=1}^{n} (x_i - \bar{x})^2} + 1 \]
Remark
The coefficient V decreases when the spread of the xi's around their mean value x̄ grows
Remark
By varying x0 ∈ R, the confidence interval describes a region of the (x, y) plane of the form
\[ \big\{ (x, y) \in \mathbb{R}^2 \,:\, |m + q(x - \bar{x}) - y| \;\le\; c\, \sqrt{1 + g (x - \bar{x})^2} \big\} \]
Remark
In the case of the linear regression model
\[ \mu_i \;=\; \alpha + \beta x_i + \varepsilon_i \qquad i = 1, \ldots, n \]
\[ y_0 \;=\; a + b x_0 + \varepsilon_0 \]
with
\[ SSAR \;=\; \sum_{i=1}^{n} \big( a + b x_i - y_i \big)^2 \]
Solution
(i) Since the standard deviations are equal, the chi-square fitting reduces to the usual least-squares fitting and the best-fit estimates of the parameters m and q can be written as:
\[ m \;=\; \frac{1}{n} \sum_{i=1}^{n} y_i \;=\; 1.8833 \qquad q \;=\; \frac{\displaystyle\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\displaystyle\sum_{i=1}^{n} (x_i - \bar{x})^2} \;=\; 0.87046 \]
\[ SSAR \;=\; \sum_{i=1}^{n} \big[ m + q(x_i - \bar{x}) - y_i \big]^2 \;=\; 0.22704376 \]
with:
m = 1.8833, q = 0.87046, SSAR = 0.22704376
\[ \sum_{i=1}^{24} (x_i - \bar{x})^2 \;=\; 17.060000 \]
t[0.975](22) = 2.074
As a conclusion:
We have therefore:
\[ Q \;=\; \int_{NSSAR}^{+\infty} p_{n-p}(X^2)\, dX^2 \]
with
\[ NSSAR \;=\; \sum_{i=1}^{n} \frac{1}{\sigma^2} \big[ m + q(x_i - \bar{x}) - y_i \big]^2 \;=\; \frac{SSAR}{\sigma^2} \;=\; \frac{0.22704376}{0.08^2} \;=\; 35.47558 \]
A linear interpolation between the tabulated values of X²(22),
33.924 ↔ 0.05 and 36.781 ↔ 0.025,
gives
\[ \frac{0.05 - Q}{0.05 - 0.025} \;=\; \frac{35.476 - 33.924}{36.781 - 33.924} \qquad \Longrightarrow \qquad Q \;\simeq\; 0.036 \]
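The interpolated P-value can be cross-checked by integrating the chi-square density numerically (a sketch using only the standard library; Simpson's rule, the cutoff and the step count are implementation choices):

```python
from math import exp, gamma

def chi2_sf(x, k, upper=400.0, steps=40000):
    """P(X^2 > x) for k d.o.f., by Simpson integration of the density."""
    norm = 1.0 / (2 ** (k / 2) * gamma(k / 2))
    pdf = lambda t: norm * t ** (k / 2 - 1) * exp(-t / 2)
    h = (upper - x) / steps
    s = pdf(x) + pdf(upper)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(x + i * h)
    return s * h / 3.0

q = chi2_sf(35.47558, 22)
print(round(q, 4))
```

As a sanity check, chi2_sf(33.924, 22) ≈ 0.05, consistent with the tabulated value used in the interpolation.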
\[ \mu_i \;=\; \sum_{j=1}^{p} \alpha_j \varphi_j(x_i) + \varepsilon_i \qquad i = 1, \ldots, n \]
It is known from van der Waals theory that the molar enthalpy of a real gas can be approximately expressed by a relationship of the form
\[ h(T, P) \;=\; \alpha_1 T + \alpha_2 \frac{P}{T} + \alpha_3 P \]
where T denotes the absolute temperature, P stands for the pressure and α1, α2, α3 are appropriate constants related to the molar specific heat at constant pressure cP, the gas constant R and the van der Waals parameters a, b:
\[ \alpha_1 = c_P \qquad \alpha_2 = -\frac{2a}{R} \qquad \alpha_3 = b . \]
Solution
(i) The linear regression model involves two independent variables, T and P. Thus we are about to deal with a multiple linear regression problem. By posing x = (T, P), the base functions to be considered are
\[ \varphi_1(x) = T \qquad \varphi_2(x) = \frac{P}{T} \qquad \varphi_3(x) = P \]
and the sample values of the independent variables can be denoted with
\[ x_i = (T_i, P_i) \qquad i = 1, \ldots, 20 \]
For a homoscedastic system the common variance σ² can be neglected and the chi-square method simply reduces to a least-squares fitting. The representation matrix of the normal equations takes the form
\[ F \;=\; \begin{pmatrix} F_{11} & F_{12} & F_{13} \\ F_{12} & F_{22} & F_{23} \\ F_{13} & F_{23} & F_{33} \end{pmatrix} \]
with entries
\[ F_{11} = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_1(x_i)\varphi_1(x_i) = \frac{1}{\sigma^2} \sum_{i=1}^{20} T_i^2 = 1.9 \times 10^6\, \frac{1}{\sigma^2} \]
\[ F_{12} = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_1(x_i)\varphi_2(x_i) = \frac{1}{\sigma^2} \sum_{i=1}^{20} P_i = 2.5 \times 10^7\, \frac{1}{\sigma^2} \]
\[ F_{13} = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_1(x_i)\varphi_3(x_i) = \frac{1}{\sigma^2} \sum_{i=1}^{20} T_i P_i = 7.5 \times 10^9\, \frac{1}{\sigma^2} \]
\[ F_{22} = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_2(x_i)\varphi_2(x_i) = \frac{1}{\sigma^2} \sum_{i=1}^{20} \frac{P_i^2}{T_i^2} = 4.98933 \times 10^8\, \frac{1}{\sigma^2} \]
\[ F_{23} = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_2(x_i)\varphi_3(x_i) = \frac{1}{\sigma^2} \sum_{i=1}^{20} \frac{P_i^2}{T_i} = 1.32679 \times 10^{11}\, \frac{1}{\sigma^2} \]
\[ F_{33} = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_3(x_i)\varphi_3(x_i) = \frac{1}{\sigma^2} \sum_{i=1}^{20} P_i^2 = 3.750 \times 10^{13}\, \frac{1}{\sigma^2} \]
while the constant vector has components
\[ c_1 = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_1(x_i)\, h_i = \frac{1}{\sigma^2} \sum_{i=1}^{20} T_i h_i = \frac{1}{\sigma^2}\; 4.70647 \times 10^7 \]
\[ c_2 = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_2(x_i)\, h_i = \frac{1}{\sigma^2} \sum_{i=1}^{20} \frac{P_i}{T_i} h_i = \frac{1}{\sigma^2}\; 5.63486 \times 10^8 \]
\[ c_3 = \frac{1}{\sigma^2} \sum_{i=1}^{20} \varphi_3(x_i)\, h_i = \frac{1}{\sigma^2} \sum_{i=1}^{20} P_i h_i = \frac{1}{\sigma^2}\; 1.74663 \times 10^{11} \]
The estimates of the regression parameters hold then
\[ \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \;=\; F^{-1} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} \;=\; \begin{pmatrix} 29.04383301 \\ -0.335066751 \\ 3.44115 \times 10^{-5} \end{pmatrix} \]
and the desired model writes therefore
\[ h(T, P) \;=\; 29.04383301\, T \,-\, 0.335066751\, \frac{P}{T} \,+\, 3.44115 \times 10^{-5}\, P \]
A graphical representation of the result follows
by posing:
α = 0.05, n − p = 20 − 3 = 17
and inserting the numerical estimates, and consequently
\[ \alpha_1 = 29.04383301 \pm 0.0763195 \]
\[ \alpha_2 = -0.335066751 \pm 0.00888647 \]
\[ \alpha_3 = (3.44115 \pm 4.12350) \times 10^{-5} \]
\[ \mathrm{var}(h_0) \;=\; \sum_{j,k=1}^{3} \varphi_j(x_0)\, \varphi_k(x_0)\, (F^{-1})_{jk} + \sigma^2 \]
so that the CI half-width is
\[ t_{[1-\frac{\alpha}{2}]}(n-p)\, \sqrt{\mathrm{var}(h_0)} \;=\; t_{[1-\frac{\alpha}{2}]}(n-p) \left[ \sum_{j,k=1}^{3} \varphi_j(x_0)\, \varphi_k(x_0)\, \sigma^2 (F^{-1})_{jk} + 1 \right]^{1/2} \sqrt{\frac{NSSAR}{n-p}} , \qquad \hat{\sigma}^2 = \frac{NSSAR}{n-p} \]
\[ = \Big[\, 1.86425914 \times 10^{-3} + 2.43942982\, T_0^2 + 0.253954938\, P_0 - 1.87428869 \times 10^{-3}\, T_0 P_0 \]
\[ \qquad\quad +\; 3.30732012 \times 10^{-2}\, \frac{P_0^2}{T_0^2} - 2.84823259 \times 10^{-4}\, \frac{P_0^2}{T_0} + 7.12115332 \times 10^{-7}\, P_0^2 \,\Big]^{1/2} \]
The domain between the lower and the upper surface constitutes the confidence region of the regression model, at the assigned confidence level 1 − α = 95%. Notice that the two surfaces tend to diverge at points (T, P) far from the sampled domain, which means that the CI width increases: we find again the same phenomenon already described for the regression straight line (for better clarity, in the picture the CI half-width has been enlarged by a factor of 5).
or, equivalently, to
\[ Q \;=\; \int_{NSSAR}^{+\infty} p_{n-p}(X^2)\, dX^2 \]
The SVD of A
\[ A \;=\; V \begin{pmatrix} \Sigma & O \\ O & O \end{pmatrix} U^T \]
allows us to define the so-called pseudo-inverse of A (Moore-Penrose pseudo-inverse)
\[ A^+ \;=\; U \begin{pmatrix} \Sigma^{-1} & O \\ O & O \end{pmatrix} V^T \]
If A is invertible:
\[ A^+ = A^{-1} \]
If not, A⁺ is however defined
Application
Given the linear set
Ax = b ,
the vector
x+ = A+ b
provides the solution in the sense of the least-squares
\[ F^{-1} \;=\; V\, (\Sigma^2)^{-1}\, V^T \]
\[ S := \begin{pmatrix} 9.508032001 & 0 & 0 \\ 0 & 0.7728696357 & 0 \end{pmatrix} \]
\[ \begin{pmatrix} 0.9999999999 & 2.000000000 & 3.000000000 \\ 3.999999999 & 5.000000000 & 6.000000000 \end{pmatrix} \]
steepest descent
Levenberg-Marquardt method
Robust fitting replaces the quadratic merit function with
\[ \sum_{i=1}^{n} \rho\big( y_i, y(x_i, \alpha) \big) \]
or, for scaled residuals,
\[ \min_{\alpha \in \mathbb{R}^p} \sum_{i=1}^{n} \rho\!\left( \frac{y_i - y(x_i, \alpha)}{\sigma_i} \right) \]
with
\[ \psi(z) \;=\; \frac{d\rho(z)}{dz} \qquad z \in \mathbb{R} \]
For the usual least-squares fitting
\[ \rho(z) = \frac{z^2}{2} \qquad \text{and} \qquad \psi(z) = z \]
\[ \{ h_1, h_2, \ldots, h_k \} \qquad h_j^T h_q = \delta_{jq} \quad j, q = 1, \ldots, k \]
\[ H H^T x_i \quad i = 1, \ldots, n \qquad \text{or, as row vectors,} \qquad x_i^T H H^T \quad i = 1, \ldots, n \]
\[ X H H^T \]
so that the rows of XHHᵀ are the orthogonal projections of the xi's on Sk
\[ \varepsilon_i \;=\; x_i - H H^T x_i \;=\; (I - H H^T)\, x_i \qquad i = 1, \ldots, n \]
\[ SS \;=\; \sum_{i=1}^{n} |\varepsilon_i|^2 \;=\; \mathrm{tr}\big[ X (I - H H^T) X^T \big] \;=\; \mathrm{tr}\big[ X X^T \big] - \mathrm{tr}\big[ X H H^T X^T \big] \]
Least-squares:
we require that the matrix H, and therefore Sk, is chosen in such a way that the residual sum of squares is minimum
SS minimum ⟺ tr[HᵀXᵀXH] maximum
Result
Let us consider an orthonormal basis of eigenvectors h1, h2, ... , hp of XᵀX, with
\[ X^T X h_i = \lambda_i h_i \quad i = 1, \ldots, p \qquad \lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p \ge 0 \]
Then we can pose
\[ H = \begin{pmatrix} h_1 & h_2 & \ldots & h_k \end{pmatrix} \]
and
\[ \mathrm{tr}\big[ X (I - H H^T) X^T \big] \;=\; \sum_{i=k+1}^{p} \lambda_i \]
\[ X H H^T \;=\; \sum_{j=1}^{k} (X h_j)\, h_j^T \]
\[ \sum_{i=1}^{n} \sum_{j=1}^{p} x_{ij}^2 \;=\; \mathrm{tr}(X^T X) \;=\; \sum_{j=1}^{p} \lambda_j \]
\[ \mathrm{tr}\big[ X H H^T X^T \big] \;=\; \mathrm{tr}\big[ H^T X^T X H \big] \;=\; \sum_{j=1}^{k} \lambda_j \]
\[ SS \;=\; \mathrm{tr}\big[ X (I - H H^T) X^T \big] \;=\; \sum_{j=k+1}^{p} \lambda_j \]
\[ x \;=\; \sum_{j=1}^{k} \beta_j h_j \]
with β1, . . . , βk ∈ R
x1 = f1(xp−k+1, . . . , xp)
x2 = f2(xp−k+1, . . . , xp)
... ...
xp−k = fp−k(xp−k+1, . . . , xp)
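For p = 2 the eigendecomposition of XᵀX can be written in closed form, which makes the PCA result SS = λ2 (for k = 1) easy to verify numerically. A sketch with illustrative centered data:

```python
from math import sqrt

# Centered data matrix X (n x 2): rows are observations
X = [[2.0, 1.0], [-1.0, 0.0], [0.5, -1.5], [3.0, 2.0]]

# S = X^T X (symmetric 2 x 2)
s11 = sum(r[0] ** 2 for r in X)
s12 = sum(r[0] * r[1] for r in X)
s22 = sum(r[1] ** 2 for r in X)

# Eigenvalues via the quadratic formula, lam1 >= lam2
d = sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
lam1 = (s11 + s22) / 2 + d
lam2 = (s11 + s22) / 2 - d

# Unit eigenvector h1 associated with lam1 (valid for s12 != 0)
v = (s12, lam1 - s11)
nv = sqrt(v[0] ** 2 + v[1] ** 2)
h1 = (v[0] / nv, v[1] / nv)

# Residual sum of squares after projecting onto h1 equals lam2
ss = sum((r[0] - (r[0]*h1[0] + r[1]*h1[1]) * h1[0]) ** 2 +
         (r[1] - (r[0]*h1[0] + r[1]*h1[1]) * h1[1]) ** 2 for r in X)
print(abs(ss - lam2) < 1e-9)  # True
```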
with:
\[ r = \mathrm{Rank}(X) = \mathrm{Rank}(X^T X) \le \min(n, p) \]
\[ U = \begin{pmatrix} u_1 & u_2 & \ldots & u_p \end{pmatrix} \]
u1, u2, . . . , up orthonormal basis of eigenvectors of XᵀX
\[ X^T X u_i = \sigma_i^2 u_i \quad i = 1, \ldots, p \]
according to the eigenvalues of XᵀX arranged in decreasing order
\[ \sigma_1^2 \ge \sigma_2^2 \ge \ldots \ge \sigma_p^2 \]
and
\[ V = \begin{pmatrix} v_1 & v_2 & \ldots & v_n \end{pmatrix} \]
v1, v2, . . . , vn orthonormal basis of Rⁿ such that
\[ v_i = \frac{1}{\sigma_i} X u_i \quad i = 1, \ldots, r \]
Therefore, if k ≤ r (the only interesting case) we have
\[ \sum_{j=1}^{k} \sigma_j v_j u_j^T \;=\; \sum_{j=1}^{k} \sigma_j \frac{1}{\sigma_j} (X u_j) u_j^T \;=\; \sum_{j=1}^{k} (X u_j) u_j^T \]
The algorithms for the PCA are therefore the same as those for the SVD!
The data are previously centered:
\[ x_i \;\longrightarrow\; x_i - \bar{x} \;=\; x_i - \frac{1}{n} \sum_{q=1}^{n} x_q \]
(•) Calculation of t[α](n)
Maple 11: with(stats): statevalf[icdf, studentst[n]](α);
Excel: TINV(2 − 2α; n)
(•) Calculation of X²[α](n)
Maple 11: with(stats): statevalf[icdf, chisquare[n]](α);
Excel: CHIINV(1 − α; n)
(•) Calculation of z[α]
Maple 11: with(stats): statevalf[icdf, normald[0, 1]](α);
Excel: NORMINV(α; 0; 1)
(•) Calculation of the P-value Q of a chi-square fit
Excel: CHIDIST(NSSAR; n − p)
List of topics
1. Experimental errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2. Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1 General definitions and properties . . . . . . . . . . . . . . . . . . 4
2.2 Examples of discrete RVs . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Examples of continuous RVs . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Remark. Relevance of normal variables . . . . . . . . . . . . . 16
2.5 Probability distributions in more dimensions . . . . . . . . 18
2.5.1 (Stochastically) independent RVs . . . . . . . . . . . . . . . . . . . 19
2.5.2 (Stochastically) dependent RVs . . . . . . . . . . . . . . . . . . . . . 19
2.5.3 Mean, covariance, correlation of a set of RVs . . . . . . . . 20
2.5.4 Multivariate normal (or Gaussian) RVs . . . . . . . . . . . . . 22
3. Indirect measurements. Functions of RVs . . . . . . . . . . . 25
3.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Linear combination of RVs . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Linear combination of multivariate normal RVs . . . . . 28
3.4 Quadratic forms of standard variables . . . . . . . . . . . . . . 30
3.4.1 Craig theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.2 Characterization theorem of chi-square RVs . . . . . . . . . 30
3.4.3 Fisher-Cochran theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4.4 Two-variable cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Error propagation in indirect measurements . . . . . . . . 34
3.5.1 Gauss law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5.2 Logarithmic differential . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.5.3 Error propagation in solving a set
of linear algebraic equations . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5.4 Probability distribution of a function of RVs
estimated by Monte-Carlo methods . . . . . . . . . . . . . . . . . 44
4. Sample theory. Sample estimates of and 2 . . . . . . . 48
4.1 Sample estimate of the mean . . . . . . . . . . . . . . . . . . . . . 49