1 Introduction To Statistical Inference

Industrial Statistics
1.1 Overview
The aim of statistical inference is to make decisions and draw conclusions about populations.
step 1: selection of a suitable distribution family

problem: the characteristic $X \sim F$, but $F$ is unknown. Frequently the user has prior information about $F$. It holds that $F \in \mathcal{F} = \{F_\vartheta : \vartheta \in \Theta\}$. $\Theta$ is called the parameter set.
Example:
$X$ = quality of produced bulbs (1 if intact, else 0). Here $X \sim B(1, p)$ with $p = P(X = 1)$, thus $\vartheta = p$, $\Theta = (0, 1)$, and $\mathcal{F} = \{B(1, p) : p \in (0, 1)\}$.
$X$ = body height. Various studies have shown that the body height is roughly normally distributed. Thus $X \sim N(\mu, \sigma^2)$, $\vartheta = (\mu, \sigma)$ and $\Theta = \mathbb{R} \times (0, \infty)$.
Note that if $X$ is a continuous variable then the true distribution is usually not a member of the selected distribution family. These distributions only provide an approximation.
step 2: drawing of a sample

In order to draw conclusions on $\vartheta$, data $x_1, \dots, x_n$ are collected. The set of all possible samples is called the sample space.
basic idea: $x_1, \dots, x_n$ are considered to be realizations of the random sample $X_1, \dots, X_n$ from $X$. $X_1, \dots, X_n$ have the same distribution as $X$ (identically distributed).
In many cases it is assumed that the variables $X_1, \dots, X_n$ are independent and identically distributed (briefly: i.i.d.).
major areas of statistical inference: parameter estimation, confidence intervals, and hypothesis testing
1.2 Confidence Intervals

Assume that a random sample $X_1, \dots, X_n$ is given with $X_i \sim F_\vartheta$, $\vartheta \in \Theta$.
aim: derivation of a region (interval) which contains the parameter $\vartheta$ with a given probability
Let $\alpha \in (0, 1)$. Suppose that $L$ and $U$ only depend on the sample variables $X_1, \dots, X_n$. If
$$P_\vartheta(L \le \vartheta \le U) \ge 1 - \alpha \quad \forall\, \vartheta \in \Theta \qquad (*)$$
then the interval $[L, U]$ is called a two-sided $100(1 - \alpha)\%$ confidence interval for $\vartheta$. $L$ is called the lower confidence limit, $U$ the upper confidence limit, and $1 - \alpha$ is the confidence coefficient.
If both sides in (*) are equal then the confidence interval is called exact.
In practice $\alpha$ is usually chosen equal to 0.1, 0.05 or 0.01.

interpretation: Because $U - L$ should be small it is desirable to have exact confidence intervals.

$[L, \infty)$ is called a one-sided lower $100(1 - \alpha)\%$ confidence interval for $\vartheta$ if
$$P_\vartheta(L \le \vartheta) \ge 1 - \alpha \quad \forall\, \vartheta \in \Theta$$
and $(-\infty, U]$ is called a one-sided upper $100(1 - \alpha)\%$ confidence interval for $\vartheta$ if
$$P_\vartheta(\vartheta \le U) \ge 1 - \alpha \quad \forall\, \vartheta \in \Theta.$$
Example:
risk behavior of a financial investment → upper c. i.
tear strength of a rope → lower c. i.
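1.2.1 Confidence Intervals for the Parameters of a Normal Distribution

Suppose that the sample variables $X_1, \dots, X_n$ are i.i.d. with $X_i \sim N(\mu, \sigma^2)$ for $i = 1, \dots, n$.
aim: confidence intervals for $\mu$ if $\sigma$ is known
development of the confidence interval: first estimate $\mu$ by $\bar X$
Since $\bar X \sim N(\mu, \sigma^2/n)$ the following structure is chosen for the confidence interval
$$\left[\bar X - c\,\frac{\sigma}{\sqrt n},\ \bar X + c\,\frac{\sigma}{\sqrt n}\right]$$
with $c > 0$. $c$ is chosen as a function of $\alpha$ such that (*) is valid. Note that
$$\mu \in \left[\bar X - c\,\frac{\sigma}{\sqrt n},\ \bar X + c\,\frac{\sigma}{\sqrt n}\right] \iff \sqrt n\,\frac{|\bar X - \mu|}{\sigma} \le c.$$
Since $\sqrt n(\bar X - \mu)/\sigma \sim N(0, 1)$ the quantity $c$ is determined such that
$$P\left(\sqrt n\,\frac{|\bar X - \mu|}{\sigma} \le c\right) = 2\Phi(c) - 1 \overset{!}{=} 1 - \alpha.$$
Consequently $c = \Phi^{-1}(1 - \alpha/2) = z_{\alpha/2}$. $z_{\alpha/2}$ is the upper $100\,\alpha/2$ percentage point of the standard normal distribution.
$100(1 - \alpha)\%$ confidence interval for $\mu$ ($\sigma$ known)
$$\left[\bar X - z_{\alpha/2}\,\frac{\sigma}{\sqrt n},\ \bar X + z_{\alpha/2}\,\frac{\sigma}{\sqrt n}\right]$$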
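now: confidence interval for $\mu$ if $\sigma$ is unknown
$100(1 - \alpha)\%$ confidence interval for $\mu$ ($\sigma$ unknown)
$$\left[\bar X - t_{n-1;\alpha/2}\,\frac{S}{\sqrt n},\ \bar X + t_{n-1;\alpha/2}\,\frac{S}{\sqrt n}\right]$$
with $t_{n-1;\alpha/2} = t_{n-1}^{-1}(1 - \alpha/2)$ (upper $100\,\alpha/2\%$ percentage point of the t distribution with $n - 1$ degrees of freedom).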
Example: mean annual rainfall (in millimeters) in Australia from 1983 to 2002:
1983 1984 1985 1986 1987 1988 1989 1990 1991 1992
499.2 555.2 398.8 391.9 453.4 459.8 483.7 417.6 469.2 452.4
1993 1994 1995 1996 1997 1998 1999 2000 2001 2002
499.3 340.6 522.8 469.9 527.2 565.5 584.1 727.3 558.6 338.6
It holds that $n = 20$, $\bar x = 485.755$, $s = 90.33872$, and $t_{19;0.025} = 2.093$. Thus the confidence interval is given by
[485.755 - 42.27934, 485.755 + 42.27934] = [443.4757, 528.0343].
program:
year <- c(1983:2002);
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
          499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6);
# (or rain <- read.table("rain.txt");
#  use setwd() (set working directory) and getwd() (get working directory))
#--- histogram ---#
hist(rain, breaks = 8, freq = FALSE, main = "Histogram",
     xlab = "Mean Annual Rainfall", ylab = "");
#--- box plot ---#
boxplot(rain, range = 0, ylab = "Mean Annual Rainfall");
#--- normal qq plot ---#
qqnorm(rain, datax = TRUE, main = "Normal QQ Plot");
qqline(rain, datax = TRUE);
output:
[Figure: histogram of the mean annual rainfall (density scale), box plot of the mean annual rainfall, and normal QQ plot (sample quantiles on the horizontal axis, theoretical quantiles on the vertical axis)]
program:
#--- confidence interval ---#
alpha <- 0.05;
n <- length(rain);
lcl <- mean(rain) - qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n);
ucl <- mean(rain) + qt(1 - alpha / 2, n - 1) * sd(rain) / sqrt(n);
ci <- c(lcl, ucl);
print(ci);
output:
result: [443.4752, 528.0348]
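now: confidence interval for $\sigma^2$ if $\mu$ is unknown
$100(1 - \alpha)\%$ confidence interval for $\sigma^2$
$$\left[\frac{(n-1)\,S^2}{\chi^2_{n-1;\alpha/2}},\ \frac{(n-1)\,S^2}{\chi^2_{n-1;1-\alpha/2}}\right]$$
Here $\chi^2_{n-1;\alpha}$ denotes the upper $100\,\alpha$ percentage point of the $\chi^2$ distribution with $n - 1$ degrees of freedom.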
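The interval for the variance can be evaluated in the same way as the interval for the mean. The following is a minimal sketch for the rainfall data, assuming normality; the variable names (s2, ci.var) are chosen here for illustration and are not part of the original programs.

```r
# Rainfall data from the example above
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
          499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6)

alpha <- 0.05
n <- length(rain)
s2 <- var(rain)  # sample variance S^2 with divisor n - 1

# Upper percentage points: chi^2_{n-1; alpha/2} = qchisq(1 - alpha/2, n - 1)
lcl <- (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1)
ucl <- (n - 1) * s2 / qchisq(alpha / 2, df = n - 1)

ci.var <- c(lcl, ucl)
print(ci.var)        # confidence interval for sigma^2
print(sqrt(ci.var))  # corresponding interval for sigma
```

Taking square roots of both limits gives an interval for the standard deviation, because the square root is monotone.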
1.2.2 Large-Sample Confidence Intervals

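Using the central limit theorem, large-sample confidence intervals for arbitrary distributions (discrete or continuous) can be derived. This means that
$$\lim_{n \to \infty} P_\vartheta\big(L(X_1, \dots, X_n) \le \vartheta \le U(X_1, \dots, X_n)\big) \ge 1 - \alpha \quad \forall\, \vartheta \in \Theta.$$
Example: Suppose that $X_1, X_2, \dots$ are i.i.d. with $E(X_i) = \mu$ for all $i \ge 1$.
i) confidence interval for $\mu$ if $\sigma^2 = Var(X_i)$ is known
large-sample confidence interval for $\mu$ ($\sigma$ known)
$$\left[\bar X - z_{\alpha/2}\,\frac{\sigma}{\sqrt n},\ \bar X + z_{\alpha/2}\,\frac{\sigma}{\sqrt n}\right]$$
rule of thumb: $n \ge 30$
ii) confidence interval for $\mu$ if $\sigma^2 = Var(X_i)$ is unknown
large-sample confidence interval for $\mu$ ($\sigma$ unknown)
$$\left[\bar X - z_{\alpha/2}\,\frac{S}{\sqrt n},\ \bar X + z_{\alpha/2}\,\frac{S}{\sqrt n}\right]$$
rule of thumb: $n \ge 40$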
example: mercury contamination in largemouth bass (in ppm) - a sample of fish was selected
from 53 Florida lakes
1.230 0.490 0.490 1.080 0.590 0.280 0.180 0.100 0.940

1.330 0.190 1.160 0.980 0.340 0.340 0.190 0.210 0.400
0.040 0.830 0.050 0.630 0.340 0.750 0.040 0.860 0.430
0.044 0.810 0.150 0.560 0.840 0.870 0.490 0.520 0.250
1.200 0.710 0.190 0.410 0.500 0.560 1.100 0.650 0.270
0.270 0.500 0.770 0.730 0.340 0.170 0.160 0.270
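It holds that $n = 53$, $\bar x = 0.5250$, $s = 0.3486$, and $z_{0.025} = 1.96$. Thus the asymptotic confidence interval is equal to [0.4311, 0.6188].
program:
MerCon <- c(1.230, 0.490, 0.490, 1.080, 0.590, 0.280, 0.180, 0.100, 0.940,
            1.330, 0.190, 1.160, 0.980, 0.340, 0.340, 0.190, 0.210, 0.400,
            0.040, 0.830, 0.050, 0.630, 0.340, 0.750, 0.040, 0.860, 0.430,
            0.044, 0.810, 0.150, 0.560, 0.840, 0.870, 0.490, 0.520, 0.250,
            1.200, 0.710, 0.190, 0.410, 0.500, 0.560, 1.100, 0.650, 0.270,
            0.270, 0.500, 0.770, 0.730, 0.340, 0.170, 0.160, 0.270);
#--- histogram ---#
hist(MerCon, breaks = 14, freq = FALSE, main = "Histogram",
     xlab = "Concentration", ylab = "");
#--- normal qq plot ---#
qqnorm(MerCon, datax = TRUE, main = "Normal QQ Plot");
qqline(MerCon, datax = TRUE);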
output:
[Figure: histogram of the mercury concentration (density scale) and normal QQ plot (sample quantiles on the horizontal axis, theoretical quantiles on the vertical axis)]
program:
#--- confidence interval ---#
alpha <- 0.05;
n <- length(MerCon);
lcl <- mean(MerCon) - qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n);
ucl <- mean(MerCon) + qnorm(1 - alpha / 2) * sd(MerCon) / sqrt(n);
ci <- c(lcl, ucl);
print(ci);
output:
result: ci = [0.4311, 0.6188]
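Overview: Confidence Intervals

distribution | parameter $\vartheta$ | estimator $\hat\vartheta$ | confidence interval
$n$ large | $\mu$ | $\bar X$ | $\bar X - \frac{\sigma}{\sqrt n}\, z_{\alpha/2} \le \mu \le \bar X + \frac{\sigma}{\sqrt n}\, z_{\alpha/2}$ (or $S$ instead of $\sigma$ if $\sigma$ is unknown)
$B(1, p)$, $n$ very large | $p$ | $\bar X$ | $\bar X - \sqrt{\frac{\bar X (1 - \bar X)}{n}}\, z_{\alpha/2} \le p \le \bar X + \sqrt{\frac{\bar X (1 - \bar X)}{n}}\, z_{\alpha/2}$
$B(1, p)$, $n$ large | $p$ | $\bar X$ | $\frac{n}{n + z_{\alpha/2}^2}\left[\bar X + \frac{1}{2n} z_{\alpha/2}^2 - B\right] \le p \le \frac{n}{n + z_{\alpha/2}^2}\left[\bar X + \frac{1}{2n} z_{\alpha/2}^2 + B\right]$ with $B = z_{\alpha/2} \sqrt{\frac{\bar X (1 - \bar X)}{n} + \left(\frac{1}{2n} z_{\alpha/2}\right)^2}$
$N(\mu, \sigma^2)$, $\sigma^2$ known | $\mu$ | $\bar X$ | $\bar X - \frac{\sigma}{\sqrt n}\, z_{\alpha/2} \le \mu \le \bar X + \frac{\sigma}{\sqrt n}\, z_{\alpha/2}$
$N(\mu, \sigma^2)$, $\sigma^2$ unknown | $\mu$ | $\bar X$ | $\bar X - \frac{S}{\sqrt n}\, t_{n-1;\alpha/2} \le \mu \le \bar X + \frac{S}{\sqrt n}\, t_{n-1;\alpha/2}$
$N(\mu, \sigma^2)$, $\mu$ known | $\sigma^2$ | $\tilde S^2$ | $\frac{n \tilde S^2}{\chi^2_{n;\alpha/2}} \le \sigma^2 \le \frac{n \tilde S^2}{\chi^2_{n;1-\alpha/2}}$
$N(\mu, \sigma^2)$, $\mu$ unknown | $\sigma^2$ | $S^2$ | $\frac{(n-1) S^2}{\chi^2_{n-1;\alpha/2}} \le \sigma^2 \le \frac{(n-1) S^2}{\chi^2_{n-1;1-\alpha/2}}$
2-dim. normal distr. | $\rho$ | $\hat\rho$ | $\hat\rho - \frac{1 - \hat\rho^2}{\sqrt n}\, z_{\alpha/2} \le \rho \le \hat\rho + \frac{1 - \hat\rho^2}{\sqrt n}\, z_{\alpha/2}$

with $\tilde S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2$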
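The interval for $p$ in the row "$B(1, p)$, $n$ large" (the Wilson score interval) is easy to mistype, so a small sketch may help. The function name wilson.ci and the counts (8 successes in 50 trials) are hypothetical and only serve as an illustration; the comparison with prop.test uses the fact that prop.test without continuity correction reports the same interval.

```r
# Wilson score interval for p, following the overview table:
# n / (n + z^2) * [ p.hat + z^2 / (2n) -/+ B ]
wilson.ci <- function(x, n, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  p.hat <- x / n                                  # estimator: sample mean of 0/1 data
  B <- z * sqrt(p.hat * (1 - p.hat) / n + (z / (2 * n))^2)
  centre <- p.hat + z^2 / (2 * n)
  n / (n + z^2) * c(centre - B, centre + B)
}

# Hypothetical data: 8 successes in 50 trials
x <- 8
n <- 50
ci <- wilson.ci(x, n)
print(ci)

# Cross-check against the built-in score interval
print(prop.test(x, n, correct = FALSE)$conf.int)
```

Unlike the simpler interval for very large $n$, these limits always stay inside $[0, 1]$.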
1.3 Hypothesis Testing

1.3.1 Introduction
Let $X$ be the characteristic of interest with $X \sim F_\vartheta$, $\vartheta \in \Theta$. The test problem is given by $H_0: \vartheta \in \Theta_0$ against $H_1: \vartheta \in \Theta_1$. $H_0$ is called the null hypothesis and $H_1$ is called the alternative hypothesis. $\Theta_0$ and $\Theta_1$ are disjoint and $\Theta_0 \cup \Theta_1 = \Theta$. A decision problem between two hypotheses is present, a so-called test problem.
Example: burning rate of solid propellant - $H_0: \mu = 50$ (centimeters per second) against $H_1: \mu \ne 50$
procedure: Based on the sample $x_1, \dots, x_n$ a decision about a particular hypothesis is made. Such a decision rule is called a statistical test.
Table: Type I Error and Type II Error
decision \ reality | $\vartheta \in \Theta_0$ | $\vartheta \in \Theta_1$
$H_0$ not rejected | no error | type II error
$H_1$ accepted | type I error | no error
procedure:
An upper bound $\alpha$ for the probability of a type I error is fixed, e.g. $\alpha \in \{0.01, 0.05, 0.1\}$. The critical region $C$ for the test (reject $H_0$) is determined such that the type I error fulfills this condition.
Such a test is called a test of significance at level $\alpha$ for $H_0$ if the probability of a type I error is smaller than or equal to $\alpha$, i.e.
$$P_\vartheta\big((X_1, \dots, X_n) \in C\big) \le \alpha \quad \text{for all } \vartheta \in \Theta_0.$$
$\alpha$ is called the significance level.
Because only the type I error and not the type II error is controlled by a test of significance, the probability of a type II error may be large. For that reason it is only possible to accept $H_1$, i.e. to reject $H_0$. It is incorrect to accept the null hypothesis $H_0$.
1.3.2 Tests for Univariate Samples

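Suppose that the random sample is i.i.d. with $X_i \sim N(\mu, \sigma^2)$.
test problem: $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$
Gauss test ($\sigma$ known)
$\sqrt n\, \frac{|\bar X - \mu_0|}{\sigma} > z_{\alpha/2}$ → accept $H_1$ (reject $H_0$)
$\sqrt n\, \frac{|\bar X - \mu_0|}{\sigma} \le z_{\alpha/2}$ → fail to reject $H_0$
t test ($\sigma$ unknown)
$\sqrt n\, \frac{|\bar X - \mu_0|}{S} > t_{n-1;\alpha/2}$ → accept $H_1$ (reject $H_0$)
$\sqrt n\, \frac{|\bar X - \mu_0|}{S} \le t_{n-1;\alpha/2}$ → fail to reject $H_0$
test problem: $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \ne \sigma_0^2$
$H_0$ is rejected if
$$(n-1)\,\frac{S^2}{\sigma_0^2} < \chi^2_{n-1;1-\alpha/2} \quad \text{or} \quad (n-1)\,\frac{S^2}{\sigma_0^2} > \chi^2_{n-1;\alpha/2}$$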
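R has no built-in one-sample test for the variance, so the rejection rule above can be coded directly. This is a sketch under stated assumptions: the function name var.test.one.sample is made up for this illustration, and the hypothesized value sigma.0 = 100 for the rainfall data is hypothetical.

```r
# Two-sided chi-square test of H0: sigma^2 = sigma.0^2 for an i.i.d. normal sample
var.test.one.sample <- function(x, sigma.0, alpha = 0.05) {
  n <- length(x)
  statistic <- (n - 1) * var(x) / sigma.0^2
  lower <- qchisq(alpha / 2, df = n - 1)      # chi^2_{n-1; 1-alpha/2}
  upper <- qchisq(1 - alpha / 2, df = n - 1)  # chi^2_{n-1; alpha/2}
  list(statistic = statistic, lower = lower, upper = upper,
       reject = (statistic < lower) || (statistic > upper))
}

# Rainfall data from section 1.2.1; sigma.0 = 100 is a hypothetical value
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
          499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6)
res <- var.test.one.sample(rain, sigma.0 = 100)
print(res)
```

For this choice of sigma.0 the statistic lies between the two percentage points, so $H_0$ is not rejected at level 0.05.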
Example: Power function $G(\mu) = P_\mu\left(\sqrt n\, |\bar X - \mu_0| / \sigma > z_{\alpha/2}\right)$ of the two-sided Gauss test (i.e. probability to accept $H_1$ as a function of $\mu$) for $\alpha = 0.05$, $n = 5$, $\sigma = 1$ and $\mu_0 = 0$
[Figure: power function $G(\mu)$ plotted against $\mu$ from -3 to 3, vertical axis from 0 to 1.0; $H_0$ corresponds to $\mu = 0$, $H_1$ to the values on either side]
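The power function of the two-sided Gauss test has a closed form, since $\sqrt n(\bar X - \mu_0)/\sigma \sim N(\delta, 1)$ with $\delta = \sqrt n(\mu - \mu_0)/\sigma$, which gives $G(\mu) = \Phi(-z_{\alpha/2} - \delta) + \Phi(-z_{\alpha/2} + \delta)$. The following sketch evaluates and plots it for the parameters of the example; the function name power.gauss is an assumption made for this illustration.

```r
# Power function of the two-sided Gauss test
power.gauss <- function(mu, mu.0 = 0, sigma = 1, n = 5, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  delta <- sqrt(n) * (mu - mu.0) / sigma  # shift of the test statistic under mu
  pnorm(-z - delta) + pnorm(-z + delta)   # P(T < -z) + P(T > z)
}

mu <- seq(-3, 3, by = 0.01)
G <- power.gauss(mu)

plot(mu, G, type = "l", ylim = c(0, 1), xlab = "mu", ylab = "G(mu)",
     main = "Power function of the two-sided Gauss test")
abline(h = 0.05, lty = 2)  # significance level, attained at mu = mu.0

print(power.gauss(c(0, 0.5, 1)))
```

At $\mu = \mu_0$ the power equals the significance level $\alpha$; it increases symmetrically as $\mu$ moves away from $\mu_0$.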
Large-Sample Tests
Suppose that the variables $X_1, X_2, \dots$ are i.i.d. with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$ for $i = 1, 2, \dots$
Because in most cases the distribution of the underlying characteristic $X$ is unknown, approximations to the critical values are determined using the asymptotic distribution of the test statistic.
test problem: $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$
The null hypothesis $H_0$ is rejected if $\sqrt n\, \frac{|\bar X - \mu_0|}{S} > z_{\alpha/2}$.
rule of thumb: $n > 100$; for $30 \le n \le 100$ use $t_{n-1;\alpha/2}$ instead of $z_{\alpha/2}$
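Tests for the Mean and the Variance of a Single Sample

The random sample $X_1, \dots, X_n$ is assumed to be i.i.d.

distribution | $H_0$ | $H_1$ | test statistic $T$ | $T$ under $H_0$ | reject $H_0$ if
normal distr., $\sigma$ known | $\mu = \mu_0$ | $\mu \ne \mu_0$ | $\frac{\bar X - \mu_0}{\sigma/\sqrt n}$ | $N(0, 1)$ | $|T| > z_{\alpha/2}$
 | $\mu \le \mu_0$ | $\mu > \mu_0$ | | | $T > z_\alpha$
 | $\mu \ge \mu_0$ | $\mu < \mu_0$ | | | $T < -z_\alpha$
normal distr., $\sigma$ unknown | $\mu = \mu_0$ | $\mu \ne \mu_0$ | $\frac{\bar X - \mu_0}{S/\sqrt n}$ | $t_{n-1}$ ($N(0, 1)$ for $n > 100$) | $|T| > t_{n-1;\alpha/2}$ ($z_{\alpha/2}$)
 | $\mu \le \mu_0$ | $\mu > \mu_0$ | | | $T > t_{n-1;\alpha}$ ($z_\alpha$)
 | $\mu \ge \mu_0$ | $\mu < \mu_0$ | | | $T < -t_{n-1;\alpha}$ ($-z_\alpha$)
arbitrary distr. with finite expectation | $\mu = \mu_0$ | $\mu \ne \mu_0$ | $\frac{\bar X - \mu_0}{S/\sqrt n}$ | approx. $N(0, 1)$ ($t_{n-1}$ for $30 \le n \le 100$) | $|T| > z_{\alpha/2}$ ($t_{n-1;\alpha/2}$)
 | $\mu \le \mu_0$ | $\mu > \mu_0$ | | | $T > z_\alpha$ ($t_{n-1;\alpha}$)
 | $\mu \ge \mu_0$ | $\mu < \mu_0$ | | | $T < -z_\alpha$ ($-t_{n-1;\alpha}$)
binomial distr., $n$ small | $p = p_0$ | $p \ne p_0$ | number of successes | $B(n, p_0)$ | $T \notin [c_{1-\alpha/2}, c_{\alpha/2}]$
 | $p \le p_0$ | $p > p_0$ | | | $T > c_\alpha$
 | $p \ge p_0$ | $p < p_0$ | | | $T < c_{1-\alpha}$ (percentage points of $B(n, p_0)$)
binomial distr., $n \ge 100$ or $np(1-p) > 5$ | $p = p_0$ | $p \ne p_0$ | $\frac{\hat p - p_0}{\sqrt{p_0 (1 - p_0)/n}}$ | approx. $N(0, 1)$ | $|T| > z_{\alpha/2}$
 | $p \le p_0$ | $p > p_0$ | | | $T > z_\alpha$
 | $p \ge p_0$ | $p < p_0$ | | | $T < -z_\alpha$
normal distr. | $\sigma^2 = \sigma_0^2$ | $\sigma^2 \ne \sigma_0^2$ | $(n-1) S^2 / \sigma_0^2$ ($n \tilde S^2 / \sigma_0^2$) | $\chi^2_{n-1}$ ($\chi^2_n$) | $T \notin [\chi^2_{n-1;1-\alpha/2}, \chi^2_{n-1;\alpha/2}]$
 | $\sigma^2 \le \sigma_0^2$ | $\sigma^2 > \sigma_0^2$ | | | $T > \chi^2_{n-1;\alpha}$
 | $\sigma^2 \ge \sigma_0^2$ | $\sigma^2 < \sigma_0^2$ | | | $T < \chi^2_{n-1;1-\alpha}$

The quantities in parentheses in the last block apply for known expectation $\mu$. Here $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar X)^2$ and $\tilde S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2$.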
Example: performance of new golf clubs - ratio of outgoing velocity of a golf ball to the
incoming velocity (coefficient of restitution)
0.8411 0.8191 0.8182 0.8125 0.8750

0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660
test problem: $H_0: \mu \le 0.82$ against $H_1: \mu > 0.82$
Since $n = 15$, $\bar x = 0.83724$, $s = 0.0245571$ the test statistic is equal to
$$T = \frac{\bar x - 0.82}{s / \sqrt{15}} = 2.718979.$$
Assuming normality the percentage point is $t_{14;0.05} = 1.761$. Because it is smaller than $T$ the null hypothesis is rejected.
program:
CoR <- c(0.8411, 0.8191, 0.8182, 0.8125, 0.8750, 0.8580, 0.8532, 0.8483, 0.8276,
         0.7983, 0.8042, 0.8730, 0.8282, 0.8359, 0.8660);
#--- box plot ---#
boxplot(CoR, range = 0, ylab = "Coefficient of Restitution");
#--- histogram ---#
hist(CoR, breaks = 8, freq = FALSE, main = "Histogram",
     xlab = "Coefficient of Restitution", ylab = "");
#--- normal qq plot ---#
qqnorm(CoR, datax = TRUE, main = "Normal QQ Plot");
qqline(CoR, datax = TRUE);
#--- one-sample t-test of H0: mu <= 0.82 against H1: mu > 0.82 ---#
t.test(CoR, alternative = "greater", mu = 0.82, conf.level = 0.95);
#---> t-test for mu.0 = 0.82 including a one-sided confidence interval
#--- alternative procedure: direct calculation of the quantities ---#
mu.0 <- 0.82;
n <- length(CoR);
t.stat <- (mean(CoR) - mu.0) / sd(CoR) * sqrt(n);  #---> test statistic
p.value <- pt(t.stat, n - 1, lower.tail = FALSE);  #---> one-sided p-value P(T > t)
output:
[Figure: box plot of the coefficient of restitution, histogram of the coefficient of restitution, and normal QQ plot (sample quantiles on the horizontal axis)]
The P Value Approach

The p-value is the probability, computed under $H_0$, of obtaining a value of the test statistic at least as extreme as the one observed for the given data set.
Thus it is the smallest level of significance that would lead to rejection of the null hypothesis $H_0$ for the given data. It can be considered as the observed significance level.
For the one-sided Gauss test ($H_1: \mu > \mu_0$) the p-value is equal to $P_{\mu_0}(T > t) = 1 - \Phi(t)$, where $t$ denotes the observed value of the test statistic $T$.
[Figure: 1-sided Gauss test. Density of the test statistic under $H_0$ with the critical value for $\alpha = 0.05$ and the observed value $t$; the area to the right of $t$ is the p-value 0.0179. Values to the left of the critical value belong to $H_0$, values to the right to $H_1$.]
The smaller the p-value, the stronger the evidence in the data against $H_0$.
If an upper bound $\alpha$ for the type I error is given then $H_0$ is rejected if the p-value is smaller than $\alpha$. Else, $H_0$ is not rejected.
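For the one-sided Gauss test the p-value follows directly from the standard normal distribution function. The sketch below assumes an observed value t = 2.1, chosen here only because it reproduces a p-value of about 0.0179; the observed value itself is an assumption of this illustration.

```r
# One-sided Gauss test: p-value = P(T > t) under H0 = 1 - Phi(t)
t.obs <- 2.1                              # assumed observed value of the test statistic
p.value <- pnorm(t.obs, lower.tail = FALSE)
print(round(p.value, 4))

# Decision at level alpha: reject H0 if the p-value is smaller than alpha
alpha <- 0.05
reject.H0 <- p.value < alpha
print(reject.H0)

# Equivalent decision via the critical value z_alpha
print(t.obs > qnorm(1 - alpha))
```

Both decision rules agree: comparing the p-value with $\alpha$ is the same as comparing $t$ with $z_\alpha$.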
Relationship between tests of significance and confidence intervals

To illustrate the relationship suppose that $X \sim N(\mu, \sigma^2)$. The sample variables are assumed to be independent and identically distributed. We are interested in statements about the expectation.
Suppose that a confidence interval for $\mu$ with level $1 - \alpha$ is given. Then
$$ci = \left[\bar X - t_{n-1;\alpha/2}\,\frac{S}{\sqrt n},\ \bar X + t_{n-1;\alpha/2}\,\frac{S}{\sqrt n}\right].$$
In order to deal with the test problem $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$, it is sufficient to check whether $\mu_0 \in ci$. If this is not the case then $H_1$ is accepted. This procedure is equivalent to the t-test.
Suppose that the t-test for the test problem $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$ is given. Then the test statistic is equal to
$$T = \sqrt n\, \frac{\bar X - \mu_0}{S}.$$
Consequently
$$P_{\mu_0}\big(|T| \le t_{n-1;\alpha/2}\big) = 1 - \alpha = P_{\mu_0}(\mu_0 \in ci)$$
with $ci$ as above. $ci$ is a confidence interval for $\mu$ with confidence level $1 - \alpha$. Thus the test directly provides a confidence interval.
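The equivalence can be checked numerically. The following sketch uses the golf club data with $\mu_0 = 0.82$ and a two-sided alternative; the variable names reject.by.test and reject.by.ci are introduced here for illustration.

```r
# Golf club data (coefficient of restitution)
CoR <- c(0.8411, 0.8191, 0.8182, 0.8125, 0.8750, 0.8580, 0.8532, 0.8483, 0.8276,
         0.7983, 0.8042, 0.8730, 0.8282, 0.8359, 0.8660)
mu.0 <- 0.82
alpha <- 0.05
n <- length(CoR)

# Two-sided 100(1 - alpha)% confidence interval for mu
half.width <- qt(1 - alpha / 2, n - 1) * sd(CoR) / sqrt(n)
ci <- mean(CoR) + c(-1, 1) * half.width

# Two-sided t-test of H0: mu = mu.0
t.stat <- sqrt(n) * (mean(CoR) - mu.0) / sd(CoR)
reject.by.test <- abs(t.stat) > qt(1 - alpha / 2, n - 1)

# Decision based on the confidence interval: reject if mu.0 lies outside
reject.by.ci <- (mu.0 < ci[1]) || (mu.0 > ci[2])

print(ci)
print(c(test = reject.by.test, ci = reject.by.ci))
```

Both decisions coincide for any choice of mu.0, since $|T| \le t_{n-1;\alpha/2}$ holds exactly when $\mu_0$ lies inside the interval.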
Overview: Statistical Inference for a Single Sample
1.4 Statistical Inference for Two Samples
Suppose that $X_1$ and $X_2$ are independent characteristics. Let $X_{11}, \dots, X_{1n_1}$ be a random sample of $X_1$ and let $X_{21}, \dots, X_{2n_2}$ be a random sample of $X_2$.
Two Independent Samples

Assumption: Suppose that $X_{11}, \dots, X_{1n_1}, X_{21}, \dots, X_{2n_2}$ are independent. Let $E(X_i) = \mu_i$ and $Var(X_i) = \sigma_i^2$ for $i = 1, 2$.
Test on Difference in Means
Test of Equality of the Variances
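Tests for the Means and the Variances of a Bivariate Sample I

distribution | $H_0$ | $H_1$ | test statistic $T$ | $T$ under $H_0$ | reject $H_0$ if
normal distr., $\sigma_1$ and $\sigma_2$ known | $\mu_1 = \mu_2$ | $\mu_1 \ne \mu_2$ | $\frac{\bar X_1 - \bar X_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ | $N(0, 1)$ | $|T| > z_{\alpha/2}$
 | $\mu_1 \le \mu_2$ | $\mu_1 > \mu_2$ | | | $T > z_\alpha$
 | $\mu_1 \ge \mu_2$ | $\mu_1 < \mu_2$ | | | $T < -z_\alpha$
normal distr., $\sigma_1, \sigma_2$ unknown, $\sigma_1 = \sigma_2$, small sample size | $\mu_1 = \mu_2$ | $\mu_1 \ne \mu_2$ | $\frac{\bar X_1 - \bar X_2}{\sqrt{\hat\sigma^2 (1/n_1 + 1/n_2)}}$ | $t_{n_1+n_2-2}$ | $|T| > t_{n_1+n_2-2;\alpha/2}$
 | $\mu_1 \le \mu_2$ | $\mu_1 > \mu_2$ | | | $T > t_{n_1+n_2-2;\alpha}$
 | $\mu_1 \ge \mu_2$ | $\mu_1 < \mu_2$ | | | $T < -t_{n_1+n_2-2;\alpha}$
normal distr., $\sigma_1, \sigma_2$ unknown, $\sigma_1 \ne \sigma_2$ | $\mu_1 = \mu_2$ | $\mu_1 \ne \mu_2$ | $\frac{\bar X_1 - \bar X_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ | $t_{df}$ with $df = \frac{(1+R)^2}{\frac{R^2}{n_1-1} + \frac{1}{n_2-1}}$, $R = \frac{n_2 S_1^2}{n_1 S_2^2}$ | $|T| > t_{df;\alpha/2}$
 | $\mu_1 \le \mu_2$ | $\mu_1 > \mu_2$ | | | $T > t_{df;\alpha}$
 | $\mu_1 \ge \mu_2$ | $\mu_1 < \mu_2$ | | | $T < -t_{df;\alpha}$

with $S_k^2 = \frac{1}{n_k - 1} \sum_{i=1}^{n_k} (X_{ki} - \bar X_k)^2$, $k = 1, 2$, and $\hat\sigma^2 = \frac{n_1 - 1}{n_1 + n_2 - 2}\, S_1^2 + \frac{n_2 - 1}{n_1 + n_2 - 2}\, S_2^2$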
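Tests for the Means and the Variances of a Bivariate Sample II

distribution | $H_0$ | $H_1$ | test statistic $T$ | $T$ under $H_0$ | reject $H_0$ if
arbitrary distr., $\sigma_1, \sigma_2$ unknown, large sample size | $\mu_1 = \mu_2$ | $\mu_1 \ne \mu_2$ | $\frac{\bar X_1 - \bar X_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ | approx. $N(0, 1)$ | $|T| > z_{\alpha/2}$
 | $\mu_1 \le \mu_2$ | $\mu_1 > \mu_2$ | | | $T > z_\alpha$
 | $\mu_1 \ge \mu_2$ | $\mu_1 < \mu_2$ | | | $T < -z_\alpha$
binomial distr., large sample size | $p_1 = p_2$ | $p_1 \ne p_2$ | $\frac{\hat p_1 - \hat p_2}{\sqrt{\hat p (1 - \hat p)(1/n_1 + 1/n_2)}}$ with $\hat p = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2}$ | approx. $N(0, 1)$ | $|T| > z_{\alpha/2}$
 | $p_1 \le p_2$ | $p_1 > p_2$ | | | $T > z_\alpha$
 | $p_1 \ge p_2$ | $p_1 < p_2$ | | | $T < -z_\alpha$
normal distr. | $\sigma_1^2 = \sigma_2^2$ | $\sigma_1^2 \ne \sigma_2^2$ | $\frac{S_1^2}{S_2^2}$ | $F_{n_1-1,n_2-1}$ | $T > F_{n_1-1,n_2-1;\alpha/2}$ or $T < F_{n_1-1,n_2-1;1-\alpha/2}$
 | $\sigma_1^2 \le \sigma_2^2$ | $\sigma_1^2 > \sigma_2^2$ | | | $T > F_{n_1-1,n_2-1;\alpha}$
 | $\sigma_1^2 \ge \sigma_2^2$ | $\sigma_1^2 < \sigma_2^2$ | | | $T < F_{n_1-1,n_2-1;1-\alpha}$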
Example: arsenic in drinking water - drinking water arsenic concentration in parts per billion
(ppb) for 10 metropolitan Phoenix communities and 10 communities in rural Arizona
Metro Phoenix Rural Arizona

Phoenix, 3 Rimrock, 48
Chandler, 7 Goodyear, 44
Gilbert, 25 New River, 40
Glendale, 10 Apache Junction, 38
Mesa, 15 Buckeye, 33
Paradise Valley, 6 Nogales, 21
Peoria, 12 Black Canyon City, 20
Scottsdale, 25 Sedona, 12
Tempe, 15 Payson, 1
Sun City, 7 Casa Grande, 18
program:
#--- arsenic concentration ---#
MetroPhoenix <- c(3, 7, 25, 10, 15, 6, 12, 25, 15, 7);
RuralArizona <- c(48, 44, 40, 38, 33, 21, 20, 12, 1, 18);
ArsenicConcentration <- cbind(MetroPhoenix, RuralArizona);
#--- box plot ---#
boxplot(ArsenicConcentration);
#--- normal q-q plot ---#
qqnorm(MetroPhoenix, datax = TRUE, main = "Normal Q-Q Plot: Metro Phoenix");
qqline(MetroPhoenix, datax = TRUE);
qqnorm(RuralArizona, datax = TRUE, main = "Normal Q-Q Plot: Rural Arizona");
qqline(RuralArizona, datax = TRUE);
#--- test for variance homogeneity ---#
var.test(MetroPhoenix, RuralArizona, ratio = 1, conf.level = 0.95);
#--- unpaired t-test (Welch, since var.equal = FALSE) ---#
t.test(MetroPhoenix, RuralArizona, alternative = "two.sided", paired = FALSE,
       var.equal = FALSE, conf.level = 0.95);
output:
[Figure: box plots of MetroPhoenix and RuralArizona; normal QQ plots for Metro Phoenix and for Rural Arizona (sample quantiles on the horizontal axis)]
output:
        F test to compare two variances

data:  MetroPhoenix and RuralArizona
F = 0.2473, num df = 9, denom df = 9, p-value = 0.04936
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.06143758 0.99581888
sample estimates:
ratio of variances
         0.2473473

        Welch Two Sample t-test

data:  MetroPhoenix and RuralArizona
t = -2.7669, df = 13.196, p-value = 0.01583
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -26.694067  -3.305933
sample estimates:
mean of x mean of y
     12.5      27.5
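The Welch statistic and its degrees of freedom can be recomputed from the formulas for unknown and unequal variances ($T = (\bar X_1 - \bar X_2)/\sqrt{S_1^2/n_1 + S_2^2/n_2}$ with $df = (1+R)^2 / (R^2/(n_1-1) + 1/(n_2-1))$, $R = n_2 S_1^2/(n_1 S_2^2)$). This is a sketch for the arsenic data; the helper variable names are chosen here for illustration.

```r
MetroPhoenix <- c(3, 7, 25, 10, 15, 6, 12, 25, 15, 7)
RuralArizona <- c(48, 44, 40, 38, 33, 21, 20, 12, 1, 18)

n1 <- length(MetroPhoenix)
n2 <- length(RuralArizona)
v1 <- var(MetroPhoenix) / n1  # S_1^2 / n_1
v2 <- var(RuralArizona) / n2  # S_2^2 / n_2

# Test statistic for unknown, unequal variances
t.stat <- (mean(MetroPhoenix) - mean(RuralArizona)) / sqrt(v1 + v2)

# Degrees of freedom with R = n_2 S_1^2 / (n_1 S_2^2)
R <- v1 / v2
df <- (1 + R)^2 / (R^2 / (n1 - 1) + 1 / (n2 - 1))

# Two-sided p-value
p.value <- 2 * pt(-abs(t.stat), df)

print(c(t = t.stat, df = df, p.value = p.value))
```

The three numbers agree with the Welch output of t.test, which uses the same approximation for the degrees of freedom.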
Two Paired Samples

now: The characteristics $X_1$ and $X_2$ are not assumed to be independent.
Example: running time for a marathon before (characteristic $X_1$) and after a training camp (characteristic $X_2$)
idea: Using a transformation the data are transferred to a univariate sample which can be handled by well-known methods. Consider $D = X_2 - X_1$.
assumption: The random variables $D_i = X_{2i} - X_{1i}$, $i = 1, \dots, n$, are i.i.d. with
$$D_i \sim N(\mu_2 - \mu_1, \sigma^2), \quad i = 1, \dots, n.$$
test problem: $H_0: \mu_2 \ge \mu_1$ against $H_1: \mu_2 < \mu_1$
test statistic:
$$T = \sqrt n\, \frac{\bar D}{\sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (D_i - \bar D)^2}}$$
For $\mu_1 = \mu_2$ it holds that $T \sim t_{n-1}$. Accept $H_1$ if $T < -t_{n-1;\alpha}$.
Example: influence of a new diet on the blood cholesterol level
subject 1 2 3 4 5 6 7 8 9 10
before 223 259 248 220 287 191 229 270 245 201
after 220 244 243 211 299 170 210 276 252 189
difference di -3 -15 -5 -9 12 -21 -19 6 7 -12
It holds that $\bar d = -5.9$ and
$$\frac{1}{n-1} \sum_{i=1}^{n} (d_i - \bar d)^2 = 129.65.$$
Thus
$$t = \sqrt{10}\, \frac{-5.9}{\sqrt{129.65}} = -1.639.$$
Choosing $\alpha = 0.1$ it holds that $t_{9;0.9} = -t_{9;0.1} = -1.383$. Since $t < t_{9;0.9}$ the alternative hypothesis $H_1: \mu_2 < \mu_1$ is accepted.
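The hand calculation for the cholesterol data can be reproduced step by step. The sketch below uses the differences after - before, as in the table, and a one-sided alternative; the variable names before and after are introduced here for illustration.

```r
before <- c(223, 259, 248, 220, 287, 191, 229, 270, 245, 201)
after  <- c(220, 244, 243, 211, 299, 170, 210, 276, 252, 189)

d <- after - before  # differences d_i = x_2i - x_1i
n <- length(d)

# Test statistic T = sqrt(n) * mean(d) / sd(d)
t.stat <- sqrt(n) * mean(d) / sd(d)

# One-sided test of H0: mu_2 >= mu_1 against H1: mu_2 < mu_1 at level alpha
alpha <- 0.1
crit <- -qt(1 - alpha, n - 1)  # -t_{n-1; alpha}
accept.H1 <- t.stat < crit

print(c(d.bar = mean(d), var.d = var(d), t = t.stat, crit = crit))
print(accept.H1)

# The same test via t.test with a one-sided alternative
res <- t.test(after, before, alternative = "less", paired = TRUE)
print(res$p.value)
```

With alternative = "less" the reported p-value is one-sided and can be compared directly with $\alpha = 0.1$.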
program:
#--- cholesterol treatment ---#
X <- c(223, 259, 248, 220, 287, 191, 229, 270, 245, 201);  # before
Y <- c(220, 244, 243, 211, 299, 170, 210, 276, 252, 189);  # after
#--- paired t-test ---#
# The differences are X - Y (before - after), so the sign of t is reversed
# compared with D = X2 - X1 above. This call runs the two-sided test; use
# alternative = "greater" for the one-sided problem H1: mu2 < mu1.
t.test(X, Y, alternative = "two.sided", paired = TRUE, conf.level = 0.95);
output:
        Paired t-test

data:  X and Y
t = 1.6385, df = 9, p-value = 0.1357
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.245511 14.045511
sample estimates:
mean of the differences
                    5.9
Overview: Statistical Inference for Two Samples
1.5 Goodness-of-Fit Tests

Suppose that the sample variables $X_1, \dots, X_n$ are i.i.d. with $X_i \sim F$ for $i = 1, \dots, n$.
test problem: $H_0: F \in \mathcal{F}_0$ against $H_1: F \in \mathcal{D} \setminus \mathcal{F}_0$, where $\mathcal{F}_0 \subset \mathcal{D}$
Example: For a test on normality, $\mathcal{F}_0 = \{N(\mu, \sigma^2) : \mu \in \mathbb{R}, \sigma > 0\}$

problem: How to choose $\mathcal{F}_0$ in practice? → histogram, kernel estimator, empirical distribution function
[Figure: empirical cdf $F_n(x)$ of the returns of the WIG index, horizontal axis from -0.04 to 0.04; time period: 01.01.2002 - 31.12.2003 (# 498)]
1.5.1 Distribution Function is Known

Test of Kolmogorov
test problem: $H_0: F = F_0$ with $F_0$ known
decision: If $D = \sup_{x \in \mathbb{R}} |F_n(x) - F_0(x)|$ is sufficiently large then $H_0$ is rejected.
The distribution of $D$ under $H_0$ is tabulated. It does not depend on $F_0$ if $F_0$ is continuous!
application in practice: Suppose that $F_0$ is continuous and $x_1 \le x_2 \le \dots \le x_n$.
$$\to\ D = \max_{1 \le i \le n} \max\big\{F_n(x_i) - F_0(x_i),\ F_0(x_i) - F_n(x_{i-1})\big\} \quad \text{with } F_n(x_0) := 0$$
Example: Test on a Standard Normal Distribution ($F_0 = \Phi$)

sample: $x_1 = -1.0$, $x_2 = 0.5$, $x_3 = 0.5$, $x_4 = 1.5$
i | $x_i$ | $F_n(x_i)$ | $\Phi(x_i)$ | $F_n(x_{i-1})$ | $\max_i$
1 | -1.0 | 0.25 | 0.159 | 0 | 0.159
2 | 0.5 | 0.75 | 0.691 | 0.25 | 0.441
3 | 0.5 | 0.75 | 0.691 | 0.75 | 0.059
4 | 1.5 | 1 | 0.933 | 0.75 | 0.183
Thus $d = 0.441$. For $\alpha = 0.1$ we get that $c_{0.1} = 0.565$. Because $d \le c_{0.1}$ the null hypothesis $H_0: F = \Phi$ is not rejected.
program:
x <- c(-1.0, 0.5, 0.5, 1.5);
ks.test(x, pnorm)
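The table of the example can be reproduced with a few lines. The sketch below follows the formula for $D$ with $F_n(x_0) := 0$; the variable names Fn, Fn.prev and F0 are chosen for this illustration.

```r
x <- sort(c(-1.0, 0.5, 0.5, 1.5))
n <- length(x)

Fn <- ecdf(x)(x)         # F_n(x_i): 0.25, 0.75, 0.75, 1
Fn.prev <- c(0, Fn[-n])  # F_n(x_{i-1}) with F_n(x_0) := 0
F0 <- pnorm(x)           # hypothesized cdf Phi(x_i)

# Row-wise maxima and the test statistic d
row.max <- pmax(Fn - F0, F0 - Fn.prev)
d <- max(row.max)

print(round(cbind(x, Fn, F0, Fn.prev, row.max), 3))
print(round(d, 3))
```

Note that the sample contains a tie (0.5 occurs twice), which is why ks.test issues a warning for these data.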
Goodness-of-Fit Test of Pearson

Example: Suppose that a die is symmetric, i.e. that $P(X = i) = 1/6$ for $1 \le i \le 6$.
first: $X$ is discrete, taking the $r$ different values $t_1, \dots, t_r$
test problem: $H_0: P(X = t_i) = p_i$ for all $i = 1, \dots, r$ against $H_1: P(X = t_i) \ne p_i$ for at least one $i$
$S_i$ is equal to the number of observations equal to $t_i$. The expected number is equal to $n p_i$.
test statistic:
$$Q = \sum_{i=1}^{r} \frac{(S_i - n p_i)^2}{n p_i}$$
If $Q > \chi^2_{r-1;\alpha}$ then $H_0$ is rejected.
Remark: The asymptotic test can be applied if $n p_i \ge 5$ for all $i$. Contrary to the test of Kolmogorov the test of Pearson can be applied to discrete as well as continuous distributions.
Example: Symmetry of a Die

Result of $n = 120$ throws of a die:
value     1  2  3  4  5  6
frequency 24 12 15 25 16 28
We get
$$q = \frac{16}{20} + \frac{64}{20} + \frac{25}{20} + \frac{25}{20} + \frac{16}{20} + \frac{64}{20} = \frac{210}{20} = 10.5.$$
For $\alpha = 0.1$ it holds that $\chi^2_{5;0.1} = 9.236$. Since it is smaller than $q$ the hypothesis $H_0$ is rejected. The data indicate that the die is not symmetric.
program:
y <- c(24, 12, 15, 25, 16, 28);
x <- c(1, 1, 1, 1, 1, 1); x <- x / 6;
chisq.test(y, p = x)
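The value $q = 10.5$ and the critical value can also be obtained by direct calculation. This is a short sketch for the die data; the variable names expected and crit are introduced here for illustration.

```r
y <- c(24, 12, 15, 25, 16, 28)  # observed frequencies S_i
n <- sum(y)                     # 120 throws
p <- rep(1 / 6, 6)              # probabilities under H0

expected <- n * p               # expected frequencies n * p_i = 20
q <- sum((y - expected)^2 / expected)

alpha <- 0.1
crit <- qchisq(1 - alpha, df = length(y) - 1)  # chi^2_{5; 0.1}
p.value <- pchisq(q, df = length(y) - 1, lower.tail = FALSE)

print(c(q = q, crit = crit, p.value = p.value))
print(q > crit)  # TRUE means H0 is rejected
```

The condition $n p_i \ge 5$ from the remark is satisfied here, since every expected frequency equals 20.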
1.5.2 Known Distribution Family

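Let $\mathcal{F}_0 = \left\{F_0\left(\frac{x - \mu}{\sigma}\right) : \mu \in \mathbb{R}, \sigma > 0\right\}$, $F_0$ known with expectation 0 and variance 1 (e.g., $F_0 = \Phi$).
test problem: $H_0: F \in \mathcal{F}_0$ against $H_1: F \in \mathcal{D} \setminus \mathcal{F}_0$
graphical method: quantile plot
approach: If $F(x) = F_0\left(\frac{x - \mu}{\sigma}\right)$ then $F_0^{-1}(F(x)) = F_0^{-1}\left(F_0\left(\frac{x - \mu}{\sigma}\right)\right) = \frac{x - \mu}{\sigma}$.
Replace $F(x)$ by $F_n(x_{(i)})$ and consider $F_0^{-1}(F_n(x_{(i)}))$.
Since $F_0^{-1}(1) = \infty$ the following quantities are used: $p_i = (i - 1/2)/n$ or $p_i = (i - 3/8)/(n + 1/4)$
idea: Using $v_i = F_0^{-1}(p_i)$ it holds that $v_i \approx (x_{(i)} - \mu)/\sigma$, i.e. $x_{(i)} \approx \mu + \sigma v_i$.
decision:
If the points $(v_i, x_{(i)})$ are roughly lying on a straight line then $H_0$ is not rejected.
determination of the straight line: least-squares approach
$$\sum_{i=1}^{n} (x_{(i)} - a - b\, v_i)^2 \overset{!}{=} \min_{a, b} \quad \to \quad x = \bar x - \hat\sigma\, \bar v + \hat\sigma\, v$$
where $\hat\sigma$ denotes the least-squares slope.
$v = 0$ → estimator of $\mu$: $\bar x - \hat\sigma\, \bar v$; $v = 1$ → estimator of $\mu + \sigma$: $\bar x + \hat\sigma (1 - \bar v)$
Remark: If $F_0$ is a symmetric distribution function then $\bar v = 0$ and thus $x = \bar x + \hat\sigma\, v$.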
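The quantile plot and its least-squares line can be built by hand. The following sketch uses the rainfall data from section 1.2.1 with $F_0 = \Phi$ and $p_i = (i - 1/2)/n$; the variable names x.ord, mu.hat and sigma.hat are chosen here for illustration.

```r
rain <- c(499.2, 555.2, 398.8, 391.9, 453.4, 459.8, 483.7, 417.6, 469.2, 452.4,
          499.3, 340.6, 522.8, 469.9, 527.2, 565.5, 584.1, 727.3, 558.6, 338.6)

n <- length(rain)
x.ord <- sort(rain)       # order statistics x_(i)
p <- ((1:n) - 0.5) / n    # plotting positions p_i = (i - 1/2) / n
v <- qnorm(p)             # v_i = F_0^{-1}(p_i) with F_0 = Phi

# Least-squares line x = a + b * v through the points (v_i, x_(i))
fit <- lm(x.ord ~ v)
mu.hat <- unname(coef(fit)[1])     # intercept: estimator of mu
sigma.hat <- unname(coef(fit)[2])  # slope: estimator of sigma

plot(v, x.ord, xlab = "v", ylab = "Sample Quantiles", main = "Normal QQ Plot")
abline(fit)

print(c(mu.hat = mu.hat, sigma.hat = sigma.hat, mean = mean(rain), sd = sd(rain)))
```

Because $\Phi$ is symmetric, the $v_i$ average to zero and the intercept coincides with $\bar x$, as stated in the remark.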
Returns of the WIG-Index
[Figure: two QQ plots of the WIG returns (sample quantiles from -0.04 to 0.04).
Normal QQ Plot: $v_i = \Phi^{-1}\left(\frac{i - 1/2}{n}\right)$; $\bar x = 0.000796$, $s = 0.0120$, $\hat\sigma = 0.0119$.
Student t (5 df) QQ Plot: $v_i = t_5^{-1}\left(\frac{i - 1/2}{n}\right) / \sqrt{5/(5-2)}$; $\bar x = 0.000796$, $s = 0.0120$, $\hat\sigma = 0.0120$.
time period: 01.01.2002 - 31.12.2003 (# 499)]
Test of Shapiro and Wilk on Normality

assumption: Let $F_0 = \Phi$, i.e. $F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$.
test statistic:
$$W = \frac{\left(\sum_{i=1}^{n} (X_{(i)} - \bar X)\, \Phi^{-1}\big((i - 1/2)/n\big)\right)^2}{\sum_{i=1}^{n} (X_{(i)} - \bar X)^2\ \sum_{i=1}^{n} \Phi^{-1}\big((i - 1/2)/n\big)^2}$$
If $W < c$ then $H_0$ is rejected.

motivation: consider the points $\big(\Phi^{-1}((i - 1/2)/n),\ x_{(i)}\big)$. These points should roughly lie on a straight line if $H_0$ is valid (→ the absolute value of the empirical correlation coefficient $\hat\varrho$ is close to 1). $W$ is an estimator of $\varrho^2$.
Example: We consider the mean annual rainfall data from Australia considered above.
program:
x <- c(499.2,...); shapiro.test(x)
We get $W = 0.9536$ and a p-value of 0.4248. Thus the normal assumption is not rejected.
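Modified Test of Kolmogorov on Normality

test statistic: $D = \sup_{x \in \mathbb{R}} \left|F_n(x) - \Phi\big((x - \bar X)/S\big)\right|$
If $D > c$ then $H_0$ is rejected.
problem: Because $\mu$ and $\sigma$ are estimated, the distribution of $D$ under $H_0$ differs from the one tabulated for the test of Kolmogorov with known $F_0$; it depends on the family $\mathcal{F}_0$, so separate critical values are needed.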
Overview: Goodness-of-Fit Tests
1.6 Appendix
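discrete distributions

distribution | probability mass function $f(m)$ | parameter set | expectation $\mu = E(X)$ | variance $\sigma^2 = E[(X - \mu)^2]$
binomial $B(n, p)$ | $\binom{n}{m} p^m (1 - p)^{n - m}$, $m \in \{0, 1, \dots, n\}$ | $0 < p < 1$, $n \in \{1, 2, \dots\}$ | $n p$ | $n p (1 - p)$
hypergeometric $H(N, M, n)$ | $\frac{\binom{M}{m} \binom{N - M}{n - m}}{\binom{N}{n}}$, $m \in \{m_{\min}, m_{\min} + 1, \dots, m_{\max}\}$ | $N \in \{1, 2, \dots\}$, $M \in \{0, 1, \dots, N\}$, $n \in \{1, 2, \dots, N\}$ | $n \frac{M}{N}$ | $n \frac{M}{N} \frac{N - M}{N} \frac{N - n}{N - 1}$ (for $N > 1$)
Poisson $P(\lambda)$ | $\frac{\lambda^m}{m!} e^{-\lambda}$, $m \in \{0, 1, \dots\}$ | $\lambda > 0$ | $\lambda$ | $\lambda$
geometric $G(p)$ | $p (1 - p)^{m - 1}$, $m \in \{1, 2, \dots\}$ | $0 < p < 1$ | $\frac{1}{p}$ | $\frac{1 - p}{p^2}$

with $m_{\min} := \max\{0, n - (N - M)\}$ and $m_{\max} := \min\{n, M\}$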
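continuous distributions

distribution | density $f$ | parameter set | expectation $\mu = E(X)$ | variance $\sigma^2 = E(X - \mu)^2$
uniform distr. $U(a, b)$ | $\frac{1}{b - a}$, $x \in [a, b]$ | $-\infty < a < b < \infty$ | $\frac{a + b}{2}$ | $\frac{(b - a)^2}{12}$
normal distr. $N(\mu, \sigma^2)$ | $\frac{1}{\sqrt{2 \pi \sigma^2}}\, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$, $x \in \mathbb{R}$ | $\mu \in \mathbb{R}$, $\sigma > 0$ | $\mu$ | $\sigma^2$
exponential distr. $E(\lambda)$ | $\lambda e^{-\lambda x}$, $x \ge 0$ | $\lambda > 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$
Gamma distr. | $\frac{\lambda^r}{\Gamma(r)}\, x^{r - 1} e^{-\lambda x}$, $x \ge 0$ | $\lambda > 0$, $r > 0$ | $\frac{r}{\lambda}$ | $\frac{r}{\lambda^2}$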
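continuous distributions (cont.)

distribution | density $f$ | parameter set | expectation $\mu = E(X)$ | variance $\sigma^2 = E(X - \mu)^2$
$\chi^2_n$-distr. | $\frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}$, $x > 0$ | $n \in \mathbb{N}$ | $n$ | $2n$
t-distr. (Student) $t_n$ | $\frac{\left(1 + \frac{x^2}{n}\right)^{-\frac{n + 1}{2}}}{\sqrt n\, B(n/2, 1/2)}$, $x \in \mathbb{R}$ | $n \in \mathbb{N}$ | $0$ ($n > 1$) | $\frac{n}{n - 2}$ ($n > 2$)
F-distr. $F_{m,n}$ | $\frac{(m/n)^{m/2}}{B\left(\frac{m}{2}, \frac{n}{2}\right)}\, x^{\frac{m}{2} - 1} \left(1 + \frac{m}{n} x\right)^{-\frac{m + n}{2}}$, $x \ge 0$ | $m, n \in \mathbb{N}$ | $\frac{n}{n - 2}$ ($n > 2$) | $\frac{2 n^2 (m + n - 2)}{m (n - 2)^2 (n - 4)}$ ($n > 4$)

Note that $\Gamma(x) = \int_0^\infty \exp(-t)\, t^{x - 1}\, dt$ and $B(x, y) = \frac{\Gamma(x)\, \Gamma(y)}{\Gamma(x + y)}$.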
Percentage Points of the Standard Normal Distribution

Let $z_\alpha = \Phi^{-1}(1 - \alpha)$, i.e. $\alpha = 1 - \Phi(z_\alpha)$, and $-z_\alpha = \Phi^{-1}(\alpha) = z_{1-\alpha}$.
1-α    z_α      1-α    z_α      1-α   z_α      1-α  z_α
0.9999 3.7190 0.9975 2.8070 0.965 1.8119 0.83 0.9542
0.9998 3.5401 0.9970 2.7478 0.960 1.7507 0.82 0.9154
0.9997 3.4316 0.9965 2.6968 0.955 1.6954 0.81 0.8779
0.9996 3.3528 0.9960 2.6521 0.950 1.6449 0.80 0.8416
0.9995 3.2905 0.9955 2.6121 0.945 1.5982 0.79 0.8064
0.9994 3.2389 0.9950 2.5758 0.940 1.5548 0.78 0.7722
0.9993 3.1947 0.9945 2.5427 0.935 1.5141 0.76 0.7063
0.9992 3.1559 0.9940 2.5121 0.930 1.4758 0.74 0.6433
0.9991 3.1214 0.9935 2.4838 0.925 1.4395 0.72 0.5828
0.9990 3.0902 0.9930 2.4573 0.920 1.4051 0.70 0.5244
0.9989 3.0618 0.9925 2.4324 0.915 1.3722 0.68 0.4677
0.9988 3.0357 0.9920 2.4089 0.910 1.3408 0.66 0.4125
0.9987 3.0115 0.9915 2.3867 0.905 1.3106 0.64 0.3585
0.9986 2.9889 0.9910 2.3656 0.900 1.2816 0.62 0.3055
0.9985 2.9677 0.9905 2.3455 0.890 1.2265 0.60 0.2533
0.9984 2.9478 0.9900 2.3263 0.880 1.1750 0.58 0.2019
0.9983 2.9290 0.9850 2.1701 0.870 1.1264 0.56 0.1510
0.9982 2.9112 0.9800 2.0537 0.860 1.0803 0.54 0.1004
0.9981 2.8943 0.9750 1.9600 0.850 1.0364 0.52 0.0502
0.9980 2.8782 0.9700 1.8808 0.840 0.9945 0.50 0.0000
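The tabulated percentage points in this appendix can also be obtained from the quantile functions in R. This is a brief sketch; since the percentage points are defined through $1 - \alpha$, the first argument of each call is $1 - \alpha$.

```r
alpha <- 0.025

# z_alpha = Phi^{-1}(1 - alpha): standard normal distribution
z <- qnorm(1 - alpha)

# t_{n; alpha} = t_n^{-1}(1 - alpha): t distribution with n = 19 degrees of freedom
t19 <- qt(1 - alpha, df = 19)

# chi^2_{n; alpha}: chi-square distribution, here n = 5 and alpha = 0.1
chi5 <- qchisq(1 - 0.1, df = 5)

print(round(c(z = z, t19 = t19, chi5 = chi5), 3))

# Lower percentage points follow from symmetry: z_{1 - alpha} = -z_alpha
print(qnorm(alpha))
```

The corresponding distribution functions (pnorm, pt, pchisq) give the reverse lookup, i.e. probabilities for given quantiles.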
Percentage Points of the t-distribution ($t_{n;\alpha} = t_n^{-1}(1 - \alpha)$)

Columns: $1 - \alpha$; rows: degrees of freedom $n$
n 0.9 0.95 0.975 0.98 0.99 0.995 0.999
1 3.078 6.314 12.706 15.895 31.821 63.657 318.309
2 1.886 2.920 4.303 4.849 6.965 9.925 22.327
3 1.638 2.353 3.182 3.482 4.541 5.841 10.215
4 1.533 2.132 2.776 2.999 3.747 4.604 7.173
5 1.476 2.015 2.571 2.756 3.365 4.032 5.893
6 1.440 1.943 2.447 2.612 3.143 3.707 5.208
7 1.415 1.895 2.365 2.517 2.998 3.499 4.785
8 1.397 1.860 2.306 2.449 2.896 3.355 4.501
9 1.383 1.833 2.262 2.398 2.821 3.250 4.297
10 1.372 1.812 2.228 2.359 2.764 3.169 4.144
11 1.363 1.796 2.201 2.328 2.718 3.106 4.025
12 1.356 1.782 2.179 2.303 2.681 3.055 3.930
13 1.350 1.771 2.160 2.282 2.650 3.012 3.852
14 1.345 1.761 2.145 2.264 2.624 2.977 3.787
15 1.341 1.753 2.131 2.249 2.602 2.947 3.733
16 1.337 1.746 2.120 2.235 2.583 2.921 3.686
17 1.333 1.740 2.110 2.224 2.567 2.898 3.646
18 1.330 1.734 2.101 2.214 2.552 2.878 3.610
19 1.328 1.729 2.093 2.205 2.539 2.861 3.579
20 1.325 1.725 2.086 2.197 2.528 2.845 3.552
21 1.323 1.721 2.080 2.189 2.518 2.831 3.527
22 1.321 1.717 2.074 2.183 2.508 2.819 3.505
23 1.319 1.714 2.069 2.177 2.500 2.807 3.485
24 1.318 1.711 2.064 2.172 2.492 2.797 3.467
25 1.316 1.708 2.060 2.167 2.485 2.787 3.450
30 1.310 1.697 2.042 2.147 2.457 2.750 3.385
35 1.306 1.690 2.030 2.133 2.438 2.724 3.340
40 1.303 1.684 2.021 2.123 2.423 2.704 3.307
45 1.301 1.679 2.014 2.115 2.412 2.690 3.281
50 1.299 1.676 2.009 2.109 2.403 2.678 3.261
60 1.296 1.671 2.000 2.099 2.390 2.660 3.232
70 1.294 1.667 1.994 2.093 2.381 2.648 3.211
80 1.292 1.664 1.990 2.088 2.374 2.639 3.195
90 1.291 1.662 1.987 2.084 2.368 2.632 3.183
100 1.290 1.660 1.984 2.081 2.364 2.626 3.174
150 1.287 1.655 1.976 2.072 2.351 2.609 3.145
200 1.286 1.653 1.972 2.067 2.345 2.601 3.131
250 1.285 1.651 1.969 2.065 2.341 2.596 3.123
300 1.284 1.650 1.968 2.063 2.339 2.592 3.118
350 1.284 1.649 1.967 2.061 2.337 2.590 3.114
∞ 1.282 1.645 1.960 2.054 2.326 2.576 3.090
For large values of n it holds that $t_{n;\alpha} \approx \sqrt{n/(n-2)}\; z_\alpha$, where $z_\alpha$ is the percentage point of the standard normal distribution.
e.g. n = 350: 1.286 1.650 1.966 2.060 2.333 2.583 3.099
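The large-sample approximation for the t percentage points can be checked numerically. The following is a minimal sketch in R, assuming the base functions `qt` and `qnorm` as quantile functions; the variable names `n`, `p`, `exact` and `approx_t` are chosen here for illustration and are not part of the slides.

```r
#--- t percentage points: exact vs. large-n approximation ---#
n <- 350                                             # degrees of freedom
p <- c(0.9, 0.95, 0.975, 0.98, 0.99, 0.995, 0.999)   # values of 1 - alpha (table columns)

exact    <- qt(p, df = n)                  # exact percentage points of t_n
approx_t <- sqrt(n / (n - 2)) * qnorm(p)   # normal quantile scaled by sqrt(n / (n - 2))

# side-by-side comparison, rounded to three decimals as in the table
print(round(rbind(exact, approx_t), 3))
```

The factor sqrt(n / (n - 2)) is the standard deviation of the t-distribution with n degrees of freedom, so the approximation rescales the normal quantile to the spread of t_n.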
Percentage Points of the $\chi^2$-distribution ($\chi^2_{n;\alpha} = (\chi^2_n)^{-1}(1-\alpha)$)

Rows: degrees of freedom (df); columns: $1-\alpha$.
df 0.01 0.025 0.05 0.1 0.9 0.95 0.975 0.99
1 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635
2 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210
3 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.34
4 0.297 0.484 0.711 1.064 7.779 9.488 11.14 13.28
5 0.554 0.831 1.145 1.610 9.236 11.07 12.83 15.09
6 0.872 1.237 1.635 2.204 10.64 12.59 14.45 16.81
7 1.239 1.690 2.167 2.833 12.02 14.07 16.01 18.48
8 1.646 2.180 2.733 3.490 13.36 15.51 17.53 20.09
9 2.088 2.700 3.325 4.168 14.68 16.92 19.02 21.67
10 2.558 3.247 3.940 4.865 15.99 18.31 20.48 23.21
11 3.053 3.816 4.575 5.578 17.28 19.68 21.92 24.72
12 3.571 4.404 5.226 6.304 18.55 21.03 23.34 26.22
13 4.107 5.009 5.892 7.042 19.81 22.36 24.74 27.69
14 4.660 5.629 6.571 7.790 21.06 23.68 26.12 29.14
15 5.229 6.262 7.261 8.547 22.31 25.00 27.49 30.58
16 5.812 6.908 7.962 9.312 23.54 26.30 28.85 32.00
17 6.408 7.564 8.672 10.09 24.77 27.59 30.19 33.41
18 7.015 8.231 9.390 10.86 25.99 28.87 31.53 34.81
19 7.633 8.907 10.12 11.65 27.20 30.14 32.85 36.19
20 8.260 9.591 10.85 12.44 28.41 31.41 34.17 37.57
21 8.897 10.28 11.59 13.24 29.62 32.67 35.48 38.93
22 9.542 10.98 12.34 14.04 30.81 33.92 36.78 40.29
23 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64
24 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98
25 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31
26 12.20 13.84 15.38 17.29 35.56 38.89 41.92 45.64
27 12.88 14.57 16.15 18.11 36.74 40.11 43.19 46.96
28 13.56 15.31 16.93 18.94 37.92 41.34 44.46 48.28
29 14.26 16.05 17.71 19.77 39.09 42.56 45.72 49.59
30 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89
31 15.66 17.54 19.28 21.43 41.42 44.99 48.23 52.19
32 16.36 18.29 20.07 22.27 42.58 46.19 49.48 53.49
33 17.07 19.05 20.87 23.11 43.75 47.40 50.73 54.78
34 17.79 19.81 21.66 23.95 44.90 48.60 51.97 56.06
35 18.51 20.57 22.47 24.80 46.06 49.80 53.20 57.34
36 19.23 21.34 23.27 25.64 47.21 51.00 54.44 58.62
37 19.96 22.11 24.07 26.49 48.36 52.19 55.67 59.89
38 20.69 22.88 24.88 27.34 49.51 53.38 56.90 61.16
39 21.43 23.65 25.70 28.20 50.66 54.57 58.12 62.43
40 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69
41 22.91 25.21 27.33 29.91 52.95 56.94 60.56 64.95
42 23.65 26.00 28.14 30.77 54.09 58.12 61.78 66.21
43 24.40 26.79 28.96 31.63 55.23 59.30 62.99 67.46
44 25.15 27.57 29.79 32.49 56.37 60.48 64.20 68.71
45 25.90 28.37 30.61 33.35 57.51 61.66 65.41 69.96
Percentage Points of the $\chi^2$-distribution cont.

df 0.01 0.025 0.05 0.1 0.9 0.95 0.975 0.99
46 26.66 29.16 31.44 34.22 58.64 62.83 66.62 71.20
47 27.42 29.96 32.27 35.08 59.77 64.00 67.82 72.44
48 28.18 30.75 33.10 35.95 60.91 65.17 69.02 73.68
49 28.94 31.55 33.93 36.82 62.04 66.34 70.22 74.92
50 29.71 32.36 34.76 37.69 63.17 67.50 71.42 76.15
55 33.57 36.40 38.96 42.06 68.80 73.31 77.38 82.29
60 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38
65 41.44 44.60 47.45 50.88 79.97 84.82 89.18 94.42
70 45.44 48.76 51.74 55.33 85.53 90.53 95.02 100.4
75 49.48 52.94 56.05 59.79 91.06 96.22 100.8 106.4
80 53.54 57.15 60.39 64.28 96.58 101.9 106.6 112.3
85 57.63 61.39 64.75 68.78 102.1 107.5 112.4 118.2
90 61.75 65.65 69.13 73.29 107.6 113.1 118.1 124.1
95 65.90 69.92 73.52 77.82 113.0 118.8 123.9 130.0
100 70.06 74.22 77.93 82.36 118.5 124.3 129.6 135.8
110 78.46 82.87 86.79 91.47 129.4 135.5 140.9 147.4
120 86.92 91.57 95.70 100.6 140.2 146.6 152.2 159.0
130 95.45 100.3 104.7 109.8 151.0 157.6 163.5 170.4
140 104.0 109.1 113.7 119.0 161.8 168.6 174.6 181.8
150 112.7 118.0 122.7 128.3 172.6 179.6 185.8 193.2
160 121.3 126.9 131.8 137.5 183.3 190.5 196.9 204.5
170 130.1 135.8 140.8 146.8 194.0 201.4 208.0 215.8
180 138.8 144.7 150.0 156.2 204.7 212.3 219.0 227.1
190 147.6 153.7 159.1 165.5 215.4 223.2 230.1 238.3
200 156.4 162.7 168.3 174.8 226.0 234.0 241.1 249.4
220 174.2 180.8 186.7 193.6 247.3 255.6 263.0 271.7
240 192.0 199.0 205.1 212.4 268.5 277.1 284.8 293.9
260 209.9 217.2 223.7 231.2 289.6 298.6 306.6 316.0
280 227.9 235.5 242.2 250.1 310.7 320.0 328.2 338.0
300 246.0 253.9 260.9 269.1 331.8 341.4 349.9 359.9
320 264.1 272.3 279.6 288.0 352.8 362.7 371.4 381.8
340 282.3 290.8 298.3 307.0 373.8 384.0 393.0 403.6
360 300.5 309.3 317.0 326.1 394.8 405.2 414.5 425.3
380 318.8 327.9 335.8 345.1 415.7 426.5 435.9 447.1
400 337.2 346.5 354.6 364.2 436.6 447.6 457.3 468.7
450 383.2 393.1 401.8 412.0 488.8 500.5 510.7 522.7
500 429.4 439.9 449.1 459.9 540.9 553.1 563.9 576.5
550 475.8 486.9 496.6 507.9 592.9 605.7 616.9 630.1
600 522.4 534.0 544.2 556.1 644.8 658.1 669.8 683.5
650 569.1 581.2 591.9 604.2 696.6 710.4 722.5 736.8
700 615.9 628.6 639.6 652.5 748.4 762.7 775.2 790.0
750 662.9 676.0 687.5 700.8 800.0 814.8 827.8 843.0
800 709.9 723.5 735.4 749.2 851.7 866.9 880.3 896.0
850 757.0 771.1 783.3 797.6 903.2 918.9 932.7 948.8
900 804.3 818.8 831.4 846.1 954.8 970.9 985.0 1002
950 851.5 866.5 879.5 894.6 1006 1023 1037 1054
1000 898.9 914.3 927.6 943.1 1058 1075 1090 1107
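Entries of the chi-square table can also be recomputed in software, which helps for degrees of freedom or probabilities that are not tabulated. The following is a minimal sketch in R, assuming the base function `qchisq`; the names `df`, `p` and `tab` are illustrative only.

```r
#--- chi-square percentage points: quantile at 1 - alpha ---#
df <- c(1, 10, 45, 100)             # degrees of freedom (table rows)
p  <- c(0.025, 0.05, 0.95, 0.975)   # values of 1 - alpha (table columns)

# outer() evaluates qchisq for every (df, p) combination
tab <- outer(df, p, function(d, q) qchisq(q, df = d))
dimnames(tab) <- list(df = df, p = p)

print(round(tab, 3))
```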
Percentage Points of the F-distribution ($F_{m,n;\alpha} = F_{m,n}^{-1}(1-\alpha)$), here: $\alpha = 0.1$
Note that: $F_{m,n;\alpha} = 1 / F_{n,m;1-\alpha}$

Rows: m; columns: n.
m 1 2 3 4 5 6 7 8 9 10 15 20 30 40 50 60 70 80 90 100
1 39.87 8.526 5.538 4.545 4.060 3.776 3.589 3.458 3.360 3.285 3.073 2.975 2.881 2.835 2.809 2.791 2.779 2.769 2.762 2.756
2 49.52 9.000 5.462 4.325 3.780 3.463 3.257 3.113 3.006 2.924 2.695 2.589 2.489 2.440 2.412 2.393 2.380 2.370 2.363 2.356
3 53.62 9.162 5.391 4.191 3.619 3.289 3.074 2.924 2.813 2.728 2.490 2.380 2.276 2.226 2.197 2.177 2.164 2.154 2.146 2.139
4 55.87 9.243 5.343 4.107 3.520 3.181 2.961 2.806 2.693 2.605 2.361 2.249 2.142 2.091 2.061 2.041 2.027 2.016 2.008 2.002
5 57.28 9.293 5.309 4.051 3.453 3.108 2.883 2.726 2.611 2.522 2.273 2.158 2.049 1.997 1.966 1.946 1.931 1.921 1.912 1.906
6 58.24 9.326 5.285 4.010 3.404 3.055 2.827 2.668 2.551 2.461 2.208 2.091 1.980 1.927 1.895 1.875 1.860 1.849 1.841 1.834
7 58.95 9.349 5.266 3.979 3.368 3.014 2.785 2.624 2.505 2.414 2.158 2.040 1.927 1.873 1.840 1.819 1.804 1.793 1.785 1.778
8 59.48 9.367 5.252 3.955 3.339 2.983 2.752 2.589 2.469 2.377 2.119 1.999 1.884 1.829 1.796 1.775 1.760 1.748 1.739 1.732
9 59.90 9.381 5.240 3.936 3.316 2.958 2.725 2.561 2.440 2.347 2.086 1.965 1.849 1.793 1.760 1.738 1.723 1.711 1.702 1.695
10 60.24 9.392 5.230 3.920 3.297 2.937 2.703 2.538 2.416 2.323 2.059 1.937 1.819 1.763 1.729 1.707 1.691 1.680 1.670 1.663
15 61.26 9.425 5.200 3.870 3.238 2.871 2.632 2.464 2.340 2.244 1.972 1.845 1.722 1.662 1.627 1.603 1.587 1.574 1.564 1.557
20 61.78 9.441 5.184 3.844 3.207 2.836 2.595 2.425 2.298 2.201 1.924 1.794 1.667 1.605 1.568 1.543 1.526 1.513 1.503 1.494
30 62.31 9.463 5.168 3.817 3.174 2.800 2.555 2.383 2.255 2.155 1.873 1.738 1.606 1.541 1.502 1.476 1.457 1.443 1.432 1.423
40 62.57 9.472 5.160 3.804 3.157 2.781 2.535 2.361 2.232 2.132 1.845 1.708 1.573 1.506 1.465 1.437 1.418 1.403 1.391 1.382
50 62.73 9.477 5.155 3.795 3.147 2.770 2.523 2.348 2.218 2.117 1.828 1.690 1.552 1.483 1.441 1.413 1.392 1.377 1.365 1.355
60 62.84 9.480 5.151 3.790 3.140 2.762 2.514 2.339 2.208 2.107 1.817 1.677 1.538 1.467 1.424 1.395 1.374 1.358 1.346 1.336
70 62.91 9.483 5.146 3.786 3.135 2.756 2.508 2.333 2.202 2.100 1.808 1.667 1.527 1.455 1.412 1.382 1.361 1.344 1.332 1.321
80 62.97 9.484 5.144 3.782 3.132 2.752 2.504 2.328 2.196 2.095 1.802 1.660 1.519 1.447 1.402 1.372 1.350 1.334 1.321 1.310
90 63.01 9.486 5.143 3.780 3.129 2.749 2.500 2.324 2.192 2.090 1.797 1.655 1.512 1.439 1.395 1.364 1.342 1.325 1.312 1.301
100 63.05 9.487 5.142 3.778 3.126 2.746 2.497 2.321 2.189 2.087 1.793 1.650 1.507 1.434 1.388 1.358 1.335 1.318 1.304 1.293
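The reciprocal relation between upper and lower F percentage points can be verified numerically. The following is a minimal sketch in R, assuming the base function `qf`; the values of `m` and `n` and the names `upper` and `lower_swapped` are chosen here for illustration.

```r
#--- F percentage points and the reciprocal identity ---#
alpha <- 0.1
m <- 5    # first degrees of freedom (table row)
n <- 10   # second degrees of freedom (table column)

# F_{m,n;alpha} is the (1 - alpha)-quantile of the F(m, n) distribution
upper <- qf(1 - alpha, df1 = m, df2 = n)

# F_{n,m;1-alpha} is the alpha-quantile of F(n, m); the degrees of freedom are swapped
lower_swapped <- qf(alpha, df1 = n, df2 = m)

print(c(upper = upper, reciprocal = 1 / lower_swapped))
```

Since only upper percentage points are tabulated, the identity gives the lower ones: for instance, $F_{10,5;0.9} = 1 / F_{5,10;0.1}$.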