1.017/1.010 Class 18 Small Sample Statistics

1.017/1.
010 Class 18
Small Sample Statistics
Small Samples
When sample size N is small the estimator of a distributional property a

(mean, variance, 90 percentile, etc.) is generally not normal.
In this case, the CDF’s of the estimate aˆ and standardized statistic z

(used to derive confidence intervals and hypothesis tests) can be
approximated with stochastic simulation.
In order to generate random replicates in the stochastic simulation

we need to specify the property a (or parameters that are related to
it):
For estimating confidence intervals we assume a = aˆ (the

estimate computed from the actual data).
For testing hypotheses we assume a = a0 (the hypothesized

parameter value).
The stochastic simulation uses many Nrep random sample replicates, each
of length N, to generate Nrep estimates. The desired estimate and
standardized statistic CDFs are derived from this ensemble of estimates.
Example – Small-sample two-sided confidence Intervals for the

mean of an exponential distribution
Consider a small sample that is thought to be drawn from an

exponential distribution with unknown parameter a:
[x1, x2, x3, x4, x5] = [0.05 1.46 0.50 0.72 0.11 ]
The sample mean is an unbiased estimator of a:
aˆ = m x = 0.57
As in the large sample case we derive the a confidence interval

from a standardized statistic z. Replicate i of z is:.
aˆ i − maˆ
z i (aˆ i , maˆ ) =
s aˆ
1
where maˆ and s aˆ .are the sample mean and standard deviation
computed over the ensemble of all estimate replicates aˆ i (e.g. i =
1,…, Nrep). Each â i is derived from N = 5 data values obtained
from the MATLAB function exprnd, with a = m x = 0.57 .
The CDF Fz(z) is obtained by plotting the zi replicates with cdfplot

or normplot.
Fz(z) for this example clearly deviates from the unit normal at both
high and low values:
p/2
z zL zU
Fz(z) is used as in the large sample case to identify the zL and zU

values:
α   α
z L = FZ−1   zU = FZ−1 1 − 
2  2
The small-sample double-sided 95% confidence interval for a is

approximately:
a L = aˆ − zU s aˆ = 0.56 - (+2.1)(0.255) = 0.02

aU = aˆ − z L s aˆ = 0.56 - (-1.5)(0.255) = 0.94
0.02 ≤ a ≤ 0.94
For comparison, the 95% doubled-sided large-sample (normal)

interval:
0.06 ≤ a ≤ +1.06
2
The difference is slight considering the small sample size. The
difference in the small and large-sample 99% confidence intervals
is greater.
Example – Small-sample two-sided test of a hypothesis about the

mean of an exponential distribution
Consider in the above example the hypothesis:
H0: a=1.0
We can derive the rejection region and p value for this hypothesis
with a stochastic simulation similar to the performed above except
that we use a = a0 =1.0 in exprnd and derive Fz(z) from replicates
defined as follows:
i ˆi aˆ i − a0 aˆ i − 1.0
z (a , a0 ) = =
s aˆ s aˆ
In this case the Fz(z) plot is the same as the one shown above.
The test statistic obtained from the observed sample mean is:
aˆ − a0 0.56 − 1.0
z (aˆ , a0 ) = = = −1.73
s aˆ 0.255
This gives a p value of approximately 0.004 (see figure), leading us

reject the hypothesis.
Special Case: Normally Distributed Samples
If random sample(s) are normally distributed it is possible to derive the

exact small sample CDFs of certain standardized statistics
Two-sided Confidence Intervals for Small Normally Distributed Samples
Confidence Intervals for the Mean E(x):
mx − a
Standardized statistic: t (m x , a) =
sx
N
This has a t distribution with n=N-1 degrees of freedom.
Confidence interval:
3
 α s α  s
m x − Ft−, ν1 1 −  x ≤ a ≤ m x − Ft−, ν1   x
 2 N 2 N
Evaluate Ft−,ν1 with MATLAB function tinv.
Confidence Intervals for the Variance Var(x):
( N − 1) s x2
Standardized statistic: χ 2 ( s x2 , σ x2 ) =
σ x2
This has a Chi-squared distribution with ν =N - 1 degrees of freedom.
Confidence interval:
( N − 1) s x2 ( N − 1) s x2
≤ Var[ x ] ≤
 α α 
F χ−21,ν 1 −  F χ−21,ν  
 2 2
Evaluate F χ−21,ν with MATLAB function chi2inv.
Two-sided Hypothesis Tests for Small Normally Distributed Samples
Hypothesis Tests about the Mean E(x):
H 0 : E ( x) = a0
m x − a0
Use t test statistic (ν = N -1) : t (m x , a 0 ) =
sx
N
p value:
p
= Ft , ν [t (m x , a0 )] for Ft , ν ≤ 0.5
2
p
1 − = Ft , ν [t (m x , a0 )] for Ft , ν > 0.5
2
Evaluate Ft , ν with MATLAB function tcdf.
Hypothesis Tests about the Variance Var(x)
4
H 0 : Var ( x) = σ x2 = a0
( N − 1) s x2
Use Chi - squared test statistic (ν = N -1) : χ 2 ( s x2 , a0 ) =
a0
p
= F χ 2 ,ν [ χ 2 ( s x2 ,a0 )] for F χ 2 ,ν ≤ 0.5
2
p
1 − = F χ 2 ,ν [ χ 2 ( s x2 ,a0 )] for F χ 2 ,ν > 0.5
2
Evaluate Ft , ν with MATLAB function chi2cdf.

t CDF
ν=
ν=1
Chi-squared
ν=4
ν = 10
Copyright 2003 Massachusetts Institute of Technology

Last modified Oct. 8, 2003

1.017/1.010 Class 18 Small Sample Statistics

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Originaltitel

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

1.017/1.010 Class 18 Small Sample Statistics

Hochgeladen von

Copyright:

Verfügbare Formate

1.017/1.

When sample size N is small the estimator of a distributional property a

In this case, the CDF’s of the estimate aˆ and standardized statistic z

In order to generate random replicates in the stochastic simulation

For estimating confidence intervals we assume a = aˆ (the

For testing hypotheses we assume a = a0 (the hypothesized

Example – Small-sample two-sided confidence Intervals for the

Consider a small sample that is thought to be drawn from an

The sample mean is an unbiased estimator of a:

As in the large sample case we derive the a confidence interval

The CDF Fz(z) is obtained by plotting the zi replicates with cdfplot

Fz(z) is used as in the large sample case to identify the zL and zU

The small-sample double-sided 95% confidence interval for a is

a L = aˆ − zU s aˆ = 0.56 - (+2.1)(0.255) = 0.02

For comparison, the 95% doubled-sided large-sample (normal)

Example – Small-sample two-sided test of a hypothesis about the

Consider in the above example the hypothesis:

This gives a p value of approximately 0.004 (see figure), leading us

Special Case: Normally Distributed Samples

If random sample(s) are normally distributed it is possible to derive the

Two-sided Confidence Intervals for Small Normally Distributed Samples

Confidence Intervals for the Mean E(x):

This has a t distribution with n=N-1 degrees of freedom.

Evaluate Ft−,ν1 with MATLAB function tinv.

Confidence Intervals for the Variance Var(x):

This has a Chi-squared distribution with ν =N - 1 degrees of freedom.

Evaluate F χ−21,ν with MATLAB function chi2inv.

Two-sided Hypothesis Tests for Small Normally Distributed Samples

Hypothesis Tests about the Mean E(x):

Evaluate Ft , ν with MATLAB function tcdf.

Hypothesis Tests about the Variance Var(x)

Evaluate Ft , ν with MATLAB function chi2cdf.

Copyright 2003 Massachusetts Institute of Technology

Das könnte Ihnen auch gefallen