

STATISTICAL INFERENCE

Statistical Inference
Concerned with making generalizations about characteristics of a population (unknown values of parameters) on the basis of data from a sample taken (observed values of a statistic).
Parameter: a numerical characteristic of a population.
Statistic: a numerical characteristic computed from sample data.
Employs the inductive method of inference, as opposed to the deductive method.
BASIC MODEL (FRAMEWORK) OF STATISTICAL INFERENCE

[Diagram: a POPULATION is reduced by SAMPLING to a SAMPLE, which yields DATA; STATISTICAL INFERENCE uses the data to generalize back to the population, with PROBABILITY connecting the two and governing the quality of inference.]
Sampling
Sampling: the process of selecting a part of a population.
Methods:
Probability sampling (random, scientific sampling): the selection of elements from the population is governed by chance (an objective method).
Non-probability sampling (purposive, quota): the selection of elements from the population is determined by the sampler (not by chance).
Probability Sampling
Each element in the population has a non-zero (not
necessarily equal) chance of being selected.
The selection probability can be computed and is known (i.e., the probability that a unit in the population is included in the sample).
Requires a list of some kind of the elements of the
population (sampling frame).
Regarded as a RANDOM EXPERIMENT.
Some Methods of (Probability)
Sampling
Simple Random Sampling (SRS)
Each possible sample of n distinct units has the same chance of being selected.
From a population of N units, there are a total of $\binom{N}{n}$ possible samples.
Equivalently, select one unit at a time, ensuring that each unit remaining after the earlier draws has the same chance of being selected.
Stratified Random Sampling
Elements are divided into groups, called strata.
Samples of size n_i are selected independently from each stratum, with n = n_1 + n_2 + ... + n_L.
Some Methods of (Probability)
Sampling
Cluster Sampling
Elements of the population (usually geographically contiguous) are grouped into clusters.
A sample of clusters is selected, and every element of the sampled clusters is included.
Multi-stage Sampling
Elements are first grouped into clusters.
A sample of clusters is selected.
From each sampled cluster, a sample of units is selected (two-stage sampling).

Claim
Sampling is a random experiment.
It is a process of making observations (the composition of the sample, and the sample data itself).
It has well-defined outcomes: the set of possible samples. In practice, we only obtain a single sample.
We cannot predict with certainty which sample will be selected.
A statistic is a random variable.
A statistic is a numerical quantity computed from sample data and used to describe a sample.
Claim
The probability distribution of a statistic is called its sampling distribution.
One such statistic is the sample mean.
Sampling distribution of the sample mean: If the population from which samples are to be drawn is normally distributed with mean μ and variance σ², then

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Example
An illustration: Suppose the hypothetical members of the population are five households. The population has a measurable characteristic, which we often call a variable. Consider monthly income as the variable of interest.
Further, let us say that the values of X for units A, B, C, D, and E are respectively equal to 5,000, 7,000, 10,000, 11,000, and 9,000. Drawing simple random samples of size n = 3, we have the following possible sample data:
Example
Sample | Sample Units | Sample Data          | m(1) = min | x̄ = mean
1      | (A,B,C)      | (5000, 7000, 10000)  | 5000       | 7333.33
2      | (A,B,D)      | (5000, 7000, 11000)  | 5000       | 7666.67
3      | (A,B,E)      | (5000, 7000, 9000)   | 5000       | 7000
4      | (A,C,D)      | (5000, 10000, 11000) | 5000       | 8666.67
5      | (A,C,E)      | (5000, 10000, 9000)  | 5000       | 8000
6      | (A,D,E)      | (5000, 11000, 9000)  | 5000       | 8333.33
7      | (B,C,D)      | (7000, 10000, 11000) | 7000       | 9333.33
8      | (B,C,E)      | (7000, 10000, 9000)  | 7000       | 8666.67
9      | (B,D,E)      | (7000, 11000, 9000)  | 7000       | 9000
10     | (C,D,E)      | (10000, 11000, 9000) | 9000       | 10000
Example
Probability Distribution of m(1) = min [Sampling Distribution of m(1)]:

m           | 5000 | 7000 | 9000
P[m(1) = m] | 6/10 | 3/10 | 1/10
Example
Probability Distribution of [Sampling Distribution of] x̄:

x̄       | P[X̄ = x̄]
7000    | 1/10
7333.33 | 1/10
7666.67 | 1/10
8000    | 1/10
8333.33 | 1/10
8666.67 | 2/10
9000    | 1/10
9333.33 | 1/10
10000   | 1/10
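A short Python sketch (not part of the original slides; the data are the five incomes above) that enumerates all C(5,3) = 10 samples and reproduces the two sampling distributions:

from itertools import combinations
from collections import Counter
from statistics import mean

# Monthly incomes of the five hypothetical households (from the example)
income = {"A": 5000, "B": 7000, "C": 10000, "D": 11000, "E": 9000}

samples = list(combinations(income, 3))            # all C(5,3) = 10 possible samples
mins = [min(income[u] for u in s) for s in samples]
means = [round(mean(income[u] for u in s), 2) for s in samples]

n = len(samples)
print("Sampling distribution of m(1) = min:")
for value, count in sorted(Counter(mins).items()):
    print(f"  {value}: {count}/{n}")

print("Sampling distribution of the sample mean:")
for value, count in sorted(Counter(means).items()):
    print(f"  {value}: {count}/{n}")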
Sampling Distribution of the
sample mean
In this case, the mean (over all possible simple random samples of size n) of the sample means is

$E(\bar{X}) = \mu$

Its variance (also called the sampling variance) is

$Var(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$
Sampling Distribution of the
sample mean
The standard deviation of the sample mean is

$SE(\bar{X}) = \sigma_{\bar{X}} = \sqrt{\sigma^2_{\bar{X}}} = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$

In general, the standard deviation of any statistic is called its STANDARD ERROR, hence the notation SE.
Central Limit Theorem
If we are sampling from any population with mean μ and variance σ² (even if the parent population is not normal), then the sample mean is approximately normally distributed for a sufficiently large sample size (n at least 25). That is, for sufficiently large n (at least 25),

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ (approximately)
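A small simulation sketch of the CLT (my own illustration; the Exponential(1) population, the sample sizes, and the replication count are arbitrary choices, not from the slides):

import numpy as np

rng = np.random.default_rng(12345)
mu, sigma = 1.0, 1.0      # mean and standard deviation of an Exponential(1) population (not normal)

for n in (5, 25, 100):
    # 20,000 samples of size n, one sample mean per row
    sample_means = rng.exponential(scale=1.0, size=(20_000, n)).mean(axis=1)
    print(f"n={n:3d}  mean of the sample means = {sample_means.mean():.3f}  "
          f"sd of the sample means = {sample_means.std(ddof=1):.3f}  "
          f"sigma/sqrt(n) = {sigma / np.sqrt(n):.3f}")

As n grows, the empirical standard deviation of the sample means matches σ/√n and their distribution becomes approximately normal, even though the population is skewed.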
Implication
If $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, then

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

This means that (under the normality conditions) one can easily solve problems of the form $P(a < \bar{X} < b)$.
Example
Suppose that it is believed that the heights of students are normally distributed with mean 64 in. and standard deviation 6 in. If a random sample of 16 students were to be selected, how likely is it to obtain a mean height of 75 in.?
Let X denote the height of a student. Then X ~ N(64, 36). Because we are sampling from a normal population,

$\bar{X} \sim N\left(64, \frac{36}{16}\right)$
Example
Thus,

$P(\bar{X} > 75) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{75 - 64}{6/4}\right) = P(Z > 7.33) \approx 0$

If heights were indeed N(64, 36), then getting a sample mean of 75 in. based on a random sample of 16 is highly unlikely. Therefore, if the actual computed sample mean is at least 75 in., then the assumed population of heights is questionable.
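A sketch of the same calculation (assuming SciPy is available for the normal tail probability):

from math import sqrt
from scipy.stats import norm

mu, sigma, n = 64, 6, 16
se = sigma / sqrt(n)          # standard error of the mean = 6/4 = 1.5
z = (75 - mu) / se            # z = 7.33
p = norm.sf(z)                # P(Z > 7.33), essentially 0
print(f"z = {z:.2f}, P(X-bar > 75) = {p:.2e}")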
Inference: Major Concerns
The major concerns of statistical inference are:
Point Estimation
Interval Estimation
Hypothesis Testing
Point Estimation: Basic
Concepts
ESTIMATOR: a rule (a real-valued function defined on samples) that is used to generate values that serve as the basis for making generalizations (inference) about the unknown value of a (population) parameter.
ESTIMATE: a single value computed from a sample (sample data) by applying the estimator to that sample. Estimates are possible values of an estimator.
Point Estimation - Remarks
An estimator is a statistic. Because a statistic is regarded as a random variable, we regard an estimator as a random variable as well.
Several estimates can be computed for every estimator. This is because there are many possible samples that can be drawn from a given population, and these samples are generally different from one another (hence yield different data sets).
Point Estimation: Objective
For a given parameter, there are several estimators
to choose from.
The main problem of point estimation is to choose
an estimator that has a tendency to give estimates
that are close to the true value of a parameter.
There is a need for criteria that define a good estimator. Ideally, a good estimator is one that yields estimates exactly equal to, or very close to, the true value of the parameter (the object of inference).
Unbiasedness
Let T be a parameter and let x_1, x_2, ..., x_n be a random sample from the population. Let t = t(x_1, x_2, ..., x_n) be an estimator. The estimator t is said to be UNBIASED for T (an unbiased estimator of T) if E(t) = T.
That is, if all possible samples were to be drawn and an estimate computed from each sample, then the estimator t is unbiased for T if the mean of all the estimates is equal to T.
The estimates (computed from different samples) tend to cluster around the value of T.
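A simulation sketch of unbiasedness (my own illustration, not from the slides): averaging estimates over many samples shows that the sample mean is unbiased for μ, while the variance estimator that divides by n is biased for σ² and the divisor n-1 removes the bias.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 10.0, 4.0, 5, 100_000

# reps random samples of size n from N(mu, sigma2)
samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

xbar = samples.mean(axis=1)            # sample mean: unbiased for mu
s2_n = samples.var(axis=1, ddof=0)     # variance with divisor n: biased for sigma2
s2_n1 = samples.var(axis=1, ddof=1)    # variance with divisor n-1: unbiased for sigma2

print("average of xbar       :", round(xbar.mean(), 3), " target mu =", mu)
print("average of s2 (div n) :", round(s2_n.mean(), 3), " target sigma2 =", sigma2)
print("average of s2 (n-1)   :", round(s2_n1.mean(), 3), " target sigma2 =", sigma2)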
Illustration
The bull's-eye represents the value of T, the hits represent the estimates, and the shooter represents the estimator.
[Target diagrams: UNBIASED vs. BIASED]
Unbiasedness: Remarks
Unbiasedness is a good property of an estimator.
If the estimator is not unbiased, it is referred to as a
BIASED estimator.
Even if an estimator is unbiased, it can give an
estimate that is very far from the value of the
parameter T (not good).
Similarly, biased estimators may yield estimates that are close (even equal) to the value of the parameter T.
Unbiasedness alone does not make a good estimator.
Precision
Precision of estimators measures the extent to
which estimates are close to one another. The
closer the estimates are, the more precise the
estimator.
[Target diagrams: PRECISE vs. NOT PRECISE]
Precision: Remarks
Often measured by the standard error of an
estimator. Smaller SE values imply more precise
estimates.
Often, the SE decreases as the sample size increases (for the sample mean it is proportional to 1/√n). That is, larger sample sizes lead to smaller SE values and hence more precise estimators.
It is possible that a precise estimator may yield
estimates that are far from the true value of the
parameter.
Accuracy
An estimator is said to be ACCURATE if it has a tendency to yield estimates that are close to the true value of the parameter.
[Target diagrams: ACCURATE vs. NOT ACCURATE]
Accuracy: Remarks
The best estimator is the most accurate.
An unbiased estimator that is precise is accurate.
Unbiased estimators are not always accurate or
precise.
Precise estimators are not always unbiased or
accurate.
A biased estimator may be accurate.
[Target diagrams, one per case:]
UNBIASED AND PRECISE (ACCURATE)
UNBIASED BUT NOT PRECISE (NOT ACCURATE)
BIASED BUT PRECISE (MAY BE ACCURATE)
BIASED AND NOT PRECISE
Which is the best estimator?
Estimating the Population
Mean
From a random sample of size n, the sample mean is an UNBIASED estimator of the population mean.
The TRUE (theoretical) standard error of the sample mean is given by

$SE(\bar{x}) = \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

where σ is the population standard deviation.
Estimating the Population
Mean
Note that the true standard error of the sample mean is a function of the (usually unknown) parameter σ.
Its primary use is to allow the identification of conditions in which the estimator can be considered precise (recall: unbiased + precise = accurate).
From the expression, the sample mean is precise for large n. A larger sample size is required for a highly variable population.
Estimating the Population
Mean
If σ is unknown (usually the case), one can estimate the standard error of the sample mean from the same sample data as

$s_{\bar{x}} = s(\bar{x}) = \frac{s}{\sqrt{n}}$, where $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ is the sample standard deviation.
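A minimal sketch of these formulas on a small made-up data set (using NumPy):

import numpy as np

x = np.array([5000, 7000, 10000, 11000, 9000])   # made-up sample data
n = x.size
s = x.std(ddof=1)                                 # sample standard deviation (divisor n-1)
se_xbar = s / np.sqrt(n)                          # estimated standard error of the sample mean
print(f"x-bar = {x.mean():.2f}, s = {s:.2f}, estimated SE(x-bar) = {se_xbar:.2f}")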
Estimating the Population
Proportion
The population proportion P is defined as the ratio of the number of elements in the population that possess the attribute of interest to the total number of elements in the population.
The sample proportion, p, is the ratio of the number of elements in the sample that possess the attribute of interest to the sample size.
The sample proportion is an UNBIASED estimator of the population proportion based on a simple random sample of size n.
Estimating the Population
Proportion
The true standard error of the sample proportion is

$\sigma_p = \sigma(p) = \sqrt{\frac{P(1-P)}{n}}$

where P is the true population proportion.
The standard error is largest when P = 1/2 and becomes smaller as P approaches 0 or 1.
Estimating the Population
Proportion
The true standard error of the sample proportion is estimated as

$s_p = s(p) = \sqrt{\frac{p(1-p)}{n}}$

where p is the sample proportion.
Interval Estimation
In practice, once an estimator is chosen, we select only one out of the many (many, many) possible samples of size n and use the estimate computed from it as the basis for our inference about the true value of the parameter.
If the estimator chosen is accurate, then we expect that the single estimate computed should be one of those estimates that are close to the true (and unknown) value of the parameter.
We may be a little bit uncomfortable with a single value.
Interval Estimation
Hence, instead of a point estimator, we employ an interval estimator.
An INTERVAL ESTIMATOR of a parameter θ is of the form a ≤ θ ≤ b, where a and b are statistics.
It is a range of values from a to b, obtained from sample data, that gives an estimate of the most likely values that the parameter can assume.
The problem is how to choose a and b so that they are close to one another and, at the same time, the interval has the largest chance of containing the value of the parameter.
Confidence Interval
An interval estimator of a parameter becomes a
CONFIDENCE INTERVAL estimator if a confidence
coefficient is attached to it.
A CONFIDENCE COEFFICIENT is the level of confidence, which ranges from 0 to 1 (or 0% to 100%). The closer it is to 1, the more confident we are that the true value of the parameter is indeed contained within the interval.
It is usually expressed as (1-α), where α is the amount of error in estimation, that is, the amount of error we are willing to commit (the chance that the true value of the parameter is not contained in the interval chosen).
Confidence Interval Estimate
of the Mean
Assuming normality of x̄, a (1-α) x 100% Confidence Interval Estimate of μ is:

$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, i.e., $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = \bar{x} \pm d$

In here, $d = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is often referred to as the maximum allowable error.

$z_{\alpha/2}$ is the (1-α/2)th percentile point of the z-distribution. That is, it is the point in the z-table such that $P(Z < z_{\alpha/2}) = 1 - \frac{\alpha}{2}$.
Confidence Interval Estimate
of the Mean
Recall that $P(Z < z_{\alpha/2}) = 1 - \frac{\alpha}{2}$.

Example: $z_{0.10} = 1.28$, $z_{0.05} = 1.645$, $z_{0.025} = 1.96$, $z_{0.01} = 2.33$
Example
Suppose we want to estimate the mean profit of 10,000 small establishments involved in food manufacturing. From a similar census in the past, the standard deviation of profit is about 85,000 pesos. A random sample of 40 establishments was obtained, and their profits gave a mean of 100,000 pesos.
Example (contd.)
A 95% confidence interval estimate of the true mean profit is

$100,000 - z_{0.025}\frac{85,000}{\sqrt{40}} \le \mu \le 100,000 + z_{0.025}\frac{85,000}{\sqrt{40}}$

$100,000 - 1.96(13,439.7) \le \mu \le 100,000 + 1.96(13,439.7)$

$100,000 - 26,341.8 \le \mu \le 100,000 + 26,341.8$

$73,658.2 \le \mu \le 126,341.8$

This means that we are 95% confident that the true mean profit of the 10,000 small establishments is between P73,658.2 and P126,341.8.
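A sketch that reproduces the interval above (assuming SciPy for the z quantile; it uses the exact quantile 1.95996 rather than 1.96, so the limits differ slightly in the last digits):

from math import sqrt
from scipy.stats import norm

xbar, sigma, n, conf = 100_000, 85_000, 40, 0.95
z = norm.ppf(1 - (1 - conf) / 2)        # z_{0.025}, approximately 1.96
d = z * sigma / sqrt(n)                 # maximum allowable error, about 26,341
print(f"{conf:.0%} CI for the mean profit: ({xbar - d:,.1f}, {xbar + d:,.1f})")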
Interpretation
The confidence interval estimate computed is just one of the many possible interval estimates; about 95% of such intervals contain the true value of the population mean.
[Diagram: different interval estimates from different samples, most of them covering the true mean μ]
Confidence Interval Estimate
of the Population Proportion
In the case of a population proportion, a (1-α) x 100% confidence interval estimate is:

$p \pm z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$

In here, $d = z_{\alpha/2}\sqrt{\frac{P(1-P)}{n}}$ is the maximum allowable error. (Since P is unknown, in practice it is replaced under the square root by the sample proportion p.)
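A sketch of the proportion interval with hypothetical numbers (180 successes out of n = 400 are my own made-up values, assuming SciPy), replacing the unknown P under the square root with the sample proportion p:

from math import sqrt
from scipy.stats import norm

successes, n, conf = 180, 400, 0.95     # hypothetical: 180 of 400 sampled units have the attribute
p = successes / n
z = norm.ppf(1 - (1 - conf) / 2)
d = z * sqrt(p * (1 - p) / n)           # estimated maximum allowable error
print(f"p = {p:.3f}, {conf:.0%} CI for P: ({p - d:.3f}, {p + d:.3f})")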
Sample Size
A good interval estimate is one in which the lower limit and the upper limit are as close to each other as possible (shorter width), with the highest possible level of confidence.
This can be done by ensuring that the maximum error is small, which in turn is achieved by increasing the sample size.
Sample Size
Thus, to come up with a good confidence interval estimate, one should find the sample size that achieves the desired value of the maximum allowable error. That is, if we specify the value of d, then for a given level of confidence we have

$d = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \;\Rightarrow\; n = \left[\frac{z_{\alpha/2}\,\sigma}{d}\right]^2$

For proportions, let $\sigma^2 = P(1-P)$.
Example
Suppose that σ = 10. Find the sample size such that we are 95% confident that the difference between the estimate and the true mean will not exceed 2.
Solution: Here d = 2, and therefore

$n = \left[\frac{(1.96)(10)}{2}\right]^2 = 96.04 \approx 96$
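The same computation as a short sketch:

z, sigma, d = 1.96, 10, 2
n = (z * sigma / d) ** 2
print(n)   # 96.04, rounded to 96 in the slide (rounding up to 97 is also common in practice)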
Confidence Interval Estimate of the Population Mean when σ is Unknown
When the population standard deviation σ is unknown, a (1-α) x 100% confidence interval estimate of the population mean is

$\bar{x} \pm t_{\alpha/2,(n-1)}\frac{s}{\sqrt{n}}$

$t_{\alpha/2,(n-1)}$ is obtained from the Student's t table with degrees of freedom (df) = n-1.
For large n, $t_{\alpha/2,(n-1)} \approx z_{\alpha/2}$.
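A sketch of the t-based interval on made-up data (assuming SciPy for the t quantile):

import numpy as np
from scipy.stats import t

x = np.array([12.1, 9.8, 11.4, 10.6, 12.9, 8.7, 11.0, 10.2])   # made-up sample
n, conf = x.size, 0.95
xbar, s = x.mean(), x.std(ddof=1)
t_crit = t.ppf(1 - (1 - conf) / 2, df=n - 1)     # t_{alpha/2, n-1}
d = t_crit * s / np.sqrt(n)
print(f"{conf:.0%} CI for the mean: ({xbar - d:.2f}, {xbar + d:.2f})")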