
The Chi-Squared Distribution

Let $\nu$ be a positive integer. Then a random variable X is said to have a
chi-squared distribution with parameter $\nu$ if the pdf of X is

$$f(x;\nu) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{(\nu/2)-1} e^{-x/2}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

6/12/11
01:22:08 PM
IIT Bombay (IC102)
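The chi-squared pdf can be evaluated directly from its formula. The following is a minimal sketch using only the Python standard library; the function names `chi2_pdf` and `integrate_pdf` are illustrative and not part of the slides. The numerical integration is only a sanity check that the density integrates to about 1.

```python
import math


def chi2_pdf(x: float, v: int) -> float:
    """pdf of the chi-squared distribution with v degrees of freedom."""
    if v <= 0:
        raise ValueError("v must be a positive integer")
    if x < 0:
        return 0.0
    if x == 0 and v == 1:
        # The density is unbounded at 0 when v = 1.
        return math.inf
    return x ** (v / 2 - 1) * math.exp(-x / 2) / (2 ** (v / 2) * math.gamma(v / 2))


def integrate_pdf(v: int, upper: float = 60.0, steps: int = 60000) -> float:
    """Midpoint-rule approximation of the integral of the pdf over [0, upper]."""
    h = upper / steps
    return sum(chi2_pdf((k + 0.5) * h, v) for k in range(steps)) * h


area = integrate_pdf(4)  # should be close to 1 for v = 4
print(f"area under the pdf for v = 4: {area:.4f}")
```

The upper limit of 60 is an assumption that works for small v, where the tail beyond it is negligible; larger v would need a larger limit.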
The Chi-Squared Distribution
The parameter $\nu$ is called the number of
degrees of freedom (df) of X. The
symbol $\chi^2$ is often used in place of
"chi-squared".
If $X_1, X_2, \ldots, X_n$ are n i.i.d. r.v.
following $N(\mu, \sigma^2)$, then

$$\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2_n$$
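The link between normal samples and the chi-squared distribution can be checked by simulation. The sketch below (standard library only) draws the sum of squared standardized normal values many times and compares the empirical mean and variance with n and 2n, the mean and variance of a chi-squared r.v. with n d.f. The values n = 5, $\mu$ = 5.4, $\sigma$ = 0.2, the seed and the number of repetitions are illustrative assumptions.

```python
import random
import statistics


def sum_sq_standardized(n, mu, sigma, rng):
    """One draw of sum(((X_i - mu) / sigma)**2) for n i.i.d. N(mu, sigma^2) values."""
    return sum(((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(n))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5
draws = [sum_sq_standardized(n, 5.4, 0.2, rng) for _ in range(20000)]

mean_w = statistics.fmean(draws)    # should be near n
var_w = statistics.variance(draws)  # should be near 2n
print(f"mean = {mean_w:.3f} (n = {n}), variance = {var_w:.3f} (2n = {2 * n})")
```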
Chi-squared Critical Value
Let $\chi^2_{\alpha,n}$, called a chi-squared critical
value, denote the number on the
measurement axis such that $\alpha$ of the area
under the chi-squared curve with n d.f. lies
to the right of $\chi^2_{\alpha,n}$.
Notation Illustrated
[Figure: pdf of the $\chi^2_n$ distribution; the shaded area to the right of $\chi^2_{\alpha,n}$ equals $\alpha$.]
If X is a chi-sq r.v. with n d.f., then for $\alpha \in (0,1)$,
the quantity $\chi^2_{\alpha,n}$ is defined to be such that

$$P\{X \ge \chi^2_{\alpha,n}\} = \alpha$$
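Critical values are normally read from tables, but the defining equation can also be solved numerically. The sketch below is one possible approach, not the method used in the course: it computes the chi-squared cdf from the series for the regularized lower incomplete gamma function and then finds the critical value by bisection. The names `chi2_cdf` and `chi2_critical` are hypothetical helpers.

```python
import math


def chi2_cdf(x: float, n: int) -> float:
    """P{X <= x} for X chi-squared with n d.f. (series for the incomplete gamma)."""
    if x <= 0:
        return 0.0
    a, t = n / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= t / (a + k)
        total += term
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))


def chi2_critical(alpha: float, n: int) -> float:
    """Return c such that P{X >= c} = alpha, found by bisection."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    lo, hi = 0.0, 1.0
    while 1 - chi2_cdf(hi, n) > alpha:  # expand until the root is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(chi2_critical(0.05, 10), 3))
```

Standard chi-squared tables list 18.307 for $\alpha$ = 0.05 and n = 10, which gives a reference point for the result.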
Cont. distn.
1. Uniform
2. Normal
3. Gamma
4. Exponential
5. Chi-squared
6. t-dist.
7. F-dist.
Random Samples
Data from random samples are used to infer
a population characteristic of interest.
The distribution of the population variable is usually known,
except for some unknown population parameters.
Problems in which the form of the underlying distn. is
specified up to a set of unknown parameters are called
parametric inference problems.
Random Samples
The rvs X1, ..., Xn are said to form a
simple random sample of size n if
1. The Xi's are independent rvs.
2. Every Xi has the same (identical)
probability distribution.
i.e., if X1, ..., Xn are independent r.v. having a
common (identical) distn. F, then we say that they form an
i.i.d. random sample of size n from the distn. F.
Distribution of a Linear Combination of Random Variables
Linear Combination
Given a collection of n random
variables X1, ..., Xn and n numerical
constants a1, ..., an, the r.v.

$$Y = a_1 X_1 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i$$

is called a linear combination of the Xi's.
Expected Value of a Linear Combination
Let X1, ..., Xn have mean values $\mu_1, \mu_2, \ldots, \mu_n$
and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$,
respectively.
Whether or not the Xi's are independent,

$$E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n) = a_1 \mu_1 + \cdots + a_n \mu_n$$
For identically distributed Xi's and ai = 1/n,
we get $E(\bar{X}) = \mu$.
Variance of a Linear Combination
If X1, ..., Xn are independent,

$$V(a_1 X_1 + \cdots + a_n X_n) = a_1^2 V(X_1) + \cdots + a_n^2 V(X_n) = a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2$$

and

$$\sigma_{a_1 X_1 + \cdots + a_n X_n} = \sqrt{a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2}$$
For i.i.d. Xi's and ai = 1/n, we get $V(\bar{X}) = \sigma^2/n$.
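The mean and variance rules for a linear combination are easy to apply numerically. The sketch below assumes independent Xi's for the variance and uses illustrative values (n = 25, $\mu$ = 5.4, $\sigma$ = 0.2); the function names are hypothetical. With every ai = 1/n it reproduces the mean and variance of the sample mean.

```python
import math


def lincomb_mean(a, mu):
    """E(a1*X1 + ... + an*Xn); holds whether or not the Xi's are independent."""
    return sum(ai * mi for ai, mi in zip(a, mu))


def lincomb_variance_indep(a, sigma):
    """V(a1*X1 + ... + an*Xn), valid only when the Xi's are independent."""
    return sum(ai ** 2 * si ** 2 for ai, si in zip(a, sigma))


n = 25
a = [1 / n] * n      # weights that turn the linear combination into the sample mean
mu = [5.4] * n       # identical means
sigma = [0.2] * n    # identical SDs

mean_xbar = lincomb_mean(a, mu)              # equals mu
var_xbar = lincomb_variance_indep(a, sigma)  # equals sigma^2 / n
sd_xbar = math.sqrt(var_xbar)                # equals sigma / sqrt(n)
print(mean_xbar, var_xbar, sd_xbar)
```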
Variance of a Linear Combination
For any X1, ..., Xn,

$$V(a_1 X_1 + \cdots + a_n X_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \,\mathrm{Cov}(X_i, X_j)$$
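The double sum can be written out directly. In this sketch the covariance matrix is passed as a list of lists, with Cov(Xi, Xi) = V(Xi) on the diagonal; the function name and the two example matrices are assumptions made for illustration. With zero off-diagonal entries (uncorrelated Xi's, which includes the independent case) the result reduces to the sum of ai^2 V(Xi).

```python
def lincomb_variance(a, cov):
    """V(a1*X1 + ... + an*Xn) = sum over i, j of a_i * a_j * Cov(X_i, X_j)."""
    n = len(a)
    if len(cov) != n or any(len(row) != n for row in cov):
        raise ValueError("cov must be an n x n matrix matching len(a)")
    return sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n))


# X1 - X2 with V(X1) = 4, V(X2) = 9 and zero covariance: 4 + 9
cov_uncorrelated = [[4.0, 0.0], [0.0, 9.0]]
v_diff_uncorrelated = lincomb_variance([1, -1], cov_uncorrelated)

# Same variances but Cov(X1, X2) = 3: 4 + 9 - 2*3
cov_correlated = [[4.0, 3.0], [3.0, 9.0]]
v_diff_correlated = lincomb_variance([1, -1], cov_correlated)

print(v_diff_uncorrelated, v_diff_correlated)
```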


Difference Between Two Random Variables
$$E(X_1 - X_2) = E(X_1) - E(X_2)$$

and, if X1 and X2 are independent,

$$V(X_1 - X_2) = V(X_1) + V(X_2)$$
If X1, X2, ..., Xn are independent and
normally distributed rvs, then any linear
combination of the Xi's also has a normal
distribution.
The difference $X_1 - X_2$ between two
independent, normally distributed variables
is itself normally distributed.
Also, $\bar{X}$ is normally distributed.
Statistics and their Distributions
Statistic
A statistic is any quantity whose value can
be calculated from sample data, e.g.
sample mean, sample variance, sample
range, sample median, etc.
Prior to obtaining data, there is uncertainty
as to what value is taken by any particular
statistic. Thus, a statistic is itself a random
variable and its prob. distn. is referred to as
its sampling distn.
Simulation Experiments
Draw 300 random samples, each of size n = 25, from a normal
(size N) pop. with mean $\mu = 5.4$ and SD $\sigma = 0.2$. So we get
sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{300}$.
Draw the histogram of $\bar{x}_i$, $i = 1, \ldots, 300$.
This gives a good approximation of the
sampling distn. of $\bar{X}$.
In order to find the exact sampling dist. of $\bar{X}$,
we need the dist. based on all the possible
samples.

$$\binom{300}{25}$$
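The simulation experiment can be sketched in a few lines with the standard library. The seed, the bin width and the text-based histogram are assumptions made so the example is self-contained and repeatable; they are not taken from the slides. The population is treated as an infinite normal population.

```python
import math
import random
import statistics

MU, SIGMA = 5.4, 0.2          # population mean and SD
N_SAMPLES, n = 300, 25        # number of samples and size of each sample

rng = random.Random(2011)     # fixed seed for a repeatable run
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
    for _ in range(N_SAMPLES)
]


def text_histogram(values, width=0.02):
    """Return a crude text histogram: one row of stars per bin of the given width."""
    counts = {}
    for v in values:
        b = math.floor(v / width)
        counts[b] = counts.get(b, 0) + 1
    return "\n".join(f"{b * width:5.2f} {'*' * counts[b]}" for b in sorted(counts))


print(text_histogram(sample_means))
print("mean of sample means:", round(statistics.fmean(sample_means), 4))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 4))
```

The histogram should be roughly bell-shaped around 5.4, with a spread near $\sigma/\sqrt{n} = 0.2/5 = 0.04$.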
The Distribution of the Sample Mean
General properties of the Sample Mean
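Let X1, ..., Xn be a random sample from a
distribution with mean value $\mu$ and
standard deviation $\sigma$. Then
1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$
2. $V(\bar{X}) = \sigma^2_{\bar{X}} = \sigma^2/n$
In addition, with $T_o = X_1 + \cdots + X_n$,

$$E(T_o) = n\mu, \quad V(T_o) = n\sigma^2, \quad \text{and} \quad \sigma_{T_o} = \sqrt{n}\,\sigma.$$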
$E(\bar{X}) = \mu$, $V(\bar{X}) = \sigma^2/n$
Case: when Pop. is Normally dist.


Let X1, ..., Xn be a random sample from a
normal distribution with mean value $\mu$ and
standard deviation $\sigma$. Then for any n,
$\bar{X}$ is normally distributed with mean and variance $(\mu, \sigma^2/n)$;
$T_o$ is normally distributed with mean and variance $(n\mu, n\sigma^2)$.
The Central Limit Theorem (CLT)
Let X1, ..., Xn be a random sample from a distribution
with mean value $\mu$ and variance $\sigma^2$. Then if n is
sufficiently large, $\bar{X}$ has approximately a normal
distribution with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n},$$

and $T_o$ also has
approximately a normal distribution with

$$\mu_{T_o} = n\mu, \quad \sigma^2_{T_o} = n\sigma^2.$$

The larger the value of n, the better the approximation.
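The CLT can be illustrated with a population that is clearly not normal. The sketch below assumes an exponential population with rate 1 (mean 1, SD 1), n = 40 and 5000 repetitions; these choices and the seed are illustrative only. If the normal approximation is good, about 95% of the sample means should fall within $1.96\,\sigma/\sqrt{n}$ of $\mu$.

```python
import math
import random
import statistics

rng = random.Random(102)   # fixed seed for a repeatable run
n, reps = 40, 5000
mu = sigma = 1.0           # exponential with rate 1 has mean 1 and SD 1

means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

half_width = 1.96 * sigma / math.sqrt(n)
frac_inside = sum(abs(m - mu) <= half_width for m in means) / reps

print("mean of sample means:", round(statistics.fmean(means), 4))
print("SD of sample means:  ", round(statistics.stdev(means), 4),
      "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 4))
print("fraction within 1.96 SDs:", frac_inside)
```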
Case: when Pop. is not Normally dist.
The Central Limit Theorem
[Figure: the population distribution, and the sampling distribution of $\bar{X}$ for small to moderate n and for large n.]
Rule of Thumb
If n > 30, the Central Limit Theorem
can be used.
Points to note:
1. When n is large, the sampling distn. of sample mean is
well approximated by a normal curve, even when the
pop. distn. is not itself normal.
2. Sample mean based on a large n will tend to be closer to
pop. mean than will sample mean based on a small n.
3. The sampling dist. of $\bar{X}$ tends to be centered at the value
of the pop. mean.
4. The spread of the sampling distn. of $\bar{X}$ tends to grow
smaller as the sample size n increases.
5. As n increases, the sampling distn. of $\bar{X}$ tends to a
normal distn. with mean $\mu_{\bar{X}} = \mu$ and SD $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.


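When a sample X1, ..., Xn is drawn from a
pop. with mean $\mu$ and SD $\sigma$, and
when n is large (CLT, approximately)
or when the pop. has a normal distn. (exactly),

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$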
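Standardizing the sample mean turns probability questions about $\bar{X}$ into standard normal lookups. As a hypothetical illustration, take $\mu$ = 5.4, $\sigma$ = 0.2 and n = 25 and ask for the probability that the sample mean exceeds 5.45. The sketch uses `statistics.NormalDist` from the Python standard library; the function name is an assumption.

```python
import math
from statistics import NormalDist


def prob_xbar_greater(c, mu, sigma, n):
    """P(Xbar > c) when Xbar is (approximately) N(mu, sigma^2 / n)."""
    z = (c - mu) / (sigma / math.sqrt(n))   # standardize c
    return 1 - NormalDist().cdf(z)


# z = (5.45 - 5.4) / (0.2 / 5) = 1.25
p = prob_xbar_greater(5.45, mu=5.4, sigma=0.2, n=25)
print(round(p, 4))
```

Here z = 1.25, and a standard normal table gives a right-tail area of about 0.106.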
Point estimate for pop. mean $\mu$:
A single number (statistic) based on sample data that represents
our best guess for the value of the pop. mean $\mu$ (e.g., sample
mean, sample median, sample mode)
A statistic whose mean value is equal to $\mu$ is said to be an
unbiased statistic (e.g., sample mean; $E(\bar{X}) = \mu$)
The point estimate (say, 5.5 feet) says nothing about how close it
might be to the true pop. mean $\mu$.
As an alternative, we might report an entire interval of plausible
values for the pop. mean $\mu$.
Unbiasedness (Illustration)
(Methods of point est.: Method of Moments;
Method of Max. Likelihood)
Card-Holders Survey on Payment Cards
www.math.iitb.ac.in/~udai/card-holders_survey2011.html
Confidence Intervals
An alternative to reporting a single value for the
parameter being estimated is to calculate and
report an entire interval of plausible values: a
confidence interval (CI).
A CI is always calculated by first selecting a
confidence level, which measures how reliably
the interval captures the true pop. mean $\mu$.
A confidence level of 95% implies that 95%
of all samples would give an interval that
includes $\mu$, and only 5% of all samples would
yield an erroneous interval.
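This "95% of all samples" reading can be checked by simulation. The sketch below assumes a normal population with $\mu$ = 5.4 and known $\sigma$ = 0.2, samples of size 25, and the known-$\sigma$ interval $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$; the seed and the number of repetitions are illustrative. It counts how often the interval contains the true mean.

```python
import math
import random
import statistics


def interval_95(xbar, sigma, n):
    """Known-sigma 95% interval: xbar -/+ 1.96 * sigma / sqrt(n)."""
    half = 1.96 * sigma / math.sqrt(n)
    return xbar - half, xbar + half


rng = random.Random(51)    # fixed seed for a repeatable run
MU, SIGMA, n, reps = 5.4, 0.2, 25, 4000

hits = 0
for _ in range(reps):
    xbar = statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
    lo, hi = interval_95(xbar, SIGMA, n)
    hits += lo <= MU <= hi   # True counts as 1

coverage = hits / reps
print("fraction of intervals containing mu:", coverage)
```

The fraction should come out close to 0.95; each individual interval either contains $\mu$ or does not.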
Consider a random sample
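95% Confidence Interval for $\mu$ when $\sigma$ is known
If after observing X1 = x1, ..., Xn = xn, we
compute the observed sample mean $\bar{x}$,
then a 95% confidence interval for $\mu$ can
be expressed as

$$\left( \bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \right)$$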
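Given that pop. SD is 0.2, a 95% CI for the
pop. mean height (when the sample mean, based
on n = 25, is say 5.45) is obtained from

$$\left( \bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \right) = \left( 5.45 - 1.96\,\frac{0.2}{\sqrt{25}},\ 5.45 + 1.96\,\frac{0.2}{\sqrt{25}} \right)$$

$$\approx (5.45 - 0.08,\ 5.45 + 0.08) = (5.37,\ 5.53)$$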
Other Levels of Confidence
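$$P(-z_{\alpha/2} \le Z < z_{\alpha/2}) = 1 - \alpha$$

[Figure: z curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, centered at 0; shaded area in each tail = $\alpha/2$.]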
Other Levels of Confidence
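A $100(1-\alpha)\%$ confidence interval for
the mean $\mu$ of a normal population
when the value of $\sigma$ is known is given
by

$$\left( \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right)$$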
Given that pop. SD is 0.2, a 99% CI for the
pop. mean height (when the sample mean, based
on n = 25, is say 5.45) is

$$\left( 5.45 - 2.58\,\frac{0.2}{\sqrt{25}},\ 5.45 + 2.58\,\frac{0.2}{\sqrt{25}} \right) \approx (5.45 - 0.10,\ 5.45 + 0.10) = (5.35,\ 5.55)$$
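The same calculation can be done for any confidence level by looking up $z_{\alpha/2}$ numerically. A minimal sketch, assuming a known $\sigma$ and using `statistics.NormalDist().inv_cdf` from the Python standard library in place of a z table; the function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf_level=0.95):
    """100(1 - alpha)% CI for the mean when sigma is known."""
    if not 0 < conf_level < 1:
        raise ValueError("conf_level must lie in (0, 1)")
    alpha = 1 - conf_level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half


interval_95pct = z_interval(5.45, 0.2, 25, 0.95)
interval_99pct = z_interval(5.45, 0.2, 25, 0.99)
print([round(v, 2) for v in interval_95pct])
print([round(v, 2) for v in interval_99pct])
```

The exact quantiles (about 1.960 and 2.576) differ slightly from the rounded table values 1.96 and 2.58, but after rounding to two decimals the intervals agree with the hand calculations: (5.37, 5.53) and (5.35, 5.55).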