
Interval Estimation

Population variance Known

The output voltage of a power source is known to have a standard deviation (sd) of
10 V. A random sample of 50 measurements gave an average of 118 V.
Find a 95% confidence interval for the population mean voltage.

[Figure: standard normal curve with 95% of the area between z1 and z2 and 2.5% in each tail]
Risk = 5% (2.5%, or 0.025, in each tail)

Survival function at z2: S(z2) = 0.025
F(z2) = 0.975, where F is the cdf
z2 = G(0.975), where G is the inverse of the cdf
   = NORMSINV(0.975) = 1.96
z1 = G(0.025) = NORMSINV(0.025) = -1.96

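[Figure: sampling distribution of the mean, with 95% of the area between x1 and x2 (corresponding to z1 and z2) and 2.5% in each tail]

$$x_1 = \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \qquad x_2 = \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

$$z_1 = \frac{x_1 - 118}{10/\sqrt{50}} = -1.96 \;\Rightarrow\; x_1 = 118 - 1.96 \times \frac{10}{\sqrt{50}} = 115.228$$

$$z_2 = \frac{x_2 - 118}{10/\sqrt{50}} = 1.96 \;\Rightarrow\; x_2 = 118 + 1.96 \times \frac{10}{\sqrt{50}} = 120.772$$

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$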
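The voltage calculation above can be reproduced with a short sketch. It assumes Python 3.8+ and uses `statistics.NormalDist().inv_cdf` in the role the slides give to NORMSINV; the function name `z_interval` is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1.0 - confidence
    # Inverse cdf of the standard normal, the same role as NORMSINV(1 - alpha/2)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width


# Power source example: sd = 10 V, n = 50, sample mean = 118 V
lower, upper = z_interval(118.0, 10.0, 50)
print(round(lower, 3), round(upper, 3))
```

The limits agree with the hand calculation to three decimals because 1.96 is itself a rounded value of the 0.975 quantile.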

Estimation of Confidence Interval


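Population variance known
Select a sample of size n
Evaluate the sample mean $\bar{X}$
The distribution of $\bar{X}$ is normal, with mean $\mu$ and SD $\sigma/\sqrt{n}$

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$

A $(1-\alpha)$ confidence interval for $\mu$ is given by

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the inverse of the cdf for $P = 1 - \alpha/2$:
$z_{\alpha/2}$ = NORMSINV(1 - α/2)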
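The step from the probability statement to the interval is only algebra on the inequality. A sketch of that rearrangement, using the same symbols as the slide:

```latex
\begin{align*}
-z_{\alpha/2} &< \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \\
-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &< \bar{X} - \mu < z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  && \text{(multiply by } \sigma/\sqrt{n} > 0\text{)} \\
-\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &< -\mu < -\bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  && \text{(subtract } \bar{X}\text{)} \\
\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &< \mu < \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  && \text{(multiply by } -1\text{, reversing the inequalities)}
\end{align*}
```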

Area under the normal curve


   z      area        z      area        z      area        z      area
-3.000   0.001     -1.400   0.081      0.200   0.579      1.800   0.964
-2.800   0.003     -1.200   0.115      0.400   0.655      2.000   0.977
-2.600   0.005     -1.000   0.159      0.600   0.726      2.200   0.986
-2.400   0.008     -0.800   0.212      0.800   0.788      2.400   0.992
-2.200   0.014     -0.600   0.274      1.000   0.841      2.600   0.995
-2.000   0.023     -0.400   0.345      1.200   0.885      2.800   0.997
-1.800   0.036     -0.200   0.421      1.400   0.919      3.000   0.999
-1.600   0.055      0.000   0.500      1.600   0.945      3.200   0.999

A manufacturing process produces pins with a standard deviation of 0.1 mm. It is required to set the process average
at 5 mm. The average diameter of a sample of 100 items
was found to be 5.027 mm. Does the experiment suggest that
the actual process average has shifted from the target of
5.00 mm?
Confidence level 95%
z = 1.96
Range of process mean (L, U):

$$L = 5.027 - 1.96 \times \frac{0.1}{\sqrt{100}} = 5.007$$

$$U = 5.027 + 1.96 \times \frac{0.1}{\sqrt{100}} = 5.047$$

The target of 5.00 mm lies outside the interval (5.007, 5.047), so at the 95% confidence level it cannot be said that the
process average is 5.00; the process appears to have shifted.
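The decision in the pin example amounts to checking whether the target value falls inside the interval. A minimal sketch of that check, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

target = 5.00                       # required process average, mm
mean, sigma, n = 5.027, 0.1, 100    # sample mean, known sd, sample size

z = NormalDist().inv_cdf(0.975)     # about 1.96 for a 95% interval
half_width = z * sigma / sqrt(n)
lower, upper = mean - half_width, mean + half_width

# If the target is not inside the interval, the data suggest a shift.
target_inside = lower <= target <= upper
print(f"95% CI: ({lower:.3f}, {upper:.3f}); target inside: {target_inside}")
```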

The average zinc concentration in an alloy is to be controlled. A sample of
36 measurements showed the zinc percentage to be 2.6%. The sd = 0.3%.
Estimate the % of zinc in the alloy with a confidence level of 95%.

$$L = 2.6 - 1.96 \times \frac{0.3}{\sqrt{36}} = 2.50$$

$$U = 2.6 + 1.96 \times \frac{0.3}{\sqrt{36}} = 2.70$$

What should be the sample size?

When estimating the process average, we can be
(1-α) x 100% confident that the error will
not exceed the following value:

$$e = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

Sample size n for a specified error e:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$$

The average diameter of a product is to be controlled. The sd = 0.1 mm.
What should be the sample size to estimate the average diameter to within
0.02 mm at a confidence level of 95%?

$$n = \left(\frac{1.96 \times 0.10}{0.02}\right)^2 = 96.04 \approx 96$$

(Rounding up to 97 guarantees that the error bound is met.)
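The sample-size formula can be sketched as a small helper. The function name is illustrative, and rounding up is an assumption about how a fractional n should be handled; the raw value here is 96.04.

```python
from math import ceil


def sample_size_for_mean(sigma, error, z=1.96):
    """Return (raw n, n rounded up) so that z * sigma / sqrt(n) <= error."""
    raw = (z * sigma / error) ** 2
    return raw, ceil(raw)


# Diameter example: sd = 0.10 mm, allowed error = 0.02 mm, 95% confidence
raw_n, n_up = sample_size_for_mean(0.10, 0.02)
print(round(raw_n, 2), n_up)
```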

Interval Estimation
Population variance Unknown

Why is it different from the earlier case?

It is required to estimate the output voltage of a power
source. A random sample of 10 power sources gave an
average of 118 V and a sample sd of 10 V. Can you evaluate
with 95% confidence the range for the output by using
the following formula?

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

No. Because the standard deviation is estimated from a
sample, the actual sd may be more than 10. Therefore the
estimated range has to be widened to take care of the
uncertainty in the sd.

When Variance is unknown


Compute the sample mean $\bar{X}$
Compute the sample variance $s^2$

The quantity

$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

is said to have a t distribution with (n-1) degrees of freedom.

The shape of the t distribution is similar to that
of the normal distribution. If the sample size is
large, it approaches the normal distribution.

Degrees of Freedom
Imagine a very simple situation in which the
individual scores that make up a distribution are
3, 4, 5, 6, and 7.
If you are asked to tell what the first score is
without having seen it, the best you could do is a
wild guess, because the first score could be any
number.
If you are told the first score (3) and then asked
to give the second, it too could be any number.

Degrees of Freedom
The same is true of the third and fourth scores:
each of them has complete freedom to vary.
But if you know those first four scores (3, 4, 5,
and 6) and you know the mean of the distribution
(5), then the last score can only be 7.
If, instead of the mean and 3, 4, 5, and 6, you
were given the mean and 3, 5, 6, and 7, the
missing score could only be 4.
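The argument above can be checked numerically: once the mean and all but one score are known, the remaining score is fixed. A small sketch (the helper name is illustrative):

```python
def missing_score(known_scores, mean, n):
    """The one remaining score is forced by the mean: total minus known sum."""
    return mean * n - sum(known_scores)


# Five scores with mean 5: knowing any four determines the fifth.
print(missing_score([3, 4, 5, 6], mean=5, n=5))  # prints 7
print(missing_score([3, 5, 6, 7], mean=5, n=5))  # prints 4
```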

Degrees of Freedom
In the t test, because the known sample mean is used to
replace the unknown population mean in calculating the
estimated standard deviation, one degree of freedom is
lost.
For each parameter you estimate, you lose one degree
of freedom.
Degrees of freedom is a measure of how much precision an
estimate of variation has.
A general rule is that the degrees of freedom decrease when you
have to estimate more parameters.

The t Distribution

For example, using the normal curve, 1.96 is the cut-off for a two-tailed test at the .05 level of significance.
On a t distribution with 3 degrees of freedom (a sample size of 4),
the cutoff is 3.18 for a two-tailed test at the .05 level of significance.
If your estimate is based on a larger sample of 7, the cutoff is 2.45,
a critical score closer to that for the normal curve.

The t Distribution
As the sample size tends to infinity, the t distribution
becomes the same as the normal curve.

Upper-tail probability

DF     0.05     0.025    0.01     0.005    0.001
 5     2.015    2.571    3.365    4.032    5.893
10     1.812    2.23     2.76     3.17     4.14
15     1.753    2.13     2.60     2.95     3.73
20     1.725    2.09     2.53     2.85     3.55
25     1.708    2.06     2.49     2.79     3.45
30     1.697    2.04     2.46     2.75     3.39

df\p    0.1      0.05     0.025    0.01     0.005
  3     1.638    2.353    3.182    4.541    5.841
  4     1.533    2.132    2.776    3.747    4.604
  5     1.476    2.015    2.571    3.365    4.032
  6     1.440    1.943    2.447    3.143    3.707
  7     1.415    1.895    2.365    2.998    3.499
  8     1.397    1.860    2.306    2.896    3.355
  9     1.383    1.833    2.262    2.821    3.250
 10     1.372    1.812    2.228    2.764    3.169
 11     1.363    1.796    2.201    2.718    3.106
 12     1.356    1.782    2.179    2.681    3.055
 13     1.350    1.771    2.160    2.650    3.012
 14     1.345    1.761    2.145    2.624    2.977
 15     1.341    1.753    2.131    2.602    2.947
 16     1.337    1.746    2.120    2.583    2.921
 17     1.333    1.740    2.110    2.567    2.898
 18     1.330    1.734    2.101    2.552    2.878
 19     1.328    1.729    2.093    2.539    2.861
 20     1.325    1.725    2.086    2.528    2.845
 21     1.323    1.721    2.080    2.518    2.831
 22     1.321    1.717    2.074    2.508    2.819
 23     1.319    1.714    2.069    2.500    2.807
 24     1.318    1.711    2.064    2.492    2.797
 25     1.316    1.708    2.060    2.485    2.787
 26     1.315    1.706    2.056    2.479    2.779
 27     1.314    1.703    2.052    2.473    2.771
 28     1.313    1.701    2.048    2.467    2.763
 29     1.311    1.699    2.045    2.462    2.756
 30     1.310    1.697    2.042    2.457    2.750

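It is required to estimate the output voltage of a
power source. A random sample of 10 power sources
gave an average of 118 V and a sd of 10 V.
Estimate the range for the output with a
confidence level of 95%.

$t_{\alpha/2}$ = inverse of cdf for P = (1 - α/2), with n - 1 = 9 degrees of freedom

Prob     t
0.100    1.383
0.050    1.833
0.025    2.262
0.010    2.821
0.005    3.250

$t_{\alpha/2} = t_{0.025} = 2.262$

$$t_1 = \frac{x_1 - 118}{10/\sqrt{10}} = -2.262 \;\Rightarrow\; x_1 = 118 - 2.262 \times \frac{10}{\sqrt{10}} = 110.85$$

$$t_2 = \frac{x_2 - 118}{10/\sqrt{10}} = 2.262 \;\Rightarrow\; x_2 = 118 + 2.262 \times \frac{10}{\sqrt{10}} = 125.15$$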
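For comparison with the known-variance case, here is a sketch of the same calculation with the t value taken from the table (2.262 for 9 degrees of freedom). The Python standard library has no t quantile function, so the critical value is passed in by hand; `t_interval` is an illustrative name.

```python
from math import sqrt


def t_interval(mean, s, n, t_crit):
    """Confidence interval for the mean when the sd is estimated from the sample.

    t_crit must be looked up for n - 1 degrees of freedom.
    """
    half_width = t_crit * s / sqrt(n)
    return mean - half_width, mean + half_width


# Power source example: n = 10, mean = 118 V, sample sd = 10 V, t(0.025, 9) = 2.262
lower, upper = t_interval(118.0, 10.0, 10, t_crit=2.262)
print(round(lower, 2), round(upper, 2))
```

With the same mean and sd, the interval is noticeably wider than a z-based one would be, reflecting the extra uncertainty in the estimated sd.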

Estimation of Confidence Interval


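Select a sample of size n
Evaluate the sample mean $\bar{x}$ and the sample standard deviation s
Confidence interval is given by

$$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$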

An Example
A new process has been developed that
transforms ordinary iron into a kind of material
known as metallic glass. It is much stronger and
more corrosion resistant than
steel. However, it is brittle at high temperature.
An experiment is conducted to note the
temperature at which it shows the first signs of
brittleness (data on the next slide).
Find, with a confidence level of 90%, the
temperature at which brittleness appears.

Experimental Results
No.   Temperature      No.   Temperature
 1      326.19         11      300.69
 2      333.25         12      304.02
 3      316.77         13      306.43
 4      297.05         14      323.27
 5      315.68         15      301.51
 6      294.86         16      300.51
 7      305.72         17      319.94
 8      297.84         18      331.64
 9      326.80         19      332.14
10      318.00         20      305.14

Mean of the temperature = 312.87
Sample standard deviation s = 12.94
For a confidence level of 90%,
t(0.05, 19) = 1.729
A 90% confidence interval for the
temperature is given by

$$312.87 - 1.729 \times \frac{12.94}{\sqrt{20}} < \mu < 312.87 + 1.729 \times \frac{12.94}{\sqrt{20}}$$

$$307.87 < \mu < 317.87$$
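The summary statistics can be recomputed from the 20 readings as a check. This sketch uses `statistics.stdev` (the n - 1 divisor) and takes the t value 1.729 from the table for 19 degrees of freedom; with these inputs the limits come out at about 307.87 and 317.87.

```python
from math import sqrt
from statistics import mean, stdev

temps = [
    326.19, 333.25, 316.77, 297.05, 315.68, 294.86, 305.72, 297.84, 326.80, 318.00,
    300.69, 304.02, 306.43, 323.27, 301.51, 300.51, 319.94, 331.64, 332.14, 305.14,
]

n = len(temps)
x_bar = mean(temps)
s = stdev(temps)        # sample standard deviation (divides by n - 1)
t_crit = 1.729          # t(0.05, 19) from the table, for a 90% interval

half_width = t_crit * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"mean={x_bar:.2f}, s={s:.2f}, CI=({lower:.2f}, {upper:.2f})")
```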

The capacitances (μF) of seven electronic components are
9.8, 10.2, 10.4, 9.8, 10, 10.2, 9.6. Find a 95% confidence
interval for the mean capacitance.

Confidence level = 95%
α/2 = 0.025
Degrees of freedom = 6, $t_{0.025}$ = 2.447
Mean = 10.00, sample standard deviation s = 0.283

$$L = 10.00 - 2.447 \times \frac{0.283}{\sqrt{7}} = 9.74$$

$$U = 10.00 + 2.447 \times \frac{0.283}{\sqrt{7}} = 10.26$$

Estimating Difference
between two Means
Population Variances known

Difference between two means


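Variances known
Sample 1: size $n_1$, mean $\mu_1$, variance $\sigma_1^2$
Sample 2: size $n_2$, mean $\mu_2$, variance $\sigma_2^2$

$\bar{X}_1 - \bar{X}_2$ will be approximately normally distributed with

$$\text{mean} = \mu_1 - \mu_2, \qquad \text{variance} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Interval for difference between two means

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$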

Example
A research project attempted to reduce the average
sulphur content in steel by means of a new treatment.
Sulphur content was
measured in some samples with and without the
treatment. The following results were obtained:
With treatment: sample size 10, mean = 0.42, sd = 0.05
Without treatment: sample size 6, mean = 0.51, sd = 0.08

Taking population 1 as untreated and population 2 as treated,
$\bar{x}_1 - \bar{x}_2 = 0.51 - 0.42 = 0.09$

$$0.09 - 1.96\sqrt{\frac{0.08^2}{6} + \frac{0.05^2}{10}} < \mu_1 - \mu_2 < 0.09 + 1.96\sqrt{\frac{0.08^2}{6} + \frac{0.05^2}{10}}$$

$$0.02 < \mu_1 - \mu_2 < 0.16$$
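A sketch of the sulphur calculation, treating the two quoted standard deviations as known (as the slide does). The function name and argument order are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_z_interval(x1, sd1, n1, x2, sd2, n2, confidence=0.95):
    """Interval for mu1 - mu2 when both population sds are taken as known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # sd of the difference of sample means
    d = x1 - x2
    return d - z * se, d + z * se


# Population 1: without treatment; population 2: with treatment
lower, upper = diff_means_z_interval(0.51, 0.08, 6, 0.42, 0.05, 10)
print(round(lower, 3), round(upper, 3))
```

Both limits are positive, which is what supports the claim that the treatment reduces the sulphur content.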

Estimating Difference
between two Means
Population Variances unknown
But Equal

Variances unknown, but equal


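Populations approximately normal

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$S_p$ is known as the pooled estimate of the population
standard deviation
$t_{\alpha/2}$ is the t value for $n_1 + n_2 - 2$ degrees of freedom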

A study was undertaken of the aquatic degradation
due to acid mine discharge. Samples were taken from a point
downstream of the mine discharge and from a point far from the
point of discharge.
For sample 1 (12 samples, 1 every month):
Mean = 3.11, sample SD (s) = 0.771
For sample 2 (10 samples, 1 every month):
Mean = 2.04, sample SD (s) = 0.448
For a 90% confidence level ($t_{0.05}$ with 20 degrees of freedom = 1.725):

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = 0.417$$

$$\mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 1.547$$

$$\mu_1 - \mu_2 > (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0.593$$
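The pooled calculation can be sketched as follows. The t value 1.725 (20 degrees of freedom, upper-tail probability 0.05) is read from the table rather than computed, and the function name is illustrative.

```python
from math import sqrt


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """Interval for mu1 - mu2 assuming equal but unknown variances.

    t_crit is the table value for n1 + n2 - 2 degrees of freedom.
    """
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    half_width = t_crit * sqrt(sp2) * sqrt(1.0 / n1 + 1.0 / n2)
    d = x1 - x2
    return sp2, d - half_width, d + half_width


# Acid mine discharge example, 90% confidence
sp2, lower, upper = pooled_t_interval(3.11, 0.771, 12, 2.04, 0.448, 10, t_crit=1.725)
print(round(sp2, 3), round(lower, 3), round(upper, 3))
```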

Assumptions involved
Populations are normal
Variances are equal
Slight to moderate departures from the
above assumptions do not affect the
result significantly if the sample sizes are
identical
Therefore equal sample sizes should be
preferred, if possible

Estimating Difference
between two Means
Population Variances unknown

Unequal variances
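$$v = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left[(s_1^2/n_1)^2/(n_1 - 1)\right] + \left[(s_2^2/n_2)^2/(n_2 - 1)\right]}$$

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $t_{\alpha/2}$ is the t value for a t distribution with v
degrees of freedom.
If v is not an integer, it should be rounded to the
nearest integer value.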

Example
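A study was undertaken to measure the amount of
orthophosphorus (measured in milligrams per liter).
Sample 1: n = 15, mean = 3.84, s = 3.07
Sample 2: n = 12, mean = 1.49, s = 0.80

Assuming the observations to be from normal
populations with different variances, find, with a 95%
confidence level, the difference in means.

$$\bar{x}_1 = 3.84,\; s_1 = 3.07,\; n_1 = 15,\; \bar{x}_2 = 1.49,\; s_2 = 0.80,\; n_2 = 12$$

$$v = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\left[(s_1^2/n_1)^2/(n_1 - 1)\right] + \left[(s_2^2/n_2)^2/(n_2 - 1)\right]} = 16.3 \approx 16$$

$$t_{\alpha/2} = 2.120$$

$$\mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 4.10$$

$$\mu_1 - \mu_2 > (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0.60$$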
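The degrees-of-freedom formula is the fiddly part of this method, so a sketch of it may help. `welch_df` is an illustrative name, and the t value 2.120 is the table entry for 16 degrees of freedom rather than something the code computes.

```python
from math import sqrt


def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom v for unequal variances."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Orthophosphorus example
x1, s1, n1 = 3.84, 3.07, 15
x2, s2, n2 = 1.49, 0.80, 12

v = welch_df(s1, n1, s2, n2)     # about 16.3, rounded to 16
t_crit = 2.120                   # t(0.025, 16) from the table
se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
d = x1 - x2
lower, upper = d - t_crit * se, d + t_crit * se
print(round(v, 1), round(lower, 2), round(upper, 2))
```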

Comparison By Pairing

Sr. No      x1          x2        D = x1 - x2
  1       119.266     120.255      -0.989
  2       118.381     110.674       7.707
  3       112.654     115.713      -3.059
  4       113.388     117.637      -4.249
  5       118.090     113.034       5.056
  6       113.280     121.264      -7.984
  7       116.511     116.704      -0.193
  8       117.108     118.215      -1.107
  9       114.964     114.913       0.051
 10       113.219     111.698       1.521

With $\bar{d}$ the mean and $s_d$ the standard deviation of the differences D, the quantity

$$\frac{\bar{d} - \mu_D}{s_d/\sqrt{n}}$$

will have a t distribution with n - 1 = 9 degrees of freedom.

$$\bar{d} - t_{\alpha/2}\,\frac{s_d}{\sqrt{n}} < \mu_D < \bar{d} + t_{\alpha/2}\,\frac{s_d}{\sqrt{n}}$$
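A sketch of the paired procedure applied to the ten pairs listed above. The list names `first` and `second` are placeholders, since the slides do not name the two columns, and the 95% level with t = 2.262 (9 degrees of freedom, from the table) is an assumption made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

first = [119.266, 118.381, 112.654, 113.388, 118.09,
         113.28, 116.511, 117.108, 114.964, 113.219]
second = [120.255, 110.674, 115.713, 117.637, 113.034,
          121.264, 116.704, 118.215, 114.913, 111.698]

# Pairing reduces the problem to a one-sample interval on the differences.
diffs = [a - b for a, b in zip(first, second)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)

t_crit = 2.262                       # t(0.025, 9), assumed 95% level
half_width = t_crit * s_d / sqrt(n)
lower, upper = d_bar - half_width, d_bar + half_width
print(f"mean difference={d_bar:.4f}, CI=({lower:.2f}, {upper:.2f})")
```

If the resulting interval contains zero, the data do not show a difference between the two columns at the chosen level.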

Example
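No.    x1     x2     D         No.    x1     x2     D
 1     2.5    4.9   -2.4       11     6.9    7     -0.1
 2     3.1    5.9   -2.8       12     3.3    2.9    0.4
 3     2.1    4.4   -2.3       13     4.6    4.6    0
 4     3.5    6.9   -3.4       14     1.6    1.4    0.2
 5     3.1    7     -3.9       15     7.2    7.7   -0.5
 6     1.8    4.2   -2.4       16     1.8    1.1    0.7
 7     6     10     -4         17    20     11      9
 8     ?      ?     -2.5       18     ?      ?     -0.5
 9    36     41     -5         19     2.5    2.3    0.2
10     4.7    4.4    0.3       20     4.1    2.5    1.6

(? marks a value that cannot be recovered from the source; row 8 retains one value, 5.5, and row 18 retains one value, 2.5.)

Example (Cont.)

Mean of the differences = -0.87
Degrees of freedom = 19
s = 2.9773
t for a confidence level of 95% and df 19 = 2.093

$$\bar{d} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu_D < \bar{d} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

$$-2.26 < \mu_D < 0.52$$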

Estimating a Proportion

A lot of 5000 products has been received.
A sample of 200 was found to contain 5 defectives.
Estimate the proportion of defectives in the lot.

Estimating a Proportion
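The mean and standard deviation of a binomial
distribution:

$$\mu = np, \qquad \sigma^2 = np(1 - p) = npq$$

For the sample proportion:

$$\mu_p = p, \qquad \sigma_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{pq}{n}}$$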

Estimating a Proportion
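By analogy with the interval for a mean,

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

the interval for a proportion is

$$p - z_{\alpha/2}\sqrt{\frac{pq}{n}} < p' < p + z_{\alpha/2}\sqrt{\frac{pq}{n}}$$

p' = actual proportion
p = sample proportion
q = 1 - p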

From a random sample of 500 cars in a city, 240 were
found to be some model from Maruti Udyog Ltd. If the city
has 50000 registered cars, what is the 95% confidence
interval for the actual number of Maruti cars in the city?

Example

p = 240/500 = 0.48
Var = 0.48 x 0.52/500 = 0.000499
SD = 0.022
p' < 0.48 + 1.96 x 0.022 = 0.523
p' > 0.48 - 1.96 x 0.022 = 0.437
U = 0.523 x 50000 = 26156
L = 0.437 x 50000 = 21844
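A sketch of the same calculation carried at full precision. Because the slide rounds the SD to 0.022 before multiplying, the limits below come out slightly wider, roughly 21,810 to 26,190 cars. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# 240 Maruti cars in a sample of 500; 50000 registered cars in the city
p_low, p_high = proportion_interval(240, 500)
cars_low, cars_high = 50000 * p_low, 50000 * p_high
print(round(p_low, 4), round(p_high, 4), round(cars_low), round(cars_high))
```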

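What should be the sample size?

When estimating a proportion, we can be (1-α) x
100% confident that the error will not exceed the
following value:

$$e = z_{\alpha/2}\sqrt{\frac{pq}{n}}$$

Sample size n for a specified error e:

$$n = \frac{z_{\alpha/2}^2\,pq}{e^2}$$

Maximum value of pq = 1/4, so in the worst case

$$n = \left(\frac{z_{\alpha/2}}{2e}\right)^2$$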

Example
We are interested in finding the fraction of defectives
produced by a machine. What should be the sample size if we
want to be 95% confident that the error of the estimate does not
exceed 0.02?

$$n = \left(\frac{z_{\alpha/2}}{2e}\right)^2 = \left(\frac{1.96}{2 \times 0.02}\right)^2 = 2401$$
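A sketch of the worst-case sample-size rule. The default p = 0.5 encodes the maximum pq = 1/4; the rounding guard before `ceil` is an implementation detail added here to avoid a floating-point artefact, not something from the slides. With z = 1.96 and e = 0.02 the formula gives 49 squared, that is 2401.

```python
from math import ceil


def sample_size_for_proportion(error, z=1.96, p=0.5):
    """Sample size so that z * sqrt(p * (1 - p) / n) <= error.

    p = 0.5 gives the worst case, pq = 1/4.
    """
    raw = z ** 2 * p * (1.0 - p) / error ** 2
    # Round away tiny floating-point noise before taking the ceiling.
    return ceil(round(raw, 6))


n_required = sample_size_for_proportion(0.02)
print(n_required)
```

If a rough prior estimate of p is available, passing it instead of 0.5 reduces the required sample size.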

Comparing two Proportions

Lot 1: 5000 items; a sample of 200 contained 5 defectives
Lot 2: 4000 items; a sample of 150 contained 6 defectives

Is the first lot better than the second?

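Interval for difference between two means (continued)

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

By analogy, the interval for the difference between two proportions is

$$(p_1 - p_2) - z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} < p'_1 - p'_2 < (p_1 - p_2) + z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$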

Example
Certain changes have been incorporated in a process
for manufacturing electronic components. 75 out of
1500 items were found to be defective before
resetting and 80 out of 2000 after resetting. Find a
90% confidence interval for the true difference in the
fraction defective between the old and the new setting.

$$p_1 = 0.05, \quad p_2 = 0.04, \quad p_1 - p_2 = 0.01$$

$$p'_1 - p'_2 < 0.01 + 1.645\sqrt{\frac{0.05 \times 0.95}{1500} + \frac{0.04 \times 0.96}{2000}} = 0.0217$$

$$p'_1 - p'_2 > 0.01 - 1.645\sqrt{\frac{0.05 \times 0.95}{1500} + \frac{0.04 \times 0.96}{2000}} = -0.0017$$
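A sketch of the two-proportion interval for this example. The function name is illustrative; the z value is obtained from the inverse normal cdf rather than typed in as 1.645.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, confidence=0.90):
    """Normal-approximation interval for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    se = sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


# 75 defectives in 1500 before resetting, 80 in 2000 after resetting
lower, upper = diff_proportions_interval(75, 1500, 80, 2000)
print(round(lower, 4), round(upper, 4))
```

The lower limit is negative, so the interval includes zero and the data do not establish at the 90% level that resetting changed the fraction defective.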