
Session 12

Reference
Levin, R. I. and Rubin, D.S., Statistics for Management
(Pearson Education )
Black, K., Business Statistics, 5th Edn., Wiley Publication.

Q? What is the purpose of obtaining a sample?

A. To provide a description of a population
In the inferential statistics process, a researcher
selects a random sample from the population,
computes a statistic on the sample, and reaches
conclusions about the population parameter from
the statistic.

In attempting to analyze the sample statistic, it is
essential to know the distribution of the statistic.

Sampling distribution: The probability distribution of
a statistic, obtained by selecting all the possible
samples of a specific size from a population.

Predicting the characteristics of a sample
Example



Frequency distribution for a population of four scores: 2, 4, 6, 8
Suppose we know the marks of four students
Scores: 2,4,6,8
Let's construct a distribution of sample means.
Population scores: 2, 4, 6, 8
Specify a sample size, say n = 2
Examine all possible samples: (A,A), (A,B), (A,C), and so on.
[Table: the possible samples of n = 2 scores from the population]
[Figure: the distribution of sample means for n = 2]
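The construction above can be sketched in a few lines of Python. This is a minimal illustration that assumes sampling with replacement (as the sample (A,A) suggests), so there are 4 x 4 = 16 ordered samples; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

population = [2, 4, 6, 8]
n = 2  # sample size

# All ordered samples of size n drawn with replacement (an assumption)
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Frequency of each sample mean: the distribution of sample means
distribution = Counter(sample_means)
for mean in sorted(distribution):
    print(f"mean {mean:.0f}: {'*' * distribution[mean]}")

population_mean = sum(population) / len(population)
mean_of_sample_means = sum(sample_means) / len(sample_means)
print("population mean:", population_mean)
print("mean of sample means:", mean_of_sample_means)
```

The printed histogram piles up around the population mean, and the mean of the sample means equals the population mean.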
Characteristics of sample means
Sample means tend to pile up around the population
mean
The distribution of sample means is approximately
normal in shape.
The distribution of sample means can be used to
answer probability questions about sample means
What do we use when we have a large n and do not want to
calculate all of the possible samples?
Central Limit Theorem
CLT: For any population with mean $\mu$ and standard
deviation $\sigma$, the distribution of sample means for
sample size n will approach a normal distribution with a
mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$
as n approaches infinity.
Central Limit Theorem Contd
Distribution of sample means tends to be a normal
distribution particularly if one of the following is
true:
The population from which the sample is drawn is normal.
The number of scores (n) in each sample is relatively large
(n>30)
Expected value of $\bar{X}$
Sample means should be close to the population
mean (the expected value of $\bar{X}$).
Expected value of $\bar{X}$: the mean of the distribution
of sample means, $\mu_{\bar{X}}$, is equal to $\mu$ (the
population mean).
Standard Error of $\bar{X}$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where $\sigma_{\bar{x}}$ is the standard error of the mean for an infinite
population and $\sigma$ is the standard deviation of the population.

Magnitude of the Standard error is
determined by

The size of the sample
The standard deviation of the population from
which the sample is selected
Law of large numbers: the larger the sample size n, the more
probable it is that the sample mean will be close to the
population mean.
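A small sketch of how the standard error responds to the sample size. The value sigma = 10 is a hypothetical population standard deviation chosen only for illustration, and `standard_error` is an illustrative helper name.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for an infinite population."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (1, 4, 25, 100):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.2f}")
```

Quadrupling the sample size halves the standard error, which is why larger samples give sample means that cluster more tightly around the population mean.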


Estimating the Population Mean
Interval estimate
Suppose a marketing research director needs an
estimate of the average life in months of the car batteries
his company manufactures.
A random sample of 200 batteries is selected.
Enquire about the life of the batteries.
Mean battery life is 36 months.
Point estimate: Mean battery life is 36 months.
What about the uncertainty factor?
To answer this we need to find the standard error.
The standard error is calculated as 0.707 months.
In other words, the actual mean battery life may lie
somewhere in the interval estimate of 35.293 to
36.707 months.
Session 13
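Confidence Interval to Estimate $\mu$ when $\sigma$ is Known
Point estimate:
$$\bar{x} = \frac{\sum x}{n}$$
Interval estimate:
$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$$
or
$$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$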
What is a confidence interval?
[Figure: confidence intervals computed from 20 samples, plotted on a scale from 10 to 20. The interval from one sample out of 20 (5%) does not contain the true mean, 15.]
Confidence Interval (contd..)
95% confidence means: 95% of all the sample means
are within 2 standard errors of $\mu$.
Equivalently, $\mu$ is within 2 standard errors of 95% of all the
sample means.
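Distribution of Sample Means for 95% Confidence
[Figure: normal curve of $\bar{X}$ with 95% of the area in the centre (.4750 on each side of the mean) and .025 in each tail; on the Z axis the boundaries are -1.96 and 1.96, with 0 at the mean.]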
For a 95% confidence interval:
$\alpha = 0.05$
$\alpha/2 = 0.025$

To find the value of $z_{\alpha/2}$, or $z_{.025}$, look at the standard normal
distribution table under .5000 - .0250 = .4750.
From the standard normal table, look up 0.4750 and read
1.96 as the z value from the row and column.
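As a cross-check on the table lookup, the same critical value can be computed with Python's standard library. This is a sketch; `z_critical` is an illustrative helper name, not something from the slides.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided critical z value: cuts off alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

Printed tables round these values, so a table-based z may differ from the computed one in the third decimal place.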
Estimating the Population Mean
$\alpha$ is used to locate the Z value in constructing the
confidence interval.
The confidence interval yields a range within which
the researcher feels with some confidence the
population mean is located.
Z score: the number of standard deviations a value
(x) is above or below the mean of a set of numbers
when the data are normally distributed.

Estimating the Population Mean
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
95% Confidence Intervals for $\mu$
[Figure: 95% confidence intervals drawn around several sample means $\bar{X}$]
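95% Confidence Interval for $\mu$
Given $\bar{x} = 1300$, $\sigma = 160$, $n = 85$, $z_{\alpha/2} = 1.96$:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$1300 - 1.96\frac{160}{\sqrt{85}} \le \mu \le 1300 + 1.96\frac{160}{\sqrt{85}}$$
$$1300 - 34.01 \le \mu \le 1300 + 34.01$$
$$1265.99 \le \mu \le 1334.01$$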
Problem # 1
A survey was taken of U.S. companies that do
business with firms in India.
One of the questions on the survey was:
Approximately how many years has your company
been trading with firms in India?
A random sample of 44 responses to this question
yielded a mean of 10.455 years. Suppose the population
standard deviation for this question is 7.7 years.
Using this information, construct a 90% confidence
interval for the mean number of years that a U.S.
company has been trading with firms in India.
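Problem 1 - Solution
$\bar{x} = 10.455$, $\sigma = 7.7$, $n = 44$; 90% confidence gives $z = 1.645$.
$$\bar{x} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}$$
$$10.455 - 1.645\frac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645\frac{7.7}{\sqrt{44}}$$
$$10.455 - 1.91 \le \mu \le 10.455 + 1.91$$
$$8.545 \le \mu \le 12.365$$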
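The Problem 1 arithmetic can be reproduced with a short function. This is a sketch under the slide's assumptions (known population standard deviation, z taken from the table); `z_interval` is an illustrative name.

```python
import math

def z_interval(x_bar: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """Confidence interval for the mean when sigma is known."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Problem 1: x_bar = 10.455, sigma = 7.7, n = 44, z = 1.645 (90% confidence)
lower, upper = z_interval(10.455, 7.7, 44, 1.645)
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

The same function covers any of the known-sigma problems in these sessions by changing the four arguments.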
Problem # 2
A study is conducted in a company that employs 800
engineers. A random sample of 50 engineers reveals
that the average sample age is 34.3 years.
Historically, the population standard deviation of the
age of the company's engineers is approximately 8
years.

Construct a 98% confidence interval to estimate the
average age of all the engineers in this company.
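Problem 2 - Solution
$\bar{x} = 34.3$, $\sigma = 8$, $N = 800$, and $n = 50$; 98% confidence gives $z = 2.33$.
$$\bar{x} - z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
$$34.3 - 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}} \le \mu \le 34.3 + 2.33\frac{8}{\sqrt{50}}\sqrt{\frac{800-50}{800-1}}$$
$$34.3 - 2.554 \le \mu \le 34.3 + 2.554$$
$$31.75 \le \mu \le 36.85$$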
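Because the sample of 50 is drawn from a finite population of 800, the solution multiplies the standard error by the finite population correction factor. A minimal sketch of that calculation follows; `z_interval_finite` is an illustrative name.

```python
import math

def z_interval_finite(x_bar: float, sigma: float, n: int, N: int, z: float) -> tuple[float, float]:
    """Confidence interval for the mean with the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction factor
    margin = z * (sigma / math.sqrt(n)) * fpc
    return x_bar - margin, x_bar + margin

# Problem 2: x_bar = 34.3, sigma = 8, n = 50, N = 800, z = 2.33 (98% confidence)
lower, upper = z_interval_finite(34.3, 8, 50, 800, 2.33)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The correction factor is always below 1, so it narrows the interval relative to the infinite-population formula.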
Estimating the Mean of a Normal
Population: Sample Size is Small (n<30)
The distribution of sample means is approximately
normal if the population has a normal distribution.

The z formulas can be used to estimate a population
mean if the value of the population Standard
Deviation is known.
Problem #3
Suppose a car rental firm wants to estimate the
average distance travelled per day by each of its
cars. Data from a random sample of 20 cars
reveal that the sample mean travel distance per
day is 85.5 km, with a population standard
deviation of 19.3 km. Assume that the distance
travelled per day is normally distributed in
the population.
Compute a 99% confidence interval to estimate the
population mean.
Answer: $74.4 \le \mu \le 96.6$ (km)
Problem ??
The Greensboro Coliseum is considering
expanding its seating capacity and needs to know
both the average number of people who attend
events there and the variability in this number.
The following are the attendances in thousands
at nine randomly selected sporting events. Find
the point estimates of the mean and the variance
of the population from which the sample was
drawn.
8.8 14.0 21.3 7.9 12.5 20.6 16.3 14.1 13.0
Answer: mean = 14.278 thousand; variance = 21.119
Problem ??
The National Bank of Lincoln is trying to
determine the number of tellers available during
the lunch rush on Fridays. The bank has
collected data on the number of people who
entered the bank during the last 3 months on
Friday from 11 a.m. to 1 p.m. Using the data
below, find the point estimates of the mean and
standard deviation of the population from which
the sample was drawn.
242, 275, 289, 306, 342, 385, 279, 245, 269, 305,
294, 328
Answer: $\bar{x}$ = 296.58 people; s = 40.75
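The point estimates in the two preceding problems (coliseum attendance and bank customers) can be checked with the standard library. The sketch assumes the usual sample estimators, which divide by n - 1; the variable names are illustrative.

```python
import statistics

# Coliseum attendance, in thousands
attendance = [8.8, 14.0, 21.3, 7.9, 12.5, 20.6, 16.3, 14.1, 13.0]
# People entering the bank on Fridays, 11 a.m. to 1 p.m.
customers = [242, 275, 289, 306, 342, 385, 279, 245, 269, 305, 294, 328]

attendance_mean = statistics.mean(attendance)
attendance_var = statistics.variance(attendance)  # sample variance (n - 1)
customers_mean = statistics.mean(customers)
customers_sd = statistics.stdev(customers)        # sample std dev (n - 1)

print(f"attendance: mean = {attendance_mean:.3f}, variance = {attendance_var:.3f}")
print(f"customers:  mean = {customers_mean:.2f}, s = {customers_sd:.2f}")
```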
Problem ??
Bobby wants to purchase a used car. He randomly selected 125
want ads and found that the average price of a car in this sample
was Rs.1.75 lakhs. He knows that the standard deviation of the
used-car prices in the city is Rs.33500.

(a) Establish an interval estimate for the average price of a car so
that Bobby can be 68.3 percent certain that the population mean
lies within this interval.

Answer: (a) Rs.172003.6 to Rs.177996.3
(b) Establish an interval estimate for the average price of a car so
that Bobby can be 95.5 percent certain that the population mean
lies within this interval.
Answer: (b) Rs.169007.3 to Rs.180992.7
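The two answers correspond to intervals of one and two standard errors around the sample mean. The sketch below reproduces them, reading Rs.1.75 lakhs as Rs.175,000; the variable names are illustrative.

```python
import math

x_bar = 175_000   # sample mean price: Rs.1.75 lakhs
sigma = 33_500    # population standard deviation in rupees
n = 125           # number of want ads sampled

se = sigma / math.sqrt(n)  # standard error of the mean

interval_68 = (x_bar - se, x_bar + se)          # about 68.3% confidence
interval_95 = (x_bar - 2 * se, x_bar + 2 * se)  # about 95.5% confidence

print(f"standard error: {se:.2f}")
print(f"68.3%: {interval_68[0]:.1f} to {interval_68[1]:.1f}")
print(f"95.5%: {interval_95[0]:.1f} to {interval_95[1]:.1f}")
```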
Session 14
Problem ??
The Westview High School Principal is interested in knowing the
average height of seniors at this school, but she does not have
enough time to examine the records of all 430 seniors. It is
assumed that the height of seniors follows normal distribution.
She randomly selects 48 students. She finds the sample mean to
be 64.5 inches and the standard deviation to be 2.3 inches.

(a) Find the estimated standard error of the mean

Answer: (a) 0.31326 inches
(b) Construct a 90 percent confidence interval for the mean.
Answer: (b) 63.986 to 65.014 inches
t Distribution
When the population standard deviation is unknown and the
sample size is small (n < 30), use the t distribution.
Early theoretical work on t distribution was done by
W.S. Gosset in early 1900s (Guinness Brewery, Dublin)
t distribution is used instead of the z distribution for
doing inferential statistics on the population mean
when the population Std Dev is unknown and the
population is normally distributed
With the t distribution, you use the sample Std Dev, s.
t formula:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
A family of distributions: a unique distribution for
each value of its parameter, the degrees of freedom
(d.f.)
t Distribution
t distributions are symmetric and unimodal with mean = 0;
they are flatter in the middle and have more area in their tails
than the normal distribution.
t distributions approach the normal curve as n becomes
larger.
The t distribution is to be used when the population variance
or population Std Dev is unknown, regardless of the size
of the sample.
t Distribution Characteristics
The t table uses the area in the tail of the distribution.
Emphasis in the t table is on $\alpha$, and each tail of the
distribution contains $\alpha/2$ of the area under the curve
when confidence intervals are constructed.
t values are located at the intersection of the df
value and the selected $\alpha/2$ value.
Reading the t Distribution
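Confidence Intervals for $\mu$ of a Normal Population: Unknown $\sigma$
$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
or
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
$$df = n - 1$$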
Table of Critical Values of t
[Figure: t distribution with area $\alpha$ in the upper tail beyond $t_\alpha$]
With df = 24 and $\alpha$ = 0.05, $t_\alpha$ = 1.711.

df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.656
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.327    2.576

Problem #4
The owner of a large equipment rental company wants to
make a rather quick estimate of the average number of days
a piece of ditch digging equipment is rented out per person
per time. The company has records of all rentals, but the
amount of time required to conduct an audit of all accounts
would be prohibitive. The owner decides to take a random
sample of rental invoices. Fourteen different rentals of ditch
diggers are selected randomly from the files, yielding the
following data. The owner uses these data to construct a
99% confidence interval to estimate the average number of
days that a ditch digger is rented and assumes that the
number of days per rental is normally distributed in the
population.

3 1 3 2 5 1 2 1 4 2 1 3 1 1
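Solution for Problem #4
$\bar{x} = 2.14$, $s = 1.29$, $n = 14$, $df = n - 1 = 13$
$$\frac{\alpha}{2} = \frac{1 - .99}{2} = 0.005$$
$$t_{.005,13} = 3.012$$
$$\bar{x} - t\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t\frac{s}{\sqrt{n}}$$
$$2.14 - 3.012\frac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012\frac{1.29}{\sqrt{14}}$$
$$2.14 - 1.04 \le \mu \le 2.14 + 1.04$$
$$1.10 \le \mu \le 3.18$$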
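The Problem #4 solution can be recomputed from the raw rental data. This sketch takes the critical value t = 3.012 (df = 13, 0.005 in each tail) from the solution as a constant, because the Python standard library has no t quantile function; the variable names are illustrative.

```python
import math
import statistics

# Days rented for the 14 sampled ditch-digger rentals
days = [3, 1, 3, 2, 5, 1, 2, 1, 4, 2, 1, 3, 1, 1]
t_critical = 3.012  # t for df = 13 and alpha/2 = 0.005, taken from the t table

n = len(days)
x_bar = statistics.mean(days)
s = statistics.stdev(days)  # sample standard deviation (n - 1 denominator)

margin = t_critical * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, df = {n - 1}")
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```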
Problem ??
Suppose a researcher wants to estimate the average amount of
extra working hours (beyond their 40-hour week) used per week
for managers in the aerospace industry. He randomly samples 18
managers and measures the amount of extra time they work
during a specific week and obtains the results (in hours) as shown
below:
6 21 17 20 7 0 8 16 29
3 8 12 11 9 21 25 15 16
Construct a 90% confidence interval to estimate the average
amount of extra time per week worked by a manager
Answer: $t_{0.05,17} = 1.740$; $10.356 \le \mu \le 16.754$

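Confidence Interval to Estimate the Population Proportion
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where:
$\hat{p}$ = sample proportion
$\hat{q} = 1 - \hat{p}$
$p$ = population proportion
$n$ = sample size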
Estimates of the population proportion often
must be made.
Problem #5
A clothing company produces men's jeans. The jeans
are made and sold with either a regular cut or a boot
cut. In an effort to estimate the proportion of their
men's jeans market in Oklahoma City that prefers
boot-cut jeans, the analyst takes a random sample
of 423 jeans sales from the company's two Oklahoma
City retail outlets. Only 72 of the sales were for
boot-cut jeans. Construct a 90% confidence interval
to estimate the proportion of the population in
Oklahoma City who prefer boot-cut jeans.


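Solution Problem #5
$n = 423$, $x = 72$, $\hat{p} = \frac{x}{n} = \frac{72}{423} = 0.17$
$\hat{q} = 1 - \hat{p} = 1 - 0.17 = 0.83$
90% confidence gives $z = 1.645$.
$$\hat{p} - z\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.17 - 1.645\sqrt{\frac{(0.17)(0.83)}{423}} \le p \le 0.17 + 1.645\sqrt{\frac{(0.17)(0.83)}{423}}$$
$$0.17 - 0.03 \le p \le 0.17 + 0.03$$
$$0.14 \le p \le 0.20$$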
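A sketch of the same proportion interval in code. It keeps the sample proportion unrounded (72/423 rather than 0.17), so intermediate values differ slightly from the hand calculation while the rounded limits agree; `proportion_interval` is an illustrative name.

```python
import math

def proportion_interval(successes: int, n: int, z: float) -> tuple[float, float]:
    """Confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Problem #5: 72 boot-cut sales out of 423, z = 1.645 (90% confidence)
lower, upper = proportion_interval(72, 423, 1.645)
print(f"{lower:.2f} <= p <= {upper:.2f}")
```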
Determining Sample Size when Estimating $\mu$
It may be necessary to estimate the sample size
when working on a project.
In studies where $\mu$ is being estimated, the size of the
sample can be determined by using the z formula for
sample means to solve for n.
The difference between $\bar{x}$ and $\mu$ is the error of estimation.
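Determining Sample Size when Estimating $\mu$
z formula:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Error of estimation (tolerable error):
$$E = \bar{x} - \mu$$
Estimated sample size:
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
Estimated $\sigma$:
$$\sigma \approx \frac{1}{4}\,\text{range}$$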
Problem #6
Suppose you want to estimate the average age of all
Boeing 737-300 airplanes now in active domestic U.S.
service. You want to be 95% confident, and you want
your estimate to be within one year of the actual
figure. The 737-300 was first placed in service about
24 years ago, but you believe that no active 737-300s
in the U.S. domestic fleet are more than 20 years old.
How large of a sample should you take?
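Solution for Problem 6
$E = 1$, $z = 1.96$, and $\sigma \approx \frac{1}{4}(\text{range}) = \frac{1}{4}(20) = 5$
$$n = \frac{z^2\sigma^2}{E^2} = \frac{(1.96)^2(5)^2}{1^2} = 96.04 \text{ or } 97$$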
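The sample-size calculation always rounds up, since a fractional observation cannot be sampled. A minimal sketch follows, with sigma estimated as one quarter of the 20-year range; `sample_size_for_mean` is an illustrative name.

```python
import math

def sample_size_for_mean(z: float, sigma: float, error: float) -> int:
    """Smallest n whose margin of error z*sigma/sqrt(n) is within `error`."""
    return math.ceil((z * sigma / error) ** 2)

sigma_estimate = 20 / 4  # range / 4, with the range taken as 20 years
n = sample_size_for_mean(1.96, sigma_estimate, 1)
print(n)
```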
Determining Sample Size when Estimating p
z formula:
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$$
Error of estimation (tolerable error):
$$E = \hat{p} - p$$
Estimated sample size:
$$n = \frac{z^2 pq}{E^2}$$
Problem #7
Hewitt Associates conducted a national survey to
determine the extent to which employers are
promoting health and fitness among their employees.
One of the questions asked was, "Does your company
offer on-site exercise classes?" Suppose it was
estimated before the study that no more than 40% of
the companies would answer "Yes." How large a sample
would Hewitt Associates have to take in estimating
the population proportion to ensure a 98% confidence
in the results and to be within .03 of the true
population proportion?
Solution for Problem 7
$E = 0.03$
98% confidence gives $Z = 2.33$
Estimated $P = 0.40$
$Q = 1 - P = 0.60$
$$n = \frac{z^2 pq}{E^2} = \frac{(2.33)^2(0.40)(0.60)}{(.03)^2} = 1,447.7 \text{ or } 1,448$$
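The same rounding-up rule applies when the target is a proportion. The sketch below reproduces the Problem 7 figure using the table value z = 2.33; `sample_size_for_proportion` is an illustrative name.

```python
import math

def sample_size_for_proportion(z: float, p: float, error: float) -> int:
    """Smallest n that keeps the margin of error for a proportion within `error`."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / error ** 2)

# Problem 7: z = 2.33 (98% confidence), estimated p = 0.40, E = 0.03
n = sample_size_for_proportion(2.33, 0.40, 0.03)
print(n)
```

When no prior estimate of p is available, p = 0.5 maximizes p times q and therefore gives the most conservative (largest) sample size.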