
Introduction to Probability and Statistics

Handout #7

Instructor: Lingzhou Xue

TA: Daniel Eck

The pdf file for this class is available on the class web page.


Chapter 8
Fundamental Sampling Distributions and
Data Descriptions


• Sample Mean $\bar{X}$ and Sample Variance $S^2$.

• Histogram and Box Plot.

• Central Limit Theorem (CLT).

• $\chi^2$, t, and F Distributions.


Example 1: Sample Distribution
The sample distribution is the distribution resulting from the
collection of actual data. A major characteristic of a sample is
that it contains a finite (countable) number of scores, the number of scores being represented by the letter n. For example, suppose
that the following data were collected:
15 14 15 18 15 20 15 16 17 14 17 13 11 14 18 12 17 12 21 8
14 17 14 12 13 15 15 16 17 14 16 13 14 15 18 16 16 17 14 15
16 15 17 12 14 14 13 13 13 14
These numbers constitute a sample distribution.


[Figure: Sample Distribution. Histogram of the data above, density plotted against x.]

In addition to the frequency distribution, the sample distribution can be described with numbers, called statistics. Examples of statistics are the mean, median, mode, standard deviation, range, and correlation coefficient, among others. If a different sample were taken, different scores would result. However, there would also be some consistency: while the statistics would not be exactly the same, they would be similar. To achieve order in this chaos, statisticians have developed probability models.
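A minimal sketch, assuming only Python's standard `statistics` module (the variable names are illustrative), computing several of the statistics named above for the 50 scores of Example 1:

```python
import statistics

# The 50 scores collected in Example 1.
scores = [15, 14, 15, 18, 15, 20, 15, 16, 17, 14, 17, 13, 11, 14, 18, 12, 17, 12, 21, 8,
          14, 17, 14, 12, 13, 15, 15, 16, 17, 14, 16, 13, 14, 15, 18, 16, 16, 17, 14, 15,
          16, 15, 17, 12, 14, 14, 13, 13, 13, 14]

n = len(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)   # sample variance, divisor n - 1
std_dev = statistics.stdev(scores)       # square root of the sample variance

print(f"n={n}, mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"variance={variance:.4f}, standard deviation={std_dev:.4f}")
```

A different sample of 50 scores would give slightly different values, which is exactly the variability that sampling distributions describe.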

[Figure: four panels titled "Histogram", each plotting density against x.]

Random Sampling

Population
A population consists of the totality of the observations with which we are concerned. It is the entire group we are interested in, which we wish to describe or draw conclusions about.

Sample
A sample is a subset of a population.

[Figure: 2008 Presidential Race from CNN.]

Example 2
If you wanted to find out the percentage of students at UMN who enjoy reading Time, the population would be all of the students who attend UMN. If we randomly select 20% of the population, this selection would be the sample in this experiment.

Random Sample
Let $X_1, X_2, \ldots, X_n$ be n independent random variables, each having the same probability distribution $f(x)$. Define $X_1, X_2, \ldots, X_n$ to be a random sample of size n from the population $f(x)$ and write its joint probability distribution as
$$g(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$

A simple random sample of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
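A minimal sketch of drawing a simple random sample with Python's `random.sample`, which selects without replacement so that every subset of size n is equally likely. The population of 1,000 numbered students is hypothetical, standing in for the UMN population of Example 2:

```python
import random

# Hypothetical population: 1,000 students identified by ID numbers 1..1000.
population = list(range(1, 1001))

rng = random.Random(2024)            # seeded so the draw is reproducible
n = len(population) // 5             # a 20% sample, as in Example 2
sample = rng.sample(population, n)   # without replacement: every n-subset equally likely

print(n, sorted(sample)[:10])
```

Seeding the generator is only for reproducibility; an actual survey would draw with a fresh, unpredictable seed.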

Some Important Statistics

Statistic
A statistic is a function of random variables that does not depend upon any unknown parameter.

Sample Mean & Sample Variance
If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample mean is defined by the statistic
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and the sample variance is defined by the statistic
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Example 3
A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.
Solution:
$$\bar{x} = \frac{1}{4}\sum_{i=1}^{4} x_i = \frac{64}{4} = 16 \text{ cents},$$
$$s^2 = \frac{1}{4-1}\sum_{i=1}^{4}(x_i - \bar{x})^2 = \frac{(-4)^2 + (-1)^2 + 1^2 + 4^2}{3} = \frac{34}{3}.$$
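The two definitions translate directly into code. A minimal sketch (the function names are illustrative) that follows the formulas literally and reproduces Example 3:

```python
def sample_mean(xs):
    """X-bar = (1/n) * sum of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2 = sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Example 3: price increases (cents) at 4 grocery stores.
increases = [12, 15, 17, 20]
print(sample_mean(increases))      # 16.0
print(sample_variance(increases))  # 34/3, about 11.33
```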

Theorem
If $S^2$ is the variance of a random sample of size n, we may write
$$S^2 = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right].$$
Proof:
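One way to carry out the proof is to expand the square and use $\sum_{i=1}^{n} X_i = n\bar{X}$:
$$\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = \sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2.$$
Dividing both sides by $n - 1$ gives
$$S^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2\right] = \frac{1}{n(n-1)}\left[n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2\right].$$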

Example 4
Find the sample mean and variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.
Solution: Here n = 6, $\sum_{i=1}^{6} x_i = 31$ and $\sum_{i=1}^{6} x_i^2 = 171$, so $\bar{x} = 31/6 \approx 5.17$. Hence
$$s^2 = \frac{1}{6 \cdot 5}\left[6 \cdot 171 - 31^2\right] = \frac{13}{6}.$$
Thus the sample standard deviation is $s = \sqrt{13/6} = 1.47$.
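A minimal sketch of the computational formula from the theorem above, checked against Example 4 (the function name is illustrative). With integer data the formula is exact; for large floating-point values, the two-pass definition is numerically safer because the shortcut subtracts two nearly equal large numbers.

```python
import math

def variance_shortcut(xs):
    """S^2 = [n * sum(x^2) - (sum x)^2] / (n (n - 1))  (computational formula)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (n * total_sq - total ** 2) / (n * (n - 1))

trout = [3, 4, 5, 6, 6, 7]                       # Example 4
s2 = variance_shortcut(trout)
print(sum(trout), sum(x * x for x in trout))     # 31 171
print(s2, math.sqrt(s2))                         # 13/6 and about 1.47
```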

Example 5
The numbers of incorrect answers on a true-false competency test for a random sample of 15 students were recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, and 2. Find
• the sample mean;
• the sample variance.

Mode
The mode of a list of numbers is the number (or numbers) that occurs most frequently. A trick to remember this one is that mode starts with the same first two letters that most does: most frequently, mode. You'll never forget that one!

Median
The median is the middle value in your list. Remember to line up your values first, that is, sort the list into increasing order; the middle number is then the median. Be sure to remember the odd and even rule. When the number of entries in the list is odd, the median is the middle entry of the sorted list. When the number of entries is even, the median is the sum of the two middle entries of the sorted list divided by two.
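A minimal sketch of the odd/even rule for the median and of the mode (the function names are illustrative). Since a list can have several values tied for most frequent, this version returns all of them:

```python
from collections import Counter

def median(values):
    """Sort first; odd count -> middle entry, even count -> mean of the two middle entries."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """Every value that occurs most frequently (a list may have several modes)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(median([7, 1, 3]))        # odd count: 3
print(median([7, 1, 3, 10]))    # even count: (3 + 7) / 2 = 5.0
print(modes([2, 4, 4, 5, 5]))   # [4, 5]
```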

Data Displays and Graphical Methods

Box Plot
A box plot (also known as a box-and-whisker diagram or plot, or candlestick chart) is a convenient way of graphically depicting the five-number summary, which consists of the 25th percentile (lower quartile or first quartile, Q1), the median, the 75th percentile (upper quartile or third quartile, Q3), and the smallest and largest values. In addition, the box plot indicates which observations, if any, are considered unusual, or outliers.

Outlier
Outliers are observations that are considered to be unusually far from the bulk of the data. Technically, one may view an outlier as being an observation that represents a "rare event." If its distance from the box exceeds 1.5 times the interquartile range, Q3 − Q1 (in either direction), the observation may be labeled an outlier.
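A minimal sketch of the 1.5 × IQR rule. It assumes one common quartile convention (Q1 and Q3 as medians of the lower and upper halves, the middle value excluded when n is odd); textbooks and software packages differ slightly, so quartiles can vary a little. The data set is hypothetical:

```python
def _median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    """(Q1, median, Q3) with Q1, Q3 taken as medians of the lower and upper halves."""
    s = sorted(values)
    if len(s) < 2:
        raise ValueError("need at least two observations")
    half = len(s) // 2
    lower, upper = s[:half], s[half + len(s) % 2:]
    return _median(lower), _median(s), _median(upper)

def outliers(values, k=1.5):
    """Observations farther than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

data = [2, 3, 3, 4, 5, 5, 6, 30]   # hypothetical data with one suspicious value
print(quartiles(data))             # (3.0, 4.5, 5.5)
print(outliers(data))              # [30]
```

Here IQR = 2.5, so the fences are −0.75 and 9.25, and only 30 lies outside them.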

[Figure: Box Plot.]

Example 6
The following numbers give how many marbles each of fifteen different boys owns (they are arranged from least to greatest):
18 27 34 52 54 59 61 68 78 82 85 87 91 93 100
• Find the median.
• Find the lower quartile.
• Find the upper quartile.
• Find the interquartile range.
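A minimal sketch using Python's `statistics.quantiles`. Its default method, "exclusive", places the quartiles at positions (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 of the sorted data, which for these 15 values are exactly the 4th, 8th and 12th observations; other quartile conventions can give slightly different answers on other data sets.

```python
import statistics

marbles = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

q1, med, q3 = statistics.quantiles(marbles, n=4)   # default method="exclusive"
iqr = q3 - q1

# 1.5 x IQR fences for flagging outliers.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
fence_outliers = [x for x in marbles if x < low or x > high]

print(med, q1, q3, iqr)     # 68.0 52.0 87.0 35.0
print(fence_outliers)       # []
```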

[Figure: Box-and-Whisker Plot for Example 6.]

Sampling Distribution of Means

Sampling Distribution
The probability distribution of a statistic is called a sampling distribution.

Central Limit Theorem
If $\bar{X}$ is the mean of a random sample of size n taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$
as $n \to \infty$, is the standard normal distribution $N(0, 1)$.

In practice, if we are sampling from a population with unknown distribution, either finite or infinite, the sampling distribution of $\bar{X}$ will still be approximately normal with mean $\mu$ and variance $\sigma^2/n$, provided that the sample size is large (a common rule of thumb is n > 30).

Example 7
Let $\bar{X}$ be the sample mean of a random sample of size 100 drawn from an exponential distribution with $\beta = 4$, whose density is
$$f(x) = \frac{1}{4}e^{-x/4}, \quad x > 0.$$
[Figure: Exponential p.d.f. with β = 4.]

Decide which of the graphs labeled (a)-(d) would most closely resemble the sampling distribution of the sample mean $\bar{X}$. Explain briefly your reasoning.
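One way to build intuition for Example 7 is a simulation sketch (our own, using only the Python standard library): draw many samples of size 100 from the exponential density with $\beta = 4$ and look at the resulting sample means. By the Central Limit Theorem they should cluster around $\mu = 4$ with standard deviation near $\sigma/\sqrt{n} = 4/10 = 0.4$, in a roughly bell-shaped pattern, even though the population itself is strongly skewed.

```python
import random
import statistics

rng = random.Random(7)                # seeded for reproducibility
beta, n, reps = 4.0, 100, 2000

# random.expovariate takes the rate 1/beta, so each draw has mean beta = 4.
means = [statistics.fmean(rng.expovariate(1 / beta) for _ in range(n)) for _ in range(reps)]

mean_of_means = statistics.fmean(means)    # should be close to mu = 4
sd_of_means = statistics.stdev(means)      # should be close to 4 / sqrt(100) = 0.4
# Share of sample means within 1.96 standard errors of 4: near 0.95 if X-bar is roughly normal.
share_within = sum(abs(m - 4) < 1.96 * 0.4 for m in means) / reps
print(round(mean_of_means, 3), round(sd_of_means, 3), round(share_within, 3))
```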

Example 8
An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
Solution:
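Because the population is normal, $\bar{X}$ is normal with mean 800 and standard deviation $40/\sqrt{16} = 10$. A minimal sketch of the calculation with Python's `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu, sigma, n = 800, 40, 16
se = sigma / math.sqrt(n)               # standard deviation of X-bar: 10
z = (775 - mu) / se                     # standardized value
p = NormalDist(mu, se).cdf(775)         # P(X-bar < 775) = P(Z < z)
print(z, round(p, 4))                   # -2.5 0.0062
```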

Example 9
The blood cholesterol levels of a population of workers have mean 202 and standard deviation 14.
1. If a sample of 36 workers is selected, approximate the probability that the sample mean of their blood cholesterol levels will lie between 198 and 206.
2. Repeat 1 when the sample size is 64.

Solution:
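A minimal sketch for Example 9, assuming (as the Central Limit Theorem suggests for these sample sizes) that $\bar{X}$ is approximately $N(202, 14^2/n)$; the helper name is illustrative:

```python
import math
from statistics import NormalDist

def prob_mean_between(lo, hi, mu, sigma, n):
    """CLT approximation: X-bar is approximately N(mu, sigma^2 / n)."""
    dist = NormalDist(mu, sigma / math.sqrt(n))
    return dist.cdf(hi) - dist.cdf(lo)

p36 = prob_mean_between(198, 206, mu=202, sigma=14, n=36)
p64 = prob_mean_between(198, 206, mu=202, sigma=14, n=64)
print(round(p36, 4), round(p64, 4))
```

The larger sample concentrates $\bar{X}$ more tightly around 202 (standard error 1.75 instead of about 2.33), so the second probability is the larger one.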

Sampling Distribution: Difference Between Two Averages
If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the differences of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad\text{and}\quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
Hence
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
is approximately a standard normal variable.

Example 10
The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
Solution:
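A minimal sketch applying the difference-of-means result to Example 10: $\bar{X}_A - \bar{X}_B$ is treated as approximately normal with mean $6.5 - 6.0 = 0.5$ and variance $0.9^2/36 + 0.8^2/49$.

```python
import math
from statistics import NormalDist

mu_a, sd_a, n_a = 6.5, 0.9, 36
mu_b, sd_b, n_b = 6.0, 0.8, 49

mean_diff = mu_a - mu_b                                 # 0.5
sd_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)  # about 0.189
p = 1 - NormalDist(mean_diff, sd_diff).cdf(1.0)         # P(Xbar_A - Xbar_B >= 1)
print(round(sd_diff, 4), round(p, 4))
```

Standardizing by hand gives $z = (1 - 0.5)/0.189 \approx 2.65$, so the probability is small, roughly 0.004.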

Example 11
The mean score for freshmen on an aptitude test at a certain college is 540, with a standard deviation of 50. What is the probability that two groups of students selected at random, consisting of 64 and 100 students, respectively, will differ in their mean scores by
1. more than 10 points?
2. an amount between 5 and 10 points?

Solution:
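A minimal sketch for Example 11. Both groups come from the same population, so $\bar{X}_1 - \bar{X}_2$ is approximately normal with mean 0 and variance $50^2/64 + 50^2/100$; "differ by" is read as a difference in either direction, so each probability doubles one tail:

```python
import math
from statistics import NormalDist

sigma, n1, n2 = 50, 64, 100
sd_diff = math.sqrt(sigma ** 2 / n1 + sigma ** 2 / n2)   # about 8.0
diff = NormalDist(0, sd_diff)                            # approximate law of Xbar1 - Xbar2

p_more_than_10 = 2 * (1 - diff.cdf(10))                  # P(|Xbar1 - Xbar2| > 10)
p_5_to_10 = 2 * (diff.cdf(10) - diff.cdf(5))             # P(5 < |Xbar1 - Xbar2| < 10)
print(round(sd_diff, 4), round(p_more_than_10, 4), round(p_5_to_10, 4))
```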

Sampling Distribution of $S^2$

Sampling Distribution of $S^2$
If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then the statistic
$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\frac{(X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.
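A simulation sketch (ours, standard library only) that checks the theorem empirically: for normal samples of size n = 10 with $\sigma = 2$, the simulated values of $(n-1)S^2/\sigma^2$ should have mean close to $\nu = 9$ and variance close to $2\nu = 18$, the mean and variance of a chi-squared variable with 9 degrees of freedom. The sample size, $\sigma$ and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)
n, sigma, reps = 10, 2.0, 10000

chi2_values = []
for _ in range(reps):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]   # a normal sample of size n
    s2 = statistics.variance(xs)                     # divisor n - 1
    chi2_values.append((n - 1) * s2 / sigma ** 2)

mean_chi2 = statistics.fmean(chi2_values)     # near nu = 9
var_chi2 = statistics.variance(chi2_values)   # near 2 * nu = 18
print(round(mean_chi2, 2), round(var_chi2, 2))
```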

Example 12
Find the probability that a random sample of 21 observations, from a normal population with variance $\sigma^2 = 5$, will have a variance $s^2$
1. greater than 2.065;
2. between 2.065 and 3.6445.
Solution:
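Here $\chi^2 = (n-1)S^2/\sigma^2 = 20S^2/5 = 4S^2$ with $\nu = 20$. A minimal sketch, valid only for an even number of degrees of freedom, uses the closed-form tail $P(\chi^2_\nu > x) = e^{-x/2}\sum_{k=0}^{\nu/2-1} (x/2)^k/k!$:

```python
import math

def chi2_upper_tail_even(x, nu):
    """P(chi^2_nu > x) for a positive even nu: exp(-x/2) * sum_{k=0}^{nu/2-1} (x/2)^k / k!."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form needs a positive even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 0.0
    for k in range(nu // 2):
        if k > 0:
            term *= half / k          # term is now (x/2)^k / k!
        total += term
    return math.exp(-half) * total

n, sigma2 = 21, 5
nu = n - 1

def chi2_from_s2(s2):
    """(n - 1) s^2 / sigma^2, which equals 4 s^2 in this example."""
    return (n - 1) * s2 / sigma2

p_greater = chi2_upper_tail_even(chi2_from_s2(2.065), nu)
p_between = p_greater - chi2_upper_tail_even(chi2_from_s2(3.6445), nu)
print(round(p_greater, 3), round(p_between, 3))
```

The transformed cut-offs are $4 \cdot 2.065 = 8.26$ and $4 \cdot 3.6445 = 14.578$, which match the table values $\chi^2_{0.99}(20) = 8.260$ and $\chi^2_{0.80}(20) = 14.578$, so the answers are about 0.99 and $0.99 - 0.80 = 0.19$.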

t-Distribution

t-Distribution
Let Z be a standard normal random variable and V a chi-squared random variable with $\nu$ degrees of freedom. If Z and V are independent, then the distribution of the random variable T, where
$$T = \frac{Z}{\sqrt{V/\nu}},$$
is given by the density function
$$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty.$$
This is known as the t-distribution with $\nu$ degrees of freedom.
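A minimal sketch evaluating the density $h(t)$ with Python's `math.lgamma` (working on the log scale avoids overflow in the gamma functions for large $\nu$). Comparing it with the standard normal density shows the heavier tails for small $\nu$ and the near agreement for large $\nu$:

```python
import math

def t_pdf(t, nu):
    """Density h(t) of the t-distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def std_normal_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for nu in (2, 5, 100):
    print(nu, round(t_pdf(0, nu), 4), round(t_pdf(3, nu), 4))
print("N(0,1)", round(std_normal_pdf(0), 4), round(std_normal_pdf(3), 4))
```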

[Figure: t Distributions. The t-distribution curves for ν = 2, 5, and 100, density f(x) plotted against x for −3 ≤ x ≤ 3.]

Corollary
Let $X_1, X_2, \ldots, X_n$ be independent random variables that are all normal with mean $\mu$ and standard deviation $\sigma$. Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2.$$
Then the random variable $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with $\nu = n - 1$ degrees of freedom.

Example 13
Find k such that $P(k < T < -1.761) = 0.045$ for a random sample of size 15 selected from a normal distribution with mean $\mu$, where $T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$.
Solution:
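Without a t-table, the same answer can be found numerically. A minimal sketch (our own helper names) that integrates the t density with Simpson's rule to get $P(T < x)$ and then solves for k by bisection, using $P(k < T < -1.761) = P(T < -1.761) - P(T < k)$ with $\nu = 14$:

```python
import math

def t_pdf(t, nu):
    """t density h(t) with nu degrees of freedom (normalizing constant via log-gamma)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T < x) = 1/2 +/- the integral of h from 0 to |x| (by symmetry), via Simpson's rule."""
    b = abs(x)
    h = b / steps                     # steps is even, as Simpson's rule requires
    area = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, nu, lo=-50.0, hi=50.0):
    """Solve t_cdf(k, nu) = p for k by bisection; t_cdf is increasing in k."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

nu = 15 - 1
p_below = t_cdf(-1.761, nu)              # P(T < -1.761), about 0.05
k = t_quantile(p_below - 0.045, nu)      # P(T < k) = p_below - 0.045, about 0.005
print(round(p_below, 4), round(k, 3))
```

The result agrees with the table reasoning: $-1.761 = -t_{0.05}(14)$, so $P(T < k) = 0.005$ and $k = -t_{0.005}(14) \approx -2.977$.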

What Is the t-Distribution Used for?
The t-distribution is used extensively in problems that deal with inference about the population mean or in problems that involve comparative samples (i.e., in cases where one is trying to determine if means from two samples are significantly different). The reader should note that the use of the t-distribution for the statistic
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
requires that $X_1, X_2, \ldots, X_n$ be normal.

Example 14
Suppose scores on an IQ test are normally distributed, with a mean of 100. Suppose 25 people are randomly selected and tested. The standard deviation in the sample group is 25. What is the probability that the average test score in the sample group will be at most 110.3?
Solution:
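Since only the sample standard deviation is known, the relevant statistic is $T = (\bar{X} - 100)/(s/\sqrt{25})$ with $\nu = 24$. A minimal sketch (helper names are ours) that evaluates $T$ at 110.3 and integrates the t density numerically:

```python
import math

def t_pdf(t, nu):
    """t density h(t) with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T < x) via symmetry and Simpson's rule on [0, |x|]."""
    b = abs(x)
    h = b / steps
    area = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n, mu, s, xbar = 25, 100, 25, 110.3
t_stat = (xbar - mu) / (s / math.sqrt(n))   # 10.3 / 5 = 2.06
p = t_cdf(t_stat, n - 1)                    # P(X-bar <= 110.3) = P(T <= 2.06)
print(round(t_stat, 2), round(p, 3))
```

Since $t_{0.025}(24) = 2.064$, the probability is about $1 - 0.025 = 0.975$.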

Note for $\chi^2_\alpha(\nu)$
In the textbook, $\chi^2_\alpha$ represents the $\chi^2$-value above which we find an area equal to $\alpha$. That is, we have $\alpha = P(\chi^2 > \chi^2_\alpha(\nu))$.

Note for $t_\alpha(\nu)$
In the textbook, $t_\alpha$ represents the t-value above which we find an area equal to $\alpha$. That is, we use $\alpha = P(T > t_\alpha(\nu))$.

Note for $f_\alpha(\nu_1, \nu_2)$
In the textbook, $f_\alpha$ represents the f-value above which we find an area equal to $\alpha$. That is, we have $\alpha = P(F > f_\alpha(\nu_1, \nu_2))$.

Example 13b
Consider $T = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ for a random sample of size n = 8.
• Calculate $P(T < -2.517)$ and $P(-2.998 < T < 3.499)$.
• Find k such that $P(k < T < 2.517) = 0.975$.
Solution:
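A minimal numerical check for Example 13b with $\nu = 7$, using our own Simpson's-rule t CDF and a bisection solver. The table values involved are $t_{0.02}(7) = 2.517$, $t_{0.01}(7) = 2.998$ and $t_{0.005}(7) = 3.499$.

```python
import math

def t_pdf(t, nu):
    """t density h(t) with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * nu)
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """P(T < x) via symmetry and Simpson's rule on [0, |x|]."""
    b = abs(x)
    h = b / steps
    area = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, nu, lo=-50.0, hi=50.0):
    """Solve t_cdf(k, nu) = p by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

nu = 8 - 1
p_a = t_cdf(-2.517, nu)                        # about 0.02
p_b = t_cdf(3.499, nu) - t_cdf(-2.998, nu)     # about 1 - 0.005 - 0.01 = 0.985
# P(k < T < 2.517) = P(T < 2.517) - P(T < k) = 0.975, so P(T < k) is about 0.005.
k = t_quantile(t_cdf(2.517, nu) - 0.975, nu)   # about -3.499
print(round(p_a, 4), round(p_b, 4), round(k, 3))
```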

F-Distribution
Let U and V be two independent random variables having chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively. Then the distribution of the random variable $F = \dfrac{U/\nu_1}{V/\nu_2}$ is given by the density
$$f(x) = \begin{cases} \dfrac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \cdot \dfrac{x^{(\nu_1/2)-1}}{(1+\nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, & x > 0,\\[4pt] 0, & x \le 0. \end{cases}$$
This is known as the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom.
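A minimal sketch of the F density, computed on the log scale with `math.lgamma` and `math.log1p`, together with a rough Riemann-sum check that it integrates to about 1 for $(\nu_1, \nu_2) = (6, 10)$. The truncation at x = 200 is our own choice; the tail beyond it is negligible for these degrees of freedom.

```python
import math

def f_pdf(x, nu1, nu2):
    """Density of the F-distribution with (nu1, nu2) degrees of freedom; zero for x <= 0."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2)
             + (nu1 / 2) * math.log(nu1 / nu2))
    return math.exp(log_c + (nu1 / 2 - 1) * math.log(x)
                    - ((nu1 + nu2) / 2) * math.log1p(nu1 * x / nu2))

# Riemann sum over (0, 200]: the total area should be close to 1.
h = 0.001
area = sum(f_pdf(i * h, 6, 10) for i in range(1, 200_001)) * h
print(round(area, 4))
```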

[Figure: F Distributions. The F-distribution curves f(x) for several pairs (ν1, ν2), plotted for 0 ≤ x ≤ 5.]

Theorem
If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
has an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

What Is the F-Distribution Used for?
The F-distribution is used in two-sample situations to draw inferences about the population variances. However, the F-distribution is applied to many other types of problems in which the sample variances are involved. In fact, the F-distribution is called the variance ratio distribution.

Example 15
If $S_1^2$ and $S_2^2$ represent the variances of independent random samples of size $n_1 = 31$ and $n_2 = 25$, taken from normal populations with variances $\sigma_1^2 = 20$ and $\sigma_2^2 = 10$, respectively, find:
1. $P(S_2^2 < 7.526)$;
2. $P(S_1^2/S_2^2 > 3.88)$.
Solution:

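1. Since $24 S_2^2/10$ has a chi-squared distribution with 24 degrees of freedom,
$$P(S_2^2 < 7.526) = P\left(\frac{24 S_2^2}{10} < \frac{24 \cdot 7.526}{10}\right) = P(\chi^2_{24} < 18.062) = 1 - P(\chi^2_{24} > 18.062) = 1 - 0.80 = 0.20.$$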

2.
$$P\left(\frac{S_1^2}{S_2^2} > 3.88\right) = P\left(\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} > 3.88 \cdot \frac{\sigma_2^2}{\sigma_1^2}\right) = P\left(F(30, 24) > 3.88 \cdot \frac{10}{20}\right) = P(F(30, 24) > 1.94) = 0.05.$$
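A minimal numerical check of Example 15 from the problem data alone, with our own helpers: $24 S_2^2/10$ is chi-squared with 24 degrees of freedom (an even number, so the closed-form tail applies), and $(S_1^2/20)/(S_2^2/10)$ is $F(30, 24)$, whose CDF is obtained here by Simpson's rule on the density.

```python
import math

def chi2_upper_tail_even(x, nu):
    """P(chi^2_nu > x) for a positive even nu."""
    half, term, total = x / 2, 1.0, 0.0
    for k in range(nu // 2):
        if k > 0:
            term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

def f_pdf(x, nu1, nu2):
    """F density with (nu1, nu2) degrees of freedom; zero for x <= 0."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((nu1 + nu2) / 2) - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2)
             + (nu1 / 2) * math.log(nu1 / nu2))
    return math.exp(log_c + (nu1 / 2 - 1) * math.log(x)
                    - ((nu1 + nu2) / 2) * math.log1p(nu1 * x / nu2))

def f_cdf(x, nu1, nu2, steps=4000):
    """P(F < x) by Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    area = f_pdf(0.0, nu1, nu2) + f_pdf(x, nu1, nu2)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * f_pdf(i * h, nu1, nu2)
    return area * h / 3

# Part 1: P(S2^2 < 7.526) = P(chi^2_24 < 24 * 7.526 / 10).
p1 = 1 - chi2_upper_tail_even(24 * 7.526 / 10, 24)
# Part 2: P(S1^2 / S2^2 > 3.88) = P(F(30, 24) > 3.88 * 10 / 20).
p2 = 1 - f_cdf(3.88 * 10 / 20, 30, 24)
print(round(p1, 3), round(p2, 3))
```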