Sie sind auf Seite 1von 168

11

Chapter 1

1.2 In order to know what percent of owners of portable MP3 players are 18 to 24 years old, we
would need to know two things: The number of people who own MP3 players, and the number
of those owners in that age group. The Arbitron data tells us neither of those things.

1.3 (a) The stemplot does a better job, the dots in the dotplot are so spread out, it is difficult to
identify the shape of the distribution. (b) The numbers in the left column show cumulative
counts of observations from the bottom up and the top down. For example, the 5 in the third row
indicates that 5 observations are at or .below 1.09. The (3) in the far left column is Minitab's way
of marking the location of the "middle value." Instead of providing a cumulative count, Minitab
provides the number ofleaves (observations) in the row that contains the center of the
distribution. The row with the parentheses also indicates where the cumulative counts switch
from the bottom up to the top down. For example, the 7 in the lOth row indicates that 7
observations are at or above 1.72. (c) The final concentration, as a multiple of its initial
concentration should be close to 1. This sample is shown as the second dot from the left on the
dotplot and in the second row ofthe stemplot. The sample has a final concentration of0.99.

1.4 (a) Liberty University is represented with a stem of 1 and a leaf of3. Virginia State
University is represented with a stem of 1 and a leaf of 1. The colleges represented with a stem
of2 and a leaf of 1 are: Hollins; Randolph-Macon Women's; Sweet Briar; William and Mary.
(b) These 23 twos represent the 23 community colleges. The stem of 0 represents all colleges
and universities with tuition and fees below $10,000.
12 Chapter 1 Exploring Data 13

1.5 The distribution is approximately symmetric with a center at-35. The smallest DRP score is 1.8 (a) The distribution of the number of frost days is skewed to the right, with a center around 3
14 and the largest DRP score is 52, so the scores have a range of38. There are no gaps or (31 observations Jess than 3, 11 observations equal to 3, and 23 observations more than 3). The
outliers. smallest number of frost days is 0 and the largest number is 10. There are no gaps or outliers in
Stem-and-leaf of DRP N = 44 this distribution. (b) The temperature never fell below freezing in April for about 23% (15 out of
Leaf Unit = 1.0 65 years) ofthese 65 years.
2 1 44
6 1 5899 1.9 The distribution of the time ofthe first lightning flash is roughly symmetric with a peak
7 2 2 during the l2 1h hour ofthe day (between 11:00 am and noon). The center ofthe distribution is at
15 2 55667789
20 3 13344 12 hours, with the earliest lightning flash in the ih hour of a day (between 6:00am and 7:00am)
(6) 3 555589 and the latest lightning flash in the 1ih hour of a day (between 4:00pm and 5:00pm).
18 4 00112334
10 4 5667789
3 5 122 1.10 The distribution oflengths ofwords in Shakespeare's plays is skewed to the right with a
center between 5 and 6 letters. The smallest word contains one Jetter and the largest word
contains 12 letters, so the range is 11 letters.
1.6 (a) and (b) The stemplots are shown below. The stemplot with the split stems shows the
skewness, gaps, and outliers more clearly. (c) The distribution of the amount of money spent by
shoppers at this supermarket is skewed to the right, with a minimum of $3 and a maximum of
$93. There are a few gaps (from $62 to $69 and $71 to $82) and some outliers on the high end
($86 and $93).

Stem-and-leaf of Dollar N = 50 Stem-and-leaf of Dollar N = 50


Leaf Unit = 1 . 0 Leaf Unit = 1.0

3 0 399 1 0 3
13 1 1345677889 3 0 99
(15) 2 000123455668888 6 1 134
22 3 25699 13 1 5677889
17 4 1345579
(b) The distribution is approximately symmetric with a single peak at the center of about 55
20 2 0001234
10 5 0359 (8) 2 55668888 years. The youngest president was 42 at inauguration and the oldest president was 69. Thus,
6 6 1 22 3 2 range is 69-42=27 years. (c) The youngest was Teddy Roosevelt; the oldest was Ronald
5 7 0 21 3 5699
4 8 366 17 4 134
Reagan. (d) At age 46, Bill Clinton was among the younger presidents inaugurated, but he was
1 9 3 14 4 5579 not unusually young. We certainly would not classify him as an outlier based on his age at
10 5 03 inauguration!
8 5 59
6 6 1
5 6
5 7 0
4 7
4 8 3
3 8 66
1 9 3

1.7 (a) The distribution oftotal returns is roughly symmetric, though some students might say
SLIGHTLY skewed to the right. (b) The distribution is centered at about 15%. (39% of the
stocks had a total return less than 10%, while 60% had a return less than 20%. This places the
center of the distribution somewhere between 10% and 20%.) (c) The smallest total return was
between -70% and -60%, while the largest was between 100% and 110%. (d) About 23%
(1 + 1+ 1+ 1+ 3 + 5 + 11) of all stocks lost money.
14 Chapter 1 Exploring Data 15

(b) The distribution contains a large gap (from 2 to 38 grams). A closer look reveals that the diet
drinks contain no sugar (or in one case a very small amol!_nt of s11gar), but the regular soft drinks
contain much more sugar. The diet soft drinks appear in the bar on the left of the histogram and
the regular drinks appear in a cluster of bars to the right of this bar. Both graphs show that the
sugar content for regular drinks is slightly skewed to the right.

1.13 (a) The center corresponds to the 50th percentile. Draw a horizontal line from the value 50
on the vertical axis over to the ogive. Then draw a vertical line from the point of intersection
down to the horizontal axis. This vertical line intersects the horizontal axis at approximately $28.
Thus, $28 is the estimate ofthe center. (b) The relative cumulative frequency for the shopper
who $17.00 is 9/50= 0.18. The histogram is shown below.
(c) From the graph we get a very rough estimate of 30%-5%=25%. The actual percent is
=
4/18x100 22.22%. The center of the distribution is between 140 and 150, at about 148 mg/dl.
The relative cumulative frequency associated with 130 mg/dl is about 30% from the graph or
=
5/18x100 27.78%.

1.14 (a) Two versions of the stemplot are shown below. For the first, we have (as the text
suggests) rounded to the nearest 10; for the second, we have trimmed numbers (dropped the last
digit). 359 mg/dl appears to be an outlier. The distribution of fasting plasma glucose levels is
skewed to the right (even if we ignore the outlier). Overall, glucose levels are not under control:
Only 4 ofthe 18 had levels in the desired range.
Stem-and-leaf of Glucose levels N = Stem-and-leaf of Glucose levels N = (b) Yes, the birthrate has clearly been decreasing since 1960. In fact, the birthrate only increased
18 18
Leaf Unit = 10 Leaf Unit = 10 in one 10-year period, from 1980 to 1990. (c) Better education, the increased use of
contraceptives, and the possibility of legal abortion are just a few of the factors which may have
1 0 8 3 0 799 led to a decrease in birthrates. d A time plot for the number of births is shown below.
7 1 000134 (7) 1 0134444
(7) 1 5555677 8 1 5577
4 2 0 4 2 0
3 2 67 3 2 57
1 3 1 3
1 3 6 1 3 5

(b) A relative cumulative frequency graph (ogive) is shown below.

(e) The total number ofbirths decreased from 1960 to 1980, increased drastically from 1980 to
1990, and stayed about the same in 2000. (f) The two variables are measuring different things.
Rate of births is not affected by a change in the population but the total number ofbirths is
affected; assuming that the number of births per mother remains constant.
16 Chapter 1 Exploring Data 17

(b) The life expectancy of females has drastically increased over the last hundred years from
48.3 to 79.5. The overall pattern is roughly linear, although the increases appear to have leveled (b) The plot shows a decreasing trend-fewer disturbances overall in the later years. The counts
off a bit from 1980 to 2000. show similar patterns (seasonal variation) from year to year. The counts are highest in the
second quarter (Q2 on the graph and Apr.-June in the table). The third quarter (Q3 on the graph
and July-Sept. in the table) has the next highest counts. One possible explanation for this
seasonal variation is that more people spend longer amounts of time outside during the spring
(Q2) and summer (Q3) months. The numbers of civil disturbances are lowest (in Q1 and Q4)
when people spend more time inside.

1.19 Student answers will vary; for comparison, recent US. News rankings have used measures
such as academic reputation (measured by surveying college and university administrators),
retention rate, graduation rate, class sizes, faculty salaries, student-faculty ratio, percentage of
faculty with highest degree in their fields, quality of entering students (ACT/SAT
scores, high school class rank, enrollment-to-admission ratio), financial resources, and the
percentage of alumni who give to the school.

Both plots show the same overall pattern, but the histogram is preferred because of the large 1.20 A histograms from a TI calculator and Minitab are shown below. The overall shape,
number of measurements. A stemplot would have the same appearance as the graphs above, but skewed to the right, is clear in all of the graphs. The stemplots in Exercise 1.6 give exact (or at
it would be somewhat less practical, because of the large number of observations with common least rounded) values of the data and the histogram does not. Stemplots are also very easy to
stems (in particular, the stems 2 and 3). (b) The histogram is approximately symmetric with two construct by hand. However, the histogram gives a much more appealing graphical summary.
unusually low observations at -44 and -2. Since these observations are strongly at odds with the Although histograms are not as easy to construct by hand, they are necessary for large data sets.
general pattern, it is highly likely that they represent observational errors. (c) A time plot is
shown below. WINDOW
XMin=3.11
Xr~ax= 106. 23
Xscl=12.89
Yrlin= -6.0138
YMax=23.4
Yscl=2
Xres=1

WINDOW
XMin=0
XMax=100
Xscl=10
YMin= -2
YMax=15
Yscl=2
Xres=l
(d) Newcomb's largest measurement errors occurred early in the observation process. The
measurements obtained over time became remarkably consistent.
Chapter 1 Exploring Data 19
18

Stem-and-leaf of Over65 N = 48 Stem-and-leaf of Over65 N = 48


Leaf Unit = 0.10 Leaf Unit = 0.10

1 8 5 1 8 5
4 9 679 1 9
5 10 6 4 9 679
13 11 02233677 4 10
(13) 12 0011113445789 5 10 6
22 13 00012233345568 10 11 02233
8 14 034579 13 11 677
2 15 36 22 12 001111344
( 4) 12 5789
22 13 0001223334
12 13 5568
8 14 034
Unmet need is greater at private institutions than it is at public institutions. The other 5 14 579
2 15 3
distinctions (2-year versus 4-year and nonprofit versus for profit) do not appear to make much of 1 15 6
a difference. A pie chart would be incorrect because these numbers do not represent parts of a
single whole. (If the numbers given had been total unmet need, rather than average unmet
need, and if we had information about all types of institutions, we would have been able to make 1.25
a pie chart.)

(b) There are more 2, 3, and 4letter words in Shakespeare's plays and more very long words in
Popular Science articles.

The time plots show that both manufacturers have generally improved over this period, with one 1.26 From the top left histogram: 4, 2, 1, 3. The upper-left hand graph is studying time; it is
slight jump in problems in 2003. Toyota vehicles typically have fewer problems, but GM has reasonable to expect this to be right-skewed (many students study little or not at all; a few study
managed to close the gap slightly. longer). The graph in the lower right is the histogram of student heights: One would expect a fair
amount of variation, but no particular skewness to such a distribution. The other two graphs are
1.23 (a) The percent for Alaska is 5.7% (the leaf7 on the stem 5), and the percent for Florida is handedness (upper right) and gender (lower left)-unless this was a particularly unusual class!
17.6% (leaf 6 on stem 17). (b) The distribution is roughly symmetric (perhaps slightly skewed to We would expect that right-handed students should outnumber lefties substantially. (Roughly
the left) and centered near 13%. Ignoring the outliers, the percentages range from 8.5% to 10% to 15% ofthe population as a whole is left-handed.)
15.6%.
14

1.24 Shown below are the original stemplot (as given in the text for Exercise 1.23, minus Alaska Ix
and Florida) and the split-stems version students were asked to construct for this exercise. 1.27 (a) The mean of Joey's first 14 quiz grades is x = i~I 1190
' = = 85 . (b) After adding the
14 14
Splitting the stems helps to identifY the small gaps, but the overall shape (roughly symmetric zero for Joey's unexcused absence for the 151h quiz, his final quiz average drops to 79.33. The
with a slight skew to the left) is clear in both plots. large drop in the quiz average indicates that the mean is sensitive to outliers. Joey's final quiz
grade of zero pulled his overall quiz average down. (c) A stemplot and a histogram (with cut
points corresponding to the grading scale) are shown below. Answers will vary, but the
20 Chapter 1 Exploring Data 21

histogram provides a good visual summary since the intervals can be set to match the grading
scale. 1.32 (a) The mean number ofhome runs hit by Barry Bonds from 1968 to 2004 is 37.0, and the
Stem-and-leaf of Joey_s grades N = 14 median is 37.0. The distribution is centered at 37 or Barry Bonds typically hits 37 home runs
Leaf Unit = 1 . 0
per season. (b) A stemplot is shown below.
Stem-and-leaf of Home run records N = 19
1 7 4
Leaf Unit = 1. 0
4 7 568
7 8 024
2 1 69
7 8 67
3 2 4
5 9 013
5 2 55
2 9 68
9 3 3344
(2) 3 77
8 4 02
6 4 55669
1 5
1 5
1 6
1.28 (a) A stemplot is shown below. 1 6
Stem-and-leaf of SSHA scores I N = 18 1 7 3
Leaf Unit = 1.0 (c) Barry Bonds typically hits around 37 home runs per season. He had an extremely unusual
3 10 139 year in 2001.
4 11 5
7 12 669
9 13 77
are shown below.
9 14 08
7 15 244
4 16 55
2 17 8
1 18
1 19
1 20 0

200 is a potential outlier. The center is 138.5. (Notice that the far left column of the stemplot
does not indicate the line with the median in this case because there are 9 scores at or below 137
and 9 scores at or above 140. Thus, any value between 137 and 140 could be called the median.
Typically, we average the two "middle" scores and call138.5 the median.) The scores range
2539 (b) Descriptive statistics for the SSHA scores of women and men are shown below. Note:
from 101 to 178, excluding 200, so the range is 77. (b) The mean score is x = = 141.056. Minitab uses N instead of n to denote the sample size on output.
18
(c) The median is 138.5, the average ofthe 9th and lOth scores in the ordered list of scores. The Variable N Mean StDev Minimum Q1 Median Q3 Maximum
Women 18 141.06 26.44 101.00 123.25 138.50 156.75 200.00
mean is larger than the median because of the unusually large score of200, which pulls the mean Men 20 121.25 32.85 70.00 95.00 114.50 144.50 187.00
towards the long right tail of the distribution.
(c) Women generally score higher than men. All five statistics in the five number summary
1.29 The team's annual payroll is 1.2x 25 = 30 or $30 million. No, you would not be able to (minimum, Q1, median, Q3, and maximum) are higher for the women. The men's scores are
calculate the team's annual payroll from the median because you cannot determine the sum of all more spread out than the women's. The shapes of the distributions are roughly similar, each
25 salaries from the median. displaying a slight skewness to the right.
1.30 The mean salary is $60,000. Seven of the eight employees (everyone but the owner) 1.34 (a) The mean and median should be approximately equal since the distribution is roughly
earned less than the mean. The median is $22,000. An unethical recruiter would report the mean symmetric. (b) Descriptive statistics are shown below.
salary as the "typical" or "average" salary. The median is a more accurate depiction of a Variable N Mean StDev Minimum Q1 Median Q3 Maximum
"typical" employee's earnings, because it is not influenced by the outlier of$270,000. Age 41 54.805 6.345 42.000 51.000 54.000 59.000 69.000
The five-number summary is: 42, 51, 54, 59, 69. As expected, the median (54) is very close to
1.31 The mean is $59,067, and the median is $43,318. The large salaries in the right tail will the mean (54.805). (c) The range ofthe middle halfofthe data is IQR =59- 51= 8. (e)
pull the mean up. According to the 1.5x(IQR) criterion, none of the presidents would be classified as outliers.
22 Chapter 1 Exploring Data 23

1.35 Yes, IQR is resistant. Answers will vary. Consider the simple data set 1, 2, 3, 4, 5, 6, 7, 8. (c) Software output is provided below.
The median= 4.5, Q1 = 2.5, Q3 = 6.5, and IQR = 4. Changing any value outside the interval variable N Mean StDev Minimum Q1 Median Q3 Maximum
between Q1 and Q3 will have no effect on the IQR. For example, if8 is changed to 88, the IQR Phosphate levels 6 5.400 0.642 4.600 4.825 5.400 5.875 6.400
will still be 4.
1.40 (a) The median and IQR would be the best statistics for measuring center and spread
because the distribution of Treasury bill returns is skewed to the right. (b) The mean and
standard deviation would be best for measuring center and spread because the distribution ofiQ
scores of fifth-grade students is symmetric with a single peak and no outliers. (c) The mean and
standard deviation would be the best statistics for measuring center and spread because the
distribution ofDRP scores is roughly symmetric with no outliers.

1.41 The mean 1s. -x =--=


11200 1600 ca1ones,
. the vanance
. . s 2 = 214870 =3581 2 squared ca1ones,
1s .
7 6
21 870
and the standard deviation is s = ~ : =
189.24 calories. Details are provided below.

X; X; -x (x; -xf
1792 192 36864
The boxplot indicates the presence of several outliers. According to the 1.5xiQR criterion, the 1666 66 4356
outliers are $85.76, $86.37, and $93.34. 1362 -238 56644
1614 14 196
1.37 (a) The quartiles are Q 1 = 25 and Q3 = 45. (b) Q3 + 1.5xiQR = 45 + 1.5x20 = 75. Bonds' 1460 -140 19600
73 home runs in 2001 is not an outlier. 1867 267 71289
1439 -161 25921
1.38 a) Descriptive statistics for the percent of residents aged 65 and over in the 50 states is 11200 0 214870
shown below.
Variable N Mean StDev Minimum Q1 Median Q3 Maximum
Over65 50 12.538 1.905 5.700 11.675 12.750 13.500 17.600 1.42 Answers will vary. The set {1, 2, 10, 11, 11} has a median of10 and a mean of7. The
The five-number summary is 5.7%, 11.675%, 12.75%, 13.5%, and 17.6%. (b) The IQR is 13.5 median must be 10, so set the third number in the ordered list equal to 10. Now, the mean must
- 11.675 = 1.825. 1.5xiQR is 2.7375 so any percents above 13.5+2.7375=16.2375 or below be 7, so the sum of all five numbers must be 7x5=35. Since 10 is one ofthe humbers, we need 4
11.675-2.7375=8.9375 would be classified as outliers. One other state, the one with 8.5%, other numbers, 2 below 10 and 2 above 10, which add to 35-10=25. Pick two small positive
would be an outlier. numbers (their sum must be no more than 5), say 1 and 2. The last two numbers must be at least
10 and have a sum of 22, so let them be the same value, 11.
32
1.39 (a) The mean phosphate level is :X= .4 = 5.4 mg/dl. (b) The standard deviation is 1.43 (a) One possible answer is 1, 1, 1, 1. (b) 0, 0, 10, 10. (c) For (a), any set of four identical
6
numbers will haves= 0. For (b), the answer is unique; here is a rough description of why. We
s= ~ 2 ~ 6 = 0.6419 mg/dl. Details are provided below. want to maximize the "spread-out"-ness of the numbers (which is. what standard deviation
measures), so 0 and 10 seem to be reasonable choices based on that idea. We also want to make
X; X; -x (x; -xf
xf, (x x) ~ - :xf and ( x4 - :xf -as large as
2
each individual squared deviation-( x1 - 2 - , (
5.6 0.2 0.04
possible. Ifwe choose 0, 10, 10, 10-or 10, 0, 0, O-we make the first squared deviation 7.5 2,
5.2 -0.2 0.04
but the other three are only 2.5 2 Our best choice is two at each extreme, which makes all four
4.6 -0.8 0.64
squared deviations equal to 52
4.9 -0.5 0.25
5.7 0.3 0.09
6.4 1.0 1.00
32.4 0 2.06
24 Chapter 1 Exploring Data 25

1.44 The algebra might be a bit of a stretch for some students: about the ability of the students taking the exam. If we have a less able group of students, then
( x1 -x)+(x2 -x)+ .. +(xn-1 -x)+(xn -x)=xI -x+x2 -=x++xn-1 -x+xn -:X scores would be lower, even on an easier exam.)
(drop the parentheses) 1.48 Who? The individuals are hot dogs. What? The quantitative variables of interest are
=XI + x2 + ... + xn-1 + xn -X- X ... - X- X calories (total number) and.sodium content (measured in mg). Why? The researchers were
(rearrange the terms) investigating the nutritional quality of major brands of hot dogs. When, where, how, and by
whom? The data were collected in 1986 by researchers working in a laboratory for Consumer
= x1 +x2 ++xn-l +xn -nx
~~~~~~~8~s37,?~~~boxplotsa~re~s~h~o~w~n~be~l~ow~~==~~~=7~~~

=0

1.45 (a) The mean and the median will both increase by $1000. (b) No. Each quartile will
increase by $1000, thus the difference Q3 - Q 1 will remain the same. (c) No. The standard
deviation remains unchanged when the same amount is added to each observation.

1.46 A 5% across-the-board raise will increase both IQR and s. The transformation being Numerical summaries: Descriptive statistics for each variable of interest are shown below.
applied here is xnew =1.05x, where x =the old salary and xnew =the new salary. Both IQR and s
will increase by a factor of 1.05. Descriptive Statistics: Beef-cal, Meat-cal, Poultry-Cal
Variable N Mean StDev Minimum Ql Median Q3 Maximum
Beef-cal 20 156.85 22.64 111.00 139.50 152.50 179.75 190.00
Meat-cal 17 158.71 25.24 107.00 138.50 153.00 180.50 195.00
Poultry-Cal 17 122.47 25.48 86.00 100.50 129.00 143.50 170.00

Descriptive Statistics: Beef-sod, Meat-sod, Poultry-Sod


Variable N Mean StDev Minimum Q1 Median Q3 Maximum
Beef-sod 20 401.2 102.4 253.0 319.8 380.5 478.5 645.0
Meat-sod 17 418.5 93.9 144.0 379.0 405.0 501.0 545.0
Poultry-Sod 17 459.0 84.7 357.0 379.0 430.0 535.0 588.0

Interpretation: Yes, there are systematic differences among the three types ofhot dogs.
Calories: There seems to be little difference between beef and meat hot dogs, but poultry hot
dogs are generally lower in calories than the other two. In particular, the median number of
calories in a poultry hot dog (129) is smaller than the lower quartiles of the other two types, and
the poultry lower quartile (1 00.5) is less than the minimum calories for beef (111) and meat
(b) The two distributions are very different. The distribution of scores on the statistics exam is (1 07). Students may simply compare the means-the average number of calories for poultry hot
roughly symmetric with a peak at 3. The distribution of scores on the AB calculus exam shows a dogs (122.47) is less than the averages for the other two types (156.85 for beef and 158.71 for
very different pattern, with a peak at 1 and another slightly lower peak at 5. The College Board meat-and standard deviations-the variability is highest for the poultry hot dogs (s = 25.48).
considers "3 or above" to be a passing score. The percents of students "passing" the exams are Sodium: Beef hot dogs have slightly less sodium on average than meat hot dogs, which have
very close (57.9% for calculus AB and 60.7% for statistics). Some students might be tempted to slightly less sodium on average than poultry hot dogs. Students may compare the means (401.2
argue that the calculqs exam is "easier" because a higher percent of students score 5. However, < 418.5 < 459) or medians (380.5 < 405 < 430). The variability, as measured by the standard
there is a larger percent of students who score 1 on the calculus exam. From these two deviations, goes in the other direction. Beef hot dogs have the highest standard deviation
distributions it is impossible to tell which exam is "easier." (Note: Grade setting depends on a (102.4), followed by meat hot dogs (93.9) and poultry hot dogs (84.7). The statement that "A hot
variety of factors, including the difficulty of the questions, scoring standards, and the dog isn't a carrot stick" provides a good summary of the nutritional quality of hot dogs. Even if
implementation of scoring standards. The distributions above do not include any information you try to reduce your calories by eating poultry hot dogs, you will increase your sodium intake.
26 Chapter 1 Exploring Data 27

Girls Boys
1.49 (a) Relative frequency histograms are shown below, since there are considerably more men 0 033334
than women. 96 0 66679999
22222221 1 2222222
888888888875555 1 558
4440 2 00344
2
3 0
6 3 0

1.52 The bar graphs below show several distinct differences in educational attainment between
the two
~~~~~~~~~

(b) Both histograms are skewed to the right, with the women's salaries generally lower than the
men's. The peak for women is the interval from $20,000 to $25,000, and the peak for men is the
interval from $25,000 to $30,000. The range of salaries is the same, with salaries in the smallest
and largest intervals for both genders. (c) The percents for women sum to 100.1% due to
roundoff error.

1.50 (a) To convert the power to watts, let xnew = 746x, where x =measurement in horsepower.
The mean, median, IQR, and standard deviation will all be multiplied by 746. (b) To convert
The older adults are more likely to have earned no more than a high school diploma. The
temperature to degrees Celsius, let xnew = ( 5/9) (X- 32)' where X= measurement in F . The
younger adults are more likely to have gone to college and to have completed a Bachelor's
new mean and median can be found be applying the linear transformation to the old mean and degree. However, the percentages of adults (young and old) earning advanced degrees are almost
median. In other words, multiply the old mean (median) by 5/9 and subtract 160/9. The IQR identical (about 8.2%).
and standard deviation will be multiplied by 5/9. (c) To "curve" the grades, let xnew =X+ 10'
where x =original test score. The mean and median will increase by 10. The IQR and standard 1.53 (a) The descriptive statistics (in units oftrees) are shown below.
deviation will remain the same. Descriptive Statistics: trees
Variable group N Mean StDev Minimum Q1 Median Q3 Maximum
trees 1 12 23.75 5.07 16.00 19.25 23.00 27.75 33.00
1.51 (a) Most people will "round" their answers when asked to give an estimate like this. Notice 2 12 14.08 4.98 2.00 12.00 14.50 17.75 20.00
that many responses are also multiples of 30 and 60. In fact, the most striking answers are the 3 9 15.78 5.76 4.00 12.00 18.00 20.50 22.00
ones such as 115, 170, and 230. The students who claimed 360 (6 hours) and 300 (5 hours) may The means (or medians), along with the boxplot below, suggest that logging reduces the number
have been exaggerating. (Some students might also "consider suspicious" the student who oftrees per plot and that recovery is slow. The 1-year-after and 8-years-after means (14.08 and
claimed to study 0 minutes per night.) (b) The stemplots below suggest that women (claim to) 15.78) are similar, but well below the mean for the plots that had never been logged (23.75). The
study more than men. The approximate midpoints are 175 minutes for women and 120 minutes standard deviations are similar, but the boxplot clearly shows more variability for the plots
for men. logged 8 years earlier (compare the heights of the boxes or the distances from the end of one
whisker to the end of the other whisker). (c) Use of x and s should be acceptable, since there is
Stem-and-leaf of Girls N = 30 Stem-and-leaf of Boys N = 30 only one outlier (2) in group 2 and the distributions show no extreme outliers or strong skewness
Leaf Unit = 10 Leaf Unit = 10 (given the small sample sizes).
2 0 69 6 0 033334
10 1 12222222 14 0 66679999
(15) 1 555578888888888 (7) 1 2222222
5 2 0444 9 1 558
1 2 6 2 00344
1 3 1 2
1 3 6 1 3 0
28 Chapter 1 Exploring Data 29

1.57 (a) The five-number summaries below show that chicks fed the new com generally gain
more weight than chicks fed normal com.
Variable Minimum Q1 Median Q3 Maximum
Normal corn 272.0 333.0 358.0 401.3 462.0
New corn 318.00 379.25 406.50 429.25 477.00

(Note that the quartiles will be slightly different if the student calculates them by hand. For
normal com Q1 = 337 and Q3 = 400.5. For new com Q1 = 383.5 and Q3 = 428.5.) No matter
how the quartiles are calculated, all five statistics in the five-number summary for the normal
com are lower than the corresponding statistics for the chicks fed with new com. The side-by-
side boxplot, constructed from these five statistics, clearly illustrates the effect (more weight
1.54 The means and standard deviations shown below are basically the same. Data set A is of the new com.
skewed to the left, while data set B skewed to the right with a high outlier.

Descriptive Statistics: Data A, Data B


Variable Mean StDev
Data A 7.501 2.032
Data B 7.501 2.031

Stem-and-leaf of Data A N =11 Stem-and-leaf of Data B N =11


Leaf Unit = 0.10 Leaf Unit = 0.10

1 3 1 3 5 257
2 4 7 5 6 58
2 5 (3) 7 079
3 6 1 3 8 48 (b) The means and standard deviations are:
4 7 2 1 9 Variable Mean StDev
(4) 8 1177 1 10 Normal corn 366.3 50.8
3 9 112 1 11 New corn 402.95 42.73
1 12 5 The average weight gain for chicks that were fed the new com is 36.65 grams higher than the
average weight gain for chicks who were fed normal com. (c) The means and standard deviations
1.55 The time series plot below shows that sales from record labels for the two groups were will be multiplied by 1/28.35 in order to convert grams to ounces. Normal: x=12.921oz, s =
similar from 1994 to 1996. After 1996, sales increased for the older group (over 35) and 1.792oz; New: x=14.213oz, s = 1.507 oz.
decreased for the 15-34 ears).
1.58 (a) Mean-although incomes are likely to be right-skewed, the city government wants to
know about the total tax base. (b) Median-the sociologist is interested in a "typical" family, and
/ ..... -----~ --l'r wants to lessen the impact of the extremes.

CASE CLOSED!
(1) A boxplot from Minitab is shown below. The centers of the distributions are roughly the
same, with the center line being just a little higher for CBS. The variability (heights of the
boxes) in the ratings differs considerably, with ABC having the most variability and NBC having
the least variability. The shapes of the distributions also differ, although we must be careful with
so few observations. The ratings are skewed to the right for ABC, roughly symmetric for CBS,
and slightly skewed to the left for NBC.
1.56 The variance is changed by a factor of2.542 = 6.4516; generally, for a transformation
xnew = bx, the new variance is b 2 times the old variance.
30 Chapter 1 Exploring Data 31

(2) The descriptive statistics are provided below.


Variable Network N Mean StDev Minimum Ql Median Q3 Maximum
Viewers ABC 6 8.72 3.97 5.50 5.65 7.60 11.33 16.20
CBS 9 7.978 1.916 5.400 6.100 8.000 9.650 10.900
Variable Minimum Ql Median Q3 Maximum
NBC 5 6.880 0.968 5.400 5.950 7.100 7.700 7.800
H. bihai 46.340 46.690 47.120 48.293 50.260
red 37.400 38.070 39.160 41. 690 43.090
The medians and IQRs should be used to compare the centers and spreads of the distributions yellow 34.570 35.450 36.110 36.820 38.130
because of the skewness, especially for ABC. The medians are 7.6 for ABC, 8.0 for CBS, and
7.1 for NBC. The IQRs are 5.68 for ABC, 3.55 for CBS, and 1.75 for NBC. (3) Whether there H bihai is clearly the tallest variety-the shortest bihai was over 3 mm taller than the tallest red.
are outliers depends on which technology you use. 16.2 is an outlier for ABC according to the Red is generally taller than yellow, with a few exceptions. Another noteworthy fact: The red
TI-83/84/89, but is not identified as an outlier by Minitab. Technical note: Quartiles can be variety is more variable than either of the other varieties. (b) The means and standard deviations
calculated in different ways, and these "slight" differences can result in different values for the for each variety are:
quartiles. If the quartiles are different, then our rule of thumb for classifying outliers will be Variable Mean StDev
H. bihai 47.597 1.213
different. These minor computational differences are not something you need to worry about. red 39.711 1.799
(4) It means that the average of the ratings would be pulled higher or lower based on extremely yellow 36.180 0.975
successful or unsuccessful shows. For example, the rating of 16.2 for Desperate Housewives (c) The stemplots are shown below.
would clearly pull the average for ABC upward. (5) The medians suggest that CBS should be Stem-and-leaf of H. bihai N = 16 Stem-and-leaf of red N = 23
Leaf Unit = 0.10 Leaf Unit = 0.10
ranked first, ABC second, and NBC third.
2 46 34
1.59 Student answers will vary but examples include: number of employees, value of company 7 46 66789 1 37 4
(3} 47 114 4 37 789
stock, total salaries, total profits, total assets, potential for growth. 6 47 9 38 00122
6 48 0133 11 38 78
1.60 A stemplot is shown below. 2 48 (1} 39 1
Stem-and-leaf of density N = 29 2 49 11 39 67
Leaf Unit = 0.010 2 49 9 40
2 50 12 9 40 56
1 48 8 7 41 4
1 49 6 41 699
2 50 7 3 42 01
3 51 0 1 42
7 52 6799 1 43 0
12 53 04469 Stem-and-leaf of yellow N = 15
(4} 54 2467 Leaf Unit = 0.10
13 55 03578
8 56 12358 2 34 56
3 57 59 4 35 14
1 58 5 5 35 6
(3} 36 001
7 36 5678
The distribution is roughly symmetric with one value (4.88) that is somewhat low. The center
3 37 01
of the distribution is between 5.4 and 5.5. The densities range from 4.88 to 5.85 and there are no 1 37
outliers. We would estimate the Earth's density to be about 5.45 in these units. 1 38 1
32 Chapter 1 Exploring Data 33

Bihai and red appear to be right-skewed (although it is difficult to tell with such small symmetric with no outliers.
samples). Skewness would make these distributions unsuitable for x and s. (d) The means and
standard deviations in millimeters are shown below.
Variable Mean StDev
H. bihai (in) 1.8739 0.0478
red (in) 1.5634 0.0708
yellow (in) 1.4244 0.0384
To convert from millimeters to inches, multiply by 39.37/1000 = 0.03937 (or divide by 25.4-an
inch is defined as 25.4 millimeters). For example, for the H bihai variety,
x = (47.5975 mm)(0.03937 in/mm) = (47.5975 mm) + (25.4 mmlin) = 1.874 in.
1.62 Student observations will vary. Clearly, Saturday and Sunday are quite similar and
considerably lower than other days. Among weekdays, Monday births are least likely, and chosen. One possible ogive is shown below.
Tuesday and Friday are also very similar. One might also note that the total number of births on
a given day (over the course of the year) would be the sum ofthe 52 or so numbers that went into
each boxplot. We could use this fact to come up with a rough estimate of the totals for each day,
and observe that Monday appears to have the smallest number ofbirths (after Saturday and
Sunday).

1.63 The stemplot shown below is roughly symmetric with no apparent outliers.
Stem-and-leaf of Percent(rouned) N = 15
Leaf Unit = 1.0

2 4 33
4 4 89 (c) Estimates will vary. The median percentile) is about 8.4 min. and the 90th percentile is
( 6) 5 000114
5 5 579
about 8.8 min. (d) A drive time of8.0 minutes is about the 38th percentile.
2 6 11
(b) The median is 50.7%. (c) The third quartile is 57.4%, so the elections classified as 1.66 (a) A frequency table and histogram are shown below.
landslides occurred in 1956, 1964, 1972, and 1984.
Hours per Rei. Freq.
14 959 1 week (approx.)
1.64 Note that estimates will vary. (a) The median would be in position ' + = 7480 in the 0--3 .33
2
list; from the boxplot, we estimate it to be about $45,000. (b) The quartiles would be in positions 3-6 .20
3740 and 11,220, and we estimate their values to be about $32,000 and $65,000. Note: The 6-9 .15
positions of the quartiles were found according to the text's method; that is, these are the 9-12 .13
locations of the medians of the first and second halves of the list. Students might instead compute 12-15 .01
0.25 x 14,959 and 0. 75 x 14,959 to obtain the answers 3739.75 and 11,219.25. (c) Omitting 15-18 .04
these observations should have no effect on the median and quartiles. (The quartiles are 18-21 .02
computed from the entire set of data; the extreme 5% are omitted only in locating the ends ofthe 21-24 .03
lines for the boxplot.) (d) The 5th and 95th percentiles would be approximately in positions 748 24-27 .01
and 14,211. (e) The "whiskers" on the box extend to approximately $13,000 and $137,000. (f) 27-30 .08
All five income distributions are skewed to the right. As highest education level rises, the
median, quartiles, and extremes rise-that is, all five points on the boxplot increase. (b) The median (50th percentile) is about 5, Q1 (25th percentile) is about 2.5, and Q3 (75th
Additionally, the width of the box (the IQR) and the distance from one extreme to the percentile) is about 11. There are outliers, according to the 1.5xiQR rule, because values
other (the difference between the 5th and 95th percentiles) also increase, meaning that the exceeding Q3 + 1.5xiQR = 23.75 clearly exist. (c) A student who used her computer for 10 hours
distributions become more and more spread out.
would fall at about the 70th percentile.
34 Chapter 1

1.67 (a) The five number summary for monthly returns on Wal-Mart stock is: Min=
-34.04255%, Q1 = -2.950258%, Median= 3.4691%, Q3 = 8.4511%, Max= 58.67769%. (b)
The distribution is roughly symmetric, with a peak in the high single digits (5 to 9). There are no
gaps, but four "low" outliers and five "high" outliers are listed separately. (c) 58.67769% of
$1000 is $586.78. The stock is worth $1586.78 at the end of the best month. In the worst month,
the stock lost 1000x0.3404255 = $340.43, so the $1000 decreased in worth to $1000-$340.43 =
$659.57. (d) IQR = Q3- Q1 = 8.45- (-2.95) = 11.401; 1.5xiQR = 17.1015
Q1- (1.5xiQR) = -2.950258- 17.1015 = -20.0518
Q3 + (l.5xiQR) = 8.4511 + 17.1015 = 25.5526
The four "low" and five "high" values are all outliers according to the criterion. It does appear
that the software uses the 1.5xiQR criterion to identify outliers.

1.68 The difference in the mean and median indicates that the distribution of awards is skewed
sharply to the right-that is, there are some very large awards.

1.69 The time plot below shows that women's times decreased quite rapidly from 1972 until the
mid-1980s. Since that time, they have been fairly consistent: All times since 1986 are between
141 and 147

1.70 (a) About 20% oflow-income and 33% ofhigh-income households consisted of two
people. (b) The majority oflow-income households, but only about 7% ofhigh-income
households, consist of one person. One-person households often have less income because
they would include many young people who have no job, or have only recently started
working. (Income generally increases with age.)
35

Chapter 2

68
2.1 Eleanor's standardized score, z = ~~;oo = 1.8, is higher than Gerald's standardized score,
z =27-18 =1.5.
6

2.2 The standardized batting averages (z-scores) for these three outstanding hitters are:
Player z-score
Cobb .420-.266
z= 4.15
.0371
Williams .406-.267
z= =4.26
.0326
Brett .390-.261
z= =4.07
.0317
All three hitters were at least 4 standard deviations above their peers, but Williams' z-score is the
highest.

2.3 (a) Judy's bone density score is about one and a half standard deviations below the average
score for all women her age. The fact that your standardized score is negative indicates that your
bone density is below the average for your peer group. The magnitude of the standardized score
tells us how many standard deviations you are below the average (about 1.5). (b) If we let O"
denote the standard deviation of the bone density in Judy's reference population, then we can
solve fiorO" m. the equatiOn-
. 1.45 = 948-956 . Thus, a=5.52.
.
(j

2.4 (a) Mary's z-score (0.5) indicates that her bone density score is about half a standard
deviation above the average score for all women her age. Even though the two bone density
scores are exactly the same, Mary is 10 years older so her z-score is higher than Judy's (-1.45).
Judy's bones are healthier when comparisons are made to other women in their age groups. (b)
If we let O" denote the standard deviation of the bone density in Mary's reference population,
t he equation
then we can so lve fior O" m . o
.5 =
948 - 944 Th
(j
. us, a == 8 . There IS
. more vana
. b"l"
I Ity
in the bone densities for older women, which is not surprising.

2.5 (a) A histogram is shOwn below. The distribution of unemployment rates is symmetric with
a center around 5%, rates varying from 2. 7% to 7.1 %, and no gaps or outliers.
36 Chapter 2 Describing Location in a Distribution 37

z=
64 - 46 9 . 1 57
= . among the nattona
. 1 group an d z = 64 - 58 '2 =
. 0.62 among th e 50 b oys at h'ts
10.9 9.4
school. (c) The boys at Scott's school did very well on the PSAT. Scott's score was relatively
better when compared to the national group than to ))is peers at school. Only 5.2% of the test
takers nationally scored 65 or higher, yet about 23.47% scored 65 or higher at Scott's school. (d)
Nationally, at least 89% of the scores are between 20 and 79.6, so at most 11% score a perfect
80. At Scott's school, at least 89% of the scores are between 30 and 80, so at most 11% score 29
or less.

2.8 Larry's wife should gently break the news that being in the 90th percentile is not good news
(b) The average unemployment rate is :X= 4.896% and the standard deviation of the rates is in this situation. About 90% of men similar to Larry have identical or lower blood pressures.
s = 0.976%. The five-number summary is: 2.7%, 4.1 %, 4.8%, 5.5%, 7.1 %. The distribution is The doctor was suggesting that Larry take action to lower his blood pressure.
symmetric with a center at 4.896%, a range of 4.4%, and no gaps or outliers. (c) The
unemployment rate for Illinois is the 84th percentile; Illinois has one of the higher unemployment 2.9 Sketches will vary. Use them to confirm that the students understand the meaning of(a)
rates in the country. More specifically, 84% of the 50 states have unemployment rates at or symmetric and bimodal and (b) skewed to the left.
below the unemployment rate in Illinois (5.8%). (d) Minnesota's unemployment rate (4.3%) is
at the 30th percentile and the z-score for Minnesota is z = -0.61. (e) The intervals, percents 2.10 (a) The area under the curve is a rectangle with height 1 and width 1. Thus, the total area
guaranteed by Chebyshev's inequality, observed counts, and observed percents are shown in the under the curve is 1 x 1 = 1. (b) The area under the uniform distribution between 0.8 and 1 is
table below. 0.2x 1 = 0.2, so 20% of the observations lie above 0.8. (c) The area under the uniform
k Interval % guaranteed Number of values Percent of values distribution between 0 and 0.6 is 0.6x1 = 0.6, so 60% ofthe observations lie below 0.6. (d) The
by Chebyshev in interval in interval area under the uniform distribution between 0.25 and 0.75 is 0.5x1 = 0.5, so 50% of the
1 3.920-5.872 At least 0% 35 70% observations lie between 0.25 and 0.75. (e) The mean or "balance point" of the uniform
2 2.944-6.848 At least 75% 47 94% distribution is 0.5.
3 1.968-7.824 At least 89% 50 100%
2.11 A boxplot for the uniform distribution is shown below. It has equal distances between the
4 0.992-8.800 At least 93.75% 50 100%
with no outliers.
5 0.016-9.776 At least 96% 50 100%
As usual, Chebychev's mequahty IS very conservative; the observed percents for each interval
are higher than the guaranteed percents.

2.6 (a) The rate of unemployment in Illinois increased 28.89% from December 2000 (4.5%) to
45 3 7
May 2005 (5.8%). (b) The z-score z - .4 = 1.03 in December 2000 is higher than the z-
1
5.8-4.896 . .
score z = = 0.9262 m May 2005. Even though the unemployment rate in Illinois
0.976
increased substantially, the z-score decreased slightly. (c) The unemployment rate for Illinois in

December 2000 is the 86th percentile. ( ~~ = 0.86) Since the unemployment rate for Illinois
4 1
2.12 (a) Mean C, median B; (b) mean A, median A; (c) mean A, median B.
i~ May 2005 is the 84th percentile, we know that Illinois dropped one spot ( 1 = 0.02) on the 2.13 (a) The curve satisfies the two conditions of a density curve: curve is on or above horizontal
50
ordered list of unemployment rates for the 50 states. axis, and the total area under the curve= area of triangle+ area of2 rectangles=
1
-x0.4x 1 +0.4x1 +0.4x1 = 0.2+0.4+0.4 = 1. (b) The area under the curve between 0.6 and 0.8
2.7 (a) In the national group, about 94.8% of the test takers scored below 65. Scott's 2
percentiles, 94.8th among the national group and 68th, indicate that he did better among all test is 0.2x1 = 0.2. (c) The area under the curve between 0 and 0.4 is
takers than he did among the 50 boys at his school. (b) Scott's z-scores are
38 Chapter 2 Describing Location in a Distribution 39

1 8000000-4243283.33 . . . .
2x0.4xl +0.4xl = 0.2+0.4 = 0.6. (d) The area under the curve between 0 and 0.2 is from z= 0.71 m 2004 to 0.79 m 2005. Damon's salary percentile
5324827.26
1 increased from the 8ih (26 out of30) in 2004 to the 93rd (26 out of28) in 2005, while McCarty's
2x 0.2x 0.5 +0.2xl.5 = 0.05+ 0.3 = 0.35. (e) The area between 0 and 0.2 is 0.35. The area decreased from the 20th (6 out of30)in 2004 to the 14th (4 out of28) in 2005.
between 0 and 0.4 is 0.6. Therefore the "equal areas point" must be between 0.2 and 0.4.
2.18 (a) The intervals, percents guaranteed by Chebyshev's inequality, observed counts, and
2.14 (a) The distribution should look like a uniform distribution, with height 1/6 or about observed percen ts are shown m th e table b eIow.
16.67%, depending on whether relative frequency or percent is used. If frequency is used, then k Interval % guaranteed Number of values Percent of values
each of the 6 bars should have a height of about 20. (b) This distribution is similar because each by Chebyshev in interval in interval
of the bars has the same height. This feature is a distinguishing characteristic of uniform 1 73.93-86.07 At least 0% 18 72%
distr~butions. However, the two distributions are different because in this case we have only 6 2 67.86-92.14 At least 75% 23 92%
possible outcomes {1, 2, 3, 4, 5, 6}. In Exercise 2.10 there are an infinite number of possible 3 61.79-98.21 At least 89% 25 100%
outcomes in the interval from 0 to 1. 4 55.72-104.28 At least 93.75% 25 100%
5 49.65-110.35 At least 96% 25 100%
2.15 The z-scores are z
w
72-64 .
=--=
2.7
2.96 for women and z =
m
72-69.3
2.8
= 0 96 for men The z-
' '
As usual, Chebyshev's inequality is very conservative; the observed percents for each interval
are higher than the guaranteed percents. (b) Each student's z-score and percentile will stay the
scores tell us that 6 feet is quite tall for a woman, but not at all extraordinary for a man. same because all of the scores are simply being shifted up by 4 points,
z=
(x+4)-(x+4) = -
x-x . .
- . (c) Each student's z-score and percentile will stay the same because
s s
. muI'
aII of the scores are bemg tip I'Iedby t he same positive
. . . constant, z = 1.06x-1.06:X - x-x- . (d)
1.06s s
This final plan is recommended because it allows the teacher to set the mean (84) and standard
deviation (4) without changing the overall position of the students.

2.19 (a) Erik had a relatively good race compared the other athletes who completed the state
meet, but had a poor race by his own standards. (b) Erica was only a bit slower than usual by
her own standards, but she was relatively slow compared to the other swimmers at the state meet.

2.20 (a) The density curve is shown below.


(b) Numerical summaries are provided below. 1.4
Variable N Mean StDev Minimum Q1 Median Q3 Maximum
Salaries 28 4410897 4837406 316000 775000 2875000 7250000 22000000 l.Z
The distribution of salaries is skewed to the right with a median of$2,875,000. There are two 1.0
major gaps, one from $8.5 million to $14.5 million and another one from $14.5 million to $22 ..8
million. The salaries are spread from $316,000 to $22 million. The $22 million salary for
.6
Manny Ramirez is an outlier. (c) David McCarty's salary of$550,000 gives him az-score of
.4
z=
550000-4410897 . . th
--0.80and places him at about the 14 percentile. (d) Matt Mantei's salary .2
c
1
4837406 I
or$750,000 ~aces hi~ at the 25th percentile and Matt Clement's salary of$6.5 million places 0
.5 1
him at the 75 percentile. (e) These percentiles do not match those calculated in part (b)
because the software uses a slightly different method for calculating quartiles. The area under the density curve is equal to the area of A + B + C =
!xo.5x0.8+!xo.5x 0.8+ 1x0.6 = 1. (b) The median is atx = 0.5, and the quartiles are at
~.17 Between 2004 and 2005, McCarty's salary increased by $50,000 (10%), while Damon's 2 2
mcreased by $250,000 (3.125%). The z-score for McCarty decreased from approximately x = 0.3 and x = 0.7. (c) The first line segment has an equation of y = 0.6+ 1.6x.
500000-4243283.33 Thus, the height of the density curve at 0.3 is 0.6+ 1.6x0.3 = 1.08. The total area under the
z= . -0.70 in 2004 to -0.80 in 2005 while the z-score for Damon increased
5324827 26
40 Chapter 2 Describing Location in a Distribution 41

one standard deviation below the mean. (d) The value 71.5 is one standard deviation above the
density curve between 0 and 0.3 is ix0.3x0.48+0.3x0.6=0.252. Thus, 25.2% ofthe mean. Thus, the area to the left of71.5 is the 0.68 + 0.16 = 0.84. In other words, 71.5 is the 84th
observations lie below 0.3. (d) Using symmetry ofthe density curve, the area between 0.3 and percentile of adult male American heights.
0.7 is 1 - 2x0.252 = 0.496. Therefore, 49.6% of the observations lie between 0.3 and 0.7.
of 9-ounce bags of potato chips is shown below.

The interval containing weights within 1 standard deviation of the mean goes from 8.97 to 9.27.
The interval containing weights within 2 standard deviations of the mean goes from 8.82 to 9.42.
(b) The proportion of outcomes less than 1 is 1x _!_ = _!_ (c) Using the symmetry of the The interval containing weights within 3 standard deviations of the mean goes from 8.67 to 9.57.
2 2 (b) A bag weighing 8.97 ounces, 1 standard deviation below the mean, is at the 16th percentile.
distribution, it is easy to see that median= mean= 1, Q1 = 0.5, Q3 = 1.5. (d) The proportion of
(c) We need the area under a Normal curve from 3 standard deviations below the mean to 1
outcomes that lie between 0.5 and 1.3 is 0.8 x _!_ = 0.4. standard above the mean. Using the 68-95-99.7 Rule, the area is equal to
2 1 1
0.68+ (0.95 -0.68)+ (0.997 -0.95) = 0.8385, so about 84% of9-ounce bags of these potato
~.22 (a) Outcomes from 18 to 32 are likely, with outcomes near 25 being more likely. The most
2 2
chips weigh between 8.67 ounces and 9.27 ounces.
hkely outcome is 25. (d) The distribution should be roughly symmetric with a single peak
around 25 and a standard deviation of about 3 .54. There should be no gaps or outliers. The
2.27 Answers will vary, but the observed percents should be close to 68%, 95%, and 99.7%.
normal density curve should fit this distribution well.
2.28 Answers will differ slightly from 68%, 95%, and 99.7% because of natural variation from
2.23 The standard deviation is approximately 0.2 for the tall, more concentrated one and 0.5 for
trial to trial.
the short, less concentrated one.
2.29 (a) 0.9978 (b) 1-0.9978 = 0.0022 (c) 1-0.0485 = 0.9515 (d) 0.9978-0.0485 = 0.9493

2.30 (a) 0.0069 (b) 1-0.9931 = 0.0069 (c) 0.9931-0.8133 = 0.1798 (d) 0.1020-0.0016 =
0.1004

2.31 (a) We want to find the area under the N(0.37, 0.04) distribution to the right of0.4. The
graphs below show that this area is equivalent to the area under the N(O, 1) distribution to the
. hto f z = 0.4-0.3 7 = 0 .75 .
ng
0.04

2.25 (a) Approximately 2.5% of men are taller than 74 inches, which is 2 standard deviations
above the mean. (b) Approximately 95% ofmen have heights between 69-5=64 inches and
69+5=74 inches. (c) Approximately 16% of men are shorter than 66.5 inches, because 66.5 is
42 Chapter 2 Describing Location in a Distribution 43

~/
I
I 0.25

Using Table A, the proportion of adhesions higher than 0.40 is 1- 0.7734 = 0.2266. (b) We 2.33 (a) The proportion of pregnancies lasting less than 240 days is shown in the graph below
want to find the area under the N(0.37, 0.04) distribution between 0.4 and 0.5. This area is
7
equivalent to the area under the N(O, 1) distribution between z = 0.4-0.3 = 0.75and
0.04
0 5 037
z= - = 3.25. (Note: New graphs are not shown, because they are almost identical to the
0.04
graphs above. The shaded region should end at 0.5 for the graph on the left and 3.25 for the
graph on the right.) Using Table A, the proportion of adhesions between 0.4 and 0.5 is 0.9994-
0.7734 = 0.2260. (c) Now, we want to find the area under the N(0.41, 0.02) distribution to the
right of0.4. The graphs below show that this area is equivalent to the area under the N(O, 1)
. "button
d tstn . totheng
. hto f z = 0.4-0.4l =-0 .5.
0.02

The shaded area is equivalent to the area under the N(O, 1) distribution to the left of
z=
240 266
-
16
=-1.63 which is 0.0516 or about 5.2%. (b) The proportion of pregnancies
'
lasting between 240 and 270 days is shown in the graph above (right). The shaded area is
270-266
equivalent to the area under the N(O, 1) distribution between z = -1.63 and z = = 0.25,
16
which is 0.5987-0.0516 = 0.5471 or about 55%. (c) The 80th percentile for the length of human
re nanc is shown in the ra h below.

Using Table A, the proportion of adhesions higher than 0.40 is 1- 0.3085 = 0.6915. The area /~"'
I \
under the N(0.41, 0.02) distribution between 0.4 and 0.5 is equivalent to the area under the N(O, II \
,

I
1) d tstn . between z = 0.4-0.4l =-0 .5 an d z = 0 5 -0.4l =4 .5 . U.
. "button smg Tabl e A , th e
0.02 0.02
propmtion of adhesions between 0.4 and 0.5 is 1-0.3085 = 0.6915. The proportions are the
same because the upper end ofthe interval is so far out in the right tail. __ / / 0,8

2.32 (a) The closest value in Table A is -0.67. The 25th percentile of the N(O, 1) distribution is
-0.67449. (b) The closest value in Table A is 0.25. The 60th percentile ofthe N(O, 1), Using Table A, the 801 percentile for the standard Normal distribution is 0.84. Therefore, the
distribution is 0.253347. See the graphs below. soth percentile for the length of human pregnancy can be found by solving the equation
44 Chapter 2 Describing Location in a Distribution 45

x-266 .
0.84= forx. Thus, x=0.84x16+266=279.44. Thelongest20%ofpregnancteslast
16
approximately 279 or more days.
/-,, .
2.34 (a) The proportion of people aged 20 to 34 with IQ scores above 100 is shown in the graph I \

I
below left. I \
I \
I \
I \
I 0.4 0.6 I I
0.98
\
\

/
_// __.//

1 2
The two equations are -0.25 = - 11 and 2.05 = - 11 . Multiplying both sides of the equations
(j (j

by CJ and subtracting yields -2.3CJ = -1 or


1
=
= - - 0.4348 minutes. Substituting this value
CJ
2.3
The shaded area is equivalent to the area under the N(O, 1) distribution to the right of
z=
100 110
- = -0.4, which is 1- 0.3446 = 0.6554 or about 65.54%. (b) The proportion of
1 11
back into the first equation we obtain -0.25 = -
0.4348
or Jl = 1+ 0.25 x 0.4348 1.1087 minutes. =
25
people aged 20 to 34 with IQ scores above 150 is shown in the graph above (right). The shaded 2.37 Small and large percent returns do not fit a Normal distribution. At the low end, the
150 110 percent returns are smaller than expected, and at the high end the percent returns are slightly
area is equivalent to the area under the N(O, 1) distribution to the right of z = - = 1.6,
25 larger than expected for a Normal distribution.
which is 1-0.9452 = 0.0548 or about 5.5%. (c) The 981h percentile of the IQ scores is shown in
the 2.38 The shape of the quantile plot suggests that the data are right-skewed. This can be seen in
the flat section in the lower left-these numbers were less spread out than they should be for
Normal data-and the three apparent outliers that deviate from the line in the upper right; these
were much larger than they would be for a Normal distribution.

2.39 (a) Who? The individuals are great white sharks. What? The quantitative variable of
interest is the length of the sharks, measured in feet. Why? Researchers are interested in the size
of great white sharks. When, where, how, and by whom? These questions are impossible to
answer based on the information provided. Graphs: A histogram and stemplot are provided
below.

Using Table A, the percentile for the standard Normal distribution is closest to 2.05.
Therefore, the 801h percentile for the IQ scores can be found by solving the equation
x-110 .
2.05 = for x. Thus, x = 2.05 x 25 + 110 = 161.25 . In order to qualify for MENSA
25
membership a person must score 162 or higher.

2.35 (a) The quartiles of a standard Normal distribution are at 0.675. (b) Quartiles are 0.675
standard deviations above and below the mean. The quartiles for the lengths of human
pregnancies are 266 0.675(16) or 255.2 days and 276.8 days.
Describing Location in a Distribution 47
46 Chapter 2

2.40 (a) A stemplot is shown below. The distribution is roughly symmetric.


Stem-and-leaf of density N = 29
Stem-and-leaf of shlength N = 44 Leaf Unit = 0.010
Leaf Unit = 0.10
1 48 8
1 9 4 1 49
1 10 2 50 7
1 11 3 51 0
6 12 12346 7 52 6799
14 13 22225668 12 53 04469
18 14 3679 (4) 54 2467
(6) 15 237788 13 55 03578
20 16 122446788 8 56 12358
11 17 688 3 57 59
8 18 23677 1 58 5
3 19 17 (b) The mean is :X= 5.4479 and the standard deviation iss= 0.2209. The densities follow the
1 20
1 21 68-95-99.7 rule closely-75.86% (22 out of29) ofthe densities fall within one standard
1 22 8 deviation ofthe mean, 96.55% (28 out of29) of the densities fall within two standard deviations
of the mean, and 100% of the densities fall within 3 standard deviations of the mean. (c)
Normal from Mini tab and a TI calculator (right) are shown below.
Numerical Summaries: Descriptive statistics are provided below.
Variable N Mean StDev Minimum Ql Median Q3 Maximum
shlength 44 15.586 2.550 9.400 13.525 15.750 17.400
Interpretation: The distribution of shark lengths is roughly symmetric with a peak at 16 and a
22.800


....
..... .... ....
spread from 9.4 feet to 22.8 feet.
(b) The mean is 15.5 86 and the median is 15.75. These two measures of center are very close to

.....
.
...
one another, as expected for a symmetric distribution. (c) Yes, the distribution is approximately
normal-68.2% of the lengths fall within one standard deviation of the mean, 95.5% ofthe
lengths fall within two standard deviations of the mean, and 100% of the lengths fall within 3
standard deviations ofthe mean. (d) Normal probability plots from Mini tab (left) and a TI
calculator are shown below.
Yes, the Normal probability plot is roughly linear, indicating that the densities are approximately
Normal.

._.. .... 2.41 (a) A histogram from one sample is shown below. Histograms will vary slightly but should

........ ......-
suggest a bell curve. (b) The Normal probability plot below shows something fairly close to a
line but illustrates even for actual normal data, the tails deviate from a line.

..... ....;
II II II II II
,_ ...........

Except for one small shark and one large shark, the plot is fairly linear, indicating that the
Normal distribution is appropriate. (e) The graphical displays in (a), comparison oftwo
measures of center in (b), check ofthe 68-95-99.7 rule in (c), and Normal probability plot in (d)
indicate that shark lengths are approximately Normal.

2.42 (a) A histogram from one sample is shown below. Histograms will vary slightly but should
suggest the density curve of Figure 2.8 (but with more variation than students might expect).
50 Chapter 2 Describing Location in a Distribution 51

\
\
~-- ,/
(
2.47 (a) Using Table A, the closest values to the deciles are 1.28. (b) The deciles for the (b) The 65th percentile is shown above (right). Using Table A, the 65th percentile of a standard
heights of young women are 64.5 1.28x2.5 or 61.3 inches and 67.7 inches. Normal distribution is closest to 0.39, so the 65th percentile for Writing score is 516 + OJ9x 115
= 560.85.
2.48 The quartiles for a standard Normal distribution are 0.6745. For a N(,u,a)distribution, 2. (a) The proportion of male test takers who earned scores below 502 is shown below (left).
502 491
Q1 = ,u-0.6745a, Q3 = ,u+0.6745a, and IQR = 1.349a. Therefore, 1.5xiQR = 2.0235a, and Standardizing the score yields a z-score of z = - = 0.10. Table A gives the proportion
110
the suspected outliers are below Q1 -1.5 x IQR = ,u- 2.698a or above 0.5398 or about 54%. (b) The proportion of female test takers who earned scores above 491 is
Q3 + 1.5 x IQR = ,u + 2.698a . The proportion outside of this range is approximately the same as
the area under the standard Normal distribution outside of the range from -2.7 to 2.7, which is 2
shown below (right). Standardizing the score yields a z-score of z =
491 502
-
108
= -0.10. Table A

x 0.0035 = 0.007 or 0.70%. gives the proportion 1 - 0.4602 = 0.5398 or about 54%. (Minitab gives 0.5406.) The
probabilities in (a) and (b) are almost exactly the same because the standard deviations for male
2.49 The plot is nearly linear. Because heart rate is measured in whole numbers, there is a slight and female test takers are close to one another.
"step" appearance to the graph.

2.50 Women's weights are skewed to the right: This makes the mean higher than the median,
and it is also revealed in the differences M -Q1 = 133.2-118.3 = 14.9pounds and
Q3 -M = 157.3-133.2 = 24.lpounds.

CASE CLOSED!
1. (a) The proportion of students who earned between 600 and 700 on the Writing section is
shown below (left). Standardizing both scores yields z-scores of z =
600 516
-
115
=
0.73 and

700 516
z= - = 1.6. Table A gives the proportion 0.9452-0.7673 = 0.1779 or about 18%. (c) The 85th percentile for the female test takers is shown below (left). Using Table A, the 85th
115 percentile of the standard Normal distribution is closest to 1.04, so the 85th percentile for the
=
female test takers is 502 + 1.04 x 108 614 . The proportion of male test takers who score above
614 is shown below (right). Standard 1zmg . ld s a z-score o f z = 614 - 491 =
. . t h e score y1e . 1.12 .
110
Table A gives the proportion 1-0.8686 = 0.1314 or about 13%.
Describing Location in a Distribution 53
52 Chapter 2

/~ ..
I \

,./~=
/"" \ ...
..

3. (a) The boxplot below shows that the distributions of scores for males and females are very
similar. Both distributions are roughly symmetric with no outliers. The median for the females
(580) is slightly higher than the median for the males (570). The range is higher for females
versus 33 and the for males (110 versus 100).

135 100
A WISC score of 135 corresponds to a standardized score of z = - 2.33. Using Table
15
A, the proportion of"gifted" students is 1- 0.9901 = 0.0099 or .99%. Therefore,
0.0099x1300=12.87 or about 13 students in this school district are classified as gifted.
Variable N Mean StDev Minimum. Ql Median Q3 Maximum
Males 48 584.6 80.1 430.0 530.0 570.0 640.0 760.0
Females 39 580.0 78.6 420.0 530.0 580.0 630.0 780.0 2.52 Sketches will vary, but should be some variation on the one shown below: The peak at 0
should be "tall and skinny," while near 1, the curve should be "short and fat."
The mean for the males (584.6) is slightly higher than the mean for the females (580.0), but the
overall performance for males and females is about the same at this school. (b) The students at
this private school did much better than the overall national mean (516). There is also much less
variability in the scores at this private school than the national scores. (c) Normal probability
plots for the males and females are shown below. Both plots show only slight departures from
the overall linear indicatin that both ~ets of scores are approximately Normal.
)\ 0
:~:~
2.53 The percent of actual scores at or below 27 is
1052490
1171460
x100 =89.84%. A score of27

..... corresponds to a standard score of z =
27 20 9
- = 1.27 . Table A indicates that 89.8% of scores

. . ..
4.8
in a Normal distribution would fall below this level. Based on these calculations, the Normal
distribution does appear to describe the ACT scores well .


2.54 (a) Joey's scoring "in the 97th percentile" on the reading test means that Joey scored as
well as or better than 97% of all students who took the reading test and scored worse than about
3%. His scoring in the 72nd percentile on the math portion of the test means that he scored as
54 Chapter 2 Describing Location in a Distribution 55

well as or better than 72% of all students who took the math test and worse than about 28%. That 2.57 (a) The mean :X= $17,776 is greater than the median M= $15,532. Meanwhile,
is, Joey did better on the reading test, relative to his peers, than he did on the math test. (b) If M -Q1 =$5,632and Q3 -M =$6,968, so Q3 is further from the median than Q1 Both ofthese
the test scores are Normal, then the z-scores would be 1.88 and for the 97th percentile and 0.58
comparisons result in what we would expect for right-skewed distributions. (b) From Table A,
for the 72nd percentile. However, nothing is stated about the distribution of the scores and we do
we estimate that the third quartiles of a Normal distribution would be 0.675 standard deviations
not have the scores to assess normality. =
above the mean, which would be $17,776 + 0.675 x $12,034 $25,899. (Software gives
0.6745, which yields $25,893.) As the exercise suggests, this quartile is larger than the actual
2.55 The head sizes that need custom-made helmets are shown below. The 5th and 95th
value ofQ3
percentiles for the standard Normal distribution are 1.645. Thus, the 5th and 95th percentiles for
soldiers' head circumferences are 22.8 1.645xl.I. Custom-made helmets will be needed for 2.58 (a) About 0.6% of healthy young adults have osteoporosis (the area below a standard z-
soldiers with head circumferences less than approximately 21 inches or greater than score of -2.5 is 0.0062). (b) About 31% of this population of older women has osteoporosis: The
a roximatel 24.6 inches. BMD level that is 2.5 standard deviations below the young adult mean would standardize to -0.5
for these older women, and the area to the left of this standard z-score is 0.3085.

/-'-\
1 2.59 (a) Except for one unusually high value, these numbers are reasonably Normal because the
I \ other points fall close to a line. (b) The graph is almost a perfectly straight line, indicating that
I \ the data are Normal. (c) The flat portion at the bottom and the bow upward indicate that the
I \ distribution of the data is right-skewed data set with several outliers. (d) The graph shows 3
I
I \ clusters or mounds (one at each end and another in the middle) with a gap in the data towards the
lower values. The flat sections in the lower left and upper right illustrate that the data have peaks
at the extremes.

2.60 If the distribution is Normal, it must be symmetric about its mean-and in particular, the
2.56 (a) The density curve is shown below. The coordinates of the right endpoint of the segment 1oth and 90th percentiles must be equal distances below and above the mean-so the mean is 250
points. If225 points below (above) the mean is the lOth (90th) percentile, this is 1.28 standard
are (h,h). 225
deviations below (above) the mean, so the distribution's standard deviation is
1.28
=175.8

points.

2.61 Use window ofX[55,145] 1s and Y[-0.008, 0.028].01 (a) The calculator command
shadeNorm(135,1E99,100,15) produces an area of0.009815. About .99% of the students earn
WISC scores above 135. (b) The calculator command shadeNorin(-lE99,75,100,15) produces
an area of0.04779. About 4.8% ofthestudents earn WISC scores below 75. (c)
shadeNorm(70,130,100,15) = 0.9545. Also, 1- 2(shadeNorm(-1E99,70,100,15)) = 0.9545.

2.62 The calculator command normalcdf (-1E99, 27, 20.9, 4.8) produces an area of0.89810596
(b) To find the median M, set the area of the appropriate triangle (~basex height) equal to 0.5 or 89.81%, which agrees with the value obtained in Exercise 2.53.

2.63 The calculator commands invNorm(.05,22.8,1.1) = 20.99 and invNorm(.95,22.8,1.1) =


and solve. That is, solve the equation _!._ M x M =_!._forM. Thus, M = 1. The same approach 24.61 agree with the values obtained in Exercise 2.55.
2 2

yields Q1 = ~ =0.707 and Q = J% =1.225. (c) The mean will be slightly below the median of
3

1 because the density curve is skewed left. (d) The proportion of observations below 0.5 is
0.5x0.5x0.5=0.125 or 12.5%. None (0%) of the observations are above 1.5.
56 Chapter 3 Examining Relationships 57

Chapter 3

3.1 (a) The amount of time a student spends studying is the explanatory variable and the grade
on the exam is the response variable. (b) Height is the explanatory variable and weight is the
response variable. (c) Inches of rain is the explanatory variable and the yield of corn is the
response variable. (d) It is more reasonable to explore the relationship between a student's grades
in statistics and French. (e) A family's income is the explanatory variable and the years of
education their eldest child completes is the response variable.

3.2 The explanatory variable is weight of a person, and the response variable is mortality rate
~that is, how likely a person is to die over a 10-year period). The other variables that may (b) Th~ scatterplot shows a negative, linear, fairly weak relationship. (Note: direction=negative,
mfluence the relationship between weight and survival are the amount of physical activity, form=lmear, strength=weak.) (c) Because this association is negative, we conclude that the
perhaps measured by hours of exercise per week, and economic status, which could be measured sparrowhawk is a long-lived territorial species.
by annual income of the person, family net worth, amount of savings, or some other financial
variable. 3.7 (a) A positive association between IQ and GPA means that students with higher IQs
tend to have higher GPAs, and those with lower IQs generally have lower GPAs. The plot
3.3 Water temperature is the explanatory variable, and weight change (growth) is the response does show a positive association. (b) The form of the relationship roughly linear, because a line
variable. Both are quantitative. through the scatterplot of points would provide a good summary. The positive association is
moderately strong (with a few exceptions) because most of the points would be close to the line.
3.4 The explanatory variable is the type of treatment-removal of the breast or removal of only (c) The lowest point on the plot is for a student with an IQ of about 103 and a GPA of about 0.5.
the .tumor and nearby lymph nodes, followed by radiation, and survival time is the response
vanable. Type of treatment is a categorical variable, and survival time is a quantitative variable. 3.8 (a) From Figure 3.5, the returns on stocks were about 50% in 1954 and about -28% in 1974.
(b) The return on Treasury bills in 1981 was about 15%. (c) The scatterplot shows no clear
3.5 (a) The explanatory variable is the number of powerboat registrations. (b) A scatterplot is pattern. The statement that "high treasury bill returns tend to go with low returns on stocks"
shown below. implies a negative association; there may be some suggestion of such a pattern, but it is
extremely weak.

variable is shown below.

The scatterplot shows a positive linear relationship between these variables. (c) There is a
posi~ive li?ear association between powerboat registrations and manatees killed. (d) Yes, the
relatt?n~h1p between these variables is linear. (e) The relationship is a strong, positive, linear
ass?cmtt.on. Yes, the number ofmanatees killed can be predicted accurately from powerboat (b) The relationship is curved or quadratic. High amounts of fuel were used for low and high
registratiOns. For 719,000 powerboat registrations, about 48 manatees would be killed by values of speed and low amounts of fuel were used for moderate speeds. This makes sense
powerboats. because the best fuel efficiency is obtained by driving at moderate speeds. (Note: 60 km/hr is
about 37 mph) (c) Poor fuel efficiency (above average fuel consumption) is found at both high
3.6 (a) A scatterplot is shown below. and low speeds and good fuel efficiency (below average fuel consumption) is found at moderate
speeds. (d) The relationship is very strong, with little deviation for a curve that can be drawn
through the points.
56 Chapter 3 Examining Relationships 57

Chapter 3

3.1 (a) The ~mount oftime a student spends studying is the explanatory variable and the grade
on the exam .Is the response variable. (b) Height is the explanatory variable and weight is the
response vanable. (c) Inches of rain is the explanatory variable and the yield of corn is the
:espo~se. variable. (d) It is more reasonable to explore the relationship between a student's grades
m statistics and French. (e) A family's income is the explanatory variable and the years of
education their eldest child completes is the response variable.

3.2 ~he expl~natory variabl~ is weight of a person, and the response variable is mortality rate
~that IS, how hkely a person Is to die over a 10-year period). The other variables that may
(b) The scatterplot shows a negative, linear, fairly weak relationship. (Note: direction=negative,
mfluence the relationship between weight and survival are the amount of physical activity,
form=linear, strength=weak.) (c) Because this association is negative, we conclude that the
perhaps m~asured by hours of exercise per week, and economic status, which could be measured
sparrowhawk is a long-lived territorial species.
by annual mcome of the person, family net worth, amount of savings or some other financial
variable. '
3.7 (a) A positive association between IQ and GPA means that students with higher IQs
tend to have higher GPAs, and those with lower IQs generally have lower GPAs. The plot
3.3. Water temperature is the explanatory variable, and weight change (growth) is the response
does show a positive association. (b) The form ofthe relationship roughly linear, because a line
vanable. Both are quantitative.
through the scatterplot of points would provide a good summary. The positive association is
moderately strong (with a few exceptions) because most of the points would be close to the line.
3.4 The explanatory variable is the type of treatment-removal of the breast or removal of only
(c) The lowest point on the plot is for a student with an IQ of about 103 and a GPA of about 0.5.
the .tumor and nearby lymph nodes, followed by radiation, and survival time is the response
vanable. Type of treatment is a categorical variable, and survival time is a quantitative variable.
3.8 (a) From Figure 3.5, the returns on stocks were about 50% in 1954 and about -28% in 1974.
(b) The return on Treasury bills in 1981 was about 15%. (c) The scatterplot shows no clear
3.5 (a) The explanatory variable is the number ofpowerboatregistrations. (b) A scatterplot is
shown below. pattern. The statement that "high treasury bill returns tend to go with low returns on stocks"
implies a negative association; there may be some suggestion of such a pattern, but it is
extremely weak.

variable is shown below.



The. ~catt~rplot sho":s a positive linear relationship between these variables. (c) There is a
posJ~Ive h?ear association between powerboat registrations and manatees killed. (d) yes, the
relati?n~htp between these variables is linear. (e) The relationship is a strong, positive, linear
ass?ciati.on. Yes, the number of manatees killed can be predicted accurately from powerboat
(b) The relationship is curved or quadratic. High amounts of fuel were used for low and high
registratiOns. For 719,000 powerboat registrations, about 48 manatees would be killed by
powerboats. values of speed and low amounts of fuel were used for moderate speeds. This makes sense
because the best fuel efficiency is obtained by driving at moderate speeds. (Note: 60 kmlhr is
3.6 (a) A scatterplot is shown below. about 37 mph) (c) Poor fuel efficiency (above average fuel consumption) is found at both high
and low speeds and good fuel efficiency (below average fuel consumption) is found at moderate
speeds. (d) The relationship is very strong, with little deviation for a curve that can be drawn
through the points.
Examining Relationships 59
58 Chapter 3

+
+
+

(b) The association is positive, and the relationship is linear and moderately strong. (c) The 3.13 (a) The scatterplot below shows a strong, positive, linear relationship between the two
scatterplot below shows that the pattern of the relationship does hold for men. However, the measurements. all five to be from the same species.
relationship between mass and rate is not as strong for men as it is for women. The group of
men has lean masses and metabolic rates than the group of women .


.


(b) The femur measurements have mean of 58.2 and a standard deviation of 13.2. The humerus
measurements have a mean of 66 and a standard deviation of 15.89. The table below shows the
standardized measurements (labeled zfemur and zhumerus) obtained by subtracting the mean and
3.11 A scatterplot from a calculator is shown below. As expected, the calculator graph looks the dividing by the standard deviation. The column labeled "product" contains the product
same as the scatterplot in Exercise 3.9 (a). (zfemurxzhumerus) of the standardized measurements. The sum of the products is 3.97659, so
the correlation coefficient isr =!x 3.97659 = 0.9941.
+ 4
femur Humerus zfemur zhumerus product
38 41 -1.53048 -1.57329 2.40789
56 63 -0.16669 -0.18880 0.03147
+ ++ 59 70 0.06061 0.25173 0.01526
+ ++ 64 72 0.43944 0.37759 0.16593
+ ++ 74 84 1.19711 1.13277 1.35605
+ ++
++ (c) The correlation coefficient is the same, 0.9941.

3.14 The scatterplot below, with price as the explanatory variable, shows a strong, positive,
3.12 A scatterplot from a calculator is shown below. As expected, the calculator graph shows linear association between price and deforestation percent.
the.same relationship as the scatterplot in Exercise 3.10.
60 Chapter 3 Examining Relationships 61

D D D

D
a D

WINDOW
XMin=-5
(b) The prices have a mean of 50 and a standard deviation of 16.32. The deforestation percents Xfilax=5
have a mean of 1.738% and a standard deviation of0.928%. The table below shows the Xscl=l
standardized values (labeled zprice and zdeforestation) obtained by subtracting the mean and Ymin=-.7
dividing by the standard deviation. The column labeled "product" contains the product
YMax=.7
Ysc1=.2
(zpricexzdeforestation) of the standardized measurements. The sum of the products is 3.82064, Xres.=l
so the correlation coefficient isr = !x3.82064 = 0.9552. (b) The correlation r = 0.2531. (c) The two scatterplots, using the same scale for both variables,
4 are shown below.
price Deforestation
29 0.49
zprice
-1.28638
zdeforestation
-1.34507
product
1.73028
. ..
40 1.59 -0.61256 -0.15951 0.09771
54 1.69 0.24503 -0.05173 -0.01268
55

1.82 0.30628 0.08838 0.02707
72 3.10 1.34764 1.46794 1.97826
(c) The correlation coefficient is the same, 0.9552.

3.15 (a) The lowest calorie count is about 107 calories and the sodium level for this brand is
about 145 mg. The highest calorie count is about 195 calories, and the sodium level for this
brand is about 510 mg. (b) The scatterplot shows positive association; high-calorie hot dogs tend (d) The correlation between x* andy* is the same as the correlation between x andY: r = 0:2531.
to be high in salt, and low-calorie hot dogs tend to have low sodium. (c) The lower left point is Although the variables have been transformed, the distances between the correspondmg pomts
an outlier. Ignoring this point, the relationship is linear and moderately strong. and the strengths of the association have not changed.
3.16 (a) The correlation r is clearly positive but not near 1. The scatterplot shows that students 3.18 (a) The correlation between the percent of returning birds and the number of new adults is
with high IQs tend to have high grade point averages, but there is more variation in the grade r = -0.748. A with the two new points added is shown below.
point averages for students with moderate IQs. (b) The correlation r for the data in Figure 3.8
would be closer to one. The overall positive relationship between calories and sodium is
stronger than the positive association between IQs and GPAs. (c) The outliers with moderate IQ
scores in Figure 3.4 weaken the positive relationship between IQ and GPA, so removing them
would increaser. The outlier in the lower left corner ofFigure 3.8 strengthens the positive,
linear relationship between calories and sodium, so removing this outlier would decrease r.

3.17 (a) A scatterplot is shown below.


62 Chapter 3 Examining Relationships 63

The correlation for the original data plus point A is r = -0.807. The correlation for the original
data plus point B is r = -0.469. (c) Point A fits in with the negative linear association displayed
by the other points, and even emphasizes (strengthens) that association because, when A is
included, the points of the scatterplot are less spread out (relative to the length of the apparent
line suggested by the points). On the other hand, Point B deviates from the pattern, weakening
the association.

3.19 There is a perfect, positive association between the ages of the women and their spouses, so
r = 1.

(b) The association between time and pulse is negati~e. The faster Profe~sor M?ore swims 2000
yards the more effort he will have to exert. Thus, a higher speed (lower time) will correspond
with a higher pulse and slower speeds (higher times) will correspond with lower pulses. (c) The
negative, linear relationship is moderately strong. (d) The correlation is;= -0.744. The .
scatterplot shows a negative association between time and pulse. Small times correspond with
large pulses and large times correspond with small pulses. (e) The value of r would not
change.

3.23 (a) Gender is a categorical variable and the correlation coefficientrmeasures the strength
(b) The speeds have a mean of 40 and a standard deviation of 15 .81. The mileages have a mean
of linear association for two quantitative variables. (b) The largest possible value of the
of26.8 mpg and a standard deviation of2.68 mpg The table below shows the standardized
correlation coefficient r is 1. (c) The correlation coefficient r has no units.
values (labeled zspeed and zmpg) obtained by subtracting the mean and dividing by the standard
deviation. The column labeled "product" contains the product (zspeedxzmpg) of the
3.24 The paper's report is wrong because the correlation ( r = 0.0) is interpreted incorrectly.
standardized measurements. The sum of the products is 0.0, so the correlation coefficient is also
The author incorrectly suggests that a correlation of zero indicates a negative association
0.0.
between research productivity and teaching rating. The psychologist meant that there is no linear
~eed ll!Pg zspeed zmpg product association between research productivity and teaching rating. In other words, knowledge of a
20 24 -1.26491 -1.04350 1.31993 professor's research productivity will not help you predict her teaching rating.
30 28 -0.63246 0.44721 -0.28284
40 30 0.00000 1.19257 0.00000 with the correct calories as the explanatory variable, is shown below.
50 28 0.63246 0.44721 0.28284
60 24 1.26491 -1.04350 -1.31993
..
The correlation coefficient r measures the strength of lznear associatiOn between two quantitative
variables; this plot shows a nonlinear relationship between speed and mileage.

3.21 (a) New York's median household income is about $32,800 and the mean income per
person is about $27,500. (b) Both of these variables measure the prosperity of a state, so you
would expect an increase on one measure to correspond with an increase in the other measure.
Household income will generally be higher than income per person because most households
have one primary source of income and at least one other smaller source of income. (c) In the
District of Columbia there are a relatively small number of individuals earning a great deal of
money. Thus, the income distribution is skewed to the right, which would raise the mean per
capita income above the median household income. (d) Alaska's median household income is (b) There is a positive, linear relationship between the correct and guessed calories. The guessed
about $48,000. (e) Ignoring the outliers, the relationship is positive, linear, and moderately calories for 5 oz. of spaghetti with tomato sauce and the cream-filled snack cake are unusually
strong. high and do not appear to fit the overall pattern displayed for the other foods. (c) The correlation
Examining Relationships 65
64 Chapter 3

is r = 0.825 . This agrees with the positive association observed in the plot; it is not closer to 1
because of the unusual guessed calories for spaghetti and cake. (d) The fact that the guesses are
all higher than the true calorie count does not influence the correlation. The correlation r would
not change if every guess were 100 calories higher. The correlation r does not change if a
constant is added to all values of a variable because the standardized values would be unchanged.
(c) The correlation without these two foods is r = 0.984 . The correlation is closer to 1 because
the relationship is much stronger without these two foods.

3.26 (a) Rachel should choose small-cap stocks because small-cap stocks have a lower
correlation with municipal bonds. Thus, the weak, positive relationship between small-cap
stocks and bonds will provide more diversification than the large-cap stocks, which have a
stronger positive relationship with bonds. (b) She should look for a negative correlation, 3.29 (a) For every one week increase in age, the rat will increase its weight by an average of 40
although this would also mean that the return on this investment would tend to decrease when (b) They intercept provides an estimate for the birth weight (100 grams) ofthis male rat.
return on bonds increases. below.
~~~~~~
3.27 The correlation isr = 0.481. The one unusual point (10, 1) is responsible for reducing the
correlation. Outliers tend to have fairly strong effects on correlation; the effect is very strong
here because there are only six observations.

is shown below .


(d) No, we should not use this line to predict the rat's weight at 104 weeks. This would be
extrapolation. This regression line would predict a weight of 4260 grams (about 9.4lbs) for a 2
year old rat! The regression equation is oqly reliable for times where data were collected.

3.30 (a) The slope is 0.882; this means that on the average, reading score increases by 0.882 for
each one-point increase in IQ. (b) The predicted scores for x = 90 and x = 130 are -33.4 +
(b) The overall pattern is not linear. The yield tends to be highest for moderate planting rates 0.882x90 = 45.98 and -33.4 + 0.882xl30 = 81.26. (c) This is most easily done by plotting the
and smallest for small and large planting rates. There is clearly no positive or negative ints 45 and 81 then drawing the line connecting them.
association between p 1antmg rates and yield. (d) The mean yields for the five planting rates are:
Planting rate Mean
12000 131.025
16000 143.150
20000 146.225
24000 143.067
28000 134.750
A scatterplot With the means added is shown below. We would recommend the planting rate
with the highest average yield, 20,000 plants per acre.

(d) The intercept (-33.4) would correspond to the expected reading score for a child with an IQ
of 0; neither that reading score nor that IQ has any meaningful interpretation.
66 Chapter 3 Examining Relationships 67

3.31 (a) The slope is 0.0138 minutes per meter. On average, if the depth ofthe dive is
increased by one meter, it adds 0.0138 minutes (about 0.83 seconds) to the time spent
underwater. (b) When Depth= 200, the regression line estimates DiveDuration to be 5.45
minutes (5 minutes and 27 seconds). (c) To plot the line, compute DiveDuration = 3.242
minutes when = 40 and DiveDuration = 6. 83 minutes when Depth = 300 meters.
/

/.


(c) When 716,000 powerboats are registered, the predicted number of manatees killed will be
-41.43 + 0.1249 x 716 = 47.99, or about 48 manatees. (d) Yes, the measures seem to be
succeeding, three of the four new points are below the regression line, indicating that fewer
manatees than predicted were killed. Additional evidence of success is provided by the two
for 1992 and 1993 fall well below the overall pattern.
(d) The intercept suggests that a dive of no depth would last an average of2.69 minutes; this
obviously does not make any sense.
3.32 (a) The slope is -0.0053; this means that on the average for each additional week of study
the pH decreased by 0.0053 units. Thus, the acidity of the precipitation increased over time. (b) ..
To plot the line, compute pH at the beginning (weeks= 0) and end (weeks= 150) of the study.
At the of the is 5.43 and at the end of the study pH is 4.635.
. .
(e) The mean number of manatee deaths for the years with 716,000 powerboat registrations is 42.
The prediction of 48 was too high.

3.34 (a) The least squares regression line is y = 31.9- 0.304x. The calculator output (and
Minitab output) is shown below.
LinRe9
'::l=a+bx
(c) Yes, the y intercept provides an estimate for the pH level at the beginning of the study. (d) .a=31.93425919
b=-.3040229451
The regression line predicts the pH to be 4.635 at the end of this study. r:?:=.5602033042
r=-.7484673034
3.33 (a) A scatterplot from the calculator is shown below.
+ WINDOW Minitab output
+ XMin=1975
+
+ xr. . ax=1995 The regression equation is
newadults = 31.9 - 0.304 %returning
......... X:scl=l
YMin=10 Predictor Coef SE Coef T p

+
+ +
+
YMax=55 Constant 31.934
-0.30402
4.838
0.08122
6.60
-3.74
0.000
0.003
+ + Y:sc1=.2 %returning
....................
+ Xre:s=l S = 3.66689 R-Sq = 56.0% R-Sq{adj) = 52.0%
(b) Let y = number of manatees killed and x = number of powerboat registrations. The least-
square regression equation is y = -41.43 + 0.1249x.
Chapter 3 Examining Relationships 69
68

the field measurements tend to be smaller than the laboratory measurements for large depths. (b)
(b) The means, standard deviations, and correlation are: x = 58.23%, sx = 13.03%,
The points for the larger depths fall systematically below the line y = x showing that the field
y = 14.23 new birds, sY = 5.29 new birds, r = -0.748. (c) The slope is measurements are too small compared to the laboratory measurements. (c) In order to minimize
b = -0.748( 5 29 )
13.03
= =
-0.304 and the intercept is a= 14.23-bx 58.23 31.9. (d) The slope tells
the sum ofthe squared distances from the points to the regression 1ine,the top right part ofthe
blue line in Figure 3.20 would need to be pulled down to go through the "middle" of the group of
us that as the percent of returning birds increases by one the number of new birds will decrease points that are currently below the blue line. Thus, the slope would decrease and the intercept
by -0.304 on average. They intercept provides a prediction that we will see 31.9 new adults in a would increase. (d) The residual plot clearly shows that the prediction errors increase for larger
new colony when the percent of returning birds is zero. This value is clearly outside the range of laboratory measurements. In other words, the variability in the field measurements increases as
values studied for the 13 colonies of sparrowhawks and has no practical meaning in this the laboratory measurements increase. The least squares line does not provide a great fit,
situation. (e) The predicted value for the number of new adults is 31.9- 0.304x60 = 13.69 or especially for larger depths.
about 14.

3.35 (a) Let y =Blood Alcohol Content (BAC) and x =Number ofBeers. The least-squares
regression line is y = -0.0127 + 0.0 17964x. (b) The slope indicates that on average, the BAC
will increase by 0.017964 for each additional beer consumed. The intercept suggests that the
average BAC will be -0.01270 if no beers are consumed; this is clearly ridiculous. (c) The
predicted BAC for a student who consumed 6 beers is -0.0127 + 0.017964x6 = 0.0951. (d) The
prediction error is 0.10-0.0951 = 0.0049. - .----~--

3.36 (a) The relationship between the two variables in Figure 3.15 is positive, linear, and very
strong. (b) The regression line predicts that the Sanchez family would average about 500 cubic
feet of gas per day in a month that averages 20 degree-days per day. (c) The blue line in Figure
3.15 is called the "least-squares line" because it minimizes the sum of the squared deviations of (b) We would certainly not use the regression line to predict fuel consumption. The scatterplot
the observed amounts of gas consumed from the predicted amounts of gas. In other words, the shows a nonlinear relationship. (c) The sum of the residuals provided is -0.01, which illustrates
least squares line minimizes the squared vertical distances from the observed amounts of gas a slight roundoff error. (d) The residual plot indicates that the regression line underestimates
consumed to the values predicted by the line. (d) The least squares line provides a very good fit fuel consumption for slow and fast speeds and overestimates fuel consumption for moderate
because the prediction errors, the vertical distances from the points to the line, are very small and speeds. The quadratic pattern in the residual plot indicates that the regression model is not
the linear relationship is very strong. a ro riate for these data.

. b = 0.894 (0.044139929) =
3.37 The slope IS . 0 .018 and the mtercept
. .
IS
2.1975365
=
a= 0.07375 -bx 4.8125 -0.0129, which is the same asthe equation in Exercise 3.35.

3.38 (a) Let y =gas used and x =degree-days. The least-squares regression line 0

is y = 1.08921 +0.188999x. (b) The slope tells us that on average the amount of gas used
increases by 0.188999 for each one unit increase in degree-days. They intercept provides a
realistic estimate (108.921 cubic feet) for the average amount of gas used when the average
number ofheating degree-days per day is zero. (c) The predicted value is 1.08921 +
0.188999x20 = 4.8629, which is very close to the rough estimate of 5 from Exercise 3.36 (b).
3.41 (a) The scatterplot withy= rate and x =mass is shown below.
(d) The predicted value for this month is 1.08921 + 0.188999x30 = 6.7592, so the prediction
error is 640- 675.92 = -35.92.

3:39 (a) There is a positive, linear association between the two variables. There is more
variation in the field measurements for larger laboratory measurements. The values are scattered
above and below the line y = x for small and moderate depths, indicating strong agreement, but
Chapter 3 Examining Relationships 71
70

a
LinRe9
a a a :::t=a+bx
a a=201. 1615'396
a b=24 .. 02606662
~~
an r2=.7681692929
a
a r=.8764526758
~
a
a
......................... I
(b) The least-squares regression line is y = 201.162 + 24.026x . (c) The slope tells us that a
female will increase her metabolic rate by a mean of24.026 calories for each additional kg of
lean body mass. The intercept provides an estimate for the average metabolic rate (201 calories)
for women, when their lean body mass is zero (clearly unrealistic). (d) The residual plot (shown
below) shows no clear pattern, so the least squares line is an ad equate model for the data.
a
WI~~DOW
Xr.-.in=31
Xr.-.ax=56
a Xscl=1
a Pna a Yr.-.in=-147
Yr.-.ax=260
a a a a a Yscl=.2
Xres=1
(e) The residual plot with the predicted value on the horizontal axis looks exactly like the
.
preVIOUS p1Ot 0 f the rest'dua1s versus 1ean bo1y
d mass.
+
WINDOW
XMit1=945
Xr.-.ax=1560
+ Xscl=100
+ ......+ + Yr.-.in=-147
Yr.-.ax=260
+ + + + + Yscl= .. 2
Xres=l
3.42 (a) The correlations are all approximately the same (To three decimal places
rA = rB = rc = 0.816andrn = 0.817 ), and the regression lines are all approximately y = 3.0+0.5x.
=
For all four sets, we predict y 8whenx = 10. (b) The scatterplots are provided below.
(d) The regression line should only be used for Data Set A. The variables have a moderate linear
association with a fair amount of variability from the regression line and no obvious pattern in
the residual plot. For Data Set B, there is an obvious nonlinear relationship which can be seen in
both plots; we should fit a parabola or some other curve. For Data Set C, the point (13, 12.74)
deviates from the strong linear relationship of the other points, pulling the regression line up. If a
data entry error (or some other error) was made for this point, a regression line for the other
points would be very useful for prediction. For Data Set D, the data point with x = 19 is a very
influential point-the other points alone give no indication of slope for the line. The regression
line is not useful in this situation with only two values of the explanatory variable x.

3.43 (a) The scatterplot of the data with the least-squares regression line is below.
Chapter 3 Examining Relationships 75
74

mg/L. The negative value of BOD was obtained because values ofTOC near zero were probably
not included in the study. This is another example where the intercept does not have a practical
interpretation.

3.50 (a) The least-squares line for predictingy = GPA from x = IQ has slope
b = 0.6337(2!_) = 0.101 and intercept a= 7.447 -0.10lx108.9 = -3.5519. Thus, the
13.17
regression line is y =-3.5519+ 0.101x. (b) r 2 =(0.6337r = 0.4016. Thus, 40.16% of the
variation in GPA is accounted for by the linear relationship with IQ. (c) The predicted GPA for
this student is y =-3.5519+ O.lOlx 103 = 6.8511 and the residual is6.8511-0.53 = 6.3211. The correlation is 0.99994 > 0.997, so recalibration is not necessary. (b) The regression
line for predicting absorbance is y = 1.6571 + 0.1133x. The average increase in absorbance for a
1 mg/1 increase in nitrates is 0.1133. The predicted absorbance when no nitrates are present is
1.65 71. Ideally, we should predict no absorbance when nitrates are not present. (c) The
predicted absorbance in a specimen with 500 mg/1 of nitrates is
y = 1.6571 + 0.1133 x 500 = 58.308. (d) This prediction should be very accurate since the linear
relationship is almost perfect, see the scatterplot above. Almost 100% (r = 0.9999) of the
2

variation in absorbance is accounted for by the linear relationship with nitrates.

(b) Clearly, this line does not fit the data very well; the data show a clearly curved pattern. (c)
The residuals sum to 0.01 (the result of roundoff error). The residual plot below shows a clear
quadratic pattern, with the first two and the last four residuals being negative and those between
3 and 8 months

(b) The regression line for predicting y =height from x =age is y = 71.95 + 0.3833x. (c) When
x = 40 months: y = 87.28 em. When x = 60 months: y = 94.95 em. (d) A change of6 em in
12 months is 0.5 em/month. Sarah is growing at about 0.38 em/month; more slowly than normal.

3.54 (a) Sarah's predicted height at 480 months is y = 71.95 + 0.3833x 480 = 255.93cm.
Converting to inches, Sarah's predicted height is 255.93x0.3937 =100.7596 inches or about 8.4
feet! (b) The prediction is impossibly large, because we incorrectly used the least-squares
3.52 (a) A scatterplot, with the regression line, is shown below. regression line to extrapolate.

3.55 (a) The slope of the regression line for predicting final-exam score from pre-exam totals is

b = 0.6(~) = 0.16; for every extra point earned on the midterm, the score on the final exam
30 .
increases by a mean of0.16. The intercept ofthe regression line is a= 75-0.16x 280 = 30.2; if
the student had a pre-exam total ofO points, the predicted score on the final would be 30.2. (b)
Chapter 3 79
78 Examining Relationships

The slope and intercept change slightly wh;n Child 19 is removed, so this point does not appear
2 2
The regression lines are: y = -0.0578 + 0.0076x (with all subjects) and
2
to be extremely influential. (b) With all children, r = 0.410; without Child 19, r = 0.572. r y = -0.0152+0.0067x(without Subject 16).
increases because more of the variability in the scores is explained by the stronger linear
relationship with age. In other words, with Child 19's high Gesell score removed, there is less 3.63 Higher income can cause better health: higher income means more money to pay for
variability around the regression line. medical care, drugs and better nutrition, which in turn results in better health. Better health can
cause higher income: if workers enjoy better health, they are better able to get and hold a job,
3.61 (a) A scatterplot with the two new points is shown below. Point A is a horizontal outlier; which can increase their income.
that is, it has a much smaller x-value than the others. Point B is a vertical outlier; it has a higher
y -value than the others. 3.64 No, you cannot shorten your stay by choosing a smaller hospital. The positive correlation

~-
does not imply a cause and effect relationship. Larger hospitals tend to see more patients in poor
condition, which means that the patients will tend to require a longer stay.

(b) The three regression formulas are: y = 31.9- 0.304x (the original data);
y = 22.8-0.156x(with Point A); y = 32.3-0.293x(with Point B). Adding Point B has little
impact. Point A is influential; it pulls the line down, and changes how the line looks relative to
the original 13 data points.
The least-squares regression line for predicting farm y =population from the explanatory
3.62 (a) Who? The individuals are 16 couples in their mid-twenties who were married or had variable x =year is y = 1166.93- 0.5868x. (b) The farm population decreased on average by
been dating for two years. What? The variables are empathy score (a quantitative measure of about 0.59 million (590,000) people per year. About 97.7% of the variation in the farm
empathy from a psychological test) and brain activity (a quantitative variable reported as a population is accounted for by the linear relationship with year. (c) The predicted farm
fraction between -1 and 1). Why? The researchers wanted to see how the brain expresses population for the year 2010 is -12,538,000; clearly impossible, as population must be greater
empathy. In particular, they were interested in checking ifwomen with higher empathy scores than or equal to zero.
have a stronger response when their partner has a painful experience. When, where, how, and by
whom? The researchers zapped the hands of the men and women to measure brain activity, 3.66 (a) Who? The individuals are students at a large state university. What? The variables are
presumably in a lab, doctor's office, or hospital. The results appeared in Science in 2004 so the the number of first-year students and the number of students who enroll in elementary
data were probably collected shortly before publication ofthe article. (b) Subject 16 is mathematics courses. Both variables are quantitative and take on integer values from several
influential on the correlation. With all subjects, r = 0.515 ; without Subject 16, r = 0.331. (c) hundred to several thousand, depending on the size of the university. Why? The data were
Subject 16 is not influential on the least-squares regression line (see the scatterplot below). collected to try to predict the number of students who will enroll in elementary mathematics

L
80 Chapter 3 Examining Relationships 81

courses. When, where, how, and by whom? Faculty members in the mathematics department at
a large state university obtained the enrollment data and class sizes from 1993 to 2000. These
data were probably extracted from a historical data base in the Registrar's office. A scatterplot,
with the is shown below.

As the scatterplot shows, the point from 1986 is not very influential on the regression line. The
two regression lines are: y = 5.694+0.6201x (with all points) andy= 4.141 +0.5885x (without
the point in 1986). (b) The residual plot below, for all ofthe points, does not show any unusual
the residual is visible.
The regression line appears to provide a reasonable fit. About 69.4% of the variation in
enrollments for elementary math classes is accounted for by the linear relationship with the
number of students. The residual plots~a~re~s~h~o~w~n~b~e~l~ow~.~El12~~llillil~~~WJ




3.69 (a) Yes, but the relationship is not very strong. (b) The mortality rate is extremely variable
for those hospitals that treat few heart attacks. As the number of patients treated increases the
The plot of the residuals against x shows that a somewhat different line would fit the five lower variability decreases and the mortality rate appears to decrease giving the appearance of an
points well. The three points above the regression line represent a different relation between the exponentially decreasing pattern of points in the plot. The nonlinearity strengthens the
number of first-year students and mathematics enrollments. The plot of the residuals against conclusion that heart attack patients should avoid hospitals that treat few heart attacks.
year clearly illustrates that the five negative residuals are from the years 1993 to 1997, and the
three positive residuals are from 1998, 1999, and 2000. (c) The change in requirements was not
visible on the scatterplot in part (a) or the plot of the residuals against x. However, the change is
clearly illustrated (negative residuals before 1998 and positive residuals after 1998) on the plot of
the residuals against year.

3.67 The correlation for individual stocks would be lower. Individual stock performances will
be more variable weakening the relationship.

3.68 A scatterplot, with both regression lines, is shown below. A scatterplot with a circle
around the point from 1986 with the largest residual is shown in the solution to Exercise 3.56.

The influential observation (circled) is observation 7, (1 05, 89). (b) The line with the larger
slope is the line that omits the influential observation (105, 89). The influential point pulls the
regression line with all of the points downward in order to minimize the overall prediction error.

~b-. _____________________________________________. .__________________________________________


82 Chapter 3 Examining Relationships 83

3.71 Age is a lurking variable. We would expect both variables, shoe size and reading
comprehension score, to increase as the child ages.

=
The regression line for predicting y = wind stress from x = snow cover is y 0.212- 0.0056x;
r 2 = 0.843. The linear relationship explains 84.3% of the variation in wind stress. We have good
evidence that decreasing snow cover is strongly associated with increasing wind stress. (b) The
graph shows 3 clusters of 7 points.

3.74 The sketch below shows two clusters of points, each with a positive correlation. The top
(b) The correlations are: 'i =0.4819 (all observations); r =0.5684(without Subject 15);
2
cluster represents economists employed by business firms and the bottom cluster represents
=
r3 0.3837 (without Subject 18). Both outliers change the correlation. Removing subject 15 economists employed by colleges and universities. When the two clusters are combined into one
large group of economists, the overall correlation is negative.
increases r, because its presence makes the scatterplot less linear, while removing Subject 18
decreases r, because its presence decreases the relative scatter about the linear pattern. (c) The
..... -
.... .
. , :..
=
three regression lines shown in the scatterplot above are: y 66.4 + 10.4x (all observations); '
= =
y 69.5+8.92x (without #15); y 52.3+ 12.lx (without #18). While the equation changes in .'. .. . I

...
,. . . .
response to removing either subject, one could argue that neither one is particularly influential,
as the line moves very little over the range of x (HbA) values. Subject# 15 is an outlier in terms
... ..... ... ,... .
.. .

of its y value; such points are typically not influential. Subject # 18 is an outlier in terms of its x
value, but is not particularly influential because it is consistent with the linear pattern suggested
by the other points. 3.75 (a) In the scatterplot below right-hand points are filled circles; left-hand points are open
circles.
3.73 (a) Who? The individuals are land masses. What? The two quantitative variables are the ~~~~~~ 0
amount of snow cover (in millions of square kilometers) and summer wind stress (in newtons per
square meter). Why? The data were collected to explore a possible effect of global warming.
0 0 0
When, where, how, and by whom? The data from Europe and Asia appear to be collected over a
7 year period during the months of May, June, and July. The amount of snow cover may have
been estimated from arial photographs or satellite images and the summer wind stress
measurements may have been collected by meteorologists. The scatterplot below suggests a
negative linear association, with correlation r = -0.9179. 0

.. - -.-: "' _,_._. . . ___ - - . -.ft.-.. -.


~ --~-

(b) The right-hand points lie below the left-hand points. (This means the right-hand times are
shorter, so the subject is right-handed.) There is no striking pattern for the left-hand points; the
pattern for right-hand points is obscured because they are squeezed at the bottom of the plot. (c)
=
The regression line for the right hand isy 99.4 +0.0283x (r = 0.305, l = 9.3%). The regression
=
line for the left hand is y 172+0.262x(r= 0.318, l = 10.1 %). The left-hand regression is

I
84 Chapter 3 85
Examining Relationships

slightly better, but neither is very good: distance accounts for only 9.3% (right) and 10.1% (left)
of the variation in time.

(5) r 2 = 0.536, which indicates that about 54% of the variation in the average number of home
runs per game is accounted for by the linear relationship with year. In other words, about 46% of
the variation is not explained by the least-squares regression line. (6) The predicted value for
CASE CLOSED (1) A scatterplot is shown below. The average number ofhome runs hit per
2001 is about 2.16. This estimate is probably not very accurate. In particular, since the residuals
game decreases from 1960 to 1970, then levels offbefore increasing from about 1980 to 2000.
are positive for all years after 1995, this estimate is likely to be too low. (7) The prediction error
The correlation is which indicates a moderate positive association.
is 2.092-2.16 = -0.068. The estimate is not bad, and it even overestimated the average number
of home runs per game. (8) No, these data should not be used to predict the mean number of


... home runs per game in 2020. This case study has illustrated that patterns can change over time
so we have no data to help use predict what might happen 20 years in the future. We should not
use the regression line to extrapolate.


3.77 Seriousness of the fire is a lurking variable: more serious fires need more attention. It
would be more accurate to say that a large fire "causes" more firefighters to be sent, rather than
vice versa.

3.78 (a) Two mothers are 57 inches tall; their husbands are 66 and 67 inches tall. (b) The tallest
(2) A scatterplot below, with the regression line, shows a moderately strong linear association fathers are 74 inches tall; there are three of them, and their wives are 62, 64, and 67 inches tall.
between average home runs per game and year after Rawlings became the supplier. The (c) There is no clear explanatory variable; either could go on the horizontal axis. (d) Positive
correlation is 0.732. association means that when one parent is short (tall) the other parent also tends to be short (tall).
In other words, there is a direct association between the heights of parents. We say the
association is weak because there is a considerable amount of variation (or scatter) in the points.

. -.~ 3.79 (a) A scatterplot, with the regression line, is shown below. There is a negative association
~~ between alcohol consumption and heart disease.

~
!)

=
(3) The least-squares regression line is y -61.09+ 0.0316x. The slope (0.0316) indicates the
average increase in the average number of home runs as year increases by one. The intercept has
no practical meaning in this setting. (4) The residual plot suggests that the regression line
Examining Relationships 87
86 Chapter 3

(b) There is a very strong positive linear relationship, r = 0.999. (c) The regression line for
=
predicting y = steps per second from x = running speed is y 1. 7661 + 0.0803x. (d) Yes,
r 2 = 0.998 so 99.8% of the variation in steps per second is explained by the linear relationship
....
~.
- .~
with speed. (e) No, the regression line would change because the roles of x andy are reversed .

. .. ~
~
-,~
However, the correlation would stay the same, so r 2 would also stay the same .

3.82 The correlation for the individual runners would be lower because there is much more
~. variation among the individuals. The variation in the average number of steps for the group is
" smaller so the regression line does a great job for the published data.

3.83 (a) One possible measure ofthe difference is the mean response: 106.2 spikes/second for
(b) The regression equation for predicting y = heart disease death rate from x = alcohol
=
consumption is y 260.56- 22.969x. The slope provides an estimate for the average decrease
pure tones and 176.6 spikes/second for monkey calls-an average of an additional 70.4
=
spikes/second. (b) A scatterplot, with the regression line y 93.9 + 0.778x, is shown below.
(slope is negative) in the heart disease death rate for a one liter increase in wine consumption.
Thus, for every extra liter of alcohol consumed, the heart disease death rate decreases on average
by about 23 per 100,000. The intercept provides an estimate for the average death rate (261 per
100,000) when no wine is consumed. (c) The correlation is r = -0.843, which indicates a strong
2
negative association between wine consumption and heart disease death rate. r = 0.71, so 71%

of the variation in death rate is accounted for by the linear relationship with wine consumption.
= =
(d) The predicted heart disease death rate is y 260.56-22.969 x 4 168.68. (e) No. Positive r

indicates that the least-squares line must have positive slope, negative r indicates that it must

have negative slope. The direction ofthe association and the slope of the least-squares line must

always have the same sign. Recall b = r(2)sx


and the standard deviations are always
The third point (pure tone 241, call 485 spikes/second) has the largest residual; it is circled. The
nonnegative. first point (474 and 500 spikes/second) is an outlier in the x direction; it is marked with a square.
(c) The correlation drops only slightly (from 0.6386 to 0.6101) when the third point is removed;
3.80 (a) The point at the far left of the plot (Alaska) and the point at the extreme right (Florida) it drops more drastically (to 0.4 793) without the first point. (d) Without the first point, the
are unusual. Alaska may be an outlier because its cold temperatures discourage older residents = =
regression line is y 101 +0.693x; without the third point, it isy 98.4+ 0.679x.
from remaining in the state. Florida is unusual because many individuals choose to retire there.
(b) The linear association is positive, but very weak. (c) The outliers tend to suggest a stronger 3.84 (a) In the mid-1990s, European and American stocks were only weakly linked, but now it
linear trend than the other points and will be influential on the correlation. Thus, the correlation is more common for them to rise and fall together. Thus investing in both types of stocks is
with the outliers is r = 0.267, and the correlation without the outliers is r = 0.067. not that much different from investing in either type alone. (b) The article is incorrect; a
correlation of0.8 means that a straight-line relationship explains about 64% of the variation in
lot, with there ression line, is shown below. European stock prices.

27
3.85 The slope is b = 0.5( ) = 0.54. The regression line, shown below, for predicting y =
2.5
husband's height from x =wife's height isy = 33.67 + 0.54x.
1

88 Chapter 3

The predicted height is y = 33.67 + 0.54 x 67 = 69.85 inches.


3.86 Who? The individuals are the essays provided by students on the new SAT writing test.
What? The variables are the word count (length of essay) and score. Both variables are
quantitative and take on integer values. Why? The data were collected to investigate the
relationship between length of the essay and score. When, where, how, and by whom? The data
were collected after the first administration of the new SAT writing test in March, 2005. Dr.
Perelman may have obtained the data from the Educational Testing Service or from colleagues
who scored the essays. Graphs: The scatterplot below, with the regression line included, shows
a relationship between length of the essay and score, but the relationship appears to be nonlinear.
The residual plot also shows a clear pattern, so using the least-squares regression line to predict
score from of is not a idea.


/

..... / .




..
'
...
...

Numerical summaries: The correlation between word count and score is 0.881. The least squares
=
regression line for predicting y =score from x =word count is y 1.1728+0.0104. This line
accounts for about 77.5% of the variation in score. Interpretation: Even though the scatterplot
shows a moderately strong positive association between length of the essay and score, we do not
want to jump to conclusions about the nature of this relationship. Better students tend to give
more thorough explanations so there could be another reason why the longer essays tend to get
high scores. In fact, a careful look at the scatterplot reveals considerably more variation in the
length of the essays for students who received a score of 4, 5, or 6. If Dr. Perelman's made his
second conclusion about being right over 90% of the time by rounding the correlation coefficient
from 0.88 to 0.9, then he made a serious mistake with his interpretation of the correlation
coefficient. If scores were assigned by simply sorting the word counts from smallest to largest,
the error rate would be much larger than 10%.
89

Chapter4

4.1 (a) Yes, the scatterplot below (left) shows a linear relationship between the cube root of
weight, ~weight , and length.

...

(b) Let x =length andy= ~weight . The least-squares regression line is y = -0.0220 + 0.2466x.
The intercept of -0.0220 clearly has no practical interpretation in this situation, since weight and
the cube root of weight must be positive. The slope 0.2466 indicates that for every 1 em increase
in length, the cube root of weight will increase, on average, by 0.2466. (c)
= =
~weight = -0.0220 + 0.2466 x 36 8.8556, so the predicted weight is 8.8556 694.5 g. The
3

predicted weight with this model is slightly higher than the predicted weight of 689 .9g with the
model in Example 4.2. (d) The residual plot above (right) shows the residuals are negative for
lengths below 17 em, positive for lengths between 18 em and 27 em, and have no clear pattern
for lengths above 28 em. (e) Nearly all (99.88%) of the variation in the cube root of the weight
can be explained by the linear relationship with the length.

4.2 (a) The scatterplot below (left) shows positive association between length and period with
unusual 2.11 in the top right corner.
~~~~~-

..
...

(b) The residual plot above (right) shows that the residuals tend to be small or negative for small
lengths and then get larger for lengths between 40 and SO em. The residual for the one very large
length is negative again. Even though the value of r 2 is 0.983, the residual plot suggests that a
model with some curvature (or a linear model after a transformation) might be better. (c) The
information from the physics student suggests that there should be a linear relationship between
90 Chapter 4 More about Relationships between Two Variables 91

period and.J!ength. (d) A scatterplot (left) and residual plot (right) are shown below for the
transformed data. The least-squares regression line for the transformed data is
y = -0.0858+ 0.210.,jlength. The value ofr 2 is slightly higher, 0.986 versus 0.983, and the
residual plot looks better, although the residuals for the three smallest lengths are positive and
the residuals for the next six len ths are ne ative.

.. (d) Letting y =I/ P, the least-squares regression line is y = 0.1002 + 0.0398V. The scatterplot
(below on the left), the value ofr 2 = 0.9997, and the residual plot (below on the right) indicate
that the linear model provides an excellent fit for the transformed data. This transformation also
achieves linearity because V =

(e) According to the theoretical relationship, the slope in the model for (d) should be
=
~ 0.2007 . The estimated model appears to agree with the theoretical relationship because
-v980
the estimated slope is 0.210, an absolute difference of about 0.0093. (f) The predicted length of
=
an SO-centimeter pendu.lum is y = -0.0858 + 0.210.J80 1.7925 seconds.

4.3 (a) A scatterplot is shown below (left). The relationship is strong, negative and slightly
nonlinear with no outliers.
(e) When the gas volume is 15 the model in part (c) predicts the pressure to be
=
P= 0.3677 + 15.8994(1/15) 1.4277 atmospheres, and the model in part (d) predicts the
=
reciprocal of pressure to be 0.1002 + 0.0398(15) = 0.6972 or P=1/0.6972 1.4343
atmospheres. The predictions are the same to the nearest one-hundredth of an atmosphere.
2
4.4 (a) The scatterplot below (left) shows that the relationship between period and length is
rou hl linear.

(b) Yes, the scatterplot for the transformed data (above on the right) shows a clear linear
relationship. (c) The least-squares regression equation is P= 0.3677 + 15.8994(1/V). The
square of the correlation coefficient, r 2 = 0.9958, indicates almost a perfect fit. The residual plot
(below) shows a definite pattern, which should be of some concern, but the model still provides a
good fit. ...

(b) The least-squares regression line for the transformed data y =period and x =length is
y = -0.1547 +0.0428x. The value of r 2 = 0.992 and the residual plot above (right) indicate that

L
Chapter 4 More about Relationships between Two Variables 93
92

the linear model provide a good fit for the transformed data. As we noticed in Exercise 4.2 part
(d), the residual plot looks better, but there is still a pattern with the residuals for the three
smallest lengths being positive and the residuals for the next six lengths being negative. (c)
4112
According to the theoretical relationship, the slope in the model should be
980
=
0.0403. The

estimated model appears to agree with the theoretical relationship because the estimated slope is
0.0428, an absolute difference of about 0.0025. (d) The predicted length of an SO-centimeter
=
pendulum is y = -0.154 7 + 0.0428 x 80 3.2693 or a period of 1.8081 seconds. The two models
provide very similar predicted values, with an absolute difference of only 0.0156.

4.5 (a) A scatterplot is shown below (left). The relationship is strong, negative and nonlinear (or
=
(g) At 22m, the predicted light intensity is y = 888.1139e-0333x22 0.5846lumens. No, the
absolute difference between the observed light intensity 0.58 and the predicted light intensity
0.5846 is very small (0.0046lumens) because the model provides an excellent fit.

(b) The ratios (120.42/168, 86.31/120.42,61.87/86.31,44.34/61.87,31.78/44.34, and


22.78/31.78) are all 0.717. Since the ratios are all the same, the exponential model is
appropriate. (c) Yes, the scatterplot (above on the right) shows that the transformation achieves
linearity. (d) If x =Depth and y = ln(Light Intensity), then the least-squares regression lines is (b) The ratios are 226,260/63,042 = 3.5890, 907,075/226,260 = 4.0090, and 2,826,095/907,075 =
y =6.7891-0.3330x. The intercept 6.7891 provides an estimate for the average value of the 3.1156. (c) The transformed values ofy are 4.7996, 5.3546, 5.9576, and 6.4512. A scatterplot of
the logarithms against year is shown above (right). (d) Minitab output is shown below.
natural log of the light intensity at the surface of the lake. The slope, -0.3330, indicates that the The regression equation is
natural log of the light intensity decreases on average by 0.3330 for each one meter increase in log{Acres) = - 1095 + 0.556 year
depth. (e) The residual plot below (left) shows that the linear model on the transformed data is p
Predictor Coef SE Coef T
appropriate. (Some students may suggest that there is one unusually large residual, but they need Constant -1094.51 29.26 -37.41 0.001
to look carefully at the scale on they-axis. All of the residuals are extremely small.) (f) If x = year 0.55577 0.01478 37.60 0.001
Depth and y =Light Intensity, then the model after the inverse transformation is S = 0.0330502 R-Sq = 99.9% R-Sq{adj) = 99.8%
y =e 67891 e-0 333xor y =888.1139x 0.7168x.
The scatterplot below (right) shows that the (e) If x = year and y = acres, then the model after the inverse transformation is
exponential model is excellent for these data. y = I0-109451105558 x.
The coefficient of l05558 x is 0.0000 (rounded to 4 decimal places) so all of
the predicted values would be 0. (Note: If properties of exponents are not used to simplify the
right-hand-side, then some calculators will be able to do the calculations without having serious
overflow problems.) (f) The least-squares regression line oflog(acres) on year is
y =4.2513 + 0.5558x. (g) The residual plot below shows no clear pattern, so the linear
regression model on the transformed data is appropriate.
Chapter 4 More about Relationships between Two Variables 95
94

mistake. (d) A scatterplot of the logarithms against year (above on the right) shows a strong,
positive linear relationship. (e) The least-squares regression line for predicting the logarithm of
y =deaths from x =year is approximately y = -587.0 + 0.301x. Thus, the predicted value in
=
1995 is y = -587.0 + 0.301 x 1995 13.495. As a check, log(2 45 ) =13.5463. The absolute
difference in these two predictions, 0.0513, is relatively small.

(h) If x =year and y = acres, then the model after the inverse transformation is
y = 1042513105558x =17,836.1042 x 10.5558 x.
A scatterplot with the exponential model
superimposed is shown above (right). The exponential model provides an excellent fit. (i) The
predicted number of acres defoliated in 1982 (5 years since 1977) is
=
y 17,836.1042 x 105558x5 = 10,722,597.42 acres.
4.7 (a) If y =number of transistors and x =number of years since 1970, then y(l) = ab 1 =2250. (b) In the scatterplot above (right), the transformed data appear to be linear from 0 to 90 (or 1790

andy(4) = ab 4 = 9000, so a= (
2250
.
)3 =1417.4112 and b =
4
2250
.4 =. 1.5874. This. model
to about 1880), and then linear again, but with a smaller slope. The linear trend indicates that the
exponential model is still appropriate and the smaller slope reflects a slower growth rate. (c) The
90000 25 1417 112 least-squares regression line for predicting y = log(population) from x = time since 1790 is
predicts the number of transistors in year x after 1970 to be y =1417.4112x 1.5874x. (b) Using y = 1.329 + 0.0054x . Transforming back to the original variables, the estimated population size
the natural logarithm transformation on both sides of the model in (a), produces the line is21.3304x1.0125x. A scatterplot with this regression line is shown below (left). (d) The
lnj/ = 7.2566+0.4621x. (c) The slope for Moore's model (0.4621) is larger than the estimated residual plot (below on the right) shows random scatter and r 2 = 0.995, so the exponential model
slope in Example 4.6 (0.332), so the actual transistor counts have grown more slowly than ,.,.,...,.,"'n"'" an excellent fit.
Moore's law suggests.

(e) The predicted population in 2010 is y = 1.329 + 0.0054 x 220 =2.517 or about
10 =328.8516 million people. The prediction is probably too low, because these estimates
2 517

usually do not include homeless people and illegal immigrants.

4.10 (a) A scatterplot of distance versus height is shown below (left).

(c) According to the paper, the number of children killed x years after 1950 is 2x. Thus,
2 45 = 3.5184 x 1013 or approximately 35 trillion children were killed in 1995. This is clearly a
96 Chapter4 More about Relationships between Two Variables 97

(b) The curve tends to bow downward, which resembles a power curve xP with p < 1. =
(c) The inverse transformation gives the estimated power modeljl = 107617 x02182 5.7770x02182
Since we want to pull in the right tail of the distribution, we should apply a transformationxP (d) This model predicts the average life span for humans to be
with p < 1. (c) A scatterplot of distance against the square root of height (shown above, right) =
y 5. 7770 x 65"2182 = 14.3642 years, considerably shorter than the expected life span of humans.
straightens the graph quite nicely.
(e) According to the biologists, the power model is y = ax02 The easiest and best option is to
4.11 (a) Let x =Body weight in kg and y =Life span in years. Scatterplots ofthe original data plot a graph of {weight 02 , lifespan) and then fit a least-squares regression line using the
(left) and the transformed data (right), after taking the logarithms of both variables, are shown transformed weight as the explanatory variable. The scatterplot (above on the right) shows that
below. The linear trend in the scatterplot for the transformed data suggests that the power model this model provides a good fit for the data. The least-squares regression line is
=
y = -2.70+ 7.95x02 with a predicted average life span of y = -2.7 + 7.95x 65 2 15.62 years for
humans. Note: Students may try some other models, which are not as good. For example,
raising both sides of the equation to the fifth power, the model becomes l = a 5x, which is a
linear regression model with no intercept parameter (or an intercept of zero). After transforming
life spany to/, the estimated model is ys = 30,835x. This model predicts the average life span
of humans to be y =(30,835 x 65t"2 =18.2134 years. Another option is to try plotting a graph of
5
{weight, lifespan ) to achieve linearity. The least-squares regression line for this set of
transformed data is ys =13 89463 + 30, 068x with a predicted average life span of
y = (1389463+30068x65t"2 =20.1767 years for humans. Note that none ofthe models
(b) The least squares regression line for the transformed data is logy= 0.7617 +0.2182log(x). provides a reasonable estimate for the average life span of humans.
2
The residual plot (below on the left) shows fairly random scatter about zero and r = 0.7117 . 4.12 (a) The power model would be more appropriate for these data. The scatterplot of the log
Thus, 71.17% of the variation in the log of the life spans is explained by the linear relationship of cost versus diameter (below on the left) is linear, but the plot of the log of cost versus the log
with the log of the body weight. of diameter (below on the right) shows almost a perfect straight line.
98 Chapter 4 More about Relationships between Two Variables 99

height. (d) The residual plot below for the transformed data shows that the residuals are very
close to zero with no discernable attern. This model clearly fits the transformed data very well.

(b) Let y = the cost of the pizza and x = the diameter of the pizza. The least-squares regression
line is logy = -1.5118 + 2.1150 log x . The inverse transformation gives the estimated power (e) The inverse transformation gives the estimated power model
=
modelj/ = 10-uus x 2 ' 115 0.0308x 2 115
' (c) According to this model, the predicted costs of the

y = 1o-13912 x 20029 ::::: 0.0406x20029 The predicted weight of a 5' 10 (70") adult is
four different size pizzas are $4.01, $5.90, $8.18, and $13.91, from smallest to largest. There are y = 0.0406 x 702' 0029 =201.4062 lbs, and the predicted weight of a 7' (84") adult is
only slight differences between the predicted costs for the model and the actual costs, so an
adjustment does not appear to be necessary based on this model. (d) According to our estimated
y = 0.0406 x 84 2 0029
' =290.1784lbs.
power model in part (b), the predicted cost for the new "soccer team" pizza is 4.14 Who? The individuals are hearts from various mammals. What? The response variable y
=
y = 0.0308 x 24 2 115 $25.57 . (e) An alternative model is based on setting the cost proportional is the weight of the heart (in grams) and the explanatory variable xis the length ofthe left
to the area, or the power model of the form cost oc ( 1i/4) x 2 Most students will square the ventricle (in em). Why? The data were collected to explore the relationship in these two
diameter and then fit a linear model to obtain the least squares regression line quantitative measurements for hearts of mammals. When, where, how, and by whom? The data
were originally collected back in 1927 by researchers studying the physiology ofthe heart.
y = -0.506 + 0.0445x2 The estimated price ofthe "soccer team" pizza is Graphs: A scatterplot of the original data is shown below (left). The nonlinear trend in the
y = -0.506 + 0.0445 x 242 =$25.13 Alternatively, this model can be rewritten as .JY= bx . scatterplot makes sense because the heart weight is a 3-dimensional characteristic which should
Using least-squares with no intercept, the value of b is estimated to be 0.2046, so the predicted be proportional to the length of the cavity of the left ventricle. A scatterplot, after transforming
cost of the "soccer team" pizza is y = (0.2046 x 24)
2
=$24.11. the data by taking the logarithms of both variables, shows a clear linear trend (below, right) so
the model is 0:.1'\l'1ol'r>l'1ol'i

4.13 (a) As height increases, weight increases. Since weight is.a 3-dimensional characteristic
and height is !-dimensional, weight should be proportional to the cube of the height. A model of
the form weight= a(height)b would be a good place to start. (b) A scatterplot of the response
variable versus the variable x =height is shown below.

Numerical Summaries: The correlation between log of cavity length and log of heart weight is
0.997, indicating a near perfect association. Model: The power model is weight= ax lengthb.
After taking the logarithms of both variables, the least-squares regression line is
(c) Calculate the logarithms of the heights and the logarithms of the weights. The least-squares logy = -0.13 64 + 3.13 87log x . Approximately 99.3% of the variation in the log of heart weight
regression line for the transformed data is logy= -1.3912 + 2.0029log x. r 2 = 0.9999; almost is explained by the linear relationship with log of cavity length. The residual plot below suggests
that there may be a little bit of curvature remaining, but nothing to get overly concerned about.
all (99.99% of the variation in log ofweight is explained by the linear relationship with log of

.L______________________________________________...---...............................................~5
Chapter 4 More about Relationships between Two Variables 101
100

the model from (b) and 109.51 em using the model from (d). There is very little difference in the
2
predicted values, but most students will probably pick the prediction from (d) because r is a
little higher and the residual plot shows less variability about the regression line.

4.16 (a) We are given the modellny = -2.00+2.42lnx. Using properties oflogarithms, the
power model is e1ny = e-2.00+2.42 Inx or y = e-2.00 x 2.42 (b) The estimated biomass of a tree with a
diameter of 30 em is y= e-2.00 x 302.4 2
=508.2115 kg.
4.17 Who? The individuals are carnivores. What? The response variable y is a measure of
abundance and the explanatory variable xis the size of the carnivore. Why? Ecologists were
Interpretation: The inverse transformation gives the estimated power model interested in learning more about nature's patterns. When, where and how? The data were
=
y = 10--0 1364 x 31387 0.7305x3 1387 , which provides a good fit for these data. collected before 2002 (the publication date) by relating the body mass of the carnivore to the
number of carnivores. Rather than simply counting the total number of observed carnivores, the
researchers created a measure of abundance based on a count relative to the size of prey in an
area. Graphs: A scatterplot of y = abundance versus x = body mass (on the left below) shows a
nonlinear relationship. Using the log transformation for both variables provides a moderately
linear the scatterplot be[llo~w~ol]nlth~e~lj~~~-~[tlili~~~

.:
..
.. ..
2
(b) The least-squares regression line for the transformed data is y = 0.990 + 490.416x (c) The
residual plot above (right) shows random scatter and r 2 = 0.9984, so 99.84% of the variability in Numerical Summaries: The correlation between log body mass and log abundance is -0.912.
the distance fallen is explained with this linear model. (d) Yes, the scatterplot below (left) shows Model: The least-squares regression line for the transformed data is
that this transformation does a very good job creating a linear trend. The least-squares regression logy =1.9503 -1.0481log x, with an r 2 = 0.8325 and a residual plot (below) showing no obvious
line for the transformed data is .JY = 0.1046 + 22.0428x.
~~~~~~

. . ..

Interpretation: The inverse transformation gives the estimated power model


(e) The residual plot above (right) shows no obvious pattern and r 2 = 0.9986. This is an excellent
model. (f) The predicted distance that an object had fallen after 0.47 seconds is 109.32 em using
=
y = 101.9503 x -1.0481 89.1867 x -1.0481 , which provides a good fit for these data.
102 Chapter 4 More about Relationships between Two Variables 103

4.18 Let x =the breeding length, length at which 50% of females first reproduce andy= the
asymptotic body length. The scatterplot (left) and residual plot (righ9 below sh?w that the linear
model does not provide a great fit for these body measurements of this fish species. Most of the
residuals are for below 30 em a~n~d~a~b~ov~e~1~50~cm~.~~~~EillJ~~

...

(b) The first phase is from 0 to 6 hours when the mean colony size actually decreases. This
decrease is hard to see on the graph of the original data, but is more obvious on the graph of the
transformed data. In the second phase, from 6 to 24 hours, the mean colony size increases
exponentially. Both graphs show this phase clearly, but it is most noticeable from the linear
trend on the graph of the transfonned data for this time period. At 36 hours, mean growth is in
Applying the log transfonnation to both lengths produces better results. The scatterplot (left) the third phase where growth is still occurring, but at a lower rate than the previous phase. The
and residual plot (right) below show that a linear model provides a very good fit. The least point in the top right comer of both graphs clearly shows the new phase because this point does
squares regression model for the transfonned data is logy = 0.3011 + 0.9520 logx, with an not fit the pattern for phase two. (c) Let y = mean colony size and x =time. The least-squares
r 2 = 0.898 and a residual plot with very little structure, although most of the residuals are still regression line for the transfonned data is logy= -0.5942 + 0.0851x. Using the inverse
P<T<nn., .. when the variable is above 1.9. transfonnation, the predicted size of a colony 10 hours after inoculation is
Y= J0-0.5942100.085Jxl0 = 100.2568 ='= 1.8063

4.20 The correlation for time (hours 6-24) and log (mean colony size) is r = 0.9915. The
correlation time (hours 6-24) and log (individual colony size) isr = 0.9846. As expected, the

correlation for the individual colony size is smaller than the correlation for the mean colony size

.
'
...



because individual measurements have more variability. The scatterplots below show the
differences in the for mean colony sizes and individual
~~~-~

. gives
The inverse transformation . t he estimate
. d power mo.de1 y~ =1003011 x 0.952 =. 2 .0003 x 0.952 ,
which provides a good fit for these data.

4.19 (a) Scatterplots ofthe original data (left) and the transformed data (right) are shown below.

3
4.21 (a) Weight =c1(height) and strength= c2 (height) 2 , so strength= c3 (weight) 213 , where cP
C2, and C3are arbitrary constants. (b) The graph of y =x 213 below shows that strength does not
increase linearly with body weight, as would be the case if a person 1 million times as heavy as
an ant could lift 1 million times more than the ant. Strength increases more slowly. For
example, if weight is multiplied by 1000, strength will increase by a factor of 1000213 =100.

'
Chapter 4 More about Relationships between Two Variables 105
104

4.26 (a) The two-way table is shown below. (b) The percent of eggs in each group that hatched
are 59.26% in a cold nest, 67.86% in a neutral nest, and 72.12% in a hot nest. The percents
indicate that hatching increases with temperature. The cold nest did not prevent hatching, but
made 1"t Iess l"k I
1 ety.
Cold Neutral Hot
Hatched 16 38 75
Not hatched 11 18 29
Total 27 56 104

4.27 (a) The two conditional distributions are shown in the table below. The biggest difference
between men and women is in Administration-a higher percentage of women chose this major.
1 A greater percent of men chose the other fields, especially finance. (b) A total of 386 students
4.22 (a) Answers will vary. (b) The population of cancer cells aftern-1 years is P = Po(7 I 6f- resp onded , so 722- 386 = 336 d"d
I not resp1ond. About 46.54% of the students did not respond.
1 1
The population of cancer cells after n years is P = Po (7 I 6Y- + (1 I 6)(Po (7 I 6y- ) = Po (7 I 6)n . Female Male
(c) Answers will vary, but the exponential model should provide a good fit for the data collected. Accounting 30.22% 34.78%
Administration 40.44% 24.84%
4.23 (a) The sum of the six counts is 10+9+24+61+206+548 = 858 people. (b) The sum ofthe Economics 2.22% 3.73%
top row shows 10+9+24 = 43 people had arthritis. (c) The marginal distribution of participation Finance 27.11% 36.65%
in soccer is shown below.
Elite Non-elite Did not play 4.28 Two examples are shown below. In general, choose a to be any number from 0 to 50, and
Count 71 215 572 then all the other entries can be determined.
Percent 8.3% 25.1% 66.7%
..
lliiill [JQTIQJ
(d) The percent of each group who have arthntls IS 14.08% for the elite soccer players, 4.2% for OTIJIJ []QIQJ
the non-elite soccer players and 4.19% for the people who did not play. This suggests an Note: This is why we say that such a table has "one degree offreedom:" We can make
association between playing elite soccer and developing arthritis. one (nearly) arbitrary choice for the value ofa, and then have no more decisions to make.

4.24 The percents should add to 100% because they provide a breakdown of all participants 4.29 (a) The two-way table is shown below. (b) Overall, 11.88% ofwhite defendants and
according to one categorical variable. The sum is 8.3% + 25.1% + 66.7% = 100.1 %. If one 10.24% ofblack defendants receive the death penalty. For white victims, 12.58% of white
more decimal place is included in each of the percents, then the sum is 8.28% + 25.06% + defendants and 17.46% of black defendants receive the death penalty. For black victims, 0% of
66.67% = 100.01%. The percents do not add to 100% because ofrounding. white defendants and 5.83% of black defendants receive the death penalty. (c) The death penalty
is more likely when the victim was white (14.02%) rather than black (5.36%). Because most
4.25 (a) The sum of the six counts is 5375 students. (b) The proportion of these students who convicted killers are of the same race as their victims, whites are more often sentenced to death.
smoke is 100415375 = 0.1868, so the percent of smokers is 18.68%. (c) The marginal Death penalty No death penalty
distr ibution of parents smok"mg b eh av10r
IS shown beow.
1 White defendant 19 141
Neither parent smokes One parent smokes Both parents smoke Black defendant 17 149
Count 1356 2239 1780
Percent 25.23% 41.66% 33.12% 4.30 (a) The two-way table is shown below. (b) Overall, 70% of male applicants are admitted,
(d) The three conditional d1stnbut10ns are shown m the table below. while only 56% of females are admitted. (c) In the business school, 80% of male applicants are
Neither parent One parent Both parents admitted, compared with 90% of females. In the law school, 10% of males are admitted,
smokes smokes smoke compared with 33.33% of females. (d) Six out of7 men apply to the business school, which
Student does not smoke 86.14% 81.42% 77.53% admits 82.5% of all applicants, while 3 out of 5 women apply to the law school, which admits
13.86% 18.58% 22.47% 1 s applrI cant s.
only 27 5o/(oof"t
Student smokes
The conditional distributions reveal what many people expect-parents have a substantial Admit Deny
influence on their children. Students that smoke are more likely to come from families where Male 490 210
one or more of their parents smoke. Female 280 220

L
106 Chapter 4 More about Relationships between Two Variables 107

4.31 The table below gives the two marginal distributions. The marginal distribution of marital
=
status is found by taking, e.g., 337/8235 4.1 %. The marginal distribution of job grade is found
=
by taking, e.g., 955/8235 11.6%.
Single Married Divorced Widowed
4.1% 93.9% 1.5% 0.5%
Grade 1 Grade 2 Grade 3 Grade 4
11.6% 51.5% 30.2% 6.7%
As rounded here, both sets of percents add up to 100%. If students round to the nearest whole
percent, the marital status numbers add up to 101%. lfthey round to two places after the
decimal, the job grade percents add up to 100.01%.

4.32 The percent of single men in grade 1 jobs is 58/337='= 17.21%. The percent of grade 1 jobs 4.37 (a) To find the marginal distribution of opinion, we need to know the total numbers of
=
held by single men is 58/955 6.07%. people with each opinion: 49/133 = =
36.84% said "higher," 32/133 24.06% said "the same,"
=
and 52/133 39.10% said "lower." The numbers are summarized in the first table below. The
=
4.33 Divide the entries in the first column by the first column total; e.g., 17.21% 58/337. main finding is probably that about 39% of users think the recycled product is oflower quality.
These should add to 100% (except for rounding error). The percentages in the table below add to This is a serious barrier to sales. (b) There were 36 buyers and 97 nonbuyers among the
100.01%. =
respondents, so (for example) 20/36 55.56% of buyers rated the quality as higher. Similar
Job grade % of single men arithmetic with the buyers and nonbuyers rows gives the two conditional distributions of opinion,
1 17.21% shown in the second table below. We see that buyers are much more likely to consider recycled
2 65.88% filters higher in quality, though 25% still think they are lower in quality. We cannot draw any
3 14.84% conclusion about causation: It may be that some people buy recycled filters because they start
4 2.08% with a high opinion of recycled products, or it may be that use persuades people that the quality
If the percents are rounded to the n~tarest tenth, 17.2%, 65.9%, 14.8%, and 2.1 %, then they add to is hi h.
~--------------------
100%. Higher The same Lower
36.84% 24.06% 39.10%
4.34 (a) We need to compute percents to account for the fact that the study included many more
married men than single men, so that we would expect their numbers to be higher in every job Higher The same Lower
grade (even if marital status had no relationship with job level). (b) A table of percents is below; Buyers 55.56% 19.44% 25.00%
descriptions of the relationship may vary. Single and widowed men had higher percents of grade Nonbuyers 29.90% 25.77% 44.33%
1 jobs; single men had the lowest (and widowed men the highest) percents of grade 4 jobs.
Job grade Single Married Divorced Widowed 4.38 (a) The two-way table is shown below. (b) The overall batting averages are 0.240 for Joe
1 17.21% 11.31% 11.90% '19.05% and 0.260 for Moe. Moe has the best overall batting average.
4 2.08% 6.90% 5.56% 9.52% Hit No hit
Joe 120 380
4.35 Age is the main lurking variable: Married men would generally be older than single men, Moe 130 370
so they would have been in the work force longer, and therefore had more time to advance (c) Two separate tables, one for each type of pitcher, are shown below. Against left-handed
in their careers. pitchers, Joe's batting average is 0.200 and Moe's batting average is 0.100. Against right-
handed pitchers, Joe's batting average is 0.400 and Moe's batting average is 0.300. Joe is better
4.36 (a) A bar graph is shown below-58.33% of desipramine users did not have a relapse, again st b0 th k'mdso f p1'tchers.
while 25.0% oflithium users and 16.7% of those who received a placebo succeeded in breaking Left-handed pitchers Right-handed pitchers
their addictions. (b) Because random assignment was used, there is statistical evidence for Hit No hit Hit No hit
causation (though there are other questions we need to consider before we can reach that Joe 80 320 Joe 40 60
conclusion). Moe 10 90 Moe 120 280
(d) Both players do better against right-handed pitchers than agamst left-handed pitchers. Joe
spent 80% ofhis at-bats facing left-banders, while Moe only faced left-banders 20% of the time .

b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .- - .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .~
108 Chapter 4 More about Relationships between Two Variables 109

4.39 Examples will vary, of course; one very simplistic possibility is shown below. The key is Rows: Degree Columns: Gender
Female Male All
to be sure that there is a lower percentage of overweight people among the smokers than among
the nonsmokers. Associate's 431 244 675
63.85 36.15 100.00
26.85 21.90 24.83
Combined -All People 15.851 8.974 24.825
Early Death
Bachelor's 813 584 1397
Yes No 58.20 41.80 100.00
Overweight 41 59 50.65 52.42 51.38
29.901 21.478 51.379
Not overweight 50 50
Doctor's 21 24 45
Non smokers 46.67 53.33 100.00
Smokers
1.31 2.15 1. 66
Early Death Earl Death 0. 772 0.883 1.655
Yes No Yes No
Overweight 31 59 Master's 298 215 513
Overweight 10 0 58.09 41.91 100.00
Not overweight 40 20 Not overweight 10 30 18.57 19.30 18.87
10.960 7.907 18.867
4.40 Who? The individuals are students. What? The categorical variables of interest are Professional 42 47 89
educational level or degree (Associate's, Bachelor's, Master's, Professional, or Doctor's) and 47.19 52.81 100.00
gender (male or female). Why? The researchers were interested in checking if the participation 2.62 4.22 3.27
1.545 1. 729 3.273
of women changes with level of degree. When, where, how, and by whom? These projections,
in thousands, were made for 2005-2006 by the National Center for Education Statistics. Graphs: All 1605 1114 2719
59.03 40.97 100.00
The conditional distributions of sex for each degree level are shown in the bar graph below (left).
100.00 100.00 100.00
The conditional distributions of degree level for each gender are shown in the bar graph below 59.029 40.971 100.000

Cell Contents: Count


% of Row
% of Column
% of Total
Interpretation: Women earn a majority of associate's, bachelor's, and master's degrees, but fall
slightly below 50% for professional and doctoral degrees. The distributions of degree level are
very similar for females and males.

4.41 No. Rich nations have more TV sets than poor nations. Rich nations also have longer life
expectancies because they have better nutrition, clean water, and better health care. There is
common response relationship between TV sets and length of life.

Numerical summaries: The software output below from Mintab provides the joint distribution,
marginal distributions, and conditional distributions in one consolidated table. The first entry in
each cell is the count, the second entry is the% of the row (or the conditional distribution of
gender for each type of degree), the third entry is the % of the column (or the conditional
distribution of degree for each gender), and the fourth entry is the overall %.

_L_____________________________________.s~----------------------------------~
Chapter 4 More about Relationships between Two Variables 111
110

~~
--- -------- -- ... ~~~
~~~
~~~ --- --------

Exposure to
x=#of y =average Miscarriages
chemicals
TV sets life span

Time
standing up

4.42 In this case, there may be a causative effect, but in the direction opposite to the one 4.44 Well-off people tend to have more cars. They also tend to live longer, probably because
suggested: People who are overweight are more likely to be on diets, and so choose artificial they are better educated, take better care of themselves, and get better medical care. The cars
sweeteners over sugar. (Also, heavier people are at a higher risk to develop diabetes; if they do, have nothing to do with it. The relationship between number of cars and length of life is
they are likely to switch to artificial sweeteners.) common response.

Use of Weight
sweeteners gain

Number
4.43 No. The number of hours standing up is a confounding variable in this case. The diagram of cars Length of
below illustrates the confounding between exposure to chemicals and standing up. life

4.45 It could be that children with lower intelligence watch many hours of television and get
lower grades as well. It could be that children from lower socio-economic households where
parents are less likely to limit television viewing and are unable to help their children with their
schoolwork because the parents themselves lack education. The variables "number of hours
112 Chapter4 More about Relationships between Two Variables 113

watching television" and "grade point average" change in common response to "socio-economic familiarity with the test. The student may also have increased knowledge from additional math
status" or "IQ". and science courses.
...... --------- -- ... ---
--- --- _... --- --------- -- ... -......
--- ---
---

?
Number of hours
spent watching TV GPA
Coaching SAT
Course score

IQ or socioeconomic
status Experience

4.46 Single men tend to have a different value system than married men. They have many 4.48 A reasonable explanation is that the cause-and-effect relationship goes in the other
interests, but getting married and earning a substantial amount of money are not among their top direction: Doing well makes students feel good about themselves, rather than vice versa.
priorities. Confounding is the best term to describe the relationship between marital status and
income.

Self- Quality
... -- -------- --- --- --- esteem of work
---

? CASE CLOSED!
1. (a) Let y =premium and x =age. Scatterplots of the original data (left) and transformed data
(right) after taking the logarithms ofboth variables are shown below. The plot of the original
Marital
Annual data shows a strong nonlinear relationship. The plot for the transformed data shows a clear
status
income linear trend, so the power model is appropriate.

4.4 7 The effects of coaching are confounded with those of experience. A student who has taken
the SAT once may improve his or her score on the second attempt because of increased

b
............................................................................ ~. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. .
Chapter 4 More about Relationships between Two Variables 115
114

The linear trend


(d) T~e .leading cause ?f death for the youngest age group is accidents, followed by homicide
and smctde. For the mtddle age group, accidents are still the leading cause of death but cancer
and ?eart disease are second and third, respectively. For the oldest age group, canc;r is the
leadmg cause of death, with heart disease running a close second.

3. (a) The chance of dying for men over 65 who walk at least 2 miles a day is half that of men
who do not exercise. (b) Individuals who exercise regularly have many other habits and
characteristics that could contribute to longer lives.

4.49 Spending more time watching TV means that less time is spent on other activities. Answers
will vary, but some possible lurking variables are: the amount of time parents spend at home the
amount ofexerc!se and the economy. For example, parents ofheavy TV watchers may not '
(c) Since the association between the log of premium and age is nearly perfect, the exponential spend as much time at home as other parents. Heavy TV watchers may not get as much exercise
model is most appropriate. The least-squares regression line for the transformed data is as other adolescents. As the economy has grown over the past 20 years, more families can afford
logy= -0.0275 + 0.0373x. Using the inverse transformation, the predicted premium is TV sets (many homes now contain more than two TV sets), and as a result, TV viewing has
y = 1o-0 0275 =0.93 86 x 10 (d) The predicted monthly premiums are
100373 x 0373
x mcreased and children have less physical work to do in order to make ends meet.
y = 0.9386x 10o.omxss =$136.74 for a 58-year-old and y = 0.9386 x 10o.omx =$322.76 for a
68

4.50 (a) Let y =intensity and x =distance. A scatterplot of the original data is shown below
68-year-old. (e) You should feel very comfortable with these predictions. The residual plot
(left). The data appear to follow a power law model of the form y = a.xbwhere b is some
above (right) shows no clear patterns and r 2 = 99.9%, so the exponential model provides an
number.
excellent fit.
2. (a) The entries in each column are only from these six selected causes of death. There are
other causes of death so the total number of deaths in each age group is higher than the sum of
the deaths for these six causes. (b) Percents should be used to compare the age groups because
the age groups contain different numbers of individuals. (c) The conditional distributions are
shown in the table below. Each entry is obtained by dividing the count for that cause of death by
the ap propnate coIumn t ot aI.
15 to 24 years 25 to 44 years 45 to 64 years
Accidents 45.32% 21.60% 5.42%
AIDS 0.52% 5.34% 1.35%
Cancer 4.93% 14.77% 33.16%
Heart disease 3.28% 12.63% 23.27%
Homicide 15.59% 5.71% 0.63% (b) A scatterplot of the transformed data (above on the right), after taking the logarithms of both
Suicide 11.87% 8.73% 2.30% variables, shows a clear linear trend, so the power model is appropriate. The least-squares

L
116 Chapter 4 More about Relationships between Two Variables 117

regression line for the transformed data is logy= -0.5235- 2.0126logx. (c) The residual plot 4.52 The explanatory variable is the amount of herbal tea and the response variable is a measure
of health and attitude. The most important lurking variable is social interaction-many of the
below shows no obvious patterns and r 2 =99.9% so this linear model on the transformed data
nursing home residents may have been lonely before the students started visiting.
an excellent fit.
4.53 (a) The column sums are shown below.
Single: 10,949 + 7,653 + 4,009 + 720 = 23,331
Married: 2,472 + 19,640 + 32,183 + 8,539 = 62,834
Widowed: 16 + 228 + 2,312 + 8,732 = 11,288
Divorced: 155 + 2,904 + 7,898 + 1,703 = 12,660
The sum ofthese column totals is 23,331 + 62,834 + 11,288 + 12,660 = 110,113, which is not
equal to 110,115. The difference is due to rounding. (b) The marginal distributions, conditional
distributions, and joint distribution are shown in the software output from Mini tab below.
Rows: Age Columns: Marital Status
divorced married single widowed All

15-24 155 2472 10949 16 13592


1.14 18.19 80.55 0.12 100.00
(d) Using the inverse transformation to find the predicted intensity gives
=
y =10-05235 x-2 0126 0.2996x-20126 The plot of the original data with this model is shown above
1. 22
0.141
3.93
2.245
46.93
9.943
0.14
0.015
12.34
12.344
(right). (e) The predicted intensity of the 100-watt bulb at 2.1 meters is 25-39 2904 19640 7653 228 30425
y = 0.2996 X 2. r 20126 ='= 0.0673 candelas. 9.54
22.94
64.55
31.26
25.15
32.80
0.75
2.02
100.00
27.63
2.637 17.836 6.950 0.207 27.631

40-64 7898 32183 4009 2312 46402


17.02 69.36 8.64 4.98 100.00
62.39 51.22 17.18 20.48 42.14
7.173 29.227 3.641 2.100 42.140

65+ 1703 8539 720 8732 19694


8.65 43.36 3.66 44.34 100.00
13.45 13.59 3.09 77.36 17.89
1.547 7.755 0.654 7.930 17.885

All 12660 62834 23331 11288 110113


11.50 57.06 21.19 10.25 100.00
100.00 100.00 100.00 100.00 100.00
11.497 57.063 21.188 10.251 100.000
(b) Let x =distance andy= intensity. The least-squares regression line for the transformed data Cell Contents: Count
% of Row
is .Y=-0.0006+0.30(:2 ). (c) Thepredictedintensityofthe 100-wattbulbat2.1 meters is % of Column
% of Total

y = -0.0006+ 0.3o(-4) =0.0674candelas. (d) Writing the model from part (d) of Exercise The table below provides just the marginal distribution for marital status.
2.1
Single Married Widowed Divorced
y = -0.0006 + ( 2.1~)
0
4.50 in a slightly different form shows that the models are very similar, 21.19% 57.06% 10.25% 11.50%
A bar chat1 of the marginal distribution is shown below.

y =( .~;.~126 ).
versus
2 The absolute difference in the predicted values is 0.0001. Thus, the

inverse square law provides an excellent model.


Chapter4 More about Relationships between Two Variables 119
118

(c) The two conditional distributions are shown in the table below. (c) The inverse transformation gives a predicted height of
Age Single Married Widowed Divorced =
y = 10.461010-0.ll9 lx 2.8907 X 1o-O.ll 9 lx The predicted height on the ih bounce is
15-24 80.55% 18.19% 0.12% 1.14% y = 2.8907 X 10-0.ll lx? = 0.4239 feet.
9

40-64 8.64% 69.36% 4.98% 17.02%


Among the younger women, more than 4 out of 5 have not yet married, and those who are 4.55 The lurking variable is temperature or season. More flu cases occur in winter when less ice
married have had little time to become widowed or divorced. Most of the older group is or has cream is sold, and fewer flu cases occur in the summer when more ice cream is sold. This is an
been married-only about 8.64% are still single. (d) Among single women, 46.93% are 15-24, example of common response.
32.8% are 25-39, 17.18% are 40-64 and 3.09% are 65 or older.

Number of flu
cases reported Amount of
ice cream
sold

Season or
Not only is the linear association between the log(height) and bounce stronger than the linear temperature
association between the logarithms of both variables, but there is also a value of zero for the
bounce number which means that the logarithm cannot be used for this point. The exponential
model is more appropriate for predicting y =height from x =bounce number. (b) The least-
squares regression line for the transformed data is logy= 0.4610- 0.1191x. The residual plot 4.56 Who? The individual are randomly selected people from three different locations. What?
The response variable is whether or not the individual suffered from CHD and the explanatory
below shows that the first two residuals are positive and the next three residuals are negative, but
variable is a measure of how prone an individual is to sudden anger. Both variables are
the residuals are all very small. The value of r 2 is 0.998, which indicates that 99.8% of the
categorical, with CHD being yes or no and the level of anger being classified as low, moderate,
variability in log(height) is explained by linear relationship with bounce. This model provides an
or high. Why? The researchers wanted to see ifthere was an association between these two
excellent fit. categorical variables. When, where, how, and by whom? In the late 1990s a random sample of
almost 13,000 people was followed for four years. The Spielberger Trait Anger Scale was used
to classifY the level of anger and medical records were used for CHD. Graphs: A bar graph of
the conditional distributions of CHD for each level of anger is shown below (left). To see the

!
120 Chapter 4 More about Relationships between Two Variables 121

Graphs: Scatterplots below show the original data (left) and the transformed data (right) after
taking the logarithm of count. Both plots suggest that the exponential decay model is appropriate
for these data.

..
..
..
..

Numerical summaries: The software output below from Minitab shows the marginal
distributions, conditional distributions, and joint distribution. Numerical summaries: The least-squares regression line for the transformed data is
Rows: CHD Columns: Anger logy= 2.5941-0.0949x. Using the inverse transformation, the predicted count is
high low moderate All

606 3057 4621 8284


y =10 2 5941
10-0.o949 x = 392.7354 x 10-00949 x. Interpretation: The residual plot below shows no
No
7.32 36.90 55.78 100.00 clear pattern and r 2 = 98.8%, so the exponential decay model provides an excellent model for
95.73 98.30 97.67 97.76
7.151 36.075 54.532 97.758
the number of bacteria after to X -rays.

Yes 27 53 110 190


14.21 27.89 57.89 100.00
4.27 1. 70 2.33 2.24
0.319 0.625 1.298 2.242

All 633 3110 4731 8474


7.47 36.70 55.83 100.00
100.00 100.00 100.00 100.00
7.470 36.700 55.830 100.000

Cell Contents: Count


% of Row
% of Column
% of Total
The most important numbers for comparison are the percents of each anger group 4.58 (a) The two-way table below was obtained by adding the corresponding entries for each
that experienced CHD: 53/3110 = 1.70% ofthe low-anger group, 110/4731 =2.33% ofthe age group. The proportion of smokers who stayed alive for 20 years is 443/582 =0.7612 or
moderate-anger group, and 27/633 = 4.27% of the high-anger group. 76.12o/<o an d t he proportiOn
of non smokers who stayed alive is 502/732 = 0.6858 or 68.58%.
Smoker Not
Interpretation: Risk of CHD increases with proneness to sudden anger. It might be good to Dead 139 230
point out to students that results like these are typically reported in the media with a reference to Alive 443 502
4 (b) For the youngest group, 269/288 or 93.40% of the smokers and 327/340 or 96.18% of the
the relative risk of CHD; for example, because .3% = 2.5, we might read that "subjects in the
1.7% nonsmokers survived. For the middle group, 167/245 or 68.16% of the smokers and 147/199 or
high-anger group had 2.5 times the risk of those in the low-anger group." 73.87% of the nonsmokers survived. For the oldest group, 7/49 or 14.29% of the smokers and
28/193 or 14.51% ofthe nonsmokers survived. The results are reversed when the data for the
4.57 Who? The individuals are cultures of marine bacteria. What? The two quantitative three age groups are combined. (c) The percents of smokers in the three age groups are
variables are x =time (minutes) andy= count (number of surviving bacteria in hundreds). Why? 288/628x100 = 45.86% for the youngest group, 245/444x100 = 55.18% for the middle aged
Researchers wanted to see if the bacteria would decay exponentially over time when exposed to group, but only 49/242x100 = 20.25% for the oldest group.
X-rays. When, where, how, and by whom? It is not clear when or where the data were collected,
but the counts were obtained after exposing cultures to X-rays for different lengths of time .

..
122 Part I Review Exercises Part I Review Exercises 123

Part I Review Exercises American East Asian


Save time 25.2% 14.5%
1.1 Who? The individuals are 19 years. What? The variables measured are wildebeest Easy 24.3% 15.9%
abundance (in thousands of animals) and the percent of grass area burned in the same year. Low price 14.8% 49.3%
Why? There is a claim that more wildebeest reduce the percent of grasslands burned. When, Live far from stores 9.6% 5.8%
where, how, and by whom? We are not told when these data were collected. However, we know No pressure to buy 8.7% 4.3%
the data are from long-term records from the Serengeti National Park in Tanzania. Graph: The Other reason 17.4% 10.1%
scatterplot below (on the left) shows a moderately strong, negative, fairly linear relationship . Note: The percentages for East Asian students total 99 .9%, due to rounding error. (c) A higher
between the percent of grass area burned and wildebeest abundance. There are no unusual pomts percentage of American students than East Asian students buy from catalogs because it saves
that in the them time (25.2% versus 14.5%) and it is easy (24.3% versus 15.9%). A higher percentage of
East Asian students than American students buy from catalogs because of the low price (49.3%
versus 14.8%) .
1.3 (a) Since we know the weights of seeds of a variety of winged bean are approximately

~
Normal, we can use the Normal model to find the percent of seeds that weigh more than 500 mg.
First, we standardize 500 mg:
. ~ ~ . .. z = x- J.l = 500-525 = -25 = _ 0 .23
~-
(J 110 110
Using Table A, we find the proportion of the standard Normal curve that lies to the left of z =
-0.23 to be 0.4090, which means that 1-0.4090 = 0.5910 lies to the right of z = -0.23. Thus,
Numerical summaries: For these data, x = 904.8,sx = 364.0,y = 40.16,sY = 26.10, and 59.1% of seeds weigh more than 500 mg. (b) We need to find the z-score with 10% (or 0.1 0) to
r = -0.803 . Model: The line on the plot is the least-squares regression line of percent of grass its left. The value z = -1.28 has proportion 0.1003 to its left, which is the closest proportion to
area burned on wildebeest abundance. The regression equation is y = 92.29- 0.05762x. A 0.1 0. Now, we need to fmd the value of x for the seed weights that gives us z = -1.28:
residual plot is shown above (on the right). Interpretation: The scatterplot shows a negative -1. 28 = x-525
association. That is, areas with less grass burned tend to have a higher wildebeest abundance. 110
The overall pattern is moderately linear (r = -0.803 ). The slope of the regression line suggests -1.28{110) = x-525
that for every increase of 1000 wildebeest, the percent of grassy area burned decreases by about 525 -1.28(110) =X
5.8. According to they-intercept, an area with no wildebeest would have 92.29 percent of grass
384.2=x
area burned. It does not make sense to interpret they-intercept due to extrapolation. The residual If we discard the lightest 10% of these seeds, the smallest weight among the remaining seeds is
plot shows a fairly "random" scatter of points around the "residual= 0" line. There is one large
2
384.2 mg.
positive residual at 1249 thousand wildebeest. Since r = 0.646, 64.6% of the variation in
percent of grass area burned is explained by the least-squares regression of percent of grass area 1.4 Who? The individuals are American bellflower plants. What? The explanatory variable is
burned on wildebeest abundance. That leaves 3 5.4% of the variation in percent of grass area whether cicadas were placed under the plant (categorical) and the response variable is seed mass
burned unexplained by the linear relationship. in milligrams (quantitative). Why? The researcher wants to investigate whether cicadas serve as
fertilizer and increase plant growth. When, where, how, and by whom? We are not told when
1.2 (a) Themargma . 1d'IS tr'b
1 u t'wn o f r easons for all students is these data were collected. However, we know the data come from 39 cicada plants and 33
Save time 21.2% control plants on the forest floor in the eastern United States. Graphs: We can compare the
Easy 21.2% cicada plants and the control plants with a side-by-side boxplot and a back-to-hack stemplot. In
Low price 27.7% the stemplot, the stems are listed in the middle and the leaves are placed on the left for cicada
Live far from stores 8.2% plants and on the right for control plants.
No pressure to buy 7.1%
Other reason 14.7%
Note: The percentages total100.1 %, due to rounding error. (b) The conditional distributions of
American and East Asian students are

L
Part I Review Exercises Part I Review Exercises 125
124

Stem-and-leaf of Cicada Plants and Control Plants symmetric with no outliers, it is appropriate to use the mean and standard deviation to describe
Leaf Unit = 0.010 center and s read.
Cicada Control
0 1 3
4 1 445
7 1 77
99 1 89999
111100 2 0111
3333332222 2 2
5544 2 4444445555
7777666 2 66666
999 2 89
110 3
3
5 3

.
Numerical summaries Here are summary statistics for the two distributions
s Min Ql M Q3 Max IQR
Variable Mean
Cicada 0.04759 0.1090 0.2170 0.2380 0.2760 0.3510 0.0590
0.24264
Plants
Control 0.04307 0.1350 0.1900 0.2410 0.2550 0.2900 0.0650 (b) A regression line added to a plot of the days against year shows, on average, that the number
0.22209
Plants
Interpretation: The dtstnbut10n of seed mass (m mg) ts a btt nght-skewed for the ctcada plants. of 20 that the falls is decreasing as the years go by.
One cicada plant had an unusually low seed mass (0.109 mg). For the control plants, the
distribution of seed mass (in mg) is somewhat left-skewed. While the median seed mass is about
the same for both the cicada plants and the control plants, the seed mass for the cicada plants is
higher than the seed mass for the control plants at the ftrst and third quartiles (and at the
maximum). The mean seed mass is higher for the cicada plants. The standard deviation is larger
for the cicada plants, while the IQR is larger for the control plants. Because of the outlier in the
seed mass for the cicada plants and the skewness of both distributions, we should use the
resistant medians and IQRs in our numerical comparisons. The median and IQR are both smaller
for the cicada plants than for the control plants. However, the ftrst and third quartiles and the
maximum are greater for the cicada plants than for the control plants. We might want to do more
research to see if we come up with more conclusive data. (c) According to R-Sq in the fttted line plot above, 10.0% of the variation in ice breakup time is
accounted for by the time trend.
1.5 A histogram of the date of ice breakup (number of days since April 20) on the Tanana River
shows the data well. ___,.,.,.,.,..,.._..,., 1.7 Grouping the data into year groups (1 = 1917 to 1939, 2 = 1940 to 1959, 3 = 1960 to 1979, 4
= 1980 to 2005), we can see that the median time to tripod drop is generally decreasing over
time. The median is approximately equal for the time periods 1940 to 1959 and 1960 to 1979.
However, the median looks noticeably higher for the time period 1917 to 1939 and noticeably
lower for the time period 1980 to 2005.

Because the distribution is slightly right-skewed, it is appropriate to use the ftve-number


summary (and IQR) to describe the data. Alternatively, since the distribution is roughly

I
L
126 Part I Review Exercises Part I Review Exercises 127

Numerical summaries: The correlation between ln(Seed Weight) and ln(Seed Count) is -0.929.
1.8 This is an observational study, so we cannot prove that online instruction is more effective Model: The least-squares regression equation is ln(Seed Weight)= 15.5 -1.52ln(Seed Count),
than classroom teaching. There are other factors that we must consider. These arise when we ask with r 2 = 0.863. A of the residuals versus ln(Seed Count) is shown below.
the question "What might be different about students who choose online instruction over
classroom instruction?" Some factors to consider are: age of the students (e.g., older students
may work full time and find it easier to take an online course, but these students might be more
serious about doing well in the course), aptitude of the students (e.g., those who are proficient
with computers and choose online instruction might also be better students).

1.9 Who? The individuals are several common tree species. What? The variables are seed count
and seed weight (mg). Why? We wonder if trees with heavy seeds tend to produce fewer seeds
than trees with light seeds. When, where, how, and by whom? These data come from many
studies compiled in Greene and Johnson's "Estimating the mean annual seed production of
trees," which was published in Ecology, volume 75 (1994). Graphs: We first examine a
There appears to be fairly random scatter in the residual plot, so the regression we have
scatterplot of seed weight versus seed count. The plot shows that a linear relationship is not
performed seems appropriate. We now perform an inverse transformation on the linear
~~~
n .... for these data. We need to transform the data.
.... ..,
regression equation:
ln(Seed Weight)= 15.5 - 1.52ln(Seed Count)
eln(Seed Weight) = e15.5 - 1.52ln(Seed Count)
(Seed Weight) = e15.5 X e-1.52 ln(Seed Count)
52
(Seed Weight) = e15'5 x (Seed Countt1.
This is the power model for the original data. Interpretation: The relationship between seed
count and seed weight is not linear. However, we have found a power model that works well to
describe this relationship. The relationship we found tells us that 86.3% of the variability in
ln(Seed Weight) is accounted for by the least-squares regression on ln(Seed Count).
Taking the natural log ofboth seed count and seed weight gives us a relationship that looks more
linear.
1.10 (a) Smaller cars tend to get better gas mileage than larger cars. More than 50% of large cars
get less gas mileage than the midsize car with the worst gas mileage. All large cars get less gas
mileage than 75% of the subcompact and compact cars. Subcompact cars get the best gas
mileage, on average, but they also have the most variability. Compact cars get slightly worse gas
mileage than subcompact cars, but there is still a lot of variability for the compact cars. Overall,
as the size of the car increases, the gas mileage noticeably decreases. (b) For each additional
penny in the cost of gas, the sale of high MPG cars increases by 0.101690%, on average. A more
practical way to look at this relationship is to say that for each additional 10 cents spent on gas,
the sale of high MPG cars increases about 1.02%, on average. They-intercept says that if gas
128 Part I Review Exercises

cost nothing, the high MPG cars sales would be about 9.6% of the car sales market. This does
not make any sense, since we need to extrapolate outside of the range of the data to make this
statement. (c) The predicted sales of high MPG cars for that month is
High mpg Cat'/o =9.63594+0.101690(150) = 24.89
That is, we predict high MPG cars to represent about 24.89% of sales that month. The actual
sales of high MPG cars were about 25.8%. The residual is 25.8%-24.89% = 0.91 %. (d) 45% of
the variation in the sale of high MPG cars(%) is accounted for by the least-squares relationship
with gas price in the current month.

I
128 Part I Review Exercises 129

cost nothing, the high MPG cars sales would be about 9.6% of the car sales market. This does Chapter 5
not make any sense, since we need to extrapolate outside of the range of the data to make this
statement. (c) The predicted sales of high MPG cars for that month is 5.1 The population is (all) local businesses. The sample is the 73 businesses that return the
High mpg Car"lo = 9.63594+0.101690(150) = 24.89 questionnaire, or the 150 businesses selected. The nonresponse rate is 51.3% = 77/150.
Note: The definition of "sample" makes it somewhat unclear whether the sample
That is, we predict high MPG cars to represent about 24.89% of sales that month. The actual
sales of high MPG cars were about 25.8%. The residual is 25.8%-24.89% = 0.91%. (d) 45% of includes all the businesses selected, or only those which responded. Many folks lean toward
the variation in the sale of high MPG cars(%) is accounted for by the least-squares relationship the latter (the smaller group), which is consistent with the idea that the sample is "a part of
the population that we actually examine. "
with gas price in the current month.
5.2 (a) An individual is a person; the population is all adult U.S. residents for that week. (b) An
individual is a household; the population is all U.S. households in the year 2000. (c) An
individual is a voltage regulator; the population is all the regulators in the last shipment.

5.3 This is an experiment: a treatment is imposed. The explanatory variable is the teaching
method (computer assisted or standard), and the response variable is the increase in reading
ability based on the pre- and post-tests.
'!
''
5.4 We can never know how much of the change in attitudes was due to the explanatory variable
(reading propaganda) and how much to the historical events of that time. The data give no
information about the effect of reading propaganda.

5.5 This is an observational study. The researcher did not attempt to change the amount that
people drank. The explanatory variable is alcohol consumption. The response variable is
survival after 4 years.

5.6 (a) The data were collected after the anesthesia was administered. Hospital records were
used to "observe" the death rates, rather than imposing different anesthetics. (b) Some possible
confounding variables are type of surgery, location of hospital, training of the doctor, patient
allergies to certain anesthetics, and health of the patient before the surgery.

5.7 Only persons with a strong opinion on the subject, strong enough that they are willing to
spend the time and money, will respond to this advertisement.

5.8 Letters to legislators are an example of a voluntary response sample--the proportion of


letters opposed to the insurance should not be assumed to be a fair representation of the attitudes
of the congresswoman's constituents.

5.9 Pnt the retail outlets in alphabetical order and label them from 001 to 440. Starting at line
105, the sample includes outlets numbered 400, 077, 172, 417, 350, 131, 211, 273, 208, and 074.

5.10 Entering at line 131 and reading two-digit numbers, the authors will call Beach Castle (05),
Sea Castle (19), and Banyan Tree (04). Most statistical software will select a SRS for you,
eliminating the need for Table B. The Simple Random Sample apple! on the text CD and Web
site is a convenient way to automate this task.
130 Chapter 5
T
I
Producing Data 131
I
5.11 Assign 01 to 30 to the students (in alphabetical order). Starting on line 123 gives 08-Ghosh,
IS-Jones, 07-Fisher, and 27-Shaw. Assigning 0-9 to the faculty members gives 1-Besicovitch 5.21 (a) The individuals are adults, presumably those who are eligible to vote, in the country.
and 0-Andrews. (b) The individuals are employed women who are members of the local business and
professional women's clubs. (c) The individuals are households in the U.S.
5.12 Label the 500 midsize accounts from 001 to 500, and the 4400 small accounts from 0001 to
4400. Starting at line 115, the first five accounts in each strata are 417, 494, 322, 247, and 097 5.22 Children from larger families will be overrepresented in such a sample. Student
for the midsize group, then 3698, 1452, 2605, 2480, and 3716 for the small group. explanations of why will vary; a simple illustration can aid in understanding this effect. Suppose
that there are 100 families with children; 60 families have one child, and the other 40 have three.
5.13 (a) This is a stratified random sample. (b) Label each area code from 01 through 25; Then there are a total of 180 children (an average of 1.8 children per family), and two-thirds of
beginning at line Ill, the SRS includes 12 (559), 04 (209), 11 (805), 19 (562), 02 (707), 06 those children come from families with three children. Therefore, if we had a class (a sample)
(925), 08 (650), 25 (619), 17 (626), and 14 ( 661 ). chosen from these 180 children, only one-third of the class would answer "one" to the teacher's
question, and the rest would say "three." This would give an average of about 2.3 children per
5.14 (a) This is cluster sampling. (b) Answers will vary. Label each block from 01 through 65; family.
beginning at line 142, the 5 blocks are 02, 32, 26, 34, and 08. The statistical applet selected
blocks 10, 20, 45, 36, and 60. 5.23 Number the bottles across the rows from 01 to 25, then select 12-B0986, 04-AllOl, and
11-A2220. (If numbering is done down columns instead, the sample will be All17, B1102, and
5.15 (a) Households without telephones or with unlisted numbers are omitted from this frame. Al098.)
Such households would likely be made up of poor individuals (who cannot afford a phone), those I
who choose not to have phones, and those who do not wish to have their phone number 5.24 In order to increase the accuracy of its poll results. Larger samples give less variable results
published. (b) Those with unlisted numbers would be included in the sampling frame when a than smaller samples. II[!
random-digit dialer is used. i
5.25 One could use the labels already assigned to the blocks, but that would mean skipping a lot
5.16 The higher no-answer rate was probably the second period-when families are likely to be of four-digit combinations that do not correspond to any block. An alternative would be to drop
vacationing or spending time outdoors. A high rate of nonresponse makes sample results less the second digit and use labels 100, 101, 102, ... ,105; 200, ... ,211; 300, ... ,325. But by far the
reliable because you don't know how these individuals would have responded. It is very risky to simplest approach is to assign labels 01, ... , 44 (in numerical order by the four-digit numbers
assume that they would have responded exactly the same way as those individuals who did already assigned), enter the table at line 125, and select: 21 (#3002), 37 (#3018), 18 (#2011), 44
respond. (#3025), and 23 (#3004).

5.17 The first wording would pull respondents toward a tax cut because the second wording 5.26 (a) False-if it were true, then after looking at 39 digits, we would know whether or not the
mentions several popular alternative uses for tax money. 40th digit was a 0. (b) True-there are I 00 pairs of digits 00 through 99, and all are equally
likely. (c) False-0000 is just as likely as any other string of four digits.
5.18 Variable: Approval of president's job performance. Population: Adult citizens of the U.S.
5.27 It is not an SRS, because some samples of size 250 have no chance of being selected (e.g.,
or perhaps just registered voters. Sample: The 1210 adults interviewed. Possible sources of bias:
Only adults with phones were contacted. Alaska and Hawaii were omitted. a sample containing 250 women).
li
5.19 (a) There were 14,484 responses. (Note that we have no guarantee that these came from 5.28 (a) The two options presented are too extreme; no middle position on gun control is I
14,484 distinct people; some may have voted more than once.) (b) This voluntary response allowed. Many students may suggest that this question is likely to elicit more responses against
sample collects only the opinions of those who visit this site and feel strongly enough to respond. gun control (that is, more people will choose 2). (b) The phrasing of this question will tend to
make people respond in favor of a nuclear freeze. Only one side of the issue is presented.
5.20 (a) The wording is clear. The question is slanted in favor ofwaming labels. (b) The
question is clear, but it is clearly slanted in favor of national health insurance by asserting it 5.29 A sample from a smaller subgroup gives less information about the population. "Men"
would reduce administrative costs. (c) The wording is too technical for many people to constituted only about one-third of our sample, so we know less about that group than we know
understand-and for those who do understand the question, it is slanted because it suggests about all adults.
reasons why one should support recycling. It could be rewritten to something like: "Do you
support economic incentives to promote recycling?" 5.30 The chance of being interviewed is 3/30 for students over age 21 and 2/20 for students
under age 21. This is 1/10 in both cases. It is not an SRS because not all combinations of I
I
T
I
132 Chapter 5 I Producing Data 133

students have an equal chance of being interviewed. For instance, groups of 5 students all over each treatment combination so that systematic differences due to the treatments can be separated
age 21 have no chance of being interviewed. from natural variability in the experimental units.

5.31 Answers will vary. One possible approach: Obtain a list of schools, stratified by size or 5.39 (a) Expense, condition of the patient, etc. In a serious case, when the patient has little
location (rural, suburban, urban). Choose SRSs (not necessarily all the same size) of schools chance of surviving, a doctor might choose not to recommend surgery; it might be seen as an
from each strata. Then choose SRSs (again, not necessarily the same size) of students from the unnecessary measure, bringing expense and a hospital stay with little benefit to the patient.
selected schools. (b)

5.32 (a) Split the 200 addresses into 5 groups of 40 each. Looking for 2-digit numbers from 01
I Group I
ISO patients --- Treatment I
Surgery

~
to 40, the table gives 35 so the systematic random sample consists of35, 75, 115, 155, and 195.
(b) Every address has a 1-in-40 chance of being selected, but not every subset has an equal
chance of being picked-for example, 01, 02, 03, 04, and 05 cannot be selected by this method. Compare
Random Allocation recovery

~
5.33 Experimental units: pine seedlings. Factor: Light intensity. Treatments: full light, 25%
light and 5% light. Response variable: dry weight at the end of the study.

5.34 Subjects: The students living in the selected dormitory. Factor: The rate structure.
Group2
150 patients
Treatment2
Alternative
/
Treatments: Paying one flat rate, or paying peak/off-peak rates. Response variables: The
amount and time of use and total network use.
5.40 Assign nine subjects to each treatment. A diagram is below; if we assign labels 01
5.35 Experimental units: the individuals who were called. Factors: 1. type of call; 2. offered through 36, then line 130 gives:
survey results. Treatments: (1) giving name/no survey results, (2) identifying university/no
survey results, (3) giving name and university/no survey results, (4) giving name/offer to send Group 1 Group2 Group3
survey results, (5) identifying university/offer to send survey results, (6) giving name and 05 Chen 32 Vaughn 31 Va1asco 02 Asihiro 35 Willis 11 Fleming
university/offer to send survey results. Response variable: whether or not the interview was 16 lmrani 04 Bikalis 18 Kaplan 36Zhang 21 Marsden 15 Hruska
completed. 17 James 25 Padilla 07Duncan 23 O'Brian 26 Plochman 12 George
20 Maldonado 29 Trujillo 13Han 27 Rosen 08 Durr 14Howard
5.36 Subjects: 300 sickle cell patients. Factor: type of medication. Treatments: hydroxyurea and 19 Liang 33 Wei 10 Farouk
placebo. Response variable: number of pain episodes.
The other nine subjects are in Group 4.
5.37 (a) The response variable is the amount of chest pain. (b) This phenomenon is known as
Group I
,I
I
the placebo effect. (c) Well-designed experiments should use a control. The ligation study 9 patients - Antidepressant
illustrates the importance of using a control group.

~
~
5.38 (a) The experimental units are the middle schools. The response variables are physical Group 2 Antidepressant
9 patients plus slress "'\
activity and lunchtime consumption of fat. (b) There are two factors, physical activity program management
Compare the

-1 -~
and nutrition program, and four treatments, activity intervention, nutrition intervention, both Random Allocation number and severity

~
interventions, and neither intervention.
Group 3
FactorB: 9 patients
Nutritional Program
Factor A: Yes No
Physical Activity Yes Group 4 - Placebo plus
Program ~o 9 patients stress management

(c) At least 4 experimental units are required for the experiment, but as we will see in the next
section, using only 4 experimental units is not a good idea. We want at least one replicate on
134 Chapter 5
T Producing Data 135

5.41 (a) A diagram is shown below. (b) Assigning the students numbers from 001 to 120, using.
line 123 from Table B, the first four subjects are 102, 063, 035, and 090.
I Group I
6rats --- Treatment I
Black tea-

I Group J

40 P'';'"''
Accomplice fired
because he/she did
poorly ~ Random Allocation - Group
6rats
2 Treatment 2
Green Tea
- Compare growth of
cataracts

~
Compare
Random Allocation - Group 2 Accomplice - - performance after
~40poHents randomly tired the break
Group 3
6rats
Treatment)
Placebo
/
Group 3 Both cont;nue /
40patients to work
5.45 Because the experimenter knew which subjects had learned the meditation techniques, he
(or she) may have had some expectations about the outcome of the experiment: if the
5.42 (a) A diagram is shown below. (b) Label the subjects from 01 through 20. From experimenter believed that meditation was beneficial, he may subconsciously rate that group as
line 131, we choose 05, 19, 04, 20, 16, 18, 07, 13, 02, and 08; that is, Frankum, Wenk, Edwards, being less anxious.
Zillgitt, Valenzuela, Waespe, Hankinson, Shenk, Colton, and Mathis for one group, and the rest
for the other. 5.46 (a) If only the new drug is administered, and the subjects are then interviewed, their
responses will not be useful, because there will be nothing to compare them to: How much "pain
Group I
~
Treatment I relief' does one expect to experience? (b) Randomly assign 20 patients to each of three groups:
10 subjects Strong marijuana
Group 1, the placebo group; Group 2, the aspirin group; and Group 3, which will receive the new I
/ ~ Compare work
medication. After treating the patients, ask them how much pain relief they experienced, and
then compare the average pain relief experienced by each group. (c) The subjects should ,I
I
Random Allocation
output and earnings certainly not know what drug they are getting-a patient told that she is receiving a placebo, for ,,,.
~ /
example, will probably not experience any pain relief. (d) Yes-presumably, the researchers
would like to conclude that the new medication is better than aspirin. If it is not double-blind, the
Group 2 Treatment2 interviewers may subtly influence the responses of the subjects.
~
10 subjects Weak marijuana

5.47 (a) Ordered by increasing weight, the five blocks are (1) Williams-22, Deng-24,
Hemandez-25, and Moses-25; (2) Santiago-27, Kendall-28, Mann-28, and Smith-29; (3) Brunk-
5.43 The second design is an experiment-a treatment is imposed on the subjects. The first is an
30, Obrach-30, Rodriguez-30, and Loren-32; (4) Jackson-33, Stall-33, Brown-34, and Cruz-34;
observational study; it may be confounded by the types of men in each group. In spite of the
(5) Bimbaum-35, Tran-35, Nevesky-39, and Wilansky-42. (b) The exact randomization will
researcher's attempt to match "similar" men from each group, those in the first group (who
vary with the starting line in Table B. Different methods are possible; perhaps the simplest is to
exercise) could somehow be different from men in the non-exercising group. number the subjects from 1 to 4 within each block, then assign the members of block 1 to a
weight-loss treatment, then assign block 2, etc. For example, starting on line 133, we assign 4-
5.44 (a) A diagram is shown below. (b) If we assign labels 01, ... , 18 and begin on line 142, Moses to treatment A, !-Williams to B, and 3-Hemandez to C (so that 2-Deng gets treatment D),
then we select: 02, 08, 17, 10, 05, and 09 for Group 1; 06, 16, 01, 07, 18, and 15 for Group 2.
then carry on for block 2, etc.
The remaining rats are assigned to the placebo group.
5.48 (a) A figure with 6 circular areas is shown below. Table B was used to select 3 for the
treatment, starting at line 104. The frrst 4 digits are: 5 2 7 1. We cannot use the 7 because it is
more than 6. Therefore, we would treat areas 5, 2 and 1. (b) A figure with 3 pairs of circular
areas is shown below. For each pair, we randomly pick one of the two to receive the treatment.
A random number was generated for each pair. If the random number was less than 0.5 then the
top area was treated and the bottom area was untreated. If the random number is greater than or
equal to 0.5, then the top area is untreated and the bottom area is treated.
T
I

Chapter 5 Producing Data 137


136

5.52 (a) The subjects are the 210 children. (b) The factor is the "choice set"; there are three
levels (2 milk/2 fruit drink, 4 milk/2 fruit drink, and 2 milk/4 fruit drink). (c) The response
variable is the choice made by each child. (d) Arrange the names of the children in alphabetical
order and assign numbers from 001 to 210. Use Table B, the statistical apple!, or a random
number generator to select 70 children for Group 1 (2 milk/2 fruit drink) and 70 children for
Group 2 (4 milk/2 fruit drink). The remaining 70 children would be placed in Group 3 (2 milk/4
fruit drink). Starting at line 125, the first five children in Group 1 are 119, 33, 199, 192, and 148.

5.53 (a) The subjects are patients. The factor is temperature with 2 levels, warmed and not
warmed. The response variable is the presence of infection (yes or no) after surgery.
(b)

5.49 (a) Randomly assign 10 subjects to Group I (the 70 group) and the other I? to c:rou~ 2
(which will perform the task in the 90 condition). Record the number of correct msertt~~s m
each group. (b) All subjects will perform the task twice; ~nee in each.te~peratu~e condtl!on.
Group I
20 subjects
-- Treatment I
Warmed

~
Randomly choose which temperature each subject works m first by fltppmg a com.
Random Allocation
/ Compare rate of
infections
5.50 (a) Randomly assign half the girls to get high-calcium p~nch,. and.the oth~r half get
low-calcium punch. The response variable is not clearly descnbed m thts exercise; the b~st ~ /
we can say is "observe how the calcium is processed." (b) Randomly ?elect half of the gtrls
to receive high-calcium punch first, while the other half get~ low-calcmm punch first, ~e~
for each subject, compute the difference in the response vanable for each treatment. Thts ts a
Group2
20 subjects - Treatment2
Not wanned

better design because it deals with person-to-person variation; the differences m responses
for 60 individuals gives more precise results than lhe difference in the average responses for (c) Assign each subject a number, 01 to 40, by alphabetical order. Starting at line 121 in Table
two groups of30 subjects. (c) The first five subjects are 16, 34, 59, 44, and 2~. In the complete.ly B, the first twenty different subjects are: 29-Ng, 07-Cordoba, 34-Sugiwara, 22-Kaplan, 10-
randomized design, the first group receives high-calcium punch all summer; m the matche~ parrs Devlin, 25-Lucero, 13-Garcia, 38-Ullmann, 15-Green, 05-Cansico, 09-Decker, 08-Curzakis, 27-
design, they receive high-calcium punch for the first part of the summer, and then low-calcmm McNeill, 23-Kim, 30-Quinones, 28-Morse, 18-Howard, 03-Afifi, 01-Abbott, 36-Travers. These
punch in the second half. subjects will be assigned to Treatment Group 1; the remaining subjects go into Group 2. (d) We
want the treatment groups to be as alike as possible. If the same operating team was used to
5.51 (a) "Randomized" means that patients were randomly assigned either St. Jo~'s-wort , operate on "warmed" and ''unwarmed" patients, then the effect of the "warming" on the
extract or the placebo. "Placebo controlled" means that the results for the group usmg St. John s- occurrence of infection might be confounded with the effect of the surgical team (e.g., how
wort extract were compared to the group that received the placebo. "Double-blind" means that skillful the team was in performing the necessary preventive measures). (e) Double-blinding.
neither the subjects nor the researchers interacting with them (including those who measured We would prefer a double-blind experiment here to ensure that the patients would not be treated
depression levels) know who is receiving which treatment. differently with regard to preventing and monitoring infections due to prior knowledge of how
(b) they were treated.

Group I Treatment I
98 subjects ~ St. John's wmt
5.54 (a) There are three factors (roller type, dyeing cycle time, and temperature), each with
two levels, for a total of 2 3 = 8 treatments. The experiment therefore requires 24 fabric

Random Allocation
/ ""' Compare rate of
change on
specimens. (b) In the interest of space, only the top half of the diagram is shown below.
The other half consists of Groups 5 to 8, for which the treatments have natural bristle rollers
depression scale instead of metal rollers.
~ /
Group2
I02 subjects - Treatment2
Placebo
138 Chapter 5 Producing Data 139

3
Group l
s.pecim~ns - Metal roller,
30 min., 150 Group I Treatment I

i 1
216subjects-

~
Beta carotene
Group 2
3 speCimens - Metnl roller,
30 min., 175
Random

- ~
~
assignment Compare finish

\
Group 3 Mcfil.l roller, Group2
~ Tre,.mMt2
3 specimens 40 min., 150 216 subjects v;,,m;,c&E ~

Group 4
- Metal roller, Random Allocation
Compare rate of
colon cancer

~ 1
3 sp~imens 40 nlit\., J.75
Group 3 Treatment)
~
216subjects All3
5.55 (a) Randomly assign 20 men to each of two groups. Record each subject's blood pressure,
then apply the treatments: a calcium supplement for Group I, and a placebo for Group 2. After
sufficient time has passed, measure blood pressure again and observe any change. (b) Number Group4 Treatment4
216subjects- Placebo
from 01 to 40 down the columns. Group I is 18-Howard, 20-Irnrani, 26-Maldonado, 35-
Tompkins, 39-Willis, 16-Guillen, 04-Bikalis, 21-James, 19-Hmska, 37-Tullock, 29-0'Brian, 07-
Cmnston, 34-Solomon, 22-K.aplan, 10-Durr, 25-Liang, 13-Fratianna, 38-Underwood, 15-Green,
and 05-Chen. (c) Block on race (make 2 groups, one of black men and one of white men) and 5.59 Three possible treatments are (1) fine, (2) jail time, and (3) attending counseling classes.
then apply the design in (a) to each block. Observe the change of blood pressure for each block. The response variable would be the rate at which people in the three groups are rearrested.

Group l Treatment I
5.56 The simplest design would be a completely randomized design, assigning half of the n drunk drivers
-~
Fine
women to take strontium renelate and half to take the placebo. A better design would block
according to the medical center (and country); that is, randomly select for the strontium I ~ Compare future
renelate group half the women from country A, half of those from country B, and so on. Random Allocation -------..
Group2 _____. Treatment2 ----+- offenses and
n drunk drivers Jail
This blocking would take care of differences from one cpuntry to another. arrest rate
'\..
5.57 Responding to a placebo does not imply that the complaint was not "real"-38% of the
placebo group in the gastric freezing experiment improved, and those patients really had ulcers.
Group3
n drunk drivers - Treatment 3
Counseli11g
I
The placebo effect is a psychological response, but it may make an actual physical improvement
in the patient's health.
5.60 (a) Each subject takes both tests; the order in which the tests are taken is randomly chosen.
5.58 (a) The explanatory variable is the vitamin(s) taken each day; the response variable is the (b) Take 22 digits, one for each subject, fiom Table B. If the digit is even, the subject takes the
colon cancer rate. (b) Diagram below; equal group sizes are convenient but not necessary. (c) BI first; if it is odd, he or she takes the ARSMA frrst. Answers will vary, but approximately 11
Using labels 001 through 864 (or 000 through 863), we choose 731, 253, 304, 470, and 296. (d) subjects should take each BI first. Using line 107 of Table B, subjects I, 2, 8, 10, 11, 12, 13, 14,
"Double-blind" means that both the subjects and those who work with the subjects do not know 16, and 21 would take ARSMA first and the other 12 subjects would take BI first.
who is getting what treatment. This prevents the expectations of those involved from affecting
the way in which the subjects' conditions are diagnosed. (e) The observed differences were no CASE CLOSED!
more than what might reasonably occur by chance even if there is no effect due to the treatments. I. (a) Researchers simply asked 305 pregnant women to rate their stress levels and chocolate
(f) Some possible lurking variables are amount of exercise, fiber intake, cholesterol level, fat consumption, so this is an observational study. (b) Since this is an observational study, the
intake, amount of sleep, etc. researchers should not suggest a cause and effect relationship. Suggesting that chocolate
"produced" these feelings is going too far.
2. (a) Both of the studies simply observed the impact of a treatment on a group of patients. The
major difference is that Dr. Hollenberg measured blood flow on two separate days (day I and
day 5). Thus, each subject has a baseline measurement and a measurement after consuming
flavonols for a week. (b) Dr. Hollenberg's study is a matched pairs design. The measurements
on day I and day 5 are not independent since they are obtained on the same subjects. (c)
Randomly assign the patients into two groups, one group is neated using the protocol from Dr.
Hollenberg's study and the other groups is treated the same way except the fluid they drink each

'
,I
T,
Chapter 5 Producing Data 141
140

day does not contain 900 milligrams offlavonols. The differences in blood flow (day 5- day 1)
are compared for the two groups. 5.64 (a) Possible response variables are: whether or not a subject has a job within some period
3. (a) The researchers did not use a completely randomized design because they wanted to of time, number of hours worked during some period, length of time before subject became
control the subject-to-subject variability. Blood flow differs for different individuals. (b) The employed. For the design, randomly assign about one-third of the group (3,355 subjects) to
investigators used a randomized block design. (c) The researchers were able to control a major each treatment, and observe the chosen response variables after a suitable amount of time.
source of variation. (b) The simplest approach is to label from 00001 through 10065, and then take five digits at
a time from the table. (This means we have to skip about 90% of the five-digit sets, as only
5.61 (a) Explanatory variable: type of cookie (Toll house or Dark Chocolate); Response those beginning with 0 [and a few beginning with 1) are useful.) With this approach, we
variable: cookie preference. (b) The population is all cookies manufactured of each type. We choose 00850, 02182, and 00681 (the last of these is on line 172). More efficient labellings
can assume that the cookies used were not systematically different from the population and so are possible and will lead to different samples.
hopefully a representative sample was used, but not a SRS because not all sets of cookies in the
population had an equal chance of being used. 5.65 Each player will complete the experiment twice, once with oxygen during the rest period
and once without oxygen. The order for each player will be determined by the flip of a coin. If
(c)
the coin lands heads up, then the player will receive oxygen during the first set of sprints. If the
Group I Treatment I coin lands tails up, then the player will not receive oxygen during the first set of sprints. After a
n/2 students _....... Toll House first suitable amount of time, the players will run the sprints again with the different "treatment"
during the second set of sprints. The differences for each player (final time with oxygen- final.
time without oxygen) will be used to check for an oxygen effect.
Compare
Random Allocation preferences
5.66 A stratified random sample would be useful here; one could select 50 faculty members
from each type of institution. If a large proportion of faculty in your state work at a particular
Group 2 __.. Treatment 2 class of institution, it may be useful to stratify unevenly. If, for example, about 50% teach at
n/2 students Dark Chocolate first Class I institutions, you may want half your sample to come from Class I institutions.

5.67 (a) The treatment combinations are shown in the table below, and the design is also
(d) Variability was controlled by the students being randomly assigned to the two groups, the diagrammed below.
cookies being tasted in different orders, the cookies being as similar as possible and given to the
student in the same way. (e) Half of the students were assigned to each treatment group which Factor B
provides sufficient student-to-student variability to show that any difference is due to the cookies
and not the subject variability. (f) The experiment was blind (the students don't know which
cookie they are eating- it has nothing to do with their blindfold!) but not double blind (the Factor A 5 mg Treatment 2
person handing them the cookie would know-by the appearance--which cookie the student was Dosage 10 mg Treatment 4 Treatment 5
being given. (g) Answers will vary but some examples are: Can they tell the difference at all?
You could add 2 more treatments: Toll house followed by another Toll House and a Dark
Chocolate followed by another Dark Chocolate. They may think there is a strong preference for
one cookie when, in fact, they are both the same! You might want to block by gender or if you
wanted to reach a conclusion using the whole school, you could take a stratified sample using
grade.

5.62 (a) The population is Ontario residents; the sample is the 61,239 people interviewed. (b)
The sample size is very large, so if there were large numbers of both sexes in the sample--this is
a safe assumption since we are told this is a "random sample"-these two numbers should be
fairly accurate reflections of the values for the whole population.

5.63 (a) A matched pairs design (two halves of the same board would have similar properties).
(b) A sample survey (with a stratified sample: smokers and nonsmokers). (c) A block design
(blocked by sex).
T
142 Chapter 5 Producing Data 143

Group I
n subjects - Treatment 1
5 mg, injected
5.69 As described, there are two factors: ZIP code (three levels: none, 5-digit, 9-digit) and the
day on which the letter is mailed (three levels: Monday, Thursday, or Saturday) for a total of9
treatments. To control confounding variables, aside from mailing all letters to the same address,
Group 2
- Treatment 2 all letters should be the same size, and either printed in the same handwriting or typed. The

t
n subjects 5 mg, patch design should also specify how many letters will be in each treatment group. Also, the letters

~ o o e~ 7,~:0'm
should be sent randomly over many weeks.

Random Allocation
Group3
n subjects - Treatment 3
5 mg, IV drip
tho
blood after 30
5.70 Each subject should taste both kinds of cheeseburger, in a randomly selected order, and
then be asked about their preference. Both burgers should have the same "fixings" (ketchup,

~ -
/i
minutes
Group 4 Treatment4 mustard, etc.). Since some subjects might be able to identify the cheeseburgers by appearance,
n subjects I0 mg, injected

\
one might need to take additional steps (such as blindfolding, or serving only the center part of
GrourJ5
n subjects

Group 6
n subjects
-
-
TreatmentS
10 mg, patcb

Treatrnent6
10 mg, IV drip
1 the burger) in order to make this a truly "blind" experiment.

5.71 (a)

I Group I
27 palieots
Treatment I
NSAIDINSAID~

(b) Larger samples give more information; in particular, with large samples, the variability in the
observed mean concentrations is reduced, so that we can have more confidence that the
2~ro~!'
2 Treatment 2 ________.. C .
Random Allocation -----+- Placebo/NSAID ompare pam score
differences we might observe are due to the treatment applied rather than random fluctuation. ~ paoents
(c) Use a block design. Separate men and women and randomly allocate the 6 treatments to each
gender.
Group 3 ___,_ Treatment
. 3 /
27 patients Placebo/Placebo
5.68 (a) This is a randomized block design. The blocks here are "runners" and "nonrunners."
(b) The two extra patients can be randomly assigned to two of the three groups. (b) The patients,
Group I
physicians, and physical therapists did not know which subjects were in which group. (c) If the
VitaminC pain scores for Group A were significantly lower than that of Group C, it suggests that NSAID
/ was successful in reducing pain. However, because the pain score for Group A was not

~
Runners - Random Allocation
significantly lower than Group B, this suggests that it was the application ofNSAID before the
~
Subjects
I Group2
Placebo

~ Compare infecllon
rates
surgery that made the drug effective.

5.72 (a) It appears to be an observational study. Taking blood samples from the subjects and

\ Nourunners - Rnndom Allocation


/
~
Group I
Vitamin C

Group2
Placebo
:l measuring their lung function do not qualify as treatments. (b) No. While a large sample does
reduce variability, it does not imply causation. (c) No. This is not an experiment. Association is
not the same as causation.

(~) A difference in rate of infection may have been due to the effects of the treatments, or it may
simp~~ have been. due .to random chance. Saying that the placebo rate of 68% is "significantly
more than the Vttamm Crate of33% means that the observed difference is too large to have
occurred by chance alone. In other words, Vitamin C appears to have played a role in lowering
the infection rate of runners.
T
I
144 Part II Review Exercises Part II Review Exercises 145

Part II Review Exercises The response variables are color and flavor of the french fries. (b) Potatoes will be randomly
assigned in equal quantities to each of the six treatments (listed above). Each taster rates the
11.1 (a) Yes-the researchers used double blinding in this experiment. Since the subjects did not color and flavor of the french fries from each treatment. The ratings are then compared to find
know what kind of toothpaste they were using (unmarked tubes were used) and the dentists did the best storage method and cooking procedure. (c) The french fries should be served to the
not know which subjects were using which toothpaste, both groups were blind as to which group tasters on unmarked plates. Each taster will be presented with the six plates in random order.
each subject was in. Between tastings, each taster should have some water, as a "wash out" process.
(b) The researchers gave all of the volunteers a free tooth cleaning so that the volunteers' teeth
would all be free of tartar buildup at the beginning of the study. 11.5 (a) Each customer will taste both the "Mocha Frappuccino Light" and the regular version of
(c) Suppose the researchers believe that men's and women's dental hygiene habits differ this coffee. The order in which the customer tastes the two products will be randomized.
systematically. For instance, maybe women tend to brush more frequently and more thoroughly Between tastings, each customer will drink water for a "wash out" period. We can make this
than men do. Then the researchers could block the volunteers by gender. They should randomly study double-blind if the customers are given the two types of coffee in unmarked cups. This
assign 60 men to the tartar control group and 60 men to the regular toothpaste group. For the way neither the customers nor the people serving the coffee know which type of coffee is in
women, 45 would be randomly assigned to the tartar control group and 45 to the regular which cup. (b) We label each customer using labels 01, 02, 03, ... , 40. We enter the partial table
toothpaste group. Blocking in this way would isolate the unwanted variability due to gender of random digits and read two-digit groups. The labels 00 and 41 to 99 are not used in this
differences in dental hygiene that is present in the completely randomized design. example, so we ignore them. We also ignore any repeats of a label, since that customer has
already been assigned to a group. The first 20 customers selected will receive "Mocha I
11.2 (a) A potential source of bias related to the question wording is that people may not Frappuccino Light" first. The remaining 20 customers will receive the regular version first. Here,
remember how many movies they watched in a movie theater in the past year. It might help the we pick ouly the first 3 in the "Mocha Frappuccino Light" first group. The first two-digit group ,I
polling organization to shorten the amount of time that they ask about, perhaps 3 or 6 months. is 07, so the customer with label 07 is in the "Mocha Frappuccino Light" first group. The second
(b) A potential source of bias not related to the question wording is that the poll contacted people two-digit group is 51, which we ignore. The third two-digit group is 18, so the customer with
through "residential phone numbers." Since more and more people (especially younger adults)
I
labell8 is in the "Mocha Frappuccino Light" first group. The fourth two-digit group is 89, which '
are using ouly a cellular phone (and do not have a residential phone), the poll omitted these we ignore. The fifth two-digit group is 15, so the customer with labell5 is in the "Mocha
people from the sampling frame. These same people might be more likely to watch movies in a Frappuccino Light" first group. Thus, the first three customers in the "Mocha Frappuccino
movie theater. The polling organization should include cell phone numbers in their list of Light" first group are those customers with labels 07, 18, and 15. (c) No. Using a matched pairs
possible numbers to call. design gives us an advantage over the completely randomized design. The advantage is that we
can compare how each customer rates the new and regular coffee. Since each customer tastes
11.3 (a) This was an observational study. The researchers examined data from the nurses, both types of coffee, we can say which one the customer prefers and then look at the proportion
including their alcohol consumption. The researchers did not assign the nurses to different of these customers who prefer the new coffee drink. A completely randomized design would not
alcohol consumption groups-these were preexisting groups, and the researchers observed the take advantage of this natural comparison due to pairing.
results. (b) "Significantly lower risk of death" means that the light-to-moderate drinkers had
lower death rates than both nondrinkers and heavy drinkers and that these lower death rates were 11.6 (a) One sampling method that depends on voluntary response would be to put an
very unlikely to be explained by chance variation alone. (c) One possible lurking variable is advertisement in the school paper and ask students who park on campus to complete a survey.
exercise. Perhaps light-to-moderate drinkers exercised more often than both nondrinkers and Ouly those students who are passionate about the parking issue will respond to the survey,
heavy drinkers. Regular exercise might be associated with lower risk of heart disease. In that resulting in voluntary response bias. (b) One sampling method that is bad in another way would
case, we wouldn't know whether the light-to-moderate drinking or the regular exercise led to be to talk to students as they leave the parking lot. (c) A sampling method that would have more
reduced risk of death from heart disease. reliable results would be to select a random sample of students who park on campus and contact .I
these students (probably via email) to fmd out their opinions about parking on campus. While we
11.4 (a) There are two factors-storage of potatoes and cooking procedure. There are three might still have some nonresponse, this sampling method attempts to eliminate bad sampling
levels for potato storage: fresh picked (i.e., not stored), stored for a month at room temperature, practices.
stored for a month refrigerated. There are two levels for cooking the potatoes: cooked
innnediately after slicing, sliced and cooked after an hour at room temperature. There are six 11.7 (a) "Controlled scientific studies" implies that controlled, randomized experiments have
treatments been used. The control part is important, because this means that the nonphysical treatments
Potato stora2e were compared to other treatments for the same ailments. The scientific part is what implies to
Cooking procedure Fresh picked Stored (room temp.) Stored (refrigerated) the reader that this was an experiment, where the researchers randomly assigned subjects to
Cooked immediately 1 2 3 treatment groups, instead of an observational study, where subjects self-select the treatments they
Cooked after an hour 4 5 6
t
146 Part II Review Exercises Part II Review Exercises 147

receive. (b) The control group allows for a comparison, while random assigmnent into treatment "60% off' grows increasingly attractive to customers as the percent of goods on sale increases.
groups attempts to balance the unknown impacts of variables not under study. When only 25% of food is on sale, customers rate the "50% to 70% off' range as more attractive
than the precise "60% off' advertisement. For all other percents of foods on sale, the precise
11.8 (a) Randomly assign 36 of the acacia trees to have active beehives placed in them (the other "60% off' is more attractive to customers and becomes more and more attractive than the range
36 acacia trees will have empty beehives placed in them). Compare the damage caused by "50% to 70% off' as the percent of food items on sale increases.
elephants to the trees with active and empty bee hives. (b) The randomization in this experiment
is important so that variables such as location, accessibility, rainfall, etc. are "scattered" among
the two groups (trees with active and empty beehives). (c) We would want the evaluators of the
elephant behavior to be blind to which trees have active beehives and which trees have empty
beehives, if possible, so that they do not knowingly or unknowingly rate the elephant damage
differently based on this knowledge.

11.9 (a) In observational studies, the subjects "self-select" into the groups being observed. In
experiments, the subjects are randomly assigned to treatment groups. We can show cause and
effect with experiments, but not with observational studies. (b) A "randomized controlled trial"
is one where subjects are randomized into treatment groups and a control group receiving an
alternative treatment (possibly a placebo or dummy treatment) is used so that treatment
effectiveness can be compared. (c) "Healthy user bias" means that the people who supplement
with vitamin E might also do other things that contribute to their general health that might lessen
the risk of heart disease. In an observational study, we cannot separate out this "healthy user
bias" from the effect of vitamin Eon the risk of heart disease. But, in a randomized controlled
experiment, the randomization spreads the "healthy user bias" out among the treatment groups so
it is not a factor we must consider.

11.10 (a) There will be 8 treatment groups, with 25 people randomized into each treatment
group. The treatments are:
Treatment I: 25% of food on sale, 60% off
Treatment 2: 50% of food on sale, 60% off
Treatment 3: 75% of food on sale, 60% off
Treatment 4: I 00% of food on sale, 60% off
Treatment 5: 25% of food on sale, 50-70% off
Treatment 6: 50% of food on sale, 50-70% off
Treatment 7: 75% of food on sale, 50-70% off I
Treatment 8: I 00% of food on sale, 50-70% off
Researchers will compare the mean attractiveness rating given by individuals in the eight groups.
(b) Since there are 200 subjects, we label the subjects 001, 002, ... , 200. The labels 000 and 201
to 999 are not used in this example, so we ignore them. We also ignore any repeats of a label,
since that subject is already in a treatment group. Once we have 25 subjects for the first
treatment, we select 25 subjects for the second treatment, and so on, until all subjects have been
assigned to a treatment group. Here we pick only the first 3 subjects. The first three-digit group
is 457, which we ignore. The second three-digit group is 404, which we ignore. The third three-
digit group is 180, so subject 180 is the first person assigned to treatment group I. The fourth
three-digit group is 165, which we ignore. We also ignore 561 and 333 until we get to 020,
which means subject 020 is in treatment group I. We then ignore 705 and assign the subject with
label 193 to treatment group I. (c) The range "50% to 70% off' slowly decreases in
attractiveness to customers as the percent of goods on sale increases. To the contrary, the precise

148 Chapter 6 Probability and Simulation: The Study of Randomness 149

Chapter 6 6.5 The choice of digits in these simulations may of course vary from that made here. In (a)-(c),
a single digit simulates the response; for (d), two digits simulate the response of a single voter.
6.1 Answers will vary but examples are: (a) Flip the coin twice. Let HH represent a failure, and (a) Odd digits represent a Democratic choice; even digits represent a Republican choice. (b) 0,
let the other three outcomes, HT, TH, TT, represent a success. (b) Let I, 2, and 3 represent a I, 2, 3, 4, 5 represent a Democratic choice and 6, 7, 8, 9 represent a Republican choice. (c) 0, I,
success, and let 4 represent a failure. If 5 or 6 come up, ignore them and roll again. (c) Peel off 2, 3 represent a Democratic choice; 4, 5, 6, 7 represent a Republican choice; 8, 9 represent
two consecutive digits from the table; let 00 through 74 represent a success, and let 75 through Undecided. (d) 00, 01, ... , 52 represent a Democratic choice and 53, 54, ... , 99 represent a
99 represent a failure. (d) Let diamonds, spades, clubs represent a success, and let hearts Republican choice.
represent a failure. You must replace the card and shuffle the deck before the next trial to
maintain independence. 6.6 For the choices made in the solution to Exercise 6.5:
(a) D, R, R, R, R, R, R, D, R, D- 3 Democrats, 7 Republicans
6.2 Flip both nickels at the same time. Let HH represent~ success (the occurrence of the (b) R, D, D, R, R, R, R, D, R, R- 3 Democrats, 7 Republicans
phenomenon of interest) and HT, TH, TT represent ~ failure (the nonoccurrence of the (c) R, U, R, D, R, U, U, U, D, R- 2 Democrats, 4 Republicans, 4 undecided
phenomenon). (d) R, R, R, D, D, D, D, D, D, R- 6 Democrats, 4 Republicans

6.J (a) Obtain an alphabetical list of the student body, and assign consecutive numbers to the 6.7 Let 1 represent a girl and 0 represent a boy. The command randint(O,l,4) produces a 0 or I
students on the list. Use a random process (table or random digit generator) to select I 0 students with equal likelihood in groups of 4. Continue to press ENTER. In 50 repetitions, we got at
from this list. (b) Let the two-digit groups 00 to 83 represent a "Yes" to the question of whether least one girl47 times, and no girls three times. Our simulation produced a girl 94% of the time,
or not to abolish evening exams and the groups 84 to 99 represent a "No." (c) Starting at line vs. a probability of 0.938 obtained in Example 6.6.
129 in Table B ("Yes" in boldface) and moving across rows:
Repetition 1: 36, 75, 95, 89, 84, 68, 28, 82, 29,13 #"Yes": 7. 6.8 (a) Let the digits 0, 1, 2, 3, 4, 5 correspond tQ the American League team winning a Series
Repetition 2: 18, 63, 85, 43, 03, 00, 79, 50, 87, 27 #"Yes": 8. game and 6, 7, 8, 9 correspond to the National League team winning. Single digits are chosen
Repetition 3: 69, 05, 16, 48, 17, 87,17, 40, 95, 17 #"Yes": 8. until one team has won four games, with a minimum of four digits and a maximum of seven
Repetition 4: 84, 53, 40, 64, 89, 87, 20, 19, 72, 45 # "Yes": 7. digits being chosen. On the TI-83, you can use the command randint (0, 9, 1) repeatedly to
Repetition 5: 05, 00, 71, 66, 32, 81, 19, 4f, 48,73 #"Yes": 10. generate the digits. Here are two repetitions:
(Theoretically, we should achieve 10 "Yes" results approximately 17.5% of the time.) 0, 3, 9, 2, 7, 9, 2 AL, AL, NL, AL, NL, NL, AL #games= 7
3, 0, 9, 1, 0 AL, AL, NL, AL, AL #games= 5
6.4 (a) A single random digit simulates one shot, with 0 to 6 representing a made basket and 7, The long-run average of many repetitions will give the approximate number of games one would
8, or 9 representing a miss. Then 5 consecutive digits simulate 5 independent shots. (b) Let 0-6 expect the Series to last. The average should be close to 5.6979. (b) Other factors might
represent a "made basket" and 7, 8, 9 represent a "missed basket." Starting with line 125, the include: the starting pitchers, the weather conditions, injury status of key players.
first four repetitions are:
Repetition 9 7H ll H 9 J.7 8 ~ .:l. 7 18 8 6.9 Let 00 to 14 correspond to breaking a racquet, and let 15 to 99 correspond to not breaking a
Number of misses (2) (I) (2) (3) racquet. Starting with line 141 in the random digit table, we peel two digits off at a time and
Each block of 5 digits in the table represents one repetition of the 5 attempted free throws. record the results: 96 76 73 59 64 23 82 29 60 12. In the first repetition, Brian played 10
The underlined digits represent made baskets. We perform 46 more repetitions for a total of 50, matches until he broke a racquet. Addition repetitions produced these results: 3 matches, 11
and calculate the relative frequency that a player misses 3 or more of 5 shots. Here are the matches, 6 matches, 37 matches, 5 matches, 3 matches, 4 matches, II matches, and I match.
numbers of baskets missed for the 50 repetitions. The average for these 10 repetitions is 9.1. We will learn later that the expected number of
21231 10132 22332 12401 matches until a break is about 6.67. More repetitions should improve our estimate.
11212 10102 33233 12023
12321 22210 6.10 (a) Let the digits 0, I, 2, 3, and 4 correspond to a girl and the digits 5, 6, 7, 8, and 9
A frequency table for the number of missed shots is shown below. correspond to a boy. (b) A table indicating the number of girls in a family with 4 children,
I Number of misses I 0 I I I 2 I 3 I 4 5 I frequencies, and percents is shown below.
Girls Count Percent
I Frequency I 6 I 15 I 18 I 10 I 1 o I 0 3 7.50
The relative frequency of missing 3 or more shots in 5 attempts is 11/50 = 0.22. 1 6 15.00
2 17 42.50
3 10 25.00
Note: The theoretical probability of missing 3 or more shots is 0.1631. 4 4 10.00
Note: The theoretical percents are: 6.25, 25, 37.5, 25, and 6.25.
!50 Chapter 6 Probability and Simulation: The Study of Randomness !51

6.11 Let integers I to 25 correspond to a passenger that fails to appear and 26 to I 00 correspond
to a passenger that shows up. The command randlnt(l,I00,9) represents one van. Continue to (c) The mean number of hits in 20 at bats was x = 6.25. And 6.25/20 = 0.3125, compared with
press ENTER. In 50 repetitions, our simulation produced 12 vans with 8 people and 3 vans with the player's batting average of .320. Notice that even though there was considerable variability in
9 people so 15 vans had more than 7 people, suggesting a probability of0.3 that the van will be the 20 repetitions, ranging from a low of3 hits to a high of9 hits, the results of our simulation
overbooked. Note: The theoretical probability is 0.3003. are very close to the player's batting average.

6.12 (a) Since there are four parts to each multiple choice question, the probability of guessing 6.17 (a) One digit simulates system A's response: 0 to 8 shut down the reactor, and 9 fails to
correctly is 0.25. Let digits 00 to 24 correspond to a correct solution and the digits 25 to 99 shut it down. (b) One digit simulates system B's response: 0 to 7 shut down the reactor, and 8 or
correspond to an incorrect solution. Jack's average score in 100 repetitions was 2.8. Noie: The 9 fail to shut it down. (c) A pair of consecutive digits simulates the response of both systems, the
expected score is 2.5. (b) Since Sam does not answer any questions, he will not earn or lose any first giving A's response as in (a), and the second B's response as in (b). !fa single digit were
points, so his score is 0. Since Jack guesses at all!O questions, we expect him to get 25% of used to simulate both systems, the reactions of A and B would be dependent-for example, if A
them correct and 75% of them incorrect. Jack earns 4 times as many points for a correct guess as fails, then B must also fail. (d) Answers will vary. The true probability that the reactor will shut
he loses for an incorrect guess, so we would expect Jack to score higher than Sam. Note: Jack's down is 1-(0.2)(0.1) = 0.98.
expected score is 4x2.5-l x7.5 = 2.5.
6.18 This simulation is fun for students, but the record-keeping can be challenging! Here is one
6.13 (a) Read two random digits at a time from Table B. Let 01 to 13 represent a Heart, let 14 to method. First number the (real or imaginary) participants 1-25. Write the numbers 1-25 on the
52 represent another suit, and ignore the other two-digit numbers. (b) You should beat Slim board so that you can strike through them as they hear the rumor. We used randint (1, 25) to
about 44% of the time; no it is not a fair game. randomly select a person to begin spreading the rumor, and then pressed ENTER repeatedly to
randomly select additional people to hear the rumor. We made a table to record the round (time
6.14 On the Tl-83, we started a counter (C), and then executed the command shown, pressing increment), those who knew the rumor and were spreading it, those randomly selected to hear the
the ENTER key 30 times for 30 repetitions. rumor, and those who stopped spreading it because the person randomly selected to hear it had
already heard it. Here is the beginning of our simulation, to illustrate our scheme:
1-+C L1 li.i .L~ 1
1
randlnt(1. 99. :5)+ ~ji:
B
U? .
..........._ ""!...;..... ""' ...
Time
incr Knows Tells Stopped
Ll::s..
: +C+C
YM(L1 HL<:(C).
. 1 16 --> 2
:2
I 2 2 --> 25
L1(1)=53
16 --> 3
For five sets of 30 repetitions, we observed 5, 3, 3, 8, and 4 numbers that were multiples of 5.
The mean number of multiples of 5 in 30 repetitions was 3.6, so 3.6/30 = 0.12 is our estimate for 3 2 19
the proportion of times a person wins the game. 3 6
16 15
6.15 The command randlnt(l,365,23)->L 1 : SortA (LI) randomly selects 23 (numbers) birthdays 25 1
and assigns them to L 1 Then it sorts the day in increasing order. Scroll through the list to see 21
4 1
duplicate birthdays. Repeat many times. For a large number of repetitions, there should be
duplicate birthdays about half the time. To simulate 41 people, change 23 to 41 in the command
2 5
and repeat many times. There is about a 90% chance that at least 2 people will have the same
3 23
birthday when 41 people are in the room. We assume that there are 365 days for birthdays, and 6 13
that all birthdays are equally likely. 15 25 15
16 9
6.16 (a) Select three digit numbers and let 000 to 319 correspond t9 hits and 320 to 999 19 16 19
correspond to no hits. (b) We entered I -->cENTER to set a counter. Then enter randlnt (0, 999, 25 15 25
20) -.; L 1: sum (Ll 2: 0 and L1 :::; 319) --> L2 (C) : C + I > C and press ENTER repeatedly. The
count (number of the repetition) is displayed on the screen to help you see when to stop. The
5
results for the 20 repetitions are stored in list L2. We obtained the following frequencies:
Chapter 6 Probability and Simulation: The Study of Randomness 153
152

Eventually we crossed off all but 7, 12, 14, and 24, so 4 out of25 or 4/25 = 0.16 or 16% never 6.26 (a) We expect probability 1/2 (for the first flip, or for any flip of the coin). (b) The
heard the rumor. Note: With a sufficiently large population, approximately 20% ofthe theoretical probability that the first head appears on an odd-numbered toss of a fair coin is
population will not hear the rumor. I (I)'
2+
2 2
+(I)' +... 2.
3
= Mostanswersshouldbebetweenabout0.47and0.87.
6.19 (b) In our simulation, Shaq hit 52% of his shots. (c) The longest sequence of misses in our
run was 6 and the longest sequence of hits was 9. Of course, results will vary. 6.27 The study looked at regular season games, which included games against weaker teams,
and it is reasonable to believe that the 63% figure is inflated because of these weaker opponents.
6.20 (a) The proportions were 0.65, 0.7125, 0.7187. With n = 20, nearly all answers will be In the World Series, the two teams will (presumably) be nearly the best, and home game wins
0.40 or greater. With n = 80, nearly all answers will be between 0.58 and 0.88. With n = 320, will not be so easy.
nearly all answers will be between 0.66 and 0.80. (b) The set of results for 320 women is much
less variable. For 20 women the proportions varied from 0.45 to 0.9, with a standard deviation 6.28 In the long run, the fraction of five-card hands containing two pairs will be about 1!21. It
of0.137. For 320 women, the proportions varied from 0.7125 to 0.775, with a standard does not mean that exactly one out of21 hands contains two pairs; that would mean, for
deviation of0.01685. As the number of trials increases, the variability in the proportion example, that if you've been dealt 20 hands without two pairs, that you could count on the next
decreases. hand having two pairs. Recall that chance behavior is not predictable for a small number of
trials, the regular and predictable pattern occurs in the long run.
6.21 A large number of trials of this experiment often approach 40% heads. One theory
attributes-this surprising result to a "bottle-cap effect" due to an unequal rim on the penny. We 6.29 (a) S = {germinates, fails to grow}. (b) The survival time could be measured in days,
don't know. But a teaching assistant claims to have spent a profitable evening at a party betting weeks, months, or years. S ={nonnegative integers}. (c) S= {A, B, C, D, F}. (d) Using Y for
on spinning coins after learning of the effect. "yes (shot made)" and N for "no (shot missed)," S = {YYYY, NNNN, YYYN, NNNY, YYNY,
NNYN, YNYY, NYNN, NYYY, YNNN, YYNN, NNYY, YNYN, NYNY, YNNY, NYYN}.
6.22 The theoretical probabilities are, in order: 1/16, 4/16 = 1/4, 6/16 = 3/8, 4/16 = 1/4, 1116. (There are 16 outcomes in the sample space.) (e) S= {0, I, 2, 3, 4}

6.23 There are 21 Os among the first 200 digits; tpe proportion is 21/200 = 0.105 . 6.30 (a) S= {all numbers between 0 and 24 hours}. (b) S= {0, I, 2, ... , 11000}. (c) S= {0, I,
2, ... , 12}. (d) S= {all numbers greater than or equal to 0}, or S= {0, O.oi, 0.02, 0.03, ... }.
6.24 (a) 0. (b) 1. (c) 0.01. (d) 0.6 (Note: While 0.6 is the best answer for part (d), 0.99 is not (e) S = {all positive and negative numbers}. Note that the rats can lose weight.
incorrect.)
6.31 S = {all numbers between _ and __ } The numbers in the blanks may vary. Table 1.10
6.25 The table below shows infmmation from www.mms.com. The exercise specified M&M's has values from 86 and 195 cal; the range of values should include at least those numbers. Some
Milk Chocolate Candies, but students may be interested in other popular varieties. Of course, students may play it safe and sayS= {all numbers greater than 0}
answers will vary, but students who take reasonably large samples should get percentages close
to the values in the table below. (For example, samples of size 50 will almost always be within 6.32 (a) If two coins are tossed, then by the multiplication principle, there are 2x2 = 4 possible
12%, while samples of 75 should give results within I 0%.) In a sample of 1695 candies, 439 outcomes. The outcomes are illustrated in the following tree diagram:
or about 25.9% were blue.
M&M's variety Blue%
Milk Chocolate 24%
Peanut 23%
Almond 20%
Peanut Butter 20%
Crispy 17%
Dark Chocolate 17%
Minis 25%
Baking Bits 25%
,
I
I

Probability and Simulation: The Study of Randomness !55


Chapter 6
!54

sample space isS= {lll, ll2, 113,121,122,123,131,132, 133,2ll,212,213,221,222,223,


231, 232, 233, 311, 312, 313, 321, 322, 323, 331, 332, 333}.
Toss 1 Toss2

6.35 (a) Number of ways Sum Outcomes


H HH I 2 (1,1)

Start
H

< T HT
2
3
4
5
6
3
4
5
6
7
(I, 2) (2, I)
(I, 3) (2, 2) (3, I)
(1,4)(2,3)(3,2)(4, I)
(1, 5) (2, 4) (3, 3) (4, 2) (5, I)
(I, 6) (2, 5) (3, 4) (4, 3) (5, 2) (6, I)
5 8 (2, 6) (3, 5) (4, 4) (5, 3) (6, 2)
H TH 4 9 (3, 6) (4, 5) (5, 4) (6, 3)

<
3 10 (4,6)(5,5)(6,4)
T 2 II (5, 6)(6, 5)
I 12 (6, 6)
T TT (b) 18. (c) There are 4 ways to get a sum of5 and 5 ways to get a sum of8.
(d) Answers will vary but might include:
The sample space isS= {HH, HT, TH, TT}. (b) If three coins are tossed, then there are 2x2x2 = The "number of ways" increases by I until "sum= 7" and then decreases by I.
8 possible outcomes. The outcomes are illustrated in the following tree diagram: The "number of ways" is symmetrical about "sum= 7."
To~s 1 Toss 2 Toss) The outcomes show a symmetrical pattern, very similar to stemplots for symmetric
distributions.
HHH
< " -
Odd sums occur in an even number of ways and even sums occur in an odd number of
H

<
ways.
T-- HHT
The possible values of the sum are not equally likely, even though all36 outcomes are
H
HTH equally likely.
< " -
T
T - - HIT

St>rt THH 6.36 (a) 26. (b) 13. (c) I. (d) 16. (e) 3.
< " -
H

<
T-- THT 6.37 (a) The given probabilities have sum 0.96, so P(type AB) = I - 0.96 = 0.04. The sum of all
T H-- TTH
possible outcomes is I. (b) P(type 0 or B)= 0.49 + 0.20 = 0.69.

T 6.38 (a) The sum of the given probabilities is 0.76, so P(blue) = I - 0.76 = 0.24. (b) The sum of
< T - TTT
the given probabilities is 0.77, so P(blue) = I - 0.77 = 0.23. (c) P(milk chocolate M&M is red,
The sample space isS= {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. (c) If five coins are yellow, or orange)= 0.13 + 0.14 + 0.2 = 0.47. P(peanut M&M is red, yellow, or orange)= 0.12
tossed, then there are 2x2x2x2x2 = 32 possible outcomes, each of which co)lsists of a string of + 0.15 + 0.23 = 0.5.
five letters that may be H's or T's. The sample space isS= {HHHHH, HHHHT, HHHTH,
HHTHH, HTHHH, HHHTT, HHTHT, HHTTH, HTHTH, HTTHH, HTHHT, HHTTT, HTHTT, 6.39 P(either CV disease or cancer)= 0.45 + 0,22 = 0.67; P(other cause)= I - 0.67 = 0.33.
HTTHT, HTTTH, HTTTT, THHHH, THHHT, THHTH, THTHH, TTHHH, THHTT, THTHT,
THTTH, TTHTH, TTTHH, TTHHT, THTTT, TTHTT, TTTHT, TTTTH, TTTTT}. 6.40 (a) Since the three probabilities must add to I (assuming that there were no "no opinion"
responses), this probability must be 1- (0.12 + 0.61) = 0.27. (b) 0.12 + 0.61 = 0.73.
6.33 (a) I Ox I Ox I Ox 10 = 104 = I 0,000. (b) JOx9x8x7 = 5,040. (c) There are I 0,000 four-digit
tags, I ,000 three-digit tags, 100 two-digit tags, and I 0 one-digit tags, for a total of II, II 0 license 6.41 (a) The sum is I, as we expect since all possible outcomes are listed. (b) 1-0.41 = 0.59.
tags. (c) 0.41 + 0.23 = 0.64.

6.34 (a) An outcome of this experiment consists of a string of3 digits, each of which can be I,
2, or 3. By the multiplication principle, the number of possible outcomes is 3x3x3 = 27. (b) The
!56 Chapter 6 Probability and Simulation: The Study of Randomness !57

6.42 There are 19 outcomes where at least one digit occurs in the correct position: Ill, 112, 6.51 (a) P(one call does not reach a person)= 0.8. Thus, P(none of the 5 calls reaches a person)
113, 121, 122, 123, 131, 132, 133,213,221,222,223,233,313,321,322,323,333. The 5
= (0.8) == 0.3277. (b) P(one call to NYC does not reach a person)= 0.92. Thus, P(none of the 5
theoretical probability of at least one digit occurring in the correct position is therefore 19/27 =
0.7037. calls to NYC reaches a person)= (0.92)' == 0.6591.

6.43 (a) The table below gives the probabilities for the number of spots on the down-face when 6.52 (a)Therearesixarrangementsofthedigits 1,2,and3: {123, 132,213,231,312,321},so
tossin a balanced (or "fair" 4-sided die.
Number ofs ots I 2 3 P( winning)= - 6 - = 0.006. (b) With the digits 1, 1, and 2, there are only three distinct
1000
3
Since all4 faces have the same shape and the same area, it is reasonable to assume that any one arrangements {112, 121, 211}, so P( winning)= - - = 0.003.
1000
of the 4 faces is equally likely to be the down-face. Since the sum of the probabilities must be
one, the probability of each should be 0.25. (b) The possible outcomes are (1,1) (1,2) (1,3) (1,4)
6.53 (a) S = {right, left}. (b) S= {All numbers between 150 and 200 em}. (Choices of upper
(2,1) (2,2) (2,3) (2,4) (3,1) (3,2) (3,3) (3,4) (4,1) (4,2) (4,3) (4,4). The probability of any pair is
and lower limits will vary.) (c) S = {all numbers greater than or equal to 0}, or S = {0, 0.01,
1/16 = 0.0625. The table below gives the probabilities for the sum of the number of spots on the
0.02, 0.03, ... }.(d) S ={all numbers between 0 and 1440}. (There are 1440 minutes in one day,
down-faces when tossin two balanced or "fair" 4-sided dice.
so this is the largest upper limit we could choose; many students will likely give a smaller upper
Sum ofs ots 2 3 4 5 6 7
limit.)
Probabili 1/16 2/16 3/16 4/16 3/16 2/16 l/16
P(Sum = 5) = P(1,4) +P(2,3) + P(3,2) + P(4,!) = (0.0625) (4) = 0.25. 6.54 (a) S= {F, M} or {female, male}. (b) S = {6, 7, ... , 20}. (c) S= {All numbers between
2.5 and 6 Vmin}. (d) S= {All whole numbers between __ and _ _ bpm}. (Choices of upper
6.44 (a)P(D)= P(l,2,or3)=0.301 +0.176+0.125=0.602. (b) P(BuD) =P(B)+P(D)= and lower limits will vary.)
0.602: 0.222 = 0.824. (c) P(Dc) = I - P(D) = 1-0.602 = 0.398. (d) P( C nD)= P(l or 3) =
6.55 (a) Legitimate. (b) Not legitimate: The total is more than I. (c) Legitimate.
0.301 + 0.125 = 0.426. (e) P(B n C)= P(7 or 9~ = 0.058 + 0.046 = 0.104.
6.56 (a) If A and B are independent, then P(A and B) = P(A) x P(B). Since A and B are
6.45 Fight one big battle: His probability of winning is 0.6, which is higher than the probability nonempty, then we have P(A) > 0, P(B) > 0, and P(A) x P(B) > 0. Therefore, P(A and B)> 0.
0.8 3 = 0.512 of winning all three small battles. So A and B cannot be empty. (b) If A and Bare disjoint, then P(A and B)= 0. But this cannot be
true if A and Bare independent by part (a). So A and B cannot be independent. (c) Example: A
12 12
6.46 The probability that all 12 chips in a car will work is (1- 0.05) = ( 0.95) == 0.5404. bag contains 3 red balls and 2 green balls. A ball is drawn from the bag, its color is noted, and
the ball is set aside. Then a second ball is drawn and its color is noted. Let event A be the event
that the first ball is red. Let event B be the event that the second ball is red. Events A and B are
6.47 No: It is unlikely that these events are independent. In particular, it is reasonable to expect
not disjoint because both balls can be red. However, events A and B are not independent
that college graduates are less likely to be laborers or operators.
because whether the first ball is red or not, alters the probability of the second ball being red.
7 317
6.48 (a) P(A)= 0.4397 orabout0.44,sincethereare7,317(thousand)malesoutof 6.57 (a) The sum ofall8 probabilities equals 1 and all probabilities satisfY 0 ~p ~I. (b) P(A) =
16,639 0.000 + 0.003 + 0.060 + 0.062 = 0.125. (c) The chosen person is not white. P(Bc) =I- P(B) =I
16,639 (thousand) students in the October 2003 CPS. (b) - (0.060 + 0.691) = 1-0.751 = 0.249. (d) P(A" n B)= 0.691.
P(B)= 3,494+2,630 6,124 . 0_368 1. (c) P(AnB)= 1,589+970 2,559 . 0_1538 . A
16,639 16,639 16,639 16,639 , 6.58 A and Bare not independent because P(A and B)= 0.06, but P(A)xP(B) = 0.125x0.751 =
and Bare not independent since P(AnB) ,c P(A)x P(B). 0.0939. For the two events to be independent, these two probabilities must be equal.
' i

6.59 (a) P(undergraduate and score~ 600) = 0.40x0.50 = 0.20. P(graduate and score~ 600) =
6.49 An individual light remains lit for 3 years with probability I - 0.02; the whole string
20
0.60x0.70 = 0.42. (b) P(score ~ 600) = P(UG and score~ 600) + P(G and score~ 600) = 0.20 +
remains lit with probability (1-0.02)'" =(0.98) "=0.6676. 0.42 = 0.62

6.50 P(neither test is positive)= (1- 0.9)x(l- 0.8) = O.Jx0.2 = 0.02.


.....---
1

T
I
158 Chapter 6 I Probability and Simulation:The Study of Randomness 159

6.60 (a) The choices for Austin and Sara are shown in the table below. The sum of Austin's
picks is in parentheses. Each of the 25 outcomes for Austin could appear with one of the 10
6.62 (a) 1/38. (b) Since 18 slots are red, the probability of winning isP(red) = ~: = 0.4737. (c)
possible choices for Sara, so a tree diagram would have 250 branches.
Austin Sara Number of pairs for Number of pairs for There are 12 winning slots, so P(win a column bet)=~!= 0.3158.
Austin with a sum Austin with a sum less
greater than Sara's pick than Sara's pick 6.63 You should pick the first sequence. Look at the first five rolls in each sequence. All have
1, 1 (2) 25 0 one G and four R's, so those probabilities are the same. In the first sequence, you win regardless
1, 2 (3) 2 24 0 of the sixth roll; for the second sequence, you win if the sixth roll is G, and for the third
1, 3 (4) 3 22 1
1, 4(5) 4 19 3 ~
sequence, you win if it is R. The respective probabilities are ( )' x ( ~) = 0.0082,
1, 5 (6) 5 15 6
2, 1 (3)
2, 2(4) 7
. 2, 3 (5)
6

8
10
6
3
10
15
19
(H x(H =o.0055,and (H x(~)=o.0027. il
2, 4 (6) 9 1 22 6.64 P(first child is albino)= 0.5x0.5 = 0.25. P(both of two children are albino)= 0.25x0.25 =
2, 5 (7) 10 0 24 0.0625 P(neither is albino)= (l-0.25)x(l-0.25) = 0.5625.
3, 1 (4)
3, 2 (5)
3, 3(6) 6.65 (a) A Venn diagram is shown below.
3, 4 (7)
Neither
3, 5 (8) 03
4, C(S)
4, 2 (6)
4, 3 (7)
4, 4 (8)
4, 5M
5, 1 (6) Stanford only Both Princeton only
0.3 0.2 {);Z
5, 2 (7)
5, 3C8l
5, 4 (9)
5, 5 (10)
(b) The sample space contains 25xlO = 250 outcomes. (c) Count the number of pairs for Austin
with a sum greater than each possible value Sara could pick. See the table above. (d) P(Austin
wins)= 125/250 = 0.5 (e) Count the number of pairs for Austin with a sum less than each ..

possible value Sara could pick. See the table above. (f) P(Sara wins)= 100/250 = 0.4. (g) The (b) P(neither admits Zack) = 1 ~ P(Zack is admitted by Princeton or Stanford)= I- (0.4 + 0.5-
probability of a tie is 25/250 = 0.1. Yes, 0.5 + 0.4 + 0.1 = 1. 0.2) = 0.3 (c) P(Stanford and not Princeton)= P(Stanford)- P(both Princeton and Stanford)= 0.5
-0.2 = 0.3
6.61 (a) P(under 65) = 0.321 + 0.124 = 0.445. P(65 or older)= 1-0.445 = 0.555 OR 0.365 +
0.190 = 0.555. (b) P(tests done)= 0.321 + 0.365 = 0.686. P(tests not done)= 1-0.686 = 0.314 6.66 P(A or B)= P(A) + P(B) - P(A and B)= 0.138 + 0.261 - 0.082 = 0.317.
OR 0.124 + 0.190 = 0.314. (c) P(A and B)= 0.365; P(A)xP(B) = (0.555)x(0.686) = 0.3807.
Thus, events A and B are not independent; tests were done less frequently on older patients than 6.67 (a) {A and B}: household is both prosperous and educated; P(A and B)= 0.082 (given). (b)
would be the case if these events were independent. {A and B'}: household is prosperous but not educated; P(A and B') = P(A)- P(A and B)= 0.138
-0.082 = 0.056. (c) {A' and B}: household is not prosperous but is educated; P(A' and B)=
P(B)- P(A and B)= 0.261- 0.082 = 0.179. (d) {A' and B'}: household is neither prosperous
nor educated; P(A' and B') = I - 0.317 = 0.683 (so that the probabilities add to 1).
t
I
160 Chapter 6 Probability and Simulation: The Study of Randomness 161

Afand8 s
083
Coffee only Tea only
o.zo 0.05

Cola only
. 0.15
6.68 To find the probabilities in this Venn diagram, begin with P(A and Band C)= 0 in the
None
center of the diagram. Then each of the two-way intersections P(A and B), P(A and C), and P(B 0.20
and C) go in the remainder of the overlapping areas; ifP(A and Band C) had been something
other than 0, we would have subtracted this from each of the two-way intersection probabilities
to find, for example, P(A and B and Ce). Next, determine P(A only) so that the total probability 6.70 (a) A Venn diagram is shown below. (b) P(country but not Gospel)= P(C)- P(C and G)=
of the regions that make up the event A is 0.6. Finally, P(none) = P(Ae and Be and C") = 0 0.4-0.1 = 0.3. (c) P(neither) = 1- P(C or G)= 1- (0.4 + 0.3- 0.1) = 0.4.
because the total probability inside the three sets A, B, and C is 1. Ne$ther
0.4

(b) P(at least one offer)= P(A orB or C)= 1- P(no offers)= 1- P(N and Be and Cc) = 1 - 0 =
1. (c) P(A and Band Ce), as noted above, is the same as P(A and B)= 0.1, because P(A and B
and C)= 0.
6.71 (a) "The vehicle is a car"= Ac; P(N) = 1- P(A) = 1-0.69 = 0.31. (b) "The vehicle is an
6.69 In constructing the Venn diagram, start with the numbers given for "only tea" and "all imported car"= Nand B. To find this probability, note that we have been given P(Be) = 0.78
three," then determine other values. For example, P(coffee and cola, but not tea)= P(coffee and and P(A and Be)= 0.55. From this we can determine that 78%- 55%= 23% of vehicles
cola)- P(all three). (a) 15% drink only cola. (b) 20% drink none of these. sold were domestic cars-that is, P(Aeand Be)= 0.23-so P(Ae and B)= P(A")- P(N and Be)=
0.31 - 0.23 = 0.08.
Note: The table below summarizes all that we can determine from the given information
(bold).

P(A) =0.69 I P(A") = 0.31


P(B) = 0.22 P(A and B)= 0.14 P(N and B)= 0.08
PiB'l = 0.78 P(A and Be) = 0.55 PiAe and B0) = 0.23
162 Chapter6 Probability and Simulation: TheStudy of Randomness 163

P(2nd card * I * picked)=


. -12 =. 0.2353
51
P(The vehicle is an imported car)= P(A' and B)= P(B)- P(A and B)= 0.22-0.14 = 0.08.
P(3'd card oil I 2 oils picked)= .!..!_ = 0.22
P(A'andB) 0 08 = 0.3636 (d) The events A' and Bare not independent, if 50
(c) p (A' I B) = -'-:::--;c::-c-_!_
P(B) 0.22 P( 41h card oil I 3 oils picked)=~= 0.2041
49
they were, P(A' 1 B) would be the same as P(A').
P(5 1h card oil l4o~~s picked)=
_2_=0.1875
6.72 Although this exercise does not call for a tree diagram, one is shown below. The numbers 48
on the right side of the tree are found by the multiplication rule; for example, P("regular" and (c) The product of these conditional probabilities gives the probability of a flush in spades by the
"?: $20") = P(R and T) = P(R) x P(T I R) = (0.4)x(0.3) = 0.12. The probability that the next
extended multiplication rule: We must draw a spade, and then another, and then a third, a fourth,
and a fifth. The product of these probabilities is about 0.0004952. (d) Since there are four
customer pays at least $20 is P(T) = 0.12 + 0.175 + 0.15 = 0.445.
possible suits in which to have a flush, the probability of a flush is four times the probability
Grade ?:$20 found in (c), or about 0.001981.

6. 77 First, concentrate on spades. The probability that the first card dealt is one of those five
0.3

R""l<
Yes 0.12 cards (A oil, Ko~~, Qo~~, J oil, or I Oo~~) is 5/52. The conditional probability that the second is one of
those cards, given that the first was, is 4/51. Continuing like this, we get 3/50, 2/49, and finally
1148; the product of these five probabilities gives P(royal flush in spades) = 0.00000038477.
No 0.28 Multiplying by four (there are four suits) gives P(royal flush)= 0.000001539.
0.7
0.4 0.5 Yes 0.175 6.78 Let G ={student likes Gospel} and C ={student likes country}. See the Venn diagram in

Customer Midrange~ the solution to Exercise 6.70. (a) P(G I C)= P(G and C)IP(C) = 0.1/0.4 = 0.25. (b) P(G I not C)=
P(G and not C)IP(not C)= 0.2/0.6 = 1/3 = 0.3333.

~
0.35
No 0.175
P(AnB) 0.082 . .
6.79 P(A I B)= - - = 0.3142. If A and B were mdependent, P(A I B) would equal
0.25 P(B) 0.261

<:
Yes 0.15
P(A) and also P(A and B) would equal P(A)xP(B).
Premium
NoO.IO
0.4 6.80 P(at least $100,000) = 10,855,000/129,075,000 = 0.0841; P(at least $1 million)=
240,000/129,075,000 = 0.0019. (b) P(at least $1 million I at least $100,000) = 0.0019/0.0841 =
0.0226.
. I$ O) P(Premium n$20) 0.15
- - = 0.337 .
Ab out 34.,
6 .73 P (Prem1um 2 = ;o. 6.81 Let I = {infection occurs} and F = {operation fails}. The probability of interest can be
P($20) 0.445
written as P(I' n F'). Using the given information that P(I) = 0,03, P(F) = 0.14, and P(I and F)=
O.o!, 84% of these operations succeed and are free from infection. P(I' n F') = I - P(I or F)= I
6.74 P(A and B)= P(A) P(B I A)= (0.46)(0.32) = 0.1472. - (0.03 + 0.14- 0.01) = 0.84.
6.75 Let F ={dollar falls} and R ={renegotiation demanded}, then P(F and R) = P(F)xP(RIF) =
6.82 (a) A tree diagram is shown below.
(0.4)x(0.8) = 0.32.

6.76 (a) & (b) These probabilities are:


13
P(l'1 card o~~) = - = 0.25
52 .
164 Chapter 6 Probability and Simulation: The Study of Randomness 165

6.87 Let W be the event "the person is a woman" and P be "the person earned a professional

<
degree." (a) P(W) = 1119/1944 == 0.5756. (b) P(W I P) = 39/83 == 0.4699. (c) Wand Pare not
0.9985 EIA+
independent; if they were, the two probabilities in (a) and (b), P(W) and P(W I P), would be
equal.
Antibody present
6.88 (a) P(Jack) = 1113. (b) P(5 on second I Jack on first)= 1/12. (c) P(Jack on first and 5 on
0.01
EIA- second)= PGack on first) x P(5 on second I Jack on first)= (1113) x (1112) = 1/156. (d) P(both
0.0015
cards greater than 5) = P(first card greater than 5) x P(second card greater than 5 I first card
Subject
greater than 5) = (8/13) x (7/12) = 56/156 =
0.359.

6.89 Let M be the event "the person is a man" and B be "the person earned a bachelor's degree."

0.99

Antibody absent < 0.006

0.994
EIA+

EIA-
=
(a) P(M) = 825/1944 0.4244 (b) P(BIM) = 559/825
=
=
0.6776 (c) P(M n B)= P(M) P(B I M)
= (0.4244)x(0.6776) 0.2876. This agrees with the directly computed probability: P(M and B)
= 559/1944 == 0.2876.

6.90 (a) P(C) = 0.20, P(A) = 0.10, P(A I C)= 0.05. (b) P(A and C)= P(C) x P(A I C)=
(0.20)x(0.05) = 0.01.

(b) P(test pos) = P(antibody and test pos) + P(no antibody and test pos.) = O.Olx0.9985 + 1
6.91 P(C 1 A) = P(C n A) O.O = 0.10 , so 10% of A students are involved in an accident.
0.99x0.006 = 0.016. (c) P(antibody I test pos) = P(antibody and test pos)/P(test pos) = P(A) 0.10
O.Ql x0.9985/0.016 = 0.624
0
6.92 IfF= {dollar falls} and R ={renegotiation demanded}, then P(R) = P(F and R) + P(F and
. 0.0009985 . . R) = 0.32 + P(F 0 ) P(R I F 0 ) = 0.32 + (0.6)x(0.2) = 0.44.
6.83 (a) P(antibody I test pos.) = 0.1428. (b) P(ant1body I test pos)
0.0009985 + 0.005994 .
6.93 P(correct) = P(knows answer)+ P(doesn't know, but guesses correctly)= 0.75 +
. 0.09985
+ . 09487 (c) A positive
.. result does not always indicate that the antibody is
(0.25)(0.20) = 0.8.
0 09985 0 0054
present. How common a factor is in the population can impact the test probabilities. Correct
0.75
0.75
P( En W} = P( E)x P(W 1 E)= 0.15x 0.8 = 0.12.
6.84 By the multiplication rule,
P(EnW) 0.12 Knows answer
Therefore, P(EIW)= ( -=0.2.
P W) 0.6 Correct
O.Z Guess correclly - - O.OS

<
0.25
6.85 (a) P(switch bad)= 0.1, P(switch OK)= I- P(switch bad)= 0.9. (b) Of the 9999 No

remaining switches, 999 are bad. P(second bad I first bad) = =


999
9999
0.09991. (c) Of the 9999 0.8
. Incorrect
Guess mcorrectly - - O.ZO

. . sw1tc
remammg - - =0.10001.
. hes, 1000 are bad. P(second bad I first good) =1000
9999 6.94 The tree diagram is shown below. The black candidate expects to get 12% + 36% + 10% =
58% of the vote.
6.86 (a) P(chemistry) = 119/399 == 0.2982. About 30% of all laureates won prizes in chemistry.
=
(b) P(US) = 215/399 0.5388. About 54% of all laureates did research in the United States. (c)
P(US I phys-med) = 90/142 == 0.633&. About 63% of all physiology/medicine laureates did
=
research in the United States. (d) P(phys-med I US)= 90/215 0.4186. About 42% of all
laureates from the United States won prizes in physiology/medicine.
166 Chapter 6
T Probability and Simulation: The Study of Randomness 167

(3) The probability that the alarm will sound (incorrectly) when scanning luggage which does not
Race Vote contain explosives, guns, or knives is 0.3. P(alarm sounds I no explosives, guns, or knives)=
0.3.
(4) A tree diagram is shown below.
For 0.12

~
Positive 0.00006
0.6
White Bomb
0.7 Against 0.28 1/10,000
Negah
0.4 0.00004
~ For0.36 0.4
Positive
0.29997
Voter Black~ 0.
0.4 No Bomb
0.1 Against 0.04 9999/10,000
0.2 Negative
~ For0.10 0.69993
0.7
Hispanic~ Since 40% of explosives are not detected, the probability of not detecting a suitcase containing a
bomb is P(negative I bomb)= 0.4 and P(positive I bomb)= I - 0.4 = 0.6. The probability that a
O.S Against 0.10
suitcase contains a bomb and is detected is P(bomb and positive)= P(bomb)xP(positive I bomb)
= 0.00006. The probability that a suitcase contains a bomb and it is not detected is P(bomb and
negative)= P(bomb)xP(negative I bomb)= 0.00004.
6.95 P(knows the answer I gives the correct answer)= 0.75/0.80 = 0.9375. (5) Since the occurrence of false-positives is 30%, we know that P(positive I no bomb)= 0.3 and
P(negative I no bomb)= 0.7 The probability that a suitcase contains no bomb and the alarm does
6.96 The event {Y < 1/2} is the bottom half of the square, while {Y >X} is the upper left not sound is P(no bomb and negative)= P(no bomb)xP(negative I no bomb)= 0.69993.
tnangle of the square. They overlap in a triangle with area l/8, so

P(Y<~andY>X)
6.97 (a) A single run: spin the 1-10 spinner twice; see if the larger of the two numbers is larger
I ) 1/8 I than 5. The player wins if either number is 6, 7, 8, 9, or 10. (b) !fusing the random digit table,
P ( Y < -I Y > X = ----'-----=~----=-,--_L -=- let 0 represent 10, and let the digits 1-9 represent themselves. (c) randlnt (1, 10, 2). (d) In our
2 P(Y>X) 1/2 4
simulation of20 repetitions, we observed 13 wins for a 65% win rate. Note: Using the methods
of the next chapter, it can be shown that there is a 75% probability ofwinning this game.

6.98 (a) Let 01 to 05 represent demand for 0 cheesecakes, 06 to 20 represent demand for I
cheesecake, 21 to 45 represent demand for 2 cheesecakes, 46 to 70 represent demand for 3
cheesecakes, 71 to 90 represent demand for 4 cheesecakes, and 91 to 99 and 00 represent
demand for 5 cheesecakes. The average number of cheesecakes sold on 30 consecutive days was
2.667. (b) Our results suggest that the baker should make 2 cheesecakes each day to maximize
his profits.

6.99 (a) Since Carla makes 80% of her free throws, let a single digit represent a free throw, and
CASE CLOSED! let 0-7 :::::>"made free throw" and 8, 9 :::::>"miss." (b) We instructed the calculator to simulate a
(I) ~ false-negative is when the alarm f~ils to go off for a suitcase containing explosives, guns, free throw, and store the result in L 1 Then we instructed the calculator to see if the attempt was a
or knives. hit (I) or a miss (0), and record that fact in L2 Continue to press ENTER until there are 20
(2) A false-negative is much more serious than a false-positive. A potential tragedy could occur
simulated free throws.
with a false-negative. A false-positive may lead to embarrassment and frustration, but nobody
will be physically harmed.

L I
168 Chapter 6 Probability and Simulation: The Study of Randomness 169

lit IL:a:

.
S=all people
hired in the past 5
Scroll through L2 and determine the longest string of I 's (consecutive baskets). This is one years
repetition. In our first set of20 repetitions, we observed 9 consecutive baskets. Additional sets
of20 free throws produced streaks oflength: 5, 10, 5, 10, 7, 6, 18, 5, 11, 11, 11, 8, 6, 4, 6, 6, 8,
11, and 5. (c) The average streak length was 8.1 consecutive baskets in 20 attempts. Most
students are surprised by the average length of a streak. Other descriptive statistics, including
the five-number summary are shown below.
Variable N N* Mean SE Mean StDev Minimum Ql Median Q3
Streak 20 0 8.100 0.750 3.354 4.000 5.250 7.500 10.750
6.105 Let H ~ {adult belongs to health club} and G ~ {adult goes to club at least twice a week}.
Variable_., Maximum
Streak 18.000 P(G and H)~ P(H) x P(G 1 H)~ (0.1) x (0.4) ~ 0.04.

6.100 (a) All probabilities are greater than or equal to 0, and their sum is I. (b) Let R 1 be Taster 6.106 P(B I A)~ P(both tosses have the same outcome I head on first toss)~ P(both
1's rating and R2 be Taster 2's rating. Add the probabilities on the diagonal (upper left to lower heads)/P(head on first toss)~ 0.25/0.5 ~ 0.5. P(B) ~ P(both tosses have same outcome)~ 2/4 ~
right): P(R 1 ~ R2) ~ O.o3 + 0.08 + 0.25 + 0.20 + 0.06 ~ 0.62. (c) P(Rt > 3) ~ 0.39 (the sum of the 0.5. Since P(B I A)~ P(B), events A and Bare independent.
ten numbers in the bottom two rows) (d) P(R2 > 3) ~ 0.39 (the sum ofthe ten numbers in the
right two columns). Note that because the matrix"is symmetric (relative to the main diagonal), 6.107 Let R 1 be Taster 1's rating and R2 be Taster 2's rating. P(R 1 ~ 3) ~ 0.01 + 0.05 + 0.25 +
these probabilities agree. 0.05 + 0.01 ~ 0.37 and P(R2 > 3 n R 1 ~ 3) ~ 0.05 + O.ot ~ 0.06, so
P(JS >3IR ~3)~ P(JS >3nR, =3) 0.06 =.1622
6.101 (a) P(Type AB) ~ 1 - (0.45 + 0.40 + 0.11) ~ 0.04. (b) P(Type B or Type 0) ~ 0.11 + 0.45 I P(R, = 3) 0.37
~ 0.56. (c) Assuming that the blood types for husband and wife are independent, P(Type Band
Type A)~ 0.11x0.40 ~ 0.044. (d) P(Type Band Type A)+ P(Type A and Type B)~ O.J1x0.40
6.108 The response will be "no" with probability 0.35 ~ 0.5x0.7. If the probability of plagiarism
+ 0.40x0.11 ~ 0.088 (e) P(Husband Type 0 or Wife Type 0) ~ P(Husband Type 0) + P(Wife
2 were 0.2, then P(student answers "no")~ 0.4 ~ 0.50x0.8. If39% of students surveyed answered
Type 0)- P(Husband and Wife both Type 0) ~ 0.45 + 0.45- (0.45) ~ 0.6975.
"no," then we estimate that 2 x 39% ~ 78% have not plagiarized, so about 22% have
plagiarized.
6.102 (a) P(both have Type 0) ~ P(American has 0) x P(Chinese has 0) ~ 0.45x0.35 ~ 0.1575.
(b) P(both have same Type)~ 0.45x0.35 + 0.4x0.27 + 0.11x0.26 + 0.04x0.12 = 0.2989. Plagiarized
Student
response

6.103 (a) To find P(A or C), we would need to know P(A and C). (b) To find P(A and C), we
would need to know P(A or C) or P(A I C) or P(C I A). Tails
v..
0.5
0.5
6.104 P(D) ~ P(A and D)+ P(B and D)+ P(C and D)~ O.r+ 0.1 + 0.2 = 0.4
Coin Flip

y.,
0.15
0.5

No
No -- 0.35
170 Chapter 7 Random Variables 171

Chapter 7 = 0.5) = 0.

7.1 (a) P(less than 3) = P(1 or 2) = 2/6 = 1/3. (b)-(c) Answers will vary. 7.8 (a) P(O ~X~ 0.4) = 0.4. (b) P(0.4 ~X~ 1) = 0.6. (c) P(0.3 ~X~ 0.5) = 0.2. (d) P(0.3 <X
< 0.5) = 0.2. (e) P(0.226 ~X~ 0.713) = 0.713-0.226 = 0.487. (f) A continuous distribution
assigns probability 0 to every possible outcome. In this case, the probabilities in (c) and (d) are
the same because the events differ by 2 possible values, 0.3 and 0.5, each of which has
probability 0.

5
7.9 (a) P(p z 0.45) = P(z z 0.4 -0.4) = P(Z z 2.08) = 0.0188. (b) P(p < 0.35) =
0.024
7.3 (a) "At least one nonword error" is the event {X 2:1} or {X>O}. P(X 2:1) = 1 - P(X<1) = 1-
P(X=O) = 1- 0.1 = 0.9. (b) The event {X,; 2} is "no more than two nonword errors," or "fewer p(z z 0.3 5 -0.4)= P(Z < -2.08) = 0.0188. (c) P(0.35
0.024.
~ p ~ 0.45)= P(-2.08,; Z ~ 2.08) =
than three nonword errors." P(X,; 2) =(X= 0) + P(X = 1) + P(X = 2) = 0.1 + 0.2 + 0.3 = 0.6.
P(X < 2) = P(X = 0) + P(X = 1) = 0.1 + 0.2 = 0.3. 0.9812-0.0188 = 0.9624.

7.4 The probability histograms are shown below. The distribution of the number of rooms is 7.10 Answers will vary. For a sample of 400 observations from the N(0.4, 0.024) distribution,
roughlx symmetric for owners (graph on the left) and skewed to the right for renters (graph on there were 9 values below 0.35. Thus, the relative frequency is 9/400 = 0.0225, which is close to
j:he right). The center is slightly over 6 units for owners and slightly over 4 for renters. Overall, but slightly higher than the value from Exercise 7.9 (b).
units tend to have fewer rooms than i;i~~~
7.11 (a) The 36 possible pairs of"up faces" are (1, 1) (1, 2)(1, 3) (1, 4)(1, 5) (1, 6)(2, 1)(2, 2) I
(2, 3) (2, 4) (2, 5) (2, 6) (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6) (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6) !I
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6) (6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6) (b) Each pair must have
probability 1/36. (c) Let X= sum of up faces. The sums, outcomes, probabilities, and a
probability histogram are shown below.
Sum Outcomes Probability
X=2 (1, 1) p = 1/36
X=3 (1,2)(2,1) p = 2/36
X= 4 (1, 3) (2, 2) (3, 1) p = 3/36
X= 5 (1, 4) (2, 3) (3, 2) (4, 1) p =4/36
X= 6 (1, 5) (2, 4) (3, 3) (4, 2) (5, 1) p = 5/36
X= 7 (1, 6) (2, 5) (3, 4) (4, 3) (5, 2) (6, 1) p = 6/36
7.5 (a) "The unit has five or more rooms" can be written as {X z 5}. P(X 2 5) = P(X = 5) + P(X p = 5/36
I
X= 8 (2, 6) (3, 5) (4, 4) (5, 3) (6, 2)
= 6) + P(X=7) + P(X=8) + P(X=9) + P(X = 10) = 0.868. (b) The event {X> 5} is "the unit has X= 9 (3, 6) (4, 5) (5, 4) (6, 3) p = 4/36
more than five rooms." P(X > 5) = P(X = 6) + P(X = 7) + P(X=8) + P(X=9) + P(X = 10) = X= 10 (4, 6) (5, 5) (6, 4) p = 3/36
0.658. (c) A discrete random variable has a countable number of values, each of which has a X= 11 (5, 6)(6, 5) p = 2/36
distinct probability (P(X = x)). P(X z 5) and P(X > 5) are different because the first event 2 p = 1/36
contains the value X = 5 and the second does not.

7.6 (a) P(T=2) = 1-0.37 = 0.63 and P (T=3) = 0.37x0.63 = 0.2331. (b) P(T~) is the
probability that no more than two people will pass on your message.
P(T,; 4) = P(T = 2)+ P(T = 3)+ P(T =4) = 0.63+ 0.37x 0.63+ 0.37 2 x 0.63 0.9493. =
7.7 (a) P(X < 0.49) = 0.49. (b) P(X,; 0.49) = 0.49. Note: (a) and (b) are the same because
there is no area under the curve at any one particular point. (c) P(X 2 0.27) = 0.73. (d) P(0.27
<X< 1.27) = P(0.27 <X< 1) = 0.73. (e) P(0.1,; X,; 0.2 or 0.8,; X,; 0.9) = 0.1 + 0.1 = 0.2. (f)
P(not [0.3,; X,; 0.8]) = 1 - 0.5 = 0.5. Or P(O,; X< 0.3 or 0.8 <X,; 1) = 0.3 + 0.2 = 0.5 (g) P(X
172 Chapter 7
T Random Variables 173

P(X=6) + P(X=7) + P(X=8) + P(X=9) + P(X=IO) + P(X=ll) + P(X=l2) = 0.983. (d) P(X > 6) =
(d) P(X=7 or X=11) = 6/36 + 2/36 = 8/36 or 2/9. (e) P (any sum other than 7) = P(X j 7) =I -
I- 0.010-0.007-0.007 = 0.976. Or P(X > 6) = P(X=7) + P(X=8) + P(X=9) + P(X=IO) +
P(X = 7) = I - 6/36 = 30/36 = 5/6.
P(X=ll) + P(X=l2) = 0.976. (e) Either X~ 9 or X> 8. The probability is P(X ~ 9) = P(X=9) +
7.12 (a) All of the probabilities are between 0 and I, and both sets of probabilities sum to I. (b) P(X=IO) + P(X=II) + P(X=12) = 0.068 + O.o70 + 0.041 + 0.752 = 0.931.
Both distributions are skewed to the right. However, the event {X= I} has a much higher
probability in the household distribution. This reflects the fact that a family must consist of two 7.16 (a) LetS= {student supports funding} and 0 ={student opposes funding}. P(SSO) =
or more persons. A closer look reveals that all of the values above one, except for 6, have 0.6x0.6x0.4 = 0.144. (b) The ~ossible combinations are SSS, SSO, SOS, OSS, SOO, OSO,
2
slightly higher probabilities in the family distribution. These observations and the fact that the OOS, and 000. P(SSS) = 0.6 = 0.216, P(SSO) = P(SOS) = P(OSS) = 0.6 x0.4 = 0.144,
3
mean and median numbers of occupants are higher for families indicates that family sizes tend to P(SOO) = P(OSO) = P(OOS) = 0.6x0.42 = 0.096, and P(OOO) = 0.4 = 0.064. (c) The
probability distribution of X is given in the table below. The probabilities are found by adding
than household sizes in the U.S.
the probabilities from (b). For example, P(X =I)= P(SSO or SOS or OSS) = 0.144 + 0.144 +
0.144 = 3x0.144 = 0.432. (d) The event "a majority of the advisory board opposes funding" can
be written as {X 2:: 2} or {X> 1}. The probability of this event is P(X 2:: 2) = 0.288 + 0.064 =
0.352.
ValueofX 0 I 2 3
Probability 0.216 0.432 0.288 0.064

7.17 (a) The height should be 112 or 0.5 since the area under the curve must be 1. A graph of the
curve is shown

'

''
7.13 (a) "More than one person lives in this household" can be written as {Y > I} or {Y ~ 2}.
P(Y> !)= 1-P(Y= 1)=0.75. (b) P(2<Y:;;4)=P(Y=3)+P(Y=4)=0.32. (c)P(Y;t2)= I
- P(Y = 2) = I - 0.32 = 0.68.

7.14 (a) All of the probabilities are between 0 and I and they add to I. A probability histogram
is shown
(b) P(Y:;; I)= Jx0.5 = 0.5. (c) P(0.5 < Y < 1.3) = 0.8x0.5 = 0.4. (d) P(Y ~ 0.8) = J.2x0.5 = 0.6.

7.18 (a) The area of a triangle is .!..xbxh = 0.5x2xl = 1. (b) A sketch is shown below. P(Y <I)
2
= 0.5xJxl = 0.5. (c) A sketch is shown below. P(Y < 0.5) = 0.5x0.5x0.5 = 0.125.

(b) The event {X 2:: I means that the household owns at least one car. P(X 2:: I)= P(X =I)+ u
P(X = 2) + P(X=3) + P(X=4) + P(X = 5) = 0.91. Or P(X 2:: I)= I- P(X <!)=I- P(X = 0) 1- = (d) Answers will vary. In one simulation 94 of the 200 sums were less than I, and 20 of the 200
0.09 = 0.91. (c) P(X > 2) = P(X = 3) + P(X = 4) + P(X = 5) = 0.20, so 20% of households own
sums were less than 0.5. Thus, the relative frequencies are 0.47 and 0.1, respectively. These
more cars than a two-car garage can hold.
values are close to the theoretical values of0.5 and 0.125 in parts (b) and (c).
7.15 (a) All of the probabilities are between 0 and I and they add to 1. (b) 75.2% of fifth-graders
7.19 Answers will vary. The resulting histogram should approximately resemble the triangular
eventually finished twelfth grade. (c) P(X 2: 6) = I- 0.010- 0.007 = 0.983. Or P(X 2: 6) =
density curve of Figure 7 .8, with any deviations or irregularities depending upon the specific
174 Chapter 7 Random Variables 175

random numbers generated. Two histograms, one example from computer software (left) and
another from a are shown below. 7.23 The expected number of girls is,ux = :~:::X,p, = o(i)+l(i)+2(D+3G) = 1.5 and the
?;'~
- -\ variance is a-~ = 2)x,- f'J' p = (0-1.5) G )+(1-1.5) (~)+(2-1.5) 2 (i) + (3-1.5)
1
2 2 2
G)= 0.75
/
so the standard deviation is ax =0.866 girls.
7
/ "': ....... 7.24 The mean grade is ,u = OxO.Ol + 1x0.05 + 2x0.30 + 3x0.43 + 4x0.21 = 2.78.
~~ IN 7.25 The mean for owner-occupied units is ,u= (1)(0.003) + (2)(0.002) + (3)(0.023) + (4)(0.104)
.

+ (5)(0.210) + (6)(0.224) + (7)(0.197) + (8)(0.149) + (9)(0.053) + (10)(0.035) = 6.284 rooms.


The mean for renter-occupied units is J1 = (1)(0.008) + (2)(0.027) + (3)(0.287) + (4)(0.363) +
(5)(0.164) + (6)(0.093) + (7)(0.039) + (8)(0.013) + (9)(0.003) + (10)(0.003) = 4.187 rooms.
7.20 (a)P(p?:0.16) = P( Z?: 0.~~0~~;15 ) = P(Z?:I.09) = 1-0.8621 =0.1379. The larger value of ,u for owner-occupied units reflects the fact that the owner distribution was
symmetric, rather than skewed to the right, as was the case with the renter distribution. The
< '< 0 16) =P (0.14-0.15 <Z< 0.16-0.15) =P(-1.09<Z<l09)=08621-
(b)p(o 14 -P- "center" of the owner distribution is roughly at the central peak class, 6, whereas the "center" of
0.0092 0.0092 - - . . the renter distribution is roughly at the class 4. A comparison of the centers (6.284 > 4.187)
0.1379 = 0.7242. matches the observation in Exercise 7.4 that the number of rooms for owner-occupied units
tended to be higher than the number of rooms for renter-occupied units.
7.21 Ans;-ve~s w!ll vary. One possibility is to simulate 500 observations from the N(0.15,
0.0092) dJstnbutwn. The required TI-83 commands are as follows: 7.26 If your number is abc, then of the 1000 three-digit numbers, there are six-abc, acb, hac,
ClrList L 1 bca, cab, cba-for which you will win the box. Therefore, you win nothing with probability
randNorm (0.15, 0.0092, 500)---+ L 1 994/1000 = 0.994 and $83.33 with probability 6/1000 = 0.006. The expected payoff on a $1 bet
sortA(L 1) is J.l = $Ox0.994 + $83.33x0.006 = $0.50. Thus, in the long run, the Tri-State lottery commission
Scrollin~ through the 500 simulated ob~ervations, we can determine the relative frequency of will make $0.50 per play of this lottery game.
observatwns that are at least 0.16 by usmg the complement rule. For one simulation, there were
435 observations less than 0.16, thus the desired relative frequency is 1 - 435/500 = 65/500 = 7.27 (a) The payoff is either $0, with a probability of0.75, or $3, with a probability of0.25. (b)
0.13. The actual probability is P(p?: 0.16)= 0.1379. 500 observations yield a reasonably close For each $1 bet, the mean payoff is f1x= ($0)(0.75) + ($3)(0.25) = $0.75. (c) The casino makes
approximation. 25 cents for every dollar bet (in the long run).

7.22 The table below shows the possible observations of Y that can occur when we roll one 7.28 In Exercise 7.24, we computed the mean grade of ,u = 2.78. Thus, the variance is
standard die and one "weird" die. As in Exercise 7.11, there are 36 possible pairs of faces; "~ =co- 2.78)' (o.ot) + (1- 2.78)' ( o.o5) + (2- 2.78)' (o.3o) + (3- 2.78)' ( 0.43) + c4- 2.78)' ( o.21) =
however, a number of the pairs are identical to each other. 0.7516 and the standard deviation is ax= 0.8669.
Standard Die
I 2 3 4 5 6
7.29 The means are: fln =1x0.25 + 2x0.32 + 3x0.17 + 4x0.15 + 5x0.07 + 6x0.03 + 7xO.Ol =
0123456
0123456 2.6 people for a household and flF =I xO + 2x0.42 + 3x0.23 + 4x0.21 + 5x0.09 + 6x0.03 +
7x0.02 = 3.14 people for a family. The standard deviations are: a~ = (1- 2.6) x0.25 + (2-
2
Weird 0123456
Die 6 7 8 9 10 11 12 2 2
2.6)2 x 0.32 + (3 - 2.6)2 xO.l7 + (4 - 2.6)2 x0.15 + (5- 2.6) x0.07 + (6- 2.6) x0.03 + (7-
6 7 8 9 10 11 12
6 7 8 9 10 11 12 =
2.6ix0.01 = 2.02, and a" = ~2.02 1.421 people for a household and a~ = (1 - 3.14i(O) + (2
2
The possible values ofY are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12. Each value ofY has - 3.14) 2(0.42) + (3- 3.14)2(0.23) + (4- 3.14i(0.21) + (5- 3.14i(0.09) + (6- 3.14) (0.03) + (7
probability 3/36 = 1/12. - 3.14i(0.02) = =
1.5604, and a F = ~!.5604 1.249 people for a family. The family distribution
has a slightly larger mean than the household distribution, matching the observation in Exercise
7.12 that family sizes tend to be larger than household sizes. The standard deviation for
I
176 Chapter 7 Random Variables 177

households is only slightly larger, mainly due to the fact that a household can have only 1 sample averages for 3, 4, and 5 dice converge to 10.5, 14, and 17.5, repectively. (d) The table is
person. shown below.
Number of Dice Ex ected sum
7.30 We would expect the owner distribution to have a slightly wider spread than the renter I 3.5
distribution. Even though the distribution of renter-occupied units is skewed to the right, it is 2 7
more concentrated (contains less variability) about the "peak" than the symmetric distribution for 3 10.5
owner-occupied units. Thus, the average distance between a value and the mean is slightly 4 14
larger for owners. The variances and standard deviations are: u~ = (1 - 6.284)2 x0.003 + (2- 5 17.5
2 2
6.284) x0.002 + (3- 6.284) x0.023 + (4- 6.284) 2 xO.J04 + (5- 6.284) 2 x0.210 + (6- The greatest number of dice possible for this apple! is 10 with an expected value of 35. The
2
6.284ix0.224 + (7- 6.284) x0.197 + (8- 6.284) 2 x0.149 + (9- 6.284) 2 x0.053 + (10- expected sum is 3.5x(the number of dice).
6.284ix0.035 = 2
=
2.68934 and u 0 1.6399 rooms for owner-occupied units and <T~ =(I-
7.36 The relative frequencies obtained from invoices can be viewed as means. As more invoices
4.187ix0.008 + (2- 4.187) x0.027 + (3- 4.187) 2 x0.287 + (4- 4.187) 2 x0.363 + (5-
are examined, the relative frequencies should converge to the probabilities specified by Benford.
4.187ix0.164 + (6- 4.187) 2 x0.093 + (7- 4.187) 2 x0.039 + (8 -"4.187)2 x0.013 + (9-
The Law of Large Numbers does not say anything about a small number of invoices, but the
2
4.187ix0.003 + (10- 4.187) x0.003 = =
1.71003 and u. 1.3077 rooms for renter-occupied regularity in the relative frequencies will become apparent when a large number of invoices are
units. examined.

7.31 The graph for X~= 10 displays visible variation for the first ten sample averages, whereas 7.37 (a) The probability distribution for the new random variable a+bXis shown below.
the graph for x_ = 100 gets closer and closer to p = 64.5 as the number of observations a+bX 5 8 17
increases. This illustrates that as the sample size (represented by the integers in L 1) increases, the P(a+bX) 0.2 0.5 0.3
sample mean converges to (or gets closer to) the population mean p = 64.5. (In other words, this (b) The mean of the new variable is Pa+bX =5x0.2 + 8x0.5 + 17x0.3 = 10.1, and the variance is
exercise illustrates the law oflarge numbers graphically.) a'a+bX = (5-10 1)2 xO ' 2 + (8-10.1) 2 x0.5 + (17-IO.Jix0.3 = 21.69. (c) The mean of X is f.lx =
2.7. Using Rule 1 for means, the mean of the new variable is f.la+bX =a +bf.Jx = 2 + 3x2.7 = 10.1,
7.32 (a) The wheel is not affected by its past outcomes-it has no memory; outcomes are so the variance will stay the same as the calculation shown in part (b). (d) The variance of X
independent. So on any one spin, black and red remain equally likely. (b) The gambler is wrong is a~ = 2.41 , so Rule I for variances implies that the variance of the new variable is
again. Removing a card changes the composition of the remaining deck, so successive draws are
not independent. If you hold 5 red cards, the deck now contains 5 fewer red cards, so your
u;+bX =b'a~ =32 x 2.41 =21.69. This is exactly the same as the variance we obtained in part (b),
chance of another red decreases. so var(2 + 3X) = a;+bx = 9var(X)= 21.69. (e) Using the rules is much easier than using the
definitions. The rules are quicker and enable users to avoiding tedious calculations where
7.33 Below is the probability distribution for L, the length of the longest run of heads or tails. mistakes are easy to make.
P(You win)= P(run of I or 2), so the expected outcome is p = $2xO.l738 + -$Jx0.8262 =
-$0.4786. On the average, you will lose about 48 cents each time you play. (Simulated results 7.38 (a) Independent: Weather conditions a year apart should be independent. (b) Not
should be close to this exact result; how close depends on how many trials are used.) independent: Weather patterns tend to persist for several days; today's weather tells us
something about tomorrow's. (c) Not independent: The two locations are very close together,
Value ofL I 2 3 4 5 6 7 8 9 10 and would likely have similar weather conditions.
Probability I 88 185 127 63 28 12 5 2 1
7.39 (a) Dependent: since the cards are being drawn from the deck without replacement, the
512 512 512 512 512 512 512 512 512 512
nature of the third card (and thus the value of Y) will depend upon the nature of the first two
cards that were drawn (which determine the value of X). (b) Independent: X relates to the
7.34 No, the TV commentator is incorrectly applying the law oflarge numbers to a small
outcome of the first roll, Y to the outcome of the second roll, and individual dice rolls are
number of at bats for Tony Gwynn.
independent (the dice have no memory).
7.35 (a) The expected result of a single die is 3.5. The green mean of the apple! does not agree
7.40 The total mean is 40 + 5 + 25 = 70 minutes.
with the expected sum. As the number of tosses increases, the mean fluctuates less and stabilizes
close to the expected sum. This is called the Law of Large Numbers. (b) The expected result for
7.41 (a) The total mean is II+ 20 = 31 seconds. (b) No, the mean time required for the entire
two dice is 7. Again, the mean fluctuates and then stabilizes close to the expected sum. (c) The
operations is not changed by the decrease in the standard deviation. (c) The standard deviation
178 Chapter 7 Random Variables 179

for the total time to position and attach the part is ,)22 + 42 =4.4721 seconds. 7.45 (a) Randomly selected students would presumably be unrelated. (b) The mean ofthe
difference JlF-M = JlF -I'M = 120 - I 05 = 15 points. The variance of the difference is
7.42 (a) The total resistance T = R1 + R, is Normal with mean 100 + 250 = 350 ohms and
a~-M =a~+ a! = 28 + 35 = 2009, so the standard deviation of the difference is
2 2
standard deviation ,J2.s' + 2.8 2 =3.7537 ohms. (b) The probability is P(345,; T,; 355) =
aF-M = .J2009 =44.8219 points. (c) We cannot find the probability based on only the mean and
p( 345 350
-
3.7537
< z,;
355 350
-
3.7537
) = P(-1.332,; Z,; 1.332)= 0.9086-0.0914 = 0.8172 (Table A standard deviation. Many different distributions have the same mean and standard deviation.
Many students will assume normality and do the calculation, but we are not given any
gives 0.9082-0.0918 = 0.8164). information about the distributions of the scores.

7.43 (a) The mean is Jlx = Ox0.03 + I xO.l6 + 2x0.30 + 3x0.23 + 4x0.17 + 5xO.ll = 2.68 toys. 7.46 (a) The mean for the first die (X) is Jlx= I xl/6 + 3xl/6 + 4xl/6 + 5x 1/6 + 6x!/6 + 8xl/6
The variances of X is a~ = (0- 2.68)2 x0.03 +(I - 2.68)2xO.I6 + (2- 2.68)2 x0.30 + (3- = 4.5 spots. The mean for the second die (Y) is f.lr = I x 1/6 + 2x 1/6 + 2x 1/6 + 3x 1/6 + 3x 1/6 +
2
2.68) x0.23 + (4- 2.68ixO.l7 + (5- 2.68) 2 x0.11 =
1.7176, so the standard deviation is 4xl/6 = 2.5 spots. (b) The table below gives the possible values ofT= total sum of spots for the
=
ax= ,)1.7176 1.3106 toys. (b) To simulate (say) 500 observations of X, using the Tl-83, we two dice. Each of the 36 possible outcomes has probability 1/36.
will first simulate 500 random integers between I and I 00 by using the command: Die #I
randlnt(l,I00,500)---+ L 1 1 3 4 5 6 8
The command sortA(L 1) sorts these random observations in increasing order. We now identify 1 2 4 5 6 7 9
500 observations of X as follows: integers I to 3 correspond to X= 0, integers 4 to 19 2 3 5 6 7 8 10
correspond to X = I, integers 20 to 49 correspond to X = 2, integers 50 to 72 correspond to X = Die 2 3 5 6 7 8 10
3, integers 73 to 89 correspond to X= 4, and integers 90 to 100 correspond to X= 5. For a #2 3 4 6 7 8 9 11
sample run of the simulation, we obtained 12 observations of X= 0, 86 observations of X= I, 3 4 6 7 8 9 11
!55 observations of X= 2, 118 observations of X= 3, 75 observations of X= 4, and 54 4 5 7 8 9 10 12
observations of X = 5. These data yield a sample mean and standard deviation of x = 2.64 toys The probability distribution ofT is
and s = 1.291 toys, very close to J1 x and ax . t 2 3 4 5 6 7 8 9 10 11 12
P(T=t) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
7.44 (a) Let X denote the value of the stock after two days. The possible combinations of gains (c) Using the distribution for (b), the mean is Jlr = 2xl/36 + 3x2/36 + 4x3/36 + 5x4/36 + 6x5/36
and losses on two days are presented in the table below, together with the calculation of the + 7x6/36 + 8x5/36 + 9x4/36 + !Ox3/36 +II x2/36 + !2xl/36 = 7 spots. Using properties of
corresponding values of X. means (the mean of the sum is the sum of the means) from (a), f.lr = Jlx + Jlr = 4.5 + 2.5 = 7
1st day 2nd day Value of X spots.
Gain 30% Gain 30% 1000 + 0.3x!OOO = 1300
1300 + 0.3x1300 = 1690
7.47 (a) The mean temperature iSJlx = 550C. The variance is a~= 32.5, so the standard
Gain30% Lose25% 1000 + 0.3x!OOO = 1300
1300- 0.25x1300 = 975 deviation is ax= ,)32.5 =5.7009 C. (b) The mean number of degrees off target is 550- 550 =
Lose 25% Gain 30% 1000- 0.25x!OOO = 750 0C, and the standard deviation stays the same, 5.7009C, because subtracting a constant does
750 + 0.3x750 = 975
Lose 25% Lose25% 1000- 0.25x!OOO = 750 not change the variability. (c) In degrees Fahrenheit, the mean is Jlr = ~ Jlx +32 = 1022 F and
750- 0.25x750 = 562.50
Since the returns on the two days are independent and P(gain 30%) = P(lose 25%) = 0.5, the
probability of each of these combinations is 0.5x0.5 = 0.25. The probability distribution of X is
the standard deviation isrrr = ~(~ J ~ =
a', = ( )rr, 10.2616 F.
therefore
X ]690 975 562.5 7.48 Read two-digit random numbers from Table B. Establish the correspondence 01 to 10 ~
P(X- x) 0.25 0.5 0.25 540, II to 35 ~545, 36 to 65 ~ 550, 66 to 90 ~ 555, and 91 to 99, 00 ~ 560. Repeat many
The probability that the stock is worth more than $1000 is P(X = 1690) = 0.25. (b) The mean times, and record the corresponding temperatures. Average the temperatures to approximate
value of the stock after two days is Jlx = 1690x0.25 + 975x0.5 + 562.5x0.25 = 1050.625, or Jlx; find the standard deviation of the temperatures to approximate ax. In one simulation with
approximately $1051.
180 Chapter 7
T
I Random Variables 181

200 repetitions, the sample mean of 550.03C is very close to flx and the standard deviation of I scores, 218,219, and 220, at the value of219, the probability provided will be divided by three.
5.46C is slightly smaller than O"x . I Thus, the approximate probability that Blaylock would score exactly 218 is 0.24/3 = 0.08. Thus,
P(X ~ 218) = 0.07 + 0.16 + 0.23 + 0.08 = 0.54. The probability that Blaylock's score would be
7.49 (a) Yes. The mean of a sum is always equal to the sum of the means. (b) No. The no more than 220 is P(X ~ 220) = 0.07 + 0.16 + 0.23 + 0.24 = 0.70. According to this
probabilitydistribution,P(209~X~218) = P(X~218) =0.54.
variance of the sum is not equal to the sum of the variances, because it is not reasonable to
assume that X and Y are independent.
I
i 7.53 Let V =vault, P =parallel bars, B =balance beam, and F =floor exercise. early's
7.50 (a) The machine that makes the caps and the machine that applies the torque are not the expected score is flv+P+B+F = flv + flp + flB + flF = 9.314 + 9.553 + 9.461 + 9.543 = 37.871
same. (b) Let T denote the torque applied to a randomly selected cap and S denote the cap 2
points. The variance of her total score is a~+P+B+F =a;+ O"; + ai +a~ 2
strength. Tis N(7, 0.9) and Sis N(IO, 1.2), soT-S is Normal with mean 7- 10 = -3 inch- = 0.216 + 0.122 +

pounds and standard deviation ../0.9 2 + 1.22 = 1.5 inch-pounds. Thus, P(T > S) = P(T- S > O) = 0.203 2 + 0.0992 = 0.!126, so av+P+B+F = .Jo.ll26 = 0.3355points. The distribution ofCarly
P(Z > 2) = 0.0228. Patterson's total score Twill be N(37.871, 0.3355). The probability that she will beat the score of
38 211 37 871
38 211 is P(T > 38.211) = P(z > - ) = P(Z > 1.0134) = 0.1554 (Table A gives
7.51 (a) The variance of the number of trucks and SUVs isO"~ = (0- 0.7) 2 x0.4 + (1- 0.7)2 x0.5 . 0.3355
2
+(2-0.7) xO.l =0.41 so O"v =.J0.41=0.6403vehicles. (b) Thevarianceoftotalsalesis 0.1562).
a~+Y = O"l + O"i = 0.89 + 0.41 = 1.3, so the standard deviation of total sales is 7.54 (a) The 16 possible outcomes are shown in the table below, with Ann's choice first and
O"x+Y = .Jl3 = 1.1402 vehicles. (c) The variance of Linda's estimated earnings is Bob's choice second.
(A, A) (A, B) (A, C) (A, D) (B, A) (B, B) (B, C) (B,D)
a:sox+4oov =350'a~ +400'0"i = 350' x0.89+4002 x0.41 = 174,625, so the standard deviation is
0 2 -3 0 -2 0 0 3
(}3SOX+400Y =../174,625 =$417.88. (C, A) (C, B) (C, C) (C, D) (D, A) (D,B) (D,C) (D, D)
3 0 0 -4 0 -3 4 0
7.52 Let Land F denote the respective scores of Leona and Fred. The difference L- F has a (b) The values of X, Ann's winnings on a play, are listed below each possible outcome above.
Normal distribution with mean flL-F = 24- 24 = 0 points and standard deviation (c) The probability distribution of X is shown below.
X ~ ~ ~ 0 2 3 4
aL-F = ../2' + 2' = 2.8284 points. The probability that the scores differ by more than 5 points is P(X- x) 1116 2116 1116 8/16 1116 2/16 1116

2~8~~4 ) = P(IZI > 1.7678) = 0.0771 (Table A gives 0.0768).


(d) The mean winnings isf1x = $0, because the distribution is symmetric about 0. Thus, the
P(IL- Fl > 5) = P(1z1 >
game is fair. The variance is O"~ = (-4)2 xll16 +(-3) 2 x2/16 + (-2) xl/16 + 0 x8116 + 2 xll16
2 2 2

CASE CLOSED! + 32 x2/16 + 42 xl/16 = 4.75, so the standard deviation of the winnings is ax = .J4.75 = $2.18.
I. The random variable X of interest is the possible score in the golf tournament.
7.55 The missing probability is 0.99058 (so that the sum is 1). The mean earnings is
2. Yes, all of the probabilities are between 0 and I, and they sum to 1. flx = $303.35.

3. The expected score is JJ.x = 210x0.07 + 213x0.16 +216x0.23 + 219x0.24 + 222xO.l7 + 7.56 The mean 11 x of the company's "winnings" (premiums) and their "losses" (insurance
225x0.09 + 228x0.03 + 231x0.01 = 218.16 strokes. claims) is about $303.35. Even though the company will lose a large amount of money on a
small number of policyholders who die, it will gain a small amount from many thousands of21-
4. The variance is a~ = (210-218.16) 2 x0.07 + (213-218.!6) 2 xO.l6 + (216-218.16) 2 x0.23 + year-old men. In the long run, the insurance company can expect to make $303.35 per insurance
2 policy. The insurance company is relying on the Law of Large Numbers.
(219-218.!6) X0.24 + (222-218.16) 2 xO.I7 + (225-218.16) 2 x0.09 + (228-218.!6) 2 x0.03 +
2
(231-218.16) xO.Ol = 21.4344 and the standard deviation is ax= .J21.4344 = 4.6297 strokes.
7.57 The variance is a~ = 94,236,826.64, so the standard deviation is O" x = $9707.57.
5. To find the probability that Blaylock's score would be 218 or less, the probability that she
would score exactly 218 needs to be approximated. Since the discrete distribution includes three 7.58 (a) Using properties of means, the mean ofZ is flz = 0.5f1x +0.5f1r = 0.5x$303.35 +
0.5x$303.35 = $303.35. Using properties of variances, the variance ofZ is
182 Chapter 7
T
I Random Variables 183

a; = 0.25al + 0.25a~ = 0.5x94,236,826.64 = 47, 118,413.32, so the standard deviation is


I P(X=I)=l/2
P(X = 2) = (l/2)x(l/2) = 1/4
a 2 = ~0.5al = $6864.29. (b) For 4 men, the expected value of the average income is P(X = 3) = (l/2)x(l/2)x(l/2) = 1/8
Jlz = 0.25J1xl + 0.25J1x2 +0.25J1xl +0.25J1x4 = $303.35; the same as it was for one policy and two P(X = 4) = (l/2)x(l/2x(l/2)x(l/2) + (l/2)x(l/2)x(l/2)x(l/2) = 1/16 + 1/16 = 1/8
policies. The variance of the average income is Thus, the probability distribution of X is
a; = 0.0625alI + 0.0625al2 + 0.0625al3 +0.0625ax'4 = 0.25ax'I = 23,559,206.66, so the X I I 2 3 4

standard deviation is a z = ~0.25al = $4853.78 (smaller by a factor of I/ .fi ).

7.59 The distribution of the difference X- Y is N(O, .Jo.3' + 0.3' ) "'N(O, 0.4243) so
7.63 (a) A single random digit simulates each toss, with (say) odd= heads and even= tails. The
P(IX- Yi :2: 0.8) = P(IZI :2: 1.8856) = 0.0593 (Table A gives 0.0588). first round is two digits, with two odds a win; if you don't win, look at two more digits, again
with two odds a win. Using a calculator, you could use randlnt(O, I, 2) which provides 2 digits
7.60 (a) The mean profit is Jlx= I xO.l + ].5x0.2 + 2x0.4 + 4x0.2 + !OxO.l = $3 million. The either a 0 (tail) or !(head). (b) Using a calculator, in 50 plays (remember, unless you win, a
"play" consists of"4 tosses of the coin" or 2 simulations of obtaining 2 random numbers) I
variance is ai = (1-3) 2 xO.l + (1.5-3) 2 x0.2 + (2-3) 2 x0.4 + (4-3) 2 x0.2 + (!0-3) 2 xO.l = 6.35,
obtained 25 wins for an estimate of$0. (c) The monetary outcome X can be $1 or -$1. To win a
so the standard deviation is ax= .J6.35 = $2.5199million. (b) The mean and standard deviation dollar, you can win on the first round by getting 2 heads or by winning on the second round by
ofY are Jlr = 0.9 Jlx -0.2 = 0.9 x $3-0.2 = $2.5 million and not getting 2 heads on the first round, and then getting two heads on the second round. So the

a, =~0.9'ai =.J0.9 2 x6.35 =$2.2679million. probability of winning is . !. + (~) (..!..) = 2_. So, the expected value is
4 4 4 16

7.61 (a) The mean of the difference Y-X is Jlr-x = J1y- Jlx = 2.001-2.000 = O.OO!g. The ($1)(2_)+(-$1)(i_) = _2_= $0.125.
16 16 16
variance of the difference is a~-x =a~+ a~ = 0.002 + 0.001
2 2
= 0.000005 so a,_x=
7.64 (a) The value of d 1 is 2x0.002 = 0.004 and the value of d, is 2xO.OOI = 0.002. (b) The
0.002236g. (b) The expected value of the average is Jlz =..!.. Jlx +..!_ Jlr = 2.0005g. The variance
2 2 standard deviation of the total length X+ Y + Z is a x+Y+z = .Jo.ool' + 0.002 2 +0.0012 = 0.0024,
of the avearge a; = .!..,.i + .!..,.~ = 0.00000125, so the standard deviation is a = 0.001118 g.
4 4
2 so d = 0.005 -considerably less than d 1 + 2d2 = 0.008. The engineer was incorrect.
The average Z is slightly more variable than the reading Y, since a z > a y.

7.62 (a) To do one repetition, start at any point in Table Band begin reading digits. As in
Example 6.6, let the digits 0, I, 2, 3, 4 =girl and 5, 6, 7, 8, 9 =boy, and read a string of digits
until a "0 to 4" (girl) appears or until four consecutive "5 to 9"s (boys) have appeared, whichever
comes first. Then let the observation of X= number of children for this repetition= the number
of digits in the string you have read. Repeat this procedure 25 times. (b) The possible outcomes
and their corresponding values of X= number of children are shown in the table below.

Outcome
G (first child is a girl) X=!
BG (second child is a girl) X=2
BBG (third child is a girl) X=3
BBBG, BBBB (fouth child is a girl of four boys) X=4
Since births are independent and B and G are equally likely to occur on any one birth, we can use
our basic probability rules to calculate
184 Chapter 8 The Binomial and Geometric Distributions 185

ChapterS
(a) P(X =II)= G~)co.8) 11 co.2)9 =0.0074 (b) P(X = 20) = G~)co.8) 20 (0.2)0 =o.OII5
8.1 Not binomial: There is not fixed number of trials n (i.e., there is no definite upper limit on
the number of defects) and the different types of defects have different probabilities.
(c) P(X < 20) = 1-P(X = 20) =l-O.Oll5 = 0.9985
8.2 Yes: I) "Success" means person says "Yes" and "failure" means person says "No." (2) We 8.11 Let X= the number of Hispanics on the committee. X is B(l5, 0.3).
have a fixed number of observations (n = 100). (3) It is reasonable to believe that each response
is independent of the others. (4) It is reasonable to believe each response has the same
3
(a) P(X = 3) = (':}0.3) (0.7)
12
=0.1701 0 15
(b) P(X = 0) =(':}0.3) (0.7) = 0.0047
probability of "success" (saying "yes") since the individuals are randomly chosen from a large
city. 8.12 Let X= the number of men called. X is B(30, 0.7).
8.3 Yes: I) "Success" means reaching a live person and "failure" is any other outcome. (2) We (a) P(X = 20) =G~)co.7) 20 (0.3) 10 =0.1416 (b) P(I'' woman is the 4th call)=
have a fixed number of observations ( n = 15). (3) It is reasonable to believe that each call is
independent of the others. (4) Each randomly-dialed number has chance p = 0.2 of reaching a
live person.
3
(0.7) (0.3) =0.1029.
8.4 Not binomial: There is no fixed number of attempts (n). 8.13 Let X= the number of children with blood type 0. X is B(5, 0.25). (a) P(X =2) =
binompdf(5, 0.25, 2) = 0.2637. (b) A table with the values of X, the pdf, and the cdfis shown
8.5 Not binomial: Because the student receives instruction after incorrect answers, her below
probability of success is likely to increase. X 0 I 2 3 4 5
pdfP(X) 0.2373 0.3955 0.2637 0.0879 0.0146 0.0010
8.6 The number who say they never have time to relax has (approximately) a binomial cdfF(X) 0.2373 0.6328 0.8965 0.9844 0.9990 1.0000
distribution with parameters n = 500 andp = 0.14. I) "Success" means the respondent "never
has time to relax" and "failure" means the respondent "has time to relax." (This is a good (c) The probabilities given in the table above for P(X) add to I. (d) A probability histogram is
example to point out why "success" and "failure" should be referred to as labels.) 2) We have a shown below on
fixed number of observations (n = 500). 3) It is reasonable to believe each response is
independent of the others. 4) The probability of"success" may vary from individual to
individual (think about retired individuals versus parents versus students), but the opinion polls
provide a reasonable approximation for the probability in the entire population.

8.7 Let X= the number of children with type 0 blood. X is B(5, 0.25).
2
P(X = 3) =G)co.25)\o.75) = 10(0.25)\0.75)2 =o.0879
8.8 Let X= the number of broccoli plans that you lose. X is B(l 0, 0.05).
(e) See the probabilities. Cumulative distribution histograms are
P(X o> 1) = P(X =0)+ P(X = 1) = ('~}0.05) 0 (0.95) 10 +('~}0.05)1(0.95)9 shown below for the number of children with type 0 blood (left) and the number of free throws
made (right). Both cumulative distributions show bars that "step up" to one, but the bars in the
= (0.95)10 + 10(0.05)(0.95)' =0.9139 cumulative histogram for the number of children with type 0 blood get taller sooner. That is,
there are fewer steps and the steps are bigger.
8.9 Let X= the number of children with blood type 0. X is B(5, 0.25).

P(X "21) =1- P(X =0) = 1-(~}0.25) 0 (0.75) 5 = 1-(0.75) 5 1-0.2373 =0.7627
=
8.10 Let X= the number of players who graduate. X is B(20, 0.8).
186 Chapter 8
T The Binomial and Geometric Distributions 187

8.19 (a) np ~ 2500x 0.6 = 1500, n(i- p) ~ 2500x0.4= 1000; both values are greater than 10, so
the conditions are satisfied. (b) Let X = the number of people in the sample who find shopping
frustrating. X is B(2500, 0.6). Then P(X ~ 1520) = 1- P(X :s; 1519) =I- binomcdf(2500, 0.6,
1519) =I- 0.7868609113 = 0.2131390887, which rounds to 0.213139. The probability correct
to six decimal places is 0.213139. (c) P(X :s; 1468) ~ binomcdf(2500, 0.6, 1468) = 0.0994.
Using the Normal approximation to the binomial, P(X :s; 1468) = 0.0957, a difference of0.0037.

8.20 Let X be the number of 1'sand 2's; then X has a binomial distribution with n = 90 and p =
0.477 (in the absence of fraud). Using the calculator or software, we findP(X :> 29) =
binomcdf(90, 0.477, 29)= 0.0021. Using the Normal approximation (the conditions are
satisfied), we find a mean of 42.93 and standard deviation of a= .J90x0.477x 0.523 = 4.7384.
8.14 Let X= the number of correct answers. X is B(50, 0.5). (a) P(X:::: 25) = I - P(X :S 24) = 1
- ) = P(Z,; -2.94) ~ 0.0016. Either way, the
29 42 93
- binomcdf(50, 0.5, 24) = 1 - 0.4439 = 0.5561. (b) P(X:::: 30) = I - P(X :S 29) = 1- binomcdf Therefore, P(X :s; 29) = P(z;:;
(50,0.5,29) = 1-0:8987=0.1013. (c)P(X:C:32)= 1-P(X:S31)= 1-binomcdf(50,0.5,31) 4.7384
= 1- 0.9675 = 0.0325. probability is quite small, so we have reason to be suspicious.

8.15 (a) L"etX =the number of correct answers. X isB(10, 0.25). The probability of at least
one correct answer is P(X ~ 1) = 1- P(X = 0) = 1- binompdf(I0,0.25,0) = 1-0.0563 = 8.21 (a) The mean is.Ux ~ np ~ 20x0.8 = 16. (b) The standard deviation is
0.9437. (b) Let X =the number of correct answers. P(X ~ 1) = I- P(X = 0). P(X = 0) is the ax ~ ~(20)(0.8)(0.2) = .J32 = 1.7889. (c) Ifp = 0.9 then ax~ .J20x 0.9x 0.1 = 1.3416, and ifp
probability of getting none of the questions correct, or every question wrong. Note that this is
not a binomial random variable because each question has a different probability of a success. = 0.99 then ax ~ .J20x 0.99x 0.01 = 0.4450. As the probability of"success" gets closer to I the
The probability of getting the first question wrong is 2/3, the second question wrong is 3/4 and standard deviation decreases. (Note that asp approaches 1, the probability histogram of the
the third question wrong is 4/5. The probability of getting all of the questions wrong is P(X = 0) binomial distribution becomes increasingly skewed, and thus there is less and less chance of
= (2/3)x(3/4)x(4/5) = 0.4, because Erin is guessing so the responses to ditlerent questions are getting an observation an appreciable distance from the mean.)
independent. Thus, P(X :::: I) = 1 - P(X = 0) = 1 - 0.4 = 0.6.
8.22 If H is the number of home runs, with a binomial(n = 509, p = 0.116) distribution, then H
8.16 (a) Yes: 1) "Success" means having an incarcerated parent and "failure" is not having an has mean .UH = np = 509x 0.116 = 59.0440 and standard deviation
incarcerated parent. (2) We have a fixed number of observations (n = 100). (3) It is reasonable a H ~ .J509x 0.116x 0.884 = 7.2246 home runs. Therefore,
to believe that the responses of the children are independent. (4) Each randomly selected child
has probability p = 0.02 of having an incarcerated parent. (b) P(X = 0) is the probability that P(H ~ 70) =. P ( Z ~ 70-59.0440)
_ ~ P(Z ~!.52)~ 0.0643. Usmg . a calculator or software, we
none of the 100 selected children has an incarcerated parent. P(X = 0) = binompdf(lOO, 0.02, 0) 7 2246
= 0.1326 and P(X= 1) = binompdf(IOO, 0.02, 1) = 0.2707. (c) P(X:::: 2) = 1- P(X :S I)= 1- find that the exact value is l-binomcdf(509, 0.116, 69) ~ 0.0763741347 or about 0.0764.
binomcdf(100, 0.02, 1) = I- 0.4033 = 0.5967. Alternatively, by the addition rule for mutually
exclusive events, P(X:::: 2) =I- (P(X = 0) + P(X = 1)) =I- (0.1326 + 0.2707) = 1-0.4033 = 8.23 (a) Let X ~the number of people in the sample of 400 adults from Richmond who approve
0.5967. of the President's response. The count X is approximately binomial. I) "Success" means the
respondent "approves" and "failure" means the respondent "does not approve." 2) We have a
8.17 LetX=thenumberofplayerswhograduate. XisB(20,0.8). (a)P(X=I1)= fixed number of observations (n ~ 400). 3) It is reasonable to believe each response is
binompdf(20, 0.8, 11) = 0.0074. (b) P(X = 20) = binompdf(20, 0.8, 20) = 0.8
20
=
0.0115. (c) independent of the others. 4) The probability of"success" may vary from individual to
P(X :s; 19) = I- P(X~20) = I - 0.0015 = 0.9985. individual (think about people with affiliations in different political parties), but the national
survey will provide a reasonable approximate probability for the entire nation. (b) P(X :> 358) =
1 2 8 binomcdf(400, 0.92, 358) = 0.0441. (c) The expect number of approvals is .Ux = 400x0.92 =
8.18 (a)n=10andp=0.25. (b) P(X=2)=( ;)<o.25) (0.75) =binompdf(l0,0.25,2) =
368 and the standard deviation is ax ~ .J400 x 0.92 x 0.08 ~ ,/29.44 =5.4259 approvals. (d) Using
0.2816. (c) P(X :s; 2) = P(X ~ O)+P(X ~I)+P(X ~ 2)= binomcdf(lO, 0.25, 2) = 0.5256.
the Normal approximation, P(X :s; 358) = P(z;:;
358 368
- ) ~ P(Z :s; -1.84) = 0.0329, a
5.4259
188 Chapter 8
T The Binomial and Geometric Distributions 189
1"
i
difference of 0.0112. The approximation is not very accurate, but note that p is close to 1 so the (b) Using the calculator, we find P( X > 24) = 1- P (X ,;; 24) = 1 - binomcd:l{30, 0.65, 24) = 1
exact distribution is skewed. -0.9767 = 0.0233. (c) Using the Normal approximation, we find
24 19 5
8.24 (a) The mean is f.lx = np = 1500x0.12 = 180blacks and the standard deviation is P(X > 24) = P(z > - ) = P(Z > 1.72) = 0.0427. The Normal approximation is not very
2.6125
0'x = .J1500 x 0.12 x 0.88 = 12.5857 blacks. The Normal approximation is quite safe: n xp = 180 good in this situation, because nx(l-p) = 10.5 is very close to the cutoff for our rule of thumb.
and nx(1- p) = 1320 are both more than 10. We compute The difference between the two probabilities in (b) and (c) is 0.0194. Note that the simulation
165 180 195 180 provides a better approximation than the Normal distribution.
P(165,;; X,;; 195) = P( - ,;; Z,;; - ) = P( -1.19,;; Z o> 1.19) = 0.7660. (Exact
12.5857 12.5857
computation of this probability with a calculator or software gives 0.7820.) 8.29 Let X= the number ofOs among n random digits. X is B(n, 0.1). (a) When n = 40, P(X =
4) = binompdf(40, 0.1, 4) = 0.2059. (b) When n = 5, P(X 2 I)= I- P(X = 0) = 1- (0.9) 5 = I
8.25 The command cumSum (L2) ~ L3 calculates and stores the values ofP(X o> x) for x = 0, 1, -0.5905 = 0.4095.
2, ... , 12. The entries in L 3 and the entries in L4 defined by binomcd:l{12,0.75, Lt) ~ L4 are
identical. 8.30 (a) The probability of drawing a white chip is 15/50 = 0.3. The number of white chips
in 25 draws is B(25, 0.3).Therefore, the expected number of white chips is 25x0.3 = 7.5.
8.26 (a) Answers will vary. The observations for one simulation are: 0, 0, 4, 0, 1, 0, 1, 0, 0, and (b) The probability of drawing a blue chip is 10/50 = 0.2. The number of blue chips in
1, with a sum of7. For these data, the average is :X= 0.7. Continuing this simulation, 10 sample 25 draws is B(25, 0.2). Therefore, the standard deviation of the number of blue chips is
means were obtained: 0.7, 0.6, 0.6, 1.0, 1.4, 1.5, 1.0, 0.9, 1.2, and 0.8. The mean of these sample .J25 x 0.2 x 0.8 = 2 blue chips. (c) Let the digits 0, I, 2, 3, 4 => red chip, 5, 6, 7 => white chip, and
means is 0.97, which is close to 1, and the standard deviation of these means is 0.316, which is 8, 9 =>blue chip. Draw 25 random digits from Table Band record the number of times that you
close to 0.9847 I .JlO = 0.3114. (Note: Another simulation produced sample means of0.8, 0.9, get chips of various colors. Using the calculator, you can draw 25 random digits using the
0.5, 0.9, 1.4, 0.5, 1.6, 0.5, 1.0, and 1.8, which have an average of0.99 and a standard deviation of command randlnt (0, 9, 25)--> Ll. Repeat this process 50 times (or however many times you
0.468. There is more variability in the standard deviation.) (b) For n = 25, one simulation like) to simulate multiple draws of25 chips. A sample simulation of a single 25-chip draw using
produced sample means of 1.5, 2.2, 3.2, 2.1, 3.2, 1.7, 2.6, 2.7, 2.4, and 2.5, with a mean of2.41 the TI-83 ielded the followin result:
and a standard deviation of0.563. For n =50, one simulation produced sample means of 4.3, 0 1 2 3
5.5, 5.0, 4.7, 5.0, 5.1, 4.7, 3.8, 4.7, and 6.3, with a mean of 4.91 and a standard deviation of
0.672. (c) As the number of switches increases from 10 to 25 and then 50, the sample mean also This corresponds to drawing 14 red chips, 4 white chips, and 7 blue chips.
increases from 1 to 2.5 and then 5. As the sample size increases from 10 to 25 and then from 25 (d) The expected number of blue chips is 25x0.2 = 5, and the standard deviation is 2. It is very
to 50, the spread of :X values increases. The number of simulated samples stays the same at 10, likely that you will draw 9 or fewer blue chips. The actual probability is binomcdf(25, 0.2, 9) =
0.9827. (e) You are almost certain to draw 15 or fewer blue chips; the probability is binomcdf
but 0' changes from .J10x O.lx 0.9 = 0.9847to .J25x0.1x0.9 = 1.5 and then
(25, 0.2, 15) = 0.999998 .
.J50xO.lx0.9 =2.1213.
8.31 (a) A binomial distribution is not an appropriate choice for field goals made by the National
8.27 (a) Let S denote the number of contaminated eggs chosen by Sara. S has a binomial Football League player, because given the different situations the kicker faces, his probability of
distribution with n = 3 and p = 0.25; i.e., Sis B(3, 0.25) (b) Using the calculator and letting 0 => success is likely to change from one attempt to another. (b) It would be reasonable to use a
a contaminated egg and I, 2 or 3 =>good egg, simulate choosing 3 eggs by Randlnt(O, 3, 3). binomial distribution for free throws made by the NBA player because we have n = 150
Repeating this 50 times leads to 30 occasions when at least one of the eggs is contaminated; attempts, presumably independent (or at least approximately so), with chance of success p = 0. 8
P(S ~ 1) = = 0.6. (c) P(S z1) = 1- P(S = 0) = I - binompdf(3, 0.25, 0) = I - (0.75i =
30 each time.
50
0. 5781. The value obtained by simulation is close to the exact probability; the difference is 8.32 (a) Yes: I) "Success" means the adult "approves" and "failure" means the adult
0.0219. "disapproved." 2) We have a fixed number of observations (n = 1155). 3) It is reasonable to
believe each response is independent of the others. 4) The probability of"success" may vary
8.28 (a) We simulate 50 observations of X= the number of students out of 30 with a loan by from individual to individual, but a national survey will provide a reasonable approximate
using the command randBin (I, 0.65, 30)--> Ll: sum (Ll). Press ENTER 50 times. Then sort probability for the entire nation. (b) Not binomial: There are no separate "trials" or "attempts"
the list from largest to smallest using the command SortD(L 1) (this command is found on the TI being observed here. (c) Yes: Let X= the number of wins in 52 weeks. I) "Success" means Joe
83/84 under Stat--> EDIT--> 3:SortD) and then look to see how many values are greater than 24. "wins" and "failure" means Joe "loses." 2) We have a fixed number of observations (n =52). 3)
Only one of the simulated values was greater than 24, so the estimated probability is 1/50 = 0.02.
190
Chapter 8
T
I The Binomial and Geometric Distributions 191

I 3 0.2013
The results from one week to another are independent. 4) The probability of winning stays the
same from week to week.
I 4 0.0881
5 0.0264
I 6 0.0055
8.33 (a) Answers will va . A table of counts is shown below. I
101 107 113 119 120 126 132 138 142 146 7 0.0008
Line Number
3 5 6 3 2 3 2 3 4 9 I 8 0.0001
9 0.000004
10 0.000000
(h) The expected number of correct answers is !Ox0.25 = 2.5.
I
8.36 Let X= the number of truthful persons classified as deceptive. X is B(l2, 0.2). (a) The
probability that the lie detector classifies all 12 as truthful is

P(X = 0) = c: )co.2) 0 (0.8)


12
= 0.0687, and the probability that at least one is classified as

deceptive is P(X<': 1) = 1-P(X = 0}= 1-0.0687 =0.9313. (b) The mean number who will be
classified as deceptive is ]2x0.2 = 2.4 applicants and the standard deviation is
.J12x0.2x 0.8 = 1.3856 applicants. (c) P(X:;; 2.4) = P( X:;; 2} = 0.5583, using binomcdf(l2,
The sample mean for these I 0 is 4 zeros and the standard deviation. ~bout .2.16 zeros.
The distribution is clearly skewed to the right. (b) The number of zeros IS bmomml because I) 0.2, 2).
"Success" is a digit of zero and "failure" is any other digit. 2) The number of digits on each line
is n = 40. We are equally likely to get any of the I 0 digits in any position so 3) the trials are 8.37 In this case, n = 20 and the probability that a randomly selected basketball player graduates
independent and 4) the probability of"success" is p = 0.1 for each digit examined. (c) As the is p =0.8. We will estimate P(X :0: II) by simulating 30 observations of X =number graduated
number of lines used increases, the mean gets closer to 4, the standard deviation becomes closer and computing the relative frequency of observations that are 11 or smaller. The sequence of
to 1.8974 and the shape will still be skewed to the right because we are sim~lating ~ ~ino~ial calculator commands are: randBin(I,0.8,20) ~ L,: sum(L 1) ~ L2(1), where I 's represent
distribution with n = 40 and p = 0.1. (d) Dan is right, the number of zeros m 400 dtgtts Will be players who graduated. Press Enter until 30 numbers are obtained. The actual value of P(X ,;
approximately normal with a mean of 40 and a standard deviation of 6. As n increas?s, . . 11) is binomcdf(20, 0.8, 11) = 0.0100.
nx 0.1 > 10 and n x 0.9 > 10 so the conditions are satisfied and we can use a Normal dtstnbutwn
with a mean of nx0.1 and a standard deviation of .Jnx0.1 x0.9 to approximate the Binomial(n, 8.38 (a) I) "Success" is getting a response, and "Failure" is not getting a response. 2) We have
0.1) distribution. However, the conditions are not satisfied for one line so th~ ~imulated a fixed number of trials (n = 150). 3) It is reasonable to believe that the business decisions to
distribution will not become approximately normal, no matter how many additional rows are respond or not are independent. 4) The probability of success (responding) is 0.5 for each
business. (b) The mean is 150x0.5 = 75 responses. (c) The approximate probability is
examined.

8.34 (a)n=20andp=0.25. (b)Themeanis ,u=20x0.25=5. (c) Theprobabilityofgetting


P(X :0: 70)= P( Z :0: ~~~~;~) = P(Z :0: -0.82} = 0.2061; using unrounded values and software
yields 0.2071. The exact probability is about 0.2313. (d) Use 200, since 200x0.5 = 100.
exactly five correct guesses is P(X = 5) = ( 20) (0.25) 5 (0.75)15 =. 0.2023.
5
8.39 (a) Let X= the number of auction site visitors. X is B(l2, 0.5). (b) P(X = 8) =
8.35 Let X= the number of correctly answered questions. X is B(IO, 0.2). (a) P(X = 0) = binompdf(l2, 0.5, 8) = 0.1209; P(X:::: 8) = I - P(X ::; 7) = I- binomcdf(l2, 0.5 ,7) = I -
0.1074. (b) P(X = !) = 0.2684. (c) P(X = 2) = 0.3020. (d) P(X :0: 3) = 0.8791. (e) P(X > 3) = 0.8062 = 0.1938.
1 - P(X,; 3) = 1- 0.8791 = 0.1209. (f) P(X = 10) = 0.0000! (g) A probability distribution
table is shown below 8.40 (a) Let X= the number of units where antibodies are detected. X is B(20, 0.99) (b) The
X P(X) probability that all 20 contaminated units are detected is

(~~)(0.99) 20 (0.01) 0 = 0.8179, using binompdf(20, 0.99, 20), and the probability
0 0.1074
1 0.2685 P(X = 20) =
2 0.3020
192 Chapter 8
T
I The Binomial and Geometric Distributions 193
I.

I
that at least one unit is not detected is P(X < 20) = P(X ~ 19) == 0.1821, using binomcdf{20, 8.43 (a) 1) "Success" is a defective hard drive, "failure" is a working hard drive, and a trial is a
0.99, 19). (c) The mean is 19.8 units and the standard deviation is 0.445 units.
I test of a hard drive, 2) the probability of a defective hard drive is p = 0.03, 3) the observations,
results of the tests on different hard drives, are independent, and 4) we are waiting for the first
defective hard drive. The random variable of interest is X= number of hard drives tested in
8.41 (a) Yes: 1) "success" is getting a tail, "failure" is getting a head, and a trial is the flip of a
coin, 2) the probability of getting a tail on each flip is p = 0.5, 3) the outcomes on each flip are order to find the first defective. (b) P(X =5) = (l-0.03t' x 0.03 = 0.0266 (c) The first four
independent, and 4) we are waiting for the first tail. (b) No: The variable ofinterest is the entries in the table for the pdf of X are shown below.; P(X = x) for p =0.03 and X= 1, 2, 3 and
number of times both shots are made, not the number of trials until the first success is obtained. 4 = 0.03, 0.0291, 0.0282 and 0.0274.

I P~) I 0.~31 0.~91 I 0.0;821 0.0~741


(c) Yes: 1) "success" is getting a jack, "failure" is getting something other than a jack, and a trial
is drawing of a card, 2) the probability of drawing a jack is p = 4/52 = 0.0769, 3) the
observations are independent because the card is replaced each time, and 4) we are waiting for
the first jack. (d) Yes: 1) "success" is matching all 6 numbers, "failure" is not matching all 6 8.44 The probability of getting the first success on the fourth trial for parts (a), (c), and (d) in
numbers, and a trial is the Match 6 lottery game on a particular day, 2) the probability of winning 4-1

Exercise8.41 is:(a) P(X=4)=(1-0.St'x0.5=0.0625,(c) P(X=4)= ( !-__) x--=0.0605


is p = ( ~) =0.000.000142, 3) the observations are independent, and 4) we are waiting for the 52 52 '

first win: (e) No: the probability of a "success," getting a red marble, changes from trial to trial
4
and (d) P(X=4)=(1-y(: )f (~)x =0.0000001.

because the draws are made without replacement. Also, you are interested in getting 3 successes,
rather than just the first success.
8.45 (a) The random variable of interest is X= number of flips required in order to get the first
8.42 (a) 1) "Success" is rolling a prime number, "failure" is rolling number that is not prime, head. X is a geometric random variable with p= 0.5. (b) The first five possible values of the
and a trial is rolling a die, 2) the probability of rolling a prime is p = 0.5, 3) the observations, random variable and their corresponding probabilities are shown in the table below
outcomes from rolls, are independent, and 4) we are waiting for the first prime number. (b) The X 1 2 3 4 .5
first five possible values of the random variable and their corresponding probabilities are shown PX 0.5 0.25 0.125 0.0625 0.03125
in the table below F(X) 0.5 0.75 0.875 0.9375 0.96875
X 1 2 3 4 5 A probability histogram is shown below (left). (d) The cumulative probabilities are shown in the
P(X) 0.5 0.25 0.125 0.0625 0.03125 third row
F(X) 0.5 0.75 0.875 0.9375 0.96875 ~
.. histogram IS shown below (left). (d) The cumulative probabilities are shown in
(c) A probability
row of the table below 1""'ht 1

8.46 (a) P(X>IO)= ( 1- 1 )" =0.4189. (b) P(X>10)=1-P(X~10)= 1-geometcdf{l/12, 10)


12
=1-0. 5811 = 0.4189.
(e) f(o.s)' =~=1. 8.47 (a) The cumulative probabilities for the first 10 values of X are: 0.166667, 0.305556,
il 1-0.5
0.421296, 0.517747, 0.598122, 0.665102, 0.720918, 0.767432, 0.806193, and 0.838494. A
cumulative probability histogram is shown below.
194
Chapter 8
T
I The Binomial and Geometric Distributions 195

I
The relative frequencies are not far away from the calculated probabilities of0.75, 0.1875, and
I 0.046875 in part (d). Obviously, a larger number of trials would result in better agreement
because the relative frequencies will converge to the corresponding probabilities.
I
I 8.51 (a) Geometric: 1) "Success" is selecting a red marble, "failure" is not selecting a red
marble, and a trial is a selection from the jar, 2) the probability of selecting a red marble is
I
p= ~~ = * = 0.5714, 3) the observations, results of the marble selection, are independent because
I the marble is placed back in the jar after the color is noted, and 4) we are waiting for the first red
marble. The random variable of interest is X= number of marbles you must draw to find the
first red marble. (b) The probability of getting a red marble on the second draw is
(b)P(X>10)=(1- =0.1615or1-P(X:S10)= 1-0.8385=0.1615. (c)
Using the calculator, geometcdf(1/6,25) = 0.989517404 and geometcdf(1/6,26) = 0.9912645033, P(X = 2) = ( 1- * f' ( *) = 0.2449 . The probability of drawing a red marble by the second draw is
so the smallest positive integer k for which P( X,; k) > 0.99 is 26. I
8.48 Let X= the number of applicants who need to be interviewed in order to find one who is
P(X,; 2)= P(X =1)+ P(X = 2) = *+(% )(*) =0.8163. The probability that it takes more than 2

fluent in Farsi. X is a geometric random variable withp = 4% = 0.04. (a) The exrecte~ number
I draws to get a red marble is P(X> 2)=(1-*J =0.1837. (c) Using TI-83 commands:
of interviews in order to obtain the first success (applicant fluent in Farsi) is f1 = p = 0 _04 = 25
I seq(X,X,I,20)~L~, geompdf(417,LI)~L 2 and geomcdf(4/7,L 1 )~L 3 [or cumsum(L2 )~L 3 ]. The
40
(b) P(X > 25) = (1 - 0.04i 5 = (0.96)25 = 0.3604; P(X > 40) = (0.96) = 0.1954.
I first ten possible values of the random variable X and their corresponding probabilities and
cumuJ" ative probbT"
a !Illes areshown .m thetabJe below.
8.49 (a) We must assume that the shots are independent, and the probability o~success.~s t~e . X I 2 3 4 5 6 7 8 9 10
same for each shot. A "success" is a missed shot, so the p = 0.2. (b) The first success (miss) IS P(X) 0.5714 0.2449 0.105 0.045 0.0193 0.0083 0.0035 0.0015 0.0007
I 0.0003
the sixth shot, so P(X = 6) =(I -p)"" 1(p) = (0.8)5 x0.2 = 0.0655. (c) P(X,; 6) = 1 - P~X> 6) =I F(X) 0.5714
.. 0.8163 0.9213 0.9663 0.9855 0.9938 0.9973
.. 0.9989 0.9995 0.9998

t
- (1 - p = 1- (0.8)6 = 0.7379. Using a calculator, P(X,; 6) = geometcdf(0.2, 6) = 0.7379. (d) A probability histogram (left) and a cumulative probability histogram (nght) are shown

8.50 (a) There are 23 = 8 possible outcomes, and only two of the possible _outco~es (HHH and I
TTT) do not produce a winner. Thus, P(no winner)= 2/8 = 0.25. (b) P(wmner) -.1- P(no
winner)= 1-0.25 = 0.75. (c) Let X= number ofrounds_(tosses) until_so~eone wms. 1)
"Success" is getting a winner, "failure" is not getting a wmner, and a tnal1s one roun~ (each I
person tosses a coin) of the game, 2) the probabili~y ofsucces~ is 0.75, 3) ~he observatlo?s are I
independent, and 4) we are waiting for the first wm. Thus, X IS a geometnc rand?m vanabl~ ...
(d) The first seven possible values of the random variable X and their correspondmg probabilities
and cumulative prob ab1T!ties
are shown m the table below
I
1 2 3 4 5 6 7
X
P(X) 0.75 0.1875 0.046875 0.011719 0.00293 0.000732 0.000183
F(X) 0.75 0.9375 0.98438 0.99609 0.99902 0.99976 0.99994
(e) P(X:::; 2)- 0.75 + 0.1875 0.9375. (f) P(X > 4) (0.25) 0.0039. (g) TheexPected 8.52 (a) No. Since the marbles are being drawn without replacement and the population (the set
of all marbles in the jar) is so small, the probability of getting a red marble is not independent
number of rounds for someone to win is p = ~ = 0.~ 5 = 1.3333. (h) Let 1 =>heads and 0 => from draw to draw. Also, a geometric variable measures the number of trials required to get the
first success; here, we are looking for the number of trials required to get two successes. (b) No.
tails, and enter the command randlnt (0, I, 3) and press ENTER 25 times. In a simulation, we Even though the results of the draws are now independent, the variable being measured is still
recorded the following firequenc1es:
.
not the geometric variable. This random variable has a distribution known as the negative
3
X
Freq.
1
21
2
3 I I binomial distribution. (c) The probability of getting a red marble on any draw is
p=
20
= = 0.5714. Letthe digits 0, 1, 2, 3 =>a red marble is drawn, 4, 5, 6 =>some other color
Relative Freq. 0.84 0.12 O.o4
I 35 7

I iI
I

J
T
'

196 Chapter 8 The Binomial and Geometric Distributions 197

marble is drawn, and 7, 8, 9 =; digit is disregarded. Start choosing random digits from Table B, then the average number of girls per family is 2- I~ I. (c) Let an even digit represent a boy,
or use the TI-83 command randlnt (0, 9, I) repeatedly. After two digits in the set 0, I, 2, 3 have and an odd digit represent a girl. Read random digits until an even digit occurs. Count number of
been chosen, stop the process and count the number of digits in the set {0, 1, 2, 3, 4, 5, 6} that digits read. Repeat many times, and average the counts. Beginning on line I 0 I in Table B and
have been chosen up to that point; this count represents the observed value of X. Repeat the simulating 50 trials, the average number of children per family is 1.96, and the average number
process until the desired number of observations of X has been obtained. Here are some sample of girls is 0.96. These averages are very close to the expected values.
simulations using the TI- 83 (with R ~red marble, 0 ~other color marble, D ~disregard):
7 o 4 3 x~3 8.55 Letting G ~girl and B ~boy, the outcomes are: {G, BG, BBG, BBBG, BBBB}. A
D R 0 R "success" is having a girl. (b) The random variable X can take on the values ofO, I, 2, 3 and 4.
9 o 8 6 2 x~3 The multiplication rule for independent events can be used to obtain the probability distribution
D R D 0 R table for X below
9 7 3 2 x~2 etc. X 0 I 2 3 4
P(X)
D

X
D R R
For30 repetitions, we recorddth
e
2
e 11
o owmg fr equenc1es:
3 4 5 6 7 8
GJ GJ= GJ =~ G)' =I~ G)'= I~
Note that LP(X) = 1. (c) Let Y ~number of children produced until first girl is born. Then Y is
Freq. 16 5 5 3 0 0 I
Relative Freq. 0.5333 0.1667 0.1667 0.1 0.0 0.0 0.0333 a geometric variable for Y = I to 4 but not for values greater than 4 because the couple stops
.. ..
A simulated probab1hty h1stogram for the 30 repetitiOns IS shown below. The Simulated having children after 4. Note that BBBB is not included in the event Y= 4. The multiplication
distribution is skewed to the right, just like the probability histogram in Exercise 8.51, but the rule f<or m
d e en dent events can be used to ob tam
the probability distribution table for Y below.
y 1 2 3 4
two distributions are not the same.
P(Y)
GJ (H = GJ =~ (H =~~
Note that th1s IS not a vahd d1stnbutwn smce LP(Y) <I. The difficulty lies in the fact that one
of the possible outcomes, BBBB, cannot be written in terms ofY. (d) IfT is the total number of
children m
the family, then the probability distribution ofT is shown in the table below.
T I 2 3 4
P(T)
GJ GJ= GJ =~ GJ\(H =~~=~
The expected number of children for th1s couple IS f1r = I xO.S + 2x0.25 + 3x0.!25 + 4xO.l25 =
1.875. (e) P(T > 1.875) ~ I - P(T =I)~ 0.5 or P(T > 1.875) = P(T ~ 2) + P(T = 3) + P(T ~ 4) ~
8.53 "Success" is getting a correct answer. The random variable of interest is X~ number of 0.25 + 0.!25 + 0.125 ~ 0.5. (f) P(having a girl)= I- P(not having a girl)~ I - P(BBBB) ~ 1-
questions Carla must answer until she gets one correct. The probability of success is p ~ 1/5 ~ 0.0625 ~ 0.9375. '
0.2 (all choices are equally likely to be selected). (b) P(X ~ 5) ~ (1-o.2t' x 0.2= 0.0819. (c)
8:56 Let 0, I, 2, 3, 4 =;Girl and 5, 6, 7, 8, 9 =;Boy. Beginning with line 130 in Table B, the
P(X > 4) ~ (1-0.2) = 0.4096. (d) The first five possible values of the random variable X and their
4
Simulated values are:
690 I 51 I 64 I 81 1 7871 1 74 1 o 951 784
BBG BG BG BG BBBG BG G BBG BBG
3 2 2 2 4 2 3 3

(e) The expected number of questions Carla must answer to get one correct is f1 x ~ - 1- ~ 5. 53 4 0 64 89872 0 I 972 4 50 50
0.2
G G G BG BBBBG G G BBG G BG BG
8.54 (a) If"success" is having a son and the probability of success is p ~ 0.5, then the average 2 2 5 3 2 2.
1
number of children per family is f1 = - - = 2. (b) The expected number of girls in this family is 0 7I 663 2 81
f1 = 2x 0.5=I. 0.5
Alternatively, if the average number of children is 2, and the last child is a boy,
B
1
BG
2
BBG
3
G
I
BG
2
II
Chapter 8
T
I

The Binomial and Geometric Distributions 199


!98

The average number of children is 52/25 = 2.08, which is close to the expected value of 1.875. guess correctly 88 or more times in 288 trials. (Note: A formal hypothesis test of H 0 : p = 0.25
versus H 1 : p > 0.25 yields an approximate test statistic of Z = 2.18 and an approximate P-value
8.57 Find the mean of25 randomly generated observations of X; the number of ~hildren in the of 0.0 15, so we have statistically significant evidence that Olof is doing better than guessing.
family. We can create a suitable string of random digits (say of length 100) by usmg the Students willleam more about hypothesis testing procedures very soon.) (c) The probability of
command randint(0,9,100)--+L 1 Let the digits 0 to 4 re?resent a '.'boy" and 5,~o. 9 represe~~ a " making a correct guess on the first draw is 1/4 = 0.25. If you are told that your first guess is
"girl." Scroll down the list and count the number of children until you get a g~rl number or 4 incorrect, then the revised probability of making a correct guess on the second draw is 1/3 =
boy numbers in a row," whichever comes first. The number you have counted IS X, the ~u.mbe~ 0.3333. If you are then told that your second guess is incorrect, then the revised probability of
of children in the family. Continue until you have 25 values of X. The average for repetitions IS making a correct guess is 1/2 = 0.5. Finally, if you are told that your first three guesses are
incorrect, then your revised probability of making a correct guess is 1/1 = I. (d) The sum of the
x = 45 =1.8, which is very close to the expected value of fl = 1.875.
probabilities of guessing correctly on the first, second, third, and fourth draws without
25
replacement is 1/4 + 1/3 + 1/2 +I= 2 1112 or about 2.0833. Thus, the proportion of correct
guesses would be 2.0833/4 or about 0.5208. The expected number of correct guesses without
replacement is 288x0.5208=150, which is much higher than the expected number of correct
guesses without replacement, 288x0.25=72. If some of the computer selections were made
without replacement, then the expected number of correct guesses would increase. (e) Let X=
number of runs in precognition mode. Since the computer randomly selected the mode
(precognition or clairvoyance) with equal probability, X is B(l2, 0.5). The probability of getting
=
3 or fewer runs in precognition mode is P(X::; 3) = binomcdf(12, 0.5, 3) 0.073, somewhat
unlikely.

8.59 (a) Not binomial: the opinions of a husband and wife are not independent. (b) Not
binomial: "success" is responding yes, "failure" is responding no, and a trial is the opinion
expressed by the fraternity member selected. We have a fixed number of trail, n = 25, but these
trial are not independent and the probability of "success" is not the same for each fraternity
member.

8.60 Let N be the number of households with 3 or more cars. Then N has a binomial
(c) From Chapter 4, the power law model is Y = aXP, we transform the data by taking the
logarithm of both sides to obtain the linear model log(Y) =log(a)+ plog(X) Thus, we need to distribution with n = 12 and p = 0.2. (a) P(N = 0) = C~J(o.2)" (0.8Y' =(0.8)" =0.0687.
compute the logarithms of X andY. (d) A scatterplot of the transformed data is sho~ above on
the right. Notice, that the transformed data follow a linear pattem. (e) The correlatiOn for ~he P(N "?. 1) = 1-P(N = 0) = 1-(0.8)
12
=0.9313. (b) The mean is ,u = np = 12x0.2 =2.4and the
transformed data is approximately r =-I. (f) The estimated power function is Y= x-t = X = v'12x 0.2x 0.8 =1.3856
standard deviation is a= ~np(I- p) households. (c)

(g) The power function illustrates the fact that the mean of a geometric random variable is equal P(N > 2.4) = I-P(N,; 2) =1-0.5583 = 0.4417.
I
to the reciprocal of the probability p of success: fl =- 8.61 (a) The distribution of X will be symmetric; the shape depends on the value of the
p
probability of success. Whenp = 0.5, the distribution of X will always be symmetric. (b) The
values of X and their corresponding probabilities and cumulative probabilities are shown in the
CASE CLOSED! .
(a) The proportion of successes for all trials is 8~1288 =
0.3056 .. The pro_!lortwn of successes table below
for high confidence is 55/165 = 0.3333, for med1um confidence IS 12/48- 0.25, and fo~ low X 0 I 2
PfX) 0.0078 0.0547 0.1641 0.2734
3 4
0.2734
5 6
0.1641 0.0547
7
0.0078
confidence is 21/75 = 0.28. IfOlof Jonsson is simply guessing, then we would expect h1s .
proportion of successes to be 0.25. Overall, he has don~ sli.ghtly b~tter than expec~ed, ~s?ecmlly F(X) 0.0078 0.0625 0.2267 0.5000 0.7734
..
0.9375 0.9922 1.0000
when he has high confidence. (b) Yes, this result d~es mdlc.ate evidence ofpsych1c ab1l~~ Let
..
A probability histogram (left) and a cumulative probability histogram (nght) are shown below.
X= number of correct guesses in 288 independent tnals. X IS B(288, 0.25). The probab1hty of
observing 88 or more successes ifOlofis simply guessing is P(X 2: 88) =I- P(X::; 87) =I-
=
binomcdf(288, 0.25, 87) 1 -0.9810 = 0.0190. There is a very small chance that Olof could
T
I The Binomial and Geometric Distributions 201
Chapter 8
200
I
8.64 Let X= the number of schools out of20 who say they have a soft drink contract. X is
binomial with n = 20 and p = 0.62. (a) P(X = 8) = binompdf(20, 0.62, 8) = 0.0249 (b) P(X::; 8)
= binomcdf(20, 0.62, 8) = 0.0381 (c) P(X =:: 4) = 1- P(X::; 3) = 1-0.00002 = 0.99998 (d) P(4
I =
::; X::; 12) = P(X::; 12)- P(X::; 3) = binomcdf(20, 0.62, 12)- binomcdf(20, 0.62, 3) 0.51078
-0.00002 = 0.51076. (e) The random variable of interest is defined at the beginning of the
solution and the probability distribution table is shown below.
X P(X) X P(X)
0 0.000000 11 0.144400
1 0.000000 12 0.176700
2 0.000002 13 0.177415
3 0.000020 14 0.144733
(c) P(X = 7) = 0.0078125 4 0.000135 15 0.094458
5 0.000707 16 0.048161
.1 .1 _ h d) d y = the number oftrials to get the 6 0.002882 17
862 Let X = the result ofthe first toss (0 = tat , - ea an 0.018489

first head (1). (a) A record of the 50 repetitions shown below.
ts 7 0.009405 18 0.005028
X y X Y X Y X Y X Y 8 0.024935 19 0.000863
1111051111 9 0.054244 20 0.000070
0211110202 10 0.097353
!111110204
0404110411 for X is shown below.
!111111102
1105111111
1!03021102
0211111111
0 10 1 1 1 1 1 1 0 4
1102111111
(b) Based on the results above, the probability of gettmg ahe~d~~ 32/~~: ~~!' ~;:dai;:~o!":d
f t ( ) A frequency table for the number of tosses nee e o ge e
~~{: eBa~ed on the frequency table, the probability that the first h~ad appears on ~n odd-
. d (32 + 1 + 2)/50 = 0 70 This estimate is not bad, smce the theorettca1
numbere toss ts
probability is 2/3 or about 0.6667.
Percent
8.65 (a) X, the number of positive tests, has a binomial distribution with parameters n = 1000
y Count
1 32 64.00 and p = 0.004. (b) p= np=1000x0.004 = 4 positive tests. (c) To use the Normal approximation,
2 9 18.00
1 2.00
we need np and n(1- p) both bigger than 10, and as we saw in (b), np = 4.
3
4 5 10.00
5 2 4.00
10 1 2.00 8.66 Let X= the number of customers who purchase strawberry frozen yogurt. Then X is a
N~ 50 binomial random variable with n = 10 and p = 0.2. The probability of observing 5 or more
orders for strawberry frozen yogurt among I 0 customers is P(X =:: 5) = 1 - P(X ::; 4) = 1 -
8.63 Let X= number ofsouthem.ers out.of20 who belie~e they have bee~ ~~~l:db~~o~a~~C20, binomcdf(10, 0.2, 4) = I- 0.9672 = 0.0328. Even though there is only about a 3.28% chance of
Then X is a binomial random vanable wtth n = 20 and P- 0..46. (a) P(X _ P observing 5 or more customers who purchase strawberry frozen yogurt, these rare events do
o 4 6 10) = 0.1652. (b)P(10<X<15)=P(ll ::;X::;14)= bmomcdf(20,?.46, 14) _ occur in the real world and in our simulations. The moral of the story is that the regularity that
: ' 046 10 = 0.9917-0.7209=0.2708orP(10,;X,;15)=:bmomcd~(20,0.46, 15)
bb~nomcddfif((2o'
20 helps us to understand probability comes with a large number of repetitions and a large number
0.46, 9))= 0 9980-0.5557 = 0.4423, depending on your mterpretatwn of of trials.
momc , , . . . _ _ P(X,; 15) = 1- 0.9980 = 0.002.
"between" being exclustve or mclustve. (c) P(X > 15)- I
(d) P(X <8) = P(X,; 7) = binomcdf(20, 0.46, 7) = 0.2241.
202 Chapter 8 203

8.67 X is geometric withp = 0.325. (a) P(X =I)= 0.325. (b) P(X:;; 3) = P(X =I)+ P(X = 2) + Chapter 9
=
P(X = 3) = 0.325+(1-0.325) 2-1(0.325) + (1-0.325)3' 1(0.325) 0.6925. Alternatively, P(X:;; 3)
= =
=I- P(X > 3) =I- 0.675 3 0.6925. (c) P(X > 4) = (1-0.325)4 0.2076. (d) The expected 9.1 (a) p=2.5003 is a parameter (related to the population of all the ball bearings in the
container and x=2.5009 is a statistic (related to the sample of 100 ball bearings). (b) p=7.2%
1
number of times Roberto will have to go to the plate to get his first hit is 11 =.!:_ = -
p
-
0.325
=3.0769, is a statistic (related to the sample of registered voters who were unemployed).
or just over 3 at bats. (e) Use the commands: seq(X,X,I,IO)~Lh geompdf(0.325,LI)~Lz and
geomcdf(0.325,L 1 )~L 3 (f) A probability histrogram (left) and a cumulative probability
9.2 (a) p = 48% is a statistic; p =52% is a parameter. (b) Both x
con trol = 335 and x . _,
expenment.u =
are shown below. 289 are statistics.

9.3 (a) Since the proportion of times the toast will land butter-side down is 0.5, the result of20
coin flips will simulate the outcomes of20 pieces of falling toast (landing butter-side up or
butter-side down). (b) Answers will vary. A histogram for one simulation is shown below (on
the left). The center of the distribution is close to 0.5. (c) Answers will vary. A histogram based
on pooling the work of25 students (250 simulated values of p) is shown below (on the right).
As expected, the simulated distribution of Normal with a center at 0.5.

8.68 (a) By the 68-95-99.7 rule, the probability of any one observation falling within the
interval 11- a to 11 +a is about 0.68. Let X= the number of observations out of 5 that fall
within this interval. Assuming that the observations are independent, X is B(5, 0.68). Thus,
=
P(X = 4) = binompdf(5, 0.68, 4) 0.3421. (b) By the 68-95-99.7 rule, 95% of all observations
fall within the interval 11- 2a to 11 + 2a . Thus, 2.5% (half of 5%) of all observations will fall
above 11+ 2a. Let X= the number of observations that must be taken before we observe one
3
falling above 11 + 2a. Then X is geometric with p = 0.025. Thus, P(X = 4) =(I - 0.025) x0.025
=
= (0.975) 3 x0.025 0.0232.
05
=
(d) Answers will vary, but the standard deviation will be close to ) x 0.5 0.1118. The
20
simulation above for the pooled results for 25 students produced a standard deviation of0.1072.
(e) By combining the results from many students, he can get a more accurate estimate of the
value of p since the value of p approaches p as the sample size increases.

9.4 (a) A histogram is shown below. The center of the histogram is close to 0.5, but there is
considerable variation in the overall shape of the histograms for different simulations with only
10 repetitions.
.....------------------------
Chapter 8 203
202
Chapter9
. . .. - 2 5 a P(X=1)=0.325.(b)P(X~3)=P(X=1)+P(X=2)+
8.67 X IS geometnc wtthp -;}.3 () ) ( _ i"t(0.3 2S) 0.6925. Alternatively, P(X ~ 3)
=
P(X = 3) = 0.325+(1-0.325). (0.32 5 +
1 0 325
= 1 _ P(X > 3) = 1 - 0.675 = 0.6925. (c) P 4
=
(X> ) = (1_ 0 _325 )4 0.2076. (d) The expected
I I .
9.1 (a) JL=2.5003 is a parameter (related to the population of all the hall bearings in the
container and :X=2.5009 is a statistic (related to the sample of 100 ball bearings). (b) p=7.2%
h' fi
number of times Roberto will have to go to the plate to get IS trst hit is - - =
- P - 0.325 3.0769,
JL---
is a statistic (related to the sample of registered voters who were unemployed).
ands se (X X 1 10)~L~o geompdf(0.325,Lt)~L2 and
or just over 3 at bats. (e) Use the cbomb~ h.' tr q ~ (l;ft) and a cumulative probability 9.2 (a) p = 48% is a statistic; p =52% is a parameter. (b) Both :Xmtroi = 335 and :X.,porimwtai =
geomcdf(0.325,Lt)~L3 (f) A pro a 1 tty IS ogr 289 are statistics.
are shown
9.3 (a) Since the proportion oftimes the toast will land butter-side down is 0.5, the result of20
coin flips will simulate the outcomes of20 pieces of falling toast (landing butter-side up or
butter-side down). (b) Answers will vary. A histogram for one simulation is shown below (on
the left). The center of the distribution is close to 0.5. (c) Answers will vary. A histogram based
on pooling the work of25 students (250 simulated values of p) is shown below (on the right).
As Normal with a center at 0.5.

T f observation falling within the


8.68 (a) By the 68-95-9~.7 ~le, ~h~r~~:~ :~h~ :~~~~reofobservations out of5 that fall
interval JL- 0' to fl + 0' IS a out . . . are inde endent X is B(5, 0.68). Thus,

P(X = 4) = bmompdf (5, 0.68, 4)


L
within this i~terval. Assuming th~ t~;~~~~r(:;~~~\he 68_ 9 99 .7 :U1e, 95% of~ll obs~rvations
Th 2 5"; (half of 5%) of all observatwns wtll fall
. h' h . t al u-20' top+20'. us, . ;o
fall wtt m t e Ill erv r f b f that must be taken before we observe one
above fl + 20'. Let X= the nu~ber o o _serv~~on: 0 025 Thus, P(X = 4) = (1 - 0.025ix0.025
falling above fl + 20' . Then X ts geometriC wtt P
=
= (0.975ix0.025 0.0232.
(d) Answers will vary, but the standard deviation will be close to 1
20
=
5 x 0 5 0.1118. The

simulation above for the pooled results for 25 students produced a standard deviation of0.1072.
(e) By combining the results from many students, he can get a more accurate estimate of the
value of p since the value of p approaches p as the sample size increases.

9.4 (a) A histogram is shown below. The center of the histogram is close to 0.5, but there is
considerable variation in the overall shape of the histograms for different simulations with only
10 repetitions.

I
)
204 Chapter 9 Sampling Distributions 205

F" i:L 1
rD" 9.6 (a) There are 45 possible samples of size 2 that can be drawn. The frequency table below
~hows the values of the sample mean for samples of size 2, the number of times that mean occurs
m the 45 samples, and the corresponding percent.
Mean2 Count Percent

- - 60.0
61.5
2
1
4.44
2.22

r 62.0
63.5
64.0
2
2
2 4.44
4.44
4.44
65.0 1 2.22
Mil'l=.'1; 65.5 2 4.44
l'ii!JX<.;; n=; 66.0 1 2.22
..
(b) A htstogram for 100 repettttons ts shown below (on the left). The distribution is 67.0 2 4.44
67.5 2 4.44
I N ormaI wtt
aooroxtmately . h a center at 0 50 68.0 2 4.44
68.5 1 2.22
i:L 1 F" ;:::L 1
F"
.._WI_.
~
.... 69.0
69.5
3
2
6.67
4.44
.q.l"- .-....,. 70.0 2 4.44
. ,......
........ 71.0
72.0
2
2
4.44
4.44
r--. 72.5
73.0
2
2
4.44
4.44

-
.

h .
73.5
74.0
76.0
2
1
1
4.44
2.22
2.22
I 76.5 1 2.22
Mii'I:.Lt; 77.0 2 4.44
l'ii!JX<.;; 1'1=3:6 . Hr;:.j::_,; 77.5 1 2.22
78.0 1 2.22
81.0 1 2.22
(c) The mean and the median are extremely close to one another. See the plot above (on the
right). (d) The spread of the distribution will not change. To decrease the spread, the sample
size should be increased.

9.5 (a) The scores will vary depending on the starting row. Note that the smallest possible mean
is 61.75 (from the sample 58, 62, 62, 65) and the largest is 77.25 (from 73, 74, 80, 82). One
simulation produced a sample of73, 82, 74, and 62 with a mean of x =72.75. (b) Answers will
vary. A histogram for one set of 10 simulated means is shown below (left). The center of the
simulated distribution is slightly higher than 69.4. (c) Answers will vary. A histogram for a set
of250 simulated means is shown below (right). The simulated distribution is approximately
Normal with a close to 69.4. The the 205 simulated means is 69

The shapes and. sprea~s are different, but the centers are the same. The distribution for n = 2
ro~gh~y s~mmetrtc, but tt does not have the general shape of a Normal distribution. For n = 4
t~e ~tstr~button of the sample mean resembles the shape of a Normal distribution. Both '
d~str~but~ons are roughly centered at 69 .4. However, the spread is a little larger for the
dtstnbutton corresponding to n = 2.

=
9.7 (a) A histogram is shown below (on the left). (b) ,u 141.847 days. (c) Means will vary with
samples. The mean of our first sample was 120.833. (d) Our four additional samples produced
means of 183.4, 212.8, 127.3, and 119.7. It would be unlikely (though not impossible) for all

i
. I
206 Chapter 9 Sampling Distributions
207

five values to fall on the same side of.u. This is one implication of the unbiasedness of:X: Some sample in, e.g., California will be considerably larger, and therefore the variability will be
values will be higher and some lower than I', but not necessarily a 50/50 split. (e) The mean of smaller.
the (theoretical) sampling distribution is .u =141.847. (f) Answers will vary. A histogram for I 00
9.10 (a) Large bias, large variability. (b) Small bias, small variability. (c) Small bias, large
sample means of size 12 is shown below (on the right). T~e distrib~ti?n o_fsample mean lo~ks variability. (d) Large bias, small variability.
more Normal than the population distribution of survival times, but 1t IS still skewed to the n~ht.
The center of the distribution is about 145.9 and the means ranged from a low of88.75 to a h1gh
9.11 BothjJ~ 40.2% and j}~ 31.7% are statistics (related, respectively, to the sample of small-
The standard deviation of the 100 means 3~3m.~02~-~~~~~~~~~~~ class and regular-size-class black students).

9.12 The sample mean x ~ 64.5 inches is a statistic and the population mean p ~ 63 inches is a
parameter.

9.13 (a) If we choose many samples, the average of the x -values from these samples will be
close to I'. In other words, the sampling distribution of x is centered at the population mean p
we are trying to estimate. (b) The larger sample will give more information, and therefore more
precise results. The variability in the distribution of the sample average decreases as the sample
size increases.

9.14 (a) Use digits 0 and I (or any other 2 of the 10 digits) to represent the presence of egg
9.8 The table below shows the count, sample proportion, frequency, and percent for each masses. Reading the first I 0 digits from line 116, for example, gives YNNNN NNYNN-2
distinct value. A probability histogram is also provided. square yards with egg masses, 8 without-so p ~ 0.2. (b) The numbers and proportions of
square yards with egg masses are shown below. A stemplot is also shown below (right). The
Count Sample Prop frequency Percent mean of this approximately Normal distribution is 0.2.
9 0. 045 1 1
13 0.065 3 3
Eggmasses p-hat
14 0.070 2 2
0 0.0
15 0.075 5 5
3 0. 3
16 0.080 11 11
17 0.085 12 12 2 0.2
18 0.090 12 12 2 0.2
2 0.2
19 0.095 9 9
4 0. 4
20 0.100 7 7
21 0.105 5 5 3 0.3
3 0.3
22 0.110 6 6
23 0.115 7 7 2 0.2
24 0.120 10 10 1 0.1
3 0. 3 Stem-and-leaf of p-hat N ~ 20
25 0.125 4 4
26 0.130 1 1 1 0.1 Leaf Unit = 0.010
27 0.135 2 2 1 0.1
1 0.1
28 0.140 2 2
2 0.2 1 0 0
30 0.150 1 1 h 0 b d 1 1 0.1 8 1 0000000
(b) The distribution is bimodal with a very small bias. (c) The mean oft e I 0 o serve va ues 1 0.1 (5) 2 00000
of the sample proportion is 0.0981. The center of the sampling distribution .is about 0.0019 4 0. 4 7 3 00000
below where we would expect it to be, so there appears to be a very small b1as. (d) The mean of 3 0.3 2 4 00
1 0.1
the sampling distribution of p is 0.10. (e) By increasing the. sa~pl: size fr~m 200 to 1000, the
(c) The mean of the sampling distribution of p is 0.2. (d) The mean of the sampling
mean of p would stay the same, 0.1 0, but the spread in the d1stnbut10n of p would be smaller. distribution of p is 0.4 in this other field.

9.9 (a) Since the smallest number oftotal tax returns (i.e., the. smallest population) is still more
9.15 (a) A probability histogram for the number of dots on the upward facing side is shown
than 100 times the sample size, the variability will be (approximately) the sam~ for all states. (b)
below. The mean isp~ 3.5 dots and the standard deviation is u ~ 1.708 dots.
yes, it will change-the sample taken from Wyoming will be about the same s1ze, but the
- Chapter 9 Sampling Distributions 209
208

(b) This is equivalent to a pair affair, six-sided dice. (c) The 36 possible SRSs of size 2
9.17 Assuming that the poll's sample size was less than 870,000-10% of the population of
and the sample averages are shown below.
x=I.O New Jersey-the variability would be practically the same for either population. (The sample
1,1; size for this poll would have been considerably less than 870,000.)
2,1; 1,2; x=l.5
3,1; 2,2; 1,3; x=2.0
9.18 (a) The digits I to 41 are assigned to adults who say that they have watched Survivor:
4,1; 2,3; 3,2; 1,4; x-=2.5
Guatemala. The program outputs a proportion of"Yes" answers. For (b), (c), (d), and (e),
5,1; 2,4; 3,3; 4,2; 1,5; x-=3.0
answers will vary; however, as the sample size increases from 5 to 25 to 100, thfi variability in
6,1; 2,5; 3,4; 4,3; 5,2; 1,6; x=3.5 the sampling distributions of the sample proportion will decrease.
6,2; 3,5; 4,4; 5,3; 2,6; x-=4.0
6,3; 4,5; 5,4; 3,6; x-=4.5 9.19 (a) The mean is pi>= p= 0.7 and the standard deviation is
6,4; 5,5; 4,6 x=s.o
6,5; 5,6;
6 6
x-=5.5
x=6.0
a,=) p(l- p) =
n
07
x 0.3 = 0.0144. (b) The population (all U.S. adults) is clearly at least I 0
1012 .
(d) The ;a~pling distribution of x is shown below. The center is identical to that of the times as large as the sample (the 1012 surveyed adults). (c) The two conditions, np = 10!2x0.7
POiPUiallcm distribution the shape is symmetrical, single-peaked, and bell shaped and the spread = 708.4 > 10 and n(l - p) = IOI2x0.3 = 303.6 > 10, are both satisfied. (d) P(p,;;0.67) = P(Z :<;;
is smaller ~han that distribution. -2.08) = 0.0188. This is a fairly unusual result if70% ofthe population actually drinks the
cereal milk. (e) To half the standard deviation ofthe sample proportion, multiply the sample size
by 4; we would need to sample IOI2x4 = 4048 adults. (f) It would probably be higher, since
teenagers (and children in general) have a greater tendency to drink the cereal milk.

9.20 (a) The mean is Jli> = p = 0.4 and the standard deviation is a,= 0.4 x 06 = 0.0116. (b)
1785
The population (all adults) is considerably larger than I 0 times the sample size (n = 1785 adults).
(c) The two conditions, np = 1785x0.4 = 714 > 10 and n(I -p) = 1785x0.6 = 1071 > 10, are
both satisfied. (d) P(0.37,;; p ,;;0.43)= P( 037 -0.4,;; Z < 0.4 3 -0.4) = P( -2.59$ z $2.59)=
O.Dll6 0.0116
0.9952- 0.0048 = 0.9904. Over 99% of all samples of size n = 1785 will produce a sample
9.16 Answers will vary. A histogram of the X, -values is shown below. While the cent~r of the proportion p within 0.03 of the true population proportion.
distribution remains the same, the spread in this distribution is smaller than the spread m the
sampling distribution of x for samples of size n = 2.
9.21 For n = 300, the standard deviation is a, =)0.4x0. 6 =0.0283 and the probability is
300
037 3
approximately equal to P( -0.4,;; Z,;; 0.4 -0.4)= P(-1.06 $Z,;; 1.06) = 0.8554- 0.1446 =
0.0283 0.0283 .
-- 210 Chapter 9 Sampling Distributions 211

6 0 13 0 15
0.7108. For n = 1200, the standard deviation is O"P = 0.4x0. "0.0141 and the probability is .satisfied. (d) P(0.13 :s; p:;; 0.17) = P( - < Z :s; O.l 7 - 0 15 )" P( -2.20 :s;z :s; 2.20) = 0.9861 -
1200 . . Q~1 on~1
approximately equal to p( 037 -0.4 :s; Z < 0.43 -0.4)" P(-2.13 :s; Z :s; 2.13)= 0.9834- 0.0166 = 0.0139 = 0.9722. (Software gives 0.972054.) (e) To reduce the standard deviation in (a) by a
0.0141 0.0141 . third, we need a sample nine times as large; n = 13,860.
0
:: ~ 6 "0.0071 and the probability is
0.9668. For n = 4800, the standard deviation is O"' =
0 9.26 For n = 200, the standard deviation is
0 15 0 85
x "0.0252 and the probability is
(j". =
200
approximately equal to p( 037 -0.4 <Z < 0.4 3 -0.4)" P(-4.23 :s; Z:;; 4.23).: I. Larger sample
0 13 5 0 17
p

0.0071 0.0071 . approximately equal to P( -O.l < Z :s; -O.l 5 )" P(-0.79:;; Z :s; 0.79)= 0.7852- 0 2148 =
sizes produce sampling distributions of the sample proportion that are more concentrated about 0.0252 0.0252 .
the true proportion. 0 15 0 85
0.5704. For n = 800, the standard deviation is (j". = x "0.0126 and the probability is
p 800
9.22 (a) The distribution of the sample proportion is approximately normal with mean 0.14 and 0 13 0 15 0 17 0 15
0 14 0 86 approximately equal to P( - < z < - )" P(-!.59 :s;z:;; 1.59)= 0 9441- 0 0559 =
standard deviation O"P x 0.0155. (b) 20% or more Harley owners is unlikely: 0.0126 0.0126 . .
500
= 0 15 x 085 "0.0063 and the probability is
P(p > 0.2)"
-
p(z 0 2 0 14
> - )= P(Z ;o, 3.87) < 0.0002. (c) There is a fairly good chance of finding
0.0155
0.8882. For n = 3200, the standard deviation is (j".
3200 p

0 3 15 0 17 0 15
0 15 0 4 approximately equal to P( .1 -0. < Z :s; - )" P(-3.17:;; Z :s; 3.17)= 0 9992- 0 0008 =
at least 15% Harley owners P(p ;o,0.2)" P(z ;o, - .1 )= P(Z ;o, 0.64) = 1-0.7389 = 0.2611. 0.0063 0.0063 . .
. 0.0155
0.9984. Larger sample sizes produce sampling distributions of the sample proportion that are
more concentrated about the true proportion.
9.23 (a) The sample proportion is 86/100 = 0.86 or 86%. (b) We can use the normal
approximation, but Rule of Thumb 2 is just barely satisfied: n(1 - p) = 10. The standard 9.27 (a) The sample proportion is p= 62/100 = 0.62. (b) The mean is flp = p = 0.67 and the
deviation of the sample proportion is O"P = ~
0
~~0 1
"0.03 and the probability is
standard deviation is O"P =
0 67 033
x "0.0470. P(p:;; 0.62)" p(z :s; 062 - 067 )" P(Z:;; -1.06)
100 0.047
P(p :s; 0.86)" p( z < 0 8~-~3 9 )= P(Z :s;-1.33)= 0.0918. (Note: The exact probabili.ty is 0.1239.) = 0.1446. (c) Getting a sample proportion at or below 0.62 is not an unlikely event. The sample
results are lower than the national percentage, but the sample was so small that such a difference
(c) If the claim is correct, then we can expect to observe 86% or fewer orders shipped ~n time in could arise by chance even if the true campus proportion is the same.
about 12.5% of the samples ofthis size. Getting a sample proportion at or below 0.86 ts not an
unlikely event.
9.28 (a) The mean is fl, = p = 0.52 and the standard deviation is (j". = 052 x0.4S "0.0223 (b)
9.24 If p is the sample proportion .Who have been on a diet, then p is approximately Normal p 500
The population (all residential telephone customers in Los Angeles) is considerably larger than
with mean 0 7 and standard deviation O"
p
=~ 0 7267
x 03 "0.02804. The probability is 10 times the sample size (n = 500). The two conditions, np = 500x0.52 = 260 > 10, n(1- p) =

0 75 0 7 500x0.48 = 240 > 10, are both satisfied. P(p ;o, 0.5)" p(z > 05 - 052 ) = P(Z >0.90)= 1 -
approximately equal to P(p;o, 0.75)" P(z ;o, - ) = P(Z ;o, 1.78).= 1- 0.9625 = 0.0375. 0.0223
0.02804 0.1841 = 0.8159.
(Software gives 0.0373). Alternatively, as p?: 0.75 is equivalent to 201 or more dieters in the
sample, we can compute this probability using the binomial distribution. The exact probability is
9.29 (a) The mean isfl, = 0.75 and the standard deviation is O"P = 075 x 025 = 0.0433.
P(X;o,201)=1-P(X:s;200)= 1-0.9671=0.0329. 100
P(p :>0.7)" P( Z :;;
0
0 1
;_~4~; 5 ) = P(Z:;; -1.15)= 0.1251. (b) The mean isfl, = 0.75 and the

54
85
9.25 (a)ThemeanisfL,=p= 0.15andthestandarddeviationisO",= : x ~ "0.0091. (b)
0 75 0 25
standard deviation isO". = x "0.0274. P(p:>0.7)"P(z< 07 - 075 )=P(Z:s;-1.82) =
The population (all adults) is considerably larger than 10 times the sample size (n = 1540). (c) p 250 0.0274
The two conditions, np = 1540xO.J5 = 23 I ?: I 0, n(l - p) = 1540x0.85 = 1309?: I 0, are both 0.0344. (c) To reduce the standard deviation for a 100-item test by a fourth, we need a sample
------------------------------
Chapter 9 Sampling Distributions 213
212
9.36 The central limit theorem says that over 40 years, the mean return x is approximately
- d y the answer is the same for Laura. Taking a sample
sixteen times as large; n.- \600. ( ) d esd
deviation by a fourth, for all values of p. Normal with mean ,u, =13.2% and standard deviation 0'11 =
1
J;
= 2.7670%. Therefore,
sixteen times as large will cut the stan ar v40
.. . R 1 of Thumb 2 is not satls fi1ed ; np--I5x03=45<10.
: 15 13 2
o (a) One of the two cond1t10ns m u e . . ( ) is not at least 10 times as 1arge P(x">J5%)'=P(Z> - )=P(Z>0.65)= 1-0.7422=0.2578 and
9 3 . t f fled The populatiOn s1ze 316 2.767
(b) Rule of Thumb \IS no sa IS df(15 0 3 3) =0.2969. 132
asthesamplesize(n=50). (c) P(X:o;3)=bmomc ' . ' P(:x <10%)'=P(z < l0- )=P(Z <-1.16)= 0.1230.
. 2.767
-~= 26 =11.6276%. (b)
,[nn JS
0 0

. - - 3 5o/c and the standard deviation IS 0',- 9.37 (a) No this probability cannot be calculated, because we do not know the distribution of the
9.31 (a)Themeanls,u,-J.t-- 0
weights. (b) If W is total weight and x =W/20, the central limit theorem says that x is
. ( 5-(-3.5))= P(Z > 0.33) = 1-0.6293 = 0.3707. (c)
P(X>5)=P Z> 26 approximately Nonnal with mean 190 lb and standard deviation 0'11 = ~ = 7.8262lb. Thus,
v20
. (
5-(-3.5))=P(Z>0.73) =l-0.7673=0.2327.(d) 200 190
P(x>5)=P Z> 11.6276 P(W > 4000) =P(x > 200) = P(z > - ) =P(Z > 1.28)= 1-0.8997 = 0.1003. There is about
7.8262
- . ( 0-(-3.5))=P(Z <0.30)= 0.6179. Approximately 62% of all five-stock a 10% chance that the total weight exceeds the limit of 4000 lb.
P(x<O)=P Z< !1.6276
0
portfolios lost money 9.38 (a) The mean is ,u, =40.125 mm and the standard deviation is 0'11 = ~2 =0.001 mm. These
. ( 2!-18.6)=P(Z>0.41)= 1-0.6591 =0.3409. (Software gives results do not depend on the distribution of the individual axel diameters. (b) No, the
9.32 (a) P(X > 21) = P z > 5.9
probability cannot be calculated because we do not know the distribution of the population, and n
. . . _ 5.9 =0.8344. These
. _ 6 and the standard devJatJOn IS 6 - .J5o = 4 is too small for the central limit theorem to provide a reasonable approximation.
0.3421.) (b) The mean IS,U,- 18
d on the distribution of the individual s.cores. (c) 9.39 (a) Let X denote Sheila's glucose measurement.
results do not depen
. ( 21-18.6)=P(Z>2.88)= 1-0.9980=0.002. P(X>J40)=P ( Z> 140-125) =P(Z>1.5)=1-0.9332=0.0668. (b)Ifx 1sthemeanoffour 0

P(x > 21) = P z > 0.8344 10


measurements (assumed to be independent), then x has a N(12s,IO/v'4) =
. . . 10
=-= . . b) so1ve ~=3r ' n = 11.1, son
5.7?35m!lhgrams. ( 140 125
9 33
. .
(a) The standard deviation IS 0', Jj v n . .
everal measurements than there IS m a smg1e
. . N(l25mg/dl,Smg/dl) distribution andP(x> 140) =P( Z > ~ ) =P(Z > 3.0) =1- 0.9987 =
= 12 There is less variability in the average of s ts is more likely to be close to the true 0.0013. (c) No, Sheila's glucose levels follow a Normal distribution, so there is no need to use
.
measuremen . t Also , the average of several measuremen
o the central limit theorem.
mean than a single measurement.
2
. th " = 6 strikeslkm and 9.40 The mean of four measurements has a N (125mg/dl, 5mg/dl) distribution, and
- . ber of strikes per square kilometer, en r-x
9 34 (a) If x 1s the mean num . kn
b b'l't because we do not ow P(Z>l.645)=0.05 ifZisN(O, 1), soL= 125 + 1.645x5 = 133.225 mg/dl.
= 2 4 = 0.7589 stn'kes;km2 (b) We cannot calculate the pro a 1 1 Y
-~ . .
6
" .J1o b fl'gbtning strikes. If we were told the populatiOn IS Normal, 9.41 (a)) Let X denote the amount of cola in a bottle.
the distribution of the num er 0 1 .
0

295 298
then wewould be able to compute the probab1hty. P(X < 295)=P( Z < ; )=P(Z <-1) = 0.1587. (b) If x is the mean contents of six bottles
. 1 N (I 6 l 2/.J200 = 0.0849) distribution. The
9.35 The samplemeanxhas approximate y a . ' . . . (assumed to be independent), then x has a N( 298,3/J6) = N(298ml, 1.2247ml) distribution
_ "'"p(z > 2-1.6)= P(Z > 4.7 1), which IS essentially 295 298
probability is approximately equal toP(x > 2)- 0.0849 . and P(x < 295) =P(z <
.
-
1.2247
)= P(Z < -2.45)=0.0071.

0.
Chapter 9 Sampling Distributions
214 215

the distribution of pis approximately Normal. (S)


9.42 (a) The sample mean lifetime has a N{ 55000,4500/ .JS) =N(55000 miles, 1591 miles)
P(fJ, 480
40) =P(z, o.0833 -o.1) _ < _ .
-P{Z--1. 22 )-0.1112. (6) Thereisaboutanllo/ochanceof
distribution. (b) P(x <51,800) ~ P(z < ' )= P(Z <-2.01)~ 0.0222.
51 800 55 000 . 0.0137
- '
1591 observing a sample proportion this small 0 .f h
produced is 0.1. A good l!int man r sma11 er~ 1 t e true proportion of unsuitable batteries
9.43 (a) The approximate distribution of the mean number of accidents :X is overall process before se~ding this :~~;::n~~d reqmre a smaller probability and evaluate the

= N(2.2,0.1941). (b) P(x < 2) ~ P(z < - )= P(Z < -1.03)= 0.1515. (c) If
2 22
N(2.2,1.4/ .J52) 0.1941 9.47 (a)p=0.68or68%isaparameter-' p=073or73o/c
0 Isastat'IS t'IC. (b)Th e mean IS
. Jlp = p =
X denotes the number of accidents at an intersection per year, then
0.68 and the standard deviation is a.~ ~
0.68x0.32 -0.0381.
P(X < 100) =P(x <100152) ~ P(z < 1. 9231 - ) =P(Z < -1.43)= 0.0764.
22 P (c)
150
0.1941
P(p>0.73)=P(Z> 0.73-0.68)-
0.038! -P (Z>l.31)-:-l-0.9049=0.0951.
- Thereisabouta 10%
9.44 The mean of22 measurements has a N(13.6,3.1/Fz)distribution, and chance of getting a sample proporf f .
Thus the random dig't d . IOn o 0.73 or ~realer, If the population proportion is 0.68.
P(Z <-1.645) ~o.os if Z is N(O, 1), soL= 13.6- 1.645x0.6609 = 12.5128 points. ' 1 eviCe appears to be workmg fine.

r~:s ~=* ~~~~~~~r~_:;~lav:~;~tz;: ~~~~~tio;h~e o?tained a total_ors simulated values of


8
9.45 The mean loss from fire, by definition, is the long-term average of many observations of
the random variable X= fire loss. The behavior of Xis much less predictable if only a small customs agents have given us reasonable info:~ati~~ :~~~t:~d p~obabiiif su~gests that the
number of observations are made. If only 12 policies were sold, then the company would have
no protection against the large expense that would be incurred if at least one ofthe 12
~c) Let p den~te the sample pr~portion of passengers who get ~~ g~:~: ~ig~~~:~:gg~:::!~!t.
policyholders happened to lose his or her home. If thousands of policies were sold, then the ample of n - 100 travelers gomg through security at Guadalaiara airport The m . - 0
Th 1 " ean1s,u.- 7
average fire loss for these policies would be far more likely to be close to J1, and the company's e popu atwn (all travelers going through s t G d 1.
times as large as the sample (n = IOO) so R ~cu~Tyhat bua a llJara airport) is clearly at least 10
P

profit would not be endangered by the few large fire-loss payments that it would have to make. ~ ' ue0 urn 1 suggests that the standard deviation
a,= y---wo- = 0.0458 is appropriate. The two conditions, np = !OOx0.7 = 70 > 10 and n(l _
9.46 The approximate distribution of the average loss :X is
N( $250,$300/~10,000) = N($250,$3) and P(x> $260) ~ P( Z > 260 ; 250 ) =P(Z > 3.33)= 1- p) = lOOx0.3 = 30 > 10' are both safIs fiIed' so Ru1e of Thumb 2 suggests that the distribut'
' ( 1000
f
pisapproximatelyNormal. (d) P(jJ:o.o. 6s)=P 2 ,0.65-0.7)- ( __ .
.. 0.0458 - P Z "'-1.09)-- 0.1379. This
0.9996 = 0.0004.
~robabiii!,Y. (or 1~% chance) is reasonably close to the 16% obtained in our si I .
CASE CLOSED! - 1000, pIS agam approximately normal, with mean .u, = 0.7 and standard d:~a~:~~n. (e) For n
(1) The central limit theorem suggests that the mean of30 lifetimes will be approximately
a.~ 0.7x0.3 ~ ..
Normal with mean ,u, ~ 17 hours and the sta~dard deviation of a,~ ~ =0.1461 hours. (2) P 1000 -0.0145. (The conditiOns for Rule of Thumb 2 are clearly satisfied, but some
v30
~~d;:~ ~~;ryw:~~::a~~~~e{e~~ ~~nt~~ s~~~~ard ~eviation is reasonable ~n this situation because
P(x <16.7) = p(z < 16 7 - 17 )~ P(Z < -2.05) = 0.0202. (3) There is only about a 2% chance of ,
rave ers pass through secunty at this airport.)
0.1461
P(p:o.0.65)=P ( 2<0.65-0.7)- ( -
- 0.0145 - P Z "'-3.45)- 0.0003. The sample proportionp is less variable
observing a sample average as small or smaller than the one observed (:X,;; 16.7) if the process is
working properly. This small probability provides evidence that the process is not working for larger sample sizes, so the probability of seeing a value of less than or equal to 0 65
properly. (4) Let p denote the sample proportion of unsuitable batteries among a random decreases. .
sample ofn = 480 batteries. The mean is fifo~ p~ 0.1. The population (all batteries produced)
9.49 (a) Let X denote the WAIS score for a randomly selected individual.
is clearly at least 10 times as large as the sample (n = 480), so Rule of Thumb 1 suggests that the
P(X<:!OS)~P(z>- 105-100)~- P (Z <: 0.33)-- 1-0.6293 = 0.3707.
standard deviation a,= ~O.l x0. 9 =0.0137 is appropriate. The two conditions, np = 480x0.1 ~ 15
(Software gives 0.3695.)
480
48 > 10 and n(l- p) ~ 480x0.9 = 432 > 10, are both satisfied, so Rule of Thumb 2 suggests that
- - - - - - - - - - - - - - - - - - - - - -. . . .1

216 Chapter 9
Sampling Distributions
217

(b) The mean is ,u, =I 00 and the standard deviation is ax = ~ = 1.9365 . (c) P(X,o;300)=P('< 300)"-P( 0.1929-0.2)
v60 . P-1555 - Z,o; O.OIOI =P(Z,o;-0.7)=0.2420. (Theexactprobability
P(x;:, 105) = p(z > 105 - 100 ) = P(Z;:, 2.582)= 1- 0.9951 = 0.0049 (d) The answer to (a) could IS 0.25395.) Note: ~ctually, X has a hypergeometric distribution, but the size of the population
1.9365 .
(all Internet users) zs so much larger than the sample that the binomial distribution is a
be quite different; the answer to (b) would be the same (it does not depend on normality at all). extremely good approximation. n
The answer we gave for (c) would still be fairly reliable because ofthe central limit theorem.
9.54 (a) Let X denote the level of nitrogen oxides (NOX) for a randomly selected car.
9.50 (a) Letp denote the sample proportion of women who think they do not get enough time 2)- P(Z
P(X>O .3)= P(z > - 0.3-0.
. - ;:, 2.0 )-
- 1-0.9772 = 0.0228 (or 0.025, using the 68-95-99.7
for themselves in a random sample ofn = 1025 adult women The mean is 1-li> = 0.47. The 0 05
population (all adult women) is clearly at least I 0 times as large as the sample (n = I 025), so rule). (b) The mean NOX level for these 25 cars is ,u,= 0.2 g/mi, the standard deviation is
a,=~= 0.0100 g/mi, and P(x;:, 0.3) = P( Z;:, o.~~~- 2 ) = P(Z;:, 10), which is basically 0.
7
Rule of Thumb I suggests that the standard deviation a,= 0.4 x0.5J = 0.0156 is appropriate.
1025
The two conditions, np = 1025x0.47 = 481.75 > 10 and n(l - p) = 1025x0.53 = 543.25 > 10, are
both satisfied, so Rule of Thumb 2 suggests that the distribution of pis approximately Normal. 9.~5 The mean NOX level for 25 cars has aN(0.2, 0.01) distribution and P(Z > 2 33) = 0 01 1"f
(b) The middle 95% of all sample results will fall within two standard deviations (2x0.0156 = Z IS N(O, 1), soL= 0.2 + 2.33xO.Ol = 0.2233 g/mi. '
0.0312) of0.47 or in the interval (0.4388, 0.5012). (c)
P(jJ < 0.45) = p(z,; 0.4 5 - 0.4 7 ) = P(Z,; -1.28) = 0.1003. (Software gives 0.0998.) 9.56 (a) No-~ count ?nl~ ta~es ?n whole-number values, so it cannot be normally distributed.
0.0156 (b) T~ approximate d1stnbutwn 1s Normal with mean ,u, =1.5 people and standard deviation

9.51 (a) The mean is ,u, = 0.5 and the standard deviation is a,= ~ = 0.0990.
0
(b) Because this
a,= Jito = 0.0283. (c) P(X > 1075) = P(x > 1.5357) = p(z > 1. 5:.~~~1. 5 ) = P(Z > 1.26)= 1 _
v50 0.8962 = 0.1038.
distribution is only approximately normal, it would be quite reasonable to use the 68-95-99.7
rule to give a rough estimate: 0.6 is about one standard deviation above the mean, so the
probability should be about 0.16 (half of the 32% that falls outside I standard deviation). 9.57 (a)Themeanis!-1,=0.5,andthestandarddeviationisa. = 05 x0. 5 =oo 041 (b)
p 14941 . .
06 0 5
Alternatively, P(x;:, 0.6) = P(z;:, - ) = P(Z;:, 1.01)= 1-0.8438 = 0.1562. P(049 ' 051) (0.49-0.5 051-05)
0.0990 . < p < . = p 0.0041 < z < ~.004; = P(- 2.44 < z <2.44)= 0.9927-0.0073 =
0.9854.
9.52 (a) Let X denote the number ofhigh school dropouts who will receive a flyer. The mean is
1-lx = np = 25,000x0.202 = 5050. (b) The standard deviation is 9.58 (a) If samples of size n = 219 were obtained over and over again and the sample mean
ax = ,}np(l- p) = ,j2s,ooo x 0.202 x0.798 = 63.4815 and computed for each sample, the center of the distribution of these means would be at p gramwas
5000 5050 In other words, the expected value of the sample mean is " grams (b) No the po 1 t" s.
P(X;:, 5000)= P(z;:, - )= P(Z ;:,-0.7876)= 0.7845. d" tr'b . . ,... , pu a wn
63.4815 b IS .1 u!i?n of birth weights. among ELBW babies is likely to be skewed to the left. Most ELBW
abies ':lll_be above a certam value, but some babies will even be below this value. (c) Yes the
9.53 (a) Let pdenote the sample proportion oflntemet users who have posted a photo online in central hm1t theorem suggests that the mean birth weight of 219 babies will be approximately'
Normal.
a random sample ofn = 1555 Internet users. The mean is 1-lp = 0.2. The population (all Internet
users) is clearly at least I 0 times as large as the sample (n = 1555), so Rule of Thumb I suggests
. . a i> = 02 X 08 =. 0.01 01 IS
that the standard devJatwn . appropna. te. The two cond"t"
1 wns, np =
1555
1555x0.2 = 311 > 10 and n(l- p) = 1555x0.8 = 1244 > 10, are both satisfied, so Rule of Thumb
2 suggests that the distribution of p is approximately Normal. (b) Let X= the number of in the
sample who have posted photos online. X has a binomial distribution with n = 1555 and p = 0.2.
218 Part III Review Exercises Part III Review Exercises
219
Part III Review Exercises
(d) Th~ events "in the labor force" and "college graduate" are not independent since the
111.1 (a) The proofreader will catch 75% of the errors. probab1hty ofbemg in the labor force (0.6632) does not equal the probability ofbeing in the
labor force g1ven that the person is a college graduate (0.7830).
?{proofreader catches error)= P(nonword error) ?(proofreader catches errorinonword error)
I
+P( word error) P (proofreader catches error word error) 111.3 (a) We know that the sum of the probabilities must be I, so
P(X = 7) = 1-(P(X = 0)+ P(X = 1)+ P(X = 2)+P(X = 3)+P(X = 4)+P(X = 5)+P(X = 6))
= (0.25){0.90)+(0.75)(0.70)
= 0.225+0.525 = 0.750 = 1- ( 0.04 + 0.03 + 0.06 + 0.08 + 0.09 + 0.08 + 0.05)
(b) Of all errors the proofreader catches, 30% of them are nonword errors. = 1-0.43 = 0.57
P{ nonword error and proofreader catches error) (b) T!1~ m~an number of days that a randomly selected young person (aged 19 to 25) watched
P( nonword error proo fireader catches error ) = . . {
I ) televlSlon 1s 5.44.
P proofreader catches error 7

= (0.25){0.90) 0.225 =0.30


flx = L.;x,p,
i=O
0.750 0.750
(c) Let X be the number of word errors that are missed. If a human proofreader catches 70% of = 0 (0.04) +I{ O.D3) + 2 ( 0.06) + 3 ( 0.08) + 4 ( 0.09) + 5( 0.08) + 6( 0.05)+ 7 ( 0.57)
word errors, that person will miss 30% of word errors. This is a binomial situation with n = 10 = 0+0.03 +0.12+0.24+0.36+ 0.40+ 0.30+ 3.99
andp=0.30. =5.44
P(X ~ 3) = 1-P(X,; 2) (c) First we need to find the standard deviation for X
7
=1-[P(X = O)+P(X = 1)+P(X =2)] ax= L.;(x,-!1x)
2
P,

C~Jco.3o)' (0.7o)' +C~Jco.3o)' (o.7o)' +C~Jco.3o)' (o.7o)' J


i=O
0

= ~(0-5.44) 0.04+ "+ (7 -5.44) 0.57


= 1-[ 2 2

"'1-( 0.0283 + 0.1211 + 0.2335] = ../4.5664


=1-0.3829 =2.14
=0.6171 We wo~ld expect the mean x of 100 randomly selected young people (aged 19 to 25) to be
Thus, the probability that a fellow student misses 3 or more out of 10 word errors is 0.6171. approximately Normally distributed with mean 11, = flx = 5.44 and standard deviation
ax 2.14
. 111.2 (a) The unemployment rates for each level of education are: a,=.[,;= .JiOO = 0.214 .
. . ) (12470-11408) 1062
P (unemployed! didn't fimsh HS = = 0.0852 Standardize x = 4.96: z = x- /1, = 4 96 - 5.44
-2. 24
12470 12470 a, 0.214
1977
P ( unemployed! HS but no college ) = - - = 0.0523 The area to the left of z = -2.24 is 0.0125. We would expect to see results as extreme as or more
37834 extreme than ours about 1.25% of the time. The average number of days spent watching TV
1462 seems unusually low for this group.
P ( unemployed! less than bachelor's degree)=--= 0.0425
34439
1097
P ( unemployed! college graduate ) = - - = 0.0272
111.4 (a) T!~w probability that the mean score for the delayed speakers in the experiment would
40390 exceed 32 IS 0.0020.
The unemployment rate decreases with additional education . There were n = 23 delayed speakers. We would expect the mean score for delayed speakers to be
. (b) P{inlaborforce)= 12470+37834+34439+40390 = 125133 0. 6632
27669+59860+47556+51582 188667
Normally distributed with 11, = flx = 29 and standard deviation a, =
.
af- =
vn v23
b
= 1.04 ..

. 40390
(c) P (m labor force! college graduate)=-- =0.7830
51582
220 Part III Review Exercises Part III Review Exercises 221

32 29
P(x>32)=P(Z> - )
1.04 0.85-0.88
P(p < 0.85) = P Z <
=P(Z>2.88) 0.88(0.12)
= 1-0.9980 = 0.0020 500
(b) The random assignment was used to "even out" the two groups (immediate and delayed
speakers) with respect to any other variables that might affect learning a foreign language so the = P(z < -0.03 )
effects of immediate versus delayed practice could be studied. 0.0145
(c) The probability that the experiment will show (misleadingly) that the mean score for delayed = P( Z < -2.06)
speaking is at least as large as that for early speaking is 0.0329.
= 0.0197
The probability that fewer than 85% of the men in the sample were employed last summer is
0-(32-29) 0.0197. (b) Random sampling is important to obtain a sample that is representative of the
P(x -xd s;O)=P Z<--.~~=-"-
' 62 52 population (here college men and college women). (c) The probability that a higher proportion
-+- of women than men in the sample worked last summer is 0.0038.
23 23

=P(z<~)
-1.629 P (Pw -PM > 0) = P
0-(0.82-0.88)
Z > --,==;=2==:====f:.:==;=-
0.82(0.18) + 0.88(0.12)
=P(Zs;-1.84)
500 500
=0.0329
=P(z > o.o6 )
III.S (a) The mean is 1200(0.121) = 145.2, and the standard deviation is "'1200(0.121)(0.879) = 0.0225
11.3. (b) According to the 68-95-99.7 rule, the range 122.6 to 167.8 will include the counts of =P(Z>2.67)
Hispanics in approximately 95% of all such samples. =1-0.9962
145.2- 2(11.3) = 122.6 and 145.2 + 2(11.3) = 167.8
= 0.0038
(c) For there to be at least 200 Hispanics, we need to find n such that 0.12ln ::0:200. This means
200
that n;;;, - - = 1652 . 89, son= 1653 adults. 111.9 (a) First, we need to fmd the proportion of mathematics degrees earned by women.
0.121 P( degree earned by a woman)= 0.73(0.48)+0.21(0.42)+0.06(0.29)
111.6 We have a binomial distribution with n = 9 and p = 0.056. Let X be the number of authors = 0.3504+ 0.0882 + 0.0174
whose names are among the 10 most common. We need to fmd the probability that X= 0. =0.4560

P(X =0) =(~}0.056)' (0.944)" = 0.5953


Since 16,701 degrees were awarded and 45.6% of these degrees were awarded to women,
approximately 7616 mathematics degrees were awarded to women. (b) The events "degree
earned by a woman" and "degree was a master's degree" are not independent. The probability
that a degree is earned by a woman is 0.4560, while the probability that a master's degree earned
111.7 Sample statistic A provides the best estimate of the parameter. Both statistics A and B
by a woman was 0.420. For these two events to be independent, these probabilities would have
appear unbiased, while statistic C appears to be biased (low). In addition, statistic A has lower
to be equal. (c) Let X be the number of degrees earned by a woman. X is approximately a
variability than statistic B. In this situation, we want low bias and low variability, so statistic A is binomial random variable with n = 2 and p = 0.4560.
the best choice.
P(X:<: 1) =1-P(X = 0)
III.S (a) We will use Normal approximation here, since np = 500(0.88) = 440::0:10 and
= 1-(~)(0.456)' (0.544) 2

n(l- p) = 500(0.12) = 60;;;, 10.


= 1-0.2959
= 0.7041
I
222 Part III Review Exercises

Several other methods of solution that involve probability methods from Chapter 6 are also
possible.

111.10 (a) Sensitivity is the probability that the test shows a hearing loss when there really is a
hearing loss. The test detected 54 hearing losses out of a total of 58 hearing losses, so the
sensitivity is 54 = 0.931. Specificity is the probability that the test shows that hearing is normal
58 .
when hearing is really normal. Out of the 42 normal hearing babies, the test identified 36 of them
36
as normal hearing, so the specificity is =0.857. (b) The prevalence of hearing loss is 0.003.
42
Out of the approximately 4,008,083 babies born each year, we would expect about 12,024 of
them to have hearing loss. Since the sensitivity of the test is 0.931, we would expect that about
830 (12024(1-0.931) = 12024(0.069) = 829.7) of them would have a hearing loss that will be
missed by this new screening device. (c) P(hearing loss)= 0.003
P( test shows hearing loss)= P( test shows hearing loss and baby has hearing loss)
+ P (test shows hearing loss and baby does not have hearing loss)

=(~:}o.oo3)+( : 2 }o.997)
=0.0028+0.1424
=0.1452
0 0028
P(hearing loss! test shows hearing loss)= 0.0193
0.1452
This low probability might appear surprising, but it is low because the prevalence of hearing loss
is so low.
~I
I
'

222 Part III Review Exercises 223

Several other methods of solution that involve probability methods from Chapter 6 are also Chapter 10
possible.
I 0.1 (a) The sampling distribution of x is approximately normal with mean llx = 280 and
III.lO (a) Sensitivity is the probability that the test shows a hearing loss when there really is a
hearing loss. The test detected 54 hearing losses out of a total of 58 hearing losses, so the standard deviation u, = '{-. =
vn v840
=
~ 2.0702. (b) The mean is 280. One standard deviation from
54 the mean: 277.9 and 282.1; two standard deviations from the mean: 275.8 and 284.2; and three
sensitivity is = 0.931. Specificity is the probability that the test shows that hearing is normal
58 ' standard deviations from the mean: 273.7 and 286.3. Two different sketches are provided below.
when hearing is really normal. Out of the 42 normal hearing babies, the test identified 36 of them
as normal hearing, so the specificity is !~ = 0.857. (b) The prevalence of hearing loss is 0.003.
Out of the approximately 4,008,083 babies born each year, we would expect about 12,024 of
them to have hearing loss. Since the sensitivity of the test is 0.931, we would expect that about
830 (12024(1-0.931) = 12024(0.069) = 829.7) of them would have a hearing loss that will be
missed by this new screening device. (c) P(hearing loss)= 0.003
P (test s~ows hearing loss) = P( test shows hearing loss and baby has hearing loss)
+P( test shows hearing loss and baby does not have hearing loss)

=(~:}0.003)+( :2}0.997) =
(c) 2 standard deviations; m 4.2. (d). The confidence intervals drawn will vary, but they
= 0.0028+0.1424 should all have length 2m= 8.4. See the sketch above on the right. (e) About 95% (by the 68-
= 0.1452 95-99.7 rule).

( I .
p hearing loss test shows heanng loss = 0.1
)
0.0028 0 0193 10.2 (a) Incorrect; the probability is either 0 or I, but we don't know which. (b) Incorrect; the
452
general form of these confidence intervals is X: m, so xwill always be in the center of the
This low probability might appear surprising, but it is low because the prevalence of hearing loss
confidence interval. (c) Incorrect; the different samples will yield different sample means, and
is so low. the distribution of those sample means is used to provide an interval that captures the population
mean. (d) Incorrect; there is nothing magic about the interval from this one sample. Our method
for computing confidence intervals is based on capturing the mean of the population, not a
particular interval from one sample. (e) Correct interpretation.

I 0.3 No. The student is misinterpreting 95% confidence. This is a statement about the mean
score for all young men, not about individual scores.

I 0.4 (a) The sampling distribution of x is approximately normal with mean llx = 11 and standard
deviation u, = '{-. =
-vn v50
=
~ 0.0566 . (b) The mean is 11 . One standard deviation from the mean:
,u-0.0566and,u+ 0.0566; two standard deviations from the mean: ,u- 0.1132 and,u+0.1132;
and three standard deviations from the mean: ,u- 0.1698and,u+0.1698. Two different sketches
are provided below.
224 Chapter 10 Estimating with Confidence 225

Highlight L5 and let LS = L2- L4. To make it easier to count the number of differences in LS
that are greater than zero, sort LS: SortD ( L5) and look to see when the sign changes.

D L't LS 5:

.~.
5:8.7::.: 5:2:.95:'t Lrm:t.t.ll
5:5:.781 5't. 't8 3.7882:
S3.3't5 5:5.2:1't 3.1i133
56.95't S't.663 3.2:31
5(1.85:6 5:3.'t97 2:.92:'tS
55:.(12:3 S't.95:9 2:.S7S't
(c) 2 standard deviations; m = 0.1132. (d) About 95% (by the 68-95-99.7 rule). (e) The 5:2:.956 5:3.92:3 2:.3851
confidence intervals drawn will vary, but they should all have length 2m= 0.2264. See the
sketch above on the right. L5W=4 056 707196 ...
10.5 (a) 51%3%or (48%, 54%). (b) 51% is a statistic from one sample. A different sample In this simulation 115 out of200 were greater than zero, or 0.575.
may give us a totally different answer. When taking samples from a population, not every
sample of aaults will give the same result, leading to sampling variability. A margin of error 10.7 The figure below (left) shows how much area is in each tail and the value z* you want to
allows for this variability. (c) "95% confidence" means that this interval was found using a find. Search Table A for 0.0125 (half of the 2.5% that is not included in a 97.5% confidence
procedure that produces correct results (i.e., includes the true population proportion) 95% of the This area to z* = 2.24.
time. (d) Survey errors such as undercoverage or non-response, depending on how the Gallup
Poll was taken, CO!Ild affect the accuracy. For example, voluntary response will not base the
sample on the population.

I 0.6 (a) Let x0 denote the sample mean for the 10 girls and Xn denote the sample mean for the
7 boys. The distribution of x0 - Xn is Normal with mean flxG- = 54.5-54.1 = 0.4 and standard 0.9?5

. .
devtat10n axa-xB : : : :
2.72
. . Thus,
+ 2.42 =125
10 7
P(x0 >x.)=P(x0 -x0 >0)=P(z> O-OAJ=P(z>-0.32)=0.6255. (b) Generate 10
1.25
observations from a Normal distribution with mean 54.5 inches and standard deviation 2.7 10.8 The figure above (right) shows how much area is in each tail and the value z* you want to
inches. Store the average in a list, say L3. Generate 7 observations from a Normal distribution find. Search Table A for 0.03 (half of the 6%% that is not included in a 94% confidence
with mean 54.1 and standard deviation 2.4 inches. Store the average in another list, say L4. interval). This area corresponds to z* = 1.88.
Repeat the previous steps 200 times. Store the 200 differences L3-L4 in list L5. Count how
many of the differences in list L5 are greater than zero. The estimated probability is the count 10.9 (a) The parameter of interest is 11= the mean IQ score for all seventh-grade girls in the
divided by 200. Clear lists L1 to L5. Generate the means of the 200 samples of 10 girls but the school district. The low score (72) is an outlier, but there are no other deviations from
following commands: Normality. In addition, the centra! limit theorem tells us that the sampling distribution of x will
1 ---+ c be approximately Normal since n = 31. We are told to treat these 31 girls as an SRS. The 31
randNorm(54.5,27,10)--+L1: mean(L1)--+L2(C): 1+C--+C measurements in the sample should be independent ifthere are at least IOx3I = 310 seventh-
Continue to press Enter until the counter reaches 200. Now generate the means of the 200 grade girls in this school district. x = 105.84, the 99% confidence interval for fl, is
samples of 7 boys:
1 ---+ c 105.84 2.576( ~ J = 105.84 6.94 = (98.90, 112.78). With 99% confidence, we estimate the
randNorm(54.1,2.4,7)--+L3: mean(L3)--+L4(C): 1+C--+C
mean IQ score for all seventh-grade girls in the school district to be between 98.90 and 112.78
Continue to press Enter until the counter reaches 200.
IQ points. (b) Unless the class was representative of the whole school district, we would not
have been able to generalize our conclusions to the population of interest.
226 Chapter 10
r Estimating with Confidence 227

0.527 inches for every one inch increase in the brother's height. (c) Tonya's predicted height is
10.10 (a) A pharmaceutical product is a medication or device used to treat patients. This =
y = 26.74 +0.5270 x 70 63.63 inches. (d) No, the residual plot above (right) shows a clear
analysis is important to make sure that the production process is working properly and the quadratic pattern and r' = 0.311, so only 31.1% of the variability in the heights of sisters is
medication contains the correct amount of the active ingredient. (b) The parameter of interest is explained by the linear regression line using brother's height as the explanatory variable.
p= the true concentration of the active ingredient in this specimen. The repeated measurements
are clearly not independent because they are taken on the same specimen. However, we are told
that these repeated measurements follow a Normal distribution and the analysis procedure has no I 0.13 For 80 video monitors the margin of error is m = 1.645 ;b =7.9, which is half of 15.8,
v80
bias. The sample mean is x = 0.8404 and the 99% confidence interval for Jl, is
the margin of error for n = 20.
0
0.8404 2.576(
68
~ )"" 0.8404 0.010 I= (0.8303,0.8505). With 99% confidence, we estimate the
10.14 (a) A 98% confidence interval for Jl =the mean scale reading for a 10 gram weight is
true concentration of the active ingredient for this specimen to be between 0.8303 and 0.8505
grams per liter. (c) "99% confident" means that if we repeated the entire process of taking three 10.0023 2.33 ( 0.0002)
.j5 ""10.0023 0.0002 = (10.0021, I0.0025). We are 98% confident that the
measurements and computing confidence intervals over and over a large number of times, then
99% of the resulting confidence intervals would contain the true concentration of the active mean scale reading for a 10 gram weight is between 10.0021 and 10.0025 grams. Notice that 10
ingredient. is not in the 98% confidence interval, so there is some evidence that the scale might need to be
2 33 0 0002
adjusted. (b) To meet the specifications, we need n ;:> ( x
0.0001
)' =21.7156 or n = 22
10.11 (a) We want to estimate p= the mean length of time the general managers have spent measurements.
with their current hotel chain. The sample of managers is a SRS (stated in question) and the
sample size is large enough (n = 114) to use the central limit theorem to assume normality for the I 0.15 (a) A 95% confidence interval for Jl =mean score gain on the second SAT Mathematic

~) 22 3.1 = (18.9,25.1). With 95% confidence we estimate the mean


sampling distribution. The managers' length of employment is independent and so the
conditions for a confidence interval for a mean are met. 99% C.I. for p= exam is 22 1.96(
viOOO
=
11.78 2.576( ~) =11.78 0.77 = (11.01, 12.55). With 99% confidence, we estimate that the gain for all Math SAT second try scores to be between 18.9 and 25.1 points. (b) The 90%
confidence interval is (19.4, 24.6) and the 99% confidence interval is (17.93, 26.07). (c) The
mean length of time the general managers have spent with their current hotel chain is between confidence interval widens with confidence level. See the figure below.
11.01 and 12.55 years. (b) 46 out of 160 did not reply. This nonresponse could have affected
the results of our confidence interval considerably, especially if those who didn't respond
differed in a systematic way from those who did.

There is a very weak, positive association

I 0.16 (a) When n = 250, a 95% confidence interval for Jl is

22 1.96( }so)= 22 6.2 = (15.8,28.2). (b) (a) When n = 4000, a 95% confidence interval

~)"'221.55=(20.45,23.55). (c) Themarginsoferrorare3.099,6.198,


"M ~ M M M m .M n M ~~ ~ u M w ro n n .n
H<ilght Of bfother"_(lndl.es} Height or brother.(irid!es)
forp is 221.96(
v4000
(b) Let y =sister's height and x =brother's height. The least squares regression line is and 1.550, respectively. The margin of error decreases as the sample size increases (by a factor
y = 26.74 + 0.5270x. The slope indicates that the sister's height will increase on average by
228 Chapter 10
T Estimating with Confidence 229

2
96 50 10.23 (a) We can be 99% confident that between 63% and 69% of all adults favor such an
of I/ Fn ). (d) To meet the specifications, we need n .<! ( 1. x ) = 240 I , so take n = 240 I
0 66 034
2
. amendment. Solve the equation z' x = 0.03 to find the critical value z*. The critical
students. 1664
value is z' = 2.58, which means that the confidence level is 99%. (b) The survey excludes
10.17 (a) The computations are correct. (b) Since the numbers are based on a voluntary people without telephones, a large percentage of whom would be poor, so this group would be
response, rather than an SRS, the methods of this section cannot be used; the interval does not underrepresented. Also, Alaska and Hawaii are not included in the sample.
apply to the whole population.
10.24 (a) A boxplot (left) and a Normal probability plot (right) are shown below. The median
10.18 (a) We don't know if this interval contains the true percent of all voters who favored is almost exactly in the center of the box and the left whisker is only slightly longer than the right
Kerry, but it is based on a method that captures the unknown parameter 95% of the time. (b) whisker, so the distribution of healing rate appears to be roughly symmetric. The linear trend in
Since the margin of error was 2%, the true value ofpcould be as low as 49%. Thus, the the Normal probability plots indicates that the Normal distribution provides a reasonable model
confidence interval contains some values ofp which suggest that Bush will win. the rate.
(c) The proportion of voters that favor Kerry is not random---either a majority favors Kerry,
or they don't. Discussing probabilities about this proportion has little meaning: the "probability"
the politician asked about is either I or 0 (respectively).

10.19 (a) We would want the sample to be a SRS of the population under study and the
observations to be independently sampled. As the sample size is only 25, the population should
be approximately normally distributed. (b) A 95% confidence interval for JL is 76 12 = (64,
88). We are 95% confident that the population mean is between 64 and 88, within a range of 12
on either side of the sample mean. (c) When using this method for repeated samples, 95% of the
resulting confidence intervals will capture JL.
(b) A 90% confidence interval for JL is 25.67 I = 25.673.10 = (22.57,28.77). We are
10.20 (a) A SRS of all seniors was obtained and the sample size (n = 500) is large enough so
that the distribution o the sample mean will be approximately Normal. The population size is 90% confident that the mean rate of healing is between 22.57 and 28.77 micrometers per hour.
also clearly greater than SOOx 10 = 5000, so the conditions for using the confidence interval for a (c) Her interval is wider. To be more confident that an interval contains the true population
mean are satisfied. A 99% confidence interval for JL is parameter, we must increase the length of the interval, which means allow a larger margin of
error. The margin of error for 95% confidence is larger than for 90% confidence.
4612.576( ~ )=46111.52 =(449.48,472.52). We are 99% confident that the mean SAT
2
, 8 1.645x8
Math score for all California seniors is between 449 and 4 73 points. (b) In order to estimate the 10.25 We want z Fn ~l,soweneed n<! (
1 )
=173.19 orn= 174newts.
mean within 5 points, the margin of error needs to be 5 so the sample size must be
2
2.576xl00)
n :<! ( = 2654.31. Taken = 2,655 students. 10.26 (a) The 90% confidence interval for the mean rate of healing for newts is (22.565,
5 28.768). The calculator screens are shown below.

10.21 No. A confidence interval describes a parameter, not an individual measurement.

10.22 The sample sizes and critical values are the same, but the variability in the two
populations is different. The margin of error for the 3'd graders will be smaller because the
variability in the heights of 3'd graders is smaller than the variability in the heights of children in
kindergarten through fifth grade.
230 Chapter 10 Estimating with Confidence 231

Zinterval Zlnterval .. plots indicates that the Normal distribution is reasonable, even though the sample size is very
small.
I nPt: lll?IE' Stats (22.565,28.768)
a:15 x--25.66666667
List: L1 Sx.8.324308839
Fre-:.j:1 n=18
C-Level:90
Calculate

(b) For steps I, 2, and 4 in the Inference Toolbox, see the solution to Exercise 10.11 (a). The
99% confidence interval for the mean number of years for the hotel managers is (11.008,
12.552). The calculator screens are shown below. Since the observations are taken from the same production run, they are not independent.
However, they are representative of the vitamin C content ofthe CSB produced during this run,
Zintervai Zlnterval so we will proceed as if the conditions are satisfied. (b) Since n = 8, df= 7 and t' = 2.365. A
InPt:Data ~ ( 11.008, 12. 552)
,u is 22.5 2.36sc~
13
cr:3.2 x=11. 78 95% confidence interval for ) =22.5 6.013 = (16.487,28.513). We are
x: 11.78 n=114 95% confident that the mean vitamin C content of the CSB produced during this run is between
n: 114 16.487 and 28.513 mg/1 00 g.
C-LeveL: 99
Calculate I 0.31 (a) A histogram (left) and a stemplot (right) are shown below. The distribution is slightly
left-skewed with a center around 19. The gas mileages range from a minimum of 13.6 to a
maximum of22.6.
Stem-and-leaf of mpg N ~ 20

10.27 (a) SEM = ffi =1.7898. (b) SinceSEM = }3 =0.01, the standard deviation iss= 0.0173.
Leaf Unit

1 13 6
= 0.10

4 14 368
7 15 668
10.28 (a) df=ll, t'=l.796. (b) df=29, t'=2.045. (c) df=17, t'=l.333. 7 16
8 17 2
10.29 (a) 0.0228. (b), (c), and (d): 10 18 07
10 19 144
df P(t > 2) Absolute difference 7 20 9
2 0.0917 0.0689 6 21 05
10 0.0367 0.0139 4 22 4566
30 0.0273 0.0045
50 0.0255 0.0027
100 0.0241 0.0013 (b) Yes. The sample size is not large enough to use the central limit theorem for Normality.
(e) As the degrees of freedom increase, the area to the right of 2 under the t distributions gets However, there are no outliers or severe skewness in the sample data that suggest the population
closer to the area under the standard normal curve to the right of2. distribution isn't Normal. (c) x= 18.48, s = 3.12 and n = 20, so standard error is 0.6977. Since
n = 20, df= 19 and t* = 2.093, so the margin of error is 1.46. (d) The 95% confidence interval
I~
I 0.30 (a) The conditions are: SRS, Normality, and Independence. A random sample of n = 8 for ,u is 18.48 1.46 = (17.02, 19.94). With 95% confidence we estimate the mean gas mileage
i'
was taken from a production run. A boxplot (left) and Normal probability plot (right) are shown for this vehicle to be between 17.02 and 19.94 mpg. (e) No, gas mileage depends on driving I
below. Even though the left whisker is a little longer than the right whisker, the distribution of habits, and it is unlikely that this one vehicle will be representative of other similar vehicles.
vitamin C level appears to be roughly symmetric. The linear trend in the Normal probability

'i
'!
232 Chapter 10 Estimating with Confidence 233

10.32 (a) The histogram (left) and stemplot (right) below show some left-skewness; however,
for such a small sample, the data are not unreasonably skewed. There are no outliers. (b) With
x= 59.59% and s = 6.26% nitrogen, and t* = 2.306 (df= 8), we are 95% confident that the mean
percent of nitrogen in the atmosphere during the late Cretaceous era is between 54.78% and
64.40%.
Stem-and-leaf of nitrogen N = 9
Leaf Unit = 1. 0

1 4 9
2 5 1
2 5
3 5 4
3 5
3 5 10.35 (a) Taking d =number of disruptive behaviors on moon days- number on other days, we
4 6 0 want to estimate Jid =the mean difference for dementia patients. We don't know how the
(2) 6 33
3 6 445 sample was selected. If these patients aren't representative of the population of interest, we
won't be able to generalize our results. The sample size is too small (n = 15) for the central limit
theorem to apply, so we examine the sample data. The distribution is roughly symmetric with no
I 0.33 (a) Tile histogram (left) and stemplot (right) below show that the distribution is roughly outliers, which gives us no reason to doubt the Normality of the population of differences. We
symmetric with mean x = 3.62 and standard deviations= 3.055. (b) Using df= 30, t* = 2.042, assume that these 15 difference measurements are independent. x, = 2.43, s, = 1.46, n = 15, and
and the interval is (2.548, 4.688). Software and the TI calculator gives (2.552, 4.684) using df=
33. With 95% confidence we estimate the mean change in reasoning score after 6 months of t* for df=l4 is 2.145. Thus, the 95% confidence interval for Jid is 2.43 2.145(~)= (1.62,
piano lessons for all pre-school children is between 2.55 and 4.68 points. (c) No. We don't lmow
that students were assigned at random to the groups in this study. Also, some improvement 3.24). On average, the patients have between 1.62 and 3.24 more episodes of aggressive
come with increased behavior during moon days than other days. (b) No, this is an observational study; there could
Stem-and-leaf of scores N ~ 34 be any number of reasons that there is increased aggressive behavior.
Leaf Unit= 0.10
1 -3 0
3 -2 00 10.36 (a) With data on all U.S. presidents formal inference makes no sense. (b) The 32
4 -1 0 students in an AP Statistics class are not a SRS of all students, so the t interval should not be
5 -0 0
6 0 0
used to make an inference about the mean amount of time all students spend on the internet. (c)
7 1 0 The stemplot is strongly skewed to the left and the sample size is n = 20, so we cannot trust the t
10 2 000 interval.
15 3 00000
(7) 4 0000000
12 5 00 10.37 (a) df= 9, t' = 2.262. (b) df= 19, t' = 2.861. (c) df=6, t' = 1.440.
10 6 000
7 7 00000
2 8 10.38 (a) The histogram (left) and stemplot (right) below show one observation is somewhat
2 9 00 smaller than the others, but it would not be classified as an outlier. The plots do not show severe
skewness or any outliers, so the Normal condition appears to be reasonable based on this small
10.34 (a) Neither the subjects getting the capsules nor the individuals providing them with the sample.
capsules knew which capsules contained caffeine and which were placebos. (b) The differences
(number with caffeine- number with placebo) in the number of beats for the II subjects are 80,
22, 17, 131,-19, 3, 23, -1, 20,-51, and -3. The histogram (left) and boxplot (right) below
show that the distribution is not symmetric and there are 3 outliers. The mean difference xd =
20.2 is greater than the median difference of 17, and the standard deviation of the differences
sd = 48.75 is much larger than the IQR of26. (c) No, the t procedure should not be used
because the sample size is small (n = 11) and the differences in beats per minute are clearly not
Normal.
234 Chapter 10 Estimating with Confidence 235

Stem-and-leaf of levels
Leaf Unit ~ 0.10
N 9 confidence interval is 1.45 1.729e,:OJ = (0.212, 2.69). We are 90% confident that the mean
increase in listening score after attending the summer institute improves by between 0.212 and
1 0 7
1 1 2.69 points. (c) No, their listening skills may have improved for a number of other, for instance
1 2 by studying every night or by living with families that only spoke Spanish. There was no control
2 3 5
for either.
4 4 09
(1) 5 5
4 6
4 7 04
2 8 14

population of adults, but since only healthy men were used for this study,
the results can only be generalized to the population of healthy men. In fact, these healthy men
were not randomly selected, so we need to restrict the inference to other healthy men with
characteristics similar to the men in the study. In short, the limitations of inferences based on
observational studies with small samples (of volunteers) are clearly illustrated with this exercise.
s/ =
The mean of the nine observations is :X= 5.5 percent, whiles= 2.517 and J9 0.839. With
df= 8, the critical value is t* = 1.860 and 90% confidence interval is 5.5 1.860x0.839 = 3.939
to 7.061. We are 90% confident that the mean percent change in polyphenol level for healthy
men with characteristics similar to those in this study is between 3.9% and 7 .l %. (c) The data 10.42 (a) The distribution cannot be normal, because all values must be (presumably) integers
are paired because the polyphenollevel for each man is collected before and after the wine between 0 and 4. (b) The sample size (282) should make the t methods appropriate, because
drinking period, so the observations are clearly dependent. However, this is not a matched-pairs
experiment because this group of men only received one treatment. the distribution of ratings can have no outliers. (c) The margin of error is t' ~ , which
v282
10.39 Let I'= the mean HA V angle in the population of all young patients who require HAV is either 0.161 I (Table C' or 0.1591 (software):
df t Interval
surgery. The t interval may be used (despite the outlier at 50) because n is large (close to 40).
Table C 100 2.626 2.22 0.1611 - 2.0589 to 2.3811
The patients were randomly selected and it is assumed independent. The mean and standard
Software 281 2.5934 2.22 0.1591-2.0609 to 2.3791
deviation of the angle of deformity are x= 25.42 and s = 7.475. Using df= 30 and t* = 2.042,
We are 99% confident that the mean ratmg for boys with ADHD IS between 2.06 and 2.38 for
the 95% confidence interval for I' is 25.422.042C~J= (22.944, 27.896). Software and the this item. (d) Generalizing to boys with ADHD in other locations is not recommended. These
boys were clearly not a random sample from the population of all boys with ADHD.
TI calculators use df= 37 and give the interval22.9642 to 27.8779. With 95% confidence we
estimate the mean HAV angle for all young patients who require HA V surgery to be between 23 . the previous exercise, except fior the chmce
10.4 3 These intervals are constructed as m . o ft'.
and 28. df t Interval
90% confidence TableC 100 1.660 2.22 0.1018-2.1182 to 2.3218
10.40 (a) Dropping the outlier at 50, we have x= 24.76 and s = 6.34. Using df= 30 and t* = Software 281 1.6503 2.22 0.1012 = 2.1188 to 2.3212
2.042, the 95% confidence interval for I' is 24.76 2.042( ffi J = (22.632, 26.888). Software and
95% confidence Table C 100
Software 281
1.984 2.22 0.1217- 2.0983 to 2.3417
1.9684 2.22 0.1207- 2.0993 to 2.3407
the TI calculators use df= 36 and give the interval22.6431 to 26.8704. (b) The interval in part As the confidence level mcreases, the width of the mterval mcreases.
(a) is narrower than the interval in Exercise 10.39. Removing the outlier decreased the standard
99%
deviation and consequently decreased the margin of error.
95%
10.41 (a) The histogram (left) and boxplot (right) below show that the differences are slightly 90%
left-skewed, with no outliers, so the Normal distribution is reasonable and the t interval should
be reliable. (b) The mean of the differences is xd = 1.45, the standard deviation is sd = 3.203,
2.0 2.1 2.2 2.3 2.4
and the standard error of the mean is SEM = 0.716. Using df= 19 and t* = 1.729, the 90%
236 Chapter 10 Estimating with Confidence 237

10.44 (a) The mean difference for these 14 newts is xd = -5.71, the standard deviation is the fee is scaled for institution size, larger institutions can more easily absorb it. These other
sources of error are much more significant than sampling error, which is the only error accounted
s = 10 56 and the standard error ofthe mean is SEM =
d '
10
;;
~14
=2.82. (b) Using df= 13 and t' for in the margin of error from part (a).
1
= 1.771, the 90% confidence interval is -5.71 1.77{ ~j~t) or from -10.71 to -0.71 10.49 (a) The population is all college undergraduates and the parameter is p =the proportion of
college undergraduates who are abstainers. (b) Example 10.15 states that the sample is an SRS.
micrometers per hour. If a large number of samples were obtained and the confidence intervals The population (all undergraduates) is at least 10 times the sample size of 10,904. The number
were computed, then approximately 90% of the intervals would contain the true mean difference of successes np = 2105 and the number of failures n(l- p)= 8799 are both at least 10, so the
in healing rates. (c) No. A histogram (left) and boxplot (right) are shown below. Since the conditions for constructing a confidence interval are satisfied. (c) A 99% confidence interval for
sample size (n = 14) is small and the distribution of the difference is skewed to the left with an
p is 0.193 2.576~0.193x 0.807/10,904 = (0.183,0.203). (d) With 99% confidence, we
outlier, the Normal distribution is not appropriate. The t interval should not be used to make
inferences about estimate between 18.3% and 20.3% of all college undergraduates are classified as abstainers.

10.50 The report should include the sample proportion p = 1127 =0.690lor approximate
1633
sample percent= 69% and the margin of error 1.96~0.6901(1-0.6901)/1633 =0.022 or 2.2
percentage points for 95% confidence. News release: In January 2000, the Gallup Organization
discovered that approximately 69% of adults were satisfied with the way things are going in the
United States. A random sample of 1633 adults participated in the poll and the margin of error is
about 2.2 percentage points. Do you think our readers are still satisfied with the way things are
going in the United States? The results of our local survey will be printed next Wednesday!

0 179 0 821
10.51 (a)Theproportionisp=.!2=o.179 andthestandarderroris SE. = x =0.042.
10.45 (a) The population is the 175 residents ofTonya's dorm and pis the proportion of all the M M
residents who like the food. (b) The sample proportion is p = ~~ = 0.28 . (c) No, the population is (b) A 90% confidence interval is 0.179 1.645 x 0.042 = (0.110,0.247). With 90% confidence,
we estimate that between 11.0% and 24.7% of all applicants lie about having a degree.
not large enough relative to the sample (N = 175 < 500 = 1Ox50).

10.46 (a) The population is the 2400 students at Glen's college, and pis the proportion who 0 54 6
10.52 (a) The standard error is SE. = x 0.4 = 0.0156, so the 95% confidence interval is
1019
p = ~~ = 0.76. (b) Yes-we have an
p

believe tuition is too high. (b) The sample proportion is 0.54 1.96x 0.0156 = (0.509,0.571). We are 95% confident that the proportion of adults who
SRS, the population is 48 times as large as the sample, and the success count (38) and failure would answer "Yes" is between 50.9% and 57 .I%. Notice that 50% is not in the confidence
count (12) are both greater than 10. interval. The margin of error is 1.96x0.0156 or about 3%, as stated. (b) The sample sizes for
men and women are not provided. (c) The margin of error for women alone would be greater
10.47 (a) The population is all adult heterosexuals and p is the proportion of all adult than 0.03 because the sample size for women alone is smaller than 1019.
heterosexuals who have had both a blood transfusion and a sexual partner from a high risk of
AIDS group. (b) The sample proportion is p= 0.002. (c) No, there are only 5 or 6 "successes" 10.53 (a) The margins of error are 1.96x ~ p(l- p)/100 = 0.196x~ p(l- p) . See the table below.
in the sample. (b) With n = 500, the margins of error are 1.96x ~ p(l- p)/500. The new margins of error are

10.48 (a) The standard error of p isSEi= ~0.87x 0.13/430,000 =0.0005129. For 99% less than half their former size (in fact, they have decreased by a factor of }s = 0.447 ).
confidence, the margin of error is 2.576x SEi =0.001321. (b) One source of error is indicated p 0.1 0.2 0.3 0.4 0.5 0.6 0. 7 0.8 0.9
by the wide variation in response rates: We cannot assume that the statements of respondents (a) m.e. 0.0588 0.0784 0.0898 0.0960 0.0980 0.0960 0.0898 0.0784 0.0588
represent the opinions ofnonrespondents. The effect of the participation fee is harder to predict, (b) m.e. 0.0263 0.0351 0.0402 0.0429 0.0438 0.0429 0.0402 0.0351 0.0263
but one possible impact is on the types of institutions that participate in the survey: Even though
238 Chapter 10 Estimating with Confidence 239

alcohol. We do not know that the examined records came from an SRS, so we must be cautious.
10.54 (a) To meet the specifications, we need 1.96~ 0 .44 : 056 ~ 0.03 or Both np=542 aod n(l-p)=1169areatleast 10. Therearemorethan IOxl711 = 17,110fatally
injured bicyclists in the United States. A 99% confidence interval for p
n2 ( -1.96 J' x0.44x0.56=1051.74. Takeasampleofn= 1052adults. (b) With the
0.03 is0.317 2.576 3171711x 0683 = (0.288,0.346). With 99% confidence, we estimate that between

. 1.96 ' . 28.8% aod 34.6% of all fatally injured bicyclists aged 15 or older would test positive for alcohol.
(o.m J x 0.5 x 0.5 = 1067.11 or I 068 adults. The conservative
conservative guess, we need n 2 - -
(b) No. For example, we do not know what percent of cyclists who were not involved in fatal
approach requires 16 more adults. accidents had alcohol in their systems. Many other factors, such as not wearing a helmet, need to
be considered.
I 0.55 (a) The 95% confidence interval for pis 0.64 1.96~0.64x 0.36/1028 = ( 0.61, 0.67).
. , d l.75x0.25
With 95% confidence we estimate that between 61% aod 67% of all teens aged 13 to 17 have 10.60 Our guess IS p = 0.75, so we nee 1.9 6 n ,; 0.04 or
TV s in their rooms. (b) Not all samples will be the same, so there is some variability from
sample to sample. The margin of error accounts for this variability. (c) Teens are hard to reach 96
n 2 ( 1. J' x 0.75 x 0.25 = 450.19. Take an SRS of n = 451 Americaos with at least one ltaliao
aod often unwilling to participate in surveys, so nonresponse bias is a major "practical difficulty" 0.04
for this type of poll. Teens can also be sensitive so response bias associated with the wording of grandparent.
the question or the verbal emphasis by the interviewer may be a problem.

10.56 Our guess is p' = 0.7, so we need


07 0
1.645~ : .3,; 0.04 or
10.61 (a) The sample proportion is
3
p=
: = 0.3275 and a 95% confidence interval for pis
1 91
0.32751.96~0.3275x 0.6725/1191 = (0.3008,0.3541). (b) Only 45 congregations did not
1.645J' . participate in the study, so the nonresponse rate is 45/1236 = 0.0364 or about 3.6%. This
n2 ( - - x0.7x0.3=355.17. TakeaoSRSofn=356students. (b) With p=0.5 aodn=
0.04 nonresponse rate is quite small, which suggests that the results should be reliable: If we had

1.645~ 0 5 x 0 5 =0.0436.
information for the few congregations that failed to respond, our conclusions would probably not
356, themarginoferroris chaoge. very much. (c) Speakers aod listeners probably perceive sermon length differently (just
356
as, say, students aod teachers have different perceptions of the length of a class period).
Listeners tend to exaggerate and report that the sermons lasted longer than they actually did,
10.57 (a) The sample proportion is p = 171 = 0.1943 and the 95% confidence interval is while speakers are more likely to report shorter sermons. Since the key informants provided the
880 information, the estimate of the true population proportion may be too low.
0.1943 1.96~0.1943x 0.8057/880 = (0.1682, 0.2205). We are 95% confident that between
16.82% and 22.05% of all drivers would say that they had run at least one red light. (b) More 10.62 (a) The sample proportion is p = 3547 = 0.6341. The standard error is
than 171 respondents have run red lights. We would not expect very many people to claim they 5594
have run red lights when they have not, but some people will deny running red lights when they
SE, = p(l- p) =0.00644, so the margin of error for 90% confidence is 1.645 xSE, = 0.0106, and
have. 5594
the interval is 0.6235 to 0.6447. This interval was found using a procedure that includes the true
10.58 (a) Our guess is p' = 0.2, so we need 2.576~ 0 2 : 08 ~ 0.015 or proportion 90% of the time. (b) Yes, we do not know if those who did respond can reliably
represent those who did not.

2.576J' I 0.63 No, the data are not based on an SRS, aod therefore inference procedures are not reliable
n2 ( _ x0.2x0.8=4718.77. TakeanSRSofn=4719customers. (b) With p=O.l and
0 015 in this case. A voluntary response sample is typically biased.

n=4719,themarginoferroris 2.576 O.lx0. 9 0.0112.


4719 10.64 (a) The sample proportion is p = 107 = 0.8425 aod a 99% confidence interval for pis
127
10.59 (a) The population of interest is all bicyclists aged 15 or older who were fatally injured 0.8425 2.576~0.8425 x 0.1575/127 = ( 0.7592, 0.9258). With 99% confidence, we estimate
and p is the proportion of all fatally injured bicyclists aged 15 or older who tested positive for
240 Chapter 10 Estimating with Confidence 241

that between 76% and 93% of all undergraduate students pray at least a few times a year. (b)
No, the fact that these students were all in psychology and communications courses makes it 10.65 (a) The histogram below shows that the distribution is slightly skewed to the right, but the
seem unlikely that they are truly representative of all undergraduates. Normal distribution is reasonable.

CASE CLOSED!
I. Graphical summaries of the call response times are shown below. The histogram (left) and
boxplot (right) clearly show that the distribution of the call response times is strongly skewed to
the right with several outliers. The Normal distribution is certainly not appropriate for these
response times. The mean response time is 31.99 seconds and the median response time is 20
seconds. The large outliers clearly have an impact on the mean, and they will also influence the
deviation. The standard deviation is 37.2 seconds and the is 32 seconds.

(b) The sample mean is :X= 224.002 mm and the standard deviation is s = 0.062, very close to
the known standard deviation in the population. A 95% confidence interval for p is

224.002 1.96 O.~O (223.973, 224.031). With 95% confidence, we estimate the mean critical
=
v16
dimension for auto engine crankshafts of this type are between 223.973 mm and 224.031 mm.
(c) In repeated samples of this size, 95% of the intervals obtained will contain the true mean.
ts
(d) Th e spect'fitcatton 1.96 0.06
vn 1 ~ .02, so we need n ;o, (1.96x0.06)'
0.02
=34.57 or 35 crankshafts.
2. Let p denote the mean call response time at the bank's customer service center. Using df=

100 and t' = 1.984, a 95% confidence interval for p is 31.991.984


3
g=
v300
(27.73,36.25).
10.66 (a) If we take a different sample, then we will probably get a different estimate. There is
variability from sample to sample. (b) The sample proportion is p = 0.37 and the 95%
Software and the Tl calculators give the interval from 27.7673 to 36.2194 seconds. confidence interval for pis 0.37 1.96~0.37x0.63/IOOO =(0.3401,0.3999). (c) Yes, the
3. Let p denote the proportion of call response times that are at most 30 seconds at the bank's
=
margin of error is 1.96~0.37 x 0.63/1000 0.0299 or about 3 percentage points. (d) Yes, most
people are not thinking about football during June so the proportion would probably decrease
customer service center. The sample proportion is p =
203
300
= 0.6767 and the 95% confidence and more people would say that baseball, tennis, soccer, or racing was their favorite sport to
watch.
=
interval for pis 0.6767 1.96~0.6767x0.3233/300 (0.6238,0.7296).
10.67 (a) Using df= 26 and t' = 2.779, a 95% confidenceinterval for p is
4. The distribution of response times clearly does not follow a Normal distribution. The major
conditions regarding random sampling and independence are satisfied for the inferences below. 114.92.779 ~ (109.93,119.87). With 95% confidence we estimate that the mean seated
=
...;27
However, it is worth noting that we are relying on the central limit theorem and the robustness of
systolic blood pressure of all healthy white males is between 109.93 and 119.87 mm Hg. (b)
the t procedures for the inference regarding the mean call response time because the sample size
The conditions are SRS, Normality, and Independence. The most important condition is that the
(n = 300) is large. We are 95% confident that the mean call response time is between
27 members of the placebo group can be viewed as an SRS of the population. The Normality
approximately 28 and 36 seconds. The large call response times, which unfortunately occur in
condition requires that the distribution of seated systolic BP in this population is Normal, or at
this business, clearly have an impact on the mean. With 95% confidence, we estimate the
least not too nonNormal. Since the sample size is moderate, the procedure should be valid as
proportion of calls answered within 30 seconds to be between 62% and 73%. The intervals
long as the data show no outliers and no strong skewness. We must assume that these 27
reported above are based on methods that will include the true mean and true proportion for all
measurements are independent.
calls to your customer service center 95% of the time. (P.S. As you know, there is another way
to describe the center of the call waiting times. This statistic is known as the median and it is
I 0.68 (a) For each subject, subtract the weight before from the weight after to determine the
very useful for skewed distributions. If you would like to learn more about inferences based on
weight gain. For example, the weight gain for Subject I is 61.7- 55.7 = 6 kg. The mean weight
the median, we can schedule another meeting.)
242 Chapter 10 Estimating with Confidence 243

gain for all 16 adults is x, = 4.7313 kg, the standard deviation is sd = 1.7457 kg, and the standard

error of the mean is SEM = 1.~-r 0.4364 kg. Using df= 15 and t' = 2.131, the 95%
16

confidence interval is 4.7313 2.13 { l.~~r) or from 3.8013 to 5.6613 kg. Software and the TI

calculators give the interval (3.8010, 5.6615). (b) Because there are 2.2 kg per pound, multiply
the value in kilograms by 2.2 to obtain pounds. The confidence interval from software and the
calculators becomes 8.3622 to 12.4553 lbs. (c) No, the value 16 is not in our 95% confidence
interval. The data suggest that the excess calories were not converted into weight. The subjects
must have used this energy some other way.
(b) The data are from a random sample, and the sample size is large (n = 50), so the central limit
660 theorem tells us that the sampling distribution of x is approximately Normal. The population of
10.69 (a) The sample proportion is p = = 0.44 and the 95% confidence interval for pis
1500 commercial Internet service providers is also much larger than !OxSO = 500 users. (c) The
=
0.44 1.96~0.44 x 0.56/1500 (0.4149, 0.4651). With 95% confidence, we estimate that sample mean is x = 20.9, the standard deviation is s = 7.6459 and the standard error of the mean
is SEM = ~ 1.0813. Using df= 40 and t' = 1.684, the confidence interval for J.l is
between 41.,5% and 46.5% of all adults would use alternative medicine. (b) The news report 7 9

should contain the estimate and the margin of error (0.0251 or 2.51 %). A brief, nontechnical 50

~ 9 ) = (19.08,22.72).
explanation of"95% confidence" might also be included. News Release: A nationwide survey
discovered that 44% of adults would use alternative medicine if traditional medicine was not 20.9 1.684c Software and the TI calculators give (19.0872, 22.7128),
producing the results they wanted. A random sample of 1500 adults participated in the survey
using df = 49. With 90% confidence, we estimate the mean cost for users of commercial Internet
and the margin of error is about 2.5 percentage points. What percent of our readers do you think.
service providers in August 2000 to be between $19.08 and $22.72.
would turn to alternative medicine? The results of our local survey will be printed next Monday!
10.72 (a) The sample mean is 7.5x60 = 450 minutes. The margin of error is 20 minutes, so
. . 221
10.70 (a) The samp Ie proportwn IS p =
270
= S'E
0 .8185 and i> =
0.8185x0.1815 .
270
= 0 02346 ,
1.96 ~ = 20 minutes. Thus, the standard deviation iss=
20
x.J40 = 64.5363 minutes. (b)
=
so the margin of error for a 99% confidence interval is 2.576x 0.02346 0.0604. (b) Using the
v40 1.96
This interpretation is incorrect. The confidence interval provided gives an interval estimate for
estimate from part (a) as our guess p' = 0.82, we need 2.576~ 0 82 : O.l 8 :-:; 0.03 or the mean lifetime of batteries produced by this company, not individual lifetimes. (c) No, a
confidence interval provides a statement about an unknown population mean, not another sample
2 576 mean. (d) We are 95% confident that the mean lifetime of all AA batteries produced by this
n <o (
0.03
=
)' x 0.82x 0.18 1088.27. Take an SRS ofn = 1089 doctors. In order to guarantee company is between 430 and 470 minutes. This interval is based on a method that will capture
the l!ue mean lifetime 95% of the time.
that the margin of error is less than 3 percentage points, the conservative approach with p' = 0.5
2 576
should be used. Thus, we would need n <o (
0.03
)' x 0.5 x 0.5 =1843.27 or 1844 doctors. 10.73 (a) The sample proportion is
750
p= =
0.4202 andSEP. = 0.4
202 0 5798
x 0.011683 =
1785 1785 ,

I 0.71 (a) The histogram below shows that the distribution is skewed to the right and the boxplot
=
so a 99% confidence interval for pis 0.4202 2.576x O.Ql 1683 (0.390,0.450). With 99%
below shows three low and five high outliers. confidence, we estimate that between 39% and 45% of all adults attended church or synagogue
within the last 7 days. (b) Using the conservative guess p' =0.5, the specification is

2.576~ 0 5 nx 05 :-:; 0.0 I, so we need n <o ( 20.0!



576
)' x 0.5 x 0.5 = 16589.44. Take an SRS of n =

16,590 adults. The use of p' = 0.5 is reasonable because our confidence interval shows that the
actualp is in the range 0.3 to 0.7.
244 Chapter 10 245

10.74 (a) The differences are spread from -0.018 to 0.020 g, with mean x = -0.0015 g and Chapter 11
standard deviation s = 0.0122 g. A stemplot is shown below, but the sample is too small to make
judgments about skewness or symmetry. I
11.1 (a) ll =the mean score for all older students at this college. (b) If f.1 = 115, the sampling
Stem-and-leaf of diff N = 8 I
distribution of x in Normal with mean 115 and standard deviation 30/.fi5 = 6 or N(ll5, 6).
Leaf Unit = 0.0010 See the sketch below (on the left). (c) Assuming H, is true, observing a mean of 118.6 or higher
2 -1 85
2 -1
I
would not be surprising, but a mean of 125.7 or higher is less likely, and therefore provides more
4 -0 65 evidence against H,. (d) Yes, the sample size is not large enough (n = 25) to use the central
4 -0 I
4 0 2 limit theorem for normality. (e) No, the older students at this college may not be representative
3 0 55 of older students at other in the USA.
1 1
1 1
1 2 0

(b) Using df= 7 and t' = 2.365, a 95% confidence interval for f.1 is

-0.0015 2.365 ( 0.0122)


-J8 = -0.0015 0.0102 =(-0.0117,0.0087) We are 95% confident that the
mean diffefence in TBBMC readings is between -0.0117 and 0.0087 g. (c) The subjects from
this sample may be representative of future subjects, but the test results and confidence interval
are suspect because this is not a random sample.

11.2 (a) f1 =the mean hemoglobin level for all children of this age in Jordan. (b) If f.1 = 12, the
sampling distribution of x is Normal with mean 12 g/dl and standard deviation
1.6/.J50 = 0.2263 g/dl. See the sketch above (on the right). (c) A result like x = 11.3g/dllies
way down in the low tail of the density curve (over 3 standard deviations below the mean), while
11.8 g/dl is fairly close to the middle. If f.1 = 12g/dl, observing a mean of 11.8 g/dl or smaller
would not be too surprising, but a mean of 11.3 g/dl or smaller is extremely unlikely, and it
therefore provides strong evidence that f.1 < 12g/dl. (d) No, since the sample size is large (n =
50), the central limit theorem says that the sampling distribution of xis approximately N(12 g/dl,
0.2263 g/dl). (e) No, we are told that this is sample, but we don't know if these children were
randomly selected from any larger population. The only way we can generalize to a larger
population is if this sample is representative ofthe larger population.

11.3 (a) H, :,u=l15; H. :f.1>115. (b) H, :f.1=12; H. :,u<12.

11.4 (a) f.1 =the mean gas mileage for Larry's car on the highway.
H 0 : f.1 = 26 mpg; H. : ,u > 26 mpg . (b) p = the proportion of teens in your school who rarely or
never fight with their friends. H0 : p = 0.72; H.: p * 0.72.
11.5 (a) p =the proportion of calls involving life threatening injuries where the paramedics
arrived within 8 minutes. H 0 : p = 0.78; H. : p > 0.78. (b) f.1 =the mean percent of local
household food expenditures used for restaurant meals. H, : ,u = 30; H. : f.1 * 30 .
~I

244 Chapter 10 245

10.74 (a) The differences are spread from -0.018 to 0.020 g, with mean x =-0.0015 g and Chapter 11
standard deviation s = 0.0122 g. A stemplot is shown below, but the sample is too small to make
judgments about skewness or symmetry. 11.1 (a) 11 =the mean score for all older students at this college. (b) If p = 115, the sampling
Stem-and-leaf of diff N = 8 distribution of x in Normal with mean 115 and standard deviation 30/ .J25
= 6 or N(ll5, 6).
Leaf Unit = 0.0010 See the sketch below (on the left). (c) Assuming H 0 is true, observing a mean of 118.6 or higher
2 -1 85
2 -1 would not be surprising, but a mean of 125.7 or higher is less likely, and therefore provides more
4 -0 65 evidence against H,. (d) Yes, the sample size is not large enough ( n = 25) to use the central
4 -0
4 0 2 limit theorem for normality. (e) No, the older students at this college may not be representative
3 0 55 of older students at other in the USA.
1 1
1 1
1 2 0

(b) Using df= 7 and t' = 2.365, a 95% confidence interval for p is
0.0122) .
-0.0015 2.365 ( ..[8 = -0.0015 0.0102 = (-0.0117, 0.0087) We are 95% confident that the

mean difference in TBBMC readings is between -O.Dll7 and 0.0087 g. (c) The subjects from
this sample may be representative of future subjects, but the test results and confidence interval
are suspect because this is not a random sample. .

11.2 (a) f1 ""the mean hemoglobin level for all children of this age in Jordan. (b) If f1 = 12, the
sampling distribution of x is Normal with mean 12 gldl and standard deviation
1.6/../50 = 0.2263 g/dl. See the sketch above (on the right). (c) A result like x = 11.3 g/dllies
way down in the low tail of the density curve (over 3 standard deviations below the mean}, while
11.8 g/dl is fairly close to the middle. If f1 = 12g/dl, observing a mean of 11.8 g/dl or smaller
would not be too surprising, but a mean of 11.3 g/dl or smaller is extremely unlikely, and it
therefore provides strong evidence that f1 < 12 g/dl. (d) No, since the sample size is large (n =
50}, the central limit theorem says that the sampling distribution of xis approximately N(12 g/dl,
0.2263 g/dl). (e) No, we are told that this is sample, but we don't know if these children were
randomly selected from any larger population. The only way we can generalize to a larger
population is if this sample is representative of the larger population.

11.3 (a) H 0 :p=ll5; H.:p>115. (b) H 0 :p=12; H.:p<12.

11.4 (a) p =the mean gas mileage for Larry's car on the highway.
H 0 :p = 26mpg; H. :p > 26mpg. (b) p = the proportion of teens in your school who rarely or
never fight with their friends. H 0 : p = 0. 72; H. : p * 0.72 .
11.5 (a) p =the proportion of calls involving life threatening injuries where the paramedics
arrived within 8 minutes. H 0 : p = 0.78; H.: p > 0.78. (b) p =the mean percent oflocal
household food expenditures used for restaurant meals. H 0 :ji--"30~ H. ;.;u.,..,.;30.
f::.e?.~ ~-!
246 Chapter 11
T
I Testing a Claim 247
I
I I
11.6 (a) H0 and Ha have been switched: The null hypothesis should be a statement of"no representative sample of all the weeks after the price is reduced. Since three consecutive weeks '
change." (b) The null hypothesis should be a statement about p, not x. (c) Our hypothesis
I have been chosen immediately after the price has been reduced, the SRS and independence
should be "some claim about the population." Whether or not it rains tomorrow is not such a I con d1't'wns are not very reaI'1st!c. (c) The test statistic
. . 1s. z = 398-354
/ ,[3 2.31 and the P-value = 1
statement. Put another way, hypothesis testing-at least as described in this text-does not deal I 33 3
with random outcomes, but rather with statements that are either true or fulse. Rain (or not) is a -0.9896 = 0.0104. (d) The P-value of0.0104 tells us that there is only about a 1.04% chance of
random outcome. I getting values of x at or above 398 units when H 0 is true, so this is convincing evidence that

11.7 (a) Because the workers were chosen without replacement, randomly sampled from the
I
assembly workers and then randomly assigned to each group, the requirements of SRS and
independence are met. The question states that the differences in job satisfaction follow a
Normal distribution. (b) Yes, because the sample size (n = 18) is too small for the central limit I
theorem to apply. I
11.8 (a) No, the sample size (n = 75) is much larger so the central limit theorem says that the I
sampling distribution of xis approximately Normal. (b) For :X= 17 and n = 75, the test statistic I .I
is z= -0 2.45 andtheP-valueis P{Zs-2.45orZ~2.45)=2x0.0071=0.0142. (c)
17
60/$5 .
I
This is fairly strong evidence against H 0 I
I 11.12 (a) P(Z ~ 1.6) = 1-0.9452 = 0.0548. (b) P(Z s 1.6) = 0.9452. (c)
11.9 See the sketch in the solution to Exercise 11.1 for parts (a) and (b). (a) The test statistic is P(Z s -1.6or Z ~ 1.6) = 2x 0.0548 = 0.1096.
118.6-115
0.6 and the P-value = 1-0.7257 = 0.2743. (b) The test statistic is
I
z
30/$5 I 11.13 Significance at the I% level means that the P-value for the test is less than 0.01. If the P-
value is less than 0.0 I, then it must also be less than 0.05. If a test is significant at the 5% level,
z 125 7 -
115
'1.78 andtheP-value=1-0.9625=0.0375.(c)If p=115,theprobabilityof I then we know that the P-value is less than 0.05. However, we don't know how much smaller
30/$5
getting a sample meal\ of 118.6 or something more extreme by chance is 0.2743, and the
I than 0.05 it is, so it may or may not be less than 0.01. In short, knowing that a test is significant
at the 5% level does not tell you anything about its significance at the 1% level.
probability of getting a sample mean of 125.7 or something more extreme by chance is 0.0375-
much more unlikely. A small P-value (such as 0.0375) tells us that values of x similar to 125.7
I 11.14 (a) The P-value is P(Z;,: 2.42) = 1-0.9922 =0.0078. Since the Pvalue is less than 0.05,
would rarely occur when H 0 is true, while a P-value of0.2743 indicates that results similar to we say that the result is statistically significant at the 5% level. (b) Since the P-value is less than
118.6 give little reason to doubt H 0 0.01, we also say that the result is statistically significant at the I% level. (c) For both
significance levels, we would reject H 0 and conciude that the mean nicotine content is greater
11.10 See the sketch in the solution to Exercise 11.2 for parts (a) and (b). (a) For x =11.3 g/dl, than 1.4 mg for this brand of cigarettes.
1
theteststatisticisz 1.~~ -3.09 andthePcvalue=0.0010. (b) For :X=11.8g/dl,thetest 11.15 (a) Reject H, ifz > 1.645. (b) Reject H 0 if lzl > 1.96. In other words, we would reject
1.6 50

statisticisz
1
1.i~
1.6 50
-0.88 andtheP-value=0.1894. (c) TheP-valueof0.0010tellsus
H, when z s -1.96or z;,: 1.96. (c) For tests at a fixed significance level (a), we reject H0
when we observe values of our statistic that are so extreme (far from the mean of the sampling
distribution) that they would rarely occur when H0 is true. For a two-sided alternative, the
that values of x similar to 11.3 g/dl would rarely occur when H 0 is true, while a P-value of
extreme values could be small or large (i.e., in either tail), so the significance level is divided
0.1894 indicates that results similar to 11.8 give little reason to doubt H,. evenly in the two tails. For a one-sided alternative, all the extreme values must be in one tail, so
all ofthe area is in that tail.
11.11 (a) x= 398. (b) If p = 354, the sampling distribution of x is Normal with mean 354 and
=
standard deviation 33/,[3 19.0526 because weekly sales are Normal. See the sketch below.
We must assume independence, and the three chosen weeks can be considered as a
t
I
248
Chapter II I Testing a Claim 249
I
11.16 (a) The test statistic is z = -.
0.4 365j -0.5 2 20 and the P -vaIue ts
0.2887 "100
. I too surprising, but a mean of27 .6% or smaller is unlikely, and it therefore provides evidence that
p<31%. (c) For x=30.2%,theteststatisticisz= -
30 2 31
-0.53 andtheP-value=
P(Z,; -2.20or z ~ 2.20) = 2x 0.0139 = 0.0278. (b) Since the P-value is less than 0.05, we say
I 9.6/Ko

that the result is statistically significant at the 5% level. (c) Since the P-value is greater than I 0.2981. For x = 27.6%, the test statistic isz = ~~~~~ -2.24 and the P-value = 0.0125. (d)
0.0 I, we say that the result is not statistically significant at the I% level. (d) At the 5% level, we I
would reject H0 and conclude that the random number generator does not produce numbers with The P-value of0.2981 indicates that x = 30.2% gives little reason to doubt H. . At both
I significance levels we would conclude that an average of 31% is spent on hou:ing. The P-value
an average of 0.5. At the I% level, we would not reject H0 and conclude that the observed
deviation from the mean of0.5 is something that could happen by chance. That is, we would I o~0.0125 tells us that x = 27.6% is significant at the 5% level, ]Jut not at the 1% level. At the
5 Yo level we would conclude that households spend less than 31% on average for housing but at
conclude that the random number generator is working fine at the 1% level. I the I% level we would conclude that households spend 31% on average. '
11.17 The command rand(IOO)-" L 1 generates 100 random numbers in the interval (0,1) and I
11.21 (a) For a two-sided alternative, z is statistically significant at a = 0.005 if I z 1 > 2.81. In
stores them in list L 1 The answers will vary but one simulation generated random numbers with
=
mean x = 0.4851, test statistic z -0.52, and P-value = 0.603. Since 0.603 is greater than 0.01
I other words, we would reject H 0 when z :0; -2.81 or z ~ 2.81. See the sketch below on the left.
and 0.05, we do not reject H0 at either significance level, and conclude that there is no evidence I (b) For a one-sided alternative (on the positive side), z is statistically significant at a = 0.005 if
z > 2.576. on the
to suggest that the mean of the random numbers generated is different from 0.5. I
11.18 At the 5% significance level the results of both studies would be considered statistically I
significant. However, the P-values convey impmtant information about how extreme the results I
really are. For the first study the P-value is barely less than 0.05, so the result is barely liI
significant. The result for the second study would be considered statistically significant at any I 0.995
I
reasonable significance level a . '
I
11.19 (a) (I) Take a random sample of several apartments and measure the area of each. (2) I
H : Jl = 1250; H.: Jl < 1250. (b) (I) Take a random sample of service calls over the year and
0
find out how long the response time was on each call. (2) H 0 : Jl = 1.8 ; H. : Jl 1.8. (c) (I)
I
Take a random sample of students from his school and find the proportion of lefties. (2) I 11.22 The explanation is not correct. Either H 0 is true (in which case the probability that H.0 is
H 0 :p=0.12; H. :p0.12. I t~ue .is I) or H 0 is false (in which case the probability that H 0 is true is 0). Statistically

11.20 (a) If Jl = 31%, the sampling distribution of x is Normal with mean 31% and standard I Sl~l~cant at the a= 0.05 le.vel, means that if H 0 is true, then the chance of observing a test
statlsttc of the value we obtamed or something more extreme is less than 5%.
I
I 11.23 (a) If the population mean is 15, there's about an 8% chance of getting a sample mean as
far from or even farther from 15 as we did in this sample. (b) We would not reject H0 at
I a= 0.05 because the P-value of0.082 is greater than 0.05. (c) The probability that you are

I wrong is either 0 or I, depending on the. true value of Jl .

I 11.24 For z* = 2, the P-value would be 2x P(Z > 2)= 2x0.0228 = 0.0456, and for z*=3, the
P-valuewouldbe 2xP(Z>3)=2x0.0013=0.0026. Note: Inotherwords, the Supreme Court
has chosen to use a no bigger than about 0. 05.

(b) A result like x = 27.6% g/dllies down in the low tail of the density curve, while 30.2% is
fairly close to the middle. If Jl = 31% g/dl, observing a mean of 30.2% or smaller would not be
T i
il
I
Chapter I I I Testing a Claim 251
250
I
11.25 An SRS is necessary to ensure generalizability, Normality is required to perform I 13.11 points, so the test statistic is z ./460 1.78 and the P-value is P(Z;:: 1.78) = 0.0375.
J3.1l-
calculations using z, and independence is required for the standard deviation of the sampling 501
distribution of x to be accurate. I The P-value of0.0375 is less than 0.05, so we reject H 0 and conclude that the mean change in
I thke.SAthT Math scores is greate~ than 0. In other words, the students significantly improve when
1!.26 P = 0. 02 means that if there were truly no change in cholesterol-if all differences were ta mg e test for the second time.
due to random fluctuation or "chance assigmnent"-then only 2% of all samples would produce I
results as extreme or more extreme as the ones found in this sample.
I 1>
11.27 We test H, : p = 5 versus H. : p < 5, where p =the mean dissolved oxygen content in I
the stream. Water is taken at randomly chosen locations so the SRS condition is satisfied. The
observations are not independent since they are being taken from the same stream, but the
I
number of possible collection locations along the stream (the population size) is clearly much I I
greater than the sample size (n = 45). The sample size is large (n = 45), so the central limit
theorem says the sampling distribution of the sample mean will be approximatc;ly Normal. The
I

test statitic is z 462 ""}s= -2.77 and the P-value is P(Z => -2.77) = 0.0028. The P-value is
I
0.921 45 I

Jess than 0.01, so there is very strong evidence to reject H0 and conclude that the mean dissolved
oxygen content in the stream is Jess than 5 mg.
I
I
I . . output mean
slightly larger than the median of 11.501 and the sample standard deviation ofO 095 is sm ll

I t~:n the pop~lation standard deviation ofO.i. 65% of the sample observations a~c bctwee~
er
x _is and 95 Yo of the observations are between x 2s. Although it is hard to tell based on such
I a small sample, the data could have come from a Normal population.
.,.,..

I Var~able N Mean StDev Minimum Ql M d'
hardness 20 11.516 0.0950 11.360 11.459 1~.~~~ 11 . 6 g~ M~~i~~
,.. ...,.;I' I (b) We want to test H 0 : p = 11.5 versus H. : p 11.5, where f.J =the mean targ~t value of
*

" I tab.let hardness. The sample was randomly selected without replacement so the SRS condition is
satls~ed. The number of tablets in the population must be greater than 1Ox20=200 Th
I questiOn of normality is discussed in part (a). The conditions for a one-sample z-te~t
fo; a
64 11 5
I populationmeanaremet. The test statistic is z ll.Sl - 03
0.2! ,fijj 7 andthe p -va1ue = 2x0.3557
The descriptive statistics below from Minitab indicate that the distribution of the differences is
centered slightly above 0 (13.11 if you use the mean and 2 if you use the median) with the
I = ~7.114. Because th~ ~-value is greater than any reasonable a level, we do not rejectH,.

smallest difference being -67 and the largest difference being 128.
Variable N Mean StDev Minimum Ql Median Q3 Maximum
I This IS reasonable variatiOn when the null hypothesis is true, so the hardness levels appear to be
on target.
scorediff 46 13.11 52.97 -67.00 -30.50 2.00 52.00 128.00
(b) No, as noted above, the distribution of the differences appears to be skewed to the right. (c)
I 11.30 .We want to test H 0 : p = 300 versus H. : p < 300, where J.i =the mean amount of cola in
We want to test H : p = 0 versus H a : p > 0 , where f.J = the mean change in SAT Math scores. I a certam ty~e of bottle. The SRS condition is satisfied because the bottles were randomly
I
0
The students are randomly chosen high school students, so the SRS condition is satisfied. The selected. Smce the sampling is without replacement the number of bottles produced in one day
difference in the scores for one student is independent of the differences for other students so the mu.st be larger t~an.l 0~6 = 60 (which should be no problem), so the independence condition is
independence condition is also satisfied. Even though the population of difference may be satisfied. The d1stnbutwn of the mean contents for 6 bbttles is Normal because the population is
299
skewed to the right, the sample size (n=46) is large enough so that the central limit theorem says Normal. The test statistic is z .Q3~JOO
3/ 6 -0.79andtheP-value=02148
. B ecause the p -
the sampling distribution of x is approximately Normal. The sample mean is approximately
T
I
Chapter II I Testing a Claim 253
252
I
by this producer, and we are told that the freezing temperatures vary according to a Normal
value of0.2I48 is greater than any reasonable a level, we do not rejectH0 This is reasonable I
variation when the null hypothesis is true, so the filling machinery appears to be working distribution. The test statistic is z = -0.5JS- ( - 0545 ) 196 an d the p -va1ue = 0.025. Since 0.025
o.oosrJ5
properly on this day.
I
I is less than 0.0.5, we reject Ho at a= 0.05 and conclude that this producer appear to be adding
Il.3I (a) Yes, the P-value = 0.06 indicates that the results observed are not significant at the 5%
level, so the 95% confidence interval will include 10. (b) No, because P-value < 0.1, we can I water to the m1lk. The mean freezing point is significantly higher than -0.54sc.
reject H : 11 = 10 at the 10% level. The 90% confidence interval would include only those values
0 I 11.37 (a) Y~s, 30 is in the 95% confidence interval, because P-value = 0.09 means that we
a for which we could not reject H 0 : 11 = a at the I 0% level. would n~t reject Ho at a = 0.05. (b) No, 30 is not in the 90% confidence interval because we
I would reject H 0 at a = 0.10. '
11.32 The 95% confidence interval for 11 is (28.0, 35.0). (a) No, since the 95% confidence I
interval includes 34, we cannot reject H0 : 11 = 34 at the :i% level. (b) Yes, since the 95%
confidence interval does not include 36, we can reject H 0 : p = 36 at the 5% level. Note: We
I 11.38 (a) No, 13 is in the 90% confidence interval, so H0 cannot be rejected at the 10% level.

are using a two-sided alternative for both tests.


I 12
(b) No, the sample mean is x = + l5 13.5 and the standard error is approximately
2
I 1.5
Il.33 (a) A 90% confidence inte~al for the mean reading is 104.1331.645 k = (99.86, I
1.
645
= 0.91, so 13.5 is less than one standard error from 13. (c) Yes, 10 is not in the 90%

confidence interval, so H0 can be rejected at the 10% level . (d) Here the ans wer depends on the
108.4I ). With 90% confidence, we estimate the mean reading of all radon detectors exposed to I . .
d1rectwn of the 11 <1 0 , the answer IS
. alternative. If the alternative is H ,.. no because the sample
105 picocuries of radon is between 99.86 and 108.41 picocuries. (b) Because 105 falls in this
90% confidence interval, we cannot reject H 0 : 11 = 105 in favor of H.: 11 or 105. The
I mean of 13.5 IS well above 10. However, if the alternative is H. : f.1 > 10, the answer is yes
confidence interval may be wider than we would like, but we do not have evidence to suggest I =
because the sample mean of 13.5 is approximately :.: 3.85 standard errors above 10.
that the mean reading is different from I 05 picocuries. I 1

11.34 The two-sided P-value is 2x0.04 = 0.08. (a) Yes, the P-value = 0.08 indicates that the 1 1._39 Yes. We want to test Ho :/1 = 450 versus H.: 11 > 450. The conditions for the z-test are
results observed are not significant at the 5% level, so the 95% confidence interval will include satisfied because the test was given to an SRS of seniors from Califo
30. (b) No, because P-value < 0.1, we can reject H 0 :p=30at the 10% level. The 90% 5000 seniors in the state of California, and the central limit theorem :I~, t~:~r;h:r:
more than
confidence interval would include only those values a for which we could not reject H 0 : 11 = a at 500 scores will be approximately Normal, even if the distribution of sZores is sligh~~~~e~:~~of
461 450
the I 0% level. The test statistic is z lOO/.JSQO-
- "'- 246 and the P-value = 0.0069. Since 0.0069 is less than

11.35 (a) The sample may not be representative, as the women have taken themselves to the 0.01 we reject H, at a= 0.01 and conclude that mean SAT Math score is significantly higher
clinic. Normality should be ok due to the large sample size (n = 160). We are sampling without than 450.
replacement, but the independence condition is satisfied because there are more than lOx 160 =
1600pregnantwomen in Guatemala. (b) Wewanttotest H0 :p=9.5 versus H.:p*9.5. The 11.40 The 95%.confidence interval for the mean amount of sugar in the hindgut is (I 9 65
test statistic is z 957 ~ 2.21 and the P-value = 2x0.0136 = 0.0272. Since 0.0272 is less ~g). (a) Yes, smce 7 mg is not contained in the 95% confidence interval, we reject Ho ~!'= 7
0.4/ 160
than 0.05, we reject H0 at a= 0.05 and conclude that the mean calcium level in healthy,
. alternative H ,.-
m favor of the 11*7 at the so/
o-o 1eve1. (b) No, smce
5 mg is contained in the 95%
confidence mterval, we cannot reject H0 : p = 5 at the 5% level.
pregnant, Guatemalan women differs from 9.5 grams per deciliter. (c) A 95% confidence
interval for the mean calcium level is 9.57 1.96~ = (9.508, 9.632). With 95%confidence, 11.41 (a) Yes, see the screenshot below on the left. (b) As one can judge from the shadin
-vl60
we estimate the mean blood calcium of all healthy pregnant women in Guatemala at their first
under the Norrr:al curve, the results are not significant for x s 0.5, and significant for x;;,0~6 . In

visit to be between 9.508 and 9.632 grams per deciliter.


fact, the cutoff IS about 0.52, which is approximately 1.645/ JIO. .
11.36 We want to test H 0 : 11 = --0.545 versus H.: 11 > -0.545. The conditions for the z-test are
satisfied because the containers were randomly selected, more than 50 containers are produced
t
I
254 Chapter II I Testing a Claim 255
I
I Z-Test
I In~t:LmE Stats
"'" 1.00
;;; ; I.!JO

I J..l.llll~
cr:.2
I List:L1
I Fre-:.t: 1
I J..I.:Fni!J (JJ.Il )JJ.o
-U!II -ll.oid 11.1:111 Q.I!.D 1.2:11
I
CaTCUlate Draw .3:667 F-=.713:8

....
-1.211 -11.611 lUlU 310fll(ln~t u1 ~~~,..1 0.~1

;r,.ur::::J e.~d ott:~er'led f< i6 )l" .. E=:J I


,,,,
I 1'\(1')'1) IJ(I1t, 6HQ ttre Ob$f.lfY~d ii )0 . !lni'IO dSJto, tM The results, except for some slight rounding differences are the same as those in Exercise 11.29.

'" ; The triJ\h 1bout thoe population It J1 ~~
I
The trutlllbO!It the popijJation is J1 ~

I 11.43 (a) No, not at the 5% level. The test statistic is z


.
536 7
:icJo
114/ 100
8
1.64 and the P-value =
(c) yes, see the screenshot above on the right. As one can judge from the shading under ~e .
Normal curve the results are significant for :;:;:, 0.8. In fact, the cutoff is about 0.7354, whtch ts I 0.0505. (b) Yes, the test statistic is z =
536 8
:icJo =
8
1.65 and Pvalue = 0.0495 is less than 0.05.
114/ 100
approximate!; 2.326/.JlO. Smaller a means that x must be farther away from J1o in order to I
reject H 0 I 11.44 (a) The test statistic is z
522
~ 0.4 and the Pvalue = 0.3446, which is not below
100/ 100
I 522
11.42 (a) We want to test H,: 11 = -0.545 versus H.: I'> -0.545. The Tl calculator screens are
a= 0.05. (b) The test statistic is z ~ =1.26 and the Pvalue = 0.1038, which is still
shown below. I 100/ 1000

Z-Test .
I not below a= 0.05. (c) The test statistic is z
522-518
roo Utoooo
4.0 and the Pvalue < 0.0001,
lnpt:Data .F:tE'LE. ....... I which is statistically significant at any reasonable a level.
J..l.a: -.545 I
a:.008
x: -. 538 I 11.45 A 99% confidence interval for the mean SAT Math score f1 after coaching is
n:5 I 5221.96 ~. (a) When n = 100, the 99% confidence interval for f1 is (496.24, 547.76). (b)
*
J..l. :
J..l.
Calculate
<IJ I] I]
:z=i.!IS:66 I When n = 1000, the 99% confidence interval for f1 is (513.85, 530.15). (c) When n = 10,000,
I the 99% confidence interval for 11 is (519.42, 524.58).
The results, except for some siight rounding differences, are the same as those in Exercise 11.36.
*
(b) We want to test H, : 11 = 11.5 versus H. :11 11.5. The TI calculator screens are shown
I 11.46 This is not information taken from an SRS, or from any kind of sample. We have

below. I information about all presidents-the whole population of interest.

I 11.47 (a) No, in a sample of size n = 500, we expect to see about 5 people who do better than
random guessing, with a significance level ofO.Ol. These four might have ESP, or they may
I simply be among the "lucky" ones we expect to see. (b) The researcher should repeat the
I procedure on these four to see if they again perform well.

11.48 The answer (b). A test of significance is used to determine if the observed effect is due to
chance.
256 Chapter 11
T
I Testing a Claim 257
I
exceeds $85,000 when in fact it does not. The consequence is that you will open your restaurant
11.49 (a) H, :p=0.75 versus H. :p>0.75. (b) A Type 1 error would be committed if the
I in a location where the residents will not be able to support it. A Type II error is committed if
manager decided that they were responding to more than 75% of the calls within 8 minutes I you conclude that the local mean income does not exceed $85,000 when in fact it does. The
when, in fact there responding to less than 75% within that time. A Type II error w~ul~ be consequence ofthis error is that you will not open your restaurant in a location where the
committed if the manager decided that they responding to 75% or less of the calls w1thm 8 .
I residents would have been able to support it. (c) A Type I error is more serious. If you opened
minutes when in fact they were responding to more than 75% in that time. (c) The consequence I your restaurant in an inappropriate area, then you would sustain a financial loss. If you failed to
of a Ty~e I error is that city officials may be satisfied with response times and see no need to open your restaurant in an appropriate area, then you would miss out on an opportunity to earn a
improve, when they should. The consequence of a Type II error is that offici~ls try and make the I profit, but you would not necessarily lose money (e.g., if you chose another appropriate location
in its place). (d) The smallest significance level, a= 0.01, is the most appropriate, because it
response times even faster when there is no need to do so. (d) A Type I error IS much more
serious in this situation because city officials think things are better than they actually are. (e)
I would minimize your probability of committing a serious Type I error. (e) When ,u= 87,000,
Students can give either answer with an appropriate defense. However, most students. will I there is about a 69% chance that you will open a restaurant in that area, and the probability of
probably say they are more interested in testing hypotheses abo~t the mean r~sponse time. Short
rationales are: 1. Lower mean response times are better; 2. A higher proportiOn of successes
I committing a Type II error is 0.3078, or about 31%.

(getting to the accident in less than 8 minutes) is better. I 11.54 (a) H,: ,u = 2mg versus H.: ,u * 2mg, where ,u =the mean salt content of a certain type

11.50 (a) H 0 :,u = 130 versus H.: ,u >.130. (b) A Type I error. is committed by telling an
I of potato chips. (b) A Type I error is committed if the company decides that the mean salt
content is different from 2 mg when in fact it is not. The probability of making a Type I error is
employee that they have a high systolic blood pressure ~hen m fact they do not. A Type ~I error I a= 0.05. (c) A Type II error is committed if the company sticks by its claim when the mean
is committed by failing to notifY an employee who has high blood pressure. (c) You obviously salt content is different from 2 mg. The probability of making a Type II error when p = 2.05 is
want to make the chance of a Type II error as small as possible. While it is inconvenient to send I 0.0576 or about 6%. (d) The power of the test is 1- 0.0576 = 0.9424 or about a 94.24% chance
some employees for further testing when their blood pressure is OK (a Type I error), death could I of detecting this difference if0.05. (e) The power of this test is also 0.9424 because it is the
result from a Type II error. same distance away from the mean specified in the null hypothesis. You should have the same
I power of detecting differences that are the same distance away from the mean, whether you are
11.51 (a) A Type I error is committed if you decide the mean sales for the new catalog will be
more than $40 when in fact it is not. The consequence is that you waste company resources by
I above the mean or below the mean does not matter for two-sided alternatives. (f) The
probability of a Type I error would increase from 0.05 to 0.10, so the probability of a Type II
changing the production process to the new cover design and it won't increase the mean.sales. I error would decrease and the power would increase. (g) Throwing away good chips, the
(b) A Type II error is committed if you decide that the mean sales .for the new catalog w~ll be $40
(or less) when it turns out to be more than $40. The conse~uence !s that the ~ompany Will not
I consequence of making a Type I error, is not a good idea, but it is better than telling consumers
that the chips contain 2 mg of chips when they do not. A Type II error probability could create
make the additional profits that would have been made by mcreasmg ~ales With the n?w c?ver.
(c) Increasing profits would be nice, but wasting money or resources IS ~ever a good Idea m
I serious health problems for some consumers. Thus, the company should try to minimize the
chance of making a Type II error, which means that the highest Type I error rate a= 0.1 would
business-Type I is more serious. (d) The probability of a Typej error Is. a = 0.01, and t?e I be best in this situation.
probability of a Type II error is p = 0.379. (e) 44.4139 is the 99 percentile of the sampling
I
distribution of x when H 0 :,u=40 is true. That is, 40+2.32635x ~ =44.4139. I
11.55 The power of this study is far lower than what is generally desired-for example, it is well
below the "80% standard" mentioned in the text. Twenty percent power for the specified effect

11.52 (a) H 0 :,u = 10,000psi versus H. :,u <10,000psi. (b) A Type I error is committed by
I means that, if the effect is present, we will only detect it 20% of the time. With such a small
chance of detecting an important difference, the study should probably not be run (unless the
telling telling the consumer that the wood with blue stain is we.aker when in fact it i~ not. T~e I sample size is increased to give sufficiently high power).
consequence is that you must find more wood without ~lue stams. A T~pe II ~n:or IS committed
by not telling the consumer that the blue-stained wood IS weaker ~hen m fa~t It IS ..A
I 11.56 The power for p = 80 will be higher than 0.5, because larger differences are easier to
consequence is that you may lose loyal customers because they Willlo~e their tru~t m the
company. (c) Spending more money to find wood that d?es not cont:tm blue s~ams ~Type I
I detect.

11.57 (a) For both Po and p 1 , each probability is between 0 and I and the sum of the
error) would increase expenses for the company, but that IS not as senous as bemg dishonest to
customers. A Type II error is more serious. I probabilities for each distribution is 1. (b) The probability of a Type I error is P(X = 0 or X= 1
when the distribution is p 0 ) = 0.1 + 0.1 = 0.2. (c) The probability of a Type II error is P(X > 1
11.53 (a) H 0 :,u = $85,000 versus H. :,u > $85,000, where ,u =the mean income of residents when the distribution is p 1 ) = P(X = 2) + P(X = 3) + P(X=4) + P(X=5) + P(X=6) = 5xO.l = 0.5.
near the restaurant. (b) A Type I error is committed if you conclude that the local mean income
258 Chapter II
T
I Testing a Claim 259
I
though it is safe to fly. Obviously, the passengers will not be happy with this decision. A Type
I II error is committed if Captain Ben concludes that the mean weight ofthe checked baggage is
equal to 100 pounds, when in fact it is heavier. The consequence of this error is that Captain Ben

I will take off with luggage that exceeds the safety standards and the jet may experience
mechanical problems. (c) We want to minimize the chance of making a Type II error, so set the
I chance of making a Type I error at the maximum of the values provided-a = 0.10. (d) The
sample is 20% of the population. The weight of checked bags for passengers may not be
normally distributed. In short, two of our three conditions, Independence and Normality, may
not be satisfied.

11.62 (a) The probability of a Type II error is I - 0.82 = 0.18. (b) The power of the test will be
the same for this alternative, because it is the same distance from 0. The symmetry of two-sided
I tests with the Normal distribution means that we only need to consider the size of the difference, II
I

I not the direction. (c) The power when 11 = -6 would be smaller because it is closer to 0, and
hence harder to detect, than the difference in part (b).
The upper distribution of the applet gives the value of a and shades the area corresponding to the I
probability of a Type I error. The lower distribution gives the value ofthe power and_s~ades the I 11.63 Finding something to be "statistically significant" is not really useful unless the
corresponding area. The remaining area of the distribution corresponds to the probabthty of a significance level is sufficiently small. While there is some freedom to decide what "sufficiently
Type II error. The TYPE2 program however, superimposes both graphs on one screen and also I small" means, a = 0.5 would lead your team to incorrectly rejecting H 0 half the time, so it is
gives the value of [:l and the critical value of x . I clearly a bad choice. (This approach would be essentially equivalent to flipping a coin to make
your decision!)
11.59 A larger sample gives more information and therefore gives a better chance (or larger I
probability) of detecting differences. That is, larger samples give more power. I 11.64 (a) A Type I error would be committed if the inspector concluded the mean contents were
below the target of 300 ml, when in fact they are not. A Type II error would be committed if the
11.60 (a) H 0 : Jl ~ !20psi versus H.: Jl < 120psi, where 11 ~the mean water pressure for pipes I inspector concluded that the filling machines were working properly, when in fact they were
from this supplier. (b) The power of the test in (a) when n ~ 25, a= 0.05, and 11 = 125 is about putting less cola in the bottles than specified. The power of the test is the probability of
0.93 or 93% according to Minitab and the Power applet. (c) Each pipe is numbered from 0001
I detecting mean content below 300 ml. (b) According to the Power applet, the power of this test r
to 1500, and then 4 digit random numbers are examined, starting at line 140, until 40 nu~be~s I against the alternativep = 299 is 0.198 or about 20%. (Minitab gives 0.2037 or about 20%.) (c)
between 0001 and 1500 are identified. The pipes with these numbers would be the 40 ptpes m According to the Power applet, the power of this test against the alternative 11 = 295 is 0.992 or 1:

our sample. The first two pipes in the sample are 1297 and 484. Notic~ that even though m~n~ I 99.2%. (Minitab gives 0.9926 or 99.26%.) (d) Answers will vary. Students may increase n, [,

students will suggest this method, it is not very efficient because a constde~able number of.dtgtts I increase a, or decrease a. For example increasing the sample size ton= 7 gives a power of 1
I,

will need to be examined to get your sample. It would be much more effictent to use the Simple
Random Sample applet or a random number generator. (d) A Type I error ~s co~mitted if!he I 0.997 and increasing the significance level to a= 0.10 gives a power of0.997.
i:
I
construction manager tells the supplier that their pipes do not meet the spectficatwn, when m fact CASE CLOSED!
they do. The consequence of this error will be a strained business relationship. A Type II error I I. Yes, in order to use the inference methods from this chapter the sample should be an SRS. If
is committed when the manager says that the pipes meet the specification, but they don't. The
consequence of this error may be leaky pipes, major water dam~ge, ~r pi~es _that _won't work and
I the sample is not representative of all tablets produced in this batch, then it will not make sense
to use them for inferences about the mean contents of acetylsalicylic acid for all tablets in this
need to be replaced. (e) Type II error is obviously the most sen?us m thts sttua!w~, so we I batch. 2. We want to test H 0 : 11 ~320mg versus H,: Jl '>'320mg, where 11 =the mean content
should minimize the probability of making a Type II error by usmg the largest stgmficance level
of 0.10. (f) It only takes one weak pipe to create major problems, so it would be best to t~st
I of the active ingredient (acetylsalicylic acid). The tablets may contain too much or too little of
the active ingredient, so the alternative should be two sided. 3. A Type I error would occur if
hypotheses about the proportion of pipes that have a breaking strength ofless than 120 pst. I we conclude that the company is not putting the correct amount of the active ingredient in the

11.61 (a) H, : 11 = 5 ~~
0
~IOOversusH. :11>100. (b) A Type I is committed if Captain Ben
I tablets, when in fact they are. The consequence of this error is that the company will dispose of
a good batch of tablets. A Type II error would occur if we conclude that the tablets contain the

concludes that the mean weight of the checked baggage is greater than I 00 pounds, when in fact
I correct amount of the active ingredient when in fact they have too much or not enough. The
consequences of this error could be death (in the most severe situation) from an overdose of the
it is not. The consequence of this error is for Captain Ben to keep the plane on the ground, even

.'
'
i
I
260
Chapter 11 I Testing a Claim 261
I
active ingredient or unhappy customers because the pills do not have enough of the active
ingredient to relieve a headache. 4. Since Type II error is the most serious error in this situation,
I 11.69 Wewanttotest H 0 :f1=150 versusH.:Jl<lSO. The test statistic is z 137-150 =-3.28
. . 65i.J269
we should use the highest reasonable significance level (0.1) to minimize the probability of a I and the P-valu~ = 0.0005. Smce 0.0005 is less than any reasonable significance level, this is

Type II error. 5 . Th
. . .
e test stat1sttc IS z =
321.028-320
3/v36
d h P I
2.06 an t e -va ue = I very strong evidence that students study less than an average of2.5 hours per night.

2x 0.0197 = 0.0394. Since 0.04 is less than 0.1, we reject H 0 and conclude that the mean I 11.70 We want to test H 0 : fl = 0 versus H. : Jl > 0. The test statistic is z = 69 - 0 1. 28 and
content of the active ingredient is significantly different from 320 mg. 6. A 90% confidence I . 55/.J104
th~ P-value = 0.1003. Smce 0.1003 is greater than any reasonable significance level we cannot
interval for the mean content of acetylsalicylic acid is 321.028 1.645 b,
v36
= (320.21, 321.85). 7. I reject::rand conclude that this is not good evidence that the mean real compensati~n of all

The power of the test when fl = 321 mg is 0.633 or about 63%. (Minitab gives a power of I C~~s mcrea~ed. In other words, the sample mean appears to indicate a positive increase but
th1s mcrease IS not statistically significant. '
0.6389.) This power could be increased by increasing the sample size or increasing the I
significance level (although it is rare to conduct hypothesis tests with a significance level higher 11.71 (a) The margin of error decreases. (b) The P-value decreases. (c) The power increases
than 0.1), or decreasing cr. 8. Answers will vary, but the report should contain graphical I Note: . All of these c~anges w~uld be _vie':"ed favorably by statisticians conducting the analysis
displays, numerical statistics, and a clear conclusion for the executives. In short, this batch of
tablets sj:10uld not be distributed to drugstores because the mean amount of the active ingredient
I and chents who are mterested m makmg mferences.

is significantly different from the specified amount of320 mg. Delivering disappointing news I _11.72 (a) The 95% con~dence interval would be wider. To be more confident that our interval
like this to executives is never easy! Medical experts may argue that the 90% confidence
interval suggests that the mean contents are only slightly off target, and this difference is not of I mcludes the true populatwn parameter, we must allow a larger margin of error. So the margin of
error for 95% confi~e~ce is larger than for 90% confidence. (b) We would not reject H 0 :,u =$16
any practical significance. If that is the case, then the company may decide to send the shipment
with a warning label and reexamine their production process.
I because $16 falls w1thm the 90% confidence interval, indicating that the two-sided P-value is at
I least 0.10. However, we could reject H 0 :,u= $15 at a= 0.10 because the 90% confidence
interval does not include $15.
11.65 (a) H 0 :p=$72,500;H.:,u>$72,SOO.(b) H0 :p=0.75; H.:p<0.15. (c) H 0 :,u=20
I
seconds; H. : Jl < 20 seconds.
I 11.73 (a! Ho :f1=300 (the company's claim is true) versus H. :f1<300 (the mean breaking
strength IS less th~ th~ co_mpany claim). (b) A Type I error would be committed if we conclude
I that the company s clmm IS mcorrect ( ,u <300) when in fact it is legitimate ( Jl =300). A Type II
11.66 We want to test H 0 : fl = 120versus H.: fl ;t 120. The test statistic is
I ~rror.would be committ~d if we conclude that the company's claim is legitimate when in fact it is
z 123 8 -J7o0 2.40 and the P-value = 2x0.0082 = 0.0164. Yes, since 0.0164 is less than 0.05,
I mv~~1d. A Type ~I.err~r 1s_more ~erious in this case, because allowing the company to continue
the false advert1smg of 1ts cha1rs' strength could lead to injuries, lawsuits and other serious
10/ 40 .
consequences. (c) If the null hypothesis is true, then the sampling distributlon of x is
we reject H 0 and conclude that the mean yield of com in the United States is not 120 bushels per
acre. This conclusion holds even if the distribution of com yields is slightly non-Normal,
I appro~imately ~ormal with mean 300 pounds and standard deviation cr, = 15/J30 2. 7386. =
because the sample size (n = 40) is reasonably large. I The 5 percentlle of the N(3?0, 2.7386) distribution is 295.495, so all values at or below 295.495

11.67 The two-sided P-value is 2x0.02 = 0.04. (a) Yes, the P-value = 0.04 indicates that the
I pounds would cause us to reject H 0 (d) The probability of a Type II error is 0.022 or about 2%.
(e) Increase the sample size or increase the significance level.
I
jl
results observed are not significant at the 1% level, so the 99% confidence interval will include I I
15. (b) No, because the P-value < 0.05, we can reject H 0 :Jl= !Sat the 5% level. The 95%
confidence interval would include only those values a for which we could not reject H 0 : Jl = a at
I 11.74 The stu~y may have rejected fl =,U0 (or some other null hypothesis), but with such a
l~rge sample s_1ze, such a rejection might occur even if the actual mean (or other parameter)
!
the 5% level. I differs only shghtly from flo. For example, there might be no practical importance to the

11.68 We expect more variation with small sample sizes, so even a large difference between I difference betweenp = 10 andJl = 10.5.
x and flo (or whatever measures are appropriate in our hypothesis test) might not tum out to be I
significant. If we were to repeat the test with a larger sample, the decrease in the standard error
might give us a small enough P-value to reject H 0
262 Chapter 12 Significance Tests in Practice 263

Chapter 12 12.6 We want to test H0 : ,u = 1 versus Ha : ,u > 1, wh~re ,u =the mean heat conductivity
measured in watts of heat power transmitted per square meter of surface per degree Celsius of
12.1 (a)2.015. (b)2.518. temperature difference on the two sides of this particular type of glass. We must be willing to
.I treat these 11 measurements as an SRS from a larger population of this type of glass. The
12.2 (a) 2.145. (b) 0.688. histogram (below on the left) and Normal probability plot (below on the right) show no serious
departures from Normality or outliers so the Normal condition appears to be satisfied. The
12.3 (a) 14.(b) 1.82isbetween 1.761 (p=0.05)and2.145 (p=0.025). (c)TheP-valueis independence condition is also satisfied since we are sampling without replacement and the
between 0.025 and 0.05. (In fact, the P-value is 0.0451.) (d) t = 1.82 is significant at a= 0.05but number of windows (or other products) made from this type of glass is clearly larger than
not at a= 0.01. 10xll = 110.

12.4 (a)24.(b) 1.12isbetween 1.059 (p=0.15) and 1.318 (p=O.lO).(c)TheP-valueis


between 0.30 and 0.20. (In fact, the P-value is 0.2738.) (d) No, t = 1.12 is not significant at
either a= 0.10 or at a= 0.05.

12.5 (a)H0 : ,u =1200mg versus Ha : ,u < 1200mg, where ,u= the mean daily calcium intake for
women between the ages of 18 and 24 years. We know that these women participated in the
study, but we do not know if they were randomly selected from some larger population. These
women are most likely volunteers, so we must be willing to treat them as an SRS from a larger
population ofwomen in this age group. The histogram (on the left) and the Normal probability
plot (on the right) below show that the distribution of calcium intake is skewed to the right, with
two outliers. There is a clear nonlinear pattern in the Normal probability plot which should
create some concern. Use of the !-procedure is justified because the sample size is reasonably . . IS
The test statistic . t = 1. 1182/ - r;-;
1 . 8 95 . h df
= . , wit = 10, and P-va1ue < 0.000: 1. Because the P-
large (n = 38) and thus the distribution of :X will be approximately Normal by the central limit 0.0438 v11
theorem. The independence condition is satisfied because we are sampling without replacement
value is less than any reasonable significance level, we reject H 0 and conclude that the mean heat
and the size is much than 10 x 38 = 380.
conductivity for this type of glass is greater than 1. A 95% confidence interval for ,u is
~~~~~~
1.1182 2.228( 0.0438/ Jli) = (1.089, 1.148). We are 95% confident that the mean
conductivity for this type of glass is between 1.089 and 1.148 units.

12.7 We wantto test H 0 :,u=l6 versus Ha :,u:t:l6. The two-sided alternative is used because
Ill
.
.,...

we want to see if the mean weight gain is different than what is expected. The test statistic is

........... t=
10
.4 i~ = -5.82
8
with df= 15 and a P-value < 0.0001. Since the P-value is less than
3.8406 16 .
any reasonable significance level, we reject H 0 and conclude that the mean weight gain is
significantly different than expected. This is the same conclusion we made in Exercise 10.68.
The test statistic is t = j
~O -3.95, with df= 37, and P-value = 0.00017. Because
926 026
=
427.230 38
~ ~ = 5.125 with
12.8 We want to test H 0 : Jl = 0 versus Ha: ,u =1- 0. The test statistic is t =
32

the P-value is less than a =0.05, we reject H 0 and conclude that the mean daily intake is 256 16
significantly less than the RDA recommendation. (b) Without the two high outliers (1933 and =
df= 15 and a P-value 0.0012. Since 0.0012 is less than a= 0.01, we reject H 0 and conclude
=
2433), t = -6.73 and the P-value 0. (Minitab shows one high outlier (2433)-without the that there is a significant change in NEAT. (b) With t* = 2.131, the 95% confidence interval is
outlier, t = -5.46 and the P-value =
0.) Our conclusion does not change. 191.6 to 464.4 cal/day. This tells us how much of the additional calories might have been burned
by the increase in NEAT: It consumed 19% to 46% of the extra 1000 callday.

II
_j
266 Chapter 12 Significance Tests in Practice 267

12.17 (a) For at distribution with df = 4, the P-value is 0.0704-not significant at the 5% level. PROBBILITY OF
See the sketch below (on the left). (b) For at distribution with df= 9, the P-value is 0.0368, TYPE II ERROR
which is significant at the 5% level. See the sketch below (on the right). A larger sample size . 1027776239
means that there is less variability in the sample mean, so the t statistic is less likely to be large POWER OF TEST
when H 0 is true. Note that even with these computer produced graphs of these t distributions, it
.8972223761
is difficult to see the subtle difference between them: The "tails" of the t (4) distribution are
"heavier" which is the P-value is larger. PRESS ENTER
(d) Any two of the following: Increasing the significance level, decreasing the standard II
deviation a, or moving the particular alternative value for f.l farther away from 0.

12.20 Let Jl =the mean percent of purchases for which an alternative supplier offered a lower
price than the original supplier. The conditions for inference are satisfied. The invoic~s were
randomly selected so it is reasonable to view these differences as an SRS. The differences are
independent from one invoice to another. The graphical displays suggest that the differences are
skewed to the left, but there are no outliers. With n = 25, we can appeal to the robustness of the t
procedures, since the distribution of the differences is not Normal. The summary statistics
provided indicate that the mean is 77.76%, the standard deviation is 32.6768%, and the standard
error is about 6.5354%. Using Table C with df= 24, the critical value is t* = 2.064, so the 95%
12.18 (a) The parameter fld is the mean difference in the yields for the two varieties of plants. confidence interval for Jl is 77.76%2.064x6.5353603%= (64.27%, 91.25%). The data
(b) We want to test H 0 : f.ld = 0 versus Ha : Jld > 0. It is reasonable to assume that the differences support the retailer's claim: 64% to 91% ofthe time the original supplier's price was higher.
in the yields are independent and follow a Normal distribution. Nothing is mentioned about
random selection, but we must also assume that these differences represent an SRS from the
population of differences in the yields for these two varieties. The test statistic is 12.21 (a) Let 1-'H-F= the mean difference in vitamin C content (Haiti -Factory) at the two
0 locations. We want to test H 0 : f1H-F =0 versus Ha : f1H-F < 0 . The test statistic is
t= }~ 1.295 with df= 9 and P.cvalue = 0.114. Since the P-value is greater than 0.05,
=
0.83 10
we cannot reject H 0 and conclude that the yields for the two varieties are not significantly
t= -
53
jffi =
5.5885 27
-4.96, with df= 26 and a P-value < 0.0005. (b) A 95% confidence interval

different. The observed difference appears to be due to chance variation. for 1-'H-F is -5.33332.056(5.5885/ffi)=(-7.54,-3.12). With 95% confidence, we estimate

12.19 (a) A Type I error is committed when the experts conclude that there is a mean difference the mean loss in vitamin C content over the 50 month period to be between 3.12 and 7.54
in the yields when in fact there is none. A Type II error is committed when the experts conclude mg/100g. (c) Yes. Let flF denote the mean vitamin C content ofthe specially marked bags of
that there is no mean difference in yields when in fact one does exist. A Type II error is more WSB at the factory. We want to test H 0 : f1F = 40 versus Ha : f1F =F 40 . The test statistic is

, I
serious because the experts would like to increase the yield (and hence make more money)
whenever possible. (b) The power is 0.5278. (Reject H 0 if t > 1.833 i.e., if x > 0.4811.) See the t= J
~ 3.09, with df=26 and 0.002 < P-value < 0.005. Since the P-value is below
42 85
4.793 27
=
T DISTRIBUTION
HYPOTH MU:0 rft,lfE:
screen shots from the calculator below. Minitab gives 0.545676.

. :A
PROBBlLITY OF
TYPE II ERROR
0.01, we have strong evidence that the mean vitamin C content differs from the target value of 40
mg/100g for specially marked bags at the factory. (Note: Some students may identify the
parameter of interest as the vitamin C content of the bags when they arrive in Haiti. The correct
STAND DEV:.83 2:ALT<NUL .472137179
ALTERN MU:.5 ~ALT>NUL POWER OF TEST solution for these students is: Let flH denote the mean vitamin C content of the bags ofWSB
SAMPLE SZ:10 .527862821 shipped to Haiti. We want to testH0 : f1H = 40 versus Ha: f1H =F 40. The test statistic is
ALPHA LEV:.051 37 5185 - 40 . 5 29 . h df=26 d
t= / r;:;;:; =- . , wtt - an P-value < 0 .0001. Smce . the P-value is below 0.01,
PRESS ENTER 2.4396 -v27
we have strong evidence that the mean vitamin C content differs from the target value of 40
(c) The power is 0.8972. (Reject H 0 if t > 1.711, i.e., if x > 0.2840 .) See the screen shot below. mg/100g.)
Minitab gives 0.899833.
Chapter 12 Significance Tests in Practice 269
268

0.66-0.73 =-
z = ----;===== . h a P-value of 0.026. Since 0.026 < 0.05, we reject H and
. 2 .23 , wit
0
0.73x0.27
12.22 We will reject H 0 when t = s/Fn ~ t*, where t' is the appropriate critical value for the 200
conclude that we do have statistically significant evidence that the proportion of all first-year
chosen sample size. This corresponds to x ~ 1Ot* / Fn, so the power against f1 = 2 is students at this university who think being very well-off is important differs from the national

p (- ~ 1Ot;"nr) = p (1x-2
X o/ Fn ~ 10t'/Fn
1o/ Fn
-2) value. (b) A 95% confidence interval for pis 0.661.96 66 034
x
200
= (0.5943, 0.7257). The

confidence interval gives us information about the plausible values ofp. We are 95% confident
II = P(z ~ t* -o.2Fn) that the proportion of students at this university who would like to be well-off is between 59.4%
I, I I
For a= 0.05, the first two columns of the table below show the power for a variety of sample and 72.6%.
sizes, and we see that n 2156 achieves the desired 80% power. The power for a variety of
sample sizes when a= 0.01 is shown in the last two columns of the tab.le below, and we ~ee that 12.25 (a) Yes. Letp = Shaq's free-throw percentage during the season following his off-season
n 2 254 achieves the desired 80% power. As expected, more observatiOns are needed with the training. We certainly do not have an SRS of all free-throws by Shaq, but we will proceed to see
smaller significance level. Since the significance level was not provided, students may use other if the observed difference could be due to chance. The other two conditions (expected number of
values, but these will b e th e two most common responses. successes and failures are at least 10 and large population) are both satisfied. We want to test
Sample Power when Sample Power when . . IS
H 0 : p =0.533 versus H a : p > 0 .533 . The test statistic . z = 0.6667-0.533 =. 1.67 and t he p -
size a =0.05 size a=0.01 0.533 X 0.467
25 0.250485 25 0.083564 39
50 0.401222 50 0.170827 value= 0.0475. Notice that the P-value is just under 0.05, so we would say that this increase
75 0.528434 75 0.265700 would not be explained by chance. Although we found a statistically significant increase in
100 0.633618 100 0.361801 Shaq's free-throw shooting percentage for the first two games, we would not suggest niaking an
125 0.718701 125 0.454352 inference about p based on these two games. (b) A Type I error would be committed by
150 0.786254 150 0.540181 concluding that Shaq has improved his free-throwing when in fact he has not. A Type II error
151 0.788631 175 0.617459 would be committed by concluding that Shaq has not improved his free-throwing when in fact he
152 0.790984 200 0.685391 has. (c) The power is 0.2058. (d) The probability of a Type I error is a= 0.05. The probability
153 0.793314 225 0.743930 of a Type II error is 1-0.2058 = 0.7942.
154 0.795621 250 0.793528
155 0.797906 251 0.795336 12.26 (a) We want to test H 0 : p =0.1 versus Ha :p < 0.1. The conditions for inference are met.
156 0.800167 252 0.797131 We must assume these patients are an SRS of all patients who would take this pain reliever.
157 0.802406 253 0.798913 Both np0 = 440x 0.1 = 44 and n(1- p 0 ) = 440x 0.9 = 396 are at least 10. It is also reasonable to
158 0.804623 254 0.800682 assume that the number of patients who would take this pain reliever is larger than 1Ox440 =
159 0.806818 255 0.802438 23
4400. The sample proportion is p =
440
=0.0523 and the test statistic is
12.23 (a) No, the expected number of successes np0 and the expected number of failures
n(l- p 0 ) are both less than 10 (they both equalS). (b) No, the expected number of failures is less z ~ 0~1 =-3.34, with a P-value ~ 0.0004. (b) A Type I error would be committed if
O.lx0.9
than 1O; n(l- p 0 ) = 2. (c) Yes, we have a SRS, the population is more than 10 times as large as
440
the sample, and np0 = n(l- p 0 ) = 10. the researchers conclude that the proportion of"adverse symptoms" is less than 0.1, when in fact
it is not. A Type II error would be committed if the researchers conclude that the proportion of
12.24 (a) We want to test H 0 : p = 0.73 versus Ha: p ::f:. 0.73. The conditions for inference are "adverse symptoms" is equal to 0.1, when in fact it is less than 0.1. A Type I error is more
met since this is an SRS and np0 = 200 x 0. 73 = 146 and n(1- Po) = 200 x 0.27 = 54 are both at serious because the researchers do not want to mislead consumers.
least 10. It is also reasonable to assume that the student body at this university is larger than
12.27 (a) A Type I error would be committed by deciding that the proportion differs from the
10x200 = 2000. The sample proportiOn . p~ = 132 = 0 .66 and t he test statistic
. IS . . IS.
national proportion when in fact it doesn't. This may lead to the restaurant manager
200
270 Chapter 12 Significance Tests in Practice 271

investigating the reason for the difference, which could waste time and m~ney A Typ~ II error
would be committed by deciding that the proportion is the same as the. nattonal prop.ortion when from 98.6. (2) A 95% confidence interval for J1 is 98.251.96336( ~ J= (98.1958,
in fact, it isn't. This may lead the manager to conclude that no action IS needed, ~htch may
result in disgruntled employees. (b) Power= 0.0368. (c) When n = 200, power- 0.1019. _ 98.3042). We are 95% confident that the mean body temperature is between 98.20F and
Doubling the sample size increases the power by about 176.9%. (d) When a= 0.01, power- 98.30F. The confidence interval provides an estimate for plausible values of"normal" body
0.062. When a= 0.10, power= 0.299. temperature. (3) Now, we want to testH0 : p = 0.5 versus Ha: p -:t:. 0.5. The test statistic is
0 623 0 5
z =~ - =. 6.51 wtt. h a p -va1ue very c1ose to 0. We have statisttca
. 11y stgm
. 'fitcant evidence
12.28 Results will vary. (a) Suppose one student obtained 17 heads. The sample proportion is 0.5x0.5

p 20 ~0.5x0.5
=
~ = =0.85 and the test statistic is z = 085 -O.S 3.13, with a P-value = 0.0018. This
.!2. 700
that the proportion of all healthy adults in this age group with a temperature less than 98.6 is not
20
student would conclude that the proportion ofheads obtained from tipping U.s .. pennie~ is equal to 0~5.
(4) A 95% confidence interval for p is 0.623 1.96
.

623 0377
x
700
=
(0.59, 0.66).
'
significantly different from 0.5. (b) Suppose a class of20 obtained 340 heads m 400 tips. The

sample proportion is p ~
340
400
. .
l.5
~ 0.85 and the test statistic IS z ~
0 85 - 0 5 . 14 00
x _ ~ WI a
05
'th p
We are 95% confident that the proportion of all healthy adults in this age group with a body
temperature below 98.6 is between 0.59 and 0.66. (5) Repeated measurements were taken on
140 healthy adults, so these 700 temperature readings are clearly not independent. There is also
400 no indication that these individuals were randomly selected from a larger group, so without
1 ery close to o At any reasonable significance level, the class would conclude that the additional information it is risky to assume they represent an SRS from some larger population.
va ue v
proportion
ofheads obtained from tipping U.S. pennies IS 'fiICantlY d'ffi
sigm I erent fJrom 0 5 The population is much larger than 10x700=7000, so this should not be a concern. Finally, the
distribution of :X will be approximately Normal, even ifthe distribution of temperatures is
12.29 We want to testH0 : p = 1/3 versus Ha: p > 1/3. The test statistic is slightly skewed because the sample size is reasonably large, and the expected number of
successes (350) and failures (350) are both at least 10.
z = 304/803-113:::::2.72
with aP-value = 0.0033. Yes, because 0.0033 is less than 0.01, this is
1/3x2/3 12.31 (a) Standard error should be replaced by margin of error. The margin of error equals the
803 critical valuez* times the standard error. For 95% confidence, the critical value is z* = 1.96. (b)
strong evidence that more than one-third of this population never use condoms. H 0 should refer top (the population proportion), not p (the sample proportion). (c) The Normal
distribution (and a z test statistic) should be used for significance tests involving proportions.
12.30 The table below shows that Tanya, Frank, and Sarah all recorded the. san:e sample
.
proport ton, p~ -- 0 .28 , but the P-values were all quite different. Our conclusiOn IS that the same
. 12.32 Let p =the proportion of adults who favor an increase in the use of nuclear power as a
value of the sample proportion provides different information about the strength o~th~ evidence major source of energy. We want to test H 0 : p = 0.5 versus Ha : p < 0.5. The expected number of
against the null hypothesis because the sample sizes are dif~erent. As the samp.le SIZe ~ncreases,
the P-value deceases, so the observed difference (or somethmg more extreme) IS less hkely to be successes ( np0 = 512 x 0.5 = 256) and the expected number of failures (also 256) are both at least
due to chance. 10, so use of the z test is appropriate for the SRS of adults. The sample proportion is
~

z P-value 225 39 S-O.S 2 74 . h P I


X n p p" = - = 04395
. and the test stattsttc
. . ts. z = 0.4
~ =- . wtt a -va ue = 0. 00 31. Yes,
14 50 0.28 -0.80 0.212 512 0.5x0.5
98 350 0.28 -2.12 0.017 512
140 500 0.28 -2.53 0.006 because 0.0031 is less than 0.01, this is strong evidence that less than one-half of all adults favor
an increase in the use of nuclear power.
CASE CLOSED!
(1) Let J1 =mean body temperature in the population ofhealthy 18 to 40 year olds. We want to 12.33 (a) There is borderline evidence. We want to testH0 : J1 = 0% versus Ha: J1 -:t:. 0%, J1 =the
9825 98 6 . 12 69
.J700
test H 0 : J1 = 98.6 versus Ha : Jl t:. 98.6. The test statistic is t = 0. / - =-
73
, WI'th df-
-
. 12 40
=
mean percent change (month to month) in sales. The test statistic is t = ~~ 2.0028, with

699 and a P-value very close to 0. Since the P-value is less than any reasonable sig.nifi.cance df= 39 and P-value = 0.0522. (The best we can say using Table C with df= 30 is that the P-
1eveI, say a -0
- .01 , we have very strong evidence that the mean body temperature IS dtfferent value is greater than 0.05.) This is not quite significant at the 5% level. Since 0.0522 is slightly
larger than 0.05, we cannot reject H 0 at the a= 0.05 significance level. However, we would
272 Chapter 12 Significance Tests in Practice 273

reject H 0 at the a= 0.055 significance level, since 0.0522 is less than 0.055. (b) Even if we had but not convincing evidence (particularly because the sample size n = 148 is quite
large).
rejected H 0 , this would only mean that the average change is nonzero. This does not guarantee
that each individual store increased sales.
12.37 (a) A histogram (on the left) and a boxplot (on the right) are shown below. The
distribution looks reasonably symmetric with a sample mean of x = 15.59 ft and a standard
12.34 (a) (a) A subject's responses to the two treatments would not be independent. (b) We want
deviation of s = 2.550 ft. Notice that the two extreme values are not classified as outliers by
to test H 0 : fld = 0 versus Ha : fld =t- 0 , where fld =the mean difference in the two chemical Minitab-recall that this is because of the difference in the way the quartiles are computed with
measurements from the brain of patients with Parkinson's disease. Since the sample size n = 6 is software and with the calculator.
small we must assume that the differences of these measurements follow the Normal distribution.
We must also assume that these 6 patients are an SRS. The independence condition is met and
the population size is much larger than 60. The test statistic is t = -0.3/1
0.181 6
=-4.4118, with df
=Sand a P-value = 0.0069. Since 0.0069 < 0.01, we reject H 0 and conclude that there is
significant evidence of a difference in the two chemical measurements from the brain.

12.35 (a) We want to testH0 :p=0.5 versus Ha :p> 0.5. The expected number of successes
(np0 =SOx 0.5 = 25) and the expected number of failures (25) are both at least 10, so use of the z
test for these subjects who must be viewed as an SRS of all coffee drinkers is appropriate. The 2
sampIe proportiOn . p" =31
. ts - = 0.62 and t h e test statistic
. . ts. z=~ 0 62 - O.S 1 70 . h a P-vaIue
= . , wtt
(b) A 95% cqnfidence interval for the mean shark length is 15.58642.02{ ~ 9 ) = (14.81,
50 0.5x0.5
16.36). (Note: Some students may use d:f-=43 and the critical value t* = 2.01669 from software of
50
the calculator.) Yes, since 20 feet does not fall in the 95% confidence interval, we reject the
= 0.0446. Since 0.0446 < 0.05, we reject H 0 at the 5% level and conclude that a majority of
claim that great white sharks average 20 feet in length at the 5% level. (c) We need to know
people prefer the taste of fresh-brewed coffee. Some students may argue that the P-value is just what population these sharks were sampled from: Were these all full-grown sharks? Were they
barely below 0.05, so this result may not be practically significant. However, most students will all male? (i.e., is p the mean adult male shark length or something else?)
point out that the results are significant and that this conclusion matches their personal
experiences with coffee drinkers-a majority of people prefer fresh-brewed coffee. (b) A 90%
12.38 We want to testH0 : p = 0.5 versus H.: p ::F 0.5, where p =the proportion of heads obtained
confidence interval for pis 0.62 1.645
62 0 38
x
50
= (0.5071, 0.7329). We are 90% confident from spinning a Belgian euro coin. The expected number of successes ( np0 = 250 x 0.5 =125)
and the expected number of failures (125) are both at least 10, so use of the z test is appropriate.
that between 51% and 73% of coffee drinkers prefer fresh -brewed coffee. (c) The coffee should
. .1s p" =140 056 dh . . 1s
. z=~ 0 56 - 05 '190' 'h p
be presented in random order. Some subjects should get the instant coffee first, and others T he samp Ie proportiOn - = . an t e test statistiC = . , w1t a -
should get the fresh-brewed coffee first. 250 0.5x0.5
250
12.36 Let flu =the mean masculinity score of all hotel managers. We want to test value= 0.0574. Since 0.0574 > 0.05, we cannot reject H 0 at the 5% level and conclude that the
observed difference could be due to chance. An interval of plausible values for p is provided by
H0 = 4.88 versus Ha ~ 21.98, with df =
5 91
> 4.88. The test statistic is t = f
:flu :flu
0.57 148
147 and a P-value ofO to many decimal places. Since the P-value is much smaller than 0.01,
a 95% confidence interval, 0.561.96
56 0 4
x .4 = (0.4985, 0.6215). Notice that the 95%
250
there is overwhelming evidence that hotel managers scored higher on the average than males in confidence interval includes 0.5, which would indicate that the coin is "fair" or balanced. (Note:
general. Turning to femininity scores, let flF =the mean femininity score of all hotel managers. Some students will look at the data and then conduct a one-sided test-this is not good statistical
practice.)
5 29
~~ 1.62,
We want to test H 0 : flF = 5.19 versus Ha : flF > 5.19. The test statistic is t = =
0.75 148
with df= 147 and a P-value of0.053. (To use Table C, look at the df= 100 row and find that
0.05 < P-value < 0.1 0.) There is some evidence that hotel managers exceed males in general,
Chapter 13 Comparing Two Population Parameters 275
274

Chapter 13 I 1

13.1 (a) Counts will be obtained from the samples so this is a problem about comparing
proportions. (b) This is an observational study comparing random samples selected from two
I independent populations.

13.2 (a) Scores will be obtained from the samples so this is a problem about comparing means
(average scores). (b) This is an experiment because the researchers an imposing a "treatme~t"
and measuring a response variable. Since these are volunteers we will not be able to generalize
the results to all garners.
i

13.3 (a) Two samples. The two segments are used by two independent groups of children. (b) (c) Randomization was not possible, because existing classes were used. The researcher could
Paired data. The two segments are both used by each child. not randomly assign the students to the two groups without disrupting classes.

13.4 (a) Single sample. The sample mean will be compared with the known concentration. (b) 13.6 (a) The two populations are breast-feeding women and other women. We want to test
Two samples. The mean concentration in 10 beakers with the new method will be compared to H0 : flB = Jlc versus Ha : flB < Jlc, where JlB and Jlc are the mean percent change in mineral
the mean concentration in 10 different beakers with the old method content of the spines over three months for breast-feeding and other mothers, respectively.
(b) Dotplots (on the left) and boxplots (on the right) are shown below. Both distributions appear
13.5 (a) H 0 : Jlr = Jlc versus Ha : Jlr > Jlc, where fir and Jlc are the mean improvement of to be Normal.
reading ability of the treatment and control group respectively. (b) The treatment group is
slightly left-skewed with a greater mean and smaller standard deviation (:X =51.48, s= 11.01)
than the control group (:X =41.52, s= 17 .15). The histograms below show no serious departures
from Normality for the treatment group (on the left) and one unusually large score for the control
the

Breast-feeding mothers have a lower mean mineral content (:X= -3.587, s= 2.506) with more
variability than other mothers (:X= 0.314, s= 1.297). (c) This is an observational study so we
cannot make a cause and effect conclusion, but this effect is certainly worth investigating
because there appears to be a difference in the two groups of mothers for some reason.

13.7 (a) The hypotheses should involve 1-4 and Jlz (population means) rather than x; and :X2
The boxplot (on the left below) also shows that the median DRP score is higher for the treatment (sample means). (b) The samples are not independent. We would need to compare the scores of
group and the IQR is higher for the control group. Notice that the unusually high score is not the 10 boys to the scores for the 10 girls. (c) We need the P-value to be small (for example, less
identified as an outlier by Minitab. The combined Normal probability plot (on the right below) than 0.05) to reject H 0 A large P-value like this gives no reason to doubtH0
shows an overall liner trend for both sets of scores, so the Normal condition is satisfied for both
groups. 13.8 (a) Answers will vary. Examine random digits, if the digit is even then use Design A,
otherwise use Design B. Once you use a design 30 days, stop and use the other design for the
remaining days in the study. The first three digits are even, so the first three days for using
Design A would be days 1, 2, and 3. (Note, ifDesign A is used when the digit is odd, then the
first three days for using Design A are day 5, day 6, and day 8.) (b) Use a two-sided alternative
( H 0 : J1A = flB versus Ha : J1 A -:F flB ), because we (presumably) have no prior suspicion that one

l
274 Chapter 13 Comparing Two Population Parameters 275

Chapter 13

13.1 (a) Counts will be obtained from the samples so this is a problem about comparing
proportions. (b) This is an observational study comparing random samples selected from two
independent populations.

13.2 (a) Scores will be obtained from the samples so this is a problem ~bout ~omp~:ing mean~
(average scores). (b) This is an experiment because the researchers .an tmposmg a treatme~t
and measuring a response variable. Since these are volunteers we wtll not be able to generahze
the results to all garners.

13.3 (a) Two samples. The two segments are used by two independent groups of children. (b) (c) Randomization was not possible, because existing classes were used. The researcher could
Paired data. The two segments are both used by each child. not randomly assign the students to the two groups without disrupting classes.

13.4 (a) Single sample. The sample mean will be compared with the known ~oncentration. (b) 13.6 (a) The two populations are breast-feeding women and other women. We want to test
Two samples. The mean concentration in 10 beakers with the new method wtll be compared to H 0 : J.l.n = J.l.c versus Ha : J.l.n < J.l.c , where J.iB and J.lc are the mean percent change in mineral
the mean concentration in 10 different beakers with the old method
content of the spines over three months for breast-feeding and other mothers, respectively.
(b) Dotplots (on the left) and boxplots (on the right) are shown below. Both distributions appear
13.5 (a) H0 : J.1.r = J.1.c versus Ha : J.1.r > J.l.c, where fir and J.lc are the mean improvement of to be Normal.
reading ability of the treatment and control group respectively. (b) The treatment group is
slightly left-skewed with a greater mean and smaller standard deviation (x=51.48, s= 11.01)
than the control group (x=41.52, s= 17.15). The histograms below show no serious departures
from Normality for the treatment group (on the left) and one unusually large score for the control
the

Breast-feeding mothers have a lower mean mineral content (:X= -3.587, s= 2.506) with more
variability than other mothers (:X= 0.314, s= 1.297). (c) This is an observational study so we
cannot make a cause and effect conclusion, but this effect is certainly worth investigating
because there appears to be a difference in the two groups of mothers for some reason.

13.7 (a) The hypotheses should involve 1-4 andJ.L.z (population means) rather than x; and :X2
The boxplot (on the left below) also shows that the median DRP score is higher for the treatment
(sample means). (b) The samples are not independent. We would need to compare the scores of
group and the IQR is higher for the control group. Notice that the .u~usually high sc~re is not the 10 boys to the scores for the 10 girls. (c) We need the P-value to be small (for example, less
identified as an outlier by Minitab. The combined Normal probabthty plot (on the nght below) than 0.05) to reject H 0 A large P-value like this gives no reason to doubtH0
shows an overall liner trend for both sets of scores, so the Normal condition is satisfied for both
groups.
13.8 (a) Answers will vary. Examine random digits, if the digit is even then use Design A,
otherwise use Design B. Once you use a design 30 days, stop and use the other design for the
remaining days in the study. The first three digits are even, so the first three days for using
Design A would be days 1, 2, and 3. (Note, if Design A is used when the digit is odd, then the
first three days for using Design A are day 5, day 6, and day 8.) (b) Use a two-sided alternative
( H 0 : J.1.A = J.1.n versus Ha : J.l.A -:f. J.l.n ), because we (presumably) have no prior suspicion that one
276 Chapter 13 Comparing Two Population Parameters 277

design will be better than the other. (c) Both sample sizes are the same ( n1 = n2 = 30 ), so the that the difference in mean summer earnings is between $413.62 and $634.64 higher for men.
appropriate degrees of freedom would be df = 30 - 1 = 29. (d) Because 2. 045 < t < 2.150, and ~c) The sample ~s not r~al~y random, but there is no reason to expect that the method used should
the alternative is two-sided, Table C tells us that 0.04 < P-value < 0.05. (Software gives P = mtroduce any btas. Thts Is known as systematic sampling. (d) Students without employment
0.0485.) We would reject H 0 and conclude that there is a difference in the mean daily sales for were ~xcluded, so the survey results can only (possibly) extend to employed undergraduates.
,I Knowmg the number of unreturned questionnaires would also be useful. These students are
the two designs.
from one college, so it would be very helpful to know if this student body is representative of
some larger group of students. It is very unlikely that you will be able to generalize these results
13.9 (a) We want to testH0 : Jlr = Jlc versus Ha: Jlr > Jlc. The test statistic is to all undergraduates.
I
t= ~
5 48 4 52
1. - 1. 2.311,0.01 <P-value<0.02withdf=20(TlcalculatorgivesP-
11.012 /21 + 17.15 2/23 13.12 Answers will vary.
value= 0.0132 with df= 37.86 and Minitab.gives P-value = 0.013 with df-=37). At the 5%
significance level, it does not matter which method you use to obtain the P-value. The P-value 13.13 (a) We want to test H 0 : f-lR = Jlw versus Ha: f-lR > Jlw, where f-lR and Jlw are the mean
(rounded to 0.013) is less than 0.05, so the data give good evidence that the new activities percent change in polyphenols for men who drink red and white wine respectively. The test
. . . 5.5-0.23
improve the mean DRP score. (b) A 95% confidence interval for f1r - Jlc is stattsttc IS t = ~ 2
2.52 /9 + 3.29 2 /9
= 3.81 with df= 8 and 0.0025 < P-value < 0.005. (b) The
2
(51.~8-41.52)2.086~11.oej21+17.15 /23 = (0.97, 18.94) with df= 20; (1.233, 18.68) on TI
value of the test statistic is the same, but df= 14.97 and the P-value is 0.00085 (Minitab gives
calculator with df= 37.86; and (1.22637, 18.68254) using Minitab with df= 37. We estimate the
0.001 with df= 14). The more complicated degrees of freedom give a smaller and less
mean improvement in reading ability using the new reading activities compared to not using
cons~rvati~e P-value. (c) This study appears to have been a well-designed experiment, so it does
them over an 8-week period to be between 1.23 and 18.68 points. provtde evtdence of causation.

13.10 We want to test H 0 :JIB = Jlc versus Ha : JIB < Jlc . The test statistic is
13.14 (a)A95%confidenceintervalfor JIR-flw is (5.5-0.23)2.306~2.52 2 /9+3.29 2 /9=
3 59 0 31
t=
1
- -
2
=-8.51, P-value < 0.0005 with df= 21 (the TI calculator and Minitab
(2.08%, 8.45%). (b) With df= 14.97, t* = 2.132 and the confidence interval is 2.32% to 8.21 %.
v2.5e /47 + 1.30 j22
(Minitab gives 2.304% to 8.229% with df= 14.) There is very little difference in the resulting
give P-values very close to 0). The small P-value is less than any reasonable significance level, confidence intervals.
say 1%, so the data give very strong evidence that nursing mothers on average lose more bone
mineral than other mothers. (b) A 95% confidence interval for JIB - Jlo is 13.15 (a) We want to test H0 : Jls =JIN versus Ha : Jls > JIN, where Jls and f-lN are the mean
(-3.59-0.31)2.080~2.5e /47 + 1.30 2/22 = (-4.86, -2.95) with df= 21; (-4.816, -2.986) on ~ee velocities for skilled and novice female competitive rowers, respectively. The test statistic
TI calculator with df=66.21 (see the screen shots below); and (-4.81632, -2.98633) using Is t = 3.1583 and the P-value = 0.0052. Note that the two-sided P-value is provided on the SAS
Minitab with df= 66. We estimate the difference in the mean change in bone mineral for output, so to get the appropriate P-value for the one-sided test use 0.0104/2 = 0.0052. Since
breastfeeding mothers when compared to other mothers to be between about 3% and 5%, with 0.0052 < 0.01, we reject H 0 at the 1% level and conclude that the mean knee velocity is higher
breastfeeding mothers losing more r:b-:-on::.e::d~e::.n::.s.:.:ity~~-::-------, for skilled rowers. (b) Using df= 9.2, the critical value is t* = 1.8162 and the resulting
2-SaMPTint 2-SaMPT!nt 2-SaMPTlnt confidence interval for Jls- JIN is (0.4982, 1.8475). With 90% confidence, we estimate that
Inpt:Oata ~ tn1:47 . ( -4.816, -2. 986) skilled female rowers have a mean angular knee velocity of between 0.498 and 1.847 units
xl= ~3.587 x2=.314 df=66.21405835 higher than that of novice female rowers. (c) Taking the conservative approach with Table C df
Sxl:2.506 Sx2:1.297 x1=-3.587
n1:47 n2:22 x:::=.314 = 7 and the critical value is t* = 1.895. Since 1.895 > 1.8162, the margin of error would be '
x2~.314 C-Level:95 Sx1=2.506 larger, so the confidence interval would be slightly wider.
Sx2:1.297 Pooled:llfi Yes 4-Sx:::=1.297
~n2:22 Calcul~ I 13 . 16 (a) The mtssmg
t statistic
IS t= 70.37-68.45
~6.1 0035 2/10 +9.03999 2/8
=0.5143. (b) We want to
13.11 (a) Because the sample sizes are so large, the t procedures are robust against non-
Normality in the populations. (b) A 90% confidence interval for JIM - JIF is testH0 : Jls = JIN versus Ha : Jls * JIN, where Jls and f-lN are the mean weights of skilled and
2 novice female competitive rowers, respectively. The test statistic is t = 0.5143 and the P-value =
(1884.52-1360.39) 1.660~1368.37 2 /675 + 1037.46 /621 = ($412.68, 635.58) using df= 100; 0.6165. Since 0.6165 > 0.05, we cannot reject H 0 at the 5% level. There is no significant
($413.54, $634.72) using df= 620; ($413.62, 634.64) using df= 1249.21. We are 90% confident
278 Chapter 13 Comparing Two Population Parameters 279

difference in the mean weights for skilled and novice rowers. (c) The more conservative 13.23 Answers will vary, but here is an example. The difference between average female (55.5)
approach would use df = 7. The t distribution with df = 7 has slightly heavier tails than the t and male (57.9) self-concept scores was so small that it can be attributed to chance variation in
distribution with df= 11.2, so the conservative P-value would be larger. the samples (t = -0.83, df = 62.8, P-value = 0.411 0). In other words, based on this sample, we
have no evidence that mean self-concept scores differ by gender.
13.17 (a) Two-sample t test. (b) Paired t test. (c) Paired t test. (d) Two-sample t test. (e) Paired t
test. 13.24 (a) If the loggers had known that a study would be done, they might have (consciously or
subconsciously) cut down fewer trees, in order to reduce the impact of logging. (b) Random
13.18 (a) The summary table is shown below. The only values not given directly are the assignment allows us to make a cause and effect conclusion. (c) We want to test H 0 :Jlu =ILL
standard deviations, which are found by computing s = MSEM. (b) Use df= 9. versus Ha : Jlu >ILL , where ILu and ILL are the mean number of species in unlogged and logged
Group Treatment n X s 17 5 13 67
plots respectively. The test statistic is t = - = 2.11 with df= 8 and 0.025 < P-
1 IDX 10 116.0 17.71 ~3.532 /12 + 4.5 2/9
2 Untreated 10 88.5 6.01
value< 0.05. Logging does significantly reduce the mean number of species in a plot after 8
(c) This is a completely randomized design with one control group and one treatment g~oup.
The easiest way to carry out the randomization might be to number the hamsters (or the1r years at the 5% level, but not at the 1% level. (d) A 90% confidence interval for f.1u - JLL is
(17.5 -13.67) 1.860~3.53 /12+ 4.5 2 /9 = (0.46, 7.21). (Minitab gives an interval from 0.63964
individual cages) from 1 to 20. Use the SRS applet and put 20 balls in the population hopper. 2
Select 10 balls from the hopper. The 10 hamsters with these numbers will be injected with IDX.
to 7.02703 .) We are 90% confident that the. difference in the means for unlogged and logged
The other 10 hamsters will serve as the control group.
plots is between 0.46 and 7.21 species.
13.19 (a) Yes, the test statistic for testing H 0 :fit = f.12 versus Ha :fit > ~ is
13.25 Let p 1denote the proportion of mice ready to breed in good acorn years and p 2 denote the
t=
116 - 88 5 = 4.65. With either df= 9 or df= 11.05, we have a significant result proportion of mice ready to breed in bad acorn years. The sample proportions are
~17.71 2 /10+6.oe;to p1 =54/72 =0.75 and p2 =10/17 =0.5882, and the standard error is
(P-value < 0.001 or P-value < 0.0005, respectively), so there is strong evidence that IDX 0.75x0.250.5882x0.4118 .
prolongs life. (b) !fusing df= 9, the 95% confidence interval for fit-~ is SE = + = 0.1298. A 90% confidence mterval for p 1 - p 2 is
72 17
(116-88.5)2.262~17.71 2 /10+ 6.01 2 /10 = (14.12, 40.88). With 95% confidence we estimate (0.75-0.5882)1.645x0.1298 = (-0.0518,0.3753). With 90% confidence, we estimate that
that IDX hamsters live, on average, between 14.12 and 40.88 days longer than untreated the percent of mice ready to breed in the good acorn years is between 5.2% lower and 37.5%
hamsters. !fusing df= 11.05, the interval is (14.49, 40.51). higher than in the bad years. These methods can be used because the populations of mice are
certainly more than 10 times as large as the samples, and the counts of successes and failures are
13.20 (a) This is a two-sample t statistic, comparing two independent groups (supplemented and at least 5 in both samples. We must view the trapped mice as an SRS of all mice in the two
control). (b) Using the conservative df= 5, t = -1.05 would have aP-value between 0.30 and areas.
0 .40, which (as the report said) is not significant.

13.21 We want to test H 0 : ILc = ILs versus Ha : ILc ::f:. ILs . The test statistic is 13.26 (a) The sample proportion of women who felt vulnerable is Pw = ~~ = 0.4821, and the
40 1 3
t= - 1. =-3.74and theP-value is between 0.01 and 0.02 (df= 5) or corresponding sample proportion for men is PM = 46 = 0.7302. (b) A 95% confidence interval
~3.10934 2 /6+3.92556 2/7 63
0.0033 (df= 10.95), agreeing with the stated conclusion (a significant difference). for the difference PM- Pw is (0.7302 -0.4821)1.96 0.7302x 0.2698 + 0.4821x0.5179 =
63 56
13.22 (a) These are paired t statistics: For each bird, the number of days behind the caterpillar (0.0773, 0.4187). With 95% confidence, we estimate the percent of men who feel vulnerable in
peak was observed, and the t values were computed based on the pairwise differences between this area to be about 0.08 to 0.42 above the proportion of women who feel vulnerable. Notice
the first and second years. (b) For the control group, df= 5, and for the supplemented group, df that 0 is not included in our confidence interval, so there is a significant difference between these
= 6. (c) The control tis not significant (so the birds in that group did not "advance their laying proportions at the 5% level.
date in the second year"), while the supplemented group t is significant with a one-sided P-value
= 0.0195 (so those birds did change their laying date).
I '

Chapter 13 Comparing Two Population Parameters 281


280

(0.1104-0.1 121 )1. 96 0.1104x0.8896 + 0.1121x0.8879 _


5690 0.44x 0.56 = (0.4315 0.4486). 1649 - (-0.0232, 0.0197). We are 95Yo 0
13.27 (a) A 95% confidence interval for PN is 1.96 ' 1650
i 12931 12931
'!
With 95% confidence, we estimate the percent of cars that go faster than 65 mph when no radar confident that the di~ference i~ t~e proportion of deaths for the two treatment groups is between
-_0.0.2 and 0.?2. Notice that 0 IS m the confidence interval, so we do not have evidence of a
I is present is between 43.15% and 44.86%. (b) A 95% confidence interval for PN- PR is
I ' stgmficant dt.fference. in th~ proportion of deaths for these two treatments at the 5% level. (d) A
0 56 0 32 0 68 Type I ~rror IS committe? If the researchers conclude that there is a significant difference in the
, I ( 0.44-0.32) 1.96 0.44 x + x = (0.1 02, 0.138). With 95% confidence, we
12931 3285 . propo.rtiOns o~ stro~es With these two treatments, when in fact there is no difference. A Type II
estimate the percent of cars going over 65 mph is between 10.2% and 13.8% higher when no error IS c~mmttted 1fthe researchers conclude that there is no difference in the proportions of
radar is present compared to when radar is present. (c) In a cluster of cars, where one driver's str~kes With these two .treatments, when in fact there is a difference. A Type II error is more
behavior might affect the others, we do not have independence; one of the important properties senous bec~use no patients .would be harmed with a Type I error, but patients suffer
unnecessanly from strokes If the best treatment is not recommended.
of a random sample.
= Pw * Pw.. The

13.28 A 95% confidence mterva 1318 1.96
1.,.10r p IS--
2092
63

2092
7
x0.3 = (0.6093, 0.6507). We are
13.31 For computer access at home, we want to testH0 : Pn
. d I . . ~
comb me samp e proportiOn Is p = 86 + 1173
versus Ha : Pn

=0.615 and the test statistic is


95% confident that between 61% and 65% of all adults use the internet. (b) A 95% confidence c 131+1916
0.6565-0.6122
. ( )
interval for Pu- PN IS 0.79-0.38 1.96
0.79x0.21 0.38x0.62 (O 3693 0 4506 ) W
+ = . , . . e are z = ~0.615(1-0.615)(1/131 + 1/1916) =1.01' with a P-value = 0.3124. The same hypotheses are
1318 774
95% confident that the difference in the proportion of internet users and nonusers who expect used for the proportions with computer access at work. The combined sample proportion is
businesses to have Web sites is between 0.37 and 0.45. ~ 100+1132. 07634-05908
Pc = 0.602 and the test statistic is z = 3:9o
131 + 1916 ~0.602(1-0.602)(1/131+1/1916) . '
13.29 Let pi= the proportion of students who use illegal drugs in schools with a drug testing with a P-value < 0.0004. Since the P-value is below any reasonable signific~nce level say 1o.t
h 'd ' ;ro,
program and p 2 = the proportion of students who use illegal drugs in schools without a drug we ave very strong ev1 ence of a difference in the proportion of blacks and whites who have
testing program. We want to test H 0 :PI = p 2 versus Ha :PI < p 2 The combined sample computer access at work.

. IS
proportiOn . p~ = 7 + 27 =. 0.1232 and t h e test statistic
. . IS. 13.32 (a) Let Pi= the proportion of women got pregnant after in vitro fertilization and
c 135+141
intercessory prayer and p 2 =the proportion of women in the control group who got pregnant
z= 0 0519 - 0 .1 915 -3.53, withaP-value=0.0002. Since0.0002<0.01,
after in vitro fertilization. We want to test H 0 :pi = p 2 versus Ha :pi * p 2 The combined sample
~0.1232(1- 0.1232)(1/135 + 1/141)
. . ~
we rejectH0 There is extremely strong evidence that drug use among athletes is lower in proportiOn IS Pc =
44+21
88+81
= 0.3846 and the test statistic is
schools that test for drugs. There should be some concern expressed about the condition oftwo
0.5-0.26
independent simple random samples, because these two samples may not be representative of =
z = ~0.3846(1-0.3846)(1/88+ 1/81) 3.21' with a P-value = 0.0014. Since 0.0014 < 0.01, we
similar schools.
reject H 0 This is very strong evidence that the observed difference in the proportions of women
13.30 (a) The patients were randomly assigned to two groups. The first group of 1649 patients
wh? got pre~nant is not due to chance. (b) This study shows that intercessory prayer may cause
received only aspirin and the second group of 1650 patients received aspirin and dipyridamole.
an mcrease m pregnancy. However, it is unclear if the women knew that they were in a
*
(b) We want to test H 0 :pi = p 2 versus Ha :PI p 2 The combined sample proportion is treatment group. If they found out that other people were praying for them, then their behaviors
~
p =
c
206+ 157
1649 + 1650
= d h . . .
0.11 an t e test statistic IS z =
1
0.1249-0.0951
-y0.11(1- 0.11)(1/1649 + 1/1650)
. 2 73 . h
, Wit may ha~e changed and there could be many other factors to explain the difference in the two
proportiOns. (c) A Type I error would be committed if researchers concluded that the
a P-value = 0.0064. Since 0.0064 < 0.01, there is very strong evidence that there is a significant proportions of p:egnancies are different, when in fact they are the same. This may lead many
difference in the proportion of strokes between aspirin only and aspirin plus dipyridamole. (c) couples to seek mtercessory prayer. A Type II error would be committed if researchers
c~ncluded that the proportions are not different, when in fact they are different. Couples would
A 95% confidence interval for Pi- p 2 is
fall to take advan.tage of a ~elpful. technique to improve their chances of having a baby. For
couples who are mterested m havmg a baby, a Type II error is clearly more serious.
i'
I
I
Chapter 13 Comparing Two Population Parameters
282 283
I I

I !
AZT and a placebo, when in fact there is a difference. The consequence is that patients would
13.33 (a) H 0 should refer to population proportions PI and p 2 , not sample proportions. (b) not get t~e best possible treatment. A Type II error i~ more serious in this situation because we
i Confidence intervals account only for sampling error. want patients to get the best possible treatment.
! !

I 13.34 (a) Let p 1 =the proportion of households where no message was left and contact was 13.37 (a) The number of_ord:rs completed in 5 days or less before the changes was
eventually made and p 2 =the proportion of household where a message was left and contact was =
XI = 0.16 x 200 = 32 With P1 = 0.16 and SEP 0.02592, the 95% confidence interval for p
eventually made. We want to test H 0 : p1 = p 2 versus Ha :pi < p 2 The combined sample is (0.1092, 0.2108). (b) After the changes, x2 = 0.9x200 = 180. With P2 = 0.9 and I

. p~ = 58+ 200 =
proportion IS . 0 .66 an d the test statistic
. . IS. SEP =0.02121, the 95% confidence interval for p is (0.8584, 0.9416). (c) The standard error
2

c 100+291 ofthe difference in the proportions is SEPz-fi, =0.0335 and the 95% confidence interval for
''
I
z 0 58 - 0 687 =
-1.95, with a P-value = 0.0256. Yes, at the 5% level, there ~z- PI is (0.6743, 0.8057) or about 67.4% to 80.6%. No, the confidence intervals are not
~0.66(1-0.66)(1/100+ 1/291) duect~y rel~te~. E_ach interva! is based on a different sampling distribution. Properties of the
is good evidence that leaving a message increases the proportion of households that are sampl~ng d~str~but~on o_fthe difference can be obtained from properties of the individual
eventually contacted. (b) Let p 1 =the proportion ofhouseholds where no message was left but
the survey was completed and p 2 =the proportion of household where a message was left and
d
s~mphng distnbutiOns m parts (a) and (b), but the upper and lower limits of the intervals are not
Irectly related.

the survey was completed. We want to test H 0 : p 1 = p 2 versus Ha :PI < P2 The combined sample
13.38 (a) We must have tw~ simple random samples of high-school students from Illinois; one
~
proportion is Pc
33 + 134
100+291
=0.427 and the test statistic
. . .IS
for freshman and one for semors. (b) The sample proportion of freshman who have used
z= 0 33 - 0 .4 6 =-2.28, with a P-value = 0.0113. Yes, at the 5% level, anab 0 rIC st ermds IS ~ = 34 0.0203. Since the number of successes (34) and the, number of
PF =
1679
~0.427(1-0.427)(1/1 00 + 1/291)
failures (1645) are both at least 10, the z confidence interval can be used. A: 95% confidence
there is good evidence that leaving a message increases the proportion of households who
complete the survey. (c) A 95% confidence interval for the difference p 1 - p 2 when dealing with . t I .(:', . 0.0203 X 0.9797
m erva 1.0r PF Is 0.02031.96 = (0.0135, 0.0270). We are 95% confident
1679
eventual contact is (-0.218, 0.003). A 95% confidence interval for the difference p 1 - P2 when
that between 1.35% and 2.7% ofhigh-school freshman in Illinios have used anabolic steroids.
dealing with completed surveys is (-0.239, -0.022). Although these effects do not appear to be
large, when you are dealing with hundreds (or thousands) of surveys anything you can do to (c) The sample proportion of seniors who have used anabolic steroids is p~ = ~:::: 0.0176
improve nonresponse in the random sample is useful. . s 1366 .
Notice t~at 0.01:6 falls in the 95% confidence interval for plausible values of PF from part (b),
13.3 5 (a) H 0 : PI = p 2 versus Ha : PI > p 2 where PI is the proportion of all HIV patients taking a so there IS no ev~dence of a significant difference in the two proportions. The test statistic for a
placebo that develop AIDS and p 2 is the proportion of all HIV patients taking AZT that develop formal hypothesis test is z = 0.54 with a P-value = 0.59.
AIDS. The populations are much larger than the samples, and ndJc, n1 (1- Pc ), n2ftc, n2 (1- Pc)
13.39 We want to test Ho: PI= Pz versus Ha: PI =F p 2 From the output, z =-3.45 with a P-
38
are all at least 5 (b) The sample proportions are PI = =0.0874, p2 = _!2._ =0.0391, and value = 0.0006, showing a significant difference in the proportion of children in the two age
. ~5 ~5
groups who sorted the products correctly. A 95% confidence interval for p _ p is (-0 5025279
-0.15407588). With 95% confi~ence we estimate that between 15.4% and ~0.3~ mor~ 6- to 7- '
. . . 0.0874-0.0391 . .h P I
p = 0.0632 . The test statistic IS z = = 2 .93 , Wit a -va ue
c ~0.0632(1-0.0632)(1/435 + 1/435) year-olds can sort new products mto the correct category than 4- to 5-year-olds.
of 0.0017. There is very strong evidence that a significantly smaller proportion of patients taking
AZT develop AIDS than if they took a control. (c) Neither the subjects nor the researchers who
had contact with them knew which subjects were getting which drug.
13.40 (a) The two sample proportions are Pw = ~
53
=0.1132 and pN = 108
45 ::::0.4167
.
(b) W
e
wanttotest H o p
w =pN versusHapw.,...... PN The com b"me d samp1e proportiOn
IS

13.36 A Type I error would be committed if researchers concluded that the treatment is more 6 45
effective than a placebo, when in fact it is not. A consequence is that patients would be taking Pc = + = 0.3168and the test statistic isz 0 .1 132 -0.4167 .:..
AZT and perhaps suffering from side effects from the medication that is not helpful. A Type II
53 + 108 ~0.3168(1-0.3168)(1/53+1/108) --3.89 '
error would be committed if researchers conclude that there is no difference in the success of with a P-value < 0.0002. Since the P-value is less than any reasonable significance level, say
286 Chapter 13 Comparing Two Population Parameters 287
I 1

13.48 (a) We want to test H0 :liP =lie versus Ha :liP > lie . The test statistic is

t =~ 2 - 1.17, with a P-value close to ~.125. (Minitab reports a P-value of0.123


193 174
=
68 /26 + 44 2/23
with df= 44.) Since 0.125 > 0.05, we do not have strong evidence that pets have higher mean
cholest~rol than clinic dogs. (b) A 95% confidence interval for f.ip- lie is

(193-174) 2.074~68 2 /26+44 2 /23 = (-14.5719, 52.5719). Minitab gives (-13.6443,


51.6443). With 95% confidence, we estimate the difference in the mean cholesterol levels
between pets and clinic doges to be between -14 and 53 mg/dl. (c) A 95% confidence interval
for lip is 1932.060 ~ = (165.5281, 220.4719). Minitab gives (165.534, 220.466). With
v26
The test statistic is t =
40 8 25
!1. - =
1.91, with 0.025 <P-value < 0.05 and df= 7 95% confidence, we estimate the mean cholesterol level in pets to be between 165.5 and 220.5
2
~3.17 /I 0 + 3.69 2 /8 mg/dl. (d) We must have two independent random samples to make the inferences in parts (a)
(Minitab gives a P-value of 0.039 with d:f=13). The P-value is less than 0.05, so the data give and (b) and a random sample of pets for part (c). It is unlikely that we have random samples
good evidence that the positive subliminal message brought about greater improvement in math from either population.
scores than the control. (b) A 90% confidence interval for f.1r - f.ic is
(11.40-8.25)1.895~3.17 2 /10+3.69 2 /8 = (0.03, 6.27) with df= 7; (0.235, 6.065) using =
13.49 (a) The two sample proportions are Pe = 17/283 0.0601 for residents of congested streets

Minitab with df = 13. With 90% confidence, we estimate the mean difference in gains to be
=
and Pn = 35/165 0.2121 for residents of bypass streets. The difference is Pc- Pn = -0.1520 with
0.0601x0.9399 0.2121x0.7879 .
0.235 to 6.065 points better for the treatment group. (c) This is actually a repeated measures
design, where two measurements (repeated measures) are taken on the same individuals. Many
a standard error of SE =
283
+
165
= 0.0348. (b) The hypotheses are
.
students will probably describe this design as a completely randomized design for two groups, H 0 : Pe = Pn versus Ha : Pe < Pn. The alternative reflects the reasonable exp'ectation that
with a twist-instead of measuring one response variable on each individual, two measurements reducing pollution might decrease wheezing. (c) The combined sample proportion is
are made and we compare the differences (improvements). 17+35 . dh . . . o.o6o1-o.2121 .
Pc =
A

= 01161
. an t e test statistic IS z =
1
=-4.85,
283 + 165 "0.1161(1- 0.1161)(1/283 + 1/165)
13.45 (a) A99% confidence interval for PM- Pw is
with a P-value < 0.0001. A sketch of the distribution of the test statistic, assuming H 0 is true, is
(0.9226-0.6314)2.576
9226 0 0774 0 6314 03686
x
840
+ x
1077
= (0.2465, 0.3359). Yes, because shown below.

the 99% confidence interval does not contain 0. (b) We want to test H 0 :liM = f.1w versus

H :liM =1:-
. . IS
f.1w. The test statistic . t
272.40-274.7
= . - 0.87 , wit
. h a P-vaIue c Iose to
a ~59.2 /840+57.5 /1077
2 2

0.4. (Minitab reports a P-value of0.387 with df= 1777.) Since 0.4 > 0.01, the difference
between the mean scores of men and women is not significant at the 1% level.

13.46 (a) Matched pairs t. (b) Two-sample t. (c) Two-sample t. (d) Matched pairs t. (e)
Matched pairs t.

13.47 (a) A 99% confidence interval for f.ioPT - J.iwJN is Notice that a reference line is provided at -4.85 to illustrate how far down in the lower tail of the
distribution that this value of the test statistic is located. The P-value tells us the chance of
(7638-6595) 2.581~289 2 /1362+ 247 2 /1395 = (1016.55, 1069.45). (b) The fact that the observing a test statistic of -4.85 or something smaller if H 0 is true. As you can see there is
sample sizes are both so large (1362 and 1395) .. almost no chance of this happening, so we have very convincing evidence that the percent of
residents reporting improvement from wheezing is higher for residents ofbypass streets. (d)
The 95% confidence interval, using the standard error from part (b), has margin of error
1.96x 0.0348 = 0.0682. Thus, the 95% confidence interval is -0.1520.0682 = (-0.2202,
288 ChapterJ3.

-0.0838). The percentage reporting improvement was between 8% and 22% higher for bypass
residents. (e) There may be geographic factors (e.g., weather) or cultural factors (e.g., diet) that
limit how much we can generalize the conclusions.

13.50 (a) A 99% confidence interval for PH- PN is


0.07 X 0.93 + 0.14 X 0.86 = (-0.0991, -0.0409). W1t
h 99% confid
( 0 .07-0. 14) 2.57 6 1 ence,
2455 1191
the percentage ofblacks is between 4.09% and 9.91% higher for non-household providers. Yes,
the difference is significant at the 1% level because the 99% confidence interval does not contain
0. (b) A 99% confidenceinterval for flH- flN is (11.6-12.2)2.581~2.2 2 /2455 +2.f /1191 =
(-0.7944, -0.4056), using df= 1000. (Minitab gives (-0.794182, -0.405818) with df=2456.)
With 99% confidence, the mean number of years of school for non-household workers is
between 0.41 and 0.79 years higher than household providers. Yes, the difference is significant
at the 1% level, because 0 is not included in the 99% confidence interval.
'i
I
I I 288 Chapter.l3. 289

-0.0838). The percentage reporting improvement was between 8% and 22% higher for bypass Chapter 14
residents. (e) There may be geographic factors (e.g., weather) or cultural factors (e.g., diet) that
limit how much we can generalize the conclusions. 14.1 (a) (i) 0.20 < P-value < 0.25. (ii) P-value = 0.235. (b) (i) 0.02 < P-value < 0.025. (ii) P-
value = 0.0204. (c) (i) P-value > 0.25. (ii) P-value = 0.3172.
13.50 (a) A 99% confidence interval for Pn- PN is
, I 0.07 X 0.93 0.14 X 0.86 . 14.2 Answers will vary. (a) Use a ;r 2 goodness of fit test. Most classes will obtain a very large
( 0 .07- 0 . 14 ) 2 .576 + = (-0.0991, -0.0409). Wtth 99% confidence,
2455 1191 value ofthe test statisticX 2 and a very small P-value. (b) Use a one-proportion z test with a
the percentage of blacks is between 4.09% and 9.91% higher for non-household providers. Yes, two-sided alternative or construct a confidence interval for p. (c) You can construct the interval;
the difference is significant at the 1% level because the 99% confidence interval does not contain however, your ability to generalize is limited by the fact that your sample of bags is not an SRS.
M&M's are packaged by weight rather than count.
0. (b) A 99% confidence interval for fln- flN is (11.6-12.2)2.581~2.2 2 /2455+2.1 2 /1191 =
(-0.7944, -0.4056), using df= 1000. (Minitab gives (-0.794182, -0.405818) with df=2456.)
With 99% confidence, the mean number of years of school for non-household workers is 14.3 (a) See the table below; for example, =22
91
24.2% received A's. (There were 91 students
between 0.41 and 0.79 years higher than household providers. Yes, the difference is significant
in the class.) The professor gave fewer A's and more D/F's than TAs. (b) The expected counts
at the 1% level, because 0 is not included in the 99% confidence interval.
are also given in the table below; for example, 91x0.32 = 29.12. (c) We want to test
H0 : PA = 0.32, PB = 0.41, Pc = 0.20, pDf F = 0.07 versus Ha : at least one of these proportions is
different. All the expected counts are greater than 5 so the condition for the goodness of fit test
is satisfied. The chi-square statistic is
2 2 2
(22-29.12) (38-37.31) (20-18.2t (11-6.37) 9 w h
2
X = + + + =5.2 7. e ave df = 4 - 1 = 3, so
29.12 37.31 18.2 6.37
TableD shows 0.15 < P-value < 0.20 and software gives P-value = 0.1513. Since 0.1513 > 0.05,
there is not enough evidence to conclude that the professor's grade distribution was different
from the TA grade dlS tfl'b Uf lOll.
A B c DIF
Percent 24.2% 41.8% 22.0% 12.1%
Expected Count 29.12 37.31 18.2 6.37

14.4 We want to test H 0 : Pv = PT2o = pT40 =.!.versus Ha: at least one of these proportions is
3
=
different. There w~re 53 birds in all, so the expected counts are each 53 x! 17.67 . Since the
3
expected counts are.greater than 5, the goodness of fit test can be used for inference. The chi-
. . . . (31-17.67)2 (14-17.67 )2 (8-17.67 )2
squarestatlsttcts X 2 = + + . =10.06+0.76+5.29=16.11. The
17.67 17.67 17.67
degrees of freedom are df= 3- 1 = 2, and TableD shows that 16.11 is greater than the 0.0005
critical value of 15.20, so the P-value < 0.0005. Since 0.0005 < 0.01, there is very strong
evidence that the three tilts differ. The data and the terms of chi-square show that more birds
than expected strike the vertical window and fewer than expected strike the 40 degree window.

14.5 We want to testH0 :The genetic model is valid (the different colors occur in the stated ratio

of 1:2:1 or Paa =Pgg =!,Pag


4
=!2 ). Ha :The genetic model is not valid. The expected counts
are 21 for GG (green), 42 for Gg (yellow-green), and 21 for gg (albino). The chi-square statistic
290 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 291

2 2 2 2 2 2
. (22-21) (50-42) (12-21) 2 (530-523) (470-523) (420-523r (610-523) 2 (585-523)
I ' tsX 2 = + + == 5.43 with df= 3- 1 = 2. Accordmg to TableD, 0.05 < X= + + + + =47.57. Wehavedf=
, I 21 42 21 523 523 523 523 523
P-value < 0.1 and software gives P-value = 0.0662. Since 0.0662 > 0.01, we do not have 5- 1 = 4, so TableD shows that 47.57 is greater than the 0.0005 critical value of20.00, soP-
significant evidence to refute the genetic model, although the P-value is only slightly larger than value < 0.0005. Since 0.0005 < 0.01, we have statistically significant evidence that the fruit
0.05. . flavors in Trix cereal are not uniformly distributed.

14.6 We want to test H 0 : Motor vehicle accidents involving cell phone use are equally likely to 14.11 (a) The two-way table of counts is shown below.
occur on each weekday versus Ha : The probabilities of a motor vehicle accident involving cell Treatment Successes Failures
phone use vary from weekday to weekday (that is, they are not the same). The hypotheses can Nicotine patch 40 244-40 = 204
Drug 74 244-74= 170
also be stated in terms of population proportions: H0 :PM = Pr =Pw = pR = PF =.!. versus Ha : At Patch plus drug 87 245-87 = 158
5
least one of the proportions differs from 1/5 = 0.2. The expected counts are all equal to 667x0.2 Placebo 25 160-25=135
40 74 87
= 133.4 > 5, so the condition for inference with the goodness of fit test is satisfied. The chi-
square statistic is
(b) The proportions are pN =
244
== 0.1639, pD =
244
== 0.3033, pP+D =
245
=
0.3551, and

we PP = 25 == 0.15625. (c) The bar chart below shows that the patch plus the drug is the most
2 2 2
2 (133 -133.4) (126-133.4r (159-133.4r (136-133.4) (113 -133.4)
X + + + + = 8.495. 160
133.4 133.4 133.4 133.4 133.4
have df= 5-1= 4, so TableD shows 0.05 < P-value < 0.10 and software gives P-value = 0.075. effective treatment, followed by the drug alone. The patch alone is only slightly better than a
Since 0.075 > 0.05, we do not have significant evidence to refute the hypothesis that motor I I
vehicle accidents involving cell phone use are equally likely to occur on each weekday.

14.7 Answers will vary

14.8 We want to testH0 :p1 = p 2 == p 12 =_!_versus Ha: At least one ofthe proportions differs
12
=
from 1/12. There were 2779 responses, so we would expect 2779/12 231.58 for each sign.
The condition for inference (231.58 > 5) is satisfied. The chi-square statistic is
2 2
2 (225-231.58) (222-231.58r (241-231.58r (244-231.58) h df-
X + + ++ =14.39 Wlt - 12
231.58 231.58 231.58 231.58 '
(d) The success rate (proportion of those who quit) is the same for all four treatments. (e) The
- 1 = 11. From TableD, 0.20 < P-value < 0.25 and software gives P-value = 0.212. There is not
expected counts are shown in the table below. Each entry is obtained by multiplying the row
enough evidence to conclude that births are not uniformly spread throughout the year.
total by the column total and dividing by the total number of smokers (893). For example, with
=
the nicotine patch the expected number of success is 244x226/893 61.75 and the expected
14.9 (a) H 0 : Po= P1 = ..... = p 9 =0.1 versus Ha :At least one ofthe p;'s is not equal to 0.1. (b)
=
number of failures is 244x667/893 182.25.
and (c) Answers will vary. Using randlnt (0,9,200) ~ L4, we obtained the counts for digits 0 to Treatment Successes Failures
9: 19, 17, 23, 22, 19, 20, 25, 12, 27, and 16. (d) Expected counts are all to 200x0.1 = 20. (e) Nicotine patch 61.75 182.25
2 2 2
. . +'. I .. (19-20) (17-20) (16-20) "hdf Drug 61.75 182.25
The test statistic 1.0r our stmu at10n ts X 2 = + + + == 8.9 , wtt =
20 20 20 Patch plus drug 62 183
10-1 = 9 and P-value = 0.447. There is no evidence that the sample data were generated from a Placebo 40.49 119.51
distribution that is different from the uniform distribution. (f) The numbers of smokers who successfully quit with "patch plus drug" and "drug" are higher
than expected. The numbers of smokers who successfully quits with "nicotine patch" and
1 "placebo" are lower than expected. This is a slightly different way of looking at the differences
14.10 We want to testH0 :pa = PLemon = Pume =Po= Ps =-versus Ha: At least one ofthe
5 in the success rates we noticed in parts (b) and (c).
proportions differs from 1/5 = 0.2. The expected counts are all equal to 2615x0.2 = 523 > 5, so
the condition for inference with the goodness of fit test is satisfied. The chi-square statistic is 14.12 (a) r =the number of rows in the table and c =the number of columns in the table. (b)
The approximate proportions are shown in the table below.
I i
I

292 Chapter 14 Inference forDistributions of Categorical Variables: Chi-Square Procedures 293

Goal Female Male


HSC-HM 0.21 0.46
HSC-LM 0.10 0.27
LSC-HM 0.31 0.07
LSC-LM 0.37 0.19
(c) One of the two bar charts below should be provided. Both graphs compare the distributions,
so the choice is really a personal preference. It appears that men and women participate in sports
for different reasons-women are more likely to fall in the two categories oflow social
while men are more to fall in the two of social
I I

(c) The null hypothesis says that the incidence of strokes is the same for all four treatments. (d)
th e table beIow.
The expecte d counts are sh own m
Treatment Strokes No Strokes
Placebo 1649x824 = .
205 81
1649x 5778 = .1
1443 9
6602 6602
Aspirin 205.81 1443.19
Dipyridamole 1654 X 824 :::: .4
206 4
1654x5778 = .
1447 56
6602 6602
(d) The expected counts are shown in the table below. The proportions of students in the other
categories is 25/134 = 0.1866, 26/134 = 0.194, and 38/134 = 0.2836. Multiplying each ofthese
Both 1649 X 824 :::: .
205 94
1650x5778 = .
1444 06
6602 6602
proport10ns b>y 67 gives
th e expecte d va1ues.
Goal Female Male
14.14 The two-sample z test statistics for two proportions and the corresponding P-values are
HSC-HM 22.5 22.5
shown in the table below
HSC-LM 12.5 12.5
Null Hypothesis Test Statistic P-value
LSC-HM 13 13
Ho : Pprimary = Psecondary z=-0.66 0.510
LSC-LM 19 19
(e) For women, the observed counts are higher than expected for the two LSC categories and Ho : Pprimary = Puniversity z=1.84 0.065
lower than expected for the two HSC categories. For men the observed counts are higher than z=2.32 0.020
Ho : Psecondary = Puniversity
expected for the two HSC categories and lower than expected for the two LSC categories. The
comparison of the observed and expected counts shows the same association as we noticed with
the proportions in parts (b) and (c). 14.15 (a) The components of the chi-square statistic are shown in the table below.
Treatment Successes Failures
Nicotine patch 7.662 2.596
'
,I Treatment Drug 2.430 0.823
Placebo 250 1649- 250 = 1399 Patch plus drug 10.076 3.414
206 1649-206 = 1443 Placebo 5.928 2.008
211 1654- 211 = 1443 The sum ofthese 8 values isX 2 =34.937with df= (4-l)x(2-1) = 3. (b) According to TableD,
Both 157 1650- 157 = 1493 P-value = < 0.0005. A P-value of this size indicates that it is extremely unlikely that such a result
(b) Even though the number of patients receiving each treatment is approximately the same, it is occurred due to chance; it represents very strong evidence against H 0 (c) The term for success
best to get the students used to switching counts to proportions (or percents) before making with patch plus drug contributes the most (10.076) to X 2 No, this is not surprising because we
comparisons. The bar graphs below compare the four distributions. Students will make a choice noticed in Exercise 14.11 that the "patch plus drug" group contained a higher than expected
between the two different visual displays based on personal preference. The treatment using number of successful quitters and had the highest proportion of successes. (d) Treatment is
both aspirin and dipyridamole appears to be the most effective because it has the highest strongly associated with success. More specifically, the patch together with the drug seems to be
proportion of patients who did not suffer from strokes.
I I
! 11,1

294 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 295
I I
most effective, but the drug is also effective. (e) Yes, the X 2 value and conclusion are the same,
and the P-value is given more accurately, as 0.00000013. 14.17 (a) The components of the chi-square statistic are shown in the table below.
I I, Goal Female Male
I 14.16 Answers will vary. The bar graphs below illustrate the differences in the three HSC-HM 3.211 3.211
I I
distributions. The biggest differences appear for the responses of Excellent and Good. Blacks HSC-LM 2.420 2.420
are less likely to rate the schools as excellent and Hispanics are more likely to give the schools LSC-HM 4.923 4.923
I
' ' the highest rating. Whites are most likely to give the schools a good rating, while Blacks and LSC-LM 1.895 1.895
are most to ve the schools a fair
The sum ofthese 8 values isX 2 =24.898with df= (4-1)x(2-1) = 3. (b) From TableD, P-value
I i
< 0.0005. A P-value of this size indicates that it is extremely unlikely that such a result occurred
due to chance; it represents very strong evidence against H 0 (c) The terms corresponding to
LSC-HM and HSC-HM (for both sexes) provide the largest contributions toX2 This reflects the
fact that males are more likely to have "winning" (social comparison) as a goal, while females
i 1
are more concerned with "mastery." (d) The terms and results are identical. The P-value of
0.000 in the MINITAB output reflects the fact that the true P-value in part (b) was actually
considerably smaller than 0.0005.

14.18 (a) We want to test H 0 : p 1 =p 2 versus H 1 : p1 -:~= p 2 , where p 1 denotes the proportion of
The null hypothesis is that the distributions of responses to this question will be the same for
each group, and the alternative hypothesis is that the distributions are not the same. The Minitab patients who. improved with gastric freezing and p 2 denotes the proportion of patients who
2 improved with the placebo. The actual counts of successes and failures are all greater than 5, so
output below contains the counts, expected counts, contribution to X , the value of the test
statistic X 2 = 22.426, df= (5-1)x(3-1) = 8, andP-value = o.oo4. Since 0.004 < 0.01, we have = p
the z test is safe. The sample proportions are p1 =28 I 82 0.3415 , 2 =30 I 78 0.3846, and =
strong evidence to reject the null hypothesis and conclude that these three groups have different
opinions about the performance of high schools in their state.
"
Pc = 28+30 =. O.3625 . The test stattsttc
. . .
lS z =
0.3415-0.3846
=-0.57 with a
Expected counts are printed below observed counts 82+ 78 ( 1 1)
0.3625(1-0.3625) 82 + 78
Chi-Square contributions are printed below expected counts
Black Hispanic White P-value = 0.5686 (software gives 0.57). (b) See the Minitab output below. The expected cell
parents parents parents Total
1 12 34 22 68 counts are all greater than 5, so the X 2 test is safe. The test statistic is X 2 = 0.322, which
22.70
5.047
22.70
5.620
22.59
0.015
equals z 2 =(-0.57)
2
=0.3249 (up to rounding; it is even closer if we carry out more decimals in
the computation of z). With df= 1, TableD tells us that the P-value > 0.25; Minitab reports P =
2 69 55 81 205 0.570. (c) Gastric freezing is not significantly more (or less) effective than a placebo treatment.
68.45 68.45 68.11 Freezing Placebo All
0.004 2.642 2.441
Improved 28 30 58
3 75 61 60 196 29.73 28.28 58.00
65.44 65.44 65.12 0.10011 0.10524
1. 396 0.301 0.402
No 54 48 . 102
52.28 49.73 102.00
4 24 24 24 72 0.05692 0.05984
24.04 24.04 23.92
0.000 0.000 0.000 All 82 78 160
82.00 78.00 160.00
5 22 28 14 64
21.37 21.37 21.26 Cell Contents: Count
0.019 2.058 2.481 Expected count
Contribution to Chi-square
Total 202 202 201 605 ~earson Chi-Square = 0.322, DF = 1, P-Value = 0.570
Chi-Sq = 22.426, DF = 8, P-Value = 0.004
296 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 297

14.19 (a) The comgonents of the chi-square statistic are shown in the table below. 14.20 (a) We want to test H 0 : PI= p 2 versus HI: pi*: p 2 , where pi denotes the proportion of
Treatment Strokes No Strokes
patients who died while taking aspirin and p 2 denotes the proportion of patients who died while
'I, Placebo 9.487 1.353
Aspirin 0.000 0.000 taking both aspirin and dipyridamole. The sample proportions are
Digyridamole 0.101 0.014 PI =206/1649='=0.1249,p2 =157/1650='=0.0952, and p = 206 + 157 ='=0.11. The test
Both 11.629 1.658 c 1649+ 1650
The sum of these 8 values isX 2 = 24.243 with df= 3 and P-value < 0.0005. Since 0.0005 < 0.01, . . IS
statistic . z= 0.1249-0.0952 = 2 .73 wit
. h a p -va1ue = 0 .0064 (so ftware gives
.
we reject the null hypothesis and conclude that the distributions were different for the different 1 1
treatments. The largest contributions to the X 2 statistic come from the Stroke and Placebo 0.11(1-0.11)(-- + -- )
i I 1649 1650
treatments. Patients taking a placebo had many more strokes than expected, while those taking
0.006). Since 0.0064 < 0.01, we have strong evidence that there is a significant difference in the
both drugs had fewer strokes. The combination of both drugs is effective at decreasing the risk
proportion of strokes for these two treatment groups. (b) We want to test H 0 :PI = p 2 versus
ofstroke. (~)_A two-wl!Y_ table of counts is shown below
Treatment Strokes No Strokes HI : pi *: p 2 , where pi denotes the proportion of patients who suffered from strokes while taking
Placebo 202 1649-202=1447 aspirin and p 2 denotes the proportion of patients who suffered from strokes while taking both
Aspirin 182 1649-182=1467 aspirin and dipyridamole. The actual counts of successes and failures are all greater than 5, so
Dipyridamole 188
Both 185
1654-188=1466
1650-185=1465
= =
the z test is safe. The sample proportions are PI= 18211649 0.1104 ,p2 = 185/1650 0.1121,
182 185
Bar charts comparing the four distributions are shown below. The distributions appear to be very and" = + ='=0.1112. Theteststatisticis
similar. Pc 1649+ 1650 I I
~~~ z=
0 1104 0 1121
- =-0.16 withaP-value=0.8728 (softwaregives
1 1
0.1112(1-0.1112)(-- + -- )
1649 1650
0.873). Since 0.8728 > 0.05, we do not have evidence to refute the null hypothesis that the death
rates are the same for the two treatment groups. (c) No, a chi-square test is not needed because
we are comparing two different response variables for two groups.

14.21 (a) r = 2 and c = 3. (b) The three proportions are 11/20 = 0.55or 55.0%,
=
68/91 0.7473 or 74.73%, and 3/8 = 0.375 or 37.5%. Some (but not too much) time spent in
extracurricular activities seems to be beneficial. (c) A bar graph is shown below.
The Minitab output below shows the counts, expected counts, contributions to X 2 = 1.418 with df
= 3, and P-value = 0. 701. No drug treatment had a significant impact on death rate.
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
deaths NoDeaths Total
1 202 1447 1649
189.08 1459.92
0.883 0.114

2 182 1467 1649


189.08 1459.92
0.265 0.034

3 188 1466 1654


189.65 1464.35 (d) H 0 : There is no association between amount of time spent on extracurricular activities and
0.014 0.002
grades earned in the course versus Ha : There is an association. (e) The expected counts are
4 185 1465 1650 shown in the table below; each entry is the row total times the column total divided by 119.
189.19 1460.81
0.093 0.012
Total 757 5845 6602
Chi-Sq = 1.418, DF = 3, P-Value = 0.701
298 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 299

Extracurricular Activities amount of time in extracurricular activities and also work hard on their classes; one does not
(hours per week) necessarily cause the other.
Grade <2 2 to 12 >12
Cor better 13.78 62.71 5.51 14.24 (a) H 0 : There is no association between smoking habits of parents and their high school
DorF 6.22 28.29 2.49 students. Ha :There is an association between smoking habits of parents and their high school
(f) Students who participated in almost no extracurricular activities (<2 hours) or lots of activities students. The expected counts given in Exercise 14.22 are all greater than 5, so the condition for
I (> 12 hours) passed less than expected and earned aD or F more than expected if these variables inference is satisfied. The test statistic is
are not associated. Students who tried to maintain balance (and participated in 2 to 12 hours of
X =13.7086+3.1488+0.0118+ 0.0027 + 16.8288+ 3.8655 = 37.566with df= (3-1)x(2-1) = 2.
2
activities) passed more than expected and earned aD or F less than expected if these variable are
I :
not associated. The P-value is less than 0.0005, so we reject H 0 and conclude that there is very strong evidence
of association between the smoking habits of parents and their high school children. (b) The
14.22 (a) r = 3and c = 2. (b) The proportions are 400/1780 = 0.2247 or 22.47%, highest contributions come from row 1 and column 1 ("both parents smoke, student smokes")
416/2239 = 0.1858 or 18.58%, and 188/1356 = 0.1386or 13.86%. A student's likelihood of and row 3 column 1 ("neither parent smokes, student smokes"). When both parents smoke, their
smoking increases when one parent smokes, and increases even more when both smoke. (c) A student is much more likely to smoke, and when neither parent smokes, their student is unlikely
i I
bar is shown below. to smoke. (c) No-this study demonstrates association, not causation. There may be other
factors (heredity or environment, for example) that cause both students and parent(s) to smoke.

14.25 H 0 :all proportions are equal versus Ha :at least one proportion is different. All of the
expected counts are greater than 5, so we may proceed with a ;r2. analysis. The test statistic is
=
X 2 = 4.3604 + 1.4277 + 0.0360 + 0.0118 + 3.6036 + 1.1799 10.619 with df= 2 and P-value = 0.0049.
Since 0.0049 < 0.01, we reject H 0 and conclude that the proportion of people who will admit
using cocaine depends on the method of contact.

14.26 (a) We want to test H 0 : Pn =Pa versus H 1 : Pn Pa, where Pn denotes the proportion
(d) The null hypothesis says that the smoking habits of parents and their students is independent of people with a bachelor's degree who favor the death penalty and Pa denotes the proportion of
(or not associated). (e) The expected counts are shown in the table below. people with a graduate degree who favor the death penalty. The actual counts of successes and
Parents Student smokes Student does not smoke failures are all greater than 5, so the z test is safe. The sample proportions are
Both smoke
One smokes
332
418
1448
1821
Pn =135/206= 0.6553,p2 = 641114 = 0.5614, and Pc = 135 + 64 = 0.6219. The test statistic
206+114
Neither smokes 253 1103 is z= - - - ; = =0.6553-0.5614 . h a P-va1ue = 0 .09 7. Smce
= = = = = = = = = =-1
- . 66 wtt . 0.097 > 0.05, we
(f) The observed number of student smokers is much higher than expected when both parents
1 1
smoke, and the observed number of student smokers is much lower than expected when neither 0.6219(1-0.6219)(-- + -)
parent smokes. This is another way to look at the relationship between the smoking habits of 206 114
parents and students. Looking at observed and expected counts we come to the same conclusion have no evidence to refute the hypothesis that the proportions of people who favor the death
!
that we did when comparing proportions: Children of non-smokers are less likely to smoke. penalty are the same for these two educational levels. (b) See the Minitab output below. The
chi-square statistic is X 2 = 2.754, which agrees (up to rounding) with z 2 = 1.662 = 2. 756. For
14.23 (a) Missing entries in table of expected counts are 62.71 and 5.51 in the first row and 6.22 df= 1, TableD tells us that 0.05 < P-value < 0.1 0, while software gives P-value = 0.097, which
in the second row. Missing entries in components of X 2 are 0.447 and 0.991. (b) The degrees of agrees with the result from part (a).
freedom are df= (2-1)x(3-1) = 2, and according to TableD, 0.025 < P-value < 0.05. Software
gives P-value = 0.0313. Since 0.0313 < 0.05, we have significant evidence that there is a
relationship between hours spent in extracurricular activities and performance in the course. (c)
The largest contribution comes from row 2, column 3 ("D or Fin the course, > 12 hours of
extracurricular activities"). Too much time spent on these activities seems to hurt academic
performance because the observed count is higher than expected. (d) No-this study
demonstrates association, not causation. Certain types of students may tend to spend a moderate

~
I l'l
~ II ~
. I
II 300 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 301

Contribution to Chi-square
Rows: Degree Columns: Death Penalty Pearson Chi-Square= 4.840, DF = 2, P-Value = 0.089
Favor Oppose All
Bachelor 135 71 206
I 'I
128.1 77.9 206.0 14.28 (a) A bar graph is shown below. The proportions in favor of regulating guns (in order
0.3710 0.6101 * = =
from least to most education) are 58/116 = 0.5, 84/213 0.3944, 169 I 463 0.3650,
Graduate 64 50 114
= =
98/233 0.4206, and 77/176 0.4375. Those who did not complete high school and those
70.9 43.1 114.0 with a to be more likely to favor a ban.
0.6704 1.1025 *
All 199 121 320
I I 199.0 121.0 320.0
Cell Contents: Count
Expected count
Contribution to Chi-square
Pearson Chi-Square = 2.754, DF = 1, P-Value = 0.097

14.27 (a) A two-way table of counts is shown below


Cardiac Event
Group Yes No Total
Stress management 3 30 33 (b) The Minitab output below shows the counts, expected counts, and contributions to X The
2

Exercise 7 27 34
test statistic is X 2 =8.525, with df= 4, and P-value = 0.074. Since 0.074 > 0.05, we cannot
Usual care 12 28 40
reject H0 We do not have evidence to refute the hypothesis that the proportion of the adult
Total 22 85 107
= =
(b) The success rates are 30/33 0.9091 or 90.91%, 27/34 0.7941 or 79.41%, and population who favor a ban on handguns stays the same for different levels of education.
Expected counts are printed below observed counts
28/40 =0 7000 or 70o/to. (c) The expected cell co unts are Chi-Square contributions are printed below expected counts
Cardiac Event Yes No Total
1 58 58 116
Group Yes No 46.94 69.06
Stress management 6.79 26.21 2. 605 1. 771
Exercise 6.99 27.01
2 84 129 213
Usual care 8.22 31.78 86.19 126.81
All expected cell counts exceed 5, so the condition for the chi-square test is satisfied. (d) See the 0.056 0.038
Minitab output below for the counts, expected counts, and components ofX 2 The test statistic 3 169 294 463
is X 2 = 4.84 with df = 2 and P-value = 0.0889. Although the success rate for the stress 187.36 275.64
1. 799 1.223
management group is slightly higher than for the other two groups, this difference could be due
to chance. We cannot reject the null hypothesis of no association between a cardiac event and 4 98 135 233
the type of treatment. 94.29 138.71
Rows: Group Columns: Cardiac 0.146 0.099
No Yes All
Exercise 27 7 34 5 77 99 176
27.01 6.99 34.00 71.22 104.78
0.00000 0.00001 0.469 0.319
Stress 30 3 33
26.21 6.79 33.00 Total 486 715 1201
0.54650 2.11149
Usual 28 12 40 Chi-Sq = 8.525, DF = 4, P-Value = 0.074
31.78 8.22 40.00
0.44864 1. 73339
85 22 107
14.29 We want to testH0 :There is no association between where young adults live and gender
All
85.00 22.00 107.00 versus Ha: There is an association between where young adults live and gender. All expected
Cell Contents: Count counts are greater than 5, so the condition for inference is satisfied. The counts, expected counts,
Expected count and components of X 2 are shown in the Minitab output below. All of the expected counts are
302 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 303

much greater than 5, so the condition for inference is satisfied. The test statistic is X 2 = 11.03 8
with df= 3 and P-value = 0.012. Note that the chi-square components for "parents' home"
account for 6.456 ofthe total X 2 Since 0.012 < 0.05, the choices of living places are
Expected counts are printed below observed counts
significantly different for males and females. More specifically, women are less likely to live Chi-Square contributions are printed below expected counts
with their parents and more likely to have a place on their own.
Low Medium High Total
I, 1 398 397 430
I Expected counts are printed below observed counts 1225
404.39 404.19 416.42
Chi-Square contributions are printed below expected counts
0.101 0.128 0.443
Female Male Total
1 923 986 1909
i 1' 2 250 241 237 728
978.49 930.51
I 240.32 240.20 247.47
3.147 3.309
I 0.390 0.003 0.443
2 144 132 276
3 1368 1377 1409 4154
141.47 134.53
1371.29 1370.61 1412.10
0.045 0.048
0.008 0.030 0.007
3 1294 1129 2423
Total 2016 2015 2076 6107
1241.95 1181.05
2.181 2.294
Chi-Sq = 1.552, DF = 4, P-Value = 0.817
4 127 119 246
126.09 119. 91 14.32 To describe the differences, we compare the percents of American and of East Asian
0.007 0.007
students who cite each reason. Then we test H 0 : There is no difference in the distributions for
Total 2488 2366 4854 American and East Asian students versus H 0 : There is a difference in the distributions for
Chi-Sq = 11.038, DF = 3, P-Value = 0.012
American and East Asian students. We compute the percentages of each group of stUdents who
14.30 (a) The population of interest will probably be specified as all high school students at gave each response by taking each count divided by its column total; for example,
your school. Some students may say all high school students, but you certainly don't have an 291115 = 0.2522 or 25.22%. The percentages, rounded to one decimal place, are shown in the
SRS from that population. (b) This student is correct, the sample is not an SRS, but we can use table below
inference to see if the observed difference in this sample is due to chance. (c) You are taking Reason American East Asian
one sample and classifYing the students according to two categorical variables. Thus, this is a Save time 25.2% 14.5%
chi-square test of independence. (d) Answers for the chi-square test will vary. Easy 24.3% 15.9%
Low price 14.8% 49.3%
14.31 (a) This is not an experiment because no treatment was assigned to the subjects. (b) A Live far from stores 9.6% 5.8%
high nonresponse rate might mean that our attempt to get a random sample was thwarted because No pressure to buy 8.7% 4.3%
of those who did not participate. This nonresponse rate is extraordinarily low. (c) We want to Other reason 17.4% 10.1%
testH0 :There is no association between olive oil consumption and cancer versus Ha: There is an Mimtab output for the chi-square test is shown below. One expected cell count is less than 5, but
association between olive oil consumption and cancer. See the Minitab output below for the this is within our guidelines for using the chi-square test. Note that the chi-square components
counts, expected counts, and components of X 2 All expected counts are much more than 5, so for low price account for 18.511 ofthe total chi-square 25.737. With df= 5, TableD tells us that
the chi-square test is safe. The chi-square statistic is X 2 = 1.552 with df= 4 and P-value = P < 0.0005. There is very strong evidence that East Asian and American students buy from
catalogs for different reasons; specifically, East Asian students place much more emphasis on
0.8174. High olive oil consumption is not more common among those without cancer; in fact,
"low price" and less emphasis on "easy" and "save time."
when looking at the conditional distributions of olive oil consumption, all percents are between
32.4% and 35.1%-that is, within each group (colon cancer, rectal cancer, control) roughly one-
third fall in each olive oil consumption category.
304 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 305

Expected counts are printed below observed counts


Chi-Square contributions are printed below expected counts
CASE CLOSED!
American East Asian Total (1) We want to test H 0 : the distributions of the two treatment groups are the same versus Ha:
1 29 10 39
24.38 14.63 the distributions of the two treatment groups are different. Women were recruited for the study,
0.878 1. 463 so we must assume that these women are representative of all women with ages from 21 to 4 3
2 28 11 39
and this cause of infertility. The women were randomly assigned to the two treatment groups so
24.38 14.63 we can assume that we have two samples, one from the population of women who would
0.539 0.899 undergo acupuncture and another to serve as a control. We will conduct a test for homogeneity
3 17 34 51 of populations. The Minitab output below shows the counts, expected counts, and components
31.88 19.13 of X 2 All of the expected counts are greater than 5, so this condition for inference is satisfied.
6.942 11.569
The test statistic is X 2 = 4.682 with df= 1 and P-value = 0.030. Since 0.03 < 0.05, we have
4 11 4 15 evidence to reject the null hypothesis and conclude that the pregnancy rates are diffe~ent for the
9.38 5.63
0.282 o. 469 two groups of women. In short, acupuncture appears to improve a woman's chance of getting
pregnant with this fertilization technique.
5 10 3 13 Rows: Pregnant Columns: Group
8.13 4.88
0. 433 o. 721 Acupuncture Control All

6 20 7 27 No 46 59 105
16.88 10.13 52.50 52.50 105.00
0.579 0.965 0.8048 0.8048

7 115 69 184 Yes 34 21 55


115.00 69.00 27.50 27.50 55.00
0.000 0.000 1.5364 1. 5364

Total 230 138 368 All 80 80 160


80.00 80.00 160.00
Chi-Sq = 25.737, DF = 6, P-Value = 0.000
Cell Contents: Count
Expected count
14.3 3 (a) We want to test H 0 : p 1 = p2 versus Ha : p 1 =1= p 2 , where p 1 is the proportion of women Contribution to Chi-square
Pearson Chi-Square= 4.682, DF = 1, P-Value = 0.030
customers in city 1 and p 2 is the proportion of women customers in city 2. The sample
. ~ 203 . 0 4 ~ 150 d " 203 + 150 . (2) We want to test H 0 : PA = Pc versus Ha : p A :f. Pc, where p A is the proportion of women
proporttonsare p 1 =-= .8 23, p 2 =-=0.6881,an p = =0.7691. Thetest
241 218 c 241 + 218 undergoing IVF or ICS who would become pregnant with acupuncture and Pc is the proportion
. . .
statistic IS z =
0.8423-0.6881
1 1
= .h
3.92 Wit a P-value = 0.00009. We have of women undergoing IVF or ICS who would become pregnant lying still. The sample
0.7691(1-0.7691)(--+--)
241 218
.
proportions are p~ A =- = ~ =-
34 0.425 ' Pc =
21 0.2625' and"
Pcombined =
34+21.= 03438
. . The test
M M W+W
extremely strong evidence that the proportion of women customers in these two cities is 0 25 0 2625
different. (b) The chi-square test statistic isX 2 =15.334, which agrees with (up to rounding)
statistic is z = .4 - =2.16 with a P-value = 0.0308. Notice that
z2 =3.92 2 ='=15.3664. With df= 1, TableD tells us thatP-value < 0.0005; a statistical calculator 0.3438(1- 0.3438)(_!_+ _!_)
80 80
gives P = 0.00009. (c) A 95% confidence interval for p 1 - p 2 is
=
z 2 = 2.16 2 4.6656, which agrees with (up to rounding) X 2 = 4.682, and the P-values are also
( 0.8423-0.6881) 1.96 8423
241
0 8423 0 6881 1 0 6881
(1- )+ ( -
218
) =c (0.0774, 0.2311 ). Notice the same (except for the rounding differences). Since 0.03 < 0.05, we make exactly the same
conclusion we made in part (a).
that 0 is not in the 95% confidence interval for the difference in the two proportions.
(3) The physiological effects of acupuncture on the reproductive system were not being studied
14.34 No, with df= 4 and P-value = 0.4121, we do not have evidence to reject the hypothesis in this experiment. The researchers wanted to see if adding acupuncture to a fertilization method
that the income distributions are different for customers at the two stores.
306 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 307

would improve the pregnancy rates of women who choose this technique with a particular cause < 0.0005. We have very strong evidence that there is an association between the amount of
of infertility.
alcohol and the amount of nicotine consumed during pregnancy. The primary deviation from
independence (based on a comparison of expected and actual counts) is that nondrinkers are
14.35 The observed counts, marginal percents, and expected counts are shown in the table
more likely to be nonsmokers than we might expect, while those drinking 0.11 to 0.99 oz/day are
below. The expected counts are obtained by multiply the national proportions (percent/100) by less likely to be nonsmokers than we might expect. The visual displays provided will vary, but
535.
I. they should illustrate the conditional distributions provided in the output below. One possible
. I Score 5 4 3 2 1 graph is provided below the Minitab output.
Observed Count 167 158 101 79 30
Percent 31.22 29.53 18.88 14.77 5.61 Rows: Alcohol Columns: Nicotine
Expected Count 81.855 117.7 132.68 105.93 96.835 1-15 16 or more None All
The bar graphs below show the two distributions, one for the nattonal percents and another for 0.01-0.10 5 13 58 76
the sample percents. Note that students may decide to use proportions instead of percents, but 6.58 17.11 76.32 100.00
7.69 15.66 19.08 16.81
the overall shapes will be the same. The national distribution has a peak at 3 and is roughly 10.93 13.96 51.12 76.00
symmetric. The sample is skewed to the right according to the graph below, but notice that the 3.2167 0.0655 0.9274
I' scores are listed from highest to lowest so student may list the scores from lowest to highest and
0.11-0.99 37 42 84 163
then correctly say that the sample is skewed to the left. Some students may avoid this issue 22.70 25.77 51.53 100.00
altogether by saying that the sample distribution is skewed towards the smaller scores, with a 56.92 50.60 27.63 36.06
at the 23.44 29.93 109.63 163.00
7.8440 4.8661 5.9913

1.00 or more 16 17 57 90
17.78 18.89 63.33 100.00
24.62 20.48 18.75 19.91
12.94 16.53 60.53 90.00
0. 7223 0.0136 0.2060

None 7 11 105 123


5.69 8.94 85.37 100.00
10.77 13.25 34.54 27.21
17.69 22.59 82.73 123.00
6.4583 5.9435 5.9975

All 65 83 304 45.2


14.38 18.36 67.26 100.00
We want to test H 0 :The distribution of scores in this sample is the same as the distribution of 100.00 100.00 100.00 100.00
65.00 83.00 304.00 452.00
scores for all students who took this inaugural exam versus Ha :The distribution of scores in this Cell Contents: Count
sample is different from the national results. All expected counts are greater than 5, so the % of Row
%.of Column
condition for the goodness of fit test is satisfied. The test statistic Expected count
isX 2 :::7::88.5672 +13.7986+7.5642+6.8463+46.1292:::7::162.9 with df= 4 and P-value < 0.0005. We Contribution to Chi-square
have very strong evidence that the distribution of AP Statistics exam scores in the sample is
Pearson Chi-Square = 42.252, DF = 6, P-Value = 0.000
different from the national distribution.

14.36 The Minitab output below shows the counts, conditional distributions for the rows
(amount of alcohol), conditional distributions for the columns (amount of nicotine), the expected
counts, and the components of X 2 Since the expected counts are all greater than 5, the
condition for the chi-square test of association is satisfied. We want to test H 0 : There is no
association between the amount of alcohol and the amount of nicotine consumed during
pregnancy versus Ha : There is an association between the amount of alcohol and the amount of
nicotine consumed during pregnancy. The test statistic is X 2 = 42.252 with df= 6 and P-value
308 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 309

@~--------------------~.-----------~ 14.39 (a) A two-way table of counts is shown below


liJ Nollloobol Temperature Hatched Not
E 0.01-0.HI 'OZ/day Cold 16 11
(Ul-0.99 oz;/day Neutral 38 18
, I
Hot 75 29
1.00+ oz/day
I' (b) The percents are 59.3% for cold water, 67.9% for neutral water, and 72.1% for hot water. The
I
percent hatching increases with temperature. The cold water did not prevent hatching, but made
it less likely. (c) We want to test H 0 : Pc = pN = p 8 versus Ha: at least one P; is different. The
0
None 1-15 mg/day 16+ mg/day Minitab output below shows the counts, expected counts, and components of X 2 All expected
Nicotine consumption counts are greater than 5, so it is safe to use the chi-square test ofhomogeneity. The test statistic
is X 2 =1.703 with df= 2 and aP-value = 0.427. Since 0.427 > 0.05, the differences are not
significant and could be due to chance.
14.37 We want to testH0 :The survey results match the college population versus Ha: The
survey does not match the college population. See the table below for observed counts, expected Expected counts are printed below observed counts
2 Chi-Square contributions are printed below expected counts
counts, and components of X All expected counts are greater than 5, so the condition for the
goodness of fit test is satisfied. The test statistic is X 2 = 5.016, with df= 3 and P-value = Hatched Not Total
0.1706. We have little reason to doubt our survey responses match the college population. Cold 16 11 27
18.63 8.37
0.370 0.823
Observed Expected (observed- expectedi
Neutral 38 18 56
expected 38.63 17.37
0.010 0.023
54 59.74 0.5515
66 55.62 1.9372 Hot 75 29 104
56 51.5 0.3932 71.74 32.26
0.148 . 0.329
30 39.14 2.1344
206 5.0163 Total 129 58 187

Chi-Sq = 1. 703, DF = 2, P-Value = 0.427

14.38 We want to test H 0 : p 1 = p 2 = p 3 = p 4 = 114 versus Ha :at least one of the proportions is 14.40 The sample percents of cocaine addicts who did not have a relapse are 14/24 or 58.33%
different from 114. The table below shows the counts, expected counts, and components of with desipramine, 6/24 or 25% with lithium, and 4/24 or 16.67% with a placebo. A bar graph
2
X for the sample data provided. The expected counts are all equal to 200x0.25 = 50, which is these is shown below.
greater than 5 so it is safe to use the goodness of fit test. The test statistic is X 2 =3.6 with df=
3 and P-value > 0.25, according to TableD (software gives a P-value = 0.3080). Since 0.308 >
0.05, we have no evidence to refute the hypothesis that the spinner is equally likely to land in any
one of the four sections.
Outcome Counts Expected (observed- expected) 2
expected
1 51 50 0.02
2 39 50 2.42
3 53 50 0.18
4 57 50 0.98
200 200 3.6
We want to test H 0 : Pv = pL = Pptaceho versus Ha: at least one of the proportions is different. The
Minitab output below shows the counts, expected counts, and components of X 2 All expected
counts are greater than 5, so it is safe to use the chi-square test ofhomogeneity. The test statistic
is X 2 = 10.5 with df= 2 and a P-value = 0.005. Since 0.005 < 0.01, we have very strong
310 Chapter 14 Inference for Distributions of Categorical Variables: Chi-Square Procedures 311

evidence that the probability of successfully breaking a cocaine addiction is different for the is X 2 =2.669 with df= 1, P-value = 0.102, so we cannot conclude that students and non-
three different treatments. More specifically, desipramine appears to be the best. students differ in the response to this question.
Expected counts are printed below observed counts Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts Chi-Square contributions are printed below expected counts

,I
I
Yes No Total Students Nonstudents Total
Desipramine 10 14 24 Agree 22 30 52
I I 16.00 8.00 26.43 25.57
2.250 4.500 0.744 0.769

Lithium 18 6 24 Disagree 39 29 68
16.00 8.00 34.57 33.43
0.250 0.500 0.569 0.588

Placebo 20 4 24 Total 61 59 120


16.00 8.00
1.000 2.000 Chi-Sq = 2.669, OF= 1, P-Value = 0.102
(b) We want to test H 0 : p 1 = p 2 versus Ha : p 1 =F- p 2 , where p 1 is the proportion of students who
Total 48 24 72
agreed and p 2 is the proportion of non-students who agreed. The sample proportions are
Chi-Sq = 10.500, OF = 2, P-Value = 0.005
~
p1 =
22 =0.3607 , p~ = 30 =0.5085, and Pc
6t ~ = 22+30
+ =. 0.4333 . The test statistic
. . .IS
2
14.41 (a) No, this is not an experiment because a treatment was not imposed. (b) Among those 59 61 59
who did not own a pet, 28/39 or 71.8% survived, while 50/53 or 94.3% of pet owners survived.
Overall, 84.8% of the patients survived. It appears that you are more likely to survive CHD if
z = 0.3607-0.5085 = . h a P-vaIue = o.102 .
-1. 63 wit up to round'mg, z 2 = x2
you own a pet! (c) We want to testH0 : There is no association between patient status and pet '
0.4333(1-0.4333) ( 61
1 +59
1)
ownership versus Ha :There is an association between patient status and pet ownership. The
and the P-values are the same. (c) The statistical tests in part (a) and (b) assume that we have
Minitab output below shows the counts, expected counts, and components of X 2 The.expected two SRSs, which we clearly do not have here. Furthermore, the two groups differed in geography
counts are all greater than 5, so it is safe to use the chi-square test for independence. The test (northeast/West Coast) in addition to student/non-student classification. These issues mean we
statistic is X 2 =8.851 with df= 1 and P-value = 0.003. (d) Since 0.003 < 0.01, we have very should not place too much confidence in the conclusions of our significance test--or, at least, we
strong evidence that there is an association between pet ownership and survival with CHD. (e) should not generalize our conclusions too far beyond the populations "upper level northeastern
We used a X2 test. In a z test, we would test H 0 : p 1 = p 2 vs. Ha : p1 < p 2 For this test, z = college students taking a course in Internet marketing" and "West Coast residents willing to
-2.975 with P-value = 0.0015. The P-value is half that obtained in (c). The z test enables us to participate in commercial focus groups."
use a one-tailed test. If we are interested in deciding if pet ownership made a difference to
survival rate (a two-tailed test) and not just improved survival rate (a one-tailed test), then it 14.43 (a) The best numerical summary would note that we view target audience ("magazine
wouldn't matter which test we used. readership") as explanatory, so we should compute the conditional distribution of model dress
Expected counts are printed below observed counts for each audience. This table and graph are shown below.
Chi-Square contributions are printed below expected counts
Ma azine readershi
No Yes Total Model dress Women Men General
Alive 28 50 78
33.07 44.93
Not sexual 60.94% 83.04% 78.98%
0.776 0.571 Sexual 39.06% 16.96% 21.02%
Dead 11 3 14
5.93 8.07
4.323 3.181

Total 39 53 92

Chi-Sq = 8.851, OF= 1, P-Value = 0.003

(b) The Minitab output containing the counts, expected counts, and components of X 2 are shown
14.42 (a) Subtract the "agreed" counts from the sample sizes to get the "disagreed" counts. The below. The expected counts are all greater than 5. The test statistic is X 2 = 80.9 with df= 2 and
table is in the Minitab output below. The expected counts are all greater than 5. The test statistic
312 Chapter 14

P-value < 0.0005. Since the P-value is very small, we have very strong evidence that target
audience affects model dress.
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
Gen.
Women Men Interest Total
Not sexual 351 514 248 1113
424.84 456.56 231.60
12.835 7.227 1.162

Sexual 225 105 66 396


151.16 162.44 82.40
36.074 20.312 3.265

Total 576 619 314 1509

Chi-Sq = 80.874, DF = 2, P-Value = 0.000


(c) The sample is not an SRS: A set of magazines were chosen, and then all ads in three issues of
those magazines were examined. It is not clear how this sampling approach might invalidate our
conclusions, but it does make them suspect.

14.44 (a) First we must find the counts in each cell of the two-way table. For example, there
=
were about 0.172 x 5619 966 Division I athletes who admitted to wagering. These counts are
shown in the Minitab output below, where we see that X 2 = 76.675 with df= 2 and P < 0.0001.
There is very strong evidence that the percentage of athletes who admit to wagering differs by
division. (b) Even with much smaller numbers of students (say, 1000 from each
division), the P-value is still very small. Presumably the estimated numbers are reliable enough
that we would not expect the true counts to be less than 1000, so we need not be concerned
about the fact that we had to estimate the sample sizes. (c) If the reported proportions are
wrong, then our conclusions may be suspect-especially if it is the case that athletes in some
division were more likely to say they had not wagered when they had. (d) It is difficult
to predict exactly how this might affect the results: Lack of independence could cause the
estimated percents to be too large, or too small, if our sample included several athletes from
teams which have (or do not have) a "gambling culture."

Expected counts are printed below observed counts


Chi-Square contributions are printed below expected counts

I II III Total
Yes 966 621 998 2585
1146.87 603.54 834.59
28.525 0.505 31.996

No 4653 2336 3091 10080


4472.13 2353.46 3254.41
7.315 0.130 8.205

Total 5619 2957 4089 12665

Chi-Sq = 76.675, DF = 2, P-Value = 0.000


T
I
312 Chapter 14 I 313
I 1 I
P-value < 0.0005. Since the P-value is very small, we have very strong evidence that target Chapter 15
audience affects model dress. I .I
Expected counts are printed below observed counts
I 15.1 The correlation is r = 0.994, and the least-squares linear regression equation is
Chi-Square contributions are printed below expected counts
Gen. j) = -3.66 +!.1969x, where y =humerus length and x= femur length. The scatterplot with the
Women Men Interest Total I regression line below shows a strong, positive, linear relationship. Yes, femur length is a very
Not sexual 351 514 248 1113
424.84 456.56 231.60 I of humerus
12.835 7.227 1.162
I
Sexual 225 105 66 396
151. 16
36.074
162 . 4 4
20.312
82.40
3.265
I

Total 576 619 314 1509
I
Chi-Sq = 80.874, DF = 2, P-Value = 0.000
I
(c) The sample is not an SRS: A set of magazines were chosen, and then all ads in three issues of I
those magazines were examined. It is not clear how this sampling approach might invalidate our
conclusions, but it does make them suspect. I
14.44 (a) First we must find the counts in each cell of the two-way table. For example, there
I
= 15.2 (a) The least-squares regression line is y = 11.547 + 0.84042x, where y =height (inches)
were about 0.172x5619 966Division I athletes who admitted to wagering. These counts are I
shown in the Minitab output below, where we see that X' = 76.675 with df= 2 and P < 0.0001. and x in arm span (inches). (b) Yes, the least-squares line is an appropriate model for the data
There is very strong evidence that the percentage of athletes who admit to wagering differs by
I because the residual plot shows an unstructured horizontal band of points centered at zero.
Since 76 inches is within the range of arm spans examined in Mr. Shenk's class, it is reasonable
division. (b) Even with much smaller numbers of students (say, 1000 from each I to predict the height of a student with a 76 inch arm span.
division), the P-value is still very small. Presumably the estimated numbers are reliable enough
that we would not expect the true counts to be less than 1000, so we need not be concerned . I
15.3 (a) The observations are independent because they come from 13 unrelated colonies. (b)
about the fact that we had to estimate the sample sizes. (c) If the reported proportions are I The scatterplot of the residuals against the percent returning (below on the left) shows no
wrong, then our conclusions may be suspect--{lspecially if it is the case that athletes in some
I systematic deviations from the linear pattern. (c) The may be slightly wider in the middle
division were more likely to say they had not wagered when they had. (d) It 'is difficult
but not markedly so. (d) The histogram (below on the shows no outliers or strong '
to predict exactly how this might affect the results: Lack of independence could cause the
estimated percents to be too large, or too small, if our sample included several athletes from I from N<>r1;P~[i1Jl[i1Jl~~
teams which have (or do not have) a "gambling culture." I I

Expected counts are printed below observed counts


Chi-Square contributions are printed~below expected counts
I

I II III Total
I
Yes 966 621 998 2585
1146.87 603.54 834.59 I
28.525 0.505 31.996
No 4653 2336 3091 10080
I
4472.13
7.315
2353.46
0.130
3254.41
8.205
I
Total 5619 2957 4089 12665
I 15.4 (a) The observations are independent because they come from 16 different individuals. (b)
Chi-Sq = 76.675, DF = 2, P-Va1ue = 0.000 I The scatterplot of the residuals against nonexercise activity (below on the left) shows no
systematic deviations from the linear pattern. One residual, about 1.6, is slightly larger than the
I others, but this is nothing to get overly concerned about. (c) The spread is slightly higher for
I larger values ofnonexercise acitvity, but not markedly so. (d) The histogram (below on the right)

j,
I'
li
1
I !.
I
314 Chapter 15 I Inference for Regression 315

I I
shows no outliers and a slight skewness to the right, but this does not suggest a lack of
I I
N~~~~ I
I
I

I

I
I
15.8 (a) The least-squares regression line is y = 0.12049 +0.008569x, where y =the proportion
I
of perch killed and x =the number of perch. The fact that the slope is positive tells us that as the
I 15.5 (a) The slope parameter f3 represents the change in the mean humerus length when femur I number of perch increases, the proportion being killed by bass also increases. (b) The regression
length increases by I em. (b) The estimate ofjJ is b =1.1969, and the estimate of a is standard error iss= 0.1886, which estimates the standard deviation a. (c) Who? The
I individuals are kelp perch. What? The response variable is the proportion of perch killed and
a= -3.66. (c) The residuals are -0.8226, -0.3668, 3.0425, -0.9420, and -0.9110, and their sum
the explanatory variable is the number of perch available (or in the pen); both variables a
is -0.0001. The standard deviation is estimated by s
I(resid') J'-1.79- -1.982.
_,_
I quantitative. Why? The researcher was interested in examining the relationship between
n-2 3 ' predators and available prey. When, where, how, and by whom? Todd Anderson published the

15.6 (a) The scatterplot (below on the left) shows a strong, positive linear relationship between
I data obtained from the ocean floor off the coast of southern California in 200 !.Graphs: The
scatterplot provided clearly shows that the proportion of perch killed increases as the number of
x =speed (feet/second) andy= steps (per second). The correlation is r = 0.999 and the least: I perch increases. Numerical Summaries The mean proportions of perch killed are 0.175, 0.283,
squares regression line is y = 1.76608+0.080284x. (b) The residuals (rounded to 4 decimal 0.425, and 0.646, in order from smallest to largest number of perch available. Mode/The least-
I squares regression model is provided in part (a). Interpretation The data clearly support the
places) are 0.0106,-0.0013,-0.0010,-0.0110,-0.0093, 0.0031, and 0.0088, and their sum i~
-0.0001 (essentially 0, except for rounding error). (c) The estimate of a is a= 1.76608, the I predator-prey principle provided. (Students will soon learn how to formally test this hypothesis.)
(d) Using df= 16- 2 = 14 and t' = 2.145, a 95% confidence interval for fJ is
estimate of f3 is b = 0.080284, and the estimate of a is s = t ~
0 041
"'0.009l. I 0.0085692.145x 0.002456 = (0.0033, 0.0138). We are 95% confident that the proportion of
perch killed increases on average between 0.0033 and 0.0138 for each addition perch added to
the pen.
I
I 15.9 The regression equation is ji=560.65 -3.017ix, where y=calories and x=time. The
scatterplot with regression line (below) shows that the longer a child remains at the table, the
I fewer calories he or she will consume. The conditions for inference are satisfied. Using df= 18
and t' = 2.101, a 95% confidence interval for fJ is -3.0771 2.101x 0.8498 = (-4.8625,
I -1.2917). With 95% confidence, we estimate that for every extra minute a child sits the table, he
I or she will consume an average of between 1.29 and 4.86 calories less during lunch.
I
15.7 (a) The scatterplot below shows a strong, positive linear relationship. (b) The slope f3 I
gives this rate. The estimate of fJ is listed as the coefficient of "year" in the output, b =
9.31868 tenths of a millimeter. (c) We are not able to make an inference for the tilt rate from a
I
simple linear regression model, because the observations are not independent. I ,!.
I
I
I
I
I
I
316 Chapter 15 I Inference for Regression 317
I
accidents should follow a Normal distribution. The conditions are satisfied except for having
I independent observations, so we will proceed with caution. (d) LinRegTTest reports that t =
I 21.079 with df= 8 and ?-value is 0.000. With the earlier caveat, there is very strong evidence to
reject H 0 and conclude that there is a significant positive association between number of
I accidents and number of jet skis in use. As the number of jet skis in use increases, the number of
I accidents significantly increases. (e) Using df= 8 and t' = 2.896, a 98% confidence interval for
fJ is 0.00482.896x 0.0002 = (0.0042, 0.0054). With 98% confidence, we estimate that for
I every extra thousand jet skis in use, the number of accidents increase by a mean of between 4.2
I
I
15.10 (a) Excel's 95% confidence interval for /]is (0.0033, 0.0138). This matches the I
confidence interval calculated in Exercise 15.8. We are 95% confident that the proportion of
perch killed increases on average between 0.0033 and 0.0138 for each addition perch added to I
the pen. (b) See Exercise 15.8 part (d) for a verification using the Minitab output. Using df= I
16- 2 = IA and t' =2.145 with the Excel output, a 95% confidence interval for fJ is
0.00862.145x 0.0025 = (0.0032, 0.0140). (c) Using df= 16- 2 = 14 and t' =1.761, a 90% I
confidence interval for fJ is 0.00861.761x0.0025 = (0.0042, 0.0130). I
15.11 (a) The least-squares regression line from the S-PLUS output is y =-3.6596+ l.l969x,
I
15.14 (a) We want to testH0 : fJ = 0 (there is no association between yearly consumption of wine
where y =humerus length and x= femur length. (b) The test statistic is I and deaths from heart disease) versusH,: fJ <0 (there is a negative association between yearly
969
t = _!>_ = 1.1
SE, 0.0751
"'15.9374. (c) The test statistic t has df= 5 - 2 =3. The largest value in Table.D I consumption of wine and deaths from heart disease). The data are obtained from different
I nations, so independence seems reasonable. The other conditions of constant variance, linear
is 12.92. Since 15.9374 > 12.92, we know that ?-value< 0.0005. (d) There is very strong
22 969
evidence that fJ > 0, that is, the line is useful for predicting the length of the humerus given the I relationship and Normality are also satisfied. The test statistic is t - -6.46with df=
3.557
length of the femur. (e) Using df= 3 and t' = 5.841, a 99% confidence interval for fJ is 17 and ?-value < 0.0005. Since the ?-value is smaller than any reasonable significance level,
1.19695.841x0.0751 = (0.7582,1.6356). We are 99% confident that for every extra centimeter say I%, we reject H 0 We have very strong evidence of a significant negative association
in femur length, the length of the humerus will increase on average between 0.7582 em and I between the consumption of wine and deaths from heart disease. (b) Using df= 17 and
1.6356 em.
I t' = 2.110, a 95% confidence interval for fJ is -22.969 2.110x3.557 = (-30.4743, -15.4637).
With 95% confidence, we estimate that the number of deaths from heart disease (per 100,000
15.12 (a) The value of r 2 = 0.998 or 99.8% is very close to one (or 100%), which indicates I people) decreases on average between 15.46 and 30.47 for each additional liter of wine
perfect linear association. (b) The slope parameter /]gives this rate. Using df= 5 and consumed (per person). I'

t' = 4.032, a 99% confidence interval for fJ


is 0.0802844.032x0.0016 = (0.0738, 0.0867). I
We are 99% confident that the rate at which steps per second increase as running speed increases I
by 1 ft/s is on average between 0.0738 and 0.0867.
I
15.13 (a) The scatterplot (below) with regression line shows a strong, positive linear association
between the number of jet skis in use (explanatory variable) and the number of accidents
I
(response variable). (b) We want to testH0 :(J =0 (there is no association between number of jet I
skis in use and number of accidents) versusH, :(J > 0 (there is a positive association between
number of jet skis in use and number of accidents). (c) The conditions are independence, the
I
mean number of accidents should have a linear relationship with the number of jet skis in use, I
the standard deviation should be the same for each number of jet skis in use, and the number of I
I
I I

j
I

I
I
!
I
......_
1
I
318 Chapter 15 I Inference for Regression 319

I
15.15 (a) The scatterplot below shows a moderately strong, positive linear association betweeny I Variable N Mean StDev Minimum Ql Median Q3 Maximum
Taken 19 15.684 2.865 9.200 13.800 17.100 17.700 18.300
=number of beetle larvae clusters and x =number of beaver-caused stumps. (b) The least-
I Percent 19 35.379 1. 425 34.100 34.400 34.600 36.200 38.400
squares regression line isy = -1.286+ I 1.894x. r 2 =83.9%, so regression on stump counts explains
83.9% of the variation in the number of beetle larvae. (c) We want to testH0 :fJ =0 versus I
H.: fJ 0. The conditions for inference are met, and the test statistic is t = 10.47with df= 21. I
The output shows P-value = 0.000, so we have very strong evidence that beaver stump counts
I
I
I
I
I
I (2) The least-squares regression line is y= 42.8477 -0.4762x with r' = 0.917 or 91.7%. The
linear model provides a reasonably good fit for these data. However, the residual plot shows a
I clear pattern with positive residuals for small and large numbers of 3-pointers taken and negative I
I residuals in between the two extremes.
,I
(3) The point is tagged as being influential because it may have a considerable impact on the
15.16 (a) The mean of the standardized residuals is 0.00174 and the standard deviation is 1.014. I regression line. Influential points often pull the regression line in their direction so the residuals
Since the residuals are standardized, we expect the mean and standard deviation to be close to ,0 tend to be small for influential points.
and 1, respectively. (b) A stemplot is shown below on the left. The distribution is slightly I (4) We want to testH0 : fJ = 0 versus H. : fJ 0. Independence is reasonable because the data are
I
skewed to the left, but this is not unusual for a small data set. There are no striking departures I from different seasons. The linear relationship condition is met, but the constant variance
from Normality. For a standard Normal distribution, we would expect 95% of the observations condition and the Normality are both questionable so we will proceed with caution. A histogram
to fall between -2.0 and 2.0. Thus, -1.99 is quite reasonable. (c) The residual plot on the right I of the percent made below shows that the distribution is skewed to the right. The test statistic is
below shows no obvious patterns. I t = -13.7with df= 17 and P-value = 0.000. We have very strong evidence of a significant
Stem-and-leaf of Residuals 23 association between the number taken and the percent made.
Leaf Unit = 0.10
N =




I

3
5
-1
-1
965
30

I
6
10
-0
-0
7
4422
I
(4) 0 0224
9 0 56789
I
4 1 2233
I
I
CASE CLOSED! I
(5) Using df= 17 and t' = 2.110, a 95% confidence interval for j3 is -0.47622.110x0.03475
(1) Descriptive statistics for x =number of three-point shots taken andy= percent made are I
shown below. The average number of three-point shots taken per game is 15.684 and the = (-0.5495, -0.4029). With 95% confidence, we estimate that for every additional three-pointer
standard deviation is 2.865. The average percent of three-point shots made per game is 35.379 I taken, the percent made will decrease on average between 0.40 and 0.55.
and the standard deviation is 1.425. The correlation is r = -0.958 and the scatterplot below
shows a negative association between these two variables. Notice that the cluster of points in the
I 15.17 Regression of fuel consumption on speed gives b = -0.01466, SEb = 0.02334, and
bottom right corner shows some positive association, but the overall association between x andy I t = -0.63 with df= 13 and P-value= 0.541. Thus, we have no evidence to suggest a straight-
is clearly negative.

I
~ '

.................................................................~]
I IJ

I
Chapter 15 I Inference for Regression 321
320
I
. 274.7821 . . I
line relationship between speed and fuel use. The scatterplot below shows a strong relationship
and fuel the relationship is not linear. See Exercise 3.9 for more details.
I IS t= .1
88 7712
3.1163 With df= 10 and P-value = 0.011. (Table C indicates that 0.01 < P-
I
I value < 0.02.) Since the P-value < 0.05, we reject H 0 and conclude that there is a significant
linear relationship between thickness and gate velocity. The regression formula might be used as
I a rule of thumb for new workers to follow, but the wide spread in the scatterplot below suggests
I there be other factors that should be taken into account in choosing the gate velocity.


I
I


I
/.
I I
I
I
,!

15.18 Repeated measurements of Sarah's height are clearly not independent. I

I
15.19 (a)The slope ,B tells us the mean change in the percent of forest lost for a 1 unit (1 cent
per pound) increase in the price of coffee. The estimate of ,B is b = 0.05525 and the estimate of I
a is a= -1.0134. (b) This says that the straight-line relationship described by the least-squares I 15.21 (a) A scatterplot with the regression line is shown below. r' = 0.992 or 99.2%. (b) The
estimates of a, ,B, and a are a= -2.3948 em, b = 0.1585 em/min, and s = 0.8059 em. (c) The
line is very strong. r' = 0. 907 or 91% indicates that 91% of the total variation in the percent of
forest lost is accounted for by the straight-line relationship with prices paid to coffee growers.
I least-squares line is =-2.3948+ 0.1585x, where y =length and x =time.
(c) The P-value refers to the two-sided alternative: H 0 : ,B = 0 versus H.: P* 0. The small p. I
value indicates that we have very strong evidence of a significant association between the
percent of forest lost and the price paid for coffee. (d) The residuals are -0.0988, 0.3934,
I
-0.2800,-0.2053, and 0.1907, and their sum is 0. The standard deviation a is estimated by I
s = ) 03: 15 "'0.3274. (e) A scatterplot (on the left) and a residual plot (on the right) are shown I


below. Even though the number of observations is small, there are no obvious problems with the I
linear regression model. Coffee price appears to be a very good predictor of forest lost for this
of values.
I
I
I 15.22 (a) A scatterplot with the least-squares regression line y = 3.5051- 0.0034x is shown
below. We want to test H 0 : fJ = 0 versus H, : fJ < 0. The test statistic is t = -4.64 with df = 14
I and P-value < 0.0005. We have very strong evidence that people with higher NEA gain less fat.

i'
I (b) To find this interval, we need SEb, which is given in the Minitab output below as
( 0.0007414. Using df= 14 and t' =1.761, a 90% confidence interval for ,B is
I -0.003441.76lx 0.0007414 = (-0.0047, -0.0021).
I
I
15.20 (a) The scatterplot below, with the regression line y = 70.436874+ 274.782lx, shows a I
=
moderate, positive, linear association. The linear relationship explains r' 0.493 or 49.3% of
I
*
the variation in gate velocity. (b) We want to testH0 : fJ = 0 versus H.: fJ 0. The test statistic

i
l'
'I

;"

I
I
I
Chapter 15 I Inference for Regression 323
322
I
I
~: I
. ~ . I

~
I
I
I
I
15.25 (a) The scatterplot below (on the left) shows a weak, negative association between com
The regression equation is
Fat gain {kg) ~ 3.51 - 0.00344 NEA change {cal) I yield and weeds. The least-squares regression line is .Y~ 166.483 -1.0987x, where y =com yield
I (bushels per acre) and x~ weeds (per meter). r2 ~ 0.209 or 20.9%, so the linear relationship
p
Predictor Coef SE Coef T explains about 20.9% of the variation in yield. (b) The t statistic for testing H 0 :p~o versus
Constant 3.5051
-0.0034415
0.3036
0.0007414
11.54
-4.64
0.000
0.000
I H, :p <0 is t = -1.92 with df= 14 and P-value = 0.0375. Since 0.0375 < 0.05, thereis
NliA change (cal)
I sufficient evidence to conclude that more weeds reduce com yields. (c) The small number of
observations for each value of the explanatory variable (weeds/meter), the large variability in
s ~ 0.739853 R-Sq ~ 60.6% R-Sq(adj) ~ 57.8%
I those observations, and the small value of? will make prediction with this model imprecise. A I
15.23 (a) A scatterplot is shown below. There is a moderate, positive, linear association between I residual plot below (on the right) also shows that the linear model is quite imprecise.
investment returns in the U.S. and investments overseas. (b) The test statistic is
0 6181 "'2.6091 with df= 25 and O.oi < P-value < 0.02. Thus, we have fairly strong
I I
1 ~__!!__ ~ I
II SE, 0.2369 I
evidence that there is a significant linear relationship between the two returns. That is, the slope
is nonzero. (c) r2 = 0.214 or 21.4%, so only 21.4% of the variation in the overseas returns is
explained by using linear regression with U.S. returns as the explanatory variable. Using this
linear will not be very useful in practice.
I
I
I
~
. :
--
I
I
I
I
:;.;
Iii:
I
I
II
!'' I
I
I i
15.24 (a) The residual plot (below on the left) shows that the variability about the regression I
line increases as the U.S. return increases. (b) The histogram (below on the right) indicates that
the distribution of the residuals is skewed to the right. The outlier is from 1986, when the
I
overseas return was much higher than our regression model predicts. I
I
I
. I
I
I
324 Chapter 15 I Part IV Review Exercises 325
I
15.26 Using df= 21 and t' = 1.721, a 90% confidence interval for fJ is Part IV Review Exercises
-9.69491.72lxl.8887 = (-12.9454, -6.4444). With 90% confidence, we estimate that for I
each one minute increase in time (a slower, more leisurely swim) the professor's pulse will drop I IV.l (a) We label each subject using labels 01, 02, 03, ... , 44, and then enter the partial table of
on average between 6 and 13 beats per minute. There is a negative relationship between the rando~ digits and read two-~igit groups. The labels 00 and 45 to 99 are not used in this example,
time and heart rate. A scatterplot is shown below. I so we Ignore them. We also Ignore any repeats of a label, since that subject is already assigned to
a group. We need to pick 22 subjects in this way to have the regular chips frrst (the other 22
I subjects will have the fat-free chips frrst), but here we pick only the frrst 5. The frrst two-digit

--...
.~.
~-......._

.
I
I
group is 19, so the subject with label19 is in the regular chips frrst group. The second two-digit
group is 22 and the third two-digit group is 39, so the subjects with labels 22 and 39 are also in
the regular chips first group. The fourth two-digit group is 50, which we ignore. The next two
. D~

.
~
~
0

. I two-digit gr~ups are 34 and 05, so the subjects with labels 34 and 05 are in the regular chips first

-- I
group. (b) Smce we want to compare the amounts of regular and fat-free chips eaten and each
woman serves as her own control, we will use a paired t test. The hypotheses are
Ho: Jl"ff = 0 and H.: Jldiff "'0, where "diff' =(weight in grams of regular potato chips eaten)-
I
(weight in grams of fat-free potato chips eaten).
I
IV.2 Step I: Hypotheses
-
I H 0 :There is no difference in response between walking and resting flies.
.,I '
I H. :There is some difference in response between walking and resting flies.
,! I Step 2: Conditions The expected cell counts are
. '
Response to Vibrate?
'II

.!'I I Yes No
I
'

I 38x64 58x64
Fly was walking 25.3 38 6
r I
I 96 96 .
lli 38x32 58x32
i; Fly was resting 12.6 19.3
' I 96 96
All expected cell counts are greater than I and no expected cell counts are less than 5 (the
smallest expected cell count is 12.7).
I Step 3: Calculations
Test statistic

x2 = L: (o-E)2
I E
= 9.2807 + 6.0805+ 18.5614+ 12.1609
= 46.0835
I P-value Under the null hypothesis, the test statistic has a x2 distribution with 1 degree
I of freedom. The P-value is less than 0.0005 (technology gives us 1.13 X ro- 11 ).
Step 4: Interpretation: Because the expected cell counts are all large, the P-value from Table E
I will be quite accurate. There is strong evidence to reject H 0
I (X 2 =46.0835, df =l,P-value < 0.0005) and conclude that resting flies respond differently than
I flies that are walking.
I
I
I

Das könnte Ihnen auch gefallen