Second Edition
Ann E. Watkins
Richard L. Scheaffer
George W. Cobb
Project Editor Josephine Noah
Consulting Editor Kendra Lockman
Project Administrators Elizabeth Ball, Aaron Madrigal
Editorial Assistants Aneesa Davenport, Nina Mamikunian
Teacher Consultant and Writer Corey Andreasen, North High School, Sheboygan, Wisconsin
Mathematics Reviewer and Accuracy Checker Cindy Clements, Trinidad State Junior College, Trinidad, Colorado
AP Sample Test Contributor Joshua Zucker, Castilleja School, Palo Alto, California
AP Teacher Reviewers Angelo DeMattia, Columbia High School, Maplewood, New Jersey
Beth Fox-McManus, formerly of Alan C. Pope High School, Marietta, Georgia
Dan Johnson, Silver Creek High School, San Jose, California
Multicultural Reviewers Gil Cuevas, University of Miami, Coral Gables, Florida
Genevieve Lau, Skyline College, San Bruno, California
Beatrice Lumpkin, Malcolm X College (retired), Chicago, Illinois
Editorial Production Manager Christine Osborne
Production Editor Kristin Ferraioli
Copyeditor Mary Roybal
Production Supervisor Ann Rothenbuhler
Production Coordinator Thomas Brierly
Text Designers Graphic World, Thomas Brierly
Compositor Graphic World
Art Editor and Technical Artist LMP Media, Inc.
Cover Designers Jensen Barnes, Nidaul Uk
Cover Photo Credit Getty Images/Alberto Incrocci
Prepress and Printer RR Donnelley
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, photocopying, recording, or otherwise, without the prior
written permission of the publisher.
®Key Curriculum Press is a registered trademark of Key Curriculum Press. ™Fathom Dynamic
Data is a trademark of KCP Technologies. All other registered trademarks and trademarks in this
book are the property of their respective holders.
Key Curriculum Press
1150 65th Street
Emeryville, CA 94608
editorial@keypress.com
www.keypress.com
Acknowledgments
This book is a product of what we have learned from the statisticians and teachers
who have been actively involved in helping the introductory statistics course
evolve into one that emphasizes activity-based learning of statistical concepts
while reflecting modern statistical practice. This book is written in the spirit of the
recommendations from the MAA’s STATS project, the ASA’s Quantitative Literacy
and GAISE projects, and the College Board’s AP Statistics course. We hope that
it adequately reflects the wisdom and experience of those with whom we have
worked and who have inspired and taught us.
We owe special thanks to Corey Andreasen, an outstanding high school
mathematics and AP Statistics teacher at North High School in Sheboygan,
Wisconsin, for his insight into what makes a topic “teachable” to high school
students. His careful review of the manuscript led to many clarifications of
wording and improvements in exercises, all of which will make it easier for you to
learn the material. Corey also has made substantial contributions to the solutions
and teacher’s notes, adding his unique perspective and sense of humor.
It has been an awesome experience to work with the Key Curriculum staff
and field-test teachers, who always put the interests of students and teachers first.
Their commitment to excellence has motivated us to do better than we ever could
have done on our own. Steve, Casey, Jim, Kristin B., Kristin F., and the rest of the
staff have been professional and astute throughout. Our deepest gratitude goes to
Cindy Clements and Josephine Noah, the editors of the first and second editions,
respectively, who have been a joy to work with. (Not all authors say that—and
mean it—about their editors.) Cindy and Josephine were outstanding high school
teachers before coming to Key. Their organizational skills, experience in the
classroom, and insight have improved every chapter of this text.
Contents
CHAPTER 10 Chi-Square Tests
10.1 Testing a Probability Model: The Chi-Square Goodness-of-Fit Test
10.2 The Chi-Square Test of Homogeneity
10.3 The Chi-Square Test of Independence
Chapter Summary
A Note to Students
from the Authors
Data enter the conversation whether you talk about income, sports, health,
politics, the weather, or prices of goods and services. In fact, in this age of
information technology, data come at you at such a rapid rate that you can catch
only a glimpse of the masses of numbers. You cannot cope intelligently in this
quantitative world unless you have an understanding of the basic concepts of
statistics and have had practice making informed decisions using real data.
Statistics in Action is designed for students taking an introductory high
school statistics course and includes all of the topics in the Advanced Placement
(AP) Statistics syllabus. Beginning in Chapter 1 with a court case about age
discrimination, you will be immersed in real-world problems that can be solved
only with statistical methods. You will learn to explore, summarize, and display data;
design surveys and experiments; use probability to understand random behavior;
make inferences about populations by looking at samples from those populations;
and make inferences about the effect of treatments from designed experiments.
After completing your statistics course, you will be prepared to take the AP
Statistics Exam, to take a follow-up college-level course, and, above all, to make
informed decisions in this world of data.
You will be using this book in a first course in statistics, so you aren’t required
to know anything yet about statistics. You may find that your success in statistics
results more from your perseverance in trying to understand what you read than
from your skill with algebra. However, basic topics from algebra, such as slope,
linear equations, exponential equations, and the idea of a logarithm, will arise
throughout the book. Be prepared to review these as you go along.
These features grow out of the vigorous changes that have been reshaping the
practice of statistics and the teaching of statistics over the last quarter century.
The most basic question to ask about any data set is, “Where did the data come
from?” Good data for statistical analysis must come from a good plan for data
collection. Thus, Statistics in Action treats the design and analysis of experiments
honestly and thoroughly and discusses how these methods of collecting data differ
from observational studies. It then follows through on this theme by relating the
statistical analysis to the manner in which the data were collected.
Ann Watkins
Dick Scheaffer
George Cobb
Statistics in Action
Understanding a World of Data
Second Edition
CHAPTER 1 Statistical Reasoning: Investigating a Claim of Discrimination
[Dot plot; horizontal axis: Age, 20 to 70.]
Were older workers discriminated against during a company’s downsizing? When an
older worker felt he had been unfairly laid off, his lawyers called on a
statistician to help them evaluate the claim.
In the year Robert Martin turned 54, the Westvaco Corporation, which makes
paper products, decided to downsize. They laid off several members of the
engineering department, including Robert Martin. Later that year, he sued
Westvaco, claiming he had been laid off because of his age. A major piece of
Martin’s case was based on a statistical analysis of the ages of the Westvaco
employees.
In the two sections of this chapter, you will get a chance to try your hand at
two very different kinds of statistical work, exploration and inference. Exploration
is an informal, open-ended examination of data. Your goal in the first section
will be to uncover and summarize patterns in data from Westvaco that bear on
the Martin case. You will try to formulate and answer basic questions such as
“Were those who were laid off older on average than those who weren’t laid off?”
You can use any tools—graphs, averages, and so on—that you think might be
useful. Inference, which you’ll use in the second section, is quite different from
exploration in that it follows strict rules and focuses on judging whether the
patterns you found are the sort you would expect. You’ll use inference to decide
whether the patterns you find in the Westvaco data are the sort you would expect
from a company that does not discriminate on the basis of age, or whether further
investigation into possible age discrimination is needed.
The purpose of this first chapter is to familiarize you with the ideas of
statistical thinking before you involve yourself with the details of statistical
methods. It is easy to get caught in the trap of doing rather than understanding, of
asking how rather than why. You can’t do statistics unless you learn the methods,
but you must not get so caught up in the details of the methods that you lose
sight of what they mean. Doing and thinking, method and meaning, will compete
for your attention throughout this course.
Display 1.1 The data in Martin v. Westvaco. [Source: Martin v. Envelope Division of Westvaco Corp.,
CA No. 92-03121-MAP, 850 Fed. Supp. 83 (1994).]
[Dot plot; horizontal axis: Age, 20 to 70.]
Display 1.2 Ages of the salaried workers. (Each dot represents a worker; the age
is shown by the position of the dot along the scale below it.)
Display 1.2 provides some useful information about the variability in the ages,
but by itself doesn’t tell anything about possible age discrimination in the layoffs.
For that, you need to distinguish between those salaried workers who lost their
jobs and those who didn’t. The dot plot in Display 1.3, which shows those laid
off and those retained, provides weak evidence for Martin’s case. Those laid off
generally were older than those who kept their jobs, but the pattern isn’t striking.
[Dot plots by Job Status (Laid Off, Retained); horizontal axis: Age, 20 to 70.]
Display 1.3 Salaried workers: ages of those laid off and those retained.
Display 1.3 shows that most salaried workers who were laid off were age 50
or older. However, this alone doesn’t support Martin’s case because most of the
workers were age 50 or older to begin with.
One way to proceed is to make a summary table. The table shown here
classifies the salaried workers according to age and whether they were laid off
or retained. (Using 50 as the dividing age between “younger” and “older” is a
somewhat arbitrary, but reasonable, decision.)
              Laid Off   Retained   Total
Under 50          6         10        16
50 or Older      12          8        20
Total            18         18        36
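The comparison this table invites can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts come from the table, while the variable names are our own and not part of the text.

```python
# Counts from the summary table: (laid off, retained) for each age group.
counts = {
    "Under 50": (6, 10),
    "50 or Older": (12, 8),
}

# Proportion of each age group that was laid off.
laid_off_rate = {
    group: laid_off / (laid_off + retained)
    for group, (laid_off, retained) in counts.items()
}

total_laid_off = sum(laid_off for laid_off, _ in counts.values())
total_workers = sum(laid_off + retained for laid_off, retained in counts.values())

# Share of the laid-off workers who were 50 or older,
# and share of all salaried workers who were 50 or older.
older_share_of_layoffs = counts["50 or Older"][0] / total_laid_off
older_share_of_workers = sum(counts["50 or Older"]) / total_workers

for group, rate in laid_off_rate.items():
    print(f"{group}: {rate:.1%} laid off")
print(f"50 or older among those laid off: {older_share_of_layoffs:.1%}")
print(f"50 or older among all workers: {older_share_of_workers:.1%}")
```

Comparing the two layoff rates (37.5% for workers under 50 and 60% for workers 50 or older) takes into account that most workers were 50 or older to begin with, which the raw count of 12 older workers laid off does not.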
[Dot plots for Round 1 through Round 5; horizontal axis: Age, 20 to 70.]
Display 1.4 Salaried workers: ages of those laid off (open circles) and those
retained (solid dots) in each round.
You might feel as if the analysis so far ignores important facts, such as worker
qualifications. That’s true. However, the first step is to decide whether, based
on the data in Display 1.1, older workers were more likely to be laid off. If not,
Martin’s case fails. If so, it is then up to Westvaco to justify its actions.
[Dot plots by Job Status (Laid Off, Retained); horizontal axis: Age, 20 to 70.]
Display 1.5 Hourly workers: ages of those laid off and those retained.
D3. Whenever you think you have a message from data, you should be careful
not to jump to conclusions. The patterns in the Westvaco data might be
“real”—they reflect age discrimination on the part of management. On
the other hand, the patterns might be the result of chance—management
wasn’t discriminating on the basis of age but simply by chance happened
to lay off a larger percentage of older workers. What’s your opinion about
the Westvaco data: Do the patterns seem “real”—too strong to be explained
by chance?
D4. The analysis up to this point ignores important facts such as worker
qualifications. Suppose Martin makes a convincing case that older workers
were more likely to be laid off. It is then up to Westvaco to justify its actions.
List several specific reasons Westvaco might give to justify laying off a
disproportionate number of older workers.
Total 10 4 14
Exercises
E1. This summary table classifies salaried workers as to whether they were laid
off and their age, this time using 40 as the cutoff between younger and older
workers.
              Laid Off   Retained   Total
Under 40          4          5         9
40 or Older      14         13        27
Total            18         18        36
a. What proportion of workers age 40 or older were laid off? What proportion of
laid-off workers were age 40 or older?
b. What proportion of workers under age 40 were laid off? What proportion were
not laid off?
c. What two proportions should you compute and compare in order to
Display 1.7 Number of bases stolen in a single season by the top five Major
League baseball players. [Source: mlb.mlb.com.]
b.
Stock              Closing Price   Closing Price   Change    Percentage   Volume
                   on 10/28        on 10/30                  Change
Chrysler           40              33 1/2          6 1/2     16.25        269,100
Coca-Cola          137             128 3/8         —?—       —?—          14,100
Eastman Kodak      181 1/8         —?—             11 1/8    6.14         27,800
General Electric   250             222             28        11.20        136,300
General Motors     —?—             40              7 1/2     —?—          971,300
Proctor & Gamble   77 3/4          66 1/2          11 1/4    14.47        13,800
US Steel           186             174             —?—       —?—          307,300
Display 1.8 New York Stock Exchange Activity for October 29, 1929.
[Source: marketplace.publicradio.org.]
[Dot plots by Job Status (Laid Off, Retained); horizontal axis: Age, 25 to 65.]
Display 1.9 Hourly workers: ages of those laid off and those retained in Round 2.
[Margin note: Use a summary statistic to “condense” the data.]
To simplify the statistical analysis to come, it helps to “condense” the data into
a single number, called a summary statistic. One possible summary statistic is
the average, or mean, age of the three workers who lost their jobs:
\[ \frac{55 + 55 + 64}{3} = 58 \text{ years} \]
Knowing what to make of the data requires balancing two points of view. On
one hand, the pattern in the data is pretty striking. Of the five workers under age
50, all kept their jobs. Of the five who were 55 or older, only two kept their jobs.
On the other hand, the number of workers involved is small: only three out of ten.
Should you take seriously a pattern involving so few cases? Imagine two people
taking sides in an argument that was at the center of the statistical part of the
Martin case.
Martin: Look at the pattern in the data. All three of the workers laid
off were much older than the average age of all workers. That’s
evidence of age discrimination.
Westvaco: Not so fast! You’re looking at only ten workers total, and only
three positions were eliminated. Just one small change and the
picture would be entirely different. For example, suppose it had
been the 25-year-old instead of the 64-year-old who was laid off.
Switch the 25 and the 64, and you get a totally different set of
averages. (Ages selected for layoff are listed in parentheses.)
Actual Data: 25 33 35 38 48 55 55 55 56 64 (laid off: 55, 55, 64)
Altered Data: 25 33 35 38 48 55 55 55 56 64 (laid off: 25, 55, 55)
Martin: Not so fast yourself ! Of all the possible changes, you picked the
one most favorable to your side. If you’d switched one of the
55-year-olds who got laid off with the 55-year-old who kept his
or her job, the averages wouldn’t change at all. Why not compare
what actually happened with all the possibilities?
Westvaco: What do you mean?
Martin: Start with the ten workers, and pick three at random. Do this over
and over, to see what typically happens, and compare the actual
data with the results. Then we’ll find out how likely it is that the
average age of those laid off would be 58 or greater.
[Margin note: Simulation requires a chance model.]
Activity 1.2a shows you how to estimate the probability of getting an average
age of 58 years or greater if you choose three workers at random. You will
use simulation, a procedure in which you set up a model of a chance process
(drawing three ages out of a box) that copies, or simulates, a real situation
(selecting three employees at random to lay off).
[See Calculator Note 1B to learn how to do this kind of simulation with your
calculator.]
Your simulation was completely age-neutral. All sets of three workers
had exactly the same chance of being selected for layoff, regardless of age. The
simulation tells you what results are reasonable to expect from that sort of
age-blind process.
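The age-blind process described here can be sketched in code. This is a minimal illustration, not the procedure from Activity 1.2a or the calculator note: the function names, the seed, and the choice of 10,000 repetitions (the text uses 200) are our own assumptions. Because there are only 120 ways to choose three workers from ten, the sketch also lists every possible set so the simulated estimate can be compared with the exact probability.

```python
import random
from itertools import combinations

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]  # hourly workers, Round 2
observed_mean = 58  # average age of the three workers actually laid off


def simulate(repetitions, seed=0):
    """Estimate P(mean age of 3 random layoffs >= observed_mean) by simulation."""
    rng = random.Random(seed)  # the seed is arbitrary; it only makes runs repeatable
    hits = 0
    for _ in range(repetitions):
        laid_off = rng.sample(ages, 3)  # draw three ages without replacement
        # Compare sums rather than means to avoid rounding issues.
        if sum(laid_off) >= 3 * observed_mean:
            hits += 1
    return hits / repetitions


def exact():
    """Same probability, found by listing all equally likely sets of three workers."""
    groups = list(combinations(range(len(ages)), 3))
    hits = sum(1 for g in groups if sum(ages[i] for i in g) >= 3 * observed_mean)
    return hits / len(groups)


estimate = simulate(10_000)
exact_probability = exact()
print(f"simulated: {estimate:.3f}, exact: {exact_probability:.3f}")
```

Working by worker position rather than by age keeps the three 55-year-olds distinct, so each of the 120 sets counts once.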
[Margin note: The simulation tells what kind of data to expect if workers are
selected at random for layoff.]
Shown here are the first 4 of 200 repetitions from such a simulation. (The
ages in red are those selected for layoff.) The average ages of the workers selected
for layoff (42.7, 48.0, 42.7, and 37.0) are highlighted by the red dots in the
distribution of all 200 repetitions in Display 1.10.
Average Age
25 33 35 38 48 55 55 55 56 64 42.7
25 33 35 38 48 55 55 55 56 64 48.0
25 33 35 38 48 55 55 55 56 64 42.7
25 33 35 38 48 55 55 55 56 64 37.0
[Dot plot; horizontal axis: Mean Age, 30 to 60, with 58 marked.]
Display 1.10 Results of 200 repetitions: the distribution of the average age of
the three workers chosen for layoff by chance alone.
Martin: Look at the pattern in the data. All three of the workers laid off
were much older than average.
Westvaco: So what? You could get a result like that just by chance. If chance
alone can account for the pattern, there’s no reason to ask us for
any other explanation.
Martin: Of course you could get this result by chance. The question is
whether it’s easy or hard to do so. If it’s easy to get an average
as large as 58 by drawing at random, I’ll agree that we can’t rule
out chance as one possible explanation. But if an average that
large is really hard to get from random draws, we agree that
it’s not reasonable to say that chance alone accounts for the
pattern. Right?
Westvaco: Right.
Martin: Here are the results of my simulation. If you look at the three
hourly workers laid off in Round 2, the probability of getting an
average age of 58 or greater by chance alone is only 5%. And if you
do the same computations for the entire engineering department,
the probability is a lot lower, about 1%. What do you say to that?
Westvaco: Well . . . I’ll agree that it’s really hard to get an average age that
extreme simply by chance, but that by itself still doesn’t prove
discrimination.
Martin: No, but I think it leaves you with some explaining to do!
In the actual case, Martin and Westvaco reached a settlement out of court
before the case went to trial.
The logic you’ve just seen is basic to all statistical inference, but it’s not easy
to understand. In fact, it took mathematicians centuries to come up with the
ideas. It wasn’t until the 1920s that a brilliant British biological scientist and
mathematician, R. A. Fisher, realized that results of agricultural experiments
may be analyzed in a way similar to that in Activity 1.2a to see whether observed
differences should be attributed to chance alone or to treatment. Calculus, in
contrast, was first understood in 1665. Precisely because it is so important, the
logic of using randomization as a basis for statistical inference will be seen over
and over again throughout this book. You’ll have lots of time to practice with it.
Practice
The Logic of Inference
P4. Suppose three workers were laid off from a set of ten whose ages were the
same as those of the hourly workers in Round 2 in the Martin case. This time,
however, the ages of those laid off were 48, 55, and 55.
25 33 35 38 48 55 55 55 56 64
a. Use the dot plot in Display 1.10 on page 14 to estimate the probability of
getting an average age as large as or larger than that of those laid off in this
situation.
b. What would your conclusion be if Westvaco had laid off workers of these
three ages?
P5. At the beginning of Round 1, there were 14 hourly workers. Their ages were
22, 25, 33, 35, 38, 48, 53, 55, 55, 55, 55, 56, 59, and 64. After the layoffs were
complete, the ages of those left were 25, 38, 48, and 56. Think about how you
would repeat Activity 1.2a using these data.
a. What is the average age of the ten workers laid off?
b. Describe a simulation for finding the distribution of the average age of ten
workers laid off at random.
c. The results of 200 repetitions from a simulation are shown in Display 1.11.
Suppose 10 workers are picked at random for layoff from the 14 hourly workers.
Make a rough estimate of the probability of getting, just by chance, the same or
larger average age as that of the workers who actually were laid off (from part a).
d. Does this analysis provide evidence in Martin’s favor?
[Dot plot; horizontal axis: Mean Age, 42 to 52.]
Display 1.11 Results of 200 repetitions.
Chapter Summary
In this chapter, you explored the data from an actual case of alleged age
discrimination, looking for evidence you considered relevant. You then saw how
to use statistical reasoning to test the strength of the evidence: Are the patterns
in the data solid enough to support Martin’s claim of age discrimination, or are
they the sort that you would expect to occur even if there was no discrimination?
Along the way you made a substantial start at learning many of the most
important statistical terms and concepts: distribution, cases and variables,
summary statistic, simulation, and how to determine whether the result from
the real-life situation can reasonably be attributed to chance alone or whether an
explanation is called for.
You have practiced both thinking like a statistician and reporting your results
like a statistician. Throughout this textbook, you will be asked to justify your
answers in the real-world context. This includes stating assumptions, giving
appropriate plots and computations, and writing a conclusion in context.
The last chapter of this book includes a final look at the Martin case.
Review Exercises
E15. A teacher had two statistics classes, and students could enroll in either the
earlier class or the later class. The final grades in the courses are given here.
Earlier class:
99 95 69 91 79 67 64 54 68
47 53 86 100 95 45 41 59 66
Later class:
84 68 94 77 88 75 88 91 83
61 97 75 37 82 62 49 43 93
AP Sample Test
AP1. This plot shows the ages of the part-time and full-time students who receive
financial aid at a small college. Which of the following is a conclusion about
students at this college that cannot be drawn from the plot alone?
[Dot plots by Status (Full-time, Part-time); horizontal axis: Age, 18 to 36.]
No, because half of hourly workers were laid off, but more than half of salaried
workers were laid off.
AP3. This table shows the number of male and female applicants who applied and
were either admitted to or rejected from a graduate program. What proportion of
admitted applicants were female?
Male     17   33   50
Female    8   12   20
Total    25   45   70
CHAPTER 2 Exploring Distributions
• If the slip you drew says “fake it,” don’t use the page from the phone
book but instead make up and plot 30 digits on a dot plot using a scale
The number of births per month in a year is another set of data you might expect
to be fairly uniform. Or, is there a reason to believe that more babies are born in
one month than in another? Display 2.1 shows a table and plot of U.S. births (in
thousands) for 2003.
Month   Births (in thousands)
1       330
2       307
3       337
4       330
5       347
6       337
7       364
8       360
9       360
10      354
11      320
12      344
[Plot; vertical axis: Number of Births (in thousands), 0 to 400; horizontal axis:
Month, 1 to 12.]
Display 2.1 An example of a (roughly) uniform distribution: births per month in
the United States, 2003. [Source: Centers for Disease Control and Prevention.]
The plot shows that there is actually little change from month to month;
that is, we see a roughly uniform distribution of births across the months. To
summarize this distribution, you might write “The distribution of births is
roughly uniform over the months January through December, with about
340,000 births per month.”
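The figure of about 340,000 births per month can be checked directly from the table in Display 2.1. The short sketch below is only an illustration. The variable names are our own, and the largest relative departure from the mean is one possible way, not the text's, to judge how close to uniform the distribution is.

```python
# Births per month in thousands, January through December 2003 (Display 2.1).
births = [330, 307, 337, 330, 347, 337, 364, 360, 360, 354, 320, 344]

mean_births = sum(births) / len(births)

# Largest distance of any single month from the mean.
largest_gap = max(abs(b - mean_births) for b in births)

print(f"mean: {mean_births:.1f} thousand births per month")
print(f"largest departure from the mean: {largest_gap / mean_births:.1%}")
```

No month differs from the mean by as much as 10%, which is consistent with describing the distribution as roughly uniform.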
Normal Distributions
Activity 2.1b introduces one of the most important common shapes of
distributions and one of the common ways this shape is produced. What happens
when different people measure the same distance or the same feature of very
similar objects? In the activity, you’ll measure a tennis ball with a ruler, but the
results you get will reflect what happens even if you use very precise instruments
under carefully controlled conditions. For example, a 10-gram platinum weight
is used for calibration of scales all across the United States. When scientists at the
National Institute of Standards and Technology use an analytical balance for the
weight’s weekly weighing, they face a similar challenge due to variability.
Mode = Mean
Display 2.4 A normal curve, showing the line of symmetry, mode,
mean, inflection points, and standard deviation (SD).
[Margin note: Use the mean and standard deviation to describe the center and
spread of a normal distribution.]
You should use the mean (or average) to describe the center of a normal
distribution. The mean is the value at the point where the line of symmetry
intersects the x-axis. You should use the standard deviation, SD for short, to
describe the spread. The SD is the horizontal distance from the mean to an
inflection point.
It is difficult to locate inflection points, especially when curves are drawn
by hand. A more reliable way to estimate the standard deviation is to use areas.
For a normal curve, 68% (roughly) of the total area under the curve is between
the vertical lines through the two inflection points. In other words, the interval
between one standard deviation on either side of the mean accounts for roughly
68% of the area under the normal curve.
[Normal curve with the two inflection points, the SD distances, and the central
68% of the area marked; horizontal axis: −3 to 3.]
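The 68% figure can be verified numerically. The sketch below uses the standard relationship between the normal curve and the error function, available as `math.erf` in Python's standard library. The helper names `normal_cdf` and `area_within` are our own, chosen for illustration.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)


print(f"area within 1 SD of the mean: {area_within(1):.4f}")
```

Because the area depends only on how many standard deviations the boundaries lie from the mean, the same value applies to every normal curve, whatever its mean and SD.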
[Plot; horizontal axis: Average Age, 35 to 60.]
Display 2.5 Distribution of average age for groups of five workers drawn at random.
[You can graph a normal curve on your calculator by specifying the mean and standard
deviation. See Calculator Note 2B.]
In this section, you’ve seen the three most common ways normal distributions
arise in practice:
• through variation in measurements (diameters of tennis balls)
• through natural variation in populations (weights of pennies)
• through variation in averages computed from random samples (average ages)
All three scenarios are common, which makes the normal distribution especially
important in statistics.
Skewed Distributions
Both the uniform (rectangular) and normal distributions are symmetric. That
is, if you smooth out minor bumps, the right side of the plot is a mirror image
of the left side. Not all distributions are symmetric, however. Many common
distributions show bunching at one end and a long tail stretching out in the other
direction. These distributions are called skewed. The direction of the tail tells
whether the distribution is skewed right (tail stretches right, toward the high
values) or skewed left (tail stretches left, toward the low values).
[Dot plot with the mode and the “tail” of the distribution labeled.]
Display 2.6 Weights of bears in pounds. [Source: MINITAB data set from
MINITAB Handbook, 3rd ed.]
The dot plot in Display 2.6 shows the weights, in pounds, of 143 wild bears. It is
skewed right (toward the higher values) because the tail of the distribution stretches
out in that direction. In everyday conversation, you might describe the two parts of
Upper Quartile
Bimodal Distributions
[Margin note: A bimodal distribution has two peaks.]
Many distributions, including the normal distribution and many skewed
distributions, have only one peak (unimodal), but some have two peaks
(bimodal) or even more. When your distribution has two or more obvious peaks,
or modes, it is worth asking whether your cases represent two or more groups.
For example, Display 2.9 shows the life expectancies of females from countries on
two continents, Europe and Africa.
[Plot; horizontal axis: Years, 30 to 90.]
Display 2.9 Life expectancy of females by country on two continents. [Source:
Population Reference Bureau, World Population Data Sheet, 2005.]
[Plots for Africa and Europe; horizontal axis: Years, 30 to 90.]
Display 2.10 Life expectancy of females in Africa and Europe.
Although it makes sense to talk about the center of the distribution of life
expectancies for Europe or for Africa, notice that it doesn’t really make sense to
talk about “the” center of the distribution for both continents together. You could
possibly tell the locations of the two peaks, but finding the reason for the two
modes and separating the cases into two distributions communicates even more.
Practice
Practice problems help you master basic concepts and computations. Throughout
this textbook, you should work all the practice problems for each topic you want
to learn. The answers to all practice problems are given in the back of the book.
Uniform (Rectangular) Distributions
P1. This diagram shows a uniform distribution on [0, 2], the interval from 0
through 2.
[Diagram: uniform distribution; horizontal axis: 0 to 2.]
a. What value divides the distribution in half, with half the numbers below that
value and half above?
b. What values divide the distribution into quarters?
c. What values enclose the middle 50% of the distribution?
d. What percentage of the values lie between 0.4 and 0.7?
e. What values enclose the middle 95% of the distribution?
Display 2.13 Four distributions that are approximately normal.
Display 2.14 Five distributions with different shapes.
The 2003 Women’s World Cup Championship team, from Germany
E3. Sketch these distributions.
a. a uniform distribution that shows the sort of data you would get from rolling
a fair die 6000 times
[Dot plot; horizontal axis: Age, 48.0 to 54.0.]
Display 2.15 Ages of colonels. Each dot represents two points. [Source: Data and
Story Library at Carnegie-Mellon University, lib.stat.cmu.edu.]
c. What kind of wall might there be that causes the shape of the distribution?
Generate as many possibilities as you can.
E7. The distribution in Display 2.17 shows measurements of the strength in
pounds of 22s yarn (22s refers to a standard unit for measuring yarn strength).
What is the basic shape of this distribution? What feature makes it
uncharacteristic of distributions with that shape?
[Plot; horizontal axis: Weight (lb), 60 to 140.]
Display 2.17 Strength of yarn. [Source: Data and Story Library at
Carnegie-Mellon University, lib.stat.cmu.edu.]
E8. Sketch a normal distribution with mean 0 and standard deviation 1. You will
study this standard normal distribution in Section 2.5.
Display 2.18 Last digit of a sample of Social Security numbers.
E11. Although a uniform distribution gives a reasonable approximation of the
actual distribution of births over months (Display 2.1 on page 29), you can “blow
up” the graph to see departures from the uniform pattern, as in Display 2.19. Do
these deviations from the uniform shape form their own pattern, or do they appear
haphazard? If you think there’s a pattern, describe it.
[Plot; vertical axis: Number of Births (in thousands), 300 to 370; horizontal
axis: Month.]
Display 2.19 A “blow up” of the distribution of births over months, showing
departures from the uniform pattern.
A family in Albania
[Dot plot; horizontal axis: Speed (mi/h), 5 to 75.]
Display 2.25 Dot plot of the speeds of mammals.
[Margin note: When are dot plots most useful?]
As you saw in Section 2.1, a dot plot shows shape, center, and spread. Dot
plots tend to work best when
• you have a relatively small number of values to plot
• you want to see individual values, at least approximately
• you want to see the shape of the distribution
• you have one group or a small number of groups you want to compare
Histograms
[Margin note: Histograms show groups of cases as bars.]
A dot plot shows individual cases as dots above a number line. To make a
histogram, you divide the number line into intervals, called bins, and over
each bin construct a bar that has a height equal to the number of cases in that
bin. In fact, you can think of a histogram as a dot plot with bars drawn around
[Histogram; vertical axis: Frequency; horizontal axis: Speed (mi/h), 15 to 75.]
Display 2.26 Histogram of mammal speeds.
[Margin note: Borderline values go in the bar to the right.]
Most calculators and statistical software packages place a value that falls at the
dividing line between two bars into the bar to the right. For example, in Display
2.26, the bar going from 30 to 35 contains cases for which 30 ≤ speed < 35.
Changing the width of the bars in your histogram can sometimes change your
impression of the shape of the distribution. For example, the histogram of the
speeds of mammals in Display 2.27 has fewer and wider bars than the histogram
in Display 2.26. It shows a more symmetric, bell-shaped distribution, and there
appears to be one peak rather than two. There is no “right answer” to the question
of which bar width is best, just as there is no rule that tells a photographer
when to use a zoom lens for a close-up. Different versions of a picture bring out
different features. The job of a data analyst is to find a plot that shows important
features of the distribution.
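The effect of bar width can be tried out with a short Python sketch. The speeds are read off the stemplot in Display 2.29. The starting points and widths of the bins are assumptions chosen for illustration (they may not match the bins used in Displays 2.26 and 2.27), and `bin_counts` is a hypothetical helper name. A borderline value goes in the bar to its right, as described above. Empty bins are simply not printed.

```python
from collections import Counter

# Mammal speeds (mi/h), read off the stemplot in Display 2.29.
SPEEDS = [11, 12, 20, 25, 30, 30, 30, 32, 35, 39,
          40, 40, 40, 42, 45, 48, 50, 70]


def bin_counts(values, start, width):
    """Count values per bin. A bin covers left <= value < left + width,
    so a borderline value lands in the bar to its right."""
    counts = Counter()
    for v in values:
        left = start + ((v - start) // width) * width
        counts[left] += 1
    return dict(sorted(counts.items()))


def print_histogram(values, start, width):
    """Print one row of stars per non-empty bin."""
    for left, count in bin_counts(values, start, width).items():
        print(f"{left:>3} to {left + width:<3} {'*' * count}")


print_histogram(SPEEDS, start=10, width=5)   # narrow bars (assumed bins)
print()
print_histogram(SPEEDS, start=5, width=10)   # wider bars (assumed bins)
```

With these assumed bins, the narrow bars show two equally tall bars (30 to 35 and 40 to 45), while the wider bars show a single tallest bar.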
Display 2.27 Speeds of mammals using a histogram with wider bars. [Axes: Speed (mi/h) versus Frequency.]
Display 2.28 Life expectancies of people in countries around the world. [Axes: Life Expectancy versus Relative Frequency.] [Source: Population Reference Bureau, World Population Data Sheet, 2005.]
Solution
The bar including 70 years and up to 75 years has a relative frequency of about 0.30, so the number of countries with a life expectancy of at least 70 years but less than 75 years is about 0.30 × 203, or about 61.
The proportion of countries with a life expectancy of 70 years or greater is the sum of the heights of the three bars to the right of 70, which is about 0.30 + 0.19 + 0.07, or 0.56.
■
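The arithmetic in this solution can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the relative frequencies are the approximate bar heights quoted in the solution.

```python
def relative_frequencies(counts):
    """Convert bar heights given as counts into proportions of the total."""
    total = sum(counts)
    return [c / total for c in counts]


def estimated_count(relative_frequency, total):
    """Turn a relative frequency back into an approximate number of cases."""
    return round(relative_frequency * total)


# Bar heights quoted in the solution; 203 countries in all.
print(estimated_count(0.30, 203))       # countries with 70 <= life expectancy < 75
print(round(0.30 + 0.19 + 0.07, 2))     # proportion with life expectancy >= 70
```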
DISCUSSION Histograms
D7. In what sense does a histogram with narrow bars, as in Display 2.26, give
you more information than a histogram with wider bars, as in Display 2.27?
In light of your answer, why don’t we always make histograms with very
narrow bars?
D8. Does using relative frequencies change the shape of a histogram? What
information is lost and gained by using a relative frequency histogram rather
than a frequency histogram?
Stemplots
The plot in Display 2.29 is a stem-and-leaf plot, or stemplot, of the mammal
speeds. It shows the key features of the distribution and preserves all the original
numbers.
1 | 1 2
2 | 0 5
3 | 0 0 0 2 5 9
4 | 0 0 0 2 5 8
5 | 0
6 |
7 | 0
3 | 9 represents 39 mi/h
Display 2.29 Stemplot of mammal speeds.
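A stemplot like the one in Display 2.29 can be built by splitting each value into a stem (the tens digit) and a leaf (the ones digit). The following Python sketch assumes whole-number values and uses a hypothetical function name, `stemplot`.

```python
from collections import defaultdict

# Mammal speeds (mi/h), as shown in Display 2.29.
SPEEDS = [11, 12, 20, 25, 30, 30, 30, 32, 35, 39,
          40, 40, 40, 42, 45, 48, 50, 70]


def stemplot(values):
    """Return the lines of a stemplot: stem = tens digit, leaf = ones digit.
    Stems with no leaves are kept so that gaps stay visible."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


for line in stemplot(SPEEDS):
    print(line)
```

Because every leaf is printed, the original numbers can be recovered from the plot.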
DISCUSSION Stemplots
D9. What information is given by the numbers in the bottom half of the far
left column of the plot in Display 2.32? What does the 2 in parentheses
indicate?
Display 2.33 Bar chart showing frequency of domesticated (0) and wild (1) mammals. [Vertical axis: Frequency.]
Display 2.34 The female labor force age 25 years and older by educational attainment. [Axes: Educational Attainment (women), coded 1 through 9, versus Proportion.] [Source: U.S. Census Bureau, March 2005 Current Population Survey, www.census.gov.]
The educational categories in Display 2.34 have a natural order from least
education to most education and are coded with the numbers 1 through 9. Note
that if you compute the mean of this distribution, there is no reasonable way to
interpret it. However, it does make sense to summarize this distribution using the
mode: More women fall into the category “high school graduate” than into any
other category. Thus, the numbers 1 through 9 are best thought of as representing
an ordered categorical variable, not a quantitative variable.
You will learn more about the analysis of categorical data in Chapter 10.
Practice
More About Dot Plots
P6. In the listing of the Westvaco data in Display 1.1 on page 5, which variables are quantitative? Which are categorical?
P7. Select a reasonable scale, and make a dot plot of the gestation periods of the mammals listed in Display 2.24 on page 43. Write a sentence using shape, center, and spread to summarize the distribution of gestation periods for the mammals. What kinds of mammals have longer gestation periods?
Histograms
P8. Make histograms of the average longevities and the maximum longevities from Display 2.24. Describe how the distributions differ in terms of shape, center, and spread. Why do these differences occur?
P9. Convert your histograms from P8 of the average longevities and maximum longevities of the mammals to relative frequency histograms. Do the shapes of the histograms change?
P10. Using the relative frequency histogram of
life expectancy in countries around the
world (Display 2.28 on page 47), estimate
the proportion of countries with a life
expectancy of less than 50 years. Then
estimate the number of countries with a life
expectancy of less than 50 years. Describe
the shape, center, and spread of this
distribution.
Stemplots
P11. Make a back-to-back stemplot of the average
longevities and maximum longevities from
Display 2.24 on page 43. Compare the two
distributions.
Display 2.34. What are the cases, and what is the variable? Describe the distribution you see here. How does the distribution of female education compare to the distribution of male education? Why is it better to look at relative frequency bar charts rather than frequency bar charts to make this comparison?
Display 2.35 The male labor force age 25 years and older by educational attainment. [Axes: Educational Attainment (men), coded 1 through 9, versus Proportion.] [Source: U.S. Census Bureau, March 2005 Current Population Survey, www.census.gov.]
Exercises
E15. The dot plot in Display 2.36 shows the distribution of the ages of pennies in a sample collected by a statistics class.
Display 2.36 Age of pennies. Each dot represents four points. [Horizontal axis: Age.]
a. Where did this data set come from? What are the cases and the variables?
b. What are the shape, center, and spread of this distribution?
c. Does the distribution have any unusual characteristics? What are possible interpretations or explanations of the patterns you see in the distribution? That is, why does the distribution have the shape it does?
E16. Suppose you collect this information for each student in your class: age, hair color, number of siblings, gender, and miles he or she lives from school. What are the cases? What are the variables? Classify each variable as quantitative or categorical.
statistics
II. heights of a group of mothers and their 12-year-old daughters
III. numbers of medals won by medal-winning countries in the 2004 Summer Olympics
IV. weights of grown hens in a barnyard
Display 2.37 Four histograms with different shapes.
[Histogram: Age versus Frequency.]
E18. Using the technology available to you, make histograms of the average longevity and maximum longevity data in Display 2.24 on page 43, using bar widths of 4, 8, and 16 years. Comment on the main features of the shapes of these distributions and determine which bar width appears to display these features best.
E19. Rewrite each sentence so that it states a relative frequency rather than a count.
a. Six students in a class of 30 got an A.
b. Out of the 50,732 people at a concert, 24,021 bought a T-shirt.
Display 2.39 Heights of males, age 18 to 24. [Horizontal axis: Male Heights (in inches).] [Source: U.S. Census Bureau, Statistical Abstract of the United States, 1991.]
a. Draw a smooth curve to approximate the histogram.
b. Without doing any computing, estimate the mean and standard deviation.
c. Estimate the proportion of men age 18 to 24 who are 74 in. tall or less.
d. Estimate the proportion of heights that fall below 68 in.
e. Why should you say that the distribution of heights is “approximately” normal rather than simply saying that it is normally distributed?
domesticated mammals generally have greater longevity?
b. Using the data in Display 2.24 on page 43, make a back-to-back stemplot to compare the average longevities.
c. Write a short summary comparing the two distributions.
Display 2.41 Population pyramids for the United States and Mexico, 2005. [Source: U.S. Census Bureau, International Data Base, www.census.gov.]
Measures of Center
The two most commonly used measures of center are the mean and the median.
The mean, $\bar{x}$, is the same number that many people call the “average.” To compute the mean, sum all the values of x and divide by the number of values, n:
$$\bar{x} = \frac{\sum x}{n}$$
(The symbol ∑, for sum, means to add up all the values of x.)
Display 2.44 The median divides the distribution into two equal areas.
Display 2.45 Ages of Westvaco hourly workers before and after Round 2, showing the means and medians.
Solution
Means
Before: The sum of the ten ages is 464, so the mean age is 464/10, or 46.4 years.
After: There are seven ages and their sum is 290, so the mean age is 290/7, or 41.4 years.
The layoffs reduced the mean age by 5 years.
Medians
Before: Because there are ten ages, n = 10, so (n + 1)/2 = (10 + 1)/2, or 5.5, and the median is halfway between the fifth ordered value, 48, and the sixth ordered value, 55. The median is (48 + 55)/2, or 51.5 years.
After: There are seven ages, so (n + 1)/2 = (7 + 1)/2, or 4. The median is the fourth ordered value, or 38 years.
The layoffs reduced the median age by 13.5 years.
■
Q1 M Q3
After: After the three workers are laid off in Round 2, there are seven ages: 25, 33,
35, 38, 48, 55, 56. Because n is odd, the median is the middle value, 38. Omit this
one number. The lower half of the data is made up of the three ordered values
to the left of position 4. The median of these is the second value, so Q1 is 33.
The upper half of the data is the set of the three ordered values to the right of
position 4, and the median of these is again the second value, so Q3 is 55. The
IQR is 55 − 33, or 22.
25 33 35 38 48 55 56
Q1 M Q3
■
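The steps used in these examples (find the median, then find the median of each half, leaving out the middle value when n is odd) can be sketched in Python. The function names are hypothetical. Be aware that calculators and software packages sometimes use slightly different quartile rules, so their quartiles may not match this method exactly.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum, with quartiles taken as medians
    of the lower and upper halves (middle value omitted when n is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return {"min": data[0], "Q1": median(lower), "median": median(data),
            "Q3": median(upper), "max": data[-1]}


# Ages of the seven Westvaco hourly workers remaining after Round 2.
AGES_AFTER = [25, 33, 35, 38, 48, 55, 56]

summary = five_number_summary(AGES_AFTER)
print(summary)
print("mean:", round(sum(AGES_AFTER) / len(AGES_AFTER), 1))
print("IQR:", summary["Q3"] - summary["Q1"])
```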
The difference of the maximum and the minimum is called the range.
Display 2.46 shows the five-number summary for the speeds of the mammals
listed in Display 2.24.
1 | 1 2            min      11
2 | 0 5            Q1       30
3 | 0 0 0 2 5 9    median   37
4 | 0 0 0 2 5 8    Q3       42
5 | 0              max      70
6 |
7 | 0
Display 2.47 Boxplot of mammal speeds. [Horizontal axis: Speed (mi/h).]
The maximum speed of 70 mi/h for the cheetah is 20 mi/h from the next
fastest mammal (the lion) and 28 mi/h from the nearest quartile. It is handy to
have a version of the boxplot that shows isolated cases—outliers—such as the
cheetah. Informally, outliers are any values that stand apart from the rest. This
rule often is used to identify outliers.
1.5 · IQR rule for outliers
A value is an outlier if it is more than 1.5 times the IQR from the nearest quartile.
Note that “more than 1.5 times the IQR from the nearest quartile” is another way of saying “either greater than Q3 plus 1.5 times IQR or less than Q1 minus 1.5 times IQR.”
2.3 Measures of Center and Spread 61
Example: Outliers in the Mammal Speeds
Use the 1.5 · IQR rule to identify outliers and the largest and smallest non-outliers among the mammal speeds.
Solution
From Display 2.46, Q1 = 30 and Q3 = 42, so the IQR is 42 − 30, or 12, and 1.5 · IQR equals 18.
At the low end:
Q1 − 1.5 · IQR = 30 − 18 = 12
The pig, at 11 mi/h, is an outlier.
The squirrel, at 12 mi/h, is the smallest non-outlier.
At the high end:
Q3 + 1.5 · IQR = 42 + 18 = 60
The cheetah, at 70 mi/h, is an outlier.
The lion, at 50 mi/h, is the largest non-outlier.
A modified boxplot, shown in Display 2.48, is like the basic boxplot except
that the whiskers extend only as far as the largest and smallest non-outliers
(sometimes called adjacent values) and any outliers appear as individual dots or
other symbols.
Display 2.48 Modified boxplot of mammal speeds. [Horizontal axis: Speed (mi/h).]
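The 1.5 · IQR rule and the whisker ends of a modified boxplot can be sketched in Python as follows. The speeds come from the stemplot of mammal speeds, and Q1 = 30 and Q3 = 42 are taken from the five-number summary. The function names are hypothetical.

```python
# Mammal speeds (mi/h) from the stemplot; quartiles from the five-number summary.
SPEEDS = [11, 12, 20, 25, 30, 30, 30, 32, 35, 39,
          40, 40, 40, 42, 45, 48, 50, 70]
Q1, Q3 = 30, 42


def outlier_fences(q1, q3):
    """Cutoffs at 1.5 times the IQR below Q1 and above Q3."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def split_outliers(values, q1, q3):
    """Return the outliers plus the smallest and largest non-outliers
    (the whisker ends of a modified boxplot)."""
    low, high = outlier_fences(q1, q3)
    outliers = [v for v in values if v < low or v > high]
    inside = [v for v in values if low <= v <= high]
    return outliers, min(inside), max(inside)


print(outlier_fences(Q1, Q3))          # (12.0, 60.0)
print(split_outliers(SPEEDS, Q1, Q3))  # ([11, 70], 12, 50)
```

A value exactly at a cutoff, such as the squirrel at 12 mi/h, is not an outlier because it is not more than 1.5 times the IQR from the nearest quartile.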
Boxplots are particularly useful for comparing several distributions.
Display 2.49 Comparison of average longevity. [Boxplots for wild and domesticated mammals; horizontal axis: Average Longevity (yr).]
[See Calculator Note 2D to learn how to display regular and modified boxplots and
five-number summaries on your calculator.]
When are boxplots most useful?
Boxplots are useful when you are plotting a single quantitative variable and
• you want to compare the shapes, centers, and spreads of two or more
distributions
• you don’t need to see individual values, even approximately
• you don’t need to see more than the five-number summary but would like
outliers to be clearly indicated
ACTIVITY 2.3a Comparing Hand Spans: How Far Are You from the Mean?
What you’ll need: a ruler
1. Spread your hand on a ruler and
measure your hand span (the distance
from the tip of your thumb to the tip of
your little finger when you spread your
fingers) to the nearest half centimeter.
2. Find the mean hand span for your
group.
3. Make a dot plot of the results for your
group. Write names or initials above
the dots to identify the cases. Mark the mean with a wedge (▲) below the
number line.
4. Give two sources of variability in the measurements. That is, give two
reasons why all the measurements aren’t the same.
5. How far is your hand span from the mean hand span of your group? How
far from the mean are the hand spans of the others in your group?
6. Make a plot of differences from the mean. Again label the dots with names
or initials. What is the mean of these differences? Tell how to get the
second plot from the first without computing any differences.
7. Using the idea of differences from the mean, invent at least two measures
that give a “typical” distance from the mean.
8. Compare your measures with those of the other groups in your class.
Discuss the advantages and disadvantages of each group’s method.
The differences from the mean, $x - \bar{x}$, are called deviations. The mean is
the balance point of the distribution, so the set of deviations from the mean will
always sum to zero.
How can you use the deviations from the mean to get a measure of spread?
You can’t simply find the average of the deviations, because you will get 0 every
time. As you might have suggested in the activity, you could find the average of
the absolute values of the deviations. That gives a perfectly reasonable measure
of spread, but it does not turn out to be very easy to use or very useful. Think of
how hard it is to deal with an equation that has sums of absolute values in it, for
example, $y = |x - 1| + |x - 2| + |x - 3|$. On the other hand, if you square the
deviations, which also gets rid of the negative signs, you get a sum of squares.
Such a sum is always quadratic no matter how many terms there are, for example,
$y = (x - 1)^2 + (x - 2)^2 + (x - 3)^2 = 3x^2 - 12x + 14$.
The measure of spread that incorporates the square of the deviations is the
standard deviation, abbreviated SD or s, that you met in Section 2.1. Because sums
of squares really are easy to work with mathematically, the SD offers important
advantages that other measures of spread don’t give you. You will learn more
about these advantages in Chapter 7. The formula for the standard deviation, s, is
given in the box.
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
The square of the standard deviation, $s^2$, is called the variance.
It might seem more natural to divide by n to get the average of the squared deviations. In fact, two versions of the standard deviation formula are used: One divides by the sample size, n; the other divides by n − 1. Dividing by n − 1 gives a slightly larger value. This is useful because otherwise the standard deviation computed from a sample would tend to be smaller than the standard deviation of the population from which the sample came. (You will learn more about this in Chapter 7.) In practice, dividing by n − 1 is almost always used for real data even if they aren’t a sample from a larger population.
Calculators might label the two versions σn and σn−1, or σn and s.
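To see how the two versions compare, here is a minimal Python sketch. The function names and the `divide_by_n` switch are hypothetical. The data are the seven Westvaco ages that remain after Round 2, used here only as a convenient small data set.

```python
import math

# Ages of the seven Westvaco hourly workers remaining after Round 2.
AGES_AFTER = [25, 33, 35, 38, 48, 55, 56]


def sd_from_sum_of_squares(sum_sq, n, divide_by_n=False):
    """Standard deviation from a known sum of squared deviations."""
    return math.sqrt(sum_sq / (n if divide_by_n else n - 1))


def standard_deviation(values, divide_by_n=False):
    """Square root of the sum of squared deviations from the mean,
    divided by n - 1 (the default) or by n."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sd_from_sum_of_squares(sum_sq, n, divide_by_n)


print(round(standard_deviation(AGES_AFTER), 2))                    # divides by n - 1
print(round(standard_deviation(AGES_AFTER, divide_by_n=True), 2))  # divides by n
```

Dividing by n − 1 always gives the larger of the two values, and the gap shrinks as n grows.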
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{196}{10 - 1}} \approx 4.67$$
[You can organize the steps of calculating the standard deviation on your calculator.
See Calculator Note 2E.]
■
Your calculator will compute the summary statistics for a set of data. [See
Calculator Note 2F.] Here are the summary statistics for the domesticated mammal
longevity data. Note that the standard deviation calculated in the previous
example is denoted as Sx. Note also that the five-number summary is shown.
$$\bar{x} = \frac{\sum x \cdot f}{n}$$
The standard deviation is given by
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 \cdot f}{n - 1}}$$
where n is the sum of the frequencies, or $n = \sum f$.
$$\bar{x} = \frac{\sum x \cdot f}{n} = \frac{40}{10} = 4$$
Display 2.51 Steps in computing the mean of a frequency table.
Display 2.52 gives an extended version of the table, designed to organize the
steps in computing both the mean and the standard deviation.
          Value   Frequency
Coin        x         f       x · f   x − x̄   (x − x̄)²   (x − x̄)² · f
Penny       1         5         5      −3        9           45
Nickel      5         3        15       1        1            3
Dime       10         2        20       6       36           72
Sum                  10        40                            120
$$s = \sqrt{\frac{\sum (x - \bar{x})^2 \cdot f}{n - 1}} = \sqrt{\frac{120}{9}} \approx 3.65$$
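The frequency-table formulas can be checked with a short Python sketch that uses the coin data from the table (five pennies, three nickels, two dimes). The function name and the list-of-pairs layout are assumptions made for illustration.

```python
import math

# (value x in cents, frequency f) for the pennies, nickels, and dimes.
COINS = [(1, 5), (5, 3), (10, 2)]


def frequency_table_summary(table):
    """Return n, the mean, and the standard deviation (n - 1 version)
    for a table of (value, frequency) pairs."""
    n = sum(f for _, f in table)
    mean = sum(x * f for x, f in table) / n
    sum_sq = sum((x - mean) ** 2 * f for x, f in table)
    return n, mean, math.sqrt(sum_sq / (n - 1))


n, mean, sd = frequency_table_summary(COINS)
print(n, mean, round(sd, 2))   # 10 4.0 3.65
```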
Practice
Measures of Center
P14. Find the mean and median of these ordered lists.
a. 1 2 3 4
b. 1 2 3 4 5
c. 1 2 3 4 5 6
d. 1 2 3 4 5 . . . 97 98
e. 1 2 3 4 5 . . . 97 98 99
P15. Five 3rd graders, all about 4 ft tall, are standing together when their teacher, who is 6 ft tall, joins the group. What is the new mean height? The new median height?
P16. The stemplots in Display 2.53 show the life expectancies (in years) for females in the countries of Africa and Europe. The means are 53.6 years for Africa and 79.3 years for Europe.
a. Find the median life expectancy for each set of countries.
b. Is the mean or the median smaller for each distribution? Why is this so?
Stem-and-leaf of Life Exp Africa
N = 56 Leaf Unit = 1.0
  2   3 | 55
  4   3 | 77
  4   3 |
  5   4 | 1
  9   4 | 2233
 14   4 | 44455
 20   4 | 666666
 27   4 | 8888999
 (2)  5 | 00
 27   5 | 2333
 23   5 | 455
 20   5 | 677
 17   5 | 8999
 13   6 |
 13   6 | 22
 11   6 | 4
 10   6 | 6
  9   6 |
  9   7 |
  9   7 | 222
  6   7 | 455
  3   7 | 6
  2   7 | 8
  1   8 | 0
Stem-and-leaf of Life Exp Europe
N = 41 Leaf Unit = 1.0
  2   7 | 22
  5   7 | 455
 13   7 | 66667777
 19   7 | 888999
 (9)  8 | 000111111
 13   8 | 2222223333
  3   8 | 444
6 | 8 represents 68 years
Display 2.53 Female life expectancies in Africa and Europe. [Source: Population Reference Bureau, World Population Data Sheet, 2005.]
Exercises
E29. The mean of a set of seven values is 25. Six of the values are 24, 47, 34, 10, 22, and 28. What is the 7th value?
E30. The sum of a set of values is 84, and the mean is 6. How many values are there?
E31. Three histograms and three boxplots appear in Display 2.58. Which boxplot displays the same information as
a. histogram A?
b. histogram B?
c. histogram C?
[Display 2.58: histograms A, B, and C (vertical axis: Frequency) and boxplots I, II, and III.]
The dot plot in Display 2.63 shows that the temperatures are centered at about
32°F, with an outlier at 22°F. The spread and shape are hard to determine with
only seven values.
Display 2.63 Dot plot for record low temperatures in degrees Fahrenheit for seven capitals. [Horizontal axis: Temperature (°F).]
Display 2.66 Number of viewers of prime-time television shows in a particular week. [Horizontal axis: Number of Viewers (in millions).]
The printout in Display 2.67 gives the summary statistics for all 101 shows.
Variable N Mean Median StDev
Ratings 101 11.187 10.150 9.896
Variable Min Max Q1 Q3
Ratings 2.320 76.260 6.160 12.855
Display 2.69 Cumulative relative frequency plot of SAT I critical reading scores and percentiles, 2004–2005. [Axes: SAT I Critical Reading Score versus Percentile.] [Source: The College Board, www.collegeboard.org.]
Practice
Which Summary Statistic?
P26. A community in Nevada has 9751 households, with a median house price of $320,000 and a mean price of $392,059.
a. Why is the mean larger than the median?
b. The property tax rate is about 1.15%. What total amount of taxes will be assessed on these houses?
c. What is the average amount of taxes per house?
P27. A news release at www.polk.com stated that the median age of cars being driven in 2004 was 8.9 years, the oldest to date. The median was 8.3 years in 2000 and 7.7 years in 1995.
a. Why were medians used in this news story?
b. What reasons might there be for the increase in the median age of cars? (The median age in 1970 was only 4.9 years!)
b. each child grows 2 in.
Exercises
E47. Discuss whether you would use the mean or the median to measure the center of each set of data and why you prefer the one you chose.
a. the prices of single-family homes in your neighborhood
b. the yield of corn (bushels per acre) for a sample of farms in Iowa
c. the survival time, following diagnosis, of a sample of cancer patients
deviation of the original data to the formula for the standard deviation of the recentered data.
Display 2.72 Record high temperatures for the 50 U.S. states. [Horizontal axis: High Temperature (°F).] [Source: National Climatic Data Center, 2002, www.ncdc.noaa.gov.]
from degrees Fahrenheit, F, to degrees Celsius, C, using the formula
$$C = \frac{5(F - 32)}{9}$$
If you make a histogram of the temperatures in degrees Celsius, how will it differ from the one in Display 2.72?
b. The summary statistics in Display 2.73 are for record high temperatures in degrees Fahrenheit. Make a similar table for the temperatures in degrees Celsius.
E53. The cumulative relative frequency plot in Display 2.74 shows the amount of change carried by a group of 200 students. For example, about 80% of the students had $0.75 or less.
Display 2.74 Cumulative percentage plot of amount of change. [Horizontal axis: Amount of Change (in cents); vertical axis: 0% to 100%.]
a. From this plot, estimate the median amount of change.
The normal distribution with mean 0 and standard deviation 1 is called the
standard normal distribution. In this distribution, the variable along the
horizontal axis is called a z-score.
Display 2.77 The percentage of values less than z = 1.23.
Solution
Think of 1.23 as 1.2 + 0.03. In Table A on pages 824–825, find the row labeled 1.2 and the column headed .03. Where this row and column intersect, you find the number .8907. That means that 89.07% of standard normal scores are less than 1.23.
Excerpt from Table A (tail probability p):
z       .02     .03     .04
1.20   .8888   .8907   .8925
The total area under the curve is 1, so the proportion of values greater than z = 1.23 is 1 − 0.8907, or 0.1093, which is 10.93%.
■
A graphing calculator will give you greater accuracy in finding the proportion
of values that lie between two specified values in a standard normal distribution.
For example, you can find the proportion of values that are less than 1.23 in a
standard normal distribution like this:
[To learn more about calculating the proportion of values between two
z-scores, see Calculator Note 2I.]
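If you are working in Python rather than on a calculator, the same proportions can be found with the standard library's `statistics.NormalDist` class (Python 3.8 or later). This is a sketch in place of the calculator steps, not the calculator procedure itself, and `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def proportion_below(z):
    """Proportion of standard normal values less than z."""
    return STANDARD_NORMAL.cdf(z)


def proportion_between(z_low, z_high):
    """Proportion of standard normal values between two z-scores."""
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)


print(round(proportion_below(1.23), 4))        # compare with .8907 in Table A
print(round(1 - proportion_below(1.23), 4))    # proportion above z = 1.23
print(round(proportion_between(-1, 1), 4))     # within one SD of the mean
```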
Display 2.78 The z-score that corresponds to the 75th percentile.
Look for .7500 in the body of Table A. No value in the table is exactly equal to .7500. The closest value is .7486. The value .7486 sits at the intersection of the row labeled .60 and the column headed .07, so the corresponding z-score is roughly 0.60 + 0.07, or 0.67.
Excerpt from Table A (tail probability p):
z      .06     .07     .08
.60   .7454   .7486   .7517
■
You can use a graphing calculator to find the 75th percentile of a standard
normal distribution like this:
[To learn more about finding the z-score that has a specified proportion of values
below it, see Calculator Note 2J.]
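In Python, going from a proportion back to a z-score can be sketched with the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 or later). The helper name `z_for_percentile` is hypothetical.

```python
from statistics import NormalDist


def z_for_percentile(percentile):
    """z-score that has the given percentage of standard normal values below it."""
    return NormalDist().inv_cdf(percentile / 100)


print(round(z_for_percentile(75), 2))   # compare with 0.67 from Table A
```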
Alaska had 88 deaths per 100,000 residents from heart disease, and 111 from
cancer. Explain which death rate is more extreme compared to other states. [Source:
Centers for Disease Control, National Vital Statistics Report, vol. 53, no. 5, October 12, 2004.]
Solution
$$z_{\text{heart}} = \frac{88 - 238}{52} \approx -2.88$$
$$z_{\text{cancer}} = \frac{111 - 196}{31} \approx -2.74$$
Alaska’s death rate for heart disease is 2.88 standard deviations below the mean.
The death rate for cancer is 2.74 standard deviations below the mean. These rates
are about equally extreme, but the death rate for heart disease is slightly more
extreme.
■
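The two z-scores in this solution can be reproduced with a small Python sketch. The means (238 and 196) and standard deviations (52 and 31) are the values used in the solution, and `z_score` is a hypothetical function name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_heart = z_score(88, mean=238, sd=52)
z_cancer = z_score(111, mean=196, sd=31)

print(round(z_heart, 2), round(z_cancer, 2))
# The rate with the larger absolute z-score is the more extreme one.
print("heart disease" if abs(z_heart) > abs(z_cancer) else "cancer")
```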
Solution
First make a sketch of the situation, as in Display 2.79. Draw a normal shape
above a horizontal axis. Place the mean in the middle on the axis. Then mark and
label the points that are two standard deviations either side of the mean, 64.7 and
75.5, so that about 95% of the values lie between them. Next, mark and label the
points that are one and three standard deviations either side of the mean (67.4
and 72.8, and 62 and 78.2). Finally, estimate the location of the given value of x
and mark it on the axis.
90% of the values lie within 1.645 standard deviations of the mean.
95% of the values lie within 1.96 (or about 2) standard deviations of the mean.
99.7% (or almost all) of the values lie within 3 standard deviations of the mean.
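These central percentages can be checked numerically. The sketch below uses Python's `statistics.NormalDist` (Python 3.8 or later), and `central_proportion` is a hypothetical helper name.

```python
from statistics import NormalDist


def central_proportion(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


for k in (1.645, 1.96, 3):
    print(k, round(central_proportion(k), 4))
```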
Practice
The Standard Normal Distribution
P32. Find the percentage of values below each given z-score in a standard normal distribution.
a. 2.23 b. 1.67 c. 0.40 d. 0.80
P33. Find the z-score that has the given percentage of values below it in a standard normal distribution.
a. 32% b. 41% c. 87% d. 94%
Exercises
E59. What percentage of values in a standard normal distribution fall
a. below a z-score of 1.00? 2.53?
b. below a z-score of 1.00? 2.53?
c. above a z-score of 1.5?
d. between z-scores of −1 and 1?
E60. On the same set of axes, draw two normal curves with mean 50, one having standard deviation 5 and the other having standard deviation 10.
E61. Standardizing. Convert each of these values to standard units, z. (Do not use a calculator. These are meant to be done in your head.)
a. x = 12, mean 10, SD 1
b. x = 12, mean 10, SD 2
c. x = 12, mean 9, SD 2
d. x = 12, mean 9, SD 1
e. x = 7, mean 10, SD 3
f. x = 5, mean 10, SD 2
E62. Unstandardizing. Find the value of x that was converted to the given z-score.
a. z = 2, mean 20, SD 5
b. z = 1, mean 25, SD 3
c. z = 1.5, mean 100, SD 10
d. z = 2.5, mean 10, SD 0.2
24-year-old males in the United States. About how many are between 67 and 68 in. tall?
c. Find the height of 18- to 24-year-old males that falls at the 90th percentile.
Chapter Summary
Distributions come in various shapes, and the appropriate summary statistics (for
center and spread) usually depend on the shape, so you should always start with a
plot of your data.
Common symmetric shapes include the uniform (rectangular) distribution
and the normal distribution. There are also various skewed distributions. Bimodal
distributions often result from mixing cases of two kinds.
Dot plots, stemplots, and histograms show distributions graphically and let
you estimate center and spread visually from the plot.
For approximately normal distributions, you ordinarily use the mean (balance
point) and standard deviation as the measure of center and spread. If you know
the mean and standard deviation of a normal distribution, you can use z-scores
and Table A or your calculator to find the percentage of values in any interval.
The mean and standard deviation are not resistant—their values are sensitive
to outliers. For a description of a skewed distribution, you should consider using
the median (halfway point) and quartiles (medians of the lower and upper halves
of the data) as summary statistics.
Later on, when you make inferences about the entire population from a
sample taken from that population, the sample mean and standard deviation will
be the most useful summary statistics, even if the population is skewed.
Review Exercises
E75. The map in Display 2.83, from the U.S. National Weather Service, gives the number of tornadoes by state, including the District of Columbia.
[Display 2.83: map of the United States showing the number of tornadoes in each state.]
f. Describe the shape, center, and spread of the distribution of the number of tornadoes.
E76. Display 2.84 shows some results of the Third International Mathematics and Science study for various countries. Each case is a school.
[Display 2.84: Hours per Year by School for Australia, Canada, Czech Republic, Hong Kong, Israel, Korea, New Zealand, Norway, Thailand, England, United States, and Singapore.]
 10    192 | 56
(25)   193 | 0000114446666666666666667
 15    194 |
 15    195 | 24444
 10    196 | 1
  9    197 | 55
  7    198 | 35
  5    199 | 44445
Group   Mean    Median   StdDev
A       76.44   78        4.15
B       72      72        5.24
C       52.20   49       11.04
Display 2.85 Life expectancies for the countries of Africa, Europe, and the Middle East. [Horizontal axis: Life Expectancy (yr).] [Source: Population Reference Bureau, World Population Data Sheet, 2005.]
Display 2.86 Stem-and-leaf plots of record low and high temperatures of states. [Source: National Climatic Data Center, 2002.]
E81. A distribution is symmetric with approximately equal mean and median. Is it necessarily the case that about 68% of the values are within one standard deviation of the mean? If yes, explain why. If not, give an example.
E82. Display 2.87 shows two sets of graphs. The first set shows smoothed histograms I–IV for four distributions. The second set shows the corresponding cumulative relative frequency plots, in scrambled order A–D. Match each plot in the first set with its counterpart in the second set.
[Display 2.87 panels: Distributions I, II, III, and IV; Cumulative relative frequency plots A, B, C, and D.]
Display 2.87 Four distributions with different shapes and their cumulative relative frequency plots.
E83. The average number of pedestrian deaths annually for 41 metropolitan areas is given in Display 2.88.
Metro Area            Average Annual Deaths
Atlanta                84
Baltimore              66
Boston                 22
Charlotte, NC          29
Chicago               180
Cincinnati             23
Cleveland              36
Columbus, OH           20
Dallas                 76
Denver                 28
Detroit               107
Fort Lauderdale        58
Houston               101
Indianapolis           24
Kansas City            27
Los Angeles           299
Miami                 100
Milwaukee              19
Minneapolis            35
Nassau-Suffolk, NY     80
Newark, NJ             51
New Orleans            47
New York              310
Norfolk, VA            25
Orlando, FL            48
Philadelphia          120
Phoenix                79
Pittsburgh             33
Portland, OR           34
Riverside, CA          92
Rochester, NY          17
Sacramento, CA         37
Salt Lake City         28
San Antonio            37
San Diego              96
San Francisco          43
San Jose, CA           33
Seattle                37
St. Louis              51
Tampa                  85
Washington, DC         98
Display 2.88 Average annual pedestrian deaths. [Source: Environmental Working Group and the Surface Transportation Policy Project. Compiled from National Highway Traffic Safety Administration and U.S. Census data. USA Today, April 9, 1997.]
E86. For the countries of Europe, many average life expectancies are approximately the same, as you can see from the stemplot in Display 2.53 on page 69. Use the formulas for the summary statistics of values in a frequency table to compute the mean and standard deviation of the life expectancies for the countries of Europe.
E87. Construct a set of data in which all values are larger than 0, but one standard deviation below the mean is less than 0.
E88. Without computing, what can you say about the standard deviation of this set of values: 4, 4, 4, 4, 4, 4, 4, 4?
E89. In this exercise, you will compare how dividing by n versus n − 1 affects the SD for various values of n. So that you don’t have to compute the sum of the squared deviations each time, assume that this sum is 400.
a. Compare the standard deviation that would result from
i. dividing by 10 versus dividing by 9
ii. dividing by 100 versus dividing by 99
iii. dividing by 1000 versus dividing by 999
b. Does the decision to use n or n − 1 in the formula for the standard deviation matter very much if the sample size is large?
E90. If two sets of test scores aren’t normally distributed, it’s possible to have a larger z-score on Test II than on Test I yet be in a lower percentile on Test II than on Test I. The computations in this exercise will illustrate this point.
a. On Test I, a class got these scores: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20. Compute the z-score and the percentile for the student who got a score of 19.
b. On Test II, the class got these scores: 1, 1, 1, 1, 1, 1, 1, 18, 19, 20. Compute the z-score and the percentile for the student who got a score of 18.
c. Do you think the student who got a score of 19 on Test I or the student who got a score of 18 on Test II did better relative to the rest of the class?
E91. The average income, in dollars, of people in each of the 50 states was computed for 1980 and for 2000. Summary statistics for these two distributions are given in Display 2.92.
                      1980      2000
Mean                  9,725    28,336
Standard Deviation    1,503     4,413
Minimum               7,007    21,007
Lower Quartile        8,420    25,109
Median                9,764    28,045
Upper Quartile       10,746    30,871
Maximum              14,866    41,495
Display 2.92 Summary statistics of the average income, in dollars, for the 50 states for 1980 and 2000. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2004–2005.]
a. Explain the meaning of $7,007 for the minimum in 1980.
b. Are any states outliers for either year?
c. In 2000 the average personal income in Alabama was $23,768, and in 1980 it was $7,836. Did the income in Alabama change much in relation to the other states? Explain your reasoning.
E92. For these comparisons, you will either use the SAT I critical reading scores in Display 2.69 on page 78 or assume that the scores have a normal distribution with mean 505 and standard deviation 111.
a. Estimate the percentile for an SAT I critical reading score of 425 using the cumulative relative frequency plot. Then find the percentile for a score of 425 using a z-score. Are the two values close?
b. Estimate the SAT I critical reading score that falls at the 40th percentile, using the table in Display 2.69. Then find the 40th percentile using a z-score. Are the two values close?
c. Estimate the median from the cumulative relative frequency plot. Is this value close to the median you would get by assuming a normal distribution of scores?
safely—that is, the hit results in a player advancing to a base—usually reported to three decimal places.)
[Histogram: frequency versus batting average, .150 to .400]
AP1. These summary statistics are for the distribution of the populations of the major cities in Brazil.
Variable      N    Mean    Median  TrMean  StDev   SEMean
Population    222  381056  191348  261985  820246  55051
Variable      Min     Max       Q1      Q3
Population    100049  10009231  129542  324323
Which of the following best describes the shape of this distribution?
skewed right without outliers
skewed right with at least one outlier
roughly normal, without outliers
skewed left without outliers
skewed left with at least one outlier
AP2. Which of these lists contains only summary statistics that are sensitive to outliers?
mean, median, and mode
standard deviation, IQR, and range
mean and standard deviation
median and IQR
five-number summary
AP3. This stem-and-leaf plot shows the ages of CEOs of 60 corporations whose annual sales were between $5 million and $350 million. Which of the following is not a correct statement about this distribution?
The distribution is skewed left (towards smaller numbers).
The oldest of the 60 CEOs is 74 years old.
The distribution has no outliers.
The range of the distribution is 42.
The median of the distribution is 50.
Stem-and-leaf of AGE N = 60
Leaf Unit = 1.0
3 23
3 678
4 013344
4 55556677788889
5 000000112333
5 555666677889
6 0111223
6 99
7 04
AP4. A traveler visits Europe and stays thirty days in thirty different hotels, paying each day with her credit card. The hotels charged a mean price of 50 euros, with a standard deviation of 10 euros. When the charges appear on her credit card statement in the United States, she finds that her bank charged her $1.20 per euro, plus a $5 fee for each transaction. What is the mean and standard deviation of the thirty daily hotel charges in dollars, including the fee?
mean $50, standard deviation $17
mean $60, standard deviation $12
mean $60, standard deviation $17
mean $65, standard deviation $12
mean $65, standard deviation $17
AP5. The scores on a nationally administered test are approximately normally distributed with mean 47.3 and standard deviation 17.3. Approximately what must a student have scored to be in the 95th percentile nationally?
55
61
73
76
81
AP6. A particular brand of cereal boxes is labeled “16 oz.” This dot plot shows the actual weights of 100 randomly selected boxes. Which of the following is the best estimate of the standard deviation of these weights?
[Dot plot: weight of cereal, 15.8 to 16.4]
0.04 oz.
0.1 oz.
0.2 oz.
0.4 oz.
between 16.0 and 16.2 oz.
AP7. The distribution of the number of points earned by the thousands of contestants in the Game of Pig World Championship has mean 20 and standard deviation 6. What proportion of the contestants earned more than 26 points?
[Scatterplot: graduation rate versus SAT 75th percentile]
What variables contribute to a college having a high graduation rate? Scatterplots, correlation, and regression are the basic tools used to describe relationships between two quantitative variables.
In Chapter 2, you compared the speeds of predators and nonpredators. Not
surprisingly, among mammals meat eaters were usually faster than vegetarians.
Some nonpredators, however, such as the horse (48 mi/h) and the elk (45 mi/h),
were faster than some predators, such as the dog (39) and the grizzly (30). Because
of this variability, comparing the two groups was a matter for statistics; that is, you
needed suitable plots and summaries.
The comparison involved a relationship between two variables, one
quantitative (speed) and one categorical (predator or not). In this chapter, you’ll
learn how to explore and summarize relationships in which both variables are
quantitative. The data set on mammals in Display 2.24 on page 43 raises many
questions of this sort: Do mammals with longer average longevity also have
longer maximum longevity? Is there a relationship between speed and longevity?
The approach to describing distributions in Chapter 2 boiled down to finding
shape, center, and spread. For distributions that are approximately normal, two
numerical summaries—the mean for center, the standard deviation for spread—
tell you basically all you need to know. When comparing two quantitative
variables, you can see the shape of the distribution by making a scatterplot. For
scatterplots with points that lie in an oval cloud, it turns out once again that two
summaries tell you pretty much all you need to know: the regression line and the
correlation. The regression line tells about center: What is the equation of the line
that best fits the cloud of points? The correlation tells about spread: How spread
out are the points around the line?
[Scatterplot: year of birth versus year of hire]
Display 3.1 Year of birth versus year of hire for the 50 employees
in Westvaco Corporation’s engineering department.
In this scatterplot, you can see a moderate positive association: Employees
hired in an earlier year generally were born in an earlier year, and employees hired
in a later year generally were born in a later year. This trend is fairly linear. You
can visualize a summary line going through the center of the data from lower left
to upper right. As you move to the right along this line, the points fan out and
cluster less closely around the line.
Sometimes it’s easier to think about people’s ages than about the years
they were born. The scatterplot in Display 3.2 shows the ages of the Westvaco
employees at the time layoffs began plotted against the year they were hired. This
scatterplot shows a moderate negative association: Those people hired in later
years generally were younger at the time of the layoffs than people hired in
earlier years.
[Scatterplot: age at layoffs versus year of hire]
Display 3.2 Age at layoffs versus year of hire for the 50 employees
in Westvaco Corporation’s engineering department.
[Scatterplot: dormitory population versus urban population (in thousands), with NY, CA, and TX labeled]
Display 3.3 Number of people living in college dormitories
versus number of people living in cities for the
50 states in the United States. [Source: U.S. Census Bureau,
2000 Census of Population and Housing.]
Solution
1. Variables and cases. The scatterplot plots dormitory population against urban
population, in thousands, for the 50 U.S. states. Dormitory population
ranges from near 0 to a high of more than 174,000 in New York. The
urban population ranges from near 0 to about 17 million in Texas and
New York and 32 million in California.
2. Shape. While most states follow a linear trend, the three states with the largest
urban population suggest curvature in the plot because, for those states,
the number of people living in dormitories is proportionately lower than in
the smaller states. California can be considered an outlier with respect to its
urban population, which is much larger than that of other states. It is also
an outlier with respect to the overall pattern, because it lies far below the
generally linear trend.
3. Trend. The trend is positive—states with larger urban populations tend to
have larger dormitory populations, and states with smaller urban populations
tend to have smaller dormitory populations.
4. Strength. The relationship varies in strength. For the states with the smallest
urban populations, the points cluster rather closely around a line. For the
states with the largest urban populations, the points are scattered farther from
the line. Overall, the strength of the relationship is moderate.
5. Generalization. The 50 states aren’t a sample from a larger population of
cases, so the relationship here does not generalize to other cases. Because
both variables tend to change rather slowly, however, we can expect the
relationship in Display 3.3 to be similar to that of other years.
[Scatterplot: dormitory population versus proportion of population living in cities, with MA labeled]
Display 3.4 The proportion of people living in college
dormitories versus the proportion of people living
in cities for the 50 U.S. states.
Practice
Describing the Pattern in a Scatterplot
P1. Growing kids. This table gives median heights of boys at ages 2, 3, 4, 5, 6, and 7 yr.
Age (yr)   Height (in.)
2          35.8
3          39.1
4          41.4
5          44.2
6          46.8
7          49.6
a. Scatterplot. Plot height versus age; that is, put height on the y-axis and age on the x-axis.
b. Shape, trend, and strength. Describe the shape, trend, and strength of the relationship.
c. Generalization. Would you expect these data to allow you to make good predictions of the median height of 8-year-olds? Of 50-year-olds?
d. Explanation. It doesn’t quite fit to say that age “causes” height, but there is still an underlying cause-and-effect relationship. How would you describe it?
P2. Late planes and lost bags. A great way to cap off a long day of travel is to have your plane arrive late and then find that the airline has lost your luggage. As Display 3.5 shows, some airlines handle baggage better than others.
[Scatterplot: percentage of on-time arrivals versus mishandled baggage (per thousand passengers) for America West, United, Southwest, US Airways, Continental, JetBlue, American, Delta, Alaska, and Northwest]
Display 3.5 On-time arrivals versus mishandled baggage. [Source: U.S. Department of Transportation, Air Travel Consumer Report, October 2005.]
a. Which airline has the worst record for mishandled baggage? For being on time?
b. Where on the plot would you find the airline with the best on-time record and the best mishandled-baggage rate? Which airlines are best in both categories?
c. Determine whether this statement is true or false, and explain your answer: American had a mishandled-baggage rate that was more than twice the rate of Southwest.
d. Is there a positive or a negative relationship between the on-time percentage and the rate of mishandled baggage? Is it strong or weak?
e. Would you expect the relationship in this plot to generalize to some larger population of commercial airlines? Why or why not? Would you expect the relationship in this plot to be roughly the same for data from 10 years ago? For next year?
[Scatterplots e–h of y versus x]
Display 3.6 Eight scatterplots with various distributions.
[Photo: LaTasha Colander crosses the finish line of the women’s 100-meter dash final at the 2004 U.S. Olympic Team Track and Field Trials.]
III. moderate, roughly linear, positive relationship
IV. moderate negative relationship
[Scatterplot: average SAT math score versus percentage taking exam]
6.750 20.75 7.25 6.00 35
Display 3.12 Data on passenger aircraft. [Source: Air Transport Association of America, 2005,
www.air-transport.org.]
Lines as Summaries
You’ve seen the equation of a line, $y = \text{slope} \cdot x + y\text{-intercept}$, so the review here
will be brief. Linear relationships have the important property that for any two
points $(x_1, y_1)$ and $(x_2, y_2)$ on the line, the ratio
\[ \frac{\text{rise}}{\text{run}} = \frac{\text{change in } y}{\text{change in } x} = \frac{y_2 - y_1}{x_2 - x_1} \]
is a constant. This ratio is the slope of the line. The rise and run are illustrated
in Display 3.13, where the slope is the ratio of the two sides of the right triangle.
This ratio is the same for any two points on the line because all the triangles
formed are similar.
[Diagram: right triangle drawn on a line, with the run (change in x) as its horizontal side]
Display 3.13 $\text{rise}/\text{run} = \text{slope}$.
How thick is a single sheet of your book? One sheet alone is too thin to
measure directly with a ruler, but you could measure the thickness of 100 sheets
together, then divide by 100. This method would give you an estimate of the
thickness but no information about how much your estimate is likely to vary from
the true thickness. The approach in the next activity lets you judge precision as
well as thickness.
The next example illustrates how to find the equation of a line when you know
two points that fall on the line.
[Scatterplot with summary line: minimum wage versus year, 1960 to 2010]
Display 3.14 Minimum wages at five-year intervals, 1960
through 2005.
Solution
In theory, you can find the slope from any two points on the line. Here, however,
you have to estimate the coordinates from the graph. In such cases, you usually
can produce a better estimate of the slope by choosing two points that are far
apart. For this plot, choosing the points on the line for the years 1960 and 2000
works well. Approximate points are (1960, 0.80) for (x1, y1) and (2000, 4.80) for
(x2, y2). The estimated slope is
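\[ \text{slope} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{4.80 - 0.80}{2000 - 1960} = \frac{4.0}{40} = 0.10 \]
To find the y-intercept, substitute the point (1960, 0.80) and the slope into the equation of the line:
\[ y = \text{slope} \cdot x + y\text{-intercept} \]
\[ 0.80 = 0.10(1960) + y\text{-intercept} \]
\[ y\text{-intercept} = 0.80 - 196 = -195.20 \]
In statistics this equation usually is written with the intercept first, becoming
\[ y = -195.20 + 0.10x \]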
■
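The two-point calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` is hypothetical, and the points are the approximate coordinates read from the graph in the example, (1960, 0.80) and (2000, 4.80).

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = slope * x1 + intercept
    return slope, intercept


# Approximate points estimated from the minimum wage plot.
slope, intercept = line_through((1960, 0.80), (2000, 4.80))
print(round(slope, 2), round(intercept, 2))  # approximately 0.10 and -195.20
```

Choosing points that are far apart, as the solution recommends, keeps a small reading error in y from being magnified when it is divided by a small run.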
[Plot: CPI versus year, 1970 to 2010]
Display 3.15 CPI at five-year intervals, 1970–2005.
The equation y = −195.20 + 0.10x models the rise in the minimum wage for the years 1960 through 2005. Knowing
this equation enables you to make a general statement about the minimum wage
throughout these years: “The minimum wage went up roughly $0.10 per year.”
You might instead want to use the line to predict the minimum wage in one of
the years for which no amount is given or for years before 1960 or after 2005.
Assuming the linear trend continues back to earlier years, the predicted minimum
wage for 1950 is
\[ y = -195.20 + 0.10x = -195.20 + 0.10(1950) = -0.20 \]
■
The predicted minimum wage for 2003 is very close to the actual minimum
wage of $5.15 per hour. But the actual minimum wage in 1950 was $0.75 per hour,
not a negative number! As you can see, making the assumption that the linear
trend continues can be risky. This type of prediction, making a prediction when
the value of x falls outside the range of the actual data, is called extrapolation.
Interpolation—making a prediction when the value of x falls inside the range of
the data, as does 2003—is safer.
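A short Python sketch can make the contrast between interpolation and extrapolation concrete. It assumes the fitted line y = −195.20 + 0.10x from the example and treats 1960 through 2005 as the range of the data; the names `predict_min_wage` and `DATA_RANGE` are hypothetical.

```python
def predict_min_wage(year):
    """Predicted minimum wage, in dollars per hour, from the fitted line."""
    return -195.20 + 0.10 * year


DATA_RANGE = (1960, 2005)  # years covered by the plotted data

for year in (1950, 2003):
    inside = DATA_RANGE[0] <= year <= DATA_RANGE[1]
    kind = "interpolation" if inside else "extrapolation"
    print(year, kind, round(predict_min_wage(year), 2))
```

The extrapolated value for 1950 comes out near −0.20, which is impossible for a wage, while the interpolated value for 2003 is near 5.10, close to the actual $5.15.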
Suppose you know the value of x and use a line to predict the corresponding
value of y. You know that your prediction for y won’t be exact, but you hope
that the error will be small. The prediction error is the difference between the
observed value of y and the predicted value of y, or ŷ. You usually don’t know
what that error is. If you did, you wouldn’t need to use the line to predict the value
of y. You do, however, know the errors for the points used to construct the line.
These differences are called residuals:
\[ \text{residual} = \text{observed value of } y - \text{predicted value of } y = y - \hat{y} \]
[Margin note: ŷ is read “y-hat” and may be called the “predicted” value or the “fitted” value.]
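[Diagram: a point at height y above the fitted line, with the residual marked as the vertical distance from ŷ to y]
Display 3.16 Residual $= y - \hat{y}$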
The equation of the fitted line is $\hat{y} = -8300.6 + 4.2248x$, where x is the year
and ŷ is the income in thousands of dollars.
Graph the fitted line with the data points. What is the residual for the year
1996?
Solution
You can use a graphing calculator to graph a scatterplot with a summary line.
[See Calculator Note 3B.]
The actual net income value for 1996 was $139,000. Using the equation of the
fitted line, the prediction for 1996 is
\[ \hat{y} = -8300.6 + 4.2248x = -8300.6 + 4.2248(1996) = 132.1008, \text{ or } \$132{,}101 \]
You also can use your calculator to calculate a predicted value quickly. [See
Calculator Note 3C.]
To find the residual, subtract the predicted value from the observed value:
\[ y - \hat{y} = 139 - 132.1008 = 6.8992 \]
or about $6899. The residual is positive because the observed value is higher than
the predicted value. That is, the point lies above the line.
You can use a calculator to calculate residuals for all points in a data set
simultaneously. [See Calculator Note 3D.]
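The residual calculation for 1996 can be checked with a minimal Python sketch. It assumes the fitted line is ŷ = −8300.6 + 4.2248x, with signs inferred from the arithmetic in the example, and that income is measured in thousands of dollars; the function name `predicted_income` is hypothetical.

```python
def predicted_income(year):
    """Fitted value, in thousands of dollars, from the line in the example."""
    return -8300.6 + 4.2248 * year


observed = 139.0                  # actual 1996 net income, in thousands of dollars
fitted = predicted_income(1996)   # predicted value, y-hat
residual = observed - fitted      # observed minus predicted
print(round(fitted, 4), round(residual, 4))
```

A positive residual means the observed point lies above the line; a negative residual would mean it lies below.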
\[ SSE = \sum (\text{residual})^2 = \sum (y - \hat{y})^2 \]
[You can use your calculator to calculate the SSE quickly. See Calculator Note 3E.]
The first equation has the smaller SSE, so it must be the equation of the least
squares regression line. Note that for this line, except for rounding error, the sum
of the residuals, ∑(y − ŷ), is equal to 0. This is always the case for the least squares
regression line, but it can be true for other lines, too.
■
[You can use a calculator program to visually explore the least squares regression line
and SSE. See Calculator Note 3F.]
In addition to making the sum of the squared errors as small as possible, the
least squares regression line has some other properties, given in the box on the
next page.
The ratio of the sums of the last two columns gives the slope
\[ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{80000}{5000} = 16 \]
y-intercept: Now that you have a point on the line, $(100, 1966.\overline{6})$, and the
slope, 16, you can find the y-intercept from the equation
\[ b_0 = \bar{y} - b_1 \bar{x} = 1966.\overline{6} - 16(100) = 366.\overline{6} \]
This agrees with what you found in the previous example. That is, the
equation of the least squares regression line (with rounded y-intercept) is
\[ \hat{y} = 367 + 16x \]
[You also can use your calculator to find the equation of the least squares line. See
Calculator Note 3G.]
■
b. Draw another line that passes between the two points at x = 0 and also
passes between the two points at x = 2. Compute the sum of the absolute
values of the residuals for this line and compare it to your sum from
part a.
c. Draw yet another line that passes between the two points at x = 0 and
also passes between the two points at x = 2. Find the sum of the absolute
values of the residuals for this line and compare it to your sums from
parts a and b.
d. Draw a line that does not pass between the two points at x = 0. Find the
sum of the absolute values of the residuals for this line and compare it to
your sums from parts a and b.
e. Now find the least squares regression line and compute the sum of the
squared residuals. Compute the sum of the squared residuals for your
lines in parts b, c, and d. What can you conclude?
f. Find the standard deviation of the residuals for the least squares
regression line and for the lines in parts b, c, and d. What can you
conclude?
Display 3.18 Data Desk output giving the equation of the least
squares line for the minimum wage data.
[Plot against year, 1990 to 2002]
Analysis of Variance
SOURCE DF SS MS F p
Regression 1 1858.1 1858.1 81.52 0.000
Error 8 182.3 22.8
Total 9 2040.4
The least squares regression line for a set of pairs (x, y) is the line for which
the sum of squared errors, or SSE, is as small as possible. For this line, these
properties hold:
• The sum (and mean) of the residuals is 0.
• The line contains the point of averages, $(\bar{x}, \bar{y})$.
• The variation in the residuals is as small as possible.
• The line has slope $b_1$, where
\[ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \]
To find the equation of the regression line,
• compute $\bar{x}$ and $\bar{y}$
• find the slope using the formula for $b_1$
• compute the y-intercept: $b_0 = \bar{y} - b_1 \bar{x}$
The equation is $\hat{y} = b_0 + b_1 x$. Remember to use a hat, ŷ, to indicate a predicted
value of y.
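The three steps in the box can be sketched in plain Python. This is an illustrative sketch, not the calculator procedure the text refers to: the function name `least_squares` is hypothetical, and the data are the median heights of boys at ages 2 through 7 from P1, chosen only because they appear in full in this chapter.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = sxy / sxx             # slope formula from the box
    b0 = y_bar - b1 * x_bar    # forces the line through the point of averages
    return b0, b1


ages = [2, 3, 4, 5, 6, 7]                         # years
heights = [35.8, 39.1, 41.4, 44.2, 46.8, 49.6]    # inches

b0, b1 = least_squares(ages, heights)
residuals = [y - (b0 + b1 * x) for x, y in zip(ages, heights)]
sse = sum(e ** 2 for e in residuals)
print(round(b0, 2), round(b1, 2))
print(round(sse, 3))
```

Summing `residuals` gives a value that is zero up to rounding error, which matches the first property in the box.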
Practice
Lines as Summaries
P3. Display 3.22 shows the weight of a student’s pink eraser, in grams, plotted against the number of days into the school year. Estimate the slope of the line drawn on the graph. Interpret the slope in the context of the situation. [Source: Zach’s Eraser, CMC ComMuniCator, 28 (June 2004): 28.]
[Scatterplot with fitted line: eraser weight (g) versus days into the school year]
a. Estimate the slope of the line.
b. What does the slope tell you?
c. Estimate the equation of the line.
d. Students were instructed to measure their “hand width” with their fingers spread apart as far as possible. The scatterplot shows a smaller cloud of points below the main one. Why do you think that is the case? What would happen to the regression line if those points were removed?
[Scatterplot: hand width versus hand length (in.)]
Display 3.23 Hand width and hand length, in inches, for 383 students.
which is the response variable?
b. Explain how you can see from the graph that an increase of five students per faculty member corresponds to a decrease of about 10 percentage points in the giving rate. Explain how you can see this from the equation of the fitted line.
c. Does the y-intercept have a useful interpretation in this situation?
P8. The JMP-IN computer output in Display 3.26 is for the pizza data in P6. Does it give the same results that you computed by hand? Where in the output is the SSE found?
Calories = 279.75 + 2.75 Fat
Summary of Fit
RSquare                       0.975806
RSquare Adj                   0.951613
Root Mean Square Error        1.224745
Mean of Response              310
Observations (or Sum Wgts)    3
Analysis of Variance
Parameter Estimates
Exercises
E9. Display 3.27 shows cost in dollars per hour versus number of seats for three aircraft models. Five lines, labeled A–E, are shown on the plot. Their equations, listed below, are labeled I–V.
a. Match each line (A–E) with its equation (I–V).
V. cost = 900 + 10 seats
b. Match each line (A–E) with the appropriate verbal description (I–V):
I. This line overestimates cost.
II. This line underestimates cost.
III. This line overestimates cost for the smallest plane and underestimates cost for the largest plane.
IV. This line underestimates cost for the smallest plane and overestimates cost for the largest plane.
V. On balance, this line gives a better fit than the other lines.
[Three Planes scatterplot: cost versus seats, with lines A–E]
Display 3.27 Cost in dollars per hour versus number of seats for three aircraft models.
I. calories 70 15 fat
II. calories 10 25 fat
III. calories 150 15 fat
IV. calories 110 15 fat
[Pizza Nutrition Data scatterplot: calories versus fat (grams), with lines A–E]
Display 3.29 Five possible fitted lines for the pizza data.
[Scatterplot: median height versus age (yr)]
Display 3.30 Median height versus age for boys. [Source: National Health and Nutrition Examination Survey (NHANES), 2002, www.cdc.gov.]
a. Estimate the slope of the line that summarizes the relationship between age and median height.
b. Explain the meaning of the slope with respect to boys and their median height.
c. Write the equation of the line using the slope from part a and a point on the line.
d. Interpret the y-intercept. Does the interpretation make sense in this context?
Little Caesar’s Original Round   8      230
Little Caesar’s Deep Dish        14.2   350
Pizza Hut’s Stuffed Crust        15     370
[Scatterplot: calories for pizzas including Domino’s Deep Dish, Pizza Hut’s Stuffed Crust, and Little Caesar’s Deep Dish]
Display 3.32 Reaction distance at various speeds.
a. Plot reaction distance versus speed, with speed on the horizontal axis. Describe the shape of the plot.
b. What should the y-intercept be?
c. Find the slope of the line of best fit by calculating the change in y per
d. Find the person on the plot with the largest residual. What was the concentration of arsenic in that person’s toenails?
e. The World Health Organization has set a standard that the concentration of arsenic
Display 3.35 Air quality index for 2001–2003. [Source: U.S. Environmental Protection Agency, www.epa.gov.]
a. By hand, compute the equation of the least squares line.
In Activity 3.3a, you will learn more about correlation and you’ll practice
finding the value of r using your calculator.
[Two scatterplots: NumberOfBlogsInMillions versus MonthsFromMarch03. Left: blog growth with a linear regression. Right: blog growth with graph of an exponential equation, $y = 0.353 \cdot 1.140^x$.]
Display 3.44 The number of blogs, in millions, versus the number of months after March 2003. [Source: State of the Blogosphere, October 2005, Part 1: Blogosphere Growth, posted by Dave Sifry, October 17, 2005. Technorati News, www.technorati.com. Table numbers estimated from graph.]
The quiz scores for 22 students in Display 3.45, on the other hand, have a
correlation, r, of only 0.48. There is quite a bit of scatter, partly because the quizzes
covered very different topics. Quiz 2 covered exponential growth, and Quiz 3
covered probability. In spite of the scatter, a line is the most appropriate model
because there is no curvature in the pattern of data points.
[Margin note: The moral: Always plot your data before computing summary statistics!]
[Scatterplot: Quiz3 versus Quiz2]
Display 3.45 Scores on two consecutive 30-point quizzes.
where $s_x$ is the standard deviation of the x’s and $s_y$ is the standard deviation of
the y’s. This means that if you standardize the data so that $s_x = 1$ and $s_y = 1$,
then the slope of the regression line is equal to the correlation.
Solution
The formula gives an estimate of the slope of
\[ b_1 = r \cdot \frac{s_y}{s_x} = 0.7 \cdot \frac{115}{113} \approx 0.71 \]
To find the y-intercept, use the fact that the point $(\bar{x}, \bar{y}) = (508, 520)$ is on the
regression line:
\[ y = \text{slope} \cdot x + y\text{-intercept} \]
\[ 520 = 0.71(508) + y\text{-intercept} \]
\[ y\text{-intercept} = 159.32 \]
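The same calculation can be sketched in Python, starting from summary statistics alone. The function name `line_from_summary` is hypothetical, and the inputs (r = 0.7, standard deviations 113 and 115, point of averages (508, 520)) are the values read from the solution above.

```python
def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (b0, b1) for the regression line, given summary statistics."""
    b1 = r * s_y / s_x        # slope = correlation times ratio of SDs
    b0 = y_bar - b1 * x_bar   # line passes through the point of averages
    return b0, b1


b0, b1 = line_from_summary(r=0.7, s_x=113, s_y=115, x_bar=508, y_bar=520)
print(round(b1, 4), round(b0, 2))
```

Carrying the unrounded slope through gives an intercept near 158.1. The solution's 159.32 comes from rounding the slope to 0.71 before substituting, so a small discrepancy of this kind is rounding, not an error in the method.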
Even if you can’t identify a lurking variable, you should be careful to avoid
jumping to a conclusion about cause and effect when you observe a strong
relationship. The value of r does not tell you anything about why two variables
are related. The statement “Correlation does not imply causation” can help you
remember this. To conclude that one thing causes another, you need data from a
randomized experiment, as you’ll learn in the next chapter.
[Margin note: Two variables might be highly correlated without one causing the other.]
Interpreting r²
You might have noticed that computer outputs for regression analysis, like that in
Display 3.18 on page 127, give the value of R-squared, or r², rather than the value
of r. The student in this discussion will show you how to think about r² as the
fraction of the variation in the values of y that you can eliminate by taking x into
account.
Alexis: Exactly. Except now I have a problem. My ratio is near 0 when the
relationship is strong, and it’s near 1 when the relationship is weak.
That’s backward! Oh, I see how to fix it. Just subtract my ratio
from 1.
\[ 1 - \frac{SSE}{SST} = \frac{SST - SSE}{SST} \]
Statistician: Good. Your new ratio is near 1 when the relationship is strong and
near 0 when the relationship is weak. Your old ratio, SSE/SST, gave
the proportion of error still there after the regression, so your new
ratio . . .
Alexis: I can handle it from here. SST is the total error I started with.
SST minus SSE is the amount of error I get rid of by using the
relationship of y with x. So my new ratio is the proportion of error
I eliminate by using the regression.
Statistician: Right!
Alexis: But now I have two measures of strength—the correlation, r, and
my new ratio. Which one should I use?
Statistician: Lucky for you—with a little algebra, they turn out to be equivalent.
\[ r^2 = \frac{SST - SSE}{SST} \]
Alexis: Cool!
Statistician: We statisticians call r² the coefficient of determination. It tells us
the proportion of variation in the y’s that is “explained” by x.
Alexis: I like it. Anything’s better than those z-scores!
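The equivalence the statistician mentions can be checked numerically. The Python sketch below computes the correlation r directly and, separately, the ratio (SST − SSE)/SST from the least squares line, then compares r² with the ratio. The function name `strength_measures` is hypothetical, and the data are the median heights of boys from P1 in Section 3.1, used here only as a convenient small data set.

```python
import math

ages = [2, 3, 4, 5, 6, 7]                         # years
heights = [35.8, 39.1, 41.4, 44.2, 46.8, 49.6]    # inches


def strength_measures(xs, ys):
    """Return (r, (SST - SSE) / SST) for the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)   # error when predicting with the mean
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Error that remains after using the regression line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * sst)            # correlation
    return r, (sst - sse) / sst


r, ratio = strength_measures(ages, heights)
print(round(r ** 2, 6), round(ratio, 6))
```

The two printed values agree up to rounding error, which is what the algebra promises for any data set with a least squares line.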
Predicting Pizzas
Display 3.46 compares two sets of predictions of the calorie content in the seven
kinds of pizza from E12 on page 134.
[Scatterplot: calories versus fat (g), showing squared deviations around the mean of y]
Display 3.47 Squared deviations around the mean of y.
If you use the regression equation, calories = 112 + 14.9 fat, to predict
calories, the resulting errors are much smaller in most cases and are given in the
second to last column of Display 3.46 and shown on the plot in Display 3.48.
[Scatterplot: calories versus fat (g), showing squared deviations around the least squares line]
Display 3.48 Squared deviations around the least squares line.
DISCUSSION Interpreting r²
D21. The scatterplot in Display 3.49 shows IQ plotted against head circumference,
in centimeters, for a sample of 20 people. The mean IQ was 101, and the
mean head circumference was 56.125 cm. The correlation is 0.138.
[Scatterplot: IQ versus head circumference (HeadCircum)]
Display 3.49 [Source: M. J. Tramo, W. C. Loftus, R. L. Green, T. A. Stukel, J. B.
Weaver, and M. S. Gazzaniga, “Brain Size, Head Size, and IQ in
Monozygotic Twins,” Neurology 50 (1998): 1246–52.]
[Two scatterplots: younger sister’s height (in.) versus older sister’s height (in.), each with the line y = x; the second also shows the regression line]
Display 3.50 Scatterplots showing the regression effect.
DISCUSSION Regression Toward the Mean
D24. Why is the regression line sometimes called the “line of means”?
D25. The equation of the regression line for the scatterplot in Display 3.50
is ŷ = 43.102 + 0.337x. Interpret the slope of this line in the context
of the situation and compare it to the interpretation of the slope of the
line y = x.
Geometrically, the correlation measures how tightly packed the points of the
scatterplot are about the regression line.
• The correlation has no units and ranges from −1 to 1. It is unchanged if you
interchange x and y or if you make a linear change of scale in x or y, such as
from feet to inches or from pounds to kilograms.
• In assessing correlation, begin by making a scatterplot and then follow these
steps:
1. Shape: Is the plot linear, shaped roughly like an elliptical cloud, rather
than curved, fan-shaped, or formed of separate clusters? If so, draw
an ellipse to enclose the cloud of points. The data should be spread
throughout the ellipse; otherwise, the pattern might not be linear or
might have unusual features that require special handling. You should
not calculate the correlation for patterns that are not linear.
2. Trend: If your ellipse tilts upward to the right, the correlation is
positive; if it tilts downward to the right, the correlation is negative. The
relationship between the correlation and the slope, b1, of the regression
line is given by
\[ b_1 = r \cdot \frac{s_y}{s_x} \]
Practice
Estimating the Correlation P10. The table in E12 (Display 3.31 on page
P9. By comparing to the plots in Display 3.41 on 134) gives the amount of fat and number of
page 140, match each of the five scatterplots calories in various pizzas.
in Display 3.51 with its correlation, choosing a. Guess a value for the correlation, r.
from 0.95, 0.5, 0, 0.5, and 0.95. b. Calculate r using your calculator.
a. b. A Formula for the Correlation, r
5 5
4 4
P11. Eight artificial “data sets” are shown here.
For each one, find the value of r, without
y 3 y 3 computing if possible. Drawing a quick
2 2
sketch might be helpful.
1 1
a. x y b. x y
0 1 2 3 4 5 0 1 2 3 4 5
x x 1 1 1 1
0 0 0 1
c. d. 1 1 1 0
6 5
5 4 c. x y d. x y
4
y 3 y 3 1 0 1 1
2 1 0 0 0
2
1 1
1 1 1 1
e.  x     y
    99    9
    100   10
    101   11
f.  x     y
    15    30
    20    40
    25    20
g.  x     y
    1003  80
    1006  82
    1009  81
h.  x     y
    9.9   1000
    10.0  2000
    10.1  0
Display 3.51 Five scatterplots.
page 142, and use the formula to find r. What do you notice about the
products $z_x \cdot z_y$?
P13. The scatterplot in Display 3.52 is divided into quadrants by vertical
and horizontal lines that pass through the point of averages,
$(\bar{x}, \bar{y})$.
Display 3.52 Scatterplot divided into quadrants (I–IV) at the point of
averages, $(\bar{x}, \bar{y})$.
a. Is the correlation positive or negative?
b. Give the coordinates of the point that will contribute the most to the
correlation, r.
c. Consider the product
$$ \frac{x - \bar{x}}{s_x} \cdot \frac{y - \bar{y}}{s_y} $$
Where are the points that have a positive product? How many of the
30 points have a positive product?
d. Where are the points that have a negative product? How many of the
30 points have a negative product?
Correlation and the Appropriateness of a Linear Model
P14. Both plots in Display 3.53 have a correlation of 0.26. For each plot,
is fitting a regression line (as shown on the plot) an appropriate thing
to do? Why or why not?
Display 3.53 Two scatterplots (y1 versus x and y2 versus x) with the same
correlation.
The Relationship Between the Correlation and the Slope
P15. Imagine a scatterplot of two sets of exam scores for students in a
statistics class. The score for a student on Exam 1 is graphed on the
x-axis, and his or her score on Exam 2 is graphed on the y-axis. The slope
of the regression line is 0.368. The mean of the Exam 1 scores is 72.99,
and the standard deviation is 12.37. The mean of the Exam 2 scores is
75.80, and the standard deviation is 7.00.
a. Find the correlation of these scores.
b. Find the equation of the regression line for predicting an Exam 2 score
from an Exam 1 score. Predict the Exam 2 score for a student who got a
score of 80 on Exam 1.
cases of stomach cancer per year in the city, you find a high correlation.
a. What is the lurking variable?
b. How would you adjust the data for the lurking variable to get a more
meaningful comparison?
P17. If you take a random sample of public school students in grades K–12
and measure weekly allowance and size of vocabulary, you will find a
strong relationship. Explain in terms of a lurking variable why you should
not conclude that raising a student’s allowance will tend to increase his
or her vocabulary.
P18. For the countries of the United Nations, there is a strong negative
relationship between the number of TV sets per thousand people and the
birthrate. What would be a careless conclusion about cause and effect?
What is the lurking variable?
[Scatterplot of Poverty Percentage versus High School Graduation Percentage.]
The regression equation is
Poverty = 64.8 – 0.621 HSG
Predictor   Coef       Stdev     t-ratio   p
Constant    64.781     6.803     9.52      0.000
HSG         –0.62122   0.07902   –7.86     0.000
s = 2.082   R-sq = 55.8%   R-sq(adj) = 54.9%
Analysis of Variance
SOURCE       DF   SS       MS       F       p
Regression   1    267.88   267.88   61.81   0.000
Error        49   212.37   4.33
Total        50   480.25
Display 3.55 The heights of older sisters versus the heights of their
younger sisters. [Horizontal axis: Younger Sister’s Height (in.)]
Display 3.56 Exam scores. [Exam 2 Score versus Exam 1 Score]
Exercises
E27. Each scatterplot in Display 3.57 was made on the same set of axes.
Match each scatterplot with its correlation, choosing from 0.06, 0.25,
0.40, 0.52, 0.66, 0.74, 0.85, and 0.90.
Display 3.57 Eight scatterplots (a–h) with various correlations.
E28. Estimate the correlation between the variables in these scatterplots.
a. The proportion of the state population living in dorms versus the
proportion living in cities in Display 3.4 on page 109.
b. The graduation rate versus the 75th percentile of SAT scores in E5 on
page 113.
c. The college graduation rate versus the percentage of students in the
top 10% of their high school graduating class in E5 on page 113.
E29. For each set of pairs, (x, y), compute the correlation by hand,
standardizing and finding the average product.
a. (2, 1), (1, 1), (0, 0), (1, 1), (2, 1)
b. (2, 2), (0, 2), (0, 3), (0, 4), (2, 4)
E30. For each artificial data set in P11 on page 155, compute the
correlation by hand, standardizing and finding the average product.
E31. The scatterplot in Display 3.58 shows part of the hat size data of E6
on page 113. The plot is divided into quadrants by vertical and horizontal
lines that pass through the point of averages, $(\bar{x}, \bar{y})$.
Display 3.58 Head circumference, in inches, versus hat size.
a. Estimate the value of the correlation.
b. Using the idea of standardized scores, explain why the correlation is
positive.
c. Identify the point that contributes the most to the correlation.
Explain why the contribution it makes is large.
d. Identify a point that contributes little to the correlation. Explain
why the contribution it makes is small.
E32. The ellipses in Display 3.59 represent scatterplots that have a basic
elliptical shape.
A. B. C.
weak.
ii. One s_x is larger than the other, the s_y’s are equal, and the
correlations are strong.
E33. Several biology students are working together to calculate the
correlation for the relationship between air temperature and how fast a
cricket chirps. They all use the same crickets and temperatures, but some
measure temperature in degrees Celsius and others measure it in degrees
Fahrenheit. Some measure chirps per second, and others measure chirps per
minute. Some use x for temperature and y for chirp rate, while others have
it the other way around.
a. Will all the students get the same value for the slope of the least
squares line? Explain why or why not.
b. Will they all get the same value for the correlation? Explain why or
why not.
Temp F   pts/pers
47       0.327
32       0.288
24       0.269
28       0.256
26       0.286
32       0.298
40       0.329
55       0.318
63       0.381
72       0.381
72       0.470
67       0.443
60       0.386
44       0.342
40       0.319
32       0.307
27       0.284
28       0.326
33       0.309
41       0.359
52       0.376
64       0.416
71       0.437
[Scatterplot of pints per person versus Mean Temperature (°F).]
Regression Analysis
The regression equation is
pts/pers = 0.202 + 0.00306 Temp F
Predictor   Coef        Stdev       t-ratio   p
Constant    0.20200     0.01452     13.91     0.000
Temp F      0.0030567   0.0002791   10.95     0.000
s = 0.02457   R-sq = 81.6%   R-sq(adj) = 80.9%
Analysis of Variance
SOURCE       DF   SS         MS         F        p
Regression   1    0.072436   0.072436   119.96   0.000
Error        27   0.016304   0.000604
Total        28   0.088740
Display 3.60 Data table, scatterplot, and regression analysis for the effects of outside
temperature on ice cream consumption. [Source: Koteswara Rao Kadiyala, “Testing for
the Independence of Regression Disturbances,” Econometrica 38 (1970): 97–117.]
a. Use the values of SST and SSE in the regression analysis to compute r,
the correlation for the relationship between the temperature in degrees
Fahrenheit and the number of pints of ice cream consumed per person. Check
your answer against R-sq in the analysis.
b. Compute the value of the residual that is largest in absolute value.
c. Is there a cause-and-effect relationship between the two variables?
d. What are the units for each of x, y, b1, and r?
e. The letters MS stand for “mean square.” How do you think the MS is
computed?
Display 3.61 Scatterplot of number of seats versus fuel consumption (gal/h)
for passenger aircraft.
E42. A few years ago, a school in New Jersey tested all its 4th graders to
select students for a program for the gifted. Two years later, the
students were retested, and the school was shocked to find that the scores
of the gifted students had dropped, whereas the scores of the other
students had remained, on average, the same. What is a likely explanation
for this disappointing development?
3.4 Diagnostics: Looking for Features That the Summaries Miss 163
Example: Influential Mammals
The average elephant lives 35 years. The oldest elephant on record lived 70 years.
The average hippo lives 41 years—longer than the average elephant—but the
record-holding hippo lived only 54 years. The oldest-known beaver lived 50 years,
almost as long as the champion hippo, but the average beaver cashes in his wood
chips after only 5 short years of making them. Other mammals, however, are
more predictable. If you look at the entire sample, shown in Display 3.62, it
turns out that the elephant (E), hippo (H), and beaver (B) are the oddballs of
the bunch. For the rest, there’s an almost linear relationship between average
longevity and maximum longevity. The least squares line for the entire sample
has the equation
M̂ = 10.53 + 1.58A
where M̂, or “M-hat,” stands for predicted maximum longevity and A stands for
observed average longevity. For every increase of 1 year in average longevity, the
model predicts a 1.58-year increase in maximum longevity. The correlation for
the relationship between these two variables is 0.77. How much influence do the
oddballs have on these summaries?
Points surrounded by white space might have strong influence.
Display 3.62 Maximum longevity versus average longevity. [The elephant (E),
hippo (H), and beaver (B) are labeled on the plot.]
Solution
The hippo has the effect of pulling the right end of the regression line downward
(like putting a heavy weight on one end of a seesaw), as you can see in Display 3.63.
When the hippo is removed, that end of the regression line will “spring upward”
and the slope will increase. Because one large residual has been removed and many
of the remaining residuals have been reduced in size, the correlation will increase.
The new slope is 1.96, and the new correlation is 0.80. The hippo has considerable
influence on the slope and some influence on the correlation.
Now envision the scatterplot with just the elephant, E, missing. Because E
is close to the straight line fit to the data, it produces a small residual. Thus, you
would expect that removing E should not change the slope of the regression line
much (not nearly as much as removing H did) and should reduce the correlation
just a bit. In fact, the correlation does decrease some, to 0.72 from 0.77. However,
the new slope is 1.53. It turns out that removing the elephant gives the hippo even
more influence, and the slope decreases.
[Two scatterplots of Maximum_Longevity versus Average_Longevity, each with
a regression line.
First plot (H and B labeled):
Maximum_Longevity = 1.58 Average_Longevity + 10.5; r² = 0.59
Second plot (B labeled), which also shows the first line for comparison:
Maximum_Longevity = 1.96 Average_Longevity + 6.3; r² = 0.64]
With a little practice, you often can anticipate the influence of certain points
in a scatterplot, as in the previous example, but it is difficult to state general rules.
The best rule is the one given in the box on page 163: Fit the line with and without
the questionable point and see what happens. Then report all the results, with
appropriate explanations.
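The rule of fitting the line with and without a questionable point can be sketched in a few lines. This is only an illustration under stated assumptions: the data are invented (they are not the mammal data), the last point is deliberately placed far to the right of the others, and the helper name `summarize` is hypothetical.

```python
import numpy as np

# Hypothetical data; the last point is surrounded by white space.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 15.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.1, 14.0])

def summarize(x, y):
    """Return the least squares slope, intercept, and correlation."""
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r

# Fit once with every point, then again with the questionable point removed.
with_point = summarize(x, y)
without_point = summarize(x[:-1], y[:-1])

for label, (slope, intercept, r) in [("with point", with_point),
                                     ("without point", without_point)]:
    print(f"{label:>14}: slope = {slope:.2f}, "
          f"intercept = {intercept:.2f}, r = {r:.3f}")
```

In this made-up case the isolated point pulls the right end of the line downward, so removing it raises both the slope and the correlation. Reporting both sets of results, as the text recommends, lets the reader judge how much one point matters.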
[Plots III (y3 versus x3) and IV (y4 versus x4).]
Display 3.64 Four regression data sets invented by Francis J.
Anscombe. [Source: Francis J. Anscombe, “Graphs in Statistical
Analysis,” American Statistician 27 (1973): 17–21.]
D26. For each plot in Display 3.64, first give a short verbal description of the
pattern in the plot. Then
a. either fit a line by eye and estimate its slope or tell why you think a line is
not a good summary
b. either estimate the correlation by eye or tell why you think a correlation
is not an appropriate summary
D27. Display 3.65 shows a computer output for one of the four Anscombe data set
plots. Can you tell which one? If so, tell how you know. If not, explain why
you can’t tell.
Residual plots may uncover more detailed patterns.
A special kind of scatterplot, called a residual plot, often can help you see
more clearly what’s going on. For some data sets, a residual plot can even show
you patterns you might otherwise have overlooked completely. Statisticians use
residual plots the way a doctor uses a microscope or an X ray—to get a better
look at less obvious aspects of a situation. (Plots you use in this way are called
“diagnostic plots” because of the parallel with medical diagnosis.) Push the
analogy just a little. You’re the doctor, and data sets are your patients. Sets with
elliptical clouds of points are the “healthy” ones; they don’t need special attention.
Display 3.67 Scatterplot of airline data. [Horizontal axis: Mishandled
Baggage (per thousand passengers)]
The calculated residuals are shown in Display 3.68, with the list of carriers
ordered from smallest to largest on the x-scale. This allows the size of the residuals
in the far right column to appear in the same order as in Display 3.67. Alaska
produces a negative residual of modest size, whereas US Airways produces a large
positive residual.
The residual plot, Display 3.69, is simply a scatterplot of the residuals versus
the original x-variable, mishandled baggage. Note that 0 is at the middle of the
residuals on the vertical scale.
Display 3.69 Residual plot for the airline data. [Residual versus
Mishandled Baggage (per thousand passengers)]
The residual plot shows nearly random scatter, with no obvious trends. This
is the ideal shape for a residual plot, because it indicates that a straight line is a
reasonable model for the trend in the original data. [You can use your calculator to
create residual plots. See Calculator Note 3I.]
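For readers working in software rather than on a calculator, the residual calculation can be sketched as follows. The numbers are invented for illustration and are not the airline data; the variable names are hypothetical.

```python
import numpy as np

# Hypothetical (x, y) values, invented for illustration.
x = np.array([4.0, 5.0, 5.5, 6.0, 7.0, 8.0])
y = np.array([80.0, 76.0, 78.0, 74.0, 73.0, 68.0])

# Fit the least squares line, then compute residual = observed - predicted.
slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x
residuals = y - predicted

# A residual plot is a scatterplot of these (x, residual) pairs.
for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x = {xi:4.1f}  y = {yi:5.1f}  predicted = {pi:6.2f}  residual = {ri:6.2f}")

# For a least squares line, the residuals sum to zero and show no
# linear trend against x, which is why 0 sits in the middle of the plot.
print(f"sum of residuals: {residuals.sum():.6f}")
```

Any remaining pattern in the (x, residual) pairs, such as a curve or a fan shape, is a departure from the linear model rather than something the line has already captured.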
D30. To see how residual plots magnify departures from the regression line,
compare the Anscombe plots in Display 3.64 with Display 3.70, which shows
the four corresponding residual plots in scrambled order.
[Four residual plots, A–D, each plotting residual versus x.]
Display 3.70 Residual plots for the four Anscombe data sets.
a. Match each of the original scatterplots in Display 3.64 with its
corresponding residual plot in Display 3.70.
b. Describe the overall difference between the original scatterplots and
the residual plots. What do the scatterplots show that the residual plots
don’t? What do the residual plots show that the scatterplots don’t?
Residual plots sometimes yield surprises.
Use residual plots to check for systematic departures from constant slope
(linear trend) and constant strength (same vertical spread). Look in particular
for plots that are curved or fan-shaped. It’s true that for data sets with only one
predictor variable (like those in this chapter), you often can get a good idea of what
the residual plot will look like by carefully inspecting the original scatterplot.
Once in a while, however, you get a surprise.
Solution
The residual plot, shown in Display 3.72, quite dramatically reveals that the trend
is not as linear as first imagined. The curvature in the residual plot mimics the
curvature in the original scatterplot, which is harder to see. A line is not a good
model for these data.
Display 3.72 Residual plot of median height versus age for
young girls.
■
Residuals sometimes are plotted against the predicted values, ŷ.
Statistical software often plots residuals against the predicted values, ŷ, rather
than against the predictor values, x. For simple linear regression, both plots have
exactly the same shape as long as the slope of the regression line is positive.
get from graphing calculators. The other plots residuals versus predicted
(fitted) y-values, or ŷ, the sort of plot you get from computer software
packages. Explain how the residual plots were produced and how you can
tell which residual plot is which. The equation of the least squares line is
ŷ = 0.5 + 0.5x.
[One scatterplot of y versus x and two residual plots.]
Practice
Which Points Have the Influence?
P22. The data in Display 3.74 show some interesting patterns in the
relationship between domestic and international gross income from the ten
movies with the highest domestic gross ticket sales.
a. Construct a scatterplot suitable for predicting international sales
from domestic sales. Describe the pattern in the data.
Residual Plots
P24. For the set of (x, y) pairs (0, 0), (0, 1), (1, 1),
and (3, 2), the equation of the least squares
line is y = 0.5 + 0.5x.
a. Plot the data and graph the least squares
line.
b. Next complete a table for the predicted
values and residuals, like the table in
Display 3.68 on page 169.
c. Using the values in your table, plot
residuals versus predictor, x.
d. How does the residual plot differ from the
scatterplot?
P25. Display 3.76 shows four scatterplots (A–D) for the data from a sample
of commercial aircraft. Display 3.77 shows four corresponding residual
plots (I–IV).
a. Match the residual plots to the scatterplots.
b. Using scatterplots A–D as examples, describe how you can identify each
of these in a scatterplot from the residual plot.
i. a curve with increasing slope
ii. unequal variation in the responses
iii. a curve with decreasing slope
iv. two linear patterns with different slopes
c. For one of the plots, two line segments joined together seem to give a
better fit than either a single line or a curve. Which plot is this? Is
this pattern easier to see in the original scatterplot or in the residual
plot?
[Display 3.76: four scatterplots, A–D. Horizontal axes: A. Number of Seats;
B. Speed (mi/h); C. Number of Seats, with Flight Length (mi) on the
vertical axis; D. Flight Length (mi), with Cost per Seat per Mile (¢) on
the vertical axis.]
[Four residual plots, I–IV, each plotting residual versus x.]
Display 3.77 Four residual plots corresponding to the scatterplots in Display 3.76.
[Scatterplot of ArsToenails versus ArsWater, with regression line:
ArsToenails = 13.0 ArsWater + 0.16; r² = 0.80]
this point? Perform the calculations to see if your intuition is correct.
b. Find a point that you think has almost no influence on the slope and
correlation. Perform the calculations to see if your intuition is correct.
x    y    Predicted   Residual
5    2    —?—         —?—
14   8    —?—         —?—
19   11   —?—         —?—
6    4    —?—         —?—
10   13   —?—         —?—
6    1    —?—         —?—
11   8    —?—         —?—
3    0    —?—         —?—
Display 3.80 Coded bacteria colony counts before (x) and after (y)
treatment. [Source: Snedecor and Cochran, Statistical Methods (Iowa State
University Press, 1967), p. 422.]
a. Plot the data, fit a regression line to them, and complete a copy of
the table, filling in the predicted values and residuals.
b. Plot the residuals versus x, the count before the treatment. Comment on
the pattern.
c. Use the residual plot to determine for which skin sample the
disinfectant was unusually effective and for which skin sample it was not
very effective.
E46. Textbook prices. Display 3.81 compares recent prices at a college
bookstore to those of a large online bookstore.
Type of Textbook     College Bookstore Price ($)   Online Bookstore Price ($)
Chemistry            93.40                         94.18
Classic Fiction      9.95                          7.96
English Anthology    46.70                         48.75
[Scatterplot of online bookstore price versus College Bookstore Price ($).]
Display 3.81 Prices for a sample of textbooks at a college bookstore and
an online bookstore.
a. The equation of the regression line is
online = −3.57 + 1.03 college. Interpret this equation in terms of
textbook prices.
b. Construct a residual plot. Interpret it and point out any interesting
features.
c. In comparing the prices of the textbooks, you might be more interested
in a different line: y = x. Draw this line on a copy of the scatterplot in
Display 3.81. What does it mean if a point lies above this line? Below it?
On it?
d. A boxplot of the differences college price − online price is shown in
Display 3.82. Interpret this boxplot.
Display 3.82 A boxplot of the differences between the college price and
the online price for various textbooks. [Horizontal axis: Price Difference]
E50. Can either of the plots in Display 3.86 be a residual plot? Explain
your reasoning.
[Display 3.86: two plots, A and B, each of y versus x.]
E51. Display 3.87 gives the data set for the three passenger jets from the
example on page 123, along with a scatterplot showing the least squares
line. (Values have been rounded.)
a. Use the equation of the line to find predicted values and residuals to
complete the table in Display 3.87.
b. Use your numbers from part a to construct two residual plots, one with
the predictor, x, on the horizontal axis and the other with the predicted
value, ŷ, on the horizontal axis. How do the two plots differ?
Aircraft   Seats   Cost   Predicted   Residual
ERJ-145    50      1100   —?—         —?—
DC-9       100     2100   —?—         —?—
MD-90      150     2700   —?—         —?—
Display 3.87 Cost per hour versus number of seats for three models of the
passenger aircraft.
E52. Explain why a residual plot of (x, residual) and a plot of (predicted
value, residual) have exactly the same shape if the slope of the
regression line is positive. What changes if the slope is negative?
E53. Can you recapture the scatterplot from the residual plot? The
residual plot in Display 3.88 was calculated from data showing the
recommended weight (in pounds) for men at various heights over 64 in. The
fitted weights ranged from 145 lb to 187 lb. Make a rough sketch of the
scatterplot of these data.
Display 3.88 Residuals of recommended weight versus height for men.
[Horizontal axis: Height (in.)]
E54. The plot in Display 3.89 shows the residuals resulting from fitting a
line to the data for female life expectancy (life exp) versus gross
national product (GNP, in thousands of dollars per capita) for a sample of
countries from around the world. The regression equation for the sample
data was
Display 3.89 Residuals of female life expectancy versus gross national
product. [Horizontal axis: Per Capita Gross National Product (in thousands
of dollars)]
178 Chapter 3 Relationships Between Two Quantitative Variables
3.5 Shape-Changing Transformations
For scatterplots in which the points form an elliptical cloud, the regression line
and correlation tell you pretty much all you need to know. But data don’t always
behave so obligingly. For plots in which the points are curved, fan out, or contain
outliers, the usual summaries do not tell you everything and can actually be
misleading. What do you do then? This section shows you one possible remedy:
Transform the data to get the shape you want.
You’re already familiar with linear transformations from Section 2.4—things
like changing temperatures from degrees Fahrenheit to degrees Celsius or
changing distances from feet to inches or times from minutes to seconds. These
linear transformations—adding or subtracting a constant or multiplying or
dividing by a constant—can change the center and spread of the distribution
without changing its basic shape.
Transforming data is sometimes called “re-expressing data.”
Nonlinear transformations, such as squaring each value or taking logarithms,
do change the basic shape of the plot. This section shows how a transformation of
a measurement scale can lead to simplified statistical analyses. One of the most
common nonlinear relationships is the exponential, and that is where we begin
our discussion.
Display 3.90 The exponential functions $y = 2^x$ and $y = \left(\frac{1}{2}\right)^x$.
Display 3.91 Number of heads versus number of the toss.
Does a log transformation appear to be appropriate here? The pattern looks
much like the left-hand curve in Display 3.90, and the values for the number of
heads are clustered at the smaller values but range over two orders of magnitude.
In addition, the number of heads remaining after each toss of the coins is
roughly proportional to the number of coins tossed. A log transformation is
worth a try. Display 3.92 shows the natural log (base e) of the number of heads
plotted against the toss number, along with the regression line, for the data of
Display 3.91.
Compare the scatterplot in Display 3.92 to the residual plot in Display 3.93.
(The line segments are added to help your eye follow the time sequence.) Does
the model appear to fit well if you look only at the scatterplot? How, if at all,
does the residual plot alter your judgment of how well the line fits? The cyclical
up-and-down pattern of residuals is common in such time series data.
The equation of the regression line (shown in Display 3.92) for the
transformed data is $\ln \hat{y} = 5.21 - 0.66x$. If you solve this
equation for ŷ, you get $\hat{y} = 183.1(0.52)^x$. The number of copper flippers that are
alive each day is about 52% of the number alive the previous day. In other words,
the decay rate is estimated to be 48%, or 0.48.
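The step of solving for ŷ can be verified with a short calculation. This sketch uses only the rounded coefficients 5.21 and −0.66 quoted in the text; the function name `predicted_heads` is hypothetical.

```python
import math

# Rounded coefficients of the line fitted to ln(heads) versus toss number.
intercept_log = 5.21
slope_log = -0.66

# Exponentiating both sides: y-hat = e^5.21 * (e^-0.66)^x
a = math.exp(intercept_log)   # starting value, about 183.1
b = math.exp(slope_log)       # multiplier per toss, about 0.52

def predicted_heads(toss):
    """Predicted count on the original scale for a given toss number."""
    return a * b ** toss

print(f"a = {a:.1f}, b = {b:.2f}, decay rate = {1 - b:.2f}")
for toss in range(1, 7):
    print(toss, round(predicted_heads(toss), 1))
```

The multiplier b is the fraction remaining from one toss to the next, so 1 − b is the estimated decay rate.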
Display 3.92 A plot of ln(heads) versus the number of the toss,
with the regression line.
Display 3.93 Residual plot for the ln(heads) regression.
Display 3.94 Population density of the United States over the
census years. [Source: U.S. Census Bureau, Statistical Abstract of
the United States, 2004–2005, p. 8.]
Transform and Plot Again. You can see the transformed points in Display
3.95. For example, for the year 1800, the point (1800, ln 6.1) or (1800, 1.808) is
plotted.
Fit. Although there is still some curvature, the pattern is much more nearly
linear, and a straight line might be a reasonable model to fit these data. The
regression line and regression analysis also are shown in Display 3.95 (on the
next page).
$$ e^{\ln y} = e^{-25.118 + 0.0148x} $$
Analysis of Variance
SOURCE DF SS MS F p
Regression 1 19.502 19.502 733.64 0.000
Error 20 0.532 0.027
Total 21 20.034
Display 3.95 ln (population density) versus year, with regression
line and computer output.
Residuals. In the case of data over time, it is often advantageous to plot the
residuals over time, as in Display 3.96.
Display 3.96 Residual plot of ln (population density) versus year.
The result is not exactly random scatter! Well, it is about the best you can do.
The problem is that there are subtle patterns in the data—and in the residuals—
that no simple model will adequately account for.
Speed (mi/h)   Flight Length (mi)
495            1682
525            3515
509            3559
515            2485
460            947
472            1309
497            2122
464            1175
487            1987
454            1094
454            1035
446            886
430            644
403            542
442            904
396            465
339            175
407            576
387            496
398            587
387            313
360            343
397            486
357            382
230            202
[Scatterplot of flight length versus Speed (mi/h).]
Display 3.97 Data table and scatterplot for the flight length and
speed of various aircraft. [Source: Air Transport Association
of America, 2005, www.air-transport.org.]
Analysis of Variance
SOURCE DF SS MS F p
Regression 1 19.058 19.058 263.09 0.000
Error 31 2.246 0.072
Total 32 21.303
Display 3.98 ln(flight length) versus speed.
Although the pattern of points appears much more linear and the fit looks
pretty good, the lack of randomness in the residual plot in Display 3.99 indicates
that a linear model still does not really fit the points. D38 and E58 will give you a
chance to continue the detective work.
Display 3.99 Residual plot for ln(flight length) versus speed.
■
A scatterplot of the data (Display 3.101, on the next page) shows that the
relationship is not linear. On thinking carefully about an appropriate model, you
might realize that length is a linear measure while weight is related to volume—a
cubic measure. So perhaps weight is related to the cube of length, or some power
close to that. That is, the relationship is of the form weight = a · length³, where a
is constant.
[Scatterplot of Weight versus Length for the alligators.]
$$ y = ax^b $$
as the underlying model. The points can be “linearized” (straightened) by
taking the logarithm (base 10 or base e) of both the values of x and the values
of y. The result will be a linear equation of the form
$$ \log y = \log a + b \log x $$
Thus, if ln(weight) is plotted against ln(length), the plot should be fairly linear
and the slope of the least squares line will provide an estimate of the power, b. The
result of this transformation is shown in Display 3.102. The regression equation is
ln(weight) = 3.29 ln(length) − 10.2.
The plot does indeed look linear, and the estimate of b is 3.29. (Natural logs
are used here, but logs to base 10 will produce essentially the same results.) The
biologist can use this model to predict ln(weight) from ln(length) and then change
the predicted value back to the original scale, if he or she chooses.
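Changing a prediction back to the original scale can be sketched as follows. The coefficients 3.29 and −10.2 are the rounded values from the regression equation above; the length of 100 is an illustrative choice, and the function names are hypothetical.

```python
import math

# Rounded coefficients from ln(weight) = 3.29 ln(length) - 10.2.
slope = 3.29        # estimate of the power, b
intercept = -10.2   # estimate of ln(a)

def predict_ln_weight(length):
    """Prediction on the transformed (log-log) scale."""
    return slope * math.log(length) + intercept

def predict_weight(length):
    """Same prediction on the original scale: weight = a * length^b."""
    a = math.exp(intercept)
    return a * length ** slope

length = 100  # illustrative value
ln_w = predict_ln_weight(length)
print(f"predicted ln(weight) = {ln_w:.3f}")
print(f"predicted weight     = {predict_weight(length):.1f}")
print(f"check: exp(ln_w)     = {math.exp(ln_w):.1f}")
```

Exponentiating the log-scale prediction and evaluating the power model directly give the same weight, because they are two forms of the same equation. With rounded coefficients, the back-transformed value should be treated as approximate.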
Note that the residual plot still shows a bit of curvature, mainly because the
three largest alligators are somewhat influential. But the offsetting advantage
of the power model is that the residuals are fairly homogeneous; that is, they
don’t tend to grow or shrink as length increases. This means that the error in
the prediction of weight will be relatively constant for alligators of all lengths.
The exponential model also fits these data well, but the residuals then lose their
homogeneity with no substantial decrease in their size.
Ultimately, a model should be selected based on its intended use. These
biologists wanted a model that predicts weight well for all reasonable values of
length, not just for large alligators. Furthermore, a model should make sense to
the experts in the field of use. The biologists could understand why weight (or
volume) should have a cubic relationship with length but could see no reason why
weight should grow exponentially with length.
[Scatterplot of lnWeight versus lnLength with regression line, and the
corresponding residual plot (Residual versus lnLength).
lnWeight = 3.29 lnLength − 10.2; r² = 0.94]
ln(weight) = 4.951
Display 3.104 Diameter versus age of oak trees, with regression
line. [Diameter (in.) versus Age (yr)]
Display 3.105 Residual plot for the oak tree data. [Residual versus Age (yr)]
The regression equation is
Diameter = 1.15 + 0.163 AGE
Analysis of Variance
SOURCE DF SS MS F p
Regression 1 97.593 97.593 93.63 0.000
Error 25 26.059 1.042
Total 26 123.652
Display 3.106 Numerical summary in the form of computer
output of the ages and diameters (inches at chest
height) of a sample of oak trees.
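The entries in the Analysis of Variance table are related to one another, and the relationships can be checked directly. This sketch uses only the numbers printed in Display 3.106; the variable names are hypothetical.

```python
# Values copied from the Analysis of Variance table in Display 3.106.
ss_regression = 97.593
ss_error = 26.059
ss_total = 123.652
df_regression = 1
df_error = 25

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# F is the ratio of the two mean squares.
f_stat = ms_regression / ms_error

# r-squared is the fraction of the total variation explained by the line.
r_squared = 1 - ss_error / ss_total

# The fitted slope (0.163) is positive, so r is the positive square root.
r = r_squared ** 0.5

print(f"MS error = {ms_error:.3f}, F = {f_stat:.2f}")
print(f"r-squared = {r_squared:.3f}, r = {r:.3f}")
```

The computed mean square and F agree with the printed 1.042 and 93.63 up to rounding, and the regression and error sums of squares add to the total.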
Solution
Inspection reveals that the point cloud is roughly elliptical with a slight downward
curvature. Straightening this curve will require expanding the y-scale. You can
formulate a power transformation by thinking carefully about the practical
Display 3.107 Diameter squared versus age for the oak tree data,
with the regression line. [Diameter² versus Age (yr)]
Display 3.108 Residual plot for diameter squared versus age. [Residual
versus fitted y-value]
■
Power transformations like the ones you have just seen can straighten a curved
plot (or change a fan shape to a more nearly oval shape). By choosing the right
power, you often can take a data set for which a fitted straight line and correlation
are not suitable and convert it into one for which those summaries work well.
[Margin note: Measurement scales are selected by the user, so select one that meets your needs.]
Is it cheating to change the shape of your data? (“You wanted a linear cloud,
but you got a curved wedge. You didn’t like that, so you fiddled with the data
until you got what you wanted.”) In fact, as you’ll see, changing scale is a matter
of re-expressing the same data, not replacing the data with entirely new facts.
The intelligent measurer selects a scale that is most useful for the problem the
measurements were taken to solve.
[You can use your calculator to perform all the transformations you’ve learned
about in this section. See Calculator Note 3J.]
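If you prefer software to a calculator, here is a minimal sketch of a power transformation at work. The data are made up for illustration (they are not the oak tree measurements): the response is built to grow like the square root of the explanatory variable, so squaring the response should straighten the plot. The function name `correlation` and the variable names are assumptions of this sketch.

```python
import math

def correlation(xs, ys):
    """Return the correlation r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y grows like a square root, so the plot of y versus x
# curves downward and y squared versus x is a straight line.
ages = list(range(1, 41))
diameters = [math.sqrt(2 * age + 1) for age in ages]

# Re-express the response by squaring it (a power transformation).
diameters_squared = [d ** 2 for d in diameters]

r_original = correlation(ages, diameters)
r_after_squaring = correlation(ages, diameters_squared)

print(f"r for y versus x:   {r_original:.4f}")
print(f"r for y^2 versus x: {r_after_squaring:.4f}")
```

With real data the improvement will not be this clean, so you should still check a residual plot after transforming rather than rely on r alone.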
[Figure: scatterplot of Brain Weight (g) versus Body Weight (kg).]
Display 3.112 The brain weights and body weights of mammals. [Source: T. Allison and
D. V. Cicchetti, “Sleep in Mammals: Ecological and Constitutional Correlates,” Science 194
(1976): 732–34.]
E58. Cost per seat per mile and flight length, revisited. As you saw in P25 on
page 174, when cost per seat per mile is plotted against flight length, the
pattern is not linear. The residual plot in Display 3.77 on page 174 strongly
suggests that two line segments might provide a better model than a single
line. Apparently, there is one relationship for aircraft meant for longer
routes and another for aircraft meant for shorter routes.

Display 3.114 Hunting party size and percentage of success. [Sources:
Mathematics Teacher, August 2005, p. 13; C. B. Stanford, “Chimpanzee Hunting
Behavior and Human Evolution,” American Scientist 83 (1995).]
a. Plot the data in a way that allows the building of a model to predict
success from size of hunting party. Describe the pattern you see.
Review Exercises
E67. Leonardo’s rules. A class of 15 students recorded the measurements in
Display 3.122 for Activity 3.3a.

Student   Height   Arm Span   Kneeling Height   Hand Length
   1      170.5     168.0         126.0            18.0
   2      170.0     172.0         129.5            18.0
   3      107.0     101.0          79.5            10.0
   4      159.0     161.0         116.0            16.0
   5      166.0     166.0         122.0            18.0
   6      175.0     174.0         125.0            19.5
   7      158.0     153.5         116.0            16.0
   8       95.5      95.0          71.5            10.0
   9      132.5     129.0          95.0            11.5
  10      165.0     169.0         124.0            17.0
  11      179.0     175.0         131.0            20.0
  12      149.0     154.0         109.5            15.5
  13      143.0     142.0         111.5            16.0
  14      158.0     156.5         119.0            17.5
  15      161.0     164.0         121.0            16.5
Display 3.122 Sample measurements, in centimeters, for Activity 3.3a.

a. Construct scatterplots and fit least squares lines for each of Leonardo’s
rules in Activity 3.3a. Do the rules appear to hold?
b. Interpret the slopes of your regression lines.
c. If appropriate, find the value of r for each of the three relationships.
Which correlation is strongest? Which is weakest?
E68. Space Shuttle Challenger. On January 28, 1986, because two O-rings did not
seal properly, Space Shuttle Challenger exploded and seven people died. The
temperature predicted for the morning of the flight was between 26°F and 29°F.
The engineers were concerned that the cold temperatures would cause the rubber
O-rings to malfunction. On seven previous flights at least one of the twelve
O-rings had shown some distress. The NASA officials and engineers who decided
not to delay the flight had available to them data like those on the
scatterplot in Display 3.123 before they made that decision.
Temperature   O-Rings with     Temperature   O-Rings with
   (°F)       Some Distress       (°F)       Some Distress
    53              3              69              0
    57              1              70              0
    58              1              70              0
    63              1              72              0
    70              1              73              0
    70              1              75              0
    75              2              76              0
    66              0              76              0
    67              0              78              0
    67              0              79              0
    67              0              80              0
    68              0              81              0
Display 3.124 Challenger O-ring data. [Source: Siddhartha R. Dalal et al.,
“Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure,”
Journal of the American Statistical Association 84 (1989): 945–47.]

[Figure: scatterplot of Exam 2 versus Exam 1, and residual plot of Residual versus Exam 1.]
Display 3.125 Data for exam scores in a statistics class, with scatterplot and
residual plot.
a. Is there a point that is more influential than the other points on the
slope of the regression line? How can you tell from the scatterplot? From the
residual plot?
b. How will the slope change if the scores for this one influential point are
removed from the data set? How will the correlation change? Calculate the slope
and correlation for the revised data to check your estimate.
A   B   C   D   E   F
4   2   2   0   6   0
3   1   0   2   5   1
2   0   0   0   5   0
1   1   0   2   5   1
0   2   2   0   4   0
0   2   1   1   4   0
1   1   1   1   5   1
2   0   1   1   5   0
3   1   1   1   5   1
4   2   0   0   6   0
[Figure: scatterplot matrix with one panel for each ordered pair of the variables A through F.]
Display 3.126 Data table for 6 variables and a “scatterplot matrix” of all 30 possible
scatterplots for the variables.
Display 3.128 Number of police officers and related variables. [Source: U.S. Census Bureau,
Statistical Abstract of the United States, 2004–2005.]
Display 3.129 Selling prices of houses in Gainesville, Florida. [Source: Gainesville Board of
Realtors, 1995.]
E83. Spending for schools. Display 3.130 provides data on spending and other
variables related to public school education for 2001. The variables are
defined as
ExpPP      expenditure per pupil (in dollars)
ExpPC      expenditure per capita (per person in the state, in dollars)
TeaSal     average teacher salary (in thousands of dollars)
%Dropout   percentage who drop out of school
Enroll     number of students enrolled (in thousands)
Teachers   number of teachers (in thousands)
a. Examine the association between per-pupil expenditure and average teacher
salary, with the goal of predicting per-pupil expenditure. Is this a
cause-and-effect relationship?
b. Analyze the effect of average teacher salary on per-capita expenditure
(spending on public schools divided by the number of people in the state).
Compare the association to the association in part a. Are the relative sizes
of the correlations about what you would expect?
c. Are any variables good predictors of the percentage of dropouts? Explain
your reasoning.
State ExpPP ExpPC TeaSal %Dropout Enroll Teachers
Alabama 6669 1097 38.2 4.1 737 46.5
Alaska 10366 2165 49.7 8.2 134 8.1
Arizona 6547 1109 40.9 10.9 922 45.1
Arkansas 7080 1177 37.8 5.3 450 31.8
California 8442 1507 56.3 n/a 6249 309.8
Colorado 9092 1499 42.7 n/a 742 45.4
Connecticut 12605 2078 55.4 3 570 43.2
Delaware 11776 1681 50.8 4.2 116 7.7
Florida 8192 1227 40.3 4.4 2500 141
Georgia 9727 1675 45.5 7.2 1471 95.9
Hawaii 8092 1207 44.5 5.7 185 11.2
Idaho 6883 1266 40.1 5.6 247 13.8
Illinois 11371 1871 51.5 6 2071 133.7
Indiana 10131 1639 45 n/a 996 59.5
Display 3.130 Public school education by state in 2001. [Source: U.S. Census Bureau, Statistical
Abstract of the United States, 2004–2005.]
AP1. This scatterplot shows the age in years of the oldest and youngest child
in 116 households with two or more children age 18 or younger living with
their parents. (Some points have been moved slightly to show that multiple
households are at each coordinate.) Which of the following is not a
reasonable interpretation of this scatterplot?
[Figure: scatterplot with Age of Youngest Child on the vertical axis.]
AP3. In a linear regression of the heights of a group of trees versus their
circumferences, the pattern of residuals is U-shaped. Which of the following
must be true?
I. A nonlinear regression would be a better model.
II. For trees near the middle of the range of tree circumferences studied, the
predicted tree height tends to be too tall.
III. r will be close to 0.
b. The variable skinfold is the sum of a number of skinfold thicknesses taken
at various places on the body. (The units are millimeters.) The skinfold
measurements are used to predict body density. Find a good model for
predicting density from skinfold measurements based on these data for women.
Do your models require re-expression?

Body Fat (%)   Density   Skinfold     Body Fat (%)   Density   Skinfold
   44.33        1.001      228.0         23.38        1.046      152.0
   20.74        1.052      102.0          9.09        1.078      110.0
   38.32        1.014      248.5         10.77        1.074      100.5
    9.91        1.076       73.0          4.58        1.089       72.0
   15.81        1.063       92.5         21.93        1.049      219.5
   34.02        1.023      144.5          3.82        1.091       85.5
   23.15        1.046       86.5
Display 3.132 Percentage of body fat, body density, and skinfold for 15 women
and 14 men. [Source: M. L. Pollock, University of Florida, 1956.]
4 Sample Surveys and Experiments
What prompts a hamster to prepare for hibernation? A student designed an
experiment to see whether the number of hours of light in a day affects the
concentration of a key brain enzyme.
Most of what you’ve done in Chapters 2 and 3, as well as in the first part of
Chapter 1, is part of data exploration—ways to uncover, display, and describe
patterns in data. Methods of exploration can help you look for patterns in just
about any set of data, but they can’t take you beyond the data in hand. With
exploration, what you see is all you get. Often, that’s not enough.
Pollster: I asked a hundred likely voters who they planned to vote for, and
fifty-two of them said they’d vote for you.
Politician: Does that mean I’ll win the election?
Pollster: Sorry, I can’t tell you. My stat course hasn’t gotten to inference yet.
Politician: What’s inference?
Pollster: Drawing conclusions based on your data. I can tell you about the
hundred people I actually talked to, but I don’t yet know how to
use that information to tell you about all the likely voters.
Methods of inference can take you beyond the data you actually have, but
only if your numbers come from the right kind of process. If you want to use
100 likely voters to tell you about all likely voters, how you choose those 100
voters is crucial. The quality of your inference depends on the quality of your
data; in other words, bad data lead to bad conclusions. This chapter tells you
how to gather data through surveys and experiments in ways that make sound
conclusions possible.
Here’s a simple example.1 When you taste a spoonful of chicken soup and
decide it doesn’t taste salty enough, that’s exploratory analysis: You’ve found a
pattern in your one spoonful of soup. If you generalize and conclude that the
whole pot of soup needs salt, that’s an inference. To know whether your inference
is valid, you have to know how your one spoonful—the data—was taken from
the pot. If a lot of salt is sitting on the bottom, soup from the surface won’t be
representative, and you’ll end up with an incorrect inference. If you stir the soup
thoroughly before you taste, your spoonful of data will more likely represent the
whole pot. Sound methods for producing data are the statistician’s way of making
sure the soup gets stirred so that a single spoonful—the sample—can tell you
about the whole pot. Instead of using a spoon, the statistician relies on a chance
device to do the stirring and on probability theory to make the inference.
Soup tasting illustrates one kind of question you can answer using statistical
methods: Can I generalize from a small sample (the spoonful) to a larger
population (the whole pot of soup)? To use a sample for inference about a
population, you must randomize, that is, use chance to determine who or what
gets into your sample.
1. The inspiration for this metaphor came from Gudmund Iversen, who teaches statistics at Swarthmore College.
The other kind of question is about comparison and cause. For example,
if people eat chicken soup when they get a cold, will this cause the cold to go
away more quickly? When designing an experiment to determine if a pattern in
the data is due to cause and effect, you also must randomize. That is, you must
use chance to determine which subjects get which treatments. To answer the
question about chicken soup, you would use chance to decide which of your
subjects eat chicken soup and which don’t, and then compare the duration of
their colds.
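Using chance to decide which subjects get which treatments can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `randomly_assign`, the subject labels, and the group size of 20 are all hypothetical, and shuffling a list stands in for whatever chance device you actually use.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects by chance, then deal them out to the treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # chance, not the experimenter, decides the order
    groups = {treatment: [] for treatment in treatments}
    for position, subject in enumerate(shuffled):
        treatment = treatments[position % len(treatments)]
        groups[treatment].append(subject)
    return groups

# Hypothetical list of 20 subjects who have just caught a cold.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomly_assign(subjects, ["chicken soup", "no chicken soup"], seed=1)

for treatment, members in groups.items():
    print(treatment, len(members), members)
```

Dealing the shuffled subjects out in turn keeps the group sizes equal, which is one common design choice rather than a requirement.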
The first part of this chapter is about designing surveys. A well-designed
survey enables you to make inferences about a population by looking at a sample
from that population. The second part of the chapter introduces experiments. An
experiment enables you to determine cause by comparing the effects of two or
more treatments.
[Figure: two charts with Day on the horizontal axis and rows for Bed 1 through Bed 5.]
Display 4.3 Lengths of stay for the patients during the first 20 days.
[Margin note: Sampling bias lies in the method, not in the sample.]
There’s an important distinction here between the sample itself and the
method used for choosing the sample.
Investigator: What makes a good sample?
Statistician: A good sample is representative. That is, it looks like a small
version of the population. Proportions you compute from the
sample are close to the corresponding proportions you would
get if you used the whole population. The same is true for other
numerical summaries, such as averages and standard deviations
or medians and IQRs.
Investigator: How can I tell if my sample is representative?
Statistician: There’s the rub. In practice, you can’t. You can tell only by
comparing your sample with the population, and if you know
that much about the population, why bother to take a sample?
DISCUSSION Bias
D2. Explain the difference between “nonrepresentative” and “biased” as these
terms pertain to sampling.
D3. Which statements describe an event that is possible? Which describe an
event that is impossible?
A. A representative sample results from a biased sample-selection method.
B. A nonrepresentative sample results from a biased sample-selection
method.
C. A representative sample results from an unbiased sample-selection
method.
D. A nonrepresentative sample results from an unbiased sample-selection
method.
[Margin note: Voluntary response bias is another kind of sampling bias.]
When a television or radio program asks people to call in and take sides on
some issue, those who care about the issue will be overrepresented, and those
who don’t care as much might not be represented at all. The resulting bias from
such a volunteer sample is called voluntary response bias and is a second type of
sample selection bias.
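A small simulation can show how far a call-in result can drift from the truth. Every number below is made up for illustration: a hypothetical population of 10,000 viewers in which 20% are offended by a show, an assumed 30% chance that an offended viewer calls in, and an assumed 3% chance that anyone else does.

```python
import random

rng = random.Random(7)

# Hypothetical population: True means the viewer is offended by the show.
population = [True] * 2000 + [False] * 8000

# Assumed chances that each kind of viewer bothers to call in.
CALL_PROB_IF_OFFENDED = 0.30
CALL_PROB_IF_NOT = 0.03

# Each viewer decides independently whether to call, so the callers
# select themselves rather than being selected by the pollster.
callers = [
    offended
    for offended in population
    if rng.random() < (CALL_PROB_IF_OFFENDED if offended else CALL_PROB_IF_NOT)
]

true_share = sum(population) / len(population)
call_in_share = sum(callers) / len(callers)

print(f"Offended in the population: {true_share:.0%}")
print(f"Offended among callers:     {call_in_share:.0%}")
```

Under these assumptions the offended viewers make up a much larger share of the callers than of the population, and collecting more calls would not fix that, because the problem lies in the method.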
[Margin note: Convenience sampling is almost surely biased.]
Here’s a simple sampling method: Take whatever’s handy. For example, what
percentage of the students in your graduating class plan to go to work immediately
after graduation? Rather than find a representative sample of your graduating class,
it would be a lot quicker to ask your friends and use them as your sample—quicker
and more convenient, but almost surely biased because your friends are likely to
have somewhat similar plans. A convenience sample is one in which the units
chosen are those that are easy to include. The likelihood of bias makes convenience
samples about as worthless as voluntary response samples.
[Margin note: Judgment sampling, even when taken by experts, is usually biased.]
Because voluntary response sampling and convenience sampling tend to
be biased methods, you might be inclined to rely on the judgment of an expert
to choose a sample that he or she considers representative. Such samples, not
surprisingly, are called judgment samples. Unfortunately, though, experts might
overlook important features of a population. In addition, trying to balance
several features at once can be almost impossibly complicated. In the early days
of election polling, local “experts” were hired to sample voters in their locale by
filling certain quotas (so many men, so many women, so many voters over the
age of 40, so many employed workers, and so on). The poll takers used their own
judgment as to whom they selected for the poll. It took a very close election (the
1948 presidential election, in which polls were wrong in their prediction) for the
polling organizations to realize that quota sampling was a biased method.
[Margin note: The quality of your sample depends on having a good sampling frame.]
An unbiased sampling method requires that all units in the population have
a known chance of being chosen, so you must prepare a “list” of population
units, called a sampling frame or, more simply, frame, from which you select
the sample. If you think about enough real examples, you’ll come to see that
making this list is not something you can take for granted. For the Westvaco
employees in Chapter 1 or for the 50 U.S. states, creating the list is not hard, but
other populations can pose problems. How would you list all the people using the
Internet worldwide or all the ants in Central Park or all the potato chips produced
in the United States over a year? For all practical purposes, you can’t. There will
often be a difference between the population—the set of units you want to know
Response Bias
[Margin note: Bias doesn’t always come from the sampling method.]
In all the examples so far, bias has come from the method of taking the sample.
Unfortunately, bias from other sources can contaminate data even from
well-chosen sampling units.
[Margin note: Nonresponse bias can occur when people do not respond to surveys.]
Perhaps the worst case of faulty data is no data at all. It isn’t uncommon
for 40% of the people contacted to refuse to answer a survey. These people
might be different from those who agree to participate. An example of this
nonresponse bias came from a controversial study that found that left-handers
died, on average, about 9 years earlier than right-handers. The investigators sent
questionnaires to the families of everyone listed on the death certificates in two
counties near Los Angeles asking about the handedness of the person who had
died. One critic noted that only half the questionnaires were returned. Did that
change the results? Perhaps. [Source: “Left-Handers Die Younger, Study Finds,” Los Angeles Times,
April 4, 1991.]
[Margin note: Questionnaire bias]
Nonresponse bias, like bias that comes from the sampling method, arises
from who replies. Questionnaire bias arises from how you ask the questions.
[Margin note: Incorrect response]
Another problem polls and surveys have is trying to ensure that people tell
the truth. Often, the people being interviewed want to be agreeable and tend to
respond in the way they think the interviewer wants them to respond. Newspaper
columnist Dave Barry reported that he was called by Arbitron, an organization
that compiles television ratings. Dave reports:
So I figured the least I could do, for television, was be an Arbitron household.
This involves two major responsibilities:
1. Keeping track of what you watch on TV.
2. Lying about it.
At least that’s what I did. I imagine most people do. Because let’s face it:
Just because you watch a certain show on television doesn’t mean you want
to admit it. [Source: Dave Barry, Dave Barry Talks Back, copyright © 1991 by Dave Barry. Used by
permission of Crown Publishers, a division of Random House, Inc.]
Bias from incorrect responses might be the result of intentional lying, but it
is more likely to come from inaccurate measuring devices, including inaccurate
memories of people being interviewed in self-reported data. Patients in medical
studies are prone to overstate how well they have followed the physician’s orders,
just as many people are prone to understate the amount of time they actually
spend watching TV. Measuring the heights of students with a meterstick that
has one end worn off leads to a measurement bias, as does weighing people on a
bathroom scale that is adjusted to read on the light side.
Practice
Census Versus Sample
P1. You want to estimate the average number of TV sets per household in your
community.
a. What is the population? What are the units?
b. Explain the advantages of sampling over conducting a census.
c. What problems do you see in carrying out this sample survey?
Bias
P2. Four people practicing shooting a bow and arrow made these patterns on
their targets.
[Figure: four targets labeled Al, Bal, Cal, and Dal.]
Display 4.4 Results of four archers.
a. Which person had shots that were biased and had low variability?
b. Which person had shots that were biased and had high variability?
c. Which person had shots that were unbiased and had low variability?
d. Which person had shots that were unbiased and had high variability?
e. Do you think it would be easiest to help Al, Cal, or Dal improve?
Sample Selection Bias
P3. Describe the type of sample selection bias that would result from each of
these sampling methods.
a. A county official wants to estimate the average size of farms in a county in
Iowa. He repeatedly selects a latitude and longitude in the county at random
and places the farms at those coordinates in his sample. If something other
than a farm is at the coordinates, he generates another set of coordinates.
b. In a study about whether valedictorians “succeed big in life,” a professor
“traveled across Illinois, attending high school graduations and selecting 81
students to participate. . . . He picked students from the most diverse
communities possible, from little rural schools to rich suburban schools near
Chicago to city schools.” [Source: Michael Ryan, “Do Valedictorians Succeed Big
in Life?” PARADE, May 17, 1998, pp. 14–15.]
Exercises
E1. Suppose you want to estimate the percentage of U.S. households that have
children under the age of 5 living at home. Each weekday from 9 a.m. to 5 p.m.,
your poll takers call households in your sample. Every time they reach a person
in one of the homes, they ask, “Do you have children under the age of 5 living
in your household?” Eventually you give up on the households that cannot be
reached.
a. Will your estimate of the percentage of U.S. households with children under
the age of 5 probably be too low, too high, or about right?
b. How does this example help explain why poll takers are likely to call at
dinnertime?
c. Is this a case of sampling bias or of response bias?
E2. A wholesale food distributor has commissioned a sample survey to estimate
the satisfaction level of his customers, who are the owners of small
restaurants. The sampling firm takes a random sample from the current list of
customers, develops a satisfaction questionnaire, and sends field workers out
to interview the owners of the sampled restaurants. The field workers realize
that the time does not allow them to interview all the owners selected for the
sample, so if the owner is busy when the field worker arrives, the worker moves
on to the next business.
a. What is the population? The sample?
b. What kind of bias does this survey have?
c. What kind of restaurants would you expect to be underrepresented in the
sample?
4.1 Why Take Samples, and How Not To 229
E3. Suppose you want to estimate the average response to the question “Do you
like math?” on a scale from 1 (“Not at all”) to 7 (“Definitely”) for all
students in your school. You use your statistics class as a sample. What kind
of sample is this? What sort of bias, if any, would be likely? Be as specific
as you can. In particular, explain whether you expect the sample average to be
higher or lower than the population average.
E4. At a meeting of local Republicans, the organizers want to estimate how well
their party’s candidate will do in their district in the next race for
Congress. They use the people present at their meeting as their sample. What
kind of sample is this? What bias do you expect?
E5. You want to estimate the average number of states that people living in the
United States have visited. If you asked only people at least 40 years old,
would you expect the estimate to be too high or too low? What bias might you
expect if you take your sample only from those living in Rhode Island?
E6. For a study on smoking habits, you want to estimate the proportion of adult
males in the United States who are nonsmokers, who are cigarette smokers, and
who are pipe or cigar smokers. Tell why it makes more sense to use a sample
than to try to survey every individual. What types of bias might show up when
you attempt to collect this information?
E7. “Television today is more offensive than ever, say the overwhelming
majority—92%—of readers who took part in USA Weekend’s third survey measuring
attitudes toward the small screen.” More than 21,600 people responded to this
write-in survey. [Source: USA Weekend, May 16–18, 1997, p. 20.]
a. What kind of sample is this?
b. Do you trust the results of the survey? Why or why not?
c. What percentage of the entire U.S. television-watching public do you think
would say that today’s shows are more offensive than ever: more than 92%, quite
a bit less than 92%, or just about 92%? Why?
E8. Are people willing to change their driving habits in the face of higher
gasoline prices? At a time of steeply rising gas prices, a Consumers Union poll
of Internet users who chose to participate showed these results.
[Source: www.consumersunion.org.]

As a result of volatile gasoline prices in recent months, I have
Made no changes                        42%
Driven less                            53%
Carpooled                               2%
Relied more on mass transportation      3%
Total Votes: 139

a. What kind of sample is this?
b. Do you trust the results of this survey? Why or why not?
c. What percentage of U.S. drivers do you think drive less as a result of
higher gasoline prices: more than 53%, less than 53%, or just about 53%? Why do
you think that?
E9. To estimate the average number of children per family in the city where you
live, you use your statistics class as a convenience sample. You ask each
student in the sample how many children are in his or her family. Do you expect
the sample average to be higher or lower than the population average? Explain
why.
E10. Suppose you wish to estimate the average size of English classes on your
campus. Compare the merits of these two sampling methods.
I. You get a list of all students enrolled in English classes, take a random
sample of those students, and find out how many students are enrolled in each
sampled student’s English class.
II. You get a list of all English classes, take a random sample of those
classes, and find out how many students are enrolled in each sampled class.
[Margin note: In a simple random sample of size n, each possible sample of size n has the same chance of being selected.]
In simple random sampling, all possible samples of a given fixed size are
equally likely. All units have the same chance of belonging to the sample, all
possible pairs of units have the same chance of belonging to the sample,
all possible triples of units have the same chance, and so on.
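In software, a simple random sample can be drawn directly from a numbered frame. The sketch below is one way to do it, assuming a hypothetical frame of 100 units numbered 1 through 100 and a sample of size 5; the function name `simple_random_sample` is made up. Python's `random.sample` chooses without replacement in such a way that every possible set of the requested size is equally likely.

```python
import math
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, each possible set equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: units numbered 1 through 100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 5, seed=2024)

# Number of different samples of size 5 that could have been drawn.
number_of_possible_samples = math.comb(len(frame), 5)

print("Sampled units:", sorted(sample))
print("Possible samples of size 5:", number_of_possible_samples)
```

Fixing the seed makes the draw repeatable for checking your work; for a real survey you would let the software pick its own seed.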
Do you think you can select representative sampling units as well as a random
number table can? Activity 4.2a provides an opportunity for you to test yourself.
[Figure: grid of numbered rectangles of various sizes.]
Display 4.5 Random rectangles.
[Margin note: Why stratify?]
Why might you want to stratify a population? Here are the three main reasons:
• Convenience in selecting the sampling units is enhanced. It is easier to sample
in smaller, more compact groups (countries) than in one large group spread
out over a huge area (the world).
• Coverage of each stratum is assured. The company might want to have data
from each country in which it sells products; a simple random sample from
the frame does not guarantee that this would happen.
• Precision of the results may be improved. That is, stratification tends to give
estimates that are closer to the value for the entire population than does an
SRS. This is the fundamental statistical reason for stratification.
[Margin note: Stratification reduces variability.]
An example might help clarify the last point.
[Figure: dot plots of Diameter (in.), including rows for Small Rocks and Large Rocks.]
Display 4.6 Data on rock diameters, population and strata.
[Figure: dot plots of Sample Mean Diameter (in.) for SRS and Stratified samples.]
Display 4.7 Means of simple and stratified random samples of
rock diameters.
Solution
Each method is unbiased because the sample means are centered at the
population mean, about 2.51. However, the means from the stratified random
samples are less variable, tending to lie closer to the population mean. Because it’s
better to have estimates that are closer to the parameter than estimates that tend
to be farther away, stratification wins.
■
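You can repeat this kind of comparison yourself with a simulation. The sketch below does not use the geologist's data; it invents a hypothetical population of 70 small rocks and 30 large rocks, then takes 2000 simple random samples of size 10 and 2000 stratified random samples (7 small rocks and 3 large rocks) and compares the spread of the sample means. All names and numbers are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of rock diameters in inches (made-up values).
small_rocks = [rng.uniform(0.5, 2.0) for _ in range(70)]
large_rocks = [rng.uniform(4.0, 6.0) for _ in range(30)]
population = small_rocks + large_rocks
population_mean = statistics.mean(population)

def srs_mean(n):
    """Mean of a simple random sample of n rocks from the whole population."""
    return statistics.mean(rng.sample(population, n))

def stratified_mean(n_small, n_large):
    """Mean of a sample that draws separately from each stratum."""
    chosen = rng.sample(small_rocks, n_small) + rng.sample(large_rocks, n_large)
    return statistics.mean(chosen)

# Repeat each sampling method many times and record the sample means.
srs_means = [srs_mean(10) for _ in range(2000)]
stratified_means = [stratified_mean(7, 3) for _ in range(2000)]

sd_srs = statistics.stdev(srs_means)
sd_stratified = statistics.stdev(stratified_means)

print(f"Population mean:            {population_mean:.2f}")
print(f"SRS means: center {statistics.mean(srs_means):.2f}, SD {sd_srs:.3f}")
print(f"Stratified means: center {statistics.mean(stratified_means):.2f}, "
      f"SD {sd_stratified:.3f}")
```

Both sets of means should center near the population mean, while the stratified means should cluster more tightly, because the two strata in this made-up population have very different means.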
[Margin note: Make the strata as different as possible.]
From this example, it certainly looks as though the stratification (sieving)
pays off in producing estimates of the mean with smaller variation. This will
be true generally if the stratum means are quite different. If the geologist had
decided to stratify based on color and if color was not related to diameter, then
the stratification would have produced little or no gain in the precision of the
estimates. The guiding principle is to choose strata that have very different means,
whenever that is possible.
[Margin note: Allocate units in the sample proportionally to the number of units in the strata.]
One good way to choose the relative sample sizes in stratified random
sampling is to make them proportional to the stratum sizes (the number of units
in the stratum). Thus, if all the strata are of the same size, the samples should all
be of the same size. If a population is known to have 65% women and 35% men,
then a sample of 100 people stratified on gender should contain 65 women and
35 men. If the sample sizes are proportional to the stratum sizes, then the overall
sample mean (the mean calculated from the samples of all the strata mixed
together) will be a good estimate of the population mean. Proportional allocation
is particularly effective in reducing the variation in the sample means if the
stratum standard deviations are about equal. (If the stratum standard deviations
differ greatly, a more effective allocation of samples to strata can be found.)
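The arithmetic of proportional allocation is short enough to write as a function. The sketch below assumes a hypothetical population of 1000 people, 650 women and 350 men, to match the 65% and 35% in the example above; the function name `proportional_allocation` is made up.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to the stratum sizes.

    Each share is rounded to a whole number of units, so with awkward
    proportions the parts may not add up exactly to sample_size and
    would need a small adjustment by hand.
    """
    total = sum(stratum_sizes.values())
    return {
        stratum: round(sample_size * size / total)
        for stratum, size in stratum_sizes.items()
    }

# Hypothetical population of 1000 people: 65% women and 35% men.
stratum_sizes = {"women": 650, "men": 350}
allocation = proportional_allocation(stratum_sizes, sample_size=100)

print(allocation)
```

For a sample of 100, this reproduces the 65 women and 35 men described in the text.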
Cluster Samples
To see how well U.S. 4th graders do on an arithmetic test, you might take a
simple random sample of children enrolled in the 4th grade and give each
child a standardized test. In theory, this is a reasonable plan, but it is not very
practical. For one thing, how would you go about making a complete list of all
the 4th graders in the United States? For another, imagine the work required to
track down each child in your sample and get him or her to take the test. Instead
of taking an SRS of 4th graders, it would be more realistic to take an SRS of
elementary schools and then give the test to all the 4th graders in those schools.
Getting a list of all the elementary schools in the United States is a lot easier than
getting a list of all the individuals enrolled in the 4th grade. Moreover, once you’ve
chosen your sample of elementary schools, it’s a relatively easy organizational
problem to give the test to all 4th graders in those schools. This is an example of
The situation of the 4th graders illustrated the two main reasons for using
cluster samples: You need only a list of clusters rather than a list of individuals,
and for some studies it is much more efficient to gather data on individuals
grouped by clusters than on all individuals one at a time.
[Margin note: Reasons to use a two-stage cluster sample]
Two-stage cluster samples are useful when it is much easier to list clusters
than individuals but still reasonably easy and sufficient to sample individuals once
the clusters are chosen. Two-stage cluster sampling might sound like stratified
random sampling, but they are different.
• In stratified random sampling, you want to choose strata so that the units
within each stratum don’t vary much from each other. Then you sample from
every stratum.
• In two-stage cluster sampling, you want to choose clusters so that the
variation within each cluster reflects the variation in the population, if
possible. Then you sample from within only some of your clusters.
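The difference between a cluster sample and a two-stage cluster sample shows up clearly in code. The sketch below is an illustration only: it invents 20 hypothetical schools with 30 students each, and the function names are made up. In both designs the first step is an SRS of clusters; the designs differ in whether you then take every unit in the chosen clusters or an SRS of units within each.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of clusters, then include every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Take an SRS of clusters, then an SRS of units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

rng = random.Random(4)

# Hypothetical frame: 20 schools (the clusters), each with 30 students.
schools = {
    f"school_{s}": [f"school_{s}_student_{i}" for i in range(1, 31)]
    for s in range(1, 21)
}

one_stage = cluster_sample(schools, 3, rng)
two_stage = two_stage_cluster_sample(schools, 3, 10, rng)

print("Cluster sample size:", len(one_stage))
print("Two-stage cluster sample size:", len(two_stage))
```

Notice that neither function needs a list of all students in all schools, only a list of schools and then lists of students for the schools that were chosen.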
Cluster Sampling
Exercises
E15. A wholesale food distributor has hired you to conduct a sample survey to
estimate the satisfaction level of the businesses he serves, which are mainly
small grocery stores and restaurants. A current list of businesses served by
the distributor is available for the selection of sampling units.
a. If the distributor wants good information from both the small grocery store
owners and the restaurant owners, what kind of sampling plan will you use?
b. If the distributor wants information from the customers who frequent the
grocery stores and restaurants he serves, how would you design the sampling
plan?
E16. Cookies. Which brand of chocolate chip cookies gives you the most chips per
cookie? For the purpose of this question, take as your population all the
chocolate chip cookies now in the nearest supermarket. Each cookie is a unit in
this population.
a. Explain why it would be hard to take an SRS.
b. Describe how to take a cluster sample of chocolate chip cookies.
c. Describe how you would take a two-stage cluster sample. What circumstances
would make the two-stage cluster sample better than the cluster sample?
E17. Haircut prices. You want to take a sample of students in your school in
order to estimate the average amount they spent on their last haircut. Which
sampling method do you think would work best: a simple random sample; a
stratified random sample with two strata, males and females; or a stratified
random sample with class levels as strata? Give your reasoning.
E18. The Oxford Dictionary of Quotations, 3rd edition, has about 600 pages of
quotations. Describe how you would take a systematic sample of 30 pages to use
for estimating the number of typographical errors per page.
E19. An early use of sampling methods was in crop forecasting, especially in
India, where an accurate forecast of the jute yield in the 1930s made some of
the techniques (and their inventors) famous. Your job is to estimate the total
corn yield, right before harvest, for a county with five farms and a total of
1000 acres planted in corn. How would you do the sampling?
E20. You are called upon to advise a local movie theater on designing a
sampling plan for a survey of patrons on their attitudes about recent movies.
About 64% of the patrons are adults, 30% are teens, and 6% are children.
The theater has the time and money to
In this case, it is explanation III. The lurking variable—one that lies in the
background and may or may not be apparent at the outset but, once identified,
could explain the pattern between the variables—is the child’s overall size. Bigger
children have bigger feet, and they drink more milk because they eat and drink
more of everything than do smaller children.
But suppose you think that explanation I is the reason. How can you prove
it? Can you take a bunch of children, give them milk, and then sit and wait to see
if their feet grow? That won’t prove anything, because children’s feet will grow
whether they drink milk or not.
Can you take a bunch of children, randomly divide them into a group that
will drink milk and a group that won’t drink milk, and then sit and wait to see if
the milk-drinking group grows bigger feet? Yes! Such an experiment is just about
the only way to establish cause and effect.
Kelly’s Question: If you reduce the amount of light a hamster gets from
16 hours to 8 hours per day, what happens to the concentration of
NaK ATP-ase in its brain?
Subjects: Kelly’s subjects were eight golden hamsters.
Treatments: There were two treatments: being raised in long (16-hour) days
or short (8-hour) days.
Random assignment of treatments: To make her study a true experiment, Kelly
randomly assigned a day length of 16 hours or 8 hours to each hamster in such a
way that half the hamsters were assigned to be raised under each treatment.
Replication: Each treatment was given to four hamsters.
Response variable: Because Kelly was interested in whether a difference in
the amount of light causes a difference in the enzyme concentration, she
chose the enzyme concentration for each hamster as her response variable.
Results: The resulting measurements of enzyme concentrations (in
milligrams per 100 milliliters) for the eight hamsters were
Short days: 12.500 11.625 18.275 13.225
Long days: 6.625 10.375 9.900 8.800
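As a rough sketch of the two steps above, the following Python lines first assign the two treatments at random to eight subjects and then compare the treatment means using the enzyme concentrations listed in the results. The hamster labels and the random seed are assumptions made only for illustration; nothing here reproduces how Kelly actually carried out her randomization.

```python
import random
from statistics import mean

# Random assignment: shuffle the eight hamsters, then split them in half.
# The hamster labels and the seed are hypothetical, used only for illustration.
hamsters = [f"hamster_{i}" for i in range(1, 9)]
rng = random.Random(0)
shuffled = hamsters[:]
rng.shuffle(shuffled)
assignment = {"short": sorted(shuffled[:4]), "long": sorted(shuffled[4:])}

# Enzyme concentrations reported in the text (mg per 100 mL).
short_days = [12.500, 11.625, 18.275, 13.225]
long_days = [6.625, 10.375, 9.900, 8.800]

mean_short = mean(short_days)
mean_long = mean(long_days)
diff = mean_short - mean_long
print(assignment)
print(f"short: {mean_short:.3f}, long: {mean_long:.3f}, difference: {diff:.3f}")
```

The mean for short days is about 13.906 and the mean for long days is 8.925, a difference of about 4.981 milligrams per 100 milliliters.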
The characteristics of the plan Kelly used are so important that statisticians
try to reserve the word experiment for studies like hers that answer a question by
comparing the results of treatments assigned to subjects at random.
[Margin note: Conditions aren’t randomly assigned in an observational study.]
In an observational study, no treatments get assigned to the units by the
experimenter; the conditions of interest are already built into the units being
studied.
her pain had disappeared. It was reported that her family “now turns first” to
homeopathic medicine. This kind of personal evidence—“It worked for me”—is
called anecdotal evidence.
Anecdotal evidence can be useful in deciding what treatments might be
helpful and so should be tested in an experiment. However, anecdotal evidence
cannot prove, for example, that calcium carbonate causes abdominal pain to
disappear. Why not? After all, the pain did go away.
The problem is that pain often tends to go away anyway, especially when
a person thinks he or she is receiving good care. When people believe they are
receiving special treatment, they tend to do better. In medicine, this is called the
placebo effect.
A placebo is a fake treatment, something that looks like a treatment to the
patient but actually contains no medicine. Carefully conducted studies show
that a large percentage of people who get placebos, but don’t know it, report
that their symptoms have improved. The percentage depends on the patient’s
problem but typically is over 30%. For example, when people are told they are
being treated for their pain, even if they are receiving a placebo, changes in their
brain cause natural painkilling endorphins to be released. Consequently, if people
given a treatment get better, it might be because of the treatment, because time
has passed, or simply because they are being treated by someone or something
they trust.
[Margin note: A control group or a comparison group provides the basis of comparison.]
How can scientists determine if a new medication is effective or whether the
improvement is due either to the placebo effect or to the fact that many problems
get better over time? They use a group of subjects who provide a standard for
comparison. The group used for comparison usually is called a control group if
the subjects receive a placebo and a comparison group if the subjects receive a
standard treatment.
Patients in the treatment group are given the drug to be evaluated. The
patients given a placebo get a nontreatment carefully designed to be as much
like the actual treatment as possible. The control and treatment groups should be
handled exactly alike except for the treatment itself. If a new treatment is to be
compared with a standard treatment, the comparison group receives the standard
treatment rather than a placebo.
In order for the control or comparison group and the treatment group to be
treated exactly alike, both the subjects and their doctors should be “blind.” That is,
[Diagram: Available units are assigned at random to Treatment 1, Treatment 2, or
Treatment 3, each given to several units (replication). Responses are measured and
then compared.]
IGS: Here are my pages and pages of numbers. Can I conclude that cells
from white and green sponges have different lengths?
BBN: That depends. We’ll have to look at three things—the size of the
difference in the white and green averages, the size of your sample,
and the size of the natural variability from one unit to the next.
IGS: No problem. My sample size is humongous: 700 cells of each kind.
And the variability from one cell to the next isn’t all that big.
BBN: Seven hundred cells certainly is a lot. How many sponges did they
come from?
IGS: Uh, just two, one of each color. Does that matter?
BBN: Unfortunately, it’s critical. But I’ll start with the good news. With
700 cells from each sponge, you have rock-solid estimates of
average cell length for the two sponges you actually looked at.
However, . . .
IGS: Right. And because one’s green and one’s white and the average
lengths are different, I can say that . . .
BBN: Not so fast. You have data that let you generalize from the cells
you measured to all the cells in your particular two sponges. But
that’s not the conclusion you were aiming for.
IGS: Yeah. I want to say something about all white sponges compared
to all green ones.
BBN: So for you, the kind of variability that matters most is the
variability from one sponge to another. Your unit should be a
sponge, not a cell.
IGS: You mean to tell me I’ve got only two units? After all that work!
BBN: I’m afraid you have samples of size 1.
IGS: Woe is me! I’d rather be a character in a bad parody of a Dickens
novel. But it will be a far, far better study that I do next time.
[Margin note: The take-away message is to randomize what you can.]
Giving a name to the type of study isn’t important. What is important is that
you randomize what you can. In the next section, you will learn more about how
to deal with variables that cannot be randomly assigned to the experimental units.
Practice
Cause and Effect
P14. Research has shown a weak association between living near a major power line
and the incidence of leukemia in children. Such a study might measure the incidence
of leukemia in children who live near a major power line and compare it to the
incidence in children who don’t live near a major power line. Typically, the
children in the areas near major power lines are matched by characteristics such as
age, sex, and family income to children in the areas not near major power lines.
a. Identify the subjects, conditions, and response variable in such a study.
b. Is this type of study a true experiment? Explain why or why not.
c. According to a newspaper article, “While there is a clear association between
high-voltage power lines and childhood leukemia, there is no evidence that the
power lines actually cause leukemia.” [Source: “Power lines tie to cancer unknown,”
The Paris [Texas] News, July 30, 2006.] Why might the newspaper come to this
conclusion?
P15. Solar thermal systems use heat generated by concentrating and absorbing the
sun’s energy to drive a generator and produce electrical power. A manufacturer of
solar power generators is interested in comparing the
Display 4.11 Sample launch distance data, in inches, for red and green gummy bears.
Display 4.12 Plots of launch distances for red and green gummy bears.
Exercises
E23. A psychologist wants to compare children from 1st, 3rd, and 5th grades to
determine the relationship between grade level and how quickly a child can solve
word puzzles. Two schools have agreed to participate in the study. Would this be
an observational study or an experiment?
E24. An engineer wants to study traffic flow at four busy intersections in a city.
She chooses the times to collect data so that they cover morning, afternoon, and
evening hours on both weekdays and weekends. Is this an experiment or an
observational study?
E25. Buttercups. Some buttercups grow in bright, sunny fields; others grow in
woods, where it is both darker and damper. A plant ecologist wanted to know
whether buttercups in sunny locations have adapted in a way that makes them less
successful in the shade than in the sun. For his study, he dug up ten plants in
sunny locations. Five plants were chosen at random from these ten plants and then
were replanted in the sun; the remaining five plants were replanted in the shade.
At the end of the growing season, he compared the sizes of the plants.
a. Does this study meet the three characteristics (randomization, replication,
control or comparison group) of a true experiment?
b. Why did the plant ecologist bother to dig up the plants that were just going to
be replanted in the sunny location? Use the word confounded in your answer.
[Dot plots by treatment (panel label recovered: Long Days); horizontal axis:
Enzyme Concentration, 5 to 20.]
Display 4.13 Dot plots of enzyme concentration in Kelly’s
hamsters, in milligrams per 100 milliliters.
Suppose Kelly had gotten results with more spread in the values but the same
means. As Display 4.14 shows, with the values more spread out, it’s no longer
obvious that the treatment matters.
[Dot plots for Short Days and Long Days; horizontal axis: Enzyme Concentration,
0 to 30.]
Display 4.14 Dot plots of altered hamster data.
In order to conclude that the treatments make a difference, the difference
between the treatments has to be large enough to overshadow the variation within
each treatment.
A good experimental plan can reduce within-treatment variability and will
allow you to measure the size of the variability that remains both between and
within treatments.
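A small numerical sketch may make the comparison concrete. It uses the enzyme concentrations from Kelly’s experiment and then stretches each group’s values away from its own mean, which keeps the means the same but increases the spread. The stretching factor of 3 and the helper names `spread_out` and `summarize` are assumptions for illustration; the altered values are not the ones plotted in Display 4.14.

```python
from statistics import mean, stdev

# Enzyme concentrations from Kelly's experiment (mg per 100 mL).
short_days = [12.500, 11.625, 18.275, 13.225]
long_days = [6.625, 10.375, 9.900, 8.800]

def spread_out(values, factor):
    """Stretch values away from their mean by `factor`, leaving the mean unchanged."""
    m = mean(values)
    return [m + factor * (v - m) for v in values]

def summarize(a, b):
    """Return (difference in means, larger of the two within-treatment SDs)."""
    return mean(a) - mean(b), max(stdev(a), stdev(b))

diff, sd = summarize(short_days, long_days)
# factor=3 is a hypothetical choice, not the alteration used in the text.
diff_wide, sd_wide = summarize(spread_out(short_days, 3), spread_out(long_days, 3))

print(f"original: difference {diff:.2f}, within-treatment SD up to {sd:.2f}")
print(f"altered:  difference {diff_wide:.2f}, within-treatment SD up to {sd_wide:.2f}")
```

With the original values, the difference in means is larger than either within-treatment standard deviation. With the stretched values, the same difference is smaller than the within-treatment spread, so it no longer stands out.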
[Table and dot plot residue: eight rows of paired pulse values (88 68; 86 60; 64 62;
72 80; 72 58; 70 60; 48 68; 92 74), with dot plots labeled sit and stand on a
Pulse axis from 45 to 95.]
[Margin note: Blocks are groups of similar units.]
In the matched pairs design of Part B, the units were sorted into pairs of
students having similar sitting heart rates, with treatments randomly assigned
within each pair. In the repeated measures design of Part C, both treatments
were assigned, in random order, to each person (the ultimate in matching). In
experiments of this type, the matched units or the individual units that receive
all treatments in random order are called blocks.
[Dot plot; horizontal axis: Difference, -10 to 25.]
The use of the term block in statistics comes from agriculture. One of the
earliest published examples of a block design appeared in R. A. Fisher’s The Design
of Experiments (1935). The goal of the experiment was to compare five types of
wheat to see which type gave the highest yield. The five types of wheat were the
treatments; the yield, in bushels per acre, was the response; and eight blocks of
farmland were available for planting.
There were many possible sources of variability in the blocks of land, mainly
differences in the composition of the soil: what nutrients were present, how well
the soil held moisture, and so on. Because many of these possibly confounding
influences were related to the soil, Fisher made the reasonable assumption that
the soil within each block would be more or less uniform and that the variability
to be concerned about was the variability from one block to another. His goal in
designing the experiment was to keep this between-block variability from being
confounded with differences between wheat types.
Fisher’s plan was to divide each of the eight blocks of land “into five plots
running from end to end of the block, and lying side by side, making forty plots in
all.” These 40 plots were his experimental units. Fisher then used a chance device to
assign one type of wheat to each of the five plots in a block. Display 4.18 shows how
his plan might have looked. A, B, C, D, and E represent the five types of wheat.
                    Plot
Block      1    2    3    4    5
I          B    D    A    E    C
II         A    D    C    B    E
...        .    .    .    .    .
VIII       C    A    E    D    B
[Figure labels: small amounts of variability within blocks; large amounts of
variability between blocks.]
Display 4.18 An experimental design using blocks.
Suppose it turned out that one of the blocks had really poor (or really
favorable) conditions for wheat. Then Fisher’s blocking accomplished two things:
All five types of wheat would be affected equally, and the variation within a block
could be attributed to the types of wheat.
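A randomized block layout of this kind can be sketched as follows. The sketch uses Python’s random number generator in place of a chance device; the function name `randomize_within_blocks` and the seed are assumptions for illustration and say nothing about the device Fisher actually used.

```python
import random

wheat_types = ["A", "B", "C", "D", "E"]
blocks = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]

def randomize_within_blocks(blocks, treatments, rng):
    """Give each block its own random ordering of all treatments (one per plot)."""
    layout = {}
    for block in blocks:
        order = treatments[:]   # copy so the original list is not changed
        rng.shuffle(order)      # independent shuffle for every block
        layout[block] = order
    return layout

rng = random.Random(1935)  # seed is arbitrary, chosen only for reproducibility
layout = randomize_within_blocks(blocks, wheat_types, rng)
for block, order in layout.items():
    print(f"{block:>4}: {' '.join(order)}")
```

Because every block contains each wheat type exactly once, a block with unusually poor or favorable soil affects all five types, and the randomization within the block decides which plot each type gets.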
The effectiveness of blocking depends on how similar the units are in each
block and how different the blocks are from each other. Here “similar units” are
Your plots in Activity 4.4b show variability from three sources: the
launch angle, that is, the variability due to the two different treatments (one
book, four books); the particular team, that is, the variability from one team
to another for teams with the same launch angle; and the individual launches,
that is, the variability between launches for the same team. The first is the
difference you want to see—the difference between the two treatments. The
second two are “nuisance” sources of variability for launches with the same angle;
these are to be minimized. In the discussion questions, you will examine these
sources of variation.
Practice
Differences Between Treatments Versus Variability Within Treatments
P28. Review the antibacterial soap experiment in E27.
a. List at least two sources of within-treatment variability.
b. Is the point of randomization to reduce the within-treatment variability or to
equalize it between the treatment groups?
A Design for Every Purpose
P29. To test a new drug for asthma, both the new drug and the standard treatment,
in random order, will be administered to each subject in the study.
a. What kind of design is this?
b. An observant statistician cries, “No, no! Use two similar subjects in each pair,
randomized to each treatment.” What kind of design is this?
c. Which design do you prefer, and why?
Chapter Summary
There are two main types of chance-based methods of data collection: sampling
methods and experimental designs. Sampling methods, studied in the first part
of this chapter, use chance to choose the individuals to be studied. Typically,
you choose individuals in order to ask them questions, as in a Gallup poll;
thus, samples and surveys often go together. Experiments, introduced in
the second part of this chapter, are comparative studies that use chance to
assign the treatments you want to compare. An experiment should have three
characteristics: random assignment of treatments to units, two or more treatments
to compare, and replication of each treatment on at least two subjects.
The purposes of sampling and experimental design are quite different. Sample
surveys are used to estimate the parameters of fixed, well-defined populations.
Experiments are used to establish cause and effect by comparing treatments.
Display 4.21 summarizes the differences between a survey and an experiment.
                            Sample Survey                      Experiment
What You Examine            Population                         Treatments
Ultimate Goal               Describe some characteristic       See whether different treatments
                            of the population                  cause different results
Role of Randomization       Take a random sample from          Assign treatments at random to
                            the population                     available units
How You Control Variation   Stratify                           Block
Threats to Inference        Sampling bias, response bias       Confounding
Review Exercises
E41. For each situation, tell whether it is better to take a sample or a census,
and give reasons for your answer.
   Characteristic of Interest       Population of Interest
a. Average life of a battery        Alkaline AAA batteries
b. Average age                      Current U.S. senators
c. Average price per gallon         Purchases of regular-octane gasoline
                                    sold at U.S. stations next week
E42. You want to estimate the percentage of people in your area with heart disease
who also smoke cigarettes. The people in your area who have heart disease make up
your population. You take as your frame all records of patients hospitalized in
area hospitals within the last 5 years with a diagnosis of heart disease. How well
do you think this frame represents the population? If you think bias is likely,
identify what kind of bias it would be and explain how it might arise.
and each mouse in the last group lived in isolation. All groups were otherwise
treated alike and were given as much food as they wanted. After 90 days, the mice
were weighed again.
b. A student picked four different sites on an isolated hillside at random. At each
site, he measured off a 10-ft-by-10-ft square. At each site, a sample of soil was
taken and the amounts of ten different nutrients were measured. The student counted
the number of species of plants at each site, hoping to be able to predict the
number of species from the amounts of the nutrients.
c. A college professor helped his daughter with a 2nd-grade science project titled
“Does Fruit Float?” They tested 15 different
2    51.8    36.2    15.6
3    33.5    40.7    7.2
4    32.8    38.8    6
5    69.0    71.0    2
6    38.8    47.0    8.2
7    54.6    57.0    2.4
Display 4.23 Percentage of radioactivity remaining after 1 hour. [Source: Per
Camner and Klas Phillipson, “Urban Factor and Tracheobronchial Clearance,” Archives
of Environmental Health 27 (1973): 82. Reprinted in Richard J. Larson and Morris L.
Marx, An Introduction to Mathematical Statistics and Its Applications, 2nd ed.
(Englewood Cliffs, N.J.: Prentice Hall, Inc., 1986).]
a. What is the factor? What are its levels? Identify a block.
AP1. Researchers want to estimate the mean number of children per family, for all
families that have at least one child enrolled in one of ten similarly sized county
high schools. Which sampling plan is biased?
• Randomly select 100 families from those families in the county that have at
least one child in high school. Compute the mean number of children per family.
• Randomly select one high school, and compute the mean number of children in
each family with a child or children in that school.
• Randomly select 20 students from each of the ten high schools, and ask each one
how many children are in his or her family. Compute the mean number of children
per family.
• From a list of the families in all ten high schools, choose a random starting
point and then select every tenth family. Compute the mean number of children
per family.
• None of the above is a biased plan.
AP2. A radio program asked listeners to call in and vote on whether the notorious
band, The Rolling Parameters, should perform at the Statistics Day celebration. Of
the 956 listeners who responded, 701 answered that The Rolling Parameters should
perform. Which type of sampling does this example use?
• stratified random
• cluster
• systematic
• quota
• voluntary response
AP3. To select students to explain homework problems, a teacher has students count
off by 5’s. She then randomly selects an integer from 1 through 5. Every student
who counted off that integer is asked to explain a problem. Which type of sampling
plan is this?
• convenience
• systematic
• simple random
• stratified random
• cluster
AP4. To conduct a survey to estimate the mean number of minutes adults spend
exercising, researchers stratify by age before randomly selecting their sample.
Which of the following is not a good reason for choosing this plan?
• Without stratification by age, age will be confounded with the number of minutes
reported.
• Researchers will be sure of getting adults of all ages in the sample.
• Researchers will be able to estimate the mean number of minutes for adults in
various age groups.
• Adults of different ages may exercise different amounts on average, so
stratification will give a more precise estimate of the mean number of minutes
spent exercising.
• All of these are good reasons.
AP5. A movie studio runs an experiment in order to decide which of two previews to
use for its advertising campaign for an upcoming movie. One preview features the
movie’s romantic scenes and is expected to appeal more to women. The other preview
features the movie’s action scenes and is expected to appeal more to men. Sixteen
subjects take part in this experiment, eight women and eight men. After viewing one
of the previews, each person will rate how much he or she wants to see the movie.
Which of the following best describes how blocking should be used in this experiment?
• Use blocking, with the men in one block and the women in the other.
• Use blocking, with half the men and half the women in each block.
• Do not block, because the preview that is chosen will have to be shown to
audiences consisting of both men and women.
• Do not block, because the response will be confounded with gender.
• Do not block, because the number of subjects is too small.
AP6. A recent study tried to determine whether brushing or combing hair results in
healthier-looking hair. Forty male volunteers were randomly divided into two
groups. One group only brushed their hair and the other group only combed their
hair. Other than that, the volunteers followed their usual hair care procedures.
After two months, an evaluator who did not know the treatments the volunteers used
scored each head of hair by how healthy it looked. There was almost no difference
in the scores of the two treatment groups. Which statement best summarizes this
study?
• If a male wants healthy looking hair, it probably doesn’t matter whether he
brushes or combs it.
• You can’t tell whether brushing or combing is better, because the treatments are
likely to be confounded with variables such as which kind of shampoo a male uses.
• You can’t come to a conclusion, because the study wasn’t double-blind.
• You can’t come to a conclusion, because only volunteers were included.
• The sample size is too small for any conclusion to be drawn.
AP7. Which of the following is not a necessary component of a well-designed
experiment?
• There must be a control group that receives a placebo.
• Treatments are randomly assigned to experimental units.
• The response variable is the same for all treatment groups.
• There are a sufficient number of units in each treatment group.
• All units are handled as alike as possible, except for the treatment.
AP8. In a clinical trial, a new drug and a placebo are administered in random order
to each subject, with six weeks between the two treatments. Which best describes
this design?
• completely randomized with blocking
• completely randomized with no blocking
• randomized paired comparison (matched pairs)
• randomized paired comparison (repeated measures)
• two-stage randomized
Investigative Tasks
AP9. Needle threading. With your eye firmly fixed on winning a Nobel prize, you
decide to make the definitive study of the effect of background color (white,
black, green, or red) on the speed of threading a needle with white thread. Design
three experiments: one that uses no blocks, one that creates blocks by grouping
subjects, and one that creates blocks by reusing subjects. Tell which of the three
plans you consider most suitable, and why.
AP10. For the 2000 U.S. Census, controversy erupted over the Census Bureau proposal
to use sampling to adjust for the anticipated undercount. Here is a simplified
version of the plan: The Census Bureau collects the information mailed in by most
of the residents of a region. Some residents, however, did not receive forms or did
not return them for some reason; these are the uncounted persons. The bureau now
selects a sample of blocks (neighborhoods) from the region and sends field workers
to find all residents in the sampled blocks. The residents the field workers found
are matched to the census data, and the number of residents uncounted in the
original census is noted. The census count for those blocks is then adjusted
according to the proportion uncounted. (For example, if one-tenth of the residents
were uncounted, the original census figures are adjusted upward by 11%.) In
addition, the same adjustment factor is used for neighboring regions that have
characteristics similar to the region sampled. Comment on the strengths and
weaknesses of this method.
CHAPTER
5 Probability Models
Jack and Jill begin by formulating a model that specifies that if a person can’t
identify tap water, then he or she will choose the tap water, T, with probability 0.5
and the bottled water, B, with probability 0.5. If they have only one taster, that is,
n = 1, then the probability that the person will guess correctly is written P(T) = 0.5.
Onward and upward: What if they have two tasters, that is, n = 2? Assuming
that the tasters can’t identify tap water, what is the probability that both people
will guess correctly and choose T? Here the research stumbles.
Jack: There are three possible outcomes: Neither person chooses T, one
chooses T, or both choose T. These three outcomes are equally
Jack: Whew! Now I see you must be right. The relative frequencies from
the simulation match your probabilities fairly well. I admit I fell
down a bit here. I forgot that there are two ways to get one person
choosing correctly—the first chooses correctly and the second
doesn’t, or the second chooses correctly and the first doesn’t.
There is only one way for two people to choose correctly: The first
person chooses correctly and the second person chooses correctly.
Jill: Now that this has been settled, let’s try to construct the probability
distribution for three people, or n = 3.
Jack: Let me redeem myself. Following your reasoning that different
orders should be listed separately, there are eight possible outcomes.
First Person Second Person Third Person
T T T
T T B
T B T
B T T
T B B
B T B
B B T
B B B
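Jack’s list can be checked by generating the outcomes mechanically. The sketch below assumes the guessing model described earlier, in which each person independently chooses T or B with probability 0.5; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

n = 3
outcomes = list(product("TB", repeat=n))  # ordered outcomes, e.g. ('T', 'B', 'T')

# Under the guessing model each ordered outcome has probability (1/2)^n.
p_each = Fraction(1, 2) ** n
counts = Counter(outcome.count("T") for outcome in outcomes)
distribution = {k: counts[k] * p_each for k in sorted(counts)}

print(len(outcomes))
for k, p in distribution.items():
    print(k, p)
```

The eight ordered outcomes collapse into four values for the number of people choosing T, with probabilities 1/8, 3/8, 3/8, and 1/8. Changing `n` gives the corresponding list for other numbers of tasters.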
Sample Spaces
Jack and Jill both used the same principle: Start by making a list of possible
outcomes. Over the years, mathematicians realized that such a list of possible
outcomes, called a sample space, must satisfy specific requirements.
A sample space for a chance process is a complete list of disjoint outcomes. All
of the outcomes in a sample space must have a total probability equal to 1.
Complete means that every possible outcome is on the list. Disjoint means
that two different outcomes can’t occur on the same opportunity. Sometimes
the term mutually exclusive is used instead of disjoint. This book will alternate
between the two terms so you can get used to both of them.
Deciding whether two outcomes are disjoint sounds easy enough, but be
careful. You have to think about what an outcome means in your situation. If your
outcome is the result of a single coin flip, your sample space is heads (H) and tails
(T). These two outcomes are mutually exclusive because you can’t get both heads
and tails on a single flip. But suppose you are thinking about what happens when
you flip a coin three times. Now your sample space includes outcomes like HHT
and TTT. Even though you get tails on the third flip in both HHT and TTT, these
are disjoint outcomes. Your sample space consists of triples of flips, and these
aren’t the same triple.
Jack’s sample space—neither person chooses T, one chooses T, two choose
T—has outcomes that are complete and disjoint, but they aren’t equally likely. He
has a legitimate sample space; he has simply assigned the wrong probabilities.
[Plot: proportion of heads (vertical axis, 0 to 0.8) versus Spin Number (0 to 150);
a value of 0.377 is marked on the plot.]
Display 5.4 Proportion of heads for the given number of spins of
a coin, with P(heads) = 0.4.
[You can use a calculator to perform this experiment yourself. See Calculator Note 5B.]
Most people intuitively understand the Law of Large Numbers. If they want
to estimate a proportion, they know it is better to take a larger sample than
a smaller one. After Jack saw the results from 3000 flips of two coins, he
immediately rejected his model that there are three equally likely outcomes. If
there had been only 10 pairs of coin flips, he couldn’t have been so sure that his
model was wrong.
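The Law of Large Numbers can also be seen in a short simulation. The sketch below is an illustration under stated assumptions: it uses a coin with P(heads) = 0.4, as in Display 5.4, and the function name `running_proportion` and the seed are hypothetical choices.

```python
import random

def running_proportion(n_spins, p_heads, rng):
    """Return the proportion of heads after each spin, 1..n_spins."""
    heads = 0
    proportions = []
    for i in range(1, n_spins + 1):
        if rng.random() < p_heads:
            heads += 1
        proportions.append(heads / i)
    return proportions

rng = random.Random(5)  # arbitrary seed so the run can be repeated
props = running_proportion(100_000, 0.4, rng)
for n in (10, 150, 10_000, 100_000):
    print(n, round(props[n - 1], 3))
```

The proportions after only a few spins can be far from 0.4, while those after many thousands of spins tend to settle close to it.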
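[Tree diagram: the first person chooses T or B, and then the second person chooses
T or B, giving the outcomes TT, TB, BT, and BB.]
Display 5.5 A tree diagram of all possible outcomes for n = 2.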
With five people, or n = 5, and two possible outcomes for each person, Jack
and Jill have 2 · 2 · 2 · 2 · 2, or 32, possible outcomes. If their model is correct,
that is, if people are just guessing which is the tap water, these 32 outcomes
are all equally likely. So the probability that all five people will correctly
choose the tap water is 1/32.
When a process has only two stages, it is often more convenient to list them
using a two-way table.
Practice
Where Do Probabilities Come From?
P1. Suppose Jack and Jill use a sample of four people who can’t tell the difference
between tap water and bottled water.
a. Construct the probability distribution for the number of people in the sample
who would choose the tap water just by chance.
b. What is the probability that all four people will identify the tap water
correctly?
c. Is four people a large enough sample to ease Jack’s concern about the reputation
of Downhill Research?
with?
b. What is the probability that you get your favorite dentist and your favorite
dental hygienist?
c. Illustrate your answer in part a with a two-way table.
d. Illustrate your answer in part a with a tree diagram.
[Plot: vertical axis from 0.00 to 0.35; horizontal axis: Spin Number, 0 to 50.]
Exercises
E1. Suppose you flip a coin five times and count the number of heads.
a. List all possible outcomes.
b. Make a table that gives the probability distribution for the number of heads.
c. What is the probability that you get at most four heads?
E2. Refer to the sample space for rolling two dice shown in Display 5.6 on page 295.
Make a table that gives the probability distribution for the sum of the two dice.
The first column should list the possible sums, and the second column should list
their probabilities.
E3. Refer to the sample space for rolling two dice shown in Display 5.6 on page 295.
Determine each of these probabilities.
a. not getting doubles
b. getting a sum of 5
c. getting a sum of 7 or 11
d. a 5 occurring on the first die
e. getting at least one 5
f. a 5 occurring on both dice
g. the larger number is a 5 (if you roll doubles, the number is both the smaller
number and the larger number)
h. the smaller number is a 5
i. the difference of the larger number and the smaller number is 5
number is both the smaller number and the larger number.)
f. What is the probability that the smaller number is a 2?
g. What is the probability that the larger number minus the smaller number is equal to 2?

[Plot: vertical axis ProportionDoubles, 0.00 to 0.35.]
[Histogram: horizontal axis Mean Age, 30 to 60; vertical axis Frequency, 0 to 200.]
Display 5.12 Results of 2000 runs of the layoff of three workers,
using a table of random digits.
4. Conclusion.
From the histogram, the estimated probability of getting an average age of
58 years or more if you pick three workers at random is $\frac{90}{2000}$, or 0.045. This
probability is fairly small, so it is unlikely that the process Westvaco used for
layoffs in this round was equivalent to picking the three workers at random.
■
According to the Law of Large Numbers, the more runs you do, the closer
you can expect your estimated probability to be to the theoretical probability.
The simulation in the previous example contained 2000 runs. When you do a
simulation on a computer, there is no reason not to do many thousands of runs.
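The effect of the number of runs can be seen in a small experiment. The sketch below does not reproduce the Westvaco simulation (the workers’ ages are not listed here); instead it assumes a simpler chance process, rolling two fair dice and checking for doubles, whose theoretical probability is 1/6. The function name `estimate_doubles` and the seed are illustrative choices.

```python
import random

def estimate_doubles(runs, rng):
    """Estimate P(doubles) from `runs` simulated rolls of two fair dice."""
    hits = sum(rng.randint(1, 6) == rng.randint(1, 6) for _ in range(runs))
    return hits / runs

rng = random.Random(1)  # fixed seed so the sketch is repeatable
for runs in (10, 100, 2000, 100_000):
    # Estimates from more runs tend to sit closer to 1/6 (about 0.1667).
    print(runs, estimate_doubles(runs, rng))
```

An individual estimate can still miss by chance, but the typical size of the miss shrinks as the number of runs grows.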
Outcome            Frequency
0 hand-washers       124
1 hand-washer        975
2 hand-washers     2,967
3 hand-washers     3,964
4 hand-washers     1,970
Total             10,000

[Histogram: horizontal axis Number of Hand-Washers, 0 to 4; vertical axis Frequency.]
Display 5.14 Results of 10,000 runs of the hand-washer
simulation.
4. Conclusion.
From the frequency table, the estimated probability that all four randomly
selected people will wash their hands is $\frac{1,970}{10,000}$, or 0.197. ■
 4 tested   299      15 tested   4
 5 tested   221      16 tested   3
 6 tested   135      17 tested   1
 7 tested    95      18 tested   1
 8 tested    71      19 tested   0
 9 tested    51      20 tested   0
10 tested    25      21 tested   2
11 tested    12      22 tested   0
                     Total    3000

[Histogram: horizontal axis NumberTested, 0 to 25.]
The first run of the simulation resulted in 1, 1, and 2. Because Sydney got
Blend 1 twice, record that the three blends weren’t all different.
In the second run of the simulation, the three numbers are different, so
record that Sydney got three different blends.
In the third run of the simulation, the three numbers are different, so again
record that Sydney got three different blends.
Display 5.20 gives the results of 2000 runs of this simulation.
Outcome Frequency
The three blends weren’t all different. 884
The three blends were all different. 1116
Total 2000
simulation, using the specified row of Table D on page 828. Add your results to
the frequency table given in the practice problem.
Display 5.21 Results of 4990 runs of the disappearing teaspoons.
P11. Researchers at the Macfarlane Burnet Institute for Medical Research and Public Health in Melbourne, Australia, noticed that the teaspoons had disappeared from their tearoom. They purchased new teaspoons, numbered them, and found that 80% disappeared within 5 months. [Source: Megan S. C. Lim, Margaret E. Hellard, and Campbell K. Aitken, “The Case of the Disappearing Teaspoons,” British Journal of Medicine 331 (December 2005): 1498–1500, bmj.bmjjournals.com.]
Suppose that 80% is the correct probability that a teaspoon will disappear within 5 months and that this group purchases ten new teaspoons. Estimate the probability that all the new teaspoons will be gone in 5 months.
Start at the beginning of row 34 of Table D on page 828, and add your ten results to the frequency table in Display 5.21, which gives the results of 4990 runs.
P12. A catastrophic accident is one that involves severe skull or spinal damage. The National Center for Catastrophic Sports Injury Research reports that over the last 21 years, there have been 101 catastrophic accidents among female high school and college athletes. Fifty-five of these resulted from cheerleading. [Source: www.unc.edu.]
Suppose you want to study catastrophic accidents in more detail, and you take a random sample, without replacement, of 8 of these 101 accidents. Estimate the probability that at least half of your eight sampled accidents resulted from cheerleading.
Start at the beginning of row 17 of Table D on page 828, and add your ten runs to the frequency table in Display 5.22, which gives the results of 990 runs.
[Histogram: horizontal axis Number of Accidents from Cheerleading, 1 to 8.]

7          1536
Total      4990
Display 5.23 Results of 4990 runs of the number of
games in the World Series.
Exercises
For E15–E20, complete a–d.
a. Assumptions. State your assumptions.
b. Model. Make a table that shows how you are assigning the random digits to the outcomes. Explain how you will use the digits to conduct one run of the simulation and what summary statistic you will record.
c. Repetition. Conduct ten runs of the

one of the girls says that she rarely or never wears a seat belt. [Source: www.nhtsa.dot.gov.]
Start at the beginning of row 36 of Table D on page 828, and add your ten results to the frequency table in Display 5.24, which gives the results of 9990 runs.

Number of Girls Who Rarely or Never Wear a Seat Belt    Frequency
The categories are disjoint, which makes computing probabilities easy. For
example, if you pick one of the people in the civilian labor force at random, you
can find the probability that he or she is employed by adding the employees on
farm payrolls to those on nonfarm payrolls and then dividing by the total number
in the civilian labor force:
$$P(\text{employed}) = P(\text{on farm payroll or on nonfarm payroll}) = \frac{8{,}975 + 135{,}354}{151{,}534} = \frac{144{,}329}{151{,}534} \approx 0.952$$
Solution
a. These categories aren’t disjoint because, for example, the same person might
have dined out and read books. Also, for this table, you can tell that the
categories aren’t disjoint because the percentages sum to more than 100%.
The categories aren’t complete, because there are many other leisure activities.
b. Thirty-six percent of 213,000,000, or 76,680,000 adults, read books for leisure.
c. From this information, it is impossible to determine the percentage of
U.S. adults who surfed the net or went to the beach. If you add the two
percentages, you count twice the people who did both. Because you
don’t know how many people that is, you are stuck.
■
In Activity 5.3a, you can get the answer to one question by adding because the
categories are disjoint. For the other question, adding doesn’t work because the
categories are not disjoint.
[Margin note: Two events are disjoint if they can’t happen on the same opportunity.]
Two useful rules emerge from Activity 5.3a. First, if two events are disjoint,
the probability of their occurring together is 0.
Second, if two events are disjoint, then you can add probabilities to find
P(A or B).
[Margin note: P(A or B) is sometimes written P(A ∪ B).]
The Addition Rule for Disjoint (Mutually Exclusive) Events
If event A and event B are disjoint, then
P(A or B) = P(A) + P(B)
[Margin note: Using set notation, this rule is written P(A ∪ B) = P(A) + P(B), where A ∪ B is read “A union B.”]
The Addition Rule for Disjoint Events can be generalized. For example, if each
pair of events A, B, and C is disjoint, then
P(A or B or C) = P(A) + P(B) + P(C)
$\frac{5}{36} + \frac{6}{36} = \frac{11}{36}$ ■
P[(1st chooses T and 2nd chooses B) or (1st chooses B and 2nd chooses T)]
The ideas you explored in Activity 5.3b can be stated formally as the
Addition Rule.
You can also apply the Addition Rule “backward”; that is, you can compute
P(A and B) when you know the other probabilities.
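A short enumeration makes the rule concrete. This sketch assumes the usual form of the Addition Rule, P(A or B) = P(A) + P(B) − P(A and B), and uses two fair six-sided dice with the events “doubles” and “sum of 8” as an illustrative pair; the helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

doubles = lambda r: r[0] == r[1]
sum_is_8 = lambda r: r[0] + r[1] == 8

p_or = prob(lambda r: doubles(r) or sum_is_8(r))
p_and = prob(lambda r: doubles(r) and sum_is_8(r))

# Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p_or == prob(doubles) + prob(sum_is_8) - p_and)  # True
# Applied "backward": P(A and B) = P(A) + P(B) - P(A or B)
print(prob(doubles) + prob(sum_is_8) - p_or)  # 1/36
```

The two events overlap only at the roll (4, 4), which is why simply adding 6/36 and 5/36 would overcount by 1/36.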
Practice
Disjoint and Complete Categories
P14. Of the 34,071,000 people in the United States who fish, 1,847,000 fish in the Great Lakes, 27,913,000 fish in other fresh water, and 9,051,000 fish in salt water. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2006, Table 1241.]
a. In categorizing people who fish, are these three categories disjoint? Are they complete?
b. Suppose you randomly select a person from among those who fish. Can you find the probability that the person fishes in salt water?
c. Suppose you randomly select a person from among those who fish. Can you find the probability that the person fishes in fresh water?
d. The number of people who fish in fresh water is 28,439,000. How many people fish in both salt water and fresh water?
P15. Display 5.33 categorizes the child support received by custodial parents with children under age 21 in the United States.

Child Support Status by Custodial Parents in 2001         Number (in thousands)
With child support agreement or award                      7,916
  Supposed to receive payments                             6,924
    Actually received payments                             5,119
      Received full amount                                 3,099
      Received partial payments                            2,020
    Did not receive payments                               1,805
Child support not awarded                                  5,467
Total custodial parents with children under age 21        13,383

Display 5.33 Custodial parents and court-ordered child support, 2001. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2006, Table 558.]

a. Revise the table so that the categories are complete and disjoint. Note that one category wasn’t included: custodial parents with a child support agreement or award who were not supposed to receive payments in 2001 (but maybe get them in some other year).
P17. A researcher will select a student at random from a school population where 33% of the students are freshmen, 27% are sophomores, 25% are juniors, and 15% are seniors.
a. Is it appropriate to use the Addition Rule for Disjoint Events to find the probability that the student will be a junior or a senior? Why or why not?
b. Find the probability that the student will be a freshman or a sophomore.
P18. A tetrahedral die has the numbers 1, 2, 3, and 4 on its faces. Suppose you roll a pair of tetrahedral dice.
a. Make a table of all 16 possible outcomes (or use the one you made in E4 on page 299).
b. Use the Addition Rule for Disjoint Events to find the probability that you get a sum of 6 or a sum of 7.
c. Use the Addition Rule for Disjoint Events to find the probability that you get doubles or a sum of 7.
d. Why can’t you use the Addition Rule for Disjoint Events to find the probability that you get doubles or a sum of 6?

a. Are the events crash involved a teen driver and crash was speed related mutually exclusive? How can you tell?
b. Use numbers from the cells of the table to compute the probability that a randomly selected crash involved a teen driver or was speed related.
c. Now use two of the marginal totals and one number from a cell of the table to compute the probability that a randomly selected crash involved a teen driver or was speed related.
P20. Use the Addition Rule to compute the probability that if you roll two six-sided dice,
a. you get doubles or a sum of 4
b. you get doubles or a sum of 7
c. you get a 5 on the first die or you get a 5 on the second die
P21. Use the Addition Rule to compute the probability that if you flip two fair coins, you get heads on the first coin or you get heads on the second coin.
P22. Use the Addition Rule to find the probability that if you roll a pair of dice, you do not get doubles or you get a sum of 8.
Although the numbers alone can’t tell you who got to go first on the lifeboats,
they do show that $\frac{344}{470}$, or roughly 73%, of the females survived, while only $\frac{367}{1731}$,
or roughly 21%, of the males survived. Thus, the data are fully consistent with the
hypothesis that the song explains what happened.
Overall, $\frac{711}{2201}$, or approximately 32.3%, of the people survived, but the survival
rate for females was much higher and that for males much lower. The chance of
surviving depended on the condition of whether the person was male or female.
This commonsense notion that probability can change if you are given additional
information is called conditional probability.
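The comparison between the overall survival rate and the rates for each sex can be reproduced from the four counts quoted above. This is a minimal sketch; the dictionary names are illustrative.

```python
# Counts quoted in the text: survivors and totals by sex.
survived = {"female": 344, "male": 367}
total = {"female": 470, "male": 1731}

# Unconditional probability: all survivors over all people aboard.
p_survived = sum(survived.values()) / sum(total.values())

# Conditional probabilities: restrict attention to one sex at a time.
p_survived_given = {sex: survived[sex] / total[sex] for sex in total}

print(round(p_survived, 3))                  # 0.323
print(round(p_survived_given["female"], 3))  # 0.732
print(round(p_survived_given["male"], 3))    # 0.212
```

The denominator changes from 2201 to 470 or 1731 once the condition is imposed, which is what makes the conditional probabilities differ from the overall rate.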
[Tree diagram of the Titanic data:
Female, P(F) = 470/2201
  Survived, P(S | F) = 344/470: Female and Survived
  Died, P(D | F) = 126/470: Female and Died
Male, P(M) = 1731/2201
  Survived, P(S | M) = 367/1731: Male and Survived
  Died, P(D | M) = 1364/1731: Male and Died]
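[Tree diagram: a branch labeled P(B) leads to B, and a second branch labeled P(A | B) leads to A. The path ends at “B and A,” with probability P(B) • P(A | B).]
Display 5.41 The general Multiplication Rule, shown on a tree
diagram.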
Example: Coincidences
This article appeared in the Los Angeles Times on July 18, 1978.
Man, Wife Beat Odds in Moose-Hunt Draw
The Washington State Game Department conducted a public drawing
last week in Olympia for three moose hunting permits. There were
2,898 application cards in the wire mesh barrel. It was cranked around
b. The assumptions are that Bill and Judy had only one card each in the barrel
and that the cards were well mixed.
c. The chance that Judy’s name and then her husband’s name will be called is
about one chance in 10 million. However, there is also the chance that Bill’s
name will be called first and then Judy’s. This doubles their chances of being
the first two names drawn to 0.000000238, only about one chance in
5 million. However, suppose all the names in the barrel were those of
couples. Then the probability that the first two names drawn will be those
of a couple is $\frac{1}{2897}$, or about 0.000345, because the first name can be anyone
and that person’s partner is one of the 2897 left for the second draw. This
probability is still small, but now it is 3 chances in 10,000. Finally, thousands
of lotteries take place in the United States every year, so it is virtually certain
that coincidences like this will happen occasionally and be reported in the
newspaper.
■
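The three probabilities in part c follow from the Multiplication Rule and can be recomputed exactly. The sketch below uses the assumptions stated in part b (one card each, well-mixed barrel); the variable names are illustrative.

```python
from fractions import Fraction

cards = 2898  # application cards in the barrel

# Multiplication Rule: P(Judy first) * P(Bill second | Judy first)
p_judy_then_bill = Fraction(1, cards) * Fraction(1, cards - 1)

# Either order works, and the two orders are disjoint, so add.
p_either_order = 2 * p_judy_then_bill

# If every card belonged to one member of a couple, the first name can be
# anyone; the second must be that person's partner among the 2897 left.
p_some_couple = Fraction(1, cards - 1)

print(float(p_judy_then_bill))  # about 1.19e-07
print(float(p_either_order))    # about 2.38e-07
print(float(p_some_couple))     # about 0.000345
```

Using `Fraction` keeps the arithmetic exact until the final conversion, which avoids rounding questions with numbers this small.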
is true in general.
                          Event A Present?
                          Yes     No
Event B Present?   Yes    c       d
                   No     e       f
Solution
In some ways, this is a pretty good test. It finds 9 out of the 10 people who have
the disease, for a sensitivity of 0.9. It correctly categorizes 9940 out of the 9990
people who don’t have the disease, for a specificity of 0.99. The NPV is 9940 out
of 9941, or 0.9999. However, notice that only 9 out of 59, or 15%, of the people
who test positive for the disease actually have it! The PPV is quite low because
there are so many false positives.
■
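The four conditional probabilities in this solution come from the same four counts, read in different directions. The following sketch assumes the counts implied by the solution (9 true positives, 1 false negative, 9940 true negatives, and therefore 50 false positives); the variable names are illustrative.

```python
# Counts from the example: 10 people have the disease, 9,990 do not.
true_pos, false_neg = 9, 1
true_neg, false_pos = 9940, 50

sensitivity = true_pos / (true_pos + false_neg)  # P(test positive | disease)
specificity = true_neg / (true_neg + false_pos)  # P(test negative | no disease)
ppv = true_pos / (true_pos + false_pos)          # P(disease | test positive)
npv = true_neg / (true_neg + false_neg)          # P(no disease | test negative)

print(sensitivity)     # 0.9
print(round(ppv, 2))   # 0.15
print(round(npv, 4))   # 0.9999
```

Sensitivity and specificity condition on the true disease status, while PPV and NPV condition on the test result, so only the second pair changes when the mix of diseased and disease-free people changes.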
The previous example shows what can happen when the population being
screened is mostly disease free, even with a test of high specificity. Most of the
people who test positive do not, in fact, have the disease. On the other hand, if the
population being screened has a high incidence of the disease, then there tend to
be many false negatives and the negative predictive value tends to be low.
Because the positive predictive value and the negative predictive value depend
on the population as well as the screening test, statisticians prefer to judge a test
based on the other pair of conditional probabilities: sensitivity and specificity.
The dialogue that follows is invented and did not actually occur in the
Westvaco case (Chapter 1), but it is based on real conversations one of your
authors had on several occasions with a number of different lawyers as they
grappled with conditional probabilities.
Statistician: Suppose you draw three workers at random from the set of ten
hourly workers. This establishes random sampling as the model
for the study.
Lawyer: Okay.
Statistician: It turns out that there are $\binom{10}{3}$, or 120, possible samples of size 3,
and only 6 of them give an average age of 58 or more.
Lawyer: So the probability is $\frac{6}{120}$, or 0.05.
Statistician: Right.
Lawyer: There’s only a 5% chance that the company didn’t discriminate and
a 95% chance that it did.
Statistician: No, that’s not true.
Lawyer: But you said . . .
Statistician: I said that if the age-neutral model of random draws is correct,
then there’s only a 5% chance of getting an average age of 58 or
more.
Lawyer: So the chance that the company is guilty must be 95%.
The statistician has computed P(data | model). The lawyer wants to know
P(model | data). Finding the probability that there was no discrimination given
that the average age was 58 is not a problem that statistics can solve. A model
is needed in order to compute a probability.
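The statistician’s number is a count of samples under the random-draw model, which a couple of lines can confirm. This is a minimal sketch; the count of 6 samples with an average age of 58 or more is taken from the dialogue, not recomputed.

```python
from math import comb

possible_samples = comb(10, 3)  # ways to choose 3 of the 10 hourly workers
p_data_given_model = 6 / possible_samples  # P(average age >= 58 | random draws)

print(possible_samples, p_data_given_model)  # 120 0.05
```

Nothing in this calculation produces P(model | data); the model is an input to the computation, not an output.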
Age
16 to 24 25 to 34 35 to 44 45 to 54 55 to 64 65 and Older Total
Volunteer 8,821 10,046 14,783 13,584 8,784 8,524 64,542
Display 5.46 Persons (in thousands) who performed unpaid volunteer activities in the last
year. [Source: U.S. Census Bureau, Statistical Abstract of the United States, 2006, Table 575.]
You can also use this rule to decide whether two events are independent.
P(heads on 1st flip and heads on 2nd and heads on 3rd and heads on 4th)
Solution
First, check the probabilities:
$$P(\text{win}) = \frac{41}{78} \approx 0.526$$
$$P(\text{win} \mid \text{day game}) = \frac{11}{21} \approx 0.524$$
Practice
Independent Events
P37. Suppose you select one person at random from the Titanic passengers and crew in Display 5.39 on page 325. Use the definition of independent events to determine whether the events didn’t survive and male are independent. Are any two events in this table independent?
P38. Suppose you draw a card at random from a standard deck. Use the definition of independent events to determine which pairs of events are independent.
a. getting a heart; getting a jack
b. getting a heart; getting a red card
c. getting a 7; getting a heart
Multiplication Rule for Independent Events
P39. About 42% of people have type O blood. Suppose you select two people at random and check whether they have type O blood.
a. Make a table like Display 5.50 on page 342 to show all possible results.
b. What is the probability that exactly one of the two people has type O blood?
c. Make a tree diagram that illustrates the situation.
P40. Suppose you select ten people at random. Using the information in P39, find the probability that
a. at least one of them has type O blood
b. at least one of them doesn’t have type O blood
P41. After taking college placement tests, freshmen sometimes are required to repeat high school work. Such work is called “remediation” and does not count toward a college degree. About 11% of college freshmen have to take a remedial course in reading. Suppose you select two freshmen at
Exercises
E57. Which of these pairs of events, A and B, do you expect to be independent? Give a reason for your answer.
a. For a test for tuberculosis antibodies:
A: The test is positive.
B: The person has a relative with tuberculosis.
b. For a test for tuberculosis antibodies:
A: A person’s test is positive.
B: The last digit of the person’s Social Security number is 3.
c. For a randomly chosen state in the United States:
A: The state lies east of the Mississippi River.
B: The state’s highest elevation is more than 8000 ft.
E58. Use the definition of independent events to determine which of these pairs of events are independent when you roll two dice.
a. rolling doubles; rolling a sum of 8
b. rolling a sum of 8; getting a 2 on the first die rolled
c. rolling a sum of 7; getting a 1 on the first die rolled
d. rolling doubles; rolling a sum of 7
e. rolling a 1 on the first die; rolling a 1 on the second die
E59. Display 5.52 gives the handedness and eyedness of a randomly selected group of 100 people. Suppose you select a person from this group at random.

                Right-Eyed   Left-Eyed   Total
Right-Handed        57           31        88
Left-Handed          6            6        12
Total               63           37       100

Display 5.52 Eyedness and handedness of 100 people.

a. Find each probability.
i. P(left-handed)
ii. P(left-eyed)
iii. P(left-eyed | left-handed)
iv. P(left-handed | left-eyed)
b. Are being left-handed and being left-eyed independent events?
c. Are being left-handed and being left-eyed mutually exclusive events?
E60. Display 5.53 gives the decisions on all applications to two of the largest graduate programs at the University of California, Berkeley, by gender of the applicant. Suppose you pick an applicant at random.

          Admit   Deny   Total
Man        650     592    1242
Woman      220     263     483
Total      870     855    1725

Display 5.53 Application decisions for the two largest graduate programs at the University of California, Berkeley, by gender. [Source: David Freedman, Robert Pisani, and Roger Purves, Statistics, 3rd ed. (New York: Norton, 1997); data from Graduate Division, UC Berkeley.]
Chapter Summary
Probability is the study of random behavior. The probabilities used in statistical
investigations often come from a model based on equally likely outcomes. In other
cases, they come from a model based on observed data. While you do not know
for sure what the next outcome will be, a model based on many observations gives
you a pretty good idea of the possible outcomes and their probabilities.
Two important concepts were introduced in this chapter: disjoint (mutually
exclusive) events and independent events.
• Two events are disjoint if they can’t happen on the same opportunity. If A and
B are disjoint events, then P(A and B) is 0.
• Two events are independent if the occurrence of one doesn’t change the
probability that the other will happen. That is, events A and B are independent
if and only if P(A) = P(A | B).
P(not A) = 1 − P(A)
This rule often is used to find the probability of at least one success:
P(at least one success) = 1 − P(no successes)
Review Exercises
E77. Suppose you roll a fair four-sided (tetrahedral) die and a fair six-sided die.
a. How many equally likely outcomes are there?
b. Show all the outcomes in a table or in a tree diagram.
c. What is the probability of getting doubles?
d. What is the probability of getting a sum of 3?
e. Are the events getting doubles and getting a sum of 4 disjoint? Are they independent?
f. Are the events getting a 2 on the tetrahedral die and getting a 5 on the six-sided die disjoint? Are they independent?
E78. Backgammon is one of the world’s oldest games. Players move counters around the board in a race to get “home” first. The number of spaces moved is determined by a roll of two dice. If your counter is able to land on your opponent’s counter, your opponent has to move his or her counter to the “bar” where it is trapped until your opponent can free it. For example, suppose your opponent’s counter is five spaces ahead of yours. You can “hit” that counter by rolling a sum of 5 with both dice or by getting a 5 on either die.
a. Use the sample space for rolling a pair of dice to find the probability of being able to hit your opponent’s counter on your next roll if his or her counter is five spaces ahead of yours.
b. Can you use the Addition Rule for Disjoint Events to compute the probability that you roll a sum of 5 with both dice or get a 5 on either die? Either do the computation or explain why you can’t.
E79. Jorge has a CD player attached to his alarm clock. He has set the CD player so that when it’s time for him to wake up, it randomly selects one song to play on the CD. Suppose
AP1. A student argues that extraterrestrials will either abduct her statistics teacher by tomorrow or they will not, and therefore there’s a 1 out of 2 chance for each of these two events. Which of the following best explains why this reasoning is incorrect?
• The two events are not independent.
• The two events are mutually exclusive.
• The two events are not equally probable.
• The two events are complements.
• There are more than two events that need to be considered.
AP2. Suppose you roll two dice. Which of the following are independent events?
• getting a sum of 8; getting doubles
• getting a sum of 3; getting doubles
• getting a sum of 2; getting doubles
• getting a 1 on the first die; getting a sum of 5
• getting a 1 on the first die; getting doubles
AP3. Suppose you roll two dice. Which of the following are mutually exclusive (disjoint) events?
• getting a sum of 8; getting doubles
• getting a sum of 3; getting doubles
• getting a sum of 2; getting doubles
• getting a 1 on the first die; getting a sum of 5
• getting a 1 on the first die; getting doubles
AP4. Suppose 15 children visit a retirement home at various times during one day, and each child randomly chooses one of 10 retirees to visit. A “successful” day is one in which all 10 retirees are visited by at least one of these children. A statistics student wants to use simulation to estimate the probability of a successful day. How should the student conduct one run?
• Assign the digits 0–9 to the retirees. Assign the integers 1–15 to the children. Randomly choose a number from each group, pairing up a child and a retiree. Repeat, without replacement, until all 10 retirees are assigned a child. Record whether the day is successful.
• Assign the integers 0–15 to the children. Randomly choose digits one at a time, with replacement, until 10 different integers are chosen. Record the number of integers needed to get 10 different ones.
• Assign the digits 0–9 to the retirees. Randomly choose 15 digits, with replacement, and record whether all 10 digits were chosen or not.
• Assign the integers 1–15 to the children. Randomly choose 10 of these integers, without replacement, and record the proportion that are less than 10.
• Assign the integers 1–15 to the children. Randomly choose 10 of these integers, with replacement, and record whether all 10 integers were different or not.
AP5. In a statistics classroom, 50% of the students are female and 30% of the students got an A on the most recent test. What is the probability that a student picked at random from this classroom is a female who got an A on the most recent test?
• 0.15
• 0.20
• 0.40
• 0.65
• cannot be determined from the information given
6 Probability Distributions
[Chapter opener caption: Forty percent of people have type A blood. A blood bank is in dire need of a type A donor today. How many donors will they have to test before finding the first type A? A probability distribution can describe the chances of the possible outcomes.]
A probability distribution describes the possible numerical outcomes of a
chance process and allows you to find the probability of any set of possible
outcomes. Sometimes a probability distribution is defined by a table, like the ones
Jack and Jill made in Chapter 5. Sometimes a probability distribution is defined
by a curve, like the normal curve in Chapter 2. If you select a male at random,
all possible heights he could have are given by the values on the x-axis, and the
probability of getting someone whose height is between two specified x-values is
given by the area under the curve between those two x-values. As you will learn in
this chapter, sometimes a probability distribution is defined by a formula.
Probability distributions for practical use come about in two different
ways—through data collection and through theory. If you want to know the
chance of a paper cup landing on its closed bottom when tossed, the best way
to find out is to toss the cup many times and collect data. On the other hand, if
you want the probability of an event that can be modeled by coin flipping, you
can use the fact that the probability of heads is always 0.5 and build from there.
This chapter begins with data collection but quickly moves to theory.
Some types of probability distributions occur so frequently in practice that
it is important to know their names and formulas. Among these are the binomial
and geometric distributions, which closely (but not perfectly) reflect many
real-world situations and so are used to model many applications that have
similar characteristics.
Vehicles per Household    Proportion of Households
0                         0.088
1                         0.332
2                         0.385
3                         0.137
4                         0.058

[Histogram: horizontal axis Vehicles_per_Household, 0 to 6; vertical axis Relative Frequency of Vehicles_per_Household.]
The first sequence, 391, represents a household with one motor vehicle
because 391 is in the interval 089–420. The second sequence, 545, represents a
household with two motor vehicles. So for the first duplex, you have a total of
three motor vehicles. In sampling this way, you might come across a sequence that
you have already used. You should go ahead and use it again because each random
sequence represents many households, not just one, and you want to keep the
probabilities the same for each household selected.
Such a simulation for 500 duplexes (each consisting of randomly selecting two
values from the distribution in Display 6.1 and adding them) is shown in Display
6.3. This distribution differs considerably from the one in Display 6.1 and could
not be anticipated without some clever work with probabilities.
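The same simulation can be run in software rather than with a table of random digits. This is a sketch under stated assumptions: it draws two households independently, with replacement, from the proportions in Display 6.1, and the function name `simulate_duplexes` and the seed are illustrative. Its output will resemble Display 6.3 but will not match it digit for digit, because the random draws differ.

```python
import random
from collections import Counter

# Display 6.1: vehicles per household and their proportions.
vehicles = [0, 1, 2, 3, 4]
proportions = [0.088, 0.332, 0.385, 0.137, 0.058]

def simulate_duplexes(n_duplexes, rng):
    """Return the relative frequency of total vehicles over n_duplexes."""
    totals = Counter(
        sum(rng.choices(vehicles, weights=proportions, k=2))
        for _ in range(n_duplexes)
    )
    return {t: totals[t] / n_duplexes for t in sorted(totals)}

rng = random.Random(2024)  # arbitrary seed, only for repeatability
for total, share in simulate_duplexes(500, rng).items():
    print(total, round(share, 3))
```

Drawing with replacement mirrors the advice above to reuse a random sequence that has already appeared, since each value stands for many households.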
Total Number of Vehicles    Proportion of Duplexes
0                           0.008
1                           0.058
2                           0.142
3                           0.306
4                           0.250
5                           0.160
6                           0.064
7                           0.010
8                           0.002

[Histogram: horizontal axis Total_Vehicles, 0 to 10; vertical axis Relative Frequency of Total_Vehicles.]
Solution
There are four possible outcomes for the two patients. With “yes” representing
“caused by smoking” and “no” representing “not caused by smoking,” the
possibilities are
no for 1st patient and no for 2nd patient
no for 1st patient and yes for 2nd patient
yes for 1st patient and no for 2nd patient
yes for 1st patient and yes for 2nd patient
The probabilities of these outcomes are
P(no for 1st patient and no for 2nd patient) = P(no for 1st) · P(no for 2nd)
= (0.1)(0.1) = 0.01
P(no for 1st patient and yes for 2nd patient) = (0.1)(0.9) = 0.09
P(yes for 1st patient and no for 2nd patient) = (0.9)(0.1) = 0.09
P(yes for 1st patient and yes for 2nd patient) = (0.9)(0.9) = 0.81
The first of these outcomes results in X = 0, the second and third each result in
X = 1, and the fourth results in X = 2. Because the second and third outcomes
are disjoint, their probabilities can be added. The probability distribution of X is
then given by this table:
x    Probability of x
0    0.01
1    0.09 + 0.09 = 0.18
2    0.81 ■
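The table can be rebuilt by looping over the four outcomes, multiplying along each one and adding the outcomes that give the same value of X. This sketch assumes, as the solution does, that the two patients are independent and that each has probability 0.9 of “yes”; the names are illustrative.

```python
from itertools import product

p_yes = 0.9  # P(caused by smoking) for each patient, from the example

distribution = {0: 0.0, 1: 0.0, 2: 0.0}
for first, second in product(("yes", "no"), repeat=2):
    # Independent patients: multiply the two probabilities.
    p_first = p_yes if first == "yes" else 1 - p_yes
    p_second = p_yes if second == "yes" else 1 - p_yes
    x = (first == "yes") + (second == "yes")  # X = number of "yes" patients
    distribution[x] += p_first * p_second     # disjoint outcomes: add

for x, p in distribution.items():
    print(x, round(p, 2))
```

The values are rounded only for display, because 1 − 0.9 is not stored exactly in floating point.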
Solution
There are ten ways to pick two teams from the five: ${}_5C_2 = \binom{5}{2} = 10$. These ten
pairs of teams, with the total possible attendance, are shown in Display 6.8.
The mean of a probability distribution for the random variable X is called its
expected value and is usually denoted by μX , or E(X).
You can report to your boss that the expected number of vehicles per
household is 1.745. However, you realize that you should give your boss an
estimate of how much a typical household is likely to differ from this average.
To calculate the standard deviation of the number of vehicles per household, you
find the expected value of the square of the deviations from the mean, which
is called the variance of the probability distribution. As always, the standard
deviation is then the square root of the variance. The variance of the distribution
in Display 6.1 on page 359 is given by
$$\sigma_X^2 = \sum (x_i - \mu_X)^2 \, p_i$$
where $p_i$ is the probability that the random variable X takes on the specific
value $x_i$. To get the standard deviation, take the square root of the variance.
Now your boss asks about the duplexes. For the duplexes, the simulated data
of Display 6.3 on page 360 should provide a good approximation of the actual
probability distribution of this random variable, call it Y. The mean, variance, and
standard deviation as calculated from the 500 simulated values in the frequency
table of Display 6.3 turn out to be
$$\mu_Y = 3.530 \qquad \sigma_Y^2 \approx 1.841 \qquad \sigma_Y \approx 1.357$$
The neighborhood of duplexes can expect to have 3.530 motor vehicles per duplex
but often will see up to 1.357 vehicles more or less than this.
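Both sets of summary values can be checked directly from the two tables. This sketch applies the definitions from the text (mean as the probability-weighted sum of values, variance as the expected squared deviation from the mean) to the proportions in Display 6.1 and Display 6.3; the function name `summarize` is illustrative.

```python
from math import sqrt

def summarize(dist):
    """Return (mean, variance, sd) of a distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance, sqrt(variance)

single = {0: 0.088, 1: 0.332, 2: 0.385, 3: 0.137, 4: 0.058}  # Display 6.1
duplex = {0: 0.008, 1: 0.058, 2: 0.142, 3: 0.306, 4: 0.250,
          5: 0.160, 6: 0.064, 7: 0.010, 8: 0.002}             # Display 6.3

for name, dist in (("single", single), ("duplex", duplex)):
    mean, variance, sd = summarize(dist)
    print(name, round(mean, 3), round(variance, 3), round(sd, 3))
```

The printed means agree with the 1.745 and 3.530 quoted in the text, and the duplex standard deviation agrees with 1.357.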
Solution
Student: I remember how to do this. First I have to make a reasonable
estimate of the center of each interval. For example, for the
$$\mu_X = \sum x_i \, p_i$$
$$(0 - 13.78)^2(0.06) + (3 - 13.78)^2(0.28)$$
You can use a calculator to quickly find the mean and variance of a probability
distribution listed in a table. The mean and variance for the data in Display 6.11
are shown here. [See Calculator Note 6A to learn how to calculate these statistics.]
b. Suppose the data had been given in a relative frequency table like this
one, which shows the proportion of times each value occurs. Fill in the
rest of the second column.
Value, x Proportion, f /n
5 0.24
6 —?—
8 —?—
c. Show that you can find the mean, $\bar{x}$, using the formula
$$\bar{x} = \sum x \cdot \frac{f}{n}$$
d. Discuss how the formula in part c relates to the formula for the expected
value, μ.
D6. Compare the mean number of vehicles in single-family households to the
mean number in a duplex. Compare the variances. What do you notice?
D7. Eighteen percent of high school boys and 10% of high school girls say they
rarely or never wear seat belts. Suppose one high school boy and one high
school girl are selected at random, with the random variable of interest
being the number in the pair who say they rarely or never wear seat belts.
Describe two ways of finding the expected value and standard deviation of
this random variable, at least approximately. (Use only the material that has
been presented in this chapter so far.)
D8. Define a random variable for Display 6.4 on page 361 that is different from
the two random variables in Display 6.5. Give its probability distribution
and compute its expected value and standard deviation.
D9. This sentence appeared in the British humor magazine Punch:
The figure of 2.2 children per adult female was felt to be in some respects
absurd, and a Royal Commission suggested that the middle classes be paid
money to increase the average to a rounder and more convenient number.
[Source: M. J. Moroney, Facts from Figures (Baltimore: Penguin Books, 1951).]
Who would find the figure absurd—the student or the statistician from the
dialogue on page 367?
Solution
First, note that the probabilities don’t sum to 1—it’s not even close. That’s because
the most likely outcome is winning nothing. So imagine another row with $0
for “Winnings” and 0.7793 for “Probability.” Using the expected-value formula,
you can verify that the expected value for this probability distribution is 0.6014.
This means that if you spend $1 to play this game, you “expect” to get 60.14¢
back in winnings. Of course, you can’t get this amount on any one play, but over
the long run that would be the average return. Another way to understand this
is to imagine playing the game 1000 times. You expect to get back $601.40, but
you will have spent $1000. The standard deviation of the winnings per game
is $4.040, which is quite large because of the possibility of winning one of the
larger amounts.
The expected value may not be of much importance to an individual player
(unless he or she is going to play many times) but it is of great importance to the
State of Wisconsin, which can expect to pay out $601.40 for every $1,000 bet. ■
$\mu_{c+dX} = c + d\mu_X$
$\sigma_{c+dX} = |d| \, \sigma_X$
$\cdots + (2700 - 1.804)^2(1/120000) \approx 146.89$
$\sigma_X = \sqrt{146.89} \approx 12.12$
The expected winnings are $1.804, with a standard deviation of $12.12.
■
$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$
$\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$
Note the difference in the previous two examples. In the first, there was one
randomly selected ticket and its value was tripled. In the second, there were three
randomly selected tickets and their values were added. The expected values are
the same, but there is more variability when the winnings from a single ticket are
tripled. The next example combines both sets of rules.
Solution
Let X be the number of hours per week taking dance lessons and Y be the number
of hours per week tutoring. The expected number of hours you take dance
lessons, $\mu_X$, is 0(0.4) + 1(0.3) + 2(0.3), or 0.9, with a standard deviation, $\sigma_X$, of
Taking the square root, the standard deviation, $\sigma_{12Y-8X}$, is about $14.78.
■
For most students, the most surprising rule says that to get the variance of
the difference of two independently selected variables, you add the individual
variances. Why add and not subtract? Activity 6.1a will help you see the reason.
$E(X) = \mu_X = \sum x_i p_i$
Estimating the expected value has many real-world applications. For example,
you can figure out your expected weekly savings or the break-even price for
insurance.
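For random variables X and Y and constants c and d,
• the mean and standard deviation of a linear transformation of X are given by
$\mu_{c+dX} = c + d\mu_X$
$\sigma_{c+dX} = |d| \, \sigma_X$
• if X and Y are independent, the variances of their sum and difference are given by
$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$
$\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$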
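A minimal Python sketch can check these rules numerically. It uses the dance-lesson distribution from the example for X; the distribution for Y, the constants c and d, and all function names are made up for illustration only.

```python
from itertools import product
from math import sqrt

def mean(dist):
    """Expected value of a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Expected squared deviation from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def transform(dist, c, d):
    """Distribution of c + d*X."""
    out = {}
    for x, p in dist.items():
        out[c + d * x] = out.get(c + d * x, 0.0) + p
    return out

def combine(dist_x, dist_y, sign=1):
    """Distribution of X + sign*Y, assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        out[x + sign * y] = out.get(x + sign * y, 0.0) + px * py
    return out

X = {0: 0.4, 1: 0.3, 2: 0.3}   # hours of dance lessons, from the example
Y = {1: 0.5, 2: 0.3, 4: 0.2}   # hypothetical second variable
c, d = 5, -8                   # hypothetical constants

print(mean(transform(X, c, d)), c + d * mean(X))
print(sqrt(variance(transform(X, c, d))), abs(d) * sqrt(variance(X)))
print(variance(combine(X, Y, +1)), variance(X) + variance(Y))
print(variance(combine(X, Y, -1)), variance(X) + variance(Y))
```

Each printed pair should agree up to rounding error, including the last line, where the variance of the difference equals the sum of the variances.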
Practice
Probability Distributions from Data
P1. Refer to Display 6.2 on page 359. Use this line of random digits to simulate
selecting two households at random and counting the number of motor vehicles
in both households together. Then repeat for two more households.
177324106845248
P2. As you saw in the example on page 362, 90% of lung cancer cases are caused by
smoking. How would you assign the numbers in a random digit table so that they
represent the distribution in Display 6.6? Use a random digit table (Table D on
page 828) to select a lung cancer patient at random and then tell whether
smoking was responsible for the patient’s lung cancer.
P3. The distribution in Display 6.16 gives the number of children per family in the
United States. Describe how to use this line from a random digit table to find
the number of children in a randomly selected family:
48830994251773890817
Use your process to find the total number of children in three randomly selected
families.
Number of Children    Proportion of Families
0                     0.524
1                     0.201
2                     0.179
3                     0.070
4 or more             0.026
[Figure: relative frequency histogram of children per family.]
Display 6.16 The number of children per family. [Source: U.S. Census Bureau,
Statistical Abstract of the United States, 2004–2005, www.census.gov.]
P4. Use the appropriate rule of probability from Chapter 5 and the fact that there
is an 0.088 chance that a single household will have no vehicles to compute the
probability that the two households in a duplex will have a total of zero
vehicles. How close is your computation to the estimate in Display 6.3 on
page 360?
$(0.27)(0.73)(0.27)(0.27)(0.73)(0.73)(0.73) = (0.27)^3(0.73)^4$
Another outcome with three grads is
$(0.73)(0.27)(0.27)(0.73)(0.73)(0.73)(0.27) = (0.27)^3(0.73)^4$
which is exactly the same.
Jack: That’s because the probabilities of the outcomes with exactly three
grads all have the same factors but in a different order.
Jill: And there are 35 of them because
$\binom{7}{3} = \frac{7!}{3!\,4!} = 35$
It’s a good thing we didn’t have to list all of them!
Jack: So the probability of getting exactly three college grads is
(number of ways to get 3 grads) × (probability of each way)
or
$\binom{7}{3}(0.27)^3(0.73)^4 \approx 0.196$
Jill: Yeah! Now we can do any problem they throw at us.
■
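Jack and Jill's reasoning (count the ways, then multiply by the probability of each way) can be sketched in Python. The function name `binomial_pmf` is an illustrative choice, not something defined in the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    ways = comb(n, k)                        # number of orderings with k successes
    each_way = p ** k * (1 - p) ** (n - k)   # probability of any one such ordering
    return ways * each_way

ways = comb(7, 3)
prob = binomial_pmf(3, 7, 0.27)
print(ways, round(prob, 3))  # 35 0.196
```

Summing `binomial_pmf(k, 7, 0.27)` over k = 0, 1, ..., 7 gives 1, which is a quick check that the eight probabilities form a distribution.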
You might have noticed that the trials in Jack and Jill’s example aren’t
really independent. The first adult they selected who was age 25 or older has
probability 0.27 of being a college graduate. If that person is a college graduate,
the probability that the next person selected is a college graduate is a bit less.
However, the change in probability is so small that Jack and Jill can safely ignore
it. If there are 150,000,000 adults age 25 and older in the United States, then there
would be 40,500,000 college graduates. The probability that the first adult selected
is a college graduate is 40,500,000/150,000,000, or 0.27. If that person turns out
to be a college graduate, the probability that the second adult selected is a
college graduate is 40,499,999/149,999,999, or 0.2699999951, which is very close
to 0.27.
You can treat your random sample as a binomial situation as long as the
sample size, n, is small compared to the population size, N. A simple guideline
that works well in practice is that n should be less than 10% of the size of the
population, or n < 0.10N.
[Margin note: It is generally safe to regard trials as independent for all
practical purposes if the condition n < 0.10N is satisfied.]
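The arithmetic behind this guideline can be reproduced with exact fractions. In this sketch the helper name `small_enough` is hypothetical; it simply encodes the n < 0.10N condition.

```python
from fractions import Fraction

N = 150_000_000      # adults age 25 and older (figure used in the text)
grads = 40_500_000   # 27% of N

p_first = Fraction(grads, N)
p_second_given_first = Fraction(grads - 1, N - 1)  # one graduate already selected

print(float(p_first))               # 0.27
print(float(p_second_given_first))  # just under 0.27

def small_enough(n, N):
    """Guideline from the text: treat trials as independent when n < 0.10N."""
    return n < 0.10 * N

print(small_enough(7, N))     # True: 7 adults out of 150,000,000
print(small_enough(20, 100))  # False: 20 is 20% of the population
```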
In Activity 6.2a, you’ll conduct the tap water vs. bottled water experiment
yourself. This activity will give you practice in statistical decision-making as you
decide how many subjects will have to correctly select the tap water before you are
convinced that people can tell the difference.
A general rule does seem to be a possibility, and that, indeed, is the case. As
a bonus, there is a general rule for the standard deviation that turns out to be just
about as simple as the one for the expected value.
[Figure: six binomial probability distributions, each plotting probability
against number of successes, for n = 5 with p = 0.1, 0.5, and 0.7 and for
n = 20 with p = 0.1, 0.5, and 0.7.]
Practice
Binomial Probabilities
P19. Suppose Jack and Jill ask six people who can’t tell the difference between
tap water and bottled water to identify the tap water. Use their method to make
a probability distribution table for six people. Then make a graph of the
distribution.
P20. Suppose you flip a coin eight times. What is the probability that you’ll get
exactly 3 heads? Exactly 25% heads? At least 7 heads?
P21. Suppose you roll a balanced die seven times. What is the probability that you
will get an even number exactly two times? More than half the time?
The Binomial Probability Distribution
P22. About 8.8% of people ages 14–24 are “dropouts,” persons who are not in regular
school and who have not completed the 12th grade or received a general
equivalency degree. Suppose you pick five people at random from this age group.
[Source: U.S. Census Bureau, Statistical Abstract of the United States, 2006,
Table 255.]
a. What is the probability that none of the five are dropouts?
b. What is the probability that at least one is a dropout?
c. Make a probability distribution table for this situation.
P23. Describe how to use simulation to construct an approximate binomial
distribution for the situation in P22. Conduct two trials of your simulation,
using these random digits:
89254 99538 18315 45716 36270 79665
49830 06226 88863 02322 36630 07176
P24. According to a recent government report, 73% of drivers now use seat belts
regularly. Suppose a police officer at a road check randomly stops four cars to
check for seat belt usage. Find the probability distribution of X, the number of
drivers using seat belts. [Source: National Highway Traffic Safety
Administration.]
Shape, Center, and Spread of a Binomial Distribution
P25. The median annual household income for U.S. households is about $44,400.
[Source: U.S. Census Bureau, www.census.gov.]
a. Among five randomly selected U.S. households, find the probability that four
or more have incomes exceeding $44,400 per year.
b. Consider a random sample of 16 U.S. households.
i. What is the expected number of households with annual incomes under $44,400?
ii. What is the standard deviation of the number of households with annual
incomes under $44,400?
iii. What is the probability of getting at least 10 out of the 16 households with
annual incomes under $44,400?
c. In a sample of 16 U.S. households, suppose none had annual incomes under
$44,400. What might you suspect about this sample?
Exercises
E23. If you roll a pair of dice five times, find the probability of each outcome.
a. You get doubles exactly once.
b. You get exactly three sums of 7.
c. You get at least one sum of 7.
d. You get at most one sum of 7.
E24. Suppose you select five numbers at random from 10 through 99, with repeats
allowed. Find the probability of each outcome.
a. Exactly three of the numbers are even.
b. Exactly one of the numbers has digits that sum to a number greater than or
equal to 9.
c. At least one of the numbers has digits that sum to a number greater than or
equal to 9.
E25. According to a recent Census Bureau report, 37 million Americans, or 12.7%
of the population, live below the poverty level. Suppose these figures hold true
for the region in which you live. You plan to randomly sample 25 Americans from
your region. [Source: Current Population Survey, 2005 Annual Social and Economic
Supplement, www.census.gov.]
a. What is the probability that your sample will include at least two people with
incomes below the poverty level?
b. What are the expected value and standard deviation of the number of people in
your sample with incomes below the poverty level?
E26. According to the U.S. Census Bureau, about 16% of residents have no health
insurance. You are to randomly sample 20 residents for a survey on health
insurance coverage. [Source: U.S. Census Bureau, Current Population Survey,
March 2004.]
a. What is the probability that your sample will include at least three people who
do not have health insurance?
b. What are the expected value and standard deviation of the number of people in
your sample without health insurance?
E27. You buy 15 lottery tickets for $1 each. With each ticket, you have a 0.06
chance of winning $10. Taking into account the cost of the tickets,
a. what is your expected gain (or loss) on this purchase?
b. what is the probability that you will gain $10 or more?
c. what is the standard deviation of your gain?
E28. An oil exploration firm is to drill ten wells, each in a different location.
Each well has a probability of 0.1 of producing oil. It will cost the firm
$60,000 to drill each well. A successful well will bring in oil worth $1 million.
Taking into account the cost of drilling,
a. what is the firm’s expected gain from the ten wells?
b. what is the standard deviation of the firm’s gain for the ten wells?
c. what is the probability that the firm will lose money on the ten wells?
d. what is the probability that the firm will gain $1.5 million or more from the
ten wells?
E29. A home alarm system has one detector for each of the n zones of the house.
Suppose the probability is 0.7 that the detector sounds an alarm when an intruder
passes through its zone and that this probability is the
[Figure: probability distributions of the number of trials to get the first
success, including panels for p = 0.1 and p = 0.3.]
Jack: Yep, we can do it. This is easier than our problem from Section 6.2
about the binomial distribution.
Jill: It sure is. Let’s do it for the blood bank example, where about 40%
of people have type A blood. There the probability that the first
donation checked is type A is 0.4.
Jack: Right. The probability that the waiting time, X, is 1 is P(X = 1) = 0.4.
Jill: Now for P(X = 2). For the second donation to be the first that is
type A . . .
Jack: What kind of nonsense is that? “For the second donation to be
the first . . . ?”
Jill: Sorry, let me say it with more words. Suppose the first donation
checked isn’t type A and the second donation is. Then we have our
first success with the second donation.
Jack: That’s better. The probability of this sequence of outcomes is
P(first donation isn’t type A and second is type A) = (0.6)(0.4),
or 0.24.
[Margin note: Always check conditions!]
Jill: But you get to multiply like that only if the events are independent.
What if a whole family had donated blood? Then their blood types
might not be independent.
Jack: Yeah, we will have to be careful about things like that. The
probability can’t change depending on who else has come in.
Jill: If we can assume independence, the probability that the
third donation checked will be the first that is type A is
P(X = 3) = (0.6)(0.6)(0.4).
Jack: Because we have to have two “failures” and then our first success.
Jill: This could go on forever. So we better get started and make a table.
Number of Trials
to Get First Success    Probability
1                       0.4
2                       (0.6)(0.4) = 0.24
3                       (0.6)(0.6)(0.4) = 0.144
4                       (0.6)(0.6)(0.6)(0.4) = 0.0864
5                       (0.6)(0.6)(0.6)(0.6)(0.4) = 0.05184
$P(X = k) = (1 - p)^{k-1} p$
for k = 1, 2, 3, . . . .
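The table Jack and Jill started can be reproduced with a short Python sketch of the formula, with k − 1 failures followed by one success. The function name `geometric_pmf` is chosen here for illustration.

```python
def geometric_pmf(k, p):
    """P(first success occurs on trial k), for independent trials."""
    if k < 1:
        raise ValueError("k must be 1, 2, 3, ...")
    return (1 - p) ** (k - 1) * p

p = 0.4  # probability that a donation is type A, from the blood bank example
table = {k: geometric_pmf(k, p) for k in range(1, 6)}
for k, prob in table.items():
    print(k, round(prob, 5))
```

The printed values match the table: 0.4, 0.24, 0.144, 0.0864, and 0.05184.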
Both the binomial and geometric random variables start with a sequence
of independent trials with two outcomes and constant probability of success.
The binomial counts the number of successes in a fixed number of trials; the
geometric counts the number of the trial on which the first success occurs. [See
Calculator Notes 6F and 6G to learn how to calculate geometric probabilities and
cumulative probabilities of a geometric distribution.]
Like the formulas for the binomial distribution, those for the expected value
and standard deviation of the geometric distribution turn out to be surprisingly
simple. Although using the mean as the measure of center and the standard
deviation as the measure of spread seemed complicated in Chapter 2, it pays
off now.
You can find a proof of the formula for the expected value in E42. The proof
of the formula for the standard deviation is a bit more involved.
$\mu_X = \frac{1}{p} = \frac{1}{0.4} = 2.5$
with standard deviation
$\sigma_X = \frac{\sqrt{1 - p}}{p} = \frac{\sqrt{1 - 0.4}}{0.4} \approx 1.94$
Note that the expected value is the expected number of donations that must
be checked to get one that is type A. It is not the expected number checked
before getting one that is type A. In this case, the expected number includes
1.5 donations that aren’t type A and one donation that is. ■
[Margin note: The expected value is the expected number of trials needed to get
the first success.]
$P(X = k) = (1 - p)^{k-1} p$
for k = 1, 2, 3, . . . and 0 < p < 1.
• The mean (expected value) of the distribution is
$E(X) = \mu_X = \frac{1}{p}$
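As a rough numerical check of the geometric formulas for the mean and standard deviation, the sketch below compares them with sums taken directly over the distribution. Cutting the infinite sum off at 500 terms is an assumption made here; the omitted tail is negligible for p = 0.4. The function names are illustrative.

```python
from math import sqrt

def geometric_mean(p):
    """Expected number of trials needed to get the first success."""
    return 1 / p

def geometric_sd(p):
    """Standard deviation of the number of trials to the first success."""
    return sqrt(1 - p) / p

p = 0.4
ks = range(1, 500)  # truncated stand-in for k = 1, 2, 3, ...
pmf = {k: (1 - p) ** (k - 1) * p for k in ks}

series_mean = sum(k * prob for k, prob in pmf.items())
series_var = sum((k - series_mean) ** 2 * prob for k, prob in pmf.items())

print(geometric_mean(p), series_mean)                 # both about 2.5
print(round(geometric_sd(p), 2), round(sqrt(series_var), 2))  # both about 1.94
```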
Practice
Waiting-Time Situations
P27. Suppose you are rolling a pair of dice and waiting for a sum of 7 to occur.
a. What is the probability that you get a sum of 7 for the first time on your
first roll? On your second roll?
b. Using the graphs in Display 6.29 as a guide, sketch an approximate graph of the
probability distribution of this situation.
The Formula for a Geometric Distribution
P28. About 85% of Americans over age 25 have graduated from high school. You are
randomly sampling and interviewing adults one at a time for an opinion poll that
applies only to high school graduates. [Source: U.S. Census Bureau, Statistical
Abstract of the United States, 2006.]
a. What is the probability that you get your first high school graduate on the
third person you interview?
b. What is the probability that you get your first high school graduate sometime
after the second person you interview?
c. Sketch an approximate distribution of the number of the interview on which you
get the first high school graduate.
P29. Suppose 9% of the engines manufactured on a certain assembly line have at
least one defect. Engines are randomly sampled from this line one at a time and
tested. What is the probability that the first nondefective engine is found
a. on the third trial?
b. before the fourth trial?
P30. About 70% of the time, the telephone lines coming into a concert ticket
agency are all busy. Suppose you are calling this agency.
a. What is the probability that it takes you only one try to get through?
b. What is the probability that it takes you two tries?
c. What is the probability that it takes you four tries?
d. What assumption are you making in computing these probabilities?
Expected Value and Standard Deviation
P31. The probability that a random blood donation is type B is 0.1.
a. What is the expected number of donations that must be checked to obtain the
first that is type B?
b. What is the standard deviation of the number of donations that must be checked
to obtain the first of type B?
c. What is the expected number of donations that must be checked to obtain two
that are type B? To obtain three that are type B?
P32. About 85% of Americans over age 25 have graduated from high school. You are
randomly sampling and interviewing adults one at a time for an opinion poll that
applies only to high school graduates.
a. What is the expected number of interviews you have to conduct to get the first
high school graduate?
b. What is the standard deviation of the number of interviews you have to conduct
to get the first high school graduate?
c. What is the expected number of interviews you have to conduct to get the first
ten high school graduates?
d. What is the standard deviation of the number of interviews you have to conduct
to get the first ten high school graduates?
Exercises
E35. You are participating in a “question bee” in history class. You remain in
the game until you give your first incorrect answer. The questions are all
multiple choice, each with four possible answers exactly one of which is correct.
Unfortunately, you have not studied for this bee, and you simply guess randomly
on each question.
a. What is the probability that you are still in the bee after the first round?
b. What is the probability that you are still in the bee after the third round?
c. What is the expected number of rounds you will be in the bee?
d. If an entire class of 32 is simply guessing on each question, how many students
Chapter Summary
In this chapter, you have learned that random variables are variables with
a probability attached to each possible numerical outcome. Probability
distributions, like data distributions, are characterized by their shape, center,
and spread. You can use similar formulas for the mean and standard deviation.
The binomial distribution is a model for the situation in which you count
the number of successes in a random sample of size n from a large population.
A typical question is “If you perform 20 trials, what is the probability of getting
exactly 6 successes?”
The geometric distribution is a model for the situation in which you count the
number of trials needed to get your first success. A typical question is “What is
the probability that it will take you exactly five trials to get the first success?”
Review Exercises
E43. Suppose you roll a dodecahedral (12-sided) die twice. Your summary statistic
will be the sum of the two rolls.
a. What is the probability that the sum is 3 or less?
b. Compute the mean and standard deviation of the distribution of outcomes of a
single roll of one die.
c. Compute the mean and standard deviation of the probability distribution of the
sum of two rolls.
E44. Suppose you and your partner each roll two dice. Each of you computes the
average of your two rolls. The summary statistic is the difference between your
average and your partner’s average. Describe the sampling distribution of these
differences.
E45. Two different pumping systems on levees have pumps numbered 1, 2, 3 and 4,
configured as in Display 6.32. In System I, water will flow from A to B only if
both pumps are working. In System II, water will flow from A to B if either pump
is working. Assume that each pump has this lifelength distribution: 1 month with
probability 0.1, 2 months with probability 0.3, and 3 months with probability 0.6.
Lifelength refers to the length of time the pump will work continuously without
repair. Assume that the pumps operate independently of each other. (You are
interested only in whether water will flow, not the amount of water flowing.)
E50. It is estimated that 16% of Americans have no health insurance. A polling
organization randomly samples 500 Americans to ask questions about their health.
[Source: U.S. Census Bureau, Current Population Survey, March 2004.]
a. What is the probability that more than 420 of those sampled will have health
insurance?
$P(X > k + m \mid X > m)$
This is referred to as the memoryless property of the geometric distribution.
E53. This question demonstrates one reason why the mean and variance are
considered so important. Suppose you select one book at random to read from
List A and one from
AP1. This table gives the percentage of women who ultimately have a given number
of children. For example, 19% of women ultimately have 3 children. What is the
probability that two randomly selected women will have a total of exactly
2 children?
Number of children    0    1    2    3    4    5 or more
AP4. Russell’s parents buy two apples each weekend, and put them in his lunch on
two randomly selected weekdays of the following week. What is the probability
that Russell gets an apple on exactly one day that is a Monday or Tuesday?
$\binom{5}{2}(0.4)^2(0.6)^3$
7 Sampling Distributions
[Figure: histogram of frequency against mean number of departing passengers.]
What would happen if you could take random samples over and over again from
your population? Sampling distributions show how much your results might vary
from sample to sample, as when estimating the mean number of departures from
U.S. airports for a given period of time.
You have studied methods that are good for exploration and description, but for
inference—going beyond the data in hand to conclusions about the population
or the process that created the data—you need to collect the data by using a
random sample (a survey) or by randomly assigning treatments to subjects (an
experiment). The promise of Chapter 4 was that if you used those methods to
produce a data set, you then could use your data to draw sound conclusions.
Randomized data production not only protects against bias and confounding but
also makes it possible to imagine repeating the data production process so you
can estimate how the summary statistic you compute from the data would vary
from sample to sample. To oversimplify, but only a little, a sampling distribution is
what you get by repeating the process of producing the data and computing the
summary statistic many times.
[Figure: distribution of the mean age for n = 3, on an axis from 30 to 60, with
the value 58 marked.]
Display 7.1 A simulated sampling distribution of the mean age
from random samples of three people who could
have been laid off at Westvaco.
In the Westvaco analysis, you went through four steps in using simulation to
generate an approximate sampling distribution of the mean age of three workers:
1. Random sample: Take a random sample of a fixed size n from a population.
2. Summary statistic: Compute a summary statistic.
3. Repetition: Repeat steps 1 and 2 many times.
4. Distribution: Display the distribution of the summary statistics.
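The four steps translate almost line for line into a simulation. The sketch below is only an illustration: the list of ages is made up (it is not the Westvaco data), and the function name, the seed, and the number of repetitions are arbitrary choices.

```python
import random
from collections import Counter
from statistics import mean

def simulate_sampling_distribution(population, n, statistic, reps, seed=1):
    """Steps 1-3: sample, summarize, and repeat; returns the list of statistics."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):                    # step 3: repetition
        sample = rng.sample(population, n)   # step 1: random sample, no replacement
        values.append(statistic(sample))     # step 2: summary statistic
    return values

ages = [22, 31, 34, 41, 47, 52, 53, 58, 60, 63]  # hypothetical ages of ten workers
means = simulate_sampling_distribution(ages, n=3, statistic=mean, reps=200)

# Step 4: display the distribution as a rough text histogram of rounded means.
counts = Counter(round(m) for m in means)
for value in sorted(counts):
    print(f"{value:3d} {'*' * counts[value]}")
```

Passing a different function as `statistic` (for example `max` or `sum`) gives the simulated sampling distribution of that statistic instead.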
[Figures: distributions of Rectangle_Area (axis from 0 to 20) and Mean_Area
(axis from 0 to 16).]
Solution
There are 5C2, or 10, equally likely ways to select two national parks, as shown in
Display 7.5. The relative frequency histogram shows the sampling distribution
of the sum. The probability that you will have to map more than 600 square
miles is 4/10.
Sample of Two Parks Total Area (sq mi)
A and B 175
A and C 646
A and R 497
A and Z 348
B and C 583
B and R 434
B and Z 285
C and R 905
C and Z 756
R and Z 607
[Figure: relative frequency histogram of Total_Square_Miles.]
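Because there are only ten possible samples, the exact sampling distribution of the sum can be listed by brute force. This sketch uses the park areas from Display 7.11; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Areas in square miles, from Display 7.11.
areas = {"A": 119, "B": 56, "C": 527, "R": 378, "Z": 229}

# Every equally likely sample of two parks and its total area.
totals = {pair: areas[pair[0]] + areas[pair[1]] for pair in combinations(areas, 2)}
for pair, total in totals.items():
    print(" and ".join(pair), total)

over_600 = sum(1 for total in totals.values() if total > 600)
p_over_600 = Fraction(over_600, len(totals))
print(over_600, len(totals), float(p_over_600))  # 4 10 0.4
```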
[Figure: relative frequency histogram of Sample_Maximum, on an axis from 10
to 90.]
Solution
The population maximum is 93, but the mean of the sampling distribution is only
56. If you could repeat your process of estimating the population maximum by
using the maximum in the sample, on average your estimate would be too small.
In other words, the sample maximum is a biased estimator of the population
maximum. That’s not too surprising, because the maximum of a sample can
never be larger than the maximum of its population.
[Margin note: The sample maximum can never be too big!]
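The bias can be seen by listing every possible sample from a small population. The population and sample size below are invented for illustration (only the maximum, 93, echoes the example); they are not the data behind the histogram.

```python
from itertools import combinations
from statistics import mean

population = [12, 25, 38, 47, 61, 70, 93]  # hypothetical values; maximum is 93
n = 3                                      # hypothetical sample size

# The maximum of every possible sample of size n, each sample equally likely.
sample_maxima = [max(sample) for sample in combinations(population, n)]
expected_max = mean(sample_maxima)

print(max(population), round(expected_max, 1))
print(all(m <= max(population) for m in sample_maxima))  # True
```

No sample maximum exceeds 93 and many fall below it, so the average of the sample maxima has to be smaller than the population maximum.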
description of each of the planets’ moons.
Planet     Number of Moons
Mercury    0
Venus      0
Earth      1
Mars       2
Jupiter    63
Saturn     56
Uranus     27
Neptune    13
Display 7.7 Number of moons for the planets in our solar system. [Source: NASA,
solarsystem.nasa.gov.]
a. What is the smallest number of words you might have to write? The largest?
b. Describe how to generate a simulated sampling distribution of the total number
of moons you must describe.
c. Generate 20 values for a simulated
[Figure: frequency histogram of exam scores, on an axis from 30 to 110.]
Display 7.8 A distribution of exam scores.
a. Match each histogram in Display 7.9 to its description.
I. the individual scores for one random sample of 30 students
II. a simulated sampling distribution of the mean of the scores of 100 random
samples of 4 students
III. a simulated sampling distribution of the mean of the scores of 100 random
samples of 30 students
[Figure: frequency histograms labeled A, B, and C.]
a. What is the most you could be paid? The least?
b. Construct the sampling distribution of your total possible earnings.
c. What is the probability that you will be paid $3 million or more?
Properties of Point Estimators
P5. The areas of the five national parks in Utah are given in Display 7.11.
National Park        Area (sq mi)
Arches (A)           119
Bryce Canyon (B)     56
Canyonlands (C)      527
Capitol Reef (R)     378
Zion (Z)             229
Display 7.11 Areas of the five national parks in Utah.
Exercises
E1. Random samples are taken from the population of random digits 0 through 9,
with replacement.
a. Each histogram in Display 7.14 is a simulated sampling distribution of the
sample mean. Match each sampling distribution to the sample size used:
1, 2, 20, or 50.
b. Compare the means. How does the mean of the sampling distribution depend on
the sample size?
c. Compare the spreads. How does the spread of the sampling distribution depend
on the sample size?
E2. Three very small populations are given, each with a mean of 30.
A. 10 50
B. 10 20 30 40 50
C. 20 30 40
Match each population to the sampling distribution of the sample mean (Display
7.15) for a sample of size 2 (taken with replacement).
[Figure: frequency histograms labeled A, B, C, and D.]
Display 7.14 Histograms of sample means of random digits.
[Figure: relative frequency histograms labeled I, II, and III.]
E3. The mean and the median are only two of many possible measures of center.
Another
FL 37.5     NY 31.4
GA 37.7     OH 26.4
HI 4.1      OK 44.8
IA 36.0     OR 62.1
ID 53.5     PA 29.0
IL 36.1     RI 0.8
IN 23.1     SC 19.9
KS 52.6     SD 49.4
KY 25.9     TN 27.0
LA 30.6     TX 170.8
MA 5.3      UT 54.3
MD 6.7      VA 26.1
ME 21.3     VT 6.1
MI 37.4     WA 43.6
MN 54.0     WI 35.9
MO 44.6     WV 15.5
MS 30.5     WY 62.6
[Figure: frequency histogram of the mean area for n = 5, on an axis from 15
to 115.]
Display 7.18 A simulated sampling distribution of the sample mean of 5 state
areas.
d. From what particular sample could the largest value in the plot in Display 7.18
have come?
E5. The five tennis balls in a can have diameters 62, 63, 64, 64, and 65 mm.
Suppose you select two of the tennis balls at random, without replacing the first
before selecting the second.
a. Construct a dot plot of the five population values.
b. List all possible sets of size 2 that can be chosen from the five balls. There
are 5C2
[Figure: frequency histogram of DoubleIQR.]
a. What is the value of the population parameter Joel is trying to estimate?
c. Compute the variance of each sample, dividing by n = 2, and enter it in the
third column. (You should be able to do this in your head.)
d. Compute the variance of each sample, dividing by n − 1, or 1, and enter it in
the fourth column.
e. Compute the average of each column.
f. Compute the variance of the population {2, 4, 6} using the formula on page 366
with the probabilities of selection being
[Figures: frequency histograms, including one of SD_Dividing_by_n_minus_1 and one
titled Sample of Rectangles Histogram.]
In the rest of this section, you’ll see if the results from Activity 7.2a, step 8, are
true for other distributions.
[Figure: relative frequency histogram of the number of children per family.]
If you count all families with four or more children as having four children,
this highly skewed population has a mean of about 0.9 and a standard deviation of
about 1.1. [See Calculator Note 6A to review how to calculate the mean and standard
deviation of a probability distribution.]
Suppose you are working for a video game company that wants to sample
families to study interests of children. What will the sampling distributions of the
mean number of children per family look like? These four steps review how to
construct a simulated sampling distribution of the mean for samples of size 4.
1. Random sample: Take a random sample from a population.
In your model, there should be a 0.524 chance that a randomly selected family
will have no children, and so on. Use a table of random digits, as you did in
Section 5.2, or use a calculator’s random number generator to select a random
sample of four families from this distribution.
[Figure: relative frequency histograms of the sample mean, one for each sample
size, including panels labeled n = 10, n = 20, and n = 40.]
Sample Size, n    Mean     Standard Error, SE
1                 0.873    1.1
4                 0.873    0.55
10                0.873    0.35
20                0.873    0.25
40                0.873    0.17
Population        0.873    1.1
Display 7.25 Sampling distributions of the sample mean for samples of size 1, 4, 10, 20, and 40.
DISCUSSION Shape, Center, and Spread of the Sampling Distribution of x̄
D4. Why is it the case that the sampling distribution of the mean for samples of
size 1 is identical to the population distribution?
D5. The scatterplot in Display 7.26 shows the standard error, SE, plotted
against the sample size, n, for the table in Display 7.25 on page 429. Find a
transformation that linearizes these points. Use the transformation and the
equation of the resulting least squares line to justify the rule SE = σ/√n.
[Scatterplot: Standard Error, SE (vertical axis, 0.0 to 1.2) against Sample Size, n
(horizontal axis, 0 to 45)]
Display 7.26 SE plotted against n.
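One possible approach to D5, sketched in Python: replace n with 1/√n and fit a least squares line to the values in the table of Display 7.25. The choice of 1/√n is one candidate transformation, not the only one you could try, and the helper `least_squares` is a hypothetical name written out by hand.

```python
import math

# Sample sizes and standard errors from the table in Display 7.25.
n_values = [1, 4, 10, 20, 40]
se_values = [1.1, 0.55, 0.35, 0.25, 0.17]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Candidate transformation: plot SE against 1 / sqrt(n) instead of n.
x_values = [1 / math.sqrt(n) for n in n_values]
slope, intercept = least_squares(x_values, se_values)
print(f"SE = {slope:.2f} * (1 / sqrt(n)) + {intercept:.2f}")
```

With these table values the slope comes out close to 1.1, the population standard deviation, and the intercept close to 0, which is what the rule SE = σ/√n predicts.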
D6. Justify the comment “Large samples are better, because the sample mean
tends to be closer to the population mean.”
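[Figure: normal curve for the sample mean centered at 0.9, with the area to the left
of x̄ = 1.5 shaded and labeled P = –?–; horizontal axis: Sample Mean]
Display 7.27 Sampling distribution of x̄ when μ_x̄ = 0.9, σ_x̄ = 0.25, and n = 20.
The shaded area shows the probability that the sample mean is less than 1.5.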
The z-score for the value 1.5 is
$$z = \frac{\bar{x} - \text{mean}}{\text{standard deviation}} = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1.5 - 0.9}{0.25} = 2.4$$
You can use a table or a calculator to find that the area under a standard
normal curve to the left of 1.5 (a z-score of 2.4) is about 0.9918, which is the
probability that the sample mean will fall below 1.5. In a random sample of
20 families, it is almost certain that the average number of children per family
will be less than 1.5.
Using a calculator or Table A on page 824, the probability that z is less than 2.40 is
about 0.9918. ■
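As a check on the table lookup, the same probability can be computed with Python's standard library. This is a small sketch that uses the values from the example (mean 0.9, standard error 0.25, sample mean 1.5); the variable names are illustrative.

```python
from statistics import NormalDist

mu_xbar = 0.9      # mean of the sampling distribution of the sample mean
sigma_xbar = 0.25  # standard error for samples of size 20
x_bar = 1.5        # value of the sample mean of interest

# z-score: how many standard errors x_bar lies above the mean.
z = (x_bar - mu_xbar) / sigma_xbar
# Area under the standard normal curve to the left of z.
probability = NormalDist().cdf(z)
print(round(z, 2), round(probability, 4))
```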
Practice
Shape, Center, and Spread of the Sampling Distribution of x̄
P7. The distribution of the population of the number of motor vehicles per household
and sampling distributions of the mean for samples of size 4 and 10 are shown in
Display 7.28.
a. Which distribution is which? Make a rough estimate of the mean and standard
deviation of each distribution.
[Histograms I, II, and III: relative frequency]
[Histogram: Population, horizontal axis 0 to 10]
Display 7.30 A skewed population.
[Histograms of simulated sampling distributions, including panel B: Measures from
Sample of Skewed Population; relative frequency of Sample_Mean]
a. Match the theoretical summary information with the correct simulated
sampling distribution (A, B, or C) and the correct sample size (2, 4, or 25).
I. mean 2.50; standard error 0.48
II. mean 2.50; standard error 1.20
c. Does the sampling distribution appear to be approximately normal in all cases?
If not, explain how the given shape came about.
d. For which sample sizes would it be reasonable to use the rule stating that
about 95% of all sample means lie within approximately two standard errors of the
population mean?
[Histograms of simulated sampling distributions, including panels B and C: Measures
from Sample of M-shaped Population; relative frequency of Sample_Mean]
a. Match the theoretical summary information with the correct simulated
sampling distribution (A, B, or C) and the correct sample size (2, 4, or 25).
I. mean 4.50; standard error 1.75
II. mean 4.50; standard error 0.70
III. mean 4.50; standard error 2.47
b. Does the rule for computing the standard error of the mean from the standard
deviation of the population appear to hold in all three situations?
c. Does the sampling distribution appear to be approximately normal in all cases?
If not, explain how the given shape came about.
to get a random sample of size 5 from the population in E16. (You will have to estimate
the percentages in the population from the histogram.)
E19. Suppose police records in a small city show that the number of automobile accidents
per day for 1,045 days has the frequency distribution shown in Display 7.34. The
relative frequencies, in order, are 0.36, 0.37, 0.17, 0.09, and 0.01.
[Histograms: Frequency against Mean for –?– Days]
[Histograms: Frequency against Sample Mean, including panel B]
what is the probability that it weighs less than 0.148 g?
b. If you select four ball bearings at random, what is the probability that their mean
weight is less than 0.148 g?
c. If you select ten ball bearings at random, what is the probability that their mean
Display 7.40 shows the exact sampling distributions for samples of size 10, 20,
and 40 drawn from a population with 60% “successes.” They should be similar in
shape, center, and spread to your dot plots from Activity 7.3a.
[Histograms: relative frequency of p̂ for three sampling distributions, with n = 10,
n = 20, and n = 40, each with p = 0.6]
Display 7.40 Exact sampling distributions of p̂ for samples of size 10, 20, and 40 when
p = 0.60.
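Exact distributions like those in Display 7.40 can be computed from the binomial formula. The sketch below is one way to do that in Python; it also compares the mean and standard deviation of each exact distribution with p and with the standard formula √(p(1 − p)/n). The function names are hypothetical.

```python
import math

def exact_distribution(n, p):
    """Return {p_hat: probability} for the sample proportion of n trials."""
    return {k / n: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def mean_and_sd(dist):
    """Mean and standard deviation of a discrete probability distribution."""
    mean = sum(value * prob for value, prob in dist.items())
    variance = sum((value - mean) ** 2 * prob for value, prob in dist.items())
    return mean, math.sqrt(variance)

p = 0.6
for n in (10, 20, 40):
    mean, sd = mean_and_sd(exact_distribution(n, p))
    formula_sd = math.sqrt(p * (1 - p) / n)
    print(n, round(mean, 3), round(sd, 3), round(formula_sd, 3))
```

For each sample size the mean stays at p while the spread shrinks, in line with the two facts listed after the display.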
These formulas tell you two facts about the sampling distribution of p̂:
• The mean does not change depending on the sample size. No matter how
large the sample size, the mean stays at p.
• The spread decreases as the sample size increases.
The properties of the sampling distribution of sample proportions are
summarized in this box, followed by an example showing how to use them.
where p_i is the probability that the random variable takes on the specific
value x_i. The standard deviation is the square root of the variance.
Practice
Sampling Distribution of the Number of Successes
P16. A survey of hundreds of thousands of college freshmen found that 63% believe
“dissent is a critical component of the political process.” Suppose you take a random
sample of 100 of the freshmen surveyed. What is the probability that you will find
that between 56 and 70 of the freshmen in your sample believe this?
[Source: Higher Education Research Institute, UCLA,
The American Freshman, National Norms for Fall 2005.]
10, with p = 0.1. Compute the mean and SE using the formulas given in D12
on page 452. Do these formulas give the same mean and SE as the formulas you
used in part a?

p̂      Probability
0.0    0.348678
0.1    0.387420

[Histograms: relative frequency for the sampling distributions with n = 10 and
p = 0.10 and with n = 20 and p = 0.10]
Exercises
E35. The ethnicity of about 92% of the population of China is Han Chinese. Suppose
you take a random sample of 1000 Chinese. [Source: CIA World Factbook.]
a. Make an accurate sketch, with a scale on the horizontal axis, of the sampling
distribution of the proportion of Han Chinese in your sample.
b. Make an accurate sketch, with a scale on the horizontal axis, of the sampling
distribution of the number of Han Chinese in your sample.
c. What is the probability of getting 90% or fewer Han Chinese in your sample?
d. What is the probability of getting 925 or more Han Chinese?
e. What numbers of Han Chinese would be rare events? What proportions?
E36. According to the U.S. Census Bureau, 22.3 percent of the Spanish-surnamed
population in the United States have one of these surnames: Garcia, Martinez,
Rodriguez, Lopez, Hernandez, Gonzalez, Perez, Sanchez, Rivera, Ramirez, Torres,
Gonzales. Suppose you take a random sample of 500 Spanish-surnamed people in
the United States. [Source: David L. Word and R. Colby Perkins, Jr., Building a
Spanish Surname List for the 1990’s—A New Approach to an Old Problem, Technical
Working Paper no. 13, March 1996.]
a. Make an accurate sketch of the sampling distribution of the proportion of people
in your sample who have one of these surnames.
b. Make an accurate sketch of the sampling distribution of the number of people
in your sample who have one of these surnames.
c. What is the probability of getting 20% or fewer with one of these surnames in your
sample?
d. What is the probability of getting 105 or more people with one of these surnames?
e. What numbers of people with one of these surnames would be rare events?
What proportions?
7.3 Sampling Distribution of the Sample Proportion 455
E37. Refer to the situation in E35. This time, suppose you take a random sample of
100 Chinese rather than 1000.
a. Describe how the shape, center, and spread of the sampling distribution of
the proportion of Han Chinese in your sample will be different from your sketch
in E35, part a.
b. Describe how the shape, center, and spread of the sampling distribution of the
number of Han Chinese in your sample will be different from your sketch in E35,
part b.
c. With a sample of size 100, will the probability of getting 90% or fewer
Han Chinese in your sample be larger or smaller than the probability you
computed in E35, part c? Explain.
E38. Refer to the situation in E36. This time, suppose you take a random sample of
100 Spanish-surnamed Americans rather than 500.
a. Describe how the shape, center, and spread of the sampling distribution of
the proportion of people in your sample with one of the given surnames will be
different from your sketch in E36, part a.
b. Describe how the shape, center, and spread of the sampling distribution of the
number of people with one of the given surnames will be different from your
sketch in E36, part b.
c. With a sample of size 100, will the probability of getting 20% or fewer with
one of the given surnames be larger or smaller than the probability you
computed in E36, part c? Explain.
E39. In 1991, the median age of residents of the United States was 33.1 years.
[Source: U.S. Census Bureau, www.census.gov.]
a. What is the probability that one person, selected at random, will be under the
median age?
b. In a random sample of 50 people from the United States, what is the probability
of getting 10 or fewer under the median age?
c. Of the 50 Westvaco employees listed in Display 1.1 on page 5, 10 were under
the median age. Is this about what you would expect from a random sample
of 50 residents of the United States, or should you conclude that this group is
special in some way? If you think the group is special, what is special about it?
E40. In fall 2004, 37% of the 38,859 first-year students attending the California State
University system needed remedial work in mathematics. [Source: California State
University, www.asd.calstate.edu.]
a. Suppose you select 136 students at random from this population of students.
Make an accurate sketch, with a scale on the horizontal axis, of the sampling
distribution of the number of students who need remedial work.
b. What is the probability that 68 or fewer in a random sample of 136 students need
remedial work?
c. Suppose you select 2850 students at random. Make an accurate sketch, with
a scale on the horizontal axis, of the sampling distribution of the proportion
who need remedial work.
d. What is the probability of getting 54% or more who need remedial work in a
random sample of 2850 students?
e. Of the 2850 students entering California State University, Northridge, 54% needed
remedial work. Is this result about what you would expect from a random sample,
or should you conclude that this group is special in some way?
456 Chapter 7 Sampling Distributions
E41. About 60% of married women are employed. If you select 75 married women,
what is the probability that between 30 and 40 of them are employed? What
assumptions underlie your computation? [Source: U.S. Census Bureau,
Statistical Abstract of the United States, 2006.]
E42. Suppose 80% of a certain brand of computer disk contain no bad sectors. If
100 such disks are inspected, what is the approximate chance that 15 or fewer
contain bad sectors? What assumptions underlie this approximation?
E43. The histograms in Display 7.46 are sampling distributions of p̂ for samples of
size 5, 25, and 100, first for a population with p = 0.2 and then for a population
with p = 0.4.
a. Do the means of the sampling distributions depend on p? On n?
b. How do the spreads of the sampling distributions depend on p and n?
c. How do the shapes of the sampling distributions depend on p and n?
d. For which combination(s) of p and n would you be willing to use the rule that
roughly 95% of the values lie within two standard errors of the mean?
E44. The guideline states that it is appropriate to use the normal distribution as an
approximation for the sampling distribution of a sample proportion if both np and
n(1 − p) are greater than or equal to 10. To check this out, generate simulated
sampling distributions for values of p equal to 0.90 and 0.98. Use sample sizes of
50, 100, and 500 with each value of p. Does the guideline appear to be reasonable?
E45. In this exercise you will learn why the introduction to this section said that the
properties of sample proportions parallel the properties of sample means in the
previous section. Recall that 60% of Mississippians use seat belts. Imagine the
population of Mississippians as consisting of a barrel containing one piece of paper
per Mississippian. Those Mississippians who wear seat belts are represented by a
piece of paper with the number 1 on it. Those Mississippians who don’t wear seat
belts are represented by a piece of paper with the number 0 on it. Display 7.47 (on
the next page) shows this population.
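The guideline quoted in E44 can be written as a small check. This Python sketch only evaluates the guideline itself; it does not replace the simulation the exercise asks for. The function name and the tiny tolerance (added to absorb floating-point rounding in products such as 100 × (1 − 0.9)) are assumptions of this sketch.

```python
def normal_approximation_ok(n, p, threshold=10):
    """Guideline check: both np and n(1 - p) should be at least `threshold`.

    A tiny tolerance absorbs floating-point rounding, so a product that is
    exactly 10 in arithmetic is not rejected as 9.999999999999998.
    """
    tolerance = 1e-9
    return n * p >= threshold - tolerance and n * (1 - p) >= threshold - tolerance

for p in (0.90, 0.98):
    for n in (50, 100, 500):
        print(f"p = {p}, n = {n}: guideline met? {normal_approximation_ok(n, p)}")
```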
[Histograms: relative frequency of p̂ for six sampling distributions: p = 0.2 with
n = 5, 25, and 100, and p = 0.4 with n = 5, 25, and 100]
Display 7.46 Sampling distributions of p̂ for p = 0.2 and p = 0.4 for samples of size
n = 5, 25, and 100.
Chapter Summary
In this chapter, you have learned to create a sampling distribution of a summary
statistic. To create an approximate sampling distribution using simulation, you
first define a process for taking a random sample from the given population.
You then take this random sample and compute the summary statistic you
are interested in. Finally, you generate a distribution of values of the summary
statistic by repeating the process. For some simple situations, you can list all
possible samples and get the exact distribution of the summary statistic.
Some summary statistics have easily predictable sampling distributions.
• If a random sample of size n is taken from a population with mean μ and
variance σ², the sampling distribution of the sample mean, x̄, has mean and
standard error
$$\mu_{\bar{x}} = \mu \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The sampling distribution of the sum of the values in the sample has mean
and standard error
$$\mu_{\text{sum}} = n\mu \quad\text{and}\quad \sigma_{\text{sum}} = \sigma\sqrt{n}$$
These formulas are summarized in Display 7.48.
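The rules for the sample mean and the sample sum can be tried numerically. The Python sketch below uses μ = 0.873 and σ = 1.1, the values quoted in Section 7.2 for the number of children per family, purely as illustrative inputs; the function names are hypothetical.

```python
import math

def mean_and_se_of_sample_mean(mu, sigma, n):
    """Mean and standard error of the sample mean for random samples of size n."""
    return mu, sigma / math.sqrt(n)

def mean_and_se_of_sample_sum(mu, sigma, n):
    """Mean and standard error of the sample sum for random samples of size n."""
    return n * mu, sigma * math.sqrt(n)

mu, sigma = 0.873, 1.1  # illustrative population mean and standard deviation
for n in (1, 4, 10, 20, 40):
    _, se_mean = mean_and_se_of_sample_mean(mu, sigma, n)
    mean_sum, se_sum = mean_and_se_of_sample_sum(mu, sigma, n)
    print(n, round(se_mean, 2), round(mean_sum, 3), round(se_sum, 2))
```

The standard error of the mean shrinks as n grows, while the standard error of the sum grows.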
the difference of the mean enzyme level of hamsters raised in short days and the mean
enzyme level of hamsters raised in long days.
a. What is the value of the summary statistic d for Kelly’s hamsters?
b. Suppose the length of a day makes no difference in enzyme levels, that is,
suppose the eight numbers would have been the same if the hamsters had all
received the opposite treatment. Use simulation to construct an approximate
sampling distribution of all possible values of d. In other words, assign the hamsters
at random so that four get each treatment but the enzyme level for the hamster is the
same no matter what treatment it gets.

Display 7.53 Histogram of passengers for the 30 largest world airports. [Source: Airports
Council International, www.airports.org.]
The histograms and summary statistics in Display 7.54 show simulated distributions of
5000 sample means for samples of size 5, 10, and 20, selected without replacement from
the numbers of passengers.
Check how well the shapes, means, and standard deviations of the simulated
sampling distributions agree with what the theory says they should be. Do you see any
reason why the theory you have learned should not work well in any of these cases?
[Histograms: relative frequency of the Mean Number of Passengers (in thousands) for
n = 10 and for n = 20]
Display 7.54 Summary statistics and histograms of the distribution of sample means (in
thousands) for samples of size 5, 10, and 20.

morning or afternoon. Her data are shown in Display 7.55.

Day          Morning Commute Time (in minutes)    Afternoon Commute Time (in minutes)
Monday       16                                   9
Tuesday      14                                   8
Wednesday    13                                   5
Thursday     11                                   7
Friday       10                                   11

Display 7.55 Commute times.
a. What are the mean μ and variance σ² of the morning commute times? Of the
afternoon commute times?
b. If the student selects a day at random and finds the total commute time for that
day, what are the mean and variance of the sampling distribution of this total
commute time?
c. Are your answers in part b equal to the sum of those in part a? Explain why they
should or shouldn’t be equal.
AP1. Five math teachers are asked how many pens they are currently carrying, and the
results are 1, 1, 1, 2, 2. Random samples of size two are taken from this population
(without replacement). What is the median of the sampling distribution of the median?
  1
  1.4
  1.5
  2
  none of the above
AP2. With random sampling, which of the following is the best reason not to use the
sample maximum as an estimator for the population maximum?
  The sample maximum has too much variability.
  The sample maximum is biased.
  The sample maximum is difficult to compute.
  The sample range is an unbiased estimator.
  The sample maximum does not have a normally distributed sampling distribution.
AP3. In computing the sample standard deviation, the formula calls for a division
by n − 1. Which of the following is the best reason for dividing by n − 1 instead of n?
  For averages, always divide by n, and for standard deviations, always divide by n − 1.
  You use only n − 1 data values when computing standard deviations.
  Dividing by n − 1 gives less variation in the sampling distribution of the population standard deviation.
  Dividing by n − 1 makes the sample variance an unbiased estimator of the population variance in random sampling.
  The sampling distribution of the sample standard deviation is closer to normal when dividing by n − 1.
AP4. The distribution of the population of the millions of household incomes in
California is skewed to the right. Which of the following best describes what happens
to the sampling distribution of the sample mean when the size of a random sample
increases from 10 to 100?
  Its mean gets closer to the population mean, its standard deviation gets closer to the population standard deviation, and its shape gets closer to the population’s shape.
  Its mean gets closer to the population mean, its standard deviation gets smaller, and its shape gets closer to normal.
  Its mean stays constant, its standard deviation gets closer to the population standard deviation, and its shape gets closer to the population’s shape.
  Its mean stays constant, its standard deviation gets smaller, and its shape gets closer to normal.
  None of the above
AP5. The scores on a standardized test are normally distributed with mean 500 and
standard deviation 110. In a randomly selected group of 100 test-takers, what is
the probability that the mean test score is above 510?
  less than 0.0001
  0.1817
  0.4638
  0.5362
  0.8183
AP6. A statistics teacher claims to be able to guess, with better than 25% accuracy, which
of four symbols (circle, wavy lines, square, or star) is printed on a card. To test this
claim, she guesses the symbol on 40 cards. If you use the normal approximation to the
binomial to compute the probability that
[Figure: horizontal axis Sample Proportion, 0.20 to 0.70]
or about 50 ± 10 successes.
■
Even though the Pew Research Center doesn’t know the value of p, the actual
proportion of young singles who aren’t in a committed relationship and are
not actively looking for a romantic partner, it can use the idea in the preceding
example. For each possible value of p, Pew can compute how close to p most
sample proportions will be. By knowing the variability expected in random
samples taken from populations with different values of p, Pew can estimate how
close p̂ should be to the “truth.”
In Activity 8.1a, you will collect data about the proportion of students who can
make the Vulcan salute. Later in this section, you will learn to find a confidence
interval for the percentage of all students who can make the Vulcan salute.
[Chart: blank grid with the vertical axis scaled from 0 to 1.0; horizontal axis:
Proportion of Successes in the Sample, 0.1 to 1.0]
Display 8.1 Reasonably likely sample proportions for samples of size 40.
The line segments you drew on your copy of Display 8.1 in Activity 8.1b show
reasonably likely sample proportions when taking random samples of size 40
from a population with a given proportion of successes p.
[Chart: horizontal line segments of reasonably likely sample proportions, with the
segment for p = 0.75 labeled; horizontal axis: Proportion of Successes in the Sample]
Display 8.2 A complete chart of reasonably likely outcomes for samples of size 40.
(These intervals were calculated directly from the binomial distribution so you may
not get exactly the same interval if you use the formula based on the normal
approximation on page 468.)
■
[Chart: horizontal line segment for p = 0.6 and a bold vertical line segment;
horizontal axis: Proportion of Successes in the Sample]
Display 8.4 The length of the 95% confidence interval (the vertical line segment) is
the same as the length of the horizontal line segment of reasonably likely sample
proportions for p = 0.6.
The bold vertical line segment is the confidence interval, and it has the same
endpoints, namely 0.6 ± 0.15, or 0.45 and 0.75. So, to get the (vertical) confidence
interval, all you have to do is find the endpoints of the horizontal line segment by
substituting the known value of p̂ for the unknown value of p in the formula.
The first two conditions listed in the box are necessary for you to be able to
use the normal distribution (and z-scores) as an approximation to the binomial
distribution. If the third condition isn’t met, your confidence interval will be
longer than it needs to be.
Margin of Error
The quantity
$$E = z^{*}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
is called the margin of error. It is half the width of the confidence interval.
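A short Python sketch of the margin of error and the resulting confidence interval follows. It uses p̂ = 0.6 and n = 40 from the chart discussion above; the function names are hypothetical, and z* is taken from the standard normal distribution for the requested confidence level.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """E = z* times sqrt(p_hat * (1 - p_hat) / n)."""
    # z* cuts off the middle `confidence` share of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

def confidence_interval(p_hat, n, confidence=0.95):
    """Return (lower, upper) endpoints: p_hat minus and plus the margin of error."""
    e = margin_of_error(p_hat, n, confidence)
    return p_hat - e, p_hat + e

low, high = confidence_interval(0.6, 40)
print(round(low, 2), round(high, 2))
```

The width of the interval, `high - low`, is twice the margin of error.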
You can also use a calculator to calculate confidence intervals. [See Calculator
Note 8C.] Shown here are the values for the previous example.
[Chart with an axis labeled Student]
Display 8.5 Chart for recording a sample of 95% confidence intervals.
Student: In the activity, about 95% of the 95% confidence intervals captured
the true population proportion of 0.5. That was no surprise.
But I don’t see why that happened. Just calling something a
95% confidence interval doesn’t make it one.
Statistician: You’re right. This isn’t obvious. For me to explain the logic to you,
you’ll have to answer some questions as we go along.
Student: Okay.
Statistician: Go back and ask those of your classmates who had confidence
intervals that captured the true proportion of even digits
(p = 0.5) what values of p̂ they got. We’ll talk again tomorrow.
Student: (The next day) They had values of p̂ between 0.35 and 0.65.
Statistician: Right again. Now look at Display 8.6. What do you notice about
these values of p̂?
Student: They all lie on the horizontal line segment for p = 0.5.
Statistician: Yes, the values of p̂ that give a confidence interval that captures
p = 0.5 are the reasonably likely outcomes for p = 0.5. What is
the chance of getting one of these “good” values of p̂?
Student: 95%!
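The student's answer can be checked with the binomial distribution. This Python sketch assumes samples of size 40 and treats "between 0.35 and 0.65" as including both endpoints, that is, 14 to 26 successes; both choices are assumptions of the sketch rather than statements from the dialogue.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 40, 0.5
# Assumed reading of "between 0.35 and 0.65": 14/40 through 26/40, inclusive.
low_count, high_count = 14, 26
chance = sum(binomial_pmf(k, n, p) for k in range(low_count, high_count + 1))
print(round(chance, 3))
```

Because the number of successes is discrete, the exact chance is close to, but not exactly, 95%.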
[Chart: line segments labeled p = 0.5 and p = 0.65; horizontal axis: Proportion of
Successes in the Sample]
Display 8.6 The population proportion p is in the confidence interval
if and only if p̂ is a reasonably likely outcome for p.
A note about correct language
It is correct to say that you expect the true value of p to be in 95 out of every
100 of the 95% confidence intervals you construct. However, it is not correct to
say, after you have found a confidence interval, that there is a 95% probability that
p is in that confidence interval. Here is an example that shows why. If you pick
a date in the next millennium at random, it is reasonable to say that there is a 1/7
chance you will pick a Tuesday. However, suppose the date you pick turns out
to be March 3, 3875. It sounds a bit silly to say that there is a 1/7 chance that
March 3, 3875, is a Tuesday. Once the date is selected, there is no randomness
left. Either March 3, 3875, is a Tuesday or it isn’t. All you can say is that the
process you have used to select a date gives you a Tuesday 1/7 of the time. (This
example is credited to Wes White, an AP Statistics teacher in Los Angeles.)
Practice
Reasonably Likely Events
P1. Suppose 40% of students in your graduating class plan to go on to higher
education. You survey a random sample of 50 of your classmates and compute the
sample proportion p̂ of students who plan to go on to higher education.
a. There is a 95% chance that p̂ will be between what two numbers?
b. Is it reasonably likely to find that 25 students in your sample plan to go
on to higher education?
P2. Describe how to use simulation to find the reasonably likely sample proportions
for a random sample of size 40 taken from a population with p = 0.3.
P3. According to the U.S. Census Bureau, about 16% of the residents of the country
do not have health insurance. Suppose a polling agency randomly selects 200
residents. What numbers of residents without health insurance are reasonably likely?
8.1 Estimating a Proportion with Confidence 483
The Meaning of a Confidence Interval
Use Display 8.2 on page 472 to answer P4–P8.
P4. Suppose you flip a fair coin 40 times. How many heads is it reasonably likely for
you to get?
P5. About 65% of 18- and 19-year-olds are enrolled in school. If you take a random
sample of 40 randomly chosen 18- and 19-year-olds, would you be reasonably likely
to find that 33 were in school? [Source: U.S. Census Bureau, Statistical Abstract of
the United States, 2006, Table 209.]
P6. About 85% of people in the United States age 25 or over have graduated from high
school. In a random sample of 40 people age 25 or older, how many high school
graduates are you reasonably likely to get? What proportions of high school graduates
are you reasonably likely to get? [Source: U.S. Census Bureau, Statistical Abstract of
the United States, 2006, Table 214.]
P7. In a random sample of 40 adults, 25% know what color Elmo is. What is the 95%
confidence interval for the percentage of all adults who know what color Elmo is?
P8. Suppose that in a random sample of 40 retired women, 45% of the women
travel more than they did while they were working. Find the 95% confidence interval
for the proportion of all retired women who travel more.
From the Chart to a Formula
P9. Suppose that in a random sample of 40 students from your school, 25 are wearing
sneakers. Find the 95% confidence interval for the percentage of all students in your
school who wear sneakers,
a. using the chart in Display 8.2
b. without using the chart
Using the Formula
P10. In a survey consisting of a randomly selected national sample of 600 teens ages
13–17, conducted from July 6 to September 4, 2005, 65% responded that their school
was helping them discover what type of work they would love to do as a career.
[Source: Gallup, Teens: Schools Help Students Find Career Path, poll.gallup.com.]
a. Check to see if the three conditions for computing a confidence interval are met
in this case.
b. Find a 95% confidence interval for the percentage of all teens in the United
States who would respond that their school is helping them discover a career
path. What is the margin of error?
c. Find a 90% confidence interval for the percentage of all teens in the United
States who would respond that their school is helping them discover a career
path. What is the margin of error?
d. Which confidence interval, 90% or 95%, is wider? Why should that be the case?
P11. In the same survey as in P10, 4% of the 600 students responding gave their school
a D rating (on a scale of A, B, C, D, F).
a. Check to see if the three conditions for computing a confidence interval are met
in this case.
b. Find a 95% confidence interval for the percentage of all teenagers in the United
States who would give their school a D rating.
c. How does the width of the confidence interval in part b compare to that of the
confidence interval in part b of P10? What is the reason for this?
The Capture Rate
P12. Suppose you know that a population proportion, p, is 0.60. Now suppose 80
different students are going to select independent random samples of size 40 from
this population. Each student constructs his or her own 90% confidence interval. How
many of the resulting confidence intervals would you expect to include the population
proportion, p, of 0.60?
[Chart: reasonably likely sample proportions, with a label p = 0.25; horizontal axis:
Proportion of Successes in the Sample]
Display 8.9 A sample proportion of p̂ = 0.25 isn’t reasonably likely when p = 0.5.
■
p0: the hypothesized value of the population proportion
• The proportion of heads that you hypothesize is the true proportion of heads
when a coin is spun. The symbol used for this standard value is p0. You have
been testing the standard that spinning a penny is fair, so you have been using
p0 = 0.5.
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{0.25 - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{40}}} \approx -3.16$$
8.2 Testing a Proportion 493
Having a sample proportion that is 3.16 standard errors below the hypothesized
mean, p0, of 0.5 would definitely be a rare event if it is true that half of spun pennies
land heads. The value of p̂ is outside the reasonably likely outcomes for p = 0.5,
so Miguel and Kevin reject the null hypothesis that spinning a penny is fair. The
z-score that Miguel and Kevin computed is an example of a test statistic.
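The test statistic can be reproduced with a few lines of Python. This sketch uses the values from Miguel and Kevin's example (p̂ = 0.25, p0 = 0.5, n = 40); the function name is hypothetical.

```python
import math

def z_test_statistic(p_hat, p0, n):
    """Number of standard errors that p_hat lies from the hypothesized value p0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

z = z_test_statistic(p_hat=0.25, p0=0.5, n=40)
print(round(z, 2))
```

Note that the standard error is computed from the hypothesized value p0, not from the sample proportion.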
P-Values
Instead of simply reporting whether a result is statistically significant, it is
common practice also to report a P-value.
The P-value for a test is the probability of seeing a result from a random
sample that is as extreme as or more extreme than the result you got from your
random sample if the null hypothesis is true.
Solution
The P-value 0.01 means that if the chimps were selecting a rake at random,
which gives them a 50–50 chance of selecting the one that rakes in the food, the
probability is only 0.01 that the chimps would rake in the food as often as or more
often than they did. The researchers therefore can conclude that the chimps were
able to deliberately choose the rake that got them the food. ■
[Figure: standard normal curve with an area of 0.171 shaded to the left of z = −0.95]
Display 8.11 The probability of 17 or fewer heads with 40 spins
of a coin, if spinning a coin is fair.
However, Jenny and Maya notice that getting 23 or more heads is just as
extreme as getting 17 heads or fewer. Now the sketch looks like Display 8.12.
The probability of getting a result as far out in the tails of the distribution as they did is 2(0.171), or 0.342. This is their P-value. If the probability of getting heads is 0.5, there is a 34.2% chance of getting 17 heads or fewer or getting 23 heads or more in 40 spins. Because this P-value is relatively large, Jenny and Maya don’t have statistically significant evidence to conclude that spinning a penny is anything other than fair.
[Margin note: The phrase in italics is crucial in describing the meaning of a P-value.]
Display 8.12 The probability of at most 17 or at least 23 heads with 40 spins of a coin, if spinning a coin is fair. [Standard normal curve; the areas below z = –0.95 and above z = 0.95 are each 0.171.]
[See Calculator Note 8D to learn how to find a value of z using your calculator.]
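If you prefer software to a calculator or table, the two-sided P-value for Jenny and Maya can be sketched in Python as follows. This is only an illustration: the helper names are invented, and the normal curve area is computed from the standard library's `math.erf` rather than read from Table A, so the result differs slightly from a value based on z rounded to –0.95.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_sided_p_value(successes, n, p0):
    """z and the two-sided P-value for a test of a proportion."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    one_tail = normal_cdf(-abs(z))  # area beyond |z| in a single tail
    return z, 2 * one_tail          # both tails count as "as extreme"

# Jenny and Maya: 17 heads in 40 spins, testing p0 = 0.5.
z, p_value = two_sided_p_value(successes=17, n=40, p0=0.5)
print(round(z, 2), round(p_value, 2))
```

Doubling the one-tail area reflects the observation that 23 or more heads is just as extreme as 17 or fewer.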
In summary, the test statistic, z, is computed using the hypothesized value, p0, as the mean. A small P-value tells you that the sample proportion you observed is quite far away from p0. Your data aren’t behaving in a manner consistent with the null hypothesis. A large P-value tells you that the sample proportion you observed is near p0, so your result isn’t statistically significant. So, the P-value weighs the evidence found from the data: A small P-value places the weight against the null hypothesis, and a large P-value weighs in as consistent with the null hypothesis.
[Margin note: A large value of z indicates that p̂ is far from p0.]
DISCUSSION P-Values
D26. Why is the phrase “if the null hypothesis is true” necessary in the
interpretation of a P-value?
D27. Does the P-value give the probability that the null hypothesis is true?
Display 8.13 The test statistic z = 1.87 is more extreme than z* = 1.645. [Standard normal curve with critical values –1.645 and 1.645, an area of 0.05 in each tail, and z = 1.87 marked.]
Because z = 1.87 is more extreme than z* = 1.645, reject the null hypothesis. The sample proportion, p̂, is farther from p0 than would be reasonably likely if p0 were the true population proportion.
■
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
Display 8.14 A z-score of –0.95. [Standard normal curve with critical values –1.96 and 1.96; z = –0.95 is marked between them.]
4. Because z = –0.95 is less extreme than the critical value, 1.96, this isn’t a statistically significant result. Equivalently, because the P-value, 0.342, is larger than 0.05, this isn’t a statistically significant result.
If spinning a penny results in heads 50% of the time, you are reasonably likely to get only 17 heads out of 40 spins. Thus, Jenny and Maya cannot reject the null hypothesis that spinning a penny is a fair process. (You should never say that you “accept” the null hypothesis, because if you constructed a confidence interval for p, all the other values in it, not just 0.5, are also plausible values of p.)
[Margin note: Write a conclusion that is linked to the value of z or to the P-value and is stated in the context of the situation.]
■
You will more easily remember the structure of a test of significance if you
keep in mind what you should do before you look at the data from the sample.
• Check the conditions for the test. Checking the conditions to be met for a
significance test for a proportion requires only knowledge of how the sample
was collected, the sample size, and the value of the hypothesized standard,
p0. (For some tests, as you will learn later, you will have to peek at the data in
order to check conditions.)
• Write your hypotheses. The hypotheses should be based on the research
question to be investigated, not on the data. Ideally, the investigator sets the
hypotheses before the data are collected. This means that the value of p̂ should
not appear in your hypotheses.
• Decide on the level of significance. The level of significance, α, and the
corresponding critical values, z*, are set by the investigator before the data
are collected. Some fields have set standards for the level of significance,
typically 0.05.
• Sketch the standard normal distribution. Mark the critical values, z*. You
can place the value of the test statistic, z, on this distribution after you look
at the data.
You will use the data (number of successes and value of p̂) only to compute
the test statistic and to write your conclusion.
Types of Errors
The reasoning of significance tests often is compared to that of a jury trial. The
possibilities in such a trial are given in this diagram.
                                Defendant Is Actually
                                Innocent        Guilty
Jury’s Decision    Not Guilty   Correct         Error
                   Guilty       Worse error     Correct
In the same way, there are two types of errors in significance testing.
                                     Null Hypothesis Is Actually
                                     True            False
Your Decision    Don’t Reject H0     Correct         Type II error
                 Reject H0           Type I error    Correct
Miguel and Kevin got a test statistic with a large absolute value and so
concluded that spinning a penny is not fair. However, Jenny and Maya got a test
statistic with a value that was close to 0, so the result from their sample was quite
consistent with the idea that spinning a penny is fair. Who is right? Thousands of
Display 8.15 The P-value for a z-score of 2.25. [Standard normal curve; the areas below z = –2.25 and above z = 2.25 are each 0.01.]
Ann knows that she does not have ESP and, in fact, didn’t even try to discern
what the card was, instead selecting her choices rapidly and at random. In other
words, the null hypothesis is true that, in the long run, she will guess the correct
card 20% of the time. A Type I error has been made. Out of every 100 people who
take such a test, we expect that two of them will get 29 or more cards right (or 11
or fewer cards right). Ann was one of the lucky two. When she tried the ESP test
again, she correctly identified only 17 cards.
■
Suppose the null hypothesis is true. What is the chance that you will reject it, making a Type I error? The only way you can make a Type I error is to get a rare event from your sample. For example, if you are using a significance level of 0.05, you would make a Type I error if you get a value of the test statistic larger than 1.96 or smaller than –1.96. This happens only 5% of the time no matter what the sample size. Thus, if the null hypothesis is true, the probability of a Type I error is 0.05. If you used z* = ±2.576, the critical values for a significance level of 0.01, then the probability of making a Type I error would be only 0.01. If the null hypothesis is true, the probability of a Type I error is equal to the level of significance. To lower the chance of a Type I error, then, your best strategy is to have a low level of significance or, equivalently, large critical values.
[Margin note: If H0 is true, the probability of a Type I error is α.]
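One way to see this for yourself is to add up, for a fair process, the binomial probabilities of every count that would lead you to reject. The Python sketch below does that for a few sample sizes. The function name and the sample sizes are chosen only for illustration. Because the number of successes is a whole number, the exact rate is only approximately 0.05 (the normal curve is an approximation), but it does not grow or shrink systematically with n.

```python
from math import comb, sqrt

def exact_type_i_error(n, p0=0.5, z_star=1.96):
    """Exact chance that |z| > z_star when successes are Binomial(n, p0)."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    total = 0.0
    for k in range(n + 1):
        z = (k / n - p0) / standard_error
        if abs(z) > z_star:  # this count would reject a true null hypothesis
            total += comb(n, k) * p0**k * (1 - p0)**(n - k)
    return total

for n in (40, 200, 1000):  # illustrative sample sizes
    print(n, round(exact_type_i_error(n), 3))
```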
It follows that if the probability of a Type II error is small, then the power of the
test is large.
Display 8.16 Sampling distribution of p̂ when n = 40 and p = 0.5. The value p̂ = 0.425 is indicated by the vertical line. [Histogram; horizontal axis: Sample Proportion; vertical axis: Probability.]
Display 8.17 Sampling distribution of p̂ when n = 200 and p = 0.5. The value of p̂ = 0.425 is indicated by the vertical line. [Histogram; horizontal axis: Sample Proportion; vertical axis: Probability.]
■
As the previous example shows, if the null hypothesis is false and should be rejected, a larger sample size increases the probability that you will be able to reject it. However, if the null hypothesis is true, increasing the sample size has no effect on the probability of making a Type I error. The only way you can decrease the probability of making a Type I error is to make the level of significance, α, smaller. As you’ll see in the next example, however, if the null hypothesis is actually false, this strategy results in a higher probability of a Type II error and lower power!
[Margin note: Bigger n, more power. Smaller n, less power.]
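The trade-offs in this paragraph can be sketched numerically. The Python code below approximates the power of a two-sided test of p0 = 0.5 with the normal curve. The true proportion of 0.4 is a hypothetical value chosen only for illustration, and the function names are invented.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_two_sided(p0, p_true, n, z_star):
    """Approximate chance of rejecting H0: p = p0 when the truth is p_true."""
    se_null = sqrt(p0 * (1 - p0) / n)          # SE used to set the cutoffs
    se_true = sqrt(p_true * (1 - p_true) / n)  # SE of p_hat under the truth
    lower = p0 - z_star * se_null              # reject if p_hat falls below
    upper = p0 + z_star * se_null              # or above these cutoffs
    below = normal_cdf((lower - p_true) / se_true)
    above = 1 - normal_cdf((upper - p_true) / se_true)
    return below + above

# Hypothetical truth p = 0.4 while the null hypothesis says 0.5.
for n, z_star in [(40, 1.96), (200, 1.96), (200, 2.576)]:
    print(n, z_star, round(power_two_sided(0.5, 0.4, n, z_star), 2))
```

Under these assumptions, moving from n = 40 to n = 200 raises the power, while tightening the critical value from 1.96 to 2.576 at n = 200 lowers it. Setting `p_true` equal to `p0` returns the level of significance, which is the probability of a Type I error.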
The first alternative hypothesis is for a two-sided test; the latter two are for
one-sided tests. For a one-sided test, the P-value is found using one side only.
Display 8.18 The P-value for the successful life/friends problem. [Standard normal curve; the area above z = 1.92 is 0.0274.]
4. The P-value (found in the upper tail) is fairly small. If the percentage of all adults who believe a successful life depends on having good friends is 50%, then the probability of getting a sample proportion of 53% or more is only 0.0274. Because getting a sample proportion, p̂, of 0.53 or more is so unlikely if the null hypothesis is true, reject the null hypothesis. This is quite strong evidence that the true percentage must be greater than 50%. The editors should feel free to use their headline.
[Margin note: State conclusion in context with link to computations.]
■
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
Practice
Informal Significance Testing
P20. A 1997 article reported that two-thirds of teens in grades 7–12 want to study more about medical research. You wonder if this proportion still holds today and decide to test it. You take a random sample of 40 teens and find that only 23 want to study more about medical research. [Source: CNN Interactive Story Page, April 22, 1997, www.cnn.com.]
a. What is the standard (the hypothesized value, p0, of the population proportion)?
b. What is the sample proportion, p̂?
c. Use Display 8.9 to determine whether the result is statistically significant. That is, is there evidence leading you to believe that the proportion today is different from the proportion in 1997?
P21. A student took a 40-question true–false test and got 30 answers correct. The student says, “That proves I was not guessing at the answers.”
a. What is the standard, p0?
b. What is the sample proportion, p̂?
c. Use Display 8.9 to determine whether the result is statistically significant.
d. Does the answer to part c prove that the student was not guessing?
P22. This year, 75% of the seniors wanted extra tickets for their graduation ceremony. To anticipate whether there might be a change in that percentage next year, the junior class took a random sample of 40 juniors and found that 32 would want extra tickets.
a. What is the standard, p0?
b. What is the sample proportion, p̂?
c. Use Display 8.9 to determine whether this result is statistically significant.
d. Is this statistical evidence of a change?
The Test Statistic
P23. For the situation in P22, what value of the test statistic should the junior class use to test whether there is statistical evidence of a change?
P24. Forty-five dogs and their owners, chosen at random, were photographed separately. A judge was shown a picture of each owner and pictures of two dogs and asked to pick the dog that went with the owner. The judge was right 23 times. What value of the test statistic should be used to test whether the judge did better than could reasonably be expected just by guessing? [Source: Based on a study published in Psychological Science, 2004.]
Exercises
E25. A random sample of dogs was checked to see how many wore a collar. The 95% confidence interval for the percentage of all dogs that wear a collar turned out to be from 0.82 to 0.96. Which of these is not a true statement?
A. You can reject the hypothesis that 75% of all dogs wear a collar.
B. You cannot reject the hypothesis that 90% of all dogs wear a collar.
C. If 90% of all dogs wear a collar, then you are reasonably likely to get a result like the one from this sample.
D. If 75% of all dogs wear a collar, then you are reasonably likely to get a result like the one from this sample.
E. The true proportion of dogs that wear a collar may or may not be between 0.82 and 0.96.
with the owner. The judge was right 23 times.
a. If you use a significance level of 0.05, do you reject the null hypothesis that, in the long run, the judge will select the correct dog half the time?
b. Suppose you compute the 95% confidence interval for the proportion of times the judge will select the correct dog. Will the value of p0 from the null hypothesis be in this confidence interval? Explain.
c. Compute the confidence interval to check your answer to part b.
d. Explain how the margin of error for the confidence interval is related to the power of the test.
e. How would you suggest that the investigator get more power for this test?
E40. To perform a significance test for a proportion, you must decide on the hypotheses (null and alternative), the level of significance, and the sample size. Explain the effects of each of these components on the power of the test.
E41. At the beginning of this section, you read that 2% of barn swallows have white feathers in places where the plumage is normally blue or red. However, about 16% of barn swallows captured around Chernobyl after 1991 had such genetic mutations. The number captured around Chernobyl was relatively large: 266 barn swallows.
a. If you want to determine whether the increase in the proportion of mutations is statistically significant, should this be a one-sided or two-sided test?
b. What condition(s) of the significance test for a proportion are not satisfied?
Display 8.19 Sampling distribution of p̂ for n = 266 and p = 0.02. [Histogram; horizontal axis: Sample Proportion, from 0.00 to 0.10; vertical axis: Probability.]
d. What should the researchers conclude?
E42. A psychologist was struck by the fact that at square tables many pairs of students sat on adjacent sides rather than across from each other and wondered if people prefer to sit that way. The psychologist collected some data, observing 50 pairs of students seated in the student cafeteria of a California university. The tables were square and had one seat available on each of the four sides. The psychologist observed that 35 pairs sat on adjacent sides of the table, while 15 pairs sat across from each other. He wanted to know if the evidence suggested a preference by students for adjacent sides, as opposed to simply random behavior. [Source: Joel E. Cohen, “Turning the Tables,” in Statistics by Example: Exploring Data, edited by Frederick Mosteller et al. (Reading, Mass.: Addison-Wesley, 1973), pp. 87–90.]
Display 8.20 Heights of presidential candidates. [Source: Paul M. Sommers, “Presidential Candidates Who
Measure Up,” Chance 9, no. 3 (1996): 29–31.]
Here p̂1 and p̂2 are the proportions of successes in the two samples. Substituting
what you have so far into the formula gives the 95% confidence interval for the
difference between the proportion of U.S. households that own pets now and
the proportion that owned pets in 1994:
The hard part, as always, is estimating the size of the standard error.
From Section 7.3, the standard error of the distribution of the sample proportion
p̂1 is
$$\sigma_{\hat{p}_1} = \sqrt{\frac{p_1(1 - p_1)}{n_1}} \quad \text{which can be estimated by} \quad \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1}}$$
where n1 is the sample size. Similarly, the standard error of the distribution of the
sample proportion p̂2 is
$$\sigma_{\hat{p}_2} = \sqrt{\frac{p_2(1 - p_2)}{n_2}} \quad \text{which can be estimated by} \quad \sqrt{\frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
Now you can write the complete confidence interval for the difference
between two proportions.
You are 95% confident that the difference in the two rates of pet ownership is between 0.057 and 0.083. This means that it is plausible that the difference in the percentage of households that own pets now and the percentage in 1994 is 5.7%. It is also plausible that the difference is 8.3%. Note that a difference of 0 does not lie within the confidence interval. This means that if the difference in the proportion of pet owners now and in 1994 actually is 0, getting a difference of 0.07 in the samples is not at all likely. Thus, you are convinced that there was a change in the percentage of households that own a pet.
[Margin note: Write your interpretation in context.]
■
It doesn’t matter which sample proportion you call p̂1 and which you call p̂2, as long as you remember which is which. Most people like to assign the larger one to be p̂1 so that the difference, p̂1 – p̂2, is positive. If you like working with negative numbers, do it the other way!
[Margin note: Which sample proportion should be p̂1 and which p̂2?]
Activity 8.3a will help you understand how confidence intervals for
differences of proportions may vary from sample to sample.
Display 8.21 Chart for recording 95% confidence intervals for the difference in the proportion of yellows in Skittles and M&M’s. [Blank chart; horizontal axis: Student; numeric vertical scale.]
In practice, you estimate $\sigma_{\hat{p}_1 - \hat{p}_2}$ using the two sample proportions. This leads to a confidence interval for the difference of the two population proportions of
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
where z* are the values that enclose an area in a normal distribution equal
to the confidence level, which typically is 0.95. Don’t forget to check the three
conditions on page 520 that must be met to use this confidence interval, especially
the condition that the two samples were taken independently from two different
populations.
When interpreting, say, a 95% confidence interval, you should say that you
are 95% confident that the difference between the proportion of successes in the
first population and the proportion of successes in the second population is in
this confidence interval. (However, say this in the context of the situation.) If the
confidence interval includes 0, you can’t conclude that there is any difference in
the proportions in the two populations.
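To get a feel for what “95% confident” means here, you can simulate many pairs of independent random samples and count how often the interval captures the true difference. The Python sketch below is one way to do so. The population proportions, sample sizes, number of repetitions, and function names are all hypothetical choices made for illustration.

```python
import random
from math import sqrt

def diff_confidence_interval(x1, n1, x2, n2, z_star=1.96):
    """(p1_hat - p2_hat) plus or minus z* times the estimated standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se

def coverage(p1, p2, n1, n2, repetitions=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true p1 - p2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes, sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes, sample 2
        low, high = diff_confidence_interval(x1, n1, x2, n2)
        hits += low <= p1 - p2 <= high
    return hits / repetitions

# Hypothetical populations with 63% and 56% successes, samples of 200 each.
print(coverage(p1=0.63, p2=0.56, n1=200, n2=200))
```

The printed fraction should land near 0.95, though it will vary a little with the seed and the number of repetitions.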
Practice
The Formula for the Confidence Interval
P46. Suppose the surveys on pet ownership described in the example on page 520 had used sample sizes of only 100 people. That is, a 1994 survey of 100 U.S. households found that 56% owned a pet, and this year a survey of 100 U.S. households found that 63% owned a pet.
a. Check the conditions for constructing a confidence interval for the difference of two proportions.
b. Compute and interpret the confidence interval.
c. Is 0 in the confidence interval? What does your answer imply?
P47. A poll of 549 teenagers asked if it was appropriate for parents to install a special device on the car to allow parents to monitor teenagers’ driving speeds. Among the 325 13–15-year-olds in the sample, 50% responded yes to this question. Among the 224 16–17-year-olds, only 28% responded yes. [Source: Gallup, Teens Slow to Support Speed Monitors, 2005, www.poll.gallup.com.]
a. Check the conditions for constructing a confidence interval for the difference of two proportions.
b. Estimate the difference between the proportion of all 13–15-year-olds who would answer yes to this question and the proportion of all 16–17-year-olds who would answer yes.
c. Interpret the resulting interval in the context of the problem.
d. Is 0 in the confidence interval? What does your answer imply?
two proportions.
I. A 95% confidence interval consists of those population proportions p for which the proportion from the sample, p̂, is reasonably likely to occur.
II. If you construct a hundred 95% confidence intervals, you expect that the population proportion p will be in 95 of them.
E57. The formula for the confidence interval for the difference of two proportions involves z. Why is it okay to use z in this case?
E58. What is suggested if the confidence interval for the difference of two proportions
a. includes 0?
b. does not include 0?
$$\ldots \pm 1.96\sqrt{\frac{(0.222)(0.778)}{99} + \frac{(0.778)(0.222)}{99}} = 0.556 \pm 0.116$$
Assuming that the respondents can be considered a random sample from some population, is the student’s method correct? If so, write an interpretation of this interval. If not, do a more appropriate analysis.
E60. In the USA Today poll on driving ages described in E52, 35% of 1000 randomly sampled adults chose 16 as the preferred initial driving age while 42% chose 18. Can the difference of the proportion in the population who prefer 16 and the proportion in the population who prefer 18 be estimated by the confidence interval developed in this chapter? Explain why or why not.
Display 8.22 Simulated sampling distributions of p̂1 – p̂2 when p1 = p2 = 0.2 for various sample sizes. [Three histograms; horizontal axis: Differences, from –0.4 to 0.4; vertical axis: Frequency of Differences.]
Note three facts about these approximate sampling distributions:
• Each sampling distribution is approximately normal in shape and becomes
more so with larger sample sizes.
• The mean of each sampling distribution is at p1 – p2 = 0.2 – 0.2 = 0.
• The formula from Section 8.3 can be used to find the standard error:
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = \sqrt{\frac{0.2(1 - 0.2)}{n_1} + \frac{0.2(1 - 0.2)}{n_2}}$$
Display 8.23 lists the SEs computed from the formula and from the
approximate sampling distributions in Display 8.22. They match very closely.
Display 8.23 SEs from the formula and from the simulation.
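You can run a comparison of this kind yourself. The Python sketch below simulates differences p̂1 – p̂2 when p1 = p2 = 0.2 and compares their standard deviation with the value from the formula. The sample sizes of 30 and 100, the number of repetitions, and the function names are assumptions made for illustration; they are not the settings behind Display 8.22.

```python
import random
from math import sqrt
from statistics import pstdev

def formula_se(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat from the formula."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def simulated_se(p1, n1, p2, n2, repetitions=5000, seed=2):
    """Standard deviation of many simulated differences p1_hat - p2_hat."""
    rng = random.Random(seed)
    differences = []
    for _ in range(repetitions):
        p1_hat = sum(rng.random() < p1 for _ in range(n1)) / n1
        p2_hat = sum(rng.random() < p2 for _ in range(n2)) / n2
        differences.append(p1_hat - p2_hat)
    return pstdev(differences)

for n in (30, 100):  # hypothetical sample sizes
    print(n, round(formula_se(0.2, n, 0.2, n), 4),
          round(simulated_se(0.2, n, 0.2, n), 4))
```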
The test statistic builds on the best estimates of these population proportions, namely, the sample proportions
$$\hat{p}_1 = \frac{113}{1067} \approx 0.106 \quad \text{and} \quad \hat{p}_2 = \frac{92}{1170} \approx 0.079$$
[Margin note: Compute the test statistic and P-value.]
The general form of a test statistic for testing hypotheses is
$$\frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}}$$
The difference from the sample (statistic) is p̂1 – p̂2, or 0.106 – 0.079 = 0.027. The hypothesized difference (parameter) is 0, the value under the null hypothesis p1 – p2 = 0. The standard error of the estimate is given exactly by
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
You could do as you did in Section 8.3: Estimate p1 with p̂1 and estimate p2 with p̂2. However, you can do even better. The null hypothesis states that the proportion of males who are left-handed is equal to the proportion of females who are left-handed, that is, p1 = p2. You can estimate this common value of p1 and p2 by combining the data from both samples into a pooled estimate, p̂. This pooled estimate is found by combining males and females into one group:
$$\hat{p} = \frac{\text{total number of left-handers}}{\text{total number of people}} = \frac{113 + 92}{1067 + 1170} = \frac{205}{2237}$$
[Margin note: p̂ is called the pooled estimate of the common proportion of successes.]
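$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{\left(\dfrac{113}{1067} - \dfrac{92}{1170}\right) - 0}{\sqrt{\dfrac{205}{2237}\left(1 - \dfrac{205}{2237}\right)\left(\dfrac{1}{1067} + \dfrac{1}{1170}\right)}} \approx 2.23$$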
This test statistic is based on the difference of two approximately normally distributed random variables, p̂1 and p̂2. You learned in Chapter 7 that such a difference itself has a normal distribution. Thus, you can use Table A on page 824 to find the P-value. The test statistic z = 2.23 has a (one-sided) P-value of 0.0129.
[Margin note: State the conclusion in context.]
Display 8.24 P-value for a one-sided test with z = 2.23. [Standard normal curve; the area above z = 2.23 is P = 0.0129.]
This P-value is very small, so reject the null hypothesis. The difference
between the rates of left-handedness in these two samples is too large to attribute
to chance variation alone. The evidence supports the alternative that males have a
higher rate of left-handedness.
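The computation for the left-handedness data can be reproduced with a short Python sketch. The function names are invented for illustration, and the tail area comes from the standard library's `math.erf` rather than Table A, so the P-value can differ from the table value in the last digit.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z and upper-tail P-value for H0: p1 = p2 against Ha: p1 > p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = ((p1_hat - p2_hat) - 0) / se  # hypothesized difference is 0
    return z, 1 - normal_cdf(z)

# 113 left-handers among 1067 males, 92 among 1170 females.
z, p_value = two_proportion_z_test(x1=113, n1=1067, x2=92, n2=1170)
print(round(z, 2), round(p_value, 4))
```

For a two-sided alternative you would instead double the area beyond |z|.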
■
As you have seen, the significance test for the difference of two proportions
proceeds along the same lines as the test for a single proportion. The steps in the
significance test for the difference of two proportions are given in this box.
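$$\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
[Diagrams: standard normal curves marking the P-value. For a two-sided test, an area of ½P lies beyond each of –z and z; for a one-sided test, an area of P lies in one tail.]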
are larger than 5. Both the number of teenage girls and the number of teenage
boys in the United States are much larger than 10 times each sample size.
2. H0: The proportion p1 of teenage girls supporting dress codes is equal to the proportion p2 of teenage boys supporting dress codes.
Ha: p1 ≠ p2. (Note that you can use the symbols p1 and p2 because you defined exactly what they stand for in the null hypothesis.)
[Margin note: Hypotheses]
3. The pooled estimate is
$$\hat{p} = \frac{\text{total number of successes in both samples}}{n_1 + n_2} = \frac{156 + 115}{274 + 274} \approx 0.495$$
The test statistic is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.57 - 0.42) - (0)}{\sqrt{0.495(1 - 0.495)\left(\dfrac{1}{274} + \dfrac{1}{274}\right)}} \approx 3.51$$
[Margin note: Computations with a diagram]
With a test statistic of 3.51, the P-value for a two-sided test is 0.0002(2), or
0.0004. See Display 8.26 on the next page.
Display 8.26 A two-sided test with z = 3.51. [Standard normal curve with –z = –3.51 and z = 3.51 marked.]
Note that you can use your calculator’s two-proportion z-test, which carries more decimal places, to get the more accurate value z = 3.503 and a P-value of 0.00046. [See Calculator Note 8F.]
4. This difference is statistically significant, and you reject the null hypothesis. If the proportions of girls and boys supporting dress codes were equal, then you would expect only 4 out of 10,000 repeated samples of size 274 to have a difference in sample proportions of 15% or larger. Because the P-value, 0.0004, is less than 0.01, you cannot reasonably attribute the difference to chance variation. You conclude that the proportions of girls and boys supporting dress codes are different.
[Margin note: Conclusion in context]
Display 8.27 shows a printout for this example.
$$\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Exercises
E61. Milk Chocolate M&M’s are 24% blue, and Almond M&M’s are 20% blue. Suppose you take a random sample of 100 of each. You compute the proportion that are blue in each sample. You then subtract the proportion of blue in the sample of Almond M&M’s from the proportion of blue in the sample of Milk Chocolate M&M’s to get p̂1 – p̂2. You repeat this process until you have a distribution of millions of values of p̂1 – p̂2.
a. What is the expected value of p̂1 – p̂2?
b. What is the standard error of the distribution of p̂1 – p̂2?
c. Sketch the distribution of p̂1 – p̂2, with a scale on the horizontal axis.
d. Compute the probability that the difference will be 0.05 or greater.
E62. Almond M&M’s and Peanut Butter M&M’s both are 20% blue. Suppose you take a random sample of 100 of each. You compute the proportion that are blue in each sample. You then subtract the proportion of blue in the sample of Peanut Butter M&M’s from the proportion of blue in the sample of Almond M&M’s to get p̂1 – p̂2. You repeat
For this experiment, n1 and n2 are each half the number of students, and p1
and p2 are each 0.75, the probability of being cured. What can you conclude?
Display 8.28 shows the sampling distribution from Activity 8.5a for a class
of 40 students. The students participated in a sham experiment in which the
[Display 8.28: histogram; horizontal axis: Difference_of_Proportions; vertical axis: Relative Frequency, with ticks from 0.05 to 0.25.]
Of course, in the real-life situations you will be studying next, you will not
know the true success rates of the treatments and will have to estimate them
from the proportions of successes in the treatment groups. But that is nothing
new—this formula will continue to work fine just as you have used it before.
Here p̂1 is the observed proportion of successes in the group of size n1 given the
first treatment, and p̂2 is the observed proportion of successes in the group of
size n2 given the second treatment. (The group sizes do not have to be equal.)
The conditions that must be met in order to use this formula are that
• the two treatments are randomly assigned to the population of available
experimental units
• n1 p̂1, n1(1 – p̂1), n2 p̂2, and n2(1 – p̂2) are all at least 5
The group of ARC patients weren’t randomly selected from the population of all ARC patients but were volunteers who had to give “informed consent.” Thus, they might be quite different from the population of all ARC patients and should not be regarded as a random sample from any specific population.
[Margin note: Don’t we have to have a random sample?]
$$(0.179 - 0.149) \pm 1.96\sqrt{\frac{(0.179)(0.821)}{67} + \frac{(0.149)(0.851)}{67}} \approx 0.03 \pm 0.125$$
You also can write this confidence interval as (–0.095, 0.155).
If all 134 patients could have been given each treatment, we estimate that the difference between the rate of progression to AIDS if all had been given AZT alone and the rate of progression if all had been given AZT plus ACV would be someplace between a 9.5% difference in favor of AZT alone and a 15.5% difference in favor of AZT plus ACV. Note that a difference of 0 lies well within the confidence interval. This means that if the difference in the proportions of patients who would progress to AIDS is actually 0, getting a difference of 0.03 in our treatment groups is reasonably likely. Thus, we aren’t convinced that the two therapies differ with respect to the rate of progression to AIDS in this group of patients.
[Margin note: Write your interpretation in context.]
■
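$$\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
[Diagrams: standard normal curves marking the P-value beyond –z and z for a two-sided test and in one tail for a one-sided test.]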
To see how to apply these ideas to an experiment, return to the clinical trial
experiment comparing the two treatments for AIDS-related complex (ARC).
is at least 5.
The pooled estimate, p̂, of the probability that a patient will survive, under the null hypothesis that the treatment a patient received made no difference in whether he or she survived, is 90/131, or about 0.687. The test statistic then takes on the value
$$z = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.594 - 0.790) - 0}{\sqrt{0.687(1 - 0.687)\left(\dfrac{1}{69} + \dfrac{1}{62}\right)}} \approx -2.415$$
[Margin note: Compute the test statistic and the P-value.]
This test statistic is based on the difference of two approximately normally distributed random variables, p̂1 and p̂2. You learned in Chapter 7 that such a difference is itself approximately normal. Thus, you can use Table A on page 824 to find the P-value. The test statistic z = –2.42 has a (one-sided) P-value of 0.0078, as illustrated in Display 8.32.
Display 8.32 P-value for a one-sided test with z = –2.42. [Standard normal curve; the area below z = –2.42 is P = 0.0078.]
This P-value is very small, so reject the null hypothesis. If all subjects in the experiment could have been given the AZT plus ACV treatment, you are confident that there would have been a larger survival rate than if all had received only AZT. The difference between the survival rates for these two treatments is too large to be attributed to chance variation alone.
[Margin note: Conclusion in context.]
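Because the randomness in an experiment comes from the random assignment of treatments, you can cross-check the normal approximation by counting assignments directly. The Python sketch below is not the z-test used above. It is an exact randomization calculation that asks: if the treatment made no difference to any patient, in what fraction of all possible random assignments would the first group contain as few survivors as it did? The counts of 41 of 69 and 49 of 62 survivors are assumptions inferred from the reported proportions 0.594 and 0.790 and the pooled total of 90 out of 131. The function name is invented.

```python
from math import comb

def randomization_p_value(survived_1, n1, survived_2, n2):
    """Chance that random assignment alone puts survived_1 or fewer of the
    survivors into group 1, if treatment made no difference to anyone."""
    total = n1 + n2
    survivors = survived_1 + survived_2
    deaths = total - survivors
    lowest = max(0, n1 - deaths)  # group 1 cannot hold fewer survivors
    ways = sum(comb(survivors, k) * comb(deaths, n1 - k)
               for k in range(lowest, survived_1 + 1))
    return ways / comb(total, n1)

# Assumed counts: 41 of 69 (AZT alone) and 49 of 62 (AZT plus ACV) survived.
print(round(randomization_p_value(41, 69, 49, 62), 4))
```

The exact value will not match the normal-curve P-value digit for digit, but it should lead to the same conclusion.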
■
$$(0.65 - 0.42) \pm 1.96\sqrt{\frac{0.65(0.35)}{297} + \frac{0.42(0.58)}{314}} \approx 0.23 \pm 0.08$$
The confidence interval is (0.15, 0.31).
Is there sufficient evidence to say that the difference in the percentage using
helmets on local streets before the law and after the law is so large that it cannot
reasonably be attributed to chance alone?
Solution
Even though some of the sites for observing riders were randomly selected,
the riders observed were those who just happened to be there at the time of
observation. In addition, there were no randomly assigned treatments and
no controls on any of the myriad other variables that could affect helmet use.
Nevertheless, a test of significance can shed some light on the possible association
between the law and helmet usage. The null hypothesis is that there is no
difference in the proportions of helmet users on local streets for 1999 and 2002.
The alternative hypothesis of interest is that the 2002 rate of helmet use is higher
than the 1999 rate (perhaps because of the law).
Exercises
E75. A famous medical experiment was conducted by Nobel laureate Linus Pauling (1901–1994), who believed that vitamin C prevents colds. His subjects were 279 French skiers who were randomly assigned to receive vitamin C or a placebo. Of the 139 given vitamin C, 17 got a cold. Of the 140 given the placebo, 31 got a cold. [Source: L. Pauling, “The Significance of the Evidence About Ascorbic Acid and the Common Cold,” Proceedings of the National Academy of Sciences 68 (1971): 2678–81.]
a. Find and interpret a 95% confidence interval for the difference of two proportions.
b. Find a 99% confidence interval for the difference of two proportions. Does your conclusion change from your interpretation in part a?
E76. A randomized clinical trial on Linus Pauling’s claim that vitamin C helps prevent the common cold was carried out in Canada among 818 volunteers, with results reported in 1972. The data showed that 335 of the 411 in the placebo group got colds over the winter in which the study was conducted, while 302 of the 407 in the vitamin C group got colds. [Source: T. W. Anderson, D. B. Reid, and G. H. Beaton, “Vitamin C and the Common Cold,” Canadian Medical Association Journal 107 (1972): 503–8.]
a. Find and interpret a 95% confidence interval for the difference of two proportions.
b. Find a 99% confidence interval for the difference of two proportions. Does your conclusion change from your interpretation in part a?
E77. In 1954, the largest medical experiment of all time was carried out to test whether the newly developed Salk vaccine was effective in preventing polio. This study incorporated all three characteristics of an experiment: use of a control group of children who received a placebo injection (an injection that felt like a regular immunization but contained only salt water), random assignment of children to either the placebo injection group or the Salk vaccine injection group, and assignment of each treatment to several hundred thousand children. Of the 200,745 children who received the Salk vaccine,
Chapter Summary
To use the confidence intervals and significance tests of this chapter, you need either
a random sample from a population that is made up of “successes” and “failures” or
two random samples taken independently from two distinct such populations. If the
study is an experiment, you should have two treatments randomly assigned to the
available subjects. (When there is no randomness involved, you proceed with the
test only if you state clearly the limitations of what you have done. If you reject the
null hypothesis in such a case, all you can conclude is that something happened that
can’t reasonably be attributed to chance.)
You use a confidence interval if you want to find a range of plausible values for
• p, the proportion of successes in a population
• p1 − p2, the difference between the proportion of successes in one population
and the proportion of successes in another population
Both confidence intervals you have studied have the same form:
statistic ± (critical value) · (standard deviation of statistic)
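The common form of the two intervals can be sketched in a few lines of Python. This is only an illustration for the difference of two proportions: the function name `two_proportion_ci` and the counts (45 of 100 and 30 of 100) are hypothetical and do not come from any study in this chapter.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Interval for p1 - p2: statistic +/- z* times the standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    statistic = p1_hat - p2_hat
    # Critical value z* leaves (1 - confidence)/2 in each tail of the normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    margin = z_star * std_error
    return statistic - margin, statistic + margin

# Hypothetical counts, used only to show the arithmetic.
low, high = two_proportion_ci(45, 100, 30, 100)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

Because the interval in this made-up example lies entirely above 0, a difference of 0 would not be among the plausible values.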
Both significance tests you have studied include the same steps:
1. Justify your reasons for choosing this particular test. Discuss whether the
conditions are met, and decide whether it is okay to proceed if they are not
strictly met.
2. State the null hypothesis and the alternative hypothesis.
3. Compute the test statistic and find the P-value. Draw a diagram of the
situation.
4. Use the computations to decide whether to reject or not reject the null
hypothesis by comparing the test statistic to z* or by comparing the P-value
to α, the significance level. Then state your conclusion in terms of the context
of the situation. (Simply saying “Reject H0” is not sufficient.) Mention any
doubts you have about the validity of your conclusion.
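The four steps can be traced in a short sketch for a test of a single proportion. The data here (60 successes in 100 trials, tested against p0 = 0.5) are hypothetical, the function name `one_proportion_z_test` is made up for this illustration, and the condition check assumes the usual guideline that both np0 and n(1 − p0) be at least 10.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0. Returns (z, P-value)."""
    # Step 1 (conditions): assumed guideline that n*p0 and n*(1 - p0) are at least 10.
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("sample too small for the normal approximation")
    # Step 2 (hypotheses): H0: p = p0 versus Ha: p != p0.
    # Step 3: z = (statistic - parameter) / (standard deviation of statistic).
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data, used only to show the steps.
z, p_value = one_proportion_z_test(60, 100, 0.5)

# Step 4: compare the P-value to the significance level, then state the
# conclusion in context (the code can only do the first half of that).
alpha = 0.05
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"z = {z:.2f}, P-value = {p_value:.4f}: {decision}")
```

The printed decision is not a complete answer; you still need to say what the result means in the context of the situation and mention any doubts about the conditions.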
Review Exercises
E88. A 6th-grade student, Emily Rosa, performed an experiment to test the validity of “therapeutic touch.” According to an article written by her RN mother, a statistician, and a physician: “Therapeutic Touch (TT) is a widely used nursing practice rooted in mysticism but alleged to have a scientific basis. Practitioners of TT claim to treat many medical conditions by using their hands to manipulate a ‘human energy field’ perceptible above the patient’s skin.” To investigate whether TT practitioners actually can perceive a “human energy field,” 21 experienced TT practitioners were tested to determine whether they could correctly identify which of their hands was closest to the investigator’s hand. They placed their hands through holes in a screen so they could not see the investigator. Placement of the investigator’s hand was determined by flipping a coin. Practitioners of TT identified the correct hand in only 123 of 280 trials. [Source: Journal of the American Medical Association 279 (1998): 1005–10.]
a. What does “blinded” mean in the context of this experiment? How might it have been done? Why wasn’t double-blinding necessary?
b. Write appropriate null and alternative hypotheses for a significance test.
c. Is it necessary to actually carry out the test, based on these data?
d. The article says, “The statistical power of this experiment was sufficient to conclude that if TT practitioners could reliably detect a human energy field, the study would have demonstrated this.” What does this sentence mean?
E89. “Most teens are not careful enough about the information they give out about themselves online.” In a survey that randomly sampled 971 teenagers who have online access, 78% agreed with this statement. [Source: Pew Internet and American Life Project, 2005, www.pewinternet.org.]
a. Compute the 95% confidence interval for the population proportion of teenagers who would agree with the statement.
b. Interpret this confidence interval, making it clear exactly what it is that you are 95% sure is in the confidence interval.
c. Explain the meaning of 95% confidence.
E90. A 2001 Gallup poll found that 51% of the American public assigned a grade of A or B to the public schools in their community. In 2000, the comparable figure was 47%. Assuming a sample size of 1108 in both 2000 and 2001, find and interpret a 90% confidence interval for the difference of two proportions. [Source: Gallup, www.gallup.com.]
E91. A study of all injuries from the two winter seasons 1999–2000 and 2000–2001 at the three largest ski areas in Scotland found that of the 531 snowboarders who were injured, 148 had fractures. Of the 952 skiers who were injured, 146 had fractures. (For both groups, most of the other injuries were sprains, lacerations, or bruising.) [Source: www.ski-injury.com, March 29, 2002.]
a. Is this difference statistically significant at the 0.05 level?
b. There are about twice as many skiers as snowboarders. Can you use this fact and the data from this study to determine whether snowboarders are more likely than skiers to be injured?
AP Sample Test
AP1. Only 33% of students correctly answered a difficult multiple-choice question on an exam given nationwide. Ms. Chang gave the same question to her 35 students, hypothesizing that they would do better than students nationwide. Despite the lack of randomization, she performed a one-sided test of the significance of a sample proportion and got P = 0.03. Which is the best interpretation of this P-value?
• Only 3% of her students scored better than students nationally.
• If the null hypothesis is true that her students do the same as students nationally, there is a 3% chance that her students will do better than students nationally on this question.
• Between 30% and 36% of her population of students can be expected to answer the question correctly.
• There is a 3% chance that her students are better than students nationally.
• There is a 3% chance that a random sample of 35 students nationwide would do as well as or better than her students did.
AP2. Researchers constructed a 95% confidence interval for the proportion of people who prefer apples to oranges. They computed a margin of error of 4%. In checking their work, they discovered that the sample size used in their computation was 1/4 of the actual number of people surveyed. Which is closest to the correct margin of error?
• 1%
• 2%
• 4%
• 8%
• 16%
AP3. A survey of 200 randomly selected alumni of Lincoln High School found that 105 favor abolishing the school dress code. Is this convincing evidence that more than half of all alumni favor abolishing the dress code?
• Yes, because 105 is more than half of 200.
• Yes, by constructing a 95% confidence interval, you can see that it is plausible that more than half of all alumni favor abolishing the dress code.
• No, by constructing a 95% confidence interval, you can see that it is plausible that 50% or less of all alumni favor abolishing the dress code.
• No, because the survey only used a sample of alumni.
• Because alumni probably have strong opinions that aren’t normally distributed, no conclusion can be reached.
AP4. In a study of the effectiveness of two tutors in preparing students for an exam, 50 students were randomly assigned either to Mr. A or to Mr. B. A larger proportion of students tutored by Mr. A passed the exam, resulting in a 95% confidence interval for the difference of two proportions of (0.05, 0.45). Which is not a correct conclusion to draw from this?
• There is statistically significant evidence that Mr. A is the better tutor.
• You cannot reject a null hypothesis of equal tutor effectiveness.
• The difference in the percentage who passed the exam and were tutored by Mr. A and the percentage who passed and were tutored by Mr. B is 20%.
• A Type II error may be made.
• The design of this study didn’t have enough power to pick up any but a very large difference in passing rates.
AP5. With α = 0.05, researchers conducted a test of the difference of two proportions to compare the rate of alcohol use among teens this year and in 1990. The rates for both years are based on large, independent random samples of teens. Which is the best interpretation of “α = 0.05” in this context?
• There’s a 5% chance that the rate of alcohol use has changed.
• There’s a 5% chance that the rate of alcohol use has not changed.
• If the rate of alcohol use has not changed, there’s a 5% chance that the researchers will mistakenly conclude that it has.
• If the rate of alcohol use has changed, there’s a 5% chance that the researchers will mistakenly believe it hasn’t.
• The study has enough power to detect a difference in the rate of alcohol use of 0.05 or more.
AP6. A student obtains a random sample of M&M’s and a random sample of Skittles. She finds that 7 of the 40 M&M’s are yellow and 13 of the 35 Skittles are yellow. Her null hypothesis is that the proportion of yellow candies is equal in both brands. Her alternative hypothesis is that the proportion is higher in Skittles. What is her conclusion, if she uses α = 0.05?
• Reject the null hypothesis. The difference in the two proportions can reasonably be attributed to chance alone.
• Reject the null hypothesis. The difference in the two proportions cannot reasonably be attributed to chance alone.
• Do not reject the null hypothesis. The difference in the two proportions can reasonably be attributed to chance alone.
• Do not reject the null hypothesis. The difference in the two proportions cannot reasonably be attributed to chance alone.
• Accept the null hypothesis because the difference in the two proportions is not statistically significant.
AP7. In a pre-election poll, 51% of a random sample of voters plan to vote for the incumbent. A 95% confidence interval was computed for the proportion of all voters who plan to vote for the incumbent. What is the best meaning of “95% confidence”?
• If all voters were asked, there is a 95% chance that 51% of them will say they plan to vote for the incumbent.
• You are 95% confident that the confidence interval contains 51%.
• In 100 similar polls, you expect that 95 of the confidence intervals constructed will contain the percentage of all voters who plan to vote for the incumbent.
• In 100 similar polls, you expect that 95 of the confidence intervals constructed will contain 51%, which is the best estimate of the percentage of all voters who plan to vote for the incumbent.
• The probability is 0.95 that the confidence interval constructed in this poll contains the percentage of all voters who plan to vote for the incumbent.
AP8. Sheldon takes a random sample of 50 U.S. housing units and finds that 30 are owner-occupied. Using a significance test for a proportion, he is not able to reject the null hypothesis that exactly half of U.S. housing units are owner-occupied. Later, Sheldon learns that the U.S. Census for the same year found that 66.2% of housing units are owner-occupied. Select the best description of the type of error in this situation.
• No error was made.
• A Type I error was made because a false null hypothesis was rejected.
• A Type I error was made because a false null hypothesis wasn’t rejected.
• A Type II error was made because a false null hypothesis was rejected.
• A Type II error was made because a false null hypothesis wasn’t rejected.
Investigative Tasks
AP9. Fifty students want to know what percentage of Skittles candies are orange. Each student takes a random sample of 50 candies and constructs a 90% confidence interval using the formula
$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
a. Of all Skittles candies, 20% are orange. How many of their fifty 90% confidence intervals would you expect to capture the population proportion of 0.20?
b. The students’ confidence intervals are plotted in Display 8.36 as vertical lines. How many of their confidence intervals capture the population proportion of 0.20? What percentage is this?
c. Are most of the confidence intervals that don’t capture the population proportion too close to 0 or too close to 0.5? Do they tend to be shorter or longer than intervals that do capture the population proportion?
d. Unfortunately, what you have discovered here is true in general: If you use
add 2 to the number of observed failures (which adds 4 to the sample size). That is, if x denotes the number of successes in a sample of n items, the new estimator of the population proportion, p, is
$$\tilde{p} = \frac{x + 2}{n + 4}$$
The Plus 4 confidence interval is then
$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$$
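The Plus 4 formulas above translate directly into code. The sketch below is only an illustration: the function name `plus_four_ci` and the sample (7 successes in 50 items) are hypothetical, and the 90% level is chosen to match the intervals in AP9.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(x, n, confidence=0.90):
    """Plus 4 interval: add 2 successes and 2 failures before estimating p."""
    p_tilde = (x + 2) / (n + 4)
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, p_tilde - margin, p_tilde + margin

# Hypothetical sample: 7 successes among 50 items.
p_tilde, low, high = plus_four_ci(7, 50)
print(f"p-tilde = {p_tilde:.4f}, 90% Plus 4 interval: ({low:.4f}, {high:.4f})")
```

Note that the center of the interval, 9/54, is pulled slightly toward 0.5 compared with the ordinary sample proportion, 7/50.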
Back-to-back stemplot of the concentration of aldrin:
Bottom | Stem | Mid-depth
     8 |  3   | 28
    98 |  4   | 389
   743 |  5   | 22
     3 |  6   | 236
     3 |  7   |
    81 |  8   |
       |  9   |
Does it make a difference whether you take water samples at mid-depth or near the bottom when studying pesticide levels in a river? Inference for differences is fundamental to statistical applications because many surveys and all experiments are comparative in nature.
You now know how to make inferences for proportions based on categorical
data. But often the data that arise in everyday contexts are measurement data
rather than categorical data. Measurement data, as you know, are most often
summarized by using the mean. Mean income, mean scores on exams, mean
waiting times at checkout lines, and mean heights of people your age are all
commonly used in making decisions that affect your life.
In Chapter 8, you learned how to construct confidence intervals and
perform significance tests for proportions. Although the formulas you’ll use for
the inferential procedures in this chapter will change a little from those you just
learned, the basic concepts remain the same. These concepts are all built around
the question “What are the reasonably likely outcomes from a random sample?”
This chapter will follow the same outline as Chapter 8 except that the methods
are applied to means rather than proportions.
2. Use your calculator to find the mean, x̄, and standard deviation, s. Record
the value of s. [See Calculator Note 2F.]
3. Construct a confidence interval using the formula
$$\bar{x} \pm 1.96 \cdot \frac{s}{\sqrt{n}}$$
You learned in the activity that the sampling distribution of s is skewed right.
Although the average value of s is about equal to σ, s tends to be smaller than σ
more often than it is larger because the median is smaller than the mean for this
skewed distribution. This causes the confidence intervals to be too narrow more
than half the time, so the capture rate will be less than the advertised value of 95%.
Fortunately, the sampling distribution of s becomes less skewed and more
approximately normal as the sample size increases. The histogram on the left in
Display 9.1 shows the values of s for 1000 samples of size 4 taken from a normally
distributed population with σ = 107. For this small sample size, the distribution
is quite skewed. The histogram on the right shows the values of s for 1000 samples
of size 20. There is little skewness here, so s is smaller than σ about as often as it
is larger.
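If you want to see this behavior for yourself, a simulation along the following lines is one way to do it. This is a sketch, not the procedure used to make Display 9.1: the population mean of 0, the seed, and the 10,000 repetitions are assumptions chosen for convenience (the display uses 1000 samples).

```python
import random
from math import sqrt
from statistics import mean, median

def sample_sd(xs):
    """Sample standard deviation s (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

rng = random.Random(1)        # fixed seed so the run is repeatable
sigma, reps = 107, 10000      # reps is an assumption; the text uses 1000 samples

fraction_below = {}
for n in (4, 20):
    # The population mean (0 here) is arbitrary; it does not affect s.
    sds = [sample_sd([rng.gauss(0, sigma) for _ in range(n)]) for _ in range(reps)]
    fraction_below[n] = sum(s < sigma for s in sds) / reps
    print(f"n = {n:2d}: median s = {median(sds):6.1f}, mean s = {mean(sds):6.1f}, "
          f"fraction of samples with s < sigma = {fraction_below[n]:.3f}")
```

For samples of size 4, clearly more than half of the values of s should fall below σ; for samples of size 20, the fraction should be closer to one-half.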
[Two histograms of the relative frequency of s.]
Display 9.1 Approximate sampling distribution of s for samples
of size 4 (left) and size 20 (right) for a normally
distributed population with σ = 107.
Solution
First, you need to check that these are random samples from relatively large
populations and that it’s reasonable to assume that the populations are normal.
(More on this is in Section 9.3.) The problem states that these are random
samples selected from a group of men and women examined by the researchers,
so your final result generalizes only to this population. The researchers reported
that the distributions are normal. You should always plot the data, however, and
the stemplots in Display 9.4 give you no reason to suspect that the populations
are not normally distributed.
Male Temperatures
96 |
   | 9
97 | 4
   | 5889
98 | 01
   | 68
Female Temperatures
97 |
   | 8
98 | 0222
   | 688
99 | 24
97 | 8 represents 97.8°
or (98.14, 98.90).
From the sample means it might appear that the average body temperature of
females is higher than that of males, but beware! There is some overlap in the
confidence intervals, so that conclusion might not be valid. You’ll come back to
the question of comparing two means in Section 9.4.
■
Solution
Check conditions. You can consider these measurements to be a random sample of all possible
measurements by this mass spectrometer. Display 9.6 shows that the distribution
of measurements from the sample is reasonably symmetric with no outliers, so it
is reasonable to assume that they could have come from a normal distribution.
Margin of Error
The quantity
$$E = t^* \cdot \frac{s}{\sqrt{n}}$$
is called the margin of error. It is half the width of the confidence interval. When
your samples are random, larger samples provide more information than do
smaller ones. So, in general, as the sample size gets larger, the margin of error
gets smaller.
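A quick computation shows how the margin of error shrinks as n grows. In this sketch the sample standard deviation s = 1.0 is hypothetical, and the two values of t* for 95% confidence (2.262 for df = 9 and 2.045 for df = 29) are taken from a standard t-table.

```python
from math import sqrt

# t* for 95% confidence from a standard t-table, keyed by sample size n (df = n - 1).
T_STAR_95 = {10: 2.262, 30: 2.045}

def margin_of_error(s, n, t_star):
    """E = t* times s / sqrt(n), half the width of the confidence interval."""
    return t_star * s / sqrt(n)

s = 1.0  # hypothetical sample standard deviation, held fixed for the comparison
margins = {n: margin_of_error(s, n, t_star) for n, t_star in T_STAR_95.items()}
for n, e in margins.items():
    print(f"n = {n}: margin of error = {e:.4f}")
```

Two things make the margin smaller for the larger sample: the divisor √n is larger, and t* itself is smaller because there are more degrees of freedom.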
Practice
The Effect of Estimating σ
P1. Aldrin in the Wolf River. Aldrin is a highly toxic organic compound that can cause various cancers and birth defects. Ten samples taken from Tennessee’s Wolf River downstream from a toxic waste site once used by the pesticide industry gave these aldrin concentrations, in nanograms per liter:
5.17 6.17 6.26 4.26 3.17
3.76 4.76 4.90 6.57 5.17
a. Calculate a confidence interval of the form $\bar{x} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}$, using s as an estimate of σ.
b. In general, how do you expect this estimate, using s for σ, to affect the center of a confidence interval? Will it tend to make the interval too wide or too narrow? Will it make the capture rate larger than 95% or smaller than 95%?
How to Adjust for Estimating σ
P2. Using a t-Table. Use Table B on page 826 to find the correct value of t* to use for each of these situations.
a. a 95% confidence interval based on a sample of size 10
Constructing a Confidence Interval for a Mean
P5. An article in the Journal of the American Medical Association included the body temperatures of 122 men. (The data in the example on pages 566–567 were a random sample taken from this larger group of men.) The summary statistics for the entire sample of men’s temperatures were x̄ = 98.1, s = 0.73, and n = 122. [Source: P. A. Mackowiak et al., “A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich,” Journal of the American Medical Association 268 (September 1992): 1578–80.]
a. Construct a 95% confidence interval for the mean of the population from which these data were selected. You can assume that the conditions have been met.
b. Write an interpretation of this confidence interval.
c. Is 98.6°F in this interval? What can you conclude?
P6. The data in Display 9.7 are from a survey taken in an introductory statistics class during the first week of the semester. These data are the number of hours of study per week reported by each of the 61 students. Because of the way students register for class,
Display 9.7 Number of hours of study per week.
a. You were told that the sample can be considered a random selection from this large population of students. Do the data look as if they reasonably could have come from a normal distribution?
b. Regardless of your answer to part a, estimate the mean study hours per week for all the students taking this course, with 90% confidence.
c. Explain the meaning of your confidence interval.
More on Interpreting a Confidence Interval
P7. Refer to the 95% confidence interval you constructed for the aldrin data in P4. Which of statements A–E are correct interpretations of that interval?
A. If you take one more measurement from the Wolf River, you are 95% confident that this measurement will fall in the confidence interval.
B. If you take ten more measurements from the Wolf River, you are 95% confident that the sample mean of the ten measurements will fall in the confidence interval.
C. There is a 95% chance that the mean aldrin level of the Wolf River falls in the confidence interval.
D. You are 95% confident that the mean aldrin level of the Wolf River falls in the confidence interval.
E. You are 95% confident that the sample mean falls in the confidence interval.
P8. A 95% confidence interval for a population mean is calculated for a random sample of weights, and the resulting confidence interval is from 42 to 48 lb. For each statement, indicate whether it is a true or false interpretation of the confidence interval. Explain why the false statements are false.
a. 95% of the weights in the population are between 42 lb and 48 lb.
b. 95% of the weights in the sample are between 42 lb and 48 lb.
c. The probability that the interval includes the population mean, μ, is 95%.
d. The sample mean, x̄, might not be in the confidence interval.
e. If 200 confidence intervals were generated using the same process, about 10 of these confidence intervals would not include the population mean.
Margin of Error
P9. Refer to the confidence intervals for the mean body temperature of men and women in the example on pages 566–567.
a. Find the margin of error for the men and for the women.
b. Which margin of error is larger? Why is it larger?
Exercises
E1. If you are lost in the woods and do not have clear directional markers, will you tend to walk in a circle? To study this question, some students recruited 30 volunteers to attempt to walk the length of a football field while blindfolded. Each volunteer began at the middle of one goal line and was asked to walk to the opposite goal line, a distance of 100 yards. None of them made it. Display 9.8 shows the number of yards they walked before they crossed a sideline.
Yards Walked Before Walker Went Out of Bounds
35    59
37    60
37    60
38    61
40    65
42    68
42    70
48    70
[Boxplots of the difference in pulse rates for experiments A, B, C, and D; vertical axis from −30 to 30.]
$$z = \frac{\text{statistic} - \text{parameter}}{\text{standard deviation of statistic}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$$
Solution
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{3.16 - 3.11}{0.065/\sqrt{9}} \approx 2.31$$
■
Constructing a t-Distribution
Before you can start interpreting the new test statistic t, you need to know how
it behaves when the null hypothesis is true. If the value of t from your sample fits
right into the middle of the distribution of t constructed by assuming the null
hypothesis is true, you have no evidence against the null hypothesis. If the value of
t from your sample is way out in the tail of the t-distribution, you have evidence
[Two histograms: frequency of z (left) and frequency of t (right), each on a horizontal scale from −6 to 6.]
Display 9.18 (Left) 1000 values of z computed from random samples taken from a normally
distributed population with mean 100 and standard deviation 15; (right) 1000
values of t computed from random samples from the same population.
Note that the values of t are more spread out than are the values of z. That
makes sense. Not only does x̄ vary from sample to sample, but s also varies from
sample to sample. And, as you saw in Section 9.1, s tends to be smaller than σ
more often than it tends to be larger than σ, making t larger in absolute value than
the corresponding z more often than it is smaller. Thus, when you compute t, you
tend to get more values in the tails of the distribution than when you compute z.
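A simulation in the spirit of Display 9.18 makes the heavier tails visible. The sketch below is an illustration only: the sample size of 4, the seed, and the 20,000 repetitions are assumptions, and the cutoff of 1.96 is used simply as a convenient marker for "in the tails."

```python
import random
from math import sqrt

rng = random.Random(2)                    # fixed seed so the run is repeatable
mu, sigma, n, reps = 100, 15, 4, 20000    # n = 4 and reps are assumptions

z_tail = t_tail = 0
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    z = (xbar - mu) / (sigma / sqrt(n))   # uses the known sigma
    t = (xbar - mu) / (s / sqrt(n))       # uses s, which varies from sample to sample
    z_tail += abs(z) > 1.96
    t_tail += abs(t) > 1.96

z_fraction, t_fraction = z_tail / reps, t_tail / reps
print(f"fraction of |z| > 1.96: {z_fraction:.3f}")
print(f"fraction of |t| > 1.96: {t_fraction:.3f}")
```

About 5% of the z-values should land beyond ±1.96, while a noticeably larger share of the t-values should, which is why 1.96 is not the right critical value when you estimate σ with s.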
The t-Distributions
Suppose you draw random samples of size n from a normally distributed
population with mean μ and unknown standard deviation. The distribution of
the values of
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is called a t-distribution. There is a different t-distribution for each degree of
freedom, df = n − 1, where n is the sample size.
A t-distribution is mound-shaped, with mean 0 and a spread that depends
on the value of df. The greater the df, the smaller the spread. The spread of
any t-distribution is greater than that of the standard normal distribution.
Display 9.19 shows the t-distribution for df = 4 plotted on the same graph as
the standard normal distribution.
[Plot of probability density for the standard normal distribution (z) and the t-distribution (t), on a horizontal scale from −4 to 4.]
You can explore graphs of t-distributions using your calculator. [See Calculator
Note 9C.]
[Sketch: t-distribution with area ½P = 0.0248 in each tail, to the left of −2.31 and to the right of 2.31.]
Display 9.21 P-value for t = 2.31 on a t-distribution with df = 8.
You can get this P-value from your calculator or software. Display 9.22 shows the
P-value, 2(0.0248) = 0.0496, or approximately 0.05, on various printouts. [See
Calculator Note 9D to learn how to find a P-value given t and df.]
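If you are curious about what the calculator is doing, the tail area can be approximated by integrating the t-distribution's density numerically. The sketch below uses only Python's standard library; the function names and the choice of Simpson's rule with 2000 steps are assumptions made for this illustration, not the method any particular calculator uses.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Area to the right of t (t >= 0), via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t      # half the area lies to the right of 0

def two_sided_p(t, df):
    """Two-sided P-value: double the area beyond |t|."""
    return 2 * upper_tail(abs(t), df)

p = two_sided_p(2.31, 8)
print(f"two-sided P-value for t = 2.31, df = 8: {p:.4f}")
```

The result should agree with the value of about 0.05 shown in the printout.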
T-Test of the Mean
Test of mu = 3.1100 vs mu not = 3.1100
Count: 9
Mean: 3.16
Std dev: 0.065
Std error: 0.0216667
Student's t: 2.308
DF: 8
P-value: 0.05
Display 9.22 P-values from Minitab, Fathom, and the TI-84 Plus.
If you do not have a calculator or software that finds P-values, you can get
an estimate of the P-value from Table B on page 826. The next example shows
you how.
[Sketch: t-distribution with area ½P in each tail, to the left of −3.98 and to the right of 3.98.]
Display 9.23 A test statistic, t, of 3.98.
Solution
Go to Table B on page 826 and find the row with 9 degrees of freedom. Go across
the row until you find the absolute value of your value of t, which probably will
lie between two of the values in the table. The partial t-table here shows the two
values of t that lie on each side of t = 3.98.
        Tail Probability p
df      .0025     .001
9       3.690     4.297
The “tail probability” gives the area that lies in each tail. Because you have a
two-sided test, you will double the tail probability. If you must use Table B to find
a P-value, all you can say is that the P-value is between 2(0.001), or 0.002, and
2(0.0025), or 0.005. That is, 0.002 < P-value < 0.005. To find the P-value more
precisely, use your calculator to get approximately 0.0032.
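The table lookup just described amounts to finding the two neighboring critical values and doubling their tail probabilities. The sketch below encodes only the two table entries shown above for df = 9; the dictionary layout and the function name `bracket_two_sided_p` are assumptions made for this illustration.

```python
# Partial row of the t-table for df = 9: tail probability -> critical value.
ROW_DF9 = {0.0025: 3.690, 0.001: 4.297}

def bracket_two_sided_p(t, row):
    """Return (lower, upper) bounds on the two-sided P-value from one table row.

    A bound is None when the row has no critical value on that side of |t|.
    """
    t = abs(t)
    # Critical values at or below |t| have tail areas at least as large as ours.
    at_least = [p for p, crit in row.items() if crit <= t]
    # Critical values at or above |t| have tail areas no larger than ours.
    at_most = [p for p, crit in row.items() if crit >= t]
    upper = 2 * min(at_least) if at_least else None
    lower = 2 * max(at_most) if at_most else None
    return lower, upper

lower, upper = bracket_two_sided_p(3.98, ROW_DF9)
print(f"{lower} < P-value < {upper}")
```

With a full row of the table, the same function would pick out whichever pair of neighboring entries surrounds your value of t.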
or
[Sketch: t-distribution with area ½P in each tail, to the left of −t and to the right of t.]
When the P-value is close to 0, you have convincing evidence against
the null hypothesis. When the P-value is large, the result from the sample is
consistent with the hypothesized mean and you don’t have evidence against the
null hypothesis.
DISCUSSION P-Values
D9. Study the formula for the test statistic when testing a hypothesis about a
population mean, and think about the relationship of the test statistic to the
P-value.
a. What happens to the P-value if the sample standard deviation increases
but everything else remains the same?
b. What happens to the P-value if the sample size increases but everything
else remains the same?
Fixed-Level Testing
When you use fixed-level testing, you reject the null hypothesis in favor of the
alternative hypothesis if your P-value is less than the level of significance, α.
If your P-value is greater than or equal to the level of significance, you do not
reject the null hypothesis.
The significance level is equal to the probability of rejecting the null
hypothesis when it is true (making a Type I error).
The smaller the significance level you choose, the stronger you are requiring
the evidence to be in order to reject H0. The stronger the evidence you require, the
less likely you are to make a Type I error (to reject H0 when it is true). However,
the stronger the evidence you require, the more likely you are to make a Type II
error (to fail to reject H0 when it is false).
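The claim that the significance level equals the probability of a Type I error can be checked by simulation. In this sketch the null hypothesis is true by construction: samples of size 9 come from a normal population whose mean equals μ0. The population values (mean 3.11, standard deviation 0.065), the seed, and the 20,000 repetitions are assumptions made for the illustration; t* = 2.306 is the table value for df = 8 and a two-sided test with α = 0.05.

```python
import random
from math import sqrt

rng = random.Random(3)            # fixed seed so the run is repeatable
mu0, sigma, n = 3.11, 0.065, 9    # hypothetical population for which H0 is true
t_star = 2.306                    # table value: df = 8, two-sided alpha = 0.05
reps = 20000

rejections = 0
for _ in range(reps):
    xs = [rng.gauss(mu0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    t = (xbar - mu0) / (s / sqrt(n))
    if abs(t) > t_star:           # same decision as "P-value less than alpha"
        rejections += 1           # H0 is true here, so this is a Type I error

type_i_rate = rejections / reps
print(f"fraction of samples that rejected a true H0: {type_i_rate:.3f}")
```

The fraction should come out close to 0.05. Choosing a smaller α would mean a larger t*, fewer rejections, and so fewer Type I errors.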
Note that, in the previous example, if your calculator or software rounded the
P-value to 0.05, you would not have rejected the null hypothesis at a significance
level, α, of 0.05. That’s the major drawback of fixed-level testing: It boils down all
the data to the two choices “reject” or “don’t reject.” To see what gets lost, think
about a court trial where “not guilty” can mean anything from “This guy is so
innocent he should never have been brought to trial” to “Everyone on the jury
thinks the defendant did it, but the evidence presented isn’t quite strong enough.”
In the same way, “do not reject” communicates only a small fraction of the
The t-Test
Tests of significance for means, called t-tests, have the same general structure as
significance tests for proportions, although some details are a bit different.
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
[Sketch: t-distribution with area ½P in each tail, to the left of −t and to the right of t.]
The next example demonstrates a fixed-level significance test with all four
steps included.
You can use your calculator to perform a significance test for a mean. [See
Calculator Note 9E.]
In the next example, you will learn one technique for dealing with outliers.
Solution
Check conditions. This type of problem is commonly referred to as a measurement error problem.
There is no population of measurements from which this sample was selected.
The measurements were, however, independently determined by different
scientists and can be thought of as a “random” sample taken from a conceptual
population of all such measurements that could be made.
The stemplot in Display 9.25 shows that one measurement is extremely large
compared to the others. This sample does not look as if it could reasonably have
been drawn from a normally distributed population. The large measurement,
93.28 million miles, is Newcomb’s original measurement from 1895 and thus is
different from the more modern measurements. There might be a good scientific
reason to remove it from the data set. So the analysis will be done twice, once with
Newcomb’s measurement and once without it.
Letting μ denote the true value of the astronomical unit, the hypotheses are
State H0 and Ha. H0: μ = 93 and Ha: μ ≠ 93
Compute the test statistic, find the P-value, and draw a sketch. From the data, x̄ = 92.93 and s = 0.112. The test statistic then is given by
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{92.93 - 93.00}{0.112/\sqrt{15}} \approx -2.42$$
[Sketch: t-distribution with area ½P = 0.0148 in each tail, to the left of −2.42 and to the right of 2.42.]
One-Sided Tests
Sometimes it makes sense to consider alternatives that depart from the standard in only one direction.
So far in this section, every test has had a two-sided alternative hypothesis: The
true mean, μ, is not equal to the standard, μ0. But there are two other possible
alternatives: μ might be less than the standard, or μ might be greater than the
standard. In real applications, you sometimes can use the context to rule out one
of these two possibilities as meaningless, impossible, uninteresting, or irrelevant.
In Martin v. Westvaco (Chapter 1), a statistical analysis compared the ages
of the workers who were laid off with the ages of workers in the population of
employees working for Westvaco at the time of the layoff. An average age for the
fired workers that was greater than the population mean would tend to support
a claim of age discrimination. On the other hand, the opposite inequality is
not relevant: An average age for the fired workers that was less than the overall
average would not be evidence of age discrimination, because younger workers
aren’t protected under the law.
When the context tells you to use a one-sided alternative hypothesis, your
P-value is the area on only one side of the t-distribution. Such P-values are called
one-sided or one-tailed P-values, and the corresponding test of significance is
called a one-sided or one-tailed test, just as in Chapter 8.
Compute the test statistic, find the P-value, and draw a sketch.
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1050 - 1015}{150/\sqrt{64}} \approx 1.87 \]
[Sketch: t-distribution centered at 0 with the area to the right of t = 1.87 shaded; P = 0.0331.]
The P-value is the area to the right of 1.87 under the t-distribution with
df = 64 − 1, or 63. From a calculator, the P-value is about 0.0331.
Write a conclusion linked to computations and in the context of the situation.
A P-value of only 0.0331 is fairly strong evidence against the null hypothesis
that the mean amount of time spent studying each week is 1015 minutes. Thus,
the data support the dean’s claim that the mean is greater than 1015 minutes.
■
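As a rough check on the one-sided P-value in this example, the sketch below recomputes the test statistic and the area in the right tail only. The helpers `t_pdf` and `t_upper_tail` are illustrative names, not part of the text, and the tail area is a Simpson's-rule approximation rather than a calculator's built-in function.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t, using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    beyond = 0.5 - total * h / 3   # area beyond |t| in one tail
    return beyond if t >= 0 else 1.0 - beyond

# Summary statistics from the study-time example; Ha: mu > 1015
x_bar, mu0, s, n = 1050, 1015, 150, 64

t = (x_bar - mu0) / (s / math.sqrt(n))
p_one_sided = t_upper_tail(t, n - 1)   # right tail only, df = 63

print(f"t = {t:.2f}, one-sided P = {p_one_sided:.4f}")
```

Because the unrounded statistic is slightly below 1.87, the computed P-value can differ from the text's 0.0331 in the fourth decimal place.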
Practice
The Test Statistic
P10. The thermostat in your classroom is set at 72°F, but you think the thermostat
isn’t working well. On seven randomly selected days, you measure the temperature at
your seat. Your measurements (in degrees Fahrenheit) are 71, 73, 69, 68, 69, 70, and 71.
What is the test statistic for a significance test of whether the mean temperature at
your seat is different from 72°F?
Constructing a t-Distribution
P12. One of the distributions in Display 9.26 is a t-distribution, and the other is the
standard normal distribution. Which is which? Explain how you know.
[Display 9.26: plot of the two density curves; the vertical axis, y, is marked at 0.2, 0.3, and 0.4.]
corresponds to evidence that μ μ0.
What is the relationship between the two-sided and one-sided P-values for
a given value of the test statistic?
c. If all else is equal and the alternative hypothesis is in the right direction, will
the P-value be larger for a one-sided test or a two-sided test?
E34. For the aldrin data of P1 on page 573, find the P-value for each alternative
hypothesis.
a. Ha: μ 4
b. Ha: μ 4
c. Ha: μ 4
E35. Suppose H0 is true and you are using a significance level of 5%. Which gives a
larger chance of a Type I error, a one-sided test or a two-sided test? Explain.
E36. If the null hypothesis is true, does using α = 0.05 or α = 0.01 give a larger
chance of a Type I error?
E37. In each part, the null hypothesis is actually false. Which test will have more
power, all other things being equal?
a. a test with α = 0.01; a test with α = 0.10
b. a test with n = 45; a test with n = 29
c. a one-sided test; a two-sided test
E38. Suppose μ 0 and you are using a significance level of 5%. Which of these
three tests has the greatest power (probability of rejecting the false null hypothesis
that μ = 0)?
A. a one-sided test of H0 versus Ha: μ 0
B. a two-sided test
C. a one-sided test of H0 versus Ha: μ 0
E39. Histograms A and B in Display 9.30 were generated from 200 random samples of
size 4, each selected from a normal distribution
[Histogram A and Histogram B: frequency (0 to 40) against values from −8 to 4.]
Display 9.30 Histograms of the t-values and z-values computed for 200 random
samples of size 4 taken from a normal distribution with mean 100 and
standard deviation 20.
a. Compare the shapes, centers, and spreads of these two distributions.
b. Choose which is the distribution of t-values, and give the reason for your choice.
E40. Degrees of freedom tell how much information in your sample is available for
estimating the standard deviation of the population. In a t-test, you use s to
estimate σ. The more deviations from the mean, $(x - \bar{x})$, you have available
to use in the formula for s, the closer s should be to σ. But, as you will see in this
exercise, not all of these deviations give you independent information.
a. For a sample of size n, how many deviations from the mean are there? What is
their sum? (See Section 2.3.)
b. Suppose you have a random sample of size n = 1 from a completely unknown
population. Call the sample value x1. Does the value of x1, by itself, give you
information about the spread of the population?
c. Next suppose you have a sample of size n = 2, with values x1 and x2. If one
deviation, $x_1 - \bar{x}$, is 3, what is the other deviation?
d. Now suppose you have a sample of size n = 3. Suppose you know two of the
[Histogram: frequency (0 to 60) against Brain Weight (g), from 0 to 6000.]
Display 9.31 Brain weights for a selection of species. [Source:
T. Allison and D. V. Cicchetti, “Sleep in Mammals: Ecological and
Constitutional Correlates,” Science 194 (1976): 732–34.]
The lesson to be learned from Activity 9.3a is that if you have data from a
distribution that is far from normal, confidence intervals and hypothesis tests
might not behave the way they are supposed to.
[Histogram: frequency against ln(brain weight), from −2 to 8.]
Display 9.32 Logarithms of brain weights.
[Histogram: frequency against Sample Mean, from 1 to 6.]
Display 9.33 Sample means for 100 samples of size 5 from the logarithms
of brain weights. ■
Solution
The boxplots show that the gallons per mile distribution doesn’t have an outlier
and is less skewed than the miles per gallon distribution and thus should be better
suited to inference.
Check conditions.
The second condition for constructing a confidence interval for a mean also
is met: you were told this is a random sample taken from the population of
compact car models. The third condition is met if there are at least 10(15), or 150,
models of compact cars. Because this isn’t the case, the confidence interval will be
wider than necessary for 95% confidence.
Show computations.
The statistics from the sample are $\bar{x} = 0.0405$, s = 0.00669, and n = 15. For
df = 15 − 1, or 14, use t* = 2.145. A 95% confidence interval for the population mean is
\[ \bar{x} \pm t^* \cdot \frac{s}{\sqrt{n}} = 0.0405 \pm 2.145 \cdot \frac{0.00669}{\sqrt{15}} \approx 0.0405 \pm 0.0037 \]
Give conclusion in context.
You are 95% confident that the mean gallons per mile of all models of compact cars
lies in the interval from 0.0368 to 0.0442. So, any mean gal/mi in that interval could
have produced the result from the sample as a reasonably likely outcome.
Applying the methodology of this chapter to gal/mi measurements will,
over many random samples, produce intervals with a capture rate closer to the
nominal 95% than would using miles per gallon measurements.
■
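The interval arithmetic in this example is short enough to verify directly. The sketch below is only an illustration: it takes t* = 2.145 as given in the text rather than computing it, and the variable names are assumptions.

```python
import math

# Summary statistics for gallons per mile, from the example
x_bar = 0.0405   # sample mean
s = 0.00669      # sample standard deviation
n = 15           # sample size
t_star = 2.145   # critical value for 95% confidence, df = 14 (from the text)

margin = t_star * s / math.sqrt(n)   # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"{x_bar} ± {margin:.4f} -> ({lower:.4f}, {upper:.4f})")
```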
[Figure: Sample Size scale marked at 0, 15, and 40.]
Practice
What If My Population Is Not Normal?
P25. Pretend that each data set (A–D) is a random sample and that you want to do a
significance test or construct a confidence interval for the unknown mean. Use the
sample size and the shape of the distribution to decide which description (I–IV)
best fits each data set.
A. weights of bears (Display 2.6 on page 33)
B. number of passengers at airports (Display 7.53 on page 462)
C. speed of mammals (Display 2.25 on page 44)
D. female life expectancy in Africa and Europe (Display 2.53 on page 69)
I. There are no outliers, and there is no evidence of skewness. Methods based
on the normal distribution are suitable.
II. The distribution is not symmetric, but the sample is large enough that it is
reasonable to rely on the robustness of the t-procedure and construct
a confidence interval, without transforming the data to a new scale.

Number of Fries    Number of Fries
in Large Bag       in Large Bag
 63                108
122                 92
 92                 67
 96                 90
 72                103
 68                 72
 88                 76
 64                 93
 96                 93
 76                 99
111                 67
144                101
 93                 92
118                110
 79                 94
[Dot plot: Number of Fries in Large Bag, axis from 60 to 150.]
I. There are no outliers, and there is no evidence of skewness. Methods based
on the normal distribution are suitable.
II. The distribution is not symmetric, but the sample is large enough that it is
reasonable to rely on the robustness of the t-procedure and construct
a confidence interval, without transforming the data to a new scale.
III. The shape suggests transforming. With a larger sample, this might not be
necessary, but for a skewed sample of this size transforming is worth trying.
IV. It would be a good idea to analyze this data set twice, once with the outliers and
once without.
E41. See the instructions above.
A. weights, in ounces, of bags of potato chips
[Histogram: frequency (0 to 15) against Batting Average, from .190 to .340.]
D. self-reported grade-point averages of 67 students
[Plot: GPA, axis from 1.5 to 4.0.]
E42. See the instructions above.
A. mean number of people per room for various countries
[Plot: Age (yr), axis from 20 to 70.]
E43. Health insurance companies look for ways to lower costs, and one way is to
shorten the length of hospital stays. A study to compare two large insurance
companies on length of stay (LOS) for pediatric asthma patients randomly sampled
393 cases from Insurer A. Summary statistics and a histogram of the data are shown
in Display 9.37.
Summary of Insurer A: Length of Stay
Count 393
Mean 2.32
SD 1.23
[Histogram: frequency (0 to 200) against Length of Stay (days), from 1 to 8.]
Display 9.37 Summary and data plot for lengths of hospital stays for Insurer A.
[Source: R. Peck, L. Haugh, and A. Goodman, Statistical Case Studies, 1998,
ASA-SIAM, 45–64.]
a. Is it appropriate to construct a confidence interval without transforming these
data? Explain.
b. Regardless of your answer to part a, estimate the mean LOS for Insurer A in a
90% confidence interval.
c. Would you be more concerned about constructing a confidence interval
without a transformation if the sample size was 40 instead of nearly 400? What
about a sample size of 4 instead of nearly 400?
E44. An independent random sample of 396 cases from Insurer B gave the results for
length of stay summarized in Display 9.38.
Summary of Insurer B: Length of Stay
Count 396
Mean 2.91
SD 1.58
[Histogram: frequency (0 to 100) against Length of Stay (days), from 1 to 9.]
Display 9.38 Summary and data plot for lengths of hospital stays for Insurer B.
a. Is it appropriate to construct a confidence interval without transforming these
data? Explain.
b. Regardless of your answer to part a, estimate the mean LOS for Insurer B in a
90% confidence interval.
c. Would you be more concerned about constructing a confidence interval
without a transformation if the sample size was 40 instead of nearly 400? What
about a sample size of 4 instead of nearly 400?
d. Compare your confidence intervals in part b of E43 and E44. Does it appear that
the two insurers differ in mean LOS?
E45. Display 9.39 (on the next page) gives the location and mass of 31 black holes.
Assume that these can be considered a random sample of all black holes in the
universe.
a. Have the conditions been met for constructing a confidence interval for a
mean? If not, how would you recommend that the analysis proceed?
b. Construct and interpret a 90% confidence interval for a mean, proceeding in the
way you recommended in part a.
mean on this transformed scale.
a. Are there any problems with computing a confidence interval estimate of the mean
mercury levels in fish for the lakes of Maine using the values in the table?
If so, how do you recommend handling the analysis?
b. One newspaper headline proclaimed, “Mercury: Maine Fish Are Contaminated
by This Deadly Poison.” Most states consider mercury levels of 0.5 ppm as the
borderline for issuing a health advisory. Does it appear that, on average, Maine
[Histogram: frequency against Los Angeles rainfall.]
Display 9.44 Los Angeles rainfall, in inches per season, for 128 seasons.
a. You want to estimate mean seasonal rainfall in a confidence interval and
can afford to check a random sample of only seven seasons. If you suspect
that the population has a shape like that in the histogram in Display 9.44, can
you proceed, or should you consider a transformation?
b. Display 9.45 shows a reciprocal transformation of the rainfall data. The
mean of the distribution is about 0.081. Does this mean have a meaningful
interpretation? Are you satisfied with the conditions for inference now?
[Histogram: frequency (0 to 40) against Los Angeles Rainfall (yr/in.), from 0 to 0.25.]
E50. Display 9.47 shows a random sample of 7 seasons taken from the 128 seasons of
Los Angeles rainfall in E49.
Season     Rainfall (in.)
1958–59     5.58
1996–97    12.4
1906–07    19.3
1926–27    16.58
1967–68    11.88
1932–33    11.88
1944–45    11.59
19     28.15    9.32   13.85   10.01
20     33.98   12.64   15.48   28.18
Mean   18.17    9.01   13.69   12.85
Display 9.49 Data on visual/verbal report times (in seconds) from the Mount Holyoke
experiment.
Display 9.50 Boxplots of the times of each of the four treatment groups.
c. Using the procedure you think best, estimate each of the four treatment means
in 90% confidence intervals. Is there any evidence against the theory?
Hypothetical Data Set 4 Hypothetical Data Set 5 Hypothetical Data Set 6 Hypothetical Data Set 7
Bottom Mid-Depth Bottom Mid-Depth Bottom Mid-Depth Bottom Mid-Depth
2 2 2 2
88 3 2288 8 3 2 3 8 3
9988 4 338899 9 4 38 8 4 28 98 4 28
774433 5 2222 43 5 2 98 5 389 743 5 389
33 6 223366 3 6 36 743 6 22 3 6 22
33 7 3 7 3 7 236 3 7 236
8811 8 8 8 3 8 81 8
9 9 81 9 9
Display 9.51 Concentrations of aldrin (in tenths of a nanogram per liter) for samples from
the Wolf River: actual data and seven hypothetical data sets. [Source: P. R. Jaffe,
F. L. Parker, and D. J. Wilson, “Distribution of Toxic Substances in Rivers,” Journal of Environmental
Engineering Division 108 (1982): 639–49 (via Robert V. Hogg and Johannes Ledolter, Engineering
Statistics (New York: Macmillan, 1987).]
Most people can simply look at hypothetical data set 1 in Display 9.51 and
correctly conclude that the mean concentration at the bottom probably isn’t
equal to the mean concentration at mid-depth. However, with the actual data
and the other hypothetical data sets, it’s not clear if there really is a difference.
And to estimate the size of any difference, you need to compute a confidence
interval.
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
where $\bar{x}_1$ and $\bar{x}_2$ are the respective means of the two samples, s1 and s2 are
the standard deviations, and n1 and n2 are the sample sizes. It is best to use a
calculator or statistics software to find this confidence interval because the
value of t* depends on a complicated calculation.
Give interpretation in context and linked to computations.
For a 95% confidence interval, for example, you are 95% confident that if you
knew the means of both populations, the difference between those means,
μ1 − μ2, would lie in the confidence interval.
For an experiment, the interpretation is like this: If all experimental units
could have been assigned each treatment, you are 95% confident that the
difference between the means of the two treatment groups would lie in the
confidence interval.
Of course, when you interpret a confidence interval, you do it in context,
describing the two populations.
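The standard error of the difference, $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, comes directly from the rules
for random variables in the box in Section 6.1 on page 372. That is, if you have
two independent random variables with variances $\frac{\sigma_1^2}{n_1}$ and $\frac{\sigma_2^2}{n_2}$, estimated by
$\frac{s_1^2}{n_1}$ and $\frac{s_2^2}{n_2}$, the variance of their difference is the sum of the two variances.
Taking the square root of that sum gives the standard error,
$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.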
Solution
Name the procedure and check conditions.
You will construct a confidence interval for the difference between two means
(also called a two-sample t-interval). You have two samples from two populations,
but you have no information about how randomly they were taken with respect
to either their location in the river or time. Further, you don’t know if the samples
were taken independently of one another. If, for example, a pair of bottom and
mid-depth measurements were taken at the same time from the same spot, the
measurements would not be independent (and you would use the techniques of
the next section). Neither sample is skewed or has outliers, and each looks like it is
reasonable to assume that it came from a normally distributed population.
Do computations.
Here are the summary statistics:
Bottom: $\bar{x}_1 = 6.04$, s1 = 1.579, n1 = 10
Mid-depth: $\bar{x}_2 = 5.05$, s2 = 1.104, n2 = 10
Using a calculator (which does not give t*), you get a 95% confidence interval of
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (6.04 - 5.05) \pm t^* \sqrt{\frac{(1.579)^2}{10} + \frac{(1.104)^2}{10}} \approx 0.99 \pm 1.2909 \]
or (−0.3009, 2.2809), where df ≈ 16.10.
Give interpretation in context and linked to computations.
You are 95% confident that the true difference between the mean aldrin
levels in the two depths in the Wolf River, μbottom − μmid-depth, is in the interval
(−0.3009, 2.2809). This interval overlaps 0 slightly, so there is insufficient
evidence to say that the true mean of the bottom measurements differs from the
true mean of the mid-depth measurements (but it is close). However, just as for
any confidence interval, unless the conditions are satisfied, there is no automatic
guarantee that the capture rate will be equal to the advertised confidence
level. Thus, the conclusion must be limited to the population of potential
measurements from which these samples were taken.
■
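For readers without a two-sample t-interval function, the aldrin interval can be approximated from the summary statistics alone. The sketch below is an illustration under stated assumptions: it uses the standard approximate degrees-of-freedom formula for unequal variances, and the helper names (`t_pdf`, `t_upper_tail`, `t_critical`, `welch_df`) are invented here. The critical value t* is found by bisection on a Simpson's-rule tail area, so small differences from a calculator in the last digit are expected.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t >= 0, using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(conf, df):
    """Find t* whose central area is conf, by bisection on the tail area."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid      # tail still too large: t* is farther out
        else:
            hi = mid
    return (lo + hi) / 2

def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for two samples with unequal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Aldrin summary statistics: bottom (1) and mid-depth (2)
x1, s1, n1 = 6.04, 1.579, 10
x2, s2, n2 = 5.05, 1.104, 10

se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of the difference
df = welch_df(s1, n1, s2, n2)
t_star = t_critical(0.95, df)
margin = t_star * se
lower, upper = (x1 - x2) - margin, (x1 - x2) + margin

print(f"df = {df:.2f}, t* = {t_star:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```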
Use these data to find a 95% confidence interval estimate of the difference
between mean walking times for the special exercises group and the exercise
control group if all babies could have had each treatment.
[Dot plots: Age (mo), from 9 to 15, with rows labeled Exercise and Control.]
Display 9.54 Dot plots of two samples.
Do computations.
Special exercises: $\bar{x}_1 = 10.125$, n1 = 6, s1 = 1.447
Exercise control: $\bar{x}_2 = 11.375$, n2 = 6, s2 = 1.896
The two-sample t-interval function of a calculator gives df ≈ 9.35 and the
confidence interval (−3.44, 0.94). [See Calculator Note 9F to learn how to calculate
a two-sample t-interval. You can start with the actual data or summary statistics.]
Give interpretation in context.
You are 95% confident that if all the babies could have been in the special
exercises group and all the babies could have been in the exercise control group,
the difference in the mean age at which they would learn to walk is in the interval
from −3.44 months to 0.94 month. Because 0 is in the confidence interval, you
have no evidence that there would be any difference if you were able to give each
treatment to all the babies. However, it is important to note that any conclusion
is subject to doubt about the appropriateness of this procedure due to skewness
of the population of potential measurements, so it would be good to confirm this
finding with more data.
Display 9.55 shows printouts for the two-sample t-interval for the walking
babies experiment from three commonly used statistical software packages.
Fathom
Estimate of Walking Babies Difference of Means
First attribute (numeric): Special_Exercise
Second attribute (numeric or categorical): Exercise_Control
Interval estimate for the population mean of Special_Exercise
minus that of Exercise_Control
Special_Exercise Exercise_Control
Count: 6 6
Mean: 10.125 11.375
Std dev: 1.44698 1.89572
Std error: 0.590727 0.773924
Minitab
TWOSAMPLE T FOR Special VS Exercise
N Mean StDev SEMean
Special 6 10.12 1.45 0.59
Exercise 6 11.38 1.90 0.77
95 PCT CI FOR MU Special - MU Exercise: (-3.45, 0.95)
TTEST MU Special = MU Exercise (VS NE): T=-1.28 P=0.23 DF= 9
Display 9.55 Printouts of estimation from three statistical
software packages.
■
\[ df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{1}{n_1 - 1}\left( \frac{s_1^2}{n_1} \right)^2 + \frac{1}{n_2 - 1}\left( \frac{s_2^2}{n_2} \right)^2} \]
a. Verify the value of df given in the aldrin and walking babies examples on
pages 620 and 621.
b. If n1 = n2, derive a simplified version of the formula for df.
c. If n1 = n2 and, in addition, s1 = s2, derive an even simpler rule for df.
Ha: μ1 < μ2 or Ha: μ1 − μ2 < 0
Ha: μ1 > μ2 or Ha: μ1 − μ2 > 0
3. Compute the test statistic, find the P-value, and draw a sketch. Compute
the difference between the sample means (because the hypothesized mean
difference is zero), measured in estimated standard errors:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]
[Sketch: t-distribution centered at 0 with area ½P in each tail, beyond −t and t.]
Use the two-sample t-test function of your calculator or statistics software
to get the P-value.
4. Write your conclusion, linked to your computations and in the context
of the problem. If you are using fixed-level testing, reject the null
hypothesis if your P-value is less than the level of significance, α. If the
P-value is greater than or equal to α, do not reject the null hypothesis.
(If you are not given a value of α, you can assume that α is 0.05.) Write a
conclusion that relates to the situation and includes an interpretation of
your P-value.
where μbottom is the mean aldrin concentration at the bottom of the Wolf River
and μmid-depth is the mean concentration at six-tenths depth.
You are looking for a difference in either direction, so the alternative
hypothesis is two-sided:
Ha: μbottom ≠ μmid-depth or Ha: μbottom − μmid-depth ≠ 0
Compute the test statistic, find the P-value, and draw a sketch.
Here are the summary statistics for the aldrin concentrations:
Bottom: $\bar{x}_1 = 6.04$, n1 = 10, s1 = 1.579
Mid-depth: $\bar{x}_2 = 5.05$, n2 = 10, s2 = 1.104
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(6.04 - 5.05) - 0}{\sqrt{\frac{1.579^2}{10} + \frac{1.104^2}{10}}} \approx 1.625 \]
[Sketch: t-distribution centered at 0 with both tails shaded, beyond −1.625 and 1.625; each tail has area ½P = 0.0618.]
From a calculator, you get an approximate df of 16.10 and a P-value of 0.1236.
[See Calculator Note 9G.]
Give conclusion in context.
Conclude that, because the P-value for a two-sided test is greater than
0.10, you do not reject the null hypothesis. There is insufficient evidence
to claim that the mean aldrin concentration at the bottom of the Wolf River is
Data Desk
2-Sample t-Test of μ1 – μ2
No Selector
Individual Alpha Level 0.10
H0: μ1 – μ2 = 0 Ha: μ1 – μ2 ≠ 0
bottom – middepth
Test H0: μ(bottom) – μ(middepth) = 0 vs Ha: μ(bottom) – μ(middepth) ≠ 0
Difference Between Means = 0.99000000 t-Statistic = 1.625 w/16 df
Fail to reject H0 at Alpha = 0.10
p = 0.1236
Fathom
Bottom Mid_Depth
Count: 10 10
Mean: 6.04 5.05
Std dev: 1.57917 1.10378
Std error: 0.499377 0.349046
Minitab
TWOSAMPLE T FOR Bottom VS Middepth
N Mean StDev SEMean
Bottom 10 6.04 1.58 0.50
Middepth 10 5.05 1.10 0.35
TTEST MU Bottom = MU Middepth (VS NE): T=1.62 P=0.12 DF=16
Display 9.56 Printouts of significance tests from three statistical
software packages.
■
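The t-statistic, approximate df, and two-sided P-value shown in the printouts for the aldrin data can be reproduced from the summary statistics. This is a minimal sketch, not the algorithm any of the packages uses: the function names are assumptions, and the tail area comes from a Simpson's-rule approximation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t, using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    beyond = 0.5 - total * h / 3   # area beyond |t| in one tail
    return beyond if t >= 0 else 1.0 - beyond

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Return (t, df) for H0: mu1 - mu2 = 0 without pooling variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Aldrin summary statistics: bottom versus mid-depth
t, df = two_sample_t(6.04, 1.579, 10, 5.05, 1.104, 10)
p_two_sided = 2 * t_upper_tail(abs(t), df)   # Ha is two-sided

print(f"t = {t:.3f}, df = {df:.2f}, P = {p_two_sided:.4f}")
```

Data Desk and Minitab report df as 16 because they truncate the approximate value 16.10 to a whole number.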
For the aldrin data, it makes sense to use a two-sided alternative. (Researchers
wanted to know whether samples taken at mid-depth would give essentially
the same results as samples taken near the bottom.) For the walking babies
where μspecial ex is the mean age that the babies in the experiment would first walk
if they all could have been given the special exercises and μcontrol ex is the mean age
that the babies would first walk if all babies in the experiment could have received
the exercise control treatment. (You can also use the symbols μ1 and μ2 as long as
you define the symbols.)
For this one-sided test, the alternative hypothesis is
Ha: μspecial ex < μcontrol ex or, equivalently, Ha: μspecial ex − μcontrol ex < 0
Compute the test statistic, find the P-value, and draw a sketch.
Here are the summary statistics:
Special exercises: $\bar{x}_1 = 10.125$, n1 = 6, s1 = 1.447
Exercise control: $\bar{x}_2 = 11.375$, n2 = 6, s2 = 1.896
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(10.125 - 11.375) - 0}{\sqrt{\frac{1.447^2}{6} + \frac{1.896^2}{6}}} \approx -1.284 \]
[Sketch: t-distribution centered at 0 with the area to the left of −1.284 shaded; P = 0.115.]
From a calculator, df is 9.35 and the P-value is 0.115.
Give conclusion linked to computations and in context.
Because the P-value, 0.115, is greater than 0.05, you do not reject the null
hypothesis. The evidence isn’t convincing that babies who are given the special
exercises walk at an earlier age than babies who are given the control exercise. In
other words, if the researchers had been able to give both the special exercises and
the control exercises to all the babies in the experiment, they would have been
reasonably likely to get groups as different as these even if the special exercises
made no difference.
■
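The one-sided walking babies test differs from the aldrin test only in which tail supplies the P-value. The sketch below, with illustrative helper names and a Simpson's-rule tail area (both assumptions, not the calculator's method), takes the area to the left of t because the alternative says the special exercises mean is smaller.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """Area to the right of t, using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    beyond = 0.5 - total * h / 3   # area beyond |t| in one tail
    return beyond if t >= 0 else 1.0 - beyond

# Special exercises (1) versus exercise control (2)
x1, s1, n1 = 10.125, 1.447, 6
x2, s2, n2 = 11.375, 1.896, 6

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
t = (x1 - x2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Ha: mu1 < mu2, so the P-value is the area to the LEFT of t
p_lower = 1.0 - t_upper_tail(t, df)

print(f"t = {t:.3f}, df = {df:.2f}, one-sided P = {p_lower:.3f}")
```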
Why Not Two Separate Confidence Intervals? Significance tests for the
difference of two means take a little getting used to. It is natural to ask why you
can’t simply compute two separate confidence intervals, one for μ1 and one for μ2,
and check to see if they overlap. This method will tell you if there are any values
that are plausible means for both populations. If so, then you wouldn’t reject the
null hypothesis that the difference in the means is 0. The difficulty is that the
method is too conservative, meaning that you won’t reject a false null hypothesis
often enough. In other words, you have sacrificed power.
However, you can use this rule: If you construct two separate confidence
intervals for the means of two populations (at confidence level 1 − α) and they
don’t overlap, you are safe in rejecting the null hypothesis that the means are equal
at significance level α. If the intervals overlap, you can come to no conclusion.
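To see how the overlap rule plays out, the sketch below applies it to the aldrin summary statistics used earlier in this section. It is an illustration only: the critical value 2.262 (95% confidence, df = 9) is a standard table value supplied here as an assumption, and the function name `one_sample_interval` is hypothetical.

```python
import math

T_STAR_DF9 = 2.262   # t* for 95% confidence with df = 10 - 1 (table value)

def one_sample_interval(mean, s, n, t_star):
    """Return (lower, upper) for mean ± t* · s/√n."""
    margin = t_star * s / math.sqrt(n)
    return mean - margin, mean + margin

bottom = one_sample_interval(6.04, 1.579, 10, T_STAR_DF9)
mid_depth = one_sample_interval(5.05, 1.104, 10, T_STAR_DF9)

# Two intervals overlap when each one starts before the other ends
overlap = bottom[0] <= mid_depth[1] and mid_depth[0] <= bottom[1]

print(f"bottom:    ({bottom[0]:.2f}, {bottom[1]:.2f})")
print(f"mid-depth: ({mid_depth[0]:.2f}, {mid_depth[1]:.2f})")
print("overlap:", overlap)
```

Here the intervals overlap, so the rule yields no conclusion, which agrees with the two-sample test's failure to reject H0 for these data.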
\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
\[ \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
Practice
A Confidence Interval for the Difference Between Two Means
P27. As you read in E1 on page 575, some students recruited 30 volunteers to attempt
to walk the length of a football field while blindfolded. Each volunteer began at the
middle of one goal line and was asked to walk to the opposite goal line, a distance
of 100 yards. The dot plot and summary statistics in Display 9.57 show the distance
at which the volunteer crossed a sideline of the field and whether the volunteer was
[Walking Direction dot plot: Distance, from 0 to 100, by Handed (L, R).]
Descriptive Statistics
Variable   Handed   N    Mean    Median   TrMean   StDev   SEMean
Yard Line  L        7    59.57   61.00    59.57    14.77   5.58
           R        23   58.00   56.00    57.33    15.71   3.28
Exercises
E53. Kelly randomly assigned eight golden hamsters to be raised in long days or short
days. She then measured the concentrations of an enzyme in their brains. (Refer to
page 244 for more about Kelly’s hamster experiment.) The resulting measurements of
enzyme concentrations (in milligrams per 100 milliliters) for the eight hamsters (shown
in Display 9.60) were
Short days: 12.500 11.625 18.275 13.225
Long days: 6.625 10.375 9.900 8.800
[Dot plots for Short and Long, axis from 5 to 20.]
Display 9.60 A dot plot of Kelly’s hamster data.
a. Are the conditions met for inference about the difference of two means?
b. Regardless of your answer to part a, construct a 95% confidence interval.
c. You are 95% confident that something is in the interval you constructed in part b.
Describe exactly what that something is.
d. Does Kelly have statistically significant evidence to back her claim that
the observed difference in enzyme concentrations between the two groups
of hamsters is due to the difference in the hours of daylight?
E54. Suppose Kelly’s means and the shapes of the distributions are the same as in E53
but the enzyme concentrations are more variable, as shown in Display 9.61.
Short days: 9.500 8.625 27.275 10.225
Long days: 4.625 12.375 11.900 6.800
[Dot plots for Short and Long, axis from 0 to 30.]
Display 9.61 Dot plots of altered hamster data.
a. With the altered data, are the conditions met for inference about the difference of
two means?
b. Regardless of your answer to part a, construct and interpret a 95% confidence
interval.
c. If the values had been this variable, would Kelly have had statistically
significant evidence to back her claim that the observed difference in enzyme
concentrations between the two groups of hamsters is due to the difference in the
hours of daylight?
E55. The inflammation caused by osteoarthritis of the knee can be very painful and can
inhibit movement. Leech saliva contains anti-inflammatory substances. To study the
therapeutic effect of attaching four to six leeches to the knee for about 70 minutes,
51 volunteers were randomly assigned to receive either the leech treatment or a topical
gel, diclofenac. [Source: Andreas Michalsen et al., “Effectiveness of Leech Therapy in
Osteoarthritis of the Knee: A Randomized, Controlled Trial,” Annals of Internal Medicine
139, no. 9 (November 4, 2003): 724–30.]
a. This summary table gives the results of a pretreatment measure of the amount of
pain reported by the two groups, before beginning therapy. A higher score means
more pain. The researchers hoped that the randomization would result in two
comparable groups with respect to this variable. Construct and interpret a 95%
confidence interval for the difference in means. Is there statistically significant
evidence that the randomization failed to yield groups with comparable means?
Leech: $\bar{x}_1 = 53.0$, s1 = 13.7, n1 = 24
Topical gel: $\bar{x}_2 = 51.5$, s2 = 16.8, n2 = 27
b. Because a high body mass can stress the knee, the researchers hoped that
the randomization would result in two comparable groups with respect to this
variable as well. This summary table gives the body mass index of the two treatment
groups, before beginning therapy. Construct and interpret a 95% confidence
interval for the difference in means. Is there statistically significant evidence that
the randomization failed to yield groups with comparable means?
Leech: $\bar{x}_1 = 27.6$, s1 = 3.7, n1 = 24
Topical gel: $\bar{x}_2 = 27.1$, s2 = 3.7, n2 = 27
that the observed difference in enzyme
[Box plots of Calcium by Sex (M, F); horizontal axis from 1.8 to 2.8.]
Display 9.63 Calcium in the blood of women and men, in millimoles per liter.
[Source: JSE Data Archive, www.amstat.org.]
E59. In warm and humid parts of the world, a constant battle is waged against termites.
Scientists have discovered that certain tree resins are deadly to termites, and thus
the trees producing these resins become a valuable crop. In one experiment typical
of the type used to test the protective power of a resin, two doses of resin (5 mg and
10 mg) were dissolved in a solvent and placed on filter paper. Eight dishes were
prepared with filter paper at dose level 5 mg and eight with filter paper at dose level
10 mg. Twenty-five termites were then placed in each dish to feed on the filter
paper. At the end of 15 days, the number of surviving termites was counted. The results
are shown in Display 9.64. In parts a–c, you’ll determine if there is a statistically
significant difference in the mean number of survivors for the two doses.
Dose   Count
 5      5
 5      9
 5      6
 5     10
10     16
10     13
10      1
10      0
10      0
10      0
10      0
10      3
[Dot plots of the counts for the 5 mg and 10 mg doses, axis from 0 to 16.]
Descriptive Statistics
Variable   Dose   N   Mean    Median   TrMean   StDev   SEMean
Count      5      8   9.500   10.500   9.500    2.673   0.945
           10     8   4.13    0.50     4.13     6.53    2.31
Variable   Dose   Min     Max      Q1      Q3
Count      5      5.000   12.000   6.750   11.750
           10     0.00    16.00    0.00    10.50
Display 9.64 Number of termites, out of 25, surviving after being placed in a dish
with 5 mg or 10 mg of a resin. [Source: lib.stat.cmu.edu.]
a. Are the conditions for a two-sample t-test met here?
E60. The fact that your food usually tastes good is no accident. Food manufacturers
regularly check taste and texture by recruiting taste-test panels to measure
palatability. A standard method is to form a panel with 50 persons (25 men and
25 women) to do the tasting. In one such experiment, simplified here, coarse versus
fine texture was compared. Panel members were assigned randomly to the treatment
groups as they were recruited. There were 16 panels of 50 consumers each. The
variables in Display 9.65 are these:
Total palatability score for the panel of 50: A higher total score means the food was
rated more palatable by the panel.
Texture: 0 (coarse), 1 (fine)
Score   Texture
 35     0
 39     0
 77     0
 16     0
104     1
129     1
 97     1
 84     1
 24     0
 21     0
 39     0
 60     0
 65     1
 94     1
 86     1
 64     1
E61. Is there sufficient evidence from the random samples of heart rates for men and
women under normal conditions in Display 9.66 to say that the mean heart rates differ
for the two groups? Analyze the data in two different ways (confidence interval and
test of significance) before coming to your conclusion, and then state your conclusion
carefully.
Heart Rate (beats per minute)
Men   Women
74    75
80    66
75    57
69    87
58    89
76    65
78    69
78    79
86    85
84    59
71    65
80    80
75    74
Display 9.66 Heart rates. [Source: Journal of Statistics Education Data Archive,
www.amstat.org, April 15, 2002.]
E62. The data on mercury content of fish in the lakes of Maine in E48 on page 613 are
augmented in Display 9.67 by two other variables: whether the lake is formed behind
a dam and the oxygen content of the water.
walked for males and females. Use the summary statistics in the printout in
Display 9.69.
Variable   Gender   N    Mean    Median   TrMean   StDev   SEMean
Yard Lin   F        15   51.40   50.00    50.38    13.40   3.46
           M        15   65.33   68.00    65.23    14.10   3.64
Variable   Gender   Min     Max     Q1      Q3
Yard Lin   F        35.00   81.00   40.00   59.00
           M        37.00   95.00   56.00   74.00
Display 9.69 Summary statistics of yard-line data for males and females.
c. Display 9.70 shows the relationship between the height of the volunteer, in
inches, and the number of yards he or she walked before crossing a sideline. What
lurking variable can help explain your result in part b?
[Walking Direction scatter plot; vertical axis: Height, marked from 70 to 76.]
Summary of Insurer A: Length of Stay
Count 393
Mean 2.32
SD 1.23
[Histogram: frequency against Length of Stay (days), from 1 to 8.]
Display 9.71 Summary and data plot for lengths of hospital stays for Insurer A.
[Source: R. Peck, L. Haugh, and A. Goodman, Statistical Case Studies, 1998,
ASA-SIAM, 45–64.]
An independent random sample of 396 cases from Insurer B gave the results on length of
stay summarized in Display 9.72.
Summary of Insurer B: Length of Stay
Count 396
Mean 2.91
SD 1.58
[Histogram: frequency (0 to 150) against Length of Stay (days), from 1 to 9.]
Display 9.72 Summary and data plot for lengths of hospital stays for Insurer B.
a. Estimate the difference between the mean
$$s_{\bar{x}_R - \bar{x}_L} = \sqrt{\frac{s_R^2}{n_R} + \frac{s_L^2}{n_L}}$$
The lesson of Activity 9.5a is this: Paired observations can greatly reduce
variation over independent samples and produce a much more powerful test
and a more precise confidence interval estimate of the true mean difference. The
reduction in variation is greatest when the underlying measurements vary greatly
from pair to pair but the differences do not.
In Activity 4.4a, you compared sitting and standing pulse rates using three
different designs. In the completely randomized design, you randomly selected
half of the class to sit and the other half to stand. These data can be analyzed using
the techniques for two independent samples, as in Section 9.4. In the matched
pairs design, you matched pairs of students on a preliminary measure of pulse
rate. Then you randomly assigned sitting to one student in each pair and standing
to the other. In the repeated measures design, you had each person sit and stand,
with the order randomly assigned. The data from the last two designs should not
be analyzed using the techniques for two independent samples. In this section,
you will learn how to analyze these data using the differences in pulse rate for
each pair. Display 9.74 (on the next page) shows the data for pulse rates, in beats
per minute, from a class that worked through this activity.
Display 9.75 Scatterplot of sitting versus standing pulse rates, in beats per minute,
in a completely randomized design.
To analyze the data from this completely randomized design, use the methods
of Section 9.4. A 95% confidence interval (df = 24.31) for the difference between
the mean pulse rates for sitting and standing is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (77.71 - 74.86) \pm t^* \sqrt{\frac{17.04^2}{14} + \frac{13.00^2}{14}}$$

or −8.96 < μstand − μsit < 14.67. This interval overlaps 0, so you can’t conclude
that one of these treatments would produce a higher mean than the other if every
subject were given both treatments.
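The arithmetic behind this interval can be checked with a short Python sketch that uses only the summary statistics quoted above. The function name is hypothetical, and the critical value t* ≈ 2.063 is an assumption (a table or software value for about 24.3 degrees of freedom); the text does not state t* for this interval.

```python
import math

def unpooled_interval(mean1, sd1, n1, mean2, sd2, n2, t_star):
    """Confidence interval for a difference of two means (unpooled variances)."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)  # estimated standard error of the difference
    # Approximate degrees of freedom for the unpooled two-sample procedure.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = mean1 - mean2
    return diff - t_star * se, diff + t_star * se, se, df

T_STAR = 2.063  # assumed critical value for 95% confidence, df near 24.3

# First group (mean 77.71) minus second group (mean 74.86), as in the text.
low, high, se, df = unpooled_interval(77.71, 17.04, 14, 74.86, 13.00, 14, T_STAR)
print(f"standard error = {se:.3f}, df = {df:.2f}")
print(f"95% interval: ({low:.2f}, {high:.2f})")
```

The result agrees with the interval in the text up to rounding. The computed degrees of freedom come out slightly below 24.31, presumably because the summary statistics used here are already rounded.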
Display 9.76 Scatterplot of sitting versus standing pulse rates, in beats per minute,
in a matched pairs design.

Display 9.77 Scatterplot of sitting versus standing pulse rates, in beats per minute,
in a repeated measures design.
A 95% confidence interval for the mean difference (df = 27) is

$$\bar{d} \pm t^* \cdot \frac{s_d}{\sqrt{n}} \quad \text{or} \quad 8.36 \pm 2.052 \cdot \frac{5.28}{\sqrt{28}}$$

or (6.31, 10.41). This interval estimate of the mean difference does not include 0,
so the evidence supports the conclusion that the mean standing pulse rate is higher
than the mean sitting pulse rate.
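As a quick check of this interval, here is a minimal Python sketch that plugs the values quoted in the text (mean difference 8.36, standard deviation of the differences 5.28, n = 28, t* = 2.052) into the formula. The function name is an assumption made for illustration.

```python
import math

def paired_interval(mean_diff, sd_diff, n, t_star):
    """Confidence interval for a mean difference from paired observations."""
    se = sd_diff / math.sqrt(n)   # estimated standard error of the mean difference
    margin = t_star * se
    return mean_diff - margin, mean_diff + margin

low, high = paired_interval(8.36, 5.28, 28, 2.052)
print(f"({low:.2f}, {high:.2f})")
```

Note how much narrower this interval is than the one from the completely randomized design: the standard error is based on the spread of the differences, not on the spread of the two original samples.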
Checking Normality
You might be wondering why we did not check conditions by looking at a plot of
the data. We will do that now. For the analyses of differences, it is the differences,
not the two original samples, that must come from an approximately normal
distribution. Display 9.78 provides boxplots of these differences for both the
matched pairs design and the repeated measures design. They show that there is
no obvious reason to be concerned about non-normality in either case.
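[Display 9.78: boxplots of the differences for the repeated measures design (RMD) and
the matched pairs design (MPD).]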
Example: Testing the Mean Difference for the Repeated Measures Design
Use the data in the repeated measures design of Display 9.74 to test the research
hypothesis that standing increases the mean pulse rate.
Solution
Check conditions. Because the order in which each subject receives the two treatments is
randomized, you can treat this as a random assignment of treatments to subjects.
Although Display 9.78 shows that the distribution of differences is not quite
symmetric, there is no reason to rule out the normal distribution as a possible
model for producing these differences.
State your hypotheses. The hypotheses are
H0: μd = 0 versus Ha: μd > 0
Here μd is the theoretical mean difference between standing and sitting pulse rates
for this group of subjects if each subject could have received each treatment in
both orders.
Compute the test statistic, find the P-value, and draw a sketch. The test statistic is

$$t = \frac{\bar{d} - \mu_{d_0}}{s_d/\sqrt{n}} = \frac{8.36 - 0}{5.28/\sqrt{28}} \approx 8.38$$

With 27 degrees of freedom, the P-value for this large t-statistic is essentially 0.
Conclusion in context. The very small P-value indicates that there is sufficient evidence to conclude
that the mean difference in pulse rates is positive. This implies that the mean
pulse rate is higher for persons standing than it is for those same persons sitting.
The result applies only to the people (subjects) in this experiment and cannot be
generalized to other people based on these data alone.
■
3. Compute the test statistic, find the P-value, and draw a sketch. Compute
the difference between the mean difference, $\bar{d}$, from the sample and the
hypothesized difference $\mu_{d_0}$, and then divide by the estimated standard
error:

$$t = \frac{\bar{d} - \mu_{d_0}}{s_d/\sqrt{n}}$$

The P-value is based on n − 1 degrees of freedom, where n is the number
of differences.
4. Write your conclusion linked to your computations and in the context of
the problem. If you are using fixed-level testing, reject the null hypothesis
if your P-value is less than the level of significance, α. If the P-value is
greater than or equal to α, do not reject the null hypothesis. (If you are not
given a value of α, you can assume that α is 0.05.) Write a conclusion that
relates to the situation and includes an interpretation of your P-value.
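Step 3 can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the data are the sitting and standing pulse rates from the matched pairs columns of Display 9.82 (the class experiment in P33), with each difference taken as standing minus sitting.

```python
import math
import statistics

def paired_t(first, second, mu_d0=0.0):
    """Return (mean difference, sd of differences, t statistic, df) for paired data."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)            # sample standard deviation (n - 1 divisor)
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1

# Matched pairs design, Display 9.82: one (sit, stand) value per pair.
sit = [74, 74, 58, 80, 78, 62, 74, 62, 68, 64, 60, 56, 52, 80]
stand = [78, 76, 60, 96, 90, 64, 74, 70, 66, 74, 80, 58, 52, 88]

d_bar, s_d, t, df = paired_t(sit, stand)
print(f"mean difference = {d_bar}, s_d = {s_d:.2f}, t = {t:.2f}, df = {df}")
```

The P-value would then be found from the t-distribution with the reported degrees of freedom, using Table B, a calculator, or software.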
Solution
Check conditions. This particular group of subjects is a random sample taken from a larger group,
so the randomness condition is satisfied. But the distributions of the original
counts and the distribution of their differences don’t appear normal, as you
can see in Display 9.80. Counts of this type are notorious for having skewed
Display 9.80 Boxplots of the original counts, B and A, and of the differences, B − A.
If you take the square root of each of the counts in B and A, you get the more
symmetric boxplots in Display 9.81. The square root transformation typically
does a good job of making the distributions of counts more symmetric.
where μd is the mean of the population of differences of the square roots of the
number of bacteria colonies before and after the antiseptic is applied.
The statistical summary for the differences of the square roots is
Mean of differences of square roots: $\bar{d}$ = 0.96
Standard deviation of differences of square roots: $s_d$ = 0.77
The test statistic is

$$t = \frac{\bar{d} - \mu_{d_0}}{s_d/\sqrt{n}} = \frac{0.96 - 0}{0.77/\sqrt{10}} \approx 3.943$$

With df = 9, the P-value is 0.0017.
Give conclusion in context. The small P-value, 0.0017, gives ample evidence to support
the conclusion that the bacteria counts were lower after the antiseptic was applied;
that is, the
true mean difference is positive. However, because there was no randomization in
the order of treatments, perhaps you can’t attribute the decrease to the antiseptic.
There is always the possibility that the bacteria count might have gone down over
time anyway. (See E78 on page 659 to find out how the experimenters eliminated
this possibility.)
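The test statistic and one-sided P-value in this example can be reproduced with the Python sketch below. It uses only the summary values from the text (mean 0.96, standard deviation 0.77, n = 10). The helper functions are assumptions made for illustration: they approximate the upper-tail area of the t-distribution by numerically integrating its density with Simpson's rule, which stands in for the table or calculator lookup.

```python
import math

def t_density(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3         # half the area lies above 0 by symmetry

d_bar, s_d, n = 0.96, 0.77, 10         # summary of the differences of square roots
t = (d_bar - 0) / (s_d / math.sqrt(n))
p_value = t_upper_tail(t, n - 1)
print(f"t = {t:.3f}, one-sided P-value = {p_value:.4f}")
```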
■
Practice
Two Independent Samples, or Paired Observations?
P33. Display 9.82 shows a set of data on pulse rates, in beats per minute, from another
class experiment employing the same three designs as in Display 9.74 on page 642.
a. Display 9.83 shows the sitting pulse rate plotted against the standing pulse rate for
each of the three designs. For which design is the relationship strongest? Weakest?
Explain why that should be the case.

Completely Randomized   Matched Pairs   Repeated Measures
Design                  Design          Design
Sit   Stand             Sit   Stand     Sit   Stand   Sit   Stand
78     76               74     78       62     72     60     64
64     68               74     76       76     76     58     66
50     82               58     60       76     82     78     88
58     80               80     96       50     50     62     70
50     68               78     90       66     74     78     78
70     64               62     64       58     62     74     74
70     58               74     74       68     76     88     92
64     90               62     70       52     62     66     68
66     72               68     66       68     74     76     84
72     78               64     74       80     86     82     86
80     60               60     80       78     84     54     58
56    100               56     58       60     64     72     78
58     80               52     52       82     82     58     60
68     84               80     88       60     66     70     66

Summary Statistics
        CRD             MPD             RMD
        Sit    Stand    Sit    Stand    Sit    Stand
Mean    64.57  75.71    67.29  73.29    68.29  72.93
SD       9.33  11.68     9.37  12.71    10.24  10.38

Display 9.82 Data tables and boxplots of pulse rate data from another class experiment.
70 Barracuda 3.83
Brown trout 0.57
60
Catfish 1.84
50 Mackerel 0.64
90 Tuna 3.09
RM
Sitting Pulse Rate
Twin Pair Rural Urban Display 9.86 Length and width measurements, in
1 10.1 28.1
inches, of a dozen eggs. [Source: A. P.
Dempster, Elements of Continuous Multivariate
2 51.8 36.2 Analysis (Reading, Mass.: Addison-Wesley, 1969),
3 33.5 40.7 p. 151.]
338 338 0 3 0 4
390 390 0 4 0 4
372 372 0 5 0 3
406 406 0 6 3 3
364 364 0 7 3 2
433 433 0 8 8 6
426 426 0 9 4 5
417 417 0 10 1 4
415 415 0 11 1 7
461 461 0 12 4 2
431 431 0 13 2 17
429 392 37 15 2 0
433 403 30 16 1 0
Chapter Summary
To use the confidence intervals and significance tests of this chapter, you need
either
• a random sample from a population (consisting of either single values or
paired values)
• two independent random samples from two distinct populations or, in the
case of an experiment, two treatments randomly assigned to the available
subjects
When there is no randomness involved, you proceed with the test only after
stating loudly and clearly the limitations of what you are doing. If you reject the
null hypothesis in such a case, all you can conclude is that something happened
that can’t reasonably be attributed to chance.
You use a confidence interval if you want to find a range of plausible
values for
• μ, the mean of your single population
• μ1 − μ2, the difference between the means of your two populations
Both of the confidence intervals you studied have the same form:
Review Exercises
E89. To find an estimate of the number of hours E91. Seventy students were randomly assigned
that highly trained athletes sleep each night, to launch a gummy bear, by themselves,
a researcher selects a random sample of using either one book or four books (see
15 highly trained athletes and asks each page 258). Display 9.104 gives the results
how many hours of sleep he or she gets each of all 70 first launches by these students. Is
night. The results are given in Display 9.103. there statistically significant evidence of a
Stem-and-leaf of Number of Hours difference in the mean distance? Show all
N = 15 four steps of a test of significance, being sure
Leaf Unit = 0.10
3 7
to name the test you are using.
4 29
5 149 Gummy Bears Box Plot
6 15
Number_of_Books
7 115
8 58
1
9 39
Short
Display 9.106 Life expectancies (in years) for a
Long
sample of African countries. [Source:
CIA Factbook, 2006.]
40 50 60 70 80 90 100
Inter-eruption Time (min) E94. Altitude and alcohol. On every commercial
passenger flight, there is an announcement
Display 9.105 Summary statistics and boxplots of that tells what to do if the cabin loses
times until next eruption following pressure. At high altitudes, air has less
short and long eruptions. oxygen, and you can lose consciousness
a. State appropriate null and alternative in a short time. In 1965, the Journal of the
hypotheses in words. Then define American Medical Association published a
notation and restate H0 and Ha in paper reporting the effects of alcohol on the
symbols. length of time subjects stayed conscious at
b. Tell whether the design of the study high altitudes.
justifies the use of a probability model, There were ten subjects. Each was put in
and give reasons for your answer. an environment equivalent to an altitude
c. Based on the summary statistics and of 25,000 ft and then monitored to see
boxplots, tell whether the shapes of the how long the subject could continue to
distributions raise doubts about using a perform a set of assigned tasks. As soon as
t-test. performance deteriorated (the end of “useful
consciousness”), the time was recorded and
d. Carry out the computations, find the
the environment was returned to normal.
P-value, and tell what your conclusion is
if you take the results at face value. Three days later, each subject drank a dose
of whiskey based on body weight—1 cc of
e. Now tell what you think the results of
100-proof alcohol for every 2 lb—and then,
the test really mean for this particular
after waiting 1 hour for the whiskey to take
data set.
effect, was returned to the simulated altitude
E93. The data in Display 9.106 show the life of 25,000 ft for another test. Display 9.107
expectancies (in years) of males and females gives the times (in seconds) until the end of
for a random sample of African countries. useful consciousness under both conditions.
Is there evidence of a significant difference
a. How can you tell from simply the
between the life expectancies of males and
description of the data, even before seeing
females in the countries of Africa? Give
the numbers, that this study gives you
statistical justification for your answer and a
paired data from one sample rather than
careful explanation of your analysis.
two independent samples?
AP1. In measuring the angle formed by two intersecting laser beams in a physics lab,
a student uses chalk dust to illuminate the beams and then uses a protractor to
measure the angle between them. She takes

AP3. Which of the following describes a difference between the sampling distribution of
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and the sampling distribution of
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ in the case where n = 5?
NV
their confidence interval?
Group
They should lower the center.
VV
They should decrease the width.
They should increase the width.
They should decrease the confidence 0 5 10 15 20 25 30 35 40 45 50
Time_in_Seconds
level.
They should replace “adult” with Stereograms
“uninjured adult.” Group
Row
NV VV Summary
Investigative Tasks 8.5604647 5.5514289 7.2102563
43 35 78
AP9. The stereogram in Display 9.109 contains Time_in_Seconds 8.0854116 4.8017389 6.9360082
the embedded image of a diamond. [Source: 1.2330137 0.81164201 0.78534828
W. S. Cleveland, Visualizing Data (1993). Original source: 0 0 0
J. P. Frisby and J. L. Clatworthy, “Learning to See Complex S1 = mean ( )
Random-Dot Stereograms,” Perception 4 (1975): pp. 173–78, S2 = count ( )
lib.stat.cmu.edu.] Look
at a point between the S3 = stdDev ( )
S4 = stdError ( )
two diagrams and unfocus your eyes until S5 = count (missing ( ) )
the two images merge into one and the
diamond pops out. Display 9.110 Boxplot and summary statistics of
time, in seconds, to see the diamond.
a. Check the conditions for doing a two-
sample t-test of the difference in means.
b. The researchers did a two-sample
t-test on the original data, with pooled
variances (α = 0.05). Do you agree with
that decision? Why or why not?
Display 9.109 Stereogram with embedded diamond. c. Replicate the test that the researchers did,
In an experiment, one group (NV) of and find the P-value. What conclusion
subjects was either told nothing or told that do you come to if you take the results at
a diamond was embedded. A second group face value?
(VV) of subjects was shown a drawing of d. What would the test decision have been
the diamond. The times (in seconds) that if the variances weren’t pooled?
it took the subjects to see the diamond are e. When the standard deviations aren’t
summarized in Display 9.110. comparable, what is the effect of using
the pooled procedure rather than the
unpooled procedure?
10 Chi-Square Tests
1000
Is it necessary to
have a child at
800
some point in your
life in order to feel
Frequency
A Test Statistic
To begin a chi-square goodness-of-fit test, make a table with the first column
listing the possible outcomes. In the second column, give the frequency (or count)
that each outcome was observed (O). In the third column, give the frequencies
you would expect (E) if the hypothesized proportions are correct.
Display 10.1 Observed and expected frequencies for 60 rolls of a fair die.
■
$$\sum (O - E)$$
where O represents an observed frequency and E represents the
corresponding expected frequency.
a. Compute the value of this test statistic for the fair die example.
b. Will your result in part a always occur? Prove your answer.
c. Is this a good test statistic?
d. You have seen two other situations in which the sum of the differences
always turned out to be 0. What were those situations? What did you do
in those situations?
D3. Another test statistic that might be constructed is
$$\sum (O - E)^2$$
a. Compute the value of this test statistic for the fair die example.
b. Display 10.3 shows the results from the rolls of two different dice. For
which die does the table give stronger evidence that the die is unfair?
c. Compute and compare the values of ∑(O − E)² for Die A and Die B.
Does this appear to be a reasonable test statistic? Explain.
           Die A                    Die B
           Observed    Expected     Observed    Expected
Outcome    Frequency   Frequency    Frequency   Frequency
1              5          10           995        1000
2             16          10          1006        1000
3             18          10          1008        1000
4              4          10           994        1000
5              5          10           995        1000
6             12          10          1002        1000
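To see how the candidate test statistics behave on these two dice, the following Python sketch computes ∑(O − E), ∑(O − E)², and, for comparison, ∑(O − E)²/E, the chi-square statistic used in the rest of this chapter. The function and key names are hypothetical; the frequencies are those in the table above.

```python
def fit_statistics(observed, expected):
    """Three candidate measures of how far observed counts are from expected counts."""
    diffs = [o - e for o, e in zip(observed, expected)]
    return {
        "sum_diff": sum(diffs),                                    # sum of (O - E)
        "sum_sq": sum(d * d for d in diffs),                       # sum of (O - E)^2
        "chi_sq": sum(d * d / e for d, e in zip(diffs, expected)), # sum of (O - E)^2 / E
    }

die_a = fit_statistics([5, 16, 18, 4, 5, 12], [10] * 6)
die_b = fit_statistics([995, 1006, 1008, 994, 995, 1002], [1000] * 6)

for name, stats in (("Die A", die_a), ("Die B", die_b)):
    print(name, stats)
```

The first two measures cannot tell the dice apart, because the differences O − E are identical for both. Dividing each squared difference by its expected frequency takes into account that a difference of 5 is large relative to 10 but small relative to 1000.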
The distributions in Display 10.4 each show 2000 values of χ² computed from
rolls of a fair die.

Display 10.4 Two histograms of 2000 values of χ², one with n = 1000 (left) and one
with n = 60 (right).
[Histograms of χ² values for several dice, including a dodecahedral (twelve-sided) die
and an icosahedral (twenty-sided) die; horizontal axis χ², vertical axis frequency.]
Display 10.7 Chi-square distribution with 5 degrees of freedom and critical value 11.07.
Looking at the row of values for 5 degrees of freedom in Display 10.6, you
see that the smallest value is 9.24, which cuts off an upper tail area of 0.10. The
observed value, 2.6, is well to the left of 9.24 under the χ² distribution (see
Display 10.7). Therefore, all you can say about the P-value is that it is larger than
0.10. (Table C on page 827 tells you that it is also larger than 0.25.) Statistical
software or a calculator can give you an exact P-value. [To learn how to find
the P-value with your calculator, see Calculator Note 10D.] Here, the P-value is
approximately 0.76.
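If you want to see where such a P-value comes from without a statistics package, here is a minimal Python sketch. The function name is an assumption; it evaluates the upper-tail area of the chi-square distribution for whole-number degrees of freedom using standard closed-form sums, which is a substitute for the calculator procedure the text refers to, not a description of it.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square > x) for a positive whole number of degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^r / r!
        term = math.exp(-half)
        total = term
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return total
    # Odd df: a normal tail area plus a finite sum of correction terms.
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * root
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

p_observed = chi2_upper_tail(2.6, 5)      # observed value from the fair die example
p_critical = chi2_upper_tail(11.07, 5)    # critical value shown in Display 10.7
print(f"P(chi-square > 2.6)   = {p_observed:.2f}")
print(f"P(chi-square > 11.07) = {p_critical:.3f}")
```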
Solution
If the distribution of colors for peanut butter M&M’s is the same as that for milk
chocolate M&M’s, you would expect the numbers of each color to be as shown
in Display 10.9. Note that in a chi-square test it is not necessary that all of the
expected frequencies be the same.
          Percentage in              Expected
Color     Milk Chocolate M&M’s       Number of M&M’s
Red        13%                       0.13(200) = 26
Yellow     14%                       0.14(200) = 28
Green      16%                       0.16(200) = 32
Orange     20%                       0.20(200) = 40
Brown      13%                       0.13(200) = 26
Blue       24%                       0.24(200) = 48
Total     100%                       200
Check conditions. The conditions for a chi-square goodness-of-fit test are met in this situation.
You have a random sample of 200 peanut butter M&M’s. Each M&M was only
one color. You can compute the expected number of each color because you know
the distribution of colors in milk chocolate M&M’s. All of the expected counts are
at least 5.
State the hypotheses. The null hypothesis is
H0: The distribution of colors in peanut butter M&M’s is the same as the
distribution of colors in milk chocolate M&M’s; that is, there are 13% red,
14% yellow, 16% green, 20% orange, 13% brown, and 24% blue.
The alternative hypothesis is
Ha: The distribution of colors in peanut butter M&M’s is not the same as
the distribution of colors in milk chocolate M&M’s; that is, at least one
proportion is different.
Compute the test statistic, approximate the P-value, and draw a sketch. The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(25 - 26)^2}{26} + \frac{(37 - 28)^2}{28} + \frac{(45 - 32)^2}{32} + \frac{(34 - 40)^2}{40} + \frac{(19 - 26)^2}{26} + \frac{(40 - 48)^2}{48} \approx 12.33$$

The value of χ² from the sample, 12.33, is quite far out in the tail of the chi-square
distribution with 6 − 1, or 5, degrees of freedom. In fact, it is in the upper 0.031
of the tail.
Write the conclusion in the context of the situation. Reject the null hypothesis. You
cannot attribute the difference in the expected and observed frequencies to variation
in sampling alone. A value of χ² this large is very unlikely to occur in random samples
of this size if peanut butter M&M’s have the same distribution of colors as milk
chocolate M&M’s. Conclude that the distribution of colors in peanut butter M&M’s is
different from that in milk chocolate M&M’s. Display 10.10 shows a Fathom printout for
this test.
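The computation in this example can be retraced with a short Python sketch. The observed counts are the ones that appear in the test statistic above, and the percentages are those given for milk chocolate M&M’s; the variable names are hypothetical.

```python
# Hypothesized proportions (milk chocolate M&M's) and observed counts
# (sample of 200 peanut butter M&M's), in the same color order.
proportions = {"Red": 0.13, "Yellow": 0.14, "Green": 0.16,
               "Orange": 0.20, "Brown": 0.13, "Blue": 0.24}
observed = {"Red": 25, "Yellow": 37, "Green": 45,
            "Orange": 34, "Brown": 19, "Blue": 40}

n = sum(observed.values())
expected = {color: p * n for color, p in proportions.items()}

# Condition check: every expected count should be at least 5.
all_at_least_5 = all(e >= 5 for e in expected.values())

chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1

print(f"n = {n}, expected counts all at least 5: {all_at_least_5}")
print(f"chi-square = {chi_sq:.2f} with df = {df}")
```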
Display 10.11 Chi-square distributions for df = 5 and df = 9.
The distribution of χ² computed from repeated sampling is discrete, however,
because only a limited number of distinct values of χ² can be calculated
for a given number of categories and a given sample size. Like the normal
approximation to the binomial distribution, the χ² distribution is a continuous
distribution that can be used to approximate a discrete distribution. The larger
the expected frequencies, the closer the distribution of possible values of χ² is
to a continuous distribution. In order to have a reasonable approximation, the
expected frequency in each category should be 5 or greater. (This is a conservative
rule, but it works well in most cases.)
Display 10.12 Table for computing χ² when testing the fairness of a coin.
The test statistic for testing the null hypothesis that spinning the coin is fair
becomes

$$\chi^2 = \frac{\left(x_1 - \frac{n}{2}\right)^2}{\frac{n}{2}} + \frac{\left(x_2 - \frac{n}{2}\right)^2}{\frac{n}{2}}$$

The sum of the number of heads and tails must be n, so x2 = n − x1 and you can
write the test statistic as

$$\chi^2 = \frac{\left(x_1 - \frac{n}{2}\right)^2}{\frac{n}{2}} + \frac{\left[(n - x_1) - \frac{n}{2}\right]^2}{\frac{n}{2}} = 2 \cdot \frac{\left(x_1 - \frac{n}{2}\right)^2}{\frac{n}{2}} = \frac{\left(x_1 - \frac{n}{2}\right)^2}{n\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)}$$

But you already know another way of testing the hypothesis that spinning a coin is
fair. For large n, the test statistic for this hypothesis could be the familiar z-statistic
from Chapter 8, given by

$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} = \frac{\frac{x_1}{n} - \frac{1}{2}}{\sqrt{\frac{\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)}{n}}} = \frac{x_1 - \frac{n}{2}}{\sqrt{n\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)}}$$

You can now see that χ² = z². The square of a z-statistic has a χ² distribution with
1 degree of freedom. This equality holds in general, even if p isn’t 1/2.
In summary, if there are only two types of outcomes (success and failure),
you are back in the binomial situation and the z-test from Chapter 8 is equivalent
to the chi-square test of this chapter. This implies, among other things, that
the assumptions for the chi-square test are the same as those for the z-test: The
sample must be random and be large enough so that the sample proportion has an
approximately normal sampling distribution.
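The equality between the chi-square statistic and the square of the z-statistic can also be checked numerically. In the Python sketch below, the function names and the counts (60 heads in 100 spins, and 40 successes in 100 trials with p = 0.3) are hypothetical values chosen only for illustration.

```python
import math

def chi_square_two_outcomes(x1, n, p=0.5):
    """Chi-square statistic when there are only two outcomes (success, failure)."""
    x2 = n - x1
    e1, e2 = n * p, n * (1 - p)           # expected counts under the null hypothesis
    return (x1 - e1) ** 2 / e1 + (x2 - e2) ** 2 / e2

def z_statistic(x1, n, p=0.5):
    """One-proportion z-statistic."""
    p_hat = x1 / n
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

# Hypothetical counts: (number of successes, number of trials, hypothesized p).
for x1, n, p in [(60, 100, 0.5), (40, 100, 0.3)]:
    chi_sq = chi_square_two_outcomes(x1, n, p)
    z = z_statistic(x1, n, p)
    print(f"p = {p}: chi-square = {chi_sq:.4f}, z squared = {z * z:.4f}")
```

The second case illustrates the remark that the equality does not depend on p being 1/2.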
Practice
A Test Statistic
P1. Suppose you want to test whether a tetrahedral (four-sided) die is fair. You roll
it 50 times and observe a one 14 times, a two 17 times, a three 9 times, and a four
10 times.
a. How many do you “expect” in each category?
b. Compute the value of χ².
P2. If each observed frequency equals the expected frequency, what is the value of χ²?
The Distribution of Chi-Square
P3. Refer to Display 10.5 on page 678. Suppose you roll a 12-sided die 300 times to see
if it is fair. You compute χ² and get 21.3. Use the appropriate histogram to
approximate a P-value for this test. What is your conclusion if you are using α = 0.05?
P4. Refer to Display 10.5 on page 678. Suppose you roll an eight-sided die 100 times to
see if it is fair, and get χ² = 21.3. Use the appropriate histogram to approximate a
P-value for this test. What is your conclusion if α = 0.05?
Using the Chi-Square Table and Your Calculator
P5. Give df for each situation, and then use Table C on page 827 to determine if the
result is statistically significant (α = 0.05).
a. You roll a tetrahedral die 100 times and calculate a χ² value of 8.24.
b. You roll a 20-sided die 500 times and calculate a χ² value of 8.24.
P6. Learn to use your calculator to find P-values, and then find the P-value for the
case of
a. rolling an 8-sided die, and χ² = 2.6
b. rolling a 12-sided die, and χ² = 21.3

b. What else would you like to know about the time frames in which these data were
collected?
P9. Display 10.14, repeated from Chapter 7, gives the distribution of the number of
children per family in the United States.

Number of Children   Proportion
in Family            of Families
0                    0.524
1                    0.201
2                    0.179
3                    0.070
4 (or more)          0.026

Blood Type    O     A     B    AB    Total
Frequency    465   394    96    45   1000
Exercises
E1. In 1882, R. Wolf rolled a die 20,000 times. The results are recorded in
Display 10.16. Is this evidence that the die was unfair, or is it approximately what
you would expect from a fair die?

Outcome   Frequency
1         3407
2         3631
3         3176
4         2916
5         3448
6         3422

Display 10.16 Results from 20,000 rolls of a die. [Source: D. J. Hand et al., A Handbook
of Small Data Sets (London: Chapman & Hall, 1994), p. 29.]

E2. Gelman and Nolan had students roll a die that had the corners on the 1 side slightly
rounded. The results of 120 rolls are shown in Display 10.17.

Outcome   Frequency
1         41
2         18
3         28
4         10
5         22
6          1

Display 10.17 Frequencies of outcomes of 120 rolls of a shaved die. [Source:
stat-www.berkeley.edu.]

Clearly, there are more 1’s than you would expect and fewer 6’s, the outcome on the
side opposite the 1. Is there evidence that the shaved die was unfair with regard to the
other four outcomes?

E3. A study attempted to find a relationship between people’s birthdays and dates of
admission for treatment of alcoholism. In a sample of 200 admissions, 11 were within
7 days of the person’s birthday; 24 were between 8 and 30 days, inclusive; 69 were
results match too closely. In one experiment, Display 10.20 Poll predictions and actual results
Mendel predicted that he would get a 9:3:3:1 (percentage of votes) of the 1948 U.S.
ratio between smooth yellow peas, wrinkled presidential election. [Source: F. Mosteller,
yellow peas, smooth green peas, and The Pre-election Polls of 1948 (New York: Social
Sciences Research Council, 1949).]
wrinkled green peas. His experiment resulted
10.1 Testing a Probability Model: The Chi-Square Goodness-of-Fit Test 689
Suppose each poll sampled about 3000 Sundays. [Source: D. A. Redelmeier and C. L. Stewart, “Do
voters. Do any of the three poll results Fatal Crashes Increase Following a Super Bowl Telecast?”
Chance 18, no. 1 (2005): 19–24.]
demonstrate a reasonably good fit to the
actual results? Which of the three fits best? a. There were 662 fatalities in the 4 hours
E8. The 1948 presidential election polls of following Super Bowl telecasts and 936
Crossley, Gallup, and Roper (see E7) were in the corresponding hours on control
based on a method called quota sampling, Sundays. Are these data consistent with
in which quotas such as so many males, so the model that the telecast has no effect?
many females, so many retired, so many What can you conclude?
working, so many unemployed, and so on b. Within the home state of the winning
were to be filled by field-workers for the Super Bowl team, the number involved in
polling company. The state of Washington, traffic accidents (fatalities plus survivors)
however, tried out a fairly new method called totaled 141 for Super Bowl Sundays
“probability sampling,” which made use and 265 for control Sundays. Do these
of the random sampling ideas used today. data fit the model given? What can you
Display 10.21 shows how that poll, based conclude?
on about 1000 voters, came out, in terms of E10. It is sometimes said that older people are
percentages. overrepresented on juries. Display 10.22
Probability gives information about people on grand
Sample Actual Vote juries in Alameda County, California. Does it
Dewey 46.0 42.7
appear that these grand jurors were selected
at random from the adult population of
Truman 50.5 52.6
Alameda County?
Wallace 2.9 3.5
Countywide Number of
Other 0.6 1.2 Age Percentage Grand Jurors
Display 10.21 Poll predictions and actual results 21–40 42 5
(percentage of votes) of the 1948 41–50 23 9
U.S. presidential election for the state 51–60 16 19
of Washington. [Source: F. Mosteller, The 61 or older 19 33
Pre-election Polls of 1948 (New York: Social
Sciences Research Council, 1949).] Total 100 66
Can you reject the hypothesis that this Display 10.22 Distribution of ages for jurors in
sample result is a good fit to the “truth” ? Alameda County, California. [Source:
David Freedman et al., Statistics, 2d ed. (New
E9. Does the number of people involved in York: Norton, 1991), p. 484. Original source:
traffic accidents increase after a Super UCLA Law Review.]
Bowl telecast? To answer this question, E11. Write a research hypothesis about data
investigators looked at the number of that can be analyzed using a chi-square
fatalities on public roadways in the United goodness-of-fit test. For example, “People
States for the first 27 Super Bowl Sundays. are less likely to be born on some days of
They compared the number of fatalities the week than on others.” Design and carry
during the 4 hours after the telecast with out a survey about your hypothesis using a
the number during the same time period on random sample of a specified population.
the Sundays the week before and the week
E12. A sign on a barrel of nuts in a supermarket
after the Super Bowl (54 control Sundays).
says that it contains 30% cashews, 30%
If the telecast had no effect, there should be
hazelnuts, and 40% peanuts by weight. You
roughly twice the number of fatalities on the
mix up the nuts and scoop out 20 lb. When
54 control Sundays as on the 27 Super Bowl
b. Prove algebraically that the sum of the Display 10.25 [Source: mathworld.wolfram.com.]
relative frequencies is 1.
Display 10.27 A stacked bar chart for Justine’s paper towels data.
Total 32 18 25 75
Wipe-It breaks 7
30 25 10
______ 3 0.9
75
Wipe-Out breaks 5
30 25 10
______ 5 2.5
75
Before you can find the P-value for this test, you need to determine the
number of degrees of freedom, df.
df = (r − 1)(c − 1)
where r is the number of rows in the table of observed values and c is the
number of columns (not counting the headings or totals in either case).
df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2
Using the calculator to determine the P-value for this test, where χ² = 16.34,
you get approximately 0.00028. Alternatively, if you look at χ² values in Table C
on page 827, you will find that this value of χ² is significant at the 0.001 level, as
shown in Display 10.31. The P-value, 0.00028, means that if the null hypothesis
that the three brands are equally likely to break is true, then there is almost no
chance of getting observed frequencies as far from the expected frequencies as
Justine did. Consequently, you would reject the hypothesis that the probability of
breaking is the same for all three brands of paper towels.
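The degrees of freedom and the P-value in this example are easy to verify in Python. The sketch below relies on a fact that holds only for 2 degrees of freedom: the area to the right of x under the chi-square distribution is exp(−x/2). The function names are hypothetical, and 13.82 is the 0.001 critical value shown in Display 10.31.

```python
import math

def degrees_of_freedom(rows, cols):
    """df for a chi-square test on a two-way table with the given dimensions."""
    return (rows - 1) * (cols - 1)

def chi2_upper_tail_df2(x):
    """Upper-tail area of the chi-square distribution with exactly 2 df."""
    return math.exp(-x / 2)

df = degrees_of_freedom(2, 3)             # 2 rows, 3 brands of paper towels
p_value = chi2_upper_tail_df2(16.34)      # observed chi-square value
tail_at_critical = chi2_upper_tail_df2(13.82)

print(f"df = {df}")
print(f"P-value = {p_value:.5f}")
print(f"area beyond 13.82 = {tail_at_critical:.4f}")
```

For tables with other dimensions, the upper-tail area has no such one-line formula, and you would use Table C, a calculator, or software.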
Display 10.31 χ² distribution with df = 2, α = 0.001.
■
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Display 10.32 Two-way table of observed frequencies and segmented bar chart (responses
Necessary, Unnecessary, and Undecided, by country) for the Gallup poll results.
[Source: www.gallup.com, May 2002.]
From the plot, you can see that the percentages do not appear to be the same
for each country. The difference between India and the United States seems too
great. Test the hypothesis that the proportion of adults who would give each
answer is the same for each country.
698 Chapter 10 Chi-Square Tests
Solution
Under this hypothesis, the expected frequencies are given in Display 10.33.
                                   Country
Response       U.S.   India   Mexico   Canada   Germany   Total
Necessary       616     616      616      616       616    3080
Unnecessary     354     354      354      354       354    1770
Undecided        30      30       30       30        30     150
Total          1000    1000     1000     1000      1000    5000
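The expected frequencies in this table can be reproduced from the row and column totals alone. The short sketch below applies the rule (row total)(column total)/(grand total) to the totals shown above; the function name and variable names are illustrative.

```python
def expected_frequencies(row_totals, column_totals):
    """Expected cell counts when every column has the same distribution.

    Each cell is (row total * column total) / grand total.
    """
    grand_total = sum(row_totals)
    return [
        [row_total * column_total / grand_total for column_total in column_totals]
        for row_total in row_totals
    ]

responses = ["Necessary", "Unnecessary", "Undecided"]
row_totals = [3080, 1770, 150]     # totals for each response
column_totals = [1000] * 5         # 1000 adults sampled in each of 5 countries

expected = expected_frequencies(row_totals, column_totals)

for response, row in zip(responses, expected):
    print(response, row)
```

Because every country has the same sample size, each row of expected frequencies is constant across the five countries.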
Solution
Check conditions. Treatments were randomly assigned to subjects in large enough numbers so
that all expected cell frequencies are at least 5 (see Display 10.36). The conditions
are met.
State the hypotheses. The null hypothesis is
H0: If both treatments could be assigned to all subjects, the resulting
distributions of outcomes would be the same.
The alternative hypothesis is
Ha: If both treatments could have been assigned to all subjects, the resulting
distributions of outcomes would differ. (The chi-square test itself does
not tell how they differ.)
Compute the test statistic, approximate the P-value, and draw a sketch. Display 10.36 shows the expected frequencies.

                                               Treatment
Outcome                                   Nitric Oxide   Placebo
Death                                         19.78        19.22
Survival with Chronic Lung Disease            39.06        37.94
Survival without Chronic Lung Disease         46.16        44.84
Total                                        105          102

[Sketch: chi-square distribution with $df = 2$; the area to the right of 5.026 is $P = 0.08$.]
Write a conclusion linked to the computations and in the context of the problem. This is a borderline case. If you strictly interpret this as a fixed 5% level test, the
decision is not to reject the null hypothesis and to say you don’t have statistically
significant evidence that treating with nitric oxide is an improvement over
the placebo. However, the P-value is fairly small and gives some evidence that
something other than random behavior is going on here. As you will see in P20,
this point of view is strengthened if you collapse the death and survival with
chronic lung disease categories into one category; both are considered undesirable
outcomes by the research physicians. The original article reporting this research
ends with a positive recommendation for the nitric oxide treatment under certain
conditions. See P21 for more information on this study.
■
$\frac{s}{\sqrt{n}}$ where $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Rest of class: Figures! Jodain remembers everything!
Ms. C: Think about the deviations, $(x - \bar{x})$. There are n of them. Because
the sum of the deviations from the mean is 0, if you know all the
deviations but one, you can figure out the last one. So you need
only $n - 1$ deviations to get all the information about the size of a
typical deviation from the mean.
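Ms. C’s point can be checked numerically. The sketch below uses a small made-up data set (the five values are hypothetical, not from the text) to show that the last deviation from the mean is fixed once the other $n - 1$ are known.

```python
# Hypothetical sample of n = 5 values, chosen only for illustration.
data = [3, 7, 8, 12, 15]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# Pretend the last deviation is hidden and recover it from the others:
# the deviations must sum to 0, so the last one is minus the sum of the rest.
known = deviations[:-1]
recovered_last = -sum(known)

print("deviations:     ", deviations)
print("recovered last: ", recovered_last)
print("sum of all:     ", sum(deviations))
```

Only $n - 1$ of the deviations are free to vary, which is why the sum of squared deviations is divided by $n - 1$.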
10.2 The Chi-Square Test of Homogeneity 703
Rest of class: It’s kind of like if the probability it’ll rain tomorrow is 0.4, then you
don’t learn anything new if they also tell you that the probability
it won’t rain is 0.6. You could have already figured that out. So you
really have only one piece of information.
Ms. C: Well, that’s sort of close.
Jodain: Then the reason we didn’t have to bother with df in a z-test for a
proportion in Section 8.2 is that we knew what the standard error
should be because it depended only on p0.
Ms. C: Right. You weren’t estimating the error term from the data using a
sum of squared deviations. So you didn’t need to worry about df.
Rest of class: We like z-tests!
Ms. C: What does this have to do with df equaling the number of
categories minus 1 for a chi-square goodness-of-fit test?
Jodain: Hmmm. The $\chi^2$ statistic itself looks like one big error term. If you
are testing whether a six-sided die is fair, there are six categories
and six deviations, O minus E. I suppose if you know all but one of
them, you can figure out that one.
Rest of class: Huh?
Ms. C: Jodain’s right. The last deviation is determined by the others
because the sum of the deviations from the “center” is always
equal to 0:
$\sum (O - E) = \sum O - \sum E = n - n = 0$
Because the last deviation doesn’t give you any new information
about the size of the deviations from the center,
$df = \text{number of categories} - 1$
Jodain: But what about the formula in this section, $df = (r - 1)(c - 1)$?
Rest of class: (groan) Jodain, why do you do this to us? (mumble, mumble)
Ms. C: Well, how many deviations, O minus E, are you using in the
formula?
Rest of class: We know! The number of rows times the number of columns!
Ms. C: Great answer! Now all you have to do is figure out how many of
these give you new information about the size of O minus E and
how many are redundant.
Jodain: Well, with Justine’s paper towels example, there were 2 times 3,
or 6, values of O minus E. They have to sum to 0 in each row and
each column. So, for example, if you know the deviations I’ve put
in this table (Display 10.38), you can figure out the rest, meaning
that they don’t tell you anything new.
Display 10.38 Table of deviations with one row and one column
missing.
Rest of class: We can be missing an entire row and an entire column of
deviations!
Ms. C: That means you have only $(r - 1)$ times $(c - 1)$ independent
deviations.
Jodain: Here’s my rule: The concept of df applies when I need to use a sum
of squared deviations from a parameter or parameters in my test
statistic and I must estimate the parameter or parameters from the
data. I count the number of deviations that are free to vary. This is
the number of degrees of freedom.
Ms. C: That covers it for everything you’ll see in this class.
Rest of class: We have a different rule—just give us the formula!
The number of degrees of freedom for a two-way table with r rows and
c columns is
$df = (r - 1)(c - 1)$
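Jodain’s argument about the two-way table can be sketched in a few lines. The code below takes an $(r - 1)$ by $(c - 1)$ block of known deviations $O - E$ and fills in the missing row and column, using only the fact that the deviations sum to 0 in every row and every column. The values -3 and -5 are the deviations for the Wipe-It and Wipe-Out "breaks" cells implied by the paper towels computations ($O = 7$ and $O = 5$, each with $E = 10$); which cells Display 10.38 actually shows is an assumption here, and the function name is illustrative.

```python
def complete_deviations(known_block):
    """Fill in the last column and last row of a table of O - E values.

    known_block holds the (r - 1) x (c - 1) deviations that are free to vary.
    Every row and every column of the full table must sum to 0.
    """
    table = [row[:] for row in known_block]

    # Last entry in each known row makes that row sum to 0.
    for row in table:
        row.append(-sum(row))

    # Last row makes each column sum to 0.
    n_cols = len(table[0])
    last_row = [-sum(row[j] for row in table) for j in range(n_cols)]
    table.append(last_row)
    return table

# Columns ordered as Wipe-Its, Wipe-Outs, Wipe-Ups (an arbitrary choice);
# rows are "breaks" and "doesn't break".
known = [[-3, -5]]          # (2 - 1) x (3 - 1) = 2 free deviations
full = complete_deviations(known)

for row in full:
    print(row)
```

Two known deviations determine all six, which matches $df = (2 - 1)(3 - 1) = 2$.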
Bath 6 21 27 E 23 75 12 110
Exercises
Each time you are asked to perform a chi-square test, include all four of the steps given on pages 697–698.
E17. A recent Gallup poll asked the same question the Gallup Organization has asked every
year for many years: “What do you think is the most important problem facing
this country today?” Display 10.42 shows the percentage responses for three major
concerns from 2003 to 2006.

                 Most Important Problem
Year     War    Economy    Health Care    Other
2003     19%      20%          9%          52%
2004     18%      18%          8%          56%
2005     22%       8%          5%          65%
2006     23%      10%          6%          61%

Display 10.42 Results from a Gallup poll: responses for three major concerns, 2003–6.
[Source: www.gallup.com.]
a. Assume that 1000 people were surveyed each year, and convert the table to one
displaying frequencies. Construct and interpret a segmented bar chart of these
frequencies.
b. What are the populations in this case?
c. Perform a chi-square test of homogeneity. Use $\alpha = 0.05$.
E18. “Overall, how satisfied are you with the quality of education students receive in
kindergarten through grade twelve in the U.S. today: would you say you are
completely satisfied, somewhat satisfied, somewhat dissatisfied or completely
dissatisfied?” This question was posed to a random sample of about 1000 adults in 2004
and another sample of the same size in 2005. The results are shown in Display 10.43. Was
there a significant change in the distribution of results from 2004 to 2005? If so, what was
the direction of the shift?

        Completely   Somewhat    Somewhat       Completely     No
        Satisfied    Satisfied   Dissatisfied   Dissatisfied   Opinion
2005        90          370          350            160          30

Display 10.43 Results of a poll about satisfaction with the quality of education in 2004 and
2005. [Source: poll.gallup.com, 2005.]
E19. Joseph Lister (1827–1912), a surgeon at the Glasgow Royal Infirmary, was one of
the first to believe in Pasteur’s germ theory of infection. He experimented with using
carbolic acid to disinfect operating rooms during amputations. Of 40 patients operated
on using carbolic acid, 34 lived. Of 35 patients operated on not using carbolic acid,
19 lived. [Source: Richard Larson and Donna Stroup, Statistics in the Real World: A Book of
Examples (New York: Macmillan, 1976), pp. 205–7. Original reference: Charles Winslow,
The Conquest of Epidemic Diseases (Princeton, N.J.: Princeton University Press, 1943), p. 303.]
a. Display these data in a two-way table and in a segmented bar chart.
happen? (1) They have already begun to happen, (2) they will start happening within
a few years, (3) they will start happening within your lifetime, (4) they will not happen
within your lifetime, but they will affect future generations, or (5) they will never
happen.” The responses are shown in Display 10.45. Organize the data, display them in a
segmented bar chart, write an appropriate null hypothesis, and perform a chi-square
test of homogeneity.

                            Year
Response              1997     2006
Already begun          48%      58%
Within a few years      3%       5%
In my lifetime         14%      10%
Future generations     19%      15%
Never                   9%       8%
No opinion              7%       4%
Total                 100%     100%

Display 10.45 Results from a Gallup poll on global warming. [Source: www.pollingreport.com,
September 2006.]
E26. How does the United States compare to Canada and Great Britain with regard to
crime against individuals? Recent polls regarding victimization of those responding
to the survey were conducted in the three countries; the results are shown in Display
10.47. Do the data show significantly differing crime victimization rates for the
three countries surveyed? If so, where do the largest differences occur?

                                             Great Britain   Canada   United States
Individuals Victimized by Nonviolent Crime        25%          21%         21%
Individuals Victimized by Violent Crime            4%           2%          3%
Individuals Not Victimized                        71%          77%         76%
Total Individuals                                1010         1003        1012

Display 10.47 Results of polls regarding crime against individuals. [Source: poll.gallup.com, 2005.]
Men     48%   37%   26%
Women   34%   17%   13%
Display 10.49 Percentages of people who drink regularly, out of 500 men and
500 women. [Source: poll.gallup.com, 2006.]
Describe the test you would use to answer each question. Show the test statistic (with
all numbers substituted in) that you would compute.
a. Is the proportion of men who drink regularly significantly greater than
the proportion of women who drink regularly in the United States?
Display 10.50 Results of polls regarding Internet users’ habits, for samples of
750 people in 2003 and 750 different people in 2005. [Source: poll.gallup.com, 2006.]
a. Can this data table be used to conduct a chi-square test of homogeneity on these
proportions across the two years? If so, find and interpret the P-value for this test.
If not, explain why not.
b. Could you test for equality of proportions of those sending and reading e-mail
across the two years? If so, find and interpret the P-value for this test. If not,
explain why not.

                 Every   A Few Times   About Once   Less Than     Only on Special            Total Sample
                 Day     a Week        a Week       Once a Week   Occasions         Never    Size
United States     5%       14%           11%           10%             29%           31%        1012
Display 10.51 Results of a poll of parents’ opinions of the amount of emphasis schools
place on sports and standardized test preparation, for a sample of
1000 adults. [Source: poll.gallup.com, 2005.]
Why do you need a test of independence here when you already have one
from Chapter 5?
D19. Why do you think a chi-square test of independence is sometimes called a
test of “association”?
[Two segmented bar charts. Vertical axes: frequency; one chart is grouped by class of travel (first, second, third), the other by survival status (survived, didn’t survive).]
Display 10.55 Segmented bar charts for the Titanic survival data.
The column chart in Display 10.56 treats the two variables symmetrically.
You can see that the two variables do not appear to be independent because the
columns in the back row don’t follow the same pattern as the columns in the front
row. From left to right, the heights of the columns in the back row go shortest,
middle, tallest; in the front row, the heights go tallest, shortest, middle.
[Column chart. Vertical axis: frequency, 0 to 600; horizontal axes: class of travel (first, second, third) and survival status (survived, didn’t survive).]
$\frac{4}{15}$
The expected frequency of people who are right-handed and blue-eyed is then
$P(\text{right-handed and blue-eyed}) \cdot (\text{grand total}) = \frac{4}{15}(30) = 8$
In general, if two variables are independent, the probability that a randomly
selected observation falls into the cell in column C and row R is
$P(R \text{ and } C) = P(R) \cdot P(C) = \frac{\text{row } R \text{ total}}{\text{grand total}} \cdot \frac{\text{column } C \text{ total}}{\text{grand total}}$
Thus, the expected number of observations that fall into this cell is
$P(R \text{ and } C) \cdot (\text{grand total}) = \frac{\text{row } R \text{ total}}{\text{grand total}} \cdot \frac{\text{column } C \text{ total}}{\text{grand total}} \cdot (\text{grand total}) = \frac{(\text{row } R \text{ total}) \cdot (\text{column } C \text{ total})}{\text{grand total}}$
This is the same formula as in Section 10.2.
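The two routes to the expected count, multiplying marginal probabilities or using the shortcut formula, can be compared with exact fractions. In the sketch below the marginal totals (20 right-handed and 12 blue-eyed out of 30) are hypothetical values chosen so that the cell probability comes out to 4/15 and the expected count to 8, as in the example; the function name is also illustrative.

```python
from fractions import Fraction

def expected_count(row_total, column_total, grand_total):
    """Return (cell probability, expected count) under independence."""
    p_row = Fraction(row_total, grand_total)        # P(R)
    p_column = Fraction(column_total, grand_total)  # P(C)
    p_cell = p_row * p_column                       # P(R and C) = P(R) * P(C)
    return p_cell, p_cell * grand_total

# Hypothetical margins: 20 right-handed and 12 blue-eyed among 30 people.
p_cell, expected = expected_count(20, 12, 30)

# Shortcut: (row total)(column total) / (grand total).
shortcut = Fraction(20 * 12, 30)

print("P(R and C) =", p_cell)
print("expected   =", expected)
print("shortcut   =", shortcut)
```

Both routes give the same expected count because one factor of the grand total cancels.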
$\chi^2 = \sum \frac{(O - E)^2}{E}$
[Sketch: the P-value is the area under the chi-square curve to the right of $\chi^2$.]
Display 10.58 Eye color and hair color data from a sample of
Scottish children. [Source: D. J. Hand et al., A Handbook
of Small Data Sets (London: Chapman & Hall, 1994), p. 146.
Original source: L. A. Goodman, “Association Models and
Canonical Correlation in the Analysis of Cross-Classifications
Having Ordered Categories,” Journal of the American Statistical
Association 76 (1981): 320–34.]
Solution
From the column chart in Display 10.59, you can see that it does indeed appear
to be the case that children with darker hair colors tend to have darker eye colors,
whereas children with lighter hair colors tend to have lighter eye colors. Compare,
especially, the rows for fair hair and dark hair.
[Column chart. Vertical axis: frequency, 0 to 800; horizontal axes: eye color (Blue, Light, Medium, Dark) and hair color (Fair, Red, Medium, Dark, Black).]
Display 10.59 Column chart for hair color and eye color.
Check conditions. This situation satisfies the conditions for a chi-square test of independence
if the children can be considered a simple random sample taken from one large
population. Each child in the sample falls into one hair color category and one
eye color category. The Data Desk printout in Display 10.60 shows the expected
frequencies under the assumption of independence. The expected frequency in
each cell is 5 or greater.
State the hypotheses. The null hypothesis is
H0: Eye color and hair color are independent.
The alternative hypothesis is
Ha: Eye color and hair color are not independent.
Compute the test statistic, calculate the P-value, and draw a sketch. Your calculator gives a $\chi^2$ value of 1240, extremely far out in the tail, and a
corresponding P-value of approximately 0. [See Calculator Note 10E to learn how to
use your calculator to find these statistics.]
Display 10.60 also gives the computation of the test statistic, $\chi^2$, and the P-value.
Notice that the expected frequencies in each cell are computed from the marginal
frequencies. Display 10.60 also includes the conditional relative frequencies for
the columns (hair color): Among those with fair hair, 22.4% have blue eyes;
among those with black hair, only 2.54% have blue eyes.
[Sketch: chi-square distribution with $df = 12$; the value 32.91 cuts off an upper tail of area $\alpha = 0.001$.]
Display 10.61 Chi-square distribution with 12 degrees of freedom.
Write the conclusion in the context of the situation. Reject the null hypothesis. These are not results you would expect for a
sample from a population in which there is no association between eye color and
hair color. As you can see from Display 10.61, a $\chi^2$ value of 1240 is much larger
than the value of 32.91 that cuts off an upper tail of 0.001. Thus, a value of $\chi^2$ of
1240 or larger is extremely unlikely to occur in a sample of this size if hair color
and eye color are independent. Examining the table and the column chart, you
might conclude that darker eye colors tend to go with darker hair colors and
lighter eye colors tend to go with lighter hair colors.
■
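To see how extreme a value of 1240 is with 12 degrees of freedom, the sketch below computes upper-tail areas directly. It relies on a standard closed form that the text does not develop: when df is even, say $df = 2m$, the area to the right of $x$ is $e^{-x/2} \sum_{k=0}^{m-1} \frac{(x/2)^k}{k!}$. The function name is illustrative, and the shortcut is not valid for odd df.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Area to the right of x under a chi-square curve with even df.

    Uses exp(-x/2) * sum of (x/2)^k / k! for k = 0, ..., df/2 - 1.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only works for even, positive df")
    half = x / 2
    term = 1.0      # (x/2)^k / k!, starting at k = 0
    total = 0.0
    for k in range(df // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total

# Area beyond the cutoff shown in Display 10.61: should be close to 0.001.
tail_at_cutoff = chi2_upper_tail_even_df(32.91, 12)

# Area beyond the observed test statistic: vanishingly small.
tail_at_observed = chi2_upper_tail_even_df(1240, 12)

print("area beyond 32.91:", tail_at_cutoff)
print("area beyond 1240: ", tail_at_observed)
```

The second area is so small that a calculator reports it as approximately 0, as in the example.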
Display 10.62 Sample table for blue jeans data, collected from
two populations.
Class: We get it! We have to do a test of homogeneity because we sampled
separately from two populations: people under 40 and older
people. There are two populations and one variable about jeans.
Jodain: Now I see what you mean by the columns being proportional.
For people under 40, 42% are wearing jeans now, 46% are not
Display 10.63 Sample table for blue jeans data, collected from
one population.
Class: We use a test of independence because that’s the only other
possibility.
Jodain: We use a test of independence because there is only one sample,
from the population of all people. We categorize each person in
the sample on the two variables, age and jeans-wearing behavior.
The columns are roughly proportional, so we’ll conclude that we
can’t reject the hypothesis that the two variables are independent.
Jeans-wearing behavior and age aren’t associated. But this time we
didn’t predetermine how many people would be in each age group.
Ms. C: Right again, Jodain.
Class: Jodain is always right!
Ms. C: Also, it now makes sense to talk about conditional distributions for
the rows. You can estimate that $\frac{23}{40}$, or about 58%, of people now
wearing jeans are under age 40 and about 42% are age 40 or older.
That statement wouldn’t make sense in the homogeneity case.
Jodain: Then why wouldn’t we always design our study as a test of
independence? Then we can look at the table both ways, and
as a bonus we get an estimate of the percentage of people in each
age group!
Ms. C: It depends on what you want to find out. Suppose your research
hypothesis is that the proportion of people who wear jeans differs
for the two age groups. You would design the study as a test of
homogeneity, taking an equal sample size from each age group.
Display 10.64 Tables showing how sample size can affect the test
statistic.
Practice
The Chi-Square Test of Independence
P22. Which pairs of variables do you believe are independent in the population of U.S.
students? Explain.
A. hair color and eye color
B. type of music preferred and ethnicity
C. gender and color of shirt
D. type of movie preferred and gender
E. eye color and class year
F. class year and whether taking statistics
Tabular and Graphical Display of Data
P23. Which column charts in Display 10.65 display variables that are independent?
[Four column charts, labeled A, B, C, and D.]
Display 10.66 Table of expected frequencies for a test of independence between
gender and handedness.
Procedure for a Chi-Square Test of Independence
P25. Is there a difference between handedness patterns in men and women? A good set of
data to help you answer this question comes from the government’s 5-year Health and
Nutrition Survey (HANES) of 1976–80, which recorded the gender and handedness
of a random sample of 2237 individuals from across the country. The observed frequencies
for men and women in each of three handedness categories are shown in Display
10.67. Use an appropriate statistical test to answer the question posed. If the patterns
differ significantly, explain where the main differences occur.

                 Men    Women
Right-Handed     934     1070
Left-Handed      113       92
Ambidextrous      20        8
Total           1067     1170

a. Explain what the percentages measure. Given the way the data were collected,
would you do a test of homogeneity or a test of independence?
b. Suppose the sample size was 1000 and it just happened to be equally split between
men and women. Is there evidence of dependence between gender and an effort
toward a healthy diet?
c. Suppose the sample size was only 500 and it just happened to be equally
split between men and women. Is there evidence of dependence between gender
and an effort to eat a healthy diet?
P27. A health science teacher had his class do a sample survey of the students at a nearby
middle school. Two of the questions were “Do you eat breakfast at least three times a week?”
and “Do you think your diet is healthy?” The responses are shown in Display 10.69.
Do the answers to these two questions appear to be independent, based on these
data? Conduct a statistical test.
Healthy Diet?
Yes No
Exercises
Always include all steps when doing a statistical test.
E33. According to a National Center for Education Statistics report, 48,574,000
children were projected to be enrolled in the nation’s K–12 public schools in 2006. About
69% were projected to be enrolled in grades K–8 and 31% in grades 9–12. It was also
projected that about 16.8% of the students would be in the Northeast, 22.1% in the
Midwest, 36.5% in the South, and 24.6% in the West. [Source: nces.ed.gov.]
a. For what reason might the variables region of country and grade level be
associated?
b. Construct a two-way table showing the proportion of students who fall
into each cell under the assumption of independence.
c. Construct a two-way table showing the number of students who fall into each cell
under the assumption of independence.
E34. The first U.S. National Health and Nutrition Examination Survey in the 1980s reported
on the age of a mother at the birth of her first child and whether the mother eventually
developed breast cancer. Of the 6168 mothers in the sample, 26.39% had their first child at
age 25 or older. Of the 6168 mothers in the sample, 98.44% had not developed breast
cancer. [Source: Jessica Utts, Seeing Through Statistics, 2d ed. (Pacific Grove, Calif.:
Duxbury, 1999), p. 209.]
a. For what reason might the variables age at birth of first child and whether
developed breast cancer be associated?
b. Construct a two-way table showing the percentage of mothers who fall
into each cell under the assumption of independence.
c. Construct a two-way table showing the number of mothers who fall into each cell
under the assumption of independence.
E35. A student surveyed a random sample of 300 students in her large college and collected
the data in Display 10.71 on the variables class year and favorite team sport.
                          Favorite Team Sport
Class Year   Basketball   Soccer   Baseball/Softball   Football   Total
Freshman         12         40            10               1        63
Sophomore        12         44            16               8        80
Junior            9         43            11              11        74
Senior           10         49            18               6        83
the results in Display 10.72 regarding the level of exercise attained regularly by adult
men and women. (Assume the respondents are equally split between men and women.)

             Men    Women
High         33%     26%
Medium       23%     16%
Low          20%     26%
Sedentary    24%     32%

Display 10.72 Results of a survey regarding the exercise level of 3026 adults, taken
over a 3-year period. [Source: poll.gallup.com, 2005.]
a. Change the percentages into observed frequencies and perform a chi-square test
to determine if exercise and gender are independent.
b. Could you have made a Type I error?
c. Describe how the survey should have been designed in order to test for homogeneity
of male and female populations with regard to level of exercise.
Display 10.73 Titanic passengers sorted by gender and survival status.
a. Construct a plot that displays these data to see whether the variables gender and
survival status appear to be independent.
b. Test to see if the association between the variables gender and survival status
can reasonably be attributed to chance or if you should look for some other
explanation.
E38. The fate of the members of the Donner party, who were trapped in the Sierra Nevada
mountain range over the winter of 1846–47, is shown in Display 10.74. These data cannot
reasonably be considered a random sample from any well-defined population.

                  Gender
Survived?     Male   Female
Yes            23      25
No             32       9

Display 10.74 The Donner party survival data. [Source: www.utahcrossroads.org, June 2002.]
Display 10.75 Alcohol-related traffic fatalities on Super Bowl Sundays and control
Sundays. [Source: D. A. Redelmeier and C. L. Stewart, “Do Fatal Crashes Increase Following a
Super Bowl Telecast?” Chance 18, no. 1 (2005): 19–24.]
Display 10.76 Smoking behavior sorted by type of occupation. [Source: D. J. Hand et al.,
A Handbook of Small Data Sets (London: Chapman & Hall, 1994), p. 284. Original source:
T. D. Sterling and J. J. Weinkam, “Smoking Characteristics by Type of Employment,” Journal
of Occupational Medicine 18 (1976): 743–53.]
Chapter Summary
In this chapter, you have learned about three chi-square tests: a test of goodness
of fit, a test of homogeneity, and a test of independence. Although each test is
conducted in exactly the same way, the questions they answer are different.
In a chi-square goodness-of-fit test, you ask “Does this look like a random
sample from a population in which the proportions that fall into these categories
are the same as the proportions hypothesized?” This test is an extension of the test
of a single proportion developed for the binomial case.
In a chi-square test of homogeneity, you ask “Do these samples from different
populations look like random samples from populations in which the proportions
Review Exercises
E41. This question was asked of random samples of about 1000 residents of the United States
in each of a succession of years: “Which of the following statements reflects your view of
when the effects of global warming will begin to happen?” The percentages selecting each
choice, by year, are given in Display 10.77. Is there evidence of change in the pattern in
these percentages across the years? If so, describe the pattern of change you see.

         Already   Within a    Within Your   Not Within Lifetime,   Will Never   No
         Begun     Few Years   Lifetime      but in Future          Happen       Opinion
2005      54%         5%          10%              19%                  9%          3%

Display 10.77 Result of a poll of 1000 U.S. residents each year, asking when they expect to
be affected by global warming. [Source: poll.gallup.com, 2005.]
E42. “Women are more critical of environmental conditions than are men.” So says a Gallup
Poll report on a survey of 1004 adults taken in 2005. Results are shown in Display 10.78.
The percentages in the first row of data show that 31% of men reported a positive view of
current environmental conditions whereas only 18% of women reported such a view.
Assume that there were approximately equal numbers of men and women.
a. Construct a plot that displays these data.
b. What test should you use to decide whether the data support the quoted claim?
c. Perform the test you think is best.

                 Men     Women
Positive         31%      18%
Mixed            23%      22%
Negative         44%      57%
Undesignated      2%       3%
Total           100%     100%

Display 10.78 Results of a poll of 1004 adults regarding how critical they are of
environmental conditions. [Source: poll.gallup.com, 2005.]
E43. Many years ago, Smith College, a residential college, switched to an unusual academic
schedule that made it fairly easy for students to take most or all of their classes on the
first three days of the week. (Smith has long since abandoned this experiment.) At the
time, the infirmary staff wanted to know about after-hours use of the infirmary under
this schedule. They gathered data for an entire academic year, recording the time
and day of the week of each after-hours visit, along with the nature of the problem,
Display 10.84 Data and summary statistics for 1996 and 2001 AP Calculus AB Exams.
[Source: For the 1996 data: AP Calculus Course Description, May 1998, May 1999, p. 76. For the
2001 data: apcentral.collegeboard.com, June 2002.]
b. Suppose you want to investigate whether these data provide evidence of an increase
in the mean grade from 1996 to 2001. What test is appropriate? Perform this
test, showing all steps.
E52. A common Gallup poll question often asked of a sample of residents of the United
States is this: “In general, are you satisfied or dissatisfied with the way things are going
in the United States at this time?” Display 10.85 shows the results of five such polls
conducted from 2001 to 2005.

Date of Poll   Percentage Satisfied   Percentage Dissatisfied
2005                   40                       58
2004                   43                       55
2003                   46                       53
2002                   52                       44
2001                   55                       42

Display 10.85 Results of five polls taken from 2001 to 2005. [Source: poll.gallup.com, 2005.]
a. Why don’t the row percentages sum to 100%?
b. Suppose each poll had sampled about 1000 residents (about the size of most
Gallup polls). Is there evidence that the level of satisfaction changed significantly
across these years? If so, describe the pattern of change.
c. Suppose each poll had sampled about 500 residents. Is there evidence that the
level of satisfaction changed significantly across these years? If so, describe the
pattern of change.
E53. The data in Display 10.86 show the Titanic passengers sorted by two variables: class of
travel and survival status. These data cannot reasonably be considered a random sample.
They are the population itself, so you must be careful in stating the hypotheses and the
conclusion for any test of significance. Test to see if the apparent lack of independence
between the variables class of travel and survival status can reasonably be attributed
to chance or if you should look for some other explanation. If the latter, what is
that explanation?

                    Class of Travel
Survived?     First   Second   Third   Total
Yes            203      118     178     499
No             122      167     528     817
Total          325      285     706    1316

Display 10.86 Titanic passengers sorted by class of travel and survival status.
AP1. This partially completed table of expected frequencies is for a test of the independence
of a father’s handedness and his oldest child’s handedness. What is the expected
frequency in the cell marked “-?-”?

                        Father
Oldest Child     Left    Right    Total
Left              64       72      136
Right            -?-               455
Total                              591

64
$455 \cdot \frac{64}{136}$
$591 \cdot \frac{64}{136}$
$455 \cdot \frac{64}{72}$
cannot be determined from the information given
AP2. In a pre-election poll, 13,660 potential voters were categorized based on their
annual income (into one of eight categories) and their intended vote (two categories,
Republican or Democratic candidate). The resulting chi-square test of independence
of these variables gave a test statistic of 270. How many degrees of freedom are there?
2
7
16
269
13659
AP3. Researchers interested in determining whether there is a relationship between
a mother’s birthday and the birthday of her oldest child took a random sample of
200 mothers. The mother’s birthday was categorized as within a week, more than a
week but less than a month, or more than a month from her oldest child’s birthday.
Assuming that there are 365 days in a year and the same number of people are born on
each day, what is the expected frequency of mothers in the “within at most one week”
category?
8
8.219
66
66.67
67
AP4. In a study of whether two acne medications are equally effective, researchers got
eight volunteers and randomly chose one side of each person’s face to receive each
medication. After one month of use, they counted the number of pimples on each side
of each person’s face. Which test is most appropriate in this situation?
matched-pairs t-test
two-sample t-test
chi-square test for goodness of fit
chi-square test for homogeneity of proportions
chi-square test of independence
AP5. A random sample of 1000 adults from each of the 50 states is taken, and the people
are categorized as to whether they have graduated from college or not, generating a
50-by-2 table of counts. Which is the most appropriate test for determining whether
the graduation rates differ among the states?
Compare each count to the expected frequency of 500 and use a chi-square
goodness-of-fit test.
Use a chi-square test of homogeneity of proportions.
Use a chi-square test of independence.
Use a one-proportion z-test.
Use a t-test for the difference of means.
AP6. For a project, a student plans to roll a six-sided die 1000 times. The teacher
becomes suspicious when the student reports getting 168 ones, 165 twos, 170
threes, 167 fours, 164 fives, and 166 sixes. The teacher performs a chi-square test with
the alternative hypothesis that the student’s reported observed counts are closer to the
expected frequencies than can reasonably be attributed to chance. What is the test
statistic and P-value for this test?
$\chi^2 \approx 0.14$; P-value $\approx 0.0004$
$\chi^2 \approx 0.14$; P-value $\approx 0.9996$
$\chi^2 \approx 1000$; P-value $\approx 0$
$\chi^2 \approx 23.33$; P-value $\approx 0.0004$
$\chi^2 \approx 23.33$; P-value $\approx 0.9996$
AP7. A researcher wanted to determine whether Barbarians and Vandals have similar food
AP8. Random samples are taken from two populations (Barbarians and Vandals) and
each person is categorized as a pillager or a burner. Which of the following, when done
appropriately, is equivalent to a chi-square test of homogeneity for this situation?
one-sample z-test
two-sample z-test
one-sample t-test
two-sample t-test
chi-square goodness-of-fit test
Investigative Tasks
AP9. To study the effectiveness of seat belts and air bags, researchers took a careful look
at data that were collected on accidents between 1997 and 2002. The data came
from a division of the National Highway Traffic Safety Administration that collects
detailed data on a random sample of accidents “in which there is a harmful
event and from which at least one vehicle
Survived 12,315 10,387
c. From the two analyses in parts a and b, which safety feature appears to have the
stronger association with survival?
AP10. The analysis in AP9 does not account for seat belts and air bags interacting with
each other. This table splits the air bag data according to whether seat belts were also in
use during the accident.

               Seat Belts Used            Seat Belts Not Used
            Air Bags   No Air Bags     Air Bags   No Air Bags
Killed          19          16             23          44
Survived    10,464       6,230          1,851       4,157
Total       10,483       6,246          1,874       4,201

a. What do the data suggest as to the effect of air bag use when seat belts are also
used? Is there a significant association?
b. What do the data suggest about the effect of air bag use when seat belts are not
used? Is there a significant association?
[Figure: scatterplot of Redness versus Sulfate Percentage for Mars rocks and
soil samples.]
Do Mars rocks with more sulfur tend to be redder? Data from the Pathfinder
mission to the red planet were used to explore questions like this.
Trivia Question 1: Who are Shark, Barnacle Bill, Half Dome, Wedge, and Yogi?
Answer: Mars rocks, found at the landing site of Mars Pathfinder in
July 1997.
Trivia Question 2: Mars is often called the red planet. What makes it red?
Answer: Sulfur?
This chapter addresses how to make inferences about the unknown true
relationship between two quantitative variables. The methods and logic you will
learn in this chapter apply to a broad range of such questions.
Linear Models
In Chapter 3, you learned to summarize a linear relationship with a least squares
regression line,
\[ \hat{y} = b_0 + b_1 x \]
That equation is a complete description if you have the entire population, but if
you have only a random sample, the values of b0 and b1 are estimates of the true
population parameters. That is, there is some underlying “true” linear
relationship that you are trying to estimate, just as you use \( \bar{x} \) as an
estimate of μ. The notation for a linear relationship is
\[ y = (\beta_0 + \beta_1 x) + \varepsilon \]
where β0 and β1 refer, respectively, to the intercept and slope of a line that you
don’t ordinarily get to see: the true regression line you would get if you had
data for the whole population instead of only a sample. The letter ε indicates
the size of the random deviation, that is, how far a point falls above or below
the true regression line. (In other words, ε is the observed value minus the
value predicted by the true regression line.) The true regression line, sometimes
called the line of means or the line of averages, is written
\[ \mu_y = \beta_0 + \beta_1 x \]
Because such linear models are often used to predict unknown values of
y from known values of x, or to explain how x influences the variation in y,
y is called the response variable and x is called the predictor variable or the
explanatory variable.
Activity 11.1a is designed to help you understand the roles of these equations
and the relationship between them.
1. Use the information above to fill in the second column of your copy of the
table in Display 11.2.
Age, x   Average Height from Model, μy   Random Deviation, ε   Observed Height, y, of Your Child
8        51                              ?                     ?
...      ...                             ...                   ...
13       ?                               ?                     ?
Yogi: You don’t expect me to believe that real data get created the way
we did it in the activity, do you?
Shark: No. At best, this model is a good approximation, and it’s up to you
to judge how well it describes the process that created your data.
But precisely because the model is a simplified version of reality,
when it’s reasonable, it’s quite useful.
[Scatterplot: Height (in.) versus Age (yr).]
Display 11.3 Scatterplot of height versus age. Conditional
distributions of children’s height given their age
are the vertical columns of dots.
You learned in Activity 11.1b that the variation in the estimated slopes
depends not only on how much the values of y vary for each fixed value of x but
also on the spread of the x-values.
In the methods of inference in this chapter, the variability in y is assumed
to be the same for each conditional distribution. That is, if you picked a value
of x and computed the standard deviation of all the associated values of y in the
population, you’d get the same number, σ, as you would if you picked any other
x, as shown in Display 11.4. This implies that σ also measures the variability of
all values of y about the true regression line. You can use this fact to estimate
σ from your data.
[Figure: the true regression line \( \mu_y = \beta_0 + \beta_1 x \) plotted
against x.]
Display 11.4 The variability of all values of y about the true
regression line is the same as the variability of the
y-values at each fixed value of x.
Variability in x and y
The common variability of y at each x is called σ. It is estimated by s, which can
be thought of as the standard deviation of the residuals. (Note that s measures
the variability of the residuals.) You compute s using all n values of y in the
sample:
\[ s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{SSE}{n - 2}} \]
The spread in the values of x is measured by
\( \sqrt{\sum (x_i - \bar{x})^2} \), where \( \bar{x} \) is computed
using all n values of x in the sample.
Display 11.5, which shows some results from Activity 11.1b, shows that the
variability in the slope depends on the sample size and the variability in x and y.
[Four panels, each showing five fitted regression lines, with y plotted against
x (x from 0 to 4):
Plot I: x = 0, μy = 10; x = 1, μy = 12; σ = 3
Plot II: x = 0, μy = 10; x = 4, μy = 18; σ = 3
Plot III: x = 0, μy = 10; x = 1, μy = 12; σ = 5
Plot IV: x = 0, μy = 10; x = 4, μy = 18; σ = 5]
Display 11.5 Regression lines relating variability in b1 to variation
in y and to spread in x.
11.1 Variation in the Slope from Sample to Sample 743
Each plot in Display 11.5 shows five regression lines for one of the four cases;
for example, Plot I shows five regression lines for Case 1. Each line was constructed
using the instructions in step 1 of the activity. Plots I and II (or III and IV) show
that a wider spread in x results in regression lines with less variability in their
slope—even though the values of y vary equally for each x-value. By comparing
plot I with plot III (or plot II with plot IV), you can see something more expected:
More variability in each conditional distribution of y means more variability in the
slope, b1.
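The pattern in Display 11.5 can be checked with a small simulation. The sketch below is only loosely modeled on the four plots: it assumes a true line with intercept 10 and slope 2, normally distributed deviations, and five observations at each of two x-values. The sample sizes, the number of repetitions, and the function names are illustrative assumptions, not details taken from the activity.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least squares slope b1 for the points (xs, ys)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slope_sd(x_values, sigma, beta0=10.0, beta1=2.0, per_x=5, reps=2000, seed=1):
    """Simulate many samples from y = beta0 + beta1*x + e and return the
    standard deviation of the fitted slopes (all defaults are assumptions)."""
    rng = random.Random(seed)
    xs = [x for x in x_values for _ in range(per_x)]
    slopes = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return statistics.stdev(slopes)


results = {
    "I": slope_sd([0, 1], sigma=3.0),    # narrow spread in x, smaller sigma
    "II": slope_sd([0, 4], sigma=3.0),   # wide spread in x, smaller sigma
    "III": slope_sd([0, 1], sigma=5.0),  # narrow spread in x, larger sigma
    "IV": slope_sd([0, 4], sigma=5.0),   # wide spread in x, larger sigma
}
for name, sd in results.items():
    print(f"Plot {name}: SD of simulated slopes = {sd:.3f}")
```

Under these assumptions, the simulated slopes vary less when the x-values are farther apart (II versus I, IV versus III) and more when the deviations are larger (III versus I, IV versus II).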
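The estimated standard error of the slope, \( s_{b_1} \), is
\[ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}
          = \frac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}}{\sqrt{\sum (x_i - \bar{x})^2}} \]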
The formula does what you would expect: The slope varies less from sample to
sample when the sample size is larger, when the values of y tend to be closer to the
regression line, and when the values of x are more spread out.
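As a minimal sketch of how these quantities fit together, the function below computes the least squares line, s, and the estimated standard error of the slope directly from paired data. The five data points are made up for illustration and do not come from the text; the function name is also hypothetical.

```python
import math


def slope_standard_error(xs, ys):
    """Return (b0, b1, s, se_b1) for the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)          # spread in x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                                   # slope
    b0 = y_bar - b1 * x_bar                          # intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                     # SD of the residuals
    se_b1 = s / math.sqrt(sxx)                       # standard error of slope
    return b0, b1, s, se_b1


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1, s, se_b1 = slope_standard_error(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, s = {s:.4f}, se(b1) = {se_b1:.4f}")
```

Doubling the spacing of the x-values while keeping the same residuals doubles the denominator, which is one way to see why a wider spread in x reduces the standard error.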
Display 11.8 Number and mass of french fries. [Source: Nathan Wetzel,
“McDonald’s French Fries. Would You Like Small or Large Fries?”
STATS, 43 (Spring 2005): 12–14.]
[Scatterplot and residual plot for small bags: mass versus Small_Bag_Number.
Fitted line: Small_Bag_Mass = 0.741 Small_Bag_Number + 38; r² = 0.40]
[Scatterplot and residual plot for large bags: mass versus Large_Bag_Number.
Fitted line: Large_Bag_Mass = 0.321 Large_Bag_Number + 115; r² = 0.23]
\[ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}
          = \frac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}}{\sqrt{\sum (x_i - \bar{x})^2}}
          = \frac{\sqrt{\dfrac{2057.3}{32 - 2}}}{\sqrt{2499.5}} \approx 0.1656 \]
So, if you were to take many random samples of 32 bags of small fries and
compute the slope of the regression line for predicting mass from number
of fries, the standard deviation of the distribution of these slopes would be
about 0.1656.
For the large bags, the estimated standard error is
\[ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}
          = \frac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}}{\sqrt{\sum (x_i - \bar{x})^2}}
          = \frac{\sqrt{\dfrac{3337.10}{29 - 2}}}{\sqrt{9886.69}} \approx 0.1118 \]
The estimated standard error of the slope for the large bags is smaller than that
for the small bags, even though there is much more variation in the masses (y)
of the large bags. This happens because the larger variation in masses is offset
by the larger variation in the numbers of fries (x) for the large bags.
■
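The two standard errors in this example can be reproduced from the sums quoted above. This short sketch only redoes the arithmetic; the helper name is an illustrative choice.

```python
import math


def se_slope_from_sums(sse, n, sxx):
    """Standard error of the slope from SSE, sample size n, and sum((x - x_bar)^2)."""
    s = math.sqrt(sse / (n - 2))   # estimate of sigma
    return s / math.sqrt(sxx)


small = se_slope_from_sums(sse=2057.3, n=32, sxx=2499.5)    # small bags
large = se_slope_from_sums(sse=3337.10, n=29, sxx=9886.69)  # large bags
print(round(small, 4), round(large, 4))  # prints 0.1656 0.1118
```

The large bags have the bigger SSE, but their much bigger sum of squared deviations in x more than compensates.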
\[ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}
          = \frac{\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}}{\sqrt{\sum (x_i - \bar{x})^2}} \]
Practice
Linear Models
P1. The scatterplot of data on pizzas in Display 3.31 on page 134 shows the
number of calories versus the number of grams of fat in one serving of several
kinds of pizza. Fat contains 9 calories per gram.
a. What would be the theoretical slope of a line representing such data?
b. What does the intercept of the line tell you?
c. What are some reasons why not all of the points fall exactly on a line with
the slope in part a?
P2. According to Leonardo da Vinci, a person’s arm span and height are about
equal. Display 11.10 gives height and arm span measurements for a sample of 15
high school students.
Arm Span (cm)   Height (cm)   Arm Span (cm)   Height (cm)
168.0           170.5         129.0           132.5
172.0           170.0         169.0           165.0
101.0           107.0         175.0           179.0
161.0           159.0         154.0           149.0
166.0           166.0         142.0           143.0
174.0           175.0         156.5           158.0
153.5           158.0         164.0           161.0
95.0            95.5
Display 11.10 Height versus arm span for 15 high school students.
a. What is the theoretical regression line, \( \mu_y = \beta_0 + \beta_1 x \),
that Leonardo is
b. Find the least squares regression line, \( \hat{y} = b_0 + b_1 x \), for
these data. Interpret the slope. Compare the estimated slope and intercept to
the theoretical slope and intercept in part a. Are they close?
c. Calculate the random deviation, ε, for each student, using the theoretical
regression line. Plot these random deviations against the arm spans and comment
on the pattern.
d. For each student, calculate the residual from the estimated regression line.
Plot the residuals against the arm spans and comment on the pattern. Is the
pattern similar to that for the random deviations?
P3. Every spring, visitors eagerly await the opening of the spectacular
Going-to-the-Sun Road in Glacier National Park, Montana. A typical range of
yearly snowfall for the area is from 30 to 70 inches. The amount of snow is
measured at Flattop Mountain, near the top of the road, on the first Monday in
April and is given in swe (snow water equivalent: the water content obtained
from melting). From analysis of past data, when the amount of snow was 30 inches
of swe, the road opened, on average, on the 150th day of the year. Every
additional 0.57 in. of swe measured at Flattop Mountain meant another day on
average until the road opened. [Source: “Spring Opening of the Going-to-the-Sun
Road and Flattop Mountain SNOTEL Data,” Northern Rocky Mountain Science Center,
U.S. Geological Survey, December 2001, www.nrmsc.usgs.gov.]
standard error of the slope. Then locate your estimate from P4 of the variation
in y about the line. What is the equation of the regression line?
Display 11.12 Data and scatterplot for P7.
a. Find both the true regression line, \( \mu_y = \beta_0 + \beta_1 x \), and
the least squares regression line, \( \hat{y} = b_0 + b_1 x \), and compare
them.
Exercises
commonly sold in the United States.
Manufacturer and Model    price   mpg   liter   hp    rpm
Chevrolet Cavalier        13.4    36    2.2     110   5200
Chevrolet Lumina APV      16.3    23            170   4800
Chevrolet Astro           16.6    20    4.3     165   4000
Dodge Shadow              11.3    29    2.2     93    4800
Dodge Caravan             19.0    21    3.0     142   5000
Eagle Vision              19.3    28    3.5     214   5800
Ford Probe                14.0    30    2.0     115   5500
Hyundai Elantra           10.0    29    1.8     124   6000
Lexus SC300               35.2    23    3.0     225   6000
Mazda RX-7                32.5    25    1.3     255   6500
Oldsmobile Achieva        13.5    31    2.3     155   6000
Pontiac Grand Prix        18.5    27    3.4     200   5000
Suzuki Swift              8.6     43    1.3     70    6000
Volkswagen Fox            9.1     33    1.8     81    5500
Volvo 850                 26.7    28    2.4     168   6200
[Four scatterplots; horizontal axes: Horsepower (two panels), Engine Size (L),
and Maximum Revolutions per Minute.]
Display 11.13 Table and scatterplots of car models data. [Source: Journal of
Statistics Education Data Archives, June 2002.]
The variables are:
price   typical selling price, in thousands of dollars
mpg     typical highway mileage, in miles per gallon
liter   size of the engine, in liters
hp      horsepower rating of the vehicle
rpm     maximum revolutions per minute the engine is designed to produce
[Scatterplots of household energy data; axes include Mean Temp (°F), Mean KWH,
and Heat DD.]
b. By looking at the scatterplots, estimate which of the four pairs of variables
has the largest standard error of the slope and which has the smallest.
c. Compute the slope of the least squares regression line for the relationship
between y = Mean Gas and x = Mean Temp. Interpret the slope in context. Compute
the estimated standard error of the slope.
d. Find your values from part c on the computer printout in Display 11.16. Also
find and interpret s, the estimate of σ.
[Scatterplot and residual plot: Mean Gas versus Mean Temp. Fitted line:
Mean Gas = –0.236 Mean Temp + 17; r² = 0.84]
Predictor    Coef       Stdev     t-ratio   p
Constant     16.988     1.196     14.20     0.000
MeanTemp     –0.23643   0.02401   –9.85     0.000
s = 1.548   R-sq = 84.3%   R-sq(adj) = 83.5%
Analysis of Variance
Source       DF   SS       MS       F       p
Regression   1    232.45   232.45   97.01   0.000
Error        18   43.13    2.40
Total        19   275.58
Display 11.16 Regression analysis for Mean Gas versus Mean Temp.
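The entries in a regression printout like the one for Mean Gas versus Mean Temp are tied together by a few simple relationships. The sketch below, which uses only numbers copied from that printout, checks that the t-ratio is the coefficient divided by its standard deviation, that s is the square root of the error mean square, and that R-sq is the regression sum of squares divided by the total. The variable names are illustrative.

```python
import math

# Values copied from the printout for Mean Gas versus Mean Temp.
coef, stdev = -0.23643, 0.02401          # slope and its standard error
ss_reg, ss_err, ss_total = 232.45, 43.13, 275.58
df_err = 18                              # n - 2, with n = 20 months

t_ratio = coef / stdev                   # slope measured in standard errors
s = math.sqrt(ss_err / df_err)           # square root of the error mean square
r_sq = ss_reg / ss_total                 # proportion of variation explained
f_ratio = (ss_reg / 1) / (ss_err / df_err)

print(f"t-ratio = {t_ratio:.2f}")
print(f"s = {s:.3f}")
print(f"R-sq = {100 * r_sq:.1f}%")
print(f"F = {f_ratio:.2f}")
```

With one predictor, the F-ratio is the square of the t-ratio for the slope, up to rounding in the printed values.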
Display 11.15 Table and scatterplots of household energy data. [Source:
R. Carver, “What Does It Take to Heat a New Room? Estimating Utility Demand in a
Home,” Journal of Statistics Education 6, no. 1 (1998).]
E3. Suppose you collect a dozen cups, glasses, round bowls, and plates of
various sizes and then plot the distance around the rim versus the distance
across, both measured in centimeters.
the area inside the triangle against the square of the length of a side.
[Scatterplot: Redness versus Sulfate Percentage.]
The test statistic for the slope is the difference between the slope, b1,
estimated from the sample, and the hypothesized slope, \( \beta_{1_0} \),
measured in standard errors:
\[ t = \frac{b_1 - \beta_{1_0}}{s_{b_1}} \]
If a linear model is correct and the null hypothesis is true, then the test
statistic has a t-distribution with n − 2 degrees of freedom.
Yogi: What a relief! I was waiting for them to introduce yet another new
distribution: the z, the t, the χ², the blah-blah-cube.
Shark: But aren’t you worried about one little thing? Why does this
statistic have a t-distribution and not something else?
Yogi: Worried? I said I was relieved! I know where the t-distribution is
on my calculator. The df rule is easy. I am happy, and now you are
trying to raise problems. You have not had enough sulfur in your
diet.
Shark: Okay, we’ll let it go until your next course in statistics. For now,
simply notice that the sample slope behaves something like a
mean.
[Scatterplot: Redness versus Sulfate Percentage, with rocks and soil samples
marked separately.]
2. State the hypotheses. The null and alternative hypotheses usually will be
H0: β1 = 0 and Ha: β1 ≠ 0, where β1 is the slope of the true regression line.
However, the test may be one-sided, and the hypothesized value,
\( \beta_{1_0} \), may be some constant other than 0.
3. Compute the value of the test statistic, find the P-value, and draw a
sketch. The test statistic is
\[ t = \frac{b_1 - \beta_{1_0}}{s_{b_1}} \]
[Sketch: t-distribution centered at 0, with area ½P shaded in each tail, beyond
–t and t.]
Here b1 and \( s_{b_1} \) are computed from your sample. To find the P-value,
use your calculator’s t-distribution with n − 2 degrees of freedom, where n is
the number of ordered pairs in your sample. [See Calculator Note 11D.]
4. Write your conclusion linked to your computations and in context. The
smaller the P-value, the stronger the evidence against the null hypothesis.
Reject H0 if the P-value is less than the given value of α, typically 0.05.
Alternatively, compare the value of t to the critical value, t*. Reject H0
if |t| ≥ t*, for a two-sided test.
Yogi: Four conditions! What happened to good old “line plus random
variation”?
Shark: It’s still there. But “variation” takes in a lot of territory. The
variation about the line has to be both random and regular.
“Regular” here means that the vertical spread is the same as you go
from left to right across your scatterplot and that the distribution
of points in each vertical slice is roughly normal.
Yogi: This is starting to sound complicated.
Shark: Not really. Sometimes a violation of the conditions will be obvious
from the scatterplot. To be safe—or if you happen to be taking
some important test—you should look at a residual plot as well as
a dot plot or boxplot of the residuals.
Yogi: That still leaves one more condition. Surely you’re not going to tell
me I can check randomness by looking at a plot?
Shark: No. For that condition, you need to check how the data were
collected. The observations should have been selected randomly,
which means, partly, that they should have been selected
independently.
Yogi: (Loud sigh)
Shark: Just read the next example, and you’ll see how easy it is.
758 Chapter 11 Inference for Regression
Example: Price Versus Horsepower
In E1 on page 751, you were asked about price versus horsepower for a random
sample of car models. Display 11.19 shows the data and a scatterplot. On the face
of it, the relationship looks strong enough for you to conclude that the pattern
is not simply the result of random sampling: In the population as a whole, there
really must be a relationship between price and horsepower. You would expect a
formal test to lead to the same conclusion, and it does. Carry out a significance
test for the slope of the true regression line for price versus horsepower.
Horsepower   Price ($ thousands)
110          13.4
170          16.3
165          16.6
93           11.3
142          19.0
214          19.3
115          14.0
124          10.0
225          35.2
255          32.5
155          13.5
200          18.5
70           8.6
81           9.1
168          26.7
[Scatterplot: Price versus Horsepower. Fitted line: price = 0.126 hp − 1.5;
r² = 0.72]
Solution
Check conditions: randomness and linearity. You have a random sample. The
relationship looks reasonably linear in the scatterplot. The equation of the
least squares regression line through these points is
\( \hat{y} = -1.544 + 0.126x \).
Check condition: uniform residuals. If you examine the residual plot in Display
11.20, you can see that while the relationship appears to be generally linear,
the variation from the regression line tends to grow with x. That is, the values
of y tend to fan out and get farther from the regression line as the value of x
increases. In the next section, you will see how a transformation helps fix this
violation of the conditions for inference.
[Residual plot: Residual versus Horsepower; calculator window
[60, 260, 20, –8, 10, 2].]
Display 11.20 Residual plot from the regression analysis of price
versus horsepower.
Check condition: normality. If you plot the residuals themselves, as in Display
11.21, they look as if they reasonably could have come from a normal
distribution. There are gaps, but the distribution is reasonably symmetric.
[Dot plot and histogram of the residuals, from –8 to 10; calculator window
[–8, 10, 1, 0, 5, 1].]
Display 11.21 Dot plot and histogram of residuals. (By hand, you might prefer
a dot plot, but your calculator will only graph a histogram.)
State the hypotheses. The hypotheses are H0: β1 = 0 versus Ha: β1 ≠ 0, where β1
is the slope of the true linear relationship between the price and the
horsepower for all car models.
Do computations and draw a sketch.
\[ t = \frac{b_1 - \beta_{1_0}}{s_{b_1}} = \frac{0.1256 - 0}{4.448 / 205.192} \approx 5.79 \]
[Sketch: t-distribution with area 0.00003 shaded in each tail, beyond –5.79 and
5.79.]
The t-statistic is approximately 5.79, and the P-value is approximately 0.00006.
Give conclusion in context. There is strong evidence to reject the null
hypothesis and say that the slope of the true regression line for price versus
horsepower is different from 0. A linear model with 0 for the true slope
probably would not have produced these data.
■
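The computations in this example can be reproduced from the 15 (horsepower, price) pairs in the data table. The sketch below uses only the Python standard library, so it stops at the test statistic and compares it with the tabled critical value t* = 2.160 for 13 degrees of freedom rather than computing the P-value. The variable names are illustrative.

```python
import math

# Horsepower and price (in $ thousands) for the 15 car models in the example.
hp = [110, 170, 165, 93, 142, 214, 115, 124, 225, 255, 155, 200, 70, 81, 168]
price = [13.4, 16.3, 16.6, 11.3, 19.0, 19.3, 14.0, 10.0,
         35.2, 32.5, 13.5, 18.5, 8.6, 9.1, 26.7]

n = len(hp)
x_bar = sum(hp) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in hp)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hp, price))

b1 = sxy / sxx                       # estimated slope
b0 = y_bar - b1 * x_bar              # estimated intercept
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hp, price))
s = math.sqrt(sse / (n - 2))         # SD of the residuals
se_b1 = s / math.sqrt(sxx)           # standard error of the slope
t = (b1 - 0) / se_b1                 # test statistic for H0: beta1 = 0

t_star = 2.160                       # two-sided 5% critical value, df = 13
print(f"b0 = {b0:.3f}, b1 = {b1:.4f}")
print(f"s = {s:.3f}, sqrt(Sxx) = {math.sqrt(sxx):.3f}")
print(f"t = {t:.2f}, reject H0 at the 5% level: {abs(t) >= t_star}")
```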
The results in the example are summarized in the Data Desk printout of a
regression analysis in Display 11.22. Note that s is the standard deviation of the
residuals and is equal to the square root of the mean square for residuals. The
[Display 11.22: regression printout with callouts marking s², b0, b1,
\( s_{b_1} \), t, and the P-value.]
As for all statistical tests, there are three ways to get a tiny P-value like the one
in the example: The model could be unsuitable, the sample could be truly unusual,
or the null hypothesis could be false.
• Unsuitable model? The relationship looks linear, and the sample is random.
But we are a bit worried about the fact that the variability of the y-values
seems to grow with the value of x.
• Unusual sample? The P-value tells just how unusual the sample would be if
the null hypothesis were true. Here, less than one sample in 10,000 would give
such a large value of the test statistic.
• False H0? By a process of elimination, this is the most reasonable explanation.
What’s the bottom line? In the population as a whole, there’s a positive linear
relationship between price and horsepower.
\[ b_1 \pm t^* \cdot s_{b_1} \]
The value of t* depends on the confidence level and the number of degrees
of freedom, df, which is n − 2. [See Calculator Note 11E.]
3. Give interpretation in context. For a 95% confidence interval, you would
say that you are 95% confident that the slope of the underlying linear
relationship lies in the interval. By 95% confidence, you mean that out of
every 100 such confidence intervals you construct from random samples,
you expect the true value, β1, to be in 95 of them.
\[ t = \frac{b_1}{s_{b_1}} \quad \text{or} \quad s_{b_1} = \frac{b_1}{t} \]
or (0.079, 0.173).
You can also use a calculator to find this interval. [See Calculator Note 11E.]
Give interpretation in context. You are 95% confident that the slope of the true
linear relationship between price and horsepower is between 0.079 and 0.173.
Converting from thousands of dollars to dollars, the increase in the cost of a
car per unit increase in horsepower is somewhere between $79 and $173. In other
words, if one model has 1 horsepower more than another model, its price tends to
be between $79 and $173 more. This result means that any true slope β1 between
0.079 and 0.173 could have produced such data as a reasonably likely outcome. A
value of β1 outside the confidence interval could not have produced numbers like
the actual data as a reasonably likely outcome.
■
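As a rough check on this interval, the sketch below rebuilds it from the summary values used in the price versus horsepower test (b1 = 0.1256 and a standard error of 4.448/205.192) together with the tabled value t* = 2.160 for 13 degrees of freedom. Because it starts from rounded inputs, the endpoints may differ from the printed interval in the third decimal place.

```python
b1 = 0.1256                  # estimated slope, in $ thousands per horsepower
se_b1 = 4.448 / 205.192      # estimated standard error of the slope
t_star = 2.160               # 95% confidence, df = 15 - 2 = 13

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print(f"In dollars per horsepower: about {1000 * lower:.0f} to {1000 * upper:.0f}")
```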
\[ s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} \]
Shark: Right. The “center” is your regression line. How many deviations,
or residuals, are there?
Yogi: There are n values of \( (y_i - \hat{y}_i) \), where n is the number of
pairs (x, y) in the sample. I suppose two of them must be redundant. If I
know n − 2 of the residuals, I can figure out the other two?
Shark: Right. Try it with this example, where I won’t tell you either the
values of y or the equation of the regression line. The two missing
residuals are R and S. Can you figure out what they are?
x    y    y − ŷ
1    ?    1
2    ?    −2.5
3    ?    R
4    ?    S
Yogi: Well, I know that ∑(y − ŷ) = 0 because that’s always the case. So
1 + (−2.5) + R + S equals 0. But that’s not enough to get R and S.
Shark: Correct. You need a second condition on the residuals. That
condition is that x and the residuals are uncorrelated: the residuals
don’t grow or shrink as x increases. If you check the formula for the
correlation, you will see that for it to be zero means that
\[ \sum (x - \bar{x})(\text{residual} - \text{mean of residuals})
   = \sum (x - \bar{x})[(y - \hat{y}) - 0]
   = \sum (x - \bar{x})(y - \hat{y}) = 0 \]
Yogi: Okay. That gives me two linear equations in two unknowns:
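Shark's two conditions can be written as a pair of linear equations in R and S and solved directly. The sketch below assumes that the two known residuals are 1 (at x = 1) and −2.5 (at x = 2); the negative sign on 2.5 is an assumption about the table, and the variable names are illustrative.

```python
xs = [1, 2, 3, 4]
known = {1: 1.0, 2: -2.5}           # x -> residual; the sign of 2.5 is assumed
x_bar = sum(xs) / len(xs)           # 2.5

# Condition 1: the residuals sum to zero, so R + S = -(sum of known residuals).
rhs_sum = -sum(known.values())
# Condition 2: sum((x - x_bar) * residual) = 0, so
# (3 - x_bar) * R + (4 - x_bar) * S = -(the known part of that sum).
rhs_corr = -sum((x - x_bar) * e for x, e in known.items())

# Solve the 2-by-2 system with Cramer's rule.
a11, a12 = 1.0, 1.0
a21, a22 = 3 - x_bar, 4 - x_bar
det = a11 * a22 - a12 * a21
R = (rhs_sum * a22 - a12 * rhs_corr) / det
S = (a11 * rhs_corr - a21 * rhs_sum) / det

print(f"R = {R}, S = {S}")
```

Under that assumption the system has exactly one solution, which is why only n − 2 of the residuals are free to vary.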
\[ t = \frac{b_1 - 0}{s_{b_1}} \]
To find the P-value, compare the value of the test statistic with a
t-distribution with n − 2 degrees of freedom. If you cannot reject the null
hypothesis, then there is no statistically significant evidence of a linear
relationship between x and y.
For a confidence interval for the slope of the true regression line, compute
\( b_1 \pm t^* \cdot s_{b_1} \). Again, use n − 2 degrees of freedom.
As a rule, first do a test to answer the question “Is there an effect?” Then, if
you reject H0, construct a confidence interval to answer the question “How big is
the effect?”
Don’t confuse statistical significance with practical importance. Significant
means “big enough to be detected with the data available”; important means “big
enough to care about.”
You should not use the techniques of this section for time-series data, that
is, for cases that correspond to consecutive points in time. In these situations,
the individual observations typically aren’t selected at random and so are highly
dependent. Today’s temperature depends on yesterday’s. The unemployment rate
next quarter is unlikely to be very far from the rate this quarter. If your cases have
a natural order in time, chances are good that you should use special inference
methods for analysis of time series rather than the methods of this chapter.
[Scatterplot; horizontal axis: Phone.]
[Scatterplot: temperature versus Chirps per Second.]
Display 11.25 Temperature and number of chirps per second by a Nemobius
fasciatus fasciatus cricket, as measured electronically. [Source: George W.
Pierce, The Songs of Insects (Cambridge, Mass.: Harvard University Press, 1949),
pp. 12–21.]
a. Fit a straight line to the data that could be used to predict the temperature
from the chirp rate. Interpret the slope.
b. Make a residual plot of (chirp rate, residual) and a dot plot of the
residuals to check conditions.
c. Use the Minitab printout in Display 11.26 to test whether there is a linear
relationship between the chirp rate and the temperature.
Head Circumference (cm)   IQ
54.7                      96
54.2                      89
53                        87
52.9                      87
57.8                      101
56.9                      103
56.6                      103
55.3                      96
53.1                      127
54.8                      126
57.2                      101
57.2                      96
57.2                      93
57.2                      88
55.8                      94
57.2                      85
57.2                      97
56.5                      114
59.2                      113
58.5                      124
Display 11.27 IQ and head circumference. [Source: M. J. Tramo et al., “Brain
Size, Head Size, and IQ in Monozygotic Twins,” Neurology 50 (1998): 1246–52.]
Exercises
E11. Display 11.30 shows the gas mileage (mpg) and horsepower ratings (hp) for
the random sample of car models in E1. The scatterplot and printout for the
regression of mpg versus hp are shown in Display 11.31.
hp    mpg    hp    mpg
110   36     225   23
170   23     255   25
165   20     155   31
93    29     200   27
142   21     70    43
214   28     81    33
115   30     168   28
124   29
Display 11.30 Gas mileage and horsepower ratings.
[Scatterplot: Miles per Gallon versus Horsepower.]
Dependent variable is: MPG
No Selector
R squared = 40.6%   R squared (adjusted) = 36.0%
s = 4.778 with 15 – 2 = 13 degrees of freedom
Source       Sum of Squares   df   Mean Square   F-ratio
Regression   202.759          1    202.759       8.88
Residual     296.841          13   22.8339
Variable     Coefficient   s.e. of Coeff   t-ratio   prob
Constant     38.9805       3.759           10.4      ≤0.0001
Horsepower   –0.069395     0.0233          –2.98     0.0106
a. Locate the estimated standard error of b1.
b. Is there evidence to say that the slope of the true regression line is
different from 0? Use α = 0.05.
c. Find a 90% confidence interval estimate of the slope of the true regression
line. Interpret the result in the context of the variables.
[Scatterplot: mean monthly gas usage versus Mean Temp (°F).]
Predictor    Coef       Stdev     t-ratio   p
Constant     16.988     1.196     14.20     0.000
MeanTemp     –0.23643   0.02401   –9.85     0.000
s = 1.548   R-sq = 84.3%   R-sq(adj) = 83.5%
Analysis of Variance
Source       DF   SS       MS       F       p
Regression   1    232.45   232.45   97.01   0.000
Error        18   43.13    2.40
Total        19   275.58
Display 11.32 Scatterplot and computer printout for mean monthly gas usage
versus mean monthly temperature.
a. Locate the estimated standard error of the slope.
b. Is there sufficient evidence to say that the slope of the true regression
line is different from 0? Use α = 0.05.
c. Find a 90% confidence interval estimate of the slope of the true regression
line. Interpret the result in the context of the variables.
rpm    mpg    rpm    mpg
5200   36     6000   23
4800   23     6500   25
4000   20     6000   31
4800   29     5000   27
5000   21     6000   43
5800   28     5500   33
5500   30     6200   28
6000   29
Display 11.33 Gas mileage and maximum revolutions per minute for the car models
problem.
E14. Refer to the data on mean monthly electricity usage (in KWH) and mean
monthly temperature (in degrees Fahrenheit) for a single-family residence over a
sample of months in Display 11.15 on pages 752–753. Is there evidence that the
mean monthly electricity usage is a linear function of the mean monthly
temperature?
a. Do all four steps of the test of significance.
b. Would you say that this is strong evidence of a linear relationship or
evidence of a strong linear relationship? Why?
E15. Refer to the data on chirp rate in P13.
a. Construct (and interpret, as always) a 95% confidence interval for the slope
of the true regression line
i. for predicting the temperature from the chirp rate
ii. for predicting the chirp rate from the temperature
b. Explain why the interval widths in part a are not the same.
[Four scatterplots, each with a residual plot:
Distance from Earth versus Velocity of Black Hole: slope 0.105, intercept
magnitude 1.6; r² = 0.23
Price versus Miles per Gallon: price = –0.747 mpg + 39; r² = 0.30
HCB versus Aldrin: HCB = 0.472 aldrin + 3.0; r² = 0.54
Crime Rate versus Total Population: cr = –0.000146 pop + 10,300; r² = 0.0022]
Length   Weight
11.1     0.7
12.1     0.9
13       1.2
14.3     1.6
15.8     2.3
16.6     2.6
17.1     3.1
17.8     3.5
18.8     4
19.7     4.9
[Scatterplot and residual plot: Weight versus Length. Fitted line:
weight = 0.429 length − 4.2; r² = 0.95]
You can see from the plot that the relationship between mean weight and length
should not be modeled by a straight line. You can also figure this out by
thinking about the physical situation. Length is a linear measure, and weight is
more closely connected to volume, a cubic measure. As you learned in Chapter 3,
you can linearize power functions by taking the logarithm of each value of x and
of each value of y. (You can use either base 10 logarithms or natural logarithms
for your change of scale.) Replacing (x, y) with (log x, log y) or (ln x, ln y)
is called a log-log transformation. As you can see from the scatterplot,
residual plot, and plot of the residuals in Display 11.40, ln(weight) versus
ln(length) is linear, and the residuals are small and scattered randomly about
the line.
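A log-log fit of this kind takes only a few lines. The sketch below fits ln(weight) on ln(length) using the ten (length, weight) pairs visible in the data table; the original table may contain more rows than that, so the fitted values should be read as approximate rather than as a reproduction of the printed regression. The variable names are illustrative.

```python
import math

# (length, weight) pairs as they appear in the data table; the full data set
# may include rows not shown here, which is an assumption of this sketch.
length = [11.1, 12.1, 13, 14.3, 15.8, 16.6, 17.1, 17.8, 18.8, 19.7]
weight = [0.7, 0.9, 1.2, 1.6, 2.3, 2.6, 3.1, 3.5, 4, 4.9]

ln_x = [math.log(v) for v in length]   # natural log of each length
ln_y = [math.log(v) for v in weight]   # natural log of each weight

n = len(ln_x)
x_bar = sum(ln_x) / n
y_bar = sum(ln_y) / n
sxx = sum((x - x_bar) ** 2 for x in ln_x)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ln_x, ln_y))

slope = sxy / sxx                      # exponent of the power relationship
intercept = y_bar - slope * x_bar
residuals = [y - (intercept + slope * x) for x, y in zip(ln_x, ln_y)]

print(f"ln(weight) = {slope:.2f} ln(length) + ({intercept:.2f})")
print(f"largest |residual| = {max(abs(r) for r in residuals):.3f}")
```

A slope near 3 on the log-log scale is what the physical argument suggests: weight behaves roughly like the cube of length.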
[Display 11.40: scatterplot of ln(weight) versus ln(length), residual plot, and
plot of the residuals. Fitted line: ln(weight) = 3.38 ln(length) − 8.5;
r² = 1.00]
\[ b_1 \pm t^* \cdot s_{b_1} = 3.38051 \pm (2.262)(0.0353) \]
Display 11.41 Data on population and economic variables for women in a random sample
of countries from around the world. [Source: United Nations Department of Economic
and Social Affairs, The World’s Women 2000: Trends and Statistics, unstats.un.org.]
Solution
The plots in Display 11.42 show the regression of the infant mortality rate for
girls (img) versus the fertility rate ( fr). The association is positive and looks as
if it follows a linear trend, but the plot is heteroscedastic—countries with larger
fertility rates tend to have larger variation in the infant mortality rate for girls.
Further, the boxplot of the residuals is skewed left and has four outliers.
[Scatterplot, residual plot, and boxplot of the residuals: Infant Mortality Rate
for Girls versus Fertility Rate. Fitted line: img = 17.1 fr − 21; r² = 0.69]
[Two scatterplots, each with a residual plot:
log(img) versus Fertility Rate: log(img) = 0.195 fr + 0.68; r² = 0.68
log(img) versus log(fr): log(img) = 1.45 log(fr) + 0.67; r² = 0.66]
Yogi: I have got a great idea! Forget all of this transformation stuff. I
just go to the Stat Calc menu on my calculator and fit every function
in the list: linear, quadratic, cubic, quartic, log, exponential,
power, logistic (whatever that is), and sine. Then I see which gives
me the largest value of r. That’s a whole lot easier than “log this”
and “log that.”
(Logistic functions model the spread of an epidemic, for example. They have the
form \( y = \dfrac{c}{1 + a e^{-bx}} \).)
Shark: Chapter 3 was a long time ago, but . . .
Yogi: Oh, right. They wouldn’t let me do that there either. Remind me
again why not. After all, r does tell me how closely the points
cluster about my function.
Shark: You can get a very high value of r even though the equation
you used isn’t a good fit to the data. Look at Display 11.39 on
largemouth bass. It gives a value for r² of 0.95. That tells you that
the points cluster closely to the line. But, as you saw, a line isn’t a
good model for these data. They clearly follow a curve, not a line.
You can see that best from the residual plot—not from the value
of r. Also, you can get a very low value of r even though the points
form a fat elliptical cloud and a line is a perfectly appropriate
model.
Yogi: Well, okay. I see why I shouldn’t pay much attention to r. But if I
think the points follow an exponential curve, why can’t I just use
my calculator to fit an exponential equation instead of converting
all the y’s to log y’s and fitting a straight line? I promise to check
the residual plot.
Shark: Good statisticians transform for linearity because linear functions
are simpler to deal with than curves—and simpler to understand.
You know pretty well what the slope and y-intercept mean for a
line, but it’s much harder to interpret the parameters for other
types of equations.
Yogi: Hmmm. You must be right, because my calculator only tests for
the significance of a slope for a line! And the statistical software
we are using only fits lines, not other types of functions.
Shark: Good point.
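Shark’s two points can be checked numerically. The sketch below is only an illustration, not a procedure from the text: the data are made up to follow an exact exponential curve, and the helper name `fit_line` is an arbitrary choice. It fits a least squares line first to (x, y) and then to (x, log y), and reports r² together with the signs of the residuals.

```python
import math

def fit_line(xs, ys):
    """Least squares line: returns slope, intercept, r squared, residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, r_squared, residuals

# Hypothetical data (not from the text): y grows by 30% for each unit of x.
xs = list(range(1, 11))
ys = [2 * 1.3 ** x for x in xs]

# Fit a line to the untransformed data and record the residual signs.
slope_raw, intercept_raw, r2_raw, res_raw = fit_line(xs, ys)
signs_raw = "".join("+" if e > 0 else "-" for e in res_raw)

# Fit a line to (x, log y), the transformation Shark recommends.
log_ys = [math.log10(y) for y in ys]
slope_log, intercept_log, r2_log, res_log = fit_line(xs, log_ys)

print(f"line on y:      r^2 = {r2_raw:.3f}, residual signs = {signs_raw}")
print(f"line on log(y): r^2 = {r2_log:.3f}, slope = {slope_log:.4f}")
print(f"10**slope = {10 ** slope_log:.2f} (growth factor per unit of x)")
```

For these made-up data the first fit has a high r² (about 0.9), yet its residuals run positive, then negative, then positive, which is the curved pattern a residual plot would reveal. After the log transformation the slope has a direct reading: 10 raised to the slope is the factor by which y is multiplied for each unit increase in x.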
[Displays: Percentage of renters (pr) versus total population (pop) for the largest
U.S. cities, each with a scatterplot, a residual plot, a boxplot of the residuals,
and a test of the slope.]

Test of Largest U.S. Cities
Response attribute (numeric): PercentRenters
Predictor attribute (numeric): TotalPopulation
Sample count: 77
If it were true that the slope of the regression line were equal to 0 (the null
hypothesis), and the sampling process were performed repeatedly, the probability
of getting a value for Student's t with an absolute value this great or greater
would be 0.0054.
pr = 0.00000292 pop + 48; r² = 0.099

Sample count: 72
If it were true that the slope of the regression line were equal to 0 (the null
hypothesis), and the sampling process were performed repeatedly, the probability
of getting a value for Student's t with an absolute value this great or greater
would be 0.97.
pr = 0.000000164 pop + 49.6; r² = 0.000019
a. Compare the two sets of results. Did eliminating the five largest cities
improve the conditions for inference? How influential did the five cities
turn out to be?
[Display: Scatterplot of log(renters) versus log(population), with a test of the slope.]
Predictor attribute (numeric): logPopulation
Sample count: 72
Equation of least-squares regression line:
logRenters = 0.0109538 logPopulation + 1.6295
Alternative hypothesis: The slope of the least squares regression line is not equal to 0.
[Display 11.47 plots: three models for the black crappie data.
(length, weight): weight = 0.147 length − 0.70; r² = 0.93
(length, ln(weight)): ln(weight) = 0.320 length − 3.7; r² = 0.99
(ln(length), ln(weight)): ln(weight) = 2.51 ln(length) − 6.17; r² = 0.99]
Display 11.47 Scatterplots, residual plots, and dot plots of the residuals
for three models for lengths and weights of black crappies.
Exercises
E23. The scatterplot in Display 11.38, part IV, on page 774 shows the crime rate (cr)
plotted against the population (pop) for the 76 largest U.S. cities for which data
were available. A test of the significance of the slope of the regression line has
a P-value of 0.69 with df = 74 and t = 0.4053. The five largest cities were removed,
and Display 11.51 (on the next page) shows the analyses with the original scale and
with log transformations of both variables.
a. Compare these plots with the plots from the complete set of cities in Display
11.38, part IV. Did eliminating the five
[Display 11.51 plots: scatterplots with regression lines and residual plots for
(pop, cr) and (log(pop), log(cr)).
cr = 0.000838 pop + 10,000; r² = 0.0042
log(cr) = 0.0523 log(pop) + 3.7; r² = 0.0075]

Test of Largest U.S. Cities-5 Test Slope
Response attribute (numeric): CrimeRate
Predictor attribute (numeric): TotalPopulation
Sample count: 72
The test statistic, Student's t, is 0.5421. There are 70 degrees of freedom (two
less than the sample size).
If it were true that the slope of the regression line were equal to 0 (the null
hypothesis), and the sampling process were performed repeatedly, the probability
of getting a value for Student's t with an absolute value this great or greater
would be 0.59.

Test of Largest U.S. Cities-5 Test Slope
Response attribute (numeric): logCrimeRate
Predictor attribute (numeric): logPopulation
Sample count: 72
The test statistic, Student's t, is 0.7289. There are 70 degrees of freedom (two
less than the sample size).
If it were true that the slope of the regression line were equal to 0 (the null
hypothesis), and the sampling process were performed repeatedly, the probability
of getting a value for Student's t with an absolute value this great or greater
would be 0.47.
Display 11.51 Scatterplot with regression line, residual plot, and boxplot of residuals for
(population, crime rate) and (log(population), log(crime rate)) for U.S. cities
with the four largest cities removed.
Display 11.52 Graduation rates and student-teacher ratios. [Sources: U.S. Census
Bureau, Statistical Abstract of the United States, 2004–2005; and
NCES Report 11247516.]
[Plots: scatterplot with regression line and residual plot of graduation rate versus
student-teacher ratio (ST_Ratio).
Graduation_Rate = -0.603ST_Ratio + 84.5; r² = 0.037]

Sample count: 50
Equation: Graduation_Rate = -0.602943 ST_Ratio + 84.496
Ho: Slope = 0
Ha: Slope is not equal to 0
Student's t: -1.363
DF: 48
P-value: 0.18
Display 11.53 Scatterplots and analyses of (student-teacher ratio, graduation rate) and
(ln(student-teacher ratio), ln(graduation rate)).
E25. How does the number of police officers in a state relate to the rate of violent
crime?
a. For the sample of states shown in Display 11.54 (on the next page), find a
good-fitting model relating the number of police officers to the violent crime rate.
b. Construct a 95% confidence interval for the slope and interpret it. (Note that the
number of states sampled is greater than 10% of the population, which consists of
the 50 states. The only problem caused by that fact is that the estimate of
$s_{b_1}$ will be a little larger than it should be.)
E26. How do violent crime rates relate to property crime rates in U.S. cities?
Display 11.55 (on the next page) gives these rates (in terms of crimes per 100,000
population) for a recent year.
a. For the sample of cities shown in Display 11.55, find a good-fitting model relating
the violent crime rate to the property crime rate.
b. Construct and interpret a 95% confidence interval estimate of the slope of the true
regression line.
Display 11.57 Sintering time and weight of wax from a sintering process. [Source:
R. Scheaffer and James McClave, Probability and Statistics for Engineers (Boston:
Duxbury Press, 1995), p. 536.]

E29. At the age of 31, Galileo conducted a series of experiments on the path of
projectiles that eventually led to his formulation of theories for the motion of
falling bodies. In Experiment 1, a ball was released at a set height on an inclined
ramp and allowed to roll down a groove set into the ramp. After leaving the ramp,
the ball fell to the floor unobstructed. The release height (H) and the horizontal
distance traveled (D) between leaving the ramp and hitting the floor were recorded,
both measured in punti.
Realizing that the ball leaving the ramp in the first experiment had a downward
velocity when it left the ramp, Galileo carried out Experiment 2, in which a narrow
horizontal shelf was placed at the end of the ramp. When the ball reached the edge
of the inclined plane, it rolled across the shelf before starting its fall,
neutralizing the downward force. The data for both experiments are shown in
Display 11.58.
a. For Experiment 2, set up a plot to predict distance from height. Would a simple
linear model be a good fit for these data? If not, find a transformation that will
linearize the relationship.
b. Fit a linear equation to the transformed data of part a and estimate the slope

495 600 1328 800
451 450 1172 600
395 300 800 300
337 200
253 100
Display 11.58 Galileo’s (distance, height) data for two experiments. [Source:
D. A. Dickey and T. Arnold, “Teaching Statistics with Data of Historic Significance:
Galileo’s Gravity and Motion Experiments,” Journal of Statistics Education 3,
no. 1 (1995).]

E30. Rivers and streams carry sediment (small particles of rock and mineral) downhill
as they flow. It seems reasonable that fast-moving streams would carry larger
particles than would slower-moving streams. Knowing the relationship between the
speed of the water and the size of the sediment particles would be of great value to
those studying, for example, the effects of dikes and buildings on stream flow and
the resulting sediment carried off by the stream. Display 11.59 shows the sample data
on the diameters of particles moved and the speed of the water moving them.
a. Set up a plot to predict the size of objects moved from the speed of the current.
Is a linear model a good fit? If not, find a transformation that linearizes the
relationship.
b. Fit a line to the transformed data of part a and estimate the slope in a 95%
confidence interval. Interpret the observed slope and the confidence interval.
788 Chapter 11 Inference for Regression
Diameter of Objects   Speed of Current   Classification
Moved (mm)            (m/s)              of Objects
0.2                   0.10               Mud
1.3                   0.25               Sand
5                     0.50               Gravel
11                    0.75               Coarse gravel
20                    1.00               Pebbles
45                    1.50               Small stones
80                    2.50               Large stones
180                   3.50               Boulders
Display 11.59 Diameters of particles and the speed of moving water. [Source:
www.seattlecentral.edu.]

E31. Data on the world’s women can be found in Display 11.41 on page 777.
Display 11.60 shows a regression analysis of the relationship between fertility rate
and life expectancy.
a. Do the data appear to meet the conditions for a regression analysis?
b. Verify the computations in the test of significance for the slope in Display
11.60. Interpret the result of the test.
c. Can you infer that an increase in life expectancy causes the fertility rate to
decrease?
d. Does the scatterplot suggest a different way of looking at these data?
[Display 11.60 plots: scatterplot with regression line and residual plot of fertility
rate versus life expectancy.
Fertility Rate = –0.140 Life Expectancy + 13; r² = 0.68]

E32. Display 11.61 shows a regression analysis of the percentage of parliamentary
seats in a single or lower chamber occupied by women (ps) versus the girls’ share of
secondary school enrollment (se) from the world’s women data on page 777. Perform a
significance test for the slope. Does it agree with the results shown? Make the
necessary dot plot of the residuals by estimating them from the plots. Interpret
your results.
[Display 11.61 plots: scatterplot with regression line and residual plot of ps versus
Girls’ Secondary Education (%).
ps = 0.633 se − 16; r² = 0.14]
Review Exercises
E33. Study the scatterplots in Display 11.62.
[Display 11.62: Five scatterplots of y versus x, labeled I through V.]
a. In which scatterplots is it reasonable to model the relationship between y and x
with a straight line?
b. If you fit a line through each scatterplot by the method of least squares, which
plot will give a line with slope closest to 0?
c. Which plot shows a correlation coefficient closest to 1?
d. For each scatterplot that does not look as if it should be modeled by a straight
line, suggest a way to modify the data to make the shape of the plot more nearly
linear.
E34. More on pesticides in the Wolf River. Display 11.63 shows four scatterplots of
HCB concentration versus aldrin concentration. The four scatterplots are for the
measurements taken on the bottom, at mid-depth, at the surface, and for all three
locations together. Based on the plots, which depth do you expect to give the narrowest
[Display 11.63 plots: HCB versus aldrin at the bottom, at mid-depth, at the surface,
and for all three locations together.]
Display 11.63 Four scatterplots of HCB concentration versus aldrin concentration.
[Source: R. V. Hogg and J. Ledolter, Engineering Statistics (New York: Macmillan,
1987). Original source: P. R. Jaffe, F. L. Parker, and D. J. Wilson, “Distribution
of Toxic Substances in Rivers,” Journal of the Environmental Engineering Division
108 (1982): 639–49.]

E35. “If I change to a brand of pizza with lower fat, will I also reduce the number
of calories?” Display 11.64 shows the printout for a significance test of calories
versus fat per 5-oz serving for 17 popular brands of pizza.

Calories = 241 + 7.26 Fat
Predictor   Coef     Stdev   t-ratio   p
Constant    240.55   14.88   16.17     0.000
Fat         7.263    1.064   6.82      0.000
s = 14.36   R-sq = 75.6%   R-sq(adj) = 74.0%
Analysis of Variance
Source       DF   SS        MS       F       p
Regression   1    9605.9    9605.9   46.55   0.000
Error        15   3095.2    206.3
Total        16   12701.1
Display 11.64 Regression analysis for pizza data.

a. Use Display 11.64 to estimate, in a 95% confidence interval, the reduction in
calories you can expect per 1-g decrease in the fat content of the pizza.
b. Use your answer to part a to estimate, in a 95% confidence interval, the reduction
in calories you can expect per 5-g decrease in the fat content of the pizza.

of detective Allan Pinkerton in using spying

State            Actual Regiments   Estimate
Alabama          11                 18
Arkansas         2                  2
Florida          1                  2
Georgia          22                 18
Kentucky         1                  4
Louisiana        11                 18
Maryland         1                  4
Mississippi      10                 12
North Carolina   15                 14
South Carolina   9                  18
Tennessee        7                  8
Texas            3                  3
Virginia         50                 45
Display 11.65 Actual and estimated numbers of Confederate regiments. [Source: Chris
Olsen, “Was Pinkerton Right?” STATS (Winter 2000): 24.]

a. Plot the data, with Pinkerton’s number as the explanatory variable. Does the plot
have a definite linear trend? If so, fit a least squares regression line to the data.
b. Although these observational data cover all states with Confederate regiments,
inference might still be meaningful in deciding if a slope of this size could have
happened merely by chance. Conduct a test to see if the observed slope was likely to
occur by chance. What is your conclusion?
c. Interpret the slope of the least squares line in part a. Estimate plausible values
for the “true” slope in a 95% confidence interval. If Pinkerton was on target with
Title                          Year   Length   Rating
Action in the North Atlantic   1943   127      3
A Ticklish Affair              1963   89       2
Four Jills in a Jeep           1944   89       2.5
Blaze                          1989   119      2.5
Hitler—Dead or Alive           1943   70       2
Benson Murder Case             1930   69       2.5
City Lights                    1985   85       1
Galileo                        1973   145      3
Display 11.66 Film lengths and ratings. [Source: Thomas L. Moore, “Paradoxes in Film
Ratings,” Journal of Statistics Education 14, no. 1 (2006).]

[Display 11.67 plot: Rating versus Year (1930 to 1990), with separate regression lines
for short and long movies.
Rating = -0.02954Year + 59.7; r² = 0.54
Rating = -0.0146Year + 31.3; r² = 0.30]
Display 11.67 Ratings by year, separated by short and long movies.
AP1. A statistics exam has two parts, free response and multiple choice. A regression
equation for predicting the score, f, on the free response part, from the score, m,
on the multiple choice part is f = 50 + 0.25m. This equation was based on the scores
of 17 students. The standard deviation of their multiple choice scores was 30, the
standard deviation of their free response scores was 16, and the sum of the squared
residuals was 2940. What is the estimated standard error of the slope?
   0.117
   0.133
   0.219
   0.467
   3.5
AP2. Which of the following is not an important condition to check before
constructing a confidence interval for the slope of the true regression line?
   You have a simple random sample.
   The points fall in an elliptical cloud.
   The residuals for small values of x have about the same variability as the
   residuals for large values of x.
   The sum of the squared residuals is small.
   The residuals are approximately normally distributed.
AP3. On the first day of statistics class, data on shoe size and height were gathered
for 82 randomly selected female high school seniors. Part of a regression analysis is
given below. Which of the following is the best interpretation of the P-value for
shoe size?

Predictor   Coefficient   Std Error   t statistic   P value
Constant    55.4174       1.1681      47.443
ShoeSize    1.2116        0.1438      8.425         0.0000
s = 1.87862

   An error has been made in the data collection or analysis.
   There is statistically significant evidence of a non-zero slope in the true linear
   relationship between shoe size and height.
   There isn’t statistically significant evidence of a non-zero slope in the true
   linear relationship between shoe size and height.
   A confidence interval for the slope would have a width of 0.
   A confidence interval for the slope would include 0.
AP4. Refer to AP3. What is the best explanation of what is measured by s = 1.87862?
   the variability in the slope from sample to sample
   the variability in the y-intercept from sample to sample
   the variability in the shoe sizes
   the variability in the heights
   the variability in the residuals
AP5. Refer to AP3. What is the best explanation of what is measured by
Std Error = 0.1438?
   the variability in the slope from sample to sample
   the variability in the y-intercept from sample to sample
   the variability in the shoe sizes
   the variability in the heights
   the variability in the residuals
AP6. Refer to AP3. Which of the following is the appropriate computation for a 95%
confidence interval for the slope?
   $1.2116 \pm 1.990 \cdot 1.87862$
   $1.2116 \pm 1.990 \cdot 0.1438$
   $1.2116 \pm 1.96 \cdot 0.1438/\sqrt{80}$
   $1.2116 \pm 1.990 \cdot 0.1438/\sqrt{80}$
   $1.2116 \pm 1.990 \cdot 1.87862/\sqrt{80}$
AP7. In an attempt to predict adult heights, researchers randomly selected men and
collected their heights at age two and their adult heights, then computed a least
squares regression equation, adult height = 1.9 · height at age two + 3.5. A 95%
confidence interval for the slope was given as (1.8, 2.0). Which of the following is
the best interpretation of this confidence interval?
   If the researchers took 100 more random samples, they would expect 95 of the
   regression equations to have a slope between 1.8 and 2.0.
   95% of two-year-old boys have heights between 1.8 and 2.0.
   The researchers are pretty sure that if they studied all men, the slope of the
   true regression line would be between 1.8 and 2.0.
   If a two-year-old boy’s height is known, there’s a 95% chance that his adult
   height will be between 1.8 and 2.0 times his current height.
   In predicting the height of an adult man using his height at age two, 95% of the
   errors will be between 1.8 and 2.0 inches.
AP8. A friend is doing a regression analysis to predict the mass of a small bag of
fries given the number of fries in the bag. He finds a linear regression equation and
then makes the following residual plot. He asks your advice about whether to proceed
with a test of significance of the slope. Which is the best advice you could give?
[Residual plot: residuals (about -15 to 5) versus Number_Small_Bag (30 to 75).]
   Something’s wrong here. This can’t be a residual plot.
   The relationship is linear. It’s okay to proceed with the significance test.
   The relationship doesn’t have constant variability across all values of x. It’s
   not okay to proceed.
   Try a transformation before proceeding.
   It’s clear even without a significance test that the true slope isn’t 0.

Investigative Tasks
AP9. Do animals make optimal decisions? Tim Penning of Hope College studied the
strategy his dog, Elvis, uses in retrieving a ball thrown into the water of Lake
Michigan. Looking at the diagram in Display 11.70, suppose Tim and Elvis stand on the
edge of the lake at A and the ball is thrown to B in the lake. Elvis could jump into
the water immediately at A and swim to the ball, but he seems to know that he can run
faster than he can swim. He could run all the way to C and then swim the
perpendicular distance to the ball, but that is not the most time-efficient strategy
either. What Elvis actually does is run to a point D and then swim diagonally to the
ball at B. But, does he determine D so as to minimize the time it takes him to get to
the ball?
[Diagram: points A, D, and C, with the distances x and y marked.]
Display 11.70 Diagram of paths Elvis might follow to get from A to B.
Methods of calculus can be used to show that the time is minimized when y is related
to x by the formula
$$y = \frac{x}{\sqrt{r/s + 1}\,\sqrt{r/s - 1}}$$
Source       DF   SS       MS       F       p
Regression   1    14.935   14.935   56.04   0.000
Error        33   8.795    0.267
Total        34   23.730
Display 11.72 Analysis of Elvis’s data.
[Plots: y versus x (0 to 20) with regression line and residual plot; fitted slope
0.196, intercept of magnitude 0.33, r² = 0.63.]
70
South Africa 33.4 2.6
Spain 11.2 13.4 60
12 Statistics in Action: Case Studies
[Chapter-opening plot: Height Difference (10 to 60) by Treatment Number (1 to 7).]
Can you grow bigger flowers by reducing the length of their stems? Plant scientists
designed an experiment to test different growth inhibitors to see which were most
effective in reducing the length of stems, anticipating that reduced stem growth
would enhance the quality of the flower.
Statistics today is big business. Nearly every large commercial enterprise and
government agency in the United States needs employees who understand how
to collect data, analyze it, and report conclusions. The language and techniques of
survey and experimental design are part of politics, medicine, industry, advertising
and marketing, and even, as you will see, flower growing. Consequently,
a statistics course typically is required of college students who major in
mathematics or in fields that use data, such as business, sociology, psychology,
biology, and health science.
Individual plants of the same age were grown under nearly identical conditions,
except for the growth-inhibitor treatment. Each treatment was randomly assigned
to ten plants, whose heights (in centimeters) were measured at the outset of the
experiment (Hti) and after a period of 10 weeks (Htf). (Heights actually were
measured at intervening times as well, but those measurements are not part of
this analysis.)
You can find the raw data for this experiment—the treatment each of the
70 plants received and the height of each plant before and after treatment—in
Display 12.3 on page 802.
Display 12.1 shows the statistical summaries of the growth, Htf − Hti, during
the 10 weeks for each of the seven treatment groups. Notice that the means
fluctuate quite a bit from group to group, as do the variances.
Group Count Mean Median Variance StdDev
01 10 30.8500 29.5000 58.4472 7.64508
02 10 35.9500 38 40.6917 6.37900
03 10 43.0500 44 25.4139 5.04122
04 10 52.7500 53.7500 18.2917 4.27688
05 10 28.4500 29.5000 58.3028 7.63563
06 10 35.9000 38.7500 58.7111 7.66232
07 10 39.7500 40.7500 20.9028 4.57196
Display 12.1 Summary statistics for plant growth, Htf − Hti, for
seven treatment groups.
Display 12.2 shows the boxplots of the differences in height, Htf − Hti, by
treatment. The boxplots give a better view of the key features of the data than does
the summary table.
[Display 12.2 plot: boxplots of Difference in Height (10 to 60) by Treatment Number
(1 to 7).]
Display 12.2 Boxplots of the differences in height (in centimeters)
of mums for the seven treatment groups.
Display 12.3 shows the raw data for this experiment.
Display 12.3 The treatment each of the 70 plants received and its
height before and after treatment. [Source: University of
Florida Institute for Food and Agricultural Sciences, 1997.]
NP BDS BLD ELEP INSP MRGP RMS OWN VEH HHL NOC R65
2 5 2 120 1400 8 2 2 1 0 0
4 3 3 100 5 3 1 1 2 0
1 1 3 90 190 200 3 1 0 1 0 1
1 1 3 50 3 3 0 2 0 1
2 2 2 90 2800 1700 7 1 2 1 0 0
4 3 2 100 500 450 5 1 2 1 1 1
4 3 2 100 400 6 2 0 1 1 0
2 2 3 90 460 330 5 1 1 2 0 1
1 3 2 80 2500 1500 4 1 1 1 0 0
1 0 3 110 2 3 1 1 0 0
2 3 1 200 760 6 2 1 1 1 0
1 3 2 80 1200 6 2 1 1 0 1
3 3 2 130 500 380 7 1 0 1 1 0
4 3 1 130 600 380 6 1 2 1 2 0
5 5 2 210 1700 940 9 1 2 1 1 0
1 2 2 200 1200 980 5 1 1 1 0 0
1 1 2 80 3 2 2 1 0 1
2 3 3 120 170 600 5 1 0 2 0 0
5 3 1 120 960 440 6 1 2 2 3 0
1 2 1 70 3 2 1 1 0 0
2 3 2 300 6300 20 7 1 2 1 0 1
2 3 2 270 800 290 7 1 1 1 0 0
2 3 2 120 5 3 1 1 0 1
1 3 2 110 500 860 5 1 0 1 0 0
2 3 3 200 200 5 2 2 1 0 1
2 2 3 60 600 600 5 1 2 1 0 1
1 1 3 60 4 3 1 2 0 0
2 2 3 90 500 760 3 1 2 2 1 0
3 3 2 110 5 3 0 1 2 0
2 3 2 50 600 1300 7 1 2 1 0 0
2 4 2 130 1200 1600 7 1 2 1 0 0
2 2 2 130 150 430 4 1 1 1 0 0
3 4 1 200 460 310 7 1 3 2 1 0
3 1 2 180 490 3 2 1 1 0 0
2 3 2 90 700 900 6 1 2 2 0 0
2 3 3 70 210 1500 5 1 1 1 0 1
1 2 3 50 160 4 2 1 1 0 1
2 2 1 90 1700 430 5 1 1 1 0 1
1 2 1 210 5 3 0 1 0 0
3 4 2 210 1900 1300 8 1 3 1 0 0
[Plots: a distribution of ChiSquareValue (0 to 12); a distribution of RMS (0 to 10);
and a scatterplot with regression line and residual plot of ELEP versus RMS.
ELEP = 17.1RMS + 35; r² = 0.20]
Practice
Inference with Univariate Categorical Data
P7. Suppose someone claims that a quarter of Florida households have a primary
language other than English. Can you refute this claim based on the data in
Display 12.4 on page 804?
P8. Estimate with 95% confidence the proportion of Florida households residing in
apartments or other attached structures.
P9. What plausible proportions of Florida households own their home outright?

Inference with Bivariate Categorical Data
P10. Construct a two-way table of BLD versus OWN. By looking carefully at your
two-way table but not actually conducting a test, does it appear that these two
variables are associated? If so, explain the nature of the association.
P11. In some cultures it is common to have extended families living in the same
household, which might suggest that a greater proportion of non–English speaking
households would contain an older resident. Do the data in Display 12.4 contain any
evidence of this? Which two equivalent significance tests could you use to answer
this question? What concerns do you have, if any, about conducting and interpreting
these tests?
P12. Is there a significant association between primary language spoken in the
household (HHL) and the type of building in which the household is found (BLD)? If
so, explain the nature of the apparent association. Are you concerned about the
accuracy of the reported P-value here? Why or why not?
P13. Display 12.9 (on the next page) shows 200 runs of the randomization distribution
for χ² values with BLD randomly paired with HHL.
a. Describe how this distribution was constructed.
b. Does this distribution alleviate any concerns you might have about the results of
P12? Explain.
[Plot: distribution of ChiSquareValue, 0 to 12.]

Inference with Univariate Measurement Data
P14. Estimate the mean number of people per household in Florida, with 90%
confidence. Be sure to check the conditions first.
P15. Focus on the monthly mortgage payments, for those households that have them.
a. Plot the data on monthly mortgage payments and describe the distribution. Do these
data appear to meet the conditions for inference for a mean? Will a transformation
help?
b. In 2003, the mean monthly mortgage payment nationwide was $840. Is there
statistically significant evidence that the mean was less than that in Florida?
P16. Florida is a hot spot for hurricanes, so hazard insurance for a person’s place
of residence is highly recommended. Consider the yearly amounts paid for hazard
insurance in the sampled households in Display 12.4.
a. Plot the data on insurance payments and describe the distribution. Is a
transformation needed here, in order to allow you to make inferences about the mean?
If so, find a transformation that works well.
b. Estimate the population mean of the appropriately transformed data in a 95%
confidence interval.
P17. Estimate the difference between the mean number of rooms in detached houses and
in apartments in a 95% confidence interval.

Inference with Bivariate Measurement Data
P18. Interpret the slope of the regression line in Display 12.8.
P19. Is the slope in Display 12.8 significantly different from 0? Be sure to do all
steps of the significance test.
P20. Make an appropriate plot and study the relationship between the number of people
and the number of bedrooms in the households in Display 12.4 on page 804. Would you
say that the number of bedrooms in a residence is a strong predictor of the number of
people in a household? Explain your reasoning.
P21. One possible use of the data in Display 12.4 is to build a model to predict the
number of vehicles that might be associated with a household.
a. Which is the better predictor of the number of vehicles attached to a household:
the number of people, or the number of rooms?
b. Test for the significance of the slope in each relationship in part a. What are
your conclusions? Give a possible practical reason for this state of affairs.
[Plots: Percent Wins versus payroll with regression lines, residual plots, and
boxplots of Residual (payroll, percent wins), including a panel labeled National
League.]

Dependent variable is: %Wins
No Selector
R squared = 11.4%   R squared (adjusted) = 8.3%
s = 77.09 with 30 – 2 = 28 degrees of freedom

Dependent variable is: %Wins
cases selected according to Selected MLB payroll-tab
30 total cases of which 16 are missing
R squared = 4.7%   R squared (adjusted) = –3.2%
s = 99.54 with 14 – 2 = 12 degrees of freedom
Display 12.12 Percent wins versus payroll for the American League and the National League.
12.3 Baseball: Does Money Buy Success? 813
DISCUSSION Differences Between the Leagues
D14. Compare the plots in Display 12.12 for the two leagues. What is it about
these data that produces the drastic difference in results for the two leagues?
D15. Besides having a random sample, what other conditions need to be met for a
regression analysis? From the plots in Display 12.12, does it seem reasonable
to assume that these other conditions are met?
Simulating a P-Value
The null hypothesis for percent wins versus payroll is this:
The observed slope is no farther from 0 than you would be reasonably likely
to get if you randomly reassigned the values of percent wins to different teams
while keeping each team’s payroll fixed.
The P-value for the test of significance for the slope measures how unusual the
observed slope would be under those conditions. You can estimate the P-value
by repeatedly rearranging the y-values and observing what happens to the
slopes. Using this idea, Display 12.13 shows a set of 100 slopes found by 100
rerandomizations of the values of percent wins for the American League.
Stem-and-leaf of slopes for American League
Leaf Unit = 0.10
1 –2 2
7 –1 976655
18 –1 44433211000
31 –0 9988877665555
(22) –0 4333333333322211110000
47 0 000001112233333444
29 0 55566677888899
15 1 0011122444
5 1 779
2 2 01
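The rerandomization idea described above can be sketched in a few lines of code. This is a minimal sketch under stated assumptions: the payroll and percent-wins values are made up for illustration (they are not the data behind Display 12.13), and the function names, the number of runs, and the seed are arbitrary choices.

```python
import random

def ls_slope(xs, ys):
    """Slope of the least squares line for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def randomization_p_value(xs, ys, runs=1000, seed=1):
    """Estimate a two-sided P-value by rerandomizing the y-values."""
    rng = random.Random(seed)
    observed = ls_slope(xs, ys)
    shuffled = list(ys)
    as_extreme = 0
    for _ in range(runs):
        rng.shuffle(shuffled)  # reassign y-values; each x-value stays fixed
        if abs(ls_slope(xs, shuffled)) >= abs(observed):
            as_extreme += 1
    return observed, as_extreme / runs

# Hypothetical data for 14 teams: payroll ($ millions) and percent wins
# (in tenths of a percent). These are NOT the values from the text.
payroll = [35, 38, 42, 47, 49, 53, 56, 65, 75, 77, 79, 88, 92, 110]
pct_wins = [401, 525, 383, 630, 451, 407, 463, 512, 716, 562, 494, 509, 562, 594]

observed_slope, p_value = randomization_p_value(payroll, pct_wins)
print(f"observed slope = {observed_slope:.2f}")
print(f"estimated P-value = {p_value:.3f}")
```

The estimated P-value is the fraction of rerandomized slopes at least as far from 0 as the observed slope. With only 100 runs, as in the stemplot, the estimate is coarse; more runs give a steadier estimate.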
Ecological Correlations
Each of the two major leagues is divided into three divisions, so another way
of looking at the baseball data is to explore what happens at the division level.
Display 12.15 shows the averages for the data in Display 12.10 on page 811 by
division. (All teams play a 162-game schedule, with similar numbers at bat for the
season, so it is fair to take simple averages of team batting averages and winning
percentages.)
           Average Payroll   Average Attendance   Batting   % Wins (in tenths
Division   ($ millions)      (in thousands)       Average   of a percent)
AC         52.8              2037.6               269       481
AE         84.5              2407.2               260       474
AW         61.1              2618.5               272       565
NC         57.1              2653.7               260       484
NE         59.4              1818.6               257       494
NW         72.6              2913.6               266       519
Ecological Fallacy
Ecological correlations use groups as cases and averages as variables. For many
situations, you get quite different values than you would if you used individuals
as cases. The mistake of using ecological correlations to support conclusions
about individuals is called the ecological fallacy.
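As an illustration of an ecological correlation, the sketch below treats the six divisions as cases and computes the correlation between average payroll and percent wins, using the division averages tabulated above. The function name `correlation` is an arbitrary choice, and the team-level data of Display 12.10 are not reproduced here, so the comparison with the team-level correlation is left as a separate step.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Division averages from the table above (groups as cases, averages as variables).
divisions = ["AC", "AE", "AW", "NC", "NE", "NW"]
avg_payroll = [52.8, 84.5, 61.1, 57.1, 59.4, 72.6]  # $ millions
pct_wins = [481, 474, 565, 484, 494, 519]           # tenths of a percent

r_divisions = correlation(avg_payroll, pct_wins)
print(f"ecological correlation (6 divisions as cases): r = {r_divisions:.3f}")
```

Running the same function on the 30 individual teams would give the team-level correlation; the two values need not agree, which is the point of the warning about the ecological fallacy.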
Of course, the team statistics are themselves averages (or totals), and we
could have analyzed the relationship between salaries and batting averages by
using individual players as cases. For the purpose of studying teams as business
entities—taking into account variables such as attendance and percent wins—it
makes sense to use teams as cases. Team statistics are of little use, however, in
studying player performance. So the choice of what to use as cases depends on
the objectives of the study.
Practice
Differences Between the Leagues
For P23–P25: Analyze the given relationship
a. for all teams
b. for the American League
c. for the National League
d. Compare the three relationships.
P23. payroll versus attendance
P24. batting average versus payroll
P25. percent wins versus batting average
P26. Use an appropriate test to answer these questions about means.
a. Is the difference between mean attendance for the two leagues statistically significant?
b. Is the difference between mean batting average for the two leagues statistically significant?
P27. In the 2001 World Series, the New York Yankees lost to the Arizona Diamondbacks. Is the difference in their percentage of wins during the regular 162-game 2001 season statistically significant?
P28. For the 2001 regular season of 162 games, the three National League division winners were Atlanta, St. Louis, and Arizona. For the American League, the division winners were New York, Cleveland, and Seattle.
a. Are there statistically significant differences among the proportions of games won by the three National League division winners?
b. Are there statistically significant differences among the proportions of games won by the three American League division winners?
12.4 Martin v. Westvaco Revisited: Testing for Discrimination Against Employees 817
Comparing Termination Rates for Two Age Groups
By law, all employees age 40 or older belong to what is called a “protected class”:
To discriminate against them on the basis of their age is against the law. At the
time of the layoffs at Westvaco, 36 of the 50 people working in the engineering
department were age 40 or older. A total of 28 workers were terminated, and
21 of them were age 40 or older.
              Retained   Laid Off   Total
Under 40          7          7        14
40 or Older      15         21        36
Total            22         28        50
This mismatch does not make the test invalid, however. You can still use the
test to answer this question: “If the process had been random, how likely would it
have been to get a difference in proportions as big as the one Westvaco got just by
chance?” As long as it is made clear that this is the question being answered, the
test is valid and can be very informative.
So you can proceed with a significance test, but you must make the limitations
of what you are doing very clear. If you reject the null hypothesis, all you can
conclude is that something happened that can’t reasonably be attributed to
chance alone.
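One way such a significance test might be carried out is sketched below as a z-test for the difference between two proportions, using the counts stated earlier in this section (21 of 36 employees age 40 or older terminated, and therefore 28 − 21 = 7 of the 14 younger employees). The one-sided alternative (a higher termination rate for older employees) and the function names are assumptions made for this illustration.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Cumulative probability below z for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z-statistic and one-sided (upper-tail) P-value for p1 - p2."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 1 - normal_cdf(z)

# Group 1: age 40 or older (21 of 36 terminated)
# Group 2: under 40 (7 of 14 terminated)
z, p_value = two_proportion_z(21, 36, 7, 14)
print(f"z = {z:.3f}, one-sided P-value = {p_value:.3f}")
```

The P-value answers only the question posed above: how likely a difference this large would be if the terminations had been made at random.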
Another alternative is to use Fisher’s exact test, in which the sampling
distribution of the difference of two proportions is constructed exactly rather than
relying on a normal approximation. Fisher’s exact test requires few assumptions
and uses no approximations. In contrast, the z-test for the difference between two
proportions is an approximation and requires strong assumptions. So why don’t
we always use Fisher’s exact test? We can, but it requires some computing power.
You saw a simulation of this test in E19 in Chapter 1. As technology becomes
more powerful, statisticians increasingly are turning to methods such as Fisher’s
exact test rather than using approximations based on the normal distribution.
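The exact calculation can be sketched directly from counting arguments. If 28 of the 50 employees are chosen at random for termination, the number chosen from the 36 who are age 40 or older follows a hypergeometric distribution. The code below is a minimal illustration under that assumption, with a one-sided alternative; the function names are invented for this sketch.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k of the K 'older' employees are among n chosen from N)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def fisher_one_sided(k_obs, N, K, n):
    """P(k_obs or more of the K 'older' employees are among the n chosen)."""
    largest = min(K, n)
    return sum(hypergeom_pmf(k, N, K, n) for k in range(k_obs, largest + 1))

# Westvaco counts from this section: 50 employees, 36 age 40 or older,
# 28 terminated, 21 of the terminated age 40 or older
p_value = fisher_one_sided(21, N=50, K=36, n=28)
print(f"one-sided exact P-value = {p_value:.3f}")
```

No normal approximation is involved: every probability comes from counting the ways the 28 terminations could have fallen.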
DISCUSSION Conditions Rarely Match Reality
D22. Evaluate the conditions necessary for your significance test in D21, part c.
Give a careful statement of the conclusion you can make.
D23. Agree or disagree, and tell why: “A hypothesis test is based on a probability
model. Like all probability models, it assumes certain outcomes are random.
But in the Westvaco case, the decisions about which people to lay off weren’t
random. There’s no probability model, so a statistical test is invalid.”
Total 30 20 50 Total 22 8 30
About all you can ever show by using statistical methods in discrimination
cases is that the process doesn’t look like random selection. Statistical analysis
alerts us to questionable situations, but it cannot reconstruct the intent of the
people who did the laying off. Knowing the intent is crucial because it might
be perfectly legal: Perhaps employees in obsolete jobs were the ones picked
for termination, and it just happens that the obsolete jobs were held by older
employees.
The Westvaco case never got as far as a jury. Just before it was about to go
to trial, the two sides agreed on a settlement. Details of such settlements are not
public information, so, like many problems based on statistics, this case has no
“final answer.”
P34. Compare your analyses in D21, in which age 40 was the cutoff age, and in P33, in which age 50 was the cutoff age. Which cutoff value, 40 or 50, leads to stronger evidence of discrimination? Which test do you think is more informative about what actually happened: using age 40 as your cutoff age or using age 50? Explain your reasoning.
P35. According to the U.S. Supreme Court, if you do a statistical test of discrimination, you should reject the null hypothesis if your
Retained 22 46.18 11.00
All Employees 50 48.24 12.42
Display 12.21 Means and standard deviations of ages for laid-off and retained employees.
P39. Compare your two-sample t-test in P38 with the randomization test in D24. How do the hypotheses, the conditions you need to check, the sampling distributions, and the conclusions differ?
Appendix: Statistical Tables
Table entry for z is the probability lying below z.
Glossary
σ² population variance
σ²X variance of the random variable X
χ² test statistic for chi-square tests
Addition Rule For any two events A and B, P(A or B) = P(A) + P(B) − P(A and B). If events A and B are disjoint, then P(A or B) = P(A) + P(B).
adjacent values On a modified boxplot, the largest and smallest non-outliers.
alternative hypothesis The set of values, as compared to that of the null hypothesis, that an investigator believes might contain the plausible values of a population parameter in a test of significance. Sometimes called the research hypothesis.
average See mean.
balanced design An experimental design in which each treatment is assigned to the same number of units.
bar chart (or bar graph) A plot that shows frequencies for categorical data as heights or lengths of bars, with one bar for each category.
bias The difference between the actual value of a parameter being estimated and the average value, over repeated sampling, of an estimator of that parameter.
bias due to sampling See sample selection bias.
bimodal Describes a distribution with two well-defined peaks.
binomial distribution A distribution of the random variable X where X represents the number of successes in n independent trials, with the probability of a success the same on each trial.
bins The intervals on the real number line that determine the width of the bars of a histogram.
bivariate data Data that involve two variables per case. For quantitative variables, often displayed on a scatterplot. For categorical variables, often displayed in a two-way table.
blind Describes an experiment in which the subjects do not know which treatment they received.
blocking The process of setting up an experiment by dividing the units into groups (blocks) of similar units and then assigning the treatments at random within each block.
blocks In an experiment, groups of similar units, with treatments randomly assigned within these groups.
box-and-whiskers plot See boxplot.
boxplot (or box-and-whiskers plot) A graphical display of the five-number summary. The “box” extends from the lower quartile to the upper quartile, with a line across it at the median. The “whiskers” run from the quartiles to the minimum and maximum.
capture rate The proportion of confidence intervals produced by a particular method that capture the population parameter. See also confidence level.
case The subject (or unit) on which a measurement is made.
categorical variable A variable that can be grouped into categories, such as “yes” and “no.” Categories sometimes can be ordered, such as “small,” “medium,” and “large.”
census A collection of measurements on all units in the population of interest.
Central Limit Theorem The shape of the sampling distribution of the sample mean becomes more normal as n increases.
chance model See probability model.
chi-square test of homogeneity A chi-square test used to determine whether it is reasonable to believe that when several different populations are broken down into the same categories, they have the same proportion of units in each category.
chi-square test of independence A chi-square test of the hypothesis that two categorical variables measured on the same units are independent of each other in the population.
clinical trial A randomized experiment comparing the effects of medical treatments on human subjects.
cluster(s) On a plot, a group of data “clustering” close to the same value, away from other groups. In sampling, non-overlapping and exhaustive groupings of the units in a population.
cluster sampling Selecting a simple random sample of clusters of units (such as classrooms of students) rather than individual units (students).
coefficient of determination The square of the correlation r. Tells the proportion of the total variation in y that can be explained by the relationship with x.
column chart A three-dimensional plot of frequencies taken from a two-way table, which depicts those frequencies as heights of columns.
column marginal frequency The total of all frequencies across row categories for a particular column of a two-way table of frequencies for categorical variables.
comparison group In an experiment, a group that receives one of the treatments, often the standard treatment.
complement In probability, the outcomes in the sample space that lie outside an event of interest.
completely randomized design An experimental design in which treatments are randomly assigned to units without restriction.
conditional distribution of y given x With bivariate data, the distribution of the values of y for a fixed value of x.
conditional probability The notion that a probability can change if you are given additional information. The conditional probability that event A happens given that event B happens is given by P(A | B) = P(A and B) / P(B), as long as P(B) > 0.
conditional relative frequency The joint frequency in a column divided by the marginal frequency for that column, or the joint frequency in a row divided by the marginal frequency for that row.
confidence interval A set of plausible values for a population parameter, any one of which could be used to define a population for which the observed sample statistic would be a reasonably likely outcome.
confidence level The probability that the method used will give a confidence interval that captures the parameter.
confounding variables (or confounding) Two variables in an observational study whose effects on the response are impossible to separate.
continuous variable A quantitative variable that can take on any value in an interval of real numbers.
control group In an experiment, a group that provides a standard for comparison to evaluate the effectiveness of a treatment; often given a placebo.
convenience sample A sample in which the units chosen from the population are the units that are easy (convenient) to include, rather than being selected randomly.
correlation A numerical value between −1 and 1, inclusive, that measures the strength and direction of a linear relationship between two variables.
critical value The value to which a test statistic is compared in order to decide whether to reject the null hypothesis. Or, the multiplier used in computing the margin of error for a confidence interval.
cumulative percentage plot See cumulative relative frequency plot.
cumulative relative frequency plot (or cumulative percentage plot) A plot of ordered pairs in which each value x in the distribution and its cumulative relative frequency, that is, the proportion of all values less than or equal to x, are plotted.
data A set of numbers or observations with a context and drawn from a real-life sample or population.
data analysis See statistics.
degrees of freedom The number of freely varying pieces of information on which an estimator is based. For example, when using a sample to estimate the variability in the population, the number of independent deviations from the estimate of center.
dependent events Events that are not independent.
deviation The difference from the mean, x − x̄, or from some other measure of center.
disjoint events (or mutually exclusive events) Events that cannot occur on the same opportunity. If event A and event B are disjoint, P(A and B) = 0.
distribution, data The set of values that a variable takes on in a sample or population, together with how frequently each value occurs.
distribution, probability The set of values that a random variable takes on, together with a means of determining the probability of each value (or interval of values in the case of a continuous distribution).
dot plot A graphical display that shows the values of a variable along a number line.
double-blind Describes an experiment in which neither the subjects nor the researcher making the measurements knows which treatment the subjects received.
ecological fallacy The mistake of using ecological (group-level) correlations to support conclusions about individuals.
event Any subset of a sample space.
expected value, μX or E(X) The mean of the probability distribution for the random variable X.
experimental units In an experiment, the subjects or objects to which treatments are assigned.
explanatory variable (or predictor) A variable used to predict (or explain) the value of the response variable. Placed on the x-axis in a regression analysis.
exploratory analysis (or data exploration) An investigation to find patterns in data, using tools such as tables, statistical graphics, and summary statistics to display and summarize distributions.
exponential relationship A relationship between two variables in which the response variable, y, is multiplied by a constant for each unit of increase in the explanatory variable, x. Mathematically, y = ab^x, where a and b are constants.
extrapolation Making a prediction when the value of the explanatory variable, x, falls outside the range of the observed data.
factor An explanatory variable, usually categorical, in a randomized experiment or an observational study.
first quartile, Q1 See lower quartile.
fitted value See predicted value.
five-number summary A data summary that lists the minimum and maximum values, the median, and the lower and upper quartiles for a data set.
fixed-level test A test in which the null hypothesis is rejected or not rejected based on comparison of the test statistic with the critical value for some predetermined level of significance.
frame See sampling frame.
frequency (or count) The number of times a value occurs in a distribution. With categorical data, the number of units that fall into a specific category.
frequency table A table that gives data values and their frequencies.
Fundamental Principle of Counting If there are k stages in a process, with ni possible outcomes for stage i, then the number of possible outcomes for all k stages taken together is n1 · n2 · n3 · … · nk.
gap On a plot, the space that separates clusters of data.
geometric (waiting-time) distribution The distribution of the random variable X in which X represents the number of trials needed to get the first success in a series of independent trials, where the probability of a success is the same on each trial.
goodness-of-fit test A chi-square test used to determine whether it is reasonable to assume that a sample came from a population in which, for each category, the proportion of outcomes in the population that fall into that category is equal to some hypothesized proportion.
heteroscedasticity The tendency of points on a scatterplot to fan out at one end, indicating that the relationship varies in strength.
histogram A plot of a quantitative variable that groups cases into rectangles or bars. The height of the bar shows the frequency of measurements within the interval (or bin) covered by the bar.
homogeneous populations Two or more populations that have nearly equal proportions of units in each category of study.
hypothesis test See test of significance.
incorrect response bias A bias resulting from responses that are systematically wrong, such as from intentional lying, inaccurate measurement devices, faulty memories, or misinterpretation of questions.
independent events Events A and B for which the probability of event A happening doesn’t depend on whether event B happens. Events A and B are independent if and only if P(A | B) = P(A) or, equivalently, P(B | A) = P(B) or, equivalently, P(A and B) = P(A) · P(B).
inference (or inferential statistics) Using results from a random sample to draw conclusions about a population or using results from a randomized experiment to compare treatments.
influential point On a scatterplot, a point that strongly influences the regression equation and correlation. To judge a point’s influence, you compare the regression equation and correlation computed first with and then without the point.
interpolation Making a prediction when the value of the explanatory variable, x, falls inside the range of the observed data.
interquartile range, IQR A measure of spread equal to the distance between the upper and lower quartiles; IQR = Q3 − Q1.
joint frequency The frequency within a particular cell of a two-way table of frequencies for categorical variables.
judgment sample A sample selected using the judgment of an expert to choose units that he or she considers representative of a population.
Law of Large Numbers A theorem that guarantees that the proportion of successes in a random sample will converge to the population proportion of successes as the sample size increases. In other words, the difference between a sample proportion and a population proportion must get smaller (except in rare instances) as the sample size gets larger, if the sample is randomly selected from that population.
least squares line See regression line.
level One of the values or categories making up a factor.
level of significance, α The maximum P-value for which the null hypothesis will be rejected.
line of averages See line of means.
line of means (or line of averages) Another term for the regression line, if points form an elliptical cloud. In theory, the population regression line contains the means (expected values) of the conditional distribution of y at each value of x.
linear shape The characteristic of an elliptical cloud of points where the means of the conditional distributions of y given x tend to fall along a line.
lower quartile (or first quartile, Q1) In a distribution, the value that separates the lower quarter of values from the upper three-quarters of values. The median of the lower half of all the values.
lurking variable A variable other than those being plotted that possibly can cause or help explain the behavior of the pattern on a scatterplot. More generally, a variable that is not included in the analysis but, once identified, could help explain the relationship between the other variables.
margin of error, E Half the length of a confidence interval; E = (critical value) · (standard error).
marginal frequency The total of the joint frequencies across row categories for a given column or across column categories for a given row of a two-way table of frequencies for categorical variables.
marginal relative frequency The marginal frequency of a two-way table of categorical data divided by the total frequency (number of units represented in the table).
matched pairs design See randomized paired comparison design.
maximum The largest value in a data set.
mean, x̄ A measure of center, often called the average, computed by adding all the values of x and dividing by the number of values, n. On a plot, the place where you would put a pencil point below the horizontal axis in order to balance the distribution.
measure of center A single-number summary that measures the “center” of a distribution; usually the mean (or average). Median, midrange, mode, and trimmed mean are other measures of center.
measure of spread (or measure of variability) A single-number summary that measures the variability of a distribution. Range, IQR, standard deviation, and variance are measures of spread.
median A measure of center that is the value that divides an ordered set of values into two equal halves. To find it, you list all the values in order and select the middle one or, if the number of values is even, the average of the two middle ones. If there are n values, the median is at position (n + 1) / 2. On a plot of a distribution, the median is the value that divides the area between the distribution curve and the x-axis in half.
method of least squares A general approach to fitting functions to data by minimizing the sum of the squared residuals (or errors).
midrange The midpoint between the minimum and maximum values in a data set, or (max + min) / 2.
minimum The smallest value in a data set.
mode A measure of center that is the value with the highest frequency in a distribution. On a plot of a distribution, it occurs at the highest (maximum) peak.
modified boxplot A graphical display like the basic boxplot except that the whiskers extend only as far as the largest and smallest non-outliers (sometimes called adjacent values) and any outliers appear as individual dots or other symbols.
Multiplication Rule For any two events A and B, P(A and B) = P(A) · P(B | A) = P(B) · P(A | B). If events A and B are independent, then P(A and B) = P(A) · P(B).
mutually exclusive events See disjoint events.
negative trend The tendency of a cloud of points to slope downward as you go from left to right, or the tendency of the value of y to get smaller as the value of x gets larger.
nonresponse bias A bias that can occur when people selected for the sample do not respond to the survey.
normal distribution A useful probability distribution that has a symmetric bell or mound shape and tails extending infinitely far in both directions.
null hypothesis The standard or status quo value of a parameter that is assumed to be true in a test of significance until possibly refuted by the data in favor of an alternative hypothesis.
observational study A study in which the conditions of interest are already built into the units being studied and are not randomly assigned.
one-sided (one-tailed) test of significance A test in which the P-value is computed from one tail of the sampling distribution. Used when the investigator has an indication of which way any deviation from the standard should go, as reflected in the alternative hypothesis.
outlier A value that stands apart from the bulk of the data.
parameter A summary number describing a population or a probability distribution.
percentile The quantity associated with any specific value in a univariate distribution that gives the percentage of values in the distribution that are equal to or below that specific value. The median is the 50th percentile.
placebo A nontreatment that mimics the treatment(s) being studied in all essential ways except that it does not involve the crucial component.
placebo effect The phenomenon that when people believe they are receiving the special treatment, they tend to do better even if they are receiving the placebo.
plot of distribution (or graphical display or statistical graphic) A graphical display of the distribution of a variable that provides a sense of the distribution’s shape, center, and spread.
point estimator A statistic from a sample that provides a single point (number) as a plausible value of a population parameter.
point of averages The point (x̄, ȳ), where x̄ is the mean of the explanatory variable and ȳ is the mean of the response variable. This point falls on the regression line.
pooled estimate The weighted average of two statistics estimating the same parameter, with the weights usually determined by the sample sizes or degrees of freedom.
population The entire set of people or things (units) that you want to know about.
population regression line See true regression line.
population size The number of units in the population.
population standard deviation, σ See standard deviation of a population.
positive trend The tendency of a cloud of points to slope upward as you go from left to right, or the tendency of the value of y to get larger as the value of x gets larger.
power of a test The probability of rejecting the null hypothesis.
power relationship A relationship between two variables in which the response variable, y, is proportional to the explanatory variable, x, raised to a power. Mathematically, y = ax^b, where a and b are constants.
predicted (or fitted) value An estimated value of the response variable calculated from the known value of the explanatory variable, x, often by using a regression equation.
prediction error The difference between the actual value of y and the value of y predicted from a regression line. Usually unknown except for the points used to construct the regression line, whose prediction errors are called residuals.
predictor See explanatory variable.
probability A number between 0 and 1, inclusive (or between 0% and 100%), that measures how likely it is for a chance event to happen. At one extreme, events that can’t happen have probability 0. At the other extreme, events that are certain to happen have probability 1.
probability density A probability distribution, such as the normal or χ² distribution, where x is a continuous variable and probabilities are identified as areas under a curve.
probability distribution See distribution, probability.
probability model (or chance model) A description that approximates, or simulates, the random behavior of a real situation, often by giving a description of all possible outcomes with an assignment of probabilities.
probability sample A sample in which each unit in the population has a known probability of ending up in the sample.
protocol A written statement telling exactly how an experiment is to be designed and conducted.
P-value For a test, the probability of seeing a result from a random sample that is as extreme as or more extreme than the one computed from the random sample, if the null hypothesis is true. (Sometimes called the observed significance level.)
quantitative variable (or numerical variable) A variable that takes on numerical values.
quartiles Three numbers that divide an ordered set of data values into four groups of equal size.
questionnaire bias Bias that arises from how the interviewer asks and words the survey questions.
random sample A sample in which individuals are selected by some chance process. Sometimes used synonymously with simple random sample.
random variable A variable that takes on numerical values determined by a chance process.
randomization (or random assignment) Assigning subjects to different treatment groups using a random procedure.
randomized block design An experimental design in which similar units are grouped into blocks and treatments are then randomly assigned to units within each block.
randomized comparative experiment An experiment in which two or more treatments are randomly assigned to experimental units for the purpose of making comparisons among treatments.
randomized paired comparison (matched pairs) An experimental design in which two different treatments are randomly assigned within pairs of similar units.
randomized paired comparison (repeated measures) An experimental design in which each treatment is assigned (in random order) to each unit.
range A measure of spread equal to the difference between the maximum and minimum values in a data set.
rare events Values or outcomes that lie in the outer 5% of a distribution or in the upper 2.5% and lower 2.5% of a distribution. Compare reasonably likely.
reasonably likely Describes values or outcomes that lie in the middle 95% of a distribution. Compare rare events.
recentering Adding the same number c to all the values in a distribution. This procedure doesn’t change the shape or spread but slides the entire distribution by the amount c, adding c to the measures of center.
rectangular distribution A distribution in which all values occur equally often.
regression The statistical study of the relationship between two (or more) quantitative variables, such as fitting a line to bivariate data. (Can be extended to categorical variables.)
regression effect (or regression toward the mean) On a scatterplot, the difference between the regression line and the major axis of the elliptical cloud.
regression line (or least squares line or least squares regression line) The line for which the sum of squared errors (residuals), SSE, is as small as possible.
regression toward the mean See regression effect.
relative frequency A proportion computed by dividing a frequency by the number of values in the data set.
relative frequency histogram A histogram in which the length of each bar shows proportions (or relative frequencies) instead of frequencies.
repeated measures design See randomized paired comparison (repeated measures).
replication Repetition of the same treatment on different units.
rescaling Multiplying all the values in a distribution by the same nonzero number d. This process doesn’t change the basic shape but instead stretches or shrinks the distribution, multiplying the IQR and standard deviation by d and multiplying the measures of center by d.
residual (or error) For points used to construct the regression line, the difference between the observed value of y and the predicted value of y, that is, y − ŷ.
residual plot A scatterplot of residuals, y − ŷ, versus predictor values, x, or versus predicted values, ŷ. A diagnostic plot used to uncover nonlinear trends in a relationship between two variables.
resistant to outliers Describes a summary statistic that does not change very much when an outlier is removed from the data set.
response variable The outcome variable used to compare results of different treatments in an experiment or the outcome variable that is predicted by the explanatory variable or variables in regression analysis. Placed on the y-axis in a regression analysis.
robustness The comparative insensitivity of a statistical procedure to departure from the assumptions on which the procedure is based.
row marginal frequency The total of all frequencies across column categories for a particular row of a two-way table of frequencies for categorical variables.
sample The set of units selected for study from the population.
sample selection bias (or sampling bias or bias due to sampling) The extent to which a sampling procedure produces samples for which the estimate from the sample is larger or smaller, on average, than the population parameter being estimated.
sample space A complete list or description of disjoint (mutually exclusive) outcomes of a chance process.
sampling bias See sample selection bias.
sampling distribution The distribution of a sample statistic under some prescribed method of probability sampling.
sampling distribution of a sample proportion, p̂ The theoretical distribution of the sample proportion in repeated random sampling.
sampling distribution of the sample mean, x̄ The theoretical distribution of the sample mean in repeated random sampling.
sampling frame (or frame) The listing of units from which the sample is actually selected.
sampling with replacement In sequential sampling of units from a population, a procedure in which each sampled unit is placed back into the population before the next unit is selected.
sampling without replacement In sequential sampling of units from a population, a procedure in which each sampled unit is not placed back into the population before the next unit is selected.
shape One of the characteristics, along with center and spread, that is used to describe distributions. Univariate distributions sometimes have a standard shape such as normal, uniform, or skewed. Bivariate distributions may form an elliptical cloud. Descriptions of shape should consider possible outliers, clusters, and gaps.
simple random sample, SRS A sample generated from a sampling procedure in which all possible samples of a given fixed size are equally likely.
simulation A procedure that uses a probability model to imitate a real situation. Often used to compare an actual result with the results that are reasonable to expect from random behavior.
size bias A type of sample selection bias that gives units with a larger value of the variable a higher chance of being selected.
skewed Describes distributions that show bunching at one end and a long tail stretching out in the other direction. Often happens because the values “bump up against a wall” and hit either a minimum that values can’t go below or a maximum that values can’t go above.
skewed left A skewed distribution with a tail that stretches left, toward the smaller values.
skewed right A skewed distribution with a tail that stretches right, toward the larger values.
slope For linear relationships, the change in y (rise) per unit change in x (run).
split stem A stem-and-leaf plot in which the leaves for each stem are split onto two or more lines. For example, if the second digit is 0, 1, 2, 3, or 4, it is placed on the first line for that stem. If the second digit is 5, 6, 7, 8, or 9, it is placed on the second line for that stem.
spread See variability.
stacked bar graph See segmented bar graph.
scatterplot A plot that shows the relationship standard deviation of a population, A measure
between two quantitative variables, usually with each of spread equal to the square root of the sum of the
case represented by a dot. squared deviations divided by n. For a probability
distribution, it is the square root of the expected
segmented bar graph (or stacked bar graph) A plot squared deviation from the mean.
in which categorical frequencies are stacked on top of
one another. standard deviation of a sample, s A measure of
spread equal to the square root of the sum of the
sensitive to outliers Describes a summary statistic squared deviations divided by n 1.
that changes considerably when an outlier is removed
from the data set. standard error The standard deviation of a sampling
distribution.
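The two standard deviation entries above differ only in the divisor (n for a population, n − 1 for a sample). The following is a minimal sketch of that difference, not part of the original glossary; the function names and the data values are made up for illustration.

```python
import math


def population_sd(values):
    """Square root of the sum of squared deviations divided by n."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def sample_sd(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data))  # divisor n
print(sample_sd(data))      # divisor n - 1, so slightly larger
```

For the same data, the sample version is always a little larger than the population version, and the gap shrinks as n grows.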
standard error of the mean, $\sigma_{\bar{x}}$ The standard deviation of the sampling distribution of x̄, or $\sigma/\sqrt{n}$.
standard error of the mean (estimated) The estimated standard deviation of the sampling distribution of x̄, or $s/\sqrt{n}$.
standard normal distribution A normal distribution with mean 0 and standard deviation 1. The variable along the horizontal axis is called a z-score.
standard units, z The number of standard deviations a given value lies above or below the mean:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
standardizing Converting to standard units; the two-step process of recentering and rescaling that turns any normal distribution into a standard normal distribution.
statistic A summary number calculated from a sample taken from a population. For example, the sample mean, x̄, and standard deviation, s, are statistics.
statistically significant Describes the situation when the difference between the estimate from the sample and the hypothesized parameter is too big to reasonably be attributed to chance variation.
statistics (or data analysis) The study of the production, summarization, and analysis of data, along with the processes for drawing conclusions from the data.
stem-and-leaf plot (or stemplot) A graphical display with “stems” showing the leftmost digit of the values separated from “leaves” showing the next digit or set of digits.
stemplot See stem-and-leaf plot.
strata (singular, stratum) Subgroups of the population, usually selected for homogeneity or sampling convenience, that cover the entire population. See also stratified random sampling.
stratification A classification of the units in a population into homogeneous subgroups, known as strata, prior to sampling.
stratified random sampling Stratifying the population and then taking a simple random sample from within each stratum.
strength In the context of regression analysis, two variables are said to have a strong relationship if there is little variation around the regression line. If there is a lot of variation around the regression line, the relationship is weak.
sum of squared errors, SSE The sum of the squared residuals: $\sum (y - \hat{y})^2$.
summary statistic See statistic.
systematic sampling with random start A sample selected by taking every nth member of the population, starting at a random spot—for example, having people count off and then picking one of the numbers at random.
table of random digits A string of digits constructed in such a way that each digit, 0 through 9, has probability 1/10 of being selected and each digit is selected independently of the previous digits.
t-distribution The distribution, for example, of the statistic below, when the data are a random sample from a normally distributed population:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
test of significance (or hypothesis test) A procedure that compares the results from a sample to some predetermined standard in order to decide whether the standard should be rejected.
test statistic Typically, in significance testing, the distance between the estimate from the sample and the hypothesized parameter, measured in standard errors.
third quartile, Q3 See upper quartile.
treatment group In an experiment, a group that receives an actual treatment being studied. Compare with control group.
treatments Conditions assigned to different groups of subjects to determine whether subjects respond differently to different conditions.
tree diagram A diagram used to calculate probabilities for sequential events.
trend On a scatterplot, the path of the means of the vertical strips (conditional distributions of y given x) as you move from left to right. More simply, the path taken by a line through the center of the data in a scatterplot.
true regression line (or population regression line) The regression line that would be computed if you had the entire population. Theoretically, the line through the means of the conditional distributions of y given x. See also line of means.
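The entries for standard units and the t-distribution on this page both describe a distance measured in standard deviations or standard errors. The sketch below is one way to compute them, assuming the usual reading of those definitions; the function names and the sample values are hypothetical and are not taken from the book.

```python
import math
import statistics


def standard_units(value, mean, sd):
    """z: how many standard deviations a value lies above or below the mean."""
    return (value - mean) / sd


def t_statistic(sample, mu):
    """t: distance from the sample mean to mu, in estimated standard errors."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)             # sample SD, divisor n - 1
    estimated_se = s / math.sqrt(n)          # estimated standard error of the mean
    return (x_bar - mu) / estimated_se


# Hypothetical numbers for illustration only.
print(standard_units(650, mean=500, sd=100))
print(t_statistic([2, 4, 4, 4, 5, 5, 7, 9], mu=4))
```

A positive result means the value (or sample mean) lies above the reference mean; a negative result means it lies below.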
t-test A test of significance of a population mean (or comparison of means) using the t-distribution. See also test of significance and t-distribution.
two-sided (two-tailed) test of significance A test in which the P-value is computed from both tails of the sampling distribution. Used if the investigator is interested in detecting a change from the standard in either direction.
two-stage sampling A sampling procedure that involves two steps. For example, taking a random sample of clusters and then taking a random sample from each of those clusters.
two-way table A table of frequencies that lists outcomes in the cells formed by the cross-classification of two categorical variables measured on the same units.
Type I error The error made when the null hypothesis is true and you reject it.
Type II error The error made when the null hypothesis is false and you fail to reject it.
unbiased Describes an estimator (statistic) that has an average value in repeated sampling (expected value) equal to the parameter it is estimating.
uniform distribution A distribution whose frequencies are constant across the possible values. Its plot is rectangular.
unimodal Describes a distribution of univariate data with only one well-defined peak.
units Individuals that make up the population from which samples may be selected or to which treatments may be applied.
univariate data Data that involve a single variable per case. A quantitative variable often is displayed on a histogram. A categorical variable often is displayed on a bar chart.
upper quartile (or third quartile, Q3) In a distribution, the value that separates the lower three-quarters of values from the upper quarter of values. The median of the upper half of all the values.
variability (or spread) The degree to which values in a distribution differ. Measures of variability for quantitative variables include the standard deviation, variance, interquartile range, and range.
variability due to sampling (or variation in sampling) A description of how an estimate varies from sample to sample.
variable A characteristic that differs from case to case and defines what is to be measured or classified.
variance A measure of spread equal to the square of the standard deviation.
voluntary response bias The situation in which statistics from samples are not fair estimates of population parameters because the sample data came from volunteers rather than from randomly selected respondents.
voluntary response sample A sample made up of people who volunteer to be in it.
waiting-time problems Problems in which the variable in question is the number of trials you have to wait until the event of interest happens. See also geometric (waiting time) distribution.
z-score See standard normal distribution and standard units.
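The upper quartile entry on this page describes Q3 as the median of the upper half of the values. The sketch below follows that wording. It assumes that, when the number of values is odd, the overall median is left out of the upper half; that convention is an assumption here and is not stated in the glossary entry. The function name is made up.

```python
import statistics


def upper_quartile(values):
    """Median of the upper half of the sorted values.

    Assumption: for an odd number of values, the overall median is excluded
    from the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    upper_half = ordered[(n + 1) // 2:]
    return statistics.median(upper_half)


print(upper_quartile([1, 2, 3, 4, 5, 6, 7, 8]))  # upper half is 5, 6, 7, 8
print(upper_quartile([1, 2, 3, 4, 5, 6, 7]))     # upper half is 5, 6, 7
```

Software packages often use other quartile conventions, so results from a calculator or spreadsheet may differ slightly from this sketch.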
Brief Answers to Selected Problems
The answers below are not complete solutions but are meant to help you judge whether you are on the right track. If you round computations in intermediate steps or use tables rather than a calculator, your numerical answers might not match those given exactly.
Chapter 1
Section 1.1
P1. Older hourly workers were far more likely to be laid off in Rounds 1–3 than were younger hourly workers.
P2. a. 3/6; 3/6
b. 3/10; 7/10
c. A higher proportion of those age 50 and older were laid off than those under age 50 (0.875 versus 0.50).
d. Hourly workers. The difference in the proportions for salaried workers (0.60 versus 0.375) is smaller than in part c. But note the small number of hourly workers.
P3. hourly, although the patterns are similar
E1. a. 14/27; 14/18
b. 4/9; 5/9
c. A higher proportion of those age 40 and older were laid off than those under age 40 (0.52 versus 0.44).
d. age 50, because the difference in proportions is greater
E3. a. The hourly workers who kept their jobs tended to be younger than the salaried workers who kept their jobs.
b. no (Salaried workers tended to be older even before layoffs.)
E5. b. The percentage of those laid off who were age 40 or older, by round, were 82%, 89%, 67%, 50%, and 0%. Most layoffs came early, and older workers were hit harder in earlier rounds.
E7. a. .267; 626 or 627; 180
b. 8 5/8; 6.30; 170; 47 1/2; 15.79; 12; 6.45
Section 1.2
P4. a. about 37 out of 200, or 0.185
b. An average age this high would be easy to get just by chance, so there is no evidence of age discrimination.
P5. a. 48.6
b. Write 14 ages on cards. Draw 10 at random and find the average age. Repeat many times and find where 48.6 falls in the distribution.
c. about 45 out of 200, or 0.225 d. no
E9. b. 24 out of 50, or 0.48
c. The proportion can reasonably be attributed to chance.
E11. C
E13. a. 45 b. 4 c. 4/45 d. weak
Chapter Summary
E15. b. There is no reason to look for an explanation, because there isn’t much difference in centers or spreads.
E17. B
E19. a. B d. 13%; no
E21. a. 1001 b. 5, 6, 7, 8, or 9
c. i. 360 ii. 90 iii. 5 d. 455/1001
E23. a. 7/1000 c. yes d. 6/1225
Chapter 2
Section 2.1
P1. a. 1
b. 0.5, 1, and 1.5
c. 0.5 and 1.5
d. 15%
e. 0.05 and 1.95
P2. The number of deaths per month is fairly uniform, with about 190,000–200,000 per month. Summer months have the smallest numbers of deaths, and winter months the largest.
P3. a. A typical SAT math score is roughly 500, give or take about 100 or so.
b. A typical ACT score is about 20, give or take 5 or so.
c. A typical college-age woman is about 65 inches tall, give or take 2.5 inches or so.
d. A typical professional baseball player in the 1910s had a single-season batting average of about .260 or .270, give or take about .040 or so.
P4. The middle 50% of students had GPAs between 2.9 and 3.7, with half above 3.35 and half below.
P5. a. IV b. II c. V d. III e. I
E1. a. skewed left b. skewed right
c. approximately normal d. skewed right
E5. a. each of the approximately 92 officers; the age at which the officer became a colonel
b. This distribution is skewed left, with no outliers, gaps, or clusters. The middle 50% of the ages are between 50 and 53, with half above 52 and half below.
c. mandatory retirement, age discrimination, an “up or out” rule by which if you haven’t been promoted beyond colonel by your 55th birthday you must retire
E7. approximately normal; too many outliers
E9. a. For example, if a case is a business in the United States and the variable is the number of employees, the distribution will be skewed right. There would be a wall at 1, because that’s the smallest number of employees a business could have (and many businesses have only one employee).
b. For example, if a case is an AP Statistics class and the variable is the percentage of students who did their homework last night, the distribution will be skewed left. There would be a wall at 100%, because in most AP Statistics classes almost all students do their homework.
E11. Births tend to be more frequent in the summer.
E13. a. Within each cluster on either side of the gap from 18,000 to 27,000, the plot is skewed right. If the two clusters are combined (which might not be a good idea), the median is 12,350, with the middle 50% of values between 2,533 and 30,355.
b. Norway and Switzerland; no, they are part of the thinning tail.
c. higher cluster; Eastern Europe, Asia, and the Middle East
d. not when economic conditions of different continents are different
Section 2.2
P6. Quantitative: year of birth, year of hire, and age; categorical: row number, job title, round, and pay category. Month of birth and month of hire are best called “ordered categories.”
P7. The distribution is skewed right, with no obvious gaps or clusters and a wall at 0. The elephant is the only possible outlier. About half the mammals have gestation periods of more than 160 days, and half less. The middle 50% have gestation periods between 63 and 284 days. Large mammals have longer gestation periods.
P8. The average longevity distribution is skewed right, with two possible outliers, while the distribution of maximum longevity is more uniform but has a peak at 20–30 years and a possible outlier. The center and spread of the distribution of maximum longevity are larger.
P9. no
P10. about 0.15; about 30; skewed left, with median between 70 and 75 and the middle 50% between about 60 and 75
P11. See P8.
P12. The cases are the individual males in the labor force age 25 and older. The variable is their educational attainment. The proportion increases through the first three levels with a huge jump at the high school graduation level. Then it decreases except for a spike at bachelor’s degree. The distributions for males and females are similar in shape. Relative frequency bar charts account for the different numbers of males and females.
P13. Westvaco laid off the majority of workers in Rounds 1 and 2.
E15. a. collected by a statistics class; a penny; age of the penny
b. The shape is strongly skewed right, with a wall at 0. The median is 8 years, and the
point pulls the right end of the regression line down, decreasing the slope and increasing the correlation.
P24. b. Residuals are 0.5, 0.5, 0, and 0.
d. A residual plot eliminates the tilt in the scatterplot so that the residuals can be seen as deviations above and below a horizontal line. Here the symmetry of the residuals shows up better on the residual plot.
P25. a. A—IV; B—II; C—I; D—III
b. i. opens upward, as in II
ii. fans out or in, as in I
iii. opens downward
iv. V-shaped, as in III
c. plot D; residual plot
E43. a. no; not elliptical and has an influential point
b. ŷ = 161.90 + 0.954x; r = 0.49
c. With Antarctica removed, the slope of the regression line changes from positive (0.954) to negative (−1.869) and the correlation
P26. c. ln y = 5.22 − 0.435x; $1 - e^{-0.435} \approx 0.35$, or 35%, per time period
d. some curvature, indicating a death rate of more than 0.35 in the early rolls and less than 0.35 in the later rolls
P27. a. log transformation
b. ln(pop) = −54.9342 + 0.03583 · year; 3.6% per year
c. Florida grew less rapidly than the model predicts until about 1845, then grew more rapidly than predicted, then less, then more. There was a big jump in growth between 1950 and 1960 and a big drop in 2000.
P28. (2, 3), (1, 2), (0, 1), (−1, 0); slope 1 and y-intercept 1
P29. a. (6, 3), (4, 2), (2, 1), (0, 0); slope 0.5 and y-intercept 0
b. (5, 4), (6, 2), (8, −2); slope −2 and y-intercept 14
P30. for P28: $y = 10(10)^x$; for P29 part a: $y = 1(10^{0.5})^x \approx 3.16^x$; for P29 part b: $y = 10^{14} \cdot 100^{-x}$
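The P26 and P27 answers above turn the slope of a fitted line for ln y into a percentage change per time period. The sketch below shows that conversion, reading the fitted slopes as −0.435 (a decay) and 0.03583 (a growth); those signs are inferred from the stated rates of 35% and 3.6%, and the function name is made up for illustration.

```python
import math


def rate_per_period(slope):
    """Proportional change in y per unit of x when ln y = a + slope * x.

    A negative result is a decay rate; a positive result is a growth rate.
    """
    return math.exp(slope) - 1


decay = rate_per_period(-0.435)     # P26: slope of ln y on roll number
growth = rate_per_period(0.03583)   # P27: slope of ln(pop) on year

print(f"decay per period: {abs(decay):.2%}")
print(f"growth per year: {growth:.2%}")
```

For slopes close to 0, the rate is nearly equal to the slope itself, which is why a slope of 0.03583 corresponds to growth of roughly 3.6% per year.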
1 through k, say you get j. The sample consists of the mortgage at position j in chronological order, and every kth mortgage after that.
b. Pick a random sample of dates. The sample consists of all mortgages assumed on those dates.
c. Start as in part b, then take a random sample of mortgages from within each cluster.
P13. a. Use pages as clusters.
b. Take an SRS of pages. Then take an SRS of lines from each of those pages.
c. Take an SRS of characters from each line selected in part b.
E15. a. stratified random sample with strata of grocery store owners and restaurant owners
E17. Stratification by gender is likely to be the best strategy.
E19. Consider the farms as the five strata, and take a random sample of, say, 10 acres from each farm.
E21. systematic sampling with random start in both cases
Section 4.3
P14. a. children who live near a major power line and the matching children; living near a major power line or not; whether the child gets leukemia
b. no, because the children are not randomly assigned to the two conditions
c. For example, major power lines often are near a major highway (and hence in a polluted area).
P18. a. Brightness of the room and type of music; for brightness, the levels are low, medium, and high; for type of music, the levels are pop, classical, and jazz.
b. Possibilities are heart rate, blood pressure, and self-description of anxiety level.
P19. a. No; the pipe and cigar smokers may be older.
b. observational study
c. smoking behavior; nonsmoking, cigarette smoking, and pipe or cigar smoking; number of deaths per 1000 men per year
P20. Older men have a higher death rate, and the pipe and cigar smokers are older; the new factor is age.
P21. a. observational
b. legal age for driving; the age groups; highway death rate by state
c. for example, driver education, because states with higher age limits may generally be more restrictive and require more training
P22. a. green
b. The greater distances traveled by the green bears are due to confounding of launch order with bear color. As students got more practice, they were able to launch the bear farther.
c. Later launches tend to have greater distances.
d. Randomize the order in which the bears of different colors are launched.
P23. adults who died of nonrespiratory causes; yes; no; no
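The first answer on this page describes systematic sampling with a random start: pick a position j from 1 through k, take the item at position j, and then take every kth item after that. The sketch below is one way to carry that out; the function name, the `rng` parameter, and the list of 100 numbered items standing in for mortgages are all made up for illustration.

```python
import random


def systematic_sample(items, k, rng=random):
    """Take the item at a random position j (1 through k), then every kth item."""
    j = rng.randint(1, k)      # random start; randint includes both endpoints
    return items[j - 1::k]     # position j is index j - 1


# Hypothetical population: 100 items listed in chronological order.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, rng=random.Random(0))
print(sample)
```

Passing a seeded `random.Random` object makes a run repeatable; leaving `rng` at its default uses Python's shared random number generator.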
P24. a. No; students might have used other clues.
b. group with the magnets
c. whether the students were assigned randomly to the treatments
d. The second design is better than the first.
e. no; no
P25. the two textbooks; the ten classes; five for each treatment
P26. Carnation plants (if in separate containers); put the plants in place and then randomly assign the new product to half and leave the others growing under standard conditions. There must be at least two plants in each group, and preferably many more. The plants receiving the standard treatment can be used as a control.
P27. a. brand of paper towel, with levels Brand A and Brand B, and wetness, with levels dry and wet; number of pennies a towel can hold before breaking
b. Randomize both the assignment of wet or dry to five towels from each brand and the order of testing; the experimental units are the 20 time-towel combinations for which the tests will be performed.
c. experiment
E23. observational study
E25. a. yes
b. All ten must be dug up so that the variables shade and being dug up are not confounded.
E27. a. dormitories b. 20 c. experiment
E29. different population sizes; climate is confounded with proportion of older people in the state; observational study
E31. not unless the subjects are in random order to begin with
Section 4.4
P28. a. Some dorms are less healthy than others (too stuffy or contaminated); some may be for athletes or others who tend to be healthier.
b. to equalize variability between the treatment groups
P29. a. randomized paired comparison (repeated measures)
b. randomized paired comparison (matched pairs)
c. randomized paired comparison with matched pairs, because the drug might not clear out of the bloodstream in the time allowed between treatments
P30. randomized paired comparison with repeated measures
P31. a. There is a lot of variability in how well students memorize, so use a randomized paired comparison design with either matched pairs or repeated measures. The response variable could be the number of words on a list that are remembered.
b. Bigger people tend to eat a lot more soup than do smaller people. Blocking could be done by estimating a person’s weight. The response variable is the amount of soup eaten.
P32. a. No; no; you can’t tell which dots represent measurements for the same patient.
b. yes; yes
c. Subtract the number of flicks per minute for Treatment C from that for Treatment A.
d. Randomized paired comparison with repeated measures; there’s a lot of subject-to-subject variability, as shown in Display 4.20.
E33. completely randomized; high school students who want to take the special course; course-taking assignment; take course or don’t take course; SAT score; no blocks
E35. two weeks of a patient’s time; low phenylalanine diet or regular diet; one week of a patient’s time; paired comparison (repeated measures) with, we hope, randomization
E37. block design; rate of finger tapping; one day of a subject’s time; caffeine, theobromine, or placebo; three days of a subject’s time
E39. completely at random; quantity eaten (relative to body weight); hornworm; diet of regular food or diet of 80% cellulose; no blocks
Chapter Summary
E41. a. sample, because the measurement process is destructive
b. census, because the population is small and the information is easy to get
c. sample, because a census would be too costly and time-consuming
E43. a. probably reasonably representative
b. not representative, because too young
c. probably reasonably representative
e. not representative, because blood pressure tends to increase with age
E45. estimate of earnings too high
E47. owners who sold their 2000 model, who might be the owners unhappy with high repair bills
E49. Your estimate will be too large because your net allows tiny fish to escape.
E51. New York Times readers tend to have higher incomes and more years of education.
E53. Choose an SRS of states; choose an SRS of congressional districts from each state; choose an SRS of precincts from each congressional district; choose an SRS of voters from each precinct.
E55. a. observational study
b. The researcher can’t tell whether it was the diet or some other difference in lifestyles that accounts for the result.
c. more physical activity and less stress in Greece
E57. a. experiment; factors: presence of mother, with levels present and absent, and presence of siblings, with levels present and absent; difference in the mouse’s weight; total number of baby mice
b. survey; the amounts of the ten different nutrients; the number of species of plants; 4
c. observational study; the different types of fruit; float or not; one observed unit for each type of fruit
E59. a. location; urban or rural; one pair of twins
b. observational study
c. within the rural twins; yes
d. genetically identical; variability caused by genetic differences
E61. Form four blocks of spaces with similar locations in the gym: 1 and 4, 2 and 3, 5 and 8, 6 and 7. Randomly assign two bikes within each block.
Chapter 5
Section 5.1
P1. a. P(0) = 1/16; P(1) = 4/16; P(2) = 6/16; P(3) = 4/16; P(4) = 1/16
b. 1/16
c. Probably not; if no one can tell the difference, there is 1 chance in 16 that all four will select correctly.
P2. a. 13/27, or about 0.48
b. For example, for a temperature of 20°F, the probability is 2/27, or about 0.074.
c. too warm
P3. a. H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6
b. yes c. 1/12
P4. a. 28, 35; 28, 41; 28, 47; 28, 55; 35, 41; 35, 47; 35, 55; 41, 47; 41, 55; 47, 55
b. yes c. 1/10 d. 6/10
P5. a. yes b. no c. no d. no
P6. a. yes b. no
P7. a. tails; tails; heads; tails; tails b. 0.44
P8. b. 6 c. probably not
P9. a. 21 b. 1/21
E1. a. There are 32 outcomes.
b. P(0) = 1/32; P(1) = 5/32; P(2) = 10/32; P(3) = 10/32; P(4) = 5/32; P(5) = 1/32
c. 31/32
E3. a. 30/36 b. 4/36 c. 8/36 d. 6/36 e. 11/36
f. 1/36 g. 9/36 h. 3/36 i. 2/36
E5. a. disjoint and complete; 9/16, 6/16, 1/16
b. not disjoint; complete
c. disjoint; not complete
d. disjoint; not complete
e. disjoint; not complete
E7. a. no; no; yes; no b. 6 c. 6/50, or 0.12
d. As the number of rolls increases, one additional roll has a smaller effect on the cumulative proportion.
E9. a. 20 c. 40
E11. a. yes
b. They get the white marble on the same draw; 1/81.
c. Add more non-white marbles to the bags.
E13. a. 64; 1/64
b. 46,656; 1/46,656
c. 21200; no, because outcomes aren’t equally likely
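The probabilities listed for P1 part a (sixteenths) and E1 part b (thirty-seconds) match what you get by counting equally likely outcomes of four and five two-way trials. The sketch below reproduces those lists under that assumption, which is a reading of the answers rather than something stated in them; the function name is made up.

```python
from fractions import Fraction
from math import comb


def success_count_distribution(n):
    """P(k successes) for k = 0..n when each of n trials is a fair 50-50 guess.

    Assumption: all 2**n outcome sequences are equally likely.
    """
    return [Fraction(comb(n, k), 2 ** n) for k in range(n + 1)]


four_trials = success_count_distribution(4)
five_trials = success_count_distribution(5)

# Fraction reduces automatically, so 4/16 prints as 1/4.
print([str(p) for p in four_trials])
print(1 - five_trials[0])   # chance of at least one success in five trials
```

The probabilities in each list add to 1, which is a quick check that the list of outcomes is complete.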
flower production, 800–803 clinical trials, 247, 251, 542–543, correct language use of, 479
Westvaco case, 3–8, 11–16, 817–822 545–546 for difference in two proportions,
cases See also experiments 518–523, 541–543
of bivariate data, 107 cluster samples, 237–238, 240 for difference of two means,
defined, 4 clusters, 36–37, 107 618–623, 629
categorical data, 692–693, 805–807 coefficient of correlation. See for difference of two means from
categorical variables correlation paired comparisons, 645–646
defined, 44 coefficient of determination, 148–151 error sources, 430
graphs of. See bar graphs column chart, 714 estimation of standard deviation
inference involving. See chi-square; column marginal frequency, 713 and, 563–568
proportion common ratio, 401 for experiments, 541–543
strength of association of, 722–723 comparative studies. See experiments; formulas for, 473–477, 519–520
cause and effect observational studies margin of error. See margin of error
correlation not implying, 147 comparing, as characteristic of for mean of population, 562–573
experiments establishing. See experiment design, 252 for mean of samples. See sample
experiments comparison group, 250–251 means, confidence intervals for
lurking variables and, 147, 243–244 complement, 288 for observational studies, 547–548
observational studies and, 247, 279 complete list, 291 outliers and, 604–609
Celsius, conversion to, 76 completely randomized design, 264, for proportions, 429–430, 475–477
census, 219, 358 265–266, 274, 643 skewness and, 602–609
center conditional distribution of y given x, for slope, 762–763, 765
defined, 27 741 for surveys, 518–523
of normal distributions, 31 conditional probability, 325–333 See also significance tests
recentering. See recentering definition of, 330 confidence level, t-values and, 566
skewed distributions and, 34 inference and, 332–333 confounding
center, measures of medical tests and, 330–331 in comparative studies, 245–247,
computation of, 56–58 Multiplication Rule and, 327–329, 248–249
mean. See mean; mean of a 330 See also bias; random sampling
population; sample mean from the sample space, 326 constant strength, 107
median. See median symbol for, 326 constant values, 76
midrange, 73, 421 conditional relative frequencies, 693, control group, 250–251
central intervals, 90–92 713 convenience sample, defined, 224
Central Limit Theorem, 430–431 conditions assigned. See factors; correlation, ecological, 815–816
chance. See random sampling treatments correlation (r), 140–151
chi-square conditions of significance tests and appropriateness of linear model,
calculator technique, 678 calculator technique, 757 144–145
critical values, table of, 827 checking, 500 cause and effect not implied by, 147
degrees of freedom and, 679–680, for difference of two means, 625, 647 and coefficient of variation (r²),
695–696, 703–705 for difference of two proportions, 148–151
distribution of, 677–678, 684 531–532, 544 estimating, 140–141
expected frequency of, 684, fixed-level test for mean, 589 formula for, 142–144
694–695, 715 goodness-of-fit, 681, 682 interpretation of, 148–151
goodness-of-fit test, 674, 680–683, homogeneity, 697, 699 negative, 140
684–686 independence, 716, 718 positive, 140
history of, 674 for means, 589, 625 slope, relationship with, 146, 152
homogeneity test. See homogeneity, for one-sided tests, 508 steps for determining, 107–110
chi-square test of for proportions, 498, 500, 509 strength of, 140
independence. See independence, for slope, 757, 758, 759–760, 765 as term, 140
chi-square test of t-test, 820–821 transformations vs. use of, 779
sample size and, 678, 722–723 See also significance tests units absent from, 143
stacked bar graph, 693, 713–714 confidence intervals, 470–473, See also strength of association
symbol (χ²), 675 95% confidence interval, 473, 572 Counting, Fundamental Principle of,
table of values for, 679–680, 829 95% confidence interval, 473, 572 Counting, Fundamental Principle of,
test statistic, 674–675, 681 calculator technique, 476, 477 294–295
two-way table and, 692–693, 715 capture rate, 571–572, 604 critical values
z-test compared to, 685, 702–705 chart for reasonably likely events, calculator technique, 497
See also significance tests 470–472 chi-square distribution, table of,
conditions required for, 476 679–680, 829
884 Index
of confidence interval for a mean, increasing power in, 629–630 estimation
475–476 significance testing and, 624–628, of mean, 57
defined, 496 629–630, 646–647, 820–821 of median, 34
and level of significance, 496–497 difference of two proportions of parameter, 220
t-distribution, table of, 828 confidence interval for, 518–523, point estimators, 415–417
test statistic and, 496–497 541–543 of slope, 756, 764–765
See also t*-values sampling distribution of, 527–529, of standard deviation, 32, 563–568
cumulative percentage plots, 78–79 539–541 units of measurement and, 50
cumulative probabilities, 385–386, 396 significance test for, 526–535, ethics, observational studies and, 547
cumulative relative frequency plots, 544–546, 819 even numbers, and quartiles, finding,
78–79 difference, sampling distribution of, 59
445, 527–529 events
D discrete distributions, chi-square, 684 defined, 288
data disjoint events (mutually exclusive dependent. See dependent events
bivariate. See bivariate categorical events), 315–316 disjoint. See disjoint events
data; bivariate quantitative data Addition Rule and, 316–318 independent. See independent
categorical, 692–693, 805–807 defined, 291 events
elliptical, 154 property of, 317 rare, 412, 468, 593, 761
independence and. See distributions See also reasonably likely events;
independence, chi-square test of centers of. See center, measures of sample space
independent events with, 343–345 chi-square. See chi-square exact sampling distributions, 413–414
inference requirements and, 651, conditional, of y given x, 741 expected frequency, of chi-square tests,
652, 664 defined, 4 684, 694–695, 715
probability distributions from, modes of, 35–36 expected value
358–360, 361 probability. See probability of binomial distributions, 387–388
production of, 640 distributions defined, 365
quantitative (measurement), 807–808 recentering. See recentering in everyday situations, 369–370
raw, 27 rescaling. See rescaling formula for, 366
summaries. See summary statistics sampling. See sampling distributions of geometric distributions, 397–398
symmetry and, 292 shapes of. See bimodal distributions; linear transformations on, 371–375
time-series, 765 normal distributions; skewed standard deviation and, 364–367
univariate, 805–807 distributions; uniform symbol for, 365
See also paired comparisons (rectangular) distributions experiments and experiment design
decay, exponential, 179–180 summaries of, 37 balanced, 266
degrees of freedom See also graphs and graphing blind, 251
chi-square and, 679–680, 695–696, dot plots, 44 cause inferred through, 243–245,
703–705 calculator technique, 4 256, 279
difference of two means and, 621 defined, 4 characteristics of, 252–256
slope estimation and, 756, 764–765 mean, estimation of, 57 clinical trials, 247, 251
t-values and, 566, 584, 621 rounding rules and, 44 completely randomized design, 264,
dependent events software for statistics and, 44 265–266, 274, 643
defined, 339 usefulness of, 44 confidence intervals of, 541–543
matched pairs design and, 644 double-blind experiment, 251 confounding in, 245–247, 248–249
time as, 765 control groups, 250–251
See also independent events E double-blind, 251
dependent variables, 711 ecological correlations, 815–816 evidence, strength of, 617–618
design. See experiments and ecological fallacy, 816 flower production case study,
experiment design economics, sampling and, 219, 237, 800–803
determination, coefficient of, 148–151 480–481 lurking variables in, 243–244
deviations, 64–65 educational research, units and, 252 matched pairs design, 264–265,
See also standard deviation elliptical data, 154 643–644, 645
diagnostics errors observational studies contrasted to,
outliers and, 162–167, 172 degrees of freedom and, 703–705 247, 279
residual plots, 168, 172 hypotheses and, 501–506, 593–594 placebo effect, 250–251
vs. summaries, 162, 167, 172 measurement error problems, 592 protocol, 270
diagrams, tree, 294–295, 327–328 prediction, 120 random assignment, 539–541
difference of two means sum of squared, 123–125, 128 randomized block design, 268–270,
confidence intervals and, 618–623, See also margin of error; standard 274
629, 645–646 error; standard error of the mean
randomized comparative geometric distributions, 393–399 for proportions, 498, 509
experiment, 251 calculator technique, 394 for slope, 758, 760
randomized paired comparison defined, 396 for t-tests, 589
design, 264–265, 266–268 expected value of, 397–398 See also significance tests
randomizing. See randomizing formula for, 395–396 hypothesis tests. See significance tests
repeated measures design, 265, memoryless property of, 404
644–645 shape of, 398 I
replication, 252, 253 standard deviation of, 398 incorrect response bias, 226–227
response in, 244 as waiting-time problem, 393 independence, chi-square test of,
significance tests for, 544–546 geometric sequence, 401 711–724
surveys contrasted to, 278–279, 539 glossary, 829–838 calculator technique, 718
as term, 245 goodness-of-fit test, 674, 680–683, expected frequencies in, 715
treatments. See treatments 684–686 graphical display of data, 712–714
units. See units, experimental graphs and graphing homogeneity test vs., 720–722
variability, management of, 262–263, bar graphs. See bar graphs procedure for, 716–719
269, 270–274 boxplots, 61, 62–63 sample size and, 722–723
See also observational studies; choice of, summarized, 51–52 strength of association and, 722–723
random sampling; surveys column chart, 714 independent, as term, 345
experts. See judgment sample cumulative frequency plots, 78–79 independent events, 339–345
explanation of pattern, 107, 109 dot plots. See dot plots binomial distributions and, 383,
explanatory (predictor) variable, 120 histograms. See histograms 385, 386
exploration of data, defined, 3 independence tests and, 712–714 confidence interval for difference of
exponential functions questions raised by, 50 means and, 621
growth and decay, 179–186 scatterplots. See scatterplots data and, 343–345
and log transformations, 180–187, stemplots. See stemplots definition of, 339–340, 345
193 growth, exponential, 179, 182–186 geometric distributions and, 396
regression lines and, 774, 781 Multiplication Rule for, 341–343
extrapolation, 120, 129 H random variables, 372–373
heteroscedasticity, 107, 774–775, 778 sampling with and without
F histograms, 44–47 replacement and, 345
factors, 248 back-to-back, 55 variance and, 372–373, 445
Fahrenheit, conversion to Celsius, 76 bar graphs compared to, 50 See also independence, chi-square
fallacy, ecological, 816 bars of, 44–46 test of
false negatives, 331 bins of, 44 inference, 217–218
false positives, 331 calculator technique, 46 with categorical data, 805–807
first quartile. See lower quartile mean, estimation of, 57 chi-squares and. See chi-square
Fisher, R. A., 15, 269 relative frequency, 46–47 conditional probability and, 332–333
Fisher’s exact test, 819 software for statistics and, 45, 46 confidence intervals and. See
five-number summary, 60–61 usefulness of, 46 confidence intervals
calculator technique, 63 homogeneity, chi-square test of, data requirements for, 651, 652, 664
fixed-level testing, 588–593 692–705 defined, 3
flower production case study, 800–803 categorical data with two variables observational studies and, 547, 812
frame, 224–225 and, 692–693 with quantitative data, 807–808
frequency tables degrees of freedom, 696, 703–705 randomization and, 249
calculator technique, 68, 290 expected frequencies, 694–695 significance testing and. See
chi-squares and, 692–693, 712–713 independence test vs., 720–722 significance tests
summaries from, 67–68 multiple z-tests vs. one chi-square simulation and, 12–16
See also relative frequency test, 702–703 See also cause and effect;
histograms; relative frequency procedure for, 697–702 experiments; observational
plots test statistic, 695–696, 697, 699 studies; random sampling
Fundamental Principle of Counting, horizontal axis, 83 inflection points, 31–32
294–295 hypotheses informed consent, 542
alternate. See alternate hypothesis interpolation, 120, 129
G errors and, 501–506, 593–594 interquartile range (IQR), 59–60, 61–62
Galton, Francis, 152 for goodness-of-fit test, 681, 682–683 IQR. See interquartile range
gaps, 36–37 for homogeneity test, 697, 698–699
Gauss, Carl Friedrich, 123 for independence test, 716, 718 J
generalization of pattern, 107, 108 for means, 585–587, 589, 625 joint frequency, 693, 713
genes, 490 null. See null hypothesis judgment sample, 224
L logarithmic transformations, 193 mean of a population (μ)
Law of Large Numbers, 293 exponential functions and, 180–187, capture rate for, 571–572
least squares regression line, 123 193 confidence intervals for, 562–573
calculator technique, 124, 126, 131, power transformations and, 13, margin of error for, 572–573
141 187–189, 774–779 point estimators, 415–417
computer output for, 127–128 skewed distributions and, 605 significance testing of, 580–596. See
equation, 125 See also exponential functions also t-tests
finding, 123–124, 125–126 logistic functions, 779 symbol for, 430
as line of means, 152–153 lower quartile test statistic for, 581–582, 589, 625
linear models and, 739–741, defined, 34 means, line of. See least squares
772–773 in five-number summary, 61 regression line
point of averages of, 125 interquartile range and, 59 measurement
properties of, 125 lurking variables, 107, 109, 147, inaccurate, bias and, 227
slope of, 125, 126, 128 243–244 units of. See units of measurement
sum of squared errors (SSE), measurement data, 807–808
123–125, 128
M measurement error problems, 592
margin of error measures of center. See center,
as term, 152
confidence interval of a mean and, measures of
transformations and, 189, 774–779,
572–573 median
781
defined, 476–477 computation of, 57–58
true regression line equation, 739
sample size and, 479–481, 572 defined, 57
y-intercept of, 125, 126, 128, 739
marginal frequencies, 693 estimation of, 34
See also slope
marginal relative frequency, 693 as resistant to outliers, 79
leaves, 48
matched pairs design, 264–265, of skewed distribution, 34
Legendre, Adrien-Marie, 123
643–644, 645 spread around (interquartile range),
Leonardo da Vinci, 141
maximum 59–60, 61–62
level of confidence, 566
adjustment of sample maximum, medical experiments. See experiments
level of significance, 496–497, 500, 588
426 medical tests, 330–331
levels of factors, 248
in five-number summary, 61 midrange, 73, 421
line of averages. See least squares
range and, 61 minimum
regression line
skewed distributions and, 34 in five-number summary, 61
line of means. See least squares
Mayo, Charles, 246, 247 range and, 61
regression line
mean (x̄) skewed distributions and, 34
line of symmetry, 31, 32
calculator technique, 66 mode
linear equations
computation of, 56 defined, 31
finding, from two points, 118–119
confidence intervals for. See under of normal distributions, 31
slope-intercept form, 116–119
mean of a population; sample two or more, 35–36
transformations to find. See
mean models. See linear models; probability
transformations
defined, 32, 56 models
linear models
deviations from, 64–65 modified boxplot, 62–63
checking fit of, 771–773
estimation of, 57 Multiplication Rule, 327–329, 330
correlation and appropriateness of,
of frequency tables, 67–68 for independent events, 341–343
144–145
midrange and, 73 mutually exclusive outcomes. See
selection of, 188
of a population. See mean of a disjoint events
true regression line and, 739–741,
population
772–773
linear transformations, 75–76, 371–375
of a probability distribution. See N
expected value negative trend, 107
linearity, bivariate data and, 107, 108
regression toward, 152–153 nonresponse bias, 225
lines
of a sample. See sample mean normal distributions, 30–33
least squares. See least squares
of a sampling distribution, 430, 435, binomial distributions and, 388
regression line
459 calculator technique, 33
for prediction, 120–122
as sensitive to outliers, 79 center of, 31
regression. See least squares
significance tests for. See sample central intervals for, 90–92
regression line
means, significance tests and; common sources of, 33
as summaries, generally, 116–119,
t-tests confidence interval of a mean and,
129–130
standard error of. See standard error 568, 569
Lister, Joseph, 707
of the mean defined, 31
log-log transformations, 187–189,
as summary statistic, choice of, 74 difference of two means from paired
774–779
trimmed, 73 observations and, 645, 647
inflection points of, 31, 32 units and, 252–254 meaningful comparisons, 650–651
mean of, 32 See also experiments; surveys significance tests and, 646–647
mode of, 31 observed significance level. See transforming non-normal data,
rare events, 412, 468, 593, 761 P-values 648–649
reasonably likely values. See odd numbers, and quartiles, finding, parameter, 220
reasonably likely events 59 Pearson, Karl, 674
sample size and use of. See sample one-sided (one-tailed) tests of percentiles, 78–79
size significance, 507–508, 594–595 permutation test, 821
sampling distributions and. See opinion polls placebo, 250–251
sampling distributions, normal methods of, 468 placebo effect, 250
distributions and questions to ask about, 482 plausible percentages, 471–472
shape of, 31, 83 See also surveys plausible values, 562–563
significance test for a mean and, 589 “or,” as term, 315 point estimators, 415–417
spread of, 31 outcomes point of averages, 125
standard. See standard normal disjoint. See disjoint events pooled estimate, 530, 629–630
distributions equally likely, 288 population
summary of, 37, 74, 92 Fundamental Principle of Counting, census of, 219, 358
symmetry line of, 31, 32 294–295 defined, 219
t-distributions and, 584 list of. See sample space frame contrasted to, 224–225
table of probabilities, 826–827 See also events mean of (μ). See mean of a
unknown percentage problems and, outliers, 36–37 population
84 bivariate data and, 107, 108, 162 parameter of, 220
unknown value problems and, 84–85 boxplots and, 62–63 sampling of. See random sampling
usefulness of, 79 confidence intervals of a mean and, size of (N). See population size
x-axis of, stretching, 83 604–609 standard deviation of. See
not A (complement), 288 population standard deviation (σ)
null hypothesis, 492 influence of, 77, 162–167, 172 unknown, 358
defined, 498 and interquartile range, value of, population pyramid, 55
for difference of two means, 625, 647 61–62 population size (N)
for difference of two proportions, scatterplots and, 162–167 defined, 219
530, 532, 544, 545 replacement and, 326, 433
errors and, 501–506, 593–594 P representativeness of sample and,
for experiments, 544, 545 P-values 433–434, 479–481
for means, 585–587, 589, 625 calculator technique, 496, 585, 680 sampling distribution of the mean
for one-sided tests, 508 computing, 495–496 and, 434
P-value and, 494–495, 585–589 defined, 494 symbol for, 430
power of test and, 503–506, 594, population standard deviation (σ)
629–630 647, 650 defined, 412
randomness not involved and, 554 for difference of two proportions, estimating, 563–568
rejection of, 587–589, 593–594, 761 531, 532, 544–545 reasonably likely events and, 412
rejection of, language for, 510 for experiments, 544–545 symbol for, 430
for surveys, 530, 532 for homogeneity test, 696, 697–698 See also standard deviation; standard
symbol for, 492 for independence test, 716–717, 718 error
t-distributions and, 582–583 interpretation of, 494–495, 496 positive trend, 107
See also hypotheses for means, 585–587, 589–590, potentially influential outliers, 163–167
594–595 power functions, equation of, 188
O for one-sided tests, 507, 508 power of tests, 503–506, 594, 629–630
observational studies one-tailed, 594–595 power transformations, 190–192, 193
baseball salaries case study, 810–816 simulation of, 814 logarithms and, 187–189, 774–779
confidence interval for, 547–548 slope and, 761 regression lines and, 774, 781
confounding in, 245–247, 248–249 for surveys, 531, 532 predicted (response) variable, 120
defined, 246 t-statistic and, 593–594 predicted value, 120, 130
ethics and, 547 two-tailed, 586 prediction
experiments contrasted to, 247, 279 paired comparisons, 651–652 calculator technique, 122
factors in, 248 chance mechanisms behind, 651, error of, 120
inference and, 547, 812 652 extrapolation, 120, 129
levels of factors, 248 confidence interval and, 644, interpolation, 120, 129
significance test for, 548–549 645–646 lines used for, 120–122
surveys contrasted to, 279 designs of experiment and, 540–545
prediction error, 120 proportion random digit table, 301–309,
predictor (explanatory) variable, 120 chi-square tests of. See chi-square 359–360, 828
probability confidence intervals and. See random number generator or table,
Addition Rule. See Addition Rule confidence intervals 30, 232, 234
assignment of, 289–291 difference of two. See difference of replacement and. See replacement,
complement event, 288 two proportions sampling with and without
conditional. See conditional of samples. See sample proportion simple random sample, 231–232, 240
probability significance tests and, 498–500, stratified random sample, 234–237,
defined, 288 509–510 240
distributions of. See probability significance tests and difference of, systematic sample with random
distributions 529–534, 544–546, 819 start, 238–239, 240
estimated, simulation used for, test statistic for, 493–494, 499–500 two-stage sample, 238, 240
301–309 test statistic for difference of, 531, See also randomizing
events and. See events; independent 532, 544 random variables, 361, 372–373
events protocol, 270 random variation, 739
Law of Large Numbers and, 293 randomization test, 821
Multiplication Rule. See Q randomized block design, 268–270,
Multiplication Rule quantitative data, 807–808 274
of number of successes, 446–448 quantitative variables randomized comparative experiment,
observed data and, 289 bivariate. See bivariate quantitative 251
P-values as, 587 data randomized paired comparison design,
of proportions, 452–453 defined, 44 264–265, 266–268
for sample means, 431–433 graphs of. See dot plots; histograms; randomizing
sample space. See sample space scatterplots; stemplots assignment in an experiment,
for sample sums, 434–436 inference involving. See least squares 539–541
sampling distinguished from, 358 regression line bias avoidance and, 231
significance test P-values. See quartiles as characteristic of experiment
P-values defined, 34 design, 252, 255, 278–279
simulation used to estimate, finding of, 59–60 importance of, 248–249
301–309 interquartile range (IQR), 59–60, observational studies and, 547, 549
subjective estimates of, 289–290 61–62 See also random sampling
symmetry and, 289, 290, 292 lower, 34, 59, 61 range, defined, 61
probability distributions, 357–399 odd number of cases, middle value rare events, 412, 468, 593, 761
binomial distributions. See binomial in, 59 raw data, 27
distributions percentiles, 78–79 Rayleigh, Lord, 36–37
calculator technique, 367 as resistant to outliers, 79 reasonably likely events, 468–469
data as basis of, 358–360, 361 upper, 34, 59, 61 chart for, 470–472
defined, 289 questionnaire bias, 225–226 defined, 412
geometric. See geometric R recentering, 75–76, 79
distributions r. See correlation of probability distributions, 371–375
linear transformations of, 371–375 random digit table, 301–309, 359–360, standardizing with, 83, 85–87
mean of. See expected value 828 See also transformations
random variables, 361, 372–373 random number generator, 30, 232, 234 reciprocal transformations, 190,
standard deviation of, 364–367, random number table, 232 605–607
371–375 random sampling, 231–240 rectangular (uniform) distributions,
table of standard normal, 826–827 bias as overcome by, 231 28–30, 37
theory as basis of, 361–364 cluster sample, 237–238, 240 regression effect (regression toward the
variance of, 365–367, 371–375 confidence interval of a mean and, mean), 152–153
probability models 568, 569 regression lines. See least squares
appropriateness of, 358 confounding as overcome by, regression line
conditional probability and, 332–333 248–249 relative frequency histograms, 46–47
construction of, 288–296 control groups, 250–251 relative frequency plots, 78–79
data and, 292, 343–344 independent selection and, 345 repeated measures design, 265,
defined, 96 lack of, experiments and, 542–543, 644–645
simulation and, 301–309 554 replacement, sampling with and
symmetry and, 292 normal distribution and, 32–33 without
wrong model, 593, 761 placebo effect, 250–251 conditional probability and, 326
probability sample, 231 independent events and, 345
sample size vs. population size and, estimation of standard deviation stratification and, 236–237
326, 433 and, 565–568 strength of association and, 722–723
sampling distribution properties sample size, 568, 569, 572 symbol for, 430
and, 433, 444–445 skewness and outliers, effect of, t-procedures and, 568, 607–608, 629
replication, 252, 253 602–609 sample space
rescaling, 75–76, 79 sample means, significance tests and, data and symmetry and, 292
of probability distributions, 371–375 580–596 defined, 291
standardizing with, 83, 85–87 components of, 589–590 Fundamental Principle of Counting
See also transformations for difference between two means, and, 294–295
residual plots, 167–172 624–628, 629–630 samples
calculator technique, 169 differences, based on paired defined, 219
construction of, 168–169 samples, 546–547 economics and, 219, 237, 480–481
defined, 168 fixed-level testing, 588–593 generalizing from, 217–218
as diagnostic plot, 168, 172 one-tailed tests, 594–595 mean of. See sample mean
interpretation of, 170–171 P-value and, 585–587, 594–595 probability, 231
time and, 184, 193 rejection of null hypothesis, reasons for, 219, 227
types of, 171–172 meaning of, 593–594 size of. See sample size
residuals sample proportion (p̂) standard deviation of. See standard
calculator technique, 122 assigning variables, 521 deviation (SD)
defined, 120 Law of Large Numbers and, 293 sums of, probabilities of, 434–436
finding, 120–122 mean and, 450 variation in, 491
homogeneity of, in power model, pooled estimate, 530, 629–630 See also bias; random sampling
188 probabilities of, finding, 452–453 sampling bias. See sample selection bias
regression slope and, 772–773 reasonably likely, 468–469 sampling distinguished from
as sum of squared errors, 123–125, probability, 358
128 standard deviation and, 450–451 See also random sampling
See also residual plots symbol for, 448 sampling distributions, 410–453,
resistance to outliers, 77 testing. See significance tests 458–459
response, 244 sample selection bias, 223–225 calculator technique, 412
response bias, 225–227 sample size Central Limit Theorem, 430–431
response variable, 120 binomial distributions and, 386 defined, 410
robustness, 602, 607 Central Limit Theorem and, 431 of a difference, 445
row marginal frequency, 713 chi-square and, 678, 722–723 of a difference of two proportions
confidence interval of a mean and, for a survey, 527–529
S 568, 569, 572 of a difference of two proportions
sample mean (x̄) difference of two means and, 621, 647
comparing, usefulness of, 650–651 economics and, 219, 480–481 exact, 413–414
confidence intervals. See sample estimation formula, 480–481 Fisher’s exact test, 819
means, confidence intervals and 15/40 guideline for inferences, listing samples, generation by, 417
importance of, 426 607–608 mean of, 430, 435, 459
point estimators, 415–417 Law of Large Numbers and, 293 of number of successes, 446–448,
finding probabilities of, 431–433 margin of error and, 479–481, 572 453
reasonably likely outcomes for, 412 power and Type II errors and, of the sample mean. See sampling
sample proportion as form of, 450 503–506, 594, 629 distribution of the sample mean
sampling distribution of. See replacement and, 326, 433 of sample proportion, 448–453
sampling distribution of the replications as equivalent to, 252 samples generated for. See random
sample mean representativeness of, population sampling
significance tests for. See sample size and, 433–434, 479–481 shape, center, and spread, 411–412
means, significance tests and sample proportion and, 450 simulation, generation by, 410–411,
symbol for, 430 sampling distribution and, 564 417, 428–429
use of, in text, 426 sampling distribution of the sample skewed distributions and, 428, 430,
sample means, confidence intervals mean and, 433–434, 437 431, 433–434, 564
and, 573 sampling distribution of the sum of standard deviation of. See standard
capture rate and, 571–572 a sample and, 435 error
construction of, 568–571 significance test for the difference of of the sum and difference, 445
for difference between two means, two proportions, 531 usefulness of, 410, 459
618–623, 629 skewness and outliers and, 607–608 sampling distributions, normal
differences, paired comparisons, standard error of the mean and, 431 distributions and, 411–412
645–646
Central Limit Theorem and, errors, power of tests to avoid, sampling distributions and, 428,
430–431 503–506, 594, 629–630 430, 431, 433–434, 564
determination of shape, 453 errors, types of, 501–506 skewed left, 33
importance of, 426 for experiments, 544–546 skewed right, 33
number of successes and, 447 fixed-level, 588–593 summary of, 37, 74
sample proportion and, 450 formal language of, 498–500 transformations of. See
sample size and, 433–434, 435, 437 goodness-of-fit, 674, 680–683, transformations
sum and difference and, 445 684–686 slope
sampling distribution of the sample homogeneity test, 697–702, 720–722 confidence interval for, 762–763,
mean, 427, 436–437 hypotheses of. See hypotheses 765
and probabilities for sample means, independence test, 612–616, correlation coefficient r and, 146,
431–433 720–722 152
and probabilities for sample totals, informal, 491–493 defined, 116
434–436 level of significance, 496–497, 500, degrees of freedom and, 756,
properties of, 430–431, 433–434 588 764–765
shape, center, and spread of, for mean of population, 580–596. interpretation of, 116–119
427–431, 433–434 See also t-tests of least squares regression line, 125,
simulation of, 428–429 for mean of sample. See sample 126, 128
and sum of a sample, 434–436 means, significance tests and significance test for, 757–761, 765
sampling frame, 224–225 for observational studies, 548–549 standard error of, 741–748
scale. See rescaling one-sided (one-tailed), 507–508, symbols for (b1, β1), 125, 739
scatterplots, 106–110 594–595 test statistic for, 755–756, 758,
calculator technique, 107, 121 P-values. See P-values 760–761, 790
interpretation of, 106–107 power of tests, 503–506, 594, variability in estimated, 739–749
outliers on, 162–167 629–630 slope-intercept form, 116–119
patterns in, steps for describing, for proportions, 498–500, 509–510 software, statistical
107–110 for slope, 757–761, 765 dot plots, 44
residual plots. See residual plots statistical significance, 490, 492, 501 histograms, 45, 46
shape-changing transformations. See for surveys, 529–534 least squares line, 127–128
transformations test statistic. See test statistic quartiles, finding, 59
screening tests, 330–331 See also confidence intervals residual plots, 171
SD. See standard deviation simple random sample (SRS), 231–232, stemplots, 49
SE. See standard error 240 See also Calculator Notes references;
second quartile. See median See also replacement, sampling with calculators
segmented or stacked bar graph, 693, and without specificity, 331
713–714 simulation split stems, 48
sensitivity, 331 age-neutral, 13–15 spread
sensitivity to outliers, 77 calculator technique, 14, 306 around mean. See standard
shape, 27 defined, 13 deviation
of binomial distribution, 388 key steps in, 16 around median (interquartile
of bivariate data, 107, 108 probability distributions using, range), 59–60
See also bimodal distributions; 358–360, 361 defined, 27
normal distributions; skewed probability estimation using, of normal distributions, 31
distributions; uniform 301–309 skewed distributions and, 34
(rectangular) distributions sampling distributions generated by, square root transformation, 649–650
significance tests, 509–510 410–411, 417, 428–429 SRS (simple random sample), 231–232,
components of, generally, 500, Siri, W. E., 214 240
509–510 size SSE (sum of squared errors), 123–125,
conditions for use. See conditions of of population. See population size 128
significance tests of sample. See sample size stacked or segmented bar graph, 693,
defined, 490 size bias, 223 713–714
for difference of two means, skewed distributions, 33–35 standard deviation (SD)
624–628, 629–630, 820–821 confidence intervals and, 564, of binomial distributions, 387–388
for difference of two means from 602–609 calculator technique, 66
paired observations, 646–647 defined, 33 computation of, 66
for difference of two proportions, geometric distributions and, 398 defined, 32
529–534, 544–546, 819 median of, 34 estimation of, 32, 563–568
quartiles and, 34 formula for, 65
of frequency table, 67–68 independence tests and, 722–723 skewness and outliers and, 602–609
of geometric distributions, 398 scatterplots and, 107, 108 slope and, 756
model checking and, 772–773 strength of evidence, 617–618 transformations and, 605–607
of populations. See population Strutt, John William, 36–37 two samples vs. paired design,
standard deviation (σ) sum (∑), as symbol, 56 t-tables, 565–566, 828
of probability distributions, sum of squared errors (SSE), 123–125, t-tables, 565–566, 828
364–367, 371–375 128 t-tests, 589–593
of sample proportion, 450–451 summary statistics conditions for, 589, 820–821
of sampling distributions. See bias and, 415, 416 paired data vs. single sample in,
standard error (SE) of bivariate data, 107–109, 116–119 640–641
as sensitive to outliers, 79 calculator technique, 66 use of, 640
as summary statistic, choice of, 74 choice of, generally, 4–5, 28, 74–75 t*-values
symbols for, 430 defined, 12 calculator technique, 566, 584
variance and. See variance diagnostics vs., 162 confidence intervals based on,
standard error of the mean distribution of. See sampling 565–573, 618–623, 762–763
defined, 430 distributions confidence level and, 566
sample size and, 431 of distributions, 37 defined, 565
symbol for (σ_x̄), 430 five-number, 60–61, 63 outliers and, 604
standard error (SE) outliers, influence of, 77 621
defined, 412 point estimators, 415–417 outliers and, 604
of difference of two proportions, precise, 415 P-values and, 593–594
519–520 proportion. See sample proportion significance testing and, 624–628,
of estimated slope, 741–748 recentering of, 75–76, 79 760–761
of the number of successes, 447 rescaling of, 75–76, 79 t-distributions, 582–584, 828
point estimators and, 415 See also center, measures of; t-tables, 565–566, 828
of the sample proportion, 450 diagnostics; spread z-values compared to, 583
of sampling distributions, 435, 459 summary tables, 6 tables
of the sum of a sample, 435 surveys chi-square values, 679–680, 829
standard normal distributions, 83–85 American Community Survey frequency. See frequency tables
defined, 83 (ACS) case study, 803–808 probability distribution, 365–366,
solving unknown percentage and bias and. See bias 375, 824–825
value problem, 87–90 causal inferences and, 278, 279 random digits, 301–309, 359–360,
standardizing, 83, 85–87 experiments contrasted to, 278–279, 830
table of probabilities, 826–827 539 random number, 232
unstandardizing, 86 parameters and, 220 reading vertically, 4
See also z-score random sampling for. See random summary, 6
standard normal probabilities, table of, sampling t-tables, 565–566, 828
824–825 samples used in, reasons for, two-way, 692–693, 715
standard units. See z-score 219–220 z-table, 84–85, 826–827
standardizing, 85–87 significance tests for, 529–534 tail probability, 586
statistic, as estimate from sample, 220 See also experiments; observational test statistic (t)
statistical significance, 490, 492, 501 studies chi-square tests, 674–675, 681
stem-and-leaf plot. See stemplots symbols, glossary of, 831–832 critical values and, 496–497
stemplots, 47–50 symmetry defined, 493–494
back-to-back, 48 defined, 33 for difference of two means, 625,
rounding rules and, 49 line of, 31, 32 647, 650
software for statistics and, 49 of normal distribution, 31 for difference of two proportions,
split stems, 48 probability and, 289, 290, 292 531, 532, 544
truncation of values, 49 systematic sample with random start, error types and, 501–502, 593–594
usefulness of, 49 238–239, 240 of experiments, 544
stems, 48 goodness-of-fit test, 681, 683
strata, defined, 235 T homogeneity test, 695–696, 697, 699
stratification, defined, 235 t-distribution, table of critical values, for independence test, 716, 718
stratified random sample, 234–237, 826 for mean of a population, 581–582,
240 t-distributions, 582–584, 826 589, 625
strength of association t-procedures normal distribution of, 531
correlation and. See correlation pooled vs. unpooled, 629–630 for one-sided tests, 508
defined, 722 as robust, 602, 607 paired differences, 647
sample size and, 568, 607–608, 629
for proportions, 493–494, 499–500
for slope, 755–756, 758, 760–761, 790
for surveys, 531, 532
symbol for, 494
as t-statistic, 631
use of, 499
as z-score, 494
See also t-procedures; z-score
testing, samples and, 219
tests of significance. See significance tests
theoretical probability, probability distributions from, 361–364
third quartile. See upper quartile
time
as dependent data, 765
and patterns of data, 179, 193
plots of, 822
transformations and, 184, 193
transformations, 179, 193
calculator technique, 192
exponential functions and. See exponential functions
heteroscedasticity and, 774–775, 778
linear vs. nonlinear, 179
log-log, 187–189, 774–779
logarithmic. See logarithmic transformations
of paired data, 648–649
power. See power transformations
recentering. See recentering
reciprocal transformations, 190, 605–607
regression lines and, 189, 774–779, 781
rescaling. See rescaling
of skewness and outliers, data with, 605–607, 648–649
square root transformations, 649–650
treatments
balanced design and, 266
defined, 244
variability between vs. within, 262–263, 270–272
tree diagrams, 294–295, 327–328
trend, 107, 108
trimmed mean, 73
truthfulness, surveys and, 226–227
two-stage sample, 238, 240
two-way tables, 692–693, 715
Type I error, 501–503, 506, 593
Type II error, 501–506, 594
U
uniform (rectangular) distributions, 28–30, 37
unimodal distributions, 35
units, experimental, 252–255
blocks, 267, 268–270, 273, 274
defined, 252
units of measurement
calculator technique exploring changes in, 76
correlation as quantity without, 143
estimation and, 50
standardizing. See z-score
summary statistics and changes in, 75–76
units of population, 219
univariate data
categorical, 805
measurement, 807
unknown percentage problems, 84
recentering and rescaling and, 83
solving, 87–90
unknown values problems, 84–85
recentering and rescaling and, 83
solving, 87–90
upper quartile
defined, 34
in five-number summary, 61
interquartile range and, 59
V
variability
as challenge in data analysis, 4
interquartile range and, 59
linear models and, 741
management of, 262–263, 269, 270–274
point estimators and, 415, 417
slope and, 739–749
variables
bivariate. See bivariate quantitative data
categorical. See categorical variables
defined, 4
identification of, 107
lurking, 107, 109, 147, 243–244
predictor (explanatory), 120
quantitative. See quantitative variables
random, 361, 372–373
response (predicted), 120
variance
calculator technique, 367
defined, 65
independent selection and, 372–373, 445
pooling and, 630
of a probability distribution, 365–367, 371–375
sampling distributions of the sum and difference and, 445
variation
coefficient of determination and, 148–151
random, 739
in sampling, 491
varying strength, 107
vertical axis, of histograms, 45
voluntary response bias, 224, 246
W
waiting-time problems, 393
weak association, 107, 108
See also correlation; strength of association
Westvaco case study, 3–8, 11–16, 817–822
whiskers, 61, 62
X
x-axis, 83
Y
y-axis, histograms and, 45
y-intercept, symbols for (b0, 0), 125, 739
See also slope-intercept form
Z
z-score, 83
calculator technique, 84, 85, 89
chi-square compared to, 685, 702–705
comparisons using, 87
computation of, 86
confidence intervals and, 476–477
correlation and, 142–143
defined, 86
finding value with known, 86
significance testing and, 819
t-values compared to, 583
test statistic as, 494
unstandardizing, 86
using z-table, 84–85
z*. See critical values
Photo Credits
Chapter 1
2: John Olson/Corbis. 4: Zigy Kaluzny/Getty Images. 10: Jeff Greenberg/The Image Works.
Chapter 2
26: Romeo Gacad/Stringer/Getty Images. 31: Cheryl Fenton. 34: Kennan Ward/Corbis. 36: Emilio Segrè Visual Archives/
American Institute of Physics. 39: Steve Dipaola/Reuters/Corbis. 41: Louie Psihoyos/Corbis. 43: TIM FITZHARRIS/Minden
Pictures. 46: B.S.P.I./Corbis. 52: Frans Lanting/Corbis. 53: A. Ramey/PhotoEdit. 64: Cheryl Fenton. 82: Alan Schein
Photography/Corbis. 88: Royalty-Free/Corbis. 89: Anna Peisl/zefa/Corbis.
Chapter 3
104: Getty Images. 111: Getty Images. 112: David Cooper/Toronto Star/ZUMA/Corbis. 113: Royalty-Free/Corbis. 117: Cheryl
Fenton. 125: Frank Cezus/Getty Images. 131: Getty Images. 132: James A. Sugar/Corbis. 138: Charles O’Rear/Corbis. 149:
Cheryl Fenton. 152: Royalty-Free/Corbis. 157: Reuters/Corbis. 159: Scott Smith/Corbis. 160: Stephen Mallon/Getty Images.
163: Laura Murray. 173: Frank Trapper/Corbis. 175: Galen Rowell/Corbis. 180: Cheryl Fenton. 187: Tony Anderson/Getty
Images. 193: Alan Kearney/Getty Images. 194: Alan Schein Photography/Corbis. 195: Sean Garnsworthy/Getty Images.
Chapter 4
216: Fredrik D. Bodin/Stock Boston/PictureQuest. 219: Cheryl Fenton. 220: Tom McCarthy/PictureQuest. 224: CALVIN AND
HOBBES © 1995 Watterson. Reprinted with permission of UNIVERSAL PRESS SYNDICATE. All rights reserved. 226: © The
New Yorker Collection 1989 Mick Stevens from cartoonbank.com. All rights reserved. 227: CALVIN AND HOBBES © 1995
Watterson. Reprinted with permission of UNIVERSAL PRESS SYNDICATE. All rights reserved. 232: Larry Gonick. 250: Scott
Adams/Dist. by United Feature Syndicate, Inc. 258: Cheryl Fenton. 260: Matt Perry. 261: Michael Melford/Getty Images.
264: Cheryl Fenton. 270: Cheryl Fenton. 271: Cheryl Fenton. 276: Vince Streano/Corbis. 277: David Young Wolff/Getty Images.
Chapter 5
286: Hulton Deutsch Collection/Corbis. 292: Cheryl Fenton. 295: Cheryl Fenton. 297: Leonard de Selva/Corbis. 299: Cheryl
Fenton. 318: Richard Cummins/Corbis. 325: Matthew Polak/Corbis. 335: MLB Photos via Getty Images. 344: Cheryl Fenton.
Chapter 6
356: Paul A. Souders/Corbis. 366: Jose Luis Pelaez, Inc./Corbis. 386: Hulton Archive/Getty Images. 387: Cheryl Fenton.
392: Natalie Fobes/Corbis. 393: AFP/Corbis. 400: Pierre Vauthey/Corbis. 403: António Rafael C. Paiva. 404: Tom Stewart/
Corbis.
Chapter 7
408: The Image Works. 416: NASA. 418: NASA. 420: Scott Smith/Corbis. 423: Richard Hamilton Smith/Corbis. 438: Bettmann/
Corbis. 439: Royalty-Free/Corbis. 448: Martyn Goddard/Corbis. 455: Brooks Kraft/Corbis. 456: Chuck Savage/Corbis.
460: Keren Su/Corbis.
Chapter 8
466: Getty Images. 470 (top): Getty Images; (bottom): Cheryl Fenton. 473: Nancy Kaszerman/ZUMA/Corbis. 485: James W.
Porter/Corbis. 491: Cheryl Fenton. 494: From: Causal knowledge and imitation/emulation switching in chimpanzees (Pan troglodytes)
and children (Homo sapiens), Animal Cognition; Victoria Horner and Andrew Whiten, Springer Berlin/Heidelberg; Volume 8,
Number 3/July 2005, pp. 164–181. 512: Michael Newman/PhotoEdit, Inc. 513: Eric O’Connell/Getty Images. 517: Peter Horree/
Alamy. 525: Paul A. Souders/Corbis. 527: Cheryl Fenton. 538: Engel & Gielen/Getty Images. 553: Bettmann/Corbis.
Chapter 9
560: Robert Essel NYC/Corbis. 569: Colin Cuthbert/Photo Researchers, Inc. 591: Roger Ressmeyer/Corbis. 611: NASA.
614: Michael Newman/PhotoEdit, Inc. 617: Raymond Gehman/Corbis. 621: Jose Luis Pelaez, Inc./Corbis. 636: Dann Coffey/
Getty Images. 640: Cheryl Fenton. 658: Bonnie Kamen/PhotoEdit, Inc. 661: Bettmann/Corbis.
Chapter 10
632: Robert Maass/Corbis. 678: Andres Liivamagi/Alamy. 691: Connie Ricca/Corbis. 692: Cheryl Fenton. 698: Reuters/Corbis.
700: Getty Images. 708: Kate Powers/Getty Images. 717: Peter Turnley/Corbis.
Chapter 11
736: AP/Wide World Photos. 737: NASA. 745: Getty Images. 754: Alan Schein Photography/Corbis. 756: Peter Menzel/Stock Boston/
PictureQuest. 766: NASA. 769: Per Eriksson/Getty Images. 775: Steve Maslowski/Photo Researchers, Inc. 777: Liba Taylor/Corbis.
Chapter 12
798: Roger Ball/Corbis. 800: Michelle Garrett/Corbis. 810: MLB Photos via Getty Images. 812: Getty Images.