Moore-212007    pbs    November 27, 2007    9:50

SOLUTIONS TO ODD-NUMBERED EXERCISES


CHAPTER 1
1.1 Answers will vary.

1.3 (a) The bars can be drawn in any order since major is a categorical variable. (b) The percents add to 98.5%, so a pie chart would not be appropriate without a category for Other.

1.5 The histogram is unimodal and fairly symmetric except for a high outlier (Toyota Prius at 51). The range of the data without the outlier is 19 to 37 mpg.

1.7 (a) See solutions to Exercise 1.5. (b) The Rolls-Royce Phantom (19 mpg) and Mercedes-Benz E55 AMG (21 mpg) have the lowest mileage. However, there are no low outliers.

1.9 (a) It is fairly symmetric with one low outlier. Excluding the low outlier, the spread is from about −18% to 18%. (b) About 1%. (c) The smallest is about −18% and the largest is about 18%. (d) About 35% to 40%.

1.11 From the stemplot, the center is $28 and the spread is from $3 to $93. There are no obvious outliers. Examination of the stemplot shows the distribution is clearly right-skewed. With or without split stems, the conclusions are the same.

1.13 (a) and (b) are bar graphs. (c) One example of how to do this is by using the clustered bar graph in SPSS.

1.15 (a) The top five are Texas, Minnesota, Oklahoma, Missouri, and Illinois. The bottom five are Alaska, Puerto Rico, Rhode Island, Nevada, and Vermont. (b) The distribution is strongly unimodal and right-skewed with a large peak in the 0 ≤ damage < 10 group. The range is 0 to 90. The three states with the most damage (Texas, Minnesota, and Oklahoma) may be outliers. (c) Answers will vary.

1.17 (a) Alaska has 5.7% and Florida has 17.6% older residents. (b) The distribution is unimodal and symmetric with a peak around 13%. Without Alaska and Florida, the range is 8.5% to 15.6%.

1.19 GM had more complaints than Toyota each year, but both companies seem to have fewer complaints in general over time.

1.21 (a) The individuals are the different cars. (b) The variables are vehicle type (categorical), transmission type (categorical), number of cylinders (can be considered categorical because the number of cylinders divides the cars into only a few categories), city MPG (quantitative), and highway MPG (quantitative).

1.23 (a) The different public mutual funds are the individuals. (b) The variables category and largest holding are both categorical, while the variables net assets and year-to-date return are quantitative. (c) Net assets are in millions of dollars, and year-to-date return is given as a percent.

1.25 Some possible variables are cost of living, taxes, utility costs, number of similar facilities in the area, and average age of residents in the location.
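Several of these answers (1.5, 1.11) read a distribution's shape, center, and spread off a stemplot. A stemplot can be sketched in a few lines of Python; the mileage values below are hypothetical illustration data, not the exercise's data set.

```python
from collections import defaultdict

def stemplot(values, stem_unit=10):
    """Group each value into a stem (tens digit) and a leaf (ones digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // stem_unit].append(v % stem_unit)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem:2d} | " + "".join(str(leaf) for leaf in leaves[stem]))
    return "\n".join(lines)

# Hypothetical highway-mileage values (not the textbook's data)
mpg = [19, 21, 24, 25, 25, 27, 28, 30, 31, 33, 37, 51]
print(stemplot(mpg))
```

A gap in the stems (here the empty 4-stem row) is what makes a high outlier such as 51 visually obvious.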

1.27 The distribution of dates of coins will be skewed to the left because as the dates go further back there are fewer of these coins currently in circulation. Various left-skewed sketches are possible.

1.29 (a) Many more readers who completed the survey owned Brand A than Brand B. (b) It would be better to consider the proportion of readers owning each brand who required service calls. In this case, 22% of the Brand A owners required a service call, while 40% of the Brand B owners required a service call.

1.31 Some possible variables that could be used to measure the size of a company are number of employees, assets, and amount spent on research and development.

1.33 The stemplot shows that the costs per month are skewed slightly to the right, with a center of $20 and a spread from $8 to $50. America Online and its larger competitors were probably charging around $20, and the members of the sample who were early adopters of fast Internet access probably correspond to the monthly costs of more than $30.

1.35 When you change the scales, some extreme changes on one scale will be barely noticeable on the other. Addition of white space in the graph also changes visual impressions.

1.37 (a) Household income is the total income of all persons in the household, so it will be higher than the income of any individual in the household. (c) Both distributions are fairly symmetric, although the distribution of mean personal income has two high outliers. The distribution of median household incomes has a larger spread and a higher center. The center of the distribution of mean personal income is about $25,000, while the center of the distribution of median household income is about $37,000.

1.39 The means are $19,804.17, $21,484.80, and $21,283.92 for black men, white females, and white males, respectively. Since we have not taken into account the type of jobs performed by individuals in each category or years employed, we cannot make claims of discrimination without first adjusting for these factors.

1.41 The medians are $18,383.50, $19,960, and $19,977 for black men, white females, and white males, respectively. The medians are smaller, but our general conclusions are similar.

1.43 Because of strong skewness to the right, the median is the lower number of $330,000 and the mean is $675,000.

1.45 For Asian countries the five-number summary is 1.3, 3.4, 4.65, 6.05, 8.8; and for Eastern European countries it is −12.1, −1.6, 1.4, 4.3, 7.0. Side-by-side boxplots show that the growth of per capita consumption tends to be much higher for the Asian countries than for the Eastern European countries and that the growth of per capita consumption for the Eastern European countries is much more spread out.

1.47 (a) The mean is $983.5. (b) The standard deviation is $347.23. (c) Results should agree except for number of digits retained.

1.49 A rare, catastrophic loss would be considered an outlier, and averages are not resistant to outliers. The five-number summary is more appropriate for describing the distributions of data with outliers.

1.51 (a) The five-number summary is 5.7, 11.7, 12.75, 13.5, 17.6. (b) The IQR is 1.8. Any low outliers would have to be less than 9, and any high outliers would have to be greater than 16.2. Therefore, Florida and Alaska are definitely outliers, but so is the state with 8.5% older residents.

1.53 (a) The five-number summary is 0, 0.75, 3.2, 7.8, 19.9. (b) The IQR is 7.05. Any low outliers would have to be less than −9.825, and any high outliers would have to be greater than 18.375. The histogram shows that the U.S., Australia, and Canada all have CO2 emissions much higher than the rest of the countries, but only the U.S. is officially considered an outlier using the 1.5 × IQR rule.

1.55 (a) The 5th percentile is approximately at the 748th position. The 95th percentile is approximately at the 14,211th position. (b) The 5th percentile value is approximately $10,000. The 95th percentile value is approximately $140,000.

1.57 (a) The distribution is skewed left with a possible low outlier. (b) Use stems from the tens place as −2, −1, −0, 0, 1, 2, 3.

1.59 The median is $27.855. The mean is $34.70 and is larger than the median because the distribution is skewed right.

1.61 (a) The mean change is x̄ = 28.767% and the standard deviation is s = 17.766%. (b) Ignoring the outlier, x̄ = 31.707% and the standard deviation is s = 14.150%. The low outlier has pulled the mean down toward it and increased the variability in the data. (c) Identical in this case probably means that the 5-liter vehicles were all the same make and model.

1.63 You will learn about the effects of outliers on the mean and median by interactively using the applet.

1.65 From the boxplot, the clear pattern is that as the level of education increases, the incomes tend to increase and also become more spread out.

1.67 Ignoring the District of Columbia, the histogram of violent crimes is fairly symmetric, with a center of about 450 violent crimes per 100,000. The spread is from about 100 to 1000 violent crimes per 100,000 for the 50 states, with the District of Columbia being a high outlier with a rate of slightly over 2000 violent crimes per 100,000.

1.69 Both data sets have a mean of 7.501 and a standard deviation of 2.032. Data A has a distribution that is left-skewed while Data B has a distribution that is fairly symmetric except for one high outlier. Thus, we see two distributions with quite different shapes but with the same mean and standard deviation.

1.71 (a) The five-number summary is 0.9%, 3.0%, 4.95%, 6.6%, 14.2%. (b) The mean is larger because the distribution is moderately skewed to the right with a high outlier.

1.73 (a) A histogram is better because the data set is of moderate size. (b) The two low outliers are −26.6% and −22.9%. The distribution is fairly symmetric, with a median, or center, of 2.65%. The spread is from −16.5% to 24.4%. (c) The mean is 1.551%. At the end of the month, you would have $101.55. (d) At the end of the month, you would have $74.40. Excluding the two outliers gives a mean of 1.551% and a standard deviation of 8.218%. Both have changed. Quartiles and medians are relatively unaffected.

1.75 $2.36 million must be the mean since fewer than half of the players' salaries were above it.

1.77 (a) 1, 1, 1, 1. (b) 0, 0, 10, 10. (c) There are several answers for (a), but (b) is unique.

1.79 You can use the Uniform distribution for an example of a symmetric distribution, and there are many choices for a left-skewed distribution.

1.81 (a) Mean is C; median is B. (b) Mean is A; median is A. (c) Mean is A; median is B.

1.83 (a) 0.025. (b) 64 inches to 74 inches. (c) 16%.

1.85 Eleanor's z-score is 1.8, while Gerald's z-score is 1.5, so Eleanor scored relatively higher.

1.87 (a) 0.0505. (b) 0.0454. (c) 0.0505.

1.89 (a) 0.5948. (b) 452. (c) 712.

1.91 The plot suggests no major deviations except for a possible low outlier.

1.93 Answers will vary.

1.95 (a) No, this apartment building is no longer an outlier. It's not even one of the top two data points now. It fits in perfectly with the rest of the data. (b) The distribution looks fairly Normal except for 2 new high outliers (#11 and #13). It looks much better than the distributions in Exercise 1.94.

1.97 Answers will vary. For right-skewed data, the Normal quantile plot will show the highest and lowest points below the diagonal. For left-skewed data, the Normal quantile plot will show the highest and lowest points above the diagonal. For symmetric data, the points will follow the diagonal fairly closely, even at the ends.

1.99 (a) Mean = −0.0224, standard deviation = 0.2180. (b) x̄ ± s = (−0.2404, 0.1956); x̄ ± 2s = (−0.4584, 0.4136); x̄ ± 3s = (−0.6764, 0.6316). (c) 72.7%, 90.9%, and 100%. The distribution is not exactly Normal.
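The comparison in 1.85 rests on standardizing each score. The solution reports only the z-scores 1.8 and 1.5; the raw scores and scales below (SAT 680 with mean 500, sd 100; ACT 27 with mean 18, sd 6) are assumed illustration values consistent with those z-scores.

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above its distribution's mean."""
    return (x - mean) / sd

# Assumed raw scores; only the resulting z-scores appear in the solution
eleanor = z_score(680, 500, 100)  # SAT scale -> 1.8
gerald = z_score(27, 18, 6)       # ACT scale -> 1.5
print(eleanor, gerald)            # 1.8 1.5
```

Because z-scores put both exams on a common scale, Eleanor's 1.8 beats Gerald's 1.5 even though the raw scores are not comparable.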

1.101 The stemplot shows a fairly symmetric distribution with a high and a low outlier. The Normal quantile plot shows a fairly good fit to the diagonal line in the middle, but the two outliers stick out again.

1.103 0.65 to 2.25 grams per mile driven.

1.105 (a) 0.0122. (b) 0.9878. (c) 0.0384. (d) 0.9494.

1.107 0.1711 or 17.11%.

1.109 (a) Between 21% and 47%. (b) 0.2236. (c) 0.2389.

1.111 (a) The first and third quartiles for the standard Normal distribution are approximately ±0.67. (b) The first quartile is about 255 days, and the third quartile is about 277 days.

1.113 The Normal quantile plot looks fairly linear.

1.115 The histogram is fairly rectangular and the Normal quantile plot is S-shaped.

1.117 (a) The mean is 64.49. (b) The beginning years are all below average until January 1966. (c) The values are high (in the 80s) between 1/00 and 7/00, but then the last month to be above average is 4/01. Then there is a severe drop with no recovery. (d) The dot.com economy was still booming in January 2000. As 2001
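Exercise 1.111 converts standard Normal quartiles (about ±0.67) into quartiles of a Normal distribution. With only the standard library, the Normal CDF is available through math.erf, and its quantiles can be found by bisection. The mean of 266 days and standard deviation of 16 days below are assumptions; the solution states only the resulting quartiles of about 255 and 277 days.

```python
import math

def norm_cdf(z):
    """Standard Normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    """Invert the standard Normal CDF by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

q3 = norm_quantile(0.75)
print(round(q3, 4))                                 # about 0.6745, i.e. roughly 0.67
# Assumed parameters: mean 266 days, standard deviation 16 days
print(round(266 - q3 * 16), round(266 + q3 * 16))   # about 255 and 277 days
```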

progressed, the dot.com economy started to crash, and September 11, 2001, did bad things to the economy as well.

1.119 (a) Min = 1, Q1 = 1, M = 1, Q3 = 1. (b) The box has a length of 0. This makes sense because Q1, M, and Q3 are all the same. (c) The boxplot indicates very strongly right-skewed data. Almost all the data were 1.

1.121 (a) 11/22 = 50%. (b) Negative revenue growth means the previous year's revenue was higher than this year's revenue. (c) 0.82%. (d) The top 25% of all telecom companies had revenue growth greater than 0.12475.

1.123 Gender and automobile preference are categorical. Age and household income are quantitative.

1.125 The distribution is right-skewed. The median salary is $1.6 million and the mean salary is slightly over $3.5 million, being pulled up by the strong right-skewness of the distribution. The spread goes from $200 thousand to slightly over $12 million.

1.127 (a) For normal corn, M = 358 grams, Q1 = 337 grams, Q3 = 400.5 grams, the minimum is 272 grams, and the maximum is 462 grams. For new corn, M = 406.5 grams, Q1 = 383.5 grams, Q3 = 428.5 grams, the minimum is 318 grams, and the maximum is 477 grams. Overall, new corn gives higher weight gains. (b) For normal corn, x̄ = 366.30 and s = 50.80, and for the new corn x̄ = 402.95 and s = 42.73. The mean weight gain for the new corn is 36.65 grams larger than for the normal corn.

1.129 (a) From the histogram or the boxplot, it is clear that the distribution is skewed to the right, with three high outliers. (b) M = 37.8 thousand barrels and x̄ = 48.25 thousand barrels. Since the distribution is skewed right and has three high outliers, we expect the mean to be substantially larger than the median. (c) M = 37.8 thousand barrels, Q1 = 21.5 thousand barrels, Q3 = 60.1 thousand barrels, the minimum is 2 thousand barrels, and the maximum is 204.9 thousand barrels. The box containing the first quartile, the median, and the third quartile is fairly symmetric, with the third quartile being slightly farther from the median, indicating some right-skewness. The maximum is much farther from the median than the minimum, which could suggest right-skewness or just a high outlier. The boxplot and the histogram clearly show the right-skewness, but it is difficult to be certain of this from only the five-number summary.

1.131 The mean is μ = 250 and the standard deviation is σ = 175.78.

1.133 The range of values is extreme, making plotting difficult. Because of the right-skewness, the mean of the populations is 583,994 and is clearly quite far from the center of the data. The median is 159,778. Division into three groups is fairly arbitrary. A natural way to proceed is to use cutoffs that correspond to gaps in the data. For example, the group of the most populous counties might consist of the 8 counties with populations over 1 million, and we would sample from all of these counties. The next break is not as clear, but it could be taken at 100,000. There are 27 counties with populations between 100,000 and 1 million, and we could sample households from half of these counties. Finally, there are 23 counties with populations below 100,000, and a smaller fraction of these could be sampled.

1.135 Answers will vary. For the mean, the distribution should look fairly symmetric with a center at about 20. Because of the small number of observations, the Normal quantile plot may not be that smooth, but it shouldn't appear to deviate from a straight line in a systematic way. For the standard deviation, the distribution should look slightly skewed to the right with a center close to 5. The Normal quantile plot should not look nearly as much like a straight line as when plotting the 20 values of x̄, suggesting the distribution of s is not Normal.
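Exercise 1.135 describes simulating repeated samples and examining the distributions of x̄ and s. A sketch of such a simulation, assuming a Normal population with mean 20 and standard deviation 5 (consistent with the centers of "about 20" and "close to 5" described in the solution); the sample size and number of samples are illustrative choices:

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def simulate(n_samples=200, n=10):
    """Draw n_samples samples of size n from an assumed Normal(20, 5) population
    and record each sample's mean and standard deviation."""
    means, sds = [], []
    for _ in range(n_samples):
        sample = [random.gauss(20, 5) for _ in range(n)]
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))
    return means, sds

means, sds = simulate()
print(round(statistics.mean(means), 1))  # near 20, as the solution expects
print(round(statistics.mean(sds), 1))    # near 5, typically a bit below it
```

A histogram of `means` should look symmetric, while a histogram of `sds` should show the slight right skew the solution mentions.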

CHAPTER 2
2.1 (a) Time spent studying is the explanatory variable, and grade on the exam is the response. (b) Explore the relationship between the two variables. (c) Yearly rainfall is the explanatory variable, and the yield of a crop is the response variable. (d) Many factors affect salary and the number of sick days one takes. It may be most reasonable to simply explore the relationship between these two variables. (e) Economic class of the father is the explanatory variable, and that of the son is the response.

2.3 Hand wipe is the explanatory variable, and whether or not the skin appears abnormally irritated is the response. Both are categorical variables.

2.5 (a) The city gas mileage is approximately 62 MPG, and the highway mileage is 68 MPG. (b) The pattern is roughly linear. Highway gas mileage tends to be between 5 and 10 MPG greater than city mileage. (c) It appears to fall roughly on the line defined by the pattern of the remaining points.

2.7 (a) Speed is the explanatory variable and would be plotted on the x axis. (b) At first, fuel used decreases; then, at about 60 km/h, it increases as speed increases. At very slow speeds and at very high speeds, engines are very inefficient and use more fuel. (c) In the scatterplot, both low and high speeds correspond to high values of fuel used, so we cannot say that the variables are positively or negatively associated. (d) The points lie close to a simple curved form, so the relationship is reasonably strong.

2.9 Explanatory variable: parent income. Response variable: money student borrows. Both variables are quantitative and probably have a negative association because the more money the parents have, the less a student will probably need to borrow to pay for college tuition.

2.11 (a) The means for the market sectors are: consumer (30.960), financial services (32.760), natural resources (23.317), and technology (54.314). (b) Technology was the best place to invest in 2003. (c) No, because market sector is a categorical variable.

2.13 (a) Explanatory variable = accounts; response variable = assets. (c) Strong, positive, and linear, but two points stand out at the top right corner of the plot. (d) Charles Schwab and Fidelity.

2.15 (a) The longer the duration, the greater the decline. The plot shows a positive association. (b) A straight line that slopes up from left to right describes the general trend reasonably well. The association is not very strong. (c) This bear market appears to have had a decline of 48% and a duration of about 21 months.

2.17 (a) If a person has a high income, his or her household income will also be high, so personal and household incomes are positively associated. We therefore expect mean personal income for a state to be positively associated with its median household income. Household incomes will always be greater than or equal to the incomes of the individuals in the household. We therefore expect median household income in a state to be larger than mean personal income in that state. (b) If the distribution of incomes in a state is strongly right-skewed, we know that the mean will be larger than the median. Although mean household income must be larger than mean personal income, and median household income must be larger than median personal income, median household income could be smaller than mean personal income. (c) The overall pattern of the plot is roughly linear. The variables are positively associated. The strength of the relationship (ignoring outliers) is moderately strong. (d) The two points are the District of Columbia and Connecticut. The District of Columbia is a city with a few very wealthy inhabitants and many poorer inhabitants. Many very wealthy people who work in New York City live in Connecticut, resulting in a high mean income. We expect the distribution of both personal and household incomes to be strongly right-skewed.

2.19 (a) The overall pattern is linear, the association is positive, and the strength is moderately strong. Hot dogs that are high in calories are generally high in sodium. (b) Brand 13 has the fewest calories, so this probably corresponds to Eat Slim Veal Hot Dogs.

2.21 (a) Business starts is taken as the explanatory variable, so it is represented on the horizontal axis. (b) The association is positive. (c) Of the four points in the right side of the plot, Florida is the one with the fewest failures. (d) The outlier is California. (e) The four states outside the cluster are California, Florida, New York, and Texas. These states are scattered throughout the country. They are the most populous states in the country.

2.23 (a) The means are: Consumer = 7.66, Financial services = 35.68, Technology = 24.03, Utilities and natural resources = 25.4. (b) Financial services and Utilities and natural resources were good places to invest. (c) Market sector is categorical, so we cannot speak of positive or negative association between market sector and total return.

2.25 (a) In this case, neither variable is necessarily the explanatory variable. We arbitrarily put City MPG on the horizontal axis of our plot. (b) x̄ = 18.87, ȳ = 29.07, sx = 0.3055, sy = 0.3215, and r = 0.8487. (c) The association is positive and is reasonably strong.

2.27 (a) High values of duration go with high values of decline, so the association is positive. The points are not tightly clustered about a line, so the correlation is not near 1. (b) The correlation in Figure 2.2 is closer to 1 than the correlation in Figure 2.6. The points in Figure 2.2 appear to lie more closely along a line than do those in Figure 2.6.

2.29 Computation shows r = 0. The correlation r measures the strength of a straight-line relationship.

2.31 (a) Explanatory variable = price. The pattern is strong, positive, and linear. (b) For price, the mean is 50.000 and the standard deviation is 16.325. For deforestation, the mean is 1.738 and the standard deviation is 0.928. The correlation is 0.955.

2.33 (a) In Exercise 2.31 we found that the scatterplot shows a strong, positive, linear relationship and the correlation is 0.955. (b) Answers will vary, but results should look fairly random and without pattern. (c) The correlation should be close to 0.
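The point in 2.29, that r measures only straight-line strength, can be checked directly: a perfect curved (quadratic) relationship can still have r = 0. A small sketch with hypothetical points:

```python
import statistics

def correlation(xs, ys):
    """Pearson correlation from the definition r = cov(x, y) / (sx * sy)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# y is determined exactly by x (y = x^2), yet the straight-line measure r is 0
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]
print(correlation(xs, ys))  # 0.0
```

The positive products on the right half of the parabola cancel the negative products on the left half, so the covariance, and hence r, is exactly zero despite the perfect dependence.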

2.35 (a) 0.968. (b) 0.707. (c) These points are apparently not outliers from the straight-line pattern, but they are influential: the correlation dropped quite a bit when they were removed.

2.37 There is a straight-line pattern that is fairly strong. The correlation, computed from statistical software, is r = 0.898. There do not appear to be any extreme outliers from the straight-line pattern.

2.39 (a) and (b) The plots look very different. (c) The correlation between x and y is 0.253. The correlation between the transformed variables x* and y* is also 0.253. This is not surprising: r does not change when we change the units of measurement of x, y, or both.

2.41 The magazine's report implies that there is a negative correlation between compensation of corporate CEOs and the performance of their company's stock, not that there is no correlation. A more accurate statement might be that in companies that compensate their CEOs highly, the stock is just as likely to perform well as to not perform well; likewise for companies that do not compensate their CEOs highly.

2.43 In Figure 2.8, after removing the outliers, the remaining points appear to be more tightly clustered around a line. In Figure 2.2, the outlier accentuates the linear trend because it appears to lie along the line defined by the other points. After its removal, the linear trend is not as pronounced.

2.45 (a) Gender is a categorical variable. (b) A correlation of 1.09 is not possible. (c) The correlation has no unit of measurement.

2.47 (a) ŷ = 1.0892 + 0.188999x. (b) x̄ = 22.31, sx = 17.74, ȳ = 5.306, sy = 3.368, r = 0.995, b = r(sy/sx) = 0.1889, and a = ȳ − b x̄ = 1.0916.

2.49 (a) The percent is 35.5% because r² = (0.596)² = 0.355. (b) ŷ = 6.08% + 1.707x. (c) ŷ = 9.08%. The least-squares regression line passes through the point (x̄, ȳ). Thus, we would predict ŷ = ȳ = 9.07% when x = x̄ = 1.75%.

2.51 (a) The scatterplot shows a curved pattern. (b) No. The scatterplot suggests that the relation between y and x is a curved relationship, not a straight-line relationship. (c) Using a calculator, we found the sum to be 0.01. (d) The residuals show the same curved pattern.

2.53 (a) ŷ = 2.95918 + 6.74776x. Observation 1 is not very influential. (b) r² increases from 0.660 to 0.779. Observation 1 is an outlier, and after its removal, the remaining points appear more tightly clustered about the least-squares regression line.

2.55 (a) The number y in inventory after x weeks must be y = 96 − 4x. (b) The graph shows a negatively sloping line. (c) No. After 25 weeks the equation predicts that there will be y = −4.

2.57 No, you could not make an accurate prediction of stock returns because the R² is so low and the scatterplot shows a very weak association between Treasury bills and stock returns.

2.59 (a) Slightly parabolic, opening downward, very weak. (b) R² = 2.3%, which is very low. The regression line with year does not do a good job of explaining the variation in returns.

2.61 (a) There is a fairly strong, positive, linear relationship between appraised value and selling price. (b) ŷ = 127.270 + 1.047x. The predicted selling price is 967.6 thousand dollars for a unit appraised at $802,600.

2.63 (a) Slope = b = 0.0832, and intercept = a = −17.1275. (b) ŷ = −17.1275 + 0.0832x. If 793.6 is substituted for x, then ŷ will be 48.9. Therefore, Fact 3 is true for this least-squares line. (c) R² = 93.76%. R² is the percent of the variation in the values of total assets that is explained by the least-squares regression line of total assets on number of accounts.

2.65 (a) The predicted assets for DLJ Direct with 590 accounts will be 31.849. (b) The actual assets are 11.2, so the prediction is an overestimate. The actual assets minus the predicted assets = −20.76946 = the residual. (c) The residual is −20.76946. Yes.

2.67 (a) R² = 93.4%, a = 1,859,988, and b = 0.879. (b) R² = 93.4%, a = 1.860, and b = 0.879. (c) Only the intercept is affected by the change in units.

2.69 (a) Yes, there is a fairly strong, positive, linear relationship. (b) r = 0.928, R² = 86.1%. R² is the percent of the variation in the selling price that is explained by the least-squares regression line of selling price on appraisal values. (c) ŷ = 127.270 + 1.047x. (d) The predicted selling price will be $848,391. (e) The mean selling price is $848,130, and the mean appraisal is $688,750.

2.71 If the correlation has increased to 0.8, then there is a stronger relationship between American and European stocks. When American stocks have gone down, European stocks have tended to go down also, so European stocks have not provided much protection against losses in American stocks.

2.73 (a) ŷ = 15.464 + 0.858x. (b) r² is 39.5%. (c) The predicted value is ŷ = 28.334%. The observed decline for this particular bear market is 14%, so the residual is −14.334%.

2.75 (a) r = 0.999993, so the calibration does not need to be done again. (b) ŷ = 1.6571 + 0.113301x. For x = 500, ŷ = 58.3. We would expect the predicted absorbance to be very accurate based on the plot and the correlation.

2.77 Experiment with the applet on the Web and comment.

2.79 (a) Spaghetti and snack cake appear to be outliers. (b) For all 10 points, ŷ = 58.5879 + 1.30356x. Leaving out spaghetti and snack cake, ŷ = 43.8814 + 1.14721x. (c) The points, taken together, are moderately influential.

2.81 (a) b = 0.16 and a = 30.2. (b) We would predict Julie's final exam score to be 78.2. (c) r² = 0.36. With 64% of the variation unexplained, the least-squares regression line would not be considered an accurate predictor of final-exam score.

2.83 r² = 0.64. Smaller raises tended to be given to those who missed more days, and so the variables are negatively associated. Thus, r = −0.8.

2.85 We would predict Octavio to score 4.1 points above the class mean on the final exam.

2.87 (a) Number of Target stores = −0.525483 + 0.382345 × (number of Wal-Mart stores). (b) There are 254 Wal-Mart stores in Texas, so we predict number of Target stores = 96.590147. There are 90 Target stores in Texas, so the residual = −6.590147. (c) Number of Wal-Mart stores = 30.3101 + 1.13459 × (number of Target stores). (d) There are 90 Target stores in Texas, so we predict number of Wal-Mart stores = 132.4232. There are 254 Wal-Mart stores in Texas, so the residual = 121.5768.
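The residual computations running through 2.63 to 2.87 all use residual = observed − predicted. Checking 2.87 (c) and (d) numerically with the line and Texas counts given in the solution:

```python
def predict(intercept, slope, x):
    """Value on the least-squares line at x."""
    return intercept + slope * x

# Regression of Wal-Mart stores on Target stores, from Exercise 2.87(c)
predicted = predict(30.3101, 1.13459, 90)   # Texas has 90 Target stores
residual = 254 - predicted                  # and 254 Wal-Mart stores
print(round(predicted, 4), round(residual, 4))  # 132.4232 121.5768
```

The large positive residual reproduces the solution's point: Texas has far more Wal-Mart stores than the Target-based line predicts.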

2.89 We would expect the correlation for individual stocks to be lower. Correlations based on averages (such as the stock index) are usually too high when applied to individuals.

2.91 The reasoning assumes that the correlation between the number of firefighters at a fire and the amount of damage done is due to causation. It is more plausible that a lurking variable, namely the size of the fire, is behind the correlation.

2.93 No. Correlation does not imply causation. The most seriously ill patients may be sent to larger hospitals rather than smaller hospitals, and this could account for the observed correlation.

2.95 Answers will vary but may include: age, years of education, years of experience in the job, and location.

2.97 (a) The story states that these are the ten kinds of business that employ the MOST people, so it would not be possible to have GREATER x-values than the ones listed in this data. (b) Yes, this is an extrapolation because it is very unlikely that a kind of business that has 0 employees in 1997 would have 1.86 million employees only 5 years later.

2.99 Intelligence or support from parents may be a lurking variable. More generally, factors that may lead students to take more math in high school may also lead to more success in college. If lurking variables are present, then requiring students to take algebra and geometry may have little effect on success in college.

2.101 (a) If the consumption of an item falls when price rises, we should see a negative association. However, the plot shows a modest positive association. (b) ŷ = 44.6954 + 9.59163x. r² = 0.365. (c) There are periodic fluctuations with two peaks around 1977 and 1986. There also appears to be an overall downward trend over time.

2.103 (a) r² = 0.9101. This is the fraction of the variation in daily sales data that can be attributed to a linear relationship between daily sales and total item count. (b) We would expect the correlation to be smaller. Correlations based on averages are usually higher than the correlation one would compute for the individual values from which an average was computed.

2.105 Intelligence or family background may be lurking variables. Families with a tradition of more education and high-paying jobs may encourage children to follow a similar career path.

2.107 (a) The residuals are more widely scattered about the horizontal line as the predicted value increases. The regression model will predict low salaries more precisely because lower predicted salaries have smaller residuals and hence are closer to the actual salaries. (b) There is a curved pattern in the plot. For very low and very high numbers of years in the majors, the residuals tend to be negative. For intermediate numbers of years the residuals tend to be positive. The model will overestimate the salaries of players who are new to the majors, will underestimate the salaries of players who have been in the major leagues about 8 years, and will overestimate the salaries of players who have been in the majors more than 15 years.

2.109 (a) The data describe 5375 students. (b) The number who smoke is 1004, so the percent who smoke is 18.68%.

(c) Neither parent smokes One parent smokes Both parents smoke Total Percent 2.111 Neither parent One parent Both parents smokes smokes smoke % students who smoke 13.86% 18.58% 22.47% 1356 25.23% 2239 41.66% 1780 33.12%

The data support the belief that parents' smoking increases smoking in their children.
2.113 (a)
                  Female    Male
Accounting        30.22%   34.78%
Administration    40.44%   24.84%
Economics          2.22%    3.73%
Finance           27.11%   36.65%

The most popular choice for women is administration, while for men accounting and finance are the most popular. (b) 46.54% did not respond.
2.115 (a) Percent of Hospital A patients who died = 3%. Percent of Hospital B patients who died = 2%. (b) Percent of Hospital A patients classified as poor before surgery who died = 3.8%. Percent of Hospital B patients classified as poor before surgery who died = 4%. (c) Percent of Hospital A patients classified as good before surgery who died = 1%. Percent of Hospital B patients classified as good before surgery who died = 1.3%. (d) Whether you are in good condition or in poor condition before surgery, your chance of dying is lower in Hospital A, so choose Hospital A. (e) The majority of patients in Hospital A are in poor condition before surgery, while the majority of patients in Hospital B are in good condition before surgery. Patients in poor condition are more likely to die, and this makes the overall death rate in Hospital A higher than in Hospital B, even though both types of patients fare better in Hospital A.
2.117 The marginal distribution for payment method is: cash (0.351), check (0.155), and credit card (0.495). The conditional distribution of payment method for impulse purchases is: cash (0.452), check (0.129), and credit card (0.419). The conditional distribution of payment method for planned purchases is: cash (0.303), check (0.167), and credit card (0.530). For impulse purchases, cash is most likely. For planned purchases, credit card is most likely. For both types of purchases, check is least likely. Answers will vary for explaining the choice of payment method for impulse purchases.
2.119 (a) The percent is 20.40%. (b) The percent is 9.85%.
2.121 There are 1716 thousand older students. The distribution is

                    2-year      2-year      4-year      4-year
                    part-time   full-time   part-time   full-time
% older students      7.05%      43.59%      13.75%      35.61%


Older students tend to prefer full-time colleges and universities to part-time colleges and universities, with 79.2% of all older students enrolled in some full-time institution.
2.123 (a)
                   Hired    Not hired
Applicants < 40     6.44%     93.56%
Applicants ≥ 40     0.61%     99.39%

(b) A graph shows results similar to those in (a). (c) Only a small percent of all applicants are hired, but the percent (6.44%) of applicants who are less than 40 who are hired is more than 10 times the percent (0.61%) of applicants who are 40 or older who are hired. (d) Lurking variables that might be involved are past employment history (why are older applicants without a job and looking for work?) or health.
2.125 (a)
              Relapse   No relapse
Desipramine    41.67%     58.33%
Lithium        75%        25%
Placebo        83.33%     16.67%

(b) The data show that those taking desipramine had fewer relapses than those taking either lithium or a placebo. These results are interesting, but association does not imply causation.
2.127 (a) The sum is 59,920 thousand. The entry in the Total column is 59,918 thousand. The difference may be due to rounding off to the nearest thousand. (b)
Never married   Married   Widowed   Divorced
   21.05%        57.69%    10.54%    10.73%

2.129 (a) A combined two-way table is
          Admit   Deny
Male       490     210
Female     280     220

(b) Converting to percents gives
          Admit   Deny
Male       70%     30%
Female     56%     44%

(c) The percents for each school are as follows:
Business
          Admit   Deny
Male       80%     20%
Female     90%     10%

Law
          Admit   Deny
Male       10%     90%
Female     33%     67%
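The reversal between the combined table and the per-school tables is an instance of Simpson's paradox, and it can be reproduced from applicant counts consistent with the percents above. The counts below (600 male and 200 female business applicants, 100 male and 300 female law applicants) are derived from the tables, not stated in the exercise:

```python
# Simpson's paradox in the admissions data of 2.129:
# each school admits women at a higher rate, yet the combined
# admission rate is higher for men.
# (admitted, applied) counts derived from the percents and totals above.
data = {
    "Business": {"Male": (480, 600), "Female": (180, 200)},
    "Law":      {"Male": (10, 100),  "Female": (100, 300)},
}

def rate(pairs):
    admitted = sum(a for a, n in pairs)
    applied = sum(n for a, n in pairs)
    return admitted / applied

overall_male = rate([data[s]["Male"] for s in data])
overall_female = rate([data[s]["Female"] for s in data])
print(overall_male, overall_female)   # 0.70 vs 0.56, men higher overall

for school, cells in data.items():
    print(school, rate([cells["Male"]]), rate([cells["Female"]]))
```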


(d) If we look at the tables, we see that it is easier to get into the business school than the law school. More men apply to the business school than the law school, while more women apply to the law school than the business school. Because it is easier to get into the business school, the overall admission rate for men appears relatively high. It is hard to get into the law school, and this makes the overall admission rate for women appear low.
2.131 (b) There is a fairly strong, positive, linear association with an outlier at Professor #9 (102,300, 144,200). (c) Professors #3 and #5 have a 2002 salary similar to this one. Professor #16 has a 2005 salary similar to this one. Professor #9 does not follow the overall pattern and would be considered an outlier. (d) Professor #15 has the highest 2002 and 2005 values, but the data point follows the general trend of the other data. It would not be considered an outlier.
2.133 (d) The association looks stronger as the range of the axes increases because the points look closer together.
2.135 (a) 0.966. (b) 0.966. (c) 0.966. (d) The calculations are the same, which is a general fact. Units of measure don't change the correlation between two variables. (e) The rounding didn't have an effect for these data because the part of each data point that was rounded off was a small percentage of the actual data value.
2.137 (a) If a professor had a 2002 salary of $0, his or her 2005 salary would be $27,459.44. This is not a practical interpretation because no professor would have had a 2002 salary of $0. (c) The zero-intercept line completely misses all the data because it is too low. The estimated intercept raises the least-squares regression line to the right height to pass through the data.
2.139 (a) With all data points: ŷ = 27459.441 + 0.896x, R^2 = 75.6%, s = 8710.554. With #15 removed: ŷ = 21681.387 + 0.952x, R^2 = 75.6%, s = 8765.396.
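The check behind the two fits in 2.139(a), refitting with the suspect point removed and comparing, can be sketched in a few lines. The data below are made up: the extreme point lies on the same line as the rest, so, as with Professor #15, removing it barely changes the fit:

```python
# A point far from the others in x is not influential if it follows
# the overall trend: removing it leaves the slope nearly unchanged.
# Made-up data: the last point is extreme in x but on-trend.

def slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x = [1, 2, 3, 10]
y = [1, 2, 3, 10]          # all on the line y = x
b_all = slope(x, y)
b_without = slope(x[:-1], y[:-1])
print(b_all, b_without)    # both 1.0: the extreme point is not influential
```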
(b) No, #15 is not an influential observation, because R^2, s, and the slope hardly changed at all when it was removed.
2.141 (a) 0.676. (b)
Amps              10    12     13     14     15
Average weight    9.5   10.5   11.5   12.5   11.56

(c) 0.877. The correlation of amps with average weight is greater than the correlation of amps with the individual weights.
2.143 High interest rates are weakly negatively associated with lower stock returns. Because many other factors (lurking variables) affect stock returns, one should not conclude from these data that the observed association is evidence that high interest rates cause low stock returns.
2.145 Removing this point would make the correlation closer to 0. The point is an outlier in the horizontal direction, and its location strongly influences the regression line.
2.147 Suppose the return on Fund A is always twice that of Fund B. Then if Fund B increases by 10%, Fund A increases by 20%, and if Fund B increases by 20%, Fund A increases by 40%. This is a case where there is a perfect straight-line relation between Funds A and B.
2.149 (a) and (b) The overall pattern shows strength decreasing as length increases until length is 9, then the pattern is relatively flat with strength decreasing only


very slightly for lengths greater than 9. (c) The least-squares line is Strength = 488.38 − 20.75 × Length. A straight line does not adequately describe these data because it fails to capture the bend in the pattern at Length = 9. (d) The equation for lengths of 5 to 9 inches is Strength = 283.1 − 3.4 × Length. The equation for lengths of 9 to 14 inches is Strength = 667.5 − 46.9 × Length. The two lines describe the data much better than a single line. I would want to know how the strength of the wood product compares to solid wood of various lengths. For what lengths is it stronger, and are these common lengths that are used in building?
2.151 Let x denote degree-days per day and y gas consumed per day. Then x̄ = 21.544, s_x = 13.419, ȳ = 558.889, s_y = 274.383, and r = 0.989. Thus b = r(s_y/s_x) = 20.222 and a = ȳ − b·x̄ = 123.226. The equation of the regression line is gas use = 123.226 + 20.222 × (degree-days). The slope has units of cubic feet per degree-day. To find the equation of the regression line for predicting degree-days from gas use, we interchange the roles of x and y and compute b = 0.048 and a = −5.283. The equation of the regression line is degree-days = −5.283 + 0.048 × (gas use). The slope has units of degree-days per cubic foot.
2.153 (a) Selling price = 189,226 − 1334.49 × age. (b) For a house built in 2000, age is 0 and the predicted selling price is $189,226. For a house built in 1999, age is 1 and the predicted selling price is $187,891.51. For a house built in 1998, age is 2 and the predicted selling price is $186,557.02. For a house built in 1997, age is 3 and the predicted selling price is $185,222.53. We see that for each 1-year increase in age the selling price drops by $1334.49. (c) 1900 (age = 100) is within the range of the data used to calculate the least-squares regression line, and 1899 (age = 101) is almost within the range, so we would probably trust the regression line to predict the selling price of a house in these years.
1850 (age = 150) is well outside the range of the data, and we would not trust the regression line to predict the selling price of such a house. The regression line predicts a house that is 150 years old to have a selling price of 189,226 − 1334.49 × 150 = −$10,947.50; a negative value makes no sense! (d) r = −0.682. The association is negative, indicating that older houses are associated with lower selling prices and newer houses with higher selling prices.
2.155 We convert the table entries into percents of the column totals to get the conditional distribution of heart attacks and strokes for the aspirin group and for the placebo group.

                       Aspirin group   Placebo group
Fatal heart attacks        0.09%           0.24%
Other heart attacks        1.17%           1.93%
Strokes                    1.08%           0.89%

The data show that aspirin is associated with reduced rates of heart attacks but a slightly increased rate of strokes. Association does not imply causation. However, the design of the study is such that it is difficult to identify lurking variables. Doctors were assigned at random to the treatments (aspirin or placebo), so there should be no systematic differences between the groups that received the two treatments.
2.157 (a) The marginal distribution of opinion about quality is: higher (0.368), same (0.241), lower (0.391). The proportions for higher and lower are very close,


with the proportion for lower slightly bigger. These are the proportions of opinion in general, not separated out for buyers and nonbuyers. (b) The conditional distribution of opinion for buyers is: higher (0.556), same (0.194), and lower (0.250). The conditional distribution of opinion for nonbuyers is: higher (0.299), same (0.258), and lower (0.443). For buyers, the majority voted for higher. For nonbuyers, the largest proportion was for lower. For both groups, the smallest proportion was for same. No, we cannot conclude that using recycled filters causes more favorable opinions; buyers chose the filters themselves, so this is an observational comparison, and association does not imply causation.
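Conditional distributions like those in 2.157(b) come from dividing each cell count by its group total. A sketch, using counts consistent with the buyers' proportions above (the counts 20, 7, 9 are an assumption chosen to match 0.556/0.194/0.250, not taken from the exercise):

```python
# Conditional distribution of opinion within a group:
# divide each cell count by the group's total.

def conditional(counts):
    total = sum(counts)
    return [round(c / total, 3) for c in counts]

buyers = [20, 7, 9]        # higher / same / lower (assumed counts)
print(conditional(buyers))   # [0.556, 0.194, 0.25]
```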

CHAPTER 3
3.1 It is an observational study because information is gathered without imposing a treatment. The explanatory variable is the consumer's gender, and the response variables are whether or not the individual considers the particular features essential in a health plan.
3.3 Many variables (lurking variables) could have changed over the five years to increase the unemployment rate, such as a recession or factories leaving the area.
3.5 (a) All adult U.S. residents. (b) U.S. households. (c) All regulators from the supplier, or just the regulators in the last shipment.
3.7 We have used two-digit labels, numbering across rows. The selected sample is 04 Bowman, 10 Fleming, 17 Liao, 19 Naber, 12 Gates, and 13 Goel.
3.9 Label the retailers from 001 to 440. The selected sample includes the retailers numbered 400, 077, 172, 417, 350, 131, 211, 273, 208, and 074.
3.11 Label the midsize accounts 001, 002, . . . , 500 and select an SRS of 25 of these. Label the small accounts 0001, 0002, . . . , 4400 and select an SRS of 44 of these. First select 5 midsize accounts and, continuing from where you left off in the table, select 5 small accounts. The first 5 midsize accounts have labels 417, 494, 322, 247, and 097, and the first 5 small accounts have labels 3698, 1452, 2605, 2480, and 3716.
3.13 You would expect that the higher rate of no-answer was probably during the second period, as more families are likely to be gone on vacation.
3.15 (a) To make this an experiment, randomly assign 25 members to a control group that just has them visiting a health club and 25 members to a treatment group that has them visiting a health club and working with a trainer. The unseen bias involved in using the study as written in the exercise is that currently individuals have decided on their own to work with the trainer. These individuals may be more health conscious, more actively training for some athletic event, or more seriously trying to lose weight. (b) To make this an experiment, start with a sample of tellers, all of the same experience and ability, and randomly select half to participate in the advanced training while the other half get no additional training. An alternate way of doing this is to treat it as a matched pairs experiment where pairs of tellers with similar backgrounds and experience are paired up, with one from each pair completing the training and the other not. The unseen bias in the study as written in the exercise is that since the tellers were allowed to select themselves for advanced training, these are probably the tellers who are more experienced or more interested in advancing their careers.
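Exercises 3.7 through 3.11 select an SRS by labeling the units and reading digits from Table B. Software replaces the table; a sketch, with the labels matching 3.9 and an arbitrary seed standing in for a table line:

```python
import random

# Draw a simple random sample by labeling units 1..N and sampling
# without replacement. Table B in the text does the same job by hand.
random.seed(17)            # arbitrary seed so the draw is reproducible
N, n = 440, 10             # e.g. 440 retailers, sample of 10 (as in 3.9)
sample = random.sample(range(1, N + 1), n)
print(sorted(sample))
```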


3.17 (a) The site is a true random number generator that uses small variations in the amplitude of atmospheric noise picked up by a radio. (b) There are testimonials on the Web site. (c) and (d) Answers will vary.
3.19 (a) The committee intends for the population to be all local businesses, but in actuality the population they are using is all local businesses listed in the telephone book. (b) The sample is 150 randomly selected businesses. (c) 51.33%. (d) Answers will vary. Some businesses list only in the yellow pages, and some only in the white pages. Some businesses are listed by a person's name. Some businesses may only use cell phones or Web sites and choose not to be listed in the phone book at all. How did the committee know which names in the phone book were businesses?
3.21 (a) The population is eating-and-drinking establishments in the large city. (b) The population is the congressman's constituents. (c) The population is all claims filed in a given month.
3.23 This is a voluntary response sample, and persons with strong opinions on the subject are more likely to take the time to write.
3.25 We have used two-digit labels, numbering across rows. The selected sample is 12 (B0986), 04 (A1101), and 11 (A2220).
3.27 Various labels are possible. We labeled the tracts numbered in the 1000s one to six, in the 2000s seven to eighteen, and in the 3000s nineteen to forty-four. The labels and blocks selected are 21 (block 3002), 18 (block 2011), 23 (block 3004), 19 (block 3000), and 10 (block 2003).
3.29 (a) The selected number is 35, so the systematic sample includes the clusters numbered 35, 75, 115, 155, and 195. (b) A simple random sample of size n would allow every set of n individuals an equal chance of being selected. A systematic sample doesn't allow every set of n individuals a chance of being selected. For example, the clusters numbered 1, 2, 3, 4, and 5 would have no chance of being selected together in a systematic sample.
3.31 Label the alphabetized lists 001 to 500 for the females and 0001 to 2000 for the males. The first 5 females are those with the labels 138, 159, 052, 087, and 359. Continuing in the table, the first 5 males are those with the labels 1369, 0815, 0727, 1025, and 1868.
3.33 (a) A national system of health insurance should be favored because it would provide health insurance for everyone and would reduce administrative costs. (b) The elimination of the tenure system at public institutions should be considered as a means of increasing the accountability of faculty to the public, while at the same time the question of whether such a move would have a deleterious effect on the academic freedom so important to such institutions cannot be ignored.
3.35 Subjects are the 300 sickle cell patients, the factor is the type of medication given, and the treatments are the two levels of medication. The response variable is the number of pain episodes reported.
3.37 (a) The individuals are the different batches, and the response is the yield. (b) There are two factors, temperature and stirring rate, and six treatments. Lay them out in a diagram like that in Figure 3.3. (c) Twelve batches, or individuals, are needed.
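The two-stratum selection in 3.31 (separate SRSs of females and males) can be sketched the same way. The stratum sizes match the exercise; the seed is arbitrary:

```python
import random

# Stratified sample: draw an independent SRS within each stratum,
# as in 3.31 (5 of 500 females, 5 of 2000 males).
random.seed(3)
strata = {"female": 500, "male": 2000}
sample = {name: random.sample(range(1, size + 1), 5)
          for name, size in strata.items()}
print(sample)
```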


3.39 To assign the 20 pairs to the four treatments, give each of the pairs a two-digit number from 01 to 20. The first 5 selected are assigned to Group 1, the next 5 to Group 2, and the next 5 to Group 3. The remaining 5 are assigned to Group 4. Group 1 is pairs numbered 16, 04, 19, 07, and 10; Group 2 is 13, 15, 05, 09, and 08; Group 3 is 18, 03, 01, 06, and 11. The remaining 5 pairs are assigned to Group 4. Use a diagram like those in Figures 3.4 and 3.5 to describe the design.
3.41 If charts or indicators are introduced in the second year, and the electric consumption in the first year is compared with the second year, you won't know if the observed differences are due to the introduction of the chart or indicator or due to lurking variables.
3.43 A statistically significant result means that it is unlikely that the salary differences we are observing are just due to chance.
3.45 This experiment suffers from lack of realism, which limits the ability to apply the conclusions of the experiment to the situation of interest.
3.47 (a) The subjects and their excess weights, rearranged in increasing order of excess weight, are listed with the columns as the 5 blocks and the 4 subjects in each block labeled 1 to 4. With these conventions, Regimen A = Williams, Smith, Obrach, Brown, Birnbaum; Regimen B = Moses, Kendall, Loren, Stall, Wilansky; Regimen C = Hernandez, Santiago, Brunk, Jackson, Nevesky; Regimen D = Deng, Mann, Rodriguez, Cruz, Tran.
3.49 (a) This is a completely randomized design with two treatment groups. (b) Answers will vary using software, but using line 131 of Table B, Group 1 will contain Dubois (05), Travers (19), Chen (04), Ullman (20), Quinones (16), Thompson (18), Fluharty (07), Lucero (13), A (02), and Gerson (08). Group 2 will then contain the remaining subjects: Abate, Brown, Engel, Gutierrez, Hwang, Iselin, Kaplan, McNiell, Morse, and Rosen.
3.51 (a) Randomly assign your subjects to either Group 1 or Group 2. Each group will taste and rate both the regular and the light mocha drink. However, Group 1 will drink them in the regular/light order, and Group 2 will drink them in the light/regular order. For each group, the taste ratings of the regular and light drinks will be compared, and then the results of the two groups will be compared to see if the order of tasting made a difference to the ratings. To properly blind the subjects, both mocha drinks should be in identical opaque (we are only measuring taste, not appearance) cups with no labels on them. (b) Using line 141 of Table B, the regular/light group will use the subjects with labels 12, 16, 02, 08, 17, 10, 05, 09, 19, and 06. The light/regular group will use the remaining 10 subjects.
3.53 (a) These are matched pairs data because each day two related measurements are taken. (b) The sample mean for the kill room is 2138.5, and the sample mean for processing is 314.
3.55 This is a comparative experiment with two treatments: steady price and price cuts. The explanatory variable is the price history that is viewed. The response variable is the price the student would expect to pay for the detergent.
3.57 (a) The subjects are the 210 children. (b) The factor is the set of choices that are presented to each subject. The levels correspond to the three sets of choices. The response variable is whether they chose a milk drink or a fruit drink. (c) Use a diagram like those in Figures 3.4 and 3.5 to describe the design. (d) The first


5 children to be assigned to receive Set 1 have labels 119, 033, 199, 192, and 148.
3.59 (a) The more seriously ill patients may be assigned to the new method by their doctors. The seriousness of the illness would be a lurking variable that would make the new treatment look worse than it really is. (b) Use a diagram like those in Figures 3.4 and 3.5 to describe the design.
3.61 In a controlled scientific study, the effects of factors other than the treatment can be eliminated or accounted for.
3.63 (a) Use a diagram like those in Figures 3.4 and 3.5 to describe the design. (b) As a practical issue, it may take a long time to carry out the study. Some people might object because some participants will be required to pay for part of their health care, while others will not.
3.65 (a) Use a diagram like those in Figures 3.4 and 3.5 to describe the design. (b) Each subject will do the task twice, once under each temperature condition, in a random order. The difference in the number of correct insertions at the two temperatures is the observed response.
3.67 Use a diagram like those in Figures 3.4 and 3.5 to describe the design.
3.69 Draw a rectangular array with five rows and four columns. Label the plots across rows. The first 10 plots selected are labeled 19, 06, 09, 10, 16, 01, 08, 20, 02, and 07. These are assigned to Method A, and the remaining 10 are assigned to Method B.
3.71 The number 73% is a statistic and the number 68% is a parameter.
3.73 (a) 25.234 million dollars, which is smaller. (b) 30.788 million dollars, which is smaller. (c) 13.3 million dollars, which is close to the other medians.
3.75 (a) Approximately 10 million to 70 million dollars. (b) Approximately 5 million to 80 million dollars. (c) Approximately 18 million to 40 million dollars. (d) Sampling variability decreases as the sample size increases.
3.77 (a) The population is all people who live in Ontario. The sample is the 61,239 residents interviewed. (b) Yes. This is a very large probability sample.
3.79 (a) The worst-case scenario would be that we have a p̂ of 1, which would be 0.8 away from the true proportion. (b) If we use a p̂ of 0.5, then p̂ would be at most 0.5 away from the true proportion.
3.81 (a) p̂ = 0.8133. (b) The worst-case scenario is that the true proportion is 0. Then the estimate would be 0.8133 away. (c) It is impossible to know how far off this sample proportion is from the true population proportion because we do not know the true population proportion.
3.83 (b) Approximately 95% of the p̂ values are inside the interval (0.8072, 0.9196). (c) Approximately 95% of the p̂ values from 1200 samples are between 0.8072 and 0.9196, so this is a good indication of where the true proportion is.
3.85 2.503 is a parameter and 2.515 is a statistic.
3.87 The larger sample size suggested by the faculty advisor will decrease the sampling variability of the estimates.
3.89 The number of adults is larger than the number of men.
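The claims in 3.75(d) and 3.87, that p̂ varies from sample to sample and varies less for larger samples, can be checked by simulation. A sketch, with the true proportion set to an assumed 0.86 (roughly the center of the interval quoted in 3.83):

```python
import random

# Simulate the sampling distribution of p-hat for two sample sizes.
# Larger samples give a less variable estimate.
random.seed(42)
p = 0.86                   # assumed true proportion, for illustration

def phat(n):
    return sum(random.random() < p for _ in range(n)) / n

def spread(n, reps=2000):
    vals = [phat(n) for _ in range(reps)]
    m = sum(vals) / reps
    return (sum((v - m) ** 2 for v in vals) / reps) ** 0.5

s100, s1200 = spread(100), spread(1200)
print(s100, s1200)         # the second spread is markedly smaller
```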


3.91 (a) Answers will vary depending on the random numbers you choose. For a sample of invoice numbers 9, 8, 1, and 6, corresponding to days past due of 6, 7, 12, and 15, the average is x̄ = 10. (b) Answers will vary depending on the random numbers you choose. The center of the histogram of the 10 repetitions should not be far from 8.2.
3.93 (a) Starting at line 101, p̂ = 0.6. (b) The 9 additional samples have p̂ = 0.4, 0.6, 0.4, 0.0, 0.6, 0.2, 0.8, 0.8, and 0.6. (c) These 10 samples can be used to draw a histogram. (d) The center is close to 0.6. If more samples were taken, the center should be 0.6, since p̂ is an unbiased estimator of p.
3.95 (a) Many of the subjects will be non-scientists, and their viewpoint is important to consider. Scientists may be so concerned with the experiment itself that they do not fully consider the experience of participating in the experiment from the subjects' point of view. (b) Answers will vary. Choosing a member of the clergy might be tricky: which religion would be picked? A medical doctor might be considered just another scientist. An activist for patients' rights might be acceptable; however, the activist would have a bias before even reviewing the experiment.
3.97 (a) This is acceptable. (b) This is acceptable as long as no names will be reported and the social psychologist doesn't interfere in any way. (c) This is not acceptable.
3.99 This is not anonymous because it takes place in the person's home. However, it is confidential if the name/address is separated from the response before the results are publicized.
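The claim in 3.93(d) that p̂ is an unbiased estimator of p can be illustrated by simulation. A sketch with p = 0.6 and samples of size 5, mirroring that exercise's p̂ values in steps of 0.2:

```python
import random

# The average of many sample proportions settles near the true p,
# illustrating that p-hat is an unbiased estimator of p.
random.seed(7)
p, n, reps = 0.6, 5, 10000
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]
mean_phat = sum(phats) / reps
print(mean_phat)           # close to 0.6
```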

3.101 (a) The pollsters must tell the potential respondents what type of questions will be asked and how long it will take to complete the survey. (b) This is required so the respondents can make sure the polling group is legitimate or so the respondent can issue a complaint if necessary.
3.103 Psychology 001 uses dependent subjects, which does not seem ethical. The other two courses have acceptable alternatives, which make the use of the students more ethical.
3.105 Answers will vary.
3.107 Answers will vary.
3.109 Answers will vary.
3.111 (a) This is an experiment because the students are reacting to the ads they were shown. The ads are the treatment that is applied to the subjects. (b) The explanatory variable is the type of ad shown to the student, and the response variable is the expected price for the cola that the student states.
3.113 (a) All adults. (b) 37.17%. (c) Most people probably will not be able to accurately remember how many movies they have watched in a movie theater over the past 12 months. That is a very long time period, and they were not warned ahead of time to keep track of their ticket stubs. (d) The results would be more accurate if the question concerned just the past month.
3.115 (a) This is a completely randomized design with 2 treatment groups (20 women in each group). One group would receive the brochures for Company A and for Company B with child care. The other group would receive the brochures for Company A and for Company B without child care. At the end of the experiment, record each woman's choice of company. (b) Using Table B, starting at line


161, Group 1 will consist of Wong (40), Adamson (02), Rivera (31), Chen (06), Morse (28), Roberts (32), Janle (21), Ullmann (38), Gupta (16), Brown (04), Chen (06), Sugiwara (34), Gerson (14), Travers (36), Kim (23), Edwards (11), McNeill (27), Danielson (09), and A (03). The other 20 subjects would be in Group 2.
3.117 This is an observational study because no treatment is applied. The researcher is just measuring the men as they already are.
3.119 This is a matched pairs experiment. The two types of muffin are the treatments.
3.121 The wording of questions has the most important influence on the answers to a survey. Leading questions can introduce strong bias, and both of these questions lead the respondent to answer Yes, which results in contradictory responses.
3.123 (a) There may be systematic differences between the recitations attached to the two lectures. (b) Randomly assign the 20 recitations to the two groups, 10 in each group. First number the 20 recitations from 01 to 20. To carry out the random assignment, enter Table B and read two-digit groups until 10 recitations have been selected. Answers will vary depending on where in the table you start.
3.125 The nonresponse rate can produce serious bias. For the original questionnaires, there was a 37% response rate. For the firms sent follow-up questionnaires, nonresponse was a serious problem as well.
3.127 Take an SRS from each of the four groups for a stratified sample.
3.129 This is a sensitive question, and many people are embarrassed to admit that they do not vote.
3.131 (a) The response variable is whether or not the subject gets colon cancer. The explanatory variables are the four different supplement combinations. (b) Use a diagram like those in Figures 3.4 and 3.5 to describe the design. (c) The first five subjects assigned to the beta carotene group are subjects 731, 253, 304, 470, and 296. (d) Neither the subjects nor those evaluating the subjects' response knew which treatments were applied.
(e) Any observed differences are due to chance. (f) People who eat lots of fruits and vegetables tend to have different diets, may exercise more, may smoke less, etc.
3.133 (a) The factors are storage method, at three levels, and when cooked, at two levels. (b) One possible design is to take the group of judges and divide them at random into six groups. One group is assigned to each treatment. (c) Having each subject taste fries from each of the six treatments in a random order is a block design and eliminates the variability between subjects.
3.135 Subjects should not be told where each burger comes from and in fact shouldn't be told which two burger chains are being compared.
3.137 Answers will vary depending on the random numbers you choose. We would expect that in a long series of random assignments, about half, or 5, of the rats with genetic defects would be in the experimental group.
3.139 For comparison purposes, it is important to draw the histograms using the same scales. Increasing the sample size decreases the sampling variability of p̂. The histograms become more concentrated about their center, which is p = 0.6, so the chance of getting a value of p̂ far from 0.6 becomes smaller as the sample size increases.
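The random assignments in 3.123 and 3.137, like the designs earlier in the chapter, replace Table B with a software shuffle. A sketch for 20 units split into two groups of 10:

```python
import random

# Completely randomized assignment: shuffle the 20 units and give
# the first half to treatment 1, the rest to treatment 2 (cf. 3.123).
random.seed(131)           # arbitrary seed standing in for a Table B line
units = list(range(1, 21))
random.shuffle(units)
group1, group2 = sorted(units[:10]), sorted(units[10:])
print(group1, group2)
```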


CHAPTER 4
4.1 We spun a nickel 50 times and got 22 heads. We estimate the probability of heads to be 0.44. Answers will vary.
4.3 The first 200 digits contain 21 0s. The proportion of 0s is 0.105.
4.5 We tossed a thumbtack 100 times and it landed with the point up 39 times. The approximate probability of landing point up is 0.39.
4.7 (a) 0. (b) 1. (c) 0.01. (d) 0.6.
4.9 (a) The number of buys (1s) in our simulation was 50. The percent of simulated customers who bought a new computer is 50%. (b) The longest run of buys (1s) was 4. The longest run of not buys was 5.
4.11 (a) In our 100 draws the number in which at least 14 people had a favorable opinion was 37. The approximate probability is 0.37. (b) The shape of our histogram was roughly symmetric and bell-shaped, the center appeared to be about 65%, and the values ranged from 40% to 90%. (c) The shape of our histogram was roughly symmetric and bell-shaped, the center was about 65% or 66%, and the values ranged from 58% to 74%. (d) Both distributions are roughly symmetric with similar centers. The histogram in (b) is more spread out.
4.13 (a) Answers will vary, but should be close to 6%. (b) This simulation represents 100 randomly selected items which have been purchased. A success is an item which is returned. Over the long run, there is a 0.06 probability that a purchased item will be returned.
4.15 (a) S = {any number (including fractional values) between 0 and 24 hours}. (b) S = {any integer value between 0 and 11,000}. (c) S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. (d) One possibility is S = {any number 0 or larger}. (e) We might take S = {any possible number, either positive or negative}.
4.17 P(death was either agriculture-related or manufacturing-related) = 0.253. P(death was related to some other occupation) = 0.747.
4.19 Model 1: Not legitimate. The probabilities do not sum to 1. Model 2: Legitimate. Model 3: Not legitimate. The probabilities do not sum to 1. Model 4: Not legitimate. Some of the probabilities are greater than 1.
4.21 (a) P(completely satisfied) = 0.39, since the probabilities must sum to 1. (b) P(dissatisfied) = 0.14.
4.23 (a) The area of a triangle is (1/2) × base × height = (1/2) × 2 × 1 = 1. (b) The probability that T is less than 1 is 0.5. (c) The probability that T is less than 0.5 is 0.1125.
4.25 (a) P(Y > 300) = 0.5. (b) P(Y > 370) = 0.025.
4.27 (a) 0.19. (b) 0.287. (c) 0.790.
4.29 (a) P(Y > 1) = 0.69. (b) 0.28. (c) 0.65.
4.31 (a) Yes, the percentages add up to 100%. (b) 0.02. (c) 0.919.
4.33 (a) P(not farmland) = 0.08. (b) P(either farmland or forest) = 0.93. (c) P(something other than farmland or forest) = 0.07.
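Simulation answers like those in 4.1 through 4.13 can be reproduced with a few lines of code. A minimal sketch (the 6% return rate is the probability used in 4.13; the seed and the number of trials are arbitrary choices, and more trials are used here than in the exercise so the estimate settles near the true value):

```python
import random

random.seed(7)  # fixed seed so the run is reproducible

# Simulate purchased items, each independently returned with
# probability 0.06 (the return rate used in Exercise 4.13).
p_return = 0.06
trials = 10_000
returns = sum(1 for _ in range(trials) if random.random() < p_return)

# The observed proportion estimates the underlying probability.
print(returns / trials)
```

With only 100 trials, as in the exercise, the estimate varies noticeably from run to run; that variation is the point of the exercise.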


4.35 (a) P(blue) = 0.1. (b) P(blue) = 0.2. (c) P(plain M&M is red, yellow, or orange) = 0.5. P(peanut M&M is red, yellow, or orange) = 0.5.
4.37 (a) Legitimate. (b) Not legitimate. The sum of the probabilities is greater than 1. (c) Not legitimate. The probabilities of the outcomes do not sum to 1.
4.39 (a) P(A) = 0.29. P(B) = 0.18. (b) "A does not occur" means the farm is 50 acres or more. P(A does not occur) = 0.71. (c) "A or B" means the farm is less than 50 acres or 500 or more acres. P(A or B) = 0.47.
4.41 (a) The 8 arrangements of preferences are NNN, NNO, NON, ONN, NOO, ONO, OON, OOO. Each must have probability 0.125. (b) P(X = 2) = 0.375. (c) The distribution of X is

X       0      1      2      3
P(X)  0.125  0.375  0.375  0.125

4.43 (a) All the probabilities given are between 0 and 1 and they sum to 1. (b) P(X ≥ 5) = 0.11. (c) P(X > 5) = 0.04. (d) P(2 < X ≤ 4) = 0.32. (e) P(X ≠ 1) = 0.75. (f) P(a randomly chosen household contains more than two persons) = P(X > 2) = 0.43.
4.45 (a) Continuous. All times greater than 0 are possible without any separation between values. (b) Discrete. It is a count that can take only the values 0, 1, 2, 3, or 4. (c) Continuous. Any number 0 or larger is possible without any separation between values. (d) Discrete. Household size can take only the values 1, 2, 3, 4, 5, 6, or 7.
4.47 (a) All the probabilities given are between 0 and 1 and they sum to 1. (b) {X ≥ 1} means the household owns at least 1 car. P(X ≥ 1) = 0.91. (c) Households that have more cars than the garage can hold have 3 or more cars. 20% of households have more cars than the garage can hold.
4.49 (a) The probability that a tire lasts more than 50,000 miles is P(X > 50,000) = 0.5. (b) P(X > 60,000) = 0.0344. (c) The Normal distribution is continuous, so P(X = 60,000) = 0 and P(X ≥ 60,000) = P(X > 60,000) = 0.0344.
4.51 μ = 18.5. If we record the size of the hard drive chosen by many, many customers in the 60-day period and compute the average of these sizes, the average will be close to 18.5. Knowing μ is not very helpful, as it is not one of the possible choices and does not indicate which choice is most popular.
4.53 (a) μ_X = 280. μ_Y = 195. (b) Profit at the mall is 25X and μ_25X = 7000. Profit downtown is 35Y and μ_35Y = 6825. (c) The combined profit is 25X + 35Y. μ_(25X+35Y) = 13,825.
4.55 μ_Y = 445, σ²_Y = 19,225.00, and σ_Y = 138.65.
4.57 (a) μ_X = 280, σ²_X = 5600, and σ_X = 74.83. (b) μ_Y = 195, σ²_Y = 6475.00, and σ_Y = 80.47.
4.59 μ_(X−Y) = 100. σ²_(X−Y) = 10,000. σ_(X−Y) = 100.
4.61 If the correlation between two variables is positive, then large values of one tend to be associated with large values of the other, resulting in a relatively small difference. Also, small values of one tend to be associated with small values of the other, again resulting in a relatively small difference. This suggests that when two variables are positively associated, they vary together and the difference tends to stay relatively small and varies little.
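The means and standard deviations in 4.51 through 4.59 all come from the definitions μ = Σ x·P(x) and σ² = Σ (x − μ)²·P(x). A short sketch, using the distribution of X found in 4.41:

```python
# Mean and standard deviation of a discrete random variable,
# computed directly from its probability table (here the
# distribution of X from Exercise 4.41).
values = [0, 1, 2, 3]
probs  = [0.125, 0.375, 0.375, 0.125]

mu = sum(x * p for x, p in zip(values, probs))            # mu = sum of x*P(x)
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
sigma = var ** 0.5

print(mu, var, sigma)   # 1.5 0.75 0.866...
```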


4.63 (a) 720. (b) The distribution is

No. of transactions    0     1     2     3     4     5
% of clients          0.08  0.17  0.35  0.19  0.18  0.13

(c) Yes, if for 5 transactions the 0.125 is rounded to 0.13, then the probabilities add up to 1. If the 0.125 is rounded down to 0.12, the probabilities add up to 0.99.
4.65 (a) 0.0010. (b) 0.1131. (c) 379.28 bets per second.
4.67 (a) 1280. (b) The distribution is

X            218,720   1280
Probability    0.003   0.997

4.69 (a) Two important differences between the histograms are (1) the center of the distribution of the number of rooms of owner-occupied units is larger than the center of the distribution of the number of rooms of renter-occupied units and (2) the spread of the distribution of the number of rooms of owner-occupied units is slightly larger than the spread of the distribution of the number of rooms of renter-occupied units. (b) μ_owned = 6.284. μ_rented = 4.187. The mean number of rooms for owner-occupied units is larger than the mean number of rooms for renter-occupied units. This reflects the fact that the center of the distribution of the number of rooms of owner-occupied units is larger than the center of the distribution for renter-occupied units. The histogram for renter-occupied units is more peaked and less spread out than the histogram for owner-occupied units. σ²_owned = 2.69204, σ²_rented = 1.71174, σ_owned = 1.64074, σ_rented = 1.30833.
4.71 There are 1000 three-digit numbers (000 to 999). If you pick a number with three different digits, your probability of winning is 6/1000 and your probability of not winning is 994/1000. Your expected payoff is $0.49998.
4.73 (a) We would expect X and Y to be independent because they correspond to events that are widely separated in time. (b) Experience suggests that the amount of rainfall on one day is not closely related to the amount of rainfall on the next. We might expect X and Y to be independent or, because they are not widely separated in time, perhaps slightly dependent. (c) Orlando and Disney World are close to each other. Rainfall usually covers more than just a very small geographic area. We would not expect X and Y to be independent.
4.75 (a) If X is the time to bring the part from the bin to its position on the automobile chassis and Y is the time required to attach the part to the chassis, then the total time for the entire operation is X + Y. μ_(X+Y) = 31 seconds. (b) It will not affect the mean. (c) The answer in (a) will remain the same in both cases.
4.77 σ_(X+Y) = 4.47. If X and Y have correlation 0.3, σ_(X+Y) = 4.98.
4.79 If X and Y are positively correlated, then large values of X and Y tend to occur together, resulting in a very large value of X + Y. Likewise, small values of X and Y tend to occur together, resulting in a small value of X + Y. Thus, X + Y exhibits larger variation when X and Y are positively correlated than if they are not.
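The two values in 4.77 follow the general rule σ²(X + Y) = σ²(X) + σ²(Y) + 2ρ·σ(X)·σ(Y). A sketch, assuming σ_X = 2 and σ_Y = 4 (illustrative values, chosen because they reproduce the 4.47 and 4.98 quoted):

```python
# Standard deviation of a sum X + Y, with and without correlation.
# sigma_x = 2 and sigma_y = 4 are assumed for illustration; they are
# consistent with the values quoted in Exercise 4.77.
sigma_x, sigma_y = 2.0, 4.0

def sd_of_sum(rho):
    # var(X + Y) = var(X) + var(Y) + 2 * rho * sd(X) * sd(Y)
    return (sigma_x**2 + sigma_y**2 + 2 * rho * sigma_x * sigma_y) ** 0.5

print(sd_of_sum(0.0))  # independent: sqrt(20), about 4.47
print(sd_of_sum(0.3))  # correlated:  sqrt(24.8), about 4.98
```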


4.81 μ_X = (μ + σ)(0.5) + (μ − σ)(0.5) = μ. σ²_X = (μ + σ − μ)²(0.5) + (μ − σ − μ)²(0.5) = σ². Thus, σ_X = σ.
4.83 (a) σ²_(X+Y) = 7,812,763.75 and σ_(X+Y) = 2795.13. (b) σ²_Z = σ²_(2000X+3500Y) = 3.135635594 × 10¹³. Thus, σ_Z = 5,599,674.
4.85 (a) The two students are selected at random and we expect their scores to be unrelated or independent. (b) μ_(female−male) = 15. σ²_(female−male) = 2009 and σ_(female−male) = 44.82. (c) We cannot find the probability that the woman chosen scores higher than the man chosen because we do not know the probability distribution for the scores of women or men.
4.87 σ²_(0.8W+0.2Y) = 15.60. Thus, σ_(0.8W+0.2Y) = 3.95. This is smaller than the result in Exercise 4.70 because we no longer include the positive correlation term 2ρ σ_0.8W σ_0.2Y. The mean return remains the same.
4.89 If ρ_XY = 1, then σ²_(X+Y) = σ²_X + σ²_Y + 2ρ_XY σ_X σ_Y = σ²_X + σ²_Y + 2σ_X σ_Y = (σ_X + σ_Y)². So σ_(X+Y) = σ_X + σ_Y.
4.91 (a) μ_X = 550. σ_X = 5.7. (b) μ_(X−550) = 0. σ²_(X−550) = 32.5 and σ_(X−550) = 5.7. (c) μ_Y = μ_((9X/5)+32) = 1022. σ²_Y = σ²_((9X/5)+32) = 105.3 and σ_Y = 10.26.
4.93 These are statistics. Pfeiffer undoubtedly tested only a sample of all the models produced by Apple, and these means are computed from these samples.
4.95 The law of large numbers says that in the long run, the average payout to Joe will be 60 cents. However, Joe pays $1.00 to play each time, so in the long run his average winnings are −$0.40. Thus, in the long run, if Joe keeps track of his net winnings and computes the average per bet, he will find that he loses an average of 40 cents per bet.
4.97 That is not right. The law of large numbers tells us that the long-run average will be close to 34%. Six of seven at-bats is hardly the long run. Furthermore, the law of large numbers says nothing about the next event. It only tells us what will happen if we keep track of the long-run average.
4.99 (a) Using statistical software we compute the mean of the 10 sizes to be μ = 69.4. (b) Using line 120, our SRS is companies 3, 5, 4, 7. x̄ = 67.25. (c) The center of our histogram appears to be at about 69, which is close to the value of μ = 69.4 computed in part (a).
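The long-run behavior described in 4.95 is easy to see by simulation. The exercise's actual payout distribution is not reproduced in this answer key, so this sketch invents a hypothetical one with the same mean payout of $0.60:

```python
import random

random.seed(1)

# Hypothetical game: each $1 bet pays $6 with probability 0.10 and
# nothing otherwise, so the expected payout is $0.60 -- an expected
# net loss of $0.40 per bet, matching the situation in Exercise 4.95.
# (These payouts are illustrative, not the ones from the exercise.)
def payout():
    return 6.0 if random.random() < 0.10 else 0.0

n = 100_000
avg = sum(payout() for _ in range(n)) / n
print(avg)  # close to 0.60 for large n, as the law of large numbers predicts
```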

4.101 (a) To say that x̄ is an unbiased estimator of μ means that x̄ will neither systematically overestimate nor underestimate μ in repeated use and that if we take many, many samples, calculate x̄ for each, and compute the average of these x̄-values, this average would be close to μ. (b) If we draw a large sample from a population, compute the value of some statistic (such as x̄), repeat this many times, and keep track of our results, these results will vary less from sample to sample than the results we would obtain if our samples were small.
4.103 The sampling distribution of x̄ should be approximately N(1.6, 0.085). P(x̄ > 2) = P(Z > 4.76), which is approximately 0.
4.105 500,000,000 is a parameter, and 5.6 is a statistic.
4.107 19 is a parameter. 14 is a statistic.
4.109 The gambler pays $1.00 for an expected payout of $0.947. His expected winnings are −$0.053 per bet. The law of large numbers tells us that if the gambler makes


a large number of bets on red, keeps track of his net winnings, and computes the average of these, this average will be close to −$0.053. He will find he loses about 5.3 cents per bet on average.
4.111 (a) The mean x̄ of n = 3 weighings will follow an N(123, 0.0462) distribution. (b) P(x̄ ≥ 124) = P(Z ≥ 21.65), which is approximately 0.
4.113 P(X > 210.53) = 0.0052.
4.115 0.0125.
4.117 (a) P(X < 295) = 0.1587. (b) The mean contents x̄ will vary according to an N(298, 1.225) distribution. P(x̄ < 295) = P(Z < −2.45) = 0.0071.
4.119 The range 0.11 to 0.19 will contain approximately 95% of the many x̄'s.
4.121 (a) P(x̄ > 400) = P(Z > 10.94), which is approximately 0. (b) We would need to know the distribution of weekly postal expenses to compute the probability that postage for a particular week will exceed $400. In part (a) we applied the central limit theorem, which does not require that we know the distribution of weekly expenses.
4.123 Sheila's mean glucose level x̄ will vary according to an N(125, 5) distribution. L must satisfy P(x̄ > L) = 0.05. We find L = 133.25.
4.125 (a) E(X) = 620 and σ = 12,031.808. (b) E(Y) = 620 and σ = 12,031.808. (c) The standard deviation is the same for both.
4.127 (a) 0.0866. (b) 9. (c) There is always sampling variability, but this variability is reduced when an average is used instead of an individual measurement. The larger the sample size, the smaller the variability.
4.129 (a) 0.3544. (b) 0.8558. (c) 0.9990. (d) The probabilities using the central limit theorem are more accurate as the sample size increases. The 150-bag probability calculation is probably fairly accurate, but the 3-bag probability calculation is probably not.
4.131 P(Chavez promoted) = 0.55.
4.133 (a) All the probabilities listed are between 0 and 1 and they sum to 1. (b) P(worker is female) = 0.43. (c) P(not in occupation F) = 0.96. (d) P(occupation D or E) = 0.28. (e) P(not in occupation D or E) = 0.72.
4.135 The probability distribution is P(Y = y) = 1/12 for each y = 1, 2, 3, . . . , 12.
4.137 The weight of a carton varies according to an N(780, 17.32) distribution. Letting Y denote the weight of a carton, P(750 < Y < 825) = P(−1.73 < Z < 2.60) = 0.9535.
4.139 (a) All the probabilities listed are between 0 and 1 and they sum to 1. (b) μ = 2.45.
4.141 If a single home is destroyed by fire, the replacement cost could be several hundred thousand dollars. The money received from 12 policies (unless extra charges for costs and profits are huge) would not cover this replacement cost.
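Normal probabilities such as the one in 4.137 can be computed without tables via the error function. A sketch checking P(750 < Y < 825) for Y ~ N(780, 17.32):

```python
from math import erf, sqrt

def phi(z):
    # Standard Normal cumulative distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 780, 17.32          # carton-weight distribution from 4.137
low, high = 750, 825

p = phi((high - mu) / sigma) - phi((low - mu) / sigma)
print(p)  # about 0.9537; the table-based answer rounds to 0.9535
```

The tiny difference from the printed 0.9535 comes from rounding z to two decimals when using tables.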


Although the chance that a home will be destroyed by fire is small, the risk to the company is too great. If one sells thousands of policies, one can appeal to the law of large numbers and feel confident that the mean loss per policy will be close to $250. Thus, the company can be reasonably sure that the amount it charges for extra costs and profits will be available for these costs and profits, and that the company will make money. The more policies the company sells, the better off the company will be.
4.143 P(age at death ≥ 26) = 0.99058. μ_X = 303.35.
4.145 σ²_X = 94,236,826.6. σ_X = 9707.57.
4.147 (a) S = {0, 1, 2, 3, . . . , 100} is one possibility (but assuming a person could be employed for 100 years is extreme). (b) Because we are allowing only a finite number of possible values, X is discrete. (c) We included 101 possible values (0 to 100).
4.149 (a) S = {all numbers between 0 and 35 ml with no gaps}. (b) S is a continuous sample space. All values between 0 and 35 ml are possible with no gaps. (c) We included an infinite number of possible values.

CHAPTER 5
5.1 (a) The probability that the rank of the second student falls into the five categories is unaffected by the rank of the first student selected. (b) 0.1681. (c) 0.0041.
5.3 (a) 0.32768. (b) 0.65908.
5.5 0.66761.
5.7 (a) 15% drink only cola. (b) 20% drink none of these beverages.
5.9 (a) W = {0, 1, 2, 3}. (b, c) The arrangements and probabilities are

W      Arrangements     Probability of each arrangement   Probability for W
W = 0  DDD              (0.73)³ = 0.389                   0.389
W = 1  DDF, DFD, FDD    (0.27)(0.73)² = 0.144             0.432
W = 2  DFF, FDF, FFD    (0.27)²(0.73) = 0.0532            0.160
W = 3  FFF              (0.27)³ = 0.0197                  0.0197

5.11 (a) (0.09)⁶ = 0.0000005. (b) (0.91)⁶ = 0.568. (c) 0.337.
5.13 0.09608.
5.15 (a) 0.2746. (b) The probability of the price being down in any given year is 1 − 0.65 = 0.35. Since the years are independent, the probability of the price being down in the third year is 0.35. (c) 0.5450.
5.17 0.8.
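The "Probability for W" column in 5.9 is the binomial distribution B(3, 0.27), with W counting the F outcomes; the same numbers can be generated directly:

```python
from math import comb

# Binomial distribution B(n = 3, p = 0.27): P(W = k) for k = 0..3.
# This reproduces the "Probability for W" column of Exercise 5.9.
n, p = 3, 0.27
dist = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print([round(q, 3) for q in dist])  # [0.389, 0.432, 0.16, 0.02]
```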


5.19 Yes. A and B are independent if P(A and B) = P(A)P(B), and 0.3 = (0.6)(0.5).
5.21 0.3762.
5.23 P(O and Rh-positive) = 0.3780, P(O and Rh-negative) = 0.0720, P(A and Rh-positive) = 0.3360, P(A and Rh-negative) = 0.0640, P(B and Rh-positive) = 0.0924, P(B and Rh-negative) = 0.0176, P(AB and Rh-positive) = 0.0336, and P(AB and Rh-negative) = 0.0064.
5.25 (a) The probability of an 11 is 2/36. The probability of three 11s in three independent throws is (2/36)³ = 0.000171. (b) The writer's first statement is correct. The odds against throwing three straight 11s, however, are (1 − (2/36)³)/(2/36)³ = 5831 to 1. When computing the odds for the three tosses, the writer multiplied the odds, which is not the correct way to compute the odds for the three throws.
5.27 The assumption of a fixed number of observations is violated.
5.29 (a) X can be 0, 1, 2, 3, 4, 5. (b) P(X = 0) = 0.2373, P(X = 1) = 0.3955, P(X = 2) = 0.2637, P(X = 3) = 0.0879, P(X = 4) = 0.0146, and P(X = 5) = 0.0010. These probabilities can be used to draw a histogram.
5.31 P(X = 11) = 0.0074.
5.33 μ = 8 and σ = √(np(1 − p)) = √(20(0.4)(0.6)) = 2.191.
5.35 (a) μ = 16. (b) σ = √(np(1 − p)) = √(20(0.8)(0.2)) = 1.789. (c) When p = 0.9, σ = 1.342; and when p = 0.99, σ = 0.445. As the value of p gets closer to 1, there is less variability in the values of X.
5.37 (a) Using the Normal approximation, P(X ≥ 100) = 0.0019. This probability is extremely small and suggests that p may be greater than 0.4. (b) P(X ≥ 10) was evaluated for a sample of size 20 and found to be 0.2447. The proportion in the sample will be closer to 40% for larger sample sizes, so the chance of the proportion in the sample being as large as 50% decreases.
5.39 (a) μ_X = np = 6, μ_p̂ = p = 0.5. (b) μ_X = 60 if n = 120, and μ_X = 600 if n = 1200. μ_p̂ stays the same regardless of sample size.
5.41 (a) X is binomial with n = 1555 and p = 0.20. (b) Using the Normal approximation to the binomial for proportions, μ_p̂ = 0.20, σ_p̂ = 0.0101, and P(p̂ ≤ 0.193) = 0.2451. Using software for the binomial distribution, P(X ≤ 300) = 0.254.
5.43 (a) The 20 machinists selected are not an SRS from a large population of machinists and could have different success probabilities. (b) We know that the count of successes in an SRS from a large population containing a proportion p of successes is well approximated by the binomial distribution. This description fits this setting.
5.45 (a) n = 10 and p = 0.25. (b) P(X = 2) = 0.2816. (c) P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.5256. (d) μ = 2.5 and σ = √(np(1 − p)) = √(10(0.25)(0.75)) = 1.37.
5.47 (a) n = 5 and p = 0.65. (b) The possible values of X are 0, 1, 2, 3, 4, 5. (c) P(X = 0) = 0.0053, P(X = 1) = 0.0488, P(X = 2) = 0.1811, P(X = 3) = 0.3364, P(X = 4) = 0.3124, and P(X = 5) = 0.1160. (d) μ = 3.25 and σ = √(np(1 − p)) = √(5(0.65)(0.35)) = 1.067. The value of μ = 3.25 should be

included in the histogram from part (c) and should be a good indication of the center of the distribution.
5.49 (a) Using the Normal approximation, P(X ≤ 70) = 0.1251. (b) P(X ≤ 175) corresponds to Jodi scoring 70% or lower. Using the Normal approximation, P(X ≤ 175) = 0.0344.
5.51 (a) The binomial distribution with n = 150 and p = 0.5 is reasonable for the count of successes in an SRS of size n from a large population, as in this case. (b) We expect 75 businesses to respond. (c) Using the Normal approximation, P(X ≤ 70) = 0.2061. (d) n must be increased to 200.
5.53 (a) μ = 180 and σ = √(np(1 − p)) = √(1500(0.12)(0.88)) = 12.586. (b) Using the Normal approximation, P(X ≤ 170) = 0.2148.
5.55 (a) Poisson distribution with a mean of 12 × 7 = 84 accidents. (b) P(X ≤ 66) = 0.0248.
5.57 (a) The employees are independent and each equally likely to be hospitalized. (b) 0.0620. (c) 0.938. (d) 0.256.
5.59 (a) 0.0821. (b) 0.2424.
5.61 (a) Poisson distribution with a mean of 48.7. P(X ≥ 50) = 1 − P(X ≤ 49) = 0.4450. (b) For a 15-minute period, σ = √48.7 = 6.98. For a 30-minute period, σ = √97.4 = 9.87. (c) Poisson with mean 2 × 48.7 = 97.4. P(X ≥ 100) = 1 − P(X ≤ 99) = 1 − 0.5905 = 0.4095.
5.63 (a) σ = √17 = 4.12. (b) P(X ≤ 10) = 0.0491. (c) P(X > 30) = 1 − P(X ≤ 30) = 1 − 0.9986 = 0.0014.
5.65 (a) P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.0018 = 0.9982. (b) Poisson with mean (1/2) × 14 = 7. P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.1730 = 0.8270. (c) Poisson with mean (1/4) × 14 = 3.5. P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.7254 = 0.2746.
5.67 (a) σ = √2.3 = 1.52. (b) P(X > 5) = 1 − P(X ≤ 5) = 1 − 0.9700 = 0.0300. (c) k = 3.
5.69 0.1472.
5.71 (a) 0.4335. (b) 0.4776. (c) 0.4617. (d) 0.0489.
5.73 (a) 0.5970. (b) 0.3150. (c) If the events A and B were independent, then P(B|A) = P(B), which is not the case.
5.75 Using Bayes's rule, the probability is 0.7660.
5.77 0.3090.
5.79 0.8989. Given that the customer defaults on the loan, there is an 89.89% chance that the customer overdraws the account.
5.81 (a) 0.25. (b) 0.3333.
5.83 P(Y < 1/2 and Y > X) = 1/8. Drawing a diagram should help.
5.85 (a) 0.20. (b) 0.62. These answers are simpler to see if you first draw a tree diagram.
5.87 Using Bayes's rule, the probability is 0.323.
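The Poisson tail probabilities in 5.61 through 5.67 all use the pattern P(X ≥ k) = 1 − P(X ≤ k − 1). A sketch checking part (b) of 5.65 (mean 7):

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    # P(X <= k) for a Poisson random variable with mean lam.
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

lam = 7                        # mean number of events, as in 5.65(b)
p = 1 - poisson_cdf(4, lam)    # P(X >= 5) = 1 - P(X <= 4)
print(round(p, 4))  # 0.827
```

The same function reproduces the other Poisson answers by changing `lam` and the cutoff.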


5.89 (a) Using Bayes's rule, the probability is 0.064. (b) Expect 94 not to have defaulted. (c) With the credit manager's policy, the vast majority of those whose future credit is denied would not have defaulted.
5.91 Yes. A and B are independent if P(A and B) = P(A)P(B), and 0.3 = (0.6)(0.5).
5.93 Using Bayes's rule, the probability is 0.1930.
5.95 (a) 0.1416. (b) 0.1029. (c) The people are independent from each other, and each person has a 70% chance of being male.
5.97 0.75.
5.99 (a) μ = 3.75. (b) P(X ≥ 10) = 1 − P(X ≤ 9) = 1 − 0.9992 = 0.0008. (c) Using the Normal approximation, P(X ≥ 275) = 0.0336.
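Several of these answers (5.75, 5.87, 5.89, 5.93) use Bayes's rule, P(A | B) = P(B | A)P(A)/P(B). The inputs for those exercises are not reproduced in this answer key, so this sketch uses hypothetical numbers:

```python
# Bayes's rule with hypothetical inputs: suppose 1% of accounts are
# "bad", a screening flag catches 95% of bad accounts, and it also
# flags 10% of good ones.  (Illustrative numbers, not from the text.)
p_bad = 0.01
p_flag_given_bad = 0.95
p_flag_given_good = 0.10

# Total probability of a flag, then Bayes's rule.
p_flag = p_flag_given_bad * p_bad + p_flag_given_good * (1 - p_bad)
p_bad_given_flag = p_flag_given_bad * p_bad / p_flag
print(round(p_bad_given_flag, 4))  # 0.0876
```

Note how a rare condition stays fairly unlikely even after a positive flag; the same structure drives the answers above.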

5.101 (a) μ = 1250. (b) Using the Normal approximation, P(X ≥ 1245) = 0.5596.
5.103 (a) The probability of a success p should be the same for each observation. To ensure that this is true, it is important to take the observations under similar conditions, such as the same location and time of day. (b) The probability of the driver being male for observations made outside a church on Sunday morning may differ from the probability for observations made on campus after a dance. (c) P(X ≤ 8) = 0.4557. (d) Using software, P(X ≤ 80) = 0.1065.
5.105 0.84.
5.107 (a) 0.674. (b) 0.787. (c) If the events "in labor force" and "college graduate" were independent, we should have P(in labor force) = P(in labor force | college graduate), which is not true by comparing the answers in (a) and (b).
5.109 (a) Drawing the tree diagram will be helpful in solving the remaining parts. (b) 0.01592. (c) 0.627.
5.111 (a) (5/6)(1/6). (b) (5/6)²(1/6). (c) P(first one on kth toss) = (5/6)^(k−1) (1/6).
5.113 (a) 0.751. (b) 0.48.
5.115 (a) 0.125. (b) 0.024.
5.117 Using Bayes's rule, the probability is 0.0404.

CHAPTER 6
6.1 The standard deviation for x̄ is $22.
6.3 2 × (standard deviation for x̄) = $44.
6.5 (189.91 million dollars, 250.09 million dollars).
6.7 n = 1244.68. Use n = 1245.
6.9 (a) The response rate is 1468/13,000 = 0.1129. (b) If there are systematic patterns in the organizations that did not respond, the survey results may be biased. The small margin of error is probably not a good measure of the accuracy of the survey's results.
6.11 The margins of error are

Sample size       10    20    40   100
Margin of error  3.10  2.19  1.55  0.98


As sample size increases, the width of the confidence interval (or the size of the margin of error) decreases.
6.13 The students in your major will have a smaller standard deviation because many of them will be taking the same classes which require the same textbooks. The smaller standard deviation leads to a smaller margin of error.
6.15 30.8 ± 4.51, or (26.29, 35.31).
6.17 5.96 ± 0.41.
6.19 (a) 115 ± 13.25 minutes, or (101.75 minutes, 128.25 minutes). (b) No. The confidence coefficient of 95% is the probability that the method we used will give an interval containing the correct value of the population mean study time. It does not tell us about individual study times.
6.21 (a) The mean weight of the runners in kilograms is x̄ = 61.863. The mean weight of the runners in pounds is 136.099. (b) The standard deviation of the mean weight in kilograms is 0.9186. The standard deviation of the mean weight in pounds is 2.0208. (c) A 95% confidence interval for the mean weight, in kilograms, of the population is (60.062, 63.664). In pounds we get (132.137, 140.061).
6.23 11.8 ± 0.77 years, or (11.03 years, 12.57 years).
6.25 $17,528.90 ± $961.53, or ($16,567.37, $18,490.43).
6.27 n = 74.37. Use n = 75.
6.29 (a) No. We are 95% confident that the true population percent falls in this interval. (b) This particular interval was produced by a method that will give an interval that contains the true percent of the population that like their job 95% of the time. When we apply the method once, we do not know if our interval correctly includes the population percent or not. Because the method yields a correct result 95% of the time, we say we are 95% confident that this is one of the correct intervals. (c) 1.531. (d) No, the margin of error only covers random variation.
6.31 Answers will vary. Some possibilities include blue collar vs. white collar jobs, and management vs. entry- and mid-level employees.
6.33 (a) 95% confidence means that this particular interval was produced by a method that will give an interval that contains the true percent of the population that will vote for Ringel 95% of the time. When we apply the method once, we do not know if our interval correctly includes the true population percent or not. Thus, 95% refers to the method, not to any particular interval produced by the method. (b) The margin of error is 3%, so the 95% confidence interval is 52% ± 3% = (49%, 55%). The interval includes 50%, so we cannot be confident that the true percent is greater than 50%; it may even be slightly less than 50%. Hence, the election is too close to call from the results of the poll.
6.35 The results are not trustworthy. The formula for a confidence interval is relevant if our sample is an SRS (or can plausibly be considered an SRS). Phone-in polls are not SRSs.
6.37 H0: μ = 0, Ha: μ < 0.
6.39 (a) z = 1.58. (b) P-value = 0.1142. This would not be considered strong evidence that Cleveland differs from the national average.
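The margin-of-error table in 6.11 comes from m = z*·σ/√n. A sketch, assuming σ = 5 and 95% confidence (z* = 1.96), values consistent with the entries shown:

```python
from math import sqrt

# Margin of error m = z* * sigma / sqrt(n).  sigma = 5 and z* = 1.96
# are assumed here; they reproduce the table given for Exercise 6.11.
sigma, z_star = 5, 1.96

for n in (10, 20, 40, 100):
    m = z_star * sigma / sqrt(n)
    print(n, round(m, 2))   # 3.10, 2.19, 1.55, 0.98
```

Quadrupling n halves the margin of error, which is why the n = 40 entry is half the n = 10 entry.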


6.41 If the homebuilders have no idea whether Cleveland residents spend more or less than the national average, then they are not sure whether μ is larger or smaller than 31%. The appropriate hypotheses are H0: μ = 31%, Ha: μ ≠ 31%.
6.43 z-values that are significant at the α = 0.005 level are z > 2.807 and z < −2.807.
6.45 The significance level corresponding to z = 2 is 0.0456, and the significance level corresponding to z = 3 is 0.0026.
6.47 (a) P-value = 0.0287. (b) P-value = 0.9713. (c) P-value = 0.0574.
6.49 (a) With a P-value of 0.07 we would not reject H0: μ = 15, and thus 15 would not fall outside the 95% confidence interval. (b) With a P-value of 0.07 we would reject H0: μ = 15, and thus 15 would fall outside the 90% confidence interval.
6.51 (a) Yes. 0.049 < 0.05. (b) No. 0.049 > 0.01. (c) Reject the null hypothesis whenever the P-value is ≤ α.
6.53 (a) The null hypothesis should be μ = 0. The alternative hypothesis could be μ > 0. (b) The standard deviation of the sample mean should be 18/√30. (c) The hypothesis test uses the population parameter μ, not the sample statistic x̄.
6.55 H0: μ = 100, Ha: μ ≠ 100, z = 0.108, P-value = 2(0.4562) = 0.9124. Do not reject the null hypothesis. There is not enough evidence to say that the average north-south location is significantly different from 100.
6.57 If a significance level of 0.05 is used, do not reject the null hypothesis. There is not enough evidence to say that private four-year students have significantly higher debt than public four-year borrowers.
6.59 (a) H0: μ = 31, Ha: μ > 31. (b) H0: μ = 4, Ha: μ ≠ 4. (c) H0: μ = 1400, Ha: μ < 1400.
6.61 (a) H0: p_M = p_F, Ha: p_M > p_F, where the parameters of interest are the percent of males, p_M, and the percent of females, p_F, in the population who name economics as their favorite subject. (b) H0: μ_A = μ_B, Ha: μ_A > μ_B, where μ_A is the mean score on the test of basketball skills for the population of all sixth-grade students if all were treated as those in group A and μ_B is the mean score on the test of basketball skills for the population of all sixth-grade students if all were treated as those in group B. (c) H0: ρ = 0, Ha: ρ > 0, where the parameter of interest is the correlation ρ between income and the percent of disposable income that is saved by employed young adults.
6.63 The P-value is the probability that we would observe, simply by chance, results that are as strongly or more strongly in support of the calcium supplement if it is really no more effective than the placebo. In this case the P-value is 0.008, which is very small. Because it is unlikely that we would obtain data this strongly in support of the calcium supplement by chance, this is strong evidence against the assumption that the effect of the calcium supplement is the same as that of the placebo.
6.65 (a) Answers may vary, but one possibility is a two-sample comparison of means test with H0: μ_A = μ_B, Ha: μ_A ≠ μ_B, where group A consists of students who exercise regularly and group B consists of students who do not exercise regularly. (b) No, the P-value is large. There is not enough evidence to say that exercise significantly affects how students perform on their final exam in
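Two-sided P-values like the one in 6.55 are P = 2·P(Z ≥ |z|), which can be computed without tables. A sketch:

```python
from math import erf, sqrt

def phi(z):
    # Standard Normal cumulative distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

z = 0.108                        # test statistic from Exercise 6.55
p_value = 2 * (1 - phi(abs(z)))  # two-sided P-value
print(round(p_value, 4))  # about 0.914 (0.9124 with table rounding of z)
```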


statistics. (c) It would be good to know how the sample was selected and if this was an observational study or an experiment.
6.67 z = 3.56. The P-value is P(Z ≥ 3.56) < 0.002. We would conclude that there is strong evidence that these 5 sonnets come from a population with a mean number of new words that is larger than 6.9, and thus we have evidence that the new sonnets are not by our poet.
6.69 The P-value is 0.0164. This is reasonably strong evidence against the null hypothesis that the population mean corn yield is 135. Our results are based on the mean of a sample of 40 observations, and such a mean may vary approximately according to a Normal distribution (by the central limit theorem) even if the population is not Normal, so our conclusions are probably still valid.
6.71 (a) The P-value = 0.2040 and the result is not significant at the 5% level. (b) The P-value is 0.2040 and the result is not significant at the 1% level.
6.73 The approximate P-value is < 0.001.
6.75 (a) z > 1.645. (b) z > 1.96 or z < −1.96. (c) In part (a) we reject H0 only for large values of z because such values are evidence against H0 in favor of Ha: μ > 0. In part (b) we reject H0 in favor of Ha: μ ≠ 0 if either z is too large or z is too small because both extremes are evidence against H0 in favor of Ha: μ ≠ 0.
6.77 (a) (99.041, 109.225). (b) The hypotheses are H0: μ = 105, Ha: μ ≠ 105. The 95% confidence interval in part (a) contains 105, so we would not reject H0 at the 5% level.
6.79 (a) H0: μ = 7, Ha: μ ≠ 7. The 95% confidence interval does not contain 7, so we would reject H0 at the 5% level. (b) 5 is in the 95% confidence interval, so we would not reject the hypothesis that μ = 5 at the 5% level.
6.81 (a) z = 1.64. We reject H0 at the 5% level if z > 1.645. The result is not significant at the 5% level. (b) z = 1.65. The result is significant at the 5% level.
6.83 Any convenience sample (phone-in or write-in polls, surveys of acquaintances only) will produce data for which statistical inference is not appropriate. Poorly designed experiments also provide examples of data for which statistical inference is not valid.
6.85 Statistical significance and practical significance are not necessarily the same thing. With a significance level this high, we would be willing to reject the null hypothesis even when no practical effect exists. A P-value of 0.5 corresponds to a z value of 0, which means that our sample statistic would be 0 standard deviations away from the null hypothesis mean, but we would still reject the null hypothesis.
6.87 Answers will vary.
6.89 (a) No. (b) Yes. (c) No.
6.91 (a) P-value = 0.3821. (b) P-value = 0.1711. (c) P-value = 0.0013.
6.93 The conclusion is not justified. Statistical inference is not valid for badly designed surveys or experiments. This is a call-in poll, and such polls are not random samples.


6.95 Using the Bonferroni procedure for k = 6 tests with α = 0.05, we should require a P-value of α/k = 0.05/6 = 0.0083 (or less) for statistical significance for each test. Of the six P-values given, only two, 0.008 and 0.001, are below 0.05/6 = 0.0083.
6.97 (a) X has a binomial distribution with n = 77 and p = 0.05. (b) The probability that 2 or more are significant is 0.90266.
6.99 As sample size increases, power increases.
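The claim in 6.99 can be checked directly. A sketch of a one-sided z-test power calculation, assuming H0: μ = 300 vs. Ha: μ < 300 with σ = 3 and α = 0.05 (illustrative values consistent with the 0.4960 reported in 6.103 when n = 6):

```python
from math import erf, sqrt

def phi(z):
    # Standard Normal cumulative distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

# Power of the one-sided z test of H0: mu = 300 vs Ha: mu < 300 at
# alpha = 0.05, evaluated at the alternative mu = 298.  sigma = 3
# and the sample sizes below are assumed for illustration.
mu0, mu_alt, sigma, z_alpha = 300, 298, 3, 1.645

def power(n):
    se = sigma / sqrt(n)
    cutoff = mu0 - z_alpha * se        # reject H0 when xbar < cutoff
    return phi((cutoff - mu_alt) / se)

print(round(power(6), 3), round(power(20), 3))  # power grows with n
```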

6.101 Since 80 is farther away from 50 than 70 is, the power will be higher than 0.5.
6.103 (a) The power of the test against the alternative μ = 298 is 0.4960. (b) The power is higher. The alternative μ = 295 is farther away from μ0 = 300 than μ = 298 and so is easier to detect.
6.105 The power of the test against the alternative μ = 298 is 0.9099. This is quite a bit larger than the power of 0.4960 that we found in Exercise 6.103.
6.107 The probability of a Type I error is α = 0.05. The probability of a Type II error at μ = 298 is 1 minus the power of the test at μ = 298, which is 1 − 0.4960 = 0.5040.
6.109 (a) The hypotheses are H0: the patient does not need to see a doctor and Ha: the patient does need to see a doctor. The program can make two types of error: (1) the patient is told to see a doctor when, in fact, the patient does not need to see one. This is a false positive. (2) The patient is diagnosed as not needing to see a doctor when, in fact, the patient does need to see one. This is a false negative. (b) The error probability one chooses to control usually depends on which error is considered more serious. In most cases, a false negative is considered more serious. If you have an illness and it is not detected, the consequences can be serious.
6.111
Age   Months employed, x̄   95% CI, x̄ ± 0.32
18          2.9            (2.58, 3.22)
19          4.2            (3.88, 4.52)
20          5.0            (4.68, 5.32)
21          5.3            (4.98, 5.62)
22          6.4            (6.08, 6.72)
23          7.4            (7.08, 7.72)
24          8.5            (8.18, 8.82)
25          8.9            (8.58, 9.22)
26          9.3            (8.98, 9.62)

There is a strong, positive, linear relationship between age and months employed. The widths of all the confidence intervals are exactly the same.
6.113 Answers will vary.
6.115 Industries with SHRUSED values above the median were found to have cash flow elasticities less than those for industries with lower SHRUSED values. The probability is less than 0.05 that we would observe a difference as large as or larger than this by chance if, in fact, on average the cash flow elasticities for the two types of industries are the same. This probability is quite low, and so it is unlikely that the observed difference is merely accidental.
6.117 (a) The stemplot shows that the data are roughly symmetric. (b) (26.06 μg/l, 34.74 μg/l). (c) Let μ denote the mean DMS odor threshold among all beginning oenology students. We test the hypotheses H0: μ = 25, Ha: μ > 25. z = 2.44.


The P-value is 0.0073. This is strong evidence that the mean odor threshold for beginning oenology students is higher than the published threshold of 25 μg/l.
6.119 $782.82 ± $6.02 = ($776.80, $788.84).
6.121 (a) The authors probably want to draw conclusions about the population of all adult Americans. The population to which their conclusions most clearly apply is all people listed in the Indianapolis telephone directory. (b)
Store type            95% confidence interval
Food stores           18.67 ± 3.45
Mass merchandisers    32.38 ± 4.61
Pharmacies            48.60 ± 4.92
(c) None of these intervals overlap, which suggests that the observed differences are likely to be real.
6.123 (a) Increasing the size n of a sample will decrease the width of a level C confidence interval. (b) Increasing the size n of a sample will decrease the P-value. (c) Increasing the sample size n will increase the power.
6.125 No. The null hypothesis is either true or false. Statistical significance at the 0.05 level means that if the null hypothesis is true, the probability that we will obtain data that lead us to incorrectly reject H0 is 0.05.
6.127 (a) Assume that in the population of all mothers with young children, those who would choose to attend the training program and those who would not choose to attend actually remain on welfare at the same rate. The probability is less than 0.01 that we would observe a difference as extreme as or more extreme than that actually observed. (b) 95% confidence means that the method used to construct the interval 21% ± 4% will produce an interval that contains the true difference 95% of the time. Because the method is reliable 95% of the time, we say that we are 95% confident that this particular interval is accurate. (c) The study is not good evidence that requiring job training of all welfare mothers will greatly reduce the percent who remain on welfare for several years. Mothers chose to participate in the training program; they were not assigned using randomization. Thus, the effect of the program is confounded with the reasons why some women chose to participate and some didn't.
6.129 Only in 5 cases did we reject H0. Thus, the proportion of times was 0.05. This is consistent with the meaning of a 0.05 significance level. In 100 trials where H0 was true, we would expect the proportion of times we reject to be about 0.05.
6.131 (a) We used statistical software to conduct the simulations. (b) The sample size is large, and the central limit theorem suggests that it is probably reasonable to assume that the sampling distribution of the sample means is approximately Normal. (c) m = 10.11. (d) The calculations agree with our simulation result up to roundoff error. (e) 15 of the 25, or 60%, of the simulations contained μ = 240. If we repeated the simulations, we would not expect to get exactly the same number of intervals to contain μ = 240, because each simulation is random, and so results will vary from one simulation to the next. The probability that any given interval contains μ = 240 is 0.50, and so in a very large number of simulations we would expect about 50% to contain μ = 240.
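The coverage idea in 6.131 can be simulated without special software. A sketch: μ = 240 and the 50% confidence level are from the text, while σ, n, and the number of repetitions are illustrative assumptions of ours. Over many repetitions the observed coverage should hover near 0.50:

```python
import math
import random

random.seed(1)                    # reproducible illustration
mu = 240.0                        # from the text
sigma, n, reps = 20.0, 25, 2000   # assumed for illustration
z_star = 0.674                    # critical value for a 50% interval

hits = 0
for _ in range(reps):
    # draw one sample and form a 50% confidence interval for mu
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    margin = z_star * sigma / math.sqrt(n)
    if xbar - margin <= mu <= xbar + margin:
        hits += 1

coverage = hits / reps            # near 0.50 in the long run
```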


CHAPTER 7
7.1 (a) SEx̄ = 24.54. (b) There are 11 degrees of freedom.
7.3 The 95% confidence interval for the mean monthly rent is ($481.19, $604.81).
7.5 The hypotheses are H0: μ = 500 and Ha: μ > 500. t = 1.574 with 9 df, and using software, we get P-value = 0.075. We conclude that there is not much evidence that the mean rent of all advertised apartments exceeds $500.
7.7 (a) Two-sided, since we are interested in whether the average sales are different from last month (no direction of the difference is specified). The hypotheses are H0: μ = 0% and Ha: μ ≠ 0%. (b) t = 2.47 with 49 df, and using software, 0.005 < P-value < 0.01. There is strong evidence that the average sales have increased. (c) The mean is 4.8% and the standard deviation is 15%, so there are certainly stores with a percent change that is negative.
7.9 (a) The t statistic must exceed 2.5768. (b) This result can be found in Table D in the row corresponding to z*. This illustrates that for large degrees of freedom, there is little difference between critical values for the t and the Normal distributions.
7.11 A 95% confidence interval for the mean loss in vitamin C is (50.11, 59.89).
7.13 (a) The stemplot using split stems shows the data are clearly skewed to the right and have several high outliers. (b) The 95% confidence interval is (20.00, 27.12), which agrees quite well with the bootstrap intervals. The lesson is that for large sample sizes the t procedures are very robust.
7.15 The power of the t test against the alternative μ = 2.2 is 0.9394.
7.17 (a) t = 2.262 using degrees of freedom n − 1 = 9. (b) t = 2.064 using degrees of freedom n − 1 = 24. (c) t = 2.797 using degrees of freedom n − 1 = 24. (d) As sample size increases, the t decreases for the same confidence level. As the confidence level increases, t increases for the same sample size.
7.19 (a) Degrees of freedom = 19. (b) 2.093 and 2.205. (c) 0.025 and 0.02. (d) 0.02 < P-value < 0.025. (e) Significant at 5%, not significant at 1%. (f) Excel gives 0.02145.
7.21 (a) Degrees of freedom = 119, but use df = 100 in Table D to be conservative. (b) P-value < 0.0005. (c) Excel gives 9.7287 × 10⁻⁶ using 119 degrees of freedom.
7.23 (17.022, 19.938).
7.25 (a) 0.9916. (b) (59.811, 63.914). (c) No, this is the confidence interval for the population average weight, not for individuals.
7.27 (a) There is no obvious skewness and there are no outliers present. (b) A 95% confidence interval for the mean annual earnings of hourly-paid white female workers at this bank is ($22,719.87, $27,350.63).
7.29 The hypotheses are H0: μ = 20,000 and Ha: μ > 20,000. t = 4.552 with 19 df, and using software, P-value ≈ 0. We conclude there is strong evidence that the mean annual earnings exceed $20,000.
7.31 (a) The data are skewed right with 3 points that are particularly high, clearly non-Normal. (b) x̄ = 34.156, s = 21.36917, SEx̄ = 3.37876. (c) (27.3218, 40.9902).
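The one-sample t calculations that recur through 7.1–7.31 follow one pattern: SE = s/√n and x̄ ± t·SE. A sketch of ours; s = 85 and n = 12 are assumed values that happen to be consistent with SEx̄ = 24.54 and 11 df in 7.1, x̄ = 500 is purely illustrative, and 2.201 is the tabled 95% t critical value for 11 df:

```python
import math

def std_error(s: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def t_interval(xbar: float, s: float, n: int, t_star: float):
    """Level-C interval xbar +/- t* s/sqrt(n); t* comes from a t table."""
    m = t_star * std_error(s, n)
    return (xbar - m, xbar + m)

se = std_error(85.0, 12)                     # about 24.54
lo, hi = t_interval(500.0, 85.0, 12, 2.201)  # t(11 df, 95%) = 2.201
```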


7.33 (a) 6 streams are classified very poor or poor out of 49 total, so the sample proportion is 0.122. (b) No; sample proportions use Z, not t. Sample proportions are for categorical data, sample means are for quantitative data. We have converted quantitative data into categorical data by sorting the streams into very poor, poor, and other.
7.35 A 95% confidence interval for the mean price received by farmers for corn sold in October is ($1.76, $2.45).
7.37 A 95% confidence interval for the mean score on the question "Feeling welcomed at Purdue" is (3.734, 3.866).
7.39 The hypotheses are H0: μ = 0 and Ha: μ > 0, where μ represents the average improvement in scores over six months for preschool children. t = 6.90 with 33 df and P-value < 0.0005. There is extremely strong evidence that the scores improved over six months, which is in agreement with the confidence interval (2.55, 6.88) obtained in Exercise 7.34.
7.41 A 95% confidence interval for the mean amount of D-glucose in cockroach hindguts under these conditions is (18.69 micrograms, 70.19 micrograms).
7.43 (a) The hypotheses are H0: μ = 0 and Ha: μ < 0, where μ represents the average difference in vitamin C between the measurement five months later in Haiti and the factory measurement. (b) t = −4.95 with 26 df, and using software, P-value < 0.0005. There is very strong evidence that vitamin C is destroyed as a result of storage and shipment. (c) The 95% confidence intervals are (40.96, 44.75) for the mean at the factory, (36.55, 38.48) for the mean after five months, and (−7.54, −3.12) for the mean change.
7.45 The 90% confidence interval for the mean time advantage is (−21.17, −5.47). The ratio of the mean time for right-hand threads as a percent of the mean time for left-hand threads is x̄R/x̄L = 88.7%, so those using the right-hand threads complete the task in about 90% of the time it takes those using the left-hand threads.
7.47 Taking the differences (Variety A − Variety B) to determine if there is evidence that Variety A has the higher yield corresponds to the hypotheses H0: μ = 0 and Ha: μ > 0 for the mean of the differences. t = 1.29 with 9 df, and 0.10 < P-value < 0.15. Do not reject the null hypothesis. There is not enough evidence to support that Variety A tomatoes have a higher mean yield than Variety B tomatoes.
7.49 (a) Single sample. (b) The best answer is 2 independent samples. Even though we are looking at changes of opinion over time from the same basic group, problems with missing values for one survey due to nonresponse or customers leaving the pool would make the paired sample difficult. However, if the exact same sample was used both years with a stable population of customers, matched pairs would also be an acceptable answer.
7.51 (a) t = 2.403. (b) x̄ ≥ 34.745. (c) The power is P(Z ≥ −4.513) = 1.000. There is no need for increasing the sample size beyond 50.
7.53 (a) H0: population median = 0 and Ha: population median > 0 for the population of differences (time to complete task with left-hand thread) − (time to complete task with right-hand thread). If p is the probability of completing the task faster with the right-hand thread, the hypotheses would be H0: p = 1/2 and Ha: p > 1/2. (b) The number of pairs with a positive difference in our


data is 19, and n = 24 since the zero difference is dropped. The P-value is P(X ≥ 19) = 0.0021 using the Normal approximation or 0.0033 using the binomial distribution.
7.55 (a) Two-sided significance test, since you just want to know if there is evidence of a difference in the two designs. (b) 29 df using the conservative approximation. (c) The P-value lies between 2 × 0.001 = 0.002 and 2 × 0.0025 = 0.005. There is strong evidence of a difference in daily sales for the two designs.
7.57 This is a matched pairs experiment, with the pairs being the two measurements on each day of the week.
7.59 Randomization makes the two groups similar except for the treatment and is the best way to ensure that no bias is introduced.
7.61 (a) All report both means. (b) Excel reports the two variances; SPSS and Minitab report both the standard deviation and the standard error of the mean for each group, while SAS reports only the standard error of the mean for each group. (c) Excel is doing the pooled two-sample t procedure; SPSS and SAS provide the t with Satterthwaite degrees of freedom as well as the pooled t, and Minitab provides the t with Satterthwaite degrees of freedom. All report degrees of freedom and P-values to various accuracies. (d) With the exception of Excel, all report the confidence interval for the mean difference. (e) Excel has the least information, while SAS seems to provide the most information because it includes more information about the two groups individually.
7.63 The pooled t = 17.13 with 133 df. Results are almost identical to those of Example 7.13.
7.65 First write the pooled variance as the average of the individual variances and then use this expression in the pooled t to see that it gives the same result for equal sample sizes.
7.67 (a) The data are probably not exactly Normally distributed because there are only 5 discrete answer choices. (b) Yes, the large sample sizes would compensate for uneven distributions. The two-sample comparison of means t test is fairly robust. (c) H0: μI = μC and Ha: μI > μC. (A ≠ would also be appropriate in the alternative hypothesis, but be sure to double the P-value when checking your answer with this one.) (d) t = 3.57, P-value < 0.0005. Reject the null hypothesis. There is strong evidence the average self-efficacy score for the intervention group is significantly higher than the average self-efficacy score for the control group. (e) (0.191, 0.669).
7.69 (b) (4.374, 5.426). (c) H0: μD/B = μC and Ha: μD/B > μC. t = 18.49, and the P-value is close to 0. Reject the null hypothesis. There is very strong evidence the average exposure to respirable dust is significantly higher for the drill and blast workers than it is for the concrete workers. (d) The sample sizes are so large that a little skewness will not affect the results of the two-sample comparison of means test.
7.71 (a) No. If we use the 68–95–99.7% rule, then 68% of the younger kids would consume between −2.5 and 18.9 oz. of sweetened drinks every day. There is the same problem with negative consumption for the older kids (starting with 95%) as well. (b) H0: μolder = μyounger and Ha: μolder ≠ μyounger. t = 1.44, 0.2 < P-value < 0.3. Do not reject the null hypothesis. There is not enough evidence to say that there is a significant difference between the average consumption of sugary drinks


between the older and younger groups of children. (c) (−5.85, 18.45). (d) The sample sizes are very uneven and fairly small. The sample data are not Normally distributed either. The t procedures are not particularly appropriate here. (e) How were these children selected to participate? Were they chosen because they consume such large quantities of sweetened drinks?
7.73 (a) Reject the null hypothesis because 0 is not inside the confidence interval. H0: μA = μB can also be written as H0: μA − μB = 0. (b) As sample size increases, the margin of error decreases.
7.75 The 95% CI = (15.54, 24.46) using 40 degrees of freedom as the conservative estimate from the t table. A 99% CI will be wider than a 95% CI because the t value increases as the confidence level increases.
7.77 The 95% confidence interval for the cost of the extra bedroom is (0.662, 160.662) using software.
7.79 (a) The hypotheses are H0: μ0 = μ3 and Ha: μ0 > μ3, where μ0 corresponds to immediately after baking and μ3 corresponds to three days after baking. t = 22.16, df = 1.47, and P-value = 0.014, which indicates there is strong evidence that the vitamin C content has decreased after three days. (b) The 90% confidence interval is (19.2, 34.58).
7.81 (a) The hypotheses are H0: μ0 = μ3 and Ha: μ0 > μ3, where μ0 corresponds to immediately after baking and μ3 corresponds to three days after baking. t = 0.32, and P-value > 0.5, which indicates no evidence of a loss of vitamin E. (b) The 90% confidence interval is (−11.29, 10.2).
7.83 (a) The P-value is 1 − 0.07/2 = 0.965. Do not reject the null hypothesis. Our sample mean difference (x̄1 − x̄2) is far below 0, so it is very unlikely that the first mean is greater than the second mean. (b) The P-value is 0.07/2 = 0.035. Reject the null hypothesis. Our sample mean difference is well above 0, so it is reasonable to conclude that the first mean is greater than the second mean.
7.85 Using 33 df, the 95% confidence interval is (1.932, 4.532), so the improvement is greater for those who took piano lessons.
7.87 (a) The hypotheses are H0: μLow = μHigh and Ha: μLow ≠ μHigh. t = 8.23, df = 21.77, and the P-value is approximately zero. You reject at both the 1% and 5% levels of significance. (b) The individuals in the study are not random samples from low-fitness and high-fitness groups of middle-aged men, so the possibility of bias is definitely present. (c) These are observational data.
7.89 This would now be a matched pairs design, since we have before and after measurements on the same men. The analysis would proceed by first taking the change in score and then computing the mean and standard deviation of the changes. These values would be used in the t statistic.
7.91 (a) Using 50 df to be conservative and using Table D, the 95% confidence interval is (−0.9, 6.9). (b) The negative values correspond to a decrease in sales. If the true difference in means was −1, which is included in the confidence interval, then the percent sales would have decreased.
7.93 (a) The Normal quantile plots are fairly linear, and the sum of the sample sizes is close to 40, so in spite of the skewness appearing in the histograms, the t procedures can be used. (b) The value of the t statistic is 2.223 with a P-value of


0.0165. This gives fairly strong evidence that the mean SSHA score is lower for men than for women at this college. (c) The confidence interval for the difference is (4.907, 35.981).
7.95 The approximate degrees of freedom are 37.859.
7.97 (a) t = 12.71. (b) t = 4.303. (c) The increase in degrees of freedom means that the value of the t statistic needs to be less extreme for the pooled t statistic in order to find a statistically significant difference.
7.99 (a) The upper 5% critical value is F = 2.20. (b) Significant at the 10% level but not at the 5% level.
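The F comparisons in 7.99–7.105 use the ratio of sample variances, larger variance in the numerator, compared against a tabled critical value (2.20 in 7.99). A sketch of ours with illustrative standard deviations:

```python
def f_statistic(s1: float, s2: float) -> float:
    """F ratio of sample variances, larger variance in the numerator."""
    v1, v2 = s1 * s1, s2 * s2
    return max(v1, v2) / min(v1, v2)

f = f_statistic(12.0, 9.0)   # illustrative: 144/81, about 1.78
significant = f > 2.20       # compare with the tabled upper critical value
```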

7.101 The hypotheses are H0: σ1 = σ2 and Ha: σ1 ≠ σ2. F = 1.59, and using an F(33, 43) distribution and statistical software, P-value = 2 × 0.0764 = 0.1528, so there is little evidence of a difference in variances between the two groups.
7.103 (a) The value in the table is 647.79. (b) F = 3.94, and using an F(1, 1) distribution and statistical software, P-value = 2 × 0.2981 = 0.5962, so there is little evidence of a difference in variances between the two groups.
7.105 (a) The hypotheses are H0: σM = σW and Ha: σM > σW. (b) F = 1.74. (c) Using an F(19, 17) distribution, P-value > 0.10, so there is little evidence that the males have larger variability in their scores.
7.107 The power for n = 25 is 0.4855; n = 50 is 0.7411; n = 75 is 0.8787; n = 100 is 0.9460; n = 125 is 0.9769. These powers were calculated using SAS. The powers obtained using the Normal approximation are quite close to these values.
7.109 (a) Using SAS, POWER = 1 − PROBT(t, DF, δ) = 0.3390, where δ denotes the noncentrality parameter. (b) Using SAS, POWER = 1 − PROBT(t, DF, δ) = 0.7765, which is greater than the power found in (a).
7.111 (Based on nH = 170 high-performing and nL = 224 low-performing restaurants; SE is the standard error of the difference.)
Perceived quality              x̄H − x̄L   SE      t       P-value                  Conclusion
Food served in promised time   0.45      0.139   3.24    between 0.001 and 0.002  Reject H0
Quickly corrects mistakes      0.16      0.125   1.28    between 0.2 and 0.3      Do not reject H0
Well-dressed staff             0.39      0.136   2.87    0.005                    Reject H0
Attractive menu                0.21      0.143   1.47    between 0.1 and 0.2      Do not reject H0
Serving accurately             0.37      0.123   3.01    between 0.002 and 0.005  Reject H0
Well-trained personnel         0.06      0.125   0.48    > 0.5                    Do not reject H0
Clean dining area              0.08      0.127   0.630   > 0.5                    Do not reject H0
Employees adjust to needs      0.14      0.23    0.609   > 0.5                    Do not reject H0
Employees know menu            0.29      0.119   2.44    between 0.01 and 0.02    Reject H0
Convenient hours               0.24      0.129   1.86    between 0.05 and 0.1     Do not reject H0
There is a significant difference between the high- and low-performing restaurants with regard to food served in promised time, well-dressed staff, serving ordered


food accurately, and employees know menu. There is not a significant difference in the other qualities. We need to assume fairly Normally distributed data without outliers and that a simple random sample was taken from each group.
7.113 (a) The study was done in South Korea; the results may not apply to other countries. Only selected QSRs were studied; the results may not apply to other QSRs. The response rate is low (394/950); we would trust the results more if the rate were higher. The fact that no differences were found when the demographics of this study were compared with the demographics of similar studies suggests that we do not have a serious problem with bias based on these characteristics. (b) Answers will vary.
7.115 H0: μ = 4.88 and Ha: μ > 4.88. t = 21.98, and the P-value is close to 0, so we can reject the null hypothesis. There is strong evidence that hotel managers have a significantly higher average masculinity score than the general male population.
7.117 (a) No outliers; slightly skewed right but fairly symmetric. (b) (12.9998, 13.3077).
7.119 x̄C = 48.9513, sC = 0.25377, x̄R = 41.6488, sR = 0.39219. A side-by-side boxplot shows cotton much higher than ramie. H0: μC = μR and Ha: μC > μR. t = 46.162, and the P-value is very close to 0, so reject the null hypothesis. There is strong evidence that cotton has a significantly higher mean lightness than ramie.
7.121 (1.0765, 7.6035).
7.123 2.555 ± 0.139.
7.125 The 95% confidence interval for the percentage of lower-priced products at the alternate supplier is (64.55, 92.09). This suggests that more than half of the products at the alternate supplier are priced lower than at the original supplier.
7.127 (a) This study used a matched pairs design; therefore they used a single-sample t test. (b) The average weight loss in this program was significantly different from zero, and we can conclude that the program was effective. (c) The P-value is approximately equal to zero.
7.129 (a) H0: μ1 = μ2 and Ha: μ1 < μ2. t = −8.954, P-value ≈ 0. Conclude that the workers were faster than the students. (b) The t procedures are robust for large sample sizes even when the distributions are slightly skewed. (c) The middle 95% of scores would be from 29.66 to 44.98. (d) The scores for the first minute are clearly much lower than the scores for the 15th minute.
7.131 (a) (0.207, 0.573). (b) (−0.312, 0.012).
7.133 (a) No, the t test is robust to skewness. (b) Yes, the F test is not robust to skewness.
7.135 The F test result shows that there is no reason to believe the variances of the two sexes are unequal. Results of the t tests show that there is a significant difference in the average SATM scores between the two sexes. The P-values of both t tests are approximately zero.
7.137 As the degrees of freedom increase for small values of n, the value of t rapidly approaches the z score. As sample sizes get larger, the t values are close to z = 1.96 but never become smaller than 1.96.
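The t statistics in the Exercise 7.111 table above can be recovered as (x̄H − x̄L)/SE. The differences and standard errors below are the table's own values for three of the rows; the code is a sketch of ours:

```python
# Differences (xbar_H - xbar_L) and standard errors from the 7.111 table
rows = [
    ("Food served in promised time", 0.45, 0.139),
    ("Well-dressed staff", 0.39, 0.136),
    ("Employees know menu", 0.29, 0.119),
]

# Two-sample t statistic for each row: difference / standard error
t_stats = {name: round(diff / se, 2) for name, diff, se in rows}
```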


CHAPTER 8
8.1 (0.21, 0.47).
8.3 (0.22, 0.48). This plus four interval is shifted slightly to the right of the original interval of (0.21, 0.47).
8.5 Smaller sample sizes will make a bigger difference when the sample proportion is close to 0. An example would be 1/100.
8.7 (a) (0.23, 0.41). If we know how many answered "Yes," we automatically know how many answered "No." (b) The test statistic is z = 1.62 and P-value = 0.1052. Because the proportion who answer "Yes" is 1 minus the proportion who answer "No," testing whether the proportion who answer "Yes" is equal to 0.75 is equivalent to testing whether the proportion who answer "No" is 0.25.
8.9 We need to sample at least 601 people if p* = 0.5 is used.
8.11 (a) The interval should be p̂ ± Z × SEp̂ (the Z was forgotten). (b) Hypotheses need population parameters, not sample statistics; H0: p = 0.3 is appropriate.
8.13 (a) 0.634 ± 0.0126. If we were to repeat our sampling many times and compute a confidence interval from each sample, over the long run approximately 95% of these intervals would contain the population proportion. (b) No; a high nonresponse rate would skew the results. Those most likely not to reply are the cheaters.
8.15 (a) m = 0.00132. (b) Students most likely not to respond are the cheaters, and a 15% response rate is very low. What about the schools that can't afford the fee? Just because the sample size is large doesn't mean that good data were collected. The response rate and other issues may be larger sources of error here than the pure statistical variation quantified by the margin of error.
8.17 (a) (0.300, 0.354). (b) (0.301, 0.355). The methods have the same margin of error, 0.027, but the plus four method shifts the interval slightly higher. (c) The nonresponse rate is only 3.64%, which is small, so we can trust these results. (d) Yes; the person delivering the sermon probably thinks the sermon is shorter than it actually is, and the congregation would probably think the sermons are longer than they actually are.
8.19 (0.324, 0.376).
8.21 The 99% confidence interval is wider because the Z is bigger: (0.316, 0.384).
8.23 (0.635, 0.745).
8.25 (0.218, 0.251).
8.27 (a) (0.359, 0.401). (b) Teens 16–19 years old may have jobs, and 18 and 19 year olds may be living on their own. It would make more sense to group the teens as 12–15 year olds, 16–17 year olds, and 18–19 year olds.
8.29 No. A person could have both lied about having a degree (for example, having an advanced degree such as a master's or Ph.D.) and about their major (for example, their undergraduate major if they lied about having a master's degree but did have a bachelor's degree). Because lying about having a degree and lying about major


are not necessarily mutually exclusive events, we cannot automatically conclude that a total of 24 = 15 + 9 applicants lied about one or the other.
8.31 (0.768, 0.912).
8.33 (0.642, 0.768).
8.35 (a) p̂ = 0.3168, n = 1711, and X = 542 (the number who tested positive) are the basic summary statistics. (b) (0.295, 0.339). (c) This is an observational study, not a designed experiment. Observational studies generally do not provide a good basis for concluding causality, only association. Also, this is not a random sample from the population of all bicyclists who were fatally injured in a bicycle accident.
8.37 (a) Not safe. np0 = 4 < 10 and n(1 − p0) = 6 < 10. (b) Safe. np0 = 60 and n(1 − p0) = 40; both are greater than 10. (c) Safe. (d) Safe. np0 = 150 and n(1 − p0) = 350; both are greater than 10.
8.39 (a) H0: p = 0.64, Ha: p ≠ 0.64. (b) The test statistic is z = 0.93. The P-value = 0.3524. This is not strong evidence against the null hypothesis that the sample represents the state in regard to rural versus urban residence in terms of the proportion of urban residents. (c) These results are consistent (same P-value) with the previous exercise. We conclude there is not strong evidence against the hypothesis that the sample represents the state in regard to rural versus urban residence in terms either of the proportion of rural residents or of the proportion of urban residents.
8.41 (0.106, 0.294).
8.43 (a) The test statistic is z = 1.34. The P-value = 0.1802, and since this is larger than 0.05, we would not reject the null hypothesis that the probability that Kerrich's coin comes up heads is 0.5. (b) (0.4969, 0.5165).
8.45 n = 450.18. Round up to get n = 451.
8.47 We round up to get n = 201.
8.49 The margin of error for the 90% confidence interval is 0.0691.
p:  0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9
m:  0.0537  0.0716  0.0820  0.0877  0.0895  0.0877  0.0820  0.0716  0.0537
8.51 (a) For p̂1: mean μp̂1 = p1 and standard deviation σp̂1 = √(p1(1 − p1)/n). For p̂2: mean μp̂2 = p2 and standard deviation σp̂2 = √(p2(1 − p2)/n). (b) D = p̂1 − p̂2, so μD = μp̂1 − μp̂2 = p1 − p2. (c) σ²D = σ²p̂1 + σ²p̂2 = p1(1 − p1)/n + p2(1 − p2)/n.
8.53 (0.0301, 0.0788).
8.55 Plus four interval: (0.125, 0.267); Z interval: (0.112, 0.272).
8.57 H0: p1 = p2, Ha: p1 > p2. The test statistic is z = 4.46. The P-value is approximately 0. This is strong evidence that a higher proportion of complainers than noncomplainers leave voluntarily.
8.59 (a) p̂1 − p̂2 = −0.1 and σp̂1−p̂2 = 0.0947. (c) (−0.289, 0.089).
8.61 (a) H0: pImpulse = pPlanned and Ha: pImpulse ≠ pPlanned. Z = 1.02, P-value = 0.3078. Do not reject the null hypothesis. There is not enough evidence to say


the difference in credit card use between impulse and planned purchases is statistically significant.
8.63 H0: pS = pH and Ha: pS ≠ pH. Z = 1.14, P-value = 0.2542. Do not reject the null hypothesis. There is not enough evidence to say that the detergent preferences are significantly different for people with hard water and people with soft water.
8.65 (a) Tippecanoe County: p̂1 = 0.511. Benton County: p̂2 = 0.408. (b) SED = 0.029. (c) (0.0273, 0.1787). The interval does not contain 0, and this is strong evidence that the opinions differed in the two counties.
8.67 (a) H0: p1 = p2, Ha: p1 ≠ p2. (b) z = 1.23, P-value = 0.2186. This is not strong evidence that there is a difference in preference for natural trees versus artificial trees between urban and rural households. (c) (−0.0209, 0.1389).
8.69 (a) z = 5.40, P-value ≈ 0. This is strong evidence that there is a difference in the proportions of the two types of shields removed. (b) (0.2125, 0.4115). Zero is well outside this interval, so the bolt-on shields appear to be removed more often than the flip-up shields. We would recommend that flip-up shields be used on new tractors.
8.71 (a) p̂1 = 0.141 and p̂2 = 0.339. A 95% confidence interval is (−0.253, −0.143). (b) The term p̂1(1 − p̂1)/n1 contributes more to the standard error of the difference because n1 = 191 is so much smaller than n2 = 1520.
8.73 We test the hypotheses H0: p1 = p2, Ha: p1 ≠ p2. z = 5.5, P-value ≈ 0. This is strong evidence that there is a difference in the proportions of females and males who were in a fatal bicycle accident, were tested for alcohol, and tested positive.
8.75 H0: p1 = p2, Ha: p1 ≠ p2. Z = 0.34, P-value = 0.7333. There is not enough evidence to say that the applicants are lying in different proportions than they did 6 months ago.
8.77 Let p1 denote the proportion of all Danish males born in Copenhagen with a normal male chromosome who have had criminal records and p2 the proportion of all Danish males born in Copenhagen with an abnormal male chromosome who have had criminal records. We test the hypotheses H0: p1 = p2, Ha: p1 < p2. z = −3.51, P-value < 0.0003. This is strong evidence that the proportion of males born in Copenhagen with an abnormal male chromosome who have criminal records is larger than that of males born in Copenhagen with a normal male chromosome.
8.79 (a) (0.038, 0.482). (b) z = −1.91, P-value = 0.0281. We could conclude that there is reasonably strong evidence that the proportion of cockroaches that will die on glass is less than the proportion that will die on plasterboard.
8.81 H0: p1 = p2, Ha: p1 ≠ p2. Z = 14.8, and the P-value is approximately 0, so reject the null hypothesis. There is evidence of a significant difference between the proportion of male athletes who admit to cheating and the proportion of female athletes who admit to cheating. The 95% confidence interval (male − female) is (0.1963, 0.2377). Someone who gambles will be less likely to respond to the survey. Do you think men or women are more likely to report that they do not gamble when, in fact, they do gamble?
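The two-proportion tests in 8.57–8.81 all use the pooled z statistic. A self-contained sketch of ours; the counts in the example call are invented, not taken from any exercise:

```python
import math

def two_prop_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled z statistic for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)   # pooled estimate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

z = two_prop_z(60, 100, 45, 100)     # invented counts, about 2.12
```

Equal sample proportions give z = 0, and swapping the two samples flips the sign, as expected for a two-sided statistic.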


8.83

(a) and (b) Category Download less Peer-to-peer E-mail and IM Web sites iTunes Overall use of new services Overall use paid services p (in %) 38 33.3 24 20 17 7 3 n 247 247 247 247 247 1371 1371 m (in %) 3.09 3.00 2.72 2.55 2.39 0.69 0.46

(c) Argument for (A): Readers should understand that the population percent is not guaranteed to be at the sample percent; there is variability involved in taking a sample. Argument for (B): Listing each individual margin of error does seem excessive, so you could summarize by saying that the margin of error was no greater than 3.09% for each of these questions. You could also separate out the last 2 questions by saying their margin of error was less than 1%. 8.85 (a) H0 : p1 = p2 , Ha : p1 = p2 , Z = 6.97 with a P-value near 0, so reject the null hypothesis. There is strong evidence that the users and nonusers differ signicantly in the proportion of college graduates. (b) 95% CI (users nonusers) = (0.1141, 0.2019). Users (total number) Analysis of education Analysis of income 1132 871 Nonusers (total number) 852 677

8.87

For users, the proportion of "rather not say" is 0.231. For nonusers, the proportion of "rather not say" is 0.205. H0: p1 = p2, Ha: p1 ≠ p2. Z = 1.38, P-value = 0.1676, so do not reject the null hypothesis. There is not much evidence of a significant difference in the proportion of "rather not say" answers between users and nonusers. The 95% CI (users - nonusers) is (-0.0106, 0.0626). Since the nonresponse rate is not significantly different for the two groups, it is not a serious limitation for this study.

8.89 (a) p̂(repeat) = 0.783, p̂(no repeat) = 0.517, and the 95% CI (repeat - no repeat) is (0.102, 0.430). (b) H0: p1 = p2, Ha: p1 ≠ p2. Z = 3.05, P-value = 0.0022, so reject the null hypothesis. There is strong evidence of a significant difference in the proportion of tips received between servers who repeat the customer's order and those who do not repeat the order. (c) Cultural differences, personalities of the servers, and gender differences could all play a role in interpreting these results. Did one server only do the repeating while the other server did no repeating, or did they switch off? (d) Answers will vary.

8.91 Let p1 be the proportion of all die-hard fans who attend a Cubs game at least once a month and p2 the proportion of less loyal fans who attend a Cubs game at least once a month. We test H0: p1 = p2, Ha: p1 > p2. z = 9.04, P-value = approximately 0. This is strong evidence that the proportion of die-hard Cubs fans who attend a game at least once a month is larger than the proportion of less loyal


Cubs fans who attend a game at least once a month. A 95% confidence interval for the difference in the two proportions is (0.3755, 0.5645), so the magnitude of the difference is large.

8.93 The number of people in the survey who said they had at least one credit card is n = 0.76(1025) = 779 (after rounding off). The margin of error is m = 0.0275.

8.95 The 95% confidence interval is (0.5524, 0.6736). This does not include 0.485, the proportion of U.S. adults who are male, so we would conclude that the proportion of heavy lottery players who are male is different from the proportion of U.S. adults who are male.

8.97 Let p be the proportion of products that will fail to conform to specifications in the modified process. We test the hypotheses H0: p = 0.11, Ha: p < 0.11. z = -3.16, P-value = 0.0008. This is strong evidence that the proportion of nonconforming items is less than 0.11, the former value. This conclusion is justified provided the trial run can be considered a random sample of all (future) runs from the modified process and that items produced by the process are independently conforming or nonconforming.

8.99 Let p1 denote the proportion of male college students who engage in frequent binge drinking and p2 the proportion of female college students who engage in frequent binge drinking. We test the hypotheses H0: p1 = p2, Ha: p1 ≠ p2. z = 9.5, P-value = approximately 0. This is strong evidence that there is a difference in the proportions of male and female college students who engage in frequent binge drinking.

8.101 (a) Let p1 denote the proportion of men with low blood pressure who die from cardiovascular disease and p2 the proportion of men with high blood pressure who die from cardiovascular disease. A 95% confidence interval for the difference in the two proportions is (-0.01411, -0.00319). This is consistent with the results in Example 8.8. (b) H0: p1 = p2, Ha: p1 < p2. z = -3, P-value = 0.0013. We would conclude that there is strong evidence that the proportion of men with low blood pressure who die from cardiovascular disease is less than the proportion of men with high blood pressure who die from cardiovascular disease.

8.103

n   10     25     50     100    150    200    400    500
m   0.438  0.277  0.196  0.139  0.113  0.098  0.069  0.062

As sample size increases, the margin of error decreases.

8.105 Starting with a sample size of 25 for the first sample, it is not possible to guarantee a margin of error of 0.15 or less. Setting m = 1.960 √((0.5)(0.5)/25 + (0.5)(0.5)/n2) equal to 0.15 and solving leads to a negative value for n2, which is not feasible.
8.107 (a) p0 = 0.791. (b) p̂ = 0.390, z = -29.09, P-value = approximately 0. We would conclude that there is strong evidence that the probability that a randomly selected juror is Mexican American is less than the proportion of eligible voters who are Mexican American. (c) z = -29.20, P-value = approximately 0. The z statistic and P-value are almost the same as in part (b).
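The 8.103 table and the infeasibility in 8.105 both follow from the two-sample margin-of-error formula with the worst case p̂1 = p̂2 = 0.5 and z* = 1.96; a quick sketch reproduces both:

```python
def margin(n1, n2, z=1.96, p=0.5):
    """Worst-case 95% margin of error for p1 - p2 (p set to 0.5)."""
    return z * (p * (1 - p) / n1 + p * (1 - p) / n2) ** 0.5

# 8.103: equal sample sizes n1 = n2 = n
for n in [10, 25, 50, 100, 150, 200, 400, 500]:
    print(n, round(margin(n, n), 3))

# 8.105: with n1 fixed at 25, the term 0.25/n2 would have to equal
# (0.15/1.96)^2 - 0.25/25, which is negative, so no n2 works.
rhs = (0.15 / 1.96) ** 2 - 0.25 / 25
print(rhs > 0)  # False
```

The printed margins match the 8.103 table (0.438 for n = 10 down to 0.062 for n = 500).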


CHAPTER 9
9.1 (a) Simplest is two columns, French or not French music playing, and two rows, French wine purchased or other wine purchased. (b) The explanatory variable is the type of music because we think this influences the type of wine purchased, so the columns are type of music.

9.3 87.8% of successful firms and 72.3% of unsuccessful firms offer exclusive territories.

9.5 28 firms lack exclusive territories, and 27.6% of all firms are unsuccessful. If there is no association between success and exclusive territories, then firms with exclusive territories and those lacking exclusive territories should both have 27.6% unsuccessful firms.

9.7 df = 10.

9.9 (a) Minitab calculates χ² = 10.949. z² = (3.31)² = 10.95. (b) (3.291)² = 10.83. (c) "No relation" says that whether or not a person is a label user has no relationship to gender, which is the same as H0: p1 = p2.

9.11 Answers will vary, but one example would be:

        X    Y    Z    Totals
A       10   10   10   30
B       10   10   10   30
C       10   10   10   30
Totals  30   30   30   90

9.13 (a) Two-way table:

                Gender
Admit    Male   Female   Total
Yes      490    400      890
No       310    300      610
Total    800    700      1500

(b) % of males who are admitted: 490/800 = 61.25%; % of females who are admitted: 400/700 = 57.14%. (c) H0: There is no association between gender and admission; Ha: There is an association between gender and admission. χ² = 2.610, P-value = 0.106 from SPSS. Do not reject the null hypothesis. There is not enough evidence to say that there is an association between gender and admission. (d) % of males who are admitted to business school: 400/600 = 66.67%; % of females who are admitted to business school: 200/300 = 66.67%. % of males who are admitted to law school: 90/200 = 45%; % of females who are admitted to law school: 200/400 = 50%. (e) Business: χ² = 0, P-value = 1 from SPSS. Do not reject the null hypothesis. There is not enough evidence to say that there is an association between gender and admission in business school. Law: χ² = 1.335, P-value = 0.248 from SPSS. Do not reject the null hypothesis. There is not enough evidence to say that there is an association between gender and admission in law school. (f) Simpson's paradox: Because the business school has so many more students both admitted and rejected (and 600 men apply, more


than any gender to any program), it changes the overall results when business and law are combined.

9.15 % of Department A's classes that are small: 32/52 = 61.54%. % of Department B's classes that are small: 42/106 = 39.62%. % of Department A's classes that are for third- and fourth-year students: 40/52 = 76.92%. % of Department B's classes that are for third- and fourth-year students: 36/106 = 33.96%. Department A teaches a much larger percentage of small classes, but Department A also teaches more than twice the percentage of third- and fourth-year students as Department B.

9.17 χ² = 2.591, P-value = 0.107 from SPSS. Do not reject the null hypothesis. There is not enough evidence to say that there is an association between model dress and magazine readership age group. The marginal percentages for model dress were 73.6% of the ads with models dressed not sexually and 26.4% with models dressed sexually. The marginal percentages for age group were 33.3% in the young adult category and 66.7% in the mature adult category.

9.19 (a) χ² = 76.675, P-value is close to 0 from SPSS. Reject the null hypothesis. There is very strong evidence that there is an association between collegiate sports division and report of cheating. (b) Answers will vary, but here is one possibility: If we change all the sample sizes to 1000 but keep the percentages the same, χ² = 15.713 and the P-value remains very close to 0. If we change all the "yes" answer counts to 100 but keep the percentages the same, then χ² = 7.749 and the P-value increases to 0.021. (c) The people most likely not to respond are those who gamble, so the results may be biased towards "no" answers. (d) If one member of a team is cheating, it may be more likely that others are cheating too. The teammates also may have had a discussion about how to fill out the form.

9.21 χ² = 12, P-value = 0.001 from SPSS. Reject the null hypothesis. There is strong evidence to say that there is an association between gender and visits to the H. bihai flowers. The 0 cell count does not invalidate the significance test. It is the expected counts that need to be 5 or greater for a 2 × 2 table in order for the chi-square test to be appropriate.

          Visits H. bihai
Gender    Yes   No   Total
Female    20    29   49
Male      0     21   21
Total     20    50   70

9.23

Initial major   Transferred to other
Biology         202
Chemistry       64
Mathematics     38
Physics         33

χ² = 50.527, P-value is close to 0 from SPSS. Reject the null hypothesis. There is strong evidence to say that there is an association between initial major and transferred area.
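The χ² = 2.610 and P-value = 0.106 quoted for the combined admissions table in 9.13(c) can be reproduced by hand. A sketch (the df = 1 tail probability uses the identity P = erfc(√(χ²/2)) so no table lookup is needed):

```python
import math

def chi2_stat(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Admissions table from 9.13(a): rows = admitted yes/no, columns = male/female
chi2 = chi2_stat([[490, 400], [310, 300]])
p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) tail probability
print(round(chi2, 2), round(p_value, 3))  # 2.61 0.106
```

The same function applies to any of the two-way tables in this chapter; only the df = 1 shortcut for the P-value is specific to 2 × 2 tables.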


9.25 χ² = 50.81, df = 2, P-value = 0.000. The older employees (over 40) are almost twice as likely to fall into the lowest performance category but are only 1/3 as likely to fall into the highest category.

9.27 χ² = 3.277, df = 4, P-value = 0.513. The data show no evidence that the response rates vary by industry.

9.29 df = 4. Statistical software gives P-value = 0.4121. There is no evidence of a difference in the income distribution of the customers at the two stores.

9.31 (a) A phone call has a 68.2% response rate, a letter has a 43.7% response rate, and no intervention has a 20.6% response rate. (b) H0: There is no relationship between intervention and response rate. Ha: There is a relationship. (c) χ² = 163.413, df = 2, P-value = 0.000. Intervention seems to increase the response rate, with a phone call being more effective than a letter.

9.33 Use the information on the nonresponse rate, and take a larger sample size than necessary to make sure that you have enough observations with nonresponse accounted for.

9.35 z = 6.1977 and χ² = 38.411. It is easily verified that z² = χ².

9.37 χ² = 19.683, df = 1, P-value = 0.000. There is strong evidence of a relationship between winning or losing this year and winning or losing next year. In Exercise 9.20 good performance continued, while for the data in this exercise the opposite is true.

9.39 (a) Combined, χ² = 13.572 and P < 0.0005, so the difference in percent on time for the two airlines is highly significant, with America West being the winner. (b) Los Angeles: χ² = 3.241 and P = 0.072. Phoenix: χ² = 2.346 and P = 0.126. San Diego: χ² = 4.845 and P = 0.028. San Francisco: χ² = 21.223 and P < 0.001. Seattle: χ² = 14.903 and P < 0.001. (c) Most of the effects illustrating the paradox are statistically significant.

9.41 χ² = 43.487, df = 12, P-value = 0.000. This is highly significant, indicating a relationship between the PEOPLE score and field of study. Among the major differences, note that science has an unusually large percent of low-scoring students relative to other fields, while liberal arts and education have an unusually large percent of high-scoring students relative to the other fields.

9.43 The graph shows an increasing trend, yet χ² = 9.969 with 13 degrees of freedom (P-value = 0.696). Although there appears to be a strong increasing trend, the changes in the percents are quite small (from 60% to about 66%). Women represented a very small percent in 1970, but that percent increased quite rapidly until the mid-1980s, when it reached about 60%. The percent continued to increase in the late 1980s but much more slowly, with some leveling off in the early 1990s. The percent started increasing again in the mid-1990s, but very slowly.

9.45 (a) Column percents, because the source of the cat is the explanatory variable. (b) χ² = 6.611, df = 2, P-value = 0.037. At the 5% level of significance we would conclude a relationship between the source of the cat and whether or not the cat is brought to an animal shelter.

9.47 χ² = 24.9, df = 2, and P-value < 0.0005. The source of dogs and cats differs. A much higher percent of dogs than cats come from private sources, while a much higher percent of cats than dogs come from other sources such as born in home, stray, or obtained from a shelter.
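The equivalence noted in 9.35 (z² = χ² for a 2 × 2 table) is easy to check numerically. A sketch with made-up counts (hypothetical, not the textbook data):

```python
# Made-up 2x2 counts: group 1 has 30 successes / 20 failures,
# group 2 has 15 successes / 35 failures (hypothetical values).
a, b = 30, 20
c, d = 15, 35
n1, n2 = a + b, c + d

# Two-proportion z statistic with the pooled estimate
pooled = (a + c) / (n1 + n2)
z = (a / n1 - c / n2) / (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5

# Chi-square statistic for the same table
n = n1 + n2
chi2 = 0.0
for obs, row_total, col_total in [(a, n1, a + c), (b, n1, b + d),
                                  (c, n2, a + c), (d, n2, b + d)]:
    expected = row_total * col_total / n
    chi2 += (obs - expected) ** 2 / expected

print(abs(z ** 2 - chi2) < 1e-6)  # True
```

This is why the two-sided two-proportion z test and the chi-square test on a 2 × 2 table always give the same P-value.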


9.49 Exercise 9.12 is a test of independence based on a single sample. Exercise 9.13 is a comparison of several populations based on separate samples from each. Exercise 9.14 is a comparison of several populations based on separate samples from each. Exercise 9.16 is a test of independence based on a single sample.

9.51 (a) χ² = 24.9, df = 2, and P-value < 0.0005. Inspection of the data shows that the males have higher percents in the two high social comparison categories, while the females have higher percents in the two low social comparison categories. (b) χ² = 23.45 and P-value < 0.0005. This agrees with what we found from the full 4 × 2 table. (c) χ² = 0.03 and P-value = 0.863, which indicates no gender differences for the high versus low mastery categories.

CHAPTER 10
10.1 Predicted wages = $449.91. Residual = $60.91.

10.3 (a) β0 = 4.7. When U.S. returns are 0%, overseas returns are 4.7%. (b) β1 = 0.66. A 1% increase in U.S. returns is associated with a 0.66% increase in overseas returns. (c) OVERSEAS RETURN = 4.7 + 0.66 × U.S. RETURN + ε. The ε term in this model accounts for the differences in overseas returns in years when the U.S. return is the same.

10.5 (a) Yes, there is a strong, positive, linear relationship between year and spending. (b) ŷ = -2651.400 + 1.340x. (c)

Year      1995   1996  1997  1998  1999
Residual  -0.30  0.24  0.18  0.12  -0.24

(d) SPENDING = β0 + β1 × YEAR + ε, with estimates b0 = -2651.400, b1 = 1.340, and s = 0.2898. (e) For 2001, the predicted spending is $29.94. The residual is $2.76. This is an extrapolation; the trend might not stay the same beyond 1999.

10.7 b1 = 0.6270. SEb1 = 0.0992.

10.9 H0: β1 = 0, Ha: β1 > 0. t = 6.32 with 49 degrees of freedom. P-value < 0.0005. The mutual funds that had the highest returns last year did so well partly because they were a good investment and partly because of good luck (chance).
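The least-squares estimates quoted in 10.5 can be recovered with the usual formulas. In the sketch below, the spending values are reconstructed from the fitted line (b0 = -2651.4, b1 = 1.34) and the quoted residual magnitudes (with signs chosen so the residuals sum to zero), so treat them as an illustrative stand-in for the textbook data, not the data file itself:

```python
# Reconstructed (hypothetical) spending data for Exercise 10.5
years = [1995, 1996, 1997, 1998, 1999]
spend = [21.60, 23.48, 24.76, 26.04, 27.02]

n = len(years)
xbar = sum(years) / n
ybar = sum(spend) / n
sxx = sum((x - xbar) ** 2 for x in years)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, spend))

b1 = sxy / sxx                     # slope
b0 = ybar - b1 * xbar              # intercept
resid = [y - (b0 + b1 * x) for x, y in zip(years, spend)]
s = (sum(e * e for e in resid) / (n - 2)) ** 0.5  # regression standard error

print(round(b1, 3), round(b0, 1), round(s, 4))
```

Run as written, this reproduces the slope 1.34, intercept -2651.4, and s = 0.2898 from the solution.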

10.11 (a) r = 0.6700. t = 6.32 with 49 degrees of freedom. P-value < 0.0005, and there is strong evidence that the population correlation ρ > 0. (b) The output gives t = 6.3177.

10.13 (a) Area: x̄ = 28.29, s = 17.714, min = 2, max = 70; fairly symmetric and normally distributed except for 2 high outliers. IBI: x̄ = 65.94, s = 18.280, min = 29, max = 91; skewed left but no outliers. (b) The scatterplot looks weak, positive, and linear. It's very hard to tell if there are any outliers or unusual patterns because the relationship is so weak. (c) IBI = β0 + β1 AREA + ε. (d) H0: β1 = 0, Ha: β1 ≠ 0. (e) ŷ = 52.923 + 0.460x. From SPSS, the hypothesis test in part (d) has t = 3.415 and a P-value of 0.001, so we can reject the null hypothesis. There is strong evidence that Area and IBI have a linear relationship. R² = 19.9%, and the regression standard error is s = 16.535. (f) The residual plot shows that the residuals get slightly less spread out as Area increases, but overall it doesn't look too bad. (g) Yes, the normal probability plot looks good. (h) The assumptions are reasonable: we are assuming a simple random sample from the population,


there is a fairly linear (although weak) relationship between IBI and Area, there is approximately the same spread above and below the regression line on the scatterplot, which is fairly uniform for all Area values on the plot, and the Normal probability and residual plots look good.

10.15 Area is the better explanatory variable for regression with IBI. IBI and Forest have a much weaker relationship than IBI and Area.

10.17 H0: ρ = 0, Ha: ρ ≠ 0, P-value = 0.061, so do not reject the null hypothesis. There is not enough evidence to say that the correlation is significantly different from 0 if a significance level of 0.05 is used. This agrees with what we saw in the test using the slope in Exercise 10.14. The correlation is a numerical description of the linear relationship between Area and Forest, which is very weak.

10.19 (a) Each of the vertical stacks corresponds to an integer value of age. (b) Older men might earn more than younger men because of seniority or experience. Younger men might earn more than older men because they are better trained for certain types of high-paying jobs (technology) or have more education. The correlation is positive, so there is a positive association between age and income in the sample. r² = 0.03525, so age explains only about 3.5% of the variation in income. The relationship is weak. (c) Income = 24,874.3745 + 892.1135 × Age. The slope tells us that for each additional year in age, the predicted income increases by $892.1135.

10.21 (a) We see the skewness in the plot by looking at the vertical stacks of points. There are many points in the bottom (lower income) portion of each stack. At the upper (higher income) portion of each vertical stack the points are more dispersed, with only a very few at the highest incomes. (b) Regression inference is robust against a moderate lack of Normality for larger sample sizes.

10.23 (a) The intercept tells us what the T-bill percent will be when inflation is at 0%. No one will purchase T-bills if they do not offer a rate greater than 0. (b) b0 = 2.6660 and SEb0 = 0.5039. (c) t = 5.29 with 49 degrees of freedom. P-value < 0.0005. This is strong evidence that β0 > 0. (d) (1.6476, 3.6844).

10.25 (a) r² = 0.696 and indicates that square footage is helpful for predicting selling price. (b) Selling price = 4786.46 + 92.8209 × Square footage. P-value < 0.0001. There is a statistically significant straight-line relationship between selling price and square footage.

10.27 H0: ρ = 0, Ha: ρ > 0, r = 0.835, t = 10.51 with 48 degrees of freedom, P-value < 0.0005. We conclude that there is strong evidence that ρ > 0.

10.29 (a) r² increases from 69.6% to 71.8%, the intercept and slope for the least-squares regression line change from 4786.46 and 92.8209 to 8039.26 and 89.3029, respectively, and the t statistic for the slope changes from 10.5 to 10.9. The conclusion we reached in Exercise 10.17 is not changed. (b) r² decreases from 69.6% to 59.1%, the intercept and slope for the least-squares regression line change from 4786.46 and 92.8209 to 27982.8 and 73.3070, respectively, and the t statistic for the slope changes from 10.5 to 7.97. The conclusion we reached in Exercise 10.17 is not changed.

10.31 (a) Growth is much faster than linear. (b) The plot looks very linear. (c) (0.187469, 0.200261).

10.33 (a) ŷ ± t*SE = (391.86, 454.54). (b) ŷ ± t*SE = (397.05, 449.35).


10.35 (a) (62.379, 71.996). (b) (33.579, 100.797). (c) A 95% confidence interval for mean response is the interval for the average IBI for all areas of 31 km². A 95% prediction interval for a future response estimates a single IBI for a 31 km² area. (d) It would depend on the type of terrain and the location. Mountain regions, latitude, and proximity to industrial areas might make a difference.

10.37 The confidence intervals for mean response and the prediction intervals are both fairly similar. They are not exactly the same because IBI and Forest did not have as strong a relationship as IBI and Area.

10.39 (a) ŷ = 24,874.3745 + 892.1135 × Age = 51,637.78. (b) Under the column labeled 95.0% CI, we see that the desired interval is (49,780, 53,496). (c) Under the column labeled 95.0% PI, we see that the desired interval is (-41,735, 145,010). It is too wide to be very useful.

10.41 (58,915, 62,203).

10.43 (a) (44,446, 55,314). (b) This interval is wider because the regression output uses all 5712 men in the data set, while the one-sample t procedure uses only the 195 men of age 30.

10.45 A 90% prediction interval for the BAC of someone who drinks 5 beers is (0.03996, 0.11428). This interval includes values larger than 0.08, so Steve can't be confident that he won't be arrested if he drives and is stopped.

10.47 SEb1 = 0.09944.

10.49 The hypotheses are H0: β1 = 0, Ha: β1 ≠ 0. F = 39.914 and t = 6.3177, so F = t². P-value = 7.563E-08.

10.51 (a) r² = Regression SS / Total SS = 0.4489. (b) s = √(Residual MS) = 2.1801.

10.53 s = 19.0417. r² = 0.2534.

10.55 (a) H0: β1 = 0, Ha: β1 ≠ 0, t = 6.041, and P-value = 0.0001. There is strong evidence that the slope of the population regression line of profitability on reputation is nonzero. (b) 19.36% of the variation in profitability among these companies is explained by regression on reputation. (c) Part (a) tells us that the test of whether the slope of the regression line of profitability on reputation is 0 is statistically significant, but the value of r² in part (b) tells us that only 19.36% of the variation in profitability among these companies is explained by regression on reputation.

10.57 (0.111822, 0.140586).

10.59 The F statistic (36.492) is the square of the t statistic (6.041) for the slope. The P-value for the F statistic (0.0001) equals the P-value for the t statistic for the slope.

10.61 r² = 0.1936, which equals the value of R-square in the output.

10.63 Answers will vary.

10.65 (a) For more expensive items, the pharmacy would be charging less of a markup. (b) ŷ = 2.885 - 0.295x, where ŷ is the predicted (log) markup and x is the (log) cost. (c) df = 137, but using the t table we would use df = 100 to be conservative. For testing H0: β1 = 0, Ha: β1 < 0, the P-value is less than 0.005, so we can reject the null hypothesis. There is strong evidence that (log) markup and (log) cost have a negative linear relationship (evidence that charge compression is taking place).
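The F = t² relationship in 10.49 and 10.59 holds for any simple linear regression, and a short script makes it concrete. The data below are made up purely for illustration:

```python
# Made-up data (hypothetical); any simple linear regression shows the identity.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
ssr = sum((b0 + b1 * xi - ybar) ** 2 for xi in x)              # regression SS
mse = sse / (n - 2)                                            # residual MS

t = b1 / (mse / sxx) ** 0.5   # t statistic for H0: beta1 = 0
F = ssr / mse                 # ANOVA F statistic (1 numerator df)
print(abs(F - t ** 2) < 1e-6)  # True
```

The same run also illustrates 10.51: r² is ssr divided by (ssr + sse), and s is the square root of mse.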


10.67 (a) Some questions to consider: Is this a national chain or an independent restaurant? What kind of food is prepared? Do the restaurants have similar staffing and experience? (b) Answers will vary.

10.69 (a) The relationship looks positive, linear, and moderately strong except for potential outliers for the two largest hotels (1388 and 1590 rooms each). (b) The moderately strong linear relationship is good, but the outliers are not. (c) ŷ = 101.981 + 0.514x, where ŷ is the predicted number of employees and x is the number of rooms. (d) H0: β1 = 0, Ha: β1 ≠ 0, with t = 4.287 and a P-value from SPSS of 0.001, so reject the null hypothesis. There is strong evidence that there is a significant linear relationship between the number of rooms and the number of employees working at hotels in Toronto. (e) (0.253, 0.775).

10.71 (a) Hotel 1 (1388 rooms) and Hotel 11 (1590 rooms) are the two outliers. (b) R² drops from 60.5% in the original model to just 26.3% in the model without the outliers. The regression standard error s drops from 188.774 in the original model to 129.151 in the model without the outliers. The new equation of the line is ŷ = 72.526 + 0.589x. The new t test statistic is 1.891, and the P-value is now 0.088, so we cannot reject the null hypothesis. There is not enough evidence to say that the slope is significantly different from 0. The 95% confidence interval for the slope is (-0.105, 1.284), which contains 0. The 95% confidence interval for the slope from the original model did not contain 0.

10.73 The scatterplot shows a strong, linear, and positive relationship between the number of students and the total yearly expenditures. The least-squares regression line is ŷ = 0.526 + 0.530x, where ŷ is the predicted total yearly expenditure and x is the number of students. R² is very good at 98.4%, and the regression standard error is s = 1.70. A two-sided test of the slope gives t = 41.623 and a P-value close to 0. These results sound very promising; however, the normal probability plot does not look good, and the residual plot has a distinct funnel shape. The assumptions for regression may not be appropriate here.

10.75 It appears that regressing wages on length of service will do a better job of explaining wages for women who work in small banks than for women who work in large banks.

10.77 For women who work for large banks, the P-value for the test of the hypothesis that the slope is 0 is 0.2134. For women who work in small banks, the P-value for the test of the hypothesis that the slope is 0 is 0.0002. We might use length of service to predict wages for women who work in small banks, but not for women who work in large banks.

10.79 (a) ŷ = 106.6, or 2.91066 meters. (b) A plot of the data shows that these data closely follow a straight line. r² = 99.8%, which tells us that for the years 1975 to 1987, the least-squares regression line of lean on year explains 99.8% of the variation in lean. The standard error of regression is s = 4.181. However, the absolute difference between the observed value in 1918 (coded value of 71) and the value predicted by the least-squares regression line (coded value 106.6) is 35.6. This is a multiple of more than 8 times s and would be considered an extreme outlier from the pattern of the data from 1975 to 1987.

10.81 t = 2.16, and 0.02 < P-value < 0.04.


CHAPTER 11
11.1 Pred. SPACE = 210 + 160(45.24) + 160(17.00) + 150(38.00) + 65(315.00) + 120(43.25) = 41,533.4 ft².

11.3 (a) The response variable is bank assets. (b) The explanatory variables are the number of banks and deposits. (c) p = 2. (d) n = 54.

11.5 The distribution of sales is skewed to the right with two high outliers. The two high outliers are different from the high outliers in sales. It is not surprising that Wal-Mart has high sales relative to its assets because its primary business function is the distribution of products to final users. This would require less in the form of assets and increases the amount of sales.

11.7 The correlation between log(profits) and log(sales) is 0.526 (compared to 0.538 on the original scale). The correlation between log(profits) and log(assets) is 0.569 (compared to 0.533 on the original scale). The correlation between the explanatory variables log(assets) and log(sales) is 0.643 (compared to 0.455 on the original scale). The linear association between log(assets) and log(sales) appears much stronger, and the high outliers were eliminated from the other two plots.

11.9 log(profits) = 1.50 + 0.238 log(assets) + 0.478 log(sales).
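The arithmetic in 11.1 is just the intercept plus each coefficient times its variable value, and it is easy to check:

```python
# Coefficients and variable values as given in Exercise 11.1
coefs = [160, 160, 150, 65, 120]
values = [45.24, 17.00, 38.00, 315.00, 43.25]

pred_space = 210 + sum(b * x for b, x in zip(coefs, values))
print(round(pred_space, 1))  # 41533.4
```

The same dot-product pattern gives the fitted value for any multiple regression in this chapter once the coefficients are known.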

11.11 With General Motors and Wal-Mart deleted, profits = 1.55 + 0.00496 assets + 0.0553 sales. The coefficient of sales has more than doubled, and the coefficient of assets is much smaller.

11.13 For Excel, the unrounded values are s = 2.449581635 and s² = 6.000450185. The name given to s in the output is Standard Error. For Minitab, the unrounded values are s = 2.450 and s² = 6.000. The name given to s in the output is S. For SPSS, the unrounded values are s = 2.44958 and s² = 6.000. The name given to s in the output is Std. Error of the Estimate. For SAS, the unrounded values are s = 2.44958 and s² = 6.00045. The name given to s in the output is Root MSE.

11.15 (a)

Variable  x̄      s       Min  Q1   M    Q3   Max
Price     91.84  43.308  30   50   90   140  150
Weight    11.21  1.032   9    11   11   12   13
Amps      13.53  1.744   10   12   14   15   15
Depth     2.416  0.0765  2.2  2.4  2.4  2.5  2.5

(b) and (c) Price: fairly normally distributed, no outliers. Weight: fairly normally distributed, one low outlier. Amps: very left-skewed, no outliers. Depth: fairly normally distributed, one low outlier.

11.17 (a) ŷ = -303.445 - 10.808 weight + 25.674 amps + 70.030 depth. (b) s = 27.831, so s² = 774.574.

11.19 (a) A histogram shows that the residuals are fairly Normally distributed with no outliers. (b) The Residuals vs. Weight plot looks good with possibly an outlier at Weight = 9. Residuals vs. Amps looks good with possibly an outlier at Amps = 10. Residuals vs. Depth looks funnel shaped.

11.21 Saw 6 has the most negative residual of all, and Saws 2 and 11 both have negative residuals. However, Saw 8 has a positive residual. Therefore, we cannot say that


there is a relationship between rank and the residuals. A scatterplot of Residuals vs. Rank looks completely random.

11.23 (a)

Variable  Mean   St.dev.  Median  Min.  Max.   Q1    Q3
Share     8.96   7.74     8.85    1.30  27.50  2.80  11.60
Accounts  794    886      509     125   2500   134   909
Assets    48.9   76.2     15.35   1.3   219.0  5.9   38.8

(b) Use split stems. (c) All three distributions appear to be skewed to the right, with high outliers.

11.25 (a) Share = 5.16 - 0.00031 Accounts + 0.0828 Assets. (b) s = 5.488.

11.27 All summaries are smaller except the minimums. The largest effect is on the mean and standard deviation.

Variable  Mean   St.dev.  Median  Min.  Max.   Q1    Q3
Share     6.60   4.63     6.00    1.30  12.90  2.50  10.80
Accounts  392    293      316.5   125   909    132   602.5
Assets    13.76  12.27    9.00    1.30  38.80  5.70  20.30

11.29 Share = 1.85 + 0.00663 Accounts + 0.157 Assets, and s = 3.501.

11.31 (a) Stemplots show that all four distributions are right-skewed and that gross sales, cash items, and credit items have high outliers.

Variable  Mean   Median  St.dev.
Gross     320.3  263.3   180.1
Cash      20.52  19.00   11.80
Credit    20.04  15.00   14.07
Check     7.68   5.00    7.98

(b) There is a strong linear trend between gross sales and both cash and credit card items. The association is less strong with check items, but this is due to an outlier.

11.33 Plot the residuals versus the three explanatory variables and give a Normal quantile plot for the residuals. There are no obvious problems in the plots.

11.35 (a) TBill98 = 0.883 + 0.138 Arch + 0.160 Eng + 0.0478 Staff. (b) s = 1.162.

11.37 HSS does not help much in predicting GPA, with math and English grades available for prediction.

Package  Coefficient   Standard error  t statistic  P-value
Excel    0.034315568   0.03755888      0.913647251  0.361902429
Minitab  0.03432       0.03756         0.91         0.362
SPSS     3.432E-02     .038            .914         .362
SAS      0.03432       0.03756         0.91         0.3619


11.39 GPA = 0.624 + 0.183 HSM + 0.0607 HSE. HSM is still the most important variable in predicting GPA, although HSE is more helpful without HSS in the model.

11.41 The two models give similar predictions.

Model      Fit      95.0% C.I.        95.0% P.I.
HSM, HSE   1.84086  (1.6011, 2.0806)  (0.4415, 3.2402)

11.43 (a) HSM, HSS, HSE: 20.5%. (b) HSM, HSE: 20.2%. (c) HSM, HSS: 20.0%. (d) HSS, HSE: 12.3%. (e) HSM: 19.1%. Almost all of the information for predicting GPA is contained in HSM. 11.45 R12 = 21.15%, and after removing the grade variables, R22 = 6.34%. F = 13.65 with degrees of freedom q = 3 and n p1 = 218. Software gives P < 0.0001. High school grades contribute signicantly to explaining GPA when SAT scores are already in the model. 11.47 (a) t = 8.917, df = 40 30 1 = 9, P-value < 0.001. There is strong evidence that the coefcient of P5 is not 0. (b) (1.6135, 0.7585). Since we rejected the null hypothesis in part (a), we did not expect the condence interval to include 0. 11.49 The difference in signs makes sense because employees generally sign up for either the DC or the DB plan. The total number of dual-earner couples available to sign up for a plan probably stays about the same from year to year; however, as more dual-earner couples sign up for DB plans, fewer would be signing up for DC plans. 11.51 (a) The squared multiple correlation, R 2 , gives the proportion of the variation in the response variable that is explained by the explanatory variables. (b) The null hypothesis should be H0 : 2 = 0. (c) One of the assumptions for multiple regression is that the deviations i are independent Normal random variables with mean 0 and a common standard deviation . Another possible correction would be: in each subpopulation, y varies Normally with a mean given by the population regression equation. 11.53 (a) F = 4, which has the F(2, 60) distribution. Software gives P = 0.0234. (b) R 2 = 11.76%. Very little of the variability in the response is explained by this set of 3 explanatory variables. 11.55 (a) The hypotheses about the jth explanatory variable are H0 : j = 0 and Ha : j = 0. The degrees of freedom for the t statistics are 2215. At the 5% level, values that are less than 1.96 or greater than 1.96 will lead to rejection of the null hypothesis. 
(b) The significant explanatory variables are loan size, length of loan, percent down payment, cosigner, unsecured loan, total income, bad credit report, young borrower, own home, and years at current address. If a t is not significant, conclude that the explanatory variable is not useful for prediction when all the other variables are available to use for prediction. (c) The interest rate is lower for larger loans, lower for longer-length loans, lower for a higher percent down payment, lower when there is a cosigner, higher for an unsecured loan, lower for those with higher total income, higher when there is a bad credit report, higher when there is a young borrower, lower when the borrower owns a home, and lower when the number of years at current address is higher.
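The partial F statistic quoted in Exercise 11.45 can be recomputed directly from the two squared multiple correlations; a minimal sketch, with the numbers taken from the solution above:

```python
# Nested-model (partial) F test from Exercise 11.45: do the high school grade
# variables add predictive value once SAT scores are already in the model?
R2_full = 0.2115     # R-squared with grades and SAT scores
R2_reduced = 0.0634  # R-squared with SAT scores only
q = 3                # number of coefficients being tested
df_error = 218       # error degrees of freedom of the full model, n - p - 1

F = ((R2_full - R2_reduced) / q) / ((1 - R2_full) / df_error)
print(round(F, 2))  # 13.65, matching the solution text
```

The same recipe applies to Exercise 11.87, where F = 1.09 with 1 and 13 df follows from R₁² = 65.38% and R₂² = 62.48%.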

11.57 (a) The hypotheses about the jth explanatory variable are H0: βj = 0 and Ha: βj ≠ 0. The degrees of freedom for the t statistics are 5650. At the 5% level, values that are less than −1.96 or greater than 1.96 will lead to rejection of the null hypothesis. (b) The statistically significant explanatory variables are loan size, length of loan, percent down payment, and unsecured loan. (c) The interest rate is lower for larger loans, lower for longer-length loans, lower for a higher percent down payment, and higher for an unsecured loan. 11.59 (a) y varies Normally with mean μGPA = β0 + 8β1 + 9β2 + 7β3. (b) The GPA of students with a B+ in math, an A in science, and a B in English has a Normal distribution with an estimated mean of 2.563. 11.61 (a) yi = β0 + β2002 x2002,i + βyears xyears,i + εi. (b) β0, β2002, βyears, σ. (c) b0 = 30,236.981, b2002 = 0.865, byears = 57.392, s = 9035.243. (d) F = 20.177 with degrees of freedom 2 and 13, and P-value = 0.000 from software. We conclude years in rank and 2002 salary contain information that can be used to predict 2005 salary. (e) R² = 75.6%. (f) For 2002 salary, t = 2.750, P-value = 0.017, reject the null hypothesis. For years in rank, t = 0.109, P-value = 0.915, do not reject the null hypothesis. Both tests use df = 13. There is evidence the coefficient for 2002 salary is significantly different from zero, but the coefficient for years in rank is not. 11.63 (a) y = 112,518.7 + 1351.837x. (b) t = 4.725, P-value close to 0. Years in rank is useful for predicting 2006 salary. (c) Years in rank included with the 2002 salary produced these results: coefficient = 57.392, t = 0.109, P-value = 0.915. When included with the data concerning 2002 salary, years in rank is not very useful in predicting 2005 salary; however, years in rank is useful when performing the same regression analysis without the data from 2002 salary. 11.65 From the stemplot, six of these are the six most expensive homes.
Three of these are clearly outliers (the three most expensive) for this location. 11.67 For 1000-square-foot homes, predicted price = $79,621.62. For 1500-square-foot homes, predicted price = $96,783.43. 11.69 For 1000-square-foot homes, predicted price = $78,253.47. For 1500-square-foot homes, predicted price = $97,041.71. Overall, the two sets of predictions are fairly similar. 11.71 The pooled t test statistic is t = 2.875 with df = 35 and 0.005 < P-value < 0.01. These results agree (up to roundoff) with the results for the coefficient of Bed3 in Example 11.16. 11.73 The trend is increasing and roughly linear as one goes from 0 to 2 garages and then levels off. 11.75 With an extra half bath, predicted price = $99,719. Without an extra half bath, predicted price = $84,585. The difference in price is $15,134. 11.77 Because we have no data on smaller homes with an extra half bath, our regression equation is probably not trustworthy. 11.79 (a) The relationship between y and x is curved and increasing. (b) The relationship between y and x is curved, first decreasing and then increasing. (c) The relationship between y and x is curved and decreasing.
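The agreement noted in Exercise 11.71 between a pooled t comparison and an indicator-variable coefficient holds in general: the least-squares slope on a 0/1 indicator equals the difference in group means. A small sketch with invented data (not the textbook's housing data):

```python
import statistics

# Hypothetical prices, in $1000s (invented for illustration only)
price_2bed = [78.5, 81.0, 83.2, 79.9, 85.1]   # indicator = 0
price_3bed = [95.2, 99.8, 92.4, 101.5, 97.0]  # indicator = 1

x = [0] * len(price_2bed) + [1] * len(price_3bed)
y = price_2bed + price_3bed

# Simple least-squares slope of y on the indicator x
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx

# The slope reproduces the difference in group means that the pooled t test compares
diff_in_means = statistics.mean(price_3bed) - statistics.mean(price_2bed)
print(round(slope, 2), round(diff_in_means, 2))  # 15.64 15.64
```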

11.81 For part (a), the difference in means (Group B − Group A) = 3 = the coefficient of x. This can be verified directly for the other parts and shown to be true in general. 11.83 For part (a), the difference in slopes (coefficient of x2 for Group B − coefficient of x2 for Group A) = 19 − 7 = 12 = the coefficient of x1x2. The difference in the intercepts (Group B − Group A) = 120 − 80 = 40 = the coefficient of x1. This can be verified directly for the other parts and shown to be true in general. 11.85 (a) Assets = 7.61 − 0.0046 Account + 0.000034 Account². (b) 0.000034 ± 0.000021. (c) t = 3.76, df = 7, and P = 0.007. The quadratic term is useful for predicting assets in a model that already contains the linear term. (d) The variables Account and the square of Account are highly correlated. 11.87 (a) R₁² = 65.38%, and after removing the squared term, R₂² = 62.48%. (b) F = 1.09 with 1 and 13 df. Software gives P = 0.3155. The squared term does not contribute significantly to explaining salary. (c) t = 1.05, and t² = 1.10, which agrees with F up to rounding error. 11.89 (a) Price vs. Promotions: negative, moderate, linear. Price vs. Discount: negative, weak, fairly linear. (b) The table below shows the mean and standard deviation of expected price for each combination of promotions and discount.

Discount    Promotions = 1    Promotions = 3    Promotions = 5    Promotions = 7
10%         4.920, 0.1520     4.756, 0.2429     4.393, 0.2685     4.269, 0.2699
20%         4.689, 0.2331     4.524, 0.2707     4.251, 0.2648     4.094, 0.2407
30%         4.225, 0.3856     4.097, 0.2346     3.89, 0.1629      3.76, 0.2618
40%         4.423, 0.1848     4.284, 0.2040     4.058, 0.1760     3.780, 0.2144

(b) and (c) At every promotion level, the 10% discount yields the highest expected price, then 20%, then 40%, then 30%. For every discount level, 1 promotion yields the highest expected price, then 3, then 5, then 7. 11.91 In the previous exercise, the residual plot for promotions looks OK, but the residual plot for discount seems to have a curved shape. Therefore, try the quadratic term for discount and the interaction of discount and promotion. If the quadratic term for discount is used, R² increases to 61.1%, s decreases to 0.2511, the F test statistic is 81.809, and the P-value is still very close to 0. All coefficients are significantly different from 0. If the quadratic term is removed and the interaction of discount and promotion is used instead, the coefficient for the interaction term is not significant, and R² and s are very close to what they were in the original model. The best model is the one with the quadratic term for discount: y = 5.540 − 0.102 Promotion − 0.060 Discount + 0.001 Discount². 11.93 (a)

Variable   Mean    St. dev.   Description of stemplot
Area       28.29   17.714     Right-skewed, 2 high outliers
Forest     39.39   32.204     Right-skewed, no outliers
IBI        65.94   18.280     Left-skewed, no outliers

(b) All relationships are linear and moderately weak. Area and Forest have a negative relationship, but the others are positive. Data point #40 looks like a

potential outlier on the Area/Forest scatterplot, but not anyplace else. (c) yi = β0 + βArea xArea,i + βForest xForest,i + εi. (d) H0: βArea = βForest = 0; Ha: the coefficients are not both 0. (e) R² = 35.7%, s = 14.972, F = 12.776, P-value = 0.000. All coefficients are significant. y = 40.629 + 0.569 Area + 0.234 Forest. (f) Residual plots look good: random, with no outliers. (g) A histogram of the residuals looks slightly skewed left, but the Normal probability plot looks good. (h) Yes. We had general linear relationships between the variables, and the residuals look good. However, only 35.7% of the variation in IBI is explained by Area and Forest. 11.95 (a)

Variable   Mean      St. dev.   Description of distribution
PCB        68.4674   59.3906    Skewed right, 5 high outliers
PCB52      0.9580    1.5983     Skewed right, 6 high outliers
PCB118     3.2563    3.0191     Skewed right, 5 high outliers
PCB138     6.8268    5.8627     Skewed right, 5 high outliers
PCB180     4.1584    4.9864     Skewed right, 7 high outliers

(b) All the variables are positively correlated with each other. All the explanatory variables are significantly correlated with PCB, although PCB52 is the most weakly correlated with PCB and with the other explanatory variables. Only the correlation between PCB52 and PCB180 is not significant (P-value = 0.478). 11.97 (a) #50 and #65 are the two potential outliers; #50 (residual = −22.0864) is the overestimate. (b) y = 1.628 + 14.442 PCB52 + 2.600 PCB118 + 4.054 PCB138 + 4.109 PCB180. All coefficients are significant again. R² = 99.4%, s = 4.555, F = 2628.685, P-value = 0. The residuals look fairly Normally distributed, and the residual plots look random. There are two new potential outliers, though, at #44 and #58. 11.99 (a) β0 = 0, β1 = β2 = β3 = 1 because TEQ = TEQPCB + TEQDIOXIN + TEQFURAN. (b) The error terms are all 0, so σ = 0. (c) This confirms what we did in (a) and (b). 11.101 (a) Results will vary with software. (b) Most software should ignore these data points, but ignoring data is not a good idea. (c) The table below uses base-10 logs.

Log of variable   Mean     St. dev.   Description of stemplot
PCB138            0.7009   0.3494     Fairly symmetric, 1 low outlier
PCB153            0.7397   0.3914     Fairly symmetric, no outliers
PCB180            0.4235   0.4028     Fairly symmetric, 1 high outlier
PCB28             0.5793   0.4918     Fairly symmetric, 1 low and 1 high outlier
PCB52             0.3354   0.5167     Fairly symmetric, 1 low and 2 high outliers
PCB126            1.504    0.8576     Right-skewed, 16 high outliers
PCB118            0.3717   0.3592     Fairly symmetric, 1 low and 1 high outlier
PCB               1.701    0.3483     Fairly symmetric, no outliers
TEQ               0.3495   0.2591     Right-skewed, no outliers
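The base-10 log transformation used in Exercise 11.101(c) is straightforward to carry out; a small sketch with invented measurements (not the actual congener data):

```python
import math
import statistics

# Hypothetical PCB52 measurements (invented for illustration only).
# Zero or negative readings would have to be set aside before taking logs.
pcb52 = [0.21, 0.48, 0.95, 1.40, 2.70]

logs = [math.log10(v) for v in pcb52]  # base-10 logs, as in the exercise
print(round(statistics.mean(logs), 3), round(statistics.stdev(logs), 3))  # -0.088 0.427
```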

11.103 Answers will vary, but if you start with the full model and then drop one variable at a time, one good possibility is leaving out Log PCB126, because then all the coefficients are significant at the 5% level. This model has R² = 97.5% and

s = 0.0581. The least-squares regression line is y = 1.287 + 0.400 Log PCB138 + 0.144 Log PCB153 + 0.132 Log PCB180 + 0.088 Log PCB28 + 0.101 Log PCB52 + 0.151 Log PCB118. If students choose a lower significance level, a good model with only 4 variables would be y = 1.212 + 0.668 Log PCB138 + 0.152 Log PCB153 + 0.108 Log PCB28 + 0.087 Log PCB52; it has R² = 97.2% and s = 0.0600. A model with 3 variables would be y = 1.214 + 0.821 Log PCB138 + 0.091 Log PCB28 + 0.103 Log PCB52, with R² = 96.8% and s = 0.0638. 11.105 Answers will vary. 11.107 (a) R² = 18.9%, s = 1.1421, F = 3.152, P-value = 0.059. Neither the coefficient for x1 nor the coefficient for x2 is significant. (b) All correlations are significant. The scatterplots show that both x1 and x2 have positive linear relationships with y and with each other. If just x1 is used, its coefficient is significant. If just x2 is used, its coefficient is significant. (c) This isn't always the best approach either. Multiple regression is complicated. 11.109 (a) Vitamin C = 46 − 6.05 Days, t = −10.62, df = 8, P-value = 0. It appears there is a significant linear relationship between vitamin C level and the number of days after baking. (b) It appears that the relationship may be slightly curved. The residuals show a systematic pattern: the values go from positive to negative to positive as the number of days increases. (c) Vitamin C = 50.1 − 11.3 Days + 0.763 Days². The t statistic for the squared term is 6.08 with df = 7; the P-value = 0.000503. It appears that the squared term is significant in the model. (d) R² without the squared term is 0.93 and with the squared term is 0.99. A scatterplot of residuals vs. days for the second model shows no systematic pattern. Choose the model with the squared term. 11.111 Wages = 349 + 0.6 LOS, t = 2.86, P-value = 0.006, R² = 0.12.
While the t statistic shows that the coefficient on the variable LOS is significantly different from zero, it does not appear that this is a strong linear model: LOS does not explain more than 12% of the variation in wages. 11.113 Wages = 302.5 + 0.67 LOS + 71.8 Size, R² = 0.29; the t statistics have small P-values. 11.115 (a) Corn yield = 46.2 + 4.8 Soybean yield, R² = 0.87, t = 16.04, P-value = 0. (b) The residuals appear fairly Normal. (c) There appears to be an increasing and then decreasing effect when looking at the residuals plotted against year. It makes sense to include year in the regression model. 11.117 (a) Corn yield = 607.5 + 0.3 Year − 0.045 Year² + 3.9 Soybean yield. (b) H0: all coefficients are equal to 0; Ha: at least one coefficient is not equal to zero. F = 233, df = 3 and 36, P-value = 0. This indicates that the model provides significant prediction of corn yield. (c) R² = 0.95, compared to 0.90 from the previous model. (d) In the order they appear in the model, the t statistics for the coefficients are t = 5.82 with P-value = 0, t = 1.73 with P-value = 0.09, and t = 9.512 with P-value = 0. (e) The residuals all appear random when plotted against the explanatory variables. 11.119 For the linear model, the predicted yield for 2001 was 136.41 with a 95% prediction interval of (114, 158.61). For the quadratic model, the predicted yield for 2001 was 127.14 with a 95% prediction interval of (101.82, 152.45). The linear model gave a closer prediction to the actual yield.
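The two fitted vitamin C models reported in Exercise 11.109 can be written as functions and compared day by day; a quick sketch using the coefficients from the solution (the linear fit eventually goes negative, which is one more reason to prefer the quadratic):

```python
def vitc_linear(days):
    # linear fit reported in 11.109(a): Vitamin C = 46 - 6.05 * Days
    return 46 - 6.05 * days

def vitc_quadratic(days):
    # quadratic fit reported in 11.109(c): Vitamin C = 50.1 - 11.3 * Days + 0.763 * Days^2
    return 50.1 - 11.3 * days + 0.763 * days ** 2

# Compare predictions at a few (illustrative) day values
for d in (0, 2, 4, 6, 8):
    print(d, round(vitc_linear(d), 1), round(vitc_quadratic(d), 1))
```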

11.121 The scatterplots do not show strong relationships. The plot of SATM vs. GPA shows two possible outliers on the left side. The plot of SATV vs. GPA does not show any obvious outliers. 11.123 The residuals do not show any obvious patterns or problems. 11.125 (a) GPA = 0.666 + 0.193 HSM + 0.00061 SATM. (b) H0: all coefficients are equal to 0; Ha: at least one coefficient is not equal to zero. This means that we assume the model is not a significant predictor of GPA and let the data provide evidence that the model is a significant predictor. F = 26.63 and P-value = 0. This model provides significant prediction of GPA. (c) HSM: (0.1295, 0.2565); SATM: (−0.00059, 0.001815). Yes, the interval for the SATM coefficient does contain zero. (d) HSM: t = 5.99, P-value = 0. SATM: t = 0.999, P-value = 0.319. Based on the large P-value associated with the SATM coefficient, one should conclude that SATM does not provide significant prediction of GPA. (e) s = 0.703. (f) R² = 0.1942. 11.127 Looking at the sample of males only shows the same results as the sample of all students: R² = 0.184 (compared to 0.20), and while HSM is a significant predictor, HSS and HSE are not. 11.129 (a) GPA = 0.582 + 0.155 HSM + 0.050 HSS + 0.044 HSE + 0.067 Gender + 0.05 GHSM − 0.05 GHSS − 0.012 GHSE. The t statistics for each coefficient have P-values greater than 0.10 for all explanatory variables except HSM. (b) Verify. (c) Verify. (d) F = 0.143, P-value = 0.966. This indicates there is no reason to include gender and the interactions.
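The fitted equation in Exercise 11.125(a) is easy to turn into a prediction function; the student profile below (HSM = 9, SATM = 600) is hypothetical, chosen only to show the arithmetic:

```python
def gpa_hat(hsm, satm):
    # fitted model reported in Exercise 11.125(a)
    return 0.666 + 0.193 * hsm + 0.00061 * satm

# Hypothetical student: HSM = 9, SATM = 600
print(round(gpa_hat(9, 600), 3))  # 2.769
```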

CHAPTER 12


12.1 A flowchart for making coffee might look like: Measure coffee → Grind coffee → Add coffee and water to coffeemaker → Brew coffee → Pour coffee into mug and add milk and sugar if desired. In making a cause-and-effect diagram, consider what factors might affect the final cup of coffee at each stage of the flowchart.
12.3 The 9 DRGs account for 80.5% of total losses. A Pareto chart indicates that the hospital ought to study DRGs 209 and 116 first in attempting to reduce its losses.
12.5 In Exercise 12.1 we described the process of making a good cup of coffee. Some sources of common cause variation are variation in how long the coffee has been stored and the conditions under which it has been stored, variation in the measured amount of coffee used, variation in how finely ground the coffee is, variation in the amount of water added to the coffeemaker, variation in the length of time the coffee sits between when it has finished brewing and when it is drunk, and variation in the amount of milk and/or sugar added. Some special causes that might at times drive the process out of control would be a bad batch of coffee beans, a serious mismeasurement of the amount of coffee used or the amount of water used, a malfunction of the coffeemaker or a power outage, interruptions that result in the coffee sitting a long time before it is drunk, and the use of milk that has gone bad.
12.7 (a) CL = 11.5, UCL = 11.8, LCL = 11.2. (b) In the chart for data set A, one finds that samples 12 and 20 fall above the UCL. In the chart for data set B, no samples are outside the control limits. In the chart for data set C, samples 19 and 20 are above the UCL. (c) Data set B comes from a process that is in control. Data set A comes

from a process in which the mean shifted suddenly. Data set C comes from a process in which the mean drifted gradually upward. 12.9 CL = 0.46065, UCL = 1.044, LCL = 0. 12.11 5:45 A.M., alarm rings → 5:45 A.M., get out of bed → 5:46 A.M., start coffeemaker → 5:47 A.M., shower, shave, and dress → 6:15 A.M., breakfast → 6:30 A.M., brush teeth → 6:35 A.M., leave home → 6:50 A.M., park at school → 6:55 A.M., arrive at office. 12.13 My main reasons for late arrivals at work are: don't get out of bed when the alarm rings (responsible for about 40% of late arrivals), too long eating breakfast (responsible for about 25% of late arrivals), too long showering, shaving, and dressing (responsible for about 20% of late arrivals), and slow traffic on the way to work (responsible for about 15% of late arrivals). 12.15 For the s chart, CL = 0.001128, UCL = 0.0023568, LCL = 0. For the x̄ chart, CL = 0.8750, UCL = 0.87662, LCL = 0.87338. 12.17 First sample: x̄ = 48, s = 8.94. Second sample: x̄ = 46, s = 13.03. For the x̄ chart, UCL = 60.09, CL = 43, LCL = 25.9. For the s chart, UCL = 25.02, CL = 11.98, LCL = 0. The s chart shows a lack of control at sample point 5 (8:15 3/8), but otherwise neither chart shows a lack of control. We would want to find out what happened at sample 5 to cause a lack of control in the s chart. 12.19 We want to compute the probability that x̄ will be beyond the control limits, that is, that x̄ will be either larger than UCL = 713 or smaller than LCL = 687. The desired probability is 0.1591. 12.21 (a) The 2σ control limits are UCL = μ + 2(σ/√n) and LCL = μ − 2(σ/√n). (b) For an s chart, the 2σ control limits are UCL = (c4 + 2c5)σ and LCL = (c4 − 2c5)σ. 12.23 Presumably, a given sample is equally likely to be from the experienced clerk or the inexperienced clerk, and which clerk a sample comes from will be random.
Thus, the x̄ chart should display two types of points: those that have relatively small values (corresponding to the experienced clerk) and those with relatively large values (corresponding to the inexperienced clerk). Both types should occur about equally often in the chart, and the pattern of large and small values should appear random. 12.25 (a) μ̂ = 275.065, σ̂ = 37.5. (b) In Figure 12.7 we see that most of the points lie below 40 (and more than half of those below 40 lie well below 40). Of the points above 40, all but one (Sample 12) are only slightly larger than 40. The s chart suggests that typical values of s are below 40, which is consistent with the estimate of σ in part (a). 12.27 By practicing statistical process control, the manufacturer is, in essence, inspecting samples of the monitors it produces and fixing any problems that arise. The control charts the manufacturer creates are a record of this inspection process. Incoming inspection is thus redundant and no longer necessary. 12.29 The plot shows no serious departures from Normality. For Normal data the natural tolerances are trustworthy, so based on our Normal quantile plot, the natural tolerances we found in the previous exercise are trustworthy. 12.31 99.06% of monitors will meet the new specifications if the process is centered at 250 mV.
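The x̄-chart limit formula used throughout these exercises (and generalized to 2σ limits in Exercise 12.21) can be sketched as a small helper. The σ = 0.2, n = 4 pair below is one assumed combination consistent with the limits of Exercise 12.7; the exercise itself does not report σ and n separately:

```python
import math

def xbar_chart_limits(mu, sigma, n, k=3):
    # k-sigma control limits for an x-bar chart; Exercise 12.21(a) uses k = 2
    half_width = k * sigma / math.sqrt(n)
    return mu - half_width, mu, mu + half_width

# Exercise 12.7 has CL = 11.5, UCL = 11.8, LCL = 11.2, i.e. 3*sigma/sqrt(n) = 0.3.
# sigma = 0.2 with n = 4 is one (assumed) combination that reproduces those limits.
lcl, cl, ucl = xbar_chart_limits(11.5, 0.2, 4)
print(round(lcl, 1), cl, round(ucl, 1))  # 11.2 11.5 11.8
```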

12.33 The natural tolerances for the distance between the holes are μ̂ ± 3σ̂ = 43.41 ± 37.17. 12.35 About 43.16% of meters meet specifications. 12.37 The large-scale pattern in the plot looks Normal in that the centers of the horizontal bands of points appear to lie along a straight line. These horizontal bands correspond to observations having the same value, resulting from the limited precision (roundoff). With greater precision, these observations would undoubtedly differ and would probably appear to come from a Normal distribution. However, because of the lack of precision, we should probably view these data as only approximately Normal, and thus the calculations in Exercise 12.34 should be considered only approximate. 12.39 (a) If all pilots suddenly adopt the policy of working to rule (and are now more careful and thorough about doing all the checks), one would expect to see a sudden increase in the time to complete the checks but probably not much of a change in the variability of the time. Thus, we would expect to see a sudden change in level on the x̄ chart. (b) Presumably samples of measurements made by the laser system will have a much smaller standard deviation than samples of measurements made by hand. We would expect to see this reflected in a sudden change (decrease) in the level of an s or R chart. (c) Presumably the measurement of interest is either the temperature in the office or the number of invoices prepared. If it is the temperature of the office, then as time passes, the temperature will gradually increase as outdoor temperature increases. This would be reflected in a gradual shift up on the x̄ chart. If the measurement of interest is the number of invoices prepared, as the temperature in the office gradually increases, the workers gradually become less comfortable. They may take more breaks for water or become tired more quickly and thus accomplish less.
We might expect to see a gradual decrease in the number of invoices prepared and hence a gradual shift down in the x̄ chart. 12.41 The winning time for the Boston Marathon has been gradually decreasing over time. There are thus two sources of variation in the data. One is the process variation that we would observe if times were stable (random variation that we might observe over short periods of time), and the other is variation due to the downward trend in times (the large differences between times from the early 1950s compared to recent times). In using the standard deviation s of all the times between 1950 and 2002, we include both of these sources of variation. However, to determine whether times in the next few years are unusual, we should consider only the process variation (variability in recent times, where the process is relatively stable). Using s overestimates the process variation, resulting in control limits that are too wide to effectively signal unusually fast or slow times. 12.43 LSL and USL define the acceptable set of values for individual measurements on the output of a process. These limits represent the desired performance of the process. LCL and UCL for x̄ are the lower and upper control limits for the means of samples of several individual measurements on the output of a process. They represent the actual performance of the process when it is in control. 12.45 (a) Cp = 1.303, Cpk = 1.085. (b) Cp = 0.869, Cpk = 0.651. 12.47 (a) Cpk = 1, 50% meet specifications. (b) 99.74%. (c) The capability index formulas make sense for Normal distributions, but for distributions that are clearly not Normal they will give misleading results.
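The capability indexes used in Exercises 12.45 through 12.53 follow the standard definitions; a minimal sketch, checked against the estimates reported in Exercise 12.51 (specification limits 54 ± 10, μ̂ = 43.41, σ̂ = 12.39):

```python
def cp(lsl, usl, sigma):
    # potential capability: spec width measured against six process standard deviations
    return (usl - lsl) / (6 * sigma)

def cpk(lsl, usl, mu, sigma):
    # actual capability: distance from the process mean to the nearer spec limit
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Exercise 12.51: spec limits 54 +/- 10 (LSL = 44, USL = 64), mu = 43.41, sigma = 12.39
print(round(cp(44, 64, 12.39), 2))          # 0.27
print(round(cpk(44, 64, 43.41, 12.39), 2))  # -0.02 (the mean lies below the LSL)
```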

12.49 Cp is referred to as the potential capability index because it measures process variability against the standard provided by external specifications for the output of the process; it estimates what the process would be capable of producing if it could be centered. Cpk is referred to as the actual capability index because it considers both the center and the variability against the standard provided by the external specifications; it estimates what the process actually produces, given where it is currently centered. 12.51 (a) μ̂ = 43.41, σ̂ = 12.39. (b) Cp = 0.27, Cpk = −0.02. The capability is poor (both indexes are small). The reasons are that the process is not centered (we estimate the process mean to be 43.41, but the midpoint of the specification limits is 54) and the process variability is large (we saw in Exercise 12.33 that the natural tolerances for the process are 43.41 ± 37.17, a much wider range than the specification limits of 54 ± 10). 12.53 (a) 17.78% of clip openings will meet specifications if the process remains in its current state. (b) Cpk = 0.06. 12.55 Cp = (USL − LSL)/6σ, so if Cp ≥ 2 we must have USL − LSL ≥ 12σ. If the process is properly centered, then its mean will be halfway between USL and LSL, and hence USL and LSL will each be at least 6σ from μ. Thus, this is called six-sigma quality. 12.57 If the same representatives work a given shift, and calls can be expected to arrive at random during a given shift, then it might make sense to choose calls at random from all calls received during the shift. In this case, the process mean and standard deviation should be stable over the entire shift, and random selection should lead to sensible estimates of the process mean and standard deviation. If different representatives work during a shift, or if the rate of calls varies over a shift, then the process may not be stable over the entire shift.
There may be changes in either the process mean or the process standard deviation over the shift. A random sample from all calls received during the shift would overestimate the process variability because it would include variability due to special causes. Stability would more likely be seen only over much shorter time periods, and thus it would be more reasonable to time 6 consecutive calls. 12.59 The three outliers are a response time of 276 seconds (occurring in Sample 28), resulting in a value of s = 107.20; a response time of 244 seconds (occurring in Sample 42), resulting in a value of s = 93.68; and a response time of 333 seconds (occurring in Sample 46), resulting in a value of s = 119.53. When the outliers are omitted, the values of s become s = 9.28 (Sample 28), s = 6.71 (Sample 42), and s = 31.01 (Sample 46). 12.61 (a) The total number of opportunities for unpaid invoices = 128,750; p̄ = 0.0334. (b) UCL = 0.0435, CL = 0.0334, LCL = 0.0233. 12.63 UCL = 0.01307, CL = 0.00599, LCL = 0. 12.65 (a) p̄ = 0.356, n̄ = 921.8. (b) For the p chart, UCL = 0.4033, CL = 0.356, LCL = 0.3087. The process (proportion of students per month with 3 or more unexcused absences) appears to be in control. That is, there are no months with unusually high or low proportions of absences, and no obvious trends. (c) For October, UCL = 0.4027 and LCL = 0.3093. For June, UCL = 0.4043 and LCL = 0.3077. These exact limits do not affect our conclusions in this case. 12.67 (a) p̄ = 0.008, and if the manufacturer processes 500 orders per month, we would expect to see 4 defective orders per month. (b) UCL = 0.020, CL = 0.008,

LCL = 0. To be above the UCL, the number of defective orders must be greater than 10. 12.69 (a) The percents add up to more than 100% because customers can have more than one complaint. (b) The category with the largest number of complaints is the ease of obtaining invoice adjustments/credits, so we might target this area for improvement. 12.71 (a) Presumably we are interested in monitoring and controlling the amount of time the system is unavailable. We might measure the time the system is not available in sample time periods and use an x̄ and an s chart to monitor how this length of time varies. (b) To monitor the time to respond to requests for help, we might measure the response times in a sample of time periods and use an x̄ and an s chart to monitor how the response times vary. (c) We might examine samples of programming changes and record the proportion in each sample that are not properly documented. Because we are monitoring a proportion, we would use a p chart. 12.73 CL = 7.65, UCL = 19.642, LCL = 0. The first sample is out of control, perhaps reflecting an initial lack of skill or initial problems in using the new system. After the first sample, the process is in control with respect to short-term variation, although sample 10 is very close to the upper control limit. 12.75 (a) Cp = 1.14. This value tells us that the specification limits will lie just within 3 standard deviations of the process mean if the process mean is in the center of the specification limits. (b) Cpk depends on the value of the process mean. If the process mean can easily be adjusted, it is easy to change the value of Cpk. A better measure of the process capability is to center the process mean within the specification limits and then compute a capability index. If the process is properly centered, Cpk and Cp will be the same, so using Cp is ultimately more informative about the process capability. (c) We used s/c4 to estimate σ, the process variation.
A better estimate would be to compute the sample standard deviation s of all 22 × 3 = 66 observations in the samples. This gives an estimate of all the variation in the output of the process (including sample-to-sample variation). Using s/c4 is likely to give a slightly too small estimate of the process variation and hence a slightly too large (optimistic) estimate of Cp. 12.77 (a) We would use a p chart. The control limits would be UCL = 0.019, CL = 0.003, and LCL = 0. (b) If the proportion of unsatisfactory films is 0.003, then in a sample of 100 films the expected number of unsatisfactory films is 0.3. Thus, most of our samples will have no unsatisfactory films, and plotting the sample values (most of which will be 0) will not be very informative. 12.79 (a) With p̄ = 0.506, UCL = 0.718, and LCL = 0.294, the process does appear to be in control. (b) The new sample proportions are not in control according to the original control limits from part (a): attempts 2, 4, and 10 are above the UCL. With the new p̄ = 0.702, the limits for future samples should be UCL = 0.896 and LCL = 0.508.
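The p-chart limits in Exercises 12.61 through 12.79 all come from p̄ ± 3√(p̄(1 − p̄)/n). A sketch, assuming samples of n = 50 attempts (the exercise does not state n; this assumed value reproduces the limits reported in Exercise 12.79(a)):

```python
import math

def p_chart_limits(p_bar, n, k=3):
    # k-sigma control limits for a p chart; the LCL is truncated at 0
    half_width = k * math.sqrt(p_bar * (1 - p_bar) / n)
    return max(p_bar - half_width, 0.0), p_bar + half_width

# Exercise 12.79(a): p-bar = 0.506; n = 50 attempts per sample is an assumption
lcl, ucl = p_chart_limits(0.506, 50)
print(round(lcl, 3), round(ucl, 3))  # 0.294 0.718
```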

CHAPTER 13


13.1 (a) Each year, sales are lowest for the first two quarters and then increase in the third and fourth quarters. Sales decrease from the fourth quarter of one year to the first quarter of the next. (b) The pattern described is obvious in the time plot.

(c) There appears to be a positive trend, although the trend levels off after 1998. (d) The pattern described in part (a) is repeated year after year.
13.3 (a) Sales = 5903.22 + 118.75x, with sales in millions of dollars, where x takes on the values 1, 2, ..., 24. (b) The fourth quarter of 1995. (c) The slope is the increase in sales (in millions of dollars) that occurs from one quarter to the next.
13.5 (a) Sales = 7858.76 + 99.54x − 2274.21X1 − 2564.58X2 − 2022.79X3. (b) If we know that X1 = X2 = X3 = 0, then we know that we are not in any of the first three quarters, and hence must be in the fourth quarter. (c) The intercept again represents the fourth quarter of 1995.
13.7 (a) The seasonality factors are 0.923, 0.885, 0.960, and 1.231 for quarters 1 through 4, respectively. (b) The average is 0.999, which is close to 1. The fourth-quarter seasonality factor of 1.231 tells us that fourth-quarter sales are typically 23.1% above the average for all four quarters. (c) The plot mimics the pattern of seasonal variation in the original series.
13.9 (a) The seasonally adjusted series is a little smoother. (b) Seasonally adjusting the DVD player sales data smoothed the time series a little, but not to the degree that seasonally adjusting the sales data in Figure 13.7 did. This suggests that the seasonal pattern in the DVD player sales data is not as strong as the one in the monthly retail sales data.

13.11 (b) There is a positive, fairly linear trend over time. (c) There is usually a peak in January, a drop until April, and then an increase until the next January each year. (d) There are two temporary troughs: a big one from January 1991 to November 1994 and a smaller one from January 1998 to January 2003. There could also be considered a temporary peak at the very end of the data set, where there is a steep increase.
13.13 (b) There is no obvious increase or decrease overall, but there is lots of month-to-month variability. (c) The biggest dips come in August each year, and then there is another major (but not quite as big) dip in December each year. There are smaller dips in June each year. In August 2002, there is a dip, but it's just not as dramatic as in all the other years. However, the June 2002 dip was bigger than all the other June dips, so that might help explain why the August dip isn't as big as expected.
13.15 (a) The dashed line in the time plot corresponds to the least-squares line. Using the trend-only model, we find that the sales for the first three quarters tend to be overpredicted and those for the fourth quarter are underpredicted. (b) The first quarter of 2002 is $8871.97 million. The fourth quarter of 2002 is $9228.22 million. (c) In the past, predictions in the first three quarters tended to be slightly more accurate than in the fourth quarter, so we would expect the forecast for the first quarter of 2002 to be the more accurate one.
13.17 (a) The first quarter of 2002 is $8188.83 million. The fourth quarter of 2002 is $11,359.94 million. (b) The first-quarter forecast has been multiplied by 0.923 because the trend-only model typically overpredicts the first quarter, while the fourth-quarter forecast has been multiplied by 1.231 to account for the fact that the trend-only model typically underpredicts the fourth quarter. (c) The trend-and-season model and the trend-only model with seasonality factors give similar predictions, as both are adjusting the trend model for the seasonality effects.
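The forecasts in 13.15 and 13.17 are straightforward arithmetic on the fitted trend line from Exercise 13.3 and the seasonality factors from Exercise 13.7, and can be reproduced directly. A minimal sketch (the quarter index x = 25 for the first quarter of 2002 follows from the intercept corresponding to the fourth quarter of 1995):

```python
def trend(x):
    """Fitted trend-only model from Exercise 13.3 (sales in millions of dollars)."""
    return 5903.22 + 118.75 * x

# Seasonality factors from Exercise 13.7, quarters 1 through 4.
season = {1: 0.923, 2: 0.885, 3: 0.960, 4: 1.231}

# Trend-only forecasts for 2002 (Exercise 13.15): x = 25 is Q1 2002, x = 28 is Q4 2002.
q1_trend, q4_trend = trend(25), trend(28)
print(round(q1_trend, 2), round(q4_trend, 2))   # 8871.97 9228.22

# Seasonally adjusted forecasts (Exercise 13.17): multiply by the quarter's factor.
print(round(q1_trend * season[1], 2))           # 8188.83
print(round(q4_trend * season[4], 2))           # 11359.94
```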
13.19 (a) For the trend-only model R² = 35%, and for the trend-and-season model R² = 86.8%. The trend-and-season model explains much more of the variability


in the JCPenney sales. (b) For the trend-only model, s = 1170, and for the trend-and-season model, s = 566.7. (c) The trend-and-season model closely follows the original series. (d) It is clear from the plot that the trend-and-season model is a substantial improvement over the trend-only model.
13.21 (a) There are two very large temporary troughs (approximately) between July 1990 and July 1995 and then between April 1997 and November 2003. At the end of the data, there is a sharp rise. (b) A closer look at the time plot suggests a positive autocorrelation. The majority of pairs of successive residuals shows the first residual lower than the second residual in the pair. (c) The lagged residual plot shows a beautiful, positive, linear relationship between e_t−1 and e_t. The correlation between e_t−1 and e_t is 0.985. Yes, we have strong evidence of autocorrelation.
13.23 (a) The residuals for the fourth quarters are positive, and most of the residuals for the first three quarters are negative. (b) Autocorrelation is not apparent in the lagged residual plot. The correlation between successive residuals e_t and e_t−1 is only 0.095.
13.25 (a) The other group of 10 points has December as the y coordinate. (b) The correlation of 0.9206 suggests a strong autocorrelation in much of the time series. (c) If we looked at the seasonally adjusted time series, the correlation would be closer to 0.9206. The outlying groups of points have the December sales as either the x or y coordinate, and this is what is reducing the correlation from 0.9206 to 0.4573. If there were a seasonal adjustment, these two sets of 10 points should no longer stand out from the remaining points.
13.27 (a) Fitting the simple linear regression model using y_t as the response variable and y_t−1 as the predictor gives the equation y_t = 34.0 + 0.992y_t−1. The sales in July 2001 are 5170 thousands of units, so the forecast of August 2001 sales using this model is 5162.64.
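The simple linear regression on the lagged series in 13.27(a) is ordinary least squares applied to the pairs (y_t−1, y_t). A minimal sketch on made-up data (the coefficients 34.0 and 0.992 in the solution come from the exercise's sales series, which is not reproduced here):

```python
def lag1_regression(y):
    """Least-squares fit of y[t] = a + b*y[t-1] from a list of observations."""
    x = y[:-1]                       # predecessors y[t-1]
    z = y[1:]                        # responses y[t]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = mz - b * mx
    return a, b

# Made-up series that follows y[t] = 2 + 0.5*y[t-1] exactly, so the fit recovers it.
series = [1.0]
for _ in range(20):
    series.append(2 + 0.5 * series[-1])
a, b = lag1_regression(series)
print(round(a, 3), round(b, 3))      # 2.0 0.5
forecast = a + b * series[-1]        # one-step-ahead forecast, as in 13.27(a)
```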
(b) Fitting the AR(1) model gives the equation y_t = 13.9 + 0.996y_t−1. The sales in July 2001 are 5170 thousands of units, so the forecast of August 2001 sales using this model is 5163.22. (c) The coefficients of y_t−1 are very similar in parts (a) and (b), but the constants differ. The constant for the AR(1) model is much smaller than the constant (intercept) for the simple linear regression model. The August 2001 estimates are very close. The AR(1) model is preferred because it was estimated using maximum likelihood, which is preferred over least-squares for time series models.
13.29 (a) The 12-month moving average forecast predicts the November 2002 price by the average of the preceding 12 months. Averaging the last 12 values in the series gives the 12-month moving average forecast as $3.31. The 120-month moving average forecast predicts the November 2002 price by the average of the preceding 120 months. Averaging the last 120 values in the series gives the 120-month moving average forecast as $3.39. (b) Going to the Web page gives the actual winter wheat price received by Montana farmers for November 2002 as $4.28. The 120-month moving average forecast is slightly better, but neither captures the sharp rise in price that occurred over the latter part of 2002.
13.31 (a) When w = 0.1 the coefficients are 0.1, 0.09, 0.081, 0.0729, 0.06561, 0.059049, 0.0531441, 0.0478297, 0.0430467, and 0.0387420. (b) When w = 0.5 the coefficients are 0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.0039062, 0.0019531, and 0.0009766. (c) When w = 0.9 the coefficients are 0.9, 0.09, 0.009, 0.0009, 0.00009, 0.000009, 0.0000009, 0.0000001, 0.0000000, and


0.0000000. (d), (e) The curves for w = 0.9 and w = 0.5 show a more rapid exponential decrease than the curve for w = 0.1, which decreases more slowly. The curve for w = 0.9 puts more weight on the most recent value of the time series in the computation of a forecast. (f) The coefficients of y1 are 0.348678, 0.000977, and 0.0000000 for the values w = 0.1, w = 0.5, and w = 0.9, respectively. The value of w = 0.1 puts the greatest weight on y1 in the calculation of a forecast. Note that, as indicated in the text, the values of the coefficients in the forecast model decrease exponentially in value with the exception of this last coefficient.
13.33 (a) y_t = 0.021 + 1.004y_t−1; the forecast for July 2005 is 703.5822. (b) y_t = 2.45385 + 1.00877y_t−1; the forecast for July 2005 is 704.49217. (c) The slope is very similar for both equations; however, the y-intercept is not the same. The estimates are fairly close. The assumptions necessary for the regression model are no longer valid, so it is better to use the AR(1) model.
13.35 (a) There is a moderate positive linear relationship. (b) 0.692. (c) Yes, the relationship is strong enough to make production 12 months ago a good predictor of this month's production. The correlation is large and the relationship looks strong.
13.37 (a) Fitting the simple linear regression model using y_t as the response variable and y_t−1 as the predictor gives y_t = 100.4 + 0.572y_t−1. R² = 0.327 and the regression standard error is 47.89. (b) An AR(1) model was fitted and the estimated autoregression equation is y_t = 100.9 + 0.568y_t−1. R² = 0.327 and the model standard error is 47.74. (c) The equations obtained for least-squares and maximum likelihood are almost identical. (d) There is very little difference between the values of R² and the model standard error s in this example. Thus, there is no clear indication of which fitting method is preferred, although in general maximum likelihood is preferred to least-squares in time series models.
13.39 (a) The moving averages do appear to be very linear.
(b) y = 181.372 + 1.481x. (c) The moving average line and predicted value line completely overlap until about 1996. From 1996 on until the end of the data, there is a slight spiraling of the moving averages around the predicted value line, but the lines are still very close.
13.41 (a), (b) The smoothness decreases as the value of w increases. The model with w = 0.9 would be best for forecasting the monthly ups and downs in orange prices. (c) The Minitab output for an exponential smoothing model provides forecasts for each value of the smoothing constant. The results are summarized below.

Smoothing constant w    Prediction
0.1                     218.998
0.5                     219.329
0.9                     216.976

The forecasts are fairly similar, but the model with w = 0.5 provided the forecast that was closest to the actual value. (d) The January 2001 data point was added to the series, and the models with the three weighting constants were fit to the new series. These were then used to forecast the orange price for February 2001. The results are summarized below.


Smoothing constant w    Prediction
0.1                     219.518
0.5                     221.764
0.9                     223.478

The actual value in February was 229.6, so in this case the model with w = 0.9 provided the forecast that was closest to the actual value.
13.43 (a) If we use a span of k = 1, the moving average forecast equation would be ŷ_t = y_t−1. (b), (c), (d) We used k = 4 and k = 100. The moving averages with a span of k = 4 are not particularly smooth. The moving averages with a span of k = 100 are quite smooth.
13.45 (a) If we use an exponential smoothing forecast equation with smoothing constant w = 1, the forecast equation is ŷ_t = y_t−1. If we use an exponential smoothing forecast equation with smoothing constant w = 0, the forecast equation is ŷ_t = ŷ_t−1. (b), (c), (d) We used w = 0.2 and w = 0.9. The exponential smoothing model with w = 0.2 is smoother than the exponential smoothing model with w = 0.9. (e) Using the model from part (b), we find the forecast for the July 24, 2002, exchange rate to be 0.993532. Using the model from part (c), we find the forecast for the July 24, 2002, exchange rate to be 0.991149. The actual exchange rate on July 24, 2002 (from the Web site www.oanda.com/convert/fxhistory), is 1.01140. Both forecasts underestimate the actual exchange rate.
13.47 (a) There are no clear seasonal patterns in the time series plot. (b) We fitted a line to the data using statistical software and obtained y = −487,430 + 256.735(year), R² = 0.620, and s = 5048. (c) We fitted a second-degree polynomial to the data using statistical software and obtained y = 20,082,384 − 20,752.3(year) + 5.36356(year)², R² = 0.753, and s = 4091. (d) We fitted a third-degree polynomial to the data using statistical software and obtained y = −1,101,586,137 + 1,697,733(year) − 872.171(year)² + 0.149355(year)³, R² = 0.802, and s = 3684. (e) Both the quadratic and cubic models appear to fit appreciably better than the straight-line model.
The cubic model fits a little better (a bit larger R² and a bit smaller s) than the quadratic model and might be a slightly better choice for the trend equation.
13.49 (a) To fit the trend-and-season model, we must first define 11 indicator variables, as in Example 13.3. We let Jan. = 1 if the month is January, 0 otherwise; Feb. = 1 if the month is February, 0 otherwise; and so on through November. Using software we obtained the estimated trend-and-season model ln(UNITS) = 10.691 + 0.069(Case) − 0.761(Jan.) − 0.786(Feb.) − 0.556(Mar.) − 0.544(Apr.) − 0.632(May) − 0.371(Jun.) − 0.525(Jul.) − 0.533(Aug.) − 0.041(Sep.) + 0.057(Oct.) − 0.225(Nov.). (b) If Jan. = 0, Feb. = 0, Mar. = 0, . . . , Nov. = 0, then we know the month must be December. Adding an indicator variable for December would be redundant; we can determine whether the month is December from the other indicator variables. (c) To use the equation in part (a) to forecast for July 2002, we notice that this would correspond to Case 64. We set all indicators equal to 0 except the indicator for July, which we set to 1. We get ln(UNITS) = 14.582. Thus, we forecast sales for July 2002 to be UNITS = e^14.582 = 2,152,198. (d) The forecast using the seasonality factors in a trend-and-season model from Example 13.5 is UNITS = 2,120,278. The forecast in part (c) is slightly larger.
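The 13.49(c) forecast is arithmetic on the fitted log-scale equation and can be checked directly (a sketch; small differences from the printed 2,152,198 reflect rounding in the reported coefficients):

```python
from math import exp

# Coefficients of the fitted trend-and-season model for ln(UNITS) in 13.49(a).
intercept, slope = 10.691, 0.069
july_effect = -0.525        # only the July indicator is 1 for a July forecast

case = 64                   # July 2002 is Case 64 in the series
log_units = intercept + slope * case + july_effect
print(round(log_units, 3))  # 14.582

forecast = exp(log_units)   # back-transform from the log scale
print(round(forecast))      # close to the 2,152,198 reported in part (c)
```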


(e) The forecast using the AR(1) model from Example 13.10 is 1,515,036. The forecast in part (c) is considerably larger.
13.51 We tried spans of 12, 24, and 36. The plot with span k = 36 does the best job of smoothing the minor ups and downs while still capturing the major jump about two-thirds of the way through the data. An even larger value of k might also work well. Using the moving average model with span k = 36, Minitab forecasts the June 2002 average price per pound of coffee as 3.25178.
13.53 The AR(1) model does not smooth the minor ups and downs in the data, although it does capture the major jump in the series that occurs two-thirds of the way through the values. Using the AR(1) model, Minitab forecasts the June 2002 average price per pound of coffee as 3.01245.
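The moving average and exponential smoothing forecasts used in 13.29, 13.41, 13.51, and 13.53 can be sketched in a few lines. The price series below is made up for illustration; the solutions' numbers come from the exercises' data sets:

```python
def moving_average_forecast(y, k):
    """Forecast the next value by the mean of the last k observations."""
    return sum(y[-k:]) / k

def exp_smoothing_forecast(y, w):
    """One-step-ahead exponential smoothing forecast:
    yhat[t+1] = w*y[t] + (1-w)*yhat[t], started at yhat[1] = y[0]."""
    yhat = y[0]
    for obs in y[1:]:
        yhat = w * obs + (1 - w) * yhat
    return yhat

prices = [3.10, 3.20, 3.00, 3.30, 3.40, 3.50]         # made-up monthly prices
print(round(moving_average_forecast(prices, 3), 4))    # mean of last 3 values: 3.4
print(round(exp_smoothing_forecast(prices, 0.5), 4))   # 3.3969
```

Note that w = 1 reproduces the naive forecast ŷ_t = y_t−1 (the last observation) and w = 0 never moves off the starting value, matching 13.45(a).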

CHAPTER 14


14.1 I = 3, N = 60, n1 = 20, n2 = 20, and n3 = 20. The ANOVA model is x_ij = μ_i + ε_ij. The ε_ij are assumed to be from an N(0, σ) distribution, where σ is the common standard deviation. The parameters of the model are the I population means μ1, μ2, and μ3, and the common standard deviation σ.
14.3 (a) The largest standard deviation is 120 and the smallest is 80. The ratio of these is 120/80 = 1.5. This ratio is less than 2, so it is reasonable to pool the standard deviations for these data. (b) I = 3, x̄1 = 150, x̄2 = 175, x̄3 = 200. We estimate σ by sp = 101.32.
14.5 Stemplots for the three groups are given below.

Basal           DRTA            Strat
0 | 4           0 |             0 | 445
0 | 67          0 | 6777        0 | 666777
0 | 888999      0 | 888889999   0 | 889
1 | 01          1 | 000         1 | 111
1 | 22222223    1 | 2233        1 | 22333
1 | 45          1 | 5           1 | 44
1 | 6           1 | 6           1 |

The distribution for the Basal group appears to be centered at a slightly larger score than that for the DRTA group and perhaps for the Strat group. The distribution of the DRTA scores shows some right-skewness. There are no clear outliers in any of the groups.
14.7 SSG + SSE = 20.58 + 572.45 = 593.03 = SST.
14.9 DFG + DFE = 2 + 63 = 65 = DFT.
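The additivity facts in 14.7 and 14.9 hold for any one-way layout and can be checked numerically. A minimal sketch on made-up groups (the values 20.58, 572.45, and 593.03 come from the exercise's data, which are not reproduced here):

```python
def anova_decomposition(groups):
    """Return (SSG, SSE, SST) for a list of groups (lists of observations)."""
    allobs = [x for g in groups for x in g]
    grand = sum(allobs) / len(allobs)
    ssg = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    sst = sum((x - grand) ** 2 for x in allobs)
    return ssg, sse, sst

# Made-up data: three groups of unequal size.
groups = [[10, 12, 14], [11, 15, 17, 13], [20, 18, 22]]
ssg, sse, sst = anova_decomposition(groups)
assert abs(ssg + sse - sst) < 1e-9                       # SSG + SSE = SST

dfg = len(groups) - 1                                    # DFG = I - 1
dfe = sum(len(g) for g in groups) - len(groups)          # DFE = N - I
dft = sum(len(g) for g in groups) - 1                    # DFT = N - 1
assert dfg + dfe == dft                                  # DFG + DFE = DFT
```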

14.11 MST = 9.124. Using statistical software (we used Minitab) we find the mean of all 66 observations in Table 14.1 to be 9.788 and the variance of all 66 observations in Table 14.1 to be 9.125.
14.13 (a) The ANOVA F statistic has 3 numerator degrees of freedom and 20 denominator degrees of freedom. (b) The F statistic would need to be larger than 3.10 to have a P-value less than 0.05.


14.15 (a) It is always true that sp² = MSE. Hence, sp = √MSE. Because of this, SAS calls sp Root MSE. (b) sp = √MSE. In Excel, MSE is found in the ANOVA table in the row labeled Within Groups and under the column labeled MS. Take the square root of this entry (0.616003 in this case) to get sp.
14.17 Try the applet. You should find that the F statistic increases and the P-value decreases if the pooled standard error is kept fixed and the variation among the group means increases.
14.19 The standard error for the contrast is SEc = 2.
14.21 The 95% confidence interval is 15 ± 3.968 = (11.032, 18.968).
14.23 t23 = (46.7273 − 44.2727)/(6.31·√(1/22 + 1/22)) = 1.29.
14.25 The value of t** for the Bonferroni procedure when α = 0.05 is t** = 2.46 (see Example 14.22). t23 = 1.29. Because |1.29| < t**, using the Bonferroni procedure we would not conclude that the population means for groups 2 and 3 are different.
14.27 From the output in Figure 14.16, x̄2 − x̄3 = 2.4545 and the standard error for this difference is 1.90378. Thus, t23 = 2.4545/1.90378 = 1.29.
14.29

Group     Mean         SD     n
Group 1   150.2a       19.1   30
Group 2   121.9b,c     18.3   30
Group 3   129.2b,d,e   18.4   30
Group 4   140.8a,d     22.1   30
Group 5   117.2c,e     20.6   30

We see that Groups 1 and 4 have the largest means. Group 1 differs from Groups 2, 3, and 5. Group 4 differs from Groups 2 and 5.
14.31 The Bonferroni 95% confidence interval is (−2.2, 7.1). This interval includes 0.
14.33 (a)

n     DFG   DFE   F       λ        PROBF     Power
10    3     36    2.866   0.9042   0.89757   0.10243
20    3     76    2.725   1.8084   0.83012   0.16988
30    3     116   2.683   2.7126   0.75683   0.24317
40    3     156   2.663   3.6168   0.68113   0.31887
50    3     196   2.651   4.521    0.60573   0.39427
100   3     396   2.627   9.042    0.29117   0.70883

(b) As sample size increases, power increases. (c) Of the sample sizes selected in the table above, 100 would be best, since the power is fairly low at the smaller sample sizes.
14.35 (a) The response variable needs to be quantitative, not categorical. (b) You do want to use one-way ANOVA when there are at least three means to be compared. (c) The pooled estimate sp is an estimate for σ, which is a parameter.
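The pairwise t statistic in 14.23 and 14.27 divides the difference in sample means by SE = sp·√(1/n_i + 1/n_j). A sketch using the numbers from those exercises (the small gap from the software value 1.90378 reflects rounding of sp to 6.31):

```python
from math import sqrt

def pairwise_t(mean_i, mean_j, sp, n_i, n_j):
    """t statistic for comparing two group means with pooled standard deviation sp."""
    se = sp * sqrt(1 / n_i + 1 / n_j)
    return (mean_i - mean_j) / se, se

# Exercise 14.23: group means 46.7273 and 44.2727, sp = 6.31, n = 22 per group.
t23, se = pairwise_t(46.7273, 44.2727, 6.31, 22, 22)
print(round(se, 5))    # 1.90254 here; software reports 1.90378 from an unrounded sp
print(round(t23, 2))   # 1.29
```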


14.37 (a) F(3, 7).

p   0.100   0.050   0.025   0.010   0.001
F   3.07    4.35    5.89    8.45    18.77

(b) The P-value would be between 0.050 and 0.100. (c) You can never conclude from an F test that all the means are different. The alternative hypothesis is that not all the means are the same. However, in this case, with a large P-value at the 5% significance level, do not reject the null hypothesis. There is not enough evidence to say that any of the means are different.
14.39 (a) F(4, 35), F = 1.14, P-value = 0.3538 (from software). (b) F(2, 18), F = 4, P-value = 0.0365 (from software).
14.41 (a) I = 3, n1 = 220, n2 = 145, n3 = 76, N = 441, response variable = 1 to 5 rating of "Did you discuss the presentation with any of your friends?" (b) I = 3, n_i = 5, N = 15, response = cholesterol level. (c) I = 3, n_i = 25, N = 75, response = 1 to 10 rating of the game.
14.43 For parts (a), (b), and (c): H0: μ1 = μ2 = μ3 and Ha: Not all the means are the same. (a) DFG = 2, DFE = 438, DFT = 440, F(2, 438). (b) DFG = 2, DFE = 12, DFT = 14, F(2, 12). (c) DFG = 2, DFE = 72, DFT = 74, F(2, 72).
14.45 Answers will vary.
14.47 (a) Yes, because 20 < 2 × 15. (b) 48,400; 36,100; and 40,000. (c) 41,866. (d) 204.61.
14.49 (a) In general, as the number of accommodations increased, the mean grade decreased, but it is not a steady decline. (b) So many decimal places are not necessary. No additional information is gained by having so many. Using 2 decimal places would be fine. (c) The biggest s is not exactly less than 2 times the smallest s (1.66233 vs. 2 × 0.82745), but it is close. Pooling should not be used (sp would be 0.86). (d) It is never a good idea to eliminate data points without a good reason. Since the mean grade is affected similarly for 2, 3, and 4 accommodations, it is not unreasonable to group these data together. (e) These data do not represent 245 independent observations. Some students were measured multiple times. (f) Answers will vary. University policies, teacher policies, and student backgrounds might not be the same from school to school.
(g) There is no control group, so it is impossible to comment on the effectiveness of the accommodations.
14.51 (a) Checking whether the largest s < 2 × smallest s for each group indicates that it is appropriate to pool the standard deviations for intensity and recall but not frequency. (b) F(4, 405). The P-value for all three one-way ANOVA tests is < 0.001, so for frequency, intensity, and recall there is strong evidence that not all the means are the same.

p   0.100   0.050   0.025   0.010   0.001
F   1.95    2.42    2.85    3.41    4.81

(c) Hispanic Americans were highest in each of the three groups. For Emotion score, the European, Asian, and Indian scores were low but close together. For


Frequency, Intensity, and Recall, Asian and Japanese were both low, and European and Indian are close together in the middle. (d) Answers will vary. People near a university might not be representative of all citizens of those countries. (e) The chi-square test has a test statistic of 11.353 and a P-value of 0.023, so at the 5% significance level there is enough evidence to say that there is a relationship between gender and culture. In other words, there is evidence that the proportion of men and women is not the same for all of the cultural groups. This may affect how broadly the results can be applied.
14.53 (a) The histogram of bone density for the control group shows a skewed-right distribution. The histograms for the low and high groups show fairly symmetric distributions. There are no outliers for any group. A side-by-side boxplot of the groups shows heavy overlap of control and low but not much overlap of high with the other two groups. A means plot shows that control is slightly higher than low, but high is much higher than the other two groups. The following are the (mean, standard deviation) for each group: control (0.219, 0.0116), low (0.216, 0.0115), high (0.235, 0.0188). (b) The normal quantile plots for low and high look good, but the plot for control shows some deviation from Normality towards the very low and very high ends. The largest s is < 2 × smallest s, so it is appropriate to pool the standard deviations. (c) F = 7.718, P-value = 0.001. There is strong evidence that not all the means are the same. (d) Bonferroni multiple comparisons show that high is significantly different from both the control and low groups, but the control and low groups are not significantly different from each other. (e) The high dose of kudzu isoflavones yields a significantly greater mean bone density for the femur of a rat than the control or low dose does. There is no significant difference in mean bone density between the control and low dose.
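One-way ANOVA tables like the ones reported in these solutions can be recomputed from per-group summaries (n, x̄, s) alone, using SSG = Σ n_i(x̄_i − x̄)² and SSE = Σ (n_i − 1)s_i². A sketch using the four-group summaries given in Exercise 14.59, which reproduces that exercise's ANOVA table:

```python
def anova_from_summaries(ns, means, sds):
    """One-way ANOVA (SSG, SSE, F) from group sizes, means, and standard deviations."""
    N, I = sum(ns), len(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / N
    ssg = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
    msg, mse = ssg / (I - 1), sse / (N - I)
    return ssg, sse, msg / mse

# Summaries from Exercise 14.59 (1, 3, 5, and 7 promotions; n = 40 per group).
ns = [40, 40, 40, 40]
means = [4.22400, 4.06275, 3.75900, 3.54875]
sds = [0.273410, 0.174238, 0.252645, 0.275031]
ssg, sse, F = anova_from_summaries(ns, means, sds)
print(round(ssg, 3), round(sse, 3), round(F, 1))  # 10.989 9.539 59.9
```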
14.55

Bonferroni Tests
Condition i vs j   Difference   Std. error   P-value
1 vs 0             6.75000      1.321        0.036734
3 vs 0             26.9100      1.321        0.000053
3 vs 1             20.1600      1.321        0.000219
5 vs 0             36.2900      1.321        0.000012
5 vs 1             29.5400      1.321        0.000033
5 vs 3             9.38000      1.321        0.008542
7 vs 0             40.3850      1.321        0.000007
7 vs 1             33.6350      1.321        0.000017
7 vs 3             13.4750      1.321        0.001551
7 vs 5             4.09500      1.321        0.238149

The differences in the group means are all significantly different from 0 at the α = 0.05 level except the difference after 5 and 7 days. At the α = 0.01 level, all the differences are significantly different from 0 except the difference between 0 and 1 days after baking and the difference between 5 and 7 days after baking.
14.57 (a) The ANOVA in Exercise 14.56 did not reject the hypothesis at the 0.05 level that any of the group means differed. Thus, no further analysis of which group means differed is appropriate.


(b)

Bonferroni Tests
Condition i vs j   Difference   Std. error   P-value
1 vs 0             0.110000     0.0608       0.752553
3 vs 0             0.140000     0.0608       0.514112
3 vs 1             0.030000     0.0608       0.999966
5 vs 0             0.045000     0.0608       0.998871
5 vs 1             0.065000     0.0608       0.982858
5 vs 3             0.095000     0.0608       0.861026
7 vs 0             0.385000     0.0608       0.014421
7 vs 1             0.275000     0.0608       0.061029
7 vs 3             0.245000     0.0608       0.096013
7 vs 5             0.340000     0.0608       0.025003

The only group means that are significantly different at the α = 0.05 level are the difference between 0 and 7 days after baking and between 7 and 5 days after baking.
14.59 (a) The plots show that Group 2 has a modest outlier, but otherwise there are no serious departures from Normality. The assumption that the data are (approximately) Normal is not unreasonable. (b)

Number of promotions   Sample size   Mean      Std. dev.
1                      40            4.22400   0.273410
3                      40            4.06275   0.174238
5                      40            3.75900   0.252645
7                      40            3.54875   0.275031

(c) The ratio of the largest to the smallest is 0.275031/0.174238 = 1.58. This is less than 2, so it is not unreasonable to assume that the population standard deviations are equal. (d) The hypotheses for ANOVA are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. The ANOVA table is

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    10.9885          3.66285       59.903   0.0001
Error    156                  9.53875          0.061146
Total    159                  20.5273

The F statistic has 3 numerator and 156 denominator degrees of freedom. The P-value is 0.0001, and we would conclude that there is strong evidence that the population mean expected prices associated with the different numbers of promotions are not all equal.
14.61 (a)

Group      Sample size   Mean        Std. dev.
Piano      34            3.61765     3.05520
Singing    10            −0.300000   1.49443
Computer   20            0.450000    2.21181
None       14            0.785714    3.19082


(b) The hypotheses for ANOVA are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. The ANOVA table is

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    207.281          69.0938       9.2385   0.0001
Error    74                   553.437          7.47887
Total    77                   760.718

The F statistic has 3 numerator and 74 denominator degrees of freedom. The P-value is 0.0001, and we would conclude that there is strong evidence that the population mean changes in scores associated with the different types of instruction are not all equal.
14.63 The contrast is ψ = μ1 − (1/3)μ2 − (1/3)μ3 − (1/3)μ4. To test the null hypothesis H0: ψ = 0, the t statistic is t = c/SEc = 3.306/0.636 = 5.20. P-value < 0.001. We conclude that there is strong statistical evidence that the mean of the piano group differs from the average of the means of the other three groups.
14.65 (a) Plots show no serious departures from Normality, so the Normality assumption is reasonable. (b)

Bonferroni Tests
Group i vs j   Difference   Std. error   P-value
2 vs 1         11.4000      9.653        0.574592
3 vs 1         37.6000      9.653        0.001750
3 vs 2         26.2000      9.653        0.033909

At the α = 0.05 level we see that Group 3 (the high-jump group) differs from the other two. The other two groups (the control group and the low-jump group) are not significantly different. It appears that the mean density after 8 weeks is different (higher) for the high-jump group than for the other two.
14.67 (a) Plots show no serious departures from Normality, so the Normality assumption is reasonable. (b)

Bonferroni Tests
Group i vs j   Difference   Std. error   P-value
2 vs 1         0.120000     0.3751       0.985534
3 vs 1         2.62250      0.3751       0.000192
3 vs 2         2.50250      0.3751       0.000274

At the α = 0.05 level we see that Group 3 (the iron pots) differs from the other two. The other two groups (the aluminum and clay pots) are not significantly different. It appears that the mean iron content of yesiga wet when it is cooked in iron pots is different (higher) than when it is cooked in the other two.
14.69 (a) Plots show no serious departures from Normality (but note the granularity of the Normal probability plot; it appears that values are rounded to the nearest 5%), so the Normality assumption is not unreasonable.


(b)

Bonferroni Tests
Group i vs j    Difference   Std. error   P-value
ECM2 vs ECM1    1.66667      3.600        1.00000
ECM3 vs ECM1    8.33333      3.600        0.450687
ECM3 vs ECM2    10.0000      3.600        0.223577
MAT1 vs ECM1    41.6667      3.600        0.000001
MAT1 vs ECM2    40.0000      3.600        0.000002
MAT1 vs ECM3    50.0000      3.600        0.000000
MAT2 vs ECM1    58.3333      3.600        0.000000
MAT2 vs ECM2    56.6667      3.600        0.000000
MAT2 vs ECM3    66.6667      3.600        0.000000
MAT2 vs MAT1    16.6667      3.600        0.008680
MAT3 vs ECM1    53.3333      3.600        0.000000
MAT3 vs ECM2    51.6667      3.600        0.000000
MAT3 vs ECM3    61.6667      3.600        0.000000
MAT3 vs MAT1    11.6667      3.600        0.101119
MAT3 vs MAT2    5.00000      3.600        0.957728

At the α = 0.05 level we see that none of the ECMs differ from each other, that all the ECMs differ from all the other types of materials (the MATs), and that MAT1 and MAT2 differ from each other. The most striking differences are those between the ECMs and the other materials.
14.71 (a)

Group    Sample size   Mean      Std. dev.
Lemon    6             47.1667   6.79461
White    6             15.6667   3.32666
Green    6             31.5000   9.91464
Blue     6             14.8333   5.34478

(b) The hypotheses are H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. ANOVA tests whether the mean numbers of insects trapped by the different colors are the same or if they differ. If they differ, ANOVA does not tell us which ones differ. (c)

Source   Degrees of freedom   Sum of squares   Mean square   F        P-value
Groups   3                    4218.46          1406.15       30.552   0.0001
Error    20                   920.500          46.0250
Total    23                   5138.96

sp = 6.78. We conclude that there is strong evidence of a difference in the mean number of insects trapped by the different colors.
14.73 (a)

Source   Degrees of freedom   Sum of squares   Mean square   F
Groups   3                    104,855.87       34,951.96     15.86
Error    32                   70,500.59        2,203.14
Total    35                   175,356.46       5,010.18


(b) H0: μ1 = μ2 = μ3 = μ4, Ha: not all of the μi are equal. (c) The F statistic has the F(3, 32) distribution. The P-value is smaller than 0.001. (d) sp² = 2,203.14, sp = 46.94.
14.75 (a) sp² = 3.90. This quantity corresponds to MSE in the ANOVA table. (b)

Source   Degrees of freedom   Sum of squares   Mean square   F
Groups   2                    17.22            8.61          2.21
Error    206                  803.4            3.90
Total    208                  820.6

(c) H0: μ1 = μ2 = μ3, Ha: not all of the μi are equal. (d) The F statistic has the F(2, 206) distribution. The P-value is greater than 0.100. We conclude that these data do not provide evidence that the mean weight gains of pregnant women in these three countries differ.
14.77 (a) A contrast is ψ1 = (0.5)μ1 + (0.5)μ2 − μ3. (b) A contrast is ψ2 = μ1 − μ2.
14.79 (a) For the contrast ψ1 = (0.5)μ1 + (0.5)μ2 − μ3, it would be reasonable to test the hypotheses H0: ψ1 = 0, Ha: ψ1 > 0. For the contrast ψ2 = μ1 − μ2, it would be reasonable to test the hypotheses H0: ψ2 = 0, Ha: ψ2 ≠ 0. (b) Estimate of ψ1: c1 = 49. Estimate of ψ2: c2 = −10. (c) SEc1 = 6.32, SEc2 = 7.29. (d) The test statistic for H0: ψ1 = 0, Ha: ψ1 > 0, is t = 7.75. P-value < 0.005. We conclude that there is strong evidence that the average of the mean SAT mathematics scores for computer science and engineering and other sciences majors is larger than the mean SAT mathematics score for all other majors. For H0: ψ2 = 0, Ha: ψ2 ≠ 0, the test statistic is t = −1.37. 0.10 < P-value < 0.20. We conclude that there is not strong evidence that the mean SAT mathematics score for computer science majors differs from that for engineering and other science majors. (e) A 95% confidence interval for ψ1 is 49 ± 12.54 = (36.46, 61.54). A 95% confidence interval for ψ2 is −10 ± 14.46 = (−24.46, 4.46).
14.81 (a) Question 1. Contrast: ψ1 = μ1 − μ2. Hypotheses: H0: ψ1 = 0, Ha: ψ1 > 0. Question 2. Contrast: ψ2 = μ1 − (0.5)μ2 − (0.5)μ4. Hypotheses: H0: ψ2 = 0, Ha: ψ2 > 0. Question 3. Contrast: ψ3 = μ3 − (1/3)μ1 − (1/3)μ2 − (1/3)μ4. Hypotheses: H0: ψ3 = 0, Ha: ψ3 > 0. (b) Question 1: P-value > 0.25. We do not have evidence that T is better than C. Question 2: 0.10 < P-value < 0.15. There is at best weak evidence that T is better than the average of C and S. Question 3: P-value < 0.005. There is strong evidence that J is better than the average of the other three groups. (c) This is an observational study. Males were not assigned at random to treatments.
Thus, although the researchers tried to match those in the groups with respect to age and other characteristics, there are reasons why people choose to jog or choose to be sedentary that may affect other aspects of their health. It is always risky to draw conclusions of causality from a single (small) observational study, no matter how well designed it is in other respects.
14.83 For each pair of means we get the following:
Group 1 (T) vs. Group 2 (C): t12 = 0.66. |0.66| is not larger than t** = 2.81, so we do not have strong evidence that T and C differ.
Group 1 (T) vs. Group 3 (J): t13 = 3.65. |3.65| is larger than t** = 2.81, so we have strong evidence that T and J differ.


Group 1 (T) vs. Group 4 (S): t14 = 3.14. |3.14| is larger than t** = 2.81, so we have strong evidence that T and S differ.
Group 2 (C) vs. Group 3 (J): t23 = 2.29. |2.29| is not larger than t** = 2.81, so we do not have strong evidence that C and J differ.
Group 2 (C) vs. Group 4 (S): t24 = 3.22. |3.22| is larger than t** = 2.81, so we have strong evidence that C and S differ.
Group 3 (J) vs. Group 4 (S): t34 = 6.86. |6.86| is larger than t** = 2.81, so we have strong evidence that J and S differ.
14.85

n     DFG   DFE   F        λ       Power
50    2     147   3.0576   2.78    0.2950
100   2     297   3.0261   5.56    0.5453
150   2     447   3.0158   8.34    0.7336
175   2     522   3.0130   9.73    0.8017
200   2     597   3.0108   11.12   0.8548

A sample size of 175 gives reasonable power. The gain in power from using 200 women per group may not be worthwhile unless it is easy to get women for the study. If it is difficult or expensive to include more women in the study, one might consider a sample size of 150 per group.

14.87 (a)
Group  Sample size  Mean     Std. dev.  Std. error
0      2            76.1016  2.39753    1.695
1      2            65.5547  3.32561    2.352
3      2            34.0547  1.20429    0.852
5      2            19.3984  1.69043    1.195
7      2            13       0.419845   0.297
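Since every group has n = 2 observations, each standard error in the table is just s/√2; a quick check of the Std. error column:

```python
import math

# standard deviations from the table above, keyed by group
sds = {0: 2.39753, 1: 3.32561, 3: 1.20429, 5: 1.69043, 7: 0.419845}
for group, sd in sds.items():
    se = sd / math.sqrt(2)  # standard error with n = 2 per group
    print(group, round(se, 3))  # reproduces the Std. error column
```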

(b) The sample sizes would be the same in both tables. The means, standard deviations, and standard errors above could have been obtained from those in Exercise 14.44 by dividing each by 64 and multiplying the result by 100. Means, standard deviations, and standard errors change under a linear rescaling in the same way as the individual values do. (c) The ANOVA table is

Source  Degrees of freedom  Sum of squares  Mean square  F       P-value
Groups  4                   6263.97         1565.99      367.74  0.0001
Error   5                   21.2920         4.25840
Total   9                   6285.26

We conclude that there is strong evidence that the group means differ, that is, that the mean % vitamin C content is not the same for all conditions. The degrees of freedom, the F statistic, and the P-value are all the same as in Exercise 14.44.

14.89 (a) The ANOVA table with the incorrect observation is

Source  Degrees of freedom  Sum of squares  Mean square  F       P-value
Groups  3                   40,820.3        13,606.8     2.0032  0.1460
Error   20                  135,853         6,792.65
Total   23                  176,673
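The entries of an ANOVA table are linked by MS = SS/DF and F = MSG/MSE, so the arithmetic can be verified directly from the sums of squares (the P-value step uses SciPy's F distribution):

```python
from scipy.stats import f

ssg, sse = 40820.3, 135853.0      # sums of squares from the table above
dfg, dfe = 3, 20
msg, mse = ssg / dfg, sse / dfe   # mean squares: 13,606.8 and 6,792.65
f_stat = msg / mse                # F = MSG / MSE, about 2.0032
p_value = f.sf(f_stat, dfg, dfe)  # upper-tail probability for F(3, 20)
print(round(f_stat, 4), round(p_value, 4))
```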


The P-value is larger than 0.10, so we would conclude that we do not have strong evidence that the mean number of insects that will be trapped differs between the different-colored traps. (b) The results are very different. In Exercise 14.61, the P-value was < 0.0001, and we concluded that there was strong evidence that the mean number of insects that will be trapped differs between the different-colored traps. The outlier increased the sum of squares for error considerably, and this results in a much smaller value of F. (c)

Group         Count  Mean     Std. dev.
Lemon yellow  6      114.667  164.416
White         6      15.6667  3.32666
Green         6      31.5000  9.91464
Blue          6      14.8333  5.34478

The unusually large values of the mean and standard deviation might indicate that there was an error in the data recorded for the lemon yellow trap.

14.91 (a) The pattern in the scatterplot is a roughly linear, decreasing trend. (b) The regression test of the null hypothesis that the explanatory variable has no linear relationship with the response variable is the t test of whether or not the slope is 0. (c) Using software we obtain the following:

Variable              Coefficient  Std. error of coeff.  t ratio  P-value
Constant              4.36453      0.0401                109      0.0001
Number of promotions  −0.116475    0.0087                −13.3    0.0001

The t statistic for testing whether the slope is 0 is −13.3, and we see that the P-value is 0.0001. Thus, there is strong evidence that the slope is not 0. The ANOVA in Exercise 14.49 showed that there is strong evidence that the mean expected price is different for the different numbers of promotions. This is consistent with our regression results here, because if the slope is different from 0, the mean expected price changes as the number of promotions changes. In this example the regression is more informative: it not only tells us that the means differ but also gives us information about how they differ.
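The slope test described in part (b) is exactly what `scipy.stats.linregress` reports as its P-value. A minimal sketch with made-up (promotions, expected price) pairs showing a decreasing trend; these are illustrative values, not the study data:

```python
from scipy.stats import linregress

# hypothetical data with a roughly linear, decreasing trend (illustration only)
promotions = [1, 1, 3, 3, 5, 5, 7, 7]
price = [4.30, 4.24, 4.07, 4.02, 3.81, 3.77, 3.58, 3.51]

res = linregress(promotions, price)
# res.pvalue is the two-sided P-value for the t test of H0: slope = 0
print(res.slope, res.stderr, res.pvalue)
```

A small P-value here, as in the exercise, says the slope differs from 0, i.e., mean expected price changes with the number of promotions.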
