
MDM4U 2010 SEM 1

Unit 3: Analysis of Two-Variable Statistics


Multiple Choice
Identify the choice that best completes the statement or answers the question. Place your answer on the line and on the Scantron form.
[10K]
____ 1. What is the dependent variable in a correlational study of ice cream sales and air temperature?
a. the amount of ice cream sold
b. the temperature each day
c. the number of people buying ice cream
d. a) or c)

____ 2. If a relationship has a strong, negative, linear correlation, the correlation coefficient that would be appropriate is
a. 1.0 c. 0.45
b. -0.92 d. 0.32

____ 3. The scatter plot shown includes an outlier.

If the outlier is kept in the regression analysis, how will the line of best fit change compared with the line created without it?
a. The line of best fit is rotated clockwise.
b. The line of best fit is shifted downward.
c. The line of best fit is shifted upward.
d. The line of best fit is unaffected.

The table shows the percent drop in price of several MP3 players relative to their time since entering the
market.

Brand                    A   B   C   D   E   F   G   H   I   J
Months on Market         7   3   5   1  14  12   4   2   2   9
Percent Drop in Price   31  21  29   6  40   3  23  15  12  34

The line of best fit has equation P = 1.2t + 14.3, where P is the percent drop in price and t is the time on the
market, in months.
r = 0.44.

____ 4. Which data point might be considered an outlier?
a. (4, 23) c. (1, 6)
b. (12, 3) d. (7, 31)

KNOWLEDGE THINKING APPLICATION COMMUNICATION
23 24 25 26

____ 5. A researcher hypothesized that an increase in stress at work was caused by an increase in consumption of
alcohol. Further research found that increased consumption of alcohol was caused by increased work stress.
This is an example of
a. a cause-and-effect relationship
b. a reverse cause-and-effect relationship
c. a presumed cause-and-effect relationship
d. an accidental cause-and-effect relationship

For the following data pair, x is the independent variable (cause) and y is the dependent variable (effect).
Would the cause-and-effect relationship tend to be a positive linear relationship, a negative linear relationship
or no relationship?

____ 6. x = height of the father, y = height of the son
a. positive c. none
b. negative

____ 7. Which is not an appropriate question to ask in a critical analysis?
a. Have the sources been properly documented?
b. What software was used to compile the statistics?
c. Are the data recent enough to be current and relevant?
d. Has causality been inferred with only correlational evidence?

____ 8. Which error in the use of linear regression might be present?


a. too few data points
b. an outlier was not removed from the data set
c. the model is not linear
d. all of the above

____ 9. The residuals for a set of data represent the
a. differences between consecutive x-values
b. vertical differences between data points and the line of best fit
c. data points that lie below the line of best fit
d. data points that do not lie on the line of best fit


____ 10. A coefficient of determination, r² = 0.75, indicates that
a. 75% of the data lie on the regression line
b. the slope of the regression line is 0.75
c. 75% of the variance in y is a result of the variance in x
d. the data have a strong positive correlation

Short Answer

11. Sketch a scatter plot that could represent data from each pair of variables. Label the axes to indicate the
independent and the dependent variables.
[8T]
a) people's ages (starting at 20), their reaction times










b) oven temperature, cooking time for a turkey

c) exposure to sunlight, risk of heart attack










d) size of vocabulary, age (birth to 25 years old)














12. The two scatter plots shown generate the same line of best fit. Which will have the more reliable estimates?
Explain.



[2C]






13. For each pair of variables, assume that a strong positive correlation has been observed with the first variable
as the independent variable. Identify the most likely type of causal relationship.
[4C]

a) time spent practising tennis serves, serve success percent




b) marks in history class, marks in geography class




c) family income, level of education of the parents




d) price of gasoline, crime rate






14. List three factors that could lead to erroneous conclusions in a statistical study.


[3T]








15. A dataset has the regression equation y = -252x + 5.63. Determine the residual for the point (2, 1). Is
the point above or below the line of best fit?









[3A]



Problem



16. The coach of the Statsville football team wants to determine if there is a relationship between how fast players
can run 60 m and how far they can throw the football. The results for the Statsville players were as follows.

Player Sprint Time (s) Throwing Distance (m)
Jon H. 7.92 32
Tom M. 8.66 29
Sarjay P. 6.58 35
Brandon F. 8.90 32
Tyler C. 7.12 34
Steve K. 8.76 29
Matt H. 7.55 40
Robin L. 7.37 33
Alex H. 7.96 30
Mike N. 8.45 31
Ankit K. 7.75 26
Scott R. 8.05 32

a) Using technology, create a scatter plot of sprint times versus throwing distances.
b) Perform a linear-regression analysis of the data to find the line of best fit and the correlation coefficient.

[2A]


c) Describe the relationship between these sprint times and throwing distances. Explain.

[2C]


d) State which data points could be identified as outliers, and explain why you chose them.


[2A]

e) Remove the outliers and repeat the regression analysis. Determine the line of best fit and the correlation
coefficient for this smaller sample.


[3T]


f) What might the coach conclude from this analysis? What limits the predictions he could make?

[4C]



g) Use the two regression equations from parts b) and e) to estimate the throwing distance for a player whose
sprint time is 6.50 s.
[2T]

17. The following table represents the number of passengers flying from Canada to other countries.

Year   Number of People (millions)   Year   Number of People (millions)
1981   11                            1990   13
1982   10                            1991   12.5
1983   9.9                           1992   13.2
1984   10.1                          1993   13.8
1985   10.3                          1994   13.8
1986   11                            1995   17.5
1987   11.1                          1996   21
1988   12                            1997   22
1989   12.2                          1998   23.2


a) Use linear regression to find the equation of the line of best fit and the coefficient of determination.

[2A]

b) Predict the number of passengers flying out of Canada in the year 2010.

[1A]

c) Use quadratic regression to find the equation of the curve of best fit and the coefficient of determination (use Y2).

[2A]

d) Predict the number of passengers flying out of Canada in the year 2010 using the quadratic regression
model.


[1A]




e) Use your answers to the questions above to decide which model is best for predicting the number (in millions) of
passengers flying out of Canada in the future.



[3T]






U3 2010 Sem1
Answer Section

MULTIPLE CHOICE

1. ANS: D PTS: 1 DIF: 1 REF: Knowledge & Understanding
OBJ: Section 3.1 LOC: D2.1 TOP: Statistical Analysis
KEY: dependent and independent variables


2. ANS: B PTS: 1 DIF: 1 REF: Knowledge & Understanding
OBJ: Section 3.1 LOC: D2.1 TOP: Statistical Analysis
KEY: linear correlation


3. ANS: B PTS: 1 DIF: 2 REF: Knowledge & Understanding
OBJ: Section 3.2 LOC: D2.4 TOP: Statistical Analysis
KEY: line of best fit | outlier



4. ANS: B PTS: 1 DIF: 2 REF: Application
OBJ: Section 3.2 LOC: D2.4 TOP: Statistical Analysis
KEY: outlier
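
One way to check the choice in question 4 is to compute each point's residual against the stated line of best fit, P = 1.2t + 14.3, and look for the largest vertical distance. The sketch below is illustrative only: the names are assumptions, and "largest absolute residual" is used as an informal screen, not a formal outlier test.

```python
# Data from the MP3 player table: (months on market, percent drop in price).
points = [(7, 31), (3, 21), (5, 29), (1, 6), (14, 40),
          (12, 3), (4, 23), (2, 15), (2, 12), (9, 34)]

def predicted_drop(t):
    """Line of best fit stated in the question: P = 1.2t + 14.3."""
    return 1.2 * t + 14.3

# Residual = observed value minus the value predicted by the line.
residuals = {(t, p): p - predicted_drop(t) for t, p in points}

# Informal screen (an assumption, not a formal test): flag the point
# whose residual has the largest absolute value.
outlier = max(residuals, key=lambda pt: abs(residuals[pt]))
print(outlier, round(residuals[outlier], 1))
```

The flagged point sits far below the line while every other point stays within about 10 percentage points of it.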


5. ANS: B PTS: 1 DIF: 1 REF: Application
OBJ: Section 3.4 LOC: D2.2 TOP: Statistical Analysis
KEY: cause-and-effect relationship



6. ANS: A PTS: 1 DIF: 1 REF: Application
OBJ: Section 3.4 LOC: D2.2 TOP: Statistical Analysis
KEY: cause-and-effect relationship



7. ANS: B PTS: 1 DIF: 1 REF: Knowledge & Understanding
OBJ: Section 3.5 LOC: D3.2 TOP: Statistical Analysis
KEY: critical analysis



8. ANS: C PTS: 1 DIF: 1 REF: Knowledge & Understanding
OBJ: Section 3.5 LOC: D2.5 TOP: Statistical Analysis
KEY: line of best fit | outlier



9. ANS: B REF: Knowledge and Understanding OBJ: 1.4 Trends Using Technology
LOC: ST4.01 TOP: The Power of Information




10. ANS: C REF: Knowledge and Understanding OBJ: 1.4 Trends Using Technology
LOC: STV.04 TOP

SHORT ANSWER

11. ANS:
Answers may vary. The scatter plots should have the following characteristics.
a) moderate negative linear correlation, age as the independent variable
b) strong negative linear correlation, oven temperature as the independent variable
c) no correlation, amount of exposure to sunlight as the independent variable
d) strong positive linear correlation, age as the independent variable

PTS: 1 DIF: 2 REF: Application OBJ: Section 3.1
LOC: D2.3 TOP: Statistical Analysis KEY: correlation coefficient

12. ANS:
Scatterplot A will have the better estimates because the points will cluster more closely around the line of best
fit.

PTS: 1 DIF: 2 REF: Communication
OBJ: Section 3.2 LOC: D2.4 TOP: Statistical Analysis
KEY: line of best fit


13. ANS:
a) cause-and-effect relationship
b) common cause factor
c) reverse cause-and-effect relationship
d) accidental cause-and-effect relationship

PTS: 1 DIF: 2 REF: Application OBJ: Section 3.4
LOC: D2.2 TOP: Statistical Analysis KEY: causal relationships

14. ANS:
Answers may vary. Possible sources of error include
bias in the survey
outliers in the data
failing to account for extraneous variables
failing to detect hidden variables
assuming that a strong correlation proves the existence of a cause-and-effect relationship

PTS: 1 DIF: 2 REF: Knowledge & Understanding
OBJ: Section 3.5 LOC: D2.5 TOP: Statistical Analysis
KEY: critical analysis


15. ANS:
y = -252x + 5.63, point (2, 1).

Substitute x = 2 and solve for y.
y = -252(2) + 5.63
y = -504 + 5.63
y = -498.37

Residual = y(data) - y(line of best fit)
= 1 - (-498.37)
= 499.37

The residual is positive, so the point is above the line of best fit.
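
The residual arithmetic for question 15 can be sketched in a few lines. This assumes the regression equation reads y = -252x + 5.63 and uses the common convention residual = observed y minus predicted y, so a positive residual means the point lies above the line. The function names are illustrative.

```python
def predicted_y(x):
    """Regression line from question 15, read as y = -252x + 5.63."""
    return -252 * x + 5.63

def residual(x, y_observed):
    """Residual = observed y minus predicted y (assumed convention)."""
    return y_observed - predicted_y(x)

res = residual(2, 1)
position = "above" if res > 0 else "below" if res < 0 else "on"
print(f"predicted y = {predicted_y(2):.2f}, residual = {res:.2f}, "
      f"point is {position} the line")
```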


PROBLEM



16. ANS:
a)

b) The linear regression can be done with a graphing calculator, a spreadsheet, or Fathom. As shown in
the spreadsheet screen above, the equation for the line of best fit is y = -2.48x + 51.6 with r = -0.494.
c) There is a moderate negative linear correlation between the sprint times and throwing distances.
d) On the scatter plot, points (7.55, 40) and (7.75, 26) appear to be outliers since they are somewhat removed
from the rest of the data.
e) If the two possible outliers are removed, the line of best fit becomes y = -2.17x + 49.0 with r = -0.827.

f) There appears to be a negative linear correlation between the sprint times and throwing distances. In other
words, the faster runners tend to throw the ball farther. However, a sample of 12 is too small to make any
reliable predictions, and the coach does not have enough data to determine whether the possible outliers
really are outliers. The correlation between sprint times and throwing distances may, in fact, be only
moderate.
g) Using the regression with the possible outliers,
y = -2.48(6.50) + 51.6
= 35.5 m

Using the regression without the possible outliers,
y = -2.17(6.50) + 49.0
= 34.9 m


PTS: 1 DIF: 4 REF: Application | Communication
OBJ: Section 3.2 LOC: D2.3 | D2.4 TOP: Statistical Analysis
KEY: linear regression | outlier
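
For readers without a graphing calculator or Fathom, the two regressions in parts b) and e) can be approximated with a short script. This is a sketch under stated assumptions: the helper `linear_regression` is a hand-written least-squares routine (not a library call), and the two "possible outliers" are the points named in part d).

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)

# (sprint time in s, throwing distance in m) from the question 16 table.
players = [(7.92, 32), (8.66, 29), (6.58, 35), (8.90, 32), (7.12, 34), (8.76, 29),
           (7.55, 40), (7.37, 33), (7.96, 30), (8.45, 31), (7.75, 26), (8.05, 32)]
possible_outliers = {(7.55, 40), (7.75, 26)}  # the two points named in part d)
trimmed = [p for p in players if p not in possible_outliers]

fit_all = linear_regression(*zip(*players))
fit_trimmed = linear_regression(*zip(*trimmed))

for label, (slope, intercept, r) in (("all 12 players", fit_all),
                                     ("outliers removed", fit_trimmed)):
    estimate = slope * 6.50 + intercept  # part g): sprint time of 6.50 s
    print(f"{label}: y = {slope:.2f}x + {intercept:.1f}, r = {r:.3f}, "
          f"estimate at 6.50 s = {estimate:.1f} m")
```

Because the script carries full precision instead of the rounded coefficients, its estimates can differ from hand calculations in the last digit. A sprint time of 6.50 s also lies just outside the observed range (6.58 s to 8.90 s), so either estimate is a mild extrapolation.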

17. ANS:
a) y = 0.699x - 1376

b) 28.1 million

c) y = 0.073x² - 288.77x + 286570

d) 56.7 million

e) Quadratic, because its R² = 0.945 is closer to 1 than the linear model's.
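
The comparison in question 17 can be reproduced without a calculator. The sketch below fits both models by solving the least-squares normal equations; the helper names (`polyfit_centered`, `predict`, `r_squared`) are illustrative assumptions, not the method used to produce the key. Years are centred on their mean before fitting, a choice made here purely to keep the arithmetic numerically stable.

```python
def polyfit_centered(xs, ys, degree):
    """Least-squares polynomial in u = x - mean(x). Returns (mean_x, coeffs),
    where coeffs[k] multiplies u**k."""
    n = len(xs)
    mean_x = sum(xs) / n
    us = [x - mean_x for x in xs]
    m = degree + 1
    # Augmented normal equations: sum(u^(i+j)) * c_j = sum(y * u^i).
    a = [[sum(u ** (i + j) for u in us) for j in range(m)]
         + [sum(y * u ** i for u, y in zip(us, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(m):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return mean_x, [a[i][m] / a[i][i] for i in range(m)]

def predict(model, x):
    mean_x, coeffs = model
    u = x - mean_x
    return sum(c * u ** k for k, c in enumerate(coeffs))

def r_squared(model, xs, ys):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

years = list(range(1981, 1999))
passengers = [11, 10, 9.9, 10.1, 10.3, 11, 11.1, 12, 12.2,
              13, 12.5, 13.2, 13.8, 13.8, 17.5, 21, 22, 23.2]  # millions

linear = polyfit_centered(years, passengers, 1)
quadratic = polyfit_centered(years, passengers, 2)

for name, model in (("linear", linear), ("quadratic", quadratic)):
    print(f"{name}: R^2 = {r_squared(model, years, passengers):.3f}, "
          f"2010 prediction = {predict(model, 2010):.1f} million")
```

Both predictions extrapolate 12 years past the last data point, so the higher R² of the quadratic model says it fits the 1981 to 1998 data better, not that its 2010 figure is reliable.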