
Tutorial 11

Scope of this tutorial: discussion of scatter plots; regression exercise; revision of earlier hypothesis tests.

Worksheet 1: Q. 1 What sort of a relation is that?


     X (determinant)          Y (outcome)                Relation
(a)  Max. daily temperature   Soft drink sales           Positive
(b)  Odometer reading         Sale price of used cars    Negative
(c)  Annual income            Credit card balance        Positive
     (of bank clients)

Worksheet 1: Q.2: Looking for relationship


What relation can you see in the following scatter diagrams?

Rising trend, plus periodic rise and fall.

Negative linear relation between temperature and latitude: higher latitude => lower temperature.

Positive linear relation between chest girth and weight for males.

Probably no relation between attendance of crowd at MCG and temperature.



Astronomy: galaxy
What is a galaxy? A galaxy is a collection of stars, ranging from ten million (10^7) up to a hundred trillion (10^14) stars.

Group of galaxies

Expansion of the universe after the Big Bang creation

Worksheet 2: Relation between distance from Earth and radial velocity of galaxies in the universe (Hubble's law)

There appears to be a positive relation between velocity and distance.
v = -40.784 + 454.158*distance

Research Question: Is the distance from Earth a useful predictor of the radial velocity of galaxies?

v = -40.784 + 454.158*distance

Meaning of the regression equation (or meaning of the slope): For each increase of 1 megaparsec (Mpc) in distance from Earth, velocity increases by 454 km/sec, on average.
(1 Mpc = 3.26 million light years)

H: Ho: β = 0
A: The relation appears reasonably linear. The points seem to be fairly evenly spread around the line with no obvious outliers, indicating that the residuals have constant standard deviation and may be normally distributed.
T: t = 6.036, df = 22
P: p-value ≈ 0. Since p < 0.05, reject Ho.

Predictions:
Predict the radial velocity of a galaxy which is 1.25 Mpc from Earth.
v = -40.784 + 454.158*distance = -40.784 + 454.158*1.25 = 526.9 km/sec

Predict the radial velocity of a galaxy which is 2.25 Mpc from Earth.
2.25 Mpc is outside the range of the data, hence it is not valid to predict.
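The two predictions above can be sketched in a few lines of Python. This is only an illustration: the fitted coefficients come from the worksheet, but the bounds `X_MIN` and `X_MAX` are placeholder assumptions (the slides do not list the observed distances) and should be replaced by the actual minimum and maximum distance in the data.

```python
from typing import Optional

INTERCEPT = -40.784  # km/sec, from the fitted line on the worksheet
SLOPE = 454.158      # km/sec per Mpc, from the fitted line on the worksheet

# ASSUMPTION: placeholder bounds for the observed distances (in Mpc).
# Replace them with the real minimum and maximum of the worksheet data.
X_MIN, X_MAX = 0.0, 2.0


def predict_velocity(distance_mpc: float) -> Optional[float]:
    """Return the predicted velocity, or None if the distance is outside the data range."""
    if not (X_MIN <= distance_mpc <= X_MAX):
        return None  # extrapolation: not valid to predict
    return INTERCEPT + SLOPE * distance_mpc


print(predict_velocity(1.25))  # about 526.9 km/sec
print(predict_velocity(2.25))  # None, because 2.25 Mpc is outside the assumed range
```

Returning `None` rather than a number makes the "not valid to predict" answer explicit instead of silently extrapolating.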

C: There is a significant positive linear relation between distance and radial velocity (Hubble's law). For each increase of 1 megaparsec (Mpc) in distance from Earth, a galaxy's velocity increases by 454 km/sec, on average. We are 95% confident that the true increase is between 298 and 610 km/sec.
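The interval of 298 to 610 km/sec can be checked from the numbers already reported. The sketch below recovers the standard error of the slope from t = slope/SE and then forms slope ± t* × SE. The critical value 2.074 is not given on the slides; it is the usual two-sided 95% table value for df = 22 and is hard-coded here as an assumption.

```python
slope = 454.158   # km/sec per Mpc, from the regression equation
t_stat = 6.036    # reported test statistic for Ho: beta = 0

# ASSUMPTION: two-sided 95% critical value of t with df = 22, from a standard t table.
T_CRIT_DF22 = 2.074

se_slope = slope / t_stat          # because t = slope / SE(slope)
margin = T_CRIT_DF22 * se_slope    # half-width of the interval
ci = (slope - margin, slope + margin)

print(round(se_slope, 2))          # 75.24
print(round(ci[0]), round(ci[1]))  # 298 610
```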


Predictions:
Predict the distance from Earth for a celestial object which has a radial velocity of 400 km/sec.
Not valid: the independent variable (X) cannot be predicted from the outcome (Y) using this regression.
(For those curious: If we really want to predict distance from velocity, we need to re-do the regression using velocity as x (independent variable) and distance as y (dependent variable). The new regression will then be Distance = a + b*velocity.)

Goodness-of-fit statistic r²
Interpret the goodness-of-fit statistic: r² = 0.624. 62.4% of the variation in radial velocity of galaxies can be explained by the variation in distance from Earth.
Calculate and interpret the correlation coefficient: r = +√0.624 = 0.79, indicating a fairly strong positive linear relation between the two variables.
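As a small sketch of the calculation above: r is the square root of r², with its sign taken from the slope of the fitted line. The helper name `correlation_from_r2` is made up for this illustration.

```python
import math


def correlation_from_r2(r_squared: float, slope: float) -> float:
    """Return r = +/- sqrt(r^2), taking the sign from the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Hubble example: r^2 = 0.624 and the slope (454.158) is positive.
r = correlation_from_r2(0.624, 454.158)
print(round(r, 2))  # 0.79
```

The same helper gives 0.73 for the BF% against BMI regression later in the tutorial (r² = 0.535, positive slope).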


Revision Questions

Variable   Description
ID         ID of male (1 to 252)
D          Density determined from underwater weighing
BF%        Percent body fat from Siri's (1956) equation
Age        Age (years)
W          Weight (kg)
H          Height (m)
BMI        Body Mass Index (kg/m²)
Nec        Neck circumference (cm)
Che        Chest circumference (cm)
Abd        Abdomen circumference (cm)
Hip        Hip circumference (cm)
Thi        Thigh circumference (cm)
Kne        Knee circumference (cm)
Ank        Ankle circumference (cm)
Bic        Biceps (extended) circumference (cm)
Arm        Forearm circumference (cm)
Wri        Wrist circumference (cm)

Question 1: Display
1. a) What type of graphical display should you provide to compare the percentage body fat (BF%) of males aged less than 39 years and males aged 39 years or more?
BF%: numeric (continuous) variable
Less than or more than 39 years old: binary variable (new variable)
Hence comparative box plots.

b) An obese person is said to have a body mass index (BMI) of more than 30. What type of graphical display should you provide to compare the proportions of obese males aged less than 39 years with those aged 39 or more years?
BMI above or below 30 (obese or not obese): binary variable (new variable)
Less than or more than 39 years old: binary variable (new variable)
Hence clustered bar chart.

Question 2: One-sample z-test
2. Research Question: Was the mean BMI of Australian males in 2008 the same as it was in the 1980s? Assume the mean BMI of Australian men in the 1980s was equal to 25 with a SD of 3.5. In 2008, a random sample of 20 Australian males was selected and the BMI of each male was recorded.
27.84 30.44 29.86 31.04 30.81 24.93 23.57 21.23 30.98 26.25 24.84 27.03 31.02 23.54 25.49 29.38 24.52 27.62 32.94 22.46

Carry out a suitable hypothesis test to answer the research question. Assume that the variation in BMI has not changed.

Use a one-sample z-test, NOT a t-test: the population SD (3.5) is assumed known, so the sample SD s is NOT used.
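The z-test for this question can be sketched as follows, using only the 20 BMI values and the stated population values (mean 25, SD 3.5). The variable names are illustrative; the two-sided p-value uses the identity p = erfc(|z|/√2) for the standard normal distribution, so no statistics library is needed.

```python
import math

bmi = [27.84, 30.44, 29.86, 31.04, 30.81, 24.93, 23.57, 21.23, 30.98, 26.25,
       24.84, 27.03, 31.02, 23.54, 25.49, 29.38, 24.52, 27.62, 32.94, 22.46]

MU_0 = 25.0   # mean BMI in the 1980s (null value)
SIGMA = 3.5   # population SD, assumed unchanged

n = len(bmi)
mean = sum(bmi) / n
se = SIGMA / math.sqrt(n)        # uses sigma, not the sample SD s
z = (mean - MU_0) / se

# Two-sided p-value for a standard normal test statistic.
p_value = math.erfc(abs(z) / math.sqrt(2))

# 95% confidence interval for the population mean.
ci = (mean - 1.96 * se, mean + 1.96 * se)

print(round(mean, 4))                   # 27.2895
print(round(z, 2))                      # 2.93
print(p_value < 0.05)                   # True, so reject Ho
print(round(ci[0], 2), round(ci[1], 2)) # 25.76 28.82
```

The interval does not contain 25, which agrees with rejecting Ho at the 5% level.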

95% CI for μ (Question 2):
$\bar{y} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 27.2895 \pm 1.96 \times \frac{3.5}{\sqrt{20}} = (25.76,\ 28.82)$
Double the margin of error is NOT the CI.

[Figure: histogram of BMI (frequency against BMI)]

Question 3(a)
Was there a difference between the average percentage body fat (BF%) of American males in 1985 aged less than 39 years and the average BF% of American males aged 39 years or more? => 2-sample t-test

[Figure: histograms of BF% (frequency) for males aged <39 yrs and >39 yrs]

p-value = 0.0003

95% CI for μ1 − μ2:
$(\bar{y}_1 - \bar{y}_2) \pm t\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.066 \pm 2.198 = (1.87,\ 6.26)$

      CI/2          CI/2
(-------------+-------------)
            4.066

2*2.198 = 4.396 is NOT the CI. It is the length of the CI.

We are 95% confident that the BF% of males aged over 39 years is between 1.87% and 6.26% higher than that of the younger males, on average.

Question 3(b): Was the ankle circumference 5 cm more, on average, than the wrist circumference of American males in 1985? => paired t-test

[Figure: histogram of the differences (frequency against difference)]
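The difference between "the CI" and "the length of the CI" in Question 3(a) can be made concrete with a short sketch. The function names are illustrative. The slides report only the difference in means (4.066) and the margin (2.198), so `pooled_margin` is demonstrated on made-up toy inputs, labelled as such, purely to show the formula at work.

```python
import math


def pooled_margin(s1: float, s2: float, n1: int, n2: int, t_crit: float) -> float:
    """Margin of error t * s_p * sqrt(1/n1 + 1/n2) for a pooled two-sample interval."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return t_crit * sp * math.sqrt(1 / n1 + 1 / n2)


def interval(estimate: float, margin: float) -> tuple:
    """Confidence interval: estimate minus margin, estimate plus margin."""
    return (estimate - margin, estimate + margin)


# Values reported for Question 3(a).
lo, hi = interval(4.066, 2.198)
print(round(lo, 2), round(hi, 2))  # 1.87 6.26  <- this is the CI
print(round(hi - lo, 3))           # 4.396      <- this is only its length

# TOY INPUTS (hypothetical, not from the worksheet): s1 = s2 = 2, n1 = n2 = 8, t = 2.
print(pooled_margin(2.0, 2.0, 8, 8, 2.0))  # 2.0
```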

Question 4: Regression
Research Question: Was the BMI of American males in 1985 a useful predictor of BF%? Use the output to complete this question.
1. Which is the dependent/response variable?
2. Which is the independent/predictor variable?
3. Comment on the scatterplot.
4. Write down the equation of the regression line.
5. Test the statistical significance of the relation.
6. Predict, if appropriate, the expected % Body Fat for: (a) a male with a BMI of 20; (b) a male with a BMI of 15
7. Predict, if appropriate, the expected BMI for a male with 20% Body Fat.
8. (a) Calculate r and interpret. (b) Calculate r² and interpret.

1. Which is the dependent/response variable? BF%
2. Which is the independent/predictor variable? BMI
3. Comment on the scatter plot.
[Figure: scatter plot of BF% vs BMI]
Positive linear relation: higher BMI => higher BF%
Residuals have constant SD
No outliers; points are symmetric on both sides of the line

4. Regression equation: BF% = -26.9872 + 1.8186*BMI

5. Test the statistical significance of the relation.

6. Predict, if appropriate, the expected % Body Fat for: (a) a male with a BMI of 20; (b) a male with a BMI of 15
(a) A male with a BMI of 20: BF% = -26.9872 + 1.8186*20 = 9.385
(b) A male with a BMI of 15: Not valid to predict, since 15 is out of the range of the data.

7. Predict, if appropriate, the expected BMI for a male with 20% Body Fat.
Not valid to predict the independent variable (predictor or x) from the dependent variable (outcome or y).

8. (a) Calculate r and interpret.
r = +√0.535 = 0.73. There is a fairly strong positive linear relation between BMI and BF%.
(b) Calculate r² and interpret.
r² = 0.535. This indicates that about 53.5% of the variation in BF% can be explained by the variation in BMI.

Question 5: Best predictor
Research Question: Which of BMI, neck circumference or abdomen circumference is the best predictor of BF%?
Fill in the table. Explain your answer.

Each of the predictors is a significant predictor of BF%; the p-value for each of the predictors is 0.000. Each regression equation satisfies the assumptions of linearity, constant spread and normality of the residuals. However, the abdomen circumference (Abd) provides the best fit, as its r² = 67% is much higher than the others.
Note:
1. NEVER compare values of b.
2. It is easier, and better, to compare r² instead of p-values.
3. Discard (cross out) variables if they break assumptions or if p-value > 0.05.

Practice Exercises: Question 1
Consider the computer output which shows the relation between students' ideal weights and their actual weights for females. Note that the dotted line represents the cases where the ideal weight is the same as the actual weight.

Question 1
(a) By comparing the regression line (solid) with the line y = x (i.e. ideal weight_f = weight_f) (dotted), comment on the scatter plot.

(b) From the partial EcStat output above, perform an appropriate hypothesis test to see if there is a linear relation between Ideal weight_f (Y) and Weight_f (X). Partial ans: t = 21.29

Question 1 (answers)

Question 1 (continued)
(c) What is the value of the goodness-of-fit statistic? Interpret its meaning. Ans: 70.8% Meaning:

Question 2
The table below shows Accounting and Statistics marks for 12 students. Research question: Can Accounting marks (X) be used to predict Statistics marks (Y)? Use the partial EcStat output below to answer the research question.

Acc   Stat
74    81
93    86
55    67
41    35
23    30
92    100
64    55
40    52
71    76
33    24
30    48
71    87

[Figure: scatter plot of Stat against Acc]

df: 10   outcome: Stat
predictor   coeff    SE      t        p-value   95% C.I.
constant    7.0194   7.971   0.8806   0.399     (-10.741, 24.779)
Acc         0.9560   0.129
r-sq: 0.845   Resid SS: 1046.876   s: 10.232

Question 2 (answers)
(Partial Ans: t=7.411)

Question 2 (continued)
(b) What is the value of the goodness-of-fit statistic? Interpret its meaning.

Question 3
[Figure: scatter plot of Height against Weight]

Research question: Can Weight (X) be used to predict Height (Y)? Use the partial EcStat output below to answer the research question.

df: 82   outcome: Height
predictor   coeff      SE      t         p-value   95% C.I.
constant    130.1702   4.041   32.2109   0.000     (122.131, 138.209)
Weight      0.6699     0.061
r-sq: 0.595   Resid SS: 2855.483   s: 5.901

Question 3 (answers)
(Partial Ans: t=10.98)
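The partial answers for Questions 2 and 3 follow from dividing the slope coefficient by its standard error. A minimal sketch, with the coefficients and SEs copied from the partial EcStat outputs (the function name is illustrative):

```python
def t_statistic(coeff: float, se: float) -> float:
    """Test statistic for Ho: beta = 0, i.e. t = coefficient / standard error."""
    return coeff / se


t_q2 = t_statistic(0.9560, 0.129)  # Question 2: Stat on Acc, df = 10
t_q3 = t_statistic(0.6699, 0.061)  # Question 3: Height on Weight, df = 82

print(round(t_q2, 3))  # 7.411
print(round(t_q3, 2))  # 10.98
```

Because the printed SE is rounded to three decimals, a hand calculation can differ slightly from the software value: the full EcStat output later in this tutorial reports t = 7.3927 for Question 2.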

Question 3 (continued)
(b) What is the value of goodness-of-fit statistic? Interpret its meaning.


Question 4
(Swap X and Y in Question 3.)

[Figure: scatter plot of Weight against Height]

Research question: Can Height (X) be used to predict Weight (Y)? Use the partial EcStat output below to answer the research question.

df: 82   outcome: Weight
predictor   coeff      SE       t         p-value   95% C.I.
constant    -89.1380   14.096   -6.3238   0.000     (-117.179, -61.097)
Height      0.8881     0.081    10.9760   0.000     (0.727, 1.049)
r-sq: 0.595   Resid SS: 3785.467   s: 6.794

Question 4 (answers)
(Partial Ans: t=10.976)
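A quick numerical check on Questions 3 and 4, offered as a sketch rather than a model answer. For simple linear regression, the slope of Y on X multiplied by the slope of X on Y equals r². That is a standard identity, not something stated on the slides. The check also shows why swapping X and Y requires re-fitting rather than inverting the first equation.

```python
slope_height_on_weight = 0.6699  # Question 3: Height = 130.1702 + 0.6699*Weight
slope_weight_on_height = 0.8881  # Question 4: Weight = -89.1380 + 0.8881*Height

# Product of the two fitted slopes reproduces the common r-sq value.
r_squared = slope_height_on_weight * slope_weight_on_height
print(round(r_squared, 3))  # 0.595

# Algebraically inverting the Question 3 line does NOT give the Question 4 slope.
naive_inverse_slope = 1 / slope_height_on_weight
print(round(naive_inverse_slope, 3))  # 1.493, not 0.8881
```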

Question 4 (continued)
(b) What is the value of the goodness-of-fit statistic? Interpret its meaning.
(c) Explain why the value of r² is the same as that in Question 3.

Question 5
For each of the following given regression equations, interpret (i) the equation and (ii) r².
(a) X = time a bee spends on a flower, Y = % pollen removed
y = 13 + 2.05x, r² = 0.384
Interpretation of equation (slope):
Interpretation of r²:

Question 5 (continued)
(b) X = students' high school results, Y = STAT170 exam results
y = 29.23 + 0.54x, r² = 6.2%

Question 5 (continued)
(c) X = number of cans of beer drunk, Y = blood alcohol content
y = 0.0217 + 0.0203x, r² = 82.1%

Computer (EcStat) Exercises

Question 1 (Q.2 of previous exercise)
Research question: Can Accounting marks (X) be used to predict Statistics marks (Y)?

1. Enter the 2 columns as shown.
2. Optional but recommended: Pre-highlight Y (Account), then press Ctrl key and highlight X (Stat).
3. Click Relationship (4th icon).

Make sure the X (Account) and Y (Stat) are chosen correctly; otherwise you will get the wrong graph and the wrong regression results.

df: 10   outcome: Stat (Y)
predictor     coeff    SE      t        p-value   95% C.I.
constant      7.0194   7.971   0.8806   0.399     (-10.741, 24.779)
Account (X)   0.9560   0.129   7.3927   0.000     (0.668, 1.244)
r-sq: 0.845   Resid SS: 1046.876   s: 10.232
Fitted line: Stat (Y) = 7.0194 + 0.956 Account (X)

[Figure: scatter plot of Stat (Y) against Account (X) with the fitted line]
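For readers without EcStat, the output above can be reproduced from the 12 pairs of marks with ordinary least squares. The sketch below uses only the standard library. The function name and dictionary keys are illustrative, and the critical value 2.228 (two-sided 95%, df = 10) is taken from a standard t table as an assumption, since it does not appear on the slides.

```python
import math

acc = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]    # Accounting marks (X)
stat = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Statistics marks (Y)

# ASSUMPTION: two-sided 95% critical value of t with df = 10, from a standard t table.
T_CRIT_DF10 = 2.228


def simple_regression(x, y):
    """Least-squares fit of y = a + b*x with the usual summary statistics."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = syy - slope * sxy           # residual sum of squares
    s = math.sqrt(resid_ss / (n - 2))      # residual standard deviation
    se_slope = s / math.sqrt(sxx)
    return {"intercept": intercept, "slope": slope, "se_slope": se_slope,
            "t": slope / se_slope, "r_sq": 1 - resid_ss / syy,
            "resid_ss": resid_ss, "s": s, "df": n - 2}


fit = simple_regression(acc, stat)
margin = T_CRIT_DF10 * fit["se_slope"]
ci_slope = (fit["slope"] - margin, fit["slope"] + margin)

print(round(fit["intercept"], 4), round(fit["slope"], 4))  # 7.0194 0.956
print(round(fit["t"], 2), round(fit["r_sq"], 3))           # 7.39 0.845
print(round(ci_slope[0], 3), round(ci_slope[1], 3))        # 0.668 1.244
```

Passing the two lists in the opposite order fits Acc on Stat instead, which is the mistake the warning about choosing X and Y correctly is guarding against.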

Question 1(continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation: ______________________________
(c) What is the value of the test statistic? (Include symbol z/t) ___________________
(d) What is the value of the p-value? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of the goodness-of-fit statistic? _______

Question 2 (Pract 8 Exercises)


Load the file pulse.xls (used in Pract/WASP 8) Research question: Can Height (X) be used to predict Weight (Y) ? Perform the hypothesis test using EcStat. Then answer the questions on the next slide.


Question 2(continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation: ______________________________
(c) What is the value of the test statistic? (Include symbol z/t) ___________________
(d) What is the value of the p-value? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of the goodness-of-fit statistic? _______

Question 3 (Pract 8 Exercises)


Load the file Storks.xls (used in Pract/WASP 8) Research question: Can the number of storks (Stork) be used to predict the number of babies born (Birth)? Perform the hypothesis test using EcStat. Then answer the questions on the next slide.


Question 3(continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation: ______________________________
(c) What is the value of the test statistic? (Include symbol z/t) ___________________
(d) What is the value of the p-value? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of the goodness-of-fit statistic? _______

Question 4 (Pract 8 Exercises)


Load the file Peru.xls (used in Pract/WASP 8) Research question: Can the number of years (Years) since migration be used to predict the systolic blood pressure (Systol)? Perform the hypothesis test using EcStat. Then answer the questions on the next slide.


Question 4 (continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation: ______________________________
(c) What is the value of the test statistic? (Include symbol z/t) ___________________
(d) What is the value of the p-value? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of the goodness-of-fit statistic? _______

Question 5 (Pract 8 Exercises)


Continue with the file Peru.xls. Research question: Can Forearm (X) be used to predict Weight (Y)? Perform the hypothesis test using EcStat. Then answer the questions on the next slide.


Question 5 (continued)
Fill in the following answers:
(a) Ho: ___________________
(b) Write down the regression equation: ______________________________
(c) What is the value of the test statistic? (Include symbol z/t) ___________________
(d) What is the value of the p-value? ________
(e) Do you reject or not reject Ho? _________
(f) What is a 95% CI for β? ____________________
(g) Does the 95% CI for β include the null value? ______
(h) What is the value of the goodness-of-fit statistic? _______
