
SKILLS SET

Chapter 9: Correlation & Regression


No.   Skills   Examples of questions involving the skills

1. To obtain a scatter diagram for bivariate data (labelling the maximum and minimum values of each variable)


Lecture example 1
The temperature, T, in degrees Celsius (°C), of the tyre of a car is measured
when the car travels at different speeds, v (km h^-1). Eight sets of data are
obtained. Sketch the scatter diagram for the data.
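v (km h^-1)   20   30   40   50   60   70   80    90
T (°C)        45   52   64   66   91   86   104   98

Solution
[Scatter diagram of T against v, with the axes labelled at the minimum and maximum values: v from 20 to 90, T from 45 to 104]

2. To comment on the relationship (with respect to direction, form and strength) of the bivariate data using the scatter diagram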

Lecture example 2
Comment on the correlation between the two variables based on the scatter
diagram below.
[Four scatter diagrams, captioned: (1) Positive linear correlation (strong); (2) Positive correlation (weak); (3) No clear correlation; (4) Non-linear correlation]

3. To calculate the product moment correlation coefficient using the formula or the TI-84+

Lecture example 3
In a physical education class, the number of push-ups (x) and sit-ups (y) done by
a sample of ten randomly chosen students were recorded and summarized as
shown.
Student        1    2    3    4    5    6    7    8    9    10
Push-ups (x)   27   22   15   35   30   52   35   55   40   40
Sit-ups (y)    30   26   25   42   38   40   32   54   50   43

$\sum xy = 14257$, $\sum x^2 = 13717$, $\sum y^2 = 15298$, $\sum x = 351$, $\sum y = 380$


Find the product-moment correlation coefficient of the sample.
Solution

\[ r = \frac{14257 - \frac{(351)(380)}{10}}{\sqrt{\left(13717 - \frac{(351)^2}{10}\right)\left(15298 - \frac{(380)^2}{10}\right)}} \approx 0.839 \]
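As a cross-check of the substitution above, here is a minimal Python sketch that evaluates the same formula from the summary statistics. The function name `pmcc_from_sums` is an illustrative choice, not something from the notes or the TI-84+.

```python
from math import sqrt

def pmcc_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Product moment correlation coefficient from summary statistics."""
    s_xy = sum_xy - sum_x * sum_y / n   # numerator of r
    s_xx = sum_x2 - sum_x ** 2 / n      # spread of x
    s_yy = sum_y2 - sum_y ** 2 / n      # spread of y
    return s_xy / sqrt(s_xx * s_yy)

# Summary statistics quoted in Lecture example 3 (n = 10 students)
r = pmcc_from_sums(10, 351, 380, 14257, 13717, 15298)
print(round(r, 3))
```

Rounding to three decimal places should agree with the value quoted in the solution.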

4. To find the equation of the regression line of y on x or the equation of the regression line of x on y using the given data list.

[Note: The regression line of y on x and the regression line of x on y are different unless r = ±1.]

Lecture example 5
In a physical education class, the number of push-ups (x) and sit-ups (y) done
by a sample of ten randomly chosen students were recorded in the table below.

Student        1    2    3    4    5    6    7    8    9    10
Push-ups (x)   27   22   15   35   30   52   35   55   40   40
Sit-ups (y)    30   26   25   42   38   40   32   54   50   43

Find the equation of the regression line of y on x and the equation of the regression line of x on y.
Solution
From the TI-84+, the equation of the regression line of y on x is y = 0.658x + 14.9.
The equation of the regression line of x on y is x = 1.07y − 5.60.
[Calculator screenshots: regression line of y on x; regression line of x on y]
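For readers without the calculator to hand, the two lines can be reproduced from the raw data. This is a minimal sketch using the standard least squares formulas; the helper name `regression_line` is hypothetical and is not a TI-84+ command.

```python
def regression_line(u, v):
    """Least squares line of v on u, returned as (slope, intercept)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    slope = s_uv / s_uu
    # The line passes through the point of means (mean_u, mean_v)
    return slope, mean_v - slope * mean_u

push_ups = [27, 22, 15, 35, 30, 52, 35, 55, 40, 40]  # x
sit_ups = [30, 26, 25, 42, 38, 40, 32, 54, 50, 43]   # y

b, a = regression_line(push_ups, sit_ups)  # y on x: y = b*x + a
d, c = regression_line(sit_ups, push_ups)  # x on y: x = d*y + c
print(f"y on x: y = {b:.3f}x + {a:.1f}")
print(f"x on y: x = {d:.2f}y + {c:.2f}")
```

Swapping the arguments is all that distinguishes the two lines: the first argument is always treated as the variable being conditioned on.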

5. To determine the appropriate regression line to use to predict a value; justify the choice of regression line used and comment on the reliability of the prediction

The choice of the regression line used depends on the context of the situation:
(a) If there is a clear indication that x is the independent variable, we
always use the regression line of y on x for estimation.
(b) Where there is no clear independent variable: to estimate y for a given
value of x, we use the regression line of y on x; to estimate x for a given
value of y, we use the regression line of x on y.
Estimates using regression lines are only reliable if both of the following
conditions are met:
(a) The value of r for the data is close to ±1 (and the scatter diagram also
suggests that there is a strong linear correlation).
(b) The estimation is done within the given range of values of the data.

Using previous Lecture example 5


(iii) Use a suitable line to predict the number of sit-ups a student can do when
he can do 50 push-ups. State, with a reason, whether the predicted value is reliable.
(iv) State, with a reason, whether it is reliable to use the equation in (i) to
predict the number of sit-ups when 60 push-ups are done.
Solution
(iii) Since there is no clear independent variable and we want to estimate y for
a given value of x, we use the regression line of y on x.
When x = 50, y = 0.658(50) + 14.9 = 47.8 ≈ 48. He can do about 48 sit-ups.
Since 50 is within the data range of x, and the graphic calculator gives
r = 0.839, which is close to 1 and indicates a strong linear correlation
between x and y, the estimated value of y should be reliable.
(iv) No. Since 60 is outside the data range [15, 55] of x (extrapolation), it is
not reliable to predict the number of sit-ups from this set of data.
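The two reliability checks in (iii) and (iv) can be sketched in a few lines of Python. The function name `predict_sit_ups` and its default arguments are illustrative assumptions; the coefficients and the data range [15, 55] are the ones quoted in the solution.

```python
def predict_sit_ups(x, slope=0.658, intercept=14.9, x_min=15, x_max=55):
    """Estimate y from the y-on-x line and flag extrapolation."""
    estimate = slope * x + intercept
    within_range = x_min <= x <= x_max  # False means extrapolation
    return estimate, within_range

est_50, ok_50 = predict_sit_ups(50)  # interpolation
est_60, ok_60 = predict_sit_ups(60)  # extrapolation
print(round(est_50, 1), ok_50)
print(round(est_60, 1), ok_60)
```

The range check only covers condition (b); condition (a), a strong linear correlation, still has to be judged from r and the scatter diagram.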

Lecture example 6
An electric fire was switched on in a cold room and the temperature of the
room was noted at 5-minute intervals.

Time, x (in minutes) from switching on fire   0     5     10    15    20    25    30     35     40
Temperature, y (in °C)                        0.4   1.5   3.4   5.5   7.7   9.7   11.7   13.5   15.4

Explain why the regression line of y on x rather than the regression line of x
on y should be used to predict the time that has passed after switching on the
fire if the temperature is 9 °C.

Solution
From the question, x is the controlled variable (y is measured at regular
time intervals of 5 minutes), so y is dependent on x. Thus, the regression
line of y on x should be used.

6. To interpret the value of the slope and y-intercept in the context of the question

Using previous Lecture example 5


Interpret the slope and intercept in the context of the question.
Solution
Slope: For every increase of 10 push-ups, the number of sit-ups is expected
to increase by about 6.6.
y-intercept (0, 15): a student who cannot do any push-ups is estimated to be
able to do about 15 sit-ups. However, as 0 is outside the data range of x,
this estimate is not meaningful.

7. To find the equation of the regression line of y on x or the equation of the regression line of x on y using the data statistics (b and d values)

Tutorial Example 4
The following summarizes the data from 10 sets of lengths (x) and breadths (y), in mm:

$\sum x = 1782$, $\sum y = 1483$, $\sum x^2 = 318086$, $\sum y^2 = 220257$, $\sum xy = 264582$

Find the equation of the regression line of y on x and that of x on y.


Solution
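$\bar{x} = 178.2$; $\bar{y} = 148.3$

\[ b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{\left(\sum x\right)^2}{n}} = \frac{311.4}{533.6} = 0.58358 \]

\[ y - \bar{y} = b(x - \bar{x}) \Rightarrow y = 0.584x + 44.3 \]

Method 1

\[ d = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{\left(\sum y\right)^2}{n}} = \frac{311.4}{328.1} = 0.94910 \]

\[ x - \bar{x} = d(y - \bar{y}) \Rightarrow x = 0.949y + 37.4 \]

Method 2

\[ r^2 = bd \Rightarrow d = \frac{r^2}{b} = \frac{0.7442312^2}{0.58358} = 0.94910 \]

\[ x - \bar{x} = d(y - \bar{y}) \Rightarrow x = 0.949y + 37.4 \]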
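The arithmetic in both methods can be checked with a short Python sketch working directly from the summary statistics. The variable names (`s_xy`, `s_xx`, `s_yy` and so on) are illustrative choices only.

```python
from math import sqrt

n = 10
sum_x, sum_y = 1782, 1483
sum_x2, sum_y2, sum_xy = 318086, 220257, 264582

mean_x, mean_y = sum_x / n, sum_y / n
s_xy = sum_xy - sum_x * sum_y / n   # shared numerator of b and d
s_xx = sum_x2 - sum_x ** 2 / n      # denominator of b
s_yy = sum_y2 - sum_y ** 2 / n      # denominator of d

b = s_xy / s_xx                     # slope of y on x
d = s_xy / s_yy                     # slope of x on y
a = mean_y - b * mean_x             # intercept of y on x
c = mean_x - d * mean_y             # intercept of x on y
r = sqrt(b * d)                     # Method 2 identity r^2 = bd (positive root, since b > 0)

print(f"y = {b:.3f}x + {a:.1f}")
print(f"x = {d:.3f}y + {c:.1f}")
print(f"r = {r:.5f}")
```

Because b and d share the same numerator, they always have the same sign, which is also the sign of r.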

8. To find missing data using regression lines

[Note: $(\bar{x}, \bar{y})$ lies on both regression lines]

Tutorial Example 8
FM2003/II/11OR (modified)
A random sample of eight pairs of values of x and y is used to obtain the
following equations of the regression lines of y on x and x on y respectively.

\[ y = -\frac{7}{10}x + \frac{151}{10}, \qquad x = -\frac{7}{6}y + 20 \]

Seven of the pairs of data are given in the table.

x   10   11   12   11   17   14   19
y   …    …    …    …    …    …    …

Find the eighth pair of values of x and y.
Solution

\[ y = -\frac{7}{10}x + \frac{151}{10} \quad (1) \qquad \text{and} \qquad x = -\frac{7}{6}y + 20 \quad (2) \]

Since $(\bar{x}, \bar{y})$ lies on both lines, solving (1) and (2) gives $\bar{x} = 13$, $\bar{y} = 6$.

\[ \bar{x} = \frac{\sum x}{n} = \frac{94 + x_8}{8} = 13 \Rightarrow x_8 = 10 \]

\[ \bar{y} = \frac{\sum y}{n} = \frac{40 + y_8}{8} = 6 \Rightarrow y_8 = 8 \]

The eighth pair of values is (10, 8).
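The steps of this solution can be sketched with exact fractions in Python. Two assumptions are made here: both slopes are taken as negative, which is what makes the stated means (13, 6) satisfy both lines, and the sums 94 and 40 of the seven known values are taken from the solution itself.

```python
from fractions import Fraction as F

# Regression lines written as y = b*x + a and x = d*y + c
b, a = F(-7, 10), F(151, 10)   # y on x (negative slope assumed)
d, c = F(-7, 6), F(20)         # x on y (negative slope assumed)

# The point of means lies on both lines: substitute (1) into (2)
mean_x = (d * a + c) / (1 - b * d)
mean_y = b * mean_x + a

n = 8
known_sum_x = 94   # sum of the seven known x values (from the solution)
known_sum_y = 40   # sum of the seven known y values (from the solution)

x8 = n * mean_x - known_sum_x
y8 = n * mean_y - known_sum_y
print(mean_x, mean_y, (x8, y8))
```

Using `Fraction` avoids rounding, so the means and the missing pair come out as exact integers.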

9. To perform transformation of data in order to obtain a regression line

Some examples here:

Relationship           Transformation                                          Linear relationship
$y = ax^b$             Take natural logarithm (or logarithm of another base)   $\ln y = \ln a + b \ln x$, i.e. ln y and ln x have a linear relationship.
$y = ae^{bx}$          Take natural logarithm (or logarithm of another base)   $\ln y = \ln a + bx$, i.e. ln y and x have a linear relationship.
$y = \sqrt{ax + b}$    Square both sides                                       $y^2 = ax + b$, i.e. $y^2$ and x have a linear relationship.
$y = \frac{1}{ax+b}$   Take reciprocal                                         $\frac{1}{y} = ax + b$, i.e. $\frac{1}{y}$ and x have a linear relationship.

10. To decide on a model using
(A) scatter diagrams and graphs of various models (i.e. checking concavity and whether the graph is increasing or decreasing),
(B) r values (i.e. observing which model has |r| closer to 1).

(A)
Tutorial Example 7
The data shows the result of an experiment to investigate the relationship
between two variables x and t, where x is dependent on t.

x   22.5   25.0   28.0   30.5   38.0   40.5   42.5   48.0   54.5   55.0   70.0
t   44.0   42.0   33.5   28.0   18.0   13.6   15.0   10.3   9.0    6.3    4.0

(i) Obtain the scatter diagram and comment on any relationship between x and t.
(ii) State, with a reason, which of the following models is more appropriate
to fit the data points:
(a) $x = at^b$, where a > 0 and b < 0
(b) $x = a + bt^2$, where a > 0 and b < 0.
Solution
(i) [Scatter diagram of x (vertical axis, 0 to 80) against t (horizontal axis, 0 to 50)]
x and t are negatively correlated (or: as t increases, x decreases).


(ii)
(a) $x = at^b$, a > 0, b < 0: the graph is concave upwards.
(b) $x = a + bt^2$, a > 0, b < 0: the graph is concave downwards.
From the scatter diagram, the shape of the graph is concave upwards,
therefore the model $x = at^b$, a > 0, b < 0, is more appropriate to fit the
data points.

(B)
Lecture example 8
The following data were collected during an experiment which investigated the
average lifespan of plants, t days, as the pH, y, of the soil in which the plant was
grown varied.

y   4.5    5.2    6.1    6.5    7.0    7.3    8.5    9.5
t   1.14   1.20   1.26   1.29   1.33   1.35   1.42   1.48

State, with a reason, which of the following models is more appropriate to fit
the data points:
(a) $t = a + by$
(b) $t = ay^b$
Solution

$t = ay^b \Rightarrow \ln t = \ln a + b \ln y$
If $t = ay^b$ is a suitable model, ln y and ln t should have a strong linear
correlation.
If $t = a + by$ is a suitable model, y and t should have a strong linear correlation.
Using the TI-84+, the value of r between ln y and ln t is 0.99953 and the value of
r between y and t is 0.9976. Since the value of r between ln y and ln t is closer
to 1, the model $t = ay^b$ should be more suitable than the model
$t = a + by$.
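The comparison of r values can be reproduced without the calculator. The sketch below assumes the second row of the table holds the t values paired column by column with y; the helper `pearson_r` is an illustrative name.

```python
from math import log, sqrt

def pearson_r(u, v):
    """Product moment correlation coefficient of paired data."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)

y = [4.5, 5.2, 6.1, 6.5, 7.0, 7.3, 8.5, 9.5]           # soil pH
t = [1.14, 1.20, 1.26, 1.29, 1.33, 1.35, 1.42, 1.48]   # assumed pairing with y

r_linear = pearson_r(y, t)                                   # model t = a + by
r_power = pearson_r([log(v) for v in y], [log(v) for v in t])  # model t = a*y^b

better = "t = a*y^b" if abs(r_power) > abs(r_linear) else "t = a + by"
print(round(r_linear, 4), round(r_power, 5), better)
```

Comparing absolute values keeps the rule usable when the correlation is negative.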

11. To estimate the values of unknowns in a model through the regression line obtained after transformation

12. To compare the sum of squares of residuals between the least squares regression line and other lines
Continue using Tutorial Example 7 from above.

(iii) For the appropriate model, find the product moment correlation
coefficient for the transformed data. Estimate the values of a and b.

Solution

$x = at^b \Rightarrow \ln x = \ln a + b \ln t$
Generate the transformed data ln x and ln t using the GC, and obtain the required
regression line of ln x on ln t.

$\ln a = 4.9134 \Rightarrow a = 136.1 \approx 136$; $b = -0.453$
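The GC steps can be mirrored in Python as a rough check. This sketch fits the least squares line of ln x on ln t to the Tutorial Example 7 data; it takes b to be negative, consistent with the model's condition b < 0, and all variable names are illustrative.

```python
from math import exp, log, sqrt

x = [22.5, 25.0, 28.0, 30.5, 38.0, 40.5, 42.5, 48.0, 54.5, 55.0, 70.0]
t = [44.0, 42.0, 33.5, 28.0, 18.0, 13.6, 15.0, 10.3, 9.0, 6.3, 4.0]

# Transformed data: ln x = ln a + b ln t
ln_t = [log(v) for v in t]
ln_x = [log(v) for v in x]

n = len(ln_t)
mean_lt, mean_lx = sum(ln_t) / n, sum(ln_x) / n
s_tx = sum((p - mean_lt) * (q - mean_lx) for p, q in zip(ln_t, ln_x))
s_tt = sum((p - mean_lt) ** 2 for p in ln_t)
s_xx = sum((q - mean_lx) ** 2 for q in ln_x)

b = s_tx / s_tt                 # slope of ln x on ln t
ln_a = mean_lx - b * mean_lt    # intercept of ln x on ln t
a = exp(ln_a)
r = s_tx / sqrt(s_tt * s_xx)    # correlation of the transformed data

print(f"ln a = {ln_a:.4f}, a = {a:.1f}, b = {b:.3f}, r = {r:.4f}")
```

The slope of the transformed line is b itself, while the intercept must be exponentiated to recover a.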
In the context of the least squares regression line of y on x, it is the line that
produces the smallest sum of the squares of the residuals, $\sum e_i^2$, compared
with any other straight line, where

e_i (which is known as the residual) = observed y value − predicted y value

for the observation $(x_i, y_i)$.
In the context of the least squares regression line of x on y, it is the line that
produces the smallest sum of the squares of the residuals, $\sum e_i^2$, compared
with any other straight line, where

e_i (which is known as the residual) = observed x value − predicted x value

for the observation $(x_i, y_i)$.
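To illustrate the comparison, the sketch below computes the sum of squared residuals for the least squares line of y on x and for one other line, using the push-up and sit-up data from Lecture example 5. The rival line y = 0.7x + 14 is a hypothetical choice made only for illustration.

```python
push_ups = [27, 22, 15, 35, 30, 52, 35, 55, 40, 40]  # x
sit_ups = [30, 26, 25, 42, 38, 40, 32, 54, 50, 43]   # y

def least_squares(xs, ys):
    """Slope and intercept of the least squares line of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(xs, ys))
    s_xx = sum((p - mean_x) ** 2 for p in xs)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of e_i^2, where e_i = observed y - predicted y."""
    return sum((q - (slope * p + intercept)) ** 2 for p, q in zip(xs, ys))

b, a = least_squares(push_ups, sit_ups)
sse_least_squares = sum_squared_residuals(push_ups, sit_ups, b, a)
sse_other = sum_squared_residuals(push_ups, sit_ups, 0.7, 14.0)  # hypothetical rival line

print(round(sse_least_squares, 1), round(sse_other, 1))
```

Any other choice of slope and intercept should give a sum at least as large as the least squares value, which is the property the definition describes.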
