Sharanya Sudhakar
Prof. Saraswati Bala
Math 130
January 31, 2016
[Figure: scatterplot of Height (vertical axis) against Knee Height (horizontal axis), showing Male and Female data points and a linear fit labeled "Linear (Male)".]
The regression line has the equation ŷ = 22.9 + 2.10x, where x is the knee height and ŷ is the predicted height, and the correlation is r = 0.86.
Predicting the height of three individuals with the given regression line, we have:

Person   Gender   Knee Height   Predicted Height
A        M        24.1          73.51
B        F        22.5          70.15
C        F        18.2          61.12
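The three predictions can be reproduced by substituting each knee height into the regression line. The following is a minimal sketch; the function name `predict_height` and the dictionary layout are illustrative choices, not part of the original report.

```python
# Coefficients taken from the regression line stated in the report.
INTERCEPT = 22.9
SLOPE = 2.10


def predict_height(knee_height: float) -> float:
    """Return the height predicted by y-hat = 22.9 + 2.10 * x."""
    return INTERCEPT + SLOPE * knee_height


# Knee heights of persons A, B and C from the table above.
knee_heights = {"A": 24.1, "B": 22.5, "C": 18.2}

# Round to two decimals, matching the precision used in the table.
predictions = {p: round(predict_height(x), 2) for p, x in knee_heights.items()}

for person, height in predictions.items():
    print(person, height)
```

Note that person C's knee height of 18.2 lies below the smallest knee height in the data (19.2), so that prediction is an extrapolation beyond the observed range.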
In examining the scatterplot we look at the overall form, direction and strength of
the relationship, and finally at outliers or deviations from the pattern. The form of
the current scatterplot is linear, and the line drawn through the points is the
regression line. The direction is positive: as the knee height increases, so does the
height of the person in question. There is hence a positive correlation (r = 0.86).
The strength of the relationship is measured by how close the points lie to the
regression line and is indicated by the value of r. Since r = 0.86 is close to 1, the
relationship is strong; r squared is about 0.74, meaning that roughly 74% of the
variation in height is explained by the linear relationship with knee height.
Outliers would fall well above or below the general pattern. In this case we have a
maximum at Knee Height 24.5 and Height 77 and a minimum at Knee Height 19.2 and
Height 64.5, but no outliers: removing these maximum or minimum values from the data
does not improve the r value. They are clearly part of the pattern, so this data set
is well suited to calculating the regression line.
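The outlier check described above (recompute r with the extreme points removed and compare) can be sketched as follows. The data are the 21 knee-height and height pairs listed in the residual table of this report; the helper `pearson_r` is an illustrative implementation of the standard correlation formula, and the value it gives for these rows need not equal the reported 0.86 exactly, since the report does not say which rows that figure was computed from.

```python
from math import sqrt

# (knee height, height) pairs as listed in the residual table of this report.
data = [
    (21, 67), (22.5, 66.5), (22, 70), (22.5, 70), (22, 68.5), (21, 67),
    (22.5, 67), (21.5, 69.5), (23.5, 73), (21.5, 67), (22.8, 70),
    (22, 71.5), (20, 66), (23, 71.5), (21.78, 69), (19.8, 65),
    (21.2, 67), (22.8, 72.5), (23, 72.5), (24.5, 77), (19.2, 64.5),
]


def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# The maximum and minimum points named in the text.
extremes = {(24.5, 77), (19.2, 64.5)}
trimmed = [p for p in data if p not in extremes]

r_all = pearson_r(data)
r_trimmed = pearson_r(trimmed)

print(f"r with all points:       {r_all:.3f}")
print(f"r without the extremes:  {r_trimmed:.3f}")
print(f"r squared (all points):  {r_all ** 2:.3f}")
```

If removing the extremes does not raise r, that supports treating them as part of the pattern rather than as outliers.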
Residuals: A residual is the difference between the measured y value and the
predicted value ŷ, that is, y - ŷ. The total residual is the sum of these
differences over all data points.
x        y       ŷ         y - ŷ
21       67      67        0
22.5     66.5    70.15     -3.65
22       70      69.1      0.9
22.5     70      70.15     -0.15
22       68.5    69.1      -0.6
21       67      67        0
22.5     67      70.15     -3.15
21.5     69.5    68.05     1.45
23.5     73      72.25     0.75
21.5     67      68.05     -1.05
22.8     70      70.78     -0.78
22       71.5    69.1      2.4
20       66      64.9      1.1
23       71.5    71.2      0.3
21.78    69      68.638    0.362
19.8     65      64.48     0.52
21.2     67      67.42     -0.42
22.8     72.5    70.78     1.72
23       72.5    71.2      1.3
24.5     77      74.35     2.65
19.2     64.5    63.22     1.28
Total Residual             4.93
Mean(y)=69.1
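The residual column, its total and the mean of y can be recomputed directly from the tabulated pairs and the regression line. This is a minimal sketch; the variable names are illustrative and not taken from the report.

```python
# (knee height x, measured height y) pairs from the residual table.
data = [
    (21, 67), (22.5, 66.5), (22, 70), (22.5, 70), (22, 68.5), (21, 67),
    (22.5, 67), (21.5, 69.5), (23.5, 73), (21.5, 67), (22.8, 70),
    (22, 71.5), (20, 66), (23, 71.5), (21.78, 69), (19.8, 65),
    (21.2, 67), (22.8, 72.5), (23, 72.5), (24.5, 77), (19.2, 64.5),
]

# Regression line from the report: y-hat = 22.9 + 2.10 * x.
INTERCEPT = 22.9
SLOPE = 2.10

# Residual for each point: measured height minus predicted height.
residuals = [y - (INTERCEPT + SLOPE * x) for x, y in data]

total_residual = sum(residuals)
mean_y = sum(y for _, y in data) / len(data)

for (x, y), res in zip(data, residuals):
    print(f"x={x:<6} y={y:<5} residual={res:.3f}")

print(f"Total residual: {total_residual:.2f}")
print(f"Mean(y): {mean_y:.1f}")
```

Run on these 21 rows, the script gives a total residual of about 4.93 and a mean height of about 69.1, in agreement with the table.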
Notice that the regression line passes close to the mean of the data and close to the
data point with Knee Height 21.78 and Height 69, where the residual is only 0.362.
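The link between the total residual and the mean point can be made explicit. For any line ŷ = a + bx and n data points, the residuals sum to n times the vertical gap between the mean point and the line. The worked numbers use a = 22.9, b = 2.10 and the 21 tabulated pairs; the means of about 21.909 for x and 69.143 for y are computed from that table rather than quoted from the report.

```latex
\sum_{i=1}^{n} (y_i - \hat{y}_i)
  = \sum_{i=1}^{n} (y_i - a - b x_i)
  = n \left( \bar{y} - a - b \bar{x} \right)

\bar{y} - (a + b \bar{x})
  \approx 69.143 - (22.9 + 2.10 \times 21.909)
  \approx 69.143 - 68.908
  = 0.235
  \approx \frac{4.93}{21}
```

An exact least-squares line would pass through the mean point and make this sum zero. A small positive total such as 4.93 is consistent with coefficients that were rounded before the predictions were made, though that explanation is an assumption and is not stated in the report.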
Conclusion:
The scatterplot lets us do more than plot two variables against each other: we can
measure how strongly they are related and put a value on that relationship. The
correlation describes the trend, and the regression line expresses one variable in
terms of the other, so that we can make a reasonable prediction within the range of
the data used. The stronger the correlation, the more accurate the prediction and the
closer the residuals are to zero. Depending on the correlation value, we can judge how
closely one variable depends on the other, or move on to another pair of variables
that gives a better correlation and so fine-tune our prediction. For example, if Knee
Height vs. Height has the better correlation (as in this case), we base our
prediction on its regression line; but if arm length and height turn out to have a
higher correlation, then measuring arm length might give a more accurate prediction
of height, with residuals nearer zero. The data here do not include individuals with
dwarfism, so this regression line cannot be assumed to hold for them; if it does,
that would have to be shown. Thus the scatterplot not only helps us find the trend of
the data set, it also helps us predict values within the range of the data.