
Contents

What is scatter diagram
What is correlation and Coefficient of Correlation
How to calculate the Coefficient of Correlation
Problem with Correlation
What is Linear regression
What is coefficient of determination

What is scatter diagram
Consider a set of measured values in which each value of a first variable X is paired with a
corresponding value of a second variable Y. The first pair is written as the ordered pair (X1, Y1),
the second as (X2, Y2), and so on up to (Xn, Yn), where “n” is the number of data pairs in the
series. To understand the connection between these sets of values, a scatter plot or scatter diagram
(Fig. 1) is used as a data visualization tool that shows whether a relationship exists in the data
and what the nature of that relationship is. In a scatter diagram, the data pairs (Xn, Yn) are drawn
as dots or crosses in a coordinate plane; the more data pairs there are, the more reliably the
relationship between the variables can be judged. The independent (controlled) variable is displayed
on the X-axis and the dependent (response/measured) variable on the Y-axis.

Figure 1 Scatter Diagram (horizontal X-axis, vertical Y-axis)

What is correlation and Coefficient of Correlation


Correlation examines the relationship between two variables. It describes how a change in one
variable is associated with a change in the other. Correlation is not causation: it does not explain
the reason behind the relationship between the variables, it only measures the strength and direction
of the change they share. It is measured by the Pearson correlation coefficient, also known as the
Pearson Product Moment Correlation (PPMC) or Pearson’s r, denoted “r”, which measures the degree of
the linear relationship between two sets of data, that is, the strength and direction of the
correlation.
The coefficient of correlation measures how close the ordered pairs come to a straight line. The
distance of the points from the line dictates the strength of the linear relationship: the further
the points are from the line, the weaker the relationship, and the closer they are to the line, the
stronger the relationship. The value of “r” always satisfies -1 ≤ r ≤ 1: r=1 indicates a perfect
positive linear relationship, r=-1 indicates a perfect negative linear relationship, and r=0
indicates no linear relationship.
In summary (Fig. 2), the closer the r value is to “1” or “-1”, the stronger the relationship and the
closer the points are to a perfect line. The direction of the slope of the line is indicated by the
sign of the r value: a positive r means a positive slope and a negative r means a negative slope. The
closer the r value is to “0”, the weaker the relationship.

Figure 2 Correlation types (panels: r=-1, r=0, r=1)
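
The three cases in Fig. 2 can be checked numerically. The following is a minimal sketch in Python. The helper name `pearson_r` and the three small data sets are assumptions made up for this illustration and are not data from this report.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: shared deviation of X and Y divided by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]                # illustrative X values (assumed)
rising = [2 * x + 1 for x in xs]      # points exactly on a rising line
falling = [-3 * x + 4 for x in xs]    # points exactly on a falling line
curved = [x * x for x in xs]          # symmetric curve with no linear trend

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)
print(r_rising, r_falling, r_curved)
```

The curved data set is included to show that r=0 only rules out a linear relationship: Y depends entirely on X there, yet r is zero.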


How to calculate the Coefficient of Correlation:
Taking the equation step by step:

$$ r = \frac{\mathrm{Covariance}(X_n, Y_n)}{\sigma_X \sigma_Y} $$

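r, the measurement of the strength of the relationship, is the covariance of X and Y (a measurement
of their linear dependence, that is, how much they move together) divided by the product of the
standard deviations of X and Y. Written in terms of sums over the data, this is:

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left(n\sum X^2 - (\sum X)^2\right)\left(n\sum Y^2 - (\sum Y)^2\right)}} $$

Where:
n = number of ordered pairs in the series.
∑XY = Sum of the products of the pairs.
∑X = Sum of the X values.
∑Y = Sum of the Y values.
∑X² = Sum of the squares of the X values.
∑Y² = Sum of the squares of the Y values.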
Take an example in which the relationship between annual advertisement expenses and annual sales is
examined. The values of the variables are stated in the table below.

Advertisement Expenses ($M)    Annual Sales ($M)
11                             21
9                              15
7                              10
10                             20
8                              14
5                              6
8                              10

Taking the Advertisement Expenses as the variable on the X-axis and the Annual Sales as the variable
on the Y-axis, the scatter graph is plotted as seen in Fig. 3.

Figure 3 Scatter Graph (X-axis: Advertisement Expenses, Y-axis: Annual Sales)
The scatter graph shows a strong positive relationship between the variables, meaning the calculated
“r” is expected to be a positive value close to 1.
         Advertisement    Annual
         Expenses (X)     Sales (Y)    XY     X²     Y²
         11               21           231    121    441
         9                15           135    81     225
         7                10           70     49     100
         10               20           200    100    400
         8                14           112    64     196
         5                6            30     25     36
         8                10           80     64     100
Total    58               96           858    504    1498
Table 1
Substituting the values in the equation:

$$ r = \frac{7(858) - (58 \times 96)}{\sqrt{\left((7 \times 504) - (58)^2\right) \times \left((7 \times 1498) - (96)^2\right)}} = \frac{438}{\sqrt{(164) \times (1270)}} = 0.9597 $$
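
The substitution can be reproduced with a short script. This is a minimal sketch in Python that uses only the seven data pairs from Table 1. The variable names are chosen for this illustration.

```python
import math

# Data from Table 1 (values in $M)
advertisement = [11, 9, 7, 10, 8, 5, 8]   # X
sales = [21, 15, 10, 20, 14, 6, 10]       # Y

n = len(advertisement)
sum_x = sum(advertisement)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertisement, sales))
sum_x2 = sum(x * x for x in advertisement)
sum_y2 = sum(y * y for y in sales)

# r from the sums: n*SumXY - SumX*SumY over the root of the two spread terms
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 58 96 858 504 1498
print(numerator, round(r, 4))                # 438 0.9597
```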

Problem with Correlation


Correlation cannot distinguish between the dependent variable and the independent variable. If the
variables on the axes in the example above were exchanged, the value of r would stay the same, and
the graph would imply that advertisement expenses increase as sales increase, which is not the
intended relationship.
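
This symmetry can be checked directly. The sketch below is in Python. The helper name `pearson_r` is an assumption for this illustration, and the data are the pairs from Table 1. It computes r with the variables in both orders.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the sums used in Table 1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

advertisement = [11, 9, 7, 10, 8, 5, 8]
sales = [21, 15, 10, 20, 14, 6, 10]

r_xy = pearson_r(advertisement, sales)   # advertisement on the X-axis
r_yx = pearson_r(sales, advertisement)   # axes exchanged
print(round(r_xy, 4), round(r_yx, 4))
```

Both calls give the same value, so r by itself carries no information about which variable drives the other.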

What is Linear regression


Linear regression predicts the change in one variable given the change in the other. Given a set of
ordered pairs, if the value of one variable (the criterion variable) is unknown, it can be predicted
from the corresponding known value of the other variable (the predictor variable). Just as the
strength of a correlation is measured by the closeness of the points to a line, the quality of a
regression is measured by the closeness of the points to the regression line (Fig. 4), also called
the line of least squares. It is the line that lies closest to all the plotted points, in the sense
that it minimizes the sum of the squared vertical distances from the points to the line.

Figure 4 Regression Line


This models a system of prediction as the least squares line, a linear equation of the form

$$ Y = aX + b $$

Where:
a = slope of the line
b = Y intercept

$$ a = \frac{(\sum X)(\sum Y) - n\sum XY}{(\sum X)^2 - n\sum X^2} $$

$$ b = \frac{(\sum X)(\sum XY) - (\sum Y)(\sum X^2)}{(\sum X)^2 - n\sum X^2} $$
Using the same data in Table 1:

         Advertisement    Annual
         Expenses (X)     Sales (Y)    X²     Y²     XY
         11               21           121    441    231
         9                15           81     225    135
         7                10           49     100    70
         10               20           100    400    200
         8                14           64     196    112
         5                6            25     36     30
         8                10           64     100    80
Total    58               96           504    1498   858

Substituting in the equations to solve for the a and b values:

$$ a = \frac{(58)(96) - 7 \times 858}{(58)^2 - 7 \times 504} = \frac{-438}{-164} = 2.6707 $$

$$ b = \frac{(58)(858) - (96)(504)}{(58)^2 - 7 \times 504} = \frac{1380}{-164} = -8.4146 $$
Therefore, using

$$ Y = aX + b $$

$$ Y = 2.6707(X) - 8.4146 $$

Taking X=6:

$$ Y = 2.6707(6) - 8.4146 = 7.6 $$
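
The slope, intercept and prediction can be reproduced with the formulas for a and b given in this section. This is a minimal sketch in Python using the Table 1 data. The function name `predict` is an assumption for this illustration.

```python
# Data from Table 1 (values in $M)
advertisement = [11, 9, 7, 10, 8, 5, 8]   # X
sales = [21, 15, 10, 20, 14, 6, 10]       # Y

n = len(advertisement)
sum_x = sum(advertisement)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertisement, sales))
sum_x2 = sum(x * x for x in advertisement)

# Shared denominator of both formulas: (SumX)^2 - n*SumX^2
denominator = sum_x ** 2 - n * sum_x2

a = (sum_x * sum_y - n * sum_xy) / denominator        # slope
b = (sum_x * sum_xy - sum_y * sum_x2) / denominator   # Y intercept

def predict(x):
    """Predicted annual sales ($M) for advertisement expenses x ($M)."""
    return a * x + b

print(round(a, 4), round(b, 4), round(predict(6), 1))  # 2.6707 -8.4146 7.6
```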
This table summarizes the difference between correlation and regression ¹

What is coefficient of determination


The coefficient of determination “r²” measures how well the regression equation fits the data: it is
the percentage of the variation in the Y values that is explained by the X variable. It is the square
of the coefficient of correlation “r” and takes a value from “0” to “1”, or 0% to 100%. The value
measures how strong the linear relationship is, that is, how close the observed Y values are to the
values predicted by the line of least squares. The higher the coefficient and the closer its value is
to 100%, the closer the points lie to the regression line.
Taking the previous data and calculating the coefficient of determination:
r was calculated to be 0.9597

Therefore, r² = (0.9597)² = 0.921, which is 92.1%, meaning 92.1% of the variation in annual sales is
explained by the regression line on advertisement expenses.
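
One way to see what this percentage measures is to compare the variation left over around the regression line with the total variation in Y. The sketch below is in Python and uses the Table 1 data. It relies on the standard identity r² = 1 - SS_residual / SS_total for a least squares line with an intercept. The names `ss_total` and `ss_residual` are assumptions for this illustration.

```python
# Data from Table 1 (values in $M)
advertisement = [11, 9, 7, 10, 8, 5, 8]   # X
sales = [21, 15, 10, 20, 14, 6, 10]       # Y

n = len(advertisement)
sum_x = sum(advertisement)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(advertisement, sales))
sum_x2 = sum(x * x for x in advertisement)

# Least squares line Y = aX + b from the regression formulas
denominator = sum_x ** 2 - n * sum_x2
a = (sum_x * sum_y - n * sum_xy) / denominator
b = (sum_x * sum_xy - sum_y * sum_x2) / denominator

mean_y = sum_y / n
# Total variation of Y around its mean
ss_total = sum((y - mean_y) ** 2 for y in sales)
# Variation of Y left unexplained by the regression line
ss_residual = sum((y - (a * x + b)) ** 2 for x, y in zip(advertisement, sales))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))  # 0.921
```

The result agrees with squaring r = 0.9597, which supports reading r² as the share of the variation in Y that the line accounts for.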

¹ Surbhi, “Correlation and Regression,” Key Differences, May 3, 2016. Web.
