Correlation and Regression

Libeeth B. Guevarra
Department of Mathematics and Natural
Sciences

August 31, 2018

Data Management 1

Correlation is a statistical method used to
determine whether a relationship between
variables exists.
Regression is a statistical method used to
describe the nature of the relationship between
variables, that is, positive or negative, linear or
nonlinear.
A scatter plot is a graph of the ordered pairs
(x, y) of numbers consisting of the independent
variable x and the dependent variable y.

Example
Construct a scatter plot for the data shown for
car rental companies in City A for a recent year.

Company   Cars (in ten thousands)   Revenue (in billions)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F          8.5                      1.5
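
One way to sketch this scatter plot without any plotting library is a rough text grid. The snippet below is only an illustration: it assumes cars is the independent variable x and revenue is the dependent variable y, and the helper name `text_scatter` and the grid size are hypothetical choices, not part of the original example.

```python
# Car rental data from the example: cars in ten thousands, revenue in billions.
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]


def text_scatter(xs, ys, width=40, height=10):
    """Return a rough text scatter plot of the ordered pairs (x, y).

    Assumes the x values are not all equal and the y values are not all equal.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in zip(xs, ys):
        # Scale each coordinate onto the grid.
        col = round((x - x_min) / (x_max - x_min) * width)
        row = round((y - y_min) / (y_max - y_min) * height)
        grid[height - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("|" + "".join(line).rstrip() for line in grid)


plot = text_scatter(cars, revenue)
print(plot)
```

The points rise from lower left to upper right, which suggests a positive relationship between the number of cars and revenue.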


The correlation coefficient measures the
strength and direction of a linear relationship
between two variables.
The range of the correlation coefficient is from
−1 to +1.
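Formula for the Correlation Coefficient r

\[
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]
                \left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
\]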

where n is the number of data pairs.
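
The formula can be sketched as a small Python function. This is a minimal illustration under stated assumptions: the function name `correlation_coefficient` is hypothetical, the two lists are assumed to have equal length, and neither variable is assumed to be constant (otherwise the denominator is zero).

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Compute r from the sums in the formula; n is the number of data pairs."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Points that lie exactly on a rising line give r = +1,
# and points on a falling line give r = -1.
r_up = correlation_coefficient([1, 2, 3], [2, 4, 6])
r_down = correlation_coefficient([1, 2, 3], [6, 4, 2])
print(r_up, r_down)
```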


Example
Compute the correlation coefficient for the data:

Company   Cars (in ten thousands)   Revenue (in billions)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F          8.5                      1.5
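
A worked sketch of this computation follows, assuming x is the number of cars and y is the revenue. The sums are taken over the six data pairs in the table.

```latex
\[
n = 6,\quad \sum x = 153.8,\quad \sum y = 18.7,\quad
\sum xy = 682.77,\quad \sum x^2 = 5859.26,\quad \sum y^2 = 80.67
\]
\[
r = \frac{6(682.77) - (153.8)(18.7)}
         {\sqrt{\left[6(5859.26) - (153.8)^2\right]\left[6(80.67) - (18.7)^2\right]}}
  = \frac{1220.56}{\sqrt{(11501.12)(134.33)}}
  \approx 0.982
\]
```

A value this close to +1 indicates a strong positive linear relationship between the two variables.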


If the value of the correlation coefficient is
significant, the next step is to determine the
equation of the regression line, which is the
data’s line of best fit.
This enables the researcher to see the trend
and make predictions on the basis of the data.
The equation of the least-squares line for the
ordered pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is the
line

y − ȳ = m(x − x̄)

where:
x̄ = mean of variable x
ȳ = mean of variable y
m = slope of the line

\[
m = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n(\bar{x})^2}
\]
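
The point-slope form and the slope formula can be sketched in Python as follows. The function names `least_squares_line` and `predict` are hypothetical, and the x values are assumed not to be all equal (otherwise the slope's denominator is zero).

```python
def least_squares_line(xs, ys):
    """Return (x_bar, y_bar, m) for the line y - y_bar = m(x - x_bar)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    return x_bar, y_bar, m


def predict(x, x_bar, y_bar, m):
    """Evaluate the line y = y_bar + m(x - x_bar) at x."""
    return y_bar + m * (x - x_bar)


# Illustrative points lying exactly on y = 2x + 1.
x_bar, y_bar, m = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
y_at_10 = predict(10, x_bar, y_bar, m)
print(m, y_at_10)
```

Because the line passes through the point of means (x̄, ȳ), only the slope has to be computed from the sums.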


Example
Find the equation of the regression line for the
data:

Company   Cars (in ten thousands)   Revenue (in billions)
A         63.0                      7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F          8.5                      1.5
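
A worked sketch of this example follows, assuming x is the number of cars and y is the revenue. Intermediate values are shown rounded, but the final intercept is computed from unrounded values.

```latex
\[
\bar{x} = \frac{153.8}{6} \approx 25.633, \qquad
\bar{y} = \frac{18.7}{6} \approx 3.117
\]
\[
m = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n(\bar{x})^2}
  \approx \frac{682.77 - 479.343}{5859.26 - 3942.407}
  = \frac{203.427}{1916.853}
  \approx 0.106
\]
\[
y - 3.117 = 0.106\,(x - 25.633)
\quad\Longrightarrow\quad
y \approx 0.396 + 0.106x
\]
```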


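Another formula for the regression line is

\[
y = a + bx
\]

with

\[
a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}
         {n\left(\sum x^2\right) - \left(\sum x\right)^2}
\qquad
b = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}
         {n\left(\sum x^2\right) - \left(\sum x\right)^2}
\]

where a is the y intercept and b is the slope of the line.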
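
As a sketch, the intercept and slope formulas can be applied to the car rental data in Python. The function name `regression_coefficients` is hypothetical, and the choice of cars as x and revenue as y is an assumption. The prediction at x = 40 is only an illustration of how the fitted line might be used.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the line y = a + b*x using the sum formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Car rental data: cars in ten thousands (x), revenue in billions (y).
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]

a, b = regression_coefficients(cars, revenue)
print(round(a, 3), round(b, 3))  # roughly 0.396 and 0.106

# Illustrative prediction for x = 40, i.e. 400,000 cars.
predicted = a + b * 40
print(round(predicted, 2))  # roughly 4.64
```

The slope b here equals the slope m of the point-slope form, so both formulas describe the same line.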


The coefficient of determination is a measure
of the variation of the dependent variable that is
explained by the regression line and the
independent variable. The symbol for the
coefficient of determination is $r^2$. If r = 0.90,
then $r^2$ = 0.81, which is equivalent to 81%. This
result means that 81% of the variation in the
dependent variable is accounted for by the
variations in the independent variable. The rest
of the variation, 0.19, or 19%, is unexplained.
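
The explained and unexplained parts can be checked numerically. The sketch below uses the car rental data (cars as x and revenue as y, an assumption) and the deviation-from-the-mean forms of the slope and of r, which are algebraically equivalent to the sum formulas. The variable names are illustrative.

```python
from math import sqrt

# Car rental data: cars in ten thousands (x), revenue in billions (y).
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]

n = len(cars)
x_bar = sum(cars) / n
y_bar = sum(revenue) / n

# Sums of squared and cross deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(cars, revenue))
s_xx = sum((x - x_bar) ** 2 for x in cars)
s_yy = sum((y - y_bar) ** 2 for y in revenue)

# Least-squares line y = a + b*x and correlation coefficient r.
b = s_xy / s_xx
a = y_bar - b * x_bar
r = s_xy / sqrt(s_xx * s_yy)

# Total variation in y and the part left unexplained by the line.
ss_total = s_yy
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(cars, revenue))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3), round(r ** 2, 3))  # both roughly 0.964
```

For this data, about 96.4% of the variation in revenue is accounted for by the number of cars, and about 3.6% is unexplained.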
