
Coefficient of Correlation

Sometimes we wish to obtain an indicator of the strength of the linear relationship between two
variables Y and X that is independent of their respective scales of measurement. We call this a measure
of the linear correlation between Y and X.

The measure of linear correlation commonly used in Statistics is called the Pearson coefficient of
correlation. This quantity is denoted by the symbol r, and it is computed as follows:


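\[
r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\,\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}}
\]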
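As a minimal sketch, the computation of r can be written in Python. It assumes the usual computational (sums) form of the Pearson formula, r = (nΣXY − ΣXΣY) / (sqrt(nΣX^2 − (ΣX)^2) · sqrt(nΣY^2 − (ΣY)^2)); the function name pearson_r and the toy data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation via the computational (sums) formula.

    Assumes xs and ys have equal length n >= 2 and that neither list is
    constant (otherwise the denominator is zero).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Toy data (hypothetical): y is an exact increasing linear function of x,
# so r should equal 1 up to floating-point rounding.
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))
```

Because r depends only on these five sums and n, it does not change when X or Y is rescaled (for example, dollars to thousands of dollars).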

What does the correlation coefficient tell us?

The correlation coefficient is a sample statistic from a data set of ordered pairs (x,y). It
is a measurement indicating the strength of a linear relationship between x and y.
r is a metric ranging from -1 to 1.
An r value close to 1 indicates that a positive linear relationship exists between x and
y. This means that as x increases, y tends to increase in a linear fashion. (Positively correlated)
An r value close to -1 indicates that a negative linear relationship exists between x and
y. This means that as x increases, y tends to decrease in a linear fashion. (Negatively
correlated)
An r value close to 0 indicates that the relationship (if any) between x and y is not
linear. (Uncorrelated)
Example:
Use the data in the following table to find the correlation between the Average Personal
Income and the Average State and Local Taxes paid per capita.

State          Average Personal Income   Average Taxes per capita
Arkansas                          9724                        771
California                       14344                       1337
Connecticut                      16369                       1434
Illinois                         13728                       1255
Louisiana                        10850                       1051
Mississippi                       8857                        769
New Jersey                       15282                       1457
North Dakota                     12461                       1110
Oregon                           11582                       1229
Oklahoma                         11745                       1123

[Figure: scatterplot of Average Taxes per capita (vertical axis, 0 to 1600) against Average Personal Income (horizontal axis, 0 to 18000)]

As we can observe from the scatterplot, there is a positive linear relationship between these
two variables.

           X      Y         XY        X^2       Y^2
        9724    771    7497204   94556176    594441
       14344   1337   19177928   2.06E+08   1787569
       16369   1434   23473146   2.68E+08   2056356
       13728   1255   17228640   1.88E+08   1575025
       10850   1051   11403350   1.18E+08   1104601
        8857    769    6811033   78446449    591361
       15282   1457   22265874   2.34E+08   2122849
       12461   1110   13831710   1.55E+08   1232100
       11582   1229   14234278   1.34E+08   1510441
       11745   1123   13189635   1.38E+08   1261129
Sum   124942  11536   1.49E+08   1.61E+09  13835872

n = 10
ΣX = 124942
ΣY = 11536
ΣXY = 1.49 × 10^8
ΣX^2 = 1.61 × 10^9
ΣY^2 = 13835872


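\[
r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\,\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}} = 0.94
\]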

There is a high positive linear correlation between Average Personal Income and Average
Taxes per capita. As one increases, so does the other.
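The worked example can be checked with a short Python sketch. It recomputes the column sums from the ten (X, Y) pairs in the table and plugs them into the computational form of the Pearson formula (assumed here, consistent with the sums listed above); the variable names are illustrative.

```python
import math

# (Average Personal Income, Average Taxes per capita) for the ten states
income = [9724, 14344, 16369, 13728, 10850, 8857, 15282, 12461, 11582, 11745]
taxes = [771, 1337, 1434, 1255, 1051, 769, 1457, 1110, 1229, 1123]

n = len(income)
sum_x = sum(income)
sum_y = sum(taxes)
sum_xy = sum(x * y for x, y in zip(income, taxes))
sum_x2 = sum(x * x for x in income)
sum_y2 = sum(y * y for y in taxes)

r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

print(n, sum_x, sum_y, sum_y2)  # exact sums, as in the table
print(round(r, 2))              # 0.94, matching the value in the text
```

Working from the unrounded sums avoids the loss of precision that comes from using the rounded values 1.49 × 10^8 and 1.61 × 10^9 in the numerator and denominator.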
Because r is a sample correlation coefficient, it will almost always differ from zero, even
when the correlation at the population level is zero.
To test whether X and Y are truly correlated at the population level, we carry out the following test.

Test for population correlation coefficient

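1. H0 : ρ = 0 vs Ha : ρ > 0 (or ρ < 0 or ρ ≠ 0), where ρ is the population correlation coefficient
2. Test statistic

\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
\]

With degrees of freedom df = n-2
Where n is the sample size and r is the sample correlation coefficient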
3. Rejection region: As for any t-test
4. Conclusion: If the test statistic is in the rejection region, reject the null hypothesis;
otherwise do not reject the null hypothesis
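
As a sketch of how the steps fit together, the code below applies the test to the worked example (r = 0.94, n = 10). It assumes the standard test statistic t = r·sqrt(n − 2)/sqrt(1 − r^2) and a right-tailed alternative at the 5% level; the critical value 1.860 for 8 degrees of freedom is taken from a standard t table, and the function name is illustrative.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with df = n - 2.

    Assumes -1 < r < 1 and n > 2.
    """
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.94, 10           # values from the income/taxes example
df = n - 2
t = correlation_t_statistic(r, n)

# Assumption: right-tailed test (Ha: rho > 0) at alpha = 0.05.
# 1.860 is the t-table critical value for df = 8; look up the value that
# matches your own alpha, df, and alternative hypothesis.
t_critical = 1.860
reject_h0 = t > t_critical

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

Here t is roughly 7.8, far beyond the critical value, so under these assumptions the null hypothesis of zero population correlation is rejected.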

Practice exercise:
Let X be the list price of a vehicle (in thousands of dollars) and Y be the dealer invoice for the
same vehicle (in thousands of dollars). The table below lists X and Y for a random sample of 5
vehicles.

X Y
32.1 29.8
33.5 31.1
36.1 32.0
44.0 42.1
47.8 42.2

a) Do a scatterplot of X vs Y. What type of correlation appears to exist between X and Y?


b) Compute the sample correlation coefficient.
c) Test whether there is a correlation between these two variables at a 5% level of
significance.