1. Pearson r correlation
Pearson r correlation is widely used in statistics to measure the degree of the relationship between linearly related variables. For example, in the stock market, if we want to measure how closely two commodities are related to each other, Pearson r correlation measures the degree of relationship between them. The following formula is used to calculate the Pearson r correlation:

$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$

Where:
r = Pearson r correlation coefficient
N = number of values in each data set
Σxy = sum of the products of paired scores
Σx = sum of x scores
Σy = sum of y scores
Σx² = sum of squared x scores
Σy² = sum of squared y scores
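The sums defined above map directly to code. This is a minimal Python sketch that assumes the standard computational ("sums") form of the Pearson formula. The function name `pearson_r` is hypothetical. The sample data are the age and glucose values from the example that follows.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the sums formula:
    r = (N*Σxy - Σx*Σy) / sqrt((N*Σx² - (Σx)²) * (N*Σy² - (Σy)²))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Σxy: sum of paired products
    sum_x2 = sum(x * x for x in xs)              # Σx²
    sum_y2 = sum(y * y for y in ys)              # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

age = [43, 21, 25, 42, 57, 59]       # X
glucose = [99, 65, 79, 75, 87, 81]   # Y
print(round(pearson_r(age, glucose), 4))  # 0.5298
```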
Questions a Pearson correlation answers
SUBJECT  AGE X  GLUCOSE LEVEL Y
1        43     99
2        21     65
3        25     79
4        42     75
5        57     87
6        59     81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY    X²    Y²
1        43     99
2        21     65
3        25     79
4        42     75
5        57     87
6        59     81
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY    X²    Y²
1        43     99               4257
2        21     65               1365
3        25     79               1975
4        42     75               3150
5        57     87               4959
6        59     81               4779
Step 3: Take the square of the numbers in the x column, and put the result in the x² column.
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY    X²    Y²
1        43     99               4257  1849
2        21     65               1365  441
3        25     79               1975  625
4        42     75               3150  1764
5        57     87               4959  3249
6        59     81               4779  3481
Step 4: Take the square of the numbers in the y column, and put the result in the y² column.
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY    X²    Y²
1        43     99               4257  1849  9801
2        21     65               1365  441   4225
3        25     79               1975  625   6241
4        42     75               3150  1764  5625
5        57     87               4959  3249  7569
6        59     81               4779  3481  6561
Step 5: Add up all of the numbers in the columns and put the result at the bottom. The Greek letter sigma (Σ) is a short way of saying "sum of".
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY    X²    Y²
1        43     99               4257  1849  9801
2        21     65               1365  441   4225
3        25     79               1975  625   6241
4        42     75               3150  1764  5625
5        57     87               4959  3249  7569
6        59     81               4779  3481  6561
Σ        247    486              20485 11409 40022
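Steps 1 to 5 are mechanical, so a short script can check them. The following minimal Python sketch builds the xy, x², and y² columns and the bottom-row totals for the age and glucose data. The variable names are illustrative and do not come from the original notes.

```python
# Steps 1-5: add xy, x² and y² columns, then total every column.
age = [43, 21, 25, 42, 57, 59]       # X
glucose = [99, 65, 79, 75, 87, 81]   # Y

rows = [
    (subject, x, y, x * y, x * x, y * y)
    for subject, (x, y) in enumerate(zip(age, glucose), start=1)
]

# Column totals Σx, Σy, Σxy, Σx², Σy² (subject numbers are not summed).
totals = tuple(sum(row[i] for row in rows) for i in range(1, 6))

line = "{:<9}{:<7}{:<17}{:<6}{:<6}{}"
print(line.format("SUBJECT", "AGE X", "GLUCOSE LEVEL Y", "XY", "X2", "Y2"))
for row in rows:
    print(line.format(*row))
print(line.format("TOTAL", *totals))
```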
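Step 6: Use the following correlation coefficient formula:

$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$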
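The Step 5 totals are N = 6, Σxy = 20485, Σx = 247, Σy = 486, Σx² = 11409, and Σy² = 40022. Substituting them into the Pearson formula gives the following (arithmetic rounded to four decimal places):

$$r = \frac{6(20485) - (247)(486)}{\sqrt{\left[6(11409) - 247^2\right]\left[6(40022) - 486^2\right]}} = \frac{122910 - 120042}{\sqrt{(68454 - 61009)(240132 - 236196)}} = \frac{2868}{\sqrt{7445 \times 3936}} \approx \frac{2868}{5413.27} \approx 0.5298$$

So r ≈ 0.53, a positive correlation between age and glucose level for these six subjects.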
2. Kendall's Tau
Kendall's Tau is a non-parametric measure of the relationship between columns of ranked data. The Tau correlation coefficient returns a value from 0 to 1, where:
0 is no relationship,
1 is a perfect relationship.
A quirk of this test is that it can also produce negative values (i.e., from -1 to 0). Unlike a linear graph, a negative relationship doesn't mean much with ranked columns (other than that you perhaps switched the columns around), so just remove the negative sign when you're interpreting Tau.
Several versions of Tau exist.
Tau-A and Tau-B are usually used for square tables (with equal numbers of rows and columns). Tau-B adjusts for tied ranks.
Tau-C is usually used for rectangular tables. For square tables, Tau-B and Tau-C are essentially the same.
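Tau-B adjusts for tied ranks through its denominator. This is a minimal Python sketch that assumes the standard Tau-B definition, which goes beyond what is stated above. The function name `tau_b` and the example data are hypothetical.

```python
import math
from collections import Counter

def tau_b(xs, ys):
    """Kendall's Tau-B for two equal-length columns that may contain ties.

    tau_b = (Nc - Nd) / sqrt((n0 - n1) * (n0 - n2)) where
      n0 = n(n - 1)/2, the number of pairs,
      n1 = sum of t(t - 1)/2 over each group of t tied values in xs,
      n2 = the same quantity for ys.
    """
    n = len(xs)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                nc += 1   # same order in both columns: concordant
            elif s < 0:
                nd += 1   # opposite order: discordant
            # s == 0 means a tie in xs or ys; the pair counts as neither
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(xs).values())
    n2 = sum(t * (t - 1) // 2 for t in Counter(ys).values())
    denominator = math.sqrt((n0 - n1) * (n0 - n2))
    if denominator == 0:
        raise ValueError("Tau-B is undefined when a column is entirely tied")
    return (nc - nd) / denominator

# Hypothetical data with one tie in the first column.
print(round(tau_b([1, 2, 2, 3], [1, 3, 2, 4]), 4))  # 0.9129
```

For the same data, the simple (Nc - Nd) / (Nc + Nd) gives 5 / 5 = 1.0, because the tied pair drops out of both counts. Tau-B keeps the tie in view through the n1 term. For comparison, SciPy's `scipy.stats.kendalltau` returns Tau-B by default.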
Most statistical packages have Tau-B built in, but you can use the following formula to calculate Tau by hand. It assumes there are no tied ranks, in which case Tau-A and Tau-B are equal:

$$\tau = \frac{N_c - N_d}{N_c + N_d}$$

where Nc is the number of concordant pairs and Nd is the number of discordant pairs.
Example Problem
$$\tau = \frac{C - D}{C + D} = \frac{61 - 5}{61 + 5} = \frac{56}{66} \approx 0.85$$

Here C = 61 concordant pairs and D = 5 discordant pairs. The Tau coefficient is 0.85, suggesting a strong relationship between the rankings.
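The example's raw rankings are not given, so this minimal Python sketch checks the arithmetic from the counts 61 and 5. It also shows how Nc and Nd could be counted from two tie-free rankings. The function names `count_pairs` and `kendall_tau` and the four-item rankings are hypothetical.

```python
def count_pairs(rank_a, rank_b):
    """Return (Nc, Nd) for two rankings of the same items, assuming no ties."""
    nc = nd = 0
    n = len(rank_a)
    for i in range(n):
        for j in range(i + 1, n):
            s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            if s > 0:
                nc += 1   # both rankings order items i and j the same way
            elif s < 0:
                nd += 1   # the rankings disagree about items i and j
    return nc, nd

def kendall_tau(nc, nd):
    """Kendall's Tau = (Nc - Nd) / (Nc + Nd)."""
    return (nc - nd) / (nc + nd)

# The example's counts: 61 concordant, 5 discordant pairs.
print(round(kendall_tau(61, 5), 2))  # 0.85

# Hypothetical rankings of four items by two judges.
print(count_pairs([1, 2, 3, 4], [2, 1, 4, 3]))  # (4, 2)
```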
Perfect Correlation
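Counting how many values lie below each entry in the second column seems very odd when you first do it, but it does work. Just as a thought experiment, suppose both interviewers were in perfect agreement. Every value below a given entry in the second column is then larger than it, so every pair is concordant, Nd = 0, and Tau = Nc / Nc = 1.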
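The "count the values below" procedure can be sketched in Python under one reading of it. The rows are sorted by the first ranking. For each entry in the second column, later entries that are larger count as concordant and smaller ones as discordant. The function name and the eight-candidate data are hypothetical, and no ties are assumed.

```python
def concordant_discordant_by_rows(pairs):
    """Count (Nc, Nd) with the "values below" method; assumes no tied ranks.

    pairs: (first_rank, second_rank) tuples, one per ranked item.
    """
    second = [b for _, b in sorted(pairs)]  # second column, ordered by first rank
    nc = nd = 0
    for i, value in enumerate(second):
        below = second[i + 1:]
        nc += sum(1 for v in below if v > value)  # larger below: concordant
        nd += sum(1 for v in below if v < value)  # smaller below: discordant
    return nc, nd

# Hypothetical perfect agreement: both interviewers rank 8 candidates identically.
ranks = [(r, r) for r in range(1, 9)]
nc, nd = concordant_discordant_by_rows(ranks)
print(nc, nd, (nc - nd) / (nc + nd))  # 28 0 1.0
```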
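Spearman Rank Correlation: Worked Example
There are n = 9 pairs of ranks, and their squared rank differences sum to Σd² = 12:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 12}{9(81 - 1)} = 1 - \frac{72}{720} = 1 - 0.1 = 0.9$$

The Spearman Rank Correlation for this set of data is 0.9.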
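The Spearman formula used above needs only n and Σd². This minimal Python sketch assumes tie-free ranks, where the formula is exact. The function names `spearman_from_d2` and `spearman_no_ties` are hypothetical.

```python
def spearman_from_d2(n, sum_d2):
    """Spearman's rho = 1 - 6*Σd² / (n(n² - 1)); exact only without tied ranks."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def spearman_no_ties(rank_a, rank_b):
    """Compute Σd² from two tie-free rankings, then apply the formula."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return spearman_from_d2(n, sum_d2)

print(spearman_from_d2(9, 12))  # 0.9
```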
Spearman Rank Correlation: Worked Example (Tied Ranks)
With n = 9 and Σd² = 14.5:

$$\rho = 1 - \frac{6 \times 14.5}{9(81 - 1)} = 1 - \frac{87}{720} = 1 - 0.120833\ldots \approx 0.879$$
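When ranks are tied, the usual convention is to give tied values the average of the ranks they occupy. The 6Σd² formula is then only an approximation, and the exact coefficient is Pearson's r computed on the ranks. The following minimal Python sketch compares the two on hypothetical scores with one tie. The helper names `average_ranks` and `pearson` are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values equal to values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Plain Pearson r (deviation-from-mean form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores with one tie in the first column.
ranks_a = average_ranks([10, 20, 20, 30])   # [1.0, 2.5, 2.5, 4.0]
ranks_b = average_ranks([1, 3, 2, 4])       # [1.0, 3.0, 2.0, 4.0]

n = len(ranks_a)
sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
shortcut = 1 - 6 * sum_d2 / (n * (n * n - 1))   # 6Σd² formula
exact = pearson(ranks_a, ranks_b)                 # Pearson r on the ranks
print(round(shortcut, 4), round(exact, 4))        # 0.95 0.9487
```

With few ties, the two values stay close, which is why the shortcut is often used by hand.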