
SPEARMAN’S CORRELATION

STUDENT NAME: EZAD JUFERI
LECTURER: X
SUBJECT: X
SUBMISSION DATE: X

SPEARMAN’S CORRELATION
1 Introduction
Correlation is a bivariate analysis that measures the extent of association between two variables in terms of the strength and direction of their relationship. For example, we might ask whether examination scores in mathematics are related to examination scores in physics for form one students, or whether blood cholesterol levels are related to nicotine excretion in male smokers. A quantitative measure of the strength of a correlation is the correlation coefficient, which expresses how closely a change in the magnitude of one variable is accompanied by a change in the magnitude of the other. The value of the correlation coefficient varies between +1 and -1. A value of ±1 indicates a perfect degree of association between the two variables; as the coefficient approaches 0, the relationship between the two variables weakens. The direction of the relationship is indicated by the sign of the coefficient: a + sign indicates a positive relationship and a - sign a negative one. Four common types of correlation are the Pearson correlation, the Spearman correlation, the Kendall rank correlation, and the point-biserial correlation.

Before learning about Spearman’s correlation it is important to understand Pearson’s correlation, which is a statistical measure of the strength of a linear relationship between paired data. Its calculation, and any subsequent significance testing, requires the data to meet the following assumptions:
1) measured on an interval or ratio scale; 2) linearly related; 3) bivariate normally distributed.
For a better understanding, it is valuable to know the distinction between parametric and non-parametric techniques. The word ‘parametric’ comes from ‘parameter’, a characteristic of a population. Parametric tests (e.g. t-tests, Pearson’s correlation) make assumptions about the population from which the sample has been drawn, often including assumptions about the shape of the population distribution (e.g. normally distributed). If the data do not meet the assumptions of Pearson’s correlation, for example because they are not normally distributed or are ordinal, the Spearman rank correlation coefficient, the non-parametric counterpart of the Pearson correlation coefficient, can be used instead. Non-parametric techniques are sometimes referred to as distribution-free tests because they make no assumptions about the underlying population distribution. However, non-parametric statistics tend to be less sensitive than parametric ones, and may therefore fail to detect differences between groups that actually exist. Non-parametric techniques are well suited to data measured on nominal (categorical) or ordinal (ranked) scales, provided the data constitute a random sample and the two members of each of the n pairs are measurements taken on the same subject. They are also useful with small samples and when the data do not meet the stringent assumptions of the parametric techniques.

2 Computing the coefficient


As mentioned earlier, the Spearman rank correlation test does not assume normality, unlike Pearson’s correlation, and is the appropriate correlation analysis when the variables are measured on a scale that is at least ordinal. The notation used for the sample correlation is rs. The rs calculated from a sample of data is an estimate of ρs, the Spearman rank correlation coefficient that would be obtained from the entire population from which that sample came; ρs is sometimes referred to as Spearman’s rho. The coefficient rs is obtained by subjecting the ranks, rather than the raw measurements, to the formulae below (Figure 1; Figure 2 for tied ranks). The measurements may be ranked from high to low (rank 1 indicates the highest value, rank 2 the next highest, and so on, with rank n the lowest) or from low to high (rank 1 denotes the lowest and rank n the highest). The formula for the Spearman rank correlation coefficient when there are no tied ranks is:

rs = 1 − (6 Σ di²) / (n(n² − 1))

Figure 1: Spearman rank correlation coefficient formula for no tied ranks.

Where,

rs = Spearman rank correlation coefficient
di = the difference between the ranks of the corresponding pair of variables
n = number of observations
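As a minimal sketch (with made-up example data), the no-ties formula can be implemented directly in plain Python:

```python
def spearman_no_ties(x, y):
    """Spearman's rs via 1 - 6*sum(d^2)/(n(n^2-1)); valid only when there are no tied ranks."""
    n = len(x)

    def rank_high_to_low(values):
        # rank 1 = largest value, rank n = smallest (no ties assumed)
        ordered = sorted(values, reverse=True)
        return [ordered.index(v) + 1 for v in values]

    rx, ry = rank_high_to_low(x), rank_high_to_low(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Perfectly concordant ranks give rs = +1; perfectly reversed ranks give rs = -1
print(spearman_no_ties([3, 1, 2], [30, 10, 20]))  # 1.0
print(spearman_no_ties([1, 2, 3], [30, 20, 10]))  # -1.0
```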

Two or more data points with the same value are said to be “tied”; each tied value is assigned the mean of the ranks of the positions it occupies in the ordered data set. For example, in the data set 36, 47, 47, 65, and 71 cm, the second and third values are tied; the mean of ranks 2 and 3 is 2.5, so the ranks of the five data points are 1, 2.5, 2.5, 4, and 5 respectively. When there is a large number of tied ranks, a better option may be to calculate the correlation with another method, such as Kendall’s tau (τ). Kendall’s τ measures the degree to which a relationship is consistently positive or consistently negative and is useful for small data sets with many tied ranks. Another option is to use the full version of Spearman’s formula (Pearson’s r applied to the ranks), which will deal with tied ranks:

rs = Σ (Rxi − R̄x)(Ryi − R̄y) / √[ Σ (Rxi − R̄x)² · Σ (Ryi − R̄y)² ]

Figure 2: Spearman rank correlation coefficient formula for tied ranks (Pearson’s r computed on the ranks).

Where,
i = index of each paired score; Rxi and Ryi are the ranks of the ith pair, and R̄x, R̄y are the mean ranks.
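The mean-rank convention for ties, and the rank-then-Pearson approach, can be sketched with SciPy (assuming `scipy` and `numpy` are available):

```python
import numpy as np
from scipy.stats import rankdata

def spearman_with_ties(x, y):
    """Spearman's rs in the presence of ties: average-rank the data, then take Pearson's r of the ranks."""
    rx, ry = rankdata(x), rankdata(y)  # tied values receive the mean of their ranks
    return float(np.corrcoef(rx, ry)[0, 1])

# The tied data set from the text: 36, 47, 47, 65, 71 cm
print(rankdata([36, 47, 47, 65, 71]))  # [1.  2.5 2.5 4.  5. ]
```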

Spearman’s correlation coefficient is a statistical measure of the strength of a monotonic relationship between paired data. In a sample it is denoted by rs and is by design constrained as follows: −1 ≤ rs ≤ +1. An rs of +1 indicates a perfect positive association of ranks; an rs of zero indicates that the ranks of one variable are independent of the ranks of the other (no association between ranks); and an rs of −1 indicates a perfect negative association of ranks. As with the interpretation of Pearson’s coefficient, the closer rs is to ±1, the stronger the monotonic relationship. Correlation is an effect size, so we can describe the strength of the correlation verbally using the guide in Table 1.

Table 1: Verbal interpretation guide for values of rs.

Prior to analysing data using a Spearman’s correlation, two assumptions must be met for a valid result to be obtained. First, the data should be of ordinal or continuous scale (i.e. interval or ratio scale). In an ordinal scale, the levels of a variable are ordered such that one level can be considered higher or lower than another; however, the magnitude of the difference between levels is not necessarily known. Examples of ordinal variables include Likert scales (e.g., a 7-point scale from "strongly agree" to "strongly disagree") and other ways of ranking categories (e.g., a 5-point scale for measuring job satisfaction, ranging from "most satisfied" to "least satisfied"; a 4-point scale determining how easy it was to navigate a new website, ranging from "very easy" to "very difficult"; or a 3-point scale for how much a customer liked a product, ranging from "not very much" to "yes, a lot"). Examples of continuous variables include blood cholesterol level (measured in mmol/l), test performance (measured from 0 to 100), height (measured in metres and centimetres), temperature (measured in °C), salary (measured in Ringgit), intelligence (measured using IQ score), and so forth. Second, the scores on one variable must be monotonically related to the scores on the other variable.

3 Monotonic function
As Spearman’s rank correlation coefficient is a statistical measure of the strength of a monotonic relationship between paired data, it is necessary to know what a monotonic function is in order to understand Spearman’s correlation. A monotonic function is one that either never decreases or never increases as its independent variable increases. The following graphs illustrate the three cases:
 Monotonically decreasing - as the x variable increases, the y variable never increases;
 Monotonically increasing - as the x variable increases, the y variable never decreases;
 Not monotonic - as the x variable increases, the y variable sometimes decreases and sometimes increases.

Figure 3: Examples of monotonic and non-monotonic relationships: monotonically decreasing, monotonically increasing, not monotonic.

In a monotonic relationship the variables tend to move in the same relative direction, but not necessarily at a constant rate, whereas in a linear relationship the variables move in the same direction at a constant rate. If a scatterplot shows that the relationship between two variables looks monotonic, a Spearman’s correlation can be run to measure the strength and direction of that monotonic relationship.
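This difference matters in practice: for a curved but monotonic relationship, Spearman’s coefficient is 1 while Pearson’s is below 1. A small illustration, assuming SciPy is available (the data here are invented):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# y = x**3 is monotonically increasing but not linear
x = np.arange(1.0, 11.0)
y = x ** 3

rs, _ = spearmanr(x, y)  # rank-based: sees a perfect monotonic relationship
r, _ = pearsonr(x, y)    # linear measure: less than 1 for curved data

print(round(rs, 3))   # 1.0
print(r < 1.0)        # True
```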

4 Testing Hypotheses
The following are examples of the types of research question a Spearman correlation can examine; research questions 1 and 2 are explored in the sample cases discussed below.
1) Is there a statistically significant relationship between students’ performance in a maths test and a physics test?
2) Is there a statistically significant relationship between nicotine excreted and cholesterol level in cigarette smokers?
A common aim in rank correlation analysis is to test the null hypothesis that there is no correlation in the population between the paired ranks, i.e. to test the two-tailed hypotheses H0: ρs = 0 vs H1: ρs ≠ 0. The null hypotheses for the research questions above can thus be expressed as follows:
1) H0: There is no [monotonic] association between maths and physics marks.
2) H0: There is no [monotonic] association between nicotine excreted and blood cholesterol
levels.

Each is tested against the alternative hypothesis, H1, that there is a monotonic correlation between the two variables. Tables of critical values of rs are widely published, and if the calculated rs is greater than the relevant critical value, H0 is rejected.

The use of Σdi², instead of rs, as the test statistic for rank-correlation testing is sometimes called the “Hotelling–Pabst test”. Σdi² is small when rs is large, and H0 is rejected if Σdi² is less than the critical value. Published tables offer critical values for various sample sizes n and significance levels α (refer to the Appendix). If there are tied data, the critical values are only approximate. It should be noted that computer software packages may use approximations that are not as accurate as published tables.
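In software, the test statistic and p-value are reported together; as a sketch, SciPy’s `spearmanr` can be applied to nicotine/cholesterol measurements (the numbers below are hypothetical, chosen only to illustrate the test):

```python
from scipy.stats import spearmanr

# Hypothetical measurements for seven smokers (illustrative only)
nicotine    = [0.4, 1.1, 0.8, 1.6, 2.1, 0.2, 1.3]   # nicotine excreted
cholesterol = [4.9, 5.6, 5.8, 6.0, 6.3, 4.5, 5.1]   # blood cholesterol, mmol/l

rs, p_value = spearmanr(nicotine, cholesterol)

# Two-tailed test of H0: rho_s = 0 against H1: rho_s != 0:
# reject H0 at significance level alpha if p_value < alpha
print(round(rs, 3), p_value < 0.05)
```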

5 Reporting Output
When reporting the output of a Spearman’s correlation, it is good practice to include the following details:
A. An introduction to the analysis you carried out.
B. Information about your sample (including any missing values).
C. The Spearman correlation coefficient, rs.
D. The statistical significance level (i.e., p-value) of the result.

In addition to reporting the results as above, a diagram such as a scatterplot can be used to present the results visually.

6 CASE SAMPLE #1: MANUAL CALCULATION (no tied ranks)
The scores for nine students in physics and mathematics are as follows (Figure 4):
Physics: 35, 23, 47, 17, 10, 43, 9, 6, 28; Mathematics: 30, 33, 45, 23, 8, 49, 12, 4, 31
Compute the students’ ranks in the two subjects and compute the Spearman rank correlation.

Step 1: The hypotheses are formed: H0, that there is no (monotonic) association between maths and physics marks, against the alternative hypothesis H1, that there is a monotonic correlation between the two variables.

Step 2: A rank table is constructed (Table 2). The ranks for each subject can be obtained by ranking manually (order the scores from greatest to smallest; assign rank 1 to the highest score, 2 to the next highest, and so on) or by using Excel’s RANK function.

Table 2

Step 3: A third column, d, is added to the table; d is the difference between the two ranks. For example, the first student’s physics rank is 3 and maths rank is 5, so d = 3 − 5 = −2. In a fourth column, the d values are squared (Table 3).

Table 3

Step 4: All of the d² values are summed to give Σd², the quantity used in the formula: 4 + 4 + 1 + 0 + 1 + 1 + 1 + 0 + 0 = 12.

Step 5: Substituting into the formula gives rs = 1 − (6 × 12) / (9(9² − 1)) = 1 − 72/720 = 0.9, indicating a very strong positive association between the maths and physics marks.
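The manual steps above can be checked with a short script (plain Python, no extra libraries needed):

```python
# Check of Case Sample #1: physics and maths marks for nine students
physics = [35, 23, 47, 17, 10, 43, 9, 6, 28]
maths   = [30, 33, 45, 23, 8, 49, 12, 4, 31]

def ranks_high_to_low(scores):
    # rank 1 = highest score, as in Step 2 (no ties in these data)
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

rp, rm = ranks_high_to_low(physics), ranks_high_to_low(maths)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rp, rm))
n = len(physics)
rs = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print(sum_d2)  # 12, matching Step 4
print(rs)      # 0.9
```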
