
CORRELATION

- Pearson-r
- Spearman-rho
Scatter Diagram
A scatter diagram is a graph that shows the
relationship between two variables
measured on the same individual.
Each individual in the set is represented by
a point in the scatter diagram. The
predictor variable is plotted on the
horizontal axis and the response variable is
plotted on the vertical axis.
Do not connect points when drawing a
scatter diagram.
Scatterplot
A scatterplot is a graph that shows the location
of each data point formed by a pair of X-Y scores.
In a positive linear relationship, as the X
scores increase, the Y scores tend to
increase.
In a negative linear relationship, as the X
scores increase, the Y scores tend to
decrease.
In a nonlinear relationship, as the X scores
increase, the Y scores do not only increase
or only decrease.
Types of relationship
A horizontal scatterplot, with horizontal
regression line, indicates no relationship.
Sloping scatterplots with regression lines
oriented so that Y increases as X increases
indicate a positive linear relationship.
Sloping scatterplots with regression lines
oriented so that Y decreases as X increases
indicate a negative linear relationship.
Scatterplots producing curved regression
lines indicate nonlinear relationships.
Strength of relationship
The strength of a relationship is the extent to which
one value of Y is consistently paired with one and only
one value of X.
The strength of a relationship is also referred to as the
degree of association between the two variables.
The absolute value of the correlation coefficient (the
size of the number we calculate) indicates the strength
of the relationship.
The largest absolute value you can obtain is 1.0 and the
smallest is 0.
The larger the absolute value, the stronger the relationship.
Example of a Positive Correlation
If the correlation is positive, when one variable increases, so does the other.
For example, on average, as height in people
increases, so does weight.
Person   Height (in)   Weight (lbs)
1        60            102
2        62            120
3        63            130
4        65            150
5        65            120
6        68            145
7        69            175
8        70            170
9        72            185
10       74            210
Example of a negative correlation
If the correlation is negative, when one variable increases, the other decreases.
For example, as study time increases, the
number of errors on an exam decreases.
Person   Study time (min)   No. of errors on test
1        90                 25
2        100                28
3        130                20
4        150                20
5        180                15
6        200                12
7        220                13
8        300                10
9        350                8
10       400                6
Example of a zero correlation
If there is no relationship between the two variables, then
as one variable increases, the other variable neither
increases nor decreases. In this case, the correlation is
zero. For example, if we measure the SAT-V scores of
college freshmen and also measure the circumference of
their right big toes, there will be a zero correlation.
What is the correlation
coefficient?
Linear means straight line.
Correlation means co-relation, or the
degree that two variables "go together".
Linear correlation means to go together in a
straight line.
The correlation coefficient is a number
that summarizes the direction and degree
(closeness) of linear relations between two
variables.
The correlation coefficient is also
known as the Pearson Product-
Moment Correlation Coefficient.
The sample value is called r,
and the population value is called
ρ (rho).
The correlation coefficient can take
values from -1 through 0 to +1.
The sign (+ or -) of the correlation
affects its interpretation.
When the correlation is positive (r >
0), as the value of one variable
increases, so does the other.
Correlation & Association
Scale               Example
Interval-interval   Pearson r
Ordinal-ordinal     Spearman rank
Nominal-nominal     Phi, chi-square test of independence
Nominal-interval    Eta
Nominal-ordinal     Theta, Kruskal-Wallis H test
Ordinal-interval    Jaspen's M, F test
Pearson correlation coefficient
The conceptual (definitional) formula of
the correlation coefficient is:
$$r = \frac{\sum xy}{N S_X S_Y} \qquad (1.1)$$
where x and y are deviation scores, that is,
$$x = X - \bar{X}, \qquad y = Y - \bar{Y}$$
and S_X and S_Y are sample standard deviations, that is,
$$S_X = \sqrt{\frac{\sum x^2}{N}}, \qquad S_Y = \sqrt{\frac{\sum y^2}{N}}$$
Another way of defining correlation is:
$$r = \frac{\sum z_X z_Y}{N} \qquad (1.2)$$
where z_X is X in z-score form, z_Y is Y in z-score
form, and Σ and N have their customary meaning.
This says that r is the average cross-product of z-
scores, where
$$z_X = \frac{X - \bar{X}}{S_X}, \qquad z_Y = \frac{Y - \bar{Y}}{S_Y}$$
Sometimes you will see these formulas
written as:
$$r = \frac{\sum xy}{(N - 1) S_X S_Y}$$
and
$$r = \frac{\sum z_X z_Y}{N - 1}$$
These formulas are correct when the
standard deviations used in the
calculations are the estimated population
standard deviations (computed with N - 1) rather than the
sample standard deviations (computed with N).
The main point is to be consistent:
either use N throughout or use N - 1
throughout.
Example:
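As a minimal sketch of the two definitions of r, the Python below applies the deviation-score form and the z-score form (both with N throughout) to the height/weight and study-time/errors data given earlier in these notes. The function names `pearson_r` and `pearson_r_z` are illustrative choices, not part of the original material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Deviation-score form: sum(xy) / (N * S_X * S_Y), using N throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviation scores x
    dy = [y - mean_y for y in ys]          # deviation scores y
    s_x = sqrt(sum(d * d for d in dx) / n)  # sample SD with N in the denominator
    s_y = sqrt(sum(d * d for d in dy) / n)
    return sum(a * b for a, b in zip(dx, dy)) / (n * s_x * s_y)

def pearson_r_z(xs, ys):
    """z-score form: the average cross-product of z-scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    z_x = [(x - mean_x) / s_x for x in xs]
    z_y = [(y - mean_y) / s_y for y in ys]
    return sum(a * b for a, b in zip(z_x, z_y)) / n

# Data copied from the positive and negative correlation examples.
heights = [60, 62, 63, 65, 65, 68, 69, 70, 72, 74]
weights = [102, 120, 130, 150, 120, 145, 175, 170, 185, 210]
study_minutes = [90, 100, 130, 150, 180, 200, 220, 300, 350, 400]
errors = [25, 28, 20, 20, 15, 12, 13, 10, 8, 6]

r_height_weight = pearson_r(heights, weights)
r_study_errors = pearson_r(study_minutes, errors)
print("height vs weight:", round(r_height_weight, 3))
print("study time vs errors:", round(r_study_errors, 3))
```

Both functions should agree up to floating-point rounding, which is one way to see that the two definitions are equivalent as long as N is used consistently.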
Interpretation of Pearson Coefficient
|r|          Interpretation
0.00-0.20    can be ignored
0.20-0.40    low
0.40-0.60    medium
0.60-0.80    high
0.80-1.00    very high
Strength of Pearson r
Coefficient     Strength
0.01-0.09       Trivial
0.10-0.29       Low to moderate
0.30-0.49       Moderate to substantial
0.50-0.69       Substantial to very strong
0.70-0.89       Very strong
0.90 or above   Near perfect
Spearman's Coefficient of Rank Correlation, r_s
Spearman's rank-order correlation coefficient
This correlation coefficient is used when one or more
variables are measured on an ordinal (ranking) scale.
It describes the linear relationship between two variables
measured using ranked scores.
Symbol used: r_s
(The subscript s stands for Spearman;
Charles Spearman invented this one.)
The computational formula for the Spearman
Rank-Order Correlation Coefficient is:
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
N is the number of pairs of ranks.
D is the difference between the two ranks in each
pair.
Running the Spearman Rank-Order Correlation Test
1. Determine the difference between the ranks for each
subject.
2. Square each difference and sum the squares.
3. Calculate the rho statistic.
4. Compare the obtained rho value with the critical value.
Summary of the Spearman Rank-Order Correlation Test
Hypotheses:
H_0: rho = 0
H_a: rho ≠ 0, or rho < 0, or rho > 0
Assumptions:
Subjects are randomly selected.
Observations are rank-ordered.
Decision Rules:
n = number of pairs of ranks
If rho_obt ≥ rho_crit, reject H_0.
If rho_obt < rho_crit, do not reject H_0.
Formula:
$$\rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$$
Sample data
Participant   Observer A: X   Observer B: Y
1             4               3
2             1               2
3             9               8
4             8               6
5             3               5
6             5               4
7             6               7
8             2               1
9             7               9
Solution
Participant   Observer A: X   Observer B: Y   D     D^2
1             4               3               1     1
2             1               2               -1    1
3             9               8               1     1
4             8               6               2     4
5             3               5               -2    4
6             5               4               1     1
7             6               7               -1    1
8             2               1               1     1
9             7               9               -2    4
                                              ΣD^2 = 18
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(18)}{9(9^2 - 1)} = 1 - \frac{108}{720} = 1 - 0.15 = +0.85$$
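The worked solution can be checked with a short sketch that follows the same steps: take the rank differences, square and sum them, then apply the formula. The function name `spearman_rs` is an illustrative choice, and the inputs are assumed to be ranks already, as in the sample data.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman r_s = 1 - 6*sum(D^2) / (N*(N^2 - 1)) for two lists of ranks."""
    n = len(ranks_x)                       # number of pairs of ranks
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Rankings from the sample data (participants 1 to 9).
observer_a = [4, 1, 9, 8, 3, 5, 6, 2, 7]
observer_b = [3, 2, 8, 6, 5, 4, 7, 1, 9]

rs = spearman_rs(observer_a, observer_b)
print(round(rs, 2))  # 0.85
```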
What does the value of r_s tell you?
Spearman's rank correlation coefficient is actually derived from
the product-moment correlation coefficient, such that:
-1 ≤ r_s ≤ 1
r_s = 0.85 means that a child receiving a particular ranking from
one observer tended to receive very close to the same ranking
from the other observer.
r_s = +1 means that the rankings are in complete agreement.
r_s = 0 means that there is no correlation between the rankings.
r_s = -1 means that the rankings are in complete disagreement. In
fact, they are in exact reverse order.
Exercise:
The marks of eight candidates in English and Mathematics are:
Candidate     1    2    3    4    5    6    7    8
English (x)   50   58   35   86   76   43   40   60
Maths (y)     65   72   54   82   32   74   40   53
Rank the results and hence find Spearman's rank
correlation coefficient between the two sets of marks.
Comment on the value obtained.
Solution
English (x)   50   58   35   86   76   43   40   60
Maths (y)     65   72   54   82   32   74   40   53
Rank x        4    5    1    8    7    3    2    6
Rank y        5    6    4    8    1    7    2    3
D             -1   -1   -3   0    6    -4   0    3
D^2           1    1    9    0    36   16   0    9     ΣD^2 = 72
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(72)}{8(8^2 - 1)} = 1 - \frac{432}{504} = 1 - 0.857 = 0.143$$
Spearman's coefficient of rank
correlation is 0.143.
This appears to show a very
weak positive correlation
between the English and
Mathematics rankings.
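The exercise starts from raw marks rather than ranks, so a sketch of the whole procedure needs a ranking step first. The helper `rank_ascending` below is a hypothetical name; it gives rank 1 to the lowest mark, as in the solution table, and assumes there are no tied marks.

```python
def rank_ascending(scores):
    """Rank 1 = lowest score. Assumes all scores are distinct (no ties)."""
    ordered = sorted(scores)
    return [ordered.index(s) + 1 for s in scores]

def spearman_rs(ranks_x, ranks_y):
    """Spearman r_s = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

english = [50, 58, 35, 86, 76, 43, 40, 60]
maths = [65, 72, 54, 82, 32, 74, 40, 53]

rank_x = rank_ascending(english)   # [4, 5, 1, 8, 7, 3, 2, 6]
rank_y = rank_ascending(maths)     # [5, 6, 4, 8, 1, 7, 2, 3]
rs_marks = spearman_rs(rank_x, rank_y)
print(round(rs_marks, 3))  # 0.143
```

Ranking from highest to lowest instead would give the same coefficient, provided both variables are ranked in the same direction.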
Tied Ranks
A tied rank occurs when two participants receive the
same rank on the same variable (e.g., two people are tied
for first on variable X).
Tied ranks result in an incorrect value of r_s.
Resolve (correct) any tied ranks before computing r_s.
To do so, assign each participant at a tied rank the mean
of the ranks that would have been used had there not been a tie.
Example
Runner   Race X   Race Y   To resolve ties                          New Y
A        4        1        Tie uses ranks 1 and 2, becomes 1.5      1.5
B        3        1        Tie uses ranks 1 and 2, becomes 1.5      1.5
C        2        2        Becomes 3rd                              3
D        1        3        Becomes 4th                              4
Runner   Race X   New Y   D      D^2
A        4        1.5     2.5    6.25
B        3        1.5     1.5    2.25
C        2        3       -1     1
D        1        4       -3     9
                          ΣD^2 = 18.5
Solution
$$r_s = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6(18.5)}{4(4^2 - 1)} = 1 - \frac{111}{60} = 1 - 1.85 = -0.85$$
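The tie-resolution rule can be sketched as a ranking helper that gives every tied value the mean of the rank positions it occupies. The name `average_ranks` is a hypothetical choice; the data are the four runners from the example.

```python
def average_ranks(values):
    """Rank values from lowest to highest; tied values share the mean of their positions."""
    ordered = sorted(values)
    mean_rank = {}
    for v in set(values):
        first = ordered.index(v) + 1       # first rank position the value occupies
        count = ordered.count(v)           # how many participants share it
        mean_rank[v] = first + (count - 1) / 2
    return [mean_rank[v] for v in values]

def spearman_rs(ranks_x, ranks_y):
    """Spearman r_s = 1 - 6*sum(D^2) / (N*(N^2 - 1))."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

race_x = [4, 3, 2, 1]          # runners A, B, C, D
race_y = [1, 1, 2, 3]          # A and B are tied for first

new_y = average_ranks(race_y)  # [1.5, 1.5, 3.0, 4.0]
rs_tied = spearman_rs(race_x, new_y)
print(round(rs_tied, 2))  # -0.85
```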
