
STATISTICS
A science that deals with collecting, organizing, analyzing, and interpreting pertinent data.
Imagine this situation:
You are in a class with just four other
students, and the five of you took a 5-
point pop quiz. Today your instructor is
walking around the room, handing back
the quizzes. She stops at your desk and
hands you your paper. Written in bold
black ink on the front is 3/5. How do you
react?
Are you happy with
your score of 3 or
disappointed?
How do you decide?
You might calculate your
percentage correct, realize it is
60%, and be appalled.

But it is more likely that when
deciding how to react to your
performance, you will want
additional information.

What additional information
would you like?

If you are like most students,
you will immediately ask your
neighbors, "Whad'ja get?" and
then ask the instructor, "How
did the class do?"
In other words, the additional
information you want is how your
quiz score compares to other
students' scores. You therefore
understand the importance of
comparing your score to the class
distribution of scores.
Should your score of 3 turn out to be
among the higher grades then you'll be
pleased after all.

On the other hand, if 3 is among the
lowest scores in the class, you won't be
quite so happy.

3 Common Measures of
Central Tendency

Mean of Ungrouped Data



To compute the mean of ungrouped data, simply add the given observations and divide the sum by the number of observations.

$\bar{X} = \dfrac{\sum X_i}{n}$

where: $\sum X_i$ = sum of all observations
       $n$ = total number of observations

Example:

Mean of Grouped Data


Example:
C.I.      Freq. (F_i)   Midpoint (X_i)   F_i X_i
76–79          2            77.5          155
80–83          5            81.5          407.5
84–87          5            85.5          427.5
88–91         11            89.5          984.5
92–95          4            93.5          374
96–99          3            97.5          292.5
            n = 30                  ΣF_i X_i = 2,641
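
The totals in the table can be turned into a grouped mean. The sketch below assumes the usual grouped-data formula, mean = ΣF_i X_i / n, which the table's columns imply but the slide does not spell out; the variable names are illustrative only.

```python
# Class intervals (lower limit, upper limit) and frequencies from the table.
classes = [(76, 79, 2), (80, 83, 5), (84, 87, 5),
           (88, 91, 11), (92, 95, 4), (96, 99, 3)]

# n is the total frequency.
n = sum(freq for _, _, freq in classes)

# Each class contributes frequency * midpoint, where midpoint = (lower + upper) / 2.
total = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)

# Assumed formula: grouped mean = sum(F_i * X_i) / n.
grouped_mean = total / n
print(n, total, round(grouped_mean, 2))
```

Under that assumption the grouped mean is 2,641 / 30, or about 88.03.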
Median of Ungrouped Data

To get the median of ungrouped
data, arrange the given observations
according to magnitude, then
identify the middle value.

Example

Note:
In case of an even number of observations, there are two middle values. The median is then the average of these two values: add them and divide the sum by 2.

Median of Grouped Data

Example:
C.I.      C.B.          Freq.   F<
76–79     75.5–79.5       2      2
80–83     79.5–83.5       5      7
84–87     83.5–87.5       5     12
88–91     87.5–91.5      11     23
92–95     91.5–95.5       4     27
96–99     95.5–99.5       3     30
                      n = 30
Solution:
Identify the median class. The median class is located using n/2. Since n = 30, n/2 = 15.

Locate the computed n/2 in the F< column. Since 15 does not appear in F<, we use the closest higher value, which is 23. Therefore 88–91 is the median class.

Identify the values of the different variables needed in the formula.

L_Me = 87.5 (lower boundary of the median class)
n/2 = 15
cfb = 12 (cumulative frequency before the median class)
f_Me = 11 (frequency of the median class)
c = 4 (class size)
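
The formula itself did not survive on this slide. As a sketch, assuming the standard interpolation formula for the median of grouped data, Median = L_Me + ((n/2 − cfb) / f_Me) · c, the listed values can be combined as follows (the function name is hypothetical):

```python
def grouped_median(lower_boundary, n, cum_freq_before, class_freq, width):
    """Assumed formula: L_Me + ((n/2 - cfb) / f_Me) * c."""
    return lower_boundary + ((n / 2 - cum_freq_before) / class_freq) * width

# Values identified above: L_Me = 87.5, n = 30, cfb = 12, f_Me = 11, c = 4.
median_value = grouped_median(87.5, 30, 12, 11, 4)
print(round(median_value, 2))
```

With these inputs the interpolation gives 87.5 + (3/11)(4), or about 88.59.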

Mode of Ungrouped Data



To get the mode of ungrouped data, identify the observation or observations that occur most frequently.

Example:
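
A minimal sketch of the three ungrouped measures, using Python's standard `statistics` module. The quiz scores are hypothetical and are not taken from the slides.

```python
from statistics import mean, median, multimode

# Hypothetical quiz scores for a class of five (illustration only).
scores = [3, 5, 2, 3, 4]

print(mean(scores))       # sum of the observations divided by n
print(median(scores))     # middle value after sorting
print(multimode(scores))  # list of the most frequent value(s)

# With an even number of observations, the median is the
# average of the two middle values.
print(median([2, 3, 4, 5]))
```

`multimode` is used rather than `mode` because a data set can have more than one mode.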

Mode of Grouped Data


Example:
C.I.      C.B.          Freq.
76–79     75.5–79.5       2
80–83     79.5–83.5       5
84–87     83.5–87.5       5
88–91     87.5–91.5      11
92–95     91.5–95.5       4
96–99     95.5–99.5       3
                      n = 30
Solution:

Identify the Modal class. The modal
class is the class having the highest
frequency. Since the highest frequency
is 11, therefore 88 91 is the modal
class.
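
The slide stops at identifying the modal class. One common textbook way to interpolate a single value inside it is Mode = L_Mo + (d1 / (d1 + d2)) · c, where d1 and d2 are the differences between the modal frequency and the frequencies of the classes before and after it. This formula is an assumption here, since the original does not show which one it uses.

```python
def grouped_mode(lower_boundary, f_modal, f_before, f_after, width):
    """Assumed interpolation formula: L + d1 / (d1 + d2) * c."""
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    return lower_boundary + d1 / (d1 + d2) * width

freqs = [2, 5, 5, 11, 4, 3]
lower_boundaries = [75.5, 79.5, 83.5, 87.5, 91.5, 95.5]

# The modal class is the one with the highest frequency.
# This sketch assumes it is neither the first nor the last class.
i = freqs.index(max(freqs))
mode_value = grouped_mode(lower_boundaries[i], freqs[i],
                          freqs[i - 1], freqs[i + 1], 4)
print(lower_boundaries[i], round(mode_value, 2))
```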

Scores of 5 Boys and 5 Girls in Mathematics

Boys                Girls
Frederick    70     Grace       82
Russel       95     Irish       80
Murphy       60     Abigail     83
Jerome       80     Sherry      81
Tom         100     Kristine    79
Mean:        81     Mean:       81

[Figure: dot plots of the boys' and girls' scores on a common axis from 60 to 100]

Measures of Variability or Dispersion
RANGE:
The difference between the highest and the lowest observation.
R = H − L

Boys: R = 100 − 60 = 40
Girls: R = 83 − 79 = 4

Therefore, the girls are more homogeneous than the boys in their math ability.
Mean Deviation:

The average of the absolute deviations of the observations from the mean.

$MD = \dfrac{\sum |X_i - \bar{X}|}{n}$

BOYS         Xi     |Xi − X̄|
Frederick    70       11
Russel       95       14
Murphy       60       21
Jerome       80        1
Tom         100       19
Mean: 81    ΣXi = 405    Σ|Xi − X̄| = 66
M.D. = 66 / 5 = 13.2

GIRLS        Xi     |Xi − X̄|
Grace        82        1
Irish        80        1
Abigail      83        2
Sherry       81        0
Kristine     79        2
Mean: 81    ΣXi = 405    Σ|Xi − X̄| = 6
M.D. = 6 / 5 = 1.2

MD (boys) = 13.2
MD (girls) = 1.2

- Based on the computed mean deviations, the girls are more homogeneous than the boys.
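
The range and mean deviation figures above can be checked with a few lines of Python. This is only a sketch; the helper names `value_range` and `mean_deviation` are made up for illustration.

```python
boys = [70, 95, 60, 80, 100]
girls = [82, 80, 83, 81, 79]

def value_range(xs):
    # R = H - L: highest minus lowest observation.
    return max(xs) - min(xs)

def mean_deviation(xs):
    # Average absolute deviation of each observation from the mean.
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

for name, xs in (("boys", boys), ("girls", girls)):
    print(name, value_range(xs), mean_deviation(xs))
```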
VARIANCE:
The average of the squared deviations from the mean.

Population Variance

$\sigma^2 = \dfrac{\sum (X_i - \bar{X})^2}{n}$

Sample Variance

$s^2 = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$

BOYS         Xi     Xi − X̄    (Xi − X̄)²
Frederick    70      -11        121
Russel       95       14        196
Murphy       60      -21        441
Jerome       80       -1          1
Tom         100       19        361
Mean: 81    ΣXi = 405    Σ(Xi − X̄)² = 1,120

σ² = 1,120 / 5 = 224        s² = 1,120 / 4 = 280

GIRLS        Xi     Xi − X̄    (Xi − X̄)²
Grace        82        1          1
Irish        80       -1          1
Abigail      83        2          4
Sherry       81        0          0
Kristine     79       -2          4
Mean: 81    ΣXi = 405    Σ(Xi − X̄)² = 10

σ² = 10 / 5 = 2             s² = 10 / 4 = 2.5
BOYS:  σ² = 1,120 / 5 = 224     s² = 1,120 / 4 = 280
GIRLS: σ² = 10 / 5 = 2          s² = 10 / 4 = 2.5

The values of the variance also reveal that the scores of the boys are more spread out than those of the girls.
STANDARD DEVIATION:
The square root of the variance.

BOYS:  σ² = 224, so σ = 14.97     s² = 280, so s = 16.73
GIRLS: σ² = 2, so σ = 1.41        s² = 2.5, so s = 1.58
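
As a cross-check, Python's standard `statistics` module provides both the population versions (`pvariance`, `pstdev`, dividing by n) and the sample versions (`variance`, `stdev`, dividing by n − 1). The sketch below applies them to the same scores.

```python
from statistics import pvariance, variance, pstdev, stdev

boys = [70, 95, 60, 80, 100]
girls = [82, 80, 83, 81, 79]

for name, xs in (("boys", boys), ("girls", girls)):
    print(name,
          pvariance(xs),          # population variance (divide by n)
          variance(xs),           # sample variance (divide by n - 1)
          round(pstdev(xs), 2),   # population standard deviation
          round(stdev(xs), 2))    # sample standard deviation
```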
Let us pause for
a BREAK
HYPOTHESIS TESTING
Inferential Statistics: a formalized body of techniques used to make conclusions about populations based on the results of the study on the samples.

Two areas of Inferential Statistics
1. Estimation
   - Point Estimation
   - Interval Estimation
2. Hypothesis Testing
HYPOTHESIS TESTING
Research Problem: How effective is Minoxidil in
treating male pattern baldness?

Specific Objectives:

1. To estimate the population proportion of patients who
will show new hair growth after being treated with
Minoxidil.

2. To determine whether treatment using Minoxidil is better
than the existing treatment that is known to stimulate hair
growth among 40% of patients with male pattern
baldness.

HYPOTHESIS TESTING
Hypothesis Testing - is the process of making
an inference or generalization about a population
by using data gathered from a sample of the
population

It is an area of statistical inference in which one
evaluates a conjecture about some characteristic
of the parent population based upon the
information contained in the random sample.

Usually the conjecture concerns one of the
unknown parameters of the population.
HYPOTHESIS TESTING
Kinds of Hypothesis:

Scientific Hypothesis is a suggested
explanation or solution to a phenomenon.

Statistical Hypothesis:

It is a guess or prediction made by a researcher
regarding the possible outcome of the study.
It is a claim or a statement about an unknown
parameter.


HYPOTHESIS TESTING
Examples of Scientific Hypothesis:
When Darwin hypothesized that man
evolved from the apes, he was making a
scientific hypothesis.
Similarly, Copernicus was making a scientific hypothesis when he proposed that the earth and the planets in the solar system revolved around the sun in concentric circles with the sun as the center.

HYPOTHESIS TESTING
Examples of Statistical Hypothesis:


1. The correlation between X and Y (in the population) is equal to zero: $\rho_{XY} = 0$

2. There is no significant difference in the mean of the two groups: $\mu_A = \mu_B$

3. The mean IQ of the population is 100: $\mu = 100$
HYPOTHESIS TESTING
Two Types of Statistical Hypothesis
Null hypothesis (H0): It is the hypothesis to be tested, which one hopes to reject. It shows equality or no significant difference, effect, or relationship between variables.

- denoted by Ho.
- the statement being tested.
- it represents what the experimenter doubts to be true.
- must contain the condition of equality and must be written with the symbol =, ≤, or ≥.
HYPOTHESIS TESTING
For the mean, the null hypothesis will be stated in one of these three possible forms:

Ho: μ = some value
Ho: μ ≤ some value
Ho: μ ≥ some value

Note: the value of μ can be obtained from previous studies or from knowledge of the population.

HYPOTHESIS TESTING
Alternative hypothesis (Ha): It generally represents the idea which the researcher wants to prove.
- denoted by Ha
- is the statement that must be true if the null hypothesis is false
- the operational statement of the theory that the experimenter believes to be true and wishes to prove
- is sometimes referred to as the research hypothesis




HYPOTHESIS TESTING
For the mean, the alternative hypothesis will be stated in only one of three possible forms:

Ha: μ ≠ some value
Ha: μ > some value
Ha: μ < some value

Note:
Ha is the opposite of Ho. For example, if Ho is given as μ = 37.0, then it follows that the alternative hypothesis is given by Ha: μ ≠ 37.0.

HYPOTHESIS TESTING
Note About Using ≤ or ≥ in Ho

Even though we sometimes express Ho with the symbol ≤ or ≥, as in Ho: μ ≤ 37.0 or Ho: μ ≥ 37.0, we conduct the test by assuming that μ = 37.0 is true.
We must have a single fixed value for μ so that we can work with a single distribution having a specific mean.

HYPOTHESIS TESTING
Note About Stating Your Own
Hypotheses
If you are conducting a research study
and you want to use a hypothesis test to
support your claim, the claim must be
stated in such a way that it becomes the
alternative hypothesis, so it cannot
contain the condition of equality.

HYPOTHESIS TESTING
Example in Stating your Hypothesis

If you believe that your brand of refrigerator lasts longer than the mean of 14 years for other brands, state the claim that μ > 14, where μ is the mean life of your refrigerators.
Ho: μ = 14 vs. Ha: μ > 14



HYPOTHESIS TESTING
In this context of trying to support the goal
of the research, the alternative hypothesis is
sometimes referred to as the research
hypothesis.
Also in this context, the null hypothesis is
assumed true for the purpose of conducting
the hypothesis test, but it is hoped that the
conclusion will be rejection of the null
hypothesis so that the research hypothesis is
supported.

HYPOTHESIS TESTING
Research Problem:
Comparative performance in Mathematics of
the first-born and the last-born children.

H0: There is no significant difference in the performance in mathematics between the first-born and last-born children.
Ha: There is a significant difference in the performance in mathematics between the first-born and last-born children.

HYPOTHESIS TESTING
Research Problem:
Effectiveness of an Instructional Strategy
H0: There is no significant effect of the modified worked example strategy on the problem solving ability of students in physics.

Ha: The modified worked example strategy will have a significant effect on the problem solving ability of students in physics.
Ha: Students exposed to the modified worked examples are better problem solvers than those exposed to conventional worked examples.
HYPOTHESIS TESTING
Research Problem:
Relationship between emotional intelligence of
students and their level of math anxiety

H0: There is no significant relationship between students' emotional intelligence and their level of math anxiety.
Ha: There is a significant relationship between students' emotional intelligence and their level of math anxiety.

HYPOTHESIS TESTING
REMARK:
If the null hypothesis is rejected, the alternative hypothesis is accepted. Rejecting the null hypothesis means the sample gives enough evidence against it at the chosen level of significance. Acceptance of the null hypothesis, on the other hand, does not mean it is true; it simply means that we do not have enough evidence to reject it.

HYPOTHESIS TESTING
Types of Hypothesis Testing
1. Two-tailed test: It is a non-directional test with the region of rejection lying on both tails of the normal curve. It is used when the alternative hypothesis uses words such as not equal to, significantly different, etc.
[Figure: normal curve with the acceptance region in the center and a rejection region in each tail]
HYPOTHESIS TESTING
Example: A teacher wants to know if there is
significant difference in the performance in
Statistics between his morning and afternoon
classes.

H0: There is no significant difference in the performance in Statistics between his morning and afternoon classes.
Ha: There is a significant difference in the performance in Statistics between his morning and afternoon classes.


HYPOTHESIS TESTING
2. One-tailed test: It is a directional test with the region of rejection lying on either the left or the right tail of the normal curve.
Right directional test. The region of rejection is on the right tail. It is used when the alternative hypothesis uses comparatives such as greater than, higher than, better than, superior to, exceeds, etc.
[Figure: normal curve with the acceptance region on the left and the region of rejection in the right tail]
HYPOTHESIS TESTING
Example:
Research Problem: Performance in
Mathematics of the First-born and Last-born
Children

H0: The first-born children perform equally well in mathematics as the last-born children.

Ha: The first-born children perform better in mathematics than the last-born children.
HYPOTHESIS TESTING
Left directional test. The region of rejection is on the left tail. It is used when the alternative hypothesis uses comparatives such as less than, smaller than, inferior to, lower than, below, etc.
[Figure: normal curve with the rejection region in the left tail and the acceptance region on the right]

HYPOTHESIS TESTING
Example:

Research problem:
It is known that in the school canteen, the average waiting time for a customer to receive and pay for his order is 20 minutes. Additional personnel have been added, and now the management wants to know if the average waiting time has been reduced.

H0: The average waiting time has not been reduced, or the average waiting time is equal to 20 minutes.

Ha: The average waiting time has been reduced, or the average waiting time is less than 20 minutes.


HYPOTHESIS TESTING
What is a test of Significance?

A test of significance is a problem of deciding
between the null and the alternative hypotheses on
the basis of the information contained in a random
sample.

The goal will be to reject Ho in favor of Ha, because
the alternative is the hypothesis that the researcher
believes to be true. If we are successful in rejecting
Ho, we then declare the results to be significant.
HYPOTHESIS TESTING
Two Types of Error in Hypothesis Testing:

TYPE 1 ERROR

The mistake of rejecting the null hypothesis when it is
true.

It is not a miscalculation or a procedural misstep; it is
an actual error that can occur when a rare event
happens by chance.

The probability of rejecting the null hypothesis when it is true is called the significance level (α).

The value of α is typically predetermined, and the very common choices are α = 0.05 and α = 0.01.
HYPOTHESIS TESTING
Examples of Type I Error

1. The mistake of rejecting the null hypothesis that the mean body temperature is 37.0 when that mean is really 37.0.

2. BFA did not allow the release of an effective medicine.

HYPOTHESIS TESTING
Type II Error

The mistake of failing to reject the null hypothesis when it is false.
The symbol β (beta) is used to represent the probability of a type II error.
HYPOTHESIS TESTING
Examples of Type II Errors

1. The mistake of failing to reject the null hypothesis (μ = 37.0) when it is actually false (that is, the mean is not 37.0).

2. BFA allowed the release of an ineffective drug.

HYPOTHESIS TESTING
The possible outcomes of testing a statistical hypothesis are summarized in the following table.

               Accept H0           Reject H0
H0 is true     Correct Decision    Type I Error
H0 is false    Type II Error       Correct Decision
HYPOTHESIS TESTING
Controlling Type I and Type II Errors

- The experimenter is free to determine α. If the test leads to the rejection of Ho, the researcher can then conclude that there is sufficient evidence supporting Ha at the α level of significance.

- Usually, β is unknown because it is hard to calculate. The common solution to this difficulty is to withhold judgment if the test leads to the failure to reject Ho.

- α and β are inversely related. For a fixed sample size n, as α decreases β increases.

- In almost all statistical tests, both α and β can be reduced by increasing the sample size.

- Because of the inverse relationship of α and β, setting a very small α should also be avoided if the researcher cannot afford a very large risk of committing a Type II error.




HYPOTHESIS TESTING
The choice of α usually depends on the consequences associated with making a Type I error.

Common Choices of α     Consequences of Type I Error
0.01 or smaller         Very serious
0.05                    Moderately serious
0.10                    Not too serious
HYPOTHESIS TESTING
Level of Confidence

a.) 0.05 level
If the null hypothesis is true, the test will wrongly reject it only 5% of the time. This corresponds to a 95% level of confidence.

b.) 0.01 level
If the null hypothesis is true, the test will wrongly reject it only 1% of the time. This corresponds to a 99% level of confidence.

Note:
A test is said to be significant if the null hypothesis is rejected at the 0.05 level of significance and is considered highly significant if the null hypothesis is rejected at the 0.01 level of significance.



HYPOTHESIS TESTING
Steps in Testing the Hypothesis

1. State the null and alternative hypotheses.

2. Decide on a level of significance, α.

3. Determine the testing procedure and methods
of analysis (responsibility of the statistician).

4. Decide on the type of data collected and
choose an appropriate test statistic and testing
procedure.


HYPOTHESIS TESTING
Steps in Testing the Hypothesis

5. State the decision rule.

6. Collect the data and compute for the value of the
test statistic using the sample data.

7. If decision rule is based on region of rejection:
Check if the test statistic falls in the region of
rejection. If yes, reject Ho.

If the decision rule is based on the p-value: Determine the p-value. If the p-value is less than or equal to α, reject Ho.

8. Interpret results.

HYPOTHESIS TESTING
The Test Statistic - a statistic computed from the sample
data that is especially sensitive to the differences
between Ho and Ha.

1. The test statistic should tend to take on certain values when Ho
is true and different values when Ha is true.

2. The decision to reject Ho depends on the value of the test statistic

3. A decision rule based on the value of the test statistic:
Reject Ho if the computed value of the test statistic falls
in the region of rejection.

HYPOTHESIS TESTING
Critical Value/s

the value or values that separate the critical
region from the values of the test statistic that
would not lead to rejection of the null hypothesis.

It depends on the nature of the null hypothesis,
the relevant sampling distribution, and the level
of significance.

level of significance (α): the smaller α is, the smaller the region of rejection


HYPOTHESIS TESTING
Test Concerning Means
A. Test for one sample mean
a. When σ is known and n ≥ 30:

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

b. When σ is unknown and n < 30:

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
HYPOTHESIS TESTING
Example 1.

The production manager of a large manufacturing
company estimates that the mean age of his workers is
22.8 years. The treasurer of the firm needs more
accurate employee mean age figure in order to estimate
the cost of an annuity benefit program being considered
for employees. The treasurer takes a random sample of
70 employees and finds that the mean age of the
sampled employees is 26.2 years with a standard
deviation of 4.6 years. At 0.05 level of significance, test
the hypothesis that the mean age of the employees is not
equal to 22.8 years.


HYPOTHESIS TESTING
Solution:
The steps include:
1. Null Hypothesis: H0: μ = 22.8 years
   Alternative Hypothesis: Ha: μ ≠ 22.8 years

Level of Significance: α = 0.05
Test Statistic: Two-tailed test; n = 70

$Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Critical Region: Reject the null hypothesis if z < -1.96 or z > 1.96; otherwise accept it.
Note: z is used since our sample size n = 70 is quite large.

Compute:

$Z_c = \dfrac{26.2 - 22.8}{4.6 / \sqrt{70}} = 6.184$

Decision:
Since z_c = 6.184 exceeds 1.96, the null hypothesis must be rejected. In other words, the difference between x̄ = 26.2 and μ = 22.8 years is too large to attribute to chance, so we can say that the difference is significant. Hence, the mean age of the employees is not 22.8 years.
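
The computation in Example 1 can be reproduced with the standard library. This is a sketch: the function name `one_sample_z` is illustrative, and, as in the slide, the sample standard deviation 4.6 stands in for σ because n is large. `statistics.NormalDist` supplies the two-tailed critical value and a p-value.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sd, n):
    # z = (x_bar - mu) / (sigma / sqrt(n))
    return (sample_mean - mu0) / (sd / sqrt(n))

alpha = 0.05
z = one_sample_z(26.2, 22.8, 4.6, 70)

# Two-tailed test: split alpha between the two tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = abs(z) > z_crit
print(round(z, 3), round(z_crit, 2), reject)
```

Both decision rules from the steps agree here: the statistic falls in the rejection region and the p-value is below α.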
HYPOTHESIS TESTING
Example 2.
A random sample of 20 drinks from a soft-drink machine has an average content of 21.9 deciliters, with a standard deviation of 1.42 deciliters. At the .05 level of significance, test the hypothesis that μ = 22.2 deciliters against the alternative that μ < 22.2. Assume that the distribution of the soft drink contents is normal.


HYPOTHESIS TESTING
Solution:
Null Hypothesis H0: μ = 22.2 deciliters
Alternative Hypothesis H1: μ < 22.2 deciliters
Level of Significance: α = 0.05
Test Statistic (with df = n − 1):

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

Note: Student's t statistic can be used since our sample size n = 20 is small and the soft drink content was assumed to be normally distributed.

Criterion: Reject the null hypothesis if the computed t < -1.729 (the tabular value of t at 20 − 1 = 19 degrees of freedom); otherwise, accept it.

Compute:

$t_c = \dfrac{21.9 - 22.2}{1.42 / \sqrt{20}} = -0.945$

Decision: Since the computed t = -0.945 is greater than -1.729, we accept H0.
There is not enough evidence to conclude that the mean content of the soft drinks is less than 22.2 deciliters. In other words, though there is a numerical difference of 0.3, this difference can be attributed to chance.
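
A short sketch of the same t computation. Python's standard library has no t distribution, so the critical value -1.729 is simply copied from the slide's table lookup; `one_sample_t` is an illustrative name.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, s, n):
    # t = (x_bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom
    return (sample_mean - mu0) / (s / sqrt(n))

t = one_sample_t(21.9, 22.2, 1.42, 20)

# Tabular value for df = 19, one-tailed, alpha = 0.05 (taken from the slide).
t_crit = -1.729

# Left-tailed test: reject Ho only if t falls below the critical value.
reject = t < t_crit
print(round(t, 3), reject)
```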
HYPOTHESIS TESTING
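Test of Differences of Two Means
a. When n1 ≥ 30 or n2 ≥ 30:

$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

b. When n1 < 30 and n2 < 30:

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$

where

$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$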
HYPOTHESIS TESTING
Example 3.

An instructor wishes to determine which of two methods of teaching, A or B, is more effective in teaching certain concepts in Physics. In a class of 36 students, he used method A, and in another class of 40 students, method B. He gave the same final examination to both classes and obtained the following results:

              Method A      Method B
Mean          x̄1 = 78       x̄2 = 70
Std. dev.     s1 = 4        s2 = 6

Is the instructor correct in assuming that method A is more effective than method B? Use the 0.01 level of significance.



HYPOTHESIS TESTING
Solution:

Null Hypothesis H0: μA = μB
Alternative Hypothesis H1: μA > μB

Level of Significance: α = 0.01
Test Statistic:

$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

Critical Region: Reject the null hypothesis if Z_c > 2.326; otherwise state that the difference between the two sample means is not significant.

Compute:

$Z_c = \dfrac{78 - 70}{\sqrt{\dfrac{(4)^2}{36} + \dfrac{(6)^2}{40}}} = 6.899$

Decision: Since Z_c = 6.899 is greater than 2.326, the null hypothesis must be rejected.
Conclude that the instructor's claim is correct: method A is more effective than method B.
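
Example 3 can be checked the same way. In this sketch the sample standard deviations stand in for σ1 and σ2, as the slide does, and `two_sample_z` is an illustrative name. The unrounded statistic is about 6.8995, which the slide reports as 6.899.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    # z = (x1_bar - x2_bar) / sqrt(sd1^2 / n1 + sd2^2 / n2)
    return (mean1 - mean2) / sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

alpha = 0.01
z = two_sample_z(78, 70, 4, 6, 36, 40)

# Right-tailed test: all of alpha sits in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)

reject = z > z_crit
print(z, z_crit, reject)
```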

HYPOTHESIS TESTING
Example:
The heights of two sections of female students were compared. The first section, with only 24 students, recorded an average height of 163.5 cm with a standard deviation of 6.9 cm. The other section, with 20 students, recorded an average height of 160.3 cm with a standard deviation of 5.5 cm. Is there reason to believe that the female students of the first section are taller than those of the second section? Use the 0.05 level of significance.

HYPOTHESIS TESTING
Solution:

Null Hypothesis H0: μ1 = μ2
Alternative Hypothesis H1: μ1 > μ2

Level of Significance: α = 0.05
Test Statistic (with df = n1 + n2 − 2):

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Criterion: Reject the null hypothesis if t_c > 1.68 for 24 + 20 − 2 = 42 degrees of freedom; otherwise state that the difference between the two sample means is not significant.

Compute:

$s_p = \sqrt{\dfrac{(24 - 1)(6.9)^2 + (20 - 1)(5.5)^2}{24 + 20 - 2}} = 6.31$

$t_c = \dfrac{163.5 - 160.3}{6.31\sqrt{\dfrac{1}{24} + \dfrac{1}{20}}} = 1.675$

Decision: Since t_c = 1.675 does not exceed the critical value of 1.68, the null hypothesis must be accepted; in other words, we cannot conclude that the female students of the first section are taller than those of the other class.
The difference in the mean heights is not significant.
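
A sketch of the pooled-variance t computation for the heights example. The function names are illustrative, and the critical value 1.68 is taken from the slide. Carrying s_p unrounded gives t ≈ 1.676 rather than the slide's 1.675, which uses s_p rounded to 6.31; the decision is the same either way.

```python
from math import sqrt

def pooled_sd(s1, s2, n1, n2):
    # s_p = sqrt(((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2))
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_t(mean1, mean2, s1, s2, n1, n2):
    # t = (x1_bar - x2_bar) / (s_p * sqrt(1/n1 + 1/n2))
    sp = pooled_sd(s1, s2, n1, n2)
    return (mean1 - mean2) / (sp * sqrt(1 / n1 + 1 / n2))

sp = pooled_sd(6.9, 5.5, 24, 20)
t = pooled_t(163.5, 160.3, 6.9, 5.5, 24, 20)

# Tabular value for df = 42, one-tailed, alpha = 0.05 (as given on the slide).
t_crit = 1.68

reject = t > t_crit
print(round(sp, 2), round(t, 3), reject)
```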

CORRELATION
Definition:
Correlation is a method used to measure the
strength of relationship between two variables that
tend to vary together in a consistent way. The nature
and degree of relationship is indicated by a coefficient,
designated by letter r.
By direct causal relations, we mean that if X and Y
are correlated, then X is partly the cause of Y or Y is
partly the cause of X.
CORRELATION
Examples of correlation:

There is correlation between
- income and savings
- the extent of fatigue and performance
in speed test.

There is no correlation between
- weight and IQ, or
- shoe size and mathematical ability

CORRELATION
The Scatter Diagram

One can usually and roughly estimate if a
relationship exists between two variables
by constructing a scatter diagram. This is
done by plotting the point corresponding to
each observation on a rectangular
coordinate system.
Scatter Plot Examples
[Figure: example scatter plots of y against x showing linear and curvilinear relationships, strong and weak relationships, and no relationship]
CORRELATION
Examples:
1. Consider the following marks of five students in English
and Mathematics. Notice that for each student, there
corresponds two scores (paired observations).

Student English (X) Mathematics (Y)
A 55 69
B 64 85
C 96 99
D 44 52
E 83 89
CORRELATION
2. The following data are the life spans of nine husbands and wives
randomly selected from a certain community. Draw a scatter diagram
and decide whether a relationship exists between their ages.

Couple Age of Husband (X) Age of Wife (Y)
1 65 90
2 72 95
3 68 45
4 71 51
5 75 50
6 67 62
7 76 45
8 73 63
9 71 83
CORRELATION
Types of Correlation

1. A positive correlation exists when high values in one variable are associated with high values in the second variable. This is also true when low values in one variable are associated with low values in the other. Thus, a direct relationship exists between positively correlated variables. Also, in a positive correlation, the points on the scatter diagram closely follow a straight line rising to the right.
Examples:
problem solving ability and reading comprehension
income and savings
income and expenses
CORRELATION
Types of Correlation:

2. A negative correlation exists when high values
in one variable are associated with low values
in the second variable, and vice versa. Here,
points on the scatter diagram closely follow a
straight line falling to the right.

Example:
pressure and volume (at constant temperature)
CORRELATION
Types of Correlation
3. A zero correlation exists when scores in one variable are associated with neither systematically high nor systematically low scores in the other variable. The points on the scatter diagram are spread in a random manner when this relationship exists.

Examples:
sex and IQ
athletic ability and mental ability
shoe size and mathematical performance
CORRELATION
Note:

Correlational descriptions are descriptive and they
may not be sufficient to explain the relationship
between two variables.

Correlation coefficient (r) is a numerical measure
of the linear relationship between two variables. Its
values range from -1 to +1.
Correlation Coefficient
The population correlation coefficient ρ (rho) measures the strength of the association between the variables.

The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations.
Features of ρ and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the
negative linear relationship
The closer to 1, the stronger the positive
linear relationship
The closer to 0, the weaker the linear
relationship
[Figure: Examples of Approximate r Values, with scatter plots of y against x for r = -1, r = -.6, r = 0, r = +.3, and r = +1]
CORRELATION
Correlational Tests:

1. Pearson Product Moment Correlation
It measures the degree of relationship between two variables measured on at least an interval scale.

2. Spearman's Rank Correlation Coefficient
It is the measure of the correlation between two ordinal variables.

3. Phi-Coefficient
The phi coefficient determines the degree of relationship between two variables which are both nominal dichotomous, like sex (male-female) and marital status (married-unmarried).

4. Point Biserial
It measures the correlation between an interval variable and a nominal dichotomous variable.
CORRELATION
Interpretation of the Correlation Coefficient
Once the value of r is found significant, the rule of
thumb for assessing the degree of relationship between
the two quantitative variables can be interpreted using
the following criteria:

r-value Verbal Description
0.00-0.29 Little or weak positive (negative) correlation
0.30-0.49 Low positive (negative) correlation
0.50-0.69 Moderate positive (negative) correlation
0.70-0.89 High positive (negative) correlation
0.90-1.00 Very High or strong positive (negative)
correlation
CORRELATION
Test of significance for r

When r is calculated on the basis of sample data,
we may get a strong positive or negative correlation
purely by chance, even though there is actually no
linear relationship whatever between the two variables
in the population from which the sample came. The
value we obtain for r is only an estimate of a corresponding parameter, the population correlation coefficient (ρ). What r measures for a sample, ρ measures for a population.
CORRELATION
1. T-distribution with n − 2 degrees of freedom

$t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$

This is used to test the significance of r arising from Pearson, Spearman, and Point Biserial.

Note: Reject the null hypothesis of no correlation at the α level of significance if the computed value of t exceeds the critical value $t_{\alpha}$ for a one-tailed test or $t_{\alpha/2}$ for a two-tailed test; otherwise we accept the null hypothesis.
CORRELATION
2. The inference about the phi coefficient uses

$Z = r_{\phi}\sqrt{n - 1}$
CORRELATION
NOTE:

The coefficient of determination, the square of the coefficient of correlation, $r^2$, is the proportion of the total variation in the dependent variable (y) that can be attributed to the relationship with the independent variable (x).




Calculating the Correlation Coefficient

Sample correlation coefficient:

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}}$

or the algebraic equivalent:

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
Sample Calculation

Tree Height (y)   Trunk Diameter (x)     xy       y²       x²
      35                  8              280     1225      64
      49                  9              441     2401      81
      27                  7              189      729      49
      33                  6              198     1089      36
      60                 13              780     3600     169
      21                  7              147      441      49
      45                 11              495     2025     121
      51                 12              612     2601     144
   Σ = 321             Σ = 73         Σ = 3142  Σ = 14111  Σ = 713

[Figure: scatter plot of Tree Height, y (0 to 70) against Trunk Diameter, x (0 to 14)]

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} = \dfrac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886$

r = 0.886 indicates a high positive linear association between x and y.
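
The computational formula can be applied directly to the tree data. A minimal sketch, with `pearson_r` as an illustrative helper name:

```python
from math import sqrt

heights = [35, 49, 27, 33, 60, 21, 45, 51]   # y, tree height
diameters = [8, 9, 7, 6, 13, 7, 11, 12]      # x, trunk diameter

def pearson_r(x, y):
    # r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

r = pearson_r(diameters, heights)
print(round(r, 3))
```

Because the formula is symmetric in x and y, swapping the two arguments gives the same r.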
Significance Test for Correlation
Hypotheses
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)

Test statistic

$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$ (with n − 2 degrees of freedom)
Solution
Is there evidence of a linear relationship between tree height and trunk diameter at the 0.05 level of significance?
H0: ρ = 0 (no correlation)
H1: ρ ≠ 0 (correlation exists)
α = 0.05, df = 8 − 2 = 6

$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{0.886}{\sqrt{\dfrac{1 - 0.886^2}{8 - 2}}} = 4.68$
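Solution
Decision: Reject H0, since the computed t = 4.68 exceeds the critical value t_{α/2} = 2.4469 (d.f. = 8 − 2 = 6, with α/2 = .025 in each tail).

Conclusion: There is evidence of a linear relationship at the 5% level of significance.

[Figure: t distribution with rejection regions below -2.4469 and above 2.4469; the computed t = 4.68 falls in the upper rejection region]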
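
The significance test for r can be sketched in a few lines. The rounded r = 0.886 and the critical value 2.4469 are both taken from the slides; `t_for_r` is an illustrative name.

```python
from math import sqrt

def t_for_r(r, n):
    # t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom
    return r / sqrt((1 - r ** 2) / (n - 2))

t = t_for_r(0.886, 8)

# Two-tailed critical value for df = 6, alpha = 0.05 (from the slide).
t_crit = 2.4469

reject = abs(t) > t_crit
print(round(t, 2), reject)
```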


Thank You for Listening!
