
St. Paul University Philippines


Graduate School

A Course Presentation in
STATISTICS
Course Content
• Basic Concepts
• Measures of Central Tendency
• Measures of Variability
• Measures of Correlation
• Non-parametric Test
• Parametric Tests of Hypothesis
• One-Way ANOVA
Introduction to
Statistics
• Definition
• The Nature of Data
• Uses of Statistics
• Methods of Sampling
• Statistics and Computers
What is
Statistics?

Statistics
Definition
Statistics is a collection of
methods for planning
experiments, obtaining data and
then organizing, summarizing,
presenting, analyzing,
interpreting and drawing
conclusions based on the data.
Nature of Data
Population vs. Sample
A population is the complete
collection of elements (scores,
people, measurements, and so on)
to be studied.
A sample is a sub-collection of
elements drawn from a population.
Parameter vs. Statistic
A parameter is a numerical
measurement describing some
characteristic of a population.
A statistic is a numerical
measurement describing some
characteristic of a sample.
Qualitative vs.
Quantitative Data
Qualitative (or categorical or
attribute) data can be separated into
different categories that are
distinguished by some non-numerical
characteristic.
Quantitative data consist of numbers
representing counts or measurements.
Discrete vs. Continuous Data
Discrete data result from either a
finite number of possible values or a
countable number of possible values.
(That is, the number of possible
values is 0, 1, 2 or more)
Continuous data result from
infinitely many possible values that
can be associated with points on a
continuous scale in such a way that
there are no gaps or interruptions.
Levels of Measurement
Nominal Level of
Measurement
The nominal level of measurement is
characterized by data that consist
of names, labels, or categories only.
The data cannot be arranged in an
ordering scheme.
Example: gender, civil status,
nationality, religion, etc.
Ordinal Level of
Measurement
The ordinal level of measurement
involves data that may be arranged in
some order, but differences between
data values either cannot be
determined or are meaningless.
Example: good, better or best
speakers; 1 star, 2 star, 3 star movie;
employee rank
Interval Level of
Measurement
The interval level of measurement is
like the ordinal level, with the
additional property that meaningful
amounts of differences between data
can be determined. However, there
is no inherent (natural) zero starting
point.
Example: body temperature, year
(1955, 1843, 1776, 1123, etc.)
Ratio Level of
Measurement
The ratio level of measurement is
the interval modified to include the
inherent zero starting point. For
values at this level, differences and
ratios are meaningful.
Example: weights of plastic, lengths
of movies, distances traveled by cars
Determining Adequate
Sample Size
Sampling Formula
(Slovin’s)
$$n = \frac{N}{1 + N e^{2}}$$

Where n = sample size
N = population size
e = margin of error
Example for Slovin’s
Formula
If N = 3000 and e = .05, then n is

$$n = \frac{3000}{1 + (.05)^{2}(3000)} = \frac{3000}{8.5} = 352.94 \approx 353$$
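The computation above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `slovin_sample_size` is hypothetical, and rounding the result up to the next whole respondent is an assumption (the slide simply reports 352.94 as 353).

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up (assumed rounding rule)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)


# Values from the slide: N = 3000, e = .05
sample_size = slovin_sample_size(3000, 0.05)
print(sample_size)
```

With N = 3000 and e = .05 the denominator is 8.5, so the function reproduces the slide's sample size of 353.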


SAMPLING
TECHNIQUES
Definition
• Sampling may be defined as
measuring a small portion of
something and then making a general
statement about the whole thing
(Bradfield & Moredock, 1957)
Why do we need
sampling?
Why we need sampling…
Sampling makes possible the study of
a large, heterogeneous population.
Sampling is for economy.
Sampling is for speed.
Sampling is for accuracy.
Sampling saves the sources of data
from being all consumed.
General Types of
Sampling
There are two general
types of sampling…

Probability Sampling
Non-Probability Sampling
Probability Sampling
The sample is a proportion (a certain
percent) of the population and such
sample is selected from the
population by means of some
systematic way in which every
element of the population has a
chance of being included in the
sample.
Non-Probability Sampling
The sample is not a proportion of the
population and there is no system in
selecting the sample. The selection is
dependent on the situation from
which the sample is taken.
Types of Non-Probability
Sampling are…
Accidental Sampling
Quota Sampling
Convenience Sampling
Accidental Sampling
The sample elements are
selected by chance.
Example: the researcher stands
at a street corner and interviews
everyone who passes by
Quota Sampling
Specified number of
elements of certain types
are included in the sample.
Example: the number of
viewers to a TV show
Convenience Sampling
A process of picking out
elements to constitute a
sample in the most convenient
and fastest way.
Example: samples to get
reactions to hot and
controversial issues.
Types of Probability
Sampling are…
Pure Random Sampling
Systematic Sampling
Stratified Sampling
Purposive Sampling (often classified as non-probability sampling, since respondents are chosen by judgment)
Cluster Sampling
RANDOM SAMPLING
Random Sampling is a sampling technique
where members of the population are
selected in such a way that each member
has an equal chance of being selected.
It is also called the lottery or raffle type
of sampling. It may use a table of random
numbers.
Stratified Sampling
With stratified sampling, the
population is subdivided into at least
two different subpopulations (or
strata) that share the same
characteristics (such as gender), and
then a sample is drawn from each
stratum.
Systematic Sampling
In systematic sampling, one chooses
a starting point and then selects every
kth (such as every 5th) element in
the population.
Purposive Sampling
In purposive sampling, the respondents are
chosen on the basis of their knowledge of
the information desired.
Example: If research is to be conducted on
the history of a place, the old people of
the place must be consulted and included
in the sample.
Cluster Sampling
In cluster sampling, the population
area is divided into sections (or
clusters), a few of those sections are
randomly selected, and then all the
members from the selected sections
are chosen as samples.
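The probability sampling techniques described above can be sketched with Python's standard `random` module. This is only an illustration under stated assumptions: the function names, the `seed` parameter, the use of a random starting point for systematic sampling, and the proportional allocation in `stratified_sample` are choices made here, not something the slides prescribe.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Every member has an equal chance of being selected (lottery style)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, k, seed=None):
    """Choose a random starting point, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from each stratum (assumed proportional allocation)."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


# Hypothetical population of 100 numbered respondents
population = list(range(1, 101))
strata = {"female": population[:60], "male": population[60:]}
clusters = {"section A": [1, 2], "section B": [3, 4], "section C": [5, 6]}

print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 5, seed=1))
print(stratified_sample(strata, 0.10, seed=1))
print(cluster_sample(clusters, 2, seed=1))
```

The `seed` argument only makes the draws repeatable for demonstration; in an actual study it would normally be left unset.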
Measures of Central
Tendency
• Mean
• Median
• Mode
Measures of Central
Tendency
Mean
• The most reliable and the most
sensitive measure of position.
• It is the most widely used
measure.
• It is commonly known as the
“average” although the median and
the mode are also known as
averages.
Mean:

• It comes in two different forms:
1) Simple Mean
2) Weighted Mean
Example 1:
A study was done on 5 typical fast-food
meals in Metro Manila. The following table
shows the amount of fat, in number of
teaspoons, present in each meal. Calculate
the mean amount of fat for these 5 fast-
food meals.

Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16
How to solve the simple
mean:
• The simple mean is obtained by
adding all the values/
observations of a certain
variable and dividing the sum by
the total number of values,
cases or observations.
Fast-food meal A B C D E
Fat (in tsp) 14 18 22 10 16

• To obtain the simple mean amount
of fat for the 5 fast-food meals:
• Mean = (14 + 18 + 22 + 10 + 16)/5
• Mean = 80/5 = 16
• That is, the mean fat content of
the 5 fast-food meals is 16 teaspoons,
which is considered too much.
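The same arithmetic can be checked with a few lines of Python. This is a minimal sketch; the dictionary name `fat_tsp` is just an illustrative label for the table in Example 1.

```python
# Fat content (in teaspoons) of the 5 fast-food meals from Example 1
fat_tsp = {"A": 14, "B": 18, "C": 22, "D": 10, "E": 16}

total = sum(fat_tsp.values())        # add all the observations
simple_mean = total / len(fat_tsp)   # divide by the number of observations

print(total, simple_mean)
```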
Exercise #2: Find the simple
mean for the following set of data:
• Data A: 17, 19, 25, 14,
18, 24, 11, 19
• Data B: 79, 75, 82, 84,
82, 75, 79
• Data C: 35, 32, 37, 42, 45,
33, 41, 44, 35, 38
The simple means for the
given data are …
• Data A: 18.38
• Data B: 79.43
• Data C: 38.20
Example 2:
• The following represents the final
grades obtained by a nursing
student one summer term:
• Anatomy (5 units) - - - 93
• Chemistry (3 units) - - - 88
• SOT 2 (2 units) - - - 89
– Find the weighted average of the
student.
To solve for the weighted average
of the student we have...
$$\text{Mean} = \frac{\sum w_i x_i}{\sum w_i}$$

where x_i is each grade and w_i is its number of units.

$$\text{Mean} = \frac{93(5) + 88(3) + 89(2)}{10} = \frac{465 + 264 + 178}{10} = \frac{907}{10} = 90.7 \text{ (Excellent)}$$
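A short Python sketch of the weighted average, assuming the grades are the values and the units are the weights. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Example 2: Anatomy (5 units), Chemistry (3 units), SOT 2 (2 units)
grades = [93, 88, 89]
units = [5, 3, 2]

weighted_average = weighted_mean(grades, units)
print(weighted_average)
```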
Example 3:
• The following represents the responses of
50 randomly chosen respondents in one
item of a research questionnaire:
• Very Strongly Agree (5) - - - 17
• Strongly Agree (4) - - - 11
• Agree (3) - - - 9
• Disagree (2) - - - 12
• Strongly Disagree (1) - - - 1
– Find the weighted response of the
respondents.
To solve for the weighted
response we have...
$$\text{Mean} = \frac{\sum w_i x_i}{\sum w_i}$$

where x_i is each scale value and w_i is the number of respondents who chose it.

$$\text{Mean} = \frac{5(17) + 4(11) + 3(9) + 2(12) + 1(1)}{50} = \frac{85 + 44 + 27 + 24 + 1}{50} = \frac{181}{50} = 3.62 \text{ (Strongly Agree)}$$
Table of Interpretation
(5 pt. Likert Scale)
4.20 – 5.00 Very Strongly Agree
3.40 – 4.19 Strongly Agree
2.60 – 3.39 Agree
1.80 – 2.59 Disagree
1.00 – 1.79 Strongly Disagree
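The weighted response and its interpretation can be sketched as follows. The names `LIKERT_BANDS` and `interpret` are hypothetical, and treating each band's lower limit as inclusive is an assumption based on how the table is printed.

```python
# Lower limit of each band in the table of interpretation, highest band first
LIKERT_BANDS = [
    (4.20, "Very Strongly Agree"),
    (3.40, "Strongly Agree"),
    (2.60, "Agree"),
    (1.80, "Disagree"),
    (1.00, "Strongly Disagree"),
]


def interpret(mean_score):
    """Return the verbal interpretation of a 5-point Likert mean."""
    if not 1.00 <= mean_score <= 5.00:
        raise ValueError("mean must lie between 1.00 and 5.00")
    for lower_limit, label in LIKERT_BANDS:
        if mean_score >= lower_limit:
            return label


# Example 3: scale value -> number of respondents
responses = {5: 17, 4: 11, 3: 9, 2: 12, 1: 1}

weighted_response = sum(score * count for score, count in responses.items()) / sum(responses.values())
print(round(weighted_response, 2), interpret(weighted_response))
```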
The Median
What is the Median?
The median is . . .
• A positional measure that divides
the set of data exactly into two
parts.
• It is the score/observation that is
centrally located between the
highest and the lowest observation.
• Determined by rearranging the data
into an array.
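Median for Odd Sample (the value at position (n + 1)/2 of the array):

$$\text{Median} = X_{(n+1)/2}$$

Median for Even Sample (the average of the values at positions n/2 and n/2 + 1 of the array):

$$\text{Median} = \frac{X_{n/2} + X_{n/2 + 1}}{2}$$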
Using the data in Example 1, find the median fat content of the 5 meals.
Median for Odd Sample

Odd???
The array for the data in Example 1 is:
10, 14, 16, 18, 22
• To obtain the median fat
content of the 5 meals, we
use the median formula for an
odd sample, since n = 5.
• Median = the [(n + 1)/2]th item
• Median = the [(5 + 1)/2]th item
• Median = 3rd item = 16
Median for
Even Sample

What is
even?
The following are sample scores
obtained from a 75-item summative test:
(n = 12) 48, 53, 63, 65, 45, 47, 52, 48,
63, 54, 63, 53

Array: 45, 47, 48, 48, 52, 53, 53, 54, 63, 63, 63, 65

• Since n = 12 (even), the median is the average of the 6th and 7th scores.
• Median = (6th score + 7th score)/2
• Median = (53 + 53)/2 = 53
Find the median for
Exercise #2.
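The two median rules (odd and even sample) can be sketched in Python as below. The function name `median` is illustrative, and the four-value list used for the even case is a hypothetical data set chosen only to show the averaging step.

```python
def median(values):
    """Arrange the data into an array, then pick or average the middle item(s)."""
    array = sorted(values)
    n = len(array)
    middle = n // 2
    if n % 2 == 1:
        # Odd sample: the [(n + 1)/2]th item (index `middle` when counting from 0)
        return array[middle]
    # Even sample: average of the (n/2)th and (n/2 + 1)th items
    return (array[middle - 1] + array[middle]) / 2


fat_tsp = [14, 18, 22, 10, 16]   # Example 1, n = 5 (odd)
print(median(fat_tsp))

hypothetical_even = [18, 10, 16, 14]   # made-up data, n = 4 (even)
print(median(hypothetical_even))
```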
Mode
The mode is …
The most favorite score.
The score having the highest
frequency.
The most frequently occurring score.
The least reliable measure of position
Determined by way of inspection.
A set of data is said to
be …
• Unimodal or monomodal if it
has only one mode.
• Example: 33, 35, 35, 38,
40, 46
• Its mode is 35.
A set of data is said to
be …
• Bimodal if it has two modes.
• Example: 33, 35, 35, 38,
40, 40, 46
• Its modes are 35 and 40.
A set of data is said to be …
• Multimodal if it has more than
two modes.
• Example: 33, 35, 35, 38, 40,
40, 46, 46, 51, 58, 58, 60
• Its modes are 35, 40, 46 and
58.
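Finding the mode by inspection amounts to counting how often each score occurs. A minimal sketch using `collections.Counter` follows; the function names are hypothetical, and returning an empty list when no score repeats is an assumption reflecting the idea that a mode does not always exist.

```python
from collections import Counter


def find_modes(data):
    """Return every score that has the highest frequency (empty if no score repeats)."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(score for score, count in counts.items() if count == highest)


def classify(data):
    """Label a data set by its number of modes."""
    number_of_modes = len(find_modes(data))
    if number_of_modes == 0:
        return "no mode"
    if number_of_modes == 1:
        return "unimodal"
    if number_of_modes == 2:
        return "bimodal"
    return "multimodal"


# The three example data sets from the slides
set_1 = [33, 35, 35, 38, 40, 46]
set_2 = [33, 35, 35, 38, 40, 40, 46]
set_3 = [33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60]

for data in (set_1, set_2, set_3):
    print(find_modes(data), classify(data))
```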
Assignment #1: Find the mean,
median and mode of the following:
1. 85, 82, 83, 88, 85, 87, 89,
90
2. 12, 14, 20, 19, 23, 22, 28
3. 24, 34, 27, 27, 34, 24
4. 102, 100, 111, 100, 106, 102
5. 75, 86, 78, 84, 88, 86, 84,
85, 81, 84, 80
Grouped
Data
What is a Frequency
Distribution?
• A Frequency
Distribution is a tabular
representation of data
consisting of intervals
and their respective
frequencies.
Other ways of
presenting
data are . . .
BAR CHART (sample figure: series East, West and North, by quarter from 1st Qtr to 4th Qtr)
LINE GRAPH (sample figure: series East, West and North, by quarter from 1st Qtr to 4th Qtr)
PIE CHART (sample figure: slices for 1st Qtr, 2nd Qtr, 3rd Qtr and 4th Qtr)
Scatter Plot (sample figure: series East, West and North)
How to construct a
Frequency Distribution:
• Determine the range: R = HO – LO
(highest observation minus lowest observation).
• Determine the class size (c) using
the formula c = (R + 1) / (number of class intervals).
• Construct the class intervals.
• Tally the data and determine the
frequency for each interval.
The class intervals in a
frequency distribution must:
• Not overlap.
• Be exhaustive, so that every
data value can be tallied in
one of the intervals.
• Have a uniform class size.
• Number not fewer than 7 and not
more than 15.
Data:
77 77 85 72 69 80 75 69 80 64
72 68 48 60 44 87 52 74 72 76
63 81 56 71 54 76 81 78 55 74
82 59 40 73 61 80 58 75 63 48
46 51 80 42 65 54 79 57 72 67
Total 3342 Mean 66.84
Frequency Distribution
Class Interval    f     %      CF<    Cumulative %
82-87             3     6%     50     100%
76-81             12    24%    47     94%
70-75             10    20%    35     70%
64-69             6     12%    25     50%
58-63             6     12%    19     38%
52-57             6     12%    13     26%
46-51             4     8%     7      14%
40-45             3     6%     3      6%
Total             50    100%
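The construction steps can be sketched in Python using the 50 scores from the Data slide. This is an illustration under assumptions: the function name is hypothetical, the data are taken to be whole numbers, the first interval is assumed to start at the lowest observation, and 8 class intervals are used because that is what the table above shows.

```python
import math

scores = [
    77, 77, 85, 72, 69, 80, 75, 69, 80, 64,
    72, 68, 48, 60, 44, 87, 52, 74, 72, 76,
    63, 81, 56, 71, 54, 76, 81, 78, 55, 74,
    82, 59, 40, 73, 61, 80, 58, 75, 63, 48,
    46, 51, 80, 42, 65, 54, 79, 57, 72, 67,
]


def frequency_distribution(data, num_intervals):
    """Return (lower, upper, frequency, cumulative frequency) rows, lowest class first."""
    lowest, highest = min(data), max(data)
    value_range = highest - lowest                              # R = HO - LO
    class_size = math.ceil((value_range + 1) / num_intervals)   # c = (R + 1) / #CI
    rows = []
    cumulative = 0
    lower = lowest
    for _ in range(num_intervals):
        upper = lower + class_size - 1
        frequency = sum(1 for x in data if lower <= x <= upper)  # tally
        cumulative += frequency
        rows.append((lower, upper, frequency, cumulative))
        lower = upper + 1
    return rows


table = frequency_distribution(scores, num_intervals=8)

# Print highest class first, as in the slide
for lower, upper, f, cf in reversed(table):
    print(f"{lower}-{upper}  f={f:2d}  {100 * f / len(scores):.0f}%  CF<={cf}")
```

Here R = 87 - 40 = 47 and c = 48/8 = 6, which gives the intervals 40-45 through 82-87 used in the table.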
$$\text{Mean} = \frac{\sum fM}{n}$$

where f is the frequency and M is the midpoint of each class interval.
Class Interval f Mdpt. fM
82-87 3 84.5 253.5
76-81 12 78.5 942
70-75 10 72.5 725
64-69 6 66.5 399
58-63 6 60.5 363
52-57 6 54.5 327
46-51 4 48.5 194
40-45 3 42.5 127.5
50 3331

Mean = 3331/ 50 = 66.62
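A minimal Python sketch of the grouped mean, assuming each class is given by its lower limit, upper limit and frequency. The function name `grouped_mean` is hypothetical.

```python
# (lower limit, upper limit, frequency) for each class interval in the table
classes = [
    (82, 87, 3), (76, 81, 12), (70, 75, 10), (64, 69, 6),
    (58, 63, 6), (52, 57, 6), (46, 51, 4), (40, 45, 3),
]


def grouped_mean(classes):
    """Sum of (frequency * midpoint) divided by the total frequency."""
    n = sum(f for _, _, f in classes)
    sum_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fm / n


mean_grouped = grouped_mean(classes)
print(mean_grouped)
```

The grouped mean (66.62) differs slightly from the mean of the raw scores (66.84) because every score in a class is represented by the class midpoint.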


$$\text{Median} = L_{be} + \frac{C\left(\frac{n}{2} - CF_{<}\right)}{f_m}$$

Where: $L_{be}$ is the lower boundary of the median class
C is the class size
$f_m$ is the frequency of the median class
$CF_{<}$ is the cumulative frequency of the class just below
the median class
The Median Class is the class interval containing the (n/2)th observation
Class Interval f Lb CF<
82-87 3 81.5 50
76-81 12 75.5 47
70-75 10 69.5 35
64-69 6 63.5 25
58-63 6 57.5 19
52-57 6 51.5 13
46-51 4 45.5 7
40-45 3 39.5 3
50
$$\text{Median} = 63.5 + 6\left(\frac{25 - 19}{6}\right) = 69.5$$
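The grouped median can be sketched as follows. The function name is hypothetical, and the code assumes whole-number class limits so that each lower boundary is the lower limit minus 0.5, as in the Lb column of the table.

```python
# (lower limit, upper limit, frequency), lowest class first
classes = [
    (40, 45, 3), (46, 51, 4), (52, 57, 6), (58, 63, 6),
    (64, 69, 6), (70, 75, 10), (76, 81, 12), (82, 87, 3),
]


def grouped_median(classes):
    """Median = Lb + C * (n/2 - CF<) / fm, using the class containing the (n/2)th case."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative_before = 0                      # CF< of the classes below the current one
    for lower, upper, f in classes:
        if cumulative_before + f >= half:      # this is the median class
            lower_boundary = lower - 0.5
            class_size = upper - lower + 1
            return lower_boundary + class_size * (half - cumulative_before) / f
        cumulative_before += f


median_grouped = grouped_median(classes)
print(median_grouped)
```

For this table the median class is 64-69 (Lb = 63.5, CF< = 19, fm = 6), matching the computation on the slide.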
$$\text{Mode} = L_{bo} + \frac{C\,d_1}{d_1 + d_2}$$

Where: $L_{bo}$ is the lower boundary of the modal class
C is the class size
$d_1$ is the difference between the frequency of the modal
class and the frequency of the class interval
just below the modal class (lower values)
$d_2$ is the difference between the frequency of the modal
class and the frequency of the class interval
just above the modal class (higher values)
The Modal Class is the class interval with the highest frequency
Class Interval f Lb CF<
82-87 3 81.5 50
76-81 12 75.5 47
70-75 10 69.5 35
64-69 6 63.5 25
58-63 6 57.5 19
52-57 6 51.5 13
46-51 4 45.5 7
40-45 3 39.5 3
50
$$\text{Mode} = 75.5 + 6\left(\frac{2}{2 + 9}\right) = 76.59$$
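A short Python sketch of the grouped mode. The function name is hypothetical; the code assumes whole-number class limits, takes the first class with the highest frequency if there is a tie, and treats a missing neighbouring class as having frequency 0.

```python
# (lower limit, upper limit, frequency), lowest class first
classes = [
    (40, 45, 3), (46, 51, 4), (52, 57, 6), (58, 63, 6),
    (64, 69, 6), (70, 75, 10), (76, 81, 12), (82, 87, 3),
]


def grouped_mode(classes):
    """Mode = Lb + C * d1 / (d1 + d2) for the class with the highest frequency."""
    frequencies = [f for _, _, f in classes]
    i = frequencies.index(max(frequencies))            # position of the modal class
    lower, upper, f_modal = classes[i]
    f_below = frequencies[i - 1] if i > 0 else 0
    f_above = frequencies[i + 1] if i < len(classes) - 1 else 0
    d1 = f_modal - f_below
    d2 = f_modal - f_above
    lower_boundary = lower - 0.5
    class_size = upper - lower + 1
    return lower_boundary + class_size * d1 / (d1 + d2)


mode_grouped = grouped_mode(classes)
print(round(mode_grouped, 2))
```

Here the modal class is 76-81, so d1 = 12 - 10 = 2 and d2 = 12 - 3 = 9, as in the slide.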
Compute the Mean, Median
and Mode of the distribution
Class Interval f M fM CF Lb
86-95 5
76-85 14
66-75 28
56-65 17
46-55 17
36-45 16
26-35 3
100
Compute for the Mean, Median & Mode
Class Interval f M fM CF Lb
98-100 16 99 1584.0 150 97.5
95-97 21 96 2016.0 134 94.5
92-94 13 93 1209.0 113 91.5
89-91 20 90 1800.0 100 88.5
86-88 27 87 2349.0 80 85.5
83-85 24 84 2016.0 53 82.5
80-82 13 81 1053.0 29 79.5
77-79 16 78 1248.0 16 76.5
150 13275
Uses of the Measures
of Central Tendency
The Mean is used…
• For interval and ratio measurements
• When there are no extreme values in a
distribution, since it is easily affected by
extremely high or extremely low scores
• When higher statistical computations are
wanted
• When the greatest reliability of the
measure of central tendency is wanted,
since its computation includes all the given
values
The Median is used…
• For ordinal and ranked measurements
• When there are extreme values, so that the
distribution is markedly skewed
• For an open-ended distribution; that is, the
lowest or the highest class interval, or both,
are open-ended (e.g., 50 and below or 100 and
above)
• When one desires to know whether the
cases fall within the upper half or the
lower half of a distribution.
The Mode is used…
• For nominal and categorical data
• When a rough or quick estimate of a
central value is wanted
• When the most popular or the most
typical case or value in a distribution
is wanted
Limitations of the
Measures of Central
Tendency
The Limitations of the Mean…
• It is the most widely used average, since it
is the most familiar. However, it is often
misused. It cannot be used if the
clustering of values or items is not
substantial.
• If the given values do not tend to cluster
around a central value, the mean is a poor
measure of central location.
• It is easily affected by extremely large or
small values. One small value can easily pull
down the mean.
The Limitations of the Mean…
• The mean alone cannot be used to compare
distributions, since the means of 2 or more
distributions may be the same while their
other characteristics are entirely
different. The means of distribution A,
whose values are 80, 85 and 90, and
distribution B, whose values are 86, 85 and 84,
are both 85. We cannot conclude, however,
that both distributions possess the same
characteristics, since their patterns of
dispersion or variation are markedly
different despite having the same mean.
The Limitations of the Median…
• It is easily affected by the number of
items in a distribution.
• It cannot be determined if the given values
are not arranged according to magnitude.
• If a distribution contains many values,
it becomes a laborious task to
arrange them according to magnitude.
• Its value is not as accurate as the mean,
since it is just an ordinal statistic.
The Limitations of the Mode…
• It is seldom used, since it
does not always exist.
• Its value is just a rough estimate of
the center of concentration of a
distribution.
• It is very unstable, since its value
easily changes depending on the
approach used in finding it.
Measures of Variability
• The statistical tool used to
describe the degree to
which scores/ observations
are scattered/dispersed.
• It is also used to determine
the degree of consistency/
homogeneity of scores.
Measures of Variability
Range
Mean Deviation
Standard Deviation
Variance
Coefficient of Variation
Measures of Variability
$$R = HO - LO$$
$$MD = \frac{\sum |X - \bar{X}|}{n}$$
$$S = \sqrt{\frac{\sum (X - \bar{X})^{2}}{n - 1}}$$
$$V = S^{2}$$
$$CV = \frac{S}{\bar{X}} \times 100$$
The following are the scores obtained by
two groups of 1st year Psychology
students in PSY 102:
Group A Group B
30 30
28 20
27 18
25 16
25 15
23 15
21 14
20 13
18 12
12 12
GROUP A
X       |X - Mean|    (X - Mean)^2
30      7.1           50.41
28      5.1           26.01
27      4.1           16.81
25      2.1           4.41
25      2.1           4.41
23      0.1           0.01
21      1.9           3.61
20      2.9           8.41
18      4.9           24.01
12      10.9          118.81
Mean = 22.9   Total = 41.2   Total = 256.9

Range = 30 – 12 = 18
Mean Dev'n = 41.2/10 = 4.12
$$\text{Standard dev'n} = \sqrt{\frac{256.9}{10 - 1}} = \sqrt{28.54} = 5.34$$
Variance = S^2 = 28.54
CV = (5.34/22.9) x 100 = 23.32%
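The Group A computations can be reproduced with a short Python sketch. The function name `variability` and the dictionary keys are hypothetical. Note that the slide rounds S to 5.34 before computing the CV (23.32%); carrying full precision gives about 23.33%.

```python
import math


def variability(data):
    """Range, mean deviation, sample standard deviation, variance and CV of a data set."""
    n = len(data)
    mean = sum(data) / n
    value_range = max(data) - min(data)                      # R = HO - LO
    mean_deviation = sum(abs(x - mean) for x in data) / n    # MD
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # V = S^2
    standard_deviation = math.sqrt(variance)                 # S
    cv_percent = standard_deviation / mean * 100             # CV
    return {
        "mean": mean,
        "range": value_range,
        "mean_deviation": mean_deviation,
        "standard_deviation": standard_deviation,
        "variance": variance,
        "cv_percent": cv_percent,
    }


group_a = [30, 28, 27, 25, 25, 23, 21, 20, 18, 12]

result = variability(group_a)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The same function can be applied to any other list of scores by passing it in place of `group_a`.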
Do the same computation
for Group B…
Problem:
• Two seemingly equally excellent BS
Psychology students are vying for an
academic honor, but only one
can be chosen to get the award.
The following are their grades used
as the basis for the award:
Franzen : 91, 90, 94, 93, 92
Rico : 92, 92, 90, 94, 92
Who do you think deserves to get
the award?
Guiding Principle
• The lesser the value of the
measure, the more consistent,
the more homogeneous and
the less scattered are the
observations in the set of
data.
