
Coverage

Measures of Central Tendency


Mean
Median
Mode

Measures of Variability and Dispersion


Range
Average deviation
Variance
Standard deviation

Introduction to Notations
If X is the variable of interest and n measurements are taken,
then the notation X1, X2, X3, ..., Xn will be used to represent
the n observations.

Sigma ($\Sigma$)

Indicates summation

Summation Notation
If variable X is the variable of interest,
and that n measurements are taken;
the sum of n observations can be
written as
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$

Summation Notation
$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$$

Greek letter Sigma: $\Sigma$
Upper limit of summation: $n$
Lower limit of summation: $i = 1$

Rules of Summation
$$\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i$$

The summation of the sum of variables is the sum of their summations.

The summation of the sum of variables is the sum of their summations:

$$\sum_{i=1}^{n} (a_i + b_i + \cdots + z_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i + \cdots + \sum_{i=1}^{n} z_i$$

Rules of Summation
If c is a constant, then
$$\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i = c(X_1 + X_2 + \cdots + X_n)$$

Rules of Summation
$$\sum_{i=1}^{n} c = nc$$
The summation of a constant is
the product of upper limit of
summation n and constant c.
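
The three summation rules can be checked numerically. The following is a minimal sketch in Python; the lists `x` and `y` and the constant `c` are hypothetical values chosen only for illustration.

```python
# Hypothetical sample values, chosen only for illustration.
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
c = 3
n = len(x)

# Rule 1: the summation of a sum is the sum of the summations.
lhs_sum = sum(xi + yi for xi, yi in zip(x, y))
rhs_sum = sum(x) + sum(y)

# Rule 2: a constant factor can be moved outside the summation.
lhs_const = sum(c * xi for xi in x)
rhs_const = c * sum(x)

# Rule 3: summing the constant c over i = 1..n gives n * c.
sum_of_constant = sum(c for _ in range(n))

print(lhs_sum, rhs_sum)
print(lhs_const, rhs_const)
print(sum_of_constant, n * c)
```

Each printed pair should show two equal numbers, one per rule.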

Statistics in Research

MEASURES OF CENTRAL TENDENCY

Mean
The sum of all values of the
observations divided by the total
number of observations
The sum of all scores divided by the
total frequency

Population mean

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Sample mean

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Mean in an Ungrouped
Frequency
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{n} = \frac{f_1 X_1 + f_2 X_2 + \cdots + f_n X_n}{n}$$

where $f_i$ is the frequency of the occurring score $X_i$
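
As a small illustration, the plain mean and the frequency-weighted mean give the same result. This is a sketch only: `scores` is a hypothetical data set, and the divisor is taken to be the total number of observations.

```python
from collections import Counter

# Hypothetical scores, for illustration only.
scores = [3, 5, 5, 7, 7, 7, 9]
n = len(scores)  # total number of observations

# Mean as the sum of all observations divided by n.
plain_mean = sum(scores) / n

# Mean from an ungrouped frequency table: each distinct score X
# is multiplied by its frequency f before summing.
freq = Counter(scores)  # maps each score to its frequency
weighted_mean = sum(f * x for x, f in freq.items()) / n

print(plain_mean, weighted_mean)
```

Both values are 43/7, since 1(3) + 2(5) + 3(7) + 1(9) = 43.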

Properties - Mean
The most stable measure of central
tendency
Can be affected by extreme values
Its value may not be an actual value in
the data set
If a constant c is added to or subtracted from
all values, the new mean will
increase/decrease by the same amount
c

Median
Positional middle of an array of data
Divides ranked values into halves
with 50% larger than and 50%
smaller than the median value.

If n is odd:

$$Md = X_{(n+1)/2}$$

If n is even:

$$Md = \frac{X_{n/2} + X_{(n/2)+1}}{2}$$
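
The two cases can be sketched in Python as follows. The function name `median` and the sample lists are hypothetical; the positions in the formulas are 1-based, while Python indexes from 0.

```python
def median(values):
    """Positional middle of the ranked values."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # the data must be arranged in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: item (n+1)/2 in 1-based terms is index n//2 in 0-based terms.
        return ordered[mid]
    # Even n: average of items n/2 and (n/2)+1 in 1-based terms.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))
print(median([8, 2, 6, 4]))
print(median([1, 5, 700]))  # an extreme value does not move the median
```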

Properties - Median
The median is a positional measure
Can be determined only if arranged in
order
Its value may not be an actual value in
the data set
It is affected by the position of items in
the series but not by the value of each
item
Affected less by extreme values

Mode
Value that occurs most frequently in
the data set
Locates the point where scores occur
with the greatest density
Less popular compared to mean and
median measures

Properties - Mode
It may not exist, or if it does, it may
not be unique
Not affected by extreme values
Applicable for both qualitative and
quantitative data
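
A short sketch of finding the mode, covering the cases where it is not unique or does not exist. The function name `modes` is hypothetical, and treating a data set whose distinct values all occur equally often as having no mode is one common convention, assumed here.

```python
from collections import Counter

def modes(values):
    """Return the list of most frequent values (empty if there is no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: if several distinct values all occur equally
    # often, no value stands out and the data set has no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))            # one mode
print(modes([1, 1, 2, 2, 3]))         # two modes (not unique)
print(modes([1, 2, 3]))               # no mode
print(modes(["red", "blue", "red"]))  # qualitative data
```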


MEASURES OF VARIABILITY AND DISPERSION

Range
Measure of the distance along the
number line over which the data exist
Exclusive and inclusive range
Exclusive range = largest score - smallest score
Inclusive range = upper limit - lower
limit
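
A minimal sketch of the two ranges. It assumes that "upper limit" and "lower limit" mean the real limits lying half a measurement unit beyond the largest and smallest scores; the function names and the `unit` parameter are hypothetical.

```python
def exclusive_range(values):
    """Largest score minus smallest score."""
    return max(values) - min(values)

def inclusive_range(values, unit=1):
    """Upper real limit minus lower real limit.

    Assumption: scores are recorded to the nearest `unit`, so the real
    limits extend half a unit beyond the extreme scores.
    """
    upper_limit = max(values) + unit / 2
    lower_limit = min(values) - unit / 2
    return upper_limit - lower_limit

scores = [50, 72, 96]  # hypothetical scores
print(exclusive_range(scores))
print(inclusive_range(scores))
```

Under this assumption the inclusive range exceeds the exclusive range by exactly one unit.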

Properties - Range
Rough and general measure of
dispersion
Largest and smallest extreme values
determine the range
Does not describe distribution of
values within the upper and lower
extremes
Does not depend on the number of observations

Absolute Deviation
Average of absolute deviations of
scores from the mean (Mean
Deviation)
or the median (Median Absolute
Deviation)

$$MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$

$$MAD = \frac{\sum_{i=1}^{n} |X_i - Md|}{n}$$
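
Both absolute deviations can be sketched in a few lines, following the definition given here (the average of absolute deviations, taken about the mean or about the median). The `scores` list is hypothetical.

```python
from statistics import mean, median

# Hypothetical scores, for illustration only.
scores = [1, 2, 3, 4, 10]
n = len(scores)

x_bar = mean(scores)  # 4
md = median(scores)   # 3

# Mean Deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - x_bar) for x in scores) / n

# Median Absolute Deviation as defined in this deck:
# average absolute deviation from the median.
median_abs_deviation = sum(abs(x - md) for x in scores) / n

print(mean_deviation, median_abs_deviation)
```

Here the absolute deviations from the mean sum to 12 and those from the median sum to 11, giving 12/5 and 11/5.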

Properties - Absolute Deviation
Measures variability of values in the
data set
Indicates how compact the group is
on a certain measure

Variance
Average of the square of deviations
measured from the mean
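Population variance ($\sigma^2$) and sample
variance ($s^2$)

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

$$s^2 = \frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}$$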
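
The definitional and computational forms of the sample variance should agree. The sketch below checks this on a hypothetical `scores` list and compares the results with Python's `statistics` module.

```python
import statistics

# Hypothetical scores, for illustration only.
scores = [4, 8, 6, 5, 7]
n = len(scores)
x_bar = sum(scores) / n  # 6.0

# Definitional form: squared deviations from the mean, divided by n - 1.
s2_definitional = sum((x - x_bar) ** 2 for x in scores) / (n - 1)

# Computational form: needs only the sum of X and the sum of X squared.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
s2_computational = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Population variance, treating the same scores as a whole population.
sigma2 = sum((x - x_bar) ** 2 for x in scores) / n

print(s2_definitional, s2_computational, statistics.variance(scores))
print(sigma2, statistics.pvariance(scores))
```

The squared deviations sum to 10, so the sample variance is 10/4 and the population variance is 10/5.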

Properties - Variance
Addition/subtraction of a constant c
to each score will not change the
variance of the scores
Multiplying each score by a constant
c changes the variance, resulting in a
new variance multiplied by $c^2$

Standard Deviation
Square root of the average of the
squared deviations measured from
the mean, i.e. the square root of the
variance
Population standard deviation ($\sigma$) and
sample standard deviation ($s$)

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

Why n-1?
Degrees of freedom
Measure of how much precision an
estimate of variation has
General rule is that the degrees of
freedom decrease as more parameters
have to be estimated
$\bar{X}$ estimates $\mu$
Using an estimated mean to find the
standard deviation causes the loss of ONE
degree of freedom
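
One way to see the effect of dividing by n - 1 is to enumerate every possible sample from a tiny population and average the resulting variance estimates. This is an illustrative sketch, not part of the original argument: the population `[1, 2, 3]`, the sample size of 2, and sampling with replacement are all assumptions made to keep the enumeration small, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # true variance

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

n = 2
# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, avg_div_n, avg_div_n_minus_1)
```

In this setup, dividing by n underestimates the population variance on average, while dividing by n - 1 recovers it exactly.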

Properties - Standard Deviation
Most used measure of variability
Affected by every value of every
observation
Less affected by fluctuations and
extreme values

Properties - Standard Deviation
Addition/subtraction of a constant c
to each score will not change the
standard deviation of the scores
Multiplying each score by a constant
c changes the standard deviation,
resulting in a new standard deviation
multiplied by |c|
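
The effect of adding or multiplying by a constant can be checked directly. A minimal sketch, using a hypothetical `scores` list and constant `c`:

```python
import math
import statistics

# Hypothetical scores and constant, for illustration only.
scores = [4, 8, 6, 5, 7]
c = 3

shifted = [x + c for x in scores]  # constant added to each score
scaled = [c * x for x in scores]   # each score multiplied by the constant

var_original = statistics.variance(scores)
var_shifted = statistics.variance(shifted)
var_scaled = statistics.variance(scaled)

sd_original = statistics.stdev(scores)
sd_shifted = statistics.stdev(shifted)
sd_scaled = statistics.stdev(scaled)

print(var_original, var_shifted, var_scaled)
print(sd_original, sd_shifted, sd_scaled)
```

The shifted data keep the same variance and standard deviation; the scaled data have a variance c squared times as large and a standard deviation c times as large.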

Choosing a measure
Range
Data are too few or too scattered to justify
more precise and laborious measures
Need to know only the total spread of scores

Absolute Deviation
Find and weigh deviations from the
mean/median
Extreme values unduly skew the standard
deviation

Choosing a measure
Standard Deviation
Need a measure with the best stability
Effect of extreme values has been
deemed acceptable
Compare and correlate with other data
sets


FREQUENCY DISTRIBUTION

Raw data
74 72 72 72 73 73 73 73 74 50
79 79 77 77 78 59 79 79 79 57
69 68 66 66 68 65 68 68 68 63
72 71 69 69 70 69 71 71 71 69
53 50 50 50 50 50 51 52 53 72
76 75 75 75 75 75 76 76 76 74
62 60 59 60 60 77 62 62 62 77
82 81 80 81 81 80 81 82 82 80
84 84 82 83 83 82 84 84 84 82
87 86 85 85 86 84 87 87 87 84
96 91 88 89 89 87 92 94 94 87

Array
50 50 50 50 50 50 51 52 53 53
57 59 59 60 60 60 62 62 62 62
63 65 66 66 68 68 68 68 68 69
69 69 69 69 70 71 71 71 71 72
72 72 72 72 73 73 73 73 74 74
74 75 75 75 75 75 76 76 76 76
77 77 77 77 78 79 79 79 79 79
80 80 80 81 81 81 81 82 82 82
82 82 82 83 83 84 84 84 84 84
84 84 85 85 86 86 87 87 87 87
87 87 88 89 89 91 92 94 94 96

Frequency Distribution
Table
Class Frequency
Number of observations within a class, f

Class Limits
End numbers of the class

Class Interval
Interval between the lower and upper
class limits, i.e. [X_lower limit, X_upper limit]

Frequency Distribution
Table
Class Boundaries
True limits of the class, halfway between class
limit of the current class and that of the
preceding/succeeding class, LCB and UCB

Class Size
Difference between UCB and LCB,
i.e. $X_{UCB} - X_{LCB}$

Class Mark
Midpoint of the class interval, average value
of the upper and lower class limits,
i.e. $(X_{\text{upper limit}} + X_{\text{lower limit}}) / 2$

Constructing an FDT
Determine number of classes
Sturges' Formula, K = 1 + 3.322 log10(n)
Square Root, K = sqrt(n)

Determine the approximate class size, C = R/K, where R is the range
Round off C to a more convenient
number C'

Constructing an FDT
Determine lower class limit
Lowest class should not be empty, must
contain the lowest value in the data set

Determine succeeding lower class
limits by adding class size C to the
current lower class limit
Tally frequencies
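
The construction steps can be sketched for the deck's data set (110 scores ranging from 50 to 96). Rounding K to the nearest integer, rounding the class size up to a whole number, and taking upper limits one unit below the next lower limit (integer scores) are assumptions made for this sketch.

```python
import math

n = 110              # number of observations in the deck's data set
low, high = 50, 96   # smallest and largest scores in the array

# Two rules for the number of classes.
k_sturges = 1 + 3.322 * math.log10(n)  # about 7.8
k_sqrt = math.sqrt(n)                  # about 10.5

K = round(k_sqrt)            # assumption: use the square-root rule, rounded
R = high - low               # range of the data
C = R / K                    # approximate class size
C_rounded = math.ceil(C)     # assumption: round up to a convenient whole number

# The lowest class starts at the smallest score; each succeeding lower
# limit adds the class size to the current one.
lower_limits = [low + i * C_rounded for i in range(K)]
classes = [(lo, lo + C_rounded - 1) for lo in lower_limits]

print(round(k_sturges, 2), round(k_sqrt, 2))
print(K, R, C, C_rounded)
print(classes)
```

With these choices the sketch yields ten classes of size 5, from 50-54 up to 95-99. Sturges' formula would suggest about eight classes instead.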


Frequency Distribution Table

Class   Frequency   LCB    UCB    RF     <CF   >CF
50-54   10          49.5   54.5   0.09   10    110
55-59   3           54.5   59.5   0.03   13    100
60-64   8           59.5   64.5   0.07   21    97
65-69   13          64.5   69.5   0.12   34    89
70-74   17          69.5   74.5   0.15   51    76
75-79   19          74.5   79.5   0.17   70    59
80-84   22          79.5   84.5   0.20   92    40
85-89   13          84.5   89.5   0.12   105   18
90-94   4           89.5   94.5   0.04   109   5
95-99   1           94.5   99.5   0.01   110   1
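
The columns of the table can be reproduced from the sorted array. In this sketch the dictionary `value_counts` is transcribed from the deck's array (each score mapped to how many times it occurs); the variable names are hypothetical, and a class size of 5 starting at 50 is taken from the class limits shown in the table.

```python
from itertools import accumulate

# Score -> number of occurrences, transcribed from the deck's sorted array.
value_counts = {
    50: 6, 51: 1, 52: 1, 53: 2, 57: 1, 59: 2, 60: 3, 62: 4, 63: 1,
    65: 1, 66: 2, 68: 5, 69: 5, 70: 1, 71: 4, 72: 5, 73: 4, 74: 3,
    75: 5, 76: 4, 77: 4, 78: 1, 79: 5, 80: 3, 81: 4, 82: 6, 83: 2,
    84: 7, 85: 2, 86: 2, 87: 6, 88: 1, 89: 2, 91: 1, 92: 1, 94: 2,
    96: 1,
}
class_size = 5
lower_limits = list(range(50, 100, class_size))  # 50, 55, ..., 95
n = sum(value_counts.values())

# Tally: count the scores falling within each pair of class limits.
freqs = [
    sum(count for score, count in value_counts.items()
        if lower <= score <= lower + class_size - 1)
    for lower in lower_limits
]

# Less-than CF accumulates from the first class, greater-than CF from the last.
less_cf = list(accumulate(freqs))
greater_cf = list(accumulate(reversed(freqs)))[::-1]
rel_freqs = [round(f / n, 2) for f in freqs]

for lower, f, rf, lcf, gcf in zip(lower_limits, freqs, rel_freqs, less_cf, greater_cf):
    upper = lower + class_size - 1
    # Class boundaries lie half a unit outside the class limits.
    print(f"{lower}-{upper}", f, lower - 0.5, upper + 0.5, rf, lcf, gcf)
```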

Other Terms
Relative frequency, RF
Class frequency divided by number of
observations, i.e. RF = fi / n

Relative Frequency Percentage, RFP
RFP = (fi / n) x 100%

Cumulative frequency, CF
Shows accumulated frequencies of
successive classes, either from the
beginning (less than CF, <CF) or end (greater
than CF, >CF) of the FDT

Mean from an FD
$$\bar{X} = \frac{\sum_{i=1}^{K} f_i X_i}{\sum_{i=1}^{K} f_i}$$

where $X_i$ = class mark of the ith class
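
As a worked sketch, the grouped mean can be computed for ten classes of size 5 running from 50-54 to 95-99. The class frequencies below are the ones consistent with the deck's sorted array and cumulative-frequency column; the variable names are hypothetical.

```python
# Class limits and frequencies for the deck's data (classes 50-54 ... 95-99).
lower_limits = list(range(50, 100, 5))
upper_limits = [lo + 4 for lo in lower_limits]
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

# Class mark: midpoint of the lower and upper class limits.
class_marks = [(lo + up) / 2 for lo, up in zip(lower_limits, upper_limits)]

sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
total_f = sum(freqs)
grouped_mean = sum_fx / total_f

print(class_marks)
print(sum_fx, total_f, round(grouped_mean, 3))
```

The weighted sum is 8145 over 110 observations, giving a grouped mean of about 74.05.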

Median from an FD
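$$Md = LCB_{Md} + C\left(\frac{n/2 - {<}CF_{Md-1}}{f_{Md}}\right)$$

where $LCB_{Md}$ = lower class boundary of the median class
C = class size
$f_{Md}$ = frequency of the median class
${<}CF_{Md-1}$ = less than cumulative frequency of the class preceding the median class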
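
A sketch of the grouped median. The function name `grouped_median` is hypothetical; it takes the median class to be the first class whose less-than cumulative frequency reaches n/2. The frequencies are those consistent with the deck's sorted array and cumulative-frequency column.

```python
def grouped_median(lower_boundaries, freqs, class_size):
    """Median from a frequency distribution by interpolation."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("no observations")
    cumulative = 0  # less-than CF of the class preceding the current one
    for lcb, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:
            # Current class is the median class: interpolate inside it.
            return lcb + class_size * (n / 2 - cumulative) / f
        cumulative += f

lower_boundaries = [49.5 + 5 * i for i in range(10)]  # 49.5, 54.5, ..., 94.5
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

md = grouped_median(lower_boundaries, freqs, 5)
print(round(md, 3))
```

Here n/2 = 55 falls in the class 75-79 (less-than CF of the preceding class is 51, class frequency 19), so Md = 74.5 + 5(4/19), about 75.55.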

Mode from an FD
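$$Mo = LCB_{Mo} + C\left(\frac{f_{Mo} - f_{Mo-1}}{2f_{Mo} - f_{Mo-1} - f_{Mo+1}}\right)$$

where $LCB_{Mo}$ = lower class boundary of the modal class
C = class size
$f_{Mo}$, $f_{Mo-1}$, $f_{Mo+1}$ = frequency of the modal class, the class
preceding and the class succeeding the modal class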
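
A sketch of the grouped mode. The function name `grouped_mode` is hypothetical. Treating a missing neighbouring class as having frequency 0, and picking the first class when several tie for the highest frequency, are assumptions of this sketch. The frequencies are those consistent with the deck's sorted array and cumulative-frequency column.

```python
def grouped_mode(lower_boundaries, freqs, class_size):
    """Mode from a frequency distribution by interpolation in the modal class."""
    # Index of the modal class (the first one if several classes tie).
    m = max(range(len(freqs)), key=freqs.__getitem__)
    f_mo = freqs[m]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    return lower_boundaries[m] + class_size * (f_mo - f_prev) / (2 * f_mo - f_prev - f_next)

lower_boundaries = [49.5 + 5 * i for i in range(10)]  # 49.5, 54.5, ..., 94.5
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]

mo = grouped_mode(lower_boundaries, freqs, 5)
print(mo)
```

The modal class is 80-84 (frequency 22, neighbours 19 and 13), so Mo = 79.5 + 5(3/12) = 80.75.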

Mean Deviation from an FD

$$MD = \frac{\sum_{i=1}^{K} f_i |X_i - \bar{X}|}{n}$$

where $X_i$ = class mark of the ith class
K = number of classes
n = total number of observations; total
frequency, i.e. $n = \sum_{i=1}^{K} f_i$
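
A short sketch of the grouped mean deviation for ten classes of size 5 (50-54 to 95-99), using the frequencies consistent with the deck's sorted array and cumulative-frequency column. Variable names are hypothetical.

```python
freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]
class_marks = [52 + 5 * i for i in range(10)]  # 52, 57, ..., 97

n = sum(freqs)  # total frequency
x_bar = sum(f * x for f, x in zip(freqs, class_marks)) / n

# Each class mark's absolute deviation is weighted by the class frequency.
mean_deviation = sum(f * abs(x - x_bar) for f, x in zip(freqs, class_marks)) / n

print(round(x_bar, 3), round(mean_deviation, 3))
```

With a grouped mean of about 74.05, the weighted absolute deviations sum to about 988.64, so the mean deviation is about 8.99.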

Variance from an FD
$$s^2 = \frac{\sum_{i=1}^{K} f_i (X_i - \bar{X})^2}{n - 1}$$

where $X_i$ = class mark of the ith class
K = number of classes
n = total number of observations; total
frequency, i.e. $n = \sum_{i=1}^{K} f_i$

Variance from an FD
$$s^2 = \frac{n \sum_{i=1}^{K} f_i X_i^2 - \left(\sum_{i=1}^{K} f_i X_i\right)^2}{n(n - 1)}$$

where $X_i$ = class mark of the ith class
K = number of classes
n = total number of observations; total
frequency, i.e. $n = \sum_{i=1}^{K} f_i$
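
Both forms of the grouped variance can be checked against each other. The sketch below uses ten classes of size 5 (50-54 to 95-99) with the frequencies consistent with the deck's sorted array and cumulative-frequency column; variable names are hypothetical.

```python
import math

freqs = [10, 3, 8, 13, 17, 19, 22, 13, 4, 1]
class_marks = [52 + 5 * i for i in range(10)]  # 52, 57, ..., 97

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))
x_bar = sum_fx / n

# Definitional form: frequency-weighted squared deviations over n - 1.
s2_definitional = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, class_marks)) / (n - 1)

# Computational form: uses only the weighted sums of X and X squared.
s2_computational = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

print(sum_fx, sum_fx2)
print(round(s2_definitional, 3), round(s2_computational, 3))
print(round(math.sqrt(s2_computational), 3))  # grouped standard deviation
```

Both forms give a grouped sample variance of about 120.78, which corresponds to a standard deviation of about 10.99.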