Prepared by
Dr. S. ATHITHAN
Assistant Professor
Department of Mathematics
• Measures of dispersion: Range, Quartile deviation, Mean deviation, Definition, Problems
• Standard deviation and Co-efficient of variation: Definition, Problems
• Skewness, Definition, Problems
• Kurtosis and Moments, Definition, Problems
Page 1 of 19 https://sites.google.com/site/lecturenotesofathithans/home
15MA305-Statistics for Information Technology S. ATHITHAN
Contents
1 Measures of Central Tendency
2 Measures of Dispersion
Dear All, here I have solved only a few problems, and some topics may be missed. Please follow the classwork to cover all the topics for preparation. Take the exercise problems given at the end for your practice. Apart from the exercises, you can follow any reference book for your practice.
Some of the sections/topics in these units are preliminary ideas, which are the basics needed to do our regular course examples and exercises.
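Measures of Central Tendency - Table 1
(For each attribute, the formula is given for raw or ungrouped data, a discrete frequency distribution and a continuous frequency distribution.)
1. Arithmetic Mean $\bar{x}$
   Raw or ungrouped data: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ or $\bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n}$
   Discrete freq. dist.: $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$ or $\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N}$
   Contns. freq. dist.: $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$ or $\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c$
   A - assumed mean, $d_i = x_i - A$ (in the last formula, the step deviation $d_i = (x_i - A)/c$), c - common length/width of the class interval, $N = \sum f_i$; for a continuous distribution, $x_i$ is the mid-value of the ith class.
2. Geometric Mean GM
   Raw or ungrouped data: $GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \text{Antilog}\left[\frac{\sum_{i=1}^{n} \log x_i}{n}\right]$
   Discrete freq. dist.: $GM = \left(\prod_{i=1}^{n} x_i^{f_i}\right)^{1/N} = \text{Antilog}\left[\frac{\sum_{i=1}^{n} f_i \log x_i}{N}\right]$, $\sum f_i = N$
   Contns. freq. dist.: $GM = \left(\prod_{i=1}^{n} x_i^{f_i}\right)^{1/N} = \text{Antilog}\left[\frac{\sum_{i=1}^{n} f_i \log x_i}{N}\right]$, $\sum f_i = N$
3. Harmonic Mean HM
   Raw or ungrouped data: $HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
   Discrete freq. dist.: $HM = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$
   Contns. freq. dist.: $HM = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$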
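The three averages in Table 1 can be checked numerically. The following is a minimal Python sketch, assuming strictly positive values (needed for GM and HM) and base-10 logarithms to mirror the Antilog form; the function name `grouped_means` is hypothetical and not part of these notes. It is run on the data of Example 3.2, with each class represented by its mid-value.

```python
import math

def grouped_means(values, freqs):
    """Return (AM, GM, HM) of a frequency distribution.

    values: the x_i (mid-values of the classes for a continuous distribution)
    freqs:  the frequencies f_i; all x_i must be positive for GM and HM
    """
    n_total = sum(freqs)  # N = sum of f_i
    am = sum(f * x for x, f in zip(values, freqs)) / n_total
    # GM = Antilog(sum f_i log x_i / N), using base-10 logs as with log tables
    gm = 10 ** (sum(f * math.log10(x) for x, f in zip(values, freqs)) / n_total)
    # HM = N / sum(f_i / x_i)
    hm = n_total / sum(f / x for x, f in zip(values, freqs))
    return am, gm, hm

# Example 3.2: classes 0-10, 10-20, ..., 40-50 represented by their mid-values
am, gm, hm = grouped_means([5, 15, 25, 35, 45], [8, 12, 18, 8, 6])
print(f"AM = {am:.4f}, GM = {gm:.4f}, HM = {hm:.4f}")
```

For raw data, pass a frequency of 1 for every value. For positive data the three means always satisfy AM ≥ GM ≥ HM, which is a quick sanity check on hand calculations.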
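Measures of Central Tendency - Table 2
(For each attribute, the formula is given for raw or ungrouped data, a discrete frequency distribution and a continuous frequency distribution.)
1. Median
   Raw or ungrouped data: Arrange the n observations in ascending order. If the number of observations is odd, the median is the size of the $\left(\frac{n+1}{2}\right)$th item. If the number of observations is even, the median is the arithmetic mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th items.
   Discrete freq. dist.: Calculate $\frac{N}{2}$ and see the cumulative frequency (c.f.) just greater than $\frac{N}{2}$. The corresponding value of x is the median.
   Contns. freq. dist.: $\text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c$, where l is the lower limit of the median class, m is the cumulative frequency of the pre-median class, f is the frequency of the median class and c is the common length/width of the class interval.
2. Quartiles
   Raw or ungrouped data: With the n observations arranged in ascending order of magnitude, $Q_i$ = the value of the $\left(\frac{i(n+1)}{4}\right)$th observation, i = 1, 2, 3.
   Discrete freq. dist.: Calculate $\frac{iN}{4}$ and see the cumulative frequency just greater than $\frac{iN}{4}$. The corresponding value of x is $Q_i$, i = 1, 2, 3.
   Contns. freq. dist.: $Q_i = l_i + \frac{\frac{iN}{4} - m_i}{f_i} \times c$, i = 1, 2, 3, where $l_i$ is the lower limit of the $Q_i$ class, $m_i$ is the cumulative frequency of the class preceding the $Q_i$ class and $f_i$ is the frequency of the $Q_i$ class.
   Note that $Q_2$ is the median.
3. Mode
   Raw or ungrouped data: The value which occurs the maximum number of times in the given data.
   Discrete freq. dist.: The value of x having the maximum frequency.
   Contns. freq. dist.: $\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c$, where l is the lower limit of the modal class (the class with the maximum frequency), $f_1$ is its frequency, and $f_0$ and $f_2$ are the frequencies of the classes preceding and succeeding it.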
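The continuous-distribution formulas of Table 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the notes' own procedure: classes are given by their lower limits with a common width c, the rule "c.f. just greater than" is read as "the first class whose c.f. reaches the target", and a missing neighbouring class (when the modal class is the first or last one) is taken to have frequency 0. The names `_locate`, `quartile` and `mode` are hypothetical. The data are those of Examples 3.3 and 3.4.

```python
def _locate(lowers, freqs, target):
    """Return (l, m, f) for the first class whose cumulative frequency reaches target."""
    cum = 0
    for l, f in zip(lowers, freqs):
        if cum + f >= target:
            return l, cum, f  # m = c.f. of the preceding class
        cum += f
    raise ValueError("target exceeds total frequency")

def quartile(lowers, freqs, c, i):
    """Q_i = l_i + (iN/4 - m_i) / f_i * c for i = 1, 2, 3 (Q_2 is the median)."""
    target = i * sum(freqs) / 4
    l, m, f = _locate(lowers, freqs, target)
    return l + (target - m) / f * c

def mode(lowers, freqs, c):
    """Mode = l + (f1 - f0) / (2 f1 - (f0 + f2)) * c; modal class = first maximum."""
    k = freqs.index(max(freqs))
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # assumed 0 if no preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # assumed 0 if no succeeding class
    return lowers[k] + (f1 - f0) / (2 * f1 - (f0 + f2)) * c

lowers = [0, 10, 20, 30, 40]
freqs = [1, 3, 5, 4, 2]
q1, median, q3 = (quartile(lowers, freqs, 10, i) for i in (1, 2, 3))
print(q1, median, q3, mode(lowers, freqs, 10))
```

The printed values agree with $Q_1 = 19.1666$, $M_d = 27$, $Q_3 = 35.625$ and $M_0 = 26.66$ as worked out by hand in the examples.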
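Measures of Dispersion
(For each attribute, the formula is followed by the corresponding coefficient.)
1. Range
   Formula: L − S (Largest − Smallest)
   Co-efficient: $\frac{L - S}{L + S}$
2. Quartile Deviation (QD)
   Formula: $QD = \frac{Q_3 - Q_1}{2}$
   Co-efficient: $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
3. Mean Deviation (MD) about Mean, Median and Mode
   Formula: $MD = \frac{\sum_{i=1}^{n} |x_i - M|}{n}$ (raw data), $MD = \frac{\sum_{i=1}^{n} f_i |x_i - M|}{N}$ with $\sum_{i=1}^{n} f_i = N$ (frequency distribution), where M = Mean or Median or Mode
   Co-efficient: $\frac{MD}{M}$
4. Standard Deviation (SD)
   Formula: $SD = \sigma = \sqrt{Var}$, where Var = Variance, and
   $\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2}$
   $\sigma = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n} - \left(\frac{\sum_{i=1}^{n} d_i}{n}\right)^2}$
   $\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{N}\right)^2} \times c$
   Co-efficient of variation (CV) is $\frac{SD}{Mean} \times 100$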
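A minimal Python sketch of the dispersion formulas for a grouped distribution with equal class widths. It assumes the assumed mean A is one of the mid-values, takes the range from the extreme class limits, and uses the SD formula with step deviations (division by N, as in the table above). `grouped_dispersion` is a hypothetical helper name; the data are those of Examples 3.3 and 3.4.

```python
import math

def grouped_dispersion(lowers, freqs, c, a):
    """Range, mean deviation about the mean, SD and CV of a grouped distribution.

    lowers: lower class limits (equal width c); a: assumed mean A (a mid-value).
    """
    n_total = sum(freqs)                       # N
    mids = [l + c / 2 for l in lowers]         # class mid-values x_i
    data_range = (lowers[-1] + c) - lowers[0]  # L - S from the extreme class limits

    d = [(x - a) / c for x in mids]            # step deviations d_i = (x_i - A)/c
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    mean = a + sum_fd / n_total * c
    sd = math.sqrt(sum_fd2 / n_total - (sum_fd / n_total) ** 2) * c

    md_mean = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n_total
    return {"range": data_range, "mean": mean, "md_mean": md_mean,
            "sd": sd, "cv": sd / mean * 100}

stats = grouped_dispersion([0, 10, 20, 30, 40], [1, 3, 5, 4, 2], c=10, a=25)
print(stats)
```

The mean (27), MD about the mean (136/15 ≈ 9.0667) and SD (≈ 11.075) match the hand calculations in Examples 3.3 and 3.4; the CV comes out at about 41.02%.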
1 Measures of Central Tendency
One of the important objectives of statistics is to find numerical values which explain the inherent characteristics of a frequency distribution. The first such measure is the average. Averages are measures which condense a huge, unwieldy set of numerical data into single numerical values that represent the entire distribution. Averages are also called measures of location, since they enable us to locate the position or place of the distribution in question. Averages are statistical constants which enable us to comprehend, in a single value, the significance of the whole group.
Objectives of Central Tendency
The most important objective of calculating an average, or measuring central tendency, is to determine a single figure which may be used to represent a whole series involving magnitudes of the same variable. The second objective is comparison: since an average represents the entire data, it facilitates comparison within one group or between groups of data. Thus, the performance of the members of a group can be compared with the average performance of different groups. The third objective is that an average helps in computing various other statistical measures such as dispersion, skewness, kurtosis, etc.
Different methods of measuring central tendency provide us with different kinds of averages. The following are the main types of averages that are commonly used:
1. Mean
2. Median
3. Quartiles
4. Mode
2 Measures of Dispersion
Measures of central tendency (mean, median, mode, etc.) indicate the central position of a series. They indicate the general magnitude of the data but fail to reveal all the peculiarities and characteristics of the series. In other words, they fail to reveal the degree of spread or the extent of variability in the individual items of the distribution. This is explained by certain other measures, known as ‘Measures of Dispersion’ or Variation.
The simplest meaning that can be attached to the word ‘dispersion’ is a lack of uniformity in the sizes or quantities of the items of a group or series. According to Reiglemen, “Dispersion is the extent to which the magnitudes or quantities of the items differ, the degree of diversity.” The word dispersion may also be used to indicate the spread of the data. In all these definitions, we find the basic property of dispersion: a value that indicates the extent to which all other values are dispersed about the central value in a particular distribution.
Methods of studying dispersion are divided into two types:
1. Mathematical Methods: We can study the ‘degree’ and ‘extent’ of variation by these methods. In this category, the commonly used measures of dispersion are:
(i) Range
(ii) Quartile Deviation
(iii) Average/Mean Deviation about any point (the point may be the mean, median, mode or any other value)
(iv) Standard Deviation and
(v) Coefficient of Variation.
2. Graphical Methods: When we want to study only the extent of variation, whether it is higher or lower, a Lorenz curve is used.
Definition 3.0.1 (Moments about origin). The rth moment of a random variable X about the origin is defined as $E(X^r)$ and is denoted by $\mu'_r$. Moments about the origin are known as raw moments.
1. $\mu'_1 = E(X)$ = Mean
2. $\mu'_2 = E(X^2)$
3. $\mu'_3 = E(X^3)$
4. $\mu'_4 = E(X^4)$
Note: $Var(X) = E(X^2) - [E(X)]^2 = \mu'_2 - (\mu'_1)^2$ = second moment − square of the first moment.
Definition 3.0.2 (Moments about mean or Central moments). The rth moment of a random variable X about the mean µ is defined as $E[(X - \mu)^r]$ and is denoted by $\mu_r$.
The first four moments about the mean are given by
1. $\mu_1 = E(X - \mu) = E(X) - E(\mu) = \mu - \mu = 0$
2. $\mu_2 = E[(X - \mu)^2] = Var(X)$
3. $\mu_3 = E[(X - \mu)^3]$
4. $\mu_4 = E[(X - \mu)^4]$
Definition 3.0.3 (Moments about any point a). The rth moment of a random variable X about any point a is defined as $E[(X - a)^r]$ and we denote it by $m'_r$.
The first four moments about a point ‘a’ are given by
1. $m'_1 = E(X - a) = E(X) - a = \mu - a$
2. $m'_2 = E[(X - a)^2]$
3. $m'_3 = E[(X - a)^3]$
4. $m'_4 = E[(X - a)^4]$
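Relation between moments about the mean and moments about any arbitrary point a
Let $\mu_r$ be the rth moment about the mean and $m'_r$ be the rth moment about any point a. Let µ be the mean of X. Then
$\therefore \mu_r = E[(X - \mu)^r] = E\{[(X - a) - (\mu - a)]^r\} = E\{[(X - a) - m'_1]^r\}$
Expanding by the binomial theorem and taking expectations term by term (with $m'_0 = 1$),
$\mu_r = \sum_{j=0}^{r} (-1)^j \binom{r}{j} m'_{r-j} (m'_1)^j$.
In particular,
$\mu_2 = m'_2 - (m'_1)^2$
$\mu_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3$
$\mu_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4$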
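The binomial relation between the two kinds of moments, $\mu_r = \sum_{j=0}^{r}(-1)^j\binom{r}{j} m'_{r-j}(m'_1)^j$ with $m'_0 = 1$, is easy to check in Python. The sketch below is illustrative only: `central_from_about_point` is a hypothetical name, and the input values are the moments about a = 25 computed in Example 3.5 ($m'_1 = 2$, $m'_2 = 126.66667$, $m'_3 = 600$, $m'_4 = 36666.667$), written here as the exact fractions 380/3 and 110000/3.

```python
from math import comb

def central_from_about_point(m):
    """Central moments mu_0..mu_k from moments about a point a.

    m = [m'_0, m'_1, ..., m'_k] with m'_0 = 1; applies
    mu_r = sum_{j=0}^{r} (-1)^j C(r, j) m'_{r-j} (m'_1)^j.
    """
    m1 = m[1]
    return [sum((-1) ** j * comb(r, j) * m[r - j] * m1 ** j for j in range(r + 1))
            for r in range(len(m))]

# Moments about a = 25 from Example 3.5
m_about_25 = [1, 2, 380 / 3, 600, 110000 / 3]
mu = central_from_about_point(m_about_25)
print([round(v, 3) for v in mu])  # mu_0 .. mu_4
```

The result gives $\mu_1 = 0$, $\mu_2 = 122.667$, $\mu_3 = -144$ and $\mu_4 = 34858.667$, the same as the special cases $\mu_2 = m'_2 - (m'_1)^2$, $\mu_3 = m'_3 - 3m'_2m'_1 + 2(m'_1)^3$ and $\mu_4 = m'_4 - 4m'_3m'_1 + 6m'_2(m'_1)^2 - 3(m'_1)^4$.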
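Formulae:
1. Measures of skewness and kurtosis based on moments:
$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$, $\gamma_1 = \sqrt{\beta_1}$ (taken with the sign of $\mu_3$) and $\beta_2 = \frac{\mu_4}{\mu_2^2}$, $\gamma_2 = \beta_2 - 3$
2. Karl Pearson’s Coefficient of Skewness
$Sk = \frac{\text{Mean} - \text{Mode}}{SD} = \frac{\bar{x} - \text{Mode}}{\sigma}$
Sometimes the mode may not be properly defined for the given data; in that case
$Sk = \frac{3(\text{Mean} - \text{Median})}{SD} = \frac{3(\bar{x} - M_d)}{\sigma}$
The limits for Karl Pearson’s coefficient of skewness are ±3. In practice, these limits are rarely attained.
3. Bowley’s Coefficient of Skewness
$Sk = \frac{(Q_3 - M_d) - (M_d - Q_1)}{(Q_3 - M_d) + (M_d - Q_1)} = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$
Note 3.0.1. The range of Bowley’s coefficient of skewness is between -1 and +1.
4. Coefficient of Skewness based on moments
$Sk = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$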
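The skewness and kurtosis formulas above translate directly into small Python functions. This is a sketch, not a library API: the function names are hypothetical, and γ1 is returned with the sign of µ3 (the usual convention, assumed here). The sample inputs are the Example 3.4 values (mean 27, mode 80/3 ≈ 26.6667, SD 11.0755, $Q_1$ = 115/6 ≈ 19.1667, $Q_3$ = 35.625) and the central moments of the same data, µ2 = 368/3 ≈ 122.667, µ3 = −144 and µ4 = 104576/3 ≈ 34858.667, which reproduce the β1 and β2 of Example 3.5.

```python
import math

def karl_pearson(mean, mode, sd):
    """Sk = (Mean - Mode) / SD."""
    return (mean - mode) / sd

def karl_pearson_median(mean, median, sd):
    """Sk = 3 (Mean - Median) / SD, used when the mode is ill-defined."""
    return 3 * (mean - median) / sd

def bowley(q1, median, q3):
    """Sk = (Q3 + Q1 - 2 Md) / (Q3 - Q1); lies between -1 and +1."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def moment_measures(mu2, mu3, mu4):
    """beta1, beta2, gamma1 (sign of mu3), gamma2 and the moment-based Sk."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    gamma1 = math.copysign(math.sqrt(beta1), mu3)
    gamma2 = beta2 - 3
    sk = math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))
    return beta1, beta2, gamma1, gamma2, sk

print(round(karl_pearson(27, 80 / 3, 11.0755), 4))
print(round(bowley(115 / 6, 27, 35.625), 4))
print([round(v, 6) for v in moment_measures(368 / 3, -144, 104576 / 3)])
```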
EXAMPLE 3.1
The bus fare of 7 selected B.Sc. students is recorded as follows (Rs.): 10, 5, 15, 8, 6, 14 and 12. Calculate the arithmetic mean of this data.
Hints/Solution:
Let the bus fare be denoted by x. First, arrange the values in ascending order. Then we have
Bus fare x:   5   6   8   10   12   14   15      $\sum x = 70$
$\bar{x} = \frac{\sum x}{n} = \frac{70}{7} = 10$
EXAMPLE 3.2
Calculate the Geometric Mean (GM) and Harmonic Mean (HM) for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 8 12 18 8 6
Hints/Solution:
C.I.       Mid x   f     1/x         f/x
0 − 10     5       8     0.2         1.6
10 − 20    15      12    0.0666667   0.8
20 − 30    25      18    0.04        0.72
30 − 40    35      8     0.0285714   0.228571
40 − 50    45      6     0.0222222   0.133333
Total              52    0.35746     3.4819
Here $\sum f_i \log x_i = 67.1394$ and $N = \sum f_i = 52$, so $\frac{\sum f_i \log x_i}{N} = 1.2911$.
$GM = \text{Antilog}\left[\frac{\sum_{i=1}^{n} f_i \log x_i}{\sum_{i=1}^{n} f_i}\right]$ (OR) $10^{\left[\sum f_i \log x_i / N\right]}$, with $\sum_{i=1}^{n} f_i = N$
$= \text{Antilog}[1.2911]$ (OR) $10^{[1.2911]} = 19.549968$
$HM = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}} = \frac{52}{3.4819} = 14.934354$
EXAMPLE 3.3
Calculate the mean deviation about the mean, median and mode for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 1 3 5 4 2
Hints/Solution:
C.I.      Mid x   f    cf   d = |x − x̄| = |x − Md|   d1 = |x − M0|   fd    fd1
0 − 10    5       1    1    22                       21.7            22    21.7
10 − 20   15      3    4    12                       11.7            36    35.1
20 − 30   25      5    9    2                        1.7             10    8.5
30 − 40   35      4    13   8                        8.3             32    33.2
40 − 50   45      2    15   18                       18.3            36    36.6
Total             15        62                       61.7            136   135.1
Mean: $\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 25 + \frac{3}{15} \times 10 = 27$ (using the step deviations $d_i = (x_i - 25)/10$, as in Example 3.4)
Second Quartile = Median: $M_d = Q_2 = l + \frac{\frac{N}{2} - m}{f} \times c = 20 + \frac{7.5 - 4}{5} \times 10 = 27$
Mode: $M_0 = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c = 20 + \frac{5 - 3}{10 - 7} \times 10 = 26.66 \approx 26.7$
Mean Deviation (MD) about the mean $\bar{x}$: $\frac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{\sum_{i=1}^{n} f_i} = \frac{136}{15} = 9.0666$
Mean Deviation (MD) about the median $M_d$: $\frac{\sum_{i=1}^{n} f_i |x_i - M_d|}{\sum_{i=1}^{n} f_i} = \frac{136}{15} = 9.0666$
Mean Deviation (MD) about the mode $M_0$: $\frac{\sum_{i=1}^{n} f_i |x_i - M_0|}{\sum_{i=1}^{n} f_i} = \frac{135.1}{15} = 9.00666$
EXAMPLE 3.4
Calculate Karl Pearson’s and Bowley’s coefficients of skewness for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 1 3 5 4 2
Hints/Solution:
C.I.      Mid x   f    cf               d = (x − 25)/10   fd    fd²
0 − 10    5       1    1                −2                −2    4
10 − 20   15      3    4 = Q1 class     −1                −3    3
20 − 30   25      5    9                0                 0     0
30 − 40   35      4    13 = Q3 class    1                 4     4
40 − 50   45      2    15               2                 4     8
Total             15                    0                 3     19
Mean: $\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 25 + \frac{3}{15} \times 10 = 27$
Second Quartile = Median: $M_d = Q_2 = l + \frac{\frac{N}{2} - m}{f} \times c = 27$
Mode: $M_0 = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c = 26.66$
First Quartile: $Q_1 = l + \frac{\frac{N}{4} - m}{f} \times c = 10 + \frac{3.75 - 1}{3} \times 10 = 19.1666$
Third Quartile: $Q_3 = l + \frac{\frac{3N}{4} - m}{f} \times c = 30 + \frac{11.25 - 9}{4} \times 10 = 35.625$
$SD(\sigma) = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{N}\right)^2} \times c = \sqrt{\frac{19}{15} - \left(\frac{3}{15}\right)^2} \times 10 = 11.075$
Karl Pearson’s coefficient of skewness: $Sk = \frac{\text{Mean} - \text{Mode}}{SD} = \frac{\bar{x} - M_0}{\sigma} = \frac{27 - 26.6667}{11.075} = 0.0301$
The limits for Karl Pearson’s coefficient of skewness are ±3. In practice, these limits are rarely attained.
Bowley’s coefficient of skewness: $Sk = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1} = \frac{35.625 + 19.1666 - 2(27)}{35.625 - 19.1666} = \frac{0.7916}{16.4584} = 0.0481$
Note 3.0.2. The range of Bowley’s coefficient of skewness is between -1 and +1.
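EXAMPLE 3.5
Calculate the coefficient of skewness based on moments, the measure of skewness β1 and the measure of kurtosis β2 for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 1 3 5 4 2
Hints/Solution:
C.I.      Mid x   f    cf   d = x − 25   fd    fd²    fd³     fd⁴
0 − 10    5       1    1    −20          −20   400    −8000   160000
10 − 20   15      3    4    −10          −30   300    −3000   30000
20 − 30   25      5    9    0            0     0      0       0
30 − 40   35      4    13   10           40    400    4000    40000
40 − 50   45      2    15   20           40    800    16000   320000
Total             15        0            30    1900   9000    550000
OR
C.I.      Mid x   f       cf                           d = (x − 25)/10   fd    fd²   fd³   fd⁴
0 − 10    5       1       1                            −2                −2    4     −8    16
10 − 20   15      3       4 = m                        −1                −3    3     −3    3
20 − 30   25      5 = f   9 → Median class (l = 20)    0                 0     0     0     0
30 − 40   35      4       13                           1                 4     4     4     4
40 − 50   45      2       15                           2                 4     8     16    32
Total     125     15                                   0                 3     19    9     55
Mean: $\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 27$
Median: $M_d = l + \frac{\frac{N}{2} - m}{f} \times c = 27$
Mode: $M_0 = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c = 26.66$
Moments about the point a = 25 are given below (with $d_i = (x_i - 25)/10$ and c = 10):
$m'_1 = \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 2$
$m'_2 = \frac{\sum_{i=1}^{n} f_i d_i^2}{N} \times c^2 = 126.66667$
$m'_3 = \frac{\sum_{i=1}^{n} f_i d_i^3}{N} \times c^3 = 600$
$m'_4 = \frac{\sum_{i=1}^{n} f_i d_i^4}{N} \times c^4 = 36666.667$
Using the following relations along with $m'_0 = 1$, we get the moments about the mean as follows:
$\mu_1 = m'_1 - m'_0 m'_1 = 0$
$\mu_2 = m'_2 - {}^2C_1\, m'_1 \cdot m'_1 + (m'_1)^2 m'_0 = m'_2 - (m'_1)^2 = 122.66667$
$\mu_3 = m'_3 - 3m'_2 m'_1 + 2(m'_1)^3 = 600 - 760 + 16 = -144$
$\mu_4 = m'_4 - 4m'_3 m'_1 + 6m'_2 (m'_1)^2 - 3(m'_1)^4 = 36666.667 - 4800 + 3040 - 48 = 34858.667$
Now, the measure of skewness $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0.0112343$ and the measure of kurtosis $\beta_2 = \frac{\mu_4}{\mu_2^2} = 2.3166352$.
Coefficient of skewness based on moments: $Sk = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} = \frac{0.10599 \times 5.31664}{2 \times 2.51577} \approx 0.112$ (in magnitude).
Since $\beta_1 = \mu_3^2/\mu_2^3$ is never negative, it measures only the extent of skewness; the direction is given by the sign of $\mu_3$. Here $\mu_3 = -144 < 0$, so $\gamma_1 = -\sqrt{\beta_1} \approx -0.106$ and the moment measure indicates a slight negative skewness. Since β1 is close to 0, the distribution is nearly symmetric (Karl Pearson’s coefficient for the same data in Example 3.4 is also very small, 0.0301). Since $\beta_2 < 3$, the distribution is platykurtic.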
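As a cross-check on Example 3.5, the central moments can also be computed directly from the mid-values, without an assumed mean. The following is a minimal Python sketch, assuming the mid-values 5, 15, ..., 45 represent their classes; the variable and function names are illustrative only. It reproduces β1 and β2 and reports the sign of µ3, which is what decides the direction of skewness (β1, being a ratio involving µ3², is never negative).

```python
mids = [5, 15, 25, 35, 45]   # mid-values of 0-10, ..., 40-50
freqs = [1, 3, 5, 4, 2]
n_total = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n_total

def central_moment(r):
    """mu_r = (1/N) * sum of f_i (x_i - mean)^r."""
    return sum(f * (x - mean) ** r for f, x in zip(freqs, mids)) / n_total

mu2, mu3, mu4 = (central_moment(r) for r in (2, 3, 4))
beta1 = mu3 ** 2 / mu2 ** 3          # never negative
beta2 = mu4 / mu2 ** 2
direction = "negative" if mu3 < 0 else "positive" if mu3 > 0 else "no"
print(f"mean={mean}, mu2={mu2:.5f}, mu3={mu3:.3f}, mu4={mu4:.3f}")
print(f"beta1={beta1:.7f}, beta2={beta2:.7f}, {direction} skewness (sign of mu3)")
```

Computing the moments directly about the mean avoids the conversion step, while the assumed-mean route of the example keeps the hand arithmetic small; both give the same µ2, µ3 and µ4.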
EXAMPLE 3.6
Calculate the geometric mean of the following data:
x 1 7 29 92 115 375
Hints/Solution:
Ans.: $\sum \log x = 8.9060$, N = 6, Antilog(1.4843) = 30.50
EXAMPLE 3.7
Calculate the geometric mean of the following data:
x 2574 475 75 5 0.8 0.08 0.005 0.0009
Hints/Solution:
Ans.: $\sum \log x = 2.1208$, N = 8, Antilog(0.2651) = 1.841
EXAMPLE 3.8
Calculate the geometric mean of the following data:
x 1000 80 40 750 100 150 120 60
f 1 50 25 2 3 4 3 5
Hints/Solution:
Ans.: $\sum f \log x = 173.7907$, N = 93, Antilog(1.8687) = 73.91
EXAMPLE 3.9
f 5 15 25 35 45
Hints/Solution:
Ans.: $\sum f \log m = 67.1394$, N = 52, Antilog(1.2911) = 19.55
EXAMPLE 3.10
Calculate the harmonic mean of the following data:
x 1 0.5 10 45 175 0.01 4 11.2
Hints/Solution:
Ans.: $\sum \frac{1}{x} = 103.4672$, N = 8, HM = 0.077
EXAMPLE 3.11
Calculate the harmonic mean of the following data:
x 10 20 25 40 50
f 20 30 50 15 5
Hints/Solution:
Ans.: $\sum f \frac{1}{x} = 5.975$, N = 120, HM = 20.08
EXAMPLE 3.12
Calculate the harmonic mean of the following data:
x 0-10 10-20 20-30 30-40 40-50
f 8 12 20 6 4
Hints/Solution:
Ans.: $\sum f \frac{1}{x} = 3.46$, N = 50, HM = 14.45
EXAMPLE 3.13
Calculate the mean deviation about (i) the mean, (ii) the median and (iii) the mode, and (iv) the coefficients of mean deviation about the mean, median and mode for the following data:
Hints/Solution:
EXAMPLE 3.14
Calculate the range, the quartile deviation and the coefficient of quartile deviation for the following data:
x 0-5 5-10 10-15 15-20 20-30 30-40 40-50 50-60 60-70
f 3 5 8 12 34 46 28 14 10
Hints/Solution:
Hint: To make the class intervals uniform, rearrange them as 0-10, 10-20, 20-30, . . . by adding the corresponding frequencies.
Ans.: Range = 70 − 0 = 70, $Q_1 = 23.53$, $Q_2$ = Median = 33.91, $Q_3 = 44.29$, Q.D. = 10.38, Coefficient of Q.D. = 0.31
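The regrouping suggested in the hint, followed by the quartile formula, can be sketched in Python as below. The helper names `regroup` and `quartile` are hypothetical; `regroup` assumes every original class lies wholly inside one new class, and `quartile` reads "c.f. just greater than" as "the first class whose c.f. reaches iN/4". The computed values agree with the answers stated above.

```python
def regroup(limits, freqs, edges):
    """Add up the frequencies of the original classes lying inside each new class."""
    return [sum(f for (a, b), f in zip(limits, freqs) if lo <= a and b <= hi)
            for lo, hi in zip(edges, edges[1:])]

def quartile(edges, freqs, i):
    """Q_i = l + (iN/4 - m) / f * c, where c is the width of the Q_i class."""
    target = i * sum(freqs) / 4
    cum = 0  # cumulative frequency of the classes before the current one
    for lo, hi, f in zip(edges, edges[1:], freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    raise ValueError("target exceeds total frequency")

limits = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 30),
          (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [3, 5, 8, 12, 34, 46, 28, 14, 10]
edges = list(range(0, 80, 10))          # 0, 10, 20, ..., 70
grouped = regroup(limits, freqs, edges)

data_range = edges[-1] - edges[0]       # L - S = 70 - 0
q1, q2, q3 = (quartile(edges, grouped, i) for i in (1, 2, 3))
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(grouped, data_range, round(q1, 2), round(q2, 2), round(q3, 2),
      round(qd, 2), round(coeff_qd, 2))
```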
EXAMPLE 3.15
Calculate the range, quartile deviation, standard deviation and variance, and the coefficients of quartile deviation, standard deviation and variation for the following data:
Hints/Solution:
Ans.: Range = 165.5 − 125.5 = 40, $Q_1 = 137.06$, $Q_3 = 153.77$, Q.D. = 8.355, $\bar{x} = 145.53$, S.D. = σ = 10.28, Coefficient of variation = 7.06
EXAMPLE 3.16
The scores of two players A and B in 12 rounds are given below. Identify the better/more consistent player.
A 74 75 78 72 78 77 79 81 79 76 72 71
B 87 84 80 88 89 85 86 82 82 79 86 80
Hints/Solution:
Hint: Use the coefficient of variation for this problem. The player with the smaller coefficient of variation is the more consistent player.
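Following the hint, a minimal Python sketch that computes the coefficient of variation for both players. It uses `statistics.pstdev` (population SD, dividing by n, which matches the SD formula in the notes); the helper name `cv` is hypothetical. With these scores, B's CV (about 3.83%) comes out smaller than A's (about 4.06%).

```python
import statistics

scores = {
    "A": [74, 75, 78, 72, 78, 77, 79, 81, 79, 76, 72, 71],
    "B": [87, 84, 80, 88, 89, 85, 86, 82, 82, 79, 86, 80],
}

def cv(values):
    """Coefficient of variation = SD / Mean * 100 (population SD, divisor n)."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

cvs = {player: cv(vals) for player, vals in scores.items()}
for player, value in cvs.items():
    print(f"Player {player}: mean = {statistics.mean(scores[player])}, CV = {value:.2f}%")

consistent = min(cvs, key=cvs.get)   # smaller CV => more consistent
print("More consistent player:", consistent)
```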
EXAMPLE 3.17
Hints/Solution:
EXAMPLE 3.18
Calculate Karl Pearson’s and Bowley’s coefficients of skewness, the measure of skewness β1 and the measure of kurtosis β2 for the following data:
x 0-10 10-20 20-30 30-40 40-50 50-60
f 5 20 15 45 10 15
Hints/Solution:
Ans.: N = 100, $\sum fd = -50$, $\sum fd^2 = 170$, $\sum fd^3 = -260$, $m'_1 = -5$, $m'_2 = 170$, $m'_3 = -2$
Practice more problems from some of the reference books.
Acknowledgement:
Some portions of this material are taken from various available sources. I thank the authors who prepared the calculus books and related materials.
Contact: (+91) 979 111 666 3 (or) athithan.s@ktr.srmuniv.ac.in
Visit: https://sites.google.com/site/lecturenotesofathithans/home