Stat For Knowledge

Lecture Notes
15MA305-Statistics for Information Technology
Prepared by
Dr. S. ATHITHAN
Assistant Professor
Department of Mathematics
Faculty of Engineering and Technology
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
Kattankulathur-603203, Kancheepuram District.
Unit-1
INTRODUCTION TO STATISTICS
(numerical problems only)
UNIT-1 TOPICS:
• Introduction to uni-variate data
• Measures of central tendency: Arithmetic mean, Median: Definition, Problems
• Mode, Geometric Mean and Harmonic Mean: Definition, Problems
• Measures of dispersion: Range, Quartile deviation, Mean deviation: Definition, Problems
• Standard deviation and Co-efficient of variation: Definition, Problems
• Skewness: Definition, Problems
• Kurtosis and Moments: Definition, Problems
Contents
1 Measures of Central Tendency
2 Measures of Dispersion
3 Moments, Measures of Skewness and Kurtosis
Dear All, I have solved only a few problems here, and some topics may be missing. Please follow the classwork to cover all the topics for your preparation. Take the exercise problems given at the end for your practice. Apart from the exercises, you can follow any reference book for your practice.
Some of the sections/topics in these units are preliminary ideas, which are the basics needed to do our regular course examples and exercises.
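Measures of Central Tendency - Table 1
For each attribute the formula is listed for raw or ungrouped data, for a discrete frequency distribution and for a continuous frequency distribution. Here $N = \sum_{i=1}^{n} f_i$ and, for a continuous distribution, $x_i$ is the mid-value of the $i$-th class.

1. Arithmetic Mean $\bar{x}$
   Raw or ungrouped data: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
   Discrete freq. dist.: $\bar{x} = \dfrac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \dfrac{\sum_{i=1}^{n} f_i x_i}{N}$
   Continuous freq. dist.: $\bar{x} = \dfrac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \dfrac{\sum_{i=1}^{n} f_i x_i}{N}$
   With an assumed mean $A$:
   Raw or ungrouped data: $\bar{x} = A + \dfrac{\sum_{i=1}^{n} d_i}{n}$
   Discrete freq. dist.: $\bar{x} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{N}$
   Continuous freq. dist.: $\bar{x} = A + \dfrac{\sum_{i=1}^{n} f_i d_i}{N} \times c$
   A - Assumed Mean, $d_i = x_i - A$, c - common length/width of the class interval. In the continuous form, which carries the factor $c$, the deviations are taken in class-width units, $d_i = (x_i - A)/c$.

2. Geometric Mean GM
   Raw or ungrouped data: $GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \mathrm{Antilog}\left(\dfrac{\sum_{i=1}^{n} \log x_i}{n}\right)$
   Discrete freq. dist.: $GM = \left(\prod_{i=1}^{n} x_i^{f_i}\right)^{1/N} = \mathrm{Antilog}\left(\dfrac{\sum_{i=1}^{n} f_i \log x_i}{N}\right)$
   Continuous freq. dist.: $GM = \left(\prod_{i=1}^{n} x_i^{f_i}\right)^{1/N} = \mathrm{Antilog}\left(\dfrac{\sum_{i=1}^{n} f_i \log x_i}{N}\right)$

3. Harmonic Mean HM
   Raw or ungrouped data: $HM = \dfrac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
   Discrete freq. dist.: $HM = \dfrac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$
   Continuous freq. dist.: $HM = \dfrac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$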
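The direct and assumed-mean forms of the arithmetic mean in Table 1 can be checked against each other with a short sketch. The function names and the small data set below are illustrative assumptions, not part of the notes; the step-deviation version takes $d_i = (x_i - A)/c$.

```python
def grouped_mean(mids, freqs):
    """Direct method: sum(f * x) / N."""
    n_total = sum(freqs)
    return sum(f * x for x, f in zip(mids, freqs)) / n_total


def grouped_mean_step(mids, freqs, assumed, width):
    """Step-deviation method: A + (sum(f * d) / N) * c, with d = (x - A) / c."""
    n_total = sum(freqs)
    sum_fd = sum(f * (x - assumed) / width for x, f in zip(mids, freqs))
    return assumed + sum_fd / n_total * width


# Illustrative data (assumed): class mid-values and frequencies
mids = [5, 15, 25, 35, 45]
freqs = [1, 3, 5, 4, 2]

direct = grouped_mean(mids, freqs)
shortcut = grouped_mean_step(mids, freqs, assumed=25, width=10)
print(direct, shortcut)
```

Both calls should agree for any choice of assumed mean, since the assumed mean only shifts the arithmetic.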
Measures of Central Tendency - Table 2
For each attribute the rule or formula is listed for raw or ungrouped data, for a discrete frequency distribution and for a continuous frequency distribution, with $N = \sum_{i=1}^{n} f_i$.

1. Median
   Raw or ungrouped data: If the number of observations $n$ is odd, the median is the size of the $\left(\frac{n+1}{2}\right)$-th item or value. If the number of observations is even, the median is the arithmetic mean of the $\left(\frac{n}{2}\right)$-th and $\left(\frac{n}{2}+1\right)$-th items.
   Discrete freq. dist.: Calculate $\frac{N}{2}$ and see the cumulative frequency (c.f.) just greater than $\frac{N}{2}$. The corresponding value of $x$ is the median.
   Continuous freq. dist.:
   \[ \text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c \]
   where $l$ is the lower limit of the median class, $m$ is the cumulative frequency of the pre-median class, $f$ is the frequency of the median class and $c$ is the common length/width of the class interval.

2. Quartiles
   Raw or ungrouped data: For a set of observations arranged in ascending order of magnitude, $Q_i$ is the $\left(i \times \frac{n+1}{4}\right)$-th observation or value, $i = 1, 2, 3$. $Q_i$ is the $i$-th quartile, with $n$ being the total number of observations. The second quartile is the median.
   Discrete freq. dist.: $Q_i$ is the value of $x$ whose cumulative frequency is just greater than $i \times \frac{N+1}{4}$, $i = 1, 2, 3$, with $N = \sum_{i=1}^{n} f_i$.
   Continuous freq. dist.:
   \[ Q_i = l_i + \frac{i \times \frac{N}{4} - m_i}{f_i} \times c, \quad i = 1, 2, 3 \]
   where $l_i$ is the lower limit of the $Q_i$ class, $m_i$ is the cumulative frequency of the preceding class, $f_i$ is the frequency of the $Q_i$ class and $c$ is the common length/width of the class interval.

3. Mode
   Raw or ungrouped data: The value which occurs the maximum number of times in the given data.
   Discrete freq. dist.: In general, the value of $x$ corresponding to the maximum frequency. But if any one of the following cases occurs, we obtain the mode by the method of grouping: (i) if the maximum frequency is repeated, (ii) if the maximum frequency occurs at the very beginning or at the end of the distribution and (iii) if there are irregularities in the distribution.
   Continuous freq. dist.:
   \[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c \]
   where $l$ is the lower limit of the modal class, $f_0$ is the frequency of the pre-modal class, $f_1$ is the frequency of the modal class, $f_2$ is the frequency of the post-modal class and $c$ is the common length/width of the class interval.
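The continuous-distribution formulas for the median, quartiles and mode share one pattern: locate the class, then interpolate inside it. The following is a minimal sketch under stated assumptions: equal class widths, a single interior modal class (so the method of grouping is not needed), and illustrative data and function names that are not part of the notes.

```python
def grouped_quantile(lowers, freqs, width, fraction):
    """Return l + (fraction * N - m) / f * c for the class that contains
    the position fraction * N (0.5 gives the median, 0.25 and 0.75 give Q1, Q3)."""
    n_total = sum(freqs)
    target = fraction * n_total
    cum = 0  # cumulative frequency m of the classes before the current one
    for lower, f in zip(lowers, freqs):
        if cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("fraction must lie in (0, 1]")


def grouped_mode(lowers, freqs, width):
    """Return l + (f1 - f0) / (2 * f1 - (f0 + f2)) * c for the modal class.
    Assumes one clear interior maximum; missing neighbours count as 0."""
    k = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lowers[k] + (f1 - f0) / (2 * f1 - (f0 + f2)) * width


# Illustrative data (assumed): classes 0-10, 10-20, ..., 40-50
lowers = [0, 10, 20, 30, 40]
freqs = [1, 3, 5, 4, 2]

median = grouped_quantile(lowers, freqs, 10, 0.5)
q1 = grouped_quantile(lowers, freqs, 10, 0.25)
q3 = grouped_quantile(lowers, freqs, 10, 0.75)
mode = grouped_mode(lowers, freqs, 10)
print(median, q1, q3, mode)
```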
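Measures of Dispersion
Each entry gives the formula of the measure followed by its coefficient.

1. Range
   Formula: $L - S$ (Largest - Smallest)
   Coefficient: $\dfrac{L - S}{L + S}$

2. Quartile Deviation (QD)
   Formula: $\dfrac{Q_3 - Q_1}{2}$
   Coefficient: $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$

3. Mean Deviation (MD) about Mean, Median and Mode
   Formula (raw data): $\dfrac{\sum_{i=1}^{n} |x_i - M|}{n}$
   Formula (frequency distribution): $\dfrac{\sum_{i=1}^{n} f_i |x_i - M|}{\sum_{i=1}^{n} f_i = N}$
   where $M$ is the Mean or Median or Mode.
   Coefficient: $\dfrac{MD}{M}$

4. Standard Deviation (SD), $SD = \sigma = \sqrt{Var}$, where $Var$ is the variance
   Raw data: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} x_i}{n}\right)^2}$
   Raw data with an assumed mean: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} d_i^2}{n} - \left(\dfrac{\sum_{i=1}^{n} d_i}{n}\right)^2}$
   Frequency distribution: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} f_i d_i^2}{N} - \left(\dfrac{\sum_{i=1}^{n} f_i d_i}{N}\right)^2} \times c$
   Coefficient: the Coefficient of Variation (CV) is $\dfrac{SD}{Mean} \times 100$
   A - Assumed Mean, $d_i = x_i - A$, c - size of the class interval. In the form that carries the factor $c$, the deviations are taken in class-width units, $d_i = (x_i - A)/c$.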
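As a rough illustration of the standard deviation and coefficient of variation rows, the sketch below applies the step-deviation formula to a small grouped data set. The data, the assumed mean and the function name are assumptions chosen for illustration only.

```python
import math


def grouped_sd(mids, freqs, assumed, width):
    """sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2) * c, with d = (x - A) / c."""
    n_total = sum(freqs)
    d = [(x - assumed) / width for x in mids]
    mean_d = sum(f * di for f, di in zip(freqs, d)) / n_total
    mean_d2 = sum(f * di * di for f, di in zip(freqs, d)) / n_total
    return math.sqrt(mean_d2 - mean_d ** 2) * width


# Illustrative data (assumed): class mid-values and frequencies
mids = [5, 15, 25, 35, 45]
freqs = [1, 3, 5, 4, 2]

mean = sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)
sd = grouped_sd(mids, freqs, assumed=25, width=10)
cv = sd / mean * 100  # coefficient of variation, in percent
print(round(mean, 4), round(sd, 4), round(cv, 4))
```

The standard deviation does not depend on the assumed mean, so changing `assumed` is a quick sanity check on the arithmetic.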
1 Measures of Central Tendency
One of the important objectives of statistics is to find numerical values that explain the inherent characteristics of a frequency distribution. The first of such measures is the average. Averages are measures that condense a huge, unwieldy set of numerical data into single numerical values which represent the entire distribution. Averages are also called measures of location, since they enable us to locate the position or place of the distribution in question. Averages are statistical constants which enable us to comprehend in a single value the significance of the whole group.
Objectives of Central Tendency
The most important objective of calculating an average or measuring central tendency is to determine a single figure which may be used to represent a whole series involving magnitudes of the same variable. The second objective is that, since an average represents the entire data, it facilitates comparison within one group or between groups of data. Thus, the performance of the members of a group can be compared with the average performance of different groups. The third objective is that an average helps in computing various other statistical measures such as dispersion, skewness, kurtosis etc.
Different methods of measuring central tendency provide us with different kinds of averages. The following are the main types of averages that are commonly used:
1. Mean
(a) Arithmetic mean
(b) Weighted mean
(c) Geometric mean
(d) Harmonic mean
2. Median
3. Quartiles
4. Mode
2 Measures of Dispersion
Measures of central tendency (Mean, Median, Mode, etc.) indicate the central position of a series. They indicate the general magnitude of the data but fail to reveal all the peculiarities and characteristics of the series. In other words, they fail to reveal the degree of spread or the extent of variability in the individual items of the distribution. This can be explained by certain other measures, known as ‘Measures of Dispersion’ or Variation.
The simplest meaning that can be attached to the word ‘dispersion’ is a lack of uniformity in the sizes or quantities of the items of a group or series. According to Reiglemen, “Dispersion is the extent to which the magnitudes or quantities of the items differ, the degree of diversity.” The word dispersion may also be used to indicate the spread of the data. In all these definitions, we can find the basic property of dispersion as a value that indicates the extent to which all other values are dispersed about the central value in a particular distribution.
Methods of studying dispersion are divided into two types:
1. Mathematical Methods: We can study the ‘degree’ and ‘extent’ of variation by these methods. In this category, the commonly used measures of dispersion are:
(i) Range
(ii) Quartile Deviation
(iii) Average/Mean Deviation about any point (the point may be the mean, median, mode or any other)
(iv) Standard deviation and
(v) Coefficient of variation.
2. Graphical Methods: When we want to study only the extent of variation, whether it is higher or lower, a Lorenz curve is used.
3 Moments, Measures of Skewness and Kurtosis

Definition 3.0.1 (Moments about origin). The $r$-th moment of a random variable $X$ about the origin is defined as $E(X^r)$ and is denoted by $\mu'_r$. Moments about origin are known as raw moments.
Note: By moments we mean the moments about origin or raw moments.
The first four moments about the origin are given by
1. $\mu'_1 = E(X)$ = Mean
2. $\mu'_2 = E(X^2)$
3. $\mu'_3 = E(X^3)$
4. $\mu'_4 = E(X^4)$
Note: $Var(X) = E(X^2) - [E(X)]^2 = \mu'_2 - (\mu'_1)^2$ = Second moment - square of the first moment.
Definition 3.0.2 (Moments about mean or Central moments). The $r$-th moment of a random variable $X$ about the mean $\mu$ is defined as $E[(X - \mu)^r]$ and is denoted by $\mu_r$.
The first four moments about the mean are given by
1. $\mu_1 = E(X - \mu) = E(X) - E(\mu) = \mu - \mu = 0$
2. $\mu_2 = E[(X - \mu)^2] = Var(X)$
3. $\mu_3 = E[(X - \mu)^3]$
4. $\mu_4 = E[(X - \mu)^4]$
Definition 3.0.3 (Moments about any point a). The $r$-th moment of a random variable $X$ about any point $a$ is defined as $E[(X - a)^r]$ and we denote it by $m'_r$.
The first four moments about a point ‘a’ are given by
1. $m'_1 = E(X - a) = E(X) - a = \mu - a$
2. $m'_2 = E[(X - a)^2]$
3. $m'_3 = E[(X - a)^3]$
4. $m'_4 = E[(X - a)^4]$
Relation between moments about the mean and moments about any arbitrary point a
Let $\mu_r$ be the $r$-th moment about the mean and $m'_r$ be the $r$-th moment about any point $a$. Let $\mu$ be the mean of $X$.
\begin{align*}
\therefore \mu_r &= E[(X - \mu)^r] \\
&= E[(X - a) - (\mu - a)]^r \\
&= E[(X - a) - m'_1]^r \\
&= E\left[(X - a)^r - {}^rC_1 (X - a)^{r-1} m'_1 + {}^rC_2 (X - a)^{r-2} (m'_1)^2 - \cdots + (-1)^r (m'_1)^r\right] \\
&= E(X - a)^r - {}^rC_1 E(X - a)^{r-1} m'_1 + {}^rC_2 E(X - a)^{r-2} (m'_1)^2 \\
&\quad - {}^rC_3 E(X - a)^{r-3} (m'_1)^3 + {}^rC_4 E(X - a)^{r-4} (m'_1)^4 - \cdots + (-1)^r (m'_1)^r \\
&= m'_r - {}^rC_1 m'_{r-1} m'_1 + {}^rC_2 m'_{r-2} (m'_1)^2 - {}^rC_3 m'_{r-3} (m'_1)^3 + {}^rC_4 m'_{r-4} (m'_1)^4 \\
&\quad - \cdots + (-1)^r (m'_1)^r
\end{align*}
We define (fix) $m'_0 = 1$; then we have
\begin{align*}
\mu_1 &= m'_1 - m'_0 m'_1 = 0 \\
\mu_2 &= m'_2 - {}^2C_1 m'_1 \cdot m'_1 + (m'_1)^2 \\
&= m'_2 - (m'_1)^2 \\
\mu_3 &= m'_3 - {}^3C_1 m'_2 \cdot m'_1 + {}^3C_2 m'_1 \cdot (m'_1)^2 - (m'_1)^3 \\
&= m'_3 - 3 m'_2 \cdot m'_1 + 2 (m'_1)^3 \\
\mu_4 &= m'_4 - {}^4C_1 m'_3 \cdot m'_1 + {}^4C_2 m'_2 \cdot (m'_1)^2 - {}^4C_3 m'_1 \cdot (m'_1)^3 + (m'_1)^4 \\
&= m'_4 - 4 m'_3 \cdot m'_1 + 6 m'_2 \cdot (m'_1)^2 - 3 (m'_1)^4
\end{align*}
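The binomial relation above translates directly into a few lines of code. This is only a sketch: the function name is hypothetical, and the input values are the moments about $a = 25$ that appear in Example 3.5 of these notes, used here to confirm that the general formula reproduces the closed forms for $\mu_2$, $\mu_3$ and $\mu_4$.

```python
from math import comb


def central_from_raw(raw):
    """raw[r] holds m'_r, the r-th moment about an arbitrary point a,
    with raw[0] = 1. Returns the list of central moments mu_r using
    mu_r = sum_k (-1)^k * C(r, k) * m'_(r-k) * (m'_1)^k."""
    m1 = raw[1]
    return [
        sum((-1) ** k * comb(r, k) * raw[r - k] * m1 ** k for k in range(r + 1))
        for r in range(len(raw))
    ]


# Moments about a = 25 taken from Example 3.5: m'_0 .. m'_4
raw = [1, 2, 1900 / 15, 600, 550000 / 15]
mu = central_from_raw(raw)
print([round(v, 3) for v in mu])
```

Because $m'_0 = 1$, the last two terms of each expansion combine, which is where the coefficients $2$ in $\mu_3$ and $-3$ in $\mu_4$ come from.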
Formulae:
Various measures of skewness are
(i) $S_k = Mean - Median$
(ii) $S_k = Mean - Mode$
(iii) $S_k = (Q_3 - Median) - (Median - Q_1)$
1. Pearson’s β and γ Coefficients:
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \quad \gamma_1 = +\sqrt{\beta_1} \quad \text{and} \quad \beta_2 = \frac{\mu_4}{\mu_2^2}, \quad \gamma_2 = \beta_2 - 3 \]
2. Karl Pearson’s Coefficient of Skewness
\[ S_k = \frac{Mean - Mode}{SD} = \frac{\bar{x} - Mode}{\sigma} \]
Sometimes the mode may not be properly defined for the given data; in that case
\[ S_k = \frac{3(Mean - Median)}{SD} = \frac{3(\bar{x} - M_d)}{\sigma} \]
The limits for Karl Pearson’s Coefficient of Skewness are ±3. In practice, these limits are rarely attained.
3. Bowley’s Coefficient of Skewness
\[ S_k = \frac{(Q_3 - M_d) - (M_d - Q_1)}{(Q_3 - M_d) + (M_d - Q_1)} = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1} \]
Note 3.0.1. Note that the range of Bowley’s Coefficient of Skewness is between -1 and +1.
4. Coefficient of Skewness (Based on the moments)
\[ S_k = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} \]
Skewness is measured by $\beta_1$ and Kurtosis is measured by $\beta_2$.

From Figure 3.1, we observe that a curve of type ‘A’, which is neither flat nor peaked, is called the normal curve or mesokurtic curve, and for such a curve $\beta_2 = 3$, i.e., $\gamma_2 = 0$. A curve of type ‘B’, which is flatter than the normal curve, is known as platykurtic, and for such a curve $\beta_2 < 3$, i.e., $\gamma_2 < 0$. A curve of type ‘C’, which is more peaked than the normal curve, is called leptokurtic, and for such a curve $\beta_2 > 3$, i.e., $\gamma_2 > 0$.
Figure 3.1: Shapes of the Curve showing kurtosis types
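The β and γ coefficients and the three kurtosis types can be put together in a small helper. This is a sketch with hypothetical function names; the central moments passed in are the values obtained in Example 3.5 of these notes, and the tolerance used to decide "equal to 3" is an assumption.

```python
import math


def shape_coefficients(mu2, mu3, mu4):
    """Return (beta1, gamma1, beta2, gamma2) from the central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return beta1, math.sqrt(beta1), beta2, beta2 - 3


def kurtosis_type(beta2, tol=1e-9):
    """Classify a curve by comparing beta2 with 3 (tol is an assumed tolerance)."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "platykurtic" if beta2 < 3 else "leptokurtic"


def moment_skewness(beta1, beta2):
    """Coefficient of skewness based on the moments."""
    return math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))


# Central moments from Example 3.5
beta1, gamma1, beta2, gamma2 = shape_coefficients(368 / 3, -144, 522880 / 15)
sk = moment_skewness(beta1, beta2)
print(round(beta1, 7), round(beta2, 7), kurtosis_type(beta2), round(sk, 7))
```

Note that $\beta_1$ is a ratio of a square to a cube of a variance, so it can never be negative; the sign of $\mu_3$ carries the direction of the skew.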
EXAMPLE 3.1
The bus fare of 7 selected B.Sc. students is recorded as follows (Rs.): 10, 5, 15, 8, 6, 14 and 12. Calculate the arithmetic mean of this data.
Hints/Solution:
Let the bus fare be denoted by x. First arrange the values in ascending order. Then we have
Bus Fare x: 5 6 8 10 12 14 15, with $\sum x = 70$
\[ \text{Arithmetic Mean (A.M.)} \quad \bar{x} = \frac{\sum x}{n} = \frac{70}{7} = 10. \]
EXAMPLE 3.2
Calculate Geometric Mean (GM) and Harmonic Mean (HM) for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 8 12 18 8 6
Hints/Solution:
C.I.      Mid x   f    log x     f log x
0 - 10    5       8    0.69897   5.59176
10 - 20   15      12   1.17609   14.1131
20 - 30   25      18   1.39794   25.1629
30 - 40   35      8    1.54407   12.3525
40 - 50   45      6    1.65321   9.91928
Total             52   6.47028   67.1396

C.I.      Mid x   f    1/x         f (1/x)
0 - 10    5       8    0.2         1.6
10 - 20   15      12   0.0666667   0.8
20 - 30   25      18   0.04        0.72
30 - 40   35      8    0.0285714   0.228571
40 - 50   45      6    0.0222222   0.133333
Total             52   0.35746     3.4819
\[ GM = \mathrm{Antilog}\left(\frac{\sum_{i=1}^{n} f_i \log x_i}{\sum f_i = N}\right) \ \text{(OR)} \ 10^{\frac{\sum_{i=1}^{n} f_i \log x_i}{N}} = \mathrm{Antilog}[1.2911] \ \text{(OR)} \ 10^{1.2911} = 19.549968 \]
\[ HM = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}} = \frac{52}{3.4819} = 14.934354 \]
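The two table computations of Example 3.2 can be reproduced in a few lines. This is a minimal sketch with hypothetical function names; it uses base-10 logarithms so that the intermediate value matches the $\sum f \log x / N$ column of the worked solution.

```python
import math


def grouped_gm(mids, freqs):
    """Geometric mean: 10 ** (sum(f * log10(x)) / N)."""
    n_total = sum(freqs)
    mean_log = sum(f * math.log10(x) for x, f in zip(mids, freqs)) / n_total
    return 10 ** mean_log


def grouped_hm(mids, freqs):
    """Harmonic mean: N / sum(f / x)."""
    return sum(freqs) / sum(f / x for x, f in zip(mids, freqs))


# Data of Example 3.2: class mid-values and frequencies
mids = [5, 15, 25, 35, 45]
freqs = [8, 12, 18, 8, 6]

gm = grouped_gm(mids, freqs)
hm = grouped_hm(mids, freqs)
print(round(gm, 2), round(hm, 2))
```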
EXAMPLE 3.3
Calculate Mean deviation about mean, median and mode for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 1 3 5 4 2
Hints/Solution:
Here $d = |x - \bar{x}| = |x - M_d|$ (the mean and the median coincide) and $d_1 = |x - M_0|$.
C.I.      Mid x   f    cf   d     d1     fd    f d1
0 - 10    5       1    1    22    21.7   22    21.7
10 - 20   15      3    4    12    11.7   36    35.1
20 - 30   25      5    9    2     1.7    10    8.5
30 - 40   35      4    13   8     8.3    32    33.2
40 - 50   45      2    15   18    18.3   36    36.6
Total             15        62    61.7   136   135.1
\[ \text{Mean}(\bar{x}) = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 27 \]
\[ \text{Second Quartile} = \text{Median}(M_d = Q_2) = l + \frac{\frac{N}{2} - m}{f} \times c = 27 \]
\[ \text{Mode}(M_0) = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c = 26.66 \approx 26.7 \]
\[ \text{Mean Deviation (MD) (about mean } \bar{x}) = \frac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{\sum_{i=1}^{n} f_i = N} = \frac{136}{15} = 9.0666 \]
\[ \text{Mean Deviation (MD) (about median } M_d) = \frac{\sum_{i=1}^{n} f_i |x_i - M_d|}{\sum_{i=1}^{n} f_i = N} = \frac{136}{15} = 9.0666 \]
\[ \text{Mean Deviation (MD) (about mode } M_0) = \frac{\sum_{i=1}^{n} f_i |x_i - M_0|}{\sum_{i=1}^{n} f_i = N} = \frac{135.1}{15} = 9.00666 \]
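The three mean deviations of Example 3.3 differ only in the centre used, so one helper covers them all. The sketch below uses a hypothetical function name; it also shows the small effect of rounding the mode to 26.7, as the worked table does, compared with the unrounded value 80/3.

```python
def mean_deviation(mids, freqs, centre):
    """Mean deviation about `centre`: sum(f * |x - centre|) / N."""
    return sum(f * abs(x - centre) for x, f in zip(mids, freqs)) / sum(freqs)


# Data of Example 3.3: class mid-values and frequencies
mids = [5, 15, 25, 35, 45]
freqs = [1, 3, 5, 4, 2]

md_mean = mean_deviation(mids, freqs, 27)            # mean = median = 27
md_mode_rounded = mean_deviation(mids, freqs, 26.7)  # mode rounded as in the table
md_mode_exact = mean_deviation(mids, freqs, 80 / 3)  # mode before rounding
print(round(md_mean, 4), round(md_mode_rounded, 4), round(md_mode_exact, 4))
```

The rounded and unrounded modes give 9.0067 and 9.0 respectively, so the rounding changes the answer only in the third decimal place.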
EXAMPLE 3.4
Calculate Karl Pearson’s and Bowley’s Coefficients of skewness for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 1 3 5 4 2
Hints/Solution:
C.I.      Mid x   f    cf              d = (x - 25)/10   fd    fd^2
0 - 10    5       1    1               -2                -2    4
10 - 20   15      3    4 (Q1 class)    -1                -3    3
20 - 30   25      5    9               0                 0     0
30 - 40   35      4    13 (Q3 class)   1                 4     4
40 - 50   45      2    15              2                 4     8
Total             15                   0                 3     19
\[ \text{Mean}(\bar{x}) = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 27 \]
\[ \text{Second Quartile} = \text{Median}(M_d = Q_2) = l + \frac{\frac{N}{2} - m}{f} \times c = 27 \]
\[ \text{Mode}(M_0) = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c = 26.66 \]
\[ \text{First Quartile} \quad Q_1 = l + \frac{\frac{N}{4} - m}{f} \times c = 10 + \frac{3.75 - 1}{3} \times 10 = 19.1666 \]
\[ \text{Third Quartile} \quad Q_3 = l + \frac{\frac{3N}{4} - m}{f} \times c = 30 + \frac{11.25 - 9}{4} \times 10 = 35.625 \]
\[ SD(\sigma) = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{N} - \left(\frac{\sum_{i=1}^{n} f_i d_i}{N}\right)^2} \times c = \sqrt{\frac{19}{15} - \left(\frac{3}{15}\right)^2} \times 10 = 11.075 \]
1. Karl Pearson’s Coefficient of Skewness
\[ S_k = \frac{Mean - Mode}{SD} = \frac{\bar{x} - Mode}{\sigma} = \frac{27 - 26.666}{11.075} = 0.0301 \]
The limits for Karl Pearson’s Coefficient of Skewness are ±3. In practice, these limits are rarely attained.
2. Bowley’s Coefficient of Skewness
\[ S_k = \frac{(Q_3 - M_d) - (M_d - Q_1)}{(Q_3 - M_d) + (M_d - Q_1)} = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1} = \frac{35.625 + 19.166 - 54}{35.625 - 19.166} = \frac{0.791}{16.459} = 0.0481 \]
Note 3.0.2. Note that the range of Bowley’s Coefficient of Skewness is between -1 and +1.
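Once the mean, median, mode, quartiles and standard deviation are known, both coefficients of Example 3.4 are one-line computations. The sketch below uses hypothetical function names and takes the summary values of the worked solution as inputs (with the mode and $Q_1$ kept unrounded as 80/3 and 115/6).

```python
import math


def karl_pearson_skewness(mean, mode, sd):
    """(Mean - Mode) / SD."""
    return (mean - mode) / sd


def karl_pearson_skewness_median(mean, median, sd):
    """3 * (Mean - Median) / SD, for use when the mode is ill-defined."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """(Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Summary values from Example 3.4
sd = math.sqrt(19 / 15 - (3 / 15) ** 2) * 10
kp = karl_pearson_skewness(27, 80 / 3, sd)
bowley = bowley_skewness(115 / 6, 27, 35.625)
print(round(sd, 3), round(kp, 4), round(bowley, 4))
```

Here the mean and median coincide, so the median-based form of Karl Pearson’s coefficient would return 0 for this data; the two forms need not agree in general.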
EXAMPLE 3.5
Calculate the Coefficient of skewness based on moments, the measure of skewness $\beta_1$ and the measure of kurtosis $\beta_2$ for the following data:
x 0-10 10-20 20-30 30-40 40-50
f 1 3 5 4 2
Hints/Solution:
C.I.      Mid x   f    cf   d = x - 25   fd    fd^2   fd^3    fd^4
0 - 10    5       1    1    -20          -20   400    -8000   160000
10 - 20   15      3    4    -10          -30   300    -3000   30000
20 - 30   25      5    9    0            0     0      0       0
30 - 40   35      4    13   10           40    400    4000    40000
40 - 50   45      2    15   20           40    800    16000   320000
Total     125     15        0            30    1900   9000    550000
OR
C.I.      Mid x   f    cf   d = (x - 25)/10   fd   fd^2   fd^3   fd^4
0 - 10    5       1    1    -2                -2   4      -8     16
10 - 20   15      3    4    -1                -3   3      -3     3
20 - 30   25      5    9    0                 0    0      0      0
30 - 40   35      4    13   1                 4    4      4      4
40 - 50   45      2    15   2                 4    8      16     32
Total     125     15        0                 3    19     9      55
The median class is 20 - 30, so $l = 20$, $f = 5$ and $m = 4$ (the cumulative frequency of the class 10 - 20).
\[ \text{Mean} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 27 \]
\[ \text{Median} = l + \frac{\frac{N}{2} - m}{f} \times c = 27 \]
\[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - (f_0 + f_2)} \times c = 26.66 \]
Moments about the point $a = 25$ are given below (using $d = (x - 25)/10$ and $c = 10$):
\[ m'_1 = \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c = 2 \]
\[ m'_2 = \frac{\sum_{i=1}^{n} f_i d_i^2}{N} \times c^2 = 126.66667 \]
\[ m'_3 = \frac{\sum_{i=1}^{n} f_i d_i^3}{N} \times c^3 = 600 \]
\[ m'_4 = \frac{\sum_{i=1}^{n} f_i d_i^4}{N} \times c^4 = 36666.667 \]
Using the following relations along with $m'_0 = 1$, we get the moments about the mean as follows:
\begin{align*}
\mu_1 &= m'_1 - m'_0 m'_1 = 0 \\
\mu_2 &= m'_2 - {}^2C_1 m'_1 \cdot m'_1 + (m'_1)^2 = m'_2 - (m'_1)^2 = 122.66667 \\
\mu_3 &= m'_3 - {}^3C_1 m'_2 \cdot m'_1 + {}^3C_2 m'_1 \cdot (m'_1)^2 - (m'_1)^3 \\
&= m'_3 - 3 m'_2 \cdot m'_1 + 2 (m'_1)^3 = -144 \\
\mu_4 &= m'_4 - {}^4C_1 m'_3 \cdot m'_1 + {}^4C_2 m'_2 \cdot (m'_1)^2 - {}^4C_3 m'_1 \cdot (m'_1)^3 + (m'_1)^4 \\
&= m'_4 - 4 m'_3 \cdot m'_1 + 6 m'_2 \cdot (m'_1)^2 - 3 (m'_1)^4 = 34858.667
\end{align*}
Now, the measure of skewness is $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = 0.0112343$ and the measure of kurtosis is $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = 2.3166352$.
Since $\beta_1 = 0.0112343$ is close to 0, the distribution is only slightly skewed ($\beta_1 \ge 0$ always, so the direction of the skew is read from the sign of $\mu_3 = -144$, which is negative), and since $\beta_2 = 2.3166352 < 3$, the distribution is platykurtic.
1. We can also calculate the Coefficient of Skewness (Based on the moments)
\[ S_k = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} = 0.1119976 \]
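As a cross-check on the assumed-point route used in Example 3.5, the central moments can also be computed directly from their definition, $\mu_r = \sum f_i (x_i - \bar{x})^r / N$. The sketch below uses a hypothetical function name and the data of the example; it is a verification aid, not the method of the worked solution.

```python
def central_moment(mids, freqs, order):
    """Direct definition: sum(f * (x - mean) ** order) / N."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n_total
    return sum(f * (x - mean) ** order for x, f in zip(mids, freqs)) / n_total


# Data of Example 3.5: class mid-values and frequencies
mids = [5, 15, 25, 35, 45]
freqs = [1, 3, 5, 4, 2]

mu2 = central_moment(mids, freqs, 2)
mu3 = central_moment(mids, freqs, 3)
mu4 = central_moment(mids, freqs, 4)
print(round(mu2, 3), round(mu3, 3), round(mu4, 3))
```

The direct route needs the mean first and involves larger deviations, which is why hand computation usually prefers moments about a convenient point followed by the conversion formulas.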
EXAMPLE 3.6
Calculate the geometric mean of the following data:
x 1 7 29 92 115 375
Hints/Solution:
Ans.: $\sum \log x = 8.9060$, $N = 6$, Antilog(1.4843) = 30.50
EXAMPLE 3.7
Calculate the geometric mean of the following data:
x 2574 475 75 5 0.8 0.08 0.005 0.0009
Hints/Solution:
Ans.: $\sum \log x = 2.1208$, $N = 8$, Antilog(0.2651) = 1.841
EXAMPLE 3.8
Calculate the geometric mean of the following data:
x 1000 80 40 750 100 150 120 60
f 1 50 25 2 3 4 3 5
Hints/Solution:
Ans.: $\sum f \log x = 173.7907$, $N = 93$, Antilog(1.8687) = 73.91
EXAMPLE 3.9
Calculate the geometric mean of the following data:
x 0-10 10-20 20-30 30-40 40-50
f 5 15 25 35 45
Hints/Solution:
Ans.: With $m$ the class mid-value, $\sum f \log m = 184.5217$, $N = 125$, Antilog(1.4762) = 29.93
EXAMPLE 3.10
Calculate the harmonic mean of the following data:
x 1 0.5 10 45 175 0.01 4 11.2
Hints/Solution:
Ans.: $\sum \frac{1}{x} = 103.4672$, $N = 8$, HM = 0.077
EXAMPLE 3.11
Calculate the harmonic mean of the following data:
x 10 20 25 40 50
f 20 30 50 15 5
Hints/Solution:
Ans.: $\sum f \frac{1}{x} = 5.975$, $N = 120$, HM = 20.08
EXAMPLE 3.12
Calculate the harmonic mean of the following data:
x 0-10 10-20 20-30 30-40 40-50
f 8 12 20 6 4
Hints/Solution:
Ans.: $\sum f \frac{1}{x} = 3.46$, $N = 50$, HM = 14.45
EXAMPLE 3.13
Calculate the mean deviation about (i) mean (ii) median (iii) mode and (iv) coefficients of mean deviation about mean, median and mode for the following data:
x 20-30 30-40 40-50 50-60 60-70 70-80 80-90 90-100
f 3 8 9 15 20 13 8 4
Hints/Solution:
Ans.: MD(Mean) = 14.1875, MD(Median) = 14.0625
EXAMPLE 3.14
Calculate the range, the quartile deviation and the coefficient of quartile deviation for the following data:
x 0-5 5-10 10-15 15-20 20-30 30-40 40-50 50-60 60-70
f 3 5 8 12 34 46 28 14 10
Hints/Solution:
Hint: To make the class widths uniform, merge the class intervals into 0-10, 10-20, 20-30, . . . by adding the corresponding frequencies.
Ans.: Range is 70 - 0 = 70, $Q_1 = 23.53$, $Q_2$ = Median = 33.91, $Q_3 = 44.29$, Q.D. = 10.38, Coefficient = 0.31
EXAMPLE 3.15
Calculate the range, quartile deviation, standard deviation, variance and the coefficients of quartile deviation, standard deviation and variation for the following data:
x 126-130 131-135 136-140 141-145 146-150 151-155 156-160 161-165
f 31 44 48 51 60 55 43 28
Hints/Solution:
Ans.: Range is 165.5 - 125.5 = 40, $Q_1 = 137.06$, $Q_3 = 153.77$, Q.D. = 8.355, $\bar{x} = 145.53$, S.D. = $\sigma$ = 10.28, Coefficient of variation = 7.06
EXAMPLE 3.16
The scores of two players A and B are given below for 12 rounds. Identify the better (more consistent) player.
A 74 75 78 72 78 77 79 81 79 76 72 71
B 87 84 80 88 89 85 86 82 82 79 86 80
Hints/Solution:
Hint: Use the coefficient of variation for this problem. The player with the smaller coefficient of variation is the more consistent player.
Ans.: $\bar{x} = 76$, $\bar{y} = 84$, S.D. = $\sigma_A = 3.082$, S.D. = $\sigma_B = 3.215$, Coefficient of variation for A = 4.055 and Coefficient of variation for B = 3.827, so B is the more consistent player.
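The comparison in Example 3.16 can be reproduced with a short sketch for raw (ungrouped) data. The function name is hypothetical; the standard deviation uses the divisor $n$, matching the raw-data formula in the Measures of Dispersion table.

```python
import math


def coefficient_of_variation(values):
    """Return (mean, sd, cv) with sd = sqrt(sum(x^2)/n - mean^2) and cv in percent."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    return mean, sd, sd / mean * 100


# Scores from Example 3.16
player_a = [74, 75, 78, 72, 78, 77, 79, 81, 79, 76, 72, 71]
player_b = [87, 84, 80, 88, 89, 85, 86, 82, 82, 79, 86, 80]

mean_a, sd_a, cv_a = coefficient_of_variation(player_a)
mean_b, sd_b, cv_b = coefficient_of_variation(player_b)
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 3), round(cv_b, 3), more_consistent)
```

Player B has the larger standard deviation but also the larger mean, which is why the comparison is made on the coefficient of variation rather than on the standard deviation alone.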
EXAMPLE 3.17
Calculate Karl Pearson’s and Bowley’s Coefficients of skewness, the measure of skewness $\beta_1$ and the measure of kurtosis $\beta_2$ for the following data:
x Below 200 200-400 400-600 600-800 800-1000 Above 1000
f 25 40 85 75 16 16
Hints/Solution:
Ans.: $N = 261$, $Q_1 = 400.59$, $Q_2$ = Median = 554.12, $Q_3 = 722$
EXAMPLE 3.18
Calculate Karl Pearson’s and Bowley’s Coefficients of skewness, the measure of skewness $\beta_1$ and the measure of kurtosis $\beta_2$ for the following data:
x 0-10 10-20 20-30 30-40 40-50 50-60
f 5 20 15 45 10 15
Hints/Solution:
Ans.: $N = 100$, $\sum fd = -50$, $\sum fd^2 = 170$, $\sum fd^3 = -260$, $m'_1 = -5$, $m'_2 = 170$, $m'_3 = -2$
Practice more problems from some of the reference books.
Acknowledgement:
Some portions of this material are taken from various available sources. I thank the authors who prepared the calculus books and related materials.
Contact: (+91) 979 111 666 3 (or) athithan.s@ktr.srmuniv.ac.in
Visit: https://sites.google.com/site/lecturenotesofathithans/home