
DESCRIPTIVE STATISTICS

PRESENTATION OF DATA
Data are collections of any number of related observations. A collection of data is called a data
set and a single observation a data point.
Raw data - Data which have not been arranged and analysed are called raw data.
Structured data - Data which are arranged in a systematic manner, from which some inference
can be drawn.
Types of data : Data can come from actual observations or from records that are kept for
normal purposes. There are two types of data :
Primary data - Primary data are collected through first hand investigation. When the data
required for a particular study can be found neither in the internal records of the enterprise nor in
the published sources, it becomes necessary to collect original data by conducting a first hand
investigation. Data so collected are called primary data.
There are two methods of collecting primary data :
(i) Questioning
(ii) Observation
Secondary data - Data which have already been collected by others are called secondary data.
Why arrange data ?
For data to be useful, the observations must be organised properly so that the pattern can
be understood and a logical conclusion can be drawn.
Data can assist decision makers :
1 In educated guesses - Knowledge of trends from past experience helps to plan in advance.
A marketing survey may reveal that the product is preferred by the suburban
community with average incomes and average education. So, the product's advertising can
cover this target audience. If hospital records show that more patients used the x-ray
facilities in June than in January, the hospital personnel division should determine
whether this was accidental to this year or an indication of a trend, and perhaps it should
adjust its hiring accordingly.
Classification of data
Types of classification :
Data can be classified on the following four bases :
1.Geographical - For example, area wise (states, cities, districts etc.)
2.Chronological - For example, on the basis of time
Year    Sales of the company
2012    Rs.30 crores
2011    Rs.39 crores
2010    Rs.29 crores
3.Qualitative Data are classified on the basis of some attribute or quality, such as literacy,
religion, sex etc. In this type of classification, the attribute under study can not be measured. For
example, if the attribute under study is Blindness, we may find out how many persons are blind
in a given population. It is not possible to measure the degree of blindness in each case.
Population
    Male
        Literate
        Illiterate
    Female
        Literate
        Illiterate

4.Quantitative - Quantitative classification refers to the classification of data according to some
characteristics that can be measured, such as height, weight, income, sales etc.
No. of children    No. of families
0                  10
1                  400
2                  800

Age      No. of employees
20-25    10
25-30    15
30-35    40

Formation of frequency distribution - For forming a frequency distribution table, we count
the number of times a particular value is repeated, which is called the frequency of that value. In order
to facilitate counting, a column of tallies is prepared. In another column, all possible values of
the variable are placed from the lowest to the highest. Then a bar (vertical line) is put opposite the
particular value to which it relates. To facilitate counting, blocks of five bars are prepared and
some space is left between blocks. We finally count the number of blocks and bars
corresponding to each value of the variable and place it in the column of frequency. The process
shall be clear from the following example of the number of refrigerators sold on 22 working days by
a company :
23, 30, 20, 26, 30, 30, 20, 23, 40, 40, 26,
20, 23, 40, 28, 26, 23, 30, 40, 28, 28, 30

FREQUENCY DISTRIBUTION OF THE NUMBER OF REFRIGERATORS SOLD
No. of refrigerators    Tally Bars    Frequency (No. of days)
20                      III           3
23                      IIII          4
26                      III           3
28                      III           3
30                      IIIII         5
40                      IIII          4
TOTAL                                 22
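The tallying procedure can also be checked mechanically. The following is a minimal Python sketch, not part of the original notes, that counts how often each value occurs in the refrigerator sales data. The helper names `frequency_table` and `tally_marks` are illustrative assumptions.

```python
from collections import Counter

# Daily refrigerator sales from the example (22 observations).
sales = [23, 30, 20, 26, 30, 30, 20, 23, 40, 40, 26,
         20, 23, 40, 28, 26, 23, 30, 40, 28, 28, 30]


def frequency_table(values):
    """Return (value, frequency) pairs ordered from lowest to highest value."""
    return sorted(Counter(values).items())


def tally_marks(freq):
    """Render a frequency as tally bars in blocks of five, separated by spaces."""
    blocks = ["IIIII"] * (freq // 5)
    if freq % 5:
        blocks.append("I" * (freq % 5))
    return " ".join(blocks)


table = frequency_table(sales)
for value, freq in table:
    print(f"{value:>3}  {tally_marks(freq):<10} {freq}")
print("TOTAL", sum(freq for _, freq in table))
```

The total printed at the end should equal the number of observations, which is a quick check that no value was missed while tallying.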

Classification according to class-intervals


The following technical terms are important when data are classified according to class
intervals :
Class Limits - The class limits are the lowest and highest values that can be included in the
class. Let us take the example of the class 20-40. The lowest value of this class is 20 and the
highest value is 40. The two boundaries of the class are known as the lower limit and upper limit
of the class.
Class-interval - The span of a class, that is, the difference between the upper limit and the lower
limit, is called the class-interval. For example, in the class 20-40, the class-interval is 20.
Class frequency - The number of observations corresponding to a particular class is known as
the frequency of that class or the class frequency.
Class mid-point - It is the value lying half-way between the lower and upper class limits of a
class-interval. The mid-point of a class is ascertained as follows :
Mid-point of a class = (Upper limit of the class + Lower limit of the class) / 2
For example, the mid-point of the class 20-40 is (40 + 20) / 2 = 30.
Tabulation of data
One of the simplest and most revealing devices for summarizing data and presenting them in
meaningful way is the statistical table. A table is a systematic arrangement of statistical data in

columns and rows. The purpose of a table is to simplify the presentation and to facilitate
comparisons.
Parts of a Table :
1.Table number - Each table should be numbered. The number may be given either in the centre
at the top above the title, or at the side of the table at the top, or at the bottom of the table on the
left hand side.
2.Title of the table - Every table must have a suitable title which should describe the content of
the table. A complete title has to answer :
(i) What precisely are the data in the table?
(ii) Where did the data occur?
(iii) When did the data occur?
3.Caption - The caption refers to the column headings. It explains what the column represents.
4.Stub - Stubs are the designations of the rows, or row headings.
5.Body of the table - The body of the table contains the numerical information. Data presented
in the body are arranged according to the descriptions or classifications of the captions and stubs.
Headnote - It is a brief explanatory statement applying to all or a major part of the material in the
table, and is placed below the title, centred and enclosed in brackets.
Footnote - Anything in a table which the reader may find difficult to understand from the title,
captions and stub should be explained in footnotes.
Types of table : (i) Simple and complex table - In a simple table, only one characteristic is shown.
Hence this type of table is known as a one-way table. In a complex table, on the other hand, two or
more characteristics are shown.
Example of simple or one-way table is shown below :
Age (In years)    No. of Employees
Below 25          50
25-35             67
35-45             43
45-55             15
55 and above      5
Total             180

Example of complex or two-way table is shown below :
Age (In years)    Employees
                  Male    Female    Total
Below 25          32      18        50
25-35             40      27        67
35-45             25      18        43
45-55             10      5         15
55 and above      5       -         5
Total             112     68        180

Charting data
One of the most convincing and appealing ways in which data may be presented is through
charts. Evidence of this can be found in the financial pages of newspapers, journals,
advertisements etc. The pictorial presentation helps in quick understanding of the data. Through
pictorial presentation data can be presented in an interesting form.
Types of Diagrams
1.One-dimensional diagram, e.g., Bar diagrams
2.Two-dimensional diagrams, e.g., Rectangles, Squares and Circles
3.Pictograms and Cartograms
Types of Bar Diagrams :
Simple Bar Diagram
Subdivided Bar Diagram
Multiple Bar Diagram

1 Simple Bar Diagram :
[Figure: simple bar diagram of Funds Flow (Rs. in crores); vertical axis from 0 to 140]

2 Sub-divided bar diagram : Funds provided by different Banks to a company
Year         HDFC Bank      AXIS Bank      HSBC
             (In crores)    (In crores)    (In crores)
2008-2009    85             50             62
2009-2010    70             100            82
2010-2011    93             83             78
[Figure: sub-divided bar diagram of the funds provided by the three banks, one bar for each of the
years 2008-2009, 2009-2010 and 2010-2011; axis from 0 to 300]

3 Multiple Bar Diagram
Year         Sales of product TV    Sales of AC    Sales of washing machine
2008-2009    80                     100            178
2010-2011    98                     126            205
[Figure: multiple bar diagram comparing 2008-2009 and 2010-2011 for sales of product TV, sales of
AC and sales of washing machine; vertical axis from 0 to 250]

Pie Diagram - This type of diagram enables us to show the partitioning of a total into
component parts. In constructing a pie chart, the first step is to prepare the data so that the
various component values can be transposed into corresponding angles at the centre of the circle.
The market shares of different companies are : Samsung 25%, Technip 60%, Hitachi 10%, LG 5%.
The Pie chart for this is shown below :
[Figure: pie chart "Market share of the company" with sectors for Samsung, Technip, Hitachi and LG]

Line graphs
When we observe the values of a variable at different points of time, the series so formed is
known as time series. The technique of graphic presentation is extremely helpful in analysing
changes at different points of time.
Illustration. The following data relate to imports of steel pipes by IOCL :
Year                           : 2000   2001   2002   2003   2004   2005
Imports (In Million Tonnes)    : 2      3      2.8    4.2    6.7    8.5
[Figure: line graph of imports of steel pipes (In Million Tonnes) against year, 2000 to 2005;
vertical axis from 0 to 9]

Histograms
A Histogram is a graphical method for presenting data, where the observations are located on
a horizontal axis (Usually grouped into intervals) and the frequency of those observations is
depicted along the vertical axis.
The histogram is the most widely used method for graphical presentation of a frequency
distribution. The histogram should be clearly distinguished from a bar diagram. The
distinction lies in the fact that whereas a bar diagram is one-dimensional, i.e. only the
length of the bar is material and not the width, a histogram is two-dimensional, i.e. in a
histogram both the length and the width are important.
Frequency Polygon
A frequency polygon is a graph of a frequency distribution. It is particularly effective in
comparing two or more frequency distributions. There are two ways in which a frequency
polygon may be constructed :
1.We may draw a histogram of the given data and then join by straight lines the mid-points of
the upper horizontal sides of adjacent rectangles. The figure so formed is
called the frequency polygon.
2.Another method of constructing a frequency polygon is to take the mid-points of the various
class-intervals, plot the frequency corresponding to each point and join all these
points by straight lines. In this method, we do not have to construct a histogram.
By constructing a frequency polygon the value of the mode can be easily ascertained. If
from the apex of the polygon a perpendicular is drawn on the X-axis, we get the value of the
mode.

#1. Prepare a frequency distribution for the following observations on Marks obtained
by the students in an exam. :
15  45  40  42  50  75  75  80  81  25
31  45  42  43  55  60  62  58  69  70
75  62  62  65  60  60  26  56  45  70
62  31  78  50  35  68  32  80  56  37
70  78  81  72  40  42  45  62  58  55

SOLUTION :
The upper limit of each class is excluded, e.g. a mark of 25 is counted in the class 25-35.
FREQUENCY DISTRIBUTION
Marks    Tallies              Frequency
15-25    I                    1
25-35    IIIII                5
35-45    IIIII III            8
45-55    IIIII I              6
55-65    IIIII IIIII IIII     14
65-75    IIIII II             7
75-85    IIIII IIII           9
TOTAL :                       50
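The class counts in this exercise can be reproduced with a short Python sketch. It assumes, as the solution does, that each class includes its lower limit and excludes its upper limit. The function name `class_frequencies` is an illustrative choice, not something defined in these notes.

```python
# Marks of the 50 students from exercise #1.
marks = [15, 45, 40, 42, 50, 75, 75, 80, 81, 25,
         31, 45, 42, 43, 55, 60, 62, 58, 69, 70,
         75, 62, 62, 65, 60, 60, 26, 56, 45, 70,
         62, 31, 78, 50, 35, 68, 32, 80, 56, 37,
         70, 78, 81, 72, 40, 42, 45, 62, 58, 55]


def class_frequencies(values, start, width, n_classes):
    """Count values falling in [start, start+width), [start+width, ...), etc."""
    freqs = [0] * n_classes
    for v in values:
        idx = (v - start) // width  # index of the class that contains v
        if 0 <= idx < n_classes:
            freqs[idx] += 1
    return freqs


freqs = class_frequencies(marks, start=15, width=10, n_classes=7)
for k, f in enumerate(freqs):
    lower = 15 + 10 * k
    print(f"{lower}-{lower + 10}: {f}")
print("TOTAL:", sum(freqs))
```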
#2. Classify the following data by taking class intervals such that their mid-values are 17, 22,
27, 32, and so on.
30  42  30  54  40  48  15  17  51  42
25  41  30  27  42  36  28  26  37  54
44  31  36  40  36  22  30  31  19  48
16  42  32  21  22  46  33  41  21
SOLUTION : Since we are to classify the data in such a way that the mid-values are 17,
22, 27, 32, and so on, the first class should be 15-19 (Mid-value = (15+19)/2 = 17), the
second class 20-24 (Mid-value = (20+24)/2 = 22), etc.
Frequency Distribution
Variable    Tallies        Frequency
15-19       IIII           4
20-24       IIII           4
25-29       IIII           4
30-34       IIIII III      8
35-39       IIII           4
40-44       IIIII IIII     9
45-49       III            3
50-54       III            3
Total :                    39

#3. The data given below relate to the height and weight of 20 persons. You are required to
form a two-way frequency table with class intervals 62 to 64 inches, 64 to 66 inches and so on,
and 115 to 125 lb, 125 to 135 lb, etc.
Sl.No.    Weight    Height    Sl.No.    Weight    Height
1         170       70        11        163       70
2         135       65        12        139       67
3         136       65        13        122       63
4         137       64        14        134       68
5         148       69        15        140       67
6         121       63        16        132       69
7         117       65        17        120       65
8         128       70        18        148       68
9         143       71        19        129       67
10        129       62        20        152       69

SOLUTION:
As per the requirement of the question, the population is to be divided into five classes according
to the height of the persons included in each group and six classes according to the weight. Thus,
there will be 5 x 6 = 30 cells.
For tabulating the information in the appropriate cells, first, the row to which the height
measurement (say X) belongs is determined. Afterwards, on consideration of the weight
(say Y), the column in which it should be included is determined. The upper limit of each class is
excluded, so a height of exactly 68 inches is placed in the class 68-70. The tabulation is recorded by
Tally Bars. Thus the two-way table shall be prepared as follows :
TWO-WAY FREQUENCY TABLE SHOWING WEIGHT AND HEIGHT OF 20 PERSONS
Height in      Weight in lbs. (Y)
Inches (X)     115-125    125-135    135-145    145-155    155-165    165-175    Total
62-64          II (2)     I (1)                                                  3
64-66          II (2)                III (3)                                     5
66-68                     I (1)      II (2)                                      3
68-70                     II (2)                III (3)                          5
70-72                     I (1)      I (1)                 I (1)      I (1)      4
Total          4          5          6          3          1          1          20
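Cross-classifying 20 pairs by hand is error-prone, so a mechanical check is useful. The sketch below is an illustration rather than part of the original solution. It assigns each (height, weight) pair to a cell, assuming that every class includes its lower limit and excludes its upper limit. The function name `two_way_table` is hypothetical.

```python
# Weights (lb) and heights (inches) of the 20 persons, in serial-number order.
weights = [170, 135, 136, 137, 148, 121, 117, 128, 143, 129,
           163, 139, 122, 134, 140, 132, 120, 148, 129, 152]
heights = [70, 65, 65, 64, 69, 63, 65, 70, 71, 62,
           70, 67, 63, 68, 67, 69, 65, 68, 67, 69]


def two_way_table(xs, ys, x_start, x_width, n_rows, y_start, y_width, n_cols):
    """Rows are classes of x, columns are classes of y; upper limits are excluded."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for x, y in zip(xs, ys):
        row = (x - x_start) // x_width
        col = (y - y_start) // y_width
        table[row][col] += 1
    return table


# Heights: 62-64, ..., 70-72 (5 rows). Weights: 115-125, ..., 165-175 (6 columns).
table = two_way_table(heights, weights, 62, 2, 5, 115, 10, 6)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

for row, total in zip(table, row_totals):
    print(row, total)
print(col_totals, sum(col_totals))
```

The row totals and column totals must both add up to 20, which gives a simple consistency check on the finished table.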

#4. The following table gives the birth rate per thousand of different countries over a
certain period :
Country        Birth rate
India          33
Germany        16
U.K.           20
China          40
New Zealand    30
Sweden         15

Solution :
[Figure: simple bar diagram of birth rate, one bar per country; vertical axis from 0 to 45]

#5. The production of steel by Govt. sector and Private sector are given below. Represent
the data by sub-divided bar diagram.
Year         Govt. Sector    Private Sector
1996-97      400             150
1997-98      370             75
1998-99      550             270
1999-2000    620             330
2000-01      710             440
2001-02      780             500
2002-03      600             410
[Figure: sub-divided bar diagram with one bar per year, each divided into Govt. Sector and
Private Sector; vertical axis from 0 to 1400]

#6. Draw a multiple bar diagram from the following data :
Year    Sales            Gross Profit     Net profit
        (Rs. In lacs)    (Rs. In lacs)    (Rs. In lacs)
2008    120              40               20
2009    135              45               30
2010    140              55               35
2011    150              60               40
2012    160              65               45
[Figure: multiple bar diagram with bars for Sales, Gross Profit and Net profit (Rs. In lacs) for each
year; vertical axis from 0 to 160]

#7. Draw a Pie diagram for the following data of sixth Five-Year Plan Public Sector
outlays :
Agriculture and Rural Development    : 12.9 %
Irrigation                           : 12.5 %
Energy                               : 27.2 %
Industry and Minerals                : 15.4 %
Transport, communication             : 15.9 %
Social Services and others           : 16.1 %
Solution :
The angle at the centre is given by
$\text{Angle} = \dfrac{\text{Percentage outlay}}{100} \times 360 = \text{Percentage outlay} \times 3.6$
COMPUTATION FOR PIE-DIAGRAM
Sector                               Percentage outlay    Angle
Agriculture and rural development    12.9                 12.9 x 3.6 = 46.44 deg.
Irrigation                           12.5                 12.5 x 3.6 = 45.00 deg.
Energy                               27.2                 27.2 x 3.6 = 97.92 deg.
Industry and Minerals                15.4                 15.4 x 3.6 = 55.44 deg.
Transport, communication             15.9                 15.9 x 3.6 = 57.24 deg.
Social services and others           16.1                 16.1 x 3.6 = 57.96 deg.
Total :                              100                  360 deg.
[Figure: pie diagram of percentage outlay with sectors Agriculture and rural development 13%,
Irrigation 13%, Energy 27%, Industry and minerals 15%, Transport, communication 16%,
Social services and others 16%]
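The conversion from percentages to angles is a single multiplication per sector. A small Python sketch, given only as an illustration, shows the computation and the check that the angles add up to a full circle. The dictionary name `outlays` is an assumption.

```python
# Percentage outlays from exercise #7.
outlays = {
    "Agriculture and rural development": 12.9,
    "Irrigation": 12.5,
    "Energy": 27.2,
    "Industry and minerals": 15.4,
    "Transport, communication": 15.9,
    "Social services and others": 16.1,
}

# Each 1% of the total corresponds to 360 / 100 = 3.6 degrees at the centre.
angles = {sector: round(pct * 3.6, 2) for sector, pct in outlays.items()}

for sector, angle in angles.items():
    print(f"{sector:<36} {angle:6.2f} deg.")
print("Total:", round(sum(angles.values()), 2), "deg.")
```

Rounding each angle to whole degrees before adding can leave the total a degree short of 360, so it is safer to round only when drawing.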

#8. Draw the histogram and frequency polygon from the following data :
Marks     Number of students
0-10      4
10-20     6
20-40     14
40-50     16
50-60     14
60-70     8
70-90     16
90-100    5

MEASURE OF CENTRAL TENDENCY


Objectives : Even after the data have been collected and tabulated, one may find too much
detail presented in the table for many uses. We, therefore, need further analysis of the
tabulated data. One of the powerful tools of analysis is to calculate a single average value
that represents the entire mass of the data. Such a value is neither the smallest nor the largest
value. For this reason, an average is frequently referred to as a measure of central tendency or
central value.
Types of Average :
Arithmetic mean : (i) Simple, and (ii) Weighted
Median
Mode
Geometric mean
Harmonic mean

Arithmetic Mean : The most popularly used measure for representing the entire data by one
value is what laymen call the average and what statisticians call the arithmetic mean.
CALCULATION OF ARITHMETIC MEAN :
A CALCULATION OF SIMPLE ARITHMETIC MEAN - INDIVIDUAL OBSERVATIONS
Direct method
$\bar{x} = \dfrac{\sum x}{N}$
Where
x = Values of observations
N = Number of observations

Short-cut method
$\bar{x} = A + \dfrac{\sum d}{N}$
Where
d = x - A
A = Assumed mean
#1. The following table gives the monthly income of 10 employees in an office :
Income Rs. : 1780 1760 1690 1750 1840 1920 1100 1810 1050 1950
Calculate the arithmetic mean of incomes.
Solution : (Direct Method)
Calculation of Arithmetic Mean
Employee    Monthly Income (Rs.)
1           1780
2           1760
3           1690
4           1750
5           1840
6           1920
7           1100
8           1810
9           1050
10          1950
N = 10      $\sum x$ = 16650
$\bar{x} = \dfrac{\sum x}{N}$ = 16650 / 10 = 1665
Short-cut method
$\bar{x} = A + \dfrac{\sum d}{N}$
Where
d = x - A
A = Assumed mean
Let A = 1800
Employee    Income (x)    d = (x - 1800)
1           1780          - 20
2           1760          - 40
3           1690          - 110
4           1750          - 50
5           1840          + 40
6           1920          + 120
7           1100          - 700
8           1810          + 10
9           1050          - 750
10          1950          + 150
N = 10                    $\sum d$ = -1350
$\bar{x}$ = 1800 + (-1350) / 10
= 1800 - 135 = 1665
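Both methods must give the same answer whatever assumed mean is chosen, because the deviations are taken from A and A is added back at the end. A brief Python sketch, offered as an illustration with hypothetical function names, makes this concrete for the income data above.

```python
# Monthly incomes (Rs.) of the 10 employees.
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]


def mean_direct(xs):
    """Direct method: sum of observations divided by their number."""
    return sum(xs) / len(xs)


def mean_shortcut(xs, assumed_mean):
    """Short-cut method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed_mean for x in xs]
    return assumed_mean + sum(deviations) / len(xs)


print(mean_direct(incomes))          # direct method
print(mean_shortcut(incomes, 1800))  # short-cut method with A = 1800
print(mean_shortcut(incomes, 1500))  # a different assumed mean gives the same result
```

The short-cut method only saves arithmetic when working by hand. The choice of A affects the size of the deviations, not the result.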

B FOR UNGROUPED DATA [Discrete series]
Direct method
$\bar{x} = \dfrac{\sum f x}{N}$
Where
f = Frequency
x = The variable in question
N = Total number of observations, i.e. $\sum f$

Short-cut method
$\bar{x} = A + \dfrac{\sum f d}{N}$
Where
d = x - A
A = Assumed mean
N = Total number of observations, i.e. $\sum f$

#2. From the following data of the marks obtained by 60 students of a class, calculate the
arithmetic mean :
Marks    No. of students    Marks    No. of students
20       8                  50       10
30       12                 60       6
40       20                 70       4
Solution :
Marks (x)    No. of students (f)    f.x                  d = (x - 40)    f.d
20           8                      160                  -20             -160
30           12                     360                  -10             -120
40           20                     800                  0               0
50           10                     500                  +10             +100
60           6                      360                  +20             +120
70           4                      280                  +30             +120
             N = 60                 $\sum f x$ = 2460                    $\sum f d$ = 60
By Direct method :
$\bar{x} = \dfrac{\sum f x}{N}$ = 2460 / 60 = 41
By Short-cut method :
Let Assumed Mean, A = 40
$\bar{x} = A + \dfrac{\sum f d}{N}$ = 40 + 60/60 = 40 + 1 = 41
C FOR GROUPED DATA [Continuous series]
1 Direct Method (For grouped data)
$\bar{x} = \dfrac{\sum f m}{N}$
Where
$\bar{x}$ = Sample arithmetic mean
f = The frequency of each class
m = Mid-point of the class
N = The total frequency

2 Short-cut method (For grouped data)
$\bar{x} = A + \dfrac{\sum f d}{N}$
Where
A = Assumed mean
m = Mid-point of the class
d = (m - A)
N = Total number of observations
#3. From the following data compute the arithmetic mean by the Direct method and the Short-cut
method :
Marks              0-10    10-20    20-30    30-40    40-50    50-60
No. of students    5       10       25       30       20       10
Solution :
By direct method
Marks    Mid-point (m)    No. of students (f)    f.m
0-10     5                5                      25
10-20    15               10                     150
20-30    25               25                     625
30-40    35               30                     1050
40-50    45               20                     900
50-60    55               10                     550
                          N = 100                $\sum f m$ = 3300
$\bar{x} = \dfrac{\sum f m}{N}$
= 3300 / 100 = 33
By short-cut method (A = 35)
Marks    Mid-point (m)    No. of students (f)    d = (m - 35)    f.d
0-10     5                5                      -30             -150
10-20    15               10                     -20             -200
20-30    25               25                     -10             -250
30-40    35               30                     0               0
40-50    45               20                     +10             +200
50-60    55               10                     +20             +200
                          N = 100                                $\sum f d$ = -200
$\bar{x} = A + \dfrac{\sum f d}{N}$ = 35 + (-200/100) = 35 - 2 = 33
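For grouped data the only extra step is replacing each class by its mid-point. The following Python sketch is an illustration under that usual assumption (every observation in a class is treated as if it sat at the mid-point). The function name `grouped_mean` is hypothetical.

```python
# Class limits and frequencies from example #3.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]


def grouped_mean(classes, freqs, assumed_mean=None):
    """Mean of a continuous series; uses the short-cut form when assumed_mean is given."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(freqs)
    if assumed_mean is None:
        return sum(f * m for f, m in zip(freqs, mids)) / n
    return assumed_mean + sum(f * (m - assumed_mean) for f, m in zip(freqs, mids)) / n


print(grouped_mean(classes, freqs))                   # direct method
print(grouped_mean(classes, freqs, assumed_mean=35))  # short-cut method
```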

CALCULATION OF ARITHMETIC MEAN IN CASE OF OPEN-END CLASSES
Open-end classes are those in which the lower limit of the first class and the upper limit of the
last class are unknown. In such a case, we cannot find the arithmetic mean unless we
make an assumption about the unknown limits. The assumption would naturally depend upon
the class interval following the first class and preceding the last class. For example, observe
the following data :
Marks       No. of students    Marks       No. of students
Below 10    4                  30-40       15
10-20       6                  40-50       8
20-30       10                 Above 50    7

In the above case, since the class interval is uniform, the appropriate assumption would be
that the lower limit of the first class is zero and the upper limit of the last class is 60. The first
class thus would be 0-10 and the last class 50-60.
COMBINED MEAN OF TWO GROUPS
$\bar{x}_{12} = \dfrac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$
Where
$\bar{x}_1$ = Mean of first group
$N_1$ = No. of observations of first group
$\bar{x}_2$ = Mean of second group
$N_2$ = No. of observations of second group

#4. The mean height of 25 male workers in a factory is 61 inches and the mean height of
35 female workers in the same factory is 58 inches. Find the combined mean height of
60 workers in the factory.
Solution :
$\bar{x}_{12} = \dfrac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$
N1 = 25, x1 = 61, N2 = 35, x2 = 58
x12 = [ (25 x 61) + (35 x 58) ] / (25 + 35)
= [ 1525 + 2030 ] / 60
= 3555 / 60
= 59.25
Thus the combined mean height of 60 workers is 59.25 inches.
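The combined mean is simply the group means weighted by group sizes. A minimal Python sketch of the formula, with the hypothetical function name `combined_mean`, is given below. It accepts any number of groups, which is a small generalisation of the two-group formula.

```python
def combined_mean(groups):
    """groups is a list of (number_of_observations, group_mean) pairs."""
    total_n = sum(n for n, _ in groups)
    total_sum = sum(n * mean for n, mean in groups)
    return total_sum / total_n


# 25 male workers with mean height 61 inches, 35 female workers with mean 58 inches.
workers = [(25, 61), (35, 58)]
print(combined_mean(workers))
```

The result is closer to 58 than to 61 because the larger group pulls the combined mean towards its own mean.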

WEIGHTED MEAN METHOD


Sometimes the relative importance of different observations is not the same. We then
compute the weighted mean.

The weighted mean takes into account the importance of each value to the overall total.

$\bar{x}_w = \dfrac{\sum w x}{\sum w}$
Where
x = Value of each element
w = Weight assigned to each observation

Simple arithmetic mean :
Bombay : $\bar{x} = \sum x / n$ = 432/6 = 72
Kolkata : $\bar{x} = \sum x / n$ = 432/6 = 72
Chennai : $\bar{x} = \sum x / n$ = 432/6 = 72
Weighted arithmetic mean :
Bombay : $\bar{x}_w = \sum w x / \sum w$ = 1451/20 = 72.55
Kolkata : $\bar{x}_w = \sum w x / \sum w$ = 1977/28 = 70.61
Chennai : $\bar{x}_w = \sum w x / \sum w$ = 1513/21 = 72.05
The simple arithmetic mean is the same for all the three universities, i.e. 72, and hence it may be
concluded that the performance of students is alike. But this would be a wrong conclusion,
because what we should compare here is the weighted arithmetic mean. On comparing the
weighted arithmetic means we find that the value for Bombay is the highest, and hence we can
say that the performance of students of Bombay University is the best.
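How a simple mean and a weighted mean can diverge is easy to see in a small computation. The Python sketch below uses made-up marks and weights purely for illustration. They are hypothetical values, not the university data referred to above, and the function name `weighted_mean` is likewise an assumption.

```python
def weighted_mean(values, weights):
    """Sum of weight times value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical marks in three courses and hypothetical weights (e.g. number of students).
marks = [70, 80, 60]
weights = [2, 3, 5]

simple = sum(marks) / len(marks)
weighted = weighted_mean(marks, weights)
print(simple, weighted)
```

With equal weights the two means coincide. They differ here because the lowest mark carries the largest weight.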

CONCEPT OF MEDIAN
The median is a measure of central tendency. The median by definition refers to the middle
value in a distribution. Half of the items lie above this point, and the other half lie below it.
As distinct from the arithmetic mean, which is calculated from the value of every item in
the series, the median is called a positional average. The term position refers to the place of a
value in a series. The place of the median in a series is such that an equal number of items
lie on either side of it.
CALCULATING THE MEDIAN FROM UNGROUPED DATA - INDIVIDUAL
OBSERVATIONS
To find the median of a data set :
1 Arrange the data in ascending or descending order of magnitude.
2 If the data set contains an odd number of items, the middle item of the array is the
median.
3 If there is an even number of items, the median is the average of the two middle items.
$\text{Median} = \left( \dfrac{n+1}{2} \right)$ th item in a data array
COMPUTATION OF MEDIAN - DISCRETE SERIES
STEPS :
1.Arrange the data in ascending or descending order of magnitude.
2.Find out the cumulative frequencies.
3.Apply the formula :
$\text{Median} = \text{Size of } \left( \dfrac{n+1}{2} \right)$ th item
4.Now look at the cumulative frequency column and find that total which is either equal to
$\dfrac{n+1}{2}$ or next higher to that, and determine the value of the variable corresponding to it.
That gives the value of the median.


#1. From the following data find the value of median :
Income (Rs.)      1000    1500    800    2000    2500    1800
No. of persons    24      26      16     20      6       30
Calculation of median
Income arranged in    No. of persons    Cumulative Frequency
ascending order       f                 c.f.
800                   16                16
1000                  24                40
1500                  26                66
1800                  30                96
2000                  20                116
2500                  6                 122
$\text{Median} = \text{Size of } \left( \dfrac{n+1}{2} \right)$ th item
= (122 + 1)/2 = 61.5 th item
The first cumulative frequency equal to or next higher than 61.5 is 66, which corresponds to 1500.
Size of 61.5th item = 1500.
So, Median = 1500
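The four steps for a discrete series translate directly into a short routine. The sketch below is illustrative only, and the function name `median_discrete` is an assumption. It sorts the values, accumulates the frequencies and returns the first value whose cumulative frequency reaches (n+1)/2.

```python
def median_discrete(values, freqs):
    """Median of a discrete series by the cumulative-frequency rule."""
    pairs = sorted(zip(values, freqs))   # step 1: arrange in ascending order
    n = sum(freqs)
    position = (n + 1) / 2               # step 3: rank of the median item
    cumulative = 0
    for value, f in pairs:
        cumulative += f                  # step 2: running cumulative frequency
        if cumulative >= position:       # step 4: equal to or next higher than the rank
            return value


incomes = [1000, 1500, 800, 2000, 2500, 1800]
persons = [24, 26, 16, 20, 6, 30]
print(median_discrete(incomes, persons))
```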
COMPUTATION OF MEDIAN - CONTINUOUS SERIES
STEPS :
1 Determine the particular class in which the value of median lies. Use N/2 as the rank of
the median and not (N+1)/2.
2 Use the following formula to determine the exact value of median :
$\text{Median} = L + \dfrac{\frac{N}{2} - \text{p.c.f.}}{f} \times i$
L = Lower limit of the median class, i.e. the class in which the middle item of the
distribution lies
p.c.f. = Preceding cumulative frequency to the median class
f = Simple frequency of the median class
i = The class interval of the median class.
#2. Calculate the median from the following data :
Marks           No. of students
Less than 5     29
Less than 10    195
Less than 15    241
Less than 20    117
Less than 25    52
Less than 30    10
Less than 35    6
Less than 40    3
Less than 45    2
Solution :
The given figures are treated as the frequencies of the classes 0-5, 5-10, 10-15 and so on.
Calculation of Median
Marks    No. of students (f)    c.f.
0-5      29                     29
5-10     195                    224
10-15    241                    465
15-20    117                    582
20-25    52                     634
25-30    10                     644
30-35    6                      650
35-40    3                      653
40-45    2                      655
Median = Size of N/2 th item = 655/2 = 327.5 th item
Median lies in the class 10-15
$\text{Median} = L + \dfrac{\frac{N}{2} - \text{p.c.f.}}{f} \times i$
L = 10,
N/2 = 327.5
p.c.f. = 224
f = 241
i = 5
Median = 10 + [ (327.5 - 224)/241 ] x 5
= 10 + [ 103.5/241 ] x 5
= 10 + 517.5/241
= 10 + 2.147 = 12.147
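The interpolation formula for a continuous series can be sketched in Python as follows. This is an illustration that assumes equal class widths and treats the given figures as class frequencies, as the solution does. The function name `grouped_median` is hypothetical. The unrounded result can differ in the third decimal place from a hand calculation that rounds the intermediate quotient.

```python
def grouped_median(lowers, width, freqs):
    """Median = L + ((N/2 - p.c.f.) / f) * i for equal-width classes."""
    target = sum(freqs) / 2              # rank N/2, not (N+1)/2
    preceding = 0                        # cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if preceding + f >= target:      # median class found
            return lower + (target - preceding) / f * width
        preceding += f


lowers = [0, 5, 10, 15, 20, 25, 30, 35, 40]
freqs = [29, 195, 241, 117, 52, 10, 6, 3, 2]
median = grouped_median(lowers, 5, freqs)
print(round(median, 3))
```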
#3. An incomplete distribution is given below :
Variable  :  0-10  10-20  20-30  30-40  40-50  50-60  60-70
Frequency :  10    20     ?      40     ?      25     15
(i) You are given that the median value is 35. Find out the missing frequencies, given that
the total frequency is 170.
(ii) Calculate the arithmetic mean of the completed table.

Solution :
Let the missing frequency of the class 20-30 be f1 and that of the class 40-50 be f2.
The total frequency of the classes = 170
Therefore, 170 = 10 + 20 + f1 + 40 + f2 + 25 + 15
Or, 170 = 110 + f1 + f2
Hence, f1 + f2 = 60 ..(1)
$\text{Median} = L + \dfrac{\frac{N}{2} - \text{p.c.f.}}{f} \times i$
Median = Size of N/2 th item = 170/2 = 85th item
We are given Median = 35
Hence, it must lie in the class 30-40
Variable    Frequency, f    c.f.
0-10        10              10
10-20       20              30
20-30       f1              30 + f1
30-40       40              70 + f1
40-50       f2              70 + f1 + f2
50-60       25              95 + f1 + f2
60-70       15              110 + f1 + f2
Total       N = 170
Therefore, 35 = 30 + [ {170/2 - (30 + f1)} / 40 ] x 10
35 = 30 + [ (55 - f1)/40 ] x 10
5 = (55 - f1)/4
f1 = 55 - 20
f1 = 35
Since f1 + f2 = 60, from equation (1) above
Therefore, f2 = 60 - 35 = 25

CALCULATION OF ARITHMETIC MEAN
$\bar{x} = A + \dfrac{\sum f d}{N}$
Where
A = Assumed mean = 35 (Suppose)
m = Mid-point of the class
d = (m - A)
N = Total number of observations

Variable    f          Mid-point (m)    d = (m - 35)      f.d
0-10        10         5                5 - 35 = -30      - 300
10-20       20         15               15 - 35 = -20     - 400
20-30       35         25               25 - 35 = -10     - 350
30-40       40         35               35 - 35 = 0       0
40-50       25         45               45 - 35 = 10      + 250
50-60       25         55               55 - 35 = 20      + 500
60-70       15         65               65 - 35 = 30      + 450
            N = 170                                       $\sum f d$ = 150
$\bar{x} = A + \dfrac{\sum f d}{N}$ = 35 + 150/170
= 35 + 0.882 = 35.882
QUARTILES, DECILES, PERCENTILES
Besides median, there are other measures which divide a series into equal number of parts.
Important amongst these are Quartiles, Deciles and Percentiles.
Quartiles are those values of the variate which divide the total frequency into four equal
parts.
Deciles divide the total frequency into 10 equal parts.
Percentiles divide the total frequency in 100 equal parts.
Just as one point divides a series into two parts, three points would divide it into four parts, 9
points into 10 parts and 99 points into 100 parts, consequently there are only 3 Quartiles, 9
Deciles and 99 Percentiles for a series. The quartiles are denoted by symbol Q, deciles by D
and percentiles by P. The subscript 1,2,3 etc., beneath Q, D, P would refer to the particular
value that we want to compute. Thus Q1 would refer to first quartile, D1 first decile, P1 first
percentile.
COMPUTATION OF QUARTILES, DECILES, PERCENTILES :
The procedure for computing quartiles, deciles, percentiles is the same as for median.
For grouped data, the following formulae are used for quartiles, deciles, and percentiles :
$Q_j = L + \dfrac{\frac{jN}{4} - \text{p.c.f.}}{f} \times i$    for j = 1, 2, 3
$D_k = L + \dfrac{\frac{kN}{10} - \text{p.c.f.}}{f} \times i$    for k = 1, 2, 3, .., 9
$P_m = L + \dfrac{\frac{mN}{100} - \text{p.c.f.}}{f} \times i$    for m = 1, 2, 3, .., 99
L = Lower limit of the class in which the required value lies
f = Frequency of that class
p.c.f. = Cumulative frequency of the previous class
N = Number of observations
i = Width of the class
#4. The profits earned by 100 companies during 2003-04 are given below :
Profits (Rs. lakhs)    No. of companies    Profits (Rs. lakhs)    No. of companies
20-30                  4                   60-70                  15
30-40                  8                   70-80                  10
40-50                  18                  80-90                  8
50-60                  30                  90-100                 7
Calculate Q1, Q2, D4 and P80 and interpret the values.
Solution :
CALCULATION OF Q1, Q2, D4 and P80
Profits (Rs. lakhs)    f     c.f.
20-30                  4     4
30-40                  8     12
40-50                  18    30
50-60                  30    60
60-70                  15    75
70-80                  10    85
80-90                  8     93
90-100                 7     100
Q1 = Size of N/4 th observation = 100/4 = 25th observation
Hence, Q1 lies in the class 40-50.
$Q_1 = L + \dfrac{\frac{jN}{4} - \text{p.c.f.}}{f} \times i$
= 40 + [ (1 x 100/4 - 12)/18 ] x 10
= 40 + [ (25 - 12)/18 ] x 10
= 40 + 7.22 = 47.22
That means 25% of the companies earn an annual profit of Rs.47.22 lakhs or less.
Q2 = Size of 2N/4 th observation = 2 x 100/4 = 50th observation
Hence, Q2 lies in the class 50-60.
$Q_2 = L + \dfrac{\frac{jN}{4} - \text{p.c.f.}}{f} \times i$
= 50 + [ (2 x 100/4 - 30)/30 ] x 10
= 50 + [ (50 - 30)/30 ] x 10
= 50 + 6.67 = 56.67
That means 50% of the companies earn an annual profit of Rs.56.67 lakhs or less.
D4 = Size of 4N/10 th observation = (4 x 100)/10 = 40th observation
D4 lies in the class 50-60.
$D_4 = L + \dfrac{\frac{4N}{10} - \text{p.c.f.}}{f} \times i$
= 50 + [ (40 - 30)/30 ] x 10 = 53.33
Thus, 40% of the companies earn an annual profit of Rs.53.33 lakhs or less.
P80 = Size of 80N/100 th observation = (80 x 100)/100 = 80th observation
P80 lies in the class 70-80.
$P_{80} = L + \dfrac{\frac{80N}{100} - \text{p.c.f.}}{f} \times i$
= 70 + [ (80 x 100/100 - 75)/10 ] x 10
= 70 + [ (80 - 75)/10 ] x 10 = 70 + 5 = 75
Thus, 80% of the companies earn an annual profit of Rs.75 lakhs or less and 20% of the
companies earn an annual profit of more than Rs.75 lakhs.
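Quartiles, deciles and percentiles differ only in the rank jN/4, kN/10 or mN/100 that is looked up in the cumulative frequencies, so one routine can serve all three. The Python sketch below is an illustration assuming equal class widths. The function name `grouped_quantile` and its parameters `j` and `parts` are hypothetical.

```python
def grouped_quantile(lowers, width, freqs, j, parts):
    """Value below which j/parts of the observations lie (parts = 4, 10 or 100)."""
    target = j * sum(freqs) / parts      # rank jN/parts
    preceding = 0                        # cumulative frequency of the previous classes
    for lower, f in zip(lowers, freqs):
        if preceding + f >= target:      # class containing the required item
            return lower + (target - preceding) / f * width
        preceding += f


lowers = [20, 30, 40, 50, 60, 70, 80, 90]
freqs = [4, 8, 18, 30, 15, 10, 8, 7]

q1 = grouped_quantile(lowers, 10, freqs, 1, 4)
q2 = grouped_quantile(lowers, 10, freqs, 2, 4)
d4 = grouped_quantile(lowers, 10, freqs, 4, 10)
p80 = grouped_quantile(lowers, 10, freqs, 80, 100)
print(round(q1, 2), round(q2, 2), round(d4, 2), round(p80, 2))
```

Calling the routine with j = 1 and parts = 2 gives the median, which is the same value as Q2.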

Concept of Mode
The mode is another measure of central tendency that is different from the mean but somewhat
like the median. The mode or the modal value is that value in a series which occurs with the
highest frequency.

We rarely use the mode of ungrouped data as a measure of central tendency. Table-1, for
example, shows the number of delivery trips per day made by a supplier. The mode or the
modal value is 15 because it occurs more often than any other value (three times).
Table-1 : Delivery trips per day in 20 day period
0    0    1    2    2    4    5    5    6    7    7    8    12    15    15    15    19

A mode of 15 implies that 15 is the most frequent number of trips, but it fails to let us know that
most of the values are under 10.
If we group these data into a frequency distribution as shown in Table -2, we select the class of
4-7 with the most observations as the modal class. This class is more representative of the
delivery trips than the mode of 15 trips per day. For this reason, whenever we use the mode as a
measure of the central tendency of a data set, we should calculate the mode from grouped
data.

Table-2 : Frequency distribution of delivery trips
Class in number of trips    Frequency
0 - 3
4 - 7                                      <- MODAL CLASS
8 - 11
12 & above
Calculating the mode from grouped data
1.When data are grouped in a frequency distribution, we must assume that the mode is
located in the class with the highest frequency. To determine a single value for the mode from
this modal class, we use the equation below :
$M_0 = L + \dfrac{d_1}{d_1 + d_2} \times i$
Where
L = Lower limit of the modal class
d1 = The difference between the frequency of the modal class and the frequency of the preceding
class
d2 = The difference between the frequency of the modal class and the frequency of the
succeeding class
i = Size of the modal class
2.Another form of this formula is :
$M_0 = L + \dfrac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times i$
Where
L = Lower limit of the modal class
f1 = Frequency of the modal class
f0 = Frequency of the class preceding the modal class
f2 = Frequency of the class succeeding the modal class
i = Width of the modal class interval
The two forms are identical, since d1 = f1 - f0 and d2 = f1 - f2, so that d1 + d2 = 2f1 - f0 - f2.

#1. The following data relate to the sales of 100 companies :
Sales (Rs. Lacs)    No. of companies    Sales (Rs. Lacs)    No. of companies
Below 60            12                  66-68               10
60-62               18                  68-70               3
62-64               25                  70-72               2
64-66               30
Calculate the value of Modal sales.

Solution :
Since the maximum frequency 30 is in the class 64-66, therefore, 64-66 is the modal class.
L = Lower limit of the modal class = 64
d1 = The difference between the frequency of the modal class and the frequency of the preceding
class = 30 - 25 = 5
d2 = The difference between the frequency of the modal class and the frequency of the
succeeding class = 30 - 10 = 20
i = Size of the modal class = 2
$\text{Mode} = M_0 = L + \dfrac{d_1}{d_1 + d_2} \times i$ = 64 + [ 5/(5 + 20) ] x 2 = 64 + 0.4 = 64.4
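The modal-class formula can be sketched in Python as below. This is an illustration assuming equal class widths and a single class with the highest frequency. The function name `grouped_mode` is hypothetical, and the open class "Below 60" is entered with an assumed lower limit of 58, which does not affect the result because it is not the modal class.

```python
def grouped_mode(lowers, width, freqs):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * i for the class with highest frequency."""
    k = max(range(len(freqs)), key=lambda idx: freqs[idx])  # index of the modal class
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                  # frequency of the preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0     # frequency of the succeeding class
    return lowers[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Sales classes 58-60 (assumed for "Below 60"), 60-62, ..., 70-72.
lowers = [58, 60, 62, 64, 66, 68, 70]
freqs = [12, 18, 25, 30, 10, 3, 2]
mode = grouped_mode(lowers, 2, freqs)
print(round(mode, 2))
```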

#2. The median and mode of the following wage distribution are Rs.33.5 and Rs.34
respectively. However, three frequencies are missing. Determine their values.
Wages :
0-10 10-20 20-30 30-40 40-50 50-60 60-70 Total
(In hundred Rs.)
Frequencies :
4
16
?
?
?
6
4
230

Solution :
Let the missing frequencies be f0, f1, and f2 corresponding to classes 20-30, 30-40 and 40-50
respectively. Since median and mode are 33.5 and 34, they lie in the class 30-40. The frequency
of this class is f1.
DETERMINING MISSING VALUES
Wages (In hundred Rs.)
Frequency
Cumulative frequency
0-10
4
4
10-20
16
20
20-30
f0
20+f0
30-40
f1
20+f0+f1
40-50
f2
20+f0+f1+f2
50-60
6
226
60-70
4
230
N=230
From the given frequencies in the table above, we can write,
f0 + f1 + f2 = 230 - (4 + 16 + 6 + 4) = 200
Or, f2 = 200 - f0 - f1 ..(1)
Mode = L + [(f1 - f0) / (2f1 - f0 - f2)] x i
Therefore, 34 = 30 + [(f1 - f0) / (2f1 - f0 - f2)] x 10
Or, 34 - 30 = [(f1 - f0) / {2f1 - f0 - (200 - f0 - f1)}] x 10 [Putting the value of
f2 = 200 - f0 - f1 from Equation (1) above]
Or, 4/10 = (f1 - f0) / (2f1 - f0 - 200 + f0 + f1)
Or, 4/10 = (f1 - f0) / (3f1 - 200)
Or, 4(3f1 - 200) = 10(f1 - f0)
Or, 12f1 - 800 = 10f1 - 10f0
Or, 2f1 + 10f0 = 800
Or, f1 + 5f0 = 400
Or, f1 = 400 - 5f0 ..(2)

Median = L + [(N/2 - p.c.f.) / f] x i
Therefore, 33.5 = 30 + [{230/2 - (20 + f0)} / f1] x 10
Or, 33.5 - 30 = [(115 - 20 - f0) / f1] x 10
Or, 3.5f1 = 950 - 10f0
Or, 7f1 = 1900 - 20f0
Or, 7(400 - 5f0) = 1900 - 20f0 [Substituting f1 = 400 - 5f0 from Equation (2)]
Or, 2800 - 35f0 = 1900 - 20f0
Or, 2800 - 1900 = -20f0 + 35f0
Or, 900 = 15f0
Or, f0 = 900/15
Or, f0 = 60
Now substituting the value of f0 = 60 in Equation (2), we get
f1 = 400 - 5 x 60 = 400 - 300 = 100
Now, substituting values of f0 = 60 and f1 = 100 in Equation (1), we get
f2 = 200 - 60 - 100 = 40
Therefore, f0 = 60, f1 = 100, f2 = 40
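
As a check on the result, the following Python sketch substitutes the three frequencies back into the mode and median formulas. The variable names are illustrative assumptions.

```python
# Completed frequencies for classes 0-10, 10-20, ..., 60-70.
freqs = [4, 16, 60, 100, 40, 6, 4]
width = 10
k = 3            # index of the class 30-40, which holds both median and mode
lower = 30
n = sum(freqs)

# Mode = L + [(f1 - f0) / (2f1 - f0 - f2)] x i
mode = lower + (freqs[k] - freqs[k - 1]) / (
    2 * freqs[k] - freqs[k - 1] - freqs[k + 1]) * width

# Median = L + [(N/2 - p.c.f.) / f] x i
cum_before = sum(freqs[:k])
median = lower + (n / 2 - cum_before) / freqs[k] * width

print(n, mode, median)
```

The total comes back as 230, with mode 34 and median 33.5, matching the values given in the problem.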

RELATIONSHIP AMONG MEAN, MEDIAN AND MODE

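[Figure: frequency curve showing the positions of the three averages. The mode (M0) lies under the peak of the curve, the median (Me) divides the area in halves, and the mean (X̄) is the centre of gravity.]
M0 : Mode
Me : Median
X̄ : Mean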
In moderately skewed or asymmetrical distributions a very important relationship exists among
mean, median and mode. In such distributions the distance between the mean and median is
about one-third the distance between mean and the mode.
Karl Pearson has expressed this relationship as follows :
Mode = Mean - 3[Mean - Median]

Or, Mode = 3 Median - 2 Mean

#4.In a moderately asymmetrical distribution, the mode and the mean are 32.1 and 35.4
respectively. Find the value of Median.
Solution:
Mode = 3 Median - 2 Mean
Given Mean = 35.4,
Mode = 32.1
Therefore, 32.1 = 3 x Median - 2 x 35.4
Or, 3 Median = 32.1 + 70.8 = 102.9
Or, Median = 34.3

THE GEOMETRIC MEAN


When we deal with quantities that change over a period of time, we need to know an
average rate of change, such as an average growth rate over a period of several years. In such
cases, the simple arithmetic mean is inappropriate, because it gives the wrong answer. What
we need to find is the geometric mean, simply called G.M.
G.M. = [Product of all X values]^(1/n), where n = Number of X values
We use the geometric mean to show multiplicative effects over time in compound interest and
inflation calculations. The geometric mean is to be used to calculate the average percentage
change in some variable (sales, production, population , or other business data) over time.
#. The growth of bad-debt expense of Montari Industries Ltd. over the last few years is as
follows. Calculate the average percentage increase in bad-debt expense over this time
period. If this rate continues, estimate the percentage increase in bad-debts for 1997,
relative to 1995.

Year      Growth
1989      0.11
1990      0.09
1991      0.075
1992      0.08
1993      0.095
1994      0.108
1995      0.120

SOLUTION :
G.M. = [(1.11)(1.09)(1.075)(1.08)(1.095)(1.108)(1.12)]^(1/7)
= [1.908769992]^(1/7)
= 1.09675

The average increase is 9.675% per year.

The estimated increase in bad-debt expense for 1997 is (1.09675)^2 - 1 = 0.2029.
That is 20.29% higher than in 1995.
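
The same calculation can be sketched in Python using only the standard library. The variable names are illustrative assumptions; the growth figures are those from the problem.

```python
import math

# Yearly growth of bad-debt expense, 1989 to 1995.
rates = [0.11, 0.09, 0.075, 0.08, 0.095, 0.108, 0.120]
factors = [1 + r for r in rates]          # growth factors 1.11, 1.09, ...

# Geometric mean = n-th root of the product of the growth factors.
gm = math.prod(factors) ** (1 / len(factors))

# Two more years at the average rate (1995 to 1997).
two_year_increase = gm ** 2 - 1

print(round(gm, 5), round(two_year_increase, 4))
```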

Harmonic Mean
Harmonic mean is used for computing the average rate of increase of profits or average speed at
which journey has been performed. The rate usually indicates the relation between two different
types of measuring units that can be expressed reciprocally. For example speed = km/hr. Here,
km and hr are two different units.
Harmonic mean is defined as the reciprocal of the arithmetic mean of the reciprocal of the
individual observations. Thus, by definition,
for individual observations,
H.M. = N / (1/X1 + 1/X2 + 1/X3 + ... + 1/XN) = N / Σ(1/X)
For discrete series, H.M. = N / Σ(f x 1/X) = N / Σ(f/X)
For continuous series, H.M. = N / Σ(f x 1/m) = N / Σ(f/m). Here, we take the reciprocal of the mid-points (m).

#. An aeroplane covers the four sides of a square at speeds of 1000, 2000, 3000 and 4000 km
per hour respectively. What is the average speed of the plane in its flight around the
square?
Solution:
If we compute the arithmetic mean, we get the following answer :
X = [1000+2000+3000+4000]/4 = 2500 km/hr
However, this is not the correct answer. In such a problem, harmonic mean is an appropriate
average.
H.M. = N / Σ(1/X)
= 4 / (1/1000 + 1/2000 + 1/3000 + 1/4000)
= 4 / (25/12000)
= 1920 km/hour
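
A small Python sketch shows why the harmonic mean is the right average here: it equals total distance divided by total time. The side length of 1 km is an arbitrary assumption (any length gives the same result); statistics.harmonic_mean is part of the standard library.

```python
from statistics import harmonic_mean

speeds = [1000, 2000, 3000, 4000]   # km per hour on each side of the square

# Harmonic mean straight from the standard library.
hm = harmonic_mean(speeds)

# Same result from first principles: total distance / total time.
side = 1.0                                    # assumed side length in km
total_time = sum(side / v for v in speeds)    # hours spent on the four sides
avg_speed = len(speeds) * side / total_time

print(round(hm, 2), round(avg_speed, 2))
```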

#. From the following data compute the value of harmonic mean :

Marks           : 10   20   25   40   50
No. of students : 20   30   50   15    5

Solution :
CALCULATION OF HARMONIC MEAN
Marks (X)    f        f/X
10           20       2
20           30       1.5
25           50       2
40           15       0.375
50           5        0.1
             N=120    Σ(f/X)=5.975

H.M. = N / Σ(f/X) = 120 / 5.975 = 20.08

#. From the following data compute the value of Harmonic Mean :

Class interval : 10-20   20-30   30-40   40-50   50-60
Frequency      : 4       6       10      7       3

Solution :
CALCULATION OF HARMONIC MEAN
Class interval    Mid-point (m)    Frequency (f)    f/m
10-20             15               4                0.267
20-30             25               6                0.240
30-40             35               10               0.286
40-50             45               7                0.156
50-60             55               3                0.055
                                   N=30             Σ(f/m) = 1.004

H.M. = N / Σ(f/m)
= 30/1.004 = 29.88
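
Both grouped examples follow the same rule, H.M. = N / Σ(f/X), which can be sketched as one Python function. The name weighted_harmonic_mean is an illustrative assumption. Note that unrounded arithmetic gives about 29.93 for the class-interval example; the figure 29.88 above comes from rounding each f/m to three decimals before summing.

```python
def weighted_harmonic_mean(values, freqs):
    """H.M. = N / sum(f / x) for a frequency distribution."""
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

# Discrete series: marks and number of students.
hm_marks = weighted_harmonic_mean([10, 20, 25, 40, 50], [20, 30, 50, 15, 5])

# Continuous series: class mid-points stand in for the values.
hm_classes = weighted_harmonic_mean([15, 25, 35, 45, 55], [4, 6, 10, 7, 3])

print(round(hm_marks, 2), round(hm_classes, 2))
```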

RELATIONSHIP AMONG A.M., G.M. & H.M.


In any distribution when the original items differ in size, the values of A.M., G.M., and H.M.
would also differ and would be in the following order :

A.M. ≥ G.M. ≥ H.M.

The equality signs hold only if all the numbers X1, X2, ..., Xn are identical.
PROOF :
Let a and b be two positive quantities such that a ≠ b.
The A.M., G.M., and H.M. of these two quantities are :
A.M. = X̄ = (a + b)/2 ;  G.M. = √(a x b) ;  H.M. = 2 / (1/a + 1/b) = 2ab/(a + b)
We have to prove that A.M. ≥ G.M. ≥ H.M.
Let us first prove that A.M. ≥ G.M.
Or, (a + b)/2 ≥ √(a x b)
Or, a + b ≥ 2√(ab)
Or, a + b - 2√(ab) ≥ 0
Or, (√a - √b)^2 ≥ 0
But the square of any real quantity is never negative. Hence, (√a - √b)^2 ≥ 0 always holds.
Hence, (a + b)/2 ≥ √(ab)
Let us now prove that G.M. ≥ H.M.
Or, √(ab) ≥ 2ab/(a + b)
Or, a + b ≥ 2ab/√(ab)
Or, a + b ≥ 2√(ab)
This has already been proved above. Hence, G.M. ≥ H.M.
Therefore, A.M. ≥ G.M. ≥ H.M.
If a and b are equal, then A.M. = G.M. = H.M.
Thus, A.M. ≥ G.M. ≥ H.M.
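
The ordering can also be observed numerically. The sketch below uses the standard-library statistics module (geometric_mean needs Python 3.8 or later); the data sets are illustrative assumptions.

```python
import math
from statistics import mean, geometric_mean, harmonic_mean

data = [1000, 2000, 3000, 4000]           # items that differ in size
am = mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)
print(am, round(gm, 1), hm)               # A.M. >= G.M. >= H.M.

same = [5, 5, 5]                          # identical items: all three coincide
am_s, gm_s, hm_s = mean(same), geometric_mean(same), harmonic_mean(same)
print(am_s, gm_s, hm_s)
```

A numerical check is of course not a proof; it only illustrates the inequality proved above.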
WHICH AVERAGE TO USE ?
The methods of computing the various types of average have been discussed in detail above. The
question now is which type of average should be used under what conditions. The following
considerations influence the selection of an appropriate average :
The type of data available :
If the data are badly skewed, avoid the Mean.
If the data are gappy around the middle, avoid the Median.
If the data are unequal in class-interval, avoid the Mode.

Arithmetic Mean - In the following cases the arithmetic mean should not be used :
In highly skewed distributions.
In distributions with open-end intervals.
To average ratios and rates of change.
When there are very large and very small items, because extreme items exert undue influence
on the mean.
Median - The median is generally the best average in open-end grouped distributions, especially
where the frequency curve is J-shaped or reverse J-shaped. For example, in the case of
income or price distributions, very high or very low values would cause the mean to
be higher or lower than the most common values. In such cases, the median or middle value of
the series may be a more representative figure to use in describing the mass of data.
Mode - The mode is best suited where there is an outstandingly large frequency. The mode can
be used in problems involving the expression of preferences where quantitative
measurements are not possible. If we want to compare consumer preferences for different
kinds of products or different kinds of advertisements, we can compare the modal preferences
expressed by different groups of people, but we cannot calculate the median or mean. The mode is
a particularly useful average for discrete series, e.g. the number of people wearing a given size of shoe.
Geometric Mean - The geometric mean is useful for averaging ratios and percentages and in computing
average rates of increase or decrease. It is particularly important in Economics and Business
Statistics in Index Number construction.
Harmonic Mean - The harmonic mean is useful in problems in which values of a variable are
compared with a constant quantity of another variable, i.e. distance covered within a certain time
and quantities purchased or sold per unit.
MEASURES OF VARIATION
The various measures of central value discussed above give us one single figure that represents
the entire data. But the average alone cannot adequately describe a set of observations, unless all
the observations are the same. It is necessary to describe the variability or dispersion of the
observations. In two or more distributions, the central value may be the same but still there can
be wide disparities in the formation of distributions. Measures of dispersion help us to study this
important characteristic of a distribution.
Some important definitions of dispersion are given below :
1 Dispersion is the measure of variation of the items. [A.L.Bowley]
2 The degree to which numerical data tend to spread about an average value is called the
variation or dispersion of the data. [Spiegel]
3 Dispersion or spread is the degree of the scatter or variation of the variable about a
central value.
4 The measurement of the scatteredness of the mass of figures in a series about an average
is called measure of variation or dispersion. [Simpson & Kafka]

Methods of studying variation


The following are the important methods of studying variation :
1 The Range
2 The Interquartile Range and Quartile Deviation
3 The Mean Deviation or Average Deviation
4 The Standard Deviation
Range The range is the difference between the highest and lowest observed values.
Range = Value of highest observation - value of lowest observation
The relative measure corresponding to range is called the Coefficient of Range, which is
obtained by the following formula :
Coefficient of Range = (L - S) / (L + S)
Where, L = Largest observation and S = Smallest observation
Range is useful in the following cases :
1 Quality control - The Range is used in preparing Quality Control Charts. If the
observations (say dimension or weight) of the product do not fall within the range,
i.e. between the lower and the higher acceptable limits, the production machinery should
be examined to find out why the items produced have not followed their usual more
consistent pattern.
2 Fluctuation in the share prices - Range is useful in studying the variations in the prices of
stocks and shares and other commodities that are sensitive to price changes from one
period to another.
3 Weather forecast - The Meteorological department makes use of the range in
determining the difference between the minimum temperature and the maximum
temperature. This information is of great concern to the public because it tells them
within what limits the temperature is likely to vary on a particular day.
The Interquartile Range - The range as a measure of dispersion has certain limitations. It is
based on two extreme items and it fails to take account of the scatter within the range. From
this there is reason to believe that if the dispersion of the extreme items is discarded, the
limited range thus established might be more instructive. For this purpose, there has been
developed a measure called the interquartile range, the range which includes the middle
50% of the distribution. That is, one quarter of the observations at the lower end and another
quarter of the observations at the upper end of the distribution are excluded in computing the
interquartile range. In other words, interquartile range represents the difference between the
third quartile and the first quartile.
Symbolically, Interquartile Range = Q3 - Q1
Quartile Deviation
Quartile deviation gives the average amount by which the two quartiles differ from the
median. In a symmetrical distribution, the two quartiles are equidistant from the median.
Quartile Deviation = (Q3 - Q1) / 2
Coefficient of Quartile Deviation
The relative measure corresponding to quartile deviation is called the Coefficient of Quartile
Deviation.
Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
#. You are given the frequency distribution of 292 workers of a factory according to
their average weekly income. Calculate quartile deviation and its coefficient from the
following data :

Weekly Income (Rs.)    No. of workers
Below 1350             8
1350-1370              16
1370-1390              39
1390-1410              58
1410-1430              60
1430-1450              40
1450-1470              22
1470-1490              15
1490-1510              15
1510-1530              9
1530 & above           10

Solution :
Weekly income    No. of workers (f)    c.f.
Below 1350       8                     8
1350-1370        16                    24
1370-1390        39                    63
1390-1410        58                    121
1410-1430        60                    181
1430-1450        40                    221
1450-1470        22                    243
1470-1490        15                    258
1490-1510        15                    273
1510-1530        9                     282
1530 & above     10                    292
                 N=292

Median = Size of N/2 th observation = 292/2 = 146th observation
So, Median lies in the class 1410-1430.
Median = L + [(N/2 - p.c.f.) / f] x i
= 1410 + [(146 - 121) / 60] x 20
= 1410 + 8.333 = 1418.333

Q1 = Size of N/4 th observation = 292/4 = 73rd observation
Q1 lies in the class 1390-1410.
Q1 = L + [(N/4 - p.c.f.) / f] x i = 1390 + [(73 - 63) / 58] x 20 = 1390 + 3.448 = 1393.448

Q3 = Size of 3N/4 th observation = (3 x 292)/4 = 219th observation
Q3 lies in the class 1430-1450.
Q3 = L + [(3N/4 - p.c.f.) / f] x i = 1430 + [(219 - 181) / 40] x 20 = 1430 + 19 = 1449

Quartile Deviation = (Q3 - Q1) / 2 = (1449 - 1393.448) / 2 = 27.776

Coefficient of Q.D. = (Q3 - Q1) / (Q3 + Q1)
= (1449 - 1393.448) / (1449 + 1393.448) = 55.552/2842.448
= 0.020
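
The three interpolations above share one pattern, which the following Python sketch captures. The function name grouped_quantile is an illustrative assumption, and so is the lower limit of 1330 for the open class "Below 1350"; it does not affect the result because no quartile falls in that class.

```python
def grouped_quantile(lowers, freqs, width, p):
    """Interpolated quantile: L + [(p*N - p.c.f.) / f] x i."""
    target = p * sum(freqs)
    cum = 0                               # cumulative frequency so far (p.c.f.)
    for lower, f in zip(lowers, freqs):
        if cum + f >= target:             # the quantile lies in this class
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must be between 0 and 1")

lowers = [1330, 1350, 1370, 1390, 1410, 1430, 1450, 1470, 1490, 1510, 1530]
freqs = [8, 16, 39, 58, 60, 40, 22, 15, 15, 9, 10]

q1 = grouped_quantile(lowers, freqs, 20, 0.25)
q3 = grouped_quantile(lowers, freqs, 20, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 3), round(q3, 3), round(quartile_deviation, 3),
      round(coefficient_qd, 3))
```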

THE MEAN DEVIATION


The two measures of dispersion discussed above, namely, range and quartile deviation, are not
measures of variation in the strict sense, as they do not show the scatter around an
average. However, to study the formation of a distribution we should take the deviations from
an average. The two other measures, namely, the average deviation and standard deviation,
help us in achieving this goal.
The mean deviation is also known as average deviation. It is the average difference between
the items in a distribution and the median or mean of that series. Theoretically there is an
advantage in taking the deviations from the median because the sum of deviations of items from
the median is minimum when signs are ignored. However, in practice, the arithmetic mean is
more frequently used in calculating the value of average deviation and this is the reason why
it is more commonly called mean deviation. The mean deviation is obtained by calculating
the absolute deviations of each observation from the median (or mean), and then
averaging these deviations by taking their arithmetic mean.
1 Computation of Mean Deviation - Individual Observations
If X1, X2, X3, ..., XN are N given observations, then the mean deviation about an
average A is given by
M.D. = (1/N) Σ|X - A| = (1/N) Σ|D|
Where |D| = |X - A|
Coefficient of mean deviation - The relative measure corresponding to the mean
deviation is called the coefficient of mean deviation. This is obtained by dividing mean
deviation by the particular average used in computing mean deviation. Thus, if mean
deviation is computed from median, the coefficient of mean deviation shall be obtained
by dividing mean deviation by median :
Coefficient of M.D. = M.D. / Median
#. Calculate the mean deviation and its coefficient of the two income groups of five and
seven members given below :

I (Rs.)  : 4000   4200   4400   4600   4800
II (Rs.) : 3000   4000   4200   4400   4600   4800   5800

SOLUTION :
Group I      Deviation from median 4400, |D|
4000         400
4200         200
4400         0
4600         200
4800         400
N=5          Σ|D| = 1200

Group II     Deviation from median 4400, |D|
3000         1400
4000         400
4200         200
4400         0
4600         200
4800         400
5800         1400
N=7          Σ|D| = 4000

Mean deviation (Group I) :
M.D. = (1/N) Σ|D|
|D| = Deviation from median ignoring signs
Median = Size of (N + 1)/2 th item = (5 + 1)/2 = 3rd item
Size of 3rd item is 4400
Σ|D| = 1200, N = 5, M.D. = 1200/5 = 240
This means that the average deviation of the individual incomes from the median
income is Rs.240
Mean deviation (Group II) :
M.D. = (1/N) Σ|D|
|D| = Deviation from median ignoring signs
Median = Size of (N + 1)/2 th item = (7 + 1)/2 = 4th item
Size of 4th item is 4400
Σ|D| = 4000, N = 7, M.D. = 4000/7 = 571.43
Note: If we are to compute the coefficient of mean deviation, we shall divide mean
deviation by median. Thus for the first group :
Coefficient of M.D. = 240/4400 = 0.055
And for the second group,
Coefficient of M.D. = 571.43/4400 = 0.130
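
A minimal Python sketch of the same computation for individual observations follows. The function name mean_deviation is an illustrative assumption; statistics.median comes from the standard library.

```python
from statistics import median

def mean_deviation(values):
    """Return (M.D. about the median, coefficient of M.D.)."""
    med = median(values)
    md = sum(abs(x - med) for x in values) / len(values)
    return md, md / med

group_1 = [4000, 4200, 4400, 4600, 4800]
group_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]

md_1, coeff_1 = mean_deviation(group_1)
md_2, coeff_2 = mean_deviation(group_2)
print(round(md_1, 2), round(coeff_1, 3))
print(round(md_2, 2), round(coeff_2, 3))
```

Both groups have the same median (4400), yet the second group's coefficient is more than twice the first's, which is the point of the example.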

2. CALCULATION OF MEAN DEVIATION FOR DISCRETE SERIES

M.D. = (1/N) Σf|D|
|D| = Deviations of the items from median ignoring signs
f = Frequency of observations
N = Total of the frequencies

#.(a) Calculate the mean deviation from the following series :

X : 10   11   12   13   14
f : 3    12   18   12   3

SOLUTION :
CALCULATION OF MEAN DEVIATION
X      f      c.f.    |D|    f|D|
10     3      3       2      6
11     12     15      1      12
12     18     33      0      0
13     12     45      1      12
14     3      48      2      6
       N=48                  Σf|D| = 36

Median = Size of (N+1)/2 th item = (48+1)/2 = 24.5th item
Size of 24.5th item is 12, hence Median = 12
Mean Deviation = M.D. = (1/N) Σf|D| = 36/48 = 0.75
(b) Calculate the mean deviation from the Mean for the following data :

Size      : 2   4   6   8   10   12   14   16
Frequency : 2   2   4   5   3    2    1    1

CALCULATION OF MEAN DEVIATION FROM MEAN
X      f      f.X       |D| = |X - 8|    f|D|
2      2      4         6                12
4      2      8         4                8
6      4      24        2                8
8      5      40        0                0
10     3      30        2                6
12     2      24        4                8
14     1      14        6                6
16     1      16        8                8
       N=20   ΣfX=160                    Σf|D|=56

X̄ = ΣfX / N = 160/20 = 8

M.D. = Σf|D| / N = 56/20 = 2.8

3. CALCULATION OF MEAN DEVIATION FOR CONTINUOUS SERIES

For calculating mean deviation in continuous series the procedure remains the same as
discussed above. The only difference is that here we obtain the mid-points of the various
classes and take deviations of these points from the median. The formula is the same, i.e.
M.D. = (1/N) Σf|D|
Where |D| = |m - Median| and m = Mid-value of the class

#. Find the median and mean deviation of the following data :

Size       Frequency
0-10       7
10-20      12
20-30      18
30-40      25
40-50      16
50-60      14
60-70      8

Solution :
Median = Size of N/2 th item = 100/2 = 50th item
Therefore, Median lies in the class 30-40
Median = L + [(N/2 - p.c.f.) / f] x i
L = 30, N/2 = 50, p.c.f. = 37, f = 25, i = 10
Median = 30 + [(50 - 37)/25] x 10 = 30 + 5.2 = 35.2

CALCULATION OF MEDIAN AND MEAN DEVIATION
Size      f       c.f.    Mid-point (m)    |D| = |m - 35.2|    f|D|
0-10      7       7       5                30.2                211.4
10-20     12      19      15               20.2                242.4
20-30     18      37      25               10.2                183.6
30-40     25      62      35               0.2                 5.0
40-50     16      78      45               9.8                 156.8
50-60     14      92      55               19.8                277.2
60-70     8       100     65               29.8                238.4
          N=100                                                Σf|D| = 1314.8

M.D. = (1/N) Σf|D| = 1314.8/100 = 13.148

The reason for taking absolute deviation is to avoid the signs since we want to find out the
amount of differences of observations from median rather than the direction of the
differences.
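
The continuous-series procedure (interpolated median, then frequency-weighted absolute deviations of the mid-points) can be sketched in Python as follows. The variable names are illustrative assumptions; the data are those of the example above.

```python
mids = [5, 15, 25, 35, 45, 55, 65]        # mid-points of 0-10, ..., 60-70
freqs = [7, 12, 18, 25, 16, 14, 8]
n = sum(freqs)

# Median class is 30-40: L = 30, p.c.f. = 37, f = 25, i = 10.
lower, cum_before, f_median, width = 30, 37, 25, 10
med = lower + (n / 2 - cum_before) / f_median * width

# M.D. = (1/N) * sum of f * |m - Median|; abs() discards the sign.
md = sum(f * abs(m - med) for m, f in zip(mids, freqs)) / n
print(round(med, 1), round(md, 3))
```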

DISPERSION
The most comprehensive descriptions of dispersion are those that deal with the average
deviation from some measure of central tendency. Two of these measures are :
1 Variance
2 Standard deviation.
Both of these tell us an average distance of any observation in the data set from the mean of
the distribution.
STANDARD DEVIATION
The standard deviation concept was introduced by Karl Pearson in 1893. It is the most widely
used measure of studying dispersion. The standard deviation is also known as Root Mean
Square Deviation for the reason that it is the square root of the mean of the squared
deviation from the arithmetic mean. The standard deviation measures the absolute dispersion;
the greater the standard deviation, the greater will be the magnitude of the deviations of the
values from their mean. A small standard deviation means a high degree of uniformity of the
observations as well as homogeneity of the series; a large standard deviation means just the
opposite.
VARIANCE : The variance is the average of the squared distances of the
observations from the mean. Every population has a variance, which is
symbolised by σ^2.
σ^2 = Σ(X - μ)^2 / N = (ΣX^2 / N) - μ^2
Where
σ^2 = Variance
X = Item or observation
μ = Population mean
N = Total number of items in the population
CALCULATION OF STANDARD DEVIATION

1.FOR INDIVIDUAL OBSERVATIONS


When deviations are taken from the actual mean, the following formula is applied :
σ = √[(Σx^2)/N]
Where, x = (X - X̄)
When the actual mean is a fraction, say 23.45, it becomes too cumbersome to take deviations
from it and then obtain squares of these deviations. In such a case, either the mean may be
approximated or else the deviations may be taken from an assumed mean. When deviations are
taken from an assumed mean A, the following formula is applied :
σ = √[(Σd^2/N) - (Σd/N)^2]
Where d = X - A
# Calculate the standard deviation from the following observations :
240.12
240.13
240.15
240.12
240.17
240.15
240.17
240.16
240.22
240.21
Solution : The assumed mean should be as near to the actual mean as possible to
minimize calculations. In this case the actual mean is 240.16, so let us take 240 as the assumed
mean.

CALCULATION OF STANDARD DEVIATION
X          d = (X - 240)    d^2
240.12     +0.12            0.0144
240.13     +0.13            0.0169
240.15     +0.15            0.0225
240.12     +0.12            0.0144
240.17     +0.17            0.0289
240.15     +0.15            0.0225
240.17     +0.17            0.0289
240.16     +0.16            0.0256
240.22     +0.22            0.0484
240.21     +0.21            0.0441
N=10       Σd = +1.60       Σd^2 = 0.2666

σ = √[(Σd^2/N) - (Σd/N)^2]
= √[(0.2666/10) - (1.60/10)^2] = √[0.02666 - 0.0256] = √0.00106 = 0.033
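
The assumed-mean shortcut gives the same answer as the direct definition, as the following Python sketch suggests. The variable names are illustrative assumptions; statistics.pstdev is the standard-library population standard deviation.

```python
import math
from statistics import pstdev

obs = [240.12, 240.13, 240.15, 240.12, 240.17,
       240.15, 240.17, 240.16, 240.22, 240.21]
assumed_mean = 240
n = len(obs)

# Deviations from the assumed mean, d = X - A.
d = [x - assumed_mean for x in obs]

# sigma = sqrt( sum(d^2)/N - (sum(d)/N)^2 )
sigma = math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

print(round(sigma, 3), round(pstdev(obs), 3))
```

The choice of assumed mean does not change the result; it only changes how convenient the hand arithmetic is.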
2.FOR DISCRETE SERIES

For calculating standard deviation in discrete series, any of the following methods may be
applied :
(a) Actual Mean Method
(b) Assumed Mean Method
(c) Step Deviation Method

(a) Formula for Actual Mean Method :
σ = √[(Σf.x^2)/N], Where x = (X - X̄)

(b) Formula for Assumed Mean Method :
σ = √[(Σfd^2)/N - (Σfd/N)^2], Where d = (X - A)

(c) Step Deviation Method : When this method is used, we take deviations of mid-points
from an Assumed Mean and divide these deviations by the width of the Class Interval, i.e. i.
In such case, σ = √[(Σfd^2)/N - (Σfd/N)^2] x i, Where d = (X - A)/i and i = Class Interval

# The annual salaries of a group of employees are given in the following table :

Salaries (In Rs.000) : 45   50   55   60   65   70   75   80
No. of persons       : 3    5    8    7    9    7    4    7

Calculate the standard deviation of the salaries.

Solution :(Using Step Deviation Method)


CALCULATION OF STANDARD DEVIATION
Salaries (X)    No. of persons (f)    d = (X - 60)/5    f.d        f.d^2
45              3                     -3                -9         27
50              5                     -2                -10        20
55              8                     -1                -8         8
60              7                     0                 0          0
65              9                     1                 9          9
70              7                     2                 14         28
75              4                     3                 12         36
80              7                     4                 28         112
                N=50                                    Σfd=36     Σfd^2=240

σ = √[(Σfd^2)/N - (Σfd/N)^2] x i = √[(240/50) - (36/50)^2] x 5 = √[4.8 - 0.5184] x 5 = √4.2816 x 5 = 10.35

3.FOR CONTINUOUS SERIES


In continuous series any of the methods discussed above for discrete frequency
distribution can be used. However, in practice it is the step deviation method that is most
used. The formula is :
σ = √[(Σfd^2)/N - (Σfd/N)^2] x i
Where d = (m - A)/i,
m = Mid-point of the class and i = Class Interval
# Calculate mean and standard deviation of following frequency distribution of marks :

Marks      No. of students
0-10       5
10-20      12
20-30      30
30-40      45
40-50      50
50-60      37
60-70      21
Solution :
Marks      Mid-point (m)    f        d = (m - 35)/10    fd         fd^2
0-10       5                5        -3                 -15        45
10-20      15               12       -2                 -24        48
20-30      25               30       -1                 -30        30
30-40      35               45       0                  0          0
40-50      45               50       +1                 +50        50
50-60      55               37       +2                 +74        148
60-70      65               21       +3                 +63        189
                            N=200                       Σfd=118    Σfd^2=510

X̄ = A + [(Σfd)/N] x i = 35 + (118/200) x 10 = 35 + 5.9 = 40.9

σ = √[(Σfd^2)/N - (Σfd/N)^2] x i
= √[(510/200) - (118/200)^2] x 10
= √[2.55 - 0.3481] x 10
= √2.2019 x 10
= 1.4839 x 10
= 14.839
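
The step deviation method can be sketched as a single Python function that returns both the mean and the standard deviation. The function name step_deviation and its parameters are illustrative assumptions. It is applied here to the marks distribution above and, for comparison, to the earlier salary example.

```python
import math

def step_deviation(values, freqs, assumed_mean, interval):
    """Mean and S.D. using d = (X - A)/i, scaled back by i at the end."""
    n = sum(freqs)
    d = [(x - assumed_mean) / interval for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = assumed_mean + sum_fd / n * interval
    sigma = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * interval
    return mean, sigma

# Marks: class mid-points 5, 15, ..., 65 with A = 35 and i = 10.
marks_mean, marks_sd = step_deviation(
    [5, 15, 25, 35, 45, 55, 65], [5, 12, 30, 45, 50, 37, 21], 35, 10)

# Salaries (Rs.000) with A = 60 and i = 5.
salary_mean, salary_sd = step_deviation(
    [45, 50, 55, 60, 65, 70, 75, 80], [3, 5, 8, 7, 9, 7, 4, 7], 60, 5)

print(round(marks_mean, 1), round(marks_sd, 3), round(salary_sd, 2))
```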
COMPUTATION OF COMBINED STANDARD DEVIATION
Just as it is possible to compute the combined mean of two or more groups,
we can also compute the combined standard deviation of two or more groups. Combined
standard deviation is denoted by σ12 and is computed as follows :
σ12 = √[(N1.σ1^2 + N2.σ2^2 + N1.d1^2 + N2.d2^2) / (N1 + N2)]
Where σ12 = Combined Standard Deviation
σ1 = Standard deviation of Group-1
σ2 = Standard deviation of Group-2
d1 = |X̄1 - X̄12|
d2 = |X̄2 - X̄12|
X̄1 = Mean of Group-1
X̄2 = Mean of Group-2
X̄12 = Combined mean of the two groups
# The number of workers employed, the mean wages (In Rs.) per month and standard
deviations (In Rs.) in each section of a factory are given below. Calculate the mean wages and
standard deviation of all the workers taken together.

Section    No. of workers employed    Mean wages (In Rs.)    Standard deviation (In Rs.)
A          50                         1113                   60
B          60                         1120                   70
C          90                         1115                   80

Solution :
X̄123 = [N1.X̄1 + N2.X̄2 + N3.X̄3] / [N1 + N2 + N3]
= [(50 x 1113) + (60 x 1120) + (90 x 1115)] / [50 + 60 + 90]
= 223200/200 = Rs.1116
d1 = |X̄1 - X̄123| = |1113 - 1116| = 3
d2 = |X̄2 - X̄123| = |1120 - 1116| = 4
d3 = |X̄3 - X̄123| = |1115 - 1116| = 1
σ123 = √[(N1.σ1^2 + N2.σ2^2 + N3.σ3^2 + N1.d1^2 + N2.d2^2 + N3.d3^2) / (N1 + N2 + N3)]
= √[(50 x 60^2 + 60 x 70^2 + 90 x 80^2 + 50 x 3^2 + 60 x 4^2 + 90 x 1^2) / (50 + 60 + 90)]
= √[(180000 + 294000 + 576000 + 450 + 960 + 90) / 200]
= √[1051500 / 200]
= √5257.5
= 72.51
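
The combined mean and standard deviation extend to any number of groups, as in this Python sketch. The function name combined_mean_sd is an illustrative assumption.

```python
import math

def combined_mean_sd(sizes, means, sds):
    """Combined mean and S.D. of several groups from their summaries."""
    n = sum(sizes)
    mean = sum(k * m for k, m in zip(sizes, means)) / n
    # Each group contributes N*sigma^2 (within) plus N*d^2 (between).
    total = sum(k * (s ** 2 + (m - mean) ** 2)
                for k, m, s in zip(sizes, means, sds))
    return mean, math.sqrt(total / n)

# Sections A, B and C of the factory.
wage_mean, wage_sd = combined_mean_sd(
    sizes=[50, 60, 90], means=[1113, 1120, 1115], sds=[60, 70, 80])
print(wage_mean, round(wage_sd, 2))
```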

RELATION BETWEEN MEASURES OF DISPERSION


In a normal distribution there is a fixed relationship between the three most commonly used
measures of dispersion. The Quartile Deviation is smallest, the Mean Deviation next and the
Standard Deviation is largest, in the following proportions :
Q.D. = (2/3) σ
M.D. = (4/5) σ
COEFFICIENT OF VARIATION

The standard deviation discussed above is an absolute measure of dispersion. The
corresponding relative measure is known as the coefficient of variation. This measure, developed
by Karl Pearson, is the most commonly used measure of relative variation. It is used in
problems where we want to compare the variability of two or more series. The
series (or group) for which the coefficient of variation is greater is said to be more variable or,
conversely, less consistent, less uniform, less stable or less homogeneous. On the other hand,
the series for which the coefficient of variation is less is said to be less variable or more
consistent, more uniform, more stable or more homogeneous. Coefficient of variation is
denoted by C.V. and is obtained as follows :
Coefficient of variation or C.V. = (σ / X̄) x 100
# From the prices of shares of Company X and Company Y as given below, find out
which is more stable in value :

X : 35    54    52    53    56    58    52    50    51    49
Y : 108   107   105   105   106   107   104   103   104   101

Solution :
In order to find out which share is more stable, we have to compare the coefficients of
variation.

CALCULATION OF COEFFICIENT OF VARIATION
X         x = X - X̄    x^2          Y          y = Y - Ȳ    y^2
35        -16          256          108        +3           9
54        +3           9            107        +2           4
52        +1           1            105        0            0
53        +2           4            105        0            0
56        +5           25           106        +1           1
58        +7           49           107        +2           4
52        +1           1            104        -1           1
50        -1           1            103        -2           4
51        0            0            104        -1           1
49        -2           4            101        -4           16
ΣX=510    Σx=0         Σx^2=350     ΣY=1050    Σy=0         Σy^2=40

Company X :
X̄ = 510/10 = 51
σ = √[(Σx^2)/N] = √[350/10] = 5.916
C.V. = (σ/X̄) x 100 = (5.916/51) x 100 = 11.6
Company Y :
Ȳ = 1050/10 = 105
σ = √[(Σy^2)/N] = √[40/10] = 2
C.V. = (σ/Ȳ) x 100 = (2/105) x 100 = 1.905

Since the coefficient of variation is much lower for the shares of Company Y, they
are more stable than those of Company X.
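
The comparison can be reproduced with a few lines of Python. The function name coefficient_of_variation is an illustrative assumption; statistics.pstdev divides by N, matching the formula used in these notes.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) x 100."""
    return pstdev(values) / mean(values) * 100

prices_x = [35, 54, 52, 53, 56, 58, 52, 50, 51, 49]
prices_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(prices_x)
cv_y = coefficient_of_variation(prices_y)
print(round(cv_x, 1), round(cv_y, 3))

# The series with the smaller C.V. is the more stable one.
more_stable = "Y" if cv_y < cv_x else "X"
print(more_stable)
```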