NOTES ON STATISTICS FOR INTERMEDIATE

1. Introduction: Statistics is defined by various authors in different ways. For example:
i) Statistics may be called the science of counting or the science of averages (A.L. Bowley).
ii) Statistics is the science of estimates and probabilities (Boddington).
iii) Statistics may be defined as the scientific exposition of statistical methods.
iv) Statistics is the collection, compilation, analysis and interpretation of data.

1.2 Usages: Statistics is used in business and commerce, medical science, agricultural
research, economics, geography, meteorology, etc.

1.3 Meteorological Statistics: Statistics as applied to meteorology is called Meteorological
Statistics. Statistical methods are widely used in meteorology. Some of the principal
branches of its application are discussed below:

1. Analysis of climatological data: Every day a vast amount of meteorological data is
collected all over the world. These data should be properly compiled and archived.
Statistical methods help in the efficient compilation of data.

2. Studies on climate change.

3. In weather forecasting.

4. Relation amongst meteorological variables.

5. Forecast verification.

1.4 Frequency distribution:

1.4.1 Variable: A quantity that can vary from one item to another is called a variable or
variate, e.g. height and weight of students, surface pressure at a station, number
of annual rainy days, etc.

Continuous variable: A variable that can take any numerical value within a certain
range is called a continuous variable, e.g. pressure, rainfall, height, weight, etc. An
observation on a continuous variable is obtained by the process of measurement.
Discrete variable: A variable that can take only certain isolated values, and not every
value within a range, is called a discrete variable, e.g. number of rainy days, number of
children, number of telephone calls received in an office, number of defects in a machine,
etc. The value of a discrete variable is usually obtained by the process of counting.
1.4.2 Grouped / Ungrouped data:

When the values of a variable are collected in an arbitrary order, it is difficult to
grasp the significance of the data. Such data are called ungrouped (raw) data. It is
preferable to arrange them in arrays.
We divide the range of the variate into equally spaced intervals (classes). Then we
go through the data again and count how many values fall in each interval; this count
is the frequency of the interval. The result is grouped data. The counting can be done
using a simple method known as the tally method.
The classes and their frequencies are then tabulated together. The arrangement that
results is called a frequency distribution.
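
The tallying step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the rainfall figures and the choice of three classes of width 10 are assumptions made for the example, and each class is taken to include its lower limit but not its upper limit.

```python
from collections import Counter


def frequency_distribution(values, start, width, n_classes):
    """Tally values into equally spaced classes [lower, upper).

    Returns a list of (lower, upper, frequency) rows.
    """
    counts = Counter()
    for v in values:
        index = int((v - start) // width)  # which class the value falls in
        if 0 <= index < n_classes:         # ignore values outside the table
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_classes)
    ]


# Hypothetical raw (ungrouped) data, e.g. rainfall amounts in mm
rainfall = [12, 25, 31, 18, 22, 20, 35, 10, 29, 27]

table = frequency_distribution(rainfall, start=10, width=10, n_classes=3)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

The frequencies in the table add up to the number of observations, which is a quick check that no value was missed during tallying.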
1.4.3 Exclusive Method: Let us consider two intervals, say 10-20 and 20-30. Here the
convention is that the lower limit of the class (i.e. 10 for the first class and 20 for
the second class) is included and the upper limit is excluded. That is, values from 10 up
to but not including 20 are in the first class, and values from 20 up to but not including
30 are in the second class.

1.4.4 Inclusive Method: Let us consider two intervals, say 10-19 and 20-29. Here the
convention is that both limits, i.e. the lower limit and the upper limit of the class, are
included in the class. This means that the values from 10 to 19 are in the first class and
the values from 20 to 29 are in the second class.
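
The difference between the two conventions shows up only at the class limits. The following Python sketch (function names and test values are assumptions chosen for illustration) assigns a value to a class under each convention.

```python
def exclusive_class(value, classes):
    """Exclusive method: lower limit included, upper limit excluded."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    return None  # value lies outside every class


def inclusive_class(value, classes):
    """Inclusive method: both limits belong to the class."""
    for lower, upper in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None  # value lies outside every class


exclusive = [(10, 20), (20, 30)]
inclusive = [(10, 19), (20, 29)]

print(exclusive_class(20, exclusive))    # boundary value goes to 20-30
print(inclusive_class(19, inclusive))    # upper limit stays in 10-19
print(inclusive_class(19.5, inclusive))  # falls in the gap between classes
```

The last call returns None because 19.5 lies between the classes 10-19 and 20-29. This suggests that the inclusive method fits data recorded as whole numbers, while the exclusive method is safer for continuous measurements.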

Diagrammatic/Graphical Representation of a Frequency Distribution

Usefulness of diagram: It is often useful to represent a frequency distribution by means
of a suitable diagram. Such an exercise makes unwieldy data intelligible and conveys to
the eye the general run of the observations. Diagrams leave a more lasting impression on
the mind. They make visual comparison of various sets of data easier.
Histogram: To draw the histogram we first mark off along the X axis all the class
intervals on a suitable scale. With the class intervals as bases, rectangles are drawn
with areas proportional to the frequencies of the class intervals. For equal class
intervals the heights of the rectangles will be proportional to the frequencies,
while for unequal class intervals they will be proportional to the ratios of the
frequencies to the widths of the classes.
Thus if $d_1, d_2, \dots, d_n$ are the widths of the classes, $f_1, f_2, \dots, f_n$ are the
corresponding frequencies and $h_1, h_2, \dots, h_n$ are the heights of the rectangles, then
we should have
$f_1 : f_2 : \dots : f_n = h_1 d_1 : h_2 d_2 : \dots : h_n d_n$
The histogram is therefore a two dimensional diagram. However, if all the
class intervals are equal it reduces practically to a one dimensional diagram. The
area covered by all the rectangles represents the total frequency.
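
For unequal class intervals the rule above amounts to taking each height as frequency divided by class width. A small Python sketch, with frequencies and widths invented for the example, shows that the rectangle areas then recover the frequencies.

```python
def histogram_heights(frequencies, widths):
    """Height of each rectangle so that area (height x width) equals frequency."""
    return [f / d for f, d in zip(frequencies, widths)]


# Hypothetical classes of unequal width
frequencies = [10, 30, 20]
widths = [5, 10, 20]

heights = histogram_heights(frequencies, widths)
areas = [h * d for h, d in zip(heights, widths)]

print(heights)     # heights are NOT proportional to the frequencies here
print(sum(areas))  # total area equals the total frequency
```

Note that the class with the largest frequency need not be the tallest rectangle once the widths differ; only the areas carry the frequencies.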

Frequency Polygon
If the mid points of the upper horizontal sides of the rectangles are joined
by straight lines, we obtain the frequency polygon. When this is done, two
hypothetical classes with zero frequency, one at each end, are to be included. For equal
class intervals, the area under the frequency polygon is the same as the area under the
histogram.
Frequency Curve
If a smooth free hand curve is drawn through the various vertices of the
polygon, we get the frequency curve. Again, the area under the frequency curve
and above the horizontal axis is equal to the total frequency.
Unit 2. Measures of central tendency or Averages
Properties of Arithmetic Mean (Mathematical):

i) The sum of the deviations of the items from the Arithmetic Mean is always zero,
i.e. $\sum (X - \bar{X}) = 0$ (taking sign into consideration).

ii) The sum of the squared deviations of the items from the Arithmetic Mean is minimum.

iii) Since $\bar{X} = \sum X / N$, therefore $N \bar{X} = \sum X$.

iv) If we have the Arithmetic Means of two or more related groups, we can compute the
combined average ($\bar{X}_{12}$) of these groups by applying the following formula:

$\bar{X}_{12} = (N_1 \bar{X}_1 + N_2 \bar{X}_2) / (N_1 + N_2)$

where $N_1$ and $N_2$ are the numbers of items in group 1 and group 2, and
$\bar{X}_1$ and $\bar{X}_2$ are the Arithmetic Means of group 1 and group 2.

v) If the given observations on X are changed to observations on Y = a + bX, then

$\bar{Y} = a + b \bar{X}$
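
The five properties can be checked numerically. The Python sketch below uses a small invented data set; the variable names and the constants a = 1, b = 2 are assumptions made only for the illustration.

```python
from statistics import mean

x = [2, 4, 6, 8, 10]  # hypothetical observations
x_bar = mean(x)

# i) Deviations from the mean, taken with sign, add up to zero
sum_dev = sum(v - x_bar for v in x)


# ii) The sum of squared deviations is smallest when taken about the mean
def sum_sq(values, about):
    return sum((v - about) ** 2 for v in values)


# iii) N times the mean gives back the total
total = len(x) * x_bar

# iv) Combined mean of two groups from their sizes and means
group1, group2 = [2, 4, 6], [8, 10]
n1, n2 = len(group1), len(group2)
combined = (n1 * mean(group1) + n2 * mean(group2)) / (n1 + n2)

# v) Linear change of variable Y = a + bX
a, b = 1, 2
y = [a + b * v for v in x]

print(sum_dev, total, combined, mean(y))
print(sum_sq(x, x_bar), sum_sq(x, 5), sum_sq(x, 7))
```

Here the two groups together make up the full data set, so the combined mean agrees with the mean of x computed directly.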

Median: The median is the middle value in the distribution. It is also defined as that
value of the variate for which the frequency above it and the frequency below it are each
half the total frequency.
For ungrouped observations, the data are first arranged in ascending or descending
order.
The median is then the middle item or the Arithmetic Mean of the two middle items.

If $X_1, X_2, \dots, X_n$ are n items arranged in order, the median is the size of the
[(n+1)/2]th item if n is odd, and the Arithmetic Mean of the (n/2)th and [(n/2)+1]th items
if n is even.
For a discrete series the median can also be located in the same way, even though in
certain series it may fall between two observed values.
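
The rule for odd and even n can be written directly in Python. This is a minimal sketch; the function name and the sample values are assumptions for the example.

```python
def median(items):
    """Median of ungrouped data using the (n+1)/2 rule."""
    ordered = sorted(items)  # arrange in ascending order first
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the [(n+1)/2]th item (lists are 0-indexed, hence the -1)
        return ordered[(n + 1) // 2 - 1]
    # even n: mean of the (n/2)th and [(n/2)+1]th items
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([7, 3, 9, 1, 5]))    # odd number of items
print(median([7, 3, 9, 1]))       # even number of items
print(median([1, 3, 5, 7, 900]))  # an extreme item does not move the median
```

In the even case the result (5.0 for the items 1, 3, 7, 9) is not one of the observations, which is what is meant by the median falling between two values.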
Merits:
i) It is rigidly defined for a continuous series.
ii) It is easy to compute, understand and interpret.
iii) It may not require measurement of all the items.
iv) It is not unduly affected by extreme items.
v) It can easily be located graphically.
vi) It can be computed for an open end class frequency distribution.
vii) It is the most appropriate average for qualitative variables and also for highly
skewed distributions.

Demerits:
i) It is not based on each and every item.
ii) It does not lend itself to further algebraic treatment.
iii) It does not have sampling stability.
iv) It is not always determinate or meaningful for discrete distributions.
v) Its location requires arranging the data.

Mode: The mode or modal value is that value in a series of observations which occurs
with the greatest frequency.
Merits:
i) It is easy to compute, understand and interpret.
ii) It is not affected by extreme items.
iii) It can be located by mere inspection.
iv) It can easily be determined graphically.
v) It can be used to describe qualitative phenomena.
vi) It can be determined for an open end class distribution.
vii) It is the most suitable average for discrete distributions.

Demerits:
i) It is ill defined. Some distributions are bi- or multi-modal.
ii) It requires and depends on grouping of data.
iii) It is not based on each and every item.
iv) It is not capable of algebraic treatment.
v) It does not have sampling stability.

Empirical Relation: Mode = 3 Median - 2 Mean
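
A short Python sketch of both ideas follows. The function names and the numbers are assumptions for illustration. The empirical relation is only an approximation, generally used for moderately skewed distributions, so it should not be expected to reproduce the observed mode exactly.

```python
from collections import Counter


def modes(items):
    """All values that occur with the greatest frequency (may be more than one)."""
    counts = Counter(items)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def empirical_mode(mean, median):
    """Mode estimated from the empirical relation Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


print(modes([2, 3, 3, 4, 3, 5, 4]))  # a single mode
print(modes([1, 1, 2, 2, 3]))        # bimodal series: the mode is ill defined
print(empirical_mode(mean=30, median=32))
```

The second call returns two values, which illustrates the first demerit listed above.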


Geometric Mean is defined as the nth root of the product of n items.
$\text{G.M.} = (X_1 \times X_2 \times \dots \times X_n)^{1/n}$
$\log \text{G.M.} = \sum \log X / n$
$\text{G.M.} = \text{Antilog}\left(\sum \log X / n\right)$
Harmonic Mean is defined as the reciprocal of the Arithmetic Mean of the reciprocals
of the individual observations.
$\text{H.M.} = n / (1/x_1 + 1/x_2 + \dots + 1/x_n) = n / \sum (1/x)$
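
Both formulas translate directly into Python. This sketch assumes all items are positive (logarithms and reciprocals are otherwise undefined); the function names and the two sample values are chosen only for illustration.

```python
import math


def geometric_mean(items):
    """G.M. = Antilog(sum(log X) / n), using base-10 logarithms."""
    n = len(items)
    log_mean = sum(math.log10(x) for x in items) / n
    return 10 ** log_mean  # antilog


def harmonic_mean(items):
    """H.M. = n / sum(1/x)."""
    n = len(items)
    return n / sum(1 / x for x in items)


data = [2, 8]  # hypothetical positive observations
print(geometric_mean(data))  # square root of 2 x 8, up to rounding
print(harmonic_mean(data))   # 2 / (1/2 + 1/8)
```

The logarithmic form is the one normally used in hand computation, since it avoids forming a very large product.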
Quartiles: The values which divide the given data into four equal parts are known as
quartiles.
Q1 = 1st quartile; 25% of the items lie below it
Q2 = 2nd quartile; 50% of the items lie below it (Median)
Q3 = 3rd quartile; 75% of the items lie below it
Deciles: Deciles are the values which divide the given data into ten equal parts.
There are therefore nine deciles, i.e. D1, D2, ..., D9.
Percentiles: Percentiles are the values which divide the given data into a hundred
equal parts. There are therefore ninety nine percentiles, i.e. P1, P2, ..., P99.
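
Python's standard library offers `statistics.quantiles` (Python 3.8 and later), which returns the n - 1 cut points that divide the data into n equal parts. The sketch below uses invented data; note that several conventions exist for locating quartiles in small samples, so other textbooks or tools may give slightly different values from the library's default method.

```python
import statistics

# Hypothetical data: number of rainy days in 11 seasons
rainy_days = [3, 7, 8, 5, 12, 14, 21, 13, 18, 10, 16]

quartiles = statistics.quantiles(rainy_days, n=4)      # Q1, Q2, Q3
deciles = statistics.quantiles(rainy_days, n=10)       # D1 ... D9
percentiles = statistics.quantiles(rainy_days, n=100)  # P1 ... P99

print(quartiles)
print(len(deciles), len(percentiles))
print(quartiles[1] == statistics.median(rainy_days))  # Q2 is the median
```

With 11 items the default method picks the 3rd, 6th and 9th ordered items as Q1, Q2 and Q3.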
Unit 3. MEASURES OF DISPERSION

Let us consider the data of the Maximum and Minimum temperatures of three stations
A, B and C for a given day.

Station    Max. (Deg. C)    Min. (Deg. C)    Mean (Deg. C)
A          40               20               30
B          35               25               30
C          32               28               30          ...........(1.1)

The mean of the Max. and Min. temperatures is generally taken as the mean temperature
of the day. We see that for A, B and C the mean temperature is the same, viz. 30 Deg. C.
However, the extreme temperatures show considerable variation. This feature has not
been brought out by the Arithmetic Mean. This shows that the measures of central tendency
are insufficient on their own and should be supported by other measures.

Definition: The property which denotes the extent to which the values are dispersed
about the central value is termed DISPERSION or VARIATION, and the
measures of this property are the MEASURES OF DISPERSION.
In other words, the measurement of the scatter of the mass of figures in a series about
an average is called a Measure of Variation or Dispersion.
These measures determine the reliability of an average, serve as a basis for the control
of variability and facilitate the use of other statistical measures.
The measures of dispersion which are frequently used in Statistics are Range,
Quartile Deviation, Mean Deviation and Standard Deviation.
RANGE

This is the simplest possible measure of dispersion and is the difference between the
largest and the smallest items of the data set.
Thus, RANGE = L - S
where L is the largest item and S is the smallest item.
Range is based on just two items and does not depend upon the items in between.
However, there are several variables for which the intermediate values are not as
important as the extreme values. Surface temperature is a fine example: the maximum
and minimum temperatures are the most important observations. Relative humidity is
another example. For such variables, range is used as a measure of dispersion.
The ranges of temperature for places A, B and C (given in 1.1) are 20, 10 and 4 Deg. C
respectively. Thus we see that though the AM is 30 Deg. C for all, the dispersion is
different.
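
The comparison for the three stations can be reproduced with a few lines of Python. The dictionary layout and variable names are assumptions for the sketch; the temperatures are those given in (1.1).

```python
# (maximum, minimum) temperature in Deg. C for each station, from (1.1)
stations = {"A": (40, 20), "B": (35, 25), "C": (32, 28)}

# Mean temperature of the day: mean of maximum and minimum
means = {name: (t_max + t_min) / 2 for name, (t_max, t_min) in stations.items()}

# RANGE = L - S
ranges = {name: t_max - t_min for name, (t_max, t_min) in stations.items()}

for name in stations:
    print(f"{name}: mean = {means[name]} Deg. C, range = {ranges[name]} Deg. C")
```

All three means are equal while the ranges differ, which is exactly the information the average alone does not convey.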
QUARTILE DEVIATION

$\text{QD} = (Q_3 - Q_1)/2$, where $Q_1$ and $Q_3$ are the first and the third quartiles
respectively. This is also called the semi inter quartile range.
The QD is a better measure of dispersion than the range. One of its major demerits is
that it ignores nearly 50% of the observations, i.e. those below Q1 and above Q3. It does
not depend on how the items are distributed in those ranges.
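
The following Python sketch computes the QD and shows the demerit mentioned above: changing an item that lies above Q3 leaves the QD unchanged. The data are invented, and the quartiles are located with the default method of `statistics.quantiles` (Python 3.8 and later); other quartile conventions may give slightly different figures.

```python
import statistics


def quartile_deviation(items):
    """QD = (Q3 - Q1) / 2, the semi inter quartile range."""
    q1, _, q3 = statistics.quantiles(items, n=4)
    return (q3 - q1) / 2


# Hypothetical observations
data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 10, 16]

# Same data with the largest item made ten times bigger
data_extreme = [3, 7, 8, 5, 12, 14, 210, 13, 18, 10, 16]

print(quartile_deviation(data))
print(quartile_deviation(data_extreme))  # unchanged: items above Q3 are ignored
print(max(data) - min(data), max(data_extreme) - min(data_extreme))  # range changes
```

The range, by contrast, jumps from 18 to 207 for the same change, which is one reason the QD is regarded as the better of the two measures.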

MEAN DEVIATION

The range and QD, though lucid measures of dispersion, do not exactly conform to the
definition, inasmuch as they do not measure the scatter of the individual items. The mean
or average deviation could be considered a better measure in this respect. The M.D. is
generally defined with reference to a central number, preferably (not necessarily) an
average. If A is such a number, then for n items $x_1, x_2, x_3, \dots, x_n$ the MD from A
is defined as $\text{MD}(A) = \frac{1}{n} \sum |x - A|$.
The MD(A) is the AM of the absolute values of the deviations of the individual items
measured from A. Hence the name Average Deviation for it. When the MD is computed,
the average or central number used should invariably be mentioned to avoid any ambiguity.
It should be noted that the mean deviation is minimum when the deviations are
taken from the Median. For this reason the mean deviation is most efficient when it is
measured from the median.
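
A minimal Python sketch of the definition, with an invented data set containing one extreme item, compares the MD taken from the median with the MD taken from the mean. The function and variable names are assumptions for the example.

```python
from statistics import mean, median


def mean_deviation(items, about):
    """MD(A) = (1/n) * sum of |x - A|."""
    return sum(abs(x - about) for x in items) / len(items)


data = [1, 2, 3, 4, 100]  # hypothetical observations

md_from_median = mean_deviation(data, median(data))  # A = 3
md_from_mean = mean_deviation(data, mean(data))      # A = 22

print(md_from_median, md_from_mean)
```

The value from the median is the smaller of the two, in line with the remark above, and the two figures differ enough to show why the central number used must always be stated.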
