Statistics (MA in Economics)

Coefficient of Correlation
Population Correlation Coefficient:
1. The measure of joint or mutual variation in a bivariate population with two variables x and y is called the ‘covariance of x and y’:
σxy = Cov(x, y) = Σ(x – μx)(y – μy) / N
2. To make comparison possible, the covariance must be standardised by dividing (x – μx) and (y – μy) by their SDs σx and σy respectively. The resulting expression is called the ‘coefficient of correlation’; the ‘population coefficient of correlation’ is denoted by ‘ρ’ (rho):
ρ = Σ[(x – μx)/σx][(y – μy)/σy] / N = σxy / (σx ∙ σy)
Sample Correlation Coefficient:
1. The sample covariance of x and y, Sxy, measures the tendency for x and y to increase or decrease together in the sample:
Sxy = Σ(x – x̄)(y – ȳ) / (n – 1)
2. The ‘sample coefficient of correlation’ is denoted by ‘r’. It is also known as ‘Karl Pearson’s product moment coefficient of correlation’:
r = Sxy / (Sx ∙ Sy) = Σ(x – x̄)(y – ȳ) / √[Σ(x – x̄)² ∙ Σ(y – ȳ)²]
The coefficient of correlation always lies between –1 and +1, i.e., –1 ≤ r ≤ +1.
3. (a) If r = –1, all the points on the scatter diagram lie on the regression line of negative slope. It is called a ‘perfect
negative correlation’.
(b) If r = 1, all the points on the scatter diagram lie on the regression line of positive slope. It is called a ‘perfect
positive correlation’.
(c) If r = 0, the points on the scatter diagram are spread throughout the diagram without any linear trend, indicating no linear correlation between x and y.
“Correlation coefficient is a measure of the closeness of linear relationship between the two variables.”
Correlation Coefficient and Regression Coefficient:
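1. The two regression coefficients, b (regression of y on x) and d (regression of x on y), can also be stated as follows:
b = Sxy / Sx² and d = Sxy / Sy²
2. Since r = Sxy / (Sx ∙ Sy), therefore Sxy = r ∙ Sx ∙ Sy.
3. The regression coefficients b and d are related to the correlation coefficient r by:
b = r ∙ Sx ∙ Sy / Sx² or b = r ∙ Sy / Sx
d = r ∙ Sx ∙ Sy / Sy² or d = r ∙ Sx / Sy
where Sx and Sy are the standard deviations of x and y.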
Properties of Coefficient of Correlation:
1. The correlation coefficient is symmetrical with respect to x and y, i.e., rxy = ryx.
2. The correlation coefficient is the geometric mean of the two regression coefficients, i.e., r = ±√(b × d), where r takes the common sign of b and d.
3. The correlation coefficient is a pure number and does not depend upon the units employed. For example, if the correlation coefficient between the heights and weights of students is computed as 0.98, it is expressed simply as 0.98 (neither as 0.98 inches nor as 0.98 pounds).
4. The correlation coefficient is independent of change of origin and scale. By this we mean that if we take deviations of x and y from some suitable origins, or transform x and y into u = (x – a)/h and v = (y – c)/k (with h and k both positive), the correlation coefficient is not affected. Symbolically:
rxy = ruv
5. The correlation coefficient lies between –1 and +1, i.e., it cannot be less than –1 or greater than +1:
–1 ≤ r ≤ +1
Example:
x 3 1 1 2 4 2 3 5 2 3
y 2 4 3 2 1 2 1 3 2 1
Required:
(a) Covariance of x and y,
(b) Standard deviation of x and y,
(c) Coefficient of correlation, and
(d) Scatter diagram.
Solution:
(a) Covariance of x and y:
x y x – μx y – μy (x – μx)(y – μy) x² y²
3 2 0.4 –0.1 –0.04 9 4
1 4 –1.6 1.9 –3.04 1 16
1 3 –1.6 0.9 –1.44 1 9
2 2 –0.6 –0.1 0.06 4 4
4 1 1.4 –1.1 –1.54 16 1
2 2 –0.6 –0.1 0.06 4 4
3 1 0.4 –1.1 –0.44 9 1
5 3 2.4 0.9 2.16 25 9
2 2 –0.6 –0.1 0.06 4 4
3 1 0.4 –1.1 –0.44 9 1
Total 26 21 0 0 –4.6 82 53
μx = 26 / 10 = 2.6, μy = 21 / 10 = 2.1
σxy = Σ(x – μx)(y – μy) / N = –4.6 / 10 = –0.46
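(b) Standard deviation of x and y:
σx = √(Σx²/N – μx²) = √(82/10 – 2.6²) = √1.44 = 1.2
σy = √(Σy²/N – μy²) = √(53/10 – 2.1²) = √0.89 = 0.943
(c) Coefficient of correlation:
ρ = σxy / (σx ∙ σy) = –0.46 / (1.2 × 0.943) = –0.41
(d) Scatter diagram: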

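The population formulas above can be checked numerically. The Python sketch below is a minimal illustration. It assumes, as the μ notation in the table suggests, that the ten pairs are treated as a population with divisor N. The helper name `population_stats` is hypothetical.

```python
from math import sqrt

# Pairs from the first example, treated as a population (divisor N),
# as the μ notation in the worked table suggests.
x = [3, 1, 1, 2, 4, 2, 3, 5, 2, 3]
y = [2, 4, 3, 2, 1, 2, 1, 3, 2, 1]


def population_stats(xs, ys):
    """Return (covariance, sd_x, sd_y, rho) computed with divisor N."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    # Shortcut form used in the table: sigma = sqrt(Σx²/N - mean²)
    sx = sqrt(sum(a * a for a in xs) / n - mx ** 2)
    sy = sqrt(sum(b * b for b in ys) / n - my ** 2)
    return cov, sx, sy, cov / (sx * sy)


cov, sx, sy, rho = population_stats(x, y)
print(f"cov = {cov:.2f}, sx = {sx:.3f}, sy = {sy:.3f}, rho = {rho:.2f}")
```

Using n – 1 instead of N would change the covariance and the SDs, but it would not change ρ, because the divisor cancels in the ratio.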
Example:
Calculate:
(a) Covariance of x and y,
(b) Variances of x and y,
(c) Coefficient of correlation, and
(d) Coefficient of determination.
For the following sample data:
x 1 2 4 6 8 10 14 15 18 20
y 10 20 30 40 50 60 70 80 90 100
Solution:
(a) Covariance of x and y:
x y x – x̄ y – ȳ (x – x̄)(y – ȳ) (x – x̄)² (y – ȳ)²
1 10 –8.8 –45 396 77.44 2025
2 20 –7.8 –35 273 60.84 1225
4 30 –5.8 –25 145 33.64 625
6 40 –3.8 –15 57 14.44 225
8 50 –1.8 –5 9 3.24 25
10 60 0.2 5 1 0.04 25
14 70 4.2 15 63 17.64 225
15 80 5.2 25 130 27.04 625
18 90 8.2 35 287 67.24 1225
20 100 10.2 45 459 104.04 2025
Total 98 550 0 0 1820 405.6 8250
x̄ = 98 / 10 = 9.8, ȳ = 550 / 10 = 55
Sxy = Σ(x – x̄)(y – ȳ) / (n – 1) = 1820 / 9 = 202.22
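(b) Variances of x and y:
Sx² = Σ(x – x̄)² / (n – 1) = 405.6 / 9 = 45.07
Sy² = Σ(y – ȳ)² / (n – 1) = 8250 / 9 = 916.67
(c) Coefficient of correlation:
r = Σ(x – x̄)(y – ȳ) / √[Σ(x – x̄)² ∙ Σ(y – ȳ)²] = 1820 / √(405.6 × 8250) = 0.9949
(d) Coefficient of determination:
b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² = 1820 / 405.6 = 4.48718
d = Σ(x – x̄)(y – ȳ) / Σ(y – ȳ)² = 1820 / 8250 = 0.22061
r² = b × d
r² = 4.48718 × 0.22061 = 0.9899 = 98.99%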
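The divisors cancel, so r, b and d can be computed from the column totals Σ(x – x̄)(y – ȳ), Σ(x – x̄)² and Σ(y – ȳ)² alone. The sketch below uses plain Python, with variable names chosen for illustration. It recomputes these quantities and confirms that r² equals b × d.

```python
from math import sqrt

# Sample data from the second example
x = [1, 2, 4, 6, 8, 10, 14, 15, 18, 20]
y = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # Σ(x – x̄)(y – ȳ)
sxx = sum((a - mx) ** 2 for a in x)                    # Σ(x – x̄)²
syy = sum((b - my) ** 2 for b in y)                    # Σ(y – ȳ)²

r = sxy / sqrt(sxx * syy)
b = sxy / sxx   # regression coefficient of y on x
d = sxy / syy   # regression coefficient of x on y

print(f"r = {r:.4f}, b = {b:.5f}, d = {d:.5f}")
print(f"r^2 = {r * r:.4f}, b*d = {b * d:.4f}")
```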
Probable Error:
1. The probable error (P.E.) is about two-thirds of the standard error (S.E.):
P.E. = 0.6745 × S.E.
2. Assuming ρ = 0, the sampling distribution of r has standard error:
3. In a standard normal distribution, the interval from z = –0.6745 to z = +0.6745 contains 50% of the area under the curve; symbolically:
P(–0.6745 ≤ z ≤ 0.6745) = 0.5
4. Thus, the probable error of r is:
P.E. = 0.6745 × σr
or
P.E. = 0.6745 ×
5. Probabilities for r can now be calculated using P.E. as a unit of deviation:
P(–P.E. ≤ r ≤ P.E.) = 0.5
P(–3 P.E. ≤ r ≤ 3 P.E.) ≈ 0.957
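The constant 0.6745 and the probabilities in point 5 can be reproduced from the standard normal distribution. This sketch uses Python's `statistics.NormalDist` (Python 3.8+). Following point 5, it assumes that r is approximately normal about zero, so that k probable errors correspond to k × 0.6745 standard errors. `prob_within` is a hypothetical helper.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution (mean 0, SD 1)

# z-value with 25% of the area in each tail: ±z covers the middle 50%
pe_factor = z.inv_cdf(0.75)
print(round(pe_factor, 4))


def prob_within(k):
    """P(-k·P.E. <= r <= k·P.E.), assuming r is normal about 0 and P.E. = 0.6745·σr."""
    limit = k * 0.6745  # k probable errors expressed in SD units
    return z.cdf(limit) - z.cdf(-limit)


print(round(prob_within(1), 3))  # one P.E. on either side
print(round(prob_within(3), 3))  # three P.E. on either side
```

Three probable errors amount to about 2.02 standard errors. That is why the second probability is close to the two-sigma figure of 0.9545 but slightly above it.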
Rank Correlation:
1. If observations on two variables are given in the form of ranks rather than some numerical measurements, it is
possible to compute a coefficient of correlation between ranks of the two variables. This correlation coefficient
is called ‘Rank Correlation Coefficient’.
2. As this formula was presented by Spearman in 1904, it is also known as ‘Spearman’s Rank Correlation Coefficient’:
rs = 1 – 6Σdi² / [n(n² – 1)]
where di = xi – yi (the difference between the rankings) and n is the number of pairs of ranks.
3. To test the hypothesis that there is no correlation between the two rankings, critical values of rs at α = 0.05 are given below:
Number of ranks (n) Critical value (rs)
5 1.0
6 0.89
7 0.79
8 0.74
9 0.70
10 0.65
20 0.45
25 0.40
50 0.28
Example:
Ranks of 9 students in a class in History (x) and Geography (y) are as follows:
Students I II III IV V VI VII VIII IX
x 1 9 7 4 5 3 8 2 6
y 4 5 6 3 7 2 8 1 9
Calculate Spearman’s Rank Correlation Coefficient and test its significance.
Solution:
Students x y d = x – y d²
I 1 4 –3 9
II 9 5 4 16
III 7 6 1 1
IV 4 3 1 1
V 5 7 –2 4
VI 3 2 1 1
VII 8 8 0 0
VIII 2 1 1 1
IX 6 9 –3 9
Total 45 45 0 42
rs = 1 – 6Σd² / [n(n² – 1)] = 1 – (6 × 42) / [9 × (81 – 1)] = 1 – 252 / 720 = 0.65
The critical value of rs for n = 9 and α = 0.05 is 0.70.
Since 0.65 is less than the critical value of 0.70, rs is not significant, i.e., there is no evidence of correlation between the two rankings at the 5% level.
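The rank formula is easy to script. The sketch below assumes untied ranks, because the formula 1 – 6Σd²/[n(n² – 1)] is exact only in that case. The function name `spearman_rs` is illustrative.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rs = 1 - 6Σd² / [n(n² - 1)] for untied ranks."""
    if len(rank_x) != len(rank_y):
        raise ValueError("rankings must have the same length")
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # Σd²
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


history = [1, 9, 7, 4, 5, 3, 8, 2, 6]
geography = [4, 5, 6, 3, 7, 2, 8, 1, 9]
rs = spearman_rs(history, geography)
print(round(rs, 2))
```

The result is then compared with the tabulated critical value for n = 9.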
Graphical Presentation I
Types of Graphs:
(a) Histogram
(b) Frequency Polygon
(c) Relative Frequency Histogram and Polygon
(d) Cumulative Frequency Polygon or Ogive
(e) Frequency Curves and Smoothed Ogive
(a) Histogram:
1. A histogram consists of a set of adjacent rectangles having bases along x-axis (marked off
by class boundaries) and areas proportional to class frequencies.
2. To adjust the heights of rectangles in a frequency distribution with unequal class interval
sizes, each class frequency is divided by its class interval size.
Class boundaries Frequency
109.5-119.5 1
119.5-129.5 4
129.5-139.5 17
139.5-149.5 28
149.5-159.5 25
159.5-169.5 18
169.5-179.5 13
179.5-189.5 6
189.5-199.5 5
199.5-209.5 2
209.5-219.5 1
Σf 120
Class Interval f Class Boundaries Class Size Adjusted Frequency
10-11 4 9.5-11.5 2 4 / 2 = 2
12-14 12 11.5-14.5 3 12 / 3 = 4
15-19 25 14.5-19.5 5 25 / 5 = 5
20-29 60 19.5-29.5 10 60 / 10 = 6
30-34 25 29.5-34.5 5 25 / 5 = 5
35-39 15 34.5-39.5 5 15 / 5 = 3
40-42 6 39.5-42.5 3 6 / 3 = 2
Total 147
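The adjustment rule in point 2 above takes only a few lines of code: each frequency is divided by the width of its class, taken from the class boundaries. Here is a minimal Python sketch using the table's figures:

```python
# (class interval, frequency, lower boundary, upper boundary), copied from the table above
classes = [
    ("10-11", 4, 9.5, 11.5),
    ("12-14", 12, 11.5, 14.5),
    ("15-19", 25, 14.5, 19.5),
    ("20-29", 60, 19.5, 29.5),
    ("30-34", 25, 29.5, 34.5),
    ("35-39", 15, 34.5, 39.5),
    ("40-42", 6, 39.5, 42.5),
]

adjusted = []
for label, f, lcb, ucb in classes:
    size = ucb - lcb            # class interval size from the boundaries
    adjusted.append(f / size)   # height of the histogram rectangle
    print(f"{label:>6} f = {f:>3} size = {size:g} adjusted = {f / size:g}")
```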
(b) Frequency Polygon:
1. It is constructed by plotting the class frequencies against their corresponding class marks
(mid-points) and then joining the resulting points by means of straight lines.
2. The ends of the graph so drawn do not meet the x-axis. Since a polygon is a closed many-sided figure, an extra class with zero frequency is added at each end of the frequency distribution so that the graph closes on the x-axis.
3. The frequency polygon can also be obtained by joining the mid-points of the tops of
rectangles of histogram.
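(c) Relative Frequency Histogram and Polygon: These are constructed in the same way as described above, except that relative frequencies (class frequency ÷ total frequency) are plotted instead of frequencies.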
(d) Cumulative Frequency Polygon or Ogive:
1. The graph showing the cumulative frequencies plotted against the upper class boundaries
is called a ‘cumulative frequency polygon’ or ‘ogive’.
2. The graphs corresponding to less-than and more-than cumulative frequency distributions are called ‘less-than’ and ‘more-than’ ogives respectively.
Class Boundaries Frequency Less-than Cumulative Frequency More-than Cumulative Frequency
109.5-119.5 1 1 119
119.5-129.5 4 5 115
129.5-139.5 17 22 98
139.5-149.5 28 50 70
149.5-159.5 25 75 45
159.5-169.5 18 93 27
169.5-179.5 13 106 14
179.5-189.5 6 112 8
189.5-199.5 5 117 3
199.5-209.5 2 119 1
209.5-219.5 1 120 0
Σf 120
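Both cumulative columns follow from a running total of the frequencies. This sketch uses Python's `itertools.accumulate`. The more-than column is computed as the total minus the less-than count, which matches the table's convention of counting the values above each upper class boundary.

```python
from itertools import accumulate

upper_boundaries = [119.5, 129.5, 139.5, 149.5, 159.5, 169.5,
                    179.5, 189.5, 199.5, 209.5, 219.5]
freq = [1, 4, 17, 28, 25, 18, 13, 6, 5, 2, 1]

total = sum(freq)
less_than = list(accumulate(freq))          # values below each upper boundary
more_than = [total - c for c in less_than]  # values above each upper boundary

for ucb, lt, mt in zip(upper_boundaries, less_than, more_than):
    print(ucb, lt, mt)
```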
(e) Frequency Curves and Smoothed Ogives:

Types of Frequency Distribution and Curves:
(a) Symmetrical Distribution,
(b) Moderately Skewed or Asymmetrical Distribution,
(c) Extremely Skewed or J-Shaped Distribution,
(d) U-Shaped Distribution, and
(e) Multi-Modal Distribution.
(a) Symmetrical Distribution: A frequency distribution is said to be symmetrical if the frequencies equidistant from the maximum are equal.
Class interval 0-9 10-19 20-29 30-39 40-49 50-59 60-69
Frequency 2 5 9 12 9 5 2
(b) Moderately Skewed or Asymmetrical Distribution: A frequency distribution is said to be skewed when it departs from symmetry, i.e., when the frequencies tend to pile up at one end or the other of the distribution.
Asymmetrical distributions are of two types, i.e.:
(i) Positively skewed, and
(ii) Negatively skewed.
(i) Positively Skewed:
Class interval 0-9 10-19 20-29 30-39 40-49 50-59 60-69
Frequency 2 5 12 9 7 4 1
(ii) Negatively Skewed:
Class interval 0-9 10-19 20-29 30-39 40-49 50-59 60-69
Frequency 1 4 7 9 12 5 2
(c) Extremely Skewed or J-Shaped Distribution:
Income 0-1999 2000-3999 4000-5999 6000-7999 8000-9999 10000-11999 12000-13999
No. of persons 4000 3000 2500 1500 500 350 150
(d) U-Shaped Distribution: In such a distribution, the maximum frequencies occur at both ends and the minimum in the centre.
Class interval 1-5 6-10 11-15 16-20 21-25 26-30
Frequency 45 30 18 12 24 40
(e) Multi-Modal Distribution:
1. Frequency distributions with more than one maximum are called ‘multi-modal distributions’.
2. A distribution with two maxima is called a ‘bimodal distribution’.
Types of Charts:
(a) Simple Bar Chart,
(b) Multiple Bar Chart,
(c) Component Bar Chart,
(d) Percentage Component Bar Chart, and
(e) Pie Chart.
(a) Simple Bar Chart:
1. A simple bar chart consists of vertical or horizontal bars of equal width.
2. The lengths of the bars are proportional to the magnitudes of the values represented. The width of the bars has no significance.
3. Vertical bars are used to represent quantitative or chronological data, whereas horizontal bars are used for qualitative or geographical data.
4. If the data do not relate to time, they should be arranged in ascending or descending order of magnitude.
Exports of Pakistan (in US $ million)
Year Exports
1948 138
1951 406
1961 378
1971 683
1981 2958
1991 6168
2001 9202
2005 14410
(b) Multiple Bar Chart:
1. A multiple bar chart is an extension of the simple bar chart.
2. Grouped bars are used to represent related sets of data. For example, the imports and exports of a country can be shown together in a multiple bar chart.
3. Each bar in a group is shaded or coloured differently for the sake of distinction.
Years Imports Rs. (billion) Exports Rs. (billion)
1982-83 68.15 34.44
1983-84 76.71 37.33
1984-85 89.78 37.98
1985-86 90.95 49.59
1986-87 92.43 63.35
1987-88 111.38 78.44
(c) Component Bar Chart:
1. This chart consists of bars which are sub-divided into two or more parts.
2. The length of the bars is proportional to the totals.
3. The component bars are shaded or coloured differently.
Current and Development Expenditure – Pakistan (All figures in Rs. Billion)
Years Current Expenditure Development Expenditure Total Expenditure
1988-89 153 48 201
1989-90 166 56 222
1990-91 196 65 261
1991-92 230 91 321
1992-93 272 76 348
1993-94 294 71 365
1994-95 346 82 428
(d) Percentage Component Bar Chart:
1. Component bar charts may also be drawn on percentage basis by expressing the
components as percentages of their respective totals.
2. All the bars are of equal length, representing 100%. These bars are sub-divided into component bars in proportion to the percentages of their components.
Areas Under Crop Production (1985-90)
(‘000 hectares)
Year Wheat Rice Others Total
1985-86 7403 1863 1926 11192
1986-87 7706 2066 1906 11678
1987-88 7308 1963 1612 10883
1988-89 7730 2042 1966 11738
1989-90 7759 2107 1970 11836
Percentage Areas Under Production
Year Wheat Rice Others Total
1985-86 66.2% 16.6% 17.2% 100%
1986-87 66.0 17.7 16.3 100
1987-88 67.2 18.0 14.8 100
1988-89 65.9 17.4 16.7 100
1989-90 65.6 17.8 16.6 100
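The percentage table can be derived from the area table by dividing each component by its row total. Here is a minimal Python sketch that rounds to one decimal place, as the table does:

```python
# Area ('000 hectares) under wheat, rice and others, copied from the table above
areas = {
    "1985-86": (7403, 1863, 1926),
    "1986-87": (7706, 2066, 1906),
    "1987-88": (7308, 1963, 1612),
    "1988-89": (7730, 2042, 1966),
    "1989-90": (7759, 2107, 1970),
}


def percentages(parts):
    """Express each component as a percentage of the row total (1 decimal)."""
    total = sum(parts)
    return [round(100 * p / total, 1) for p in parts]


for year, parts in areas.items():
    print(year, sum(parts), percentages(parts))
```

For 1985-86 this gives 66.1% for wheat. The table shows 66.2%, which makes that row add up to exactly 100%.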
(e) Pie Chart:
1. Pie chart is used to compare the relation between the whole and its components.
2. The difference between the component bar chart and the pie chart is that a component bar chart uses the lengths of the bars, while a pie chart uses the areas of the sectors of a circle.
3. In a pie chart, the circle is drawn with a radius proportional to the square root of the quantity to be represented, because the area of a circle is given by πr².
4. The sectors are coloured or shaded differently.
5. To construct a pie chart, we draw a circle with a suitable radius (proportional to the square root of the total). The angle of each sector is calculated as follows:
Angle of each sector = (Component part / Total) × 360°
Development Expenditure (1994-95)
Provinces Development Expenditure (Rs. Million) Angle of Sector (Degrees) Cumulative Angle
Balochistan 4874 56° 56°
N.W.F.P. 7861 91° 147°
Punjab 12954 150° 297°
Sindh 5500 63° 360°
Total 31189 360°
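The sector angles and cumulative angles can be computed as below. This sketch rounds each angle to whole degrees, as the table does. With other data, rounding each angle independently can make the total 359° or 361°, in which case one angle would have to be adjusted so that the total is 360°.

```python
from itertools import accumulate

expenditure = {"Balochistan": 4874, "N.W.F.P.": 7861, "Punjab": 12954, "Sindh": 5500}
total = sum(expenditure.values())

# Angle of each sector = (component / total) × 360, rounded to whole degrees
angles = {name: round(value / total * 360) for name, value in expenditure.items()}
cumulative = list(accumulate(angles.values()))

for (name, angle), cum in zip(angles.items(), cumulative):
    print(f"{name:<12}{angle:>5}{cum:>6}")
```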
Index Numbers I
Introduction:
1. An index number is a device which shows by its variations the change in a magnitude which is not
capable of accurate measurement in itself or of direct valuation over time.
2. To measure changes in such a situation, we combine the prices or quantities and find a single number. This single number, which shows the overall change in a phenomenon, is known as an ‘Index Number’.
3. It is used to compare changes in a complex phenomenon like the cost of living, total industrial
production, wages, etc.
4. It is very useful in measuring changes in prices and quantities of commodities with different measuring
units, for example, wheat per maund, cloth per yard, etc., which cannot be compared directly.
Types of Index Number:
(a) Price Index Number: It compares changes in prices from one period to another. The wholesale price index and the cost of living index are examples.
(b) Quantity Index Number: It measures how much the quantity of a variable changes over time. The index of industrial production and the business activity index are examples.
(c) Value Index Number: It measures changes in total monetary worth. It combines price and quantity changes to present a more informative index. The index of GNP and the index of retail sales are examples.
Uses of Index Numbers:
1. An index number is a device for measuring changes in a variable or a group of related variables.
2. It can be used to compare changes in one or more variables in one period with those of others or in one
region with those in the others.
3. The index number of industrial activity enables us to study the progress of industrialisation in the
country.
4. The quantity index numbers show rise or fall in the volume of production, volume of exports and
imports, etc.
5. The cost of living index numbers are, in fact, retail price indices. They show changes in the prices of goods generally consumed by the people. Therefore, they can help the government to formulate a suitable price policy.
6. The cost of living index number can be made a basis for regulation of wage rates and can be used by
industrial and commercial organisations to grant dearness allowance and bonus to their employees in
order to meet the increased cost of living.
7. Index numbers are also used for forecasting business activity and in discovering seasonal fluctuations
and business cycles.
Steps in the Construction of Index Numbers of Prices:
(i) Defining the purpose and scope of the index number, i.e., general-purpose or special-purpose,
(ii) Selecting the commodities to be included,
(iii) Collection of prices, i.e., (a) deciding which prices to use, such as average, retail or wholesale prices; and (b) choosing the sources of price data, such as representative markets, price lists or trade journals,
(iv) Selecting the base period: (a) fixed-base method, and (b) chain-base method,
(v) Choice of average to be used, i.e., AM, median or GM, and
(vi) Selecting suitable weights: (a) implicit weighting, and (b) explicit weighting.
Notations:
Pn = Price in current year
Po = Price in base year
Qn = Quantity in current year
Qo = Quantity in base year
Pon = Price index for the nth year relative to the base year
Qon = Quantity index for the nth year relative to the base year
Construction of Price Index Numbers:
(a) Simple Relatives or Simple Index Numbers,
(b) Unweighted Index Numbers, and
(c) Weighted Index Numbers.
(a) Simple Relatives: are further classified into two categories:
(i) Price Relatives: are obtained by dividing the price in a given year by the base year price and expressing the result as a percentage. Thus:
Price relative = (Pn / Po) × 100
Example:
The prices of sugar for 2001 and 2005 are given below:
Year Price / Kg
2001 11
2005 30
Required:
(a) Taking 2001 as base year, find price relative for 2005.
(b) Taking 2005 as base year, find price relative for 2001.
Solution:
(a) Base year: 2001
Year Price Price Relative (V)
2001 11 100
2005 30 (30 / 11) × 100 = 272.73
(b) Base year: 2005
Year Price Price Relative (V)
2005 30 100
2001 11 (11 / 30) × 100 = 36.67
(ii) Link Relatives: are obtained by dividing the price in a given year by the price in the preceding year and expressing the result as a percentage:
Link relative (L.R.) = (Pn / Pn–1) × 100
Link relatives are not directly comparable; therefore, they are converted to a fixed-base index number. The process of conversion is called the ‘chaining process’, and the index numbers so obtained are called chain indices:
Chain index (C.I.) = (L.R. × C.I. of preceding year) ÷ 100
Example:
The price of rice for the 6 years is as follows:
Year Price / Kg
2000 21
2001 20
2002 20
2003 22
2004 25
2005 28
Required:
Taking 2000 as base year, find price relatives for the years 2001 to 2005.
Solution:
Year Price Link Relative Chain Index (2000 = 100)
2000 21 100 100
2001 20 95.24 95.24
2002 20 100.00 95.24
2003 22 110.00 104.76
2004 25 113.64 119.05
2005 28 112.00 133.33
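The chaining process can be sketched in Python as follows. Each link relative is the ratio of consecutive prices. Each chain index is built from the previous one using C.I. = (L.R. × C.I. of preceding year) ÷ 100. As a check, the last chain index equals the fixed-base relative (28 / 21) × 100.

```python
years = [2000, 2001, 2002, 2003, 2004, 2005]
prices = [21, 20, 20, 22, 25, 28]

# Link relative = price this year / price last year × 100 (base year set to 100)
link_relatives = [100.0] + [p / prev * 100 for prev, p in zip(prices, prices[1:])]

chain = [100.0]
for lr in link_relatives[1:]:
    chain.append(lr * chain[-1] / 100)   # C.I. = L.R. × previous C.I. ÷ 100

for yr, lr, ci in zip(years, link_relatives, chain):
    print(yr, round(lr, 2), round(ci, 2))
```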
(b) Unweighted Index Numbers: There are two methods of constructing this type of index:
(i) Simple Aggregative Method: In this method, the total of the prices of the commodities in a given year is expressed as a percentage of the total of their prices in the base year:
Pon = (ΣPn / ΣPo) × 100
This method has two disadvantages which make it unsatisfactory:
• It does not take into account the relative importance of the various commodities.
• The units in which prices are quoted, e.g., maunds, yards, gallons, etc., affect the value of the index considerably.
Example:
The prices of 3 commodities for the 5 years are as follows:
Prices (per kg)
Commodity 2001 2002 2003 2004 2005
Rice 20 20 22 25 28
Sugar 11 12 14 27 30
Tea 178 176 174 180 180
Required:
Simple aggregative index numbers for the years 2001-05, with 2001 as base year.
Solution:
Prices (per kg)
Commodity 2001 2002 2003 2004 2005
Rice 20 20 22 25 28
Sugar 11 12 14 27 30
Tea 178 176 174 180 180
Total 209 208 210 232 238
Simple Aggregative Index 100 99.52 100.48 111.00 113.88
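Here is a short sketch of the simple aggregative method for this example. The price columns are totalled, and each total is expressed as a percentage of the 2001 total.

```python
prices = {
    "Rice": [20, 20, 22, 25, 28],
    "Sugar": [11, 12, 14, 27, 30],
    "Tea": [178, 176, 174, 180, 180],
}

# Column totals for 2001..2005
totals = [sum(col) for col in zip(*prices.values())]
# Each year's total as a percentage of the base-year (2001) total
index = [round(t / totals[0] * 100, 2) for t in totals]

print(totals)
print(index)
```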
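(ii) Average of Relatives’ Method: In this method, we use an average (mean, median, GM, etc.) of the price relatives or link relatives. Using the arithmetic mean:
Pon = Σ[(Pn / Po) × 100] / k, where k is the number of commodities.
Unlike the simple aggregative method, this index is not affected by the units in which prices are quoted. Its only disadvantage is that it gives equal weight to all commodities.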
Example:
The prices of 3 commodities for the 5 years are as follows:
Prices (per kg)
Commodity 2001 2002 2003 2004 2005
Rice 20 20 22 25 28
Sugar 11 12 14 27 30
Tea 178 176 174 180 180
Required:
Construct price index numbers using average of relatives’ method, taking 2001 as base year.
Solution:
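Price Relatives (2001 = 100)
Commodity 2001 2002 2003 2004 2005
Rice 100 100.00 110.00 125.00 140.00
Sugar 100 109.09 127.27 245.45 272.73
Tea 100 98.88 97.75 101.12 101.12
Total 300 307.97 335.02 471.57 513.85
Mean (Index) 100 102.66 111.67 157.19 171.28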
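The average of relatives can be scripted in the same way. This sketch uses the arithmetic mean, as in the table above, and works with unrounded relatives.

```python
prices = {
    "Rice": [20, 20, 22, 25, 28],
    "Sugar": [11, 12, 14, 27, 30],
    "Tea": [178, 176, 174, 180, 180],
}

# Price relatives with 2001 (the first column) as base = 100
relatives = {name: [p / row[0] * 100 for p in row] for name, row in prices.items()}

# Arithmetic mean of the relatives for each year
mean_index = [sum(col) / len(col) for col in zip(*relatives.values())]

print([round(v, 2) for v in mean_index])
```

With unrounded relatives, the 2003 index comes out as about 111.68 rather than 111.67. The difference arises only because the table adds relatives that were first rounded to two decimals.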
(c) Weighted Index Numbers: This type of index can be further classified into two categories:
(i) Weighted Aggregative Index Numbers: In these index numbers, the quantities produced, sold, bought or consumed during the base year or the current year are used as weights. These weights indicate the importance of each commodity. Some well-known weighted index numbers are given below:*
(1) Laspeyres’ Index: This index uses base year quantities as weights. For this reason, it is also known as the ‘Base Year Weighted Index’:
Pon = (ΣPnQo / ΣPoQo) × 100
Here W = Qo
(2) Paasche’s Index: This index uses current year quantities as weights. For this reason, it is known as the ‘Current Year Weighted Index’:
Pon = (ΣPnQn / ΣPoQn) × 100
Here W = Qn
(3) Fisher’s Ideal Index: This index number is the GM of Laspeyres’ and Paasche’s index numbers. It is called ‘ideal’ because it satisfies two tests (the Time Reversal and Factor Reversal Tests):
Pon = √[(ΣPnQo / ΣPoQo) × (ΣPnQn / ΣPoQn)] × 100
(4) Marshall-Edgeworth’s Index: This index number uses the sum (equivalently, the average) of the base year and current year quantities as weights:
Pon = [ΣPn(Qo + Qn) / ΣPo(Qo + Qn)] × 100
Example:
Commodities Price 2001 (Rs. / kg) Qty. 2001 (kgs) Price 2005 (Rs. / kg) Qty. 2005 (kgs)
Rice 20 100 28 160
Sugar 11 18 30 37
Salt 1 1 5 1.1
Milk 18 57 32 149
* The general weighted aggregative index number (W.A.I.N.) is Pon = (ΣPnW / ΣPoW) × 100.
Required:
Construct the following price index numbers using 2001 as base year:
(a) Laspeyres’
(b) Paasche’s
(c) Fisher’s
(d) Marshall-Edgeworth’s
Solution:
Commodity Po Qo Pn Qn PoQo PnQo PnQn PoQn Qo + Qn Po(Qo + Qn) Pn(Qo + Qn)
Rice 20 100 28 160 2000 2800 4480 3200 260 5200 7280
Sugar 11 18 30 37 198 540 1110 407 55 605 1650
Salt 1 1 5 1.1 1 5 5.5 1.1 2.1 2.1 10.5
Milk 18 57 32 149 1026 1824 4768 2682 206 3708 6592
Total 3225 5169 10363.5 6290.1 9515.1 15532.5
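(a) Laspeyres’: Pon = (ΣPnQo / ΣPoQo) × 100 = (5169 / 3225) × 100 = 160.28
(b) Paasche’s: Pon = (ΣPnQn / ΣPoQn) × 100 = (10363.5 / 6290.1) × 100 = 164.76
(c) Fisher’s: Pon = √(160.28 × 164.76) = 162.50
(d) Marshall-Edgeworth’s: Pon = [ΣPn(Qo + Qn) / ΣPo(Qo + Qn)] × 100 = (15532.5 / 9515.1) × 100 = 163.24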
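The four weighted aggregative indices can be reproduced from the price and quantity data. This sketch follows the formulas above. The variable names are illustrative, and the Marshall-Edgeworth line uses the identity ΣPn(Qo + Qn) = ΣPnQo + ΣPnQn.

```python
from math import sqrt

# commodity: (Po, Qo, Pn, Qn) with 2001 as base year and 2005 as current year
data = {
    "Rice": (20, 100, 28, 160),
    "Sugar": (11, 18, 30, 37),
    "Salt": (1, 1, 5, 1.1),
    "Milk": (18, 57, 32, 149),
}
rows = list(data.values())

sum_pnqo = sum(pn * qo for po, qo, pn, qn in rows)
sum_poqo = sum(po * qo for po, qo, pn, qn in rows)
sum_pnqn = sum(pn * qn for po, qo, pn, qn in rows)
sum_poqn = sum(po * qn for po, qo, pn, qn in rows)

laspeyres = sum_pnqo / sum_poqo * 100   # base-year quantities as weights
paasche = sum_pnqn / sum_poqn * 100     # current-year quantities as weights
fisher = sqrt(laspeyres * paasche)      # geometric mean of the two
marshall_edgeworth = (sum_pnqo + sum_pnqn) / (sum_poqo + sum_poqn) * 100

print(f"L = {laspeyres:.2f}, P = {paasche:.2f}, F = {fisher:.2f}, ME = {marshall_edgeworth:.2f}")
```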
(ii) Weighted Average of Relatives: The formula for the weighted average of relatives is:
Pon = ΣWV / ΣW
(Arithmetic Mean taken as average); where V = (Pn / Po) × 100
or
Pon = antilog(ΣW log V / ΣW)
(Geometric Mean taken as average)
The total value of each commodity is used as its weight. If the base year value (PoQo) is used as the weight, then the formula becomes:
Pon = Σ[PoQo × (Pn / Po) × 100] / ΣPoQo or Pon = (ΣPnQo / ΣPoQo) × 100
If the current year value (PnQn) is used as the weight, then the formula becomes:
Pon = Σ[PnQn × (Pn / Po) × 100] / ΣPnQn
Example:
Commodity Price 2001 Price 2005 Weights
Rice 20 28 35
Tea 178 180 5
Sugar 11 30 24
Required:
Weighted index for 2005, taking 2001 as base year using:
(a) Arithmetic Mean
(b) Geometric Mean
Solution:
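(a) Arithmetic Mean:
Commodity V W VW
Rice 140.00 35 4900
Tea 101.12 5 505.6
Sugar 272.73 24 6545.52
Total 64 11951.12
Pon = ΣVW / ΣW = 11951.12 / 64 = 186.74
(b) Geometric Mean:
Commodity V W log V W log V
Rice 140.00 35 2.146 75.11
Tea 101.12 5 2.005 10.025
Sugar 272.73 24 2.436 58.464
Total 64 143.599
Pon = antilog(ΣW log V / ΣW) = antilog(143.599 / 64) = antilog(2.24373) = 175.28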
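The same two averages can be computed in Python. This sketch uses `math.log10` for the geometric mean and keeps the relatives unrounded.

```python
from math import log10

# commodity, price 2001, price 2005, weight
items = [("Rice", 20, 28, 35), ("Tea", 178, 180, 5), ("Sugar", 11, 30, 24)]

sum_w = sum(w for _, _, _, w in items)
relatives = [(pn / po * 100, w) for _, po, pn, w in items]   # (V, W) pairs

am_index = sum(v * w for v, w in relatives) / sum_w                  # ΣVW / ΣW
gm_index = 10 ** (sum(w * log10(v) for v, w in relatives) / sum_w)  # antilog(ΣW log V / ΣW)

print(round(am_index, 2), round(gm_index, 2))
```

Unrounded logarithms give a geometric-mean index of about 175.26. The table's 175.28 comes from rounding log V to three decimals. As expected, the geometric mean is smaller than the arithmetic mean.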
Quantity Index Number: The formulae described for obtaining price indices can easily be used to obtain quantity (or volume) indices simply by interchanging the Ps and Qs. For example, the quantity relative is (Qn / Qo) × 100, and the simple aggregative quantity index is:
Qon = (ΣQn / ΣQo) × 100
Laspeyres’ index number can be converted as follows:
Qon = (ΣQnPo / ΣQoPo) × 100
and so on.
Value Index Numbers: Like price or quantity index numbers, we can obtain formulae for value index numbers. The simplest value index number is defined as:
Von = (ΣPnQn / ΣPoQo) × 100
This is a ‘simple aggregative’ value index because the values are not weighted.
Measures of Dispersion
Definition:
1. Two or more distributions may differ greatly in their dispersion even though their means are the same or nearly the same, for example:
67, 67, 67, 67, 67, 67, 67, 67 (mean = 67)
43, 43, 50, 55, 66, 90, 91, 97 (mean = 66.875)
2. By dispersion we mean the extent to which the values are spread out from the average. The measures used for computing the amount of dispersion in a distribution are known as ‘measures of dispersion’ or ‘measures of variation’.
3. In the above example, the first distribution has zero dispersion, and the second distribution has a greater dispersion. Dispersion can never be less than zero.
Types of Measures of Dispersions:
Measures of dispersion are of two types:
(i) Measures of Absolute Dispersion, and
(ii) Measures of Relative Dispersion.
(i) Measures of Absolute Dispersion: The actual variation or dispersion determined by the Measures
of Absolute Dispersion is called ‘absolute dispersion’.
(ii) Measures of Relative Dispersion: The measures of absolute dispersion cannot be used to compare the variation of two or more series. For example, the SD of the heights of students (in inches) cannot be compared with the SD of their weights (in pounds). The same holds even when the units are identical, for example, the heights of men (in inches) and the lengths of their noses (in inches): if the SD of the heights is greater than the SD of the nose lengths, it does not mean that the degree of variability is greater for heights.
To compare the variation of two or more series, we need a measure of relative dispersion. It is defined as:
Relative dispersion = Absolute dispersion / Average
Types of Measures of Absolute Dispersion:
(a) The Range,
(b) The Quartile Deviation,
(c) The Mean Deviation, and
(d) The Standard Deviation.
(a) The Range:
1. The range is the simplest measure of dispersion. It is defined as the difference between the largest value and the smallest value in the data:
Range = Xm – Xn
where Xm is the largest and Xn the smallest value.
2. For grouped data, the range is defined as the difference between the upper class boundary (UCB) of the
highest class and the lower class boundary (LCB) of the lowest class.
(b) Quartile Deviation (QD):
1. It is also known as the Semi-Interquartile Range. The range is a poor measure of dispersion when extremely large values are present. The quartile deviation is defined as half of the difference between the third and the first quartiles:
Q.D. = (Q3 – Q1) / 2
2. The difference between the third and first quartiles, Q3 – Q1, is called the ‘Inter-Quartile Range’.
(c) Mean Deviation (MD):
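1. The MD is defined as the arithmetic mean of the absolute deviations of the values from an average (mean or median):
M.D. = Σ|x – x̄| / n
It is also known as the Mean Absolute Deviation.
2. MD from the median (x̃) is expressed as follows:
M.D. = Σ|x – x̃| / n
3. For grouped data:
M.D. = Σf|x – x̄| / Σf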
(d) Standard Deviation (SD):
1. The SD is defined as the positive square root of the mean of the squared deviations of the values from their mean.
2. Thus, the SD of a population of N values, x1, x2, ..., xN, is expressed as follows:
σ = √[Σ(x – μ)² / N] (population standard deviation)
3. In the case of a frequency distribution with x1, x2, ..., xk as class marks, and f1, f2, ..., fk as the corresponding class frequencies, the SD is expressed as follows:
σ = √[Σf(x – μ)² / Σf]
Alternate Method for Computing Standard Deviation:
1. If the values (or class marks) and the mean are not integers, computing the SD from its definition becomes laborious.
2. The shortcut alternative method for computing the SD is:
σ = √[Σx²/N – (Σx/N)²] (ungrouped data, population SD)
σ = √[Σfx²/Σf – (Σfx/Σf)²] (grouped data)
3. If the values of x are large, considerable time is saved by taking deviations of x from an arbitrary value A. If D denotes the deviations of x from A, i.e., D = x – A, then the SD can be expressed in another way:
σ = √[ΣD²/N – (ΣD/N)²] (or, for grouped data, σ = √[ΣfD²/Σf – (ΣfD/Σf)²])
4. Under the coding method, the SD can be calculated as below:
σ = h × √[Σfu²/Σf – (Σfu/Σf)²]
where u = (x – A) / h and h is the common class interval size.
The Variance:
The variance is defined as the square of the SD, i.e., the mean of the squared deviations from the mean:
σ² = Σ(x – μ)² / N (ungrouped data, population variance)
σ² = Σf(x – μ)² / Σf (grouped data)
Sample Variance and Standard Deviation:
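1. The variance of a sample of n values, called the sample variance, is expressed as below:
s² = Σ(x – x̄)² / (n – 1)
2. The standard deviation of a sample of n values is:
s = √[Σ(x – x̄)² / (n – 1)]
Alternate Method:
1. Variance: s² = [Σx² – (Σx)²/n] / (n – 1)
2. Standard Deviation: s = √{[Σx² – (Σx)²/n] / (n – 1)}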
Properties of SD and Variance:
1. The SD or variance of a constant is zero. If x = a (a constant), SD(a) = 0 and var(a) = 0.
2. The SD and the variance are independent of origin, i.e., they remain unchanged when the values are increased or decreased by a constant:
SD(x + a) = SD(x); var(x + a) = var(x)
SD(x – a) = SD(x); var(x – a) = var(x)
3. When all the values are multiplied or divided by a constant, the SD of these values is multiplied or divided by the (absolute value of the) constant, and the variance is multiplied or divided by the square of the constant:
SD(ax) = |a| × SD(x); var(ax) = a² × var(x)
SD(x/a) = (1/|a|) × SD(x); var(x/a) = (1/a²) × var(x)
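4. If two sets of data consisting of n1 and n2 values have means x̄1 and x̄2 and variances S1² and S2² respectively (each computed with divisor n), the combined variance of both sets of data is expressed as follows:
Sc² = [n1(S1² + d1²) + n2(S2² + d2²)] / (n1 + n2)
where x̄c = (n1x̄1 + n2x̄2) / (n1 + n2), d1 = x̄1 – x̄c and d2 = x̄2 – x̄c.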
5. The variance of the sum or difference of two independent random variables is the sum of their respective variances. Thus, if x and y are independent random variables:
Var(x + y) = Var(x) + Var(y)
Var(x – y) = Var(x) + Var(y)
6. The variance has the minimal property. This means that the sum (or mean) of the squared deviations is a minimum if and only if the deviations are taken from the mean. In other words:
Σ(x – a)² is a minimum when a = x̄
7. For normal distributions:
(i) the interval μ – σ to μ + σ includes 68.27% of the values,
(ii) the interval μ – 2σ to μ + 2σ includes 95.45% of the values, and
(iii) the interval μ – 3σ to μ + 3σ includes 99.73% of the values.
The above results also hold approximately for moderately skewed distributions.
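Properties 2 to 4 can be checked numerically. The sketch below uses Python's `statistics.pvariance` (divisor N, matching the combined-variance formula in property 4) on the second series from the definition of dispersion. The constants a and c and the split into two groups are arbitrary choices made for illustration.

```python
from statistics import mean, pvariance

x = [43, 43, 50, 55, 66, 90, 91, 97]   # second series from the definition of dispersion
a, c = 3, 10                           # arbitrary constants for the check

# Property 2: adding a constant leaves the variance unchanged
shifted = pvariance([v + c for v in x])
# Property 3: multiplying by a constant multiplies the variance by a²
scaled = pvariance([a * v for v in x])

# Property 4: combined variance of two groups (variances with divisor n)
g1, g2 = x[:3], x[3:]
n1, n2 = len(g1), len(g2)
m1, m2 = mean(g1), mean(g2)
mc = (n1 * m1 + n2 * m2) / (n1 + n2)   # combined mean
combined = (n1 * (pvariance(g1) + (m1 - mc) ** 2)
            + n2 * (pvariance(g2) + (m2 - mc) ** 2)) / (n1 + n2)

print(pvariance(x), shifted, scaled / a ** 2, combined)
```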
Characteristics of Measures of Dispersion:
(a) Range:
1. The range is simple to understand and easy to calculate because its value is determined by the two
extreme items.
2. It is useful as a rough measure of variation.
3. Its value may change greatly if an extreme value (either lowest or highest) is withdrawn or a new value is added. It is a highly unstable measure of variation.
4. It gives no indication of how the values between the two extremes are distributed.
(b) Quartile Deviation:
1. The QD is simple to understand and easy to calculate.
2. As a rough measure of variation, it is superior to the range because it is not affected by extreme values.
3. It is not capable of algebraic manipulation.
4. It is mainly used in situations where extreme values are thought to be unrepresentative.
(c) Mean Deviation:
1. The MD is simple to understand and to interpret.
2. It is affected by the value of every observation.
3. It is less affected by extreme values than the standard deviation.
4. It is not suited to further mathematical treatment. It is, therefore, not as logical or as convenient a measure of dispersion as the SD.
(d) Standard Deviation:
1. The SD is affected by the value of every observation.
2. The process of squaring the deviations before adding avoids the algebraic fallacy of disregarding signs.
3. In general, it is less affected by fluctuations of sampling than the other measures of dispersion.
4. It has a definite mathematical meaning and is perfectly adaptable to algebraic treatment.
5. It has great practical utility in sampling and statistical inference.
6. The SD is the best general-purpose measure of dispersion and should be employed in all cases where a high degree of accuracy is required.
Example:
Class Boundaries Frequency
9.5-19.5 5
19.5-29.5 8
29.5-39.5 13
39.5-49.5 19
49.5-59.5 23
59.5-69.5 15
69.5-79.5 7
79.5-89.5 5
89.5-99.5 3
99.5-109.5 2
Total 100
Calculate:
(a) Range
(b) Quartile deviation
(c) Mean deviation from mean
(d) Standard deviation
(e) Variance
Solution:
CB f CF x fx x – x̄ |x – x̄| f|x – x̄| (x – x̄)² f(x – x̄)²
9.5-19.5 5 5 14.5 72.5 -37.7 37.7 188.5 1421.29 7106.45
19.5-29.5 8 13 24.5 196 -27.7 27.7 221.6 767.29 6138.32
29.5-39.5 13 26 34.5 448.5 -17.7 17.7 230.1 313.29 4072.77
39.5-49.5 19 45 44.5 845.5 -7.7 7.7 146.3 59.29 1126.51
49.5-59.5 23 68 54.5 1253.5 2.3 2.3 52.9 5.29 121.67
59.5-69.5 15 83 64.5 967.5 12.3 12.3 184.5 151.29 2269.35
69.5-79.5 7 90 74.5 521.5 22.3 22.3 156.1 497.29 3481.03
79.5-89.5 5 95 84.5 422.5 32.3 32.3 161.5 1043.29 5216.45
89.5-99.5 3 98 94.5 283.5 42.3 42.3 126.9 1789.29 5367.87
99.5-109.5 2 100 104.5 209 52.3 52.3 104.6 2735.29 5470.58
Total 100 5220 1573 40371
x̄ = Σfx / Σf = 5220 / 100 = 52.2
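(a) Range:
Range = UCB of highest class – LCB of lowest class = 109.5 – 9.5 = 100
(b) Quartile Deviation:
Q1 = l + (h/f)(n/4 – C) = 29.5 + (10/13)(25 – 13) = 38.73
Q3 = l + (h/f)(3n/4 – C) = 59.5 + (10/15)(75 – 68) = 64.17
where l is the lower class boundary of the quartile class, h its size, f its frequency and C the cumulative frequency of the preceding class.
Q.D. = (Q3 – Q1) / 2 = (64.17 – 38.73) / 2 = 12.72
(c) Mean Deviation from Mean:
M.D. = Σf|x – x̄| / Σf = 1573 / 100 = 15.73
(d) Standard Deviation:
σ = √[Σf(x – x̄)² / Σf] = √(40371 / 100) = √403.71 = 20.09
(e) Variance:
σ² = 403.71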
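The whole solution can be reproduced from the frequency table. This Python sketch uses the class marks for MD and SD (population formulas, divisor Σf) and the interpolation formula for the quartiles. `quartile` is a hypothetical helper.

```python
from math import sqrt

# (lower class boundary, upper class boundary, frequency) from the table above
table = [(9.5, 19.5, 5), (19.5, 29.5, 8), (29.5, 39.5, 13), (39.5, 49.5, 19),
         (49.5, 59.5, 23), (59.5, 69.5, 15), (69.5, 79.5, 7), (79.5, 89.5, 5),
         (89.5, 99.5, 3), (99.5, 109.5, 2)]

n = sum(f for _, _, f in table)
marks = [(lo + hi) / 2 for lo, hi, _ in table]              # class marks x
xbar = sum(f * x for (_, _, f), x in zip(table, marks)) / n


def quartile(k):
    """k-th quartile by interpolation: l + (h / f)(k·n/4 - C)."""
    target = k * n / 4
    cum = 0                                   # cumulative frequency before the class
    for lo, hi, f in table:
        if cum + f >= target:
            return lo + (hi - lo) / f * (target - cum)
        cum += f
    raise ValueError("k must be between 0 and 4")


data_range = table[-1][1] - table[0][0]                      # UCB(highest) - LCB(lowest)
q1, q3 = quartile(1), quartile(3)
qd = (q3 - q1) / 2
md = sum(f * abs(x - xbar) for (_, _, f), x in zip(table, marks)) / n
variance = sum(f * (x - xbar) ** 2 for (_, _, f), x in zip(table, marks)) / n
sd = sqrt(variance)

print(f"range={data_range:g} Q1={q1:.2f} Q3={q3:.2f} QD={qd:.2f} "
      f"MD={md:.2f} SD={sd:.2f} var={variance:.2f}")
```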
Types of Measures of Relative Dispersions:
(a) Coefficient of Variation,
(b) Coefficient of Dispersion,
(c) Quartile Coefficient of Dispersion, and
(d) Mean Coefficient of Dispersion.
(a) Coefficient of Variation (CV):
1. The coefficient of variation was introduced by Karl Pearson. The CV expresses the SD as a percentage of the AM:
CV = (s / x̄) × 100 (sample data)
CV = (σ / μ) × 100 (population data)
2. It is frequently used in comparing the dispersion of two or more series. It is also used as a criterion of consistent performance: the smaller the CV, the more consistent the performance.
3. The disadvantage of the CV is that it fails to be useful when x̄ is close to zero.
4. It is sometimes also referred to as the ‘coefficient of standard deviation’.
5. It is used to determine the stability or consistency of data.
6. The higher the CV, the higher the instability or variability in the data, and vice versa.
(b) Coefficient of Dispersion (CD):
If Xm and Xn are respectively the maximum and the minimum values in a set of data, then the coefficient of dispersion is defined as:
C.D. = (Xm – Xn) / (Xm + Xn)
(c) Coefficient of Quartile Deviation (CQD):
1. If $Q_1$ and $Q_3$ are given for a set of data, then $(Q_1 + Q_3)/2$ is a measure of central tendency or average of the data. The measure of relative dispersion for quartile deviation is then expressed as follows:
$CQD = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
2. CQD may also be expressed in percentage.
(d) Mean Coefficient of Dispersion (CMD):
The relative measure for mean deviation is the ‘mean coefficient of dispersion’ or ‘coefficient of mean deviation’:
$CMD = \frac{MD}{\bar{x}}$ for mean deviation from the arithmetic mean
$CMD = \frac{MD}{\text{Median}}$ for mean deviation from the median
Example:
(Take the previous example)
Calculate:
(a) Coefficient of Variation,
(b) Coefficient of Dispersion,
(c) Quartile Coefficient of Dispersion, and
(d) Mean Coefficient of Dispersion
Solution:
(a) Coefficient of Variation:
(b) Coefficient of Dispersion:
(c) Quartile Coefficient of Dispersion:

(d) Mean Coefficient of Dispersion:
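The four relative measures for the same frequency table can be sketched in Python as follows. The assumptions are the usual grouped-data ones:
- class marks are used as x;
- the SD uses the divisor n;
- quartiles are found by linear interpolation.

$X_m$ and $X_n$ are taken here as the extreme class boundaries 109.5 and 9.5. This is a convention rather than something the example states.

```python
import math

lower = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
freq = [5, 8, 13, 19, 23, 15, 7, 5, 3, 2]
width = 10.0

n = sum(freq)
mid = [b + width / 2 for b in lower]
mean = sum(f * x for f, x in zip(freq, mid)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freq, mid)) / n)
mean_dev = sum(f * abs(x - mean) for f, x in zip(freq, mid)) / n


def grouped_quantile(position):
    """Linear interpolation within the class that contains the given item."""
    cum = 0
    for boundary, f in zip(lower, freq):
        if cum + f >= position:
            return boundary + (position - cum) / f * width
        cum += f
    raise ValueError("position lies outside the data")


q1, q3 = grouped_quantile(n / 4), grouped_quantile(3 * n / 4)
x_max, x_min = lower[-1] + width, lower[0]  # assumption: extreme class boundaries

cv = sd / mean * 100                     # coefficient of variation (%)
cd = (x_max - x_min) / (x_max + x_min)   # coefficient of dispersion
cqd = (q3 - q1) / (q3 + q1)              # quartile coefficient of dispersion
cmd = mean_dev / mean                    # mean coefficient of dispersion (about the mean)

print(f"CV = {cv:.2f}%, CD = {cd:.4f}, CQD = {cqd:.4f}, CMD = {cmd:.4f}")
```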
Example:
During a soccer tournament, two players make the following series of goals:
Player 1 2 2 4 3 2 4 2 3
Player 2 1 2 5 5 5 2 1 1
Who is more consistent player?
Solution:
x (Player 1) y (Player 2) x − x̄ (x − x̄)² y − ȳ (y − ȳ)²
2 1 -0.75 0.5625 -1.75 3.0625
2 2 -0.75 0.5625 -0.75 0.5625
4 5 1.25 1.5625 2.25 5.0625
3 5 0.25 0.0625 2.25 5.0625
2 5 -0.75 0.5625 2.25 5.0625
4 2 1.25 1.5625 -0.75 0.5625
2 1 -0.75 0.5625 -1.75 3.0625
3 1 0.25 0.0625 -1.75 3.0625
Total 22 22 5.5 25.5 (x̄ = ȳ = 22/8 = 2.75)
Conclusion: The higher the CV, the higher the instability, and vice versa. From the above calculations, it is evident that
Player 1 is more consistent than Player 2.
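The consistency comparison can be sketched with Python's standard `statistics` module. The CV here uses the sample SD (divisor n − 1, as in `statistics.stdev`). Using the population SD (`statistics.pstdev`) changes the percentages but not the conclusion.

```python
import statistics

player1 = [2, 2, 4, 3, 2, 4, 2, 3]
player2 = [1, 2, 5, 5, 5, 2, 1, 1]


def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation (divisor n - 1)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


cv1 = coefficient_of_variation(player1)
cv2 = coefficient_of_variation(player2)
more_consistent = "Player 1" if cv1 < cv2 else "Player 2"

print(f"CV (Player 1) = {cv1:.2f}%")
print(f"CV (Player 2) = {cv2:.2f}%")
print(f"More consistent: {more_consistent}")
```

The squared-deviation totals 5.5 and 25.5 in the table are exactly the numerators of the two variances. With the n − 1 divisor the sketch gives CVs of about 32.23% and 69.40%.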
Standard Scores or Z-Scores:
Raw data can be converted into standard values by subtracting the mean from each value and then dividing by the SD of the data. These values are called ‘standard scores’, ‘z-scores’ or ‘values in SD units’:
$z = \frac{x - \bar{x}}{S}$ for sample data
$z = \frac{x - \mu}{\sigma}$ for population data
Properties of Z-Score:
1. Z-scores are free of units.
2. The mean of z-scores is always zero.
3. The SD of z-scores is always one.
4. The distribution of z-scores has exactly the same shape as the distribution of the original data.
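Properties 2 and 3 can be checked numerically. The sketch below standardises a small hypothetical data set (the values are made up for illustration) using the sample SD. Because the same SD is used both to standardise and to measure the z-scores, their mean comes out as 0 and their SD as 1, up to floating-point rounding.

```python
import statistics


def z_scores(values):
    """Subtract the mean from each value and divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


data = [12, 15, 11, 18, 14, 20, 16]  # hypothetical raw data
z = z_scores(data)

print([round(v, 3) for v in z])
print("mean of z:", round(statistics.mean(z), 10))
print("SD of z:", round(statistics.stdev(z), 10))
```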
Example:
A student gets 82 marks in a final examination in Accounting; the mean is 75 marks with a standard deviation of 10
marks. In Economics, he gets 86 marks in the final examination on which the mean is 80 marks with a SD of 14 marks. Is
his relative standing better in Accounting or Economics?
Solution:
Accounting Economics
x̄ = 75 x̄ = 80
S = 10 S = 14
x = 82 x = 86
z = (82 − 75)/10 = 0.7 z = (86 − 80)/14 = 0.43
Conclusion: His marks in Accounting are 0.7 SD above the mean, while in Economics his marks are 0.43 SD above the mean. Therefore, his relative standing is better in Accounting than in Economics.
Chebyshev’s Theorem:
1. The Russian mathematician P. L. Chebyshev devised a rule, called ‘Chebyshev’s Theorem’, to determine the minimum proportion of values in intervals that are equidistant from the mean.
2. The theorem states that for any data at least $1 - \frac{1}{k^2}$ of the values must lie within k standard deviations on either side of the mean, where k is any constant number greater than 1.
3. In other words, the interval $\bar{x} \pm kS$ will contain at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the values. For example:
$\bar{x} \pm 2S$ will contain at least 75% of the values (k = 2)
$\bar{x} \pm 3S$ will contain at least 88.89% of the values (k = 3)
$\bar{x} \pm 2.4S$ will contain at least 82.64% of the values (k = 2.4)
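A small helper makes the bound concrete. It evaluates 1 − 1/k² and refuses k ≤ 1, where the bound is zero or negative and so gives no information.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k SDs of the mean, by Chebyshev's theorem."""
    if k <= 1:
        raise ValueError("the bound 1 - 1/k^2 is informative only for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 2.4):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.2%} of the values")
```

The loop prints 75.00%, 88.89% and 82.64%, matching the three intervals listed.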
Limitations of Chebyshev’s Theorem:
1. Proportions of values are given only for intervals that are equidistant from the mean, that is, the mean must always be the midpoint of the interval.
2. Minimum proportion is specified rather than exact or approximate value of the proportion.
3. Proportions for values of k less than or equal to one cannot be determined.
Example:
Two populations have the same mean . Their SDs are . Find the percentages of the values
that must lie between 125 and 155.
Solution:
Population 1: Therefore, 125 to 155 will contain at least:
Population 2: Therefore, 125 to 155 will contain at least:
Normal Distribution:
1. Three mathematicians, namely, P. Laplace, A. De Moivre and K.F. Gauss have independently developed a law
which gives the proportion of values that lie in specific intervals of a special type of symmetrical distribution
called ‘Normal Distribution’.
2. The mathematical form of a normal distribution is complicated and difficult to use frequently. To make the application of the normal law simple, tables have been constructed, known as ‘tables of areas under the normal curve’ or ‘normal area tables’.
3. Whenever the frequency curve is bell-shaped and symmetrical, the distribution (or curve) can be assumed to be approximately normal, and hence the normal law can be applied.
Interval Percentage of Values
μ ± σ 68%
μ ± 2σ 95%
μ ± 3σ 99.7%
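The 68%, 95% and 99.7% figures can be recovered from the normal curve itself. The sketch below uses the identity P(|Z| ≤ k) = erf(k/√2) with Python's `math.erf`. Comparing with Chebyshev's minimum for the same k shows how much more a normality assumption guarantees.

```python
import math


def normal_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    bound = 1 - 1 / k**2  # Chebyshev's minimum, for comparison (0 when k = 1)
    print(f"mu ± {k} sigma: normal {normal_within(k):.4f}, Chebyshev at least {bound:.4f}")
```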
Linear Transformation of a Variable:
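1. Let $\bar{x}$ and $S_x$ be the mean and SD of a variable x.
2. Let the variable x be multiplied by a constant number and another constant number be added to the product, giving a new variable y.
3. Then the variable x is said to be linearly transformed to the variable y and the process is called a ‘linear transformation of x to y’.
4. Symbolically, $y = kx + h$ is a linear transformation, where k and h are any constant numbers.
5. The mean and SD of the transformed variable y may be expressed in terms of the mean and SD of the variable x by the following relations:
$\bar{y} = k\bar{x} + h$ and $S_y = |k|\,S_x$
6. It should be noted here that the z-score is a linear transformation of a variable x such that:
$z = \frac{x - \bar{x}}{S_x} = \frac{1}{S_x}x - \frac{\bar{x}}{S_x}$, i.e., $k = \frac{1}{S_x}$ and $h = -\frac{\bar{x}}{S_x}$
Since $\bar{z} = k\bar{x} + h = \frac{\bar{x}}{S_x} - \frac{\bar{x}}{S_x} = 0$ and $S_z = |k|\,S_x = \frac{S_x}{S_x} = 1$, z-scores always have mean 0 and SD 1.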
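The relations for the mean and SD under y = kx + h can be checked on a small hypothetical data set. The values of x, k and h below are made up for illustration. Note that the SD scales by |k|, so a negative k does not make the SD negative. The last lines treat the z-score as the special case k = 1/S_x, h = −x̄/S_x.

```python
import statistics

x = [4, 8, 6, 10, 7]   # hypothetical data
k, h = -3, 5           # hypothetical constants in y = kx + h

y = [k * xi + h for xi in x]

mean_x, sd_x = statistics.mean(x), statistics.stdev(x)
mean_y, sd_y = statistics.mean(y), statistics.stdev(y)

print(mean_y, k * mean_x + h)      # mean: y-bar = k * x-bar + h
print(sd_y, abs(k) * sd_x)         # SD:   S_y = |k| * S_x

# z-score as a linear transformation: k = 1/S_x, h = -x-bar/S_x
z = [xi / sd_x - mean_x / sd_x for xi in x]
print(round(statistics.mean(z), 12), round(statistics.stdev(z), 12))
```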
Example:
Given: .
Determine the mean and standard deviation of the following transformations of x:
(i)
(ii)
Solution:
(i) :
Rules:
SD(x + a) = SD(x)
SD(ax) = |a| × SD(x)
(ii) :
Rules:
SD(x + a) = SD(x)
SD(ax) = |a| × SD(x)
Random Variable and Its Probability Distribution
Random Numbers:
1. In our everyday life, we base many of our decisions on random outcomes, i.e., chance occurrences. For example, the captains of two cricket teams toss a coin to decide which team will play first, and lotteries are drawn by spinning a wheel.
2. Random numbers are the numbers obtained by some random process (manually or mechanically).
3. These numbers are assumed to be randomly and uniformly (equally) distributed. The basic random numbers are the 10 one-digit numbers, i.e., 0, 1, 2, …, 9. Each of these numbers has an equal chance, 1/10, of being selected.
4. Random numbers can be generated manually as well as mechanically. Random numbers can be generated manually by drawing cards from a deck of playing cards or by rotating a spinning wheel. Mechanically generated random numbers are produced by calculators and computers.
5. The most common use of random numbers is for selection of samples.
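Python's `random` module can stand in for the calculator or computer as a mechanical (strictly, pseudo-random) generator. The seed, the population size of 50 and the sample size of 5 are arbitrary choices for illustration. `random.sample` draws without replacement, which matches the usual way a simple random sample is selected.

```python
import random

rng = random.Random(2024)  # fixed seed so the run is reproducible

# Ten one-digit random numbers: each of 0-9 has chance 1/10
digits = [rng.randint(0, 9) for _ in range(10)]

# Simple random sample of 5 units from a hypothetical population numbered 1 to 50
population = list(range(1, 51))
sample = sorted(rng.sample(population, 5))

print("Random digits:", digits)
print("Selected units:", sample)
```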
Random Variables:
1. Experiments in which outcomes vary from trial to trial are called ‘Random Experiments’.
2. A variable whose values are determined by the outcomes of a random experiment is called a random
variable.
3. In other words, random variable is a rule which assigns numbers to the outcomes of the possibility space
and is denoted by X.
4. For example, throwing a die is a random experiment, and the number of spots that turns up, i.e., 1, 2, 3, 4, 5 or 6, is a random variable.
5. A random variable is also called a ‘chance variable’, ‘stochastic variable’ or simply a ‘variable’. Capital letters such as X or Y are used to denote a variable, and lower-case letters x or y are used to denote its values.
6. Many random variables may be defined for one and the same possibility space.
7. When any characteristics of the individuals of a population (or a sample) are measured or counted, the
characteristic itself is a random variable.
8. The random variables are further bifurcated into:
(a) Discrete Random Variable, and
(b) Continuous Random Variable.
(a) Discrete Random Variable: A random variable which can assume only a finite number of values, or a countable sequence of values such as whole numbers, is called a discrete random variable. For example, the number of spots on a die, the number of persons enrolled for CSS examinations, the number of students passing in the 1st division in a particular class and the number of defective items in a lot are discrete random variables, which can assume only separate whole-number values such as 0, 1, 2, 3, …
(b) Continuous Random Variable: A random variable which can assume all possible values on a
continuous scale in a given interval is called a continuous random variable. For example, height,
weight, temperature, distance, life periods, speed, etc. are continuous random variables.
Example:
A coin is tossed three times. Find the possibility space and define two random variables for this possibility
space.
Solution:
S = {HHH, HHT, HTH, THH, HTT, TTH, THT, TTT}
(i) Let the random variable X be the number of heads:
X = no. of heads.
Note: The same value may be assigned to different outcomes of the possibility space.
(ii) Let a random variable X score each head as +1 and each tail as –1:
Probability Distribution:
1. An arrangement of all possible values of a random variable along with their respective probabilities is
called a ‘probability distribution’ or a ‘probability function’.
2. Probability distribution can be further bifurcated into:
(a) Discrete Probability Distribution, and
(b) Continuous Probability Distribution.
(a) Discrete Probability Distribution: Let a discrete random variable X assume values $x_1, x_2, x_3, \ldots, x_n$ with respective probabilities $P(x_1), P(x_2), P(x_3), \ldots, P(x_n)$. Since the random variable takes a discrete set of values, its probability distribution is called a discrete probability distribution. A discrete probability distribution may take the form of a table, a graph or a mathematical equation.
A probability distribution is similar to a relative frequency distribution, with probabilities replacing relative frequencies.
A discrete probability distribution must possess the following two properties:
(i) $0 \le P(x_i) \le 1$
(ii) $\sum P(x_i) = 1$, which means that the sum of probabilities is equal to one.
Example:
A coin is tossed three times. Find the probability distribution of the random variable number of heads.
Solution:
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
No. of Heads Probability of X
X P(X)
0 1/8
1 3/8
2 3/8
3 1/8
Total 1
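The two random variables defined for the three-toss experiment, and the distribution in the table, can be generated by enumerating the possibility space. The sketch treats each random variable as a Python dictionary from outcomes to numbers, which mirrors the definition of a random variable as a rule. Reading variable (ii) as the total score (heads minus tails) is an assumption about how the ±1 values are combined.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Possibility space of three tosses: 8 equally likely outcomes
space = ["".join(t) for t in product("HT", repeat=3)]

# Random variable (i): number of heads
heads = {s: s.count("H") for s in space}
# Random variable (ii): +1 per head, -1 per tail, summed (assumed interpretation)
score = {s: s.count("H") - s.count("T") for s in space}


def distribution(rv):
    """Probability of each value = (number of outcomes mapped to it) / 8."""
    counts = Counter(rv.values())
    return {value: Fraction(c, len(space)) for value, c in sorted(counts.items())}


dist_heads = distribution(heads)
dist_score = distribution(score)
print(dist_heads)
print(dist_score)
```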
Example:
Determine whether the function $P(X = x) = \frac{x + 1}{14}$ for X = 1, 2, 3 and 4 can be a probability distribution.
Solution:
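X P(X)
1 2/14
2 3/14
3 4/14
4 5/14
Total 1
Since every P(X) lies between 0 and 1 and the probabilities add up to 1, the function can be a probability distribution.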
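The two properties of a discrete probability distribution translate directly into a check. In the sketch below, P(X = x) = (x + 1)/14 is inferred from the tabulated values. `Fraction` keeps the arithmetic exact, so the test "sum equals 1" can be done with `==`. With floating-point probabilities a small tolerance would be needed instead.

```python
from fractions import Fraction


def is_probability_distribution(probs):
    """Check property (i), 0 <= P(x) <= 1, and property (ii), sum of P(x) = 1."""
    probs = list(probs)
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1


# P(X = x) = (x + 1) / 14 for x = 1, 2, 3, 4, as read off the table
probs = {x: Fraction(x + 1, 14) for x in range(1, 5)}
print(is_probability_distribution(probs.values()))  # True
```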
(b) Continuous Probability Distribution: As we know, a random variable which can assume all possible values within a given interval is called a continuous random variable. Within a given interval, there are an infinite number of values. For example, there may be an infinite number of weights between 69.5 kg and 70.5 kg. In the case of a continuous random variable, therefore, we compute probabilities for various intervals of the continuous random variable, such as P(a ≤ X ≤ b) or P(X ≥ c).
The probability distribution of a continuous random variable cannot be presented in tabular form. It can
be represented by means of a formula or through a graph. The formula is necessarily in the form of a
function of the numerical values of the continuous random variable X. For example, a continuous random variable can assume values between X = 2 and X = 4, and the function is given by:
The continuous probability distribution is further discussed in detail later.
Mean and Variance of a Random Variable:

In a probability distribution of a random variable X, the mean, also referred to as the ‘Mathematical Expectation’ or ‘Expected Value’, and the variance are defined as:
$\mu = E(X) = \sum X \cdot P(X)$
and $\sigma^2 = V(X) = \sum X^2 \cdot P(X) - [E(X)]^2$
Distribution Function:
A function showing probabilities that a random variable X has a value less than or equal to x is called the
‘cumulative distribution function’ or ‘distribution function of x’.
Symbolically, the cumulative distribution function, denoted by f(x), is defined as:
$f(x) = P(X \le x)$
The cumulative distribution function has the following properties:
(i) f(−∞) = 0 and f(∞) = 1, which means that f(x) ranges from 0 to 1.
(ii) If a < b, then f(a) ≤ f(b) for any real numbers a and b, i.e., f(x) is a non-decreasing function.
For a discrete random variable, the distribution function is obtained by cumulating probabilities, just as cumulative frequencies are obtained from a frequency distribution.
The distribution function for the probability distribution of the previous two examples is as below:
x f(x)
x < 0 0
0 ≤ x < 1 1/8
1 ≤ x < 2 4/8
2 ≤ x < 3 7/8
x ≥ 3 1

x f(x)
x < 1 0
1 ≤ x < 2 2/14
2 ≤ x < 3 5/14
3 ≤ x < 4 9/14
x ≥ 4 1
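The distribution function of a discrete random variable is a step function: it is constant between the possible values and jumps by P(x) at each one. The sketch below reproduces both tables by cumulating the probabilities. The second distribution uses P(X = x) = (x + 1)/14, inferred from its tabulated values.

```python
from fractions import Fraction


def distribution_function(dist, x):
    """f(x) = P(X <= x): add the probabilities of all values not exceeding x."""
    return sum((p for value, p in dist.items() if value <= x), Fraction(0))


heads = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
second = {x: Fraction(x + 1, 14) for x in range(1, 5)}

for point in (-1, 0, 1.5, 2, 3):
    print(point, distribution_function(heads, point))
```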
Example:
Calculate the mean and variance for the following probability distribution:
X 0 1 2 3 4 5 6 7
P(X) 0.11 0.23 0.34 0.16 0.10 0.06 0.04 0.01
Solution:
X P(X) X·P(X) X²·P(X)
0 0.11 0 0
1 0.23 0.23 0.23
2 0.34 0.68 1.36
3 0.16 0.48 1.44
4 0.10 0.40 1.60
5 0.06 0.30 1.50
6 0.04 0.24 1.44
7 0.01 0.07 0.49
Total 1.05 2.4 8.06
$\mu = E(X) = \sum X \cdot P(X) = 2.4$
$\sigma^2 = V(X) = \sum X^2 \cdot P(X) - [E(X)]^2 = 8.06 - (2.4)^2 = 2.3$
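The expectation and variance can be computed directly from the two rows of the table. While reproducing the arithmetic, note that the listed probabilities add up to 1.05 rather than 1, so at least one probability in the example appears to be misprinted. The sketch below only repeats the calculation and flags the total.

```python
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ps = [0.11, 0.23, 0.34, 0.16, 0.10, 0.06, 0.04, 0.01]

total_p = sum(ps)                                    # should be 1 for a valid distribution
mean = sum(x * p for x, p in zip(xs, ps))            # E(X) = Σ X·P(X)
mean_sq = sum(x ** 2 * p for x, p in zip(xs, ps))    # Σ X²·P(X)
variance = mean_sq - mean ** 2                       # V(X) = Σ X²·P(X) − [E(X)]²

if abs(total_p - 1) > 1e-9:
    print(f"Warning: probabilities sum to {total_p:.2f}, not 1")
print(f"E(X) = {mean:.2f}, V(X) = {variance:.2f}")
```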
Binomial Probability Distribution:
1. Binomial probability is a mathematical formula to determine probabilities of the discrete values of a random variable called a ‘Binomial Random Variable’.
2. The following are the conditions of Binomial Probability:
(i) Each trial of the experiment has only two possible outcomes, i.e., success or failure.
(ii) The probability of ‘success’ is denoted by ‘p’ and the probability of ‘failure’ is denoted by
‘q’ where q = 1 – p or p + q = 1.
(iii) Such an experiment is repeated n times independently. In independent repetitions, the probability p remains constant.
3. The number of successes in n trials is the Binomial Random Variable and is denoted by X. The possible values of X are 0, 1, 2, 3, 4, …, n. The probabilities of the values of X are calculated by the following formula:
$P(X = x) = \binom{n}{x} p^x q^{n-x}$
Where x = 0, 1, 2, 3, 4, …, n
The above formula is ‘Binomial Probability Distribution’. The two constant quantities p and n are
called the parameters of a Binomial Distribution. The quantity q is not a separate parameter because q =
1 – p.
Mean and Variance of a Binomial Distribution:
The mean and variance of a binomial distribution are directly evaluated in terms of its parameters p and n:
$\mu = np$ and $\sigma^2 = npq$
Example:
A coin is tossed 3 times. ‘Number of heads’ in 3 tosses is the random variable X. Calculate probabilities of all
possible values of X. Also calculate mean and variance.
Solution:
Experiment: A coin is tossed 3 times.
Success: Head
p = P(success) = P(head) = ½
n = number of times the coin is tossed = 3
x = 0, 1, 2, 3.
Now applying the Binomial Formula:

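$P(X = 0) = \binom{3}{0}\left(\tfrac{1}{2}\right)^0\left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8} = 0.125$
$P(X = 1) = \binom{3}{1}\left(\tfrac{1}{2}\right)^1\left(\tfrac{1}{2}\right)^2 = \tfrac{3}{8} = 0.375$
$P(X = 2) = \binom{3}{2}\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right)^1 = \tfrac{3}{8} = 0.375$
$P(X = 3) = \binom{3}{3}\left(\tfrac{1}{2}\right)^3\left(\tfrac{1}{2}\right)^0 = \tfrac{1}{8} = 0.125$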
Mean and Variance:
X P(X) X·P(X) X²·P(X)
0 0.125 0 0
1 0.375 0.375 0.375
2 0.375 0.75 1.5
3 0.125 0.375 1.125
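Total 1 1.5 3
$\mu = \sum X \cdot P(X) = 1.5 = np = 3 \times \tfrac{1}{2}$
$\sigma^2 = \sum X^2 \cdot P(X) - \mu^2 = 3 - (1.5)^2 = 0.75 = npq = 3 \times \tfrac{1}{2} \times \tfrac{1}{2}$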
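The binomial formula and the shortcut results μ = np and σ² = npq can be cross-checked in a few lines. The sketch uses `math.comb` (Python 3.8+) for the binomial coefficient.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 3, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))
variance = sum(x ** 2 * px for x, px in enumerate(probs)) - mean ** 2

print(probs)                          # [0.125, 0.375, 0.375, 0.125]
print(mean, n * p)                    # table value vs np
print(variance, n * p * (1 - p))      # table value vs npq
```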
Hyper Geometric Probability Distribution:
1. It is a formula to determine the probabilities of the values for a random variable called ‘Hyper
Geometric Random Variable’.
2. Following are the conditions of hyper geometric random variable:
(i) There are N items of which K are of first kind and the remaining (N – K) are of second
kind,
(ii) A sample of n items is randomly drawn without replacement from the N items.
Number of items of the first kind in the sample is the random variable X:
Possible values of X are 0, 1, 2, …, K when n ≥ K and
0, 1, 2, …, n when n < K
3. The probabilities of these values are calculated by the formula:
$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$
Where x = 0, 1, 2, 3, …, K when n ≥ K
And x = 0, 1, 2, 3, …, n when n < K
The above formula is called ‘Hyper Geometric Probability Distribution’. A schematic explanation of
this formula may be given as:
Example:
A committee of 3 persons is to be formed from among 3 men and 2 women. If the selection of the committee
members is random, construct the probability distribution of the random variable ‘Number of women in the
committee’.
Solution:
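N = 5 (persons), K = 2 (women), n = 3 (committee members)
$P(X = x) = \frac{\binom{2}{x}\binom{3}{3-x}}{\binom{5}{3}}$, where x = 0, 1, 2 (since n ≥ K)
P(X = 0) = (1 × 1)/10 = 0.1, P(X = 1) = (2 × 3)/10 = 0.6, P(X = 2) = (1 × 3)/10 = 0.3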
The Hyper Geometric Probability Distribution of RV ‘No. of Women in the Committee’ is as follows:
X P(X)
0 0.1
1 0.6
2 0.3
Total 1
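The committee example can be checked with the hypergeometric formula written using exact fractions. `math.comb` returns 0 when asked to choose more items than are available, so impossible combinations automatically get zero probability.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pmf(x, N, K, n):
    """P(X = x) = KCx * (N-K)C(n-x) / NCn, kept exact as a fraction."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))


N, K, n = 5, 2, 3  # 5 persons, K = 2 women, committee of n = 3
dist = {x: hypergeometric_pmf(x, N, K, n) for x in range(min(n, K) + 1)}

for x, prob in dist.items():
    print(x, prob, float(prob))
```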
Poisson Probability Distribution:
1. A random variable created by counting the number of items or events in a unit of either time or space is
called a ‘Poisson Random Variable’.
2. Examples of Poisson random variables are the number of accidents per day on a highway, the number of cars arriving at a petrol pump in a five-minute period, the number of typing mistakes per page and the number of defects in a painted surface.
3. A Poisson probability distribution formula assigns probabilities to the values of the ‘Poisson Random Variable’:
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
Where x = 0, 1, 2, 3, …
4. Here λ (lambda) is the only parameter of the distribution and e is the mathematical constant 2.71828…. The distribution applies under the following conditions:
(i) The number of events per unit of time or space remains stable for a long period of time. This average is the parameter of the distribution, denoted by λ.
(ii) The number of events in one time period is independent of the number of events in another time period.
Example:
In an industry, the average number of damaged output units per week is 10. What is the probability that there
will be (i) no damaged unit in the next week, (ii) 5 damaged units in the next week, and (iii) 15 damaged units
in the next week.
Solution:
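(i) No damaged unit in the next week:
X = number of damaged output units next week = 0
λ = average number of damaged units per week = 10
$P(X = 0) = \frac{e^{-10}\,10^0}{0!} = 0.0000454$
(ii) 5 damaged units in the next week:
$P(X = 5) = \frac{e^{-10}\,10^5}{5!} = 0.0378$
(iii) 15 damaged units in the next week:
$P(X = 15) = \frac{e^{-10}\,10^{15}}{15!} = 0.0347$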
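The three Poisson probabilities can be evaluated directly from the formula using only the standard `math` module.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e**(-lam) * lam**x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


lam = 10  # average number of damaged units per week
for x in (0, 5, 15):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.7f}")
```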
Poisson Approximation to Binomial Distribution:
The computations involved in binomial distributions become quite tedious when n is large. In such cases the binomial distribution can be approximated by a Poisson distribution with λ = np under the following conditions:
(i) n is very large,
(ii) p is very small, and
(iii) np is finite.
A frequently used rule of thumb is that the approximation is appropriate when p ≤ 0.05 and n ≥ 20. However, the Poisson distribution sometimes provides close approximations even in cases where n is not large or p is not very small.
Example:
In a village, the local government estimated that 2% of the population is infected with seasonal flu due to the absence of proper medication. What is the probability that the number of infected persons in a random sample of 50 will be 4?
Solution:
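Using the binomial distribution with:
n = 50, p = 0.02 and x = 4
$P(X = 4) = \binom{50}{4}(0.02)^4(0.98)^{46} = 0.0145$
Using the Poisson approximation to the binomial with:
λ = np = 50 × 0.02 = 1
$P(X = 4) = \frac{e^{-1}\,1^4}{4!} = 0.0153$
The Poisson probability is close to the binomial probability.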
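The exact binomial probability and its Poisson approximation can be compared side by side.

```python
from math import comb, exp, factorial

n, p, x = 50, 0.02, 4

binomial = comb(n, x) * p ** x * (1 - p) ** (n - x)   # exact binomial probability
lam = n * p                                           # Poisson parameter, lambda = np
poisson = exp(-lam) * lam ** x / factorial(x)         # Poisson approximation

print(f"binomial = {binomial:.4f}, Poisson = {poisson:.4f}")
```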
Mean and Variance of Poisson Distribution:
The mean of a Poisson Random Variable is the parameter of the Poisson distribution λ, that is:
E(X) = λ
The variance is also the parameter λ:
V(X) = λ
Thus mean and variance of Poisson distribution are equal to λ.
Continuous Probability Distribution (In Detail):
1. The concept of probability for a continuous random variable is somewhat different from that of a discrete random variable.
2. The function or formula of a continuous probability distribution is generated and its curve is drawn on graph paper such that:
(i) the function is non-negative for all possible values of the random variable, and
(ii) the total area under the curve of the function is one.
This function is called ‘probability density function’ and its curve a ‘probability curve’.
3. The probability of an interval from a to b is defined as the area under the probability curve between the
two vertical lines erected on the x-axis at the points a and b.
4. The probability of an individual value under the continuous probability distribution is considered zero.
5. Probabilities of continuous random variable are represented by areas under the probability curve.
Normal Probability Distribution:
1. The most important and widely used probability density function is the ‘Normal Distribution’ where
probability curve is a bell shaped symmetrical curve:
2. The mathematical form of the Normal Probability Density Function is:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Where −∞ < x < ∞
3. A normal probability distribution or its probability curve is characterised by two quantities, μ and σ, called the parameters of the distribution.
4. Two normal curves with different means μ and equal standard deviations σ are as below:
5. The normal curves with different standard deviations σ and equal means μ:
6. Two normal curves with different means μ and different standard deviations σ:
Area under Normal Curve:
1. The area between two limits of an interval under a normal probability curve cannot be determined
analytically.
2. Tables of numerically evaluated areas could be constructed for a particular normal curve, but it would be impossible to construct them for the infinite number of normal curves corresponding to all values of μ and σ.
3. This problem is overcome by the ‘Standard Normal Probability Distribution’, whose mean is zero (μ = 0) and standard deviation is one (σ = 1). The standard normal variable is denoted by ‘z’:
4. The table of areas under the standard normal curve is used to find area under normal probability curve:
5. Following steps are involved in determining the area or probability of a particular interval of a normal
distribution with μ and σ:
(i) Determine the z-values for each limit of interval,
(ii) From the normal area table, determine the area for each z-value,
(iii) Subtract the smaller area from the larger one.
6. Precisely, a value of the random variable ‘x’ can be converted to a value ‘z’ by:
$z = \frac{x - \mu}{\sigma}$
Where μ and σ are the mean and standard deviation of the random variable x.
7. Conversely, the z-value can be converted into random variable x by:
x=μ+σ·z
8. ‘z’ is the number of standard deviations from or to the mean. All intervals containing the same number
of standard deviations from mean will contain the same area under the curve for any normal distribution.
9. The ‘Normal Area Table’ gives the area under the curve to the left of a z-value. For example, for z = 1.51, the area under the normal curve (as shown in the table) is 0.9345; for z = −2.69, the area under the normal curve (from the table) is 0.0036.
10. The following rules should be remembered:
(i) Area to the left of z = 0 is 0.5000
(ii) Area to the left of z = −∞ is 0
(iii) Area to the left of z = +∞ is 1.0000
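The entries of a normal area table can be reproduced with the error function. For the standard normal distribution the area to the left of z is ½[1 + erf(z/√2)]. The sketch below checks the two table values quoted above and the three rules.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


for z in (1.51, -2.69, 0.0):
    print(f"z = {z:5.2f}: area to the left = {phi(z):.4f}")

print(phi(float("-inf")), phi(float("inf")))  # rules (ii) and (iii): 0.0 and 1.0
```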
Example:
A normal random variable x has mean μ = 24 and standard deviation σ = 1.8. Determine z-values for x = 14, 15.9, 29.2 and 33. Also show these values on a normal curve.
Solution:
For x = 14; z = (14 − 24)/1.8 = −5.56
For x = 15.9; z = (15.9 − 24)/1.8 = −4.5
For x = 29.2; z = (29.2 − 24)/1.8 = 2.89
For x = 33; z = (33 − 24)/1.8 = 5
Example:
A normal random variable x has mean μ = 36 and standard deviation 2.05. Determine the values of x for z = −3.36, −1.8, 0.95 and 2.75.
Solution:
x=μ+σ·z
For z = – 3.36; x = 36 + 2.05 × (– 3.36) = 29.112 ≈ 29.11
For z = – 1.8; x = 36 + 2.05 × (– 1.8) = 32.31
For z = 0.95; x = 36 + 2.05 × 0.95 = 37.9475 ≈ 37.95
For z = 2.75; x = 36 + 2.05 × 2.75 = 41.6375 ≈ 41.64
Example:
The mean and SD of a normal random variable are 34.5 and 5.8 respectively. Find the following areas:
(i) to the left of 19.5
(ii) to the right of 40
(iii) between 19.5 and 40
Solution:
(i) to the left of 19.5, i.e., P(x ≤ 19.5):
P(−∞ ≤ x ≤ 19.5) = P(−∞ ≤ z ≤ −2.59) = 0.0048
Where z = (19.5 − 34.5)/5.8 = −2.59
(ii) to the right of 40, i.e., P(x ≥ 40):
P(40 ≤ x ≤ ∞) = P(0.95 ≤ z ≤ ∞) = 1 − 0.8289 = 0.1711
Where z = (40 − 34.5)/5.8 = 0.95
(iii) between 19.5 and 40, i.e., P(19.5 ≤ x ≤ 40):
P(19.5 ≤ x ≤ 40) = P(−2.59 ≤ z ≤ 0.95) = 0.8289 − 0.0048 = 0.8241
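The three areas can be checked numerically. To mimic reading a printed table, the sketch rounds each z to two decimals before looking up the area. The area to the right of a point is 1 minus the area to its left.

```python
import math


def phi(z):
    """Standard normal area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 34.5, 5.8

# Round z to two decimals, as when reading a printed normal area table
z_low = round((19.5 - mu) / sigma, 2)   # -2.59
z_high = round((40 - mu) / sigma, 2)    # 0.95

left = phi(z_low)                   # (i)   P(x <= 19.5)
right = 1 - phi(z_high)             # (ii)  P(x >= 40)
between = phi(z_high) - phi(z_low)  # (iii) P(19.5 <= x <= 40)

print(f"(i) {left:.4f}  (ii) {right:.4f}  (iii) {between:.4f}")
```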
Continuity Correction:
1. A population with unknown mean and standard deviation can be assumed to be a normal population if the frequency distribution of a sample is symmetrical. The sample mean and sample standard deviation are used as estimates of the population mean and population standard deviation respectively.
2. Observations or data are always discrete, recorded up to a certain degree of accuracy irrespective of
whether the variable itself is discrete or continuous.
3. When the symmetrical distribution of any data is assumed to be normal, a continuity correction is
applied to the observed values to make the data continuous.
4. If the data are recorded in whole numbers, data values are considered as midpoints of the intervals x ± 0.5; if the data are recorded up to one decimal place, data values are considered as midpoints of the intervals x ± 0.05, and so on. It should be noted that 0.5 (or 0.05) is subtracted from the lower (‘at least’) limit and added to the upper (‘at most’) limit.
Normal Approximation to Binomial Distribution:
A Binomial Distribution with large n and moderate p can be approximated by a Normal Distribution with mean μ = np and standard deviation σ = √(npq):
$\mu = np, \qquad \sigma = \sqrt{npq}$
Example:
A pair of dice is rolled 800 times. What is the probability that a total of 6 occurs:
(i) at least 100 times, and
(ii) between 150 and 300 times?
Solution:
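n = 800
p = 5/36, since the event E = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} contains 5 of the 36 equally likely outcomes
q = 1 − p = 31/36
μ = np = 800 × 5/36 = 111.11, σ = √(npq) = √(800 × 5/36 × 31/36) = 9.78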
(i) Probability of at least 100 times, i.e., P(100 ≤ x ≤ 800) or P(99.5 ≤ x ≤ 800.5):
P(–1.19 ≤ z ≤ 70.49)
From ‘Normal Area Table’ the Normal Area corresponding to – 1.19 is 0.1170
= 1 – 0.1170 = 0.8830
(ii) Probability of between 150 and 300 times, i.e., P(150 ≤ x ≤ 300) or P(149.5 ≤ x ≤ 300.5):
P(3.93 ≤ z ≤ 19.37)
From ‘Normal Area Table’ the Normal Area corresponding to 3.93 is 0.99996 and that corresponding to 19.37 is practically 1
= 1 – 0.99996 = 0.00004
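The dice example can be checked by computing the normal approximation (via `math.erf`) and comparing it with the exact binomial probabilities. The exact terms are summed in log space with `math.lgamma` to avoid very large binomial coefficients. The sketch uses unrounded μ, σ and z, so part (i) comes out near 0.882 rather than the table-based 0.8830. In part (ii) the lower limit 149.5 lies almost four SDs above the mean, so the probability is tiny.

```python
import math


def phi(z):
    """Standard normal area to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


n, p = 800, 5 / 36          # P(total of 6) with two fair dice
q = 1 - p
mu = n * p
sigma = math.sqrt(n * p * q)


def binomial_pmf(x):
    """Exact binomial probability, computed via logarithms to avoid huge coefficients."""
    log_prob = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                + x * math.log(p) + (n - x) * math.log(q))
    return math.exp(log_prob)


# (i) at least 100 times: continuity-corrected limits 99.5 and 800.5
approx_i = phi((800.5 - mu) / sigma) - phi((99.5 - mu) / sigma)
exact_i = sum(binomial_pmf(x) for x in range(100, n + 1))

# (ii) between 150 and 300 times: limits 149.5 and 300.5
approx_ii = phi((300.5 - mu) / sigma) - phi((149.5 - mu) / sigma)
exact_ii = sum(binomial_pmf(x) for x in range(150, 301))

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"(i)  normal {approx_i:.4f}, exact binomial {exact_i:.4f}")
print(f"(ii) normal {approx_ii:.6f}, exact binomial {exact_ii:.6f}")
```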
Regression
1. The term ‘regression’ was used by Sir Francis Galton in connection with the studies he made on the statures of fathers and sons.
2. It is a technique which determines a relationship between two variables to estimate one of the variables
(dependent) for a given value of the other variable (independent).
3. The variable whose value is to be estimated is called dependent variable (y) whereas the variable whose
value is given is called independent variable (x).
4. Examples of dependent and independent variables are:
Independent Dependent
Price Demand
Rainfall Yield
Credit sales Bad debts
Volume of production Manufacturing expenses
5. The values of the independent variable are assumed to be fixed. Hence it is not a random variable. On
the other hand, the dependent variable, whose values are determined on the basis of the independent
variable, is a random variable.
6. If x is the independent variable and y is the dependent variable then the relationship between x and y,
described by a straight line (y = a + bx), is called ‘linear relationship’.
Regression Lines:
1. If we plot the paired observations $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ on a graph, the resulting set of points
is called a ‘scatter diagram’.
2. A scatter diagram indicates a relationship between the variables X and Y and the dots of the scatter
diagram tend to cluster around a curve or a line. Such a curve or line is known as ‘curve of regression’
or ‘line of regression’.
Linear Regression Model:

1. For a fixed value of the independent variable ‘x’, if the value of the dependent variable ‘y’ is observed a large number of times, different values are possible each time because of the random error involved in the measurement process. The mean of these y-values is called the ‘conditional mean of y given x’ and is denoted by $\mu_{y/x}$.
2. The linear relationship between $\mu_{y/x}$ and x is called a ‘population regression equation of y on x’:
$\mu_{y/x} = \alpha + \beta x$
Where α and β are the parameters of the equation.
3. An observation $y_i$ is the sum of a population mean and a component called ‘Random Error (ε)’ (read as “epsilon”):
$y_i = \mu_{y/x} + \varepsilon_i$
or
$y_i = \alpha + \beta x_i + \varepsilon_i$
This equation is called a ‘linear regression model of y on x’, and ε is a random variable with mean equal to zero and variance $\sigma^2$.
4. In the above diagram, the line represents the line of regression of Y on X. The parameter α, which is the
expected value of Y when X = 0, is called Y-intercept. The parameter β is slope of the population
regression line and is known as the ‘population regression coefficient’. When the line slopes
downward to the right, the value of β will be negative; it then represents the amount of decrease in Y
for each unit increase in X.
5. In practice, the population regression line is unknown. Since the regression is defined by the Y-intercept
α and the slope β, therefore, the task of estimating the population regression line involves obtaining the
estimates of α and β (based on sample data). Thus the ‘population regression line’ ($\mu_{y/x} = \alpha + \beta x$) is estimated by the ‘sample regression line’ or ‘sample regression equation’:
$\hat{y} = a + bx$ (i)
The problem of estimating the regression parameters α and β can be considered as fitting the best model
on the scatter diagram. One method for this purpose is the ‘method of least squares’.
Method of Least Squares:
1. According to the principle of least squares, a line or a curve is best fitted if the sum of squares of the deviations of the estimated values of y from the observed values of y is a minimum. Such a line or curve is called the ‘least squares curve’ or ‘least squares line’, and the sum of squares is called the ‘Error Sum of Squares (ESS)’. Therefore, the ESS is to be minimised and is represented by:
$ESS = \sum (y_i - \hat{y}_i)^2$
Where ESS: error sum of squares
$y_i$: observed values
$\hat{y}_i$: estimated values, i.e., $(a + bx_i)$
It is further elaborated as:
$ESS = \sum (y_i - a - bx_i)^2$
2. The statistic b, an estimator of β, is known as the ‘sample regression coefficient’. It measures the change in y per unit change in x; therefore, it represents the slope of the regression line. Mathematically it is represented as below:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ (ii)(a)
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ (ii)(b)
3. The statistic a, the estimator of α, is called the ‘sample regression constant’, and it measures the y-intercept of the sample regression line:
$a = \bar{y} - b\bar{x}$ (iii)
4. Now assume ‘y’ to be ‘independent’ and ‘x’ to be ‘dependent’. The ‘regression equation of x on y’ is as follows:
$\hat{x} = c + dy$ (i)
$d = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$ (ii)(a)
$d = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$ (ii)(b)
$c = \bar{x} - d\bar{y}$ (iii)
Example:
A sample of paired observations is given as below:
X 2 4 6 7 9 10 11
Y 1 2 4 7 10 12 14
Required:
(a) Fit a line of regression to the data in the above table.
(b) Construct a scatter diagram and graph the fitted line on the scatter diagram, and
(c) Calculate error sum of squares.
Solution:
(a):
Regression Line of Y on X
x y xy x² ŷ = a + bx y − ŷ (y − ŷ)²
2 1 2 4 –0.438 1.438 2.068
4 2 8 16 2.594 –0.594 0.353
6 4 24 36 5.626 –1.626 2.644
7 7 49 49 7.142 –0.142 0.020
9 10 90 81 10.174 –0.174 0.030
10 12 120 100 11.69 0.31 0.096
11 14 154 121 13.206 0.794 0.630
Total 49 50 447 407 49.994 (≈ 50) 0.006 (≈ 0) 5.841
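Using (ii)(a): b = (7 × 447 − 49 × 50)/(7 × 407 − 49²) = 679/448 = 1.516
Using (iii): a = ȳ − b x̄ = 50/7 − 1.516 × 49/7 = 7.143 − 10.612 = −3.47
Substituting in (i): ŷ = −3.47 + 1.516x
For x = 2, ŷ = −3.47 + 1.516(2) = −0.438
x = 4, ŷ = 2.594
x = 6, ŷ = 5.626
x = 7, ŷ = 7.142
x = 9, ŷ = 10.174
x = 10, ŷ = 11.69
x = 11, ŷ = 13.206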
(b):
(c) Error Sum of Squares (ESS):
ESS = Σ(y − ŷ)² = 5.841
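The least-squares fit and the ESS can be reproduced with plain Python. The sketch keeps a and b unrounded, so its ESS (about 5.8415) differs in the fourth decimal from the hand calculation, which used a = −3.47 and b = 1.516. It also illustrates the least-squares property that the residuals sum to zero, which the table shows as ≈ 0.

```python
x = [2, 4, 6, 7, 9, 10, 11]
y = [1, 2, 4, 7, 10, 12, 14]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Formula (ii)(a) for the slope and (iii) for the intercept
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

y_hat = [a + b * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
ess = sum(r ** 2 for r in residuals)

print(f"y-hat = {a:.4f} + {b:.4f}x")
print(f"ESS = {ess:.4f}, sum of residuals = {sum(residuals):.1e}")
```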
Coefficient of Determination:
1. A measure of variation in a sample of n values is given by the sample variance:
$S_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
It measures the variation in y about the sample mean $\bar{y}$. The term $\sum (y - \bar{y})^2$ is called the ‘Total Sum of Squares (TSS)’.
2. Another measure of variance in a sample of n paired values is called the ‘variance of estimate’:
$S_{y/x}^2 = \frac{\sum (y - \hat{y})^2}{n - 2}$
It measures the variation in y about the estimated regression line. The term $\sum (y - \hat{y})^2$ is called the ‘Error Sum of Squares (ESS)’:
ESS ≤ TSS
3. The ‘Regression Sum of Squares (RSS)’ is the difference or excess of TSS over ESS:
RSS = TSS – ESS
Therefore, the TSS is partitioned into two components, i.e., ESS and RSS:
TSS = RSS + ESS
4. RSS is the variation in y reduced (or explained) by the regression equation, and ESS is the variation which remains (unexplained) in y when the regression line is fitted. Thus, the total variation is divided into two parts, i.e., explained variation and unexplained variation.
5. RSS is used as a measure of the reliability of the estimate obtained by the fitted regression line. For this purpose the proportion of variation explained by the regression equation, called the ‘Coefficient of Determination’ and denoted by r², is calculated as:
$r^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$
Note that the minimum value of r² is zero (when RSS = 0 and ESS = TSS), and the maximum value of r² is +1 (when RSS = TSS and ESS = 0); therefore, r² lies between 0 and 1:
$0 \le r^2 \le 1$
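6. Another formula is:
$r^2 = \frac{(n\sum xy - \sum x \sum y)^2}{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}$
7. Coefficient of determination from the two regression coefficients:
$r^2 = b \times d$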
Example:
Take the previous example, and calculate the coefficient of determination.
Solution:
Coefficient of Determination
x y xy x² y²
2 1 2 4 1
4 2 8 16 4
6 4 24 36 16
7 7 49 49 49
9 10 90 81 100
10 12 120 100 144
11 14 154 121 196
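Total 49 50 447 407 510
r² = (7 × 447 − 49 × 50)² / [(7 × 407 − 49²)(7 × 510 − 50²)] = 679² / (448 × 1070) = 0.9618
Thus about 96% of the variation in y is explained by the fitted regression line.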
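The three routes to r² described above should agree:
- the computational formula in terms of Σx, Σy, Σxy, Σx² and Σy²;
- the product b × d of the two regression coefficients;
- the partition r² = 1 − ESS/TSS.

The sketch computes all three for this data set.

```python
x = [2, 4, 6, 7, 9, 10, 11]
y = [1, 2, 4, 7, 10, 12, 14]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2 = sum(xi ** 2 for xi in x)
sy2 = sum(yi ** 2 for yi in y)

numerator = n * sxy - sx * sy
r2 = numerator ** 2 / ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))   # formula 6

b = numerator / (n * sx2 - sx ** 2)   # slope of y on x
d = numerator / (n * sy2 - sy ** 2)   # slope of x on y
r2_from_slopes = b * d                # formula 7

# Partition of the total variation: r^2 = 1 - ESS/TSS
a = sy / n - b * sx / n
tss = sum((yi - sy / n) ** 2 for yi in y)
ess = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r2_from_ss = 1 - ess / tss

print(f"{r2:.4f} {r2_from_slopes:.4f} {r2_from_ss:.4f}")
```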
Sampling Distribution Theory II

Sampling Distribution of Proportion:
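1. The sampling distribution of proportion is the distribution of the sample proportion p, defined as:
$p = \frac{x}{n}$
Where x is the number of successes (values with a specified characteristic) in a sample of size n.
2. If the sampling procedure is simple random, with replacement, x is recognised as a Binomial Random Variable with parameters n and π, where π is the probability of success. π can also be interpreted as the population proportion, since:
$\pi = \frac{X}{N}$, where X is the number of successes in a population of size N.
3. To determine the mean and variance of p:
Infinite Population with Replacement:
$\mu_p = \pi, \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
Finite Population without Replacement:
$\mu_p = \pi, \qquad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}\sqrt{\frac{N-n}{N-1}}$
or alternatively
$\sigma_p^2 = \frac{\pi(1-\pi)}{n} \cdot \frac{N-n}{N-1}$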
Example:
A coordination team consists of seven members. The education of each member is as follows (G = Graduate, PG = Post Graduate):
Members 1 2 3 4 5 6 7
Education G PG PG PG PG G G
(i) Determine the proportion of post-graduates in the population.
(ii) Select all possible samples of two members from the population without replacement, and compute
the proportion of post-graduate members in each sample.
(iii) Compute the mean ($\mu_p$) and the SD ($\sigma_p$) of the sample proportions computed in (ii).
Solution:
(i) Proportion of PG in the population:
N=7
No. of PG = 4
π = 4/7 = 0.57
(ii) No. of possible samples (without replacement) = $^{N}C_{n} = {}^{7}C_{2} = 21$ samples:
1,2 1,3 1,4 1,5 1,6 1,7
2,3 2,4 2,5 2,6 2,7
3,4 3,5 3,6 3,7
4,5 4,6 4,7
5,6 5,7
6,7
The corresponding sample proportions are:
0.5 0.5 0.5 0.5 0 0
1 1 1 0.5 0.5
1 1 0.5 0.5
1 0.5 0.5
0.5 0.5
0
Sampling Distribution of Proportion
p Tally Marks f P(p)
0 ||| 3 3/21 = 1/7 = 0.143
0.5 ||||| ||||| || 12 12/21 = 4/7 = 0.571
1 ||||| | 6 6/21 = 2/7 = 0.286
Total 21 1
p P(p) p·P(p) (p – μ_p) (p – μ_p)² (p – μ_p)²·P(p) p²·P(p)
0 0.143 0 –0.5715 0.32661 0.04671 0
0.5 0.571 0.2855 –0.0715 0.00511 0.00292 0.14275
1 0.286 0.286 0.4285 0.18361 0.05251 0.286
Total 1.000 0.5715 – – 0.10214 0.42875
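(iii) Mean ($\mu_p$) and SD ($\sigma_p$) of the sampling distribution of p:
$$\mu_p = \sum p\,P(p) = 0.5715$$
$$\sigma_p^2 = \sum (p - \mu_p)^2 P(p) = 0.10214, \qquad \sigma_p = \sqrt{0.10214} = 0.3196$$
or alternatively
$$\sigma_p^2 = \sum p^2 P(p) - \mu_p^2 = 0.42875 - (0.5715)^2 = 0.42875 - 0.32661 = 0.10214$$
The results are verified as below (sampling without replacement, N = 7, n = 2):
$$\mu_p = \pi = \frac{4}{7} = 0.571$$
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n} \cdot \frac{N - n}{N - 1}} = \sqrt{\frac{(4/7)(3/7)}{2} \cdot \frac{5}{6}} = \sqrt{\frac{5}{49}} = 0.3194$$
which agrees with the value above apart from rounding (the table uses probabilities rounded to three decimals).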
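The enumeration in parts (ii) and (iii) can be checked mechanically. The following is a minimal Python sketch, assuming exact arithmetic with `fractions.Fraction` to avoid the rounding in the tables. It lists all 21 samples, builds the distribution of p, and compares the enumerated variance with the finite-population formula.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Education of members 1..7 (G = Graduate, PG = Post Graduate)
education = ["G", "PG", "PG", "PG", "PG", "G", "G"]
N, n = len(education), 2
pi = Fraction(education.count("PG"), N)              # population proportion, 4/7

# All 7C2 = 21 samples of two members drawn without replacement
samples = list(combinations(range(N), n))
props = [Fraction(sum(education[i] == "PG" for i in s), n) for s in samples]

freq = Counter(props)                                 # sampling distribution of p
mean_p = sum(props) / len(props)
var_p = sum((p - mean_p) ** 2 for p in props) / len(props)

# Formula with the finite population correction, for comparison
var_formula = pi * (1 - pi) / n * Fraction(N - n, N - 1)

print(len(samples), {float(k): v for k, v in sorted(freq.items())})
print(mean_p, var_p, var_formula, math.sqrt(var_p))
```

The first line prints `21 {0.0: 3, 0.5: 12, 1.0: 6}`. The second shows μ_p = 4/7, with both variances equal to 5/49 (σ_p ≈ 0.3194).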
Shape of the Sampling Distribution of Proportion p:
The central limit theorem also holds for the random variable p, which states that:
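(i) The sampling distribution of the proportion p approaches a normal distribution with mean $\mu_p = \pi$ and SD $\sigma_p = \sqrt{\pi(1 - \pi)/n}$ (with replacement).
(ii) If the random sampling is without replacement and the sampling fraction n/N is not negligible (a common rule of thumb is n/N > 0.05), the finite population correction (f.p.c.) must be used in the formula of SD as below:
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}}\,\sqrt{\frac{N - n}{N - 1}}$$
(iii) When n ≥ 50 and both nπ and n(1 – π) are greater than 5, the sampling distribution can be considered ‘normal’.
(iv) When the distribution of p is normal, the following statistic will be a standard normal variable:
$$z = \frac{p - \pi}{\sigma_p}$$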
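Points (ii) to (iv) can be sketched in Python as follows. The helper names `sigma_p`, `proportion_z` and `normal_approx_ok` are hypothetical, and the numbers in the usage line are illustrative only (not taken from these notes). `normal_approx_ok` encodes the rule of thumb stated in (iii).

```python
import math

def sigma_p(pi, n, N=None):
    """SD of the sample proportion; applies the f.p.c. when a population size N is given."""
    sd = math.sqrt(pi * (1 - pi) / n)
    if N is not None:
        sd *= math.sqrt((N - n) / (N - 1))   # finite population correction
    return sd

def proportion_z(p, pi, n, N=None):
    """Standardised sample proportion z = (p - pi) / sigma_p."""
    return (p - pi) / sigma_p(pi, n, N)

def normal_approx_ok(n, pi):
    """Rule of thumb from (iii): n >= 50 with n*pi and n*(1 - pi) both above 5."""
    return n >= 50 and n * pi > 5 and n * (1 - pi) > 5

# Illustrative values: p = 0.46 observed in a sample of 100 when pi = 0.4
print(normal_approx_ok(100, 0.4), round(proportion_z(0.46, 0.4, 100), 4))
```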
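Sampling Distribution of Difference between Two Proportions:
1. If two random samples of sizes $n_1$ and $n_2$ are drawn independently from two populations with proportions $\pi_1$ and $\pi_2$, the sampling distribution of $(p_1 - p_2)$, the difference between the two sample proportions, approaches a normal distribution with:
$$\mu_{p_1 - p_2} = \pi_1 - \pi_2, \qquad \sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$
as $n_1$ and $n_2$ increase. Moreover,
$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sigma_{p_1 - p_2}}$$
will be a standard normal variable.
2. For unknown $\pi_1$ and $\pi_2$, the sample estimates $p_1$ and $p_2$ are used, thus:
$$\hat{\sigma}_{p_1 - p_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
3. When the two unknown population proportions can be assumed equal, a pooled estimate $\bar{p}$ of the common proportion is obtained as below:
$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
and the estimated standard error as below:
$$\hat{\sigma}_{p_1 - p_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$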
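Points 2 and 3 can be sketched in Python as a z statistic for the hypothesis that the two population proportions are equal (so π₁ – π₂ = 0 in the numerator). The function name `two_proportion_z` and the counts in the usage line are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2, pooled=True):
    """z statistic for H0: pi1 = pi2 from success counts x1, x2 in samples of n1, n2."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p_bar = (x1 + x2) / (n1 + n2)          # pooled estimate of the common proportion
        se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Hypothetical counts: 60 successes out of 100 versus 45 out of 100
print(round(two_proportion_z(60, 100, 45, 100), 4))
```

The pooled standard error is used only when equality of π₁ and π₂ can be assumed. Pass `pooled=False` to use the separate estimates of point 2.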
Sampling Distribution of t:
1. If a random sample of size n is drawn from a normal population with known mean μ and SD σ, the sampling distribution of the sample mean $\bar{x}$ is a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard error $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, and hence z is a standard normal variable:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
2. But when the population SD σ is unknown, σ is replaced by the sample SD ‘S’, and the standard error is estimated by:
$$S_{\bar{x}} = \frac{S}{\sqrt{n}}$$
3. According to W. S. Gosset, the resulting statistic is denoted by ‘t’ instead of ‘z’, and it follows another distribution known as ‘Student’s t-distribution’ or simply ‘t-distribution’:
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
4. The sample standard deviation is given by:
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
In the above equation, (n – 1) is called the ‘degrees of freedom’ or simply d.f., which is used to obtain the ‘t-value’ from the ‘t-table’.
5. The t-distribution approaches the standard normal distribution as n increases. Typically, when n > 30, the t-distribution is considered approximately standard normal.
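The steps above can be sketched in Python using the standard `statistics` module. `statistics.stdev` uses the (n – 1) divisor, matching the formula for S in point 4. The function name `one_sample_t` and the sample in the usage line are hypothetical.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, d.f.) for t = (x_bar - mu0) / (S / sqrt(n)), with S on n - 1 d.f."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)           # sample SD with divisor n - 1
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1

# Hypothetical sample tested against mu = 2
print(one_sample_t([1, 2, 3, 4, 5], 2))
```

The t-value obtained this way is compared with the tabulated value for the returned degrees of freedom.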
Properties of t-distribution:
1. The t-distribution, like the standard normal, is bell-shaped, unimodal and symmetrical about the mean.
2. There is a different t-distribution for every possible sample size.
3. The exact shape of the t-distribution depends on its parameter, the number of degrees of freedom, denoted by ν.
4. As the sample size increases, the shape of the t-distribution approaches that of the standard normal distribution.
5. The mean and standard deviation of the t-distribution are:
$$\mu = 0 \;\; (\nu > 1), \qquad \sigma = \sqrt{\frac{\nu}{\nu - 2}} \;\; (\nu > 2)$$
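Properties 4 and 5 can be seen numerically. The standard deviation of Student's t with ν > 2 degrees of freedom is √(ν/(ν – 2)), which falls towards 1 (the SD of the standard normal) as ν grows. `t_sd` is a hypothetical helper name.

```python
import math

def t_sd(nu):
    """Standard deviation of Student's t with nu degrees of freedom (finite only for nu > 2)."""
    if nu <= 2:
        raise ValueError("the variance of t is finite only for nu > 2")
    return math.sqrt(nu / (nu - 2))

for nu in (3, 5, 10, 30, 100):
    print(nu, round(t_sd(nu), 4))
```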

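Sampling Distribution of Variances:
Population Variance:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
or alternatively
$$\sigma^2 = \frac{\sum x^2}{N} - \mu^2$$
Mean of the sampling distribution of S² ($\mu_{S^2}$), when each sample variance is computed with divisor n, i.e. $S^2 = \sum (x - \bar{x})^2 / n$ as in the example below, and samples are drawn with replacement:
$$\mu_{S^2} = \frac{n - 1}{n}\,\sigma^2$$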
Example:
A population consists of the following numbers: 1, 3, 5, 7. Find the population variance (σ²) and the mean of the sampling distribution of variances ($\mu_{S^2}$), if all samples of size 2 are drawn with replacement from the population.
Solution:
No. of possible samples (with replacement) = $N^n = 4^2 = 16$ samples
Samples:
1,1 1,3 1,5 1,7
3,1 3,3 3,5 3,7
5,1 5,3 5,5 5,7
7,1 7,3 7,5 7,7
Means of samples:
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
Variances of samples:
0 1 4 9
1 0 1 4
4 1 0 1
9 4 1 0
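Sampling Distribution of S²:
S² Tally Marks f f·S²
0 |||| 4 0
1 ||||| | 6 6
4 |||| 4 16
9 || 2 18
Total 16 40
Population mean μ = (1 + 3 + 5 + 7)/4 = 4, so σ² = [(1 – 4)² + (3 – 4)² + (5 – 4)² + (7 – 4)²]/4 = 20/4 = 5.
Mean of the sampling distribution of S²: $\mu_{S^2}$ = Σf·S²/Σf = 40/16 = 2.5, which equals ((n – 1)/n)σ² = (1/2)(5) = 2.5.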
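The example can be reproduced by brute force. The following is a minimal Python sketch, assuming (as the table does) that each sample variance uses divisor n. It enumerates all 4² = 16 ordered samples with `itertools.product` and uses exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [1, 3, 5, 7]
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N          # population variance

def var_n(sample):
    """Sample variance with divisor n, as used in the table above."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)

samples = list(product(population, repeat=n))                 # N**n = 16 samples
mean_s2 = sum(var_n(s) for s in samples) / len(samples)

print(sigma2, mean_s2, Fraction(n - 1, n) * sigma2)
```

It prints `5 5/2 5/2`: the mean of S² is 2.5, which equals ((n – 1)/n)σ² rather than σ² itself, because the divisor is n.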
Pooled Estimate of Variance:
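1. If random samples of sizes $n_1$ and $n_2$ are drawn independently from two normal populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, the sampling distribution of the difference between the sample means, $(\bar{x}_1 - \bar{x}_2)$, follows a normal distribution with mean and standard error given as below:
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Thus, the statistic z will be equal to:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
and it will be a standard normal variable.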
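2. But if $\sigma_1^2$ and $\sigma_2^2$ are unknown and equal, their estimators $S_1^2$ and $S_2^2$ are defined as:
$$S_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$
When $\sigma_1^2$ and $\sigma_2^2$ are replaced by the estimators $S_1^2$ and $S_2^2$, the distribution of $(\bar{x}_1 - \bar{x}_2)$ can be standardised, provided that the samples are large ($n_1$ and $n_2$ > 30).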
3. But when the samples are small, i.e., 30 or fewer ($n_1$ and $n_2$ ≤ 30), $\sigma_1^2$ and $\sigma_2^2$ are replaced by a single estimator known as the ‘pooled variance’, denoted by $S_p^2$, which is the weighted average of $S_1^2$ and $S_2^2$:
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
where $(n_1 + n_2 - 2)$ is the degrees of freedom.
4. With equal sample sizes ($n_1 = n_2$), the estimator $S_p^2$ is the simple average of $S_1^2$ and $S_2^2$:
$$S_p^2 = \frac{S_1^2 + S_2^2}{2}$$
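5. The pooled variance $S_p^2$ assumes that the population variances are unknown but equal. However, the same $S_p^2$ is used to replace $\sigma_1^2$ and $\sigma_2^2$ when the population variances are slightly unequal, provided that the samples are of equal size, i.e., $n_1 = n_2$.
6. In both of the above situations, i.e., equal population variances, and slightly unequal population variances with equal samples ($n_1 = n_2$), the statistic t is calculated as below:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad \nu = n_1 + n_2 - 2$$
where $S_p$ is the pooled SD.
7. Now consider the situation where $\sigma_1^2$ and $\sigma_2^2$ are considerably different (both unknown) and it is impossible to draw samples of equal size. The statistic used in this case would be:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
where the degrees of freedom ν are as follows:
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$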
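Points 3, 6 and 7 can be sketched in Python for the hypothesis μ₁ = μ₂ (so μ₁ – μ₂ = 0 in the numerator). `statistics.variance` uses the (n – 1) divisor, matching $S_1^2$ and $S_2^2$. The function names and the two samples in the usage lines are hypothetical.

```python
import math
import statistics

def pooled_t(a, b):
    """Equal-variance two-sample t (H0: mu1 = mu2) and its d.f., n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    s1_2, s2_2 = statistics.variance(a), statistics.variance(b)    # divisor n - 1
    sp2 = ((n1 - 1) * s1_2 + (n2 - 1) * s2_2) / (n1 + n2 - 2)      # pooled variance
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(a, b):
    """Unequal-variance t (H0: mu1 = mu2) with the d.f. formula of point 7."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1
    v2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, nu

a = [1, 2, 3, 4, 5]        # hypothetical sample 1
b = [2, 4, 6, 8, 10]       # hypothetical sample 2
print(pooled_t(a, b))
print(welch_t(a, b))
```

With equal sample sizes, the two t statistics coincide and $S_p^2$ is the simple average of the two variances (point 4). Only the degrees of freedom differ: 8 for the pooled test and about 5.88 from the formula in point 7.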
Trend Series Analysis II

Measurement of Seasonal Trend:
(a) Simple Average Method,
(b) Link Relative Method,
(c) Ratio to Moving Average Method, and
(d) Ratio to Trend Method.
(a) Simple Average Method: Under this method, the average ($\bar{y}$) of the monthly or quarterly values is found for each year. Each monthly or quarterly value ($y_i$) is divided by the corresponding yearly average and the result is expressed as a percentage:
$$\frac{y_i}{\bar{y}} \times 100$$
Then the mean index, or seasonal index ($S_i$), is calculated for each month or quarter. If the mean of all the seasonal indices is not equal to 100, they are adjusted proportionately so that their mean becomes 100.
Example 8:
Take the data from Example 1 and calculate the four seasonal indices by the ‘simple average method’.
Solution:
Observed values (y):
Year/Quarter I II III IV Mean
2003 219 357 645 513 433.5
2004 549 640 701 590 620
2005 657 394 543 600 548.5
Now the above observed values are converted to indices using the formula (yᵢ/ȳ) × 100:
Year/Quarter I II III IV Total
2003 50.52% 82.35% 148.79% 118.34%
2004 88.55 103.23 113.06 95.16
2005 119.78 71.83 99.00 109.39
Seasonal Index (Sᵢ) 86.28% 85.80% 120.28% 107.63% 400.00
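The simple average method can be sketched in a few lines of Python, assuming quarterly data stored year by year (the dictionary below holds the Example 1 values from the table above). The function name `simple_average_indices` is illustrative.

```python
data = {
    2003: [219, 357, 645, 513],
    2004: [549, 640, 701, 590],
    2005: [657, 394, 543, 600],
}

def simple_average_indices(data):
    """Seasonal indices by the simple average method (one list of seasonal values per year)."""
    ratios = []
    for values in data.values():
        year_mean = sum(values) / len(values)
        ratios.append([100 * v / year_mean for v in values])   # % of the yearly average
    k = len(ratios[0])
    idx = [sum(r[q] for r in ratios) / len(ratios) for q in range(k)]
    scale = 100 * k / sum(idx)       # adjust so the indices average 100
    return [i * scale for i in idx]

print([round(i, 2) for i in simple_average_indices(data)])
```

Here the adjustment factor is 1 (up to floating-point error), because each year's four percentages already sum to 400. It matters only when the mean of the indices drifts from 100.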
(b) Link Relative Method: Under this method, the value for each month or quarter is expressed as a percentage of the value for the preceding month or quarter; these percentages are known as ‘link relatives’. An appropriate average of the link relatives for each month or quarter is taken, usually the median. These averages are converted into a series of chain indices. The chain indices are adjusted for the effect of the trend, and the adjusted chain indices are further reduced to the same level as the first month or quarter.
Example 9:
Take the data from Example 1 and calculate seasonal indices by using the ‘link relative method’.
Solution:
The observed values are converted into link relatives using the following formula:
$$\text{L.R.} = \frac{P_n}{P_o} \times 100$$
where $P_n$ is the value of the current quarter and $P_o$ is the value of the preceding quarter;
and then the median link relatives are converted into chain indices (chaining process) using the following formula:
$$\text{C.I.} = \frac{\text{L.R.} \times \text{C.I. of preceding quarter}}{100}$$
where L.R. is the link relative and C.I. is the chain index.
Year/Quarter I II III IV Total
2003 – 163.01% 180.67% 79.53%
2004 107.02% 116.58 109.53 84.17
2005 111.36 59.97 137.82 110.50
Median (link relative) 109.19 116.58 137.82 84.17
Chain Index 100 116.58 160.67 135.24 512.49
Adj. C.I. 78.05%* 90.99% 125.40% 105.56% 400.00
* Adjusted chain index for QI: 100 ÷ 512.49 × 400 = 78.05, and so on for the other quarters.
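The chaining steps of Example 9 can be sketched in Python as below. Like the worked example, it applies no trend correction to the chain indices. It only rescales them so that the four indices total 400. The function name is illustrative, and the series is assumed to start with the first quarter.

```python
import statistics

quarters = [219, 357, 645, 513, 549, 640, 701, 590, 657, 394, 543, 600]  # 2003 QI .. 2005 QIV

def link_relative_indices(series, period=4):
    """Seasonal indices by the link relative method (no trend correction)."""
    # Link relative: each value as a percentage of the preceding period's value
    lr = [None] + [100 * series[i] / series[i - 1] for i in range(1, len(series))]
    # Median link relative for each season (the series starts at season 0)
    medians = [
        statistics.median([lr[i] for i in range(q, len(series), period) if lr[i] is not None])
        for q in range(period)
    ]
    # Chain indices: first season = 100, then C.I. = L.R. x preceding C.I. / 100
    chain = [100.0]
    for q in range(1, period):
        chain.append(medians[q] * chain[-1] / 100)
    total = sum(chain)
    adjusted = [100 * period * c / total for c in chain]   # rescale to total 100 * period
    return medians, chain, adjusted

medians, chain, adjusted = link_relative_indices(quarters)
print([round(v, 2) for v in adjusted])
```

Because the code does not round intermediate values, its indices agree with the table only to within about 0.02.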
(c) Ratio to Moving Average Method: A centred 12-month or 4-quarter moving average is computed. The observed values are divided by the corresponding centred moving averages and the results are expressed as percentages:
$$\frac{y_i}{\text{centred moving average}} \times 100$$
The monthly or quarterly averages of these percentages are then found. The adjusted values are the indices of seasonal variation.
Example 10:
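Take the data from Example 1 and calculate seasonal indices using the ‘ratio to moving average method’.
Solution:
Each 4-quarter moving total (the indented figures between the rows) falls between two quarters. Adding two adjacent totals gives the centred 8-quarter moving total, and dividing it by 8 gives the centred 4-quarter moving average (rounded here to whole numbers):
Year Quarter y 8-Quarter Moving Total (Centred) 4-Quarter Moving Average (Centred) Ratio to Moving Average (%)
2003 I 219 – – –
2003 II 357 – – –
      1734
2003 III 645 3798 475 135.8
      2064
2003 IV 513 4411 551 93.1
      2347
2004 I 549 4750 594 92.4
      2403
2004 II 640 4883 610 104.9
      2480
2004 III 701 5068 634 110.6
      2588
2004 IV 590 4930 616 95.8
      2342
2005 I 657 4526 566 116.1
      2184
2005 II 394 4378 547 72.0
      2194
2005 III 543 – – –
2005 IV 600 – – –
Arranging the ratios by quarter:
Year/Quarter I II III IV Total
2003 – – 135.8 93.1
2004 92.4 104.9 110.6 95.8
2005 116.1 72.0 – –
Mean Seasonal Index 104.3 88.5 123.2 94.5 410.5
Adjusted Seasonal Index (Sᵢ) 101.6%* 86.2% 120.1% 92.1% 400.0
* Adjusted seasonal index for QI: 104.3 ÷ 410.5 × 400 = 101.6, and so on for the other three quarters.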
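The procedure of Example 10 can be sketched in Python for an even period such as 4 quarters, assuming the series starts at the first quarter. Unlike the worked table, it does not round the centred moving averages to whole numbers, so its indices differ from the table by less than 0.1. The function name is illustrative.

```python
quarters = [219, 357, 645, 513, 549, 640, 701, 590, 657, 394, 543, 600]  # 2003 QI .. 2005 QIV

def ratio_to_moving_average(series, period=4):
    """Seasonal indices by the ratio to moving average method (even period, series starts at season 0)."""
    n = len(series)
    # Moving totals; total j covers series[j .. j + period - 1] and falls between two periods
    totals = [sum(series[j:j + period]) for j in range(n - period + 1)]
    half = period // 2
    ratios = [[] for _ in range(period)]
    for j in range(len(totals) - 1):
        t = j + half                                          # period on which the centred average falls
        centred_ma = (totals[j] + totals[j + 1]) / (2 * period)
        ratios[t % period].append(100 * series[t] / centred_ma)
    means = [sum(r) / len(r) for r in ratios]
    scale = 100 * period / sum(means)                         # adjust so the indices total 100 * period
    return means, [m * scale for m in means]

means, adjusted = ratio_to_moving_average(quarters)
print([round(v, 1) for v in adjusted])
```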
(d) Ratio to Trend Method: An average is found for each year, and a straight line is fitted to these yearly averages by the method of least squares. The trend values for each month or quarter are calculated on the assumption that the data correspond to the middle of the month or quarter. Each original value is divided by the corresponding trend value and expressed as a percentage. The mean of these percentages is calculated for each month or quarter, and the adjusted values are the indices of seasonal variation.
Example 11:
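Take the data from Example 1 and calculate seasonal indices using the ‘ratio to trend method’.
Solution:
A straight line ŷ = a + bx is fitted by least squares to the yearly averages (ȳ). Time x is measured in half-quarter units from the middle of 2004, so the yearly averages fall at x = –8, 0 and 8, and the quarters at x = –11, –9, …, 9, 11:
Year ȳ x xȳ x²
2003 433.5 –8 –3468 64
2004 620 0 0 0
2005 548.5 8 4388 64
Total 1602 0 920 128
a = Σȳ/n = 1602/3 = 534 and b = Σxȳ/Σx² = 920/128 = 7.1875, so ŷ = 534 + 7.1875x.
Year Quarter y x Trend ŷ (rounded) (y/ŷ) × 100
2003 I 219 –11 455 48.13%
2003 II 357 –9 469 76.12
2003 III 645 –7 484 133.26
2003 IV 513 –5 498 103.01
2004 I 549 –3 512 107.23
2004 II 640 –1 527 121.44
2004 III 701 1 541 129.57
2004 IV 590 3 556 106.12
2005 I 657 5 570 115.26
2005 II 394 7 584 67.47
2005 III 543 9 599 90.65
2005 IV 600 11 613 97.88
Now arranging the values in the last column as follows:
Year/Quarter I II III IV Total
2003 48.13 76.12 133.26 103.01
2004 107.23 121.44 129.57 106.12
2005 115.26 67.47 90.65 97.88
Seasonal Index (Sᵢ) 90.21 88.34 117.83 102.34 398.72
Adj. S.I. 90.50%* 88.62% 118.21% 102.67% 400.00
* Adjusted seasonal index for QI: 90.21 ÷ 398.72 × 400 = 90.50, and so on.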
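The steps of Example 11 can be sketched in Python as follows, assuming quarterly data and the same coding of time (x in half-quarter units with origin at the middle of the series). The trend values are not rounded, so the adjusted indices differ from the table by only a few hundredths. The function name is illustrative.

```python
years = {
    2003: [219, 357, 645, 513],
    2004: [549, 640, 701, 590],
    2005: [657, 394, 543, 600],
}

def ratio_to_trend(years):
    """Seasonal indices by the ratio to trend method for quarterly data (needs at least 2 years).

    A least-squares line is fitted to the yearly averages; x is coded in
    half-quarter units with x = 0 at the middle of the series.
    """
    k = len(years)
    means = [sum(v) / 4 for v in years.values()]
    xs = [8 * (i - (k - 1) / 2) for i in range(k)]          # yearly mid-points: -8, 0, 8
    a = sum(means) / k
    b = sum(x * m for x, m in zip(xs, means)) / sum(x * x for x in xs)
    ratios = [[] for _ in range(4)]
    for x_year, values in zip(xs, years.values()):
        for q, y in enumerate(values):
            x = x_year + 2 * q - 3                         # quarters at -3, -1, 1, 3 around mid-year
            ratios[q].append(100 * y / (a + b * x))
    idx = [sum(r) / len(r) for r in ratios]
    scale = 400 / sum(idx)                                 # adjust so the four indices total 400
    return a, b, [s * scale for s in idx]

a, b, indices = ratio_to_trend(years)
print(a, b, [round(i, 2) for i in indices])
```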
Measurement of Cyclical Variation:
The cyclical and random components are first isolated from the time series using the multiplicative model:
yᵢ = Tᵢ × Sᵢ × Cᵢ × Rᵢ
where Tᵢ: Secular trend
Sᵢ: Seasonal variation
Cᵢ: Cyclical variation
Rᵢ: Random variation
This can be done by dividing yᵢ by the product of Tᵢ and Sᵢ:
$$C_i \times R_i = \frac{y_i}{T_i \times S_i}$$
The random component Rᵢ is then removed from the series by a smoothing technique, the moving average. These moving averages give the indices of cyclical variation.
Example 12:
Take the data from Example 1 and isolate the cyclical component from the time series.
Solution:
Year Quarter y Tᵢ* Sᵢ** Tᵢ × Sᵢ ÷ 100 Cᵢ × Rᵢ*** 3-Quarter Moving Average (Cᵢ)
2003 I 219 455 86.28 392.57 55.79% –
2003 II 357 469 85.80 402.40 88.72 85.10
2003 III 645 484 120.28 582.16 110.79 98.41
2003 IV 513 498 107.63 536.00 95.71 110.26
2004 I 549 512 86.28 441.75 124.28 120.51
2004 II 640 527 85.80 452.17 141.54 124.52
2004 III 701 541 120.28 650.71 107.73 115.95
2004 IV 590 556 107.63 598.42 98.59 113.30
2005 I 657 570 86.28 491.80 133.59 103.60
2005 II 394 584 85.80 501.07 78.63 95.86
2005 III 543 599 120.28 720.48 75.37 81.65
2005 IV 600 613 107.63 659.77 90.94 –
* As calculated in the previous example
** As calculated in Example
*** Cᵢ × Rᵢ = y ÷ (Tᵢ × Sᵢ ÷ 100) × 100
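The isolation of the cyclical component can be sketched in Python using the rounded trend values Tᵢ and the seasonal indices Sᵢ listed in the table above. The list names and the function `cyclical_component` are illustrative. The smoothing step is a centred 3-quarter moving average, as in the table, so the first and last quarters get no value.

```python
y     = [219, 357, 645, 513, 549, 640, 701, 590, 657, 394, 543, 600]
trend = [455, 469, 484, 498, 512, 527, 541, 556, 570, 584, 599, 613]   # T_i (rounded)
s_idx = [86.28, 85.80, 120.28, 107.63] * 3                               # S_i for QI..QIV, repeated

def cyclical_component(y, trend, s_idx, window=3):
    """C x R = y / (T x S / 100) x 100, then a centred moving average to smooth out R."""
    cr = [100 * yi / (ti * si / 100) for yi, ti, si in zip(y, trend, s_idx)]
    half = window // 2
    c = [None] * len(cr)                      # no moving average at the two ends
    for i in range(half, len(cr) - half):
        c[i] = sum(cr[i - half:i + half + 1]) / window
    return cr, c

cr, c = cyclical_component(y, trend, s_idx)
print([round(v, 2) for v in cr])
print([None if v is None else round(v, 2) for v in c])
```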