Coefficient of Correlation
Population Correlation Coefficient:
1. The measure of joint or mutual variation in a bivariate population with two variables x and y is called the
‘covariance of x and y’:
$\sigma_{xy} = \frac{\sum (x - \mu_x)(y - \mu_y)}{N}$
2. To make comparisons possible, the covariance must be standardised by dividing (x – μx) and (y – μy) by their SDs
σx and σy respectively. The resulting expression is called the ‘coefficient of correlation’; the ‘population coefficient of
correlation’ is denoted by ‘ρ’ (rho):
$\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
1. The sample covariance of x and y, S xy , measures the tendency for x and y to increase or decrease together in the
sample:
2. The ‘sample coefficient of correlation’ is denoted by ‘r’. It is also known as ‘Karl Pearson’s product moment
coefficient of correlation’. It is defined as $r = \frac{S_{xy}}{S_x S_y}$ and always lies between –1 and +1, i.e.,
–1 ≤ r ≤ +1:
3. (a) If r = –1, all the points on the scatter diagram lie on the regression line of negative slope. It is called a ‘perfect
negative correlation’.
(b) If r = 1, all the points on the scatter diagram lie on the regression line of positive slope. It is called a ‘perfect
positive correlation’.
(c) If r = 0, the points are scattered throughout the diagram, indicating no linear correlation
between x and y.
“Correlation coefficient is a measure of the closeness of linear relationship between the two variables.”
1. The two regression coefficients b and d of the two regression lines can also be stated as follows:
2. Since $r = \frac{S_{xy}}{S_x S_y}$, therefore, $S_{xy} = r \cdot S_x \cdot S_y$.
or
or
Where
$r_{xy} = r_{uv}$
5. The correlation coefficient lies between –1 and +1, i.e., it cannot be less than –1 or greater than +1:
–1 ≤ r ≤ +1
Example:
x 3 1 1 2 4 2 3 5 2 3
y 2 4 3 2 1 2 1 3 2 1
Required:
Solution:
x    y    x – μx    y – μy    (x – μx)(y – μy)    x²    y²
3 2 0.4 –0.1 –0.04 9 4
1 4 –1.6 1.9 –3.04 1 16
1 3 –1.6 0.9 –1.44 1 9
2 2 –0.6 –0.1 0.06 4 4
4 1 1.4 –1.1 –1.54 16 1
2 2 –0.6 –0.1 0.06 4 4
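The table above can be checked with a short computation. The following is a minimal sketch, not the notes' own solution: it applies the product moment definition to the ten (x, y) pairs of the example. The function name `pearson_r` is an assumption introduced here, and the divisor n is used for the covariance and SDs (the divisor cancels in r, so the choice does not affect the result).

```python
import math

# Data copied from the example above.
x = [3, 1, 1, 2, 4, 2, 3, 5, 2, 3]
y = [2, 4, 3, 2, 1, 2, 1, 3, 2, 1]


def pearson_r(xs, ys):
    """Product moment correlation: covariance divided by the product of the SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and SDs with divisor n (hypothetical helper, divisor cancels in r).
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / n)
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    return s_xy / (s_x * s_y)


r = pearson_r(x, y)
print(f"mean of x = {sum(x) / len(x)}, mean of y = {sum(y) / len(y)}, r = {r:.4f}")
```

For these data the means are 2.6 and 2.1, as used in the deviation columns, and r comes out negative, indicating a weak to moderate inverse linear relationship.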
Example:
Calculate:
x 1 2 4 6 8 10 14 15 18 20
y 10 20 30 40 50 60 70 80 90 100
Solution:
x    y    x – μx    y – μy    (x – μx)(y – μy)    (x – μx)²    (y – μy)²
1 10 –8.8 –45 396 77.44 2025
2 20 –7.8 –35 273 60.84 1225
4 30 –5.8 –25 145 33.64 625
6 40 –3.8 –15 57 14.44 225
8 50 –1.8 –5 9 3.24 25
10 60 0.2 5 1 0.04 25
14 70 4.2 15 63 17.64 225
15 80 5.2 25 130 27.04 625
18 90 8.2 35 287 67.24 1225
r² = b × d
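The relation r² = b × d can be verified numerically on the second example. This is a sketch under one stated assumption: b is taken as the regression coefficient of y on x and d as the regression coefficient of x on y (the notes do not say which is which; the product is the same either way).

```python
import math

# Data copied from the second example above.
x = [1, 2, 4, 6, 8, 10, 14, 15, 18, 20]
y = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of products and squares of deviations (the last three table columns).
sum_xy = sum((a - mean_x) * (c - mean_y) for a, c in zip(x, y))
sum_xx = sum((a - mean_x) ** 2 for a in x)
sum_yy = sum((c - mean_y) ** 2 for c in y)

b = sum_xy / sum_xx  # assumed: regression coefficient of y on x
d = sum_xy / sum_yy  # assumed: regression coefficient of x on y
r = sum_xy / math.sqrt(sum_xx * sum_yy)

print(f"b = {b:.4f}, d = {d:.4f}, r = {r:.4f}")
print(f"r^2 = {r * r:.4f}, b*d = {b * d:.4f}")
```

Because b and d always share the sign of the covariance, r takes that same sign when it is recovered as the square root of b × d.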
Probable Error:
3. In a standard normal distribution, the interval z = ± 0.6745 contains 50% of the area under the curve. Symbolically:
P.E. = 0.6745 × σr
or
P.E. = 0.6745 × (1 – r²) / √n
Rank Correlation:
1. If observations on two variables are given in the form of ranks rather than some numerical measurements, it is
possible to compute a coefficient of correlation between ranks of the two variables. This correlation coefficient
is called ‘Rank Correlation Coefficient’.
2. As this formula was presented by Spearman in 1904, it is also known as ‘Spearman’s Rank Correlation
Coefficient’:
$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
3. To test the hypothesis that there is no correlation between the two rankings, critical values of r_s at α = 0.05 are given
below:
n    Critical value of r_s
7 0.79
8 0.74
9 0.74
10 0.65
20 0.45
25 0.40
50 0.28
Example:
Ranks of 9 students in a class in History (x) and Geography (y) are as follows:
Solution:
Students x y d=x–y d2
I 1 4 –3 9
II 9 5 4 16
III 7 6 1 1
IV 4 3 1 1
V 5 7 –2 4
VI 3 2 1 1
VII 8 8 0 0
VIII 2 1 1 1
IX 6 9 –3 9
Total 45 45 0 42
where $d_i = x_i - y_i$
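The rank correlation for this example follows directly from the Σd² column. The sketch below assumes the standard formula r_s = 1 – 6Σd² / (n(n² – 1)) with no tied ranks; the function name `spearman_rs` is introduced here for illustration only.

```python
# Ranks copied from the table above (students I to IX).
history = [1, 9, 7, 4, 5, 3, 8, 2, 6]
geography = [4, 5, 6, 3, 7, 2, 8, 1, 9]


def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


sum_d2 = sum((a - b) ** 2 for a, b in zip(history, geography))
r_s = spearman_rs(history, geography)
print(f"sum of d^2 = {sum_d2}, r_s = {r_s:.2f}")
```

With Σd² = 42 and n = 9, this gives r_s = 1 – 252/720 = 0.65.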
Graphical Presentation I
Types of Graphs:
(a) Histogram
(b) Frequency Polygon
(c) Relative Frequency Histogram and Polygon
(d) Cumulative Frequency Polygon or Ogive
(e) Frequency Curves and Smoothed Ogive
(a) Histogram:
1. A histogram consists of a set of adjacent rectangles having bases along x-axis (marked off
by class boundaries) and areas proportional to class frequencies.
2. To adjust the heights of rectangles in a frequency distribution with unequal class interval
sizes, each class frequency is divided by its class interval size.
Class boundaries    Frequency
109.5-119.5 1
119.5-129.5 4
129.5-139.5 17
139.5-149.5 28
149.5-159.5 25
159.5-169.5 18
169.5-179.5 13
179.5-189.5 6
189.5-199.5 5
199.5-209.5 2
209.5-219.5 1
Σf 120
1. A frequency polygon is constructed by plotting the class frequencies against their corresponding class marks
(mid-points) and then joining the resulting points by means of straight lines.
2. The ends of the graph so drawn do not meet the x-axis. Since a polygon is a many-sided
closed figure, extra classes with zero frequencies are added at both ends of the frequency
distribution.
3. The frequency polygon can also be obtained by joining the mid-points of the tops of
rectangles of histogram.
1. The graph showing the cumulative frequencies plotted against the upper class boundaries
is called a ‘cumulative frequency polygon’ or ‘ogive’.
2. The graphs corresponding to the less-than and the more-than cumulative frequency distributions
are called ‘less-than’ and ‘more-than’ ogives respectively.
Class Boundaries    Frequency    Less-than Cumulative Frequency    More-than Cumulative Frequency
109.5-119.5 1 1 119
119.5-129.5 4 5 115
129.5-139.5 17 22 98
139.5-149.5 28 50 70
149.5-159.5 25 75 45
159.5-169.5 18 93 27
169.5-179.5 13 106 14
179.5-189.5 6 112 8
189.5-199.5 5 117 3
199.5-209.5 2 119 1
209.5-219.5 1 120 0
Σf 120
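The two cumulative columns in the table above can be reproduced as follows. This is a small sketch: the more-than column is read as the number of values above each upper class boundary, which is how the table appears to be built.

```python
from itertools import accumulate

# Upper class boundaries and frequencies copied from the table above.
upper_boundaries = [119.5, 129.5, 139.5, 149.5, 159.5, 169.5,
                    179.5, 189.5, 199.5, 209.5, 219.5]
frequencies = [1, 4, 17, 28, 25, 18, 13, 6, 5, 2, 1]

# Less-than cumulative frequency: running total of the frequencies.
less_than = list(accumulate(frequencies))
total = less_than[-1]

# More-than cumulative frequency: values lying above each upper boundary.
more_than = [total - c for c in less_than]

for ucb, lt, mt in zip(upper_boundaries, less_than, more_than):
    print(f"{ucb:6.1f}  less than: {lt:3d}  more than: {mt:3d}")
```

Plotting `less_than` and `more_than` against `upper_boundaries` gives the two ogives.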
(d) U-Shaped Distribution: In such a distribution, the maximum frequencies occur at both ends
and a minimum in the centre.
1. Frequency distributions with more than one maximum are called ‘multi-modal distributions’.
2. A distribution with two maxima is called a ‘bimodal distribution’.
Types of Charts:
Year Exports
1948 138
1951 406
1961 378
1971 683
1981 2958
1991 6168
2001 9202
2005 14410
1. This chart consists of bars which are sub-divided into two or more parts.
2. The length of the bars is proportional to the totals.
3. The component bars are shaded or coloured differently.
1. Component bar charts may also be drawn on percentage basis by expressing the
components as percentages of their respective totals.
2. All the bars are of equal length showing the 100%. These bars are sub-divided into
component bars in proportion to the percentages of their components.
Areas Under Crop Production (1985-90)
(‘000 hectares)
Year Wheat Rice Others Total
1985-86 7403 1863 1926 11192
1986-87 7706 2066 1906 11678
1987-88 7308 1963 1612 10883
1988-89 7730 2042 1966 11738
1989-90 7759 2107 1970 11836
1. Pie chart is used to compare the relation between the whole and its components.
2. The difference between the component bar chart and the pie chart is that the component
bar chart uses the lengths of the bars, while the pie chart uses the areas of the sectors
of a circle.
3. In a pie chart, the circle is drawn with radius proportional to the square root of the quantity to
be represented because the area of a circle is given by πr².
4. The sectors are coloured or shaded differently.
5. To construct a pie chart, we draw a circle with some suitable radius (square root of the
total). The angles are calculated for each sector as follows:
Angle for each sector = (Component Part / Total) × 360°
Provinces    Development Expenditure (In Rs. Million)    Angles of Sectors (In Degrees)    Cumulative Angle
Balochistan    4874    56°
Index Numbers I
Introduction:
1. An index number is a device which shows by its variations the change in a magnitude which is not
capable of accurate measurement in itself or of direct valuation over time.
2. To measure changes in a situation we combine the prices and quantities and find a single number. This
single number, which shows overall changes in a phenomenon, is known as an ‘Index Number’.
3. It is used to compare changes in a complex phenomenon like the cost of living, total industrial
production, wages, etc.
4. It is very useful in measuring changes in prices and quantities of commodities with different measuring
units, for example, wheat per maund, cloth per yard, etc., which cannot be compared directly.
(a) Price Index Number: It compares changes in prices, from one period to another. Wholesale price
index and cost of living index are the examples.
(b) Quantity Index Number: It measures how much the quantity of a variable changes over time. Index of
industrial production and business activity index are examples.
(c) Value Index Number: It measures changes in total monetary worth. It combines price and quantity
changes to present a more informative index. Index of GNP and index of retail sales are the examples.
1. An index number is a device for measuring changes in a variable or a group of related variables.
2. It can be used to compare changes in one or more variables in one period with those of others or in one
region with those in the others.
3. The index number of industrial activity enables us to study the progress of industrialisation in the
country.
4. The quantity index numbers show rise or fall in the volume of production, volume of exports and
imports, etc.
5. The cost of living index numbers are, in fact, retail price indices. They show changes in the prices of
goods generally consumed by the people. Therefore, they can help the government to formulate a
suitable price policy.
6. The cost of living index number can be made a basis for regulation of wage rates and can be used by
industrial and commercial organisations to grant dearness allowance and bonus to their employees in
order to meet the increased cost of living.
7. Index numbers are also used for forecasting business activity and in discovering seasonal fluctuations
and business cycles.
(i) Defining the purpose and scope of index number, i.e., the general-purpose or special purpose,
(iii) Collection of prices, i.e., (a) considering the prices to be used like average price, retail price or
wholesale price, etc; and (b) the sources of price data like from representative markets, price lists or
trade journals.
(iv) Selecting base period, (a) fixed-base method, and (b) chain-base method.
(vi) Selecting suitable weights: (a) implicit weighting, and (b) explicit weighting.
Notations:
(i) Price Relatives: These are obtained by dividing the price in a given year by the base year price
and expressing the result as a percentage. Thus:
$\text{Price relative} = \frac{P_n}{P_o} \times 100$
Example:
The prices of sugar for 2001 and 2005 are given as below:
Year Price / Kg
2001 11
2005 30
Required:
(a) Taking 2001 as base year, find price relative for 2005.
(b) Taking 2005 as base year, find price relative for 2001.
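Solution:
(a) Price relative for 2005 (base 2001) = (30 / 11) × 100 = 272.73%
(b) Price relative for 2001 (base 2005) = (11 / 30) × 100 = 36.67%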
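(ii) Link Relatives: These are obtained by dividing the price in a given year by the price in the
preceding year and expressing the result as a percentage:
$\text{Link relative} = \frac{P_n}{P_{n-1}} \times 100$
Link relatives are not directly comparable; therefore, they are converted to a fixed-base index
number. The process of conversion is called the ‘chaining process’, and the index numbers so
obtained are called chain indices:
$\text{Chain index} = \frac{\text{Link relative of current year} \times \text{Chain index of previous year}}{100}$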
Example:
Year Price / Kg
2000 21
2001 20
2002 20
2003 22
2004 25
2005 28
Required:
Taking 2000 as base year, find price relatives for the years 2001 to 2005.
Solution:
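Year    Price / Kg    Price Relative
2000    21    100%
2001    20    (20 / 21) × 100 = 95.24%
2002    20    (20 / 21) × 100 = 95.24%
2003    22    (22 / 21) × 100 = 104.76%
2004    25    (25 / 21) × 100 = 119.05%
2005    28    (28 / 21) × 100 = 133.33%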
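The relationship between price relatives, link relatives and chain indices can be illustrated on the same prices. The following is a sketch only; the dictionary names are introduced here, and the chaining step assumes each chain index is the previous chain index multiplied by the current link relative and divided by 100.

```python
# Prices copied from the example above.
prices = {2000: 21, 2001: 20, 2002: 20, 2003: 22, 2004: 25, 2005: 28}
years = sorted(prices)
base_price = prices[years[0]]

# Price relatives: each price as a percentage of the base year (2000) price.
price_relatives = {yr: prices[yr] / base_price * 100 for yr in years}

# Link relatives: each price as a percentage of the preceding year's price.
link_relatives = {
    cur: prices[cur] / prices[prev] * 100
    for prev, cur in zip(years, years[1:])
}

# Chaining: convert link relatives back to a fixed-base series.
chain_indices = {years[0]: 100.0}
for prev, cur in zip(years, years[1:]):
    chain_indices[cur] = chain_indices[prev] * link_relatives[cur] / 100

for yr in years:
    print(yr, round(price_relatives[yr], 2), round(chain_indices[yr], 2))
```

For a single commodity the chain indices coincide with the fixed-base price relatives, which is a useful check on the chaining arithmetic.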
(b) Unweighted Index Numbers: There are two methods of constructing this type of index:
(i) Simple Aggregative Method: In this method, the total of the prices of commodities in a
given year is expressed as a percentage of the total of the prices of commodities in the base year:
$P_{on} = \frac{\sum P_n}{\sum P_o} \times 100$
• It does not take into account the relative importance of various commodities.
• The units in which prices are given, e.g., maunds, yards, gallons, etc., affect the value of
index very much.
Example:
Required:
Simple aggregative index numbers for the years 2001-05, with 2001 as base year.
Solution:
Commodity    2001    2002    2003    2004    2005
Rice    20    20    22    25    28
Sugar    11    12    14    27    30
Tea    178    176    174    180    180
Total    209    208    210    232    238
Simple Aggregative Index    100    99.52    100.48    111.00    113.88
(ii) Average of Relatives’ Method: In this method, we use the average (mean, median, GM, etc.) of
the price relatives or link relatives. The units in which the prices are quoted do not affect the value of
the index numbers. The only disadvantage of this method is that it gives equal weight to all commodities.
Example:
Required:
Construct price index numbers using average of relatives’ method, taking 2001 as base year.
Solution:
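Commodity    2001    2002    2003    2004    2005
Rice    100    100    110    125    140
Sugar    100    109.09    127.27    245.45    272.73
Tea    100    98.88    97.75    101.12    101.12
Total    300    307.97    335.02    471.57    513.85
Mean (Index)    100    102.66    111.67    157.19    171.28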
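Both unweighted methods can be computed side by side. This is a hedged sketch: the Sugar and Tea prices are those shown in the example, while the Rice prices (20, 20, 22, 25, 28) are an assumption taken as the row implied by the column totals 209, 208, 210, 232 and 238. The arithmetic mean is used as the average of relatives.

```python
years = [2001, 2002, 2003, 2004, 2005]
prices = {
    "Rice": [20, 20, 22, 25, 28],   # assumed row, implied by the column totals
    "Sugar": [11, 12, 14, 27, 30],
    "Tea": [178, 176, 174, 180, 180],
}

# Simple aggregative method: yearly price totals relative to the base total.
totals = [sum(col) for col in zip(*prices.values())]
simple_aggregative = [t / totals[0] * 100 for t in totals]

# Average of relatives: mean of each commodity's price relative (base 2001).
relatives = {name: [p / row[0] * 100 for p in row] for name, row in prices.items()}
mean_of_relatives = [sum(col) / len(col) for col in zip(*relatives.values())]

for yr, agg, avg in zip(years, simple_aggregative, mean_of_relatives):
    print(yr, round(agg, 2), round(avg, 2))
```

The two series differ noticeably by 2005 because the aggregative index is dominated by Tea, the highest-priced item, whereas the average of relatives treats every commodity equally.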
(c) Weighted Index Numbers: This type of index can be further classified into two categories:
(i) Weighted Aggregative Index Numbers: In these index numbers, the quantities
produced, sold or bought or consumed during the base year or current year are used as weights.
These weights indicate the importance of the particular commodity. Some well-known weighted
index numbers are given below:*
(1) Laspeyres’ Index: This index uses base year quantities as weights. For this reason, it is also
known as the ‘Base Year Weighted Index’:
$P_{on} = \frac{\sum P_n Q_o}{\sum P_o Q_o} \times 100$
Here W = Qo
(2) Paasche’s Index: This index uses current year quantities as weights. For this reason, it is known
as the ‘Current Year Weighted Index’:
$P_{on} = \frac{\sum P_n Q_n}{\sum P_o Q_n} \times 100$
Here W = Qn
(3) Fisher’s Ideal Index: This index number is the GM of the Laspeyres’ and Paasche’s index
numbers. It is called ‘ideal’ because it satisfies two tests (Time Reversal and Factor Reversal
Tests):
$P_{on} = \sqrt{\frac{\sum P_n Q_o}{\sum P_o Q_o} \times \frac{\sum P_n Q_n}{\sum P_o Q_n}} \times 100$
(4) Marshall-Edgeworth’s Index: This index number uses the average of the base year and current
year quantities as weights:
$P_{on} = \frac{\sum P_n (Q_o + Q_n)}{\sum P_o (Q_o + Q_n)} \times 100$
Example:
Commodities    2001 Price (Rs. / kg)    2001 Qty. (kgs)    2005 Price (Rs. / kg)    2005 Qty. (kgs)
Rice 20 100 28 160
Sugar 11 18 30 37
Salt 1 1 5 1.1
Milk 18 57 32 149
* W.A.I.N. (Weighted Aggregative Index Number) is equal to $\frac{\sum P_n W}{\sum P_o W} \times 100$
Required:
Construct the following price index numbers using 2001 as base year:
(a) Laspeyres’
(b) Paasche’s
(c) Fisher’s
(d) Marshall-Edgeworth’s
Solution:
Commodity    Po (2001)    Qo (2001)    Pn (2005)    Qn (2005)    PoQo    PnQo    PnQn    PoQn    Qo+Qn    Po(Qo+Qn)    Pn(Qo+Qn)
Rice 20 100 28 160 2000 2800 4480 3200 260 5200 7280
Sugar 11 18 30 37 198 540 1110 407 55 605 1650
Salt 1 1 5 1.1 1 5 5.5 1.1 2.1 2.1 10.5
Milk 18 57 32 149 1026 1824 4768 2682 206 3708 6592
Total 3225 5169 10363.5 6290.1 9515.1 15532.5
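(a) Laspeyres’: (ΣPnQo / ΣPoQo) × 100 = (5169 / 3225) × 100 = 160.28
(b) Paasche’s: (ΣPnQn / ΣPoQn) × 100 = (10363.5 / 6290.1) × 100 = 164.76
(c) Fisher’s: √(160.28 × 164.76) = 162.50
(d) Marshall-Edgeworth’s: (ΣPn(Qo+Qn) / ΣPo(Qo+Qn)) × 100 = (15532.5 / 9515.1) × 100 = 163.24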
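The four weighted aggregative indices can be computed from the column totals of the solution table. The sketch below uses the standard textbook definitions (Laspeyres with base year quantities, Paasche with current year quantities, Fisher as their geometric mean, Marshall-Edgeworth with Qo + Qn as weights); the variable names are introduced here for illustration.

```python
import math

# (Po, Qo, Pn, Qn) for each commodity, copied from the example above.
data = {
    "Rice": (20, 100, 28, 160),
    "Sugar": (11, 18, 30, 37),
    "Salt": (1, 1, 5, 1.1),
    "Milk": (18, 57, 32, 149),
}

# Column totals of the solution table.
sum_p0q0 = sum(p0 * q0 for p0, q0, pn, qn in data.values())
sum_pnq0 = sum(pn * q0 for p0, q0, pn, qn in data.values())
sum_pnqn = sum(pn * qn for p0, q0, pn, qn in data.values())
sum_p0qn = sum(p0 * qn for p0, q0, pn, qn in data.values())

laspeyres = sum_pnq0 / sum_p0q0 * 100
paasche = sum_pnqn / sum_p0qn * 100
fisher = math.sqrt(laspeyres * paasche)
# Sum of Pn(Qo + Qn) over sum of Po(Qo + Qn).
marshall_edgeworth = (sum_pnq0 + sum_pnqn) / (sum_p0q0 + sum_p0qn) * 100

print(f"Laspeyres          = {laspeyres:.2f}")
print(f"Paasche            = {paasche:.2f}")
print(f"Fisher             = {fisher:.2f}")
print(f"Marshall-Edgeworth = {marshall_edgeworth:.2f}")
```

Fisher's index always lies between the Laspeyres and Paasche values because it is their geometric mean.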
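(ii) Weighted Average of Relatives: The formula of weighted average of relatives is:
$P_{on} = \frac{\sum W \left( \frac{P_n}{P_o} \times 100 \right)}{\sum W}$
or
$P_{on} = \frac{\sum VW}{\sum W}$, where $V = \frac{P_n}{P_o} \times 100$
The total value of the commodity is used as the weight. If the base year value (PoQo) is used as the weight,
then the formula becomes:
$P_{on} = \frac{\sum \frac{P_n}{P_o} (P_o Q_o)}{\sum P_o Q_o} \times 100$
or
$P_{on} = \frac{\sum P_n Q_o}{\sum P_o Q_o} \times 100$
If the current year value (PnQn) is used as the weight, then the formula becomes:
$P_{on} = \frac{\sum \frac{P_n}{P_o} (P_n Q_n)}{\sum P_n Q_n} \times 100$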
Example:
Commodity    Price (2001)    Price (2005)    Weights
Rice    20    28    35
Tea    178    180    5
Sugar    11    30    24
Required:
Solution:
Commodity    V = (Pn / Po) × 100    W    VW
Rice    140    35    4900
Tea    101.12    5    505.6
Sugar    272.73    24    6545.52
Total        64    11951.12
Index = ΣVW / ΣW = 11951.12 / 64 = 186.74
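The weighted average of relatives in this example can be recomputed as follows. This sketch assumes V is the price relative (Pn / Po) × 100 and that the given weights W are used directly; small differences from the table arise only because the table rounds V to two decimals before multiplying.

```python
# (price 2001, price 2005, weight) copied from the example above.
items = {
    "Rice": (20, 28, 35),
    "Tea": (178, 180, 5),
    "Sugar": (11, 30, 24),
}

# V: price relative of each commodity with 2001 as base.
relatives = {name: pn / p0 * 100 for name, (p0, pn, w) in items.items()}

sum_vw = sum(relatives[name] * w for name, (p0, pn, w) in items.items())
sum_w = sum(w for p0, pn, w in items.values())
index = sum_vw / sum_w

for name, v in relatives.items():
    print(f"{name:6s} V = {v:.2f}")
print(f"Weighted average of relatives = {index:.2f}")
```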
Quantity Index Number: The formulae described for obtaining price indices can easily be used to obtain
quantity indices or volume indices simply by interchanging the Ps and Qs, for example:
$Q_{on} = \frac{\sum Q_n P_o}{\sum Q_o P_o} \times 100$ (Laspeyres’)
and:
$Q_{on} = \frac{\sum Q_n P_n}{\sum Q_o P_n} \times 100$ (Paasche’s)
and so on.
Value Index Numbers: Like price or quantity index numbers, we can obtain formulae for value index numbers.
The simplest value index number is defined as below:
$V_{on} = \frac{\sum P_n Q_n}{\sum P_o Q_o} \times 100$
This is the ‘Simple Aggregative Index’ because the values have not been weighted.
Measures of Dispersion
Definition:
1. Two or more distributions may differ greatly in their dispersion, although their means may be (nearly) the same, e.g.:
67,67,67,67,67,67,67,67
43,43,50,55,66,90,91,97
2. By dispersion we mean the extent to which the values are spread out from the average. The measures used for
computing the amount of dispersion in a distribution are known as ‘measures of dispersion’ or ‘measures of
variation’.
3. In the above example, the first distribution has zero dispersion, and the second distribution has a dispersion
greater than the first. Dispersion cannot be less than zero.
(i) Measures of Absolute Dispersion: These measure the actual variation in the data, expressed in the same units
as the data. The variation so determined is called ‘absolute dispersion’.
(ii) Measures of Relative Dispersion: The measures of absolute dispersion cannot be used to compare
the variation of two or more series. For example, the SD of the heights of students (in inches) cannot be
compared with the SD of their weights (in pounds). The same holds even if the units are identical, e.g., when comparing
the heights of men (in inches) with the lengths of their noses (in inches): if the SD of the heights is greater than the
SD of the nose lengths, it does not mean that the degree of variability is greater in the case of heights.
To compare the variation of two or more series, we need a measure of relative dispersion. It is defined as:
$\text{Relative dispersion} = \frac{\text{Absolute dispersion}}{\text{Average}}$
1. The range is the simplest measure of dispersion. It is defined as the difference between the largest
value (Xm) and the smallest value (Xn) in the data:
$\text{Range} = X_m - X_n$
2. For grouped data, the range is defined as the difference between the upper class boundary (UCB) of the
highest class and the lower class boundary (LCB) of the lowest class.
1. The quartile deviation (Q.D.) is also known as the Semi-Interquartile Range. The range is a poor measure of
dispersion where extremely large values are present. The quartile deviation is defined as half of the difference
between the third and the first quartiles:
$Q.D. = \frac{Q_3 - Q_1}{2}$
2. The difference between the third and first quartiles is called the ‘Inter-Quartile Range’.
1. The mean deviation (MD) is defined as the average of the absolute deviations of the values from an average (here the mean):
$MD = \frac{\sum |x - \bar{x}|}{n}$
1. The SD is defined as the positive square root of the mean of the squared deviations of the values from
their mean:
$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
3. In case of a frequency distribution with x_1, x_2, ….., x_k as class marks, and f_1, f_2, ……, f_k as the
corresponding class frequencies, the SD is expressed as follows:
$S = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$
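1. If the values (or class marks) and the mean are not integral values, the computation of the SD from its definition
becomes laborious.
2. The shortcut alternate method for computing the SD is:
$S = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2}$
3. If the values of x are large, considerable time is saved by taking deviations of x from an arbitrary value A. If D
denotes the deviations of x from A, i.e., D = x – A, then the SD can be expressed in another way:
$S = \sqrt{\frac{\sum D^2}{n} - \left( \frac{\sum D}{n} \right)^2}$ or $S = h \sqrt{\frac{\sum u^2}{n} - \left( \frac{\sum u}{n} \right)^2}$
where $u = \frac{x - A}{h}$ and h is the class interval size.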
The Variance:
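The variance is defined as the square of the SD, i.e., the mean of the squared deviations from the mean:
$S^2 = \frac{\sum (x - \bar{x})^2}{n}$
Alternate Method:
1. Variance: $S^2 = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2$
2. Standard Deviation: $S = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2}$
3. When all the values are multiplied or divided by a constant, the SD of these values is multiplied or divided by the
absolute value of the constant, and the variance is multiplied or divided by the square of the constant:
$\mathrm{Var}(ax) = a^2 \, \mathrm{Var}(x)$ and $\mathrm{SD}(ax) = |a| \, \mathrm{SD}(x)$
4. If two sets of data consisting of n_1 and n_2 values have variances S_1² and S_2² respectively, the combined variance of both
sets of data is expressed as follows:
$S_c^2 = \frac{n_1 [S_1^2 + (\bar{x}_1 - \bar{x}_c)^2] + n_2 [S_2^2 + (\bar{x}_2 - \bar{x}_c)^2]}{n_1 + n_2}$, where $\bar{x}_c$ is the combined mean.
5. The variance of the sum or difference of two independent random variables is the sum of their respective
variances. Thus, if x and y are independent random variables:
$\mathrm{Var}(x \pm y) = \mathrm{Var}(x) + \mathrm{Var}(y)$
6. The variance has the minimal property. This means that the variance or the SD is a minimum if and only if the
deviations are taken from the mean. In other words:
$\sum (x - a)^2$ is a minimum when $a = \bar{x}$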
The above results also hold approximately for moderately skewed distributions.
(a) Range:
1. The range is simple to understand and easy to calculate because its value is determined by the two
extreme items.
3. Its value may be greatly changed if an extreme value (either lowest or highest) is withdrawn or a fresh
value is added. It is a highly unstable measure of variation.
4. It gives no indication how the values within the two extremes are distributed.
Example:
Calculate:
(a) Range
(e) Variance
Solution:
CB    f    CF    x    fx    x – x̄    |x – x̄|    f|x – x̄|    (x – x̄)²    f(x – x̄)²
9.5-19.5 5 5 14.5 72.5 -37.7 37.7 188.5 1421.29 7106.45
19.5-29.5 8 13 24.5 196 -27.7 27.7 221.6 767.29 6138.32
29.5-39.5 13 26 34.5 448.5 -17.7 17.7 230.1 313.29 4072.77
39.5-49.5 19 45 44.5 845.5 -7.7 7.7 146.3 59.29 1126.51
49.5-59.5 23 68 54.5 1253.5 2.3 2.3 52.9 5.29 121.67
59.5-69.5 15 83 64.5 967.5 12.3 12.3 184.5 151.29 2269.35
69.5-79.5 7 90 74.5 521.5 22.3 22.3 156.1 497.29 3481.03
79.5-89.5 5 95 84.5 422.5 32.3 32.3 161.5 1043.29 5216.45
89.5-99.5 3 98 94.5 283.5 42.3 42.3 126.9 1789.29 5367.87
99.5-109.5 2 100 104.5 209 52.3 52.3 104.6 2735.29 5470.58
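(a) Range: UCB of the highest class – LCB of the lowest class = 109.5 – 9.5 = 100
(e) Variance: S² = Σf(x – x̄)² / Σf = 40371 / 100 = 403.71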
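The columns of the table above can be reproduced with a few lines of code. This sketch computes the range, mean deviation, variance and SD for the grouped data, using the class marks as representative values and Σf as the divisor, in line with the definitions given earlier; the other measures requested in the example are not shown in the source and are not attempted here.

```python
import math

# Class boundaries and frequencies copied from the table above.
boundaries = [(9.5 + 10 * i, 19.5 + 10 * i) for i in range(10)]
freq = [5, 8, 13, 19, 23, 15, 7, 5, 3, 2]

marks = [(lo + hi) / 2 for lo, hi in boundaries]  # class marks x
n = sum(freq)
mean = sum(f * x for f, x in zip(freq, marks)) / n

# Range for grouped data: UCB of highest class minus LCB of lowest class.
data_range = boundaries[-1][1] - boundaries[0][0]

# Mean deviation from the mean, variance and SD with divisor n.
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freq, marks)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(freq, marks)) / n
sd = math.sqrt(variance)

print(f"mean = {mean:.1f}, range = {data_range:.0f}")
print(f"MD = {mean_deviation:.2f}, variance = {variance:.2f}, SD = {sd:.2f}")
```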
1. The coefficient of variation (CV) was introduced by Karl Pearson. The CV expresses the SD as a percentage of
the AM:
$CV = \frac{S}{\bar{x}} \times 100$
2. It is frequently used in comparing dispersion of two or more series. It is also used as a criterion of
consistent performance, the smaller the CV the more consistent is the performance.
6. The higher the CV, the higher is instability or variability in data, and vice versa.
If Xm and Xn are respectively the maximum and the minimum values in a set of data, then the coefficient of
dispersion is defined as:
$\text{Coefficient of dispersion} = \frac{X_m - X_n}{X_m + X_n}$
1. If Q_1 and Q_3 are given for a set of data, then (Q_1 + Q_3)/2 is a measure of central tendency or average of
the data. The measure of relative dispersion for the quartile deviation is then expressed as follows:
$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
The relative measure for the mean deviation is the ‘mean coefficient of dispersion’ or ‘coefficient of mean deviation’:
$\text{Coefficient of MD} = \frac{MD}{\bar{x}}$
Example:
Calculate:
Solution:
Example:
During a soccer tournament, two players make the following series of goals:
Player 1 2 2 4 3 2 4 2 3
Player 2 1 2 5 5 5 2 1 1
Solution:
x    y    x – x̄    (x – x̄)²    y – ȳ    (y – ȳ)²
2 1 -0.75 0.5625 -1.75 3.0625
2 2 -0.75 0.5625 -0.75 0.5625
4 5 1.25 1.5625 2.25 5.0625
3 5 0.25 0.0625 2.25 5.0625
2 5 -0.75 0.5625 2.25 5.0625
4 2 1.25 1.5625 -0.75 0.5625
2 1 -0.75 0.5625 -1.75 3.0625
3 1 0.25 0.0625 -1.75 3.0625
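Total    22    22        5.5        25.5
Player 1: x̄ = 22 / 8 = 2.75, S = √(5.5 / 8) = 0.8292, CV = (0.8292 / 2.75) × 100 = 30.15%
Player 2: ȳ = 22 / 8 = 2.75, S = √(25.5 / 8) = 1.7854, CV = (1.7854 / 2.75) × 100 = 64.92%
Conclusion: The higher the CV, the higher the instability, and vice versa. From the above calculations, it is evident that
Player 1 is more consistent than Player 2.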
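The comparison of the two players can be reproduced with a short computation. The sketch below assumes the SD is computed with divisor n, in line with the definition of the SD given earlier; the function name `coefficient_of_variation` is introduced here for illustration.

```python
import math

# Goals per match copied from the example above.
player_1 = [2, 2, 4, 3, 2, 4, 2, 3]
player_2 = [1, 2, 5, 5, 5, 2, 1, 1]


def coefficient_of_variation(values):
    """CV = (SD / mean) * 100, with the SD computed using divisor n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sd / mean * 100


cv_1 = coefficient_of_variation(player_1)
cv_2 = coefficient_of_variation(player_2)
print(f"CV of Player 1 = {cv_1:.2f}%, CV of Player 2 = {cv_2:.2f}%")
```

Both players average 2.75 goals, so the smaller CV of Player 1 reflects purely the smaller spread of his scores.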
Raw data can be converted into a special type of values by subtracting the mean from each value and then dividing by
the SD of the data. These values are called ‘standard scores’ or ‘z-scores’ or ‘values in SD units’:
$z = \frac{x - \bar{x}}{S}$
Properties of Z-Score:
4. The distribution of z-scores has exactly the same shape as the distribution of the original data.
Example:
A student gets 82 marks in a final examination in Accounting; the mean is 75 marks with a standard deviation of 10
marks. In Economics, he gets 86 marks in the final examination on which the mean is 80 marks with a SD of 14 marks. Is
his relative standing better in Accounting or Economics?
Solution:
Accounting    Economics
x̄ = 75    x̄ = 80
S = 10    S = 14
x = 82    x = 86
z = (82 – 75) / 10 = 0.70    z = (86 – 80) / 14 = 0.43
Conclusion: His marks in Accounting are 0.7 SD above the mean, while in Economics his marks are 0.43 SD above the
mean. Therefore, his relative standing in Accounting is higher than in Economics.
Chebyshev’s Theorem:
1. The Russian mathematician P.L. Chebyshev devised a rule called ‘Chebyshev’s Theorem’ to determine the
minimum proportion of values in intervals that are equidistant from the mean.
2. The theorem states that for any data at least $1 - \frac{1}{k^2}$ of the values must lie within k standard deviations on
either side of the mean, where k is any constant number greater than 1.
3. In other words, the interval $\bar{x} \pm kS$ will contain at least $1 - \frac{1}{k^2}$ of the values. For example:
for k = 2, at least 1 – 1/4 = 75% of the values lie within $\bar{x} \pm 2S$;
for k = 3, at least 1 – 1/9 = 88.9% of the values lie within $\bar{x} \pm 3S$.
1. Proportions of values are given only for intervals which are equidistant from mean, that is the mean should
always be the mid-point of the interval.
2. Minimum proportion is specified rather than exact or approximate value of the proportion.
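The bound in Chebyshev's Theorem is easy to evaluate for any interval centred on the mean. The sketch below is illustrative only: the function names are introduced here, and the mean of 140 and SD of 5 in the usage lines are hypothetical values chosen for the demonstration, not figures taken from the notes.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 1 - 1 / k ** 2


def min_proportion_in_interval(lower, upper, mean, sd):
    """Bound for an interval that is equidistant from the mean."""
    if abs((mean - lower) - (upper - mean)) > 1e-9:
        raise ValueError("the mean must be the mid-point of the interval")
    k = (upper - mean) / sd
    return chebyshev_min_proportion(k)


# Hypothetical illustration: mean 140, SD 5, interval 125 to 155 (k = 3).
example = min_proportion_in_interval(125, 155, mean=140, sd=5)
print(f"k = 2: at least {chebyshev_min_proportion(2):.2%}")
print(f"interval 125 to 155: at least {example:.2%}")
```

The function rejects intervals that are not centred on the mean, which mirrors the first limitation noted above.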
Example:
Two populations have the same mean . Their SDs are . Find the percentages of the values
that must lie between 125 and 155.
Solution:
Population 1 Population 2
Therefore 125 to 155 will contain at least: Therefore 125 to 155 will contain at least:
Normal Distribution:
1. Three mathematicians, namely, P. Laplace, A. De Moivre and K.F. Gauss have independently developed a law
which gives the proportion of values that lie in specific intervals of a special type of symmetrical distribution
called ‘Normal Distribution’.
2. The mathematical form of a normal distribution is complicated and difficult to use frequently. Tables have
been constructed to make the application of the normal law simple; they are known as ‘tables of areas under the normal curve’ or
‘normal area tables’.
3. Whenever the frequency curve is bell shaped or symmetrical, the distribution (or curve) can be assumed
approximately normal and hence normal law can be applied.
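6. It should be noted here that the z-score is a linear transformation of a variable x such that:
$z = a + bx$, with $a = -\frac{\mu}{\sigma}$ and $b = \frac{1}{\sigma}$,
since $z = \frac{x - \mu}{\sigma}$ or $z = -\frac{\mu}{\sigma} + \frac{1}{\sigma} x$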
Example:
Given: .
(i)
(ii)
Solution:
(i) :
Rules:
SD(x + a) = SD(x)
SD(ax) = |a| × SD(x)
(ii) :
Rules:
SD(x + a) = SD(x)
SD(ax) = |a| × SD(x)
1. In our everyday life, we base many of our decisions on random outcomes, i.e., chance occurrences. For
example, the captains of two cricket teams toss a coin to decide which team will play first, or lotteries are
drawn by spinning a wheel, etc.
2. Random numbers are the numbers obtained by some random process (manually or mechanically).
3. These numbers are assumed to be randomly and uniformly (equally) distributed. The basic random
numbers are the 10 one-digit numbers, i.e., 0, 1, 2, ………. 9. Each of these numbers has an equal
chance (1/10) of being selected.
4. Random numbers can be generated manually as well as mechanically. They can be
generated manually by drawing cards from a pack of playing cards or rotating a spinning wheel, etc. Mechanically
generated random numbers are obtained from calculators and computers.
5. The most common use of random numbers is for selection of samples.
Random Variables:
1. Experiments in which outcomes vary from trial to trial are called ‘Random Experiments’.
2. A variable whose values are determined by the outcomes of a random experiment is called a random
variable.
3. In other words, random variable is a rule which assigns numbers to the outcomes of the possibility space
and is denoted by X.
4. For example, throwing a die is a random experiment and the number of spots that turns up, i.e., 1, 2,
3, 4, 5 or 6, is a random variable.
5. A random variable is also called a ‘chance variable’, ‘stochastic variable’ or simply a ‘variable’.
Capital letters such as X or Y are used to denote a variable and lower case letters x or y are used to denote its
values.
6. Many random variables may be defined for one and the same possibility space.
7. When any characteristics of the individuals of a population (or a sample) are measured or counted, the
characteristic itself is a random variable.
8. The random variables are further bifurcated into:
(a) Discrete Random Variable: A random variable which can assume only a finite number of values or
a sequence of whole numbers is called a discrete random variable. For example, the number of spots
on a die is a discrete random variable, number of persons enrolled for CSS examinations, number of
students passed in 1st division in a particular class, number of defective items in a lot, etc. are
discrete random variables, which could assume any of the possible values, i.e., 1, 2, 3…….
(b) Continuous Random Variable: A random variable which can assume all possible values on a
continuous scale in a given interval is called a continuous random variable. For example, height,
weight, temperature, distance, life periods, speed, etc. are continuous random variables.
Example:
A coin is tossed three times. Find the possibility space and define two random variables for this possibility
space.
Solution:
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
X = no. of heads.
Note: The same value may be assigned to different outcomes of the possibility space.
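The possibility space and the random variables defined on it can be enumerated directly. In this sketch X is the number of heads, as in the solution; the second variable Y (number of tails) is an assumption chosen for illustration, since the notes' own second variable is not shown.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Possibility space for three tosses of a coin: 2 x 2 x 2 = 8 outcomes.
space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Two random variables defined on the same possibility space.
X = {outcome: outcome.count("H") for outcome in space}  # number of heads
Y = {outcome: outcome.count("T") for outcome in space}  # assumed: number of tails

# Probability distribution of X, assuming all 8 outcomes are equally likely.
counts = Counter(X.values())
distribution = {x: Fraction(c, len(space)) for x, c in sorted(counts.items())}

print(space)
for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Several outcomes map to the same value (for example HHT, HTH and THH all give X = 2), which is exactly the point of the note above.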
Probability Distribution:
1. An arrangement of all possible values of a random variable along with their respective probabilities is
called a ‘probability distribution’ or a ‘probability function’.
2. Probability distribution can be further bifurcated into:
(a) Discrete Probability Distribution: Let a discrete random variable X assume values x 1 , x 2 , x 3 ,
……….., x n with respective probabilities P(x 1 ), P(x 2 ), P(x 3 ), …………, P(x n ). Since the random
variable takes a discrete set of values, it is also called a discrete probability distribution. A discrete
probability distribution may take the form of a table, a graph or a mathematical equation.
(i) 0 ≤ P(x i ) ≤ 1
(ii) ∑P(x i ) = 1, which means that the sum of probabilities is equal to one.
Example:
A coin is tossed three times. Find the probability distribution of the random variable number of heads.
Solution:
X    P(X)
0    1/8
1    3/8
2    3/8
3    1/8
Total    1
Example:
Solution:
X    P(X)
1    2/14
2    3/14
3    4/14
4    5/14
Total    1
(b) Continuous Probability Distribution: As we know, a random variable which can assume all
possible values within a given interval is called a continuous random variable. Within a given interval,
there are an infinite number of values. For example, there may be an infinite number of weights
between 69.5 kgs and 70.5 kgs. In case of a continuous random variable, therefore, we compute
probabilities for various intervals of continuous random variable, such as P(a ≤ X ≤ b) or P(X ≥ c).
The probability distribution of a continuous random variable cannot be presented in tabular form. It can
be represented by means of a formula or through a graph. The formula is necessarily in the form of a
function of the numerical values of the continuous random variable X. For e.g., a continuous random
variable can assume values between X = 2 and X = 4 and the function is given by:
In a probability distribution of a random variable X, the mean, also referred to as ‘Mathematical Expectation’ or
‘Expected Value’, and the variance are defined as:
μ = E(X) = Σ X · P(X)
σ² = E(X²) – [E(X)]² = Σ X² · P(X) – μ²
Distribution Function:
A function showing probabilities that a random variable X has a value less than or equal to x is called the
‘cumulative distribution function’ or ‘distribution function of x’.
(i) f(– ∞) = 0 and f(∞) = 1, which means that f(x) is a non-decreasing function ranging from 0 to 1.
(ii) If a < b then f(a) ≤ f(b) for any real numbers a and b.
For a discrete random variable, the distribution function is obtained by cumulating probabilities, just as cumulative
frequencies are obtained in a cumulative frequency distribution.
The distribution functions for the probability distributions of the previous two examples are as below:
x f(x)
x<0 0
0≤x<1 1/8
1≤x<2 4/8
2≤x<3 7/8
x≥3 1

x f(x)
x<1 0
1≤x<2 2/14
2≤x<3 5/14
3≤x<4 9/14
x≥4 1
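The cumulating step described above can be sketched in a few lines of Python. This is only an illustration: the function name `distribution_function` is an assumption of this sketch, and the probabilities used are those of the number of heads in three tosses of a fair coin (1/8, 3/8, 3/8, 1/8 for X = 0, 1, 2, 3).

```python
from fractions import Fraction
from itertools import accumulate

def distribution_function(probabilities):
    """Return the running totals f(x) = P(X <= x) of a discrete distribution."""
    return list(accumulate(probabilities))

# Number of heads in three tosses of a fair coin: X = 0, 1, 2, 3
coin_pmf = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
coin_cdf = distribution_function(coin_pmf)

for x, f in zip(range(4), coin_cdf):
    print(f"f({x}) = {f}")
```

Exact fractions are used so that the last cumulative value is exactly 1, which is a quick check that the probabilities form a valid distribution.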
Example:
Calculate the mean and variance for the following probability distribution:
X 0 1 2 3 4 5 6 7
P(X) 0.11 0.23 0.34 0.16 0.10 0.06 0.04 0.01
Solution:
(i) The experiment has only two possible outcomes, i.e., success or failure.
(ii) The probability of ‘success’ is denoted by ‘p’ and the probability of ‘failure’ is denoted by
‘q’ where q = 1 – p or p + q = 1.
3. The number of successes in n experiments is the Binomial Random Variable and is denoted by X. The
possible values of X are 0, 1, 2, 3, 4, ….., n. The probabilities of the values of X are calculated by the
following formula:
P(X = x) = C(n, x) · p^x · q^(n – x)
Where x = 0, 1, 2, 3, ………, n and C(n, x) = n! / [x! (n – x)!]
The above formula is ‘Binomial Probability Distribution’. The two constant quantities p and n are
called the parameters of a Binomial Distribution. The quantity q is not a separate parameter because q =
1 – p.
The mean and variance of a binomial distribution are directly evaluated in terms of its parameters p and n:
μ = n · p
σ² = n · p · q
Example:
A coin is tossed 3 times. ‘Number of heads’ in 3 tosses is the random variable X. Calculate probabilities of all
possible values of X. Also calculate mean and variance.
Solution:
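Success: Head
p = P(success) = P(head) = ½, q = 1 – p = ½, n = 3
x = 0, 1, 2, 3.
P(x=0) = C(3, 0) · (½)^0 · (½)^3 = 1/8
P(x=1) = C(3, 1) · (½)^1 · (½)^2 = 3/8
P(x=2) = C(3, 2) · (½)^2 · (½)^1 = 3/8
P(x=3) = C(3, 3) · (½)^3 · (½)^0 = 1/8
Mean = n · p = 3 × ½ = 1.5
Variance = n · p · q = 3 × ½ × ½ = 0.75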
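The coin-toss example can be checked with a short Python sketch. The helper name `binomial_pmf` is an assumption of this sketch; it simply evaluates C(n, x) · p^x · q^(n – x) for every x, and the mean and variance are then computed from the definition and compared with n · p and n · p · q.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for a binomial random variable."""
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

n, p = 3, Fraction(1, 2)          # three tosses of a fair coin
pmf = binomial_pmf(n, p)

# Mean and variance from the definition E(X) and E(X^2) - [E(X)]^2
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2

print(pmf)              # probabilities for x = 0, 1, 2, 3
print(mean, variance)   # should agree with n*p and n*p*q
```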
1. It is a formula to determine the probabilities of the values for a random variable called ‘Hyper
Geometric Random Variable’.
2. Following are the conditions of hyper geometric random variable:
(i) There are N items of which K are of first kind and the remaining (N – K) are of second
kind,
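(ii) A sample of n items is randomly drawn without replacement from the N items.
If x is the number of items of the first kind in the sample, then:
P(X = x) = C(K, x) · C(N – K, n – x) / C(N, n)
The above formula is called ‘Hyper Geometric Probability Distribution’. A schematic explanation of
this formula may be given as: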
Example:
A committee of 3 persons is to be formed from among 3 men and 2 women. If the selection of the committee
members is random, construct the probability distribution of the random variable ‘Number of women in the
committee’.
Solution:
The Hyper Geometric Probability Distribution of RV ‘No. of Women in the Committee’ is as follows:
X P(X)
0 0.1
1 0.6
2 0.3
Total 1
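The committee table can be reproduced with a minimal Python sketch. The function name `hypergeometric_pmf` and its argument names are assumptions of this sketch; N, K and n follow the notation of the conditions above (N = 5 persons, K = 2 women, n = 3 committee members).

```python
from math import comb

def hypergeometric_pmf(x, N, K, n):
    """P(X = x): x items of the first kind in a sample of n drawn without
    replacement from N items, K of which are of the first kind."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Committee of n = 3 from N = 5 persons, K = 2 of whom are women
for x in range(3):
    print(x, hypergeometric_pmf(x, N=5, K=2, n=3))
```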
1. A random variable created by counting the number of items or events in a unit of either time or space is
called a ‘Poisson Random Variable’.
2. Examples of Poisson random variable are the number of accidents per day on a highway, number of cars
arriving at petrol pump in a five minute period of time, number of typing mistakes per page and number
of defects in a painted surface, etc.
3. A Poisson probability distribution formula assigns probabilities to the values of the ‘Poisson Random
Variable’:
P(X = x) = e^(–λ) · λ^x / x!
Where x = 0, 1, 2, 3, ……..
4. Here λ (lambda) is the only parameter of the distribution and e is the mathematical constant
2.71828……….. The following conditions are assumed:
(i) The number of events per unit of time or space remains stable for a long period of time. This is
the parameter of the distribution denoted by λ.
(ii) The number of events in one time period is independent of the number of events in another time
period.
Example:
In an industry, the average number of damaged output units per week is 10. What is the probability that there
will be (i) no damaged unit in the next week, (ii) 5 damaged units in the next week, and (iii) 15 damaged units
in the next week.
Solution:
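A minimal Python sketch of this calculation follows. It assumes the Poisson formula P(X = x) = e^(–λ) · λ^x / x! with λ = 10; the function name `poisson_pmf` is an assumption of this sketch.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 10  # average number of damaged units per week
for x in (0, 5, 15):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.6f}")
```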
The computations involved in the binomial distribution become quite tedious when n is large. In such cases
the binomial distribution can be approximated by a Poisson distribution with λ = n · p under the following
conditions:
(i) n is large,
(ii) p is very small, and
(iii) n · p is finite.
A frequently used rule of thumb is that the approximation is appropriate when p ≤ 0.05 and n ≥ 20. However,
the Poisson distribution sometimes provides close approximations even in cases where n is not large or p is
not very small.
Example:
In a village, the local government approximated that 2% of the population are infected with seasonal flu due to
absence of proper medication. What is the probability that the number of infected persons in a random sample
of 50 will be 4?
Solution:
λ = n · p = 50 × 0.02 = 1
P(X = 4) = e^(–1) · 1^4 / 4! = 0.3679 / 24 = 0.0153
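To see how close the approximation is in this example, the Poisson value can be compared with the exact binomial probability. The sketch below is illustrative only and uses the standard Poisson and binomial formulas with n = 50, p = 0.02 and x = 4.

```python
from math import comb, exp, factorial

n, p, x = 50, 0.02, 4
lam = n * p  # Poisson parameter used for the approximation

poisson = exp(-lam) * lam**x / factorial(x)
binomial = comb(n, x) * p**x * (1 - p)**(n - x)

print(f"Poisson approximation: {poisson:.4f}")
print(f"Exact binomial:        {binomial:.4f}")
```

The two values differ by less than 0.001 here, which is in line with the rule of thumb (p ≤ 0.05 and n ≥ 20).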
The mean of a Poisson Random Variable is the parameter of the Poisson distribution λ, and its variance is
also equal to λ, that is:
E(X) = λ
V(X) = λ
1. The concept of probability for a continuous random variable is somewhat different from that of a discrete
random variable.
2. The function or the formula of continuous probability distribution is generated and its curve is drawn on
a graph paper such that:
(i) the function is non-negative for all possible values of the random variable, and
(ii) the total area under the curve of the function is one.
This function is called ‘probability density function’ and its curve a ‘probability curve’.
3. The probability of an interval from a to b is defined as the area under the probability curve between the
two vertical lines erected on the x-axis at the points a and b.
4. The probability of an individual value under the continuous probability distribution is considered zero.
5. Probabilities of continuous random variable are represented by areas under the probability curve.
1. The most important and widely used probability density function is the ‘Normal Distribution’ whose
probability curve is a bell shaped symmetrical curve:
f(x) = [1 / (σ √(2π))] · e^(–(x – μ)² / (2σ²))
Where – ∞ < x < ∞
3. A normal probability distribution or its probability curve is characterised by two quantities μ and σ called
the parameters of the distribution.
4. Two normal curves with different means μ and equal standard deviations σ are as below:
5. The normal curves with different standard deviations σ and equal means μ:
6. Two normal curves with different means μ and different standard deviations σ:
1. The area between two limits of an interval under a normal probability curve cannot be determined
analytically.
2. Tables of areas evaluated numerically could have been constructed but it would be impossible for an
infinite number of normal curves for all values of μ and σ.
3. This problem is overcome by the ‘Standard Normal Probability Distribution’ whose mean is zero (μ = 0)
and standard deviation is one (σ = 1). The standard normal variable is denoted by ‘z’:
4. The table of areas under the standard normal curve is used to find area under normal probability curve:
5. Following steps are involved in determining the area or probability of a particular interval of a normal
distribution with μ and σ:
(ii) From the normal area table, determine the area for each z-value,
6. Precisely, a value of random variable ‘x’ can be converted to value ‘z’ by:
z = (x – μ) / σ
Where μ and σ are the mean and standard deviation of the random variable x.
7. Conversely, a value of ‘z’ can be converted back to a value of ‘x’ by:
x = μ + σ · z
8. ‘z’ is the number of standard deviations from or to the mean. All intervals containing the same number
of standard deviations from mean will contain the same area under the curve for any normal distribution.
9. ‘Normal Area Table’ gives the area under the curve to the left of a z-value. For example, for z = 1.51,
the Area under Normal Curve (as shown in the Table) is 0.9345; for z = – 2.69, the Area under Normal
Curve (from the Table) is 0.0036.
10. Some of the rules should be remembered:
Example:
A normal random variable x has mean µ = 24 and standard deviation σ = 1.8. Determine z values for x = 14,
15.9, 29.2 and 33. Also show these values on normal curve.
Solution:
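For x = 14; z = (14 – 24) / 1.8 = –5.56
For x = 15.9; z = (15.9 – 24) / 1.8 = –4.50
For x = 29.2; z = (29.2 – 24) / 1.8 = 2.89
For x = 33; z = (33 – 24) / 1.8 = 5.00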
Example:
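A normal random variable x has mean μ = 36 and standard deviation σ = 2.05. Determine the values of x for
z = –3.36, –1.8, 0.95 and 2.75.
Solution:
x = μ + σ · z
For z = –3.36; x = 36 + 2.05 (–3.36) = 29.112
For z = –1.8; x = 36 + 2.05 (–1.8) = 32.31
For z = 0.95; x = 36 + 2.05 (0.95) = 37.9475
For z = 2.75; x = 36 + 2.05 (2.75) = 41.6375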
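The two conversions used in the last two examples, z = (x – μ) / σ and x = μ + σ · z, can be sketched in Python as below. The function names `to_z` and `from_z` are assumptions of this sketch, not names from the text.

```python
def to_z(x, mu, sigma):
    """Standardise x: the number of standard deviations from the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Convert a z-value back to the original scale of x."""
    return mu + sigma * z

# First example: mu = 24, sigma = 1.8
print([round(to_z(x, 24, 1.8), 2) for x in (14, 15.9, 29.2, 33)])

# Second example: mu = 36, sigma = 2.05
print([round(from_z(z, 36, 2.05), 4) for z in (-3.36, -1.8, 0.95, 2.75)])
```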
Example:
The mean and SD of a normal random variable are 34.5 and 5.8 respectively. Find the following areas:
Solution:
Where
Where
Continuity Correction:
1. A population with unknown mean and standard deviation can be assumed to be a normal population if the
frequency distribution of a sample is symmetrical. The sample mean and sample standard deviation are
used as estimates of population mean and population standard deviation respectively.
2. Observations or data are always discrete, recorded up to a certain degree of accuracy irrespective of
whether the variable itself is discrete or continuous.
3. When the symmetrical distribution of any data is assumed to be normal, a continuity correction is
applied to the observed values to make the data continuous.
4. If the data are recorded in whole numbers, data values are considered as mid-points of the intervals x ±
0.5; if the data are recorded up to one decimal place, data values are considered as mid-points of the
intervals x ± 0.05, and so on. Note that the 0.5 or 0.05 is subtracted from the lower limit and added to
the upper limit (or ‘at most’ limit) of an interval.
A Binomial Distribution with large n and moderate p can be approximated by a Normal Distribution with mean
μ = n · p and standard deviation σ = √(n · p · q):
μ = n · p
σ = √(n · p · q)
Example:
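A pair of dice is rolled 800 times. What is the probability that a total of 6 occurs:
Solution:
n = 800
p = P(total of 6) = 5/36
q = 1 – p = 31/36
μ = n · p = 800 × 5/36 = 111.11
σ = √(n · p · q) = √(800 × 5/36 × 31/36) = 9.78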
(i) Probability of at least 100 times, i.e., P(100 ≤ x ≤ 800) or P(99.5 ≤ x ≤ 800.5):
P(–1.19 ≤ z ≤ 70.49)
From ‘Normal Area Table’ the Normal Area corresponding to – 1.19 is 0.1170
= 1 – 0.1170 = 0.8830
(ii) Probability of between 130 and 300 times, i.e., P(130 ≤ x ≤ 300) or P(129.5 ≤ x ≤ 300.5):
P(1.88 ≤ z ≤ 19.37)
From ‘Normal Area Table’ the Normal Area corresponding to 1.88 is 0.9699
= 1 – 0.9699 = 0.0301
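The dice example can be checked with a short Python sketch. It assumes p = 5/36 (five of the 36 outcomes of two dice give a total of 6) and replaces the Normal Area Table by the error function from the standard library; the names `phi` and `normal_approx` are assumptions of this sketch.

```python
from math import erf, sqrt

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 800, 5 / 36            # 5 of the 36 outcomes of two dice total 6
mu = n * p
sigma = sqrt(n * p * (1 - p))

def normal_approx(lower, upper):
    """P(lower <= X <= upper) with a continuity correction of 0.5."""
    z_low = (lower - 0.5 - mu) / sigma
    z_high = (upper + 0.5 - mu) / sigma
    return phi(z_high) - phi(z_low)

print(round(normal_approx(100, 800), 4))   # at least 100 times
print(round(normal_approx(130, 300), 4))   # between 130 and 300 times
```

Small differences from the table-based answers are expected, because the table is entered with z rounded to two decimal places.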
Regression
1. The term ‘regression’ was used by Sir Francis Galton in connection with the studies he made on the
statures of fathers and sons.
2. It is a technique which determines a relationship between two variables to estimate one of the variables
(dependent) for a given value of the other variable (independent).
3. The variable whose value is to be estimated is called dependent variable (y) whereas the variable whose
value is given is called independent variable (x).
4. Examples of dependent and independent variables are:
Independent Dependent
Price Demand
Rainfall Yield
Credit sales Bad debts
Volume of production Manufacturing expenses
5. The values of the independent variable are assumed to be fixed. Hence it is not a random variable. On
the other hand, the dependent variable, whose values are determined on the basis of the independent
variable, is a random variable.
6. If x is the independent variable and y is the dependent variable then the relationship between x and y,
described by a straight line (y = a + bx), is called ‘linear relationship’.
Regression Lines:
1. If we plot the paired observations (X₁, Y₁), (X₂, Y₂), ……….., (Xₙ, Yₙ) on a graph, the resulting set of points
is called a ‘scatter diagram’.
2. A scatter diagram indicates a relationship between the variables X and Y and the dots of the scatter
diagram tend to cluster around a curve or a line. Such a curve or line is known as ‘curve of regression’
or ‘line of regression’.
1. For a fixed value of independent variable ‘x’, if the value of dependent variable ‘y’ is observed a large
number of times, different values are possible each time because of the random error involved in the
measurement process. The mean of these y-values is called the ‘conditional mean of y given x’ and is
denoted by μ_{y/x}.
2. The linear relationship between μ_{y/x} and x is called a ‘population regression equation of y on x’:
μ_{y/x} = α + βx
3. An observation yᵢ is the sum of a population mean μ_{y/x} and a component called ‘Random Error (εᵢ)’
(read as “epsilon”):
yᵢ = μ_{y/x} + εᵢ
or
yᵢ = α + βxᵢ + εᵢ
This equation is called a ‘linear regression model of y on x’ and εᵢ is the random variable with mean
equal to zero and variance σ².
4. In the above diagram, the line represents the line of regression of Y on X. The parameter α, which is the
expected value of Y when X = 0, is called Y-intercept. The parameter β is slope of the population
regression line and is known as the ‘population regression coefficient’. When the line slopes
downward to the right, the value of β will be negative; it then represents the amount of decrease in Y
for each unit increase in X.
5. In practice, the population regression line is unknown. Since the regression is defined by the Y-intercept
α and the slope β, therefore, the task of estimating the population regression line involves obtaining the
estimates of α and β (based on sample data). Thus the ‘population regression line’ (μ_{y/x} = α + βx) is
estimated by the ‘sample regression line’ or ‘sample regression equation’:
ŷ = a + bx ------------------------ (i)
The problem of estimating the regression parameters α and β can be considered as fitting the best model
on the scatter diagram. One method for this purpose is the ‘method of least squares’.
1. According to the principle of least squares, a line or a curve is best fitted if the sum of squares of the
deviations of estimated values of y from the observed values of y is minimum. Such a line or curve is
called the ‘least square curve’ or ‘least square line’, and the sum of squares is called the ‘Error Sum of
Squares (ESS)’. Therefore, the ESS is to be minimised and is represented by:
ESS = Σ(yᵢ – ŷᵢ)²
yᵢ : observed values
ŷᵢ : estimated values
2. The statistic b is the estimator of β, is called the ‘sample regression coefficient’, and it measures the
slope of the sample regression line:
b = [n Σxy – (Σx)(Σy)] / [n Σx² – (Σx)²] ------------------------ (ii)(a)
b = Σ(x – x̄)(y – ȳ) / Σ(x – x̄)² ------------------------ (ii)(b)
3. The statistic a is the estimator of α, is called the ‘sample regression constant’, and it measures the y-
intercept of the sample regression line:
a = ȳ – b x̄ ------------------------ (iii)
4. Now assume ‘y’ to be ‘independent’ and ‘x’ to be ‘dependent’. The ‘regression equation of x on y’ is as
follows:
x̂ = c + dy ------------------------ (i)
d = [n Σxy – (Σx)(Σy)] / [n Σy² – (Σy)²] ------------------------(ii)(a)
d = Σ(x – x̄)(y – ȳ) / Σ(y – ȳ)² ------------------------(ii)(b)
c = x̄ – d ȳ ------------------------(iii)
Example:
X 2 4 6 7 9 10 11
Y 1 2 4 7 10 12 14
Required:
(b) Construct a scatter diagram and graph the fitted line on the scatter diagram, and
Solution:
(a):
Regression Line of Y on X
x      y     xy     x²     ŷ             y – ŷ       (y – ŷ)²
2      1     2      4      –0.438        1.438       2.068
4      2     8      16     2.594         –0.594      0.353
6      4     24     36     5.626         –1.626      2.644
7      7     49     49     7.142         –0.142      0.020
9      10    90     81     10.174        –0.174      0.030
10     12    120    100    11.69         0.31        0.096
11     14    154    121    13.206        0.794       0.630
Total  49    50  447  407  49.994 ≈ 50   0.006 ≈ 0   5.841
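ŷ = a + bx -------------------- (i)
b = [n Σxy – (Σx)(Σy)] / [n Σx² – (Σx)²] = [7(447) – (49)(50)] / [7(407) – (49)²] = 679 / 448 = 1.516 -------------------- (ii)
a = ȳ – b x̄ = 7.143 – 1.516 (7) = –3.47 ------------------------- (iii)
Hence ŷ = –3.47 + 1.516x
For x = 2, ŷ = –3.47 + 1.516 (2) = –0.438
x = 4, ŷ = –3.47 + 1.516 (4) = 2.594
x = 6, ŷ = –3.47 + 1.516 (6) = 5.626
x = 7, ŷ = –3.47 + 1.516 (7) = 7.142
x = 9, ŷ = –3.47 + 1.516 (9) = 10.174
x = 10, ŷ = –3.47 + 1.516 (10) = 11.69
x = 11, ŷ = –3.47 + 1.516 (11) = 13.206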
(b):
ESS = Σ(y – ŷ)²
= 5.841
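The least-squares calculation for this example can be sketched in Python as follows. The slope and intercept formulas used are the usual least-squares ones, b = [n Σxy – (Σx)(Σy)] / [n Σx² – (Σx)²] and a = ȳ – b x̄; the variable names are assumptions of this sketch.

```python
x = [2, 4, 6, 7, 9, 10, 11]
y = [1, 2, 4, 7, 10, 12, 14]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi**2 for xi in x)

# Least-squares slope and intercept of the regression line of y on x
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a = sum_y / n - b * sum_x / n

# Error sum of squares: squared deviations of observed from estimated y
y_hat = [a + b * xi for xi in x]
ess = sum((yi - yh)**2 for yi, yh in zip(y, y_hat))

print(f"y-hat = {a:.3f} + {b:.3f} x, ESS = {ess:.3f}")
```

Carrying full precision gives a ≈ –3.467 rather than the rounded –3.47 used in the worked solution, which is why the hand-computed column totals are only approximately 50 and 0.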
Coefficient of Determination:
1. Σ(y – ȳ)² measures the variation in y about the sample mean ȳ. This term is called ‘Total Sum of
Squares (TSS)’.
2. Σ(y – ŷ)² measures the variation in y about the estimated regression line. This term is called the
‘Error Sum of Squares (ESS)’:
ESS ≤ TSS
3. The ‘Regression Sum of Squares (RSS)’ is the difference or excess of TSS over ESS:
RSS = TSS – ESS
Therefore, the TSS is partitioned into two components, i.e., ESS and RSS:
TSS = ESS + RSS
4. RSS is the variation in y reduced (or explained) by the regression equation and the ESS is the variation
which remains (or is unexplained) in y when the regression line is fitted. Thus, the total variation is divided
into two, i.e., explained variation and unexplained variation.
5. RSS is used as a measure of reliability of the estimate obtained by the fitted regression line. For this
purpose the proportion of variation explained by the regression equation, called ‘Coefficient of
Determination’ denoted by r², is calculated as:
r² = RSS / TSS = 1 – (ESS / TSS)
Note that the minimum value of r² is zero (when RSS = 0 and ESS = TSS), and the maximum value of r²
is +1 (when RSS = TSS and ESS = 0); therefore, r² lies between 0 and 1:
0 ≤ r² ≤ 1
r² = b × d, where b and d are the regression coefficients of y on x and of x on y respectively.
Example:
Solution:
Coefficient of Determination
x y xy x² y²
2 1 2 4 1
4 2 8 16 4
6 4 24 36 16
7 7 49 49 49
9 10 90 81 100
10 12 120 100 144
11 14 154 121 196
Total 49 50 447 407 510
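Using only the column totals of the table above, the coefficient of determination can be sketched in Python as below. The sketch relies on the least-squares identity RSS = b · S_xy, where S_xy is the corrected sum of products; the variable names are assumptions of this sketch.

```python
n, sum_x, sum_y = 7, 49, 50
sum_xy, sum_x2, sum_y2 = 447, 407, 510

# Corrected sums of squares and products
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x**2 / n
tss = sum_y2 - sum_y**2 / n       # total sum of squares

b = s_xy / s_xx                   # regression coefficient of y on x
d = s_xy / tss                    # regression coefficient of x on y
rss = b * s_xy                    # regression (explained) sum of squares
ess = tss - rss                   # error (unexplained) sum of squares

r_squared = rss / tss
print(round(r_squared, 4), round(b * d, 4), round(ess, 3))
```

The two printed values of r² coincide, which illustrates the relation r² = b × d, and the ESS agrees with the 5.841 obtained in the regression example.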
p = x / n
Where x is the number of successes (values with a specified characteristic) in a sample of size n.
2. If the sampling procedure is simple random, with replacement, x is recognised as Binomial Random
Variable with parameters n and π, π is the probability of success. π can also be interpreted as the
population proportion, since:
Example:
A coordination team consists of seven members. The education of each member as follows: (G = Graduate, PG
= Post Graduate)
Members 1 2 3 4 5 6 7
Education G PG PG PG PG G G
(ii) Select all possible samples of two members from the population without replacement, and compute
the proportion of post-graduate members in each sample.
(iii) Compute the mean (μ_p) and the SD (σ_p) of the sample proportions computed in (ii).
Solution:
N=7
No. of PG = 4
π = 4/7 = 0.57
or alternatively
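Parts (ii) and (iii) can be checked by brute force with the Python sketch below, which lists all samples of two members without replacement and compares the SD of the sample proportions with the usual finite-population-corrected formula √[π(1 – π) / n] · √[(N – n) / (N – 1)]. The variable names are assumptions of this sketch.

```python
from itertools import combinations
from math import sqrt

education = ["G", "PG", "PG", "PG", "PG", "G", "G"]   # members 1 to 7
N, n = len(education), 2
pi = education.count("PG") / N                        # population proportion

# Proportion of post-graduates in every possible sample of size 2
proportions = [sample.count("PG") / n for sample in combinations(education, n)]

mu_p = sum(proportions) / len(proportions)
sigma_p = sqrt(sum((p - mu_p)**2 for p in proportions) / len(proportions))

# Theoretical SD with the finite population correction (f.p.c.)
sigma_theory = sqrt(pi * (1 - pi) / n) * sqrt((N - n) / (N - 1))

print(len(proportions), round(mu_p, 4), round(sigma_p, 4), round(sigma_theory, 4))
```

The mean of the 21 sample proportions equals π = 4/7, and the enumerated SD matches the corrected formula.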
The central limit theorem also holds for the random variable p, which states that:
(i) The sampling distribution of proportion p approaches a normal distribution with mean μ_p = π and
SD σ_p = √[π(1 – π) / n] (with replacement)
(ii) If the random sampling is without replacement and the sampling fraction n/N is not negligible, the f.p.c. must
be used as below in the formula of SD:
σ_p = √[π(1 – π) / n] · √[(N – n) / (N – 1)]
(iii) When n ≥ 50 and both n·π and n(1 – π) are greater than 5, the sampling distribution can be
considered ‘normal’.
(iv) When the distribution of p is normal, the following statistic will be a standard normal variable:
z = (p – π) / σ_p
1. If two random samples of size n₁ and n₂ are drawn independently from two populations with
proportions π₁ and π₂, the sampling distribution of (p₁ – p₂), the difference between two sample
proportions, approaches normal distribution with:
Mean = π₁ – π₂
SD = √[π₁(1 – π₁) / n₁ + π₂(1 – π₂) / n₂]
as n₁ and n₂ increase.
Moreover:
3. When the two unknown population proportions can be assumed equal, an estimate of the common
proportion is obtained as below:
Sampling Distribution of t:
1. If a random sample of size n is drawn from a known Normal Population with mean μ and SD σ, the
sampling distribution of the sample mean x̄ is a normal distribution with mean μ and standard
error σ/√n, so that z = (x̄ – μ) / (σ/√n) is a standard normal variable.
2. But when the population SD σ is unknown, the value of σ is replaced by the sample SD ‘S’,
as given below:
(x̄ – μ) / (S/√n)
3. According to W.S. Gosset, the following statistic is denoted by ‘t’ instead of ‘z’, and it follows another
distribution known as ‘Student’s t-distribution’ or simply ‘t-distribution’:
t = (x̄ – μ) / (S/√n)
4. The sample standard deviation is given by:
S = √[Σ(x – x̄)² / (n – 1)]
In the above equation the (n – 1) is called ‘Degrees of Freedom’ or simply d.f., through which we can
obtain the ‘t-value’ from the ‘t-table’.
5. The t-distribution approaches the standard normal distribution as n increases. Typically when n > 30, the t-
distribution is considered approximately standard normal.
Properties of t-distribution:
1. The t-distribution, like the standard normal, is bell shaped, unimodal and symmetrical about the mean,
2. There is a different t-distribution for every possible sample size,
3. The exact shape of t-distribution, depends on the parameter, the number of degrees of freedom, denoted
by ν.
4. As the sample size increases, the shape of t-distribution becomes approximately equal to the standard
normal distribution:
Population Variance:
or alternatively
Example:
A population consists of the following numbers: 1, 3, 5, 7. Find the population variance (σ²) and the mean of
the sampling distribution of variances (μ_S²), if all samples of size 2 are drawn with replacement from the
population.
Solution:
Samples:
Means of samples:
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
Variances of samples:
0 1 4 9
1 0 1 4
4 1 0 1
9 4 1 0
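The two tables above can be reproduced with the Python sketch below. It assumes, as the tabulated values imply (for example, the sample (1, 3) has variance 1), that each sample variance is computed with divisor n; the helper name `variance` is an assumption of this sketch.

```python
from itertools import product

population = [1, 3, 5, 7]
n = 2

def variance(values):
    """Variance with divisor len(values), as in the tables above."""
    m = sum(values) / len(values)
    return sum((v - m)**2 for v in values) / len(values)

sigma2 = variance(population)

# All 16 ordered samples of size 2 drawn with replacement
samples = list(product(population, repeat=n))
sample_variances = [variance(s) for s in samples]
mean_of_variances = sum(sample_variances) / len(sample_variances)

print(sigma2, mean_of_variances)
```

With the divisor n, the mean of the sample variances equals σ²(n – 1)/n, here 5 × 1/2 = 2.5, so a sample variance defined this way underestimates σ² on average.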
1. If random samples of size n₁ and n₂ are drawn independently from two normal populations with means
μ₁ and μ₂ and variances σ₁² and σ₂², the sampling distribution of the difference between the sample
means (x̄₁ – x̄₂) follows a normal distribution with mean and standard error given as below:
Mean = μ₁ – μ₂
Standard error = √(σ₁²/n₁ + σ₂²/n₂)
2. But if σ₁² and σ₂² are unknown and equal, their estimators S₁² and S₂² are defined as:
When σ₁² and σ₂² are replaced by the estimators S₁² and S₂², the distribution of (x̄₁ – x̄₂) can be
standardised provided that the samples are large (n₁ and n₂ > 30).
3. But when samples are small, i.e., 30 or fewer (n₁ and n₂ ≤ 30), σ₁² and σ₂² are replaced by a single
estimator known as ‘pooled variance’ denoted by S_p²:
4. With samples of the same size (n₁ = n₂), the estimator S_p² is the simple average of S₁² and S₂²:
S_p² = (S₁² + S₂²) / 2
5. The pooled variance S_p² assumes that the population variances are unknown and equal. However, the
same S_p² is used to replace σ₁² and σ₂² for slightly unequal population variances provided that the
samples are of equal size, i.e., n₁ = n₂.
6. In both of the above situations, i.e., equal population variances and slightly unequal population variances
with equal samples (i.e., n₁ = n₂), the statistic t is calculated as below:
7. Now consider the situation where σ₁² and σ₂² are considerably different (both unknown) and it is
impossible to draw samples of equal size; the statistic used in this case would be:
(a) Simple Average Method: Under this method, the average (ȳ) of all the monthly or quarterly
values for each year is found out. Each monthly or quarterly value (yᵢ) is divided by the
corresponding average and the result is expressed as a percentage:
(yᵢ / ȳ) × 100
Then the mean index or seasonal index (Sᵢ) is calculated for each month or quarter. If the mean of
all seasonal indices is not equal to 100, they are adjusted so that it equals 100.
Example 8:
Take data from Example 1, and calculate the four seasonal indices by the ‘simple-average method’.
Solution:
Quarterly values (y):
Year   I     II    III   IV    Mean
2003   219   357   645   513   433.5
2004   549   640   701   590   620
2005   657   394   543   600   548.5
Now the above observed values are converted to indices using the following formula:
Index = (y / yearly mean) × 100
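The steps of the simple average method can be sketched in Python with the quarterly data of the table above. The variable names are assumptions of this sketch; the final rescaling step is included for completeness even though, for these data, the indices already average 100.

```python
data = {
    2003: [219, 357, 645, 513],
    2004: [549, 640, 701, 590],
    2005: [657, 394, 543, 600],
}

# Step 1: express each quarterly value as a percentage of its yearly mean
percentages = {
    year: [100 * y / (sum(values) / len(values)) for y in values]
    for year, values in data.items()
}

# Step 2: average the percentages quarter by quarter
seasonal = [
    sum(percentages[year][q] for year in data) / len(data)
    for q in range(4)
]

# Step 3: rescale so that the indices average exactly 100
scale = 100 / (sum(seasonal) / len(seasonal))
seasonal = [s * scale for s in seasonal]

print([round(s, 2) for s in seasonal])
```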
(b) Link Relative Method: Under this method, the data for each month or quarter are expressed in
percentage, known as ‘Link Relatives’. An appropriate average of the link relatives is taken, usually
a median is taken. Convert these averages into a series of chain indices. The chain indices are
adjusted for the fraction of the effect of the trend. The adjusted chain indices are further reduced to
the same level as the first month or quarter.
Example 9:
Take the data from Example 1, and calculate seasonal indices by using ‘link relative method’.
Solution:
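The observed values are converted into price relatives or link relatives using the following formula:
Link relative = (current value / previous value) × 100
and then the link relatives are converted into chain indices (chaining process) using the following formula:
Chain index = (link relative × chain index of the previous period) / 100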
(c) Ratio to Moving Average Method: A centred 12-month or 4-quarter moving average is computed.
The observed values are divided by the corresponding centred moving average and the results are
expressed in percentage:
(y / centred moving average) × 100
The monthly or quarterly averages of these percentages are found out. The adjusted values are the
indices of the seasonal variations.
Example 10:
Take data from Example 1 and calculate seasonal indices using ‘ratio to moving average method’.
Solution:
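A minimal Python sketch of the ratio to moving average method is given below. It assumes that the twelve quarterly values tabulated in Example 8 are the Example 1 data, and it adjusts the quarterly averages by rescaling them to a mean of 100; the variable names are assumptions of this sketch.

```python
y = [219, 357, 645, 513,    # 2003, quarters I-IV
     549, 640, 701, 590,    # 2004
     657, 394, 543, 600]    # 2005

# 4-quarter moving averages, then centred by averaging adjacent pairs
ma4 = [sum(y[i:i + 4]) / 4 for i in range(len(y) - 3)]
centred = [(ma4[i] + ma4[i + 1]) / 2 for i in range(len(ma4) - 1)]

# centred[k] belongs to observation k + 2 (the third quarter onwards)
ratios = {k + 2: 100 * y[k + 2] / c for k, c in enumerate(centred)}

# Average the ratios quarter by quarter, then rescale to a mean of 100
by_quarter = [[r for i, r in ratios.items() if i % 4 == q] for q in range(4)]
means = [sum(rs) / len(rs) for rs in by_quarter]
seasonal = [m * 100 / (sum(means) / 4) for m in means]

print([round(s, 2) for s in seasonal])
```

The first two and the last two quarters have no centred moving average, so each seasonal index here rests on only two ratios.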
(d) Ratio to Trend Method: An average for each year is found out and a straight line is fitted by least
squares method. The trend values for each month or quarter are calculated on the assumption that
the data correspond to the middle of the month or quarter. Each original value is divided by the
corresponding calculated trend value and expressed in percentage. The mean of these percentages is
calculated for each month or quarter. The adjusted values are the indices of seasonal variation.
Example 11:
Take data from Example 1 and calculate seasonal indices using ‘ratio to trend method’.
Solution:
Quarters   y     x     Trend (ŷ)   (y / ŷ) × 100
2003 I     219   –11   455         48.13%
     II    357   –9    469         76.12
     III   645   –7    484         133.26
     IV    513   –5    498         103.01
Yearly mean for 2003: ȳ = 433.5, x = –8, x · ȳ = –3468, x² = 64
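The trend values and the ratios to trend can be sketched in Python as below. The coding of time is an assumption inferred from the table: quarters are placed at x = –11, –9, …, 11, so that the yearly means fall at x = –8, 0, 8. The remaining variable names are assumptions of this sketch.

```python
y = [219, 357, 645, 513, 549, 640, 701, 590, 657, 394, 543, 600]
yearly_means = [sum(y[i:i + 4]) / 4 for i in range(0, 12, 4)]

# Coded time: quarters at -11, -9, ..., 11, so the year centres fall at -8, 0, 8
x_quarters = list(range(-11, 12, 2))
x_years = [-8, 0, 8]

# Least-squares line through the yearly means (the sum of x_years is zero)
a = sum(yearly_means) / len(yearly_means)
b = sum(x * m for x, m in zip(x_years, yearly_means)) / sum(x**2 for x in x_years)

trend = [a + b * x for x in x_quarters]
ratio_to_trend = [100 * yi / t for yi, t in zip(y, trend)]

print([round(t) for t in trend])
print([round(r, 2) for r in ratio_to_trend[:4]])
```

The ratios differ slightly in the second decimal from those in the table, because the table divides by trend values rounded to whole numbers.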
The cyclical and random components of a time series are first isolated from the time series using the
multiplicative model:
yᵢ = Tᵢ × Sᵢ × Cᵢ × Rᵢ
so that Cᵢ × Rᵢ = yᵢ / (Tᵢ × Sᵢ). The random component Rᵢ is then separated from the time series by using
the smoothing technique, moving average. These moving averages show the indices of cyclical variation.
Example 12:
Take data from Example 1 and isolate cyclical component from the time series.
Solution:
Quarters   y     Tᵢ*   Sᵢ**     Tᵢ × Sᵢ   Cᵢ × Rᵢ = y / (Tᵢ × Sᵢ) × 100   3-Quarter Moving Average (Cᵢ)
2003 I     219   455   86.28    392.57    55.79%                          –
     II    357   469   85.80    402.40    88.72                           85.10
     III   645   484   120.28   582.16    110.79                          98.41
     IV    513   498   107.63   536.00    95.71                           110.26
2004 I     549   512   86.28    441.75    124.28                          120.51
     II    640   527   85.80    452.17    141.54                          124.52
     III   701   541   120.28   650.71    107.73                          115.95
     IV    590   556   107.63   598.42    98.59                           113.30
2005 I     657   570   86.28    491.80    133.59                          103.60
     II    394   584   85.80    501.07    78.63                           95.86
     III   543   599   120.28   720.48    75.37                           81.74
     IV    600   613   107.63   657.77    91.22                           –
* As calculated in the previous example
** As calculated in Example