Session 1
Meaning of Statistics
Duration: 1 hr
The term statistics refers both to numerical statements and to statistical methodology. When it is used in the sense of statistical data, it refers to the quantitative aspects of things and is a numerical description. Examples: the income of a family, the production of the automobile industry, the sales of cars. These quantities are numerical. There are other quantities which are not numerical in themselves but can be made so by counting. The sex of a baby is not a number, but by counting the number of boys we can attach a numerical description to the sex of all newborn babies, for example by saying that 60% of all live-born babies are boys. This information then comes within the realm of statistics.
Definition
The word statistics can be used in two senses, viz. singular and plural. In the narrow, plural sense, statistics denotes numerical data (statistical data). In the wide, singular sense, statistics refers to statistical methods. The definitions are therefore grouped under two heads: Statistics as data and Statistics as a method.
Statistics as Data
Some definitions of statistics as data are:
a) Statistics are numerical statements of facts in any department of enquiry placed in relation to each other. - Bowley
b) By statistics we mean quantitative data affected to a marked extent by multiplicity of causes. - Yule and Kendall
c) By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other. - H. Secrist
The last definition is the most comprehensive and exhaustive. It throws light on the characteristics of statistics and covers different aspects.
The characteristics that statistics should possess, according to H. Secrist, can be listed as follows.
- Statistics are aggregates of facts.
- Statistics are affected to a marked extent by multiplicity of causes.
- Statistics are numerically expressed.
- Statistics should be enumerated or estimated.
- Statistics should be collected with a reasonable standard of accuracy.
- Statistics should be placed in relation to each other.
Statistics as a method
Definition
a) Statistics may be called the science of counting. - A.L. Bowley
b) Statistics is the science of estimates and probabilities. - Boddington
c) Croxton and Cowden have given a clear and concise definition: statistics may be defined as the collection, presentation, analysis and interpretation of numerical data.
The definition of Croxton and Cowden names four stages. Together with the organisation of data, which falls between collection and presentation, the stages are as follows.
a) Collection of data
A statistical investigation is built on a systematic collection of data. The data are classified into two groups: i) internal data and ii) external data. Internal data are obtained from the internal records of a business organisation, such as production, sources of income and expenditure, inventory, purchases and accounts. External data are collected by, or purchased from, external agencies. External data may be either primary or secondary. Primary data are collected for the first time and are original, while secondary data have already been collected and published by other agencies.
b) Organisation of data
The collected data form a large mass of figures that needs to be organised. The data must be edited to rectify omissions, irrelevant answers and wrong computations. The edited data must then be classified and tabulated to suit further analysis.
c) Presentation of data
A large mass of collected data cannot be understood and analysed easily and quickly. Therefore, the collected data need to be presented in tabular or graphic form. This systematic ordering and graphical presentation helps further analysis.
d) Analysis of data
Analysis requires establishing the relationship between one or more variables. It includes condensation, abstraction, summarisation, conclusion etc. The analysis can be done with statistical tools and techniques such as measures of central tendency, measures of dispersion, correlation and variance analysis.
e) Interpretation of data
Interpretation requires deep insight into the subject. It involves drawing valid conclusions on the basis of the analysis of the data. This work requires good experience and skill. The step is very important because the conclusions of a study rest on the interpretation.
Following Seligman, we can define statistics as follows: statistics is a science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw light on an enquiry.
Importance of statistics
In today's context statistics is indispensable. As the use of statistics has extended to various fields of experiment in order to draw valid conclusions, its importance and usage have increased. Research investigations in the fields of economics and commerce are largely statistical. The importance of statistics in various fields is listed below.
a) State affairs: In state affairs, statistics is useful in the following ways.
1. To collect information and study the economic condition of the people in the state.
2. To assess the resources available in the state.
3. To help the state decide whether to accept or reject a policy.
4. To provide information and analysis on various factors of the state such as wealth, crimes, agriculture, exports, education etc.
b) Economics: In economics, statistics is useful in the following ways.
1. Helps in the formulation of economic laws and policies.
2. Helps in studying economic problems.
3. Helps in compiling the national income accounts.
4. Helps in economic planning.
c) Business
1. Helps to take decisions on location and size.
2. Helps to study demand and supply.
3. Helps in forecasting and planning.
4. Helps in controlling the quality of the product or process.
5. Helps in making marketing decisions.
6. Helps in production planning and inventory management.
7. Helps in business risk analysis.
8. Helps in estimating long-term resource requirements and consumers' preferences, and in business research.
d) Education: Statistics is necessary to formulate policies on starting new courses and to assess the facilities available for proposed courses.
e) Accounts and audits
1. Helps to study the correlation between profits and dividends, which makes it possible to know the trend of future profits.
2. In auditing, sampling techniques are followed.
Functions of statistics
Some important functions of statistics are as follows.
1. Collects and presents facts in a systematic manner.
2. Helps in the formulation and testing of hypotheses.
3. Facilitates the comparison of data.
4. Helps in predicting future trends.
5. Helps to find the relationship between variables.
6. Simplifies a mass of complex data.
7. Helps to formulate policies.
8. Helps the Government to take decisions.
Limitations of statistics
1. Statistics does not study qualitative phenomena.
2. Statistics does not deal with individual items.
3. Statistical results are true only on an average.
4. Statistical data should be uniform and homogeneous.
5. A statistical result depends on the accuracy of the data.
6. Statistical conclusions are not universally true.
7. Statistical results can be interpreted only by a person with a sound knowledge of statistics.
Distrust of Statistics
Distrust of statistics arises from a lack of knowledge of the subject and of the limits of its use, not from statistical science itself. The reasons for distrust are as follows.
a) Figures are manipulated or incomplete.
b) Figures are quoted without their context.
c) Definitions are inconsistent.
d) Non-representative statistical units are selected.
e) Inappropriate comparisons are made.
f) Wrong inferences are drawn.
g) Errors are made in data collection.
Statistical Data
Statistical investigation is a long and comprehensive process and requires the systematic collection of a large volume of data. The validity and accuracy of the conclusions of a study depend upon how well the data were gathered. Because the quality of the data greatly influences the conclusions, importance must be given to the data collection process. Statistical data may be classified as primary data and secondary data, based on the source of collection.
Primary data
Primary data are those which are collected for the first time by the investigator or researcher and are thus original in character. They are usually collected for the specific purpose or study at hand. Primary data are in the shape of raw material to which statistical methods are applied for the purpose of analysis and interpretation.
Secondary data
Secondary data have already been collected for a purpose other than the problem at hand. They have been collected by some other person and have passed through statistical analysis at least once. Secondary data are usually in the shape of finished products, since they have already been treated statistically in one form or another. After statistical treatment, primary data lose their original shape and become secondary data. Data that are secondary for one organisation are primary for the organisation that first collected and published them.
f) The dependability
Sources of data
Primary source
When data are neither available internally nor exist as a secondary source, primary sources of data are appropriate. The various methods of collecting primary data are as follows.
a) Direct personal investigation
   - Interview
   - Observation
b) Indirect or oral investigation
c) Information from local agents and correspondents
d) Mailed questionnaires and schedules
e) Through enumerators
Trade association publications
Ex: Publications of sugar factories, textile mills, the Indian Chamber of Industry and Commerce; stock exchange reports, co-operative society reports etc.
Newspapers and periodicals
Ex: The Financial Express, Eastern Economics, Economic Times, Indian Finance etc.
Reports of various committees and commissions
Ex: Kothari commission report on education, Pay commission reports, Land reform committee reports etc.
Internal and administrative data
Ex: Periodical loss, profit, sales, production rate, balance sheet, labour turnover, budgets etc.
Unpublished statistics -
Session 2
Classification and Tabulation
Duration: 1 hr
The data collected for a statistical inquiry sometimes consist of a few fairly simple figures, which can be easily understood without any special treatment. More often there is an overwhelming mass of raw data without any structure. Such an unwieldy, unorganised and shapeless mass of collected data cannot be rapidly or easily assimilated or interpreted. Unorganised data are not fit for further analysis and interpretation. To make the data simple and easily understandable, the first task is to condense and simplify them in such a way that irrelevant data are removed and their significant features stand out prominently. The procedure adopted for this purpose is known as the method of classification and tabulation. Classification helps proper tabulation.
"Classified and arranged facts speak for themselves; unarranged and unorganised, they are as dead as mutton." - Prof. J.R. Hicks
Meaning of Classification
Classification is the process of arranging things or data in groups or classes according to their resemblances and affinities. It gives expression to the unity of attributes that may subsist among a diversity of individuals.
Definition of Classification
Classification is the process of arranging data into sequences and groups according to their common characteristics or separating them into different but related parts. - Secrist
The process of grouping a large number of individual facts and observations on the basis of similarity among the items is called classification. - Stockton & Clark
Characteristics of classification
a) Classification performs homogeneous grouping of data.
b) It brings out points of similarity and dissimilarity.
c) The classification may be either real or imaginary.
d) Classification is flexible enough to accommodate adjustments.
a) Geographical Classification
In geographical classification, the data are classified by geographical region.
Ex: Sales of the company (in million rupees), region-wise

Region    Sales
North     285
South     300
East      185
West      235

b) Chronological Classification
If the statistical data are classified according to the time of their occurrence, the classification is called chronological classification.
Ex: Sales reported by a departmental store

Month       Sales (Rs. in lakhs)
January     22
February    26
March       32
April       25
May         27
June        30

c) Qualitative Classification
In qualitative classification, the data are classified according to the presence or absence of attributes in the given units. The classification is thus based on some quality characteristic or attribute.
Ex: Sex, literacy, education, class grade etc.
It may be further classified as a) simple classification and b) manifold classification.
i) Simple classification: If the data are divided into only two classes, the classification is known as simple classification.
Ex: a) Population into Male / Female b) Population into Educated / Uneducated
ii) Manifold classification: In this classification, the data are classified on the basis of more than one attribute at a time.
Ex:
Population
    Smokers
        Literate: Male, Female
        Illiterate: Male, Female
    Non-smokers
        Literate: Male, Female
        Illiterate: Male, Female
d) Quantitative Classification
In quantitative classification, the data are classified on the basis of quantitative measurements of some characteristic, such as age, marks, income, production or sales. The quantitative phenomenon under study is known as a variable, and hence this classification is also called classification by variable.
Ex: For a 50 marks test, the marks obtained by students are classified as follows.

Marks     No. of students
0-10      5
10-20     7
20-30     10
30-40     25
40-50     3

Total students = 50
In this classification, the marks obtained by students are the variable and the number of students in each class is the frequency.
Classification of tables
Tables are classified based on
1. Coverage (simple and complex table)
2. Objective / purpose (general purpose or reference table, special or summary table)
3. Nature of inquiry (primary and derived table)
Ex:
a) Simple table: Data are classified based on only one characteristic.
Distribution of marks

Class marks   No. of students
30-40         20
40-50         20
50-60         10
Total         50

b) Two-way table: Classification is based on two characteristics.

Class marks   No. of students
              Boys   Girls   Total
30-40         10     10      20
40-50         15     5       20
50-60         3      7       10
Total         28     22      50
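The totals in a two-way table can be checked mechanically. The following is a minimal sketch that takes the boys and girls figures from the example above and recomputes the row, column and grand totals; the variable names are illustrative only.

```python
# Two-way table from the example: marks class by sex.
classes = ["30-40", "40-50", "50-60"]
boys = [10, 15, 3]
girls = [10, 5, 7]

# Row totals: students per marks class, summed over both sexes.
row_totals = [b + g for b, g in zip(boys, girls)]

# Column totals and grand total.
boys_total = sum(boys)
girls_total = sum(girls)
grand_total = boys_total + girls_total

for label, b, g, t in zip(classes, boys, girls, row_totals):
    print(f"{label:>6} {b:>5} {g:>5} {t:>5}")
print(f"{'Total':>6} {boys_total:>5} {girls_total:>5} {grand_total:>5}")
```

The grand total must equal both the sum of the row totals and the sum of the column totals, which is a quick consistency check for any two-way table.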
Frequency Distribution
Frequency distribution is a table used to organize the data. The left column (called classes or groups) includes numerical intervals on a variable under study. The right column contains the list of frequencies, or number of occurrences of each class/group. Intervals are normally of equal size covering the sample observations range. It is simply a table in which the gathered data are grouped into classes and the number of occurrences, which fall in each class, is recorded.
Definition
A frequency distribution is a statistical table which shows the set of all distinct values of the variable arranged in order of magnitude, either individually or in groups, with their corresponding frequencies. - Croxton and Cowden
A frequency distribution can be classified as
a) Series of individual observations
b) Discrete frequency distribution
c) Continuous frequency distribution
a) Series of individual observations
A series of individual observations is a series in which the items are listed one after another, one entry per observation. For statistical calculations, these observations can be arranged in either ascending or descending order. Such an arrangement is called an array.
The above list is raw data. Presented in this form, the data do not reveal any information. Arranging the data in ascending or descending order of magnitude gives a better presentation and is called arraying of data.
The above example shows a discrete frequency distribution, where the variable has discrete numerical values.
Grouping the marks into ten class intervals of width 5 gives the following frequency distribution table.

Marks     No. of students
0-5       0
5-10      0
10-15     0
15-20     1
20-25     2
25-30     7
30-35     4
35-40     1
40-45     3
45-50     2
Session 3
Duration: 1 hr
The class interval is given by $I = \frac{L - S}{K}$, where
L = Largest value
S = Smallest value
K = the no. of classes
Ex: If the marks of 60 students in a class vary between 40 and 100 and we want to form 6 classes, then L = 100, S = 40 and K = 6, and the class interval is
$$I = \frac{L - S}{K} = \frac{100 - 40}{6} = \frac{60}{6} = 10$$
Therefore, the class intervals would be 40-50, 50-60, 60-70, 70-80, 80-90 and 90-100.
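The calculation above can be sketched in a few lines of Python. The function name `class_intervals` is hypothetical and chosen only for this illustration; it applies I = (L - S) / K and lists the resulting class limits.

```python
def class_intervals(largest, smallest, k):
    """Return the class width I = (L - S) / K and the K class limits."""
    width = (largest - smallest) / k
    limits = [(smallest + i * width, smallest + (i + 1) * width)
              for i in range(k)]
    return width, limits


# Marks vary between 40 and 100 and 6 classes are wanted.
width, intervals = class_intervals(100, 40, 6)
print(width)
for lower, upper in intervals:
    print(f"{lower:g}-{upper:g}")
```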
Ex:

Marks     No. of students
20-30     5
30-40     15
40-50     25

A better way of expressing these classes is:

Marks                                               No. of students
20 to less than 30 (20 or more but less than 30)    5
30 to less than 40                                  15
40 to less than 50                                  25
A student whose mark is 29 is included in the 20-29 class interval, and a student whose mark is 39 is included in the 30-39 class interval.
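The difference between the two ways of writing class limits can be illustrated with a short sketch. It assumes whole-number marks, a class width of 10 and lower limits that are multiples of the width; both function names are hypothetical.

```python
def exclusive_class(mark, width=10):
    """Class written as 'lower to less than upper': lower <= mark < upper."""
    lower = (mark // width) * width
    return (lower, lower + width)


def inclusive_class(mark, width=10):
    """Class written with both limits included: lower <= mark <= upper."""
    lower = (mark // width) * width
    return (lower, lower + width - 1)


print(exclusive_class(29), exclusive_class(30))
print(inclusive_class(29), inclusive_class(39))
```

Under the exclusive form a mark of 30 falls in the class starting at 30, not in the class ending at 30, so no observation is counted twice.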
Class Frequency
The number of observations falling within class-interval is called its class frequency.
Ex: If the class frequency of the class 90-100 is 5, then 5 students scored between 90 and 100. Adding the frequencies of all the individual classes gives the total frequency, which is the total number of items studied.
b) The number of classes should be neither too large nor too small. Too few classes result in a greater interval width and a loss of accuracy. Too many class intervals result in complexity.
c) All intervals should be of the same width, which simplifies computations.
$$\text{Width of interval} = \frac{\text{Range}}{\text{Number of classes}}$$
d) Open-end classes should be avoided, since they create difficulty in analysis and interpretation.
e) Intervals should be continuous throughout the distribution. This is important for a continuous distribution.
f) The lower limits of the class intervals should be simple multiples of the interval.
Ex: The weights of a sample of 30 students of a particular class are as follows. Construct a frequency distribution for the given data.
62 57 52 58 56 56 58 46 57 52
48 52 48 53 52 53 56 53 54 57
54 63 59 58 69 58 61 63 53 63
Steps of construction
Step 1: Find the range of the data.
Range = H - L = 69 - 46 = 23
Step 2: Find the number of class intervals using Sturges' formula, $K = 1 + 3.322 \log N$.
$K = 1 + 3.322 \log 30 = 5.91$, say $K = 6$.
No. of classes = 6
Step 3: Find the width of the class interval.
$$\text{Width of class interval} = \frac{\text{Range}}{\text{Number of classes}} = \frac{23}{6} = 3.83 \approx 4$$
Step 4: Count the observations that belong to each class interval and assign this frequency to the corresponding class interval as follows.
Class interval   Frequency
46-50            3
50-54            8
54-58            8
58-62            6
62-66            4
66-70            1
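The construction steps can be reproduced with a short sketch, given here as an illustration and not as the only way to build the table. It assumes that the classes start at the smallest observation (46), that the upper limit of each class is excluded, and that the width is rounded up to a whole number.

```python
import math

weights = [62, 57, 52, 58, 56, 56, 58, 46, 57, 52,
           48, 52, 48, 53, 52, 53, 56, 53, 54, 57,
           54, 63, 59, 58, 69, 58, 61, 63, 53, 63]

# Step 1: range of the data.
data_range = max(weights) - min(weights)

# Step 2: number of classes by Sturges' formula, K = 1 + 3.322 log N.
k = round(1 + 3.322 * math.log10(len(weights)))

# Step 3: class width = range / number of classes, rounded up.
width = math.ceil(data_range / k)

# Step 4: count the observations falling in each class.
start = min(weights)  # assumption: first class starts at the smallest value
frequencies = [0] * k
for w in weights:
    frequencies[(w - start) // width] += 1

for i, f in enumerate(frequencies):
    lower = start + i * width
    print(f"{lower}-{lower + width}: {f}")
```

With these assumptions the counts agree with the frequencies 3, 8, 8, 6, 4 and 1 given above.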
In the above less than cumulative frequency distribution, there are 5 students with marks less than 10, 8 with marks less than 20, 18 with marks less than 30 and so on. Similarly, the following table shows the greater than cumulative frequency distribution.
Ex:

Marks     No. of students   Greater than cumulative frequency
0-10      5                 50
10-20     3                 45
20-30     10                42
30-40     20                32
40-50     12                12
In the above greater than cumulative frequency distribution, 50 students scored 0 or more, 45 scored 10 or more, 42 scored 20 or more and so on.
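Both kinds of cumulative frequency are running totals of the class frequencies, taken from opposite ends of the table. A minimal sketch using the class frequencies 5, 3, 10, 20 and 12 from the example:

```python
from itertools import accumulate

# Class frequencies for the marks classes 0-10, 10-20, 20-30, 30-40, 40-50.
frequencies = [5, 3, 10, 20, 12]

# Less than cumulative frequency: running total from the first class down.
less_than = list(accumulate(frequencies))

# Greater than cumulative frequency: running total from the last class up.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)     # [5, 8, 18, 38, 50]
print(greater_than)  # [50, 45, 42, 32, 12]
```

The last less than value and the first greater than value both equal the total number of students.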
Diagrammatic presentation
A diagram is a visual form for the presentation of statistical data. The term covers various types of devices such as bars, circles, maps, pictorials and cartograms.
Importance of Diagrams
1. They are simple, attractive and easily understood.
2. They give quick information.
3. They help to compare variables.
4. They are more suitable for illustrating discrete data.
5. They have a more lasting effect on the reader's mind.
Limitations of diagrams
1. Diagrams show approximate values.
2. Diagrams are not suitable for further analysis.
3. Some diagrams (multidimensional ones) are limited to experts.
4. Details cannot be provided fully.
5. Diagrams are useful only for comparison.
i) Each diagram should have a suitable title, placed at the top or bottom, indicating the theme that the diagram is intended to convey.
ii) The size of the diagram should emphasize the important characteristics of the data.
iii) An appropriate proportion should be maintained between the length and breadth of the diagram.
iv) A suitable scale should be adopted for the diagram.
v) Selecting the appropriate diagram is important; a wrong selection may mislead the reader.
vi) The source of the data should be mentioned at the bottom.
vii) The diagram should be simple and attractive.
viii) The diagram should be effective rather than complex.
i) Line diagram
This is the simplest type of one-dimensional diagram. The heights of the bars or lines are drawn according to the size of the figures. The distances between the bars are kept uniform. The limitations of this diagram are that it is not attractive and cannot convey more than one piece of information.
Ex: Draw the line diagram for the following data.
Year     2001   2002   2003   2004   2005   2006
Value    5      7      12     5      13     15

[Figure: line diagram of the values by year; x-axis: Year]
Indication of diagram: The highest FCD is in 2006 and the lowest FCD is in 2001 and 2004.
b) Simple bar diagram
A simple bar diagram can be drawn using horizontal or vertical bars. It is a very common diagram in business and economics.
Vertical bar diagram
The annual expenses of maintaining cars of various types are given below. Draw the vertical bar diagram. The annual expenses include fuel, maintenance, repair, assistance and insurance.

Type of car      Expense in Rs. / year
Maruthi Udyog    47533
Hyundai          59230
Tata Motors      63270

Source: 2005 TNS TCS Study. Published at: Vijaya Karnataka, dated: 03.08.2006
[Figure: vertical bar diagram of annual expenses in Rs. (axis from 30000 to 70000) for Maruthi Udyog, Hyundai and Tata Motors]
Source: 2005 TNS TCS Study. Published at: Vijaya Karnataka, dated: 03.08.2006
Indication of diagram
a) The annual expenses of a Maruthi Udyog car are lower than those of the other brands depicted.
b) The annual expenses of a Tata Motors car are the highest in the diagram.
Horizontal bar diagram
Data on the world's top 10 steel makers are given below. Draw the horizontal bar diagram.

Steel maker       Production in million tonnes
Arcelor Mittal    110
Nippon            32
POSCO             31
JFE               30
BAO Steel         24
US Steel          20
NUCOR             18
RIVA              18
Thyssenkrupp      17
Tangshan          16

[Figure: horizontal bar diagram of production by steel maker, from Tangshan to Arcelor Mittal]
[Figure: bar diagram; x-axis: Model of Car; y-axis: Value in Rs. (0 to 350); legend: Santro, Zen, Wagnor]
Source: True value used car purchase data. Published by: Vijaya Karnataka, dated: 03.08.2006
Ex: Represent the following in a suitable diagram.

Class   Male   Female   Total
A       1000   500      1500
B       1500   800      2300
C       1500   1000     2500
Ex: Draw a suitable diagram for the following data.

Mode of investment   Investment in 2004      Investment in 2005
                     Rs.       %age          Rs.       %age
NSC                  25000     43.10         30000     45.45
MIS                  15000     25.86         10000     15.15
Mutual Fund          15000     25.86         25000     37.88
LIC                  3000      5.17          1000      1.52
Total                58000     100           66000     100
[Figure: percentage sub-divided bar diagram; x-axis: Year (2004, 2005); y-axis: % of Investment]
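The percentage columns used for a percentage bar diagram are each amount divided by the year's total. A minimal sketch with the investment figures from the example follows; the helper name `percentage_shares` is hypothetical. Values are rounded to two decimals, so the last digit may differ by one from a truncated figure.

```python
investments = {
    2004: {"NSC": 25000, "MIS": 15000, "Mutual Fund": 15000, "LIC": 3000},
    2005: {"NSC": 30000, "MIS": 10000, "Mutual Fund": 25000, "LIC": 1000},
}


def percentage_shares(amounts):
    """Express each amount as a percentage of the total, to two decimals."""
    total = sum(amounts.values())
    return {name: round(100 * value / total, 2)
            for name, value in amounts.items()}


shares = {year: percentage_shares(amounts)
          for year, amounts in investments.items()}

for year, year_shares in shares.items():
    print(year, year_shares)
```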
Two-dimensional diagram
In a two-dimensional diagram both the breadth and the length of the diagram are considered, because the area of the diagram represents the data. The important two-dimensional diagrams are
a) Rectangular diagram
b) Square diagram
a) Rectangular diagram
Rectangular diagrams are used to depict two or more variables. This diagram helps direct comparison. The areas of the rectangles are kept in proportion to the values. It may be of two types.
i) Percentage sub-divided rectangular diagram
ii) Sub-divided rectangular diagram
In the former case, the widths of the rectangles are proportional to the values, the various components of the values are converted into percentages, and the rectangles are divided according to them. The latter case is used to show related phenomena such as cost per unit, quality of production etc.
Ex: Draw the rectangular diagram for the following data.

Item of expenditure   Expenditure in Rs.
                      Family A   Family B
Provisional stores    1000       2000
Education             250        500
Electricity           300        700
House rent            1500       2800
Vehicle fuel          450        1000
Total                 3500       7000
The total expenditure is taken as 100 and the expenditure on individual items is expressed as a percentage. The widths of the two rectangles are in proportion to the total expenses of the two families, i.e. 3500 : 7000 or 1 : 2. The heights of the rectangles are divided according to the percentage of expenses.
Monthly expenditure

Item of expenditure   Family A (Rs. 3500)    Family B (Rs. 7000)
                      Rs.      %age          Rs.      %age
Provisional stores    1000     28.57         2000     28.57
Education             250      7.14          500      7.14
Electricity           300      8.57          700      10
House rent            1500     42.86         2800     40
Vehicle fuel          450      12.86         1000     14.29
Total                 3500     100           7000     100
[Figure: percentage sub-divided rectangular diagram; x-axis: Family; y-axis: % of Expenditure]
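The two quantities needed to draw the rectangles, the ratio of the widths and the percentage heights of the components, can be sketched as follows. The example uses the totals of both families and the component figures of Family B; the variable names are illustrative.

```python
from math import gcd

total_a, total_b = 3500, 7000

# Widths are proportional to the totals; reduce the ratio to lowest terms.
g = gcd(total_a, total_b)
width_ratio = (total_a // g, total_b // g)

# Heights: each component of Family B as a percentage of its total.
family_b = {
    "Provisional stores": 2000,
    "Education": 500,
    "Electricity": 700,
    "House rent": 2800,
    "Vehicle fuel": 1000,
}
heights_b = {item: round(100 * value / total_b, 2)
             for item, value in family_b.items()}

print(width_ratio)
print(heights_b)
```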
b) Square diagram
To draw square diagrams, the square root of the value of each item is taken, so that the area of each square is proportional to the value. A suitable scale may be used to draw the diagram. The ratios between the sides must be maintained when drawing the squares.
Ex: Draw the square diagram for the following data: 4900, 2500, 1600.
Solution: The square roots of the items are 70, 50 and 40. Dividing each by 10 gives sides of 7, 5 and 4.
[Figure: square diagram with squares of side 7, 5 and 4 representing the values 4900, 2500 and 1600]
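The side lengths in the solution follow directly from the square roots. A minimal sketch, in which the scale factor of 10 is the one used in the example:

```python
import math

values = [4900, 2500, 1600]
scale = 10  # divisor used in the example to bring the sides to drawing size

roots = [math.sqrt(v) for v in values]   # 70, 50 and 40
sides = [r / scale for r in roots]       # 7, 5 and 4

for value, side in zip(values, sides):
    print(f"value {value}: side {side:g}")
```

Because each side is the square root of the value divided by a common scale, the areas of the squares stay in the ratio 4900 : 2500 : 1600.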
Pie diagram
A pie diagram shows the partitioning of a total into its component parts. It is used to show classes or groups of data in proportion to the whole data set. The entire pie represents all the data, while each slice represents a different class or group within the whole. The following illustration shows the construction of a pie diagram.
Solution:

Value in crores   Angle of circle                                        %age
9600              $\frac{9600}{126600} \times 360 = 27.30^\circ$         7.58
49300             $\frac{49300}{126600} \times 360 = 140.20^\circ$       39
18900             $\frac{18900}{126600} \times 360 = 53.70^\circ$        14.92
48800             $\frac{48800}{126600} \times 360 = 138.80^\circ$       38.5
126600            $360^\circ$

[Figure: pie diagram with slices of 7.58%, 39%, 14.92% and 38.5%]
Source: India Today 19 June, 2006
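The angle of each slice is the value divided by the total and multiplied by 360. A minimal sketch with the four values from the example follows. It rounds to two decimals, so the results can differ slightly from figures rounded to one decimal.

```python
values = [9600, 49300, 18900, 48800]  # value in crores for each item
total = sum(values)

# Angle of each slice: value / total * 360 degrees.
angles = [round(v * 360 / total, 2) for v in values]

# Share of each slice as a percentage of the whole pie.
percentages = [round(v * 100 / total, 2) for v in values]

for v, angle, pct in zip(values, angles, percentages):
    print(f"{v:>6}  {angle:>7.2f} degrees  {pct:>6.2f} %")
```

The angles should add up to 360 degrees and the percentages to 100, apart from rounding.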
Graphic presentation
A graphic presentation is a visual form of presentation. Graphs are drawn on a special type of paper known as graph paper. Common graphic representations are
a) Histogram
b) Frequency polygon
c) Cumulative frequency curve (ogive)
3. It is appropriate and effective to measure more variables    3. It creates problems
4. It can't be used for further analysis                         4. Can be used for further analysis
5. It gives comparison                                           5. It shows the relationship between variables
6. Data are represented by rectangles
Frequency Histogram
In this type of representation the given data are plotted in the form of a series of rectangles. Class intervals are marked along the x-axis and the frequencies along the y-axis, according to a suitable scale. Unlike the bar chart, which is one-dimensional, a histogram is two-dimensional: both the length and the width are important. A histogram is constructed from a frequency distribution of grouped data, where the height of each rectangle is proportional to the respective frequency and the width represents the class interval. Each rectangle adjoins the next, and a blank space between rectangles means that the category is empty, with no values in that class interval.
Ex: Construct a histogram for the following data.

Marks obtained (x)   No. of students (f)   Mid-point
15-25                5                     20
25-35                3                     30
35-45                7                     40
45-55                5                     50
55-65                3                     60
65-75                7                     70
Total                30
For convenience sake, we will present the frequency distribution along with mid-point of each class interval, where the mid-point is simply the average of value of lower and upper boundary of each class interval.
[Figure: histogram; x-axis: marks with class boundaries 15, 25, 35, 45, 55, 65, 75; y-axis: frequency from 0 to 7]
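The mid-points and a rough text rendering of the histogram can be produced with a short sketch. The `#` bars are only an illustration of the bar heights, not a substitute for a drawn histogram.

```python
class_intervals = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
frequencies = [5, 3, 7, 5, 3, 7]

# Mid-point of a class: average of its lower and upper boundary.
midpoints = [(lower + upper) / 2 for lower, upper in class_intervals]

# One row per class; the number of '#' marks equals the class frequency.
for (lower, upper), mid, f in zip(class_intervals, midpoints, frequencies):
    print(f"{lower}-{upper} (mid {mid:g}) | " + "#" * f)
```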
Frequency polygon
A frequency polygon is a line chart of a frequency distribution in which either the values of a discrete variable or the mid-points of the class intervals are plotted against the frequencies, and the plotted points are joined by straight lines. Since the frequencies do not start or end at zero, the diagram as such would not touch the horizontal axis. However, the area under the entire curve must be the same as that of the histogram, which is 100%, so the curve must be enclosed. The first mid-point is therefore joined to a fictitious preceding mid-point whose frequency is zero, so that the beginning of the curve touches the horizontal axis, and the last mid-point is joined to a fictitious succeeding mid-point whose frequency is also zero, so that the curve ends on the horizontal axis. This enclosed diagram is known as a frequency polygon.
Ex: Construct a frequency polygon for the following data.

Marks (CI)   No. of frequencies (f)   Mid-point
15-25        5                        20
25-35        3                        30
35-45        7                        40
45-55        5                        50
55-65        3                        60
65-75        7                        70
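The points to be joined, including the two fictitious mid-points of zero frequency, can be listed with a short sketch. It assumes equal class widths, so that the fictitious mid-points lie one class width before the first and after the last real mid-point.

```python
midpoints = [20, 30, 40, 50, 60, 70]
frequencies = [5, 3, 7, 5, 3, 7]

# Assumption: equal class widths, taken from the first two mid-points.
width = midpoints[1] - midpoints[0]

# Start on the horizontal axis, pass through every (mid-point, frequency)
# pair, and return to the horizontal axis to close the polygon.
points = [(midpoints[0] - width, 0)]
points += list(zip(midpoints, frequencies))
points.append((midpoints[-1] + width, 0))

print(points)
```

Joining these points in order with straight lines gives the enclosed polygon described above.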
[Figure: A Frequency polygon; x-axis: marks (mid-points 20 to 70); y-axis: Frequency]