
Q1. a. Explain the characteristics of Statistics



Statistics is the study of the collection, organization, analysis, interpretation and presentation of data. It deals
with all aspects of this, including the planning of data collection in terms of the design of surveys and
experiments.


Characteristics of Statistics


Aggregate of Facts

Statistics does not refer to a single figure; it refers to a series of figures. A single weight of 50 kg is not
statistics, but a series relating to the weights of a group of persons is. In other words, all those
figures which relate to a totality of facts are called statistics. Such figures should also be comparable.

Affected by Multiplicity of Causes

Statistics are not affected by one factor only; rather, they are affected by a large number of factors. This
is because statistics are commonly used in the social sciences, where it is not easy to isolate the effect of any one
factor on a phenomenon from the effects of other sets of factors. In a nutshell, statistics are affected
considerably by multiple causes; for example, prices are affected by conditions of demand,
supply, money supply, imports, exports and various other factors.

Numerically Expressed

Another characteristic of statistics is that qualitative expressions like young, old, good, bad, etc. are not
statistics; a numerical value must be attached to every statistic. For example, a statement like "There are
932 females per 1,000 males" is a numerical statement of fact. Statistical statements must contain figures so that
they can be called numerical statements of facts. Furthermore, such numerical expressions are precise, meaningful
and a convenient form of communication.

Enumerated or Estimated according to Reasonable Standards of Accuracy

If the numerical statements are precise and accurate, they can be enumerated. But when the
number of observations is very large, the figures are estimated, and estimated figures cannot be absolutely
accurate and precise. The required accuracy depends on the purpose for which the statistics are collected;
there cannot be a uniform standard of accuracy for all types of enquiries.
Enumeration refers to an exact count, as in "there are ten students of statistics", which is a 100% accurate
statement. Estimation, on the other hand, refers to an approximate figure, as when we say that two lakh people
participated in a rally; there may be a few hundred more or less. Thus statistical results are true only on the
average.

Collected in a Systematic Manner

For accuracy and reliability, the figures should be collected in a systematic manner; if they are
collected haphazardly, the reliability of the data decreases. Thus, for a reasonable standard of
accuracy, the data should be collected systematically, otherwise the results will be erroneous.

Collected for a Pre-determined Purpose

The usefulness of the data collected would be negligible if the data were not collected with some pre-
determined purpose. Figures are collected with some objective in mind; effort made without any set
objective renders the collected figures useless. Thus the purpose of collecting data must be decided
well in advance, and the objective should be concrete and specific. For example, if we want to collect
data on prices, we must be clear whether we want wholesale or retail prices. If we want
data on retail prices, we have to decide which goods are required to serve the objective.








Placed in Relation to each other



The collection of data is generally done with the motive to compare. If the figures collected are not
comparable, they lose a large part of their significance. The figures collected
should therefore be homogeneous for comparison and not heterogeneous; heterogeneous figures
cannot be placed in relation to each other.

b. What are the components of Statistics? Give a brief description of each of the
components.


The Four Components of Statistics and Probability:

Gathering
Displaying
Interpretation
Inference

Gathering data, whether in or out of a classroom, occurs on a daily basis. We are always observing and
processing information as we go about the routine of our day. At this level data is like a pile of clothing that
has just come from a dryer. When we sort the clothing we can see some order; in the same sense we see order
in data when it can be displayed.

Displaying information occurs when we wish to communicate our data or when we want to make decisions
about them. These displays can take several forms such as circle graphs, line graphs, bar charts, stem and leaf
charts, or box and whisker plots. Displaying data is both an art and a science. For those who wish to explore
this topic further, one of the best collections on the elegance of data display can be found in Edward Tufte's
masterful work, The Visual Display of Quantitative Information.

Interpreting data can begin by determining measures of central tendency, outliers, symmetry, and range of a
data set. Generally we call such measures the shape of the data, and determining these measures gives
people a good sense for the overall meaning of the data. Many times students experience the first two
components in the classroom, but do not explore further. When the last two components enter the curriculum
in a structured fashion, students can gain facility in higher-order thinking skills and become skilled
consumers of information. They are then active instead of passive.

Inference is the highest cognitive level of working with data, and generally occurs when we wish to use data to
make decisions based on past information as well as make predictions of future trends and events. Taking
random samples of an event such as rolling dice allows us to look at past events. When we ask what is likely to
happen in the future, we enter the realm of inference. Much of AP Statistics at the high school level deals with
inference. Doing statistics and probability activities with hands-on experiences at the elementary levels lays
the groundwork for future success in high school and college level statistics and probability courses.

The four components above are from the preface to the new book my partner, Brad Fulton, and I are working
on to supplement our Simply Great series. The book will feature activities based on concrete statistical
experiences; students are then guided into the abstract probability of the situation.


Q2.Explain the objectives of Statistical Average. What are the requisites of a good
average?
Answer : Objectives of Averages
Averages occupy a prime place in the theory of statistical methods. That is why Bowley remarked, "Statistics is
a science of averages." The following are the main objectives of an average:
1. Facilitates Comparison: The foremost purpose of an average is that it facilitates comparison. For instance, a
comparison of the production of jute in Maharashtra and Punjab shows that the production of jute in Maharashtra is
much more as compared to Punjab.


2. Formulation of Policies: Averages are of great use in the formulation of various policy measures. For
instance, when the Government finds that there is a fear of low production of sugar, it can formulate various
policies to compensate for the same.


3. Short Description: Averages help to present the raw data in a brief and systematic manner.
4. Representation of Universe: An average represents the universe; accordingly, conclusions can be drawn in
respect of the universe as a whole.
Q3.a. Mention the Characteristics of Chi-square test
The Chi-square test is one of the most commonly used non-parametric tests in statistical work. The Greek letter χ²
(chi-square) is used to denote this test. χ² describes the magnitude of the discrepancy between the observed and the
expected frequencies. The value of χ² is calculated as:
χ² = Σ (Oi - Ei)² / Ei
where O1, O2, O3, ..., On are the observed frequencies and E1, E2, E3, ..., En are the corresponding expected or
theoretical frequencies.
Characteristics of Chi-Square test
The following are the characteristics of the Chi-square (χ²) test:
- The χ² test is based on frequencies and not on parameters
- It is a non-parametric test in which no assumptions about the rigidity of populations are required
- The additive property also holds for the χ² test
- The χ² test is useful for testing hypotheses about the independence of attributes
- The χ² test can be used in complex contingency tables
- The χ² test is very widely used for research purposes in the behavioural and social sciences, including business
research
- While testing whether the observed frequencies of certain outcomes fit the expected frequencies defined by
a theoretical distribution, the χ² value defined here follows the χ² distribution:
χ² = Σ (Oi - Ei)² / Ei

where, Oi is the observed frequency and Ei is the expected frequency.




Key Statistic
The observed frequencies are the frequencies obtained from the observation, which are sample frequencies. The
expected frequencies are the calculated frequencies.
The following are the conditions for using the Chi-Square test:
1. The frequencies used in the Chi-square test must be absolute and not in relative terms.
2. The total number of observations collected for this test must be large.
3. Each of the observations which make up the sample of this test must be independent of each other.
4. As the χ² test is based wholly on sample data, no assumption is made concerning the population distribution. In
other words, it is a non-parametric test.
5. The χ² test is wholly dependent on the degrees of freedom. As the degrees of freedom increase, the Chi-square
distribution curve becomes more symmetrical.
6. The expected frequency of any item or cell must not be less than 5; if it is, the frequencies of adjacent items
or cells should be pooled together in order to make it 5 or more.
7. The data should be expressed in original units for convenience of comparison, and the given distribution
should not be replaced by relative frequencies or proportions.
8. This test is used only for drawing inferences through tests of hypotheses, so it cannot be used for estimation
of parameter values.
Restrictions in applying Chi-Square test
The sample observations should be independently and normally distributed. For this, either the parent
population should be infinitely large (for example, greater than 50), or sampling should be done with
replacement.
Constraints imposed upon the observations must be of a linear character, for example,
Σ Oi = Σ Ei
The χ² distribution is essentially a continuous distribution; however, its character of continuity is maintained
only when the individual frequencies of the variate values remain greater than or equal to 5. So, in applying the χ²
test to the testing of goodness of fit or of the dependence of variables in a contingency table, the cell
frequencies should not be less than 5. In practical problems we can combine a few values with small frequencies
into one to get a pooled frequency greater than 5.
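As a quick illustration of the formula above, here is a minimal Python sketch of the χ² computation for a goodness-of-fit problem; the observed and expected frequencies are invented purely for illustration:

```python
# Minimal sketch of the chi-square statistic: chi2 = sum((O - E)^2 / E).
# The frequencies below are illustrative only, not taken from the text.
observed = [18, 22, 30, 30]   # observed (sample) frequencies O_i
expected = [25, 25, 25, 25]   # expected (theoretical) frequencies E_i

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))   # 4.32
```

The resulting value would then be compared with the tabulated χ² value for (number of classes - 1) degrees of freedom at the chosen level of significance.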
b. Two research workers classified some people into income groups on the basis of sampling
studies. Their results are as follows:

Investigators        Income groups                 Total
                 Poor      Middle      Rich
A                 160        30          10         200
B                 140       120          40         300
Total             300       150          50         500

Show that the sampling technique of at least one research worker is defective.













We can now calculate the value of χ² as follows:


Table 10.6

Groups                                 Observed       Expected       O - E     (O - E)² / E
                                       frequency O    frequency E
Investigator A
  classifies people as poor               160            120            40      1600/120 = 13.33
  classifies people as middle class        30             60           -30       900/60  = 15.00
  classifies people as rich                10             20           -10       100/20  =  5.00
Investigator B
  classifies people as poor               140            180           -40      1600/180 =  8.89
  classifies people as middle class       120             90            30       900/90  = 10.00
  classifies people as rich                40             30            10       100/30  =  3.33


Let us take the hypothesis that the sampling techniques adopted by the two research workers are similar (i.e.,
there is no difference between the techniques adopted by them). On this hypothesis, the expected frequencies used
in Table 10.6 are obtained as follows. The expectation of investigator A classifying the people in the

(i) Poor income group = (200 × 300) / 500 = 120
(ii) Middle income group = (200 × 150) / 500 = 60
(iii) Rich income group = (200 × 50) / 500 = 20

Similarly, the expectation of investigator B classifying the people in the

(i) Poor income group = (300 × 300) / 500 = 180
(ii) Middle income group = (300 × 150) / 500 = 90
(iii) Rich income group = (300 × 50) / 500 = 30













Hence,
χ² = Σ (O - E)² / E = 13.33 + 15.00 + 5.00 + 8.89 + 10.00 + 3.33 = 55.56
Degrees of freedom = (c - 1)(r - 1) = (3 - 1)(2 - 1) = 2
The table value of χ² for two degrees of freedom at the 5 per cent level of significance is 5.991. The calculated
value of χ² is much higher than this table value, which means that the calculated value cannot be said to have arisen
merely because of chance; it is significant. Hence, the hypothesis does not hold good. This means that the sampling
techniques adopted by the two investigators differ and are not similar. Naturally, then, the technique of one must be
superior to that of the other, i.e., the sampling technique of at least one research worker is defective.
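A short Python sketch that reproduces the calculation above from the 2 × 3 contingency table (only the figures given in the question are used):

```python
# Chi-square test of homogeneity for the 2 x 3 table of the two investigators.
observed = [
    [160, 30, 10],    # Investigator A: poor, middle, rich
    [140, 120, 40],   # Investigator B: poor, middle, rich
]

row_totals = [sum(row) for row in observed]         # [200, 300]
col_totals = [sum(col) for col in zip(*observed)]   # [300, 150, 50]
grand_total = sum(row_totals)                       # 500

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected frequency
        chi_square += (o - e) ** 2 / e

dof = (len(observed) - 1) * (len(observed[0]) - 1)        # (r - 1)(c - 1) = 2
print(round(chi_square, 2), dof)                          # 55.56 2
```

Since 55.56 far exceeds the 5 per cent table value of 5.991 for 2 degrees of freedom, the null hypothesis is rejected, in line with the conclusion above.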

Q4. What do you mean by cost of living index? Discuss the methods of construction of
cost of living index with an example for each.
The cost-of-living index, or general index, shows the difference in living costs between cities. The cost of living
in the base city is always expressed as 100, and the cost of living in the destination city is then indexed against
this number. To take a simple example, if London is the base (100), New York is the destination, and the New York
index is 120, then New York is 20% more expensive than London. Similarly, if London is the base, Budapest is the
destination, and the Budapest index is 80, then the cost of living in Budapest is 80% of London's.
The cost-of-living index expresses the difference in the cost of living between any two cities in the survey. How
is this index calculated?
Using exactly the same price data, but different methods of calculation, a number of different people could come
up with a number of markedly different indices. The challenge, therefore, when seeking to construct an index is to
know which method is best for the problem at hand and to represent equitably (in one figure) the general trend of
price differences in separate locations. To illustrate this point, let us take a simple price survey comparing two
fictional cities, "Westwood" and "Leville."






                 Westwood    Leville
Bread (1kg)         1.00        1.25
Potatoes (1kg)      3.00        2.00
Coffee (1kg)        2.50        1.75
Sugar (1kg)         1.00        1.75
TOTAL               7.50        6.75
Assuming we give equal weight to each of the products, which of the two towns deserves the higher cost of
living index number? The answer is: it all depends on how the calculation is made.
1) Westwood is more expensive if we simply add up the prices of the four items in the index and compare the
two cities on that basis.
2) Leville, however, is more expensive when we use Westwood as a base city and calculate an index based on
the average of relative prices in the two cities:
Westwood Leville
Bread 100 125
Potatoes 100 67
Coffee 100 70
Sugar 100 175
Index 100 109
However, if the same calculation is done with Leville serving as a base city, Westwood becomes the more
expensive city:
Leville Westwood
Bread 100 80
Potatoes 100 150
Coffee 100 143
Sugar 100 57
Index 100 107.50
Thus with the standard price-relatives calculation we can end up in the paradoxical situation where each city is
more expensive than the other.
3) Using a different method, both Leville and Westwood would have the same index number, i.e. 100, and neither
would be considered more expensive than the other. Such a calculation would be made according to a well-
established statistical formula that takes prices in both cities, makes an average of them, and uses this average as the
basis for the index comparison. This formula, adopted by the Economist Intelligence Unit for its indices, has some
distinct advantages over the standard price-relatives calculation described in Step 2 above.


With the EIU formula, for example, the paradoxical situation of the two cities being more expensive than each
other cannot arise: if city A = 100 and city B = 110, then this relationship is maintained, even if city B is used as a
base (when B = 100 then A = 91). In other words, the EIU indices are reversible. This property ensures that the cost
of living allowances established with the aid of the indices are consistent in that executives transferred from city A to
B can be dealt with on the same footing as those transferred from city B to A. In addition, the indices are nearly
circular. This means that the relationship between any three cities is maintained regardless of which of the cities is
used as a base with which to compare the other two. This logical inter-relationship is important in assuring equitable
cost of living compensation as executives are transferred from location to location.


The index formula. The index is based on the arithmetic mean of price levels in the two selected cities. In order
to calculate the index for the two hypothetical cities examined above, we must first calculate the
average price of each item:
Westwood Leville Average price
Bread 1.00 1.25 1.125
Potatoes 3.00 2.00 2.500
Coffee 2.50 1.75 2.125
Sugar 1.00 1.75 1.375
Next we compare prices in each town to these average prices:

Average Westwood Leville
Bread 100 89 111
Potatoes 100 120 80
Coffee 100 118 82
Sugar 100 73 127
General Index 100 100 100
As we can see the relationship between Westwood and Leville prices remains intact: bread is still 25% more
expensive in Leville, potatoes are still 50% more expensive in Westwood. If we want to compare Westwood as a
base city to Leville, we must divide Leville's index by that of Westwood and multiply by 100. The result is 100. If
we reverse the operation and use Leville as base, the result is also 100. The two cities are equally expensive.
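The price-relatives paradox and the average-price calculation described above can be verified with a short Python sketch (the prices are those of the fictional Westwood/Leville example; the function name is just illustrative):

```python
# Prices from the fictional Westwood / Leville example above.
westwood = {"bread": 1.00, "potatoes": 3.00, "coffee": 2.50, "sugar": 1.00}
leville  = {"bread": 1.25, "potatoes": 2.00, "coffee": 1.75, "sugar": 1.75}
items = list(westwood)

def price_relative_index(base, other):
    """Unweighted average of price relatives, base prices = 100."""
    return 100 * sum(other[i] / base[i] for i in items) / len(items)

# Standard price relatives: each city appears dearer than the other (the paradox).
print(round(price_relative_index(westwood, leville), 1))   # ~109.2 (Leville on Westwood base)
print(round(price_relative_index(leville, westwood), 1))   # 107.5  (Westwood on Leville base)

# Average-price method: compare each city with the mean of the two price levels.
average = {i: (westwood[i] + leville[i]) / 2 for i in items}
print(round(price_relative_index(average, westwood), 1))   # ~99.8, i.e. about 100
print(round(price_relative_index(average, leville), 1))    # ~100.2, i.e. about 100
```

The small departures from exactly 100 arise because the table above rounds each relative to a whole number before averaging; both cities come out at practically the same index, which is the point of the average-price method.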
There is another element to the discussion. In the example above, we have assumed that each item is as
important as the other. But that's clearly not true of every product in the survey: the price of a car is more important
in determining the index than the cost of a loaf of bread, for example. Every EIU Cost of Living index therefore
applies an identical set of weights for each product in the survey.


The weights have been selected on the basis of research that indicates that while there are certainly differences
among the various national spending patterns, there are also some average figures that can probably be accepted by
most companies.
The figures below indicate the sum of individual weights attributed to all the items which compose each of the
index categories. They are as follows:
%
Shopping basket 25.0
Alcoholic beverages 3.5
Household supplies 4.5
Personal care 4.0
Tobacco 2.5
Utilities 6.5
Clothing 13.0
Domestic help 3.5
Recreation & entertainment 18.0
Transportation 19.5
TOTAL 100.0




Of course, the average weightings shown above should not be taken to indicate that the average expatriate
spends 25% of his total income on food. What is meant is that of the amount spent on products included in sections
one to ten of the present survey, about 25% on average goes into the types of products included in section one
(shopping basket).
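When category indices have been computed for a destination city, the overall index is their weighted average using the weights listed above. A minimal sketch, in which the category index values are invented for illustration (only the weights are taken from the table):

```python
# Weighted cost-of-living index: weights as listed above (they sum to 100);
# the category index values are hypothetical, for illustration only.
weights = {
    "Shopping basket": 25.0, "Alcoholic beverages": 3.5, "Household supplies": 4.5,
    "Personal care": 4.0, "Tobacco": 2.5, "Utilities": 6.5, "Clothing": 13.0,
    "Domestic help": 3.5, "Recreation & entertainment": 18.0, "Transportation": 19.5,
}
category_index = {name: 100 for name in weights}   # hypothetical: same as base city
category_index["Transportation"] = 130             # hypothetical: transport 30% dearer

overall = sum(weights[n] * category_index[n] for n in weights) / sum(weights.values())
print(overall)   # 105.85 -- the overall index rises by the weighted share of the dearer category
```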
Q5.Define trend. Enumerate the methods of determining trend in time series.
Trend estimation is a statistical technique to aid interpretation of data. When a series of measurements of a
process are treated as a time series, trend estimation can be used to make and justify statements about tendencies in
the data, by relating the measurements to the times at which they occurred. By using trend estimation it is possible to
construct a model which is independent of anything known about the nature of the process of an incompletely
understood system (for example, physical, economic, or other system). This model can then be used to describe the
behaviour of the observed data.
There are three methods for measuring the trend in a time series graph:
1. Free Hand Method:
In this method all of the data are plotted on a graph and a smooth curve is drawn through the midpoints of the
fluctuations. The advantage of this method is that it is simple, flexible and does not need any complex mathematical
formula. However, its main disadvantage is that it is based on subjective judgement, and its lack of mathematical
rigour can lead to biased results.
2. Moving Means Method
This method smooths out seasonal variations in a series by averaging the data over each cycle. When
calculating moving means, take the window length equal to the length of the seasonal pattern. The advantage of
this method is that it is easy and simple to compute; the disadvantage is that if the proper period of the moving
means is not used, the results can be misleading. (A short sketch of this calculation, together with the
least-squares trend, follows after the list of methods.)
3. Least Square Method
Mathematically, this is the most accurate method of finding a trend line. The approach can be used to fit a
straight-line, parabolic or exponential trend; here we deal only with the straight-line trend. The
calculations can be quite time consuming, although graphic calculators and spreadsheet software such as Excel
can easily compute the equation of a straight-line, parabolic or exponential trend. The method fits the line that
minimises the sum of the squared deviations of the actual values from the trend values, yielding an equation which
can be used for forecasting. The disadvantage is that the computations can be complex, and if data are added later
all the computations have to be repeated.
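A short sketch of methods 2 and 3 on an invented yearly series; numpy's polyfit stands in for the manual least-squares normal equations, so treat the numbers as illustrative only:

```python
import numpy as np

# Invented series of 8 yearly values used for both illustrations below.
y = np.array([30.0, 34, 33, 38, 41, 40, 45, 47])
t = np.arange(1, len(y) + 1)          # time periods 1..8

# Method 2: moving means with a 4-period window (the window should match
# the length of the seasonal cycle in real data).
window = 4
moving_means = np.convolve(y, np.ones(window) / window, mode="valid")
print(moving_means)                   # [33.75 36.5  38.   41.   43.25]

# Method 3: least-squares straight-line trend y = a + b*t.
b, a = np.polyfit(t, y, deg=1)        # slope b and intercept a
print(round(a, 2), round(b, 2))       # intercept ~27.89, slope ~2.36
print(round(a + b * 9, 1))            # trend forecast for period 9, ~49.1
```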


















Q6.
Answer:
Null Hypothesis:
(a) The machines are homogeneous, i.e., μA = μB = μC = μD
(b) The workers are homogeneous, i.e., μ1 = μ2 = μ3 = μ4 = μ5
Alternative Hypothesis
(a) At least two of the machines differ significantly
(b) At least two of the workers differ significantly
In the usual notation, we have:
K = 5, H = 4, N = K × H = 5 × 4 = 20
G = ΣΣ Xij = 20; Correction Factor (CF) = G² / N = (20)² / 20 = 20

Calculation of the various sums of squares (S.S.)






Raw S.S. (RSS) = ΣΣ Xij²
= (16 + 4 + 49 + 16) + (36 + 0 + 144 + 9) + (36 + 16 + 16 + 64) + (9 + 4 + 36 + 49) + (4 + 4 + 81 + 1)
= 594
Total S.S. (TSS) = RSS - CF = 594 - 20 = 574
The coded data (origin shifted to 20), with row and column totals, are:

Workers      Machine Type                          Total
             A        B        C        D
I            4       -2        7       -4          R1 = 5
II           6        0       12        3          R2 = 21
III         -6       -4        4       -8          R3 = -14
IV           3       -2        6       -7          R4 = 0
V           -2        2        9       -1          R5 = 8
Total      C1 = 5   C2 = -6   C3 = 38   C4 = -17   G = 20





S.S. due to Rows (Workers), SSR = Σ Ri² / H - CF = (5² + 21² + (-14)² + 0² + 8²) / 4 - 20 = 726/4 - 20 = 161.5
S.S. due to Columns (Machine Type), SSC = Σ Cj² / K - CF = (5² + (-6)² + 38² + (-17)²) / 5 - 20 = 1794/5 - 20 = 338.8
Error S.S. (SSE) = TSS - SSR - SSC = 574 - 161.5 - 338.8 = 73.7
Since the various sums of squares are not affected by a change of origin, the ANOVA table for the original data
and for the given data (obtained by changing the origin to 20) will be the same.
Degrees of freedom for the various S.S.:
d.f. for TSS = N - 1 = 20 - 1 = 19
d.f. for Rows (Workers) = 5 - 1 = 4
d.f. for Columns (Machines) = 4 - 1 = 3
d.f. for SSE = 19 - (4 + 3) = 12, or equivalently (d.f. for Rows) × (d.f. for Columns) = 4 × 3 = 12
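The sums of squares above, and the mean squares and F-ratios needed to complete the ANOVA table, can be reproduced with a short Python sketch using only the coded data given in the table:

```python
# Two-way ANOVA (one observation per cell) for the coded worker x machine data.
data = [
    [ 4, -2,  7, -4],   # worker I:   machines A, B, C, D
    [ 6,  0, 12,  3],   # worker II
    [-6, -4,  4, -8],   # worker III
    [ 3, -2,  6, -7],   # worker IV
    [-2,  2,  9, -1],   # worker V
]
k, h = len(data), len(data[0])                     # K = 5 workers, H = 4 machines
n = k * h                                          # N = 20
grand = sum(sum(row) for row in data)              # G = 20
cf = grand ** 2 / n                                # correction factor = 20

tss = sum(x * x for row in data for x in row) - cf              # 574.0
ssr = sum(sum(row) ** 2 for row in data) / h - cf               # 161.5  (workers)
ssc = sum(sum(col) ** 2 for col in zip(*data)) / k - cf         # 338.8  (machines)
sse = tss - ssr - ssc                                           # 73.7

msr, msc, mse = ssr / (k - 1), ssc / (h - 1), sse / ((k - 1) * (h - 1))
print(round(msr / mse, 2), round(msc / mse, 2))    # F(workers) ~ 6.57, F(machines) ~ 18.39
```

Comparing these F-ratios with the 5 per cent table values for (4, 12) and (3, 12) degrees of freedom (about 3.26 and 3.49 respectively) would lead to rejecting both null hypotheses, i.e. both the workers and the machines differ significantly.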





