
Business Statistics


HUMAN PERITUS
www.humanperitus.com

1 Human Peritus, www.humanperitus.com



Copyright © 2019 Human Peritus


All rights reserved. No part of this publication may be commercially used,
reproduced, distributed, or transmitted in any form or by any means without
the prior written permission of Human Peritus. The copyright infringement is
a criminal offence under various sections of the Indian Copyright Act, 1957
and liable for damages, penalties and imprisonment up to 3 years.

Ph +91 97177 81110


contact@humanperitus.com
www.humanperitus.com


Table of Contents

1. Measures of Tendency, Dispersion and Skewness
1.1 Measures of Central Tendency
1.2 Measures of Dispersion
1.3 Measures of Skewness
2. Probability Theory
2.1 Concepts of Probability
2.2 Additive Law of Probability
2.3 Conditional Probability
2.4 Multiplicative Law of Probability
2.5 Bayes’ Theorem
2.6 Approaches to Probability
3. Probability Distributions
3.1 Binomial Probability Distribution
3.2 Poisson Distribution
3.3 Normal Distribution
3.4 Exponential Distribution
4. Research Design and Data
4.1 Research Design
4.2 Research Types
4.3 Data Types
4.4 Data Collection
5. Sampling and Estimation
5.1 Methods of Sampling
5.2 Point Estimation
5.3 Sampling Distribution
5.4 Interval Estimation
6. Hypothesis Testing
6.1 Hypothesis Testing
6.2 Type I and Type II Errors
6.3 Z test (σ is known)
6.4 t test (σ is unknown)
6.5 Hypothesis test for Population Proportion
7. Chi-square and F Tests
7.1 Variance of a Single Population
7.2 Variances of Two Populations
7.3 Goodness of Fit Test
7.4 Test for Independence
7.5 ANOVA
8. Non-Parametric Tests
8.1 Mann-Whitney-Wilcoxon Test
8.2 Kruskal-Wallis Test
8.3 Rank Correlation Test
9. Correlation and Regression
9.1 Covariance
9.2 Correlation Coefficient
9.3 Regression
9.4 Regression Equations
9.5 Coefficient of Determination
10. Report Writing


1. Measures of Tendency, Dispersion and Skewness


1.1 Measures of Central Tendency
Measures of central tendency are statistical constants which enable us to comprehend, in a single value, the significance of the whole data set. In other words, a measure of central tendency is a score that indicates where the center of the distribution tends to be located.

The main objectives of a measure of central tendency are:

1) To condense data into a single value.
2) To facilitate comparisons between data.

The following are the measures of average or central tendency that are in common use:
(i) Arithmetic Average or Arithmetic Mean or Simple Mean
(ii) Median
(iii) Mode
(iv) Geometric mean and Harmonic mean
(v) Quartiles, Deciles and Percentiles

The Arithmetic Mean, Geometric Mean and Harmonic Mean are called Mathematical averages, while the Mode, Median, Quartiles, Deciles and Percentiles are called Positional averages.

Mean
Arithmetic Mean (AM) is the most commonly used measure of central tendency. It is defined as the sum of the values of all observations divided by the number of observations and is usually denoted by $\bar{X}$. In general, if there are N observations $X_1, X_2, X_3, \ldots, X_N$, then the Arithmetic Mean ($\bar{X}$) is given by

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$$

which can also be written as

$$\bar{X} = \frac{\Sigma X}{N}$$

There are three methods to find the arithmetic mean, which we will understand by using Income data of 10 families:

Families    Column I:     Column II:      Column III:
            Income (X)    d = X - 850     d' = (X - 850)/10
A                  850              0                  0
B                  700           -150                -15
C                  100           -750                -75
D                  750           -100                -10
E                 5000           4150                415
F                   80           -770                -77
G                  420           -430                -43
H                 2500           1650                165
I                  400           -450                -45
J                  360           -490                -49
Total            11160           2660                266

(i) Direct Method: In the direct method, the AM is calculated directly from the formula given below:

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$$

Column I of the table above shows the calculation of the AM by the direct formula:

$$\bar{X} = \frac{11160}{10} = 1116$$

(ii) Assumed Mean Method: If the number of observations is large, computing the arithmetic mean by the direct method is tedious. The assumed mean method makes the computation easier.
Here you assume a particular figure in the data as the arithmetic mean, on the basis of logic or experience. You then take the deviation of each observation from this assumed mean, sum these deviations, and divide the sum by the number of observations.

The actual arithmetic mean is obtained by adding the assumed mean to the ratio of the sum of deviations to the number of observations.
Let, A = assumed mean
X = individual observations
N = total number of observations
d = deviation of an individual observation from the assumed mean, i.e. d = X – A

Then the sum of all deviations is Σd = Σ(X – A).

We then find Σd/N and add A to it to get $\bar{X}$:

$$\bar{X} = A + \frac{\Sigma d}{N}$$

Column II of the table above shows the calculation by the Assumed Mean Method.
Here we take the assumed mean to be A = 850.
We get Σd = 2660, so

$$\bar{X} = 850 + \frac{2660}{10} = 1116$$

(iii) Step Deviation Method: The calculations can be further simplified by dividing all the deviations taken from the assumed mean by a common factor ‘c’. The objective is to avoid large numerical figures, i.e., if d = X – A is very large, then find d′:

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

The formula for the AM becomes

$$\bar{X} = A + \frac{\Sigma d'}{N} \times c$$

Column III of the table above shows the calculation by the Step Deviation Method.
Here we take the common factor c = 10 and the assumed mean A = 850.
We get Σd′ = 266, so

$$\bar{X} = 850 + \frac{266}{10} \times 10 = 1116$$
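
The three methods can be checked against each other with a few lines of Python. This is only a minimal sketch using the income data, the assumed mean A = 850 and the common factor c = 10 from the text; the variable names are illustrative.

```python
# Income data for the 10 families (Column I of the table above).
incomes = [850, 700, 100, 750, 5000, 80, 420, 2500, 400, 360]
N = len(incomes)

A = 850  # assumed mean, as chosen in the text
c = 10   # common factor, as chosen in the text

# (i) Direct method: sum of observations divided by their number.
mean_direct = sum(incomes) / N

# (ii) Assumed mean method: A + (sum of deviations from A) / N.
d = [x - A for x in incomes]
mean_assumed = A + sum(d) / N

# (iii) Step deviation method: A + (sum of scaled deviations / N) * c.
d_step = [(x - A) / c for x in incomes]
mean_step = A + c * sum(d_step) / N

print(mean_direct, mean_assumed, mean_step)
```

All three values should agree, because the assumed mean and step deviation methods only rearrange the same arithmetic.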

The mean of a sample of data organized in a frequency distribution is computed by the following formula:

$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_N x_N}{f_1 + f_2 + \cdots + f_N}$$

where $f_i$ is the frequency of Class i and $x_i$ is the class mid-point of Class i.

Properties of Arithmetic Mean


(i) The sum of deviations of items about the arithmetic mean is always equal to zero. Symbolically, $\Sigma (X - \bar{X}) = 0$.
(ii) The arithmetic mean is affected by extreme values. Any large value, on either end, can push it up or down. Values that are extreme relative to the rest of the data are called Outliers.
(iii) If each value of a variable X is increased by, decreased by, or multiplied by a constant k, the arithmetic mean is likewise increased by, decreased by, or multiplied by the same constant k.
(iv) The sum of squared deviations of a set of values is smallest when the deviations are taken about the mean.

Please note that, in a skewed distribution, the calculated mean can miss the centrality. This problem occurs because
outliers have a substantial impact on the mean. Extreme values in an extended tail pull the mean away from the center.
As the distribution becomes more skewed, the mean is drawn further away from the center. Consequently, it is best to
use the mean as a measure of the central tendency when you have a "symmetric distribution".

Weighted Mean
The Weighted Mean is a measure of central tendency for a set of quantitative observations in which not all observations have the same importance. Each data value is assigned a weight according to its importance, and the resulting mean is called the Weighted Mean:

$$\bar{X} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$

For example, if Mohan sold 5 blue balls at Rs 0.50 each, 15 red balls at Rs 0.75 each, 15 grey balls at Rs 0.90 each and 15 black balls at Rs 1.10 each, the average selling price of a ball is given by the weighted mean:

$$\bar{X} = \frac{5 \times 0.50 + 15 \times 0.75 + 15 \times 0.90 + 15 \times 1.10}{5 + 15 + 15 + 15} = \frac{43.75}{50} = 0.875$$
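
The weighted mean for the ball-price example can be reproduced as follows. This is a small sketch; the quantities sold act as the weights, and the variable names are illustrative.

```python
# Prices (Rs) of the blue, red, grey and black balls, and quantities sold.
prices = [0.50, 0.75, 0.90, 1.10]
quantities = [5, 15, 15, 15]  # these act as the weights w_i

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_total = sum(w * x for w, x in zip(quantities, prices))
weighted_mean = weighted_total / sum(quantities)

print(weighted_total, weighted_mean)
```

A simple unweighted average of the four prices would ignore the fact that only 5 of the 50 balls were sold at the lowest price.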

The Arithmetic Mean and the Number of observations for a Population are represented by µ and N, respectively. The Arithmetic Mean and the Number of observations for a Sample are represented by $\bar{x}$ and n, respectively.

Median
The Median is that positional value of the variable which divides the distribution into two equal parts, one part
comprises all values greater than or equal to the median value and the other comprises all values less than or equal to
it. The Median is the “middle” element when the data set is arranged in order of the magnitude.

Since the median is determined by the position of different values, it remains unaffected if, say, the size of the largest
value increases (unlike mean).

The median can be easily computed by sorting the data from smallest to largest and finding out the middle value.
Suppose we have the following observation in a data set: 5, 7, 6, 1, 8, 10, 12, 4, and 3.

Arranging the data, in ascending order you have:


1, 3, 4, 5, 6, 7, 8, 10, 12.

The “middle score” is 6, so the median is 6. Half of the scores are larger than 6 and half of the scores are smaller.

However, if the number of observations is even, two observations fall in the middle. The median in this case is computed as the arithmetic mean of the two middle values.

If the arranged data is 1, 3, 4, 5, 6, 7, 7, 8, 10, 12, then the Median will be

$$\text{Median} = \frac{6 + 7}{2} = 6.5$$

Thus we conclude that, for n observations arranged in order of magnitude:

If the number of observations is odd, the value of the $\frac{n+1}{2}$th term gives the Median.
If the number of observations is even, the average of the $\frac{n}{2}$th and $\left(\frac{n}{2} + 1\right)$th terms gives the Median.
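
The odd and even cases can be sketched in Python as below. The function name `median_of` is illustrative; it sorts the data and then applies the two rules just stated.

```python
def median_of(values):
    """Median of ungrouped data using the odd/even position rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th term (1-based) is index mid (0-based).
        return ordered[mid]
    # Even n: average of the (n/2)-th and (n/2 + 1)-th terms (1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median_of([5, 7, 6, 1, 8, 10, 12, 4, 3])
even_example = median_of([1, 3, 4, 5, 6, 7, 7, 8, 10, 12])
print(odd_example, even_example)
```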

Median for grouped data is given by the following formula:

$$\text{Median} = L + h \times \frac{\frac{n}{2} - C}{f}$$

Where:
L is the lower limit of the median class
h is the class interval size of the median class
C is the cumulative frequency before the median class
f is the frequency of the median class
n is the number of values or total frequency.

Outliers and skewed data have a smaller effect on the Median than on the Mean.
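
The grouped-data formula above can be sketched in code. The frequency table used here is hypothetical and chosen only for illustration (it does not come from the text), and the function assumes equal-width classes listed in ascending order.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median = L + h * (n/2 - C) / f for equal-width classes."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    cumulative = 0  # C: cumulative frequency before the current class
    for L, f in zip(lower_limits, frequencies):
        if cumulative + f >= n / 2:
            # This is the median class: it contains the n/2-th value.
            return L + width * (n / 2 - cumulative) / f
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_limits = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 10, 5]  # n = 40, so n/2 = 20

result = grouped_median(lower_limits, 10, frequencies)
print(result)
```

In this hypothetical table the cumulative frequencies are 5, 13 and 25, so the median class is 20-30 with L = 20, C = 13 and f = 12.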

Mode
Mode is the most frequently observed data value. It is denoted by Mo. It has been derived from the French word “la
Mode” which signifies the most fashionable values of a distribution, because it is repeated the highest number of times
in the series.

Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4 because 4 occurs most frequently (twice) in the data.

In this example, as there is a unique value of the mode, the data is unimodal. But the mode is not necessarily unique, unlike the arithmetic mean and median. You can have data with two modes (bi-modal) or more than two modes (multi-modal). An example of bi-modal data is 1, 2, 2, 3, 4, 4, 5 (because both 2 and 4 appear twice each).
There may be no mode at all if no value appears more frequently than any other value in the distribution. For example, in the series 1, 1, 2, 2, 3, 3, 4, 4, there is no mode.
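
The three situations (one mode, several modes, no mode) can be illustrated with a short sketch. The helper `modes_of` is a hypothetical name; it follows the convention used in the text that a series in which every distinct value occurs equally often has no mode.

```python
from collections import Counter


def modes_of(values):
    """Return the list of modes; empty if no value is more frequent."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    all_equal = all(count == highest for count in counts.values())
    if len(counts) > 1 and all_equal:
        # Every distinct value appears equally often: no mode.
        return []
    return sorted(v for v, count in counts.items() if count == highest)


unimodal = modes_of([1, 2, 3, 4, 4, 5])
bimodal = modes_of([1, 2, 2, 3, 4, 4, 5])
no_mode = modes_of([1, 1, 2, 2, 3, 3, 4, 4])
print(unimodal, bimodal, no_mode)
```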

In continuous data, values often do not repeat, which means there is no mode in the usual sense. With continuous data, it is unlikely that two or more values will be exactly equal because there are an infinite number of values between any two values. However, you can find the mode for continuous data by locating the maximum value on a probability distribution plot. If you can identify a probability distribution that fits your data, find the peak value and use it as the mode.

Comparing Mean, Median and Mode


For a perfectly symmetrical distribution, Mean and Median are always equal.
Suppose we express,
Arithmetic Mean = Me
Median = Mi
Mode = Mo
For a moderately skewed unimodal distribution, the relative magnitudes of the three are typically Me > Mi > Mo (positive skew) or Me < Mi < Mo (negative skew).

In other words, the median usually lies between the arithmetic mean and the mode.

For a moderately asymmetrical distribution (or asymmetrical curve), the relation
Mean – Mode = 3 (Mean – Median) holds approximately. In such a case, first evaluate the mean and the median; the mode is then determined by
Mode = 3 × Median – 2 × Mean


When you have a symmetrical distribution for continuous data, the mean, median, and mode are equal. In this case,
analysts tend to use the mean because it includes all of the data in the calculations. However, if you have a skewed
distribution, the median is often the best measure of central tendency.

When you have ordinal data, the median or mode is usually the best choice. For categorical data, you have to use the
mode.

Harmonic Mean and Geometric Mean


The Harmonic Mean of a series of values is the reciprocal of the arithmetic mean of their reciprocals. In other words, the HM is calculated by dividing the number of observations by the sum of the reciprocals of the values in the series. Thus if $X_1, X_2, X_3, \ldots, X_n$ (none of them being zero) is a series and H is its harmonic mean, then

$$H = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_n}}$$

The Geometric Mean, G, of a series of values $X_1, X_2, X_3, \ldots, X_n$ (none of them being zero) is defined as the nth root of the product of the n numbers.

$$G = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$$

For two observations, the product of the Arithmetic Mean (AM) and the Harmonic Mean (HM) is equal to the square of the Geometric Mean (GM).

$$\text{GM}^2 = \text{AM} \times \text{HM}$$

Also, for positive values, AM ≥ GM ≥ HM
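
Both relations can be checked numerically with Python's standard `statistics` module. The two observations, 4 and 9, are hypothetical values chosen for illustration.

```python
import statistics

# Two hypothetical positive observations.
observations = [4, 9]

am = statistics.mean(observations)            # (4 + 9) / 2
gm = statistics.geometric_mean(observations)  # square root of 4 * 9
hm = statistics.harmonic_mean(observations)   # 2 / (1/4 + 1/9)

# For two observations, GM squared equals AM times HM
# (compared with a tolerance because of floating-point rounding).
relation_holds = abs(gm ** 2 - am * hm) < 1e-9

# For positive values, AM >= GM >= HM.
ordering_holds = am >= gm >= hm

print(am, gm, hm, relation_holds, ordering_holds)
```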

Partition Values
If the values of the variate are arranged in ascending or descending order of magnitude, then, as we have seen above, the median is that value of the variate which divides the total frequency into two equal parts.

Similarly, the given series can be divided into 4, 10 and 100 equal parts. The values of the variate dividing it into 4 equal parts are called Quartiles, into 10 equal parts Deciles, and into 100 equal parts Percentiles.

Quartiles
The values of the variate which divide the total frequency into 4 equal parts, are called quartiles. There are three
quartiles.
The first Quartile (denoted by Q1 ) or lower quartile has 25% of the items of the distribution below it and 75% of the
items are greater than it.
The second Quartile (denoted by Q2 ) or median has 50% of items below it and 50% of the observations above it.
The third Quartile (denoted by Q3 ) or upper Quartile has 75% of the items of the distribution below it and 25% of
the items above it. Thus, Q1 and Q3 denote the two limits within which central 50% of the data lies.

Illustration: Calculate the value of the lower quartile (Q1) and upper quartile (Q3) from the data of the marks obtained by ten students in an examination.
22, 26, 14, 30, 18, 11, 35, 41, 12, 32.
Solution: Arranging the data in ascending order,
11, 12, 14, 18, 22, 26, 30, 32, 35, 41.

$$Q_1 = \text{value of the } \frac{n+1}{4}\text{th term} = \text{value of the } \frac{10+1}{4}\text{th term} = \text{value of the 2.75th term}$$

Q1 = 2nd term + 0.75 × (3rd term – 2nd term)
= 12 + 0.75 × (14 – 12)
= 13.5 marks.

Similarly, the upper quartile (Q3) can be calculated by finding the value of the $\frac{3(n+1)}{4}$th term.
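
The interpolation used in this illustration can be sketched as a small helper. The function name `value_at_position` is illustrative; it takes a 1-based, possibly fractional, position such as 2.75 and interpolates between the two neighbouring terms exactly as done above.

```python
def value_at_position(ordered, position):
    """Value at a 1-based (possibly fractional) position in sorted data."""
    whole = int(position)        # e.g. 2 for the 2.75th term
    fraction = position - whole  # e.g. 0.75
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    lower = ordered[whole - 1]   # the whole-th term (1-based)
    upper = ordered[whole]       # the next term
    return lower + fraction * (upper - lower)


marks = [22, 26, 14, 30, 18, 11, 35, 41, 12, 32]
ordered = sorted(marks)
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)      # 2.75th term
q3 = value_at_position(ordered, 3 * (n + 1) / 4)  # 8.25th term
print(q1, q3)
```

For Q3 the position is 8.25, so the value lies one quarter of the way from the 8th term (32) to the 9th term (35).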

Percentiles
Percentiles divide the distribution into 100 equal parts, so you can get 99 dividing positions denoted by P1, P2, P3 , ...,
P99. The P50 is the median value.

To understand this, if you have scored at the 82nd percentile in the CAT exam, it means that your position is below 18 per cent of the total candidates who appeared in the exam. In other words, 18% of the candidates scored more marks than you.

1.2 Measures of Dispersion


An average gives an idea of central tendency of the given distribution but it is necessary to know how the variates are
clustered around or scattered away from the average. The degree to which numerical data tend to spread about an
average value is called variation or dispersion or spread of the data. Central tendency describes the central point of
the distribution, and variability describes how the scores are scattered around that central point. Together, central
tendency and variability are the two primary values that are used to describe a distribution of scores.

When the population variability is small, all of the scores are clustered close together and any individual score or
sample will necessarily provide a good representation of the entire set. On the other hand, when variability is large and
scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.

Various measures of dispersion or variation are available like Range, Mean Deviation, Variance or Standard Deviation.

Range
Range (R) is the difference between the largest (L) and the smallest value (S) in a distribution. Thus,
R=L–S
Higher value of Range implies higher dispersion and vice-versa.

Quartile Deviation
The presence of even one extremely high or low value in a distribution can reduce the utility of range as a measure of
dispersion. Thus, you may need a measure which is not unduly affected by the outliers.

In such a situation, if the entire data is divided into 4 equal parts, each containing 25% of the values, we get the values
of Quartiles and Median. The upper and lower quartiles (Q3 and Q1 , respectively) are used to calculate Inter Quartile
Range which is Q3 – Q1. Inter-Quartile Range is based upon middle 50% of the values in a distribution and is, therefore,
not affected by extreme values.

Half of the Inter-Quartile Range is called Quartile Deviation (QD).

$$\text{Quartile Deviation (QD)} = \frac{Q_3 - Q_1}{2}$$

The Quartile Deviation (QD) is also called the Semi Inter-Quartile Range. Quartile Deviation can generally be calculated for open-ended distributions and is not unduly affected by extreme values.

Calculate Range and Q.D. of the following observations: 20, 25, 29, 30, 35, 39, 41, 48, 51, 60 and 70.

Range is clearly 70 – 20 = 50 (difference of highest and lowest values).

For Q.D., we need to calculate the values of Q3 and Q1 (here n = 11).

$$Q_1 = \text{value of the } \frac{n+1}{4}\text{th term} = \text{value of the 3rd term} = 29$$

$$Q_3 = \text{value of the } \frac{3(n+1)}{4}\text{th term} = \text{value of the 9th term} = 51$$

$$\text{Quartile Deviation (QD)} = \frac{Q_3 - Q_1}{2} = \frac{51 - 29}{2} = 11$$
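
The same figures can be reproduced with Python's standard library. This sketch relies on `statistics.quantiles` with `method="exclusive"`, which places the quartiles at (n + 1)-based positions and so matches the rule used in this calculation; other software may use different quartile conventions and give slightly different values.

```python
import statistics

observations = [20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70]

# Range: largest value minus smallest value.
data_range = max(observations) - min(observations)

# Quartiles at the (n+1)/4, 2(n+1)/4 and 3(n+1)/4 positions.
q1, q2, q3 = statistics.quantiles(observations, n=4, method="exclusive")

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2

print(data_range, q1, q3, quartile_deviation)
```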

Mean Deviation and Standard Deviation


As we learnt earlier, dispersion is the extent to which values differ from their average. Range and Quartile Deviation are not useful in measuring how far the values are from their average. They just calculate the spread. Two measures which are based upon deviation of the values from their average are Mean Deviation and Standard Deviation.

Since the average is a central value, some deviations are positive and some are negative. If these are added as they are,
the sum will not reveal anything. In fact, the sum of deviations from Arithmetic Mean is always zero.

The Mean Deviation tries to overcome this problem by ignoring the signs of deviations, i.e., it considers all deviations
positive. In case of Standard Deviation, the deviations are first squared and averaged and then square root of the
average is found.

Mean Deviation
The Mean Deviation is the Arithmetic Mean of absolute deviations of the observations from a measure of central
tendency. The steps to calculate mean deviation are given below:
(i) The A.M. of the values is calculated
(ii) Difference between each value and the A.M. is calculated. All differences are considered positive. These are
denoted as |d|
(iii) The A.M. of these differences (called deviations) is the Mean Deviation.

$$\text{Mean Deviation, MD} = \frac{\Sigma |d|}{n}$$

Illustration: Calculate the Mean Deviation of the following values: 2, 4, 7, 8 and 9.

Solution: The Arithmetic Mean (AM) of the data is 6 (the sum is 30 and n is 5). Then we calculate |d| for each X, as shown in the table below.

X        |d| = |X – 6|
2        4
4        2
7        1
8        2
9        3
Total    12

$$\text{Mean Deviation, MD} = \frac{\Sigma |d|}{n} = \frac{12}{5} = 2.4$$
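
The three steps can be followed literally in code. This is a minimal sketch using the five values from the illustration; the variable names are illustrative.

```python
values = [2, 4, 7, 8, 9]
n = len(values)

# Step (i): arithmetic mean of the values.
arithmetic_mean = sum(values) / n

# Step (ii): absolute deviation |d| of each value from the mean.
absolute_deviations = [abs(x - arithmetic_mean) for x in values]

# Step (iii): mean of the absolute deviations.
mean_deviation = sum(absolute_deviations) / n

print(arithmetic_mean, absolute_deviations, mean_deviation)
```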

Variance & Standard Deviation


Standard Deviation is the positive square root of the mean of squared deviations from mean.

So if there are 5 values X1 , X2 , X3, X4 and X5 , first their mean is calculated. Then deviations of the values from mean
are calculated. These deviations are then squared. The mean of these squared deviations is the Variance. Positive
square root of the variance is the Standard deviation.
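
As a sketch of this procedure, the code below computes the variance and standard deviation both step by step and with Python's `statistics` module. Reusing the five values 2, 4, 7, 8 and 9 from the Mean Deviation illustration is a choice made here for illustration. Note that the definition above divides by the number of values, which corresponds to `pvariance` and `pstdev` (the population versions) rather than `variance` and `stdev`.

```python
import math
import statistics

values = [2, 4, 7, 8, 9]
n = len(values)

# Mean, deviations from the mean, and squared deviations.
mean_value = sum(values) / n
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Variance: mean of the squared deviations.
variance = sum(squared_deviations) / n

# Standard deviation: positive square root of the variance.
standard_deviation = math.sqrt(variance)

# The library's population versions use the same definition.
library_variance = statistics.pvariance(values)
library_sd = statistics.pstdev(values)

print(variance, standard_deviation)
```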


6. Hypothesis Testing
In the last section, we estimated an unknown population parameter from the corresponding statistic obtained from the analysis of a sample from the population. Now we shall similarly analyse a sample and then use its statistic to see whether some claim made about the population is reasonable or not.

6.1 Hypothesis Testing


When an estimate from a sample is used to test some belief or claim or hypothesis about the population, the process
is known as hypothesis testing (Significance Testing).

In hypothesis testing, we begin by making a tentative assumption about a population parameter. This tentative
assumption is called the null hypothesis and is denoted by H0 . We then define another hypothesis, called the
alternative hypothesis, which is the opposite of what is stated in the null hypothesis. The alternative hypothesis is
denoted by Ha or H1 .

Thus, hypothesis testing uses sample data to determine whether a statement about the value of a population parameter (a) should be rejected or (b) should not be rejected.

In some situations, it is easier to identify the alternative hypothesis first and then develop the null hypothesis. In other
situations, it is easier to identify the null hypothesis first and then develop the alternative hypothesis.

Use Alternative Hypothesis as a Research Hypothesis


Sometimes, the research hypothesis is stated as the alternative hypothesis. For example, a new teaching method is
developed that is believed to be better than the current method. The alternative hypothesis is that the new method is
better. The null hypothesis is that the new method is no better than the old method.
A new sales force bonus plan is developed in an attempt to increase sales. The alternative hypothesis is that the new
bonus plan increases sales. The null hypothesis is that the new bonus plan does not increase sales.
A new drug is developed with the goal of lowering blood pressure more than an existing drug. The alternative
hypothesis is that the new drug lowers blood pressure more than the existing drug. The null hypothesis is that the new
drug does not provide lower blood pressure than the existing drug.

Null Hypothesis as an Assumption to be Challenged


Not all hypothesis tests involve research hypotheses. Sometimes, we use a hypothesis test to challenge the assumption
and determine if there is statistical evidence to conclude that the assumption is incorrect. In these situations, it is helpful
to develop the null hypothesis first. The null hypothesis expresses the belief or assumption about the value of the
population parameter. The alternative hypothesis is that the belief or assumption is incorrect.

Hypothesis tests generally involve two population parameters: (a) the population mean and (b) the population proportion. Depending on the situation, hypothesis tests about a population parameter may take one of the following 3 forms, of which 2 use inequalities in the null hypothesis and the third uses an equality in the null hypothesis. Here µ0 denotes the hypothesized value.

Type 1            Type 2            Type 3
H0 : µ ≥ µ0       H0 : µ ≤ µ0       H0 : µ = µ0
Ha : µ < µ0       Ha : µ > µ0       Ha : µ ≠ µ0

Type 1 and Type 2 are One-tailed tests and Type 3 is a Two-tailed test.

In selecting the proper form of H0 and Ha , keep in mind that the alternative hypothesis is often what the test is
attempting to establish. Hence, asking whether the user is looking for evidence to support µ < µ0 , µ > µ0 , or µ ≠ µ0 will
help determine Ha .

The following steps are common to all Hypothesis Tests:

Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance.
Step 3. Collect the sample data and compute the value of the test statistic.

p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if the p-value ≤ α.

Critical Value Approach
Step 4. Use the level of significance to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.

You can use either (a) the p-value approach or (b) the critical value approach for Steps 4 and 5.

These steps are explained in detail, with examples, in the following sections.
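
As a rough preview of how the five steps fit together, the sketch below runs a lower tail test of H0 : µ ≥ µ0 against Ha : µ < µ0 using both approaches. Everything numeric here is an assumption made for illustration and is not taken from the text: the hypothesized mean, the known σ, the sample size, the sample mean and α are hypothetical, and the test statistic z = (x̄ – µ0) / (σ / √n) is the standard z statistic for the σ known case.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H0: mu >= mu0 versus Ha: mu < mu0 (lower tail test).
mu0 = 3.0            # hypothesized mean (hypothetical)

# Step 2: level of significance.
alpha = 0.01         # hypothetical choice

# Step 3: sample data and test statistic (all values hypothetical).
sigma = 0.18         # population standard deviation, assumed known
n = 36               # sample size
sample_mean = 2.92   # observed sample mean
z = (sample_mean - mu0) / (sigma / sqrt(n))

standard_normal = NormalDist()  # mean 0, standard deviation 1

# p-value approach, Steps 4 and 5: lower tail area below z.
p_value = standard_normal.cdf(z)
reject_by_p_value = p_value <= alpha

# Critical value approach, Steps 4 and 5: reject H0 if z <= z_alpha.
critical_value = standard_normal.inv_cdf(alpha)
reject_by_critical_value = z <= critical_value

print(z, p_value, critical_value)
print(reject_by_p_value, reject_by_critical_value)
```

The two approaches always lead to the same decision, since p-value ≤ α exactly when the test statistic falls in the rejection region.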

6.2 Type I and Type II Errors


Ideally, the hypothesis testing procedure should lead to the acceptance of H0 when H0 is true and the rejection of H0 when Ha is true. Unfortunately, correct conclusions are not always possible. Two kinds of errors can be made in hypothesis testing, as shown in the table below.

Conclusion        H0 is true            Ha is true
Accept H0         Correct conclusion    Type II error
Reject H0         Type I error          Correct conclusion

The first row of the table shows what can happen if the conclusion is to accept H0. If H0 is true, this conclusion is correct. However, if Ha is true, we make a Type II error; that is, we accept H0 when it is false.
The second row of the table shows what can happen if the conclusion is to reject H0. If H0 is true, we make a Type I error; that is, we reject H0 when it is true. However, if Ha is true, rejecting H0 is correct.

The probability of making a Type I error when the null hypothesis is true as an equality is called the level of significance. The Greek symbol α (alpha) is used to denote the level of significance, and common choices for α are
0.05 and 0.01. In practice, the person responsible for the hypothesis test specifies the level of significance. By selecting
α, that person is controlling the probability of making a Type I error. If the cost of making a Type I error is high, small
values of α are preferred. If the cost of making a Type I error is not too high, larger values of α are typically used.
Applications of hypothesis testing that only control for the Type I error are called significance tests.

The probability of making a Type II error is denoted by beta, β.

Thus, we conclude that:

• α is the probability of a Type I error, i.e. of rejecting a true null hypothesis.
• β is the probability of a Type II error, i.e. of failing to reject a false null hypothesis.
Or simply:
• α is the probability of being told that things are wrong, given that things are correct.
• β is the probability of being told that things are correct, given that things are wrong.

Although most applications of hypothesis testing control for the probability of making a Type I error, they do not always control for the probability of making a Type II error. Hence, if we decide to accept H0, we cannot determine how confident we can be with that decision. Because of the uncertainty associated with making a Type II error when conducting significance tests, statisticians usually recommend that we use the statement “do not reject H0” instead of “accept H0”.


If you wish to reduce the probability of a Type I error, reduce the significance level to a very low value, perhaps to
α = 0.01, or even to α = 0.001. Remember, though, that for a fixed sample size this implies a higher probability of a
Type II error. If the consequences of a Type I error are not severe, it is preferable to strike a better balance between
Type I and Type II errors by adopting a significance level such as 0.05 or 0.10.

Power of test = 1-β
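
The trade-off between α and β, and the power 1 − β, can be illustrated numerically. The following is only a minimal sketch for a lower tailed z test with σ known. The function name `type_ii_error` and all the numbers used (µ0 = 100, a true mean of 97, σ = 10, n = 25) are hypothetical and chosen purely for illustration; they do not come from the examples in this chapter.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """Beta for a lower tailed z test (H0: mu >= mu0) when the true mean is mu_true."""
    se = sigma / sqrt(n)                      # standard error of the mean
    z_crit = NormalDist().inv_cdf(alpha)      # negative critical value, -z_alpha
    x_crit = mu0 + z_crit * se                # H0 is rejected when the sample mean <= x_crit
    # Type II error: the sample mean lands above x_crit although mu_true < mu0
    return 1 - NormalDist(mu_true, se).cdf(x_crit)

# Hypothetical numbers, for illustration only
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=100, mu_true=97, sigma=10, n=25)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With the sample size held fixed, the printed β grows as α shrinks, which is the balance described above.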

Type I errors are also called: 1) Producer’s risk 2) False alarm 3) False positive 4) α error
Type II errors are also called: 1) Consumer’s risk 2) Misdetection 3) False negative 4) β error

Let us understand hypothesis testing with a few examples. We have grouped the tests into two categories:
(i) Tests about the mean of a population when the standard deviation of the population (𝜎) is known, using the z test
(ii) Tests about the mean of a population when the standard deviation of the population (𝜎) is NOT known, using the t test

Later, we explain another example that applies hypothesis testing to a population proportion.

6.3 Z test (σ is known)


Let us understand, how to conduct a hypothesis test about a population mean for the σ known case. As discussed
earlier, there can be three cases.

One tailed test (lower tailed)    One tailed test (upper tailed)    Two tailed test
H0 : µ ≥ µ0                       H0 : µ ≤ µ0                       H0 : µ = µ0
Ha : µ < µ0                       Ha : µ > µ0                       Ha : µ ≠ µ0

One tailed test


Suppose the Consumer Protection Court wants to find out whether a company is packing 3 litres of refined oil in
every packet. The Court knows that the company’s production process cannot place EXACTLY 3 litres in every
packet; however, as long as the population mean filling weight is at least 3 litres per packet, the rights of consumers
will be protected. We will show how the Court can check this by conducting a lower tail hypothesis test. Note that the
procedure for an upper tailed test is similar to that for a lower tailed test.

The students may note down steps of Hypothesis Testing, while we are doing this example. Almost all questions of
Hypothesis testing follow similar steps.

Step 1. Develop the null and alternative hypotheses


The first step is to develop the null and alternative hypotheses for the test. If the population mean filling weight is at
least 3 litres per packet, the Company’s claim is correct. This establishes the null hypothesis for the test. However, if
the population mean weight is less than 3 litres per packet, then the company’s claim is incorrect. With μ denoting the
population mean filling weight, the null and alternative hypotheses are as follows, where hypothesized value of the
population mean is µ0 = 3:

Null Hypothesis H0 : µ ≥ 3
Alternate Hypothesis Ha : µ < 3

If the sample data indicate that H0 cannot be rejected, then the Court should not take any action against the company.
However, if the sample data indicate that H0 can be rejected, we will conclude that the alternative hypothesis, Ha : µ < 3,
is true. In this case, punitive action against the company would be justified.

Step 2. Specify the level of significance


The decision maker must specify the level of significance. If the cost of making a Type I error is high, a small value
should be chosen for the level of significance. If the cost is not high, a larger value is more appropriate. The Court
might say that “I am willing to risk a 1% chance of making Type I error”. Thus we set the level of significance for the
hypothesis test at α = 0.01.


Step 3. Collect the sample data and compute the value of the test statistic
Suppose the population standard deviation (from historic data) is 𝜎 = 0.18 and we take a sample of size n = 36.
Then the standard error of the mean is given by
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.18}{\sqrt{36}} = 0.03$$

For hypothesis tests about a population mean in the σ known case, we use the standard normal random variable z as a
test statistic to determine whether x̅ deviates from the hypothesized value of μ enough to justify rejecting the null
hypothesis. The test statistic (z) is as follows:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
A value of z = -1 means that the value of x̅ is one standard error below the hypothesized value of the mean, a value of
z = -2 means that the value of x̅ is two standard errors below the hypothesized value of the mean, and so on. We can use
the standard normal probability table to find the lower tail probability corresponding to any z value.

Let us now compute the value of z in our example, where 𝜎 = 0.18 and the sample size is n = 36. Suppose the sample
mean comes out to be x̅ = 2.92 litres.
We calculate the z statistic as below:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{2.92 - 3}{0.18/\sqrt{36}} = -2.67$$

The key question for a lower tail test is, How small must the test statistic z be before we choose to reject the null
hypothesis? Two approaches can be used to answer this question: the p-value approach and the critical value approach.

p-value approach
The p-value is used to determine whether the null hypothesis should be rejected. A p-value is a probability that provides
a measure of the evidence against the null hypothesis provided by the sample. Smaller p-values indicate more evidence
against H0 .

Step 4. Use the value of the test statistic to compute the p-value
Using the standard normal probability table, we find that the lower tail area (p-value) at z = -2.67 is 0.0038. (Look at
the figure to appreciate relation in Z and p)
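
Steps 3 and 4 can also be reproduced without a printed table. The sketch below uses only the numbers of the oil packet example and Python's standard library; the variable names are our own choice, and the normal cumulative distribution function stands in for the table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 3.0, 0.18, 36, 2.92   # values from the example

se = sigma / sqrt(n)                 # standard error of the mean
z = (x_bar - mu0) / se               # test statistic
p_value = NormalDist().cdf(z)        # lower tail area to the left of z

print(f"se = {se:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

Because the code does not round z before the lookup, the p-value may differ from the table value in the later decimal places.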


Step 5. ‘Reject’ or ‘Not Reject’ Null Hypothesis


Our chosen level of significance is α = 0.01.
The sample of 36 packets resulted in a p-value of 0.0038, which means that the probability of obtaining a value of
x̅ = 2.92 or less when the null hypothesis is true as an equality is 0.0038. Because 0.0038 is less than or equal to
α = 0.01, we reject H0. Therefore, we find sufficient statistical evidence to reject the null hypothesis at the 0.01 level
of significance.
Thus the general rule is: Reject Null Hypothesis, 𝐇𝟎 if p-value ≤ 𝜶

The set of values of the test statistic for which the null hypothesis is rejected is called the critical region for the
hypothesis test. It is also referred to as the region of rejection of H0; the two terms are synonymous.

Critical Value Approach


The critical value is the value of the test statistic that corresponds to an area of α = 0.01 in the lower tail of a standard
normal distribution. Using the standard normal probability table, we find that z = -2.33 provides an area of 0.01 in the
lower tail. Thus, if the sample results in a value of the test statistic that is less than or equal to -2.33, the corresponding
p-value will be less than or equal to 0.01; in this case, we should reject the null hypothesis.

Thus the rejection rule is:


Reject 𝐇𝟎 if Z ≤ −𝐙𝛂
where −Zα is the critical value; that is, the z value that provides an area of α in the lower tail of the standard normal
distribution.

The p-value approach to hypothesis testing and the critical value approach will always lead to the same rejection
decision.
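
As a quick check that the two approaches agree for the oil packet example, the sketch below computes the critical value with the inverse normal distribution function instead of a table. It assumes the rounded test statistic z = -2.67 and α = 0.01 from the example; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.01
z = -2.67                                  # test statistic from the example

z_crit = NormalDist().inv_cdf(alpha)       # lower tail critical value, -z_alpha
p_value = NormalDist().cdf(z)              # lower tail p-value

reject_by_critical_value = z <= z_crit     # critical value approach
reject_by_p_value = p_value <= alpha       # p-value approach

print(f"critical value = {z_crit:.2f}")
print(reject_by_critical_value, reject_by_p_value)
```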

Upper Tailed Tests


In our example, we saw how to conduct a lower tail test. Upper tailed tests take the following form:
H0 : µ ≤ µ0
Ha : µ > µ0

We can use the same general approach to conduct an upper tail test. The test statistic z is still computed using same
equation.

But, for an upper tail test, the p-value is the probability of obtaining a value for the test statistic as large as or larger
than that provided by the sample. Thus, to compute the p-value for the upper tail test, we must find the area under the
standard normal curve to the right of the test statistic.

For lower tail tests, the null hypothesis is rejected if the value of the test statistic is less than or equal to the critical
value. For upper tail tests, the null hypothesis is rejected if the value of the test statistic is greater than or equal to the
critical value.
In other words, for upper tailed tests:
Reject 𝐇𝟎 if Z ≥ 𝐙𝛂
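
The only difference between the lower tail and upper tail p-values is the side of the curve on which the area is measured. A small sketch follows; the helper name `one_tailed_p_value` and the value z = 2.10 in the usage lines are hypothetical and not taken from the text.

```python
from statistics import NormalDist

def one_tailed_p_value(z, upper):
    """p-value of a one tailed z test; upper=True for Ha: mu > mu0."""
    left_area = NormalDist().cdf(z)        # area to the left of z
    return 1 - left_area if upper else left_area

# Hypothetical upper tailed test with z = 2.10 at alpha = 0.05
alpha = 0.05
p_value = one_tailed_p_value(2.10, upper=True)
print(f"p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```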


Two tailed test


The general form of a two tailed test is
H0 : µ = µ0
Ha : µ ≠ µ0

Suppose a supplier supplies a metallic panel to Maruti, with a specified mean length of 295 cms. Since the panel must
fit in a car, Maruti does NOT want the length to be either greater or less than 295 cms.

Step 1. Develop the null and alternative hypotheses


The first step is to develop the null and alternative hypotheses for the test. The hypothesized value of the population
mean is µ0 =295.
H0 : µ = 295
Ha : µ ≠ 295
If the sample mean x̅ is significantly less than 295 cms or significantly greater than 295 cms, we will reject H0 .

Step 2. Specify the level of significance


The decision maker must specify the level of significance. Maruti might say that “I am willing to risk a 5% chance
of making Type I error”. Thus we set the level of significance for the hypothesis test at α = 0.05.

Step 3. Collect the sample data and compute the value of the test statistic
Suppose the population standard deviation (from historic data) is 𝜎 = 12 and we take a sample of size n = 50.
Then the standard error of the mean is given by
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{50}} \approx 1.70$$
Suppose the sample mean comes out to be x̅ = 297.6 cms.
We calculate the z statistic as below:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{297.6 - 295}{12/\sqrt{50}} = 1.53$$

Step 4. Use the value of the test statistic to compute the p-value
Recall that the p-value is a probability used to determine whether the null hypothesis should be rejected. For a two-
tailed test, values of the test statistic in either tail provide evidence against the null hypothesis.

Now to compute the p-value we must find the probability of obtaining a value for the test statistic at least as unlikely
as z = 1.53. Clearly values of z ≥ 1.53 are at least as unlikely. But, because this is a two-tailed test, values of z ≤
−1.53 are also at least as unlikely as the value of the test statistic provided by the sample. As shown in the Figure, we
note that the two-tailed p-value in this case is given by P(z ≥ 1.53) + P(z ≤ - 1.53).


The table for the standard normal distribution shows that the area to the left of z = 1.53 is 0.9370. Thus, the area
under the standard normal curve to the right of the test statistic z = 1.53 is 1.0000 - 0.9370 =0.0630. Doubling this,
we find the p-value for our example is 2 × (.0630) = 0.1260.

Please note that, if the value of the test statistic is in the upper tail (z > 0), we find the area under the standard normal
curve to the right of z. If the value of the test statistic is in the lower tail (z < 0), find the area under the standard
normal curve to the left of z.
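
The two-tailed p-value of the panel example can be reproduced as follows. This is a sketch using the example's numbers; z is rounded to two decimals on purpose, to mimic the table lookup, so an unrounded calculation would give a slightly different p-value.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 295, 12, 50, 297.6         # values from the example

z = round((x_bar - mu0) / (sigma / sqrt(n)), 2)   # rounded as for a table lookup
one_tail = 1 - NormalDist().cdf(abs(z))           # area beyond |z| in one tail
p_value = 2 * one_tail                            # doubled for a two tailed test

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Using `abs(z)` makes the same lines work whether the test statistic falls in the upper or the lower tail.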

Step 5. ‘Reject’ or ‘Not Reject’ Null Hypothesis


Our chosen level of significance is α = 0.05.
Next we compare the p-value to the level of significance to see whether the null hypothesis should be rejected. With a
level of significance of α = 0.05, we do not reject H0 because the p-value = 0.1260 > 0.05. Because the null hypothesis
is not rejected, Maruti will take no action against the supplier, and the manufacturing process need not be adjusted.

Thus we conclude that, we Reject Null Hypothesis, 𝐇𝟎 if p-value ≤ 𝜶

Critical Value Approach


Let us see how the test statistic z can be compared to a critical value for a two-tailed test. As shown in the Figure, the
critical values for the test will occur in both the lower and upper tails of the standard normal distribution.

With a level of significance of α = 0.05, the area in each tail beyond the critical values is α/2 = 0.05/2 = 0.025. Using
the standard normal probability table, we find the critical values for the test statistic are −𝑍0.025 = -1.96 and
𝑍0.025 = 1.96. Thus, using the critical value approach, the two-tailed rejection rule is

Reject Null Hypothesis, 𝐇𝟎 if Z ≤ -1.96 or if Z ≥ 1.96
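
The critical values can also be obtained from the inverse of the standard normal distribution function rather than from the table. The sketch below assumes α = 0.05 and the test statistic z = 1.53 from the panel example; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z = 1.53                                        # test statistic from the example

z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # upper critical value, z_(alpha/2)
reject = (z <= -z_crit) or (z >= z_crit)        # two-tailed rejection rule

print(f"critical values = -{z_crit:.2f} and {z_crit:.2f}, reject H0: {reject}")
```

The decision matches the p-value approach: H0 is not rejected.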


A summary of all three types of z tests discussed above is tabulated below (using the z statistic).

6.4 t test (σ is unknown)


The t test is often used to determine whether there is a significant difference between the means of two groups that
may be related in certain features. In this section we use its one-sample form, which tests a hypothesis about a single
population mean. There are two conditions for the z test: (a) the standard deviation of the population must be known
and (b) the sample size should be above 30. If these conditions are not satisfied, then we use the t test.

To conduct a hypothesis test about a population mean for the σ unknown case, the sample mean 𝐱̅ is used as an estimate
of μ and the sample standard deviation s is used as an estimate of population standard deviation σ.

The steps of the hypothesis testing procedure for the “σ unknown case” are the same as those for the “σ known case”.
Recall that for the σ known case, the sampling distribution of the test statistic has a standard normal distribution. For
the σ unknown case, however, the sampling distribution of the test statistic follows the t distribution (with n-1 degrees
of freedom); it has slightly more variability because the sample is used to develop estimates of both μ and σ.

One tailed test


Suppose Uber collects feedback on its drivers from riders on a scale of 1 to 10. All drivers with a rating greater than 7
will be Champions and offered high incentives. Mohammad is a driver with Uber. Feedback from 60 riders is obtained
about Mohammad. The sample mean is x̅ = 7.25 and the sample standard deviation is s = 1.052. Should Mohammad be
selected for the Champion category?

Step 1. Develop the null and alternative hypotheses


We want to develop a hypothesis test for which the decision to reject H0 will lead to the conclusion that the population
mean rating for Mohammad is greater than 7. Thus, an upper tail test with Ha : µ > 7 is required.

The hypothesized value of the population mean is µ0 = 7.


H0 : µ ≤ 7
Ha : µ > 7

Step 2. Specify the level of significance


We set the level of significance for the hypothesis test at α = 0.05.

Step 3. Collect the sample data and compute the value of the test statistic
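We calculate the t statistic as below:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{7.25 - 7}{1.052/\sqrt{60}} = 1.84$$
Students may note the difference between the formula for the z statistic (previous examples) and the t statistic: the
sample standard deviation s takes the place of σ.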


Step 4. Use the value of the test statistic to compute the p-value
The sampling distribution of t has 60-1=59 degrees of freedom. Because the test is an upper tail test, the p-value is the
area under the curve of the t distribution to the right of t = 1.84.
Then we use a t table to compute p-value. The p value comes out to be 0.0354 (from t table).
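
For readers without a t table, the upper tail area can be approximated numerically. The sketch below is one possible way, not the method used in the text: it integrates the t density with Simpson's rule using only the standard library. The helper names `t_pdf` and `t_upper_tail` are our own, and in practice a statistics package (for example `scipy.stats.t.sf`) would normally be used instead.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t, df, steps=1000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3           # half the area lies to the right of 0

x_bar, mu0, s, n = 7.25, 7, 1.052, 60    # values from the example
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)    # n - 1 = 59 degrees of freedom
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
```

The result should agree with the table value to about three decimal places; small differences arise because t is not rounded before the calculation.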

Step 5. ‘Reject’ or ‘Not Reject’ Null Hypothesis


Our chosen level of significance is α = 0.05. Our p-value is 0.0354.
We Reject Null Hypothesis, 𝐇𝟎 if p-value ≤ 𝜶
Because 0.0354 ≤ 0.05, we reject the null hypothesis in our example and conclude that Mohammad should be made a
Champion.

Two Tailed Test


Similarly, one can conduct a two tailed test with the t statistic, following exactly the same steps.

The summary of both one tailed tests and the two tailed test is presented in the table below (using the t statistic):

6.5 Hypothesis test for Population Proportion


The procedure used to conduct a hypothesis test about a population proportion is similar to the procedure used to
conduct a hypothesis test about a population mean. Although we will only illustrate, how to conduct a hypothesis test
about a population proportion for an upper tail test, similar procedures can be used for lower tail and two-tailed tests.
We assume that np ≥ 5 and n(1-p) ≥ 5; thus the normal probability distribution can be used to approximate the
sampling distribution of p̅.

Suppose 20% of participants are women at Gold Gym. In order to increase participation of women, the Gold Gym
started a marketing campaign. After 3 months of campaign, the Gold Gym wants to find out, if proportion of women
has increased? A sample of 400 participants was taken, out of which 100 were women.
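
Before running the test, it is worth confirming that the normal approximation condition stated above holds for this sample. The sketch below assumes the condition is checked with the hypothesized proportion 0.20; the variable names are illustrative.

```python
n, women = 400, 100        # sample from the example
p0 = 0.20                  # proportion of women before the campaign

p_bar = women / n          # sample proportion
# Rule of thumb from the text: np >= 5 and n(1 - p) >= 5
normal_ok = n * p0 >= 5 and n * (1 - p0) >= 5

print(f"sample proportion = {p_bar:.2f}, normal approximation usable: {normal_ok}")
```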

Step 1. Develop the null and alternative hypotheses


Because the objective of the study is to determine whether the proportion of women increased, an upper tail test with
Ha : p > 0.20 is appropriate, where p denotes the population proportion. The null and alternative hypotheses are:
H0 : p ≤ 0.20
Ha : p > 0.20

Step 2. Specify the level of significance


We set the level of significance for the hypothesis test at α = 0.05.

Step 3. Collect the sample data and compute the value of the test statistic
We calculate the z statistic as below:
