
ESTIMATION OF POPULATION PARAMETERS

The overall objective of descriptive statistics is to give you a detailed description of the data you have on
hand. As we have only limited data, or sample data, on hand, we are mostly required to estimate the
population parameters from the sample data. A parameter calculated from sample data is not
100% accurate and may introduce small errors when estimating the parameters for the population.

Estimates can be of 2 types:-


a) POINT Estimate
b) INTERVAL Estimate

a) POINT Estimate – a statistic taken from a sample that is used to estimate a population
parameter; it is only as good as the representativeness of its sample. If other random samples
are taken from the population, the point estimates derived from those samples are likely to
vary. The variations or errors likely to arise out of different samples are called STANDARD
ERRORS.

b) INTERVAL Estimate – Because of the variation in sample statistics, estimating a population
parameter with an interval estimate is often preferable to using a point estimate. An interval
estimate (confidence interval) is a range of values within which the analyst can declare, with
some confidence, the population parameter lies. Confidence intervals can be two-sided or
one-sided. In simple words, confidence intervals are a range of values within which the estimates can
fall.

Definition: A confidence interval for a parameter is an interval of numbers within which we expect the
true value of the population parameter to be contained. The endpoints of the interval are computed
based on sample information

Certain factors may affect the confidence interval size including size of sample, level of
confidence, and population variability. A larger sample size normally will lead to a better
estimate of the population parameter.

Most of the population parameters can be estimated based on sample statistics.

1) ESTIMATING THE POPULATION MEAN


The confidence level is tied to z, the number of standard errors that bounds the area under the
normal distribution taken into consideration by the analyst to arrive at the population parameter
estimates. This z corresponds to the percentage of data values considered significant for the
analysis and estimate. Alpha (α) is the area under the normal curve in the
tails of the distribution, outside the area defined by the confidence interval.

The CI yields a range within which we feel, with some confidence, the population mean is
located. The interpretation is this: it is not certain that the population mean is in the interval
unless we have a 100% confidence interval that is infinitely wide. If we construct a 95%
CI, the analyst's level of confidence is 95% or 0.95 that, of the intervals produced by repeated
samples, 95% would include the population mean and 5% would not.

In reality, a CI with 100% confidence would be meaningless. So researchers go with 90%, 95%,
98% or 99% at most. The reason is that there is a trade-off between sample size, interval width
and level of confidence. For example, as the level of confidence is increased, the interval gets
wider: the level of confidence (the z value) and the width of the CI move together. As the analyst
demands higher confidence (a larger z value), the interval must widen; to obtain a narrow, precise
interval at a high confidence level, a larger sample is required.

How confident are we that the true population average is in the shaded area? We are 95% confident.
This is the level of confidence. How many standard errors away from the mean must we go to be 95%
confident? From -z to z there is 95% of the normal curve.

There are 4 typical levels of confidence: 99%, 98%, 95% and 90%. Each of the levels of confidence has a
different number of standard errors associated with it. We denote this by z α/2,

where α is the total amount of area in the tails of the normal curve. Thus, for each level of confidence,
the z values from the table are:

Level of confidence   α/2     z α/2

90%                   5%      1.645

95%                   2.5%    1.96

98%                   1%      2.33

99%                   0.5%    2.575
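These z values can be reproduced with Python's standard library; a minimal sketch using `statistics.NormalDist` (everything here follows directly from the table above):

```python
from statistics import NormalDist

# z value for each confidence level: the (1 - alpha/2) quantile
# of the standard normal distribution
for confidence in (0.90, 0.95, 0.98, 0.99):
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{confidence:.0%}: z = {z:.3f}")
```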


After selecting (or being told) the level of confidence, for a large (n > 30) sample we use the formula

x̄ ± z α/2 · s/√n

Example: A sample of 100 observations is collected and yields mean x̄ = 75 and s = 8. Find a 95%
confidence interval for the true population average.

75 ± 1.96 · 8/√100 = 75 ± 1.568 = (73.432, 76.568). So the population mean is estimated to be
between 73.432 and 76.568 for a CI of 95%.
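The same computation can be sketched in Python using only the standard library:

```python
from math import sqrt

n, xbar, s = 100, 75, 8   # sample size, sample mean, sample standard deviation
z = 1.96                  # z value for 95% confidence

margin = z * s / sqrt(n)  # z times the standard error of the mean
lower, upper = xbar - margin, xbar + margin
# lower, upper -> 73.432, 76.568
```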

2) ESTIMATING POPULATION PROPORTION


We just saw the estimation of the mean (e.g., estimating the average score of a cricketer, the
mean scores of all male students in a management course, etc., with intervals and
confidence levels). Business decision makers often also need to estimate a
population proportion. Estimating market share (their proportion of the market) is
important for them. Market segmentation opportunities come from knowledge of the
proportion of various demographic characteristics among potential clients. More
examples: the proportion of female students completing the course on the first
attempt; the proportion of students coming back to the same university for a higher
degree/course after completing the basic course; the ratio of students passing the IAS
entrance after coaching. All these estimations can be based on sample
data taken from the whole population, and the estimates can be projected to the whole
population. They are estimated not in terms of a mean or average, but in terms
of proportions and percentages. Here again, for example, the CI will read like "92.3% to 94.5% of
students would pass the exam after taking up the course".

Ex: if the sample proportion of telebrand marketing is 0.39 or 39%, we
estimate the proportion of telebrand marketing for the population based on a sample size
of 87 observations, at a confidence level of 95%, using the formula:

p̂ ± z √(p̂ · q̂ / n)

The sample proportion p̂ is 0.39, the z value for a 95% CI is 1.96, the sample size n is 87,
and the proportion q̂ is 1 − p̂ = 0.61 (since we deal with
proportions, we use a term called the q value, which is 1 − p, in this case 1 − 0.39 = 0.61).

0.39 ± 1.96 √(0.39 × 0.61 / 87)
0.39 ± 0.102, i.e. 0.39 + 0.102 = 0.492 and 0.39 − 0.102 = 0.288

Hence the population proportion is estimated to be between 0.29 and 0.49, or
between 29% and 49%, at a confidence level of 95%, with a margin of 0.102 around the
point estimate.
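A sketch of the proportion interval calculation, mirroring the numbers above:

```python
from math import sqrt

p_hat = 0.39        # sample proportion
q_hat = 1 - p_hat   # 0.61
n = 87
z = 1.96            # 95% confidence

margin = z * sqrt(p_hat * q_hat / n)   # ~0.102
lower, upper = p_hat - margin, p_hat + margin
# interval is roughly (0.29, 0.49)
```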

3) ESTIMATING POPULATION VARIANCE


At times the researcher is more interested in the population variance than in the population
mean or population proportion. For Ex: in the total quality checks, suppliers who want to
earn world-class supplier status or even those who want to maintain customer contracts are
often asked to show continual reduction of variation on supplied parts. Essentially to
minimize variations in production and to maintain consistency in quality, tests are
conducted on samples to determine lot variation and whether consistency goals are being
met. For ex: Variations in airplane altimeter readings need to be minimal, it is not just
enough to know the average, a particular brand of altimeter produces the correct altitude.
Thus measuring the variations of altimeters is critical. Variations mean the differences from
the strict quality specifications prescribed. Mostly, the quality in such situations needs to be
accurate and any differences have a drastic effect.

The relationship of the sample variance to the population variance is captured by the CHI-
SQUARE distribution (χ²). That is, the sample variance (s²) multiplied by (n − 1) and divided by
the population variance (σ²) is approximately chi-square distributed with n − 1 degrees of
freedom, if the population is normally distributed. This does not suit conditions where the data
are not normally distributed.

Degrees of Freedom:
Degrees of freedom refers to the number of independent observations for a source of
variation minus the number of independent parameters estimated in computing the
variation. So, if you have 50 observations and you estimate 2 parameters, your DF will be
N − parameters = 50 − 2 = 48.

HYPOTHESIS TESTING

Introduction

Business researchers often need to provide insight and information to decision makers to assist them in
answering questions like,
 What container shape is most economical and reliable for shipping a product?
 Which management approach best motivates employees in the retail industry?
 What is the most effective means of advertising in a business setting?
 How can the company’s retirement investment financial portfolio be diversified for optimum
performance?
For these purposes, researchers develop “Hypotheses” to be studied and explored.

A HYPOTHESIS is not something already proven and established. Rather, it is:

 Something that needs to be proven or disproved
 An educated guess
 A claimed fact
 A tentative explanation of a principle operating in nature
 An assumption about a population or an unknown value/parameter

We explore all types of hypotheses, how to test them, how to interpret the results of such tests to
help decision making. Research hypothesis is a statement of what the researcher believes will be the
outcome of an experiment or a study. Business researchers have some idea or theory based on
previous experience and data as to how the study will turn out. These are typically concerning
relationships, approaches and techniques in business.

A statistical hypothesis is required in order to scientifically test the research hypothesis. All statistical
hypotheses consist of 2 parts, the NULL HYPOTHESIS & the ALTERNATIVE HYPOTHESIS.

 NULL HYPOTHESIS – usually states that a “null” condition exists: there is nothing new
happening, the old theory is still true, the old standards / quality are correct, and the system is
under control. It is represented by H0.

 ALTERNATIVE HYPOTHESIS - on the other hand, usually states that the new theory is true, there
are new standards, the system is out of control, or something different is happening.
Represented as H1.

Ex 1 : Suppose a baking flour manufacturer has a package size of 40 ounces and wants to test whether
their packaging process is correct, the NULL hypothesis for this test would be that the average weight of
the pack is 40 ounces (no problem). The ALTERNATIVE hypothesis is that the average is not 40 ounces
(process has differences).

Hypothesis is represented as follows:


NULL HYPOTHESIS is H0: µ = 40 ounces
ALTERNATIVE HYPOTHESIS is H1: µ ≠ 40 ounces

Ex2: Suppose a company held an 18% market share earlier and, because of increased marketing effort,
company officials believe the market share is now more than 18%; the market researchers would like to
prove it.
H0: p = 0.18
H1: p > 0.18

APPLICATIONS OF HYPOTHESIS TESTING (T Test, Z Test, F Test, Chi-Square)

1) To check whether Sample Mean = Population Mean. Example: Avg salary of Company A, Dept 4
employees is 10,000. Test: One-Sample T.Test or Z.Test (T or Z value).

2) To compare the mean of one population vs. the mean of another population. Example: Nokia vs.
Samsung mobile phone sales in a region. Test: Independent T.Test or Z.Test for both companies
separately and compare (T or Z value to be considered).

3) To compare the effect before and after a particular event, on the same sample. Example: Effect
of a BP drug on patients, before and after consumption of the drug. Test: Paired T.Test (T value
to be taken).

4) To compare more than two independent variables or more than 2 independent populations.
Example: Sales of Nokia, Samsung and Motorola mobiles. Test: ANOVA or F.Test (F value result
to be taken).

5) To find the association between 2 attributes (attributes are usually categorical/qualitative).
Example: Person's edu qualification with his earning capacity/salary/profession. Test: Chi-Square
Test (χ² value).

6) To find out Goodness of Fit (Observed = Expected or not). Example: Estimated vs. actual sales.
Test: Chi-Square Test (χ² value).

Note: Though the statistical calculations give results as T values or F values for the above tests, in
SPSS, Excel, R, Python and SAS we look mainly at the significance value (p-value). If this p value is less
than 0.05, we reject the null hypothesis; else we fail to reject it. This is because the T tables and F tables
are not handy.

STEPS for HYPOTHESIS TESTING – Most of the hypothesis testing is based on Mean comparisons.

1) State the Hypotheses, (both NULL and Alternative) clearly. H0 and H1. The purpose of the test
should be clearly understood as per the applications. The researcher needs to be doubly sure of
the requirement and purpose, to negate the null hypothesis.

2) Specify the level of significance – this can also be described as the allowable non-confidence
limit. It is also the probability of committing a Type 1 error. Common values of alpha are 0.05,
0.03, 0.01 etc., depending on the criticality of the business errors. Ex: A retail industry test may
accept a level of 5%, but the aeronautics industry would want only a 0.01 level. The drugs
industry would want even more precise levels of testing, with confidence as high as 99.2%.

3) Use the appropriate statistical test, based on the requirement and on the hypothesis in Step 1:

 One variable: One-Sample T.Test if N ≤ 30; Z.Test if N ≥ 30
 Two variables:
   - Independent samples: Independent T.Test (N ≤ 30) or Z.Test (N ≥ 30)
   - Dependent samples (one dependent & one independent variable): Paired T.Test
 More than 2 variables: ANOVA (F Test)
   - One independent variable: One-Way ANOVA
   - Two independent variables: Two-Way ANOVA

4) Decision rule – The researcher should be (1 − α) confident to prove his hypothesis. The general
rule is:
P ≤ 0.05 (the alpha level): reject H0 and prove your theory.
P > 0.05 (p exceeds the alpha level): accept H0, upholding the older theory.
OR
If the T value or F value (from Step 3) is more than the table value, then reject H0; else accept H0.

Ex: if alpha is 0.05 and the P value is 0.03, then the researcher is 97% confident of his
theory and can reject the null hypothesis. If he is less confident, i.e. the p value is more than the
alpha level, his confidence level is going down and he is forced to accept the null hypothesis:
his test has failed.

5) Conclusion - TYPE 1 and TYPE 2 Errors (Confusion Matrix)

If you ↓        H0 True                        H0 False

Accept          Correct decision (1 − α = 0.95)   TYPE 2 Error (β)

Reject          TYPE 1 Error (α = 0.05)           Correct decision (power = 1 − β)

 Probability of a Type 1 error = level of significance (α = 0.05)

 Probability of a Type 2 error = β; the power of the test is 1 − β.

t-Test

A t-test is an analysis of two populations means through the use of statistical examination; a t-test with
two samples is commonly used with small sample sizes, testing the difference between the samples
when the variances of two normal distributions are not known.

A t-test looks at the t-statistic, the t-distribution and degrees of freedom to determine the probability of
difference between populations; the test statistic in the test is known as the t-statistic.

Example 1) Philips Bulb Co. states that the average lifetime of EchoStar Bulb is 10 years.
Now WIPRO doesn’t accept this claim and tests the average life of 15 Philips Bulbs.

Lifetime_Yrs: 9, 11, 10, 8, 9, 9, 8, 7, 10, 10, 11, 11, 9, 9, 8.5

1) H0: µ = 10, H1: µ ≠ 10
2) Alpha level = 0.05
3) A one-sample T.Test should check the average lifetime:
there is only one variable to check, and the observations are fewer than 30.
Variable: lifetime

N Mean Std Dev Std Err Minimum Maximum

15 9.3000 1.1922 0.3078 7.0000 11.0000

Mean 95% CL Mean Std Dev 95% CL Std Dev

9.3000 8.6398 9.9602 1.1922 0.8729 1.8803

DF t Value Pr > |t|

14 -2.27 0.0392

1) The only variable being checked for Mean value here is LIFETIME.
2) N is the number of observations
3) Mean of LIFETIME variable is 9.3000 (here it says average life of the bulb is 9.30 yrs)
4) STD Deviation of all the 15 values from the mean is 1.1922
5) Minimum and maximum are the values in the data range in the LIFETIME variable.
6) 95% CL Mean – the 95% confidence limits for the mean.
7) The confidence interval for the mean is 8.6398 to 9.9602.
8) The 95% confidence interval for the SD is 0.8729 to 1.8803.
9) DF – Degrees of Freedom (N − 1 here).
10) The T value is not compared with the table value here, since the tables are not handy.
11) The probability associated with the T value is 0.0392.

Here the P value is the one to be considered. Since it is much less than alpha (0.05), we REJECT the null
hypothesis and say that the average lifetime of the Philips bulb is NOT 10 years. Had the P value been
higher than 0.05, we would accept H0 and say the lifetime is 10 years.
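The t statistic in this output can be reproduced from the raw data; a minimal sketch using only Python's standard library (statistical packages such as SPSS report the same value):

```python
from math import sqrt
from statistics import mean, stdev

lifetimes = [9, 11, 10, 8, 9, 9, 8, 7, 10, 10, 11, 11, 9, 9, 8.5]
mu0 = 10                           # hypothesised average lifetime (Philips' claim)

n = len(lifetimes)
xbar = mean(lifetimes)             # 9.3
s = stdev(lifetimes)               # ~1.1922
t = (xbar - mu0) / (s / sqrt(n))   # ~ -2.27

# |t| = 2.27 exceeds the critical value t(0.025, df=14) = 2.145,
# consistent with the p-value of 0.0392 in the output above
```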

Example 2: 15 customers each in Mumbai and Delhi were asked to rate Brand X on a 7-point scale. The
responses of all 30 customers are presented. Test whether the responses to Brand X are the same in
both cities.

Rating of Brand X on a scale of 0-7

Mumbai: 2, 3, 3, 4, 5, 4, 4, 5, 3, 4, 5, 4, 3, 3, 4
Delhi: 3, 4, 5, 6, 5, 5, 5, 4, 3, 3, 3, 5, 6, 6, 6

Step 1) H0: Mean of Mumbai = Mean of Delhi
        H1: Mean of Mumbai not equal to Mean of Delhi
Step 2) Alpha 0.05
Step 3) Independent T.Test for 2 samples, since we have only one factor here, i.e. responses
from customers in two independent groups.
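A sketch of the pooled-variance independent t-test on these ratings, assuming similar variability in the two cities:

```python
from math import sqrt
from statistics import mean, variance

mumbai = [2, 3, 3, 4, 5, 4, 4, 5, 3, 4, 5, 4, 3, 3, 4]
delhi  = [3, 4, 5, 6, 5, 5, 5, 4, 3, 3, 3, 5, 6, 6, 6]

n1, n2 = len(mumbai), len(delhi)
# pooled sample variance (assumes roughly equal variances in both cities)
sp2 = ((n1 - 1) * variance(mumbai) + (n2 - 1) * variance(delhi)) / (n1 + n2 - 2)
t = (mean(mumbai) - mean(delhi)) / sqrt(sp2 * (1 / n1 + 1 / n2))   # ~ -2.27

# |t| = 2.27 > t(0.025, df=28) = 2.048, so H0 (equal means) is rejected:
# the two cities rate Brand X differently
```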

Example 3: We have recorded the ratings of Tamarind brand garments from 18 respondents before and
after an advertisement campaign was released for this brand. The ratings are on a 10-point scale. Test
whether the campaign had an effect on sales of Tamarind brand garments.

Before: 3, 4, 2, 5, 3, 4, 5, 3, 4, 2, 2, 4, 1, 3, 6, 3, 2, 3
After: 4, 5, 3, 4, 5, 5, 6, 4, 5, 4, 4, 5, 3, 5, 8, 4, 4, 5

Step 1) H0: No effect of the advt campaign (mean before and after are the same)
        H1: Advt campaign had an effect on sales (mean sales before / after are different)
Step 2) Alpha 0.05
Step 3) Go for a Paired T.Test, since the same sample observations are tested twice
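The paired t-test works on the per-respondent differences; a minimal sketch:

```python
from math import sqrt
from statistics import mean, stdev

before = [3, 4, 2, 5, 3, 4, 5, 3, 4, 2, 2, 4, 1, 3, 6, 3, 2, 3]
after  = [4, 5, 3, 4, 5, 5, 6, 4, 5, 4, 4, 5, 3, 5, 8, 4, 4, 5]

diffs = [a - b for a, b in zip(after, before)]        # per-respondent change
t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))   # ~7.38

# t far exceeds t(0.025, df=17) = 2.110, so H0 is rejected:
# the campaign had an effect on the ratings
```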

ANOVA (ANALYSIS OF VARIANCE)

Hypothesis testing applies to testing the parameters of a sample or a population. While the t and z
tests compare the means of one or two samples, ANOVA is a type of hypothesis testing that handles
more than two groups or factors, overcoming that limitation of the simple tests.

When we say, more than 2 variables, it means, influence of 1 or more factors on a variable.
Ex: Sales affected by location of item displayed. (Either window or near counter or shelf)-
Though there are 3 options, they are categorised as a single factor influencing sales. Here
ANOVA is mainly concerned with the analysis of “which one is better among the 3” options
within the same variable called storage area.

2 Factors can also influence a variable. Ex: Sales because of storage location and price. In such
cases, ANOVA gives the combination of the factors which will fetch max results.

ANOVA mainly deals with a detailed analysis of variances in the variables as follows:-
 Between the variables, (ex. Variance of Variable 1 and variance of variable 2 )
 Within the variable (variances in variable 1 from its mean)
 In totality.

Example 1 - Simple One-Way ANOVA

Analyse if the sales of Kit Kat are affected by the area of display (storage location) in a store. Sales for a
week are observed when Kit Kat is placed at the window side, counter side or on the shelf.

Shelf Window Counter


450 500 550
490 530 570
500 510 560
470 500 530
480 530 590
500 540 600
460 520 580

H0 = Sales are equal across storage areas
H1 = At least one storage location's sales are different
Alpha = 0.05
Test to be performed - One-Way ANOVA with one class
variable having 3 levels
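The one-way ANOVA F statistic for these sales figures can be computed by hand; a sketch using only the standard library:

```python
from statistics import mean

groups = {
    "shelf":   [450, 490, 500, 470, 480, 500, 460],
    "window":  [500, 530, 510, 500, 530, 540, 520],
    "counter": [550, 570, 560, 530, 590, 600, 580],
}

observations = [x for g in groups.values() for x in g]
grand_mean = mean(observations)
k, n = len(groups), len(observations)

# between-groups and within-groups sums of squares
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ssw = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

f = (ssb / (k - 1)) / (ssw / (n - k))   # ~35.3

# F greatly exceeds F(0.05; 2, 18) = 3.55, so H0 is rejected:
# at least one display location sells differently
```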

F- TESTS

The F-test is most commonly used to compare two variances. However, the F-statistic is used in a
variety of tests, including regression analysis, the Chow test and the Scheffé test (a post-hoc ANOVA
test).

An F-test (Snedecor and Cochran, 1983) is used to test if the variances of two populations are equal.
This test can be a two-tailed test or a one-tailed test. The two-tailed version tests against the alternative
that the variances are not equal. The one-tailed version only tests in one direction that is the variance
from the first population is either greater than or less than (but not both) the second population
variance. The choice is determined by the problem. For example, if we are testing a new process, we
may only be interested in knowing if the new process is less variable than the old process.

The F hypothesis test is defined as:

H0: σ1² = σ2²
Ha: σ1² < σ2² for a lower one-tailed test
    σ1² > σ2² for an upper one-tailed test
    σ1² ≠ σ2² for a two-tailed test

Test statistic: F = s1² / s2², where s1² and s2² are the sample variances. The more this ratio deviates
from 1, the stronger the evidence for unequal population variances.
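A minimal sketch of this variance-ratio test; the two process samples below are made-up illustration data, not from the text:

```python
from statistics import variance

# hypothetical fill weights from an old and a new process (illustration only)
old_process = [10.2, 9.8, 10.5, 9.6, 10.4, 10.1, 9.5, 10.6]
new_process = [10.1, 10.0, 10.2, 9.9, 10.1, 10.0, 10.0, 9.9]

f = variance(old_process) / variance(new_process)
# compare f against the F-table value for (7, 7) degrees of freedom;
# a ratio far from 1 is evidence that the variances differ
```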

CHI SQUARE TESTS

All the hypothesis tests like Simple T.Test, Independent T.Tests or even ANOVA deal with numeric or
continuous data, which can be quantified for further calculations. If your data is categorical or nominal
and when the data is qualitative, statistical calculations get limited.

Categorical data are nonnumerical data that are frequency counts of categories in one or more
variables.
Hence the question of comparison of Means or Variances doesn’t arise. A categorical comparison needs
to be done based on possible category values and their frequency occurrences, in both qualitative
variables.

In such cases also, business requires you to analyse the categorical data and give inferences based on
past history and forecast the future. Thus, we use CHI-SQUARE Tests for Categorical data. Chi-Square
tests can be used even if one of the variables is qualitative or all your variables are qualitative.

Chi-Square is simply the sum of the squared differences between the observed and expected values,
each divided by the expected value. Chi-square records the differences between observed and
expected, takes their squares, and divides them by the expected values:

χ² = Σ (Observed − Expected)² / Expected

Applications of Chi-Square:-
Your categorical data can be analysed by using Chi-Square in the following situations:-

1) CHI-SQUARE GOODNESS OF FIT Test - To check goodness of fit and compare observed vs.
predicted values. For ex: if the variable is economic class with 3 possible outcomes of lower
income class, middle income class and upper income class, the single dimension is economic
class and the 3 possible outcomes are the 3 classes. On each trial, only one class can occur, i.e. a
family can be categorised under only one class at a time.

Here, the Chi-Square goodness of fit test compares the expected, or calculated frequencies of
categories with the observed / actual frequencies from a dataset to determine whether there is a
difference between what was expected and what was actually observed.

Ho: The observed values are the same as expected values.


H1: The observed values are not the same as expected values.
This formula compares the frequency of observed values to the frequency of expected values
across the distribution. Here again, the Chi-Square value is either compared with the table value
or the p value of 0.05 is used for hypothesis acceptance.

2) CHI-SQUARE TEST of INDEPENDENCE - To find the dependency or influence of 2 attributes.


The Chi-Square goodness of fit test is used to analyse the category frequencies of one variable.
(Meaning, the possible categorical values in a single variable). But Chi-Square test of
independence can be used to analyse the frequencies of two variables with multiple categories
to determine whether the two variables are independent or not.

Ex: whether financial investors’ decisions are based on region or not. The researcher gets answers to
two questions which have categorical answers:
a) In which region does the investor reside? (North, East, West, South)
b) Which type of investment does he prefer? (stocks, bonds, treasury bills)
In such cases, the researcher would tally the frequencies of responses to these two questions
into two categorical variables and record them in a two-dimensional CONTINGENCY TABLE. Hence
this Chi-Square test of independence can also be called CONTINGENCY ANALYSIS.

The hypothesis is, H0=The 2 attributes are independent of each other.


H1= The 2 attributes are associated with each other.

There is a slight difference in the Chi-Square formula for the test of independence:

χ² = Σ Σ (f₀ − fₑ)² / fₑ

Note: Here the frequencies of both variables are taken into account, and the degrees of freedom are
(rows − 1) × (columns − 1).

3) To compare a sample variance with a population variance.

As noted earlier, the relationship of the sample variance to the population variance is captured by the
CHI-SQUARE distribution (χ²): (n − 1)s²/σ² is approximately chi-square distributed, if the population is
normally distributed. This does not suit conditions where the data are not normally distributed.

PRACTICAL PROBLEMS

Example
Is gender independent of education level? A random sample of 395 people were surveyed and each
person was asked to report the highest education level they obtained. The data that resulted from the
survey is summarized in the following table:

High School Bachelors Masters Ph.d. Total


Female 60 54 46 41 201
Male 40 44 53 57 194
Total 100 98 99 98 395
Question: Are gender and education level dependent at 5% level of significance? In other words,
given the data collected above, is there a relationship between the gender of an individual and the
level of education that they have obtained?
Here's the table of expected counts:

High School Bachelors Masters Ph.d. Total


Female 50.886 49.868 50.377 49.868 201
Male 49.114 48.132 48.623 48.132 194
Total 100 98 99 98 395
So, working this out, χ² = (60−50.886)²/50.886 + ⋯ + (57−48.132)²/48.132 = 8.006.
The critical value of χ² with 3 degrees of freedom is 7.815. Since 8.006 > 7.815, we reject
the null hypothesis and conclude that education level is not independent of gender at the 5% level of
significance.
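The χ² computation above can be reproduced directly from the observed table:

```python
observed = [[60, 54, 46, 41],   # Female: HS, Bachelors, Masters, PhD
            [40, 44, 53, 57]]   # Male

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected count
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2-1)*(4-1) = 3
# chi2 ~ 8.006 > 7.815 (critical value at df = 3), so H0 is rejected
```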

BINOMIAL DISTRIBUTION:

 The experiment consists of n repeated trials.


 Each trial can result in just two possible outcomes. We call one of these outcomes
a success and the other, a failure.
 The probability of success, denoted by P, is the same on every trial.
 The trials are independent; that is, the outcome on one trial does not affect the
outcome on other trials.

Notation

The following notation is helpful, when we talk about binomial probability.

 x: The number of successes that result from the binomial experiment.


 n: The number of trials in the binomial experiment.
 P: The probability of success on an individual trial.
 Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
 n!: The factorial of n (also known as n factorial).
 b(x; n, P): Binomial probability - the probability that an n-trial binomial
experiment results in exactly x successes, when the probability of success on an
individual trial is P.
 nCX: The number of combinations of n things, taken X at a time.

The probability mass function is

b(x; n, P) = P(X = x) = nCx · P^x · (1 − P)^(n − x)

EX: Consider the following statistical experiment. You flip a coin 2 times and count the
number of times the coin lands on heads. This is a binomial experiment because:

 The experiment consists of repeated trials. We flip a coin 2 times.


 Each trial can result in just two possible outcomes - heads or tails.
 The probability of success is constant - 0.5 on every trial.
 The trials are independent; that is, getting heads on one trial does not affect
whether we get heads on other trials.

Q. A coin is tossed 10 times. What is the probability of getting exactly 6 heads?

I’m going to use this formula: b(x; n, P) = nCx · P^x · (1 − P)^(n − x)

The number of trials (n) is 10.
The probability of success (“tossing a heads”) is 0.5, so 1 − P = 0.5.
x = 6

P(X = 6) = 10C6 · 0.5^6 · 0.5^4 = 210 · 0.015625 · 0.0625 = 0.205078125
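The same calculation as a small function, with `math.comb` providing nCx:

```python
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, P) = nCx * P^x * (1 - P)^(n - x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binom_pmf(6, 10, 0.5))   # 0.205078125
```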

POISSON DISTRIBUTION:

The probability distribution of a Poisson random variable X representing the
number of successes occurring in a given time interval or a specified region of
space is given by the formula:

P(x; μ) = (e^(−μ) · μ^x) / x!

where
e is the base of natural logarithms (2.7183)
μ is the mean number of "successes"
x is the number of "successes" in question

EX: Suppose you knew that the mean number of calls to a fire station on a
weekday is 8. What is the probability that on a given weekday there would be 11
calls? This problem can be solved using the Poisson formula with μ = 8 and x = 11:

P(11; 8) = (e^(−8) · 8^11) / 11! ≈ 0.072
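A sketch of the fire-station calculation:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x; mu) = (e^-mu * mu^x) / x!"""
    return exp(-mu) * mu ** x / factorial(x)

p = poisson_pmf(11, 8)   # probability of exactly 11 calls when the mean is 8
# p is roughly 0.072
```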

Normal Distribution

The normal distribution refers to a family of continuous probability


distributions described by the normal equation.

Normal equation. The value Y of the probability density at x is:

Y = [1 / (σ · √(2π))] · e^(−(x − μ)² / (2σ²))

where X is a normal random variable, μ is the mean, σ is the standard deviation, π is
approximately 3.14159, and e is approximately 2.71828.

The random variable X in the normal equation is called the normal random variable. The
normal equation is the probability density function for the normal distribution.

The Normal Curve

The graph of the normal distribution depends on two factors - the mean and the
standard deviation.

The mean of the distribution determines the location of the center of the graph,

The standard deviation determines the height and width of the graph.

When the standard deviation is large, the curve is short and wide;

When the standard deviation is small, the curve is tall and narrow.

All normal distributions look like a symmetric, bell-shaped curve.

Of two normal curves, the one with the bigger standard deviation is shorter and wider than the
one with the smaller standard deviation.

Probability and the Normal Curve

The normal distribution is a continuous probability distribution. This has several


implications for probability.

 The total area under the normal curve is equal to 1.


 The probability that X is greater than a equals the area under the normal curve
bounded by a and plus infinity.
 The probability that X is less than a equals the area under the normal curve
bounded by a and minus infinity.

Additionally, every normal curve (regardless of its mean or standard deviation) conforms
to the following "rule".

 About 68% of the area under the curve falls within 1 standard deviation of the
mean.
 About 95% of the area under the curve falls within 2 standard deviations of the
mean.
 About 99.7% of the area under the curve falls within 3 standard deviations of the
mean.

Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly,
given a normal distribution, most outcomes will be within 3 standard deviations of the
mean.

Example 1

An average light bulb manufactured by the Acme Corporation lasts 300 days with a
standard deviation of 50 days. Assuming that bulb life is normally distributed, what is
the probability that an Acme light bulb will last at most 365 days?

Solution: Given a mean score of 300 days and a standard deviation of 50 days, we want
to find the cumulative probability that bulb life is less than or equal to 365 days. Thus,
we know the following:

 The value of the normal random variable is 365 days.


 The mean is equal to 300 days.
 The standard deviation is equal to 50 days.

We enter these values into the Normal Distribution Calculator and compute the
cumulative probability. The answer is: P( X < 365) = 0.90. Hence, there is a 90% chance
that a light bulb will burn out within 365 days.
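Without a calculator, the same cumulative probability can be computed from the error function, since Φ(z) = (1 + erf(z/√2)) / 2:

```python
from math import erf, sqrt

mu, sigma, x = 300, 50, 365

z = (x - mu) / sigma                  # 1.3 standard deviations above the mean
p = 0.5 * (1 + erf(z / sqrt(2)))      # standard normal CDF at z
# p is about 0.90, matching the calculator's answer
```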
