
EGR Introduction to Industrial & Systems Engineering


Complete your homework assignment only with the given methods.
Course Topics
Overview of Industrial and Systems Engineering.
Probability in a finite sample space.
Conditional probability and Bayes' theorem.
Discrete probability distributions.
Continuous probability distributions.
Probability for systems with 2 or more random events.
Statistical sampling and confidence intervals.
Hypothesis testing.

Statistical Independence

Two events A and B are statistically independent if

P(A|B) = P(A) & P(B|A) = P(B)

and it follows that

P(A ∩ B) = P(A)P(B)    (1)

Note that (1) is a result of the conditional probability formula

P(A | B) = P(A ∩ B) / P(B)
Bayes' Theorem

If Bi (i = 1, ..., k) are mutually exclusive and

B1 ∪ B2 ∪ ... ∪ Bk = S

then for any event A,

P(Bj | A) = P(Bj) P(A | Bj) / ∑_{i=1}^{k} P(Bi) P(A | Bi)
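A minimal Python sketch of Bayes' theorem for k = 3 mutually exclusive events covering S; the prior and likelihood values are made-up illustration numbers:

```python
# Bayes' theorem with k = 3 mutually exclusive events B1, B2, B3
# that cover the sample space S. All numbers are illustrative.
priors = [0.5, 0.3, 0.2]           # P(Bi); must sum to 1
likelihoods = [0.10, 0.40, 0.70]   # P(A | Bi)

total = sum(p * l for p, l in zip(priors, likelihoods))   # P(A)
posteriors = [p * l / total for p, l in zip(priors, likelihoods)]

print(posteriors)   # P(Bi | A); sums to 1 by construction
```
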
Properties of Probability Function

For any discrete RV X ∈ S,

0 ≤ P(x) ≤ 1

and

∑_{x=-∞}^{∞} P(x) = 1
Definition

The expected value, E(X), of a discrete RV X is given by

E(X) = ∑_{x=-∞}^{∞} x P(x)

Note that: expected value ≡ average ≡ mean.

Notation:

E(X) = μ = μ_X

Note the units of E(X) are the same as those of X.


Note

An alternative method for computing variance:

VAR(X) = E[(X - E(X))²]
       = E[X² - 2X·E(X) + E²(X)]
       = E(X²) - E[2X·E(X)] + E[E²(X)]

Since E(X) is a constant,

VAR(X) = E(X²) - 2E²(X) + E²(X)
       = E(X²) - E²(X)

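A quick numerical check of the identity VAR(X) = E(X²) - E²(X), using a fair six-sided die as the example distribution:

```python
# Check VAR(X) = E(X^2) - E^2(X) for a fair six-sided die.
xs = range(1, 7)
p = 1 / 6                                      # P(x) = 1/6 for each face

ex = sum(x * p for x in xs)                    # E(X)   = 3.5
ex2 = sum(x * x * p for x in xs)               # E(X^2) = 15.1666...

var_def = sum((x - ex) ** 2 * p for x in xs)   # E[(X - E(X))^2]
var_alt = ex2 - ex ** 2                        # E(X^2) - E^2(X)
print(var_def, var_alt)                        # both 2.9166...
```
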
Definition

Bernoulli trials are a set of n trials of a random


process where the outcome of each trial is
statistically independent of the n-1 other
trials.
Examples.
Roll a pair of dice n times.
Flip a coin n times.
A random process whose outcomes are Bernoulli
trials is said to satisfy the Bernoulli property.
Binomial Distribution

Let RV X satisfy the Bernoulli property and be defined by

X = {# of times event A occurs in n trials}

then

P(x) = n!/(x!(n-x)!) · p^x q^(n-x)    for 0 ≤ x ≤ n
     = 0                              elsewhere

where on any trial,

p = P(A)  &  q = P(Aᶜ) = 1 - p

Note that order does not matter.
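A sketch of the binomial probability function exactly as written above; the example numbers (10 coin flips, p = 0.5) are illustrative:

```python
from math import comb

# Binomial pmf: P(x) = C(n,x) p^x q^(n-x) for 0 <= x <= n.
def binom_pmf(x, n, p):
    if not 0 <= x <= n:
        return 0.0          # "0 elsewhere"
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(4, 10, 0.5))   # P(4 heads in 10 flips) ~ 0.2051
```
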
Aside

A combination, C(n,x), is the # of ways to select


x elements (without replacement) from a set of
n distinct elements where order does not
matter and is given by
C(n,x) = n! / (x!(n - x)!)

Example: # of ways to arrange 2 apples and 4 mangos:

C(6,2) = 6! / (2!(6 - 2)!) = 15
Negative Binomial Distribution

Let the random process satisfy the Bernoulli property and define RV X by

X = {trial # when event A occurs for the k-th time}

then

P(x) = (x-1)! / ((k-1)!(x-k)!) · p^k q^(x-k)    for 1 ≤ k ≤ x
     = 0                                        elsewhere

where on any trial,

p = P(A)  &  q = P(Aᶜ) = 1 - p

Note that order does not matter except for trial x.
Special Case

For k = 1, the negative binomial distribution is termed the
geometric distribution, where

P(x) = p q^(x-1)    for x ≥ 1
     = 0            elsewhere

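A sketch of the negative binomial probability function above, noting that (x-1)!/((k-1)!(x-k)!) is just C(x-1, k-1); setting k = 1 reproduces the geometric pmf p·q^(x-1). The numbers are illustrative:

```python
from math import comb

# P(x) = C(x-1, k-1) p^k q^(x-k): the k-th occurrence of A on trial x.
def neg_binom_pmf(x, k, p):
    if x < k or k < 1:
        return 0.0          # "0 elsewhere"
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

print(neg_binom_pmf(5, 2, 0.3))   # 2nd occurrence on trial 5: ~0.1235
print(neg_binom_pmf(5, 1, 0.3))   # geometric special case (k = 1)
print(0.3 * 0.7**4)               # p q^(x-1): matches the line above
```
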
Hypergeometric Distribution
Consider a discrete sample space S where
N = # of elements in S & k = # of events A in S
Randomly sample n elements without replacement
from S and define the RV
X = {# of times event A is selected}, then

P(x) = h(x; N, n, k) = C(k,x) C(N-k, n-x) / C(N,n)    for 0 ≤ x ≤ n
     = 0                                              elsewhere
Note that order does not matter.


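A sketch using scipy's hypergeometric distribution; note scipy's argument letters differ from the slide's (its population size, success count, and sample size correspond to the slide's N, k, and n), so the call below reflects that assumed mapping. The numbers are illustrative:

```python
from math import comb
from scipy.stats import hypergeom

N, k, n = 20, 7, 5   # slide notation: population, # of A's in S, sample size
x = 2                # # of times event A appears in the sample

# Direct evaluation of the slide's formula.
print(comb(k, x) * comb(N - k, n - x) / comb(N, n))   # ~0.3874

# scipy's order is pmf(x, population, successes, draws).
print(hypergeom.pmf(x, N, k, n))                      # same value
```
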
Binomial vs. Hypergeometric

Similarities:
- Sample n items, x of which are event A and the remaining n-x are
  event Aᶜ.
- Order does not matter.

Differences:
- Binomial: Bernoulli property satisfied (i.e., events
  statistically independent).
- Hypergeometric: Bernoulli property not satisfied (i.e., events
  not statistically independent).

Binomial vs. Hypergeometric

If the size of the sample space N is large and p ≈ k/N, then the
probability P(x) is such that

C(n,x) p^x q^(n-x)  ≈  C(k,x) C(N-k, n-x) / C(N,n)

(binomial distribution ≈ hypergeometric distribution)
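A small numerical check of this approximation, with an illustrative large population:

```python
from scipy.stats import binom, hypergeom

N, n = 10_000, 20    # large population, modest sample
k = 3_000            # # of A's in S, so p = k/N = 0.3
x = 6

print(binom.pmf(x, n, k / N))      # binomial:       ~0.19164
print(hypergeom.pmf(x, N, k, n))   # hypergeometric: nearly identical
```
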
General
Hypergeometric Distribution
Consider a sample space containing N discrete
elements. Each of these elements is classified as one
of J events A1, A2, ..., AJ where there are ki of each
event Ai and k1+k2+...+kJ=N.
Select n elements at random without replacement
and let the J RVs be defined as
Xi = {# of times event Ai is selected} i=1,...,J
then
P(x1, x2, ..., xJ) = C(k1,x1) C(k2,x2) ... C(kJ,xJ) / C(N,n)
Note that order does not matter & x1+x2+...+xJ=n
Uniform Distribution

Consider a sample space with k distinct elements denoted by the
RVs x1, x2, ..., xk. If all k outcomes xi are equally likely to
occur, then

P(xi) = 1/k    for i = 1, 2, ..., k
      = 0      elsewhere

Poisson Distribution

Let the discrete RV for a Poisson process be defined as

X = {# of events that occur during a specified time span t (or space)}

then

P(x) = (λt)^x e^(-λt) / x!    for x ≥ 0
     = 0                      for x < 0
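A sketch of the Poisson probability function using scipy, where the distribution's mean is λt; the rate and time span are illustrative:

```python
from scipy.stats import poisson

lam, t = 4.0, 0.5    # e.g., 4 events/hour observed for half an hour
mu = lam * t         # mean # of events in the span: lambda * t = 2

print(poisson.pmf(0, mu))   # P(no events)  = e^-2 ~ 0.1353
print(poisson.pmf(2, mu))   # P(two events)        ~ 0.2707
```
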
Helpful Hints
Some questions to help identify an
appropriate discrete probability distribution.
1. Does the outcome of the random process involve
one trial or a series of trials?
If a series, are they Bernoulli trials (i.e., statistically
independent)?
2. Does order matter?
For a given trial and/or between trials.
3. How many outcomes are possible on a given
trial?
Answers will not guarantee identification of correct probability distribution!
Discrete vs. Continuous
Discrete RV:    P(a ≤ x ≤ b) = ∑_{x=a}^{b} P(x)

Continuous RV:  P(a ≤ x ≤ b) = ∫_a^b f(x) dx

[Figures: a bar chart of P(x) for the discrete case and a shaded
area under f(x) between a and b for the continuous case.]

f(x) measures the density of probability between 2 values.

What if b = a?
Final Note

This is a very important concept.

The pdf, f(x), is a measure of the density of probability in a region.

The pdf, f(x), is not a probability function, hence

f(a) ≠ P(X = a)

Uniform Distribution

A uniform distribution for a continuous RV X has a pdf given by

f(x) = 1/(B - A)    for A ≤ x ≤ B
     = 0            for x < A or x > B

This formula is still valid if either of the strict inequalities
(< or >) is exchanged with its associated inequality (≤ or ≥).
We will see why this can be done later.

Uniform Distribution

If A ≤ a ≤ x ≤ b ≤ B, then

P(a ≤ x ≤ b) = ∫_a^b 1/(B - A) dx = (b - a)/(B - A)

[Figure: the pdf f(x) = 1/(B - A) with the area between a and b shaded.]

Note that

P(a ≤ x ≤ b) = P(a < x ≤ b) = P(a ≤ x < b) = P(a < x < b)
Uniform Distribution

The cumulative distribution function is given by

F(x) = 0                for x < A
     = (x - A)/(B - A)  for A ≤ x ≤ B
     = 1                for x > B

[Figure: F(x) rising linearly from 0 at A to 1 at B.]

The expected value & variance are given by

E(X) = (A + B)/2  &  VAR(X) = (B - A)²/12


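A check of these formulas with scipy, which parameterizes the continuous uniform with loc = A and scale = B - A; the interval [2, 10] is illustrative:

```python
from scipy.stats import uniform

A, B = 2.0, 10.0
X = uniform(loc=A, scale=B - A)

print(X.mean(), (A + B) / 2)             # 6.0 both ways
print(X.var(), (B - A) ** 2 / 12)        # 5.333... both ways
print(X.cdf(7.0), (7.0 - A) / (B - A))   # F(7) = 0.625 both ways
```
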
Exponential Distribution

An exponential distribution with Poisson parameter λ > 0 has a
continuous RV

T = {time or space between successive occurrences of a Poisson process}

and a pdf given by

f(t) = λ e^(-λt)    for t ≥ 0        (β = 1/λ in the book)
     = 0            for t < 0

This is the same Poisson process as in the Poisson distribution.
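A sketch with scipy, whose scale argument is the book's β = 1/λ; the rate is illustrative:

```python
from scipy.stats import expon

lam = 4.0                  # illustrative Poisson rate (events per hour)
T = expon(scale=1 / lam)   # scipy's scale = beta = 1/lambda

print(T.mean())     # E(T) = 1/lambda = 0.25
print(T.cdf(0.5))   # P(T <= 0.5) = 1 - e^-2 ~ 0.8647
```
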
Exponential Distribution vs. Poisson Distribution

Exponential:  f(t) = λ e^(-λt)            for t ≥ 0;  0 for t < 0
Poisson:      P(x) = (λt)^x e^(-λt) / x!  for x ≥ 0;  0 for x < 0

Similarities:
- The random process is a Poisson process.
- λ represents the average # of events per unit time or space.

Differences:
- Poisson: interested in a RV X that is discrete.
- Exponential: interested in a RV T that is continuous.

The random process is the same for both!
Normal Distribution

A normal RV is denoted by

X ~ N(μ, σ)

where

μ = mean  &  σ = standard deviation

and the pdf is given by

f(x) = 1/(σ√(2π)) · e^(-(x-μ)²/(2σ²))    for -∞ < x < ∞

Also known as the Gaussian distribution.
Normal Distribution

How to use the Z ~ N(0,1) table for X ~ N(μ, σ):

F(x) = ∫_{-∞}^{x} 1/(σ√(2π)) · e^(-(ξ-μ)²/(2σ²)) dξ

Substitute z = (ξ - μ)/σ, so dξ = σ·dz; then

F(x) = ∫_{-∞}^{(x-μ)/σ} 1/√(2π) · e^(-z²/2) dz

(Note: when ξ = x, the upper bound becomes (x - μ)/σ.)

F(x) = F_Z((x - μ)/σ)
Example

Portion of the F_Z(z) table for Z ~ N(0,1):

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857

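A check of the standardization F(x) = F_Z((x - μ)/σ) with scipy; μ and σ are illustrative, and the z = 2.0 case reproduces the 0.9772 entry in the table above:

```python
from scipy.stats import norm

mu, sigma = 100.0, 15.0
x = 130.0                                 # gives z = (x - mu)/sigma = 2.0

print(norm.cdf(x, loc=mu, scale=sigma))   # direct F(x)
print(norm.cdf((x - mu) / sigma))         # F_Z(z); same value, 0.9772
```
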
Special values for X ~ N(μ, σ)

1. P(μ-σ ≤ X ≤ μ+σ) = F_Z((μ+σ-μ)/σ) - F_Z((μ-σ-μ)/σ)
                    = F_Z(1) - F_Z(-1)
                    = 0.8413 - 0.1587
                    = 0.6826

Approximately two-thirds of the outcomes from a normally
distributed random process will be within one standard deviation
of the mean.
Special Values for X ~ N(μ, σ)

[Figure: the normal pdf with the following regions shaded.]

P(μ-σ ≤ X ≤ μ+σ) = 0.683
P(μ-2σ ≤ X ≤ μ+2σ) = 0.955
P(μ-3σ ≤ X ≤ μ+3σ) = 0.997
Weibull Distribution
A continuous RV X has a Weibull distribution if the pdf of X is

f(x) = aβ·x^(β-1) e^(-a·x^β)    for x ≥ 0
     = 0                        for x < 0

where the constant parameters a > 0 & β > 0.

[Figures: the Weibull probability density function and cumulative
distribution function for several parameter values.]

Note there are many physical systems where the Weibull
distribution fits the observed data very well.
Weibull Distribution

The cumulative distribution function for the Weibull distribution
is given by

F(x) = 1 - e^(-a·x^β)    for x ≥ 0
     = 0                 for x < 0


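A sketch comparing the slide's CDF with scipy's weibull_min, which uses F(x) = 1 - e^(-(x/s)^c); matching the two forms gives c = β and scale s = a^(-1/β), and that mapping is the assumption being checked here. The parameters are illustrative:

```python
from math import exp
from scipy.stats import weibull_min

a, beta = 2.0, 1.5
x = 0.8

print(1 - exp(-a * x**beta))                           # slide's F(x): ~0.761
print(weibull_min.cdf(x, beta, scale=a**(-1 / beta)))  # same value
```
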
Bivariate Distributions

Schematic showing joint, marginal,
and conditional densities

Definition

Let X and Y be discrete RVs.

The joint probability function of X and Y is given by

P(x,y) = P[(X = x) ∩ (Y = y)]

The marginal probability functions of X and Y, respectively, are given by

P(x) = ∑_{y=-∞}^{∞} P(x,y)

and

P(y) = ∑_{x=-∞}^{∞} P(x,y)
Calculating P(x,y)
Consider a discrete bivariate distribution
with RVs X and Y. Values for the joint
probability function can be calculated using
P(x,y) = P[(X = x) ∩ (Y = y)]
       = P(y|x) P(x)    (1)

P(y|x) is the conditional probability function.
P(x) is the marginal probability.
Note it's OK to exchange X and Y in (1).
Example

For this example, the structure for the joint probability function
table is

           Y       0        1     |  P(x)
  X                               |
  0             P(0,0)   P(0,1)   |  P(0)
  1             P(1,0)   P(1,1)   |  P(1)
  --------------------------------+------
  P(y)           P(0)     P(1)    |  1.0

Reading the table:
- The column headers are the RV Y & its numerical values; the row
  headers are the RV X & its numerical values.
- The interior entries are the values of the joint probability
  function.
- The right column holds the marginal probability function of X
  (row sums): P(x) = ∑_{y=-∞}^{∞} P(x,y)
- The bottom row holds the marginal probability function of Y
  (column sums): P(y) = ∑_{x=-∞}^{∞} P(x,y)
- A reminder: these are probabilities & must sum to 1.
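A numeric version of the table above; the four joint probabilities are made-up but sum to 1:

```python
import numpy as np

P = np.array([[0.10, 0.30],    # row x=0: P(0,0), P(0,1)
              [0.25, 0.35]])   # row x=1: P(1,0), P(1,1)

print(P.sum(axis=1))   # marginal P(x): row sums    -> [0.40 0.60]
print(P.sum(axis=0))   # marginal P(y): column sums -> [0.35 0.65]
print(P.sum())         # all entries sum to 1.0
```
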
Definition

Let X and Y be continuous RVs.

The joint probability density function, f(x,y), is such that

P[(a ≤ X ≤ b) ∩ (c ≤ Y ≤ d)] = ∫_a^b ∫_c^d f(x,y) dy dx

(Don't let the double integral scare you! It's not that bad.)

The marginal probability density functions are given by

f(x) = ∫_{-∞}^{∞} f(x,y) dy

and

f(y) = ∫_{-∞}^{∞} f(x,y) dx
Properties

The joint pdf is a pdf! Hence,

f(x,y) ≥ 0 for all (x,y) ∈ S

and

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x,y) dy dx = 1

Illustration of Covariance

Consider a random process that produces RVs X and Y. Collect
several samples from this process and make a scatter plot of the
data.

[Figures: three scatter plots of Y vs. X, for which the covariance
is positive & large, near zero, and negative & large, respectively.]
Definition

The covariance between RVs X and Y is given by

COV(X,Y) = E[(X - μ_X)(Y - μ_Y)]

The formula is valid for discrete RVs or continuous RVs.

Note an alternative formula for covariance is

COV(X,Y) = E(XY) - E(X)E(Y)

A Problem with Covariance

What is considered a large positive, or negative, value for
COV(X,Y)? It depends on the units of the RVs X and Y.

We can eliminate this problem by considering the correlation
coefficient, which is defined by

r = COV(X,Y) / √(VAR(X)·VAR(Y))

Note that -1 ≤ r ≤ 1. Also, r has the same sign as COV(X,Y).
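A simulation sketch: generate (X, Y) pairs with a built-in positive relationship and compute the sample covariance and correlation; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)   # Y rises with X, so r > 0

print(np.cov(x, y)[0, 1])        # COV(X,Y): positive & large (units of X*Y)
print(np.corrcoef(x, y)[0, 1])   # r: unitless, close to +0.9 here
```
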
What's It All About?

Consider a random process with RV X.

Probability:
- The type of distribution, μ, and σ are all known.
- Use this information to compute the probability of various outcomes X.

Statistics:
- The type of distribution, μ, and σ are all unknown.
- Sample the output X, then estimate μ and σ from the sampled data.
- Use this information to try and determine the type of distribution.
- This is called parametric estimation.
Central Limit Theorem
Consider N RVs X1, X2, ..., XN where

E(Xi) = μi  &  VAR(Xi) = σi²    for i = 1, ..., N

are known. Define a RV as

X̄ = (X1 + X2 + ... + XN)/N

(The CLT implies that X̄ is approximately a normal RV.)

Then let the standardized RV be

Z_N = [X̄ - (μ1 + ... + μN)/N] / [√(σ1² + ... + σN²)/N]

For N sufficiently large, Z_N ≈ Z ~ N(0,1).
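A simulation sketch of the CLT: standardized means of N uniform(0,1) samples behave like N(0,1) draws. For the uniform(0,1), μ = 0.5 and σ² = 1/12:

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 30, 100_000

xbar = rng.uniform(size=(reps, N)).mean(axis=1)   # reps sample means
z = (xbar - 0.5) / np.sqrt((1 / 12) / N)          # standardized RV Z_N

print(z.mean(), z.std())            # ~0 and ~1
print(np.mean(np.abs(z) <= 1.96))   # ~0.95, as for a N(0,1) RV
```
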
More About the CLT

The approximation Z_N ≈ Z ~ N(0,1) gets better as N gets larger;
it is only exact as N → ∞.

In general, a good rule of thumb is that the Central Limit Theorem
can be used if N > 30.

You can get accurate results using the CLT for some distributions
with a lower number of samples. For example, N ≥ 12 works well for
uniform distributions. Some distributions (they rarely occur in
the real world) require N > 50 to get accurate results.
Statistical Sampling
Consider a random process with RV X.
Randomly sample N outputs from the
process and label their values x1, x2, ..., xN.
The sample mean is given by

x̄ = (x1 + x2 + ... + xN)/N

The sample standard deviation is given by

s = √[((x1 - x̄)² + ... + (xN - x̄)²)/(N - 1)]

(We will show why we use N - 1 rather than N later. Don't confuse
x̄ and s with μ and σ.)
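Computing x̄ and s with numpy; note np.std divides by N by default, so ddof=1 is needed to get the N - 1 divisor above. The data are illustrative:

```python
import numpy as np

data = np.array([3.1, 2.7, 3.4, 2.9, 3.3])

print(data.mean())        # x-bar
print(data.std(ddof=1))   # s, with the N - 1 divisor
print(data.std())         # divides by N instead (not the slide's s)
```
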
Statistical Sampling
                     distribution parameters    values from a sample
                     for the actual population  of the actual population
mean                            μ                         x̄
standard deviation              σ                         s

Note that x̄ and s are estimates of μ and σ, respectively, based on
a sample of N outcomes. As N gets larger, these estimates get better.
Notes

(Once again, μ and σ are parameters for the actual random process.)

Some notes about the RV x̄:
- Since it is a sum of RVs, it is an RV too!
- It is sometimes called an estimator, or point estimator, of μ.
- The CLT implies that it has approximately a normal distribution
  with mean E(x̄) = μ and standard deviation σ_x̄ = σ/√N.
  (The estimate of μ gets better as N gets larger.)
- We don't know the distribution of X (μ and σ are unknown).
How Good an Estimate of μ?

Recall the RV x̄ is approximately N(μ, σ/√N).

[Figure: sampling densities of x̄ for increasing sample sizes; the
larger the sample size N, the narrower the density around μ, so x̄
is more likely to be closer to μ. Why this behavior?]
How Good an Estimate of μ?

Since x̄ is approximately N(μ, σ/√N), there is about a 95% chance
that it is within 2σ_x̄ of μ. But we don't know σ! So use the
estimate of σ from the sample (s), thus

σ_x̄ ≈ s/√N

If N is large, this is a good estimate. However, if N is small,
then we will get a more accurate result using the t-distribution
rather than the normal distribution. (We will consider this shortly.)


Back to the Example

Suppose we also randomly sampled 100 cars.

Sample size N = 10:                    Sample size N = 100:
x̄ = 3077 lbs                           x̄ = 3156 lbs
σ_x̄ ≈ s/√N = 565.6/√10 = 178.9 lbs     σ_x̄ ≈ s/√N = 483.2/√100 = 48.3 lbs

Thus, there is a 95% chance that 3077 lbs is within 357.8 lbs of
the average U.S. car's weight, μ:
P(2719.2 ≤ μ ≤ 3434.8) = 0.95

And a 95% chance that 3156 lbs is within 96.6 lbs of μ:
P(3059.4 ≤ μ ≤ 3252.6) = 0.95

(This is called a confidence interval; we will consider it soon.)


Confidence Intervals

What's a Confidence Interval?

Recall the sample mean x̄ has approximately a normal distribution.

Thus, we can calculate the likelihood that x̄ is within a specified
interval of the mean, μ, of the random process. This is called a
confidence level and its associated confidence interval.
Definition

A confidence level and its associated confidence interval are
defined by

P(-c ≤ x̄ - μ ≤ c) = 1 - a

where

[-c, c] is the confidence interval

and

(1 - a) is the confidence level.

In general, you get an accurate result if N > 40. (This is a
larger value than the CLT rule of thumb due to the increased
variability from using s instead of σ.)
General Result - Normal Distribution

Some common confidence levels & their associated confidence intervals:

Confidence Level (1 - a)    Confidence Interval [-c, c]
0.90                        [-1.65·s/√N, +1.65·s/√N]
0.95                        [-1.96·s/√N, +1.96·s/√N]
0.99                        [-2.58·s/√N, +2.58·s/√N]
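Applying the 0.95 row to the car-weight sample with N = 100 (x̄ = 3156 lbs, s = 483.2 lbs from the earlier example); the 1.96 multiplier gives a slightly tighter interval than the "within 2σ_x̄" approximation used there:

```python
from math import sqrt

xbar, s, N = 3156.0, 483.2, 100
half = 1.96 * s / sqrt(N)         # half-width of the 95% interval

print(xbar - half, xbar + half)   # ~[3061.3, 3250.7] lbs
```
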
Recall This?

If the sample size is small, then the normal distribution is not a
good model for the distribution of the sample mean x̄. A better
model is the t distribution (also called the Student's t
distribution).
t Distribution

Let X1, ..., XN be the sampled output from a random process and
define the sample mean and standard deviation as

x̄ = (x1 + ... + xN)/N  &  s = √[((x1 - x̄)² + ... + (xN - x̄)²)/(N - 1)]

then the RV given by

T = (x̄ - μ) / (s/√N)    (note it uses the sample std. deviation)

has a t distribution with N - 1 degrees of freedom.
How to read t-values
It is customary to let t_a represent the t-value above which we
find an area equal to a. Thus, the t-value with 10 degrees of
freedom leaving an area of 0.025 to the right (t_0.025) is
t = 2.228.

Find:
a. t_0.025 when v = 14
b. -t_0.10 when v = 10
c. t_0.995 when v = 7
Confidence Levels - t Distributions

The confidence interval [-c, c] associated with the confidence
level (1 - a) for the t distribution is given by

|c| = t_{a,k} · s / √N

- t_{a,k} is termed the percentage point of the t distribution.
- The value of t_{a,k} is obtained from a table (see handout).
- Note that tables commonly present single-tailed t_{a,k} values.
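A sketch for the small car-weight sample (N = 10, x̄ = 3077 lbs, s = 565.6 lbs), assuming the two-sided 95% interval uses the upper a/2 = 0.025 percentage point with N - 1 degrees of freedom:

```python
from math import sqrt
from scipy.stats import t

xbar, s, N = 3077.0, 565.6, 10
c = t.ppf(0.975, N - 1) * s / sqrt(N)   # t_{0.025, 9} ~ 2.262

print(t.ppf(0.975, 10))     # 2.228: matches the v = 10 reading above
print(xbar - c, xbar + c)   # wider than the normal-based interval
```
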
Hypothesis Testing

Format of the Hypothesis

Determine the probability that competing hypotheses H0 and H1 are
true, where

H0: parameter = A
H1: parameter ≠ A

- Hypothesis H0 is called the null hypothesis.
- Hypothesis H1 is called the alternative hypothesis.
- Note that H1: parameter ≠ A is termed a two-sided alternative
  hypothesis.
Format of the Hypothesis

Note the alternative hypothesis may be of the


form
H1: parameter < A
or
H1: parameter > A

Each is called a one-sided alternative hypothesis.

Testing the Hypotheses

Hypotheses are tested by estimating the value of the parameter
using sampling.

- The sampled estimate's value will rarely be exactly equal to A.
- Hence, the null hypothesis (H0: parameter = A) is considered
  true if the estimate is close to A.
- "Close" is defined by the critical values, AL and AH.
- The acceptance region is given by AL ≤ estimate ≤ AH.
- The critical region (reject H0) is given by estimate < AL or
  AH < estimate.
Testing the Hypotheses

Rejecting the null hypothesis H0 when it is


true is called a type I error.
That is, the estimate implies H0 is false when it is
actually true.

Failing to reject the null hypothesis H0 when


it is false is called a type II error.
That is, the estimate implies H0 is true when it is
actually false.

Hypothesis Testing Errors

The probability of a type I error is denoted by a:

P(reject H0 when H0 true) = a

Called the a error or significance level.

The probability of a type II error is denoted by β:

P(fail to reject H0 when H0 false) = β

Called the β error.

Note that a ≠ 1 - β, since there are 4 possible outcomes.
Strength of Testing Decisions

The power of a statistical hypothesis test is the probability of
rejecting the null hypothesis H0 when the alternative hypothesis
H1 is true. Hence,

power = 1 - β

Note the power represents the probability of correctly rejecting a
false null hypothesis, H0.

Power is a measure of the sensitivity of a hypothesis test.
Minimum Significance Level

The P-value is the smallest significance level that will result in
rejection of the null hypothesis H0 using the given sample data.
(Recall the significance level is a.)

If the test statistic's sample value is z0 ~ N(0,1) and the null
hypothesis is H0: μ = μ0, then

P = 2[1 - F_Z(|z0|)]    for H1: μ ≠ μ0 (2-sided)
P = 1 - F_Z(z0)         for H1: μ > μ0 (1-sided)
P = F_Z(z0)             for H1: μ < μ0 (1-sided)
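A sketch of the three cases with scipy; the sample value z0 = 2.07 is illustrative:

```python
from scipy.stats import norm

z0 = 2.07

print(2 * (1 - norm.cdf(abs(z0))))   # H1: mu != mu0 (2-sided)
print(1 - norm.cdf(z0))              # H1: mu  > mu0 (1-sided)
print(norm.cdf(z0))                  # H1: mu  < mu0 (1-sided)
```
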
