
EGR Introduction to Industrial & Systems Engineering


Complete your homework assignment only with the given methods.
Course Topics
Overview of Industrial and Systems Engineering.
Probability in a finite sample space.
Conditional probability and Bayes' theorem.
Discrete probability distributions.
Continuous probability distributions.
Probability for systems with 2 or more random events.
Statistical sampling and confidence intervals.
Hypothesis testing.

Statistical Independence

Two events A and B are statistically independent if

P(A|B) = P(A) & P(B|A) = P(B)

and it follows that

P(A ∩ B) = P(A)P(B)    (1)

Note that (1) is a result of the conditional probability formula

P(A | B) = P(A ∩ B) / P(B)
Bayes' Theorem

If Bi (i = 1, ..., k) are mutually exclusive and

B1 ∪ B2 ∪ ... ∪ Bk = S

then for any event A,

P(Bj | A) = P(Bj) P(A | Bj) / ∑_{i=1}^{k} P(Bi) P(A | Bi)
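A minimal Python sketch of Bayes' theorem for k = 3 mutually exclusive events covering S; the prior and likelihood values are made-up illustration numbers:

```python
# Bayes' theorem with k = 3 mutually exclusive events B1, B2, B3
# that cover the sample space S. All numbers are illustrative.
priors = [0.5, 0.3, 0.2]           # P(Bi); must sum to 1
likelihoods = [0.10, 0.40, 0.70]   # P(A | Bi)

total = sum(p * l for p, l in zip(priors, likelihoods))   # P(A)
posteriors = [p * l / total for p, l in zip(priors, likelihoods)]

print(posteriors)   # P(Bi | A); sums to 1 by construction
```
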
Properties of Probability Function

For any discrete RV X ∈ S,

0 ≤ P(x) ≤ 1

and

∑_{x=-∞}^{∞} P(x) = 1
Definition

The expected value, E(X), of a discrete RV X is given by

E(X) = ∑_{x=-∞}^{∞} x P(x)

Note that: expected value ≡ average ≡ mean.

Notation:

E(X) = μ = μ_X

Note the units of E(X) are the same as those of X.


Note

An alternative method for computing variance:

VAR(X) = E[(X - E(X))²]
       = E[X² - 2X·E(X) + E²(X)]
       = E(X²) - E[2X·E(X)] + E[E²(X)]

Since E(X) is a constant,

VAR(X) = E(X²) - 2E²(X) + E²(X)
       = E(X²) - E²(X)

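A quick numerical check of the identity VAR(X) = E(X²) - E²(X), using a fair six-sided die as the example distribution:

```python
# Check VAR(X) = E(X^2) - E^2(X) for a fair six-sided die.
xs = range(1, 7)
p = 1 / 6                                      # P(x) = 1/6 for each face

ex = sum(x * p for x in xs)                    # E(X)   = 3.5
ex2 = sum(x * x * p for x in xs)               # E(X^2) = 15.1666...

var_def = sum((x - ex) ** 2 * p for x in xs)   # E[(X - E(X))^2]
var_alt = ex2 - ex ** 2                        # E(X^2) - E^2(X)
print(var_def, var_alt)                        # both 2.9166...
```
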
Definition

Bernoulli trials are a set of n trials of a random


process where the outcome of each trial is
statistically independent of the n-1 other
trials.
Examples.
Roll a pair of dice n times.
Flip a coin n times.
A random process whose outcomes are Bernoulli
trials is said to satisfy the Bernoulli property.
Binomial Distribution

Let RV X satisfy the Bernoulli property and be defined by

X = {# of times event A occurs in n trials}

then

P(x) = n!/(x!(n-x)!) · p^x q^(n-x)    for 0 ≤ x ≤ n
     = 0                              elsewhere

where on any trial,

p = P(A)  &  q = P(Aᶜ) = 1 - p

Note that order does not matter.
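A sketch of the binomial probability function exactly as written above; the example numbers (10 coin flips, p = 0.5) are illustrative:

```python
from math import comb

# Binomial pmf: P(x) = C(n,x) p^x q^(n-x) for 0 <= x <= n.
def binom_pmf(x, n, p):
    if not 0 <= x <= n:
        return 0.0          # "0 elsewhere"
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(4, 10, 0.5))   # P(4 heads in 10 flips) ~ 0.2051
```
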
Aside

A combination, C(n,x), is the # of ways to select


x elements (without replacement) from a set of
n distinct elements where order does not
matter and is given by
C(n,x) = n! / (x!(n - x)!)

Example: # of ways to arrange 2 apples and 4 mangos:

C(6,2) = 6! / (2!(6 - 2)!) = 15
Negative Binomial Distribution

Let the random process satisfy the Bernoulli property and define RV X by

X = {trial # when event A occurs for the k-th time}

then

P(x) = (x-1)! / ((k-1)!(x-k)!) · p^k q^(x-k)    for 1 ≤ k ≤ x
     = 0                                        elsewhere

where on any trial,

p = P(A)  &  q = P(Aᶜ) = 1 - p

Note that order does not matter except for trial x.
Special Case

For k = 1, the negative binomial distribution is termed the
geometric distribution, where

P(x) = p q^(x-1)    for x ≥ 1
     = 0            elsewhere

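A sketch of the negative binomial probability function above, noting that (x-1)!/((k-1)!(x-k)!) is just C(x-1, k-1); setting k = 1 reproduces the geometric pmf p·q^(x-1). The numbers are illustrative:

```python
from math import comb

# P(x) = C(x-1, k-1) p^k q^(x-k): the k-th occurrence of A on trial x.
def neg_binom_pmf(x, k, p):
    if x < k or k < 1:
        return 0.0          # "0 elsewhere"
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

print(neg_binom_pmf(5, 2, 0.3))   # 2nd occurrence on trial 5: ~0.1235
print(neg_binom_pmf(5, 1, 0.3))   # geometric special case (k = 1)
print(0.3 * 0.7**4)               # p q^(x-1): matches the line above
```
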
Hypergeometric Distribution
Consider a discrete sample space S where
N = # of elements in S & k = # of events A in S
Randomly sample n elements without replacement
from S and define the RV
X = {# of times event A is selected}, then

P(x) = h(x; N, n, k) = C(k,x) C(N-k, n-x) / C(N,n)    for 0 ≤ x ≤ n
     = 0                                              elsewhere
Note that order does not matter.


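A sketch using scipy's hypergeometric distribution; note scipy's argument letters differ from the slide's (its population size, success count, and sample size correspond to the slide's N, k, and n), so the call below reflects that assumed mapping. The numbers are illustrative:

```python
from math import comb
from scipy.stats import hypergeom

N, k, n = 20, 7, 5   # slide notation: population, # of A's in S, sample size
x = 2                # # of times event A appears in the sample

# Direct evaluation of the slide's formula.
print(comb(k, x) * comb(N - k, n - x) / comb(N, n))   # ~0.3874

# scipy's order is pmf(x, population, successes, draws).
print(hypergeom.pmf(x, N, k, n))                      # same value
```
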
Binomial vs. Hypergeometric

Similarities:
- Sample n items, x of which are event A and the remaining n-x are
  event Aᶜ.
- Order does not matter.

Differences:
- Binomial: Bernoulli property satisfied (i.e., events
  statistically independent).
- Hypergeometric: Bernoulli property not satisfied (i.e., events
  not statistically independent).

Binomial vs. Hypergeometric

If the size of the sample space N is large and p ≈ k/N, then the
probability P(x) is such that

C(n,x) p^x q^(n-x)  ≈  C(k,x) C(N-k, n-x) / C(N,n)

(binomial distribution ≈ hypergeometric distribution)
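A small numerical check of this approximation, with an illustrative large population:

```python
from scipy.stats import binom, hypergeom

N, n = 10_000, 20    # large population, modest sample
k = 3_000            # # of A's in S, so p = k/N = 0.3
x = 6

print(binom.pmf(x, n, k / N))      # binomial:       ~0.19164
print(hypergeom.pmf(x, N, k, n))   # hypergeometric: nearly identical
```
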
General
Hypergeometric Distribution
Consider a sample space containing N discrete
elements. Each of these elements is classified as one
of J events A1, A2, ..., AJ where there are ki of each
event Ai and k1+k2+...+kJ=N.
Select n elements at random without replacement
and let the J RVs be defined as
Xi = {# of times event Ai is selected} i=1,...,J
then
P(x1, x2, ..., xJ) = C(k1,x1) C(k2,x2) ... C(kJ,xJ) / C(N,n)
Note that order does not matter & x1+x2+...+xJ=n
Uniform Distribution

Consider a sample space with k distinct elements denoted by the
RVs x1, x2, ..., xk. If all k outcomes xi are equally likely to
occur, then

P(xi) = 1/k    for i = 1, 2, ..., k
      = 0      elsewhere

Poisson Distribution

Let the discrete RV for a Poisson process be defined as

X = {# of events that occur during a specified time span t (or space)}

then

P(x) = (λt)^x e^(-λt) / x!    for x ≥ 0
     = 0                      for x < 0
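A sketch of the Poisson probability function using scipy, where the distribution's mean is λt; the rate and time span are illustrative:

```python
from scipy.stats import poisson

lam, t = 4.0, 0.5    # e.g., 4 events/hour observed for half an hour
mu = lam * t         # mean # of events in the span: lambda * t = 2

print(poisson.pmf(0, mu))   # P(no events)  = e^-2 ~ 0.1353
print(poisson.pmf(2, mu))   # P(two events)        ~ 0.2707
```
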
Helpful Hints
Some questions to help identify an
appropriate discrete probability distribution.
1. Does the outcome of the random process involve
one trial or a series of trials?
If a series, are they Bernoulli trials (i.e., statistically
independent)?
2. Does order matter?
For a given trial and/or between trials.
3. How many outcomes are possible on a given
trial?
Answers will not guarantee identification of correct probability distribution!
Discrete vs. Continuous
Discrete RV:    P(a ≤ x ≤ b) = ∑_{x=a}^{b} P(x)

Continuous RV:  P(a ≤ x ≤ b) = ∫_a^b f(x) dx

[Figures: a bar chart of P(x) for the discrete case and a shaded
area under f(x) between a and b for the continuous case.]

f(x) measures the density of probability between 2 values.

What if b = a?
Final Note

This is a very important concept.

The pdf, f(x), is a measure of the density of probability in a region.

The pdf, f(x), is not a probability function, hence

f(a) ≠ P(X = a)

Uniform Distribution

A uniform distribution for a continuous RV X has a pdf given by

f(x) = 1/(B - A)    for A ≤ x ≤ B
     = 0            for x < A or x > B

This formula is still valid if either of the strict inequalities
(< or >) is exchanged with its associated inequality (≤ or ≥).
We will see why this can be done later.

Uniform Distribution

If A ≤ a ≤ x ≤ b ≤ B, then

P(a ≤ x ≤ b) = ∫_a^b 1/(B - A) dx = (b - a)/(B - A)

[Figure: the pdf f(x) = 1/(B - A) with the area between a and b shaded.]

Note that

P(a ≤ x ≤ b) = P(a < x ≤ b) = P(a ≤ x < b) = P(a < x < b)
Uniform Distribution

The cumulative distribution function is given by

F(x) = 0                for x < A
     = (x - A)/(B - A)  for A ≤ x ≤ B
     = 1                for x > B

[Figure: F(x) rising linearly from 0 at A to 1 at B.]

The expected value & variance are given by

E(X) = (A + B)/2  &  VAR(X) = (B - A)²/12


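A check of these formulas with scipy, which parameterizes the continuous uniform with loc = A and scale = B - A; the interval [2, 10] is illustrative:

```python
from scipy.stats import uniform

A, B = 2.0, 10.0
X = uniform(loc=A, scale=B - A)

print(X.mean(), (A + B) / 2)             # 6.0 both ways
print(X.var(), (B - A) ** 2 / 12)        # 5.333... both ways
print(X.cdf(7.0), (7.0 - A) / (B - A))   # F(7) = 0.625 both ways
```
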
Exponential Distribution

An exponential distribution with Poisson parameter λ > 0 has a
continuous RV

T = {time or space between successive occurrences of a Poisson process}

and a pdf given by

f(t) = λ e^(-λt)    for t ≥ 0        (β = 1/λ in the book)
     = 0            for t < 0

This is the same Poisson process as in the Poisson distribution.
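A sketch with scipy, whose scale argument is the book's β = 1/λ; the rate is illustrative:

```python
from scipy.stats import expon

lam = 4.0                  # illustrative Poisson rate (events per hour)
T = expon(scale=1 / lam)   # scipy's scale = beta = 1/lambda

print(T.mean())     # E(T) = 1/lambda = 0.25
print(T.cdf(0.5))   # P(T <= 0.5) = 1 - e^-2 ~ 0.8647
```
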
Exponential Distribution vs. Poisson Distribution

Exponential:  f(t) = λ e^(-λt)            for t ≥ 0;  0 for t < 0
Poisson:      P(x) = (λt)^x e^(-λt) / x!  for x ≥ 0;  0 for x < 0

Similarities:
- The random process is a Poisson process.
- λ represents the average # of events per unit time or space.

Differences:
- Poisson: interested in a RV X that is discrete.
- Exponential: interested in a RV T that is continuous.

The random process is the same for both!
Normal Distribution

A normal RV is denoted by

X ~ N(μ, σ)

where

μ = mean  &  σ = standard deviation

and the pdf is given by

f(x) = 1/(σ√(2π)) · e^(-(x-μ)²/(2σ²))    for -∞ < x < ∞

Also known as the Gaussian distribution.
Normal Distribution

How to use the Z ~ N(0,1) table for X ~ N(μ, σ):

F(x) = ∫_{-∞}^{x} 1/(σ√(2π)) · e^(-(ξ-μ)²/(2σ²)) dξ

Substitute z = (ξ - μ)/σ, so dξ = σ·dz; then

F(x) = ∫_{-∞}^{(x-μ)/σ} 1/√(2π) · e^(-z²/2) dz

(Note: when ξ = x, the upper bound becomes (x - μ)/σ.)

F(x) = F_Z((x - μ)/σ)
Example

Portion of the F_Z(z) table for Z ~ N(0,1):

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857

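A check of the standardization F(x) = F_Z((x - μ)/σ) with scipy; μ and σ are illustrative, and the z = 2.0 case reproduces the 0.9772 entry in the table above:

```python
from scipy.stats import norm

mu, sigma = 100.0, 15.0
x = 130.0                                 # gives z = (x - mu)/sigma = 2.0

print(norm.cdf(x, loc=mu, scale=sigma))   # direct F(x)
print(norm.cdf((x - mu) / sigma))         # F_Z(z); same value, 0.9772
```
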
Special values for X ~ N(μ, σ)

1. P(μ-σ ≤ X ≤ μ+σ) = F_Z((μ+σ-μ)/σ) - F_Z((μ-σ-μ)/σ)
                    = F_Z(1) - F_Z(-1)
                    = 0.8413 - 0.1587
                    = 0.6826

Approximately two-thirds of the outcomes from a normally
distributed random process will be within one standard deviation
of the mean.
Special Values for X ~ N(μ, σ)

[Figure: the normal pdf with the following regions shaded.]

P(μ-σ ≤ X ≤ μ+σ) = 0.683
P(μ-2σ ≤ X ≤ μ+2σ) = 0.955
P(μ-3σ ≤ X ≤ μ+3σ) = 0.997
Weibull Distribution
A continuous RV X has a Weibull distribution if the pdf of X is

f(x) = aβ·x^(β-1) e^(-a·x^β)    for x ≥ 0
     = 0                        for x < 0

where the constant parameters a > 0 & β > 0.

[Figures: the Weibull probability density function and cumulative
distribution function for several parameter values.]

Note there are many physical systems where the Weibull
distribution fits the observed data very well.
Weibull Distribution

The cumulative distribution function for the Weibull distribution
is given by

F(x) = 1 - e^(-a·x^β)    for x ≥ 0
     = 0                 for x < 0


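A sketch comparing the slide's CDF with scipy's weibull_min, which uses F(x) = 1 - e^(-(x/s)^c); matching the two forms gives c = β and scale s = a^(-1/β), and that mapping is the assumption being checked here. The parameters are illustrative:

```python
from math import exp
from scipy.stats import weibull_min

a, beta = 2.0, 1.5
x = 0.8

print(1 - exp(-a * x**beta))                           # slide's F(x): ~0.761
print(weibull_min.cdf(x, beta, scale=a**(-1 / beta)))  # same value
```
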
Bivariate Distributions

Schematic showing joint, marginal,
and conditional densities

Definition

Let X and Y be discrete RVs.

The joint probability function of X and Y is given by

P(x,y) = P[(X = x) ∩ (Y = y)]

The marginal probability functions of X and Y, respectively, are given by

P(x) = ∑_{y=-∞}^{∞} P(x,y)

and

P(y) = ∑_{x=-∞}^{∞} P(x,y)
Calculating P(x,y)
Consider a discrete bivariate distribution
with RVs X and Y. Values for the joint
probability function can be calculated using
P(x,y) = P[(X = x) ∩ (Y = y)]
       = P(y|x) P(x)    (1)

P(y|x) is the conditional probability function.
P(x) is the marginal probability.
Note it's OK to exchange X and Y in (1).
Example

For this example, the structure for the joint probability function
table is

           Y       0        1     |  P(x)
  X                               |
  0             P(0,0)   P(0,1)   |  P(0)
  1             P(1,0)   P(1,1)   |  P(1)
  --------------------------------+------
  P(y)           P(0)     P(1)    |  1.0

Reading the table:
- The column headers are the RV Y & its numerical values; the row
  headers are the RV X & its numerical values.
- The interior entries are the values of the joint probability
  function.
- The right column holds the marginal probability function of X
  (row sums): P(x) = ∑_{y=-∞}^{∞} P(x,y)
- The bottom row holds the marginal probability function of Y
  (column sums): P(y) = ∑_{x=-∞}^{∞} P(x,y)
- A reminder: these are probabilities & must sum to 1.
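A numeric version of the table above; the four joint probabilities are made-up but sum to 1:

```python
import numpy as np

P = np.array([[0.10, 0.30],    # row x=0: P(0,0), P(0,1)
              [0.25, 0.35]])   # row x=1: P(1,0), P(1,1)

print(P.sum(axis=1))   # marginal P(x): row sums    -> [0.40 0.60]
print(P.sum(axis=0))   # marginal P(y): column sums -> [0.35 0.65]
print(P.sum())         # all entries sum to 1.0
```
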
Definition

Let X and Y be continuous RVs.

The joint probability density function, f(x,y), is such that

P[(a ≤ X ≤ b) ∩ (c ≤ Y ≤ d)] = ∫_a^b ∫_c^d f(x,y) dy dx

(Don't let the double integral scare you! It's not that bad.)

The marginal probability density functions are given by

f(x) = ∫_{-∞}^{∞} f(x,y) dy

and

f(y) = ∫_{-∞}^{∞} f(x,y) dx
Properties

The joint pdf is a pdf! Hence,

f(x,y) ≥ 0 for all (x,y) ∈ S

and

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x,y) dy dx = 1

Illustration of Covariance

Consider a random process that produces RVs X and Y. Collect
several samples from this process and make a scatter plot of the
data.

[Figures: three scatter plots of Y vs. X, for which the covariance
is positive & large, near zero, and negative & large, respectively.]
Definition

The covariance between RVs X and Y is given by

COV(X,Y) = E[(X - μ_X)(Y - μ_Y)]

The formula is valid for discrete RVs or continuous RVs.

Note an alternative formula for covariance is

COV(X,Y) = E(XY) - E(X)E(Y)

A Problem with Covariance

What is considered a large positive, or negative, value for
COV(X,Y)? It depends on the units of the RVs X and Y.

We can eliminate this problem by considering the correlation
coefficient, which is defined by

r = COV(X,Y) / √(VAR(X)·VAR(Y))

Note that -1 ≤ r ≤ 1. Also, r has the same sign as COV(X,Y).
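A simulation sketch: generate (X, Y) pairs with a built-in positive relationship and compute the sample covariance and correlation; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)   # Y rises with X, so r > 0

print(np.cov(x, y)[0, 1])        # COV(X,Y): positive & large (units of X*Y)
print(np.corrcoef(x, y)[0, 1])   # r: unitless, close to +0.9 here
```
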
What's It All About?

Consider a random process with RV X.

Probability:
- The type of distribution, μ, and σ are all known.
- Use this information to compute the probability of various outcomes X.

Statistics:
- The type of distribution, μ, and σ are all unknown.
- Sample the output X, then estimate μ and σ from the sampled data.
- Use this information to try and determine the type of distribution.
- This is called parametric estimation.
Central Limit Theorem
Consider N RVs X1, X2, ..., XN where

E(Xi) = μi  &  VAR(Xi) = σi²    for i = 1, ..., N

are known. Define a RV as

X̄ = (X1 + X2 + ... + XN)/N

(The CLT implies that X̄ is approximately a normal RV.)

Then let the standardized RV be

Z_N = [X̄ - (μ1 + ... + μN)/N] / [√(σ1² + ... + σN²)/N]

For N sufficiently large, Z_N ≈ Z ~ N(0,1).
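A simulation sketch of the CLT: standardized means of N uniform(0,1) samples behave like N(0,1) draws. For the uniform(0,1), μ = 0.5 and σ² = 1/12:

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 30, 100_000

xbar = rng.uniform(size=(reps, N)).mean(axis=1)   # reps sample means
z = (xbar - 0.5) / np.sqrt((1 / 12) / N)          # standardized RV Z_N

print(z.mean(), z.std())            # ~0 and ~1
print(np.mean(np.abs(z) <= 1.96))   # ~0.95, as for a N(0,1) RV
```
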
More About the CLT

The approximation Z_N ≈ Z ~ N(0,1) gets better as N gets larger;
it is only exact as N → ∞.

In general, a good rule of thumb is that the Central Limit Theorem
can be used if N > 30.

You can get accurate results using the CLT for some distributions
with a lower number of samples. For example, N ≥ 12 works well for
uniform distributions. Some distributions (they rarely occur in
the real world) require N > 50 to get accurate results.
Statistical Sampling
Consider a random process with RV X.
Randomly sample N outputs from the
process and label their values x1, x2, ..., xN.
The sample mean is given by

x̄ = (x1 + x2 + ... + xN)/N

The sample standard deviation is given by

s = √[((x1 - x̄)² + ... + (xN - x̄)²)/(N - 1)]

(We will show why we use N - 1 rather than N later. Don't confuse
x̄ and s with μ and σ.)
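Computing x̄ and s with numpy; note np.std divides by N by default, so ddof=1 is needed to get the N - 1 divisor above. The data are illustrative:

```python
import numpy as np

data = np.array([3.1, 2.7, 3.4, 2.9, 3.3])

print(data.mean())        # x-bar
print(data.std(ddof=1))   # s, with the N - 1 divisor
print(data.std())         # divides by N instead (not the slide's s)
```
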
Statistical Sampling
                     distribution parameters    values from a sample
                     for the actual population  of the actual population
mean                            μ                         x̄
standard deviation              σ                         s

Note that x̄ and s are estimates of μ and σ, respectively, based on
a sample of N outcomes. As N gets larger, these estimates get better.
Notes

(Once again, μ and σ are parameters for the actual random process.)

Some notes about the RV x̄:
- Since it is a sum of RVs, it is an RV too!
- It is sometimes called an estimator, or point estimator, of μ.
- The CLT implies that it has approximately a normal distribution
  with mean E(x̄) = μ and standard deviation σ_x̄ = σ/√N.
  (The estimate of μ gets better as N gets larger.)
- We don't know the distribution of X (μ and σ are unknown).
How Good an Estimate of μ?

Recall the RV x̄ is approximately N(μ, σ/√N).

[Figure: sampling densities of x̄ for increasing sample sizes; the
larger the sample size N, the narrower the density around μ, so x̄
is more likely to be closer to μ. Why this behavior?]
How Good an Estimate of μ?

Since x̄ is approximately N(μ, σ/√N), there is about a 95% chance
that it is within 2σ_x̄ of μ. But we don't know σ! So use the
estimate of σ from the sample (s), thus

σ_x̄ ≈ s/√N

If N is large, this is a good estimate. However, if N is small,
then we will get a more accurate result using the t-distribution
rather than the normal distribution. (We will consider this shortly.)


Back to the Example

Suppose we also randomly sampled 100 cars.

Sample size N = 10:                    Sample size N = 100:
x̄ = 3077 lbs                           x̄ = 3156 lbs
σ_x̄ ≈ s/√N = 565.6/√10 = 178.9 lbs     σ_x̄ ≈ s/√N = 483.2/√100 = 48.3 lbs

Thus, there is a 95% chance that 3077 lbs is within 357.8 lbs of
the average U.S. car's weight, μ:
P(2719.2 ≤ μ ≤ 3434.8) = 0.95

And a 95% chance that 3156 lbs is within 96.6 lbs of μ:
P(3059.4 ≤ μ ≤ 3252.6) = 0.95

(This is called a confidence interval; we will consider it soon.)


Confidence Intervals

What's a Confidence Interval?

Recall the sample mean x̄ has approximately a normal distribution.

Thus, we can calculate the likelihood that x̄ is within a specified
interval of the mean, μ, of the random process. This is called a
confidence level and its associated confidence interval.
Definition

A confidence level and its associated confidence interval are
defined by

P(-c ≤ x̄ - μ ≤ c) = 1 - a

where

[-c, c] is the confidence interval

and

(1 - a) is the confidence level.

In general, you get an accurate result if N > 40. (This is a
larger value than the CLT rule of thumb due to the increased
variability from using s instead of σ.)
General Result - Normal Distribution

Some common confidence levels & their associated confidence intervals:

Confidence Level (1 - a)    Confidence Interval [-c, c]
0.90                        [-1.65·s/√N, +1.65·s/√N]
0.95                        [-1.96·s/√N, +1.96·s/√N]
0.99                        [-2.58·s/√N, +2.58·s/√N]
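Applying the 0.95 row to the car-weight sample with N = 100 (x̄ = 3156 lbs, s = 483.2 lbs from the earlier example); the 1.96 multiplier gives a slightly tighter interval than the "within 2σ_x̄" approximation used there:

```python
from math import sqrt

xbar, s, N = 3156.0, 483.2, 100
half = 1.96 * s / sqrt(N)         # half-width of the 95% interval

print(xbar - half, xbar + half)   # ~[3061.3, 3250.7] lbs
```
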
Recall This?

If the sample size is small, then the normal distribution is not a
good model for the distribution of the sample mean x̄. A better
model is the t distribution (also called the Student's t
distribution).
t Distribution

Let X1, ..., XN be the sampled output from a random process and
define the sample mean and standard deviation as

x̄ = (x1 + ... + xN)/N  &  s = √[((x1 - x̄)² + ... + (xN - x̄)²)/(N - 1)]

then the RV given by

T = (x̄ - μ) / (s/√N)    (note it uses the sample std. deviation)

has a t distribution with N - 1 degrees of freedom.
How to read t-values
It is customary to let t_a represent the t-value above which we
find an area equal to a. Thus, the t-value with 10 degrees of
freedom leaving an area of 0.025 to the right (t_0.025) is
t = 2.228.

Find:
a. t_0.025 when v = 14
b. -t_0.10 when v = 10
c. t_0.995 when v = 7
Confidence Levels - t Distributions

The confidence interval [-c, c] associated with the confidence
level (1 - a) for the t distribution is given by

|c| = t_{a,k} · s / √N

- t_{a,k} is termed the percentage point of the t distribution.
- The value of t_{a,k} is obtained from a table (see handout).
- Note that tables commonly present single-tailed t_{a,k} values.
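A sketch for the small car-weight sample (N = 10, x̄ = 3077 lbs, s = 565.6 lbs), assuming the two-sided 95% interval uses the upper a/2 = 0.025 percentage point with N - 1 degrees of freedom:

```python
from math import sqrt
from scipy.stats import t

xbar, s, N = 3077.0, 565.6, 10
c = t.ppf(0.975, N - 1) * s / sqrt(N)   # t_{0.025, 9} ~ 2.262

print(t.ppf(0.975, 10))     # 2.228: matches the v = 10 reading above
print(xbar - c, xbar + c)   # wider than the normal-based interval
```
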
Hypothesis Testing

Format of the Hypothesis

Determine the probability that competing hypotheses H0 and H1 are
true, where

H0: parameter = A
H1: parameter ≠ A

- Hypothesis H0 is called the null hypothesis.
- Hypothesis H1 is called the alternative hypothesis.
- Note that H1: parameter ≠ A is termed a two-sided alternative
  hypothesis.
Format of the Hypothesis

Note the alternative hypothesis may be of the


form
H1: parameter < A
or
H1: parameter > A

Each is called a one-sided alternative hypothesis.

Testing the Hypotheses

Hypotheses are tested by estimating the value of the parameter
using sampling.

- The sampled estimate's value will rarely be exactly equal to A.
- Hence, the null hypothesis (H0: parameter = A) is considered
  true if the estimate is close to A.
- "Close" is defined by the critical values, AL and AH.
- The acceptance region is given by AL ≤ estimate ≤ AH.
- The critical region (reject H0) is given by estimate < AL or
  AH < estimate.
Testing the Hypotheses

Rejecting the null hypothesis H0 when it is


true is called a type I error.
That is, the estimate implies H0 is false when it is
actually true.

Failing to reject the null hypothesis H0 when


it is false is called a type II error.
That is, the estimate implies H0 is true when it is
actually false.

Hypothesis Testing Errors

The probability of a type I error is denoted by a:

P(reject H0 when H0 true) = a

Called the a error or significance level.

The probability of a type II error is denoted by β:

P(fail to reject H0 when H0 false) = β

Called the β error.

Note that a ≠ 1 - β, since there are 4 possible outcomes.
Strength of Testing Decisions

The power of a statistical hypothesis test is the probability of
rejecting the null hypothesis H0 when the alternative hypothesis
H1 is true. Hence,

power = 1 - β

Note the power represents the probability of correctly rejecting a
false null hypothesis, H0.

Power is a measure of the sensitivity of a hypothesis test.
Minimum Significance Level

The P-value is the smallest significance level that will result in
rejection of the null hypothesis H0 using the given sample data.
(Recall the significance level is a.)

If the test statistic's sample value is z0 ~ N(0,1) and the null
hypothesis is H0: μ = μ0, then

P = 2[1 - F_Z(|z0|)]    for H1: μ ≠ μ0 (2-sided)
P = 1 - F_Z(z0)         for H1: μ > μ0 (1-sided)
P = F_Z(z0)             for H1: μ < μ0 (1-sided)
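A sketch of the three cases with scipy; the sample value z0 = 2.07 is illustrative:

```python
from scipy.stats import norm

z0 = 2.07

print(2 * (1 - norm.cdf(abs(z0))))   # H1: mu != mu0 (2-sided)
print(1 - norm.cdf(z0))              # H1: mu  > mu0 (1-sided)
print(norm.cdf(z0))                  # H1: mu  < mu0 (1-sided)
```
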
