
Introductory Statistics

By Peter Woolf (pwoolf@umich.edu)
University of Michigan

Michigan Chemical Process Dynamics and Controls Open Textbook
version 1.0

Creative Commons
“A foolish consistency is the hobgoblin of little
minds” R. W. Emerson

But is this always true??


Consistency
• Why might we want consistency?
– Integration of products within a larger
system
– Examples: want parts to fit together, want
consistent chemical feeds, want consistent
material properties, want consistent energy
content, want consistent flavor
Consistency
• What can be the downsides of
consistency?
– You can make something that is consistent, but consistently bad.
– Sometimes people sacrifice quality to gain consistency--this is not the goal.
– Example: fast food vs. homemade food (which depends on the cook)
Measures of “Quality” (or lack thereof):
Six Sigma: “Number of defects per million opportunities”
Genichi Taguchi: “Uniformity around a target value” or “The loss a product imposes on society after it is shipped”

Process control is a central tool for reducing variability by adjusting and correcting for variations.

Key Questions: How can we know if our control system is working well enough? How can we measure variability?
Process Specific Questions
1) Do recent data indicate that the
process is broken or changed?
2) Is the process “out of control”?
3) What are the odds that two samples
come from the same distribution?
4) What factors influence this outcome?
Detecting if a process has changed
Scenario: You are a small acai juice vendor trying to expand to a world market with a consistent product.
Acai juice production
[Figure: acai berries from the market go into the berry crusher, which produces juice.]

A key selling point of your acai juice is that it contains a large concentration of antioxidants.
With your berry crusher you get a good quality product most of the time, but not always. You don’t want to waste berries if your crusher is hurting your product, but how can you know if it is not working right?
How can you test this?
1) Gather many samples from your current process and measure the antioxidant concentration.

N sample values: 40.1, 41.3, 44.3, 39.3, 38.6, …
How do we summarize this?

Average:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Deviation from the average (standard deviation):
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
Interpretation: The typical (root-mean-square) distance from the mean, OR the width of the dispersion around the mean.
Problem: What if I have only one sample (i.e., N = 1)?
σ = 0!!
Does this mean that the underlying process has no variation, or that I have not sampled it sufficiently?
Result: When N is small, the standard deviation will underestimate the true variation.
Solution: sample standard deviation
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \mu)^2}$$
Population standard deviation (“Real deviation”):
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

Sample standard deviation (“Observed deviation”):
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \mu)^2}$$
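As a minimal sketch of the two definitions, the snippet below applies both formulas to the five readings quoted earlier in the slides. The full data set is not shown in the slides, so these five values are only a stand-in, and the variable names are illustrative.

```python
import math
import statistics

# First five antioxidant readings quoted on the slide (the full data set is not shown).
samples = [40.1, 41.3, 44.3, 39.3, 38.6]

n = len(samples)
mean = sum(samples) / n

# Population form: divide the summed squared deviations by N.
sum_sq = sum((x - mean) ** 2 for x in samples)
sigma_population = math.sqrt(sum_sq / n)

# Sample form: divide by N - 1 instead.
s_sample = math.sqrt(sum_sq / (n - 1))

print(f"mean = {mean:.2f}")
print(f"population std (divide by N)   = {sigma_population:.3f}")
print(f"sample std     (divide by N-1) = {s_sample:.3f}")

# Cross-check against the standard library.
assert math.isclose(sigma_population, statistics.pstdev(samples))
assert math.isclose(s_sample, statistics.stdev(samples))
```

For any data set with more than one distinct value, the N-1 form gives the larger number, and the gap shrinks as N grows.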

With a measure of the mean and standard deviation, you have enough information to define a Gaussian distribution
→ Bell curve shape based on a model of a large number of random, uncorrelated changes
Gaussian or Normal Distribution:
From previous lecture on Noise:
Approximate Gaussian distribution in Excel by:
=RAND()+RAND()+RAND()-RAND()-RAND()-RAND()
The approximation gets better as more add/subtract pairs of RAND() are included.
The Gaussian distribution is the basis of much of statistical quality control, six sigma, and quality engineering in general.
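The Excel formula can be mirrored in Python as a rough sketch. The function name `approx_gaussian`, the seed, and the number of draws are arbitrary choices made here, not part of the lecture.

```python
import random
import statistics

def approx_gaussian(pairs=3, rng=random):
    """Add `pairs` uniform draws and subtract `pairs` more, like the Excel formula."""
    added = sum(rng.random() for _ in range(pairs))
    subtracted = sum(rng.random() for _ in range(pairs))
    return added - subtracted

rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
draws = [approx_gaussian(3, rng) for _ in range(50_000)]

# Each uniform(0, 1) term has variance 1/12, so six terms give variance 6/12 = 0.5.
print(f"mean     = {statistics.fmean(draws):.3f}  (theory: 0)")
print(f"variance = {statistics.pvariance(draws):.3f}  (theory: 0.5)")
```

Note that the result is centered on zero with a variance of about 0.5, so it would need to be rescaled and shifted to imitate a process with a particular mean and standard deviation.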
How do we mathematically define a normal distribution?
[Figure: bell curve with −3σ, −2σ, +2σ, and +3σ marked; the ±3σ window is 6σ wide.]

The mean and standard deviation are sufficient statistics, meaning that they are sufficient to describe a normal distribution.

Mathematically, we can describe a normal distribution by the following probability density function:

$$\mathrm{PDF}(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$

If we want to find the probability of a value up to some point, say z or less, we can just integrate:

$$\int_{-\infty}^{z} \mathrm{PDF}(x \mid \mu, \sigma)\,dx = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z-\mu}{\sigma\sqrt{2}}\right)\right]$$


(Note: this just turns one hard problem into another, in that now we have to calculate the error function.)
The error function is defined as:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} \exp(-t^2)\,dt$$

How can we calculate this?

Excel:
The error function is Erf(), thus the solution above could be expressed as
=1/2*(1+erf((z-m)/(s*sqrt(2))))
Mathematica:
General numerical integration:
NIntegrate[f[x], {x, start, end}]
Or, using the analytical solution with the error function:
N[1/2*(1+Erf[(z-m)/(s*Sqrt[2])])]
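For readers without Excel or Mathematica, the same closed-form expression can be sketched in Python with the standard library's `math.erf`. The function name `normal_cdf` is a label chosen here for illustration.

```python
import math

def normal_cdf(z, mean, std):
    """Probability that a normal(mean, std) variable is z or less."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))

# Sanity checks that follow from the symmetry of the bell curve.
half = normal_cdf(0.0, 0.0, 1.0)                               # area below the mean
both = normal_cdf(1.0, 0.0, 1.0) + normal_cdf(-1.0, 0.0, 1.0)  # mirrored areas add to 1
print(half, both)
```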
Acai juice problem revisited
From 100 samples of the current process we calculate the following:
Mean = 40 units
Standard deviation = 2 units

From these data, what are the odds that the next batch will have an antioxidant value of 37.5 or less?

$$\int_{-\infty}^{37.5} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) dx = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{37.5-\mu}{\sigma\sqrt{2}}\right)\right]$$

with μ = 40 and σ = 2.

In Mathematica: [screenshot of the input, including a shorthand notation, not preserved]

Answer: ~10.6% of the time we expect this situation (roughly 1 batch in 10)
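The same number can be checked with Python's `statistics.NormalDist`, used here as a stand-in for the Mathematica shorthand. The variable names are illustrative.

```python
from statistics import NormalDist

process = NormalDist(mu=40.0, sigma=2.0)  # mean and standard deviation from the 100 samples
p_low = process.cdf(37.5)                 # P(antioxidant value <= 37.5)

print(f"P(value <= 37.5) = {p_low:.4f}")
print(f"roughly 1 batch in {1 / p_low:.1f}")
```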


Example 1:
Say that we have a reactor with a temperature mean of 100 and standard deviation of 5 degrees. Calculate the probability of measuring a temperature of 92 or less.

$$\int_{-\infty}^{92} \mathrm{PDF}(x \mid 100, 5)\,dx = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{92-100}{5\sqrt{2}}\right)\right] = \frac{1}{2}\left[1 + \operatorname{erf}(-1.13)\right] \approx 0.055$$

What about 100 or less? -> 0.5

Example 2:
Given this same system, what is the probability that the reactor is within a 4 sigma window around the mean (i.e., ±2 sigma, or ±10 degrees)?

$$\int_{-\infty}^{110} \mathrm{PDF}(x \mid 100, 5)\,dx - \int_{-\infty}^{90} \mathrm{PDF}(x \mid 100, 5)\,dx = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{110-100}{5\sqrt{2}}\right)\right] - \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{90-100}{5\sqrt{2}}\right)\right] = 0.9545$$
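As a cross-check on Example 2, the sketch below integrates the density numerically with a plain trapezoid rule and compares the result with the error-function formula. The trapezoid helper and the step count are choices made for this illustration, not the lecture's method.

```python
import math

def normal_pdf(x, mean, std):
    """Normal probability density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def trapezoid(f, a, b, steps=10_000):
    """Plain trapezoid rule for the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
    return total * h

mean, std = 100.0, 5.0  # reactor temperature from the examples

# Example 2: area between 90 and 110, integrated directly.
numeric = trapezoid(lambda x: normal_pdf(x, mean, std), 90.0, 110.0)

def cdf(z):
    """Closed-form cumulative probability using the error function."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))

analytic = cdf(110.0) - cdf(90.0)
print(f"numeric  = {numeric:.4f}")
print(f"analytic = {analytic:.4f}")
```

Both routes should agree to several decimal places; the numerical route is the one that still works when no closed form is available.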
Acai juice production as a function of time
[Figure: antioxidant value plotted against time]

Is this process “out of control”?
Key question: How do we define “unusual”?
Yes: “It is unusual to see so many batches with such a high value--this is strange and suggests something has changed.”
No: “This is just normal variation--nothing is fundamentally different.”
One definition: Variation outside of the six sigma window (±3σ around the mean) is unusual.
[Figure: bell curve with −3σ, −2σ, +2σ, and +3σ marked; the ±3σ window is 6σ wide.]
What are the odds of finding something that falls out of this bound by chance? Find by integration!
For both tails the probability is ~0.0027, or 1 in 370.

Common confusion: The “Six Sigma” process defines unusual as 3.4 defects out of 1 million, not as falling outside this 6 sigma window. For a centered, two-sided normal window, 3.4 per million corresponds to about ±4.65 standard deviations (a window roughly 9.3 sigma wide).
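The "find by integration" step can be sketched with `statistics.NormalDist`. The second calculation, which asks how wide a centered two-sided window must be to leave 3.4 per million outside, is a comparison added here rather than a figure taken from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def two_sided_tail(k):
    """Probability of landing more than k standard deviations from the mean."""
    return 2.0 * std_normal.cdf(-k)

p3 = two_sided_tail(3.0)
print(f"outside +/-3 sigma: p = {p3:.4f}, about 1 in {1 / p3:.0f}")

# Half-width k whose two-sided tail equals 3.4 per million (assumes a centered
# process; this comparison is an addition, not a number from the slides).
target = 3.4e-6
k = -std_normal.inv_cdf(target / 2.0)
print(f"3.4 per million two-sided: about +/-{k:.2f} sigma")
```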
Is this process “out of control”?
Translation: if we assume that variation outside of the 6 sigma window is “unusual”: Is this pattern expected to happen in fewer than 1 in 370 of our samples?
Solution: Control charts!
Control charts determine if a process is behaving in an unusual way.
[Figure: control chart, image from Wikipedia (Western Electric rules). UCL = upper control limit, X-bar = average, LCL = lower control limit.]

What are the odds?
Rule 1: If each dot is a single measurement and the UCL is at +3 sigma, then for both tails the probability is ~0.0027, or 1 in 370.
What are the odds? Rule 2: Can do using probability theory.
Assuming each sample is independent, we can find the total probability that at least two out of three consecutive points fall beyond 2 sigma on the same side of the mean:
2*[P1(out+)P2(out+)P3(out+) + P1(out+)P2(out+)P3(in) + P1(out+)P2(in)P3(out+) + P1(in)P2(out+)P3(out+)] = 0.00305
or 1 in 326
Here P(out+) is the probability of a point falling above the +2 sigma line (the upper-tail area, about 0.0228), P(in) = 1 - P(out+), and the factor of 2 accounts for the lower side.
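The Rule 2 sum can be evaluated term by term, as in the sketch below. It assumes Rule 2 means "two out of three consecutive points beyond 2 sigma on the same side", and the result may differ from the slide's 0.00305 in the last digit depending on how P(out+) is rounded.

```python
from statistics import NormalDist

p_out = 1.0 - NormalDist().cdf(2.0)  # one point above the +2 sigma line
p_in = 1.0 - p_out                   # anything else

# The four ways to get at least two of three points above +2 sigma.
one_side = (
    p_out * p_out * p_out
    + p_out * p_out * p_in
    + p_out * p_in * p_out
    + p_in * p_out * p_out
)
p_rule2 = 2.0 * one_side  # the lower side is equally likely

print(f"P(out+)   = {p_out:.5f}")
print(f"P(rule 2) = {p_rule2:.5f}, about 1 in {1 / p_rule2:.0f}")
```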
What are the odds? Alternative solution by sampling
Approach: Generate thousands of samples and test to see how many satisfy the rule.
Rule 1: the sampled odds are similar to 1 in 370.
Rule 2: the sampled odds are similar to 1 in 326.
(see Mathematica code on website under Lecture 21.nb)
Message: Many complex decision processes can be evaluated numerically with good accuracy.
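The sampling approach can be sketched in Python as well. This is not the Mathematica notebook referenced above; the rule definitions, function names, seed, and trial count are assumptions made for this illustration.

```python
import random

def rule1(points):
    """Rule 1: a single point beyond 3 sigma on either side."""
    return abs(points[0]) > 3.0

def rule2(points):
    """Rule 2: at least two of three consecutive points beyond 2 sigma, same side."""
    above = sum(1 for x in points if x > 2.0)
    below = sum(1 for x in points if x < -2.0)
    return above >= 2 or below >= 2

def estimate(rule, window, trials, rng):
    """Fraction of in-control windows (standard normal draws) that trip the rule."""
    hits = 0
    for _ in range(trials):
        points = [rng.gauss(0.0, 1.0) for _ in range(window)]
        if rule(points):
            hits += 1
    return hits / trials

rng = random.Random(21)  # arbitrary seed so the run is repeatable
trials = 200_000
p1 = estimate(rule1, 1, trials, rng)
p2 = estimate(rule2, 3, trials, rng)
print(f"rule 1: {p1:.5f}  (analytic: about 0.0027)")
print(f"rule 2: {p2:.5f}  (analytic: about 0.0031)")
```

Because these events are rare, the estimates wobble noticeably from run to run unless the number of trials is large; a few hundred hits per rule is a reasonable minimum.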
In all cases these represent somewhat “rare” cases in a statistical sense, but they are not all equally rare.
Rule 1: odds 1 in 370
Rule 2: odds 1 in 326
Rule 3: odds 1 in 180
Rule 4: odds 1 in 256

Statistically rare patterns are not limited to these rules, though.
e.g. What are the odds of finding 15 consecutive samples in zone C?
= Odds 1 in 306
Thus is this system “out of control”?
Yes, but in a good way.
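The remaining odds can be reproduced with a short binomial calculation. The sketch assumes the usual Western Electric definitions (four of five points beyond 1 sigma on one side, nine in a row on one side of the center line), which the slide images show but the text does not spell out, and takes zone C to be within ±1 sigma of the mean.

```python
from math import comb
from statistics import NormalDist

norm = NormalDist()

def at_least_k_of_n(p, k, n):
    """Binomial probability of k or more successes in n independent tries."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))

p_beyond_1 = 1.0 - norm.cdf(1.0)            # one point above +1 sigma
p_zone_c = norm.cdf(1.0) - norm.cdf(-1.0)   # one point within +/-1 sigma (zone C)

odds = {
    "4 of 5 beyond 1 sigma, same side": 2.0 * at_least_k_of_n(p_beyond_1, 4, 5),
    "9 in a row on the same side": 2.0 * 0.5**9,
    "15 in a row inside zone C": p_zone_c**15,
}
for name, p in odds.items():
    print(f"{name}: about 1 in {1 / p:.0f}")
```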
Acai juice problem revisited
What if you know that each batch of
berries has some variation, but you
are unsure if the machine is
behaving strangely? Can you still
use your control charts?
Solution: Take samples from each
batch, average them and plot
these average values and
statistics on a control chart.
Day 1: 40.36, 39.36, 38.43, 39.67
Day 2: 39.96, 40.32, 39.88, 39.75

Problem: Averaging several samples changes your odds, because averaging reduces the variation.
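How much averaging shrinks the spread can be seen in a small simulation. The process mean and standard deviation below are assumed values borrowed from the earlier acai example, and the seed and number of simulated days are arbitrary.

```python
import random
import statistics

rng = random.Random(4)   # arbitrary seed for repeatability
mean, std = 40.0, 2.0    # assumed process values, borrowed from the earlier acai example
n_per_day = 4            # four samples per day, as in the Day 1 / Day 2 data
days = 20_000

# Average four simulated readings per day, many times over.
daily_means = [
    statistics.fmean([rng.gauss(mean, std) for _ in range(n_per_day)])
    for _ in range(days)
]

spread_of_means = statistics.pstdev(daily_means)
print(f"std of single samples (assumed) = {std:.2f}")
print(f"std of 4-sample averages        = {spread_of_means:.2f}")
print(f"theory: std / sqrt(n)           = {std / n_per_day ** 0.5:.2f}")
```

With four samples per average the spread is roughly halved, which is why control limits for averaged data sit closer to the center line than limits for single measurements.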
Acai process control using X-bar charts
Raw Data: [Figure: plot of the raw data] Data in Excel example online: Lecture.21.xls
Plotting the raw data, it is hard to say if anything is going on.
[Figure: X-bar control chart] To get something like this, we need the UCL and LCL.

UCL = grand avg + A3*(avg stdev)
    = 39.86 + 1.628*0.55
    = 40.76
(A3 = 1.628 is the tabulated constant for subgroups of 4 samples.)
Note: If you use A2, you use the average R (range). The result is 40.77--nearly the same.
LCL = grand avg - A3*(avg stdev)
    = 38.96

UCL represents 3 standard deviations away from the mean, so the line between zones A/B is 2 standard deviations away:
A/B line = grand avg + A3*(avg stdev)*(2/3) = 40.46
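The limit calculations can be collected into one small helper, sketched below with the numbers quoted on the slide. The function name and the lower A/B line (the mirror image of the upper one) are additions made for illustration.

```python
def xbar_limits(grand_avg, avg_stdev, a3):
    """Center line, 3-sigma control limits, and 2-sigma (A/B) lines for an X-bar chart."""
    three_sigma = a3 * avg_stdev
    return {
        "UCL": grand_avg + three_sigma,
        "A/B upper": grand_avg + three_sigma * 2.0 / 3.0,
        "center": grand_avg,
        "A/B lower": grand_avg - three_sigma * 2.0 / 3.0,
        "LCL": grand_avg - three_sigma,
    }

# Values quoted on the slide; A3 = 1.628 is the tabulated constant for subgroups of 4.
limits = xbar_limits(grand_avg=39.86, avg_stdev=0.55, a3=1.628)
for name, value in limits.items():
    print(f"{name:>9}: {value:.2f}")
```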
X-bar chart
Is it “in control”?
Rule 1: okay, no points outside of zone A
Rule 2: fail: points 9 and 10 are in zone A
Rules 3 and 4: okay
Conclusion: Not in statistical control.
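Checks like these can be automated. The sketch below tests Rules 1 and 2 only, using the center line and sigma implied by the slide's numbers. The subgroup averages in `xbar` are hypothetical values invented for illustration, since the real ones live in the Excel file and are not reproduced in the slides.

```python
def check_rule1(points, center, sigma):
    """Indices of points more than 3 sigma from the center line."""
    return [i for i, x in enumerate(points) if abs(x - center) > 3.0 * sigma]

def check_rule2(points, center, sigma):
    """Start indices of 3-point windows with >= 2 points beyond 2 sigma on one side."""
    flagged = []
    for i in range(len(points) - 2):
        window = points[i:i + 3]
        above = sum(1 for x in window if x > center + 2.0 * sigma)
        below = sum(1 for x in window if x < center - 2.0 * sigma)
        if above >= 2 or below >= 2:
            flagged.append(i)
    return flagged

center = 39.86              # grand average from the slide
sigma = 1.628 * 0.55 / 3.0  # one sigma of the subgroup means (UCL is 3 sigma out)

# HYPOTHETICAL subgroup averages, invented for illustration only; the real
# values are in the Excel file and are not reproduced in the slides.
xbar = [39.9, 39.7, 40.1, 39.8, 40.0, 39.6, 39.9, 40.2, 40.6, 40.5, 39.9]

print("rule 1 violations at:", check_rule1(xbar, center, sigma))
print("rule 2 windows starting at:", check_rule2(xbar, center, sigma))
```

Indices are zero-based, so a window starting at index 7 covers the 8th, 9th, and 10th points.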
Take Home Messages
• Statistical process control is a method
for systematically identifying
inconsistencies.
• Probabilities are often based on a
Gaussian process
• Control charts provide a systematic
method for evaluating if a process is
under control.
“A foolish consistency is the hobgoblin of little minds”
--R. W. Emerson

“An intelligent consistency is a virtue in an integrated global economy”
--Anonymous
