Basic Statistics
Tech-Pro Consultants
Objectives
- Review & Enhance The Basic Statistical & Quality Terms Needed For Six Sigma Process Improvement
- Begin To Enhance Minitab Operating Skills
Politician's Promise: "If elected, I'd make certain that everybody gets an above-average income."
What is Statistics?
Statistics is:
- The science that develops methods to effectively derive information from numerical data
- A collection of scientific methods for collecting, organizing and interpreting data, usually with the goal of inferring certain properties of the population from a representative sample of the population
- The science of collecting and classifying a group of facts according to their relative number and determining certain values that represent characteristics of the group
"There are three kinds of lies: lies, damned lies, and statistics." (Mark Twain)
Basic Statistics
- Types of data
- Measures of the Center of the data
  - Mean
  - Median
  - Mode
- Measures of the Spread of Data
  - Range
  - Variance
  - Standard Deviation
- Normal Distribution and Normal Probabilities
- Process Stability and Process Capability
Ask a statistician for her phone number... and get an estimate with 95% confidence
Used With Permission AlliedSignal 1995 - Dr. Steve Zinkgraf
What sorts of data do you see being collected around your area?
(List them below)
___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________ ___________________________________________________
Types of Data (3 families)

ATTRIBUTE DATA (Count Data)
(#1) Number of Items in a Category (Count-Based Proportions)
- Heads / Tails (i.e., counting # of Heads and # of Tails)
- Yes / No (Order Form Filled Out Accurately or Not)
- Pass / Fail; Good / Bad (Accurate Billing / Overcharged)
(#2) Counts of Discrete Event Occurrences
- # of Scratches on a Car Hood
- # of Errors on a Form
- # of Insulation Breaks in a Spool of Wire
- # of times customer hangs up before receiving response
Just ask yourself, "Am I counting things here?" If yes, you have attributes data.

VARIABLE DATA (Continuous Measurement Scale)
(#3) Continuous Data
- Decimal subdivisions are meaningful
- Ex: Time to answer the telephone (exact # of secs. per call)
3 Families of Data:

                                      Sample#1  Sample#2  Sample#3  Sample#4   Distribution
ATTRIBUTES DATA
  TYPE-I: Any Bubbles?                Reject    Reject    Accept    Reject     Binomial Distribution
  (accept / reject the entire item)
  TYPE-II: Number of Bubbles?         3         2         0         4          Poisson Distribution
VARIABLES DATA
  Glass Weight                        12.2      12.4      11.9      12.1       Normal Distribution or Other
3 Families of Data:

                                      Form#1    Form#2    Form#3    Form#4     Distribution
ATTRIBUTES DATA
  TYPE-I: Any Errors?                 Reject    Reject    Accept    Reject     Binomial Distribution
  (accept / reject the entire item)
  TYPE-II                                                                      Poisson Distribution
VARIABLES DATA                        36.1 hrs  24.6 hrs  21.0 hrs  29.2 hrs   Normal Distribution or Other
[Figure: blank attribute control chart form. The chart area shows UCL and LCL lines with samples taken at 8:00am, 9:00am, 10:00am (or 8:30am, 8:40am, etc.). Header fields: DATE CONTROL LIMITS CALCULATED, Average Sample Size, Frequency. Data rows: Sample (n), Number (np, c), Proportion (p, u), Date (Shift, Time, etc.)]
ANY CHANGE IN PEOPLE, EQUIPMENT, MATERIALS, METHODS, ENVIRONMENT, OR MEASUREMENT SYSTEMS SHOULD BE NOTED. THESE NOTES WILL HELP YOU TO TAKE CORRECTIVE OR PROCESS
She tells you are just Average: never mind, she is just ...
What is the largest probability possible? _______ What does this mean?
What is the smallest probability possible? _______ What does this mean?
What does a probability of 0.50 mean? _______________
What is the probability you will be struck by lightning during your lifetime? _____________________
What are your chances of appearing on The Tonight Show? ___________________
What is the probability of being killed by terrorists overseas? ____________________
What are your chances of being killed by an American in Baltimore? _______________
Instructor Page
What is the largest probability possible? 1.0 = 100%. What does this mean?
What is the smallest probability possible? 0.0 = 0%. What does this mean?
What does a probability of 0.50 mean? 50%. Just flip a coin.
What is the probability you will be struck by lightning during your lifetime? 0.000001667 = 1/600,000
What are your chances of appearing on The Tonight Show? 0.00000204 = 1/490,000
What is the probability of being killed by terrorists overseas? 0.000001538 = 1/650,000
What are your chances of being killed by an American in Baltimore? 0.00025 = 1/4,000
Roll a fair die once, what is Prob(a six)? ______
Roll a fair die twice, what is Prob(a six on the second roll)? ______
Roll two fair dice, what is Prob(get two sixes)? ______
What do you think of the recent headline, "Education research shows 49.5% of all American high school students fall below the national average!"
Probability
Suppose a certain customer permits only those combinations which yield 3, 4, 5, . . . , or 11.
Used With Permission 6 Sigma Academy Inc. 1995
Total of two dice (rows: Die 1, columns: Die 2)

         1    2    3    4    5    6
    1    2    3    4    5    6    7
    2    3    4    5    6    7    8
    3    4    5    6    7    8    9
    4    5    6    7    8    9   10
    5    6    7    8    9   10   11
    6    7    8    9   10   11   12

Ways to form a 2 = 1 in 36
Ways to form a 12 = 1 in 36
Probability of Defect = 2 in 36
Probability of each combination (rows: Die 1, columns: Die 2)

         1      2      3      4      5      6
    1  .0278  .0278  .0278  .0278  .0278  .0278
    2  .0278  .0278  .0278  .0278  .0278  .0278
    3  .0278  .0278  .0278  .0278  .0278  .0278
    4  .0278  .0278  .0278  .0278  .0278  .0278
    5  .0278  .0278  .0278  .0278  .0278  .0278
    6  .0278  .0278  .0278  .0278  .0278  .0278

Example: ways to form a total of 5
    Die 1          1      2      3      4
    Die 2          4      3      2      1
    Probability  .0278  .0278  .0278  .0278    Total = .1111
Probability of any given value on Die 1 = 1/6 = .1667
Probability of any given value on Die 2 = 1/6 = .1667
Probability of any given combination = 1/6 x 1/6 = 1/36 = .0278
[Figure: histogram of Total of Dice Values (2 to 12). The LSL and USL bracket the totals 3 through 11; 2.8% of outcomes fall below the LSL and 2.8% fall above the USL.]
Zone of Customer Satisfaction: 94.4%. Hence, the probability of Customer Satisfaction is 94.4%.
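The dice arithmetic on these slides can be checked by enumerating all 36 combinations. The following is a minimal sketch, not part of the original deck; the helper name `prob` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (Die 1, Die 2) combinations.
totals = [d1 + d2 for d1, d2 in product(range(1, 7), repeat=2)]

def prob(predicate):
    """Exact probability that the total of two fair dice satisfies predicate."""
    hits = sum(1 for t in totals if predicate(t))
    return Fraction(hits, len(totals))

p_defect = prob(lambda t: t in (2, 12))      # totals the customer rejects
p_satisfied = prob(lambda t: 3 <= t <= 11)   # totals the customer permits

print(f"P(defect)    = {p_defect} = {float(p_defect):.3f}")
print(f"P(satisfied) = {p_satisfied} = {float(p_satisfied):.3f}")
```

Using exact fractions avoids the small rounding drift that appears when .0278 is added up cell by cell.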
Statistical Distributions
We can describe the behavior of any process or system by plotting multiple data points for the same variable:
- Over time
- Across products or business
- By different people, machines, etc...
The accumulation of these data can be viewed as a distribution of values, represented by:
- Dot plots
- Histograms
- Normal curve or other smoothed distribution
Process = Hose
1 Drop = 1 Unit of Output
Histogram is ... a pile of individual values
[Figure: histogram and dotplot of Y = Weight (lbs); the dotplot axis (C1) runs from 100 to 225]
Dot Plots
[Figure: Diameter axis from 1.0 to 1.4 in steps of 0.05, with dots marking the 1st Observation and the 2nd Observation]
Suppose we have a manufacturing line that is producing shafts. Diameters range from 1.0 to 1.4 inches. As we measure each shaft, we record the value with a dot on the above scale.
Ex: 1st Observation = 1.4 inches; 2nd Observation = 1.1 inches
Dot Plots
[Figure: dot plot of Diameter, axis from 1.1 to 1.4]
And suppose we continue sampling until 150 shafts have been measured. What statements can you make about our process?
Dot Plots
[Figure: dot plot of Diameter, axis from 1.1 to 1.4]
Now imagine the same data, grouped into intervals, with bars used to represent how the data looks.
Histogram Distribution
[Figure: histogram; vertical axis: Frequency]
Data represented with just the dots is called a Dot Plot. Data represented in the above bar format is called a Histogram.
Histogram
[Figure: histogram of Diameter (axis from 1.0 to 1.4) with the Lower Specification and Upper Specification marked]
Now we've combined the Histogram with our Lower and Upper Specifications.
Question #2: What can you say about our process now?
Histogram
[Figure: histogram of Diameter (axis from 1.0 to 1.4) with the Lower Specification and Upper Specification marked; the values 1.1 and 1.3 are labeled]
Dotplot Distribution
[Figure: dotplot of response times, axis from 28.0 to 32.0]
Imagine a customer service help line in which the business knows that, to stay competitive, it must return the customer's telephone calls in less than 30 minutes. The actual response time was measured 150 times and plotted above.
[Figure: histogram of Time (axis from 26 to 34) with a fitted normal curve]
Finally, we can view the data as a smoothed distribution (red line), in this example using the normal distribution assumption. It provides an approximation of how the data might look if we were to collect an infinite number of data points.

Cp    1.85    Targ  *         Mean     30.0692    %>USL  Exp  0.00    PPM>USL  Exp  0
CPU   1.83    USL   35.000    Mean+3s  32.7649           Obs  0.00             Obs  0
CPL   1.88    LSL   25.000    Mean-3s  27.3735    %<LSL  Exp  0.00    PPM<LSL  Exp  0
Cpk   1.83    k     0.014     s         0.8986           Obs  0.00             Obs  0
Cpm   *       n     150.000
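The capability figures in the summary can be reproduced from the USL, LSL, Mean, and s values it reports. The sketch below assumes the usual textbook definitions of Cp, CPU, CPL, Cpk, and k; the slides list the results but do not state the formulas, so treat the formulas as an assumption.

```python
# Inputs read from the capability summary on the slide.
usl, lsl = 35.0, 25.0
mean, s = 30.0692, 0.8986

# Assumed standard definitions (not spelled out in the deck).
cp = (usl - lsl) / (6 * s)          # potential capability: spec width vs. process width
cpu = (usl - mean) / (3 * s)        # capability against the upper limit
cpl = (mean - lsl) / (3 * s)        # capability against the lower limit
cpk = min(cpu, cpl)                 # worst-case side
midpoint = (usl + lsl) / 2
k = abs(midpoint - mean) / ((usl - lsl) / 2)  # off-center fraction of the half-tolerance

for name, value in [("Cp", cp), ("CPU", cpu), ("CPL", cpl), ("Cpk", cpk)]:
    print(f"{name:4s} {value:.2f}")
print(f"k    {k:.3f}")
```

Because the mean sits slightly above the midpoint of the specification, CPU is the smaller of the two one-sided indices and therefore sets Cpk.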
Units of Measure
$$p(x > a) = \int_{a}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[(x - \mu)/\sigma\right]^{2}}\, dx$$
[Figure: normal curve running from - infinity to + infinity with the Performance Limit at a; the area to the left of a is the Area of Yield and the area to the right of a is the Probability of a Defect]
Given that 100% of the area under the normal curve lies between ±∞, we may calculate the area that lies beyond the performance limit. Doing so would reveal the random chance probability of creating a defect.
Note: The tails of the normal curve will touch the baseline at infinity.
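The tail area beyond a performance limit does not have to be integrated by hand. A minimal sketch follows, using the complementary error function from the Python standard library; the function name `normal_tail` and the example limit of three standard deviations are illustrative assumptions, not values from the slides.

```python
import math

def normal_tail(a, mu, sigma):
    """P(x > a) for a normal distribution with mean mu and standard deviation sigma."""
    z = (a - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical example: performance limit three standard deviations above the mean.
p_defect = normal_tail(a=3.0, mu=0.0, sigma=1.0)
print(f"Probability of a defect: {p_defect:.5f}")
print(f"Area of yield:           {1 - p_defect:.5f}")
```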
Basic Statistics
Types of data
Data Example
(Actual # of Days from Order to Ship)
140 145 160 190 155 165 150 190 195 138 160 155 153 145
170 175 175 170 180 135 170 157 130 185 190 155 170 155
215 150 145 155 155 150 155 150 180 160 135 160 140 142
130 155 150 148 155 150 140 180 190 145 150 164 112 125
136 123 155 140 120 130 138 121 125 116 145 150 102 115
130 120 130 131 120 118 125 135 125 118 122 115
Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median:
(1) arrange data in order from smallest to largest
(2) the middle number is the median!
Example: 1, 2, 3, 14, 85
The median is 3
- Not heavily influenced by extreme values
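The slide's five-number example shows how differently the mean and median react to one extreme value. A short sketch using Python's standard `statistics` module; the second data set, with the largest value multiplied by ten, is a hypothetical variation added for illustration.

```python
from statistics import mean, median

data = [1, 2, 3, 14, 85]
print("mean:", mean(data), "median:", median(data))

# Hypothetical variation: make the extreme value ten times larger.
stretched = [1, 2, 3, 14, 850]
print("mean:", mean(stretched), "median:", median(stretched))
```

The median stays at the middle value in both cases, while the mean follows the extreme value.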
As head of the university's Communications Dept., you are asked to summarize the average starting salaries of Communications graduates.
However, on the advice of the Public Relations Dept., you consider including one of your former Communications majors: Shaquille O'Neal (a rather wealthy rookie basketball star).
Mode (not used as much): The value that occurs most often.
- The Mode may not exist; and if it does exist, it may not be unique.
- Can be used with categorical/attribute data
What is the mode for the following set of defect data?
# of change notices issued:
- Price change: 13
- Spec change: 112
- Ship to address change: 40
- Delivery date changed: 79
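For categorical data like the change notices above, the mode is simply the category with the highest count. A minimal sketch; the dictionary name `change_notices` is illustrative.

```python
# Counts of change notices issued, by category (from the slide).
change_notices = {
    "Price change": 13,
    "Spec change": 112,
    "Ship to address change": 40,
    "Delivery date changed": 79,
}

# The mode is the category that occurs most often.
mode_category = max(change_notices, key=change_notices.get)
print("Mode:", mode_category, "with", change_notices[mode_category], "notices")
```

Note that a mean or median of these categories would be meaningless, which is why the mode is the measure of center that works with attribute data.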
Breakout Example
Suppose your son or daughter is considering going to work for a small, family-owned business after graduation. The owner of the business proudly states that, of the last 7 college graduates hired, the mean salary was $25,000; the salaries were bimodal, with modes of $18,000 and $20,000; and the median salary was $19,000. He refuses to identify the individual salaries.
Use your knowledge of the mean, median, and mode to analyze the starting salary opportunities with this company. (Round all salaries to the nearest $1,000)
Exercise
Minitab can easily calculate the Mean and Median
1. Open up Minitab
2. Open file: Distskew.mtw
3. Perform the following: Stat > Basic Statistics > Descriptive Statistics
4. Enter the variable names
5. Evaluate results
TABULAR FORM

Variable     N      Mean     Median    TrMean    StDev
Normal       500    70.000   69.977    70.014    10.000
Pos Skew     500    70.000   65.695    68.554    10.000
Neg Skew     500    70.000   73.783    71.368    10.000
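For readers without Minitab, the same summary columns can be sketched in Python. This is an illustration only: the data set `pos_skew` is hypothetical (it is not Distskew.mtw), the helper `describe` is not a Minitab function, and the 5% trim from each end reflects my understanding of how TrMean is usually defined.

```python
from statistics import mean, median, stdev

def describe(values, trim=0.05):
    """Return N, Mean, Median, TrMean, and StDev for a list of numbers."""
    ordered = sorted(values)
    k = round(len(ordered) * trim)  # points dropped from each end (assumed 5% trim)
    trimmed = ordered[k:len(ordered) - k] if k else ordered
    return {
        "N": len(ordered),
        "Mean": mean(ordered),
        "Median": median(ordered),
        "TrMean": mean(trimmed),
        "StDev": stdev(ordered),   # sample standard deviation (n - 1 divisor)
    }

# Hypothetical positively skewed data: most values are low, with a long right tail.
pos_skew = [61, 62, 62, 63, 64, 64, 65, 66, 67, 68, 70, 72, 75, 80, 90, 111]

summary = describe(pos_skew)
for name, value in summary.items():
    print(f"{name:7s} {value:.3f}" if name != "N" else f"{name:7s} {value}")
```

As in the Pos Skew row of the table, the long right tail pulls the mean above the trimmed mean, which in turn sits above the median.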
Graphical Form
Different Distributions
Sketch in the Means and Medians on each Distribution.
[Figure: three histograms, each titled "Comparison of Distributions" with Frequency on the vertical axis: Negative Skew (C3, axis from 0 to 80, tail on the left), Positive Skew (C2, axis from 60 to 130, tail on the right), and Symmetric Distribution (C1, axis from 20 to 110)]
Graphical Reminder
* The 3 Charts On The Previous Page Were Created Under The Minitab Histogram Option Graph>Histogram
[Figure: the same three histograms with the Mean and Median marked. Neg Skew: the Mean lies to the left of the Median. Pos Skew: the Median lies to the left of the Mean. Normal: the Mean and Median coincide.]
Basic Statistics
- Types of data
- Measures of the Center of the Data
  - Mean
  - Median
  - Mode
- Measures of the Spread of Data
  - Range
  - Variance
  - Standard Deviation
- Normal Distribution and Normal Probabilities
Examples of POPULATION:
- Entire United States
- Year's Worth of Accounts Payable
- Every Grain of Sand On The Beach
$\mu$ = Population Mean
$\sigma$ = Population Standard Deviation
$\bar{X}$ = Sample Mean
$s = \hat{\sigma}$ = Sample Standard Deviation
Range = R
Standard Deviation = s
CLASS EXERCISE
Calculate manually the Variance and Standard Deviation of These 5 Data Points

    X     X - Avg    (X - Avg)^2
    5        2            4
    4
    3
    1
    2
Avg = ___            Sum = ___

Divide the Sum by (n-1): Variance = S^2 = __________
Instructor Page
CLASS EXERCISE
Calculate manually the Variance and Standard Deviation of These 5 Data Points

    X     X - Avg    (X - Avg)^2
    5        2            4
    4        1            1
    3        0            0
    1       -2            4
    2       -1            1
Avg = 3              Sum = 10

Divide the Sum by (n-1): Variance = S^2 = 2.5
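The manual calculation in the class exercise can be mirrored step by step in code. A minimal sketch following the same columns as the worksheet; the variable names are illustrative.

```python
import math

x = [5, 4, 3, 1, 2]
n = len(x)

avg = sum(x) / n                          # Avg column
deviations = [xi - avg for xi in x]       # X - Avg column
squared = [d ** 2 for d in deviations]    # (X - Avg)^2 column

variance = sum(squared) / (n - 1)         # divide the Sum by (n - 1)
std_dev = math.sqrt(variance)             # standard deviation is the square root of the variance

print("Avg      =", avg)
print("Sum      =", sum(squared))
print("Variance =", variance)
print("Std Dev  =", round(std_dev, 4))
```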
Computational Equations
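Population Mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Sample Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$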
Point of Inflection
The distance between the point of inflection and the mean constitutes the size of a standard deviation. If three such deviations can be fit between the target value and the specification limit, we would say the process has three sigma capability.
[Figure: normal curve with the Point of Inflection marked 1σ from the mean, and p(d) shaded beyond the USL]
- Upper Specification Limit (USL)
- Target Specification (T)
- Lower Specification Limit (LSL)
- Mean of the distribution (μ)
- Standard Deviation of the distribution (σ)
Basic Statistics
Types of data
Exercise
[Figure: normal curve; X Axis (pounds) with 300 at the center, blank tick marks to fill in, and "add 10" for each step to the right]
Suppose the weights of players on a football team had μ = 300 lbs and σ = 10 lbs. You fill in the X-axis values (weights) above.
Exercise
[Figure: normal curve; X Axis (pounds): 270 280 290 300 310 320 330]
Suppose the weights of players on a football team had μ = 300 lbs and σ = 10 lbs. You fill in the X-axis values (weights).
Instructor Page
[Figures: three normal curves, X Axis (pounds) from 270 to 330, with 68%, 95%, and 99.7% of the area shaded in turn]
Instructor Page
The Normal Curve and Probability Areas Associated with the Standard Deviation
Property 2: The area under sections of the curve can be used to estimate the cumulative probability of a certain event occurring
[Figure: normal curve; vertical axis: Probability of sample value (0% to 40%); horizontal axis: -4 to 4 standard deviations; 68%, 95%, and 99.73% of the area lie within +/- 1, 2, and 3 standard deviations]

              Theoretical Normal    Empirical Normal
+/- 1σ        68%                   60-75%
+/- 2σ        95%                   90-98%
+/- 3σ        99.7%                 99-100%
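The theoretical percentages come directly from the area under the normal curve. A minimal sketch using the error function from the Python standard library; the function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} sigma: {area_within(k) * 100:.2f}%")
```

For the football example (μ = 300 lbs, σ = 10 lbs), these are the areas between 290 and 310, between 280 and 320, and between 270 and 330 pounds.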
We can test whether a given data set can be described as normal with a Normal Probability Plot. If a distribution is close to normal, the normal probability plot will be close to a straight line.
Minitab makes the normal probability plot easy:
1. Open Distskew.mtw
2. Choose: Stat > Basic Stats > Normality Tests
3. Produce a normal plot of each of the first 3 columns
Which appear to be normal?
[Figure: histogram and normal probability plot for each of the three columns]

Column           Average    Std Dev    N of data    A-Squared    p-value
C1 (Normal)      70         10         500          0.418        0.328
C2 (Pos Skew)    70         10         500          46.447       0.000
C3 (Neg Skew)    70         10         500          43.953       0.000

A-Squared and p-value are from the Anderson-Darling Normality Test.
If the Normality Test shows a P-value that is less than 0.05, then the data is NOT represented well by a normal distribution.
If your P-value is less than .05, then the data is NOT approximately normal.
Mystery Distribution
Generate a Normal Probability Plot for the Mystery variable in Mystery.mtw What is your conclusion? Is this a normal distribution?
[Figure: normal probability plot of Mystery; horizontal axis from 50 to 150; vertical axis: Probability]
Average: 100
Std Dev: 32.3849
N of data: 500
Anderson-Darling Normality Test: A-Squared: 27.108, p-value: 0.000
[Figure: various sampling distributions of individual measurements; a random sample of g sets with n measurements assigned to each set; the resulting distribution of X-bar]
The central limit theorem states that the distribution of the sample means, our estimate of μ, can be approximated with a normal distribution even though the original population may be non-normal, provided the sample size is reasonably large. Given this, we may say that the grand average (resulting from averaging sets of samples) approaches the universe mean as the number of sample sets approaches infinity. This property is at the core of many statistical tests and is very important for resolving a wide array of industrial problems.
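A small simulation makes the central limit theorem concrete. The sketch below is an illustration, not part of the original deck: the exponential population, the choice of g = 2000 sets of n = 20 measurements, and the fixed random seed are all assumptions made for the example.

```python
import math
import random
import statistics

random.seed(1)   # fixed seed so the sketch is repeatable

n = 20           # measurements per set
g = 2000         # number of sets

# Hypothetical non-normal population: exponential with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(g)
]

grand_average = statistics.fmean(sample_means)    # should approach the universe mean (1.0)
spread_of_means = statistics.stdev(sample_means)  # should approach sigma / sqrt(n)

print(f"grand average       : {grand_average:.3f}")
print(f"spread of the means : {spread_of_means:.3f}")
print(f"sigma / sqrt(n)     : {1 / math.sqrt(n):.3f}")
```

Even though individual exponential values are strongly skewed, a histogram of `sample_means` would look close to a bell curve, and its spread shrinks as n grows.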
Important Distinctions:
The Distribution of Individuals VS The Distribution of Averages
What would the Distribution of Individuals look like?
[Figure: flashlight battery lifetimes, Y = Lifetime (Hrs), axis marked 74, 85, 96; each dot = Individual Measurement]
                   Distribution of Individuals       Distribution of Averages
1 point is ...                                       1 Avg (i.e., 1 X-Bar)
Histogram is...    a pile of individual values       A Pile of X-Bars
Spread is...                                         SE(Mean) = σ / √n

What is the probability that an individual battery will last beyond 87 hours?
What is the probability that the average lifetime of an n=20 sample will exceed 87 hours?
Graphically...
[Figure: the two distributions on a lifetime axis marked 74, 85, 96]
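The two battery questions differ only in which spread is used: σ for an individual, σ / √n for an average. The sketch below shows the comparison under loudly stated assumptions: the slides do not give μ or σ, so μ = 85 hours and σ = 11/3 hours are guesses read off the plotted axis (74 and 96 taken as three standard deviations either side of 85), and lifetimes are assumed to be normally distributed.

```python
import math

def normal_tail(a, mu, sigma):
    """P(x > a) for a normal distribution with mean mu and standard deviation sigma."""
    z = (a - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

mu = 85.0          # ASSUMED: center of the plotted lifetime axis
sigma = 11.0 / 3   # ASSUMED: 74 and 96 treated as mean -/+ 3 standard deviations
n = 20             # sample size from the slide

se_mean = sigma / math.sqrt(n)                # spread of the distribution of averages

p_individual = normal_tail(87, mu, sigma)     # one battery lasting beyond 87 hours
p_average = normal_tail(87, mu, se_mean)      # average of 20 batteries exceeding 87 hours

print(f"P(individual > 87 hrs)   = {p_individual:.4f}")
print(f"P(average of 20 > 87 hrs) = {p_average:.4f}")
```

Whatever the true μ and σ are, the pattern holds: an average is far less likely than an individual to land well away from the mean.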
[Figure: distributions of X-Bar for sample sizes n=1 (Individuals), n=2, n=4, n=12, n=20, and n=50, drawn on a common axis from 73 to 97; the spread narrows as n increases]
Basic Statistics
- Types of data
- Measures of the Center of the Data
  - Mean
  - Median
  - Mode
- Measures of the Spread of Data
  - Range
  - Variance
  - Standard Deviation
- Normal Distribution and Normal Probabilities
- Process Stability and Process Capability
Basic Statistics
Variability
- Is the process on target with minimum variability?
- We use the mean to determine if the process is on target.
- We use the Standard Deviation to determine variability.
Stability
- How does the process perform over time?
- Represented by a constant mean and predictable variability over time.
[Figure: two X-Bar charts of Sample Mean vs. Sample Number (0 to 25)]
Variation
While every process displays variation, some processes display controlled variation, while other processes display uncontrolled variation (Walter Shewhart).
- Controlled Variation is characterized by a stable and consistent pattern of variation over time. Associated with Common Causes.
- Uncontrolled Variation is characterized by variation that changes over time. Associated with Special Causes.
Process A shows controlled variation. Process B shows uncontrolled variation.
[Figure: X-Bar Chart for Process A (UCL=77.20, X-bar=70.91, LCL=64.62) and X-Bar Chart for Process B (UCL=77.27, X-bar=70.98, LCL=64.70), each plotting Sample Mean vs. Sample Number (0 to 25); Special Causes are marked]
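Traditional (OLD): Goal Post Mentality
[Figure: Cost plotted against LSL, Nom, and USL; cost is Acceptable anywhere between the LSL and USL]
New
[Figure: Cost plotted against LSL, Nom, and USL]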
3 Points
UNDER THE OLD RULES, The field goal kicker gets 3 points for his team as long as the ball falls between the LSL and USL.
UNDER THE NEW RULES, The Field Goal Kicker Might Get...
- 3 points: Target & +/- 1σ
- 2 points: Between +/- 1σ & +/- 2σ
- 1 point: > +/- 2σ Out To The LSL & USL
If not, identify the variables which affect the mean and determine optimal settings to achieve target value
Is
If not, identify the sources of the variability and eliminate or reduce their influence on the process
[Figure: the process distribution at Time 1, Time 2, Time 3, and Time 4, shifting relative to LSL, T, and USL]
General Assumption: Over time, a typical process will shift and drift by approx. 1.5σ.
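The effect of a 1.5σ shift on defect rates can be worked out from the normal tail areas. The sketch below is an illustration under stated assumptions: the process is taken to be normal, the specification limits are placed symmetrically at a given number of standard deviations from the target, and the function name `ppm_defective` is hypothetical.

```python
import math

def normal_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def ppm_defective(sigma_level, shift=1.5):
    """Defects per million when spec limits sit at +/- sigma_level and the mean shifts by `shift` sigma."""
    near_tail = normal_tail(sigma_level - shift)   # limit the process has drifted toward
    far_tail = normal_tail(sigma_level + shift)    # limit the process has drifted away from
    return (near_tail + far_tail) * 1_000_000

for level in (3, 4, 5, 6):
    print(f"{level} sigma capability with a 1.5 sigma shift: {ppm_defective(level):,.1f} ppm")
```

With the shift included, a process with six sigma capability produces roughly 3.4 defects per million, which is the figure usually quoted for Six Sigma.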
[Figure: Target; 0% Rejected]
Not capable of getting all the water output into the clown's mouth?
[Figure: histogram of output between the Lower Specification and Upper Specification; axis from 1.0 to 1.4]
Now it is capable of getting all the water output into the clown's mouth.
[Figure: distribution of the Part dimension (axis from 1.233 to 1.247) with LSL, T, and USL marked, showing the increase in nonconformance due to a shift in process centering]
Recognize that the process center (μ) is independent of the design center (T). In other words, the ability of a process to repeat any given centering condition is independent of the design specifications.
[Figure: distribution of the Part dimension (axis from 1.235 to 1.247) with LSL, T, and USL marked]
Recognize that the process width is independent of the design width. In other words, the inherent precision of a process is not determined by the design specifications.
$Y = f(X_1, \ldots, X_N)$
The variation inherent to any dependent variable (Y) is determined by the variations inherent to each of the independent variables.
[Figure: distributions shown against LSL and USL]
Summary
- Reviewed & Enhanced The Basic Statistical & Quality Terms Needed For Six Sigma Process Improvement
- Began to Build Up Minitab Operating Skills
Six Sigma
Q&A
Six Sigma
Thank You