Fundamentals of Statistics
Definition of statistics
A collection of quantitative data pertaining to any subject or group, especially when the data are systematically gathered and collated.
The science that deals with the collection, tabulation, analysis, interpretation, and presentation of quantitative data.
Two phases of statistics
Descriptive (or deductive) statistics: describes and analyzes a subject or group.
Inductive statistics: determines, from a limited sample of data, an important conclusion about the population.
Data collection
Data may be collected by direct observation or indirectly through written or verbal questions.
Data collected for quality control purposes are obtained by direct observation and are classified as either variable or attribute.
Data collection
Variable data:
Measurable quality characteristics. A variable that is capable of any degree of subdivision is referred to as continuous. Examples: weight, length.
Variables that exhibit gaps are called discrete.
Sometimes it is convenient for verbal or non-numerical data to assume the nature of a variable, e.g. the quality of a surface finish can be classified as good (3), average (2), or poor (1).
While many quality characteristics are stated in terms of variables, many others must be stated as attributes.
Data collection
Attributes:
Attributes are those quality characteristics that are classified as either conforming or nonconforming (go / no-go).
Characteristics that are judged by visual observation are classified as attributes.
Sometimes it is desirable for variables to be classified as attributes, e.g. the exact weight of a package may matter less than whether the weight is within specifications.
Data collection
In data collection, the number of figures recorded is a function of the intended use of the data.
For example, for data on the life of light bulbs it is acceptable to record 995.6 h; recording 995.632 h is more precise than necessary.
If the upper and lower specifications are 9.58 and 9.52 mm, then the data should be collected to the nearest 0.01 mm.
Measuring instruments may not give a true reading because of problems with accuracy and precision.
Accuracy and precision
[Figure: readings scattered around the true value in four cases: accurate; precise; accurate and precise; neither accurate nor precise.]
Describing the data
Sometimes so many data are collected that they are more confusing than helpful. Consider the data shown in Table 1.
Table 1- Number of Daily Billing Errors
0  1  1  2  0  1  1
1  5  0  1  4  3  3
3  4  2  1  1  4  0
0  1  0  1  3  0  1
1  2  0  2  1  0  2
Describing the data
Clearly these data, in this form, are difficult to use and are not effective in describing the data's characteristics. Some means of summarizing the data is needed to show what value the data tend to cluster about and how the data are dispersed or spread out.
Two techniques are available to accomplish this summarization: graphical and analytical.
The graphical technique is a plot or picture of a frequency distribution.
Analytical techniques summarize data by computing a measure of central tendency and a measure of dispersion.
Sometimes both the graphical and analytical techniques are used.
Graphical Techniques
Frequency Distribution Histograms
Ungrouped data
Ungrouped data comprise a listing of the observed values, as shown in Table 1. A method of processing the data is necessary.
A much better understanding can be obtained by tallying the frequency of each value of daily billing errors, as shown in Table 2.
The numerical value for the number of tallies is called the frequency.
Frequency Distribution Histograms
Table 2- Tally of Number of Daily Billing Errors
Number nonconforming   Tabulation         Frequency
0                      ||||| ||||         9
1                      ||||| ||||| |||    13
2                      |||||              5
3                      ||||               4
4                      |||                3
5                      |                  1
Ungrouped data
If the "Tabulation" column is eliminated, the resulting table is classified as a frequency distribution, and can be graphically presented as a histogram.
A histogram consists of a set of rectangles that represent the frequency in each category, as shown in Fig. 1.
[Fig. 1 Frequency histogram. x-axis: number nonconforming; y-axis: frequency.]
Ungrouped data
Other types of graphical presentation are the relative frequency distribution, the cumulative frequency distribution, and the relative cumulative frequency distribution.
Relative frequency is calculated by dividing the frequency for each data value by the total. These calculations are shown in the 3rd column of Table 3. The graphical presentation is shown in Fig. 2.
Cumulative frequency is calculated by adding the frequency of each data value to the sum of the frequencies for the previous data values. These calculations are shown in the 4th column of Table 3. The graphical presentation is shown in Fig. 3.
Relative cumulative frequency is calculated by dividing the cumulative frequency for each data value by the total. These calculations are shown in the 5th column of Table 3. The graphical presentation is shown in Fig. 4.
Table 3- Relative Frequency Distributions of Data
Number          Frequency   Relative        Cumulative    Relative cumulative
nonconforming               frequency       frequency     frequency
0               9           9/35 = 0.26     9             9/35 = 0.26
1               13          13/35 = 0.37    9+13 = 22     22/35 = 0.63
2               5           5/35 = 0.14     22+5 = 27     27/35 = 0.77
3               4           4/35 = 0.11     27+4 = 31     31/35 = 0.89
4               3           3/35 = 0.09     31+3 = 34     34/35 = 0.97
5               1           1/35 = 0.03     34+1 = 35     35/35 = 1.00
Total           35          1.00
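The four columns of Table 3 can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `frequency_table` is hypothetical, and the 35 values are the Table 1 readings taken in reading order.

```python
from collections import Counter
from itertools import accumulate

# Table 1 values (number of daily billing errors), in reading order.
errors = [
    0, 1, 1, 2, 0, 1, 1,
    1, 5, 0, 1, 4, 3, 3,
    3, 4, 2, 1, 1, 4, 0,
    0, 1, 0, 1, 3, 0, 1,
    1, 2, 0, 2, 1, 0, 2,
]

def frequency_table(values):
    """Rows of (value, f, relative f, cumulative f, relative cumulative f)."""
    counts = Counter(values)
    total = len(values)
    keys = sorted(counts)
    freqs = [counts[k] for k in keys]
    cumulative = list(accumulate(freqs))   # running sum of the frequencies
    return [
        (k, f, round(f / total, 2), c, round(c / total, 2))
        for k, f, c in zip(keys, freqs, cumulative)
    ]

table = frequency_table(errors)
for row in table:
    print(row)
```

Each printed row corresponds to one line of Table 3; the last cumulative frequency equals the total number of observations.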
[Fig. 2 Relative frequency histogram. x-axis: number nonconforming; y-axis: relative frequency.]
[Fig. 3 Cumulative frequency histogram. x-axis: number nonconforming; y-axis: cumulative frequency.]
[Fig. 4 Relative cumulative frequency histogram. x-axis: number nonconforming; y-axis: relative cumulative frequency.]
Grouped data
Most data are continuous rather than discrete and require grouping:
1. Collect data and construct a tally sheet.
2. Determine the range.
3. Determine the cell interval and the number of cells.
4. Determine the cell midpoints.
5. Post the cell frequency.
6. Construct the histogram.
Grouped data
1. Collect data and construct a tally sheet.
Individual observations representing the data are collected. Determine the minimum and maximum observations.
2. Determine the range.
The range is the difference between the highest observed value and the lowest observed value:
R = XH - XL
where XH = highest number and XL = lowest number.
Grouped data
3. Determine the cell interval and the number of cells.
The cell interval is the distance between adjacent cell midpoints, as shown in Fig. 5.
The cell interval (i) and the number of cells (h) are interrelated by the formula
h = R / i
Since h and i are both unknown, a trial-and-error approach is used to find the interval that will meet the following guidelines.
Grouped data
Guidelines to determine the number of cells
In general, the number of cells should be between 5 and 20.
Use 5 to 9 cells when the number of observations is less than 100;
use 8 to 17 cells when the number of observations is between 100 and 500; and
use 15 to 20 cells when the number of observations is greater than 500.
Another method to determine the number of cells h is
$h = \sqrt{N}$
where N is the number of observations.
[Fig. 5 Cell Classification: a cell with its interval (i), midpoint, lower boundary, and upper boundary.]
Grouped data
4. Determine the cell midpoints.
The cell midpoint is determined by using the formula
Mp = XL + i / 2
where Mp is the midpoint of the cell, XL is the lower boundary of the cell, and i is the cell interval.
5. Post the cell frequency.
Cell frequency is the number of observed values that fall within the cell boundaries. Make a tally of the values.
6. Construct the histogram.
Grouped data
Example problem 1
A company that fills bottles of oil tries to maintain a specific weight of the product. The table gives the weights of 110 bottles that were checked at random intervals. Make a tally of these weights and construct a frequency histogram (weight is in kg).
Grouped data
Example problem 1
Weights of the oil bottles (kg):
6.00  5.98  6.01  6.01  5.97  5.99  5.98  6.01  5.99  5.98  5.96
5.98  5.99  5.99  6.03  5.99  6.01  5.98  5.99  5.97  6.01  5.98
5.97  6.01  6.00  5.96  6.00  5.97  5.95  5.99  5.99  6.01  6.00
6.01  6.03  6.01  5.99  5.99  6.02  6.00  5.98  6.01  5.98  5.99
6.00  5.98  6.05  6.00  6.00  5.98  5.99  6.00  5.97  6.00  6.00
6.00  5.98  6.00  5.94  5.99  6.02  6.00  5.98  6.02  6.01  6.00
5.97  6.01  6.04  6.02  6.01  5.97  5.99  6.02  5.99  6.02  5.99
6.02  5.99  6.01  5.98  5.99  6.00  6.02  5.99  6.02  5.95  6.02
5.96  5.99  6.00  6.00  6.01  5.99  5.96  6.01  6.00  6.01  5.98
6.0   5.9   5.9   5.9   6.0   5.9   6.0   5.9   6.0   6.0   5.9    (second decimal missing)
Grouped data
Example problem 1 Sol.
R = XH - XL = 6.05 - 5.94 = 0.11
$h = \sqrt{N} = \sqrt{110} = 10.49 \approx 11$
h = R / i
11 = 0.11 / i
i = 0.11 / 11 = 0.01
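The range, cell count, and cell interval calculation above can be sketched in Python. The helper name `plan_cells` and its `decimals` parameter are assumptions made for illustration; rounding the square root upward is one reasonable reading of the "10.49, so use 11" step.

```python
import math

def plan_cells(x_high, x_low, n, decimals):
    """Return (R, h, i): range, number of cells h = ceil(sqrt(N)), interval i = R / h."""
    r = round(x_high - x_low, 10)      # round away floating-point noise
    h = math.ceil(math.sqrt(n))        # square-root rule, rounded up
    i = round(r / h, decimals)         # round to the measurement resolution
    return r, h, i

# Oil bottle example: highest 6.05 kg, lowest 5.94 kg, 110 observations.
print(plan_cells(6.05, 5.94, 110, 2))
```

The interval is rounded to the resolution of the data so that cell boundaries stay easy to read; the trial-and-error guidelines on the number of cells still apply to the result.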
Grouped data
Example problem 1 Sol.
Group / cell   Frequency fi
5.94
5.95
5.96
5.97
5.98           16
5.99           24
6.00           20
6.01           17
6.02           13
6.03
6.04
6.05
Total          110
[Figure: Histogram of oil bottle weights. x-axis: oil bottle weight (kg); y-axis: frequency.]
Grouped data
Example problem 2
The relative strength of 150 silver solder welds is tested, and the results are given in the table. Determine the cell interval and the approximate number of cells. Make a table showing cell midpoints, cell boundaries, and observed frequencies. Plot a frequency histogram.
Grouped data
Example problem 2
Relative strength of the 150 silver solder welds:
1.5  1.2  3.1  1.3  0.7  1.3  3.4  1.3  1.7  2.6
1.1  0.8  0.1  2.9  1.0  1.3  2.6  1.7  1.0  1.5
2.2  3.0  2.0  1.8  0.3  0.7  2.4  1.5  0.7  2.1
2.9  2.5  2.0  3.0  1.5  1.3  3.5  1.1  0.7  0.5
1.6  1.4  2.2  1.0  1.7  3.1  2.7  2.3  1.7  3.2
3.0  1.7  2.8  2.2  0.6  2.0  1.4  3.3  2.2  2.9
1.8  2.3  3.3  3.1  3.3  2.9  1.6  2.3  3.3  2.0
1.6  2.7  2.2  1.2  1.3  1.4  2.3  2.5  1.9  2.1
3.4  1.5  0.8  2.2  3.1  2.1  3.5  1.4  2.8  2.8
1.8  2.4  1.2  3.7  1.3  2.1  1.5  1.9  2.0  3.0
0.9  3.1  2.9  3.0  2.1  1.8  1.1  1.4  1.9  1.7
1.5  3.0  2.6  1.0  2.8  1.8  1.8  2.4  2.3  2.2
2.9  1.8  1.4  1.4  3.3  2.4  2.1  1.2  1.4  1.6
2.4  2.1  1.8  2.1  1.6  0.9  2.1  1.5  2.0  1.1
3.8  1.3  1.3  1.0  0.9  2.9  2.5  1.6  1.2  2.4
Grouped data
Example problem 2 Sol.
R = XH - XL = 3.8 - 0.1 = 3.7
$h = \sqrt{N} = \sqrt{150} = 12.25 \approx 13$
h = R / i
13 = 3.7 / i
i = 3.7 / 13 = 0.28, rounded to 0.3
Cell boundaries   Midpoint xi   Frequency fi
0.1 - 0.4         0.25          2
0.4 - 0.7         0.55          2
0.7 - 1.0         0.85          9
1.0 - 1.3         1.15          14
1.3 - 1.6         1.45          25
1.6 - 1.9         1.75          20
1.9 - 2.2         2.05          18
2.2 - 2.5         2.35          18
2.5 - 2.8         2.65          8
2.8 - 3.1         2.95          17
3.1 - 3.4         3.25          11
3.4 - 3.7         3.55          4
3.7 - 4.0         3.85          2
Total                           150
[Figure: Histogram of strength of silver welds. x-axis: strength; y-axis: frequency.]
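The tally for Example problem 2 can be checked with a short script. This is a sketch under stated assumptions: the helper `cell_index` is hypothetical, and a reading that falls exactly on a boundary is counted in the upper cell, a convention chosen because it reproduces the two-digit frequencies that survive in the table.

```python
from collections import Counter

# The 150 weld-strength readings of Example problem 2, in reading order.
DATA = """
1.5 1.2 3.1 1.3 0.7 1.3 3.4 1.3 1.7 2.6
1.1 0.8 0.1 2.9 1.0 1.3 2.6 1.7 1.0 1.5
2.2 3.0 2.0 1.8 0.3 0.7 2.4 1.5 0.7 2.1
2.9 2.5 2.0 3.0 1.5 1.3 3.5 1.1 0.7 0.5
1.6 1.4 2.2 1.0 1.7 3.1 2.7 2.3 1.7 3.2
3.0 1.7 2.8 2.2 0.6 2.0 1.4 3.3 2.2 2.9
1.8 2.3 3.3 3.1 3.3 2.9 1.6 2.3 3.3 2.0
1.6 2.7 2.2 1.2 1.3 1.4 2.3 2.5 1.9 2.1
3.4 1.5 0.8 2.2 3.1 2.1 3.5 1.4 2.8 2.8
1.8 2.4 1.2 3.7 1.3 2.1 1.5 1.9 2.0 3.0
0.9 3.1 2.9 3.0 2.1 1.8 1.1 1.4 1.9 1.7
1.5 3.0 2.6 1.0 2.8 1.8 1.8 2.4 2.3 2.2
2.9 1.8 1.4 1.4 3.3 2.4 2.1 1.2 1.4 1.6
2.4 2.1 1.8 2.1 1.6 0.9 2.1 1.5 2.0 1.1
3.8 1.3 1.3 1.0 0.9 2.9 2.5 1.6 1.2 2.4
"""
values = [float(token) for token in DATA.split()]

LOW_TENTHS, WIDTH_TENTHS, N_CELLS = 1, 3, 13   # first boundary 0.1, interval 0.3

def cell_index(x):
    # Work in integer tenths so boundary values are binned reliably.
    # Assumption: a value equal to a boundary belongs to the upper cell.
    return (round(x * 10) - LOW_TENTHS) // WIDTH_TENTHS

counts = Counter(cell_index(x) for x in values)
freqs = [counts.get(k, 0) for k in range(N_CELLS)]

for k, f in enumerate(freqs):
    lower = (LOW_TENTHS + k * WIDTH_TENTHS) / 10
    print(f"{lower:.1f} - {lower + 0.3:.1f}   midpoint {lower + 0.15:.2f}   f = {f}")
```

Working in integer tenths avoids the small floating-point errors that would otherwise misplace readings such as 0.7 or 1.3, which sit exactly on a cell boundary.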
Uses of Histogram
The histogram describes the variation in the process. It is used to:
1. Determine the process capability,
2. Compare with specifications,
3. Suggest the shape of the population, and
4. Indicate discrepancies in data such as gaps.
Fig. 6 Characteristics of Frequency Distribution Graphs
A smooth curve represents a population frequency distribution, whereas the histogram represents a sample frequency distribution.
[Figure panels: Symmetrical (Normal), Bimodal, Skewed to the Right, Peaked, Skewed to the Left, Flat.]
Characteristics of Frequency Distribution Graphs
Frequency distribution graphs provide a basis for decision making without further analysis.
They have certain identifiable characteristics:
- Symmetry or lack of symmetry of the data. Are the data equally distributed on each side of the central value, or are the data skewed to the right or to the left?
- Number of modes or peaks in the data.
- Location of the data.
- Spread of the data (quite peaked or flat).
[Figure 7 Differences due to location, spread, and shape]
Analysis of Histograms
Analysis of a histogram can provide information concerning specifications.
Fig. 8 shows a histogram for the % of wash concentration in a steel tube cleaning operation prior to painting.
No complex statistics are needed to show that corrective action is needed to bring the spread of the distribution closer to the ideal value of 1.6%.
Concentrations less than 1.45% produce poor quality, while concentrations more than 1.75% are costly and therefore reduce productivity.
[Fig. 8 Histogram of wash concentration. x-axis: wash concentration, %, from 0.7 to 2.8 in steps of 0.3, with the ideal value marked; y-axis: frequency.]
Interpreting Histogram
Fig. 9 Histogram Shapes
Normal. Many measured characteristics follow a normal distribution. The histogram is bell-shaped. The normal distribution is so common that if the histogram is not bell-shaped, we should ask ourselves why not.
Bimodal (or Multimodal). These histograms have two (bimodal) or many (multimodal) peaks. Such histograms result when the data come from two or more distributions. For example, if the data came from different suppliers, machines, shifts, and so on, a bimodal (or multimodal) histogram will signal large differences due to these causes.
Empty Interval. In this case, one of the intervals has zero frequency. This may result from bias in data collection.
Positive Skew. Positive skew means a long tail to the right. This is common when successful efforts are being made to minimize the measured value. Also, variance has a positively skewed distribution.
Negative Skew. Negative skew means a long tail to the left. This is common when successful efforts are being made to increase the measured value. Such a histogram may also result if sorting is taking place.
Uniform. This histogram looks more like a rectangular distribution. Such a histogram can result if the process mean is not in control, as in the case when tool wear is taking place.
Outlier. Here one or more cells are greatly separated from the main body of the histogram. Such observations are often the result of wrong measurement or other mistakes.
The mean, standard deviation, and histogram provide extremely useful summaries of the data. However, they do not contain all the information in the data. In particular, data are often collected over time, and any time trends are lost in the summaries considered so far.
Test for normality
Histogram.
Visual examination of a histogram developed from a large amount of data will give an indication of the underlying population distribution.
If a histogram is unimodal, symmetrical, and tapers off at the tails, normality is a definite possibility, and this may be sufficient information in many practical situations.
The larger the sample size, the better the judgment of normality.
A minimum sample size of 50 is recommended.
Analytical Techniques
Measures of Central Tendency
A measure of central tendency is a numerical value that describes the central position of the data, or how the data tend to build up in the center.
There are 3 measures in common use:
1- The average
2- The median
3- The mode
1- Average: the most commonly used measure, especially with symmetrical distributions. It is the sum of the observations divided by their number.
Ungrouped data:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$
Grouped data:
$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i}$
where
$\bar{X}$ = average
n = number of observed values
$X_i$ = observed values / midpoints of cells
$f_i$ = frequency of the i-th cell
h = number of cells
Ungrouped data - Example
The resistance values of 5 coils are
1. x1 = 3.35
2. x2 = 3.37
3. x3 = 3.28
4. x4 = 3.34
5. x5 = 3.30
Average:
$\bar{X} = \frac{3.35 + 3.37 + 3.28 + 3.34 + 3.30}{5} = 3.33$
Grouped data Example 1
Given the frequency distribution of the life of 320 automotive tires in 1000 km as shown in Table 4, determine the average.
$\bar{X} = \frac{\sum_{i=1}^{h} f_i X_i}{\sum_{i=1}^{h} f_i} = \frac{11549}{320} = 36.1$ (in 1000 km) = 36100 km, with h = 9 cells
Table 4- Frequency Distributions of the life of 320 tires in 1000 km
Group         Midpoint xi   Frequency fi   Computation fi xi
23.6 - 26.5   25            4              100
26.6 - 29.5   28            36             1008
29.6 - 32.5   31            51             1581
32.6 - 35.5   34            63             2142
35.6 - 38.5   37            58             2146
38.6 - 41.5   40            52             2080
41.6 - 44.5   43            34             1462
44.6 - 47.5   46            16             736
47.6 - 50.5   49            6              294
Total                       320            11549
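The grouped average for the tire data can be verified with a short weighted-mean calculation. This is an illustrative sketch: the function name `grouped_mean` is hypothetical, and the first and last frequencies (4 and 6) are inferred from the computation column (100 / 25 and 294 / 49).

```python
# Cell midpoints (in 1000 km) and frequencies from Table 4.
midpoints   = [25, 28, 31, 34, 37, 40, 43, 46, 49]
frequencies = [4, 36, 51, 63, 58, 52, 34, 16, 6]   # 4 and 6 inferred from f*x

def grouped_mean(x, f):
    """Weighted average of cell midpoints: sum(f_i * x_i) / sum(f_i)."""
    return sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

sum_fx = sum(fi * xi for fi, xi in zip(frequencies, midpoints))
mean_life = grouped_mean(midpoints, frequencies)
print(sum_fx, sum(frequencies), round(mean_life, 1))
```

Summing the computation column this way gives 11549, which matches the value used in the formula for the average.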
Grouped data Example 2
The weight of 65 castings is distributed as follows:
Midpoint xi   Frequency fi
3.5           6
3.8           9
4.1           18
4.4           14
4.7           13
5.0           5
1. Determine the average.
2. Plot a frequency histogram.
3. Evaluate the production process if the specs are 4.25 ± 0.60 kg.
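Grouped data Example 2 Sol
Compute the column fi Xi:
Midpoint xi   Frequency fi   Computation fi Xi
3.5           6              21
3.8           9              34.2
4.1           18             73.8
4.4           14             61.6
4.7           13             61.1
5.0           5              25
Total         65             276.7
$\bar{X} = \frac{\sum_{i=1}^{6} f_i X_i}{\sum_{i=1}^{6} f_i} = \frac{276.7}{65} = 4.26$ kg
[Fig. 1 Frequency histogram. x-axis: weight, with cell midpoints 3.5, 3.8, 4.1, 4.4, 4.7, 5.0 and the specification values 3.65, 4.25, and 4.85 marked; y-axis: frequency.]
The process is centered and controlled, but not capable: some castings fall outside the specification limits of 3.65 and 4.85 kg.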
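A rough check of the castings example is sketched below. It assumes the frequencies 6, 9, and 5 implied by the fi Xi column, takes the specification limits as 4.25 ± 0.60 kg, and treats every casting in a cell as if it sat at the cell midpoint, which is a simplification.

```python
midpoints   = [3.5, 3.8, 4.1, 4.4, 4.7, 5.0]   # kg
frequencies = [6, 9, 18, 14, 13, 5]            # 6, 9 and 5 inferred from f*x
NOMINAL, TOLERANCE = 4.25, 0.60                # specification: 4.25 +/- 0.60 kg

total = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total
lower, upper = NOMINAL - TOLERANCE, NOMINAL + TOLERANCE

# Simplification: each casting is judged by the midpoint of its cell.
outside = sum(f for f, x in zip(frequencies, midpoints) if x < lower or x > upper)
print(f"mean = {mean:.2f} kg; {outside} of {total} castings outside {lower:.2f}-{upper:.2f} kg")
```

The mean of about 4.26 kg sits close to the nominal 4.25 kg, while the cells centered on 3.5 and 5.0 kg lie beyond the limits, which is why centering alone does not make the process meet the specification.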
2- The median: the value that divides a series of ordered observations so that the number of observations above it equals the number below it. It is an effective measure for skewed distributions.
3- The mode: the value that occurs with the greatest frequency.
A series of numbers is referred to as:
Unimodal: if it has one mode
Bimodal: if it has two modes
Multimodal: if there are more than two modes
The Median
Example 1. Find the median distance for the following data.
85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70
Sol.
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 140
There is a single middle value, so Median = 85.
The Median
Example 2. Find the median distance for the following data.
85, 125, 130, 65, 100, 70, 75, 50, 140, 135, 95, 70
Sol.
Ordered data: 50, 65, 70, 70, 75, 85, 95, 100, 125, 130, 135, 140
There are two middle values (85 and 95), so take their mean: Median = 90.
The Mode
The mode of a set of data is the value in the set that occurs most often.
A set of data can be bimodal. It is also possible to have a set of data with no mode.
Bimodal: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26
Unimodal: 6, 8, 9, 9, 9, 10, 11, 14, 15, 18
No mode: 2.7, 3.5, 4.9, 5.1, 8.3
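The median and mode examples can be reproduced with Python's standard `statistics` module. This is a small sketch for checking the hand calculations; the variable names are illustrative only.

```python
import statistics

odd = [85, 125, 130, 65, 100, 70, 75, 50, 140, 95, 70]   # 11 values
even = odd + [135]                                        # 12 values

print(statistics.median(odd))    # single middle value of the ordered data
print(statistics.median(even))   # mean of the two middle values

bimodal = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]
unimodal = [6, 8, 9, 9, 9, 10, 11, 14, 15, 18]
no_mode = [2.7, 3.5, 4.9, 5.1, 8.3]

# multimode returns every value that ties for the highest count.
print(statistics.multimode(bimodal))
print(statistics.multimode(unimodal))
print(statistics.multimode(no_mode))
```

Note that `multimode` reports all five values for the last list, because each occurs once; in the terminology used here, that situation is described as having no mode.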

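Figure 10 Relationship among average, median, and mode
[Figure panels: Symmetrical: average, median, and mode coincide. Positively skewed: mode, then median, then average, from left to right. Negatively skewed: average, then median, then mode, from left to right.]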
Measures of Dispersion
Introduction
A second tool of statistics is composed of the measures of dispersion, which describe how the data are spread out or scattered on each side of the central value. Measures of dispersion and measures of central tendency are both needed to describe a collection of data.
Two common types of measures of dispersion:
1- Range
2- Standard deviation
1- Range
The range of a series of numbers is the difference between the largest and smallest values or observations. Symbolically, it is given by the formula
R = XH - XL
where
R = range
XH = highest observation in a series
XL = lowest observation in a series
Example problem
The weights of a sample of 10 bottles of shampoo are recorded as follows (in g):
150, 147, 152, 156, 144, 148, 149, 153, 146, 151
Determine the range of the sample.
Solution
Max weight XH = 156 g
Min weight XL = 144 g
Range R = XH - XL = 156 - 144 = 12 g
2- Standard deviation
The standard deviation is a numerical value, in the units of the observed values, that measures the spreading tendency of the data. A large standard deviation shows greater variability of the data than does a small standard deviation. In symbolic terms it is given by the formula
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
or
$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}}$
where
s = sample standard deviation
$X_i$ = observed value
$\bar{X}$ = average
n = number of observed values
Example Problem 1
Determine the standard deviation of the moisture content of a roll of Kraft paper. The results of six readings across the paper web are 6.7, 6.0, 6.4, 5.9, 6.4, and 5.8%.
$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}} = \sqrt{\frac{6(231.26) - (37.2)^2}{6(6 - 1)}} = 0.35\%$ (see Table 5)
Example Problem 1
Table 5- Measure of standard deviation
$X_i$     $X_i^2$
6.7       44.89
6.0       36.00
6.4       40.96
5.9       34.81
6.4       40.96
5.8       33.64
Sum 37.2  231.26
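Example Problem 2
Four readings of the thickness of a paper are 0.076, 0.082, 0.073, and 0.077 mm. Determine the sample standard deviation.
Sol.
$s = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n - 1)}} = \sqrt{\frac{4(0.023758) - (0.308)^2}{4(4 - 1)}} = \sqrt{\frac{0.095032 - 0.094864}{12}} = \sqrt{\frac{0.000168}{12}} = \sqrt{0.000014} = 0.0037$ mm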
Example Problem 2
Measure of standard deviation
$X_i$       $X_i^2$
0.076       0.005776
0.082       0.006724
0.073       0.005329
0.077       0.005929
Sum 0.308   0.023758
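Both standard deviation examples can be checked with the computational form of the formula. The sketch below uses a hypothetical helper `sample_std` and compares it against `statistics.stdev` from the Python standard library, which uses the same n - 1 divisor.

```python
import math
import statistics

def sample_std(xs):
    """Computational form: sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))

moisture = [6.7, 6.0, 6.4, 5.9, 6.4, 5.8]      # percent, Example Problem 1
thickness = [0.076, 0.082, 0.073, 0.077]       # mm, Example Problem 2

print(round(sample_std(moisture), 2))          # compare with the 0.35% above
print(round(sample_std(thickness), 4))         # compare with the 0.0037 mm above
print(math.isclose(sample_std(moisture), statistics.stdev(moisture), rel_tol=1e-9))
```

The computational form needs only the two column sums from the tables, which is why the worked examples tabulate the values and their squares.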
Relationship between the measures of dispersion (range & standard deviation)
The range is useful when the amount of data is small.
The standard deviation is used when a more precise measure of dispersion is desired (number of observations > 10).
As shown in Fig. 11, two distributions may have the same average and range while their standard deviations differ. The distribution on the bottom is much better: its sample standard deviation is much smaller, which means better quality.
[Fig. 11 Comparison of two distributions with equal average and range R]
Table 6 Analytical Technique Recap
Concept of a population and a sample
A sample is selected to represent the population.
Since the composition of samples will fluctuate, the computed statistics will be larger or smaller than their true population values (parameters).
Sampling is necessary when measuring the entire population is:
- impossible
- too expensive
- destructive
- too dangerous
We use different symbols to differentiate between samples and the population.
Table 7 Comparison of sample and population
Sample (statistic)              Population (parameter)
$\bar{X}$ average               $\mu$ ($\bar{X}_0$) mean
s sample standard deviation     $\sigma$ ($s_0$) standard deviation
Table 8 Results of 8 samples of green & blue spheres
Columns: sample number, sample size, no. of green spheres, no. of blue spheres, % of green spheres
10
10
10
20
10
50
10
10
10
30
10
10
10
20
10
10
Total: sample size 80, green spheres 15, blue spheres 65, % of green spheres 18.8
Comparison of sample and population
Table 8 shows the results of an experiment that illustrates the relationship between samples and the population.
A container holds 800 blue and 200 green spheres. The 1000 spheres are considered the population, with 20% green spheres.
8 samples of 10 spheres each are selected; each sphere is checked for colour and replaced (one by one).
The table illustrates the difference between the sample results and what should be expected from the known population.
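The sphere experiment can be imitated with a small simulation. This is a sketch only: the function name `draw_samples` and the seed are arbitrary choices, and the simulated counts will not match the particular results recorded in Table 8.

```python
import random

def draw_samples(n_samples=8, sample_size=10, n_green=200, n_blue=800, seed=1):
    """Draw spheres one by one with replacement; return the green count per sample."""
    rng = random.Random(seed)   # fixed seed only so that a run is repeatable
    population = ["green"] * n_green + ["blue"] * n_blue
    greens = []
    for _ in range(n_samples):
        sample = rng.choices(population, k=sample_size)   # sampling with replacement
        greens.append(sample.count("green"))
    return greens

greens = draw_samples()
for number, g in enumerate(greens, start=1):
    print(f"sample {number}: {g} green, {10 - g} blue, {10 * g}% green")
total_green = sum(greens)
print(f"total: {total_green} green of 80 = {100 * total_green / 80:.1f}% (population: 20%)")
```

Changing the seed shows how individual samples scatter around the population value of 20% green, while the pooled percentage tends to sit closer to it.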
The Normal Curve
One type of population that is quite common is called the normal curve. The normal curve is a symmetrical, unimodal, bell-shaped distribution with the mean, median, and mode having the same value.
A population curve or distribution is developed from a frequency histogram.
As the sample size of a histogram gets larger and larger, the cell interval gets smaller and smaller.
When the sample size is quite large and the cell interval is very small, the histogram will take on the appearance of a smooth polygon or a curve representing the population.
Much of the variation in nature and in industry follows the frequency distribution of the normal curve.
A curve of the normal population of 1000 observations of the resistance in ohms of an electrical device, with population mean, $\mu$, of 90 and population standard deviation, $\sigma$, of 2, is shown in Figure 12. The interval between dotted lines is equal to one standard deviation, $\sigma$.
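[Figure 12 The normal curve. x-axis: resistance from 84 to 96 ohms in steps of one standard deviation (2 ohms), centered on the mean of 90; y-axis: frequency.]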
The standardized normal distribution
All normal distributions of continuous variables can be converted to the standardized normal distribution (see Fig. 13) by using the standardized normal value Z.
For example, consider the value of 92 in Fig. 12, which is one standard deviation above the mean. Conversion to the Z value is
$Z = \frac{X_i - \mu}{\sigma} = \frac{92 - 90}{2} = +1$
[Figure 13 The standardized normal distribution. x-axis: Z from -3 to +3, centered on 0.]
Fig. 13 shows the standardized curve with its mean of zero and standard deviation of 1. The area under the curve is equal to 1.0 or 100% and therefore can easily be used for probability calculations.
A normal area table is provided as Table A in the appendix.
Relationship to the mean and standard deviation
Fig. 14 shows three normal curves with different mean values and the same standard deviation. The only change is in location.
Fig. 15 shows three normal curves with the same mean value but different standard deviations. The figure illustrates the principle that the larger the standard deviation, the flatter the curve, and the smaller the standard deviation, the more peaked the curve.
It is noted that the two parameters (mean & standard deviation) are independent.
[Figure 14 Normal curves with different means but identical standard deviations: $\mu$ = 14, $\mu$ = 20, and $\mu$ = 29.]
[Figure 15 Normal curves with different standard deviations but identical means: $\sigma$ = 1.5, $\sigma$ = 3, and $\sigma$ = 4.5.]
[Figure 16 Percent of items included between certain values of the standard deviation: 68.26% within $\mu \pm 1\sigma$, 95.46% within $\mu \pm 2\sigma$, and 99.73% within $\mu \pm 3\sigma$.]
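The percentages in Figure 16 can be recomputed from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the computed values agree with the figure to within about 0.01 percentage point, the small differences coming from rounding in printed tables.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal population lying within k standard deviations of the mean.
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within +/-{k} sigma: {100 * share:.2f}%")
```

Because the calculation is done on the standardized curve, the same three percentages hold for any normal distribution, whatever its mean and standard deviation.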
Applications
The areas under the curve for various Z values are given in Table A in the appendix. Table A, "Areas under the Normal Curve," is a left-reading table, which means that the given areas are for that portion of the curve from $-\infty$ to a particular value, $X_i$.
The first step is to determine the Z value using the formula
$Z = \frac{X_i - \mu}{\sigma}$
where
Z = standard normal value
$X_i$ = individual value
$\mu$ = mean
$\sigma$ = population standard deviation
Example problem (1)
The mean value of the weight of a particular brand of cereal for the past year is 0.297 kg (10.5 oz) with a standard deviation of 0.024 kg. Assuming a normal distribution, find the percent of the data that falls below the lower specification limit of 0.274 kg. (Note: Since the mean and standard deviation were determined from a large number of tests during the year, they are considered to be valid estimates of the population values.)
[Figure: normal curve with $\mu$ = 0.297 and $\sigma$ = 0.024; Area1 is the area to the left of $X_i$ = 0.274.]
Example problem (solution)
$Z = \frac{X_i - \mu}{\sigma} = \frac{0.274 - 0.297}{0.024} = -0.96$
From Table A it is found that for Z = -0.96, Area1 = 0.1685 or 16.85%.
Thus, 16.85% of the data are less than 0.274 kg.
Example problem (2)
Using the data from the preceding problem, determine the percentage of the data that fall above 0.347 kg.
Sol.
Since Table A is a left-reading table, the solution to this problem requires the use of the relationship Area1 + Area2 = AreaT = 1.0000. Therefore, Area2 is determined and subtracted from 1.0000 to obtain Area1.
[Figure: normal curve with $\mu$ = 0.297 and $\sigma$ = 0.024; Area2 lies to the left of $X_i$ = 0.347 and Area1 to the right; AreaT = 1.0000.]
$Z_2 = \frac{X_i - \mu}{\sigma} = \frac{0.347 - 0.297}{0.024} = +2.08$
From Table A it is found that for Z2 = +2.08, Area2 = 0.9812.
Area1 = AreaT - Area2 = 1.0000 - 0.9812 = 0.0188 or 1.88%
Thus, 1.88% of the data are above 0.347 kg.
Example problem (3)
A large number of tests of line voltage to home residences show a mean of 118.5 V and a population standard deviation of 1.20 V. Determine the percentage of data between 116 and 120 V.
Since Table A is a left-reading table, the solution requires that the area to the left of 116 V be subtracted from the area to the left of 120 V. The graph and calculations show the technique.
[Figure: normal curve with $\mu$ = 118.5 and $\sigma$ = 1.20; Area2 lies to the left of $X_i$ = 116, Area3 to the left of $X_i$ = 120, and Area1 between the two values.]
$Z_2 = \frac{X_i - \mu}{\sigma} = \frac{116 - 118.5}{1.20} = -2.08$
$Z_3 = \frac{X_i - \mu}{\sigma} = \frac{120 - 118.5}{1.20} = +1.25$
From Table A it is found that for Z2 = -2.08, Area2 = 0.0188, and for Z3 = +1.25, Area3 = 0.8944.
Area1 = Area3 - Area2 = 0.8944 - 0.0188 = 0.8756 or 87.56%
Thus, 87.56% of the data are between 116 and 120 V.
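The three table look-ups in Example problems (1) to (3) can be cross-checked numerically. This sketch uses `statistics.NormalDist` in place of Table A; the helper name `area_left` is hypothetical. Because Z is not rounded to two decimals here, the results differ from the table-based answers in the third or fourth decimal place.

```python
from statistics import NormalDist

def area_left(x, mean, sd):
    """Area under the normal curve from minus infinity to x (what Table A tabulates)."""
    z = (x - mean) / sd
    return NormalDist().cdf(z)

# Example (1): share of cereal weights below 0.274 kg.
below = area_left(0.274, 0.297, 0.024)
# Example (2): share above 0.347 kg, i.e. 1 minus the area to the left.
above = 1 - area_left(0.347, 0.297, 0.024)
# Example (3): share of line voltages between 116 V and 120 V.
between = area_left(120, 118.5, 1.20) - area_left(116, 118.5, 1.20)

print(f"below 0.274 kg: {below:.4f}")
print(f"above 0.347 kg: {above:.4f}")
print(f"between 116 and 120 V: {between:.4f}")
```

The pattern is the same in all three cases: convert to Z, take the left-hand area, and add or subtract areas as the question requires.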
Example problem (4)
If it is desired to have 12.1% of the line voltage below 115 V, how should the mean voltage be adjusted? The dispersion is $\sigma$ = 1.20 V.
The solution to this type of problem is the reverse of the other problems. First, 12.1%, or 0.1210, is found in the body of Table A. This gives a Z value, and using the formula for Z, we can solve for the mean voltage. From Table A with Area1 = 0.1210, the Z value of -1.17 is obtained.
[Figure: normal curve with $\sigma$ = 1.20 and unknown mean $\bar{X}_0$; Area1 = 0.1210 lies to the left of $X_i$ = 115.]
$Z = \frac{X_i - \bar{X}_0}{\sigma}$
$-1.17 = \frac{115 - \bar{X}_0}{1.20}$
$\bar{X}_0 = 116.4$ V
Thus, the mean voltage should be centered at 116.4 V for 12.1% of the values to be less than 115 V.
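The reverse look-up in this example corresponds to the inverse of the cumulative normal distribution. A minimal sketch with `statistics.NormalDist.inv_cdf` follows; the helper name `mean_for_tail` is an assumption made for illustration.

```python
from statistics import NormalDist

def mean_for_tail(x, sd, area_below):
    """Mean that places `area_below` of a normal distribution to the left of x."""
    z = NormalDist().inv_cdf(area_below)   # about -1.17 for an area of 0.1210
    return x - z * sd                      # rearranged from z = (x - mean) / sd

target_mean = mean_for_tail(115, 1.20, 0.1210)
print(round(target_mean, 1))
```

Rearranging the Z formula for the mean is the only algebra involved; the inverse function replaces the search through the body of Table A.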
Example problem (5)
The population mean of a company's racing bicycle is 9.07 kg with a population standard deviation of 0.4 kg. If the distribution is approximately normal, determine
A) the % of bicycles less than 8.3 kg
B) the % of bicycles greater than 10.00 kg
C) the % of bicycles between 8.3 and 10.00 kg
[Figure: normal curve with $\mu$ = 9.07 and $\sigma$ = 0.4; Area1 lies to the left of Xa = 8.3, Area2 to the left of Xb = 10, Area3 to the right of Xb, and Area4 between Xa and Xb.]
Example problem 5 (solution)
$Z_a = \frac{X_a - \mu}{\sigma} = \frac{8.3 - 9.07}{0.4} = -1.925$
$Z_b = \frac{X_b - \mu}{\sigma} = \frac{10 - 9.07}{0.4} = +2.325$
a) From Table A it is found that for Za = -1.925, Area1 = 0.0271.
Then 2.71% of bicycles have weights less than 8.3 kg.
b) For Zb = +2.325, Area2 = 0.9899.
Then 98.99% of bicycles have weights less than 10 kg.
Bicycles with weights more than 10 kg = 1 - 0.9899 = 0.0101 or 1.01% (Area3).
c) Area4 = Area2 - Area1 = 0.9899 - 0.0271 = 0.9628 or 96.28%
Thus, 96.28% of the bicycles are between 8.3 and 10 kg.
Example problem (6)
Plastic strips that are used in a sensitive electronic device are manufactured to a maximum specification of 305.70 mm and a minimum specification of 304.55 mm. If the strips are less than the minimum specification, they are scrapped; if greater than the maximum specification, they are reworked. The part dimensions are normally distributed with a population standard deviation of 0.25 mm. What % of the product is scrap? What % is rework? How can the process be centered to eliminate all but 0.1% of the scrap? What is the rework % then?
[Figure: normal curve with $\sigma$ = 0.25; Area1 lies to the left of Xmin = 304.55 and Area2 to the left of Xmax = 305.7.]

The process is taken to be centered midway between the specifications:
$\mu = \frac{X_{min} + X_{max}}{2} = \frac{304.55 + 305.70}{2} = 305.125$
$Z_1 = \frac{X_{min} - \mu}{\sigma} = \frac{304.55 - 305.125}{0.25} = -2.3$
From Table A it is found that for Z1 = -2.3, Area1 = 0.0107.
Thus, 1.07% of the strips are scrapped.
$Z_2 = \frac{X_{max} - \mu}{\sigma} = \frac{305.7 - 305.125}{0.25} = +2.3$
From Table A it is found that for Z2 = +2.3, Area2 = 0.9893.
Thus, % of rework = 1 - 0.9893 = 0.0107 = 1.07%.
From Table A it is found that for 0.1% scrap (an area of 0.0010), Z = -3.09.
$Z = \frac{X_{min} - \mu}{\sigma}$
$-3.09 = \frac{304.55 - \mu}{0.25}$
$\mu = 304.55 + 3.09(0.25) = 305.32$ mm
With the process centered at 305.32 mm, the rework is found from
$Z = \frac{X_{max} - \mu}{\sigma} = \frac{305.70 - 305.32}{0.25} = +1.52$
From Table A it is found that for Z = +1.52, the area is 0.9357.
Thus, rework % = 1 - 0.9357 = 0.0643 = 6.43%.