
Chapter 2

Probability and Statistics


Introduction
• How close are these values taken from our
data set to the actual average size and
variation?
• Would these values be the same if we chose
another data set?
• What is the mean value of the set?
• Do the variations meet tolerances?
• How good are these results?
• For a given set of measurements, one needs
to:
1. Quantify a single representative value that best
characterizes the average of the data set
2. Quantify a representative value that characterizes
the variation in the data set
3. Establish how well this average value represents
the true average
Statistical Measurement Theory
• In this chapter we are concerned with
quantifying the random errors and their
effects
• Systematic errors will be dealt with in the
following chapter
• If x' is the true value of a measured variable x,
then x' is the average of all possible values of x as
the number of experiments, N, approaches infinity.
From a finite data set it is estimated as

x' = x̄ ± u_x (P%)

• Where u_x is the uncertainty interval due to
both random and systematic errors.
Probability Density Function, p(x)
• For any measurement, a random scatter in the
measured values will occur regardless of the
care taken during the experiment.
• If a variable is continuous in time and space,
then it is a 'continuous random variable'
• A variable that takes only discrete values, while
time and space are continuous, is called a
'discrete random variable'
• Central tendency of a random variable: the tendency of a
variable to lie near (or to be found more often in) one
interval. A sample of a random variable x exhibiting such
central tendency is given in the following table
i xi i xi
1 0.98 11 1.02
2 1.07 12 1.26
3 0.86 13 1.08
4 1.16 14 1.02
5 0.96 15 0.94
6 0.68 16 1.11
7 1.34 17 0.99
8 1.04 18 0.78
9 1.21 19 1.06
10 0.86 20 0.96
• The measured values of this variable are plotted on a single axis as
shown in Figure below

• There exists a value on the axis where the data tend to gather;
this is common for most measured continuous random variables
• A convenient number of intervals for a continuous
random variable can be found from the following
formula:

K = 1.87(N − 1)^0.4 + 1
• Where K is the number of intervals and N is the number of
data points.
Compute the histogram and frequency distribution for the data given in the previous table

Solution:

For N = 20 → K ≈ 7

j Interval nj fj = nj/N
1 0.65 ≤xi< 0.75 1 0.05
2 0.75 ≤xi< 0.85 1 0.05
3 0.85 ≤xi< 0.95 3 0.15
4 0.95 ≤xi< 1.05 7 0.35
5 1.05 ≤xi< 1.15 4 0.20
6 1.15 ≤xi< 1.25 2 0.1
7 1.25 ≤xi< 1.35 2 0.1
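The interval formula and the frequency table above can be checked with a short script (Python used here for illustration; the interval edges are taken from the example):

```python
# Sample of the random variable x from the table above
data = [0.98, 1.07, 0.86, 1.16, 0.96, 0.68, 1.34, 1.04, 1.21, 0.86,
        1.02, 1.26, 1.08, 1.02, 0.94, 1.11, 0.99, 0.78, 1.06, 0.96]

N = len(data)
# Number of intervals: K = 1.87(N - 1)^0.4 + 1, rounded to the nearest integer
K = round(1.87 * (N - 1) ** 0.4 + 1)

# Interval edges chosen to match the example: width 0.10 starting at 0.65
lo, width = 0.65, 0.10
counts = [0] * K
for x in data:
    j = min(int((x - lo) / width), K - 1)   # index of the interval containing x
    counts[j] += 1

freqs = [n / N for n in counts]             # frequency distribution f_j = n_j / N
print(K, counts, freqs)
```

Running this reproduces K = 7 and the occurrence column n_j = [1, 1, 3, 7, 4, 2, 2] of the table.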
• The probability density function defines the probability that a
measured variable might assume a particular value upon any
individual measurement.
• It also provides the central tendency of the variable, which is
the desired representative value that gives the best estimate
of the true mean value.
• There are a number of standard distribution shapes that
suggest how a variable will be distributed on the probability
density plot.
• Experimentally determined histograms are used to determine
which type of standard distribution the measured variable
tends to follow.
• The assumption is that the mean value of the
measurement yields the true value x' as N → ∞
(discrete samples) or T → ∞ (a continuous record):

x' = lim_{N→∞} (1/N) Σ_{i=1}^{N} x_i

x' = lim_{T→∞} (1/T) ∫_0^T x(t) dt

• The width of the density function reflects the data
variation; the true variance is given by

σ² = lim_{N→∞} (1/N) Σ_{i=1}^{N} (x_i − x')²

σ² = lim_{T→∞} (1/T) ∫_0^T (x(t) − x')² dt

• The standard deviation, σ = √σ²

Infinite Statistics
• The most common distribution found in measurements is
the normal (Gaussian) distribution, which has the
following form:

p(x) = [1/(σ√(2π))] exp[ −(x − x')² / (2σ²) ]

• The exact form of p(x) depends on the values of x' and σ.
[Figure: normal probability density functions p(x) for different values of x' and σ]
• The probability (P%) that any future measurement will lie
within a certain interval x' ± δx is the area under the
curve p(x). This area is found by integration:

P(x' − δx ≤ x ≤ x' + δx) = ∫_{x'−δx}^{x'+δx} p(x) dx
• Assuming

z = (x − x')/σ,  z₁ = (x₁ − x')/σ

• The integral can be written as

P(−z₁ ≤ z ≤ z₁) = (1/√(2π)) ∫_{−z₁}^{z₁} e^{−β²/2} dβ

= 2 [ (1/√(2π)) ∫_0^{z₁} e^{−β²/2} dβ ]

• The bracketed term is found in tables
Infinite Statistics
• The probability, P%, that a measured value x_i lies
between

x_i = x' ± z₁σ (P%)

can be found by integrating the function p(x) between the
limits −z₁ and z₁.
• Example 1:
Find the probability, P%, that a measurement x_i will lie within
x' ± 1σ.
• Example 2:
Find the probability, P%, that a measurement x_i will lie within
x' ± z₁σ, for z₁ = 2 and 3
• Example 3:
It is known that the statistics of a well-defined voltage signal are
given by x' = 8.5 V and σ² = 2.25 V². If a single measurement of the
voltage signal is made, determine the probability that the measured
value will lie between 10.0 and 11.5 V.
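A sketch of the computation for Example 3 (the choice of Python and the helper name `Phi` are illustrative):

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

x_true, sigma = 8.5, math.sqrt(2.25)        # x' = 8.5 V, sigma^2 = 2.25 V^2
z_lo = (10.0 - x_true) / sigma              # z = 1
z_hi = (11.5 - x_true) / sigma              # z = 2
P = Phi(z_hi) - Phi(z_lo)
print(f"P(10.0 V <= x <= 11.5 V) = {P:.4f}")
```

The interval corresponds to z between 1 and 2, giving P ≈ 0.1359 (about 13.6%).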
Finite Statistics
• Finite statistics means that only a finite number of measurements,
N, are to be made.
• The sample mean, x̄, is given by:

x̄ = (1/N) Σ_{i=1}^{N} x_i

• And the sample variance, S_x², is given by:

S_x² = (1/(N − 1)) Σ_{i=1}^{N} (x_i − x̄)²

• And the sample standard deviation, S_x, is given by:

S_x = √(S_x²) = [ (1/(N − 1)) Σ_{i=1}^{N} (x_i − x̄)² ]^{1/2}
• Analogous to infinite statistics, a measurement x_i can be
expected to lie within the interval:

x_i = x̄ ± t_{ν,P} S_x (P%)
• The value of t estimator is given in Student’s t Distribution table
• The obtained standard deviation is a sample standard deviation;
the standard deviation of the means, S_x̄, is given by:

S_x̄ = S_x / √N
• Hence the true mean value might lie within the range
x  t , P S x ( P%)
• Or the true mean, x’, is equal to (in the absence of systematic
error):
x'  x  t , P S x ( P%)
• Example 4
Consider the data given in Table.
a) Compute the sample statistics for this data set
b) Estimate the interval of values over which 95% of the
measurements of the random continuous variable (measurand)
should be expected to lie.
c) Estimate the true mean value of the measurand at 95% probability
based on this finite data set.
i xi i xi
1 0.98 11 1.02
2 1.07 12 1.26
3 0.86 13 1.08
4 1.16 14 1.02
5 0.96 15 0.94
6 0.68 16 1.11
7 1.34 17 0.99
8 1.04 18 0.78
9 1.21 19 1.06
10 0.86 20 0.96
• Example 4 (contd)

j Interval nj fj = nj/N
1 0.65 ≤xi< 0.75 1 0.05
2 0.75 ≤xi< 0.85 1 0.05
3 0.85 ≤xi< 0.95 3 0.15
4 0.95 ≤xi< 1.05 7 0.35
5 1.05 ≤xi< 1.15 4 0.20
6 1.15 ≤xi< 1.25 2 0.1
7 1.25 ≤xi< 1.35 2 0.1
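A worked check of Example 4 in Python; the value t_{19,95} = 2.093 is assumed from a standard Student's t table:

```python
import math

data = [0.98, 1.07, 0.86, 1.16, 0.96, 0.68, 1.34, 1.04, 1.21, 0.86,
        1.02, 1.26, 1.08, 1.02, 0.94, 1.11, 0.99, 0.78, 1.06, 0.96]

N = len(data)
x_bar = sum(data) / N                                           # (a) sample mean
S_x = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (N - 1))  # (a) sample std dev

t = 2.093    # t_{19,95} assumed from a Student's t table (nu = N - 1 = 19)

# (b) interval expected to contain 95% of the measurements
meas_lo, meas_hi = x_bar - t * S_x, x_bar + t * S_x

# (c) 95% confidence interval for the true mean, using S_xbar = S_x / sqrt(N)
S_xbar = S_x / math.sqrt(N)
mean_lo, mean_hi = x_bar - t * S_xbar, x_bar + t * S_xbar

print(f"x_bar = {x_bar:.3f}, S_x = {S_x:.3f}")
print(f"measurements (95%): [{meas_lo:.3f}, {meas_hi:.3f}]")
print(f"true mean    (95%): [{mean_lo:.3f}, {mean_hi:.3f}]")
```

This gives x̄ ≈ 1.019 and S_x ≈ 0.158, so individual measurements are expected within roughly 1.02 ± 0.33, while the true mean is pinned down much more tightly, to roughly 1.02 ± 0.07.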
Finite Pooled Statistics
• Replications are independent estimates of the same measured
value. Combining data from replications leads to better statistical
estimates of a measured variable
• Samples that are grouped in a manner so as to determine a
common set of statistics are said to be pooled.
• The pooled mean of samples for M replications is denoted
⟨x̄⟩ and is computed as follows:

⟨x̄⟩ = Σ_{j=1}^{M} N_j x̄_j / Σ_{j=1}^{M} N_j

• And the pooled standard deviation, ⟨S_x⟩, is given by:

⟨S_x⟩ = [ Σ_{j=1}^{M} ν_j S_x_j² / Σ_{j=1}^{M} ν_j ]^{1/2}

• Where ν_j = N_j − 1 is the degrees of freedom of the jth replication
• The pooled standard deviation of the means:

⟨S_x̄⟩ = ⟨S_x⟩ / ( Σ_{j=1}^{M} N_j )^{1/2}
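The pooled formulas can be sketched as follows (the two small samples are hypothetical; the pooled standard deviation weights each sample variance by its degrees of freedom ν_j = N_j − 1):

```python
import math

def pooled_stats(samples):
    """Pooled mean and pooled standard deviation over M replications."""
    Ns = [len(s) for s in samples]
    means = [sum(s) / n for s, n in zip(samples, Ns)]
    # Pooled mean: weighted by the sample sizes N_j
    pooled_mean = sum(n * m for n, m in zip(Ns, means)) / sum(Ns)

    variances = [sum((x - m) ** 2 for x in s) / (n - 1)
                 for s, m, n in zip(samples, means, Ns)]
    nus = [n - 1 for n in Ns]
    # Pooled variance: weighted by the degrees of freedom nu_j = N_j - 1
    pooled_var = sum(nu * v for nu, v in zip(nus, variances)) / sum(nus)
    return pooled_mean, math.sqrt(pooled_var)

m, s = pooled_stats([[9.8, 10.1, 10.0], [10.2, 9.9, 10.3]])
print(m, s)
```

For these two hypothetical replications the pooled mean is 10.05 and the pooled standard deviation about 0.183.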
Chi-Squared Distribution
Precision Interval in Sample Variance
• Analogous to the standard deviation of the means (how well the
sample mean, x̄, estimates the true mean, x'), here we need to
determine how well the sample variance, S_x², represents the true
variance, σ².
• The chi-squared probability density function, p(χ²), is used to
determine how well S_x² predicts σ²
• For a normal distribution, chi-squared, χ², is given as follows:

χ² = ν S_x² / σ²
• Where ν = N − 1 is the degrees of freedom
• The precision interval of the sample variance can be determined
by the probability statement

P( χ²_{1−α/2} ≤ χ² ≤ χ²_{α/2} ) = 1 − α

• Where α is the level of significance

• Rearranging terms:

P( ν S_x²/χ²_{α/2} ≤ σ² ≤ ν S_x²/χ²_{1−α/2} ) = 1 − α

• The 95% precision interval by which S_x² estimates σ² is given by:

P( ν S_x²/χ²_{0.025} ≤ σ² ≤ ν S_x²/χ²_{0.975} ) (95%)

• Example 5
• Ten steel specimens are tested from a large batch, and a sample
variance of 40 000 (kN/m2)2 is found. State the true variance
expected at 95% confidence.

• Note: as N gets larger, the interval gets narrower
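A sketch of Example 5; the chi-squared values for ν = 9 are assumed from a standard chi-squared table:

```python
# Example 5: nu = N - 1 = 9, S_x^2 = 40 000 (kN/m^2)^2.
# Chi-squared table values for nu = 9 (assumed from a standard table):
chi2_0025 = 19.0   # chi^2_{0.025, 9}
chi2_0975 = 2.70   # chi^2_{0.975, 9}

nu, S2 = 9, 40_000
var_lo = nu * S2 / chi2_0025   # lower bound on the true variance
var_hi = nu * S2 / chi2_0975   # upper bound on the true variance
print(f"{var_lo:.0f} <= sigma^2 <= {var_hi:.0f}  (kN/m^2)^2 at 95%")
```

The resulting interval, roughly 19 000 to 133 000 (kN/m²)², is wide because ν = 9 is small; as the note says, it narrows as N grows.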


• Example 6
A manufacturer knows from experience that the variance in the
diameter of the roller bearings used in their bearings is 3.15 μm².
Rejecting bearings drives up the unit cost. However, they reject any
batch of roller bearings if the sample variance of 20 pieces selected
at random exceeds 5 μm². To what extent does the chosen sample
variance represent the true variance?
Chi-Squared Distribution
Goodness of Fit
• The chi-squared test is also used to determine how well an
assumed normal distribution fits the measured data
• Here, the chi-squared test provides a measure of the discrepancy
between the measured variation of a data set and the variation
predicted by the assumed distribution function.
• Construct a histogram of K intervals from a data set of N
measurements.
• The number of occurrences, nj, that the measured value lies within
the jth interval is determined
• Calculate the degrees of freedom ν = K − m, where m is the number of
restrictions imposed.
• Calculate the predicted number of occurrences, n’j, to be expected
from the function for the jth interval
• Calculate the chi-squared value:

χ² = Σ_j (n_j − n'_j)² / n'_j

• The lower the value of chi-squared, the better the data set fits the
assumed distribution function.
• Criteria:
– A very good fit for: P(χ²) < 5%, i.e. α > 95%
– Inconclusive for: 5% ≤ P(χ²) ≤ 95%
– Poor fit for: P(χ²) > 95%
• Data that fit a given distribution too well should be suspected of
being fabricated
• Example 7
• Test the hypothesis that the variable x, as given by the measured
data in the table, is described by a normal distribution.

i xi i xi
1 0.98 11 1.02
2 1.07 12 1.26
3 0.86 13 1.08
4 1.16 14 1.02
5 0.96 15 0.94
6 0.68 16 1.11
7 1.34 17 0.99
8 1.04 18 0.78
9 1.21 19 1.06
10 0.86 20 0.96
j Interval nj fj = nj/N
1 0.65 ≤xi< 0.75 1 0.05
2 0.75 ≤xi< 0.85 1 0.05
3 0.85 ≤xi< 0.95 3 0.15
4 0.95 ≤xi< 1.05 7 0.35
5 1.05 ≤xi< 1.15 4 0.20
6 1.15 ≤xi< 1.25 2 0.1
7 1.25 ≤xi< 1.35 2 0.1
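A sketch of the goodness-of-fit computation for Example 7, using the sample statistics of the data and the standard normal CDF to obtain the predicted occurrences n'_j (the end intervals are treated as finite here, which changes χ² slightly compared with extending them to ±∞):

```python
import math

data = [0.98, 1.07, 0.86, 1.16, 0.96, 0.68, 1.34, 1.04, 1.21, 0.86,
        1.02, 1.26, 1.08, 1.02, 0.94, 1.11, 0.99, 0.78, 1.06, 0.96]
edges = [0.65, 0.75, 0.85, 0.95, 1.05, 1.15, 1.25, 1.35]
observed = [1, 1, 3, 7, 4, 2, 2]                       # n_j from the histogram

N = len(data)
x_bar = sum(data) / N
S_x = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (N - 1))

def Phi(z):                                            # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# Predicted occurrences n'_j = N * [Phi(z_upper) - Phi(z_lower)] per interval
expected = [N * (Phi((edges[j + 1] - x_bar) / S_x) - Phi((edges[j] - x_bar) / S_x))
            for j in range(len(observed))]

chi2 = sum((n - ne) ** 2 / ne for n, ne in zip(observed, expected))
print(f"chi-squared = {chi2:.2f}")
```

The resulting χ² is small (on the order of 2 to 3 for ν = K − m = 4 with m = 3 restrictions), so the normal-distribution hypothesis is not rejected.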
Data Outlier Detection
• The three-sigma test is used to detect
measurement outliers.
• Outliers are points lying outside the interval
xi  x  t ,99.8 S x P(99.8%)
• Compute the value of
xi  x
z0 
Sx
• Find the value of P(z0) from the normal
distribution table.
• The point is an outlier, if

0.5 − P(z₀) ≤ 0.1%
• Example 8
– Consider the data given below for 10 measurements of tire pressure
made using a hand-held gauge (Note: 14.5 psi = 1 bar). Compute the
statistics of the data set; then test for outliers using the modified
three-sigma test.

i 1 2 3 4 5 6 7 8 9 10

xi [psi] 28 31 27 29 28 24 29 28 18 27
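A sketch of the statistics behind Example 8; for each point the code reports z₀ and the one-sided tail probability 0.5 − P(z₀):

```python
import math

data = [28, 31, 27, 29, 28, 24, 29, 28, 18, 27]        # tire pressure [psi]

N = len(data)
x_bar = sum(data) / N
S_x = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (N - 1))

def Phi(z):                                            # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# For each point: z0 = |x_i - x_bar| / S_x and the one-sided tail
# probability 0.5 - P(z0), to be compared against the outlier criterion
for x in data:
    z0 = abs(x - x_bar) / S_x
    tail = 1.0 - Phi(z0)                               # = 0.5 - P(z0)
    print(f"x = {x:2d} psi  z0 = {z0:.2f}  tail probability = {tail:.4f}")
```

The 18 psi reading stands out with z₀ ≈ 2.47, far larger than any other point; it is the candidate outlier, and after removing it the statistics would be recomputed.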
Number of Measurements Required
• This part is concerned with determination of the number of
measurements, N, which are required to reduce the random
error within an acceptable limit.
x'  x  t , P S x ( P%)

confidence int erval
• The confidence interval, CI, is:

Sx
CI  t , P S x   t , P ( P%)
N
• The one sided precision valued, d:
CI Sx
d  t , P ( P%)
2 N

• Therefore:
2
 t , P S x 
N    ( P%)
 d 
• Example 9:
– From 51 measurements of a variable, the standard
deviation is found to be 160 units. For a 95%
confidence interval of 60 units of the mean value,
estimate the total number of measurements
required.
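A first-pass sketch of Example 9; t_{50,95} ≈ 2.009 is assumed from a Student's t table. Since t depends on ν = N − 1, which itself depends on the answer, the estimate is normally iterated:

```python
# Example 9: S_x = 160 units estimated from N = 51 measurements (nu = 50).
# Desired 95% confidence interval CI = 60 units, so d = CI / 2 = 30 units.
t, S_x, d = 2.009, 160.0, 30.0    # t_{50,95} assumed from a Student's t table

N_required = (t * S_x / d) ** 2   # N = (t_{nu,P} * S_x / d)^2
print(f"N >= {N_required:.0f} measurements")
```

The first pass gives N ≈ 115; re-entering the table with ν = 114 (t ≈ 1.98) and repeating would refine this to roughly 112.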
Thank You
Term Reports
1. Pressure & Velocity Measurements, scientific
ethics
2. Temperature Measurements, Health, safety and
environment
3. Flow measurements, Health, safety and
environment
4. Force, torque and strain measurement, scientific
ethics
5. Air pollution measurements and sampling,
Health, safety and environment
