
Bayesian Statistics 101

Dennis W. Dubro, Ph. D.


Pacific Gas & Electric Company, Applied Technical Services
3400 Crow Canyon Road, San Ramon, CA 94583
Telephone: (925) 866-5239 Fax: (925) 866-5681
Email: dwd8@pge.com

Abstract
In recent years there have been many papers on Bayesian statistics, yet individual discussions often reveal that many people still do not have a clear idea of exactly what Bayesian statistics is and how it relates to classical, or frequentist, statistics. Everyone has heard of the concepts of prior and posterior distributions, but there is a lot of confusion regarding their proper context, use, and interpretation. This paper is intended to be a basic introduction to Bayesian statistics, beginning with examples that one might find in Wikipedia, moving on to the single-probability calculations of Consumer or Producer Risk, and graduating to a more sophisticated example in which the posterior distribution is a full, multi-valued probability distribution.
The intended audience of this paper is fairly general. As the abstract indicates, it attempts to give a broad introduction to Bayesian statistics for the metrologist who in recent years has been hearing a great deal about Consumer and Producer Risk as well as Bayesian statistics. It is hoped that this paper will put the general reader in a better position to understand the more complex papers on Bayesian approaches.
1. Introduction

There is a lot of talk about Bayesian statistics at this conference, and many advanced papers are being given on the subject. With all the talk, many of us are interested in finding out more and in how it will be relevant to our lives in metrology. Because of the complexity and breadth of the subject, it is difficult to "jump in and hit the ground running". The purpose of this paper is to give some introductory background and general ideas about Bayesian statistics, to describe some of its diverse applications, and to clarify some of the concepts.
Use of Bayes' Theorem is a defining feature of Bayesian statistical analysis, but Bayes' Theorem also has an independent life of its own in the wider world of probability and mathematics, which is one of the confusing issues. Thus the use of Bayes' Theorem is not necessarily the same as Bayesian statistics.
A simple example of the use of Bayes' Theorem which is neither distinctly Bayesian nor
frequentist is solving a cookie jar problem. Suppose you have two cookie jars, a red one and a
blue one. You know that the red one has 15 oatmeal cookies and 10 chocolate chip cookies in it.
The blue one has 6 oatmeal cookies and 23 chocolate chip cookies in it. If an oatmeal cookie
from one of the jars is given to you, you can use Bayes' Theorem to calculate the probability of its having been chosen from the red jar. This is an example of Bayes' Theorem used to solve a purely probabilistic problem.
Looking ahead to some of the "behind the scenes" mathematics which leads to confusion, let us point out that the answer to this problem is a statistical distribution. The way the problem is phrased, one is led to look for the single probability that the cookie was chosen from the red jar. But the answer is a discrete, binomial distribution with a probability that the cookie was chosen from the red jar and a (complementary) probability that it was chosen from the blue one.
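As a small illustration (not part of the original text), the short Python sketch below carries out the cookie jar calculation; it assumes that each jar was equally likely to have been chosen, which the problem statement leaves implicit.

```python
# Cookie jar example: P(red jar | oatmeal cookie) via Bayes' Theorem.
# Assumes each jar was equally likely to be chosen (prior = 1/2 each),
# which the text leaves implicit.

p_red, p_blue = 0.5, 0.5                 # prior probabilities for each jar
p_oat_given_red = 15 / 25                # 15 oatmeal out of 25 cookies in the red jar
p_oat_given_blue = 6 / 29                # 6 oatmeal out of 29 cookies in the blue jar

# Normalizing factor: total probability of drawing an oatmeal cookie
p_oat = p_oat_given_red * p_red + p_oat_given_blue * p_blue

# Posterior: a two-point distribution over {red, blue}
p_red_given_oat = p_oat_given_red * p_red / p_oat
p_blue_given_oat = p_oat_given_blue * p_blue / p_oat

print(f"P(red | oatmeal)  = {p_red_given_oat:.3f}")   # ~0.744
print(f"P(blue | oatmeal) = {p_blue_given_oat:.3f}")  # ~0.256
```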
Another confusing issue is that Bayesian statisticians use a general mathematical definition of
probability and assign probabilities to a statement or event even when no physically-repeatable
random process is involved. Probability, for a Bayesian, is a way to represent an individual's
degree of belief in a particular statement, possibly given evidence in the form of statistical data.
Each time new evidence is obtained, the current degree of belief is updated to reflect the
evidence using Bayes Theorem.
In the frequentist approach probability is only used to describe a physically-repeatable
experiment with random outcomes. For a frequentist, the probability of a random event denotes
the relative frequency of occurrence of an experiment's outcome when repeating the experiment.
When describing such experiments, a Bayesian and a frequentist would often use the same probabilistic description of the experiment, but the Bayesian would feel free to use probabilities to describe the possible outcomes of a historical event whose exact outcome is unknown to him or her, while the frequentist would not.
Thus, for an uncertainty component evaluated using Type A methods (as defined in the GUM), Bayesian and frequentist analysts would both adopt the same assumptions about the statistical model of the measurement process. Sources of uncertainty that are evaluated using Type B methods have a more Bayesian flavor, however, because the measurands associated with those sources of uncertainty, which are often corrections to the primary measurand, are assigned probability distributions based on belief. Almost every uncertainty analysis has sources of uncertainty evaluated using both Type A and Type B methods, which can also contribute some confusion between Bayesian and frequentist uncertainty analysis.
This paper will begin by "fast-forwarding to the chase (scene)" and giving what is hopefully a clear example of the Bayesian process. Then it will return "to the beginning" and work through some definitions and examples to fill out the development of the topic. In that development, Bayes' Theorem will be addressed and some uses of Bayes' Theorem will be illustrated, which will hopefully be entertaining and educational.
Bayesian methods have to do with determining a conditional probability for an event (which can
be a full continuous probability distribution, rather than a single probability), called the
"posterior", from another conditional probability for the observed data (also called a likelihood
function) and a "prior" probability of the event (or distribution) with the help of a normalizing
factor that makes the posterior probability distribution integrate to 1.

Posterior = (Likelihood Function * Prior) / (Normalizing Factor)


The basis of Bayesian methods is that the prior distribution is a starting belief about the values of
the parameters in the statistical model that describes the measurement process, including the
measurand. This is in contrast to assuming that those parameters are unknown constants that are
being estimated purely as the result of statistical sampling, which is the basis of frequentist
statistics. One of the difficulties with Bayesian statistics is that there is a certain amount of
mathematical machinery grinding in the background which is not immediately apparent in the
statement of the problem.
For those already a bit familiar with Bayesian terminology, as our "cut to the chase" example, suppose there is an observable random variable X with a Normal probability density function (pdf), denoted N(μ, σ²), where the mean, μ, is itself a random variable with a fully-specified, assumed pdf, and the variance, σ², is a known value. The distribution of the mean, μ, is the prior distribution that is chosen by the analyst. In this case we will choose a uniform distribution with endpoints that are guaranteed to encompass the value of μ both above and below. Our X's come from one member of a family of Normal pdf's that are indexed by the potential values of μ. As a result, they give us information about the value of μ that will augment our prior knowledge. Given a sample of data X1, X2, and X3, we can use the Normal likelihood function and the prior distribution as the basis for computing the distribution P(μ | X1, X2, X3, σ²) as our posterior distribution. This is illustrated graphically in Figure 1. At the top of the figure is the prior distribution for the value of μ, chosen to be a uniform distribution in this case. Below that are three Normal distributions representing the family of Normal distributions from one of which the data will be observed.
Further down are three data points observed from the Normal distribution whose true mean value we would like to know. The last item is the posterior distribution computed using Bayes' Theorem, taking the prior distribution for μ, the likelihood function, and the data as inputs. The fact that the posterior distribution is narrower than the prior distribution and has a different shape is a consequence of the updating of the model with the information about the value of μ that is provided by the data.
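For readers who want to experiment with this "cut to the chase" example numerically, the following Python sketch computes the posterior for μ on a grid. The values of σ, the prior endpoints, and the three data points are illustrative assumptions, not the values behind Figure 1.

```python
import numpy as np

# Posterior for the mean mu of a Normal with known sigma, uniform prior.
# The prior endpoints, sigma, and the three data points below are
# illustrative assumptions, not values from the paper.
sigma = 1.0
data = np.array([2.3, 2.9, 2.6])           # hypothetical X1, X2, X3

mu_grid = np.linspace(0.0, 5.0, 501)       # uniform prior support
prior = np.ones_like(mu_grid)              # flat (unnormalized) prior

# Likelihood of the three observations for each candidate value of mu
def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

likelihood = np.prod([normal_pdf(x, mu_grid, sigma) for x in data], axis=0)

# Bayes' Theorem on the grid: posterior = likelihood * prior / normalizer
unnormalized = likelihood * prior
posterior = unnormalized / np.trapz(unnormalized, mu_grid)

print("posterior mean of mu:", np.trapz(mu_grid * posterior, mu_grid))
```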


Figure 1 -- The uniform prior distribution for μ (top), the family of Normal curves indexed by μ, the three observed data points, and the resulting posterior distribution for μ around the mean (bottom).

2. Review of Probability

Stepping back now, we will begin with a brief review of probability theory. Probability is a number assigned to an event which can occur in a defined experiment. There are three axioms which define the limits of probability:
Axiom 1: All probabilities have values between 0 and 1.
Axiom 2: The sum of the probabilities assigned to all possible events is 1.
Axiom 3: For two mutually exclusive events A and B that are potential results of the same experiment, the probability of A or B occurring is P(A or B) = P(A) + P(B).
In addition, there are other theorems for calculating probabilities for the union and intersection of different events.
Often the experiment defined is a measurement, and the event, or outcome of the experiment, is a
number. In this case it is convenient to talk about random variables rather than events. Axiom 3
points to the introduction of more complex probability problems which often use multiple
random variables connected with a joint probability function, represented by a notation like
P(X, Y, Z).


One key to understanding your problem is to correctly identify the relevant random variables,
and the events associated with them.
Example #1
For a deck of 52 playing cards we might have three random variables:
X = Color (Red, Non-Red)
Y = Number (Face, Non-Face)
Z = Suit (Hearts, Spades, Clubs, Diamonds)
For a simple example, we will draw cards from a deck only considering Color and Number, and
we will consider two events for each.
The Joint Probability Function is P(Color,Number).
Color = R or NR (Red or Non-Red)
Number = F or NF (Face or Non-Face)
Conditional Probability is the probability of some event A, given the occurrence of some other
event B. Conditional probability is written P(A|B), and is read "the probability of A, given B".
Marginal probability is then the unconditional probability P(A) of the event A; that is, the
probability of A, regardless of whether event B did or did not occur.
As a demonstration, Table A1 in the appendix calculates all the conditional, marginal, and
joint probabilities for events of a card selection experiment. The rectangular graphic has the
character of a Venn diagram where "right-hatching" indicates Red cards, "left-hatching"
indicates Face cards, and cross-hatching indicates the intersection of the two. This is also an
example where the result of a Bayes' Theorem calculation is a discrete binomial probability
distribution. Given that a Red card is chosen, there are different probabilities assigned to the
remaining events (Face and Non-Face). If we had chosen Y = Number (2, 3, 4, ... , Q, K, A),
then given that a Red card is chosen, we would have had a discrete multinomial distribution, a
generalization of the binomial distribution for situations with more than two possible outcomes.
3. Bayes' Theorem

Bayes' theorem relates the conditional and marginal probabilities of events A and B, where B has
a non-zero probability:

P(A | B) = P(B | A) P(A) / P(B)

It is easy to confirm from Table A1, if the Event A is a Face card and the Event B is a Red card, that P(A|B) = 0.5 * 0.308 / 0.5 = 0.308. This is a simplistic example, but all the other probabilities can be calculated and confirmed. Notice that this example confirms the symmetry of the card deck with respect to color: selection of a Red or a Non-Red card should, and according to Bayes' Theorem does, give the same result. Most problems in Bayesian statistics do not have such symmetry, however.
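The same confirmation can be scripted directly from the raw card counts; this short sketch, added here for illustration, reproduces the 0.308 result from Table A1.

```python
# Verify the card-deck Bayes calculation from raw counts (Table A1).
# Note the paper's table counts 8 "Face" cards per color, so the Ace is
# treated as a face card along with J, Q, and K.

red_face, red_nonface = 8, 18
black_face, black_nonface = 8, 18
total = red_face + red_nonface + black_face + black_nonface     # 52

p_face = (red_face + black_face) / total                        # 0.308
p_red = (red_face + red_nonface) / total                        # 0.500
p_red_given_face = red_face / (red_face + black_face)           # 0.500

# Bayes' Theorem: P(Face | Red) = P(Red | Face) P(Face) / P(Red)
p_face_given_red = p_red_given_face * p_face / p_red
print(f"P(Face | Red) = {p_face_given_red:.3f}")                # 0.308, matching Table A1
```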
The derivation of Bayes' Theorem, for all the attention it gets, is quite simple. Beginning with some simple Venn diagram mathematics, the first equation below is obtained, and a simple rearrangement of factors gives Bayes' Theorem in the second equation.

P(A | B) P(B) = P(A ∩ B) = P(B | A) P(A)

P(A | B) = P(A ∩ B) / P(B) = P(B | A) P(A) / P(B)

Example #2 -- Drug Testing (This example has been modified from Wikipedia [A1], but has been confirmed by independent calculation.)
Bayes' theorem can be useful in evaluating the result of drug tests. Suppose a certain drug test is
99% sensitive and 99% specific, that is, the test will correctly identify a drug user as testing
positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the
time. This would seem to be a relatively accurate test, but Bayes' theorem will reveal a potential
flaw. Let's assume a corporation decides to test its employees for opium use, and 0.5% of the
employees use the drug. We want to know the probability that, given a positive drug test, an
employee is actually a drug user. Let "User" be the event the employee is a drug user and
"NonUser" indicate being a non-user. Let "+tive" be the event of a positive drug test. We need to
know the following:

- P(User), or the probability that the employee is a drug user, regardless of any other information. This is 0.005, since 0.5% of the employees are assumed to be drug users. This is the prior probability of User.
- P(NonUser), or the probability that the employee is not a drug user. This is 1 − P(User), or 0.995.
- P(+tive | User), or the probability that the test is positive, given that the employee is a drug user. This is 0.99, since the test is 99% accurate.
- P(+tive | NonUser), or the probability that the test is positive, given that the employee is not a drug user. This is 0.01, since the test will produce a false positive for 1% of non-users.
- P(+tive), or the probability of a positive test event, regardless of other information. This is 0.0149, or 1.49%, which is found by adding the probability of a true positive result (0.99 x 0.005 = 0.00495) to the probability of a false positive (0.01 x 0.995 = 0.00995).


Given this information, we can compute the posterior probability P(User|+tive) of an employee
who tested positive for actually being a drug user:

P(User | +tive) = P(+tive | User) * P(User) / P(+tive)

P(User | +tive) = P(+tive | User) * P(User) / [P(+tive | User) * P(User) + P(+tive | NonUser) * P(NonUser)]

P(User | +tive) = (0.99 * 0.005) / (0.99 * 0.005 + 0.01 * 0.995) ≈ 0.33

If you test positive, there is only a 33% chance that you are a drug user.
That is, even if you test positive, chances are you are NOT actually a drug user! Table A-2 in
the Appendix spells out all the probabilities.
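For readers who would like to reproduce the numbers, the following short sketch evaluates the same calculation using the probabilities stated above.

```python
# Drug-testing example: posterior probability of being a user given a
# positive test, using the numbers stated in the text.

p_user = 0.005                 # prior: 0.5% of employees use the drug
p_nonuser = 1 - p_user         # 0.995
p_pos_given_user = 0.99        # sensitivity
p_pos_given_nonuser = 0.01     # false positive rate (1 - specificity)

# Total probability of a positive test (the normalizing factor)
p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * p_nonuser

p_user_given_pos = p_pos_given_user * p_user / p_pos
print(f"P(+tive)        = {p_pos:.4f}")             # 0.0149
print(f"P(User | +tive) = {p_user_given_pos:.3f}")  # ~0.332
```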
Example #3 - Risk in Calibration
Suppose we buy a Digital Voltmeter (DVM) and send it to a calibration lab. There is some
chance that the unit will be accepted, but still be bad (i.e., out of tolerance) due to the uncertainty
of the standards used in calibration. This is called false accept risk -- P(Accept | Bad) -- or
Consumer Risk. There is also a chance that the unit will be rejected and still be good, due to the
uncertainty. This is called false reject risk -- P(Reject | Good) -- or Producer Risk.
The Bayesian questions of importance are: what is the probability that an instrument is bad, given that it was accepted, P(Bad | Accept), and what is the probability that it was good, given that it was rejected, P(Good | Rejected)?
The impact of the uncertainty of the standards used in calibration is illustrated in the following
sequence of figures.


Under ideal conditions, both the Instrument Under Test (IUT) and the standard would be in perfect agreement, given their respective uncertainties, as in Figure 2A. In this case, the standard is 4 times more accurate than the IUT and the acceptance criteria are shown as dashed lines at 2 standard deviations.

Figure 2A -- At calibration (probability densities of the IUT and the standard versus standard deviations).

At a later time, the IUT drifts to the tolerance limit. The uncertainty of the IUT and the standard would be represented by Figure 2B.

Figure 2B -- IUT at the tolerance limit.

If the IUT drifts slightly out of tolerance, as shown in Figure 2C, where the peak of the IUT curve is more than 2 standard deviations from the peak of the standard curve, Consumer's Risk is illustrated by the hatched area, which indicates the probability that the IUT is still "called" in tolerance.

Figure 2C -- Note Consumer's Risk.


If the IUT drifts but remains slightly in tolerance, as shown in Figure 2D, where the peak of the IUT curve is less than 2 standard deviations from the peak of the standard curve, Producer's Risk is illustrated by the hatched area, which indicates the probability that the IUT is still "called" out of tolerance.

Figure 2D -- Note Producer's Risk.

Integrating over all out-of-tolerance conditions to calculate the complete Consumer's Risk is illustrated in Figure 2E. A similar calculation integrating all the components of Producer's Risk would give the total Producer's Risk.

Figure 2E -- Summation gives total Consumer's Risk.

The relationships among the various risk probabilities are illustrated in Table A-3 of the Appendix. In the case of Consumer Risk, Castrup [A2] calls the false accept risk an Unconditional False Accept Risk, UFAR (or Producer's Option), since this is what the producer would see at the end of his production line before he releases product to the market. Items that fail are sent back for adjustment and resubmitted to calibration before being released; the risk the consumer then sees is called the Conditional False Accept Risk, CFAR (or Consumer's Option), since the consumer only sees the risk after the item has been evaluated twice. CFAR = UFAR / P(Accept), which is represented by the ratio of two vertical bars in Table A-3. Castrup [A2] may also be reviewed for some of the mathematical details of the calculations.
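As a numerical illustration of the relationship CFAR = UFAR / P(Accept) (using the counts from Table A-3, not Castrup's full treatment), a short sketch follows; treating UFAR as the joint probability P(Bad and Accept) is the interpretation consistent with that formula.

```python
# Unconditional vs. conditional false accept risk, using the counts
# from Table A-3 in the appendix (a numerical illustration only).

bad_rejected, bad_accepted = 10, 20
good_rejected, good_accepted = 30, 40
total = bad_rejected + bad_accepted + good_rejected + good_accepted   # 100

ufar = bad_accepted / total                          # P(Bad and Accept) = 0.20
p_accept = (bad_accepted + good_accepted) / total    # P(Accept) = 0.60

# CFAR = UFAR / P(Accept), i.e. the Bayesian posterior P(Bad | Accept)
cfar = ufar / p_accept
print(f"UFAR = {ufar:.3f}, CFAR = P(Bad | Accept) = {cfar:.3f}")      # 0.200, 0.333
```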
All of the above examples resulted in a binomial distribution that had one probability for an
event and the complementary probability for a non-event. The following example moves into a
problem with three possibilities.
Example #4 is the famous three-door problem from the TV show Let's Make A Deal, hosted by Monty Hall. In addition to Wikipedia, there are many websites addressing the problem. Numerous PhDs have weighed in on both sides of the problem, and it has received national attention from columnist and math wizard Marilyn vos Savant. There are three doors, and there is a prize behind one of them. No matter which door the contestant picks, the host opens a second door (one that does not conceal the prize) and gives the contestant the chance to choose the other closed door. The big question is whether the contestant should stand firm or choose the other door. The answer is that the contestant will always increase her probability of winning if she changes from her first choice and selects the remaining door, assuming a uniform prior probability that the prize could be behind any one of the three doors.
To set up the mathematical side of the problem, let's assign colors to the doors. They are B
(Blue), R (Red) and Y (Yellow). P(WinB), P(WinR), P(WinY) are the probabilities that the prize
is behind the respective door. These three probabilities are the parameters of a multinomial
probability distribution. These probabilities are initially taken to be 1/3 each, unless the
contestant has different prior beliefs about which door conceals the prize. We will assume that
the contestant accepts the uniform prior distribution, however, since that seems like a reasonable
choice.
The door that Monty opens is the data that the contestant can use to update her prior probabilities
of winning. P(B) is the probability that Monty opens the Blue door, P(R) the Red door, etc. This
probability can be 1/2 or 1 depending upon which door the contestant has chosen and which door
conceals the prize. If the prize is behind the door selected by the contestant, Monty has two
choices; if not, he only has one door to choose from. In this example, the contestant arbitrarily
chooses the Red door. Analogous results would follow from the contestant's choice of any other door, so we will only show the computation of the posterior distribution for this choice.
The following calculations give the multinomial posterior probability distribution for which door conceals the prize if the host opens the Blue door. A similar set of computations applies to the Yellow door (Monty's only other choice in this case), with the same conclusion.

P(WinB | B) = P(B | WinB) * P(WinB) / P(B) = (0 * 1/3) / (1/2) = 0

P(WinR | B) = P(B | WinR) * P(WinR) / P(B) = ((1/2) * (1/3)) / (1/2) = 1/3

P(WinY | B) = P(B | WinY) * P(WinY) / P(B) = (1 * (1/3)) / (1/2) = 2/3
The reason that it is always in the contestant's best interest to choose the other available door
(under the assumption of a uniform prior distribution) is the hidden mathematical machinery
behind the game problem. If the contestant first chooses the winning door, the host is free to

2009 NCSL International Workshop and Symposium

open either of the other doors. However if the contestant chooses a non-winning door, the host is
constrained to open only the other non-winning door. The asymmetry of the host's choice in
opening a door is not immediately apparent to the observer, but it has a definite influence on the
conditional probabilities of winning.
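A short sketch of the same posterior calculation, for the case where the contestant picks the Red door and the host opens the Blue door, makes the host's constraint explicit.

```python
# Monty Hall posterior: the contestant has picked the Red door and the
# host then opens the Blue door.  The dictionary below encodes the host's
# constraint (he never opens the prize door or the contestant's door).
prior = {"Blue": 1/3, "Red": 1/3, "Yellow": 1/3}

p_open_blue_given_prize = {
    "Blue": 0.0,     # host never opens the door hiding the prize
    "Red": 0.5,      # contestant's door wins: host may open Blue or Yellow
    "Yellow": 1.0,   # Blue is the only door the host is allowed to open
}

# Normalizing factor P(B): total probability that the host opens Blue
p_open_blue = sum(p_open_blue_given_prize[d] * prior[d] for d in prior)  # 1/2

posterior = {d: p_open_blue_given_prize[d] * prior[d] / p_open_blue for d in prior}
print(posterior)   # Blue: 0.0, Red: 1/3, Yellow: 2/3 -- switching doubles the chance of winning
```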
Example #5 -- Full-blown Bayesian Example
An engineer is developing an automated tire pressure monitor to alert the driver if the pressure in
the spare tire is too low or too high. He knows that the spare tire should be inflated to 240 kP (≈ 35 lb/in²) and plans to implement the system so that whenever there is a significant probability that the true pressure in the tire could be as low as 235 kP or as high as 245 kP, it will alert the driver.
To minimize the effect of random measurement error, he plans to design the system so that it will
base its assessment of the true tire pressure on 3 independent readings each time the car is
started. Before settling on the use of 3 measurements for the final system, however, he decides to
assess the uncertainty in the pressures that might be reported under different conditions using a
prototype system to ensure that the system will work as intended.
Because small variations in tire pressure are not practically significant, the engineer determines
that it will be sufficient to determine the tire pressure rounded to the nearest 1 kP when doing
these tests. Similarly, the pressure gauge the engineer is using also reads to the nearest 1 kP.
Note, to reduce its complexity, this example is being "incrementalized" with a small number of
discrete values, so that the calculation can be demonstrated in the spreadsheet in the Appendix
Table A4. As such, the distributions shown will be jagged rather than smooth. If you look at
Columns [1] and [2] in Table A4 and its plot above the columns, the curve of errors in the
pressure gauge, which in real life would probably be a Normal distribution function, only has
seven points. But with this approach the reader should be able to work through the calculation
without worrying about integrals and more complicated mathematics.
Due to its design and calibration, the pressure gauge is capable of determining the tire pressure accurately within a range of ±3 kP regardless of the actual tire pressure, temperature, tire condition, etc. Based on his characterization of the pressure gauge in the lab, the engineer also knows that measurement errors in the tire pressure readings follow the probability distribution given in Columns [1] and [2] in the Appendix Table A4, and plotted above the columns. Note that the prior distribution does not need to extend below 235 kP or above 245 kP because the system will also alarm if any readings below 234 kP or above 246 kP are observed.
Because the pressure measurements will be drawn from a discrete distribution and the sample
size will be relatively small, the engineer decides to assess the uncertainty using Bayesian
methods. As with any type of uncertainty analysis, a statistical model that describes the
relationship between the measurements and the measurand will be required. In addition, in order
to carry out a Bayesian analysis, the engineer also must specify a prior distribution for the tire
pressure as part of the statistical model. The prior distribution will describe the system's knowledge of the tire pressure before any data is collected.


Since the engineer knows that this system will ultimately be used under a wide range of
conditions that could change suddenly, he chooses a prior distribution that spreads the
probability evenly over all of the tire pressures in the range of interest, as shown in Columns [3]
and [4] of Table A4 with the graphic above the columns. A relatively non-informative prior
distribution like this specifies little specific knowledge about the tire pressure prior to making
pressure measurements.
Based on his understanding of the behavior of the pressure gauge, the statistical model that the
engineer adopts for each pressure measurement is
PO = pT + E
where PO is a reading from the pressure gauge, pT is the true, unknown tire pressure at the time
that measurements are being made, and E is a random measurement error drawn from the
probability distribution given in Columns [1] and [2]. In addition to identifying different
quantities, the notation used here also indicates whether a quantity is a random variable (UPPER
CASE) or is a particular, realized value (lower case) and whether or not the quantity is directly
observable (regular face or italic, respectively). In this case, for example, the measurement errors
are random variables that can never be directly observed, the pressure of the tire is a fixed,
unknown value, and the readings from the pressure gauge are random values that will be
observed when the gauge is used.
The results of the uncertainty analysis will be based on the posterior probability distribution of the random variable PT. This distribution is denoted P(PT = pT | PO = pO), where the vertical bar in the probability function indicates a probability conditional on the quantities listed after the vertical bar. The posterior distribution is constructed from the prior probability distribution on PT and the likelihood of the data, P(PO = pO | PT = pT), using Bayes' Theorem:

P(PT = pT | PO = pO) = P(PO = pO | PT = pT) P(PT = pT) / Σ [P(PO = pO | PT = pT) P(PT = pT)]

where the sum in the denominator runs over all candidate values of pT.

For one of the engineer's tests, the observed tire pressures were 242 kP, 241 kP, and 239 kP.
The likelihood of the tire pressure values based on each of the three measurements is shown in
Columns [5abc] and [6abc]. These are computed by determining the values of e that must be
associated with each observed pressure given the value of the true pressure. Then, based on that
value, the associated likelihood is looked up in Column [2]. Based on the performance of the
pressure gauge, it is known that the measurement error has zero probability of being less than -3
kP or more than 3 kP.
Because the engineer is willing to assume that the replicate pressure measurements used to
determine the tire pressure are independent of one another (conditional on the value of pT ), the
joint likelihood for the set of three measurements will be the product of the individual
likelihoods which is shown in Column [7].

The application of Bayes' Theorem is illustrated in Columns [8] and [9]. Column [8] gives the product of the likelihoods of the parameter values and their prior probabilities, which are used in the computation. The posterior probability for each value of tire pressure is computed by normalizing Column [8] and is shown in Column [9]. The portion of the posterior distribution with non-zero probability is also plotted above Column [9].
After obtaining the posterior distribution for the pressure, the engineer computes an approximate
90% highest posterior density uncertainty interval for the tire pressure. This can be done by
sorting the possible tire pressure values from highest to lowest probability and cumulatively
summing the values until a total probability as close as possible to 90% is obtained. In this case
the highest density points that define the expanded uncertainty interval are those between 240 kP
and 242 kP, and have a total probability of 87.1%. These points are enclosed in a box in Table A4. Note: an exact 90% uncertainty interval cannot be obtained because of the discreteness of the posterior distribution. If the distribution were continuous, a highest posterior density region with exactly 90% probability could be found.
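To make the spreadsheet mechanics concrete, here is a minimal Python sketch of the same grid calculation. The seven-point error distribution below is only a placeholder with a rough bell shape; the actual Column [2] probabilities from Table A4 would need to be substituted to reproduce the paper's 87.1% interval. The uniform prior, the three observed readings, and the "sort and accumulate" rule for the approximate 90% region follow the text.

```python
import numpy as np

# Discrete Bayesian calculation for the tire-pressure example.
# The seven-point measurement-error distribution below is a placeholder
# (a rough bell shape over -3..+3 kP); substitute the paper's actual
# Column [2] values to reproduce Table A4 exactly.
error_vals = np.arange(-3, 4)                               # -3 .. +3 kP
error_prob = np.array([0.04, 0.12, 0.20, 0.28, 0.20, 0.12, 0.04])
error_pmf = dict(zip(error_vals, error_prob))

p_true = np.arange(235, 246)                                # candidate true pressures, kP
prior = np.ones(len(p_true)) / len(p_true)                  # uniform prior, Columns [3]-[4]

observations = [242, 241, 239]                              # the engineer's three readings

# Joint likelihood: product over readings of P(error = observed - true)
likelihood = np.ones_like(prior)
for obs in observations:
    likelihood *= np.array([error_pmf.get(obs - pt, 0.0) for pt in p_true])

# Bayes' Theorem: normalize likelihood * prior (Columns [8]-[9])
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()

# Approximate 90% highest-posterior-density set: take values from most to
# least probable until the cumulative probability first reaches 90%
# (a simple variant of the paper's "as close as possible to 90%" rule).
order = np.argsort(posterior)[::-1]
cum, hpd = 0.0, []
for idx in order:
    if cum >= 0.90:
        break
    hpd.append(int(p_true[idx]))
    cum += posterior[idx]

print("posterior:", dict(zip(p_true, np.round(posterior, 3))))
print("approx 90% HPD set:", sorted(hpd), "total probability:", round(cum, 3))
```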
Based on a series of tests throughout the pressure range of interest, the engineer determined that the typical length of the approximate 90% uncertainty intervals obtained using 3 measurements was less than 3 kP, which he deemed small enough to allow sufficient ability to differentiate between different tire pressures and large enough so that uncertainty due to the resolution of the measurements was not significant.
In a short-hand, graphical manner this example can be summarized in the following form: we have a distribution of errors from the pressure gauge which is somewhat bell-shaped (the likelihood function). It is the probability of obtaining a specific observed reading, given a true pressure, P(Observed | True). We initially assume that the true tire pressure distribution is a uniform distribution from 235 to 245 kP, around the manufacturer's recommended pressure of 240 kP (this is the a priori distribution, P(True Pressure)). From these values we calculate the probability of any true tire pressure given the observed tire pressures, P(True | Observed).
(Three plots illustrate this summary: the bell-shaped likelihood curve for the gauge errors, the uniform prior distribution for the true pressure, and the resulting posterior distribution.)

P(True | Observed) = P(Observed | True) * P(True) / P(Observed)

4. Discussion

In the more general picture, Bayes' Theorem can be written as

P(H | E) = P(E | H) * P(H) / P(E)

where P(H) is the prior probability that a certain hypothesis is true. The likelihood function,
P(E | H), is the probability that particular experimental evidence is found, given the hypothesis.
The posterior probability is the probability that the hypothesis is true, given the evidence that
was found. This has epistemological implications since the conclusions drawn from the Bayesian
process are the result of the subjective judgment of the observer in determining the likelihood
function and the prior probability.

5. Summary

We have given a short review of basic probability concepts, discussed Bayes' Theorem, and presented five Bayesian examples ranging from the simple to the relatively complex. We have discussed the terms "frequentist" and "Bayesian", showing that the two approaches are neither mutually exclusive nor equally applicable to all types of problems. We have also distinguished between simple, purely probabilistic applications of Bayes' Theorem and the more complex applications of Bayesian statistical analysis. Based on this discussion, it is hoped that the audience will be in a better position to appreciate other presentations on Bayesian concepts.

6. Acknowledgments

The author would like to acknowledge Will Guthrie of the NIST Statistical Engineering Division
for lively discussions on Bayesian uncertainty analysis, sharing an initial version of the tire
pressure gauge example, and editorial comments on the manuscript.

7. References

[A1] http://en.wikipedia.org/wiki/Bayesian_inference, February 2009.

[A2] Castrup, H.: Risk-Based Control Limits, Proc. Measurement Science Conf., Anaheim, CA, January 2001.


Appendix
Table A1 -- Example 1 -- Playing cards -- numbers of cards in each category and their
probabilities
Input numbers in each category:

             Black    Red    Total
Non-Face       18      18      36
Face            8       8      16
Total          26      26      52

Marginal probabilities:

P(Face) = 0.308        P(Non-Face) = 0.692
P(Red)  = 0.500        P(Black)    = 0.500

Joint probabilities P(Color, Type):

P(R, F)   = 0.154
P(NR, F)  = 0.154
P(R, NF)  = 0.346
P(NR, NF) = 0.346
Total     = 1.000

Union and intersection of Red and Face:

P(Red or Face)  = 0.654
P(Red and Face) = 0.154

Conditional probabilities:

P(Face | Red)       = 0.308      P(Red | Face)       = 0.500
P(Face | Black)     = 0.308      P(Red | Non-Face)   = 0.500
P(Non-Face | Red)   = 0.692      P(Black | Face)     = 0.500
P(Non-Face | Black) = 0.692      P(Black | Non-Face) = 0.500

(The accompanying Venn-diagram graphic, with Red cards right-hatched, Face cards left-hatched, and their intersection cross-hatched, is not reproduced here.)

Table A-2 -- Example 2 -- Drug Testing

Input numbers in each category (per 1000 employees):

            NonUser     User     Total
- tive       985.05      0.05    985.1
+ tive         9.95      4.95     14.9
Total        995         5       1000

Marginal probabilities:

P(User)  = 0.005       P(NonUser) = 0.995
P(+tive) = 0.015       P(-tive)   = 0.985

Joint probabilities:

P(-tive, NonUser) = 0.98505
P(-tive, User)    = 0.00005
P(+tive, NonUser) = 0.00995
P(+tive, User)    = 0.00495

Conditional probabilities:

P(+tive | User)    = 0.990      P(User | +tive)    = 0.332
P(+tive | NonUser) = 0.010      P(User | -tive)    = 0.000
P(-tive | User)    = 0.010      P(NonUser | +tive) = 0.668
P(-tive | NonUser) = 0.990      P(NonUser | -tive) = 1.000

Table A-3 -- Example 3 -- Calibration Risks

Input numbers in each category (per 100 items):

            Rejected    Accepted    Total
Bad            10          20         30
Good           30          40         70
Total          40          60        100

Marginal probabilities:

P(Good) = 0.700        P(Accepted) = 0.600
P(Bad)  = 0.300        P(Rejected) = 0.400

Joint probabilities:

P(Bad, Rejected) = 0.100       P(Good, Rejected) = 0.300
P(Bad, Accepted) = 0.200       P(Good, Accepted) = 0.400

Union and intersection of Good and Accepted:

P(Good or Accepted)  = 0.900
P(Good and Accepted) = 0.400

Conditional probabilities:

P(Good | Accepted) = 0.667     P(Accepted | Good) = 0.571
P(Good | Rejected) = 0.750     P(Accepted | Bad)  = 0.667
P(Bad | Accepted)  = 0.333     P(Rejected | Good) = 0.429
P(Bad | Rejected)  = 0.250     P(Rejected | Bad)  = 0.333

In this simplified table, the columns are headed [1] to [9]. Below the column headings are the formulae which compose each column, when applicable; e.g., Column [5a] is calculated as 242 - Column [3]. Graphs of some of the more significant columns are displayed.

Table A-4 -- Illustration of the Bayesian calculation for Example 5
