
Decision Theory

Instructor: Kathryn Blackmond Laskey

Room 2214 ENGR

(703) 993-1644

Office Hours: Wednesday and Thursday 4:30-5:30 PM,

or by appointment

Spring 2015

Bayesian Inference and Decision Theory

Kathryn Blackmond Laskey

Spring 2015

Unit 1 (v5) - 1 -

You will learn a way of thinking about problems of inference under uncertainty
You will learn to construct mathematical models for inference and decision problems
You will learn how to apply these models to draw inferences from data and to make decisions
These methods are based on Bayesian Decision Theory, a formal theory for rational inference and decision making

Unit 1 (v5) - 2 -

Logistics

Web site

http://seor.gmu.edu/~klaskey/SYST664

Blackboard site: http://mymason.gmu.edu

Other recommended texts on course web site

Some assignments can be done in Excel

We will use R, a free open-source statistical computing environment: http://www.r-project.org/. R code for many textbook examples is on the author's web site

We will use JAGS, an open-source package for Markov Chain Monte Carlo simulation:

http://mcmc-jags.sourceforge.net/

Requirements

Regular assignments (20%): can be handed in on paper or through Blackboard

Take-home midterm (30%) and final (30%)

Project (20%): apply methods to a problem of your choosing

Office hours

Official office hours are 4:30-5:30PM Wednesdays and Thursdays

I respond to questions by email and am available by appointment

Academic integrity policy

Read the policies and resources section of the syllabus

Unit 1 (v5) - 3 -

Course Outline

Unit 1: A Brief Tour of Bayesian Inference and Decision Theory
Unit 2: Random Variables, Parametric Models, and Inference from Observation
Unit 3: Statistical Models with a Single Parameter
Unit 4: Monte Carlo Approximation
Unit 5: The Normal Model
Unit 6: Gibbs Sampling
Unit 7: The Multivariate Normal Model
Unit 8: Hierarchical Bayesian Models
Unit 9: Bayesian Linear Regression and Analysis of Variance
Unit 10: Metropolis-Hastings Sampling

Unit 1 (v5) - 4 -

Unit 1 Objectives
Describe the elements of a decision model
Refresh knowledge of probability
Apply Bayes' rule for simple inference and decision problems and interpret the results
Use a graph to express conditional independence among uncertain variables
Explain why Bayesians believe inference cannot be separated from decision making
Compare Bayesian and frequentist philosophies of statistical inference
Compute and interpret the expected value of information (VOI) for a decision problem with an option to collect information

Unit 1 (v5) - 5 -

Bayesian Inference

Bayesians use probability to quantify rational degrees of belief

Bayesians view inference as belief dynamics

Begin with prior beliefs

Use evidence to update prior beliefs to posterior beliefs

Posterior beliefs become prior beliefs for the next evidence
Inference problems are usually embedded in decision problems

Unit 1 (v5) - 6 -

Decision Theory

Decision theory is a formal theory of decision making under uncertainty
A decision problem consists of:
Possible actions: a ∈ A
States of the world (usually uncertain): s ∈ S
Possible consequences: c ∈ C (the consequence depends on the state and the action)
Answer (according to decision theory):
Measure goodness of consequences with a utility function u(c)
Measure likelihood of states with a probability distribution p(s)
The best action with respect to the model maximizes expected utility: a* = argmax_a E[u(c) | a]
For brevity, we may write E[u(a)] for E[u(c) | a]
Caveat emptor: how good the answer is for you depends on the fidelity of the model to your beliefs and preferences

Unit 1 (v5) - 7 -

Illustrative Example
(Highly Oversimplified)

We suspect the patient may have the disease but do not know
Without treatment the disease will lead to a long illness
Treatment has unpleasant side effects
Decision model:
Actions: aT (treat) and aN (don't treat)
States of the world: sD (disease now) and sW (well now)
Consequences: cWN (well shortly, no side effects), cWS (well shortly, side effects), cDN (disease for a long time, no side effects)
Probabilities and utilities:
P(sD) = 0.3, P(sW) = 0.7
u(cWN) = 100, u(cWS) = 90, u(cDN) = 0
Expected utility:
Treat: EU(aT) = 0.3·90 + 0.7·90 = 90
Don't treat: EU(aN) = 0.3·0 + 0.7·100 = 70
Best action is to treat

Unit 1 (v5) - 8 -
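The expected-utility arithmetic above is easy to check in code. A minimal sketch (written in Python for illustration — the course itself uses R; the dictionary names are mine):

```python
# Expected utility for the treat/don't-treat example.
# States: sD (disease, p = 0.3), sW (well, p = 0.7).
p_state = {"sD": 0.3, "sW": 0.7}

# u[action][state]: treating yields 90 in either state (side effects);
# not treating yields 0 if diseased, 100 if well.
u = {"aT": {"sD": 90, "sW": 90},
     "aN": {"sD": 0, "sW": 100}}

def expected_utility(action):
    return sum(p_state[s] * u[action][s] for s in p_state)

# Treat (EU 90) beats don't-treat (EU 70), matching the slide.
best = max(u, key=expected_utility)
print(round(expected_utility("aT"), 6), round(expected_utility("aN"), 6), best)
```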

Sensitivity Analysis:
Optimal Decision as Function of Sickness Probability

Expected utility of the two actions:
E[U|aT] = 90
E[U|aN] = 0·p + 100·(1 − p) = 100(1 − p)
The optimal decision depends on p = P(sD), the probability that the patient has the disease
The chart shows how the optimal decision changes as we vary p: the two lines cross at p = 0.1, and the optimal decision is treatment at p = 0.3

Unit 1 (v5) - 9 -
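The crossover is where 100(1 − p) = 90, i.e. p = 0.1. A short sketch of the sweep (Python; function name is illustrative):

```python
# Optimal action as a function of the disease probability p.
# EU(treat) = 90 regardless of p; EU(no-treat) = 100*(1-p).
def optimal_action(p):
    eu_treat = 90.0
    eu_no_treat = 100.0 * (1.0 - p)
    return "treat" if eu_treat >= eu_no_treat else "no-treat"

# The decision flips at p = 0.1: below it, skipping treatment is better.
assert optimal_action(0.05) == "no-treat"
assert optimal_action(0.3) == "treat"
```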

Why be a Bayesian?

Arguments from theory

A coherent decision maker uses probability to represent uncertainty, uses utility

to represent value, and maximizes expected utility

If you are not coherent then someone can make "Dutch book" on you (turn you

into a "money pump")

Pragmatic arguments

Decision theory provides a useful and principled methodology for modeling

problems of inference, decision and learning from experience

Engineering tradeoffs between accuracy, complexity and cost can be analyzed

and evaluated

Both empirical data and informed engineering judgment can be explicitly

represented and incorporated into a model

Bayesian methods can handle small, moderate and large sample sizes; small,

moderate and large numbers of parameters

With other approaches it is often more difficult to understand why you got the

results you did and how to improve your model

Successful applications

Success attributed to decision theoretic technology

Caution:

Uncritical application of cookbook methods can lead to disaster!

Good modelers iteratively assess, check and revise assumptions

Unit 1 (v5) - 10 -

A probability distribution is a function P(·) applied to sets such that:
P(A) ≥ 0 for all subsets A of the universal set Ω
P(Ω) = 1
If Ai ∩ Aj = ∅ then P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …
Other properties follow from the axioms of probability theory, e.g.:
P(∅) = 0
P(A) ≤ 1 for all subsets A of Ω
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) for any A and B
If Ai ∩ Aj = ∅ and A = A1 ∪ A2 ∪ …, then P(A ∩ B) = Σi P(B|Ai)P(Ai) (law of total probability)
The conditional probability of A given B is defined as a number P(A|B) satisfying P(A|B)P(B) = P(A ∩ B)
If P(B) ≠ 0 this is equivalent to the traditional formula: P(A|B) = P(A ∩ B) / P(B)
A is independent of B if P(A|B) = P(A)

Unit 1 (v5) - 11 -

Gathering Information

We may be able to perform a test before deciding whether to treat the patient
The test has two outcomes: tP (positive) and tN (negative)
The quality of the test is characterized by two numbers:
Sensitivity: probability that the test is positive if the patient has the disease
Specificity: probability that the test is negative if the patient does not have the disease
We will assume:
Sensitivity: P(tP | sD) = 0.95
Specificity: P(tN | sW) = 0.85
Take the test and observe outcome t
Revise prior beliefs P(sD) to obtain posterior beliefs P(sD | t)
Re-compute the optimal decision using P(sD | t)

Unit 1 (v5) - 12 -

Bayes Rule

Objective: use evidence E to update beliefs about a hypothesis H
H: patient has (does not have) the disease
E: evidence from the test

P(H | E) = P(H ∩ E) / P(E) = P(E | H)P(H) / P(E) = P(E | H)P(H) / Σi P(E | Hi)P(Hi)

Odds form (for P(E) > 0, P(H2) > 0):
P(H1 | E) / P(H2 | E) = [P(E | H1) / P(E | H2)] × [P(H1) / P(H2)]

Terminology: P(H) is the prior probability, P(H | E) the posterior probability, P(E | H) the likelihood, and P(E) the probability of the evidence; P(E | H1) / P(E | H2) is the likelihood ratio and P(H1) / P(H2) the prior odds
A likelihood ratio greater than 1 means the evidence is more likely given H1 than given H2

Unit 1 (v5) - 13 -

Review of Problem Ingredients:
P(sD) = 0.3 (prior probability of disease)
P(tP | sD) = 0.95; P(tN | sW) = 0.85 (sensitivity & specificity of test)
u(cWN) = 100, u(cWS) = 90, u(cDN) = 0 (utilities)

If the test is negative:
P(sD | tN) = (0.3 × 0.05) / (0.3 × 0.05 + 0.7 × 0.85) ≈ 0.025
EU(aN | tN) = 0.025·0 + 0.975·100 = 97.5
EU(aT | tN) = 0.025·90 + 0.975·90 = 90
Best action is not to treat

If the test is positive:
P(sD | tP) = (0.3 × 0.95) / (0.3 × 0.95 + 0.7 × 0.15) = 0.731
EU(aN | tP) = 0.731·0 + 0.269·100 = 26.9
EU(aT | tP) = 0.731·90 + 0.269·90 = 90
Best action is to treat

Unit 1 (v5) - 14 -
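These posteriors follow from Bayes' rule with the stated sensitivity and specificity, and can be reproduced directly (a Python sketch; the function and variable names are mine):

```python
# Posterior probability of disease given a test result, via Bayes' rule.
prior = 0.3          # P(sD)
sensitivity = 0.95   # P(tP | sD)
specificity = 0.85   # P(tN | sW)

def posterior_disease(test_positive):
    # Likelihood of the observed result under each state of the world.
    like_d = sensitivity if test_positive else 1 - sensitivity
    like_w = (1 - specificity) if test_positive else specificity
    num = like_d * prior
    return num / (num + like_w * (1 - prior))

p_pos = posterior_disease(True)    # ≈ 0.731
p_neg = posterior_disease(False)   # ≈ 0.025
# Expected utility of not treating after a negative test (treating always yields 90):
eu_no_treat_neg = p_neg * 0 + (1 - p_neg) * 100   # ≈ 97.5
print(round(p_pos, 3), round(p_neg, 3), round(eu_no_treat_neg, 1))
```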

Value of Information

Reminder of problem ingredients:
P(sD) = 0.3 (prior probability of disease)
P(tP | sD) = 0.95; P(tN | sW) = 0.85 (sensitivity & specificity of test)
u(cWN) = 100, u(cWS) = 90, u(cDN) = 0 (utilities)

P(tP) = P(tP | sD) P(sD) + P(tP | sW) P(sW) = 0.95·0.3 + 0.15·0.7 = 0.39
If the test is positive we should treat, with EU(aT | tP) = 90
If the test is negative we should not treat, with EU(aN | tN) = 97.5
Expected utility of the FollowTest strategy (treat if the test is positive, otherwise don't):
EU(aF) = P(tP) EU(aT | tP) + P(tN) EU(aN | tN) = 0.39·90 + (1 − 0.39)·97.5 = 94.575
Gain from the test is 94.575 − 90 = 4.575
(The small discrepancy with the exact value 4.6 comes from rounding P(sD | tN) to 0.025.)
This is called the Expected Value of Sample Information (EVSI)

Unit 1 (v5) - 15 -

Expected Value of Perfect Information (EVPI) is the increase in expected utility from perfect knowledge of an uncertain variable
Suppose an oracle will tell us whether the patient is sick
An oracle has Sensitivity = Specificity = 1
30% chance we discover she is sick and treat - utility 90
70% chance we discover she is well and don't treat - utility 100
Expected utility if we ask the oracle: 0.3 × 90 + 0.7 × 100 = 97
EVPI = 97 − 90 = 7
In general, EVPI ≥ EVSI ≥ 0
EVPI = EVSI = 0 if the test will not change your decision

Unit 1 (v5) - 16 -

General Principle: Free information can never hurt (in expectation)
To analyze the decision of whether to collect information D on outcome variable V:
Find the maximum expected utility option if we don't collect information; compute its expected utility U0
Find EVPI:
For each possible value V=v, assume it is known to be the true outcome, find the optimal decision, calculate its expected utility, and multiply by the probability that V=v
Add these values and subtract U0 to get EVPI
If EVPI is too small in relation to cost then stop; otherwise, compute EVSI:
For each possible result D=d of the experiment, find the maximum expected utility action a(d) and its expected utility u(a(d))
For each outcome V=v and result D=d of the experiment, find the joint probability P(v,d)
Calculate the expected utility with sample information: USI = Σv,d P(v,d) u(a(d))
Compute the difference USI − U0 to get EVSI
Collect the information if EVSI is large enough in relation to the cost of the information
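This procedure can be sketched generically; the function and variable names below are my own. Applied to the running example it reproduces EVPI = 7 and the exact EVSI = 4.6 (the 4.575 quoted earlier comes from rounding P(sD | tN) to 0.025):

```python
# Generic EVPI/EVSI for a finite decision problem.
# p_v: prior over outcomes; u[a][v]: utilities; p_d_given_v: observation model.
p_v = {"sD": 0.3, "sW": 0.7}
u = {"aT": {"sD": 90, "sW": 90}, "aN": {"sD": 0, "sW": 100}}
p_d_given_v = {"sD": {"tP": 0.95, "tN": 0.05},
               "sW": {"tP": 0.15, "tN": 0.85}}

def eu(action, p):
    return sum(p[v] * u[action][v] for v in p)

u0 = max(eu(a, p_v) for a in u)  # best expected utility with no information

# EVPI: for each outcome v, act optimally knowing v, weighted by P(v).
u_perfect = sum(p_v[v] * max(u[a][v] for a in u) for v in p_v)
evpi = u_perfect - u0

# EVSI: for each result d, form the posterior, act optimally, weighted by P(d).
u_sample = 0.0
for d in ["tP", "tN"]:
    p_d = sum(p_v[v] * p_d_given_v[v][d] for v in p_v)
    post = {v: p_v[v] * p_d_given_v[v][d] / p_d for v in p_v}
    u_sample += p_d * max(eu(a, post) for a in u)
evsi = u_sample - u0

print(round(evpi, 3), round(evsi, 3))
```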

Unit 1 (v5) - 17 -

Expected Utility of the FollowTest Strategy as a Function of p = P(sD)
The FollowTest strategy treats if the test is positive and otherwise not

World State    | Probability | Action  | Utility
Sick, Positive | 0.95·p      | Treat   | 90
Sick, Negative | 0.05·p      | NoTreat | 0
Well, Positive | 0.15·(1−p)  | Treat   | 90
Well, Negative | 0.85·(1−p)  | NoTreat | 100

EU(aF) = 0.95p·90 + 0.05p·0 + 0.15(1−p)·90 + 0.85(1−p)·100 = 98.5 − 13p
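The closed form 98.5 − 13p can be checked by enumerating the four rows of the table (a Python sketch):

```python
# EU of the FollowTest policy as a function of p = P(sD):
# sum over (state, test result) of probability × utility of the action taken.
def eu_follow_test(p):
    rows = [
        (0.95 * p,        90),   # sick, positive  -> treat
        (0.05 * p,         0),   # sick, negative  -> no treat
        (0.15 * (1 - p),  90),   # well, positive  -> treat
        (0.85 * (1 - p), 100),   # well, negative  -> no treat
    ]
    return sum(prob * util for prob, util in rows)

# Matches the closed form 98.5 - 13p across the whole range of p.
for p in [0.0, 0.1, 0.3, 0.654, 1.0]:
    assert abs(eu_follow_test(p) - (98.5 - 13 * p)) < 1e-9
```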

Unit 1 (v5) - 18 -

Expected Utility of FollowTest Policy
The FollowTest strategy treats if the test is positive and otherwise not
Before doing the test, we think:

World State    | Probability   | Action  | Utility
Sick, Positive | P(sD|tP)P(tP) | Treat   | 90
Sick, Negative | P(sD|tN)P(tN) | NoTreat | 0
Well, Positive | P(sW|tP)P(tP) | Treat   | 90
Well, Negative | P(sW|tN)P(tN) | NoTreat | 100

We do not yet know the test result: it will be positive with probability P(tP) and negative with probability P(tN).
If it is positive our expected utility will be EU[aT | tP]. If it is negative our expected utility will be EU[aN | tN].
So our expected utility of following the test is P(tP) EU[aT|tP] + P(tN) EU[aN|tN].

Taking the expectation over the table gives the same answer:
EU(aF) = P(sD|tP)P(tP)·90 + P(sD|tN)P(tN)·0 + P(sW|tP)P(tP)·90 + P(sW|tN)P(tN)·100
= P(tP) [P(sD|tP)·90 + P(sW|tP)·90] + P(tN) [P(sD|tN)·0 + P(sW|tN)·100]
= P(tP) EU[aT | tP] + P(tN) EU[aN | tN]
(Compare with Slide 15)

Unit 1 (v5) - 19 -

Strategy Regions

FollowTest: EU(aF) = 98.5 − 13p
AlwaysTreat: EU(aT) = 90
NoTreat: EU(aN) = 100(1 − p)
EVSI > 0, and FollowTest is the optimal strategy, when 0.017 < p < 0.654

Unit 1 (v5) - 20 -

[Figure: Expected Utility of Optimal Strategy with Costly Test — expected utility of the best of NoTreat (EU = 100(1 − p)), AlwaysTreat (EU = 90), and FollowTest, plotted against p for test costs c = 0, 1, 4, and 7.2]

Unit 1 (v5) - 21 -

For a test with cost c:
E[U|FollowTest] = 98.5 − 13p − c
NoTreat is best when p < (1.5+c)/87
FollowTest is best when (1.5+c)/87 < p < (8.5−c)/13
If 0.018 < p < 0.029, test if c = 0.1 but do nothing if c = 1
If 0.577 < p < 0.646, test if c = 0.1 but treat if c = 1
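The region boundaries can be computed directly from the three expected-utility lines; the FollowTest region closes when (1.5 + c)/87 = (8.5 − c)/13, i.e. at c = 7.2 (a Python sketch; the helper name is mine):

```python
# Strategy regions for a test with cost c.
# EU lines: NoTreat = 100(1-p), AlwaysTreat = 90, FollowTest = 98.5 - 13p - c.
def follow_test_region(c):
    lo = (1.5 + c) / 87.0    # FollowTest beats NoTreat above this p
    hi = (8.5 - c) / 13.0    # FollowTest beats AlwaysTreat below this p
    return (lo, hi) if lo < hi else None  # None: testing is never optimal

print(follow_test_region(0))   # ≈ (0.0172, 0.6538), the free-test region
print(follow_test_region(8))   # past c = 7.2 the region has collapsed
```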

[Figure: Expected Utility of Optimal Strategy with Costly Test (c = 0, 1, 4, 7.2), with the range of optimality of the test at c = 1 marked]
[Figure: EVSI as a Function of Prior Probability]

Unit 1 (v5) - 22 -

Value of Information: Summary

Collecting information may have value if it might change your decision
The expected value of perfect information (EVPI) is the utility gain from knowing the true value of an uncertain variable
The expected value of sample information (EVSI) is the utility gain from the available information
In our example:
If 0.017 ≤ p ≤ 0.1, EVSI is 87p − 1.5
If 0.1 ≤ p ≤ 0.654, EVSI is 8.5 − 13p (testing is optimal when the test is free)
If p = 0.3, EVSI is 8.5 − 13·0.3 = 4.6
Collect the information if EVSI exceeds its cost: test if 8.5 − 13p > c; at p = 0.3, test if c < 4.6

Unit 1 (v5) - 23 -

Where Do the Numbers Come From?

Our disease model depends on several parameters:
Prior probability of disease
Sensitivity of the test
Specificity of the test
These can come from empirical studies or from expert judgment:
"Randomized clinical trials have established that Test T has sensitivity 0.95 and specificity 0.85 for Disease D."
"Given the presenting symptoms and my clinical judgment, I estimate a 60% probability that the patient has Disease D."
A Bayesian analysis can combine the two:
Use clinical judgment to quantify uncertainty about a parameter as a probability distribution
Collect data and use Bayes' rule to obtain the posterior distribution for the parameters given the data
If appropriate, use clinical judgment to adjust the results of studies to apply to a particular patient

Unit 1 (v5) - 24 -

Observations Drawn from a Parametric Model

Many statistical models assume observations are drawn at random from a common probability distribution
Data X1, …, Xn are drawn at random from a distribution with probability mass function f(x|θ) if:
P(Xi = x | θ) = f(x|θ) for all i
Xi is independent of Xj given θ for i ≠ j
(Notation convention: uppercase letters for random variables, lowercase letters for particular values the unknowns can take on)
Example:
Patients are sampled from a population with a proportion θ who have disease D
Xi = 1 (disease) or Xi = 0 (no disease)
Pr(Xi = 1 | θ) = θ (this is called a Bernoulli distribution)
Data X1, …, Xn are independent and identically distributed (iid) given θ = P(Sick)
(iid means that if θ is known, learning the condition of some patients does not affect our beliefs about the conditions of the remaining patients)
If the value θ is unknown we can express our uncertainty about it by defining a probability distribution for its value:
Prior: Pr(Θ = θ) = g(θ); posterior: g(θ | X1, …, Xn)

Unit 1 (v5) - 25 -

We can use a graph to represent conditional independence
An arc from θ to Xi means the distribution of Xi depends on θ
No arc from Xi to Xj means that Xi and Xj are independent given θ
Because all the Xi are inside the same plate, they all follow the same probability distribution

[Plate diagram: node θ with arcs to X1, X2, X3, which sit inside a plate]

Θ ~ g(θ)
Xi | θ ~ Bernoulli(θ), i = 1, …, 3
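The model specification can be simulated forward as a sanity check — first draw θ from its prior, then draw each Xi given θ (a Python sketch; the 20-point grid prior anticipates the discretization used on the next slide):

```python
import random

# Forward-simulate the plate model: theta ~ g, then X_i | theta ~ Bernoulli(theta).
random.seed(1)
grid = [0.025 + 0.05 * k for k in range(20)]   # discretized support for theta
g = [1.0 / 20] * 20                            # uniform prior g(theta)

theta = random.choices(grid, weights=g)[0]                    # Theta ~ g
xs = [1 if random.random() < theta else 0 for _ in range(3)]  # Xi | theta ~ Bernoulli
print(theta, xs)
```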

Unit 1 (v5) - 26 -

Inference About a Proportion (with a very small sample)

We assume that θ can have one of 20 equally likely values: 0.025, 0.075, …, 0.975
(We could represent prior knowledge by assigning some values of θ a greater probability than others.)
(In reality θ can take any value in a continuous range of values. We will treat continuous parameters later. For now we approximate with a finite set of values.)
Data: case 2 has the disease; cases 1, 3, 4 and 5 do not
Likelihood function (probability of the observations as a function of θ): f(x | θ) = θ(1 − θ)^4
The likelihood function depends only on how many cases have the disease: the number of cases having the disease and the total number of cases are sufficient for inference about θ
Posterior distribution for θ:

g(θ | x) = g(θ) f(x | θ) / Σθ' g(θ') f(x | θ') = (1/20) θ(1 − θ)^4 / Σθ' (1/20) θ'(1 − θ')^4 = θ(1 − θ)^4 / Σθ' θ'(1 − θ')^4

Because the prior is uniform, the posterior distribution is proportional to the likelihood

[Figure: bar charts of the prior (uniform) and posterior probabilities over θ = 0.025, …, 0.975]

Unit 1 (v5) - 27 -

R Code

[Figure: R code listing and the resulting bar charts of the prior and posterior probabilities over θ; the height of each bar is the probability that Θ = θ]

Unit 1 (v5) - 28 -
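The R listing on this slide survives only as an image. A sketch of the same grid computation for the data "one diseased case in five", written here in Python for illustration (the course itself uses R):

```python
# Grid-approximation posterior for theta; data: 1 diseased case out of 5.
grid = [0.025 + 0.05 * k for k in range(20)]      # 20 equally likely values
prior = [1.0 / 20] * 20                           # uniform prior g(theta)
likelihood = [th * (1 - th) ** 4 for th in grid]  # f(x|theta) = theta*(1-theta)^4

unnorm = [p * l for p, l in zip(prior, likelihood)]
posterior = [w / sum(unnorm) for w in unnorm]     # Bayes' rule on the grid

# With a uniform prior the posterior is proportional to the likelihood,
# and it peaks near the sample proportion 1/5.
for th, po in zip(grid, posterior):
    print(f"{th:.3f}  {po:.4f}")
```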

R Computing Environment

R (http://www.r-project.org) is a free, open source statistical computing language and environment that includes:
matrix and array operations
tools for data analysis
graphical facilities for data analysis and display
a programming language which includes conditionals, loops, user-defined recursive functions, and input and output facilities
Users can contribute packages to extend functionality
An introduction is at http://cran.r-project.org/doc/manuals/R-intro.pdf
There is also a free integrated development environment for R
We will use R heavily in this course
Most R assignments can be done by modifying sample code I will provide

Unit 1 (v5) - 29 -

When the sample size is very large:
The posterior distribution will be concentrated around the maximum likelihood estimate and is relatively insensitive to the prior distribution
We won't go too far wrong if we act as if the parameter is equal to the maximum likelihood estimate
When the sample size is small:
The posterior distribution is highly dependent on the prior distribution
Reasonable people may disagree on the value of the parameter
Combining data with informed judgment can be an improvement on either expert judgment alone or data alone
Achieving the benefit requires careful modeling
This course will teach methods for constructing Bayesian models
Hierarchical Bayesian models can tailor results to moderate-sized sub-populations:
The Bayesian estimate shrinks estimates of sub-population parameters toward the population average
The amount of shrinkage depends on the sample size and the similarity of the sub-population to the overall population
Shrinkage improves estimates for small to moderate sized sub-populations

Unit 1 (v5) - 30 -

Sample size 5: 1 with the disease, 4 without

[Figure: posterior distribution for θ when the prior distribution is uniform and 20% of patients in the sample have the disease; further panels show larger samples — the posterior becomes more concentrated around 1/5 as the sample size gets larger]

Unit 1 (v5) - 31 -

More on the Impact of the Prior Distribution

Prior distribution favors low probabilities:

[Figure: three panels — the prior distribution, the posterior distribution for 1 case in 5 samples, and the posterior distribution for 10 cases in 50 samples]

The posterior distribution for the small sample is very sensitive to the prior distribution
The posterior distribution for the larger sample is less sensitive to the prior distribution
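Both sensitivity claims can be checked numerically by comparing posteriors under the two priors for the small and large samples (a Python sketch; the "low" prior here is my own illustrative stand-in for the one pictured):

```python
# Posterior on a 20-point grid for k diseased cases in n samples.
grid = [0.025 + 0.05 * i for i in range(20)]

def posterior(prior, k, n):
    unnorm = [p * th ** k * (1 - th) ** (n - k) for p, th in zip(prior, grid)]
    total = sum(unnorm)
    return [w / total for w in unnorm]

uniform = [1.0 / 20] * 20
low = [1 - th for th in grid]        # prior favoring low theta (illustrative)
z = sum(low)
low = [p / z for p in low]

def dist(a, b):  # total variation distance between two posteriors
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

# Same sample proportion (20%), different sample sizes.
small = dist(posterior(uniform, 1, 5), posterior(low, 1, 5))
large = dist(posterior(uniform, 10, 50), posterior(low, 10, 50))
print(round(small, 4), round(large, 4))
assert large < small  # the larger sample is less sensitive to the prior
```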

Unit 1 (v5) - 32 -

Bayesian inference is as old as probability
The Bayesian view fell into disfavor in the nineteenth and early twentieth centuries:
Positivism, empiricism, and the quest for objectivity in science
Paradoxes, and systematic deviation of human judgment from the Bayesian norm
Bayesian methods have seen a resurgence:
Computational advances make calculation possible for complex models
Bayesian models can coherently integrate many different kinds of information: logical implication, informed expert judgment, empirical observation
Clear connection to decision making

Unit 1 (v5) - 33 -

Interpretations of Probability

Classical - Probability is a ratio of favorable cases to total (equipossible) cases
Frequency - Probability is the limiting value, as the number of trials becomes infinite, of the frequency of occurrence of some event
Logical - Probability is a logical property of one's state of information about a phenomenon
Propensity - Probability is a propensity for certain kinds of events to occur and is a property of physical systems
Subjectivist - Probability is an ideal rational agent's degree of belief about an uncertain event
Algorithmic - The algorithmic probability of a finite sequence is the probability that a universal computer fed a random input will give the sequence as output (related to Kolmogorov complexity)
Game Theoretic - Probability is an agent's optimal announced certainty for an event in a multi-agent game in which agents receive rewards that depend on both forecasts and outcomes

Probability can represent all of these things.

Unit 1 (v5) - 34 -

Historical Notes

People have long noticed that some events are imperfectly predictable
Mathematical probability first arose to describe regularities in games of chance
In the twentieth century it became clear that probability theory provided a good model for a much broader class of problems:
Physical (thermodynamics; quantum mechanics)
Social (actuarial tables; sample surveys)
Industrial (equipment failures)
The subjectivist interpretation dates from the 18th century but fell out of favor because of the positivist orientation of Western 19th and 20th century science
Von Mises formulated a rigorous (and much-debated) frequency theory in the mid-twentieth century
Hierarchy of generality:
The frequency interpretation is restricted to repeatable, random phenomena
The subjectivist interpretation applies to any event about which the agent is uncertain
The game theoretic interpretation applies even when probabilities are not true beliefs

Unit 1 (v5) - 35 -

The Frequentist

A frequentist believes:
Probability is an objective property in the real world
Probability applies only to random processes
Probabilities are associated only with collectives, not individual events
Frequentist inference:
Data are drawn from a distribution of known form but with an unknown parameter (this includes nonparametric statistics, in which the unknown parameter is the distribution itself)
Often this distribution arises from explicit randomization (when not, the statistician argues that the procedure was close enough to randomized that inferences apply)
Inferences regard the data as random and the parameter as fixed (even though the data are known and the parameter is unknown)
For example: A sample X1, …, XN is drawn from a normal distribution with mean μ. A 95% confidence interval is constructed. The interpretation is:
If an experiment like this were performed many times, we would expect that in 95% of the cases an interval calculated by the procedure we applied would include the true value of μ.
This is a statement about the procedure, not about what you should believe about μ!

Unit 1 (v5) - 36 -

The Subjectivist

A subjectivist believes:
Probability is an expression of a rational agent's degrees of belief about uncertain propositions
Rational agents may disagree; there is no one correct probability
If the agent receives feedback, her assessed probabilities will in the limit converge to observed frequencies
Subjectivist inference:
Probability distributions are assigned to the unknown parameters and to the observations given the unknown parameters
Condition on knowns; use probability to express uncertainty about unknowns
For example: A sample X1, …, XN is drawn from a normal distribution with mean μ having prior distribution g(μ). A 95% posterior credible interval is constructed, and the result is the interval (3.7, 4.9). The interpretation is:
Given the prior distribution for μ and the observed data, the probability that μ lies between 3.7 and 4.9 is 95%.
This is a statement about what we should believe about μ, and about what we should expect on the next trial

Unit 1 (v5) - 37 -

Comparison: Understandability, Subjectivity and Honest Reporting

Often the Bayesian answer is what the decision maker really wants to hear.
Untrained people often interpret results in the Bayesian way.
Frequentists are disturbed by the dependence of the posterior interval on the subjective prior distribution:
"It is more important that stochastics provides a means of communication among researchers whose personal beliefs about the phenomena under study may differ. If these beliefs are allowed to contaminate the reporting of results, how are the results of different researchers to be compared?" - H. Dinges
The prior distribution is not the only subjective element in an analysis. Assumptions about the sampling distribution are often also subjective.
Bayesian probability statements are always subjective, but statistical analyses are often done for public consumption. Whose probability distribution should be reported?
When there are enough data, a good Bayesian analysis and a good frequentist analysis will typically be in close agreement.
If the results are sensitive to the prior distribution, a Bayesian analyst is obligated to report this sensitivity, and to present the range of results obtained from a range of prior distributions.

Unit 1 (v5) - 38 -

Comparison: Generality

Subjectivists can handle problems the frequentist approach cannot (in particular, problems with not enough data for sound frequentist inference).
Frequentist statisticians say this comes at a price: when there are not enough data, the result will be highly dependent on the prior distribution.
Subjectivists often apply frequentist techniques, but with a Bayesian interpretation.
Frequentists often apply Bayesian methods if they have good frequency properties.

Unit 1 (v5) - 39 -

Axioms for Subjective Probability (De Groot, 1970)

There is a relation ≽, read "is at least as likely as", defined on uncertain events, that satisfies the following conditions:
SP1. For any two uncertain events A and B, exactly one of the following relations holds: A ≻ B, B ≻ A, or A ~ B.
SP2. If A1, A2, B1, and B2 are four events such that A1 ∩ A2 = ∅ and B1 ∩ B2 = ∅, and if Ai ≼ Bi for i = 1, 2, then A1 ∪ A2 ≼ B1 ∪ B2. If in addition Ai ≺ Bi for either i = 1 or i = 2, then A1 ∪ A2 ≺ B1 ∪ B2.
SP3. If A is any event, then ∅ ≼ A. Furthermore, there is some event A0 for which ∅ ≺ A0.
SP4. If A1 ⊇ A2 ⊇ … is a decreasing sequence of events, and B is some event such that Ai ≽ B for i = 1, 2, …, then ∩i Ai ≽ B.
SP5. There is an experiment, with a numerical outcome x between the values of 0 and 1, such that if Ai is the event that the outcome lies within the interval ai ≤ x ≤ bi, for i = 1, 2, then A1 ≼ A2 if and only if (b1 − a1) ≤ (b2 − a2).

Unit 1 (v5) - 40 -

Axioms for Utility (Watson and Buede, 1987)

A reward is a prize the decision maker cares about. A lottery is a situation in which the decision maker will receive one of the possible rewards, where the reward to be received is governed by a probability distribution. There is a qualitative relationship of relative preference ≽*, that operates on lotteries, that satisfies the following conditions:
SU1. For any two lotteries L1 and L2, either L1 ≻* L2, L2 ≻* L1, or L1 ~* L2. Furthermore, if L1, L2, and L3 are any lotteries such that L1 ≽* L2 and L2 ≽* L3, then L1 ≽* L3.
SU2. If r1, r2 and r3 are rewards such that r1 ≽* r2 ≽* r3, then there exists a probability p such that [r1: p; r3: (1−p)] ~* r2, where [r1: p; r3: (1−p)] is a lottery that pays r1 with probability p and r3 with probability (1−p).
SU3. If r1 ~* r2 are rewards, then for any probability p and any reward r3, [r1: p; r3: (1−p)] ~* [r2: p; r3: (1−p)].
SU4. If r1 ≻* r2 are rewards, then [r1: p; r2: (1−p)] ≻* [r1: q; r2: (1−q)] if and only if p > q.
SU5. Consider three lotteries, Li = [r1: pi; r2: (1−pi)], i = 1, 2, 3, giving different probabilities of the two rewards r1 and r2. Suppose lottery M gives entry to lottery L2 with probability q and to L3 with probability 1−q. Then L1 ~* M if and only if p1 = q·p2 + (1−q)·p3.

Unit 1 (v5) - 41 -

Representation Theorems

If your beliefs satisfy SP1-SP5, then there is a probability distribution Pr(·) over events such that for any two events A1 and A2, Pr(A1) ≥ Pr(A2) if and only if A1 ≽ A2.
If your preferences satisfy SU1-SU5, then there is a utility function u(·) defined on rewards such that for any two lotteries L1 and L2, L1 ≽* L2 if and only if E[u(L1)] ≥ E[u(L2)], where E[·] denotes the expected value with respect to the probability distribution Pr(·).

Unit 1 (v5) - 42 -

A common criticism of decision theory is that its adherents are cold-hearted technocrats who care about numbers and not about what really matters:
They would put a dollar value on a human life
They would send people to possible death on the basis of utilitarian calculations
And so on
But difficult tradeoffs are unavoidable. Whether we quantify them or not, as a society and as individuals we make them all the time:
They will be irrational and capricious if we approach them without a principled methodology
Refusing to think about the tradeoffs will only ensure that they will be addressed haphazardly and/or by back-door manipulation by powerful special interests
As a society we need open debate and discussion of the difficult tradeoffs we are forced to make. Decision theory provides a justifiable, communicable framework for doing so:
disagreements about fact are clearly separated from disagreements about value
inconsistencies can be spotted, discussed, and resolved
commonly recurring problems need not be revisited once consensus has been reached
Caveats:
Decision theory can be misused if models are sloppily built and leave out important elements
When a group or society has not reached consensus there is no clear best choice
Explicitly modeling the subjective elements of a problem provides a framework for informed debate


The inventors of probability theory thought of it as a logic of

enlightened rational reasoning. In the nineteenth century this was

replaced by a view of probability as measuring objective

propensities of intrinsically random phenomena

The twentieth century has seen a resurgence of interest in subjective

probability and an increased understanding of the appropriate role of

subjectivity in science

Bayesian methods often require more computational power than

traditional frequentist methods

The computer revolution has enabled the Bayesian resurgence

Most statistics texts and courses take a frequentist approach but this

is changing

Bayesian decision theory provides methodology for rational choice

under uncertainty

Bayesian statistics is a theory of rational belief dynamics

We took a broad-brush tour of Bayesian methodology

We applied Bayesian thinking to a simple example that illustrates

many of the concepts we will be learning this semester


Philosophical Transactions of the Royal Society of London, 53:370-418, 1763.

Bashir, S.A., Getting Started in R, http://www.luchsinger-mathematics.ch/Bashir.pdf

Dawid, A.P. and Vovk, V.G. (1999), Prequential Probability: Principles and Properties,

Bernoulli, 5: 125-162.

de Finetti, Bruno. Theory of Probability: A Critical Introductory Treatment. New York:

Wiley, 1974.

Gelman, et al., Chapter 1

Hájek, Alan, "Interpretations of Probability", The Stanford Encyclopedia of Philosophy (Summer 2003 Edition), Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/sum2003/entries/probability-interpret/>.

Lee, Chapter 1

Li, Ming and Vitányi, Paul. An Introduction to Kolmogorov Complexity and Its Applications. (2nd ed) Springer-Verlag, 2005.

Nau, Robert F. (1999), Arbitrage, Incomplete Models, And Interactive Rationality,

working paper, Fuqua School of Business, Duke University.

Neapolitan, R. Learning Bayesian Networks, Prentice Hall, 2003.

Jaynes, E., Probability Theory: The Logic of Science, Cambridge University Press, 2003.

Savage, L.J., The Foundations of Statistics. Dover, 1972.

Shafer, G. Probability and Finance: It's Only a Game, Wiley, 2001.

von Mises, R. (1957), Probability, Statistics and Truth, revised English edition, New York: Macmillan.

