42 views, 6 pages, May 29, 2014

© All Rights Reserved


In preparation for the release of the sequel to Dr. Richard Carrier's Proving History, I thought it would be good to address the foundations of the sequel as presented in Proving History: namely, Dr. Carrier's arguments about Bayes' Theorem, or BT (as presented in his book, anyway). There are several reasons for this. First, Dr. Carrier argues that BT should be THE method for all historical research. Second, the entire point of Proving History was to argue that the approach we will find in the sequel is the best. The third and most important reason, however, is that Dr. Carrier doesn't understand BT, how it differs from the method he actually advocates (Bayesian inference/statistics), or how fundamentally inaccurate basically every claim he makes about his methodology is.

We can begin our journey through Dr. Carrier's failure to understand his own method by looking at his flawed proof. At one point, Dr. Carrier states "we can conclude here and now that Bayes' Theorem models and describes all valid historical methods. No other method is needed" (p. 106). His "proof," though, stands or falls with the first proposition in it: "BT is a logically proven theorem" (ibid.). This is true. However, Dr. Carrier doesn't seem to have read the sources he cites. For example, on p. 50 Dr. Carrier refers the reader via an endnote (no. 9) to several highly commendable texts on BT. The one he states gives "a complete proof of formal validity" of BT is Papoulis, A. (1986). Probability, Random Variables, and Stochastic Processes (2nd ed.). I don't have the 2nd edition, but I do have the 3rd, and as this proof is trivial I really could use any intro probability textbook. Papoulis begins his "complete proof of formal validity" (as opposed to proof of informal validity? or incomplete proof?) by defining a set and a probability function for which the axioms of probability hold. A key axiom is that the probabilities of any set of possible outcomes must sum or integrate to 1 (simplistically, for those who haven't taken any calculus, integration is a kind of summation). For example, imagine an individual named Anna is drawing cards from the pack. Let's imagine that

It wasn't the Jack of Diamonds
Nor the Joker she drew at first
It wasn't the King or the Queen of Hearts
But the Ace of Spades reversed

The probability of drawing the card she did is 1/52. This is true for the other 51 cards as well. The probability that she would draw a card from the deck that was in the deck is 52/52 or 1. This is intuitive and obvious, but the important point is that it also follows from the fact that there are 52 cards and the probability of drawing any one of them is 1/52; hence the probability of drawing a card is given by the sum of the probabilities of drawing each individual card, or 1/52 summed 52 times. In Dr. Carrier's appendix (p. 284) he notes that probability functions must sum to 1. What he apparently doesn't understand is what this entails. It means that in order to use BT to evaluate how probable some outcome, result, historical event, etc., is, one must consider every single possible outcome.
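
To make the arithmetic concrete, here is a minimal Python sketch of the card example. It is my own illustration, not anything taken from Papoulis or Dr. Carrier, and it assumes a standard 52-card deck with no jokers; the variable names are hypothetical.

```python
from fractions import Fraction

# Assumption: a standard 52-card deck, 13 ranks in each of 4 suits, no jokers.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# On a single random draw every card is equally likely.
p = {card: Fraction(1, len(deck)) for card in deck}

# Summing over ALL possible outcomes gives 1 (1/52 added 52 times).
total = sum(p.values())

# Probabilities of events are sums over the outcomes they contain.
p_ace_of_spades = p[("A", "spades")]
p_any_ace = sum(prob for (rank, _suit), prob in p.items() if rank == "A")

print(total, p_ace_of_spades, p_any_ace)
```

The point of the sketch is that `total` can only be computed because every outcome is listed in `deck` beforehand.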

Let me make this clearer with a simple example. I can calculate the probability that, given a full deck of cards, a random draw will yield an ace because I know in advance every possible outcome. If, however, someone mixed together 10 cards drawn at random from each of 300 different decks and asked me to pick a card, I can't calculate the probability anymore. Even if I were told that the new deck contained 3,000 cards, I have no idea how many are aces. Incidentally, this is the perfect situation for Bayesian statistical inference, which works (simplistically) by assuming, e.g., a certain distribution of aces and then changing my model of how likely it is that the next draw will yield an ace as I learn more about the distribution of cards in the entire 3,000-card deck.
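
A rough sketch of what that kind of updating can look like, under assumptions that are entirely mine: the unknown proportion of aces gets a Beta prior, each draw is treated as an independent ace/not-ace trial (an approximation for a 3,000-card deck), and the sequence of draws is made up for illustration.

```python
from fractions import Fraction

def update(alpha, beta, is_ace):
    """Return the Beta(alpha, beta) parameters after observing one draw."""
    return (alpha + 1, beta) if is_ace else (alpha, beta + 1)

def predictive(alpha, beta):
    """Probability that the next draw is an ace under Beta(alpha, beta)."""
    return Fraction(alpha, alpha + beta)

# Assumed prior: Beta(1, 12), whose mean 1/13 is the ace rate of a standard deck.
alpha, beta = 1, 12

# Hypothetical observations: True means the drawn card was an ace.
draws = [False, False, True, False, False, False, True, False, False, False]

history = [predictive(alpha, beta)]
for is_ace in draws:
    alpha, beta = update(alpha, beta, is_ace)
    history.append(predictive(alpha, beta))

print(history[0], history[-1])
```

The model starts from an assumed distribution and revises its prediction for the next draw after every observation, which is a different activity from plugging numbers into BT once.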

Dr. Carrier wishes to use what he thinks BT is to evaluate the probability that particular events occurred ~2,000 years ago. For example, on pp. 40-42 he considers the possibility that Jesus was a legendary rabbi in terms of the class of legendary rabbis and the information we have on such a class. We are in a far worse position than in the mixed card deck example above, because we don't even know the number of legendary rabbis, still less who they might be (if we did, we'd have the answer: Jesus either would be one or wouldn't).

There is another basic property of BT Dr. Carrier seems to have missed. As Papoulis clearly states, BT is only valid for events/outcomes that are mutually exclusive. Often, both of these requirements (the sum to 1 and mutual exclusivity) are given together: the set of outcomes must be collectively exhaustive and mutually exclusive. In other words, BT is only valid if

1) all possible outcomes are known

2) one and only one outcome can occur.
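
Written out, the form of BT to which these two requirements belong looks like the following. The notation is my own choice for illustration (hypotheses H_1, ..., H_n and evidence E with P(E) > 0), not necessarily the symbols Papoulis uses.

```latex
P(H_i \mid E) \;=\; \frac{P(E \mid H_i)\,P(H_i)}{\sum_{j=1}^{n} P(E \mid H_j)\,P(H_j)},
\qquad \text{where } \sum_{j=1}^{n} P(H_j) = 1
\ \text{ and } \ P(H_j \cap H_k) = 0 \ \text{ for } j \neq k .
```

The denominator is equal to P(E) only when the H_j cover every possibility and do not overlap, which is what conditions 1) and 2) say.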

This makes BT useless for most purposes, including historiography. However, Dr. Carrier isn't really using BT. As his references (and his formulation of the theorem) show, he is actually using something called Bayesian inference/Bayesian analysis. This negates his entire "proof": it no longer matters that BT is "a logically proven theorem," and there is no "complete proof of formal validity" for some Bayesian inference/analysis theorem that Dr. Carrier could use in place of his first proposition.

Ok, so we can't use BT, but that doesn't mean we can't use Bayesian methods. However, in order to use Bayesian methods Dr. Carrier would have to understand Bayesian statistics (and statistics in general). He doesn't. We can see this clearly when Dr. Carrier, whose expertise is ancient history, addresses the frequentist vs. Bayesian debate. To keep things simple, let's just say that this is an ongoing debate that arguably goes back to Bayes and is definitely over a century old. Dr. Carrier is apparently so confident in his mathematical acuity that he resolves the dispute in a few pages, with almost no reference to math or the literature: "The whole debate between frequentists and Bayesians, therefore, has merely been about what a probability is a frequency of, the rules are the same for either" (p. 266). Hm. Amazing that generations of the best statistical minds missed this. Oh wait. They didn't.

Let's look at how Carrier describes the dispute: "The debate between the so-called frequentists and Bayesians can be summarized thus: frequentists describe probabilities as a measure of the frequency of occurrence of particular kinds of event within a given set of events, while Bayesians often describe probabilities as measuring degrees of belief or uncertainty" (p. 265). This is laughably wrong:

"Frequentist statistical procedures are mainly distinguished by two related features; (i) they regard the information provided by the data x as the sole quantifiable form of relevant probabilistic information and (ii) they use, as a basis for both the construction and the assessment of statistical procedures, long-run frequency behaviour under hypothetical repetition of similar circumstances."

Bernardo, J. M., & Smith, A. F. (1994). Bayesian Theory. Wiley.

"Undoubtedly, the most critical and most criticized point of Bayesian analysis deals with the choice of the prior distribution, since, once this prior distribution is known, inference can be led in an almost mechanic way by minimizing posterior losses, computing higher posterior density regions, or integrating out parameters to find the predictive distribution. The prior distribution is the key to Bayesian inference and its determination is therefore the most important step in drawing this inference. To some extent, it is also the most difficult. Indeed, in practice, it seldom occurs that the available prior information is precise enough to lead to an exact determination of the prior distribution, in the sense that many probability distributions are compatible with this information... Most often, it is then necessary to make a (partly) arbitrary choice of the prior distribution, which can drastically alter the subsequent inference."

Robert, C. P. (2001). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation (2nd ed.). Springer Texts in Statistics. Springer.

The "frequency" part of "frequentist" does have to do with kinds of events, but frequencies are the measure of probability, not the reverse. To illustrate, consider the bell curve (the graph of the normal distribution). It's a probability distribution. Now imagine a standardized test like the SATs which is designed such that scores will be normally distributed and have this bell curve graph. What does the graph tell us? It tells us that most people who take the test get scores close to one another, near the average, but very infrequently some test-takers will get very high scores and others will get very low ones. In other words, the bell curve is the graph of a probability function (technically, a probability density function or pdf), and it is formed by the frequency of particular scores. We know that it is very improbable for a person's score to fall in either of the ends/tails of the bell curve because these are very infrequent outcomes.

What does this mean for frequentist methods? Well, Kaplan, The Princeton Review, and other test prep companies try to show their methods work by using this normal distribution. They claim that the scores of people who take their classes aren't distributed the way the population's scores are, because students taking their classes obtain above-average scores too frequently. They use the frequency of higher-than-average scores to argue that their class must improve scores.
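
Here is a minimal sketch of the kind of calculation such a claim rests on: a one-sided z-test against a normal population. The function names and every number below (population mean 500, standard deviation 100, a class of 100 students averaging 530) are hypothetical, chosen only to show the mechanics.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sided_z_test(sample_mean, n, pop_mean, pop_sd):
    """Return (z, p_value) for the hypothesis that the sample mean exceeds pop_mean."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # p-value: how often a mean this high would occur if the class
    # were just a random sample from the general population.
    p_value = 1.0 - normal_cdf(z)
    return z, p_value

# Hypothetical figures, not real test-prep data.
z, p_value = one_sided_z_test(sample_mean=530.0, n=100, pop_mean=500.0, pop_sd=100.0)
print(z, p_value)
```

Note the order of operations: the data come first, and the distribution is consulted only afterwards, to ask how infrequent such a result would be.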

What's key is that the data are obtained and analyzed, but the distribution is only used to determine whether the values the analysis yielded are statistically significant. Bayesian inference reverses this, creating fundamental differences. The process starts with a probability distribution. The prior distributions represent uncertainty and make predictions about the data that will be obtained. Once new data are obtained, the model is adjusted to better fit them. This is usually done many, many times as more and more information is tested against an increasingly accurate model. The key differences are

1) the iterative process

2) the use of models which make predictions

3) the use of distributions to represent unknowns and (in part) the way the model will learn or adapt given new input.

So why don't we find any of this in Dr. Carrier's description of Bayesian methods? Why do we always find ad hoc descriptions of priors? Because Dr. Carrier wants to use Bayesian analysis but apparently doesn't understand what priors actually are or how complicated they can be in even simple models:

"In many situations, however, the selection of the prior distribution is quite delicate in the absence of reliable prior information, and generic solutions must be chosen instead. Since the choice of the prior distribution has a considerable influence on the resulting inference, this choice must be conducted with the utmost care."

Marin, J. M., & Robert, C. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer Texts in Statistics. Springer.

"While the axiomatic development of Bayesian inference may appear to provide a solid foundation on which to build a theory of inference, it is not without its problems. Suppose, for example, a stubborn and ill-informed Bayesian puts a prior on a population proportion p that is clearly terrible (to all but the Bayesian himself). The Bayesian will be acting perfectly logically (under squared error loss) by proposing his posterior mean, based on a modest size sample, as the appropriate estimate of p. This is no doubt the greatest worry that the frequentist (as well as the world at large) would have about Bayesian inference: that the use of a bad prior will lead to poor posterior inference. This concern is perfectly justifiable and is a fact of life with which Bayesians must contend... We have discussed other issues, such as the occasional inadmissibility of the traditional or favored frequentist method and the fact that frequentist methods don't have any real, compelling logical foundation. We have noted that the specification of a prior distribution, be it through introspection or elicitation, is a difficult and imprecise process, especially in multiparameter problems, and in any statistical problem, suffers from the potential of yielding poor inferences as a result of poor prior modeling."

Samaniego, F. J. (2010). A Comparison of the Bayesian and Frequentist Approaches to Estimation. Springer Texts in Statistics. Springer.
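
To see how much a bad prior can matter with a modest sample, here is a small sketch of the "stubborn Bayesian" scenario. The Beta priors and the data (12 successes in 20 trials) are hypothetical values of my own, not figures from Samaniego; the posterior mean is used because it is the Bayes estimate under squared error loss.

```python
from fractions import Fraction

def posterior_mean(alpha, beta, successes, n):
    """Posterior mean of a proportion p under a Beta(alpha, beta) prior
    after observing `successes` out of `n` binomial trials."""
    return Fraction(alpha + successes, alpha + beta + n)

# Hypothetical modest sample: 12 successes in 20 trials (sample proportion 0.6).
successes, n = 12, 20

# A flat prior, Beta(1, 1), versus a stubborn prior, Beta(1, 99),
# which insists in advance that p is close to 0.01.
flat = posterior_mean(1, 1, successes, n)
stubborn = posterior_mean(1, 99, successes, n)

print(float(flat), float(stubborn))
```

Both estimates follow the rules perfectly; they disagree by roughly a factor of five only because of the prior that went in.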

The stubborn and ill-informed Bayesian is in a much better position than Dr. Carrier. Dr. Carrier has confused BT with Bayesian analysis and the Bayesian approach with the frequentist, all because he apparently hasn't understood any of these. Instead of prior distributions, his priors are best guesses. Instead of real belief functions we find "here's what I believe." No consideration is given to the nature of the data (categorical, nominal, and in general non-numerical data require specific models and tests, Bayesian or not).

So instead of the universally valid historical method Dr. Carrier argues BT provides, all he's actually done is butcher mathematics in order to plug values into a formula that, used this way, is as mathematical as numerology but apparently seems impressive if you have no clue what you are talking about. Perhaps that's why Dr. Carrier's CV indicates he's been lecturing on Bayes' Theorem since 2003, but his 2008 dissertation contains no reference to Bayes' Theorem.
