
A False Coin Revisited

Simon Bridge
February 15, 2011

Abstract
It has been asserted that Bayesian statistics is not needed, that normal
high-school hypothesis testing is quite adequate to the task. Upon
examination, however, this will actually result in an overly conservative
assessment: chance, as a possible cause of the results, will be rejected too
soon.

Introduction
The usual treatment of statistical hypothesis testing, as typically taught
in secondary school, proposes competing hypotheses to be compared, then
computes the probability of one (the hypothesis under test), allowing
that it may be rejected in favour of the other (the null hypothesis) if the
odds drop beyond established confidence limits.
For the coin-toss experiment [1], the test hypothesis will be that the
coin is fair, while the null hypothesis will be that the coin is not fair (to
avoid a false dichotomy since there are more ways the coin can be unfair
than that it is double-headed.) I’ll follow the notation in the previous
paper for ease of comparison.

The First Estimate


The probability of Y = y heads in y tosses is $p(y) = (1/2)^{y}$, so we
reject the test hypothesis with confidence a at toss y provided

$$1 - \left(\frac{1}{2}\right)^{y} > a \tag{1}$$

Initial Results
First sixteen rejection levels to five decimal places (being GNU Octave
single-precision):
toss 1 to 4 0.50000 0.75000 0.87500 0.93750
toss 5 to 8 0.96875 0.98438 0.99219 0.99609
toss 9 to 12 0.99805 0.99902 0.99951 0.99976
toss 13 to 16 0.99988 0.99994 0.99997 0.99998
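The table can be reproduced with a few lines of code. This is a minimal sketch in Python rather than the GNU Octave used for the paper; the function name `rejection_level` is my own label for the left-hand side of equation (1), not something from the original.

```python
def rejection_level(y: int) -> float:
    """Left-hand side of equation (1): 1 - (1/2)^y after y heads in y tosses."""
    return 1.0 - 0.5 ** y


# Rejection levels for tosses 1 to 16, as in the table above.
levels = [rejection_level(y) for y in range(1, 17)]

# Print four tosses per row, five decimal places each.
for start in range(0, 16, 4):
    row = " ".join(f"{v:.5f}" for v in levels[start:start + 4])
    print(f"toss {start + 1} to {start + 4} {row}")
```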

Analysis
The hypothesis will be rejected with 95% confidence on the fifth toss, with
99% confidence on the seventh, and to one part in ten-thousand by toss
fourteen.
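The rejection tosses quoted here follow directly from equation (1). A small sketch, assuming we simply step through the tosses until the level first exceeds a; the helper name `first_rejection_toss` is hypothetical:

```python
def first_rejection_toss(a: float) -> int:
    """Smallest y with 1 - (1/2)^y > a, i.e. the first toss that rejects."""
    if not 0.0 <= a < 1.0:
        raise ValueError("confidence a must satisfy 0 <= a < 1")
    y = 1
    while 1.0 - 0.5 ** y <= a:
        y += 1
    return y


# The three confidence levels discussed in the text.
for a in (0.95, 0.99, 0.9999):
    print(f"confidence {a}: rejected at toss {first_rejection_toss(a)}")
```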
However: “reject at 95% confidence” suggests that 95% of the time
you get five heads in a row, you are cheating. Can this be right?
The rejection rates agree well with the Bayesian analysis, provided
the prior is 0.5. That is not unreasonable, because normal testing
places equal a-priori emphasis on the two outcomes. Also, notice that
the rejection level for a single toss is 50%. While that is not enough to
reject the hypothesis, surely a single head is not even that good as
evidence of cheating?
Clearly we need to take into account the a-priori chance of cheating.

Adjusted statistics . . .
The initial chance of cheating must be guessed by some means. Imagine
I put 99 fair coins and one double-headed coin into a bag, shake them
up, then pick one out to toss. This is the same situation as in [1] with a
p = 0.99 prior.
In this situation, we reject the test hypothesis with confidence a at
toss y provided

$$1 - (0.99)\left(\frac{1}{2}\right)^{y} > a \tag{2}$$

Adjusted Results
First sixteen rejection levels:
toss 1 to 4 0.50500 0.75250 0.87625 0.93812
toss 5 to 8 0.96906 0.98453 0.99227 0.99613
toss 9 to 12 0.99807 0.99903 0.99952 0.99976
toss 13 to 16 0.99988 0.99994 0.99997 0.99998
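To check how little the adjustment moves things, the two rules can be compared side by side. This is a sketch only: `plain_level`, `adjusted_level` and `first_toss_above` are names I have made up for equations (1) and (2) and for the search over tosses.

```python
PRIOR_FAIR = 0.99  # 99 fair coins out of 100 in the bag


def plain_level(y: int) -> float:
    """Equation (1): 1 - (1/2)^y."""
    return 1.0 - 0.5 ** y


def adjusted_level(y: int, prior_fair: float = PRIOR_FAIR) -> float:
    """Equation (2): 1 - prior_fair * (1/2)^y."""
    return 1.0 - prior_fair * 0.5 ** y


def first_toss_above(a: float, level) -> int:
    """Smallest toss y at which level(y) exceeds the confidence a (a < 1)."""
    y = 1
    while level(y) <= a:
        y += 1
    return y


# Compare the rejection tosses under the two rules.
for a in (0.95, 0.99, 0.9999):
    print(a, first_toss_above(a, plain_level), first_toss_above(a, adjusted_level))
```

For the three confidence levels used so far, both rules reject on the same tosses.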

Analysis
The modification has made no difference to the rejection tosses. The same
problems apply: there is too much weight given to the null hypothesis.
In addition we now see the effect of the a-priori chance is to increase the
rejection levels at each toss. The rejection level at one toss should be close
to 0.01: the initial chance of picking out the double-headed coin.
After five heads, the actual chance that the coin is still fair is about
0.75 (from Bayesian statistics with the 0.99 prior), so the chance that it
is double-headed is only roughly one in four. It follows that the
hypothesis has been rejected too soon: for 95% confidence you need to wait
for the 11th toss. The reason is that the chance of the coin being
double-headed is very small at the outset.
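The Bayesian figures can be checked with the bag-of-coins model. The sketch below assumes only two possibilities, a fair coin (heads with probability 1/2) and a double-headed coin (heads with certainty), and applies the standard Bayes' rule; it is not copied from [1], and the function name `posterior_double_headed` is my own.

```python
def posterior_double_headed(y: int, prior_fair: float = 0.99) -> float:
    """P(double-headed | y heads in y tosses) for a two-hypothesis model."""
    prior_cheat = 1.0 - prior_fair
    like_fair = 0.5 ** y   # a fair coin gives y heads in a row with (1/2)^y
    like_cheat = 1.0       # a double-headed coin always gives heads
    evidence = prior_cheat * like_cheat + prior_fair * like_fair
    return prior_cheat * like_cheat / evidence


# Posterior after five heads, for both hypotheses.
p5 = posterior_double_headed(5)
print(f"after 5 heads: double-headed {p5:.3f}, fair {1.0 - p5:.3f}")

# First toss at which the double-headed posterior exceeds 95%.
y = 1
while posterior_double_headed(y) <= 0.95:
    y += 1
print(f"95% reached at toss {y}")
```

Under these assumptions a single head moves the chance of a double-headed coin from 0.01 to only about 0.02, which is the behaviour argued for above.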
The graph shows how the above statistics (blue) compare with the
equivalent Bayesian (red). The probability of cheating on a single toss is

Figure 1:

Standard hypothesis testing only takes the forward probability into
account, which is OK when you have a very large number of trials with
a spectrum of results. However, where you get a consistent result on a
small number of trials, you need to model more closely the way mounting
evidence changes the odds.
In terms of testing claims of the paranormal, standard hypothesis
testing will have you rejecting chance, as a possible cause of the
phenomenon, too soon.

With reference to:


[1] Bridge, S. A fair coin? (self-published, 2011), retrieved from url

The graph was prepared with GNU Octave, version 3.0.5, typesetting
by LaTeX 2ε, running under Ubuntu 10.10 GNU/Linux
kver: 2.6.32-28-generic-pae.
