LessWrong.com Sequences
Eliezer Yudkowsky
Generated by lesswrong_book.py on 2013-01-05.
Pseudo-random version string: ee78df8a-6178-4d00-8a06-16a9ed81a6be.
A collection of posts dealing with the fundamentals of rationality: the difference between the map and the territory, Bayes's Theorem and the nature of evidence, why anyone should care about truth, and minds as reflective cognitive engines.
Part I
Map and Territory
1. The Simple Truth
We mean:

1. Epistemic rationality: believing, and updating on evidence, so as to systematically improve the correspondence between your map and the territory. The art of obtaining beliefs that correspond to reality as closely as possible. This correspondence is commonly termed "truth" or "accuracy", and we're happy to call it that.
2. Instrumental rationality: achieving your values. Not necessarily "your values" in the sense of being selfish values or unshared values: "your values" means anything you care about. The art of choosing actions that steer the future toward outcomes ranked higher in your preferences. On LW we sometimes refer to this as "winning".
If that seems like a perfectly good definition, you can stop reading here; otherwise continue.
Sometimes experimental psychologists uncover human reasoning that seems very strange; for example,
to help transport thoughts from one mind to another. You cannot change reality, or prove the thought, by manipulating which meanings go with which words.
So if you understand what concept we are generally getting at with this word "rationality", and with the sub-terms "epistemic rationality" and "instrumental rationality", we have communicated: we have accomplished everything there is to accomplish by talking about how to define "rationality". What's left to discuss is not what meaning to attach to the syllables "rationality"; what's left to discuss is what is a good way to think.

9. http://yudkowsky.net/virtues/
10. Page 228, "Disputing Definitions".
11. "The Argument from Common Usage".
With that said, you should be aware that many of us will regard as controversial, at the very least, any construal of "rationality" that makes it non-normative:

For example, if you say, "The rational belief is X, but the true belief is Y," then you are probably using the word "rational" in a way that means something other than what most of us have in mind. (E.g., some of us expect "rationality" to be consistent under reflection: "rationally" looking at the evidence, and "rationally" considering how your mind processes the evidence, shouldn't lead to two different conclusions.) Similarly, if you find yourself saying, "The rational thing to do is X, but the right thing to do is Y," then you are almost certainly using one of the words "rational" or "right" in a way that a huge chunk of readers won't agree with.
In this case, or in any other case where controversy threatens, you should substitute more specific language: "The self-benefiting thing to do is to run away, but I hope I would at least try to drag the girl off the railroad tracks," or "Causal decision theory as usually formulated says you should two-box on Newcomb's Problem, but I'd rather have a million dollars."
"X is rational!" is usually just a more strident way of saying "I think X is true" or "I think X is good". So why have an additional word for "rational" as well as "true" and "good"? Because we want to talk about systematic methods for obtaining truth and winning. The word "rational" has potential pitfalls, but there are plenty of non-borderline cases where "rational" works fine to communicate what one is getting at; likewise "irrational". In these cases we're not afraid to use it.
Yet one should also be careful not to overuse that word. One receives no points merely for pronouncing it loudly. If you speak overmuch of the Way you will not attain it.
12. "Taboo Your Words".
13. http://lesswrong.com/lw/nc/newcombs_problem_and_regret_of_rationality/
WHAT DO WE MEAN BY "RATIONALITY"?
3. An Intuitive Explanation of Bayes' Theorem

Bayes' Theorem
for the curious and bewildered;
an excruciatingly gentle introduction.
Your friends and colleagues are talking about something called "Bayes' Theorem" or "Bayes' Rule", or something called Bayesian reasoning. They sound really enthusiastic about it, too, so you google and find a webpage about Bayes' Theorem and...

It's this equation. That's all. Just one equation. The page you found gives a definition of it, but it doesn't say what it is, or why it's useful, or why your friends would be interested in it. It looks like this random statistics thing.
So you came here. Maybe you don't understand what the equation says. Maybe you understand it in theory, but every time you try to apply it in practice you get mixed up trying to remember the difference between p(a|x) and p(x|a), and whether p(a)*p(x|a) belongs in the numerator or the denominator. Maybe you see the theorem, and you understand the theorem, and you can use the theorem, but you can't understand why your friends and/or research colleagues seem to think it's the secret of the universe. Maybe your friends are all wearing Bayes' Theorem T-shirts, and you're feeling left out. Maybe you're a girl looking for a boyfriend, but the boy you're interested in refuses to date anyone who isn't "Bayesian". What matters is that Bayes is cool, and if you don't know Bayes, you aren't cool.
Why does a mathematical concept generate this strange enthusiasm in its students? What is the so-called Bayesian Revolution now sweeping through the sciences, which claims to subsume even the experimental method itself as a special case? What is the secret that the adherents of Bayes know? What is the light that they have seen?

1. http://yudkowsky.net/rational/bayes
Soon you will know. Soon you will be one oI us.
While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract. Bayesian reasoning is very counterintuitive. People do not employ Bayesian reasoning intuitively, find it very difficult to learn Bayesian reasoning when tutored, and rapidly forget Bayesian methods once the tutoring is over. This holds equally true for novice students and highly trained professionals in a field. Bayesian reasoning is apparently one of those things which, like quantum mechanics or the Wason Selection Test, is inherently difficult for humans to grasp with our built-in mental faculties.
Or so they claim. Here you will find an attempt to offer an intuitive explanation of Bayesian reasoning: an excruciatingly gentle introduction that invokes all the human ways of grasping numbers, from natural frequencies to spatial visualization. The intent is to convey, not abstract rules for manipulating numbers, but what the numbers mean, and why the rules are what they are (and cannot possibly be anything else). When you are finished reading this page, you will see Bayesian problems in your dreams.

And let's begin.
Here's a story problem about a situation that doctors often encounter:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
What do you think the answer is? If you haven't encountered this kind of problem before, please take a moment to come up with your own answer before continuing.
Next, suppose I told you that most doctors get the same wrong answer on this problem; usually, only around 15% of doctors get it right. ("Really? 15%? Is that a real number, or an urban legend based on an Internet poll?" It's a real number. See Casscells, Schoenberger, and Grayboys 1978; Eddy 1982; Gigerenzer and Hoffrage 1995; and many other studies. It's a surprising result which is easy to replicate, so it's been extensively replicated.)
Do you want to think about your answer again? Here's a JavaScript calculator if you need one. This calculator has the usual precedence rules; multiplication before addition and so on. If you're not sure, I suggest using parentheses.
On the story problem above, most doctors estimate the probability to be between 70% and 80%, which is wildly incorrect. Here's an alternate version of the problem on which doctors fare somewhat better:
10 out of 1000 women at age forty who participate in routine screening have breast cancer. 800 out of 1000 women with breast cancer will get positive mammographies. 96 out of 1000 women without breast cancer will also get positive mammographies. If 1000 women in this age group undergo a routine screening, about what fraction of women with positive mammographies will actually have breast cancer?
And finally, here's the problem on which doctors fare best of all, with 46%, nearly half, arriving at the correct answer:

100 out of 10,000 women at age forty who participate in routine screening have breast cancer. 80 of every 100 women with breast cancer will get a positive mammography. 950 out of 9,900 women without breast cancer will also get a positive mammography. If 10,000 women in this age group undergo a routine screening, about what fraction of women with positive mammographies will actually have breast cancer?
The correct answer is 7.8%, obtained as follows: Out of 10,000 women, 100 have breast cancer; 80 of those 100 have positive mammographies. From the same 10,000 women, 9,900 will not have breast cancer, and of those 9,900 women, 950 will also get positive mammographies. This makes the total number of women with positive mammographies 950 + 80 or 1,030. Of those 1,030 women with positive mammographies, 80 will have cancer. Expressed as a proportion, this is 80/1,030 or 0.07767 or 7.8%.
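The count arithmetic above is easy to check mechanically. Here is a minimal sketch (mine, not part of the original essay) that redoes it with exact fractions:

```python
from fractions import Fraction

# Natural frequencies from the problem, out of 10,000 women screened.
women_with_cancer = 100    # the 1% prior
true_positives = 80        # 80% of the 100 women with cancer test positive
false_positives = 950      # the 950 of 9,900 cancer-free women who test positive

total_positives = true_positives + false_positives
p_cancer_given_positive = Fraction(true_positives, total_positives)

print(total_positives)                 # 1030
print(float(p_cancer_given_positive))  # 0.0776..., i.e. about 7.8%
```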
To put it another way, before the mammography screening, the 10,000 women can be divided into two groups:

Group 1: 100 women with breast cancer.
Group 2: 9,900 women without breast cancer.

Summing these two groups gives a total of 10,000 patients, confirming that none have been lost in the math. After the mammography, the women can be divided into four groups:

Group A: 80 women with breast cancer, and a positive mammography.
Group B: 20 women with breast cancer, and a negative mammography.
Group C: 950 women without breast cancer, and a positive mammography.
Group D: 8,950 women without breast cancer, and a negative mammography.
As you can check, the sum of all four groups is still 10,000. The sum of groups A and B, the groups with breast cancer, corresponds to group 1; and the sum of groups C and D, the groups without breast cancer, corresponds to group 2; so administering a mammography does not actually change the number of women with breast cancer. The proportion of the cancer patients (A + B) within the complete set of patients (A + B + C + D) is the same as the 1% prior chance that a woman has cancer: (80 + 20) / (80 + 20 + 950 + 8,950) = 100 / 10,000 = 1%.
The proportion of cancer patients with positive results, within the group of all patients with positive results, is the proportion of (A) within (A + C): 80 / (80 + 950) = 80 / 1,030 = 7.8%. If you administer a mammography to 10,000 patients, then out of the 1,030 with positive mammographies, 80 of those positive-mammography patients will have cancer. This is the correct answer, the answer a doctor should give a positive-mammography patient if she asks about the chance she has breast cancer; if thirteen patients ask this question, roughly 1 out of those 13 will have cancer.
The most common mistake is to ignore the original fraction of women with breast cancer, and the fraction of women without breast cancer who receive false positives, and focus only on the fraction of women with breast cancer who get positive results. For example, the vast majority of doctors in these studies seem to have thought that if around 80% of women with breast cancer have positive mammographies, then the probability of a woman with a positive mammography having breast cancer must be around 80%. Figuring out the final answer always requires all three pieces of information: the percentage of women with breast cancer, the percentage of women without breast cancer who receive false positives, and the percentage of women with breast cancer who receive (correct) positives.
To see that the final answer always depends on the original fraction of women with breast cancer, consider an alternate universe in which only one woman out of a million has breast cancer. Even if mammography in this world detects breast cancer in 8 out of 10 cases, while returning a false positive on a woman without breast cancer in only 1 out of 10 cases, there will still be a hundred thousand false positives for every real case of cancer detected. The original probability that a woman has cancer is so extremely low that, although a positive result on the mammography does increase the estimated probability, the probability isn't increased to certainty or even "a noticeable chance"; the probability goes from 1:1,000,000 to 1:100,000.
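The sliding can be reproduced with one small helper function; the sketch below is my own illustration (the names are made up), directly encoding Bayes' Theorem:

```python
def posterior(prior, p_pos_given_cancer, p_pos_given_healthy):
    """P(cancer | positive test), from the prior and the two conditional probabilities."""
    p_positive = (prior * p_pos_given_cancer
                  + (1 - prior) * p_pos_given_healthy)
    return prior * p_pos_given_cancer / p_positive

# Original problem: 1% prior, 80% hit rate, 9.6% false-positive rate.
print(posterior(0.01, 0.8, 0.096))   # about 0.078

# Alternate universe: one woman in a million has cancer, 8/10 hit rate,
# 1/10 false-positive rate.
print(posterior(1e-6, 0.8, 0.1))     # about 8e-06; the result helps, but barely
```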
Similarly, in an alternate universe where only one out of a million women does not have breast cancer, a positive result on the patient's mammography obviously doesn't mean that she has an 80% chance of having breast cancer! If this were the case her estimated probability of having cancer would have been revised drastically downward after she got a positive result on her mammography; an 80% chance of having cancer is a lot less than 99.9999%! If you administer mammographies to ten million women in this world, around eight million women with breast cancer will get correct positive results, while one woman without breast cancer will get false positive results. Thus, if you got a positive mammography in this alternate universe, your chance of having cancer would go from 99.9999% up to 99.999987%. That is, your chance of being healthy would go from 1:1,000,000 down to 1:8,000,000.
These two extreme examples help demonstrate that the mammography result doesn't replace your old information about the patient's chance of having cancer; the mammography slides the estimated probability in the direction of the result. A positive result slides the original probability upward; a negative result slides the probability downward. For example, in the original problem where 1% of the women have cancer, 80% of women with cancer get positive mammographies, and 9.6% of women without cancer get positive mammographies, a positive result on the mammography slides the 1% chance upward to 7.8%.
Most people encountering problems of this type for the first time carry out the mental operation of replacing the original 1% probability with the 80% probability that a woman with cancer gets a positive mammography. It may seem like a good idea, but it just doesn't work. "The probability that a woman with a positive mammography has breast cancer" is not at all the same thing as "the probability that a woman with breast cancer has a positive mammography"; they are as unlike as apples and cheese. Finding the final answer, "the probability that a woman with a positive mammography has breast cancer", uses all three pieces of problem information: "the prior probability that a woman has breast cancer", "the probability that a woman with breast cancer gets a positive mammography", and "the probability that a woman without breast cancer gets a positive mammography".
Fun Fact!
Q. What is the Bayesian Conspiracy?
A. The Bayesian Conspiracy is a multinational, interdisciplinary, and shadowy group of scientists that controls publication, grants, tenure, and the illicit traffic in grad students. The best way to be accepted into the Bayesian Conspiracy is to join the Campus Crusade for Bayes in high school or college, and gradually work your way up to the inner circles. It is rumored that at the upper levels of the Bayesian Conspiracy exist nine silent figures known only as the Bayes Council.
To see that the final answer always depends on the chance that a woman without breast cancer gets a positive mammography, consider an alternate test, mammography+. Like the original test, mammography+ returns positive for 80% of women with breast cancer. However, mammography+ returns a positive result for only one out of a million women without breast cancer; mammography+ has the same rate of false negatives, but a vastly lower rate of false positives. Suppose a patient receives a positive mammography+. What is the chance that this patient has breast cancer? Under the new test, it is a virtual certainty: 99.988%, i.e., a 1 in 8082 chance of being healthy.
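That figure can be spot-checked from the stated rates; the computation below is my own sketch, not the essay's:

```python
prior = 0.01                 # 1% of screened women have breast cancer
hit_rate = 0.8               # mammography+ detects 80% of cancers
false_positive_rate = 1e-6   # one in a million healthy women test positive

p_positive = prior * hit_rate + (1 - prior) * false_positive_rate
p_cancer = prior * hit_rate / p_positive

print(round(p_cancer * 100, 3))   # 99.988 (percent)
print(round(1 / (1 - p_cancer)))  # 8082, i.e. a 1 in 8082 chance of being healthy
```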
Remember, at this point, that neither mammography nor mammography+ actually change the number of women who have breast cancer. It may seem like "There is a virtual certainty you have breast cancer" is a terrible thing to say, causing much distress and despair; that the more hopeful verdict of the previous mammography test, a 7.8% chance of having breast cancer, was much to be preferred. This comes under the heading of "Don't shoot the messenger". The number of women who really do have cancer stays exactly the same between the two cases. Only the accuracy with which we detect cancer changes. Under the previous mammography test, 80 women with cancer (who already had cancer, before the mammography) are first told that they have a 7.8% chance of having cancer, creating X amount of uncertainty and fear, after which more detailed tests will inform them that they definitely do have breast cancer. The old mammography test also involves informing 950 women without breast cancer that they have a 7.8% chance of having cancer, thus creating twelve times as much additional fear and uncertainty. The new test, mammography+, does not give 950 women false positives, and the 80 women with cancer are told the same facts they would have learned eventually, only earlier and without an intervening period of uncertainty. Mammography+ is thus a better test in terms of its total emotional impact on patients, as well as being more accurate. Regardless of its emotional impact, it remains a fact that a patient with positive mammography+ has a 99.988% chance of having breast cancer.
Of course, that mammography+ does not give 950 healthy women false positives means that almost all of the patients with positive mammography+ will be patients with breast cancer. Thus, if you have a positive mammography+, your chance of having cancer is a virtual certainty. It is because mammography+ does not generate as many false positives (and needless emotional stress) that the (much smaller) group of patients who do get positive results will be composed almost entirely of genuine cancer patients (who have bad news coming to them regardless of when it arrives).
Similarly, let's suppose that we have a less discriminating test, mammography*, that still has a 20% rate of false negatives, as in the original case. However, mammography* has an 80% rate of false positives. In other words, a patient without breast cancer has an 80% chance of getting a false positive result on her mammography* test. If we suppose the same 1% prior probability that a patient presenting herself for screening has breast cancer, what is the chance that a patient with positive mammography* has cancer?
Group 1: 100 patients with breast cancer.
Group 2: 9,900 patients without breast cancer.

After mammography* screening:

Group A: 80 patients with breast cancer and a "positive" mammography*.
Group B: 20 patients with breast cancer and a "negative" mammography*.
Group C: 7,920 patients without breast cancer and a "positive" mammography*.
Group D: 1,980 patients without breast cancer and a "negative" mammography*.
The result works out to 80 / 8,000, or 0.01. This is exactly the same as the 1% prior probability that a patient has breast cancer! A "positive" result on mammography* doesn't change the probability that a woman has breast cancer at all. You can similarly verify that a "negative" mammography* also counts for nothing. And in fact it must be this way, because if mammography* has an 80% hit rate for patients with breast cancer, and also an 80% rate of false positives for patients without breast cancer, then mammography* is completely uncorrelated with breast cancer. There's no reason to call one result "positive" and one result "negative"; in fact, there's no reason to call the test a "mammography". You can throw away your expensive mammography* equipment and replace it with a random number generator that outputs a red light 80% of the time and a green light 20% of the time; the results will be the same. Furthermore, there's no reason to call the red light a "positive" result or the green light a "negative" result. You could have a green light 80% of the time and a red light 20% of the time, or a blue light 80% of the time and a purple light 20% of the time, and it would all have the same bearing on whether the patient has breast cancer: i.e., no bearing whatsoever.
We can show algebraically that this must hold for any case where the chance of a true positive and the chance of a false positive are the same, i.e.:

Group 1: 100 patients with breast cancer.
Group 2: 9,900 patients without breast cancer.

Now consider a test where the probability of a true positive and the probability of a false positive are the same number M (in the example above, M = 80% or M = 0.8):
Group A: 100*M patients with breast cancer and a "positive" result.
Group B: 100*(1 - M) patients with breast cancer and a "negative" result.
Group C: 9,900*M patients without breast cancer and a "positive" result.
Group D: 9,900*(1 - M) patients without breast cancer and a "negative" result.
The proportion of patients with breast cancer, within the group of patients with a "positive" result, then equals 100*M / (100*M + 9,900*M) = 100 / (100 + 9,900) = 1%. This holds true regardless of whether M is 80%, 30%, 50%, or 100%. If we have a mammography* test that returns "positive" results for 90% of patients with breast cancer and returns "positive" results for 90% of patients without breast cancer, the proportion of "positive"-testing patients who have breast cancer will still equal the original proportion of patients with breast cancer, i.e., 1%.
You can run through the same algebra, replacing the prior proportion of patients with breast cancer with an arbitrary percentage P:

Group 1: Within some number of patients, a fraction P have breast cancer.
Group 2: Within some number of patients, a fraction (1 - P) do not have breast cancer.

After a "cancer test" that returns "positive" for a fraction M of patients with breast cancer, and also returns "positive" for the same fraction M of patients without cancer:

Group A: P*M patients have breast cancer and a "positive" result.
Group B: P*(1 - M) patients have breast cancer and a "negative" result.
Group C: (1 - P)*M patients have no breast cancer and a "positive" result.
Group D: (1 - P)*(1 - M) patients have no breast cancer and a "negative" result.
The chance that a patient with a "positive" result has breast cancer is then the proportion of group A within the combined group A + C, or P*M / [P*M + (1 - P)*M], which, cancelling the common factor M from the numerator and denominator, is P / [P + (1 - P)] or P / 1 or just P. If the rate of false positives is the same as the rate of true positives, you always have the same probability after the test as when you started.
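The cancellation can be spot-checked with exact rational arithmetic. This snippet is my own illustration of the algebra above, with names of my choosing:

```python
from fractions import Fraction

def posterior(P, true_pos_rate, false_pos_rate):
    # Chance of breast cancer given a "positive" result.
    return P * true_pos_rate / (P * true_pos_rate + (1 - P) * false_pos_rate)

# Whenever the true-positive rate and the false-positive rate are the same
# number M, the posterior equals the prior P exactly, for any P and nonzero M.
for P in (Fraction(1, 100), Fraction(2, 5), Fraction(999, 1000)):
    for M in (Fraction(3, 10), Fraction(1, 2), Fraction(4, 5), Fraction(1)):
        assert posterior(P, M, M) == P
print("posterior == prior in every case")
```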
Which is common sense. Take, for example, the "test" of flipping a coin; if the coin comes up heads, does it tell you anything about whether a patient has breast cancer? No; the coin has a 50% chance of coming up heads if the patient has breast cancer, and also a 50% chance of coming up heads if the patient does not have breast cancer. Therefore there is no reason to call either heads or tails a "positive" result. It's not the probability being "50/50" that makes the coin a bad test; it's that the two probabilities, for "cancer patient turns up heads" and "healthy patient turns up heads", are the same. If the coin was slightly biased, so that it had a 60% chance of coming up heads, it still wouldn't be a cancer test; what makes a coin a poor test is not that it has a 50/50 chance of coming up heads if the patient has cancer, but that it also has a 50/50 chance of coming up heads if the patient does not have cancer.
You can even use a test that comes up "positive" for cancer patients 100% of the time, and still not learn anything. An example of such a test is "Add 2 + 2 and see if the answer is 4." This test returns positive 100% of the time for patients with breast cancer. It also returns positive 100% of the time for patients without breast cancer. So you learn nothing.
The original proportion of patients with breast cancer is known as the prior probability. The chance that a patient with breast cancer gets a positive mammography, and the chance that a patient without breast cancer gets a positive mammography, are known as the two conditional probabilities. Collectively, this initial information is known as the priors. The final answer, the estimated probability that a patient has breast cancer given that we know she has a positive result on her mammography, is known as the revised probability or the posterior probability. What we've just shown is that if the two conditional probabilities are equal, the posterior probability equals the prior probability.
Fun Fact!
Q. How can I find the priors for a problem?
A. Many commonly used priors are listed in the Handbook of Chemistry and Physics.
Q. Where do priors originally come from?
A. Never ask that question.
Q. Uh huh. Then where do scientists get their priors?
A. Priors for scientific problems are established by annual vote of the AAAS. In recent years the vote has become fractious and controversial, with widespread acrimony, factional polarization, and several outright assassinations. This may be a front for infighting within the Bayes Council, or it may be that the disputants have too much spare time. No one is really sure.
Q. I see. And where does everyone else get their priors?
A. They download their priors from Kazaa.
Q. What if the priors I want aren't available on Kazaa?
A. There's a small, cluttered antique shop in a back alley of San Francisco's Chinatown. Don't ask about the bronze rat.
Actually, priors are true or false just like the final answer; they reflect reality and can be judged by comparing them against reality. For example, if you think that 920 out of 10,000 women in a sample have breast cancer, and the actual number is 100 out of 10,000, then your priors are wrong. For our particular problem, the priors might have been established by three studies: a study on the case histories of women with breast cancer to see how many of them tested positive on a mammography, a study on women without breast cancer to see how many of them test positive on a mammography, and an epidemiological study on the prevalence of breast cancer in some specific demographic.
Suppose that a barrel contains many small plastic eggs. Some eggs are painted red and some are painted blue. 40% of the eggs in the bin contain pearls, and 60% contain nothing. 30% of eggs containing pearls are painted blue, and 10% of eggs containing nothing are painted blue. What is the probability that a blue egg contains a pearl? For this example the arithmetic is simple enough that you may be able to do it in your head, and I would suggest trying to do so.
A more compact way oI speciIying the problem:
- p(pearl) = 40%
- p(blue|pearl) = 30%
- p(blue|~pearl) = 10%
p(pearl|blue) = ?
~ is shorthand for "not", so ~pearl reads "not-pearl". blue|pearl is shorthand for "blue given pearl" or "the probability that an egg is painted blue, given that the egg contains a pearl". One thing that's confusing about this notation is that the order of implication is read right-to-left, as in Hebrew or Arabic. blue|pearl means "blue<-pearl", the degree to which pearl-ness implies blue-ness, not the degree to which blue-ness implies pearl-ness. This is confusing, but it's unfortunately the standard notation in probability theory.
Readers familiar with quantum mechanics will have already encountered this peculiarity; in quantum mechanics, for example, <d|c><c|b><b|a> reads as "the probability that a particle at A goes to B, then to C, ending up at D". To follow the particle, you move your eyes from right to left. Reading from left to right, "|" means "given"; reading from right to left, "|" means "implies" or "leads to". Thus, moving your eyes from left to right, blue|pearl reads "blue given pearl" or "the probability that an egg is painted blue, given that the egg contains a pearl". Moving your eyes from right to left, blue|pearl reads "pearl implies blue" or "the probability that an egg containing a pearl is painted blue".
The item on the right side is what you already know or the premise, and the item on the left side is the implication or conclusion. If we have p(blue|pearl) = 30%, and we already know that some egg contains a pearl, then we can conclude there is a 30% chance that the egg is painted blue. Thus, the final fact we're looking for, "the chance that a blue egg contains a pearl" or "the probability that an egg contains a pearl, if we know the egg is painted blue", reads p(pearl|blue).
Let's return to the problem. We have that 40% of the eggs contain pearls, and 60% of the eggs contain nothing. 30% of the eggs containing pearls are painted blue, so 12% of the eggs altogether contain pearls and are painted blue. 10% of the eggs containing nothing are painted blue, so altogether 6% of the eggs contain nothing and are painted blue. A total of 18% of the eggs are painted blue, and a total of 12% of the eggs are painted blue and contain pearls, so the chance a blue egg contains a pearl is 12/18 or 2/3 or around 67%.
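The same arithmetic, as a short exact computation (a sketch of mine, not from the original page):

```python
from fractions import Fraction

p_pearl = Fraction(2, 5)              # 40% of eggs contain pearls
p_blue_given_pearl = Fraction(3, 10)  # 30% of pearl eggs are blue
p_blue_given_empty = Fraction(1, 10)  # 10% of empty eggs are blue

p_blue_and_pearl = p_pearl * p_blue_given_pearl         # 12% of all eggs
p_blue_and_empty = (1 - p_pearl) * p_blue_given_empty   # 6% of all eggs
p_pearl_given_blue = p_blue_and_pearl / (p_blue_and_pearl + p_blue_and_empty)

print(p_pearl_given_blue)  # 2/3, about 67%
```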
The applet below, courtesy of Christian Rovner, shows a graphic representation of this problem:

(Are you having trouble seeing this applet? Do you see an image of the applet rather than the applet itself? Try downloading an updated Java.)

2. http://www.java.com/en/index.jsp
Looking at this applet, it's easier to see why the final answer depends on all three probabilities; it's the differential pressure between the two conditional probabilities, p(blue|pearl) and p(blue|~pearl), that slides the prior probability p(pearl) to the posterior probability p(pearl|blue).

As before, we can see the necessity of all three pieces of information by considering extreme cases (feel free to type them into the applet). In a (large) barrel in which only one egg out of a thousand contains a pearl, knowing that an egg is painted blue slides the probability from 0.1% to 0.3% (instead of sliding the probability from 40% to 67%). Similarly, if 999 out of 1000 eggs contain pearls, knowing that an egg is blue slides the probability from 99.9% to 99.966%; the probability that the egg does not contain a pearl goes from 1/1000 to around 1/3000. Even when the prior probability changes, the differential pressure of the two conditional probabilities always slides the probability in the same direction. If you learn the egg is painted blue, the probability the egg contains a pearl always goes up, but it goes up from the prior probability, so you need to know the prior probability in order to calculate the final answer. 0.1% goes up to 0.3%, 10% goes up to 25%, 40% goes up to 67%, 80% goes up to 92%, and 99.9% goes up to 99.966%. If you're interested in knowing how any other probabilities slide, you can type your own prior probability into the Java applet. You can also click and drag the dividing line between pearl and ~pearl in the upper bar, and watch the posterior probability change in the bottom bar.
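To reproduce the slides listed above without the applet, a small helper function (my own sketch, not part of the original) applies the same two conditional probabilities to any prior:

```python
def posterior(prior, p_blue_pearl=0.30, p_blue_empty=0.10):
    """Slide a prior p(pearl) to p(pearl|blue) for the 30%/10% egg test."""
    num = prior * p_blue_pearl
    return num / (num + (1 - prior) * p_blue_empty)

# The same slides described in the text: 0.1% -> 0.3%, 10% -> 25%,
# 40% -> 67%, 80% -> 92%, 99.9% -> 99.966%.
for prior in (0.001, 0.10, 0.40, 0.80, 0.999):
    print(f"p(pearl) = {prior:.3f} slides to p(pearl|blue) = {posterior(prior):.5f}")
```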
Studies of clinical reasoning show that most doctors carry out the mental operation of replacing the original 1% probability with the 80% probability that a woman with cancer would get a positive mammography. Similarly, on the pearl-egg problem, most respondents unfamiliar with Bayesian reasoning would probably respond that the probability a blue egg contains a pearl is 30%, or perhaps 20% (the 30% chance of a true positive minus the 10% chance of a false positive). Even if this mental operation seems like a good idea at the time, it makes no sense in terms of the question asked. It's like the experiment in which you ask a second-grader: "If eighteen people get on a bus, and then seven more people get on the bus, how old is the bus driver?" Many second-graders will respond: "Twenty-five." They understand when they're being prompted to carry out a particular mental procedure, but they haven't quite connected the procedure to reality. Similarly, to find the probability that a woman with a positive mammography has breast cancer, it makes no sense whatsoever to replace the original probability that the woman has cancer with the probability that a woman with breast cancer gets a positive mammography. Neither can you subtract the probability of a false positive from the probability of the true positive. These operations are as wildly irrelevant as adding the number of people on the bus to find the age of the bus driver.
I keep emphasizing the idea that evidence slides probability because of research that shows people tend to use spatial intuitions to grasp numbers. In particular, there's interesting evidence that we have an innate sense of quantity that's localized to left inferior parietal cortex; patients with damage to this area can selectively lose their sense of whether 5 is less than 8, while retaining their ability to read, write, and so on. (Yes, really!) The parietal cortex processes our sense of where things are in space (roughly speaking), so an innate "number line", or rather "quantity line", may be responsible for the human sense of numbers. This is why I suggest visualizing Bayesian evidence as sliding the probability along the number line; my hope is that this will translate Bayesian reasoning into something that makes sense to innate human brainware. (That, really, is what an "intuitive explanation" is.) For more information, see Stanislas Dehaene's The Number Sense.
A study by Gigerenzer and Hoffrage in 1995 showed that some ways of phrasing story problems are much more evocative of correct Bayesian reasoning. The least evocative phrasing used probabilities. A slightly more evocative phrasing used frequencies instead of probabilities; the problem remained the same, but instead of saying that 1% of women had breast cancer, one would say that 1 out of 100 women had breast cancer, that 80 out of 100 women with breast cancer would get a positive mammography, and so on. Why did a higher proportion of subjects display Bayesian reasoning on this problem? Probably because saying "1 out of 100 women" encourages you to concretely visualize X women with cancer, leading you to visualize X women with cancer and a positive mammography, etc.
The most effective presentation found so far is what's known as natural frequencies: saying that 40 out of 100 eggs contain pearls, 12 out of 40 eggs containing pearls are painted blue, and 6 out of 60 eggs containing nothing are painted blue. A natural frequencies presentation is one in which the information about the prior probability is included in presenting the conditional probabilities. If you were just learning about the eggs' conditional probabilities through natural experimentation, you would, in the course of cracking open a hundred eggs, crack open around 40 eggs containing pearls, of which 12 eggs would be painted blue, while cracking open 60 eggs containing nothing, of which about 6 would be painted blue. In the course of learning the conditional probabilities, you'd see examples of blue eggs containing pearls about twice as often as you saw examples of blue eggs containing nothing.
It may seem like presenting the problem in this way is "cheating", and indeed if it were a story problem in a math book, it probably would be cheating. However, if you're talking about real doctors, you want to cheat; you want the doctors to draw the right conclusions as easily as possible. The obvious next move would be to present all medical statistics in terms of natural frequencies. Unfortunately, while natural frequencies are a step in the right direction, it probably won't be enough. When problems are presented in natural frequencies, the proportion of people using Bayesian reasoning rises to around half. A big improvement, but not big enough when you're talking about real doctors and real patients.
A presentation of the problem in natural frequencies might be visualized like this:

In the frequency visualization, the selective attrition of the two conditional probabilities changes the proportion of eggs that contain pearls. The bottom bar is shorter than the top bar, just as the number of eggs painted blue is less than the total number of eggs. The probability graph shown earlier is really just the frequency graph with the bottom bar "renormalized", stretched out to the same length as the top bar. In the frequency applet you can change the conditional probabilities by clicking and dragging the left and right edges of the graph. (For example, to change the conditional probability blue|pearl, click and drag the line on the left that stretches from the left edge of the top bar to the left edge of the bottom bar.)
In the probability applet, you can see that when the conditional probabilities are equal, there's no differential pressure (the arrows are the same size) so the prior probability doesn't slide between the top bar and the bottom bar. But the bottom bar in the probability applet is just a renormalized (stretched out) version of the bottom bar in the frequency applet, and the frequency applet shows why the probability doesn't slide if the two conditional probabilities are equal. Here's a case where the prior proportion of pearls remains 40%, and the proportion of pearl eggs painted blue remains 30%, but the number of empty eggs painted blue is also 30%:

If you diminish two shapes by the same factor, their relative proportion will be the same as before. If you diminish the left section of the top bar by the same factor as the right section, then the bottom bar will have the same proportions as the top bar; it'll just be smaller. If the two conditional probabilities are equal, learning that the egg is blue doesn't change the probability that the egg contains a pearl, for the same reason that similar triangles have identical angles; geometric figures don't change shape when you shrink them by a constant factor.
In this case, you might as well just say that 30% of eggs are painted blue, since the probability of an egg being painted blue is independent of whether the egg contains a pearl. Applying a "test" that is statistically independent of its condition just shrinks the sample size. In this case, requiring that the egg be painted blue doesn't shrink the group of eggs with pearls any more or less than it shrinks the group of eggs without pearls. It just shrinks the total number of eggs in the sample.
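A quick numerical check of this point, using the 40%/30%/30% numbers from the case above (a sketch of mine, not from the original):

```python
p_pearl = 0.40
p_blue = 0.30  # paint is independent of pearls: both conditionals are 30%

# Both sectors shrink by the same factor of 0.30...
blue_and_pearl = p_pearl * p_blue
blue_total = p_pearl * p_blue + (1 - p_pearl) * p_blue

# ...so the proportion of pearls among blue eggs equals the prior.
p_pearl_given_blue = blue_and_pearl / blue_total
```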
Fun Fact!
Q. Why did the Bayesian reasoner cross the road?
A. You need more information to answer this question.
Here's what the original medical problem looks like when graphed. 1% of women have breast cancer, 80% of those women test positive on a mammography, and 9.6% of women without breast cancer also receive positive mammographies.

As is now clearly visible, the mammography doesn't increase the probability a positive-testing woman has breast cancer by increasing the number of women with breast cancer; of course not, if mammography increased the number of women with breast cancer, no one would ever take the test! However, requiring a positive mammography is a membership test that eliminates many more women without breast cancer than women with cancer. The number of women without breast cancer diminishes by a factor of more than ten, from 9,900 to 950, while the number of women with breast cancer is diminished only from 100 to 80. Thus, the proportion of 80 within 1,030 is much larger than the proportion of 100 within 10,000. In the graph, the left sector (representing women with breast cancer) is small, but the mammography test projects almost all of this sector into the bottom bar. The right sector (representing women without breast cancer) is large, but the mammography test projects a much smaller fraction of this sector into the bottom bar. There are, indeed, fewer women with breast cancer and positive mammographies than there are women with breast cancer, obeying the law of probabilities which requires that p(A) >= p(A&B). But even though the left sector in the bottom bar is actually slightly smaller, the proportion of the left sector within the bottom bar is greater, though still not very great. If the bottom bar were renormalized to the same length as the top bar, it would look like the left sector had expanded. This is why the proportion of "women with breast cancer" in the group "women with positive mammographies" is higher than the proportion of "women with breast cancer" in the general population, although the proportion is still not very high. The evidence of the positive mammography slides the prior probability of 1% to the posterior probability of 7.8%.
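Worked through in Python with the numbers from this section (my own check; the counts in the comments correspond to the text's population of 10,000 women):

```python
p_cancer = 0.01
p_pos_given_cancer = 0.80
p_pos_given_healthy = 0.096

true_pos = p_cancer * p_pos_given_cancer          # 80 of 10,000 women
false_pos = (1 - p_cancer) * p_pos_given_healthy  # about 950 of 10,000 women
p_cancer_given_pos = true_pos / (true_pos + false_pos)
print(f"{p_cancer_given_pos:.1%}")  # prints 7.8%, matching the text
```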
Suppose there's yet another variant of the mammography test, mammographyC, which behaves as follows. 1% of women in a certain demographic have breast cancer. Like ordinary mammography, mammographyC returns positive 9.6% of the time for women without breast cancer. However, mammographyC returns positive 0% of the time (say, once in a billion) for women with breast cancer. The graph for this scenario looks like this:

What is it that this test actually does? If a patient comes to you with a positive result on her mammographyC, what do you say? "Congratulations, you're among the rare 9.5% of the population whose health is definitely established by this test." MammographyC isn't a cancer test; it's a health test! Few women without breast cancer get positive results on mammographyC, but only women without breast cancer ever get positive results at all. Not much of the right sector of the top bar projects into the bottom bar, but none of the left sector projects into the bottom bar. So a positive result on mammographyC means you definitely don't have breast cancer.
What makes ordinary mammography a positive indicator for breast cancer is not that someone named the result "positive", but rather that the test result stands in a specific Bayesian relation to the condition of breast cancer. You could call the same result "positive" or "negative" or "blue" or "red" or "James Rutherford", or give it no name at all, and the test result would still slide the probability in exactly the same way. To minimize confusion, a test result which slides the probability of breast cancer upward should be called "positive". A test result which slides the probability of breast cancer downward should be called "negative". If the test result is statistically unrelated to the presence or absence of breast cancer, if the two conditional probabilities are equal, then we shouldn't call the procedure a "cancer test"! The meaning of the test is determined by the two conditional probabilities; any names attached to the results are simply convenient labels.
The bottom bar for the graph of mammographyC is small; mammographyC is a test that's only rarely useful. Or rather, the test only rarely gives strong evidence, and most of the time gives weak evidence. A negative result on mammographyC does slide probability; it just doesn't slide it very far. Click the "Result" switch at the bottom left corner of the applet to see what a negative result on mammographyC would imply. You might intuit that since the test could have returned positive for health, but didn't, then the failure of the test to return positive must mean that the woman has a higher chance of having breast cancer: that her probability of having breast cancer must be slid upward by the negative result on her health test.
This intuition is correct! The sum of the groups with negative results and positive results must always equal the group of all women. If the positive-testing group has more than its "fair share" of women without breast cancer, there must be an at least slightly higher proportion of women with cancer in the negative-testing group. A positive result is rare but very strong evidence in one direction, while a negative result is common but very weak evidence in the opposite direction. You might call this the Law of Conservation of Probability; not a standard term, but the conservation rule is exact. If you take the revised probability of breast cancer after a positive result, times the probability of a positive result, and add that to the revised probability of breast cancer after a negative result, times the probability of a negative result, then you must always arrive at the prior probability. If you don't yet know what the test result is, the expected revised probability after the test result arrives, taking both possible results into account, should always equal the prior probability.
On ordinary mammography, the test is expected to return "positive" 10.3% of the time: 80 positive women with cancer plus 950 positive women without cancer equals 1,030 women with positive results. Conversely, the mammography should return negative 89.7% of the time: 100% - 10.3% = 89.7%. A positive result slides the revised probability from 1% to 7.8%, while a negative result slides the revised probability from 1% to 0.22%. So p(cancer|positive)*p(positive) + p(cancer|negative)*p(negative) = 7.8%*10.3% + 0.22%*89.7% = 1% = p(cancer), as expected.
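The conservation rule is easy to verify numerically; this sketch of mine recomputes everything from the problem's three given numbers:

```python
p_cancer = 0.01
hit_rate, false_pos_rate = 0.80, 0.096

p_positive = p_cancer * hit_rate + (1 - p_cancer) * false_pos_rate  # 0.10304
p_cancer_pos = p_cancer * hit_rate / p_positive                     # about 7.8%
p_cancer_neg = p_cancer * (1 - hit_rate) / (1 - p_positive)         # about 0.22%

# Expected posterior over both possible results equals the prior exactly.
recovered_prior = p_cancer_pos * p_positive + p_cancer_neg * (1 - p_positive)
```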
Why "as expected"? Let's take a look at the quantities involved:

p(cancer): 0.01 (Group 1: 100 women with breast cancer)
p(~cancer): 0.99 (Group 2: 9,900 women without breast cancer)
p(positive|cancer): 80.0% (80% of women with breast cancer have positive mammographies)
p(~positive|cancer): 20.0% (20% of women with breast cancer have negative mammographies)
p(positive|~cancer): 9.6% (9.6% of women without breast cancer have positive mammographies)
p(~positive|~cancer): 90.4% (90.4% of women without breast cancer have negative mammographies)
p(cancer&positive): 0.008 (Group A: 80 women with breast cancer and positive mammographies)
p(cancer&~positive): 0.002 (Group B: 20 women with breast cancer and negative mammographies)
p(~cancer&positive): 0.095 (Group C: 950 women without breast cancer and positive mammographies)
p(~cancer&~positive): 0.895 (Group D: 8,950 women without breast cancer and negative mammographies)
p(positive): 0.103 (1,030 women with positive results)
p(~positive): 0.897 (8,970 women with negative results)
p(cancer|positive): 7.80% (chance you have breast cancer if mammography is positive)
p(~cancer|positive): 92.20% (chance you are healthy if mammography is positive)
p(cancer|~positive): 0.22% (chance you have breast cancer if mammography is negative)
p(~cancer|~positive): 99.78% (chance you are healthy if mammography is negative)
One of the common confusions in using Bayesian reasoning is to mix up some or all of these quantities, which, as you can see, are all numerically different and have different meanings. p(A&B) is the same as p(B&A), but p(A|B) is not the same thing as p(B|A), and p(A&B) is completely different from p(A|B). (I don't know who chose the symmetrical | symbol to mean "implies", and then made the direction of implication right-to-left, but it was probably a bad idea.)

To get acquainted with all these quantities and the relationships between them, we'll play "follow the degrees of freedom". For example, the two quantities p(cancer) and p(~cancer) have 1 degree of freedom between them, because of the general law p(A) + p(~A) = 1. If you know that p(~cancer) = .99, you can obtain p(cancer) = 1 - p(~cancer) = .01. There's no room to say that p(~cancer) = .99 and then also specify p(cancer) = .25; it would violate the rule p(A) + p(~A) = 1.
p(positive|cancer) and p(~positive|cancer) also have only one degree of freedom between them; either a woman with breast cancer gets a positive mammography or she doesn't. On the other hand, p(positive|cancer) and p(positive|~cancer) have two degrees of freedom. You can have a mammography test that returns positive for 80% of cancerous patients and 9.6% of healthy patients, or that returns positive for 70% of cancerous patients and 2% of healthy patients, or even a health test that returns "positive" for 30% of cancerous patients and 92% of healthy patients. The two quantities, the output of the mammography test for cancerous patients and the output of the mammography test for healthy patients, are in mathematical terms independent; one cannot be obtained from the other in any way, and so they have two degrees of freedom between them.
What about p(positive&cancer), p(positive|cancer), and p(cancer)? Here we have three quantities; how many degrees of freedom are there? In this case the equation that must hold is p(positive&cancer) = p(positive|cancer) * p(cancer). This equality reduces the degrees of freedom by one. If we know the fraction of patients with cancer, and the chance that a cancerous patient has a positive mammography, we can deduce the fraction of patients who have breast cancer and a positive mammography by multiplying. You should recognize this operation from the graph; it's the projection of the top bar into the bottom bar. p(cancer) is the left sector of the top bar, and p(positive|cancer) determines how much of that sector projects into the bottom bar, and the left sector of the bottom bar is p(positive&cancer).
Similarly, if we know the number of patients with breast cancer and positive mammographies, and also the number of patients with breast cancer, we can estimate the chance that a woman with breast cancer gets a positive mammography by dividing: p(positive|cancer) = p(positive&cancer) / p(cancer). In fact, this is exactly how such medical diagnostic tests are calibrated; you do a study on 8,520 women with breast cancer and see that there are 6,816 (or thereabouts) women with breast cancer and positive mammographies, then divide 6,816 by 8,520 to find that 80% of women with breast cancer had positive mammographies. (Incidentally, if you accidentally divide 8,520 by 6,816 instead of the other way around, your calculations will start doing strange things, such as insisting that 125% of women with breast cancer and positive mammographies have breast cancer. This is a common mistake in carrying out Bayesian arithmetic, in my experience.) And finally, if you know p(positive&cancer) and p(positive|cancer), you can deduce how many cancer patients there must have been originally. There are two degrees of freedom shared out among the three quantities; if we know any two, we can deduce the third.
How about p(positive), p(positive&cancer), and p(positive&~cancer)? Again there are only two degrees of freedom among these three variables. The equation occupying the extra degree of freedom is p(positive) = p(positive&cancer) + p(positive&~cancer). This is how p(positive) is computed to begin with; we figure out the number of women with breast cancer who have positive mammographies, and the number of women without breast cancer who have positive mammographies, then add them together to get the total number of women with positive mammographies. It would be very strange to go out and conduct a study to determine the number of women with positive mammographies, just that one number and nothing else, but in theory you could do so. And if you then conducted another study and found the number of those women who had positive mammographies and breast cancer, you would also know the number of women with positive mammographies and no breast cancer; either a woman with a positive mammography has breast cancer or she doesn't. In general, p(A&B) + p(A&~B) = p(A). Symmetrically, p(A&B) + p(~A&B) = p(B).
What about p(positive&cancer), p(positive&~cancer), p(~positive&cancer), and p(~positive&~cancer)? You might at first be tempted to think that there are only two degrees of freedom for these four quantities; that you can, for example, get p(positive&~cancer) by multiplying p(positive) * p(~cancer), and thus that all four quantities can be found given only the two quantities p(positive) and p(cancer). This is not the case! p(positive&~cancer) = p(positive) * p(~cancer) only if the two probabilities are statistically independent, if the chance that a woman has breast cancer has no bearing on whether she has a positive mammography. As you'll recall, this amounts to requiring that the two conditional probabilities be equal to each other, a requirement which would eliminate one degree of freedom. If you remember that these four quantities are the groups A, B, C, and D, you can look over those four groups and realize that, in theory, you can put any number of people into the four groups. If you start with a group of 80 women with breast cancer and positive mammographies, there's no reason why you can't add another group of 500 women with breast cancer and negative mammographies, followed by a group of women without breast cancer and negative mammographies, and so on. So now it seems like the four quantities have four degrees of freedom. And they would, except that in expressing them as probabilities, we need to normalize them to fractions of the complete group, which adds the constraint that p(positive&cancer) + p(positive&~cancer) + p(~positive&cancer) + p(~positive&~cancer) = 1. This equation takes up one degree of freedom, leaving three degrees of freedom among the four quantities. If you specify the fractions of women in groups A, B, and D, you can deduce the fraction of women in group C.
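The normalization constraint can be seen directly by turning group counts into fractions; the counts below are the ones from the mammography example, but any four counts would do (the code itself is my sketch):

```python
# Counts for groups A, B, C, D from the mammography example.
A, B, C, D = 80, 20, 950, 8950
total = A + B + C + D

fracs = {name: count / total for name, count in zip("ABCD", (A, B, C, D))}

# Expressing the groups as fractions forces them to sum to 1...
assert abs(sum(fracs.values()) - 1.0) < 1e-12
# ...so specifying the fractions in A, B, and D determines C.
deduced_C = 1.0 - (fracs["A"] + fracs["B"] + fracs["D"])
```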
Given the four groups A, B, C, and D, it is very straightforward to compute everything else: p(cancer) = A + B, p(~positive|cancer) = B / (A + B), and so on. Since ABCD contains three degrees of freedom, it follows that the entire set of 16 probabilities contains only three degrees of freedom. Remember that in our problems we always needed three pieces of information: the prior probability and the two conditional probabilities, which, indeed, have three degrees of freedom among them. Actually, for Bayesian problems, any three quantities with three degrees of freedom between them should logically specify the entire problem. For example, let's take a barrel of eggs with p(blue) = 0.40, p(blue|pearl) = 5/13, and p(~blue&~pearl) = 0.20. Given this information, you can compute p(pearl|blue).
As a story problem:

Suppose you have a large barrel containing a number of plastic eggs. Some eggs contain pearls, the rest contain nothing. Some eggs are painted blue, the rest are painted red. Suppose that 40% of the eggs are painted blue, 5/13 of the eggs containing pearls are painted blue, and 20% of the eggs are both empty and painted red. What is the probability that an egg painted blue contains a pearl?

Try it. I assure you it is possible. You probably shouldn't try to solve this with just a Javascript calculator, though. I used a Python console. (In theory, pencil and paper should also work, but I don't know anyone who owns a pencil so I couldn't try it personally.)
As a check on your calculations, does the (meaningless) quantity p(~pearl|~blue)/p(pearl) roughly equal 0.51? (In story problem terms: the likelihood that a red egg is empty, divided by the likelihood that an egg contains a pearl, equals approximately 0.51.) Of course, using this information in the problem would be cheating.
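Following the author's lead, here is one way to grind through the story problem in a Python console; be warned that it reveals the answer, so try it yourself first. The intermediate steps and names are mine.

```python
p_blue = 0.40
p_blue_given_pearl = 5 / 13
p_empty_and_red = 0.20  # p(~pearl & ~blue)

# 60% of eggs are red; 20% are red and empty, so 40% are red with pearls.
p_pearl_and_red = (1 - p_blue) - p_empty_and_red
# p(pearl & ~blue) = p(pearl) * (1 - 5/13), so divide to recover p(pearl).
p_pearl = p_pearl_and_red / (1 - p_blue_given_pearl)
p_pearl_given_blue = p_pearl * p_blue_given_pearl / p_blue  # the answer

# The sanity check suggested in the text: should be roughly 0.51.
check = (p_empty_and_red / (1 - p_blue)) / p_pearl
```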
If you can solve that problem, then when we revisit Conservation of Probability, it seems perfectly straightforward. Of course the mean revised probability, after administering the test, must be the same as the prior probability. Of course strong but rare evidence in one direction must be counterbalanced by common but weak evidence in the other direction.

Because:

p(cancer|positive)*p(positive)
+ p(cancer|~positive)*p(~positive)
= p(cancer)

In terms of the four groups:

p(cancer|positive) = A / (A + C)
p(positive) = A + C
p(cancer&positive) = A

p(cancer|~positive) = B / (B + D)
p(~positive) = B + D
p(cancer&~positive) = B

p(cancer) = A + B
Let's return to the original barrel of eggs: 40% of the eggs containing pearls, 30% of the pearl eggs painted blue, 10% of the empty eggs painted blue. The graph for this problem is:

What happens to the revised probability, p(pearl|blue), if the proportion of eggs containing pearls is kept constant, but 60% of the eggs with pearls are painted blue (instead of 30%), and 20% of the empty eggs are painted blue (instead of 10%)? You could type 60% and 20% into the inputs for the two conditional probabilities, and see how the graph changes, but can you figure out in advance what the change will look like?
If you guessed that the revised probability remains the same, because the bottom bar grows by a factor of 2 but retains the same proportions, congratulations! Take a moment to think about how far you've come. Looking at a problem like

1% of women have breast cancer. 80% of women with breast cancer get positive mammographies. 9.6% of women without breast cancer get positive mammographies. If a woman has a positive mammography, what is the probability she has breast cancer?

the vast majority of respondents intuit that around 70-80% of women with positive mammographies have breast cancer. Now, looking at a problem like

Suppose there are two barrels containing many small plastic eggs. In both barrels, some eggs are painted blue and the rest are painted red. In both barrels, 40% of the eggs contain pearls and the rest are empty. In the first barrel, 30% of the pearl eggs are painted blue, and 10% of the empty eggs are painted blue. In the second barrel, 60% of the pearl eggs are painted blue, and 20% of the empty eggs are painted blue. Would you rather have a blue egg from the first or second barrel?

you can see it's intuitively obvious that the probability of a blue egg containing a pearl is the same for either barrel. Imagine how hard it would be to see that using the old way of thinking!
It's intuitively obvious, but how to prove it? Suppose that we call P the prior probability that an egg contains a pearl, that we call M the first conditional probability (that a pearl egg is painted blue), and N the second conditional probability (that an empty egg is painted blue). Suppose that M and N are both increased or diminished by an arbitrary factor X; for example, in the problem above, they are both increased by a factor of 2. Does the revised probability that an egg contains a pearl, given that we know the egg is blue, stay the same?

- p(pearl) = P
- p(blue|pearl) = M*X
- p(blue|~pearl) = N*X
- p(pearl|blue) = ?
From these quantities, we get the four groups:

Group A: p(pearl&blue) = P*M*X
Group B: p(pearl&~blue) = P*(1 - (M*X))
Group C: p(~pearl&blue) = (1 - P)*N*X
Group D: p(~pearl&~blue) = (1 - P)*(1 - (N*X))

The proportion of eggs that contain pearls and are blue, within the group of all blue eggs, is then the proportion of group (A) within the group (A + C), equalling P*M*X / (P*M*X + (1 - P)*N*X). The factor X in the numerator and denominator cancels out, so increasing or diminishing both conditional probabilities by a constant factor doesn't change the revised probability.
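The cancellation can also be confirmed numerically; this sketch of mine scales both conditional probabilities by several factors X and checks that the posterior never moves:

```python
def p_pearl_given_blue(P, M, N):
    """Posterior from prior P and conditionals M = p(blue|pearl), N = p(blue|~pearl)."""
    return P * M / (P * M + (1 - P) * N)

P, M, N = 0.40, 0.30, 0.10
baseline = p_pearl_given_blue(P, M, N)
for X in (0.5, 1.0, 2.0):
    scaled = p_pearl_given_blue(P, M * X, N * X)
    assert abs(scaled - baseline) < 1e-12  # the factor X cancels out
```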
Fun Fact!
Q. Suppose that there are two barrels, each containing a number of plastic eggs. In both barrels, some eggs are painted blue and the rest are painted red. In the first barrel, 90% of the eggs contain pearls and 20% of the pearl eggs are painted blue. In the second barrel, 45% of the eggs contain pearls and 60% of the empty eggs are painted red. Would you rather have a blue pearl egg from the first or second barrel?
A. Actually, it doesn't matter which barrel you choose! Can you see why?
The probability that a test gives a true positive divided by the probability that a test gives a false positive is known as the likelihood ratio of that test. Does the likelihood ratio of a medical test sum up everything there is to know about the usefulness of the test?

No, it does not! The likelihood ratio sums up everything there is to know about the meaning of a positive result on the medical test, but the meaning of a negative result on the test is not specified, nor is the frequency with which the test is useful. If we examine the algebra above, while p(pearl|blue) remains constant, p(pearl|~blue) may change; the X does not cancel out. As a story problem, this strange fact would look something like this:
Suppose that there are two barrels, each containing a number of plastic eggs. In both barrels, 40% of the eggs contain pearls and the rest contain nothing. In both barrels, some eggs are painted blue and the rest are painted red. In the first barrel, 30% of the eggs with pearls are painted blue, and 10% of the empty eggs are painted blue. In the second barrel, 90% of the eggs with pearls are painted blue, and 30% of the empty eggs are painted blue. Would you rather have a blue egg from the first or second barrel? Would you rather have a red egg from the first or second barrel?

For the first question, the answer is that we don't care whether we get the blue egg from the first or second barrel. For the second question, however, the probabilities do change: in the first barrel, 34% of the red eggs contain pearls, while in the second barrel 8.7% of the red eggs contain pearls! Thus, we should prefer to get a red egg from the first barrel. In the first barrel, 70% of the pearl eggs are painted red, and 90% of the empty eggs are painted red. In the second barrel, 10% of the pearl eggs are painted red, and 70% of the empty eggs are painted red.
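These red-egg figures follow from the same machinery, just conditioning on ~blue instead of blue; a quick sketch (function and variable names mine):

```python
def p_pearl_given_red(p_pearl, p_blue_pearl, p_blue_empty):
    """Posterior for a red egg: condition on ~blue instead of blue."""
    pearl_and_red = p_pearl * (1 - p_blue_pearl)
    empty_and_red = (1 - p_pearl) * (1 - p_blue_empty)
    return pearl_and_red / (pearl_and_red + empty_and_red)

first = p_pearl_given_red(0.40, 0.30, 0.10)   # first barrel: about 34%
second = p_pearl_given_red(0.40, 0.90, 0.30)  # second barrel: about 8.7%
```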
What goes on here? We start out by noting that, counter to intuition, p(pearl|blue) and p(pearl|~blue) have two degrees of freedom among them even when p(pearl) is fixed, so there's no reason why one quantity shouldn't change while the other remains constant. But didn't we just get through establishing a law for "Conservation of Probability", which says that p(pearl|blue)*p(blue) + p(pearl|~blue)*p(~blue) = p(pearl)? Doesn't this equation take up one degree of freedom? No, because p(blue) isn't fixed between the two problems. In the second barrel, the proportion of blue eggs containing pearls is the same as in the first barrel, but a much larger fraction of eggs are painted blue! This alters the set of red eggs in such a way that the proportions do change. Here's a graph for the red eggs in the second barrel:
Let's return to the example of a medical test. The likelihood ratio of a medical test, the number of true positives divided by the number of false positives, tells us everything there is to know about the meaning of a positive result. But it doesn't tell us the meaning of a negative result, and it doesn't tell us how often the test is useful. For example, a mammography with a hit rate of 80% for patients with breast cancer and a false positive rate of 9.6% for healthy patients has the same likelihood ratio as a test with an 8% hit rate and a false positive rate of 0.96%. Although these two tests have the same likelihood ratio, the first test is more useful in every way: it detects disease more often, and a negative result is stronger evidence of health.
The likelihood ratio for a positive result summarizes the differential pressure of the two conditional probabilities for a positive result, and thus summarizes how much a positive result will slide the prior probability. Take a probability graph, like this one:
The likelihood ratio of the mammography is what determines the slant of the line. If the prior probability is 1%, then knowing only the likelihood ratio is enough to determine the posterior probability after a positive result.
But, as you can see from the frequency graph, the likelihood ratio doesn't tell the whole story - in the frequency graph, the proportions of the bottom bar can stay fixed while the size of the bottom bar changes. p(blue) increases but p(pearl|blue) doesn't change, because p(pearl&blue) and p(~pearl&blue) increase by the same factor. But when you flip the graph to look at p(~blue), the proportions of p(pearl&~blue) and p(~pearl&~blue) do not remain constant.
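To see the "same factor" point concretely, here is a sketch. The blue-paint fractions - 30%/10% in the first barrel, 90%/30% in the second, i.e., both conditional probabilities scaled by 3 - are assumed for illustration, consistent with the claim above:

```python
# Assumed illustration values: 40% of eggs contain pearls in both barrels.
p_pearl = 0.40

def p_pearl_given_blue(p_blue_given_pearl, p_blue_given_empty):
    joint_pearl = p_blue_given_pearl * p_pearl
    joint_empty = p_blue_given_empty * (1 - p_pearl)
    return joint_pearl / (joint_pearl + joint_empty)

# First barrel: 30% of pearl eggs and 10% of empty eggs are painted blue.
# Second barrel: 90% and 30% - both fractions tripled, so p(blue) triples
# while the proportions *within* the blue eggs stay fixed.
first = p_pearl_given_blue(0.30, 0.10)
second = p_pearl_given_blue(0.90, 0.30)
print(round(first, 3), round(second, 3))  # both 0.667
```

Both barrels give the same p(pearl|blue), even though far more eggs are blue in the second barrel.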
Of course the likelihood ratio can't tell the whole story; the likelihood ratio and the prior probability together are only two numbers, while the problem has three degrees of freedom.
Suppose that you apply two tests for breast cancer in succession - say, a standard mammography and also some other test which is independent of mammography. Since I don't know of any such test which is independent of mammography, I'll invent one for the purpose of this problem, and call it the Tams-Braylor Division Test, which checks to see if any cells are dividing more rapidly than other cells. We'll suppose that the Tams-Braylor gives a true positive for 90% of patients with breast cancer, and gives a false positive for 5% of patients without cancer. Let's say the prior prevalence of breast cancer is 1%. If a patient gets a positive result on her mammography and her Tams-Braylor, what is the revised probability she has breast cancer?
One way to solve this problem would be to take the revised probability for a positive mammography, which we already calculated as 7.8%, and plug that into the Tams-Braylor test as the new prior probability. If we do this, we find that the result comes out to 60%.
But this assumes that first we see the positive mammography result, and then the positive result on the Tams-Braylor. What if first the woman gets a positive result on the Tams-Braylor, followed by a positive result on her mammography? Intuitively, it seems like it shouldn't matter. Does the math check out?
First we'll administer the Tams-Braylor to a woman with a 1% prior probability of breast cancer. Then we administer a mammography, which gives 80% true positives and 9.6% false positives, and it also comes out positive. Lo and behold, the answer is again 60%. (If it's not exactly the same, it's due to rounding error - you can get a more precise calculator, or work out the fractions by hand, and the numbers will be exactly equal.)
An algebraic proof that both strategies are equivalent is left to the reader. To visualize, imagine that the lower bar of the frequency applet for mammography projects an even lower bar using the probabilities of the Tams-Braylor Test, and that the final lowest bar is the same regardless of the order in which the conditional probabilities are projected.
We might also reason that since the two tests are independent, the probability that a woman with breast cancer gets a positive mammography and a positive Tams-Braylor is 90% * 80% = 72%. And the probability that a woman without breast cancer gets false positives on mammography and Tams-Braylor is 5% * 9.6% = 0.48%. So if we wrap it all up as a single test with a likelihood ratio of 72%/0.48%, and apply it to a woman with a 1% prior probability of breast cancer...
...we find once again that the answer is 60%.
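The order-independence claim is easy to verify numerically. A minimal sketch, using the test parameters given above (80%/9.6% for mammography, 90%/5% for the invented Tams-Braylor):

```python
def update(prior, p_pos_given_cancer, p_pos_given_healthy):
    """One Bayesian update on a positive test result."""
    num = p_pos_given_cancer * prior
    return num / (num + p_pos_given_healthy * (1 - prior))

prior = 0.01

# Mammography first, then Tams-Braylor:
a = update(update(prior, 0.80, 0.096), 0.90, 0.05)
# Tams-Braylor first, then mammography:
b = update(update(prior, 0.90, 0.05), 0.80, 0.096)
# Both results wrapped up as a single test:
c = update(prior, 0.90 * 0.80, 0.05 * 0.096)

print(round(a, 3), round(b, 3), round(c, 3))  # all 0.602
```

All three routes give the same posterior, about 60%, as the text says.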
Suppose that the prior prevalence of breast cancer in a demographic is 1%. Suppose that we, as doctors, have a repertoire of three independent tests for breast cancer. Our first test, test A, a mammography, has a likelihood ratio of 80%/9.6% = 8.33. The second test, test B, has a likelihood ratio of 18.0 (for example, from 90% versus 5%), and the third test, test C, has a likelihood ratio of 3.5 (which could be from 70% versus 20%, or from 35% versus 10%; it makes no difference). Suppose a patient gets a positive result on all three tests. What is the probability the patient has breast cancer?
Here's a fun trick for simplifying the bookkeeping. If the prior prevalence of breast cancer in a demographic is 1%, then 1 out of 100 women have breast cancer, and 99 out of 100 women do not have breast cancer. So if we rewrite the probability of 1% as an odds ratio, the odds are:
1:99
And the likelihood ratios oI the three tests A, B, and C are:
8.33:1 = 25:3
18.0:1 = 18:1
3.5:1 = 7:2
The odds for women with breast cancer who score positive on all three tests, versus women without breast cancer who score positive on all three tests, will equal:
1*25*18*7:99*3*1*2 =
3,150:594
To recover the probability from the odds, we just write:
3,150 / (3,150 + 594) = 84%
This always works regardless of how the odds ratios are written, i.e., 8.33:1 is just the same as 25:3 or 75:9. It doesn't matter in what order the tests are administered, or in what order the results are computed. The proof is left as an exercise for the reader.
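The odds bookkeeping above can be reproduced exactly with rational arithmetic, for instance:

```python
from fractions import Fraction

# Prior odds 1:99, then multiply by each test's likelihood ratio.
odds = Fraction(1, 99) * Fraction(25, 3) * Fraction(18, 1) * Fraction(7, 2)
print(odds)  # 175/33, i.e. 3,150:594 reduced

probability = odds / (1 + odds)  # convert odds back to a probability
print(round(float(probability), 2))  # 0.84
```

Exact fractions make it obvious that the multiplication order can't matter.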
E. T. Jaynes, in "Probability Theory With Applications in Science and Engineering", suggests that credibility and evidence should be measured in decibels.
Decibels?
Decibels are used for measuring exponential differences of intensity. For example, if the sound from an automobile horn carries 10,000 times as much energy (per square meter per second) as the sound from an alarm clock, the automobile horn would be 40 decibels louder. The sound of a bird singing might carry 1,000 times less energy than an alarm clock, and hence would be 30 decibels softer. To get the number of decibels, you take the logarithm base 10 and multiply by 10.
decibels = 10 log10 (intensity)
or
intensity = 10^(decibels/10)
Suppose we start with a prior probability of 1% that a woman has breast cancer, corresponding to an odds ratio of 1:99. And then we administer three tests of likelihood ratios 25:3, 18:1, and 7:2. You could multiply those numbers... or you could just add their logarithms:
10 log10 (1/99) = -20
10 log10 (25/3) = 9
10 log10 (18/1) = 13
10 log10 (7/2) = 5
It starts out as fairly unlikely that a woman has breast cancer - our credibility level is at -20 decibels. Then three test results come in, corresponding to 9, 13, and 5 decibels of evidence. This raises the credibility level by a total of 27 decibels, meaning that the prior credibility of -20 decibels goes to a posterior credibility of 7 decibels. So the odds go from 1:99 to 5:1, and the probability goes from 1% to around 83%.
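The decibel arithmetic can be sketched in a few lines (rounding to whole decibels, as above):

```python
import math

def decibels(ratio):
    # 10 times the base-10 logarithm of an odds or likelihood ratio
    return 10 * math.log10(ratio)

prior_db = decibels(1 / 99)                                # about -20
evidence_db = sum(decibels(r) for r in (25/3, 18/1, 7/2))  # about 9 + 13 + 5
posterior_db = prior_db + evidence_db                      # about 7
posterior_odds = 10 ** (posterior_db / 10)                 # about 5:1

print(round(prior_db), round(evidence_db), round(posterior_db))
print(round(posterior_odds, 1))
```

Adding log-evidence is exactly the same operation as multiplying likelihood ratios, which is the whole appeal of the decibel bookkeeping.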
In front of you is a bookbag containing 1,000 poker chips. I started out with two such bookbags, one containing 700 red and 300 blue chips, the other containing 300 red and 700 blue. I flipped a fair coin to determine which bookbag to use, so your prior probability that the bookbag in front of you is the red bookbag is 50%. Now, you sample randomly, with replacement after each chip. In 12 samples, you get 8 reds and 4 blues. What is the probability that this is the predominantly red bag?
Just for fun, try and work this one out in your head. You don't need to be exact - a rough estimate is good enough. When you're ready, continue onward.
According to a study performed by Lawrence Phillips and Ward Edwards in 1966, most people, faced with this problem, give an answer in the range 70% to 80%. Did you give a substantially higher probability than that? If you did, congratulations - Ward Edwards wrote that very seldom does a person answer this question properly, even if the person is relatively familiar with Bayesian reasoning. The correct answer is 97%.
The likelihood ratio for the test result "red chip" is 7/3, while the likelihood ratio for the test result "blue chip" is 3/7. Therefore a blue chip is exactly the same amount of evidence as a red chip, just in the other direction - a red chip is 3.6 decibels of evidence for the red bag, and a blue chip is -3.6 decibels of evidence. If you draw one blue chip and one red chip, they cancel out. So the ratio of red chips to blue chips does not matter; only the excess of red chips over blue chips matters. There were eight red chips and four blue chips in twelve samples, therefore four more red chips than blue chips. Thus the posterior odds will be:
7^4:3^4 = 2401:81
which is around 30:1, i.e., around 97%.
The prior credibility starts at 0 decibels and there's a total of around 14 decibels of evidence, and indeed this corresponds to odds of around 25:1 or around 96%. Again, there's some rounding error, but if you performed the operations using exact arithmetic, the results would be identical.
We can now see intuitively that the bookbag problem would have exactly the same answer, obtained in just the same way, if sixteen chips were sampled and we found ten red chips and six blue chips.
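A two-line check of the chip arithmetic (the bag compositions are the 700/300 split from the problem):

```python
# Posterior odds for the red bag after 8 red and 4 blue chips,
# sampled with replacement from a 700/300-red bag vs. a 300/700-red bag.
reds, blues = 8, 4
odds = (7 / 3) ** (reds - blues)  # only the excess of red over blue matters
p_red_bag = odds / (1 + odds)
print(round(p_red_bag, 2))  # 0.97

# Same answer for 10 red and 6 blue chips in 16 samples:
odds16 = (7 / 3) ** (10 - 6)
print(odds16 == odds)  # True
```

Because each red/blue pair cancels, only the difference in counts enters the exponent.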
You are a mechanic for gizmos. When a gizmo stops working, it is due to a blocked hose 30% of the time. If a gizmo's hose is blocked, there is a 45% probability that prodding the gizmo will produce sparks. If a gizmo's hose is unblocked, there is only a 5% chance that prodding the gizmo will produce sparks. A customer brings you a malfunctioning gizmo. You prod the gizmo and find that it produces sparks. What is the probability that a spark-producing gizmo has a blocked hose?
What is the sequence of arithmetical operations that you performed to solve this problem?
(45%*30%) / (45%*30% + 5%*70%)
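The same sequence of operations, as a sketch in code (the resulting figure, roughly 79%, is not stated in the text but follows directly from the formula):

```python
p_blocked = 0.30           # prior: blocked hose causes 30% of failures
p_sparks_blocked = 0.45    # p(sparks | blocked hose)
p_sparks_unblocked = 0.05  # p(sparks | unblocked hose)

num = p_sparks_blocked * p_blocked
p_blocked_given_sparks = num / (num + p_sparks_unblocked * (1 - p_blocked))
print(round(p_blocked_given_sparks, 3))  # 0.794
```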
Similarly, to find the chance that a woman with positive mammography has breast cancer, we computed:
p(positive|cancer)*p(cancer)
_______________________________________________
p(positive|cancer)*p(cancer) + p(positive|~cancer)*p(~cancer)
which is
p(positive&cancer) / [p(positive&cancer) + p(positive&~cancer)]
which is
p(positive&cancer) / p(positive)
which is
p(cancer|positive)
The fully general form of this calculation is known as Bayes' Theorem or Bayes' Rule:
Bayes' Theorem:
p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
Given some phenomenon A that we want to investigate, and an observation X that is evidence about A - for example, in the previous example, A is breast cancer and X is a positive mammography - Bayes' Theorem tells us how we should update our probability of A, given the new evidence X.
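The Theorem translates directly into a small function. Here is a sketch, checked against the mammography example (1% prior, 80% hit rate, 9.6% false positive rate):

```python
def bayes(p_a, p_x_given_a, p_x_given_not_a):
    """p(A|X) via Bayes' Theorem."""
    num = p_x_given_a * p_a
    return num / (num + p_x_given_not_a * (1 - p_a))

# A = breast cancer, X = positive mammography.
posterior = bayes(0.01, 0.80, 0.096)
print(round(posterior, 3))  # 0.078
```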
By this point, Bayes' Theorem may seem blatantly obvious or even tautological, rather than exciting and new. If so, this introduction has entirely succeeded in its purpose.
Fun Fact!
Q. Who originally discovered Bayes' Theorem?
A. The Reverend Thomas Bayes, by far the most enigmatic figure in mathematical history. Almost nothing is known of Bayes's life, and very few of his manuscripts survived. Thomas Bayes was born in 1701 or 1702 to Joshua Bayes and Ann Carpenter, and his date of death is listed as 1761. The exact date of Thomas Bayes's birth is not known for certain because Joshua Bayes, though a surprisingly wealthy man, was a member of an unusual, esoteric, and even heretical religious sect, the "Nonconformists". The Nonconformists kept their birth registers secret, supposedly from fear of religious discrimination; whatever the reason, no true record exists of Thomas Bayes's birth. Thomas Bayes was raised a Nonconformist and was soon promoted into the higher ranks of the Nonconformist theosophers, whence comes the "Reverend" in his name.
In 1742 Bayes was elected a Fellow of the Royal Society of London, the most prestigious scientific body of its day, despite Bayes having published no scientific or mathematical works at that time. Bayes's nomination certificate was signed by sponsors including the President and the Secretary of the Society, making his election almost certain. Even today, however, it remains a mystery why such weighty names sponsored an unknown into the Royal Society.
Bayes's sole publication during his known lifetime was allegedly a mystical book entitled Divine Benevolence, laying forth the original causation and ultimate purpose of the universe. The book is commonly attributed to Bayes, though it is said that no author appeared on the title page, and the entire work is sometimes considered to be of dubious provenance.
Most mysterious of all, Bayes' Theorem itself appears in a Bayes manuscript presented to the Royal Society of London in 1764 - three years after Bayes's supposed death in 1761!
Despite the shocking circumstances of its presentation, Bayes' Theorem was soon forgotten, and was popularized within the scientific community only by the later efforts of the great mathematician Pierre Simon Laplace. Laplace himself is almost as enigmatic as Bayes - we don't even know whether it was "Pierre" or "Simon" that was his actual first name. Laplace's papers are said to have contained a design for an AI capable of predicting all future events, the so-called "Laplacian superintelligence". While it is generally believed that Laplace never tried to implement his design, there remains the fact that Laplace presciently fled the guillotine that claimed many of his colleagues during the Reign of Terror. Even today, physicists sometimes attribute unusual effects to a "Laplacian Operator" intervening in their experiments.
In summary, we do not know the real circumstances of Bayes's birth, the ultimate origins of Bayes' Theorem, Bayes's actual year of death, or even whether Bayes ever really died. Nonetheless "Reverend Thomas Bayes", whatever his true identity, has the greatest fondness and gratitude of Earth's scientific community.
So why is it that some people are so excited about Bayes' Theorem?
"Do you believe that a nuclear war will occur in the next 20 years? If no, why not?" Since I wanted to use some common answers to this question to make a point about rationality, I went ahead and asked the above question in an IRC channel, #philosophy on EFNet.
One EFNetter who answered replied "No" to the above question, but added that he believed biological warfare would wipe out "99.4%" of humanity within the next ten years. I then asked whether he believed 100% was a possibility. "No," he said. "Why not?", I asked. "Because I'm an optimist," he said. (Roanoke of #philosophy on EFNet wishes to be credited with this statement, even having been warned that it will not be cast in a complimentary light. Good for him!) Another person who answered the above question said that he didn't expect a nuclear war for 100 years, because "All of the players involved in decisions regarding nuclear war are not interested right now." "But why extend that out for 100 years?", I asked. "Pure hope," was his reply.
What is it exactly that makes these thoughts "irrational" - a poor way of arriving at truth? There are a number of intuitive replies that can be given to this; for example: "It is not rational to believe things only because they are comforting." Of course it is equally irrational to believe things only because they are discomforting; the second error is less common, but equally irrational. Other intuitive arguments include the idea that "Whether or not you happen to be an optimist has nothing to do with whether biological warfare wipes out the human species", or "Pure hope is not evidence about nuclear war because it is not an observation about nuclear war."
There is also a mathematical reply that is precise, exact, and contains all the intuitions as special cases. This mathematical reply is known as Bayes' Theorem.
For example, the reply "Whether or not you happen to be an optimist has nothing to do with whether biological warfare wipes out the human species" can be translated into the statement:
p(you are currently an optimist | biological war occurs within ten years and wipes out humanity)
=
p(you are currently an optimist | biological war occurs within ten years and does not wipe out humanity)
Since the two probabilities for p(X|A) and p(X|~A) are equal, Bayes' Theorem says that p(A|X) = p(A); as we have earlier seen, when the two conditional probabilities are equal, the revised probability equals the prior probability. If X and A are unconnected - statistically independent - then finding that X is true cannot be evidence that A is true; observing X does not update our probability for A; saying "X" is not an argument for A.
But suppose you are arguing with someone who is verbally clever and who says something like, "Ah, but since I'm an optimist, I'll have renewed hope for tomorrow, work a little harder at my dead-end job, pump up the global economy a little, eventually, through the trickle-down effect, sending a few dollars into the pocket of the researcher who ultimately finds a way to stop biological warfare - so you see, the two events are related after all, and I can use one as valid evidence about the other." In one sense, this is correct - any correlation, no matter how weak, is fair prey for Bayes' Theorem; but Bayes' Theorem distinguishes between weak and strong evidence. That is, Bayes' Theorem not only tells us what is and isn't evidence, it also describes the strength of evidence. Bayes' Theorem not only tells us when to revise our probabilities, but how much to revise our probabilities. A correlation between hope and biological warfare may exist, but it's a lot weaker than the speaker wants it to be; he is revising his probabilities much too far.
Let's say you're a woman who's just undergone a mammography. Previously, you figured that you had a very small chance of having breast cancer; we'll suppose that you read the statistics somewhere and so you know the chance is 1%. When the positive mammography comes in, your estimated chance should now shift to 7.8%. There is no room to say something like, "Oh, well, a positive mammography isn't definite evidence, some healthy women get positive mammographies too. I don't want to despair too early, and I'm not going to revise my probability until more evidence comes in. Why? Because I'm an optimist." And there is similarly no room for saying, "Well, a positive mammography may not be definite evidence, but I'm going to assume the worst until I find otherwise. Why? Because I'm a pessimist." Your revised probability should go to 7.8%, no more, no less.
Bayes' Theorem describes what makes something "evidence" and how much evidence it is. Statistical models are judged by comparison to the Bayesian method because, in statistics, the Bayesian method is as good as it gets - the Bayesian method defines the maximum amount of mileage you can get out of a given piece of evidence, in the same way that thermodynamics defines the maximum amount of work you can get out of a temperature differential. This is why you hear cognitive scientists talking about Bayesian reasoners. In cognitive science, "Bayesian reasoner" is the technically precise codeword that we use to mean "rational mind".
There are also a number of general heuristics about human reasoning that you can learn from looking at Bayes' Theorem.
For example, in many discussions of Bayes' Theorem, you may hear cognitive psychologists saying that people do not take prior frequencies sufficiently into account, meaning that when people approach a problem where there's some evidence X indicating that condition A might hold true, they tend to judge A's likelihood solely by how well the evidence X seems to match A, without taking into account the prior frequency of A. If you think, for example, that under the mammography example, the woman's chance of having breast cancer is in the range of 70% to 80%, then this kind of reasoning is insensitive to the prior frequency given in the problem; it doesn't notice whether 1% of women or 10% of women start out having breast cancer. "Pay more attention to the prior frequency!" is one of the many things that humans need to bear in mind to partially compensate for our built-in inadequacies.
A related error is to pay too much attention to p(X|A) and not enough to p(X|~A) when determining how much evidence X is for A. The degree to which a result X is evidence for A depends, not only on the strength of the statement we'd expect to see result X if A were true, but also on the strength of the statement we wouldn't expect to see result X if A weren't true. For example, if it is raining, this very strongly implies the grass is wet - p(wetgrass|rain) ~ 1 - but seeing that the grass is wet doesn't necessarily mean that it has just rained; perhaps the sprinkler was turned on, or you're looking at the early morning dew. Since p(wetgrass|~rain) is substantially greater than zero, p(rain|wetgrass) is substantially less than one. On the other hand, if the grass was never wet when it wasn't raining, then knowing that the grass was wet would always show that it was raining, p(rain|wetgrass) ~ 1, even if p(wetgrass|rain) = 50%, that is, even if the grass only got wet 50% of the times it rained. Evidence is always the result of the differential between the two conditional probabilities. Strong evidence is not the product of a very high probability that A leads to X, but the product of a very low probability that not-A could have led to X.
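A sketch of the wet-grass example - the 20% prior probability of rain here is an invented illustration value, not from the text:

```python
def p_rain_given_wet(p_rain, p_wet_given_rain, p_wet_given_dry):
    # Bayes' Theorem for the wet-grass example.
    num = p_wet_given_rain * p_rain
    return num / (num + p_wet_given_dry * (1 - p_rain))

# Rain almost always wets the grass, but sprinklers and dew wet it too:
print(round(p_rain_given_wet(0.20, 0.99, 0.30), 2))  # 0.45 - weak evidence

# Grass is *never* wet without rain, even though rain only wets it half the time:
print(p_rain_given_wet(0.20, 0.50, 0.00))  # 1.0 - conclusive evidence
```

Note that the conclusive case has the *lower* p(X|A); what makes it conclusive is p(X|~A) = 0, the differential the text describes.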
The Bayesian revolution in the sciences is fueled, not only by more and more cognitive scientists suddenly noticing that mental phenomena have Bayesian structure in them; not only by scientists in every field learning to judge their statistical methods by comparison with the Bayesian method; but also by the idea that science itself is a special case of Bayes' Theorem; experimental evidence is Bayesian evidence. The Bayesian revolutionaries hold that when you perform an experiment and get evidence that "confirms" or "disconfirms" your theory, this confirmation and disconfirmation is governed by the Bayesian rules. For example, you have to take into account, not only whether your theory predicts the phenomenon, but whether other possible explanations also predict the phenomenon. Previously, the most popular philosophy of science was probably Karl Popper's falsificationism - this is the old philosophy that the Bayesian revolution is currently dethroning. Karl Popper's idea that theories can be definitely falsified, but never definitely confirmed, is yet another special case of the Bayesian rules; if p(X|A) ~ 1 - if the theory makes a definite prediction - then observing ~X very strongly falsifies A. On the other hand, if p(X|A) ~ 1, and we observe X, this doesn't definitely confirm the theory; there might be some other condition B such that p(X|B) ~ 1, in which case observing X doesn't favor A over B. For observing X to definitely confirm A, we would have to know, not that p(X|A) ~ 1, but that p(X|~A) ~ 0, which is something that we can't know because we can't range over all possible alternative explanations. For example, when Einstein's theory of General Relativity toppled Newton's incredibly well-confirmed theory of gravity, it turned out that all of Newton's predictions were just a special case of Einstein's predictions.
You can even formalize Popper's philosophy mathematically. The likelihood ratio for X, p(X|A)/p(X|~A), determines how much observing X slides the probability for A; the likelihood ratio is what says how strong X is as evidence. Well, in your theory A, you can predict X with probability 1, if you like; but you can't control the denominator of the likelihood ratio, p(X|~A) - there will always be some alternative theories that also predict X, and while we go with the simplest theory that fits the current evidence, you may someday encounter some evidence that an alternative theory predicts but your theory does not. That's the hidden gotcha that toppled Newton's theory of gravity. So there's a limit on how much mileage you can get from successful predictions; there's a limit on how high the likelihood ratio goes for confirmatory evidence.
On the other hand, if you encounter some piece of evidence Y that is definitely not predicted by your theory, this is enormously strong evidence against your theory. If p(Y|A) is infinitesimal, then the likelihood ratio will also be infinitesimal. For example, if p(Y|A) is 0.0001%, and p(Y|~A) is 1%, then the likelihood ratio p(Y|A)/p(Y|~A) will be 1:10,000. -40 decibels of evidence! Or flipping the likelihood ratio, if p(Y|A) is very small, then p(Y|~A)/p(Y|A) will be very large, meaning that observing Y greatly favors ~A over A. Falsification is much stronger than confirmation. This is a consequence of the earlier point that very strong evidence is not the product of a very high probability that A leads to X, but the product of a very low probability that not-A could have led to X. This is the precise Bayesian rule that underlies the heuristic value of Popper's falsificationism.
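The asymmetry can be made concrete in a few lines. The 50% rival-prediction figure in the confirmation case is an assumed illustration; the falsification numbers are the 0.0001% and 1% from above:

```python
import math

# Confirmation: theory A predicts X with certainty, but a rival theory
# might predict X half the time - so X is only mild evidence for A.
p_x_given_a, p_x_given_not_a = 1.0, 0.5
confirm_db = 10 * math.log10(p_x_given_a / p_x_given_not_a)

# Falsification: A assigns Y an infinitesimal probability.
p_y_given_a, p_y_given_not_a = 0.000001, 0.01
falsify_db = 10 * math.log10(p_y_given_a / p_y_given_not_a)

print(round(confirm_db, 1))  # 3.0 decibels for A
print(round(falsify_db))     # -40 decibels for A
```

A successful prediction buys a few decibels; a failed definite prediction costs dozens.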
Similarly, Popper's dictum that an idea must be falsifiable can be interpreted as a manifestation of the Bayesian conservation-of-probability rule; if a result X is positive evidence for the theory, then the result ~X would have disconfirmed the theory to some extent. If you try to interpret both X and ~X as "confirming" the theory, the Bayesian rules say this is impossible! To increase the probability of a theory you must expose it to tests that can potentially decrease its probability; this is not just a rule for detecting would-be cheaters in the social process of science, but a consequence of Bayesian probability theory. On the other hand, Popper's idea that there is only falsification and no such thing as confirmation turns out to be incorrect. Bayes' Theorem shows that falsification is very strong evidence compared to confirmation, but falsification is still probabilistic in nature; it is not governed by fundamentally different rules from confirmation, as Popper argued.
So we find that many phenomena in the cognitive sciences, plus the statistical methods used by scientists, plus the scientific method itself, are all turning out to be special cases of Bayes' Theorem. Hence the Bayesian revolution.
Fun Fact!
Q. Are there any limits to the power of Bayes' Theorem?
A. According to legend, one who fully grasped Bayes' Theorem would gain the ability to create and physically enter an alternate universe using only off-the-shelf equipment and a short computer program. One who fully grasps Bayes' Theorem, yet remains in our universe to aid others, is known as a Bayesattva.
Bayes' Theorem:
p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
Why wait so long to introduce Bayes' Theorem, instead of just showing it at the beginning? Well... because I've tried that before, and what happens, in my experience, is that people get all tangled up in trying to apply Bayes' Theorem as a set of poorly grounded mental rules; instead of the Theorem helping, it becomes one more thing to juggle mentally, so that in addition to trying to remember how many women with breast cancer have positive mammographies, the reader is also trying to remember whether it's p(X|A) in the numerator or p(A|X), and whether a positive mammography result corresponds to A or X, and which side of p(X|A) is the implication, and what the terms are in the denominator, and so on. In this excruciatingly gentle introduction, I tried to show all the workings of Bayesian reasoning without ever introducing the explicit Theorem as something extra to memorize, hopefully reducing the number of factors the reader needed to mentally juggle.
Even if you happen to be one of the fortunate people who can easily grasp and apply abstract theorems, the mental-juggling problem is still something to bear in mind if you ever need to explain Bayesian reasoning to someone else.
If you do find yourself losing track, my advice is to forget Bayes' Theorem as an equation and think about the graph. p(A) and p(~A) are at the top. p(X|A) and p(X|~A) are the projection factors. p(X&A) and p(X&~A) are at the bottom. And p(A|X) equals the proportion of p(X&A) within p(X&A)+p(X&~A). The graph isn't shown here - but can you see it in your mind?
And if thinking about the graph doesn't work, I suggest forgetting about Bayes' Theorem entirely - just try to work out the specific problem in gizmos, hoses, and sparks, or whatever it is.
Having introduced Bayes' Theorem explicitly, we can explicitly discuss its components.
p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
We'll start with p(A|X). If you ever find yourself getting confused about what's A and what's X in Bayes' Theorem, start with p(A|X) on the left side of the equation; that's the simplest part to interpret. A is the thing we want to know about. X is how we're observing it; X is the evidence we're using to make inferences about A. Remember that for every expression p(Q|P), we want to know about the probability for Q given P, the degree to which P implies Q - a more sensible notation, which it is now too late to adopt, would be p(Q<-P).
p(Q|P) is closely related to p(Q&P), but they are not identical. Expressed as a probability or a fraction, p(Q&P) is the proportion of things that have property Q and property P within all things; i.e., the proportion of "women with breast cancer and a positive mammography" within the group of all women. If the total number of women is 10,000, and 80 women have breast cancer and a positive mammography, then p(Q&P) is 80/10,000 = 0.8%. You might say that the absolute quantity, 80, is being normalized to a probability relative to the group of all women. Or to make it clearer, suppose that there's a group of 641 women with breast cancer and a positive mammography within a total sample group of 89,031 women. 641 is the absolute quantity. If you pick out a random woman from the entire sample, then the probability you'll pick a woman with breast cancer and a positive mammography is p(Q&P), or 0.72% (in this example).
On the other hand, p(Q|P) is the proportion of things that have property Q and property P within all things that have P; i.e., the proportion of "women with breast cancer and a positive mammography" within the group of all women with positive mammographies. If there are 641 women with breast cancer and positive mammographies, 7,915 women with positive mammographies, and 89,031 women, then p(Q&P) is the probability of getting one of those 641 women if you're picking at random from the entire group of 89,031, while p(Q|P) is the probability of getting one of those 641 women if you're picking at random from the smaller group of 7,915.
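A quick sketch with these illustration numbers (the positive-mammography count of 7,915 is reconstructed from the damaged text, so treat it as approximate) shows the difference between picking from the whole sample and picking from the positives:

```python
cancer_and_positive = 641  # women with breast cancer AND a positive mammography
positive = 7915            # all women with positive mammographies (reconstructed)
total = 89031              # total sample group

p_q_and_p = cancer_and_positive / total      # picked from the whole sample
p_q_given_p = cancer_and_positive / positive # picked from the positives only

print(round(p_q_and_p * 100, 2))    # 0.72 (percent)
print(round(p_q_given_p * 100, 1))  # 8.1 (percent)
```

Same 641 women in the numerator; only the reference group in the denominator changes.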
In a sense, p(Q|P) really means p(Q&P|P), but specifying the extra P all the time would be redundant. You already know it has property P, so the property you're investigating is Q - even though you're looking at the size of group Q&P within group P, not the size of group Q within group P (which would be nonsense). This is what it means to take the property on the right-hand side as given; it means you know you're working only within the group of things that have property P. When you constrict your focus of attention to see only this smaller group, many other probabilities change. If you're taking P as given, then p(Q&P) equals just p(Q) - at least, relative to the group P. The old p(Q), the frequency of "things that have property Q within the entire sample", is revised to the new frequency of "things that have property Q within the subsample of things that have property P". If P is given, if P is our entire world, then looking for Q&P is the same as looking for just Q.
If you constrict your focus of attention to only the population of eggs that are painted blue, then suddenly "the probability that an egg contains a pearl" becomes a different number; this proportion is different for the population of blue eggs than the population of all eggs. The given, the property that constricts our focus of attention, is always on the right side of p(Q|P); the P becomes our world, the entire thing we see, and on the other side of the "given", P always has probability 1 - that is what it means to take P as given. So p(Q|P) means "If P has probability 1, what is the probability of Q?" or "If we constrict our attention to only things or events where P is true, what is the probability of Q?" Q, on the other side of the given, is not certain - its probability may be 10% or 90% or any other number. So when you use Bayes' Theorem, and you write the part on the left side as p(A|X) - how to update the probability of A after seeing X, the new probability of A given that we know X, the degree to which X implies A - you can tell that X is always the observation or the evidence, and A is the property being investigated, the thing you want to know about.
The right side of Bayes' Theorem is derived from the left side
through these steps:

p(A|X) = p(A|X)

p(A|X) = p(X&A) / p(X)

p(A|X) = p(X&A) / [p(X&A) + p(X&~A)]

p(A|X) = p(X|A)*p(A) / [p(X|A)*p(A) + p(X|~A)*p(~A)]
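Each step of the derivation can be checked numerically; a minimal sketch using the essay's mammography figures (1% prior, 80% detection rate, 9.6% false positive rate):

```python
# Verify the derivation numerically with the essay's mammography numbers:
# p(A) = 1% prior, p(X|A) = 80%, p(X|~A) = 9.6%.
p_a = 0.01               # prior probability of breast cancer
p_x_given_a = 0.80       # positive mammography given cancer
p_x_given_not_a = 0.096  # positive mammography given no cancer

p_x_and_a = p_x_given_a * p_a                # p(X&A) = p(X|A)*p(A)
p_x_and_not_a = p_x_given_not_a * (1 - p_a)  # p(X&~A) = p(X|~A)*p(~A)
p_x = p_x_and_a + p_x_and_not_a              # p(X) = p(X&A) + p(X&~A)

p_a_given_x = p_x_and_a / p_x                # p(A|X) = p(X&A)/p(X)
print(f"p(A|X) = {p_a_given_x:.3f}")         # 0.078, i.e. about 7.8%
```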
The first step, from p(A|X) to p(X&A)/p(X), may look like a
tautology. The actual math performed is different, though.
p(A|X) is a single number, the normalized probability or
frequency of A within the subgroup X. p(X&A)/p(X) are usually
the percentage frequencies of X&A and X within the entire
sample, but the calculation also works if X&A and X are absolute
numbers of people, events, or things. p(cancer|positive) is
a single percentage/frequency/probability, always between 0 and 1.
(positive&cancer)/(positive) can be measured either in
probabilities, such as 0.008/0.103, or it might be expressed in
groups of women, for example 194/2,494. As long as both the
numerator and denominator are measured in the same units, it
should make no difference.
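That unit-invariance is easy to confirm; a small sketch using the two measurements quoted above:

```python
# The ratio p(X&A)/p(X) is unit-free: probabilities and raw counts
# give the same answer, as long as numerator and denominator match.
as_probabilities = 0.008 / 0.103  # frequencies within the whole sample
as_counts = 194 / 2_494           # the same ratio in groups of women

print(round(as_probabilities, 3))  # 0.078
print(round(as_counts, 3))         # 0.078
```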
Going from p(X) in the denominator to p(X&A)+p(X&~A) is a
very straightforward step whose main purpose is as a stepping
stone to the last equation. However, one common arithmetical
mistake in Bayesian calculations is to divide p(X&A) by p(X&~A),
instead of dividing p(X&A) by [p(X&A) + p(X&~A)]. For
example, someone doing the breast cancer calculation tries to get
the posterior probability by performing the math operation 80 /
950, instead of 80 / (80 + 950). I like to think of this as a "rose
flowers" error. Sometimes, if you show young children a picture
with eight roses and two tulips, they'll say that the picture contains
more roses than flowers. (Technically, this would be called a class
inclusion error.) You have to add the roses and the tulips to get
the number of flowers, which you need to find the proportion of
roses within the flowers. You can't find the proportion of roses in
the tulips, or the proportion of tulips in the roses. When you look
at the graph, the bottom bar consists of all the patients with
positive results. That's what the doctor sees: a patient with a
positive result. The question then becomes whether this is a
healthy patient with a positive result, or a cancerous patient with a
positive result. To figure the odds of that, you have to look at the
proportion of cancerous patients with positive results within all
patients who have positive results, because, again, "a patient with a
positive result" is what you actually see. You can't divide 80 by
950 because that would mean you were trying to find the
proportion of cancerous patients with positive results within the
group of healthy patients with positive results; it's like asking how
many of the tulips are roses, instead of asking how many of the
flowers are roses. Imagine using the same method to find the
proportion of healthy patients. You would divide 950 by 80 and
find that 1,187% of the patients were healthy. Or to be exact, you
would find that 1,187% of cancerous patients with positive results
were healthy patients with positive results.
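A small sketch of the wrong division and the right one, using the 80 cancerous and 950 healthy patients with positive results:

```python
# The "rose flowers" (class inclusion) error, numerically.
cancer_pos = 80    # cancerous patients with positive results
healthy_pos = 950  # healthy patients with positive results

wrong = cancer_pos / healthy_pos                 # roses / tulips: meaningless
right = cancer_pos / (cancer_pos + healthy_pos)  # roses / flowers

print(f"wrong: {wrong:.3f}")  # 0.084, not a posterior probability
print(f"right: {right:.3f}")  # 0.078, i.e. about 7.8%

# Run the same mistake in reverse and you get an absurdity:
print(f"{healthy_pos / cancer_pos:.1%}")  # 1187.5% of patients "healthy"
```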
The last step in deriving Bayes' Theorem is going from p(X&A) to
p(X|A)*p(A), in both the numerator and the denominator, and
from p(X&~A) to p(X|~A)*p(~A), in the denominator.
Why? Well, one answer is that p(X|A), p(X|~A), and p(A)
correspond to the initial information given in all the story
problems. But why were the story problems written that way?
Because in many cases, p(X|A), p(X|~A), and p(A) are what we
actually know; and this in turn happens because p(X|A) and p(X|~A)
are often the quantities that directly describe causal relations, with
the other quantities derived from them and p(A) as statistical
relations. For example, p(X|A), the implication from A to X, where
A is what we want to know and X is our way of observing it,
corresponds to the implication from a woman having breast cancer
to a positive mammography. This is not just a statistical implication
but a direct causal relation; a woman gets a positive mammography
because she has breast cancer. The mammography is designed to
detect breast cancer, and it is a fact about the physical process of
the mammography exam that it has an 80% probability of
detecting breast cancer. As long as the design of the
mammography machine stays constant, p(X|A) will stay at 80%,
even if p(A) changes; for example, if we screen a group of women
with other risk factors, so that the prior frequency of women with
breast cancer is 10% instead of 1%. In this case, p(X&A) will
change along with p(A), and so will p(X), p(A|X), and so on, but
p(X|A) stays at 80%, because that's a fact about the mammography
exam itself. (Though you do need to test this statement before
relying on it; it's possible that the mammography exam might work
better on some forms of breast cancer than others.) p(X|A) is one
of the simple facts from which complex facts like p(X&A) are
constructed; p(X|A) is an elementary causal relation within a
complex system, and it has a direct physical interpretation. This is
why Bayes' Theorem has the form it does: it's not for solving math
brainteasers, but for reasoning about the physical universe.
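A brief sketch of this invariance: the exam's likelihoods stay fixed while the screened population's prior changes (the 10% figure is the higher-risk group mentioned above):

```python
# p(X|A) is a fact about the mammography exam; p(A) is a fact about
# the screened population. Changing the prior changes the posterior,
# but the likelihoods stay fixed.
def posterior(prior, p_pos_given_cancer=0.80, p_pos_given_healthy=0.096):
    joint = p_pos_given_cancer * prior
    return joint / (joint + p_pos_given_healthy * (1 - prior))

print(f"{posterior(0.01):.3f}")  # 0.078: ordinary screening population
print(f"{posterior(0.10):.3f}")  # 0.481: higher-risk group, same exam
```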
Once the derivation is finished, all the implications on the right
side of the equation are of the form p(X|A) or p(X|~A), while
the implication on the left side is p(A|X). As long as you
remember this and you get the rest of the equation right, it
shouldn't matter whether you happened to start out with p(A|X) or
p(X|A) on the left side of the equation, as long as the rules are
applied consistently; if you started out with the direction of
implication p(X|A) on the left side of the equation, you would need
to end up with the direction p(A|X) on the right side of the
equation. This, of course, is just changing the variable labels; the
point is to remember the symmetry, in order to remember the
structure of Bayes' Theorem.
The symmetry arises because the elementary causal relations are
generally implications from facts to observations, i.e., from breast
cancer to positive mammography. The elementary steps in reasoning
are generally implications from observations to facts, i.e., from a
positive mammography to breast cancer. The left side of Bayes'
Theorem is an elementary inferential step from the observation of
positive mammography to the conclusion of an increased
probability of breast cancer. Implication is written right-to-left, so
we write p(cancer|positive) on the left side of the
equation. The right side of Bayes' Theorem describes the
elementary causal steps, for example from breast cancer to a
positive mammography, and so the implications on the right side
of Bayes' Theorem take the form p(positive|cancer) or
p(positive|~cancer).
And that's Bayes' Theorem. Rational inference on the left end,
physical causality on the right end; an equation with mind on one
side and reality on the other. Remember how the scientific
method turned out to be a special case of Bayes' Theorem: if you
wanted to put it poetically, you could say that Bayes' Theorem
binds reasoning into the physical universe.
Okay, we're done.
Reverend Bayes says:
You are now an initiate
of the Bayesian Conspiracy.
5. What Is Evidence?
Walking along the street, your shoelaces come untied. Shortly
thereafter, for some odd reason, you start believing your shoelaces
are untied. Light leaves the Sun and strikes your shoelaces and
bounces off; some photons enter the pupils of your eyes and strike
your retina; the energy of the photons triggers neural impulses; the
neural impulses are transmitted to the visual-processing areas of the
brain; and there the optical information is processed and recon-
structed into a 3D model that is recognized as an untied shoelace.
There is a sequence of events, a chain of cause and effect, within
the world and your brain, by which you end up believing what you
believe. The final outcome of the process is a state of mind which
mirrors the state of your actual shoelaces.
What is evidence? It is an event entangled, by links of cause
and effect, with whatever you want to know about. If the target of
your inquiry is your shoelaces, for example, then the light entering
your pupils is evidence entangled with your shoelaces. This should
not be confused with the technical sense of "entanglement" used in
physics; here I'm just talking about "entanglement" in the sense of
two things that end up in correlated states because of the links of
cause and effect between them.
Not every influence creates the kind of "entanglement" required
for evidence. It's no help to have a machine that beeps when you
1. http://lesswrong.com/lw/jl/what_is_evidence/
2. http://sl4.org/wiki/TheSimpleTruth
3. Page 625, 'Feeling Rational'.
enter losing lottery numbers. The light reflected from your shoes
would not be useful evidence about your shoelaces, if the photons
ended up in the same physical state whether your shoelaces were
tied or untied.
To say it abstractly: For an event to be evidence about a target
of inquiry, it has to happen differently in a way that's entangled with
the different possible states of the target. (To say it technically:
There has to be Shannon mutual information between the eviden-
tial event and the target of inquiry, relative to your current state of
uncertainty about both of them.)
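This can be illustrated with a toy computation; the numbers below are invented for illustration, not taken from the text. A machine that beeps for every ticket carries zero mutual information about the lottery outcome; one that beeps only on winners does not:

```python
import math

def mutual_information(joint):
    """Shannon mutual information I(X;Y) of a joint distribution
    given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical numbers: 1% of tickets win. A useless machine beeps
# for every ticket; a useful one beeps only for winners.
always_beeps = {("win", "beep"): 0.01, ("lose", "beep"): 0.99}
discriminates = {("win", "beep"): 0.01, ("lose", "silent"): 0.99}

print(mutual_information(always_beeps))             # 0.0 bits: no entanglement
print(round(mutual_information(discriminates), 3))  # 0.081 bits
```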
Entanglement can be contagious when processed correctly, which is
why you need eyes and a brain. If photons reflect off your shoelaces
and hit a rock, the rock won't change much. The rock won't reflect
the shoelaces in any helpful way; it won't be detectably different
depending on whether your shoelaces were tied or untied. This is
why rocks are not useful witnesses in court. A photographic film
will contract shoelace-entanglement from the incoming photons, so
that the photo can itself act as evidence. If your eyes and brain
work correctly, you will become tangled up with your own shoelaces.
This is why rationalists put such a heavy premium on the
paradoxical-seeming claim that a belief is only really worthwhile if
you could, in principle, be persuaded to believe otherwise. If your
retina ended up in the same state regardless of what light entered
it, you would be blind. Some belief systems, in a rather obvious
trick to reinforce themselves, say that certain beliefs are only really
worthwhile if you believe them unconditionally, no matter what
you see, no matter what you think. Your brain is supposed to end
up in the same state regardless. Hence the phrase, "blind faith".
If what you believe doesn't depend on what you see, you've been
blinded as effectively as by poking out your eyeballs.
If your eyes and brain work correctly, your beliefs will end up
entangled with the facts. Rational thought produces beliefs which are
themselves evidence.
If your tongue speaks truly, your rational beliefs, which are
themselves evidence, can act as evidence for someone else. En-
tanglement can be transmitted through chains of cause and effect;
and if you speak, and another hears, that too is cause and
WHAT IS EVIDENCE? 87
effect. When you say "My shoelaces are untied" over a cellphone,
you're sharing your entanglement with your shoelaces with a friend.
Therefore rational beliefs are contagious, among honest folk
who believe each other to be honest. And it's why a claim that
your beliefs are not contagious (that you believe for private reasons
which are not transmissible) is so suspicious. If your beliefs are en-
tangled with reality, they should be contagious among honest folk.
If your model of reality suggests that the outputs of your
thought processes should not be contagious to others, then your
model says that your beliefs are not themselves evidence, meaning
they are not entangled with reality. You should apply a reflective
correction, and stop believing.
Indeed, if you feel, on a gut level, what this all means, you will
automatically stop believing. Because "my belief is not entangled
with reality" means "my belief is not accurate". As soon as you stop
believing "'snow is white' is true", you should (automatically!) stop
believing "snow is white", or something is very wrong.
So go ahead and explain why the kind of thought processes you
use systematically produce beliefs that mirror reality. Explain why
you think you're rational. Why you think that, using thought pro-
cesses like the ones you use, minds will end up believing "snow is
white" if and only if snow is white. If you don't believe that the
outputs of your thought processes are entangled with reality, why
do you believe the outputs of your thought processes? It's the same
thing, or it should be.
7. How to Convince Me That 2 + 2 = 3
I cannot conceive of a situation that would make 2 + 2
= 4 false. Perhaps for that reason, my belief in 2 + 2 = 4 is
unconditional.
I admit, I cannot conceive of a "situation" that would make 2
+ 2 = 4 false. (There are redefinitions, but those are not "situa-
tions", and then you're no longer talking about 2, 4, =, or +.) But
that doesn't make my belief unconditional. I find it quite easy to
imagine a situation which would convince me that 2 + 2 = 3.
Suppose I got up one morning, and took out two earplugs, and
set them down next to two other earplugs on my nighttable, and
noticed that there were now three earplugs, without any earplugs
having appeared or disappeared, in contrast to my stored memory
that 2 + 2 was supposed to equal 4. Moreover, when I visualized
the process in my own mind, it seemed that making XX and XX
come out to XXXX required an extra X to appear from nowhere,
and was, moreover, inconsistent with other arithmetic I visualized,
since subtracting XX from XXX left XX, but subtracting XX from
XXXX left XXX. This would conflict with my stored memory
that 3 - 2 = 1, but memory would be absurd in the face of physical
and mental confirmation that XXX - XX = XX.
1. http://lesswrong.com/lw/jr/how_to_convince_me_that_2_2_3/
2. Page 86, 'What is Evidence?'.
3. http://lesswrong.com/lw/jl/what_is_evidence/f7h
I would also check a pocket calculator, Google, and perhaps my
copy of 1984 where Winston writes that "Freedom is the freedom
to say two plus two equals three." All of these would naturally show
that the rest of the world agreed with my current visualization, and
disagreed with my memory, that 2 + 2 = 3.
How could I possibly have ever been so deluded as to believe
that 2 + 2 = 4? Two explanations would come to mind: First, a
neurological fault (possibly caused by a sneeze) had made all the
additive sums in my stored memory go up by one. Second, some-
one was messing with me, by hypnosis or by my being a computer
simulation. In the second case, I would think it more likely that
they had messed with my arithmetic recall than that 2 + 2 actually
equalled 4. Neither of these plausible-sounding explanations would
prevent me from noticing that I was very, very, very confused.
What would convince me that 2 + 2 = 3, in other words, is exactly
the same kind of evidence that currently convinces me that 2 + 2 =
4: The evidential crossfire of physical observation, mental visual-
ization, and social agreement.
There was a time when I had no idea that 2 + 2 = 4. I did not ar-
rive at this new belief by random processes; then there would have
been no particular reason for my brain to end up storing "2 + 2 =
4" instead of "2 + 2 = 7". The fact that my brain stores an answer
surprisingly similar to what happens when I lay down two earplugs
alongside two earplugs, calls forth an explanation of what entangle-
ment produces this strange mirroring of mind and reality.
There's really only two possibilities, for a belief of fact: either
the belief got there via a mind-reality entangling process, or not.
If not, the belief can't be correct except by coincidence. For beliefs
with the slightest shred of internal complexity (requiring a com-
puter program of more than 10 bits to simulate), the space of
possibilities is large enough that coincidence vanishes.
Unconditional facts are not the same as unconditional beliefs.
If entangled evidence convinces me that a fact is unconditional, this
doesn't mean I always believed in the fact without need of entan-
gled evidence.
4. Page 126, 'Your Strength as a Rationalist'.
5. Page 625, 'Feeling Rational'.
6. Page 86, 'What is Evidence?'.
7. Page 95, 'Occam's Razor'.
HOW TO CONVINCE ME THAT 2 + 2 = 3 93
I believe that 2 + 2 = 4, and I find it quite easy to conceive of a
situation which would convince me that 2 + 2 = 3. Namely, the same
sort of situation that currently convinces me that 2 + 2 = 4. Thus I
do not fear that I am a victim of blind faith.
If there are any Christians in the audience who know Bayes's
Theorem (no numerophobes, please), might I inquire of you what sit-
uation would convince you of the truth of Islam? Presumably it
would be the same sort of situation causally responsible for pro-
ducing your current belief in Christianity: We would push you
screaming out of the uterus of a Muslim woman, and have you
raised by Muslim parents who continually told you that it is good to
believe unconditionally in Islam. Or is there more to it than that?
If so, what situation would convince you of Islam, or at least, non-
Christianity?
3. Bayesian Judo

You can have some fun with people whose anticipations get out of
sync with what they believe they believe.
I was once at a dinner party, trying to explain to a man what I
did for a living, when he said: "I don't believe Artificial Intelligence
is possible because only God can make a soul."
At this point I must have been divinely inspired, because I
instantly responded: "You mean if I can make an Artificial Intelli-
gence, it proves your religion is false?"
He said, "What?"
I said, "Well, if your religion predicts that I can't possibly make
an Artificial Intelligence, then, if I make an Artificial Intelligence,
it means your religion is false. Either your religion allows that it
might be possible for me to build an AI, or, if I build an AI, that
disproves your religion."
There was a pause, as the one realized he had just made his hy-
pothesis vulnerable to falsification, and then he said, "Well, I didn't
mean that you couldn't make an intelligence, just that it couldn't be
emotional in the same way we are."
I said, "So if I make an Artificial Intelligence that, without
being deliberately preprogrammed with any sort of script, starts
talking about an emotional life that sounds like ours, that means
your religion is wrong."
He said, "Well, um, I guess we may have to agree to disagree on
this."
I said: "No, we can't, actually. There's a theorem of rationality
called Aumann's Agreement Theorem which shows that no two ra-
tionalists can agree to disagree. If two people disagree with each
other, at least one of them must be doing something wrong."
We went back and forth on this briefly. Finally, he said, "Well,
I guess I was really trying to say that I don't think you can make
something eternal."
I said, "Well, I don't think so either! I'm glad we were able
to reach agreement on this, as Aumann's Agreement Theorem re-
quires." I stretched out my hand, and he shook it, and then he
wandered away.
A woman who had stood nearby, listening to the conversation,
said to me gravely, "That was beautiful."
"Thank you very much," I said.
1. http://lesswrong.com/lw/i5/bayesian_judo/
2. Page 108, 'Belief in Belief'.
BAYESIAN JUDO 113
4. Professing and Cheering
. OI these, we
might call anticipationcontrolling belieIs proper belieIs" and the
other Iorms improper belieI". A proper belieI can be wrong or ir
rational, e.g., someone who genuinely anticipates that prayer will
cure her sick baby, but the other Iorms are arguably not belieI at
all".
Yet another Iorm oI improper belieI is belieI as groupidenti
Iicationas a way oI belonging. Robin Hanson uses the excellent
metaphor
,
oI wearing unusual clothing, a group uniIorm like a
priest's vestments or a ]ewish skullcap, and so I will call this belieI
as attire".
In terms oI humanly realistic psychology
6
, the Muslims who
Ilew planes into the \orld Trade Center undoubtedly saw them
selves as heroes deIending truth, justice, and the Islamic \ay Irom
hideous alien monsters a la the movie Independence Lay
;
. Only a
very inexperienced nerd, the sort oI nerd who has no idea how non
nerds see the world, would say this out loud in an Alabama bar.
It is not an American thing to say. The American thing to say is
that the terrorists hate our Ireedom" and that Ilying a plane into
a building is a cowardly act". You cannot say the phrases heroic
selIsacriIice" and suicide bomber" in the same sentence, even Ior
the sake oI accurately describing how the Lnemy sees the world.
The very concept oI the courage and altruism oI a suicide bomber is
Lnemy attireyou can tell, because the Lnemy talks about it. The
cowardice and sociopathy oI a suicide bomber is American attire.
There are no quote marks you can use to talk about how the Lnemy
sees the world, it would be like dressing up as a azi Ior Halloween.
BelieIasattire may help explain how people can be passionate
about improper belieIs. Mere belieI in belieI
8
, or religious pro
. http://lesswrong.com/lw/i7/belief_as_attire/
z. Page o,, 'Making BelieIs Pay Rent (in Anticipated Lxperiences)'.
. Page o8, 'BelieI in BelieI'.
. Page , 'ProIessing and Cheering'.
,. http://lesswrong.com/lw/i6/professing_and_cheering/egb
6. Page zj, 'Are Your Lnemies Innately Lvil:'.
;. http://www.imdb.com/title/tt0116629/
8. Page o8, 'BelieI in BelieI'.
Iessing
j
, would have some trouble creating genuine, deep, powerIul
emotional eIIects. Or so I suspect, I conIess I'm not an expert
here. But my impression is this: People who've stopped
anticipatingasiI their religion is true, will go to great lengths to
convince themselves they are passionate, and this desperation can be
mistaken Ior passion. But it's not the same Iire they had as a child.
On the other hand, it is very easy Ior a human being to gen
uinely, passionately, gutlevel belong to a group, to cheer Ior their
Iavorite sports team
o
. (This is the Ioundation on which rests the
swindle oI Republicans vs. Lemocrats" and analogous Ialse dilem
mas
7. The Virtue of Narrowness

What is true of one apple may not be true of another apple; thus more can be
said about a single apple than about all the apples in the world.
(Twelve Virtues of Rationality)
Within their own professions, people grasp the importance of
narrowness; a car mechanic knows the difference between a car-
buretor and a radiator, and would not think of them both as "car
parts". A hunter-gatherer knows the difference between a lion and
a panther. A janitor does not wipe the floor with window cleaner,
even if the bottles look similar to one who has not mastered the art.
Outside their own professions, people often commit the mis-
step of trying to broaden a word as widely as possible, to cover
as much territory as possible. Is it not more glorious, more wise,
more impressive, to talk about all the apples in the world? How
much loftier it must be to explain human thought in general, without
being distracted by smaller questions, such as how humans invent
techniques for solving a Rubik's Cube. Indeed, it scarcely seems
necessary to consider specific questions at all; isn't a general theory a
worthy enough accomplishment on its own?
It is the way of the curious to lift up one pebble from among
a million pebbles on the shore, and see something new about it,
something interesting, something different. You call these pebbles
"diamonds", and ask what might be special about them, what inner
qualities they might have in common, beyond the glitter you first
noticed. And then someone else comes along and says: "Why not
call this pebble a diamond too? And this one, and this one?" They
are enthusiastic, and they mean well. For it seems undemocratic
and exclusionary and elitist and unholistic to call some pebbles "dia-
monds", and others not. It seems... narrow-minded... if you'll pardon
the phrase. Hardly open, hardly embracing, hardly communal.
You might think it poetic, to give one word many meanings, and
thereby spread shades of connotation all around. But even poets, if
they are good poets, must learn to see the world precisely. It is not
enough to compare love to a flower. Hot jealous unconsummated
love is not the same as the love of a couple married for decades. If
you need a flower to symbolize jealous love, you must go into the
garden, and look, and make subtle distinctions: find a flower with
a heady scent, and a bright color, and thorns. Even if your intent
is to shade meanings and cast connotations, you must keep precise
track of exactly which meanings you shade and connote.
It is a necessary part of the rationalist's art (or even the poet's
art!) to focus narrowly on unusual pebbles which possess some spe-
cial quality. And look at the details which those pebbles (and those
pebbles alone!) share among each other. This is not a sin.
1. http://lesswrong.com/lw/ic/the_virtue_of_narrowness/
2. http://yudkowsky.net/virtues/
3. http://yudkowsky.net/virtues/
It is perfectly all right for modern evolutionary biologists to ex-
plain just the patterns of living creatures, and not the "evolution"
of stars or the "evolution" of technology. Alas, some unfortunate
souls use the same word "evolution" to cover the naturally selected
patterns of replicating life, and the strictly accidental structure of
stars, and the intelligently configured structure of technology. And
as we all know, if people use the same word, it must all be the same
thing. You should automatically generalize anything you think you
know about biological evolution to technology. Anyone who tells
you otherwise must be a mere pointless pedant. It couldn't possibly
be that your abysmal ignorance of modern evolutionary theory is
so total that you can't tell the difference between a carburetor and
a radiator. That's unthinkable. No, the other guy (you know, the
one who's studied the math) is just too dumb to see the connec-
tions.
And what could be more virtuous than seeing connections?
Surely the wisest of all human beings are the New Age gurus who
say "Everything is connected to everything else." If you ever say
this aloud, you should pause, so that everyone can absorb the sheer
shock of this Deep Wisdom.
There is a trivial mapping between a graph and its complement.
A fully connected graph, with an edge between every two vertices,
conveys the same amount of information as a graph with no edges
at all. The important graphs are the ones where some things are not
connected to some other things.
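The trivial mapping can be exhibited directly; a minimal sketch in plain Python (the four-vertex graph is an arbitrary choice):

```python
from itertools import combinations

def complement(vertices, edges):
    """Complement of an undirected graph: flip every possible edge."""
    all_edges = {frozenset(e) for e in combinations(vertices, 2)}
    return all_edges - {frozenset(e) for e in edges}

v = range(4)
empty = set()
full = complement(v, empty)  # the fully connected graph on 4 vertices

# The mapping is a bijection: complementing twice recovers the graph,
# so a fully connected graph and an empty graph carry the same information.
assert complement(v, full) == empty
print(len(full))  # 6 edges: C(4, 2)
```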
When the unenlightened ones try to be profound, they draw
endless verbal comparisons between this topic, and that topic,
which is like this, which is like that, until their graph is fully con-
nected and also totally useless. The remedy is specific knowledge
THE VIRTUE OF NARROWNESS 123
and in-depth study. When you understand things in detail, you can
see how they are not alike, and start enthusiastically subtracting
edges off your graph.
Likewise, the important categories are the ones that do not con-
tain everything in the universe. Good hypotheses can only explain
some possible outcomes, and not others.
It was perfectly all right for Isaac Newton to explain just gravity,
just the way things fall down (and how planets orbit the Sun, and
how the Moon generates the tides) but not the role of money in hu-
man society or how the heart pumps blood. Sneering at narrowness
is rather reminiscent of ancient Greeks who thought that going out
and actually looking at things was manual labor, and manual labor
was for slaves.
As Plato put it (in The Republic, Book VII):
"If anyone should throw back his head and learn
something by staring at the varied patterns on a ceiling,
apparently you would think that he was contemplating
with his reason, when he was only staring with his eyes.
I cannot but believe that no study makes the soul look
on high except that which is concerned with real being
and the unseen. Whether he gape and stare upwards, or
shut his mouth and stare downwards, if it be things of
the senses that he tries to learn something about, I
declare he never could learn, for none of these things
admit of knowledge: I say his soul is looking down, not
up, even if he is floating on his back on land or on sea!"
Many today make a similar mistake, and think that narrow con-
cepts are as lowly and unlofty and unphilosophical as, say, going out
and looking at things, an endeavor only suited to the underclass.
But rationalists (and also poets) need narrow words to express
precise thoughts; they need categories which include only some
things, and exclude others. There's nothing wrong with focusing
your mind, narrowing your categories, excluding possibilities, and
sharpening your propositions. Really, there isn't! If you make your
words too broad, you end up with something that isn't true and
doesn't even make good poetry.
124 MYSTERIOUS ANSWERS TO MYSTERIOUS QUESTIONS
And DON'T EVEN GET ME STARTED on people who think
Wikipedia is an Artificial Intelligence, the invention of LSD was a Sin-
gularity, or that corporations are superintelligent!
8. Your Strength as a Rationalist
So instead, by dint of mighty straining, I forced my model of re-
ality to explain an anomaly that never actually happened. And I knew
how embarrassing this was. I knew that the usefulness of a model is
not what it can explain, but what it can't. A hypothesis that forbids
nothing, permits everything, and thereby fails to constrain anticipa-
tion.
Your strength as a rationalist is your ability to be more confused
by fiction than by reality. If you are equally good at explaining any
outcome, you have zero knowledge.
We are all weak, from time to time; the sad part is that I could
have been stronger. I had all the information I needed to arrive at
the correct answer, I even noticed the problem, and then I ignored
it. My feeling of confusion was a Clue, and I threw my Clue away.
I should have paid more attention to that sensation of still feels
a little forced. It's one of the most important feelings a truthseeker
can have, a part of your strength as a rationalist. It is a design flaw
in human cognition that this sensation manifests as a quiet strain in
the back of your mind, instead of a wailing alarm siren and a glow-
ing neon sign reading "EITHER YOUR MODEL IS FALSE OR
THIS STORY IS WRONG."
2. http://www.overcomingbias.com/2007/08/truth-bias.html
3. http://www.wjh.harvard.edu/%7Edtg/Gilbert%20et%20al%20%28EVERYTHING%20YOU%20READ%29.pdf
4. Page 105, 'Making Beliefs Pay Rent (in Anticipated Experiences)'.
YOUR STRENGTH AS A RATIONALIST 127
9. Absence of Evidence Is Evidence of
Absence
Under the vast majority of real-life circumstances, a cause may not reliably produce signs of itself, but the absence of the cause is even less likely to produce the signs. The absence of an observation may be strong evidence of absence or very weak evidence of absence, depending on how likely the cause is to produce the observation. The absence of an observation that is only weakly permitted (even if the alternative hypothesis does not allow it at all) is very weak evidence of absence (though it is evidence nonetheless). This is the fallacy of "gaps in the fossil record": fossils form only rarely; it is futile to trumpet the absence of a weakly permitted observation when many strong positive observations have already been recorded. But if there are no positive observations at all, it is time to worry; hence the Fermi Paradox.
Your strength as a rationalist is your ability to be more confused by fiction than by reality.
So if you claim that "no sabotage" is evidence for the existence of a Japanese-American Fifth Column, you must conversely hold that seeing sabotage would argue against a Fifth Column. If you claim that "a good and proper life" is evidence that a woman is a witch, then an evil and improper life must be evidence that she is not a witch. If you argue that God, to test humanity's faith, refuses to reveal His existence, then the miracles described in the Bible must argue against the existence of God.
Doesn't quite sound right, does it? Pay attention to that feeling of this seems a little forced, that quiet strain in the back of your mind. It's important.
For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.

This realization can take quite a load off your mind. You need not worry about how to interpret every possible experimental result to confirm your theory. You needn't bother planning how to make any given iota of evidence confirm your theory, because you know that for every expectation of evidence, there is an equal and opposite expectation of counterevidence. If you try to weaken the
4. 'An Intuitive Explanation of Bayes' Theorem'.
5. Page 128, 'Absence of Evidence Is Evidence of Absence'.
6. http://lesswrong.com/lw/i8/religions_claim_to_be_nondisprovable/
7. Page 126, 'Your Strength as a Rationalist'.
CONSERVATION OF EXPECTED EVIDENCE 131
counterevidence of a possible "abnormal" observation, you can only do it by weakening the support of a "normal" observation, to a precisely equal and opposite degree. It is a zero-sum game. No matter how you connive, no matter how you argue, no matter how you strategize, you can't possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.

You might as well sit back and relax while you wait for the evidence to come in.
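The zero-sum balance can be checked directly. A minimal sketch, with purely illustrative numbers of my own: average the posterior over both possible observations, weighted by how likely each observation is, and the result always equals the prior.

```python
def expected_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Average of P(H|E) and P(H|not-E), weighted by how likely each
    outcome is. Conservation of expected evidence says this always
    equals the prior P(H)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    post_e = p_e_given_h * prior / p_e                  # P(H | E)
    post_not_e = (1 - p_e_given_h) * prior / (1 - p_e)  # P(H | not-E)
    return p_e * post_e + (1 - p_e) * post_not_e

# Holds for any prior and any likelihoods (as long as both outcomes
# have nonzero probability):
for prior, a, b in [(0.5, 0.9, 0.2), (0.01, 0.7, 0.7), (0.3, 0.05, 0.95)]:
    assert abs(expected_posterior(prior, a, b) - prior) < 1e-12
```

The identity falls straight out of the algebra: the E term contributes P(E|H)·P(H) and the not-E term contributes P(not-E|H)·P(H), which sum to P(H) exactly.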
…human psychology is so screwed up.
132 MYSTERIOUS ANSWERS TO MYSTERIOUS QUESTIONS
11. Hindsight bias
Hindsight bias is when people who know the answer vastly overestimate its predictability or obviousness, compared to the estimates of subjects who must guess without advance knowledge. Hindsight bias is sometimes called the I-knew-it-all-along effect.
Fischhoff and Beyth (1975) presented students with historical accounts of unfamiliar incidents, such as a conflict between the Gurkhas and the British in 1814. Given the account as background knowledge, five groups of students were asked what they would have predicted as the probability for each of four outcomes: British victory, Gurkha victory, stalemate with a peace settlement, or stalemate with no peace settlement. Four experimental groups were respectively told that these four outcomes were the historical outcome. The fifth, control group was not told any historical outcome. In every case, a group told an outcome assigned substantially higher probability to that outcome than did any other group or the control group.
Hindsight bias matters in legal cases, where a judge or jury must determine whether a defendant was legally negligent in failing to foresee a hazard (Sanchirico 2003). In an experiment based on an actual legal case, Kamin and Rachlinski (1995) asked two groups to estimate the probability of flood damage caused by blockage of a city-owned drawbridge. The control group was told only the background information known to the city when it decided not to hire a bridge watcher. The experimental group was given this information, plus the fact that a flood had actually occurred. Instructions stated the city was negligent if the foreseeable probability of flooding was greater than 10%. 76% of the control group concluded the flood was so unlikely that no precautions were necessary; 57% of the experimental group concluded the flood was so likely that failure to take precautions was legally negligent. A third experimental group was told the outcome and also explicitly instructed to avoid hindsight bias, which made no difference: 56% concluded the city was legally negligent.
Viewing history through the lens of hindsight, we vastly underestimate the cost of effective safety precautions. In 1986, the Challenger exploded for reasons traced to an O-ring losing flexibility at low temperature. There were warning signs of a problem with the O-rings. But preventing the Challenger disaster would have required, not attending to the problem with the O-rings, but attending to every warning sign which seemed as severe as the O-ring problem, without benefit of hindsight. It could have been done, but it would have required a general policy much more expensive than just fixing the O-rings.
Shortly after September 11th, 2001, I thought to myself, and now someone will turn up minor intelligence warnings of something-or-other, and then the hindsight will begin. Yes, I'm sure they had some minor warnings of an al Qaeda plot, but they probably also had minor warnings of mafia activity, nuclear material for sale, and an invasion from Mars.
Because we don't see the cost of a general policy, we learn overly specific lessons. After September 11th, the FAA prohibited box cutters on airplanes, as if the problem had been the failure to take this particular "obvious" precaution. We don't learn the general lesson: the cost of effective caution is very high because you must attend to problems that are not as obvious now as past problems seem in hindsight.
The test of a model is how much probability it assigns to the observed outcome. Hindsight bias systematically distorts this test; we think our model assigned much more probability than it actually did. Instructing the jury doesn't help. You have to write down your predictions in advance. Or as Fischhoff (1982) put it:

When we attempt to understand past events, we implicitly test the hypotheses or rules we use both to interpret and to anticipate the world around us. If, in hindsight, we systematically underestimate the surprises that the past held and holds for us, we are subjecting those hypotheses to inordinately weak tests and, presumably, finding little reason to change them.
2. Page 130, 'Conservation of Expected Evidence'.
Fischhoff, B. 1982. For those condemned to study the past: Heuristics and biases in hindsight. In Kahneman et al. 1982: 332–351.

Fischhoff, B., and Beyth, R. 1975. I knew it would happen: Remembered probabilities of once-future things. Organizational Behavior and Human Performance, 13: 1–16.

Kamin, K. and Rachlinski, J. 1995. Ex Post ≠ Ex Ante: Determining Liability in Hindsight
12. Hindsight Devalues Science

This excerpt from Myers's Exploring Social Psychology is worth reading in its entirety. Cullen Murphy, editor of The Atlantic, said that the social sciences turn up "no ideas or conclusions that can't be found in [any] encyclopedia of quotations. Day after day social scientists go out into the world. Day after day they discover that people's behavior is pretty much what you'd expect."
Of course, the "expectation" is all hindsight. (Hindsight bias: Subjects who know the actual answer to a question assign much higher probabilities they "would have" guessed for that answer, compared to subjects who must guess without knowing the answer.)
The historian Arthur Schlesinger, Jr. dismissed scientific studies of WWII soldiers' experiences as "ponderous demonstrations" of common sense. For example:

1. Better educated soldiers suffered more adjustment problems than less educated soldiers. (Intellectuals were less prepared for battle stresses than street-smart people.)

2. Southern soldiers coped better with the hot South Sea Island climate than Northern soldiers. (Southerners are more accustomed to hot weather.)

3. White privates were more eager to be promoted to noncommissioned officers than Black privates. (Years of oppression take a toll on achievement motivation.)

4. Southern Blacks preferred Southern to Northern White officers (because Southern officers were more experienced and skilled in interacting with Blacks).

5. As long as the fighting continued, soldiers were more eager to return home than after the war ended. (During the fighting, soldiers knew they were in mortal danger.)
How many of these findings do you think you could have predicted in advance? 3 out of 5? 4 out of 5? Are there any cases where you would have predicted the opposite, where your model takes a hit?
13. Fake Explanations

Once upon a time, there was an instructor who taught physics students. One day she called them into her class, and showed them a wide, square plate of metal, next to a hot radiator. The students each put their hand on the plate, and found the side next to the radiator cool, and the distant side warm. And the instructor said, Why do you think this happens? Some students guessed convection of air currents, and others guessed strange metals in the plate. They devised many creative explanations, none stooping so low as to say "I don't know" or "This seems impossible."

And the answer was that before the students entered the room, the instructor turned the plate around.
Consider the student who frantically stammers, "Eh, maybe because of the heat conduction and so?" I ask: Is this answer a proper belief? The words are easily enough professed, said in a loud, emphatic voice. But do the words actually control anticipation?

Ponder that innocent little phrase, "because of", which comes before "heat conduction". Ponder some of the other things we could put after it. We could say, for example, "Because of phlogiston", or "Because of magic."
"Magic!" you cry. "That's not a scientific explanation!" Indeed, the phrases "because of heat conduction" and "because of magic" are readily recognized as belonging to different literary genres. "Heat conduction" is something that Spock might say on Star Trek, whereas "magic" would be said by Giles in Buffy the Vampire Slayer.

However, as Bayesians, we take no notice of literary genres. For us, the substance of a model is the control it exerts on anticipation. If you say "heat conduction", what experience does that lead you to anticipate? Under normal circumstances, it leads you to anticipate that, if you put your hand on the side of the plate near the radiator, that side will feel warmer than the opposite side. If "because of heat conduction" can also explain the radiator-adjacent side feeling cooler, then it can explain pretty much anything.
1. http://lesswrong.com/lw/ip/fake_explanations/
2. Page 126, 'Your Strength as a Rationalist'.
3. Page 117, 'Belief as Attire'.
4. 'Professing and Cheering'.
5. Page 108, 'Belief in Belief'.
And as we all know by this point (I do hope), if you are equally good at explaining any outcome, you have zero knowledge. "Because of heat conduction", used in such fashion, is a disguised hypothesis of maximum entropy. It is anticipation-isomorphic to saying "magic". It feels like an explanation, but it's not.
Suppose that instead of guessing, we measured the heat of the metal plate at various points and various times. Seeing a metal plate next to the radiator, we would ordinarily expect the point temperatures to satisfy an equilibrium of the diffusion equation with respect to the boundary conditions imposed by the environment. You might not know the exact temperature of the first point measured, but after measuring the first points (I'm not physicist enough to know how many would be required) you could take an excellent guess at the rest.
A true master of the art of using numbers to constrain the anticipation of material phenomena (a "physicist") would take some measurements and say, "This plate was in equilibrium with the environment two and a half minutes ago, turned around, and is now approaching equilibrium again."
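The kind of constraint the diffusion equation imposes can be sketched numerically. This is a toy illustration of my own (a 1-D rod with made-up temperatures, not anything from the text): hold the two ends at fixed temperatures and relax the interior, and the profile converges to the straight-line equilibrium, so a few point measurements pin down all the rest.

```python
def relax(n_points, hot, cold, alpha=0.25, steps=5000):
    """Explicit finite-difference relaxation of the 1-D diffusion (heat)
    equation: T_i += alpha * (T_{i-1} - 2*T_i + T_{i+1}), with the two
    boundary temperatures held fixed (alpha <= 0.5 keeps it stable)."""
    t = [cold] * n_points
    t[0] = hot
    for _ in range(steps):
        t = ([hot]
             + [t[i] + alpha * (t[i - 1] - 2 * t[i] + t[i + 1])
                for i in range(1, n_points - 1)]
             + [cold])
    return t

profile = relax(11, hot=80.0, cold=20.0)
# At equilibrium the interior points lie on a straight line between the
# boundary temperatures: 80, 74, 68, ..., 26, 20.
```

Once you know the system is at (or relaxing toward) this equilibrium, measuring two or three points determines the whole profile, which is exactly the sense in which the model constrains anticipation.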
The deeper error of the students is not simply that they failed to constrain anticipation. Their deeper error is that they thought they were doing physics. They said the phrase "because of", followed by the sort of words Spock might say on Star Trek, and thought they thereby entered the magisterium of science.

Not so. They simply moved their magic from one literary genre to another.
6. Page 105, 'Making Beliefs Pay Rent (in Anticipated Experiences)'.
7. Page 108, 'Belief in Belief'.
8. Page 112, 'Bayesian Judo'.
9. 'Professing and Cheering'.
10. Page 117, 'Belief as Attire'.
11. http://lesswrong.com/lw/i8/religions_claim_to_be_nondisprovable/
12. Page 119, 'Focus Your Uncertainty'.
13. Page 126, 'Your Strength as a Rationalist'.
14. Page 128, 'Absence of Evidence Is Evidence of Absence'.
15. Page 136, 'Hindsight Devalues Science'.
16. Page 130, 'Conservation of Expected Evidence'.
FAKE EXPLANATIONS 139
14. Guessing the Teacher's Password
, and
we all know what that would imply.
The X-Men comics use terms like "evolution", "mutation", and "genetic code", purely to place themselves in what they conceive to be the literary genre of science. The part that scares me is wondering how many people, especially in the media, understand science only as a literary genre.
I encounter people who very definitely believe in evolution, who sneer at the folly of creationists. And yet they have no idea of what the theory of evolutionary biology permits and prohibits. They'll talk about "the next step in the evolution of humanity", as if natural selection got here by following a plan. Or even worse, they'll talk about something completely outside the domain of evolutionary biology, like an improved design for computer chips, or corporations splitting, or humans uploading themselves into computers, and they'll call that "evolution". If evolutionary biology could cover that, it could cover anything.

1. http://lesswrong.com/lw/ir/science_as_attire/
2. Page 138, 'Fake Explanations'.
3. Page 117, 'Belief as Attire'.
4. Page 126, 'Your Strength as a Rationalist'.
5. 'Professing and Cheering'.
Probably an actual majority of the people who believe in evolution use the phrase "because of evolution" because they want to be part of the scientific in-crowd: belief as scientific attire, like wearing a lab coat. If the scientific in-crowd instead used the phrase "because of intelligent design", they would just as cheerfully use that instead; it would make no difference to their anticipation-controllers. Saying "because of evolution" instead of "because of intelligent design" does not, for them, prohibit Storm. Its only purpose, for them, is to identify with a tribe.
I encounter people who are quite willing to entertain the notion of dumber-than-human Artificial Intelligence, or even mildly smarter-than-human Artificial Intelligence. Introduce the notion of strongly superhuman Artificial Intelligence, and they'll suddenly decide it's "pseudoscience". It's not that they think they have a theory of intelligence which lets them calculate a theoretical upper bound on the power of an optimization process. Rather, they associate strongly superhuman AI to the literary genre of apocalyptic literature, whereas an AI running a small corporation associates to the literary genre of Wired magazine. They aren't speaking from within a model of cognition. They don't realize they need a model. They don't realize that science is about models. Their devastating critiques consist purely of comparisons to apocalyptic literature, rather than, say, known laws which prohibit such an outcome. They understand science only as a literary genre, or in-group to belong to. The attire doesn't look to them like a lab coat; this isn't the football team they're cheering for.
Is there anything in science that you are proud of believing, and yet you do not use the belief professionally? You had best ask yourself which future experiences your belief prohibits from happening to you. That is the sum of what you have assimilated and made a true part of yourself. Anything else is probably passwords or attire.

6. Page 138, 'Fake Explanations'.
7. Page 117, 'Belief as Attire'.
8. http://lesswrong.com/lw/io/is_molecular_nanotechnology_scientific/
9. Page 117, 'Belief as Attire'.
10. 'Professing and Cheering'.

SCIENCE AS ATTIRE 145

1. Page 140, 'Guessing the Teacher's Password'.
2. Page 117, 'Belief as Attire'.
16. Fake Causality
about something doesn't qualify it as a stop-sign. I'm not exactly fond of terrorists or fearful of private property; that doesn't mean "Terrorists!" or "Capitalism!" are cognitive traffic signals unto me. (The word "intelligence" did once have that effect on me, though no longer.) What distinguishes a semantic stop-sign is failure to consider the obvious next question.
1. Page 625, 'Feeling Rational'.
SEMANTIC STOPSIGNS 153
18. Mysterious Answers to Mysterious
Questions
, Fake Causality:
"Emergence" has become very popular, just as saying "magic" used to be very popular. "Emergence" has the same deep appeal to human psychology, for the same reason. "Emergence" is such a wonderfully easy explanation, and it feels good to say it; it gives you a sacred mystery to worship. Emergence is popular because it is the junk food of curiosity. You can explain anything using emergence, and so people do just that, for it feels so wonderful to explain things. Humans are still humans, even if they've taken a few science classes in college. Once they find a way to escape the shackles of settled science, they get up to the same shenanigans as their ancestors, dressed up in the literary genre of "science" but still the same species psychology.
2. Page 140, 'Guessing the Teacher's Password'.
3. Page 138, 'Fake Explanations'.
4. Page 126, 'Your Strength as a Rationalist'.
5. Page 155, 'Mysterious Answers to Mysterious Questions'.
6. http://yudkowsky.net/virtues/
7. Page 117, 'Belief as Attire'.
20. Say Not "Complexity"
It is said that parents do all the things they tell their children not to
do, which is how they know not to do them.
Long ago, in the unthinkably distant past, I was a devoted Traditional Rationalist, conceiving myself skilled according to that kind, yet I knew not the Way of Bayes. When the young Eliezer was confronted with a mysterious-seeming question, the precepts of Traditional Rationality did not stop him from devising a Mysterious Answer. It is, by far, the most embarrassing mistake I made in my life, and I still wince to think of it.

What was my mysterious answer to a mysterious question? This I will not describe, for it would be a long tale and complicated. I was young, and a mere Traditional Rationalist who knew not the teachings of Tversky and Kahneman. I knew about Occam's Razor, but not the conjunction fallacy.
23. Failing to Learn from History

Once upon a time, in my wild and reckless youth, when I knew not the Way of Bayes, I gave a Mysterious Answer to a mysterious-seeming question. Many failures occurred in sequence, but one mistake stands out as most critical: my younger self did not realize that solving a mystery should make it feel less confusing. I was trying to explain a Mysterious Phenomenon, which to me meant providing a cause for it, fitting it into an integrated model of reality. Why should this make the phenomenon less Mysterious, when that is its nature? I was trying to explain the Mysterious Phenomenon, not render it (by some impossible alchemy) into a mundane phenomenon, a phenomenon that wouldn't even call out for an unusual explanation in the first place.
As a Traditional Rationalist, I knew the historical tales of astrologers and astronomy, of alchemists and chemistry, of vitalists and biology. But the Mysterious Phenomenon was not like this. It was something new, something stranger, something more difficult, something that ordinary science had failed to explain for centuries, as if stars and matter and life had not been mysteries for hundreds of years and thousands of years, from the dawn of human thought right up until science finally solved them.
We learn about astronomy and chemistry and biology in school, and it seems to us that these matters have always been the proper realm of science, that they have never been mysterious. When science dares to challenge a new Great Puzzle, the children of that generation are skeptical, for they have never seen science explain something that feels mysterious to them. Science is only good for explaining scientific subjects, like stars and matter and life.
I thought the lesson of history was that astrologers and alchemists and vitalists had an innate character flaw, a tendency toward mysterianism, which led them to come up with mysterious explanations for non-mysterious subjects. But surely, if a phenomenon really was very weird, a weird explanation might be in order?

1. http://lesswrong.com/lw/iz/failing_to_learn_from_history/
2. Page 167, 'My Wild and Reckless Youth'.
3. Page 167, 'My Wild and Reckless Youth'.
4. Page 155, 'Mysterious Answers to Mysterious Questions'.
5. 'Correspondence Bias'.
It was only afterward, when I began to see the mundane structure inside the mystery, that I realized whose shoes I was standing in. Only then did I realize how reasonable vitalism had seemed at the time, how surprising and embarrassing had been the universe's reply of, "Life is mundane, and does not need a weird explanation."

We read history but we don't live it, we don't experience it. If only I had personally postulated astrological mysteries and then discovered Newtonian mechanics, postulated alchemical mysteries and then discovered chemistry, postulated vitalistic mysteries and then discovered biology. I would have thought of my Mysterious Answer and said to myself: No way am I falling for that again.
FAILING TO LEARN FROM HISTORY 171
24. Making History Available
There is an inverse error to generalizing from fictional evidence: failing to be sufficiently moved by historical evidence. The trouble with generalizing from fictional evidence is that it is fiction; it never actually happened. It's not drawn from the same distribution as this, our real universe; fiction differs from reality in systematic ways. But history has happened, and should be available.
In our ancestral environment, there were no movies; what you saw with your own eyes was true. Is it any wonder that fictions we see in lifelike moving pictures have too great an impact on us? Conversely, things that really happened, we encounter as ink on paper; they happened, but we never saw them happen. We don't remember them happening to us.

The inverse error is to treat history as mere story, process it with the same part of your mind that handles the novels you read. You may say with your lips that it is "truth", rather than "fiction", but that doesn't mean you are being moved as much as you should be. Many biases involve being insufficiently moved by dry, abstract information.
Once upon a time, I gave a Mysterious Answer to a mysterious question, not realizing that I was making exactly the same mistake as astrologers devising mystical explanations for the stars, or alchemists devising magical properties of matter, or vitalists postulating an opaque "elan vital" to explain all of biology.

1. http://lesswrong.com/lw/j0/making_history_available/
2. Page 170, 'Failing to Learn from History'.
3. http://en.wikipedia.org/wiki/Availability_heuristic
4. http://singinst.org/Biases.pdf
5. http://www.overcomingbias.com/2007/07/tell-your-anti-.html
6. http://lesswrong.com/lw/hw/scope_insensitivity/
7. Page 155, 'Mysterious Answers to Mysterious Questions'.
When I finally realized whose shoes I was standing in, there was a sudden shock of unexpected connection with the past. I realized that the invention and destruction of vitalism (which I had only read about in books) had actually happened to real people, who experienced it much the same way I experienced the invention and destruction of my own mysterious answer. And I also realized that if I had actually experienced the past, if I had lived through past scientific revolutions myself rather than reading about them in history books, I probably would not have made the same mistake again. I would not have come up with another mysterious answer; the first thousand lessons would have hammered home the moral.
So (I thought), to feel sufficiently the force of history, I should try to approximate the thoughts of an Eliezer who had lived through history; I should try to think as if everything I read about in history books had actually happened to me. (With appropriate reweighting for the availability bias of history books: I should remember being a thousand peasants for every ruler.) I should immerse myself in history, imagine living through eras I only saw as ink on paper.
Why should I remember the Wright Brothers' first flight? I was not there. But as a rationalist, could I dare to not remember, when the event actually happened? Is there so much difference between seeing an event through your eyes (which is actually a causal chain involving reflected photons, not a direct connection) and seeing an event through a history book? Photons and history books both descend by causal chains from the event itself.
I had to overcome the false amnesia of being born at a particular time. I had to recall (make available) all the memories, not just the memories which, by mere coincidence, belonged to myself and my own era.

The Earth became older, of a sudden.
To my former memory, the United States had always existed; there was never a time when there was no United States. I had not remembered, until that time, how the Roman Empire rose, and brought peace and order, and lasted through so many centuries, until I forgot that things had ever been otherwise; and yet the Empire fell, and barbarians overran my city, and the learning that I had possessed was lost. The modern world became more fragile to my eyes; it was not the first modern world.

So many mistakes, made over and over and over again, because I did not remember making them, in every era I never lived.

8. Page 170, 'Failing to Learn from History'.
9. http://en.wikipedia.org/wiki/Availability_heuristic

MAKING HISTORY AVAILABLE 173
And to think, people sometimes wonder if overcoming bias is important.

Don't you remember how many times your biases have killed you? You don't? I've noticed that sudden amnesia often follows a fatal mistake. But take it from me, it happened. I remember; I wasn't there.
So the next time you doubt the strangeness of the future, remember how you were born in a hunter-gatherer tribe ten thousand years ago, when no one knew of Science at all. Remember how you were shocked, to the depths of your being, when Science explained the great and terrible sacred mysteries that you once revered so highly. Remember how you once believed that you could fly by eating the right mushrooms, and then you accepted with disappointment that you would never fly, and then you flew. Remember how you had always thought that slavery was right and proper, and then you changed your mind. Don't imagine how you could have predicted the change, for that is amnesia. Remember that, in fact, you did not guess. Remember how, century after century, the world changed in ways you did not guess.

Maybe then you will be less shocked by what happens next.
10. 'Hindsight bias'.
25. Explain/Worship/Ignore?
, Say Not "Complexity" (which I failed to Google; found by Jeff Grey and Miguel) from some dictator or other, who was asked if he had any intentions to move his pet state toward democracy:
We believe we are already within a democratic system.
Some Iactors are still missing, like the expression oI the
people's will.
The substance of a democracy is the specific mechanism that resolves policy conflicts. If all groups had the same preferred policies, there would be no need for democracy; we would automatically cooperate. The resolution process can be a direct majority vote, or an elected legislature, or even a voter-sensitive behavior of an AI, but it has to be something. What does it mean to call for a "democratic" solution if you don't have a conflict-resolution mechanism in mind?

I think it means that you have said the word "democracy", so the audience is supposed to cheer. It's not so much a propositional statement, as the equivalent of the "Applause" light that tells a studio audience when to clap.
This case is remarkable only in that I mistook the applause light for a policy suggestion, with subsequent embarrassment for all. Most applause lights are much more blatant, and can be detected by a simple reversal test. For example, suppose someone says:

We need to balance the risks and opportunities of AI.
1. http://www.time.com/time/magazine/article/0,9171,954853,00.html
APPLAUSE LIGHTS 181
If you reverse this statement, you get:

We shouldn't balance the risks and opportunities of AI.

Since the reversal sounds abnormal, the unreversed statement is probably normal, implying it does not convey new information. There are plenty of legitimate reasons for uttering a sentence that would be uninformative in isolation. "We need to balance the risks and opportunities of AI" can introduce a discussion topic; it can emphasize the importance of a specific proposal for balancing; it can criticize an unbalanced proposal. Linking to a normal assertion can convey new information to a bounded rationalist; the link itself may not be obvious. But if no specifics follow, the sentence is probably an applause light.
I am tempted to give a talk sometime that consists of nothing but applause lights, and see how long it takes for the audience to start laughing:
I am here to propose to you today that we need to balance the risks and opportunities of advanced Artificial Intelligence. We should avoid the risks and, insofar as it is possible, realize the opportunities. We should not needlessly confront entirely unnecessary dangers. To achieve these goals, we must plan wisely and rationally. We should not act in fear and panic, or give in to technophobia, but neither should we act in blind enthusiasm. We should respect the interests of all parties with a stake in the Singularity. We must try to ensure that the benefits of advanced technologies accrue to as many individuals as possible, rather than being restricted to a few. We must try to avoid, as much as possible, violent conflicts using these technologies, and we must prevent massive destructive capability from falling into the hands of individuals. We should think through these issues before, not after, it is too late to do anything about them.
28. Truly Part Of You
So now it's occurred to me that my productivity problem may not be chaos, but my own stupidity.

And that may or may not help anything. It certainly doesn't fix the problem right away. Saying "I'm ignorant" doesn't make you knowledgeable.

But it is, at least, a different path than saying "it's too chaotic".

1. Page 685, 'Mind Projection Fallacy'.
A series on the use and abuse of words; why you often cant
define a word any way you like; how human brains seem
to process definitions. First introduces the Mind
projection Iallacy and the concept of how an algorithm
feels from inside, which makes it a basic intro to key
elements of the LW zeitgeist.
Part III
A Human's Guide to Words
1. The Parable of the Dagger
Once upon a time, there was a court jester who dabbled in logic.
The jester presented the king with two boxes. Upon the first box was inscribed:
"Either this box contains an angry frog, or the box with a false inscription contains an angry frog, but not both."
On the second box was inscribed:
"Either this box contains gold and the box with a false inscription contains an angry frog, or this box contains an angry frog and the box with a true inscription contains gold."
And the jester said to the king: "One box contains an angry frog, the other box gold; and one, and only one, of the inscriptions is true."
The king opened the wrong box, and was savaged by an angry frog.
"You see," the jester said, "let us hypothesize that the first inscription is the true one. Then suppose the first box contains gold. Then the other box would have an angry frog, while the box with a true inscription would contain gold, which would make the second statement true as well. Now hypothesize that the first inscription is false, and that the first box contains gold. Then the second inscription would be..."
The king ordered the jester thrown in the dungeons.
A day later, the jester was brought before the king in chains, and shown two boxes.
"One box contains a key," said the king, "to unlock your chains; and if you find the key you are free. But the other box contains a dagger for your heart, if you fail."
And the first box was inscribed:
"Either both inscriptions are true, or both inscriptions are false."
1. http://lesswrong.com/lw/ne/the_parable_of_the_dagger/
And the second box was inscribed:
"This box contains the key."
The jester reasoned thusly: "Suppose the first inscription is true. Then the second inscription must also be true. Now suppose the first inscription is false. Then again the second inscription must be true. So the second box must contain the key, if the first inscription is true, and also if the first inscription is false. Therefore, the second box must logically contain the key."
The jester opened the second box, and found a dagger.
"How?!" cried the jester in horror, as he was dragged away. "It's logically impossible!"
"It is entirely possible," replied the king. "I merely wrote those inscriptions on two boxes, and then I put the dagger in the second one."
(Adapted from Raymond Smullyan.)
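The king's point can be checked mechanically. Here is a short Python sketch (my illustration, not code from the original text or from Smullyan) that enumerates truth values for the two inscriptions on the king's boxes in each possible world. In the world the jester tacitly assumed, consistent truth assignments exist; in the actual world, with the dagger in the second box, the self-referential first inscription has no consistent truth value at all.

```python
from itertools import product

# The king's boxes.  Inscription 1: "Either both inscriptions are true,
# or both inscriptions are false."  Inscription 2: "This box contains
# the key."  The jester assumed the inscriptions have consistent truth
# values; enumerate the candidate assignments and see.
def consistent_assignments(key_in_box2):
    assignments = []
    for t1, t2 in product([True, False], repeat=2):
        says1 = (t1 and t2) or (not t1 and not t2)  # inscription 1's claim
        says2 = key_in_box2                         # inscription 2's claim
        if t1 == says1 and t2 == says2:             # self-consistency
            assignments.append((t1, t2))
    return assignments

print(consistent_assignments(True))   # the jester's assumed world
print(consistent_assignments(False))  # the king's actual world
```

The second call returns an empty list: once the dagger is in the second box, the first inscription is neither consistently true nor consistently false. It was only ink on a box, exactly as the king says.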
192 A HUMAN'S GUIDE TO WORDS
2. The Parable of Hemlock
You could reason, perhaps, as follows: "All things I have observed which wear clothing, speak language, and use tools, have also shared certain other properties as well, such as breathing air and pumping red blood. The last thirty 'humans' belonging to this cluster, whom I observed to drink hemlock, soon fell over and stopped moving. Socrates wears a toga, speaks fluent ancient Greek, and drank hemlock from a cup. So I predict that Socrates will keel over in the next five minutes."
1. http://lesswrong.com/lw/nf/the_parable_of_hemlock/
2. Page j, 'The Parable of the Dagger'.
3. http://machall.com/view.php?date=2003-04-21
But that would be mere guessing. It wouldn't be, y'know, absolutely and eternally certain.
Yet the brain goes on about its work of categorization, whether or not we consciously approve. "All humans are mortal, Socrates is a human, therefore Socrates is mortal"; thus spake the ancient Greek philosophers. Well, if mortality is part of your logical definition of "human", you can't logically classify Socrates as human until you observe him to be mortal.
Is "transhumanism" a cult?
People who argue that atheism is a religion because it "states beliefs about God" are really trying to argue (I think) that the reasoning methods used in atheism are on a par with the reasoning methods used in religion, or that atheism is no safer than religion in terms of the probability of causally engendering violence, etc. What's really at stake is an atheist's claim of substantial difference and superiority relative to religion, which the religious person is trying to reject by denying the difference rather than the superiority(!)
But that's not the a priori irrational part: The a priori irrational part is where, in the course of the argument, someone pulls out a dictionary and looks up the definition of "atheism" or "religion". (And yes, it's just as silly whether an atheist or religionist does it.) How could a dictionary possibly decide whether an empirical cluster of atheists is really substantially different from an empirical cluster of theologians? How can reality vary with the meaning of a word? The points in thingspace don't move around when we redraw a boundary.
But people often don't realize that their argument about where to draw a definitional boundary, is really a dispute over whether to infer a characteristic shared by most things inside an empirical cluster.
Hence the phrase, "disguised query".
3. Page jz, 'Uncritical Supercriticality'.
4. Page , 'Cultish Countercultishness'.
9. Neural Categories
Color, shape, texture, luminance, and interior.
Suppose I want to create an Artificial Neural Network (ANN) to predict unobserved blegg characteristics from observed blegg characteristics. And suppose I'm fairly naive about ANNs: I've read excited popular science books about how neural networks are distributed, emergent, and parallel just like the human brain! But I can't derive the differential equations for gradient descent in a non-recurrent multilayer network with sigmoid units (which is actually a lot easier than it sounds).
Then I might design a neural network that looks something like this:
(Figure: Network 1, with direct connections between every pair of the five observable nodes.)
1. http://lesswrong.com/lw/nn/neural_categories/
2. Page z, 'Disguised Queries'.
3. Page z, 'Disguised Queries'.
4. Page zoj, 'The Cluster Structure of Thingspace'.
Network 1 is for classifying bleggs and rubes. But since "blegg" is an unfamiliar and synthetic concept, I've also included a similar Network 1b for distinguishing humans from Space Monsters, with input from Aristotle ("All men are mortal") and Plato's Academy ("A featherless biped with broad nails").
A neural network needs a learning rule. The obvious idea is that when two nodes are often active at the same time, we should strengthen the connection between them; this is one of the first rules ever proposed for training a neural network, known as Hebb's Rule.
Thus, if you often saw things that were both blue and furred, thus simultaneously activating the "color" node in the + state and the "texture" node in the + state, the connection would strengthen between color and texture, so that + colors activated + textures, and vice versa. If you saw things that were blue and egg-shaped and vanadium-containing, that would strengthen positive mutual connections between color and shape and interior.
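As a toy sketch (my own illustration, not code from the original text; the learning rate and observation counts are invented), Hebb's Rule for this five-node network is only a few lines of Python:

```python
FEATURES = ["color", "shape", "texture", "luminance", "interior"]
N = len(FEATURES)

def hebbian_train(observations, rate=0.1):
    """Each observation is a list of N activations, +1 or -1.
    Hebb's Rule: strengthen the connection between co-active nodes."""
    weights = [[0.0] * N for _ in range(N)]  # weights[i][j]: node i <-> node j
    for x in observations:
        for i in range(N):
            for j in range(N):
                if i != j:
                    weights[i][j] += rate * x[i] * x[j]
    return weights

# Plenty of bleggs (all features +1) and rubes (all features -1):
observations = [[+1] * N] * 50 + [[-1] * N] * 50
w = hebbian_train(observations)
print(w[0][2] > 0)  # color <-> texture connection: strongly positive
```

After training on bleggs and rubes, every pair of features has always been co-active, so every mutual connection is strongly positive: + colors activate + textures, exactly as described above.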
Let's say you've already seen plenty of bleggs and rubes come off the conveyor belt. But now you see something that's furred, egg-shaped, and (gasp!) reddish-purple (which we'll model as a "color" activation level of -2/3). You haven't yet tested the luminance, or the interior. What to predict, what to predict?
What happens then is that the activation levels in Network 1 bounce around a bit. Positive activation flows to luminance from shape; negative activation flows to interior from color; negative activation flows from interior to luminance. Of course all these messages are passed in parallel! and asynchronously! just like the human brain.
Finally Network 1 settles into a stable state, which has high positive activation for "luminance" and "interior". The network may be said to "expect" (though it has not yet seen) that the object will glow in the dark, and that it contains vanadium.
And lo, Network 1 exhibits this behavior even though there's no explicit node that says whether the object is a blegg or not. The judgment is implicit in the whole network! Bleggness is an attractor! which arises as the result of emergent behavior! from the distributed! learning rule.
Now in real life, this kind of network design, however faddish it may sound, runs into all sorts of problems. Recurrent networks don't always settle right away: They can oscillate, or exhibit chaotic behavior, or just take a very long time to settle down. This is a Bad Thing when you see something big and yellow and striped, and you have to wait five minutes for your distributed neural network to settle into the "tiger" attractor. Asynchronous and parallel it may be, but it's not real-time.
And there are other problems, like double-counting the evidence when messages bounce back and forth: If you suspect that an object glows in the dark, your suspicion will activate belief that the object contains vanadium, which in turn will activate belief that the object glows in the dark.
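The echo effect is easy to see numerically. In this toy sketch (my numbers, invented for illustration; a real inference procedure would have to track where each message originally came from), two mutually connected nodes keep amplifying each other's suspicion even though no new observation ever arrives:

```python
# "glow" and "vanadium" each start as a weak suspicion of 0.1; on each
# round, each node adds 0.9 times the other's current belief to its own.
glow, vanadium = 0.1, 0.1
weight = 0.9
for _ in range(5):
    glow, vanadium = glow + weight * vanadium, vanadium + weight * glow
print(round(glow, 3))  # the same 0.1 of evidence, counted over and over:
                       # prints 2.476
```

Each node is mostly re-hearing evidence it originally sent out itself, so both "beliefs" grow without bound.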
Plus if you try to scale up the Network 1 design, it requires O(N²) connections, where N is the total number of observables.
So what might be a more realistic neural network design?
5. http://thedailywtf.com/Articles/No,_We_Need_a_Neural_Network.aspx
6. Page ;, 'Fake Causality'.
NEURAL CATEGORIES 219
(Figure: Network 2, in which every observable node connects only to a single central node.)
In this network, a wave of activation converges on the central node from any clamped (observed) nodes, and then surges back out again to any unclamped (unobserved) nodes. Which means we can compute the answer in one step, rather than waiting for the network to settle; an important requirement in biology when the neurons only run at 20Hz. And the network architecture scales as O(N), rather than O(N²).
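A minimal sketch of the Network 2 idea in Python (my illustration, not the author's code; the simple averaging rule is an invented stand-in for whatever the central node really computes): observables send activation inward to one central node, which then fills in every unobserved node in a single pass.

```python
FEATURES = ["color", "shape", "texture", "luminance", "interior"]

def predict(observed):
    """observed maps feature name -> activation (+1 blegg-like,
    -1 rube-like).  Returns predictions for the unobserved features."""
    # Wave of activation converges on the central node...
    central = sum(observed.values()) / len(observed)
    # ...then surges back out to the unclamped nodes, in one step.
    return {f: central for f in FEATURES if f not in observed}

# Blue, egg-shaped, and furred -- so expect glowing and vanadium:
print(predict({"color": +1, "shape": +1, "texture": +1}))
```

Only one connection per observable, so N connections in all versus roughly N² for Network 1, and the answer comes back in one inward-outward pass instead of waiting for a recurrent network to settle.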
Admittedly, there are some things you can notice more easily with the first network architecture than the second. Network 1 has a direct connection between every two nodes. So if red objects never glow in the dark, but red furred objects usually have the other blegg characteristics like egg-shape and vanadium, Network 1 can easily represent this: it just takes a very strong direct negative connection from color to luminance, but more powerful positive connections from texture to all other nodes except luminance.
Nor is this a special "exception" to the general rule that bleggs glow; remember, in Network 1, there is no unit that represents blegg-ness; blegg-ness emerges as an attractor in the distributed network.
So yes, those N² connections were buying us something. But not very much. Network 1 is not more useful on most real-world problems, where you rarely find an animal stuck halfway between being a cat and a dog.
(There are also facts that you can't easily represent in Network 1 or Network 2. Let's say sea-blue color and spheroid shape, when found together, always indicate the presence of palladium; but when found individually, without the other, they are each very strong evidence for vanadium. This is hard to represent, in either architecture, without extra nodes. Both Network 1 and Network 2 embody implicit assumptions about what kind of environmental structure is likely to exist; the ability to read this off is what separates the adults from the babes, in machine learning.)
Make no mistake: Neither Network 1 nor Network 2 is biologically realistic. But it still seems like a fair guess that however the brain really works, it is in some sense closer to Network 2 than Network 1. Fast, cheap, scalable, works well to distinguish dogs and cats: natural selection goes for that sort of thing like water running down a fitness landscape.
It seems like an ordinary enough task to classify objects as either bleggs or rubes, tossing them into the appropriate bin. But would you notice if sea-blue objects never glowed in the dark?
Maybe, if someone presented you with twenty objects that were alike only in being sea-blue, and then switched off the light, and none of the objects glowed. If you got hit over the head with it, in other words. Perhaps by presenting you with all these sea-blue objects in a group, your brain forms a new subcategory, and can detect the "doesn't glow" characteristic within that subcategory. But you probably wouldn't notice if the sea-blue objects were scattered among a hundred other bleggs and rubes. It wouldn't be easy or intuitive to notice, the way that distinguishing cats and dogs is easy and intuitive.
Or: "Socrates is human, all humans are mortal, therefore Socrates is mortal." How did Aristotle know that Socrates was human? Well, Socrates had no feathers, and broad nails, and walked upright, and spoke Greek, and, well, was generally shaped like a human and acted like one. So the brain decides, once and for all, that Socrates is human; and from there, infers that Socrates is mortal like all other humans thus yet observed. It doesn't seem easy or intuitive to ask how much wearing clothes, as opposed to using language, is associated with mortality. Just, "things that wear clothes and use language are human" and "humans are mortal".
Are there biases associated with trying to classify things into categories once and for all? Of course there are. See, e.g., Cultish Countercultishness.
To be continued.
7. Page , 'Cultish Countercultishness'.
10. How An Algorithm Feels From Inside
it without applying a second layer of search for a counter-counterargument, thereby arguing himself into the opposite position. This doesn't require that Barry's prior intuition (the intuition Barry would have had, if we'd asked him before Albert spoke) have differed from Albert's.
Well, if Barry didn't have a differing intuition before, he sure has one now.
1. http://lesswrong.com/lw/np/disputing_definitions/
2. Page zz, 'How An Algorithm Feels From Inside'.
3. Page ,o,, 'Knowing About Biases Can Hurt People'.
4. Page ,zj, 'Motivated Stopping and Motivated Continuation'.
Albert: "What do you mean, there's no sound? The tree's roots snap, the trunk comes crashing down and hits the ground. This generates vibrations that travel through the ground and the air. That's where the energy of the fall goes, into heat and sound. Are you saying that if people leave the forest, the tree violates conservation of energy?"
Barry: "But no one hears anything. If there are no humans in the forest, or, for the sake of argument, anything else with a complex nervous system capable of 'hearing', then no one hears a sound."
Albert and Barry recruit arguments that feel like support for their respective positions, describing in more detail the thoughts that caused their "sound"-detectors to fire or stay silent. But so far the conversation has still focused on the forest, rather than definitions. And note that they don't actually disagree on anything that happens in the forest.
Albert: "This is the dumbest argument I've ever been in. You're a niddlewicking fallumphing pickleplumber."
Barry: "Yeah? Well, you look like your face caught on fire and someone put it out with a shovel."
Insult has been proffered and accepted; now neither party can back down without losing face. Technically, this isn't part of the argument, as rationalists account such things; but it's such an important part of the Standard Dispute that I'm including it anyway.
Albert: "The tree produces acoustic vibrations. By definition, that is a sound."
Barry: "No one hears anything. By definition, that is not a sound."
The argument starts shifting to focus on definitions. Whenever you feel tempted to say the words "by definition" in an argument that is not literally about pure mathematics, remember that anything which is true "by definition" is true in all possible worlds, and so observing its truth can never constrain which world you live in.
5. Page zz, 'How An Algorithm Feels From Inside'.
DISPUTING DEFINITIONS 229
Albert: "My computer's microphone can record a sound without anyone being around to hear it, store it as a file, and it's called a 'sound file'. And what's stored in the file is the pattern of vibrations in air, not the pattern of neural firings in anyone's brain. 'Sound' means a pattern of vibrations."
Albert deploys an argument that feels like support for the word "sound" having a particular meaning. This is a different kind of question from whether acoustic vibrations take place in a forest, but the shift usually passes unnoticed.
Barry: "Oh, yeah? Let's just see if the dictionary agrees with you."
There's a lot of things I could be curious about in the falling-tree scenario. I could go into the forest and look at trees, or learn how to derive the wave equation for changes of air pressure, or examine the anatomy of an ear, or study the neuroanatomy of the auditory cortex. Instead of doing any of these things, I am to consult a dictionary, apparently. Why? Are the editors of the dictionary expert botanists, expert physicists, expert neuroscientists? Looking in an encyclopedia might make sense, but why a dictionary?
Albert: "Hah! Definition 2c in Merriam-Webster: 'Sound: Mechanical radiant energy that is transmitted by longitudinal pressure waves in a material medium (as air).'"
Barry: "Hah! Definition 2b in Merriam-Webster: 'Sound: The sensation perceived by the sense of hearing.'"
6. Page j, 'The Parable of Hemlock'.
7. Page 8, 'Fake Explanations'.
Albert and Barry, chorus: "Consarned dictionary! This doesn't help at all!"
Dictionary editors are historians of usage, not legislators of language. Dictionary editors find words in current usage, then write down the words next to (a small part of) what people seem to mean by them. If there's more than one usage, the editors write down more than one definition.
Albert: "Look, suppose that I left a microphone in the forest and recorded the pattern of the acoustic vibrations of the tree falling. If I played that back to someone, they'd call it a 'sound'! That's the common usage! Don't go around making up your own wacky definitions!"
Barry: "One, I can define a word any way I like so long as I use it consistently. Two, the meaning I gave was in the dictionary. Three, who gave you the right to decide what is or isn't common usage?"
There's quite a lot of rationality errors in the Standard Dispute. Some of them I've already covered, and some of them I've yet to cover; likewise the remedies.
But for now, I would just like to point out, in a mournful sort of way, that Albert and Barry seem to agree on virtually every question of what is actually going on inside the forest, and yet it doesn't seem to generate any feeling of agreement.
Arguing about definitions is a garden path; people wouldn't go down the path if they saw at the outset where it led. If you asked Albert (Barry) why he's still arguing, he'd probably say something like: "Barry (Albert) is trying to sneak in his own definition of 'sound', the scurvy scoundrel, to support his ridiculous point; and I'm here to defend the standard definition."
But suppose I went back in time to before the start of the argument:
8. Page 200, 'Extensions and Intensions'.
(Eliezer appears from nowhere in a peculiar conveyance that
looks just like the time machine from the original The Time
Machine movie.)
Barry: "Gosh! A time traveler!"
Eliezer: "I am a traveler from the future! Hear my words! I have traveled far into the past... around fifteen minutes..."
Albert: "Fifteen minutes?"
Eliezer: "...to bring you this message!"
(There is a pause of mixed confusion and expectancy.)
Eliezer: "Do you think that 'sound' should be defined to require both acoustic vibrations (pressure waves in air) and also auditory experiences (someone to listen to the sound), or should 'sound' be defined as meaning only acoustic vibrations, or only auditory experience?"
Barry: "You went back in time to ask us that?"
Eliezer: "My purposes are my own! Answer!"
Albert: "Well... I don't see why it would matter. You can pick any definition so long as you use it consistently."
Barry: "Flip a coin. Er, flip a coin twice."
Eliezer: "Personally I'd say that if the issue arises, both sides should switch to describing the event in unambiguous lower-level constituents, like acoustic vibrations or auditory experiences. Or each side could designate a new word, like 'alberzle' and 'bargulum', to use for what they respectively used to call 'sound'; and then both sides could use the new words consistently. That way neither side has to back down or lose face, but they can still communicate. And of course you should try to keep track, at all times, of some testable proposition that the argument is actually about. Does that sound right to you?"
Albert: "I guess..."
Barry: "Why are we talking about this?"
Eliezer: "To preserve your friendship against a contingency you will, now, never know. For the future has already changed!"
(Eliezer and the machine vanish in a puff of smoke.)
Barry: "Where were we again?"
Albert: "Oh, yeah: If a tree falls in the forest, and no one hears it, does it make a sound?"
Barry: "It makes an alberzle but not a bargulum. What's the next question?"
This remedy doesn't destroy every dispute over categorizations. But it destroys a substantial fraction.
12. Feel the Meaning
Think telepathy is simple? Try building a computer that will be telepathic with you. Telepathy, or "language", or whatever you want to call our partial thought transfer ability, is more complicated than it looks.
But it would be quite inconvenient to go around thinking, "Now I shall partially transduce some features of my thoughts into a linear sequence of phonemes which will invoke similar thoughts in my conversational partner."
So the brain hides the complexity, or rather, never represents it in the first place, which leads people to think some peculiar thoughts about words.
1. http://lesswrong.com/lw/nq/feel_the_meaning/
2. Page 228, 'Disputing Definitions'.
3. Page ;;, '"Science" as Curiosity-Stopper'.
13. The Argument from Common Usage
As I remarked earlier, the Standard Dispute runs as follows:
Albert: "Look, suppose that I left a microphone in the forest and recorded the pattern of the acoustic vibrations of the tree falling. If I played that back to someone, they'd call it a 'sound'! That's the common usage! Don't go around making up your own wacky definitions!"
Barry: "One, I can define a word any way I like so long as I use it consistently. Two, the meaning I gave was in the dictionary. Three, who gave you the right to decide what is or isn't common usage?"
Not all definitional disputes progress as far as recognizing the notion of common usage. More often, I think, someone picks up a dictionary because they believe that words have meanings, and the dictionary faithfully records what this meaning is. Some people even seem to believe that the dictionary determines the meaning, that the dictionary editors are the Legislators of Language. Maybe because back in elementary school, their authority-teacher said that they had to obey the dictionary, that it was a mandatory rule rather than an optional one?
Dictionary editors read what other people write, and record what the words seem to mean; they are historians. The Oxford English Dictionary may be comprehensive, but never authoritative.
But surely there is a social imperative to use words in a commonly understood way? Does not our human telepathy, our valuable power of language, rely on mutual coordination to work? Perhaps we should voluntarily treat dictionary editors as supreme arbiters, even if they prefer to think of themselves as historians, in order to maintain the quiet cooperation on which all speech depends.
1. http://lesswrong.com/lw/nr/the_argument_from_common_usage/
2. Page z, 'Feel the Meaning'.
3. Page 228, 'Disputing Definitions'.
4. Page z, 'Feel the Meaning'.
5. Page 6o, 'Absolute Authority'.
The phrase "authoritative dictionary" is almost never used correctly, an example of proper usage being the Authoritative Dictionary of IEEE Standards. The IEEE is a body of voting members who have a professional need for exact agreement on terms and definitions, and so the Authoritative Dictionary of IEEE Standards is actual, negotiated legislation, which exerts whatever authority one regards as residing in the IEEE.
In everyday life, shared language usually does not arise from a deliberate agreement, as of the IEEE. It's more a matter of infection, as words are invented and diffuse through the culture. (A "meme", one might say, following Richard Dawkins thirty years ago; but you already know what I mean, and if not, you can look it up on Google, and then you too will have been infected.)
Yet as the example of the IEEE shows, agreement on language can also be a cooperatively established public good. If you and I wish to undergo an exchange of thoughts via language, the human telepathy, then it is in our mutual interest that we use the same word for similar concepts, preferably, concepts similar to the limit of resolution in our brain's representation thereof, even though we have no obvious mutual interest in using any particular word for a concept.
We have no obvious mutual interest in using the word "oto" to mean sound, or "sound" to mean oto; but we have a mutual interest in using the same word, whichever word it happens to be. (Preferably, words we use frequently should be short, but let's not get into information theory just yet.)
But, while we have a mutual interest, it is not strictly necessary that you and I use the similar labels internally; it is only convenient. If I know that, to you, "oto" means sound (that is, you associate "oto" to a concept very similar to the one I associate to "sound"), then I can say "Paper crumpling makes a crackling oto." It requires extra thought, but I can do it if I want.
Similarly, if you say "What is the walking-stick of a bowling ball dropping on the floor?" and I know which concept you associate with the syllables "walking-stick", then I can figure out what you mean. It may require some thought, and give me pause, because I ordinarily associate "walking-stick" with a different concept. But I can do it just fine.
THE ARGUMENT FROM COMMON USAGE 239
When humans really want to communicate with each other, we're hard to stop! If we're stuck on a deserted island with no common language, we'll take up sticks and draw pictures in sand.
Albert's appeal to the Argument from Common Usage assumes that agreement on language is a cooperatively established public good. Yet Albert assumes this for the sole purpose of rhetorically accusing Barry of breaking the agreement, and endangering the public good. Now the falling-tree argument has gone all the way from botany to semantics to politics; and so Barry responds by challenging Albert for the authority to define the word.
A rationalist, with the discipline of hugging the query active, would notice that the conversation had gone rather far astray.
Oh, dear reader, is it all really necessary? Albert knows what Barry means by "sound". Barry knows what Albert means by "sound". Both Albert and Barry have access to words, such as "acoustic vibrations" or "auditory experience", which they already associate to the same concepts, and which can describe events in the forest without ambiguity. If they were stuck on a deserted island, trying to communicate with each other, their work would be done.
When both sides know what the other side wants to say, and both sides accuse the other side of defecting from "common usage", then whatever it is they are about, it is clearly not working out a way to communicate with each other. But this is the whole benefit that common usage provides in the first place.
Why would you argue about the meaning of a word, two sides trying to wrest it back and forth? If it's just a namespace conflict that has gotten blown out of proportion, and nothing more is at stake, then the two sides need merely generate two new words and use them consistently.
Yet often categorizations function as hidden inferences and disguised queries. Is atheism a "religion"? If someone is arguing that the reasoning methods used in atheism are on a par with the reasoning methods used in Judaism, or that atheism is on a par with Islam in terms of causally engendering violence, then they have a clear argumentative stake in lumping it all together into an indistinct gray blur of "faith".
6. Page , 'Hug the Query'.
7. Page 228, 'Disputing Definitions'.
8. Page j;, 'Words as Hidden Inferences'.
9. Page z, 'Disguised Queries'.
10. Page jz, 'Uncritical Supercriticality'.
Or consider the fight to blend together blacks and whites as "people". This would not be a time to generate two words; what's at stake is exactly the idea that you shouldn't draw a moral distinction.
But once any empirical proposition is at stake, or any moral proposition, you can no longer appeal to common usage.
If the question is how to cluster together similar things for purposes of inference, empirical predictions will depend on the answer, which means that definitions can be wrong. A conflict of predictions cannot be settled by an opinion poll.
If you want to know whether atheism should be clustered with supernaturalist religions for purposes of some particular empirical inference, the dictionary can't answer you.
If you want to know whether blacks are people, the dictionary can't answer you.
If everyone believes that the red light in the sky is Mars the God of War, the dictionary will define "Mars" as the God of War. If everyone believes that fire is the release of phlogiston, the dictionary will define "fire" as the release of phlogiston.
There is an art to using words; even when definitions are not literally true or false, they are often wiser or more foolish. Dictionaries are mere histories of past usage; if you treat them as supreme arbiters of meaning, it binds you to the wisdom of the past, forbidding you to do better.
Though do take care to ensure (if you must depart from the wisdom of the past) that people can figure out what you're trying to swim.
1. Page ,jj, 'The Fallacy of Gray'.
2. Page zo, 'Similarity Clusters'.
3. Page 200, 'Extensions and Intensions'.
4. Page oj, 'Guardians of the Truth'.
5. http://lesswrong.com/lw/h9/tsuyoku_vs_the_egalitarian_instinct/
14. Empty Labels
How could tabooing a word help you keep your purpose? From Lost Purposes:
As you read this, some young man or woman is sitting at a desk in a university, earnestly studying material they have no intention of ever using, and no interest in knowing for its own sake. They want a high-paying job, and the high-paying job requires a piece of paper, and the piece of paper requires a previous master's degree, and the master's degree requires a bachelor's degree, and the university that grants the bachelor's degree requires you to take a class in 12th-century knitting patterns to graduate. So they diligently study, intending to forget it all the moment the final exam is administered, but still seriously working away, because they want that piece of paper.
8. Page z,, 'Taboo Your Words'.
9. Page o,, 'Making Beliefs Pay Rent (in Anticipated Experiences)'.
10. Page o8, 'Belief in Belief'.
11. Page , 'Hug the Query'.
12. Page ,, 'Cached Thoughts'.
13. http://lesswrong.com/lw/le/lost_purposes/
14. http://lesswrong.com/lw/le/lost_purposes/
REPLACE THE SYMBOL WITH THE SUBSTANCE 249
"Why are you going to school?" "To get an education" ending in a "degree". Blank out the forbidden words and all their obvious synonyms, visualize the actual details, and you're much more likely to notice that "school" currently seems to consist of sitting next to bored teenagers listening to material you already know, that a "degree" is a piece of paper with some writing on it, and that an "education" is forgetting the material as soon as you're tested on it.
Leaky generalizations often manifest through categorizations: People who actually learn in classrooms are categorized as "getting an education", so "getting an education" must be good; but then anyone who actually shows up at a college will also match against the concept "getting an education", whether or not they learn.
Students who understand math will do well on tests, but if you require schools to produce good test scores, they'll spend all their time teaching to the test. A mental category that imperfectly matches your goal can produce the same kind of incentive failure internally. You want to learn, so you need an "education"; and then, as long as you're getting anything that matches against the category "education", you may not notice whether you're learning or not. Or you'll notice, but you won't realize you've lost sight of your original purpose, because you're getting an "education" and that's how you mentally described your goal.
To categorize is to throw away information. If you're told that a falling tree makes a "sound", you don't know what the actual sound is; you haven't actually heard the tree falling. If a coin lands "heads", you don't know its radial orientation. A blue egg-shaped thing may be a "blegg", but what if the exact egg shape varies, or the exact shade of blue? You want to use categories to throw away irrelevant information, to sift gold from dust, but often the standard categorization ends up throwing out relevant information too. And when you end up in that sort of mental trouble, the first and most obvious solution is to play Taboo.
For example: "Play Taboo" is itself a leaky generalization. Hasbro's version is not the rationalist version; they only list five additional banned words on the card, and that's not nearly enough coverage to exclude thinking in familiar old words. What rationalists do would count as playing Taboo (it would match against the "play Taboo" concept) but not everything that counts as playing Taboo works to force original seeing. If you just think "play Taboo to force original seeing", you'll start thinking that anything that counts as playing Taboo must count as original seeing.

5. http://lesswrong.com/lw/lc/leaky_generalizations/
The rationalist version isn't a game, which means that you can't win by trying to be clever and stretching the rules. You have to play Taboo with a voluntary handicap: Stop yourself from using synonyms that aren't on the card. You also have to stop yourself from inventing a new simple word or phrase that functions as an equivalent mental handle to the old one. You are trying to zoom in on your map, not rename the cities; dereference the pointer, not allocate a new pointer; see the events as they happen, not rewrite the cliche in a different wording.

By visualizing the problem in more detail, you can see the lost purpose: Exactly what do you do when you "play Taboo"? What purpose does each and every part serve?

If you see your activities and situation originally, you will be able to originally see your goals as well. If you can look with fresh eyes, as though for the first time, you will see yourself doing things that you would never dream of doing if they were not habits.

Purpose is lost whenever the substance (learning, knowledge, health) is displaced by the symbol (a degree, a test score, medical care). To heal a lost purpose, or a lossy categorization, you must do the reverse:

Replace the symbol with the substance; replace the signifier with the signified; replace the property with the membership test; replace the word with the meaning; replace the label with the concept; replace the summary with the details; replace the proxy question with the real question; dereference the pointer; drop into a lower level of organization; mentally simulate the process instead of naming it; zoom in on your map.
"The Simple Truth" was generated by an exercise of this discipline to describe "truth" on a lower level of organization, without invoking terms like "accurate", "correct", "represent", "reflect", "semantic", "believe", "knowledge", "map", or "real". (And remember that the goal is not really to play Taboo; the word "true" appears in the text, but not to define truth. It would get a buzzer in Hasbro's game, but we're not actually playing that game. Ask yourself whether the document fulfilled its purpose, not whether it followed the rules.)

6. http://yudkowsky.net/bayes/truth.html
Bayes's Rule itself describes "evidence" in pure math, without using words like "implies", "means", "supports", "proves", or "justifies". Set out to define such philosophical terms, and you'll just go in circles.

And then there's the most important word of all to Taboo. I've often warned that you should be careful not to overuse it, or even avoid the concept in certain cases. Now you know the real reason why. It's not a bad subject to think about. But your true understanding is measured by your ability to describe what you're doing and why, without using that word or any of its synonyms.

7. http://yudkowsky.net/virtues/
8. 'Two Cult Koans'.
9. http://lesswrong.com/lw/nc/newcombs_problem_and_regret_of_rationality/
17. Fallacies of Compression

…of Taboo, and to the wise use of words, because words often represent the points on our map, the labels under which we file our propositions and the buckets into which we drop our information. Avoiding a single word, or allocating new ones, is often part of the skill of expanding the map.

4. 'Replace the Symbol with the Substance'.
5. 'Taboo Your Words'.
18. Categorizing Has Consequences

It would seem that blood type plays the role in Japan that astrological signs play in the West, right down to blood type horoscopes in the daily newspaper.

This fad is especially odd because blood types have never been mysterious, not in Japan and not anywhere. We only know blood types even exist thanks to Karl Landsteiner. No mystic witch doctor, no venerable sorcerer, ever said a word about blood types; there are no ancient, dusty scrolls to shroud the error in the aura of antiquity.

…is not a neutral act. Maybe a more cleanly designed, more purely Bayesian AI could ponder an arbitrary class and not be influenced by it. But you, a human, do not have that option. Categories are not static things in the context of a human brain; as soon as you actually think of them, they exert force on your mind. One more reason not to believe you can define a word any way you like.

1. 'The Cluster Structure of Thingspace'.
19. Sneaking in Connotations

…the observed characteristics are Socrates's clothes, speech, tool use, and generally human shape; the categorization is "human"; the inferred characteristic is poisonability by hemlock.

1. http://lesswrong.com/lw/ny/sneaking_in_connotations/
2. 'Categorizing Has Consequences'.
3. 'The Parable of Hemlock'.
Of course there's no hard distinction between "observed characteristics" and "inferred characteristics". If you hear someone speak, they're probably shaped like a human, all else being equal. If you see a human figure in the shadows, then ceteris paribus it can probably speak.

And yet some properties do tend to be more inferred than observed. You're more likely to decide that someone is human, and will therefore burn if exposed to open flame, than to carry through the inference the other way around.

If you look in a dictionary for the definition of "human", you're more likely to find characteristics like "intelligence" and "featherless biped" (characteristics that are useful for quickly eyeballing what is and isn't a human) rather than the ten thousand connotations, from vulnerability to hemlock, to overconfidence, that we can infer from someone's being human. Why? Perhaps dictionaries are intended to let you match up labels to similarity groups, and so are designed to quickly isolate clusters in thingspace. Or perhaps the big, distinguishing characteristics are the most salient, and therefore first to pop into a dictionary editor's mind. (I'm not sure how aware dictionary editors are of what they really do.)

But the upshot is that when Danny pulls out his OED to look up "wiggin", he sees listed only the first-glance characteristics that distinguish a wiggin: Green eyes and black hair. The OED doesn't list the many minor connotations that have come to attach to this term, such as criminal proclivities, culinary peculiarities, and some unfortunate childhood activities.
How did those connotations get there in the first place? Maybe there was once a famous wiggin with those properties. Or maybe someone made stuff up at random, and wrote a series of bestselling books about it (The Wiggin, Talking to Wiggins, Raising Your Little Wiggin, Wiggins in the Bedroom). Maybe even the wiggins believe it now, and act accordingly. As soon as you call some people "wiggins", the word will begin acquiring connotations.

But remember the Parable of Hemlock: If we go by the logical class definitions, we can never class Socrates as a "human" until after we observe him to be mortal. Whenever someone pulls a dictionary, they're generally trying to sneak in a connotation, not the actual definition written down in the dictionary.

4. 'The Parable of Hemlock'.
After all, if the only meaning of the word "wiggin" is "green-eyed black-haired person", then why not just call those people "green-eyed black-haired people"? And if you're wondering whether someone is a ketchup-reacher, why not ask directly, "Is he a ketchup-reacher?" rather than "Is he a wiggin?" (Note substitution of substance for symbol.)
Oh, but arguing the real question would require work. You'd have to actually watch the wiggin to see if he reached for the ketchup. Or maybe see if you can find statistics on how many green-eyed black-haired people actually like ketchup. At any rate, you wouldn't be able to do it sitting in your living room with your eyes closed. And people are lazy. They'd rather "argue by definition", especially since they think "you can define a word any way you like".

But of course the real reason they care whether someone is a "wiggin" is a connotation (a feeling that comes along with the word) that isn't in the definition they claim to use.

Imagine Danny saying, "Look, he's got green eyes and black hair. He's a wiggin! It says so right there in the dictionary! Therefore, he's got black hair. Argue with that, if you can!"

Doesn't have much of a triumphant ring to it, does it? If the real point of the argument actually was contained in the dictionary definition (if the argument genuinely was logically valid) then the argument would feel empty; it would either say nothing new, or beg the question.

It's only the attempt to smuggle in connotations not explicitly listed in the definition that makes anyone feel they can score a point that way.

5. 'Words as Hidden Inferences'.
6. 'Hug the Query'.
7. 'Replace the Symbol with the Substance'.
20. Arguing By Definition

…point:

21. Where to Draw the Boundary?

Long have I pondered the meaning of the word "Art", and at last I've found what seems to me a satisfactory definition: "Art is that which is designed for the purpose of creating a reaction in an audience."

Just because there's a word "art" doesn't mean that it has a meaning, floating out there in the void, which you can discover by finding the right definition.
It feels that way, but it is not so.

22. Entropy, and Short Codes

Suppose you have a system X that's equally likely to be in any of 8 possible states: {X1, X2, X3, X4, X5, X6, X7, X8}.
There's an extraordinarily ubiquitous quantity (in physics, mathematics, and even biology) called entropy, and the entropy of X is 3 bits. This means that, on average, we'll have to ask 3 yes-or-no questions to find out X's value. For example, someone could tell us X's value using this code:

X1: 001    X2: 010    X3: 011    X4: 100
X5: 101    X6: 110    X7: 111    X8: 000

So if I asked "Is the first symbol 1?" and heard "yes", then asked "Is the second symbol 1?" and heard "no", then asked "Is the third symbol 1?" and heard "no", I would know that X was in state 4.
Now suppose that the system Y has four possible states with the following probabilities:

Y1: 1/2 (50%)
Y2: 1/4 (25%)
Y3: 1/8 (12.5%)
Y4: 1/8 (12.5%)

Then the entropy of Y would be 1.75 bits, meaning that we can find out its value by asking 1.75 yes-or-no questions.

What does it mean to talk about asking one and three-fourths of a question? Imagine that we designate the states of Y using the following code:

Y1: 1    Y2: 01    Y3: 001    Y4: 000

1. http://lesswrong.com/lw/o1/entropy_and_short_codes/
2. 'Where to Draw the Boundary?'.
First you ask, "Is the first symbol 1?" If the answer is "yes", you're done: Y is in state 1. This happens half the time, so 50% of the time, it takes 1 yes-or-no question to find out Y's state.

Suppose that instead the answer is "no". Then you ask, "Is the second symbol 1?" If the answer is "yes", you're done: Y is in state 2. Y is in state 2 with probability 1/4, and each time Y is in state 2 we discover this fact using two yes-or-no questions, so 25% of the time it takes 2 questions to discover Y's state.

If the answer is "no" twice in a row, you ask "Is the third symbol 1?" If "yes", you're done and Y is in state 3; if "no", you're done and Y is in state 4. The 1/8 of the time that Y is in state 3, it takes three questions; and the 1/8 of the time that Y is in state 4, it takes three questions.

(1/2 * 1) + (1/4 * 2) + (1/8 * 3) + (1/8 * 3)
= 0.5 + 0.5 + 0.375 + 0.375
= 1.75.
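The arithmetic above can be checked mechanically. As a minimal sketch (the state names and probabilities are just the ones from this example), the entropy formula and the expected number of questions under the given code can be computed side by side:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the sum over states of -p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Y's distribution and the code described in the text.
y_probs = {"Y1": 1/2, "Y2": 1/4, "Y3": 1/8, "Y4": 1/8}
y_code = {"Y1": "1", "Y2": "01", "Y3": "001", "Y4": "000"}

h = entropy(y_probs.values())
expected_questions = sum(y_probs[s] * len(y_code[s]) for s in y_probs)

print(h, expected_questions)  # both come out to 1.75
```

Both numbers come out to 1.75, which is what it means for this code to be optimal: the average number of yes-or-no questions equals the entropy.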
The general formula for the entropy of a system S is the sum, over all Si, of -p(Si) * log2(p(Si)).

For example, the log (base 2) of 1/8 is -3, so -(1/8 * -3) = 0.375 is the contribution of state S4 to the total entropy of 1.75.

You might ask: "Why not use the code 10 for Y4, instead of 000? Wouldn't that let us transmit messages more quickly?"

But if you use the code 10 for Y4, then the message 10 becomes ambiguous: it could mean "Y4", or it could mean "Y1" (1) followed by the start of "Y2", "Y3", or "Y4" (0).

The moral is that short words are a conserved resource.
The key to creating a good code (a code that transmits messages as compactly as possible) is to reserve short words for things that you'll need to say frequently, and use longer words for things that you won't need to say as often.
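This reserve-short-words-for-frequent-things principle is exactly what Huffman coding automates. As an illustrative sketch (the algorithm is standard; it is not something from the original text), building the Huffman code for Y's distribution recovers codeword lengths of 1, 2, 3, and 3 bits:

```python
import heapq

def huffman_code(probs):
    """Build a prefix-free binary code; frequent symbols get short codewords."""
    # Heap entries: (total probability, tiebreak counter, {symbol: codeword so far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # pop the two least probable subtrees...
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))  # ...merge and push back
        count += 1
    return heap[0][2]

code = huffman_code({"Y1": 1/2, "Y2": 1/4, "Y3": 1/8, "Y4": 1/8})
lengths = {s: len(w) for s, w in code.items()}
print(lengths)  # Y1 gets a 1-bit codeword; Y3 and Y4 get 3 bits each
```

The frequent state Y1 gets the shortest codeword, and the rare states get the longest, matching the code given in the text.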
When you take this art to its limit, the length of the message you need to describe something corresponds exactly or almost exactly to its probability. This is the Minimum Description Length or Minimum Message Length formalization of Occam's Razor.

And so even the labels that we use for words are not quite arbitrary. The sounds that we attach to our concepts can be better or worse, wiser or more foolish. Even apart from considerations of common usage!
I say all this, because the idea that "you can X any way you like" is a huge obstacle to learning how to X wisely. "It's a free country; I have a right to my own opinion" obstructs the art of finding truth. "I can define a word any way I like" obstructs the art of carving reality at its joints. And even the sensible-sounding "The labels we attach to words are arbitrary" obstructs awareness of compactness. Prosody too, for that matter: Tolkien once observed what a beautiful sound the phrase "cellar door" makes; that is the kind of awareness it takes to use language like Tolkien.
The length of words also plays a nontrivial role in the cognitive science of language:

Consider the phrases "recliner", "chair", and "furniture". Recliner is a more specific category than chair; furniture is a more general category than chair. But the vast majority of chairs have a common use: you use the same sort of motor actions to sit down in them, and you sit down in them for the same sort of purpose (to take your weight off your feet while you eat, or read, or type, or rest). Recliners do not depart from this theme. "Furniture", on the other hand, includes things like beds and tables which have different uses, and call up different motor functions, from chairs.

In the terminology of cognitive psychology, "chair" is a basic-level category.

People have a tendency to talk, and presumably think, at the basic level of categorization: to draw the boundary around "chairs", rather than around the more specific category "recliner", or the more general category "furniture". People are more likely to say "You can sit in that chair" than "You can sit in that recliner" or "You can sit in that furniture".

3. 'Occam's Razor'.
4. 'The Argument from Common Usage'.
5. http://www.overcomingbias.com/2006/12/you_are_never_e.html
6. 'Where to Draw the Boundary?'.
And it is no coincidence that the word for "chair" contains fewer syllables than either "recliner" or "furniture". Basic-level categories, in general, tend to have short names, and nouns with short names tend to refer to basic-level categories. Not a perfect rule, of course, but a definite tendency. Frequent use goes along with short words; short words go along with frequent use.

Or as Douglas Hofstadter put it, there's a reason why the English language uses "the" to mean "the" and "antidisestablishmentarianism" to mean "antidisestablishmentarianism" instead of antidisestablishmentarianism the other way around.
23. Mutual Information, and Density in Thingspace

…since 2^3 = 8 and 2^2 = 4, 3 questions can distinguish 8 possibilities and 2 questions can distinguish 4 possibilities; but remember that if the possibilities were not all equally likely, we could use a more clever code to discover Y's state using e.g. 1.75 questions on average. In this case, though, X's probability mass is evenly distributed over all its possible states (and likewise Y), so we can't use any clever codes.
What is the entropy of the combined system (X,Y)?

You might be tempted to answer, "It takes 3 questions to find out X, and then 2 questions to find out Y, so it takes 5 questions total to find out the state of X and Y."

But what if the two variables are entangled, so that learning the state of Y tells us something about the state of X?

In particular, let's suppose that X and Y are either both odd, or both even.

Now if we receive a 3-bit message (ask 3 questions) and learn that X is in state 5, we know that Y is in state 1 or state 3, but not state 2 or state 4. So the single additional question "Is Y in state 3?", answered "no", tells us the entire state of (X,Y): X=X5, Y=Y1. And we learned this with a total of 4 questions.

Conversely, if we learn that Y is in state 4 using two questions, it will take us only an additional two questions to learn whether X is in state 2, 4, 6, or 8. Again, four questions to learn the state of the joint system.

1. http://lesswrong.com/lw/o2/mutual_information_and_density_in_thingspace/
2. 'Entropy, and Short Codes'.
The mutual information of two variables is defined as the difference between the entropy of the joint system and the entropy of the independent systems: I(X,Y) = H(X) + H(Y) - H(X,Y).

Here there is one bit of mutual information between the two systems: Learning X tells us one bit of information about Y (cuts down the space of possibilities from 4 to 2, a factor-of-2 decrease in the volume) and learning Y tells us one bit of information about X (cuts down the possibility space from 8 to 4).
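Treating the parity-entangled example as a concrete distribution, that one bit of mutual information can be computed directly from the definition. A sketch (the uniform 16-state joint distribution is just the example from the text):

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# X uniform over 1..8 and Y uniform over 1..4, constrained to share parity:
# 16 equally likely joint states.
joint = {(x, y): 1 / 16 for x, y in product(range(1, 9), range(1, 5))
         if x % 2 == y % 2}

h_joint = entropy(joint.values())
h_x = entropy([sum(p for (x, _), p in joint.items() if x == xi)
               for xi in range(1, 9)])
h_y = entropy([sum(p for (_, y), p in joint.items() if y == yi)
               for yi in range(1, 5)])

mutual_info = h_x + h_y - h_joint
print(h_x, h_y, h_joint, mutual_info)  # 3.0 2.0 4.0 1.0
```

H(X) = 3 bits and H(Y) = 2 bits as before, but the joint entropy is only 4 bits, leaving exactly 1 bit of mutual information.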
What about when probability mass is not evenly distributed? Yesterday, for example, we discussed the case in which Y had the probabilities 1/2, 1/4, 1/8, 1/8 for its four states. Let us take this to be our probability distribution over Y, considered independently; if we saw Y, without seeing anything else, this is what we'd expect to see. And suppose the variable Z has two states, Z1 and Z2, with probabilities 3/8 and 5/8 respectively.

Then if and only if the joint distribution of Y and Z is as follows, there is zero mutual information between Y and Z:

Z1Y1: 3/16    Z1Y2: 3/32    Z1Y3: 3/64    Z1Y4: 3/64
Z2Y1: 5/16    Z2Y2: 5/32    Z2Y3: 5/64    Z2Y4: 5/64
This distribution obeys the law

P(Y,Z) = P(Y)P(Z).

For example, P(Z1Y2) = P(Z1)P(Y2) = 3/8 * 1/4 = 3/32.

And observe that we can recover the marginal (independent) probabilities of Y and Z just by looking at the joint distribution:

P(Y1) = total probability of all the different ways Y1 can happen
= P(Z1Y1) + P(Z2Y1)
= 3/16 + 5/16
= 1/2.
So, just by inspecting the joint distribution, we can determine whether the marginal variables Y and Z are independent; that is, whether the joint distribution factors into the product of the marginal distributions; whether, for all Y and Z, P(Y,Z) = P(Y)P(Z).

This last is significant because, by Bayes's Rule:

P(Yi,Zj) = P(Yi)P(Zj)
P(Yi,Zj)/P(Zj) = P(Yi)
P(Yi|Zj) = P(Yi)

In English, "After you learn Zj, your belief about Yi is just what it was before."

3. 'An Intuitive Explanation of Bayes' Theorem'.
So when the distribution factorizes, when P(Y,Z) = P(Y)P(Z), this is equivalent to "Learning about Y never tells us anything about Z, or vice versa."

From which you might suspect, correctly, that there is no mutual information between Y and Z. Where there is no mutual information, there is no Bayesian evidence, and vice versa.
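As a sketch of the inspection procedure just described (using exact fractions so the check is not blurred by floating point), the table above can be tested entry by entry for factorization:

```python
from fractions import Fraction as F

# The zero-mutual-information joint distribution from the text.
joint = {("Z1", "Y1"): F(3, 16), ("Z1", "Y2"): F(3, 32),
         ("Z1", "Y3"): F(3, 64), ("Z1", "Y4"): F(3, 64),
         ("Z2", "Y1"): F(5, 16), ("Z2", "Y2"): F(5, 32),
         ("Z2", "Y3"): F(5, 64), ("Z2", "Y4"): F(5, 64)}

# Recover the marginals by summing rows and columns of the joint table.
p_z = {z: sum(p for (zi, _), p in joint.items() if zi == z)
       for z in ("Z1", "Z2")}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y)
       for y in ("Y1", "Y2", "Y3", "Y4")}

# Independence: every joint entry equals the product of its marginals.
independent = all(joint[z, y] == p_z[z] * p_y[y] for (z, y) in joint)
print(p_z["Z1"], p_y["Y1"], independent)  # 3/8 1/2 True
```

The recovered marginals are exactly the 3/8, 5/8 and 1/2, 1/4, 1/8, 1/8 distributions we started with, and every entry factorizes, so there is zero mutual information.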
Suppose that in the distribution YZ above, we treated each possible combination of Y and Z as a separate event (so that the distribution YZ would have a total of 8 possibilities, with the probabilities shown) and then we calculated the entropy of the distribution YZ the same way we would calculate the entropy of any distribution:

-[3/16 log2(3/16) + 3/32 log2(3/32) + 3/64 log2(3/64) + ... + 5/64 log2(5/64)]

You would end up with the same total you would get if you separately calculated the entropy of Y plus the entropy of Z. There is no mutual information between the two variables, so our uncertainty about the joint system is not any less than our uncertainty about the two systems considered separately. (I am not showing the calculations, but you are welcome to do them; and I am not showing the proof that this is true in general, but you are welcome to Google on "Shannon entropy" and "mutual information".)
What if the joint distribution doesn't factorize? For example:

Z1Y1: 12/64    Z1Y2: 8/64    Z1Y3: 1/64    Z1Y4: 3/64
Z2Y1: 20/64    Z2Y2: 8/64    Z2Y3: 7/64    Z2Y4: 5/64
If you add up the joint probabilities to get marginal probabilities, you should find that P(Y1) = 1/2, P(Z1) = 3/8, and so on: the same marginals as before. But P(Z1Y2) equals 8/64, where P(Z1)P(Y2) would equal 3/8 * 1/4 = 6/64. That is, the probability of running into Z1Y2 together is greater than you'd expect based on the probabilities of running into Z1 or Y2 separately.
Which in turn implies:

P(Z1Y2) > P(Z1)P(Y2)
P(Z1Y2)/P(Y2) > P(Z1)
P(Z1|Y2) > P(Z1)
Since there's an "unusually high" probability for P(Z1Y2), defined as a probability higher than the marginal probabilities would indicate by default, it follows that observing Y2 is evidence which increases the probability of Z1. And by a symmetrical argument, observing Z1 must favor Y2.

As there are at least some values of Y that tell us about Z (and vice versa) there must be mutual information between the two variables; and so you will find (I am confident, though I haven't actually checked) that calculating the entropy of YZ yields less total uncertainty than the sum of the independent entropies of Y and Z. H(Y,Z) = H(Y) + H(Z) - I(Y,Z), with all quantities necessarily nonnegative.
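That prediction is easy to check numerically. A sketch, using the non-factorizing table above: the joint entropy comes out strictly less than the sum of the marginal entropies, so I(Y,Z) is positive (a small amount, on the order of 0.04 bits):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# The non-factorizing joint distribution from the text, in 64ths.
counts = {("Z1", "Y1"): 12, ("Z1", "Y2"): 8, ("Z1", "Y3"): 1, ("Z1", "Y4"): 3,
          ("Z2", "Y1"): 20, ("Z2", "Y2"): 8, ("Z2", "Y3"): 7, ("Z2", "Y4"): 5}
joint = {k: v / 64 for k, v in counts.items()}

h_joint = entropy(joint.values())
h_z = entropy([sum(p for (z, _), p in joint.items() if z == zi)
               for zi in ("Z1", "Z2")])
h_y = entropy([sum(p for (_, y), p in joint.items() if y == yi)
               for yi in ("Y1", "Y2", "Y3", "Y4")])

mutual_info = h_z + h_y - h_joint
print(mutual_info > 0)  # True: Y and Z carry evidence about each other
```

Note that the marginals here are the same as in the factorizing table (so H(Y) is still 1.75 bits); only the coupling between the variables has changed.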
(I digress here to remark that the symmetry of the expression for the mutual information shows that Y must tell us as much about Z, on average, as Z tells us about Y. I leave it as an exercise to the reader to reconcile this with anything they were taught in logic class about how, if all ravens are black, being allowed to reason Raven(x) -> Black(x) doesn't mean you're allowed to reason Black(x) -> Raven(x). How different seem the symmetrical probability flows of the Bayesian, from the sharp lurches of logic; even though the latter is just a degenerate case of the former.)
But," you ask, what has all this to do with the proper use oI
words:"
In Lmpty Labels
Y
z
will be just as long as the code Ior Z
plus the
code Ior Y
z
, unless P(Z
Y
z
) > P(Z
)P(Y
z
), in which case the code Ior
the word can be shorter than the codes Ior its parts.
And this in turn corresponds exactly to the case where we can infer some of the properties of the thing from seeing its other properties. It must be more likely than the default that featherless bipedal things will also be mortal.

Of course the word "human" really describes many, many more properties: when you see a human-shaped entity that talks and wears clothes, you can infer whole hosts of biochemical and anatomical and cognitive facts about it. To replace the word "human" with a description of everything we know about humans would require us to spend an inordinate amount of time talking.

4. 'Empty Labels'.
5. 'Replace the Symbol with the Substance'.
But this is true only because a featherless talking biped is far more likely than default to be poisonable by hemlock, or have broad nails, or be overconfident.

Having a word for a thing, rather than just listing its properties, is a more compact code precisely in those cases where we can infer some of those properties from the other properties. (With the exception perhaps of very primitive words, like "red", that we would use to send an entirely uncompressed description of our sensory experiences. But by the time you encounter a bug, or even a rock, you're dealing with nonsimple property collections, far above the primitive level.)

So having a word "wiggin" for green-eyed black-haired people is more useful than just saying "green-eyed black-haired person" precisely when:

1. Green-eyed people are more likely than average to be black-haired (and vice versa), meaning that we can probabilistically infer green eyes from black hair or vice versa; or
2. Wiggins share other properties that can be inferred at greater-than-default probability. In this case we have to separately observe the green eyes and black hair; but then, after observing both these properties independently, we can probabilistically infer other properties (like a taste for ketchup).
One may even consider the act of defining a word as a promise to this effect. Telling someone, "I define the word 'wiggin' to mean a person with green eyes and black hair", by Gricean implication, asserts that the word "wiggin" will somehow help you make inferences / shorten your messages.

If green eyes and black hair have no greater than default probability to be found together, nor does any other property occur at greater than default probability along with them, then the word "wiggin" is a lie: The word claims that certain people are worth distinguishing as a group, but they're not.

In this case the word "wiggin" does not help describe reality more compactly; it is not defined by someone sending the shortest message; it has no role in the simplest explanation. Equivalently, the word "wiggin" will be of no help to you in doing any Bayesian inference. Even if you do not call the word a lie, it is surely an error.

6. 'Sneaking in Connotations'.
And the way to carve reality at its joints is to draw your boundaries around concentrations of unusually high probability density in Thingspace.

7. 'The Cluster Structure of Thingspace'.
25. Conditional Independence, and Naive Bayes

Previously I spoke of mutual information between X and Y, I(X,Y), which is the difference between the entropy of the joint probability distribution, H(X,Y), and the entropies of the marginal distributions, H(X) + H(Y).
I gave the example of a variable X, having eight states, 1..8, which are all equally probable if we have not yet encountered any evidence; and a variable Y, with states 1..4, which are all equally probable if we have not yet encountered any evidence. Then if we calculate the marginal entropies H(X) and H(Y), we will find that X has 3 bits of entropy, and Y has 2 bits.

However, we also know that X and Y are both even or both odd; and this is all we know about the relation between them. So for the joint distribution (X,Y) there are only 16 possible states, all equally probable, for a joint entropy of 4 bits. This is a 1-bit entropy defect, compared to 5 bits of entropy if X and Y were independent. This entropy defect is the mutual information: the information that X tells us about Y, or vice versa, so that we are not as uncertain about one after having learned the other.
Suppose, however, that there exists a third variable Z. Z has two states, "even" and "odd", perfectly correlated to the evenness or oddness of (X,Y). In fact, we'll suppose that Z is just the question "Are X and Y even or odd?"

If we have no evidence about X and Y, then Z itself necessarily has 1 bit of entropy on the information given. There is 1 bit of mutual information between Z and X, and 1 bit of mutual information between Z and Y. And, as previously noted, 1 bit of mutual information between X and Y. So how much entropy for the whole system (X,Y,Z)? You might naively expect that

H(X,Y,Z) = H(X) + H(Y) + H(Z) - I(X,Z) - I(Z,Y) - I(X,Y)

but this turns out not to be the case.

1. http://lesswrong.com/lw/o8/conditional_independence_and_naive_bayes/
2. http://lesswrong.com/lw/o7/searching_for_bayesstructure/
3. 'Mutual Information, and Density in Thingspace'.
4. 'Entropy, and Short Codes'.
The joint system (X,Y,Z) only has 16 possible states (since Z is just the question "Are X & Y even or odd?") so H(X,Y,Z) = 4 bits. But if you calculate the formula just given, you get

(3 + 2 + 1 - 1 - 1 - 1) bits = 3 bits. WRONG!

Why? Because if you have the mutual information between X and Z, and the mutual information between Z and Y, that may include some of the same mutual information that we'll calculate exists between X and Y. In this case, for example, knowing that X is even tells us that Z is even, and knowing that Z is even tells us that Y is even; but this is the same information that X would tell us about Y. We double-counted some of our knowledge, and so came up with too little entropy.
The correct formula is (I believe):

H(X,Y,Z) = H(X) + H(Y) + H(Z) - I(X,Z) - I(Z,Y) - I(X,Y | Z)

Here the last term, I(X,Y | Z), means "the information that X tells us about Y, given that we already know Z". In this case, X doesn't tell us anything about Y, given that we already know Z, so the term comes out as zero and the equation gives the correct answer. There, isn't that nice?

"No," you correctly reply, "for you have not told me how to calculate I(X,Y|Z), only given me a verbal argument that it ought to be zero."

We calculate I(X,Y|Z) just the way you would expect. I(X,Y) = H(X) + H(Y) - H(X,Y), so:

I(X,Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)

5. http://www.youtube.com/watch?v=tRVUOGUmxJI
6. 'Fake Causality'.
7. http://yudkowsky.net/bayes/technical.html
And now, I suppose, you want to know how to calculate the conditional entropy? Well, the original formula for the entropy is

H(S) = Sum i: -p(Si) * log2(p(Si))

If we then learned a new fact Z0, our remaining uncertainty about S would be

H(S|Z0) = Sum i: -p(Si|Z0) * log2(p(Si|Z0))
So if we're going to learn a new fact Z, but we don't know which Z yet, then, on average, we expect to be around this uncertain of S afterward:

H(S|Z) = Sum j: (p(Zj) * Sum i: -p(Si|Zj) * log2(p(Si|Zj)))
And that's how one calculates conditional entropies, from which, in turn, we can get the conditional mutual information. There are all sorts of ancillary theorems here, like

H(X|Y) = H(X,Y) - H(Y)

and

if I(X,Z) = 0 and I(Y,X|Z) = 0 then I(X,Y) = 0,

but I'm not going to go into those.
But," you ask, what does this have to do with the nature oI
words and their hidden Bayesian structure:"
I am just so unspeakably glad that you asked that question, be
cause I was planning to tell you whether you liked it or not. But
Iirst there are a couple more preliminaries.
You will rememberyes, you will rememberthat there is a
duality between mutual inIormation and Bayesian evidence. Mutu
al inIormation is positive iI and only iI the probability oI at least
some joint events P(x, y) does not equal the product oI the prob
abilities oI the separate events P(x)*P(y). This, in turn, is exactly
equivalent to the condition that Bayesian evidence exists between x
and y:
I(X,Y) > 0 =>
P(x,y) != P(x)*P(y)
P(x,y) / P(y) != P(x)
P(x|y) != P(x)

If you're conditioning on Z, you just adjust the whole derivation accordingly:

I(X,Y | Z) > 0 =>
P(x,y|z) != P(x|z)*P(y|z)
P(x,y|z) / P(y|z) != P(x|z)
(P(x,y,z) / P(z)) / (P(y,z) / P(z)) != P(x|z)
P(x,y,z) / P(y,z) != P(x|z)
P(x|y,z) != P(x|z)
\hich last line reads Lven knowing Z, learning Y still changes
our belieIs about X."
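The unconditional version of this equivalence is easy to check numerically. Below is a hypothetical Python sketch (the `mutual_information` helper is my own, not from the text): a joint distribution where x is evidence about y has positive mutual information, and a factorizable one has exactly zero:

```python
import math

def mutual_information(joint):
    """I(X,Y) = Sum over x,y of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Dependent pair: X and Y are always equal, so each is evidence about the other.
dependent = {(0, 0): 0.5, (1, 1): 0.5}
# Independent pair: p(x,y) = p(x) * p(y) everywhere, so no evidence flows.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

print(mutual_information(dependent))    # 1.0
print(mutual_information(independent))  # 0.0
```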
Conversely, as in our original case of Z being "even" or "odd", Z screens off⁸ X from Y; that is, if we know that Z is "even", learning that Y is in state 4 tells us nothing more about whether X is 2, 4, 6, or 8. Or if we know that Z is "odd", then learning that X is 5 tells us nothing more about whether Y is 1 or 3. Learning Z has rendered X and Y conditionally independent.
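The "even"/"odd" example can be made concrete. In this illustrative sketch (the table construction and the `cond_px` helper are my own, not from the text), learning Y shifts beliefs about X, but once Z is known, Y adds nothing:

```python
from itertools import product

# Z is a shared parity; X and Y are drawn independently given Z.
evens, odds = (2, 4, 6, 8), (1, 3, 5, 7)
joint = {}  # table of p(x, y, z)
for z, vals in (("even", evens), ("odd", odds)):
    for x, y in product(vals, vals):
        joint[(x, y, z)] = 0.5 * (1 / len(vals)) * (1 / len(vals))

def cond_px(joint, y=None, z=None):
    """p(x | observed y and/or z), by filtering and renormalizing the table."""
    sel = {k: p for k, p in joint.items()
           if (y is None or k[1] == y) and (z is None or k[2] == z)}
    total = sum(sel.values())
    px = {}
    for (x, _, _), p in sel.items():
        px[x] = px.get(x, 0.0) + p / total
    return px

# Knowing Z = "even", learning Y adds nothing about X: Z screens off Y.
assert cond_px(joint, z="even") == cond_px(joint, y=4, z="even")
# But without Z, learning Y is still evidence about X.
assert cond_px(joint) != cond_px(joint, y=4)
```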
Conditional independence is a hugely important concept in probability theory; to cite just one example, without conditional independence, the universe would have no structure.

Today, though, I only intend to talk about one particular kind of conditional independence: the case of a central variable that screens off other variables surrounding it, like a central body with tentacles.
Let there be five variables U, V, W, X, Y; and moreover, suppose that for every pair of these variables, one variable is evidence about the other. If you select U and W, for example, then learning U=U_1 will tell you something you didn't know before about the probability that W=W_1.
An unmanageable inferential mess? Evidence gone wild? Not necessarily.
8. 'Argument Screens Off Authority'.
Maybe U is "Speaks a language", V is "Two arms and ten digits", W is "Wears clothes", X is "Poisonable by hemlock", and Y is "Red blood". Now if you encounter a thing-in-the-world, that might be an apple and might be a rock, and you learn that this thing speaks Chinese, you are liable to assess a much higher probability that it wears clothes; and if you learn that the thing is not poisonable by hemlock, you will assess a somewhat lower probability that it has red blood.
Now some of these rules are stronger than others. There is the case of Fred, who is missing a finger due to a volcano accident, and the case of Barney the Baby who doesn't speak yet, and the case of Irving the IRCBot who emits sentences but has no blood. So if we learn that a certain thing is not wearing clothes, that doesn't screen off everything that its speech capability can tell us about its blood color. If the thing doesn't wear clothes but does talk, maybe it's Nude Nellie.
This makes the case more interesting than, say, five integer variables that are all odd or all even, but otherwise uncorrelated. In that case, knowing any one of the variables would screen off everything that knowing a second variable could tell us about a third variable.

But here, we have dependencies that don't go away as soon as we learn just one variable, as the case of Nude Nellie shows. So is it an unmanageable inferential inconvenience?
Fear not! For there may be some sixth variable Z which, if we knew it, really would screen off every pair of variables from each other. There may be some variable Z (even if we have to construct Z rather than observing it directly) such that:

p(u|v,w,x,y,z) = p(u|z)
p(v|u,w,x,y,z) = p(v|z)
p(w|u,v,x,y,z) = p(w|z)
...
Perhaps, given that a thing is "human", then the probabilities of it speaking, wearing clothes, and having the standard number of fingers, are all independent. Fred may be missing a finger, but he is no more likely to be a nudist than the next person; Nude Nellie never wears clothes, but knowing this doesn't make it any less likely that she speaks; and Baby Barney doesn't talk yet, but is not missing any limbs.
This is called the "Naive Bayes" method, because it usually isn't quite true, but pretending that it's true can simplify the living daylights out of your calculations. We don't keep separate track of the influence of clothedness on speech capability given finger number. We just use all the information we've observed to keep track of the probability that this thingy is a human (or alternatively, something else, like a chimpanzee or robot), and then use our beliefs about the central class to predict anything we haven't seen yet, like vulnerability to hemlock.
Any observations of U, V, W, X, and Y just act as evidence for the central class variable Z, and then we use the posterior distribution on Z to make any predictions that need making about unobserved variables in U, V, W, X, and Y.

Sound familiar? It should!
[Figure: a central "blegg" unit connected to its observable characteristics.]
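A minimal sketch of the method, with made-up numbers (the classes, features, and probabilities below are illustrative assumptions, not from the text): every observation feeds a posterior over the central class, and the posterior then predicts unseen variables:

```python
# Hypothetical priors and per-class feature likelihoods p(feature | class).
priors = {"human": 0.5, "robot": 0.5}
likelihood = {
    "human": {"speaks": 0.9, "wears_clothes": 0.95, "red_blood": 0.99},
    "robot": {"speaks": 0.5, "wears_clothes": 0.1, "red_blood": 0.01},
}

def posterior(observed):
    """p(class | observed features), treating features as independent given class."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for feature, present in observed.items():
            p = likelihood[c][feature]
            score *= p if present else (1 - p)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def predict(observed, feature):
    """p(unseen feature | observations), routed through the central class."""
    post = posterior(observed)
    return sum(post[c] * likelihood[c][feature] for c in post)

seen = {"speaks": True, "wears_clothes": True}
print(posterior(seen))                 # mostly "human"
print(predict(seen, "red_blood"))      # high, via the central class
```

Note that the two observed features never interact with each other directly; everything flows through the posterior on the class, which is exactly the "central body with tentacles" picture.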
As a matter of fact, if you use the right kind of neural network units, this "neural network" ends up exactly, mathematically equivalent to Naive Bayes. The central unit just needs a logistic threshold (an S-curve response) and the weights of the inputs just need to match the logarithms of the likelihood ratios, etcetera. In fact, it's a good guess that this is one of the reasons why logistic response often works so well in neural networks: it lets the algorithm sneak in a little Bayesian reasoning while the designers aren't looking.
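This equivalence is easy to verify for a two-class case with invented numbers (the probabilities below are illustrative only): feeding the sum of log likelihood ratios through a logistic unit reproduces the exact Bayesian posterior:

```python
import math

def sigmoid(t):
    """Logistic threshold: the S-curve response of the central unit."""
    return 1 / (1 + math.exp(-t))

# Hypothetical two-class setup; both features observed present.
p_human, p_robot = 0.5, 0.5
lik_human = {"speaks": 0.9, "wears_clothes": 0.95}
lik_robot = {"speaks": 0.5, "wears_clothes": 0.1}

# Weights = log likelihood ratios; bias = log prior odds.
evidence = math.log(p_human / p_robot) + sum(
    math.log(lik_human[f] / lik_robot[f]) for f in lik_human)
p_logistic = sigmoid(evidence)

# Direct Naive Bayes posterior with the same numbers.
num = p_human * lik_human["speaks"] * lik_human["wears_clothes"]
den = num + p_robot * lik_robot["speaks"] * lik_robot["wears_clothes"]
p_bayes = num / den

print(p_logistic, p_bayes)  # identical, about 0.9448
```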
Just because someone is presenting you with an algorithm that they call a "neural network" with buzzwords like "scruffy" and "emergent" plastered all over it, disclaiming proudly that they have no idea how the learned network works, well, don't assume that their little AI algorithm really is Beyond the Realms of Logic. For this paradigm of adhockery, if it works, will turn out to have Bayesian structure⁹; it may even be exactly equivalent to an algorithm of the sort called "Bayesian".

Even if it doesn't look Bayesian, on the surface.
And then you just know that the Bayesians are going to start explaining exactly how the algorithm works, what underlying assumptions it reflects, which environmental regularities¹⁰ it exploits, where it works and where it fails, and even attaching understandable meanings to the learned network weights.

Disappointing, isn't it?
9. http://lesswrong.com/lw/o7/searching_for_bayesstructure/
10. 'Mutual Information, and Density in Thingspace'.
26. Words as Mental Paintbrush Handles
to consider, but:
how spectacularly silly.
This academic paradigm, I think, was mostly a deranged legacy of behaviorism, which denied the existence of thoughts in humans, and sought to explain all human phenomena as "reflex", including speech. Behaviorism probably deserves its own post at some point, as it was a perversion of rationalism, but this is not that post.
1. http://lesswrong.com/lw/o9/words_as_mental_paintbrush_handles/
2. 'Conditional Independence, and Naive Bayes'.
3. 'Hindsight Devalues Science'.
"You call it 'silly'," you inquire, "but how do you know that your brain represents visual images? Is it merely that you can close your eyes and see them?"
This question used to be harder to answer, back in the day of the controversy. If you wanted to prove the existence of mental imagery "scientifically", rather than just by introspection, you had to infer the existence of mental imagery from experiments like, e.g.: Show subjects two objects and ask them if one can be rotated into correspondence with the other. The response time is linearly proportional to the angle of rotation required. This is easy to explain if you are actually visualizing the image and continuously rotating it at a constant speed, but hard to explain if you are just checking propositional features of the image.
Today we can actually neuroimage the little pictures in the visual cortex. So, yes, your brain really does represent a detailed image of what it sees or imagines. See Stephen Kosslyn's Image and Brain: The Resolution of the Imagery Debate.
Part of the reason people get in trouble with words, is that they do not realize how much complexity lurks behind words.

Can you visualize a "green dog"? Can you visualize a "cheese apple"?

"Apple" isn't just a sequence of two syllables or five letters. That's a shadow. That's the tip of the tiger's tail.
Words, or rather the concepts behind them, are paintbrushes: you can use them to draw images in your own mind. Literally draw, if you employ concepts to make a picture in your visual cortex. And by the use of shared labels, you can reach into someone else's mind, and grasp their paintbrushes to draw pictures in their minds: sketch a little green dog in their visual cortex.
But don't think that, because you send syllables through the air, or letters through the Internet, it is the syllables or the letters that draw pictures in the visual cortex. That takes some complex instructions that wouldn't fit in the sequence of letters. "Apple" is 5 bytes, and drawing a picture of an apple from scratch would take more data than that.
Apple" is merely the tag attached to the true and wordless apple
concept, which can paint a picture in your visual cortex, or collide
with cheese", or recognize an apple when you see one, or taste its
archetype in apple pie, maybe even send out the motor behavior Ior
eating an apple.
And it's not as simple as just calling up a picture from memory. Or how would you be able to visualize combinations like a "triangular lightbulb", imposing triangleness on lightbulbs, keeping the essence of both, even if you've never seen such a thing in your life?
Don't make the mistake the behaviorists made. There's far more to speech than sound in air. The labels are just pointers: "look in memory area 1387540". Sooner or later, when you're handed a pointer, it comes time to dereference it, and actually look in memory area 1387540.
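The pointer analogy can be sketched in a few lines of Python (the `concept_store` dictionary and its contents are invented for illustration): the label itself is tiny, while the structure it dereferences to is not:

```python
import sys

# A label is just a short pointer; the concept it names is a large structure.
concept_store = {
    # Hypothetical stand-in for a rich, wordless concept record.
    "apple": {
        "shape": "round",
        "color": "red or green",
        "taste": "sweet-tart",
        "image": [[0] * 64 for _ in range(64)],  # a 64x64 mental picture
    },
}

label = "apple"                   # what actually travels through the air
concept = concept_store[label]    # dereferencing the pointer

print(sys.getsizeof(label))       # a few dozen bytes at most
print(len(concept["image"]) * len(concept["image"][0]))  # 4096 pixels
```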
What does a word point to?
WORDS AS MENTAL PAINTBRUSH HANDLES 297
27. Variable Question Fallacies
3. You try to establish any sort of empirical proposition as being true by definition. Socrates is a human, and humans, by definition, are mortal. So is it a logical truth if we empirically predict that Socrates should keel over if he drinks hemlock? It seems like there are logically possible, non-self-contradictory worlds where Socrates doesn't keel over, where he's immune to hemlock by a quirk of biochemistry, say. Logical truths are true in all possible worlds, and so never tell you which possible world you live in, and anything you can establish "by definition" is a logical truth. (The Parable of Hemlock.³)
1. http://lesswrong.com/lw/od/37_ways_that_words_can_be_wrong/
2. 'The Parable of the Dagger'.
3. 'The Parable of Hemlock'.
4. You unconsciously slap the conventional label on something, without actually using the verbal definition you just gave. You know perfectly well that Bob is "human", even though, on your definition, you can never call Bob "human" without first observing him to be mortal. (The Parable of Hemlock.⁴,⁵)
,. The act of labeling something with a word, disguises a
challengable inductive inference you are making. II the last
eggshaped objects drawn have been blue, and the last 8
cubes drawn have been red, it is a matter oI induction to
say this rule will hold in the Iuture. But iI you call the
blue eggs bleggs" and the red cubes rubes", you may
reach into the barrel, Ieel an egg shape, and think Oh, a
blegg." (\ords as Hidden InIerences.
6
)
6. You try to define a word using words, in turn defined with ever-more-abstract words, without being able to point to an example. "What is red?" "Red is a color." "What's a color?" "It's a property of a thing." "What's a thing? What's a property?" It never occurs to you to point to a stop sign and an apple. (Extensions and Intensions.⁷)
7. The extension doesn't match the intension. We aren't consciously aware of our identification of a red light in the sky as "Mars", which will probably happen regardless of your attempt to define "Mars" as "The God of War". (Extensions and Intensions.⁸)
8. Your verbal definition doesn't capture more than a tiny fraction of the category's shared characteristics, but you try to reason as if it does. When the philosophers of Plato's Academy claimed that the best definition of a human was a "featherless biped", Diogenes the Cynic is said to have exhibited a plucked chicken and declared "Here is Plato's Man." The Platonists promptly changed their definition to "a featherless biped with broad nails". (Similarity Clusters.⁹)
4. 'The Parable of Hemlock'.
5. 'The Parable of Hemlock'.
6. 'Words as Hidden Inferences'.
7. 'Extensions and Intensions'.
8. 'Extensions and Intensions'.
9. You try to treat category membership as all-or-nothing, ignoring the existence of more and less typical subclusters. Ducks and penguins are less typical birds than robins and pigeons. Interestingly, a between-groups experiment showed that subjects thought a disease was more likely to spread from robins to ducks on an island, than from ducks to robins. (Typicality and Asymmetrical Similarity.¹⁰)
10. A verbal definition works well enough in practice to point out the intended cluster of similar things, but you nitpick exceptions. Not every human has ten fingers, or wears clothes, or uses language, but if you look for an empirical cluster of things which share these characteristics, you'll get enough information that the occasional nine-fingered human won't fool you. (The Cluster Structure of Thingspace.¹¹)
11. You ask whether something "is" or "is not" a category member but can't name the question you really want answered. What "is" a man? Is Barney the Baby Boy a "man"? The "correct" answer may depend considerably on whether the query you really want answered is "Would hemlock be a good thing to feed Barney?" or "Will Barney make a good husband?" (Disguised Queries.¹²)
12. You treat intuitively perceived hierarchical categories like the only correct way to parse the world, without realizing that other forms of statistical inference are possible even though your brain doesn't use them. It's much easier for a human to notice whether an object is a "blegg" or "rube", than for a human to notice that red objects never glow in the dark, but red-furred objects have all the other characteristics of bleggs. Other statistical algorithms work differently. (Neural Categories.¹³)
13. You talk about categories as if they are manna fallen from the Platonic Realm, rather than inferences implemented in a real brain. The ancient philosophers said "Socrates is a man", not, "My brain perceptually classifies Socrates as a match against the 'human' concept". (How An Algorithm Feels From Inside.¹⁴)
9. 'Similarity Clusters'.
10. 'Typicality and Asymmetrical Similarity'.
11. 'The Cluster Structure of Thingspace'.
12. 'Disguised Queries'.
13. 'Neural Categories'.
37 WAYS THAT WORDS CAN BE WRONG 303
14. You argue about a category membership even after screening off all questions that could possibly depend on a category-based inference. After you observe that an object is blue, egg-shaped, furred, flexible, opaque, luminescent, and palladium-containing, what's left to ask by arguing, "Is it a blegg?" But if your brain's categorizing neural network contains a (metaphorical) central unit corresponding to the inference of blegg-ness, it may still feel like there's a leftover question. (How An Algorithm Feels From Inside.¹⁵)
15. You allow an argument to slide into being about definitions, even though it isn't what you originally wanted to argue about. If, before a dispute started about whether a tree falling in a deserted forest makes a "sound", you asked the two soon-to-be arguers whether they thought a "sound" should be defined as "acoustic vibrations" or "auditory experiences", they'd probably tell you to flip a coin. Only after the argument starts does the definition of a word become politically charged. (Disputing Definitions.¹⁶)
16. You think a word has a meaning, as a property of the word itself, rather than there being a label that your brain associates to a particular concept. When someone shouts, "Yikes! A tiger!", evolution would not favor an organism that thinks, "Hm... I have just heard the syllables 'Tie' and 'Grr' which my fellow tribe-members associate with their internal analogues of my own tiger concept, and which aiiieeee CRUNCH CRUNCH GULP." So the brain takes a shortcut, and it seems that the meaning of tigerness is a property of the label itself. People argue about the "correct" meaning of a label like "sound". (Feel the Meaning.¹⁷)
17. You argue over the meanings of a word, even after all sides understand perfectly well what the other sides are trying to say. The human ability to associate labels to concepts is a tool for communication. When people want to communicate, we're hard to stop; if we have no common language, we'll draw pictures in sand. When you each understand what is in the other's mind, you are done. (The Argument From Common Usage.¹⁸)
14. 'How An Algorithm Feels From Inside'.
15. 'How An Algorithm Feels From Inside'.
16. 'Disputing Definitions'.
17. 'Feel the Meaning'.
18. You pull out a dictionary in the middle of an empirical or moral argument. Dictionary editors are historians of usage, not legislators of language. If the common definition contains a problem (if "Mars" is defined as the God of War, or a "dolphin" is defined as a kind of fish, or "Negroes" are defined as a separate category from humans), the dictionary will reflect the standard mistake. (The Argument From Common Usage.¹⁹)
19. You pull out a dictionary in the middle of any argument ever. Seriously, what the heck makes you think that dictionary editors are an authority on whether "atheism" is a "religion" or whatever? If you have any substantive issue whatsoever at stake, do you really think dictionary editors have access to ultimate wisdom that settles the argument? (The Argument From Common Usage.²⁰)
20. You defy common usage without a reason, making it gratuitously hard for others to understand you. Fast stand up plutonium, with bagels without handle. (The Argument From Common Usage.²¹)
21. You use complex renamings to create the illusion of inference. Is a "human" defined as a "mortal featherless biped"? Then write: "All [mortal featherless bipeds] are mortal; Socrates is a [mortal featherless biped]; therefore, Socrates is mortal." Looks less impressive that way, doesn't it? (Empty Labels.²²)
22. You get into arguments that you could avoid if you just didn't use the word. If Albert and Barry aren't allowed to use the word "sound", then Albert will have to say "A tree falling in a deserted forest generates acoustic vibrations", and Barry will say "A tree falling in a deserted forest generates no auditory experiences". When a word poses a problem, the simplest solution is to eliminate the word and its synonyms. (Taboo Your Words.²³)
18. 'The Argument from Common Usage'.
19. 'The Argument from Common Usage'.
20. 'The Argument from Common Usage'.
21. 'The Argument from Common Usage'.
22. 'Empty Labels'.
23. The existence of a neat little word prevents you from seeing the details of the thing you're trying to think about. What actually goes on in schools once you stop calling it "education"? What's a degree, once you stop calling it a "degree"? If a coin lands "heads", what's its radial orientation? What is "truth", if you can't say "accurate" or "correct" or "represent" or "reflect" or "semantic" or "believe" or "knowledge" or "map" or "real" or any other simple term? (Replace the Symbol with the Substance.²⁴)
24. You have only one word, but there are two or more different things-in-reality, so that all the facts about them get dumped into a single undifferentiated mental bucket. It's part of a detective's ordinary work to observe that Carol wore red last night, or that she has black hair; and it's part of a detective's ordinary work to wonder if maybe Carol dyes her hair. But it takes a subtler detective to wonder if there are two Carols, so that the Carol who wore red is not the same as the Carol who had black hair. (Fallacies of Compression.²⁵)
25. You see patterns where none exist, harvesting other characteristics from your definitions even when there is no similarity along that dimension. In Japan, it is thought that people of blood type A are earnest and creative, blood type Bs are wild and cheerful, blood type Os are agreeable and sociable, and blood type ABs are cool and controlled. (Categorizing Has Consequences.²⁶)
26. You try to sneak in the connotations of a word, by arguing from a definition that doesn't include the connotations. A "wiggin" is defined in the dictionary as a person with green eyes and black hair. The word "wiggin" also carries the connotation of someone who commits crimes and launches cute baby squirrels, but that part isn't in the dictionary. So you point to someone and say: "Green eyes? Black hair? See, told you he's a wiggin! Watch, next he's going to steal the silverware." (Sneaking in Connotations.²⁷)
23. 'Taboo Your Words'.
24. 'Replace the Symbol with the Substance'.
25. 'Fallacies of Compression'.
26. 'Categorizing Has Consequences'.
27. You claim "X, by definition, is a Y!" On such occasions you're almost certainly trying to sneak in a connotation of Y that wasn't in your given definition. You define "human" as a "featherless biped", and point to Socrates and say, "No feathers, two legs, he must be human!" But what you really care about is something else, like mortality. If what was in dispute was Socrates's number of legs, the other fellow would just reply, "Whaddaya mean, Socrates's got two legs? That's what we're arguing about in the first place!" (Arguing "By Definition".²⁸)
28. You claim "Ps, by definition, are Qs!" If you see Socrates out in the field with some biologists, gathering herbs that might confer resistance to hemlock, there's no point in arguing "Men, by definition, are mortal!" The main time you feel the need to tighten the vise by insisting that something is true "by definition" is when there's other information that calls the default inference into doubt. (Arguing "By Definition".²⁹)
29. You try to establish membership in an empirical cluster "by definition". You wouldn't feel the need to say, "Hinduism, by definition, is a religion!" because, well, of course Hinduism is a religion. It's not just a religion "by definition", it's, like, an actual religion. Atheism does not resemble the central members of the "religion" cluster, so if it wasn't for the fact that atheism is a religion by definition, you might go around thinking that atheism wasn't a religion. That's why you've got to crush all opposition by pointing out that "Atheism is a religion" is true by definition, because it isn't true any other way. (Arguing "By Definition".³⁰)
30. Your definition draws a boundary around things that don't really belong together. You can claim, if you like, that you are defining the word "fish" to refer to salmon, guppies, sharks, dolphins, and trout, but not jellyfish or algae. You can claim, if you like, that this is merely a list, and there is no way a list can be "wrong". Or you can stop playing nitwit games and admit that you made a mistake and that dolphins don't belong on the fish list. (Where to Draw the Boundary?³¹)
27. 'Sneaking in Connotations'.
28. 'Arguing "By Definition"'.
29. 'Arguing "By Definition"'.
30. 'Arguing "By Definition"'.
31. You use a short word for something that you won't need to describe often, or a long word for something you'll need to describe often. This can result in inefficient thinking, or even misapplications of Occam's Razor, if your mind thinks that short sentences sound simpler. Which sounds more plausible, "God did a miracle" or "A supernatural universe-creating entity temporarily suspended the laws of physics"? (Entropy, and Short Codes.³²)
32. You draw your boundary around a volume of space where there is no greater-than-usual density, meaning that the associated word does not correspond to any performable Bayesian inferences. Since green-eyed people are not more likely to have black hair, or vice versa, and they don't share any other characteristics in common, why have a word for "wiggin"? (Mutual Information, and Density in Thingspace.³³)
33. You draw an unsimple boundary without any reason to do so. The act of defining a word to refer to all humans, except black people, seems kind of suspicious. If you don't present reasons to draw that particular boundary, trying to create an arbitrary "word" in that location is like a detective saying: "Well, I haven't the slightest shred of support one way or the other for who could've murdered those orphans... but have we considered John Q. Wiffleheim as a suspect?" (Superexponential Conceptspace, and Simple Words.³⁴)
34. You use categorization to make inferences about properties that don't have the appropriate empirical structure, namely, conditional independence given knowledge of the class, to be well-approximated by Naive Bayes. No way am I trying to summarize this one. Just read the blog post. (Conditional Independence, and Naive Bayes.³⁵)
31. 'Where to Draw the Boundary?'.
32. 'Entropy, and Short Codes'.
33. 'Mutual Information, and Density in Thingspace'.
34. 'Superexponential Conceptspace, and Simple Words'.
35. You think that words are like tiny little LISP symbols in your mind, rather than words being labels that act as handles to direct complex mental paintbrushes that can paint detailed pictures in your sensory workspace. Visualize a "triangular lightbulb". What did you see? (Words as Mental Paintbrush Handles.³⁶)
36. You use a word that has different meanings in different places as though it meant the same thing on each occasion, possibly creating the illusion of something protean and shifting. "Martin told Bob the building was on his left." But "left" is a function-word that evaluates with a speaker-dependent variable grabbed from the surrounding context. Whose "left" is meant, Bob's or Martin's? (Variable Question Fallacies.³⁷)
37. You think that definitions can't be wrong, or that "I can define a word any way I like!" This kind of attitude teaches you to indignantly defend your past actions, instead of paying attention to their consequences, or fessing up to your mistakes. (37 Ways That Suboptimal Use Of Categories Can Have Negative Side Effects On Your Cognition.³⁸)
Everything you do in the mind has an effect, and your brain races ahead unconsciously without your supervision.

Saying "Words are arbitrary; I can define a word any way I like" makes around as much sense as driving a car over thin ice with the accelerator floored and saying, "Looking at this steering wheel, I can't see why one radial angle is special, so I can turn the steering wheel any way I like."

If you're trying to go anywhere, or even just trying to survive, you had better start paying attention to the three or six dozen optimality criteria that control how you use words, definitions, categories, classes, boundaries, labels, and concepts.
35. 'Conditional Independence, and Naive Bayes'.
36. 'Words as Mental Paintbrush Handles'.
37. 'Variable Question Fallacies'.
38. '37 Ways That Words Can Be Wrong'.